Going for the gold…Part II
Kathy | October 1, 2007
In the first installment of this series, the author discussed overcoming some common misconceptions to help you on your way to becoming an Elegant Maintenance Manager. This month, he deals with bringing corrective maintenance under control and extending it to preventive maintenance work to achieve efficiencies consistent with the assigned budget.
If one could get the right mix of preventive maintenance (PM) and corrective maintenance (CM), things might not be so bad. If CM could be reduced to zero, that would be grand, but that is not going to happen. Amid the propositions of RCM, TPM, RCA, the Pareto rule, best laid plans, etc., one still must contend with randomness. When our programs result in high levels of performance, the limiting factor is often randomness. Using the concept of randomness in analyzing equipment failure events is helpful in establishing the realistic limiting factors in PM and CM program development and management.
Small changes, big results
Before getting enthused about understanding the technical milieu of equipment failure modes and causes, failure intervals and maintenance task development in recovering a dysfunctional maintenance program, first look for an opportunity to gain a large reward for a small resource expenditure. One large reward, for example, could be a 50% reduction in corrective maintenance resulting from a program developed and implemented in-house (small resource expenditure). That 50% number is not unrealistic. A high incidence of maintenance-induced failures in the CM arena results from poorly designed and implemented management strategies and systems. These “activity control failures” are commonly referred to as “personnel errors, miscellaneous, or unknown cause.”
If managers are not providing the maintenance staff with sufficient training, procedures, resources, time, leadership and competent supervision, they are their own worst enemy. One of the first and biggest values in maintenance program development is ensuring that you have a competent staff and that your management systems are effective. Only then can one proceed to implement a maintenance strategy in such a way that the maintenance staff does much more good than harm. Table I summarizes the analyses of hundreds of events at a variety of industrial facilities.
In addition to the categories shown in Table I, modification work (such as replacement of obsolete equipment, increasing capacity of an existing system, or meeting new regulatory requirements) can also be a significant cause of failures. In such instances engineering work is unavoidable, but sufficient engineering resources are often unavailable to complete all the requirements of installation, operation and performance qualification. In these cases the maintenance department is left with a classic, serious maintenance management problem, because the complexity of these seemingly simple facility events is not recognized. These failures are also “activity control failures,” even though most people think they are making desired improvements.
Let’s see if there is an elegant approach to addressing these serious problem areas in a maintenance program that is in a hole. Is there an elegant way for maintenance managers to stop digging, and then develop a strategy to lift their organization out of the hole they find themselves in?
Stop digging, start climbing
First, develop a “maintenance process” for doing your maintenance business. The maintenance process describes the conduct of maintenance at your facility. It becomes one of the key components of your strategy. It tells your staff how you want them to conduct the business of your department. Administrative aspects of the process probably already exist simply because of government regulations for doing business in general. The Elegant Maintenance Manager now must add the technical aspects and describe how the average maintenance technician and his supervisor should be conducting maintenance business and ensuring the operability and reliability of systems and equipment.
For example, does the maintenance technician know where to find the source documents for working on equipment? Are there source documents? How does he know what post-maintenance testing to conduct and what the acceptance criteria are? How does the supervisor interact to ensure communication and implementation of the maintenance process?
In a later article we will see that the maintenance process is a critical training document. That’s because if the maintenance manager specifies the conduct of maintenance, there will be goals and objectives for the maintenance technicians and supervisors to achieve on a consistent basis. If this is not done, work orders can become adventures being directed by a loose cannon or two. Maintenance process knowledge is as important as technical skills—do your people know how to work?
Second, take a look at your CM workload. Based on failure cause determination in hundreds of analyses covering thousands of mechanical, electrical, and instrumentation and control components, I discovered eight causes of failure for use in front-end development of maintenance programs. These causes include:
- Maintenance activity
- Installation anomalies
- Operations and testing
Coupled with the foregoing focus on irregular failure causes, this also is a great opportunity to apply the thinking inherent in RCM to address the CM problem.
Choose specific maintenance tasks on the basis of the actual failure characteristics for the equipment under study as evidenced by the CM history. All these tasks can be described in terms of the four basic forms of maintenance tasks, each of which is applicable under a unique set of circumstances. The four forms of maintenance tasks are well defined in the RCM literature.
Earlier, we discussed the problems associated with things like obsolescence and modifications that are not maintenance problems but become de facto maintenance problems due to the lack of an engineering function. Applied RCM thinking will help Elegant Maintenance Managers identify this trap before it gets sprung on them. Getting CM under control this way is apt, simple and obvious. Even in RCM program applications, this approach works when the specified RCM task doesn’t meet expectations for whatever reason, and CM is occurring on an “RCM’ed” unit.
There is even a concept in RCM that goes something like this for overhaul tasks: In the case of overhaul tasks, the question of applicability as well as effectiveness requires an analysis of operating data. Unless the age-reliability characteristics of the item are known from prior experience with a similar item exposed to a similar operating environment, the assumption in an initial program is that an item will not benefit from scheduled overhaul. The implication of these concepts is “make good use of CM data in specifying applicable and effective tasks.”
Let’s see where we are at this time after completing the previous actions. We stopped digging the hole we were in by managing. That produced the strategy that materialized in a maintenance process statement that the maintenance supervisors communicated to the work force, then implemented in the field as the staff conducted the business of maintenance. If supervisors cannot be counted on to communicate the strategy and see to implementation, then coaching and counseling are in order followed by getting competent supervisors in place if the incumbents cannot adapt to change. The reactive component of the maintenance business came under control with the conversion of CM work to PM work. That’s a good start on being effective. By managing, one attacks 50% of the CM cause, and by applying RCM thinking, one can attack just about all the remaining CM problem(s).
In the opening paragraph, randomness was put forth as a limiting factor in the maintenance business. This is where randomness comes in to help the maintenance manager understand what the PM/CM ratios are expected to be. Empirical determination shows that the reasonable values of the PM/CM ratios (computed as PM work orders divided by PM+CM work orders) are approximately as follows:
- Mechanical Systems—80%. Due to physically harsh service environments and the interplay of many uncontrolled variables, there is a significant impact from random events in these systems. These systems also generally are associated with energy transport and conversion through physical system interaction.
- Electrical Systems—90%. These systems are generally better controlled regarding environmental conditions, thus minimizing uncontrolled variables and random events. Also, the energy transport and conversion is in many cases by means of an electromagnetic wave, thus taking less of a physical toll on equipment.
- Measurement and Control Systems—>90%. High CM in these systems is generally due to poor heat management or poor design. These systems should be among the easiest to maintain through a PM program and have a minimum impact from random events.
- Overall—85%. This overall performance level accounts for about a 15% random failure level that will show up as the CM workload in effective PM programs.
Within a year of instituting the management and CM changes, the Elegant Maintenance Manager should be able to achieve the PM/CM ratios listed above.
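The ratio defined above (PM work orders divided by PM plus CM work orders) is easy to track from work order counts. The following is a minimal sketch, not a prescribed tool: the system keys, the function names and the example counts are hypothetical, and the "greater than 90%" target for measurement and control systems is treated here as a 90% floor.

```python
# Target PM/CM ratios taken from the list above.
# The dictionary keys are names chosen for this sketch only.
TARGET_RATIO = {
    "mechanical": 0.80,
    "electrical": 0.90,
    "measurement_and_control": 0.90,  # stated as ">90%"; used here as a floor
}
OVERALL_TARGET = 0.85


def pm_ratio(pm_orders: int, cm_orders: int) -> float:
    """Return PM work orders divided by (PM + CM) work orders."""
    total = pm_orders + cm_orders
    if total == 0:
        raise ValueError("no work orders recorded")
    return pm_orders / total


def compare_to_targets(counts: dict) -> dict:
    """Map each system to (ratio, meets_target), plus an 'overall' entry.

    counts maps a system name to a (pm_orders, cm_orders) tuple.
    """
    report = {}
    for system, (pm, cm) in counts.items():
        ratio = pm_ratio(pm, cm)
        report[system] = (ratio, ratio >= TARGET_RATIO[system])
    total_pm = sum(pm for pm, _ in counts.values())
    total_cm = sum(cm for _, cm in counts.values())
    overall = pm_ratio(total_pm, total_cm)
    report["overall"] = (overall, overall >= OVERALL_TARGET)
    return report


# Hypothetical one-year work order counts: (PM, CM) per system.
example_counts = {
    "mechanical": (400, 100),
    "electrical": (270, 30),
    "measurement_and_control": (190, 10),
}
report = compare_to_targets(example_counts)
for system, (ratio, ok) in report.items():
    print(f"{system}: {ratio:.0%} {'meets' if ok else 'below'} target")
```

A system that falls below its target is a candidate for the CM history review described earlier; one that sits well above it may be a candidate for the efficiency steps that follow.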
Save some, get more resource efficiency
Now would be a good time to take an initial shot at bringing efficiency into the PM program. It is important to be effective first, then efficient, so that’s where the Elegant Maintenance Manager would be at this point in the maintenance program recovery. There are obvious savings from reducing the number of PMs, so the PM intervals should be examined to ensure the maintenance staff is not overdoing the PM thing.
Examining the PM results versus the interval of performance and effectiveness of the PM process is the first step in adjusting the PM schedule. For a start, consider the following:
- Eliminate, within reason, all “tear down and inspect”-type PMs. This is the pointless “tear up the plants to check the roots” mindset that seems to make sense but in reality is one of the main causes of maintenance-induced failures.
- For scheduled replacement tasks, examine closely the item being replaced and make a determination as to condition. If it is not that bad, consider extending the PM interval by 50%. If you and your staff do not know how to make this judgment call, learn how to do this as soon as possible.
- For all equipment in low-energy or low-duty-cycle applications, consider doubling the PM interval. Examples of low-energy parameters include temperatures less than 300 °F, flow rates less than 100 gpm, and operating pressures less than about 100 psi (air or water) or low-pressure steam systems. Low duty cycles are eight hours per day or less.
- For all changes made per the recommendations listed here, revisit the interval question at the next performance of the PM and extend the PM interval again where possible.
- For non-critical, low-cost components with no collateral damage potential, consider a run-to-failure maintenance strategy and eliminate the unit from the PM program except for routine monitoring for deficiency identification.
- Evaluate all skid-mounted instrumentation and control components for usefulness compared to remote process monitoring instrumentation and control systems. Remove redundant or not-used devices from the PM and calibration programs.
- Design special tools, jigs, and fixtures to support maintenance on certain equipment as necessary and place these items under inventory control to ensure availability when needed.
- Consolidate PMs. Ensure that subinterval PMs are embedded in longer interval PMs and that as many PMs as possible are included in work packages to minimize equipment downtime and minimize PM logistics.
- Supply each supervisor, as practical, with a set of specialty tools and measuring and test equipment for their controlled use on their shifts.
- Remove all barriers in the procurement process from the requisition phase to the receipt and staging of parts to support PMs. This will likely require some “just in time” (JIT) procurement tactics, vendor partnering, and removal of enterprise asset management system (EAMS)-type roadblocks.
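Two of the interval recommendations above lend themselves to a simple screening rule. The sketch below is illustrative only and rests on assumptions the list does not settle: that an item counts as "low-energy" only when temperature, flow and pressure are all under the stated thresholds, that the doubling and the 50% extension are not stacked, and that the record fields and function names (all hypothetical) match what your work order system can supply. The condition judgment on a replaced part remains a human call; it enters here as a plain flag.

```python
from dataclasses import dataclass


@dataclass
class PMItem:
    """Hypothetical record for one PM task and its service conditions."""
    name: str
    interval_days: float
    temperature_f: float
    flow_gpm: float
    pressure_psi: float
    duty_hours_per_day: float


def is_low_energy(item: PMItem) -> bool:
    # Assumption: all three thresholds from the list must be satisfied.
    return (
        item.temperature_f < 300
        and item.flow_gpm < 100
        and item.pressure_psi < 100
    )


def is_low_duty(item: PMItem) -> bool:
    # Duty cycle of eight hours per day or less.
    return item.duty_hours_per_day <= 8


def suggested_interval(item: PMItem, replaced_part_ok: bool = False) -> float:
    """Return a candidate PM interval in days for review by the manager."""
    if is_low_energy(item) or is_low_duty(item):
        return item.interval_days * 2      # consider doubling
    if replaced_part_ok:
        return item.interval_days * 1.5    # consider a 50% extension
    return item.interval_days              # no change suggested


# Hypothetical examples.
pump = PMItem("condensate pump", 90, 180, 40, 60, 24)
feed = PMItem("boiler feed pump", 90, 450, 500, 250, 24)
print(suggested_interval(pump))                         # low-energy service
print(suggested_interval(feed, replaced_part_ok=True))  # part found in good condition
```

The output is a suggestion to review, not a schedule change. Per the list above, each extended interval should be revisited at the next performance of the PM.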
Two elegant programs to implement
There are two must-have programs that will provide some of the important information details for the maintenance information flow network. These details are important inputs to the feedback networks that the maintenance department needs to ensure awareness of what is going on. These programs are the “CM Backlog Measure Program” and the “Material Condition Inspection Program.”
A CM work order is generally designed to process work within the constraints imposed by the facility organizational structure. The constraints are in the form of assigned responsibilities and authorizations for completing the specified tasks including appropriate paperwork closeout. If this system is run efficiently and adequate staff is available, experience has shown that an optimal CM backlog can be defined such that there are sufficient manpower resources to address other tasks besides CM or handle a reasonable, sudden increase in the CM workload.
What does this really mean and what does it have to do with a maintenance department’s backlog measure? It means that backlog is something that occurs due to a maintenance organization’s ability to respond to an increased workload in such a fashion that the increased workload is still manageable as part of the organization’s day-to-day business activities. The essence of this concept is that the organization should have the inherent characteristic of a system that responds to how much work there is to do.
Historically, CM backlog has been presented as a trended plot of total backlogged man-hours or total open work requests. While this type of measure does provide some useful information on the CM backlog, it does not tell much about the effectiveness of the organization or why the backlog exists as it is. This is really what we would like to know, as opposed to knowing how much CM work has not been done. We are more interested in what caused the alarm than the alarm itself. To make use of a CM backlog measure, consider it an indicator of how time-dependent work is addressed by the maintenance organization’s work order processing system. The key to understanding what backlog really means is the phrase “time-dependent.”
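The article does not spell out the calculation behind the CM Backlog Measure. One possible reading of “time-dependent,” offered here purely as an assumption, is to look at how long open CM work orders have been waiting rather than how many there are. The sketch below groups open work orders by age; the bucket edges (7, 30 and 90 days), the function name and the sample dates are all hypothetical.

```python
from datetime import date


def backlog_age_profile(opened_dates, as_of, edges=(7, 30, 90)):
    """Count open CM work orders by age bucket.

    Returns a list with one count per bucket: age <= edges[0],
    then each following range, and finally everything older than
    the last edge. The edges are illustrative, not from the article.
    """
    counts = [0] * (len(edges) + 1)
    for opened in opened_dates:
        age = (as_of - opened).days
        if age < 0:
            raise ValueError("work order opened after the as-of date")
        for i, edge in enumerate(edges):
            if age <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # older than the last edge
    return counts


# Hypothetical open CM work orders, by date opened.
open_orders = [
    date(2007, 9, 28),
    date(2007, 9, 10),
    date(2007, 8, 1),
    date(2007, 5, 1),
]
profile = backlog_age_profile(open_orders, as_of=date(2007, 10, 1))
print(profile)
```

Trending a profile like this alongside the total count shows whether work is moving through the work order system or merely accumulating, which is closer to the question of why the backlog exists.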
The material condition of the facility should be maintained to support safe and reliable operations. It should be everyone’s business to identify and correct deficiencies and prevent the deficiency culture that comes from complacency. The basic approach to developing the facility material condition inspection program (MCIP) is as follows:
- Develop and implement an inspection program to define responsibilities for conducting inspection, identifying and correcting deficiencies, and assuring cleanliness, safety and good material condition. Establish inspection areas so that the entire facility is inspected, including areas with difficult access.
- Establish inspection guidelines and criteria to assist inspectors in performing their inspections.
- Develop a training program for appropriate station personnel, including operations personnel, facility managers, and facility supervisors, to receive inspection techniques training.
- Establish a means to report, track and correct identified deficiencies in a timely manner. Document each deficiency on a work order. (See Fig. 1 for a simple reporting document that is extremely effective for use by anyone.)
- Include recommendation of operation and maintenance good practices in this reporting program as a means of identifying areas for improvement.
A significant side benefit of this program is the equipment monitoring and diagnostic results of these inspections—a sort of informal predictive maintenance. But this predictive maintenance program is a real bargain, since the cost is simply the cost of using available resources.
This installment of the Elegant Maintenance Management series deals with the fundamental mission of the maintenance department. We all know that a lot of CM comes from poor judgment, and that good judgment comes from the experience of bringing a lot of CM under control. Next, that control must be maintained and extended to PM work to achieve efficiencies consistent with the assigned budget. Only then can there be sufficient success to ask for more resources. Remember, only you as the maintenance manager can stop digging. Empower your talented staff to do the climbing.
Dr. Huzdovich is the service contract manager for Raven Services Corporation at the Bureau of Engraving and Printing’s Western Currency Facility in Ft. Worth, TX. He directs the O&M and engineering work performed by the Raven staff of 58 employees, which is responsible for the 24/7 operation and maintenance of all stationary and production support equipment in these operations, including their 850-ton chilled water units, 800-hp low-pressure steam boilers, 3600 kW of diesel generator capacity, the environmental management system and currency mutilation destruction equipment. He also is the principal engineer and consultant providing maintenance and reliability services and expert witness services for Forensic Action Services, LLC, in Denton, TX. Huzdovich serves as an adjunct instructor with the University of North Texas MBA Program. E-mail: email@example.com; telephone: (817) 847-3674.