Sage Advice: Understanding RCM

EP Editorial Staff | May 15, 2013

Respected industry icon Anthony M. ‘Mac’ Smith expounds on Reliability Centered Maintenance and some of the many benefits it can deliver.

His three decades of exploring, implementing and leveraging the principles of RCM have given Mac Smith valuable insight into the subject. This Q&A is based on his edited responses to several questions from a 2012 Webinar sponsored by the Pharma Special Interest Group (SIG) of the Society for Maintenance and Reliability Professionals (SMRP).

Q: What is RCM?

SMITH: RCM is a maintenance strategy that originated in the 1960s when United Airlines developed it for the 747-100 aircraft. This approach was so successful that every U.S. commercial airplane since has specified RCM methodology as its initial maintenance program to the FAA.

RCM contains four key principles:

1. Preserve System Function (or avoid system functional failure). Because its focus is on the system rather than specific equipment, this aspect of RCM signifies its departure from the conventional maintenance mindset. With RCM, function will guide resource use.

2. Link Functional Failure and Equipment. RCM calls for a detailed, component-to-component system review to determine what specific equipment failure modes could lead to functional failure. The identified failure modes may deserve special attention.

3. Determine Failure-Mode Criticality. Failure modes should be separated by the impact their occurrence will have on an operation, using the following guidelines:

Occurrence will violate a safety or environmental requirement.
Occurrence could result in partial or complete plant outage.
Occurrence results in neither of the above (making this failure mode benign and a possible run-to-failure candidate).
Is the occurrence known or unknown to system operators? Unknown failure modes are considered “hidden,” and are a special area of concern.

4. Define the PM task to implement (assuring a proactive approach to the failure mode’s prevention, mitigation or, if hidden, its discovery).

In any RCM analysis, all four principles must be addressed. An analysis that ignores or short-circuits any of them can’t be considered RCM. In cases where RCM has been perceived as ineffective, the reason is usually not RCM strategy, but the implementers’ failure to follow RCM procedures.
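The criticality screening described in the third principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any RCM standard or tool: the names `FailureMode`, `Criticality` and `classify` are hypothetical, the three impact categories are assumed to be checked in the order listed above, and the hidden-versus-evident question is recorded as a separate flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Criticality(Enum):
    SAFETY_ENVIRONMENTAL = "violates a safety or environmental requirement"
    OUTAGE = "could cause a partial or complete plant outage"
    BENIGN = "neither of the above (possible run-to-failure candidate)"


@dataclass
class FailureMode:
    # All field names are hypothetical, chosen for this sketch only.
    description: str
    violates_safety_or_env: bool
    causes_outage: bool
    evident_to_operators: bool


def classify(fm: FailureMode) -> Tuple[Criticality, bool]:
    """Return (criticality, hidden) for one failure mode.

    Assumption: safety/environmental impact is checked first, then
    outage impact; anything else is treated as benign.
    """
    if fm.violates_safety_or_env:
        criticality = Criticality.SAFETY_ENVIRONMENTAL
    elif fm.causes_outage:
        criticality = Criticality.OUTAGE
    else:
        criticality = Criticality.BENIGN
    # A failure mode the operators cannot see is "hidden".
    hidden = not fm.evident_to_operators
    return criticality, hidden


# Example: a leak that operators can see and that could force an outage.
leak = FailureMode("pipe joint leaks", False, True, True)
print(classify(leak))
```

Keeping the hidden flag separate from the criticality category reflects the text above, where a hidden failure mode is a special concern regardless of which impact category it falls into.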

Q: What’s the difference between Classical RCM and RCM2?

SMITH: The 80/20 systems are the roughly 20% of plant systems that cause about 80% of maintenance issues. The Classical view is that these systems deserve most of the attention and resources. RCM2, by contrast, is based on the premise that every system of a plant needs full RCM treatment.
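One way to picture how the 80/20 systems might be identified is a simple Pareto ranking of maintenance issues by system. The Python below is a rough sketch, not a method taken from the interview: the function name `pareto_systems`, the 80% threshold parameter and the issue counts are all hypothetical, and a real study would more likely rank by maintenance cost or downtime drawn from plant records.

```python
def pareto_systems(issue_counts, share=0.80):
    """Return the highest-ranked systems that together account for
    at least `share` of all recorded maintenance issues.

    issue_counts: dict mapping system name -> number of issues.
    """
    total = sum(issue_counts.values())
    if total == 0:
        return []
    selected = []
    running = 0
    # Walk the systems from most to fewest issues.
    ranked = sorted(issue_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += count
    return selected


# Hypothetical issue counts for five plant systems.
counts = {
    "feedwater": 55,
    "cooling water": 30,
    "instrument air": 8,
    "lighting": 4,
    "HVAC": 3,
}
print(pareto_systems(counts))
```

In this made-up data set, two of the five systems cover 85 of the 100 recorded issues, so they would be the candidates for a Classical RCM study.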

Another difference is the manner in which the second RCM principle—linking functional failure and equipment—is accomplished. Both Classical RCM and RCM2 use the Failure Mode & Effects Analysis (FMEA) approach, but differ in how to apply it. Classical RCM, for example, acknowledges that failure mode can be separate from failure cause by requiring that each be recorded separately during analysis. RCM2 makes no such distinction, allowing for both to be recorded in a single column.

Classical RCM also requires that the Effect portion of the FMEA process be recorded for each failure mode at three levels: locally, for the specific equipment involved; systemically, for the system level in which that equipment resides; and at the plant level. Using these three levels assures that the analyst must carefully consider the possible cascading consequences of the failure mode all the way to its possible effect on the entire plant.

By contrast, RCM2 uses only a single column to record Effect. This can provide an inconsistent portrayal of failure-mode consequences if the failure is determined to affect only one level. Absent a complete picture of failure effect at all three levels, a later difficulty can arise in the process to determine the potential safety or outage criticality imposed by the failure mode.

A third area of concern in the FMEA is the manner in which RCM2 records the selection of a PM task for each critical failure mode. Classical RCM requires that all reasonable PM actions proposed be recorded, not just the final selection. In RCM2, only the selected PM task is recorded. Since these RCM analyses are often revisited, I have found it very useful to know all of the task options that were originally considered, especially when the original does not seem to provide the expected result.
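The three Classical RCM recording practices described above (separate mode and cause, effects at three levels, and a record of every PM task considered) can be pictured as one row of an FMEA worksheet. The following Python sketch is illustrative only: the class `FmeaRow`, its field names and the effect descriptions are hypothetical and do not come from any published RCM worksheet, while the shaft example reuses wording from this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FmeaRow:
    # Hypothetical worksheet layout for illustration only.
    component: str
    failure_mode: str    # what went wrong
    failure_cause: str   # why it went wrong, recorded separately
    local_effect: str    # effect on the equipment itself
    system_effect: str   # effect on the system containing it
    plant_effect: str    # effect on the plant as a whole
    candidate_pm_tasks: List[str] = field(default_factory=list)
    selected_pm_task: Optional[str] = None

    def select_task(self, task: str) -> None:
        """Record the chosen PM task; it must be one of the candidates."""
        if task not in self.candidate_pm_tasks:
            raise ValueError(f"{task!r} was not among the candidates")
        self.selected_pm_task = task

    def alternatives(self) -> List[str]:
        """Tasks that were considered but not selected."""
        return [t for t in self.candidate_pm_tasks
                if t != self.selected_pm_task]


row = FmeaRow(
    component="connecting shaft",
    failure_mode="connecting shaft cracks",
    failure_cause="low cycle fatigue",
    local_effect="shaft damaged",            # hypothetical effect text
    system_effect="driven equipment stops",  # hypothetical effect text
    plant_effect="partial outage",           # hypothetical effect text
    candidate_pm_tasks=["vibration monitoring", "alignment checks"],
)
row.select_task("vibration monitoring")
print(row.alternatives())
```

Keeping the unselected candidates on the row is what makes a later review practical: if the selected task does not deliver the expected result, the alternatives are already on record.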

Q: What is the difference between failure mode and failure cause?

SMITH: Failure mode describes what went wrong, usually with two or three words, one of which is a verb. Examples include “connecting shaft cracks” or “pipe joint leaks.” Failure cause describes why it went wrong, such as “low cycle fatigue” or “gasket age deterioration,” respectively, for the two examples noted. Separation of these terms is necessary because maintenance strategy must ultimately specify the task definitions that will be used to issue PM work orders that will eliminate or mitigate failure modes. It will be known, for example, how to avoid or mitigate the occurrence of shaft cracks (with vibration monitoring or alignment checks), but not that a work order is needed to specifically stop low cycle fatigue. Likewise, it will be known how to avoid serious leaks (by tightening joints or through periodic inspection), but not if a specific work order is required to stop the natural degradation that occurs over time with gasket materials.

Stated differently, failure mode is what directly results in a corrective maintenance action and, possibly, a plant or system outage. From a maintenance-strategy perspective, it’s also the failure mode that maintenance can stop or mitigate before it becomes a failure effect. And while maintenance work orders cannot realistically eliminate or mitigate a failure cause, an accurate estimate of causes can provide information needed for a change in design or operating procedure.

Elimination or mitigation of failure cause is a design issue, while elimination or mitigation of a failure mode is a maintenance issue. If we mix up the two terms, it can be difficult (or impossible) to define what maintenance action should be taken versus what design change should be considered.

Q: How do we get RCM buy-in?

SMITH: When trying to get organizational buy-in for an RCM program, the most important consideration is money or, more specifically, return on investment (ROI). Introducing RCM into the operations and maintenance side of a plant requires top management to commit staff resources—mainly O&M supervisors and technicians—to perform RCM studies, then take actions to implement the findings.

The biggest hurdle is often to get approval for a pilot project to determine how RCM will work within the plant and its culture. In making its decision, management will consider the cost to do this, including consultant charges, versus what they can expect in terms of ROI.

Using the 80/20 approach, I’ve not found it difficult to convince O&M management to try a pilot program. But if you have to sell the program, it behooves you to play a key role in selecting both the system for the pilot as well as the personnel who will supply the data and information for the RCM analysis. Be sure to focus on critical equipment for the best ROI, and to choose top people. Make no mistake: The proper database for an RCM analysis will come from the “A” team of craft technicians.

If you meet the ROI test, your next challenge will be to gain buy-in from the larger O&M population. You may now have a handful of converts from the successful pilot project, but to make this a plant-wide program, a majority of their peer technicians must be brought onboard. No matter what you try, things won’t happen overnight—but the following strategies can help change mind-sets:

1. Embark on a steady training program regarding RCM and its potential benefits for all who might interface with its analysis and implementation. People tend to automatically resist change, especially if they don’t understand what RCM is about. Count on individuals who have experienced the successful pilot project to be key participants in your training efforts.

2. Maintain program visibility at the management level on what’s happening and what’s planned. Define KPIs that are to be religiously tracked and frequently reviewed with management. You must keep management on your side and not let them lose sight of the fact that ROI is trending in the right direction.

3. Consider designating an “RCM Champion” who will be responsible for assuring that the above actions happen.

Based on my 30 years of experience with Classical RCM implementations, about 60% succeed and 40% either never get off the ground or stall after the pilot project. The failures can be traced to one or more of the above three actions not being carried out.

Q: What about the 20/80 systems?

SMITH: Concern for 20/80 systems—or “better-behaved” systems—was first expressed to me in the 1980s by U.S. Air Force management at the Arnold Engineering Center, in Tennessee. It came after three years of multiple, successful Classical RCM Projects on 80/20 systems. My response, which also proved successful, was to develop the Experience-Centered Maintenance (ECM) analysis methodology.

ECM is not RCM: It is data-driven while RCM is function-driven. ECM data comes from two sources. The first is data obtained from existing PM tasks, which is analyzed to see if the tasks performed as expected and, if not, what should be changed. The second source is data obtained during corrective maintenance events from the previous 12 to 18 months, together with follow-up analysis to learn why these unexpected failures occurred and how they could be prevented.

The ECM process takes about one-fourth the time of a full-blown Classical RCM study. But ECM isn’t intended for use on 80/20 systems: Those problems typically require more than a minor correction to the existing PM Task structure. ECM also does not explore the full range of possible failure modes that could initiate functional failures.

Q: What’s the future of RCM?

SMITH: The answer is self-evident: U.S. industry is still predominantly in the reactive maintenance mode and, for the most part, doesn’t even recognize the application of the 80/20 rule. RCM can change that unfortunate situation by driving a shift toward proactive maintenance. The opportunity is there for those who choose to seize it.

Mac Smith is Principal Consultant with AMS Associates (San Jose, CA). His 50+ years in engineering include 24 with GE’s aerospace, jet-engine and nuclear-power operations. He has personally facilitated more than 75 RCM studies and authored two books on the subject. Telephone: (408) 532-7126; email: amsassoc@sbcglobal.net.