FMEA vs. FRACAS: What’s Best?

EP Editorial Staff | October 10, 2018

By Drew D. Troyer, CRE, Contributing Editor

As a reliability engineer, I’ve frequently debated the merits of FMEA and FRACAS in managing plant- and equipment-reliability risks. While many people consider them to be competing tools, that couldn’t be further from the truth. In fact, as I’ll try to explain here, they’re opposite sides of the same coin. (This column will either clear the air or get me into more arguments.)

FMEA (Failure Modes and Effects Analysis) is an analytical process that uses inductive reasoning to hypothesize what might go wrong with a plant, system, sub-system, component, or part. It’s an ideal tool in the design phase of the asset lifecycle, where operational data is scant or unavailable. Engineers prioritize the failure modes by scoring three separate factors on individual scales of 1 to 10: likelihood of occurrence, severity of failure, and effectiveness of control systems, such as performance-monitoring systems. The three individual scores are multiplied to produce a risk-priority number (RPN) ranging from 1 to 1,000. A failure mode with an RPN of 1 gets the least attention, while one with an RPN of 1,000 gets the most: design engineers modify designs, operating parameters, and/or operating procedures to reduce its risk to a manageable level. In reliability-critical applications, e.g., aviation and nuclear power, FMEA often leads to experimental testing to evaluate competing designs and/or reliability-growth progress as a design matures. This process is often referred to as the “Stage-Gate” design method.
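The RPN arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the class name, the field names, and the sample scores are all hypothetical, and it assumes the control score runs from 1 (controls very effective) to 10 (controls ineffective) so that a higher number always means more risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One hypothesized failure mode with its three 1-to-10 scores."""
    name: str
    occurrence: int  # likelihood of occurrence
    severity: int    # severity of failure
    control: int     # assumed: 1 = controls very effective, 10 = ineffective

    def __post_init__(self):
        # Reject scores outside the 1-to-10 scale.
        for factor in ("occurrence", "severity", "control"):
            score = getattr(self, factor)
            if not 1 <= score <= 10:
                raise ValueError(f"{factor} must be 1..10, got {score}")

    @property
    def rpn(self) -> int:
        """Risk-priority number: product of the three scores (1 to 1,000)."""
        return self.occurrence * self.severity * self.control


def rank_by_rpn(modes):
    """Return failure modes sorted from highest RPN to lowest."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores, for illustration only.
modes = [
    FailureMode("seal leak", occurrence=7, severity=3, control=2),
    FailureMode("bearing seizure", occurrence=4, severity=8, control=6),
    FailureMode("impeller erosion", occurrence=3, severity=6, control=7),
]
for m in rank_by_rpn(modes):
    print(f"{m.name}: RPN {m.rpn}")
```

With these made-up scores, the bearing seizure (4 x 8 x 6 = 192) outranks the impeller erosion (126) and the seal leak (42), even though the seal leak is the most likely of the three to occur.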

FRACAS (Failure Reporting, Analysis, and Corrective Action System) is best suited for the operations and maintenance phase of the asset lifecycle. It starts with failure reporting (FR). Failure reporting can occur at the plant, system, sub-system, component, and/or part level, depending upon the analytical interest. In all instances, failure modes must be reported using a standardized taxonomy to ensure they’re grouped in the right analysis bin. For severity analysis, rather than use the somewhat arbitrary RPN, we simply multiply the number of events by the cost and/or HS&E (health/safety/environmental) impacts associated with them. Then we move to cause analysis (A). For lower-impact or simple events, we use apparent-cause analysis (ACA). For more serious events, we turn to root-cause analysis (RCA).
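The FRACAS severity ranking is just as easy to sketch. The example below is an assumption-laden illustration: the taxonomy codes, the per-event costs, and the function name are hypothetical, and it considers cost only, leaving HS&E impacts aside. It counts reported events per failure mode, rejects any report that falls outside the agreed taxonomy, and multiplies each count by the cost per event.

```python
from collections import Counter

# Hypothetical taxonomy codes with an assumed average cost per event (USD).
COST_PER_EVENT = {
    "SEAL-LEAK": 1500,
    "BRG-SEIZE": 12000,
    "MTR-OVERHEAT": 4000,
}


def rank_by_impact(reported_modes, cost_per_event):
    """Rank failure modes by (number of events) x (cost per event)."""
    counts = Counter()
    for mode in reported_modes:
        # Enforce the standardized taxonomy so events land in the right bin.
        if mode not in cost_per_event:
            raise ValueError(f"unknown failure-mode code: {mode!r}")
        counts[mode] += 1
    impact = {mode: n * cost_per_event[mode] for mode, n in counts.items()}
    return sorted(impact.items(), key=lambda item: item[1], reverse=True)


# Hypothetical failure reports, for illustration only.
reports = ["SEAL-LEAK"] * 5 + ["BRG-SEIZE"] * 2 + ["MTR-OVERHEAT"]
for mode, cost in rank_by_impact(reports, COST_PER_EVENT):
    print(f"{mode}: ${cost:,}")
```

In this made-up data set, two bearing seizures ($24,000) outweigh five seal leaks ($7,500), which is the kind of result that might steer the bearing events toward RCA and the seal leaks toward ACA.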

Remember, when performing cause analysis, you must define the mechanisms that connect the cause to the mode (see sidebar). Corrective actions (CA) pertain to, among other things, changes in material, operating procedures, maintenance procedures, and warning systems. Those trained in reliability-centered maintenance (RCM) have been instructed to create FMEAs. But for the most part, if you’re managing equipment in the operating or maintenance stages of its lifecycle, you are employing FRACAS. The question is, “Why guess with FMEA when empirical data is available?” Rather than anguish over wording, let’s put these combined tools to work. In an upcoming e-newsletter article, I’ll provide more details, along with links to two indispensable additions to your library.

Reliability 101: Failure Modes, Mechanisms, and Causes

• Mode = What went wrong
• Mechanism = How it went wrong
• Cause = Why it went wrong

Note: To get from failure mode to failure cause, you must define the mechanism.
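One way to enforce that note in a failure-reporting record is to refuse a cause until a mechanism has been written down. The sketch below is only an illustration: the class, its fields, and the bearing example are hypothetical and not drawn from any particular FRACAS tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureFinding:
    """Mode = what went wrong, mechanism = how, cause = why."""
    mode: str
    mechanism: Optional[str] = None
    cause: Optional[str] = None

    def __post_init__(self):
        # A cause cannot be linked to a mode without a defined mechanism.
        if self.cause and not self.mechanism:
            raise ValueError("define the mechanism before assigning a cause")

    def chain(self) -> str:
        """Describe the mode-to-cause chain as far as it has been analyzed."""
        parts = [f"Mode: {self.mode}"]
        if self.mechanism:
            parts.append(f"Mechanism: {self.mechanism}")
        if self.cause:
            parts.append(f"Cause: {self.cause}")
        return " -> ".join(parts)


# Hypothetical example, for illustration only.
finding = FailureFinding(
    mode="bearing seized",
    mechanism="lubricant film broke down",
    cause="relubrication task was skipped",
)
print(finding.chain())
```

Creating a finding with a cause but no mechanism raises an error, which prompts the analyst to explain how the failure happened before recording why.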

Based in Tulsa, OK, industry veteran Drew Troyer is principal with Sigma Reliability Solutions. Email Drew.Troyer@sigma-reliability.com.