Failure Codes Key To Problem Elimination

Klaus M. Blache | May 1, 2021

If you properly set up failure codes and tie them to work orders, you can use the system to track repeat failures and trigger root-cause analysis.

Q: How important are failure codes?

A: A better way to ask the question is, “What are failure codes doing for you?” The answer tells you how important they are today, though not necessarily everything they could be doing for you. A failure code is a short alphanumeric label that states concisely why an asset failed. Codes cover such things as air leaks, calibration issues, operator error, insufficient lubrication, over-lubrication, misalignment, overheating, machine controls, contamination, defective parts, and abuse/vandalism. The exact format varies with your industry, maintenance system, and internal processes.

How many failure codes you use (how specific you get) depends on your plant-floor culture and overall maturity of current maintenance practices. If you adequately tie the codes to work orders, you can track repeat failures to trigger root-cause analysis. Are your failure codes:

• available to all and understandable
• trended on a Pareto chart
• put on a matrix of severity and frequency of occurrence to prioritize efforts
• specific enough to enable problem solving, but easy to use
• tied to such items as individual assets, components, and causes
• supported with enough description/clarity to enable continuous improvement
• generic or categorized, for example insufficient maintenance tasking (resulting in issues such as too little or too much lubrication or improper installation), asset problem (pump, bearing, or motor), calibration, or human error
• built with a parent-child hierarchy so you can view issues, causes, and fixes
• used as input for your root-cause analysis process, FMEAs, and TPM (looking at the six big losses with continuous-improvement teams)?
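Two items on the checklist above, the Pareto trend and the severity/frequency matrix, are easy to prototype outside the CMMS. The following is a minimal Python sketch, not a prescribed method: the code labels (such as LUBE-LOW), the function names, and the 1-to-5 ranking scale are all assumptions made for illustration.

```python
from collections import Counter


def pareto(codes):
    """Return (code, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(codes)
    total = sum(counts.values())
    rows = []
    running = 0
    # Sort by count (descending), then by code so ties come out in a stable order.
    for code, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        running += count
        rows.append((code, count, round(100 * running / total, 1)))
    return rows


def priority_score(severity, frequency):
    """Score one cell of a severity-by-frequency matrix (assumed 1-5 scales)."""
    if not (1 <= severity <= 5 and 1 <= frequency <= 5):
        raise ValueError("severity and frequency must be between 1 and 5")
    return severity * frequency


# Hypothetical failure codes taken from closed work orders.
work_order_codes = [
    "LUBE-LOW", "MISALIGN", "LUBE-LOW", "AIR-LEAK", "LUBE-LOW",
    "MISALIGN", "OVERHEAT", "LUBE-LOW", "AIR-LEAK", "LUBE-LOW",
]

for code, count, cum_pct in pareto(work_order_codes):
    print(f"{code:<10} {count:>3} {cum_pct:>6.1f}%")
```

In this made-up sample, one code accounts for half of all failures, which is the kind of signal that would point a root-cause effort at lubrication first.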

My studies show that, regardless of which computerized maintenance management system (CMMS) is in place, less than 30% of its full capability is applied on average. What I typically hear is that it’s probably not more than 20%. Your failure-code strategy needs to be well thought out and implemented to align with your level of maintenance and reliability maturity. This should be done as a team effort with all of the groups (maintenance, engineering/reliability, operations, finance, technicians, and operators) to ensure proper input and buy-in. Just as you should put together your workflow/management process before selecting your CMMS, you should set up the appropriate failure codes before full CMMS implementation.

Many of the companies I see use between 20 and 40 failure codes. Most computer systems allow you to set up failure codes to a naming convention. Be consistent. If codes are tied to assets, the total grows as you become more equipment-specific. Keep it simple and effective. More codes are fine if that works in your culture and still gets accurate input.
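One way to keep a naming convention consistent is to check proposed codes against it before they are loaded into the system. The sketch below assumes a hypothetical convention of a three-letter asset class, a three-letter failure mode, and a two-digit sequence number (for example, PMP-LUB-01); your own convention will differ, and only the pattern would need to change.

```python
import re

# Assumed convention (hypothetical): AAA-FFF-NN, e.g. "PMP-LUB-01".
CODE_PATTERN = re.compile(r"[A-Z]{3}-[A-Z]{3}-\d{2}")


def invalid_codes(codes):
    """Return the codes that do not match the convention, in input order."""
    return [code for code in codes if CODE_PATTERN.fullmatch(code) is None]


# Hypothetical list of codes proposed by the team.
proposed = ["PMP-LUB-01", "MTR-OVH-02", "pmp-lub-03", "BRG-MIS-4", "Leaky air line"]

print(invalid_codes(proposed))
# -> ['pmp-lub-03', 'BRG-MIS-4', 'Leaky air line']
```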

The International Organization for Standardization (ISO), Geneva, Switzerland (iso.org), has some documents that can help. One example is ISO 14224 (First edition, 1999-07-15), Petroleum and natural gas industries - Collection and exchange of reliability and maintenance data for equipment. It’s focused on how to collect and organize reliability and maintenance data in a common format, while assuring data quality. The document covers related terms, definitions, and abbreviated terms; data format, categories, structure; equipment failure and maintenance data; and failure and maintenance notation. Although it is industry specific (drilling, production, refining, and pipeline transport of petroleum and natural gas), it’s a good model for setting up your system to compare high-quality data between facilities.

Once you have trustworthy data, it can be used for deeper analysis in system design and to optimize maintenance, reliability, availability, safety, and life-cycle decisions. Many of the companies I interact with, including Fortune 100 companies, have similar data-trust issues. The pivotal question is, “Are you willing to make strategic decisions based on your analysis using plant-floor data?” The top revenue-producing companies typically have excellent talent to analyze, generate decision models, and optimize actionable steps. They carry out some of those steps, but just as often they choose not to act. When asked why, they respond, “We don’t trust the data enough.” Yet if they were able to get higher-quality data and perform the analysis they are already capable of, it would provide a huge competitive advantage. It depends on implementation and culture.

Data collected can include the failed component, the symptom of the problem (failure code), the cause (why it happened), the effect (what resulted because it happened), and what was repaired or changed to correct the problem. Remember, it’s not about what works for someone else. It’s about what will work best in your organization.
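The fields described above could be captured as one record per work order. The following is a minimal Python sketch under stated assumptions: the class name FailureRecord, its field names, and the sample values are hypothetical and do not come from any particular CMMS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FailureRecord:
    """One failure entry tied to a work order (field names are assumptions)."""
    work_order: str
    asset_id: str
    failed_on: date
    component: str      # the failed component
    failure_code: str   # symptom of the problem
    cause: str          # why it happened
    effect: str         # what resulted because it happened
    action_taken: str   # what was repaired or changed


def missing_fields(record):
    """Return the names of text fields that were left blank."""
    return [
        name
        for name, value in vars(record).items()
        if isinstance(value, str) and not value.strip()
    ]


# Hypothetical record closed without a cause being entered.
record = FailureRecord(
    work_order="WO-1001",
    asset_id="P-101",
    failed_on=date(2021, 5, 1),
    component="bearing",
    failure_code="LUBE-LOW",
    cause="",
    effect="pump seized",
    action_taken="replaced bearing",
)

print(missing_fields(record))
# -> ['cause']
```

A completeness check of this kind is one simple way to address the data-trust problem: a record with no cause entered cannot feed a root-cause analysis.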

A wealth of data, collected correctly, can most likely help resolve many of your plant-floor issues. Good failure coding can:

• provide a source of data for numerous types of analysis (such as maintenance effectiveness, RCM, FMEA, repeat failures, and PM-interval optimization) to improve maintenance processes and operations
• help eliminate bottleneck operations and improve mean time between failures (MTBF)
• remove unnecessary maintenance tasks when used with a PM-optimization effort
• add necessary maintenance tasks if the issue cannot be engineered out
• provide insights from prior experience that can assist in more timely troubleshooting
• tie to your work orders to generate better instructions, understanding, and fixes
• assist in selecting better PM intervals
• point to training needs
• provide data and trends that guide your reliability strategy and help you design in future reliability.
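Two of the analyses named in the list above, MTBF and repeat-failure tracking, can be sketched in a few lines of Python. This is an illustration under assumptions, not a standard: MTBF is taken here as operating hours divided by the number of failures, and the repeat-failure trigger (three occurrences of the same code on the same asset within 90 days) is a hypothetical threshold you would set for your own plant. Asset IDs and codes are invented.

```python
from collections import defaultdict
from datetime import date, timedelta


def mtbf_hours(operating_hours, failure_count):
    """Mean time between failures: operating hours per failure."""
    if failure_count == 0:
        return None  # no failures recorded, so MTBF is undefined
    return operating_hours / failure_count


def repeat_failures(events, min_count=3, window_days=90):
    """Flag (asset_id, failure_code) pairs with min_count failures in any window.

    events is an iterable of (asset_id, failure_code, date) tuples.
    """
    by_key = defaultdict(list)
    for asset_id, code, day in events:
        by_key[(asset_id, code)].append(day)

    window = timedelta(days=window_days)
    flagged = []
    for key, days in by_key.items():
        days.sort()
        # Check every run of min_count consecutive failures for this pair.
        for i in range(len(days) - min_count + 1):
            if days[i + min_count - 1] - days[i] <= window:
                flagged.append(key)
                break
    return sorted(flagged)


# Hypothetical failure history from closed work orders.
events = [
    ("P-101", "LUBE-LOW", date(2021, 1, 5)),
    ("P-101", "LUBE-LOW", date(2021, 2, 10)),
    ("P-101", "LUBE-LOW", date(2021, 3, 20)),
    ("P-101", "MISALIGN", date(2021, 2, 1)),
    ("M-200", "OVERHEAT", date(2021, 1, 1)),
    ("M-200", "OVERHEAT", date(2021, 6, 1)),
    ("M-200", "OVERHEAT", date(2021, 12, 1)),
]

print(mtbf_hours(1200, 4))      # -> 300.0
print(repeat_failures(events))  # -> [('P-101', 'LUBE-LOW')]
```

Pairs flagged this way are candidates for root-cause analysis; the motor in the sample fails three times as well, but the failures are spread too far apart to meet the assumed 90-day trigger.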

When used correctly, data provides an incredible amount of knowledge to analyze and capture. Overall, it’s an opportunity to improve safety, reliability, operations, and cost/ROI.

Based in Knoxville, Dr. Klaus M. Blache is director of the Reliability & Maintainability Center at the University of Tennessee, and a research professor in the College of Engineering. Contact him at kblache@utk.edu.
