The Fundamentals: How To Begin Measuring Maintenance Effectiveness Part I

EP Editorial Staff | May 1, 2009


Going somewhere with your maintenance efforts? The journey can be much easier for you and more value-added for your operations with the right KPIs to help guide you.

I always thought the admonition that “you can’t get there from here” was a meaningless phrase—something old-timers used to say when they were trying to be funny. But as I began to write this article about developing a simple set of metrics to help you track your maintenance effectiveness, this venerable chestnut sprang to mind. That’s because, for many maintenance organizations, it’s quite a prophetic statement.

You truly can’t get there from here if you don’t know where you are now … where you’re going … how far it is from here to there … the route you want to take … how fast you are going … how fast you are capable of traveling … who is driving … how skilled the driver is … and how others who made the trip from here to there have fared.

Many metrics deal in some way with process or machine failure. Therefore, before we can have a meaningful discussion about developing metrics, we need to take a moment to create a workable definition for the concept of failure. A good way to define failure is by looking at its opposite. A machine that does what we want it to do when we want it to do it is not in failure mode. Any other condition can be considered failure to one degree or another. If you think this definition is too simple, look at some of the following scenarios.


Say we have a machine that is supposed to start at 8:00 a.m. and run for 10 hours while producing 100 widgets per hour with no more than one defect per hour for any reason. If the machine won’t start until 8:30, it is in failure mode. If it produces 16 rejects during the course of the day, it is in failure mode. If it has to be shut down twice for adjustments, it is in failure mode. If it must be shut down an hour early because it has a gearbox running hot, it is in failure mode. If it is capable of producing 100 parts per hour, but only 90 parts per hour were produced, it is in failure mode.
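To make this expectation-based definition concrete, here is a minimal Python sketch that checks one day's record against the targets in the scenario above. The names `Expectation`, `DayRecord` and `failure_modes` are hypothetical. Because the sketch works from daily totals, the reject check can only confirm that more than 10 rejects in a 10-hour day broke the one-per-hour limit somewhere. It cannot see an individual bad hour.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Expectation:
    """What we want the machine to do (values taken from the widget scenario)."""
    start: time = time(8, 0)
    run_hours: float = 10.0
    parts_per_hour: float = 100.0
    max_defects_per_hour: float = 1.0


@dataclass
class DayRecord:
    """What the machine actually did (hypothetical record layout)."""
    actual_start: time
    hours_run: float
    parts_produced: int
    rejects: int
    adjustment_stops: int


def failure_modes(exp: Expectation, day: DayRecord) -> list[str]:
    """Return every way the day fell short of expectations; empty means no failure."""
    reasons = []
    if day.actual_start > exp.start:
        reasons.append("late start")
    if day.hours_run < exp.run_hours:
        reasons.append("stopped early")
    # A daily total above the allowance proves at least one hour exceeded the limit.
    if day.rejects > exp.max_defects_per_hour * exp.run_hours:
        reasons.append("excess rejects")
    if day.adjustment_stops > 0:
        reasons.append("unplanned stops for adjustment")
    if day.hours_run > 0 and day.parts_produced / day.hours_run < exp.parts_per_hour:
        reasons.append("below rated speed")
    return reasons


# A day with several of the problems described above.
bad_day = DayRecord(actual_start=time(8, 30), hours_run=9.0,
                    parts_produced=810, rejects=16, adjustment_stops=2)
print(failure_modes(Expectation(), bad_day))
```

The expectations live in one place on purpose: if you quietly relax them to match current output, the list comes back empty and tells you nothing.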

There are two things you should have picked up from the foregoing example. First, the concept of failure is directly related to your expectations. Second, if you lower your expectations to the level of your current reality, you are fooling yourself.

If you are not satisfied with your current maintenance reality, you can change the scenario by beginning to manage by metrics, or Key Performance Indicators (KPIs). A KPI is simply a standard measurement of some aspect of your process that is used for comparison purposes—either with similar measurements made over time in your own plant, or against industry standards or other benchmarks.

Gathering and analyzing the right data
To develop KPIs, you must first gather timely, meaningful, honest, consistent and relevant data on your maintenance performance, then statistically analyze your data to mark your progress. Because these factors are so important to the development of meaningful metrics, we will briefly examine each of them.

Timely Data. Regardless of whether it is generated by electronic means or gathered by hand, the data that you collect for analysis must be timely. The reason for this is that the closer the information is to what is happening in real time, the closer you are to looking at your actual maintenance reality. There is limited value in knowing that your process was running at a 79% uptime level 30 days ago, particularly if that is the most recent data you have.
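One way to act on this is to store the collection date with every reading and flag anything older than you are willing to manage by. The sketch below assumes uptime is run hours divided by scheduled hours (the formal definition is left to a later installment) and uses a hypothetical one-day freshness limit. Both are placeholders to adjust for your own plant.

```python
from datetime import date


def uptime_percent(run_hours: float, scheduled_hours: float) -> float:
    """Assumed definition: share of scheduled time the process actually ran."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return 100.0 * run_hours / scheduled_hours


def is_timely(sample_date: date, today: date, max_age_days: int = 1) -> bool:
    """True if the reading is recent enough to describe current conditions."""
    return (today - sample_date).days <= max_age_days


# The 79% reading from 30 days ago may be accurate, but it is stale.
print(round(uptime_percent(7.9, 10.0), 1))              # 79.0
print(is_timely(date(2009, 4, 1), date(2009, 5, 1)))    # False
```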

For any metric to have meaning, it must be compared with something. Accordingly, there is more value in being able to compare a 79% uptime 30 days ago with an 84% uptime yesterday—although even this comparison is still extremely limited in its use as a management tool. Why?

Simply put, this comparison shows that on one day the process ran at 84% uptime, and on a different day it ran at 79%. And that’s all it shows. You don’t know why it ran the way it did on either day. Was it chance? Was it machine failure? Was it a maintenance issue? Did an operator push the wrong button? You don’t know what either of these numbers means in comparison to your own process or to others similar to it. You can’t even say for sure that your uptimes are trending upward. You need more data.
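The missing piece is a series rather than a pair. As a minimal sketch, the function below fits an ordinary least-squares slope to daily uptime readings and declines to answer until it has a minimum number of points. The threshold of 10 is an arbitrary assumption, and a positive slope only describes the data. It is not proof that the improvement is real rather than chance.

```python
from typing import Optional, Sequence


def uptime_trend(uptimes: Sequence[float], min_points: int = 10) -> Optional[float]:
    """Least-squares slope of uptime (percentage points per day),
    or None if there are too few readings to say anything."""
    n = len(uptimes)
    if n < max(min_points, 2):
        return None
    mean_x = (n - 1) / 2              # days are numbered 0..n-1
    mean_y = sum(uptimes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(uptimes))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


print(uptime_trend([79.0, 84.0]))  # None: two points are not a trend
```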

Meaningful Data. To begin managing by metrics, you must be sure that the data you collect is meaningful to your process. By definition, meaningful data pertains to, and is part of, your process.

Suppose, for example, you have a machine center that is part of a single-line process in your plant, and that a bearing locks up for lack of lubrication. The data from this episode has meaning to you. It tells you something about your process that you need to know.

Now suppose a meteor falls out of the sky onto that same machine center during a production run. Even if you experience a substantial amount of production loss in the meteor strike’s aftermath, any data you may generate from this occurrence is not meaningful to your maintenance effort. The episode was not preventable, was not a part of or a result of your process and cannot reasonably be engineered out of your process. This is an extreme example of the randomness of chance, of course, but if you look closely at the data you are gathering, you may find a few meteor strikes in your own plant.
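A simple way to keep meteor strikes out of your metrics without losing the record of them is to tag each downtime event during root-cause review and compute KPIs only from the process-related ones. The `DowntimeEvent` class and its `process_related` flag are hypothetical. Setting that flag is a human judgment, which is why honest reporting matters.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    cause: str
    hours_lost: float
    process_related: bool  # hypothetical flag assigned during root-cause review


def split_events(events):
    """Separate events that say something about the process from those that don't."""
    meaningful = [e for e in events if e.process_related]
    excluded = [e for e in events if not e.process_related]
    return meaningful, excluded


events = [
    DowntimeEvent("bearing seized, no lubrication", 3.0, True),
    DowntimeEvent("meteor strike", 40.0, False),
]
meaningful, excluded = split_events(events)
print(sum(e.hours_lost for e in meaningful))  # 3.0
```

Keeping the excluded list, instead of deleting it, lets someone audit later whether a "meteor" was really outside the process.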

Honest Data. It would seem to go without saying that honesty in data collection is a must, and from my observations in the field, most maintenance organizations make at least some effort to be objective in the data-gathering process. That is laudable, because if the data is tainted in any way, the resulting metrics will have no value. Consequently, any maintenance decisions based on that information will have little or no effect on your process if you are lucky, and may actually impede your maintenance process if you are not.


The biggest deterrent to accurate data is human involvement in the data-gathering process. Even now in the new millennium, most data is derived from operator or maintenance input. If the organization has an environment in which trust has flourished, employee error, poor judgment or just bad luck is more likely to be reported for what it was. In a punitive environment, though, even honest employees may subconsciously attempt to report the data in such a way as to shift management’s gaze away from them. As a result, any decisions made about the cause of a failure could be wrong.

Consistent Data. The importance of consistency in data collection cannot be overstated. The methodology must not change from one collection to the next, since any variable introduced into the collection process will skew the measurement and corrupt the results.

A large amount of data is now collected and compiled by the PLCs and computers that control manufacturing processes. Provided the sensors and probes are properly calibrated, this automatic collection can eliminate many of the variables and other sources of error that are inherent in human data collection. If most of your data collection is still done by employees, the human sources of variation you must consider include training, personnel changes, process contamination, fatigue, recording error and work ethic.
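Whether the data comes from a PLC or a clipboard, a cheap safeguard is to record how each reading was taken and flag any record where the method or units drift. In this minimal sketch the `method` and `units` keys, and the example records, are hypothetical. Every record is compared with the first one in the batch.

```python
def inconsistent_records(records: list[dict], keys=("method", "units")) -> list[int]:
    """Return indexes of records whose collection method or units differ
    from the first record in the batch."""
    if not records:
        return []
    baseline = tuple(records[0].get(k) for k in keys)
    return [i for i, r in enumerate(records)
            if tuple(r.get(k) for k in keys) != baseline]


batch = [
    {"machine": "saw 1", "downtime": 1.5, "method": "PLC log", "units": "hours"},
    {"machine": "saw 1", "downtime": 45, "method": "operator sheet", "units": "minutes"},
    {"machine": "saw 1", "downtime": 0.5, "method": "PLC log", "units": "hours"},
]
print(inconsistent_records(batch))  # [1]
```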

Relevant Data. At first glance, relevant data may seem to be the same as meaningful data, but they are different: meaningful data can still lack relevance. The old riddle applies here: How do you eat an elephant? One bite at a time. Similarly, when you first begin to manage by metrics, the scope of your efforts may be limited. You can only do what you can do. If you have limited resources for data collection, you can’t afford to waste them by collecting data on issues that are outside of your control.
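Taking one bite at a time can be made explicit: list candidate measurements in priority order, drop those that track things outside maintenance's control, and stop adding once the collection budget is used up. This is a minimal greedy sketch; the `CandidateMetric` fields, the weekly-hours budget and the example metrics are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateMetric:
    name: str
    controllable: bool        # can maintenance act on what this measures?
    collection_hours: float   # estimated weekly effort to collect it


def plan_collection(candidates, weekly_hours_available):
    """Pick controllable metrics, in the given priority order, that fit the budget."""
    plan, used = [], 0.0
    for c in candidates:
        if not c.controllable:
            continue  # outside your control: not worth scarce collection effort
        if used + c.collection_hours <= weekly_hours_available:
            plan.append(c.name)
            used += c.collection_hours
    return plan


candidates = [
    CandidateMetric("lubrication-related bearing failures", True, 2.0),
    CandidateMetric("utility power outages", False, 1.0),
    CandidateMetric("PM completion rate", True, 3.0),
    CandidateMetric("gearbox temperatures", True, 4.0),
]
print(plan_collection(candidates, 6.0))
```

With six hours available, the first and third items fit; gearbox temperatures wait for the next bite.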

Coming up
Now that we have reviewed some of the issues surrounding data collection and analysis, we can move forward to the next step of the process, which is to look at some of the more common metrics, what they mean, and how to calculate them. In future installments of this quarterly Fundamentals section (starting with the August issue) we’ll explore a number of KPIs, including: Production Uptime Percentage, Mean Time Between Failure, Mean Time To Repair, Overall Equipment Effectiveness, Planned and Scheduled Work Completion Rate, Maintenance Overtime, Storeroom Service Level, Preventive Maintenance Hours as a Percentage of Total Work, Predictive Maintenance Hours as a Percentage of Total Work and Emergency Repairs as a Percentage of Total Work. Moreover, we’ll discuss their significance to your type of operations. Stay tuned. MT

Ray Atkins, CPMM, CMRP, is a veteran maintenance professional with 14 years of experience in the lumber industry. He is based in Rome, GA, where he spent the last five years as maintenance superintendent at Temple-Inland’s Rome Lumber facility. He can be reached via e-mail at or through his Website,


