What Do Successful Reliability Engineers/Managers Do?
Kathy | October 1, 2008
So you’re the reliability engineer? Now what?
Corporations large and small have finally realized that if everyone is responsible for reliability, no one is responsible for reliability. The move to put someone “in charge,” with his/her primary responsibility being to assure reliable physical assets and operating systems, has spawned a position in industry that may remind one of the vegetable juice commercial, with management slapping themselves on the forehead and saying, “I should have had a reliability engineer.”
We will use the title “reliability engineer” (RE), and there are, indeed, certifications that bear this title. Petroleum plants, airlines and nuclear power facilities usually have degreed and/or certified REs, but many small- to mid-size companies are realizing great benefit from “reliability managers,” “reliability supervisors,” “condition monitoring engineers,” “preventive maintenance specialists,” etc. Just having someone who feels that reliability is his/her first responsibility brings immediate dividends in many cases. This article is meant to help those who have been thrown into the fray and may need a little direction and camaraderie.
Today’s reliability engineer position can be one of the most exciting and rewarding jobs in industrial America. It can be likened to a revolution in management, with the RE as the “field general.” Unfortunately, some organizations struggle to clearly identify and communicate the actual responsibilities of such a position. If you find yourself in this situation as an RE, remember that it presents a tremendous opportunity: you can “define” the role of a reliability engineer for your company. Once you see the possibilities, you’ll probably become one of the busiest people in your operation, and you’ll love every minute of it!
Continuing with the “field general” analogy, one of the RE’s most important tasks is to assess the enemy’s strengths and his own forces’ weaknesses, then convey that information to headquarters so the resources necessary to attain victory don’t dry up. Anytime we use the word “assess,” the word “measure” won’t be far behind. How can you:
- Measure your need for reliability (assess your facilities’ weaknesses)?
- Measure your capabilities to meet the need for reliability (assess your department’s strengths and weaknesses)?
- Quantify your successes, so that the resources necessary to sustain the “revolution” are forthcoming?
Job #1 is measuring reliability (or, if the glass is half empty, “unreliability”). That’s because the level of support you receive is directly related to how imperative your work is perceived to be.
There must be a measure that is understandable at the top levels and directly related to the bottom line. Overall Equipment Effectiveness (OEE) is an excellent measurement that can be related to everyday occurrences and makes sense in the boardroom. You must make sure the measurement is done with total integrity. If people know it is a fudged number, it (and perhaps you) will become, at least to a degree, meaningless. OEE is the product of availability (actual operating time as a fraction of planned production time), performance efficiency (actual output rate relative to the ideal rate) and quality rate (good product as a fraction of total product), and it is a cornerstone of Total Productive Maintenance (TPM).
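The OEE arithmetic is simple enough to sketch. The Python below is a minimal illustration, assuming the common availability × performance × quality breakdown; the `ShiftRecord` fields and the shift numbers are hypothetical, not data from any plant.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    """Hypothetical production record for one shift (all times in minutes)."""
    planned_time: float       # planned production time
    downtime: float           # unplanned stops during the planned time
    ideal_cycle_time: float   # ideal minutes per unit at rated speed
    total_units: int          # everything produced, good or bad
    good_units: int           # units that passed quality checks


def oee(rec: ShiftRecord) -> dict:
    """Return availability, performance, quality and their product (OEE)."""
    run_time = rec.planned_time - rec.downtime
    availability = run_time / rec.planned_time
    # Ideal time to make what was made, divided by the time actually spent running
    performance = (rec.ideal_cycle_time * rec.total_units) / run_time
    quality = rec.good_units / rec.total_units
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Made-up shift: 480 min planned, 48 min of breakdowns, 1 min/unit ideal,
# 400 units produced, 380 of them good.
shift = ShiftRecord(480, 48, 1.0, 400, 380)
result = oee(shift)
for name, value in result.items():
    print(f"{name:>12}: {value:.1%}")
```

A handy integrity check: the three factors multiply out to (ideal cycle time × good units) / planned time, so here OEE is 380 / 480, or about 79%. If your reported factors don’t reduce to that, something in the data is off.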
An equally important measurement of reliability is MTBF (mean time between failures). While OEE is a great “overall” number and a starting place for understanding what should be worked on, MTBF can be targeted more easily at specific groups of poorly performing machines. Once MTBF is quantified for each group, the worst performers, and often the causes behind them, become obvious. Ricky Smith of Ivara notes that MTBF “is the number one measurement of reliability worldwide.” If you are not measuring the MTBF of processes or production trains and of individual equipment populations, you are neglecting some of the most powerful information an RE needs to have at his/her fingertips.
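Here is a minimal sketch of MTBF by equipment population, assuming you can pull a failure log and accumulated operating hours from your CMMS. The population names, asset tags and hours are hypothetical placeholders.

```python
import math
from collections import Counter

# Hypothetical failure log for one reporting period: one entry per failure,
# tagged with the equipment population the failed asset belongs to.
failure_log = [
    ("pumps", "P-101"),
    ("pumps", "P-102"),
    ("pumps", "P-101"),
    ("fans", "F-201"),
]

# Hypothetical operating hours accumulated by each whole population in the period.
operating_hours = {"pumps": 24_000.0, "fans": 16_000.0, "gearboxes": 8_000.0}


def mtbf_by_population(log, hours):
    """MTBF = total operating hours / number of failures, per population.

    A population with no recorded failures gets math.inf, meaning
    "no failures observed yet" rather than a real estimate.
    """
    failures = Counter(population for population, _asset in log)
    return {
        pop: (op_hours / failures[pop]) if failures[pop] else math.inf
        for pop, op_hours in hours.items()
    }


mtbf = mtbf_by_population(failure_log, operating_hours)
# Rank worst (shortest MTBF) first to decide where to focus.
for pop, value in sorted(mtbf.items(), key=lambda kv: kv[1]):
    shown = f"{value:,.0f} h" if math.isfinite(value) else "no failures"
    print(f"{pop:>10}: {shown}")
```

Grouping by population rather than by single asset is what makes the number useful: one pump failing twice is an anecdote, while a pump population with half the MTBF of its peers is a target.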
Many times, with these two measurements (OEE and MTBF) quantified and publicized, conscientious employees spontaneously will begin to seek out improvements.
There are many other measurements that are a “must” for the RE. Are you measuring the “leading indicators” of reliability? Behaviors that lead to improved reliability should be measured: condition monitoring (CM) route compliance, preventive maintenance (PM) route compliance, training, planning, scheduling, stores, tool time, etc. Put these measurements in place and improve them, then watch your OEE and MTBF begin to improve.
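Route compliance is one of the easiest leading indicators to compute. The sketch below assumes a simple task record with a due date and a completion date; the `RouteTask` layout, the dates and the optional grace period are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RouteTask:
    """One scheduled route task (hypothetical record layout)."""
    route_type: str             # "CM" or "PM"
    due: date
    completed: Optional[date]   # None if the task was never done


def route_compliance(tasks, route_type, grace_days=0):
    """Fraction of scheduled tasks of one type completed by their due date
    plus an optional grace period."""
    scheduled = [t for t in tasks if t.route_type == route_type]
    if not scheduled:
        raise ValueError(f"no {route_type} tasks scheduled")
    grace = timedelta(days=grace_days)
    on_time = sum(
        1 for t in scheduled
        if t.completed is not None and t.completed <= t.due + grace
    )
    return on_time / len(scheduled)


tasks = [
    RouteTask("CM", date(2008, 9, 1), date(2008, 9, 1)),
    RouteTask("CM", date(2008, 9, 8), date(2008, 9, 9)),    # one day late
    RouteTask("CM", date(2008, 9, 15), date(2008, 9, 15)),
    RouteTask("CM", date(2008, 9, 22), date(2008, 9, 20)),
    RouteTask("PM", date(2008, 9, 10), None),               # missed
    RouteTask("PM", date(2008, 9, 24), date(2008, 9, 24)),
]

print(f"CM route compliance: {route_compliance(tasks, 'CM'):.0%}")
print(f"PM route compliance: {route_compliance(tasks, 'PM'):.0%}")
```

Whether to allow a grace period, and how long, is a local policy call; whatever you choose, publish it so the number keeps its integrity.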
While we refer to “measuring” as Job #1, there are several others that also could share that title—jobs that must be done simultaneously with measuring, and to a high level of proficiency.
Every moving part in your facility is subject to friction and wear. Moving metal parts are separated by mere microns of lubricant (or should be). Join the Society of Tribologists and Lubrication Engineers (STLE) or the International Council for Machinery Lubrication (ICML). Become an expert on lubrication. More importantly, make sure you have highly trained and qualified lubrication technicians on staff. They should be well paid and know that the organization is keenly aware of their contribution.
When your lube techs are out in the field, going through the tedious and dirty task of making sure their jobs are done correctly, your company is depending on their personal integrity even more than on their technical expertise. A tiny amount of contamination in the wrong place can show up all the way at the top, in your OEE.
If you don’t lubricate well, not much else will matter. Your maintenance department will be tied up responding to failures, and no one will feel he/she can afford to invest in resources other than those needed for repair. In many facilities, lubrication performed at the necessary level of competency can break the vicious cycle of break and repair more quickly than anything else.
- Predictive technologies and technicians
Do you have a training plan for your condition monitoring technicians? Since the success of any leader is greatly affected by the competency of those he or she directs, don’t leave this to chance. Develop goals for each individual and drive excellence in your department. Become at least competent at judging which technology will give you the most cost-effective condition evaluation, and see that the individuals performing the tasks are well trained and take personal pride in their abilities.
- Vibration analysis
Of all technologies, vibration analysis still yields the most complete picture of rotating equipment health. The well-trained and well-equipped vibration analyst doesn’t have a crystal ball, but he or she will be capable of tracking the health of rotating equipment. In addition, vibration measurement can detect many correctable conditions, such as misalignment, and correcting them leads to much healthier and more reliable equipment.
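To make the misalignment point concrete: misalignment commonly shows up as elevated vibration at twice running speed (2x). The sketch below pulls the 1x and 2x amplitudes out of a synthetic signal with a one-bin discrete Fourier transform. The signal is made up, and the 0.5 ratio used as a screening threshold is an assumed rule of thumb, not a diagnostic standard. Real analyzers use windowed, averaged FFTs plus phase and axial readings, and a high 2x alone does not prove misalignment.

```python
import cmath
import math


def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Peak amplitude of a single frequency component, computed with a
    one-bin discrete Fourier transform. Assumes the record holds a whole
    number of cycles of freq_hz (i.e., freq_hz sits exactly on a DFT bin)."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
        for i, x in enumerate(signal)
    )
    return 2 * abs(acc) / n


# Synthetic test signal: a 1800 rpm machine (30 Hz running speed) with a
# strong 2x component. Real data would come from an accelerometer.
fs = 1024.0                # samples per second
running_speed_hz = 30.0
signal = [
    1.0 * math.sin(2 * math.pi * running_speed_hz * i / fs)
    + 0.8 * math.sin(2 * math.pi * 2 * running_speed_hz * i / fs + 0.3)
    for i in range(1024)
]

a1 = amplitude_at(signal, fs, running_speed_hz)
a2 = amplitude_at(signal, fs, 2 * running_speed_hz)
ratio = a2 / a1

# Assumed screening threshold: 2x above half of 1x prompts an alignment check.
MISALIGNMENT_RATIO = 0.5
print(f"1x = {a1:.3f}, 2x = {a2:.3f}, 2x/1x = {ratio:.2f}")
if ratio > MISALIGNMENT_RATIO:
    print("2x component is high: schedule an alignment check")
```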
- Infrared thermography (IR) and oil analysis
Thermal imaging detects many problems that vibration analysis cannot. Oil analysis is a must for maximizing the life expectancy of hydraulic systems and gear reducers, and it yields excellent information for both equipment condition assessment and failure analysis. There are other valuable condition monitoring technologies, but these are the most widely used.
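Oil analysis results are most useful when trended against the machine’s own history. The sketch below flags wear-metal readings that climb well above a healthy baseline. The mean-plus-three-standard-deviations limit, the three-sample baseline and the iron readings are all assumptions for illustration; a real program would also apply the lab’s recommended limits, particle counts, viscosity and other tests.

```python
import statistics


def flag_wear_samples(iron_ppm, baseline_count=3, k=3.0):
    """Return indexes of samples whose iron reading exceeds the baseline
    mean plus k standard deviations. The first `baseline_count` samples
    (assumed to come from a healthy gearbox) define the baseline."""
    if baseline_count < 2 or baseline_count > len(iron_ppm):
        raise ValueError("baseline_count must be between 2 and the number of samples")
    baseline = iron_ppm[:baseline_count]
    limit = statistics.mean(baseline) + k * statistics.stdev(baseline)
    return [
        i for i, ppm in enumerate(iron_ppm)
        if i >= baseline_count and ppm > limit
    ]


# Hypothetical monthly iron readings (ppm) from one gear reducer.
readings = [12, 14, 13, 15, 22, 31]
print(flag_wear_samples(readings))  # prints [4, 5]
```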
- RCFA (Root Cause Failure Analysis)
Are you driving root cause analysis of past failures? Some level of RCFA training is a must. Failure analysis is a kind of proactive reactiveness, if that makes any sense. You are reacting to past failures in order to prevent future ones.
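Deciding which past failures deserve an RCFA first is easier with data. One common approach, swapped in here for illustration rather than taken from this article, is a Pareto ranking of failure modes by downtime. The downtime log and the 80% cutoff are hypothetical.

```python
from collections import Counter

# Hypothetical downtime log: (failure mode, hours of downtime) per event.
events = [
    ("bearing failure", 40),
    ("seal leak", 25),
    ("belt wear", 10),
    ("bearing failure", 20),
    ("motor trip", 5),
]


def pareto_candidates(log, cutoff=0.8):
    """Failure modes, largest downtime first, that together account for at
    least `cutoff` of total downtime: a shortlist for root cause analysis."""
    totals = Counter()
    for mode, hours in log:
        totals[mode] += hours
    grand_total = sum(totals.values())
    shortlist, running = [], 0
    for mode, hours in totals.most_common():
        shortlist.append(mode)
        running += hours
        if running / grand_total >= cutoff:
            break
    return shortlist


print(pareto_candidates(events))  # prints ['bearing failure', 'seal leak']
```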
- RCM (Reliability Centered Maintenance)
Some level of Reliability Centered Maintenance training is absolutely necessary. (Can you define “failure”?) RCM is actually a type of failure analysis, performed before the failure happens. Moreover, it is completely proactive. The marriage of RCM and RCFA is unbelievably robust.
- PMO (Preventive Maintenance Optimization)
You should be facilitating some form of preventive maintenance (PM) improvement or optimization. RCM training will shape this effort and should come before you seriously implement a PMO program; the most effective PMO incorporates RCM principles. PMs are the backbone of reliability. (Lubrication is really a subset of preventive maintenance.) PM improvement should be ongoing, with established triggers to re-evaluate any PM task that proves insufficient.
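An “established trigger” can be as simple as a rule your CMMS data can check. The sketch below shows one hypothetical trigger: if the failure mode a PM task targets shows up on its asset within one interval of the task being done, the task gets flagged for review. The record layouts, task IDs and dates are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PMTask:
    task_id: str
    asset: str
    failure_mode: str    # the failure mode this task should prevent or detect
    interval_days: int
    last_done: date


@dataclass
class Failure:
    asset: str
    failure_mode: str
    occurred: date


def needs_review(task, failures):
    """Hypothetical trigger: the failure mode this PM targets occurred on its
    asset within one interval of the task's last completion, so the task
    either did not prevent it or did not detect it in time."""
    window_end = task.last_done + timedelta(days=task.interval_days)
    return any(
        f.asset == task.asset
        and f.failure_mode == task.failure_mode
        and task.last_done <= f.occurred <= window_end
        for f in failures
    )


tasks = [
    PMTask("PM-7", "P-101", "bearing wear", 30, date(2008, 9, 1)),
    PMTask("PM-8", "F-201", "belt wear", 90, date(2008, 7, 15)),
]
failures = [Failure("P-101", "bearing wear", date(2008, 9, 20))]

for t in tasks:
    if needs_review(t, failures):
        print(f"{t.task_id}: re-evaluate (failure within interval)")
```

A flagged task is a prompt to ask RCM questions (wrong task, wrong interval, or wrong failure mode), not an automatic change.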
- Precision Maintenance Practices
Although the repair technicians may not report directly to you, you must understand that the quality of their work will profoundly affect your success. Use simple charts to illustrate the hidden cost of poor repair practices such as shaft misalignment, improper bearing mounting, etc., and demonstrate how these practices affect the OEE and MTBF measurements.
A relatively small investment in the proper tools and training can easily result in equipment lifetime extensions of 100 to 500%! Many times we congratulate ourselves on a four- or five-year equipment lifetime, yet a properly aligned machine train wouldn’t wear out seals or bearings for many more years.
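The bearing basic rating life relationship (L10, as defined in ISO 281) gives one such simple chart: life falls with the cube of load for ball bearings (exponent 10/3 for roller bearings). The sketch below assumes poor alignment or mounting adds some extra load to a bearing; the bearing ratings, speed and load factors are hypothetical, and real extra loads depend on the machine and coupling. It also ignores the lubrication and contamination life modifiers, which matter just as much.

```python
def l10_life_hours(dynamic_rating_kn, equivalent_load_kn, rpm, exponent=3.0):
    """Basic rating life (L10) in operating hours.

    L10 (millions of revolutions) = (C / P) ** p, with p = 3 for ball
    bearings and p = 10/3 for roller bearings.
    """
    million_revs = (dynamic_rating_kn / equivalent_load_kn) ** exponent
    return million_revs * 1_000_000 / (60 * rpm)


def life_fraction(load_factor, exponent=3.0):
    """Fraction of rated life remaining when the bearing load is multiplied
    by load_factor (e.g. 1.2 = 20% extra load)."""
    return load_factor ** -exponent


# Hypothetical ball bearing: C = 30 kN, P = 3 kN, 1800 rpm.
base = l10_life_hours(30.0, 3.0, 1800)
for factor in (1.0, 1.1, 1.2, 1.5):
    print(f"load x{factor:.1f}: L10 = {base * life_fraction(factor):,.0f} h "
          f"({life_fraction(factor):.0%} of rated life)")
```

Read in reverse, the same cube law is why modest precision improvements can pay off so heavily: 20% extra load costs roughly 42% of rated life.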
Think, walk & talk “reliability”
One cannot hope to succeed in a campaign of defect elimination until repair technicians are properly trained and equipped. When an improving MTBF is viewed with pride and coveted by the staff of repair technicians and involved operators, as well as the “reliability group,” you are on your way to world-class performance and reliability.
Networking can be valuable for those in a reliability role. Join an association of reliability professionals! Think, walk and talk reliability. Learn and promote the TPM concept of “basic” equipment condition or “clean, tight and lubed.” You cannot expect extended equipment lifetime if these three conditions are not met—and you must sell this to everyone, from upper management to every shop floor employee.
This recap of roles and responsibilities for individuals who find themselves in the RE position is by no means comprehensive, but it is enough to keep one busy for a while. And why not? After all, your job should be the busiest. If you do it well, the repair department isn’t jumping through hoops and the production department is simply running its routine. Upset conditions will become a thing of the past.
A final thought
Here is a little advice about communicating a CM mindset: when others are convinced of the bedrock logic of condition monitoring, your job will be much easier. Those at the top level of leadership in your company are looking at quantities that begin with $$$. They can easily connect uptime (the root of the availability figure in OEE) to $$$, so they expect uptime! Everyone must understand that for this expectation to be rational, the condition of the equipment expected to deliver that uptime must be known. You wouldn’t walk into the doctor’s office and expect a diagnosis without the tests needed to determine your condition. By the same token, only when the condition of the equipment is measured and known can anyone rationally predict uptime.
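Translating availability into the $$$ that leadership watches takes only a few lines. This is a deliberately simple lost-margin model, assuming every lost hour costs the same contribution margin and that lost production cannot be made up later; the planned hours and margin figures are hypothetical.

```python
def annual_downtime_cost(planned_hours, availability, margin_per_hour):
    """Lost contribution margin from hours the line was planned to run but didn't."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    downtime_hours = planned_hours * (1.0 - availability)
    return downtime_hours * margin_per_hour


# Hypothetical line: 8,000 planned hours a year, $2,000 margin per running hour.
current = annual_downtime_cost(8000, 0.85, 2000)
improved = annual_downtime_cost(8000, 0.90, 2000)
print(f"Lost margin at 85% availability: ${current:,.0f}")
print(f"Lost margin at 90% availability: ${improved:,.0f}")
print(f"Value of five points of availability: ${current - improved:,.0f}")
```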