Failure Finding: Why Bother?

EP Editorial Staff | November 2, 1998

Much of what has been written to date on the subject of maintenance strategy refers to three—and only three—types of maintenance: predictive, preventive, and corrective.

Predictive tasks entail checking items or components to see whether something is failing.

Preventive maintenance means overhauling items or replacing components at fixed intervals.

Corrective maintenance means fixing things either when they are found to be failing or when they have failed.

However, there is a whole family of maintenance tasks which falls into none of these categories.

For example, when we periodically activate an alarm, we are not checking if it is failing. We are not overhauling or replacing it, nor are we repairing it. We are simply checking if it still works.

Tasks designed to check whether something still works are known as failure-finding tasks or functional checks. (In order to rhyme with the other three families of tasks, the author and his colleagues also call them detective tasks because they are used to detect whether something has failed.)

Failure finding applies only to hidden or unrevealed failures. This is because, by definition, the failure of an evident function inevitably becomes apparent to the operators, so there is no need to carry out regular checks to find out whether such a failure has occurred. So failure-finding tasks should be considered only if a functional failure will not become evident to the operating crew under normal circumstances and the failure is one that cannot be addressed by a suitable proactive maintenance task.

Hidden failures in turn only affect protective devices. The objective of failure finding is to satisfy us that a protective device will provide the required protection if it is called upon to do so. In other words, we are not checking whether the device looks OK—we are checking whether it still works as it should. (This is why failure-finding tasks are also known as functional checks.)

A failure-finding task must be capable of detecting all the failure modes which are reasonably likely to cause the protective device to fail. This is especially true of complex devices such as electrical circuits. In these cases, the function of the entire system should be checked from sensor to actuator. Ideally, this should be done by simulating the conditions the circuit should respond to, and checking whether the actuator gives the right response.

For example, a pressure switch may be designed to shut down a machine if the lubricating oil pressure drops below a certain level. Whenever possible, switches of this type should be checked by dropping the oil pressure to the required level and checking whether the machine shuts down.

Similarly, a fire detection circuit should be checked from smoke detector to fire alarm by blowing smoke at the detector and checking if the alarm sounds.
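The difference between checking one component and checking the whole protective function can be sketched in a few lines of Python. This is only an illustrative model of the lubricating-oil example above: the class name, the three health flags, the 150 kPa trip setting, and the 10 kPa test margin are all hypothetical and do not come from any real control system.

```python
from dataclasses import dataclass


@dataclass
class LowOilPressureTrip:
    """Hypothetical model of a low-oil-pressure shutdown circuit."""

    trip_pressure_kpa: float   # assumed trip setting, for illustration only
    switch_ok: bool = True     # pressure switch senses low pressure
    wiring_ok: bool = True     # signal reaches the shutdown relay
    relay_ok: bool = True      # relay actually stops the machine

    def machine_stops(self, oil_pressure_kpa: float) -> bool:
        """Return True if the machine shuts down at the given oil pressure."""
        demand = oil_pressure_kpa < self.trip_pressure_kpa
        # Every element from sensor to actuator must work for the trip to act.
        return demand and self.switch_ok and self.wiring_ok and self.relay_ok


def switch_only_check(trip: LowOilPressureTrip) -> bool:
    """Partial check: looks at the pressure switch and nothing else."""
    return trip.switch_ok


def functional_check(trip: LowOilPressureTrip, margin_kpa: float = 10.0) -> bool:
    """End-to-end check: simulate low oil pressure and see if the machine stops."""
    simulated_pressure = trip.trip_pressure_kpa - margin_kpa
    return trip.machine_stops(simulated_pressure)


healthy = LowOilPressureTrip(trip_pressure_kpa=150.0)
failed_relay = LowOilPressureTrip(trip_pressure_kpa=150.0, relay_ok=False)

for name, trip in [("healthy", healthy), ("failed relay", failed_relay)]:
    print(name, "| switch-only:", switch_only_check(trip),
          "| functional:", functional_check(trip))
```

In this sketch the circuit with the failed relay passes the switch-only check but fails the functional check, which is the reason for testing from sensor to actuator.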

If reliability-centered maintenance is correctly applied to almost any modern, complex industrial system, it is not unusual to find that up to 40 percent of failure modes fall into the hidden category.

Furthermore, up to 80 percent of these hidden failure modes require failure finding, so up to one-third of tasks generated by comprehensive, correctly applied maintenance strategy development programs are failure-finding tasks. (Note that these tasks must be done at frequencies that reduce the risk of a multiple failure to a tolerable level.)
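The "one-third" figure is the product of the two upper-bound percentages quoted above, as the short Python sketch below shows. It assumes roughly one task per failure mode, which the article implies but does not state. The second function goes beyond the article: it uses a common reliability approximation (average unavailability is about the check interval divided by twice the mean time between failures, valid when the interval is short relative to that mean time) to show how a check interval might be derived from a tolerable unavailability. The function names and the sample figures of 10 years and 1 percent are hypothetical.

```python
HIDDEN_SHARE = 0.40         # up to 40% of failure modes are hidden (from the article)
FF_SHARE_OF_HIDDEN = 0.80   # up to 80% of those need failure finding (from the article)


def failure_finding_share(hidden_share: float, ff_share_of_hidden: float) -> float:
    """Fraction of all failure modes that call for a failure-finding task."""
    return hidden_share * ff_share_of_hidden


def failure_finding_interval(mtbf_years: float, tolerable_unavailability: float) -> float:
    """Check interval in years from the assumed approximation U = T / (2 * MTBF).

    This formula is an assumption added for illustration; the article only says
    the frequency must reduce the risk of a multiple failure to a tolerable level.
    """
    if mtbf_years <= 0:
        raise ValueError("mtbf_years must be positive")
    if not 0 < tolerable_unavailability < 1:
        raise ValueError("tolerable_unavailability must be between 0 and 1")
    return 2 * tolerable_unavailability * mtbf_years


if __name__ == "__main__":
    share = failure_finding_share(HIDDEN_SHARE, FF_SHARE_OF_HIDDEN)
    print(f"Failure-finding share of tasks: {share:.0%}")

    # Hypothetical device: fails on average once in 10 years,
    # and 1% unavailability is judged tolerable.
    interval_years = failure_finding_interval(mtbf_years=10, tolerable_unavailability=0.01)
    print(f"Check interval: {interval_years * 12:.1f} months")
```

With the article's figures the share comes to 32 percent, in line with "up to one-third." The interval calculation is only as good as its inputs, and the tolerable unavailability in particular has to be set from the consequences of the multiple failure the device protects against.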

A more troubling finding is that many existing maintenance programs, as written, provide for fewer than one-third of protective devices to receive any attention at all (and then usually at inappropriate intervals).

The people who operate and maintain the plant covered by these programs are aware that another third of these devices exist but pay them no attention, while it is not unusual to find that no one even knows that the final third exist.

This lack of awareness and attention means that most of the protective devices in industry—our last line of protection when things go wrong—are maintained poorly or not at all.

This situation is completely untenable.

If industry is serious about safety and environmental integrity, then the whole question of failure finding needs to be given top priority as a matter of urgency. As more and more maintenance professionals become aware of the importance of this neglected area of maintenance, it is likely to become a bigger maintenance strategy issue in the next decade than predictive maintenance has been in the past 10 years.

