Take A “CSI” Approach to Asset Management
Ken Bannister | November 16, 2015
The types of analytic, forensic, and diagnostic methods leveraged by various crime-lab sleuths in a long-running television franchise also have value for maintenance organizations in real-world environments.
Many maintenance professionals may be fans of the popular CBS television crime series “CSI: Crime Scene Investigation.” Modeled on the classic whodunit format popularized by Sir Arthur Conan Doyle’s Sherlock Holmes character, “CSI” has always used a mix of cutting-edge technology and common sense to satisfy our need for simple answers to what are most often complex problems, which sounds much like troubleshooting in plants. The latest and, reportedly, last iteration of this franchise, “CSI: Cyber,” may be of particular interest to those involved in a site’s overall asset-management activities, given the show’s focus on computer networks and digital information. If you haven’t already, consider taking a page from this world of fiction in your efforts. It could pay great dividends.
Then and now
Maintenance organizations have long prided themselves on their ability to swoop in at a moment’s notice and save the day whenever a breakdown occurred. The 1970s and ’80s witnessed a wholesale change in industrial maintenance, from a totally reactive model to the beginnings of a more preventive and proactive one. That change was largely due to the introduction of computerized maintenance management software (CMMS), an improved understanding of how to prevent machine failure, and a lean approach to work that focused on waste elimination. Unfortunately, many of the maintenance-improvement approaches implemented to date have not evolved to meet the changing needs of today’s industrial environment.
To be sure, most companies have updated and/or changed their CMMS programs over time and added preventive-maintenance (PM) work orders to the system for new equipment purchases. Still, few have actually performed analytics on their CMMS set-up and operation data to determine whether that data is valid and whether it can support data-driven decision making.
The French novelist Marcel Proust is often quoted as saying, “The real voyage of discovery consists not in seeking new landscapes but in having new eyes.” Analytics are the eyes that turn raw records into meaningful data that can be communicated and used to make informed asset-management decisions.
A quick, two-part litmus-test procedure for a management system is to perform several basic queries (reports) on the data to see how well it can be mined.
The first test looks for a report on all of the data for the current year and the previous two years to determine what percentage of the maintenance spend was allocated to labor and what percentage to maintenance parts. This information can be further distilled to understand what percentages of the maintenance spend went to internal maintenance staff labor and parts versus outside contract staff labor and parts use.
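As a rough illustration of the first test, the sketch below computes each year’s percentage split of maintenance spend by labor versus parts and by internal versus contract source. The record layout and field names (`year`, `source`, `kind`, `cost`) and the dollar figures are assumptions made up for this example; a real CMMS export will look different.

```python
from collections import defaultdict

# Hypothetical cost lines exported from closed work orders (illustrative only).
cost_lines = [
    {"year": 2014, "source": "internal", "kind": "labor", "cost": 600.0},
    {"year": 2014, "source": "internal", "kind": "parts", "cost": 400.0},
    {"year": 2015, "source": "internal", "kind": "labor", "cost": 500.0},
    {"year": 2015, "source": "internal", "kind": "parts", "cost": 200.0},
    {"year": 2015, "source": "contract", "kind": "labor", "cost": 200.0},
    {"year": 2015, "source": "contract", "kind": "parts", "cost": 100.0},
]


def spend_breakdown(lines):
    """Return {year: {(source, kind): percent of that year's total spend}}."""
    totals = defaultdict(float)
    buckets = defaultdict(lambda: defaultdict(float))
    for line in lines:
        totals[line["year"]] += line["cost"]
        buckets[line["year"]][(line["source"], line["kind"])] += line["cost"]

    report = {}
    for year, split in buckets.items():
        if totals[year] <= 0:
            continue  # skip years with no recorded spend
        report[year] = {
            key: round(100.0 * amount / totals[year], 1)
            for key, amount in split.items()
        }
    return report


def labor_share(report_for_year):
    """Sum the labor percentages (internal plus contract) for one year."""
    return sum(pct for (_source, kind), pct in report_for_year.items() if kind == "labor")


report = spend_breakdown(cost_lines)
for year in sorted(report):
    print(year, report[year], "labor %:", labor_share(report[year]))
```

If a query like this cannot be produced from the system because costs are not tagged by type or by source, that in itself is a finding of the test.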
The second test is to determine, over the same periods of time, the mean time between failures (MTBF) for all assets combined. These figures can then be trended on a simple graph to show the relationship between gross maintenance spend and overall asset reliability. These simple reports allow management to perform a validity test of maintenance spend versus reliability. The test can be sharpened by running similar reports on smaller groups of assets, such as manufacturing (product) line, location, equipment type/group, and supervisor, to discover reduced levels of performance and determine key opportunities for improvement. This information can be scrutinized even further to determine if the PM task or schedule is valid. The ability to generate these simple performance reports allows the user to relate the outcome to maintenance processes and convert the data into actionable information.
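The second test can be sketched in a similar way. The example below uses the common definition of MTBF as total operating hours divided by the number of failures in the period, then pairs each year’s MTBF with that year’s spend so the two can be trended together. The operating hours, failure events, asset IDs, and spend figures are all invented for illustration.

```python
def mtbf_by_year(operating_hours, failure_events):
    """MTBF per year = total operating hours / number of failures.

    operating_hours: {year: combined run hours for all assets}
    failure_events:  iterable of (year, asset_id) tuples, one per failure
    Returns None for a year with no recorded failures (MTBF is undefined).
    """
    counts = {}
    for year, _asset_id in failure_events:
        counts[year] = counts.get(year, 0) + 1
    return {
        year: (hours / counts[year] if counts.get(year) else None)
        for year, hours in operating_hours.items()
    }


def trend_rows(spend, mtbf):
    """Pair spend with MTBF, year by year, ready for a simple graph."""
    return [(year, spend[year], mtbf.get(year)) for year in sorted(spend)]


# Hypothetical figures: hours and spend are flat or rising while failures rise.
operating_hours = {2013: 8000.0, 2014: 8000.0, 2015: 8000.0}
failure_events = (
    [(2013, "P-101")] * 4 + [(2014, "P-101")] * 5 + [(2015, "C-204")] * 8
)
spend = {2013: 90000.0, 2014: 95000.0, 2015: 110000.0}

mtbf = mtbf_by_year(operating_hours, failure_events)
rows = trend_rows(spend, mtbf)
for year, dollars, hours_between in rows:
    print(year, dollars, hours_between)
```

In this made-up data, spend rises each year while MTBF falls, which is the kind of mismatch the litmus test is meant to surface. Filtering `failure_events` by line, location, or equipment group before calling `mtbf_by_year` gives the narrower views described above.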
Alas, many CMMS and enterprise asset management (EAM) systems in use today would have trouble passing this litmus test. Although they are likely full of data, such data are typically not relevant for reporting purposes or are difficult to access due to ineffective system code-management set-up. Other barriers to analytics can include work that’s performed but not captured on a work order or entered into the CMMS, and closed work orders that don’t contain vital information such as actual hours, parts used, tools used, and failure codes.
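One way to gauge the last of these barriers is a completeness audit of closed work orders. The sketch below is a minimal example under assumed field names (`status`, `actual_hours`, `parts_used`, `tools_used`, `failure_code`); it flags closed work orders in which any of those fields is empty.

```python
# Fields the article names as vital on a closed work order.
# The exact key names are assumptions for this example.
REQUIRED_FIELDS = ("actual_hours", "parts_used", "tools_used", "failure_code")


def missing_fields(work_order):
    """List the required fields that are absent or empty on one work order."""
    return [f for f in REQUIRED_FIELDS if work_order.get(f) in (None, "", [])]


def completeness_report(work_orders):
    """Return (percent of closed work orders that are complete, {id: missing fields})."""
    closed = [wo for wo in work_orders if wo.get("status") == "closed"]
    incomplete = {}
    for wo in closed:
        gaps = missing_fields(wo)
        if gaps:
            incomplete[wo["id"]] = gaps
    if not closed:
        return None, incomplete
    pct_complete = 100.0 * (len(closed) - len(incomplete)) / len(closed)
    return pct_complete, incomplete


# Hypothetical records for illustration.
work_orders = [
    {"id": "WO-1", "status": "closed", "actual_hours": 3.5,
     "parts_used": ["bearing"], "tools_used": ["puller"], "failure_code": "BRG-WEAR"},
    {"id": "WO-2", "status": "closed", "actual_hours": 1.0,
     "parts_used": [], "tools_used": ["wrench"], "failure_code": ""},
    {"id": "WO-3", "status": "open"},
]

pct_complete, incomplete = completeness_report(work_orders)
print(pct_complete, incomplete)
```

Open work orders are ignored here on purpose, since the concern is information lost at close-out.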
If your system can’t pass this basic two-part test or if you are unable to easily perform such a test on it, your CMMS or EAM is no longer a management system but rather a work-order system, and your data can be equated to MUD (meaningless unrelated data). MUD is difficult to navigate, and it is hard to extract from it the compiled information that effective management decision-making depends on. (As many readers will recall, muda, ironically, is the Japanese word used for waste when implementing a lean approach to production and maintenance.)
When the word forensics is used in a maintenance context, it implies the use of science and technology in combination with the legal system. Forensics is most often brought into play when a maintenance failure with serious consequences or harm to person(s), property, and/or the environment is considered possible or has already occurred. Mitigating responsibility requires a systematic, due-diligence approach to all machine failures, as part of an organized strategy for preventing, predicting, trending, documenting, and analyzing potential and actual failures.
Physical parts that have failed can be sent to a forensic laboratory to understand metallurgical failure, and the findings documented through a failure mode and effects analysis (FMEA) or root-cause analysis of failure (RCAF) investigation. Where due diligence is an issue, a full documentation trail is essential. This will require documenting processes and procedures and a full work-order audit trail within the CMMS. All documents should be capable of undergoing a questioned document examination (QDE), the forensic discipline that examines documents whose authenticity may be challenged in court.
Standardized crime-scene photography hasn’t changed much since its introduction in 1888 by French police officer Alphonse Bertillon. Interestingly, the sleuths of television’s “CSI” begin each investigation with a photo essay of the crime scene in much the same way Bertillon did so long ago. Real-world problem-solvers in plants and facilities should do likewise.
With the communication devices and cameras in use today, there is no excuse not to photograph a failure scene. Each time a piece of equipment or component fails, it leaves behind an evidence trail that can not only lead to the failure cause, but also inform a strategy to understand, predict, and/or prevent future failure events.
Accordingly, if we are to reduce levels of maintenance while increasing availability and reliability in our operations, it behooves maintenance professionals to develop a systematic approach (see sidebar) to diagnosing a failure scene that follows the “CSI” lead. That means commencing with photography and documentation of all contextual aspects of the failure scene, and not destroying the scene by contaminating or throwing out evidence in our haste to “save the day.” The generated investigation documents, in turn, are essential for forensic and failure analysis, as well as for planning and scheduling.
In short, adopting a “CSI”-inspired approach to failure-diagnostic investigations is sure to enhance your operation’s maintenance and reliability efforts and help meet your overall asset-management goals.
Follow the “CSI” Lead: 8 Simple Steps To Failure Diagnosis
- Secure the scene. Work with operations to perform a quality evaluation of the failure scene before commencing repairs and/or restarting the equipment.
- Photograph the scene. The old adage “a picture is worth a thousand words” could not be truer in a failure investigation. Photos allow the failure scene to be revisited well after the equipment is back up and running, and act as good training materials for preventing future failures.
- Perform on-scene forensics. Maintenance and reliability personnel can perform many technical diagnostics at a failure scene, e.g., infrared signatures, oil-analysis signatures, and metallurgy.
- Bag and tag all physical evidence of failure or tampering. Once all local physical evidence of tampering and breakage has been photographed, tagged, and bagged, the actual failed components can be dismantled and replaced. Any parts for repair must be photographed and any parts requiring replacement must also be bagged and tagged.
- Interview witnesses. Operators can describe any abnormal sound, smell, or vibration emanating from the equipment prior to failure.
- Perform laboratory forensics. Examine all past failure records and vibration readings, and perform any necessary metallurgical and oil testing.
- Analyze findings and write up an FMEA or RCAF report. Include recommendations and update preventive strategies, as required.
- Code the failure on the work order. Complete the work order with a report of the findings, making sure to include failure-symptom codes.
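
The eight steps above could also be tracked as a simple checklist attached to the work order, so that an investigation is not closed while steps remain outstanding. The sketch below is one hypothetical way to do that; the step identifiers and the `FailureInvestigation` class are invented for this example and are not part of any particular CMMS.

```python
from dataclasses import dataclass, field

# Step identifiers mirror the eight sidebar steps, in order (names are assumptions).
STEPS = (
    "secure_scene",
    "photograph_scene",
    "on_scene_forensics",
    "bag_and_tag",
    "interview_witnesses",
    "laboratory_forensics",
    "analyze_and_report",
    "code_work_order",
)


@dataclass
class FailureInvestigation:
    work_order_id: str
    completed: list = field(default_factory=list)

    def complete(self, step):
        """Mark one step as done; reject names that are not in the checklist."""
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        if step not in self.completed:
            self.completed.append(step)

    def outstanding(self):
        """Steps not yet done, in sidebar order."""
        return [s for s in STEPS if s not in self.completed]

    def ready_to_close(self):
        return not self.outstanding()


investigation = FailureInvestigation("WO-1001")  # hypothetical work-order ID
for step in STEPS[:6]:
    investigation.complete(step)

print(investigation.outstanding())
print(investigation.ready_to_close())
```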