Drowning In Data? Look To The 'Stars'

Jane Alexander | August 10, 2017

Identifying and acting on the right data can transform reliability and maintenance programs from resource black holes to key business drivers.

Advances in common communication protocols and wireless networks have created the Industrial Internet of Things (IIoT), technology that connects everything from material supply through manufacturing to product shipping. As IIoT data quantity increases, plant personnel are in danger of drowning in a flood of information. The dilemma for many is how to make sense of it all and derive answers that help them successfully operate and maintain their processes. Keith Berriman of Emerson (Round Rock, TX) advises them to “look to the stars.”

To put things in context, a bit of history is in order, beginning with the ancient Greeks. They, according to many scholars, were some of the first people to recognize patterns among the seemingly endless numbers of stars filling the night skies from horizon to horizon. Assigning names to groups of conspicuous stars, i.e., constellations, they wove references to them into their beliefs, literature, and other forms of cultural expression. Over the centuries, explorers and others have looked to many of these constellations to locate certain stars that could help them navigate the globe.

While not advocating that plant personnel take up actual celestial navigation, Berriman encourages them to consider a similar approach when dealing with the seemingly endless amounts of IIoT-generated data they’re confronting. As he explained, they can locate specific “stars” in their facilities that will tell them about the condition of assets and processes and, in turn, allow them to take action to prevent and mitigate failure. It’s an approach that’s feasible for virtually any plant.

“From an economic perspective,” Berriman said, “the cost of installing connected devices that run on wireless networks has fallen to less than 20% of traditional wired devices. This allows us to install sensors on all sorts of equipment that we previously would have to monitor with hand-held devices or through some type of invasive inspection.” That’s the good news.

The bad news is, despite the affordability and widespread availability of continuous-monitoring technologies, personnel still need to know what to look for amid the data that constantly streams from them. Unfortunately, not all systems for analyzing such information are created equal. “Depending on the system you use,” Berriman observed, “you may not be getting a full and correct picture of equipment and process conditions in your plant.” This is where his “look to the stars” approach to data pays off.

Bringing order to chaos

Berriman’s approach starts with sorting data into fixed and variable groups. “This,” he said, “helps us solve the risk-identification and -mitigation equation.”

Fixed data is set when the plant or system is built or modified. This includes:

• plant layout
• equipment design
• equipment data
• material master data (spare/OEM parts)
• performance parameters
• potential failure data.

These items become the known variables in the risk-identification and -mitigation equation. Variable data, by contrast, changes while a process or asset is operating. Sources of that change include raw-material composition, process variation, weather, equipment condition, and work history, among other things.

By selecting the right data points, personnel can populate the equation and determine their position, which, in this case, means the condition of their site’s assets. Doing this requires building a set of “constellations” to identify and capture critical asset data.

A reliability program is designed to proactively identify and mitigate failures, while eliminating defects. A maintenance program is designed to preserve or restore function to a system. Effective data constellations allow reliability and maintenance teams to detect and repair problems before they have an impact on performance.

Data and reliability programs

An effective reliability program consists of interconnected building blocks, including the following four steps, which aim to identify impending failures with enough warning to allow repair or replacement. When a failure does occur unexpectedly, Root Cause Failure Analysis (RCFA) determines its causes so that the program can be improved and similar events avoided.

Build a complete master equipment list (MEL). The MEL includes the fixed data for the next steps in the process and the information required for planning and scheduling work and ordering parts and materials.

The MEL also contains an organized hierarchy of assets that users can follow to identify equipment. Ideally, the branches should extend down to the “functional location,” i.e., the place in the process where an asset operates. Associating a particular asset with a unique identifier allows it to be tracked as it moves from one location to another.

To complete the MEL, fixed data must be associated with each asset. This includes, among other things:

• equipment type (pump, motor), classification (centrifugal), location, process and operating information, process drawings, size, power, material of fabrication, and motor-frame size

• bills of material (BOMs), i.e., spare parts needed to make repairs to the equipment.

A computerized maintenance management system (CMMS) organizes and sorts this information in various ways and allows the roll-up of metrics, costs, and information to identify performance and trends.
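To make the MEL structure concrete, here is a minimal Python sketch of an asset hierarchy. It is an illustration only, not how any particular CMMS stores its data: the class names, fields, tag formats, and cost figures are all assumptions. It shows the two ideas described above: an asset keeps its own identifier when it moves between functional locations, and costs roll up through the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    """A physical item tracked by its own unique identifier (hypothetical format)."""
    asset_id: str
    equipment_type: str                            # e.g. "pump"
    classification: str                            # e.g. "centrifugal"
    bom: list[str] = field(default_factory=list)   # spare parts for repairs


@dataclass
class FunctionalLocation:
    """A place in the process; may hold child locations and one installed asset."""
    name: str
    children: list["FunctionalLocation"] = field(default_factory=list)
    installed: Optional[Asset] = None
    maintenance_cost: float = 0.0                  # cost booked at this location

    def rolled_up_cost(self) -> float:
        """Sum the cost booked here and at every location beneath it."""
        return self.maintenance_cost + sum(c.rolled_up_cost() for c in self.children)


def move_asset(src: FunctionalLocation, dst: FunctionalLocation) -> None:
    """Move the asset installed at src to dst; the asset keeps its identifier."""
    if src.installed is None:
        raise ValueError(f"no asset installed at {src.name}")
    if dst.installed is not None:
        raise ValueError(f"{dst.name} is already occupied")
    dst.installed, src.installed = src.installed, None


# Illustrative data only.
pump = Asset("SN-4471", "pump", "centrifugal", ["seal kit", "impeller"])
slot_a = FunctionalLocation("Unit 1 / feed pump A", installed=pump, maintenance_cost=1200.0)
slot_b = FunctionalLocation("Unit 1 / feed pump B", maintenance_cost=300.0)
unit = FunctionalLocation("Unit 1", children=[slot_a, slot_b])
plant = FunctionalLocation("Plant", children=[unit])

move_asset(slot_a, slot_b)
print(slot_b.installed.asset_id)   # SN-4471
print(plant.rolled_up_cost())      # 1500.0
```

Keeping the asset separate from the location is what lets work history follow the machine, while costs and metrics stay attached to the place in the process.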

Rank asset criticality. With an accurate MEL, sites can rank the criticality of their assets. While organizations often focus on one potential impact, such as production or safety, to completely understand the relative criticality of their equipment systems, they need to review a number of factors. Five basic categories are used to determine asset criticality:

• safety
• environment
• production
• maintenance cost
• quality.

Additional categories may be used and the weighting adjusted for the specific process under review. Weighting uses a series of questions with points associated with the severity of impact.
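A weighted ranking of this kind can be sketched in a few lines of Python. The weights, the 0-to-10 severity scale, and the sample assets below are assumptions made for illustration; a real site would derive its points from its own question set and tune the weighting to the process under review.

```python
# Hypothetical weights for the five categories; they sum to 1.0.
WEIGHTS = {
    "safety": 0.30,
    "environment": 0.20,
    "production": 0.25,
    "maintenance_cost": 0.15,
    "quality": 0.10,
}


def criticality_score(severity: dict[str, int]) -> float:
    """Weighted sum of severity points (assumed 0-10 per category)."""
    missing = WEIGHTS.keys() - severity.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[c] * severity[c] for c in WEIGHTS)


# Illustrative severity points, not real plant data.
assets = {
    "feed pump": {"safety": 8, "environment": 6, "production": 9,
                  "maintenance_cost": 5, "quality": 4},
    "sump pump": {"safety": 2, "environment": 3, "production": 1,
                  "maintenance_cost": 2, "quality": 0},
    "cooling fan": {"safety": 3, "environment": 1, "production": 6,
                    "maintenance_cost": 3, "quality": 5},
}

# Highest score first.
ranked = sorted(assets, key=lambda name: criticality_score(assets[name]), reverse=True)
for name in ranked:
    print(f"{name}: {criticality_score(assets[name]):.2f}")
```

Scoring every category, rather than production or safety alone, is what keeps an asset with a moderate impact in several areas from being overlooked.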

Ranking asset criticality requires data and expertise. The resulting distribution can be sorted into categories to determine the next level of analysis and develop preventive- and predictive-maintenance (PM and PdM) programs. Criticality should also be used to prioritize work and ensure high-risk issues are addressed in time to prevent failure.

Develop strategies. At this point, strategies to detect and mitigate impending failures can be developed. Tools for doing so include Reliability Centered Maintenance (RCM) and Failure Modes and Effects Analysis (FMEA). They ask structured questions about the function of an asset, how it might fail, the impact of failure, and how to detect signs of failure. Since RCM requires a team of subject-matter experts and significant time, it should focus on the critical group of assets and systems. FMEA, which can be conducted by one or two participants, should focus on the essential group. Templates can be used to create strategies for the monitor group. In applying templates, it’s crucial to understand the context of an asset, given the fact the same equipment in different locations may not require the same strategy.

Note: Since the impact of their failure isn’t great, assets that fall into a No Scheduled Maintenance group won’t require routine or continuous monitoring.
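The matching of analysis effort to criticality group can be written down as a simple lookup. In the Python sketch below, the group names and the tool assigned to each come from the description above, but the numeric cut-offs on a 0-to-10 criticality score are purely hypothetical and would need to be set from a site's own score distribution.

```python
def criticality_group(score: float) -> str:
    """Sort a 0-10 criticality score into a group (cut-offs are assumptions)."""
    if score >= 7.0:
        return "critical"
    if score >= 4.0:
        return "essential"
    if score >= 2.0:
        return "monitor"
    return "no scheduled maintenance"


# Strategy-development approach for each group, as described in the text.
STRATEGY_TOOL = {
    "critical": "RCM (team of subject-matter experts)",
    "essential": "FMEA (one or two participants)",
    "monitor": "template strategy, checked against the asset's context",
    "no scheduled maintenance": "no routine or continuous monitoring",
}


def strategy_for(score: float) -> str:
    """Return the strategy-development approach for a criticality score."""
    return STRATEGY_TOOL[criticality_group(score)]


for score in (8.2, 5.0, 2.5, 1.0):
    print(score, "->", criticality_group(score), "->", strategy_for(score))
```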

Select PM/PdM condition-monitoring tools. Understanding failure modes allows personnel to select the appropriate tools for the job. Typically, this selection is based on the warning that a tool provides and the cost of performing the task. The classic P-F curve, which tracks an asset’s condition from the point where a potential failure (P) becomes detectable to the point of functional failure (F), illustrates the relative effectiveness of different techniques. IIoT data allows sites to combine indicators and move further back up this curve to provide earlier warnings of failure and, thus, allow plant personnel more time to plan repairs and procure replacements.

Once personnel know the data they require from a site’s network of instruments, analytics, and inspections, they can generate alerts and warnings to restore assets to good operating condition. The more advanced warning they have, the more planned and organized they can be. To that end, they should set warning alarms that allow time to plan and action alarms that indicate when prompt intervention is required. These alarms, and the data they generate, are an important part of the solution to the risk-identification and -mitigation equation, in that they help determine asset condition. As Berriman emphasized, however, “The information must still be acted on.”
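The two-tier alarm idea can be illustrated with a short Python sketch. The limits and readings below are hypothetical vibration values chosen only for the example; actual limits depend on the asset, the measurement, and the failure mode being tracked.

```python
from enum import Enum


class AlarmState(Enum):
    NORMAL = "normal"
    WARNING = "warning"   # time remains to plan the repair
    ACTION = "action"     # prompt intervention is required


def classify(reading: float, warning_limit: float, action_limit: float) -> AlarmState:
    """Compare one reading against a warning limit and a higher action limit."""
    if warning_limit >= action_limit:
        raise ValueError("warning_limit must be below action_limit")
    if reading >= action_limit:
        return AlarmState.ACTION
    if reading >= warning_limit:
        return AlarmState.WARNING
    return AlarmState.NORMAL


# Hypothetical vibration readings (mm/s) trending upward over time.
readings = [2.1, 2.4, 4.8, 7.3]
states = [classify(r, warning_limit=4.5, action_limit=7.1) for r in readings]
print([s.value for s in states])   # ['normal', 'normal', 'warning', 'action']
```

The gap between the two limits is the planning window: the earlier the warning limit trips relative to the action limit, the more time there is to plan the repair and procure parts.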

Data and maintenance programs

Regardless of industry sector, type of operation, or location, one constant is the basic maintenance process. All plants need to complete the following six steps to be consistent, strong performers:

Identify work. Maintenance work is identified through a variety of sources. Most work should come from PM/PdM activities and the previously described warnings and action alerts. However, there will be issues identified by operations that the program missed, requested improvements, and other tasks. These issues need to be reviewed and approved before effort is expended on planning and scheduling.

Work entering the system needs to be reviewed for the completeness of information and approved before moving to planning. Known as gate keeping, this requires a dedicated resource for consistency. Ideally, the gate-keeping role belongs to Operations, i.e., the equipment owners.

Plan work. Planning is where a job is broken down into a logical sequence of tasks, maintenance craft assigned, parts ordered, and other resources identified, including such things as scaffolding and contractors. A good job plan allows accurate scheduling and work execution. Job plans should include safety and environmental precautions, work permits, and other procedures. Data collated at this step should include equipment data, materials/parts data, work history, safety/environmental data, and resource availability.

The output of this step is a backlog of planned work to build schedules and balance workforce composition, especially where contract resources are used to augment in-house maintenance personnel.

Schedule work. This step takes the job plan data for duration and resources, and integrates production-planning data and asset criticality to create a maintenance schedule that fits the production schedule. This requires collaboration between departments to understand priorities, equipment availability, and other issues. The scheduler role should be owned by Operations since, again, it owns (controls) the equipment.

The outputs of scheduling are long-range plans and a weekly calendar of maintenance work used to create daily schedules. Daily scheduling is a joint effort to select new work for the next day from the weekly schedule and to ensure incomplete work is carried forward.
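As a rough illustration of how job-plan durations and asset criticality combine into a weekly schedule, the Python sketch below fills the available craft hours with the most critical jobs first. This greedy rule is a deliberate simplification chosen for the example, not a method described by Berriman; the job names, hours, and criticality scores are invented, and real scheduling must also account for production windows and equipment availability.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    hours: float         # planned duration from the job plan
    criticality: float   # higher means a more critical asset


def build_weekly_schedule(backlog: list[Job], available_hours: float):
    """Greedy fill: most critical jobs first, deferring any that do not fit."""
    scheduled, deferred = [], []
    remaining = available_hours
    for job in sorted(backlog, key=lambda j: j.criticality, reverse=True):
        if job.hours <= remaining:
            scheduled.append(job)
            remaining -= job.hours
        else:
            deferred.append(job)   # stays in the backlog for a later week
    return scheduled, deferred


# Illustrative backlog of planned work.
backlog = [
    Job("inspect sump pump", hours=4, criticality=1.75),
    Job("replace feed-pump seal", hours=16, criticality=7.0),
    Job("replace cooling-fan bearing", hours=30, criticality=3.55),
]

scheduled, deferred = build_weekly_schedule(backlog, available_hours=24)
print([j.name for j in scheduled])   # ['replace feed-pump seal', 'inspect sump pump']
print([j.name for j in deferred])    # ['replace cooling-fan bearing']
```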

Execute work. When a day’s schedule is completed, Operations can prepare the equipment and Maintenance can execute the work. This phase includes the integration of unplanned work that might supersede scheduled tasks, known as break-in work. This work needs to be managed to prevent organizations from becoming highly reactive.

Maintenance supervisors need to monitor progress on work to communicate with Operations and to ensure time is added to the next day’s schedule for incomplete work.

Follow up/capture data. Upon completion of work, data must be captured to drive analysis, planning, and other activities. That includes capturing “as found/as left” data for instruments, repair history, failed components, time, materials, and labor, among other things. The information should then be recorded in the CMMS for future use. Responsibility for this step typically falls to maintenance technicians and supervisors.

Analyze data. Once data has been captured, analysis can be performed on failure modes to determine and mitigate bad actors, or equipment with high costs and downtime. Cost and lost-production data can be used to understand budget variances and drive key performance indicators (KPIs). Reliability teams use maintenance data for detailed statistical analyses, such as Weibull, that identify patterns of failure and predict future events.
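For readers unfamiliar with Weibull analysis, the Python sketch below shows how the two parameters of a fitted Weibull distribution are typically read. It assumes the shape (beta) and scale (eta) values have already been estimated from failure history; the numbers used are invented for illustration, and the fitting step itself is not shown.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving past time t: R(t) = exp(-(t / eta) ** beta)."""
    if t < 0 or beta <= 0 or eta <= 0:
        raise ValueError("t must be >= 0; beta and eta must be > 0")
    return math.exp(-((t / eta) ** beta))


def failure_pattern(beta: float) -> str:
    """Interpret the shape parameter as a pattern of failure."""
    if math.isclose(beta, 1.0):
        return "random (constant failure rate)"
    if beta < 1.0:
        return "early-life (decreasing failure rate)"
    return "wear-out (increasing failure rate)"


# Hypothetical fitted parameters: shape 2.0, scale 1,000 operating hours.
beta, eta = 2.0, 1000.0
print(failure_pattern(beta))                            # wear-out (increasing failure rate)
print(round(weibull_reliability(500, beta, eta), 3))    # 0.779
print(round(weibull_reliability(1000, beta, eta), 3))   # 0.368
```

The shape parameter indicates the pattern of failure, while the reliability function supports prediction: in this example, roughly 78% of such assets would be expected to survive 500 hours.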

Navigating your data

According to Keith Berriman, the Industrial Internet of Things is an opportunity to increase the generation of accurate, timely data without the use of invasive and time-based processes. As the integration of systems improves, the interconnectedness of data allows more accurate and simplified presentation of information for repair/replace decisions.

“But,” he cautioned, “too much unnecessary data can obscure the information personnel are looking for and hide problems that might become critical and dangerous. While technology is a great enabler, without a strong foundation, it won’t deliver the results plants seek.” “The key,” Berriman concluded, “is to be able to identify and then act on the right data.” Looking to specific “stars” in your plant is a good way to ease that voyage.

Keith Berriman, P.Eng., CMRP, is a senior reliability consultant for Emerson, based in Edmonton, Alberta. For more information, email Keith.Berriman@Emerson.com.