Reliability Concepts and Tools
EP Editorial Staff | April 1, 2000
How to accurately predict the competitive advantages and quantify the business benefits associated with improving reliability.
A previous article, “Profit Driven Reliability,” discussed a six-step work process for increasing profitability through reliability improvements. This article defines the foundational concepts and tools needed to apply that process.
Improving profitability by improving reliability requires a partnership with the business. One end of the partnership is the commitment that a reliability project will deliver certain competitive advantages to the business. The business must uphold its end of the partnership by committing to act on these competitive advantages. A key to this process is the ability to accurately predict the competitive advantages associated with improving reliability.
This article defines reliability as a measure of a system’s ability to consistently function as designed or at its highest level of performance. This definition extends reliability beyond failure of a manufacturing process or equipment and includes the impact of availability, maintainability, first-pass yield, weight yield, etc.
With this definition, reliability can be applied to any system to measure the effect of deviations from design or optimal performance. The system can range from the entire value chain to a heat exchanger. The components of the value chain include the process for billing customers, a manufacturing site, an operating area within the manufacturing site, and individual equipment within an operating area. Failures in the value chain could include errors in customer invoices as well as manufacturing equipment functioning at a level lower than design or optimal performance. For example, a heat exchanger would suffer reliability losses if it were kept in service while severely fouled.
All business benefits achieved through reliability boil down to improved profitability achieved through competitive advantages. Reliability improvements deliver competitive advantages by reducing the hidden penalty for unreliability. The first step to harnessing reliability is defining the hidden penalty of unreliability, which occurs in three forms: lost production time, avoidance costs to mitigate the consequences of unreliability (for example, redundant or oversized equipment), and customer-imposed penalties. All three forms result in lost profit opportunity.
Lost production time is converted into lost profit opportunity using site-specific variable profit margins. This form can be the most expensive way to pay for unreliability because it limits the ability to take advantage of margins from incremental sales. Profit margins on incremental sales are higher because they are based only on the variable portion of cost of goods sold (the fixed costs have already been covered).
Lost production time is the lost production expressed as equivalent downtime. Equivalent downtime is the downtime that would have resulted in the same lost production. For example, a process with a design rate of 1000 lb/hr that actually ran at 750 lb/hr for one hour had an equivalent downtime of 0.25 hr.
Equivalent downtime = ((1000 lb/hr - 750 lb/hr) / 1000 lb/hr) x 1 hr = 0.25 hr
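The equivalent downtime calculation, and its conversion into lost profit opportunity, can be sketched in a few lines of Python. The function names and the $0.10/lb variable margin are illustrative assumptions, not figures from the article; only the 1000 lb/hr and 750 lb/hr rates come from the example above.

```python
def equivalent_downtime(design_rate, actual_rate, hours):
    """Hours of full downtime that would lose the same production."""
    if design_rate <= 0:
        raise ValueError("design_rate must be positive")
    return (design_rate - actual_rate) / design_rate * hours


def lost_profit_opportunity(downtime_hr, design_rate, variable_margin_per_lb):
    """Lost production (lb) priced at the site's variable profit margin."""
    return downtime_hr * design_rate * variable_margin_per_lb


hours_lost = equivalent_downtime(design_rate=1000, actual_rate=750, hours=1)
print(hours_lost)  # 0.25

# The $0.10/lb margin is a made-up value for illustration only.
print(lost_profit_opportunity(hours_lost, 1000, 0.10))
```

A real evaluation would use the site-specific variable margin in place of the placeholder value.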
Frequently, the primary business benefit of recovering lost production time is to support increased sales volume with available capacity gains. Prediction of available capacity gains can be tricky. Fig. 1a shows that the elimination of 17 days of downtime increased available capacity by only 14 days (unit availability increased from 77 percent to 81 percent). The other 3 days appeared as additional noninstrumentation downtime.
At first glance, the increase in noninstrumentation downtime appears to be the result of deterioration in noninstrumentation reliability. Appearances can be deceiving: the failure rate for noninstrumentation failures was 0.08 failure/hr both before and after the reliability improvement. Instead, elimination of 17 days of downtime gave noninstrumentation failures more opportunity to occur, as shown in Fig. 1b.
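A rough calculation shows why the recovered time is smaller than the downtime eliminated. The sketch below assumes that all remaining downtime accrues in proportion to running time (a constant failure rate while the unit runs) and that the unit starts at 281 days of uptime per year, about 77 percent of 365. Both are simplifying assumptions made here; the underlying data in Fig. 1 may be more detailed.

```python
def uptime_after_removal(uptime_days, removed_downtime_days, period_days=365.0):
    """Uptime once one downtime category is eliminated.

    Assumes the remaining downtime accrues only while the unit runs,
    in proportion to running time.
    """
    remaining_downtime = period_days - uptime_days - removed_downtime_days
    downtime_per_running_day = remaining_downtime / uptime_days
    return period_days / (1.0 + downtime_per_running_day)


before = 281.0  # days/yr, about 77 percent of 365 (assumed starting point)
after = uptime_after_removal(before, removed_downtime_days=17.0)
gain = after - before

print(round(gain, 1))                 # 13.7 days of new available capacity
print(round(17.0 - gain, 1))          # 3.3 days reappear as other downtime
print(round(100 * after / 365.0, 1))  # 80.7 percent availability
```

Under these assumptions the result is close to the 14-day gain and 3-day increase in other downtime quoted above.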
The situation shown in Fig. 1b becomes even more complex when you add another unit as shown in Fig. 2a. Adding another unit means that the effects of unreliability upstream and downstream must be included in the evaluation of improving reliability.
The system shown in Fig. 2a would have an average annual production of 216,000 lb (216 days of production), despite the fact that each unit is individually capable of producing 281,000 lb (281 days of production). As in the prior example, a portion of the lost production time is unrecoverable. As can be seen in Fig. 2b, approximately 18 percent of each unit’s capacity is lost for reasons unrelated to its own reliability. These are interaction losses. During an interaction loss, a unit is a victim of upstream (or downstream) unreliability.
Unit interactions will change the business value of eliminating downtime. For example, eliminating all of Unit A’s instrumentation downtime will increase production by only 11 days. Eliminating instrumentation failures in both Units A and B will increase production by 22 days. The key to accurately predicting the business value of increasing production by eliminating downtime is quantifying the unrecoverable time. This quantification may require reliability modeling tools.
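The figures quoted for the two-unit system are consistent with a simple series model in which the units fail independently and the system produces only when both are up. The sketch below assumes that independence, and also assumes that removing instrumentation downtime raises a unit from 281 to 295 available days (the 14-day gain cited earlier). It is a back-of-envelope check, not the model behind Fig. 2.

```python
PERIOD_DAYS = 365.0


def series_production_days(*availabilities):
    """Days per year on which every unit in the chain is up at once."""
    joint = 1.0
    for availability in availabilities:
        joint *= availability
    return PERIOD_DAYS * joint


base = 281.0 / PERIOD_DAYS      # about 77 percent
improved = 295.0 / PERIOD_DAYS  # assumed: 14 more available days

baseline = series_production_days(base, base)
print(round(baseline))                 # 216 days
print(round(100 * base * (1 - base)))  # 18 percent lost to the other unit
print(round(series_production_days(improved, base) - baseline))      # 11 days
print(round(series_production_days(improved, improved) - baseline))  # 22 days
```

Systems with storage, rate reductions, or shared repair resources break the independence assumption, which is where simulation tools become necessary.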
The hidden penalty for unreliability may appear as an avoidance cost. Maintenance expenditures are a classic example of an avoidance cost. Maintenance is performed to avoid loss of equipment function. Avoidance costs also may be incurred to protect production capacity and the ability to meet customer expectations from unreliability. These avoidance costs include:
- Inventory—buying time for the downstream process during an upstream upset and vice versa. This is a method for increasing system output without increasing unit reliability. The value of inventory can be illustrated using the simple system shown in Fig. 2a, modified by adding a 1500-lb storage tank between Unit A and Unit B (Fig. 3a). Adding 1500 lb of storage capacity is equivalent to adding 1.5 days of recovery time to the system. Assuming that the tank is half full when a unit goes down, the down unit has 0.75 day to recover before shutting down the other unit. This recovery period allows a unit to continue running. Fig. 3b shows how this recovery period increases site production from 59 percent (216,000 lb/yr) to 65 percent (238,000 lb/yr). This increase implies that inventory can be reduced by increasing reliability. For example, eliminating instrumentation failures in both Units A and B will eliminate the need for inventory in a site that must produce 238,000 lb/yr.
- Increased capital investment—building oversized facilities in anticipation of reliability losses. The facility shown in Fig. 3a can deliver an annual production of only 238,000 lb. If the business required an annual production of 365,000 lb, Units A and B would have to be designed for a maximum daily rate of 1500 lb and the storage tank size would have to be increased to 2250 lb. The storage tank size must increase proportionally with the maximum rate because its size is based on the recovery time it provides.
- Increased order lead time—forcing the customer to bear part of this hidden penalty by accepting longer lead times. Order lead time behaves as pseudo-inventory. If an order arrives while the site is down, order lead time gives the site a chance to recover without missing the order.
- Increased staffing or overtime—buying the ability to recover quickly from a reliability failure.
- Increased shipping costs—shipping by a more expensive channel because product was not available in time to use normal channels.
Finally, customers may decide that the hidden penalty for unreliability is not high enough. They may decide to up the ante for eliminating unreliability by:
- Shifting sales to your competitors—reliability influences many product attributes important to customers such as quality, stable supply, price, and short lead times.
- Increasing receivables—unreliability may result in unsatisfied customers who withhold payment until satisfied.
Unfortunately, most people are oblivious to customer-imposed penalties since neither the manufacturer nor its customers may identify unreliability as one of the root causes of dissatisfaction.
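The inventory and capital-investment figures quoted in the avoidance-cost list can be checked with simple arithmetic. The sketch below assumes a nominal rate of 1000 lb/day (implied by 216,000 lb equaling 216 days of production); the function names are illustrative only.

```python
def recovery_days(tank_lb, daily_rate_lb, fill_fraction=1.0):
    """Days of buffer a storage tank provides at a given fill level."""
    return fill_fraction * tank_lb / daily_rate_lb


def required_design_rate(target_lb_per_yr, utilization, period_days=365.0):
    """Daily design rate needed to hit a target at a given site utilization."""
    return target_lb_per_yr / period_days / utilization


def tank_size(recovery_time_days, daily_rate_lb):
    """Tank size that preserves the same recovery time at a new rate."""
    return recovery_time_days * daily_rate_lb


print(recovery_days(1500, 1000))       # 1.5 days
print(recovery_days(1500, 1000, 0.5))  # 0.75 day when half full

utilization = 238_000 / 365_000        # about 65 percent with the tank
print(round(required_design_rate(365_000, utilization)))  # 1534 lb/day
print(tank_size(1.5, 1500))            # 2250.0 lb
```

The computed design rate of about 1534 lb/day is in line with the rounded 1500 lb/day used in the capital-investment example.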
Reliability modeling tools
Computer simulation tools may be required to link reliability improvements to increased production, reduced order lead time, or reduced inventory. Depending on system complexity, these tools can range from spreadsheets to a discrete event simulation model. All discrete event simulation models share common capabilities such as defining the relationship between reliability, productive capacity, inventory, quality, missed shipments, and order lead time. Some commercially available reliability modeling tools may not currently possess all of these capabilities; however, if the tools are based on discrete event technology, this is a limitation of their current stage of development rather than of the technology. The information required to use these models will vary from case to case; however, all models will need the following information at a minimum:
- Definition of the probability of failure.
- Definition of the consequences of a failure (shutdown, run at reduced rates, lose batch, produce off-specification material, etc.).
- Definition of how long it takes to return to service. This may be explicitly defined with a single time (6 hr) or a probability distribution (50 percent of failures require 6 hr, 50 percent of failures require 12 hr). It also may be implicitly defined by describing what must occur for the unit to come up (repairs will require 6 hr once a mechanic is available).
- Definition of storage capacities.
Models that seek to link reliability improvements to order lead times, finished product inventory, or saleable capacity also will require information about order predictability and size.
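To make the minimum inputs concrete, here is a toy hourly simulation of two units in series with a storage tank between them. It is a simplified time-stepped sketch, not a commercial discrete event tool. The failure probability, tank sizes, and production rate are hypothetical values, and the 6 hr and 12 hr repair times simply reuse the example distribution given above. Failures are assumed to occur independently of whether a unit is blocked or starved.

```python
import random


def simulate(hours, p_fail, repair_hours, tank_capacity, rate=1.0, seed=0):
    """Hourly simulation of Unit A -> tank -> Unit B; returns Unit B output."""
    rng = random.Random(seed)
    repair_left = [0, 0]        # repair hours remaining for Unit A, Unit B
    tank = tank_capacity / 2.0  # assume the tank starts half full
    produced = 0.0
    for _ in range(hours):
        # Inputs 1 and 3: probability of failure, time to return to service.
        for unit in range(2):
            if repair_left[unit] > 0:
                repair_left[unit] -= 1
            elif rng.random() < p_fail:
                repair_left[unit] = rng.choice(repair_hours)
        # Input 2: consequence of a failure (here, a full shutdown).
        a_out = rate if repair_left[0] == 0 else 0.0
        b_want = rate if repair_left[1] == 0 else 0.0
        # Input 4: storage capacity between the units.
        available = tank + a_out
        b_take = min(b_want, available)  # Unit B starves on an empty tank
        tank = min(available - b_take, tank_capacity)  # Unit A blocked when full
        produced += b_take
    return produced


# Hypothetical parameters: 1 percent chance of failure per running hour.
no_tank = simulate(8760, 0.01, [6, 12], tank_capacity=0.0)
with_tank = simulate(8760, 0.01, [6, 12], tank_capacity=18.0)
print(no_tank, with_tank)
```

Because both runs use the same random seed, they see the same failure history, so any difference in output comes from the tank alone.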
This article is based on a paper presented at Process Plant Reliability 99, October 1999, Houston, TX.