Uptime: Maintenance, Reliability, Profit & Loss
Marilyn | April 1, 2008
This 4th installment in a special series is the subject of Bob’s keynote address at MARTS 2008.
Businesses depend on properly functioning processes to achieve desired results and to generate the revenues that sustain them, producing a return on investment or a profit. These “processes” can be human work methods and procedures, transformation stages during product flow, equipment-driven systems that produce a usable output, etc. When these processes stop, momentarily stall or run inefficiently, the business suffers.
In practice, costs increase, revenues decline and profits turn into losses whenever equipment breaks down or people make errors. Maintaining equipment and processes for optimum reliability is essential for competitive success. The more we ignore the value of maintenance and reliability, the greater our economic losses.
Generally, a maintenance department alone cannot make equipment and processes reliable. Other people and departments have a direct or indirect effect on reliability. Senior management, corporate executives, operations management and staff, equipment operators, spare parts storerooms, spare parts purchasing, spare parts suppliers, training, engineering, procedure writers, product and process quality, and outside utilities are just a few of these influencers.
When something fails, however, it is typically the maintenance (repair) organization that springs into action to restore equipment and process operation. Meanwhile, the other “influencers” (if not directly involved in resolving the problem) are engaged in overhead activities while the process is NOT generating revenue. During process downtime, the business is NOT making money, NOT generating revenue, NOT posting profits. Even worse, it is actually LOSING money because of the unplanned costs (labor, parts and supplies) and the extra effort that disrupts the normal work of people who cannot do their jobs. Unscheduled downtime is a financial drain, and the time lost can NEVER be recovered. Sure, people and processes can work harder, faster and longer hours to make up production, but the original “planned” time is LOST forever.
Unscheduled downtime is a THIEF that steals planned, scheduled, productive time that NEVER will be returned. Time is money. Consequently, LOST TIME really is LOST MONEY.
A racing example
Think of unrecoverable time this way. While leading the first race of the season at Daytona by a 10-second margin, a NASCAR race team (let’s call them the “Car 54 team”) experiences unscheduled downtime when their car hits the wall coming out of turn two. A caution flag is waved and the other race cars slow down. The damaged car limps back to pit road for unscheduled repairs. While all the other cars keep making laps around the track, the damaged one is not moving; it is being repaired.
After the officials clean up the track debris (unscheduled cleanup), the race returns to “green flag” conditions and cars quickly reach track speed. Meanwhile, Car 54 is still on pit road with the team beating, banging, pulling and duct-taping it back together. After the race has continued for 20 more laps, Car 54 rolls down pit road and gets back on the track—not a pretty picture but it’s running.
The time Car 54 spent in the pits is LOST time (track position) that NEVER can be made up. Although it was running again and finished the race, it was still behind the rest of the field by more than 20 laps. Damage to the $300,000 vehicle and the time and money required to make it race-worthy for the next scheduled race caused costs to skyrocket.
Car 54 didn’t just lose the race. It also lost top-10 prize money, sponsors were sorely disappointed and damages amounted to $150,000 in repair costs. These repairs set the team’s planned shop schedule back about a week, leading to huge amounts of overtime to catch up. Furthermore, several team members were injured during the pit road repairs, requiring medical attention plus some recovery time off from work.
The Car 54 team realized its “problem” at Daytona COULD happen again at one or more future races. The team had several choices: 1) suck it up and prepare for the inevitable wrecks, asking for a bigger maintenance and repair budget; 2) determine the cause(s) of the problem and develop a countermeasure to prevent it from ever happening again; 3) improve the efficiency and effectiveness of their pit road and shop repairs. Or, they could blend all three of these choices into a solution.
The root cause of Car 54 hitting the wall, the team discovered, had nothing to do with the car itself or with the driver’s actions (other than being in the wrong place at the wrong time). The slower, soon-to-be-lapped Car 13 had a right-front tire blowout that caused it to swerve violently to the right just as Car 54 was passing on the right. Car 13 barely clipped the rear bumper of Car 54, causing it to pitch to the left. The Car 54 driver over-corrected and grazed the wall with the entire right side of his car. OUCH!
Given those causal conditions, the Car 54 team also realized several other important things: 1) they operated on a fixed budget already strained by the wreck, and neither owners nor sponsors had additional money to put into maintenance and repairs; 2) “stuff happens” that can’t easily be prevented, but in such a case the driver has to avoid passing slower cars on the outside in the turns; 3) the team had to improve how it repairs damage at the track.
Yes, the driver of Car 54 could have been FIRED and/or the driver of Car 13 could have been banned from racing. Unfortunately, these actions would NOT have addressed the causes of the problem or the business losses that were incurred. Organizationally, the team knew it would have to find ways to be more resource-efficient AND more effective.
A power-gen example
On February 26, 2008, at 1:09 p.m., a massive power failure slammed South Florida. Two nuclear units at Turkey Point (Units 3 and 4) and a natural gas power plant shut down, and up to 15 other power plants were affected. More than 900,000 customers of three power companies, the equivalent of nearly 2 million people, were without power for several hours that day. In all, about 2700 megawatts (MW) of generation and 4000 MW of customer load were affected.
Turkey Point Unit 4 was down for five days and Unit 3 for seven days. Those 12 combined unit-days of downtime, along with the widespread outage, repair costs, lost generating capacity, the expense of electricity purchased from other suppliers, reduced customer revenues, damages incurred and missing profits, can NEVER be recovered. They are lost forever!
The Florida outage had a monumental business impact on the power company. Utility revenues dropped; daily profits turned into losses; stock prices declined by 3.7%; state, municipal and federal tax receipts declined; repair costs increased; overhead costs continued without supporting revenue; and the full extent of consumer damages resulting from the event has yet to be determined. Early damage reports, though, reflected a high volume of traffic accidents caused by inoperable traffic signals. Businesses had to shut down and send employees home. Restaurants and grocery stores closed and food spoiled. Countless other commercial, industrial and residential customers experienced varying degrees of damage and loss. The problem extended beyond Florida, too: as far away as Texas, one company had to import power from Mexico during the crisis.
In the weeks since, power experts have explained that automatic safeguards in the electric grid worked as they were supposed to. This, they note, is what prevented power plant damage and the kind of far-reaching outage the U.S. and Canada experienced in August 2003, when much of the Northeast went black. That’s the good news. The bad news is that the multimillion-dollar cost of the February 26 Florida outage has yet to be determined. In fact, the Florida Reliability Coordinating Council expects to take “several months” to analyze the events.
The reported cause of the Florida outage is being described as “human error.” An employee with “significant tenure” appears to have disabled two levels of relay protection while diagnosing a malfunctioning disconnect switch at a substation in West Miami. While he was making the required measurements, a circuit shorted, and the fault cascaded to other parts of the system. The power company says “simultaneous removal of two levels of protection was contrary to its standard procedures and practices.”
The power company has several choices, much like those of the Car 54 race team: 1) suck it up and prepare for future, inevitable power outages, asking for a bigger maintenance and repair budget; 2) determine the cause(s) of the problem and develop countermeasures to prevent it from happening again; 3) improve the efficiency and effectiveness of its unscheduled power outage recovery and repairs. Or, the company could blend all three of these choices into a solution.
With calls for the firing of the person who reportedly set off the outage, the power company put the employee on administrative leave. But hold on! While this individual may have contributed to the true cause(s) of the event, he now has the knowledge needed to develop countermeasures that could prevent such a catastrophic chain of events in the future. This employee was closer to the cause(s) of the problem than anyone else in the world! Thus, there are invaluable opportunities here to learn from “mistakes,” both as an individual employee and as an organization trying to improve performance.
Were the “standard procedures and practices” up-to-date and accurate? Was the employee trained and qualified in the specific procedures? Was he assigned to the task based on his “significant tenure” (experience) without regard to the proper procedures? Was this truly a maintenance-induced failure? Could a system be devised that allows this type of switch inspection while still providing the required level of system protection?
FIRING an employee might appease the media and the public, but would it really accomplish anything else? The actual “cause(s)” of the multimillion-dollar business losses would still exist, lurking just below the surface, ready to strike again when least expected. In racing terms, the car will hit the wall again, and the new driver might not be the cause.
What about our own operations?
If we can’t eliminate all of the causes of a problem, what can we do to minimize the damage? Once damage has occurred and downtime losses turn into revenue losses, maintenance and training budgets are in jeopardy of being cut. Oh no! Now we have to do more with less, or do less with less. Car 54, where are you?
Resources used for this column
Florida Power & Light news releases
Platts.com: Electric Power News
Energy Assurance Daily: U.S. Department of Energy
Associated Press reports, various publications
Nuclear Energy Institute press releases
Miami Herald (Knight-Ridder/Tribune Business News) reports