A critical care hospital discovers leaks in the underground gas piping system that serves it, a paper mill has recurring boiler tube failures, and a steel plant’s reheat furnace has a serious refractory failure. These are all recipes for serious business interruptions. Life as you know it will change drastically if your organization’s operating staff encounters one of these equipment problems.
Enterprise risk management (ERM) is a process that identifies and classifies the many types of risks organizations face: those that can impact an organization’s ability to achieve its objectives, whatever they are. ERM is a popular concept these days in the business continuity world. Today, ERM is largely associated with data centers, software, and computer virus problems. Although these are certainly important, this article attempts to shift your lens a little and get you thinking about business continuity risks related to your fuel and fired-equipment systems.
I was recently told by a client’s risk manager that, besides owning large boilers, the organization also owns thousands of miles of roads that have to be maintained for safety and processes millions of pounds of food that must not make anyone sick; and, yes, it has boilers and fuel systems that must function safely. This discussion reminded me that, on some level, every organization needs to understand the seriously disproportionate risks that are out there, but, at the same time, no one has the resources to manage every level of risk all the time.
This article articulates three of the biggest risks I know of that are not popular, somewhat obscure, and not well-managed by many organizations when it comes to fuel-fired equipment, such as boilers and industrial heat processing equipment.
These three major fuel and combustion system risks are underground fuel piping systems (e.g., natural gas), refractory systems, and boiler mechanical integrity (i.e., tubes and drums). These issues are not readily observable, and managing and correcting them is sometimes cloaked in a veil of “being an art.” The truth is, these issues are understandable; they are not obscure if the proper technologies are applied; and, although there is certainly an art to managing and correcting them, it is not beyond the capability of today’s technologies and diligent caretakers.
Throughout this article, I address all three of these issues and give you some perspective on how these can be better managed. I also interviewed three of the best minds I know in each of these areas to get some idea of how bad things can get if there’s a problem in today’s world of supply chain issues and labor shortages. In each example, I examine a possible outage for each of these three critical scenarios by having you consider the work scope involved for getting things diagnosed and fixed. Then, you can decide how long you might be down at some of your key facilities. I’ll give you a hint before you get there: In each case, it could easily be weeks, not days.
Obscure Risk No. 1: Underground Gas Piping Failures
How obscure are they? How about you can’t see them even if you wanted to. Equipment positioned above ground, such as corrosion protection system testing devices, can be used to assess the condition of the piping. The problem is that, in many cases, these corrosion protection systems are nonexistent or not monitored.
Gas utilities are required to validate the integrity of their cathodic protection systems (i.e., corrosion controls) annually via U.S. Department of Transportation (DOT) requirements. However, once that gas becomes owned by the customer, and the pipe heads back underground from the discharge of a meter, it’s usually the customer’s responsibility for protection. Very few customers do a good job of managing their cathodic protection systems, assuming they even exist. Testing these systems and managing this piping are not skills that are used regularly by many organizations. The testing and evaluation of these systems are typically contracted services, if they occur at all.
- Consider the Scope of the Repair — If an underground fuel line has lost its integrity, one must first identify the approximate location of the leak. Once the leaks are identified (there’s rarely just one), the piping system must be isolated and then purged out of service. This is usually when it is discovered that critical valves needed for isolation either do not exist where they’re needed or do not work. The piping to be purged could be hundreds of feet long. The planning and execution for the purge out of service could require two to three days. There are codes and standards that apply to this work and must be followed, such as NFPA 54, the National Fuel Gas Code, and NFPA 56, Standard for Fire and Explosion Prevention During the Cleaning and Purging of Flammable Gas Piping Systems. I am a member of both of these committees and can vouch for the very practical information they contain, including a sample purge plan in the annex of NFPA 56. These and other NFPA standards and codes are available for free viewing at www.NFPA.org.
Next, the suspected leak area must be excavated. Once the pipe is excavated, the extent and scope of the required repair can be seen for the first time. Years ago, one of my clients, an auto manufacturer, started this process for a service line leak that went through a parking lot. The excavation revealed more than 200 feet of buried 6-inch pipe that needed to be replaced. They were able to do this work over a summer shutdown. They were lucky. I’m sure you know that luck’s not much of a strategy.
I contacted my go-to guy for underground cathodic protection issues, Mark Kocak, lead technical manager for Mears. In today’s world, he indicated specialty pipe lead times can be double what they used to be (now in the range of six to eight weeks, depending on what you need). Certain critical cathodic protection items can also take weeks to get. Not having these components can mean an excavation might need to remain open until all materials arrive, are installed, and the new cathodic (corrosion) protection system’s functionality can be validated.
- Mitigating the Risk — You need a management process and reporting system that includes information from the serving gas utility on the health of the service lines serving you. They are required to provide this information if asked.
If you have underground piping on your organization’s property, then you need the services of a corrosion protection specialist who can identify the condition of your lines and whether or not the original protection systems put in place are still adequate. You then need a regular measurement and reporting practice so you understand the ongoing condition of whatever cathodic (corrosion) protection system you have in place.
I don’t want to scare you, but even this assessment and the consequences of it can be a little ugly. So, let me prepare you. Years ago, at a recently purchased natural gas-driven combustion turbine power plant, the new owners decided that, since they didn’t have good documentation on the 900-psig gas line serving them, they would do some excavations to assess the pipe and install new cathodic protection. Our firm wrote the purge plans and managed the project so the gas piping could be empty and safe while the excavation and other work were occurring. We then managed reenergizing and commissioning the line once the work was done. This entire process took weeks, and everything went relatively well. There’s just nothing about any underground work that’s ever easy, risk-free, or fun. It’s also somewhat rare for underground projects to contain few or no surprises.
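As a minimal sketch of the regular measurement and reporting practice described above, the following Python example flags test stations whose pipe-to-soil potential readings fall short of a protection criterion. The station names and readings are hypothetical, and the -0.85 V threshold (a commonly cited criterion measured against a copper/copper sulfate reference electrode) is an assumption for illustration; your corrosion protection specialist should set the actual criterion for your system.

```python
from dataclasses import dataclass

# Assumed criterion: potential at or more negative than -0.85 V (vs. Cu/CuSO4).
# Confirm the right value for your system with a corrosion specialist.
PROTECTION_CRITERION_V = -0.85


@dataclass
class Reading:
    station: str        # hypothetical test station label
    potential_v: float  # pipe-to-soil potential in volts


def underprotected(readings, criterion_v=PROTECTION_CRITERION_V):
    """Return station names whose reading is less negative than the criterion."""
    return [r.station for r in readings if r.potential_v > criterion_v]


# Hypothetical survey results for three test stations.
survey = [
    Reading("TS-1 meter outlet", -0.92),
    Reading("TS-2 parking lot", -0.71),
    Reading("TS-3 boiler house", -0.88),
]

for name in underprotected(survey):
    print(f"Follow up: {name} does not meet the assumed criterion")
```

Even a simple list like this, reviewed on a fixed schedule, turns an invisible asset into something that shows up in a report.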
Obscure Risk No. 2: Refractory Damage
Refractory is the cementitious, rock-type material that protects carbon steel boiler tubes and the structural steel that supports boiler, oven, and furnace fireboxes. It has a design useful life, and many operating parameters can influence that life.
How obscure is it? You can only really see it completely if the equipment is nonfunctional, cooled down, and you send a crew inside with special training and personal protective equipment (PPE).
Everyone knows what obvious refractory problems are. These are situations where paint has burned off and outer casings of equipment are glowing red and maybe even burned through. The key to managing this risk is to not let it get this bad. There are warning signs for when refractory is starting to have even minor problems. If you have the right equipment and a little bit of training, watching for them can be part of your critical assets reporting process.
- Consider the Scope of the Repair — The repair scope and time required for refractory increase exponentially the worse you let things get. Once the heat starts to impact structures, they can warp or bend, breaking more and more refractory.
Repairs require that the insides of fireboxes, combustion chambers, and furnaces first cool to temperatures at which skilled trades can work inside. The work is often performed in confined spaces that require special safety measures, training, and PPE. In some cases, the materials inside must first be tested to make sure they do not contain asbestos. Likewise, silica dust can be another hazard from which workers inside may need protection.
Once inside, it’s a matter of demolition of the broken materials and an assessment of damage to the steel beneath. Then, it’s a matter of hoping the proper repair materials are available in a reasonable time. Some refractory materials can be hard to get. Then, there’s the installation and the all-important dry out and curing process, which, for some, could take many days.
One of the folks I contacted in my research for this article was my friend Bob Humphrey, a sales manager with Onex Construction. He indicated the supply chain issues have hit his refractory world hard and that getting some materials could take weeks. In the pre-COVID world, refractory repairs could be a week or two out. Now, you can add a few additional weeks just for the hard-to-get materials and labor.
- Mitigating the Risk — You need a management process and reporting system that includes regular reviews of the condition of refractory and the conditions that cause it to degrade rapidly. The good news is that infrared imagers, the technology for early detection of some refractory problems, have now become very inexpensive. In fact, one can buy a cellphone attachment imager for $200 that can be a big help. Besides setting up a thermal imaging campaign, you need to track other things that can shorten refractory life, including, at a minimum, daily flame observations and documentation of startups and shutdowns. Flame impingement on refractory can dramatically shorten its life.
It’s not only how many startups and shutdowns, but the nature in which they occur. These should be done according to a written procedure that minimizes both rapid heat up and rapid cooling down of refractory. You cannot manage refractory issues with surprises and crisis mode reactions, as doing so will be very costly.
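To make the thermal imaging campaign described above a little more concrete, here is a minimal Python sketch that compares the latest casing temperature at each survey location with its baseline and flags the ones that have risen noticeably. The locations, temperatures, and the 40 F alert threshold are all hypothetical placeholders, not recommended values; a refractory specialist should help you decide what rise is meaningful for your equipment.

```python
# Hypothetical casing temperatures (deg F) from periodic infrared surveys,
# keyed by a location label; the first value in each list is the baseline.
surveys = {
    "north wall, burner level": [180, 185, 190, 240],
    "roof, center": [200, 198, 205, 207],
    "south wall, access door": [160, 175, 195, 215],
}

# Assumed alert threshold: rise over baseline that triggers a closer look.
RISE_ALERT_F = 40


def hot_spots(history, rise_alert=RISE_ALERT_F):
    """Return {location: rise} where the latest reading exceeds baseline by the threshold."""
    flagged = {}
    for location, temps in history.items():
        rise = temps[-1] - temps[0]
        if rise >= rise_alert:
            flagged[location] = rise
    return flagged


for location, rise in hot_spots(surveys).items():
    print(f"{location}: casing temperature up {rise} F since baseline")
```

The point of the design is the trend, not any single reading: a location that keeps climbing from survey to survey is the early warning you want before the casing starts to glow.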
Obscure Risk No. 3: Boiler Mechanical Integrity Failures
Boilers rely on pressure-retaining components, like tubes and drums. There’s a myriad of tube and drum combinations and levels of complexity, including metallurgical considerations. In the case of firetube boilers, failure and tube repairs are generally not as consequential as in the case of a watertube boiler. The repair outage goes from days to sometimes weeks when the technology changes from firetubes to more complex and larger watertube designs.
How obscure is it? It takes special nondestructive testing (NDT) processes to know with any sense of accuracy how much longer your tubes (and possibly your drums) have to live. Many organizations do not have programs to evaluate the remaining life expectancy of critical boilers. There could be years of improper water treatment or exterior corrosion from previous fuels, like coal or heavy oils, that come home to roost when it’s least expected. Operating and maintenance staff usually see it as a “big surprise” when pressure-retaining parts of a boiler fail, instead of something that could have been detected and managed.
I’m not sure there’s a specific schedule for full-blown boiler life expectancy studies, as life expectancy depends on many factors. However, I am sure that frequency is not “never.” There are also many investigative steps between an expensive and exhaustive life expectancy study with metallurgical samples and extensive tube ultrasonic thickness testing, which is likely what you’re doing today. The U.S. Occupational Safety and Health Administration (OSHA) has even recently ruled that mechanical integrity programs at process safety management (PSM)-designated plants should include boiler systems.
- Consider the Scope of the Repair — The consequences of being caught by surprise can be tragic. Tubes that fail can, under the right circumstances, also be safety issues. Once again, let’s review the possible diagnosis and repair scope, and you can decide for yourself. First, there’s the matter of discovering the leak or failure. This is not usually a mystery. It’s sometimes a loss of pressure and/or a disturbance of flames; it could be loud, or it could fill the boiler house with steam and be life-threatening.
Just as in the case of refractory repairs, the boiler needs to be capable of being isolated from others that might be in service and connected to the same piping systems. This will require a double block and bleed isolation. Steam isolation valves are notorious for not isolating well when you need them to, so don’t be surprised if yours don’t either. If this is the case, you may need to take all the boilers and the entire steam system down for isolation.
Also, this is a time for extreme caution, as there have been many people hurt operating high-pressure, high-energy steam piping system valves.
Once isolated, the firebox and piping have to cool down enough for people with special confined space equipment and training to get inside and assess the scope of repairs. There may also have to be scaffolding erected within the boiler for access to the tubes. It’s also possible that tubes have special metallurgical requirements that need to be understood. Hopefully, the tubes you need with the right metallurgical chemistry are readily available. If they are available, they might need to be bent somewhere in a shop to precise requirements. Then, highly skilled welders with American Society of Mechanical Engineers (ASME) code repair qualifications have to do the welds. Once complete, and after inspections by jurisdictional authorities, you might be ready for a liquid pressure test to validate the repair. This pressure test process can take a day or two, even if it’s successful. It’s also possible that some refractory repair would need to take place at this time. Once the pressure test is complete, the boiler needs to be drained of the test water and then fired up and recommissioned slowly.
I checked in with my friend Buck Holt, president at NBW Inc. in Cleveland, who has 50 years of experience as a boiler repair contractor, to discuss the state of the world for boiler and tube repairs. He indicated a repair like I described above could take weeks. He also indicated that, in today’s world, getting specialty tube materials could take additional weeks over the pre-COVID world.
- Mitigating the Risk — You need a management process and reporting system that includes NDT work surrounding your most critical boilers. This should be part of an overall comprehensive mechanical integrity program for other parts of the boiler systems and high-energy piping. If you’re already operating an OSHA PSM plant, you should check to see that others “got the memo” when it comes to including the boiler systems. You also need to examine the fleet and decide which boilers, if any, should be in for a life expectancy study that includes significant deep dives into tube and drum conditions.
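One simple way to turn ultrasonic thickness readings into a number you can report is a linear remaining-life estimate: average wall loss per year between two surveys, projected forward to a minimum allowable thickness. The sketch below assumes a constant loss rate, and the thickness values and minimum are hypothetical; the actual minimum thickness and the validity of a linear projection are for your inspector or engineer to determine.

```python
def corrosion_rate(previous_in, current_in, years_between):
    """Average wall loss per year between two thickness surveys (inches/year)."""
    return (previous_in - current_in) / years_between


def remaining_life_years(current_in, minimum_in, rate_in_per_year):
    """Years until the wall reaches the minimum allowable thickness at the given rate."""
    if rate_in_per_year <= 0:
        return float("inf")  # no measurable wall loss between surveys
    return max(current_in - minimum_in, 0.0) / rate_in_per_year


# Hypothetical readings for one tube location (inches), surveys five years apart.
rate = corrosion_rate(previous_in=0.180, current_in=0.165, years_between=5)
life = remaining_life_years(current_in=0.165, minimum_in=0.120, rate_in_per_year=rate)
print(f"Wall loss: {rate:.4f} in/yr, estimated remaining life: {life:.1f} years")
```

A screening number like this does not replace a full life expectancy study, but it can help decide which boilers in the fleet deserve one first.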
One of the most prolific causes of boiler mechanical integrity issues is the mismanagement of water treatment. You should verify that quality control measures are in place for every step of this process. Some power plant facilities have even applied classical process safety layers of protection analysis (LOPA; see References) to this critical part of successful boiler operation. A third-party review of what is being accomplished here is also a possibility. It has been my experience that there’s not a lot of difference in the proprietary chemicals among suppliers. Many of them can work successfully. Instead, it’s the infrastructure (deaerator performance for removing dissolved oxygen, metering pumps for chemical additions, blowdown controls) and the training of staff that make the most difference in the results.
One of the other things you can do to protect yourself from downtime is to get a rental boiler in the case of an outage, which is unique in the fired equipment world, as I don’t know of anyone who rents process furnaces. It’s not pretty, and it’s sure not cheap, but it can save your operations. The timing on rental boilers works like this: You have an outage and find someone who has the capacity and the pressure rating you need (day or two). They truck it over and find a place to set it (two to three days). They then need access to feedwater, power, and fuel. The connections for these are often where serious time delays occur (probably at least a week, if you're not prepared). You can do yourself a big favor and plan for this contingency in advance. Well-prepared facility managers have planned for a place to set a rental and already have valved-in connections in case there’s a need. This can easily save a week of an outage if it’s planned and executed correctly.
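The rental boiler timeline above is easy to add up. The short Python sketch below sums the step durations for an unprepared site and a prepared one. The locate, delivery, and unprepared connection figures come from the ranges in the text; the one-day hookup for a prepared site is my illustrative assumption, not a measured figure.

```python
# Durations in days as (low, high), taken from the ranges in the text.
# The one-week connection figure is the "at least a week" estimate for an
# unprepared site, so it is treated here as a floor for both low and high.
LOCATE = (1, 2)
DELIVER_AND_SET = (2, 3)
CONNECT_UNPREPARED = (7, 7)
# Assumption for illustration: pre-installed valved connections cut hookup to about a day.
CONNECT_PREPARED = (1, 1)


def total(*steps):
    """Sum the low and high ends of a sequence of (low, high) step durations."""
    return (sum(s[0] for s in steps), sum(s[1] for s in steps))


unprepared = total(LOCATE, DELIVER_AND_SET, CONNECT_UNPREPARED)
prepared = total(LOCATE, DELIVER_AND_SET, CONNECT_PREPARED)
print(f"Unprepared site: {unprepared[0]} to {unprepared[1]} days")
print(f"Prepared site:   {prepared[0]} to {prepared[1]} days")
```

Swap in your own estimates for each step; the structure makes it obvious that the connection work is where advance planning pays off.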
Conclusion
In the process safety and risk management world, there are techniques for assigning order-of-magnitude probabilities to each of the obscure risks I identified: once a year, once in 10 years, once in 100 years, etc. You can apply industry experience with these kinds of failures, or maybe your own organization has some insights. In my opinion, the consequence side of this is a little more difficult. It’s not as straightforward as counting lost-time injuries and fatalities. It’s more about the possible catastrophic loss of a blast furnace because there is no fuel for the boilers that produce steam for “wind,” as it’s termed in the steel industry, or the failure of tubes in a critical hospital boiler during one of the world’s most critical medical emergencies. These are things that can shake the very core of a business and its reputation for many years to come.
I hope, if nothing else, this article has caused you to pause and think about getting these business continuity issues on a risk radar screen. Every organization has its own risk tolerance levels. In the case of organizations I have worked with that are conducting process hazard analyses (PHAs), when a concern has a low probability but the consequence is death, lots of mitigation steps are put in place to keep it from ever occurring. The consequences here are not necessarily injury or fatality related, but they sure are serious. I hope that after you’ve read this, you’ll reconsider the three obscure scenarios I’ve described and decide whether or not these are really within the risk tolerance level of your organization. Perhaps they’ve never even been considered or understood.
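For readers who want to see how the order-of-magnitude approach mentioned above can be put on a risk radar screen, here is a minimal Python sketch that multiplies an assumed event frequency by an assumed outage consequence and ranks the three scenarios. Every frequency, outage length, and dollar figure in it is a hypothetical placeholder; substitute industry experience or your own organization’s history.

```python
# All frequencies and outage estimates below are hypothetical placeholders;
# substitute industry experience or your own organization's history.
scenarios = [
    # (name, events per year, outage weeks per event)
    ("Underground gas piping leak", 0.01, 8),
    ("Refractory failure", 0.1, 4),
    ("Boiler tube failure", 0.1, 3),
]

COST_PER_OUTAGE_WEEK = 500_000  # assumed business-interruption cost, dollars


def expected_annual_loss(frequency, outage_weeks, cost_per_week=COST_PER_OUTAGE_WEEK):
    """Frequency times consequence, expressed in dollars per year."""
    return frequency * outage_weeks * cost_per_week


# Rank scenarios from highest to lowest expected annual loss.
ranked = sorted(
    scenarios,
    key=lambda s: expected_annual_loss(s[1], s[2]),
    reverse=True,
)
for name, freq, weeks in ranked:
    print(f"{name}: ${expected_annual_loss(freq, weeks):,.0f} per year")
```

A single dollar figure will understate consequences such as reputational damage, so treat a ranking like this as a way to start the conversation rather than as the answer.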
References:
- Huiling Li, Nan Jiang, and Bo Deng, “Research of Risk Assessment in the Boiler Water Treatment System Based on Layers of Protection Analysis,” Atlantic Press, 2014.