Reducing risk of failure by replacing old equipment?

We all do it: look at something and say “it’s old—it’s time to be replaced.” In engineering and maintenance we talk about useful life: we base capital replacement plans on it, we base condition on life expectancy, and we view the reliability of our assets through age-coloured glasses. The question is: where do these concepts come from? In many cases useful or expected life is based on averages, manufacturers’ recommendations, or recommendations from the design engineers. It turns out that in many cases, the probability of your asset failing is likely to be higher when it is brand new, rather than when it is old.

Does age really equal increased risk of failure?

First, a little background. At United Airlines in the 1970s, the Director of Maintenance Analysis, Stanley Nowlan, and the Manager of Maintenance Program Planning, Howard Heap, carried out maintenance research that classified all the failures they found into six patterns, which they called age-reliability patterns. The US Department of Defense sponsored the publication of their findings in a work titled Reliability-centered Maintenance, which revolutionized aviation maintenance.

Other industries have tapped into this knowledge, but many of us still assume that linear wear is the dominant way that our assets fail. That is certainly true for some assets, but not for all. Nowlan and Heap’s six patterns actually showed that 68% of the assets fell into their F pattern, known as infant mortality, where there is a high conditional probability of failure early on, which then drops to a steady (i.e. random) or slightly increasing probability of failure (see F Curve below).

Moreover, more than 80% of all the failures were not based on wear-out. In the words of the original RCM work: “...their performance could not be improved by the imposition of an age limit.” (Nowlan & Heap 1978, p. 47) For more on the six failure (or age-reliability) patterns, see our blog post introducing Reliability-centred Maintenance.

Case study: engine oil

Let’s suppose you drive a car, and it runs synthetic oil. You are supposed to change the oil every 10,000 km or six months—whichever comes first. The oil itself, as a consumable, likely belongs in the B pattern of failure, which is a typical wear-out pattern.

Now let’s say that every time you change the oil, your oil system has a high initial risk of failure (wrong oil, wrong filter, stuck filter gasket, wrong amount of oil…). This fits the F pattern of failure (infant mortality—discussed previously).

So when you combine the high initial probability of failure of the oil system with the wear-out characteristics of the oil, the overall probability of failure fits the A pattern (the bathtub curve): the combination of F and B.
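To make the shape concrete, here is a minimal sketch of how an F-like and a B-like pattern can add up to a bathtub. Every number in it (`BASE`, `EXTRA`, `TAU`, `ONSET`, `RAMP`) is an illustrative assumption rather than measured data, and the function names are hypothetical. It treats the two failure modes as independent, so their failure rates simply add, with age measured in deciles of the oil-change interval.

```python
import math

# All numbers below are illustrative assumptions, not measured data.
BASE = 0.01    # steady "random" failure rate per decile
EXTRA = 0.10   # extra rate right after an oil change (wrong oil, bad gasket)
TAU = 1.0      # how quickly the post-change risk fades, in deciles
ONSET = 8.0    # decile at which the oil is assumed to start wearing out
RAMP = 0.05    # how sharply wear-out risk grows past ONSET


def hazard_f(t):
    """F-like pattern: high just after the change, settling to a steady level."""
    return BASE + EXTRA * math.exp(-t / TAU)


def hazard_b(t):
    """B-like pattern: no added risk until ONSET, then rising with age."""
    return RAMP * max(0.0, t - ONSET) ** 2


def hazard_a(t):
    """Rates of independent failure modes add, giving the bathtub shape."""
    return hazard_f(t) + hazard_b(t)


# Print the three curves at each decile of the oil-change interval.
for decile in range(0, 11):
    print(f"decile {decile:2d}: F={hazard_f(decile):.4f} "
          f"B={hazard_b(decile):.4f} A={hazard_a(decile):.4f}")
```

With these assumed numbers, the combined rate is highest right after the change, bottoms out around decile 8, and climbs again afterwards.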

If 10,000 km is the start of the rise of failure (shown at decile 8 below), then getting your oil changed before that means that you are increasing your risk of failure. How’s that work? By changing your oil early (for instance at decile 4), you are incurring the risk associated with the F pattern sooner than required, increasing overall failure risk. Consider your positions on the failure pattern when you get your oil changed and where you will be after:

Now there is a serious caveat here: if you run your oil way over and it stops performing its functions, that’s entirely your fault. The oil follows a wear-out pattern, but the system incurs infant-mortality failure risks. In this case we are talking only about the probability of a failure occurring, not about financial risk or liability.
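The early-change argument can be put into rough numbers. The sketch below is a simplified model, not a calculation from Nowlan and Heap: it assumes each oil change restarts the infant-mortality risk, that any failure in between is fixed without resetting the clock, and that the rates (`BASE`, `EXTRA`, `TAU`, `ONSET`, `RAMP`) are made-up illustrative values. Under those assumptions, the expected number of failures per decile driven is the accumulated failure rate over one change interval divided by the length of that interval.

```python
import math

# Illustrative assumptions only; time is in deciles of the recommended interval.
BASE, EXTRA, TAU, ONSET, RAMP = 0.01, 0.10, 1.0, 8.0, 0.05


def cumulative_hazard(t):
    """Expected failures accumulated from an oil change up to age t."""
    infant = EXTRA * TAU * (1.0 - math.exp(-t / TAU))  # fading post-change risk
    steady = BASE * t                                   # constant random risk
    wear = RAMP * max(0.0, t - ONSET) ** 3 / 3.0        # wear-out past ONSET
    return infant + steady + wear


def failures_per_decile(interval):
    """Long-run failure rate if the oil is changed every `interval` deciles."""
    return cumulative_hazard(interval) / interval


for interval in (4, 6, 8, 10):
    print(f"change every {interval:2d} deciles: "
          f"{failures_per_decile(interval):.4f} expected failures per decile")
```

In this toy model, changing at decile 4 gives a higher long-run failure rate than changing at decile 8, because the post-change risk is paid twice as often while no wear-out is avoided. Running on to decile 10 is also worse, because wear-out has started to bite.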

Back to your million-dollar facility

So, let’s back away from the oil in your car and talk about the million-dollar facility at which you are the maintenance manager or administrator. Nowlan and Heap found that wear-out is very often associated with simple or consumable items, but pronounced wear-out was found in only six percent of the assets in their study. Conversely, complex systems more often showed high infant mortality, where the conditional probability of failure “showed no marked point of increase with increasing age; the failure probability may increase gradually or remain constant, but there is no age that can be identified as the beginning of a wear-out zone.” (Nowlan and Heap 1978, p.48)

Take a look at the F pattern again, the one below. Now consider if a large, complex, expensive asset in your care fits this pattern. Now consider whether your maintenance or capital replacement plan calls for it to be replaced based on age-related factors. Does that still make sense? Think about the oil, but without the wear-out. Are you increasing both your expenditure and your risk of failure by prematurely exposing your assets to infant-mortality risks, when they actually aren’t going to experience any wear-out related failure rate increases? Nowlan and Heap identified this problem: “In fact, in many cases scheduled overhaul actually increases the overall failure rate by introducing a high infant-mortality rate in an otherwise stable system.” (Nowlan and Heap 1978, p.48)
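The same reasoning can be sketched for an asset that follows the F pattern with no wear-out at all. This is a hypothetical illustration under stated assumptions: the rates (`BASE`, `EXTRA`, `TAU`) and the 40-year planning horizon are invented, each replacement is assumed to restart the infant-mortality period, and failures between replacements are assumed to be repaired without making the asset "new" again.

```python
import math

# Illustrative assumptions only; time is in years.
BASE = 0.02      # steady random failure rate per year
EXTRA = 0.30     # extra failure rate immediately after installation
TAU = 1.0        # years for the infant-mortality risk to fade
HORIZON = 40.0   # planning horizon in years


def failures_per_cycle(interval):
    """Expected failures between one replacement and the next (no wear-out)."""
    return BASE * interval + EXTRA * TAU * (1.0 - math.exp(-interval / TAU))


def expected_failures(interval, horizon=HORIZON):
    """Expected failures over the horizon when replacing every `interval` years."""
    cycles = horizon / interval
    return cycles * failures_per_cycle(interval)


for interval in (5.0, 10.0, 20.0, 40.0):
    print(f"replace every {interval:4.0f} years: "
          f"{expected_failures(interval):.2f} expected failures in {HORIZON:.0f} years")
```

Because nothing in this model gets worse with age, every extra replacement only adds another infant-mortality period, so the shortest replacement interval produces the most expected failures and the longest produces the fewest.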

In an ideal asset management world, we would classify all assets into their failure patterns and base our risk management plans on those patterns. Sometimes that is hard to do, because the failure data often aren’t available: they weren’t collected in the past. But those data are an integral part of a complete asset management philosophy; you should think about collecting them now, for the benefit of your community’s future.

So, can you reduce risk by replacing old equipment? You decide.

When you increase your efficiency, everyone prospers.
