Editor's Note: This is the second article in a series about machinery health.

Potential to Functional Failure (P-F) curves graphically display the failure cycle and the measurement techniques that can detect a developing failure before the asset reaches functional failure. Proactive strategies should focus on managing assets high on the P-F curve, that is, early (P1 to P5) in the failure cycle (Figure 1). The ability to detect failures early in their development allows top quartile performers to manage their maintenance programs proactively by understanding the health of their assets. Many companies, however, find it difficult to operate proactively and instead continually react to assets that reach functional failure with little or no warning.

Figure 1. Understanding the Potential to Functional Failure (P-F) Curve

Understanding Asset Criticality and the Impact on Maintenance Strategies

Not all assets have the same failure consequences. Understanding how an asset's failure affects the business is essential to defining strategies that mitigate the impact.

First, understand that asset failures can impact various aspects of a business, including safety, environment, regulatory compliance, product quality, production, and operations and maintenance costs. Understanding what is important to the business helps in prioritizing and weighting the impact of failure. For example, a failure that results in a safety incident or death should be regarded as more critical than a failure that results in a poor-quality product. While both are important to understand and mitigate, addressing the safety impact takes precedence. In companies that are physical asset intensive, top performers assign a weighted value to each area of failure impact and develop comprehensive rankings of asset criticality. They then use this knowledge to apply appropriate failure mitigation strategies.
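
The weighting described above can be sketched in a few lines of Python. The impact areas come from the text; the weights, the 0 to 10 scoring scale and the asset names are illustrative assumptions, not values from this article or from any standard.

```python
# Hypothetical weights for each failure-impact area (they sum to 1.0).
# The areas follow the article; the numbers are assumptions.
WEIGHTS = {
    "safety": 0.30,
    "environment": 0.20,
    "regulatory": 0.15,
    "quality": 0.10,
    "production": 0.15,
    "cost": 0.10,
}


def criticality_score(impacts):
    """Return the weighted sum of 0-10 impact scores for one asset."""
    return sum(WEIGHTS[area] * impacts.get(area, 0) for area in WEIGHTS)


def rank_assets(assets):
    """Return asset names sorted from most to least critical."""
    return sorted(assets, key=lambda name: criticality_score(assets[name]), reverse=True)


# Hypothetical assets, each scored 0 (no impact) to 10 (severe impact).
assets = {
    "sump_pump": {"safety": 1, "environment": 2, "regulatory": 1,
                  "quality": 0, "production": 1, "cost": 2},
    "boiler_feed_pump": {"safety": 9, "environment": 6, "regulatory": 7,
                         "quality": 3, "production": 9, "cost": 8},
    "cooling_tower_fan": {"safety": 3, "environment": 2, "regulatory": 2,
                          "quality": 4, "production": 6, "cost": 5},
}

ranking = rank_assets(assets)
for name in ranking:
    print(f"{name}: {criticality_score(assets[name]):.2f}")
```

Because safety carries the largest weight in this sketch, an asset with a high safety impact rises in the ranking even when its quality impact is modest, which mirrors the precedence described in the text.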

Figure 2

Figure 2 shows an example of a distribution of assets (%) by criticality and what asset strategies (see below) might be applied. Understanding asset criticality and business goals is key to applying the right strategy.

Maintenance Strategies

Reliability Centered Maintenance (RCM) is a systematic, disciplined process to ensure safety and mission compliance. RCM defines system boundaries and identifies system functions, functional failures and likely failure modes for equipment and structures in a specific operating context. Because it is a time- and resource-intensive process, RCM is typically applied only to the top 15 percent of assets (the exact threshold is defined as part of the strategy), namely the highly critical and critical assets.

Complete Failure Modes and Effects Analyses (FMEAs) are developed for about 55 percent of the assets, specifically those of mid to low criticality. This technique develops a unique FMEA for each asset. Developing FMEAs requires less time and fewer resources than applying RCM, yet still drives the mitigations and controls that address the asset's failure modes.

Predefined FMEA or maintenance templates specific to an asset class are a strategy that suits about 25 percent of the assets, primarily those identified as having low criticality.

Run-to-failure (RTF) strategies are typically applied to the bottom 5 percent of the assets in terms of criticality, primarily those where it is acceptable for failures to occur without prior warning. The most cost-effective approach in these cases is to simply replace and not maintain the asset.

The percentages above and in Figure 2 are based on experience; the actual split will ultimately depend on the specific criticality-based maintenance strategy.
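
As a rough illustration, the example distribution can be turned into a lookup that assigns a strategy to each asset once the assets are ranked by criticality. The cut-offs are the experience-based percentages quoted above; the function name, the asset names and the assumption that a ranked list already exists are hypothetical.

```python
from collections import Counter

# Cumulative cut-offs from the example distribution in the text:
# top 15% RCM, next 55% FMEA, next 25% templates, bottom 5% run-to-failure.
STRATEGY_BANDS = [
    (0.15, "RCM"),
    (0.70, "FMEA"),
    (0.95, "Template"),
    (1.00, "RTF"),
]


def assign_strategies(ranked_assets):
    """Map assets (ordered most to least critical) to a strategy."""
    total = len(ranked_assets)
    plan = {}
    for position, asset in enumerate(ranked_assets, start=1):
        fraction = position / total  # share of assets at or above this rank
        for upper_bound, strategy in STRATEGY_BANDS:
            # Small tolerance guards against floating-point rounding at a boundary.
            if fraction <= upper_bound + 1e-9:
                plan[asset] = strategy
                break
    return plan


# Twenty hypothetical assets, already ranked from most to least critical.
ranked = [f"asset_{i:02d}" for i in range(1, 21)]
plan = assign_strategies(ranked)
counts = Counter(plan.values())
print(counts)
```

A site with different criticality thresholds would only need to change the values in STRATEGY_BANDS.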

Condition Monitoring Methodologies

Choosing the most effective asset failure mitigation strategy will come from the application of RCM, FMEA, templates and RTF strategies. Using these results to drive the right Condition Monitoring (CM) and Predictive Maintenance (PdM) technologies is key to understanding asset health and essential to optimizing return on investment. Companies that are physical asset intensive should strive to have the correct mix of preventive, predictive, proactive and run-to-failure maintenance strategies (Figure 3) for managing their assets.

Figure 3

The appropriate CM/PdM strategy will vary depending on the failure mode and on how long a failure takes to progress from the point of detection to functional failure, which is known as the failure cycle. The failure cycle is graphically displayed on the asset's P-F curve (Figure 1). CM programs can be as basic as collecting and analyzing periodic oil samples or monitoring the process variables (pressure, flow, etc.) available for the asset.

While these approaches may be sufficient for some assets, certain failure modes require additional data and shorter collection intervals to detect changes in condition and to proactively manage and possibly prevent the failure. For some failure modes, periodic data collection (at intervals ranging from once a week to once every six months, based upon the failure cycle) can detect failures with sufficient time to plan the required maintenance. Individual failure modes, failure consequences, failure detectability and the lead time in predicting functional failure help in determining whether to use continuous online, scanning or portable data collection, and at what frequency.
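
One way to relate the collection interval to the failure cycle is sketched below. It assumes the widely quoted rule of thumb that the interval should be no more than half the P-F interval; that rule, the function name and the 182-day stand-in for six months are assumptions made for illustration and do not come from this article. The one-week and six-month limits are the periodic range mentioned above.

```python
from datetime import timedelta

# Range of periodic collection intervals mentioned in the text.
ONE_WEEK = timedelta(weeks=1)
SIX_MONTHS = timedelta(days=182)  # approximation used for this sketch


def route_interval(pf_interval, fraction=0.5):
    """Suggest a periodic collection interval for a given P-F interval.

    Returns None when the suggested interval is shorter than one week,
    meaning periodic collection is unlikely to catch the failure in time
    and scanning or continuous monitoring should be considered instead.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    interval = pf_interval * fraction
    if interval < ONE_WEEK:
        return None
    return min(interval, SIX_MONTHS)


print(route_interval(timedelta(days=90)))   # slow-developing failure mode
print(route_interval(timedelta(days=3)))    # fast-developing failure mode
```

The fraction parameter is left adjustable because the acceptable margin depends on failure consequences and on how long it takes to plan the repair.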

Under-maintaining or under-instrumenting a highly critical asset might keep planned costs low, but may also result in poor reliability, high reactive maintenance (RM) costs, poor asset performance and unacceptably high overall risk to the business. Conversely, over-maintaining or over-instrumenting a non-critical asset will incur higher-than-necessary planned costs compared to the level of risk reduction that can be achieved. The optimum level of investment targets the right assets with the right mix of planned maintenance, resources and technology, thereby reducing asset risk to a tolerable level at manageable planned costs. Optimizing strategy over time (dotted lines) will result in the right risk reduction while balancing costs (Figure 4).

Figure 4

Typically, highly critical and many critical assets need more frequent sampling and require an on-line system. On-line systems sample data continuously and often offer automated relays and shutdown systems or automated alarms to address failures with short failure cycles and high failure consequences. In addition to these traditional on-line API 670 compliant protection systems, scanning systems (both traditional wired and newer wireless scanning) can be applied to assets of critical, mid-level and some low-level criticality whose failure cycles allow for proper detection by the applied technology. Advances in technology have made wireless scanning a more viable CM option than in the past. While wired systems have a higher bandwidth and can provide dynamic waveform data more frequently than wireless systems, wireless systems have a much lower installation cost. Wireless scanning systems are appropriate predominantly on assets with detection-to-failure cycles of greater than two hours. In addition, scanning systems should only be used on assets where continuous machinery protection (immediate, automated shutdown) is not appropriate. The most critical assets may require continuous monitoring for a combination of reasons, chief among them that many of their failure modes can occur rapidly, resulting in catastrophic damage to the asset.
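
The selection logic in the preceding paragraph can be summarized as a small decision function. The two-hour threshold for wireless scanning and the rule that scanning is reserved for assets that do not need continuous protection come from the text. Preferring wireless whenever both scanning options qualify (because of its lower installation cost) and falling back to wired scanning for shorter cycles are simplifying assumptions of this sketch, as are the function and parameter names.

```python
from datetime import timedelta

# Wireless scanning is described as suitable predominantly for
# detection-to-failure cycles longer than two hours.
WIRELESS_MIN_CYCLE = timedelta(hours=2)


def select_monitoring(needs_protection, failure_cycle):
    """Pick a monitoring approach for one asset and failure mode.

    needs_protection: True when immediate, automated shutdown is required.
    failure_cycle: time from detection to functional failure.
    """
    if needs_protection:
        # Scanning should not replace continuous machinery protection.
        return "continuous online protection"
    if failure_cycle > WIRELESS_MIN_CYCLE:
        # Assumption: prefer the lower installation cost when both qualify.
        return "wireless scanning"
    # Assumption: shorter cycles need the faster updates of a wired system.
    return "wired scanning"


print(select_monitoring(True, timedelta(minutes=5)))
print(select_monitoring(False, timedelta(days=3)))
print(select_monitoring(False, timedelta(hours=1)))
```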

It is always beneficial to identify asset failures early so that parts may be ordered and maintenance can be planned at the right time to optimize production and reduce costs. A continuous monitoring program may also use thermal performance data (such as pressures, temperatures and flows) to continuously analyze and optimize the asset to maximize production efficiency.

Proactive-Centered Maintenance (PCM)

Many companies spend time reacting to asset failures (RM) because they lack a proactive strategy, the resources to carry one out, or both. More advanced maintenance strategies incorporate PdM or CM without thoroughly analyzing asset criticality or the consequences of failure. While these strategies may be effective, both have limitations (see below).

Shortcomings from relying solely on Preventive Maintenance:

  • Not based on failure modes
  • OEM requirements are time-based, not condition-based
  • Not enough detail in PM tasks to be value added
  • Too many inappropriate tasks
  • Asset unavailability during outage can outweigh reliability gains
  • Lack of management focus toward optimizing the PM program
  • Tasks not grouped to leverage efficient execution
  • Intrusive PM tasks performed regardless of asset condition
  • Not quantitative
  • Inadequate PMs rarely removed from the PM program

Shortcomings from relying on Predictive Maintenance without conducting RCM:

  • Not based on failure modes
  • Not integrated with other reliability tools (PM, Lean Six Sigma, etc.)
  • Personnel not properly trained on all technologies
  • Focus on data collection and not enough on driving proactive actions
  • Output not tied to work flow
  • Training is inadequate for the program expectations
  • Overdependence on one technology
  • Maintenance not focused on maintaining asset health

Top performers also apply Root Cause Failure Analysis (RCFA). The business defines the criteria for conducting RCFA based on business strategies and goals. This typically includes failure consequences, failure modes and suitability of the current failure mitigation strategies, including PdM, PM, and RTF.

RCFA helps drive Proactive-Centered Maintenance (PCM), which optimizes the use of all of the above maintenance and CM methodologies. It emphasizes the right maintenance on the right assets at the right time. In most cases, a PCM approach increases the use of PdM, while continuing to use PM and limiting RM to assets with no failure consequences. However, PCM also emphasizes improving procedures, operating parameters, processes and designs to limit or prevent recurring failures, reduce asset failures and extend the mean time between asset failures. This can result in up to a 42 percent reduction in maintenance costs when compared to PM and up to a 59 percent reduction when compared to RM. In addition, a PCM program can reduce RM to 20 percent or less of the total time dedicated to maintenance, while 80 percent or more of the effort is spent on predictive and preventive maintenance and on process, procedural and design improvements (see Figure 3).
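
To put these figures in concrete terms, the sketch below applies the quoted "up to" reductions to a hypothetical annual maintenance budget and checks a hypothetical breakdown of maintenance hours against the 20 percent RM target. The percentages come from the text; the budget, the hour figures and the function names are invented for illustration, and the reductions are best-case bounds rather than expected results.

```python
# Best-case cost reductions quoted in the text, keyed by the strategy
# that PCM is being compared against.
MAX_REDUCTION = {"PM": 0.42, "RM": 0.59}


def best_case_pcm_cost(current_cost, current_strategy):
    """Lowest maintenance cost implied by the quoted 'up to' reduction."""
    return current_cost * (1 - MAX_REDUCTION[current_strategy])


def reactive_share(hours_by_type):
    """Fraction of total maintenance hours spent on reactive work."""
    total = sum(hours_by_type.values())
    if total == 0:
        raise ValueError("no maintenance hours recorded")
    return hours_by_type.get("RM", 0) / total


def meets_pcm_target(hours_by_type, limit=0.20):
    """True when reactive work is at or below the 20 percent target."""
    return reactive_share(hours_by_type) <= limit


# Hypothetical figures for illustration only.
annual_cost = 1_000_000
hours = {"RM": 150, "PM": 350, "PdM": 400, "improvement": 100}

print(round(best_case_pcm_cost(annual_cost, "PM")))
print(round(best_case_pcm_cost(annual_cost, "RM")))
print(reactive_share(hours), meets_pcm_target(hours))
```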

Implementing proactive measures resulting from RCM can help reduce costs and risks. Maintaining assets proactively and implementing design, process and procedural changes can reduce the probability of failure.

Conclusion

Top quartile performers incorporate PdM and CM on a majority of their assets as a means of proactively addressing asset condition and mitigating the risk of failure. Following an RCM analysis, the FMEA will typically show that 80 percent of non-RM tasks require some form of CM, while fewer than 20 percent require time-based preventive maintenance tasks.

Companies that drive proactive strategies learn from past failures, successes and the results of RCM and FMEA analyses, and can use that knowledge to redesign processes, procedures and engineering designs to help reduce failures. Moving from reactive to predictive to proactive maintenance requires operational effectiveness from the entire organization. A proactive approach will ensure that maintenance dollars are spent on the right assets at the right time and that assets are available when needed to meet production demands.

In the third article in this series, we will discuss the use of wireless systems as part of a Condition Based Maintenance program and how this technology enhances a PCM strategy.

Pumps and Systems, July 2009
