Article
PDF Available

Abstract

As component engineering has progressively advanced over the past 20 years to encompass a robust element of reliability, a paradigm shift has occurred in how complex systems fail. While failures used to be dominated by ‘component failures,’ failures are now governed by other factors such as environmental factors, integration capability, design quality, system complexity, and built-in testability. Of these factors, environmental factors are some of the most difficult to predict and assess. While test regimes typically encompass environmental factors, significant design changes to the system to mitigate any potential failures are unlikely to occur at that stage due to cost. The early stages of the systems engineering design process offer a significant opportunity to evaluate and mitigate risks due to environmental factors. Systems that are expected to operate in a dynamic and changing environment pose significant challenges for assessing environmental factors. For example, external failure initiating event probabilities may change with respect to time, and newly discovered external initiating events may also be expected to have varying probabilities of occurrence with respect to time. While some industry standard methods such as Probabilistic Risk Assessment (PRA) [3] and Failure Modes and Effects Analysis (FMEA) [4] can partially address a time-dependent external initiating event probability, current methods of analyzing system failure risk during conceptual system design cannot. We have developed the Time Based Failure Flow Evaluator (TBFFE) to address the need for a risk analysis tool that can account for variable probabilities of initiating events over the duration of a system’s operation. This method builds upon the Function Based Engineering Design (FBED) [19] method of functional modeling and the Function Failure Identification and Propagation (FFIP) [9] failure analysis method that is compatible with FBED. Through the development of TBFFE, we have found that the method can provide significant insights into a design that is to be used in an environment with variable-probability external initiating events. We present a case study of the conceptual design of a nuclear power plant’s spent fuel pool experiencing a variety of external initiating events whose probabilities vary with the time of year. The case study illustrates the capability of TBFFE by identifying how seasonally variable initiating event occurrences can impact the probability of failure on a monthly timescale in ways that would not be seen on a yearly timescale. Changing the design helps to reduce the impact that time-varying initiating events have on the monthly risk of system failure.
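To make the idea of time-varying initiating events concrete, the following minimal sketch (not the authors' TBFFE implementation) rolls hypothetical seasonal event rates up to monthly and yearly failure probabilities. The event names, monthly rates, and conditional failure probabilities are all invented for illustration.

```python
import math

# Hypothetical monthly occurrence rates (events per month) for two seasonally
# varying external initiating events; values are illustrative, not case study data.
monthly_rates = {
    "severe_storm":  [0.02, 0.02, 0.05, 0.08, 0.10, 0.12,
                      0.12, 0.10, 0.08, 0.05, 0.02, 0.02],
    "grid_blackout": [0.06, 0.05, 0.03, 0.02, 0.02, 0.04,
                      0.08, 0.09, 0.04, 0.02, 0.03, 0.06],
}

# Assumed conditional probability that an initiating event propagates to
# system failure (e.g., loss of spent fuel pool cooling); hypothetical values.
p_fail_given_event = {"severe_storm": 0.15, "grid_blackout": 0.25}

def monthly_failure_probability(month: int) -> float:
    """P(at least one failure-causing initiating event occurs in the month)."""
    p_none = 1.0
    for event, rates in monthly_rates.items():
        lam = rates[month] * p_fail_given_event[event]  # Poisson rate of failure-causing events
        p_none *= math.exp(-lam)
    return 1.0 - p_none

monthly = [monthly_failure_probability(m) for m in range(12)]
yearly = 1.0 - math.prod(1.0 - p for p in monthly)

print(f"peak monthly risk : {max(monthly):.4f}")
print(f"mean monthly risk : {sum(monthly) / 12:.4f}")
print(f"yearly risk       : {yearly:.4f}")
```

Averaging over the year hides the peak-month risk, which is the effect the case study highlights on the monthly timescale.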
... Common techniques of identifying failure risks and then mitigating them such as failure mode and effects analysis [6] and probabilistic risk assessment (PRA) [7,8] can miss emergent system behaviors and, while some information is provided to designers to aid in decision-making, little guidance is given on specific flow impacts due to failure events. Extensive work has been done to understand failure paths from a component and/or functional basis [9][10][11][12][13][14][15][16] but comparatively little effort has been expended in looking at flows of material, energy, and data through systems, and how their disruption or failure can impact overall system failure. ...
... In the context of this work, a failure flow is defined as a flow that is either unexpectedly present or unexpectedly absent. The inherent behavior in functional models (IBFM) framework extends FFIP to include the ability to generate multiple functional models to drive toward a solution that balances the cost and risk of a system, as well as a pseudo time-step [16,68,69]. A number of other risk and failure analysis tools have been developed from FFIP, including the uncoupled failure flow state reasoner [11,70], a method of building prognostic systems in response to failure modeling [12], and other related methods and tools [13,14,71–73]. ...
... For instance, failure flow can be defined in the context of a failure moving between components or functions [106]. Failure flow can also be defined as a flow that is too high or too low [12], as a transient non-nominal condition in a flow that causes a steady-state failure in a function [14,16], as the reversal of a flow [107], or as a failure that jumps between functions without following a nominal flow path [11]. A more expansive definition of failure flow may be useful in expanding the capabilities of the method presented in this article. ...
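As an illustration only, the failure-flow definitions cited above could be captured as a small taxonomy for use in a simulator; the class and field names below are hypothetical and not part of any of the cited methods.

```python
from dataclasses import dataclass
from enum import Enum, auto

class FailureFlowKind(Enum):
    """Failure-flow definitions drawn from the cited literature."""
    UNEXPECTED_PRESENCE = auto()  # flow present where none is expected
    UNEXPECTED_ABSENCE  = auto()  # expected flow missing
    OUT_OF_RANGE        = auto()  # flow too high or too low [12]
    TRANSIENT           = auto()  # transient non-nominal flow causing steady-state failure [14,16]
    REVERSAL            = auto()  # flow direction reversed [107]
    UNCOUPLED_JUMP      = auto()  # failure jumps between functions off the nominal path [11]

@dataclass
class FailureFlow:
    kind: FailureFlowKind
    source_function: str          # function where the failure flow originates
    flow_type: str                # e.g., "electrical energy", "thermal energy"

# Example: an uncoupled failure flow jumping off the nominal path from a pump function.
ff = FailureFlow(FailureFlowKind.UNCOUPLED_JUMP, "transport liquid", "thermal energy")
print(ff)
```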
Article
Full-text available
A challenge systems engineers and designers face when applying system failure risk assessment methods such as probabilistic risk assessment (PRA) during conceptual design is their reliance on historical data and behavioral models. This paper presents a framework for exploring a space of functional models using graph rewriting rules and a qualitative failure simulation framework that presents information in an intuitive manner for human-in-the-loop decision-making and human-guided design. An example is presented wherein a functional model of an electrical power system testbed is iteratively perturbed to generate alternatives. The alternative functional models suggest different approaches to mitigating an emergent system failure vulnerability in the electrical power system's heat extraction capability. A preferred functional model configuration that has a desirable failure flow distribution can then be identified. The method presented here helps systems designers to better understand where failures propagate through systems and guides modification of system functional models so that systems fail in ways with more desirable characteristics.
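A toy sketch of the graph-rewriting idea described in the abstract: the functional model is held as a directed graph, and a single rewrite rule inserts a redundant function on a chosen flow to produce an alternative model. The rule, function names, and flow labels are hypothetical, not the paper's rule set.

```python
# Toy functional model as a directed graph: function -> list of (flow, downstream function).
model = {
    "convert chemical energy": [("electrical energy", "distribute electrical energy")],
    "distribute electrical energy": [("electrical energy", "extract heat")],
    "extract heat": [],
}

def insert_parallel_function(model, upstream, downstream, flow, new_function):
    """Hypothetical rewrite rule: add a redundant parallel path for one flow."""
    rewritten = {f: list(edges) for f, edges in model.items()}
    rewritten.setdefault(new_function, []).append((flow, downstream))
    rewritten[upstream].append((flow, new_function))
    return rewritten

alternative = insert_parallel_function(
    model,
    upstream="distribute electrical energy",
    downstream="extract heat",
    flow="electrical energy",
    new_function="store electrical energy",
)
for function, edges in alternative.items():
    print(function, "->", edges)
```

Iterating such rules over a seed model generates the space of alternative functional models that a human designer can then screen with the qualitative failure simulation.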
... Work over the last decade has focused on a family of methods based around the function failure identification and propagation (FFIP) method [88] and the companion flow state logic (FSL) method [89]. The FFIP family of methods has been expanded to examine how prognostics and health management systems can be designed during system architecture [90], how failure flows may jump between systems using the uncoupled failure flow state reasoner (UFFSR) [91], how to protect against uncoupled failure flows within a system [92], how systems can deal with a variety of unanticipated external initiating events in a SoS [23], and several other important advances [93][94][95][96][97][98][99][100]. We use the FFIP family of methods extensively throughout the research in this paper. ...
... In this context, a cut-set is defined as the path that a failure flow travels from the initial failure event to its exit from the system as a spurious failure flow emission. This definition is in line with how cut-sets have been used in recent FFIP-related research [23,95,98,99] and is similar to how cut-sets are defined in the PRA literature [60]. A table of system-level failure flows is generated from this step. ...
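The cut-set definition above can be illustrated with a small sketch that enumerates every failure-flow path from an initiating event to the system boundary; the failure graph and function names are hypothetical.

```python
# Directed failure-flow graph: function -> downstream functions the failure can reach.
# "EXIT" marks a spurious failure flow emission across the system boundary.
failure_graph = {
    "import electrical energy": ["distribute electrical energy"],
    "distribute electrical energy": ["extract heat", "EXIT"],
    "extract heat": ["EXIT"],
}

def enumerate_cut_sets(graph, initiating_event):
    """Depth-first enumeration of all failure-flow paths from the initiating
    event to the system boundary; each complete path is one cut-set here."""
    cut_sets, stack = [], [[initiating_event]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == "EXIT":
            cut_sets.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid revisiting functions (no cycles in a path)
                stack.append(path + [nxt])
    return cut_sets

for cs in enumerate_cut_sets(failure_graph, "import electrical energy"):
    print(" -> ".join(cs))
```

The resulting list of paths is the table of system-level failure flows referred to above.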
Conference Paper
Full-text available
Increasingly tight coupling and heavy connectedness in systems of systems (SoS) presents new problems for systems designers and engineers. While the failure of one system within a SoS may produce little collateral damage beyond a loss in SoS capability, a highly interconnected SoS can experience significant damage when one member system fails in an unanticipated way. It is therefore important to develop systems that are “good neighbors” with the other systems in a SoS by failing in ways that do not further degrade a SoS’s ability to complete its mission. This paper presents a method to (1) analyze a system for potential spurious emissions and (2) choose mitigation strategies that provide the best return on investment for the SoS. The method is suited for use during the system architecture phase of the system design process. A functional and flow approach to analyzing spurious emissions and developing mitigation strategies is used in the method. Use of the method may result in a system that causes less SoS damage during a failure event.
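A sketch of the "best return on investment" selection step described above, ranking mitigation strategies by expected reduction in SoS damage per unit cost; the options, costs, probabilities, and damage values are invented for illustration.

```python
# Hypothetical mitigation options for one spurious emission from a member system.
baseline = {"p_emission": 0.10, "damage": 500.0}  # expected SoS damage = 50.0
mitigations = {
    "shielded enclosure":   {"cost": 20.0, "p_emission": 0.02, "damage": 500.0},
    "emission interlock":   {"cost": 35.0, "p_emission": 0.01, "damage": 500.0},
    "damage-tolerant sink": {"cost": 10.0, "p_emission": 0.10, "damage": 300.0},
}

def roi(option):
    """Expected SoS damage avoided per unit of mitigation cost."""
    baseline_risk = baseline["p_emission"] * baseline["damage"]
    residual_risk = option["p_emission"] * option["damage"]
    return (baseline_risk - residual_risk) / option["cost"]

for name, option in sorted(mitigations.items(), key=lambda kv: roi(kv[1]), reverse=True):
    print(f"{name:22s} risk reduction per unit cost = {roi(option):.2f}")
```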
... One method is to use a hierarchical Bayesian analysis of the available data to estimate the failure frequency of tubes (Wang, Pandey, & Riznic, 2010). In this paper, due to the limited number of failures, the authors determined that a parametric analysis is necessary even with the potential for high computational expense (Dempere, Papakonstantinou, O'Halloran, & Van Bossuyt, 2018). For the parametric analysis, a model-based systems engineering action diagram is developed to model the time-dependent probability of reaching the leak state. ...
Article
Full-text available
Condensers are critical to the operation of naval vessels that utilize the Rankine cycle for propulsion. Eddy current analysis is a nondestructive evaluation of the integrity of seawater tubes in condensers. Defects significant enough to be expected to allow seawater to leak into the steam side of the condenser prior to the next inspection are identified and plugged. In this paper, the interval between eddy current inspections is determined with a known probability of a tube leak occurring prior to the next inspection based on the results of past inspections. Ship maintainers will be able to optimize the inspection periodicity, thus reducing life-cycle maintenance costs within an acceptable risk. Condenser tube degradation is modeled along with eddy current inspection accuracy to determine the probability of a defect growing to a leak. A case study is presented that evaluates the impacts of inspection frequency and tube-plugging limit on the probability of a leak.
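A toy Monte Carlo sketch of the trade described above: an assumed defect-growth model combined with an imperfect eddy current inspection yields the probability of a tube leak before the next inspection as a function of the inspection interval. The wall thickness, plugging limit, probability of detection, and growth parameters are hypothetical, not the paper's model.

```python
import random

random.seed(0)

WALL = 1.0        # normalized wall thickness (leak when defect depth reaches this)
PLUG_LIMIT = 0.6  # defects deeper than this are plugged if detected
POD = 0.8         # assumed probability of detection for a pluggable defect

def leak_probability(interval_months, trials=20_000):
    """Monte Carlo estimate of P(a tube leaks before the next inspection)."""
    leaks = 0
    for _ in range(trials):
        depth = random.uniform(0.0, 0.8)              # defect depth at current inspection
        detected = depth > PLUG_LIMIT and random.random() < POD
        if detected:
            continue                                   # tube plugged, cannot leak
        growth_rate = random.uniform(0.0, 0.02)        # assumed depth growth per month
        if depth + growth_rate * interval_months >= WALL:
            leaks += 1
    return leaks / trials

for interval in (12, 24, 36, 48):
    print(f"{interval:2d} month interval -> P(leak) = {leak_probability(interval):.4f}")
```

Sweeping the interval (and the plugging limit) in a model like this is the kind of trade the case study evaluates against an acceptable risk threshold.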
... The Dedicated Failure Flow Arrestor Function (DFFAF) method replicates placing physical barriers between redundant systems to prevent a failure in one system from crossing an air gap to the other system [13]. Other methods such as Function Flow Decision Functions (FFDF) [14], a method of developing prognostic and health management systems via functional failure modeling [15], the Time Based Failure Flow Evaluator (TBFFE) method [16], and methods to understand potential functional failure inputs to systems that are hard to predict [17] have added additional capabilities to FFM in an effort to develop a more complete FFM toolbox for practitioners. ...
Article
Full-text available
This paper presents a method of assessing cable routing for systems with significant cabling to help system engineers make risk-informed decisions on cable routing and cable bundle management. We present the Cable Routing Failure Analysis (CRFA) method of cable routing planning that integrates with system architecture tools such as functional modeling and function failure analysis. CRFA is intended to be used during the early conceptual stage of system design, although it may also be useful for retrofits or overhauls of existing systems. While cable raceway fires, cable bundle severing events, and other common cause cable failures (e.g., rodent damage, chemical damage, fraying and wear-related damage, etc.) are known to be a serious issue in many systems, the protection of critical cabling infrastructure and the separation of redundant cables are often not taken into account until late in the systems engineering process. Cable routing and management often happen after significant system architectural decisions have been made. If a problem is uncovered with cable routing, it can be cost-prohibitive to change the system architecture or configuration to fix the issue, and a system owner may have to accept the heightened risk of common cause cable failure. Given the nature of cables, where energy and signal functions are shared between major subsystems, the potential for failure propagation is significant.
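One CRFA-style check implied above can be sketched as follows: flag redundant cables that share raceway segments, since a single raceway fire or severing event there could defeat the redundancy. The raceway identifiers and routes are hypothetical.

```python
# Hypothetical cable routes expressed as ordered raceway segments.
routes = {
    "primary pump power":   ["RW-01", "RW-02", "RW-05"],
    "backup pump power":    ["RW-01", "RW-03", "RW-05"],
    "primary pump control": ["RW-04", "RW-06"],
}

# Pairs of cables that are intended to be redundant.
redundant_pairs = [("primary pump power", "backup pump power")]

def shared_segments(cable_a, cable_b):
    """Raceway segments where one common-cause event hits both cables."""
    return sorted(set(routes[cable_a]) & set(routes[cable_b]))

for a, b in redundant_pairs:
    shared = shared_segments(a, b)
    if shared:
        print(f"common-cause exposure for ({a}, {b}): {shared}")
```

Performing this kind of check while the system architecture is still fluid is what allows rerouting to be considered before it becomes cost-prohibitive.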
Article
Full-text available
An open area of research for complex, cyber-physical systems is how to adequately support decision making using reliability and failure data early in the systems engineering process. Having meaningful reliability and failure data available early offers information to decision makers at a point in the design process where decisions have a high impact to cost ratio. When applied to conceptual system design, widely used methods such as probabilistic risk analysis (PRA) and failure modes, effects, and criticality analysis (FMECA) are limited by the availability of data and often rely on detailed representations of the system. Further, existing system reliability and failure methods have not addressed failure propagation in conceptual system design prior to selecting candidate architectures. Consideration given to failure propagation primarily focuses on the basic representation where failures propagate forward. In order to address the shortcomings of existing reliability and failure methods, this paper presents the function failure propagation potential methodology (FFPPM) to formalize the types of failure propagation and quantify failure propagation potential for complex, cyber-physical systems during the conceptual stage of system design. Graph theory is leveraged to model and quantify the connectedness of the functional block diagram (FBD) to develop the metrics used in FFPPM. The FFPPM metrics include (i) the summation of the reachability matrix, (ii) the summation of the number of paths between nodes (i.e., functions) i and j for all i and j, and (iii) the degree and degree distribution. In plain English, these metrics quantify the reachability between functions in the graph, the number of paths between functions, and the connectedness of each node. The FFPPM metrics can then be used to make candidate architecture selection decisions and serve as early indicators of risk. The unique contribution of this research is to quantify failure propagation potential during conceptual system design of complex, cyber-physical systems prior to selecting candidate architectures. FFPPM has been demonstrated using the example of an emergency core cooling system (ECCS) in a pressurized water reactor (PWR).
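The three FFPPM metrics named above can be illustrated on a small, hypothetical functional block diagram; the sketch below computes the reachability-matrix summation, the path-count summation, and the degrees from an adjacency matrix, assuming an acyclic FBD.

```python
import numpy as np

# Adjacency matrix of a small, hypothetical functional block diagram:
# A[i, j] = 1 if a flow connects function i to function j.
functions = ["import water", "pressurize water", "distribute water", "remove heat"]
A = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])
n = len(functions)

# Number-of-paths matrix for an acyclic graph: sum of A^k over all path lengths k.
path_counts = sum(np.linalg.matrix_power(A, k) for k in range(1, n))

# Reachability matrix: 1 wherever at least one path exists between functions.
reachability = (path_counts > 0).astype(int)

# Degree (in + out) of each function, from which the degree distribution follows.
degree = A.sum(axis=0) + A.sum(axis=1)

print("sum of reachability matrix :", int(reachability.sum()))
print("sum of path counts         :", int(path_counts.sum()))
print("degrees                    :", dict(zip(functions, degree.tolist())))
```

Larger values of the first two metrics indicate a more reachable, more densely pathed architecture, which FFPPM treats as an early indicator of failure propagation potential when comparing candidate architectures.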