The Journal of RMS in Systems Engineering, Winter 2018–19
Douglas L. Van Bossuyt
Bryan M. O’Halloran
Nikolaos Papakonstantinou
Summary & Conclusions
This paper presents a method of assessing cable routing for systems with significant cabling to help system engineers make risk-informed decisions on cable routing and cable bundle management. We present the Cable Routing Failure Analysis (CRFA) method of cable routing planning that integrates with system architecture tools such as functional modeling and function failure analysis. CRFA is intended to be used during the early conceptual stage of system design, although it may also be useful for retrofits or overhauls of existing systems.
While cable raceway fires, cable bundle severing events, and other common cause cable failures (e.g., rodent damage, chemical damage, fraying and wear-related damage, etc.) are known to be a serious issue in many systems, the protection of critical cabling infrastructure and separation of redundant cables are often not taken into account until late in the systems engineering process. Cable routing and management often happen after significant system architectural decisions have been made. If a problem is uncovered with cable routing, it can be cost-prohibitive to change the system architecture or configuration to fix the issue, and a system owner may have to accept the heightened risk of common cause cable failure. Given the nature of cables, where energy and signal functions are shared between major subsystems, the potential for failure propagation is significant.
A System Design Method
to Reduce Cable Failure
Propagation Probability
in Cable Bundles
Through a more complete understanding of power and data cabling requirements during system architecting, a system design can be developed that minimizes the potential for collocation of critical cable infrastructure. Reductions in critical cabling collocation may lead to a reduction in potential failure propagation pathways. The CRFA method presented in this paper relies on functional failure propagation probability calculation methods to identify and avoid potential high-risk cable routing choices. The implementation of the CRFA method may help system engineers to design systems and facilities that protect against cabling failure propagation events (cable raceway fires, cable bundle severing events, etc.) during system architecture. Implementing CRFA in the system architecture phase of system design may help practitioners to increase system reliability while reducing system design costs and system design time.
1. Background
The CRFA method presented in this paper relies upon several key areas of existing research and industry methods including complex system design, Functional Failure Modeling (FFM), and Probabilistic Risk Assessment (PRA). The important aspects of each area necessary to understand and make use of the CRFA method are reviewed in this section.
With increasing system complexity, design methods used for relatively simple product design are replaced by design methodologies specifically suited for complex systems [1, 2]. Functional modeling is often used in the early conceptual phase of system design (generally referred to as system architecture, although this definition is not universally accepted) [3]. Functional models represent basic system functions and the basic flows of information, material, or energy transferred between individual functions and through the system boundary [1]. Individual functions perform actions on energy, material, or information flows [4]. Functional modeling as generally practiced in system architecting efforts often only analyzes nominal system configurations and states. Extensions to functional modeling have been developed over the last decade to analyze potential failure propagation paths and determine mitigation strategies [5]. Function Failure Identification and Propagation (FFIP) was developed to model failure flows propagating through system functions and the resulting system-level failure outcomes [3, 6]. FFIP can be used to predict failure propagation paths and failure outcomes. However, FFIP cannot account for failures that cross functional boundaries or most common cause failures. The Function Failure Design Method (FFDM) provides a Failure Modes and Effects Analysis (FMEA)-style failure analysis tool to be used with functional modeling [7, 8, 9, 10]. FFDM can be used to find a large variety of potential failure modes for individual functions, but FFDM cannot analyze failure propagations across non-nominal flow paths or common cause failure events. The Uncoupled Failure Flow State Reasoner (UFFSR) was developed to address the issue of analyzing uncoupled failure flow propagation in FFM [11, 12]. The UFFSR provides a geometric basis for analyzing failure flow propagation across uncoupled functions. An extension of UFFSR was developed to model failure flow arrestor functions in functional modeling. The Dedicated Failure Flow Arrestor Function (DFFAF) method replicates placing physical barriers between redundant systems to prevent a failure in one system from crossing an air gap to the other system [13]. Other methods such as Failure Flow Decision Functions (FFDF) [14], a method of developing prognostic and health management systems via functional failure modeling [15], the Time Based Failure Flow Evaluator (TBFFE) method [16], and methods to understand potential functional failure inputs to systems that are hard to predict [17] have added capabilities to FFM in an effort to develop a more complete FFM toolbox for practitioners.
PRA is a well-established discipline of risk analysis with over 50 years of heritage for complex systems used in a variety of industries including aerospace, petroleum, automotive, and civilian nuclear power, among other areas. System failure models are developed using event and fault trees, where event trees generally show the progression of a failure through systems and fault trees generally show the progression of failure within systems. Probabilistic failure data is attached to basic failure events and, through Bayesian statistical methods and Boolean algebra, a probabilistic system failure rate can be calculated. However, PRA in its basic form does not capture emergent system behavior during failure events. Instead, specific methodologies are used to assess specific emergent system behavior such as during fire or flood events in civilian nuclear reactors [18, 19, 20, 21, 22, 23, 24, 25]. While many emergent system behaviors are identified by fire and flood analysis, other emergent system behaviors can remain hidden from analysts [26, 19, 27, 28].
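As a rough illustration of how Boolean algebra turns basic-event probabilities into a top-event probability, the following Python sketch evaluates a very small fault tree. The gate structure, event names, and probabilities are hypothetical and are not taken from the case study. The calculation assumes statistically independent basic events, which is the assumption that common cause failures violate.

```python
from math import prod

def and_gate(probs):
    """Output event occurs only if every input event occurs (independent inputs)."""
    return prod(probs)

def or_gate(probs):
    """Output event occurs if at least one input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical basic-event probabilities, for illustration only.
p_pump_fails = 1e-3
p_power_lost = 5e-4

# One pump train fails if its pump fails OR its power is lost.
p_train = or_gate([p_pump_fails, p_power_lost])

# Top event: both redundant trains fail (AND gate over two identical trains).
p_top = and_gate([p_train, p_train])

print(f"train failure probability: {p_train:.6e}")
print(f"top event probability:     {p_top:.6e}")
```

Under the independence assumption the redundant pair looks very reliable; a single shared cause that disables both trains would bypass the AND gate entirely, which is why common cause events need separate treatment.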
Common cause failure in particular has received significant attention over the course of PRA methodological development. Failure-inducing events such as maintenance errors across a series of identical, redundant valves can lead to a common cause failure of all maintained valves. Fire and flood events often can become common cause failures, causing failure of every system in a specific area of a system. Other examples include explosive, toxic, or radioactive gas clouds; salt mine or hard rock tunnel collapse; airplane, space debris, meteor, and other impacts; and explosive deconstruction of rotating turbomachinery sending out shrapnel. Several methods have recently been developed to address common cause failure in functional modeling [29, 30, 31, 32, 33, 34, 35, 36]. However, no method currently exists in the FFM toolbox to address the issue of common cause failure events destroying or disabling multiple cables routed through the same cable pathways, ducts, raceways, bulkhead or wall penetrations, or other cable routing methods. Most efforts in cable management to prevent common cause failures focus on separating redundant and backup system cabling; isolating control, motive power, and instrumentation cabling from one another; and ensuring adequate breaker coordination to prevent ground fault wire ignition events in cable raceways. These efforts are typically performed after system architecting efforts have been completed and ignore potential benefits of analyzing and planning cable routing and bundling in the early phases of design.
2. Methodology & Case Study
The CRFA method presented in this section provides practitioners a useful method to develop a better understanding of cable routing and management during system architecture from a risk-based perspective. This section details the CRFA methodology and presents a case study of cable routing in a simplified Pressurized Water Reactor (PWR) nuclear power plant primary coolant loop pumping room where three redundant pumping systems are co-located. Two pumps are required to be active at all times for proper core cooling, with the third pump acting as a “swing” pump for maintenance purposes or coming online during a failure event involving one of the other pumps.
Step 1 of the CRFA method is to develop a functional model. Figure 1 shows the functional model of the pump room.
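One way to hold a functional model in software is as a directed graph whose nodes are functions and whose edges are flows. The sketch below is only an assumption about how the pump room might be represented; Figure 1 is not reproduced here, so the function names and flow names are hypothetical.

```python
from collections import defaultdict

class FunctionalModel:
    """Directed graph: nodes are functions, edges are typed flows."""

    def __init__(self):
        # source function -> list of (target function, flow type)
        self.flows = defaultdict(list)

    def add_flow(self, source, target, flow_type):
        self.flows[source].append((target, flow_type))

    def downstream(self, start):
        """Return every function reachable from `start` by following flows."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for target, _ in self.flows.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

# Hypothetical pump room: three pump trains feeding one core cooling function.
model = FunctionalModel()
for i in (1, 2, 3):
    model.add_flow(f"supply_power_{i}", f"pump_coolant_{i}", "electrical energy")
    model.add_flow(f"send_control_{i}", f"pump_coolant_{i}", "control signal")
    model.add_flow(f"pump_coolant_{i}", "cool_core", "coolant")

print(sorted(model.downstream("supply_power_1")))
```

A reachability query of this kind shows which functions a failure could reach along nominal flow paths, which is the starting point for the failure propagation analysis in the following steps.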
Step 2 involves calculating the system failure probabilities and failure flow paths using FFIP or other related FFMs as desired. Here we use FFIP to calculate the failure rate of the system. In the case study, the system failure rate calculated using FFIP is 5.3E-4/yr.
Step 3 associates failure probabilities with individual cables, where the failure of one cable can lead to a potential common cause failure event of all co-located cables. A practitioner used to the FFIP methodology can think of this step as adding another functional block into the functional model to represent a cable, rather than using a functional flow to represent the transmission of signal, energy, or material. For those who are more familiar with PRA, this is similar to adding a basic event of a common cause failure to a fault tree. For the purposes of the case study presented to illustrate the CRFA method, cables are defined as any electrical physical conveyance device which is generally referred to as a cable, wire, conductor, etc. The authors have found that CRFA can also be used with optical cables, pneumatic and hydraulic hoses and hard piping, and some bulk material transport systems (e.g., conveyor belts, pneumatic tubes, slurry chutes, etc.). In the case study, individual cable failure rates were chosen from an appropriate and proprietary generic cabling failure database.
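A minimal sketch of the idea in Step 3, assuming that a common cause event for a bundle is triggered when any one of its cables fails and that individual cable failures are independent. The source does not state how the bundle probability is combined, and the cable failure database is proprietary, so both the combination rule and the numbers below are hypothetical.

```python
from dataclasses import dataclass
from math import prod

@dataclass(frozen=True)
class Cable:
    name: str
    failure_probability: float  # hypothetical value, not from the case study

def bundle_failure_probability(cables):
    """Probability that at least one cable in the bundle fails.

    Assumptions: failures are independent, and any single failure
    (for example an ignition) disables every co-located cable.
    """
    return 1.0 - prod(1.0 - c.failure_probability for c in cables)

bundle = [
    Cable("POWER_BUS_1", 2.0e-3),
    Cable("POWER_BUS_2", 2.0e-3),
    Cable("CONTROL_SIGNAL_2", 1.5e-3),
]
print(f"bundle common cause probability: {bundle_failure_probability(bundle):.5f}")
```

Treating each cable as an object with its own failure probability mirrors the idea of adding a functional block for the cable, or a common cause basic event in a fault tree.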
Step 4 determines all possible cable groupings. In this step, the practitioner can identify any specific cables that cannot be located next to other cables for regulatory or other reasons, and any specific cables that must be co-located. For example, if three cables are being analyzed, there are nine total possible cable combinations. The case study has a total of 12 cables with 516 possible combinations.
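The grouping step can be sketched as a filtered enumeration. The example below assumes that a grouping is any set of two or more cables sharing a raceway and that exclusions are given as pairs of cables that may not be co-located. The source does not spell out its counting convention, so the counts produced here are not expected to match the figures quoted above. The cable names and the exclusion rule are hypothetical.

```python
from itertools import combinations

def candidate_groupings(cables, forbidden_pairs=()):
    """Yield every subset of two or more cables that contains no forbidden pair."""
    forbidden = {frozenset(pair) for pair in forbidden_pairs}
    for size in range(2, len(cables) + 1):
        for group in combinations(cables, size):
            pairs = (frozenset(p) for p in combinations(group, 2))
            if not any(p in forbidden for p in pairs):
                yield group

cables = ["POWER_BUS_1", "POWER_BUS_2", "CONTROL_SIGNAL_1", "CONTROL_SIGNAL_2"]
# Hypothetical rule: the two power buses may not share a raceway.
exclusions = [("POWER_BUS_1", "POWER_BUS_2")]

groups = list(candidate_groupings(cables, exclusions))
print(len(groups), "allowed groupings")
for g in groups:
    print(g)
```

A "must be co-located" rule could be handled the same way, by discarding any group that contains one cable of a required pair without the other.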
Step 5 analyzes system failure probability when two or more cables are co-located in a raceway. The cable failure probabilities from Step 3 are used to determine if all cables in a cable bundle may fail simultaneously. FFIP is run with each potential cable grouping identified in Step 4. Results for each cable grouping are kept separate and rank ordered from highest to lowest system failure probability.
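Step 5 can be pictured as a loop over the candidate groupings. In the sketch below a simple stand-in replaces FFIP: a pump is assumed lost when its power bus or control signal cable is in the failed bundle, and the system is assumed to fail when fewer than two of the three pumps remain. The cable-to-pump mapping, the failure probabilities, and the rule for combining them are all hypothetical.

```python
from itertools import combinations
from math import prod

# Hypothetical per-cable failure probabilities.
CABLE_P = {
    "POWER_BUS_1": 2.0e-3, "POWER_BUS_2": 2.0e-3, "POWER_BUS_3": 2.0e-3,
    "CONTROL_SIGNAL_1": 1.5e-3, "CONTROL_SIGNAL_2": 1.5e-3, "CONTROL_SIGNAL_3": 1.5e-3,
}

def pump_of(cable):
    """Assumed naming convention: the trailing digit identifies the pump served."""
    return int(cable.rsplit("_", 1)[1])

def system_fails(group):
    """Stand-in for FFIP: the system fails if fewer than two pumps survive."""
    lost_pumps = {pump_of(c) for c in group}
    return 3 - len(lost_pumps) < 2

def group_probability(group):
    """Assumed rule: any single cable failure disables the whole bundle."""
    return 1.0 - prod(1.0 - CABLE_P[c] for c in group)

results = []
for size in range(2, len(CABLE_P) + 1):
    for group in combinations(sorted(CABLE_P), size):
        # A bundle whose loss the system survives contributes nothing here.
        p_system = group_probability(group) if system_fails(group) else 0.0
        results.append((p_system, group))

results.sort(key=lambda r: r[0], reverse=True)  # highest risk first
for p_system, group in results[:3]:
    print(f"{p_system:.4f}  {', '.join(group)}")
```

Keeping one result per grouping, rather than aggregating, preserves the ranking that the threshold in the next step is applied to.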
Step 6 sets the maximum threshold for system failure probability. The authors advise that the threshold be set above the base FFIP calculation, as FFIP does not generally take into account common cause cable failure. Then all cable groupings that exceed the threshold value are marked as unacceptable configurations from a risk perspective. All cable groupings that were not marked as unacceptable configurations are thus acceptable from a risk perspective and can be used, assuming no other mitigating circumstances, in physical system design. If no cable configuration is acceptable, this indicates a redesign of the functional model is needed. Additional redundant systems or redundant cables may also be warranted. Table 1 presents partial results from the case study where a total of 516 potential cable groupings were identified, 210 groupings were rejected due to co-location exclusions (Step 4), and 313 groupings were eliminated due to exceeding the maximum threshold set in Step 6, resulting in 38 potential cable routing configurations meeting all criteria identified in the CRFA method.

CABLE GROUPS

Cable group: Group331
  CONTROL_SIGNAL_2
  POWER_BUS_1
  POWER_BUS_2
  POWER_BUS_3
Group failure probability: 0.0077
System fails: true

Cable group: Group415
  CONTROL_SIGNAL_3
  POWER_BUS_1
  POWER_BUS_2
  POWER_BUS_3
Group failure probability: 0.0077
System fails: true

Cable group: Group252
  CONTROL_SIGNAL_2
  CONTROL_SIGNAL_3
  POWER_BUS_1
  POWER_BUS_2
Group failure probability: 0.0074
System fails: true

Table 1: Representative CRFA results including cable groupings with highest system failure probabilities for the primary coolant loop pumping room case study.
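The threshold screen in Step 6 amounts to partitioning the ranked results. A minimal sketch, assuming each grouping already carries a system failure probability from Step 5; the group labels, probabilities, and threshold are hypothetical and chosen only so that the threshold sits above the base value of 5.3E-4 reported in Step 2.

```python
def screen_groupings(results, threshold):
    """Split {group: system failure probability} into acceptable and unacceptable."""
    acceptable = {g: p for g, p in results.items() if p <= threshold}
    unacceptable = {g: p for g, p in results.items() if p > threshold}
    return acceptable, unacceptable

BASE_SYSTEM_FAILURE = 5.3e-4   # base FFIP result quoted in Step 2
THRESHOLD = 1.0e-3             # hypothetical, set above the base value
assert THRESHOLD > BASE_SYSTEM_FAILURE

# Hypothetical Step 5 output.
step5_results = {
    "GroupA": 7.7e-3,
    "GroupB": 9.0e-4,
    "GroupC": 6.0e-4,
}

ok, rejected = screen_groupings(step5_results, THRESHOLD)
if not ok:
    print("No acceptable grouping: revisit the functional model or add redundancy.")
for name, p in sorted(ok.items(), key=lambda kv: kv[1]):
    print(f"acceptable: {name} ({p:.1e})")
```

An empty acceptable set corresponds to the case described above in which the functional model needs to be redesigned or redundancy added.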
The CRFA method is now complete. Periodically through the rest of the conceptual design phase, CRFA should be re-run to verify that appropriate cable groupings and separations are maintained to meet failure probability expectations. When moving from system architecture and early system design into physical system design and layout, the information from CRFA can then be used to develop cable raceways and locate individual cables.
3. Discussion
The CRFA method presented in the previous section has been implemented in software and automated. Figure 2 presents the Graphical User Interface (GUI) of the CRFA software tool that the authors developed. The case study in this paper was prepared using the software implementation of CRFA. In the future, the CRFA software is slated for integration with a larger effort to develop a complete FFM software toolkit.
In the authors’ experience, evidence of the success of CRFA can often be seen in redundant systems cabling being isolated from one another. Often this is because of Step 4 identifying cables that cannot be co-located. However, the authors have observed CRFA identifying on its own that redundant system cabling should not be co-located due to increased system failure probability. It is also possible that if the maximum threshold set in Step 6 is sufficiently high, redundant system cabling isolation may not be observed. This is potentially indicative of too high a threshold being set or may also indicate that redundant system cabling is unnecessary. It is recommended that further review of the results and a deeper understanding of why certain cables are more or less isolated be sought before moving forward if either case is identified.
Figure 2: The GUI of the software implementation of CRFA.
While small-scale cable routing studies can be conducted using PRA tools and larger complex system cable routing analysis can be performed using specialized methods, the method presented in this paper integrates cable routing failure analysis with other FFMs, allowing a more holistic and integrated approach to system risk analysis. CRFA also provides the capability of analyzing common cause cable failures much earlier in the system design process, during system architecture, than existing methods allow. Shifting the analysis of common cause failures from cable routing to earlier in the system design process may save both time and money in the design process.
In the case where PRA is used to analyze cable failures without analyzing fire, flood, or missile (turbomachinery shrapnel) common cause failure, the PRA results will likely underestimate failure probability. Even when analyzing the fire, flood, or missile common cause failure sources, the results will likely not present as full and accurate a picture of cable grouping failure risks as CRFA does.
CRFA has been used to conduct analysis on a variety of systems including civilian nuclear power plants of several types, aerospace systems, automotive systems, and defense systems. The results are promising and have been useful for practitioners to understand how cable routing and management can be greatly impacted by system architectural decisions. Feedback from some users of CRFA indicates a desire for CRFA to be integrated into commonly used model-based systems engineering (MBSE) tools.
Further development of CRFA is anticipated, including a more nuanced approach to cable bundling. CRFA assumes that all cables co-located in a raceway will fail simultaneously when a common cause failure event occurs. However, not all common cause failure events will cause all cables to fail. For instance, a very hungry rat will not simultaneously eat through all data cables in a large bundle. Thus, CRFA is a conservative method in this regard. A potential extension of CRFA may be to include aspects of TBFFE in the modeling of cable bundle failures to represent failure of cables in a bundle over time. Another area of future improvement for CRFA is integrating the method with uncoupled failure flow methods such as UFFSR. Uncoupled failure flows can be accounted for to some degree in Step 3 by assigning failure probabilities for common cause cable failures from potential uncoupled sources such as missiles or floods (of cable insulation-eating liquids). However, some sources of uncoupled failure flow may be missed without integration of UFFSR.
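To see why the all-cables-fail assumption is conservative, the sketch below compares it with a hypothetical partial-damage model in which a common cause event disables each cable in the bundle independently with some conditional probability. Neither the event probability nor the conditional probability comes from the source, and the partial-damage model is only one of many ways a more nuanced bundling approach could be framed.

```python
from itertools import product

def p_system_failure(p_event, cables, q_cable, fails):
    """P(system failure) when an event hits the bundle with probability p_event
    and then disables each cable independently with probability q_cable."""
    total = 0.0
    for outcome in product([True, False], repeat=len(cables)):
        lost = {c for c, hit in zip(cables, outcome) if hit}
        weight = 1.0
        for hit in outcome:
            weight *= q_cable if hit else 1.0 - q_cable
        if fails(lost):
            total += weight
    return p_event * total

def two_of_three_lost(lost):
    """System fails when cables feeding at least two of three pumps are lost."""
    return len({c[-1] for c in lost}) >= 2

bundle = ["POWER_BUS_1", "POWER_BUS_2", "POWER_BUS_3"]
conservative = p_system_failure(1e-3, bundle, 1.0, two_of_three_lost)
partial = p_system_failure(1e-3, bundle, 0.5, two_of_three_lost)
print(f"all cables fail: {conservative:.2e}")
print(f"partial damage:  {partial:.2e}")
```

With q_cable set to 1.0 the model reduces to the CRFA assumption; any smaller value can only lower the computed probability when losing more cables never helps the system.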
Further future work includes adding to the software implementation of CRFA the ability to automatically add redundant cabling. For instance, civilian nuclear power plants often contain three redundant sensors with three redundant cables where a functional model may only show one functional block to represent the three redundant sensors and cables. Additional automation may provide the practitioner with a more rapid development process.
4. Conclusion
The CRFA method presented here provides a novel way of analyzing cable routing and determining cable routing schemes that are below a desired system failure probability threshold. Protecting critical cabling infrastructure and separating redundant cables is vitally important to ensuring that a common cause failure does not cause a system-level failure event. Cable routing and planning currently happen late in the design process, after major architectural decisions have been made and during physical system design. The CRFA method brings the analysis and design of cable raceways and cable separation to the system architecting phase of system design using FFM as a basis for further analysis. By having a more complete understanding of cable requirements during the early phases of system design, a system architecture and design can emerge that minimizes critical cabling infrastructure co-location and identifies the need for additional redundant cabling. Implementing CRFA may help engineering practitioners design complex systems and facilities that guard against cable failure propagation events that could disable or destroy the core functionality of the system. Thus, system reliability is expected to be increased while driving down system risks that may otherwise have gone unaddressed.
5. Acknowledgements
This research was partially supported by United States Nuclear Regulatory Commission Grant Number NRC-HQ-84-14-G-0047 and by the Naval Postgraduate School. Any opinions or findings of this work are the responsibility of the authors, and do not necessarily reflect the views of the sponsors or collaborators. The case study or example presented in this paper may not be used or construed as an analysis of a specific system or plant and is only provided for illustrative purposes of the method.
References
1. R. B. Stone and K. L. Wood, "Development of a Functional Basis for Design," ASME Journal of Mechanical Design, vol. 122, no. 4, pp. 359-370, 2000.
2. D. L. Van Bossuyt, I. Y. Tumer and S. D. Wall, "A case for trading risk in complex conceptual design trade studies," Research in Engineering Design, vol. 24, no. 3, pp. 259-275, 2013.
3. D. Jensen, T. Kurtoglu and I. Y. Tumer, "Flow State Logic (FSL) for Analysis of Failure Propagation in Early Design," in ASME International Design Engineering Technical Conference IDETC/CIE, San Diego, CA, 2009.
4. J. Hirtz, R. Stone, D. McAdams, S. Szykman and K. Wood, "A Functional Basis for Engineering Design: Reconciling and Evolving Previous Efforts," Research in Engineering Design, vol. 13, no. 2, pp. 65-82, 2002.
5. I. Y. Tumer and R. B. Stone, "Mapping Function to Failure Mode During Component Development," Research in Engineering Design, vol. 14, no. 1, pp. 25-33, 2003.
6. T. Kurtoglu and I. Y. Tumer, "A Graph-Based Fault Identification and Propagation Framework for Functional Design of Complex Systems," ASME Journal of Mechanical Design, vol. 130, no. 5, 2008.
7. M. Stock, R. B. Stone and I. Y. Tumer, "Going Back in Time to Improve Design: The Function-Failure Design Method," in ASME Design Engineering Technical Conference DTM, Chicago, IL, 2003.
8. K. G. Lough, R. B. Stone and I. Y. Tumer, "Function Based Risk Assessment: Mapping Function to Likelihood," in ASME International Design Engineering Technical Conference DET, Long Beach, CA, 2005.
9. M. Stock, R. B. Stone and I. Y. Tumer, "Linking Product Functionality to Historic Failure to Improve Failure Analysis in Design," Research in Engineering Design, 2005.
10. R. A. Roberts, R. B. Stone and I. Y. Tumer, "Deriving Function-Failure Information for Failure-Free Rotorcraft Component Design," in ASME Design Engineering Technical Conference DETC, Montreal, Canada, 2002.
11. I. Ramp and D. L. Van Bossuyt, "Toward an Automated Model-Based Geometric Method of Representing Function Failure Propagation Across Uncoupled Functions," in ASME International Mechanical Engineering Congress and Exposition IMECE, Montreal, Canada, 2014.
12. B. M. O'Halloran, N. Papakonstantinou and D. L. Van Bossuyt, "Modeling of Function Failure Propagation Across Uncoupled Systems," in Reliability and Maintainability Symposium (RAMS), Palm Harbor, FL, 2015.
13. M. R. Slater and D. L. Van Bossuyt, "Toward a Dedicated Failure Flow Arrestor Function Methodology," in ASME International Design Engineering Technical Conference and Computers in Information Conference, Boston, MA, 2015.
14. A. R. Short and D. L. Van Bossuyt, "Rerouting Failure Flows Using Logic Blocks in Functional Models for Improved System Robustness: Failure Flow Decision Functions," in ASME International Design Engineering Technical Conference and Computers and Information in Engineering Conference, Boston, MA, 2015.
15. G. L'Her, D. L. Van Bossuyt and B. M. O'Halloran, "Prognostic systems representation in a function-based Bayesian model during engineering design," International Journal of Prognostics and Health Management, vol. 8, no. 2, p. 23, 2017.
16. J. Dempere, N. Papakonstantinou, B. O'Halloran and D. Van Bossuyt, "Risk Modeling of Variable Probability External Initiating Events in a Functional Modeling Paradigm," The Journal of Reliability, Maintainability, and Supportability in Systems Engineering, 2018.
17. D. L. Van Bossuyt, B. M. O'Halloran and R. M. Arlitt, "Irrational System Behavior in a System of Systems," in IEEE System of Systems Engineering Conference, Paris, 2018.
18. M. Stamatelatos and D. Homayoon, Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners, NASA, 2011.
19. US Nuclear Regulatory Commission, "Standard Review Plan for the Review of Safety Analysis Reports for Nuclear Power Plants: LWR Edition - Severe Accidents (NUREG-0800, Chapter 19)," US NRC, 2012.
20. D. L. DeMott, "PRA as a Design Tool," in Reliability and Maintainability Symposium (RAMS), 2011.
21. W. E. Vesely, "Extended Fault Modeling Used in the Space Shuttle PRA," in Reliability and Maintainability Symposium (RAMS), 2004.
22. L. Meshkat, "Probabilistic Risk Assessment for Decision Making During Spacecraft Operations," in Reliability and Maintainability Symposium (RAMS), 2009.
23. L. L. Lydia, A. J. Ingegneri, L. Ming and D. F. Everett, "Probabilistic Risk Assessment: A Practical and Cost Effective Approach," in Reliability and Maintainability Symposium, 2007.
24. J. Zamanali, "Probabilistic Risk Assessment Applications in the Nuclear Power Industry," IEEE Transactions on Reliability, vol. 47, no. 3, 1998.
25. T.-Y. Hsiao and C.-N. Lu, "Risk Informed Design Refinement of a Power System Protection Scheme," IEEE Transactions on Reliability, vol. 57, no. 2, pp. 311-321, 2008.
26. C. Dunglinson and H. Lambert, "Interval Reliability for Initiating and Enabling Events," IEEE Transactions on Reliability, vol. 32, no. 2, pp. 150-163, 1983.
27. M. Garvey, F. Joglar and E. P. Collins, "HRA for Detection and Suppression Activities in Response to Fire Events," in Reliability and Maintainability Symposium (RAMS), 2014.
28. US Nuclear Regulatory Commission, "PRA Procedures Guide: A Guide to the Performance of Probabilistic Risk Assessments for Nuclear Power Plants (NUREG/CR-2300)," US NRC, 1983.
29. S. Sierla, B. O'Halloran, T. Karhela, N. Papakonstantinou and I. Y. Tumer, "Common Cause Failure Analysis of Cyber-Physical Systems Situated in Constructed Environments," Research in Engineering Design, vol. 24, no. 4, pp. 375-394, 2013.
30. M. Myrsky, H. Nikula, S. Sierla, J. Saarinen, N. Papakonstantinou, V. Kyrki and B. O'Halloran, "Simulation-Based Risk Assessment of Robot Fleets in Flooded Environments," in IEEE Conference on Emerging Technologies and Factory Automation (ETFA), 2013.
31. N. Papakonstantinou, S. Sierla, D. C. Jensen and I. Y. Tumer, "Simulation of Interactions and Emergent Failure Behavior During Complex System Design," Journal of Computing and Information Science in Engineering, vol. 12, no. 3, 2012.
32. R. P. Hughes, "A New Approach to Common Cause Failure," Reliability Engineering, vol. 17, no. 3, pp. 211-236, 1987.
33. K. N. Fleming, A. Mosleh and R. K. Deremer, "A Systematic Procedure for the Incorporation of Common Cause Events into Risk and Reliability Models," Nuclear Engineering and Design, vol. 93, no. 2, pp. 245-273, 1986.
34. W. E. Vesely, "Estimating Common Cause Failure Probabilities in Reliability and Risk Analyses: Marshall-Olkin Specializations," Nuclear Systems Reliability Engineering and Risk Assessment, pp. 314-341, 1977.
35. H. W. Lewis, R. J. Budnitz, W. D. Rowe, H. C. Kouts, F. Von Hippel, W. B. Loewenstein and F. Zachariasen, "Risk Assessment Review Group Report to the US Nuclear Regulatory Commission," IEEE Transactions on Nuclear Science, vol. 26, no. 5, pp. 4686-4690, 1979.
36. Idaho National Engineering and Environmental Laboratory, "Common-Cause Event Failure Insights NUREG/CR-6819," 2003.
37. B. M. O'Halloran, N. Papakonstantinou and D. L. Van Bossuyt, "Cable routing modeling in early system design to prevent cable failure propagation events," in IEEE Reliability and Maintainability Symposium (RAMS), 2016.
... Work over the last decade has focused on a family of methods based around the function failure identification and propagation (FFIP) method [88] and the companion flow state logic (FSL) method [89]. The FFIP family of methods has been expanded to examine how prognostics and health management systems can be designed during system architecture [90], how failure flows may jump between systems using the uncoupled failure flow state reasoner (UFFSR) [91], how to protect against uncoupled failure flows within a system [92], how systems can deal with a variety of unanticipated external initiating events in a SoS [23], and several other important advances [93][94][95][96][97][98][99][100]. We use the FFIP family of methods extensively throughout the research in this paper. ...
... In this context, cut-sets are defined as the path that each failure flow travels from the initial failure event to exiting the system as a spurious failure flow emission. The definition is in line with how cut-sets have been used in recent FFIP-related research [23,95,98,99] and is similar to how cut-sets are defined in the PRA literature [60]. A table of system-level failure flows is generated from this step. ...
Conference Paper
Full-text available
Increasingly tight coupling and heavy connectedness in systems of systems (SoS) presents new problems for systems designers and engineers. While the failure of one system within a SoS may produce little collateral damage beyond a loss in SoS capability, a highly interconnected SoS can experience significant damage when one member system fails in an unanticipated way. It is therefore important to develop systems that are “good neighbors” with the other systems in a SoS by failing in ways that do not further degrade a SoS’s ability to complete its mission. This paper presents a method to (1) analyze a system for potential spurious emissions and (2) choose mitigation strategies that provide the best return on investment for the SoS. The method is suited for use during the system architecture phase of the system design process. A functional and flow approach to analyzing spurious emissions and developing mitigation strategies is used in the method. Use of the method may result in a system that causes less SoS damage during a failure event.
Article
Full-text available
An open area of research for complex, cyber‐physical systems is how to adequately support decision making using reliability and failure data early in the systems engineering process. Having meaningful reliability and failure data available early offers information to decision makers at a point in the design process where decisions have a high impact-to-cost ratio. When applied to conceptual system design, widely used methods such as probabilistic risk analysis (PRA) and failure modes, effects, and criticality analysis (FMECA) are limited by the availability of data and often rely on detailed representations of the system. Further, existing system reliability and failure methods have not addressed failure propagation in conceptual system design prior to selecting candidate architectures. Where failure propagation is considered, the focus is primarily on the basic representation in which failures propagate forward. To address these shortcomings, this paper presents the function failure propagation potential methodology (FFPPM) to formalize the types of failure propagation and quantify failure propagation potential for complex, cyber‐physical systems during the conceptual stage of system design. Graph theory is leveraged to model and quantify the connectedness of the functional block diagram (FBD) to develop the metrics used in FFPPM. The FFPPM metrics include (i) the summation of the reachability matrix, (ii) the summation of the number of paths between nodes (i.e., functions) i and j for all i and j, and (iii) the degree and degree distribution. In plain English, these metrics quantify the reachability between functions in the graph, the number of paths between functions, and the connectedness of each node. The FFPPM metrics can then be used to make candidate architecture selection decisions and as early indicators of risk.
The unique contribution of this research is to quantify failure propagation potential during conceptual system design of complex, cyber‐physical systems prior to selecting candidate architectures. FFPPM has been demonstrated using the example of an emergency core cooling system (ECCS) in a pressurized water reactor (PWR).
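The three metric families named in the abstract above can be illustrated with a minimal sketch. The four-function graph, the function name `ffppm_style_metrics`, and the conventions used here (an acyclic graph, self-reachability excluded, degree counted as in-degree plus out-degree) are assumptions made for illustration only; the abstract does not give the paper's exact definitions.

```python
from collections import Counter


def ffppm_style_metrics(edges):
    """Connectedness metrics for a directed acyclic graph of functions.

    Hypothetical helper: conventions are assumptions, not the paper's.
    """
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)

    # Adjacency matrix: adj[i][j] = 1 when a flow runs from function i to j.
    adj = [[0] * size for _ in range(size)]
    for src, dst in edges:
        adj[index[src]][index[dst]] = 1

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(size))
                 for j in range(size)] for i in range(size)]

    # In an acyclic graph, entry (i, j) of A + A^2 + ... + A^(n-1)
    # counts the distinct paths from function i to function j.
    paths = [[0] * size for _ in range(size)]
    power = [row[:] for row in adj]
    for _ in range(max(size - 1, 0)):
        paths = [[paths[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
        power = matmul(power, adj)

    # (i) Sum of the reachability matrix: ordered pairs joined by any path.
    reachability_sum = sum(1 for row in paths for count in row if count > 0)
    # (ii) Sum of the number of paths over all ordered pairs.
    path_sum = sum(sum(row) for row in paths)
    # (iii) Degree (in plus out) per function and its distribution.
    degree = {name: 0 for name in nodes}
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1

    return {
        "reachability_sum": reachability_sum,
        "path_sum": path_sum,
        "degree": degree,
        "degree_distribution": dict(Counter(degree.values())),
    }


# Hypothetical functional block diagram: A feeds B and C, which both feed D.
example_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(ffppm_style_metrics(example_edges))
```

In this toy graph the two routes from A to D make the path count larger than the reachability count, which is the kind of difference between candidate architectures that such metrics are meant to expose.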
Article
As component engineering has progressively advanced over the past 20 years to encompass a robust element of reliability, a paradigm shift has occurred in how complex systems fail. While failures used to be dominated by ‘component failures,’ failures are now governed by other factors such as environmental factors, integration capability, design quality, system complexity, built-in testability, etc. Of these factors, environmental factors are some of the most difficult to predict and assess. While test regimes typically encompass environmental factors, significant design changes to the system to mitigate any potential failures are not likely to occur due to the cost. The early stages of the systems engineering design process offer significant opportunity to evaluate and mitigate risks due to environmental factors. Systems that are expected to operate in a dynamic and changing environment pose significant challenges for assessing environmental factors. For example, external failure initiating event probabilities may change with respect to time, and newly discovered external initiating events may also be expected to have varying probabilities of occurrence with respect to time. While some industry standard methods such as Probabilistic Risk Assessment (PRA) [3] and Failure Modes and Effects Analysis (FMEA) [4] can partially address a time-dependent external initiating event probability, current methods of analyzing system failure risk during conceptual system design cannot. We have developed the Time Based Failure Flow Evaluator (TBFFE) to address the need for a risk analysis tool that can account for variable probabilities in initiating events over the duration of a system’s operation. This method builds upon the Function Based Engineering Design (FBED) [19] method of functional modeling and the Function Failure Identification and Propagation (FFIP) [9] failure analysis method that is compatible with FBED. 
Through the development of TBFFE, we have found that the method can provide significant insights into a design that is to be used in an environment with variable probability external initiating events. We present a case study of the conceptual design of a nuclear power plant’s spent fuel pool experiencing a variety of external initiating events that vary in probability based upon the time of year. The case study illustrates the capability of TBFFE by identifying how seasonally variable initiating event occurrences can impact the probability of failure on a monthly timescale that otherwise would not be seen on a yearly timescale. Changing the design helps to reduce the impact that time-varying initiating events have on the monthly risk of system failure.
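The contrast between monthly and yearly timescales described above can be shown with simple arithmetic. This is only a sketch, not the TBFFE method itself: the monthly initiating event probabilities, the conditional failure probability, the function names, and the assumption that months are independent are all hypothetical.

```python
def monthly_failure_probability(p_initiating, p_fail_given_event):
    """Failure probability per month for one initiating event type."""
    return [p * p_fail_given_event for p in p_initiating]


def annual_failure_probability(monthly):
    """Probability of at least one failure in the year.

    Assumes failures in different months are independent (an assumption).
    """
    survive = 1.0
    for p in monthly:
        survive *= 1.0 - p
    return 1.0 - survive


# Hypothetical seasonal event: more likely in months 6 through 8.
p_initiating = [0.01] * 5 + [0.05] * 3 + [0.01] * 4
# Hypothetical probability that the system fails given the event occurs.
p_fail_given_event = 0.1

monthly = monthly_failure_probability(p_initiating, p_fail_given_event)
annual = annual_failure_probability(monthly)

print("peak monthly risk:", max(monthly))
print("annual risk spread evenly over 12 months:", annual / 12)
```

With these made-up numbers the peak-month risk is more than twice the evenly spread annual figure, which is the kind of seasonal detail a yearly aggregate hides.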
Conference Paper
System of systems (SoS) failures can sometimes be traced to a system within the SoS behaving in unexpected ways. Due to their emergent complexity, these types of failures are notoriously challenging to foresee. This paper presents a method to aid in predicting unknown unknowns in a SoS. Irrationality initiators (failure flows emanating from one system that serve as unexpected initiating events in another system) are introduced into quantitative risk analysis methods such as the Failure Flow Identification and Propagation framework and Probabilistic Risk Assessment. Analysis of models built using this approach yields a probability distribution of failure paths through a system within the SoS that are initiated by unexpected behaviors of other systems within the SoS. The method is demonstrated using an example of an autonomous vehicle network operating in a partially denied environment with hostile forces present. Using the concept of irrationality initiators, it is possible to identify and prioritize vulnerabilities in the system of interest in the SoS.
Article
Prognostics and Health Management (PHM) systems are usually only considered and set up in the late stages of design or even during the system’s lifetime, after the major design decisions have been made. However, considering the PHM system’s impact on the system failure probabilities can benefit the system design early on and subsequently reduce costs. The identification of failure paths in the early phases of engineering design can guide the designer toward a safer, more reliable, and more cost-efficient design. Several functional failure modeling methods have been developed recently. One of their advantages is that they allow for risk assessment in the early stages of the design. Current risk and reliability functional failure analysis methods do not explicitly model the PHM equipment used to identify and prevent potential system failures. This paper proposes a framework to optimize prognostic systems selection and positioning during the early stages of a complex system design. A Bayesian network, incorporating the PHM systems, is used to analyze the functional model and failure propagation. The algorithm developed within the proposed framework returns the optimized placement of PHM hardware in the complex system, allowing the designer to evaluate the need for system improvement. A design tool was developed to automatically apply the proposed method. A generic pressurized water nuclear reactor primary coolant loop system is used to present a case study illustrating the proposed framework. The results obtained for this particular case study demonstrate the promise of the method introduced in this paper. The case study notably exhibits how the proposed framework can be used to support engineering design teams in making better informed decisions early in the design phase.
Conference Paper
Risk analysis in engineering design is of paramount importance when developing complex systems or upgrading existing systems. In many complex systems, new generations of systems are expected to have decreased risk and increased reliability when compared with previous designs. For instance, within the American civilian nuclear power industry, the Nuclear Regulatory Commission (NRC) has progressively increased requirements for reliability and driven down the chance of radiological release beyond the plant site boundary. However, many ongoing complex system design efforts analyze risk after early major architecture decisions have been made. One promising method of bringing risk considerations earlier into the conceptual stages of the complex system design process is functional failure modeling. Function Failure Identification and Propagation (FFIP) and related methods began the push toward assessing risk using the functional modeling taxonomy. This paper advances the Dedicated Failure Flow Arrestor Function (DFFAF) method which incorporates dedicated Arrestor Functions (AFs) whose purpose is to stop failure flows from propagating along uncoupled failure flow pathways, as defined by Uncoupled Failure Flow State Reasoner (UFFSR). By doing this, DFFAF provides a new tool to the functional failure modeling toolbox for complex system engineers. This paper introduces DFFAF and provides an illustrative simplified civilian Pressurized Water Reactor (PWR) nuclear power plant case study.
Conference Paper
Functional modelling methods used in the early conceptual phases of complex system design allow system designers to better understand and refine system architecture from a functional perspective. A family of methods exists to model functional failures and failure flows. These failure flow modelling methods provide the opportunity to understand potential system failure sources and redesign systems for more robustness. One capability lacking from the family of function failure and flow methods is the ability to model failure flow decision-making. This paper presents the Function Flow Decision Functions (FFDF) methodology that allows system designers to model failure flow decision-making where critical functions and flow exports are protected from failure flows by sacrificing less critical functions and flow exports. By sacrificing less critical functions and flow exports, mission-critical functions and flow exports can be preserved in order to accomplish the primary mission objectives of a system. A case study based upon the Mars Exploration Rover platform is presented in this paper.
Conference Paper
In today’s world it is more important than ever to quickly and accurately satisfy customer needs when launching a new product. It is equally important to design products that adequately accomplish their desired functions with a minimum number of failures. When failure analysis and prevention are coupled with a product design from its conception, shorter design times and fewer redesigns are necessary to arrive at a final product design. In this article, we explore the potential of a novel design methodology to guide designers toward new designs or redesigns that avoid failures. The Elemental Function-Failure Design Method (EFDM) is based on functional similarity of the product being designed to failed products within a knowledge base. The idea of using component functionality to explore the failure space in design was first introduced as a function-failure analysis approach by Tumer and Stone (2003). The overall approach offers potential improvement over current failure analysis methods (FMEA, etc.), because it can be implemented hand in hand with other conceptual design steps and carried throughout a product’s design cycle. In this paper, this idea is formalized into a systematic methodology that is specifically tailored for use at the conceptual design stage before any physical design choices have been made, hence moving failure analysis earlier in the design cycle. Formalized guidelines for using the EFDM are outlined for new designs and for the redesign of existing products. A function-failure knowledge base, derived from actual failure occurrences for Bell 206 rotorcraft, is introduced and used to derive potential failure modes in a comparison of the EFDM and traditional FMEA for two design examples. This comparison demonstrates the EFDM’s potential in conceptual design failure analysis.
Conference Paper
This study consisted of a qualitative and quantitative evaluation of fire detection and suppression capabilities in a facility by the standard operating crew. This evaluation was made using Human Reliability Analysis (HRA) quantification techniques, which resulted in a set of human error probabilities (HEPs) characterizing the detection and suppression actions. The HEPs were input to a comprehensive event tree model that integrated the different detection and suppression activities into sequences of events (i.e., fire scenario outcomes) that were also probabilistically quantified. Based on the study findings, the following conclusions were made: 1. The overall method proved to have significant merit for conducting an initial scoping evaluation of fire detection and suppression probability by crew members, including automatic detection and facility modification influences. 2. The majority of the HEPs were constant (i.e., did not change) for different system window values. However, for relatively short system windows, key “manual” detection and suppression activities may fail. These insights suggested areas of the facility where fire protection improvements could be beneficial. 3. Better characterization of the timing associated with human actions is necessary. Historical data should be investigated to re-assess the timing and support the probability distributions and corresponding parameters selected for representing the time values. 4. Better characterization of the fires is needed. The analysis preliminarily considered “slow” and “fast” growing fires. However, no explicit treatment was given to small fires that are easy to suppress or larger fires that may not be easily suppressed. Further characterization of the probability of fires that can overcome the available suppression capabilities should be incorporated into the analysis. 5. Better characterization of system window times is needed. A combination of historical data and detailed fire modeling analysis of selected fire scenarios should be applied since overall system window times were assumed and not based on an analytical characterization of specific fire scenarios. 6. There was an inherent assumption that the fire could be suppressed by the crew given the training and equipment available. Further analysis should be done to address fires with the capability of growth and propagation that can overcome the available suppression capabilities.
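The way HEPs feed an event tree of fire scenario outcomes, as described above, can be sketched with a two-branch tree. This is a simplified illustration, not the study's model: the function name, the two-stage structure (detection, then suppression), and the HEP values are hypothetical.

```python
def fire_event_tree(hep_detect, hep_suppress):
    """Outcome probabilities for a two-stage fire event tree.

    hep_detect:   probability the crew fails to detect the fire.
    hep_suppress: probability the crew fails to suppress it, given detection.
    Both inputs are hypothetical values supplied by the caller.
    """
    p_detect = 1.0 - hep_detect
    return {
        "detected_and_suppressed": p_detect * (1.0 - hep_suppress),
        "detected_not_suppressed": p_detect * hep_suppress,
        "not_detected": hep_detect,
    }


# Hypothetical HEPs for a long and a short system window.
for label, heps in {"long window": (0.05, 0.1),
                    "short window": (0.3, 0.5)}.items():
    outcomes = fire_event_tree(*heps)
    print(label, outcomes)
```

Each set of outcome probabilities sums to one, and rerunning the tree with different HEPs per system window is one simple way to see how shorter windows shift probability toward the unsuppressed outcomes.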
Conference Paper
For safety-critical complex systems, reliability and risk analysis are important design steps. Implementing these analyses early in the design stage can reduce costs associated with redesign and provide important information on design viability. In the past several years, various research methods have been presented in the design community to move reliability analysis into the early conceptual design stages. These methods all use a functional representation as the basis for reliability analysis. This paper asserts that, in non-nominal system states, the functional representation limits the scope of failure analysis. Specifically, when failures are modeled to propagate along energy, material, and signal (EMS) flows, a nominal-state functional model is insufficient for modeling all types of failures. To capture possible failure propagation paths, a function-based reliability method must consider all potential flows, and not be limited to the function structure of the nominal state. In this light, this paper introduces the Flow State Logic (FSL) method as a means for reasoning on the state of EMS flows that allows the assessment of failure propagation over potential flows that were not considered in a functional representation of a “nominally functioning” design. A liquid-fueled rocket engine serves as a case study to illustrate the benefits of the methodology.