Systems Engineering for Resilience

Abstract

Resilience has become an important consideration for systems engineering (SE). This paper discusses ways of addressing system resilience within SE methodologies. This paper starts by exploring the meaning of resilience, how it applies to the engineered system, and how it relates to established SE viewpoints and activities. The paper identifies a key distinguishing characteristic of resilience; viz., resilience requirements are an expression of the delivery of capability under adverse conditions. As such, resilience requirements include aspects of functional, performance, and environmental requirements, and do so with focus on the adverse conditions that fully define the acceptable delivery of capability. This results in resilience requirements having a unique structure of data that must be captured for them to be properly specified. That required data drives a need for resilience-specific considerations to be added to traditional SE activities; these activities start early in the system life cycle to establish the resilience needs, priorities, and constraints. This paper discusses the activities that need to be added. Once resilience has been addressed in the system requirements, architecture, and design, we find the SE practices that are normally carried out during the remainder of the SE lifecycle will handle resilience considerations without alteration.
Authors: John S. Brtis
Michael A. McEvilley
July 2019
MP190495
MITRE PRODUCT
Dept. No.: T886, T884
Project No.: 10AOH080-FK
The views, opinions and/or findings
contained in this report are those of The
MITRE Corporation and should not be
construed as an official government position,
policy, or decision, unless designated by
other documentation.
Approved for Public Release; Distribution
Unlimited 19-2103.
©2019 The MITRE Corporation.
All rights reserved.
Colorado Springs, CO
Introduction
The System Context for System Resilience
Resilience Terminology & Basics
3.1 The Meaning of Resilience
3.2 Capabilities
3.3 Business and Stakeholder Needs
3.4 Resilience Scenarios
Modeling the Resilience Scenario
4.1 Normal and Adverse Conditions
4.2 Adversity Due to Hazard and Vulnerability
4.3 Required System Capabilities
4.4 Capability Measures and Targets
Content of Resilience Requirements
SE Lifecycle Practices Affected by Resilience
6.1 Technical Processes
6.2 Technical Management Processes
6.3 Key Resilience Information and Artifacts from the System Modeling Perspective
Conclusions
Bibliography
List of Figures
Figure 1: System delivery of capability depends on environmental conditions, which may be nominal or adverse.
Figure 2: Resilience is a statement about multiple considerations.
Figure 3: Resilience scenario lifecycle components
Figure 4: Data structure for specifying resilience requirements.
List of Tables
Table 1: Modeling information and artifacts during lifecycle phases.
Introduction
Resilience has become an important consideration in SE over the past dozen years. While the
term “resilience” has been used extensively in other domains such as psychology, ecology, and
social science, “resilience engineering” was coined by Hollnagel only in 2006, and “resilience of
systems” was first explored by Haimes in 2008. Starting in 2010, the need for resilient
systems appeared extensively in White House Directives, DHS documents, National Space Policy,
and numerous United Nations documents. To ensure that the needs and requirements for
resilience of engineered systems can be met, it is incumbent on the SE community to understand
what resilience is and how to achieve system resilience through SE methodologies. We suggest
specific additions to standard SE methods to improve the ability to achieve resilience of systems.
The System Context for System Resilience
The concepts we present in this paper are grounded in the SE definition of system. Systems by
definition deliver desired capability. It is the quality of the delivery of such capability – in the
face of adversity – that resilience addresses.
The following two definitions come from ISO 15288.
System – a combination of interacting elements organized to achieve one or more stated
purposes.1
System Elements – may include hardware, software, data, humans, processes, procedures,
facilities, materials and naturally occurring entities.
Systems interact with their environments. Nominal environmental conditions often dominate the
focus of SE activities. The concept of resilience explicitly adds the consideration of adversity and
requires a shift in requirements analysis, architecture, and design methods to establish an
approach that addresses nominal and adverse conditions under which the system should operate.
1 In this paper we are concerned exclusively with human created (engineered) systems, and not naturally occurring systems.
Figure 1: System delivery of capability depends on environmental conditions, which may be
nominal or adverse.
Resilience Terminology & Basics
3.1 The Meaning of Resilience
What we mean by resilience:
Resilience is the ability to provide required capability in the face of adversity. [Brtis 2016]
An influence diagram representing this meaning is shown in Figure 2.
The sources of adversity may be natural or human, and may include sources external to or
within the system of interest. Human sources of adversity may be opponent, friendly, or neutral
and may have intent or absence of intent. Resilience encompasses the system’s ability to
avoid, withstand, and recover from adversity.
Inspection of Figure 2 leads to the understanding that, to assess the resilience of alternative
systems, one must:
• Know the system architectures and/or designs under consideration.
• Know the system functional behavior, data, and control flows to deliver the required
capability.
• Know the capability(s) of interest, how each is measured, and the required levels of delivery.
• Know the adversity(s) that may affect the system.
• Know the system behavior in response to the adversity(s).
3.2 Capabilities
We use the term capability throughout this document to represent the system’s ability to achieve
desired effects. This provides an umbrella term for considering the many objectives and
outcomes achieved by SE activities that are relevant to resilience, such as: mission objectives,
user needs, user requirements, system requirements, derived requirements, etc.
3.3 Business and Stakeholder Needs
Business and stakeholder needs establish the foundation for resilience analysis. Stakeholder
objectives are what the stakeholders value, are the reasons that capabilities are needed, and will
be the basis for identifying the needed capabilities, and the expectation of performance in the
face of adversity. Capabilities may or may not have direct value to the stakeholder, but they are
needed – and justified – by their ability to support the achievement of the objectives, which have
intrinsic value.2
2 As an example, there are capabilities of the system that exist for the purpose of the system to function. These capabilities result
from architecture and design decisions and therefore do not trace directly back to the stakeholder driven needs.
Figure 2: Resilience is a statement about multiple considerations. (Influence diagram; node
labels: System, System Capability, Adversity, Required Capability, Resilience Scenarios; link
labels: Delivers, Affects, Gaged Against.)
3.4 Resilience Scenarios
Scenarios are an important means of representing business and stakeholder needs. A scenario
should describe the effect to be achieved and the environment in which this will be performed.
This establishes one basis for the measures, targets and conditions (including adversities) by
which acceptable capabilities will be judged. A range of scenarios should be developed that
properly represents the scope of operations that the system is expected to support.
Figure 3: Resilience scenario lifecycle components
Modeling the Resilience Scenario
To understand resilience it is useful to have a model of a resilience scenario. Figure 3 represents
a notional scenario on a two-dimensional graph. The horizontal axis represents time. The left
vertical axis represents capability, and the right vertical axis represents the level of effect on the
system. The green line indicates the required capability. The red line indicates the level of effect
applied to the system by the adversity as a function of time. The blue line indicates the capability
delivered by the system as a function of time. A number of periods of interest are labeled: the
period of effect, the period of avoidance, the period of withstanding, and the period of recovery.
This is only a single notional example, and it is important to note that avoidance, withstanding,
and recovery will often overlap in time.
The graph shows a highly simplified view of a resilience scenario. At the start, the system is
shown to deliver capability in excess of that required (i.e., the system has a margin of capability),
while the adversity may be in the environment of the system. When an adversity begins to affect
the system, a period of delay is shown during which the system withstands the effect, after which
the system’s delivery of capability degrades, eventually dropping below the required capability.
After the period of effect ends, the capability delivery is shown to continue to degrade for a short
period of time, after which it recovers to a level that is below the initially delivered capability but
above the required capability.
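The notional scenario described above can be sketched in code. The following is a minimal illustration, not part of the original model: it assumes the three curves are sampled at common, time-ordered instants, and the names `Sample` and `summarize`, as well as every numeric value, are hypothetical. It extracts the initial margin, the period of effect, the end of the withstanding delay, the period during which delivery falls below the requirement, and the point of recovery.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    t: float          # time, in arbitrary units (assumed)
    required: float   # required capability (green line)
    delivered: float  # delivered capability (blue line)
    effect: float     # level of effect imposed by the adversity (red line)


def summarize(samples: List[Sample]) -> dict:
    """Extract periods of interest from a time-ordered, sampled scenario."""
    baseline = samples[0].delivered
    affected = [s.t for s in samples if s.effect > 0]
    degraded = [s.t for s in samples if s.delivered < baseline]
    short = [s.t for s in samples if s.delivered < s.required]
    recovered: Optional[float] = None
    if short:
        # First sample after the last shortfall (assumes a single shortfall period).
        later = [s.t for s in samples if s.t > short[-1]]
        recovered = later[0] if later else None
    return {
        "initial_margin": baseline - samples[0].required,
        "effect_period": (affected[0], affected[-1]) if affected else None,
        "degradation_start": degraded[0] if degraded else None,
        "shortfall_period": (short[0], short[-1]) if short else None,
        "recovered_at": recovered,
        "final_margin": samples[-1].delivered - samples[-1].required,
    }


# Hypothetical values shaped like the narrative: margin, delay, degradation,
# continued degradation after the effect ends, then partial recovery.
scenario = [
    Sample(0, 10, 12, 0), Sample(1, 10, 12, 0), Sample(2, 10, 12, 5),
    Sample(3, 10, 12, 5), Sample(4, 10, 11, 5), Sample(5, 10, 9, 5),
    Sample(6, 10, 7, 5), Sample(7, 10, 6, 0), Sample(8, 10, 8, 0),
    Sample(9, 10, 11, 0),
]
summary = summarize(scenario)
print(summary)
```

In this sketch the gap between the start of the effect period and `degradation_start` corresponds to the withstanding delay, and a positive `final_margin` smaller than `initial_margin` corresponds to recovery to a level below the initial delivery but above the requirement.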
This notional model is highly simplified for the purpose of discussion. In a real situation several
complications should be considered:
1. There may be multiple discrete adverse event scenarios that need to be considered.
2. There may be multiple adverse events affecting the system over the period of interest.
3. The required capability may vary in a much more complex manner.
4. As with risk, the ability to avoid, withstand, and recover from the adversity may or may not have
a probabilistic component.
One approach to addressing items 1, 2, and the probabilistic aspect of 4 is to sum the
probability-weighted resilience of all of the salient resilience scenarios. This is detailed in [Brtis 2016].
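As a rough illustration of the probability-weighted sum, the sketch below assumes that each salient scenario has been assigned a probability over the period of interest and a resilience score between 0 and 1. Both the scoring scale and the function name `expected_resilience` are assumptions made here for illustration; the formulation in [Brtis 2016] may differ.

```python
from typing import Iterable, Tuple


def expected_resilience(scenarios: Iterable[Tuple[float, float]]) -> float:
    """Return the sum of probability * score over (probability, score) pairs."""
    pairs = list(scenarios)
    for p, r in pairs:
        if not (0.0 <= p <= 1.0 and 0.0 <= r <= 1.0):
            raise ValueError("probabilities and scores must lie in [0, 1]")
    if sum(p for p, _ in pairs) > 1.0 + 1e-9:
        raise ValueError("scenario probabilities must not sum to more than 1")
    return sum(p * r for p, r in pairs)


# Hypothetical scenario set: (probability, resilience score).
salient = [(0.90, 1.0), (0.08, 0.7), (0.02, 0.2)]
print(round(expected_resilience(salient), 3))
```

A single aggregate of this kind can hide a rare but unacceptable scenario, so the per-scenario scores are usually worth reporting alongside the sum.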
4.1 Normal and Adverse Conditions
The performance of the system – the ability to provide the required capability(s) – depends on
the conditions under which the system must operate. Sources of the conditions may include:
• Mission conditions
• Operational modes
• ConOps (Concept of Operations)
• Operational environment
• Operator and user information
• System modes of operation
• Enemy threat structure
• Enemy ConOps
• Resilience Scenarios
Importantly, mission objectives and conditions may change with time or as a result of the
operational mode; e.g., both the mission objectives and an enemy’s capability to inflict harm may
be significantly different in 2017 than in 2030. The timeframe over which mission objectives
and conditions are valid must be identified.
It is common for there to be multiple scenarios with multiple parameter sets that must be
considered. If one of these sets can be shown to “envelop” all of the others, it may be
appropriate to look at only that one set of conditions. In many situations, however, it will be
appropriate to consider multiple salient resilience scenarios.
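One simple way to test whether a single set of conditions envelops the others is a parameter-by-parameter comparison. The sketch below rests on a strong assumption that will not hold for every parameter: each condition is expressed as a number for which a larger value is at least as severe. The function names and the example parameters are hypothetical.

```python
from typing import Dict, Optional

ConditionSet = Dict[str, float]


def envelops(a: ConditionSet, b: ConditionSet) -> bool:
    """True if `a` is at least as severe as `b` on every parameter of `b`."""
    return all(name in a and a[name] >= value for name, value in b.items())


def find_enveloping_set(sets: Dict[str, ConditionSet]) -> Optional[str]:
    """Return the name of a set that envelops all the others, or None."""
    for name, candidate in sets.items():
        if all(envelops(candidate, other) for other in sets.values()):
            return name
    return None


# Hypothetical condition sets; larger values are assumed to be more severe.
sets = {
    "storm": {"wind_mps": 40.0, "outage_hours": 12.0},
    "heat": {"wind_mps": 5.0, "outage_hours": 4.0},
}
print(find_enveloping_set(sets))

# Adding a set that is worse on one parameter only removes the envelope.
sets_with_flood = dict(sets, flood={"wind_mps": 10.0, "outage_hours": 48.0})
print(find_enveloping_set(sets_with_flood))
```

When no enveloping set exists, as in the second call, each salient scenario has to be carried forward and analyzed separately.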
4.2 Adversity Due to Hazard and Vulnerability
Many of the adversities we consider can come from external sources: human (friendly or not),
nature, other systems. Adversity may also come from faults, errors, failures and other manners
in which the system of interest deviates from intended behavior. Identifying such adversities can
be based on the assessment of threat(s), the analysis of empirical data, or the modeling of the
impact of adversity that is unknown.
One other valuable approach to designing for resilience is to evaluate the system – independent
of environmental concerns – for hazards and vulnerabilities. Once hazards and vulnerabilities
are identified, they can be eliminated or they can be treated as posited adversities to be addressed
by the resilience design.
An example of hazard/vulnerability-based adversity would be the double-ended main coolant
feedline break, which is a postulated “design basis accident” for all US-built light water nuclear
power plants. This feedline provides the coolant to the reactor and it is part of the reactor
pressure boundary. There is no likely set of events that would lead to the breakage of this line,
but it is stipulated by law that designers must assume that this line breaks and then must design
to ensure that all safety requirements will be met.
4.3 Required System Capabilities
System capability analysis addresses two questions:
• What system capabilities are required to achieve the system objectives under what
conditions for each adversity scenario?
• What are the targets of acceptability for those system capabilities?
Each capability will have one or more measures and targets for assessing whether the capability
need has been met. A measure is the attribute of the capability that needs to be considered: e.g.,
availability, quantity, or accuracy. The target describes how well the capability measure needs to
be performed, and may consider multiple levels such as threshold and objective.
All of the identified capabilities should be reviewed and validated. Standard SE techniques
should be applied to ensure that the set of capabilities adequately represents the user needs and
the mission objectives.
4.4 Capability Measures and Targets
For each capability, the subject matter experts (SMEs), often supported by analysts, will identify
the measure(s) by which the delivery of each capability will be judged and the units of that measure.
Examples of measures for a communications system include:
• Quality [0-1]
• Reception success rate [min⁻¹]
• Reception success [percent]
• Bandwidth [MHz]
• Throughput [bits/sec]
• Availability [unitless]
• Processing time [seconds]
• Delivery time [seconds]
Finally, the SMEs will establish the target: the amount of each capability required to support the
achievement of each system objective.
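A measure, its units, and a two-level target (threshold and objective) can be captured in a small record. The sketch below is illustrative only: the class names, the `higher_is_better` flag, and all numeric values are assumptions and are not taken from the report.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    BELOW_THRESHOLD = "below threshold"
    THRESHOLD_MET = "threshold met"
    OBJECTIVE_MET = "objective met"


@dataclass(frozen=True)
class Target:
    measure: str                   # attribute of the capability being judged
    unit: str                      # units of the measure
    threshold: float               # minimum acceptable level
    objective: float               # desired level
    higher_is_better: bool = True  # False for measures such as delivery time

    def assess(self, value: float) -> Level:
        """Classify a measured value against the threshold and objective."""
        sign = 1.0 if self.higher_is_better else -1.0
        if sign * value >= sign * self.objective:
            return Level.OBJECTIVE_MET
        if sign * value >= sign * self.threshold:
            return Level.THRESHOLD_MET
        return Level.BELOW_THRESHOLD


# Hypothetical targets for two of the communications measures listed above.
throughput = Target("Throughput", "bits/sec", threshold=1e6, objective=5e6)
delivery = Target("Delivery time", "seconds", threshold=2.0, objective=0.5,
                  higher_is_better=False)

print(throughput.assess(3e6))
print(delivery.assess(0.4))
```

Keeping the direction of improvement explicit avoids a common mistake when one target set mixes rates, where more is better, with times, where less is better.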
Content of Resilience Requirements
Requirements are a core consideration in SE. To be achieved through SE, resilience must be
effectively represented as system requirements. The challenge is that resilience aggregates the
considerations of functional, performance and environmental requirements. We must be able to
capture this compound requirement, so standard systems engineering practices can trade system
resilience against other system properties expressed in the system requirements.
The content of a resilience requirement flows almost directly from the definition of resilience:
“the ability to deliver required capability in the face of adversity.” Importantly, resilience is not
just about the “whole” system. It needs to be addressed at a finer level of granularity.
Resilience is about the delivery of desired capabilities by the system – in the face of adversity.
Thus, specifying resilience requires that several parameters be identified. We will call the
aggregation of these parameters a “resilience scenario.” The following must be known in order
to specify resilience:
• The capability(s) of interest. (Note: a system may deliver several capabilities, each of which
may have different levels of resilience.)
• The measure(s) (and units) of the capability.
• The target(s) (required amount) of the capability.
o Note: The required capability may vary as a function of time during the resilience
scenario.
o Note: There may be several salient levels of “required” capability (e.g., nominal,
degraded mode, minimum useful, objective, threshold, etc.). Also, the capability may
have a utility value that is a function of the amount of capability delivered.
• System modes of operation (e.g., operational, contingency, training, exercise,
maintenance, update).
• The adversity(s) being considered for this resilience scenario and the level of effect that
the adversities can impose on the system.
• Understanding of the effects that the adversity imposes on the system and how the system
reacts to those effects in terms of its ability to deliver capability.
• The timeframe of interest.
o Note: An adversity may be acute or chronic, single or multiple, and may vary with
time.
• The required resilience (performance) of the capability in the face of each identified
resilience scenario (e.g., expected availability, maximum allowed degradation, maximum
length of degradation, etc.).
o Note: There may be several “required” resilience goals (e.g., threshold, objective, As
Resilient as Practicable (ARAP)).3
Importantly, any of these factors and parameters may vary over the timeframe of the scenario,
and this fact must be addressed by the systems engineer.
Note that the capability in question is likely to be a functional requirement of the system.
Resilience then extends such requirements into a resilience scenario by adding environmental
requirements (adversities) and performance requirements. This leads to a specific structure
among the salient parameters. The entity-relationship diagram for this information is shown in
Figure 4. This structure must be addressed in SE considerations for requirements traceability and
management, architecting, design, verification, and validation.4
3 By “practicable” we mean capable of being accomplished to the extent that any further increase in resilience results in an
unacceptable degradation in system performance, cost, or means of utilization.
4 These resilience requirements will, of course, compete in trades against other requirements and issues. Issues unique to
performing trades with resilience will be addressed in future work.
Figure 4: Data structure for specifying resilience requirements.
Examples of Resilience Requirements
• For a nuclear power plant: In the event of a double-ended guillotine pipe break, the fuel
clad temperature shall be maintained below 2200 degrees Fahrenheit for the entire
duration of the accident. This shall be met with the assumption of the total loss of offsite
power. This should be achieved with 99.997% confidence.
• For the electric power system at a hospital: In the event of loss of quality offsite power,
backup power at the nominal specified quality shall be made available to all critical
circuits within 300 ms and shall be available for up to 72 hours without any maintenance
or external resources such as fuel. This shall be achieved with 99.99% confidence.
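To show how the parameters of a resilience requirement fit together, the hospital example can be encoded as a structured record. This is a minimal sketch: the class and field names are hypothetical and do not reproduce the entity names of Figure 4, and the system mode is an assumption because the example does not state one. The numeric values (300 ms, 72 hours, 99.99%) come from the example itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ResilienceRequirement:
    capability: str               # capability of interest
    adversity: str                # adversity considered in this scenario
    system_mode: str              # mode of operation (assumed here)
    max_restoration_s: float      # maximum time to restore capability, seconds
    sustain_hours: float          # duration the capability must be sustained
    confidence: float             # required confidence, as a fraction
    constraints: Tuple[str, ...]  # conditions under which the target holds

    def is_satisfied_by(self, restoration_s: float, sustained_hours: float,
                        demonstrated_confidence: float) -> bool:
        """Check demonstrated performance against the stated targets."""
        return (restoration_s <= self.max_restoration_s
                and sustained_hours >= self.sustain_hours
                and demonstrated_confidence >= self.confidence)


hospital_power = ResilienceRequirement(
    capability="Backup power at nominal specified quality to all critical circuits",
    adversity="Loss of quality offsite power",
    system_mode="operational",  # assumption: not stated in the example
    max_restoration_s=0.300,
    sustain_hours=72.0,
    confidence=0.9999,
    constraints=("no maintenance", "no external resources such as fuel"),
)

# Hypothetical verification results for a candidate design.
print(hospital_power.is_satisfied_by(0.25, 80.0, 0.99995))
```

Holding the adversity, the targets, and the confidence in one record keeps the compound requirement traceable as a unit instead of scattering it across separate functional, performance, and environmental requirements.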
SE Lifecycle Practices Affected by Resilience
The INCOSE Systems Engineering Handbook: A Guide for System Life Cycle Processes and
Activities, Fourth Edition, INCOSE-TP-2003-002-04, 2015 (SE Handbook) is a standard source
on how to effectively apply SE. It specifies fourteen Technical Processes and eight Technical
Management Processes for performing SE. While the Handbook addresses resilience as a
specialty engineering activity in section 10.9, there are important considerations worth adding to
some of the SE practices described if resilience is to be achieved. The sections below
provide the extensions to the SE Handbook to address this in five of the technical processes and
one of the technical management processes. For those processes not discussed here, the SE
processes in the SE Handbook should suffice as is. The numbering below reflects the numbering
of the SE Handbook. (Note: the INCOSE SE Handbook has been fully harmonized with Systems
and Software Engineering – System life cycle processes, ISO/IEC/IEEE 15288:2015(E), though
the numbering of the processes differs. Recommendations below apply equally to
ISO/IEC/IEEE 15288.)
6.1 Technical Processes
6.1.1 Business or Mission Analysis Process
• Defining the problem space should include the identification of adversities under which
the system must provide capability and the expectations for performance under those
adversities. Those expectations of performance shall consider the relative valuation and
criticality of the capability.
• The OpsCon and solution classes which characterize the solution space should consider
the system’s ability to avoid, withstand, and recover from the adversities for the purpose
of providing the required capabilities.
• Evaluation of alternative solution classes must consider the system’s ability to deliver
desired capabilities under the adversities.
6.1.2 Stakeholder Needs and Requirements Definition Process
• The stakeholder set should include persons who understand the potential adversities and
the requisite stakeholder resilience needs.
• Identifying stakeholder needs should identify stakeholder expectations for capability under
adverse conditions, and should consider degraded – but useful – modes of operation.
• The operational concept should consider adversities as part of the defined operational
environment. The scenarios should include resilience scenarios.
• Transformation of stakeholder needs into stakeholder requirements should include the
development of stakeholder resilience requirements.
• Analysis of stakeholder requirements should include appropriate adversity scenarios
within the intended operational environment.
6.1.3 System Requirements Definition Process
• In defining system requirements, resilience should be considered in the identification of
quality requirements.
• System requirements that achieve resilience will often address system “-ilities.”
Achieving resilience and the “-ilities” should be addressed holistically.
6.1.4 Architecture Definition Process
• The architecture viewpoints selected should support the representation of resilience.
• Experience shows that resilience requirements can significantly limit the range of
acceptable architectures. It is critical that resilience requirements are fully mature, and
fully validated and verified when used for architecture selection.
• Individuals developing candidate architectures should be familiar with architectural
techniques for achieving resilience.
• Architectural techniques for achieving resilience are often germane to the system
“-ilities.” Achieving resilience and the “-ilities” should be addressed holistically.
6.1.5 Design Definition Process
• Individuals developing candidate designs should be familiar with design techniques for
achieving resilience.
• Design techniques for achieving resilience are often germane to the system “-ilities.”
Achieving resilience and the “-ilities” should be addressed holistically.
6.2 Technical Management Processes
6.2.1 Risk Management Process
It is important to recognize that system resilience (which is focused on the resilience of required
capability) and risk (which includes a focus on the effect of uncertainty on the ability to achieve
operational objectives) are tightly coupled. Risk management activities should be explicitly
planned and coordinated with SE resilience activities.
6.3 Key Resilience Information and Artifacts from the System Modeling Perspective
Much of the information developed during the system development can take the form of models.
The manner in which resilience requirements, adversities and resilience scenarios will need to be
factored into the system models is discussed below. For the purpose of this discussion we will
assume that the Systems Modeling Language (SysML) is the language being used for this
purpose. Table 1 identifies the modeling information that needs to be captured during the
various lifecycle stages to support the effective development and documentation of resilience
scenarios and resilience requirements.
Table 1: Modeling information and artifacts during lifecycle phases.
| Lifecycle Phase | Artifacts and information |
| --- | --- |
| Mission and Stakeholder Needs Analysis | Add adversities to the context diagram as actors. Add resilience scenarios as use cases. |
| Stakeholder Requirements | Develop use case interaction diagrams to document the interaction of actors and architectural modules during the scenarios. Develop sequence diagrams to represent the activity flow during scenarios. |
| System Requirements | Develop activity diagrams to show the states of the system (and adversities) during scenarios. |
| Architecture and System Design | Develop state models of the scenarios. Model events and signals among the architectural nodes. |
| System Design | Propose and select resilience design features. Document resilience related object distribution. |
Conclusions
This paper has addressed two broad considerations for improving SE’s ability to deliver resilient
systems: 1) how to generate quality requirements for resilience, and 2) how to augment the SE
lifecycle to address resilience.
Bibliography
Brtis, J. S., “How to Think About Resilience in a DoD Context,” MITRE Technical Document,
2016.
Hollnagel, E., D. Woods, and N. Leveson (eds). 2006. Resilience Engineering: Concepts and
Precepts. Aldershot, UK: Ashgate Publishing Limited.
INCOSE, Systems Engineering Body of Knowledge (SEBOK), Part 6: Related Disciplines/SE
and Specialty Engineering/System Resilience, http://sebokwiki.org/wiki/System_Resilience.
INCOSE “Technical Measurement,” Ver. 1.0, INCOSE-TP-2003-020-01, December 27, 2005.
Jackson, S., & Ferris, T. (2013). Resilience Principles for Engineered Systems. Systems
Engineering, 16(2), 152-164.
National Security Space Strategy (Unclassified Summary), Jan 2011.
National Space Policy of the United States of America, June 28, 2010.
NDIA National Security Space Policy & Architecture Symposium, “Progress toward resilient
national security space,” 02 Aug 2017.
Office of the Secretary of Defense – Space Policy, “Resilience of Space Capabilities” White
Paper, 2011.
“Space Domain Mission Assurance: A Resilience Taxonomy,” Office of the Assistant Secretary
of Defense for Homeland Defense & Global Security, Sept 2015.
The National Academies, “Disaster Resilience: a National Imperative,” the National Academies
Press, 2012.