Essential characteristics of resilience

Chapter 2
Essential Characteristics of
Resilience
David D. Woods
Avoiding the Error of the Third Kind
When one uses the label ‘resilience,’ the first reaction is to think of
resilience as if it were adaptability, i.e., as the ability to absorb or adapt
to disturbance, disruption and change. But all systems adapt (though
sometimes these processes can be quite slow and difficult to discern) so
resilience cannot simply be the adaptive capacity of a system. I want to
reserve resilience to refer to the broader capability – how well can a
system handle disruptions and variations that fall outside of the base
mechanisms/model for being adaptive as defined in that system.
This depends on a distinction between understanding how a system
is competent at designed-for-uncertainties, which defines a ‘textbook’
performance envelope and how a system recognizes when situations
challenge or fall outside that envelope – unanticipated variability or
perturbations (see parallel analyses in Woods et al., 1990 and Carlson &
Doyle, 2000; Csete & Doyle, 2002). Most discussions of definitions of
‘robustness’ in adaptive systems debate whether resilience refers to first
or second order adaptability (Jen, 2003). In the end, the debates tend to
settle on emphasizing the system’s ability to handle events that fall
outside its design envelope and debate what is a design envelope, what
events challenge or fall outside that envelope, and how does a system
see what it has failed to build into its design (e.g., see url:
http://discuss.santafe.edu/robustness/).
22 Resilience Engineering
The area of textbook competence is in effect a model of
variability/uncertainty and a model of how the strategies/plans/countermeasures
in play handle these, mostly successfully.
Unanticipated perturbations arise (a) because the model implicit and
explicit in the competence envelope is incomplete, limited or wrong
and (b) because the environment changes so that new demands,
pressures, and vulnerabilities arise that undermine the effectiveness of
the competence measures in play.
Resilience then concerns the ability to recognize and adapt to
handle unanticipated perturbations that call into question the model of
competence, and demand a shift of processes, strategies and
coordination. When evidence of holes in the organization’s model
builds up, the risk is what Ian Mitroff called many years ago, the error
of the third kind, or solving the wrong problem (Mitroff, 1974). This is
a kind of under-adaptation failure where people persist in applying
textbook plans and activities in the face of evidence of changing
circumstances that demand a qualitative shift in assessment, priorities,
or response strategy.
This means resilience is concerned with monitoring the boundary
conditions of the current model for competence (how strategies are
matched to demands) and adjusting or expanding that model to better
accommodate changing demands. The focus is on assessing the
organization’s adaptive capacity relative to challenges to that capacity –
what sustains or erodes the organization’s adaptive capacities? Is it
degrading or lower than the changing demands of its environment?
What dynamics challenge or go beyond the boundaries of the
competence envelope? Is the organization as well adapted as it thinks it
is? Note that boundaries are properties of the model that defines the
textbook competence envelope relative to the uncertainties and
perturbations it is designed for (Rasmussen, 1990a). Hence, resilience
engineering devotes effort to make observable the organization’s model
of how it creates safety, in order to see when the model is in need of
revision.
To do this, Resilience Engineering must monitor organizational
decision-making to assess the risk that the organization is operating
nearer to safety boundaries than it realizes (Woods, 2005a). Monitoring
resilience should lead to interventions to manage and adjust the
adaptive capacity as the system faces new forms of variation and
challenges.
Monitoring and managing resilience, or its absence, brittleness, is
concerned with understanding how the system adapts and to what
kinds of disturbances in the environment, including properties such as:
buffering capacity: the size or kinds of disruptions the system can
absorb or adapt to without a fundamental breakdown in
performance or in the system’s structure;
flexibility versus stiffness: the system’s ability to restructure itself in
response to external changes or pressures;
margin: how closely or how precariously the system is currently
operating relative to one or another kind of performance boundary;
tolerance: how a system behaves near a boundary – whether the
system gracefully degrades as stress/pressure increases or collapses
quickly when pressure exceeds adaptive capacity.
In addition, cross-scale interactions are critical, as the resilience of a
system defined at one scale depends on influences from scales above
and below:
Downward, resilience is affected by how organizational context
creates or facilitates resolution of pressures/goal
conflicts/dilemmas, for example, mismanaging goal conflicts or
poor automation design can create authority-responsibility double
binds for operational personnel (Woods et al., 1994; Woods,
2005b).
Upward, resilience is affected by how adaptations by local actors in
the form of workarounds or innovative tactics reverberate and
influence more strategic goals and interactions (e.g., workload
bottlenecks at the operational scale can lead to practitioner
workarounds that make management’s attempts to command
compliance with broad standards unworkable; Cook et al., 2000).
As illustrated in the cases of resilience or brittleness described or
referred to in this book, all systems have some degree of resilience and
sources for resilience. Even cases with negative outcomes, when seen as
breakdowns in adaptation, reveal the complicating dynamics that stress
the textbook envelope and the often hidden sources of resilience used
to cope with these complexities.
Accidents have been noted by many analysts as ‘fundamentally
surprising’ events because they call into question the organization’s
model of the risks they face and the effectiveness of the
countermeasures deployed (Lanir, 1986; Woods et al., 1994, chapter 5;
Rochlin, 1999; Woods, 2005b). In other words, the organization is
unable to recognize or interpret evidence of new vulnerabilities or
ineffective countermeasures until a visible accident occurs. At this stage
the organization can engage in fundamental learning but this window of
opportunity comes at a high price and is fragile given the consequences
of the harm and losses. The shift demanded following an accident is a
reframing process. In reframing one notices initial signs that call into
question ongoing models, plans and routines, and begins processes of
inquiry to test if revision is warranted (Klein et al., 2005). Resilience
Engineering aims to provide support for the cognitive processes of
reframing an organization’s model of how safety is created before
accidents occur by developing measures and indicators of contributors
to resilience such as the properties of buffers, flexibility, precariousness,
and tolerance and patterns of interactions across scales such as
responsibility-authority double binds.
Monitoring resilience is monitoring for the changing boundary
conditions of the textbook competence envelope – how a system is
competent at handling designed-for-uncertainties – to recognize forms
of unanticipated perturbations – dynamics that challenge or go beyond
the envelope. This is a kind of broadening check that identifies when
the organization needs to learn and change. Resilience engineering
needs to identify the classes of dynamics that undermine resilience and
result in organizations that act riskier than they realize. This chapter
focuses on dynamics related to safety-production goal conflicts.
Coping with Pressure to be Faster, Better, Cheaper
Consider recent NASA experience, in particular, the consequences of
NASA’s adoption of a policy called ‘faster, better, cheaper’ (FBC).
Several years later a series of mishaps in space science missions rocked
the organization and called into question that policy. In a remarkable
‘organizational accident’ report, an independent team investigated the
organizational factors that spawned the set of mishaps (Spear, 2000).
The investigation realized that FBC was not a policy choice, but the
acknowledgement that the organization was under fundamental
pressure from stakeholders. The report and the follow-up, but short-
lived, ‘Design for Safety’ program noted that NASA had to cope with a
changing environment with increasing performance demands combined
with reduced resources: drive down the cost of launches, meet shorter,
more aggressive mission schedules, do work in a new organizational
structure that required people to shift roles and coordinate with new
partners, and cope with eroding levels of personnel experience and skills.
Plus, all of these changes were occurring against a backdrop of heightened
public and congressional interest that threatened the viability of the space
program. The Mars Climate Orbiter (MCO) investigation board concluded: NASA, which had
a history of ‘successfully carrying out some of the most challenging and
complex engineering tasks ever faced by this nation,’ was being asked to
‘sustain this level of success while continually cutting costs, personnel
and development time … these demands have stressed the system to
the limit’ due to ‘insufficient time to reflect on unintended
consequences of day-to-day decisions, insufficient time and workforce
available to provide the levels of checks and balances normally found,
breakdowns in inter-group communications, too much emphasis on
cost and schedule reduction.’ The MCO Board diagnosed the mishaps
as indicators of an increasingly brittle system as production pressure
eroded sources of resilience and led to decisions that were riskier than
anyone wanted or realized. Given this diagnosis, the Board went on to
re-conceptualize the issue as how to provide tools for proactively
monitoring and managing project risk throughout a project life-cycle
and how to use these tools to balance safety with the pressure to be
faster, better, cheaper.
The experience of NASA under FBC is an example of the law of
stretched systems: every system is stretched to operate at its capacity; as
soon as there is some improvement, for example in the form of new
technology, it will be exploited to achieve a new intensity and tempo of
activity (Woods, 2003). Under pressure from performance and
efficiency demands (FBC pressure), advances are consumed to ask
operational personnel ‘to do more, do it faster or do it in more complex
ways’, as the Mars Climate Orbiter Mishap Investigation Board report
determined. With or without cheerleading from prestigious groups,
pressures to be ‘faster, better, cheaper’ increase. Furthermore, pressures
to be ‘faster, better, cheaper’ introduce changes, some of which are new
capabilities (the term does include ‘better’), and these changes modify
the vulnerabilities or paths toward failure. How conflicts and trade-offs
like these are recognized and handled in the context of vectors of
change is an important aspect of managing resilience.
Balancing Acute and Chronic Goals
Problems in the US healthcare delivery system provide another
informative case where faster, better, cheaper pressures conflict with
safety and other chronic goals. The Institute of Medicine in a calculated
strategy to guide national improvements in health care delivery
conducted a series of assessments. One of these, Crossing the Quality
Chasm: A New Health System for the 21st Century (IOM, 2001), stated
six goals needed to be achieved simultaneously: the national health care
system should be – Safe, Effective, Patient-centered, Timely, Efficient,
Equitable.1 Each goal is worthy and generates thunderous agreement.
The next step seems quite direct and obvious – how to identify and
implement quick steps to advance each goal (the classic search for so-
called ‘low hanging fruit’). But as in the NASA case, this set of goals is
not a new policy direction but rather an acknowledgement of
demanding pressures already operating on health care practitioners and
organizations. Even more difficult, the six goals represent a set of
interacting and often conflicting pressures so that in adapting to reach
for one of these goals it is very easy to undermine or squeeze others. To
improve on all simultaneously is quite tricky.

1 The IOM states the quality goals as –
‘Health Care Should Be:
Safe – avoiding injuries to patients from the care that is intended to help them.
Effective – providing services based on scientific knowledge to all who could
benefit and refraining from providing services to those not likely to benefit
(avoiding underuse and overuse, respectively).
Patient-centered – providing care that is respectful of and responsive to
individual patient preferences, needs, and values and ensuring that patient
values guide all clinical decisions.
Timely – reducing waits and sometimes harmful delays for both those who
receive and those who give care.
Efficient – avoiding waste, including waste of equipment, supplies, ideas, and
energy.
Equitable – providing care that does not vary in quality because of personal
characteristics such as gender, ethnicity, geographic location, and
socioeconomic status.’
As I have worked on safety in health care, I hear many highly
placed voices for change express a basic belief that these six goals can
be synergistic. Their agenda is to energize a search for and adoption of
specific mechanisms that simultaneously advance multiple goals within
the six and that do not conflict with others – ‘silver bullets’. For
example, much of the patient safety discussion in US health care
continues to be a search for specific mechanisms that appear to
simultaneously save money and reduce injuries as a result of care.
Similarly, NASA senior leaders thought that including ‘better’ along
with faster and cheaper meant that techniques were available to achieve
progress on being faster, better, and cheaper together (for almost comic
rationalizations of ‘faster, better, cheaper’ following the series of Mars
science mission mishaps and an attempt to protect the reputation of the
NASA administrator at the time, see Spear, 2000). The IOM and NASA
senior management believed that quality improvements began with the
search for these ‘silver bullet’ mechanisms (sometimes called ‘best
practices’ in health care). Once such practices are identified, the
question becomes how to get practitioners and organizations to adopt
these practices. Other fields can help provide the means to develop and
document new best practices by describing successes from other
industries (health care frequently uses aviation and space efforts to
justify similar programs in health care organizations). The IOM in
particular has had a public strategy to generate this set of silver bullet
practices and accompanying justifications (like creating a quality
catalog) and then pressure health care delivery decision makers to adopt
them all in the firm belief that, as a result, all six goals will be advanced
simultaneously and all stakeholders and participants will benefit (one
example is computerized physician order entry).
However, the findings of the Columbia accident investigation
board (CAIB) report should reveal to all that the silver bullet strategy is
a mirage. The heart of the matter is not silver bullets that eliminate
conflicts across goals, but developing new mechanisms that balance the
inherent tensions and trade-offs across these goals (Woods et al., 1994).
The general trade-off occurs between the family of acute goals – timely,
efficient, effective (or after NASA’s policy, the Faster, Better, Cheaper
or FBC goals) and the family of chronic goals, for the health care case
consisting of safety, patient-centeredness, and equitable access.
The tension between acute production goals and chronic safety
risks is seen dramatically in the Columbia accident which the
investigation board found was the result of pressure on acute goals
eroding attention, energy and investments on chronic goals related to
controlling safety risks (Gehman, 2003). Hollnagel (2004, p. 160)
compactly captured the tension between the two sets of goals with the
comment that:
If anything is unreasonable, it is the requirement to be both
efficient and thorough at the same time – or rather to be
thorough when with hindsight it was wrong to be efficient.
The FBC goal set is acute in the sense that these goals play out in the
short term and can be assessed through pointed data collection that
aggregates element counts (shorter hospital stays, delay times). Note
that ‘better’ is in this set, though better in this family means increasing
capabilities in a focused or narrow way, e.g., cardiac patients are treated
more consistently with a standard protocol. The development of new
therapies and diagnostic capabilities belongs in the acute sense of
‘better.’
Safety, access, patient-centeredness are chronic goals in the sense
that they are system properties that emerge from the interaction of
elements in the system and play out over longer time frames. For
example, safety is an emergent system property, arising in the
interactions across components, subsystems, software, organizations,
and human behavior.
By focusing on the tensions across the two sets, we can better see
the current situation in health care. It seems to be lurching from crisis
to crisis as efforts to improve or respond in one area are accompanied
by new tensions at the intersections of other goals (or the tensions are
there all along and the visible crisis point shifts as stakeholders and the
press shift their attention to different manifestations of the underlying
conflicts). The tensions and trade-offs are seen when improvements or
investments in one area contribute to greater squeezes in another area.
The conflicts are stirred by the changing background of capabilities and
economic pressure. The shifting points of crisis can be seen first in
1995-6 as dramatic well publicized deaths due to care helped create the
patient safety crisis (ultimately documented in Kohn et al., 1999). The
patient safety movement was energized by patients feeling vulnerable as
health care changed to meet cost control pressures. Today attention has
shifted to an access crisis as malpractice rates and prescription drug
costs undermine patients’ access to physicians in high risk specialties
and challenge seniors’ ability to balance medication costs with limited
personal budgets.
Dynamic Balancing Acts
If the tension view is correct, then progress revolves around how to
dynamically balance the potential trade-offs so that all six goals can
advance (as opposed to the current situation where improvements or
investments in one area create greater squeezes in another area). It is
important to remember that trade-offs are defined by two parameters:
one that captures discrimination power, or how well one can make the
underlying judgement, and a second, criterion placement or movement,
that defines where along the trade-off curve to place the criterion for
making a decision or taking an action. The parameters of a trade-off
cannot be estimated by a single case, but require integration over
behavior in sets of cases and over time.
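One common way to make these two parameters concrete is the equal-variance signal detection model. The chapter does not name a specific model, so that choice is an assumption here. The minimal Python sketch below estimates both parameters from outcome counts pooled over a set of cases; the function name, the four count arguments, the 0.5 correction, and the example counts are all illustrative and not part of the original analysis.

```python
from statistics import NormalDist


def tradeoff_parameters(hits, misses, false_alarms, correct_rejections):
    """Estimate (d_prime, criterion) from outcome counts over a set of cases.

    Assumes an equal-variance Gaussian signal detection model (an assumption
    of this sketch, not something the chapter specifies).
      hits               -- a real problem was present and was acted on
      misses             -- a real problem was present but was not acted on
      false_alarms       -- action was taken but no problem was present
      correct_rejections -- no action was taken and no problem was present
    """
    # Add 0.5 to each cell so the rates never reach exactly 0 or 1,
    # where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf
    # Discrimination power: how well problems are told apart from non-problems.
    d_prime = z(hit_rate) - z(fa_rate)
    # Criterion placement: 0 is neutral; positive values mean more evidence
    # is demanded before acting, negative values mean acting more readily.
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical counts accumulated over many decisions, not real data.
    d_prime, criterion = tradeoff_parameters(50, 50, 5, 95)
    print(f"discrimination power d' = {d_prime:.2f}, criterion c = {criterion:.2f}")
```

The function needs counts from many cases, which mirrors the point above: a single decision yields neither a hit rate nor a false alarm rate, so neither parameter can be estimated from it.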
One aspect of the difficulty of goal conflicts is that the default or
typical ways to advance the acute goals often make it harder to achieve
chronic goals simultaneously. For example, increasing therapeutic
capabilities can easily appear as new silos of care that do not redress
and can even exacerbate fragmentation of care (undermining the
patient-centeredness goal). To advance all of the goals, ironically, the
chronic set of goals (patient-centeredness, safety and access) must be put
first, with secondary concern for efficient and timely methods. To do
otherwise will fall prey to the natural tendency to value the more
immediate and direct consequences (which, by the way, are easier to
measure) of the acute set over the chronic and produce an
unintentional sacrifice on the chronic set. Effective balance seems to
arise when organizations shift from seeing safety as one of a set of goals
to be measured (is it going up or down?) to considering safety as a basic
value. The point is that for chronic goals to be given enough weight in
the interaction with acute goals, the chronic needs to be approached
much more like establishing a core cultural value.
For example, valuing the chronic set in health care puts patient
centeredness first with its fellow travelers safety and access. The central
issue under patient centeredness is emergent continuity of care, as the
patient makes different encounters with the health care system and as
disease processes develop over time. The opposite of continuity is
fragmentation. Many of the tensions across goals exacerbate
fragmentation, e.g., ironically, new capabilities on specific aspects of
health care can lead to more specialization and more silos of care.
Placing priority on continuity of care vs. fragmentation focuses
attention (a) on health care issues related to chronic diseases which
require continuity and which are inherently difficult in a fragmented
system of care and (b) on cognitive system issues which address
coordination over time, over practitioners, over organizations, and over
specialized knowledge sources. Consider the different ways new
technology can have an effect on patient care. Depending on how
computer systems are built and adapted over time, more
computerization can lead to less contact with patients and more contact
with the image of the patient in the database. This is a likely outcome
when FBC pressure leads acute goals to dominate chronic ones (the
benefits of the advance in information technology will tend to be
consumed to meet pressures for productivity or efficiency). When a
chronic goal, such as continuity of care, functions as the leading value,
the emphasis shifts to finding uses of computer capabilities that
increase attention and tailoring of general standards to a specific patient
over time (increasing the effective continuity) and only then developing
these capabilities to meet cost considerations.
The tension diagnosis is part of the more general diagnosis that
past success has led to increasingly complex systems with new forms of
problems and failure risks. The basic issue for organizational design is
how large-scale systems can cope with complexity, especially the pace
of change and coupling across parts that accompany the methods that
advance the acute goals. To miss the complexity diagnosis will make
otherwise well-intentioned efforts fail as each attempt to advance goals
simultaneously through silver bullets will rebound as new crises where
goal trade-offs create new dissatisfactions and tensions.
Sacrifice Judgements
To illustrate a safety culture, leaders tell stories about an individual
making tough decisions when goals conflict. The stories always have
the same basic form even though the details may come from a personal
experience or from re-telling of a story gathered from another domain
with a high reputation for safety (e.g., health care leaders often use
aerospace stories):
Someone noticed there might be a problem developing, but the
evidence is subtle or ambiguous. This person has the courage
to speak up and stop the production process underway. After
the aircraft gets back on the ground or after the system is
dismantled or after the hint is chased down with additional
data, then all discover the courageous voice was correct. There
was a problem that would otherwise have been missed and to
have continued would have resulted in failure, losses, and
injuries. The story closes with an image of accolades for the
courageous voice.
When the speaker finishes the story, the audience sighs with
appreciation – that was an admirable voice and it illustrates
how a great organization encourages people to speak up about
potential safety problems. You can almost see people in the
audience thinking, ‘I wish my organization had a culture that
helped people act this way.’
But this common story line has the wrong ending. It is a quite
different ending that provides the true test for a high resilience
organization.
When they go look, after the landing or after dismantling the
device or after the extra tests were run, everything turns out to
be OK. The evidence of a problem isn’t there or may be
ambiguous; production apparently did not need to be stopped.
Now, how does the organization’s management react? How do
the courageous voice’s peers react?
For there to be high resilience, the organization has to
recognize the voice as courageous and valuable even though
the result was apparently an unnecessary sacrifice on
production and efficiency goals. Otherwise, people balancing
multiple goals will tend to act riskier than we want them to, or
riskier than they themselves really want to.
These contrasting story lines illustrate the difficulties of balancing
acute goals with chronic ones. Given a backdrop of schedule pressure,
how should an organization react to potential ‘warning’ signs and seek
to handle the issues the signs point to? If organizations never sacrifice
production pressure to follow up warning signs, they are acting much
too riskily. On the other hand, if uncertain ‘warning’ signs always lead to
sacrifices on acute goals, can the organization operate within reasonable
parameters or stakeholder demands? It is easy for organizations that are
working hard to advance the acute goal set to see such warning signs as
risking inefficiencies or as low probability of concern as they point to a
record of apparent success and improvement. Ironically, these same
signs after-the-fact of an accident appear to all as clear cut undeniable
warning signs of imminent dangers.
To proactively manage risk prior to outcome requires ways to know
when to relax the pressure on throughput and efficiency goals, i.e.,
making a sacrifice judgement. Resilience engineering needs to provide
organizations with help on how to decide when to relax production
pressure to reduce risk (Woods, 2000). I refer to these trade-off
decisions as sacrifice judgements because acute production or efficiency
related goals are temporarily sacrificed, or the pressure to achieve these
goals is relaxed, in order to reduce the risks of approaching too near
safety boundaries. Sacrifice judgements occur in many settings: when to
convert from laparoscopic surgery to an open procedure (Dominguez
et al., 2004 and the discussion in Cook et al., 1998), when to break off
an approach to an airport during weather that increases the risks of
wind shear, and when to have a local slowdown in production
operations to avoid risks as complications build up.
New research is needed to understand this judgement process in
individuals and in organizations. Previous research on such decisions
(e.g., production/safety trade-off decisions in laparoscopic surgery)
indicates that the decision to value production over safety is implicit
and unrecognized. The result is that individuals and organizations act
much riskier than they would ever desire. A sacrifice judgement is
especially difficult because the hindsight view will indicate that the
sacrifice or relaxation may have been unnecessary since ‘nothing
happened.’ This means that it is important to assess how peers and
superiors react to such decisions.
The goal is to develop explicit guidance on how to help people
make the relaxation/sacrifice judgement under uncertainty, to maintain
a desired level of risk acceptance/risk averseness, and to recognize
changing levels of risk acceptance/risk averseness. For example, what
indicators reveal a safety/production trade-off sliding out of balance as
pressure rises to achieve acute production and efficiency goals?
Ironically, it is these very times of higher organizational tempo and
focus on acute goals that require extra investments in sources of
resilience to keep production/safety trade-offs in balance – valuing
thoroughness despite the potential for sacrifices on efficiency required
to meet stakeholder demands.
Note how the recommendation to aid sacrifice judgements is a
specialization of general methods for aiding any system confronting a
trade-off: (a) improve the discrimination power of the system
confronting the trade-off, and (b) help the system dynamically match its
placement of a decision criterion with the assessment of changing risk
and uncertainty.
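Part (b) can be sketched under an equal-variance signal detection model, which is an assumption of this example since the chapter gives no formula. In that model, for a fixed discrimination power d', the criterion that minimizes expected cost sits at ln(beta)/d', measured from the midpoint between the 'no problem' and 'problem' distributions. Here beta is the prior odds of no problem multiplied by the ratio of the cost of an unnecessary sacrifice to the cost of a missed problem, with correct outcomes valued at zero. The function name, parameters, and numbers below are hypothetical.

```python
import math


def criterion_placement(d_prime, p_problem, cost_miss, cost_false_alarm):
    """Expected-cost-minimizing criterion under equal-variance signal detection.

    d_prime          -- discrimination power of the judgement (must be > 0)
    p_problem        -- assessed probability that a real problem is present
    cost_miss        -- cost of continuing production when a problem is real
    cost_false_alarm -- cost of a sacrifice that turns out to be unnecessary

    Returns the criterion location relative to the midpoint between the two
    distributions. Lower values mean production pressure is relaxed on weaker
    evidence; higher values mean stronger evidence is demanded first.
    """
    if not 0.0 < p_problem < 1.0:
        raise ValueError("p_problem must be strictly between 0 and 1")
    if d_prime <= 0 or cost_miss <= 0 or cost_false_alarm <= 0:
        raise ValueError("d_prime and both costs must be positive")

    # Likelihood-ratio threshold: prior odds of 'no problem' times cost ratio.
    beta = ((1.0 - p_problem) / p_problem) * (cost_false_alarm / cost_miss)
    return math.log(beta) / d_prime


if __name__ == "__main__":
    # Hypothetical values: a miss is taken to cost ten times an unnecessary stop.
    for p in (0.02, 0.05, 0.20):
        c = criterion_placement(d_prime=1.5, p_problem=p,
                                cost_miss=10.0, cost_false_alarm=1.0)
        print(f"assessed risk {p:.2f} -> criterion {c:+.2f}")
```

In this sketch the criterion moves lower as the assessed probability of a problem rises, so a sacrifice is warranted on weaker evidence. If the criterion stays fixed while risk rises, the organization is accepting more risk than its own cost weights imply, which is one way to read a trade-off sliding out of balance.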
Resilience Engineering should provide the means for dynamically
adjusting the balance across the sets of acute and chronic goals. The
dilemma of production pressure/safety trade-offs is that we need to pay
the most attention to, and devote scarce resources to, potential future
safety risks when they are least affordable due to increasing pressures to
produce or economize. As a result, organizations unknowingly act
riskier than they would normally accept. The first step is tools to
monitor the boundary between competence at designed-for-
uncertainties and unanticipated perturbations that challenge or fall
outside that envelope. Recognizing signs of unanticipated perturbations
consuming or stretching the sources of resilience in the system can lead
to actions to re-charge a system’s resilience. How can we increase,
maintain, or re-establish resilience when buffers are being depleted,
margins are precarious, processes become stiff, and squeezes become
tighter?
Acknowledgements
This work was supported in part by grant NNA04CK45A from NASA
Ames Research Center to develop resilience engineering concepts for
managing organizational risk. The ideas presented benefited from
discussions in NASA’s Design for Safety workshop and Workshop
on organizational risk. Discussions with John Wreathall helped develop
the model of trade-offs across acute and chronic goals.
... Related research focuses on the elaboration of characteristics of resilient systems, e.g. [3], [4], its operability, e.g. [5], [6], and the assessment of resilience [7]- [9]. ...
Conference Paper
The knowledge about the responses to hazardous events is of importance throughout the whole life cycle of a complex system, regardless whether during design or operation phases. These responses also allow to draw conclusions about the resilience of the system. Consequently, there is a need for an extensive consideration of all possible hazardous events a system can be exposed to. This work presents a method for determining the hazards with the most critical system response in terms of resilience. Therefore, we introduce a method for modeling failure propagation under consideration of dynamic behavior in function models. This method is then extended for assessing resilience for random hazard scenarios. Finally, we propose two solutions for determining the most critical hazard scenarios, and thus, provide a base for improvements of the system.
Article
Full-text available
Widespread changes to climate-sensitive systems are placing increased demands on risk assessments as a foundation for managing risk. Recent attention to compounding and cascading risks, deep uncertainty, and ‘‘bottom-up’’ risk assessment frameworks have foregrounded the need to account for systemic complexity in risk assessment methodology. We describe the sources of systemic complexity and highlight the role of risk assessments as a formal sense-making device that enables learning and organizing knowledge of the dynamic interplay between the climate-sensitive system and its (climatological) environment. We highlight boundary judgments as a core concern of risk assessments, helping to create islands of analytical and cogni-tive tractability in a complex, uncertain, and ambiguous world. We then point to three key concepts—bound-ary critique, multi-methodology, and second-order learning—as critical elements of contemporary risk assessment practice, and we weave these into an overarching framework to better account for systemic complexity in the assessment of climate risk.
Article
With the rapid development of railway transportation, capacity requirements are increasing, and a growing number of communication, computer and control (3C) technologies are applied in train control systems, which is now the popular approach to train control. The application of 3C technologies enhances railway transportation performance. However, the number of internal and external cyber security threats is rising, which could lead to railway accidents. Safety is the core of the railway design process, but security is rarely considered because traditional railway systems were closed. As cyber security threatens the normal operation of industrial control systems and critical infrastructure, the security of train control systems deserves more attention, since railway transportation plays an important role in society and the economy. Because of the safety-critical nature of these systems, safety assessment and analysis have been performed for a long time, whereas work on cyber security has been rare. In this paper, we describe the security situation of train control systems based on their inherent features and on experience from other industrial control systems, and we summarize the technical defects and potential threats. Drawing on practical engineering experience, we illustrate the current security protection strategies and show that the popular protection methods are limited and cannot achieve defense-in-depth. Finally, some research challenges concerning the security of train control systems are identified.
Chapter
Findings about high-reliability organizations (HRO) capture the efforts that people make, at all levels of an organization, to learn and adapt to ensure safe operations despite variability, increasing complexity, and changing risks. The HRO empirical research base shows how safety originates in the interactions between the operational and leadership activities of people. The high-reliability organization perspective is relevant in aviation because the industry has worked to systematize processes for learning from incidents and accidents. HRO also has been one contributor to the rise of Resilience Engineering which leans forward in time to make learning more proactive and, thus, management more adaptive. High Resilience Organizations focus on how people are a source of adaptive capacity that regularly defuses trouble before it becomes visible in traditional management information channels. This shifts what is informative for management. One example is monitoring how managerial decisions and activities can create difficult conflicts and tight pressures that squeeze operations in critical periods. Other key findings include: HROs do not take a record of past reliability for granted as this undermines proactive learning. HROs keep wondering why operations are successful regularly, and they see people as primarily responsible for such resilient performance. HROs consider how ongoing changes in the environment, organization, and technology change risks. These forms of information can help make safety management highly adaptive and proactive.
Article
Progressing digitalization and networking of systems and organizations representing Critical Infrastructures opens promising new potentials and opportunities, which on the downside, are accompanied by rising complexity and increasingly opaque interdependencies. The consequently increasing lack of knowledge leads to uncertainties affecting risk assessment and decision-making in case of adverse events. This trend motivated recent discussions and developments in risk science, emphasizing the need to handle such uncertainties. Complementarily, research in the resilience domain focuses on system capabilities to handle surprising hazardous situations. Several frameworks presented in the literature aim at combining both perspectives but either lack the focus on operational management, have a rather theoretical approach, or are designed for specific applications. Based on this observation, we propose an approach that integrates resilience management into the actual operation of Critical Infrastructure Systems and Organizations by providing an operational process that coordinates the fundamental resilience capabilities of responding, monitoring, anticipation, and learning. Furthermore, we tackle the challenge of uncertainties resulting from a lack of knowledge by aligning the concepts of digital twin and resilience management. The proposed framework is extensively discussed, and required processes are presented in detail. Eventually, its applicability and potential are reviewed by means of a complex hazardous situation at a Bavarian district heating power plant.
Article
In recent years, several major accidents, such as the US Macondo well blowout in 2010, Chinese Bohai Bay oil spills in 2011, Brazilian FPSO Cidade de São Mateus gas explosion in 2015 and Chinese Bohai oil field blowout & fire accident in 2021, have provoked a high awareness that an essential distinction exists between the major accident management and the occupational accident management in the offshore petroleum sector. Further, the urgent need for defining effective major accident indicators is confirmed for the purpose of identifying early warning signals before the major offshore accident occurs. Regrettably, to this day, the offshore petroleum sector has not reached a consensus on the theoretical foundation for the development of effective major accident indicators. This article presents a focused review on the extensive work of the development of major accident indicators in the offshore petroleum sector, including terminologies, assessment criteria for good indicators, development approaches, as well as an overview of current major accident indicators. Following the close scrutiny of this focused review, the strengths and weaknesses of different development approaches are compared. The progress, challenges, suitability and validity of the development of major accident indicators are discussed. On the basis of these insights, future works are suggested to develop effective major accident indicators to better engage on the emerging and complex challenges in preventing major offshore accidents.
Article
As COVID-19 spread across Brazil, it quickly reached remote regions, including the Amazon's ultra-peripheral locations, where patient transportation by river is added to the list of obstacles to overcome. This article analyses the pandemic's effects on the access of riverine communities to the prehospital emergency healthcare system in the Brazilian Upper Amazon River region. To do so, we present two studies that used a Resilience Engineering approach to predict the functioning of the Brazilian Mobile Emergency Medical Service (SAMU) for riverside and coastal areas during the COVID-19 pandemic, based on normal system functioning. Study I, carried out before the pandemic, applied ethnographic methods for data collection and the Functional Resonance Analysis Method (FRAM) for data analysis in order to develop a model of mobile emergency care in the region under typical operating conditions. Study II then estimated how changes in variability dynamics would alter system functioning during the pandemic, arriving at three trends that could lead the service to collapse. Finally, the accuracy of the predictions is discussed after the pandemic first peaked in the region. Findings reveal that relatively small changes in variability dynamics can have strong implications for operating care and for the safety of expeditions aboard water ambulances. Also, important elements that add to the resilient capabilities of the system are extra-organizational, and thus safety was jeopardized during the pandemic as informal support networks grew fragile. Using FRAM to model regular operation enabled prospective scenario analysis that accurately predicted disruptions in providing emergency care to the riverine population.
Article
The research literature has found current Enterprise Architecture (EA) methods are limited in dealing with uncertainty and pathologies of complex systems to enable design and operation of a resilient enterprise. To some extent, EAs’ approaches persist in applying textbook plans and activities in the face of mounting evidence of changing circumstances and the challenges of uncertainty. They rely on a qualitative shift in assessment, priorities, or response strategy, that often lead to a ‘failure to adapt adequately.’ To address this gap in EA resilience representation, we have combined several prior research proposals to produce a wholistic Department of Defence Architecture Framework (DoDAF) resilience architecture and enhanced that with our original underpinning resilience framework. Despite still being in an ongoing major case study, our comprehensive resilience representation shows promise of assisting all enterprise stakeholders with adapting this representation to their capability systems. Doing so will better incorporate resilience considerations in capability systems’ design and likely help capability stakeholders to evolve capability systems with appropriate levels of resilience throughout their life cycle.