How Adaptive Systems Fail

Chapter · January 2011
In book: Resilience Engineering in Practice, Chapter: How Adaptive Systems Fail, Publisher: Ashgate, Editors: E. Hollnagel, J. Paries, J. Wreathall, D. D. Woods, pp. 127-143
Chapter 10: Basic Patterns in How Adaptive
Systems Fail
David D. Woods and Matthieu Branlat
This chapter provides one input to resilience management
strategies in the form of three basic patterns in how adaptive
systems fail. The three basic patterns are (1) decompensation
when the system exhausts its capacity to adapt as disturbances /
challenges cascade; (2) working at cross-purposes when roles
exhibit behaviour that is locally adaptive but globally mal-adaptive;
and (3) getting stuck in outdated behaviours when the system
over-relies on past successes. Illustrations are drawn from urban
fire-fighting and crisis management. A working organisation needs
to be able to see and avoid or recognise and escape when the
system is moving toward one of the three basic adaptive traps.
Understanding how adaptive systems can fail requires contrasting
diverse perspectives.
The Optimist-Pessimist Divide on Complex Adaptive
Systems
Adaptive System Sciences begin with fundamental trade-offs:
optimality versus brittleness (Csete and Doyle, 2002; Zhou, Carlson and
Doyle, 2005) or efficiency versus thoroughness (Hollnagel, 2009). As an entity,
group, system, or organisation attempts to improve its performance it
becomes better adapted to some things, factors, events, disturbances, or
variations in its environment (its ‘fitness’ improves). However, as a
consequence of improving its fitness with respect to some aspects of its
environment, that entity also must become less adapted to other events,
disturbances, or variations. As a result, when those ‘other’ events or
variations occur, the entity in question will be severely tested and may
fail (this dynamic is illustrated by the story of the Columbia space
shuttle accident; e.g., Woods, 2005a).
The driving question becomes whether (and how) an entity can
identify and manage its position in the trade-off space. In other words,
can an organisation monitor its position and trajectory in a trade-off
space and make investments to move its trajectory prior to crisis
events? The pessimists on complexity and adaptive systems (e.g.,
Perrow, 1984) see adaptive systems as trapped in a cycle of expansion,
saturation, and eventual collapse. The pessimist stance answers the
above questions with ‘No.’ Their response means that as a system
adapts to meet pressures to be ‘faster, better, cheaper,’ it will become
more complex and experience the costs associated with increasing
complexity with little recourse.
Resilience Engineering, on the other hand, represents the optimist
stance and its agenda is to develop ways to control or manage a
system’s adaptive capacities based on empirical evidence. Resilience
Engineering maintains that a system can manage brittleness trade-offs.
To achieve such resilient control and management, a system must have
the ability to reflect on how well it is adapted, what it is adapted to, and
what is changing in its environment. Armed with information about
how the system is resilient and brittle and what trends are under way,
managers can make decisions about how to invest resources in targeted
ways to increase resilience (Woods, 2006a; Hollnagel, 2009).
The optimist stance assumes that an adaptive system has some
ability to self-monitor its adaptive capacity (reflective adaptation) and
anticipate/learn so that it can modulate its adaptive capacity to handle
future situations, events, opportunities and disruptions. In other words,
the optimist stance views human systems as able to examine, reflect on,
anticipate, and learn about their own adaptive capacity.
The pessimist stance, on the other hand, sees an adaptive system as
governed by automatic, built-in processes with very limited ability for
learning and self-management. Systems may vary in how they adapt and how
this produces emergent patterns but the ability to control these cycles is
very limited. It is ironic that the pessimist stance thinks people can
study and learn about human adaptive systems, but that little can be
done to change/design adaptive systems because new complexities and
unintended consequences will sabotage the best laid plans. Resilience
Engineering admits that changing/designing adaptive systems is hard,
but sees it as both necessary and possible. Resilience Engineering in
practice provides guidance on how to begin doing this.
This chapter provides one input to resilience management
strategies in the form of three basic patterns in how adaptive systems
fail. The taxonomy continues the line of work begun by Woods and
Cook (2006) who described one basic pattern in how adaptive systems
behave and how they fail. The chapter also illustrates these patterns in
examples drawn from urban fire-fighting and crisis management. To
develop resilience management strategies, organisations need to be able
to look ahead and either see and avoid or recognise and escape when they are
headed for adaptive traps of one kind or another. A taxonomy of
different maladaptive patterns is valuable input to develop these
strategies.
Assessing Future Resilience from Studying the History
of Adaptation (and Maladaptation)
The resilience/brittleness of a system captures how well it can adapt to
handle events that challenge the boundary conditions for its operation.
Such ‘challenge’ events occur (1) because plans and procedures have
fundamental limits, (2) because the environment changes over time and
in surprising ways, and (3) because the system itself adapts around successes
given changing pressures and expectations for performance. In large
part, the capacity to respond to challenge events resides in the expertise,
strategies, tools, and plans that people in various roles can deploy to
prepare for and respond to specific classes of challenge.
Resilience, as a form of adaptive capacity, is a system’s potential for
adaptive action in the future when information varies, conditions change,
or when new kinds of events occur, any of which challenge the viability
of previous adaptations, models, plans, or assumptions. However, the
data to measure resilience comes from observing/analysing how the
system has adapted to disrupting events and changes in the past (Woods,
2009a, p. 500). Past incidents provide information about how a system
was both brittle, by revealing how it was unable to adapt in a particular
evolving situation, and resilient, by revealing aspects of how it routinely
adapted to disruptions (Woods and Cook, 2006). Analysis of data about
how the system adapted, and to what, can provide a characterisation of
how well operational systems are prepared in advance to handle
different kinds of challenge events and surprises (Hollnagel et al., 2006).
Patterns of failure arise due to basic regularities about adaptation in
complex systems. The patterns are generalisations derived from
analysing cases where systems were unable to prepare for and handle
new challenges. The patterns all involve dynamic interactions between
the system in question and the events that occur in its environment.
The patterns also involve interactions among people in different roles
each trying to prepare for and handle the events that occur within the
scope of their roles. The patterns apply to systems across different
scales – individuals, groups, organisations.
Patterns of Maladaptation
There are three basic patterns by which adaptive systems break down,
and within each, there is a variety of sub-patterns. The three basic
patterns are (1) decompensation, (2) working at cross-purposes, and (3)
getting stuck in outdated behaviours.
Decompensation: Exhausting Capacity to Adapt as Disturbances /
Challenges Cascade
In this pattern, breakdown occurs when challenges grow and cascade
faster than responses can be decided on and effectively deployed. A
variety of cases from supervisory control of dynamic processes provide
the archetype for the basic pattern. Decompensation occurs in human
cardiovascular physiology, e.g., the Starling curve in cardiology. When
physicians manage sick hearts they can miss signals that the
cardiovascular system is running out of control capability and fail to
intervene early enough to avoid a physiological crisis (Feltovich, Spiro
and Coulson, 1989; Cook, Woods and McDonald, 1991; Woods and
Cook, 2006). Decompensation also occurs in human supervisory
control of automated systems, for instance in aviation. In cases of
asymmetric lift due to icing or slowly building engine trouble,
automation can silently compensate but only up to a point. Flight crews
may recognise and intervene only when the automation is nearly out of
capacity to respond and when the disturbances have grown much more
severe. At this late stage there is also a risk of a bumpy transfer of
control that exacerbates the control problem. Noticing early that the
automation has to work harder and harder to maintain control is
essential (Norman, 1990; Woods, 1994; Woods and Sarter, 2000
provide examples from cockpit automation). Figure 1 illustrates the
generic signature for decompensation breakdowns.
The basic decompensation pattern evolves across two phases. In
the first phase, a part of the system adapts to compensate for a growing
disturbance. Partially successful initially, this compensatory control
masks the presence and development of the underlying disturbance.
The second phase of a decompensation event occurs because the
automated response cannot compensate for the disturbance completely
or indefinitely. After the response mechanism’s capacity is exhausted,
the controlled parameter suddenly collapses (the decompensation event
that leads to the name).
Figure 1. The basic decompensation signature.
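The two-phase signature can be made concrete with a toy numerical sketch. This is our illustration, not a model from the chapter: the linear disturbance growth and the controller's capacity limit are arbitrary assumptions chosen only to reproduce the generic pattern in Figure 1.

```python
# Toy sketch of the decompensation signature: a base controller cancels a
# growing disturbance until its effort saturates, after which the controlled
# parameter suddenly collapses. All numbers are illustrative assumptions.

U_MAX = 10.0   # hypothetical capacity limit of the base controller

def simulate(steps=40, growth=0.5):
    """Return (time, effort, deviation) triples for a linearly growing challenge."""
    history = []
    for t in range(steps):
        disturbance = growth * t          # challenge grows steadily
        effort = min(disturbance, U_MAX)  # controller compensates up to its limit
        deviation = disturbance - effort  # what the controller cannot absorb
        history.append((t, effort, deviation))
    return history

if __name__ == "__main__":
    for t, effort, deviation in simulate():
        phase = "phase 1" if deviation == 0.0 else "phase 2"
        print(f"t={t:2d} effort={effort:5.1f} deviation={deviation:5.1f} {phase}")
```

During phase 1 the deviation stays at zero while the controller's effort climbs, masking the disturbance; once effort saturates at `U_MAX`, the deviation grows without bound.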
The question is whether a part of the system, a supervisory
controller, can detect the developing problem during the first phase of
the event pattern, or whether it misses the signs that the lower-order or
base controllers (automated loops in the typical system analysis) are
working harder and harder to compensate but getting nearer to their
capacity limits as the external challenge persists or grows. This requires
discriminating between adaptive behaviour that is part of successful
control and adaptive behaviour that is a sign of incipient failure to
come.
In these situations, the critical information is not the abnormal
process symptoms per se but the increasing force with which they must
be resisted relative to the capabilities of the base control systems. For
example, when a human acts as the base control system, he or she
would, as an effective team member, communicate to others that they
need to exert unusual control effort (Norman, 1990). Such information
provides a diagnostic cue for the team and is a signal that additional
resources need to be injected to keep the process under control. If
there is no information about how hard the base control system is
working to maintain control in the face of disturbances, it is quite
difficult to recognise the seriousness of the situation during the phase 1
portion, and therefore to respond early enough to avoid the
decompensation collapse that marks phase 2 of the event pattern. The
key information is how hard control systems are working to maintain
control and the trend: are control systems running out of control
capability as disturbances are growing or cascading?
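As a sketch of this monitoring principle, the hypothetical routine below raises a warning from the effort trend alone, during phase 1, while the controlled parameter still looks normal; the 80 per cent margin is an illustrative assumption, not a figure from the chapter.

```python
# Hedged sketch: watch the base controller's effort relative to its capacity,
# not the controlled parameter itself. Thresholds are illustrative assumptions.

def effort_alarm(efforts, capacity, margin=0.8):
    """Return the first index at which effort is still rising and has exceeded
    `margin` of capacity (a phase-1 warning); None if never triggered."""
    for i in range(1, len(efforts)):
        rising = efforts[i] > efforts[i - 1]
        near_limit = efforts[i] >= margin * capacity
        if rising and near_limit:
            return i
    return None

if __name__ == "__main__":
    # Effort climbing toward a capacity of 10; the process output would
    # still look normal at every one of these readings.
    readings = [2.0, 3.5, 5.0, 6.5, 8.5, 9.5, 10.0]
    print(effort_alarm(readings, capacity=10.0))  # warns at index 4 (effort 8.5)
```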
There are a number of variations on the decompensation pattern,
notably:

- Falling behind the tempo of operations (e.g., the aviation expression
  ‘falling behind the power curve’; surges in demands in emergency
  rooms, Wears and Woods, 2007; bed crunches in intensive care
  units, Cook, 2006).
- Inability of an organisation to transition to new modes of functioning when
  anomalies challenge normal mechanisms or contingencies (e.g., a hospital’s
  ability to manage mass casualty events, see Committee on the
  Future of Emergency Care in the US, 2006; Woods and Wreathall,
  2008, provide a general description of this risk).
Working at Cross-purposes: Behaviour that is Locally Adaptive, but
Globally Maladaptive
This refers to the inability to coordinate different groups at different
echelons as goals conflict. As a result of miscoordination the groups
work at cross-purposes. Each group works hard to achieve the local
goals defined for their scope of responsibility, but these activities make
it more difficult for other groups to meet the responsibilities of their
roles or undermine the global or long term goals that all groups
recognise to some degree.
The archetype is the tragedy of the commons (Ostrom, 1990, 1999)
which concerns shared physical resources (among the most studied
examples of common pools are fisheries management and water
resources for irrigation). The tragedy of the commons is a name for a
baseline adaptive dynamic whereby the actors, by acting rationally in the
short term to generate a return in a competitive environment, deplete
or destroy the common resource on which they depend in the long run.
In the usual description of the dynamic, participants are trapped in an
adaptive cycle that inexorably overuses the common resource (a
‘pessimist’ stance on adaptive systems); thus, from a larger systems view
the local actions of groups are counter-productive and lead them to
destroy their livelihood or way of life in the long run.
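The commons dynamic can be caricatured numerically. The sketch below is our illustration, with arbitrary stock, harvest, and regeneration figures (not data from Ostrom's cases): each actor's short-term rational take jointly exceeds the regeneration share, so the stock collapses and the long-run total yield ends up lower than under a restrained harvest.

```python
# Illustrative sketch of the tragedy of the commons: locally rational
# over-harvesting of a shared stock destroys the long-run yield.
# All parameters are arbitrary assumptions chosen for illustration.

def run_commons(stock=100.0, actors=5, per_actor_take=3.0,
                regen_rate=0.10, seasons=30):
    """Deplete (or sustain) a shared stock as harvest competes with regeneration."""
    yields = []
    for _ in range(seasons):
        harvest = min(stock, actors * per_actor_take)  # everyone takes their share
        stock -= harvest
        stock += stock * regen_rate                    # what survives regenerates
        yields.append(harvest)
    return stock, sum(yields)

if __name__ == "__main__":
    # Locally rational over-harvesting: the stock collapses...
    greedy_stock, greedy_yield = run_commons(per_actor_take=3.0)
    # ...while a restrained rate sustains both the resource and total yield.
    fair_stock, fair_yield = run_commons(per_actor_take=1.8)
    print(f"greedy:     final stock {greedy_stock:6.1f}, total yield {greedy_yield:6.1f}")
    print(f"restrained: final stock {fair_stock:6.1f}, total yield {fair_yield:6.1f}")
```

The greedy run depletes the stock within a dozen seasons and then yields nothing; the restrained run keeps the stock near its starting level and delivers a larger cumulative yield over the same horizon.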
Organisational analyses of accidents like the Columbia space shuttle
accident put production/safety trade-offs in a parallel position to
tragedies of the commons. Despite the organisations’ attempts to
design operations for high safety and the large costs of failures in
money and in lives, line managers under production pressures make
decisions that gradually erode safety margins and thereby undermine
the larger common goal of safety. In other words, safety can be thought
of as an abstract common pool resource analogous to a fishery. Thus,
dilemmas that arise in managing physical common pool resources are a
specific example of a general type of goal conflict where different
groups are differentially responsible for and affected by different sub-
goals, even though there is one or only a couple of commonly held
over-arching goals (Woods et al., 1994, Chapter 4). When the activities
of different groups seem to advance local goals but undermine over-
arching or long term goals of the larger system that the groups belong
to, the system-level pattern is maladaptive as the groups work at cross-
purposes. Specific concrete stories that capture this pattern of adaptive
breakdown can be found in Brown (2005), who collected cases of safety
dilemmas and sacrifice judgments in health care situations.
There is a variety of sub-patterns to working at cross-purposes.
Some of these concern vertical interactions, that is, across echelons or
levels of control, such as the tragedy of the commons. Others concern
horizontal interactions when many different groups need to coordinate
their activities in time and space such as in disaster response and
military operations. This pattern can also occur over time. A sub-
pattern that includes a temporal component and is particularly
important in highly coupled systems is missing side effects of change
(Woods and Hollnagel, 2006). This can occur when there is a change
that disrupts plans in progress or when a new event presents new
demands to be handled, among other events. Other characteristic sub-
patterns are:
- Fragmentation over roles (stuck in silos; e.g., precursors to the Columbia
  space shuttle accident, Woods, 2005a).
- Failure to resynchronise following disruptions (Branlat et al., 2009).
- Double binds (Woods et al., in press).
Getting Stuck in Outdated Behaviours: The World Changes but the
System Remains Stuck in what were Previously Adaptive Strategies
(Over-relying on Past Successes)
This pattern relates to breakdowns in how systems learn. What was
previously adaptive can become rigid at the level of individuals, groups,
or organisations. These behaviours can persist even as information
builds that the world is changing and that the usual
behaviours/processes are not working to produce desired effects or
achieve goals. One example is the description of the cycle of error as
organisations become trapped in narrow interpretations of what led to
an accident (Cook, Woods and Miller, 1998).
This pattern is also at play at more limited operational time scopes.
Domains such as military operations offer a rich environment for
studying the pattern. When conditions of operation change over time,
tactics or strategies need to be updated to match new challenges or
opportunities. While such decisions are made difficult by the uncertain
nature of the operations’ environment and of the outcome of actions,
missed opportunities to re-plan constitute sources of failure (Woods
and Shattuck, 2000). Mishaps in the nuclear industry have also
exemplified the pattern by showing the dangers of “rote rule following”
(ibid.). In all of these cases there was a failure to re-plan when the
conditions experienced fell outside of the boundaries the system and
plans were designed for. Some characteristic sub-patterns are:
- Oversimplifications (Feltovich, Spiro and Coulson, 1997).
- Failing to revise current assessments as new evidence comes in (Woods and
  Hollnagel, 2006; Rudolph, 2009).
- Failing to revise a plan in progress when disruptions/opportunities arise
  (Woods and Hollnagel, 2006).
- Discounting discrepant evidence (e.g., precursors to the Columbia accident,
  Woods, 2005a).
- Literal-mindedness, particularly in automation failures (Woods and
  Hollnagel, 2006).
- Distancing through differencing (Cook and Woods, 2006).
- Cook’s cycle of error (Cook et al., 1998).
The three basic patterns define kinds of adaptive traps. A reflective
adaptive system should be able to monitor its activities and functions
relative to its changing environment and determine whether it is likely
to fall into one or another of these adaptive traps. The three basic
patterns can be used to understand better how various systems are
vulnerable to failures, such as systems that carry out crisis management,
systems that respond to anomalies in space flights, and systems that
provide critical care to patients in medicine. In the next section, we test
the explanatory value of these three basic patterns by re-visiting a recent
analysis of critical incidents (Branlat et al., 2009) that provided markers
of both resilience and brittleness (Woods and Cook, 2006). Urban fire-
fighting provides a rich setting to examine aspects of resilience and
brittleness related to adaptation and coordination processes. Incident
command especially instantiates patterns generic to adaptive systems
and observed in other domains or at other scales (Bengtsson et al.,
2003; Woods and Wreathall, 2008).
The Basic Patterns Are Illustrated in Urban Fire-fighting Critical Incidents
High uncertainty and potential for disruptions, new events, and
surprises all pose challenges for fire-fighting operations. The fire-
fighting organisation needs to be able to adapt to new information
(whether a challenge or opportunity) about the situation at hand and to
ever-changing conditions. For example, consider this case from the
corpus (Branlat et al., 2009):
Companies arrive on the fire scene and implement standard
operating procedures for an active fire on the first floor of the
building. The first ladder company initiates entry to the apartment
on fire, while the second ladder gets to the second floor in order to
search for potentially trapped victims (the ‘floor above the fire’ is
an acknowledged hazardous position). In the meantime, engine
companies stretch hose-lines but experience various difficulties that
delay their actions, especially because they cannot achieve
optimal positioning of their apparatus on a heavily trafficked street.
While all units are operating, conditions are deteriorating because
no water is yet being applied to the fire. The Incident
Commander (IC) transmits an ‘all hands’ signal to the dispatcher,
leading to the immediate assignment of additional companies.
Almost at the same time, members operating above the fire
transmit a ‘URGENT’ message over the radio. Although the IC
tries to establish communication and get more information about
the difficulties encountered, he does not have uncommitted
companies to assist the members. Within less than a minute, a
back-draft-type explosion occurs in the apartment on fire,
engulfing the building’s staircase in flames and intense heat for
several seconds, and erupting through the roof. As the members
operating on the second floor had not been able to get access to
the apartment there due to various difficulties, they lacked both a
refuge area (apartment) and an egress route (staircase). The second
ladder company was directly exposed to life-threatening conditions.
The three basic patterns can all be seen at work in this case:
Decompensation. The situation deteriorated without companies being
able to address the problem promptly. The Incident Commander
(IC) recognised and signalled an ‘all hands’ situation, in order to
inform dispatchers that all companies were operating and to
promptly request additional resources. As there were no
uncommitted resources available, the fire companies were unable to
respond when an unexpected event occurred (the back-draft) which
created dangers and hindered the ability of others to assist. As a
result, team members were exposed to dangerous conditions.
Working at cross-purposes. Companies were pursuing their tasks and
experienced various challenges without knowledge of the other
companies’ difficulties. Without this information, actions on the
first floor worked against the actions and safety of operators on the
second floor. Goal conflict arose (1) between the need to provide
access to the fire and to contain it while water management was
difficult, and (2) between the need to address a deteriorating
situation and to rescue injured members while all operators were
committed to their tasks.
Getting stuck in outdated behaviour. The ladder companies continued to
implement standard procedures that assumed another condition
was met (water availability from the engine companies). They failed
to adapt the normally relevant sequence of activities to fit the
changing particulars of this situation: the first ladder company
gained access to the apartment on fire; but in the absence of water,
the opened door fuelled the fire and allowed flames and heat to
spread to the rest of the building (exacerbating how the fire
conditions were deteriorating). Similarly, the unit operating on the
second floor executed its tasks normally, but the difficulty it
encountered and the deteriorating situation required adaptation of
normal routines to fit the changing risks.
Urban Fire-fighting and the Dynamics of Decompensation
During operations, it is especially important for the Incident
Commander (IC) constantly and correctly to assess progress in terms of
trends in whether the fire is in or out of control. To do this, the IC
monitors (a) the operational environment including the evolution of the
fire and the development of additional demands or threats (e.g.,
structural damages or trapped victims) and (b) the effort companies are
exerting to try to accomplish their tasks as well as their capacity to
respond to additional demands. Based on such assessments, the IC
makes critical decisions related to the management of resources:
redeploying companies in support of a particular task; requesting
additional companies to address fire extensions or need to relieve
members; requesting special units to add particular forms of expertise
to handle unusual situations (e.g., presence of hazardous material).
ICs are particularly attentive to the risk of falling behind by
exhausting the system’s capacity to respond to immediate demands as
well as to new demands (Branlat et al., 2009). The ‘all-hands’ signal is a
recognition that the situation is precarious because the system is stretched
close to its maximum capacity, and that current operations therefore are
vulnerable to any additional demands that may occur. The analysis of
the IC role emphasised anticipating trends or potential trends in
demands relative to how well operations were able to meet those
demands (see also Cook’s analysis of resource crunches in intensive
care units; Cook, 2006). For urban fire-fighting, given crucial time
constraints, resources are likely to be available too late if they are
requested only when the need is definite. A critical task of the IC
therefore corresponds to the regulation of adaptive capacity by
providing ‘tactical reserves’ (Klaene and Sanders, 2008, p. 127), i.e., an
additional capacity promptly to adapt tactics to changing situations.
Equivalent processes also play out (a) at the echelon of fire-fighters or
fire teams, (b) in terms of the distributed activity (horizontal
interactions) across roles at broader echelons of the emergency
response system, and (c) vertically across echelons where information
about difficulties at one level change decisions and responses at another
echelon.
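This anticipatory logic can be sketched as a simple decision rule. The linear trend extrapolation and the one-unit reserve in the sketch below are hypothetical illustrations, not an actual dispatch procedure.

```python
# Hypothetical sketch of anticipatory resource requests: given a demand trend
# and a resource lead time, request reinforcements while demand is still below
# committed capacity, rather than once the shortfall is already definite.

def should_request(demand_history, committed, lead_time, reserve=1):
    """Extrapolate the recent demand trend `lead_time` steps ahead and
    request more resources if the projection eats into the reserve."""
    if len(demand_history) < 2:
        return False
    trend = demand_history[-1] - demand_history[-2]   # simple linear trend
    projected = demand_history[-1] + trend * lead_time
    return projected > committed - reserve

if __name__ == "__main__":
    # Demand rising by 1 unit/step, 6 units committed, 3 steps to get help:
    print(should_request([2, 3, 4], committed=6, lead_time=3))  # True: act early
    print(should_request([4, 4, 4], committed=6, lead_time=3))  # False: stable
```

The point the rule caricatures is that the trigger is the trend in demand relative to capacity, not the current shortfall; by the time demand exceeds committed resources, help requested then would arrive too late.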
Urban Fire-fighting and Coordination over Multiple Groups and Goals
Fire-fighting exemplifies situations within which tasks and roles are
highly distributed and interdependent, exposing work systems to the
difficulty of maintaining synchronisation while providing flexibility to
address ever-changing demands. Interdependencies also result from the
fact that companies operate in a shared environment.
Several reports within the corpus described incidents where
companies opened hose-lines and pushed fire and heat in the direction
of others. These situations usually resulted from companies adapting
their plan because of difficulties or opportunities. If the shift in activity
by one group was not followed by a successful resynchronisation, it
created conditions for a coordination breakdown where companies
(and, importantly, the IC) temporarily lost track of each other’s position
and actions. In this context one group could adapt to handle the
conditions they face in ways that inadvertently created or exacerbated
threats for other groups. Another example in the corpus was situations
where companies’ capacity to fulfil their functions was impeded by the
actions of others. One group’s actions, though locally adaptive relative
to its scope, introduced new constraints which reduced another
company’s ‘margins of manoeuvre’ (Coutarel, Daniellou and Dugué,
2003). This notion refers to the range of behaviours a company is able to
deploy in order to fulfil its functions, and therefore to its capacity to
adapt a course or plan of action in the face of new challenges. Such
dynamics might directly compromise members’ safety, for example
when the constrained functions were critical to egress route
management. In one case, a company vented a window adjacent to a
fire escape which had the consequence of preventing the members of
another company operating on the floor above from using the fire
escape as a potential egress route, had it been needed.
Goal conflicts arise when there are trade-offs between achieving
the three fundamental purposes of urban fire-fighting: saving lives,
protecting property and ensuring personnel’s safety. This occurs when,
for example, a fire department forgoes the goal of protecting property
in order to minimise risk to fire-fighters. Incidents in the corpus vividly
illustrate the trade-offs that can arise during operations and require
adaptations to on-going operations. Under limited resources (time,
water, operators), the need to rescue a distressed fire-fighter introduces
a difficult goal conflict between rescue and fire operations. If members
pursue fire operations, the victim risks life-threatening exposure to the
dangerous environment. Yet by abandoning fire operations,
momentarily or partially, team members risk letting the situation
degrade until it becomes more difficult and more dangerous
to address. The analysis of the corpus of cases found that adaptations in
such cases were driven by local concerns, e.g., when members
suspended their current operations to assist rescue operations nearby.
The management of goal conflicts is difficult when operations are not
clearly synchronised, since decisions that are only locally adapted risk
further fragmenting operations.
Urban Fire-fighting and the Risk of Getting Stuck in Outdated
Behaviours
As an instance of emergency response, urban fire-fighting is
characterised by the need to make decisions at a high tempo and under
uncertainty. As fire-fighters discover and assess the problem to be
addressed during the course of operations, replanning is a central
process. It is critical that adaptations to the plan are made when
elements of the situation indicate that previous knowledge (on which
on-going strategy and tactics are based) is outdated. The capacity to
adapt is therefore highly dependent on the capacity correctly to assess
the situation at hand throughout the operations, especially at the level
of the IC. Accident cases show that the capacity of the IC to efficiently
supervise operations and modify the plan in progress is severely
impaired when this person has only limited information about and
understanding of the situation at hand and of the level of control over the
fire.
Given the level of uncertainty, this also suggests the need for
response systems to be willing to devote resources to further assess
ambiguous signals, a characteristic of resilient and high-reliability
organisations (Woods, 2006a; Rochlin, 1999). This is nonetheless
challenging in the context of limited resources and high tempo, and
given the potential cost of replanning (risk of fragmenting operations,
cost of redeploying companies, coordination costs).
At a wider temporal and organisational scale, fire departments and
organisations are confronted with the need to learn from situations in
order to increase or maintain operations’ resilience in the face of
evolving threats and demands. The reports analysed resulted from
thorough investigation processes that aimed at understanding the limits of
current practices and tools, and represented processes of learning and
transformation. However, it is limiting to assume that the events that
produce the worst outcomes are also the ones that will produce the
most useful lessons. Instances where challenging and surprising
situations are managed without leading to highly severe outcomes also
reveal interesting and innovative forms of adaptations (Woods and
Cook, 2006). As stated previously, many minor incidents also represent
warning signals about the (in)adequacy of responses to the situations
encountered. They are indicators of the system starting to stretch
before it collapses in the form of a dramatic event (Woods and
Wreathall, 2008). To be resilient, organisations must be willing to
pursue these signals (Woods, 2009a). Unfortunately, selecting the
experiences or events which will prove fruitful to investigate, and
allocating the corresponding resources, is a difficult choice when it has
to be made a priori (Hollnagel, 2007; Dekker, 2008, chapter 3).
Recognising What is Maladaptive Depends on
Perspective Contrasts
The chapter has presented three basic patterns in how adaptive systems
fail. But it is difficult to understand how the behaviours of people,
groups, and organisations are adapted to some factors, and whether
those adaptations are weak or strong, well or poorly suited. One reason
for this is that what is well-adapted, under-adapted, or maladaptive is a
matter of perspective. As a result, labelling a behaviour or process as
maladapted is conditional on specifying a contrast across perspectives.
First, adaptive decision-making exhibits local (though bounded)
rationality (regardless of scale). A human adaptive system uses its
knowledge and the information available from its field of view/focus of
attention to adapt its behaviour (given its scope of autonomy/authority)
in pursuit of its goals. As a result, adaptive behaviour is well-adapted
when examined locally, even though the system can learn and change to
become better adapted in the future (shifting temporal perspective).
Second, adaptive decision-making exists in a co-adaptive web
where adaptive behaviour by other systems horizontally or vertically (at
different echelons) influences (releases or constrains) the behaviour of
the system of interest. Behaviour that is adaptive for one unit or system
can produce constraints that lead to maladaptive behaviour in other
systems or can combine to produce emergent behaviour that is
maladaptive relative to criteria defined by a different perspective.
Working at cross-purposes happens when interdependent systems
do things that are all locally adaptive (relative to the role/goals set
up/pressured for each unit) but more globally maladaptive (relative to
broader perspectives and goals). This can occur horizontally across
units working at the same level as in urban fire-fighting (Branlat et al.,
2009). It can occur upward, vertically, where local adaptation at the
sharp end of a system is maladaptive when examined from a more
regional perspective that encompasses higher level or total system goals.
One example is ad hoc plan adaptation in the face of an impasse to a
plan in progress; in this case the adaptation works around the impasse
but fails to do so in a way that takes into account all of the relevant
constraints as defined from a broader perspective on goals (Woods and
Shattuck, 2000).
Working at cross-purposes can occur downward vertically too
(Woods et al., in press). Behaviour that is adaptive when considered
regionally can be seen as maladaptive when examined locally as the
regional actions undermine or create complexities that make it harder
for the sharp end to meet the real demands of situations (for example,
actions at a regional level can introduce complexities that force sharp
end operations to develop work-arounds and other forms of gap-filling
adaptations).
This discussion points to the finding in adaptive system science
that all systems face fundamental trade-offs. In particular, becoming
more optimal with respect to some aspects of the environment
inevitably leads that system to be less adapted to other aspects of the
environment (Doyle, 2000; Zhou et al., 2005; Woods, 2006a; Hollnagel,
2009). This leads us to a non-intuitive but fundamental conclusion that
all adaptive systems simultaneously are (Woods, 2009b):
• well-adapted to some aspects of their environment (e.g., the fluency
law: 'well'-adapted cognitive work occurs with a facility that belies
the difficulty of the demands resolved and the dilemmas balanced;
see Woods and Hollnagel, 2006);
• under-adapted, in that the system has some degree of drive to learn
and improve its fitness relative to variation in its environment; this
is related both to intrinsic properties of that agent or system and to
the external pressures the system faces from stakeholders;
• maladapted, or brittle, in the face of events and changes that
challenge its normal function.
This basic property of adaptive systems means that all forms of
linear causal analysis are inadequate for modelling and predicting the
behaviour of such systems. Adaptive systems’ sciences are developing
the new tools needed to accurately model, explain and predict how
adaptive systems will behave (e.g., Alderson and Doyle, in press), for
example, how to anticipate tipping points in complex systems (Scheffer
et al., 2009).
Working organisations need to be able to see and avoid or
recognise and escape when a system is moving toward one of the three
basic adaptive traps. Being resilient means the organisation can monitor
how it is working relative to changing demands and adapt in
anticipation of crunches, just as incident command should be able to do
in urban fire-fighting. Organisations can look at how they have adapted
to disruptions in past situations to estimate whether their system’s
‘margins of manoeuvre’ in the future are expanding or contracting.
Resilience Engineering is beginning to provide the tools to do this even
as more sophisticated general models of adaptive systems are being
developed.
References
Alderson, D. L. and Doyle, J. C. (in press). Contrasting views of
complexity and their implications for network-centric
infrastructures. IEEE Transactions on Systems, Man and Cybernetics, Part A.
Andersson, K. P. and Ostrom, E. (2008). Analyzing decentralized
resource regimes from a polycentric perspective. Policy Sciences,
41, 71-93.
Bengtsson, J., Angelstam, P., Elmqvist, T., Emanuelsson, U., Folke, C.,
Ihse, M., Moberg, F. and Nyström, M. (2003). Reserves, Resilience
and Dynamic Landscapes. Ambio, 32(6), 389-396.
Branlat, M., Fern, L., Voshell, M. and Trent, S. (2009). Coordination in
Urban Firefighting: A Study of Critical Incident Reports. Proceedings
of the Human Factors and Ergonomics Society 53rd Annual Meeting, San
Antonio, TX.
Brown, J. P. (2005). Key themes in healthcare safety dilemmas. In M. S.
Patankar, J. P. Brown, & M. D. Treadwell (Eds.), Safety Ethics: Cases
from Aviation, Healthcare, and Occupational and Environmental Health
(pp. 103-148). Aldershot, UK: Ashgate.
Committee on the Future of Emergency Care in the US (2006).
Hospital-based Emergency Care: At the Breaking Point. Washington,
DC: National Academies Press.
Cook, R. I. (2006). Being bumpable: consequences of resource
saturation and near-saturation for cognitive demands on ICU
practitioners. In D. D. Woods & E. Hollnagel (Eds.), Joint Cognitive
Systems: Patterns in Cognitive Systems Engineering (pp. 23-35). Boca
Raton, FL: Taylor & Francis/CRC Press.
Cook, R. and Rasmussen, J. (2005). “Going Solid”: A model of system
dynamics and consequences for patient safety. Quality and Safety in
Health Care, 14, 130-134.
Cook, R. I., Woods, D. D. and McDonald, J.S. (1991). Human
Performance in Anesthesia: A Corpus of Cases. Cognitive Systems
Engineering Laboratory Report, prepared for Anesthesia Patient
Safety Foundation, April 1991.
Cook, R. I., Woods, D. D. and Miller, C. (1998). A Tale of Two Stories:
Contrasting Views of Patient Safety. Chicago, National Patient Safety
Foundation. (Available at
http://csel.eng.ohio-state.edu/blog/woods/archives/000030.html)
Coutarel, F., Daniellou, F., & Dugué, B. (2003). Interroger
l'organisation du travail au regard des marges de manoeuvre en
conception et en fonctionnement [Examining Work Organization
in Relation to Margins of Maneuver in Design and in Operation].
Pistes, 5(2).
Csete, M.E. and Doyle, J.C. (2002). Reverse engineering of biological
complexity. Science, 295, 1664-1669.
Dekker, S. (2008). Just Culture: Balancing Safety and Accountability.
Aldershot, UK: Ashgate.
Doyle, J.C. (2000). Multiscale networking, robustness, and rigor. In T.
Samad and J. Weyrauch (Eds.), Automation, Control, and Complexity:
An Integrated Approach (pp. 287-301). New York: John Wiley &
Sons.
Feltovich, P. J., Spiro, R. J. and Coulson, R. L. (1989). The nature of
conceptual understanding in biomedicine: The deep structure of
complex ideas and the development of misconceptions. In D.
Evans and V. Patel (Eds.), The Cognitive Sciences in Medicine (pp. 113-
172). Cambridge MA: MIT Press.
Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1997). Issues of expert
flexibility in contexts characterized by complexity and change. In P.
J. Feltovich, K. M. Ford, & R. R. Hoffman (Eds.), Expertise in
Context: Human and Machine. Menlo Park, CA: AAAI/MIT Press.
Hollnagel, E. (2007). Resilience Engineering: Why, What and How. In
NoFS 2007 - Nordic Research Conference on Safety, 13-15 June 2007,
Tampere, Finland.
Hollnagel, E. (2009). The ETTO Principle: Efficiency-Thoroughness Trade-Off:
Why Things That Go Right Sometimes Go Wrong. Ashgate.
Klaene, B. J., & Sanders, R. E. (2008). Structural Firefighting: Strategies and
Tactics (2nd ed.). Sudbury, MA: Jones & Bartlett Publishers.
Ostrom, E. (1990). Governing the Commons: The Evolution of Institutions for
Collective Action. New York: Cambridge University Press.
Ostrom, E. (1999). Coping with Tragedies of the Commons. Annual
Review of Political Science, 2, 493-535.
Perrow, C. (1984). Normal Accidents: Living with High-Risk Technologies.
New York: Basic Books.
Rochlin, G.I. (1999). Safe operation as a social construct. Ergonomics,
42(11), 1549-1560.
Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R.,
Dakos, V., Held, H., van Nes, E. H., Rietkerk, M. and Sugihara, G.
(2009). Early-warning signals for critical transitions. Nature,
461(7260), 53-59.
Wears, R. L. and Woods, D. D. (2007). Always Adapting. Annals of
Emergency Medicine, 50(5), 517-519.
Woods, D. D. (2005). Creating Foresight: Lessons for Resilience from
Columbia. In W. H. Starbuck and M. Farjoun (eds.), Organization at
the Limit: NASA and the Columbia Disaster. Malden, MA: Blackwell,
pp. 289-308.
Woods, D. D. (2006a). Essential characteristics of resilience. In E.
Hollnagel, D. D. Woods, & N. Leveson (Eds.), Resilience Engineering:
Concepts and Precepts (pp. 19-30). Aldershot, UK: Ashgate.
Woods, D. D. (2009a). Escaping Failures of Foresight. Safety Science,
47(4), 498-501.
Woods, D. D. (2009b). Fundamentals to Engineer Resilient Systems:
How Human Adaptive Systems Fail and the Quest for Polycentric
Control Architectures. Keynote presentation, 2nd International
Symposium on Resilient Control Systems, Idaho Falls, ID, August 11-13
2009 (https://secure.inl.gov/isrcs2009/default.aspx accessed
September 8, 2009).
Woods, D. D. and Cook, R. I. (2006). Incidents: Are they markers of
resilience or brittleness? In E. Hollnagel, D.D. Woods and N.
Leveson, eds., Resilience Engineering: Concepts and Precepts. Ashgate,
Aldershot, UK, pp. 69-76.
Woods, D. D., & Hollnagel, E. (2006). Joint Cognitive Systems: Patterns in
Cognitive Systems Engineering. Boca Raton, FL: Taylor & Francis/CRC
Press.
Woods, D. D. and Sarter, N. (2000). Learning from Automation
Surprises and Going Sour Accidents. In N. Sarter and R. Amalberti
(Eds.), Cognitive Engineering in the Aviation Domain, Erlbaum,
Hillsdale NJ, pp. 327-354.
Woods, D.D. and Shattuck, L. G. (2000). Distant supervision—local
action given the potential for surprise. Cognition, Technology and
Work, 2, 242-245.
Woods, D. D. and Wreathall, J. (2008). Stress-Strain Plot as a Basis for
Assessing System Resilience. In E. Hollnagel, C. Nemeth and S. W.
A. Dekker, eds., Resilience Engineering: Remaining sensitive to the
possibility of failure. Ashgate, Aldershot, UK, pp. 145-161.
Zhou, T., Carlson, J. M. and Doyle, J. (2005). Evolutionary dynamics
and highly optimized tolerance. Journal of Theoretical Biology, 236,
438-447.