Chapter 10: Basic Patterns in How Adaptive
Systems Fail
David D. Woods and Matthieu Branlat
This chapter provides one input to resilience management
strategies in the form of three basic patterns in how adaptive
systems fail. The three basic patterns are (1) decompensation –
when the system exhausts its capacity to adapt as disturbances /
challenges cascade; (2) working at cross-purposes – when roles
exhibit behaviour that is locally adaptive but globally mal-adaptive;
and (3) getting stuck in outdated behaviours – when the system
over-relies on past successes. Illustrations are drawn from urban
fire-fighting and crisis management. A working organisation needs
to be able to see and avoid or recognise and escape when the
system is moving toward one of the three basic adaptive traps.
Understanding how adaptive systems can fail requires contrasting
diverse perspectives.
The Optimist-Pessimist Divide on Complex Adaptive
Systems
Adaptive System Sciences begin with fundamental trade-offs:
optimality-brittleness (Csete and Doyle, 2002; Zhou, Carlson and
Doyle, 2005) or efficiency-thoroughness (Hollnagel, 2009). As an entity,
group, system, or organisation attempts to improve its performance it
becomes better adapted to some things, factors, events, disturbances, or
variations in its environment (its ‘fitness’ improves). However, as a
consequence of improving its fitness with respect to some aspects of its
environment, that entity also must become less adapted to other events,
disturbances, or variations. As a result, when those ‘other’ events or
variations occur, the entity in question will be severely tested and may
fail (this dynamic is illustrated by the story of the Columbia space
shuttle accident; e.g., Woods, 2005a).
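To make the trade-off concrete, the toy calculation below follows the spirit of the highly optimised tolerance work cited above (Zhou, Carlson and Doyle, 2005); the allocation rule and all numbers are illustrative assumptions rather than anything taken from that work. A fixed adaptation budget is tuned to the expected frequency of two classes of disturbance, and the resulting losses show the robust-yet-fragile signature: good performance against the frequent class, much poorer performance against the class that was assumed to be rare.

```python
# Toy illustration of the optimality-brittleness trade-off (assumed numbers).
# A fixed "adaptation budget" is split across disturbance classes in
# proportion to their expected frequency; losses from a class shrink as
# more of the budget is devoted to it.

expected_freq = {"common_disturbance": 0.90, "rare_disturbance": 0.10}
budget = 10.0

# Optimise for the expected environment: allocate in proportion to frequency.
allocation = {kind: budget * freq for kind, freq in expected_freq.items()}

def loss_if_it_occurs(kind, allocation, base_loss=100.0):
    """Loss when a disturbance of this class actually occurs: the more
    capacity pre-allocated to it, the smaller the loss."""
    return base_loss / (1.0 + allocation[kind])

for kind in expected_freq:
    print(kind, round(loss_if_it_occurs(kind, allocation), 1))
# common_disturbance -> 10.0, rare_disturbance -> 50.0: tuning to the expected
# environment leaves the system far more exposed when the 'other' events occur.
```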
The driving question becomes whether (and how) an entity can
identify and manage its position in the trade-off space. In other words,
can an organisation monitor its position and trajectory in a trade-off
space and make investments to move its trajectory prior to crisis
events? The pessimists on complexity and adaptive systems (e.g.,
Perrow, 1984) see adaptive systems as trapped in a cycle of expansion,
saturation, and eventual collapse. The pessimist stance answers the
above questions with ‘No.’ Their response means that as a system
adapts to meet pressures to be ‘faster, better, cheaper,’ it will become
more complex and experience the costs associated with increasing
complexity with little recourse.
Resilience Engineering, on the other hand, represents the optimist
stance and its agenda is to develop ways to control or manage a
system’s adaptive capacities based on empirical evidence. Resilience
Engineering maintains that a system can manage brittleness trade-offs.
To achieve such resilient control and management, a system must have
the ability to reflect on how well it is adapted, what it is adapted to, and
what is changing in its environment. Armed with information about
how the system is resilient and brittle and what trends are under way,
managers can make decisions about how to invest resources in targeted
ways to increase resilience (Woods, 2006a; Hollnagel, 2009).
The optimist stance assumes that an adaptive system has some
ability to self-monitor its adaptive capacity (reflective adaptation) and
anticipate/learn so that it can modulate its adaptive capacity to handle
future situations, events, opportunities and disruptions. In other words,
the optimist stance looks at human systems as able to examine, reflect,
anticipate, and learn about their own adaptive capacity.
The pessimist stance, on the other hand, sees an adaptive system as
an automatic built-in process that has very limited ability for learning
and self-management. Systems may vary in how they adapt and how
this produces emergent patterns but the ability to control these cycles is
very limited. It is ironic that the pessimist stance thinks people can
study and learn about human adaptive systems, but that little can be
done to change/design adaptive systems because new complexities and
unintended consequences will sabotage the best laid plans. Resilience
Engineering admits that changing/designing adaptive systems is hard,
but sees it as both necessary and possible. Resilience Engineering in
practice provides guidance on how to begin doing this.
This chapter provides one input to resilience management
strategies in the form of three basic patterns in how adaptive systems
fail. The taxonomy continues the line of work begun by Woods and
Cook (2006) who described one basic pattern in how adaptive systems
behave and how they fail. The chapter also illustrates these patterns in
examples drawn from urban fire-fighting and crisis management. To
develop resilience management strategies, organisations need to be able
to look ahead and either see and avoid or recognise and escape when they are
headed for adaptive traps of one kind or another. A taxonomy of
different maladaptive patterns is valuable input to develop these
strategies.
Assessing Future Resilience from Studying the History
of Adaptation (and Maladaptation)
The resilience/brittleness of a system captures how well it can adapt to
handle events that challenge the boundary conditions for its operation.
Such ‘challenge’ events do occur (1) because plans and procedures have
fundamental limits, (2) because the environment changes over time and
in surprising ways, and (3) because the system itself adapts around successes
given changing pressures and expectations for performance. In large
part, the capacity to respond to challenge events resides in the expertise,
strategies, tools, and plans that people in various roles can deploy to
prepare for and respond to specific classes of challenge.
Resilience, as a form of adaptive capacity, is a system’s potential for
adaptive action in the future when information varies, conditions change,
or when new kinds of events occur, any of which challenge the viability
of previous adaptations, models, plans, or assumptions. However, the
data to measure resilience comes from observing/analysing how the
system has adapted to disrupting events and changes in the past (Woods,
2009a, p. 500). Past incidents provide information about how a system
was both brittle, by revealing how it was unable to adapt in a particular
evolving situation, and resilient, by revealing aspects of how it routinely
adapted to disruptions (Woods and Cook, 2006). Analysis of data about
how the system adapted, and to what, can provide a characterisation of
how well operational systems are prepared in advance to handle
different kinds of challenge events and surprises (Hollnagel et al., 2006).
Patterns of failure arise due to basic regularities about adaptation in
complex systems. The patterns are generalisations derived from
analysing cases where systems were unable to prepare for and handle
new challenges. The patterns all involve dynamic interactions between
the system in question and the events that occur in its environment.
The patterns also involve interactions among people in different roles
each trying to prepare for and handle the events that occur within the
scope of their roles. The patterns apply to systems across different
scales – individuals, groups, organisations.
Patterns of Maladaptation
There are three basic patterns by which adaptive systems break down,
and within each, there is a variety of sub-patterns. The three basic
patterns are (1) decompensation, (2) working at cross-purposes, and (3)
getting stuck in outdated behaviours.
Decompensation: Exhausting Capacity to Adapt as Disturbances /
Challenges Cascade
In this pattern, breakdown occurs when challenges grow and cascade
faster than responses can be decided on and effectively deployed. A
variety of cases from supervisory control of dynamic processes provide
the archetype for the basic pattern. Decompensation occurs in human
cardiovascular physiology, e.g., the Starling curve in cardiology. When
physicians manage sick hearts they can miss signals that the
cardiovascular system is running out of control capability and fail to
intervene early enough to avoid a physiological crisis (Feltovich, Spiro
and Coulson, 1989; Cook, Woods and McDonald, 1991; Woods and
Cook, 2006). Decompensation also occurs in human supervisory
control of automated systems, for instance in aviation. In cases of
asymmetric lift due to icing or slowly building engine trouble,
automation can silently compensate but only up to a point. Flight crews
may recognise and intervene only when the automation is nearly out of
capacity to respond and when the disturbances have grown much more
severe. At this late stage there is also a risk of a bumpy transfer of
control that exacerbates the control problem. Noticing early that the
automation has to work harder and harder to maintain control is
essential (Norman, 1990; Woods, 1994; Woods and Sarter, 2000
provide examples from cockpit automation). Figure 1 illustrates the
generic signature for decompensation breakdowns.
The basic decompensation pattern evolves across two phases. In
the first phase, a part of the system adapts to compensate for a growing
disturbance. Partially successful initially, this compensatory control
masks the presence and development of the underlying disturbance.
The second phase of a decompensation event occurs because the
automated response cannot compensate for the disturbance completely
or indefinitely. After the response mechanism’s capacity is exhausted,
the controlled parameter suddenly collapses (the decompensation event
that gives the pattern its name).
Figure 1. The basic decompensation signature.
The question is whether a part of the system, a supervisory
controller, can detect the developing problem during the first phase of
the event pattern or whether it misses the signs that the lower order or
base controllers (automated loops in the typical system analysis) are
working harder and harder to compensate but getting nearer to their
capacity limits as the external challenge persists or grows. This requires
discriminating between adaptive behaviour that is part of successful
control and adaptive behaviour that is a sign of incipient failure to
come.
In these situations, the critical information is not the abnormal
process symptoms per se but the increasing force with which they must
be resisted relative to the capabilities of the base control systems. For
example, when a human acts as the base control system, he or she would,
as an effective team member, communicate to others the fact that
unusual control effort is needed (Norman, 1990). Such information
provides a diagnostic cue for the team and is a signal that additional
resources need to be injected to keep the process under control. If
there is no information about how hard the base control system is
working to maintain control in the face of disturbances, it is quite
difficult to recognise the seriousness of the situation during the phase 1
portion, and therefore to respond early enough to avoid the
decompensation collapse that marks phase 2 of the event pattern. The
key information is how hard control systems are working to maintain
control and the trend: are control systems running out of control
capability as disturbances are growing or cascading?
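The distinction can be sketched in a few lines of code. The simulation below is a deliberately simplified illustration (its dynamics, capacity limit, and warning threshold are assumptions, not a model of the aviation or anaesthesia cases cited): a base controller absorbs a growing disturbance, so the controlled parameter looks normal throughout phase 1, while a monitor keyed to the trend in control effort raises a warning well before the phase 2 collapse.

```python
# Minimal sketch of the two-phase decompensation signature (assumed dynamics).
# Phase 1: a base controller absorbs a growing disturbance, so the controlled
# parameter looks nearly normal while control effort climbs toward its limit.
# Phase 2: effort saturates and the controlled parameter collapses.

MAX_EFFORT = 10.0          # capacity limit of the base controller (assumed)
EFFORT_WARNING = 0.8       # supervisory alarm at 80% of capacity (assumed)

disturbance = 0.0
for t in range(30):
    disturbance += 0.5                          # challenge keeps growing
    effort = min(disturbance, MAX_EFFORT)       # controller works harder...
    deviation = disturbance - effort            # ...which masks the disturbance
    nearing_limit = effort >= EFFORT_WARNING * MAX_EFFORT
    print(f"t={t:2d} effort={effort:4.1f} deviation={deviation:4.1f}"
          f"{'  <-- effort-based warning' if nearing_limit and deviation == 0 else ''}"
          f"{'  <-- decompensation: parameter collapsing' if deviation > 0 else ''}")

# A monitor keyed to `deviation` fires only in phase 2, after capacity is
# exhausted; a monitor keyed to the effort trend fires during phase 1, while
# there is still time to inject additional resources.
```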
There are a number of variations on the decompensation pattern,
notably:
• Falling behind the tempo of operations (e.g., the aviation expression
‘falling behind the power curve;’ surges in demands in emergency
rooms, Wears and Woods, 2007; bed crunches in intensive care
units, Cook, 2006).
• Inability of an organisation to transition to new modes of functioning when
anomalies challenge normal mechanisms or contingencies (e.g., a hospital’s
ability to manage mass casualty events, see Committee on the
Future of Emergency Care in the US, 2006; Woods and Wreathall,
2008 provide a general description of this risk).
Working at Cross-purposes: Behaviour that is Locally Adaptive, but
Globally Maladaptive
This refers to the inability to coordinate different groups at different
echelons as goals conflict. As a result of miscoordination the groups
work at cross-purposes. Each group works hard to achieve the local
goals defined for their scope of responsibility, but these activities make
it more difficult for other groups to meet the responsibilities of their
roles or undermine the global or long term goals that all groups
recognise to some degree.
The archetype is the tragedy of the commons (Ostrom, 1990, 1999)
which concerns shared physical resources (among the most studied
examples of common pools are fisheries management and water
resources for irrigation). The tragedy of the commons is a name for a
baseline adaptive dynamic whereby the actors, by acting rationally in the
short term to generate a return in a competitive environment, deplete
or destroy the common resource on which they depend in the long run.
In the usual description of the dynamic, participants are trapped in an
adaptive cycle that inexorably overuses the common resource (a
‘pessimist’ stance on adaptive systems); thus, from a larger systems view
the local actions of groups are counter-productive and lead them to
destroy their livelihood or way of life in the long run.
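A minimal simulation makes this baseline dynamic concrete. The parameters below (number of actors, harvest share, regrowth rate) are illustrative assumptions rather than figures from the common-pool cases Ostrom studied; the point is only that behaviour which is rational for each actor in each season steadily depletes the shared stock.

```python
# Minimal sketch of the tragedy-of-the-commons dynamic (assumed parameters).
# Each actor takes the short-term rational harvest each season; the shared
# stock regenerates in proportion to what remains, and is steadily depleted.

N_ACTORS = 5
stock = 100.0
REGROWTH_RATE = 0.10          # 10% regrowth per season (assumed)
GREEDY_SHARE = 0.05           # each actor takes 5% of the current stock (assumed)

for season in range(1, 31):
    harvest_per_actor = GREEDY_SHARE * stock   # locally rational: more now is better
    stock -= N_ACTORS * harvest_per_actor      # total extraction exceeds regrowth
    stock += REGROWTH_RATE * stock             # commons partially recovers
    if season % 5 == 0:
        print(f"season {season:2d}: stock = {stock:6.2f}")

# With 5 actors each taking 5%, extraction (25%) outpaces regrowth (10%), so
# the stock decays toward zero even though every individual harvest was a
# reasonable local decision; only a change to the collective rule alters this.
```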
Organisational analyses of accidents like the Columbia space shuttle
accident put production/safety trade-offs in a parallel position to
tragedies of the commons. Despite the organisations’ attempts to
design operations for high safety and the large costs of failures in
money and in lives, line managers under production pressures make
decisions that gradually erode safety margins and thereby undermine
the larger common goal of safety. In other words, safety can be thought
of as an abstract common pool resource analogous to a fishery. Thus,
dilemmas that arise in managing physical common pool resources are a
specific example of a general type of goal conflict where different
groups are differentially responsible for and affected by different sub-
goals, even though there is one or only a couple of commonly held
over-arching goals (Woods et al., 1994, Chapter 4). When the activities
of different groups seem to advance local goals but undermine over-
arching or long term goals of the larger system that the groups belong
to, the system-level pattern is maladaptive as the groups work at cross-
purposes. Specific concrete stories that capture this pattern of adaptive
breakdown can be found in Brown (2005), who collected cases of safety
dilemmas and sacrifice judgments in health care situations.
There is a variety of sub-patterns to working at cross-purposes.
Some of these concern vertical interactions, that is, across echelons or
levels of control, such as the tragedy of the commons. Others concern
horizontal interactions when many different groups need to coordinate
their activities in time and space such as in disaster response and
military operations. This pattern can also occur over time. A sub-
pattern that includes a temporal component and is particularly
important in highly coupled systems is missing side effects of change
(Woods and Hollnagel, 2006). This can occur when there is a change
that disrupts plans in progress or when a new event presents new
demands to be handled, among other events. Other characteristic sub-
patterns are:
• Fragmentation over roles (stuck in silos; e.g., precursors to Columbia
space shuttle accident, Woods, 2005a).
• Failure to resynchronise following disruptions (Branlat et al., 2009).
• Double binds (Woods et al., in press).
Getting Stuck in Outdated Behaviours: The World Changes but the
System Remains Stuck in what were Previously Adaptive Strategies
(Over-relying on Past Successes)
This pattern relates to breakdowns in how systems learn. What was
previously adaptive can become rigid at the level of individuals, groups,
or organisations. These behaviours can persist even as information
builds that the world is changing and that the usual
behaviours/processes are not working to produce desired effects or
achieve goals. One example is the description of the cycle of error as
organisations become trapped in narrow interpretations of what led to
an accident (Cook, Woods and Miller, 1998).
This pattern is also at play at more limited operational time scopes.
Domains such as military operations offer a rich environment for
studying the pattern. When conditions of operation change over time,
tactics or strategies need to be updated to match new challenges or
opportunities. While such decisions are made difficult by the uncertain
nature of the operations’ environment and of the outcome of actions,
missed opportunities to re-plan constitute sources of failure (Woods
and Shattuck, 2000). Mishaps in the nuclear industry have also
exemplified the pattern by showing the dangers of “rote rule following”
(ibid.). In all of these cases there was a failure to re-plan when the
conditions experienced fell outside of the boundaries the system and
plans were designed for. Some characteristic sub-patterns are:
• Oversimplifications (Feltovich, Spiro and Coulson, 1997).
• Failing to revise current assessment as new evidence comes in (Woods and
Hollnagel, 2006; Rudolph, 2009).
• Failing to revise a plan in progress when disruptions/opportunities arise
(Woods and Hollnagel, 2006).
• Discounting discrepant evidence (e.g., precursors to Columbia, Woods,
2005a).
• Literal mindedness, particularly in automation failures (Woods and
Hollnagel, 2006).
• Distancing through differencing (Cook and Woods, 2006).
• Cook’s Cycle of Error (Cook et al., 1998).
The three basic patterns define kinds of adaptive traps. A reflective
adaptive system should be able to monitor its activities and functions
relative to its changing environment and determine whether it is likely
to fall into one or another of these adaptive traps. The three basic
patterns can be used to understand better how various systems are
vulnerable to failures, such as systems that carry out crisis management,
systems that respond to anomalies in space flights, and systems that
provide critical care to patients in medicine. In the next section, we test
the explanatory value of these three basic patterns by re-visiting a recent
analysis of critical incidents (Branlat et al., 2009) that provided markers
of both resilience and brittleness (Woods and Cook, 2006). Urban fire-
fighting provides a rich setting to examine aspects of resilience and
brittleness related to adaptation and coordination processes. Incident
command especially instantiates patterns generic to adaptive systems
and observed in other domains or at other scales (Bengtsson et al.,
2003; Woods and Wreathall, 2008).
The Basic Patterns Are Illustrated in Urban Fire-fighting Critical Incidents
High uncertainty and potential for disruptions, new events, and
surprises all pose challenges for fire-fighting operations. The fire-
fighting organisation needs to be able to adapt to new information
(whether a challenge or opportunity) about the situation at hand and to
ever-changing conditions. For example, consider this case from the
corpus (Branlat et al., 2009):
Companies arrive on the fire scene and implement standard
operating procedures for an active fire on the first floor of the
building. The first ladder company initiates entry to the apartment
on fire, while the second ladder gets to the second floor in order to
search for potentially trapped victims (the ‘floor above the fire’ is
an acknowledged hazardous position). In the meantime, engine
companies stretch hose-lines but experience various difficulties
delaying their actions, especially because they cannot achieve
optimal positioning of their apparatus on a heavily trafficked street.
While all units are operating, conditions are deteriorating in the
absence of water being applied to the fire. The Incident
Commander (IC) transmits an ‘all hands’ signal to the dispatcher,
leading to the immediate assignment of additional companies.
Almost at the same time, members operating above the fire
transmit a ‘URGENT’ message over the radio. Although the IC
tries to establish communication and get more information about
the difficulties encountered, he does not have uncommitted
companies to assist the members. Within less than a minute, a
back-draft-type explosion occurs in the apartment on fire,
engulfing the building’s staircase in flames and intense heat for
several seconds, and erupting through the roof. As the members
operating on the second floor had not been able to get access to
the apartment there due to various difficulties, they lacked both a
refuge area (apartment) and an egress route (staircase). The second
ladder company was directly exposed to life-threatening conditions.
The three basic patterns can all be seen at work in this case:
Decompensation. The situation deteriorated without companies being
able to address the problem promptly. The Incident Commander
(IC) recognised and signalled an ‘all hands’ situation, in order to
inform dispatchers that all companies were operating and to
promptly request additional resources. As there were no
uncommitted resources available, the fire companies were unable to
respond when an unexpected event occurred (the back-draft) which
created dangers and hindered the ability of others to assist. As a
result, team members were exposed to dangerous conditions.
Working at cross-purposes. Companies were pursuing their tasks and
experienced various challenges without the knowledge of other
companies’ difficulties. Without this information, actions on the
first floor worked against the actions and safety of operators on the
second floor. Goal conflict arose (1) between the need to provide
access to the fire and to contain it while water management was
difficult, and (2) between the need to address a deteriorating
situation and to rescue injured members while all operators were
committed to their tasks.
Getting stuck in outdated behaviour. The ladder companies continued to
implement standard procedures that assumed another condition
was met (water availability from the engine companies). They failed
to adapt the normally relevant sequence of activities to fit the
changing particulars of this situation: the first ladder company
gained access to the apartment on fire; but in the absence of water,
the opened door fuelled the fire and allowed flames and heat to
spread to the rest of the building (exacerbating how the fire
conditions were deteriorating). Similarly, the unit operating on the
second floor executed its tasks normally, but the difficulty it
encountered and the deteriorating situation required adaptation of
normal routines to fit the changing risks.
Urban Fire-fighting and the Dynamics of Decompensation
During operations, it is especially important for the Incident
Commander (IC) constantly and correctly to assess progress in terms of
trends in whether the fire is in or out of control. To do this, the IC
monitors (a) the operational environment including the evolution of the
fire and the development of additional demands or threats (e.g.,
structural damages or trapped victims) and (b) the effort companies are
exerting to try to accomplish their tasks as well as their capacity to
respond to additional demands. Based on such assessments, the IC
makes critical decisions related to the management of resources:
redeploying companies in support of a particular task; requesting
additional companies to address fire extensions or the need to relieve
members; requesting special units to add particular forms of expertise
to handle unusual situations (e.g., presence of hazardous material).
ICs are particularly attentive to the risk of falling behind by
exhausting the system’s capacity to respond to immediate demands as
well as to new demands (Branlat et al., 2009). The ‘all-hands’ signal is a
recognition that the situation is precarious because the system is stretched
close to its maximum capacity and that current operations therefore are
vulnerable to any additional demands that may occur. The analysis of
the IC role emphasised anticipating trends or potential trends in
demands relative to how well operations were able to meet those
demands (see also Cook’s analysis of resource crunches in intensive
care units; Cook, 2006). For urban fire-fighting, given crucial time
constraints, resources are likely to be available too late if they are
requested only when the need is definitive. A critical task of the IC
therefore corresponds to the regulation of adaptive capacity by
providing ‘tactical reserves’ (Klaene and Sanders, 2008, p. 127), i.e., an
additional capacity promptly to adapt tactics to changing situations.
Equivalent processes also play out (a) at the echelon of fire-fighters or
fire teams, (b) in terms of the distributed activity (horizontal
interactions) across roles at broader echelons of the emergency
response system, and (c) vertically across echelons where information
about difficulties at one level changes decisions and responses at another
echelon.
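The IC’s regulation of adaptive capacity can be caricatured as a simple margin rule. The sketch below is a hypothetical illustration rather than fire-service doctrine; the threshold and unit counts are assumptions chosen only to show the logic of requesting help while uncommitted capacity still exists rather than once it is exhausted.

```python
# Sketch of an 'anticipate the crunch' rule for incident command
# (threshold and unit counts are illustrative assumptions).

RESERVE_THRESHOLD = 0.75   # request more help once 75% of units are committed

def assess_margin(committed, on_scene):
    """Return a recommendation based on how much uncommitted capacity remains."""
    utilisation = committed / on_scene
    if utilisation >= 1.0:
        return "all hands: no uncommitted units, any new demand cannot be met"
    if utilisation >= RESERVE_THRESHOLD:
        return "request additional companies now, while there is still margin"
    return "adequate tactical reserve for foreseeable demands"

# Example trajectory of an escalating incident:
for committed, on_scene in [(2, 6), (4, 6), (5, 6), (6, 6)]:
    print(f"{committed}/{on_scene} committed -> {assess_margin(committed, on_scene)}")
```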
Urban Fire-fighting and Coordination over Multiple Groups and Goals
Fire-fighting exemplifies situations within which tasks and roles are
highly distributed and interdependent, exposing work systems to the
difficulty of maintaining synchronisation while providing flexibility to
address ever-changing demands. Interdependencies also result from the
fact that companies operate in a shared environment.
Several reports within the corpus described incidents where
companies opened hose-lines and pushed fire and heat in the direction
of others. These situations usually resulted from companies adapting
their plan because of difficulties or opportunities. If the shift in activity
by one group was not followed by a successful resynchronisation, it
created conditions for a coordination breakdown where companies
(and, importantly, the IC) temporarily lost track of each other’s position
and actions. In this context one group could adapt to handle the
conditions they face in ways that inadvertently created or exacerbated
threats for other groups. Another example in the corpus was situations
where companies’ capacity to fulfil their functions was impeded by
the actions of others. One group’s actions, though locally adaptive relative
to their scope, introduced new constraints which reduced another
company’s ‘margins of manoeuvre’ (Coutarel, Daniellou and Dugué,
2003). This notion refers to the range of behaviours they are able to
deploy in order to fulfil their functions, and therefore to their capacity to
adapt a course or plan of action in the face of new challenges. Such
dynamics might directly compromise members’ safety, for example
when the constrained functions were critical to egress route
management. In one case, a company vented a window adjacent to a
fire escape which had the consequence of preventing the members of
another company operating on the floor above from using the fire
escape as a potential egress route, should it have been needed.
Goal conflicts arise when there are trade-offs between achieving
the three fundamental purposes of urban fire-fighting: saving lives,
protecting property and ensuring personnel’s safety. This occurs when,
for example, a fire department forgoes the goal of protecting property
in order to minimise risk to fire-fighters. Incidents in the corpus vividly
illustrate the trade-offs that can arise during operations and require
adaptations to on-going operations. Under limited resources (time,
water, operators), the need to rescue a distressed fire-fighter introduces
a difficult goal conflict between rescue and fire operations. If members
pursue fire operations, the victim risks life-threatening exposure to the
dangerous environment. Yet by abandoning fire operations,
momentarily or partially, team members risk letting the situation
degrade and become more difficult and more dangerous
to address. The analysis of the corpus of cases found that adaptations in
such cases were driven by local concerns, e.g., when members
suspended their current operations to assist rescue operations nearby.
The management of goal conflicts is difficult when operations are not
clearly synchronised, since decisions that are only locally adapted risk
further fragmenting operations.
Urban Fire-fighting and the Risk of Getting Stuck in Outdated
Behaviours
As an instance of emergency response, urban fire-fighting is
characterised by the need to make decisions at a high tempo and under
uncertainty. As fire-fighters discover and assess the problem to be
addressed during the course of operations, replanning is a central
process. It is critical that adaptations to the plan are made when
elements of the situation indicate that previous knowledge (on which
on-going strategy and tactics are based) is outdated. The capacity to
adapt is therefore highly dependent on the capacity correctly to assess
the situation at hand throughout the operations, especially at the level
of the IC. Accident cases show that the capacity of the IC to efficiently
supervise operations and modify the plan in progress is severely
impaired when this person only has limited information about and
understanding of the situation at hand and the level of control over the
fire.
Given the level of uncertainty, this also suggests the need for
response systems to be willing to devote resources to further assess
ambiguous signals, a characteristic of resilient and high-reliability
organisations (Woods, 2006a; Rochlin, 1999). This is nonetheless
challenging in the context of limited resources and high tempo, and
given the potential cost of replanning (risk of fragmenting operations,
cost of redeploying companies, coordination costs).
At a wider temporal and organisational scale, fire departments and
organisations are confronted with the need to learn from situations in
order to increase or maintain operations’ resilience in the face of
evolving threats and demands. The reports analysed resulted from
thorough investigation processes that aimed at understanding limits in
current practices and tools and represented a process of learning and
transformation. However, it is limiting to assume that the events that
produce the worst outcomes are also the ones that will produce the
most useful lessons. Instances where challenging and surprising
situations are managed without leading to highly severe outcomes also
reveal interesting and innovative forms of adaptations (Woods and
Cook, 2006). As stated previously, many minor incidents also represent
warning signals about the (in)adequacy of responses to the situations
encountered. They are indicators of the system starting to stretch
before it collapses in the form of a dramatic event (Woods and
Wreathall, 2008). To be resilient, organisations must be willing to
pursue these signals (Woods, 2009a). Unfortunately, selecting the
experiences or events which will prove fruitful to investigate, and
allocating the corresponding resources, is a difficult choice when it has
to be made a priori (Hollnagel, 2007; Dekker, 2008, chapter 3).
Recognising What is Maladaptive Depends on
Perspective Contrasts
The chapter has presented three basic patterns in how adaptive systems
fail. But it is difficult to understand how behaviours of people, groups,
and organisations are adapted to some factors and how those
adaptations are weak or strong, well or poorly adapted. One reason for
this is that what is well-adapted, under-adapted, or maladaptive is a
matter of perspective. As a result, labelling a behaviour or process as
maladapted is conditional on specifying a contrast across perspectives.
First, adaptive decision-making exhibits local (though bounded)
rationality (regardless of scale). A human adaptive system uses its
knowledge and the information available from its field of view/focus of
attention to adapt its behaviour (given its scope of autonomy/authority)
in pursuit of its goals. As a result, adaptive behaviour is well-adapted
when examined locally, even though the system can learn and change to
become better adapted in the future (shifting temporal perspective).
Second, adaptive decision-making exists in a co-adaptive web
where adaptive behaviour by other systems horizontally or vertically (at
different echelons) influences (releases or constrains) the behaviour of
the system of interest. Behaviour that is adaptive for one unit or system
can produce constraints that lead to maladaptive behaviour in other
systems or can combine to produce emergent behaviour that is
maladaptive relative to criteria defined by a different perspective.
Working at cross-purposes happens when interdependent systems
do things that are all locally adaptive (relative to the role/goals set
up/pressured for each unit) but more globally maladaptive (relative to
broader perspectives and goals). This can occur horizontally across
units working at the same level as in urban fire-fighting (Branlat et al.,
2009). It can occur upward, vertically, where local adaptation at the
sharp end of a system is maladaptive when examined from a more
regional perspective that encompasses higher level or total system goals.
One example is ad hoc plan adaptation in the face of an impasse to a
plan in progress; in this case the adaptation works around the impasse
but fails to do so in a way that takes into account all of the relevant
constraints as defined from a broader perspective on goals (Woods and
Shattuck, 2000).
Working at cross-purposes can occur downward vertically too
(Woods et al., in press). Behaviour that is adaptive when considered
regionally can be seen as maladaptive when examined locally as the
regional actions undermine or create complexities that make it harder
for the sharp end to meet the real demands of situations (for example,
actions at a regional level can introduce complexities that force sharp
end operations to develop work-arounds and other forms of gap-filling
adaptations).
This discussion points to the finding in adaptive system science
that all systems face fundamental trade-offs. In particular, becoming
more optimal with respect to some aspects of the environment
inevitably leads that system to be less adapted to other aspects of the
environment (Doyle, 2000; Zhou et al., 2005; Woods, 2006a; Hollnagel,
2009). This leads us to a non-intuitive but fundamental conclusion that
all adaptive systems simultaneously are (Woods, 2009b):
• well-adapted to some aspects of its environment (e.g., the fluency
law: ‘well’-adapted cognitive work occurs with a facility that
belies the difficulty of the demands resolved and the dilemmas
balanced; see Woods and Hollnagel, 2006),
• under-adapted in that the system has some degree of drive to learn
and improve its fitness relative to variation in its environment. This
is related both to intrinsic properties of that agent or system and to
the external pressures the system faces from stakeholders.
• maladapted or brittle in the face of events and changes that challenge
its normal function.
This basic property of adaptive systems means that all forms of
linear causal analysis are inadequate for modelling and predicting the
behaviour of such systems. Adaptive systems’ sciences are developing
the new tools needed to accurately model, explain and predict how
adaptive systems will behave (e.g., Alderson and Doyle, in press), for
example, how to anticipate tipping points in complex systems (Scheffer
et al., 2009).
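One concrete member of that toolset is the family of generic early-warning indicators reviewed by Scheffer et al. (2009), such as rising variance and rising lag-1 autocorrelation in a monitored signal as a system approaches a critical transition. The sketch below computes both over the first and last windows of a series; it is a generic illustration of those indicators rather than a method taken from that paper's analyses, and the window length and comparison rule are assumptions.

```python
# Sketch of generic early-warning indicators for an approaching tipping point:
# rising variance and rising lag-1 autocorrelation in a sliding window
# (window length and first-vs-last comparison are illustrative assumptions).
from statistics import mean, pvariance

def lag1_autocorrelation(xs):
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den if den else 0.0

def early_warning_trend(signal, window=50):
    """Compare the indicators in the first and last windows of the signal."""
    first, last = signal[:window], signal[-window:]
    return {
        "variance_rising": pvariance(last) > pvariance(first),
        "autocorrelation_rising": lag1_autocorrelation(last) > lag1_autocorrelation(first),
    }

# Usage with any monitored series (e.g., control effort, workload, delays):
# flags = early_warning_trend(observations)
# if all(flags.values()):
#     ...treat the trend as a cue to investigate before the transition occurs
```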
Working organisations need to be able to see and avoid or
recognise and escape when a system is moving toward one of the three
basic adaptive traps. Being resilient means the organisation can monitor
how it is working relative to changing demands and adapt in
anticipation of crunches, just as incident command should be able to do
in urban fire-fighting. Organisations can look at how they have adapted
to disruptions in past situations to estimate whether their system’s
‘margins of manoeuvre’ in the future are expanding or contracting.
Resilience Engineering is beginning to provide the tools to do this even
as more sophisticated general models of adaptive systems are being
developed.
References
Andersson, K. P. and Ostrom, E. (2008). Analyzing decentralized
resource regimes from a polycentric perspective. Policy Sciences,
41, 71-93.
Alderson, D. L. and Doyle, J. C. (in press). Contrasting views of
complexity and their implications for network-centric
infrastructures. IEEE Systems, Man and Cybernetics, Part A.
Bengtsson, J., Angelstam, P., Elmqvist, T., Emanuelsson, U., Folke, C.,
Ihse, M., Moberg, F. and Nyström, M. (2003). Reserves, Resilience
and Dynamic Landscapes. Ambio, 32(6), 389-396.
Branlat, M., Fern, L., Voshell, M. and Trent, S. (2009). Coordination in
Urban Firefighting: A Study of Critical Incident Reports. Proceedings
of the Human Factors and Ergonomics Society 53rd Annual Meeting, San
Antonio, TX.
Brown, J. P. (2005). Key themes in healthcare safety dilemmas. In M. S.
Patankar, J. P. Brown, & M. D. Treadwell (Eds.), Safety Ethics: Cases
from Aviation, Healthcare, and Occupational and Environmental Health
(pp. 103-148). Aldershot, UK: Ashgate.
Committee on the Future of Emergency Care in the US (2006).
Hospital-based Emergency Care: At the Breaking Point. National
Academic Press, Washington, DC.
Cook, R. I. (2006). Being bumpable: consequences of resource
saturation and near-saturation for cognitive demands on ICU
practitioners. In D. D. Woods & E. Hollnagel (Eds.), Joint Cognitive
Systems: Patterns in Cognitive Systems Engineering (pp. 23-35). Boca
Raton, FL: Taylor & Francis/CRC Press.
Cook, R. and Rasmussen, J. (2005). “Going Solid”: A model of system
dynamics and consequences for patient safety. Quality and Safety in
Health Care, 14, 130-134.
Cook, R. I., Woods, D. D. and McDonald, J.S. (1991). Human
Performance in Anesthesia: A Corpus of Cases. Cognitive Systems
Engineering Laboratory Report, prepared for Anesthesia Patient
Safety Foundation, April 1991.
Cook, R. I., Woods, D. D. and Miller, C. (1998). A Tale of Two Stories:
Contrasting Views of Patient Safety. Chicago, National Patient Safety
Foundation. (available at http://csel.eng.ohio-
state.edu/blog/woods/archives/000030.html )
Coutarel, F., Daniellou, F., & Dugué, B. (2003). Interroger
l'organisation du travail au regard des marges de manoeuvre en
conception et en fonctionnement [Examining Work Organization
in Relation to Margins of Maneuver in Design and in Operation].
Pistes, 5(2).
Csete, M.E. and Doyle, J.C. (2002). Reverse engineering of biological
complexity. Science, 295, 1664-1669.
Dekker, S. (2008). Just Culture: Balancing Safety and Accountability.
Aldershot, UK: Ashgate.
Doyle, J.C. (2000). Multiscale networking, robustness, and rigor. In T.
Samad and J. Weyrauch (Eds.), Automation, Control, and Complexity: An
Integrated Approach. New York: John Wiley & Sons, pp. 287-301.
Feltovich, P. J., Spiro, R. J. and Coulson, R. L. (1989). The nature of
conceptual understanding in biomedicine: The deep structure of
complex ideas and the development of misconceptions. In D.
Evans and V. Patel (Eds.), The Cognitive Sciences in Medicine (pp. 113-
172). Cambridge MA: MIT Press.
Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1997). Issues of expert
flexibility in contexts characterized by complexity and change. In P.
J. Feltovich, K. M. Ford, & R. R. Hoffman (Eds.), Expertise in
context: Human and machine. Menlo Park, CA. AAAI/MIT Press.
Hollnagel, E. (2007). Resilience Engineering: Why, What and How. In
NoFS 2007 - Nordic Research Conference on Safety, 13-15 June 2007,
Tampere, Finland.
Hollnagel, E. (2009). The ETTO Principle: Efficiency-Thoroughness Trade-Off:
Why Things That Go Right Sometimes Go Wrong. Ashgate.
Klaene, B. J., & Sanders, R. E. (2008). Structural Firefighting: Strategies and
Tactics (2nd ed.). Sudbury, MA: Jones & Bartlett Publishers.
Ostrom, E. (1990). Governing the Commons: The Evolution of Institutions for
Collective Action. New York: Cambridge University Press, 1990.
Ostrom, E. (1999). Coping with Tragedies of the Commons. Annual
Review of Political Science, 2, pp. 493-535.
Perrow, C. (1984). Normal Accidents: Living with High-Risk Technologies.
New York: Basic Books.
Rochlin, G.I. (1999). Safe operation as a social construct. Ergonomics,
42(11), 1549-1560.
Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R.,
Dakos, V., Held, H., van Nes, E. H., Rietkerk, M. and Sugihara, G.
(2009). Early-warning signals for critical transitions. Nature,
461(7260), 53-59.
Wears, R. L. and Woods, D. D. (2007). Always Adapting. Annals of
Emergency Medicine, 50(5), 517-519.
Woods, D. D. (2005). Creating Foresight: Lessons for Resilience from
Columbia. In W. H. Starbuck and M. Farjoun (eds.), Organization at
the Limit: NASA and the Columbia Disaster. Malden, MA: Blackwell,
pp. 289--308.
Woods, D. D. (2006). Essential characteristics of resilience. In E.
Hollnagel, D. D. Woods, & N. Leveson (Eds.), Resilience Engineering:
Concepts And Precepts (pp. 19–30). Aldershot, UK: Ashgate.
Woods, D. D. (2009a). Escaping Failures of Foresight. Safety Science,
47(4), 498-501.
Woods, D. D. (2009b). Fundamentals to Engineer Resilient Systems:
How Human Adaptive Systems Fail and the Quest for Polycentric
Control Architectures. Keynote presentation, 2nd International
Symposium on Resilient Control Systems, Idaho Falls, ID, August 11-13
2009 (https://secure.inl.gov/isrcs2009/default.aspx accessed
September 8, 2009).
Woods, D. D. and Cook, R. I. (2006). Incidents: Are they markers of
resilience or brittleness? In E. Hollnagel, D.D. Woods and N.
Leveson, eds., Resilience Engineering: Concepts and Precepts. Ashgate,
Aldershot, UK, pp. 69-76.
Woods, D. D., & Hollnagel, E. (2006). Joint Cognitive Systems: Patterns in
Cognitive Systems Engineering. Boca Raton, FL: Taylor & Francis/CRC
Press.
Woods, D. D. and Sarter, N. (2000). Learning from Automation
Surprises and Going Sour Accidents. In N. Sarter and R. Amalberti
(Eds.), Cognitive Engineering in the Aviation Domain, Erlbaum,
Hillsdale NJ, pp. 327-354.
Woods, D.D. and Shattuck, L. G. (2000). Distant supervision—local
action given the potential for surprise. Cognition, Technology and
Work, 2, 242-245.
Woods, D. D. and Wreathall, J. (2008). Stress-Strain Plot as a Basis for
Assessing System Resilience. In E. Hollnagel, C. Nemeth and S. W.
A. Dekker, eds., Resilience Engineering: Remaining sensitive to the
possibility of failure. Ashgate, Aldershot, UK, pp. 145-161.
Zhou, T., Carlson, J. M. and Doyle, J. (2005). Evolutionary dynamics
and highly optimized tolerance. Journal of Theoretical Biology, 236,
438-447.
... Adaptive range refers to the spectrum of response options that managers consider salient-those they deem relevant, actionable, and appropriate-when addressing events construed as near-miss errors and accident failures (S. Dekker, 2011;Grote, 2009;March et al., 1991;Oktem & Meel, 2008;Reason, 1997;Weick & Sutcliffe, 2015;Woods & Branlat, 2011). However, a significant gap exists in understanding the full range of responses that managers and organizational members develop in shaping their strategies and actions. ...
... This bipolarity underscores the importance of an organization's foresight, or lack thereof, in risk mitigation efforts. The bipolar range in this category provides insights into how HROs handle uncertainty-either by meticulously planning for possible outcomes or by managing the consequences of being caught off guard (Hollnagel et al., 2006;Reason, 1997;Vaughan, 2016;Woods, 2006Woods, , 2011Woods & Branlat, 2011). ...
... Finally, constructs like Aligning, Clarifying, and Communicating provide further layers of analysis by exploring organizational responses from cross-team coordination to individualistic efforts, from clear, measurable objectives to ambiguous goals, and from transparent communication to withholding information. Each bipolar spectrum encapsulates the various ways in which HROs adapt to maintain operational integrity across a range of near-miss errors and accident failures (Hollnagel et al., 2006;March et al., 1979;Rasmussen, 1997;Weick & Roberts, 1993;Woods & Branlat, 2011). ...
Article
Full-text available
This study examines how actors in a high‐reliability organization categorize errors as near‐misses or accidents through the lens of adaptive capacity and adaptive range. We studied a large defense entity with operations critical to national security to understand how organization members categorized errors during incidents. Using the repertory grid method to interview informants, we identify key dualities that actors navigate between anticipatory and retrospective responses to errors. These dualities collectively reflect the organization's adaptive capacity and adaptive range when balancing anticipatory and retrospective responses. Our analysis of error categorization through this lens provides new insights into how high‐reliability organizations manage incidents to maintain reliability and offers practical implications for enhancing organizational resilience in high‐risk settings.
... Because the rebound curve tells a story in which a stressor event initiates a decline in system performance, it tends to oversimplify the trigger event as a root cause of the incident when, in fact, failure in complex systems tends to have multiple contributing factors [43,23]. A deeper look reveals basic processes that enable resilience [44,81] and demonstrates basic patterns in how failure results when these processes break down [106,102]. Confusing the trigger event with the root cause is a common fallacy that makes it easy to scapegoat blame while avoiding the more challenging work of understanding and improving system operations [100]. ...
Article
The rebound curve remains the most prevalent model for conceptualizing, measuring, and explaining resilience for engineering and community systems by tracking the functional robustness and recovery of systems over time. (It also goes by many names, including the resilience curve, the resilience triangle, and the system functionality curve, among others.) Despite longstanding recognition that resilience is more than rebound, the curve remains highly used, cited, and taught. In this article, we challenge the efficacy of this model for resilience and identify fundamental shortcomings in how it handles system function, time, dynamics, and decisions — the key elements that make up the curve. These oversimplifications reinforce misconceptions about resilience that are unhelpful for understanding complex systems and are potentially dangerous for guiding decisions. We argue that models of resilience should abandon the use of this curve and instead be reframed to open new lines of inquiry that center on improving adaptive capacity in complex systems rather than functional rebound. We provide a list of questions to help future researchers communicate these limitations and address any implications on recommendations derived from its use.
... The total amount spent on each participant is the same, but a decreasing pay structure likely facilitates the process of internalization discussed in the Motivation section. The psychological literature on sunk costs, escalation of commitment, and maladaptive persistence suggests that people are often hesitant to change their behavior when a situation deteriorates progressively (Merkle et al., 2022;Weeth et al., 2020;Woods & Branlat, 2011). We are not aware of any studies that tested a decreasing payment structure despite the theoretical arguments that suggest a superior effectiveness. ...
Article
Full-text available
Financial incentives are widely used to get people to adopt desirable behaviors. Many small landholders in developing countries, for example, receive multiyear payments to engage in conservation behaviors, and the hope is that they will continue to engage in these behaviors after the program ends. Although effective in the short term, financial incentives rarely lead to long-term behavior change because program participants tend to revert to their initial behaviors soon after the payments stop. In this article, we propose that four psychological constructs can be leveraged to increase the long-term effectiveness of financial-incentive programs: motivation, habit formation, social norms, and recursive processes. We review successful and unsuccessful behavior-change initiatives involving financial incentives in several domains: public health, education, sustainability, and conservation. We make concrete recommendations on how to implement the four above-mentioned constructs in field settings. Finally, we identify unresolved issues that future research might want to address to advance knowledge, promote theory development, and understand the psychological mechanisms that can be used to improve the effectiveness of incentive programs in the real world.
... Resilient implementations of the best-of-N problem for robotic systems require the collective to make decisions that are both fast and accurate regardless of the sites' layout in the environment. Robot designers and operators will not always know the available sites and be able to encode an optimal solution a priori (Krotkov et al., 2017;Woods and Branlat, 2011;Yang et al., 2018). Resilience metrics must measure how well an algorithm copes with unexpected events and enable system designers to evaluate whether changes to an algorithm improve the collective's task performance. ...
Article
Full-text available
Biologically inspired collective decision-making algorithms show promise for implementing spatially distributed searching tasks with robotic systems. One example is the best-of-N problem in which a collective must search an environment for an unknown number of sites and select the best option. Real-world robotic deployments must achieve acceptable success rates and execution times across a wide variety of environmental conditions, a property known as resilience. Existing experiments for the best-of-N problem have not explicitly examined how the site layout affects a collective’s performance and resilience. Two novel resilience metrics are used to compare algorithmic performance and resilience between evenly distributed, obstructed, or unobstructed uneven site configurations. Obstructing the highest valued site negatively affected selection accuracy for both algorithms, while uneven site distribution had no effect on either algorithm’s resilience. The results also illuminate the distinction between absolute resilience as measured against an objective standard, and relative resilience used to compare an algorithm’s performance across different operating conditions.
Article
Full-text available
Accountability pressures on human operators supervising automation have been shown to reduce automation bias, but with increasingly autonomous automation enabled by artificial intelligence, the work structure between people and automated agents may be less supervisory and more interactive or team-like. We thus tested the effects of accountability pressures in supervisory and interactive work structures, recruiting 60 participants to interact with an automated agent in a resource management task. High versus low accountability pressures were manipulated based on previous studies, by changing the task environment, i.e., task instructions and the researcher’s dress code. Results show that an interactive control structure facilitated higher throughput, fewer resources shared, and lower resource utility compared to participants in a supervisory control structure. Higher accountability pressures resulted in lower throughput, more resources shared, and lower resource utility compared to lower accountability pressures. Although task environment complexity makes it difficult to draw clean conclusions, our results indicate that with more interactive structures and higher outcome accountability pressures, people will engage in the most available actions to maximize individual performance even when suboptimal performance is needed to achieve the highest joint outcome.
Preprint
Full-text available
Supporting coordination between a human and their machine counterparts is essential for realizing the benefits of an automated system and maintaining system safety. In supervising the automation, the ability to answer question "what will happen next" given the system design is necessary for continuous coordination. If the human's view of the world, the autonomous system's activities, and the world are misaligned, automation surprises occur. We introduce the What's Next diagram which can be used to visualize the ability of the human to coordinate with 15 automated systems over time in both a retrospective and future-oriented manner. By analyzing the interplay between projection, retrojection, and events as they occur temporally, gaps in design can be recognized and design recommendations can be formulated. Two case studies are presented showing how to use and generate insights from this diagram in both manners (retrospective and future-oriented) supported by a computational-based analysis.
Article
On November 28--29, 2023, Northwestern University hosted a workshop titled "Towards Re-architecting Today's Internet for Survivability" in Evanston, Illinois, US. The goal of the workshop was to bring together a group of national and international experts to sketch and start implementing a transformative research agenda for solving one of our community's most challenging yet important tasks: the re-architecting of tomorrow's Internet for "survivability", ensuring that the network is able to fulfill its mission even in the presence of large-scale catastrophic events. This report provides a necessarily brief overview of two full days of active discussions.
Article
Overload can threaten a software system’s performance and reliability due to resource exhaustion. Multiple or long-running incidents can similarly diminish an engineer’s ability to meet sustained workload demands by exhausting the adaptive capacity of human resources. In this article, we will look at the risk of saturation and coping strategies.
Article
Full-text available
Two trajectories underway transform human systems. Processes of growth/complexification have accelerated as stakeholders seek advantage from advances in connectivity/autonomy/sensing. Surprising empirical patterns also arise—puzzling collapses of critical valued services occur against a background of growth. In parallel, new scientific foundations have arisen from diverse directions explaining the observed anomalies and breakdowns, highlighting basic weaknesses of automata regardless of technology. Conceptual growth provides laws, theorems, and comprehensive theories that encompass the interplay of autonomy/people and complexity/adaptation across scales. One danger for synchronizing the trajectories is conceptual lag as researchers remain stuck in stale frames unable to keep pace with transformative change. Any approach that does not either build on the new conceptual advances—or provide alternative foundations—is no longer credible to match the scale and stakes of modern distributed layered systems and overcome the limits of automata. The paper examines longstanding challenges by contrasting progress then as the trajectories gathered steam, to situation now as change has accelerated.
Article
Tightening regulation of operators’ performance by adding new operation procedures has been one of the typical countermeasures for enhancing the safety of complex industrial systems. However, some recent studies have pointed out that detailed procedures can hinder operators’ resilient performance in response to unexpected situations, which in turn can lead to the increase of safety risks. The purpose of this study is to experimentally examine the effects of regulations by operation procedures on resilient performance. An experimental task simulating fire-fighting command and control was used to compare the performance of two participant groups: “a reference group” provided with the goals to be achieved and action rules as a reference, and “a rule group” with the goals and action rules as a procedure to be followed. The results showed that the rule group demonstrated less resilient performances and had lower outcomes than the reference group in experimental scenarios involving situations in which goal-rules and action-rules were in conflict. The interview results also suggested the superiority of the reference group in terms of attentive monitoring of situations that do not fit the procedures.
Technical Report
This report describes research conducted during 1989 and 1990 on the cognitive characteristics of a corpus of anesthesia critical incidents. The incidents were collected by monitoring and transcribing the regular quality assurance conferences in a large, university anesthesiology department. The 57 reports of incidents were analyzed by constructing protocols which traced the flow of attention and the knowledge activation sequence of the participants. Characteristics of the resulting protocols were used to divide the collection into five categories: acute incidents, going sour incidents, inevitable outcome incidents, airway incidents, and non-incident incidents. Of these, the acute and going sour categories represent distinct forms of incident evolution. The implications of this distinction are discussed in the report. Nearly all of the incidents involve human cognitive performance features. Cognition clearly plays a role in avoiding incidents but also in aborting and recovering from incidents in progress. Moreover, it is clear that subtle variations in cognitive function may play a crucial role in anesthetic disasters, of which incidents are taken to be prototypes. Review of the corpus reveals the different cognitive functions involved in anesthesia and anesthesia incidents. These cover a wide range including classic aspects of cognition, for example the direction of attention, and complex and poorly understood aspects such as situation awareness. The cognitive features include dealing with competing goals, dealing with competing indicators, the limitations of imperfect models, knowledge activation failures, the role of learned procedures and assumptions in reducing cognitive workload, failure to integrate multiple themes, organizational factors, and planning. The presence of these different cognitive features and cognitive failures in a single discipline is significant because it enhances and supports separate findings from other domains (e.g., nuclear power plant operation, commercial aviation) and also because it provides strong support for the contention that operators acting in these semantically complex, time-pressured, high-consequence domains face common problems and adopt similar strategies for dealing with them. The report demonstrates the way in which cognitive analysis of incidents can be accomplished in anesthesia and in other domains and suggests a system for categorizing the results obtained. It also raises questions about the adequacy of evaluations of risk and safety that do not explicitly account for the cognitive aspects of incidents and their evolution. In order to make real progress on safety in domains that depend critically on human operators, it is necessary to examine and assess human cognitive performance, a process which requires large amounts of data and careful reconstruction. Such cognitive analysis is difficult. It requires substantial experience, skill, and effort and depends on acquiring and sifting through large quantities of data. This should not be surprising, since the domain itself is one characterized by experience, skill, effort, and large quantities of data. The challenge for us and for other researchers is to perform more such analyses, to extend and refine the techniques described here, and to link the analyses to those from other domains.
Book
Our fascination with new technologies is based on the assumption that more powerful automation will overcome human limitations and make our systems 'faster, better, cheaper,' resulting in simple, easy tasks for people. But how does new technology and more powerful automation change our work? Research in Cognitive Systems Engineering (CSE) looks at the intersection of people, technology, and work. What it has found is not stories of simplification through more automation, but stories of complexity and adaptation. When work changed through new technology, practitioners had to cope with new complexities and tighter constraints. They adapted their strategies and the artifacts to work around difficulties and accomplish their goals as responsible agents. The surprise was that new powers had transformed work, creating new roles, new decisions, and new vulnerabilities. Ironically, more autonomous machines have created the requirement for more sophisticated forms of coordination across people, and across people and machines, to adapt to new demands and pressures. This book synthesizes these emergent patterns through stories about coordination and mis-coordination, resilience and brittleness, affordance and clumsiness in a variety of settings, from a hospital intensive care unit, to a nuclear power control room, to a space shuttle control center. The stories reveal how new demands make work difficult, how people at work adapt but get trapped by complexity, and how people at a distance from work oversimplify their perceptions of the complexities, squeezing practitioners. The authors explore how CSE observes at the intersection of people, technology, and work, how CSE abstracts patterns behind the surface details and wide variations, and how CSE discovers promising new directions to help people cope with complexities. The stories of CSE show that one key to well-adapted work is the ability to be prepared to be surprised. Are you ready?
Book
Building on the success of the 2007 original, Dekker revises, enhances and expands his view of just culture for this second edition, additionally tackling the key issue of how justice is created inside organizations. The goal remains the same: to create an environment where learning and accountability are fairly and constructively balanced. The First Edition of Sidney Dekker's Just Culture brought accident accountability and criminalization to a broader audience. It made people question, perhaps for the first time, the nature of personal culpability when organizational accidents occur. Having raised this awareness, the author then discovered that while many organizations saw the fairness and value of creating a just culture, they really struggled when it came to developing it: What should they do? How should they and their managers respond to incidents, errors, and failures that happen on their watch? In this Second Edition, Dekker expands his view of just culture, additionally tackling the key issue of how justice is created inside organizations. The new book is structured quite differently. Chapter One asks, 'what is the right thing to do?' - the basic moral question underpinning the issue. Ensuing chapters demonstrate how determining the 'right thing' really depends on one's viewpoint, and that there is not one 'true story' but several. This naturally leads into the key issue of how justice is established inside organizations and the practical efforts needed to sustain it. The following chapters place just culture and criminalization in a societal context. Finally, the author reflects upon why we tend to blame individual people for systemic failures when in fact we bear collective responsibility. The changes to the text allow the author to explain the core elements of a just culture which he delineated so successfully in the First Edition and to explain how his original ideas have evolved. Dekker also introduces new material on ethics and on caring for the 'second victim' (the professional at the centre of the incident). Consequently, we have a natural evolution of the author's ideas. Those familiar with the earlier book and those for whom a just culture is still an aspiration will find much wisdom and practical advice here.
Article
Accident investigation and risk assessment have for decades focused on the human factor, particularly 'human error'. Countless books and papers have been written about how to identify, classify, eliminate, prevent and compensate for it. This bias towards the study of performance failures leads to a neglect of normal or 'error-free' performance and the assumption that, as failures and successes have different origins, there is little to be gained from studying them together. Erik Hollnagel believes this assumption is false and that safety cannot be attained only by eliminating risks and failures. The ETTO Principle looks at the common trait of people at work to adjust what they do to match the conditions – to what has happened, to what happens, and to what may happen. It proposes that this efficiency-thoroughness trade-off (ETTO) – usually sacrificing thoroughness for efficiency – is normal. While in some cases the adjustments may lead to adverse outcomes, these are due to the very same processes that produce successes, rather than to errors and malfunctions. The ETTO Principle removes the need for specialised theories and models of failure and 'human error' and offers a viable basis for effective and just approaches to both reactive and proactive safety management.