
Cooperative Advocacy: An Approach for Integrating Diverse Perspectives in Anomaly Response

Authors:
  • Paychex and Rochester Institute of Technology

Abstract

This paper contrasts cooperative work in two cases of distributed anomaly response, both from space shuttle mission control, to learn about the factors that make anomaly response robust. In one case (STS-76), flight controllers in mission control recognized an anomaly that began during the ascent phase of a space shuttle mission, analyzed the implications of the failure for mission plans, and made adjustments to plans (the flight ended safely). In this case, a Cooperative Advocacy approach facilitated a process in which diverse perspectives were orchestrated to provide broadening and cross-checks that reduced the risk of premature narrowing. In the second case (the Columbia space shuttle accident—STS-107), mission management treated a debris strike during launch as a side issue rather than a safety of flight concern and was unable to recognize the dangers of this event for the flight which ended in tragedy. In this case, broadening and cross-checks were missing due to fragmentation over the groups involved in the anomaly response process. The comparison of these cases points to critical requirements for designing collaboration over multiple groups in anomaly response situations.
Jennifer Watts-Perotti (2) & David D. Woods (1)
(1) The Ohio State University, Columbus, OH, USA (E-mail: woods.2@osu.edu)
(2) Xerox Innovation Group, Rochester, NY, USA
Key words: anomaly response, space missions, mission control, accidents, replanning, premature
narrowing, Columbia accident, cooperative work, cross-checks, crisis management
1. Introduction
Anomaly response is a form of diagnostic work where a fault or disrupting event
produces a cascading set of disturbances in the process under control or plan in
progress (Woods and Hollnagel 2006). Recognizing what is anomalous,
diagnosing what is producing anomalies, analyzing the implications of anomalies,
and modifying plans and reprioritizing goals to maintain control are quite difficult
tasks vulnerable to many generic forms of failure (e.g., premature narrowing). In
most work domains like health care delivery, air traffic management, critical care
medicine and space mission control, anomaly response tasks involve multiple
groups with partial, interdependent and overlapping roles (distributed anomaly
response). Collaboration and interaction over these roles can improve the
robustness of anomaly response or exacerbate the vulnerabilities for anomaly
response tasks to break down.
Computer Supported Cooperative Work © Springer 2009
DOI 10.1007/s10606-008-9085-4
Anomaly response can break down in many ways. Among the patterns identified,
anomalous findings can be discounted or minimized (Klein et al. 2005). The
diagnostic process can become too focused too early on a narrow hypothesis set
or a narrow set of findings to be explained (De Keyser and Woods 1990; Klein
et al. 2006). Practitioners can lose track of the cascade of disturbances within an
anomaly, causing inappropriate or mis-timed responses to the anomaly as it evolves
(Woods and Patterson 2000). Implications of disturbances for the plans in progress
can be missed (Smith et al. 1997). Re-planning can focus on working around
specific impasses and miss side effects on wider sets of constraints (Shattuck and
Woods 2000). Responses can become fragmented and poorly synchronized in time,
leaving gaps in assessments and plans. As a consequence of fragmentation and gaps,
groups can carry out activities that look locally appropriate but that actually
work at cross-purposes when considered from a broader perspective (Klein 2007).
The cognitive functions that occur as part of anomaly response are distributed
across multiple interdependent people and groups in many situations ranging
from process control industries (Woods et al. 1987), offshore oil production (Flin
et al. 1996), military command and control (Shattuck and Woods 2000), crisis
response (Flin and Arbuthnot 2002; Militello et al. 2007), critical care medicine
(Gaba et al. 1987), health care delivery (Patterson et al. 2004) to air traffic
management (Smith et al. 2007). Distributing the functions of anomaly response
over multiple groups can be a means to make anomaly response more reliable
and robust. However, dividing and integrating the different perspectives and
activities creates interactions with vulnerabilities that also can contribute to
breakdowns.
This paper examines two cases of distributed anomaly response, both from
space shuttle mission control, to learn from the contrast in the processes and
interactions across groups. In the first case, mission control addressed an anomaly
that began during ascent (a hydraulics leak), analyzed the implications of the
failure for mission plans, and made adjustments to the plans, concluding with a
safe shuttle landing (STS-76). In the second case, mission management treated a
debris strike during launch as a side issue, and was unable to recognize the
danger of this event for the safety of the flight. The flight ultimately ended in
tragedy when the Columbia space shuttle disintegrated upon entry to the Earth's
atmosphere due to damage to the leading edge of the wing caused by the debris
strike (STS-107).
Aerospace disasters generate independent thorough accident investigations
which provide data to examine how different groups worked different parts of
the mission (Columbia Accident Investigation Board or CAIB 2003). These
investigations provide evidence for others to build on in their own analyses of
the coordination or fragmentation across the different group activities (e.g.,
Starbuck and Farjoun 2005; Woods 2005). Successful anomaly response cases
usually provide less opportunity for investigation and analysis. However, in the
case of STS-76, a research project on anomaly response happened to be
underway with research observers present when the anomaly began during
mission ascent. This fortuitous turn of events led to observations of the mission
control response over multiple days until the shuttle landed safely (Watts-Perotti
and Woods 2007).
These two analyses provide an opportunity to examine the factors in
collaborative activity that make anomaly response more robust despite the
various complexities and challenges that can arise. The Columbia Accident
Investigation highlighted the lack of "rigor" in the analyses that preceded
decisions about the anomaly. Managers were unaware they were making
decisions based on engineering analyses that appeared thorough, but were in
fact of very low rigor (Columbia Accident Investigation Board 2003; Return to
Flight Task Group [RTFTG] 2005). Others involved in the mission were unable
to recognize the weaknesses in apparent answers to safety-critical questions and
therefore the system as a whole was unable to detect and correct weaknesses in
anomaly response for that mission.
The analysis of anomaly response during the STS-76 mission highlighted
several ways that interactions across groups led those involved to broaden their
exploration of the anomaly and its implications, and helped groups find and
resolve weaknesses in their assessments and positions. One pattern that emerged
was termed Cooperative Advocacy, an approach that led to the detection and
resolution of conflicts in assessment, analysis, and re-planning (Watts-Perotti and
Woods 2007). Functionally distinct groups came together regularly during the
anomaly response process in STS-76. As practitioners prepared for and
participated in these coordinative meetings, they acted as advocates for the
systems they monitored, based on their distinct roles and perspectives. As a
result, they could propose constraints on the anomaly response process and on
mission activities which upheld their perspectives, addressed the scope of
authority of their group, and achieved safety within their scope of responsibility.
Through advocacy, differing assessments, conflicting constraints and disagreements
over plan modifications came to light. When conflicts surfaced, the distinct
groups performed additional analysis, and worked together to search for or
innovate alternative ways to satisfy the total set of constraints over all of the
relevant groups' functions and goals. Thus, Cooperative Advocacy is an approach
for coordinating the multiple perspectives to help the groups involved expand the
range of factors considered and recognize potential mis-steps early in replanning
decisions.
This paper develops the concept of Cooperative Advocacy for distributed
teams by juxtaposing the anomaly response process in the STS-76 mission
where a Cooperative Advocacy approach was observed to help the teams detect
and resolve weaknesses and differences in assessments and re-planning
decisions, with the process in the case of STS-107 where the multiple teams
involved failed to identify weaknesses in assessments and failed to re-plan
successfully.
2. Background
2.1. Anomaly response
Anomaly response consists of three interdependent and interwoven cognitive
components (Woods and Hollnagel 2006, chapter 8):
(1) anomaly recognition, in which practitioners identify and update the set of
findings to be explained;
(2) diagnostic search, in which practitioners attempt to develop a coherent
explanation for the findings; and
(3) response management, in which practitioners determine the implications of
the disturbances and their diagnosis for future plans and procedures.
Practitioners monitor some processes that change over time. New events can
occur which disrupt ongoing processes and plans (event-driven situations). When
faults or other abnormal events occur, they produce disturbances in the process
being managed and these disturbances can cascade. Practitioners may need to act
immediately to keep these disturbances from threatening the integrity of core
activities and the goals they serve (e.g., safety, and such actions are referred to as
"safing" actions in space shuttle mission control).
In parallel with safing actions, practitioners conduct diagnostic activities to
determine the source of the disturbances in order to identify the underlying
problem so the source of the disturbances can be eliminated or contained.
Depending on the nature of the trouble, the anomaly has implications for plans in
progress which can lead to shifts in mission tasks and reprioritizing mission
goals. Various contingency plans serve as resources to guide these responses to
the evolving situation and to avoid mistakes. Anomaly response situations can
become quite challenging as cascading effects and changing tempo of operations
place pressure on monitoring, diagnostic search, shorter term corrective actions,
and longer term replanning. Anomaly response activities are further complicated
because operators must handle multiple interleaved tasks, consider multiple
interacting goals, and be ready to revise assessments and plans as new evidence
comes in or as the situation changes.
One particular vulnerability in anomaly response is premature narrowing.
Research findings have highlighted the danger of becoming stuck in one
assessment and being unable to revise the assessment even as situations change
or new evidence comes in (Woods et al. 1987; Johnson et al. 1991). Studies of
hypothesis generation in diagnostic reasoning have found that a possible solution
to this problem is to broaden the set of possible explanations to be considered
(Gettys et al. 1987). Studies of re-planning find that overlooking the side effects
of changes to a plan is a significant risk (Smith et al. 2004). Studies of
professionals in information analysis found that premature narrowing was a basic
vulnerability as analysts moved to new computer-based systems to cope with
massive increases in data availability (e.g., Patterson et al. 2001; Elm et al. 2005).
Follow up studies emphasized the need for broadening mechanisms to expand
the data and possible explanations analysts explored (Elm et al. 2005; Zelik
et al. 2007a).
The dangers of premature narrowing are also evident in studies of
collaboration. Layton et al. (1994) found that certain design characteristics
introduced into collaborative human-machine systems can narrow the range of
data considered and hypotheses explored. Studies of error recovery point to the
need for collaborative cross-checks across the multiple practitioners involved in
managing critical activities (Patterson et al. 2004 and Branlat et al. 2008 in health
care or Fischer and Orasanu 2000, for aviation). Studies of human-human
collaboration find that lack of diversity across participants can also contribute to
premature narrowing that limits problem-solving performance (Hong and Page
2002; Page 2007). Studies of what increases rigor highlight the need for
collaborative interactions to corroborate assessments, to cross-check for weak-
nesses, and to broaden the set of possible explanations (Zelik et al. 2007b).
All processes for anomaly response must, within some time and resource
horizon, funnel-in on key data sources, on the basic unexpected findings
to be explained, on the storyline that explains the unfolding events and evidence
gathered, and on the critical action or plan revision that needs to be undertaken to
accomplish goals in a changing situation. The danger is a premature narrowing
that misses or discounts evidence that would lead to revision. Experienced
practitioners and teams develop broadening checks that they combine with the
normal funneling processes to reduce the risk of premature narrowing or closure.
The collaborative system adjusts the sequence of funneling-in plus broadening or
cross-checks to converge in a timely manner, while remaining sensitive to the
need to revise previous assessments. Using broadening checks is a balance
between the need to be sensitive to the potential for misassessment and the need
to accomplish work within time and resource bounds inherent in evolving
situations with goal pressures. Thus, one part of the value of effective
collaborative interconnections lies in how they can broaden focus, recognize
possible mis-assessments, reduce the risk of missing side effects in re-planning,
and support revision (Woods and Hollnagel 2006; Rudolph et al. 2008).
2.2. Structure and functions of mission control communities
Space shuttle ground support consists of different communities, which have
distinct, but overlapping expertise, resources, goals, responsibilities, and
authority (additional studies of how mission control works include Patterson
and Woods 2001; Garrett and Caldwell 2002; Mark 2002; and Shalin 2005).
Anomaly response in the STS-76 hydraulics leak case was distributed across two
of these communities: Operations and Engineering.
The Operations Community is a multi-level set of teams responsible for
monitoring each of the shuttle subsystems during every mission (e.g. Patterson
et al. 1999). This Community monitors real-time telemetry data from shuttle
subsystems. If controllers encounter anomalies in the telemetry data, they have
the authority to recommend specific procedures and actions for the crew to
perform in response to these situations. Therefore, it is the Operations
Community that becomes directly involved in the dynamics of managing faults
and disturbances during shuttle missions. Members of the Operations Community
detect and diagnose unfamiliar patterns in shuttle telemetry data, recommend
diagnostic and therapeutic actions, and develop primary and contingency plans
for future mission phases or for responding to potential failures in other systems.
The Engineering Community is a separate set of multi-level teams that
coordinate their activities to track and maintain individual shuttle components
across missions. They track performance trends for each component over time,
and decide when the components should be tested, refurbished, replaced, or
redesigned. When an anomaly occurs during a mission, the Engineering
Community determines what actions should be taken to make sure this
behavior does not occur again in subsequent flights. Even before the shuttle
lands, this community initiates efforts to understand the anomalous system
behavior and its implications for future flights, and they begin the turnaround
process of creating post-flight maintenance and design improvements. These
turnaround activities can affect shuttle launch schedule and program
productivity. (In fact, it was turnaround productivity pressures that played a role in
the events leading up to the STS-107 accident; CAIB 2003; Starbuck and
Farjoun 2005.)
In the Columbia accident a management group, the Mission Management
Team, was central to the decision making processes (ironically, the mission
control operations community, meaning the flight director, flight controllers and their
supporting teams, was not a main player in the events related to the debris
strike). The Mission Management Team (MMT), consisting of managers from
engineering, operations and several other organizations including safety, systems
integration, science and others, addresses problems that arise during a mission
that are outside the responsibility and authority of the Launch and Flight
Directors (see CAIB 2003). The MMT convenes 2 days before a launch and
continues until the space shuttle lands safely. Because debris strikes are a basic
risk to the integrity and safety of the space shuttle, whenever one is detected,
NASA guidelines call for initiating a Debris Assessment Team to analyze the
consequences and risks of the strike. In the case of Columbia (STS-107), the
first formal meeting of this team occurred 5 days into the mission.
Each anomaly (a foam strike on launch, foam being one kind of debris that can
hit and damage structures on the orbiter, and a hydraulics leak in an auxiliary
power unit during ascent) presented itself to the groups handling shuttle missions
differently, triggering different interactions initially and different interactions over
the time extent of each mission. In STS-76 both the operations teams and
engineering teams monitoring shuttle systems during ascent noticed the anomaly
in telemetry data, and the anomaly had implications for the different goals and
responsibilities of each group. The subsequent interactions between the two
groups were central to how the anomaly was handled.
The group interactions throughout this case are available because the authors
were involved in a study of anomaly response in cooperation with NASA at the
time of the mission (Watts-Perotti and Woods 2007). An observer from the
research team (first author) was present and shadowed the front and back rooms
of the Mechanical, Maintenance, Arm and Crew Systems (MMACS) flight
control team during the entire STS-76 mission. This operations team monitors the
shuttle's mechanical systems, which included the leaky hydraulics system. The
study was based on 70 h of observations documented in field notes and
transcribed tape recordings. Observations included shift handovers, discussions
within the MMACS team, diagnostic analyses, voice loop conversations (see
Patterson et al. 1999 for a description of the voice loops and their functions),
conversations between the MMACS team and other flight control teams, and
coordinative meetings between operations and engineering. Follow up interviews
with participants were conducted to supplement observations.
In STS-107 the debris (foam) strike was recognized on Flight Day 2 when the
Intercenter Photo Working Group received high resolution film of the launch.
This group's report went to the MMT and to other groups who formed a Debris
Assessment Team per standard practice. The sequence of events, group
interactions, decisions, and context were established by the Columbia Accident
Investigation Board (Columbia Accident Investigation Board 2003). As a
consultant to this board on the decision making processes prior to launch and
post-launch, the second author examined a variety of source material and
participated in discussions with the accident investigation team as the analysis
was completed (see Woods 2005).
Thus, the information about collaborative processes over groups in the two
cases is available through different means: a thorough detailed retrospective
analysis from the CAIB for one case and direct observation in the other case.
However, the detail and quality of information actually is quite similar because
NASA captures a great deal of the deliberations that go on during and leading up
to shuttle missions which could be used to develop detailed protocols of across
group interactions (e.g., logs, transcripts, electronic memos). For example, the
CAIB had transcripts of all of the MMT meetings available for detailed analysis,
and the Watts-Perotti and Woods (2007) study of STS-76 included data from
after-the-fact interviews with many of the flight control team (7 out of the
8 MMACS group involved). In these after-the-fact interviews, mission
documentation was used as cues to ground flight controllers' descriptions of
perceptions, intent, and activities as they and others determined how to adapt
given the anomaly (cued retrospective technique).
2.3. Summary of the response to the STS-76 shuttle anomaly
The purpose of the STS-76 shuttle mission was to transport an American
astronaut to the Russian MIR Space Station. Just after liftoff, the Mechanical,
Maintenance, Arm and Crew Systems (MMACS) Flight Control Team noticed that the
quantity value of one of three hydraulics systems began to decrease (the finding
to be explained). The team recognized this pattern in their telemetry data as a
classic leak signature (initial diagnosis) and proceeded to investigate the
characteristics of the leak.
The team began to enact the procedures that had been created to respond to a
hydraulic leak. First, they quickly calculated the slope of the line in the telemetry
plot and determined that the leak rate was not great enough to call a mission abort
(safing responses). Next, they checked the values of the other two hydraulics
systems for an increase in their quantities to determine whether system three
was leaking into the other two systems. There was no increase, so they assumed
the leak was external, and that there was no intersystem leakage. Given these
initial assessments they decided the mission did not need to be aborted, and they
allowed the crew to continue their ascent into orbit and cut off the main engines
(additional safing response).
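The two checks described above (estimating the leak rate from the slope of the quantity plot, then looking for a matching rise in the other two systems) can be sketched in a few lines. This is a minimal illustration only: the function names, the sample readings, and the `abort_rate` and `gain_rate` thresholds are all hypothetical and are not taken from flight rules or from the mission data.

```python
def slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def assess_leak(times, leaking, others, abort_rate=-0.5, gain_rate=0.01):
    """Classify a leak from quantity telemetry.

    abort_rate and gain_rate are hypothetical thresholds (quantity units
    per second), chosen only to make the example concrete.
    """
    rate = slope(times, leaking)
    # A system whose quantity is rising may be receiving the leaked fluid.
    gaining = [name for name, q in others.items() if slope(times, q) > gain_rate]
    return {
        "leak_rate": rate,
        "abort": rate <= abort_rate,
        "leak_type": "intersystem" if gaining else "external",
    }


# Hypothetical readings: system three drops steadily, the others stay flat.
times = [0, 10, 20, 30]
result = assess_leak(
    times,
    leaking=[100, 99, 98, 97],
    others={"system 1": [100, 100, 100, 100], "system 2": [100, 100, 100, 100]},
)
print(result)
```

With these made-up readings the sketch reports a slow external leak that stays below the abort threshold, which mirrors the shape of the team's initial assessment but says nothing about the actual values involved.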
Just after main engine cutoff, the MMACS Flight Control Team performed the
diagnostic intervention of asking the crew to close a set of isolation valves to
determine whether they could isolate the leak. When the valves were closed, the
quantity telemetry signature of system three flatlined. For a moment, it looked as
if the intervention was successful and that the leak had been isolated. However,
the quantity soon began to drop again. Therefore, the team determined that the
leak was not isolatable. When the orbit phase began, practitioners shut down the
leaky hydraulic system and began to further characterize the anomaly and
investigate its implications for the remainder of the mission (see Watts-Perotti and
Woods 2007 for more details about the mission).
Orbit: Description of the coordinative anomaly response process. When the
shuttle reaches orbit, the pace of activities for Flight Controllers slows, which
provides opportunities to investigate anomalies that have occurred during ascent. It is
at this stage of the mission when the cognitive demands of anomaly response and the
demands for coordination across teams escalate (Woods and Patterson 2000). After
the leaky hydraulic system was shut down, practitioners were primarily engaged in
contingency evaluation and replanning processes, given that an explanation for the
anomaly (a leak in the hydraulics system) seemed clear-cut. These replanning
activities focused on several issues that had to be resolved as a result of the anomaly:
the impact of the anomaly on 1) the mission schedule, on 2) preparation for re-entry
and on 3) the actual re-entry phase of the mission.
In the process of resolving these issues, the engineering and operations
communities' scopes broadened, and their activities began to overlap. As a result,
these communities began to interact and coordinate their response to the anomaly.
Throughout the orbit phase of the mission, the anomaly response activities
occurring across these functionally distinct communities were anchored around a
series of formal coordinative meetings. By observing the interactions across these
communities, Watts-Perotti and Woods (2007) found that their distinct perspec-
tives gave rise to different assessments and viewpoints which were compared and
coordinated throughout the anomaly response process. These distinct perspectives
often led the communities to prefer different approaches, which in contrast with
each other, represented different sides of several interesting tradeoffs. Here, we
walk through one set of trade-offs that arose (Watts-Perotti and Woods 2007
presents a complete description of the series of coordinative meetings and other
trade-offs).
Determining whether to use the leaking system during descent. One of the issues
that needed to be resolved during the STS-76 mission was whether the leaky
hydraulics system should be used during the descent phase of the mission.
Originally, both the Operations and Engineering communities agreed that the leaky
system should not be used for descent. In preparing for a coordinative meeting to
discuss this issue, the Operations Teams reviewed data from a previous mission
during which a leak occurred. In this review, they discovered flight rules that led
them to change their entry plan stance, and propose that the leaky system should be
used for descent. Note that the diagnosis of the anomaly remained the same; it was
the replanning efforts that the Teams were working to resolve.
At the beginning of the meeting, the Operations teams presented their new
entry plan. This new plan surprised the Engineering teams, whose initial reaction
was negative. The plan triggered a new idea from one of the engineers. He
proposed that a high-pressure leak could damage some sensitive areas of the
shuttle. Since the exact location of the leak was not known, the leaky hydraulic
system should not be used. The following sample is taken from handwritten notes
recorded during the meeting (see Watts-Perotti and Woods 2007):
Engineering:
You're taking a major risk by doing norm press (normal pressure mode on
entry). Our previous analysis (during the other mission that sustained a leak)
was only considering flammability, not pressure, etc. We don't know where the
leak is. It could shoot 3000 PSI at something and damage it.
A crew representative (a third perspective) also introduced another new idea: the
possibility that leaking hydraulic fluid could pose a fire hazard if it touched hot
shuttle elements. Neither of these potential complications had been discussed
publicly within or across the sets of teams before the Operations teams introduced
their new entry plans. Based on these new hypotheses, the two communities agreed
to analyze the hypotheses independently and reconvene to resolve the issue. A later
coordinative meeting led to the resolution. The new analyses confirmed the risks of
running the leaky system and all parties agreed to not use the system during descent.
These examples show how discussions between the functionally distinct
groups gave rise to new ideas about potential consequences which challenged and
ultimately overturned the working hypothesis. These factors were not noted by
any one group in their own analysis process, but emerged from interactions
across multiple perspectives represented by the different groups.
The Cooperative Advocacy Process. Watts-Perotti and Woods (2007) describe the
processes observed during the STS-76 mission as Cooperative Advocacy, which
allowed practitioners to reveal and resolve conflicts in assessment, analysis, and
re-planning. During the coordinative meetings, practitioners acted as advocates
for the shuttle systems and mission goals that fell within their primary
responsibility. Functionally distinct teams represented different perspectives
with different goals, tools, data, and experience. Each group proposed
constraints on mission activities, and on the anomaly response process, which
upheld their role and scope of responsibility relative to the whole mission and to
the larger Shuttle program. The term "advocacy" in the Cooperative Advocacy
process serves as a pointer to this aspect of their interactions.
Often, the constraints identified by different groups conflicted. For example,
members of the Engineering Teams proposed the constraint that an Auxiliary
Power Unit should be run for Flight Control Systems checkout. However, the
Flight Controllers responsible for the leaky hydraulics system had the goal of
minimizing activity that directly affected any of the hydraulics systems, therefore
protecting system redundancy. To meet this goal they proposed the constraint that
the Auxiliary Power Unit should not be run during the checkout task. During the
debate, the flight control team introduced an alternative:
Operations:
What about running a 303 procedure?
(other meeting attendees seemed as if they had not considered this option
before)
Crew Rep:
Well, a 303 is not normally done. We've tried it in simulations and they
usually run out of time. The crew doesn't really want to do this.
Engineering:
We could say do circ pump ops and then the 303 actuator check, but you might
not get it, and even then, you're not getting everything you would get if you
run an APU for FCS checkout.
In this example, the Engineering community advocated for their mission goals by
proposing a plan that would maximize the information gain related to their goal of
fixing the shuttle when it returned and/or redesigning the shuttle for future flights.
The Operations community advocated for their mission goals of keeping the shuttle
as safe as possible by running an alternative procedure which minimized the risk of
further damage to the leaky system. The crew representative advocated for the crew,
attempting to keep them out of a situation in which they would have to run an
unfamiliar procedure that they might not be able to finish in the time available. This
interaction across perspectives is another example of how disagreements between the
distinct communities led to the consideration of alternative plans, which might not
have been considered if groups worked in isolation.
In the Cooperative Advocacy approach, the word "cooperative" refers to the
higher-level goals shared by all the groups, teams, and individuals. Ultimately, all
practitioners were interested in bringing the shuttle back to Earth safely and in
keeping it running properly across missions. If one community disagreed with a
plan presented by another community, and could present a clear case for why this
plan did not satisfy important constraints, the communities coordinated their efforts
to find alternative ways to satisfy those important constraints. The debate over the
checkout task is an example where communities continued to explore alternative
plans until they could find a means to satisfy the safety constraints of the flight
control team and also meet the information needs of the Engineering community.
2.4. Summary of the response to the STS-107 shuttle anomaly
STS-107 was a 16-day mission, during which approximately 80 international physical,
life, and space science experiments were conducted
(http://www.nasa.gov/columbia/mission/index.html). Upon reentering the atmosphere on February 1, 2003, the
Columbia orbiter suffered a catastrophic failure due to a breach that occurred during
launch when falling foam from the External Tank struck the Reinforced Carbon-Carbon panels
on the underside of the left wing. The orbiter and its seven crewmembers were lost
approximately 15 min before Columbia was scheduled to touch down at Kennedy
Space Center (http://history.nasa.gov/columbia/Introduction.html).
The foam strike that led to the shuttle's tragic end was detected during the
launch phase of the mission (via standard means for monitoring shuttle launches
by the Intercenter Photo Working Group). The report and data on the foam strike
were sent to the Mission Management Team, which became key to the handling of the
event. Because of limited resolution and views of the area where the debris strike
occurred, the initial report could not determine if the orbiter had sustained
damage. The chair of the Intercenter Photo Working Group anticipated that
further analysis of the debris strike would require additional information and
placed a request to have the Department of Defense (DoD) obtain a high
resolution image of the Orbiter on-orbit.
The initial report triggered a variety of groups to begin to look at the available data
to analyze the result of the impact (e.g. Mission Evaluation Room, United Space
Alliance contractor groups) and, per standard NASA guidelines, this led to the
assembly of a Debris Assessment Team to analyze the anomaly post-launch, before
the descent phase of the mission (though their first meeting occurred on Flight Day 6).
Very early assessments of the foam strike led to a stance in the Mission
Management Team and among other mission managers that further analysis was
not critical to the mission (CAIB, chapter 6). For example, in the MMT meeting
on Flight Day 9, the following is the only discussion of the event: "And we sent
up to the crew about 16-s video clip of the strike just so they are armed if they get
any questions in the press conferences or that sort of thing. We made it very clear
to them, no concerns." (Mission Management Team transcript of 01-24-2003). In
general the transcripts show the MMT meetings included only brief discussions
regarding the foam strike. On Flight Day 2 a memo stated, "Basically, the RCC
[reinforced carbon-carbon, shorthand for the wing leading edge structure] is
extremely resilient to impact type damage." The memo went on to state that the
debris "didn't look big enough" to pose any serious threat, did not have enough
energy to create significant damage, tiles in the leading edge area are thicker
than required, most likely received only shallow damage, and there is a single
mission safe re-entry plan available (CAIB, p. 141). All of these points were used
to justify the pre-existing belief the debris strike was not a safety issue requiring
significant follow through. None of these points were derived from evidence or
analysis. All of the rationalizations turned out to be incorrect: the confidence in
the strength of the leading edge structures was unjustified, the debris strike was
large (speed and mass), the energy of the strike was far outside the envelope for
the analytic tools available to predict damage, and there was insufficient data
available to be confident in any assessment, with the exception that the debris
did strike the leading edge structure (the underside of RCC8). The initial request
to obtain further images from DoD assets and two subsequent requests to obtain
the imagery were denied by mission management on the grounds that tile damage should be
considered a turn-around maintenance concern and not a safety-of-flight issue,
and that imagery of Columbia's left wing was therefore not necessary (CAIB, p. 151).
Management's stance toward foam strikes had developed earlier in the history
of shuttle missions. First, foam strikes were re-classified from in-flight anomaly
status to maintenance and turn around issues (STS-113 Flight Readiness Review,
CAIB, p. 125-126). Accompanying this re-classification was a general shift to
see foam loss as an accepted risk or even, as one pre-launch briefing put it, "not a
safety of flight issue" (CAIB, p. 126 1st column to top of 2nd column). In
addition, the fact that the shuttle underwent previous debris strikes without
consequence may have contributed to a false confidence that influenced the
Mission Management Team's initial assessment that foam strikes pose little risk
to orbiter safety (see Woods 2005 for an in-depth discussion of the organizational
drift toward failure).
On the other hand, working engineers thought damage could have occurred and,
more importantly, recognized that they did not have sufficient evidence and tools to
complete a high confidence assessment of potential damage and its consequences.
For example, an ad hoc team working on Flight Days 3 and 4 (the weekend
following launch) developed an estimate of the size and speed of the debris strike
that showed the strike to be a serious threat with a total energy hundreds of times
larger than the assumptions built into the standard tools for analyzing debris strikes
on tiles (and these estimates turned out to be accurate). Overall, the CAIB report
(chapter 6) found at least 8 opportunities were missed where actions could have led
to the recognition that the orbiter had suffered serious damage, and it goes into detail
about how each of these opportunities arose and the specific factors that blocked or
sidelined each opportunity to understand the damage and its implications.
Eventually, in response to the STS-107 foam strike during launch (per standard
practices), a Debris Assessment Team formed and conducted several technical
analyses of the anomaly, including modeling of the strike with a simulation
modeling tool (Crater). This modeling tool was limited in its ability to provide a
clear understanding of the risks associated with the foam strike because the
STS-107 strike was hundreds of times the scale of what the model is designed to
handle (email in CAIB, p. 151-152).
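The concern that an event may fall outside what a predictive tool was designed to handle can be expressed as a simple envelope check. The following is a minimal sketch under stated assumptions, not a description of Crater or of any NASA practice: the `ValidatedLimit` type, the `envelope_check` function, and the numbers are all hypothetical and serve only to illustrate an input that sits hundreds of times beyond a validated scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedLimit:
    """Largest input magnitude a predictive tool was validated against (hypothetical)."""
    name: str
    max_validated: float  # must be positive


def envelope_check(value: float, limit: ValidatedLimit) -> dict:
    """Report whether `value` is in-family and how it compares to the validated limit."""
    if limit.max_validated <= 0:
        raise ValueError("max_validated must be positive")
    factor = value / limit.max_validated
    return {
        "input": limit.name,
        "in_family": factor <= 1.0,      # inside the validated range
        "exceedance_factor": factor,     # > 1.0 means extrapolation
    }


# Illustrative numbers only: an input 400 times the validated scale.
limit = ValidatedLimit(name="debris size (arbitrary units)", max_validated=1.0)
report = envelope_check(400.0, limit)
print(report["in_family"], report["exceedance_factor"])
```

A result flagged as out-of-family does not say what the damage is; it only signals that the tool's output is an extrapolation and that additional analysis and expertise are warranted.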
The Debris Assessment Team also researched the history of debris strikes in
past shuttle missions to determine whether events like the STS-107 strike had
occurred before: is the size of the debris strike "out-of-family" or "in-family"
given past experience? While the team looked at past experience, they were
unable to get a consistent or informative read on how past events indicated risk
for this event (Woods 2005). This team did, however, encounter several pieces of
evidence that the strike posed a risk of serious damage. For example, the foam
debris in STS-107 was 600 times larger than previously analyzed ice debris
(CAIB, p. 145), and models predicted that tile damage on STS-107 was deeper
than the tile thickness (CAIB p. 143). However, the team apparently discounted
these pieces of evidence and moved on to consider other possible damage
scenarios that would pose risk to safety during re-entry (e.g., focusing on the risk
that landing gear door seals were damaged).
As the Debris Assessment Team worked at the margins of available knowledge
and data (a significant cue in itself), their partial assessments did not benefit "from
cross-checks through interactions with other technical groups with different
backgrounds and assumptions" (Woods 2005). The CAIB did not find any report
of a technical review process that accompanied the Debris Assessment Team's work.
Overall, the Debris Assessment Team was unable to integrate partial and uncertain
data to generate a big picture, i.e., the situation was outside the understood risk
boundaries and carried significant uncertainties (Woods 2005). Recognizing that the
situation at hand falls outside the bounds of previous experience and models and
carries a high level of uncertainty should serve as a major beacon signaling the need
for much greater investment in more rigorous analysis.
Woods (2005) also notes that information about the strike was scattered across
time, and across disconnected groups of people.
What is striking is how there was a fragmented view of what was known about
the strike and its potential implications over time, people and groups. There
was no place, artifact, or person who had a complete and coherent view of the
analysis of the foam strike event (note a coherent view includes understanding
the gaps and uncertainties in the data or analysis to that point).
The Columbia Accident Investigation Board analysis noted that the team
who was charged to analyze the anomaly was unable to generate a clear
picture of the anomaly and its associated risks. This lack of clarity, combined
with the Mission Management Team's stance that foam strikes are merely
turn-around issues, created the effect of an "anomaly in limbo" (Woods 2005).
The strike was not dismissed completely, yet the organization as a whole was unable to get
traction on the event as an in-flight anomaly with safety implications to be
thoroughly analyzed from different perspectives including contingency
analyses. Woods notes that this kind of situation, where understanding an
anomaly becomes stuck in an incomplete state unable to move forward,
usually emerges at the boundaries of different organizations that do not have
the appropriate mechanisms to facilitate constructive interplay, integrate partial
information, and recognize conflicts.
In interacting with the groups attempting to analyze the nature and
implications of the debris strike, the Mission Management Team in
STS-107 created an atmosphere where the analysts needed to demonstrate that the
foam strike was an issue in order to generate resources (e.g., get on-orbit
images) and to lay a claim on the time of mission management to consider
risks and contingencies. In other words, the analytic burden was to show the
foam strike as an ill-understood anomaly which requires further assessment,
contingency evaluation, and re-planning. The norm in effective anomaly
response should be the opposite: all discrepancies are anomalies until
analysis, appropriately scaled for rigor, shows that the anomaly in question
requires minimal or no modifications to plans or contingencies (Woods 2005;
Zelik et al. 2007b). The stance downplaying the significance of the debris
strike emerged before results were obtained from any technical analyses,
therefore decreasing the chances that the shuttle communities as a whole
would follow a rigorous engineering analysis process, where data evaluation
guides conclusions.
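The norm described above, in which every discrepancy is an anomaly until analysis shows otherwise, can be sketched as a record whose default state is open and which can be closed only with a documented analysis. This is an illustrative sketch, not a system used in either mission: `Discrepancy`, `Status`, `close`, and `reopen` are hypothetical names, and the `reopen` method is an added assumption that closed issues may later need to be re-examined.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    OPEN_ANOMALY = "open anomaly"
    CLOSED_BY_ANALYSIS = "closed by analysis"


@dataclass
class Discrepancy:
    """A reported discrepancy; the default stance is that it is an open anomaly."""
    description: str
    status: Status = Status.OPEN_ANOMALY
    analyses: List[str] = field(default_factory=list)

    def close(self, analysis_summary: str) -> None:
        """Close the item only when a non-empty analysis summary is supplied."""
        if not analysis_summary.strip():
            raise ValueError("closing requires a documented analysis")
        self.analyses.append(analysis_summary)
        self.status = Status.CLOSED_BY_ANALYSIS

    def reopen(self) -> None:
        """Return the item to open status when new data or results arrive."""
        self.status = Status.OPEN_ANOMALY
```

The design choice is that the burden of proof sits on closing an item rather than on opening it: there is no way to mark a discrepancy as a non-issue without attaching the analysis that justifies doing so.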
The STS-107 case, in which the anomaly response process was fragmented
across disconnected groups working in isolation, contrasts dramatically with the
anomaly response process that occurred for STS-76, where functionally distinct
teams used a Cooperative Advocacy approach to coordinate their efforts in
assessing and responding to the anomaly.
3. Contrasting distributed anomaly response in STS-76 to STS-107
In comparing STS-76 with STS-107, one of the distinct differences between the
cases is the degree to which multiple diverse perspectives became involved in the
anomaly response process. During STS-76, functionally distinct teams collaborated
in a coherent, flowing response process. The diversity in the teams'
perspectives and approaches provided opportunities to catch mistakes and
broaden the set of alternative hypotheses (Watts-Perotti and Woods 2007). The
anomaly response process in STS-107 was fragmented across time and across
groups who did not communicate with each other, and who did not bring in broad
sets of expertise during decision making processes.
While both the STS-76 mission and the STS-107 mission were required to
respond to an anomaly that occurred early in a shuttle mission, the ways in which
the anomaly response process was approached were quite different. This section
discusses the differences in the anomaly response processes between the two
cases.
While one was a success and the other a failure in terms of outcome, it is
important to note that a variety of small missteps occurred during the anomaly
response process in both cases. To gain leads about how collaboration over
groups can make anomaly response more robust, the contrast focuses on how
missteps were detected and resolved in one case, while opportunities to correct
missteps were not created or were missed in the other case. Good process cannot
guarantee good outcomes, but some processes guard against complexities and
vulnerabilities that can lead to failures in anomaly response (Woods et al., in press).
Differences in the default anomaly response structure. One of the ways in which
the two cases differed is that the hydraulics leak anomaly on STS-76 occurred in
a shuttle system that was monitored by default by two functionally distinct
communities. Both the Engineering and the Operations communities always
monitor the hydraulic systems during ascent because they both have
responsibility for different aspects of the system. Therefore, when the anomaly
occurred, both of the functionally distinct communities were simultaneously
collecting data about the leak, and initiating parallel anomaly response processes.
In this case, the anomaly response process was structured by default in a way that
easily facilitated the possibility of broadening the set of possible assessments and
responses. By contrast, since foam strikes had been previously re-classified as a turn-around
issue, and not critical to orbiter safety, the operations community did not own
responsibility for the foam strike in STS-107. Therefore, they were not engaged
in the analysis of the strike.
Differences in initial stances. Another difference between the two cases is the
differing stances toward the anomalies adopted by the Mission Management
Team. In STS-76, the Mission Management Team assumed the hydraulic leak
was an issue that could have serious consequences for the mission. In the case of
a hydraulic leak there was a presumption that the mission might need to end early
and analysis was required to understand the anomaly, assess the consequences,
and estimate uncertainties as a basis for decisions about modifying mission
duration and planned accomplishments. This early stance in management
provided the framework to facilitate a thorough, evidence-driven analysis of the
leak behavior and its implications for the rest of the shuttle mission. The
management stance adopted in STS-107 was quite different as they assumed the
foam strike was of low risk to the orbiter safety so that the assessment team had
to generate evidence and analysis sufficient to change management's mindset.
For anomaly response to be robust given all of the difficulties and
complexities that can arise (Woods and Hollnagel 2006), all discrepancies
must be treated as if they are anomalies to be wrestled with until their
implications are understood (including the implications of being uncertain or
placed in a difficult trade-off position). This stance is a kind of readiness to
re-frame and is a basic building block for other aspects of good process in
anomaly response (Klein et al. 2005, 2006). Maintaining this as a group norm
is very difficult because following up on discrepancies consumes resources of
time, workload, and expertise. Inevitably, following up on a discrepancy will be
seen as low priority for these resources when a group or organization operates
under severe workload constraints and under increasing pressure to be "faster,
better, cheaper" (Woods 2006).
Differences in handling inconsistent evidence. How teams reacted to data that did
not support their evolving assessments of the anomalies was different in the two
cases. In STS-107, uncertainties and inconsistencies did not stimulate further analysis
and inquiry to resolve the issues. To do so requires processes that search out and bring
additional expertise and alternative perspectives to bear. In STS-107, conclusions
drove or limited the need for analysis, rather than investigations building the evidence
from which one then would (re-)evaluate risks, identify contingencies, and draw
conclusions (i.e., foam strikes were not safety of flight issues until shown otherwise).
However, in STS-76, the stance of management and the assessment teams was that
the anomaly could pose serious implications for the mission, and that the assessment
teams should determine what kinds of safing actions and responses were required to
maintain the safety of the orbiter as well as take into account other objectives.
There were several occasions when the STS-76 teams encountered evidence
that was not consistent with their assessments at the time. For example, at one
point, the operations community suggested that the leaky system should be used
during the descent phase of the mission. This suggestion was not consistent with
the Engineering community's assessment that the leaky system should not be
used at all. When the Operations community presented this new suggestion, the
engineering team spontaneously came up with several reasons why this
suggestion was risky. Because the opposing assessments were held by
functionally distinct groups, the communities found they had to test each
assessment, and create coherent arguments for why their current assessments
should, or should not be revised, rather than discounting inconsistent evidence
as was done in STS-107. Ultimately, the functionally distinct communities agreed
to choose the conservative approach of not using the leaky system, but this
decision was more robust because more alternatives had been considered and
tested in the process of making the decision.
Differences in assessment team structures. The anomaly response process in
STS-76 began with the participation of functionally distinct teams by default. Not only
were these distinct communities fully engaged in the anomaly response process from
the beginning, but they continued to bump into each other and stay connected to each
other's evolving assessments and response plans. It was through these connections
with each other that the Cooperative Advocacy process emerged. Ironically, the
venue for these iterative connections between the communities was a set of meetings
called by the Mission Management Team, which presented the directive that the
communities should meet with each other to resolve the conflicting assessments that
surfaced in these meetings (Watts-Perotti and Woods 2007).
On the other hand, the anomaly response process in STS-107 was fragmented and
there was a lack of constructive interplay between the Mission Management Team
and the Debris Assessment Team and other groups, due to the original assessment of
low risk and the inability of the Debris Assessment Team to form a clear picture of
the risk based on the modeling and past experiences they could draw on.
Differences in coherence. Due to the iterative connections between the distinct
communities in STS-76, combined with the stance that the leak could pose
serious implications for the mission, a coherent picture of the hydraulic leak
became distributed across all of the communities involved. While the Cooperative
Advocacy approach led the operations and engineering communities to consider
broad sets of alternatives, this approach also ensured that each community had a
more coherent understanding of the anomaly, because they shared the process of
testing the broad set of alternative plans and hypotheses that emerged. This case
sharply contrasts with STS-107 where what was known about the foam strike and
its potential implications was fragmented over people and groups, and over time.
Together the cases illustrate the need for a coherent view observable and
trackable by all relevant groups, which would be especially important for
tracking the status of issues in progress. This need also relates to issues that
appear to be resolved, since in these and other cases of anomaly response, some
apparently closed issues reappear and need to be reopened and reexamined as
situations change and new analytic results come in.
Differences in the ability to determine the need for more investigation. Another
interesting difference between the two cases is in the ability of the decision
makers to determine whether and where they needed to make more efforts and
draw in more expertise. Woods (2005) notes that the decision makers in STS-107
did not seem able to notice when they needed more expertise, data, and analysis
in order to have a proper evaluation of an issue. However, the decision makers
in STS-76 had built-in cues about when more investigation was appropriate.
These cues included conflicting assessments, conflicting stances, and conflicting
proposals. For example, when the operations and engineering communities
presented their early assessments of the leak behavior to the Mission
Management Team, they learned that they had conflicting interpretations of the
leak. These conflicting interpretations served as a cue to all that further
investigation was needed to determine what actually happened with respect to
the leak during the ascent phase of the mission.
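The idea that conflicting assessments act as a built-in cue for further investigation can be sketched as a check over the stances that each group reports on each issue. The sketch below is illustrative only: the `find_conflicts` function, the tuple layout, and the sample stances (including which community held which view of the leak rate, and the second issue) are hypothetical and are not taken from the mission record.

```python
from collections import defaultdict


def find_conflicts(assessments):
    """Return the issues on which teams hold differing stances.

    `assessments` is an iterable of (team, issue, stance) tuples.
    The result maps each contested issue to a {team: stance} dict.
    """
    stances_by_issue = defaultdict(dict)
    for team, issue, stance in assessments:
        stances_by_issue[issue][team] = stance
    return {
        issue: dict(stances)
        for issue, stances in stances_by_issue.items()
        if len(set(stances.values())) > 1  # more than one distinct stance
    }


# Hypothetical stances, for illustration only.
sample = [
    ("Operations", "leak rate", "increasing"),
    ("Engineering", "leak rate", "stable"),
    ("Operations", "leak location", "system 3"),
    ("Engineering", "leak location", "system 3"),
]
cues = find_conflicts(sample)
print(cues)
```

Each contested issue returned by such a check is a candidate for additional data gathering and a joint review. Issues on which the groups agree are not thereby shown to be correct; they simply generate no cue of this kind.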
4. Discussion: strengths of the cooperative advocacy approach
In STS-76, where a Cooperative Advocacy approach emerged, practitioners
developed broad sets of possible hypotheses and replanning alternatives
(Watts-Perotti and Woods 2007). These practitioners did not encounter breakdowns
like fixation in assessment (De Keyser and Woods 1990; Rudolph et al. 2008)
and the inability to detect side effects in re-planning (Klein 2007; Woods and
Hollnagel 2006). Several characteristics of the interactions across the teams
seemed to provide the opportunity to broaden the set of alternatives considered.
Intermixing of Parallel Activities. When the hydraulic leak occurred, the
Operations and Engineering communities began anomaly analysis and
response independently, in parallel (Watts-Perotti and Woods 2007). The
Operations community determined how the anomaly would affect the rest of
the flight, as the Engineering community determined what would need to
happen after the shuttle landed. Since the groups conducted their activities
toward distinct goals in parallel, they had the opportunity to form distinct and
possibly divergent ideas about the anomalous situation (Watts-Perotti and
Woods 2007). For example, the two communities used different methods for
analyzing the leak rate, which led to conflicting stances about whether the leak
rate increased or remained stable.
As the mission progressed, both communities began to merge or align their
perspectives through the coordinative meetings (Watts-Perotti and Woods 2007).
They independently built stances toward a set of issues, and then integrated or
debated these stances in the meetings. This process allowed the communities to
mix their distinct ideas to produce a broad set of alternative hypotheses and
plans for future procedures (Watts-Perotti and Woods 2007).
Revising Assessments and Re-planning Approaches. Conflicting assessments
across the communities in STS-76 led to self monitoring and deeper data review.
When teams knew their stance conflicted with other teams' positions, they
collected more data to support their view in preparation for upcoming
coordinative meetings. These activities not only helped teams argue their
position during meetings, but also served as an opportunity for teams to
re-examine their own assessments, assumptions, and positions (Watts-Perotti and
Woods 2007). In the STS-76 case, the conflicting stances about whether the leak
rate increased led to further analysis of the leak. This refined analysis allowed the
Operations community to detect that their earlier analysis was incorrect, leading
them to revise their assessment of the leak rate. In the STS-107 mission, weak
and erroneous analyses were accepted as if they had a much more rigorous basis
in data (CAIB 2003; Zelik et al. 2007b). The presence of distinct groups can
therefore provide an opportunity for cross-checks and revision of assessments
(Patterson et al. 2004; Hong and Page 2002; Page 2007).
Explicit Representation of Tradeoffs. The distinct backgrounds, expertise, and
authority of the Operations and Engineering communities led them to advocate
different sides of several complex tradeoffs during re-planning for STS-76. These
tradeoffs arose from conflicting stances that emerged across the communities.
By advocating opposing stances, the functionally distinct communities gained
the opportunity to examine the tradeoffs explicitly, which led the communities to
consider a broader region of the total solution space and to generate better ways
to balance all of the relevant constraints and side effects (Watts-Perotti and
Woods 2007).
Avoiding Premature Closure. Cooperative Advocacy is our label for a process
that emerged over groups in STS-76 and was absent in STS-107. As a result of a
process of Cooperative Advocacy in STS-76, opportunities for cross-checks and
broadening checks arose and resulted in error correction, repair of common
ground, and identification of additional alternatives and contingencies. In the case
of STS-107, interactions across groups were fragmented, and neither cross-checks
nor broadening checks were identified in the processes leading up to or during the
Columbia disaster (CAIB 2003).
More broadly, the differences in process during the two cases speak to factors
that facilitate or hinder an ongoing process of deciding whether to invest, gather,
or commit more resources (time, expertise, economic, social, and physical
resources) to the analysis and re-planning process (see Zelik et al. 2007a, b).
Cooperative Advocacy implicitly facilitated decisions about when to increase resource investment
to more fully understand the situation and more fully explore contingencies in
replanning. In the Columbia case, there was no implicit or explicit process to
facilitate escalating assessment resources, and the result was a rather dramatic
premature closure in analysis of the debris strike and its implications.
Avoiding premature closure during anomaly response requires treating all data
discrepancies as anomalies until analysis adequately characterizes the situation
and its implications for plans in progress. The key question is what is adequate
analysis, or what NASA and Zelik et al. (2007b) have labeled as the sufficient
rigor question: how does one assess the rigor of an analysis and how does one
decide what level of rigor is sufficient given the context and pressures?
Following up every discrepancy in a fully rigorous manner would be very
resource consuming. Inevitably, such thorough follow up will be seen in hindsight as
unnecessary overkill, as many discrepancies occur in physical systems that will not
prove worth the diagnostic effort nor have significant implications for plans. On the
other hand, not running down every discrepancy thoroughly increases the risk of
falling into one or another of the vulnerabilities that can plague anomaly response.
This workload/thoroughness tradeoff is at the heart of what it means to be effective in
anomaly response and in problem detection (Klein et al. 2005). The resolution to the
tradeoff is the ability to continually test and revise the answer to the sufficient rigor
question. What cues signal that additional analysis and the associated resources are
needed? These two cases instantiate many such cues including cues of ambiguities,
differences, narrowness, conflicts, uncertainty, and violations of boundary conditions
or assumptions. Many of these cues are difficult to see as such by a single person,
single group or single perspective, but these cues are more evident, and more
justifiable to after-the-fact review, as differences across diverse people, groups, and
perspectives (Page 2007). The Cooperative Advocacy process also provides a natural
means for escalating analytical and replanning activity as differences across groups
emerge and inherently signal the need for more resources to be invested to inform,
pursue, and resolve the differences.
The contrast between these two cases also reveals that organizations need to
consciously develop and sustain processes like Cooperative Advocacy. After all, it is
the same organization that provides us with samples of strong and weak process in
these two cases. More research cases of effective and ineffective anomaly response
processes can provide insights that enable organizations to design and sustain
processes that make anomaly response robust despite the inherent complexities.
5. Conclusions
This paper summarizes observations from a rare opportunity to contrast two actual
multi-day cases of anomaly response in the same domain. The salient differences
between the two cases can be used to inform the design and development of systems to
support anomaly response in domains where functionally distinct teams of practitioners
must coordinate their efforts. The following conclusions inform the design, operation,
and management of systems that may be called on to perform anomaly response. Some
of the points are relevant to systems that include human practitioners in the loop. Others
could inform the design of more automated anomaly response systems.
• It is crucial that organizations take the stance that all discrepancies are
significant anomalies until analysis demonstrates what are the actual
implications and uncertainties. For example, active evidence gathering and
analysis is always needed, despite the time and resource costs involved, to
establish conclusions. Organizations should not continue acting in accordance
with previously held assessments and just wait for disconfirming
evidence to come to the fore (too often such disconfirming indicators are
present but get sidelined on the way to critical decision bodies).
• Mechanisms for broadening checks increase the robustness of anomaly
response. For example, such checks help identify side effects of decisions or
actions in re-planning that might otherwise be missed, and such checks help
management recognize when (and what) additional resources need to be
brought to bear to re-plan effectively. Broadening checks can help reveal the
level of rigor behind assessments and recommendations. For example,
several times in STS-107 decisions were based on very low rigor
assessments that on the surface appeared to be of higher rigor.
• Mechanisms for cross-checks increase the robustness of anomaly response. For
example, such checks help groups recognize when current assessments are in
need of revision and help them recover from mis-assessments that arise in the
process of trying to understand and respond to anomalies. In particular, cross-
checks are needed to test whether new events are consistent with past
experience and therefore consistent with applying standard analysis processes
and standard plans, or whether the events fall outside past experience,
assumptions and bounding conditions and require new analytical approaches
and re-planning processes (in STS-107, several versions of this arose: is a
specific foam strike in-family or out-of-family? does the event fall inside or
outside the assumptions of the available predictive tools?).
• Means for involving diverse perspectives in analysis processes are essential
for creating broadening and cross-checks (but not the only method). One
goal is avoiding errors of premature narrowing.
• Means for conflict detection and resolution are needed and must be
grounded on technical processes that increase the rigor of analyses. An
effective anomaly response system needs to make salient any conflicting
assessments that exist across disparate teams. The system must also include
a means for resolving these conflicts based on data/uncertainties about the
anomaly and risk management over contingencies.
• A central place, process, or person is needed to provide an integrated picture
of the state of the response to the anomaly and make this salient to all other
groups who may contribute to or be affected by the anomaly and the
organization's response. The integrated picture of the current state of the
anomaly response process should emphasize uncertainties, alternative
interpretations, and loose ends. This integrative function is needed to
maintain coherence given the disruptive forces that can occur and dominate
activities as a result of time pressures, resource pressures, and other
difficulties inherent in anomaly response.
• Assessments about anomaly response processes, such as assessments of the
rigor of the process, are critical to managing the appropriate level of
investment in analysis and replanning and in decisions about what additional
forms of expertise need to be brought to bear to adequately understand the
events and their implications.
The comparison of the STS-76 and STS-107 cases provides insights into
factors that affect the robustness of the anomaly response process despite the
inherent complexities and vulnerabilities. Interestingly, what makes anomaly
response robust is how different groups interact and utilize diverse perspectives to
develop a deeper understanding of the events in question and a broader
exploration of the implications for future plans and for critical goals. Designers
of CSCW systems can follow the recommendations gained from the comparison
of these two cases to increase the robustness of organizations and systems that
will be called upon to respond to anomalies, emergencies and crises.
References
Branlat, M., S. Anders, D.D. Woods and E.S. Patterson (2008): Detecting an Erroneous Plan: Does
a System Allow for Effective Cross-Checking? In E. Hollnagel, C. Nemeth and S.W.A. Dekker
(eds): Resilience Engineering: Remaining Sensitive to the Possibility of Failure., Aldershot:
Ashgate, pp. 247257.
Columbia Accident Investigation Board (2003): Columbia Accident Investigation Board Report.
Washington, DC: U.S. Government Printing Ofce.
De Keyser, V. and D.D. Woods (1990): Fixation Errors: Failures to Revise Situation Assessment in
Dynamic and Risky Systems. In A.G. Colombo and A. Saiz de Bustamante (eds): Systems
Reliability Assessment (pp. 231251). Dordrecht: Kluwer Academic.
Elm, W., S. Potter, J. Tittle, D.D. Woods, J. Grossman and E.S. Patterson (2005): Finding decision
support requirements for effective intelligence analysis tools. In Proceedings of the Human
Factors and Ergonomics Society 49th Annual Meeting. Santa Monica, CA: Human Factors and
Ergonomics Society.
Fischer, U. and J. Orasanu (2000): Error-challenging strategies: Their role in preventing and
correcting errors. In Proceedings of the International Ergonomics Association 14th Triennial
Congress and Human Factors and Ergonomics Society 44th Annual Meeting in San Diego,
California, August 2000.
Flin, R. and K. Arbuthnot (eds) (2002): Incident Command: Tales from the Hot Seat. Aldershot: Ashgate.
Flin, R., G. Slaven and K. Stewart (1996): Emergency decision making in the offshore oil and gas
industry. Human Factors, vol. 38, pp. 262277. doi:10.1518/001872096779048110.
Gaba, D.M., M. Maxwell and A. DeAnda (1987): Anesthetic mishaps: breaking the chain of
accident evolution. Anesthesiology, vol. 66, pp. 670676.
Garrett, S.K. and B.S. Caldwell (2002): Mission Control Knowledge Synchronization: Operations
To Reference Performance Cycles. Proceedings of the Human Factors and Ergonomics Society
46th Annual Meeting, Baltimore, MD.
Gettys, C.F., R.M. Pliske, C. Manning and J.T. Casey (1987): An Evaluation of Human Act
Generation Performance.
Jennifer Watts-Perotti and David D. Woods
Hong, L. and S.E. Page (2002): Groups of Diverse Problem Solvers Can Outperform Groups of
High-ability Problem Solvers. Proceedings of the National Academy of Science: Economic
Sciences, vol. 101, pp. 1638516389.
Johnson, P.E., K. Jamal and R.G. Berryman (1991): Effects of framing on auditor decisions.
Organizational Behavior and Human Decision Processes, vol. 50, pp. 75105. doi:10.1016/
0749-5978(91)90035-R.
Klein, G., R. Pliske, B. Crandall and D. Woods (2005): Problem detection. Cognition Technology
and Work, vol. 7(1), pp. 1428. doi:10.1007/s10111-004-0166-y.
Klein, G., B. Moon and R.R. Hoffman (2006): Making sense of sensemaking 2: a macrocognitive
model. IEEE Intelligent Systems, vol. 21(5), pp. 8892. doi:10.1109/MIS.2006.100.
Klein, G. (2007): Flexecution 1: exible execution as a paradigm for re-planning. IEEE Intelligent
Systems, vol. 22(5), pp. 7983. doi:10.1109/MIS.2007.4338498.
Layton, C., P.J. Smith and C.E. McCoy (1994): Design of a cooperative problem-solving system for
en-route ight planning: an empirical evaluation. Human Factors, vol. 36, pp. 94119.
Mark, G. (2002): Extreme collaboration. Communications of the ACM, vol. 45, pp. 8993.
doi:10.1145/508448.508453.
Militello, L.G., E.S. Patterson, L. Bowman and R. Wears (2007): Information ow during crisis
management: challenges to coordination in the emergency operations center. Cognition
Technology and Work, vol. 9, pp. 2531. doi:10.1007/s10111-006-0059-3.
Page, S.E. (2007): The Difference: How the Power of Diversity Creates Better Groups, Firms,
Schools, and Societies. Princeton: Princeton University Press.
Patterson, E.S. and D.D. Woods (2001): Shift changes, updates, and the on-call model in space
shuttle mission control. Computer Supported Cooperative Work: The Journal of Collaborative
Computing, vol. 10, pp. 317346.
Patterson, E.S., J.C. Watts-Perotti and D.D. Woods (1999): Voice loops as coordination aids in
space shuttle mission control. Computer Supported Cooperative Work, vol. 8, pp. 353371.
doi:10.1023/A:1008722214282.
Patterson, E.S., E.M. Roth and D.D. Woods (2001): Predicting vulnerabilities in computer-
supported inferential analysis under data overload. Cognition Technology and Work, vol. 3(4),
pp. 224237. doi:10.1007/s10111-001-8004-y.
Patterson, E.S., R.I. Cook, D.D. Woods and M.L. Render (2004): Examining the complexity
behind a medication error: generic patterns in communication. IEEE SMC Part A, vol. 34,
pp. 749756.
Return to Flight Task Group (2005): Return to Flight Task Group Final Report. Washington, DC:
U.S. Government Printing Ofce.
Rudolph, J.W., J.B. Morrison and J.S. Carroll. The Dynamics of Action-Oriented Problem Solving:
Linking Interpretation and Choice. Academy of Management Review, (2008) in press.
Shalin, V.L. (2005): The roles of humans and computers in distributed planning for dynamic
domains. Cognition Technology and Work, vol. 7, pp. 198211. doi:10.1007/s10111-005-0186-2.
Shattuck, L.G. and D.D. Woods (2000): Communication of Intent in Military Command and
Control Systems. In Carol McCann and Ross Pigeau (eds): The Human in Command:
Exploring the Modern Military Experience (pp. 279292), New York: Kluwer Academic/
Plenum Publishers.
Smith, P.J., E. McCoy and C. Layton (1997): Brittleness in the design of cooperative problem-
solving systems: the effects on user performance. IEEE Transactions on Systems,Man,and
Cybernetics, vol. 27, pp. 360371. doi:10.1109/3468.568744.
Smith, P.J., M. Klopfenstein, J. Jezerinac and A. Spenser (2004): Distributed Work in the National
Airspace System: Providing Feedback Loops Using the Post-Operations Evaluation Tool
(POET). In B. Kirwan, M. Rodgers and D. Schaefer (eds): Human Factors Impacts in Air
Trafc Management (pp. 127152), London: Ashgate.
Cooperative Advocacy: Approach for Integrating Diverse Perspectives
Smith, P.J., A.L. Spenser, C.E. Billings (2007): Strategies for designing distributed systems: Case
Studies in the design of an Air Trafc Management System. Cognition Technology and Work,
vol. 9, pp. 3949.
Starbuck, W.H. and M. Farjoun (eds) (2005): Organization at the Limit: NASA and the Columbia
disaster. Malden, MA: Blackwell.
Watts-Perotti, J. and D.D. Woods (2007): How anomaly response is distributed across functionally
distinct teams in space shuttle mission control. Journal of Cognitive Engineering and Decision
Making, vol. 1(4), pp. 405433. doi:10.1518/155534307X264889.
Woods, D.D. (2005): Creating Foresight: Lessons for Resilience from Columbia. In W.H. Starbuck
and M. Farjoun (eds): Organization at the limit: NASA and the Columbia Disaster (pp. 289
308), Malden: Blackwell.
Woods, D.D. (2006): Essential Characteristics of Resilience for Organizations. In E. Hollnagel,
D.D. Woods and N. Leveson (eds): Resilience Engineering: Concepts and Precepts., Aldershot:
Ashgate.
Woods, D.D. and E.S. Patterson (2000): How Unexpected Events Produce an Escalation of
Cognitive and Coordinative Demands. In P.A. Hancock and P. Desmond (eds.): Stress Workload
and Fatigue (pp. 290302), Hillsdale NJ: Lawrence Erlbaum.
Woods, D.D. and E. Hollnagel (2006): Joint Cognitive Systems: Patterns in Cognitive Systems
Engineering. Boca Raton: Taylor & Francis.
Woods, D.D., J. OBrien and L.F. Hanes (1987): Human Factors Challenges in Process Control:
The Case of Nuclear Power Plants. In G. Salvendy (ed): Handbook of Human Factors/
Ergonomics (rst edition, pp. 17241770). New York: Wiley.
Woods, D.D., S.W.A. Dekker, R.I. Cook, L.L. Johannesen and N.B. Sarter (in press): Behind
Human Error (Second Edition). Aldershot, Ashgate.
Zelik, D., E.S. Patterson and D.D. Woods (2007a): The Impact of Process Insight on Judgment of
Analytic Rigor in Information Analysis. In Proceedings of the Human Factors and Ergonomics
Society 51st Annual Meeting. October 15, Baltimore, MD.
Zelik, D., E.S. Patterson and D.D. Woods (2007b): Understanding Rigor in Information Analysis:
The Role of Rigor in Professional Intelligence Analysis. In K. Mosier and U. Fischer (eds):
Proceedings of Naturalistic Decision Making 8, June 2007.