Cooperative Advocacy: An Approach for Integrating
Diverse Perspectives in Anomaly Response
Jennifer Watts-Perotti² & David D. Woods¹
¹The Ohio State University, Columbus, OH, USA (E-mail: woods.2@osu.edu); ²Xerox Innovation Group, Rochester, NY, USA
Abstract. This paper contrasts cooperative work in two cases of distributed anomaly response,
both from space shuttle mission control, to learn about the factors that make anomaly response
robust. In one case (STS-76), flight controllers in mission control recognized an anomaly that began
during the ascent phase of a space shuttle mission, analyzed the implications of the failure for
mission plans, and made adjustments to plans (the flight ended safely). In this case, a Cooperative
Advocacy approach facilitated a process in which diverse perspectives were orchestrated to provide
broadening and cross-checks that reduced the risk of premature narrowing. In the second case (the
Columbia space shuttle accident—STS-107), mission management treated a debris strike during
launch as a side issue rather than a safety of flight concern and was unable to recognize the dangers
of this event for the flight, which ended in tragedy. In this case, broadening and cross-checks were
missing due to fragmentation over the groups involved in the anomaly response process. The
comparison of these cases points to critical requirements for designing collaboration over multiple
groups in anomaly response situations.
Key words: anomaly response, space missions, mission control, accidents, replanning, premature
narrowing, Columbia accident, cooperative work, cross-checks, crisis management
1. Introduction
Anomaly response is a form of diagnostic work where a fault or disrupting event
produces a cascading set of disturbances in the process under control or plan in
progress (Woods and Hollnagel 2006). Recognizing what is anomalous,
diagnosing what is producing anomalies, analyzing the implications of anomalies,
and modifying plans and reprioritizing goals to maintain control are quite difficult
tasks vulnerable to many generic forms of failure (e.g., premature narrowing). In
most work domains like health care delivery, air traffic management, critical care
medicine and space mission control, anomaly response tasks involve multiple
groups with partial, interdependent and overlapping roles—distributed anomaly
response. Collaboration and interaction over these roles can improve the
robustness of anomaly response or exacerbate the vulnerabilities for anomaly
response tasks to break down.
Anomaly response can break down in many ways. Among the patterns identified,
anomalous findings can be discounted or minimized (Klein et al. 2005). The
diagnostic process can become too focused too early on a narrow hypothesis set
or a narrow set of findings to be explained (De Keyser and Woods 1990; Klein
et al. 2006). Practitioners can lose track of the cascade of disturbances within an
anomaly, causing inappropriate or mis-timed responses to the anomaly as it evolves
(Woods and Patterson 2000). Implications of disturbances for the plans in progress
can be missed (Smith et al. 1997). Re-planning can focus on working around
specific impasses and miss side effects on wider sets of constraints (Shattuck and
Woods 2000). Responses can become fragmented and poorly synchronized in time,
leaving gaps in assessments and plans. As a consequence of fragmentation and gaps,
groups can carry out activities that look locally appropriate but that actually
work at cross-purposes when considered from a broader perspective (Klein 2007).
The cognitive functions that occur as part of anomaly response are distributed
across multiple interdependent people and groups in many situations ranging
from process control industries (Woods et al. 1987), offshore oil production (Flin
et al. 1996), military command and control (Shattuck and Woods 2000), crisis
response (Flin and Arbuthnot 2002; Militello et al. 2007), critical care medicine
(Gaba et al. 1987), health care delivery (Patterson et al. 2004) to air traffic
management (Smith et al. 2007). Distributing the functions of anomaly response
over multiple groups can be a means to make anomaly response more reliable
and robust. However, dividing and integrating the different perspectives and
activities creates interactions with vulnerabilities that also can contribute to
breakdowns.
This paper examines two cases of distributed anomaly response, both from
space shuttle mission control, to learn from the contrast in the processes and
interactions across groups. In the first case, mission control addressed an anomaly
that began during ascent (a hydraulics leak), analyzed the implications of the
failure for mission plans, and made adjustments to the plans, concluding with a
safe shuttle landing (STS-76). In the second case, mission management treated a
debris strike during launch as a side issue, and was unable to recognize the
danger of this event for the safety of the flight. The flight ultimately ended in
tragedy when the Columbia space shuttle disintegrated upon entry to the Earth’s
atmosphere due to damage to the leading edge of the wing caused by the debris
strike (STS-107).
Aerospace disasters generate independent, thorough accident investigations
which provide data to examine how different groups worked different parts of
the mission (Columbia Accident Investigation Board or CAIB 2003). These
investigations provide evidence for others to build on in their own analyses of
the coordination or fragmentation across the different group activities (e.g.,
Starbuck and Farjoun 2005; Woods 2005). Successful anomaly response cases
usually provide less opportunity for investigation and analysis. However, in the
case of STS-76, a research project on anomaly response happened to be
underway with research observers present when the anomaly began during
mission ascent. This fortuitous turn of events led to observations of the mission
control response over multiple days until the shuttle landed safely (Watts-Perotti
and Woods 2007).
These two analyses provide an opportunity to examine the factors in
collaborative activity that make anomaly response more robust despite the
various complexities and challenges that can arise. The Columbia Accident
Investigation highlighted the lack of "rigor" in the analyses that preceded
decisions about the anomaly. Managers were unaware they were making
decisions based on engineering analyses that appeared thorough, but were in
fact of very low rigor (Columbia Accident Investigation Board 2003; Return to
Flight Task Group [RTFTG] 2005). Others involved in the mission were unable
to recognize the weaknesses in apparent answers to safety-critical questions and
therefore the system as a whole was unable to detect and correct weaknesses in
anomaly response for that mission.
The analysis of anomaly response during the STS-76 mission highlighted
several ways that interactions across groups led those involved to broaden their
exploration of the anomaly and its implications, and helped groups find and
resolve weaknesses in their assessments and positions. One pattern that emerged
was termed Cooperative Advocacy, an approach that led to the detection and
resolution of conflicts in assessment, analysis, and re-planning (Watts-Perotti and
Woods 2007). Functionally distinct groups came together regularly during the
anomaly response process in STS-76. As practitioners prepared for and
participated in these coordinative meetings, they acted as advocates for the
systems they monitored, based on their distinct roles and perspectives. As a
result, they could propose constraints on the anomaly response process and on
mission activities which upheld their perspectives, addressed the scope of
authority of their group, and achieved safety within their scope of responsibility.
Through advocacy, differing assessments, conflicting constraints and disagree-
ments over plan modifications came to light. When conflicts surfaced, the distinct
groups performed additional analysis, and worked together to search for or
innovate alternative ways to satisfy the total set of constraints over all of the
relevant groups' functions and goals. Thus, Cooperative Advocacy is an approach
for coordinating the multiple perspectives to help the groups involved expand the
range of factors considered and recognize potential mis-steps early in replanning
decisions.
This paper develops the concept of Cooperative Advocacy for distributed
teams by juxtaposing the anomaly response process in the STS-76 mission,
where a Cooperative Advocacy approach was observed to help the teams detect
and resolve weaknesses and differences in assessments and re-planning
decisions, with the process in the STS-107 case, where the multiple teams
involved failed to identify weaknesses in assessments and failed to re-plan
successfully.
2. Background
2.1. Anomaly response
Anomaly response consists of three interdependent and interwoven cognitive
components (Woods and Hollnagel 2006, chapter 8):
(1) anomaly recognition, in which practitioners identify and update the set of
findings to be explained;
(2) diagnostic search, in which practitioners attempt to develop a coherent
explanation for the findings; and
(3) response management, in which practitioners determine the implications of
the disturbances and their diagnosis for future plans and procedures.
Practitioners monitor processes that change over time. New events can
occur that disrupt ongoing processes and plans (event-driven situations). When
faults or other abnormal events occur, they produce disturbances in the process
being managed, and these disturbances can cascade. Practitioners may need to act
immediately to keep these disturbances from threatening the integrity of core
activities and the goals they serve, such as safety (such actions are referred to as
"safing" actions in space shuttle mission control).
In parallel with safing actions, practitioners conduct diagnostic activities to
determine the source of the disturbances—in order to identify the underlying
problem so the source of the disturbances can be eliminated or contained.
Depending on the nature of the trouble, the anomaly has implications for plans in
progress which can lead to shifts in mission tasks and reprioritizing mission
goals. Various contingency plans serve as resources to guide these responses to
the evolving situation and to avoid mistakes. Anomaly response situations can
become quite challenging as cascading effects and changing tempo of operations
place pressure on monitoring, diagnostic search, shorter term corrective actions,
and longer term replanning. Anomaly response activities are further complicated
because operators must handle multiple interleaved tasks, consider multiple
interacting goals, and be ready to revise assessments and plans as new evidence
comes in or as the situation changes.
One particular vulnerability in anomaly response is premature narrowing.
Research findings have highlighted the danger of becoming stuck in one
assessment and being unable to revise the assessment even as situations change
or new evidence comes in (Woods et al. 1987; Johnson et al. 1991). Studies of
hypothesis generation in diagnostic reasoning have found that a possible solution
to this problem is to broaden the set of possible explanations to be considered
(Gettys et al. 1987). Studies of re-planning find that overlooking the side effects
of changes to a plan is a significant risk (Smith et al. 2004). Studies of
professionals in information analysis found that premature narrowing was a basic
vulnerability as analysts moved to new computer-based systems to cope with
massive increases in data availability (e.g., Patterson et al. 2001; Elm et al. 2005).
Follow up studies emphasized the need for broadening mechanisms to expand
the data and possible explanations analysts explored (Elm et al. 2005; Zelik
et al. 2007a).
The dangers of premature narrowing are also evident in studies of
collaboration. Layton et al. (1994) found that certain design characteristics
introduced into collaborative human–machine systems can narrow the range of
data considered and hypotheses explored. Studies of error recovery point to the
need for collaborative cross-checks across the multiple practitioners involved in
managing critical activities (Patterson et al. 2004 and Branlat et al. 2008 in health
care or Fischer and Orasanu 2000, for aviation). Studies of human–human
collaboration find that lack of diversity across participants can also contribute to
premature narrowing that limits problem-solving performance (Hong and Page
2002; Page 2007). Studies of what increases rigor highlight the need for
collaborative interactions to corroborate assessments, to cross-check for weak-
nesses, and to broaden the set of possible explanations (Zelik et al. 2007b).
All processes for anomaly response must, within some time and resource
horizon, funnel-in on key data sources, on the basic unexpected findings to-
be-explained, on the storyline that explains the unfolding events and evidence
gathered, and on the critical action or plan revision that needs to be undertaken to
accomplish goals in a changing situation. The danger is a premature narrowing
that misses or discounts evidence that would lead to revision. Experienced
practitioners and teams develop broadening checks that they combine with the
normal funneling processes to reduce the risk of premature narrowing or closure.
The collaborative system adjusts the sequence of funneling-in plus broadening or
cross-checks to converge in a timely manner, while remaining sensitive to the
need to revise previous assessments. Using broadening checks is a balance
between the need to be sensitive to the potential for misassessment and the need
to accomplish work within time and resource bounds inherent in evolving
situations with goal pressures. Thus, one part of the value of effective
collaborative interconnections lies in how they can broaden focus, recognize
possible mis-assessments, reduce the risk of missing side effects in re-planning,
and support revision (Woods and Hollnagel 2006; Rudolph et al. 2008).
2.2. Structure and functions of mission control communities
Space shuttle ground support consists of different communities, which have
distinct, but overlapping expertise, resources, goals, responsibilities, and
authority (additional studies of how mission control works include Patterson
and Woods 2001; Garrett and Caldwell 2002; Mark 2002; and Shalin 2005).
Anomaly response in the STS-76 hydraulics leak case was distributed across two
of these communities: Operations and Engineering.
The Operations Community is a multi-level set of teams responsible for
monitoring each of the shuttle subsystems during every mission (e.g. Patterson
et al. 1999). This Community monitors real-time telemetry data from shuttle
subsystems. If controllers encounter anomalies in the telemetry data, they have
the authority to recommend specific procedures and actions for the crew to
perform in response to these situations. Therefore, it is the Operations
Community that becomes directly involved in the dynamics of managing faults
and disturbances during shuttle missions. Members of the Operations Community
detect and diagnose unfamiliar patterns in shuttle telemetry data, recommend
diagnostic and therapeutic actions, and develop primary and contingency plans
for future mission phases or for responding to potential failures in other systems.
The Engineering Community is a separate set of multi-level teams that
coordinate their activities to track and maintain individual shuttle components
across missions. They track performance trends for each component over time,
and decide when the components should be tested, refurbished, replaced, or
redesigned. When an anomaly occurs during a mission, the Engineering
Community determines what actions should be taken to make sure this
behavior does not occur again in subsequent flights. Even before the shuttle
lands, this community initiates efforts to understand the anomalous system
behavior and its implications for future flights, and they begin the turnaround
process of creating post-flight maintenance and design improvements. These
turnaround activities can affect the shuttle launch schedule and program
productivity. (In fact, turnaround productivity pressures played a role in
the events leading up to the STS-107 accident; CAIB 2003; Starbuck and
Farjoun 2005.)
In the Columbia accident a management group, the Mission Management
Team, was central to the decision making processes (ironically, the mission
control operations community—the flight director, flight controllers and their
supporting teams—was not a main player in the events related to the debris
strike). The Mission Management Team (MMT), consisting of managers from
engineering, operations and several other organizations including safety, systems
integration, science and others, addresses problems that arise during a mission
that are outside the responsibility and authority of the Launch and Flight
Directors (see CAIB 2003). The MMT convenes 2 days before a launch and
continues until the space shuttle lands safely. Because debris strikes are a basic
risk to the integrity and safety of the space shuttle, whenever one is detected,
NASA guidelines call for initiating a Debris Assessment Team to analyze the
consequences and risks of the strike. In the case of Columbia (STS-107), the
first formal meeting of this team occurred 5 days into the mission.
Each anomaly (a foam strike on launch, foam being one kind of debris that can
hit and damage structures on the orbiter, and a hydraulics leak in an auxiliary
power unit during ascent) presented itself to the groups handling shuttle missions
differently, triggering different interactions initially and different interactions over
the time extent of each mission. In STS-76 both the operations teams and
engineering teams monitoring shuttle systems during ascent noticed the anomaly
in telemetry data, and the anomaly had implications for the different goals and
responsibilities of each group. The subsequent interactions between the two
groups were central to how the anomaly was handled.
The group interactions throughout this case are available because the authors
were involved in a study of anomaly response in cooperation with NASA at the
time of the mission (Watts-Perotti and Woods 2007). An observer from the
research team (first author) was present and shadowed the front and back rooms
of the Mechanical, Maintenance, Arm and Crew Systems (MMACS) flight
control team during the entire STS-76 mission. This operations team monitors the
shuttle’s mechanical systems, which included the leaky hydraulics system. The
study was based on 70 h of observations documented in field notes and
transcribed tape recordings. Observations included shift handovers, discussions
within the MMACS team, diagnostic analyses, voice loop conversations (see
Patterson et al. 1999 for a description of the voice loops and their functions),
conversations between the MMACS team and other flight control teams, and
coordinative meetings between operations and engineering. Follow up interviews
with participants were conducted to supplement observations.
In STS-107 the debris (foam) strike was recognized on Flight Day 2 when the
Intercenter Photo Working Group received high resolution film of the launch.
This group’s report went to the MMT and to other groups who formed a Debris
Assessment Team per standard practice. The sequence of events, group
interactions, decisions, and context were established by the Columbia Accident
Investigation Board (Columbia Accident Investigation Board 2003). As a
consultant to this board on the decision making processes prior to launch and
post-launch, the second author examined a variety of source material and
participated in discussions with the accident investigation team as the analysis
was completed (see Woods 2005).
Thus, the information about collaborative processes over groups in the two
cases is available through different means—a thorough detailed retrospective
analysis from the CAIB for one case and direct observation in the other case.
However, the detail and quality of information actually is quite similar because
NASA captures a great deal of the deliberations that go on during and leading up
to shuttle missions which could be used to develop detailed protocols of across
group interactions (e.g., logs, transcripts, electronic memos). For example, the
CAIB had transcripts of all of the MMT meetings available for detailed analysis,
and the Watts-Perotti and Woods (2007) study of STS-76 included data from
after-the-fact interviews with many of the flight control team (7 of the
8 MMACS team members involved). In these after-the-fact interviews, mission
documentation was used as cues to ground flight controllers' descriptions of
perceptions, intent, and activities as they and others determined how to adapt
given the anomaly (a cued retrospective technique).
2.3. Summary of the response to the STS-76 shuttle anomaly
The purpose of the STS-76 shuttle mission was to transport an American
astronaut to the Russian MIR Space Station. Just after liftoff, the Mechanical,
Maintenance, Arm and Crew Systems (MMACS) Flight Control Team noticed that the
quantity value of one of three hydraulics systems began to decrease (the finding
to be explained). The team recognized this pattern in their telemetry data as a
classic leak signature (initial diagnosis) and proceeded to investigate the
characteristics of the leak.
The team began to enact the procedures that had been created to respond to a
hydraulic leak. First, they quickly calculated the slope of the line in the telemetry
plot and determined that the leak rate was not great enough to call a mission abort
(safing responses). Next, they checked the values of the other two hydraulics
systems for an increase in their quantities to determine whether “system three”
was leaking into the other two systems. There was no increase, so they assumed
the leak was external, and that there was no intersystem leakage. Given these
initial assessments they decided the mission did not need to be aborted, and they
allowed the crew to continue their ascent into orbit and cut off the main engines
(additional safing response).
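To make the character of this check concrete, the sketch below (in Python) illustrates the kind of computation involved: fitting a line to the quantity telemetry of the leaking system, comparing the slope against an abort limit, and checking the other two systems for any increase that would suggest intersystem leakage. All values, names, and the abort threshold are hypothetical placeholders, not actual flight rules or STS-76 telemetry.

    import numpy as np

    def leak_rate(times_s, quantity_pct):
        # Estimate the leak rate as the slope of a least-squares line fit to
        # hydraulic quantity telemetry (percent per second); a negative slope
        # indicates a decreasing quantity.
        slope, _intercept = np.polyfit(times_s, quantity_pct, 1)
        return slope

    # Hypothetical telemetry samples and abort threshold -- illustrative only.
    t = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
    sys3 = np.array([72.0, 71.4, 70.9, 70.3, 69.8])   # leaking system
    sys1 = np.array([74.0, 74.0, 74.1, 74.0, 74.0])   # other two systems steady:
    sys2 = np.array([73.5, 73.5, 73.5, 73.6, 73.5])   # no sign of intersystem transfer

    ABORT_RATE = -0.5  # assumed percent-per-second limit, not a real flight rule

    rate3 = leak_rate(t, sys3)
    abort_needed = rate3 <= ABORT_RATE
    intersystem = leak_rate(t, sys1) > 0.01 or leak_rate(t, sys2) > 0.01

    print(f"system 3 leak rate: {rate3:.4f} %/s; abort needed: {abort_needed}")
    print(f"evidence of intersystem leakage: {intersystem}")
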
Just after main engine cutoff, the MMACS Flight Control Team performed the
diagnostic intervention of asking the crew to close a set of isolation valves to
determine whether they could isolate the leak. When the valves were closed, the
quantity telemetry signature of system three flatlined. For a moment, it looked as
if the intervention was successful and that the leak had been isolated. However,
the quantity soon began to drop again. Therefore, the team determined that the
leak was not isolatable. When the orbit phase began, practitioners shut down the
leaky hydraulic system and began to further characterize the anomaly and
investigate its implications for the remainder of the mission (see Watts-Perotti and
Woods 2007 for more details about the mission).
Orbit: Description of the coordinative anomaly response process. When the
shuttle reaches orbit, the pace of activities for Flight Controllers slows, which
provides opportunities to investigate anomalies that have occurred during ascent. It is
at this stage of the mission that the cognitive demands of anomaly response and the
demands for coordination across teams escalate (Woods and Patterson 2000). After
the leaky hydraulic system was shut down, practitioners were primarily engaged in
contingency evaluation and replanning processes, given that an explanation for the
anomaly (a leak in the hydraulics system) seemed clear-cut. These replanning
activities focused on several issues that had to be resolved as a result of the anomaly:
the impact of the anomaly on (1) the mission schedule, (2) preparation for re-entry,
and (3) the actual re-entry phase of the mission.
In the process of resolving these issues, the engineering and operations
communities’scopes broadened, and their activities began to overlap. As a result,
these communities began to interact and coordinate their response to the anomaly.
Throughout the orbit phase of the mission, the anomaly response activities
occurring across these functionally distinct communities were anchored around a
series of formal coordinative meetings. By observing the interactions across these
communities, Watts-Perotti and Woods (2007) found that their distinct perspec-
tives gave rise to different assessments and viewpoints which were compared and
coordinated throughout the anomaly response process. These distinct perspectives
often led the communities to prefer different approaches which, in contrast with
each other, represented different sides of several interesting tradeoffs. Here, we
walk through one set of trade-offs that arose (Watts-Perotti and Woods 2007
presents a complete description of the series of coordinative meetings and other
trade-offs).
Determining whether to use the leaking system during descent. One of the issues
that needed to be resolved during the STS-76 mission was whether the leaky
hydraulics system should be used during the descent phase of the mission.
Originally, both the Operations and Engineering communities agreed that the leaky
system should not be used for descent. In preparing for a coordinative meeting to
discuss this issue, the Operations Teams reviewed data from a previous mission
during which a leak occurred. In this review, they discovered flight rules that led
them to change their entry plan stance, and propose that the leaky system should be
used for descent. Note that the diagnosis of the anomaly remained the same; it was
the replanning efforts that the Teams were working to resolve.
At the beginning of the meeting, the Operations teams presented their new
entry plan. This new plan surprised the Engineering teams, whose initial reaction
was negative. The plan triggered a new idea from one of the engineers. He
proposed that a high-pressure leak could damage some sensitive areas of the
shuttle. Since the exact location of the leak was not known, the leaky hydraulic
system should not be used. The following sample is taken from handwritten notes
recorded during the meeting (see Watts-Perotti and Woods 2007):
Engineering:
You’re taking a major risk by doing norm press (normal pressure mode on
entry). Our previous analysis (during the other mission that sustained a leak)
was only considering flammability, not pressure, etc. We don’t know where the
leak is. It could shoot 3000 PSI at something and damage it.
A crew representative (a third perspective) introduced another new idea: the
possibility that leaking hydraulic fluid could pose a fire hazard if it touched hot
shuttle elements. Neither of these potential complications had been discussed
publicly within or across the sets of teams before the Operations teams introduced
their new entry plans. Based on these new hypotheses, the two communities agreed
to analyze the hypotheses independently and reconvene to resolve the issue. A later
coordinative meeting led to the resolution. The new analyses confirmed the risks of
running the leaky system, and all parties agreed not to use the system during descent.
These examples show how discussions between the functionally distinct
groups gave rise to new ideas about potential consequences which challenged and
ultimately overturned the working hypothesis. These factors were not noted by
any one group in their own analysis process, but emerged from interactions
across multiple perspectives represented by the different groups.
The Cooperative Advocacy Process. Watts-Perotti and Woods (2007) describe the
processes observed during the STS-76 mission as Cooperative Advocacy, which
allowed practitioners to reveal and resolve conflicts in assessment, analysis, and
re-planning. During the coordinative meetings, practitioners acted as advocates
for the shuttle systems and mission goals that fell within their primary
responsibility. Functionally distinct teams represented different perspectives
with different goals, tools, data, and experience. Each group proposed
constraints on mission activities, and on the anomaly response process, which
upheld their role and scope of responsibility relative to the whole mission and to
the larger Shuttle program. The term "advocacy" in the Cooperative Advocacy
process serves as a pointer to this aspect of their interactions.
Often, the constraints identified by different groups conflicted. For example,
members of the Engineering Teams proposed the constraint that an Auxiliary
Power Unit should be run for Flight Control Systems checkout. However, the
Flight Controllers responsible for the leaky hydraulics system had the goal of
minimizing activity that directly affected any of the hydraulics systems, therefore
protecting system redundancy. To meet this goal they proposed the constraint that
the Auxiliary Power Unit should not be run during the checkout task. During the
debate, the flight control team introduced an alternative:
Operations:
What about running a 303 procedure?
(other meeting attendees seemed as if they had not considered this option
before)
Crew Rep:
Well, a 303 is not normally done. We’ve tried it in simulations and they
usually run out of time. The crew doesn’t really want to do this.
Engineering:
We could say do circ pump ops and then the 303 actuator check, but you might
not get it, and even then, you’re not getting everything you would get if you
run an APU for FCS checkout.
In this example, the Engineering community advocated for their mission goals by
proposing a plan that would maximize the information gain related to their goal of
fixing the shuttle when it returned and/or redesigning the shuttle for future flights.
The Operations community advocated for their mission goals of keeping the shuttle
as safe as possible by running an alternative procedure which minimized the risk of
further damage to the leaky system. The crew representative advocated for the crew,
attempting to keep them out of a situation in which they would have to run an
unfamiliar procedure that they might not be able to finish in the time available. This
interaction across perspectives is another example of how disagreements between the
distinct communities led to the consideration of alternative plans, which might not
have been considered if groups worked in isolation.
In the Cooperative Advocacy approach, the word "cooperative" refers to the
higher-level goals shared by all the groups, teams, and individuals. Ultimately, all
practitioners were interested in bringing the shuttle back to Earth safely and in
keeping it running properly across missions. If one community disagreed with a
plan presented by another community, and could present a clear case for why this
plan did not satisfy important constraints, the communities coordinated their efforts
to find alternative ways to satisfy those important constraints. The debate over the
checkout task is an example where communities continued to explore alternative
plans until they could find a means to satisfy the safety constraints of the flight
control team and also meet the information needs of the Engineering community.
2.4. Summary of the response to the STS-107 shuttle anomaly
STS-107 was a 16-day mission, during which approximately 80 international physical,
life, and space science experiments were conducted (http://www.nasa.gov/columbia/mission/index.html).
“Upon reentering the atmosphere on February 1, 2003, the
Columbia orbiter suffered a catastrophic failure due to a breach that occurred during
launch when falling foam from the External Tank struck the Reinforced Carbon panels
on the underside of the left wing. The orbiter and its seven crewmembers were lost
approximately 15 min before Columbia was scheduled to touch down at Kennedy
Space Center”(http://history.nasa.gov/columbia/Introduction.html).
The foam strike that led to the shuttle’s tragic end was detected during the
launch phase of the mission (via standard means for monitoring shuttle launches
by the Intercenter Photo Working Group). The report and data on the foam strike
were sent to the Mission Management Team, which became key to the handling of the
event. Because of limited resolution and views of the area where the debris strike
occurred, the initial report could not determine if the orbiter had sustained
damage. The chair of the Intercenter Photo Working Group anticipated that
further analysis of the debris strike would require additional information and
placed a request to have the Department of Defense (DoD) obtain a high
resolution image of the Orbiter on-orbit.
The initial report triggered a variety of groups to begin to look at the available data
to analyze the result of the impact (e.g. Mission Evaluation Room, United Space
Alliance contractor groups) and, per standard NASA guidelines, this led to the
assembly of a Debris Assessment Team to analyze the anomaly post-launch, before
the descent phase of the mission (though their first meeting occurred on Flight Day 6).
Very early assessments of the foam strike led to a stance in the Mission
Management Team and among other mission managers that further analysis was
not critical to the mission (CAIB, chapter 6). For example, in the MMT meeting
on Flight Day 9, the following is the only discussion of the event: “And we sent
up to the crew about 16-s video clip of the strike just so they are armed if they get
any questions in the press conferences or that sort of thing. We made it very clear
to them, no concerns." (Mission Management Team transcript of 01-24-2003). In
general the transcripts show the MMT meetings included only brief discussions
regarding the foam strike. On Flight Day 2 a memo stated, “Basically, the RCC
[reinforced carbon-carbon, shorthand for the wing leading edge structure] is
extremely resilient to impact type damage.”The memo went on to state that the
debris didn’t look “big enough”to “pose any serious threat”, did not have enough
energy to create significant damage, tiles in the leading edge area “are thicker
than required”, most likely received only “shallow damage”, and there is a single
mission safe re-entry plan available (CAIB, p. 141). All of these points were used
to justify the pre-existing belief the debris strike was not a safety issue requiring
significant follow through. None of these points were derived from evidence or
analysis. All of the rationalizations turned out to be incorrect—the confidence in
the strength of the leading edge structures was unjustified, the debris strike was
large (speed and mass), the energy of the strike was far outside the envelope for
the analytic tools available to predict damage, and there was insufficient data
available to be confident in any assessment—with the exception that the debris
did strike the leading edge structure (the underside of RCC8). The initial request
to obtain further images from DoD assets and two subsequent requests to obtain
the imagery were denied by mission management as “tile damage should be
considered a turn-around maintenance concern and not a safety-of-flight issue”,
therefore, "imagery of Columbia's left wing was not necessary." (CAIB, p. 151).
Management’s stance toward foam strikes had developed earlier in the history
of shuttle missions. First, foam strikes were re-classified from in-flight anomaly
status to maintenance and turn around issues (STS-113 Flight Readiness Review,
CAIB, p. 125–126). Accompanying this re-classification was a general shift to
see foam loss as an accepted risk or even, as one pre-launch briefing put it, "not a
safety of flight issue" (CAIB, p. 126, first column to top of second column). In
addition, the fact that the shuttle underwent previous debris strikes without
consequence may have contributed to a false confidence that influenced the
Mission Management Team’s initial assessment that foam strikes pose little risk
to orbiter safety (see Woods 2005 for an in-depth discussion of the organizational
drift toward failure).
On the other hand, working engineers thought damage could have occurred and,
more importantly, recognized that they did not have sufficient evidence and tools to
complete a high confidence assessment of potential damage and its consequences.
For example, an ad hoc team working on Flight Days 3 and 4 (the weekend
following launch) developed an estimate of the size and speed of the debris strike
that showed the strike to be a serious threat with a total energy hundreds of times
larger than the assumptions built into the standard tools for analyzing debris strikes
on tiles (and these estimates turned out to be accurate). Overall, the CAIB report
(chapter 6) found that at least eight opportunities were missed where actions could have led
to the recognition that the orbiter had suffered serious damage, and it goes into detail
about how each of these opportunities arose and the specific factors that blocked or
sidelined each opportunity to understand the damage and its implications.
Eventually, in response to the STS-107 foam strike during launch (per standard
practices), a Debris Assessment Team formed and conducted several technical
analyses of the anomaly, including modeling of the strike with a simulation
modeling tool (Crater). This modeling tool was limited in its ability to provide a
clear understanding of the risks associated with the foam strike because the STS-
107 strike was hundreds of times the scale of what the model was designed to
handle (email on pp. 151–152, CAIB).
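The general principle at issue, verifying that a case falls within the envelope a tool was validated for before trusting its output, can be sketched as follows. This is an illustrative sketch only; the envelope bound and debris size are invented placeholders and do not represent the actual Crater model limits or the STS-107 debris parameters.

    from dataclasses import dataclass

    @dataclass
    class ValidationEnvelope:
        # Upper bound on debris size that a damage model was validated against.
        # The value used below is a placeholder, not the real Crater limit.
        max_debris_volume_in3: float

    def requires_escalation(debris_volume_in3: float, env: ValidationEnvelope) -> bool:
        # An out-of-envelope case is a cue to escalate to more rigorous analysis
        # and wider review, rather than a routine prediction to rely on.
        return debris_volume_in3 > env.max_debris_volume_in3

    # Hypothetical numbers: the analyzed strike is hundreds of times larger than
    # anything the tool was designed to handle, so its output cannot be trusted.
    envelope = ValidationEnvelope(max_debris_volume_in3=3.0)
    strike_volume_in3 = 1200.0

    if requires_escalation(strike_volume_in3, envelope):
        print("Debris strike falls outside the model's validated envelope: "
              "treat model results as unreliable and escalate the analysis.")
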
The Debris Assessment Team also researched the history of debris strikes in
past shuttle missions to determine whether events like the STS-107 strike had
occurred before—Is the size of the debris strike "out-of-family" or "in-family"
given past experience? While the team looked at past experience, they were
unable to get a consistent or informative read on how past events indicated risk
for this event (Woods 2005). This team did, however, encounter several pieces of
evidence that the strike posed a risk of serious damage. For example, the foam
debris in STS-107 was 600 times larger than previously analyzed ice debris
(CAIB, p. 145), and models predicted that tile damage on STS-107 was deeper
than the tile thickness (CAIB p. 143). However, the team apparently discounted
these pieces of evidence and moved on to consider other possible damage
scenarios that would pose risk to safety during re-entry (e.g., focusing on the risk
that landing gear door seals were damaged).
As the Debris Assessment Team worked at the margins of available knowledge
and data (a significant cue in itself), their partial assessments "did not benefit from
cross-checks through interactions with other technical groups with different
backgrounds and assumptions" (Woods 2005). The CAIB did not find any report
of a technical review process that accompanied the Debris Assessment Team's work.
Overall, the Debris Assessment Team was unable to “integrate partial and uncertain
data to generate a big picture, i.e., the situation was outside the understood risk
boundaries and carried significant uncertainties" (Woods 2005). Recognizing that the
situation at hand falls outside the bounds of previous experience and models and
carries a high level of uncertainty should serve as a major beacon signaling the need
for much greater investment in more rigorous analysis.
Woods (2005) also notes that information about the strike was scattered across
time, and across disconnected groups of people.
What is striking is how there was a fragmented view of what was known about
the strike and its potential implications over time, people and groups. There
was no place, artifact, or person who had a complete and coherent view of the
analysis of the foam strike event (note a coherent view includes understanding
the gaps and uncertainties in the data or analysis to that point).
The Columbia Accident Investigation Board analysis noted that the team
who was charged to analyze the anomaly was unable to generate a clear
picture of the anomaly and its associated risks. This lack of clarity, combined
with the Mission Management Team’s stance that foam strikes are merely
turn-around issues, created the effect of an "anomaly in limbo" (Woods 2005).
The anomaly was not dismissed completely, yet the organization as a whole was unable to get
traction on the event as an in-flight anomaly with safety implications to be
thoroughly analyzed from different perspectives, including contingency
analyses. Woods notes that this kind of situation, where understanding an
anomaly becomes stuck in an incomplete state unable to move forward,
usually emerges at the boundaries of different organizations that do not have
the appropriate mechanisms to facilitate constructive interplay, integrate partial
information, and recognize conflicts.
In interacting with the groups attempting to analyze the nature and
implications of the debris strike, the Mission Management Team in STS-
107 created an atmosphere where the analysts needed to demonstrate that the
foam strike was an issue in order to generate resources (e.g., get on-orbit
images) and to lay a claim on the time of mission management to consider
risks and contingencies. In other words, the analytic burden was to show the
foam strike as an ill-understood anomaly which requires further assessment,
contingency evaluation, and re-planning. The norm in effective anomaly
response should be the opposite—all discrepancies are anomalies until
analysis, appropriately scaled for rigor, shows that the anomaly in question
requires minimal or no modifications to plans or contingencies (Woods 2005;
Zelik et al. 2007b). The stance downplaying the significance of the debris
strike emerged before results were obtained from any technical analyses,
therefore decreasing the chances that the shuttle communities as a whole
would follow a rigorous engineering analysis process, where data evaluation
guides conclusions.
The STS-107 case, in which the anomaly response process was fragmented
across disconnected groups working in isolation, contrasts dramatically with the
anomaly response process that occurred for STS-76, where functionally distinct
teams used a Cooperative Advocacy approach to coordinate their efforts in
assessing and responding to the anomaly.
3. Contrasting distributed anomaly response in STS-76 to STS-107
In comparing STS-76 with STS-107, one of the distinct differences between the
cases is the degree to which multiple diverse perspectives became involved in the
anomaly response process. During STS-76, functionally distinct teams collabo-
rated in a coherent, flowing response process. The diversity in the teams’
perspectives and approaches provided opportunities to catch mistakes and
broaden the set of alternative hypotheses (Watts-Perotti and Woods 2007). The
anomaly response process in STS-107 was fragmented across time and across
groups who did not communicate with each other, and who did not bring in broad
sets of expertise during decision making processes.
While both the STS-76 and STS-107 missions had to respond to an anomaly that
occurred early in the mission, the anomaly response process was approached quite
differently in each. This section discusses the differences in the anomaly response
processes between the two cases.
While one was a success and the other a failure in terms of outcome, it is
important to note that a variety of small missteps occurred during the anomaly
response process in both cases. To gain leads about how collaboration over
groups can make anomaly response more robust, the contrast focuses on how
missteps were detected and resolved in one case, while opportunities to correct
missteps were not created or were missed in the other case. Good process cannot
guarantee good outcomes, but some processes guard against complexities and
vulnerabilities that can lead to failures in anomaly response (Woods et al., in press).
Differences in the default anomaly response structure. One of the ways in which
the two cases differed is that the hydraulics leak anomaly on STS-76 occurred in
a shuttle system that was monitored by default by two functionally distinct
communities. Both the Engineering and the Operations communities always
monitor the hydraulic systems during ascent because they both have
responsibility for different aspects of the system. Therefore, when the anomaly
occurred, both of the functionally distinct communities were simultaneously
collecting data about the leak, and initiating parallel anomaly response processes.
In this case, the anomaly response process was structured by default in a way that
facilitated broadening the set of possible assessments and
responses. In STS-107, by contrast, foam strikes had previously been re-classified as a
turn-around issue not critical to orbiter safety, so the operations community did not own
responsibility for the foam strike. Therefore, they were not engaged
in the analysis of the strike.
Differences in initial stances. Another difference between the two cases is the
differing stances toward the anomalies adopted by the Mission Management
Team. In STS-76, the Mission Management Team assumed the hydraulic leak
was an issue that could have serious consequences for the mission. In the case of
a hydraulic leak there was a presumption that the mission might need to end early
and analysis was required to understand the anomaly, assess the consequences,
and estimate uncertainties as a basis for decisions about modifying mission
duration and planned accomplishments. This early stance in management
provided the framework to facilitate a thorough, evidence-driven analysis of the
leak behavior and its implications for the rest of the shuttle mission. The
management stance adopted in STS-107 was quite different: managers assumed the
foam strike posed low risk to orbiter safety, so the assessment team had
to generate evidence and analysis sufficient to change management's mindset.
For anomaly response to be robust given all of the difficulties and
complexities that can arise (Woods and Hollnagel 2006), all discrepancies
must be treated as if they are anomalies to be wrestled with until their
implications are understood (including the implications of being uncertain or
placed in a difficult trade-off position). This stance is a kind of readiness to re-
frame and is a basic building block for other aspects of good process in
anomaly response (Klein et al. 2005, 2006). Maintaining this as a group norm
is very difficult because following up on discrepancies consumes resources of
time, workload, and expertise. Inevitably, following up on a discrepancy will be
seen as low priority for these resources when a group or organization operates
under severe workload constraints and under increasing pressure to be “faster,
better, cheaper" (Woods 2006).
Differences in handling inconsistent evidence. How teams reacted to data that did
not support their evolving assessments of the anomalies was different in the two
cases. In STS-107, uncertainties and inconsistencies did not stimulate further analysis
and inquiry to resolve the issues. To do so requires processes that search out and bring
additional expertise and alternative perspectives to bear. In STS-107, conclusions
drove or limited the need for analysis, rather than investigations building the evidence
from which one then would (re-)evaluate risks, identify contingencies, and draw
conclusions (i.e., foam strikes were not safety of flight issues until shown otherwise).
However, in STS-76, the stance of management and the assessment teams was that
the anomaly could pose serious implications for the mission, and that the assessment
teams should determine what kinds of safing actions and responses were required to
maintain the safety of the orbiter as well as take into account other objectives.
There were several occasions when the STS-76 teams encountered evidence
that was not consistent with their assessments at the time. For example, at one
point, the operations community suggested that the leaky system should be used
during the descent phase of the mission. This suggestion was not consistent with
the Engineering community’s assessment that the leaky system should not be
used at all. When the Operations community presented this new suggestion, the
engineering team spontaneously came up with several reasons why this
suggestion was risky. Because the opposing assessments were held by
functionally distinct groups, the communities found they had to test each
assessment, and create coherent arguments for why their current assessments
should, or should not be revised—rather than discounting inconsistent evidence
as was done in STS-107. Ultimately, the functionally distinct communities agreed
to choose the conservative approach of not using the leaky system, but this
decision was more robust because more alternatives had been considered and
tested in the process of making the decision.
Differences in assessment team structures. The anomaly response process in STS-
76 began with the participation of functionally distinct teams by default. Not only
were these distinct communities fully engaged in the anomaly response process from
the beginning, but they continued to bump into each other and stay connected to each
others’evolving assessments and response plans. It was through these connections
with each other that the Cooperative Advocacy process emerged. Ironically, the
venue for these iterative connections between the communities was a set of meetings
called by the Mission Management Team, which directed that the
communities should meet with each other to resolve the conflicting assessments that
surfaced in these meetings (Watts-Perotti and Woods 2007).
On the other hand, the anomaly response process in STS-107 was fragmented and
there was a lack of constructive interplay between the Mission Management Team
and the Debris Assessment Team and other groups, due to the original assessment of
low risk and the inability of the Debris Assessment Team to form a clear picture of
the risk based on the modeling and past experiences they could draw on.
Differences in coherence. Due to the iterative connections between the distinct
communities in STS-76, combined with the stance that the leak could pose
serious implications for the mission, a coherent picture of the hydraulic leak
became distributed across all of the communities involved. While the Cooperative
Advocacy approach led the operations and engineering communities to consider
broad sets of alternatives, this approach also ensured that each community had a
more coherent understanding of the anomaly, because they shared the process of
testing the broad set of alternative plans and hypotheses that emerged. This case
sharply contrasts with STS-107 where what was known about the foam strike and
its potential implications was fragmented over people and groups, and over time.
Together the cases illustrate the need for a coherent view observable and
trackable by all relevant groups—which would be especially important for
tracking the status of issues in progress. This need also relates to issues that
appear to be resolved, since in these and other cases of anomaly response, some
apparently closed issues reappear and need to be reopened and reexamined as
situations change and new analytic results come in.
Differences in the ability to determine the need for more investigation. Another
interesting difference between the two cases is in the ability of the decision
makers to determine whether and where they needed to make more efforts and
draw in more expertise. Woods (2005) notes that the decision makers in STS-107
“did not seem able to notice when they needed more expertise, data, and analysis
in order to have a proper evaluation of an issue.”However, the decision makers
in STS-76 had built-in cues about when more investigation was appropriate.
These cues included conflicting assessments, conflicting stances, and conflicting
proposals. For example, when the operations and engineering communities
presented their early assessments of the leak behavior to the Mission
Management Team, they learned that they had conflicting interpretations of the
leak. These conflicting interpretations served as a cue to all that further
investigation was needed to determine what actually happened with respect to
the leak during the ascent phase of the mission.
4. Discussion: strengths of the cooperative advocacy approach
In STS-76, where a Cooperative Advocacy approach emerged, practitioners
developed broad sets of possible hypotheses and replanning alternatives (Watts-
Perotti and Woods 2007). These practitioners did not encounter breakdowns
like fixation in assessment (De Keyser and Woods 1990; Rudolph et al. 2008)
and the inability to detect side effects in re-planning (Klein 2007; Woods and
Hollnagel 2006). Several characteristics of the interactions across the teams
seemed to provide the opportunity to broaden the set of alternatives considered.
Intermixing of Parallel Activities. When the hydraulic leak occurred, the
Operations and Engineering communities began anomaly analysis and
response independently, in parallel (Watts-Perotti and Woods 2007). The
Operations community determined how the anomaly would affect the rest of
the flight, as the Engineering community determined what would need to
happen after the shuttle landed. Since the groups conducted their activities
toward distinct goals in parallel, they “had the opportunity to form distinct and
possibly divergent ideas about the anomalous situation”(Watts-Perotti and
Woods 2007). For example, the two communities used different methods for
analyzing the leak rate, which led to conflicting stances about whether the leak
rate increased or remained stable.
As the mission progressed, both communities began to “merge or align their
perspectives through the coordinative meetings”(Watts-Perotti and Woods 2007).
They independently built stances toward a set of issues, and then integrated or
debated these stances in the meetings. This process allowed the communities to
“mix their distinct ideas to produce a broad set of alternative hypotheses and
plans for future procedures”(Watts-Perotti and Woods 2007).
Revising Assessments and Re-planning Approaches. Conflicting assessments
across the communities in STS-76 led to self-monitoring and deeper data review.
When teams knew their stance conflicted with other teams' positions, they
collected more data to support their view in preparation for upcoming
coordinative meetings. “These activities not only helped teams argue their
position during meetings, but also served as an opportunity for teams to re-
examine their own assessments, assumptions, and positions”(Watts-Perotti and
Woods 2007). In the STS-76 case, the conflicting stances about whether the leak
rate increased led to further analysis of the leak. This refined analysis allowed the
Operations community to detect that their earlier analysis was incorrect, leading
them to revise their assessment of the leak rate. In the STS-107 mission, weak
and erroneous analyses were accepted as if they had a much more rigorous basis
in data (CAIB 2003; Zelik et al. 2007b). The presence of distinct groups can
therefore provide an opportunity for cross-checks and revision of assessments
(Patterson et al. 2004; Hong and Page 2002; Page 2007).
Explicit Representation of Tradeoffs. The distinct backgrounds, expertise, and
authority of the Operations and Engineering communities led them to advocate
different sides of several complex tradeoffs during re-planning for STS-76. These
tradeoffs arose from conflicting stances that emerged across the communities.
“By advocating opposing stances, the functionally distinct communities gained
the opportunity to examine the tradeoffs explicitly, which led the communities to
consider a broader region of the total solution space and to generate better ways
to balance all of the relevant constraints and side effects” (Watts-Perotti and
Woods 2007).
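To make the idea of an explicit tradeoff representation concrete in design terms, the sketch below (in Python; all names and example entries are hypothetical, and the cases themselves prescribe no implementation) lays candidate replanning options against a common set of named constraints, so that the stances advocated by different communities can be examined side by side rather than remaining implicit in separate analyses.

```python
from dataclasses import dataclass, field

@dataclass
class PlanOption:
    """A candidate replanning option and its assessed impact on each named constraint."""
    name: str
    advocate: str                                  # community that proposed the option
    impacts: dict = field(default_factory=dict)    # constraint -> assessed impact note

def tradeoff_table(options, constraints):
    """Lay the options side by side so every constraint is examined for every option."""
    rows = []
    for constraint in constraints:
        row = {"constraint": constraint}
        for opt in options:
            # A missing entry is flagged rather than silently dropped:
            # an unexamined constraint is itself a cue for further analysis.
            row[f"{opt.name} ({opt.advocate})"] = opt.impacts.get(constraint, "NOT YET ASSESSED")
        rows.append(row)
    return rows

# Hypothetical example, loosely patterned on the STS-76 replanning discussions.
options = [
    PlanOption("shorten mission", "Operations",
               {"hydraulic fluid remaining": "preserved",
                "mission objectives": "partially lost"}),
    PlanOption("fly nominal plan", "Engineering",
               {"hydraulic fluid remaining": "depends on leak rate",
                "landing-day workload": "nominal"}),
]
for row in tradeoff_table(options, ["hydraulic fluid remaining",
                                    "mission objectives",
                                    "landing-day workload"]):
    print(row)
```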
Avoiding Premature Closure. Cooperative Advocacy is our label for a process
that emerged over groups in STS-76 and was absent in STS-107. As a result of a
process of Cooperative Advocacy in STS-76, opportunities for cross-checks and
broadening checks arose and resulted in error correction, repair of common
ground, and identification of additional alternatives and contingencies. In the case
of STS-107, interactions across groups were fragmented, and neither cross-checks
nor broadening checks were identified in the processes leading up to or during the
Columbia disaster (CAIB 2003).
More broadly, the differences in process during the two cases speak to factors
that facilitate or hinder an ongoing process of deciding whether to invest, gather,
or commit more resources (time, expertise, economic, social, and physical
resources) to the analysis and re-planning process (see Zelik et al. 2007a, b).
Cooperative Advocacy implicitly guided decisions about when to increase resource investment
to more fully understand the situation and more fully explore contingencies in
replanning. In the Columbia case, there was no implicit or explicit process to
facilitate escalating assessment resources, and the result was a rather dramatic
premature closure in analysis of the debris strike and its implications.
Avoiding premature closure during anomaly response requires treating all data
discrepancies as anomalies until analysis adequately characterizes the situation
and its implications for plans in progress. The key question is what is adequate
analysis, or what NASA and Zelik et al. (2007b) have labeled as the sufficient
rigor question: how does one assess the rigor of an analysis and how does one
decide what level of rigor is sufficient given the context and pressures?
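One way to picture the sufficient rigor question operationally is as a context-dependent check on attributes of the analysis process. The fragment below is only an illustrative sketch under our own assumptions: the attribute names, scoring, and thresholds are hypothetical and are not the rigor metric reported by Zelik et al. (2007a, b).

```python
# Illustrative only: the attribute names and thresholds below are hypothetical
# and are not the published rigor attributes of Zelik et al. (2007a, b).
RIGOR_ATTRIBUTES = [
    "alternative hypotheses explored",
    "independent data sources consulted",
    "assumptions and boundary conditions stated",
    "cross-community review performed",
]

def rigor_score(analysis: dict) -> float:
    """Fraction of the listed process attributes the analysis can demonstrate."""
    return sum(bool(analysis.get(attr)) for attr in RIGOR_ATTRIBUTES) / len(RIGOR_ATTRIBUTES)

def is_sufficient(analysis: dict, stakes: str) -> bool:
    """Sufficiency is context dependent: higher-stakes decisions demand more rigor."""
    threshold = {"routine": 0.5, "safety of flight": 1.0}.get(stakes, 0.75)
    return rigor_score(analysis) >= threshold

# The same analysis may be rigorous enough for a routine call
# but not for a safety-of-flight call.
analysis = {
    "alternative hypotheses explored": True,
    "independent data sources consulted": True,
    "assumptions and boundary conditions stated": False,
    "cross-community review performed": False,
}
print(is_sufficient(analysis, "routine"))            # True  (score 0.5 >= 0.5)
print(is_sufficient(analysis, "safety of flight"))   # False (score 0.5 < 1.0)
```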
Following up every discrepancy in a fully rigorous manner would be very
resource-consuming. Inevitably, such thorough follow-up will be seen in hindsight as
unnecessary ‘overkill’, as many discrepancies occur in physical systems that will neither
prove worth the diagnostic effort nor have significant implications for plans. On the
other hand, not running down every discrepancy thoroughly increases the risk of
falling into one or another of the vulnerabilities that can plague anomaly response.
This workload/thoroughness tradeoff is at the heart of what it means to be effective in
anomaly response and in problem detection (Klein et al. 2005). The resolution to the
tradeoff is the ability to continually test and revise the answer to the sufficient rigor
question. What cues signal that additional analysis and the associated resources are
needed? These two cases instantiate many such cues, including ambiguities,
differences, narrowness, conflicts, uncertainty, and violations of boundary conditions
or assumptions. Many of these cues are difficult to see as such from a single person,
single group, or single perspective, but they become more evident, and more
justifiable in after-the-fact review, when they appear as differences across diverse people,
groups, and perspectives (Page 2007). The Cooperative Advocacy process also provides a natural
means for escalating analytical and replanning activity as differences across groups
emerge and inherently signal the need for more resources to be invested to inform,
pursue, and resolve the differences.
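To illustrate that last point, a distributed support system could treat differences across groups as machine-detectable escalation cues. The sketch below is a hypothetical fragment (the names and structure are ours, not those of any mission control tool): when groups register divergent stances on the same open question, the divergence itself is surfaced as a cue to invest additional analysis resources rather than allowing one stance to quietly prevail.

```python
from collections import defaultdict

class AssessmentBoard:
    """Collects each group's stance on each open question and surfaces conflicts."""
    def __init__(self):
        self.stances = defaultdict(dict)   # question -> {group: stance}

    def register(self, question: str, group: str, stance: str) -> None:
        self.stances[question][group] = stance

    def escalation_cues(self):
        """Questions on which groups disagree: cues to invest more analysis resources."""
        return [(q, stances) for q, stances in self.stances.items()
                if len(set(stances.values())) > 1]

# Hypothetical use, patterned on the STS-76 leak-rate disagreement.
board = AssessmentBoard()
board.register("leak rate trend", "Operations", "increasing")
board.register("leak rate trend", "Engineering", "stable")
for question, stances in board.escalation_cues():
    print(f"ESCALATE: '{question}' is contested -> {stances}")
```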
The contrast between these two cases also reveals that organizations need to
consciously develop and sustain processes like Cooperative Advocacy. After all, it is
the same organization that provides us with samples of strong and weak process in
these two cases. More research cases of effective and ineffective anomaly response
processes can provide insights that enable organizations to design and sustain
processes that make anomaly response robust despite the inherent complexities.
5. Conclusions
This paper summarizes observations from a rare opportunity to contrast two actual
multi-day cases of anomaly response in the same domain. The salient differences
between the two cases can be used to inform the design and development of systems to
support anomaly response in domains where functionally distinct teams of practitioners
must coordinate their efforts. The following conclusions inform the design, operation,
and management of systems that may be called on to perform anomaly response. Some
of the points are relevant to systems that include human practitioners in the loop. Others
could inform the design of more automated anomaly response systems.
• It is crucial that organizations take the stance that all discrepancies are
significant anomalies until analysis demonstrates what the actual
implications and uncertainties are. For example, active evidence gathering and
analysis is always needed, despite the time and resource costs involved, to
establish conclusions. Organizations should not continue acting in accordance
with previously held assessments and just wait for disconfirming
evidence to come to the fore (too often such disconfirming indicators are
present but get sidelined on the way to critical decision bodies).
• Mechanisms for broadening checks increase the robustness of anomaly
response. For example, such checks help identify side effects of decisions or
actions in re-planning that might otherwise be missed, and such checks help
management recognize when (and what) additional resources need to be
brought to bear to re-plan effectively. Broadening checks can help reveal the
level of rigor behind assessments and recommendations. For example,
several times in STS-107 decisions were based on very low rigor
assessments that on the surface appeared to be of higher rigor.
• Mechanisms for cross-checks increase the robustness of anomaly response. For
example, such checks help groups recognize when current assessments are in
need of revision and help them recover from mis-assessments that arise in the
process of trying to understand and respond to anomalies. In particular, cross-
checks are needed to test whether new events are consistent with past
experience and therefore consistent with applying standard analysis processes
and standard plans, or whether the events fall outside past experience,
assumptions and bounding conditions and require new analytical approaches
and re-planning processes (in STS-107, several versions of this arose: is a
specific foam strike in-family or out-of-family? does the event fall inside or
outside the assumptions of the available predictive tools?).
• Means for involving diverse perspectives in analysis processes are essential
for creating broadening and cross-checks (though they are not the only method). A
central goal is to avoid errors of premature narrowing.
• Means for conflict detection and resolution are needed and must be
grounded in technical processes that increase the rigor of analyses. An
effective anomaly response system needs to make salient any conflicting
assessments that exist across disparate teams. The system must also include
a means for resolving these conflicts based on data/uncertainties about the
anomaly and risk management over contingencies.
• A central place, process, or person is needed to provide an integrated picture
of the state of the response to the anomaly and to make this salient to all other
groups who may contribute to or be affected by the anomaly and the
organization’s response. The integrated picture of the current state of the
anomaly response process should emphasize uncertainties, alternative
interpretations, and loose ends (a minimal sketch of such a representation
follows this list). This integrative function is needed to
maintain coherence given the disruptive forces that can arise and dominate
activities as a result of time pressures, resource pressures, and other
difficulties inherent in anomaly response.
• Assessments about anomaly response processes, such as assessments of the
rigor of the process, are critical to managing the appropriate level of
investment in analysis and replanning, and to decisions about what additional
forms of expertise need to be brought to bear to adequately understand the
events and their implications.
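As a rough illustration of the conflict detection and integrated picture points above, the sketch below (again with hypothetical names; the cases do not specify an implementation) maintains one shared record of the anomaly response state that foregrounds conflicting interpretations, open uncertainties, and loose ends, giving the integrative function described above something concrete to publish to all affected groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AnomalyPicture:
    """Shared, integrated view of an anomaly response, biased toward what is unresolved."""
    anomaly: str
    interpretations: Dict[str, str] = field(default_factory=dict)  # group -> current stance
    uncertainties: List[str] = field(default_factory=list)
    loose_ends: List[str] = field(default_factory=list)

    def conflicts(self) -> bool:
        """True when groups currently hold divergent interpretations."""
        return len(set(self.interpretations.values())) > 1

    def briefing(self) -> str:
        """What gets pushed to every affected group: conflicts and loose ends first."""
        lines = [f"Anomaly: {self.anomaly}"]
        if self.conflicts():
            lines.append("CONFLICTING INTERPRETATIONS: " + "; ".join(
                f"{group}: {stance}" for group, stance in self.interpretations.items()))
        lines += [f"Open uncertainty: {u}" for u in self.uncertainties]
        lines += [f"Loose end: {item}" for item in self.loose_ends]
        return "\n".join(lines)

# Hypothetical content, loosely patterned on the STS-107 debris strike.
picture = AnomalyPicture(
    anomaly="debris strike on ascent",
    interpretations={"Engineering": "possible tile damage", "Management": "in-family event"},
    uncertainties=["impact location", "depth of damage"],
    loose_ends=["imagery request not yet resolved"])
print(picture.briefing())
```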
The comparison of the STS-76 and STS-107 cases provides insights into
factors that affect the robustness of the anomaly response process despite the
inherent complexities and vulnerabilities. Interestingly, what makes anomaly
response robust is how different groups interact and utilize diverse perspectives to
develop a deeper understanding of the events in question and a broader
exploration of the implications for future plans and for critical goals. Designers
of CSCW systems can follow the recommendations gained from the comparison
of these two cases to increase the robustness of organizations and systems that
will be called upon to respond to anomalies, emergencies and crises.
References
Branlat, M., S. Anders, D.D. Woods and E.S. Patterson (2008): Detecting an Erroneous Plan: Does
a System Allow for Effective Cross-Checking? In E. Hollnagel, C. Nemeth and S.W.A. Dekker
(eds): Resilience Engineering: Remaining Sensitive to the Possibility of Failure. Aldershot:
Ashgate, pp. 247–257.
Columbia Accident Investigation Board (2003): Columbia Accident Investigation Board Report.
Washington, DC: U.S. Government Printing Office.
De Keyser, V. and D.D. Woods (1990): Fixation Errors: Failures to Revise Situation Assessment in
Dynamic and Risky Systems. In A.G. Colombo and A. Saiz de Bustamante (eds): Systems
Reliability Assessment (pp. 231–251). Dordrecht: Kluwer Academic.
Elm, W., S. Potter, J. Tittle, D.D. Woods, J. Grossman and E.S. Patterson (2005): Finding decision
support requirements for effective intelligence analysis tools. In Proceedings of the Human
Factors and Ergonomics Society 49th Annual Meeting. Santa Monica, CA: Human Factors and
Ergonomics Society.
Fischer, U. and J. Orasanu (2000): Error-challenging strategies: Their role in preventing and
correcting errors. In Proceedings of the International Ergonomics Association 14th Triennial
Congress and Human Factors and Ergonomics Society 44th Annual Meeting in San Diego,
California, August 2000.
Flin, R. and K. Arbuthnot (eds) (2002): Incident Command: Tales from the Hot Seat. Aldershot: Ashgate.
Flin, R., G. Slaven and K. Stewart (1996): Emergency decision making in the offshore oil and gas
industry. Human Factors, vol. 38, pp. 262–277. doi:10.1518/001872096779048110.
Gaba, D.M., M. Maxwell and A. DeAnda (1987): Anesthetic mishaps: breaking the chain of
accident evolution. Anesthesiology, vol. 66, pp. 670–676.
Garrett, S.K. and B.S. Caldwell (2002): Mission Control Knowledge Synchronization: Operations
To Reference Performance Cycles. Proceedings of the Human Factors and Ergonomics Society
46th Annual Meeting, Baltimore, MD.
Gettys, C.F., R.M. Pliske, C. Manning and J.T. Casey (1987): An Evaluation of Human Act
Generation Performance.
Hong, L. and S.E. Page (2002): Groups of Diverse Problem Solvers Can Outperform Groups of
High-ability Problem Solvers. Proceedings of the National Academy of Sciences: Economic
Sciences, vol. 101, pp. 16385–16389.
Johnson, P.E., K. Jamal and R.G. Berryman (1991): Effects of framing on auditor decisions.
Organizational Behavior and Human Decision Processes, vol. 50, pp. 75–105. doi:10.1016/
0749-5978(91)90035-R.
Klein, G., R. Pliske, B. Crandall and D. Woods (2005): Problem detection. Cognition Technology
and Work, vol. 7(1), pp. 14–28. doi:10.1007/s10111-004-0166-y.
Klein, G., B. Moon and R.R. Hoffman (2006): Making sense of sensemaking 2: a macrocognitive
model. IEEE Intelligent Systems, vol. 21(5), pp. 88–92. doi:10.1109/MIS.2006.100.
Klein, G. (2007): Flexecution 1: flexible execution as a paradigm for re-planning. IEEE Intelligent
Systems, vol. 22(5), pp. 79–83. doi:10.1109/MIS.2007.4338498.
Layton, C., P.J. Smith and C.E. McCoy (1994): Design of a cooperative problem-solving system for
en-route flight planning: an empirical evaluation. Human Factors, vol. 36, pp. 94–119.
Mark, G. (2002): Extreme collaboration. Communications of the ACM, vol. 45, pp. 89–93.
doi:10.1145/508448.508453.
Militello, L.G., E.S. Patterson, L. Bowman and R. Wears (2007): Information flow during crisis
management: challenges to coordination in the emergency operations center. Cognition
Technology and Work, vol. 9, pp. 25–31. doi:10.1007/s10111-006-0059-3.
Page, S.E. (2007): The Difference: How the Power of Diversity Creates Better Groups, Firms,
Schools, and Societies. Princeton: Princeton University Press.
Patterson, E.S. and D.D. Woods (2001): Shift changes, updates, and the on-call model in space
shuttle mission control. Computer Supported Cooperative Work: The Journal of Collaborative
Computing, vol. 10, pp. 317–346.
Patterson, E.S., J.C. Watts-Perotti and D.D. Woods (1999): Voice loops as coordination aids in
space shuttle mission control. Computer Supported Cooperative Work, vol. 8, pp. 353–371.
doi:10.1023/A:1008722214282.
Patterson, E.S., E.M. Roth and D.D. Woods (2001): Predicting vulnerabilities in computer-
supported inferential analysis under data overload. Cognition Technology and Work, vol. 3(4),
pp. 224–237. doi:10.1007/s10111-001-8004-y.
Patterson, E.S., R.I. Cook, D.D. Woods and M.L. Render (2004): Examining the complexity
behind a medication error: generic patterns in communication. IEEE SMC Part A, vol. 34,
pp. 749–756.
Return to Flight Task Group (2005): Return to Flight Task Group Final Report. Washington, DC:
U.S. Government Printing Office.
Rudolph, J.W., J.B. Morrison and J.S. Carroll (2008): The Dynamics of Action-Oriented Problem
Solving: Linking Interpretation and Choice. Academy of Management Review, in press.
Shalin, V.L. (2005): The roles of humans and computers in distributed planning for dynamic
domains. Cognition Technology and Work, vol. 7, pp. 198–211. doi:10.1007/s10111-005-0186-2.
Shattuck, L.G. and D.D. Woods (2000): Communication of Intent in Military Command and
Control Systems. In Carol McCann and Ross Pigeau (eds): The Human in Command:
Exploring the Modern Military Experience (pp. 279–292), New York: Kluwer Academic/
Plenum Publishers.
Smith, P.J., E. McCoy and C. Layton (1997): Brittleness in the design of cooperative problem-
solving systems: the effects on user performance. IEEE Transactions on Systems, Man, and
Cybernetics, vol. 27, pp. 360–371. doi:10.1109/3468.568744.
Smith, P.J., M. Klopfenstein, J. Jezerinac and A. Spenser (2004): Distributed Work in the National
Airspace System: Providing Feedback Loops Using the Post-Operations Evaluation Tool
(POET). In B. Kirwan, M. Rodgers and D. Schaefer (eds): Human Factors Impacts in Air
Traffic Management (pp. 127–152), London: Ashgate.
Smith, P.J., A.L. Spenser and C.E. Billings (2007): Strategies for designing distributed systems: Case
Studies in the design of an Air Traffic Management System. Cognition Technology and Work,
vol. 9, pp. 39–49.
Starbuck, W.H. and M. Farjoun (eds) (2005): Organization at the Limit: NASA and the Columbia
disaster. Malden, MA: Blackwell.
Watts-Perotti, J. and D.D. Woods (2007): How anomaly response is distributed across functionally
distinct teams in space shuttle mission control. Journal of Cognitive Engineering and Decision
Making, vol. 1(4), pp. 405–433. doi:10.1518/155534307X264889.
Woods, D.D. (2005): Creating Foresight: Lessons for Resilience from Columbia. In W.H. Starbuck
and M. Farjoun (eds): Organization at the limit: NASA and the Columbia Disaster (pp. 289–
308), Malden: Blackwell.
Woods, D.D. (2006): Essential Characteristics of Resilience for Organizations. In E. Hollnagel,
D.D. Woods and N. Leveson (eds): Resilience Engineering: Concepts and Precepts. Aldershot:
Ashgate.
Woods, D.D. and E.S. Patterson (2000): How Unexpected Events Produce an Escalation of
Cognitive and Coordinative Demands. In P.A. Hancock and P. Desmond (eds.): Stress Workload
and Fatigue (pp. 290–302), Hillsdale NJ: Lawrence Erlbaum.
Woods, D.D. and E. Hollnagel (2006): Joint Cognitive Systems: Patterns in Cognitive Systems
Engineering. Boca Raton: Taylor & Francis.
Woods, D.D., J. O’Brien and L.F. Hanes (1987): Human Factors Challenges in Process Control:
The Case of Nuclear Power Plants. In G. Salvendy (ed): Handbook of Human Factors/
Ergonomics (first edition, pp. 1724–1770). New York: Wiley.
Woods, D.D., S.W.A. Dekker, R.I. Cook, L.L. Johannesen and N.B. Sarter (in press): Behind
Human Error (Second Edition). Aldershot: Ashgate.
Zelik, D., E.S. Patterson and D.D. Woods (2007a): The Impact of Process Insight on Judgment of
Analytic Rigor in Information Analysis. In Proceedings of the Human Factors and Ergonomics
Society 51st Annual Meeting. October 1–5, Baltimore, MD.
Zelik, D., E.S. Patterson and D.D. Woods (2007b): Understanding Rigor in Information Analysis:
The Role of Rigor in Professional Intelligence Analysis. In K. Mosier and U. Fischer (eds):
Proceedings of Naturalistic Decision Making 8, June 2007.