Computer Supported Cooperative Work 10: 317–346, 2001.
© 2001 Kluwer Academic Publishers. Printed in the Netherlands.
Shift Changes, Updates, and the On-Call
Architecture in Space Shuttle Mission Control
EMILY S. PATTERSON
and DAVID D. WOODS
Institute for Ergonomics, Cognitive Systems Engineering Laboratory, Ohio State University, 210
Baker Systems, 1971 Neil Ave., Columbus, OH 43210, U.S.A.
(author for correspondence, E-mail: firstname.lastname@example.org)
(Received 14 July 2000)
Abstract. In domains such as nuclear power, industrial process control, and space shuttle mission
control, there is increased interest in reducing personnel during nominal operations. An essential
element in maintaining safe operations in high risk environments with this ‘on-call’ organizational
architecture is to understand how to bring called-in practitioners up to speed quickly during esca-
lating situations. Targeted ﬁeld observations were conducted to investigate what it means to update a
supervisory controller on the status of a continuous, anomaly-driven process in a complex, distributed
environment. Sixteen shift changes, or handovers, at the NASA Johnson Space Center were observed
during the STS-76 Space Shuttle mission. The ﬁndings from this observational study highlight the
importance of prior knowledge in the updates and demonstrate how missing updates can leave
ﬂight controllers vulnerable to being unprepared. Implications for mitigating risk in the transition
to ‘on-call’ architectures are discussed.
Key words: anomaly, common ground, decision, ethnography, event, knowledge, mutual awareness,
observation, plan, shift change, update
1. ‘On-call’ architecture in supervisory control
In supervisory control domains such as nuclear power, industrial process control,
and space shuttle mission control, there has been a widespread trend away from
large deployments of personnel who continuously monitor dedicated subsets of
process data and toward minimal staffing until a problem arises, at which
time additional resources are called in. This ‘on-call’ architecture has the potential
to reduce operational expenses by using the full reservoir of resources only when
For example, during the STS-75 Space Shuttle mission, a tethered scientiﬁc
satellite unexpectedly separated from the shuttle. As a result, two ﬂight controllers
were immediately called in to support the nominally staffed controller respon-
sible for the mechanical systems on the shuttle. The ﬁrst controller took over the
standard operations for the nominally staffed controller. This substitution allowed
the nominally staffed ﬂight controller to work with the second called-in controller
on developing a way to prevent the astronauts from being electrically shocked when
recapturing the satellite.
By deﬁnition, with the on-call architecture, personnel are brought in only when
a situation is unusual, has begun to deteriorate, or involves high stakes. They are
called in as part of an escalation of cognitive and coordinative activities (Woods
and Patterson, 2001). There is inherently a ‘workload double-bind’ (Woods et al.,
1994b) in that when the on-call practitioner is most needed to provide additional
resources and expertise, the staffed practitioner has the least time to update the
incoming practitioner and to coordinate to redistribute the workload.
Therefore, in order to ensure that the on-call architecture functions effectively,
we need to identify ways to quickly bring incoming practitioners ‘up to speed’
without tying up the resources of the staffed practitioners during critical periods.
The goal of this research was to better understand how practitioners are currently
brought ‘up to speed’ in a complex, dynamic supervisory control setting. To this
end, targeted ﬁeld observations were conducted of updates in space shuttle mission
control: shift change handovers between mechanical ﬂight controllers during the
The goal of the shift change handover is to prevent a break in the ﬂow of the
monitored process and activities conducted by the ﬂight controllers when there is
a change in personnel (e.g., Grusenmeyer, 1995, shift changes in paper mills). A
successful handover is deﬁned by a smooth continuity of operations from one shift
to the next. There are two senses to this deﬁnition. The ﬁrst is to avoid a rift in
terms of interactions with others and ongoing activities being conducted. In other
words, the work should continue as if the operator had never been replaced. The
second is for the incoming operator to understand what had happened as if he or
she had been present and personally engaged in all the activities. The handover
update is given to avoid having an incoming practitioner:
• have an incorrect or incomplete model of the process state,
• be unaware of signiﬁcant data or events,
• be unprepared to deal with impacts from previous events,
• fail to anticipate events,
• lack knowledge that is necessary to perform relevant tasks,
• drop or rework activities that are in progress or that the team has agreed to do,
• create an unwarranted shift in goals, decisions, priorities, or plans.
The paper is organized as follows. We introduce the domain of space shuttle
mission control, including how responsibility is hierarchically distributed and
handovers are nominally conducted. We describe how the observational data was
analyzed and provide an overview of the observed STS-76 mission. The study
ﬁndings are then described. Implications of these ﬁndings for mitigating risk in
two on-call scenarios are discussed.
2. Overview of space shuttle mission control
HIERARCHICAL DISTRIBUTION OF RESPONSIBILITY IN MISSION
Ground-based mission control for the Shuttle Program at the NASA Johnson Space
Center (JSC) is responsible for supporting the crew in meeting the objectives of
the mission and for ensuring the health of the spacecraft while in ﬂight. The Flight
Director (Flight) acts as the central decision maker and coordinates the inform-
ation ﬂow between the various ﬂight controllers responsible for subsystems of
the shuttle. Flight and the approximately sixteen main controllers sit in assigned
positions in the ‘front room’. Various support personnel, known as the ‘back room
controllers,’ support the front room controllers. For example, for the Maintenance,
Mechanical, Arm, and Crew Systems (MMACS) team, the front room position
is called MMACS and the back room controllers are Mechanical (Mech I and
II), In-Flight Maintenance (IFM), Photo/TV, and Escape. The observations were
conducted at the back room Mech I console position. The Mech is responsible
for the health and safety of mechanical systems such as power units, heaters, and
payload bay doors.
The MMACS team is responsible for ensuring the health and safety of the
orbiter’s structural and mechanical subsystems during a mission. Flight controllers
do much more than continuously monitor system parameters for anomalous read-
ings. Although this is a critical task, there are many subtleties and complexities
in the functions that they fulﬁll, particularly since surprising events are common
during space missions. Controllers must exhibit creativity, the ability to work with
others, and deep knowledge not only of their mechanical systems but also of the
rationales and risk trade-offs behind the ﬂight rules so that they can be applied or
modiﬁed for the speciﬁc circumstances.
Three shifts a day are scheduled when the shuttle is in orbit. Each shift change, or
handover, is scheduled for one hour.
Handover updates are designed to have information ﬂow bottom-up through the
hierarchy of the incoming shift. For every position, the outgoing ﬂight controller
updates the incoming controller, both physically co-located at their assigned
consoles. These primary brieﬁngs are essentially private (i.e., without using the
voice loop communication system described in Patterson et al., 1999), with the
convention that no one is allowed to interrupt these communications. After the
intensity of the primary brieﬁngs has died down, the incoming back room control-
lers (e.g., Mech) brief the incoming front room controllers (e.g., MMACS). These
brieﬁngs are used to check the understanding from the primary brieﬁngs and
coordinate the activities to be conducted during the shift. This update is conducted
on a dedicated voice loop channel (e.g., MMACS ALT) on which the ﬂight control-
lers speak using headphones with audio hookups so that other controllers can listen
in on the communications. In parallel with the brieﬁngs from the back room to
the front room controllers, the incoming front room controllers give the incoming
Flight Director a short, high-level update on a voice loop that is dedicated for this
purpose (AFD CONF). These brieﬁngs are closely monitored by the entire mission
control center, which serves to check the shared understanding of the situation
following the various discipline handovers.
Sixteen of twenty-seven handovers in the mission control center (MCC) at the
NASA Johnson Space Center were directly observed during the Space Shuttle
mission STS-76 [EP]. The 16 observed handovers were divided between the three
shift transitions (5, 6, and 5). The naturally occurring verbal behavior was audio-
taped. In order to minimize the effect of observation on the ﬂight controllers’
behavior, previous observations had been conducted with the controllers and ques-
tions to clarify the content of the handovers were asked only after the handover was
The raw data included ﬁeld notes of face-to-face and voice loop verbal commu-
nications and copies of ﬂight documentation such as handwritten logs and ﬂight
plans. The data was analyzed iteratively, using theoretical frameworks to recognize
and abstract relevant patterns (Hollnagel et al., 1981). Process tracing proto-
cols (Woods, 1993) for each handover were created that described the activities
in domain-independent terms and separated the communications made by the
different participants (Figure 1). One-page summaries for each handover were
generated and patterns across the handovers related to the research question were
As has been noted by many ethnographic, or “cognition in the wild”,
researchers, observation and analysis are heavily influenced by the theoretical
frameworks that are used to recognize and abstract patterns in complex data. Three
frameworks in particular guided the observation and data analysis in this study:
(1) dynamic fault management, (2) distributed replanning in anomaly response,
and (3) common ground in communication.
The ﬁrst conceptual framing of the ﬂight controller’s task was dynamic fault
management (Woods, 1994a). With this framing, a controller recognizes unex-
pected ﬁndings in the data stream, conducts diagnostic searches, and generates
hypotheses about faults that could account for the observed pattern of disturbances.
This reasoning process goes on in parallel with interventions intended to either
protect systems, i.e., “saﬁng” interventions, or to gather additional information,
i.e., diagnostic interventions. For a difﬁcult anomaly, there can be challenges in
diagnosing the anomaly, ﬁguring out its impacts on related subsystems, performing
saﬁng activities in parallel with troubleshooting, and deciding whether or not to
obtain more data. Based on this framework, during the transfer of responsibility
in a shift change, updates were anticipated to potentially include: (1) unexpected
findings in the data stream that might be symptoms of system faults, (2)
diagnostic hypotheses to account for unexpected findings, (3) impacts of faults on
the monitored systems and other agents, (4) cascading events that were triggered
by a system fault, (5) diagnostic interventions, and (6) safing interventions.
Figure 1. Using conceptual frameworks to guide data abstraction and analysis.
Second, it was believed that the dynamic fault management framework alone might
not cover several elements that are important in distributed, dynamic, event-driven
supervisory control. Although dynamic fault management could theoretically be
accomplished by a team as well as an individual,
the implicit assumption of the framework is that planning can be conducted by
the agent or agents performing dynamic fault management largely independently
of the goals and plans of other agents. In other words, there are few interde-
pendencies with agents external to the immediate team to take into account with
respect to dynamic fault management activities. In space shuttle mission control,
the coordination required by agents is very complex. Distributed replanning is a
critical component of anomaly response (Woods, 1994a). In distributed replanning,
multiple people supported by computerized systems assess the implications of an
unexpected ﬁnding, or anomaly, for planned future activities, evaluate contingen-
cies, and modify plans in progress. During replanning, coordination across multiple
people in different roles is more complex than assigning and synchronizing tasks.
As part of this coordination, teams of people adopt and portray stances about crit-
ical decisions that affect multiple agents. The concept of stance is a combination of
a position towards a signiﬁcant issue (i.e., a decision a team faces) and the rationale
for that position, which is often predictable given the position on a tradeoff func-
tion associated with particular roles. For example, mechanical systems controllers
might be more concerned with determining the cause of a malfunction than control-
lers primarily tasked with the safety of the astronauts, such as the ﬂight surgeon.
Based on this framework, during the transfer of responsibility in a shift change,
updates were anticipated to potentially include evidence of discussions about: (1)
plans, (2) stances, (3) goals, (4) positions on tradeoff functions, (5) contingencies,
(6) intent, (7) impacts to previously planned activities and expectations within the
team, and (8) impacts to previously planned activities and expectations of other teams.
Third, the goal of the observed updates could be framed as creating and main-
taining a common understanding, or common ground (Clark and Brennan, 1991)
between human agents. This common ground is what would allow the practitioners
to accept the responsibility and authority associated with a position for a period
of time without being taken by surprise. As others have observed, the notion of
common ground is a complex conglomerate of many interdependent elements,
including the interacting elements of: (1) knowledge that is known to be shared
between individuals (Clark and Brennan, 1991; Wegner et al., 1985; Hutchins,
1995), (2) shared goals or intentions, (3) mutual beliefs about the current state
of affairs and the predicted effects of actions on the state of affairs (Clark, 1992;
Suchman, 1987; Clark and Brennan, 1991), (4) shared awareness of others’ activ-
ities and the state of the monitored process, and (5) common frames of reference
(e.g., ﬁxed line diagram in London Underground line control room, Heath and
Luff, 2000). The conceptual framework of common ground inﬂuenced the data
observation and analysis in that updates to relatively ungrounded controllers, such
as the update immediately following ascent, were anticipated to have a different
character than updates based upon a more established common ground. In addi-
tion, deviations from expectations, including unexpected data and changes to the
plan, were expected to be highlighted more than data and plans that conformed to
prior expectations. Finally, it was anticipated that controllers might explicitly use
strategies that built upon existing shared understandings in the updates, such as by
implicitly assuming that some topics and sub-topics would not need to be included
and using coded language to communicate more efﬁciently.
4. The observed STS-76 mission
The STS-76 mission included a rendezvous docking with the MIR Space Station.
As a result, there was a very short liftoff window (seven minutes instead of several
hours) and the MMACS team had to monitor specialized docking mechanical
systems. Due to the additional workload, the back room Mech position was staffed
for the entire ﬂight instead of only during the high-tempo periods such as ascent
and entry, which is the stafﬁng conﬁguration for nominal missions.
The initially scheduled liftoff was postponed for one day because of high winds
and rough seas at Cape Kennedy (Figure 2 provides an overview of the mission
events). The second liftoff attempt began without incident at 2:13 a.m. on March
22, 1996. During ascent, two anomalies in the systems under the responsibility of
the MMACS team were observed: a freeze in the Water Spray Boiler (WSB) that
cools the third Auxiliary Power Unit (APU), and a hydraulic leak on the third APU.
Both anomalies were deﬁnitively diagnosed and neither was severe enough to
require an aborted ascent, so the shuttle attained its planned orbit altitude and most
of the ascent mechanical systems were shut down. The ﬁrst anomaly, the Water
Spray Boiler (WSB) freeze-up, is a relatively common problem with a well-deﬁned
response procedure that mainly involves verifying that the WSB works a day before
entry. Although this procedure could not be implemented because the water spray
boiler was on the same system that had the hydraulic leak, the WSB freeze-up
did not cause an escalation of cognitive and coordinative activities because the
procedural action was not required for several days and the eventual decision to
assume that the WSB would be operational for entry without the standard test was
The second anomaly was signiﬁcant and novel enough to create an escalation
of cognitive and coordinative activities. The MMACS controller with specialized
knowledge of the Auxiliary Power Unit (APU) immediately called himself in,
based on watching the ascent on NASA Select TV, to provide expertise in deciding
whether to shorten the mission duration. The decision was made not to shorten
the mission because the leak was small enough that some capability remained in
the APU system and the leak was unlikely to get much worse during the gener-
ally quiescent orbit conﬁguration. There were cascading repercussions from this
anomaly to several other aspects of the mission as actions were taken to protect the
leaking hydraulic system, both to maintain effective redundancy of critical systems,
and to protect the MIR Space Station from contamination (Table I). Several of
these planned potential actions were debated by additional called-in operational,
engineering, and management personnel to ensure that the plans were robust to
Figure 2. Overview of events and observed handovers in STS-76 mission.
On ﬂight day 8, the decision was made to come home one day early (ﬂight day 9
instead of ﬂight day 10) due to weather predictions at Kennedy Space Center (KSC)
and concern about the reduced redundancy in the APUs due to the hydraulic leak.
On ﬂight day 9, however, both opportunities for entry were waived off because of
fog and unpredictable weather at KSC. The astronauts prepared to spend another
day in orbit, expecting to land on ﬂight day 10.
When the decision was made to stay on orbit for another day, the payload bay
doors were commanded to open but the procedure automatically halted when the
sensor indicated that one latch was still closed. After the crew visually determined
the latches to be in an open configuration, it was assumed that the sensor was
giving an erroneous indication, and the doors were commanded open manually.
They opened without further incident. If the payload bay doors had not opened, the
shuttle would have had to make an immediate emergency landing.
Table I. Changes to plans as a result of the hydraulic leak
Changes to plans                               Rationale
Minimize circulation pump operations           To minimize the use of the leaking APU
Close vent doors before docking with the       To protect the space station from hydraulic
Use a circulation pump instead of an APU to    To reduce the risk of losing redundancy on
  check the flight control system
Use 2/3 APUs for entry                         To avoid relying on the leaking APU
Land at Edwards not Kennedy Space Center       To minimize stress from crosswinds on APUs
On ﬂight day 10, the second landing opportunity at Edwards Air Force Base
was taken after waiving off the ﬁrst landing at KSC due to poor weather conditions.
The decision to land at the less preferred Edwards site, which requires expensive
ground transport back to KSC, was made in order to have better weather conditions,
particularly lower crosswinds. The shuttle therefore touched down at Edwards at
7:29 a.m. March 31, 1996, and responsibility for the orbiter transferred from the
ﬂight controllers at the NASA Johnson Space Center to other NASA organizations.
5. Findings from the observational study
It might be expected that, during the hour scheduled for each handover, the
incoming controller would immediately and continuously receive verbal updates
until the outgoing controller departed. This was not the case in any of the
16 observed handovers. In every handover except two (the handover immediately
following ascent, which the incoming controller had personally observed, and
Handover 9, when the incoming controller read a packet of information left by
the outgoing controller), the controllers engaged in short, high-tempo briefings about
20 minutes after the incoming controller arrived (Table II). During the time prior
to the update, the incoming controller would generally sit next to the outgoing
controller while listening to the voice loops, monitor the data screens, and look
through the ﬂight log and other documentation. One controller (personal commu-
nication) described his opinion about the reason why handover updates often do
not begin immediately upon arrival of the incoming controller:
You can see during handover that one of the first items that would happen is that
the oncoming shift, the incoming shift, would sit down and read the previous
two shifts since he was in. And see what had happened over the 16 hours since
he had been in. They would sit down and discuss it with the person that they’re
taking over from and any other little innuendos that haven’t been mentioned in
the log so that they are well aware that everything that has happened up until
that point in time. Because when that person goes home, you know, they don’t
have any insight. So if there’s anything further coming up ... then they’re not
surprised by it, they know about it and they’re well aware of it. They know
who else is aware of it ... It’s a good system. We couldn’t operate without logs
Table II. Length and start time of observed handover briefings
Handover  Handover    Briefing  Primary briefing  Voice loop      Handover
          start time  (min)     start time (min)  briefing start  duration (min)
2         5:56        14        22                36              42
3         16:18       20        32                68              88
4         14:51       10        0                 N/A             39
5         20:58       15        15                N/A             55
6         23:41       10        0                 33              49
7         1:25        11        28                N/A             39
8         9:36        8         4                 39              42
9         17:00       N/A       40                N/A             N/A
10        1:34        1         6                 N/A             41
11        7:20        8 (5+3)   15                N/A             137
12        9:12        14        0                 N/A             14
13        16:52       9 (2+7)   42                N/A             71
14        9:45        4 (1+3)   47                0               50
15        17:02       13        13                N/A             31
16        11:53       10 (2+8)  27                N/A             40
Avg                   9.87      19.44             35.20           43.33
St Dev                5.17      15.53             24.16           17.17
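As a small worked check of the summary rows in Table II, the sketch below recomputes the average and standard deviation of the voice loop briefing start times from the five numeric entries as read from the table (handovers 2, 3, 6, 8, and 14). The variable names are illustrative, and the use of the sample (n - 1) standard deviation is an assumption; the paper does not state which formula was used.

```python
import statistics

# Voice loop briefing start times (min) for the five handovers in Table II
# that have a numeric entry (handovers 2, 3, 6, 8, and 14), as read from
# the table. "N/A" entries are simply left out.
voice_loop_start_min = [36, 68, 33, 39, 0]

mean_start = statistics.mean(voice_loop_start_min)

# statistics.stdev uses the sample (n - 1) formula. Choosing it here is an
# assumption; the paper does not say how its "St Dev" row was computed.
stdev_start = statistics.stdev(voice_loop_start_min)

print(f"mean = {mean_start:.2f} min, st dev = {stdev_start:.2f} min")
```

Under that assumption, the printed values can be compared directly with the Avg and St Dev entries for the voice loop column.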
Also during the handover time, the incoming controllers would occasionally
brief their incoming superiors over the voice loops, who would then brief their
superior, the incoming ﬂight director. The position responsibility was ofﬁcially
handed over when the incoming controller switched from an alternate to a primary
team voice loop and the handover ofﬁcially ended when the ﬂight director from
the outgoing shift verbally released the outgoing controllers via the Flight Director
voice loop. In several instances, outgoing controllers stayed beyond the ofﬁcial
end of the handover to perform speciﬁc activities or attend meetings related to the
hydraulic leak anomaly.
The ﬁndings from the observations highlight the inﬂuence of prior knowledge
on the updates and how missing updates can leave ﬂight controllers vulnerable to
being surprised or unprepared. First, the incoming controllers initiated many of the
topics in the handover updates, demonstrating shared knowledge about what topics
would be important to cover in the handover. Second, incoming controllers were
observed to ask highly specific questions that indicated detailed knowledge of
the current status of a particular topic, relieving the outgoing controller of much
of the work of determining what the incoming controller needed to learn. Third,
the content of the handovers heavily emphasized events and
activities, data analyses, and decisions that were triggered by the escalating event
of the hydraulic leak anomaly. Finally, although many of the updates were effective
in bringing the incoming controllers up to speed, an incident was observed where
a controller was surprised by a request to close the vent doors because he had not
been updated that there had been a reversal to a prior decision not to close the
MIXED-INITIATIVE INTERACTIONS: TOPIC INITIATIONS BY INCOMING
AND OUTGOING CONTROLLERS
Handover updates ﬂuidly shifted from one topic to another. Handover 13 (Figure 3)
between the outgoing and incoming back room Mech controllers is used to illus-
trate how topics were initiated during the handover updates. Above each line is a
description of the topic that is introduced by either the outgoing ﬂight controller
(on the left) or the incoming ﬂight controller (on the right) and below the line is
the beginning of the dialogue on that topic. The entire brieﬁng took nine minutes,
divided into two segments of 2 and 7 minutes due to a pre-arranged side meeting
with another person. The update, like all of the handovers, began with a recogniz-
able signal that the controller was willing to initiate the brieﬁng: “Anything going
on?” Following this initial question, the controllers began discussing a meeting
between the mission controllers and engineers about impacts to the operational
plan due to the hydraulic leak anomaly. Many of the other topics discussed during
the handover were continuations of ongoing replanning efforts for entry procedures
as a result of the hydraulic leak in the auxiliary power unit, particularly contingency
planning for cases such as loss of another auxiliary power unit or high crosswinds.
Note that at the end of the update, the incoming controller re-initiated a previous
topic, changes to the shutdown procedure for the auxiliary power unit. This is likely
because he wanted to engage in a lengthier debate on the topic than would have
been appropriate earlier in the brieﬁng.
Figure 3. Topic initiations in handover 13.
It is a clear pattern across multiple handover updates that topics were initiated
by both outgoing and incoming controllers. Since the controller who worked the
previous shift should theoretically have more knowledge than the person being
updated, the expectation would be that the outgoing controller would initiate most
of the topics. Nevertheless, incoming controllers initiated many of the topics in
the handover updates (Figure 4). At an α level of 0.01 with the t-distribution, the
confidence interval for topics initiated by incoming controllers is [1.5, 7.0], which
lies entirely above zero. In addition, a one-tailed t-test comparison of the number
of topics initiated by outgoing and incoming controllers gives a p value of 0.08,
which is suggestive but not conclusive that outgoing controllers initiated somewhat
more of the topics in the handover
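To make the confidence interval statement above concrete, the following is a minimal sketch of how a t-based interval for a mean count could be computed. It assumes the interval is for the mean number of topics initiated by the incoming controller per handover and that all 16 handovers were used (so 15 degrees of freedom); neither assumption is stated in the text. The per-handover counts below are hypothetical placeholders, not the observed data, which appear only in Figure 4.

```python
import math
import statistics

# HYPOTHETICAL per-handover counts of topics initiated by the incoming
# controller. These are illustrative placeholders only; the observed
# counts are reported in Figure 4 and are not reproduced here.
incoming_topics = [4, 2, 6, 3, 5, 1, 7, 4, 0, 5, 3, 6, 4, 8, 5, 2]

# Two-sided critical value of Student's t for alpha = 0.01 with 15 degrees
# of freedom (n = 16), taken from a standard t table. The choice of n = 16
# is an assumption.
T_CRIT_ALPHA_01_DF_15 = 2.947


def t_confidence_interval(samples, t_crit):
    """Return (low, high) for the mean of samples using a t critical value."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean - t_crit * sem, mean + t_crit * sem


low, high = t_confidence_interval(incoming_topics, T_CRIT_ALPHA_01_DF_15)
print(f"99% CI for mean topics initiated: [{low:.2f}, {high:.2f}]")
```

An interval whose lower bound is above zero supports the reading that incoming controllers reliably initiated at least some topics, which is the sense in which the reported interval [1.5, 7.0] is interpreted.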
Figure 4. Topic initiations by incoming and outgoing controllers.
The likely explanation for this finding is that incoming controllers had prior
expectations about the topics that would be important to discuss before initiating
the update. Not only were the incoming flight controllers directly involved in the
ongoing activities two shifts before the update occurred, they also had probably
read the handwritten log, looked at the events that were being tracked, looked at
the ﬂight plan for the day, and listened to the voice loops for some time. Because
the incoming controllers had this mission-speciﬁc knowledge in addition to their
general heuristics about activities in mission control, they could anticipate the
important topics to be discussed. Note that the structure of handover 2 supports this
explanation. The update in handover 2 was given to a practitioner who was begin-
ning his ﬁrst shift of the mission. In this handover, the outgoing controller initiated
most of the topics. Note that the same personnel were involved in handovers 8
and 14, so the structure of handover 2 was probably not a result of individual
personality factors but a function of the incoming controller being less aware of the
important topics to cover in Handover 2. Similarly, in handover 16, the topic
initiations were dominated by the outgoing controller. This pattern is likely because
the incoming controller did not have up-to-date situation awareness, either because
he was substituting for the nominally staffed controller or because it was the last
handover before entry and so many of the decisions had been recently ﬁnalized.
QUESTIONS ASKED BY INCOMING CONTROLLERS DEMONSTRATED
In addition to analyzing topic initiations, we wanted to characterize the questions
asked by incoming controllers during the primary handover brieﬁngs. The question
categories were iteratively characterized bottom-up from the data mainly with rela-
tion to the amount of shared understanding indicated by the question (Table III).
The categories that iteratively emerged from the data analysis were: (1) update
initiation questions, (2) topic initiation questions, (3) questions to obtain more
details, (4) conﬁrmation questions, and (5) error-checking questions. The ques-
tions asked by incoming controllers were used in the handover updates to steer
the outgoing controller to speciﬁc areas. The most common type of question was
where an incoming controller targeted speciﬁc information in a topic area about
which he or she wanted more details. These types of questions illustrated that the
two controllers in the brieﬁng shared much common ground on which to base the
update and allowed the incoming controller to narrowly target information which
was needed and known to be needed based on the preparatory work of the incoming
controller reviewing the documentation and listening to the voice loop discussions.
Although most of the questions were asked to make the incoming controller more
knowledgeable in preparation for transferring responsibility, an additional function
of the questions asked during the handover was to perform error checking. In this
sense, an additional benefit of the
handover was to bring a fresh perspective to the decision making and planning
processes, which presumably would increase the robustness of these activities.
5.2.1. Update initiation questions
Questions that signaled a readiness to receive the handover update such as
“Anything going on?” were used to begin the primary brieﬁngs. This type of ques-
tion was the least informed in that the entire burden for structuring the update rested
with the outgoing controller. Variations on this question, such as “Anything else
going on?” were used within the updating session to remind the outgoing controller
to be thorough in covering all of the important topics.
5.2.2. Topic initiation questions
Like the initiation questions, this type of question prompted the other controller for
information, but it required that the controller knew that a particular topic existed.
Many of these questions were triggered by an incoming controller monitoring other
information sources, such as by reading the handwritten log (e.g., “Flight caught
us off guard?”), looking at the data screens (e.g., “The main pump case drain
temps?”), looking at the mission plan, or listening to a voice loop update.
5.2.3. Questions to obtain more details
The purpose of this type of question was to obtain more details about a topic that
was being discussed. In the example in Table IV, the incoming controller asked
for details that the controller who had actively been engaged in an activity would
know. In this case, the incoming controller obtained information that supported a
particular hypothesis to explain the anomalous data without requiring the outgoing
Table III. Questions asked during primary handover brieﬁngs
Update initiation Topic initiation Obtaining more details Conﬁrmation Error checking Misc.
Anything going on? Do you have a copy
of the write-up on the
Was the pressure low
when they were doing it?
Rads are deployed? Why do you have
so many limits?
Other than this, is
anything else going
Will you get me a
copy of that ET sep?
Voids? It’s heater DCU 2
not B, right?
Is it only 17? I
thought it was 10
Anything going on? Are we doing vent
How about circ pump
We’re doing an
earlier Ei purge?
Why do we need
to do that?
Anything going on? Do we need to work
any FCS CO changes?
Wiring? OK, so take out
Why wouldn’t we
put it to norm
What’s the circ pump
How does that ﬁt in [less
They already went
up, didn’t they?
Do you think we
ought to have
them open that
Did you list
Anything else going
How did the rad stow
Does the crew go to sleep
on the orbit shift?
He doesn’t want to
What all is going on? Where did all this tire
data come from?
As long as it’s above
what, zero degrees?
Do you know that
What else? Who gets copies of
this ﬂight note?
What is the MMACS
Does the switch
being in low or
norm affect the
caution and warn-
Anything going with
the hydraulic leak?
Why did he catch us off
The rudder speed
brake is getting cold,
don’t you think?
Oh, you mean they’ll
start at TAEM?
Flight caught us off
How much leaked?
Slow start of circ
Do you know why in
the TMBU we’re doing
What’s going on with
circ pump 2?
Did you send the TMBU
The main pump case
Did you take the TMBUs
Are they asleep yet? Will the hose jump
And did we get any
information on the?
Did <previous control-
ler> look at the plan?
And the 10 knot cross-
Because only what?
Where are the deltas? What was his pitch?
What about AESP? What does the rule say?
APU before TAEM?
CHIT 17, what does it
This is it?
You have any idea where
Do we do steam vent
heater activation during
FAO has already sent it
up so do we delete this
step or have them deac-
tivate it later?
Do you know what the
MEWS problem is?
Hardware and caution
warning for what, the
Do they know if
we’re going to have
What is that, a BFS
Table IV. Using a question to obtain more details

Commentary | Outgoing controller | Incoming controller
A circulation pump did not work as expected. The outgoing controller tells the incoming controller to investigate this potential ... | “We had a ratty circ pump. The switching valve didn’t change for 27 seconds. We need to pull data on this.” |
The incoming controller asks a question to obtain more details. | | “Was the pressure low when they were doing it?”
The outgoing controller fills in the details requested by the incoming controller. | “Yeah, the circ pump pressure came up about halfway, toggled, went up all the way.” |

controller to update him on all of the potentially relevant details relating to the
5.2.4. Conﬁrmation questions
Conﬁrmation questions were generally “Yes/No” questions in order to verify that
the controllers shared the same knowledge or interpretation (e.g., “As long as it’s
above what, zero degrees?”).
5.2.5. Error checking questions
During the updates, incoming controllers were observed to question outgoing
controllers in an attempt to identify and correct potentially erroneous assumptions.
An example of this “Are you sure?” interrogation strategy is provided in Table V,
where the incoming controller questioned whether putting the leaking hydraulic
system on the auxiliary power unit (APU) to a “standby” conﬁguration for use in
case another APU failed would generate a false alarm. In this case, the outgoing
controller stated a high conﬁdence in his assumption that no alarm would be gener-
ated, so there was no direct effect on their decision to re-enable hydraulic pressure
on the leaking system. In other cases, erroneous assumptions were discovered and
changes to plans implemented as a result of this type of question.

Table V. Checking a potentially erroneous assumption

Commentary | Outgoing controller | Incoming controller
The outgoing controller updates the incoming controller about a decision to re-enable pressure on a ... | “Heater system 3. We’re going to go ahead and re-enable hydraulic pressure on their system.” |
The incoming controller asks if re-enabling pressure will cause alarms to be unnecessarily triggered. | | “Do you know that for sure? Does the switch being in low or norm affect the caution and warning?”
The outgoing controller declares that no alarms will be triggered. | |

In summary, incoming controllers were observed to ask questions that displayed
a range of prior knowledge, from questions that broadly indicated a desire to begin
the handover update to questions that were highly specific, targeting a gap in
knowledge about details of a particular topic item or verifying that an understanding
was accurate. In only one case did a controller defer an answer to a question to a
later time, in order to more quickly troubleshoot a server crash. If many question
deferrals had occurred during the updates, this would have indicated miscalibration
on the part of the incoming controllers as to what was important to discuss. By
accurately anticipating where they needed to be informed, incoming controllers
offloaded much of the work that the outgoing controller would otherwise have had
to do to determine what should be included in the update.
These patterns of mixed-initiative interactions and interrogation strategies
suggest that the update is less effortful and more robust when many of the topics
are mutually known before the brieﬁng. The outgoing controller is less prone to
missing an important topic as the incoming controller can help to remind the
outgoing controller of the topics to be covered. The incoming controller can aid
the outgoing controller in targeting knowledge gaps during the update. Investing
in a common understanding during low workload periods in preparation for unex-
pected problems, either by listening in on others’ conversation, observing others’
activities, or providing updates that have not been requested, has been observed to
be a strategy in many complex, dynamic domains (e.g., anesthesiology, Johannesen
et al., 1994; satellite mission control, Jones, 1995; aviation, Kerns et al., 1998;
military aviation, Rochlin et al., 1987; emergency call centers, Benchekroun et
al., 1995). An implication of these observations is that the on-call architecture
might work more effectively if practitioners who are assigned the responsibility
to be called in when an unexpected event occurs invest proactively in learning the
important topics that would need to be covered in an update before the situation
Figure 5. Topics in handover updates.
5.3. UPDATES EMPHASIZED CASCADES FROM THE ESCALATING EVENT
Analysis of the content of the handover updates revealed that the updates mainly
emphasized activities, data analyses, and decisions that resulted from the hydraulic
leak anomaly (Figure 5). The activities in the handover updates included those
that had been accomplished in the past, those that were ongoing and needed to
be continued by the incoming shift, and those that still remained to be done
during the next shift or handed over to future shifts. There were also data analysis
results that were described during the handovers that provided further information
about the extent of the hydraulic leak, performed either within the MMACS team
or by engineering personnel. Finally, controllers discussed changes in decisions
to nominal and contingency plans for upcoming landmark events. With every
update about a decision, there was an associated update about the stance toward
the decision. For example, the stance of the MMACS team toward the conﬁgur-
ation for entry was that the Auxiliary Power Unit (APU) with the hydraulic leak
should be turned off in order to avoid relying upon a potentially faulty system.
By including the stance toward a decision in the update, the incoming controller
would be positioned to provide and defend a recommendation in the event that the
decision was reopened for debate at a later time.
Note that the handover updates mainly emphasized deviations from the initial
plan. The handovers were built on top of a shared understanding of the nominal
plan. It should be recognized that called-in practitioners might not have this
shared understanding to build upon unless they are speciﬁcally provided with that
information in advance.
UPDATES HIGHLIGHTED EVENTS
Although many of the activities, data analyses, and decisions discussed in the
handover updates were triggered by the hydraulic leak in the Auxiliary Power Unit
during ascent, the handover updates also included discussions about events which
did not trigger these cascading repercussions. All of the events discussed during the
handover updates are shown in Figure 2. The updates included a wide variety of
events along a continuum of deviation from expectations: nominal to off-nominal
to anomalous to escalating. Generally, the depth of the briefing about the event was
a function of how far it deviated from expectations.

Table VI. An update about a nominal event

Commentary | Outgoing controller | Incoming controller
The incoming controller asks about the planned event of radiator deployment, which was supposed to have occurred in the previous shift. | | Rads are deployed?
The outgoing controller implies that the event occurred nominally and reminds the incoming controller that only one of two radiators was deployed, as planned. | Port rad is deployed. We only need the port. |

Nominal events are deﬁned as events that occurred as planned during the
mission. The main events of concern to the mechanical systems controllers
(MMACS) that were originally scheduled into the STS-76 ﬂight plan were liftoff,
shutdown of the mechanical systems upon obtaining orbit, radiator deployments,
radiator stows, extra-vehicular activity (EVA), docking and undocking with the
MIR space station, tests of the ﬂight control system a day before entry (FCS
checkout), and touchdown. Updates about nominal events were generally brief and
mainly given to conﬁrm that an event had occurred as expected (e.g., “They did
the EVA”). Not all of the nominal events were mentioned in the updates. In some
cases, additional details were provided about exactly what occurred during the
event because although it was mostly nominal, there were some aspects that should
be noted. In the example in Table VI, the update confirmed that the event occurred
nominally and served as a reminder to the incoming controller that only the port
radiator was deployed in this case. This was the original plan, although normally
two radiators are deployed.
Off-nominal events are deﬁned as unexpected deviations from the plan that had
few impacts to operational plans. The example in Table VII contains an update
about an off-nominal event: a temperature value in the third main engine that
was lower than expected. Note that the outgoing controller identiﬁed the event
based on noticing that the data from one system was lower than data from two
identical systems, even though the values were within the hard-coded nominal
ranges in the monitoring software. Also, the outgoing controller’s update had
two related data deviations given sequentially, although he did not explicitly state
that the two deviations were related. After the outgoing controller gave the
observation of the low data, the incoming controller proposed a hypothesis to
account for the data, leading fluidly into a diagnostic debate that allowed the two
controllers to use each other’s expertise to generate and evaluate hypotheses.
Finally, the outcome of the diagnostic debate did not include a resolution or
selection of a particular hypothesis, even as a working hypothesis, since it was not
deemed important enough to devote the resources to doing so. Had this update been
about a large-scale anomaly, selecting and justifying a rationale for an explanatory
hypothesis would have been much more important. At this stage, by learning about
this deviation in the handover, the incoming controller was prepared to:
• perform the activity of pulling the data,
• alter his expectations for monitoring to track those data points,
• connect that piece of data with other unexplained data, and
• answer questions as they arose.

Table VII. An update about an off-nominal event

Commentary | Outgoing controller | Incoming controller
The outgoing controller mentions that there is a potential problem with an engine because although the temperature value is in the nominal range, the value is lower than two ... | “System 3 main engine return temp is lower than the other two and I don’t know why. So that’s a ...” |
He mentions similar data that might also be related because it is on the same system and is also a lower ... | “Also, the main pump case was 163 when the other two were 180 post-ascent.” |
The incoming controller suggests a hypothesis to account for the unexpected ... | | “Hydraulic leak might account for that.”
This suggestion triggers an involved diagnostic debate about two possible hypotheses that might account for the data. | (Diagnostic debate about two possible hypotheses, a hydraulic leak and a transducer, that might account for the data.) | (Same debate; both controllers take part.)

Finally, there were two events during the STS-76 mission that were classiﬁed
as anomalous in that they were signiﬁcant enough deviations that they required
documented justiﬁcation of the rationale for diagnosis and response actions taken
during the mission, but did not cause an escalation of cognitive and coordinative
activities like the hydraulic leak anomaly:
1. a freeze in a water spray boiler that had almost no impact because, based on
experience in many past missions where that event was seen and the boiler
worked nominally when the coolant warmed up, no immediate action was
2. a microswitch failure on the payload bay doors; had the indication been correct,
it would have been a serious anomaly requiring an emergency landing.
It is interesting to note that, although events were clearly critical in the handover
updates, the practitioners rarely discussed base data values (e.g., “the pressure
is 82 psi”), but rather described data patterns in terms of events that were signi-
ﬁcant in some way (e.g., “there was a water spray boiler freeze”). In the situation
where an automated system would be used to monitor and call in practitioners, it
would be important for the system to highlight or visualize signiﬁcant events, not
just plot base data parameters (Thronesbery et al., 1999). It must be recognized,
however, that many of the shuttle events, and certainly the associated activities,
decisions, and data analyses, would be beyond the capabilities of an automated
logger to capture. Therefore, automated monitoring systems would need to be
designed such that this other information could be easily annotated by human
practitioners at regular intervals in order to avoid called-in practitioners lacking
critical information in escalating situations.
THE CASE OF THE MISSING UPDATE: UNPREPARED TO CLOSE VENT DOORS
Although in general the incoming ﬂight controllers took over the responsibility of
their positions without incident due in large part to the effectiveness of the updates
that they received, an incident was observed where the back room Mech controller
did not anticipate a request to close the vent doors prior to docking with the MIR
Space Station. The controller was clearly surprised by this request, as evidenced by
prior statements made by the controller that he did not believe the action would be
requested, a look of surprise when the request was made, and a delay in the timeline
because implementing the action took several minutes longer than expected. In
addition, the observed controller described the incident to the following shift’s
controller as: “In the unlikely event that we do it, I didn’t want to be stumbling
around, then all of a sudden we’re doing this ...”
The controller was unprepared for the request because he was not updated
by another agent in the distributed system about a reversal in the Russian space
agency’s stance toward the decision about closing the vent doors on the shuttle prior
to docking. The inferred evolution of the mindsets of the United States and Russian
space agencies regarding whether or not to close the vent doors prior to docking
is detailed in Table VIII. Normally the vent doors are left open in space to allow
oxygen to escape prior to entry. The anomalous hydraulic leak during ascent raised
concerns that hydraulic fluid might contaminate the MIR Space Station. Analyses
conducted by both space agencies showed that the amount of leaked hydraulic
fluid was negligible, with the implication that it was not necessary to close the
vent doors prior to docking. In addition, NASA planned to conduct a space walk
during the mission, demonstrating that they were not concerned about the hydraulic
fluid contaminating the interior of the shuttle. During communications between the
American and Russian space agencies, the two organizations presented evolving
stances toward the decision. One day before docking, the Russians announced that
they were “90% go” on docking without closing the vent doors. The observed
controller, along with the entire mission control center at NASA Johnson Space
Center, assumed that this was a final decision not to close the vent doors, as
evidenced by a voice loop update to the flight director.

Table VIII. Missing update on decision reversal triggers coordination surprise

Sometime between the conference call and the docking, a representative of the
American space agency had a private phone conversation with a representative
from the Russian space agency where the decision not to close the vent doors prior
to docking was reversed. This decision reversal was never communicated to the
personnel in mission control. As a consequence, the observed controller dedicated
his resources to preparing for other tasks and was unprepared for the request.
The observation of this instance where a missing update impacted the perform-
ance of the staffed controller provides converging evidence that updates are central
to effective performance. When practitioners are not fully updated on the current
situation, they are vulnerable to these types of ‘coordination surprises’ (Patterson et
al., 1998). Therefore, coping with additional workload during escalating situations
by delaying or eliminating updates to called-in practitioners will lead to predictable
cognitive and coordinative breakdowns. An essential element in maintaining safe
operations with the on-call architecture is to understand how to minimize the effort
to bring incoming practitioners quickly and efficiently up to speed in escalating
situations.
The study ﬁndings highlight the importance of updates in preparing incoming
practitioners to effectively accept responsibility to be a supervisory controller in
a dynamic, event-driven, complex setting and the central role of prior knowledge
during the updates. During the update, practitioners learn the status of both the
monitored process and distributed agents’ activities in response to expected and
unexpected changes in the process ﬂow. These observations elucidate why control-
lers will often refuse to accept a transfer of responsibility from another controller
without a face-to-face verbal update. The cognitive impact of the update was
observed in all facets of a ﬂight controller’s work. The expectations for monit-
oring were set by knowing what changes had been made to system conﬁgurations
and what events had occurred. The agenda of activities to be done during the
upcoming shift was inﬂuenced by knowing what past activities were concluded and
what activities were ongoing. Knowing the team’s stance toward critical decisions
impacted communications with other controllers, particularly when decisions were
reopened for debate.
In addition to direct implications for training how to conduct effective shift
changes in supervisory control settings, such as by conducting pre-planning for
updates by looking at logs and other documentation, these study ﬁndings point
to other design and organizational implications relating to on-call architectures.
Under pressure to be more cost-efﬁcient, NASA and other organizations are inter-
ested in using computer-enhanced sensor data processing to enable the reduction
of stafﬁng during nominal situations. These ﬁndings have implications for two
envisioned scenarios where the on-call architecture for supervisory control is used
to meet this economic goal. In the ﬁrst, stafﬁng is minimized until a problem
occurs. In this case, the staff must recognize when problems occur and call in
practitioners with the appropriate types of expertise to resolve the problem. An
example of this scenario is the role of the Station Duty Ofﬁcer, who is the only
staffed ﬂight controller for the US Space Station for all but 3 hours a week, when no
crew is onboard the space station. In the second scenario, a computerized system
monitors a process and alerts humans when a problem occurs that requires their
attention. Although this may seem somewhat futuristic, this scenario is already
being considered in several domains, including scientiﬁc spacecraft mission control
(Brann et al., 1996) and unmanned missions to Mars.
Table IX. Representative entries in traditional mission control automated logs
M23:51:35 MODE SEL MAN ORB UNL V72K2975J has changed from 1 to 0
M23:51:35 MODE SEL MAN EE V72K2976J has changed from 0 to 1
M23:51:55 ORB UNL MODE IND V72X2906J has changed from 1 to 0
M23:54:55 EE MODE AUTO V72K2990J has changed from 0 to 1
M23:51:55 ENTER V72K2982J has changed from 1 to 0
In both of these scenarios, computer-enhanced sensor data processing is
required to monitor the massive amounts of data and to recognize
significant events that need to be brought to a human’s attention. For example,
the traditional mission control automated log entries for one console over a few
minutes are displayed in Table IX (see Patterson, 1997, for a description of
current logging tools in space shuttle mission control). Clearly, if one controller is
responsible for monitoring dozens of such consoles at one time, signiﬁcant events
could be missed due to the sheer mass of the data without “intelligent” machine
process support to recognize, prioritize, and highlight deviations from expectations.
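
To make the level of description in Table IX concrete, its entries can be read as parameter-level state changes. The following is a minimal Python sketch that parses such entries. The line layout is inferred only from the five entries shown in Table IX, the meaning of the leading "M" is not given in the source and is treated as a literal prefix, and the names `ENTRY_PATTERN`, `LogEntry`, and `parse_entry` are hypothetical rather than part of any mission control logging tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred only from the entries in Table IX:
#   M<hh:mm:ss> <label words> <parameter id starting with V> has changed from <old> to <new>
ENTRY_PATTERN = re.compile(
    r"^M(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<label>.+?)\s+"
    r"(?P<param_id>V\w+)\s+"
    r"has changed from (?P<old>\d+) to (?P<new>\d+)$"
)


class LogEntry(NamedTuple):
    time: str
    label: str
    param_id: str
    old: int
    new: int


def parse_entry(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not fit the assumed layout."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        time=match["time"],
        label=match["label"],
        param_id=match["param_id"],
        old=int(match["old"]),
        new=int(match["new"]),
    )


# The five representative entries reproduced from Table IX.
TABLE_IX = [
    "M23:51:35 MODE SEL MAN ORB UNL V72K2975J has changed from 1 to 0",
    "M23:51:35 MODE SEL MAN EE V72K2976J has changed from 0 to 1",
    "M23:51:55 ORB UNL MODE IND V72X2906J has changed from 1 to 0",
    "M23:54:55 EE MODE AUTO V72K2990J has changed from 0 to 1",
    "M23:51:55 ENTER V72K2982J has changed from 1 to 0",
]

entries = [parse_entry(line) for line in TABLE_IX]
for entry in entries:
    print(entry)
```

Each parsed entry carries only a time, a label, a parameter identifier, and an old and new value; nothing in it indicates whether the change was expected or significant.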
For both scenarios, it is clear from these observations that the event recognition,
prioritization, and communication conducted by the mission controllers were much
different from those provided in the traditional automated logs. First, the controllers
did not communicate base data about “bit flips” on sensor data (“changed from
0 to 1”). In fact, communication about exact data values was nearly non-existent
during the updates. The event descriptions were at a much higher level, such as
“ratty circ pump” and “heater cycling” that were based on a complex combination
of multiple parameters, not all of which would independently be viewed to be out
of normal ranges for most situations and not all of which occurred simultaneously
in a discrete fashion. Second, not all of the nominal events were included in the
updates, although all of the off-nominal and anomalous events were. Therefore,
events that deviated from expectations were treated differently during the updates,
and the expectations were highly tailored to what was happening in the mission as
well as against a baseline of deviations such as water spray boilers that often freeze
up. Third, the event “signature” on which the recognition of the event was based
was nearly always more complex than a threshold crossing on a single parameter.
For example, one of the events was about a temperature on an engine that was lower
than two other engine temperatures. This temperature value was not out of range
of nominal parameter values. In addition, there are situations where one would
expect the temperature to be lower than the observed value, such as upon entry in
the cold atmosphere, which would not constitute an event. Fourth, most automated
logging tools only capture and display past information, and much of the handover
content related to future events, activities, analyses, and decisions, in order to help
the incoming controller prepare for and anticipate these things, or else pass them
on to the next shift to do so. Finally, many of the events that were discussed in
the handover updates were not about the space shuttle, but about deviations in
expected activities, decisions, and plans with other agents, such as the reversal to
the decision to keep the vent doors open, and so would be nearly impossible for an
automated logger to detect at all due to difﬁculties in designing sensors to detect
those higher-level abstractions.
Overall, one implication of these observations is that automated loggers are
currently not capable of completely replacing a human monitor in accurately
detecting all unexpected data patterns, prioritizing these patterns, displaying them
ﬂexibly at multiple levels of detail upon demand, and quickly ﬁlling in targeted
holes in knowledge upon request. At the same time, these observations point
out how heavily the effectiveness and efﬁciency of the updates relied upon the
incoming controllers having substantial knowledge prior to initiating the update.
This observation calls into question whether or not updates to called-in practi-
tioners could be conducted as quickly and effectively as these shift change updates.
Without as much prior knowledge, as would be the case in a call-in situation,
called-in practitioners would likely take much more time and resources before
they could effectively aid the staffed practitioner. It is likely that the update would
take longer, and updating would be a larger cost to the staffed practitioner at a
very busy time than if the incoming practitioner were more knowledgeable. The
burden for thoroughly covering all of the topics to be discussed would fall onto the
staffed practitioner, and possibly the called-in practitioner would try to raise topics
that are less relevant. Because the called-in practitioner could not target specific
gaps in knowledge through directed questions, the staffed practitioner would be
forced to cover more information in the update or risk leaving out important
information. Finally, the
common ground would not have been built up between the staffed and the called-in
practitioner, so the communications would be less terse and rely less on a common
body of shared knowledge, leaving open more possibilities for miscommunica-
tions. Given that these shift change updates were ten minutes on average and that
in escalating on-call situations, ten minutes might be prohibitively long to tie up
the resources of the staffed controller, it is likely that other means to prime on-call
practitioners to receive updates might become important in effectively drawing in
the called-in practitioner in the ﬁrst scenario.
A partial organizational response to this dilemma would be for called-in prac-
titioners to invest in a process understanding before any problems occur. NASA
Johnson Space Center has already implemented this organizational solution during
missions where the stafﬁng is reduced unless a problem occurs. They have made
being on-call an ofﬁcial responsibility that requires investment, although less than
if all the controllers are continuously staffed. For each mission, two controllers
are assigned the responsibility of being on call, one scheduled from midnight to
noon and another from noon to midnight. These controllers observe critical phases
of the mission, such as ascent. They also stop by the console in mission control to
obtain updates, read the log, listen to the voice loops, and watch the monitored data
once a day for about 15 minutes. By investing in learning about events that have
occurred during low-tempo periods, they are then more prepared to respond in
an on-call situation. We are considering how to additionally support this solution
by providing ‘open’ tools remotely like voice loops and data screens for on-call
controllers who are physically and temporally removed from the control center so
that they can gain a process feel without leaving their ofﬁces. It is also possible
that the same tool could be used to provide called-in practitioners with a partial
understanding to prime them for the update from the staffed practitioner, thereby
reducing some of the burden of updating the incoming practitioner at a busy time.
This ﬁeld study also suggests implications for the second envisioned on-call
scenario where humans are removed from the monitoring loop during nominal situ-
ations. In this situation, it is likely that machine processing would have to perform
some control activities, not just monitor and record deviations from expectations,
in order to reduce the number of times a human agent would need to be called in.
The results of this field study suggest that such a tool would probably
over-control or inaccurately control a complex process on occasion. Therefore, control
actions from such systems should be highly constrained and the consequences of
over-controlling or inaccurately controlling should be low.
Based on the results of this study, it must be acknowledged that even with the
most advanced automated monitors, it would be dangerous to completely remove
human personnel from nominal operations in complex, high-risk environments
with escalating events. Automated loggers are mainly used to capture and manip-
ulate data at the level of data parameter values, missing much of the information
about signiﬁcant events, and activities and changes to plans that are associated with
cascades from escalating events. Rather than completely replacing human super-
visory controllers with automated loggers, perhaps we can develop support tools
for human practitioners that are only intermittently involved. For example, we can
develop ‘hybrid’ systems, where humans can periodically annotate information that
cannot be captured electronically onto automated logs. These records could then
be used to prime called-in practitioners for updates during escalating situations.
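
One way to picture such a hybrid record is as a set of human-written, event-level annotations kept alongside the automated log and ordered for a called-in practitioner by how far each event deviated from expectations. The Python sketch below is an illustration under stated assumptions: the `Annotation` fields, the `priming_summary` function, and the elapsed-time values are hypothetical, while the event descriptions and the stance are taken from the observations reported in this study.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Deviation(IntEnum):
    """Continuum of deviation from expectations, smallest to largest."""
    NOMINAL = 0
    OFF_NOMINAL = 1
    ANOMALOUS = 2
    ESCALATING = 3


@dataclass
class Annotation:
    """A human-written note attached to the automated log (hypothetical schema)."""
    elapsed_hours: float   # when the note was written; values below are made up
    position: str          # console position of the annotating controller
    deviation: Deviation
    event: str             # event-level description, not raw parameter values
    stance: str = ""       # team's stance toward a related decision, if any
    open_items: str = ""   # activities left for a later shift, if any


def priming_summary(notes: List[Annotation]) -> List[str]:
    """Order notes by deviation (largest first), then by recency (newest first)."""
    ordered = sorted(notes, key=lambda n: (-n.deviation, -n.elapsed_hours))
    lines = []
    for note in ordered:
        line = f"[{note.deviation.name}] {note.event}"
        if note.stance:
            line += f" | stance: {note.stance}"
        if note.open_items:
            line += f" | open: {note.open_items}"
        lines.append(line)
    return lines


notes = [
    Annotation(30.0, "MMACS", Deviation.NOMINAL, "Port radiator deployed as planned"),
    Annotation(2.0, "MMACS", Deviation.ESCALATING, "Hydraulic leak during ascent",
               stance="Turn off the APU with the leak for entry"),
    Annotation(12.0, "MMACS", Deviation.OFF_NOMINAL,
               "System 3 main engine return temp lower than the other two",
               open_items="Pull the data"),
    Annotation(20.0, "MMACS", Deviation.ANOMALOUS, "Water spray boiler freeze"),
]

summary = priming_summary(notes)
for line in summary:
    print(line)
```

Ordering by deviation mirrors the observation that the depth of a briefing tracked how far an event departed from expectations. Whether nominal events should appear at all is a design choice left open here.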
The pressure to minimize costs during nominal operations is expected to
continue to mount in most supervisory control domains. Given the potentially
extreme risks associated with failing to effectively integrate the additional
resources necessary to respond to an escalating situation, there is increased
interest in finding ways to mitigate those risks. The findings from this field
study highlight the role of prior knowledge and of common ground between
practitioners in achieving an effective and efficient update. Investing in
common ground before problems occur, by getting updates from staffed practitioners
during low workload periods and ‘looking in’ on data remotely through computer
support tools, will allow practitioners to be more effective at seamlessly providing
the necessary expertise and additional resources to safely respond to escalating
situations.
Acknowledgements
Support for this research was provided by NASA Johnson under Grant No.
NAG 9-786, Human Interaction Design for Cooperating Automation. This work
was made possible through collaboration with colleagues in the Intelligent Systems
Branch, including Dr. Jane Malin, Dr. Carroll Thronesbery, Dr. Debra
Schreckenghost, Mr. Ron Kerr, Dr. David Overland, and Dr. Tico Foley, as well as
with colleagues from the Cognitive Systems Engineering Laboratory, including
Dr. Jennifer Watts-Perotti, Mr. James Corban, Ms. Renee Chow, and Mr. Klaus
Christoffersen. This material is also based upon work supported under a National
Science Foundation Graduate Fellowship. Any opinions, findings, conclusions or
recommendations expressed in this publication are our own and do not necessarily
reflect the views of the National Science Foundation. We thank four anonymous
reviewers for their useful critiques and suggestions.
References
Benchekroun, H., B. Pavard and P. Salembier (1995): Design of Cooperative Systems in Complex
Dynamic Environments. In J.-M. Hoc, P.C. Cacciabue and E. Hollnagel (eds.): Expertise and
Technology: Cognition and Human-Computer Cooperation. Lawrence Erlbaum.
Brann, D.B., D.A. Thurman and C.M. Mitchell (1996): Human Interaction with Lights-out
Automation: A Field Study. In Human Interaction with Complex Systems ’96. Dayton, OH.
Clark, H. and S. Brennan (1991): Grounding in Communication. In L. Resnick, J. Levine and S.
Teasley (eds.): Socially Shared Cognition. Washington, DC: American Psychological
Association.
Clark, H.H. (1992): Arenas of Language Use. Chicago: The University of Chicago Press.
Grusenmeyer, C. (1995): Shared Functional Representation in Cooperative Tasks – The Example
of Shift Changeover. International Journal of Human Factors in Manufacturing, vol. 5, no. 2,
Heath, C. and P. Luff (2000): Technology in Action. Cambridge: Cambridge University Press.
Hollnagel, E., O. Pederson and J. Rasmussen (1981): Notes on Human Performance Analysis
(Technical Report Riso-M-2285). Riso National Laboratory.
Hutchins, E. (1995): How a Cockpit Remembers Its Speed. Cognitive Science, vol. 19, pp. 265–288.
Johannesen, L., R. Cook and D. Woods (1994): Grounding Explanations in Evolving Diagnostic
Situations (CSEL Report 1994-TR-03). The Ohio State University, Cognitive Systems Engineering
Laboratory.
Jones, P.M. (1995): Cooperative Work in Mission Operations: Analysis and Implications for
Computer Support. Computer Supported Cooperative Work, vol. 3, pp. 103–145.
Kerns, K., P.J. Smith, C.E. McCoy and J. Orasanu (1998): Ergonomic Issues in Air Traffic
Management. In W. Marras and W. Karwowski (eds.): Handbook of Industrial Ergonomics. CRC
Patterson, E.S. (1997): Coordination Across Shift Boundaries in Space Shuttle Mission Control
(CSEL Report 1997-TR-01). The Ohio State University, Cognitive Systems Engineering
Laboratory.
Patterson, E.S., D.D. Woods, N.B. Sarter and J. Watts-Perotti (1998): Patterns in Cooperative
Cognition. COOP ’98, Third International Conference on the Design of Cooperative Systems. Cannes,
France, 26–29 May, pp. 13–23.
Patterson, E.S., J. Watts-Perotti and D.D. Woods (1999): Voice Loops as Coordination Aids in Space
Shuttle Mission Control. Computer Supported Cooperative Work: The Journal of Collaborative
Computing, vol. 8, no. 4, pp. 353–371.
Rochlin, G.I., T.R. La Porte and K.H. Roberts (1987): The Self-designing High-reliability
Organization, Aircraft Carrier Flight Operations at Sea. Naval War College Review, Autumn, pp. 76–90.
Suchman, L. (1987): Plans and Situated Actions: The Problem of Human-Machine Communication.
Cambridge: Cambridge University Press.
Thronesbery, C., K. Christoffersen and J. Malin (1999): Situation-oriented Displays of Space
Shuttle Data. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting,
September 27–October 1, Houston, TX, pp. 284–288.
Wegner, D., T. Giuliano and P. Hertel (1985): Cognitive Interdependence in Close Relationships. In
W. Ickes (ed.): Compatible and Incompatible Relationships. New York: Springer-Verlag.
Woods, D.D. (1993): Process Tracing Methods for the Study of Cognition Outside of the
Experimental Psychology Laboratory. In G. Klein, J. Orasanu and R. Calderwood (eds.): Decision
Making in Action: Models and Methods. Norwood, NJ: Ablex Publishing Corporation.
Woods, D.D. and E.S. Patterson (2001): How Unexpected Events Produce an Escalation of Cognitive
and Coordinative Demands. In P.A. Hancock and P.A. Desmond (eds.): Stress Workload and
Fatigue. Hillsdale, NJ: Lawrence Erlbaum, pp. 290–304.
Woods, D.D. (1994a): Cognitive Demands and Activities in Dynamic Fault Management: Abductive
Reasoning and Disturbance Management. In N. Stanton (ed.): Human Factors in Alarm Design.
Bristol, PA: Taylor and Francis.
Woods, D.D., L.J. Johannesen, R.I. Cook and N.B. Sarter (1994b): Behind Human Error: Cognitive
Systems, Computers, and Hindsight. Dayton, OH: CSERIAC.