Computer Supported Cooperative Work: The Journal of Collaborative

Computer Supported Cooperative Work 10: 317–346, 2001.
© 2001 Kluwer Academic Publishers. Printed in the Netherlands.
Shift Changes, Updates, and the On-Call
Architecture in Space Shuttle Mission Control
EMILY S. PATTERSON and DAVID D. WOODS
Institute for Ergonomics, Cognitive Systems Engineering Laboratory, Ohio State University, 210
Baker Systems, 1971 Neil Ave., Columbus, OH 43210, U.S.A.
(author for correspondence, E-mail: patterson.150@osu.edu)
(Received 14 July 2000)
Abstract. In domains such as nuclear power, industrial process control, and space shuttle mission
control, there is increased interest in reducing personnel during nominal operations. An essential
element in maintaining safe operations in high risk environments with this ‘on-call’ organizational
architecture is to understand how to bring called-in practitioners up to speed quickly during
escalating situations. Targeted field observations were conducted to investigate what it means to update a
supervisory controller on the status of a continuous, anomaly-driven process in a complex, distributed
environment. Sixteen shift changes, or handovers, at the NASA Johnson Space Center were observed
during the STS-76 Space Shuttle mission. The findings from this observational study highlight the
importance of prior knowledge in the updates and demonstrate how missing updates can leave
flight controllers vulnerable to being unprepared. Implications for mitigating risk in the transition
to ‘on-call’ architectures are discussed.
Key words: anomaly, common ground, decision, ethnography, event, knowledge, mutual awareness,
observation, plan, shift change, update
1. ‘On-call’ architecture in supervisory control
In supervisory control domains such as nuclear power, industrial process control,
and space shuttle mission control, there has been a widespread trend away from
large deployments of personnel who continuously monitor dedicated subsets of
process data and toward minimal staffing until a problem arises, at which
time additional resources are called in. This ‘on-call’ architecture has the potential
to reduce operational expenses by using the full reservoir of resources only when
needed.
For example, during the STS-75 Space Shuttle mission, a tethered scientific
satellite unexpectedly separated from the shuttle. As a result, two flight controllers
were immediately called in to support the nominally staffed controller responsible
for the mechanical systems on the shuttle. The first controller took over the
standard operations for the nominally staffed controller. This substitution allowed
the nominally staffed flight controller to work with the second called-in controller
318 EMILY S. PATTERSON & DAVID D. WOODS
on developing a way to prevent the astronauts from being electrically shocked when
recapturing the satellite.
By definition, with the on-call architecture, personnel are brought in only when
a situation is unusual, has begun to deteriorate, or involves high stakes. They are
called in as part of an escalation of cognitive and coordinative activities (Woods
and Patterson, 2001). There is inherently a ‘workload double-bind’ (Woods et al.,
1994b) in that when the on-call practitioner is most needed to provide additional
resources and expertise, the staffed practitioner has the least time to update the
incoming practitioner and to coordinate to redistribute the workload.
Therefore, in order to ensure that the on-call architecture functions effectively,
we need to identify ways to quickly bring incoming practitioners ‘up to speed’
without tying up the resources of the staffed practitioners during critical periods.
The goal of this research was to better understand how practitioners are currently
brought ‘up to speed’ in a complex, dynamic supervisory control setting. To this
end, targeted field observations were conducted of updates in space shuttle mission
control: shift change handovers between mechanical flight controllers during the
STS-76 mission.
The goal of the shift change handover is to prevent a break in the flow of the
monitored process and activities conducted by the flight controllers when there is
a change in personnel (e.g., Grusenmeyer, 1995, shift changes in paper mills). A
successful handover is defined by a smooth continuity of operations from one shift
to the next. There are two senses to this definition. The first is to avoid a rift in
terms of interactions with others and ongoing activities being conducted. In other
words, the work should continue as if the operator had never been replaced. The
second is for the incoming operator to understand what had happened as if he or
she had been present and personally engaged in all the activities. The handover
update is given to avoid having an incoming practitioner:
• have an incorrect or incomplete model of the process state,
• be unaware of significant data or events,
• be unprepared to deal with impacts from previous events,
• fail to anticipate events,
• lack knowledge that is necessary to perform relevant tasks,
• drop or rework activities that are in progress or that the team has agreed to do, or
• create an unwarranted shift in goals, decisions, priorities, or plans.
The paper is organized as follows. We introduce the domain of space shuttle
mission control, including how responsibility is hierarchically distributed and
handovers are nominally conducted. We describe how the observational data was
analyzed and provide an overview of the observed STS-76 mission. The study
findings are then described. Implications of these findings for mitigating risk in
two on-call scenarios are discussed.
SHIFT CHANGES, UPDATES, AND ON-CALL ARCHITECTURE 319
2. Overview of space shuttle mission control
2.1. HIERARCHICAL DISTRIBUTION OF RESPONSIBILITY IN MISSION
CONTROL
Ground-based mission control for the Shuttle Program at the NASA Johnson Space
Center (JSC) is responsible for supporting the crew in meeting the objectives of
the mission and for ensuring the health of the spacecraft while in flight. The Flight
Director (Flight) acts as the central decision maker and coordinates the information
flow between the various flight controllers responsible for subsystems of
the shuttle. Flight and the approximately sixteen main controllers sit in assigned
positions in the ‘front room’. Various support personnel, known as the ‘back room
controllers,’ support the front room controllers. For example, for the Maintenance
Mechanical Arm and Crew Systems (MMACS) team, the front room position
is called MMACS and the back room controllers are Mechanical (Mech I and
II), In-Flight Maintenance (IFM), Photo/TV, and Escape. The observations were
conducted at the back room Mech I console position. The Mech is responsible
for the health and safety of mechanical systems such as power units, heaters, and
payload bay doors.
The MMACS team is responsible for ensuring the health and safety of the
orbiter’s structural and mechanical subsystems during a mission. Flight controllers
do much more than continuously monitor system parameters for anomalous readings.
Although this is a critical task, there are many subtleties and complexities
in the functions that they fulfill, particularly since surprising events are common
during space missions. Controllers must exhibit creativity, the ability to work with
others, and deep knowledge not only of their mechanical systems but also of the
rationales and risk trade-offs behind the flight rules so that they can be applied or
modified for the specific circumstances.
2.2. HANDOVER UPDATES
Three shifts a day are scheduled when the shuttle is in orbit. Each shift change, or
handover, is scheduled for one hour.
Handover updates are designed to have information flow bottom-up through the
hierarchy of the incoming shift. For every position, the outgoing flight controller
updates the incoming controller, both physically co-located at their assigned
consoles. These primary briefings are essentially private (i.e., without using the
voice loop communication system described in Patterson et al., 1999), with the
convention that no one is allowed to interrupt these communications. After the
intensity of the primary briefings has died down, the incoming back room controllers
(e.g., Mech) brief the incoming front room controllers (e.g., MMACS). These
briefings are used to check the understanding from the primary briefings and
coordinate the activities to be conducted during the shift. This update is conducted
on a dedicated voice loop channel (e.g., MMACS ALT) on which the flight controllers
speak using headphones with audio hookups so that other controllers can listen
in on the communications. In parallel with the briefings from the back room to
the front room controllers, the incoming front room controllers give the incoming
Flight Director a short, high-level update on a voice loop that is dedicated for this
purpose (AFD CONF). These briefings are closely monitored by the entire mission
control center, which serves to check the shared understanding of the situation
following the various discipline handovers.
3. Methods
Sixteen of twenty-seven handovers in the mission control center (MCC) at the
NASA Johnson Space Center were directly observed during the Space Shuttle
mission STS-76 [EP]. The 16 observed handovers were divided between the three
shift transitions (5, 6, and 5). The naturally occurring verbal behavior was audiotaped.
In order to minimize the effect of observation on the flight controllers’
behavior, previous observations had been conducted with the controllers, and questions
to clarify the content of the handovers were asked only after the handover was
completed.
The raw data included field notes of face-to-face and voice loop verbal communications
and copies of flight documentation such as handwritten logs and flight
plans. The data was analyzed iteratively, using theoretical frameworks to recognize
and abstract relevant patterns (Hollnagel et al., 1981). Process tracing protocols
(Woods, 1993) for each handover were created that described the activities
in domain-independent terms and separated the communications made by the
different participants (Figure 1). One-page summaries for each handover were
generated and patterns across the handovers related to the research question were
identified.
As has been noted by many ethnographic, or “cognition in the wild”,
researchers, observation and analysis are heavily influenced by the theoretical frameworks
that are used to recognize and abstract patterns in complex data. Three
frameworks in particular guided the observation and data analysis in this study:
(1) dynamic fault management, (2) distributed replanning in anomaly response,
and (3) common ground in communication.
The first conceptual framing of the flight controller’s task was dynamic fault
management (Woods, 1994a). With this framing, a controller recognizes unexpected
findings in the data stream, conducts diagnostic searches, and generates
hypotheses about faults that could account for the observed pattern of disturbances.
This reasoning process goes on in parallel with interventions intended to either
protect systems, i.e., “safing” interventions, or to gather additional information,
i.e., diagnostic interventions. For a difficult anomaly, there can be challenges in
diagnosing the anomaly, figuring out its impacts on related subsystems, performing
safing activities in parallel with troubleshooting, and deciding whether or not to
obtain more data. Based on this framework, during the transfer of responsibility
in a shift change, updates were anticipated to potentially include: (1) unexpected
findings in the data stream that might be symptoms of system faults, (2)
diagnostic hypotheses to account for unexpected findings, (3) impacts of faults on
the monitored systems and other agents, (4) cascading events that were triggered
by a system fault, (5) diagnostic interventions, and (6) safing interventions.
Figure 1. Using conceptual frameworks to guide data abstraction and analysis.
Second, it was believed that the dynamic fault management framework alone might
not cover several elements that are important in distributed, dynamic,
event-driven supervisory control. Although dynamic fault management could
theoretically be accomplished by a team as well as an individual,
the implicit assumption of the framework is that planning can be conducted by
the agent or agents performing dynamic fault management largely independently
of the goals and plans of other agents. In other words, the framework assumes few
interdependencies with agents external to the immediate team to take into account with
respect to dynamic fault management activities. In space shuttle mission control, however,
the coordination required by agents is very complex. Distributed replanning is a
critical component of anomaly response (Woods, 1994a). In distributed replanning,
multiple people supported by computerized systems assess the implications of an
unexpected finding, or anomaly, for planned future activities, evaluate contingencies,
and modify plans in progress. During replanning, coordination across multiple
people in different roles is more complex than assigning and synchronizing tasks.
As part of this coordination, teams of people adopt and portray stances about critical
decisions that affect multiple agents. The concept of stance is a combination of
a position towards a significant issue (i.e., a decision a team faces) and the rationale
for that position, which is often predictable given the position on a tradeoff function
associated with particular roles. For example, mechanical systems controllers
might be more concerned with determining the cause of a malfunction than controllers
primarily tasked with the safety of the astronauts, such as the flight surgeon.
Based on this framework, during the transfer of responsibility in a shift change,
updates were anticipated to potentially include evidence of discussions about: (1)
plans, (2) stances, (3) goals, (4) positions on tradeoff functions, (5) contingencies,
(6) intent, (7) impacts to previously planned activities and expectations within the
team, and (8) impacts to previously planned activities and expectations of other teams.
Third, the goal of the observed updates could be framed as creating and maintaining
a common understanding, or common ground (Clark and Brennan, 1991),
between human agents. This common ground is what would allow the practitioners
to accept the responsibility and authority associated with a position for a period
of time without being taken by surprise. As others have observed, the notion of
common ground is a complex conglomerate of many interdependent elements,
including the interacting elements of: (1) knowledge that is known to be shared
between individuals (Clark and Brennan, 1991; Wegner et al., 1985; Hutchins,
1995), (2) shared goals or intentions, (3) mutual beliefs about the current state
of affairs and the predicted effects of actions on the state of affairs (Clark, 1992;
Suchman, 1987; Clark and Brennan, 1991), (4) shared awareness of others’ activities
and the state of the monitored process, and (5) common frames of reference
(e.g., fixed line diagram in London Underground line control room, Heath and
Luff, 2000). The conceptual framework of common ground influenced the data
observation and analysis in that updates to relatively ungrounded controllers, such
as the update immediately following ascent, were anticipated to have a different
character than updates based upon a more established common ground. In addition,
deviations from expectations, including unexpected data and changes to the
plan, were expected to be highlighted more than data and plans that conformed to
prior expectations. Finally, it was anticipated that controllers might explicitly use
strategies that built upon existing shared understandings in the updates, such as by
implicitly assuming that some topics and sub-topics would not need to be included
and using coded language to communicate more efficiently.
4. The observed STS-76 mission
The STS-76 mission included a rendezvous docking with the MIR Space Station.
As a result, there was a very short liftoff window (seven minutes instead of several
hours) and the MMACS team had to monitor specialized docking mechanical
systems. Due to the additional workload, the back room Mech position was staffed
for the entire flight instead of only during the high-tempo periods such as ascent
and entry, which is the staffing configuration for nominal missions.
The initially scheduled liftoff was postponed for one day because of high winds
and rough seas at Cape Kennedy (Figure 2 provides an overview of the mission
events). The second liftoff attempt began without incident at 2:13 a.m. on March
22, 1996. During ascent, two anomalies in the systems under the responsibility of
the MMACS team were observed: a freeze in the Water Spray Boiler (WSB) that
cools the third Auxiliary Power Unit (APU), and a hydraulic leak on the third APU.
Both anomalies were definitively diagnosed and neither was severe enough to
require an aborted ascent, so the shuttle attained its planned orbit altitude and most
of the ascent mechanical systems were shut down. The first anomaly, the Water
Spray Boiler (WSB) freeze-up, is a relatively common problem with a well-defined
response procedure that mainly involves verifying that the WSB works a day before
entry. Although this procedure could not be implemented because the water spray
boiler was on the same system that had the hydraulic leak, the WSB freeze-up
did not cause an escalation of cognitive and coordinative activities because the
procedural action was not required for several days and the eventual decision to
assume that the WSB would be operational for entry without the standard test was
not contested.
The second anomaly was significant and novel enough to create an escalation
of cognitive and coordinative activities. The MMACS controller with specialized
knowledge of the Auxiliary Power Unit (APU) immediately called himself in,
based on watching the ascent on NASA Select TV, to provide expertise in deciding
whether to shorten the mission duration. The decision was made not to shorten
the mission because the leak was small enough that some capability remained in
the APU system and the leak was unlikely to get much worse during the generally
quiescent orbit configuration. There were cascading repercussions from this
anomaly to several other aspects of the mission as actions were taken to protect the
leaking hydraulic system, both to maintain effective redundancy of critical systems,
and to protect the MIR Space Station from contamination (Table I). Several of
these planned potential actions were debated by additional called-in operational,
engineering, and management personnel to ensure that the plans were robust to
contingencies.
Figure 2. Overview of events and observed handovers in STS-76 mission.
On flight day 8, the decision was made to come home one day early (flight day 9
instead of flight day 10) due to weather predictions at Kennedy Space Center (KSC)
and concern about the reduced redundancy in the APUs due to the hydraulic leak.
On flight day 9, however, both opportunities for entry were waived off because of
fog and unpredictable weather at KSC. The astronauts prepared to spend another
day in orbit, expecting to land on flight day 10.
When the decision was made to stay on orbit for another day, the payload bay
doors were commanded to open but the procedure automatically halted when the
sensor indicated that one latch was still closed. After the crew visually determined
the latches to be in an open configuration, it was assumed that the sensor was
giving an erroneous indication, and the doors were commanded open manually.
They opened without further incident. If the payload bay doors had not opened, the
shuttle would have had to make an immediate emergency landing.
Table I. Changes to plans as a result of the hydraulic leak
Changes to plans                                   Rationale
Minimize circulation pump operations               To minimize the use of the leaking APU
Close vent doors before docking with the station   To protect the space station from hydraulic fluid
Use a circulation pump instead of an APU to        To reduce the risk of losing redundancy on APUs
  check the flight control system
Use 2/3 APUs for entry                             To avoid relying on the leaking APU
Land at Edwards not Kennedy Space Center           To minimize stress from crosswinds on APUs
On flight day 10, the second landing opportunity at Edwards Air Force Base
was taken after waiving off the first landing at KSC due to poor weather conditions.
The decision to land at the less preferred Edwards site, which requires expensive
ground transport back to KSC, was made in order to have better weather conditions,
particularly lower crosswinds. The shuttle therefore touched down at Edwards at
7:29 a.m. on March 31, 1996, and responsibility for the orbiter transferred from the
flight controllers at the NASA Johnson Space Center to other NASA organizations.
5. Findings from the observational study
It might be expected that, during the hour scheduled for each handover, the
incoming controller would immediately and continuously receive verbal updates
until the outgoing controller departed. This situation was not the case in any of
the observed 16 handovers. In every handover except the handover immediately
following ascent (which had been personally observed by the incoming controller)
and Handover 9 when the incoming controller read a packet of information left by
the outgoing controller, the controllers engaged in short high-tempo briefings about
20 minutes after the incoming controller arrived (Table II). During the time prior
to the update, the incoming controller would generally sit next to the outgoing
controller while listening to the voice loops, monitor the data screens, and look
through the flight log and other documentation. One controller (personal communication)
described his opinion about the reason why handover updates often do
not begin immediately upon arrival of the incoming controller:
You can see during handover that one of the first items that would happen is that
the oncoming shift, the incoming shift, would sit down and read the previous
two shifts since he was in. And see what had happened over the 16 hours since
he had been in. They would sit down and discuss it with the person that they’re
taking over from and any other little innuendos that haven’t been mentioned in
the log so that they are well aware that everything that has happened up until
that point in time. Because when that person goes home, you know, they don’t
have any insight. So if there’s anything further coming up ... then they’re not
surprised by it, they know about it and they’re well aware of it. They know
who else is aware of it ... It’s a good system. We couldn’t operate without logs
... Very important stuff.
Table II. Length and start time of observed handover briefings
Handover  Handover    Briefing  Primary briefing  Voice loop briefing  Handover duration
          start time  (min)     start time (min)  start time (min)     (min)
1         4:20        1         20                N/A                  40
2         5:56        14        22                36                   42
3         16:18       20        32                68                   88
4         14:51       10        0                 N/A                  39
5         20:58       15        15                N/A                  55
6         23:41       10        0                 33                   49
7         1:25        11        28                N/A                  39
8         9:36        8         4                 39                   42
9         17:00       N/A       40                N/A                  N/A
10        1:34        1         6                 N/A                  41
11        7:20        8 (5+3)   15                N/A                  137 (includes meeting)
12        9:12        14        0                 N/A                  14
13        16:52       9 (2+7)   42                N/A                  71 (includes meeting)
14        9:45        4 (1+3)   47                0                    50 (includes troubleshooting)
15        17:02       13        13                N/A                  31
16        11:53       10 (2+8)  27                N/A                  40
Avg                   9.87      19.44             35.20                43.33
St Dev                5.17      15.53             24.16                17.17
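The summary rows of Table II can be checked against the per-handover values. The sketch below is an illustration and not part of the original analysis. It assumes that the sample standard deviation was used and that the three handover durations annotated as including a meeting or troubleshooting (handovers 11, 13, and 14) were left out of the duration summary; this is the reading that gives back the reported 43.33 and 17.17. The variable and function names are hypothetical.

```python
from statistics import mean, stdev

# Values transcribed from Table II (minutes); None marks an N/A cell.
# Assumption: rows 1, 2 and 8 are read as briefing length / primary briefing
# start pairs of (1, 20), (14, 22) and (8, 4).
briefing_length = [1, 14, 20, 10, 15, 10, 11, 8, None, 1, 8, 14, 9, 4, 13, 10]
primary_start = [20, 22, 32, 0, 15, 0, 28, 4, 40, 6, 15, 0, 42, 47, 13, 27]
voice_loop_start = [None, 36, 68, None, None, 33, None, 39,
                    None, None, None, None, None, 0, None, None]
handover_duration = [40, 42, 88, 39, 55, 49, 39, 42, None, 41, 137, 14, 71, 50, 31, 40]

# Assumption: annotated durations (meeting or troubleshooting) are excluded.
ANNOTATED_HANDOVERS = frozenset({11, 13, 14})


def summarize(values, skip=frozenset()):
    """Return (mean, sample standard deviation), rounded to two decimals."""
    kept = [v for number, v in enumerate(values, start=1)
            if v is not None and number not in skip]
    return round(mean(kept), 2), round(stdev(kept), 2)


summary = {
    "briefing_length": summarize(briefing_length),
    "primary_start": summarize(primary_start),
    "voice_loop_start": summarize(voice_loop_start),
    "handover_duration": summarize(handover_duration, skip=ANNOTATED_HANDOVERS),
}

for column, (avg, sd) in summary.items():
    print(f"{column}: avg={avg:.2f} st_dev={sd:.2f}")
```

Under these assumptions the four computed pairs agree with the Avg and St Dev rows of the table. Including the three annotated durations would instead raise the average handover duration to about 52 minutes.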
Also during the handover time, the incoming controllers would occasionally
brief their incoming superiors over the voice loops, who would then brief their
superior, the incoming flight director. The position responsibility was officially
handed over when the incoming controller switched from an alternate to a primary
team voice loop and the handover officially ended when the flight director from
the outgoing shift verbally released the outgoing controllers via the Flight Director
voice loop. In several instances, outgoing controllers stayed beyond the official
end of the handover to perform specific activities or attend meetings related to the
hydraulic leak anomaly.
The findings from the observations highlight the influence of prior knowledge
on the updates and how missing updates can leave flight controllers vulnerable to
being surprised or unprepared. First, the incoming controllers initiated many of the
topics in the handover updates, demonstrating shared knowledge about what topics
would be important to cover in the handover. Second, incoming controllers were
observed to ask questions that were highly specific and indicated a detailed knowledge
of the current status of a particular topic item, relieving the outgoing controller
of much of the work of determining what the incoming controller
needed to learn. Third, the content of the handovers heavily emphasized events and
activities, data analyses, and decisions that were triggered by the escalating event
of the hydraulic leak anomaly. Finally, although many of the updates were effective
in bringing the incoming controllers up to speed, an incident was observed in which
a controller was surprised by a request to close the vent doors because he had not
been told that a prior decision not to close the doors had been
reversed.
5.1. MIXED-INITIATIVE INTERACTIONS: TOPIC INITIATIONS BY INCOMING
AND OUTGOING CONTROLLERS
Handover updates fluidly shifted from one topic to another. Handover 13 (Figure 3)
between the outgoing and incoming back room Mech controllers is used to illustrate
how topics were initiated during the handover updates. Above each line is a
description of the topic that is introduced by either the outgoing flight controller
(on the left) or the incoming flight controller (on the right) and below the line is
the beginning of the dialogue on that topic. The entire briefing took nine minutes,
divided into two segments of 2 and 7 minutes due to a pre-arranged side meeting
with another person. The update, like all of the handovers, began with a recognizable
signal that the controller was willing to initiate the briefing: “Anything going
on?” Following this initial question, the controllers began discussing a meeting
between the mission controllers and engineers about impacts to the operational
plan due to the hydraulic leak anomaly. Many of the other topics discussed during
the handover were continuations of ongoing replanning efforts for entry procedures
as a result of the hydraulic leak in the auxiliary power unit, particularly contingency
planning for cases such as loss of another auxiliary power unit or high crosswinds.
Note that at the end of the update, the incoming controller re-initiated a previous
topic, changes to the shutdown procedure for the auxiliary power unit. This is likely
because he wanted to engage in a lengthier debate on the topic than would have
been appropriate earlier in the briefing.
Figure 3. Topic initiations in handover 13.
It is a clear pattern across multiple handover updates that topics were initiated
by both outgoing and incoming controllers. Since the controller who worked the
previous shift should theoretically have more knowledge than the person being
updated, one would expect the outgoing controller to initiate most of the topics.
Nevertheless, incoming controllers initiated many of the topics in the handover
updates (Figure 4). At an α level of 0.01 with the t-distribution, the confidence
interval for topics initiated by incoming controllers is [1.5, 7.0], which lies
entirely above zero. In addition,
a one-tailed t-test comparison of the number of topics initiated by outgoing and
incoming controllers gives a p value of 0.08, which suggests, but does not establish,
that outgoing controllers initiated somewhat more of the topics in the handover
update.
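As an illustration of how a t-based confidence interval of this kind is obtained, the following minimal sketch computes a two-sided interval for a mean at an α level of 0.01. It is not the authors' calculation: the per-handover counts are hypothetical (the data behind Figure 4 are not reproduced here), the sample size of 16 is an assumption, and the critical value is hard-coded from a standard t table for 15 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical counts of topics initiated by the incoming controller in each
# of 16 handovers. These are NOT the study's data.
incoming_topic_counts = [4, 1, 5, 3, 6, 2, 8, 3, 0, 5, 4, 7, 6, 2, 5, 1]

# Two-sided critical value of Student's t for alpha = 0.01 with 15 degrees of
# freedom (n - 1 for n = 16), taken from a standard t table.
T_CRIT_ALPHA_01_DF_15 = 2.947


def t_confidence_interval(samples, t_crit):
    """Return (lower, upper) bounds: mean +/- t_crit * s / sqrt(n)."""
    n = len(samples)
    centre = mean(samples)
    half_width = t_crit * stdev(samples) / sqrt(n)
    return centre - half_width, centre + half_width


lower, upper = t_confidence_interval(incoming_topic_counts, T_CRIT_ALPHA_01_DF_15)
print(f"99% CI for mean topics initiated per handover: [{lower:.2f}, {upper:.2f}]")
```

If the lower bound of such an interval is above zero, the data are inconsistent (at the chosen α) with incoming controllers initiating no topics on average, which is the sense in which the reported interval is read above.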
Figure 4. Topic initiations by incoming and outgoing controllers.
The likely explanation for this finding is that incoming controllers had prior
expectations about the topics that would be important to discuss before initiating
the update. Not only were the incoming flight controllers directly involved in the
ongoing activities two shifts before the update occurred, they had also probably
read the handwritten log, looked at the events that were being tracked, looked at
the flight plan for the day, and listened to the voice loops for some time. Because
the incoming controllers had this mission-specific knowledge in addition to their
general heuristics about activities in mission control, they could anticipate the
important topics to be discussed. The structure of handover 2 supports this
explanation. The update in handover 2 was given to a practitioner who was beginning
his first shift of the mission. In this handover, the outgoing controller initiated
most of the topics. The same personnel were involved in handovers 8
and 14, so the structure of handover 2 was probably not a result of individual
personality factors but a function of the incoming controller being less aware of the
important topics to cover in handover 2. Similarly, in handover 16, the topic initiations
were dominated by the outgoing controller. This pattern is likely because the
incoming controller did not have up-to-date situation awareness, either because
he was substituting for the nominally staffed controller or because it was the last
handover before entry and so many of the decisions had been recently finalized.
5.2. QUESTIONS ASKED BY INCOMING CONTROLLERS DEMONSTRATED
PRIOR KNOWLEDGE
In addition to analyzing topic initiations, we wanted to characterize the questions
asked by incoming controllers during the primary handover briefings. The question
categories were iteratively characterized bottom-up from the data mainly with relation
to the amount of shared understanding indicated by the question (Table III).
The categories that iteratively emerged from the data analysis were: (1) update
initiation questions, (2) topic initiation questions, (3) questions to obtain more
details, (4) confirmation questions, and (5) error-checking questions. The ques-
tions asked by incoming controllers were used in the handover updates to steer
the outgoing controller to specific areas. The most common type of question was
where an incoming controller targeted specific information in a topic area about
which he or she wanted more details. These types of questions illustrated that the
two controllers in the briefing shared much common ground on which to base the
update and allowed the incoming controller to narrowly target information which
was needed and known to be needed based on the preparatory work of the incoming
controller reviewing the documentation and listening to the voice loop discussions.
Although most of the questions were asked to make the incoming controller
more knowledgeable in preparation for transferring responsibility, some questions
served an additional function: error checking. In this sense, an additional benefit of the
handover was to bring a fresh perspective to the decision making and planning
processes, which presumably would increase the robustness of these activities.
5.2.1. Update initiation questions
Questions that signaled a readiness to receive the handover update, such as
“Anything going on?”, were used to begin the primary briefings. This type of question
was the least informed in that the entire burden for structuring the update rested
with the outgoing controller. Variations on this question, such as “Anything else
going on?” were used within the updating session to remind the outgoing controller
to be thorough in covering all of the important topics.
5.2.2. Topic initiation questions
Like the initiation questions, this type of question prompted the other controller for
information, but it required that the controller knew that a particular topic existed.
Many of these questions were triggered by an incoming controller monitoring other
information sources, such as by reading the handwritten log (e.g., “Flight caught
us off guard?”), looking at the data screens (e.g., “The main pump case drain
temps?”), looking at the mission plan, or listening to a voice loop update.
5.2.3. Questions to obtain more details
The purpose of this type of question was to obtain more details about a topic that
was being discussed. In the example in Table IV, the incoming controller asked
for details that the controller who had actively been engaged in an activity would
know. In this case, the incoming controller obtained information that supported a
particular hypothesis to explain the anomalous data without requiring the outgoing
Table III. Questions asked during primary handover briefings
Update initiation Topic initiation Obtaining more details Confirmation Error checking Misc.
Anything going on? Do you have a copy
of the write-up on the
MER meeting?
Was the pressure low
when they were doing it?
Rads are deployed? Why do you have
so many limits?
What?
Other than this, is
anything else going
on?
Will you get me a
copy of that ET sep?
Voids? It’s heater DCU 2
not B, right?
Is it only 17? I
thought it was 10
or 14.
You didn’t
prepare
anything?
Anything going on? Are we doing vent
doors?
How about circ pump
temps?
We’re doing an
earlier Ei purge?
Why do we need
to do that?
You have
that?
Anything going on? Do we need to work
any FCS CO changes?
Wiring? OK, so take out
Bravo?
Why wouldn’t we
put it to norm
press?
What?
Anything significant
going on?
What’s the circ pump
3 status?
How does that fit in [less
fluid]?
They already went
up, didn’t they?
Do you think we
ought to have
them open that
up?
Did you list
your number?
Anything else going
on?
How did the rad stow
go?
Does the crew go to sleep
on the orbit shift?
He doesn’t want to
do that?
What all is going on? Where did all this tire
data come from?
As long as it’s above
what, zero degrees?
Do you know that
for sure?
What else? Who gets copies of
this flight note?
What is the MMACS
preference?
Does the switch
being in low or
norm affect the
caution and warning?
Anything going with
the hydraulic leak?
Why did he catch us off
guard?
The rudder speed
brake is getting cold,
don’t you think?
Oh, you mean they’ll
start at TAEM?
Flight caught us off
guard?
How much leaked?
Slow start of circ
pump 3?
Do you know why in
the TMBU we’re doing
operating limits?
What’s going on with
circ pump 2?
Did you send the TMBU
number?
The main pump case
drain temps?
Did you take the TMBUs
to MOIR?
Are they asleep yet? Will the hose jump
around?
And did we get any
information on the?
Did <previous controller>
look at the plan?
And the 10 knot cross-
wind?
Because only what?
Where are the deltas? What was his pitch?
What about AESP? What does the rule say?
Why are we starting an
APU before TAEM?
CHIT 17, what does it
say?
This is it?
Just entry?
You have any idea where
that is?
Do we do steam vent
heater activation during
this?
FAO has already sent it
up so do we delete this
step or have them
deactivate it later?
Do you know what the
MEWS problem is?
Hardware and caution
warning for what, the
reservoir?
Do they know if
we’re going to have
it anymore?
What is that, a BFS
TMBU?
Table IV. Using a question to obtain more details
Commentary | Outgoing controller | Incoming controller
A circulation pump did not work as expected. The outgoing controller tells the incoming controller to investigate this potential problem. | “We had a ratty circ pump. The switching valve didn’t change for 27 seconds. We need to pull data on this.” | -
The incoming controller asks a question to obtain more details. | - | “Was the pressure low when they were doing it?”
The outgoing controller fills in the details requested by the incoming controller. | “Yeah, the circ pump pressure came up about halfway, toggled, went up all the way.” | -
controller to update him on all of the potentially relevant details relating to the
topic.
5.2.4. Confirmation questions
Confirmation questions were generally “Yes/No” questions asked to verify that
the controllers shared the same knowledge or interpretation (e.g., “As long as it’s
above what, zero degrees?”).
5.2.5. Error checking questions
During the updates, incoming controllers were observed to question outgoing
controllers in an attempt to identify and correct potentially erroneous assumptions.
An example of this “Are you sure?” interrogation strategy is provided in Table V,
where the incoming controller questioned whether putting the leaking hydraulic
system on the auxiliary power unit (APU) to a “standby” configuration for use in
case another APU failed would generate a false alarm. In this case, the outgoing
controller stated a high confidence in his assumption that no alarm would be gener-
ated, so there was no direct effect on their decision to re-enable hydraulic pressure
on the leaking system. In other cases, erroneous assumptions were discovered and
changes to plans implemented as a result of this type of question.
In summary, incoming controllers were observed to ask questions that displayed
a range of prior knowledge, from questions that broadly indicated a desire to begin
the handover update to questions that were highly specific, targeting a gap in know-
ledge about details of a particular topic item or verifying that an understanding
was accurate. In only one case did a controller defer an answer to a question to a
later time, in order to more quickly troubleshoot a server crash. If many question
deferrals had occurred during the updates, this would have indicated miscalibration
on the part of the incoming controllers as to what was important to discuss. By
accurately anticipating where they needed to be informed, incoming controllers
offloaded much of the work required of the outgoing controller to determine what
should be included in the update.
Table V. Checking a potentially erroneous assumption
Commentary | Outgoing controller | Incoming controller
The outgoing controller updates the incoming controller about a decision to re-enable pressure on a leaking system. | “Heater system 3. We’re going to go ahead and re-enable hydraulic pressure on their system.” | -
The incoming controller asks if re-enabling pressure will cause alarms to be unnecessarily triggered. | - | “Do you know that for sure? Does the switch being in low or norm affect the caution and warning?”
The outgoing controller declares that no alarms will be triggered. | “No, uh-uh.” | -
These patterns of mixed-initiative interactions and interrogation strategies
suggest that the update is less effortful and more robust when many of the topics
are mutually known before the briefing. The outgoing controller is less prone to
missing an important topic as the incoming controller can help to remind the
outgoing controller of the topics to be covered. The incoming controller can aid
the outgoing controller in targeting knowledge gaps during the update. Investing
in a common understanding during low workload periods in preparation for unex-
pected problems, either by listening in on others’ conversation, observing others’
activities, or providing updates that have not been requested, has been observed to
be a strategy in many complex, dynamic domains (e.g., anesthesiology, Johannesen
et al., 1994; satellite mission control, Jones, 1995; aviation, Kerns et al., 1998;
military aviation, Rochlin et al., 1987; emergency call centers, Benchekroun et
al., 1995). An implication of these observations is that the on-call architecture
might work more effectively if practitioners who are assigned the responsibility
to be called in when an unexpected event occurs invest proactively in learning the
important topics that would need to be covered in an update before the situation
escalates.
Figure 5. Topics in handover updates.
5.3. UPDATES EMPHASIZED CASCADES FROM THE ESCALATING EVENT
Analysis of the content of the handover updates revealed that the updates mainly
emphasized activities, data analyses, and decisions that resulted from the hydraulic
leak anomaly (Figure 5). The activities in the handover updates included those
that had been accomplished in the past, those that were ongoing and needed to
be continued by the incoming shift, and those that still remained to be done
during the next shift or handed over to future shifts. There were also data analysis
results that were described during the handovers that provided further information
about the extent of the hydraulic leak, performed either within the MMACS team
or by engineering personnel. Finally, controllers discussed changes in decisions
to nominal and contingency plans for upcoming landmark events. With every
update about a decision, there was an associated update about the stance toward
the decision. For example, the stance of the MMACS team toward the configur-
ation for entry was that the Auxiliary Power Unit (APU) with the hydraulic leak
should be turned off in order to avoid relying upon a potentially faulty system.
By including the stance toward a decision in the update, the incoming controller
would be positioned to provide and defend a recommendation in the event that the
decision was reopened for debate at a later time.
Note that the handover updates mainly emphasized deviations from the initial
plan. The handovers were built on top of a shared understanding of the nominal
plan. It should be recognized that called-in practitioners might not have this
shared understanding to build upon unless they are specifically provided with that
information in advance.
5.4. UPDATES HIGHLIGHTED EVENTS
Although many of the activities, data analyses, and decisions discussed in the
handover updates were triggered by the hydraulic leak in the Auxiliary Power Unit
during ascent, the handover updates also included discussions about events which
did not trigger these cascading repercussions. All of the events discussed during the
handover updates are shown in Figure 2. The updates included a wide variety of
events along a continuum of deviation from expectations: nominal to off-nominal
to anomalous to escalating. Generally, the depth of the briefing about the event was
a function of how far it deviated from expectations.
Table VI. An update about a nominal event
Commentary | Outgoing controller | Incoming controller
The incoming controller asks about the planned event of radiator deployment, which was supposed to have occurred in the previous shift. | - | “Rads are deployed?”
The outgoing controller implies that the event occurred nominally and reminds the incoming controller that only one of two radiators was deployed, as planned. | “Port rad is deployed. We only need the port.” | -
Nominal events are defined as events that occurred as planned during the
mission. The main events of concern to the mechanical systems controllers
(MMACS) that were originally scheduled into the STS-76 flight plan were liftoff,
shutdown of the mechanical systems upon obtaining orbit, radiator deployments,
radiator stows, extra-vehicular activity (EVA), docking and undocking with the
MIR space station, tests of the flight control system a day before entry (FCS
checkout), and touchdown. Updates about nominal events were generally brief and
mainly given to confirm that an event had occurred as expected (e.g., “They did
the EVA”). Not all of the nominal events were mentioned in the updates. In some
cases, additional details were provided about exactly what occurred during the
event because although it was mostly nominal, there were some aspects that should
be noted. In the example in Table VI, the update confirmed that the event occurred
nominally and served as a reminder to the incoming controller that only the port
radiator was deployed in this case, which was the original plan, although normally
two radiators are deployed.
Off-nominal events are defined as unexpected deviations from the plan that had
few impacts to operational plans. The example in Table VII contains an update
about an off-nominal event: a temperature value in the third main engine that
was lower than expected. Note that the outgoing controller identified the event
based on noticing that the data from one system was lower than data from two
identical systems, even though the values were within the hard-coded nominal
ranges in the monitoring software. Also, the outgoing controller’s update had
two related data deviations given sequentially, although he did not explicitly state
that the two deviations were related. After the outgoing controller reported the
low data, the incoming controller proposed a hypothesis to account for the data,
leading fluidly into a diagnostic debate that allowed the two controllers to use
each other’s expertise to generate and evaluate hypotheses. Finally, the outcome
of the diagnostic debate did not include a resolution or selection of a particular
hypothesis, even as a working hypothesis, since it was not deemed important
to devote the resources to doing so. Had this update been about a large-scale
anomaly, selecting and justifying a rationale for an explanatory hypothesis would
have been much more important. At this stage, by learning about this deviation in
the handover, the incoming controller was prepared to:
- perform the activity of pulling the data,
- alter his expectations for monitoring to track those data points,
- connect that piece of data with other unexplained data, and
- answer questions as they arose.
Table VII. An update about an off-nominal event
Commentary | Outgoing controller | Incoming controller
The outgoing controller mentions that there is a potential problem with an engine because although the temperature value is in the nominal range, the value is lower than two identical systems. | “System 3 main engine return temp is lower than the other two and I don’t know why. So that’s a question there.” | -
He mentions similar data that might also be related because it is on the same system and is also a lower temperature. | “Also, the main pump case drain temp on system 3 was 163 when the other two were 180 post-ascent.” | -
The incoming controller suggests a hypothesis to account for the unexpected data. | - | “Hydraulic leak might account for that.”
This suggestion triggers an involved diagnostic debate about two possible hypotheses that might account for the data. | (Diagnostic debate about two possible hypotheses, a hydraulic leak and a transducer, that might account for the data.)
Finally, there were two events during the STS-76 mission that were classified
as anomalous in that they were significant enough deviations that they required
documented justification of the rationale for diagnosis and response actions taken
during the mission, but did not cause an escalation of cognitive and coordinative
activities like the hydraulic leak anomaly:
1. a freeze in a water spray boiler that had almost no impact because, based on
experience in many past missions where that event was seen and the boiler
worked nominally when the coolant warmed up, no immediate action was
required, and
2. a microswitch failure on the payload bay doors; had the indication been correct,
it would have been a serious anomaly requiring an emergency landing.
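The continuum of event classes described above can be sketched as an ordered classification. This is an illustrative Python sketch only: the names `Deviation`, `briefing_depth`, and `order_for_briefing` are assumptions introduced here, the depth labels paraphrase the examples in the text, and sorting by deviation is a hypothetical use rather than something the controllers were observed to do.

```python
from enum import IntEnum


class Deviation(IntEnum):
    """Event classes ordered by how far they deviate from expectations."""
    NOMINAL = 0
    OFF_NOMINAL = 1
    ANOMALOUS = 2
    ESCALATING = 3


# Depth labels paraphrase the study's examples; the wording is an assumption.
_DEPTH = {
    Deviation.NOMINAL: "brief confirmation, or omitted",
    Deviation.OFF_NOMINAL: "observation plus candidate hypotheses",
    Deviation.ANOMALOUS: "documented rationale for diagnosis and response",
    Deviation.ESCALATING: "activities, analyses, decisions, and stances",
}


def briefing_depth(event_class: Deviation) -> str:
    """Return the (hypothetical) depth label for an event class."""
    return _DEPTH[event_class]


def order_for_briefing(events):
    """Sort (name, Deviation) pairs so the largest deviations come first."""
    return sorted(events, key=lambda e: e[1], reverse=True)


if __name__ == "__main__":
    events = [
        ("EVA", Deviation.NOMINAL),
        ("hydraulic leak", Deviation.ESCALATING),
        ("water spray boiler freeze", Deviation.ANOMALOUS),
    ]
    for name, cls in order_for_briefing(events):
        print(f"{name}: {briefing_depth(cls)}")
```

Using an ordered enumeration rather than free-form labels makes the "depth is a function of deviation" relationship explicit and comparable.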
It is interesting to note that, although events were clearly critical in the handover
updates, the practitioners rarely discussed base data values (e.g., “the pressure
is 82 psi”), but rather described data patterns in terms of events that were signi-
ficant in some way (e.g., “there was a water spray boiler freeze”). In the situation
where an automated system would be used to monitor and call in practitioners, it
would be important for the system to highlight or visualize significant events, not
just plot base data parameters (Thronesbery et al., 1999). It must be recognized,
however, that many of the shuttle events, and certainly the associated activities,
decisions, and data analyses, would be beyond the capabilities of an automated
logger to capture. Therefore, automated monitoring systems would need to be
designed such that this other information could be easily annotated by human
practitioners at regular intervals in order to avoid called-in practitioners lacking
critical information in escalating situations.
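One way to picture an event log that human practitioners can annotate is sketched below in Python. This is a hypothetical data structure, not a description of any tool used in mission control: the class names `EventEntry` and `Annotation`, the field names, and the four annotation kinds are assumptions chosen to mirror the categories of handover content reported in this study (activities, data analyses, decisions, and stances).

```python
from dataclasses import dataclass, field
from typing import List

# Annotation kinds mirror the handover content categories; the set is an assumption.
ALLOWED_KINDS = {"activity", "analysis", "decision", "stance"}


@dataclass
class Annotation:
    author: str  # e.g. a console position such as "MMACS"
    kind: str    # one of ALLOWED_KINDS
    text: str


@dataclass
class EventEntry:
    when: str         # free-form time label; no particular format is assumed
    description: str  # event-level description, not a raw parameter value
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, author: str, kind: str, text: str) -> None:
        """Attach a human-entered note to this event."""
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown annotation kind: {kind}")
        self.annotations.append(Annotation(author, kind, text))

    def summary(self) -> str:
        """Render the event and its annotations as indented plain text."""
        lines = [f"{self.when} {self.description}"]
        lines += [f"  [{a.kind}] {a.author}: {a.text}" for a in self.annotations]
        return "\n".join(lines)


if __name__ == "__main__":
    entry = EventEntry("ascent", "hydraulic leak")
    entry.annotate("MMACS", "stance", "turn off the leaking APU for entry")
    print(entry.summary())
```

Keeping the event description separate from its annotations lets an automated monitor supply the former while practitioners add the latter at regular intervals.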
5.5. THE CASE OF THE MISSING UPDATE: UNPREPARED TO CLOSE VENT
DOORS
Although in general the incoming flight controllers took over the responsibility of
their positions without incident due in large part to the effectiveness of the updates
that they received, an incident was observed where the back room Mech controller
did not anticipate a request to close the vent doors prior to docking with the MIR
Space Station. The controller was clearly surprised by this request, as evidenced by
prior statements made by the controller that he did not believe the action would be
requested, a look of surprise when the request was made, and a delay in the timeline
because implementing the action took several minutes longer than expected. In
addition, the observed controller described the incident to the following shift’s
controller as: “In the unlikely event that we do it, I didn’t want to be stumbling
around, then all of a sudden we’re doing this ...”
The controller was unprepared for the request because he was not updated
by another agent in the distributed system about a reversal in the Russian space
agency’s stance toward the decision about closing the vent doors on the shuttle prior
to docking. The inferred evolution of the mindsets of the United States and Russian
space agencies regarding whether or not to close the vent doors prior to docking
is detailed in Table VIII. Normally the vent doors are left open in space to allow
oxygen to escape prior to entry. The anomalous hydraulic leak during ascent raised
concerns that hydraulic fluid might contaminate the MIR Space Station. Analyses
conducted by both space agencies showed that the amount of leaked hydraulic
fluid was negligible with the implication that it was not necessary to close the
vent doors prior to docking. In addition, NASA planned to conduct a space walk
during the mission, demonstrating that they were not concerned about the hydraulic
fluid contaminating the interior of the shuttle. During communications between the
American and Russian space agencies, the two organizations presented evolving
stances toward the decision. One day before docking, the Russians announced that
they were “90% go” on docking without closing the vent doors. The observed
controller, along with the entire mission control center at NASA Johnson Space
Center, assumed that this was a final decision not to close the vent doors, as
evidenced by a voice loop update to the flight director.
Table VIII. Missing update on decision reversal triggers coordination surprise
Sometime between the conference call and the docking, a representative of the
American space agency had a private phone conversation with a representative
from the Russian space agency where the decision not to close the vent doors prior
to docking was reversed. This decision reversal was never communicated to the
personnel in mission control. As a consequence, the observed controller had
dedicated his resources to preparing for other tasks and was unprepared for the
request.
The observation of this instance where a missing update impacted the perform-
ance of the staffed controller provides converging evidence that updates are central
to effective performance. When practitioners are not fully updated on the current
situation, they are vulnerable to these types of ‘coordination surprises’ (Patterson et
al., 1998). Therefore, coping with additional workload during escalating situations
by delaying or eliminating updates to called-in practitioners will lead to predictable
cognitive and coordinative breakdowns. An essential element in maintaining safe
operations with the on-call architecture is to understand how to minimize the effort
to bring incoming practitioners quickly and efficiently up to speed in escalating
situations.
6. Discussion
The study findings highlight the importance of updates in preparing incoming
practitioners to effectively accept responsibility to be a supervisory controller in
a dynamic, event-driven, complex setting and the central role of prior knowledge
during the updates. During the update, practitioners learn the status of both the
monitored process and distributed agents’ activities in response to expected and
unexpected changes in the process flow. These observations elucidate why control-
lers will often refuse to accept a transfer of responsibility from another controller
without a face-to-face verbal update. The cognitive impact of the update was
observed in all facets of a flight controller’s work. The expectations for monit-
oring were set by knowing what changes had been made to system configurations
and what events had occurred. The agenda of activities to be done during the
upcoming shift was influenced by knowing what past activities were concluded and
what activities were ongoing. Knowing the team’s stance toward critical decisions
impacted communications with other controllers, particularly when decisions were
reopened for debate.
In addition to direct implications for training how to conduct effective shift
changes in supervisory control settings, such as by conducting pre-planning for
updates by looking at logs and other documentation, these study findings point
to other design and organizational implications relating to on-call architectures.
Under pressure to be more cost-efficient, NASA and other organizations are inter-
ested in using computer-enhanced sensor data processing to enable the reduction
of staffing during nominal situations. These findings have implications for two
envisioned scenarios where the on-call architecture for supervisory control is used
to meet this economic goal. In the first, staffing is minimized until a problem
occurs. In this case, the staff must recognize when problems occur and call in
practitioners with the appropriate types of expertise to resolve the problem. An
example of this scenario is the role of the Station Duty Officer, who is the only
staffed flight controller for the US Space Station for all but 3 hours a week, when no
crew is onboard the space station. In the second scenario, a computerized system
monitors a process and alerts humans when a problem occurs that requires their
attention. Although this may seem somewhat futuristic, this scenario is already
being considered in several domains, including scientific spacecraft mission control
(Brann et al., 1996) and unmanned missions to Mars.
Table IX. Representative entries in traditional mission control automated logs
M23:51:35 MODE SEL MAN ORB UNL V72K2975J has changed from 1 to 0
M23:51:35 MODE SEL MAN EE V72K2976J has changed from 0 to 1
M23:51:55 ORB UNL MODE IND V72X2906J has changed from 1 to 0
M23:54:55 EE MODE AUTO V72K2990J has changed from 0 to 1
M23:51:55 ENTER V72K2982J has changed from 1 to 0
In both of these scenarios, computer-enhanced sensor data processing is
required in order to monitor the massive amounts of data in order to recognize
significant events that need to be brought to a human’s attention. For example,
the traditional mission control automated log entries for one console over a few
minutes are displayed in Table IX (see Patterson, 1997, for a description of
current logging tools in space shuttle mission control). Clearly, if one controller is
responsible for monitoring dozens of such consoles at one time, significant events
could be missed due to the sheer mass of the data without “intelligent” machine
process support to recognize, prioritize, and highlight deviations from expectations.
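To make concrete how little each automated log entry conveys, the following Python sketch parses lines in the layout shown in Table IX into structured records. The field interpretation is an assumption inferred only from the five sample lines: the leading `M` is treated as an opaque prefix to the timestamp, the words before the identifier are treated as a label, and the names `LogEntry`, `param_id`, and `parse_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the Table IX samples (an assumption, not a specification):
#   M<hh:mm:ss> <label words> <identifier> has changed from <old> to <new>
LINE_RE = re.compile(
    r"^M(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<label>.+?)\s+"
    r"(?P<param_id>[A-Z]\d{2}[A-Z]\d{4}[A-Z])\s+"
    r"has changed from (?P<old>\d+) to (?P<new>\d+)$"
)


class LogEntry(NamedTuple):
    time: str
    label: str
    param_id: str
    old: int
    new: int


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not fit the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["time"], m["label"], m["param_id"],
                    int(m["old"]), int(m["new"]))


if __name__ == "__main__":
    sample = "M23:51:35 MODE SEL MAN ORB UNL V72K2975J has changed from 1 to 0"
    print(parse_line(sample))
```

Even after parsing, each record describes a single parameter transition; nothing in it indicates whether the change was expected or how it relates to other changes.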
For both scenarios, it is clear from these observations that the event recognition,
prioritization, and communication conducted by the mission controllers were very
different from those provided in the traditional automated logs. First, the controllers
did not communicate base data about “bit flips” in sensor data (e.g., “changed from
0 to 1”). In fact, communication about exact data values was nearly non-existent
during the updates. The event descriptions were at a much higher level, such as
“ratty circ pump” and “heater cycling” that were based on a complex combination
of multiple parameters, not all of which would independently be viewed to be out
of normal ranges for most situations and not all of which occurred simultaneously
in a discrete fashion. Second, not all of the nominal events were included in the
updates, although all of the off-nominal and anomalous events were. Therefore,
events that deviated from expectations were treated differently during the updates,
and the expectations were highly tailored to what was happening in the mission as
well as against a baseline of deviations such as water spray boilers that often freeze
up. Third, the event “signature” on which the recognition of the event was based
was nearly always more complex than a threshold crossing on a single parameter.
For example, one of the events was about a temperature on an engine that was lower
than two other engine temperatures. This temperature value was not out of range
of nominal parameter values. In addition, there are situations where one would
expect the temperature to be lower than the observed value, such as upon entry in
the cold atmosphere, which would not constitute an event. Fourth, most automated
logging tools only capture and display past information, and much of the handover
content related to future events, activities, analyses, and decisions, in order to help
the incoming controller prepare for and anticipate these things, or else pass them
on to the next shift to do so. Finally, many of the events that were discussed in
the handover updates were not about the space shuttle, but about deviations in
expected activities, decisions, and plans with other agents, such as the reversal to
the decision to keep the vent doors open, and so would be nearly impossible for an
automated logger to detect at all due to difficulties in designing sensors to detect
those higher-level abstractions.
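The contrast between a single-parameter threshold check and the comparison across identical systems that the controller actually used can be sketched as follows. This Python sketch is a simplified illustration under stated assumptions: the function names, the nominal range of 100 to 250, and the tolerance of 10 are hypothetical; only the readings of 163 and 180 come from the handover quoted in Table VII.

```python
from statistics import median
from typing import Dict, List


def out_of_range(readings: Dict[str, float], low: float, high: float) -> List[str]:
    """Single-parameter threshold check, as in a conventional limit monitor."""
    return [name for name, v in readings.items() if not (low <= v <= high)]


def peer_outliers(readings: Dict[str, float], tolerance: float) -> List[str]:
    """Flag systems whose reading differs from the peer median by more than tolerance."""
    mid = median(readings.values())
    return [name for name, v in readings.items() if abs(v - mid) > tolerance]


if __name__ == "__main__":
    # Main pump case drain temperatures post-ascent, as quoted in Table VII.
    temps = {"system 1": 180.0, "system 2": 180.0, "system 3": 163.0}
    # Hypothetical limits and tolerance, for illustration only.
    print(out_of_range(temps, low=100.0, high=250.0))
    print(peer_outliers(temps, tolerance=10.0))
```

Under these assumed limits the threshold check flags nothing while the peer comparison flags system 3. The peer comparison still ignores context, such as mission phases in which a lower temperature is expected, which is part of why the event signatures described here are hard to automate.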
Overall, one implication of these observations is that automated loggers are
currently not capable of completely replacing a human monitor in accurately
detecting all unexpected data patterns, prioritizing these patterns, displaying them
flexibly at multiple levels of detail upon demand, and quickly filling in targeted
holes in knowledge upon request. At the same time, these observations point
out how heavily the effectiveness and efficiency of the updates relied upon the
incoming controllers having substantial knowledge prior to initiating the update.
This observation calls into question whether or not updates to called-in practi-
tioners could be conducted as quickly and effectively as these shift change updates.
Without as much prior knowledge, as would be the case in a call-in situation,
called-in practitioners would likely take much more time and resources before
they could effectively aid the staffed practitioner. It is likely that the update would
take longer, and updating would be a larger cost to the staffed practitioner at a
very busy time than if the incoming practitioner were more knowledgeable. The
burden for thoroughly covering all of the topics to be discussed would fall onto the
staffed practitioner, and possibly the called-in practitioner would try to raise topics
that are less relevant. Rather than being able to target specific gaps in knowledge
through directed questions, the staffed practitioner would be forced to cover more
information in the update or risk leaving out important information. Finally, the
common ground would not have been built up between the staffed and the called-in
practitioner, so the communications would be less terse and rely less on a common
body of shared knowledge, leaving open more possibilities for miscommunica-
tions. Given that these shift change updates were ten minutes on average and that
in escalating on-call situations, ten minutes might be prohibitively long to tie up
the resources of the staffed controller, it is likely that other means to prime on-call
practitioners to receive updates might become important in effectively drawing in
the called-in practitioner in the first scenario.
A partial organizational response to this dilemma would be for called-in prac-
titioners to invest in a process understanding before any problems occur. NASA
Johnson Space Center has already implemented this organizational solution during
missions where the staffing is reduced unless a problem occurs. They have made
being on-call an official responsibility that requires investment, although less than
if all the controllers were continuously staffed. For each mission, two controllers
are assigned the responsibility of being on call, one scheduled from midnight to
noon and another from noon to midnight. These controllers observe critical phases
of the mission, such as ascent. They also stop by the console in mission control to
obtain updates, read the log, listen to the voice loops, and watch the monitored data
once a day for about 15 minutes. By investing in learning about events that have
occurred during low-tempo periods, they are then more prepared to respond in
an on-call situation. We are considering how to further support this solution
by giving on-call controllers who are physically and temporally removed from the
control center remote access to ‘open’ tools, such as voice loops and data screens,
so that they can gain a feel for the process without leaving their offices. It is also possible
that the same tool could be used to provide called-in practitioners with a partial
understanding to prime them for the update from the staffed practitioner, thereby
reducing some of the burden of updating the incoming practitioner at a busy time.
This field study also suggests implications for the second envisioned on-call
scenario where humans are removed from the monitoring loop during nominal situ-
ations. In this situation, it is likely that machine processing would have to perform
some control activities, not just monitor and record deviations from expectations,
in order to reduce the number of times a human agent would need to be called in.
From the results of this field study, it is clear that such a tool would probably over-
control or inaccurately control a complex process on occasion. Therefore, control
actions from such systems should be highly constrained and the consequences of
over-controlling or inaccurately controlling should be low.
Based on the results of this study, it must be acknowledged that even with the
most advanced automated monitors, it would be dangerous to completely remove
human personnel from nominal operations in complex, high-risk environments
with escalating events. Automated loggers are mainly used to capture and manip-
ulate data at the level of data parameter values, missing much of the information
about significant events, and activities and changes to plans that are associated with
cascades from escalating events. Rather than completely replacing human super-
visory controllers with automated loggers, perhaps we can develop support tools
for human practitioners that are only intermittently involved. For example, we can
develop ‘hybrid’ systems, where humans can periodically annotate information that
cannot be captured electronically onto automated logs. These records could then
be used to prime called-in practitioners for updates during escalating situations.
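The ‘hybrid’ log idea above can be illustrated with a minimal sketch. It is not a description of any tool used in mission control; the class names, field names, and the choice of seconds as the time unit are all assumptions made for illustration. The sketch keeps machine-recorded parameter values and human annotations in one time-stamped record, and lets a called-in practitioner pull only the human annotations made since a given time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogEntry:
    """One record in a hypothetical hybrid log (all names are illustrative)."""
    time_s: float                    # elapsed time in seconds (assumed unit)
    source: str                      # "auto" for machine-logged, "human" for annotations
    text: str                        # parameter reading or free-text note
    event_id: Optional[str] = None   # optional tag linking notes to one event


@dataclass
class HybridLog:
    """Automated parameter records plus periodic human annotations."""
    entries: List[LogEntry] = field(default_factory=list)

    def record_auto(self, time_s: float, parameter: str, value: float) -> None:
        # Automated loggers capture data at the level of parameter values.
        self.entries.append(LogEntry(time_s, "auto", f"{parameter}={value}"))

    def annotate(self, time_s: float, note: str,
                 event_id: Optional[str] = None) -> None:
        # Humans add what cannot be captured electronically,
        # such as significant events or changes to plans.
        self.entries.append(LogEntry(time_s, "human", note, event_id))

    def briefing(self, since_s: float) -> List[LogEntry]:
        # Human annotations made at or after `since_s`, oldest first,
        # as a starting point for priming a called-in practitioner.
        recent = [e for e in self.entries
                  if e.source == "human" and e.time_s >= since_s]
        return sorted(recent, key=lambda e: e.time_s)
```

A called-in practitioner's tool might call `briefing(since_s=...)` with the time of their last visit to the console, then ask the staffed practitioner directed questions about the events listed.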
The pressure to minimize costs during nominal operations is expected to
continue to mount in most supervisory control domains. Given the potentially
extreme risks associated with failing to effectively integrate the additional
resources necessary to respond to an escalating situation, there is increased interest
in finding ways to mitigate those risks. The findings from this field study highlight
how prior knowledge and common ground between practitioners contribute to an
effective and efficient update. Investing in a common
ground before problems occur, by getting updates from staffed practitioners
during low workload periods and ‘looking in’ on data remotely through computer
support tools, will allow practitioners to be more effective at seamlessly providing
the necessary expertise and additional resources to safely respond to escalating
situations.
Acknowledgements
Support for this research was provided by NASA Johnson under Grant No.
NAG 9-786, Human Interaction Design for Cooperating Automation. This work
was made possible through collaboration with colleagues in the Intelligent Systems
Branch, including Dr. Jane Malin, Dr. Carroll Thronesbery, Dr. Debra Schreck-
enghost, Mr. Ron Kerr, Dr. David Overland, and Dr. Tico Foley, as well as
with colleagues from the Cognitive Systems Engineering Laboratory, including
Dr. Jennifer Watts-Perotti, Mr. James Corban, Ms. Renee Chow, and Mr. Klaus
Christoffersen. This material is also based upon work supported under a National
Science Foundation Graduate Fellowship. Any opinions, findings, conclusions or
recommendations expressed in this publication are our own and do not necessarily
reflect the views of the National Science Foundation. We thank four anonymous
reviewers for their useful critiques and suggestions.
References
Benchekroun, H., B. Pavard and P. Salembier (1995): Design of Cooperative Systems in Complex
Dynamic Environments. In J.-M. Hoc, P.C. Cacciabue and E. Hollnagel (eds.): Expertise and
Technology: Cognition and Human-Computer Cooperation. Lawrence Erlbaum.
Brann, D.B., D.A. Thurman and C.M. Mitchell (1996): Human Interaction with Lights-out Automa-
tion: A Field Study. In Human Interaction with Complex Systems ’96. Dayton, OH.
Clark, H. and S. Brennan (1991): Grounding in Communication. In L. Resnick, J. Levine and S.
Teasley (eds.): Socially Shared Cognition. Washington, DC: American Psychological Associ-
ation.
Clark, H.H. (1992): Arenas of Language Use. Chicago: The University of Chicago Press.
Grusenmeyer, C. (1995): Shared Functional Representation in Cooperative Tasks: The Example
of Shift Changeover. International Journal of Human Factors in Manufacturing, vol. 5, no. 2,
pp. 163–176.
Heath, C. and P. Luff (2000): Technology in Action. Cambridge: Cambridge University Press.
Hollnagel, E., O. Pederson and J. Rasmussen (1981): Notes on Human Performance Analysis
(Technical Report Riso-M-2285). Riso National Laboratory.
Hutchins, E. (1995): How a Cockpit Remembers Its Speed. Cognitive Science, vol. 19, pp. 265–288.
Johannesen, L., R. Cook and D. Woods (1994): Grounding Explanations in Evolving Diagnostic Situ-
ations (CSEL Report 1994-TR-03). The Ohio State University, Cognitive Systems Engineering
Laboratory.
Jones, P.M. (1995): Cooperative Work in Mission Operations: Analysis and Implications for
Computer Support. Computer Supported Cooperative Work, vol. 3, pp. 103–145.
Kerns, K., P.J. Smith, C.E. McCoy and J. Orasanu (1998): Ergonomic Issues in Air Traffic Manage-
ment. In W. Marras and W. Karwowski (eds.): Handbook of Industrial Ergonomics. CRC
Press.
Patterson, E.S. (1997): Coordination Across Shift Boundaries in Space Shuttle Mission Control
(CSEL Report 1997-TR-01). The Ohio State University, Cognitive Systems Engineering Labor-
atory.
Patterson, E.S., D.D. Woods, N.B. Sarter and J. Watts-Perotti (1998): Patterns in Cooperative Cogni-
tion. COOP ’98, Third International Conference on the Design of Cooperative Systems. Cannes,
France, 26–29 May, pp. 13–23.
Patterson, E.S., J. Watts-Perotti and D.D. Woods (1999): Voice Loops as Coordination Aids in Space
Shuttle Mission Control. Computer Supported Cooperative Work: The Journal of Collaborative
Computing, vol. 8, no. 4, pp. 353–371.
Rochlin, G.I., T.R. La Porte and K.H. Roberts (1987): The Self-designing High-reliability Organiza-
tion, Aircraft Carrier Flight Operations at Sea. Naval War College Review, Autumn, pp. 76–90.
Suchman, L. (1987): Plans and Situated Actions: The Problem of Human-Machine Communication.
Cambridge: Cambridge University Press.
Thronesbery, C., K. Christoffersen and J. Malin (1999): Situation-oriented Displays of Space
Shuttle Data. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting,
September 27–October 1, Houston, TX, pp. 284–288.
Wegner, D., T. Giuliano and P. Hertel (1985): Cognitive Interdependence in Close Relationships. In
W. Ickes (ed.): Compatible and Incompatible Relationships. New York: Springer-Verlag.
Woods, D.D. (1993): Process Tracing Methods for the Study of Cognition Outside of the Exper-
imental Psychology Laboratory. In G. Klein, J. Orasanu and R. Calderwood (eds.): Decision
Making in Action: Models and Methods. Norwood, NJ: Ablex Publishing Corporation.
Woods, D.D. and E.S. Patterson (2001): How Unexpected Events Produce an Escalation of Cognitive
and Coordinative Demands. In P.A. Hancock and P.A. Desmond (eds.): Stress, Workload, and
Fatigue. Hillsdale, NJ: Lawrence Erlbaum, pp. 290–304.
Woods, D.D. (1994a): Cognitive Demands and Activities in Dynamic Fault Management: Abductive
Reasoning and Disturbance Management. In N. Stanton (ed.): Human Factors in Alarm Design.
Bristol, PA: Taylor and Francis.
Woods, D.D., L.J. Johannesen, R.I. Cook and N.B. Sarter (1994b): Behind Human Error: Cognitive
Systems, Computers, and Hindsight. Dayton, OH: CSERIAC.