Automation in Future Air Traffic Management: Effects of Decision Aid Reliability on Controller Performance and Mental Workload

Ulla Metzger and Raja Parasuraman, Catholic University of America, Washington, D.C.

Future air traffic management concepts envisage shared decision-making responsibilities between controllers and pilots, necessitating that controllers be supported by automated decision aids. Even as automation tools are being introduced, however, their impact on the air traffic controller is not well understood. The present experiments examined the effects of an aircraft-to-aircraft conflict decision aid on the performance and mental workload of experienced, full-performance-level controllers in a simulated Free Flight environment. Performance was examined with both reliable (Experiment 1) and inaccurate (Experiment 2) automation. The aid improved controller performance and reduced mental workload when it functioned reliably. However, detection of a particular conflict was better under manual conditions than under automated conditions when the automation was imperfect. Potential or actual applications of the results include the design of automation and procedures for future air traffic control systems.

Address correspondence to Raja Parasuraman, Arch Lab, George Mason University, MS 3F5, 4400 Fairfax Dr., Fairfax, VA 22030-4444; rparasur@gmu.edu.
INTRODUCTION
Several proposals for future air traffic man-
agement (ATM) will change the roles of air traf-
fic controllers and pilots. For example, under
Free Flight (FF; Radio Technical Commission
for Aeronautics [RTCA], 1995) and Distributed
Air/Ground Traffic Management (DAG-TM; Na-
tional Aeronautics and Space Administration
[NASA], 1999), pilots would have greater free-
dom to choose their own heading, altitude, and
speed in real time and primary responsibility for
maintaining separation from other aircraft in the
immediate airspace. Controllers would not be in-
volved in active control of aircraft but would be
in a role of “management by exception” (Dekker
& Woods, 1999; Wickens, Mavor, Parasura-
man, & McGee, 1998). Management by excep-
tion refers to a management concept in which
managers are notified by staff only if a certain
variable (e.g., a budget) exceeds or falls below a
certain value (Drucker, 1954). In the case of air
traffic control (ATC), controllers would manage
traffic flow, leaving the detection and resolution
of conflicts to the pilots and intervene only if
aircraft separation falls below a certain value
(e.g., 5 nautical miles laterally and 1000 feet ver-
tically).
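
For concreteness, this intervention criterion reduces to a simple separation predicate. The following sketch is ours, not part of the original study; the flat-sector distance approximation and all names are illustrative:

    import math

    LATERAL_NM = 5.0      # lateral separation minimum (nautical miles)
    VERTICAL_FT = 1000.0  # vertical separation minimum (feet)

    def lost_separation(ac1, ac2):
        """True only if both the lateral and the vertical minima are
        infringed at the same time; aircraft 1,000 ft apart vertically
        may legally cross laterally."""
        lateral_nm = math.hypot(ac1["x"] - ac2["x"], ac1["y"] - ac2["y"])
        vertical_ft = abs(ac1["alt"] - ac2["alt"])
        return lateral_nm < LATERAL_NM and vertical_ft < VERTICAL_FT

    # Example: 3 nm apart laterally but 2,000 ft apart vertically -> separated.
    assert not lost_separation({"x": 0, "y": 0, "alt": 33000},
                               {"x": 3, "y": 0, "alt": 35000})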
The feasibility of the FF and DAG-TM con-
cepts has been tested in studies with pilots in
flight simulations (e.g., Dunbar et al., 1999;
Lozito, McGann, Mackintosh, & Cashion, 1997;
van Gent, Hoekstra, & Ruigrok, 1998). How-
ever, all future ATM concepts envisage a role for
the controller to step in and intervene to ensure
aircraft separation under certain conditions (fail-
ure of aircraft systems, bad weather, etc.). It is
therefore important to examine how well con-
trollers can detect and resolve conflicts when
they are removed from the tactical control loop
but then have to reenter it to ensure safety.
Several studies using moderate- to high-
fidelity simulators and experienced en route
controllers have shown that conflict detection
performance and situation awareness were re-
duced and mental workload increased under such
simulated FF conditions (Castaño & Parasur-
aman, 1999; Corker, Fleming, & Lane, 1999;
Endsley, Mogford, Allendoerfer, Snyder, & Stein,
1997; Endsley & Rodgers, 1998; Galster, Duley,
Masalonis, & Parasuraman, 2001; Metzger
& Parasuraman, 2001; Willems & Truitt, 1999).
(However, no adverse effects were reported by
Hilburn & Parasuraman, 1997, who tested Brit-
ish military controllers, or by Remington, Johns-
ton, Ruthruff, Gold, & Romera, 2000, who tested
four retired controllers in a simple visual search
task that required only conflict detection and no
subsidiary tasks.) Air-ground integration stud-
ies involving both pilots and controllers have
also been conducted. For example, DiMeo et al.
(2002) found that controllers reported higher
workload and showed more conservative con-
flict resolution behavior than did pilots, whereas
pilots preferred FF scenarios and found them to
be safer and to provide greater situation aware-
ness than current operations.
These investigations provided the first empir-
ical evidence of the effects of future ATM con-
cepts on controller performance and pointed to
the lack of aircraft intent information (Castaño
& Parasuraman, 1999) and the passive moni-
toring role (Metzger & Parasuraman, 2001) as
factors contributing to reduced controller per-
formance. For these ATM concepts to be im-
plemented successfully, therefore, automation
support must be provided for controllers (Corker
et al., 1999; Parasuraman, Duley, & Smoker,
1998), and the DAG-TM program specifically
incorporates controller automation tools (NASA,
1999). One system now in operational use is the
User Request Evaluation Tool (URET), devel-
oped by the MITRE Corporation. URET assists
controllers in detecting potential conflicts be-
tween aircraft or between aircraft and restricted
airspace and suggests resolutions by continuous-
ly checking current flight plan trajectories for
strategic conflicts up to 20 min into the future.
It includes sophisticated algorithms that analyze
and integrate data from different sources (e.g.,
radar data, flight plans) considering numerous
additional parameters (climb rates for different
aircraft types, wind, weather models, etc.). Hence
URET goes well beyond the capability of sim-
ple alerts such as the short-term conflict alert
(STCA). For evaluations of the effectiveness of
URET in conflict detection, see Brudnicki and
McFarland (1997) and Masalonis and Parasur-
aman (2003).
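
URET's production algorithms are proprietary and far richer than can be shown here, but the core idea of a strategic conflict probe can be sketched. The fragment below is our simplification: constant-velocity dead reckoning stands in for URET's trajectory modeling, and it reuses the lost_separation predicate sketched earlier.

    from itertools import combinations

    def project(ac, t):
        """Dead-reckon a position t seconds ahead from current velocity.
        (URET instead builds trajectories from flight plans, aircraft
        performance, and wind/weather models.)"""
        return {"x": ac["x"] + ac["vx_nm_s"] * t,
                "y": ac["y"] + ac["vy_nm_s"] * t,
                "alt": ac["alt"] + ac["vz_ft_s"] * t}

    def probe_conflicts(aircraft, lookahead_s=20 * 60, step_s=10):
        """Return {(id1, id2): seconds_until_predicted_loss} for every
        aircraft pair predicted to lose separation within the horizon."""
        alerts = {}
        for ac1, ac2 in combinations(aircraft, 2):
            for t in range(0, lookahead_s + 1, step_s):
                if lost_separation(project(ac1, t), project(ac2, t)):
                    alerts[(ac1["id"], ac2["id"])] = t
                    break  # keep only the earliest predicted loss per pair
        return alerts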
How will these and other automated systems
influence controller performance, and will they
enhance or reduce safety under FF? The advent
of these technologies has stimulated much re-
search on automation and human performance
(Parasuraman & Byrne, 2003; Parasuraman &
Mouloua, 1996; Sheridan, 2002). A major conclu-
sion is that automation fundamentally changes
the nature of the cognitive demands and respon-
sibilities of human operators, often in ways that
were unintended or unanticipated by the de-
signers (Bainbridge, 1983; Billings, 1997; Para-
suraman & Riley, 1997; Sarter & Amalberti,
2000; Wiener & Curry, 1980). Previous research
and experience shows that automation leads to
both benefits and costs. Among the human per-
formance costs of certain automation designs
are unbalanced mental workload, complacen-
cy, reduced situation awareness, cognitive skill
loss, and poorly calibrated trust (Parasuraman
& Riley). Such effects have been discussed in de-
sign guidelines for automation for future ATM
(Ahlstrom, Longo, & Truitt, 2002), but these
have been derived mostly from research on cock-
pit automation. There is as yet little empirical
work on controller performance with automa-
tion, particularly in relation to ATM concepts
in which controllers share their decisions with
other members of the system and assume a
rather passive role.
The current study focused on two aspects of
controller-automation interaction: (a) whether
automation can reduce controller workload and
(b) how automation reliability affects controller
performance and workload (Wickens, 2000).
Automation is often implemented in an at-
tempt to reduce the operator's workload during
peak periods of task load. However, this does
not always occur. For example, cockpit automa-
tion has sometimes reduced mental workload
in phases of flight when workload was already
low (e.g., autopilot during the cruise phase) and
increased mental workload in phases of flight
when workload was already high (e.g., repro-
gramming the flight management system during
final approach), a phenomenon referred to as
clumsy automation (Wiener & Curry, 1980). In
addition, automation often changes manual con-
trol tasks to monitoring tasks, leaving the human
to supervise the automation (Sheridan, 2002),
which can impose considerable workload
(Warm, Dember, & Hancock, 1996).
The second area of concern is the ability of
human operators to manage a system when
automation fails or malfunctions in some way.
This has been referred to as the out-of-the-loop
unfamiliarity (OOTLUF) problem (Wickens,
1992). In addition, several studies have examined
the effects of imperfect or unreliable automa-
tion on operator performance in target detection
and complex decision-making tasks (Galster,
Bolia, Roe, & Parasuraman, 2001; Rovira, Mc-
Garry, & Parasuraman, 2002; Wickens, Gempler,
& Morphew, 2000). The results generally showed
that operators have difficulties in detecting tar-
gets or making effective decisions if the automa-
tion incorrectly highlights a low-priority target
or gives incorrect advice. The OOTLUF prob-
lem also results in operators requiring more
time to intervene under automated control than
under manual control because they have to first
regain awareness of the state of the system.
Operators have a better mental model or aware-
ness of the system state when they are actively
involved in creating the state of the system than
when they are passively monitoring the actions
of another agent or automation (Endsley, 1996;
Endsley & Kiris, 1995), particularly if the auto-
mation interface does not support the operator
in gathering the raw information on which the
automation bases its decisions (Lorenz, Di
Nocera, Rottger, & Parasuraman, 2002).
This problem seems particularly relevant to
the problem of automation in future ATM con-
cepts because the shared decision making can
already take the controller out of the control
loop and limit the controller's access to the in-
formation (e.g., pilot intent) relevant to conflict
detection and resolution. If automation is then
introduced to compensate for the effects of re-
duced situation awareness induced by the trans-
fer of decision-making authority away from the
controller to the pilot or dispatcher, the OOTLUF
problem might be further aggravated with im-
perfect automation when the controller is expect-
ed to detect and resolve conflicts despite being
initially “remote” from the control loop.
Although some decision aids may improve
performance under current ATC conditions (e.g.,
Hilburn, 1996; Schick & Völckers, 1991), no em-
pirical data are available on the effects of auto-
mation on controller performance and mental
workload under FF and other future ATM sys-
tems. Automating the decision-making process
in a dynamic environment such as ATC is not a
trivial task, especially under conditions of shared
decision making. A powerful decision aid will
have to accurately predict pilot intentions, weath-
er, and wind. Under traditional ATC conditions, pilots had to follow the directions of the controller and, typically, stay on assigned airways; as long as they complied with ATC instructions, their intent was relatively easy to predict. With the introduction of the Na-
tional Route Program in the late 1990s, these
restrictions were loosened, and under FF condi-
tions pilots have even greater freedom to choose
their routes and altitudes and are not required
to stay on airways. Hence pilot intent is very dif-
ficult to predict for both controllers and auto-
mation.
Although highly capable automated systems
are being developed, these considerations sug-
gest that the emergence of fully reliable automa-
tion that can cope with all situations is unlikely.
As Bilimoria (2001) noted, changes in pilot intent
might in some cases not be received in time or
not be received at all by the computer system
from which URET receives the information re-
quired by its conflict prediction algorithm. There-
fore it is important to examine not only how
well controllers perform with decision-aiding
automation but also when the automation is less
than perfect. The two experiments reported here
on conflict detection automation examined
these issues.
EXPERIMENT 1
The first experiment examined the potential
of a reliable conflict detection aid to compensate
for the reduced performance and increased men-
tal workload typically associated with increased
traffic density and FF. A “mature” level of FF
was chosen in anticipation of a long-term future
operational concept, as in Concept Element 5
(Phillips, 2000) of the NASA DAG-TM, which
addresses en route free maneuvering. In this
concept, controllers monitor free-maneuvering
aircraft and provide only advisories on other
traffic, weather, airspace restrictions, and re-
quired time-of-arrival assignments. Even though
pilots stayed on airways and filed flight plans
in this simulation, they could deviate from them
at any time without notifying the controller. Au-
thority for maintaining separation was with the
(simulated) pilots. Controllers monitored traffic
and were expected to act as a backup (i.e., detect
conflicts) in case such self-separations failed.
It was expected that conflict detection perfor-
mance would be reduced and mental workload
increased under high levels of traffic as com-
pared with moderate levels. With the support of
a decision aid, however, performance should be
improved and workload reduced under both
moderate and high traffic density. It was also
expected that the performance of routine ATC
tasks (e.g., communication) would be reduced
under high traffic conditions, as compared with
moderate traffic conditions, without the aid and
that the detection aid would free resources and
improve performance in routine ATC tasks as
compared with unaided performance. Finally,
because the use of automation (and ultimately operator performance) is determined by, among other factors, operator trust in the automation and self-confidence to perform without it, controller ratings of trust and self-confidence (Experiment 2 only) were also obtained. If trust in an aid is low, operators are not likely to use it, and therefore their performance might not benefit from the aid as much as expected.
If operators place too much trust in the auto-
mation, however, an automation failure could
lead to a performance breakdown.
Method
Participants. Twelve active full-performance
level en route controllers from the Washington,
D.C., Air Route Traffic Control Center (ARTCC)
between the ages of 32 and 51 years (M = 37.17,
SD = 4.84) served as paid volunteers. Their
average overall experience (years on the job), in-
cluding all military and civilian positions, ranged
from 11.0 to 19.5 years (M = 13.46, SD = 3.24).
All were male.
Apparatus. A medium-fidelity ATC simulator
(Masalonis et al., 1997) was used to simulate a
generic airspace. The simulation consisted of
a radar or primary visual display (PVD), a data
link display, and electronic flight strips present-
ed on two different monitors. A trackball was
used as the input device for both monitors. The
PVD, as shown in Figure 1, consisted of aircraft
targets, data blocks, jet routes, and way points.
Figure 1. Radar or primary visual display (PVD).

The adjacent monitor displayed the data link
and an electronic flight progress strip for each
flight (Figure 2). The data link display was used
to simulate communications between pilots and
controllers. Not only is the data link an easy way
to simulate communications without the require-
ment for pseudo-pilots, it is also envisioned as
the means of communication for routine trans-
missions in future ATC and cockpit systems.
Simulated ATC tasks and dependent vari-
ables. The controllers’ most important (primary)
task was the detection of potential conflicts. A
potential conflict could result in an actual con-
flict, in which two aircraft lost separation (i.e.,
came within 5 nautical miles and 1000 feet of
each other), or in a self-separation, in which one
of two aircraft on a conflict course made an
evasive maneuver (e.g., changed speed, heading,
or altitude) in order to avoid the impending loss
of separation. Controllers were required to in-
dicate when they detected a potential conflict
and name the call sign of the aircraft involved.
Dependent variables were the percentage of de-
tected conflicts and self-separations and the
advance notification time for each. Advance no-
tification times indicate how long before the loss
of separation occurred or would have occurred
(in the case of a self-separation) a controller
reported a conflict.
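
In code form, the two primary-task measures reduce to a simple aggregation over the scripted events; the sketch below uses a hypothetical event log and field names of our own:

    # Hypothetical event log: time of (actual or would-be) loss of
    # separation and the controller's report time, None if missed.
    events = [
        {"id": "conflict-1", "los_s": 1200, "reported_s": 1050},
        {"id": "selfsep-1",  "los_s": 1500, "reported_s": None},
    ]

    detected = [e for e in events if e["reported_s"] is not None]
    detection_rate_pct = 100.0 * len(detected) / len(events)
    # Advance notification: seconds of warning before the loss of
    # separation occurred or would have occurred.
    mean_advance_s = sum(e["los_s"] - e["reported_s"] for e in detected) / len(detected)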
Because ATC is a multitask situation, con-
trollers were required to perform several other
tasks. Controllers had to communicate with pi-
lots in order to accept them into the sector and
to hand them off to an adjacent sector. As soon
as an aircraft came close to entering the sector
boundaries, it sent a message via the data link
display asking for acceptance. The controller had
to respond to the message and accept the air-
craft. As soon as an aircraft crossed a designated
hand-off zone, controllers had to hand off the
aircraft to the next sector by selecting the aircraft
in a list of flights and clicking a hand-off button.
Dependent variables included the percentage
of successfully accepted aircraft, response times
to requests for acceptance, percentage of air-
craft handed off successfully, percentage of
aircraft handed off early, and the response times
to handoffs. Monitoring the progress of each air-
craft moving through the sector and updating
the flight strips accordingly served as a secondary
task. Controllers were instructed to click on the
way point of a flight strip as soon as an aircraft
had crossed that way point on the radar display.
Dependent measures included the percentage
of missed and early way point updates as well as
response times. Controllers were instructed that
updating the way points was secondary in priority.
Figure 2. Electronic flight progress strips (left side) and data link display (right side).
Other measures. Subjective ratings of mental
workload were obtained with the NASA-TLX
(NASA Ames Research Center, 1986). In auto-
mation conditions, subjective ratings of trust in
the automation and self-confidence (Experi-
ment 2 only) to perform without automation
were obtained using a scale ranging from 0
(not at all) to 100 (extremely). The scales were
based on measures used by Lee and Moray
(1992, 1994) and adapted to fit the format of
the NASA-TLX ratings. A heart rate monitor
was used in order to obtain the 0.10-Hz band of
heart rate variability (HRV) as a physiological
measure of mental workload. Eye movements
were also recorded. However, because of space
limitations, those results will be reported in a
separate paper.
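
The 0.10-Hz (midfrequency) band power is conventionally computed by resampling the R-R interval series onto an even time grid and integrating its power spectrum around 0.10 Hz; power in this band tends to decrease as mental workload increases. The sketch below is one common recipe, not necessarily the study's exact processing; the 0.07 to 0.14 Hz band limits, 4-Hz resampling, and Welch's method are our assumptions:

    import numpy as np
    from scipy.interpolate import interp1d
    from scipy.signal import welch

    def hrv_010_band_power(rr_ms, fs=4.0, band=(0.07, 0.14)):
        """Power of the heart rate variability spectrum around 0.10 Hz.
        rr_ms: successive R-R intervals in milliseconds."""
        beat_times = np.cumsum(rr_ms) / 1000.0           # beat times (s)
        grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
        rr_even = interp1d(beat_times, rr_ms, kind="cubic")(grid)
        rr_even -= rr_even.mean()                        # remove DC offset
        freqs, psd = welch(rr_even, fs=fs, nperseg=min(256, len(rr_even)))
        mask = (freqs >= band[0]) & (freqs <= band[1])
        return np.trapz(psd[mask], freqs[mask])          # integrate the band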
Design. A two-factorial repeated measures
design with two levels on each factor was cho-
sen. Independent variables were (a) traffic den-
sity, with moderate and high levels of traffic in
the ATC sector, and (b) the availability of a de-
tection aid, with aid absent and aid present as
treatment levels. All controllers performed in all
resulting four conditions (within-subject design).
The order of conditions was presented accord-
ing to a complete double crossover design, with
the availability of the detection aid as the first
crossover and traffic density as the second cross-
over. The participants were randomly assigned
to an order.
Each condition was represented by one sce-
nario. Hence controllers were presented with
four 30-min scenarios that were created to com-
bine high (on average 16 aircraft after an initial
10-min ramp-up period) and moderate (on aver-
age 10 aircraft after an initial 10-min ramp-up
period) traffic density with the absence or pres-
ence of a conflict detection aid in a sector with
a 50-mile radius. Each scenario contained two
conflicts and four self-separations. In scenarios
with a conflict detection aid, a potential conflict
was indicated by a red circle around the two
aircraft involved (see Figure 1) 5 min before the
aircraft lost separation. As soon as one aircraft
made an evasive maneuver to avoid an impend-
ing conflict (e.g., descended 1000 feet), the circle
disappeared. Self-separations always occurred
after the aid appeared. In scenarios without the
detection aid, the circle appeared only when air-
craft lost separation so as to give the controllers
feedback in a manner similar to that in their real
work environments. In contrast to the currently
used STCA, which is based on only current tra-
jectory, the detection aid used in this study had
access to information based on the flight plans
and therefore was, in its functionality, similar
to advanced aids such as URET.
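
The behavior of the aid as described above reduces to a small piece of display logic. The following sketch is our reconstruction of the described behavior, with illustrative names:

    LOOKAHEAD_S = 5 * 60  # the aid cued conflicts 5 min before loss of separation

    def circle_visible(now_s, predicted_los_s, evasive_maneuver_s=None):
        """The red circle is shown while a predicted loss of separation
        lies within the look-ahead window and no aircraft has yet
        maneuvered to resolve the conflict (a self-separation turns the
        circle off)."""
        if evasive_maneuver_s is not None and now_s >= evasive_maneuver_s:
            return False
        return 0 <= predicted_los_s - now_s <= LOOKAHEAD_S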
Procedure. After signing the informed consent
form, controllers were connected to the heart
rate monitor, given a demonstration of the sim-
ulation, completed a practice trial, and were
familiarized with the NASA-TLX and the trust
scales. After this, each controller completed
four 30-min scenarios.
Results
For all dependent variables, 2 (presence or
absence of aid) × 2 (high or moderate traffic
load) repeated measures analyses of variance
(ANOVAs) were calculated. An alpha level of
.05 was used for all statistical tests.
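
A 2 × 2 repeated measures ANOVA of this kind can be reproduced with standard tools. The sketch below shows the structure of the analysis using statsmodels; the data are synthetic placeholders and the column names are ours, not the study's:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(0)
    rows = [{"controller": s, "aid": aid, "traffic": traffic,
             "det_pct": rng.uniform(0, 100)}     # placeholder scores only
            for s in range(12)                   # 12 controllers, as in Experiment 1
            for aid in ("absent", "present")
            for traffic in ("moderate", "high")]
    df = pd.DataFrame(rows)

    res = AnovaRM(df, depvar="det_pct", subject="controller",
                  within=["aid", "traffic"]).fit()
    print(res.anova_table)  # F(1, 11) for aid, traffic, and their interaction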
Primary task performance: Detection of con-
flicts and self-separations. Averaged across all
conditions, controllers detected 78.13% (SE =
4.44%) of all conflicts. They detected a higher
percentage of conflicts when traffic density was
moderate (M = 89.58%, SE = 4.23%) than when
traffic was high (M = 66.67%, SE = 7.16%),
F(1, 11) = 8.59, p < .05. With respect to self-
separations, more were detected under moder-
ate (M = 68.75%, SE = 6.60%) than under high
traffic density (M = 37.50%, SE = 5.43%),
F(1, 11) = 25.00, p < .001, and when the decision
aid was present (M = 69.79%, SE = 6.73%)
than when it was absent (M = 36.46%, SE =
4.99%), F(1, 11) = 30.61, p = .001.
Advance notification times were averaged
across the two conflicts and four self-separations
in each scenario. (Because of undetected po-
tential conflicts, 8 out of 96 cells [8.33%] were
empty, so they were replaced by the respective
means of the conditions in which the cell was
missing.) With the aid, conflicts were detected
approximately 90 s earlier (M = 256.99 s, SE =
10.12 s) than without the aid (M = 164.06 s,
SE = 20.53 s), F(1, 11) = 24.39, p < .001. Self-
separations were detected earlier under moderate
(M = 347.03 s, SE = 18.01 s) than under high
traffic density (M = 251.20 s, SE = 11.39 s), F(1,
11) = 17.23, p < .01. Table 1 gives a summary of
the detection rates and advance notification
times for conflicts and self-separations.
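
The mean replacement described in the parenthetical note above corresponds to a per-condition imputation. In pandas (assuming the long-format frame from the previous sketch, with a hypothetical ant_s column of advance notification times) it is a one-liner:

    # Fill each missing advance notification time with the mean of the
    # remaining controllers in the same aid x traffic condition.
    df["ant_s"] = (df.groupby(["aid", "traffic"])["ant_s"]
                     .transform(lambda s: s.fillna(s.mean())))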
Communication: Accepting and handing off
aircraft. Controllers accepted more than 98%
of the aircraft (M = 98.43%, SE = 0.42%) into
their sector with an average response time of
41.91 s (SE = 2.48 s). Controllers handed off
more aircraft successfully when the aid was pre-
sent (M = 81.71%, SE = 3.19%) than when it
was absent (M = 70.00%, SE = 4.91%), F(1,
11) = 8.22, p < .05, and handed off aircraft sig-
nificantly later under high (M = 44.41 s, SE =
5.98 s) than under moderate traffic conditions
(M = 27.51 s, SE = 5.46 s), F(1, 11) = 24.76,
p < .001. Controllers handed off more aircraft
prematurely without (M = 24.40%, SE = 5.08%)
than with the aid (M = 13.80%, SE = 3.23%),
F(1, 11) = 6.78, p < .05. Under high traffic density, they handed off more aircraft prematurely when the aid was absent (M = 25.95%, SE = 7.22%) than when it was present (M = 9.78%, SE = 3.35%), whereas under moderate traffic density the difference between the absence (M = 22.86%, SE = 7.44%) and presence of the aid (M = 17.82%, SE = 5.43%) was smaller.
This interaction approached significance, F(1,
11) = 3.39, p = .09.
Secondary task performance. Performance in
the secondary task was affected by traffic densi-
ty but not by the presence or absence of the aid.
Controllers missed updating more way points
under high (M = 63.41%, SE = 6.23%) than un-
der moderate traffic density (M = 41.55%, SE =
6.73%), F(1, 11) = 14.14, p < .01. Controllers
updated more way points early under moderate
(M = 15.00%, SE = 3.62%) than under high traffic density (M = 7.62%, SE = 2.39%), F(1, 11) = 3.93, p = .07. Two controllers were excluded from
the analysis of response times because they did
not update any way points in one or all condi-
tions. On average, controllers updated a way
point 86.36 s (SE = 6.39) after the aircraft passed
the corresponding way point on the radar dis-
play but were not significantly affected by the
experimental manipulations.
Physiological measure: HRV. The effect of
traffic density on the 0.10-Hz band of the HRV
was significant, F(1, 11) = 5.43, p < .05: power in this band was lower under high (M = 4.63, SE = 0.19) than under moderate traffic density (M = 4.74, SE = 0.19), indicating higher mental workload under high traffic.
Subjective ratings of mental workload. Con-
trollers rated mental workload significantly high-
er under high (M = 66.29, SE = 3.55) than under
moderate traffic (M = 52.57, SE = 3.47), F(1,
11) = 28.47, p < .001.
Trust. Controller ratings of trust in the conflict
detection aid on a scale from 0 to 100 ranged
from 30 to 90 with an average of 63.33 (SE =
4.7); the median and mode were both 70. This is only a moderately high level of trust, given that the aid functioned 100% reliably.
Discussion
As expected, controllers missed more poten-
tial conflicts and detected self-separations later
under high than under moderate traffic density.
Performance in routine communication tasks as
well as all measures of mental workload showed
similar unfavorable effects of high traffic densi-
ty.

TABLE 1: Mean Detection Rates and Advance Notification Times for Conflicts and Self-Separations in Experiment 1

                    Detection Rate (%)                     Advance Notification Time (s)
               Conflicts          Self-Separations      Conflicts           Self-Separations
Aid          Moderate   High     Moderate   High        Moderate   High      Moderate   High
Absent       91.67      62.50    45.83      27.08       196.21     131.91    343.78     224.45
             (5.62)     (8.97)   (8.04)     (4.83)      (25.36)    (30.52)   (30.08)    (18.27)
Present      87.50      70.83    91.67      47.92       247.08     266.90    350.29     277.95
             (6.53)     (11.45)  (4.70)     (8.95)      (18.42)    (8.43)    (21.19)    (8.84)

Note. Standard errors are shown in parentheses. Column pairs give moderate and high traffic density.

This validates the findings of adverse effects
of high traffic density on controller performance
in earlier FF studies in which controllers were
removed from the active control loop (Galster,
Duley, et al., 2001; Metzger & Parasuraman,
2001) and did not have access to pilot intent in-
formation (Castaño & Parasuraman, 1999). The
increase in the time to detect potential conflicts
is of particular concern, given that the highly
dense and efficient airspace under FF will leave
controllers with less time to resolve conflicts
(Galster, Duley, et al.; Metzger & Parasuraman).
This could have serious implications for safe-
ty and corroborates the view that FF and related
future ATM concepts (e.g., Phillips, 2000; RTCA,
1995) will not be feasible without the provision
of automation tools to support controllers. Sup-
porting this view was the finding that the con-
flict decision aid improved performance in the
detection of potential conflicts. The decision aid
also had beneficial effects on the communica-
tion task, indicating that it might free resources
that controllers could allocate to the performance
of other tasks, such as communication or grant-
ing user requests.
The reallocation of freed resources might
explain why the decision aid did not reduce
workload as expected, given that whereas per-
formance was considerably enhanced with the
aid, mental workload remained unchanged (see
Parasuraman & Hancock, 2001). Alternatively,
the high demand for monitoring pilot actions
under FF, as well as for monitoring the automa-
tion, could have increased workload (Warm et
al., 1996). A third possibility is that traffic den-
sity is a stronger workload driver, as compared
with lack of automation. Each aircraft required
the controller to perform many routine and co-
ordination tasks. A conflict detection aid does
not reduce the demands imposed by these tasks.
Interestingly, more aircraft were handed off pre-
maturely without than with the aid, particularly
under high traffic density, which could be a
controller strategy to manage workload. Con-
trollers might have felt more time pressure
when they were manually performing under
high traffic conditions and were trying to hand
off aircraft whenever they could, even if they
did so prematurely.
Given the 100% reliability of the aid, con-
troller trust ratings of the automation were not
very high. Although this finding was initially
unexpected, it was consistent with the postex-
periment interviews with the controllers, who
did not view detection aids (in general and the
one used in the simulation) very favorably. This
attitude is based on their experience with the
STCA and its frequent false alarms. Perhaps
the difference between the aid we used and
STCA was not made clear enough.
EXPERIMENT 2
Experiment 1 established the benefits of a
conflict detection aid on controller performance
under FF. However, the automation in that study
was always perfectly reliable, a situation not like-
ly to be the case in real settings, given the inher-
ent uncertainty of projecting future events and
the difficulty of any automation aid having fully
up-to-date and accurate intent information. Con-
troller recovery from an automation failure is
of great concern under FF conditions (Galster,
Duley, et al., 2001) because more aircraft will be
accommodated in the same amount of airspace,
creating a denser airspace and leaving the con-
troller with less time to ensure minimum sepa-
ration requirements between aircraft than is
currently the case. An added difficulty is that FF
removes the controller from the active decision-
making process by transferring authority to the
pilots. Previous studies have shown that con-
trollers require more time to detect conflicts in
case airborne separation fails under passive FF
than under active control (Metzger & Parasur-
aman, 2001). If the required time is greater than
the time available to recover from a critical and
rapidly developing situation, safety may be com-
promised.
Automation can compensate for this problem
to some extent, but just like FF, it can move the
controller further away from the decision-making
process. As long as the automation performs
the conflict detection function reliably, system
safety is maintained. However, in case of an au-
tomation failure, the effect of delayed detection
of potential conflicts could be further aggravat-
ed. We therefore predicted the same results as
in Experiment 1 for when the automation was
reliable. For unreliable automation, we hypoth-
esized that controllers would detect a conflict
earlier under manual conditions than under
automated conditions when the automation
failed to point it out.
Method
Participants. Twenty active full-performance
level controllers from the Washington, D.C.,
ARTCC (n = 14) and Washington area termi-
nal radar control (TRACON, n = 6) facilities
served as paid volunteers. Most of the con-
trollers had participated in Experiment 1 or pre-
vious studies using the same simulation. Their
ages ranged from 31 to 53 years (M = 37.65,
SD = 5.05), and their overall ATC experience
ranged from 8 to 22 years (M = 14.08, SD =
3.86). There was no significant difference in
age, F(1, 18) < 1, p > .05, or years on the job,
F(1, 18) < 1, p > .05, between TRACON and en
route controllers. Three (15%) en route con-
trollers were female.
Apparatus. The same ATC simulation, tasks,
and dependent variables as in Experiment 1 were
used. Controllers were also interviewed about
their experience with conflict detection aids,
specifically with the more “intelligent” aids that
are being implemented in facilities. They were
instructed that the four automation conditions
would contain a conflict detection aid that was
highly reliable, based on the flight plan of the
aircraft (as opposed to merely current speed and
heading), and that it had a 6-min look-ahead
time. The 6-min look-ahead time was chosen be-
cause in Experiment 1 controllers detected self-
separations under moderate traffic conditions
on average before the aid detected them. Before
performing in the automated conditions, con-
trollers received a detailed description and de-
monstration of the aid that explained its basic
underlying principle and even pointed out some
of its limitations (“In some cases, changes in
flight intent information might not be available
to the conflict probe...”).
Design. A repeated measures design included
automation condition with the levels (a) reli-
able automation, (b) automation failure with 2
min to recover, (c) automation failure with
4 min to recover, and (d) manual condition. All
controllers performed in all four conditions.
Half the participants performed the manual
condition before the automated conditions (i.e.,
all other conditions), and the other half per-
formed the manual condition after the auto-
mated conditions. In both groups, the reliable
automation condition was always presented
before the two unreliable automation condi-
tions. This was chosen deliberately so that the
controllers would get enough training and ex-
perience with a reliable automation aid and
build trust before being exposed to failures. The
order of the two automation failure conditions
was such that half of the participants performed
the 2-min failure condition before the 4-min
failure condition and the other half performed
the scenarios in reverse order. This resulted in
a double crossover design, the first associated
with the order of the manual and automated
conditions and the second (nested within the
first) associated with the order of the two auto-
mation failure conditions.
The four different automation conditions
were represented in five 25-min scenarios that
included on average about 16 aircraft in a 50-
mile radius sector after a 10-min ramp-up period.
In the reliable automation condition, consist-
ing of two scenarios, the conflict detection aid
reliably detected all five potential conflicts (two
conflicts and three self-separations). In the two
automation failure conditions, the conflict de-
tection aid detected the same five potential con-
flicts reliably. However, the aid failed to detect
a sixth event about 21 min into the scenario.
The event was a situation in which one aircraft
deviated from its flight plan (e.g., after a Traffic
Alert and Collision Avoidance System [TCAS] alert) in or-
der to avoid a conflict and climbed or descend-
ed into the path of another aircraft that also
deviated from its altitude filed on the flight plan.
The maneuvers left controllers with 4 min in one
scenario and 2 min in the other to detect the
conflict before the loss of separation occurred.
Altitudes and altitude changes of the aircraft
involved in this situation were slightly different
in the two automation failure scenarios (e.g.,
an aircraft climbed in one and descended in
another scenario) so that controllers would not
easily recognize that the same situation was
presented twice.
A fifth scenario was assigned for the manual
condition, in which no conflict detection aid
was available. However, the controllers were
presented with the same situation in which the
automation failed in the automation condition
and 4 min remained for detection. This allowed
for a direct comparison of conflict detection per-
formance between the manual and automated
conditions. In order to create a set of five com-
parable scenarios, the sector and traffic patterns
were rotated to simulate a different flow of traffic. Way points and flights were renamed so that
the participants did not recognize that the sce-
narios were almost identical.
Procedure. After signing a consent form and
providing biographical information, controllers
were given instructions and a demonstration of
the simulation, completed a practice trial, and
were familiarized with the NASA-TLX and the
trust ratings. Then the controllers completed
the five scenarios.
Results
Data from the two scenarios with reliable
automation were averaged after initial analyses
revealed no significant differences. The data of
the 20 participants were analyzed with repeated
measures ANOVAs with one four-level indepen-
dent variable (reliable automation, failure 2 min,
failure 4 min, and manual) and three planned
orthogonal contrasts: (a) automated versus man-
ual conditions, (b) reliable automation versus
failure conditions, and (c) failure conditions with
2 versus 4 min to detect a conflict.
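
These three contrasts are orthogonal (any two weight vectors have a zero dot product), and each can be tested by forming a per-controller contrast score and testing it against zero, a standard repeated measures treatment. The weight coding and names below are ours; the manual condition is weighted against the three automated conditions in the first contrast:

    import numpy as np
    from scipy.stats import ttest_1samp

    # Condition order: reliable, failure 2 min, failure 4 min, manual.
    CONTRASTS = {
        "automated vs. manual":  np.array([1, 1, 1, -3]),
        "reliable vs. failure":  np.array([2, -1, -1, 0]),
        "failure 2 vs. 4 min":   np.array([0, 1, -1, 0]),
    }

    def planned_contrast(scores, weights):
        """scores: n_controllers x 4 array of condition means.
        Returns the t test of the per-subject contrast scores against 0."""
        return ttest_1samp(scores @ weights, 0.0)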
Primary task performance: Detection of con-
flicts and self-separations with reliable automa-
tion. There was a significant effect of automation
on the detection of conflicts, F(3, 57) = 8.14, p =
.01, and of self-separations, F(3, 57) = 9.98, p =
.001. More conflicts, F(1, 19) = 8.14, p = .01, and
more self-separations, F(1, 19) = 13.11, p < .01,
were detected under automated conditions than
under the manual condition. The automation
condition also had a significant effect on advance
notification of conflicts, F(3, 57) = 5.28, p <
.01, with conflicts being detected earlier in the
automated conditions than in the manual con-
dition, F(1, 19) = 5.31, p < .05. Conflicts were
also detected earlier in the failure conditions
(M = 338.38 s, SE = 6.73 s) than in the reliable
conditions (M = 306.92 s, SE = 14.93 s), F(1,
19) = 7.39, p = .01, but this was confounded
with an order effect. The automation condition
had no significant effects on advance notifica-
tion of self-separations, F(3, 57) < 1, p > .05.
Table 2 displays the data.
Primary task performance: Detection of con-
flicts and self-separations with unreliable auto-
mation. Table 3 shows descriptive statistics for
the detection rates in the two automated condi-
tions and the manual condition. Under auto-
mation, only 35% and 40% of the controllers
detected the conflict when they had 2 and 4 min
available, respectively. In contrast, 55.56% of
the controllers detected the conflict under the
manual condition. There was no difference in
conflict detection under the manual condition
between controllers who performed the manual
condition first and those who performed it last,
F(1, 16) < 1, p > .05. The effect of the automation
condition failed to reach significance, F(2, 38) =
1.78, p = .18. Because a difference was predict-
ed, however, post hoc tests were performed.
Orthogonal contrasts revealed no significant
effect between the 2- and 4-min failure condi-
tions, F(1, 19) < 1, p > .05. Hence the data were
collapsed across the two conditions for subse-
quent analyses. The contrast comparing auto-
mated and manual conditions found a trend for
better detection under manual than under auto-
mated conditions, F(1, 19) = 2.40, p = .14.
Because of the high number of missed poten-
tial conflicts, advance notification times were
available for fewer than 50% of the cells, too
low a number to calculate meaningful inferen-
tial statistics.

TABLE 2: Mean Detection Rates and Advance Notification Times for Conflicts and Self-Separations in Experiment 2

          Detection Rate (%)               Advance Notification Time (s)
Aid       Conflicts     Self-Separations   Conflicts        Self-Separations
Absent    85 (5.26)     76.67 (4.90)       279.09 (23.15)   342.59 (7.76)
Present   100 (0.00)    96.11 (1.39)       327.89 (6.90)    349.48 (4.17)

Note. Standard errors are shown in parentheses.

Also, the maximum advance notification time was determined by the onset of the conflict (e.g., in the 2-min condition, maximum
advance notification was 2 min), not by control-
ler performance. Nevertheless, the automation
failure condition with 4 min can be compared
with the manual condition with 4 min between
the onset of the conflict and the loss of separa-
tion. Table 4 displays the data. Visual inspection
of the descriptive statistics revealed no marked
differences between the manual and automated
condition, even though the advance notification
time in the automation condition was about 17 s
earlier than that under manual conditions. Al-
so, the minimum advance notification time was
greater in the automation (40.53 s) than in the
manual condition (7.39 s). However, the median
performance was better in the manual condition.
Communication: Accepting and handing off
aircraft. Controllers accepted more than 98%
of the aircraft (M = 98.13%, SE = 0.32%) into
their sector with an average response time of
36.79 s (SE = 2.39 s). There was a significant
effect of automation on response times to re-
quests for acceptance, F(3, 57) = 3.40, p < .05.
Response times were shorter in the failure condi-
tion (M = 32.04 s, SE =3.42 s) than in the reliable
condition (M = 39.36 s, SE = 5.61 s), F(1, 19) =
5.28, p < .03, reflecting the order effect. The automated versus manual contrast for response times failed to reach significance, F(1, 19) = 2.71, p = .12. No
significant effects were found for the percent-
age of successful, F(3, 57) < 1, p > .05, or early
handoffs, F(3, 57) = 1.95, p > .05. Aircraft were
handed off faster in the failure (M = 26.42 s, SE =
2.49 s) than in the reliable (M = 31.58 s, SE =
3.54 s) condition, F(1, 19) = 4.77, p < .05, again
reflecting the order effect.
Secondary task performance. A significant ef-
fect of automation on the percentage of missed
updates was found, F(3, 57) = 4.28, p < .01.
Controllers missed updating more way points
under manual (M = 66.00%, SE = 2.87%)
than under automated conditions (M = 56.48%,
SE = 2.46%), F(1, 19) = 7.73, p = .01. One
participant was excluded from the analysis of
response times because he did not update any
way points in three scenarios. For the remaining
participants, a significant effect of the auto-
mation condition was found, F(3, 54) = 3.54,
p < .05. It took controllers significantly longer
to update way points under automated (M =
126.56 s, SE = 8.74 s) than under manual con-
ditions (M = 97.15 s, SE = 12.20 s), F(1, 18) = 7.27, p < .05. There was no effect of automation
on the percentage of early updates, F(3, 57) =
1.47, p > .05.
Subjective ratings of mental workload. The
effect of automation on ratings of mental work-
load failed to reach significance, F(3, 57) =
1.86, p > .05. The contrast between automated
(M = 60.03, SE = 1.71) and manual conditions
(M = 65.46, SE = 3.76) was also nonsignificant,
F(1, 19) = 2.59, p = .12.
Trust and self-confidence. Ratings of trust
ranged from 0 to 100 with an average of 75.71
(SE = 2.14). Trust ratings differed significantly
between the automation conditions, F(2, 38) =
1.63, p < .05. Controllers rated their trust high-
er under reliable (M = 79.33, SE = 2.66) than
under failure conditions (M = 72.10, SE =
3.30), F(1, 19) = 5.96, p < .05.

TABLE 3: Descriptive Statistics for Detection Rates as a Function of Automation Condition

              Automation Failure
              2 min (%)    4 min (%)    Manual (%)
n             20           20           18
Mean          35.00        40.00        55.56
SE            10.94        11.24        12.05
Minimum       0.00         0.00         0.00
Maximum       100.00       100.00       100.00
Mode          0.00         0.00         100.00
Median        0.00         0.00         100.00

TABLE 4: Descriptive Statistics for Advance Notification Times as a Function of Automation Condition

              Automation Failure
              2 min (s)    4 min (s)    Manual (s)
n             7            8            10
Mean          83.23        127.12       110.27
SE            12.19        24.26        23.53
Minimum       28.80        40.53        7.39
Maximum       118.01       217.52       216.25
Mode          n/a          n/a          n/a
Median        87.38        102.62       120.00

Ratings of self-confidence ranged from 25 to 100 with an
average of 65.70 (SE = 2.42) and did not vary
with automation conditions. Hence the con-
trollers’ trust in the automation was greater than
their self-confidence to perform without it.
Discussion
As in Experiment 1, there was a marked ben-
efit of the conflict detection automation on
controller performance when the automation
functioned 100% reliably. More potential con-
flicts and self-separations were detected and
conflicts were detected earlier with the automa-
tion than under manual conditions. The early
detection results are particularly encouraging,
considering that higher traffic densities will
make timely detection of potential conflicts es-
sential. These automation benefits did not carry
over to the communication task of accepting
and handing off aircraft. Such routine tasks im-
posed a considerable amount of workload but
did not benefit from the availability of a decision
aid aimed at conflict detection and resolution.
Either the aid did not free enough resources or
the controllers chose not to allocate them to
improve communication performance. Overall
mental workload was also not significantly re-
duced by the automation. Subjective ratings of
workload did not change, and even though con-
trollers updated more way points when they
had conflict detection automation available, it
took them longer to update them. This could
represent either a speed-accuracy trade-off or a
cost of the automation associated with putting
controllers in a monitoring mode. Still, automa-
tion had mostly beneficial effects as long as it
functioned 100% reliably.
Under imperfect automation, controllers were
more likely to detect a conflict when perform-
ing manually than when assisted by automation
that failed to point out a particular conflict. Even
though this effect only approached conventional
statistical significance, we believe it is practical-
ly significant, particularly for a safety-critical
environment such as ATC. The reliability of the
detection aid in this study (one failure per sce-
nario) was rather low; in fact, it was much lower than
would be acceptable for a real-world system.
However, higher automation reliability is associ-
ated with a reduced likelihood of detecting a failure
of the automation (May, Molloy, & Parasuraman,
1993). Therefore the effect in this study should
be considered a conservative estimate.
Further, studying operator response to very
rare system failures or other surprises requires
that the surprising event be presented only once
or twice to prevent the buildup of expectancies
(Molloy & Parasuraman, 1996; Wickens, 1998).
However, low numbers of events result in low
power and therefore the possibility of statisti-
cally nonsignificant results. Thus obtaining
empirical data on the response to rare, unex-
pected events is difficult, expensive, and time
consuming, especially if it becomes necessary to
keep the sample size low to meet cost or time
constraints, if a specific expert population is
tested (air traffic controllers, pilots, astronauts,
etc.), and if expensive equipment is used (e.g.,
simulators, eye-tracking equipment). As a con-
sequence, the analysis and interpretation of the
results focused on practical relevance as well as
statistical significance (Wickens, 1998; Wickens,
Gordon, & Liu, 1998).
No difference in detection rates was found
when controllers had 2 versus 4 min available
between the onset of the failure event and the
loss of separation. Timeliness in the detection
of a potential conflict was also not markedly
different between manual performance and the
comparable automation failure condition. It is
possible that a longer time frame (similar to
the one used in the other conditions) would
have shown an effect. Two or 4 min might sim-
ply not have been enough to show a variation.
However, inaccuracies in a conflict detection
aid such as URET are more likely to be short
term. Longer term changes in flight plans, for
example, would eventually be available to
URET so that they could be taken into account
for the conflict or no-conflict decision.
One limitation of this study should be noted.
The order of presentation of reliable and unre-
liable automation was confounded with the reli-
ability. Therefore it is possible that the (reduced)
detection of an automation failure was attribut-
able to effects other than the reliability. For
example, perhaps controllers became tired after
repeated exposure to the scenarios. Communi-
cation performance suggested quite the oppo-
site, however. Controllers accepted and handed
off aircraft faster in the unreliable than in the
reliable condition, suggesting a practice effect.
Based on this finding, controllers should be
expected to be more, rather than less, efficient
in their task performance, including conflict
detection. In addition, if fatigue were a major
factor, then manual performance should have
been reduced in those controllers who per-
formed the manual condition last, but this was
not found. There was no difference in controller
performance as a function of order.
GENERAL DISCUSSION AND IMPLICATIONS FOR SYSTEM DESIGN
In two experiments we examined the effects
of automation, both reliable and imperfect, on
controller performance under conditions of
shared decision making. The results showed
that advanced FF conditions can lead to detri-
mental effects when traffic density is high. In
support of previous findings, such conditions
reduced conflict detection and increased mental
workload (Endsley et al., 1997; Galster, Duley,
et al., 2001; Metzger & Parasuraman, 2001).
The provision of conflict probe automation mit-
igated these performance costs. It should be
noted, however, that although controller perfor-
mance was significantly improved by the aid, a
substantial number of potential conflicts were
still missed. In the real world this would not be
acceptable. The high number of missed events
can probably be attributed to the fact that traf-
fic patterns were created by the experimenter
and did not represent recorded or live traffic, as
in some higher fidelity ATC simulations. There-
fore they lacked some of the structure that con-
trollers are used to. However, similarly unstructured traffic would be the norm under the National Route Program and, even more so, under FF conditions. Also, con-
trollers were not nearly as familiar with the
sector as they would be in the real world. There-
fore the values obtained (e.g., detection rates)
should be considered as conservative estimates
and should be directly compared not with real-
world numbers but, rather, with the different
conditions created in the simulations.
The results also suggested that these automa-
tion benefits might not come without a cost.
Controllers performed better in detecting con-
flicts without automation than when they had
automation support that was less than 100%
accurate. FF removes the controller from the
decision-making process in a manner that leads
to reduced performance (e.g., in the detection of
conflicts). Automation is introduced to make
up for these deficits. If decision-making auto-
mation is reliable, performance is improved.
However, if the automation is imperfect (for
whatever reason), there is a chance that the fail-
ure will not be detected. For the design of future
ATC systems, this implies that automation might
not be able to fully compensate for the subop-
timal role of the controller, who will be left to
monitor the decisions of others under future
ATM conditions. The practical implication of
this finding for the design engineer is that con-
trollers should be given an active role in the sys-
tem to ensure that they can detect and respond
to malfunctions in a timely manner.
An alternative solution is to support control-
lers in routine tasks. In these experiments, auto-
mation did not reduce workload in all cases,
potentially because of the heavy workload asso-
ciated with the routine tasks under high traffic
density and with the monitoring requirement.
Supporting controllers in routine tasks would
help to keep operators in the loop of the most
important decisions but relieve them of repeti-
tious and less important tasks. Improved dis-
plays (e.g., display integration, color coding,
ecological interfaces) that allow the controller to
detect malfunctions in the system more easily
could also be helpful (Molloy & Parasuraman,
1994). However, as with any form of automa-
tion, the consequences for human performance
and other criteria (e.g., secondary evaluative cri-
teria; Parasuraman, Sheridan, & Wickens, 2000)
need to be thoroughly evaluated.
ACKNOWLEDGMENTS
This research was supported by Grant No.
NAG-2-1096 from NASA Ames Research Cen-
ter, Moffett Field, California. Kevin Corker and
Richard Mogford were the technical monitors.
We would like to thank all controllers who par-
ticipated in the experiments. We appreciate the
comments of two anonymous reviewers.
REFERENCES
Ahlstrom, V., Longo, K., & Truitt, T. (2002). Human factors design
guide update: A revision to chapter 5 – Automation guidelines
(DOT/FAA/CT-02/11). Atlantic City International Airport, NJ:
Federal Aviation Administration, William J. Hughes Technical
Center.
Bainbridge, L. (1983). Ironies of automation. Automatica, 19,
775–779.
Bilimoria, K. D. (2001). Methodology for the performance evalua-
tion of a conflict probe. Journal of Guidance, Control, and
Dynamics, 24, 444–451.
Billings, C. E. (1997). Aviation automation: The search for a
human-centered approach. Mahwah, NJ: Erlbaum.
Brudnicki, D. J., & McFarland, A. L. (1997). User Request Evalua-
tion Tool (URET) conflict probe performance and benefits
assessment (Tech. Report MP97W112). McLean, VA: MITRE.
Castaño, D., & Parasuraman, R. (1999). Manipulation of pilot
intent under free flight: A prelude to not-so-free flight. In Pro-
ceedings of the 10th International Symposium on Aviation
Psychology (pp. 170–176). Columbus: Ohio State University.
Corker, K., Fleming, K., & Lane, J. (1999). Measuring controller
reactions to free flight in a complex transition sector. Journal of
Air Traffic Control, 4, 9–16.
Dekker, S. W. A., & Woods, D. D. (1999). To intervene or not to
intervene: The dilemma of management by exception. Journal
of Cognition, Technology and Work, 1, 86–96.
DiMeo, K., Sollenberger, R., Kopardekar, P., Lozito, S., Mackintosh,
M.-A., Cardosi, K., et al. (2002). Air ground integration experi-
ment (Tech. Report DOT/FAA/CT-TN02/06). Atlantic City
International Airport, NJ: Federal Aviation Administration,
William J. Hughes Technical Center.
Drucker, P. F. (1954). The practice of management. New York:
Harper and Row.
Dunbar, M., Cashion, P., McGann, A., Macintosh, M.-A., Dulchinos,
V., Jara, D., et al. (1999). Air-ground integration issues in a self-
separation environment. In Proceedings of the 10th International
Symposium on Aviation Psychology (pp. 183–189). Columbus:
Ohio State University.
Endsley, M. R. (1996). Automation and situation awareness. In R.
Parasuraman & M. Mouloua (Eds.), Automation and human
performance: Theory and applications (pp. 163–181). Mahwah,
NJ: Erlbaum.
Endsley, M. R., & Kiris, E. O. (1995). The out-of-the-loop perfor-
mance problem and level of control in automation. Human
Factors, 37, 381–394.
Endsley, M. R., Mogford, R. H., Allendoerfer, K. R., Snyder, M. D.,
& Stein, E. S. (1997). Effect of free flight conditions on con-
troller performance, workload, and situation awareness (DOT/
FAA/CT-TN97/12). Atlantic City International Airport, NJ:
Federal Aviation Administration, William J. Hughes Technical
Center.
Endsley, M. R., & Rodgers, M. D. (1998). Distribution of attention,
situation awareness and workload in a passive ATC task: Im-
plications for operational errors and automation. Air Traffic
Control Quarterly, 6(1), 21–44.
Galster, S. M., Bolia, R. S., Roe, M. M., & Parasuraman, R. (2001).
Effects of automated cueing on decision implementation in a
visual search task. In Proceedings of the Human Factors and
Ergonomics Society 45th Annual Meeting (pp. 321–325).
Santa Monica, CA: Human Factors and Ergonomics Society.
Galster, S. M., Duley, J. A., Masalonis, A. J., & Parasuraman, R.
(2001). Air traffic controller performance and workload under
mature free flight: Conflict detection and resolution of aircraft
self-separation. International Journal of Aviation Psychology,
11, 71–93.
Hilburn, B. G. (1996). The impact of advanced decision-aiding
automation on mental workload and human-machine system
performance. Unpublished dissertation, Catholic University of
America, Washington, DC.
Hilburn, B. G., & Parasuraman, R. (1997). Free flight: Military
controllers as an appropriate population for evaluation of ad-
vanced ATM concepts. In Proceedings of the 10th International
CEAS Conference on Free Flight (pp. 23–27). Amsterdam:
Confederation of European Aviation Societies.
Lee, J. D., & Moray, N. (1992). Trust, control strategies and alloca-
tion of function in human-machine systems. Ergonomics, 35,
1243–1270.
Lee, J. D., & Moray, N. (1994). Trust, self-confidence, and operators’
adaptation to automation. International Journal of Human-
Computer Studies, 40, 153–184.
Lorenz, B., Di Nocera, F., Rottger, S., & Parasuraman, R. (2002).
Automated fault management in a simulated space flight
micro-world. Aviation, Space, and Environmental Medicine,
73, 886–897.
Lozito, S., McGann, A., Mackintosh, M., & Cashion, P. (1997,
June). Free flight and self-separation from the flight deck per-
spective. Presented at the Seminar on Air Traffic Management
Research and Development, Saclay, France. Retrieved January
30, 2005, from http://atm-seminar-97.eurocontrol.fr/lozito.htm
Masalonis, A. J., Le, M. A., Klinge, J. C., Galster, S. M., Duley, J.
A., Hancock, P. A., et al. (1997). Air traffic control worksta-
tion mock-up for free flight experimentation: Lab development
and capabilities [Abstract]. In Proceedings of the Human
Factors and Ergonomics Society 41st Annual Meeting (p.
1379). Santa Monica, CA: Human Factors and Ergonomics
Society.
Masalonis, A. J., & Parasuraman, R. (2003). Fuzzy signal detection
theory: Analysis of human and machine performance in air
traffic control, and analytical considerations. Ergonomics, 46,
1045–1074.
May, P. A., Molloy, R. J., & Parasuraman, R. (1993). Effects of auto-
mation reliability and failure rate on monitoring performance
in a multi-task environment (Tech. Report). Washington, DC:
Catholic University of America, Cognitive Science Laboratory.
Metzger, U., & Parasuraman, R. (2001). The role of the air traffic
controller in future air traffic management: An empirical study
of active control versus passive monitoring. Human Factors,
43, 519–528.
Molloy, R., & Parasuraman, R. (1994). Automation-induced moni-
toring inefficiency: The role of display integration and redun-
dant color coding. In M. Mouloua & R. Parasuraman (Eds.),
Human performance in automated systems: Current research
and trends (pp. 224–228). Hillsdale, NJ: Erlbaum.
Molloy, R., & Parasuraman, R. (1996). Monitoring an automated
system for a single failure: Vigilance and task complexity
effects. Human Factors, 38, 311–322.
National Aeronautics and Space Administration Ames Research
Center. (1986). NASA Task Load Index (TLX): Paper and pen-
cil version. Moffett Field, CA: Author, Aerospace Human
Factors Research Division.
National Aeronautics and Space Administration, Aviation System
Capacity Program, Advanced Air Transport Technologies
Project. (1999). Concept definition for distributed air/ground
traffic management (DAG-TM), Version 1.0. Moffett Field,
CA: Author.
Parasuraman, R., & Byrne, E. A. (2003). Automation and human
performance in aviation. In P. Tsang & M. Vidulich (Eds.),
Principles and practice of aviation psychology (pp. 311–356).
Mahwah, NJ: Erlbaum.
Parasuraman, R., Duley, J. A., & Smoker, A. (1998). Automation
tools for controllers in future air traffic control. Controller:
Journal of Air Traffic Control, 37, 8–15.
Parasuraman, R., & Hancock, P. A. (2001). Adaptive control of
mental workload. In P. A. Hancock & P. A. Desmond (Eds.),
Stress, workload, and fatigue (pp. 305–320). Mahwah, NJ:
Erlbaum.
Parasuraman, R., & Mouloua, M. (Eds.). (1996). Automation and
human performance: Theory and applications. Mahwah, NJ:
Erlbaum.
Parasuraman, R., & Riley, V. (1997). Humans and automation:
Use, misuse, disuse, abuse. Human Factors, 39, 230–253.
Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A
model for types and levels of human interaction with automa-
tion. IEEE Transactions on Systems, Man, and Cybernetics,
Part A: Systems and Humans, 30, 286–297.
Phillips, C. T. (2000). Detailed description for CE-5 en route free
maneuvering (NAS2-98005 RTO-41). Mays Landing, NJ: Titan
Systems Corp., System Research Corp. Division.
Radio Technical Commission for Aeronautics. (1995). Report of
the RTCA Board of Director’s Select Committee on Free Flight.
Washington, DC: Author.
Remington, R. W., Johnston, J. C., Ruthruff, E., Gold, M., &
Romera, M. (2000). Visual search in complex displays: Factors
affecting conflict detection by air traffic controllers. Human
Factors, 42, 349–366.
Rovira, E., McGarry, K., & Parasuraman, R. (2002). Effects of
unreliable automation on decision making in command and
control. In Proceedings of the Human Factors and Ergonomics
Society 46th Annual Meeting (pp. 428–432). Santa Monica,
CA: Human Factors and Ergonomics Society.
Sarter, N. B., & Amalberti, R. (Eds.). (2000). Cognitive engineering
in the aviation domain. Hillsdale, NJ: Erlbaum.
Schick, F. V., & Völckers, U. (1991). The COMPAS system in the
ATC environment. Braunschweig, Germany: Mitteilung
Deutsche Forschungsanstalt für Luft- und Raumfahrt.
Sheridan, T. B. (2002). Humans and automation. New York:
Wiley.
van Gent, R. N. H. W., Hoekstra, J. M., & Ruigrok, R. C. J. (1998).
Free flight with airborne separation assurance: A man-in-the-
loop simulation study (Tech. Report). Amsterdam: National
Aerospace Laboratory.
Warm, J. S., Dember, W., & Hancock, P. A. (1996). Vigilance and
workload in automated systems. In R. Parasuraman & M.
Mouloua (Eds.), Automation and human performance: Theory
and applications (pp. 183–200). Mahwah, NJ: Erlbaum.
Wickens, C. D. (1992). Engineering psychology and human perfor-
mance (2nd ed.). Scranton, PA: HarperCollins.
Wickens, C. D. (1998). Commonsense statistics. Ergonomics in
Design, 6(4), 18–22.
Wickens, C. D. (2000). Imperfect and unreliable automation and
its implications for attention allocation, information access and
situation awareness (Tech. Report ARL-00-10/NASA-00-2).
Urbana-Champaign, IL: Institute of Aviation, Aviation Research
Lab.
Wickens, C. D., Gempler, K., & Morphew, M. E. (2000). Workload
and reliability of traffic displays in aircraft traffic avoidance.
Transportation Human Factors, 2, 99–126.
Wickens, C. D., Gordon, S. E., & Liu, Y. (1998). An introduction
to human factors engineering. New York: Longman.
Wickens, C. D., Mavor, A., Parasuraman, R., & McGee, J. (1998).
The future of air traffic control: Human operators and automa-
tion. Washington, DC: National Academy.
Wiener, E. L., & Curry, R. E. (1980). Flight-deck automation:
Promises and problems. Ergonomics, 23, 995–1011.
Willems, B., & Truitt, T. R. (1999). Implications of reduced involve-
ment in en route air traffic control (Tech. Report DOT/FAA/
CT-TN99/2). Atlantic City International Airport, NJ: Federal
Aviation Administration, William J. Hughes Technical Center.
Ulla Metzger is a human factors specialist at Deutsche
Bahn AG, DB Systemtechnik, Munich, Germany. She
received her Ph.D. in psychology in 2001 at Darm-
stadt University of Technology, Germany.
Raja Parasuraman is a professor of psychology at
George Mason University, Fairfax, Virginia. He re-
ceived his Ph.D. in psychology in 1976 at the Uni-
versity of Aston, Birmingham, U.K.
Date received: October 30, 2003
Date accepted: May 13, 2004
Article