PREVENTING SCENARIO RECOGNITION IN
HUMAN-IN-THE-LOOP AIR TRAFFIC CONTROL RESEARCH
Gijs de Rooij, Clark Borst, M. M. (René) van Paassen and Max Mulder
Aerospace Engineering - Delft University of Technology, Delft, The Netherlands
In academic air trafﬁc control research, trafﬁc scenarios are often repeated to increase the sample
size and enable paired-sample comparisons, e.g., between different display variants. This comes
with the risk that participants recognize scenarios and consequently recall the desired response. In
this paper we provide an overview of mitigation techniques found in literature and conclude that rotating
scenario geometries is most frequently used. The potential impact of these transformations on
participant behavior, as described in this paper, is however not sufﬁciently addressed in most studies.
As an example we, therefore, analyze previously collected eye tracking data from ten professional
air trafﬁc controllers, each presented with three repetitions in various rotations of several distinct
scenarios. Results imply that researchers wishing to repeat scenarios should more carefully consider
whether mitigation techniques might have an impact on their results.
In air trafﬁc controller (ATCO) training and airspace redesign trials, simulation scenarios are designed to be as realistic
as possible with many different ﬂights over a prolonged period of time. High face validity enables the ATCOs to
execute their tasks as they would in an operational setting. Academic research, however, often beneﬁts from simpliﬁed,
more constrained scenarios that are presented to novices or experts while tracking their behavior, e.g., when using
different display variants. Constructing similar scenarios, where the scenario itself is not an independent variable, is
a major task, requiring considerable effort and input from subject matter experts. As an alternative, identical trafﬁc
scenarios are, therefore, often repeated to obtain paired-samples at the risk of scenario recognition. Depending on the
aim of the study, this can be undesirable as participants may recall their earlier responses rather than coming up with an
independent solution, aggravating learning effects. This applies especially to studies that measure ATCO consistency,
such as in the personalization of conﬂict resolution advisories (Westin et al., 2016). Finding a balance between using
similar scenarios and preventing recognition is not trivial.
In this paper we, for the ﬁrst time, provide an overview of techniques used to mitigate scenario recognition
in existing air trafﬁc control (ATC) studies. A straightforward and frequently employed method is to rotate and/or
mirror scenarios. While these transformations result in identical scenarios in terms of conﬂict angles, trafﬁc densities
and patterns etc., the change in orientation may unconsciously impact participant behavior. This may not reveal itself
in the ﬁnal outcome, e.g., solving a conﬂict, but it can elicit different visual scan patterns to arrive at this outcome.
Visual search is an essential process that ATCOs use to continuously update their mental picture (Fraga et al., 2021).
Changes in this process may lead to faster or slower conﬂict detection in otherwise identical scenarios, affecting
related objective measures. Furthermore, perceived workload may be affected (e.g., due to unusual trafﬁc directions,
especially for experts) and action sequences or conﬂict resolutions might change due to different ﬁxation orders.
These effects are, to the best of our knowledge, not sufficiently identified and recognized in literature. Authors
often merely mention that scenarios are transformed to ‘prevent recognition’ without further detailing their considerations
or the transformation’s implications. In addition to our literature survey on mitigation techniques, we therefore
analyze eye tracking data from a previously executed experiment that featured scenario transformations. The data were
collected from ten professional ATCOs who each performed conflict detection and resolution in 15 distinct scenarios, of which
five were selected for this analysis. Each scenario was presented to them three times with different transformations.
By comparing the order in which, and the speed at which, flights were fixated, we empirically describe the participants’
behavioral consistency when presented with transformed repetitive scenarios. We conclude by discussing the implications that
researchers should consider when repeating scenarios, based on these initial findings.
A literature survey resulted in the identification of three categories of techniques to prevent scenario recognition,
explicitly described in 20 ATC studies and summarized in Table 1: geometric, textual and temporal. Most studies
used a combination of techniques, with rotating scenarios as the most popular technique, employed in 15 studies.
Table 1: Scenario recognition mitigation techniques explicitly mentioned in existing research.

                                Geometric            Textual                                 Temporal
Study                           Rotation  Mirroring  Renaming callsigns  Renaming waypoints  Time shifting  Reordering
Abdul Rahman, 2014              ✓         -          -                   -                   -              -
Albuquerque et al., 2008        -         ?          -                   -                   ✓              ✓
Borst et al., 2017              ✓         -          -                   -                   -              ✓
Borst et al., 2019              ✓         ✓          -                   -                   -              -
Cummings et al., 2005           ✓         -          -                   -                   -              -
Harrison et al., 2014           -         -          ✓                   -                   -              -
Hilburn et al., 2014            ✓         -          -                   ✓                   -              -
IJtsma et al., 2022             ✓         -          -                   -                   -              -
Jans et al., 2019               ✓         -          -                   -                   -              -
Jasek et al., 1995              -         -          -                   -                   -              ✓
Jha et al., 2011                ✓         -          ✓                   ✓                   -              -
Kim et al., 2022                ✓         -          ✓                   ✓                   -              -
Klomp et al., 2016              ✓         -          ✓                   -                   -              -
Major and Hansman, 2004         ✓         ✓          -                   -                   -              -
Metzger and Parasuraman, 2006   ✓         -          -                   -                   -              -
Rovira and Parasuraman, 2010    ✓         -          ✓                   ✓                   -              -
Sollenberger and Hale, 2011     -         -          ✓                   -                   -              -
ten Brink et al., 2019          ✓         -          -                   -                   -              -
Trapsilawati et al., 2021       ✓         -          ✓                   ✓                   ✓              -
Wilson and Fleming, 2002        -         -          ✓                   -                   -              -

Geometric: When a scenario is rotated or mirrored, its (objective) taskload, formed by the traffic density, conflict
geometries etc., remains the same, but its (subjective) workload might change. Especially with experts, accustomed
to traffic streams from certain directions, changing the principal axis can have an impact on their
perceived workload, as it requires a change in scan pattern.
Geometric transformations can only be done when the sectors are relatively symmetric, which is generally
not the case in operational environments. Furthermore, on a widescreen monitor, rotations other than 180° may
result in a reduced look-ahead range for ﬂights coming towards the sector. Square-shaped monitors (or simulated
windows), as found in many ATC centers, eliminate this problem. Only rotation multiples of 90° were found in
the studies, presumably because this generates sufﬁcient transformations and is easy to execute. Albuquerque
et al. (2008) mention that they ‘invert the route structure’, without further detailing what is meant by that.
Textual: Changing callsigns and waypoint names is a simple technique that can be widely applied, does not change
the taskload and has proven to be sufficient on its own in some cases, such as the study by Wilson and Fleming
(2002). When realistic callsigns and aircraft performance data are used, the callsign should match the flight’s
characteristics (e.g., no big airliner for small airlines or non-standard destinations). Similarly, when using
operational airspaces, waypoints may need to be left unaltered to match operational routes. Neither is a problem
when using airspace-naive novices.
Temporal: Shifting occurrences of, for example, conflicts in time is a feasible technique for relatively long scenarios,
where chunks of traffic entering the sector can be shuffled (Albuquerque et al., 2008; Trapsilawati et al., 2021).
Such temporal transformations do, however, risk ignoring cognitive build-up and its associated impact on (perceived)
workload. This technique is, therefore, mostly used to construct realistic scenarios from recorded flight
data, by shifting flights to create a plausible scenario that is denser or has more conflicts than the recording.
When an experiment consists of multiple scenarios per test condition, their order can be changed. If, for example,
display variants are tested that are sufficiently distinct from each other, participants may be predominantly
occupied by the changed visuals and/or tasks, making it even less likely for them to recognize repeated scenarios
at all (Jasek et al., 1995).
An extreme case of re-ordering chunks of trafﬁc is to add dummy scenarios in between measurement scenarios,
as done by Borst et al. (2017). If planning allows, measurements for each participant can even be split over
multiple days. This requires good planning (difficult when using experts) and is more prone to introducing confounds
due to a lack of control over variables such as participant energy levels or between-session (professional)
experiences. It is therefore not often used, except in longitudinal studies such as by Hilburn et al. (2014).
A technique not explicitly found in literature is the shifting of all ﬂights up or down in altitude. The individual
contribution might be marginal, as people predominantly recognize plan-view patterns, but in combination with other
techniques it can require participants to not completely rely on their memory. Care must be taken not to alter the
altitudes too much, as changes in ﬂight level have an effect on ground speeds and thus closing rates, impacting the
time a loss of separation occurs and/or conﬂict warnings will be issued.
As an example of the potential impact of scenario transformations, we revisit and analyze eye tracking data
from a previously executed experiment designed for task analyses. To prevent scenario recognition it involved static
scenarios featuring several geometric and textual transformations, dummy scenarios and a varying scenario order.
Participants and Apparatus
Ten professional en-route ATCOs (age: µ = 42.7, σ = 6.8; years of experience: µ = 18.8, σ = 6.2) from Eurocontrol’s
Maastricht Upper Area Control Centre (MUAC) voluntarily participated in a simulator experiment, as approved by the
Human Research Ethics Committee of TU Delft under number 2754. All participants provided written informed consent.
A TU Delft-built medium-fidelity simulator was designed to mimic the MUAC interface on a 27-inch display of
1920 x 1920 pixels, with a computer mouse for control inputs, as shown in Fig. 1. Although the scenarios were static,
participants could measure predicted minimum separation between flights and display extended flight labels.
Figure 1: Experiment set-up with participant (left) and observer (right) positions.
Gaze data was recorded using a head-worn Pupil Labs Core eye tracker (Kassner et al., 2014) with Pupil
Capture v3.5.1. The forward facing scene camera recorded at 30 Hz and the pupils were recorded at 120 Hz. Eight
AprilTag markers were placed along the edges of the screen to relate gaze to screen pixels. Clusters of gaze points
that were close in location and time were classiﬁed as ﬁxations through the Python version of I2MC by Hessels et al.
(2017), with a minimum duration threshold of 60 ms as used by Fraga et al. (2021). The fixations were correlated to
flights by drawing Voronoi-like areas of interest around each flight’s symbol, speed vector and label.
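The mapping of fixations to flights can be illustrated with a minimal sketch. It assumes that each flight is represented by a few anchor points in screen pixels (symbol, speed vector tip and label), so that assigning a fixation to the nearest anchor is equivalent to a Voronoi partition over those anchors. The function name `assign_fixation`, the `flight_anchors` structure and the optional `max_dist_px` cut-off are hypothetical and not taken from the experiment software; the exact construction of the areas of interest in the study may differ.

```python
import math


def assign_fixation(fix_xy, flight_anchors, max_dist_px=None):
    """Return the ID of the flight whose nearest anchor point is closest.

    fix_xy:         (x, y) fixation location in screen pixels
    flight_anchors: dict mapping flight ID -> list of (x, y) anchor points,
                    e.g. symbol, speed vector tip and label (assumed layout)
    max_dist_px:    optional cut-off (an assumption); fixations farther than
                    this from every anchor are left unassigned (None)
    """
    best_flight, best_dist = None, math.inf
    for flight_id, anchors in flight_anchors.items():
        for ax, ay in anchors:
            dist = math.hypot(fix_xy[0] - ax, fix_xy[1] - ay)
            if dist < best_dist:
                best_flight, best_dist = flight_id, dist
    if max_dist_px is not None and best_dist > max_dist_px:
        return None
    return best_flight


# Illustrative anchor points only, not data from the experiment.
anchors = {
    "F1": [(100, 100), (120, 100), (140, 80)],
    "F2": [(500, 500)],
}
print(assign_fixation((130, 95), anchors))
```

Using several anchors per flight lets a fixation on the label count towards the same flight as a fixation on the symbol, which is the property the areas of interest described above are meant to provide.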
Participants assessed ﬁve distinct static scenarios, intermingled with ten dummy scenarios that were not
included in the current analysis. Each scenario was shown three times with different transformations and featured
an artiﬁcial, octagonal 80 x 80 NM sector, with four waypoints in the cardinal directions. This made sure that the
professionals would not fully rely on their trained scan patterns and that scenarios could not be recognized based on
the sector shape. Four ﬂights were present on direct routes to their exit points. In the dummy scenarios there were only
two or three ﬂights, for which measures like ﬁxation orders would be less robust. Variants were created by applying
any (combination) of the following transformations:
• Rotation: 90, 180 or 270 degrees,
• Mirroring: ﬂipping along the x- or y-axis,
• Altitude shift: all ﬂights up or down by 1,000 or 2,000 ft.
Callsigns were randomized for all variants and ﬂight labels were always placed at a 90 degree offset to the direction
of travel. Figure 2 shows an example of a scenario with corresponding transformations. Note that ﬂights in the center
of the sector were invariant to all geometric transformations and always appeared at the same location on the screen.
Figure 2: Three transformations (Variants a, b and c) of Scenario 5. Colors relate to the same flights in each transformation.
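The transformations listed above can be sketched in a few lines of code. This is an illustration under stated assumptions, not the tooling used in the experiment: the `Flight` class, the function names, the sector-centered coordinate frame (x east, y north, in NM) and the compass heading convention are all hypothetical. Mirroring is parameterized by the coordinate that is negated, since the axis naming in the list leaves the convention open. Label placement is not modelled.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flight:
    flight_id: int       # stable ID, so a flight can be tracked across variants
    callsign: str
    x_nm: float          # east of the sector center, in NM
    y_nm: float          # north of the sector center, in NM
    heading_deg: float   # compass heading, clockwise from north
    flight_level: int    # hundreds of feet


def rotate(flight, angle_deg):
    """Rotate clockwise about the sector center by an integer multiple of 90."""
    if angle_deg % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    x, y = flight.x_nm, flight.y_nm
    for _ in range((angle_deg // 90) % 4):
        x, y = y, -x  # one clockwise quarter turn
    return replace(flight, x_nm=x, y_nm=y,
                   heading_deg=(flight.heading_deg + angle_deg) % 360)


def mirror(flight, negate):
    """Mirror by negating the 'x' (left-right) or 'y' (up-down) coordinate."""
    if negate == "x":
        return replace(flight, x_nm=-flight.x_nm,
                       heading_deg=(360 - flight.heading_deg) % 360)
    if negate == "y":
        return replace(flight, y_nm=-flight.y_nm,
                       heading_deg=(180 - flight.heading_deg) % 360)
    raise ValueError("negate must be 'x' or 'y'")


def shift_altitude(flight, shift_ft):
    """Shift the flight level by 1,000 or 2,000 ft up or down."""
    if shift_ft not in (-2000, -1000, 1000, 2000):
        raise ValueError("shift must be +/-1000 or +/-2000 ft")
    return replace(flight, flight_level=flight.flight_level + shift_ft // 100)


def make_variant(flights, rotation_deg=0, negate=None, shift_ft=0,
                 callsigns=None, seed=None):
    """Apply the same transformation to all flights and redraw callsigns."""
    rng = random.Random(seed)
    new_callsigns = rng.sample(callsigns, len(flights)) if callsigns else None
    variant = []
    for i, flight in enumerate(flights):
        flight = rotate(flight, rotation_deg)
        if negate is not None:
            flight = mirror(flight, negate)
        if shift_ft:
            flight = shift_altitude(flight, shift_ft)
        if new_callsigns:
            flight = replace(flight, callsign=new_callsigns[i])
        variant.append(flight)
    return variant
```

A flight at the sector center keeps its position under `rotate` and `mirror`, which matches the invariance of centrally located flights noted above. Keeping a stable `flight_id` separate from the randomized callsign is what allows fixations on the same flight to be compared across variants.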
All participants got to see the same order of transformations, but the scenario ordering was counterbalanced between
them to account for learning effects. This ordering of scenarios was deﬁned in the previously executed experiment and
might, in hindsight, have been suboptimal for the current study. The experiment started with six training scenarios.
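The counterbalancing scheme itself is not detailed here. As one possible illustration, a cyclic Latin square gives every scenario each position in the sequence equally often across participants. The function names below are hypothetical, and the sketch ignores the dummy scenarios and the three repetitions per scenario, so it is a simplification rather than a reconstruction of the actual design.

```python
def cyclic_latin_square(items):
    """Return an n x n square in which each item occurs once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(items, participant_index):
    """Pick a presentation order by cycling through the rows of the square."""
    square = cyclic_latin_square(items)
    return square[participant_index % len(items)]


scenarios = ["S1", "S2", "S3", "S4", "S5"]
for participant in range(10):
    print(participant + 1, order_for_participant(scenarios, participant))
```

With five scenarios and ten participants, each row of the square is used twice. A cyclic square balances position but not which scenario precedes which, so a design that is sensitive to carry-over effects would need a different scheme.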
Participants were asked to first indicate for each scenario whether there were any conflicts and to subsequently
solve these through altitude clearances only. Some ﬂights had to leave the sector at a different ﬂight level, requiring
a clearance that would generally also solve any conﬂict(s). An intermediate level was needed in some cases to not
create a conﬂict. If there were no (remaining) conﬂicts and all ﬂights were at or cleared to the correct ﬂight level, the
participant could advance to the next scenario by clicking a button in the lower right corner of the screen. This button
was carefully placed to ensure a common ﬁrst ﬁxation point, not related to any ﬂights, when a scenario loaded.
Results and Discussion
After the experiment, some participants mentioned that they did recognize the repetition of certain conﬂict
geometries, but none of them realized that these were identical scenarios apart from the applied transformation(s). Our
present analysis stays away from concluding whether the recognition mitigation has worked and instead focuses on
the consistency of ﬁxation behavior. Since participants showed vastly individualized behavior, no between-participant
comparisons are performed and all observations discussed here relate to the three scenario repetitions per individual.
Conﬂict detection time is directly driven by the order in which ﬂights receive attention, especially when
scenarios include many ﬂights. After all, if an ATCO ﬁxates ﬂights in a different order, he/she might observe a
conﬂicting pair earlier or later. To this end, Fig. 3 shows for each scenario’s three repetitions the ﬂight that was
ﬁrst ﬁxated by each ATCO. The level of consistency, in terms of identical ﬁrst ﬁxations for all three transformations
(visible as a row of three similarly colored squares), varied per ATCO from zero (Participants 5 and 10) to three
scenarios (Participants 7 and 8). A similar variance can be seen between the scenarios, with consistent ﬁrst ﬁxations
for one (Scenarios 1 and 2) to ﬁve (Scenario 5) ATCOs. This suggests that the rotations may have had an impact on the
ﬁxation order, and that this can differ per individual and trafﬁc layout. On closer inspection, in 80% of the runs, the
ﬁrst ﬁxated ﬂight in Scenario 5 was located in the center of the sector (and therefore in the exact same location for all
repetitions). Conversely, Scenario 1, the only one with no ﬂight near the center, shows the lowest level of consistency.
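The consistency measure described above can be sketched as follows. The sketch assumes that each run is available as a time-ordered sequence of fixated flight IDs (with `None` for fixations that were not on a flight) and that IDs are stable across variants, which is needed because callsigns were randomized. The function names and the example sequences are hypothetical and do not reproduce data from the experiment.

```python
def fixation_order(fixated_flights):
    """Order in which flights were fixated for the first time."""
    order = []
    for flight in fixated_flights:
        if flight is not None and flight not in order:
            order.append(flight)
    return order


def first_fixated(fixated_flights):
    """First flight that received a fixation, or None if there was none."""
    order = fixation_order(fixated_flights)
    return order[0] if order else None


def consistent_first_fixation(runs_by_variant):
    """True if the same flight was fixated first in every variant."""
    firsts = {first_fixated(seq) for seq in runs_by_variant.values()}
    return len(firsts) == 1 and None not in firsts


def count_consistent_scenarios(runs):
    """Number of scenarios with an identical first fixation in all variants.

    runs: dict scenario -> dict variant ('a', 'b', 'c') -> fixation sequence
    """
    return sum(consistent_first_fixation(v) for v in runs.values())


# Made-up sequences for one participant, for illustration only.
runs = {
    "S1": {"a": [None, 2, 2, 1, 3], "b": [2, 4], "c": [1, 2]},
    "S5": {"a": [3, 1], "b": [3, 3, 2], "c": [None, 3]},
}
print(count_consistent_scenarios(runs))
```

Applying `count_consistent_scenarios` per participant gives the per-ATCO count discussed above; counting per scenario across participants gives the per-scenario view. Comparing the full lists returned by `fixation_order` instead of only their first element extends the same idea to complete fixation sequences.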
To illustrate individual differences, complete orders of fixation for two participants at either extreme of
the aforementioned consistency scale are shown in Fig. 4. Note how Participant 8’s complete ﬁxation sequence is
consistent for all variants of Scenario 3. This, in combination with the inconsistent ﬁxation orders seen in other
scenarios or with other participants, further hints at a non-negligible inﬂuence of scenario rotation on the processing of
traffic scenarios. For more insight into the relevant mechanisms, an analysis of scan patterns at different transformations
would be useful, but this requires scenarios with more ﬂights. The static, low density scenarios used in this study
imply that the results are not necessarily applicable to dynamic and/or denser scenarios.
Figure 3: First fixated flight per participant, for Variants a, b and c of each scenario. Colors represent specific
flights in a scenario (see Fig. 2 for Scenario 5).
Figure 4: Complete flight fixation orders of two participants, for Variants a, b and c of each scenario. Colors
represent specific flights in a scenario (see Fig. 2 for Scenario 5).
To further illustrate the potential inﬂuence of rotations on ﬁxation sequences and duration, Fig. 5 shows the
standardized time till specific flights in Scenarios 3 and 5 had been first fixated. Results imply that the rotational
influence on this measure is dependent on the researcher’s flight of interest. This is most visible in Scenario 5b, where
Flight 1 shows signiﬁcantly different means compared to the other two rotations. Akin to the ﬁxation order, differences
between individuals are again considerable, reﬂected in the wide spread of most data.
Figure 5: Standardized (per participant) time till flights have been first fixated in two scenarios, split per
transformation (a, b, c). Colors represent specific flights in each scenario (see Fig. 2 for Scenario 5).
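The measure shown in Fig. 5 can be approximated with the sketch below. It assumes that standardization per participant means a z-score over all of that participant's times to first fixation, computed with the sample standard deviation. The exact standardization used for the figure is not specified, so this choice, the function names and the example values are assumptions.

```python
from statistics import mean, stdev


def time_to_first_fixation(fixations, flight_id, scenario_start_s=0.0):
    """Time from scenario start until flight_id is first fixated.

    fixations: list of (onset_s, fixated_flight_id), sorted by onset time.
    Returns None if the flight was never fixated.
    """
    for onset_s, fixated in fixations:
        if fixated == flight_id:
            return onset_s - scenario_start_s
    return None


def standardize_per_participant(times):
    """Convert raw times to z-scores within each participant.

    times: dict participant -> dict key -> time in seconds, where key
           identifies one observation, e.g. (scenario, variant, flight).
    """
    standardized = {}
    for participant, observations in times.items():
        values = list(observations.values())
        if len(values) < 2:
            raise ValueError("need at least two observations per participant")
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            raise ValueError("all observations are identical")
        standardized[participant] = {
            key: (t - mu) / sd for key, t in observations.items()
        }
    return standardized


# Made-up values for illustration only.
raw = {"P1": {("S5", "a", 1): 1.0, ("S5", "b", 1): 2.0, ("S5", "c", 1): 3.0}}
print(standardize_per_participant(raw))
```

Standardizing within participants removes individual differences in overall scanning speed, so that the remaining variation can be compared across variants of the same scenario.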
While the order of scenarios was counterbalanced between participants, the order of their repetitions was
not (i.e., all participants ﬁrst saw a, followed by b and then c). While this resulted in a clearly visible reduction in
total ﬁxation time over the three repetitions, this reduction is not (always) reﬂected in the results presented here. We
therefore conclude that this speed-up was mostly caused by the participants getting more acquainted with the task
at hand and advancing to the next scenario, rather than recognizing the specific scenarios. To further isolate the effect of
purely the transformation, future studies should include duplicate scenarios where no transformation has been applied.
Scenario transformations such as rotation and mirroring are proven techniques to create paired-samples in
human-in-the-loop ATC research, but the potential impact on results is not always sufﬁciently recognized. We showed
that the most popular technique, rotating scenarios, does risk eliciting different eye ﬁxation behavior from participants,
potentially confounding objective measures such as conﬂict detection time. Whether this is a problem strongly depends
on the research at hand and requires careful consideration. No deﬁnitive conclusions regarding the size of these effects
can be made on the basis of the limited analysis presented here. The ﬁrst indications do warrant further research with
more elaborate, potentially dynamic, trafﬁc scenarios and a tailored experiment design.
The authors express their gratitude to all involved ATCOs and MUAC for facilitating the experiment.
Abdul Rahman, S. (2014). Solution Space-based Approach to Assess Sector Complexity in Air Traffic Control (Doctoral dissertation).
Delft University of Technology.
Albuquerque, E. A. F., Trabasso, L. G., Sandes, A., de Araújo, M., Li, L., & Hansman, R. J. (2008). Experimental Setup for Air
Traffic Control Cognitive Complexity Analysis. Symposium of Operational Applications in Areas of Defense, 342–347.
Borst, C., Bijsterbosch, V. A., van Paassen, M. M., & Mulder, M. (2017). Ecological interface design: supporting fault diagnosis of
automated advice in a supervisory air traffic control task. Cognition, Technology and Work, 19(4), 545–560.
Borst, C., Visser, R. M., van Paassen, M. M., & Mulder, M. (2019). Exploring Short-Term Training Effects of Ecological Interfaces:
A Case Study in Air Traffic Control. IEEE Transactions on Human-Machine Systems, 49(6), 623–632.
Cummings, M. L., Tsonis, C. G., & Cunha, D. C. (2005). Complexity Mitigation Through Airspace Structure. International Symposium
on Aviation Psychology, 159–163.
Fraga, R. P., Kang, Z., Crutchfield, J. M., & Mandal, S. (2021). Visual Search and Conflict Mitigation Strategies Used by Expert en
Route Air Traffic Controllers. Aerospace, 8(7).
Harrison, J., Izzetoğlu, K., Ayaz, H., Willems, B., Hah, S., Ahlstrom, U., Woo, H., Shewokis, P. A., Bunce, S. C., & Onaral, B.
(2014). Cognitive workload and learning assessment during the implementation of a next-generation air traffic control
technology using functional near-infrared spectroscopy. IEEE Transactions on Human-Machine Systems, 44(4), 429–
Hessels, R. S., Niehorster, D. C., Kemner, C., & Hooge, I. T. (2017). Noise-robust fixation detection in eye movement data:
Identification by two-means clustering (I2MC). Behavior Research Methods, 49(5), 1802–1823.
Hilburn, B., Westin, C., & Borst, C. (2014). Will Controllers Accept a Machine That Thinks like They Think? The Role of Strategic
Conformance in Decision Aiding Automation. Air Traffic Control Quarterly, 22(2), 115–136.
IJtsma, M., Borst, C., van Paassen, M. M., & Mulder, M. (2022). Evaluation of a Decision-Based Invocation Strategy for Adaptive
Support for Air Traffic Control. IEEE Transactions on Human-Machine Systems, 52(6), 1135–1146.
Jans, M., Borst, C., van Paassen, M. M., & Mulder, M. (2019). Effect of ATC Automation Transparency on Acceptance of Resolution
Advisories. IFAC PapersOnLine, 52(19), 353–358.
Jasek, M., Pioch, N., & Zeltzer, D. (1995). Enhanced Visual Displays for Air Trafﬁc Control Collision Prediction. IFAC Proceedings
Jha, P. D., Bisantz, A. M., Parasuraman, R., & Drury, C. G. (2011). Air traffic controllers’ performance in advance air traffic
management system: Part I-performance results. International Journal of Aviation Psychology, 21(3), 283–305.
Kassner, M., Patera, W., & Bulling, A. (2014). Pupil: An Open Source Platform for Pervasive Eye Tracking and Mobile Gaze-
based Interaction. Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous
Kim, M., Borst, C., & Mulder, M. (2022). Situation Awareness Prompts: Bridging the Gap between Supervisory and Manual Air
Traffic Control. IFAC-PapersOnLine, 55(29), 13–18.
Klomp, R. E., Borst, C., van Paassen, M. M., & Mulder, M. (2016). Expertise Level, Control Strategies, and Robustness in Future
Air Traffic Control Decision Aiding. IEEE Transactions on Human-Machine Systems, 46(2), 255–266.
Major, L. M., & Hansman, R. J. (2004). Human-Centered Systems Analysis of Mixed Equipage in Oceanic Air Traffic Control
(tech. rep.). MIT International Center for Air Transportation. Cambridge, MA, USA.
Metzger, U., & Parasuraman, R. (2006). Effects of automated conflict cuing and traffic density on air traffic controller performance
and visual attention in a datalink environment. International Journal of Aviation Psychology, 16(4), 343–362.
Rovira, E., & Parasuraman, R. (2010). Transitioning to future air traffic management: Effects of imperfect automation on controller
attention and performance. Human Factors, 52(3), 411–425.
Sollenberger, R. L., & Hale, M. (2011). Human-in-the-Loop Investigation of Variable Separation Standards in the En Route Air
Trafﬁc Control Environment. Proceedings of the Human Factors and Ergonomics Society 55th Annual Meeting, 66–70.
ten Brink, D. S., Klomp, R. E., Borst, C., van Paassen, M. M., & Mulder, M. (2019). Flow-based air trafﬁc control: Human-machine
interface for steering a path-planning algorithm. 2019 IEEE International Conference on Systems, Man and Cybernetics,
SMC 2019, 3186–3191.
Trapsilawati, F., Chen, C. H., Wickens, C. D., & Qu, X. (2021). Integration of conflict resolution automation and vertical situation
display for on-ground air traffic control operations. Journal of Navigation, 74(3), 619–632.
Westin, C., Borst, C., & Hilburn, B. (2016). Strategic Conformance: Overcoming Acceptance Issues of Decision Aiding Automation?
IEEE Transactions on Human-Machine Systems, 46(1), 41–52.
Wilson, I. A. B., & Fleming, K. (2002). Controller reactions to free flight in a complex transition sector re-visited using ADS-B+.
Proceedings. The 21st Digital Avionics Systems Conference, 1, 5–1.