All content in this area was uploaded by Stephanie Balters on Nov 19, 2020
Back to School: Impact of Training on Driver Behavior
and State in Autonomous Vehicles
Srinath Sibi
Stanford University
Stanford, CA, USA
ssibi@stanford.edu
Stephanie Balters
Stanford University
Stanford, CA, USA
balters@stanford.edu
Ernestine Fu
Stanford University
Stanford, CA, USA
ernestinefu@stanford.edu
Ella G. Strack
Zoox Inc.
(previously Bosch LLC)
Foster City, CA, USA
estrack@zoox.com
Martin Steinert
Norwegian University of
Science And Technology
Trondheim, Norway
martin.steinert@ntnu.no
Wendy Ju
Cornell Tech
New York, NY
wendyju@cornell.edu
Abstract— Many producers of automated vehicle systems
have begun testing autonomous vehicles on the road. In order
to ensure safety and prevent crashes, human drivers are
enlisted to monitor autonomous vehicles. However, operators
of autonomous systems exhibit negative behavior adaptations
in response to prolonged supervision of automation. To prevent
the onset of undesirable behaviors in safety drivers, we must
investigate driver state and behavior changes during the operation
of highly automated vehicles. In the study presented here,
we examine the effects of theoretical and practical training
on the drivers’ response to potentially critical situations in
a longitudinal driving simulator study. We also present the
effects of encountering a failure of the automated vehicle on
driver state and behavior. We conducted a two-part panel
driving simulator study (N=28), with an interval of 20-30 days
between the training and testing sessions. We found that while
participants with training are better prepared for a potential
failure of the automation, participants in both conditions show
a rise in sleepy or drowsy behavior before a potential failure
of automation.
I. INTRODUCTION
Autonomous vehicles (AVs) for widespread public use
bring the promise of safer and more efficient roads, while
simultaneously reducing overall travel time by decreasing
traffic [1]. Currently, vehicle manufacturers are testing
vehicles with automation features in Levels 3 to 5 of the SAE
classification [2], with the goal of eventually
reaching Level 5, or full automation. During on-road testing
of AVs, Safety Drivers (also referred to as Fallback Drivers)
are needed to ensure safety in the event the AV is unable to
operate or experiences a failure; in such an event, the driver
can safely take over and navigate the vehicle. Safety drivers [2]
must monitor the AV operations and its surroundings
and, if possible, anticipate failures of the AV system. This
makes the understanding and analysis of safety driver
training, behavior and state vital [3].
However, it has been noted that drivers supervising
automated vehicles are imperfect themselves. The simultaneous
failure of the vehicle automation systems and the safety driver
can have disastrous consequences¹,². Moreover, prior research
indicates that with the resumption of manual driving from
lower levels of automation, drivers experience an increase
in response time [4] and in secondary task involvement [5].
In higher levels of automation, drivers experience increased
sleepy and drowsy behavior [6] caused by reduced cognitive
activity [7] associated with underload. However, this body of
AV research is based on studies conducted with participants
who had driving experience but were not given specific training
on the fallibility of AVs. Hence, it is important to understand
if training of drivers or increased experience with automation
failure can alleviate the detrimental driver behaviors that
occur when supervising AVs.
¹ https://www.nbcnews.com/tech/innovation/video-shows-moment-fatal-crash-involving-self-driving-uber-n858921
² https://qz.com/1410928/waymos-self-driving-car-crashed-because-its-human-driver-fell-asleep/
Fig. 1. Participant in the Testing Phase
In this study, we analyze the impact of (1) failure training
and (2) the experience of a near-catastrophic failure on
the AV driver’s ability to recognize a potentially critical
situation and prepare for it. We employed a
1 x 2 (with Failure Training and without Failure Training)
design panel study (N=28). The study was run in two phases,
separated by 20-30 days: the first session was presented as
a training session and the second session as a test session.
In the training session, participants were given theoretical
and practical training using a training document packet and
a driving simulation, respectively. Those who underwent failure
training received an additional section in their document
packet that detailed the likely ways the AV driving system
could fail or suddenly disengage, and how to best prepare
for such an event. Participants who received failure training
were also given a demonstration of sudden silent automation
failure in the training driving simulation. In the second phase
(i.e., the testing session), participants drove the AV through a
40 minute simulated course. All participants experienced sudden
silent automation failure in the second driving simulation
session. We analyzed the differences in driver state and driving
behavior between the 2 groups and within each group over
time. The key findings of this study are stated below.
• Participants who received theoretical and practical failure
training were more likely to anticipate potential AV
failure.
• Participants in both conditions showed a decreasing trend
in their preparedness and vigilance over time.
• The experience of a near-failure of automation caused
a temporary increase in preparedness and decrease in
sleepy behavior, but the prior trends resumed soon after.
II. BACKGROUND AND PRIOR WORK
A. Driver Behavior Changes
As more and more automated driving features are included
in cars, driver behaviors change in ways that are not always
expected. Researchers have noted that driver behavior can
take a turn for the worse with lower levels of automation
(between Levels 1-2.5). Rudin-Brown et al. [4] find that
the prolonged use of adaptive cruise control (ACC) corresponded
to increases in the response time to hazard detection tasks.
Similarly, a study conducted by Strand et al. reveals that
drivers using highly automated driving systems maintain lower
minimum time to collision, lower minimum time headway and a
longer response time to a failure of the automation as compared
to drivers who use the lower levels of automation (ACC) [8].
In contrast, longitudinal research on supervision of automation
by Erika Miller et al., investigating behavioral adaptations to
Level 2 lane-keeping assistance features across multiple
drives on multiple days within a week, indicates
that drivers perform better with automation over time but then
suffer performance losses when assistance is not available [9],
[10]. Several studies also report that automation correlates
with decreased driver arousal, increased driver drowsiness,
slower reaction times and reduced eyes-on-road gazes [11],
[12], [13].
Similar adverse behavioral adaptations are also observed
in the higher levels of automation. Past studies indicate
that drivers who monitor the automated driving system for
lengthy time periods experience drowsy behavior [6], [14].
These incidences of drowsy behavior were investigated using
cortical activity measures [7] and were found to be the result
of prolonged duration of low cortical activity. Research on
vigilance indicates that uncertainty increases the workload
associated with supervision of automation, and the associated
fatigue degrades performance [15], particularly over time [16].
These studies point to a pattern of detrimental driver behaviors
emerging with the use of automated driving systems, making
it vital that any training program used for safety drivers
addresses these issues.
B. Safety Driver Training
Driver training programs have long been an essential tool
in ensuring the safety of drivers. They are ingrained into our
driving culture and are a rite of passage for many teenagers.
The design of an appropriate and effective driver training
program can reduce the risk of crashes and traffic violations
[17]. New and improved driver training programs are being
developed that focus on bettering drivers’ anticipation of risk,
rather than the drivers’ skill in driving at the limits of tire
adhesion. With the advent of automated driving and the use
of safety drivers to supervise the automation to ensure safety,
it is now imperative to study the effectiveness of training
programs for safety drivers.
While OEMs and tech companies commonly release the
number of miles logged by the AVs in simulation and on-
road, there is little to no information provided on the training
given to safety drivers who supervise the automation during
the testing [18]. In the aftermath of a fatal mishap, Uber
announced that it would improve safety driver training. The
report announced the addition of a co-pilot and the inclusion
of, among other aspects, “Fault Injection Training” and “Software
Limitation Training.” In other words, the proposed training
would cover the failure modes of the AV.
Recently, a new set of guidelines released by the Automated
Vehicle Safety Consortium called for the inclusion of both
theoretical and practical training and, like the Uber report,
called for the inclusion of training into the failure modes of
the AV. Aside from the recommendations mentioned above,
some federal [19] and state policies allude to the need for
safety driver vigilance to ensure safety during AV testing,
making the design of safety driver programs and investigation
of safety driver behavior vital.
C. Measurement of Driver State
Gonçalves et al. define Driver State Monitoring Systems
(DSM) as systems that collect observable information about
the human driver in order to assess the driver’s capability to
perform the driving task in a safe manner [20]. Past research
suggests myriad ways of inferring driver state, such as
video-based detection [21], odometric data [22], and
physiological tools [23], [24]. These techniques have been
successfully applied in the detection of stress, fatigue and
drowsiness [24], and are being increasingly incorporated into
AVs to monitor drivers and ensure safety [3] and comfort³.
Physiological sensors offer researchers the ability to monitor
driver state over long time periods with minimal intrusion,
making them ideal tools for driver state monitoring in AVs.
In the current study, we employ physiological measurement
tools to characterize driver state ahead of latent critical
events. We employ galvanic skin response (GSR) and heart rate
(HR) measures to analyze the preparedness and vigilance
displayed by participants ahead of latent critical events.
Fig. 2. Panel Study Design with Training Phase (left) and Testing Phase (right). Training Phase consists of theoretical and practical training (smaller driving
simulator). Testing Phase conducted in full-cab driving simulator.
III. GOALS OF THE STUDY
The goal of the study is to answer the following research
questions:
• RQ1: Does prior knowledge from theoretical and practical
training have an impact on the driver behavior?
• RQ2: How does the experience of a near-failure impact
the safety driver behavior?
IV. METHOD
A. Overview
The study had a between-subject design with two groups: one
group that was given failure training and a control group that
was not. Participants of each group were required to attend
two sessions: the first session we call the Training Phase, and
the second, the Testing Phase. These sessions were separated
in time by 20-30 days to model the attrition of knowledge
over time. The practical training phase was conducted with a
small driving simulator consisting of a driver seat and one
small-sized screen (see Figure 2, left), similar to what might
be used in a driver training class. The testing phase, which
is intended to predict how drivers will perform in subsequent
on-road driving, used a large immersive driving simulator
consisting of a 270-degree wrap-around screen and full-cab
vehicle (see Figure 2, right). A total of N=28 participants,
between 18 and 60 years old, completed both phases of the
study. All participants had a minimum of 2 years of on-road
driving experience. Participants were paid $5 for the
completion of the training phase and $25 for the testing phase
in Amazon gift cards. The overall study design is shown in
Figure 3.
³ https://www.faurecia.com/en/innovation/smart-life-board/comfort-wellness
B. Training Phase
Upon entering the study facility, participants fill out a
consent form and a questionnaire. The questionnaire collects
preliminary demographic information. The training phase
consists of two sub-sections: the theoretical and the practical
training sections. Both groups experience both training
sections. The failure training group is additionally briefed
about potential failures during theoretical training, and further
experiences a silent automation failure during the practical
training phase.
1) Theoretical Training: In the theoretical training phase,
all participants are presented with a document packet
containing information detailing the functions of the AV
subsystems. It contains introductory content on perception,
localization and data fusion subsystems in the AV, along with
elementary information regarding the functionalities of the
RADAR, LIDAR and camera subsystems in the AV. The theoretical
training packet instructs participants on how to engage and
disengage automation using the buttons on the steering wheel,
the brake and the steering. These methods are employed in a
driving simulator in the practical training segment.
The participants who receive theoretical failure training
receive an additional set of documents in the theoretical
training session on the topic of “safety in autonomous vehicles.”
This section contains the reasons why the automated driving
system (used in this experiment) would fail or disengage.
Fig. 3. (Left) Course design with practice and baseline segments, as well as test events, shown along a timeline. (Right) Temporal structure for each critical
event. (Figures not to scale)
Fig. 4. Examples of information provided in the theoretical training packet.
The full training document packet is available in the repository indicated in
the theoretical training sub-section.
For example, participants are instructed that missing lane
markings or large debris covering the road could cause the
system to disengage automation and suddenly transfer control
to the driver. To prepare for possible control transition and
to avoid crashes due to sudden automation disengagement,
participants are instructed that they should hold the steering
wheel and place their foot on the brake pedal if they suspect
that the automation will disengage.
After participants finish reading the theoretical training
packet, they are given a multiple-answer multiple-choice test
to ensure that they have read and familiarized themselves with
the contents of the training packet. The test contains a total
of 8 questions, with 3 questions that test the participants
on the safety section. We ensure that the participants who
receive theoretical failure training are well versed with the
contents of the safety training section by making them re-
read the training packet until they are able to answer all the
questions in the test about the safety section accurately (i.e.,
they have to redo the test until all questions are answered
correctly). This marks the end of the theoretical training
section; participants then begin the practical training section.
The material used in the theoretical training section, along
with the test administered to participants, is included in a
GitHub repository⁴.
2) Practical Training: In the practical training, all
participants practice operating the automated vehicle using a
driving simulator (Figure 2) with automated driving capabilities.
Participants are given a short introduction on how to engage
and disengage the automation. In order to engage automation,
they press a green button on the steering wheel. To disengage
automation, they can press a red button on the steering wheel,
step on the brake, or turn the steering wheel by a minimum
of 15 degrees. (The minimum steering wheel angle for disabling
the automation is set to 15 degrees to avoid any accidental
disengagements by participants.) All participants practice
engaging and disengaging the automation at least twice
using the methods listed above, until they feel familiar
with the transfers of control. While the practical training
phase uses a small driving simulator and the testing phase
uses an immersive large driving simulator, both simulators
are based on the same simulation software and automated
system capabilities (see Figure 2). The methods to engage
and disengage automation are consistent across the two
driving simulators. This provides the participants the requisite
practical training to operate the automated vehicle in the
testing phase.
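The engage and disengage rules described above can be summarized in a minimal sketch. It is an illustration only, not the simulator's actual software: the names `DriverInput` and `next_automation_state` are hypothetical, and the behavior when the green button and a disengage input occur at the same time is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass

# Threshold stated in the text: steering by at least 15 degrees disengages.
STEERING_DISENGAGE_DEG = 15.0


@dataclass
class DriverInput:
    """One sample of driver input (hypothetical structure for illustration)."""
    green_button: bool = False        # request to engage automation
    red_button: bool = False          # request to disengage automation
    brake_pressed: bool = False       # stepping on the brake disengages
    steering_angle_deg: float = 0.0   # signed steering wheel angle


def next_automation_state(engaged: bool, inp: DriverInput) -> bool:
    """Return whether automation is engaged after processing one input sample."""
    if engaged:
        disengage = (
            inp.red_button
            or inp.brake_pressed
            or abs(inp.steering_angle_deg) >= STEERING_DISENGAGE_DEG
        )
        return not disengage
    # Assumption: while disengaged, only the green button changes the state.
    return inp.green_button
```

Small steering corrections below the threshold leave the automation engaged, which mirrors the stated purpose of the 15 degree minimum.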
During the practical training, the AV encounters a section
of road where the lane markers are faded. The participants
who receive practical failure training experience a
sudden disengagement of the automation due to the absence
of lane markers at this location (shown in Figure 5), in a
manner similar to how the automated driving system will
fail in the subsequent testing phase. Participants who do not
receive the practical failure training do not experience this
disengagement in automation.
C. Testing Phase
1) Measurements: During the study, participants’ ECG
signals, galvanic skin response and video data are gathered.
ECG data are collected using the Shimmer3 ECG device
with Ag/AgCl electrodes placed over the chest. The galvanic
skin response is recorded using the Shimmer3 GSR unit with
a Photoplethysmograph (PPG) attachment. The Ag/AgCl
electrodes are placed on the forearm and the PPG sensor is
attached to the ear lobe, as seen in Figure 1. The physiological
data are recorded at a sampling frequency of 512 Hz.
⁴ https://github.com/srinathsibi/Safety-Driver-Failure-Training-Materials.git
Fig. 5. Critical Event in Practical Training with faded lane markings
2) Testing Course Design: At the start of the testing
session drive, participants are told that they are driving to San
Francisco in their AV. Participants are instructed to closely
follow signs along the side of the road that guide them to their
destination and provide other pertinent driving information such as
designated lanes for AVs. They are informed that the AV is
programmed to follow directions to the destination and obey
the road signs, but participants are instructed that they can
take over control if they feel it necessary, using the methods
learned in the training phase.
The total drive time in the testing phase is approximately
45 minutes depending on participant driving speeds.
The drive begins with a short 3-4 minute long practice
segment for the participants to acclimate to driving in the
larger full cab simulator. During this segment, participants are
asked to operate the vehicle in manual and automated driving
modes by taking over and handing off control from and to the
automated driving system using the methods covered during
the training phase.
3) Event Design: During the main portion of the testing
phase drive, participants encounter eight events in the
simulator test course. In all eight events, the lane markings
are either occluded or not available. The design of the course
with the practice segment and the eight individual events is
shown in Figure 3. Each event is separated from the previous
event by at least two minutes. In each event, the AV encounters
missing/faded lane markings. While the cause for the absence
or occlusion of the lane markings is different across the
events, the overall temporal structure of all the events is the
same (Figure 3). Every event is preceded by a road sign on
the side of the road, indicating a potential critical event in
the road ahead. Then, the lane markers fade or disappear
completely, and the AV encounters an obstacle in the road.
The time interval between the appearance of a road sign and
the disappearance of the lane markers is denoted as T1. The time
interval between the disappearance of the lane markers and
the appearance of an obstacle on the road is denoted as T2.
T1 and T2 are intentionally varied between the eight events
to avoid any learning effects between events based on timing
alone.
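The temporal structure of an event can be written down as a small data structure, which makes the T1 + T2 analysis window used later explicit. This is a sketch under stated assumptions: the class name `CriticalEvent`, its field names, and the example timestamps are hypothetical and are not taken from the study materials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalEvent:
    """Timestamps (seconds into the drive) for one event; names are illustrative."""
    index: int
    sign_time_s: float          # road sign appears
    markers_lost_time_s: float  # lane markers fade or disappear
    obstacle_time_s: float      # obstacle appears on the road

    def __post_init__(self):
        # The text fixes the order: sign, then lost markers, then obstacle.
        if not (self.sign_time_s <= self.markers_lost_time_s <= self.obstacle_time_s):
            raise ValueError("expected sign <= markers lost <= obstacle")

    @property
    def t1(self) -> float:
        """Interval from the road sign to the disappearance of lane markers."""
        return self.markers_lost_time_s - self.sign_time_s

    @property
    def t2(self) -> float:
        """Interval from the disappearance of lane markers to the obstacle."""
        return self.obstacle_time_s - self.markers_lost_time_s

    def analysis_window(self) -> tuple:
        """The T1 + T2 window: from the road sign to the obstacle."""
        return (self.sign_time_s, self.obstacle_time_s)


# Hypothetical timestamps, for illustration only.
event = CriticalEvent(index=1, sign_time_s=300.0,
                      markers_lost_time_s=312.0, obstacle_time_s=320.0)
```

Varying T1 and T2 per event then amounts to choosing different timestamp spacings for each `CriticalEvent`.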
As seen in Figure 3, events 3 and 8 are denoted differently
from the other 6 events. In events 1, 2, 4, 5, 6 and 7, the AV
self-corrects and maneuvers to safety without any intervention
from the driver (participant), thereby successfully navigating a
situation where there were absent or unreadable lane markers.
For example, in event 1, the AV first encounters a warning
sign. On the other side of the hill, the AV’s lane and the lane
markers are covered by rocks due to a landslide. To avoid
hitting the rocks, the AV slowly self-corrects its course by
driving to the left and around the rocks and then back on
to the lane. Events 2, 4, 5, 6 and 7 have a similar structure
where the AV avoids any traffic incidents by identifying and
correcting its path.
Unlike the aforementioned events, events 3 and 8 feature
automation failure. In event 3, the participant encounters a
construction zone in which the lane markers are missing
and a section of the road is closed off for construction with
pylons. The AV begins braking at the last possible moment
to avoid crashing into the pylons and automation disengages
after the vehicle comes to a stop. Participants then have to
manually drive for ∼30 seconds to a different location to
engage automation, marking the end of event 3. In event 8,
the AV encounters missing lane markers while driving along
the highway. When the highway lanes diverge due to the
presence of a highway island, the AV is unable to follow the
curve in the road and continues to drive straight due to the
absence of lane markers. As a result, without intervention
the AV crashes into trees on the highway island.
This test course design was chosen to enable comparisons
of the participants’ behavior and state during the T1 and T2
intervals across events, and of driver behavior and driver
state before and after event 3. This allows
us to compare participants who received failure training with
those who did not, and to assess the impact of experiencing a
near-failure of the AV on driver behavior and state.
D. Other Events
It is important to note that, in between the eight critical
events, the AV navigates several routine situations such as
highway exits and merges, lane changes, stop signs and traffic
lights with no errors in driving. We chose this setup because
the future paradigm of AVs is one where the safety driver
supervises the automation for long periods of successful driving,
with the automation encountering disengagements or failures that
are further and further apart⁵. We built common traffic events
that the AV should navigate easily so that the narrative of
the study was in keeping with the emerging paradigm in
automated driving.
V. RESULTS
A. Preparedness for Critical Event
Each participant’s video was coded for preparedness
for failure by identifying whether the participant prepared
for a possible failure of automation as instructed in
the training phase of the study. In other words, did participants
hold the steering wheel and prepare themselves for a possible
automation disengagement or failure of automation? The coding
covers the interval T1 + T2 in the study design figure
(Figure 3), i.e., the time interval from the appearance of the
road sign to the appearance of the obstacle on the road.
⁵ https://www.forbes.com/sites/alanohnsman/2019/02/13/waymo-tops-self-driving-car-disengagement-stats-as-gm-cruise-gains-and-tesla-is-awol/
The fraction of participants in each study condition who
displayed preparedness for a possible failure of automation
in the interval before an event is shown in Figure 6.
Three interesting results can be observed and are noted
as 1a, 1b and 1c in Figure 6. First, in 1a, we observe
that a greater fraction of participants who received failure
training exhibit increased preparedness ahead of latent critical
events across all events. Next, in 1b, we also observe an
increase in preparedness after event 3, where the automation
experiences a disengagement through a silent failure. Lastly,
in 1c, we observe a marked downward trend in the fraction
of participants who showed preparedness in both study
conditions. This downward trend is temporarily interrupted by
the third critical event. However, the downward trend appears
once again after event 4.
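The per-event fractions plotted in Figure 6 follow from the video codes by simple counting. The sketch below assumes each code is a `(group, event, prepared)` tuple; that record layout, the function name `prepared_fraction`, and the example records are hypothetical and do not reproduce the study data.

```python
from collections import defaultdict


def prepared_fraction(codes):
    """Return {(group, event): fraction of participants coded as prepared}.

    `codes` is an iterable of (group, event, prepared) tuples, one per
    participant and event (an assumed layout, not the authors' format).
    """
    counts = defaultdict(lambda: [0, 0])  # (group, event) -> [prepared, total]
    for group, event, prepared in codes:
        counts[(group, event)][0] += int(prepared)
        counts[(group, event)][1] += 1
    return {key: prepared / total for key, (prepared, total) in counts.items()}


# Made-up records, for illustration only.
example_codes = [
    ("FT", 1, True), ("FT", 1, False),
    ("Control", 1, False), ("Control", 1, False),
    ("FT", 2, True),
]
fractions = prepared_fraction(example_codes)
```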
B. Sleepy/Drowsy Behavior Before a Critical Event
In the T1 + T2 interval for all events, we coded the
videos for sleepy or drowsy behavior. If participants exhibited
prolonged (> 5 sec) eye closure, yawning or other
sleep-related behavior (e.g., head nodding), they were classified
as displaying sleepy behavior. The fraction of participants who
exhibit sleepy behavior in the T1 + T2 interval for each event
in both conditions is shown in the lower part of Figure 7.
Here we observe two interesting results. First, the fraction
of participants who display drowsy behavior ahead of an event
is not significantly different between the two study conditions.
However, both conditions show a marked increase in drowsy
behavior towards event 8. Approximately a third of
participants in both cases display sleepy or drowsy behavior
before the last critical event. This is indicated as result 2a
in Figure 7. Second, the observed incidences of sleepy behavior
decrease immediately following event 3 in both conditions, as
denoted by result 2b in Figure 7.
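The coding rule for sleepy behavior can be stated as a short predicate. This is a sketch of the rule as described in the text, with hypothetical argument names; how the human coders recorded eye-closure durations is not specified in the source.

```python
# Threshold stated in the text: eye closure longer than 5 seconds.
EYE_CLOSURE_THRESHOLD_S = 5.0


def shows_sleepy_behavior(eye_closure_durations_s, yawned, other_sleep_behavior):
    """Apply the coding rule to one participant's T1 + T2 window.

    eye_closure_durations_s: durations (seconds) of observed eye closures
    yawned: True if a yawn was observed
    other_sleep_behavior: True for other sleep-related behavior, e.g. head nodding
    """
    prolonged_closure = any(
        duration > EYE_CLOSURE_THRESHOLD_S for duration in eye_closure_durations_s
    )
    return prolonged_closure or yawned or other_sleep_behavior
```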
C. Physiological Data
We re-sampled the GSR and ECG data to 512 Hz to
compensate for the loss of data over the Bluetooth connection
for the Shimmer device, and to ensure that the recording rate
of 512 Hz is retained. Once the data were re-sampled, the
Python BioSPPy package [25] was employed to extract the
GSR peaks and amplitude information and the heart rate from
the ECG data.
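One way to restore a uniform 512 Hz grid after dropped samples is linear interpolation over the recorded timestamps. The source does not state which re-sampling method was used, so the sketch below is a stand-in under that assumption; the function name `resample_uniform` is hypothetical, and timestamps are assumed to be strictly increasing.

```python
from bisect import bisect_right


def resample_uniform(times_s, values, rate_hz=512.0):
    """Linearly interpolate (times_s, values) onto a uniform grid at rate_hz.

    Assumes times_s is strictly increasing. Gaps left by dropped samples
    are filled by interpolating between their neighbours.
    """
    if len(times_s) != len(values) or len(times_s) < 2:
        raise ValueError("need two or more samples with matching timestamps")
    step = 1.0 / rate_hz
    n_out = int((times_s[-1] - times_s[0]) * rate_hz) + 1
    out_times, out_values = [], []
    for k in range(n_out):
        t = times_s[0] + k * step
        # Index of the first recorded sample strictly after t (clamped).
        j = min(bisect_right(times_s, t), len(times_s) - 1)
        i = j - 1
        weight = (t - times_s[i]) / (times_s[j] - times_s[i])
        out_times.append(t)
        out_values.append(values[i] + weight * (values[j] - values[i]))
    return out_times, out_values


# Illustrative input: the sample at 2/512 s was lost in transmission.
t_in = [0.0, 1 / 512, 3 / 512, 4 / 512]
v_in = [0.0, 1.0, 3.0, 4.0]
t_out, v_out = resample_uniform(t_in, v_in)
```

Linear interpolation is only reasonable for short gaps; longer dropouts would more plausibly be excluded, as the authors do for recordings with excessive artefacts.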
Baseline for the physiological data was set to the interval
after the practice segment, before event 1. In this baseline
interval (shown in Figure 3), participants monitored the
automation while the AV drove on a straight road with no
traffic or critical events. For the GSR data, the number of
peaks and the amplitude of the peaks were extracted for
the (T1 + T2) interval. To analyze the increase in driver
arousal and vigilance ahead of the event, the number of peaks
over 0.5 µSiemens and the increase in heart rate from the
baseline value were calculated for the (T1 + T2) interval.
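The two per-event measures can be sketched as follows, starting from peaks and heart-rate samples that have already been extracted. Two points are assumptions rather than details from the source: the 0.5 µSiemens threshold is applied to each peak's amplitude, and the heart-rate change is taken as the mean over the (T1 + T2) window minus the mean over the baseline window. The function names are hypothetical.

```python
from statistics import mean

# Threshold stated in the text, in microsiemens.
GSR_PEAK_THRESHOLD_US = 0.5


def count_gsr_peaks(peak_times_s, peak_amplitudes_us, window):
    """Count GSR peaks inside `window` whose amplitude exceeds the threshold."""
    start, end = window
    return sum(
        1
        for t, amplitude in zip(peak_times_s, peak_amplitudes_us)
        if start <= t <= end and amplitude > GSR_PEAK_THRESHOLD_US
    )


def heart_rate_change(hr_times_s, hr_bpm, window, baseline_window):
    """Mean heart rate in `window` minus mean heart rate in `baseline_window`."""
    def window_mean(bounds):
        start, end = bounds
        samples = [hr for t, hr in zip(hr_times_s, hr_bpm) if start <= t <= end]
        if not samples:
            raise ValueError("no heart-rate samples inside the window")
        return mean(samples)

    return window_mean(window) - window_mean(baseline_window)
```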
Some participants’ physiological data had to be excluded
from further analysis due to excessive motion artefacts in the
data, resulting in N = 11 (failure training, or FT, group) and
N = 13 (Control group) participants for the heart rate measure,
and N = 13 (FT group) and N = 11 (Control group) participants
for the GSR measure. For each group, we further calculated the
average change in heart rate (compared to baseline), as well as
the average number of GSR peaks before the event, across all
events [26]. As data were not normally distributed, we present
boxplot diagrams (see Figure 8).
Non-parametric Mann-Whitney U tests between the groups
reveal a statistically significant higher average number of
GSR peaks before the event for the failure training group
compared to the control group (U = 3.000, z = 3.046, p =
.001), with a large effect size of r = .762. The tests show no
differences in heart rate.
In summary, results 1a, 1b, and 1c show that participants
who received failure training display an increased level of
driver preparedness in the event of a potential failure of
automation. These findings are bolstered by the results of the
GSR analysis. Participants who receive failure training show
an increase in the GSR response before the critical events.
These findings demonstrate that safety drivers benefit from
receiving training about the failure modes of the AV.
Both driver groups, with and without failure training,
show a decrease in preparedness indicating that the effect
of training on driver alertness and vigilance fades over time.
This downward trend is interrupted by the disengagements
of automation or potentially hazardous events, but the trend
resumes as drivers continue to monitor the automation after
the event. On one hand, these results imply that training and
knowledge of the AV’s capabilities and limits better safety
driver awareness and preparedness. However, on the other
hand, the results suggest that this preparedness decreases with
time as they monitor the AV navigating other (non-critical)
traffic events successfully.
Results 2a and 2b show that a prolonged duration for
monitoring the automated driving system results in increased
sleepy or drowsy behavior. This result is in keeping with past
research which shows that the cortical activity of drivers is the
lowest when monitoring automation and this sustained interval
of low cognitive activity often leads to sleepy or drowsy
behavior [7]. Similar to driver preparedness, the increasing
trend in sleepy or drowsy behavior is interrupted by a critical
event, but resumes a short while after this critical event.
These results provide some much needed insight into the
behavior of safety drivers and their training. It is important
to understand how we can effectively train safety drivers.
To this end, these results suggest that training and limited
time spans performing supervision are necessary to safety
driver performance. The newly released AVSC safety driver
training recommendations [3] advocate periodic safety driver
evaluation and retraining. These recommendations are in
keeping with our findings. Moreover, these recommendations
also suggest 5 to 20 minute breaks after 2 hours of AV testing
and operation. These findings, too, are aligned with our own
results and observations.
[Plot: fraction of participants (0 to 1) showing preparedness for the critical event at Events 1 to 8, for the FT group and the Control group; annotations mark Results 1a, 1b and 1c and the near-failure.]
Fig. 6. Preparedness of participants across all events
[Plot: fraction of participants (0 to 1) showing sleepy/drowsy behavior prior to the event at Events 1 to 8, for the FT group and the Control group; annotations mark Results 2a and 2b and the near-failure.]
Fig. 7. Drowsy behavior of participants across all events
[Boxplots. Left: average change in heart rate compared to baseline, FT group (N = 11) vs. Control group (N = 13), p = .820. Right: average number of GSR peaks before event, FT group (N = 13) vs. Control group (N = 11), p = .001**.]
Fig. 8. (Left) Average change in heart rate in the (T1 + T2) interval and
(Right) Average number of peaks greater than 0.5 µSiemens in the (T1 + T2) interval.
D. Automation Take-Overs
In the T1 + T2 interval for all events, we coded for
take-overs of automation initiated by participants using the
three modalities introduced in the training phase. Figure 9
shows the fraction of participants in each condition who
disengage automation in the T1 + T2 interval ahead of each
event.
No significant differences in take-over behavior were
observed between the conditions across the events, save
for event 6. In event 6, some participants in the control
group disengaged the automation ahead of the obstacle
without closely monitoring the automated driving system.
This deviation in the trend occurred because event 3 and event
6 share the same road sign (a construction zone sign) to
indicate that there might be a potentially dangerous road
condition ahead. Some participants in the control condition
misconstrued the reason for the AV failure as the construction
zone rather than the absence of lane markers and prematurely
disengaged automation. This alludes to a potentially dangerous
learning trend: safety drivers who are not accurately trained
on the AV’s failure modes may incorrectly learn and interpret
its behavior during operation.
[Figure 9 plot. Title: Take-over Before Event; x-axis: Events 1 to 7; y-axis: fraction of participants (0 to 1); series: FT group, Control group; annotations: Result 3, Near-failure.]
Fig. 9. Participants who initiate a take-over before a Critical Event
E. Choice of Failure Mode
In this study, the absence of lane markers was chosen as
the failure mode for the AV. The experimenters constructed an
overarching narrative where the simulated automated driving
software would disengage or fail when it was no longer able
to detect lane markers. This failure mode was chosen as a way
to test the impact of failure training on the participants who
underwent safety driver training. While AVs currently being
tested and deployed might employ localization algorithms
that rule out failures like these, the findings of this study
would still be applicable to any other failure mode on which
safety drivers may be instructed.
F. Time Interval Between Phases of the Panel Study
The length of longitudinal studies of driver behavior
varies widely. Past studies use time intervals between phases
that range from a few days to a few years [27], [28]. In this
panel study, we chose an interval of 20-30 days between the
two phases of the study for pragmatic reasons. This
design decision was made to push the boundaries on the
learning and retention of the information and training given to
the participants in the training phase without compromising
the researchers’ ability to conduct the study effectively. It
must be noted that while this study analyzes the impact of
the failure training on safety drivers, further research on
the effects of the time interval between training and testing
may still be needed. Testing different time intervals between
training and testing phases may also cast much-needed light
on the effectiveness of the theoretical and practical aspects
of the training and aid in the development of more effective
training programs for AV safety drivers.
G. Participant Mental Models
Using the test provided after the theoretical training section,
the authors ensured that participants were familiar with the
theoretical failure training material. However, the study does
not investigate in depth the mental models that participants
develop as a result of the failure training. Drivers will tend
to view the automated driving system in this study and its
limitations through the lens of their past experiences. This
in turn could impact their trust in the automated system[].
A detailed analysis of the participant mental models may
be required in future investigations of the impact of failure
training on AV operator behavior.
VI. CONCLUSION
This study investigated the impact of failure training on
safety driver preparedness and behavior (RQ1) and the
effect of experiencing a near failure of the automation during a
critical event on safety driver behavior (RQ2), using a
2-phase panel study design with theoretical and practical
training. The results show that theoretical and practical
failure training has a positive impact on safety driver
preparedness; however, there is an overall trend indicating a
loss in driver preparedness and a loss in vigilance, signalled
by an increasing incidence of sleepy/drowsy behavior over time.
While experiencing critical events in which the automation
may fail increases driver preparedness and alleviates sleepy
behavior, the effect is only transient. In other words, the
longer the AV drives successfully, the less prepared the safety
drivers are for a failure of the automated driving system.
VII. ACKNOWLEDGMENTS
This research was conducted under Stanford IRB Protocol
30016, with support from Robert Bosch LLC.
REFERENCES
[1]
C. Wu, A. M. Bayen, and A. Mehta, “Stabilizing traffic with
autonomous vehicles,” in 2018 IEEE International Conference on
Robotics and Automation (ICRA). IEEE, 2018, pp. 1–7.
[2]
Taxonomy and Definitions for Terms Related to Driving Automation
Systems for On-Road Motor Vehicles, Jun. 2018. [Online]. Available:
https://doi.org/10.4271/J3016_201806
[3]
AVSC Best Practice for In-Vehicle Fallback Test Driver Selection,
Training, and Oversight Procedures for Automated Vehicles Under Test,
Nov. 2019.
[4]
C. M. Rudin-Brown and H. A. Parker, “Behavioural adaptation to
adaptive cruise control (acc): implications for preventive strategies,”
Transportation Research Part F: Traffic Psychology and Behaviour,
vol. 7, no. 2, pp. 59–76, 2004.
[5]
J. C. de Winter, N. A. Stanton, J. S. Price, and H. Mistry, “The effects
of driving with different levels of unreliable automation on self-reported
workload and secondary task performance,” International journal of
vehicle design, vol. 70, no. 4, pp. 297–324, 2016.
[6]
D. Miller, A. Sun, M. Johns, H. Ive, D. Sirkin, S. Aich, and W. Ju,
“Distraction becomes engagement in automated driving,” in Proceedings
of the Human Factors and Ergonomics Society Annual Meeting, vol. 59,
no. 1. SAGE Publications Sage CA: Los Angeles, CA, 2015.
[7]
S. Sibi, H. Ayaz, D. P. Kuhns, D. M. Sirkin, and W. Ju, “Monitoring
driver cognitive load using functional near infrared spectroscopy
in partially autonomous cars,” in 2016 IEEE Intelligent Vehicles
Symposium (IV). IEEE, 2016, pp. 419–425.
[8]
N. Strand, J. Nilsson, I. M. Karlsson, and L. Nilsson, “Semi-automated
versus highly automated driving in critical situations caused by
automation failures,” Transportation research part F: traffic psychology
and behaviour, vol. 27, pp. 218–228, 2014.
[9]
E. E. Miller and L. N. Boyle, “Behavioral adaptations to lane keeping
systems: Effects of exposure and withdrawal,” Human factors, vol. 61,
no. 1, pp. 152–164, 2019.
[10]
E. E. Miller, “Behavioral adaptations of drivers to autonomous systems:
Evaluating intermediate and carryover effects,” Ph.D. dissertation, 2018.
[11]
A. H. Jamson, N. Merat, O. M. Carsten, and F. C. Lai, “Behavioural
changes in drivers experiencing highly-automated vehicle control in
varying traffic conditions,” Transportation research part C: emerging
technologies, vol. 30, pp. 116–125, 2013.
[12]
O. Carsten, F. C. Lai, Y. Barnard, A. H. Jamson, and N. Merat, “Control
task substitution in semiautomated driving: Does it matter what aspects
are automated?” Human factors, vol. 54, no. 5, pp. 747–761, 2012.
[13]
J. C. De Winter, R. Happee, M. H. Martens, and N. A. Stanton,
“Effects of adaptive cruise control and highly automated driving on
workload and situation awareness: A review of the empirical evidence,”
Transportation research part F: traffic psychology and behaviour,
vol. 27, pp. 196–217, 2014.
[14]
F. Naujoks, S. Höfling, C. Purucker, and K. Zeeb, “From partial and
high automation to manual driving: relationship between non-driving
related tasks, drowsiness and take-over performance,” Accident Analysis
& Prevention, vol. 121, pp. 28–42, 2018.
[15]
J. S. Warm, W. N. Dember, and P. A. Hancock, “Vigilance and workload
in automated systems,” Automation and human performance: Theory
and applications, pp. 183–200, 1996.
[16]
E. T. Greenlee, P. R. DeLucia, and D. C. Newton, “Driver vigilance
in automated vehicles: Hazard detection failures are a matter of time,”
Human factors, vol. 60, no. 4, pp. 465–476, 2018.
[17]
R. C. Peck, “Do driver training programs reduce crashes and traffic
violations? A critical examination of the literature,” IATSS research,
vol. 34, no. 2, pp. 63–71, 2011.
[18]
P. Koopman and B. Osyk, “Safety argument considerations for public
road testing of autonomous vehicles,” SAE Technical Paper, Tech. Rep.,
2019.
[19]
D. NHTSA, “Automated driving systems 2.0: A vision for safety,”
2017.
[20]
J. Gonçalves and K. Bengler, “Driver state monitoring systems–
transferable knowledge manual driving to HAD,” Procedia Manufacturing,
vol. 3, pp. 3011–3016, 2015.
[21]
B. Kisacanin, “Method of detecting vehicle-operator state,” Sept. 9
2008, US Patent 7,423,540.
[22]
F. Friedrichs, M. Miksch, and B. Yang, “Estimation of lane data-based
features by odometric vehicle data for driver state monitoring,” in 13th
International IEEE Conference on Intelligent Transportation Systems.
IEEE, 2010, pp. 611–616.
[23]
J. A. Healey and R. W. Picard, “Detecting stress during real-world
driving tasks using physiological sensors,” IEEE Transactions on
intelligent transportation systems, vol. 6, no. 2, pp. 156–166, 2005.
[24]
S. Begum, “Intelligent driver monitoring systems based on physiological
sensor signals: A review,” in 16th International IEEE Conference on
Intelligent Transportation Systems (ITSC 2013). IEEE, 2013, pp.
282–289.
[25]
C. Carreiras, A. Alves, A. Lourenço, F. Canento, H. Silva, A. Fred, et al.,
“Biosppy: Biosignal processing in python (2015–),” URL
https://github.com/PIA-Group/BioSPPy, 2018.
[26]
J. A. Healey, “Wearable and automotive systems for affect recogni-
tion from physiology,” Ph.D. dissertation, Massachusetts Institute of
Technology, 2000.
[27]
J. G. Hull, A. M. Draghici, and J. D. Sargent, “A longitudinal study
of risk-glorifying video games and reckless driving.” Psychology of
Popular Media Culture, vol. 1, no. 4, p. 244, 2012.
[28]
E. E. Miller and L. N. Boyle, “Driver adaptation to lane keeping
assistance systems: Do drivers become less vigilant?” in Proceedings
of the Human Factors and Ergonomics Society Annual Meeting, vol. 61,
no. 1. SAGE Publications Sage CA: Los Angeles, CA, 2017.