Note: This is the initial manuscript version (before review) submitted for publication. The final published
version can be found here:
Adaptive Auditory Alerts for Smart In-Vehicle Interfaces
Edin Šabić, Daniel Henning, Justin MacDonald
New Mexico State University
Abstract
Missing a message from an in-vehicle device can range in severity from annoying at best
to dangerous at worst. The in-cab auditory environment can vary sporadically, making
some volume levels too loud while rendering others too quiet. It is in the best interest of
both the driver and the system designer to make sure that users are able to adequately,
and comfortably, hear alerts and that they do not have to alter their gaze or attention
during a visually and attentionally demanding task such as driving. To this end, we
propose a system for dynamically tracking the background noise intensity level
immediately prior to alert presentation in order to present an alert at an appropriate
loudness. Furthermore, we evaluate the proposed
system across both behavioral (accuracy and
reaction time) and subjective (questionnaire)
measures.
Behavioral results showed that while the proposed system increased recognition in one
noise condition (background music), it also led to slower responses in two other noise
conditions (windows-down and windows-up noise).
The increased popularity of in-vehicle interfaces
brings with it the promise of a more enjoyable, and
in some cases safer, user experience. The purpose of
these interfaces can range from media and
eco-driving feedback to navigation and safety
systems such as collision avoidance, to name only a
few. Furthermore, these interfaces can take a
unimodal approach, such as tactile-only systems, or
a multimodal approach, such as audio-visual
systems. Unfortunately, in-vehicle technologies
can decrease the driver’s attention to their primary
task, driving, and so it is crucial to fine-tune these
interfaces for a driving context. Certainly every
approach has its share of advantages and
disadvantages, but implementing auditory interfaces
into vehicles has a crucial advantage of not further
taxing visual resources that are critical to driving.
While there are multiple ways to communicate
an auditory alert, including auditory icons (Gaver,
1986), earcons (Blattner, Sumikawa, & Greenberg,
1989), spearcons (Walker, Nance, & Lindsay, 2006;
Walker et al., 2013), and soundscapes (Kilander &
Lönnqvist, 2002), text-to-speech (TTS) is one of the
most popular approaches and has widespread
prevalence in infotainment systems. For instance, a
navigational system that produces TTS allows for
the communication of street names or locations.
TTS does, however, have the drawback that the
listener must be able to understand the given
language, something that is not a problem for a
language-free auditory cue, such as auditory icons.
Of course, the major advantage of augmenting
in-vehicle interfaces with auditory cues, rather than
visual cues, is that drivers do not need to alter their
gaze in order to receive the message. Furthermore,
unlike tactile alerts, auditory alerts can also take
advantage of language to communicate messages of
all kinds. Much research has supported the use of
auditory cues within vehicles. Auditory interfaces
have been shown to decrease off-road visual
attention compared to visual-only interfaces
(Tardieu, Misdariis, Langlois, Gaillard, &
Lemercier, 2015). Further research has supported
that auditory cues can improve in-vehicle menu
navigation performance (Jeon, Davison, Nees,
Wilson, & Walker, 2009; Jeon et al., 2015), while
other research has found that sound cues were
associated with increased physical comfort and
shorter response times when compared to tactile
cues (Cao, van der Sluis, Theune, op den Akker, &
Nijholt, 2010). Other research still has shown that
some individuals have a preference for in-vehicle
auditory interfaces versus visual interfaces in terms
of overall satisfaction and driving performance
(Sodnik, Dicke, Tomažič, & Billinghurst, 2008).
It is important to note, however, that the
efficacy of auditory interfaces is heavily impacted
by background noise. Unsurprisingly, auditory
alerts that go unheard can have severe
consequences. Some research has even supported
that noise can change the perceived urgency of
auditory alerts (Lerner, Singer, Kellman, & Traube,
2015), while other research has shown that
individuals may respond to alerts more slowly as a
result of increased background noise (Murata,
Kuroda, & Kanbayashi, 2014). Many system
designers have responded by making auditory alerts
or warnings louder, although this can,
unsurprisingly, increase the perceived annoyance of
the auditory alerts (Baldwin, 2011). It is paramount
to maintain the intelligibility of auditory cues and
alerts, but it is also important to balance this
with aesthetic considerations as annoying systems
can be ignored or even turned off by the user.
Striking a balance between aesthetics and
pragmatics is challenging, but, if achieved, these
systems can increase user satisfaction while also
leading to safer interaction. To this end, a system
that dynamically presents alerts at a loudness
relative to background noise was designed and
evaluated in an experiment. Importantly, we
evaluated both performance in detecting and
responding to the alerts and the subjective
experience of the user while interacting with the
system. Taken together, these methods allow for an
overall understanding of both the efficacy and the
usability of the system.
The present study compared the efficacy and
usability of a dynamic alert presentation system,
which adjusted the loudness of alerts as a function
of background noise (code and an explanation of
each line can be found here:
Sabic/Adaptive_Alerts), to an alert presentation
system, henceforth referred to as the default system,
that presented the alerts at a constant loudness. The
type of background noise was also manipulated to
assess how performance might differ as a result of
varying noise type. Outside of behavioral measures,
subjective measures assessed how participants
perceived the system. We predicted that participants
would perform better with the dynamic interface
than with the default interface. We also
hypothesized that using the dynamic interface might
be perceived as annoying, as the alerts are much
louder than those of the default interface.
Method
Participants
A total of 40 undergraduate students (x
females) at New Mexico State University
participated in the study for class credit. Participants
were on average y years old. All participants
reported normal hearing.
Apparatus
The experiment was conducted on a custom-
built PC installed with Windows 10 OS. An Acer
touch-screen monitor displayed the interface and
allowed for touch-screen interaction. The entire
experiment, barring the Qualtrics questionnaire, was
programmed in MATLAB. A sound insulated room
with a 30-speaker array was utilized to present all
auditory stimuli. A Shure PGA48 microphone
resided within the speaker array next to the
participant. Computer audio volume was kept
constant at 50% of full volume.
Auditory Stimuli
All auditory alert stimuli were created by
making recordings of an Amazon 2nd Generation
Echo Dot, which generated the necessary text-to-
speech through the “Simon says” command. The
default Alexa voice was used, and input was taken
from a Shure PGA48 microphone and forwarded
into Audacity for clipping. Background noise was created by
driving in the Las Cruces area on a windy day while
recording the in-cab auditory environment. To
create the three different noise conditions, we
captured recordings of driving while the windows
were rolled all the way up with no radio playing,
while the windows were rolled all the way up with
radio playing, and while the windows were rolled all
the way down with no radio. Audio was captured using
internal microphones within a Neumann KU 100
dummy head connected to a SONUS amp (get
model). The amp was connected to and powered by
a laptop in the backseat, where another researcher
sat and made recordings through Audacity.
Subjective Measures
To better understand the user’s perception of
the system, the Usability, Satisfaction, and Ease-of-
Use (USE) questionnaire and the system usability
scale (SUS) were administered at the end of the
experiment. The 10-item SUS offers a quick way to
assess the usability of a system, and has been
described by some as an industry standard (Brooke,
2013). The USE is a 30-item questionnaire created
by Lund (2001) that produces an understanding of
the user's perception of the system across four
categories. These include: usefulness, ease of use,
ease of learning, and satisfaction. Lastly, the
NASA-TLX (Hart & Staveland, 1988) was used to
evaluate the cognitive workload associated with
using each system.
Figure 1. The placement of the Neumann KU 100 during field
recording.
Procedure
Participants first completed consent and
demographic forms, and were then briefed on the
structure of the experiment. They were then
escorted to a sound insulated room and asked to sit
in front of a computer positioned within the speaker
array. Participants were shown the interface they
would be interacting with, and the meaning of each
icon was explained. They were informed that their
task was to listen for any of the eight possible alerts
and press the button corresponding to the alert if they
heard one. For instance, if the participant heard
“weather update” they should press the icon that
displayed a sun behind stormy clouds (see Figure
2). Alerts were chosen based on some common
functions of infotainment systems, such as media
and information updates. The alerts included:
“incoming call”, “weather update”, “connections
available”, “new mail”, “fuel update”, “traffic
update”, “new video”, and “new playlist”.
Participants were then randomized into either
the dynamic interface or default interface group.
The dynamic interface presented alerts 10 dB louder
than the background noise. The default interface did
not include the function that dynamically shifted
loudness, and instead produced all alerts at a
constant loudness throughout each block. The alert
loudness for this system was calculated to be 10 dB
quieter than the average loudness of the entire
background noise clip. Each participant performed
all three blocks presented in random order.
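The two presentation rules can be sketched in code. The experiment itself was implemented in MATLAB (see the Sabic/Adaptive_Alerts repository referenced above); the Python below is an illustrative re-sketch under our own assumptions — the function names, the full-scale dB reference, and the silence floor are not from the paper.

```python
import math

def rms_db(samples):
    """RMS level of a sample buffer, in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log(0) on silence

def dynamic_alert_db(recent_noise_samples, offset_db=10.0):
    """Dynamic interface: the alert level tracks the noise measured
    immediately before alert onset, presented 10 dB above it."""
    return rms_db(recent_noise_samples) + offset_db

def default_alert_db(mean_clip_level_db, offset_db=10.0):
    """Default interface: a constant level, 10 dB below the average
    level of the entire background noise clip."""
    return mean_clip_level_db - offset_db
```

For instance, a noise buffer whose RMS sits at -20 dBFS yields a dynamic alert at -10 dBFS, whereas a clip averaging -20 dBFS fixes the default alert at -30 dBFS for the whole block.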
Figure 2. The touch-interface used within the study.
The participant’s task was to identify the alert
that was presented within the array, and respond by
pressing the appropriate icon on the interface. They
were instructed to respond as quickly and accurately
as possible. Alert presentations were jittered from 5
to 10 seconds from the beginning of a trial, and so
an average of 7.5 seconds passed between alerts
being presented. This was done so the participant
could not predict when an alert would be presented.
Each alert was repeated 12 times during the entirety
of a noise block, resulting in 96 total trials within
each block. After a participant completed a block,
they were given the option to take a break or
continue with the remaining blocks. Once all blocks
were completed, participants were asked to
complete a Qualtrics survey. The survey included
the NASA-TLX, the SUS, the USE questionnaire,
and a question asking how annoying the participant
found the alerts. Upon completion of the survey,
participants were debriefed, awarded credit, and
thanked for their time.
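The trial structure above — eight alerts, each repeated 12 times per block, with onsets jittered uniformly between 5 and 10 s — can be sketched as follows. This Python sketch is illustrative only; the experiment was programmed in MATLAB, and the helper names are ours.

```python
import random

ALERTS = ["incoming call", "weather update", "connections available",
          "new mail", "fuel update", "traffic update", "new video",
          "new playlist"]

def build_block(repeats=12, rng=random):
    """Each of the 8 alerts appears `repeats` times, in shuffled order
    (8 x 12 = 96 trials per noise block)."""
    trials = ALERTS * repeats
    rng.shuffle(trials)
    return trials

def jittered_onset_s(rng=random):
    """Alert onset drawn uniformly from 5-10 s after trial start, so
    onsets average 7.5 s and cannot be anticipated."""
    return rng.uniform(5.0, 10.0)
```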
Results
For both reaction time and accuracy data, a
mixed analysis of variance (ANOVA) was
conducted with noise block serving as a within-
subjects factor and system type (dynamic or default)
serving as the between-subjects factor.
The main effect of noise type on RT was not
significant, F(2, 76) = 1.402, p > .05, ηp² = .04,
while the main effect of system type on RT was
significant, F(1, 38) = 4.257, p = .046, ηp² = .10.
Lastly, a significant interaction emerged between
system and noise type, F(2, 76) = 7.926, p = .001,
ηp² = .17. Within the windows-up and windows-
down noise blocks, mean response times were
shorter for participants using the default interface
(see Figure 3).
Figure 3. Violin plot of reaction time (ms), where the median is
depicted by a white dot, the interquartile range by the thick black
bar, and confidence intervals by the thin lines extending from the
center.
The main effect of noise type on accuracy was
significant, F(2, 76) = 20.125, p < .001, ηp² = .35,
and the main effect of system type on accuracy was
also significant, F(1, 38) = 22.090, p < .001, ηp² =
.37. Further, a significant interaction emerged
between system and noise type, F(2, 76) = 23.404, p
< .001, ηp² = .38. Pairwise comparisons (Sidak)
comparing all noise levels showed that only the
music noise block was significantly different from
the other noise blocks (see Figure 4), ps < .001.
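As a consistency check on the statistics reported above, partial eta squared can be recovered directly from an F value and its degrees of freedom via ηp² = F·df_effect / (F·df_effect + df_error):

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# The reported effects reproduce to two decimals:
# F(1, 38) = 4.257  -> ~.10 (system type, RT)
# F(2, 76) = 7.926  -> ~.17 (interaction, RT)
# F(2, 76) = 20.125 -> ~.35 (noise type, accuracy)
# F(2, 76) = 23.404 -> ~.38 (interaction, accuracy)
```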
Turning to subjective measures, alerts were
rated as more annoying by participants in the
dynamic condition (M = 48.79, SD = 29.08) than by
participants in the default condition (M = 38.75, SD
= 32.00). However, a t-test indicated that this
difference was not significant, p > .05.
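The reported group statistics are consistent with a non-significant result. Assuming n = 20 per group (40 participants split evenly between the two between-subjects conditions, consistent with the error df of 38 above) and a Welch-style statistic — the paper does not specify which t-test was used — the observed t falls well below the two-tailed .05 cutoff of roughly 2.02:

```python
import math

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic for two independent groups (unequal variances)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / standard_error

# Annoyance ratings: dynamic (M = 48.79, SD = 29.08) vs.
# default (M = 38.75, SD = 32.00), assumed n = 20 per group.
t = welch_t(48.79, 29.08, 20, 38.75, 32.00, 20)  # ~1.04
```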
Figure 4. Accuracy results across noise and system type. Error
bars indicate 95% confidence intervals.
To analyze NASA-TLX data, one-way between-
subjects ANOVAs were conducted on each subscale
except for the physical demand subscale, as there
was no a priori reason to expect a difference across
the two groups. No significant differences were found
across any of the five subscales, ps > .05. No formal
analyses were conducted on the USE questionnaire,
but mean scores were quite similar across groups
(see Figure 5). Scores on the SUS were relatively
similar across the two groups, with participants in
the dynamic condition rating the system’s usability
as slightly lower (SUS score = 75) than participants
interacting with the default interface (SUS score = 80).
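For reference, SUS scores such as the 75 and 80 reported here follow Brooke's standard scoring rule: each of the 10 items is rated 1-5, odd-numbered (positively worded) items contribute (rating − 1), even-numbered (negatively worded) items contribute (5 − rating), and the summed contributions are scaled by 2.5 onto a 0-100 range. A minimal sketch:

```python
def sus_score(ratings):
    """Score a 10-item SUS response (each rating 1-5) on a 0-100 scale."""
    assert len(ratings) == 10
    total = 0
    for item, rating in enumerate(ratings, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5
```

A respondent who strongly agrees with every positive item and strongly disagrees with every negative item scores 100; all-neutral responses score 50.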
Figure 5. Scores for both groups across the four USE
categories.
Discussion
The ability to hear auditory alerts or updates
from an infotainment system is clearly dependent
on the sound characteristics of both the alert and the
background noise. There are, of course, many such
characteristics that can affect this intelligibility, but
one of the most important is signal loudness relative
to noise. The present research evaluated a system
which was programmed to take this background
noise intensity into account, and adjust the
presentation of an alert accordingly. While the
results are admittedly mixed, the experiment takes
an initial step into evaluating such a system.
Behavioral results showed that while
participants accurately identified more alerts in one
of the noise conditions (music) when interacting
with the dynamic interface, they nevertheless
responded more slowly in the other noise conditions
relative to participants interacting with the default
interface. It is unclear why participants would
respond more slowly with one interface than the other,
and future research is needed to replicate the results
of the current experiment. Subjective measures
supported that both interfaces were perceived
relatively similarly, but participants did rate the
dynamic interface as more annoying than the
default interface, although this difference was
not significant. This is most likely due to the louder
alerts inherent to the dynamic interface.
A system that dynamically adjusts auditory
alerts relative to background noise has clear
applications not just in automotive contexts, but in
any environment where background noise can
interfere with alert recognition. The present
research evaluated such a system, but future
research is needed to fine-tune these systems to
enhance their usability. This endeavor will increase
the likelihood that users can hear alerts, which has
the benefit of both an improved user experience
and, in some cases, increased safety.
References
Baldwin, C. L. (2011). Verbal collision avoidance messages
during simulated driving: perceived urgency, alerting
effectiveness and annoyance. Ergonomics, 54(4).
Blattner, M. M., Sumikawa, D. A., & Greenberg, R. M.
(1989). Earcons and icons: Their structure and
common design principles. HumanComputer
Interaction, 4(1), 11-44.
Brooke, J. (2013). SUS: a retrospective. Journal of usability
studies, 8(2), 29-40.
Cao, Y., van der Sluis, F., Theune, M., op den Akker, R., &
Nijholt, A. (2010,
November). Evaluating informative auditory and
tactile cues for in-vehicle information systems.
In Proceedings of the 2nd International Conference
on Automotive User Interfaces and Interactive
Vehicular Applications (pp. 102-109). ACM.
Gaver, W. W. (1986). Auditory icons: Using sound in
computer interfaces. Human-computer
interaction, 2(2), 167-177.
Hart, S. G., & Staveland, L. E. (1988). Development of
NASA-TLX (Task Load Index): Results of empirical
and theoretical research. In Advances in
psychology (Vol. 52, pp. 139-183). North-Holland.
Jeon, M., Davison, B. K., Nees, M. A., Wilson, J., & Walker,
B. N. (2009, September). Enhanced auditory menu
cues improve dual task performance and are preferred
with in-vehicle technologies. In Proceedings of the
1st international conference on automotive user
interfaces and interactive vehicular applications (pp.
91-98). ACM.
Jeon, M., Gable, T. M., Davison, B. K., Nees, M. A., Wilson,
J., & Walker, B. N. (2015). Menu navigation with in-
vehicle technologies: Auditory menu cues improve
dual task performance, preference, and
workload. International Journal of Human-Computer
Interaction, 31(1), 1-16.
Kilander, F., & Lönnqvist, P. (2002). A whisper in the woods-
an ambient soundscape for peripheral awareness of
remote processes. Georgia Institute of Technology.
Lerner, N., Singer, J., Kellman, D., & Traube, E. (2015). In-
Vehicle Noise Alters the Perceived Meaning of
Auditory Signals. In 8th International Driving
Symposium on Human Factors in Driver Assessment,
Training, and Vehicle Design.
Lund, A. M. (2001). Measuring usability with the use
questionnaire12. Usability interface, 8(2), 3-6.
Murata, A., Kuroda, T., & Kanbayashi, M. (2014).
Effectiveness of Auditory and Vibrotactile Cuing for
Driver’s Enhanced Attention under Noisy
Environment. Advances in Ergonomics In Design,
Usability & Special Populations: Part II, 17, 155.
Sodnik, J., Dicke, C., Tomažič, S., & Billinghurst, M. (2008).
A user study of auditory versus visual interfaces for
use while driving. International journal of human-
computer studies, 66(5), 318-332.
Tardieu, J., Misdariis, N., Langlois, S., Gaillard, P., &
Lemercier, C. (2015). Sonification of in-vehicle
interface reduces gaze movements under dual-task
condition. Applied ergonomics, 50, 41-49.
Walker, B. N., Nance, A., & Lindsay, J. (2006). Spearcons:
Speech-based earcons improve navigation
performance in auditory menus. Georgia Institute of
Technology.
Walker, B. N., Lindsay, J., Nance, A., Nakano, Y., Palladino,
D. K., Dingler, T., & Jeon, M. (2013). Spearcons
(speech-based earcons) improve navigation
performance in advanced auditory menus. Human
Factors, 55(1), 157-182.