Note: This is the initial manuscript version (before review) submitted for publication. The final published
version can be found here: https://journals.sagepub.com/doi/abs/10.1177/1071181319631404
Adaptive Auditory Alerts for Smart In-Vehicle Interfaces
Edin Šabić, Daniel Henning, Justin MacDonald
New Mexico State University
Missing a message from an in-vehicle device can range in severity from annoying at best
to dangerous at worst. The in-cab auditory environment can vary sporadically, making
some volume levels too loud while rendering others too quiet. It is in the best interest of
both the driver and the system designer to make sure that users are able to adequately,
and comfortably, hear alerts and that they do not have to alter their gaze or attention
during a visually and attentionally demanding task such as driving. To this end, we
propose a system for dynamically tracking the background noise intensity level
immediately prior to alert presentation in order to present an alert at an appropriate
loudness. Furthermore, we evaluate the proposed system across both behavioral measures (accuracy and reaction time) and subjective measures (questionnaire results).
Behavioral results showed that while the proposed system increased recognition in one
noise condition (background music), it also led to slower responses in two other noise
conditions (windows-down and windows-up noise).
INTRODUCTION
The increased popularity of in-vehicle interfaces
brings with it the promise of a more enjoyable, and
in some cases safer, user experience. These interfaces serve a range of purposes, from media and eco-driving feedback to safety systems, such as collision avoidance, and navigation, to name only a few. Furthermore, these interfaces can take a unimodal approach, such as tactile-only systems, or a multimodal approach, such as audio-visual systems. Unfortunately, in-vehicle technologies
can decrease the driver’s attention to their primary
task, driving, and so it is crucial to fine-tune these
interfaces for a driving context. Certainly every
approach has its share of advantages and
disadvantages, but implementing auditory interfaces
into vehicles has a crucial advantage of not further
taxing visual resources that are critical to driving.
While there are multiple ways to communicate
an auditory alert, including auditory icons (Gaver, 1986), earcons (Blattner, Sumikawa, & Greenberg, 1989), spearcons (Walker, Nance, & Lindsay, 2006; Walker, 2013), and soundscapes (Kilander & Lönnqvist, 2002), text-to-speech (TTS) is one of the most popular approaches and is widespread in infotainment systems. For instance, a
navigational system that produces TTS allows for
the communication of street names or locations.
TTS does, however, have the drawback that the
listener must be able to understand the given
language, something that is not a problem for language-free auditory cues, such as auditory icons.
Of course, the major advantage of augmenting
in-vehicle interfaces with auditory cues, rather than
visual cues, is that drivers do not need to alter their
gaze in order to receive the message. Furthermore,
unlike tactile alerts, auditory alerts can also take
advantage of language to communicate messages of
all kinds. Much research has supported the use of
auditory cues within vehicles. Auditory interfaces
have been shown to decrease off-road visual
attention compared to visual-only interfaces
(Tardieu, Misdariis, Langlois, Gaillard, &
Lemercier, 2015). Further research has supported
that auditory cues can improve in-vehicle menu
navigation performance (Jeon, Davison, Nees,
Wilson, & Walker, 2009; Jeon et al., 2015), while
other research has found that sound cues were
associated with increased physical comfort and
shorter response times when compared to tactile
cues (Cao, van der Sluis, Theune, op den Akker, &
Nijholt, 2010). Still other research has shown that
some individuals have a preference for in-vehicle
auditory interfaces versus visual interfaces in terms
of overall satisfaction and driving performance
(Sodnik, Dicke, Tomažič, & Billinghurst, 2008).
It is important to note, however, that the
efficacy of auditory interfaces is heavily impacted
by background noise. Unsurprisingly, auditory
alerts that go unheard can have severe
consequences. Some research has even supported
that noise can change the perceived urgency of
auditory alerts (Lerner, Singer, Kellman, & Traube,
2015), while other research has shown that
individuals may respond to alerts more slowly as a
result of increased background noise (Murata,
Kuroda, & Kanbayashi, 2014). Many system
designers have responded by making auditory alerts
or warnings louder, although this can,
unsurprisingly, increase the perceived annoyance of
the auditory alerts (Baldwin, 2011). It is paramount
to maintain the intelligibility of auditory cues and
alerts, but it is also important to try to balance this
with aesthetic considerations – as annoying systems
can be ignored or even turned off by the user.
Striking a balance between aesthetics and
pragmatics is challenging, but, if achieved, these
systems can increase user satisfaction while also
leading to safer interaction. To this end, a system
that dynamically presents alerts at a loudness
relative to background noise was designed and
evaluated in an experiment. Importantly, we
evaluated both performance in detecting and
responding to the alerts and the subjective
experience of the user while interacting with the
system. Taken together, these methods allow for an
overall understanding of both the efficacy and the
usability of the system.
EXPERIMENT
The present study compared the efficacy and
usability of a dynamic alert presentation system,
which adjusted the loudness of alerts as a function
of background noise (code and an explanation of each line can be found here: www.github.com/Edin-Sabic/Adaptive_Alerts), to an alert presentation
system, henceforth referred to as the default system,
that presented the alerts at a constant loudness. The
type of background noise was also manipulated to
assess how performance might differ as a result of
varying noise type. Outside of behavioral measures,
subjective measures assessed how participants
perceived the system. We predicted that participants
would perform better with the dynamic interface
than with the default interface. We also
hypothesized that using the dynamic interface might
be perceived as annoying, as the alerts are much
louder.
METHODS
Participants
A total of 40 undergraduate students (x
females) at New Mexico State University
participated in the study for class credit. Participants were, on average, y years old. All participants
reported normal hearing.
Apparatus
The experiment was conducted on a custom-
built PC installed with Windows 10 OS. An Acer
touch-screen monitor displayed the interface and
allowed for touch-screen interaction. The entire
experiment, barring the Qualtrics questionnaire, was
programmed in MATLAB. A sound-insulated room
with a 30-speaker array was utilized to present all
auditory stimuli. A Shure PGA48 microphone
resided within the speaker array next to the
participant. Computer audio volume was kept
constant at 50% of full volume.
Auditory Stimuli
All auditory alert stimuli were created by
making recordings of an Amazon 2nd Generation
Echo Dot, which generated the necessary text-to-
speech through the “Simon says” command. The
default Alexa voice was used, and input was taken
from a Shure PGA48 microphone and forwarded
into Audacity (www.audacityteam.org/download/)
for clipping. Background noise was created by
driving in the Las Cruces area on a windy day while
recording the in-cab auditory environment. To
create the three different noise conditions, we
captured recordings of driving while the windows
were rolled all the way up with no radio playing,
while the windows were rolled all the way up with
radio playing, and while the windows rolled all the
way down without radio. Audio was captured using
internal microphones within a Neumann KU 100
dummy head connected to a SONUS amp (get
model). The amp was connected to and powered by
a laptop in the backseat, where another researcher
sat and made recordings through Audacity.
Subjective Measures
To better understand the user’s perception of
the system, the Usability, Satisfaction, and Ease-of-
Use (USE) questionnaire and the System Usability Scale (SUS) were administered at the end of the
experiment. The 10-item SUS offers a quick way to
assess the usability of a system, and has been
described by some as an industry standard (Brooke,
2013). The USE is a 30-item questionnaire created
by Lund (2001) that produces an understanding of
the user’s perception of the system across four
categories. These include: usefulness, ease of use,
ease of learning, and satisfaction. Lastly, the
NASA-TLX (Hart & Staveland, 1988) was used to
evaluate the cognitive workload associated with
using each system.
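For reference, the SUS is scored following Brooke's standard procedure: each of the ten 1-5 Likert responses is converted to a 0-4 contribution (odd-numbered items contribute the response minus 1; even-numbered items contribute 5 minus the response), and the sum is multiplied by 2.5 to yield a 0-100 score. The following Python sketch illustrates this scoring; the function name is our own and this is not part of the study's MATLAB code.

```python
def sus_score(responses):
    """Compute the System Usability Scale score from ten 1-5 Likert
    responses using Brooke's standard scoring: odd-numbered items
    contribute (response - 1), even-numbered items contribute
    (5 - response); the sum is multiplied by 2.5 to give 0-100."""
    assert len(responses) == 10, "SUS requires exactly 10 responses"
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5
```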
Figure 1. The placement of the Neumann KU 100 during field
recordings.
Procedure
Participants first completed consent and
demographic forms, and were then briefed on the
structure of the experiment. They were then
escorted to a sound-insulated room and asked to sit
in front of a computer positioned within the speaker
array. Participants were shown the interface they
would be interacting with, and the meaning of each
icon was explained. They were informed that their
task was to listen for any of the eight possible alerts
and press the button corresponding to the alert if they
heard one. For instance, if the participant heard
“weather update” they should press the icon that
displayed a sun behind stormy clouds (see Figure
2). Alerts were chosen based on some common
functions of infotainment systems, such as media
and information updates. The alerts included:
“incoming call”, “weather update”, “connections
available”, “new mail”, “fuel update”, “traffic
update”, “new video”, and “new playlist”.
Participants were then randomized into either
the dynamic interface or default interface group.
The dynamic interface presented alerts 10 dB louder
than the background noise. The default interface did
not include the function that dynamically shifted
loudness, and instead produced all alerts at a
constant loudness throughout each block. The alert
loudness for this system was calculated to be 10 dB
quieter than the average loudness of the entire
background noise clip. Each participant performed
all three blocks presented in random order.
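The dynamic interface's loudness adjustment can be sketched as follows: measure the level of a window of background noise immediately preceding the alert, then scale the alert waveform so its level sits 10 dB above that measurement. The authors' implementation is in MATLAB (linked above); this Python sketch is our own illustrative reconstruction, and the function names, RMS-based level estimate, and windowing choice are assumptions rather than the published code.

```python
import numpy as np

def rms_db(x):
    """Root-mean-square level of a signal in dB (re: full scale)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def scale_alert(alert, noise_window, offset_db=10.0):
    """Scale an alert waveform so its RMS level sits offset_db above
    the RMS level of the background noise sampled just before onset."""
    target_db = rms_db(noise_window) + offset_db
    gain_db = target_db - rms_db(alert)
    return alert * 10 ** (gain_db / 20)
```

A default-style presentation would instead apply one fixed gain per block, computed once from the average level of the entire noise clip.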
Figure 2. The touch-interface used within the study.
The participant’s task was to identify the alert
that was presented within the array, and respond by
pressing the appropriate icon on the interface. They
were instructed to respond as quickly and accurately
as possible. Alert presentations were jittered from 5
to 10 seconds from the beginning of a trial, and so
an average of 7.5 seconds passed between alerts
being presented. This was done so the participant
could not predict when an alert would be presented.
Each alert was repeated 12 times during the entirety
of a noise block, resulting in 96 total trials within
each block. After a participant completed a block,
they were given the option to take a break or
continue with the remaining blocks. Once all blocks
were completed, participants were asked to
complete a Qualtrics survey. The survey included
the NASA-TLX, the SUS, the USE questionnaire,
and a question asking how annoying the participant
found the alerts. Upon completion of the survey,
participants were debriefed, awarded credit, and
thanked for their time.
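The trial structure described above (eight alerts, each repeated 12 times per block in unpredictable order, with onsets jittered uniformly between 5 and 10 seconds) can be sketched as follows. The experiment itself was programmed in MATLAB; this Python reconstruction is illustrative only, and the function name and seeding are our own assumptions.

```python
import random

def build_block(alerts, reps=12, jitter=(5.0, 10.0), seed=None):
    """Build one noise block: each alert repeated `reps` times in
    shuffled order, each paired with an onset delay drawn uniformly
    from `jitter` seconds so the participant cannot predict when an
    alert will occur (mean delay = 7.5 s for the 5-10 s range)."""
    rng = random.Random(seed)
    trials = [a for a in alerts for _ in range(reps)]
    rng.shuffle(trials)
    return [(a, rng.uniform(*jitter)) for a in trials]
```

With the study's eight alerts this yields the 96 trials per block reported above.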
RESULTS
For both reaction time and accuracy data, a
mixed analysis of variance (ANOVA) was
conducted with noise block serving as a within-
subjects factor and system type (dynamic or default)
serving as the between-subjects factor.
The main effect of noise type on RT was not
significant, F(2, 76) = 1.402, p = .046, ηp2 = .04,
while the main effect of system type on RT was
significant, F(1, 38) = 4.257, p = .046, ηp2 = .10.
Lastly, a significant interaction emerged between
system and noise type, F(2, 76) = 7.926, p = .001,
ηp2 = .17. Within the windows-up and windows-
down noise blocks, mean response times were
shorter for participants using the default interface
(see Figure 3).
Figure 3. Violin plot of reaction time (ms), where the median is depicted by a white dot, the interquartile range by the thick black bar, and confidence intervals by the thin lines extending from the center.
The main effect of noise type on accuracy was
significant, F(2, 76) = 20.125, p < .001, ηp2 = .35,
and the main effect of system type on accuracy was
also significant, F(1, 38) = 22.090, p < .001, ηp2 =
.37. Further, a significant interaction emerged
between system and noise type, F(2, 76) = 23.404, p
< .001, ηp2 = .38. Pairwise comparisons (Sidak)
comparing all noise levels showed that only the
music noise block was significantly different from
the other noise blocks (see Figure 4), ps < .001.
Turning to subjective measures, alerts were
rated as more annoying by participants in the
dynamic condition (M = 48.79, SD = 29.08) than by
participants in the default condition (M = 38.75, SD
= 32.00). However, a t-test did not indicate that this difference was significant, p > .05.
To analyze NASA-TLX data, one-way between-subjects ANOVAs were conducted on each subscale except for the physical demand subscale, as there was no reason to expect it to differ across the two groups. No significant differences were found across any of the five subscales, ps > .05.

Figure 4. Accuracy results across noise and system type. Error bars indicate 95% confidence intervals.

No formal
analyses were conducted on the USE questionnaire,
but mean scores were quite similar across groups
(see Figure 5). Scores on the SUS were relatively
similar across the two groups, with participants in
the dynamic condition rating the system’s usability
as slightly lower (SUS score = 75) than participants interacting with the default interface (SUS score = 80).
Figure 5. Scores for both groups across the four USE
dimensions.
DISCUSSION
The ability to hear auditory alerts or updates
from an infotainment system is clearly dependent
on the sound characteristics of both the alert and the
background noise. There are, of course, many such
characteristics that can affect this intelligibility, but
one of the most important is signal loudness relative
to noise. The present research evaluated a system
which was programmed to take this background
noise intensity into account, and adjust the
presentation of an alert accordingly. While the
results are admittedly mixed, the experiment takes
an initial step into evaluating such a system.
Behavioral results showed that while
participants accurately identified more alerts in one
of the noise conditions (music) when interacting
with the dynamic interface, they nevertheless
responded more slowly in the other noise conditions
relative to participants interacting with the default
interface. It is unclear why participants responded more slowly with one interface than the other, and future research is needed to replicate the results
of the current experiment. Subjective measures
supported that both interfaces were perceived
relatively similarly, but participants did rate the
dynamic interface as more annoying compared to
the default interface – although this difference was
not significant. This is most likely due to the louder
alerts inherent to the dynamic interface.
A system that dynamically adjusts auditory
alerts relative to background noise has clear
applications not just in automotive contexts, but in
any environment where background noise can
interfere with alert recognition. The present
research evaluated such a system, but future
research is needed to fine-tune these systems to
enhance their usability. This endeavor will increase
the likelihood that users can hear alerts, which has
the benefit of both an improved user experience
and, in some cases, increased safety.
REFERENCES
Baldwin, C. L. (2011). Verbal collision avoidance messages
during simulated driving: perceived urgency, alerting
effectiveness and annoyance. Ergonomics, 54(4),
328-337.
Blattner, M. M., Sumikawa, D. A., & Greenberg, R. M.
(1989). Earcons and icons: Their structure and
common design principles. Human–Computer
Interaction, 4(1), 11-44.
Brooke, J. (2013). SUS: a retrospective. Journal of usability
studies, 8(2), 29-40.
Cao, Y., van der Sluis, F., Theune, M., op den Akker, R., & Nijholt, A. (2010,
November). Evaluating informative auditory and
tactile cues for in-vehicle information systems.
In Proceedings of the 2nd International Conference
on Automotive User Interfaces and Interactive
Vehicular Applications (pp. 102-109). ACM.
Gaver, W. W. (1986). Auditory icons: Using sound in
computer interfaces. Human-computer
interaction, 2(2), 167-177.
Hart, S. G., & Staveland, L. E. (1988). Development of
NASA-TLX (Task Load Index): Results of empirical
and theoretical research. In Advances in
psychology (Vol. 52, pp. 139-183). North-Holland.
Jeon, M., Davison, B. K., Nees, M. A., Wilson, J., & Walker,
B. N. (2009, September). Enhanced auditory menu
cues improve dual task performance and are preferred
with in-vehicle technologies. In Proceedings of the
1st international conference on automotive user
interfaces and interactive vehicular applications (pp.
91-98). ACM.
Jeon, M., Gable, T. M., Davison, B. K., Nees, M. A., Wilson,
J., & Walker, B. N. (2015). Menu navigation with in-
vehicle technologies: Auditory menu cues improve
dual task performance, preference, and
workload. International Journal of Human-Computer
Interaction, 31(1), 1-16.
Kilander, F., & Lönnqvist, P. (2002). A whisper in the woods-
an ambient soundscape for peripheral awareness of
remote processes. Georgia Institute of Technology.
Lerner, N., Singer, J., Kellman, D., & Traube, E. (2015). In-
Vehicle Noise Alters the Perceived Meaning of
Auditory Signals. In 8th International Driving
Symposium on Human Factors in Driver Assessment,
Training, and Vehicle Design.
Lund, A. M. (2001). Measuring usability with the use
questionnaire12. Usability interface, 8(2), 3-6.
Murata, A., Kuroda, T., & Kanbayashi, M. (2014).
Effectiveness of Auditory and Vibrotactile Cuing for
Driver’s Enhanced Attention under Noisy
Environment. Advances in Ergonomics In Design,
Usability & Special Populations: Part II, 17, 155.
Sodnik, J., Dicke, C., Tomažič, S., & Billinghurst, M. (2008).
A user study of auditory versus visual interfaces for
use while driving. International journal of human-
computer studies, 66(5), 318-332.
Tardieu, J., Misdariis, N., Langlois, S., Gaillard, P., &
Lemercier, C. (2015). Sonification of in-vehicle
interface reduces gaze movements under dual-task
condition. Applied ergonomics, 50, 41-49.
Walker, B. N., Nance, A., & Lindsay, J. (2006). Spearcons:
Speech-based earcons improve navigation
performance in auditory menus. Georgia Institute of
Technology.