Revealing the stimulus-driven component of attention through modulations of auditory salience by timbre attributes

Baptiste Bouvier1,2*, Patrick Susini1, Catherine Marquis-Favre2 & Nicolas Misdariis1

1STMS IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004 Paris, France. 2Univ Lyon, ENTPE, École Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France. *email: baptiste.bouvier@ircam.fr
Attention allows the listener to select relevant information from their environment, and to disregard what is irrelevant. However, irrelevant stimuli sometimes manage to capture it and stand out from a scene because of bottom-up processes driven by salient stimuli. This attentional capture effect was observed using an implicit approach based on the additional singleton paradigm. In the auditory domain, it was shown that sound attributes such as intensity and frequency tend to capture attention during auditory search (cost to performance) for targets defined on a different dimension such as duration. In the present study, the authors examined whether a similar phenomenon occurs for attributes of timbre such as brightness (related to the spectral centroid) and roughness (related to the amplitude modulation depth). More specifically, we revealed the relationship between the variations of these attributes and the magnitude of the attentional capture effect. In experiment 1, the occurrence of a brighter sound (higher spectral centroid) embedded in sequences of successive tones produced significant search costs. In experiments 2 and 3, different values of brightness and roughness confirmed that attention capture is monotonically driven by the sound features. In experiment 4, the effect was found to be symmetrical: positive or negative, the same difference in brightness had the same negative effect on performance. Experiment 5 suggested that the effect produced by the variations of the two attributes is additive. This work provides a methodology for quantifying the bottom-up component of attention and brings new insights into attention capture and auditory salience.
The acoustic environment is so rich in information that our brain cannot process in detail all of the sounds it is constantly receiving. Instead, the individual selects stimuli that they deem to be relevant for a particular task, and ignores others1. The most famous example of selective attention is the cocktail party problem2. This ability is made possible by an attentional process that filters the flow of stimulus information through certain irrelevant channels3,4. The precise mechanisms involved in this filtering are still being investigated5. However, the brain should not be completely blind to task-irrelevant stimuli, since they could provide important information about the environment. For example, if we are chatting to someone on the street, we can pick up what they are saying and ignore the surrounding traffic noise. However, the squeal of tires associated with a car's sudden braking may still attract our attention. So, if a stimulus is sufficiently salient, the brain may have to process the information it contains involuntarily. This phenomenon is known as involuntary attentional capture. Salience is the property of a stimulus that makes it likely to capture attention, i.e., the bottom-up component of attention6.
Attention capture has been extensively studied in the visual modality (see 42 for a review). Implicit approaches measure the behavioral costs (increased reaction times and error rates) of the presence of an irrelevant distractor in focal tasks. Among other things, irrelevant stimuli defined by their color, shape or onset time are known to attract the attention of participants performing a visual search task7–9.
However, there has been some debate about how salient objects can automatically capture attention. Some have argued that salient objects have an automatic power to attract attention, regardless of the subject's goals. They observed that certain features, such as color or shape, make the salient object automatically capable of attracting attention10. This led to a stimulus-driven conception of attentional capture11: visual selection is determined by the physical properties of the stimuli, and attention is drawn to the location where one object differs from the others
along a particular dimension. However, others have argued that only items that match the target's features can capture attention. For them, capture depends on the attentional set that is encouraged by the task12. For example, it has recently been found that salience does not always determine the capture of visual stimuli; instead, participants can often learn to suppress salient objects13,14. Authors from the different parties eventually came together to review and compare their theories15. They agreed that "physically salient stimuli automatically generate a priority signal that, in the absence of specific attentional control settings, will automatically capture attention, but there are circumstances under which the actual capture of attention can be prevented", reconciling the stimulus-driven and contingent capture approaches.
In the auditory modality, few studies have addressed this issue. Huang and Elhilali16 used an explicit approach to measure auditory salience in complex sound scenes. Participants listened to the scenes dichotically (a different scene in each ear) and continuously indicated which side their attention was focused on. Averaged across scenes and participants, their responses indicate how attention is oriented, which allows the identification of salient events in a scene. This protocol involves top-down processes, as participants actively listen to the sounds and report the orientation of their attention. We therefore cannot infer any measurement of the purely bottom-up component of attention. In Kaya et al.17, the authors asked their participants to focus on a visual task and to ignore background acoustic melodies. Brain responses were recorded, showing that variations in acoustic attributes could make notes in these melodies more salient, and how these different attributes interacted to modulate brain responses.
Dalton and Lavie18 used an implicit approach based on the additional singleton paradigm to reveal an auditory attentional capture effect by sound features such as frequency or intensity. This paradigm was first developed in the visual modality to show that irrelevant stimuli can capture participants' attention during a visual search task, resulting in increased error rates and response times7,19.

Results from Dalton and Lavie18 showed a significant cost (increased response times and error rates) in an auditory search task caused by irrelevant sounds. In their experiment, participants had to listen to sequences of five sounds. Among these, they had to detect a target defined by a dimension (e.g., a change in frequency compared to non-targets). In half of the trials, one of the non-targets was made different from the others on a dimension other than that which defined the target, such as intensity. This sound is called a singleton and is irrelevant to the task. In fact, paying attention to the dimension that defines the singleton is not an advantageous strategy for detecting the target. The results showed that the singleton features could cause interference: participants made more errors and took more time to detect the target when the singleton was present. The effect was not due to low-level interactions between the singleton and the target, which would have made it more difficult to compare the target with the singleton than with a non-target: the effect was shown even when the singleton was separated from the target by another sound. Garrido et al.20 discussed the similarity to mismatch negativity studies, which focus on the elicitation of an event-related potential by deviant tones that differ in frequency or duration. The much shorter inter-stimulus interval, the frequency of occurrence of the deviant tones, and the explicit instruction to ignore these irrelevant singletons limit the parallels that can be drawn with this area of research. Dalton and Lavie18 focused on the attentional capture produced by singletons of different frequency or intensity, but did not investigate the effects of sounds whose features are gradually modified.
In addition, the study of variations in intensity, and therefore loudness, of sounds may be compromised in this paradigm. Masking effects are likely to occur for louder sounds and interfere with the attentional processes we wish to study21. However, the paradigm is compatible with the study of variations in timbre. One precaution is to equalize all sounds in loudness to remove potential masking effects and the influence of loudness, which can be affected by pitch or timbre variations22.

None of the approaches mentioned here focused on the relationship that may exist between variations in the acoustic attributes and the attentional capture effect.
e rst acoustic feature one might think of when studying salience is loudness. Sounds that are perceived
as louder are more likely to attract the listener’s attention. Loudness has been shown to be an important feature
of salience16,23,24. In addition to this feature, several studies have shown that some dimensions of timbre can be
sound markers for conveying relevant information. Lemaitre etal.25 found that listeners used common perceptual
dimensions to categorize car horns. Two of the three dimensions identied were roughness and brightness. Arnal
etal.26 noted that amplitude modulated sounds in the roughness range are found in both natural and articial
alarm signals, and are better detected due to the privileged space they occupy in the communication landscape.
Rough sounds are also said to enhance aversiveness through specic neural processing27. Brightness has long
been known to be a major dimension of musical timbre28 and has therefore been included in most salience
models16,29. More recently, roughness has also been included30.
Thus, the existence of the stimulus-driven component of attention capture has been theoretically established. Moreover, the additional singleton paradigm allows the measurement of the attentional capture effect due to sound features. Finally, the literature findings suggest that certain attributes of sound timbre are potential candidates that could be responsible for the salience of a sound, and thus for its ability to capture attention. However, to the authors' knowledge, no study has ever established the relationship that might exist between variations in these features and the magnitude of the attentional capture effect. In other words, the driving properties of attentional capture by the stimulus features have not yet been revealed.

In the present work, we adopted the additional singleton paradigm to provide evidence for the effect of timbre features on attentional capture. We then used this paradigm to quantify the relationship that may exist between a sound feature and the associated capture effect. Thus, in the current study, we focused on the stimulus-driven properties of the attentional capture effect.

To summarize, we wanted to answer the following two questions:
Do timbre attributes such as brightness or roughness trigger attention capture?
How do their variations drive attention capture?
First, the possibility of an attentional capture by a timbre variation was investigated: the spectral centroid (SC) of the singleton, which correlates with its perceived brightness, was manipulated in experiment 1. Then, the same experimental procedure was used to evaluate how the effect size was modulated by feature variations. In experiments 2 and 3, the SC and the depth of amplitude modulation (correlated with roughness) could take several different values. Finally, experiment 4 examined the effect of symmetric variations in brightness, and experiment 5 focused on combined variations in brightness and roughness, to investigate the directionality and additivity of attentional modulation.
Experiment 1: attentional capture by a bright singleton
Method. Transparency and openness. We report how we determined our sample size, all data exclusions, all manipulations and all measures in the study. Data were collected in 2021 and 2022 and analyzed using Python 3.7. All statistical analyses were performed using Python 3.7 and the open-source pingouin package.
Participants. A previous pilot experiment involving 11 participants was conducted to estimate the power of the effect of the singleton presence on response time. The calculation was made for a one-tailed t-test, with an effect size of d = 0.8 and α = 0.05, and, aiming for a power of 0.8, determined a minimum sample size of N = 12.
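As an illustration, this sample-size computation can be reproduced with the pingouin package used for the analyses; the exact call below is a reconstruction, not the authors' script (the `alternative` argument requires pingouin ≥ 0.5, where it replaced the older `tail`):

```python
import math
import pingouin as pg

# Solve for n in a paired, one-tailed t-test with d = 0.8, alpha = .05
# and a target power of 0.8 (the values from the pilot analysis above).
n_min = pg.power_ttest(d=0.8, power=0.8, alpha=0.05,
                       contrast='paired', alternative='greater')
print(math.ceil(n_min))  # -> 12, the minimum sample size reported above
```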
Thus, 15 participants (8 females, 7 males) took part in this experiment. They ranged in age from 20 to 45 years (mean age: 31 ± 8 years). They were all consenting and reported normal hearing. An audiometry in the frequency range between 0.125 and 8 kHz was performed for each participant and revealed no hearing impairment. The protocol was approved, in accordance with the Helsinki Declaration, by the Ethics Committee of the Institut Européen d'Administration des Affaires (INSEAD). All methods were carried out in accordance with their guidelines and regulations. Participants gave written informed consent and received financial compensation for their participation.
Apparatus. e experiment was designed and run on Max soware (version 7, https:// cycli ng74. com), on a
Mac mini 2014 (OS Big Sur 11.2.3). e stimuli were designed with python 3.7, and presented during the experi-
ment through headphones (Beyerdynamic 770 pro, 250 Ohm). e experiment took place in the STMS labora-
tory of IRCAM in a soundproofed double-walled IAC booth.
Stimuli. e stimuli were made of sequences of 5 sounds (see Fig.1). All notes follow the harmonic structure
of Bouvier etal.31, with 20 harmonics, the nth harmonic fn having a frequency n*f0 and a weight
1
nα
. us, decreas-
ing α increased the sound spectral centroid (SC), and therefore its perceived brightness:
SC
=
20
i=1
iα
20
i=1
1
iα
.
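For illustration, a minimal sketch of this tone construction and of the resulting SC in Python (the language used to design the stimuli); the sample rate and fundamental frequency are not reported in the text, so the values below are placeholders:

```python
import numpy as np

SR = 44100     # sample rate (Hz); assumption, not reported in the text
F0 = 385.0     # fundamental frequency (Hz); placeholder, not reported

def harmonic_tone(alpha, dur=0.170, f0=F0, n_harm=20, ramp=0.005, sr=SR):
    """Complex tone: harmonic k at frequency k*f0 with amplitude 1/k**alpha,
    with 5-ms onset/offset ramps (the reference distractor uses alpha = 3)."""
    t = np.arange(int(dur * sr)) / sr
    x = sum(k ** -float(alpha) * np.sin(2 * np.pi * k * f0 * t)
            for k in range(1, n_harm + 1))
    env = np.ones_like(x)
    n = int(ramp * sr)
    env[:n] = np.linspace(0.0, 1.0, n)
    env[-n:] = np.linspace(1.0, 0.0, n)
    return x * env / np.abs(x).max()

def spectral_centroid(alpha, f0=F0, n_harm=20):
    """SC implied by the weights: f0 * sum(i**(1-alpha)) / sum(i**-alpha)."""
    i = np.arange(1, n_harm + 1)
    w = i ** -float(alpha)
    return f0 * (i * w).sum() / w.sum()

# Decreasing alpha shifts weight toward higher harmonics and raises the SC:
print(spectral_centroid(3.0) < spectral_centroid(2.0))  # True
```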
Distractor. For the reference distractor, α = 3. It lasted 170 ms, with 5-ms ramps at the beginning and end, and had an SC equal to 512 Hz.
Targets. e targets were 50ms shorter or longer than the distractor. is value is higher than what Abel32
found as a just-noticeable dierence (jnd) for duration discrimination of sinusoidal sounds. Based on previous
tests done in the lab, the experimenters still ensured beforehand that the targets were clearly heard as distinct
Figure1. Stimuli without (le) and with (right) a singleton (surrounded with a glow), with 50% chances being
before or aer the target (dark blue). Only sequences with target in position 4 are shown here.
Content courtesy of Springer Nature, terms of use apply. Rights reserved
4
Vol:.(1234567890)
Scientic Reports | (2023) 13:6842 | https://doi.org/10.1038/s41598-023-33496-2
www.nature.com/scientificreports/
from the distractors. e targets had the same fundamental frequency and spectrum distribution (α = 3) as the
reference distractor, but a duration of 220ms for the long one and 120ms for the short one.
Singleton. e singleton had the same fundamental frequency and envelope as the reference distractor, but
a dierent spectrum distribution with α = 2. It resulted in a higher SC, equal to 822Hz. Allen and Oxenham33
found a jnd of 5.0% for the SC, which ensures the singleton was indeed perceived brighter. e experimenters
still ensured beforehand that the singleton was clearly heard as distinct from the distractors.
In the reference condition, the target was embedded in sequences of distractors only such that a sequence was
composed of four distractors and a target stimulus. In the test condition, one of the distractors was the singleton
such that a sequence was composed of three distractors, one target and one singleton. e IOI ("Inter-Onset
Interval") was kept constant at 230ms. e rst sound of each sequence was always a distractor. e target was
in 3rd or 4th position (50% of the trials each). In the trials containing a singleton, its position was either just
before or just aer the target (50% of the trials each). All the conditions are presented in Fig.1.
Loudness equalization. All the sounds were equalized in an adjustment experiment with 12 participants from the lab, using the same setup as the main experiment. Loudness adjustments were performed by comparing all the sounds (short target, long target or singleton) to a reference (the distractor presented at 80 dB SPL). The sounds were randomly distributed and presented 8 times each. The levels were measured at the headphone output with a Brüel & Kjær 2238 Mediator sound level meter. The obtained levels were 81 dB SPL for the short target, 79 dB SPL for the long target and 74 dB SPL for the singleton. All inter-participant standard deviations of these obtained levels were less than 1 dB SPL, i.e., less than a just-noticeable difference in sound level34.
Procedure. Six blocks of 60 randomly distributed trials were run for each participant. For every trial, the word Ready was displayed on the screen for 1500 ms, then a sequence of 5 sounds was presented.

At the end of the sequence, the participant could respond by pressing a key: "1" for "short" and "2" for "long" (two-alternative forced-choice protocol). Feedback on the participant's response (Correct or Incorrect) was displayed after each trial and remained for 1500 ms. If no answer was given by the participant after 3000 ms, the message Too late. Answer faster! was displayed. The response time was measured from the moment the target was played in the sequence. Then, a 1500 ms pause occurred and the next trial began.

The participants were asked, at the beginning of the experiment, to focus on the duration of the sounds, and on their duration only, in order to discriminate the target. Each participant had a training block before taking the test. We kept only the results of participants with an error rate below 40% on the sequences containing the target; due to this criterion, one participant had to be replaced at this step. The experiment lasted 90 min on average.
Results. For each participant, and for each singleton condition (absent or present), we calculated the mean and the standard deviation of the response times. We then removed the data whose response time was more than two standard deviations from the mean35. We also removed the data for which the response time was less than 100 ms, and those for which the participant did not answer. 94.9% of the data were kept at this stage. For the response time analysis, only the data where the participant's response was correct were kept, i.e., 75.6% of the data. The mean error rates and response times are presented in Table 1. For all the following experiments, error rates follow the same trends as response time increases. The LISAS (Linear Integrated Speed-Accuracy Score36) were also computed and followed the same trends. For the sake of clarity, we therefore report only the increases in response time.
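A minimal sketch of this trimming procedure and of the LISAS computation, assuming a hypothetical trial table with `participant`, `singleton` and `rt` columns (RTs in seconds); the LISAS formula follows Vandierendonck36:

```python
import pandas as pd

def trim_rts(df):
    """Per participant and singleton condition, drop trials with no answer,
    RT < 100 ms, or RT beyond 2 SD of the condition mean (cf. Miller35)."""
    df = df.dropna(subset=['rt'])
    df = df[df['rt'] >= 0.100]
    grp = df.groupby(['participant', 'singleton'])['rt']
    dev = (df['rt'] - grp.transform('mean')).abs()
    return df[dev <= 2 * grp.transform('std')]

def lisas(mean_rt, prop_err, s_rt, s_err):
    """LISAS_j = RT_j + (S_RT / S_PE) * PE_j: the condition's mean correct RT
    plus its error proportion weighted by the participant's SD ratio."""
    return mean_rt + (s_rt / s_err) * prop_err
```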
The error rates (16.2% and 24.2% in the conditions without and with a singleton, respectively) confirm that participants were able to complete the task correctly in both conditions. The mean response time increase when the singleton was present was 137 ms. A t-test revealed that the singleton presence had a significant effect on the response time increase (t(14) = 8.33, p < 0.001), and this effect was very large (Cohen's d = 2.1). A very large effect of the singleton presence was found for error rates as well (t(14) = 3.85, p < 0.001, Cohen's d = 1.0).
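Such a comparison can be sketched with pingouin, assuming a hypothetical per-participant table `rt` of mean response times per condition:

```python
import pingouin as pg

# Paired, one-tailed comparison of mean RTs with vs. without the singleton;
# pingouin's ttest returns Cohen's d alongside T, dof and the p-value.
res = pg.ttest(rt['present'], rt['absent'], paired=True, alternative='greater')
print(res[['T', 'dof', 'p-val', 'cohen-d']])
```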
e eect of the singleton position on error rates was not signicant (t(14) = 0.72, p = 0.48), suggesting that
attentional capture occurs as much whether the singleton appears before or aer the target. However, there was
an eect of the singleton position on response times (t(14) = 4.38, p < 0.001): when the singleton appeared aer
the target, the response times were greater. is absence of eect of the singleton position on error rates and
the increased reaction times when the singleton occurs aer compared to before the target conrm that this
eect is not due to auditory masking. is is consistent with the loudness equalization that had been carried
out beforehand and the IOI which prevented auditory masking21. e observed eect is due to an attentional
Table 1. Mean and standard deviation of response times and error rates (across the 15 participants)
depending on the presence of the bright singleton.
Singleton Absent Present
Response time
(Standard deviation) 985ms
(142) 1121ms
(185)
Error rate
(Standard deviation) 16.2%
(13.5) 24.2%
(15.1)
Finally, one could claim that the effect is due to the surprise caused by the occurrence of the singleton. However, the singleton was present in 50% of the trials, and the participants identified it and became accustomed to it during the training session. Moreover, no significant difference in response times was found between trials where a singleton appeared after one or more trials without any singleton (the "surprising" condition) and trials where the singleton was present after one or more trials with a singleton (the "non-surprising" condition): t(14) = 0.31, p = 0.76.
is rst experiment thus allowed us to validate the framework in which we can test modulations of timbre
features and observe how they drive the attentional capture eect. It was therefore decided to reproduce the
experiment, modifying it so that the singleton could take dierent values of brightness in a second experiment,
and dierent values of roughness in a third one.
Experiments 2 and 3: variations of brightness and roughness
Experiments 2 and 3 were conducted to study how the effect magnitude is modulated by variations of the singleton feature. In experiment 2, we replicated experiment 1 with four different values of the spectral centroid (SC) for the singleton. In experiment 3, four values of the amplitude modulation depth for the singleton were used. This latter sound feature is associated with an auditory attribute usually described by the semantic attribute "roughness"37.
Method. Participants. Twenty participants (10 females, 10 males) took part in experiment 2, and 20 others (10 females, 10 males) in experiment 3. The sample size was increased to ensure that the power of the effect produced by the second-brightest singleton was greater than 0.8. This was done in order to have at least two different brightness conditions with sufficient power. The participants ranged in age from 19 to 34 years (mean age: 27 ± 4 years) for experiment 2, and from 22 to 50 years (mean age: 28 ± 8 years) for experiment 3. They were all consenting and reported normal hearing. An audiometry in the frequency range between 0.125 and 8 kHz was performed for each participant and revealed no hearing impairment. Participants gave written informed consent and received financial compensation for their participation.
Apparatus. e apparatus was the same as in the rst experiment, except that it took place in the INSEAD-
Sorbonne Université Behavioural Lab, in soundproofed rooms.
Stimuli. e distractors and targets were the same as in experiment 1. For experiment 2, the singleton SC could
take 4 values: 538, 563, 640 or 768Hz. Each one was presented in 20% of the trials. To establish these values,
an increment of SC was calculated (using the estimation of 5% for SC jnd found by Allen and Oxenham33, and
then multiplied by 1, 2, 5 and 10. For experiment 3, the singleton signal ssing(t) was the distractor signal sdis(t)
modulated at a modulation frequency fmod = 50Hz:
ssing
(t)=
1+mcos
2πf
mod
t

s
dis
(t)
.
e modulation
depth m could take 4 values: 0.1, 0.2, 0.5 or 1.0. Each one was presented in 20% of the trials. To establish these
values, the increment of modulation depth estimation proposed by Zwicker and Fastl37 (10%) was multiplied by
1, 2, 5 and 10 as well.
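The derivation of these singleton values and the modulation itself can be sketched as follows (the linear jnd increments reproduce the SC values reported above for experiment 2; `s_dis` stands for a distractor waveform synthesized as in experiment 1):

```python
import numpy as np

# Experiment 2: SC values from the 512-Hz reference, with the 5% jnd
# increment (Allen & Oxenham33) multiplied by 1, 2, 5 and 10:
sc_values = [512 * (1 + 0.05 * k) for k in (1, 2, 5, 10)]
# -> 537.6, 563.2, 640.0, 768.0 Hz, i.e., the reported 538, 563, 640, 768 Hz

# Experiment 3: modulation depths from a 10% increment (Zwicker & Fastl37):
depths = [0.10 * k for k in (1, 2, 5, 10)]   # 0.1, 0.2, 0.5, 1.0

def rough_singleton(s_dis, m, f_mod=50.0, sr=44100):
    """Apply sinusoidal amplitude modulation of depth m at f_mod = 50 Hz."""
    t = np.arange(len(s_dis)) / sr
    return (1.0 + m * np.cos(2 * np.pi * f_mod * t)) * s_dis
```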
Loudness equalization. e loudness of the singletons was equalized as in experiment 1. e levels obtained for
each singleton aer equalization were 79.5, 79.0, 77.5 and 75.0dB SPL for the bright singletons with SC of 538,
563, 640 and 768Hz, respectively, and 80dB SPL for all the rough singletons. All inter-participants standard
deviations of the obtained levels were less than 1dB SPL.
Procedure. e procedure was the same as in experiment 1, except that the number of trials had to be increased
because of the increased number of singletons. Eight blocks of 80 randomly distributed trials each were run for
each participant.
Results. e data processing was the same as for experiment 1. For the error rate analysis, 95.0% and 94.6%
of the data were kept for experiments 2 and 3, respectively. For the response time analysis, only the data where
the participant’s response was correct were kept, i.e., 78.6% and 76.4% of the data. e mean response time
and error rate across the 20 participants for sequences without singleton were 867ms (std = 246ms) and 12.6%
(std = 13.1%) for experiment 2, 1058ms (std = 294ms) and 15.2% (std = 12%) for experiment 3. e increase in
response time for each singleton, i.e., the dierence between the condition with the considered singleton and the
reference condition without any singleton, is presented in Fig.2 for each value of modulation depth and spectral
centroid.
For both experiments 2 and 3, t-tests were conducted with Holm corrections for repeated comparisons. Complete statistics can be found in the Supplementary information (S1 and S2).
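A sketch of such Holm-corrected comparisons with pingouin (version ≥ 0.5.3, where `pairwise_tests` replaced the older `pairwise_ttests`), on a hypothetical long-format table `df_long` of response-time increases:

```python
import pingouin as pg

# One row per participant x singleton condition; the Holm correction is
# applied across the repeated pairwise comparisons of RT increases.
stats = pg.pairwise_tests(data=df_long, dv='rt_increase',
                          within='condition', subject='participant',
                          padjust='holm')
print(stats[['A', 'B', 'T', 'p-corr', 'hedges']])
```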
Data from experiment 2 confirmed and extended the result of experiment 1, as various bright singletons produced an attentional capture effect. Moreover, the effect increased with SC values: the brighter the singleton, the greater the effect. Experiment 3 showed that roughness is also a feature that triggers an attentional capture effect: the presence of various rough singletons caused significant behavioral costs. The results confirmed that salience depends on the variations of the feature that defines the singleton.

Interestingly, the manipulations of the two timbre attributes resulted in comparable effect magnitudes. An increase of a few increments in brightness gives an effect similar to that obtained with an increase of the same number of increments in roughness. This is discussed in the general discussion.
Experiment 4: symmetrical variations of brightness
Experiment 4 was conducted to study the symmetry, or directionality, of the effect. We replicated experiment 2 with SC values for the singleton being either higher or lower than the distractors' SC.
Method. Participants. Nineteen participants (8 females, 11 males) took part in experiment 4. They ranged in age from 18 to 32 years (mean age: 25 ± 4 years). They were all consenting and reported normal hearing. An audiometry in the frequency range between 0.125 and 8 kHz was performed for each participant and revealed no hearing impairment. Participants gave written informed consent and received financial compensation for their participation.
Apparatus. e apparatus was the same as in the rst experiment, except that it took place in the INSEAD-
Sorbonne Université Behavioural Lab, in soundproofed rooms.
Stimuli. e distractor and target SC was equal to 631Hz. e singleton SC was 2 and 4 jnd higher or lower
than the distractor one, i.e., 512, 569, 696 or 768Hz. Each one was presented in 20% of the trials. All the sounds
were equalized in loudness (12 participants with the same procedure as in experiment 1): the obtained levels
were 80, 79, 77 and 75dB SPL for the singletons with SC at 512, 569, 696 and 768Hz respectively, and 78dB SPL
for the distractor. All inter-participants standard deviations were less than 1dB SPL.
Results. e data processing was the same as for experiment 1. For the error rate analysis, 94.7% of the data
were kept. For the response time analysis, only the data where the participant’s response was correct were kept,
i.e., 87.1% of the data. e mean response time and error rate across the 19 participants for sequences without
singleton were 940ms (std = 195ms) and 5.1% (std = ± 8.4%). e increase in response time for each singleton,
i.e., the dierence between the condition with the considered singleton and the reference condition without any
singleton, is presented in Fig.3. Complete statistics can be found in the Supplementary information (S3).
e eect magnitudes are comparable to those obtained in experiment 2. A clear symmetry is observed
in experiment 4: the eect of a brighter singleton is the same as the one of a less bright singleton, if both vary
absolutely by the same amount of perceived brightness. is result tells us that it is the absolute variation of the
singleton feature that modulates the attention capture.e results of experiments 1, 2, 3 and 4 can be summarized
in Fig.4, which shows the driving of response time increases by the perceived variations in the singleton feature.
ese perceived variations are shown in terms of jnd values.
Figure 2. Increase in response time (ms) with singleton SC (left, experiment 2) and modulation depth (right, experiment 3). Error bars represent the standard errors across participants in each condition compared to the no-singleton condition. Significances between conditions are displayed on the horizontal braces. *: p < .05, **: p < .01, ***: p < .001.

Interestingly, a linear relationship seems to emerge between increases of perceived brightness (combined across experiments 1 and 2 and the positive variations in experiment 4) and the response time increase (Pearson's r(3) = 0.99, p < 0.001; slope = 14.0 ms, std error = 0.9 ms), and for perceived roughness as well (Pearson's r(3) = 0.99, p < 0.01; slope = 12.4 ms, std error = 0.9 ms). This relationship is only valid for this range of feature variations and is discussed in the general discussion.
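The reported fits can be reproduced in outline with pingouin's linear_regression; the response-time increases below are illustrative placeholders, not the measured values (those appear in Fig. 4), with five points matching the three degrees of freedom of the reported correlations:

```python
import numpy as np
import pingouin as pg

jnds = np.array([1.0, 2.0, 5.0, 10.0, 12.0])         # perceived deviations (jnd)
rt_inc = np.array([20.0, 35.0, 75.0, 140.0, 165.0])  # illustrative increases (ms)

lm = pg.linear_regression(jnds, rt_inc)     # an intercept is added by default
print(lm[['names', 'coef', 'se', 'pval']])  # the slope is the ms-per-jnd weight
```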
Experiment 5: combination of roughness and brightness
Experiment 5 was conducted to study the additivity of the effects of different feature variations. We replicated experiment 2 with four different singletons having different combinations of roughness and brightness: the singleton could have two different SCs combined with two different amplitude modulation depths.
Figure3. Increase in response time (ms) with singleton SC(experiment 4). Error bars represent the standard
errors across participants in each condition compared to the no-singleton condition. Signicances between
conditions are displayed on the horizontal braces. *: p < .05, **: p < .01, ***: p < .001.
Figure4. Increase in response time (ms) depending on the singleton perceived feature variations (jnd) in
experiments 2, 3, and 4. Error bars represent the standard errors across participants in each condition compared
to the no-singleton condition.
Method. Participants. Nineteen participants (9 females, 10 males) took part in experiment 5, whose ages ranged from 21 to 36 years (mean age: 26 ± 5 years). They were all consenting and reported normal hearing. An audiometry in the frequency range between 0.125 and 8 kHz was performed for each participant and revealed no hearing impairment. Participants gave written informed consent and received financial compensation for their participation.
Apparatus. e apparatus was the same as in the rst experiment, except that it took place in the INSEAD-
Sorbonne Université Behavioural Lab, in soundproofed rooms.
Stimuli. e distractor and target SC was equal to 512Hz, and they were not modulated, i.e., null roughness.
e singleton SC was 2 or 5 jnd higher than the distractor one, i.e., 564 and 653Hz. e singleton modulation
depth was 2 or 5 jnd higher as well, i.e., 0.2 and 0.5. e four singletons were thus obtained with the four com-
binations of these SC and modulation depths. Each one was presented in 20% of the trials. All the sounds were
equalized in loudness (12 participants with the same procedure as the one used in experiment 1): the obtained
levels were 79dB SPL for the singletons with 2 jnds of brightness, 77.5dB SPL for the singletons with 5 jnds of
brightness. All inter-participants standard deviations were less than 1dB SPL.
Results. e data processing was the same as for experiment 1. For the error rate analysis, 94.9% of the data
were kept. For the response time analysis, only the data where the participant’s response was correct were kept,
i.e., 74.9% of the data. e mean response time and error rate across the 19 participants for sequences without
singleton were 994ms (std = 158ms) and 17.4% (std = 12.7%). e increase in response time for each singleton,
i.e., the dierence between the condition with the considered singleton and the reference condition without any
singleton, is presented in Fig.5. Complete statistics can be found in the Supplementary information (S4).
e eect produced by a 2 + 2-jnds variation here is comparable to that produced by a 2-jnds variation in
experiments 2 and 3. It is uncertain whether this is due to a non-additivity of the eects of the combined features
or whether participants were simply less subject to attentional capture in this experiment. Nevertheless, within
their range of magnitudes, the response times in experiment 5 appear to increase linearly with the addition of
the perceptual variations on the two dimensions (rPearson(3) = 0.99, p < 0.01, slope = 8.5ms—std error = 0.4ms).
In other words, the eect seems to be additive across dimensions in this range of values.
Public signicance statement. ese ndings provide evidence that the perception of certain auditory
features drives the ability of sounds to capture our attention, according to laws that are revealed.
General discussion
Results from experiment 1 showed that a singleton defined by its timbre, specifically its brightness, captured participants' attention despite being irrelevant to the task they had to perform. Experiment 2 showed that the effect magnitude was driven by the singleton's brightness. Experiment 3 showed that a different attribute, roughness, also drives the attentional capture effect. Results from experiments 4 and 5 revealed that this effect is symmetrical, i.e., that only the absolute perceived deviation matters, and additive, i.e., that combining features produces the addition of the effects that each feature variation produces alone.

Figure 5. Increase in response time (ms) with singleton SC and modulation depth (experiment 5). Error bars represent the standard errors across participants in each condition compared to the no-singleton condition. Significances between conditions are displayed on the horizontal braces. *: p < .05, **: p < .01, ***: p < .001.
Thus, in a series of four different experiments (2, 3, 4 and 5), a driving of attentional capture by the singleton feature was observed. All else being equal in the experiments, the participants' attentional state remained identical across the different values of the singleton feature. Nevertheless, the magnitude of the effect increased with increasing brightness or roughness variation. The results cannot be explained by increasing singleton-target similarity, because the timbre variations defining the singletons did not make them more similar to the target. Since the increased response times cannot be explained by top-down processes that change with the value of the singleton feature, the observed relationships represent purely feature-driven components of the effect. In other words, the bottom-up component of the attentional capture effect is revealed here, not only confirming its existence11, but also revealing its pattern.
Thus, by varying the timbre of the tones while keeping the participants' attentional state fixed, we were able to elicit only the bottom-up component of attentional capture. However, the nature of our protocol itself could raise questions about the participants' attentional state, and thus about the origin of the capture. The contingency on participants' attentional state12 is questionable here. Indeed, according to the contingency hypothesis15, the task leads to an attentional state that favors the detection of singletons, and this is why attention is captured by the singleton. However, in the present experiments, there were two single items (out of five) in 80% of the trials, and the singleton was one out of 4 possible singletons. Furthermore, all sounds had a fundamental frequency randomly drawn from a broad uniform distribution of 20 Hz. Thus, the variability of the items was increased in our protocol, and the target was not a single item among all identical items. The single-item detection strategy may therefore no longer be advantageous in this setting, and the adaptation of the singleton to participants' attentional state may be different from that which was traditionally thought to be responsible for detection in this paradigm. Further work is needed to understand the interactions between the bottom-up component revealed here and top-down processes, and to address the issue of the compatibility of these results with the contingent capture approach. For example, it would be important to investigate how the driving by the singleton features evolves as participants change their attentional state.
The feature-driven relationships obtained make it possible to observe and compare how different features modulate attention capture. Indeed, the marginal increase of the effect (the derivative of the curves of response time increases with the perceptual variations of the feature) can be interpreted as the weight of the feature in the sound's salience. Interestingly, in experiments 2 and 3, both features drove the effect in a similar way. Either these two features are by chance equally responsible for the salience of a sound, or it is the perceived deviation on each dimension that is important in making a sound salient. This evolution of attentional capture with variations of different features therefore deserves to be confirmed through more experiments involving more features (harmonicity, attack time, spectral flux…). If a similar driving is found for other features, it would show that it is precisely how different the sound is perceived to be that matters to trigger attentional capture, regardless of the feature used. On the contrary, some features could drive the effect with more or less power. This would lead to a hierarchy of features that influence the salience of a stimulus in terms of its ability to capture attention.

Furthermore, the combined results of experiments 1, 2, 3 and 4 (summarized in Fig. 4) reveal a monotonic relationship between the perceived difference of the singleton feature (quantified in just-noticeable differences) and the increase in reaction time. Thus, the attentional capture effect increases progressively with the perceived difference, according to a law that appears to be linear in the range of deviations tested. This law cannot extend over a very wide range of values, as the capture effect must saturate at some point. In any case, we observe that there is no threshold effect: the function is monotonic and continuous. A more precise and extensive determination of this function could be pursued in future studies.
This work also brings new insights into the understanding of auditory salience itself, confirming the importance of timbre in this property. Both brightness and roughness were found to be responsible for an attentional capture by irrelevant sounds. It therefore appears that timbre is also a key dimension in directing auditory attention, in addition to the main dimensions of frequency and intensity highlighted by Dalton and Lavie18. The results on brightness confirm the findings that previously led some researchers to consider this feature in their salience models16,29. Roughness has only recently been included in some form: Kothinti et al.30, for example, added average fast temporal modulations to the latest version of their model. The relationship found between attentional capture and feature variations seems to be supported by both features and deserves further investigation, either in other contexts (other tasks, more complex environments…) or with other features.

Our results show that attention capture is driven by absolute deviations of the sound features. In other words, the features do not have an intrinsic polarity with respect to salience (e.g., the brighter, the more salient). Rather, it is a dissimilarity effect that modulates it. This is consistent with predictive coding and theories of auditory deviance detection38. They suggest that the deviations between the prediction and what is subsequently perceived determine auditory salience and trigger notified events39,40. Here, we support these theories by showing that absolute deviations of the sound features directly modulate the magnitude of the attentional capture effect, i.e., their salience.

Finally, our findings are interesting from the perspective of auditory salience modelling, which could be improved by knowing the relevant parameters to consider and how salience depends on their variations. The approach taken so far is to consider the absolute and normalized feature variations over time16,39,41, without implying a more elaborate modulation of attention with these variations. The additivity of the effect produced by different feature variations provides insights into how to combine them41. An interesting avenue might be to consider more complex interactions and to go deeper into the understanding of the mechanisms underlying auditory salience.
Conclusion
This work provides contributions at the theoretical, methodological and practical levels. From a theoretical point of view, a driving of attention capture by a stimulus feature was revealed. This modulation of bottom-up attention was found to be monotonic and similar for the two timbre attributes studied here: brightness and roughness. The experiment with variations in brightness highlighted its symmetric properties, and the experiment with combinations of both attributes underlined its additive character. Methodologically, a way to measure the feature-driven component of attention was proposed: it implies modulating the singleton features in an additional singleton paradigm while keeping the attentional state constant. From a practical perspective, the results may enrich salience models, which can include these features and the way they modulate salience in their implementation.

Finally, this study opens perspectives and calls for further studies. The extendibility of the modulation law to more features and to a wider range of feature variations, its dependence on attentional sets and top-down processes, and a higher resolution of the modulation curves deserve further investigation.
Data availability
All data are available at https://github.com/BouvierBaptiste/Revealing-the-stimulus-driven-component-of-attention-through-modulations-of-auditory-salience-by-tim.git.
Received: 22 December 2022; Accepted: 13 April 2023
References
1. Desimone, R. & Duncan, J. Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18(1), 193–222. https://doi.org/10.1146/annurev.ne.18.030195.001205 (1995).
2. Cherry, E. C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25(5), 975–979. https://doi.org/10.1121/1.1907229 (1953).
3. Broadbent, D. E. Perception and Communication (Elsevier, 2013).
4. McDermott, J. H. The cocktail party problem. Curr. Biol. 19(22), R1024–R1027. https://doi.org/10.1016/j.cub.2009.09.005 (2009).
5. Marinato, G. & Baldauf, D. Object-based attention in complex, naturalistic auditory streams. Sci. Rep. 9(1), 2854. https://doi.org/10.1038/s41598-019-39166-6 (2019).
6. Treue, S. Visual attention: The where, what, how and why of saliency. Curr. Opin. Neurobiol. 13(4), 428–432. https://doi.org/10.1016/S0959-4388(03)00105-3 (2003).
7. Theeuwes, J. Perceptual selectivity for color and form. Percept. Psychophys. 51(6), 599–606. https://doi.org/10.1037/0096-1523.30.1.180 (1992).
8. Theeuwes, J. Stimulus-driven capture and attentional set: Selective search for color and visual abrupt onsets. J. Exp. Psychol. Hum. Percept. Perform. 20(4), 799. https://doi.org/10.1037/0096-1523.20.4.799 (1994).
9. Yantis, S. & Jonides, J. Abrupt visual onsets and selective attention: Evidence from visual search. J. Exp. Psychol. Hum. Percept. Perform. 10(5), 601. https://doi.org/10.1037/0096-1523.10.5.601 (1984).
10. Theeuwes, J. Top-down and bottom-up control of visual selection. Acta Psychol. 135(2), 77–99. https://doi.org/10.1016/j.actpsy.2010.02.006 (2010).
11. Theeuwes, J. Visual selective attention: A theoretical analysis. Acta Psychol. 83(2), 93–154. https://doi.org/10.1016/0001-6918(93)90042-P (1993).
12. Folk, C. L., Remington, R. W. & Johnston, J. C. Involuntary covert orienting is contingent on attentional control settings. J. Exp. Psychol. Hum. Percept. Perform. 18(4), 1030. https://doi.org/10.1037/0096-1523.18.4.1030 (1992).
13. Gaspelin, N. & Luck, S. J. The role of inhibition in avoiding distraction by salient stimuli. Trends Cogn. Sci. 22(1), 79–92. https://doi.org/10.1016/j.tics.2017.11.001 (2018).
14. Stilwell, B. T. & Gaspelin, N. Attentional suppression of highly salient color singletons. J. Exp. Psychol. Hum. Percept. Perform. 47(10), 1313. https://doi.org/10.1037/xhp0000948 (2021).
15. Luck, S. J., Gaspelin, N., Folk, C. L., Remington, R. W. & Theeuwes, J. Progress toward resolving the attentional capture debate. Vis. Cogn. 29(1), 1–21. https://doi.org/10.1080/13506285.2020.1848949 (2021).
16. Huang, N. & Elhilali, M. Auditory salience using natural soundscapes. J. Acoust. Soc. Am. 141(3), 2163–2176. https://doi.org/10.1037/0096-1523.30.1.180 (2017).
17. Kaya, E. M., Huang, N. & Elhilali, M. Pitch, timbre and intensity interdependently modulate neural responses to salient sounds. Neuroscience 440, 1–14. https://doi.org/10.1016/j.neuroscience.2020.05.018 (2020).
18. Dalton, P. & Lavie, N. Auditory attentional capture: Effects of singleton distractor sounds. J. Exp. Psychol. Hum. Percept. Perform. 30(1), 180. https://doi.org/10.1037/0096-1523.30.1.180 (2004).
19. Pashler, H. Cross-dimensional interaction and texture segregation. Percept. Psychophys. 43(4), 307–318. https://doi.org/10.3758/BF03208800 (1988).
20. Garrido, M. I., Kilner, J. M., Stephan, K. E. & Friston, K. J. The mismatch negativity: A review of underlying mechanisms. Clin. Neurophysiol. 120(3), 453–463. https://doi.org/10.1016/j.clinph.2008.11.029 (2009).
21. Moore, B. C. An Introduction to the Psychology of Hearing (Brill, 2012).
22. Melara, R. D. & Marks, L. E. Interaction among auditory dimensions: Timbre, pitch, and loudness. Percept. Psychophys. 48(2), 169–178. https://doi.org/10.3758/BF03207084 (1990).
23. Kim, K., Lin, K. H., Walther, D. B., Hasegawa-Johnson, M. A. & Huang, T. S. Automatic detection of auditory salience with optimized linear filters derived from human annotation. Pattern Recogn. Lett. 38, 78–85. https://doi.org/10.1016/j.patrec.2013.11.010 (2014).
24. Liao, H. I., Kidani, S., Yoneya, M., Kashino, M. & Furukawa, S. Correspondences among pupillary dilation response, subjective salience of sounds, and loudness. Psychon. Bull. Rev. 23(2), 412–425. https://doi.org/10.3758/s13423-015-0898-0 (2016).
25. Lemaitre, G., Susini, P., Winsberg, S., McAdams, S. & Letinturier, B. The sound quality of car horns: A psychoacoustical study of timbre. Acta Acust. united Acust. 93(3), 457–468 (2007).
26. Arnal, L. H., Flinker, A., Kleinschmidt, A., Giraud, A. L. & Poeppel, D. Human screams occupy a privileged niche in the communication soundscape. Curr. Biol. 25(15), 2051–2056. https://doi.org/10.1121/1.4863269 (2015).
27. Arnal, L. H., Kleinschmidt, A., Spinelli, L., Giraud, A. L. & Mégevand, P. The rough sound of salience enhances aversion through neural synchronisation. Nat. Commun. 10(1), 1–12. https://doi.org/10.1121/1.4863269 (2019).
28. McAdams, S. The perceptual representation of timbre. In Timbre: Acoustics, Perception, and Cognition 23–57 (Springer, 2019). https://doi.org/10.1007/978-3-030-14832-4_2.
29. Tordini, F., Bregman, A. S., Cooperstock, J. R., Ankolekar, A. & Sandholm, T. Toward an Improved Model of Auditory Saliency (Georgia Institute of Technology, 2013).
30. Kothinti, S. R., Huang, N. & Elhilali, M. Auditory salience using natural scenes: An online study. J. Acoust. Soc. Am. 150(4), 2952–2966. https://doi.org/10.1121/10.0006750 (2021).
31. Bouvier, B., Susini, P., Marquis-Favre, C. & Misdariis, N. Auditory salience: A study of the influence of timbre attributes using the additional singleton paradigm (Version 0). 19th International Symposium on Hearing: Psychoacoustics, Physiology of Hearing, and Auditory Modelling, from the Ear to the Brain (ISH2022). https://doi.org/10.5281/zenodo.6576922 (2022).
32. Abel, S. M. Duration discrimination of noise and tone bursts. J. Acoust. Soc. Am. 51(4B), 1219–1223. https://doi.org/10.1121/1.1912963 (1972).
33. Allen, E. J. & Oxenham, A. J. Symmetric interactions and interference between pitch and timbre. J. Acoust. Soc. Am. 135(3), 1371–1379. https://doi.org/10.1121/1.4863269 (2014).
34. Jesteadt, W., Wier, C. C. & Green, D. M. Intensity discrimination as a function of frequency and sensation level. J. Acoust. Soc. Am. 61(1), 169–177. https://doi.org/10.1121/1.381278 (1977).
35. Miller, J. Reaction time analysis with outlier exclusion: Bias varies with sample size. Q. J. Exp. Psychol. 43(4), 907–912. https://doi.org/10.1080/14640749108400962 (1991).
36. Vandierendonck, A. A comparison of methods to combine speed and accuracy measures of performance: A rejoinder on the binning procedure. Behav. Res. Methods 49(2), 653–673. https://doi.org/10.3758/s13428-016-0721-5 (2017).
37. Zwicker, E. & Fastl, H. Psychoacoustics: Facts and Models Vol. 22 (Springer, 2013).
38. Winkler, I. Interpreting the mismatch negativity. J. Psychophysiol. 21(3–4), 147. https://doi.org/10.1027/0269-8803.21.34.147 (2007).
39. Kaya, E. M. & Elhilali, M. Investigating bottom-up auditory attention. Front. Hum. Neurosci. 8, 1–12. https://doi.org/10.1037/0096-1523.30.1.180 (2014).
40. Southwell, R. et al. Is predictability salient? A study of attentional capture by auditory patterns. Phil. Trans. R. Soc. B 372, 20160105. https://doi.org/10.1098/rstb.2016.0105 (2017).
41. Kaya, E. M. & Elhilali, M. Modelling auditory attention. Philos. Trans. R. Soc. B Biol. Sci. https://doi.org/10.1037/0096-1523.30.1.180 (2017).
42. Simons, D. J. Attentional capture and inattentional blindness. Trends Cogn. Sci. 4(4), 147–155. https://doi.org/10.1016/S1364-6613(00)01455-8 (2000).
Acknowledgements
We thank the INSEAD staff for their help in welcoming participants and organizing participant slots during the experiments, and Claire Richards for proofreading the manuscript. The contributed work from LTDS was performed within the Labex CeLyA (ANR-10-LABX-0060).
Author contributions
All authors designed the experiments. B.B. collected the experimental data, performed analyses, and wrote the manuscript. All authors reviewed the manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-023-33496-2.

Correspondence and requests for materials should be addressed to B.B.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access is article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. e images or other third party material in this
article are included in the articles Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.
© e Author(s) 2023