RESEARCH ARTICLE
Large-Scale Analysis of Auditory Segregation Behavior Crowdsourced via a Smartphone App
Sundeep Teki1,2¤*, Sukhbinder Kumar1,2, Timothy D. Griffiths1,2

1 Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom; 2 Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, United Kingdom

¤ Current address: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
* sundeep.teki@gmail.com
Abstract
The human auditory system is adept at detecting sound sources of interest from a complex mixture of several other simultaneous sounds. The ability to selectively attend to the speech of one speaker whilst ignoring other speakers and background noise is of vital biological significance: the capacity to make sense of complex 'auditory scenes' is significantly impaired in aging populations as well as those with hearing loss. We investigated this problem by designing a synthetic signal, termed the 'stochastic figure-ground' stimulus, that captures essential aspects of complex sounds in the natural environment. Previously, we showed that under controlled laboratory conditions, young listeners sampled from the university subject pool (n = 10) performed very well in detecting targets embedded in the stochastic figure-ground signal. Here, we presented a modified version of this cocktail party paradigm as a 'game' featured in a smartphone app (The Great Brain Experiment) and obtained data from a large population with diverse demographic profiles (n = 5148). Despite differences in paradigms and experimental settings, the target-detection performance of users of the app was robust and consistent with our previous results from the psychophysical study. Our results highlight the potential of smartphone apps for capturing robust large-scale auditory behavioral data from normal healthy volunteers, an approach that can also be extended to study auditory deficits in clinical populations with hearing impairments and central auditory disorders.
Introduction
Every day, we are presented with a variety of sounds in our environment. For instance, on a quiet walk in the park we can hear the sound of birds chirping, children playing, people talking on their mobile phones, and vendors selling ice cream, amongst other sounds in the background. The ability to selectively listen to a particular sound source of interest amongst several other simultaneous sounds is an important function of hearing systems. This problem is referred to as the 'cocktail party problem' [1,2,3,4].
Citation: Teki S, Kumar S, Griffiths TD (2016) Large-Scale Analysis of Auditory Segregation Behavior Crowdsourced via a Smartphone App. PLoS ONE 11(4): e0153916. doi:10.1371/journal.pone.0153916
Editor: Warren H Meck, Duke University, United States
Received: July 29, 2015
Accepted: April 6, 2016
Published: April 20, 2016
Copyright: © 2016 Teki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: The raw data are available from Figshare: https://figshare.com/s/c99b7b30398b1151ee87.
Funding: This work is supported by a Wellcome Trust grant (WT091681MA) awarded to Timothy D. Griffiths. Sundeep Teki is supported by the Wellcome Trust (WT106084/Z/14/Z). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Auditory cortical processing in real-world environments is a fertile field of scientific pursuit [5], and an inability to perform figure-ground analysis, especially speech-in-noise detection, is one of the most disabling aspects of both peripheral hearing loss and central disorders of hearing [6,7].
Previous laboratory-based research on auditory scene analysis has conventionally employed synthetic stimuli based on simple signals such as pure tones, sequences of tones of different frequencies, or speech in noise [8,9,10]. We designed a stimulus that consists of a series of chords containing random frequencies that change from one chord to another. The stimulus, referred to as the Stochastic Figure-Ground (SFG) signal, shares some features with previous informational masking (IM) stimuli, in which masking is produced by multiple elements that do not produce energetic masking at the level of the cochlea [11,12,13]. Unlike previous IM stimuli, there is no spectral 'protection region' around the target: in the SFG paradigm, subjects are required to separate complex figures with multiple frequencies from a noisy background over the same frequency range. The SFG stimulus comprises a sequence of chords that span a fixed frequency range, and the pure tones comprising the chords change randomly from one chord to another. We incorporated a target in the middle of the signal that contains a specific number of frequencies (referred to as the 'coherence' of the stimulus) that repeat for a certain number of chords (referred to as the 'duration' of the stimulus). The SFG stimulus offers better parametric control of the salience of the figure (e.g. by changing the coherence, duration, size and density of chords), as demonstrated in previous psychophysical experiments [14,15]. The stimulus requires the grouping of multiple elements over frequency and time, similar to the segregation of speech from noise. However, unlike speech-in-noise paradigms, segregation in the SFG stimulus depends on the temporal coherence of the repeating components [15].
This paradigm, however, has only been tested in traditional laboratory settings based on limited numbers of participants (usually 10–15) who are typically undergraduate students from local universities. While this represents the conventional approach for psychophysical experiments, the recent emergence of web-based and app-based experimentation has the potential to provide large amounts of data from participants with diverse demographic and hearing profiles. In order to examine auditory segregation performance in a large and diverse pool of subjects, we customized our figure-ground segregation task [14,15] as a short engaging game for 'The Great Brain Experiment' (www.thegreatbrainexperiment.com), a large-scale cognitive science crowdsourcing app [16] developed for iOS and Android based smartphones and tablets in association with the Wellcome Trust, UK.
On every trial, participants were required to indicate, via a button press, which of the two SFG stimuli contained a target. We fixed the coherence of the figure (at 8 repeating frequencies) and varied the duration of the figure (12, 8, 6, 4 and 2 chord segments) across five increasingly difficult levels. The main aim of the experiment was to examine the utility of the app for studying auditory segregation behavior.
Another aim of the study was to assess segregation behavior as a function of age. Aging is accompanied by changes in the peripheral auditory structures, resulting in poorer detection thresholds, typically at high frequencies [17]. It represents a major challenge for hearing science and audiology because of its significant impact on quality of life and the lack of targeted treatments. Aging reduces hearing acuity through a diminished ability to use a combination of auditory cues, including frequency and duration [18], spatial cues [19], and temporal regularity [20], and through an impaired ability to process the sequential order of stimuli [21], melodies [22], and speech in noisy backgrounds [23]. However, not all auditory deficits have a cochlear basis; some have a central origin and may impair understanding of the acoustic world through higher-level deficits related to attention and working memory. Here, we predicted that older participants (50 years and above) would be impaired at the task compared to
younger participants (18–29 years) due to the poorer spectrotemporal integration that is necessary to extract the temporally coherent targets. However, given the lack of systematic controls whilst playing the app, the precise nature of such a deficit cannot be accurately determined.
The use of the app allows large-scale, well-powered studies of the effect of demographic variables such as age [24] and hearing loss on figure-ground analysis. The present study represents a proof of concept in which we demonstrate that behavior measured via the smartphone app is consistent with our previous psychophysics results [15].
Materials and Methods
Smartphone app
Our auditory figure-ground segregation experiment ('How well can I hear?') is one of a suite of eight psychological paradigms featured in the app, The Great Brain Experiment, launched by the Wellcome Trust Centre for Neuroimaging, University College London in collaboration with an external developer (White Bat Games). Initially launched as a public engagement and crowdsourcing app for iOS and Android devices in March 2013, the original release comprised four games based on working memory, attention, response inhibition and decision-making [16]. Funded by the Wellcome Trust, the app received widespread media attention and quickly garnered several thousand participants and user plays. Building upon the success of the initial release, we designed our auditory game for the next release, launched on November 21, 2013.
The study was approved by the University College London Research Ethics Committee (application number: 4354/001). Once the app was downloaded, the participant was instructed to complete a brief demographic questionnaire (age, gender, educational status, location, native language, current life satisfaction level) and provided written informed consent. At the start of each game, the participant received brief information about the scientific principles underlying the game (Fig 1A) as well as detailed information about how to play the game (Fig 1B). Participants could play a game any number of times and the number of plays was recorded; however, only the first complete play was registered as a response. Once the game was completed, a dataset with game-specific information and responses was submitted to the server (provided a stable internet connection was available). The device used to submit the first play of each participant was assigned a unique ID number (UID) and subsequent plays from that device were tagged with the same UID. However, no personal identification was recorded. At the end of each play, the participant received his/her percentile score.
Stimuli
The SFG stimulus comprises a sequence of chords containing a random number of pure-tone components that are not harmonically related. A subset of these components is repeated identically over a certain number of chords, resulting in the spontaneous percept of a coherent 'figure' emerging from the random background. The appearance of the figure embedded in a randomly varying background simulates the perception of an auditory object in noisy listening environments. Crucially, the figure can only be extracted by integrating along both frequency and time dimensions. We refer to the number of repeating components that comprise the figure as the 'coherence', and the number of chords over which they repeat as the 'duration' of the figure [14,15].
In the present study, the SFG signal comprised forty 25-ms chords, giving a total duration of 1 s (Fig 2B; coherence equal to 4 here for illustration purposes only; the actual value used in the experiment was 8). The range of frequencies in the stimulus was reduced to 0.2–2.1 kHz (from 0.2–7.2 kHz; [14,15]) due to the restrictions imposed by the sound card in the smartphone devices.
The coherence of the figure, i.e. the number of repeating components, was fixed at 8, and the onset of the figure was also fixed at 0.4 s post stimulus onset. The duration of the figure, i.e. the number of chords over which the coherent components repeat, was selected from one of five values: 12, 8, 6, 4, and 2, corresponding to the different levels of the game. The stimuli were created at a sampling rate of 44.1 kHz in MATLAB 2013b (The MathWorks Inc.) and drawn from a set of 16 different wav files for each figure duration and target condition (figure present or absent).
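For readers who wish to generate comparable signals, the following MATLAB sketch illustrates the stimulus structure described above. It is a minimal, illustrative reconstruction rather than the authors' actual stimulus code: the tones-per-chord range, the frequency spacing and the ramping are assumptions, and only the published parameters (25-ms chords, 40 chords, coherence of 8, a 0.2–2.1 kHz range, and figure onset at 0.4 s) are taken from the text.

```matlab
% Minimal sketch of SFG stimulus generation (illustrative; not the authors' code).
fs        = 44100;                 % sampling rate (Hz)
nChords   = 40;                    % 40 chords of 25 ms = 1-s stimulus
chordLen  = round(0.025 * fs);     % samples per chord
coherence = 8;                     % number of repeating (figure) components
figDur    = 6;                     % chords over which the figure repeats (example level)
figOnset  = 17;                    % chord index at which the figure starts (~0.4 s)
freqPool  = logspace(log10(200), log10(2100), 60);   % 0.2-2.1 kHz pool (assumed spacing)

figFreqs  = freqPool(randperm(numel(freqPool), coherence));   % fixed figure components
t    = (0:chordLen-1) / fs;
nr   = round(0.005 * fs);                                     % 5-ms linear ramps (assumed)
ramp = ones(1, chordLen);
ramp(1:nr) = linspace(0, 1, nr);  ramp(end-nr+1:end) = linspace(1, 0, nr);

sig = zeros(1, nChords * chordLen);
for c = 1:nChords
    nBg = randi([5 15]);                                      % background density (assumed)
    f   = freqPool(randperm(numel(freqPool), nBg));           % random background tones
    if c >= figOnset && c < figOnset + figDur
        f = [f figFreqs];                                     % add the repeating figure
    end
    chord = sum(sin(2*pi*f(:)*t), 1) .* ramp;                 % build and ramp the chord
    sig((c-1)*chordLen + (1:chordLen)) = chord;
end
sig = sig / max(abs(sig));                                    % normalize
% audiowrite('sfg_example.wav', sig, fs);                     % write a 'figure present' trial
```

Omitting the block that adds figFreqs yields a 'figure absent' stimulus of the kind used as the non-target interval.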
Procedure
The experiment was presented as a radar detection game in which the participant assumed the role of a radar operator. Before the game started, the instruction screen prompted the participant to play the game with headphones. The task required the participant to decide whether the acoustic mixture contained a signal corresponding to the target sound of a ship or not, by pressing one of two buttons on the device's touchscreen (Fig 2A).
Fig 1. Smartphone game task instructions. Participants are shown two screenshots that explain the scientific rationale of the game (left) and the context of
the game and specific task instructions (right).
doi:10.1371/journal.pone.0153916.g001
Every trial consisted of two 1-s long SFG stimuli, where one of them contained the figure while the other did not. The order of the stimuli was counterbalanced on each trial. Feedback was provided after each trial. The game consisted of five levels of five trials each, corresponding to different figure duration values: 12, 8, 6, 4, and 2 chords. The game started at an 'easy' level, with the number of chords over which the components repeat equal to 12. If the participant scored more than 40%
Fig 2. Task and stimulus. (A) During the game, the participants are shown a radar screen (left) and are required to listen to two sounds (marked A and B). The participants' task is to judge which of the two sounds contained a target. (B) The spectrogram of an SFG sound containing a target, with repeating components depicted by the black arrows, is shown on the right. Each sound is 1 s long and spans a frequency range from 0.2–2.1 kHz.
doi:10.1371/journal.pone.0153916.g002
(i.e., >2/5 correct responses) on that level, the game proceeded to the next, more difficult level with a lower duration value. On average, the game took approximately 5 minutes to complete. The score was scaled according to the difficulty level: a correct response at levels 1–5 (duration of figure: 12, 8, 6, 4, 2) was worth 1, 2, 3, 4, and 5 points respectively. At the end of the game, the participant received the final score (maximum score: 75).
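The level progression and scoring rules described above can be summarized in a short control-flow sketch. This is an assumed reconstruction, not the app's actual code: the behavior on a failed level (repeating it, with the game ending after two consecutive failures) is inferred from the exclusion criterion given in the Data analysis section, and runSfgTrial is a hypothetical placeholder that returns 1 for a correct response and 0 otherwise.

```matlab
% Sketch of level progression and scoring (assumed reconstruction of the app logic).
durations  = [12 8 6 4 2];     % figure duration (in chords) for levels 1-5
points     = 1:5;              % points per correct response at each level
totalScore = 0;  level = 1;  consecFails = 0;
while level <= 5 && consecFails < 2
    nCorrect = 0;
    for trial = 1:5
        correct    = runSfgTrial(durations(level));   % hypothetical 2AFC trial function
        nCorrect   = nCorrect + correct;
        totalScore = totalScore + correct * points(level);
    end
    if nCorrect / 5 > 0.40
        level = level + 1;  consecFails = 0;          % >40% correct: advance to a harder level
    else
        consecFails = consecFails + 1;                % <=40% correct: repeat level (assumed)
    end
end
% Maximum achievable score: 5 trials x (1 + 2 + 3 + 4 + 5) points = 75
```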
In contrast, the experimental paradigm in the psychophysical study [15] was slightly different: the SFG stimulus used 25-ms chords with a broader frequency range (0.2–7.2 kHz); the coherence (values: 1, 2, 4, 6, 8) and the duration (values: 2–10) of the figure were varied in a block design (50 trials per block); and the onset of the figure was jittered. The hit rates for the conditions corresponding to the values tested in the game (coherence of 8 and durations of 2, 4, 6, and 8) are reported for comparison.
Data analysis
Data from all participants and all plays were collated as a comma-separated value file on the app server. The relevant fields in the data were imported into MATLAB R2014b (MathWorks Inc.) for analysis using custom scripts. The primary fields of interest included the score (hit = 1, miss = 0) and response time for each of the 25 trials. Demographic information including age, gender, educational status, and type of device was also extracted. The dataset presented in this study features plays collected over the period of a year, from the launch of the game on November 21, 2013 to December 31, 2014. In all, 14451 participants (6309 females; 8142 males) played the game 33715 times (mean number of games played per participant: 2.33; standard deviation: 3.96; range: 1–68). 51.47%, 25.80%, 10.66%, 4.80% and 2.36% of all participants played the game exactly once, twice, three, four and five times respectively. Only 4.91% of participants played the game more than 5 times.
The data were subjected to rigorous exclusion criteria: data from participants below 18 years of age were rejected, as were data from games that stopped because the participant scored 40% or less on two consecutive levels (i.e. a correct score of 2 or less on the 5 trials in each level). The latter criterion applied to all games featured in the app. After removing the data from participants younger than 18 years, 11489 valid participants remained, and this number was further reduced to 7049 after rejecting participants on the basis of poor performance as described above. Response times were also measured for each trial, and any game with a response time greater than 6 s on a single trial was also excluded, finally resulting in 5148 valid participants (2093 females; 3055 males). We used a conservative threshold for reaction times (compared to psychophysical experiments) to account for variations in touch-screen response fidelity, device type, and the uncontrolled context in which the participants played the game. The majority of the resultant participant group (77.2%) spoke English as their native language. 2849 participants used iOS devices and 2299 used Android devices to play the game. In terms of educational qualifications, the number of participants educated to the level of GCSEs, A-levels, Bachelor's degrees, and postgraduate degrees was 579, 1096, 2230, and 1243 respectively. The age and gender distributions for the valid participants are shown in Fig 3.
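A compact sketch of this exclusion pipeline is given below for illustration. The column names (playNumber, age, rt_1 ... rt_25) are assumptions about the exported CSV rather than the actual field names used by the app server, and early termination of a game is approximated here by requiring response times for all 25 trials.

```matlab
% Illustrative exclusion pipeline (column names are assumed, not the app's actual schema).
T = readtable('gbe_auditory_plays.csv');              % one row per play
T = T(T.playNumber == 1, :);                          % keep only each participant's first play
T = T(T.age >= 18, :);                                % exclude participants under 18

rtCols = arrayfun(@(k) sprintf('rt_%d', k), 1:25, 'UniformOutput', false);
rt = T{:, rtCols};                                    % per-trial response times (s)

completedAll = ~any(isnan(rt), 2);                    % proxy for games not terminated early
noSlowTrials = all(rt <= 6, 2);                       % no single trial slower than 6 s
valid = T(completedAll & noSlowTrials, :);            % final set of valid participants
```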
Statistical analysis
The main analysis focused on examining the effect of the duration of the figure on performance (hit rate) and response time, as well as age-related differences in performance and response times. All statistical tests were conducted in MATLAB R2014b using built-in functions in the Statistics Toolbox. Violations of sphericity were addressed using the Greenhouse-Geisser correction. Effect sizes (partial eta squared: η²) were computed using the Measures of Effect Size toolbox in MATLAB [25].
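As an illustration of the main within-subject analysis, the sketch below runs a one-way repeated-measures ANOVA of hit rate on figure duration using Statistics Toolbox functions. It assumes a participants-by-duration matrix hitRate with columns ordered [12 8 6 4 2]; this is a minimal sketch of the approach rather than the authors' exact analysis script.

```matlab
% Minimal sketch: one-way repeated-measures ANOVA of hit rate on figure duration.
% 'hitRate' is assumed: an [nParticipants x 5] matrix, columns = durations [12 8 6 4 2].
tbl    = array2table(hitRate, 'VariableNames', {'d12','d8','d6','d4','d2'});
within = table(categorical([12 8 6 4 2])', 'VariableNames', {'Duration'});
rm     = fitrm(tbl, 'd12-d2 ~ 1', 'WithinDesign', within);   % repeated-measures model
ra     = ranova(rm);                                         % F test for the Duration effect
epsGG  = epsilon(rm);                                        % Greenhouse-Geisser epsilon
disp(ra); disp(epsGG)
```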
App Based Analysis of Auditory Segregation
PLOS ONE | DOI:10.1371/journal.pone.0153916 April 20, 2016 6/14
Results
The scores and reaction times of the valid participants were tested for normality in order to justify the use of parametric tests. The scores were normally distributed (Shapiro-Wilk W = 0.997, p < 0.001), with a mean of 43.47 (maximum score being 75) and a standard deviation of 8.01. The response times had a mean of 1.12 s and a standard deviation of 0.79 s, and were log-transformed to ensure normality (Shapiro-Wilk W = 0.98, p < 0.001).
The aim of the analysis was to determine the effect of the duration of the figure on hit rates and reaction times. We observed a main effect of duration on the hit rate: F(4,25735) = 571.9, p < 0.001, η² = 0.082. The hit rate for a duration of 2 was at chance (0.51), increased monotonically for duration values of 4 (0.55) and 6 (0.66), and then remained almost constant for durations of 8 and 12 (0.65 and 0.66 respectively), as shown in Fig 4 (blue). The pattern of responses (Fig 4, blue) is remarkably similar to the hit rates observed in the psychophysical experiment (Fig 4, red), where hit rates increased monotonically from a duration of 2 (0.45) to 4 (0.78) and then leveled off for higher duration values (0.92). The drop in performance in the game was ~25% on average, which can be attributed to the greater amount of noise in the data, given the uncontrolled acoustic and experimental settings. However, the effect size obtained in the
Fig 3. Demographics. Bar charts show the number of participants according to different age groups and gender (male in black bars, female in grey bars).
doi:10.1371/journal.pone.0153916.g003
smartphone study was approximately equal to the effect size observed for a similar range of coherence (equal to 8) and duration values (2–10) in the psychophysical experiment (η² = 0.081).
We also observed a significant effect of duration on the response times: F(4,25735) = 933.23, p < 0.001, η² = 0.127 (Fig 5). The response times were highest for the first trial (2.58 ± 1.61 s), presumably because the first trial could be perceived as the most difficult given the lack of adequate practice. With an increasing number of trials, the response times stabilized, and for trial numbers 6–25 the response times ranged from 0.99 s to 1.10 s.
Finally, we also analyzed performance and response times as a function of age, focusing on two age groups: 18–29 year olds (n = 3033) and 50–69 year olds (n = 324). The hit rates of the two groups are plotted in Fig 6A. We performed an ANOVA with group as a between-subject factor (young vs. old participants) and duration of the figure as a within-subject factor. We found a significant main effect of group: F(1,3374) = 20.10, p < 0.001, but no significant effect of duration: F(4,13496) = 0.74, p = 0.56, nor any significant interaction: F(4,13496) = 1.51, p = 0.20. Although the interaction between age and duration was not significant (p = 0.20), we performed an exploratory analysis using post hoc t-tests for each level of duration: we observed a significant difference in the hit rate only for durations of 12 (t = 5.92, p < 0.001,
Fig 4. Performance in the app vs. the psychophysics study. Hit rates are plotted as a function of the duration of the figure for data obtained from the
smartphone app (n = 5148, in blue) and the psychophysics study (n = 10, in red). Error bars depict 1 STD.
doi:10.1371/journal.pone.0153916.g004
df = 3374), 8 (t = 3.84, p = 0.0001, df = 3374), and 6 (t = 3.34, p = 0.0008, df = 3374) and not
for the more difficult duration conditions of 4 (t = -1.28, p = 0.2, df = 3374) and 2 (t = 0.75,
p = 0.46, df = 3374).
Fig 6B shows the response times for the two age groups. A similar ANOVA yielded a significant main effect of group: F(1,3374) = 20.01, p < 0.001, a main effect of duration: F(4,13496) = 4.40, p = 0.001, as well as a significant interaction: F(4,13496) = 3.78, p = 0.004. Post hoc t-tests confirmed a significant difference in response times for all but the smallest duration value: 12 (t = 3.05, p = 0.002, df = 3374), 8 (t = 2.52, p = 0.01, df = 3374), 6 (t = 3.41, p = 0.0006, df = 3374), and 4 (t = 2.05, p = 0.04, df = 3374), but not 2 (t = 0.96, p = 0.34, df = 3374).
Discussion
We demonstrate that experiments assessing a high-level aspect of auditory cognition using an established auditory paradigm can be replicated using an app, despite the uncontrolled testing environment compared to the laboratory. We present data from 5148 participants, gathered over the course of a year. Our particular game was co-launched with three other games featured in The Great Brain Experiment app.
Fig 5. Reaction Times. Reaction times (in seconds) are plotted against the duration of the figure. Error bars depict 1 SEM.
doi:10.1371/journal.pone.0153916.g005
Presented as a citizen science project (e.g. [26,27]), the app was successful in attracting tens of thousands of users through online and social media forums because of its scientific appeal and its interactive gamification of psychological experiments. The results from our auditory segregation game were consistent with results from laboratory experiments and highlight the potential of such citizen science projects for engaging with the public and replicating laboratory experiments in a large and diverse sample of participants.
The task was based on a stochastic figure-ground (SFG) stimulus developed to simulate segregation in complex acoustic scenes [14,15]. Unlike simple signals used to study segregation, such as the two-tone alternating streaming stimulus [2,3] or informational masking paradigms [11,12], segregation in the SFG stimulus can only be achieved by integrating across both frequency and time [15,28]. We have demonstrated that listeners are highly sensitive to the emergence of brief 'figures' from the random ongoing background, and that performance is significantly modulated as a function of the coherence and duration of the figure. Additionally, we showed that our behavioral data [15] are consistent with the predictions of the temporal coherence theory of auditory scene analysis [29,30]. The temporal coherence model is based on a mechanism that captures the extent to which activity in distinct neuronal populations that encode different perceptual features is correlated in time.
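To make this idea concrete, the toy sketch below computes pairwise temporal correlations between frequency-channel envelopes and groups the strongly correlated channels. It is only a caricature of the temporal coherence models cited above [29,30], which use windowed coherence at multiple time scales and proper clustering rather than a single correlation matrix and threshold; the variable envelopes and the threshold value are assumptions.

```matlab
% Toy illustration of the temporal-coherence principle (not the model of [29,30]).
% 'envelopes' is assumed: an [nChannels x nFrames] matrix of frequency-channel
% activations, e.g. rows of a spectrogram of an SFG stimulus.
C = corrcoef(envelopes');                   % nChannels x nChannels temporal correlations
meanCoh = mean(C - eye(size(C)), 2);        % average correlation with the other channels
figureChannels = find(meanCoh > 0.3);       % channels that covary strongly (assumed threshold)
% Channels carrying the repeating figure components should cluster together here,
% whereas the randomly varying background channels remain largely uncorrelated.
```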
We used this paradigm to study auditory segregation as the SFG stimulus is associated with
a short build-up (figure duration varied from 50-350ms in original experiments) whereas for
streaming signals, build-up takes 2-3s in quiet environments [31,32,33]. The build-up of audi-
tory segregation represented an important practical factor related to the overall time required
to complete the experiment. With a short build-up for the SFG signal, the task took approxi-
mately 5 minutes. However, streaming paradigms would have taken much longer to complete,
given the slower build-up that may be further accentuated by the lack of a controlled acoustic
environment when playing on the app.
We fixed the number of repeating components, i.e. the coherence of the figure, at eight and varied the number of chords over which they repeat. Performance was found to vary significantly with the duration of the figure, replicating our earlier results [14,15]. We also found a main effect of duration on the response times, i.e. response times decreased with increasing duration of the figure. Compared to the results obtained from the psychophysical studies, performance on the app was significantly impaired. This may be attributed to a number of differences: in experimental design (a 1-alternative forced choice design for the laboratory experiments vs. a 2-alternative forced choice design for the app); in experimental setup (laboratory experiments were conducted in soundproof booths with participants listening to the stimuli over headphones at controlled sound levels, whilst there was no such experimental or acoustic control for the app); and in training and practice of the experimental task (participants in the psychophysics studies received adequate instruction about the stimuli and also practiced the task, whilst the app provided only minimal instruction about the scientific rationale of the study and the response requirements).
Data from the app are associated with greater within-subject measurement noise, reflected in lower hit rates, as well as greater between-subject noise, reflected in the higher variance of the population data (see Fig 4). Thus, compared to standard psychophysical settings, experimenting on the app is associated with greater noise both at the input level (sensory signal) and at the output level (response).
Fig 6. Performance and reaction times for two different age groups. (A) Hit rates are plotted for two different age groups: 18–29 year olds (n = 3033, in blue) and 50–69 year olds (n = 324, in red). (B) Reaction times for the younger and older sets of participants, as above, are shown in blue and red respectively. Error bars depict 1 SEM.
doi:10.1371/journal.pone.0153916.g006
The large sample of participants allowed us to analyze segregation behavior as a function of age. Based on previous research (see Introduction), we expected older participants (aged 50–69 years) to be worse at the task than a younger cohort (18–29 years). Performance accuracy as well as response times were modulated by age, i.e. we found significantly lower hit rates and longer response times in the older versus the younger participants. Although peripheral hearing declines with normal aging, e.g. due to loss of hair cells and spiral ganglion neurons [34], there are multiple factors that contribute to poor scene analysis abilities in older adults. Aging affects frequency resolution, duration discrimination, spatial localization, melody perception, and speech comprehension in noisy backgrounds [18,19,20,21,22,23]. In our experimental paradigm, we have previously demonstrated that segregation of the figures from the background relies on temporal coherence [15]. Recent work has demonstrated that, in addition to spectral features, temporal coherence is a vital cue in scene analysis and promotes integration, whilst temporal incoherence leads to segregation [35,36,37]. Our results provide the first demonstration that the use of temporal coherence as a cue for scene analysis may worsen with age, a result that needs to be confirmed under controlled psychophysical settings. However, since the neural substrate of temporal coherence analysis is not yet known [29], it is difficult to ascertain whether the behavioral deficit with aging is associated with peripheral or central auditory pathways.
Although smartphone experiments are useful for gathering data from a large number of potentially very diverse participants, and can link datasets across time and tasks with user-specific IDs, their use must be considered carefully according to the needs of the study. Recruitment of participants via an app can be a demanding task, requiring constant use of press and social media to attract new users. For a smaller number of participants (e.g. a few hundred), web-based testing or recruitment through Amazon's Mechanical Turk [38] may be more beneficial. App development represents the main cost, as it is best outsourced to professional developers. Another drawback is the limited technical specification that can be harnessed on a smartphone, as opposed to web-based experiments or laboratory experiments run on computers or laptops. Nevertheless, the benefits outweigh the limitations, and previous work based on this app [16,39,40] highlights the advantages of running large-scale experiments that show results consistent with experiments conducted in the laboratory. Apart from scientific applications, such app-based games are also an effective means of engaging the public with scientific research.
In summary, we demonstrate that standard psychoacoustic experimental results can be replicated effectively using smartphone apps. The observed effect sizes are similar despite the noisier and more limited data and the uncontrolled acoustic and experimental environment. These results highlight the utility of smartphone applications for collecting large-scale auditory behavioral data for basic research as well as for clinical investigations.
Acknowledgments
Our thanks to Neil Millstone of White Bat Games for developing the app. We also thank Rick
Adams, Harriet Brown, Peter Smittenaar and Peter Zeidman for help with app development;
Ric Davis, Chris Freemantle, and Rachael Maddock for supporting data collection; and Dan
Jackson (University College London) and Craig Brierley and Chloe Sheppard (Wellcome
Trust).
Author Contributions
Conceived and designed the experiments: ST SK TDG. Analyzed the data: ST. Wrote the
paper: ST SK TDG.
References
1. Cherry EC (1953) Some experiments on the recognition of speech, with one and with two ears. J Acoust Soc Am 25(5): 975–979.
2. van Noorden LPAS (1975) Temporal coherence in the perception of tone sequences. University of Technology, Eindhoven, The Netherlands.
3. Bregman AS (1990) Auditory scene analysis: the perceptual organization of sound. MIT Press.
4. McDermott JH (2009) The cocktail party problem. Curr Biol 19: R1024–R1027. doi: 10.1016/j.cub.2009.09.005 PMID: 19948136
5. Nelken I, Bizley J, Shamma SA, Wang X (2014) Auditory cortical processing in real-world listening: the auditory system going real. J Neurosci 34: 15135–15138. doi: 10.1523/JNEUROSCI.2989-14.2014 PMID: 25392481
6. Humes LE, Watson BU, Christensen LA, Cokely CG, Halling DC, Lee L (1994) Factors associated with individual differences in clinical measures of speech recognition among the elderly. J Speech Hear Res 37(2): 465–474. PMID: 8028328
7. Zekveld AA, Kramer SE, Festen JM (2011) Cognitive load during speech perception in noise: the influence of age, hearing loss, and cognition on the pupil response. Ear Hear 32(4): 498–510. doi: 10.1097/AUD.0b013e31820512bb PMID: 21233711
8. Snyder JS, Gregg MK, Weintraub DM, Alain C (2012) Attention, awareness, and the perception of auditory scenes. Front Psychol 3: 15. doi: 10.3389/fpsyg.2012.00015 PMID: 22347201
9. Gutschalk A, Dykstra AR (2014) Functional imaging of auditory scene analysis. Hear Res 307: 98–110. doi: 10.1016/j.heares.2013.08.003 PMID: 23968821
10. Akram S, Englitz B, Elhilali M, Simon JZ, Shamma SA (2014) Investigating the neural correlates of a streaming percept in an informational-masking paradigm. PLoS ONE 9: e114427. doi: 10.1371/journal.pone.0114427 PMID: 25490720
11. Kidd G, Mason CR, Deliwala PS, Woods WS, Colburn HS (1994) Reducing informational masking by sound segregation. J Acoust Soc Am 95: 3475–3480. PMID: 8046139
12. Kidd G, Mason CR, Dai H (1995) Discriminating coherence in spectro-temporal patterns. J Acoust Soc Am 97: 3782–3790. PMID: 7790656
13. Gutschalk A, Micheyl C, Oxenham AJ (2008) Neural correlates of auditory perceptual awareness under informational masking. PLoS Biol 6: e138. doi: 10.1371/journal.pbio.0060138 PMID: 18547141
14. Teki S, Chait M, Kumar S, von Kriegstein K, Griffiths TD (2011) Brain bases for auditory stimulus-driven figure-ground segregation. J Neurosci 31: 164–171. doi: 10.1523/JNEUROSCI.3788-10.2011 PMID: 21209201
15. Teki S, Chait M, Kumar S, Shamma S, Griffiths TD (2013) Segregation of complex acoustic scenes based on temporal coherence. Elife 2: e00699. doi: 10.7554/eLife.00699 PMID: 23898398
16. Brown HR, Zeidman P, Smittenaar P, Adams RA, McNab F, Rutledge RB, et al. (2014) Crowdsourcing for cognitive science – the utility of smartphones. PLoS ONE 9: e100662. doi: 10.1371/journal.pone.0100662 PMID: 25025865
17. Morrell CH, Gordon-Salant S, Pearson JD, Brant LJ, Fozard JL (1996) Age- and gender-specific reference changes for hearing level and longitudinal changes in hearing level. J Acoust Soc Am 100: 1949–1967. PMID: 8865630
18. Abel SM, Krever EM, Alberti PW (1990) Auditory detection, discrimination and speech processing in ageing, noise-sensitive and hearing-impaired listeners. Scand Audiol 19: 43–54. PMID: 2336540
19. Abel SM, Sass-Kortsak A, Naugler JJ (2000) The role of high-frequency hearing in age-related speech understanding deficits. Scand Audiol 29(3): 131–138. PMID: 10990011
20. Rimmele J, Schroger E, Bendixen A (2012) Age-related changes in the use of regular patterns for auditory scene analysis. Hear Res 289: 98–107.
21. Trainor LJ, Trehub SE (1989) Aging and auditory temporal sequencing: ordering the elements of repeating tone patterns. Percept Psychophys 45(5): 417–426. PMID: 2726404
22. Lynch MP, Steffens ML (1994) Effects of aging on processing of novel musical structure. J Gerontol 49(4): P165–P172. PMID: 8014397
23. Duquesnoy AJ (1983) Effect of a single interfering noise or speech source upon the binaural sentence intelligibility of aged persons. J Acoust Soc Am 74(3): 739–743. PMID: 6630729
24. Rimmele JM, Sussman E, Poeppel D (2015) The role of temporal structure in the investigation of sensory memory, auditory scene analysis, and speech perception: a healthy-aging perspective. Int J Psychophysiol 95(2): 175–183. doi: 10.1016/j.ijpsycho.2014.06.010 PMID: 24956028
25. Hentschke H, Stüttgen MC (2011) Computation of measures of effect size for neuroscience data sets. Eur J Neurosci 34(12): 1887–1894. doi: 10.1111/j.1460-9568.2011.07902.x PMID: 22082031
26. Von Ahn L, Maurer B, McMillen C, Abraham D, Blum M (2008) reCAPTCHA: human-based character recognition via Web security measures. Science 321: 1465–1468. doi: 10.1126/science.1160379 PMID: 18703711
27. Khatib F, Cooper S, Tyka MD, Xu K, Makedon I, Popovic Z, et al. (2011) Algorithm discovery by protein folding game players. Proc Natl Acad Sci USA 108: 18949–18953. doi: 10.1073/pnas.1115898108 PMID: 22065763
28. Dykstra AR, Gutschalk A (2013) Time is of the essence for auditory scene analysis. Elife 2: e01136. doi: 10.7554/eLife.01136 PMID: 23898402
29. Shamma SA, Elhilali M, Micheyl C (2011) Temporal coherence and attention in auditory scene analysis. Trends Neurosci 34: 114–123. doi: 10.1016/j.tins.2010.11.002 PMID: 21196054
30. Krishnan L, Elhilali M, Shamma S (2014) Segregating complex sound sources through temporal coherence. PLoS Comput Biol 10: e1003985. doi: 10.1371/journal.pcbi.1003985 PMID: 25521593
31. Fishman YI, Reser DH, Arezzo JC, Steinschneider M (2001) Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res 151: 167–187. PMID: 11124464
32. Fishman YI, Arezzo JC, Steinschneider M (2004) Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J Acoust Soc Am 116: 1656–1670. PMID: 15478432
33. Fishman YI, Steinschneider M (2010) Formation of auditory streams. In: Rees A, Palmer AR, editors. The Oxford Handbook of Auditory Science. Oxford University Press.
34. Perez P, Bao J (2011) Why do hair cells and spiral ganglion neurons in the cochlea die during aging? Aging Dis 2: 231–241. PMID: 22396875
35. Elhilali M, Ma L, Micheyl C, Oxenham AJ, Shamma SA (2009) Temporal coherence in the perceptual organization and cortical representation of auditory scenes. Neuron 61(2): 317–329. doi: 10.1016/j.neuron.2008.12.005 PMID: 19186172
36. Micheyl C, Hanson C, Demany L, Shamma S, Oxenham AJ (2013) Auditory stream segregation for alternating and synchronous tones. J Exp Psychol Hum Percept Perform 39(6): 1568–1580. doi: 10.1037/a0032241 PMID: 23544676
37. Micheyl C, Kreft H, Shamma S, Oxenham AJ (2013) Temporal coherence versus harmonicity in auditory stream formation. J Acoust Soc Am 133(3): 188–194.
38. Crump MJC, McDonnell JV, Gureckis TM (2013) Evaluating Amazon's Mechanical Turk as a tool for experimental behavioral research. PLoS One 8: e57410. doi: 10.1371/journal.pone.0057410 PMID: 23516406
39. McNab F, Dolan RJ (2014) Dissociating distractor-filtering at encoding and during maintenance. J Exp Psychol Hum Percept Perform 40: 960–967. doi: 10.1037/a0036013 PMID: 24512609
40. Rutledge RB, Skandali N, Dayan P, Dolan RJ (2014) A computational and neural model of momentary subjective well-being. Proc Natl Acad Sci USA 111: 12252–12257. doi: 10.1073/pnas.1407535111 PMID: 25092308
... Our study achieves both exemplary breadth of different abilities and depth of volunteer participation compared to other game-based population-scale assessment studies such as SeaHero Quest and The Great Brain Experiment (H. R. Brown et al., 2014;Coughlan et al., 2019;Coutrot et al., 2018;Hunt et al., 2016;McNab et al., 2015;Rutledge et al., 2014Rutledge et al., , 2016Smittenaar et al., 2015;Teki et al., 2016). This is a positive step towards comprehensive citizen involvement in the construction of complex cognitive studies in the future. ...
Article
Full-text available
Rapid individual cognitive phenotyping holds the potential to revolutionize domains as wide-ranging as personalized learning, employment practices, and precision psychiatry. Going beyond limitations imposed by traditional lab-based experiments, new efforts have been underway toward greater ecological validity and participant diversity to capture the full range of individual differences in cognitive abilities and behaviors across the general population. Building on this, we developed Skill Lab, a novel game-based tool that simultaneously assesses a broad suite of cognitive abilities while providing an engaging narrative. Skill Lab consists of six mini-games as well as 14 established cognitive ability tasks. Using a popular citizen science platform (N = 10,725), we conducted a comprehensive validation in the wild of a game-based cognitive assessment suite. Based on the game and validation task data, we constructed reliable models to simultaneously predict eight cognitive abilities based on the users' in-game behavior. Follow-up validation tests revealed that the models can discriminate nuances contained within each separate cognitive ability as well as capture a shared main factor of generalized cognitive ability. Our game-based measures are five times faster to complete than the equivalent task-based measures and replicate previous findings on the decline of certain cognitive abilities with age in our large cross-sectional population sample (N = 6369). Taken together, our results demonstrate the feasibility of rapid in-the-wild systematic assessment of cognitive abilities as a promising first step toward population-scale benchmarking and individualized mental health diagnostics.
... These data demonstrate that remote test setting could be a viable alternative to laboratory for the study of emotion perception and would join a growing list of paradigms that have been successfully implemented remotely for research (e.g. Teki, Kumar, and Griffiths 2016;Paglialonga et al. 2020) and clinical purposes (e.g. Lancaster et al. 2008;Swanepoel, Koekemoer, and Clark 2010;Leensen et al. 2011). ...
Article
Objective To evaluate remote testing as a tool for measuring emotional responses to non-speech sounds. Design Participants self-reported their hearing status and rated valence and arousal in response to non-speech sounds on an Internet crowdsourcing platform. These ratings were compared to data obtained in a laboratory setting with participants who had confirmed normal or impaired hearing. Study sample Adults with normal and impaired hearing. Results In both settings, participants with hearing loss rated pleasant sounds as less pleasant than did their peers with normal hearing. The difference in valence ratings between groups was generally smaller when measured in the remote setting than in the laboratory setting. This difference was the result of participants with normal hearing rating sounds as less extreme (less pleasant, less unpleasant) in the remote setting than did their peers in the laboratory setting, whereas no such difference was noted for participants with hearing loss. Ratings of arousal were similar from participants with normal and impaired hearing; the similarity persisted in both settings. Conclusions In both test settings, participants with hearing loss rated pleasant sounds as less pleasant than did their normal hearing counterparts. Future work is warranted to explain the ratings of participants with normal hearing.
... Our study achieves both exemplary breadth of different abilities and depth of volunteer participation compared to other game-based population-scale assessment studies such as SeaHero Quest and The Great Brain Experiment (H. R. Brown et al., 2014;Coughlan et al., 2019;Coutrot et al., 2018;Hunt et al., 2016;McNab et al., 2015;Rutledge et al., 2014Rutledge et al., , 2016Smittenaar et al., 2015;Teki et al., 2016). This is a positive step towards comprehensive citizen involvement in the construction of complex cognitive studies in the future. ...
Preprint
Full-text available
Rapid individual cognitive phenotyping holds the potential to revolutionize domains as wide-ranging as personalized learning, employment practices, and precision psychiatry. Going beyond limitations imposed by traditional lab-based experiments, new efforts have been underway towards greater ecological validity and participant diversity to capture the full range of individual differences in cognitive abilities and behaviors across the general population. Building on this, we developed Skill Lab, a novel game-based tool that simultaneously assesses a broad suite of cognitive abilities while providing an engaging narrative. Skill Lab consists of six mini-games as well as 14 established cognitive ability tasks. Using a popular citizen science platform (N = 10725), we conducted a comprehensive validation in the wild of a game-based cognitive assessment suite. Based on the game and validation task data, we constructed reliable models to simultaneously predict eight cognitive abilities based on the users’ in-game behavior. Follow-up validation tests revealed that the models can discriminate nuances contained within each separate cognitive ability as well as capture a shared main factor of generalized cognitive ability. Our game-based measures are five times faster to complete than the equivalent task-based measures and replicate previous findings on the decline of certain cognitive abilities with age in our large cross-sectional population sample (N = 6369). Taken together, our results demonstrate the feasibility of rapid in-the-wild systematic assessment of cognitive abilities as a promising first step towards population-scale benchmarking and individualized mental health diagnostics.
... Nonetheless, consistent with the reported successful use of Smartphones in assessing cognitive abilities (e.g. [24,27,29]), physiological functions [41,42] and auditory processes [43], our results demonstrate the potential use of Smartphone applications for psychophysical measures of human visual performance. In particular, we suggest that Smartphones are a valid, convenient and cost-effective means of assessing the effect of everyday mood changes on the ability of individuals to perform visual searches (see also [44]). ...
Article
Full-text available
The study of visual perception has largely been completed without regard to the influence that an individual’s emotional status may have on their performance in visual tasks. However, there is a growing body of evidence to suggest that mood may affect not only creative abilities and interpersonal skills but also the capacity to perform low-level cognitive tasks. Here, we sought to determine whether rudimentary visual search processes are similarly affected by emotion. Specifically, we examined whether an individual’s perceived happiness level affects their ability to detect a target in noise. To do so, we employed pop-out and serial visual search paradigms, implemented using a novel smartphone application that allowed search times and self-rated levels of happiness to be recorded throughout each twenty-four-hour period for two weeks. This experience sampling protocol circumvented the need to alter mood artificially with laboratory-based induction methods. Using our smartphone application, we were able to replicate the classic visual search findings, whereby pop-out search times remained largely unaffected by the number of distractors whereas serial search times increased with increasing number of distractors. While pop-out search times were unaffected by happiness level, serial search times with the maximum numbers of distractors (n = 30) were significantly faster for high happiness levels than low happiness levels (p = 0.02). Our results demonstrate the utility of smartphone applications in assessing ecologically valid measures of human visual performance. We discuss the significance of our findings for the assessment of basic visual functions using search time measures, and for our ability to search effectively for targets in real world settings.
Article
Full-text available
Several assistive technologies (ATs) have been manufactured and tested to alleviate the challenges of deaf or hearing-impaired people (DHI). One such technology is sound detection, which has the potential to enhance the experiences of DHI individuals and provide them with new opportunities. However, there is a lack of sufficient research on using sound detection as an assistive technology, specifically for DHI individuals. This systematic literature review (SLR) aims to shed light on the application of non-verbal sound detection technology in skill development for DHI individuals. This SLR encompassed recent, high-quality studies from the prestigious databases of IEEE, ScienceDirect, Scopus, and Web of Science from 2014 to 2023. Twenty-six articles that met the eligibility criteria were carefully analyzed and synthesized. The findings of this study underscore the significance of utilizing sound detection technology to aid DHI individuals in achieving independence, access to information, and safety. It is recommended that additional studies be conducted to explore the use of sound detection tools as assistive technology, to enhance DHI individual’s sustainable quality of life.
Article
Full-text available
Background Occupational stress has huge financial as well as human costs. Application of crowdsourcing might be a way to strengthen the investigation of occupational mental health. Therefore, the aim of the study was to assess Danish employees’ stress and cognition by relying on a crowdsourcing approach, as well as investigating the effect of a 30-day mindfulness and music intervention. Methods We translated well-validated neuropsychological laboratory- and task-based paradigms into an app-based platform using cognitive games measuring sustained attention and working memory and measuring stress via. Cohen’s Perceived Stress Scale. A total of 623 healthy volunteers from Danish companies participated in the study and were randomized into three groups, which consisted of a 30-day intervention of either mindfulness or music, or a non-intervention control group. Results Participants in the mindfulness group showed a significant improvement in the coefficient of sustained attention, working memory capacity and perceived stress ( p < .001). The music group showed a 38% decrease of self-perceived stress. The control group showed no difference from pre to post in the survey or cognitive outcome measures. Furthermore, there was a significant correlation between usage of the mindfulness and music app and elevated score on both the cognitive games and the perceived stress scale. Conclusion The study supports the nascent field of crowdsourcing by being able to replicate data collected in previous well-controlled laboratory studies from a range of experimental cognitive tasks, making it an effective alternative. It also supports mindfulness as an effective intervention in improving mental health in the workplace.
Article
Full-text available
Translation in cognitive neuroscience remains beyond the horizon, brought no closer by supposed major advances in our understanding of the brain. Unless our explanatory models descend to the individual level—a cardinal requirement for any intervention—their real-world applications will always be limited. Drawing on an analysis of the informational properties of the brain, here we argue that adequate individualisation needs models of far greater dimensionality than has been usual in the field. This necessity arises from the widely distributed causality of neural systems, a consequence of the fundamentally adaptive nature of their developmental and physiological mechanisms. We discuss how recent advances in high-performance computing, combined with collections of large-scale data, enable the high-dimensional modelling we argue is critical to successful translation, and urge its adoption if the ultimate goal of impact on the lives of patients is to be achieved.
Article
Full-text available
Our ability to make sense of the auditory world results from neural processing that begins in the ear, goes through multiple subcortical areas, and continues in the cortex. The specific contribution of the auditory cortex to this chain of processing is far from understood. Although many of the properties of neurons in the auditory cortex resemble those of subcortical neurons, they show somewhat more complex selectivity for sound features, which is likely to be important for the analysis of natural sounds, such as speech, in real-life listening conditions. Furthermore, recent work has shown that auditory cortical processing is highly context-dependent, integrates auditory inputs with other sensory and motor signals, depends on experience, and is shaped by cognitive demands, such as attention. Thus, in addition to being the locus for more complex sound selectivity, the auditory cortex is increasingly understood to be an integral part of the network of brain regions responsible for prediction, auditory perceptual decision-making, and learning. In this review, we focus on three key areas that are contributing to this understanding: the sound features that are preferentially represented by cortical neurons, the spatial organization of those preferences, and the cognitive roles of the auditory cortex.
Article
Full-text available
Translation in cognitive neuroscience remains beyond the horizon, brought no closer by supposed major advances in our understanding of the brain. Unless our explanatory models descend to the individual level—a cardinal requirement for any intervention—their real-world applications will always be limited. Drawing on an analysis of the informational properties of the brain, here we argue that adequate individualisation needs models of far greater dimensionality than has been usual in the field. This necessity arises from the widely distributed causality of neural systems, a consequence of the fundamentally adaptive nature of their developmental and physiological mechanisms. We discuss how recent advances in high-performance computing, combined with collections of large-scale data, enable the high-dimensional modelling we argue is critical to successful translation, and urge its adoption if the ultimate goal of impact on the lives of patients is to be achieved.
Article
Psychophysical experiments conducted remotely over the internet permit data collection from large numbers of participants but sacrifice control over sound presentation and therefore are not widely employed in hearing research. To help standardize online sound presentation, we introduce a brief psychophysical test for determining whether online experiment participants are wearing headphones. Listeners judge which of three pure tones is quietest, with one of the tones presented 180° out of phase across the stereo channels. This task is intended to be easy over headphones but difficult over loudspeakers due to phase-cancellation. We validated the test in the lab by testing listeners known to be wearing headphones or listening over loudspeakers. The screening test was effective and efficient, discriminating between the two modes of listening with a small number of trials. When run online, a bimodal distribution of scores was obtained, suggesting that some participants performed the task over loudspeakers despite instructions to use headphones. The ability to detect and screen out these participants mitigates concerns over sound quality for online experiments, a first step toward opening auditory perceptual research to the possibilities afforded by crowdsourcing.
Article
Full-text available
In contrast to the complex acoustic environments we encounter every day, most studies of auditory segregation have used relatively simple signals. Here, we synthesized a new stimulus to examine the detection of coherent patterns ('figures') from overlapping 'background' signals. In a series of experiments, we demonstrate that human listeners are remarkably sensitive to the emergence of such figures and can tolerate a variety of spectral and temporal perturbations. This robust behavior is consistent with the existence of automatic auditory segregation mechanisms that are highly sensitive to correlations across frequency and time. The observed behavior cannot be explained purely on the basis of adaptation-based models used to explain the segregation of deterministic narrowband signals. We show that the present results are consistent with the predictions of a model of auditory perceptual organization based on temporal coherence. Our data thus support a role for temporal coherence as an organizational principle underlying auditory segregation.
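As a rough illustration of the kind of figure-ground signal described above, the sketch below builds a sequence of random multi-tone chords in which a small, fixed set of frequency components repeats over several consecutive chords (the 'figure'); all parameter values (chord duration, frequency pool, figure size and duration) are assumptions for illustration, not those of the published stimulus.

```python
import numpy as np

def sfg_stimulus(fs=44100, n_chords=40, chord_dur=0.05,
                 n_background=10, n_figure=4, figure_onset=20,
                 figure_len=10, seed=0):
    """Sequence of random multi-tone chords ('ground') in which a fixed
    set of n_figure frequencies repeats over figure_len consecutive
    chords (the 'figure'). All parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    pool = np.geomspace(180.0, 7000.0, 120)          # candidate frequencies
    figure_freqs = rng.choice(pool, n_figure, replace=False)
    t = np.arange(int(chord_dur * fs)) / fs
    ramp = np.hanning(200)
    env = np.ones_like(t)
    env[:100], env[-100:] = ramp[:100], ramp[100:]   # avoid clicks at chord edges
    chords = []
    for k in range(n_chords):
        freqs = list(rng.choice(pool, n_background, replace=False))
        if figure_onset <= k < figure_onset + figure_len:
            freqs += list(figure_freqs)              # coherent 'figure' components
        chord = sum(np.sin(2 * np.pi * f * t) for f in freqs) * env
        chords.append(chord / len(freqs))
    return np.concatenate(chords).astype(np.float32)
```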
Article
Full-text available
A new approach for the segregation of monaural sound mixtures is presented based on the principle of temporal coherence and using auditory cortical representations. Temporal coherence is the notion that perceived sources emit coherently modulated features that evoke highly coincident neural response patterns. By clustering the feature channels with coincident responses and reconstructing their input, one may segregate the underlying source from the simultaneously interfering signals that are uncorrelated with it. The proposed algorithm requires no prior information or training on the sources. It can, however, gracefully incorporate cognitive functions and influences such as memories of a target source or attention to a specific set of its attributes so as to segregate it from its background. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of this ubiquitous and remarkable perceptual ability, and of its psychophysical manifestations in navigating complex sensory environments.
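The sketch below is a deliberately simplified, hypothetical illustration of the grouping-by-coincidence idea on an ordinary spectrogram, not the published cortical model: channels whose magnitude envelopes correlate strongly over time with an attended 'anchor' channel are retained, and all others are suppressed before resynthesis.

```python
import numpy as np
from scipy.signal import stft, istft

def coherence_mask(x, fs, anchor_hz, thresh=0.5):
    """Toy coherence-based grouping: keep spectrogram channels whose
    magnitude envelopes correlate with the envelope of an attended
    'anchor' frequency channel, then resynthesize the masked signal."""
    f, t, Z = stft(x, fs, nperseg=1024)
    mag = np.abs(Z)                                   # (n_freq, n_frames)
    anchor = np.argmin(np.abs(f - anchor_hz))         # attended channel index
    a = mag[anchor] - mag[anchor].mean()
    centered = mag - mag.mean(axis=1, keepdims=True)
    # correlation of each channel's envelope with the anchor envelope
    corr = (centered @ a) / (np.linalg.norm(centered, axis=1)
                             * np.linalg.norm(a) + 1e-12)
    mask = (corr > thresh).astype(float)[:, None]     # binary channel mask
    _, x_fg = istft(Z * mask, fs, nperseg=1024)
    return x_fg, corr
```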
Article
Full-text available
Humans routinely segregate a complex acoustic scene into different auditory streams, through the extraction of bottom-up perceptual cues and the use of top-down selective attention. To determine the neural mechanisms underlying this process, neural responses obtained through magnetoencephalography (MEG) were correlated with behavioral performance in the context of an informational masking paradigm. In half the trials, subjects were asked to detect frequency deviants in a target stream, consisting of a rhythmic tone sequence, embedded in a separate masker stream composed of a random cloud of tones. In the other half of the trials, subjects were exposed to identical stimuli but asked to perform a different task: to detect tone-length changes in the random cloud of tones. To verify that the normalized neural response to the target sequence served as an indicator of streaming, we correlated neural responses with behavioral performance under a variety of stimulus parameters (target tone rate, target tone frequency, and the 'protection zone', that is, the spectral area with no tones around the target frequency) and attentional states (changing the task objective while maintaining the same stimuli). In all conditions that facilitated target/masker streaming behaviorally, MEG normalized neural responses also changed in a manner consistent with the behavior. Thus, attending to the target stream caused a significant increase in the power and phase coherence of responses in recording channels, which correlated with an increase in the listeners' behavioral performance. Normalized neural target responses also increased as the protection zone widened and as the frequency of the target tones increased. Finally, when the target sequence rate increased, the buildup of the normalized neural responses was significantly faster, mirroring the accelerated buildup of the streaming percepts. Our data thus support close links between the perceptual and neural consequences of auditory stream segregation.
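For readers who want to picture the stimuli, the sketch below generates a rhythmic target stream embedded in a random masker cloud with a spectral protection zone around the target frequency; the rates, durations, and protection-zone width are illustrative assumptions rather than the parameters used in the study.

```python
import numpy as np

def target_in_cloud(fs=44100, dur=2.0, target_hz=1000.0, target_rate=4.0,
                    tone_dur=0.05, n_masker=80, protect_semitones=6, seed=0):
    """Rhythmic target tones embedded in a random 'cloud' of masker tones,
    with a spectral protection zone (no masker tones) around the target
    frequency. All parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(tone_dur * fs)) / fs
    win = np.hanning(t.size)
    sig = np.zeros(int(dur * fs))

    def add_tone(freq, onset):
        i = int(onset * fs)
        n = min(t.size, sig.size - i)
        sig[i:i + n] += (np.sin(2 * np.pi * freq * t) * win)[:n]

    # regular (rhythmic) target stream
    for onset in np.arange(0.0, dur - tone_dur, 1.0 / target_rate):
        add_tone(target_hz, onset)
    # random masker cloud, rejecting frequencies inside the protection zone
    lo = target_hz * 2 ** (-protect_semitones / 12)
    hi = target_hz * 2 ** (protect_semitones / 12)
    for onset in rng.uniform(0.0, dur - tone_dur, n_masker):
        f = rng.uniform(200.0, 5000.0)
        while lo < f < hi:
            f = rng.uniform(200.0, 5000.0)
        add_tone(f, onset)
    return (sig / np.abs(sig).max()).astype(np.float32)
```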
Article
Full-text available
The auditory sense of humans transforms intrinsically senseless pressure waveforms into spectacularly rich perceptual phenomena: the music of Bach or the Beatles, the poetry of Li Bai or Omar Khayyam, or, more prosaically, the sense of a world filled with sound-emitting objects that is so important for those of us lucky enough to have hearing. Whereas the early representations of sounds in the auditory system are based on their physical structure, higher auditory centers are thought to represent sounds in terms of their perceptual attributes. In this symposium, we will illustrate current research into this process using four case studies. We will illustrate how the spectral and temporal properties of sounds are used to bind together, segregate, categorize, and interpret sound patterns on their way to acquiring meaning, with important lessons for other sensory systems as well.
Article
Full-text available
A common question in the social science of well-being asks, “How happy do you feel on a scale of 0 to 10?” Responses are often related to life circumstances, including wealth. By asking people about their feelings as they go about their lives, ongoing happiness and life events have been linked, but the neural mechanisms underlying this relationship are unknown. To investigate it, we presented subjects with a decision-making task involving monetary gains and losses and repeatedly asked them to report their momentary happiness. We built a computational model in which happiness reports were construed as an emotional reactivity to recent rewards and expectations. Using functional MRI, we demonstrated that neural signals during task events account for changes in happiness.
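For context, the computational model reported in that study expresses momentary happiness as a weighted sum of exponentially discounted recent task variables. As we recall the published formulation, it takes roughly the following form, where CR denotes certain rewards from chosen sure options, EV the expected values of chosen gambles, RPE reward prediction errors, w_0 to w_3 fitted weights, and gamma a forgetting factor; the exact parameterisation should be checked against the original paper.

```latex
% Approximate form of the momentary-happiness model (to be verified
% against the original publication); gamma in (0, 1) discounts older trials.
\mathrm{Happiness}(t) = w_0
  + w_1 \sum_{j=1}^{t} \gamma^{\,t-j}\,\mathrm{CR}_j
  + w_2 \sum_{j=1}^{t} \gamma^{\,t-j}\,\mathrm{EV}_j
  + w_3 \sum_{j=1}^{t} \gamma^{\,t-j}\,\mathrm{RPE}_j
```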
Article
Full-text available
By 2015, there will be an estimated two billion smartphone users worldwide. This technology presents exciting opportunities for cognitive science as a medium for rapid, large-scale experimentation and data collection. At present, cost and logistics limit most study populations to small samples, restricting the experimental questions that can be addressed. In this study, we investigated whether the mass collection of experimental data using smartphone technology is valid, given the variability of data collection outside of a laboratory setting. We presented four classic experimental paradigms as short games, available as a free app, and over the first month 20,800 users submitted data. We found that the large sample size vastly outweighed the noise inherent in collecting data outside a controlled laboratory setting, and showed that canonical results were reproduced for all four games. For the first time, we provide experimental validation for the use of smartphones for data collection in cognitive science, which can lead to the collection of richer data sets, a significant cost reduction, and an opportunity for efficient phenotypic screening of large populations.
Article
Full-text available
Listening situations with multiple talkers or background noise are common in everyday communication and are particularly demanding for older adults. Here we review current research on auditory perception in aging individuals in order to gain insights into the challenges of listening under noisy conditions. Informationally rich temporal structure in auditory signals, over a range of time scales from milliseconds to seconds, renders temporal processing central to perception in the auditory domain. We discuss the role of temporal structure in auditory processing, in particular from a perspective relevant for hearing in background noise, and focusing on sensory memory, auditory scene analysis, and speech perception. Interestingly, these auditory processes, usually studied in an independent manner, show considerable overlap of processing time scales, even though each has its own 'privileged' temporal regimes. By integrating perspectives on temporal structure processing in these three areas of investigation, we aim to highlight similarities typically not recognized.
Article
Full-text available
The effectiveness of distractor-filtering is a potentially important determinant of working memory capacity (WMC). However, a distinction between the contributions of distractor-filtering at WM encoding as opposed to filtering during maintenance has not been made, and it is assumed that these rely on the same mechanism. In two experiments, one conducted in the laboratory with 21 participants and the other played as a game on smartphones (n = 3,247), we measured WMC without distractors and presented distractors either during encoding or during the delay period of a WM task, to determine performance associated with distraction at encoding and during maintenance. Despite differences in experimental setting and paradigm design between the two studies, we show a unique contribution to WMC from both encoding- and delay-distractor performance in both experiments, while controlling for performance in the absence of distraction. Thus, across two separate experiments, one involving an extremely large cohort of 3,247 participants, we show a dissociation between encoding and delay distractor-filtering, indicating that separate mechanisms may contribute to WMC.
Article
This article outlines the neural processes underlying auditory scene analysis: the processes by which the auditory system groups and segregates components of sound mixtures to construct meaningful perceptual representations of sound sources in the environment. It gives an overview of sequential, simultaneous, and schema-based auditory perceptual segregation and grouping processes. Emphasis is placed on the relationship between neurophysiological studies of auditory scene analysis in humans and those involving animal models. General physiological principles and themes that have emerged may provide a framework for the continuing investigation of the neural substrates underlying auditory perceptual organization. A greater understanding of the neural mechanisms involved in auditory perceptual organization may also suggest therapies or other forms of intervention to ameliorate deficits contributing to developmental language disorders.
Article
This study examined the ability of trained listeners to discriminate coherent components in randomly varying spectral patterns. In each observation interval, the listener was presented with a sequence of bursts of multitone complexes having a fixed number of tones (m) in each burst. In the standard interval, the frequency of each tone in every burst was chosen randomly between 200 and 5000 Hz. In the signal interval, the frequencies of n tones were repeated throughout the burst sequence while the remaining m − n tones were chosen at random. The n tones were coherent in the sense that they were perceived as 'sticking together' to form a pattern. The listener's task was to discriminate which burst sequence contained the n coherent components. The results indicated that discrimination improved with increasing n/m and with an increasing number of bursts per interval, and declined as the coherent components were increasingly perturbed in frequency. Further, for a fixed value of the ratio n/m, discriminability was relatively independent of m. A model incorporating multichannel filtering and an optimum decision rule was reasonably successful in accounting for the experimental results.
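The stimulus construction described here is straightforward to reproduce in outline; the following sketch generates an illustrative standard or signal burst sequence, with assumed burst duration and level scaling that are not taken from the original study.

```python
import numpy as np

def burst_sequence(m=8, n=4, n_bursts=6, signal=True,
                   fs=44100, burst_dur=0.06, fmin=200.0, fmax=5000.0, seed=None):
    """Standard vs. signal interval from the described paradigm: each burst
    holds m random-frequency tones; in a signal interval, n of those
    frequencies repeat across every burst. Durations and levels are
    illustrative assumptions."""
    rng = np.random.default_rng(seed)
    coherent = rng.uniform(fmin, fmax, n) if signal else None
    t = np.arange(int(burst_dur * fs)) / fs
    bursts = []
    for _ in range(n_bursts):
        freqs = rng.uniform(fmin, fmax, m - n if signal else m)
        if signal:
            freqs = np.concatenate([freqs, coherent])  # repeated 'coherent' tones
        burst = sum(np.sin(2 * np.pi * f * t) for f in freqs) / m
        bursts.append(burst)
    return np.concatenate(bursts).astype(np.float32)
```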