ORIGINAL ARTICLE
Cerebral Cortex, 2016, 1–12. doi: 10.1093/cercor/bhw173. Advance Access published June 19, 2016.

Neural Correlates of Auditory Figure-Ground Segregation Based on Temporal Coherence

Sundeep Teki 1,2,4, Nicolas Barascud 1,3, Samuel Picard 3, Christopher Payne 3, Timothy D. Griffiths 1,2 and Maria Chait 3

1 Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; 2 Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK; 3 Ear Institute, University College London, London WC1X 8EE, UK; 4 Current address: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK

Address correspondence to Maria Chait, UCL Ear Institute, University College London, London WC1X 8EE, UK. Email: m.chait@ucl.ac.uk; Sundeep Teki, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK. Email: sundeep.teki@gmail.com

Timothy D. Griffiths and Maria Chait contributed equally as last authors.

© The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
To make sense of natural acoustic environments, listeners must parse complex mixtures of sounds that vary in frequency, space, and time. Emerging work suggests that, in addition to the well-studied spectral cues for segregation, sensitivity to temporal coherence (the coincidence of sound elements in and across time) is also critical for the perceptual organization of acoustic scenes. Here, we examine pre-attentive, stimulus-driven neural processes underlying auditory figure-ground segregation using stimuli that capture the challenges of listening in complex scenes where segregation cannot be achieved based on spectral cues alone. Signals ("stochastic figure-ground": SFG) comprised a sequence of brief broadband chords containing random pure tone components that vary from 1 chord to another. Occasional tone repetitions across chords are perceived as "figures" popping out of a stochastic "ground." Magnetoencephalography (MEG) measurement in naïve, distracted, human subjects revealed robust evoked responses, commencing from about 150 ms after figure onset, that reflect the emergence of the "figure" from the randomly varying "ground." Neural sources underlying this bottom-up driven figure-ground segregation were localized to planum temporale and the intraparietal sulcus, demonstrating that the latter area, outside the "classic" auditory system, is also involved in the early stages of auditory scene analysis.

Key words: auditory cortex, auditory scene analysis, intraparietal sulcus, magnetoencephalography, segregation, temporal coherence
Introduction
A major challenge for understanding listening in the crowded environments we typically encounter involves uncovering the perceptual and neuro-computational mechanisms by which the auditory system extracts a sound source of interest from a hectic scene. Until recently, most such attempts focused primarily on "figure" and "ground" signals that differ in frequency, motivated by findings that segregation is associated with activation of spatially distinct populations of neurons in the primary auditory cortex (A1), driven by neuronal adaptation, forward masking, and frequency selectivity (for reviews, see: Fishman et al. 2001, 2004; Carlyon 2004; Micheyl, Carlyon, et al. 2007; Micheyl, Hanson, et al. 2007; Gutschalk et al. 2008; Elhilali, Ma, et al. 2009; Elhilali, Xiang, et al. 2009; Fishman and Steinschneider 2010; Kidd et al. 2011; Moore and Gockel 2012; Snyder et al. 2012).
However, emerging work suggests that spectral separation per se is neither sufficient (Elhilali, Ma, et al. 2009) nor necessary (Teki et al. 2011, 2013; Micheyl, Kreft, et al. 2013; Micheyl, Hanson, et al. 2013; Christiansen et al. 2014; O'Sullivan et al. 2015) for segregation to take place. Using a broadband signal ("stochastic figure-ground": SFG; Fig. 1), comprised of a sequence of brief chords containing random pure tone components that vary from 1 chord to another, we demonstrated that listeners are highly sensitive to the occasional repetition of a subset of tone-pips across chords. Perceptually, the repeating tones fuse together to form a "figure" that pops out from the randomly varying "ground" (Teki et al. 2011, 2013). This emergence of structure from a stochastic background captures the challenges of hearing in complex scenes where sources overlap in spectrotemporal dimensions such that segregation cannot be achieved based on spectral cues alone. The notable sensitivity exhibited by listeners confirms that the auditory system possesses specialized mechanisms which are tuned to the temporal coincidence of a small subset of sound elements within a mixture. The general pattern of performance, including that it scales with the number of temporally correlated channels, is consistent with the predictions of a recent model of auditory segregation, the "temporal coherence" model (see extensive discussion in Shamma et al. 2011; Teki et al. 2013), based on a hypothesized mechanism that captures the extent to which activity in distinct neuronal populations that encode different perceptual features is correlated in time (Krishnan et al. 2014). The model proposes that, in addition to spectral separation, the auditory system relies on temporal relationships between sound elements to perceptually organize acoustic scenes (Elhilali, Ma, et al. 2009; Shamma et al. 2011; Micheyl, Hanson, et al. 2013; Micheyl, Kreft, et al. 2013).
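For intuition, the coincidence computation at the heart of this account can be sketched in a few lines of MATLAB. This is an illustrative sketch only, not the published model: the envelope matrix env, the correlation measure, and the grouping threshold are our own assumptions.

% Illustrative sketch of a temporal-coherence computation (cf. Shamma et al.
% 2011; Krishnan et al. 2014). env is a hypothetical nChannels x nTime matrix
% of frequency-channel envelopes, sampled at the chord rate.
C = corrcoef(env');                    % pairwise channel correlations over time
C(1:size(C,1)+1:end) = 0;              % ignore self-correlation on the diagonal
coherentSet = find(mean(C, 2) > 0.5);  % channels co-modulated with many others
% Channels in coherentSet would bind into a "figure"; the 0.5 threshold is an
% arbitrary illustration, not a model parameter.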
Using fMRI, and an SFG signal that contained brief figures interspersed within long random tone patterns, we previously observed activations in planum temporale (PT), superior temporal sulcus (STS), and, intriguingly, in the intraparietal sulcus (IPS; Teki et al. 2011) evoked specifically by the appearance of temporally coherent tone patterns. However, due to the poor temporal resolution of fMRI, it remains unclear at what stage, in the course of figure-ground segregation, these areas play a role. In particular, a central issue pertains to whether activity in IPS reflects early processes that are causally responsible for segregation or rather the (later) consequences of perceptual organization (Cusack 2005; Shamma and Micheyl 2010; Teki et al. 2011).
The present magnetoencephalography (MEG) study was designed to capture the temporal dynamics of the brain regions involved in segregating the SFG stimulus. Participants performed an incidental visual task while passively listening (in separate blocks) to 2 versions of SFG signals (Fig. 1). One version (Fig. 1A), hereafter termed the "basic" condition, consisted of a sequence of brief (25 ms) chords, each containing a random number of pure tone components that varied from 1 chord to the next. Partway through the signal, a certain number of components were fixed across chords for the remaining duration. The second version (Fig. 1C) contained loud noise bursts (25 ms) interspersed between successive chords. The noise bursts were intended to break the pattern of repeating tonal components that comprise the figure and reduce possible effects of adaptation, which may underlie figure detection. In previous behavioral experiments (Teki et al. 2013), this manipulation revealed robust figure-detection performance. In fact, listeners continued to detect the figure significantly above chance for intervening noise durations of up to 500 ms, demonstrating that the underlying mechanisms, which link successive temporally coherent components across time and frequency, are robust to interference over very long time scales.
We used MEG to track, with excellent temporal resolution, the process of figure-ground segregation and the brain areas involved. We observed robust early (within 200 ms of figure onset) evoked responses that were modulated by the number of temporally correlated channels comprising the figure. Sources underlying this bottom-up figure-ground segregation were localized to PT and the IPS, demonstrating that the latter area, outside the "classic" auditory cortex, is also involved in auditory scene analysis.
Materials and Methods
Participants
Sixteen participants (9 females; mean age: 26.9 years) with nor-
mal hearing and no history of audiological or neurological disor-
ders took part in the study. Experimental procedures were
approved by the Institute of Neurology Ethics Committee (Uni-
versity College London, UK), and written informed consent was
obtained from each participant.
Figure 1. Stochastic figure-ground stimulus. (A) An example spectrogram of the basic SFG stimulus. Signals consisted of a sequence of 25 ms chords, each containing a random number of pure tone components that varied from 1 chord to the next. At 600 ms after onset (black dashed line), a certain number of components (coherence = 2, 4, or 8; 4 in this example; indicated by arrows) were fixed across chords in the second half of the stimulus. The resulting percept is that of a "figure" within a randomly varying background. (B) A schematic of the basic SFG stimulus whose spectrogram is shown in A. Randomly varying background chords (in black, 25 ms long) form the no-figure part of the stimulus. Following the transition (indicated by red dotted lines), 4 extra components (shown in pink) are added which are temporally correlated in the figure condition (FIG4), while randomly occurring in the ground condition (GND4). (C) The noise SFG stimulus is constructed similar to the basic SFG stimulus except for the introduction of 25 ms chords of white noise between each SFG chord. The plots in A,C represent auditory spectrograms, generated with a filterbank of 1/ERB-wide channels (Equivalent Rectangular Bandwidth; Moore and Glasberg (1983)) equally spaced on an ERB-rate scale. Channels are smoothed to obtain a temporal resolution similar to the Equivalent Rectangular Duration (Plack and Moore 1990).
Stimuli
Signals consisted of a sequence of 25 ms chords, each comprising a random set of tones drawn from a fixed frequency pool ranging from 0.17 to 2.5 kHz, spaced in 1/24-octave steps. This range is narrower than that in our previous studies (0.17–7.2 kHz; Teki et al. 2011, 2013) due to the low-pass filtering characteristics of the Etymotic tubes used for sound delivery. Each chord contained an average of 10 (varying between 5 and 15) pure tone components that changed randomly from 1 chord to the next. A "figure" is incorporated in this randomly varying tonal stimulus by randomly repeating a number of frequencies ("coherence" of the figure: 2, 4, or 8) over a certain number of chords (referred to as the "duration" of the figure). The resulting percept is that of a grouped auditory "object" (figure) that pops out from the background. Importantly, the figure is only detectable by integrating the repeating components across frequency and time, as the "background" and "figure" components are indistinguishable within a particular chord.
In earlier work, we used a stimulus design where the figure appeared for a brief duration (ranging from 50 to 350 ms) amidst an ongoing random chord sequence (Teki et al. 2011, 2013, 2016). For the present study, the stimulus was modified such that the figure was introduced exactly midway through the stimulus and remained present until offset, as shown in Figure 1. This design was used to specifically examine time-locked responses evoked by the appearance of the figure, as well as later activity potentially related to the ongoing representation of the figure amid the fluctuating background.
The stimulus was created by first generating a background-only signal for the total duration of the stimulus and then incorporating additional repeating (temporally correlated) tones (2, 4, or 8, hereby referred to as FIG2, FIG4, and FIG8, respectively) during the second half of the signal. Similarly, additional uncorrelated components (2, 4, or 8; randomly varying across chords) were incorporated in the stimuli (50%) that did not contain a figure, to control for the increase in energy associated with the addition of the figure components. These "ground" (or no-figure) signals will be referred to as GND2, GND4, and GND8, respectively. See a schematic representation of FIG4 and GND4 signals in Figure 1. Overall, half of the signals contained a figure (with equal proportions of FIG2, FIG4, and FIG8) and the other half did not (with equal proportions of GND2, GND4, and GND8).
Two versions of the SFG stimuli were used in different blocks: the "basic" version (Fig. 1A) consisted of consecutive 25 ms chords (1200 ms long stimulus with the figure appearing at 600 ms post onset), and the "noise" version (Fig. 1C) consisted of 25 ms of wide-band white noise interspersed between successive 25 ms long chords (2400 ms long stimulus with the figure appearing at 1200 ms post onset; note that the number of chords is identical to that in the "basic" stimulus). The level of the noise was set to 12 dB above the level of the chords.
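For concreteness, a basic SFG trial with these parameters can be synthesized along the following lines. This is a simplified MATLAB sketch with our own variable names; onset/offset ramps, level calibration, and the noise-burst variant are omitted.

% Simplified sketch of basic SFG synthesis (25 ms chords, 1/24-octave pool
% spanning roughly 0.17-2.5 kHz, 5-15 random components per chord, figure
% tones fixed across chords in the second half of the 1200 ms signal).
fs = 44100; chordDur = 0.025; nChords = 48;          % 48 x 25 ms = 1200 ms
pool = 170 * 2.^((0:93)/24);                         % 1/24-octave frequency pool
coh = 4;                                             % figure coherence (2, 4, or 8)
figFreqs = pool(randperm(numel(pool), coh));         % tones repeated across chords
t = (0:round(chordDur*fs)-1)/fs;
sig = [];
for k = 1:nChords
    f = pool(randperm(numel(pool), randi([5 15])));  % random background tones
    if k > nChords/2, f = [f figFreqs]; end          % add figure tones at 600 ms
    sig = [sig sum(sin(2*pi*f(:)*t), 1)];            % equal-amplitude tone chord
end
sig = 0.9 * sig / max(abs(sig));                     % global normalization only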
All acoustic stimuli were created using MATLAB 7.5 software (The MathWorks, Inc.) at a sampling rate of 44.1 kHz and 16-bit resolution. Sounds were delivered binaurally with a tube phone attached to earplugs (E-A-RTONE 3A 10 Ω, Etymotic Research, Inc.) inserted into the ear canal and presented at a comfortable listening level adjusted individually by each participant. The experiment was executed using the Cogent toolbox (http://www.vislab.ucl.ac.uk/cogent.php).
Procedure
The recording started with a functional source-localizer session where participants were required to attend to a series of 100 ms long pure tones (1000 Hz) for approximately 3 min. A variable number of tones (between 180 and 200) were presented with a random interstimulus interval of 700–1500 ms. Subjects were asked to report the total number of tones presented. This "localizer" session served to identify channels that respond robustly to sound; these were used for subsequent analysis of the sensor-evoked responses to the SFG stimuli.
During the experiment, subjects were engaged in an incidental visual task while passively listening to the SFG stimuli. The visual task consisted of landscape images, presented in series of 3 (each image was presented for 5 s, with an average gap of 2 s between groups, during which the screen was blank). Subjects were instructed to fixate on a cross at the center of the display and press a button whenever the third image in a series was identical to the first or the second image. Such repetitions occurred on 10% of the trials. Responses were executed using a button box held in the right hand. The visual task served as a decoy: a means to ensure that subjects' attention was diverted away from the acoustic stimuli. At the end of each block, subjects received feedback about their performance (number of hits, misses, and false positives). To avoid any temporal correlation between the auditory and visual presentation, the visual task was presented from a different computer, independent from the one controlling the presentation of the acoustic stimuli.
The MEG experiment lasted approximately 1.5 h and consisted of 8 blocks. Four blocks involved presentation of the "basic" SFG stimulus, while the "noise" condition was presented in the remaining 4 blocks. The order of presentation was counterbalanced across subjects. A total of 660 trials were presented for each condition: 110 trials for each combination of stimulus type (figure and ground) and number of added components (2, 4, and 8). Each "basic" block took between 8 and 10 min, and the "noise" blocks took twice as long. Subjects were allowed a short rest between blocks but were required to stay still.
MEG Data Acquisition and Preprocessing
Data were acquired using a 274-channel, whole-head MEG scanner with third-order axial gradiometers (CTF Systems) at a sampling rate of 600 Hz and analyzed using SPM12 (Litvak et al. 2011; Wellcome Trust Centre for Neuroimaging, London) and FieldTrip (Oostenveld et al. 2011) in MATLAB 2013 (The MathWorks, Inc.). The data from the localizer session were divided into 700 ms epochs, including a 200 ms prestimulus baseline period, baseline-corrected, and low-pass filtered with a cutoff frequency of 30 Hz. The M100 onset response (Roberts et al. 2000) was identified for each subject as a source/sink pair in the magnetic field contour plots distributed over the temporal region of each hemisphere. For each subject, the 40 most activated channels at the peak of the M100 (20 in each hemisphere) were selected for subsequent sensor-level analysis of the responses evoked by the SFG stimuli.
Data epochs from the main experimental blocks consisted of a 500 ms prestimulus baseline and a 700 ms period after stimulus offset (overall 2400 ms for "basic" and 3600 ms for "noise" conditions). Epochs with peak amplitudes that deviated from the mean by more than twice the standard deviation (typically 7%) were flagged as outliers and discarded automatically from further analyses (approximately 100 epochs were retained for each stimulus condition). Denoising Source Separation analysis (DSS; see de Cheveigné and Parra 2014 for an extensive review of the method and its applications) was applied to each stimulus condition to extract stimulus-locked activity (the most reproducible linear combination of sensors across trials): the 2 most repeatable components in each condition were retained and projected back to sensor space.
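The amplitude-based rejection step amounts to a few lines of MATLAB; a minimal sketch, assuming epochs is an nChannels x nTime x nTrials array (variable names are ours, not the pipeline code):

% Flag trials whose peak amplitude deviates from the mean peak by more than
% twice the standard deviation, then drop them.
peakAmp = squeeze(max(max(abs(epochs), [], 1), [], 2));  % peak per trial
keep = abs(peakAmp - mean(peakAmp)) <= 2 * std(peakAmp);
epochs = epochs(:, :, keep);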
Epochs were then averaged and baseline-corrected to the prestimulus interval. In each hemisphere, the root-mean-square (RMS) field strength across the 20 channels selected from the localizer session was calculated for each participant. The time course of the RMS, reflecting the instantaneous power of neural responses, is employed as a measure of neuronal responses evoked in the auditory cortex. As most of the observed activity (including in source space) was in auditory cortex, selecting channels based on the M100 represents a reasonable approach for summarizing the sensor-level data in a single time series. For purposes of illustration, group-RMS (RMS of individual subjects' RMS) is shown, but statistical analysis was always performed across subjects, independently for each hemisphere.
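The RMS measure itself is straightforward; a minimal sketch, assuming avg holds the trial-averaged evoked field (nChannels x nTime) and chanSel indexes the 20 localizer-selected channels of one hemisphere:

% Instantaneous RMS field strength across the selected channels.
rmsTS = sqrt(mean(avg(chanSel, :).^2, 1));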
Statistical Analysis
To estimate the time required to discover the figure, the difference between the RMS waveforms of each FIG and GND pair was calculated for each participant and subjected to bootstrap resampling (2000 iterations; balanced; Efron and Tibshirani 1993). The difference was deemed significant if the proportion of bootstrap iterations that fell above or below zero was >99.9% (i.e., P < 0.001) for 5 or more consecutive samples. The first significant sample identified in this way is considered the earliest time point at which the response to the figure differed significantly from the corresponding GND control. The bootstrap analysis was run over the entire epoch duration, and all significant intervals are indicated in Figures 2 and 3 as shaded gray regions.
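The logic of this divergence test can be sketched as follows (simplified MATLAB; for brevity this illustration uses simple resampling with replacement, whereas the analysis above used balanced resampling):

% diffRMS is a hypothetical nSubjects x nTime matrix of FIG-minus-GND RMS
% difference waveforms; resample subjects, then find the first run of 5+
% consecutive samples with a two-sided bootstrap P < 0.001.
nBoot = 2000; [nSubj, nTime] = size(diffRMS);
bootMeans = zeros(nBoot, nTime);
for b = 1:nBoot
    idx = randi(nSubj, nSubj, 1);              % resample subjects with replacement
    bootMeans(b, :) = mean(diffRMS(idx, :), 1);
end
propAbove = mean(bootMeans > 0, 1);            % proportion of iterations above zero
sig = propAbove > 0.999 | propAbove < 0.001;   % two-sided, P < 0.001
sig5 = conv(double(sig), ones(1, 5), 'valid') == 5;  % 5+ consecutive samples
firstSig = find(sig5, 1);                      % earliest divergence (sample index)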
A repeated-measures ANOVA, with mean amplitude between figure onset and offset (t = 600–1200 ms for "basic" and t = 1200–2400 ms for "noise" conditions, respectively) as the dependent variable, was used to examine global effects of stimulus (FIG or GND), number of added components (2, 4, and 8), and hemisphere.
Source Analysis
Source analysis was performed using the generic "Imaging" approach implemented in SPM12 (Litvak et al. 2011; López et al. 2014). We used a classical minimum-norm algorithm that seeks to achieve a good data fit while minimizing the overall energy of the sources. In SPM12, this method is referred to as independent identical distribution (IID), as it is based on the assumption that the probability of each source being active is independent and identically distributed (Hämäläinen and Ilmoniemi 1994). The IID method corresponds to a standard L2-minimum norm, which consists of fitting the data while minimizing the total energy of the sources.
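For intuition, the closed form of such an estimator can be sketched as follows (an illustrative fixed regularization; SPM12 instead sets the regularization by optimizing the model evidence):

% L2 minimum-norm (IID) sketch: leadfield L (nSensors x nSources), sensor
% data M (nSensors x nTime); lambda here is an arbitrary illustrative value.
lambda = 0.01 * trace(L * L') / size(L, 1);
Jhat = L' * ((L * L' + lambda * eye(size(L, 1))) \ M);  % source estimates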
Standard processing steps were employed. Specifically, data were first concatenated across blocks for each participant. A generic 8196-vertex cortical mesh template (as provided in SPM12 and defined in MNI stereotaxic space) was coregistered to the sensor positions using 3 fiducial marker locations (Mattout et al. 2007). We then used a standard single-shell head model for the forward computation of the gain matrix of the lead field model (Nolte 2003). Source estimates on the cortical mesh were obtained via inversion of the forward model with the IID method described above.

Figure 2. MEG evoked responses to the basic SFG stimulus. (Top) Each plot depicts the group-RMS response to the basic SFG stimulus in the right hemisphere (left hemisphere responses are identical). The onset of the stimulus occurs at t = 0 and offset at t = 1200 ms; the transition to the figure, indicated by the dashed vertical lines, occurs at t = 600 ms. The responses to the figure and ground segments are shown in the darker and lighter shade of each color: red (FIG8 and GND8), blue (FIG4 and GND4), green (FIG2 and GND2). The shaded gray bars indicate times where a significant difference between the response to the figure and its corresponding control stimulus was observed (based on bootstrap analysis; see Materials and Methods). (Bottom) Mean RMS amplitude in each condition computed over the figure interval (between 600 and 1200 ms poststimulus onset). A repeated-measures ANOVA indicated significant differences between each FIG and GND pair. ** indicates P ≤ 0.01.
The IID model was used to identify distributed sources of brain activity underlying the transition from a background to a coherent figure. The inverse estimates were obtained for all 6 experimental conditions together to allow statistical comparisons between them ("basic" and "noise" blocks had to be analyzed separately due to the different epoch lengths). The inverse reconstruction was performed over the largest possible time window to let the algorithm model all the brain sources generating the response (Litvak et al. 2011). For the "basic" stimuli, the inversion used the entire stimulus epoch (from -300 ms to +1700 ms, relative to stimulus onset). For the "noise" stimuli, this approach did not work well, because the signal-to-noise ratio of the data is intrinsically much smaller. To confirm this, we compared model evidences in the "noise" conditions with inversion over the entire epoch, and inversion over the stimulus epoch from +1200 ms to +2700 ms. The latter yielded a more accurate inversion for all subjects (the difference in log-model evidences, i.e., log Bayes factor, was >10 for all subjects; Penny et al. 2004) and was therefore used for source localization.
Source activity for each condition was then summarized as separate NIfTI images, by applying the projectors estimated in the inversion stage to the averaged trials (to localize evoked activity), over 2 distinct time windows: an initial transition phase ("early"; a 100 ms period starting from the first time point at which the figure and the corresponding ground became significantly different), as well as a later phase ("late"; a 100 ms period before stimulus offset). The specific values for the time windows used for source localization for each coherence value are detailed in the Results section. The resulting 3D images were smoothed using a Gaussian kernel with 5-mm full-width at half maximum and taken to second-level analysis for statistical inference.
At the second level, the data were modeled with 6 conditions (GND8, GND4, GND2, FIG2, FIG4, and FIG8) with a design matrix including a subject-specific regressor and correcting for heteroscedasticity across conditions. We sought to identify brain areas whose activity increased parametrically with corresponding changes in coherence (i.e., over and above the changes in power associated with adding components). For this purpose, a parametric contrast [-8 -4 -2 +2 +4 +8]/14 was used. Effectively, the contrast can be expressed as 2 × (FIG2 - GND2) + 4 × (FIG4 - GND4) + 8 × (FIG8 - GND8), thus targeting brain regions whose activity is parametrically modulated by rising temporal coherence (2 < 4 < 8) while controlling (by subtracting activity of matched GND signals) for the increase in power associated with the added figure components. We also used a simple "Figure versus Ground" contrast: [-1 -1 -1 1 1 1]/3. Statistical maps were initially thresholded at a level of P < 0.001 uncorrected, and peaks were considered significant only if they survived family-wise error (FWE) correction at P < 0.05 across the whole-brain volume. Because we had prior hypotheses regarding PT and IPS based on our fMRI results (Teki et al. 2011), FWE correction was also applied using small volume correction (SVC; Frackowiak et al. 2003) within these regions. Due to limitations inherent to the resolution of
our source analysis, a (conservative) compound mask over PT and the adjacent Heschl's gyrus (HG) was used. The corresponding cortical masks were determined by reference to the Juelich histologic atlas (for IPS; Eickhoff et al. 2005; Choi et al. 2006; Scheperjans et al. 2008) and the Harvard–Oxford structural atlases (for Heschl's gyrus and PT; Desikan et al. 2006), available in FSLView (http://surfer.nmr.mgh.harvard.edu/), thresholded at 10% probability. SVC-corrected regions are indicated by asterisks in Tables 1 and 2.

Figure 3. MEG evoked responses to the noise SFG stimulus. (Top) Each plot depicts the group-RMS response to the noise SFG stimulus in the right hemisphere (left hemisphere responses are identical). The onset of the stimulus occurs at t = 0 and offset at t = 2400 ms; the transition to the figure, indicated by the dashed vertical lines, occurs at t = 1200 ms. The responses to the figure and ground segments are shown in the darker and lighter shade of each color: red (FIG8 and GND8), blue (FIG4 and GND4), green (FIG2 and GND2). The shaded gray bars indicate times where a significant difference between the response to the figure and its corresponding control stimulus was observed (based on bootstrap analysis; see Materials and Methods). (Bottom) Mean RMS amplitude in each condition computed over the figure interval (between 1200 and 2400 ms poststimulus onset). A repeated-measures ANOVA indicated significant differences between each FIG and GND pair. ** indicates P ≤ 0.01; * indicates P < 0.05.
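Applied to condition-wise parameter estimates, the two contrasts reduce to weighted sums; a minimal sketch, assuming betas holds the 6 condition estimates (rows ordered GND8, GND4, GND2, FIG2, FIG4, FIG8):

% Parametric coherence contrast and simple Figure-versus-Ground contrast.
c = [-8 -4 -2 +2 +4 +8] / 14;   % = (2*(FIG2-GND2) + 4*(FIG4-GND4) + 8*(FIG8-GND8))/14
coherenceEffect = c * betas;    % betas: 6 x nVoxels matrix of estimates
cFG = [-1 -1 -1 +1 +1 +1] / 3;
figureVsGround = cFG * betas;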
Results
Performance on the incidental visual task was at ceiling for all participants, suggesting that they remained engaged in the task throughout the experiment. Since participants were naive to the nature of the acoustic stimuli, and thus unlikely to actively attend to the figures, it can be assumed that the observed auditory responses primarily reflect bottom-up, stimulus-driven processes.
Basic SFG: Evoked Responses
Figure 2 shows the group-RMS of stimulus-evoked fields, separately for the corresponding FIG and GND conditions, in the right hemisphere (a similar pattern is observed in the left hemisphere). In all conditions, a standard sound onset response is observed, with a clear M50, M100, and M200 peak complex (indicated in Fig. 2). The ongoing slow evoked response is characterized by a constant 40 Hz fluctuation of mean evoked power, which follows the rate of presentation of individual chords (every 25 ms).

Following the transition to the figure, clear evoked responses are observed in all FIG conditions. This response consists of an early transient phase characterized by a sharp increase over a few chords (more evident for FIG8 and FIG4), leading to a local maximum in evoked activity, followed by a more sustained phase until stimulus offset.
The responses to the control GND stimuli allow us to distinguish whether the figure-evoked responses are mediated simply by an increase in energy associated with the additional components or relate specifically to the computation of temporal coherence, linked to the appearance of the figure. Indeed, a transition response (i.e., an increase in RMS amplitude as a function of the number of added components) is also present in the GND conditions. However, this response is significantly lower in amplitude and lacks the initial transient phase (sharp increase in power), demonstrating that the response observed for the FIG conditions is largely driven by the temporal coherence of the components comprising the figure.
Bootstrap analysis (see Materials and Methods) revealed that the difference between the response to the figure and its corresponding control condition remains significant throughout the figure segment (indicated by the gray-shaded region), until after sound offset. The first significantly different sample (i.e., the time when the response to the FIG condition first diverges from that to GND) occurred at 158 ms (158 ms), 206 ms (195 ms), and 280 ms (225 ms) posttransition in the left (right) hemisphere for FIG8, FIG4, and FIG2, respectively (see Fig. 2).
A repeated-measures ANOVA with mean amplitude during the figure interval as the dependent variable, and condition (FIG vs. GND), number of components (8, 4, and 2), and hemisphere (left vs. right) as factors indicated no main effect of hemisphere (F(1,15) = 3.25, P = 0.091) but confirmed significant main effects of condition (F(1,15) = 58.53, P < 0.001) and number of added components (F(2,30) = 27.26, P < 0.001), as well as a significant interaction between condition and number of added components (F(2,30) = 13.25, P < 0.001). The interaction indicates that the amplitude of mean evoked field strength is higher for the figure, over and above the effect of the increase in spectral energy, and that it increases significantly with the number of coherent components in the figure. We refer to this effect as the effect of coherence. A series of repeated-measures t tests for each FIG and its corresponding GND (data averaged over hemispheres) confirmed significant differences for all pairs (FIG8 vs. GND8: t = 7.01, P < 0.001; FIG4 vs. GND4: t = 6.77, P < 0.001; FIG2 vs. GND2: t = 4.25, P = 0.01), demonstrating that the brains of naive listeners are sensitive to the temporal coherence associated with only 2 repeating components.
Noise SFG: Evoked Responses
Figure 3 shows the group-RMS of stimulus-evoked fields for the noise SFG stimuli. The general time course of evoked activity is similar to that observed for the basic SFG stimulus. The ongoing slow evoked response is characterized by a constant 20 Hz fluctuation of mean evoked power, which follows the rate of the (loud) noise bursts interspersed between chords.
Table 2. MEG sources whose activity increased with coherence for the noise SFG stimulus

Area               Hemisphere  Response phase   x     y     z    t value
PT*                R           Early            60   -24    20   4.21
                                                62   -28     6   4.03
PT*                L           Early           -60   -32    24   4.12
                                               -62   -38    14   3.86
PT                 L           Late            -62   -30    24   6.76
                                               -60   -38    14   5.44
Postcentral gyrus  R           Late             40   -16    38   5.40
                                                50   -14    42   4.99
PT*                R           Late             54   -26    18   4.34
                                                64   -34    12   4.15
IPS*               R           Late             30   -40    62   4.18
                                                28   -46    54   4.13

Note: Local maxima (x, y, z in MNI mm) are shown at P < 0.05 (FWE) at the whole-brain level.
*Small volume-corrected P < 0.05 (FWE).
Table 1. MEG sources whose activity increased with coherence for the basic SFG stimulus

Brain area  Hemisphere  Response phase   x     y     z (mm)   t value
PT          R           Early            52   -18    12       6.23
                                         64   -18     6       6.12
HG*         R           Early            42   -26     8       4.31
IPS         R           Early            50   -56    34       5.98
                                         48   -62    28       4.53
IPS         L           Early           -36   -72    42       4.81
                                        -30   -66    46       4.59
PT*         L           Early           -60   -32    12       3.56
                                        -58   -30    18       3.46
IPS         R           Late             48   -56    32       5.08
                                         44   -62    28       3.93
PT*         R           Late             56    -6    12       4.16
                                         64   -20     8       3.75
PT*         L           Late            -50   -20    14       3.75
                                        -56   -20     4       3.44

Note: Local maxima (x, y, z in MNI mm) are shown at P < 0.05 (FWE) at the whole-brain level.
*Small volume-corrected P < 0.05 (FWE).
The addition of a figure is associated with a sharp increase in power, followed by a sustained-like phase that persists until stimulus offset. A bootstrap analysis revealed significantly greater responses to each figure condition compared with its corresponding control, as shown in Figure 3. The latencies at which FIG responses became significantly different from the responses to the GND were approximately 238 ms (300 ms), 720 ms (410 ms), and 412 ms (412 ms) in the left (right) hemisphere for coherence of 8, 4, and 2, respectively.
A repeated-measures ANOVA with mean amplitude during the figure interval as the dependent variable, and condition (FIG vs. GND), number of components (8, 4, and 2), and hemisphere (left vs. right) as factors indicated no main effect of hemisphere (F(1,15) = 3.21, P = 0.093) but confirmed significant main effects of condition (F(1,15) = 31.98, P < 0.001) and number of added components (F(2,30) = 7.28, P = 0.003), as well as a significant interaction between condition and number of added components (F(2,30) = 4.55, P = 0.019; effect of figure coherence). A series of t tests for each FIG and GND pair (data averaged over hemispheres) confirmed significant differences for all [FIG8 vs. GND8: t = 5.02, P < 0.001; FIG4 vs. GND4: t = 2.4, P = 0.024; FIG2 vs. GND2: t = 2.84, P = 0.012], demonstrating that, despite the loud noise interspersed between successive chords (resulting in large power fluctuations across the entire spectrum and therefore reduced power differences between channels), even a figure consisting of only 2 coherent components is reliably encoded by the brains of naive listeners.
A repeated-measures ANOVA (over mean amplitude during the figure period) with block ("basic" vs. "noise"), condition (FIG vs. GND), number of components (8, 4, and 2), and hemisphere (left vs. right) as factors indicated no main effect of block (F(1,15) = 2.5, P = 0.128) or hemisphere (F(1,15) = 3.5, P = 0.08) but confirmed significant main effects of condition (F(1,15) = 61.3, P < 0.001) and number of added components (F(2,30) = 23.33, P < 0.001), as well as an interaction between condition and number of added components (F(2,30) = 15.06, P < 0.001; effect of figure coherence, as observed separately for "basic" and "noise" stimuli). The following interactions were also significant: 1) between block and number of added components (F(1,15) = 16.2, P = 0.001), and 2) between block and condition (F(2,30) = 5.23, P = 0.01), both due to the fact that the effects of condition and number of components were weaker in the "noise" relative to the "basic" stimuli. Crucially, however, both stimulus types show similar coherence effects.
Basic SFG: Source Analysis
To identify brain regions whose activity is parametrically modulated by the coherence of the figure (on top of the increase in power associated with the added figure components), we tested for a signal increase with a parametric contrast over the GND8, GND4, GND2, FIG2, FIG4, and FIG8 conditions (see Materials and Methods). This contrast mirrors the interaction observed in the analysis of the time-domain data and is in line with our previous fMRI study, where significant parametric BOLD responses were observed in the right PT and IPS (Teki et al. 2011). Although the spatial resolution of MEG does not match the high resolution provided by fMRI, recent advances in MEG source inversion techniques permit source localization with a relatively high degree of precision (López et al. 2014).
To capture effects associated with the initial discovery of the figures as well as later processes related to tracking the figures within the random background, we analyzed sources of evoked field strength in two 100 ms time windows: 1) "early": starting from the first time sample that showed a significant difference between the figure and ground conditions as determined by the bootstrap analysis above (FIG8: t = 158–258 ms; FIG4: t = 195–295 ms; FIG2: t = 225–325 ms) and 2) "late": during the sustained portion of the response, immediately preceding the offset of the stimulus (i.e., t = 1100–1200 ms).
The results for the early phase revealed robust activations in the PT bilaterally (P < 0.05 FWE) and in the right inferior parietal cortex bordering the supramarginal gyrus that varied parametrically with the coherence of the figure (Fig. 4A; Table 1). We also observed activation in the IPS, and the corresponding activation clusters were clearly spatially separated from the temporal sources, even in the uncorrected P < 0.001 t-maps (see Fig. 4). We also observed some activity in lateral HG that was contiguous with the PT activity in the right hemisphere only. A separate mask, centered on bilateral medial HG, suggested that coherence-related activity is also observed in the primary auditory cortex. However, because this was a post hoc analysis, and also because of limits inherent to the resolution of the MEG source analysis used here, it is difficult to distinguish this cluster from PT.

Activations in the late phase also involved PT bilaterally and the right inferior parietal cortex (P < 0.05 FWE; small volume-corrected). We also examined activity in the IPS during both time windows. Figure 4C,D show significant activation clusters in the IPS (P < 0.05 FWE; small volume-corrected) observed during the early and later response phases, respectively. There was no interaction between response phase ("early" or "late") and coherence, suggesting that IPS and PT contributed to the early and late phase processing in equal measure.
Figure 5 shows the group-averaged source waveforms extracted from the right PT and IPS for the basic condition. Both show activation consistent with the sensor-level RMS data (see Fig. 2). The IPS source exhibits weaker onset and offset responses, and lower amplitude sustained activity, consistent with its location further upstream within the processing hierarchy. Importantly, however, the response associated with the appearance of the figure is similar in magnitude in both areas. A repeated-measures ANOVA, with area (PT and IPS) and number of components as factors, was run on the mean amplitude difference between FIG and GND pairs during the figure period (600–1200 ms post onset). This showed a significant main effect of number of components (F(2,30) = 3.70, P < 0.05) only. The effect of area was not significant (F(1,15) = 0.16, P > 0.1), and there was no interaction between factors (F(2,30) = 0.87, P > 0.1), confirming that the effect of coherence was equally present in both PT and IPS.
We also conducted simple contrasts of "Figure" versus "Ground" (over the "early" time windows as described above). The negative contrast ("Ground" > "Figure") was used to address an alternative hypothesis for the mechanisms underlying figure-ground segregation, that is, adaptation-based mechanisms (e.g., stimulus-specific adaptation, SSA; Nelken 2014; Pérez-González and Malmierca 2014), which may be sensitive to repetition within the "coherent" channels. This effect would be observable as a decrease in activation for FIG relative to GND stimuli. However, the relevant contrast yielded no significant activations, either over the entire brain volume or within HG-centered masks. The positive contrast ("Figure" > "Ground") yielded activations essentially identical to those reported in Figure 4 and Table 1.
Noise SFG: Source Analysis
We examined the sources of evoked field strength underlying figure-ground processing in the noise SFG stimulus using a 200 ms long window starting from the first time sample that showed a significant difference between the figure and its ground control as determined by the bootstrap analysis (FIG8: t = 238–438 ms; FIG4: t = 410–610 ms; FIG2: t = 412–612 ms), and another 200 ms window during the later phase of the response, immediately preceding the offset of the figure segment (t = 2200–2400 ms). The longer window for source localization in the noise condition (100 ms of SFG chords and 100 ms of noise) is effectively equal to the 100 ms window (all SFG chords) used for the localization of activations in the basic condition.
The results for the early phase revealed robust activations in the PT bilaterally (P < 0.05 FWE) and the right inferior parietal cortex, including the supramarginal gyrus, that varied parametrically with the coherence of the figure (Fig. 6A; Table 2). Activations in the late phase (Fig. 6B; Table 2) involved PT bilaterally and the right inferior parietal cortex (P < 0.05 FWE; small volume-corrected). Figure 6C shows significant activity in the right IPS (P < 0.05 FWE; small volume-corrected), observed during the later response phase only. However, no significant interaction between phase ("early" or "late") and coherence was found. Thus, despite the fact that the "noise" condition localization was substantially noisier than that for the basic condition (as also reflected in the weaker sensor-level responses), the results suggest a pattern of activation similar to that for the "basic" condition.
Discussion
We used MEG to assess the temporal dynamics of stimulus-driven figure-ground segregation in naive, passively listening participants. We used the "stochastic figure-ground" (SFG) stimulus: a complex broadband signal, which comprises a "figure" defined by temporal correlation between distinct frequency channels. The SFG stimulus differs from other commonly used figure-ground signals (Kidd et al. 1994, 1995; Micheyl, Hanson, et al. 2007; Gutschalk et al. 2008; Elhilali, Xiang, et al. 2009) in that the "figure" and "ground" overlap in spectrotemporal space, as most natural sounds do, and segregation can only be achieved by integration across frequency and time (Teki et al. 2013).
Evoked Transition Responses
Our results revealed robust evoked responses, commencing from about 150–200 ms after figure onset, that reflect the emergence of the "figure" from the randomly varying "ground" in the absence of directed attention. The amplitude and latency of these responses varied systematically with the coherence of the figure. Similar effects of coherence (for coherence levels of 8 and 10) were recently reported in an EEG study based on a variant of the "basic" SFG stimulus which used continuous changes in the level of coherence (O'Sullivan et al. 2015). However, they observed much longer latencies (e.g., 433 ms for a ramped SFG figure with a coherence of 8) than those here, possibly due to differences in the stimuli used.
The early transient responses were followed by a sustained-like phase, continuing until figure offset, the amplitude of which also varied systematically with coherence. This general pattern was observed for the basic (Fig. 2) and, remarkably, the noise SFG stimulus (Fig. 3), where successive chords are separated by loud noise bursts. These results demonstrate that the underlying brain mechanisms, hypothesized to compute temporal coherence across frequency channels (Shamma et al. 2011), are robust to interference with the continuity of the scene, even when listeners were naive and engaged in an incidental visual task.

Figure 4. MEG source activations as a function of coherence for the basic SFG stimulus. Activations (thresholded at P < 0.001, uncorrected) are shown on the superior temporal plane of the MNI152 template image, and the corresponding y coordinates are overlaid on each image. The heat map adjacent to each figure depicts the t value. Coordinates of local maxima are provided in Table 1. Maximum response during the early transition period was observed in PT and the right inferior parietal cortex (A) as well as in the right IPS (C). Activity during the later response window was observed in PT bilaterally and the right inferior parietal cortex (B) as well as in both the left and right IPS (D).
Figure 5. Group average of source activity waveforms for the basic SFG stimuli. The average source activity waveforms for the basic SFG stimuli were computed for sources in the right posterior superior temporal gyrus (MNI coordinates [64, -14, 8]; left panels) and the right intraparietal sulcus (MNI coordinates [54, -50, 40]; right panels). The 6 experimental conditions (FIG and GND; 2, 4, and 8 added components) were inverted together over the entire stimulus epoch, and the corresponding source activity was extracted using the maximum a posteriori (MAP) projector in SPM12 (López et al. 2014). The resulting time-course data were first averaged over trials, then over all available subjects (N = 16). The onset (t = 0 ms) and offset (t = 1200 ms) of the stimulus are marked by solid vertical lines; dashed vertical lines indicate the transition to the figure (t = 600 ms). The brain insets in each panel indicate the location of the sources for each region.
We additionally show that these transition responses scale with coherence not only at the sensor level but also at the level of the underlying neural sources. As shown in Figure 5, group-averaged source waveforms from the right PT show a similar morphology to the sensor-level transition responses: an initial transient peak is followed by a more sustained response, and the amplitude of these 2 response components varies parametrically with the coherence. Interestingly, group-averaged source responses from the right IPS also show striking coherence-modulated transition responses.
Neural Substrates of Figure-Ground Segregation
The discussion below is predominantly focused on the localization results from the "basic" condition. The responses for the "noise" condition were overall consistent with the "basic" responses but, as expected, yielded weaker effects.

The approach for identifying the neural substrates underlying the detection of the SFG figures was based on a parametric contrast, seeking brain areas where activity is parametrically modulated by the coherence of the figure. We also investigated a simple "ground > figure" contrast to address the alternative hypothesis that figure pop-out may be driven by frequency-specific adaptation (Nelken 2014). According to this account, the presence of the figure may be detectable as a (repetition-based) decrease in activity within the "coherent" channels. That the ground versus figure contrast yielded no significant activations, and in particular none in the primary auditory cortex where stimulus-specific adaptation is widely observed, suggests that adaptation may not be the principal mechanism underlying figure-ground segregation in the SFG stimulus. This is also in line with behavioral results showing that listeners can withstand significant amounts of interference, such as loud noise bursts up to 500 ms long, between successive chords (Teki et al. 2013).
Using the parametric contrast, we analyzed sources of evoked field strength in 2 different time windows to potentially capture 2 distinct response components: an early transient response reflecting the detection of the figure, and later processes related to following the figure amidst the background. We found significant activations in PT (Figs 4A,B and 6A,B) in both the early and later stages. This is in agreement with previous human fMRI studies of segregation (Gutschalk et al. 2007; Wilson et al. 2007; Schadwinkel and Gutschalk 2010a, 2010b) based on simple tone streams with different spectral or spatial cues for segregation. The similar pattern of activations in PT for both stimulus conditions suggests a common stimulus-driven segregation mechanism that is sensitive to the emergence of salient targets in complex acoustic scenes.
Teki et al. (2011) did not observe any significant BOLD activation related to figure-ground segregation in primary auditory cortex in the region of medial Heschl's gyrus. Similarly, Elhilali, Ma, et al. (2009) did not find evidence of temporal coherence-based computations in the primary auditory cortex of awake ferrets passively listening to synchronous streaming signals. This could possibly be due to the low spike latencies (20 ms) in primary cortex, whereby longer integration windows as observed in secondary and higher order auditory cortices (Bizley et al. 2005; Hackett 2011; Atiani et al. 2014) might be crucial for analysis of temporal coherence across remote cortical ensembles. The present results tentatively indicate some evidence of coherence-related activation in human primary auditory cortex during the early phase, but we cannot exclude the possibility that the observed cluster reflects "spillage" of activity from PT; the issue should be elaborated on with further work. Although how and where the precise computational operations that underlie temporal coherence analysis (Krishnan et al. 2014) are implemented in the brain is not completely clear, it is likely that such operations occur along a processing hierarchy whereby cells in higher order centers abstract temporal responses from lower level processing stages. The present results demonstrate that PT forms part of this network.

Figure 6. MEG source activations as a function of coherence for the noise SFG stimulus. Activations (thresholded at P < 0.001, uncorrected) are shown on the superior temporal plane of the MNI152 template image, and the corresponding y coordinates are overlaid on each image. The heat map adjacent to each figure depicts the t value. Coordinates of local maxima are provided in Table 2. Maximum response during the early transition period was observed in the right PT and left MTG (A). Significant activity during the late response period was observed in the PT bilaterally as well as the right precentral gyrus and rolandic operculum (B) and in the left IPS (C).
We found significant activity in the IPS during both early and late response phases (Figs 4C,D and 6C). These results are in line with our previous fMRI work, where we observed that activity in the IPS increases parametrically with the coherence of the figures (Teki et al. 2011). The finding that IPS activity is modulated systematically by coherence is consistent with earlier work implicating the IPS in perceptual organization of streaming signals (Cusack 2005). Since this area lies outside of the "classic" definition of the auditory system, it has previously been suggested that IPS activation may not reflect auditory processing per se but rather relate to attentional effects such as the application of top-down attention (Cusack 2005) or the perceptual consequences of a bottom-up "pop-out" process (Shamma and Micheyl 2010; Teki et al. 2011). Due to the inherently low temporal resolution of fMRI, and hence the lack of precise information regarding the timing of the observed BOLD activations, this conjecture was unresolvable in previous data. Our subjects were naive and occupied by an incidental task, and as such it is unlikely that they were actively trying to hear out the figures from within the background. This, together with the finding that coherence-modulated IPS activity is observed at the earliest stages of the evoked response, strongly supports the hypothesis that IPS is involved in the initial stages of figure-ground segregation.
Because the computation of temporal coherence relies on reliable, phase-locked encoding of rapidly evolving auditory information, it is likely that the temporal coherence maps as such are computed in auditory cortex, perhaps in PT. IPS might be involved in reading out these coherence maps or in the actual process of perceptual segregation (encoding the input as consisting of several sources rather than a single mixture). Specifically, IPS may represent a computational hub that integrates auditory input from the auditory parabelt (Pandya and Kuypers 1969; Divac et al. 1977; Hyvarinen 1982) and forms a relay station between the sensory and prefrontal cortex, which associates sensory signals with behavioral meaning (Petrides and Pandya 1984; Fritz et al. 2010). Similar computational operations have been attributed to the parietal cortex in saliency map models of visual feature search (Gottlieb et al. 1998; Itti and Koch 2001; Walther and Koch 2007; Geng and Mangun 2009). Overall, our results suggest that IPS plays an automatic, bottom-up role in auditory figure-ground processing, and call for a re-examination of the prevailing assumptions regarding the neural computations and circuits that mediate auditory scene analysis.
Funding
This work is supported by the Wellcome Trust (WT091681MA and
093292/Z/10/Z). S.T. is supported by the Wellcome Trust (106084/
Z/14/Z). Funding to pay the Open Access publication charges for
this article was provided by Wellcome Trust.
Notes
We thank Alain de Cheveigné and the MEG group at the Wellcome Trust Centre for Neuroimaging for technical support. Conflict of Interest: The authors declare no competing financial interests.
References
Atiani S, David SV, Elgueda D, Locastro M, Radtke-Schuller S, Shamma SA, Fritz JB. 2014. Emergent selectivity for task-relevant stimuli in higher-order auditory cortex. Neuron. 82:486–499.
Bizley JK, Nodal FR, Nelken I, King AJ. 2005. Functional organization of ferret auditory cortex. Cereb Cortex. 15:1637–1653.
Carlyon RP. 2004. How the brain separates sounds. Trends Cogn Sci. 8:465–471.
Choi HJ, Zilles K, Mohlberg H, Schleicher A, Fink GR, Armstrong E, Amunts K. 2006. Cytoarchitectonic identification and probabilistic mapping of two distinct areas within the anterior ventral bank of the human intraparietal sulcus. J Comp Neurol. 495:53–69.
Christiansen SK, Jepsen ML, Dau T. 2014. Effects of tonotopicity, adaptation, modulation tuning, and temporal coherence in "primitive" auditory stream segregation. J Acoust Soc Am. 135:323–333.
Cusack R. 2005. The intraparietal sulcus and perceptual organization. J Cogn Neurosci. 17:641–651.
de Cheveigné A, Parra LC. 2014. Joint decorrelation, a versatile tool for multichannel data analysis. Neuroimage. 98:487–505.
Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT, et al. 2006. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 31:968–980.
Divac I, Lavail JH, Rakic P, Winston KR. 1977. Heterogenous afferents to the inferior parietal lobule of the rhesus monkey revealed by the retrograde transport method. Brain Res. 123:197–207.
Efron B, Tibshirani R. 1993. An introduction to the bootstrap. Boca Raton (FL): Chapman & Hall/CRC.
Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, Amunts K, Zilles K. 2005. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage. 25:1325–1335.
Elhilali M, Ma L, Micheyl C, Oxenham AJ, Shamma SA. 2009. Temporal coherence in the perceptual organization and cortical representation of auditory scenes. Neuron. 61:317–329.
Elhilali M, Xiang J, Shamma SA, Simon JZ. 2009. Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene. PLoS Biol. 7:e1000129.
Fishman YI, Arezzo JC, Steinschneider M. 2004. Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J Acoust Soc Am. 116:1656–1670.
Fishman YI, Reser DH, Arezzo JC, Steinschneider M. 2001. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res. 151:167–187.
Fishman YI, Steinschneider M. 2010. Formation of auditory streams. In: Rees A, Palmer AR, editors. The Oxford handbook of auditory science. Oxford: Oxford University Press.
Frackowiak RSJ, Friston KJ, Frith C, Dolan R, Price CJ, Zeki S, Ashburner J, Penny WD. 2003. Human brain function. 2nd ed. Cambridge: Academic Press.
Fritz JB, David SV, Radtke-Schuller S, Yin P, Shamma SA. 2010. Adaptive, behaviorally gated, persistent encoding of task-relevant auditory information in ferret frontal cortex. Nat Neurosci. 13(8):1011–1019.
Geng JJ, Mangun GR. 2009. Anterior intraparietal sulcus is sensitive to bottom-up attention driven by stimulus salience. J Cogn Neurosci. 21:1584–1601.
Gottlieb JP, Kusunoki M, Goldberg ME. 1998. The representation of visual salience in monkey parietal cortex. Nature. 391:481–484.
Gutschalk A, Micheyl C, Oxenham AJ. 2008. Neural correlates of auditory perceptual awareness under informational masking. PLoS Biol. 6:e138.
Gutschalk A, Oxenham AJ, Micheyl C, Wilson EC, Melcher JR. 2007. Human cortical activity during streaming without spectral cues suggests a general neural substrate for auditory stream segregation. J Neurosci. 27:13074–13081.
Hackett TA. 2011. Information flow in the auditory cortical network. Hear Res. 271:133–146.
Hämäläinen MS, Ilmoniemi RJ. 1994. Interpreting magnetic fields of the brain: minimum norm estimates. Med Biol Eng Comput. 32:35–42.
Hyvarinen J. 1982. The parietal cortex of monkey and man. Berlin: Springer-Verlag.
Itti L, Koch C. 2001. Computational modeling of visual attention. Nat Rev Neurosci. 2:194–203.
Kidd G, Mason CR, Dai H. 1995. Discriminating coherence in spectro-temporal patterns. J Acoust Soc Am. 97:3782–3790.
Kidd G, Mason CR, Deliwala PS, Woods WS, Colburn HS. 1994. Reducing informational masking by sound segregation. J Acoust Soc Am. 95:3475–3480.
Kidd G, Richards VM, Streeter T, Mason CR. 2011. Contextual effects in the identification of nonspeech auditory patterns. J Acoust Soc Am. 130:3926–3938.
Krishnan L, Elhilali M, Shamma S. 2014. Segregating complex sound sources through temporal coherence. PLoS Comput Biol. 10:e1003985.
Litvak V, Mattout J, Kiebel S, Phillips C, Henson R, Kilner J, Barnes G, Oostenveld R, Daunizeau J, Flandin G, et al. 2011. EEG and MEG data analysis in SPM8. Comput Intell Neurosci. 1:32.
López JD, Litvak V, Espinosa JJ, Friston K, Barnes GR. 2014. Algorithmic procedures for Bayesian MEG/EEG source reconstruction in SPM. NeuroImage. 84:476–487.
Mattout J, Henson RN, Friston KJ. 2007. Canonical source reconstruction for MEG. Comput Intell Neurosci. 67613. doi:10.1155/2007/67613.
Micheyl C, Carlyon RP, Gutschalk A, Melcher JR, Oxenham AJ, Rauschecker JP, Tian B, Courtenay Wilson E. 2007. The role of auditory cortex in the formation of auditory streams. Hear Res. 229:116–131.
Micheyl C, Hanson C, Demany L, Shamma S, Oxenham AJ. 2013. Auditory stream segregation for alternating and synchronous tones. J Exp Psychol Hum Percept Perform. 39(6):1568–1580.
Micheyl C, Kreft H, Shamma S, Oxenham AJ. 2013. Temporal coherence versus harmonicity in auditory stream formation. J Acoust Soc Am. 133(3):188–194.
Micheyl C, Shamma S, Oxenham AJ. 2007. In: Kollmeier B, Klump G, Hohmann V, Langemann U, Mauermann M, Uppenkamp S, Verhey J, editors. Hearing – from basic research to application. Berlin: Springer. p. 267–274.
Moore BCJ, Glasberg BR. 1983. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J Acoust Soc Am. 74:750–753.
Moore BCJ, Gockel HE. 2012. Properties of auditory stream formation. Phil Trans R Soc. 367:919–931.
Nelken I. 2014. Stimulus-specific adaptation and deviance detection in the auditory system: experiments and models. Biol Cybern. 108:655–663.
Nolte G. 2003. The magnetic lead field theorem in the quasi-static approximation and its use for magnetoencephalography forward calculation in realistic volume conductors. Phys Med Biol. 48:3637–3652.
Oostenveld R, Fries P, Maris E, Schoffelen J-M. 2011. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput Intell Neurosci. 2011:156869.
O'Sullivan JA, Shamma SA, Lalor EC. 2015. Evidence for neural computations of temporal coherence in an auditory scene and their enhancement during active listening. J Neurosci. 35:7256–7263.
Pandya DN, Kuypers HGJM. 1969. Cortico-cortical connections in the rhesus monkey. Brain Res. 13:13–36.
Penny WD, Stephan KE, Mechelli A, Friston KJ. 2004. Comparing dynamic causal models. NeuroImage. 22:1157–1172.
Pérez-González D, Malmierca MS. 2014. Adaptation in the auditory system: an overview. Front Integr Neurosci. 8:19.
Petrides M, Pandya DN. 1984. Projections to the frontal cortex from the posterior parietal region in the rhesus monkey. J Comp Neurol. 228:105–116.
Plack CJ, Moore BCJ. 1990. Temporal window shape as a function of frequency and level. J Acoust Soc Am. 87:2178–2187.
Roberts TP, Ferrari P, Stufflebeam SM, Poeppel D. 2000. Latency of the auditory evoked neuromagnetic field components: stimulus dependence and insights toward perception. J Clin Neurophysiol. 17:114–129.
Schadwinkel S, Gutschalk A. 2010a. Activity associated with stream segregation in human auditory cortex is similar for spatial and pitch cues. Cereb Cortex. 20:2863–2873.
Schadwinkel S, Gutschalk A. 2010b. Functional dissociation of transient and sustained fMRI BOLD components in human auditory cortex revealed with a streaming paradigm based on interaural time differences. Eur J Neurosci. 32:1970–1978.
Scheperjans F, Eickhoff SB, Hömke L, Mohlberg H, Hermann K, Amunts K, Zilles K. 2008. Probabilistic maps, morphometry, and variability of cytoarchitectonic areas in the human superior parietal cortex. Cereb Cortex. 18:2141–2157.
Shamma SA, Elhilali M, Micheyl C. 2011. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 34:114–123.
Shamma SA, Micheyl C. 2010. Behind the scenes of auditory perception. Curr Opin Neurobiol. 20:361–366.
Snyder JS, Gregg MK, Weintraub DM, Alain C. 2012. Attention, awareness, and the perception of auditory scenes. Front Psychol. 3:15.
Teki S, Chait M, Kumar S, Shamma S, Griffiths TD. 2013. Segregation of complex acoustic scenes based on temporal coherence. eLife. 2:e00699.
Teki S, Chait M, Kumar S, von Kriegstein K, Griffiths TD. 2011. Brain bases for auditory stimulus-driven figure-ground segregation. J Neurosci. 31:164–171.
Teki S, Kumar S, Griffiths TD. 2016. Large-scale analysis of auditory segregation behavior crowdsourced via a smartphone app. PLoS ONE. 11:e0153916.
Walther DB, Koch C. 2007. Attention in hierarchical models of object recognition. Prog Brain Res. 165:57–78.
Wilson EC, Melcher JR, Micheyl C, Gutschalk A, Oxenham AJ. 2007. Cortical FMRI activation to sequences of tones alternating in frequency: relationship to perceived rate and streaming. J Neurophysiol. 97:2230–2238.