ORIGINAL ARTICLE
Neural Correlates of Auditory Figure-Ground Segregation Based on Temporal Coherence

Sundeep Teki (1,2,4), Nicolas Barascud (1,3), Samuel Picard (3), Christopher Payne (3), Timothy D. Griffiths (1,2) and Maria Chait (3)

(1) Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; (2) Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK; (3) Ear Institute, University College London, London WC1X 8EE, UK; (4) Current address: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK

Address correspondence to Maria Chait, UCL Ear Institute, University College London, London WC1X 8EE, UK. Email: m.chait@ucl.ac.uk; Sundeep Teki, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK. Email: sundeep.teki@gmail.com

Timothy D. Griffiths and Maria Chait contributed equally as last authors.

© The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Cerebral Cortex, 2016, 1-12. doi: 10.1093/cercor/bhw173. Advance Access published June 19, 2016.
Abstract
To make sense of natural acoustic environments, listeners must parse complex mixtures of sounds that vary in frequency, space, and time. Emerging work suggests that, in addition to the well-studied spectral cues for segregation, sensitivity to temporal coherence (the coincidence of sound elements in and across time) is also critical for the perceptual organization of acoustic scenes. Here, we examine pre-attentive, stimulus-driven neural processes underlying auditory figure-ground segregation using stimuli that capture the challenges of listening in complex scenes where segregation cannot be achieved based on spectral cues alone. Signals ("stochastic figure-ground": SFG) comprised a sequence of brief broadband chords containing random pure tone components that vary from 1 chord to another. Occasional tone repetitions across chords are perceived as "figures" popping out of a stochastic "ground." Magnetoencephalography (MEG) measurement in naïve, distracted, human subjects revealed robust evoked responses, commencing from about 150 ms after figure onset, that reflect the emergence of the "figure" from the randomly varying "ground." Neural sources underlying this bottom-up driven figure-ground segregation were localized to planum temporale and the intraparietal sulcus, demonstrating that this area, outside the "classic" auditory system, is also involved in the early stages of auditory scene analysis.

Key words: auditory cortex, auditory scene analysis, intraparietal sulcus, magnetoencephalography, segregation, temporal coherence
Introduction
A major challenge for understanding listening in the crowded environments we typically encounter involves uncovering the perceptual and neuro-computational mechanisms by which the auditory system extracts a sound source of interest from a hectic scene. Until recently, most such attempts focused primarily on "figure" and "ground" signals that differ in frequency, motivated by findings that segregation is associated with activation of spatially distinct populations of neurons in the primary auditory cortex (A1), driven by neuronal adaptation, forward masking, and frequency selectivity (for reviews, see: Fishman et al. 2001, 2004; Carlyon 2004; Micheyl, Carlyon, et al. 2007; Micheyl, Hanson, et al. 2007; Gutschalk et al. 2008; Elhilali, Ma, et al. 2009; Elhilali, Xiang, et al. 2009; Fishman and Steinschneider 2010; Kidd et al. 2011; Moore and Gockel 2012; Snyder et al. 2012).
However, emerging work suggests that spectral separation per se is neither sufficient (Elhilali, Ma, et al. 2009) nor necessary (Teki et al. 2011, 2013; Micheyl, Kreft, et al. 2013; Micheyl, Hanson, et al. 2013; Christiansen et al. 2014; O'Sullivan et al. 2015) for segregation to take place. Using a broadband signal ("stochastic figure-ground": SFG; Fig. 1), comprised of a sequence of brief chords containing random pure tone components that vary from 1 chord to another, we demonstrated that listeners are highly sensitive to the occasional repetition of a subset of tone-pips across chords. Perceptually, the repeating tones fuse together to form a "figure" that pops out from the randomly varying "ground" (Teki et al. 2011, 2013). This emergence of structure from a stochastic background captures the challenges of hearing in complex scenes where sources overlap in spectrotemporal dimensions such that segregation cannot be achieved based on spectral cues alone. The notable sensitivity exhibited by listeners confirms that the auditory system possesses specialized mechanisms which are tuned to the temporal coincidence of a small subset of sound elements within a mixture. The general pattern of performance, including that it scales with the number of temporally correlated channels, is consistent with the predictions of a recent model of auditory segregation, the "temporal coherence" model (see extensive discussion in Shamma et al. 2011; Teki et al. 2013), based on a hypothesized mechanism that captures the extent to which activity in distinct neuronal populations that encode different perceptual features is correlated in time (Krishnan et al. 2014). The model proposes that, in addition to spectral separation, the auditory system relies on temporal relationships between sound elements to perceptually organize acoustic scenes (Elhilali, Ma, et al. 2009; Shamma et al. 2011; Micheyl, Hanson, et al. 2013; Micheyl, Kreft, et al. 2013).
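The core computation this model posits can be made concrete with a small sketch (ours, for illustration only; the full model in Krishnan et al. 2014 operates on richer cortical multi-resolution representations). Given the envelopes of a bank of frequency channels, temporal coherence is simply the pairwise correlation of channel activity over time:

```matlab
% env: [nChannels x nTime] envelopes of frequency channels (assumed given,
% e.g., from an ERB-spaced filterbank applied to the sound mixture)
C = corrcoef(env');                     % nChannels x nChannels coherence map

% "Figure" channels correlate with one another; background channels do not.
figScore = mean(C - eye(size(C)), 2);   % mean off-diagonal coherence per channel
```

Channels carrying the repeating figure components form a high-coherence cluster in C, which is what a downstream read-out would need to detect for the figure to "pop out."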
Using fMRI, and an SFG signal that contained brief figures interspersed within long random tone patterns, we previously observed activations in planum temporale (PT), superior temporal sulcus (STS), and, intriguingly, in the intraparietal sulcus (IPS; Teki et al. 2011) evoked specifically by the appearance of temporally coherent tone patterns. However, due to the poor temporal resolution of fMRI, it remains unclear at what stage, in the course of figure-ground segregation, these areas play a role. In particular, a central issue pertains to whether activity in IPS reflects early processes that are causally responsible for segregation or rather the (later) consequences of perceptual organization (Cusack 2005; Shamma and Micheyl 2010; Teki et al. 2011).
The present magnetoencephalography (MEG) study was designed to capture the temporal dynamics of the brain regions involved in segregating the SFG stimulus. Participants performed an incidental visual task while passively listening (in separate blocks) to 2 versions of SFG signals (Fig. 1). One version (Fig. 1A), hereafter termed the "basic" condition, consisted of a sequence of brief (25 ms) chords, each containing a random number of pure tone components that varied from 1 chord to the next. Partway through the signal, a certain number of components were fixed across chords for the remaining duration. The second version (Fig. 1C) contained loud noise bursts (25 ms) interspersed between successive chords. The noise bursts were intended to break the pattern of repeating tonal components that comprise the figure and reduce possible effects of adaptation, which may underlie figure detection. In previous behavioral experiments (Teki et al. 2013), this manipulation revealed robust figure-detection performance. In fact, listeners continued to detect the "figure" significantly above chance for intervening noise durations of up to 500 ms, demonstrating that the underlying mechanisms, which link successive temporally coherent components across time and frequency, are robust to interference over very long time scales.
We used MEG to track, with excellent temporal resolution, the process of figure-ground segregation and the brain areas involved. We observed robust early (within 200 ms of figure onset) evoked responses that were modulated by the number of temporally correlated channels comprising the figure. Sources underlying this bottom-up figure-ground segregation were localized to PT and the IPS, demonstrating that this area, outside the classic auditory cortex, is also involved in auditory scene analysis.
Materials and Methods
Participants
Sixteen participants (9 females; mean age: 26.9 years) with normal hearing and no history of audiological or neurological disorders took part in the study. Experimental procedures were approved by the Institute of Neurology Ethics Committee (University College London, UK), and written informed consent was obtained from each participant.
Figure 1. Stochastic figure-ground stimulus. (A) An example spectrogram of the basic SFG stimulus. Signals consisted of a sequence of 25 ms chords, each containing a random number of pure tone components that varied from 1 chord to the next. At 600 ms after onset (black dashed line), a certain number of components (coherence = 2, 4, or 8; 4 in this example; indicated by arrows) were fixed across chords in the second half of the stimulus. The resulting percept is that of a "figure" within a randomly varying background. (B) A schematic of the basic SFG stimulus whose spectrogram is shown in A. Randomly varying background chords (in black, 25 ms long) form the no-figure part of the stimulus. Following the transition (indicated by red dotted lines), 4 extra components (shown in pink) are added, which are temporally correlated in the figure condition (FIG4) while randomly occurring in the ground condition (GND4). (C) The noise SFG stimulus is constructed similar to the basic SFG stimulus except for the introduction of 25 ms chords of white noise between each SFG chord. The plots in A and C represent auditory spectrograms, generated with a filterbank of 1/ERB-wide channels (Equivalent Rectangular Bandwidth; Moore and Glasberg (1983)) equally spaced on a scale of ERB rate. Channels are smoothed to obtain a temporal resolution similar to the Equivalent Rectangular Duration (Plack and Moore 1990).
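For readers wishing to reproduce such auditory spectrograms, the ERB-rate axis can be approximated as below. This is a sketch using the widely cited Glasberg and Moore (1990) formula, which may differ in detail from the exact filterbank used for Figure 1:

```matlab
% ERB-rate (Cam) scale, Glasberg & Moore (1990) approximation
f = logspace(log10(170), log10(2500), 64);     % stimulus frequency range, Hz
erbRate = 21.4 * log10(4.37 * f / 1000 + 1);   % ERB number for each frequency

% Channel center frequencies equally spaced on the ERB-rate scale:
cams = linspace(erbRate(1), erbRate(end), 64);
cf   = 1000 * (10.^(cams / 21.4) - 1) / 4.37;  % invert the ERB-rate formula
```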
Stimuli
Signals consisted of a sequence of 25 ms chords, each comprising a random set of tones drawn from a fixed frequency pool ranging from 0.17 to 2.5 kHz, spaced in 1/24-octave steps. This range is narrower than that in our previous studies (0.17-7.2 kHz; Teki et al. 2011, 2013) due to the low-pass filtering characteristics of the Etymotic tubes used for sound delivery. Each chord contained an average of 10 (varying between 5 and 15) pure tone components that changed randomly from 1 chord to the next. A "figure" is incorporated in this randomly varying tonal stimulus by repeating a number of randomly chosen frequencies ("coherence" of the figure: 2, 4, or 8) over a certain number of chords (referred to as the "duration" of the figure). The resulting percept is that of a grouped auditory object ("figure") that pops out from the background. Importantly, the figure is only detectable by integrating the repeating components across frequency and time, as the "background" and "figure" components are indistinguishable within a particular chord.

In earlier work, we used a stimulus design where the figure appeared for a brief duration (ranging from 50 to 350 ms) amidst an ongoing random chord sequence (Teki et al. 2011, 2013, 2016). For the present study, the stimulus was modified such that the figure was introduced exactly midway through the stimulus and remained present until offset, as shown in Figure 1. This design was used to specifically examine time-locked responses evoked by the appearance of the figure, as well as later activity potentially related to the ongoing representation of the figure amid the fluctuating background.
The stimulus was created by first generating a background-only signal for the total duration of the stimulus and then incorporating additional repeating (temporally correlated) tones (2, 4, or 8; hereby referred to as FIG2, FIG4, and FIG8, respectively) during the second half of the signal. Similarly, additional uncorrelated components (2, 4, or 8; randomly varying across chords) were incorporated in the stimuli (50%) that did not contain a figure, to control for the increase in energy associated with the addition of the figure components. These "ground" (or no-figure) signals will be referred to as GND2, GND4, and GND8, respectively. See a schematic representation of FIG4 and GND4 signals in Figure 1. Overall, half of the signals contained a figure (with equal proportions of FIG2, FIG4, and FIG8) and the other half did not (with equal proportions of GND2, GND4, and GND8).
Two versions of the SFG stimuli were used in different blocks: the "basic" version (Fig. 1A) consisted of consecutive 25 ms chords (1200 ms long stimulus with the figure appearing at 600 ms post onset), and the "noise" version (Fig. 1C) consisted of 25 ms of wide-band white noise interspersed between successive 25 ms long chords (2400 ms long stimulus with the figure appearing at 1200 ms post onset; note that the number of chords is identical to that in the "basic" stimulus). The level of the noise was set to 12 dB above the level of the chords.
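The construction described above can be sketched in a few lines of MATLAB. This is a minimal reconstruction from the text, not the authors' original stimulus code; the windowing and variable names are our assumptions:

```matlab
% Minimal sketch of one "basic" SFG trial (FIG condition)
fs        = 44100;          % sampling rate (Hz)
chordDur  = 0.025;          % chord duration (s)
nChords   = 48;             % 48 x 25 ms = 1200 ms stimulus
coherence = 4;              % number of repeating "figure" components

pool  = 170 * 2.^(0 : 1/24 : log2(2500/170));  % 0.17-2.5 kHz, 1/24-octave steps
nSamp = round(chordDur * fs);
t     = (0:nSamp-1) / fs;
win   = 0.5 - 0.5*cos(2*pi*(0:nSamp-1)/(nSamp-1));  % smoothing window per chord

figFreqs = pool(randperm(numel(pool), coherence));  % fixed from midway onward
stim = [];
for k = 1:nChords
    nComp = randi([5 15]);                          % chord size: 5-15 tones
    freqs = pool(randperm(numel(pool), nComp));     % random background tones
    if k > nChords/2
        freqs = [freqs figFreqs];                   % add the repeating figure
    end
    chord = win .* sum(sin(2*pi * freqs(:) * t), 1);  % superpose pure tones
    stim  = [stim chord];
end
stim = stim / max(abs(stim));   % normalize; soundsc(stim, fs) to listen
```

A GND trial follows the same recipe but draws the extra 2, 4, or 8 components anew on every chord instead of fixing them.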
All acoustic stimuli were created using MATLAB 7.5 software (The MathWorks, Inc.) at a sampling rate of 44.1 kHz and 16-bit resolution. Sounds were delivered binaurally with a tube phone attached to earplugs (E-A-RTONE 3A 10 Ω, Etymotic Research, Inc.) inserted into the ear canal and presented at a comfortable listening level adjusted individually by each participant. The experiment was executed using the Cogent toolbox (http://www.vislab.ucl.ac.uk/cogent.php).
Procedure
The recording started with a functional source-localizer session where participants were required to attend to a series of 100 ms long pure tones (1000 Hz) for approximately 3 min. A variable number of tones (between 180 and 200) were presented with a random interstimulus interval of 700-1500 ms. Subjects were asked to report the total number of tones presented. This "localizer" session served to identify channels that respond robustly to sound. These were used for subsequent analysis of the sensor-evoked responses to the SFG stimuli.
During the experiment, subjects were engaged in an incidental visual task while passively listening to the SFG stimuli. The visual task consisted of landscape images, presented in series of 3 (each image was presented for 5 s, with an average gap of 2 s between groups during which the screen was blank). Subjects were instructed to fixate on a cross at the center of the display and press a button whenever the third image in a series was identical to the first or the second image. Such repetitions occurred on 10% of the trials. Responses were executed using a button box held in the right hand. The visual task served as a decoy: a means to ensure that subjects' attention was diverted away from the acoustic stimuli. At the end of each block, subjects received feedback about their performance (number of hits, misses, and false positives). To avoid any temporal correlation between the auditory and visual presentation, the visual task was presented from a different computer, independent from the one controlling the presentation of the acoustic stimuli.
The MEG experiment lasted approximately 1.5 h and consisted of 8 blocks. Four blocks involved presentation of the "basic" SFG stimulus, while the "noise" condition was presented in the remaining 4 blocks. The order of presentation was counterbalanced across subjects. A total of 660 trials were presented for each condition: 110 trials for each combination of stimulus type (figure and ground) and number of added components (2, 4, and 8). Each "basic" block took between 8 and 10 min, and the "noise" blocks took twice as long. Subjects were allowed a short rest between blocks but were required to stay still.
MEG Data Acquisition and Preprocessing
Data were acquired using a 274-channel, whole-head MEG scanner with third-order axial gradiometers (CTF Systems) at a sampling rate of 600 Hz and analyzed using SPM12 (Litvak et al. 2011; Wellcome Trust Centre for Neuroimaging, London) and Fieldtrip (Oostenveld et al. 2011) in MATLAB 2013 (The MathWorks, Inc.). The data from the localizer session were divided into 700 ms epochs, including a 200 ms prestimulus baseline period, baseline-corrected, and low-pass filtered with a cutoff frequency of 30 Hz. The M100 onset response (Roberts et al. 2000) was identified for each subject as a source/sink pair in the magnetic field contour plots distributed over the temporal region of each hemisphere. For each subject, the 40 most activated channels at the peak of the M100 (20 in each hemisphere) were selected for subsequent sensor-level analysis of the responses evoked by the SFG stimuli.
Data epochs from the main experimental blocks consisted of a 500 ms prestimulus baseline and a 700 ms poststimulus period (overall 2400 ms for "basic" and 3600 ms for "noise" conditions). Epochs with peak amplitudes that deviated from the mean by more than twice the standard deviation (typically about 7% of epochs) were flagged as outliers and discarded automatically from further analyses (approximately 100 epochs were obtained for each stimulus condition).
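This rejection rule is simple to sketch (illustrative MATLAB; the actual pipeline used SPM12/Fieldtrip routines, and the array names here are ours):

```matlab
% epochs: [nTrials x nChannels x nTime] single-trial data
peaks  = max(max(abs(epochs), [], 3), [], 2);        % peak amplitude per trial
keep   = abs(peaks - mean(peaks)) <= 2 * std(peaks); % within 2 SDs of the mean
epochs = epochs(keep, :, :);                         % discard flagged outliers
```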
Denoising Source Separation analysis (DSS; see de Cheveigné and Parra 2014 for an extensive review of the method and its applications) was applied to each stimulus condition to extract stimulus-locked activity (the most reproducible linear combination of sensors across trials); the 2 most repeatable components in each condition were retained and projected back to sensor space.
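Conceptually, DSS finds the linear combinations of sensors that are most reproducible across trials. A bare-bones sketch (ours; it omits the whitening and regularization steps a robust implementation needs, see de Cheveigné and Parra 2014) solves a generalized eigenvalue problem between the covariance of the trial average and that of the raw data:

```matlab
% x: [nChannels x nTime x nTrials] epoched data for one condition
[nCh, nT, nTr] = size(x);
xm = mean(x, 3);                                 % trial average (evoked part)
Xf = reshape(x, nCh, nT * nTr);                  % all trials, concatenated
C0 = (Xf * Xf') / (nT * nTr);                    % raw-data covariance
C1 = (xm * xm') / nT;                            % covariance of the average
[V, D] = eig(C1, C0);                            % generalized eigenvectors
[~, ord] = sort(diag(D), 'descend');             % sort by reproducibility
W = V(:, ord(1:2));                              % 2 most repeatable filters
comp = W' * Xf;                                  % component time courses
xDen = reshape(pinv(W') * comp, nCh, nT, nTr);   % project back to sensor space
```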
Epochs were then averaged and baseline-corrected to the prestimulus interval. In each hemisphere, the root-mean-squared (RMS) field strength across the 20 channels (selected from the localizer session) was calculated for each participant. The time course of the RMS, reflecting the instantaneous power of neural responses, is employed as a measure of neuronal responses evoked in the auditory cortex. As most of the observed activity (including in source space) was in auditory cortex, selecting channels based on the M100 represents a reasonable approach for summarizing the sensor-level data in a single time series. For purposes of illustration, group-RMS (RMS of individual subjects' RMS) is shown, but statistical analysis was always performed across subjects, independently for each hemisphere.
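The RMS summary itself is a one-liner; a sketch with assumed array names:

```matlab
% evoked:  [nChannels x nTime] trial-averaged, baseline-corrected data
% chanSel: indices of the 20 localizer-selected channels in one hemisphere
rmsTC = sqrt(mean(evoked(chanSel, :).^2, 1));   % RMS field strength over time

% For display only: group-RMS = RMS over subjects of individual RMS traces
% (subjRMS: [nSubjects x nTime])
groupRMS = sqrt(mean(subjRMS.^2, 1));
```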
Statistical Analysis
To estimate the time required to discover the figure, the difference between the RMS waveforms of each FIG and GND pair was calculated for each participant and subjected to bootstrap resampling (2000 iterations; balanced; Efron and Tibshirani 1993). The difference was deemed significant if the proportion of bootstrap iterations that fell above or below zero was >99.9% (i.e., P < 0.001) for 5 or more consecutive samples. The first significant sample identified in this way is considered the earliest time point at which the response to the figure differed significantly from the corresponding GND control. The bootstrap analysis was run over the entire epoch duration, and all significant intervals are indicated in Figures 2 and 3 as shaded gray regions.
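A sketch of this procedure (ordinary, unbalanced resampling for clarity; the study used a balanced bootstrap, and the variable names are ours):

```matlab
% diffs: [nSubjects x nTime] FIG-minus-GND RMS difference per participant
nBoot = 2000;  [nSub, nT] = size(diffs);
bootMean = zeros(nBoot, nT);
for b = 1:nBoot
    idx = randi(nSub, nSub, 1);              % resample subjects w/ replacement
    bootMean(b, :) = mean(diffs(idx, :), 1);
end
propPos = mean(bootMean > 0, 1);             % fraction of iterations above zero
sig = propPos > 0.999 | propPos < 0.001;     % two-tailed, P < 0.001
run5 = conv(double(sig), ones(1, 5), 'valid') == 5;  % >= 5 consecutive samples
firstSig = find(run5, 1);                    % earliest divergence (sample index)
```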
A repeated-measures ANOVA, with mean amplitude between figure onset and offset (t = 600-1200 ms for "basic" and t = 1200-2400 ms for "noise" conditions, respectively) as the dependent variable, was used to examine global effects of stimulus (FIG or GND), number of added components (2, 4, and 8), and hemisphere.
Source Analysis
Source analysis was performed using the generic "Imaging" approach implemented in SPM12 (Litvak et al. 2011; López et al. 2014). We used a classical minimum norm algorithm that seeks to achieve a good data fit while minimizing the overall energy of the sources. In SPM12, this method is referred to as independent identical distribution (IID), as it is based on the assumption that the probability of each source being active is independent and identically distributed (Hämäläinen and Ilmoniemi 1994). The IID method corresponds to a standard L2-minimum norm, which consists of fitting the data at the same time as minimizing the total energy of the sources.
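In standard notation (sensor data M, lead field/gain matrix L, unknown source amplitudes J, regularization parameter λ), this is the classical Tikhonov-regularized estimate; a textbook form consistent with Hämäläinen and Ilmoniemi (1994), noting that SPM12 sets the trade-off by Bayesian hyperparameter optimization rather than a hand-picked λ:

\[
\hat{J} \;=\; \arg\min_{J}\, \lVert M - LJ \rVert^{2} + \lambda \lVert J \rVert^{2}
\;=\; L^{\top}\!\left( LL^{\top} + \lambda I \right)^{-1} M .
\]

The first term rewards data fit; the second penalizes total source energy, which is exactly the trade-off described above.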
Standard processing steps were employed. Specifically, data were first concatenated across blocks for each participant. A generic 8196-vertex cortical mesh template was coregistered (as provided in SPM12 and defined in the MNI stereotaxic space) to the sensor positions using 3 fiducial marker locations (Mattout et al. 2007). We then used a standard single-shell head model for the forward computation of the gain matrix of the lead field model (Nolte 2003). Source estimates on the cortical mesh were obtained via inversion of the forward model with the IID method described above.

Figure 2. MEG evoked responses to the basic SFG stimulus. (Top) Each plot depicts the group-RMS response to the basic SFG stimulus in the right hemisphere (left hemisphere responses are identical). The onset of the stimulus occurs at t = 0 and offset at t = 1200 ms; the transition to the figure, indicated by the dashed vertical lines, occurs at t = 600 ms. The responses to the figure and ground segments are shown in the darker and lighter shade of each color: red (FIG8 and GND8), blue (FIG4 and GND4), green (FIG2 and GND2). The shaded gray bars indicate times where a significant difference between the response to the figure and its corresponding control stimulus was observed (based on bootstrap analysis; see Materials and Methods). (Bottom) Mean RMS amplitude in each of the conditions computed over the figure interval (between 600 and 1200 ms poststimulus onset). A repeated-measures ANOVA indicated significant differences between each FIG and GND pair. ** indicates P ≤ 0.01.
The IID model was used to identify distributed sources of brain activity underlying the transition from a background to a coherent figure. The inverse estimates were obtained for all 6 experimental conditions together to allow statistical comparisons between them ("basic" and "noise" blocks had to be analyzed separately due to the different epoch lengths). The inverse reconstruction was performed over the largest possible time window to let the algorithm model all the brain sources generating the response (Litvak et al. 2011). For the "basic" stimuli, the inversion used the entire stimulus epoch (from -300 ms to +1700 ms, relative to stimulus onset). For the "noise" stimuli, this approach did not work well, because the signal-to-noise ratio of the data is intrinsically much smaller. To confirm this, we compared model evidences in the "noise" conditions with inversion over the entire epoch, and inversion over the stimulus epoch from +1200 ms to +2700 ms. The latter yielded a more accurate inversion for all subjects (the difference in log-model evidences, i.e., log Bayes factor, was >10 for all subjects; Penny et al. 2004) and was therefore used for source localization.
Source activity for each condition was then summarized as separate NIfTI images, by applying the projectors estimated in the inversion stage to the averaged trials (to localize evoked activity), over 2 distinct time windows: an initial transition phase ("early"; a 100 ms period starting from the first time point at which the figure and the corresponding ground became significantly different), as well as a later phase ("late"; a 100 ms period before stimulus offset). The specific values for the time windows used for source localization for each coherence value are detailed in the Results section. The resulting 3D images were smoothed using a Gaussian kernel with 5-mm full-width at half maximum and taken to second-level analysis for statistical inference.
At the second level, the data were modeled with 6 conditions (GND8, GND4, GND2, FIG2, FIG4, and FIG8) with a design matrix including a subject-specific regressor and correcting for heteroscedasticity across conditions. We sought to identify brain areas whose activity increased parametrically with corresponding changes in coherence (i.e., over and above the changes in power associated with adding components). For this purpose, a parametric contrast [-8 -4 -2 +2 +4 +8]/14 was used. Effectively, the contrast can be expressed as: 2 × (FIG2 - GND2) + 4 × (FIG4 - GND4) + 8 × (FIG8 - GND8), thus targeting brain regions whose activity is parametrically modulated by rising temporal coherence (2 < 4 < 8) while controlling (by subtracting activity of matched GND signals) for the increase in power associated with the added figure components. We also used a simple "Figure versus Ground" contrast: [-1 -1 -1 1 1 1]/3 (a short sketch of both sets of contrast weights follows this section). Statistical maps were initially thresholded at a level of P < 0.001 uncorrected, and peaks were considered significant only if they survived family-wise error (FWE) correction at P < 0.05 across the whole-brain volume. Because we had prior hypotheses regarding PT and IPS based on our fMRI results (Teki et al. 2011), FWE correction was also applied using small volume correction (SVC; Frackowiak et al. 2003) within these regions. Due to limitations inherent to the resolution of our source analysis, a (conservative) compound mask over PT and the adjacent Heschl's gyrus (HG) was used. The corresponding cortical masks were determined by reference to the Juelich histologic atlas (for IPS; Eickhoff et al. 2005; Choi et al. 2006; Scheperjans et al. 2008), and the Harvard-Oxford structural atlases (for Heschl's gyrus and PT; Desikan et al. 2006), available in FSLView (http://surfer.nmr.mgh.harvard.edu/), thresholded at 10% probability. SVC-corrected regions are indicated by asterisks in Tables 1 and 2.

Figure 3. MEG evoked responses to the noise SFG stimulus. (Top) Each plot depicts the group-RMS response to the noise SFG stimulus in the right hemisphere (left hemisphere responses are identical). The onset of the stimulus occurs at t = 0 and offset at t = 2400 ms; the transition to the figure, indicated by the dashed vertical lines, occurs at t = 1200 ms. The responses to the figure and ground segments are shown in the darker and lighter shade of each color: red (FIG8 and GND8), blue (FIG4 and GND4), green (FIG2 and GND2). The shaded gray bars indicate times where a significant difference between the response to the figure and its corresponding control stimulus was observed (based on bootstrap analysis; see Materials and Methods). (Bottom) Mean RMS amplitude in each of the conditions computed over the figure interval (between 1200 and 2400 ms poststimulus onset). A repeated-measures ANOVA indicated significant differences between each FIG and GND pair. ** indicates P ≤ 0.01; * indicates P < 0.05.
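As a concrete check of these contrast weights (a sketch in MATLAB; the variable names are ours), note that the parametric vector is exactly the coherence-weighted sum of FIG-minus-GND differences:

```matlab
% Condition order in the design: [GND8 GND4 GND2 FIG2 FIG4 FIG8]
cCoh = [-8 -4 -2  2  4  8] / 14;    % parametric coherence contrast
cFig = [-1 -1 -1  1  1  1] / 3;     % simple Figure-versus-Ground contrast

% For condition estimates beta (6 x 1), cCoh * beta equals
% (2*(FIG2-GND2) + 4*(FIG4-GND4) + 8*(FIG8-GND8)) / 14:
beta = randn(6, 1);                 % dummy estimates for the check
lhs  = cCoh * beta;
rhs  = (2*(beta(4)-beta(3)) + 4*(beta(5)-beta(2)) + 8*(beta(6)-beta(1))) / 14;
assert(abs(lhs - rhs) < 1e-12)      % identical by construction
```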
Results
The performance on the incidental visual task was at ceiling for all participants, suggesting that they remained engaged in the task throughout the experiment. Since participants were naive to the nature of the acoustic stimuli, and thus unlikely to actively attend to the figures, it can be assumed that the observed auditory responses primarily reflect bottom-up, stimulus-driven processes.
Basic SFG: Evoked Responses
Figure 2 shows the group-RMS of stimulus-evoked fields, separately for the corresponding FIG and GND conditions, in the right hemisphere (a similar pattern is observed in the left hemisphere). In all conditions, a standard sound onset response is observed with a clear M50, M100, and M200 peak complex (indicated in Fig. 2). The ongoing slow evoked response is characterized by a constant 40 Hz fluctuation of mean evoked power, which follows the rate of presentation of individual chords (every 25 ms).
Following the transition to the figure, clear evoked responses are observed in all FIG conditions. This response consists of an early transient phase characterized by a sharp increase over a few chords (more evident for FIG8 and FIG4), leading to a local maximum in evoked activity, followed by a more sustained phase until stimulus offset.
The responses to the control GND stimuli allow us to distinguish whether the figure-evoked responses are mediated simply by an increase in energy associated with the additional components or relate specifically to the computation of temporal coherence, linked to the appearance of the figure. Indeed, a transition response (i.e., an increase in RMS amplitude as a function of the number of added components) is also present in the GND conditions. However, this response is significantly lower in amplitude and lacks the initial transient phase (sharp increase in power), demonstrating that the response observed for the FIG conditions is largely driven by the temporal coherence of the components comprising the figure.
Bootstrap analysis (see Materials and Methods) revealed that the difference between the response to the figure and its corresponding control condition remains significant throughout the figure segment (indicated by the gray-shaded region), until after sound offset. The first significantly different sample (i.e., the time when the response to the FIG condition first diverges from that to GND) occurred at 158 ms (158 ms), 206 ms (195 ms), and 280 ms (225 ms) posttransition in the left (right) hemisphere for FIG8, FIG4, and FIG2, respectively (see Fig. 2).
A repeated-measures ANOVA with mean amplitude during the figure interval as the dependent variable, and condition (FIG vs. GND), number of components (8, 4, and 2), and hemisphere (left vs. right) as factors indicated no main effect of hemisphere (F(1,15) = 3.25, P = 0.091) but confirmed significant main effects of condition (F(1,15) = 58.53, P < 0.001) and number of added components (F(2,30) = 27.26, P < 0.001), as well as a significant interaction between condition and number of added components (F(2,30) = 13.25, P < 0.001). The interaction indicates that the amplitude of mean evoked field strength is higher for the figure, over and above the effect of the increase in spectral energy, and that it increases significantly with the number of coherent components in the figure. We refer to this effect as the effect of coherence. A series of repeated-measures t tests for each FIG and its corresponding GND (data averaged over hemispheres) confirmed significant differences for all pairs (FIG8 vs. GND8: t = 7.01, P < 0.001; FIG4 vs. GND4: t = 6.77, P < 0.001; FIG2 vs. GND2: t = 4.25, P = 0.01), demonstrating that the brains of naive listeners are sensitive to the temporal coherence associated with only 2 repeating components.
Noise SFG: Evoked Responses
Figure 3 shows the group-RMS of stimulus-evoked fields for the noise SFG stimuli. The general time course of evoked activity is similar to that observed for the basic SFG stimulus. The ongoing slow evoked response is characterized by a constant 20 Hz fluctuation of mean evoked power, which follows the rate of the (loud) noise bursts interspersed between chords.
Table 2. MEG sources whose activity increased with coherence for the noise SFG stimulus

Area               Hemisphere  Response phase  x, y, z (mm)  t value
PT*                R           Early           60, -24, 20   4.21
                                               62, -28, 6    4.03
PT*                L           Early           -60, -32, 24  4.12
                                               -62, -38, 14  3.86
PT                 L           Late            -62, -30, 24  6.76
                                               -60, -38, 14  5.44
Postcentral gyrus  R           Late            40, -16, 38   5.40
                                               50, -14, 42   4.99
PT*                R           Late            54, -26, 18   4.34
                                               64, -34, 12   4.15
IPS*               R           Late            30, -40, 62   4.18
                                               28, -46, 54   4.13

Note: Local maxima are shown at P < 0.05 (FWE) at the whole-brain level.
*Small volume-corrected P < 0.05 (FWE).
Table 1. MEG sources whose activity increased with coherence for the basic SFG stimulus

Brain area  Hemisphere  Response phase  x, y, z (mm)  t value
PT          R           Early           52, -18, 12   6.23
                                        64, -18, 6    6.12
HG*         R           Early           42, -26, 8    4.31
IPS         R           Early           50, -56, 34   5.98
                                        48, -62, 28   4.53
IPS         L           Early           -36, -72, 42  4.81
                                        -30, -66, 46  4.59
PT*         L           Early           -60, -32, 12  3.56
                                        -58, -30, 18  3.46
IPS         R           Late            48, -56, 32   5.08
                                        44, -62, 28   3.93
PT*         R           Late            56, -6, 12    4.16
                                        64, -20, 8    3.75
PT*         L           Late            -50, -20, 14  3.75
                                        -56, -20, 4   3.44

Note: Local maxima are shown at P < 0.05 (FWE) at the whole-brain level.
*Small volume-corrected P < 0.05 (FWE).
The addition of a figure is associated with a sharp increase in power, followed by a sustained-like phase that persists until stimulus offset. A bootstrap analysis revealed significantly greater responses to each figure condition compared with its corresponding control, as shown in Figure 3. The latencies at which FIG responses became significantly different from the responses to the GND were approximately 238 ms (300 ms), 720 ms (410 ms), and 412 ms (412 ms) in the left (right) hemisphere for coherence of 8, 4, and 2, respectively.
A repeated-measures ANOVA with mean amplitude during the figure interval as the dependent variable, and condition (FIG vs. GND), number of components (8, 4, and 2), and hemisphere (left vs. right) as factors indicated no main effect of hemisphere (F(1,15) = 3.21, P = 0.093) but confirmed significant main effects of condition (F(1,15) = 31.98, P < 0.001) and number of added components (F(2,30) = 7.28, P = 0.003), as well as a significant interaction between condition and number of added components (F(2,30) = 4.55, P = 0.019; effect of figure coherence). A series of t tests for each FIG and GND pair (data averaged over hemispheres) confirmed significant differences for all [FIG8 vs. GND8: t = 5.02, P < 0.001; FIG4 vs. GND4: t = 2.4, P = 0.024; FIG2 vs. GND2: t = 2.84, P = 0.012], demonstrating that despite the loud noise interspersed between successive chords (resulting in large power fluctuations across the entire spectrum and therefore reduced power differences between channels), even a figure consisting of only 2 coherent components is reliably encoded by the brains of naive listeners.
A repeated-measures ANOVA (over mean amplitude during the figure period) with block ("basic" vs. "noise"), condition (FIG vs. GND), number of components (8, 4, and 2), and hemisphere (left vs. right) as factors indicated no main effect of block (F(1,15) = 2.5, P = 0.128) or hemisphere (F(1,15) = 3.5, P = 0.08) but confirmed significant main effects of condition (F(1,15) = 61.3, P < 0.001) and number of added components (F(2,30) = 23.33, P < 0.001), as well as an interaction between condition and number of added components (F(2,30) = 15.06, P < 0.001; effect of figure coherence, as observed separately for "basic" and "noise" stimuli). The following interactions were also significant: 1) between block and number of added components (F(2,30) = 16.2, P = 0.001) and 2) between block and condition (F(1,15) = 5.23, P = 0.01), both due to the fact that the effects of condition and number of components were weaker in the "noise" relative to the "basic" stimuli. Crucially, however, both stimulus types show similar coherence effects.
Basic SFG: Source Analysis
To identify brain regions whose activity is parametrically modulated by the coherence of the figure (on top of the increase in power associated with the added figure components), we tested for a signal increase with a parametric contrast over the GND8, GND4, GND2, FIG2, FIG4, and FIG8 conditions (see Materials and Methods). This contrast mirrors the interaction observed in the analysis of the time-domain data and is in line with our previous fMRI study, where significant parametric BOLD responses were observed in the right PT and IPS (Teki et al. 2011). Although the spatial resolution of MEG does not match the high resolution provided by fMRI, recent advances in MEG source inversion techniques permit source localization with a relatively high degree of precision (López et al. 2014).
To capture effects associated with the initial discovery of the figures as well as later processes related to tracking the figures within the random background, we analyzed sources of evoked field strength in two 100 ms time windows: 1) "early": starting from the first time sample that showed a significant difference between the figure and ground conditions as determined by the bootstrap analysis above (FIG8: t = 158-258 ms; FIG4: t = 195-295 ms; FIG2: t = 225-325 ms) and 2) "late": during the sustained portion of the response, immediately preceding the offset of the stimulus (i.e., from t = 1100-1200 ms).
The results for the early phase revealed robust activations in the PT bilaterally (P < 0.05 FWE), and in the right inferior parietal cortex bordering the supramarginal gyrus, that varied parametrically with the coherence of the figure (Fig. 4A; Table 1). We also observed activation in the IPS, and the corresponding activation clusters were clearly spatially separated from the temporal sources, even in the uncorrected P < 0.001 t-maps (see Fig. 4). We also observed some activity in lateral HG that was contiguous with the PT activity in the right hemisphere only. A separate mask, centered on bilateral medial HG, suggested that coherence-related activity is also observed in the primary auditory cortex. However, because this was a post hoc analysis, and also because of limits inherent to the resolution of the MEG source analysis used here, it is difficult to distinguish this cluster from PT.
Activations in the late phase also involved PT bilaterally and right inferior parietal cortex (P < 0.05 FWE; small volume-corrected). We also examined activity in the IPS during both time windows. Figure 4C,D shows significant activation clusters in the IPS (P < 0.05 FWE; small volume-corrected) observed during the early and later response phases, respectively. There was no interaction between response phase ("early" or "late") and coherence, suggesting that IPS and PT contributed to the early and late phase processing in equal measure.
Figure 5 shows the group-averaged source waveforms extracted from the right PT and IPS for the basic condition. Both show activation consistent with the sensor-level RMS data (see Fig. 2). The IPS source exhibits weaker onset and offset responses, and lower amplitude sustained activity, consistent with its location further upstream within the processing hierarchy. Importantly, however, the response associated with the appearance of the figure is similar in magnitude in both areas. A repeated-measures ANOVA, with area (PT and IPS) and number of components as factors, was run on the mean amplitude difference between FIG and GND pairs during the figure period (600-1200 ms post onset). This showed a significant main effect of number of components (F(2,30) = 3.70, P < 0.05) only. The effect of area was not significant (F(1,15) = 0.16, P > 0.1), and there was no interaction between factors (F(2,30) = 0.87, P > 0.1), confirming that the effect of coherence was equally present in both PT and IPS.
We also conducted simple contrasts of "Figure" versus "Ground" (over the "early" time windows as described above). The negative contrast ("Ground" > "Figure") was used to address an alternative hypothesis for the mechanisms underlying figure-ground segregation, that is, adaptation-based mechanisms (e.g., stimulus-specific adaptation, SSA; Nelken 2014; Pérez-González and Malmierca 2014), which may be sensitive to repetition within the "coherent" channels. This effect would be observable as a decrease in activation for FIG relative to GND stimuli. However, the relevant contrast yielded no significant activations, both over the entire brain volume and within HG-centered masks. The positive contrast ("Figure" > "Ground") yielded activations essentially identical to those reported in Figure 4 and Table 1.
Noise SFG: Source Analysis
We examined the sources of evoked field strength underlying figure-ground processing in the noise SFG stimulus using a 200 ms long window starting from the first time sample that showed a significant difference between the figure and its ground control as determined by the bootstrap analysis (FIG8: t = 238-438 ms; FIG4: t = 410-610 ms; FIG2: t = 412-612 ms), and another 200 ms window during the later phase of the response, immediately preceding the offset of the figure segment (t = 2200-2400 ms). The longer window for source localization in the noise condition (100 ms of SFG chords and 100 ms of noise) is effectively equal to the 100 ms window (all SFG chords) used for the localization of activations in the basic condition.
The results for the early phase revealed robust activations in the PT bilaterally (P < 0.05 FWE) and in the right inferior parietal cortex, including the supramarginal gyrus, that varied parametrically with the coherence of the figure (Fig. 6A; Table 2). Activations in the late phase (Fig. 6B; Table 2) involved PT bilaterally and right inferior parietal cortex (P < 0.05 FWE; small volume-corrected). Figure 6C shows significant activity in the right IPS (P < 0.05 FWE; small volume-corrected), observed during the later response phase only. However, no significant interaction between phase ("early" or "late") and coherence was found. Thus, despite the fact that the "noise" condition localization was substantially noisier than that for the basic condition (as also reflected in the weaker sensor-level responses), the results suggest a pattern of activation similar to that for the "basic" condition.
Discussion
We used MEG to assess the temporal dynamics of stimulus-driven figure-ground segregation in naive, passively listening participants. We used the "stochastic figure-ground" (SFG) stimulus: a complex broadband signal, which comprises a "figure" defined by temporal correlation between distinct frequency channels. The SFG stimulus differs from other commonly used figure-ground signals (Kidd et al. 1994, 1995; Micheyl, Hanson, et al. 2007; Gutschalk et al. 2008; Elhilali, Xiang, et al. 2009) in that the "figure" and "ground" overlap in spectrotemporal space, like most natural sounds do, and segregation can only be achieved by integration across frequency and time (Teki et al. 2013).
Evoked Transition Responses
Our results revealed robust evoked responses, commencing from about 150-200 ms after figure onset, that reflect the emergence of the "figure" from the randomly varying "ground" in the absence of directed attention. The amplitude and latency of these responses varied systematically with the coherence of the figure. Similar effects of coherence (for coherence levels of 8 and 10) were recently reported in an EEG study based on a variant of the "basic" SFG stimulus which used continuous changes in the level of coherence (O'Sullivan et al. 2015). However, they observed much longer latencies (e.g., 433 ms for a ramped SFG figure with a coherence of 8) than those here, possibly due to differences in the stimuli used.
The early transient responses were followed by a sustained-like phase, continuing until figure offset, the amplitude of which also varied systematically with coherence. This general pattern was observed for the basic (Fig. 2) and, remarkably, the noise SFG stimulus (Fig. 3), where successive chords are separated by loud noise bursts. These results demonstrate that the underlying brain mechanisms, hypothesized to compute temporal coherence across frequency channels (Shamma et al. 2011), are robust to interference with the continuity of the scene, even when listeners were naive and engaged in an incidental visual task.

Figure 4. MEG source activations as a function of coherence for the basic SFG stimulus. Activations (thresholded at P < 0.001, uncorrected) are shown on the superior temporal plane of the MNI152 template image, and the corresponding y coordinates are overlaid on each image. The heat map adjacent to each figure depicts the t value. Coordinates of local maxima are provided in Table 1. Maximum response during the early transition period was observed in PT and right inferior parietal cortex (A) as well as in the right IPS (C). Activity during the later response window was observed in PT bilaterally and the right inferior parietal cortex (B) as well as in both the left and right IPS (D).
Figure 5. Group average of source activity waveforms for the basic SFG stimuli. The average source activity waveforms for the basic SFG stimuli were computed for sources in the right posterior superior temporal gyrus (MNI coordinates [64, -14, 8]; left panels) and the right intraparietal sulcus (MNI coordinates [54, -50, 40]; right panels). The 6 experimental conditions (FIG and GND; 2, 4, and 8 added components) were inverted together over the entire stimulus epoch, and the corresponding source activity was extracted using the maximum a posteriori (MAP) projector in SPM12 (López et al. 2014). The resulting time-course data were first averaged over trials, then over all available subjects (N = 16). The onset (t = 0 ms) and offset (t = 1200 ms) of the stimulus are marked by solid vertical lines; dashed vertical lines indicate the transition to the figure (t = 600 ms). The brain insets in each panel indicate the location of the sources for each region.
We additionally show that these transition responses scale with coherence not only at the sensor level but also at the level of the underlying neural sources. As shown in Figure 5, group-averaged source waveforms from the right PT show a similar morphology to the sensor-level transition responses: an initial transient peak is followed by a more sustained response, and the amplitude of these 2 response components varies parametrically with the coherence. Interestingly, group-averaged source responses from the right IPS also show striking coherence-modulated transition responses.
Neural Substrates of Figure-Ground Segregation
The discussion below is predominantly focused on the localization results from the "basic" condition. The responses for the "noise" condition were overall consistent with the "basic" responses but, as expected, yielded weaker effects.

The approach for identifying the neural substrates underlying the detection of the SFG figures was based on a parametric contrast, seeking brain areas where activity is parametrically modulated by the coherence of the figure. We also investigated a simple "ground > figure" contrast to address the alternative hypothesis that figure pop-out may be driven by frequency-specific adaptation (Nelken 2014). According to this account, the presence of the figure may be detectable as a (repetition-based) decrease in activity within the "coherent" channels. That the ground versus figure contrast yielded no significant activations, and in particular none in the primary auditory cortex where stimulus-specific adaptation is widely observed, suggests that adaptation may not be the principal mechanism underlying figure-ground segregation in the SFG stimulus. This is also in line with behavioral results that show that listeners can withstand significant amounts of interference, such as loud noise bursts up to 500 ms long, between successive chords (Teki et al. 2013).
Using the parametric contrast, we analyzed sources of evoked field strength in 2 different time windows to potentially capture 2 distinct response components: an early transient response reflecting the detection of the figure and later processes related to following the figure amidst the background. We found significant activations in PT (Figs 4A,B and 6A,B) in both the early and later stages. This is in agreement with previous human fMRI studies of segregation (Gutschalk et al. 2007; Wilson et al. 2007; Schadwinkel and Gutschalk 2010a, b) based on simple tone streams with different spectral or spatial cues for segregation. The similar pattern of activations in PT for both stimulus conditions suggests a common stimulus-driven segregation mechanism that is sensitive to the emergence of salient targets in complex acoustic scenes.

Teki et al. (2011) did not observe any significant BOLD activation related to figure-ground segregation in primary auditory cortex in the region of medial Heschl's gyrus. Similarly, Elhilali, Ma, et al. (2009) did not find evidence of temporal coherence-based computations in the primary auditory cortex of awake ferrets passively listening to synchronous streaming signals. This could possibly be due to the low spike latencies (about 20 ms) in primary cortex, whereby the longer integration windows observed in secondary and higher order auditory cortices (Bizley et al. 2005; Hackett 2011; Atiani et al. 2014) might be crucial for analysis of temporal coherence across remote cortical ensembles. The present results tentatively indicate some evidence of coherence-related activation in human primary auditory cortex during the early phase, but we cannot exclude the possibility that the observed cluster reflects "spillage" of activity from PT; the issue should be elaborated on with further work. Although how and where the precise computational operations that underlie temporal coherence analysis (Krishnan et al. 2014) are implemented in the brain is not completely clear, it is likely that such operations occur along a processing hierarchy whereby cells in higher order centers abstract temporal responses from lower level processing stages. The present results demonstrate that PT forms part of this network.

Figure 6. MEG source activations as a function of coherence for the noise SFG stimulus. Activations (thresholded at P < 0.001, uncorrected) are shown on the superior temporal plane of the MNI152 template image, and the corresponding y coordinates are overlaid on each image. The heat map adjacent to each figure depicts the t value. Coordinates of local maxima are provided in Table 2. Maximum response during the early transition period was observed in the right PT and left MTG (A). Significant activity during the late response period was observed in the PT bilaterally as well as the right precentral gyrus and rolandic operculum (B) and in the left IPS (C).
We found significant activity in the IPS during both early and late response phases (Figs 4C,D and 6C). These results are in line with our previous fMRI work, where we observed that activity in the IPS increases parametrically with the coherence of the figures (Teki et al. 2011). The finding that IPS activity is modulated systematically by coherence is consistent with earlier work implicating the IPS in perceptual organization of streaming signals (Cusack 2005). Since this area lies outside of the "classic" definition of the auditory system, it has previously been suggested that IPS activation may not reflect auditory processing per se but rather relate to attentional effects such as the application of top-down attention (Cusack 2005) or the perceptual consequences of a bottom-up "pop-out" process (Shamma and Micheyl 2010; Teki et al. 2011). Due to the inherently low temporal resolution of fMRI, and hence the lack of precise information regarding the timing of the observed BOLD activations, this conjecture was unresolvable in previous data. Our subjects were naive and occupied by an incidental task, and as such it is unlikely that they were actively trying to hear out the figures from within the background. This, together with the finding that coherence-modulated IPS activity is observed at the earliest stages of the evoked response, strongly supports the hypothesis that IPS is involved in the initial stages of figure-ground segregation.
Because the computation of temporal coherence relies on reliable, phase-locked encoding of rapidly evolving auditory information, it is likely that the temporal coherence maps as such are computed in auditory cortex, perhaps in PT. IPS might be involved in reading out these coherence maps or in the actual process of perceptual segregation (encoding the input as consisting of several sources rather than a single mixture). Specifically, IPS may represent a computational hub that integrates auditory input from the auditory parabelt (Pandya and Kuypers 1969; Divac et al. 1977; Hyvarinen 1982) and forms a relay station between the sensory and prefrontal cortex, which associates sensory signals with behavioral meaning (Petrides and Pandya 1984; Fritz et al. 2010). Similar computational operations have been attributed to the parietal cortex in saliency map models of visual feature search (Gottlieb et al. 1998; Itti and Koch 2001; Walther and Koch 2007; Geng and Mangun 2009). Overall, our results suggest that IPS plays an automatic, bottom-up role in auditory figure-ground processing, and call for a re-examination of the prevailing assumptions regarding the neural computations and circuits that mediate auditory scene analysis.
Funding
This work is supported by the Wellcome Trust (WT091681MA and
093292/Z/10/Z). S.T. is supported by the Wellcome Trust (106084/
Z/14/Z). Funding to pay the Open Access publication charges for
this article was provided by Wellcome Trust.
Notes
We thank Alain de Cheveigné and the MEG group at the Wellcome Trust Centre for Neuroimaging for technical support. Conflict of Interest: The authors declare no competing financial interests.
References
Atiani S, David SV, Elgueda D, Locastro M, Radtke-Schuller S, Shamma SA, Fritz JB. 2014. Emergent selectivity for task-relevant stimuli in higher-order auditory cortex. Neuron. 82:486–499.
Bizley JK, Nodal FR, Nelken I, King AJ. 2005. Functional organization of ferret auditory cortex. Cereb Cortex. 15:1637–1653.
Carlyon RP. 2004. How the brain separates sounds. Trends Cogn Sci. 8:465–471.
Choi HJ, Zilles K, Mohlberg H, Schleicher A, Fink GR, Armstrong E, Amunts K. 2006. Cytoarchitectonic identification and probabilistic mapping of two distinct areas within the anterior ventral bank of the human intraparietal sulcus. J Comp Neurol. 495:53–69.
Christiansen SK, Jepsen ML, Dau T. 2014. Effects of tonotopicity, adaptation, modulation tuning, and temporal coherence in "primitive" auditory stream segregation. J Acoust Soc Am. 135:323–333.
Cusack R. 2005. The intraparietal sulcus and perceptual organization. J Cogn Neurosci. 17:641–651.
de Cheveigné A, Parra LC. 2014. Joint decorrelation, a versatile tool for multichannel data analysis. Neuroimage. 98:487–505.
Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT, et al. 2006. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 31:968–980.
Divac I, Lavail JH, Rakic P, Winston KR. 1977. Heterogenous afferents to the inferior parietal lobule of the rhesus monkey revealed by the retrograde transport method. Brain Res. 123:197–207.
Efron B, Tibshirani R. 1993. An introduction to the bootstrap. Boca Raton (FL): Chapman & Hall/CRC.
Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, Amunts K, Zilles K. 2005. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage. 25:1325–1335.
Elhilali M, Ma L, Micheyl C, Oxenham AJ, Shamma SA. 2009. Temporal coherence in the perceptual organization and cortical representation of auditory scenes. Neuron. 61:317–329.
Elhilali M, Xiang J, Shamma SA, Simon JZ. 2009. Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene. PLoS Biol. 7:e1000129.
Fishman YI, Arezzo JC, Steinschneider M. 2004. Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J Acoust Soc Am. 116:1656–1670.
Fishman YI, Reser DH, Arezzo JC, Steinschneider M. 2001. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res. 151:167–187.
Fishman YI, Steinschneider M. 2010. Formation of auditory streams. In: Rees A, Palmer AR, editors. The Oxford handbook of auditory science. Oxford: Oxford University Press.
Frackowiak RSJ, Friston KJ, Frith C, Dolan R, Price CJ, Zeki S, Ashburner J, Penny WD. 2003. Human brain function. 2nd ed. Cambridge: Academic Press.
Fritz JB, David SV, Radtke-Schuller S, Yin P, Shamma SA. 2010. Adaptive, behaviorally gated, persistent encoding of task-relevant auditory information in ferret frontal cortex. Nat Neurosci. 13:1011–1019.
Geng JJ, Mangun GR. 2009. Anterior intraparietal sulcus is sensitive to bottom-up attention driven by stimulus salience. J Cogn Neurosci. 21:1584–1601.
Gottlieb JP, Kusunoki M, Goldberg ME. 1998. The representation of visual salience in monkey parietal cortex. Nature. 391:481–484.
Gutschalk A, Micheyl C, Oxenham AJ. 2008. Neural correlates of auditory perceptual awareness under informational masking. PLoS Biol. 6:e138.
Gutschalk A, Oxenham AJ, Micheyl C, Wilson EC, Melcher JR. 2007. Human cortical activity during streaming without spectral cues suggests a general neural substrate for auditory stream segregation. J Neurosci. 27:13074–13081.
Hackett TA. 2011. Information flow in the auditory cortical network. Hear Res. 271:133–146.
Hämäläinen MS, Ilmoniemi RJ. 1994. Interpreting magnetic fields of the brain: minimum norm estimates. Med Biol Eng Comput. 32:35–42.
Hyvarinen J. 1982. The parietal cortex of monkey and man. Berlin: Springer-Verlag.
Itti L, Koch C. 2001. Computational modeling of visual attention. Nat Rev Neurosci. 2:194–203.
Kidd G, Mason CR, Dai H. 1995. Discriminating coherence in spectro-temporal patterns. J Acoust Soc Am. 97:3782–3790.
Kidd G, Mason CR, Deliwala PS, Woods WS, Colburn HS. 1994. Reducing informational masking by sound segregation. J Acoust Soc Am. 95:3475–3480.
Kidd G, Richards VM, Streeter T, Mason CR. 2011. Contextual effects in the identification of nonspeech auditory patterns. J Acoust Soc Am. 130:3926–3938.
Krishnan L, Elhilali M, Shamma S. 2014. Segregating complex sound sources through temporal coherence. PLoS Comput Biol. 10:e1003985.
Litvak V, Mattout J, Kiebel S, Phillips C, Henson R, Kilner J, Barnes G, Oostenveld R, Daunizeau J, Flandin G, et al. 2011. EEG and MEG data analysis in SPM8. Comput Intell Neurosci. 2011:852961.
López JD, Litvak V, Espinosa JJ, Friston K, Barnes GR. 2014. Algorithmic procedures for Bayesian MEG/EEG source reconstruction in SPM. Neuroimage. 84:476–487.
Mattout J, Henson RN, Friston KJ. 2007. Canonical source reconstruction for MEG. Comput Intell Neurosci. 2007:67613. doi:10.1155/2007/67613.
Micheyl C, Carlyon RP, Gutschalk A, Melcher JR, Oxenham AJ, Rauschecker JP, Tian B, Courtenay Wilson E. 2007. The role of auditory cortex in the formation of auditory streams. Hear Res. 229:116–131.
Micheyl C, Hanson C, Demany L, Shamma S, Oxenham AJ. 2013. Auditory stream segregation for alternating and synchronous tones. J Exp Psychol Hum Percept Perform. 39:1568–1580.
Micheyl C, Kreft H, Shamma S, Oxenham AJ. 2013. Temporal coherence versus harmonicity in auditory stream formation. J Acoust Soc Am. 133:188–194.
Micheyl C, Shamma S, Oxenham AJ. 2007. In: Kollmeier B, Klump G, Hohmann V, Langemann U, Mauermann M, Uppenkamp S, Verhey J, editors. Hearing from basic research to application. Berlin: Springer. p. 267–274.
Moore BCJ, Glasberg BR. 1983. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J Acoust Soc Am. 74:750–753.
Moore BCJ, Gockel HE. 2012. Properties of auditory stream formation. Phil Trans R Soc. 367:919–931.
Nelken I. 2014. Stimulus-specific adaptation and deviance detection in the auditory system: experiments and models. Biol Cybern. 108:655–663.
Nolte G. 2003. The magnetic lead field theorem in the quasi-static approximation and its use for magnetoencephalography forward calculation in realistic volume conductors. Phys Med Biol. 48:3637–3652.
Oostenveld R, Fries P, Maris E, Schoffelen J-M. 2011. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput Intell Neurosci. 2011:156869.
O'Sullivan JA, Shamma SA, Lalor EC. 2015. Evidence for neural computations of temporal coherence in an auditory scene and their enhancement during active listening. J Neurosci. 35:7256–7263.
Pandya DN, Kuypers HGJM. 1969. Cortico-cortical connections in the rhesus monkey. Brain Res. 13:13–36.
Penny WD, Stephan KE, Mechelli A, Friston KJ. 2004. Comparing dynamic causal models. Neuroimage. 22:1157–1172.
Pérez-González D, Malmierca MS. 2014. Adaptation in the auditory system: an overview. Front Integr Neurosci. 8:19.
Petrides M, Pandya DN. 1984. Projections to the frontal cortex from the posterior parietal region in the rhesus monkey. J Comp Neurol. 228:105–116.
Plack CJ, Moore BCJ. 1990. Temporal window shape as a function of frequency and level. J Acoust Soc Am. 87:2178–2187.
Roberts TP, Ferrari P, Stufflebeam SM, Poeppel D. 2000. Latency of the auditory evoked neuromagnetic field components: stimulus dependence and insights toward perception. J Clin Neurophysiol. 17:114–129.
Schadwinkel S, Gutschalk A. 2010a. Activity associated with stream segregation in human auditory cortex is similar for spatial and pitch cues. Cereb Cortex. 20:2863–2873.
Schadwinkel S, Gutschalk A. 2010b. Functional dissociation of transient and sustained fMRI BOLD components in human auditory cortex revealed with a streaming paradigm based on interaural time differences. Eur J Neurosci. 32:1970–1978.
Scheperjans F, Eickhoff SB, Hömke L, Mohlberg H, Hermann K, Amunts K, Zilles K. 2008. Probabilistic maps, morphometry, and variability of cytoarchitectonic areas in the human superior parietal cortex. Cereb Cortex. 18:2141–2157.
Shamma SA, Elhilali M, Micheyl C. 2011. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 34:114–123.
Shamma SA, Micheyl C. 2010. Behind the scenes of auditory perception. Curr Opin Neurobiol. 20:361–366.
Snyder JS, Gregg MK, Weintraub DM, Alain C. 2012. Attention, awareness, and the perception of auditory scenes. Front Psychol. 3:15.
Teki S, Chait M, Kumar S, Shamma S, Griffiths TD. 2013. Segregation of complex acoustic scenes based on temporal coherence. eLife. 2:e00699.
Teki S, Chait M, Kumar S, von Kriegstein K, Griffiths TD. 2011. Brain bases for auditory stimulus-driven figure-ground segregation. J Neurosci. 31:164–171.
Teki S, Kumar S, Griffiths TD. 2016. Large-scale analysis of auditory segregation behavior crowdsourced via a smartphone app. PLoS ONE. 11:e0153916.
Walther DB, Koch C. 2007. Attention in hierarchical models of object recognition. Prog Brain Res. 165:57–78.
Wilson EC, Melcher JR, Micheyl C, Gutschalk A, Oxenham AJ. 2007. Cortical fMRI activation to sequences of tones alternating in frequency: relationship to perceived rate and streaming. J Neurophysiol. 97:2230–2238.