Word segmentation, detecting word boundaries in continuous speech, is a critical aspect of language learning. Building on previous behavioral research, the present study used fMRI while participants listened to streams of concatenated syllables containing statistical regularities alone, statistical regularities plus speech cues, or no cues. Despite the participants’ inability to explicitly detect differences between the speech streams, neural activity
differed significantly across conditions, with left-lateralized signal increases in temporal cortices observed only when participants listened to the streams containing statistical regularities. To verify that word segmentation had implicitly taken place, participants listened to trisyllabic combinations that occurred with different
frequencies in the streams of speech they just heard (“words,” 45 times; “partwords,” 15 times; “nonwords,” once). Reliably greater
activity in left inferior and middle frontal gyri was observed when comparing words with partwords and, to a lesser extent, when comparing partwords with nonwords.
Researchers have long been fascinated by how infants and young
children can rapidly “crack” the language code with far greater
ease than adults. Although ample behavioral evidence suggests a
marked decrease in the ability to acquire a new language with
native-like proficiency after puberty (Johnson and Newport,
1989; Weber-Fox and Neville, 2001), little is known about the
neural correlates underlying this phenomenon. A continuous process of neural commitment to the statistical and prosodic patterns of the native language is thought to underlie the corresponding decrease in the ability to acquire another lan-
guage later on (Kuhl, 2004). However, the mechanisms by which
the brain encodes these statistical and prosodic patterns during
the initial stages of language learning have not been examined in
adults until now.
Previous functional neuroimaging studies have used several
approaches to investigate different aspects of language learning.
Some studies examined changes in the pattern of neural activity
after a period of intensive training on a novel linguistic task
(Friederici et al., 2002; Callan et al., 2003, 2005; Golestani and
Zatorre, 2004). Although these studies can address issues of neu-
ral plasticity (because they focus on the neural reorganization
that results from learning), they do not provide information about the neural processes at play while learning unfolds. Other investigations explored the neural basis of language learn-
ing by examining changes in brain activity that occur while par-
ticipants learn an artificial grammar in the scanner (Opitz and
Friederici, 2003, 2004; Thiel et al., 2003; Hashimoto and Sakai,
2004). Such studies, however, involved language processing in
the visual domain and, thus, cannot shed light on the neural processes engaged when first listening to a language. To investigate such processes, functional magnetic
resonance imaging (fMRI) was used to evaluate learning-
associated changes in neural activity during initial exposure to artificial languages presented in the auditory modality, as participants listened to a continuous stream of speech.

[Acknowledgment fragment: . . . Foundation, Capital Group Companies Charitable Foundation, Robson Family, and Northstar Fund.]
[Footnote fragment: Correspondence should be addressed to Mirella Dapretto, Ahmanson-Lovelace Brain Mapping Center, 660 . . .]
The Journal of Neuroscience, July 19, 2006 • 26(29):7629–7639 • 7629
Word segmentation is a fundamental aspect of language
learning because word boundaries must be identified before any
additional linguistic analysis can be performed (Jusczyk, 2002;
Saffran and Wilson, 2003). Although speech does not contain
acoustic cues that invariably specify word boundaries, research
has shown that infants as young as 7.5 months of age can use
statistical regularities such as the greater co-occurrence of sylla-
bles within words than between words to successfully parse a
continuous stream of speech (Saffran et al., 1996a; Aslin et al.,
1998). In addition to these transitional probabilities, speech cues such as stress (i.e., longer duration, increased amplitude, and higher pitch on certain syllables) can also guide word segmentation (Johnson and Jusczyk, 2001;
Thiessen and Saffran, 2003). By adapting a well established par-
adigm from the infant behavioral literature, the present research
examined the neural correlates of speech parsing. In light of evidence linking rapid auditory processing skills to language outcomes (Benasich and Tallal, 2002), the relationship between neural activity indexing successful word segmentation and rapid auditory processing ability was also examined.

Materials and Methods
Participants
Twenty-seven participants (13 female; mean age, 26.63 years; range,
20–44 years) volunteered in this study after giving written informed
consent for the experimental protocol approved by the University of
California, Los Angeles Institutional Review Board. By report, the par-
ticipants were right-handed, native English speakers with no history of
neurological or psychiatric disorders. All 27 participants completed an
26.92 years; range, 20–31 years) also underwent an event-related fMRI
word discrimination scan immediately after the speech stream exposure task. All participants completed behavioral testing outside of the scanner after the scanning session.
Stimuli and tasks
Speech stream exposure task. Participants listened to three counterbal-
anced streams of nonsense speech supposedly spoken by aliens. Participants were not required to perform a task except to listen, given that a recent study has demonstrated
that implicit learning can be attenuated by explicit memory processes
during sequence learning (Fletcher et al., 2004). As shown in Figure 1, A
and B, the three streams were created by repeatedly concatenating 12
syllables (a different set of 12 syllables was used for each speech stream).
In each of the two artificial language conditions, the 12 syllables were
used to make four trisyllabic words, following the exact same procedure
used in previous infant and adult behavioral studies (Saffran et al., 1996a,b). Each syllable was recorded separately using SoundEdit (Macromedia; Adobe Systems, San
Jose, CA), ensuring that the average syllable duration (0.267 s), ampli-
tude (18.2 dB), and pitch (221 Hz) were (1) not significantly different across the three streams and (2) comparable with values reported previously in the behavioral literature. For each artificial language, the four
words were randomly repeated three times to form a block of 12 words,
subject to the constraint that no word was repeated twice in a row. Five such blocks were combined, and the resulting sequence was itself concatenated three times to form a continuous speech stream lasting 2 min and 24 s, during which each word occurred 45 times. For each language, the words were thus combined to form a continuous stream of nonsense speech containing no breaks or pauses (e.g., pabikutibudogolatudaropitibudo. . . ). Within
the speech stream, transitional probabilities for syllables within a word
and across word boundaries were 1 and 0.33, respectively. Thus, as the
words were repeated, transitional probabilities could be computed and
used to segment the speech stream. In the “unstressed language condi-
tion” (U), the stream contained only transitional probabilities as cues to
word boundaries. In the “stressed language condition” (S), the speech
stream contained transitional probabilities, as well as speech, or prosodic, cues: the initial syllable of each word was stressed one-third of the time it occurred. Stress was added by slightly increasing
the duration (0.273 s), amplitude (16.9 dB), and pitch (234 Hz) of these
stressed syllables. At the same time, these small increases were offset by
minor reductions in these parameters for the remaining syllables within
the stressed language condition, which ensured that the mean duration,
amplitude, and pitch would not be reliably different across the three streams. Word-initial syllables were stressed because most words in conversational English have stress on their initial syllable (Cut-
ler and Carter, 1987).
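The stream-construction and transitional-probability logic described above can be sketched in Python. This is an illustrative reconstruction, not the authors' stimulus code; the four words follow the examples given in the text ("pabiku", "tibudo", "golatu", "daropi"):

```python
import random

# The four trisyllabic "words" from the example stream in the text.
WORDS = ["pabiku", "tibudo", "golatu", "daropi"]

def make_block(words, n_repeats=3):
    """Randomly order the words so each appears n_repeats times,
    with no word occurring twice in a row (rejection sampling)."""
    while True:
        block = words * n_repeats
        random.shuffle(block)
        if all(a != b for a, b in zip(block, block[1:])):
            return block

def make_stream(words, n_blocks=15):
    """Concatenate blocks into one continuous stream: 15 blocks of
    12 words means each word occurs 45 times, as in the study."""
    tokens = []
    for _ in range(n_blocks):
        block = make_block(words)
        # also avoid a repeat across the block boundary
        while tokens and tokens[-1] == block[0]:
            block = make_block(words)
        tokens.extend(block)
    return tokens

def transitional_probabilities(tokens):
    """P(syllable2 | syllable1) over the full syllable sequence."""
    syllables = [tok[i:i + 2] for tok in tokens for i in range(0, 6, 2)]
    pair_counts, first_counts = {}, {}
    for s1, s2 in zip(syllables, syllables[1:]):
        pair_counts[(s1, s2)] = pair_counts.get((s1, s2), 0) + 1
        first_counts[s1] = first_counts.get(s1, 0) + 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

tokens = make_stream(WORDS)
tps = transitional_probabilities(tokens)
# Within-word transitions (e.g., "pa" -> "bi") are exactly 1;
# across-word transitions (e.g., "ku" -> "ti") approach 0.33,
# because each word is followed by one of the three other words.
```

Note that 180 word tokens of three syllables at 0.267 s per syllable gives roughly 144 s, matching the 2 min 24 s stream duration stated above.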
A “random syllables condition” (R) was also created so as to identify
activity associated with the actual process of word segmentation, as afforded by the cues present in the two artificial language conditions (U + S), as opposed to activity related to merely listening to a series of concatenated syllables. In this condition,
the 12 syllables were not arranged into four words as in the two artificial
language conditions; rather, these syllables were arranged pseudorandomly such that no trisyllabic combination was repeated within the stream (the frequency with which two-syllable strings occurred was also minimized). Therefore, in this condition, neither statistical regularities nor prosodic cues were available to aid word segmentation.
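One way to build such a constrained pseudorandom stream can be sketched as follows. This is an illustrative greedy algorithm, not the authors' actual procedure: it avoids repeating any trisyllabic window and keeps two-syllable repetitions low, but it does not enforce exactly equal syllable frequencies:

```python
import random

def make_random_stream(syllables, length=432, max_tries=10_000):
    """Pseudorandom syllable sequence in which no three-syllable window
    repeats; bigram repetitions are minimized greedily."""
    for _ in range(max_tries):
        seq, trigrams, bigrams = [], set(), {}
        ok = True
        for _ in range(length):
            # exclude syllables that would repeat an already-used trigram
            candidates = [s for s in syllables
                          if len(seq) < 2 or (seq[-2], seq[-1], s) not in trigrams]
            if not candidates:
                ok = False
                break
            # prefer the syllable that forms the least-used bigram (random ties)
            s = min(candidates,
                    key=lambda c: (bigrams.get((seq[-1], c), 0) if seq else 0,
                                   random.random()))
            if len(seq) >= 2:
                trigrams.add((seq[-2], seq[-1], s))
            if seq:
                bigrams[(seq[-1], s)] = bigrams.get((seq[-1], s), 0) + 1
            seq.append(s)
        if ok:
            return seq
    raise RuntimeError("could not build a constrained stream")
```

With 12 syllables and a 432-syllable stream, the bigram-balancing step keeps every two-syllable prefix far below the 12 available continuations, so the greedy pass essentially always succeeds on the first attempt.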
Thus, as depicted in Figure 1C, each participant listened to three 144 s
speech streams (R, U, and S) interspersed with 30 s resting baseline periods (see audio clips 1–3, available at www.jneurosci.org as supplemental material). The order of the streams was counterbalanced across participants according to a “Latin square” design.
It is important to note the following: (1) the same number of syllables was repeated the same number of times across all three conditions, albeit in different orders; (2) a different set of 12 syllables was used in each condition, with each syllable occurring equally often, thus
ensuring that any difference between conditions would not be attribut-
able to different degrees of familiarity with a given set of syllables; (3) to
guard against the possibility that the computation of the transitional
probabilities between the syllables chosen to form the words in the two
artificial languages might be influenced by previous experience with the
transitional probabilities between these syllables in English, three different versions of each language were created (e.g., the word “pakubi” in one version became “kubipa” and “bipaku” in the others, with each word being used
equally often across participants); and (4) although the length of the
activation blocks used in this task was unconventional for an imaging
study and could have resulted in decreased power to detect reliable differences between conditions, robust group-level effects were nonetheless achieved. Furthermore, effect sizes, computed to address the robustness
of the analyses performed, demonstrated that rather large effects were
obtained (Cohen’s d ≥ 0.80).
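For reference, Cohen's d is the difference between two condition means scaled by the pooled standard deviation; d ≥ 0.80 is the conventional threshold for a "large" effect. A minimal sketch (the sample values below are made up for illustration):

```python
import math

def cohens_d(sample_a, sample_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # unbiased (n - 1) sample variances
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical condition means differing by one pooled SD give d = 1.0.
```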
Word discrimination task. A second fMRI task was designed to provide an implicit index of successful word segmentation
in lieu of the assessment procedure (i.e., head-turn novelty preference)
used in the infant behavioral studies. In this mixed block/event-related
fMRI design, participants were presented with trisyllabic combinations
from the “speech stream exposure task” and were simply told to listen to
what might have been words in the artificial languages they just heard.
Based on evidence from previous behavioral studies, participants were
not expected to be able to explicitly identify whether these trisyllabic
combinations were words in the artificial languages after such a short
exposure to the streams (Saffran et al., 1996b; Sanders et al., 2002). Ac-
cordingly, participants were not asked to make an explicit judgment in the scanner. As shown in Figure 2, A and B, trisyllabic combinations were presented in three activation
blocks, with each block using the set of syllables that corresponded to
those used in one of the three speech streams heard previously by a
participant. Within each block, participants listened to the four “words”
(W) used to create the artificial language streams (e.g., “golatu”, “da-
ropi”) as well as to four “partwords” (PW), that is, trisyllabic combina-
tions formed by grouping syllables from adjacent words within the
speech streams (e.g., the partword “tudaro” consisted of the last syllable
of the word “golatu” and the first two of the adjacent word “daropi”
within the stream “. . . golatudaropi. . . ”). As in the previous behavioral
studies, the transitional probabilities between the first and second syllables were higher in the words than between the first and second syllables in the partwords (1 as opposed to 0.33) as a result of the words
and partwords having occurred within the speech stream 45 and 15
times, respectively. Each word and partword was repeated three times within each block. A pseudorandom order of presentation for words and partwords was used in each block such that no more than two words, partwords, or null events occurred consecutively. The words and
partwords from the stressed language condition were presented in their
unstressed version, such that there were no differences in duration, am-
plitude, or pitch between any of the syllables used in this task. The trisyl-
labic combinations used in the block corresponding to the random syllables condition effectively served as “nonwords” (NW), because these
combinations had occurred only one time during the random syllables
stream, in which no statistical or prosodic cues were afforded to the
listener (transitional probability between the first two syllables equal to
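The construction of partwords from adjacent words can be made concrete with a short sketch (the word list follows the examples given in the text; the split into two-letter CV syllables is specific to these example words):

```python
# Words from the artificial language examples in the text.
WORDS = ["pabiku", "tibudo", "golatu", "daropi"]

def syllables(word):
    """Split a trisyllabic example word into its three CV syllables."""
    return [word[i:i + 2] for i in range(0, 6, 2)]

def make_partwords(words):
    """A partword spans a word boundary: the last syllable of one word
    followed by the first two syllables of another (e.g., "tudaro" from
    "...golatudaropi...")."""
    partwords = set()
    for w1 in words:
        for w2 in words:
            if w1 != w2:
                partwords.add(syllables(w1)[2] + syllables(w2)[0] + syllables(w2)[1])
    return partwords
```

Because every syllable belongs to exactly one word and one within-word position, no partword can coincide with a word, which is what makes the word/partword frequency contrast (45 vs 15 occurrences) well defined.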
Behavioral tasks. To investigate whether participants were able to explicitly recognize the words from the artificial languages, behavioral measures (response times and accuracy scores) for the word discrimination task
were collected outside of the scanner. Each participant listened to the
word stimuli and responded yes or no as to whether they thought each
trisyllabic combination could be a word in the artificial languages they
heard previously. All 27 participants completed this behavioral testing.
All but one participant also completed a computerized version of the
Tallal Repetition Test, a hierarchical series of subtests shown to assess
several aspects of central auditory processing, including rapid auditory processing. The test uses a two-alternative forced-choice design in which participants are operantly trained to re-
spond to two complex tones by clicking on two different squares on the
computer screen. After participants learn the tone–square association,
they are presented with progressively longer sequences of complex tones
and must subsequently indicate the temporal order of such sequences.
The accuracy scores from the SerialMemory3Fast (SM3Fast) subscale of the test were used in subsequent analyses, given the variability observed among participants at this difficulty level. In this subscale, the stimuli are tone triplets, consisting of three elements (e.g., tone 1-2-1, 2-1-1, etc.) separated by varying interstimulus intervals of silence (0, 10, 60, or 150 ms). Participants must indicate at the end of each trial the order of the stimulus pattern by clicking on the squares in the
same order as the tones were presented.
fMRI data acquisition
Functional images were collected using a Siemens Allegra 3 Tesla head-
only MRI scanner. A two-dimensional spin-echo scout [repetition time
(TR), 4000 ms; echo time (TE), 40 ms; matrix size, 256 × 256; 4 mm
thick; 1 mm gap] was acquired in the sagittal plane to allow prescription
of the slices to be obtained in the remaining scans. For each participant, a high-resolution structural volume [TR, 5000 ms; TE, 33 ms; matrix size, 128 × 128; field of view (FOV), 20 cm; 36 slices; 1.56 mm in-plane resolution; 3 mm thick] was
acquired coplanar with the functional scans to allow for spatial registration. For the speech stream exposure task, one functional scan lasting 8 min and 48 s was acquired covering the whole cerebral volume (174 images; EPI gradient echo sequence; matrix size, 64 × 64; FOV, 20 cm; 36 slices; 3.125 mm in-plane resolution; 3 mm thick; 1 mm gap). For the word discrimination task, a second functional
scan was acquired (103 images; EPI gradient echo sequence; TR, 2000 ms; TE, 25 ms; 3.125 mm in-plane resolution; 3 mm thick; 1 mm gap).
Participants listened to the auditory stimuli through a set of magnet-
compatible stereo headphones (Resonance Technology, Northridge,
CA). Stimuli were presented using MacStim 3.2 psychological experi-
mentation software (CogState, West Melbourne, Victoria, Australia).
fMRI data analysis
Using automated image registration (Woods et al., 1998a,b), functional images were (1) realigned to correct for head motion during scanning and coregistered to their respective high-resolution structural images using a six-parameter rigid body transformation, (2) spatially normalized into a standard stereotactic space (Woods et al., 1999) using polynomial nonlinear warping, and (3) smoothed with a 6 mm full-width at half-maximum isotropic Gaussian kernel to increase the signal-to-noise ratio. Statistical analyses were performed using SPM99 (Wellcome Department of Cognitive Neurology, London, UK; http://www.fil.ion.ucl.ac.uk/spm/). For each participant, contrasts of interest
were estimated according to the general linear model using a canonical
hemodynamic response function. For the speech stream exposure task,
the exponential decay function in SPM99 (which closely approximates a
linear function) was also used to model changes that occurred within
each activation block as a function of exposure to the speech stream.
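The modeling strategy described in this section (block regressors convolved with a canonical hemodynamic response function, plus a regressor capturing change over time within each block) can be sketched in simplified form. This is an illustrative reimplementation, not the SPM99 code itself; the TR, onsets, and double-gamma HRF parameters are assumed conventional values, not figures from the study:

```python
import numpy as np
from scipy.stats import gamma

TR = 3.0  # seconds; illustrative repetition time

def canonical_hrf(tr, duration=32.0):
    """Double-gamma canonical HRF sampled at the TR."""
    t = np.arange(0, duration, tr)
    peak = gamma.pdf(t, 6)         # positive response peaking around 5 s
    undershoot = gamma.pdf(t, 16) / 6
    hrf = peak - undershoot
    return hrf / hrf.sum()

def build_design(n_scans, onsets, block_len, tr):
    """Boxcar block regressor plus a linearly increasing within-block
    modulation, both convolved with the canonical HRF."""
    boxcar = np.zeros(n_scans)
    ramp = np.zeros(n_scans)
    for onset in onsets:
        idx = np.arange(onset, onset + block_len)
        boxcar[idx] = 1.0
        ramp[idx] = np.linspace(-1, 1, block_len)  # change over time within block
    hrf = canonical_hrf(tr)
    return np.column_stack([
        np.convolve(boxcar, hrf)[:n_scans],
        np.convolve(ramp, hrf)[:n_scans],
        np.ones(n_scans),  # constant term
    ])

def glm_betas(X, y):
    """Ordinary least-squares estimates of the regression weights."""
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

A positive weight on the second (ramp) column is the simplified analogue of the "signal increases over time within a block" effect tested in the study.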
Contrast images from these fixed effects analyses were then entered into random effects analyses, allowing inferences to be made at the population level (Friston et al., 1999). All reported
activity survived correction for multiple comparisons at the cluster level
(p < 0.05, corrected) and t > 3.3 for magnitude (p < 0.001, uncorrected), unless otherwise noted. Small-volume correction at the cluster level was used to test the a priori hypothesis of basal ganglia involvement during sequence learning.
For the speech stream exposure task, separate one-sample t tests were implemented for each condition (unstressed language, stressed language, and random syllables vs resting baseline) to identify blood oxygenation
level-dependent (BOLD) signal increases associated with listening to
each speech stream. Direct comparisons between the conditions were then implemented to identify activity related to the presence of statistical and prosodic cues within the
artificial language streams. Specifically, in an effect akin to repetition
priming, the detection of word boundaries (afforded by the statistical
regularities and prosodic cues available in the artificial languages only)
was expected to yield overall less activity for the artificial language
streams within language-relevant frontotemporal regions and the basal
ganglia, which are known to be involved in sequence learning (Poldrack
et al., 2005).
In addition, because word segmentation is thought to take place as a
function of exposure to the language streams, signal increases over time
were expected in left-hemisphere language cortices while listening to the
artificial languages only, with a larger effect predicted for the stressed
language condition in which both statistical and prosodic cues were
available to guide word segmentation. A region-of-interest (ROI) analysis was conducted to examine whether these signal increases over time for the artificial languages were lateralized. The ROIs included all voxels showing reliable signal increases at the group level in the left hemisphere (LH), as well as their respective counterparts in the opposite hemisphere (RH). For each
participant, a laterality index was computed based on the number of voxels showing increased activity over time within these symmetrical ROIs in the LH and RH: (number of voxels activated in the LH − number of voxels activated in the RH)/(number of voxels activated in the LH + number of voxels activated in the RH).
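The laterality index reduces to a few lines of code (the voxel counts below are hypothetical, for illustration only):

```python
def laterality_index(lh_voxels, rh_voxels):
    """(LH - RH) / (LH + RH): +1 means fully left-lateralized,
    -1 fully right-lateralized, 0 perfectly bilateral."""
    total = lh_voxels + rh_voxels
    if total == 0:
        raise ValueError("no suprathreshold voxels in either hemisphere")
    return (lh_voxels - rh_voxels) / total

# Hypothetical counts: 300 LH voxels vs 100 RH voxels gives an index of 0.5.
```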
Because the ability to identify words that had occurred within the
speech streams should be related to activity observed during exposure to
the artificial languages if this activity truly reflects word segmentation,
regression analyses were also conducted to assess the relationship be-
tween accuracy scores on the behavioral post-scanner word discrimina-
tion test and neural activity associated with listening to the two artificial languages.
For the word discrimination task, separate one-sample t tests were
implemented to identify changes in the BOLD signal associated with
listening to the words, partwords, and nonwords that had occurred with
different frequencies during the speech stream exposure task. Direct
comparisons between each condition were then implemented to test the
hypothesis that differential activity in language networks would be ob-
served for words, partwords, and nonwords, as would be expected if word segmentation had implicitly occurred during the speech stream exposure task. Specifically, this effect could manifest as stronger signal increases in response to words (than partwords and nonwords) if these stimuli had come to be processed as word-like units, although the opposite pattern could be observed because of a novelty effect for nonwords.
The SPM99 toolbox MarsBaR (http://marsbar.sourceforge.net/) was
used to extract parameter estimates for each participant from regions
that were significantly active in response to words, partwords, and nonwords. Because rapid auditory processing skills are predictive of future language outcomes (Benasich and Tallal, 2002), a regression analysis was conducted to examine the relationship between rapid auditory processing skills and the ability to implicitly segment the words in the word discrimination task. Based on previous findings linking rapid auditory processing to learning-related neural changes (Temple, 2002), a positive correlation was expected between neural activity associated with listening to words in the word discrimination task and participants’ rapid auditory processing skills.

Results
Accuracy and reaction times on the behavioral word recognition
test conducted immediately after the fMRI scans are reported in
Table 1. Not surprisingly, given the findings of previous behav-
ioral research (Saffran et al., 1996b; Sanders et al., 2002), partic-
ipants were unable to explicitly recognize which trisyllabic com-
binations might have been words in the artificial languages they
heard during the speech stream exposure
task, as demonstrated by accuracy scores
not different from chance for either the
stressed or unstressed language condi-
tions. Similarly, there were no significant
differences in reaction time to items from
within the unstressed language, stressed
language, or random syllables streams.
Random effects models were used to identify changes in the BOLD signal associated with each condition of the speech stream exposure task. Compared with resting baseline, listening to all
three streams of continuous speech activated a large-scale bilateral network encompassing temporal, frontal, and parietal cortices as well as the putamen (Fig. 3; Table 2). Less activation (in terms of spatial extent) was observed when participants listened to the artificial languages (U + S) than when they listened to the random syllables stream (Table 2); similarly, more focal activity was observed when participants listened to the stressed language than to the unstressed language (Table 3), in-
dicating that the presence of statistical cues in both artificial lan-
guage conditions, as well as the addition of prosodic cues in the
stressed language, afforded more efficient neural processing.
Because transitional probabilities and the related identification of word boundaries can only be computed over the course of the activation conditions, regions in which activity might be
increasing as a function of exposure to the speech streams were
examined. A statistical contrast modeling increases in activity
over time for the artificial languages (U↑ + S↑) revealed sig-
nificant bilateral signal increases in superior temporal gyrus
(STG) and transverse temporal gyrus, extending into the supra-
marginal gyrus (SMG) in the LH (Fig. 4A,B; Table 4). A region-
of-interest analysis indicated that this increasing activity during
exposure to the unstressed and stressed language conditions was
reliably stronger in the LH than in the RH, as indicated by a positive laterality index. In contrast, no reliable signal increases over time were observed for the random syllables condition, as can be seen in the time series shown in Figure 4B. A contrast directly com-
paring signal increases occurring during exposure to the two ar-
tificial languages versus signal increases occurring during expo-
sure to the random syllables condition [(U↑ + S↑) > R↑] confirmed significantly greater signal increases as a function of exposure to the two artificial language conditions than to the random syllables condition (Table 4). This pattern held
when comparing signal increases during each artificial language condition separately versus the random syllables condition (U↑ > R↑ and S↑ > R↑) (Fig. 5A,B; Table 5), with stronger signal increases
for the comparison of the stressed language versus random sylla-
bles than for the unstressed language versus random syllables
[(S↑ > R↑) > (U↑ > R↑)] in left STG and the right tem-
poroparietal junction (Fig. 5C; Table 5).
[Fig. 3 legend (fragment): Extensive activity in temporal, frontal, and parietal cortices, as well as in the putamen, was observed when participants listened to the random syllables stream (A), the . . . speech stream. Activation maps are displayed at a threshold of t > 1.71, with correction for multiple comparisons.]

A statistical contrast modeling decreases in activity over time revealed reliably decreasing activity in ventromedial prefrontal cortex (Talairach coordinates 4, 56, 8; t = 5.26, corrected for multiple comparisons at the cluster level). At more liberal thresholds, there were also
several other regions (i.e., left MTG, bilat-
eral inferior parietal lobule (IPL), right
middle and medial frontal gyri, anterior
cingulate, and left putamen) in which
overall greater activity was observed for
the random syllables versus artificial lan-
guage conditions [R > (S + U)]. Con-
versely, at these more liberal thresholds, greater activity was also observed for the artificial languages versus random syllables condition [(S + U) > R] in the left STG
and IPL, that is, within those regions in
which significant increases over time were
observed for the artificial languages. Signal decreases over time were otherwise observed only at two left-hemisphere foci (Talairach coordinates −34, −24, −2, t = 5.3 and −50, −36, 50, t = 3.75, respectively, both corrected for multiple comparisons at the cluster level).
Participants’ accuracy at discriminating
words on the post-scanner behavioral
word discrimination task was positively
correlated with activity in the left STG as
participants listened to the stressed and
unstressed languages compared with ran-
dom syllables during the speech stream
exposure task (Fig. 6; Table 6). Importantly, this correlation was observed within the very region showing increasing activity over time while listening to the artificial languages.
Random effects analyses were also used to identify activity associated with listening to words, partwords, and nonwords presented during the three blocks of the word discrimination task. Words and partwords (which had occurred 45 and 15 times, respectively, during the speech stream exposure task) were presented during the unstressed and stressed language blocks. (In keeping with the head-turn novelty preference paradigm used in the developmental literature, 12 words and 12 partwords were presented within each
language block. Collapsing across the language blocks yielded 24
events for words and 24 for partwords to be contrasted with the
24 events for the nonwords from the random syllable block.)
Activity for the nonwords corresponded to trisyllabic combina-
tions presented during the random syllables block (these combinations had occurred only once during the random syllables condition of the speech stream exposure task). A statistical contrast comparing words versus nonwords (W > NW) revealed significantly greater activity for words in left inferior and middle frontal gyri (Fig. 7A; Table
7). Reliably greater activity, albeit to a lesser extent, was also
observed in these same regions when comparing words versus
partwords (W ? PW) (Fig. 7B; Table 7) and partwords versus
nonwords (PW ? NW) (Table 7). The partwords versus non-
words comparison also revealed significantly greater activity in
the superior parietal lobule. Importantly, no reliably greater cortical activity was observed for any of the reverse statistical comparisons (NW > W, NW > PW, and PW > W). Despite the
limited number of words and partwords within each language
block, when the above analyses were conducted for the stressed
and unstressed language blocks separately, a virtually identical
pattern of results was observed (albeit with smaller cluster size).
Additional analyses were conducted to determine whether
differential activity would be observed as participants listened to
words that had occurred in the speech stream when both pro-
sodic and statistical cues were available to guide word segmentation (i.e., the stressed language) versus when only statistical cues were available (i.e., the unstressed language). Comparing activity observed during the stressed lan-
guage block with that observed during the unstressed language
block [S(W + PW) > U(W + PW)] revealed that left MFG and
−50, 38, t = 3.43, respectively, corrected for multiple compari-
sons at the cluster level) showed greater activation when partici-
pants listened to words and partwords from within the speech stream containing prosodic cues. No reliable activity was detected for the reverse comparison [U(W + PW) > S(W + PW)].
Participants’ performance on the Tallal Repetition Test was pos-
itively correlated with left MFG activity while participants listened to words during the word discrimination task. That is, the better the participants’ rapid auditory processing skills, as
indexed by accuracy on the SM3Fast subscale of the Tallal Repe-
tition Test (mean ± SD, 80.3 ± 19.6%), the greater the activity
observed in left MFG during the word discrimination task when
listening to trisyllabic combinations that had occurred 45 times during the stressed and unstressed language conditions of the speech stream exposure task.

Discussion
The present research investigated the neural correlates of word
segmentation, a crucial process in the early stages of language
learning. Over the past decade, milestone studies in the infant
behavioral literature have revolutionized our thinking about the
mechanisms underlying language acquisition, showing that the
brain possesses striking computational abilities that allow young learners to track the statistical and prosodic patterns within continuous speech (Saffran et al., 1996a). By adapting the paradigm used in the infant literature, this study explored
how the brain processes the statistical and prosodic speech cues
available in the input to crack the code of a new artificial language.
Despite the somewhat unusual fMRI design, the frontotemporal networks engaged while listening to streams of concatenated syllables are consistent with those observed in previous studies involving the presentation of similar speech-like stimuli and spectrally complex acoustic information (Binder et al., 2000; Vouloumanos et al., 2001; Gandour et al., 2002)
(Fig. 3; Table 2). The observed activity in motor speech areas is
also in line with a recent study showing considerable overlap
between the areas active during passive listening to syllables and
those active during overt syllable production (Wilson et al.,
2004). Moreover, our finding of basal ganglia activity is in agree-
ment with previous studies involving sequence learning across
different modalities (Poldrack et al., 2005; Doyon et al., 2003;
Saint-Cyr, 2003; Van der Graaf et al., 2004).
The less extensive activity observed when participants listened to the artificial languages, as opposed to the stream of random syllables, suggests that the cues to word segmentation present only in the artificial languages facilitated the parsing of the speech streams, thus affording more efficient neural processing (Table 2). This interpretation is in line with
other studies reporting less activity for easier or better-learned tasks (for review, see Kelly and Garavan, 2005). The overall greater activity associated
with listening to the stream of random syllables, in which the
order of occurrence of syllables could not be predicted, is also
consistent with a study showing greater activity in superior tem-
poral cortices in response to sequences of visual stimuli with low
temporal predictability versus those with high predictability
(Bischoff-Grethe et al., 2000).
Compared with the stream of random syllables, left-lateralized BOLD signal increases in temporal and inferior parietal cortices were observed as participants listened to the two streams containing statistical regularities, particularly
the stream also containing prosodic cues (Figs. 4, 5; Tables 4, 5).
Because the artificial languages and random syllables stream differed only in the presence of cues to word boundaries, the signal increases observed in superior temporal gyri may reflect the ongoing computation of transitional probabilities, whereas the increases in activity in the left supramarginal
gyrus may reflect the development of phonological representa-
tions for the “words” in the artificial languages (Gelfand and
Bookheimer, 2003; Xiao et al., 2005). The stronger signal in-
creases observed in these regions, as well as in primary auditory
cortices, for the stressed language compared with the unstressed
language, suggest that the presence of prosodic cues did aid
speech parsing, in line with previous behavioral data in both infants and adults (Thiessen and Saffran, 2003) (Table 5). Although several studies have reported
decreases in activity as participants learn to perform a linguistic
task (Thiel et al., 2003; Golestani and Zatorre, 2004; Noppeney
and Price, 2004) and, indeed, some decreases were also observed
in the present study, our findings are consistent with previous
reports of increased activity as a function of learning in primary
and secondary sensory and motor areas, as well as in regions
involved in the storage of task-related cortical representations
(for review, see Kelly and Garavan, 2005).
Our finding of signal increases in temporal and inferior pari-
etal cortices during exposure to the artificial languages indicates that implicit learning took place even though, overall, participants could not reliably identify the words used to
create the speech streams in a post-scan behavioral test. This
interpretation is strongly supported by the positive relationship between this activity and participants’ ability to discriminate words in the post-
scanner behavioral word discrimination task (Fig. 6; Table 6).
Additionally, although neuroimaging studies of learning have
typically examined changes in neural activity associated with
observable changes in behavioral performance (i.e., pre- versus
post-training measures), several other studies have reported changes
in neural activity that occur in the absence of any overt change in task
performance (Shadmehr and Holcomb, 1997; Jaeggi et al., 2003;
Landau et al., 2004; Kelly and Garavan, 2005). Neural changes
that precede behavioral changes have also been shown in event-
related potential (ERP) studies across a variety of linguistic tasks
(Shestakova et al., 2003; McLaughlin et al., 2004). Notably, an
ERP study of word segmentation has also demonstrated that,
although adults required prolonged exposure and training to ex-
plicitly identify words within a speech stream, they displayed an
increased N100 component to word initial syllables (taken to
index the implicit detection of word boundaries) long before
showing evidence of explicit knowledge (Sanders et al., 2002).
[Figure 8 caption fragment: regression analysis revealed that middle frontal gyrus activity associated with listening to words was significantly correlated with participants' rapid auditory processing skills.]
The results of the word discrimination task, in which participants
were presented with words, partwords, and nonwords, provide additional
evidence that word segmentation had implicitly occurred during the
speech stream exposure task. Here, greater
activity was observed in left prefrontal regions in response to
trisyllabic combinations with higher frequency of occurrence
within the speech streams and, thus, higher transitional
probabilities, with greater activity for words than for partwords
(i.e., trisyllabic combinations
that had occurred 45 times versus 15 times within the artificial
language streams) and nonwords (i.e., trisyllabic combinations that
occurred only once). Words also elicited greater activity in the
posterior, superior region of left IFG, an area implicated in
phonological and sequential processing (Gelfand and Bookheimer,
2003). That this effect was more pronounced for the words from the
stressed language provides further evidence that participants were
able to capitalize on the presence of prosodic cues to aid word
segmentation during the speech stream
exposure task. The observed left-lateralized prefrontal activity is
consistent with a host of previous imaging studies implicating
these regions in several aspects of phonological processing (for
review, see Bookheimer, 2002), including phonemic discrimina-
tion, temporal sequencing, articulatory recoding, and the main-
tenance of acoustic information in working memory (Burton et
al., 2000; Newman and Twieg, 2001; Gelfand and Bookheimer,
2003; LoCasto et al., 2004; Xiao et al., 2005).
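The frequency differences that define words, partwords, and nonwords can likewise be illustrated with a short sketch. The code below is hypothetical (invented syllables and word inventory, not the study's stimuli): counting every trisyllabic window in a continuous stream shows why true words recur far more often than partwords, which straddle a word boundary.

```python
# Illustrative sketch only: invented words, not the study's stimuli.
import random
from collections import Counter

WORDS = ["pabiku", "tibudo", "golatu", "daropi"]
rng = random.Random(0)

# Build a continuous stream of 300 randomly ordered words,
# split into two-character syllables.
stream = []
for _ in range(300):
    w = rng.choice(WORDS)
    stream.extend([w[i:i + 2] for i in (0, 2, 4)])

# Count every trisyllabic window in the stream.
trigrams = Counter(zip(stream, stream[1:], stream[2:]))

word = ("pa", "bi", "ku")      # all three syllables belong to one word
partword = ("bi", "ku", "ti")  # straddles a boundary (end of one word, start of the next)
print(trigrams[word], ">", trigrams[partword])  # words recur far more often
```

With four equiprobable words, each word's trigram appears on roughly a quarter of the windows that start it, whereas any particular boundary-straddling trigram requires a specific word-to-word transition and is correspondingly rarer; a trigram never generated at all would be a "nonword."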
Interestingly, activity within the left MFG while listening to
words from within the stressed and unstressed languages was
modulated by participants’ rapid auditory processing skills, as
indexed by their performance on the Tallal Repetition Test (Tal-
lal and Piercy, 1973) (Fig. 8). Previous research has shown that
rapid auditory processing ability is a strong predictor of future
language outcomes and verbal intelligence (Benasich and Tallal, 2002;
Tallal, 2004). The observed relationship between MFG activity and
individual differences in rapid auditory processing skills further
attests to the importance of this region for language learning.
Although other neuroimaging studies have linked MFG activity to
the processing of rapid acoustic information at the phonemic
level (Fiez et al., 1995; Temple et al., 2000; Poldrack et al., 2001;
Temple, 2002), the present results indicate that rapid auditory
processing skills may also be important for other aspects of language
learning, such as word segmentation.
In conclusion, the current research confirms that minimal
exposure to a continuous stream of speech containing statistical
and prosodic cues is sufficient for implicit word segmentation to
occur, offering a window onto the initial stages of language
acquisition in the mature brain. At an applied level, this paradigm may
be useful for identifying abnormalities in the neural architecture
subserving language learning in developmental language disor-
ders and for exploring changes in this circuitry after interven-
tions. At a theoretical level, given that infants and adults can
segment sequences of tones just as well as syllables (Saffran et al.,
1999), this paradigm could be used to disentangle domain-specific
from domain-general mechanisms
subserving language learning. Most importantly, the present paradigm
can be used with children to investigate changes in the neural basis
of language learning occurring with age and linguistic experience,
helping to answer the question of why young children are better
language learners than adults.
References
Aslin RN, Saffran JR, Newport EL (1998) Computation of probability sta-
tistics by 8-month-old infants. Psychol Sci 9:321–324.
Benasich A, Tallal P (2002) Infant discrimination of rapid auditory cues
predicts later language impairment. Behav Brain Res 136:31–49.
Binder JR, Frost JA, Hammeke TA, Bellgowan PS, Springer JA, Kaufman JN,
Possing ET (2000) Human temporal lobe activation by speech and
nonspeech sounds. Cereb Cortex 10:512–528.
Bischoff-Grethe A, Proper SM, Mao H, Daniels KA, Berns GS (2000) Con-
scious and unconscious processing of nonverbal predictability in Wer-
nicke’s area. J Neurosci 20:1975–1981.
Blakemore SJ, Rees G, Frith CD (1998) How do we predict the consequences
of our actions? A functional imaging study. Neuropsychologia
Bookheimer S (2002) Functional MRI of language: new approaches to
understanding the cortical organization of semantic processing. Annu
Rev Neurosci 25:151–188.
Burton MW, Small SL, Blumstein SE (2000) The role of segmentation in
phonological processing: an fMRI investigation. J Cogn Neurosci
Callan AM, Callan DE, Masaki S (2005) When meaningless symbols become
letters: neural activity change in learning new phonograms. NeuroImage
Callan DE, Tajima K, Callan AM, Kubo R, Masaki S, Akahane-Yamada R
(2003) Learning-induced neural plasticity associated with improved
identification performance after training of a difficult second-language
phonetic contrast. NeuroImage 19:113–124.
Casey BJ, Galvan A, Hare TA (2005) Changes in cerebral functional organi-
zation during cognitive development. Curr Opin Neurobiol 15:239–244.
Cutler A, Carter DM (1987) The predominance of strong initial syllables in
the English vocabulary. Comput Speech Lang 2:133–142.
Dehaene-Lambertz G, Pallier C, Serniclaes W, Sprenger-Charolles L, Jobert
A, Dehaene S (2005) Neural correlates of switching from auditory to
speech perception. NeuroImage 24:21–33.
Doyon J, Penhune V, Ungerleider LG (2003) Distinct contribution of the
cortico-striatal and cortico-cerebellar systems to motor skill learning.
Fiez JA, Raichle ME, Miezin FM, Petersen SE, Tallal P, Katz WF (1995)
PET studies of auditory and phonological processing: effects of
stimulus characteristics and task demands. J Cogn Neurosci 7:357–375.
Fletcher PC, Zafiris O, Frith CD, Honey RAE, Corlett PR, Zilles K, Fink GR
(2004) On the benefits of not trying: brain activity and connectivity
reflecting the interactions of explicit and implicit sequence learning.
Cereb Cortex
Friederici AD, Steinhauer K, Pfeifer E (2002) Brain signatures of artificial
language processing: evidence challenging the critical period hypothesis.
Proc Natl Acad Sci USA 99:529–534.
Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ (1999) Multisubject
fMRI studies and conjunction analyses. NeuroImage 10:385–396.
Gandour J, Wong D, Lowe M, Dzemidzic M, Satthamnuwong N, Tong Y, Li
X (2002) A cross-linguistic fMRI study of spectral and temporal cues
underlying phonological processing. J Cogn Neurosci 14:1076–1087.
Gelfand J, Bookheimer SY (2003) Dissociating neural mechanisms of tem-
poral sequencing and processing phonemes. Neuron 38:831–842.
Golestani N, Zatorre RJ (2004) Learning new sounds of speech:
reallocation of neural substrates. NeuroImage 21:494–506.
Hashimoto R, Sakai KL (2004) Learning letters in adulthood: direct visual-
ization of cortical plasticity for forming a new link between orthography
and phonology. Neuron 42:311–322.
Hickok G, Poeppel D (2000) Towards a functional neuroanatomy of speech
perception. Trends Cogn Sci 4:131–138.
Jaeggi SM, Seewer R, Nirkko AC, Eckstein D, Schroth G, Groner R, Gutbrod
K (2003) Does excessive memory load attenuate activation in the
prefrontal cortex? A functional magnetic resonance imaging study.
NeuroImage 19:210–225.
Jäncke L, Wustenberg T, Scheich H, Heinze HJ (2002) Phonetic perception
and the temporal cortex. NeuroImage 15:733–746.
Johnson EK, Jusczyk PW (2001) Word segmentation by 8-month-olds:
when speech cues count more than statistics. J Mem Lang 44:548–567.
Johnson EK, Newport EL (1989) Critical period effects in second language
learning: the influence of maturational state on the acquisition of
English as a second language. Cognit Psychol 21:60–99.
Johnsrude IS, Penhune VB, Zatorre RJ (2000) Functional specificity in the
right human auditory cortex for perceiving pitch direction. Brain
Jusczyk PW (2002) How infants adapt speech-processing capacities to
native-language structure. Curr Dir Psychol Sci 11:15–18.
Kelly AMC, Garavan H (2005) Human functional neuroimaging of brain
changes associated with practice. Cereb Cortex 15:1089–1102.
Kuhl PK (2004) Early language acquisition: cracking the speech code. Nat
Rev Neurosci 5:831–843.
Landau SM, Schumacher EH, Garavan H, Druzgal TJ, D'Esposito M (2004) A
functional MRI study of the influence of practice on component processes
of working memory. NeuroImage 22:211–221.
LoCasto PC, Krebs-Noble D, Gullapalli RP, Burton MW (2004) An fMRI
investigation of speech and tone segmentation. J Cogn Neurosci
McLaughlin J, Osterhout L, Kim A (2004) Neural correlates of second-
language word learning: minimal instruction produces rapid change. Nat
Neurosci 7:703–704.
Newman SD, Twieg D (2001) Differences in auditory processing of words
and pseudowords: an fMRI study. Hum Brain Mapp 14:39–47.
Noppeney U, Price CJ (2004) An fMRI study of syntactic adaptation. J Cogn
Neurosci
Opitz B, Friederici AD (2003) Interactions of the hippocampal system and
the prefrontal cortex in learning language-like rules. NeuroImage
Opitz B, Friederici AD (2004) Brain correlates of language learning: the
neuronal dissociation of rule-based versus similarity-based learning.
J Neurosci 24:8436–8440.
Petersen SE, van Mier H, Fiez JA, Raichle ME (1998) The effects of practice
on the functional anatomy of task performance. Proc Natl Acad Sci USA
Poldrack RA, Temple E, Protopapas A, Nagarajan S, Tallal P, Merzenich M,
Gabrieli JDE (2001) Relations between the neural bases of dynamic au-
ditory processing and phonological processing: evidence from fMRI. J
Cogn Neurosci 13:687–697.
Poldrack RA, Sabb FW, Foerde K, Tom SM, Asarnow RF, Bookheimer SY,
Knowlton BJ (2005) The neural correlates of motor skill automaticity.
J Neurosci 25:5356–5364.
Saffran JR, Wilson DP (2003) From syllables to syntax: multilevel statistical
learning by 12-month-old infants. Infancy 4:273–284.
Saffran JR, Aslin RN, Newport EL (1996a) Statistical learning by 8-month-
old infants. Science 274:1926–1928.
Saffran JR, Newport EL, Aslin RN (1996b) Word segmentation: the role of
distributional cues. J Mem Lang 35:606–621.
Saffran JR, Johnson EK, Aslin RN, Newport EL (1999) Statistical learning
of tone sequences by human infants and adults. Cognition 70:27–52.
Saint-Cyr JA (2003) Frontal-striatal circuit functions: context, sequence,
and consequence. J Int Neuropsychol Soc 9:103–128.
Sanders LD, Newport EL, Neville HJ (2002) Segmenting nonsense: an
event-related potential index of perceived onsets in continuous speech.
Nat Neurosci 5:700–703.
Scott SK, Johnsrude IS (2003) The neuroanatomical and functional
organization of speech perception. Trends Neurosci 26:100–107.
Scott SK, Blank CC, Rosen S, Wise RJ (2000) Identification of a pathway
for intelligible speech in the left temporal lobe. Brain 123:2400–2406.
Shadmehr R, Holcomb HH (1997) Neural correlates of motor memory
consolidation. Science 277:821–825.
Shestakova A, Huotilainen M, Ceponiene R, Cheour M (2003) Event-
related potentials associated with second language learning in children.
Clin Neurophysiol 114:1507–1512.
Tallal P (2004) Improving language and literacy is a matter of time. Nat
Rev Neurosci
Tallal P, Piercy M (1973) Defects of non-verbal auditory perception in
children with developmental dysphasia. Nature 241:468–469.
Temple E (2002) Brain mechanisms in normal and dyslexic readers. Curr
Opin Neurobiol 12:178–183.
Temple E, Poldrack RA, Protopapas A, Nagarajan S, Salz T, Tallal P, Mer-
zenich MM, Gabrieli JDE (2000) Disruption of the neural response to
rapid acoustic stimuli in dyslexia: evidence from functional MRI. Proc
Natl Acad Sci USA 97:13907–13912.
Thiel CM, Shanks DR, Henson RNA, Dolan RJ (2003) Neuronal correlates
of familiarity-driven decisions in artificial grammar learning.
NeuroReport
Thiessen ED, Saffran JR (2003) When cues collide: use of stress and statisti-
cal cues to word boundaries by 7- to 9-month-old infants. Dev Psychol
Van der Graaf FHCE, de Jong BM, Maguire RP, Meiners LC, Leenders KL
(2004) Cerebral activation related to skills practice in a double serial
reaction time task: striatal involvement in random-order sequence learn-
ing. Cognit Brain Res 20:120–131.
Vouloumanos A, Kiehl KA, Werker JF, Liddle PF (2001) Detection of
sounds in the auditory stream: event-related fMRI evidence for differen-
tial activation to speech and nonspeech. J Cogn Neurosci 13:994–1005.
Weber-Fox C, Neville HJ (2001) Sensitive periods differentiate processing
of open- and closed-class words: an ERP study of bilinguals. J Speech
Lang Hear Res 44:1338–1353.
Wilson SM, Saygin AP, Sereno MI, Iacoboni M (2004) Listening to speech
activates motor areas involved in speech production. Nat Neurosci
Woods RP, Grafton ST, Holmes CJ, Cherry SR, Mazziotta JC (1998a)
Automated image registration. I. General methods and intrasubject,
intramodality validation. J Comput Assist Tomogr 22:139–152.
Woods RP, Grafton ST, Watson JDG, Sicotte NL, Mazziotta JC (1998b)
Automated image registration. II. Intersubject validation of linear and
nonlinear models. J Comput Assist Tomogr 22:153–165.
Woods RP, Dapretto M, Sicotte NL, Toga AW, Mazziotta JC (1999) Cre-
ation and use of a Talairach-compatible atlas for accurate, automated,
nonlinear intersubject registration, and analysis of functional imaging
data. Hum Brain Mapp 8:73–79.
Xiao Z, Zhang JX, Wang X, Wu R, Hu X, Weng X, Tan LH (2005) Differential
activity in left inferior frontal gyrus for pseudowords and real words:
an event-related fMRI study on auditory lexical decision. Hum Brain Mapp
Zatorre RJ, Belin P, Penhune VB (2002) Structure and function of auditory
cortex: music and speech. Trends Cogn Sci 6:37–46.