Hemispheric roles in the perception of speech prosody.
ABSTRACT Speech prosody is processed in neither a single region nor a specific hemisphere, but engages multiple areas comprising a large-scale spatially distributed network in both hemispheres. It remains to be elucidated whether hemispheric lateralization is based on higher-level prosodic representations or lower-level encoding of acoustic cues, or both. A cross-language (Chinese; English) fMRI study was conducted to examine brain activity elicited by selective attention to Chinese intonation (I) and tone (T) presented in three-syllable (I3, T3) and one-syllable (I1, T1) utterance pairs in a speeded response, discrimination paradigm. The Chinese group exhibited greater activity than the English in a left inferior parietal region across tasks (I1, I3, T1, T3). Only the Chinese group exhibited a leftward asymmetry in inferior parietal and posterior superior temporal (I1, I3, T1, T3), anterior temporal (I1, I3, T1, T3), and frontopolar (I1, I3) regions. Both language groups shared a rightward asymmetry in the mid portions of the superior temporal sulcus and middle frontal gyrus irrespective of prosodic unit or temporal interval. Hemispheric laterality effects enable us to distinguish brain activity associated with higher-order prosodic representations in the Chinese group from that associated with lower-level acoustic/auditory processes that are shared among listeners regardless of language experience. Lateralization is influenced by language experience that shapes the internal prosodic representation of an external auditory signal. We propose that speech prosody perception is mediated primarily by the RH, but is left-lateralized to task-dependent regions when language processing is required beyond the auditory analysis of the complex sound.
Jackson Gandour (a,*), Yunxia Tong (a), Donald Wong (b), Thomas Talavage (c), Mario Dzemidzic (d), Yisheng Xu (a), Xiaojian Li (e), and Mark Lowe (f)
(a) Department of Audiology and Speech Sciences, Purdue University, West Lafayette, IN 47907-2038, USA
(b) Department of Anatomy and Cell Biology, Indiana University School of Medicine, IN 46202-5120, USA
(c) School of Electrical and Computer Engineering, Purdue University, IN 47907-2035, USA
(d) MDZ Consulting Inc., Greenwood, IN 46143, USA
(e) South China Normal University, Guangzhou, PR China
(f) Cleveland Clinic Foundation, Cleveland, OH 44195, USA
Received 11 March 2004; revised 2 June 2004; accepted 2 June 2004
© 2004 Elsevier Inc. All rights reserved.
Keywords: fMRI; Human auditory processing; Speech perception; Selective
attention; Laterality; Language; Prosody; Intonation; Tone; Chinese
The differential roles of the left (LH) and right (RH) cerebral hemispheres in the processing of prosodic information have received considerable attention over the last several decades. Evidence supporting an RH role in the perception of prosodic units at phrase- and sentence-level structures has been wide-ranging, including dichotic listening (Blumstein and Cooper, 1974; Shipley-Brown et al., 1988), lesion deficit (Baum and Pell, 1999; Brådvik et al., 1991; Pell, 1998; Pell and Baum, 1997; Weintraub et al., 1981), and functional neuroimaging (Gandour et al., 2003; George et al., 1996; Meyer et al., 2003; Plante et al., 2002; Wildgruber et al., 2002). Involvement of the LH in the perception of prosodic units at the syllable- or word-level structures has also been compelling, with converging evidence from dichotic listening (Moen, 1993; Van Lancker and Fromkin, 1973; Wang et al., 2001), lesion deficit (Eng et al., 1996; Gandour and Dardarananda, 1983; Hughes et al., 1983; Yiu and Fok, 1995), and neuroimaging (Gandour et al., 2000, 2003; Hsieh et al., 2001; Klein et al., 2001).
The precise mechanisms underlying functional asymmetry for speech prosody remain a matter of debate. Task-dependent hypotheses focus on functional properties (e.g., tone vs. intonation) of the speech stimuli (Van Lancker, 1980), whereas cue-dependent hypotheses are directed to particular physical properties (e.g., temporal vs. spectral) of the acoustic signal (Ivry and Robertson, 1998; Poeppel, 2003; Schwartz and Tallal, 1980; Zatorre and Belin, 2001). Speech prosody is predicted to be right-lateralized by cue-dependent hypotheses. Hemispheric specialization, however, appears to be sensitive to language-specific factors irrespective of neural mechanisms underlying lower-level auditory processing (Gandour et al., 2002).
The Chinese (Mandarin) language can be exploited to address questions of functional asymmetry underlying prosodic processing that involve primarily variations in pitch. Chinese has four lexical tones (e.g., ma [tone 1] "mother", ma [tone 2] "hemp", ma [tone 3] "horse", ma [tone 4] "scold"). Tones 1–4 can be described phonetically as high level, high rising, falling rising, and high falling, respectively (Howie, 1976). They are manifested at the level of the syllable, the smallest structural unit for carrying prosodic features, on a time scale of 200–350 ms. Intonation, on the other hand, is manifested at the phrase or sentence level, typically on a time scale of seconds. In Chinese, interrogative intonation exhibits a higher pitch contour than that of its declarative counterpart (Shen, 1990) as well as a wider pitch range for sentence-final tones (Yuan et al., 2002). In English, interrogative sentences do not have overall higher pitch contours than declarative sentences, nor do they show any effects of tone and intonation interaction in sentence-final position. Chinese interrogative intonation with a final rising tone has a rising end, which is similar to English, whereas that with a final falling tone often has a falling end (Yuan et al., 2002).
1053-8119/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
Supplementary data associated with this article can be found, in the online version, at doi: 10.1016/j.neuroimage.2004.06.004.
* Corresponding author. Department of Audiology and Speech Sciences, Purdue University, 1353 Heavilon Hall, 500 Oval Drive, West Lafayette, IN 47907-2038. Fax: +1-765-494-0771.
E-mail address: email@example.com (J. Gandour).
Available online on ScienceDirect (www.sciencedirect.com).
NeuroImage 23 (2004) 344–357
In a previous fMRI study of Chinese tone and intonation (Gandour et al., 2003), both tone and intonation were judged in sentences presented at a fixed length (three words), and we observed left-lateralized lexical tone perception in comparison to intonation. However, the prosodic unit listeners selectively attended to and the temporal interval of attentional focus were coterminous. In judgments of lexical tone, the focus of attention was on the final word only, whereas judgments of intonation required that the focus be directed to the entire sentence. Whether the principal driving force in hemispheric lateralization of speech prosody is due to the temporal interval of attentional focus rather than the hierarchical level of linguistic units is not yet well-established. The aim of the present study is to determine whether the temporal interval in which prosodic units are presented influences the neural substrates used in prosodic processing. As such, participants are asked to make perceptual judgments of tone and intonation in one-syllable and three-syllable Chinese utterances. By comparing activation in homologous regions of both hemispheres, we can assess the extent to which hemispheric laterality for speech prosody is driven by the temporal interval, prosodic unit, or both. Only native Chinese speakers possess implicit knowledge that relates external auditory cues to internal representations of tone and intonation. By employing two language groups, one consisting of Chinese speakers, the other of English speakers, we are able to determine whether activation of particular brain areas is sensitive to language experience.
Materials and methods
Ten native speakers of Mandarin (five male; five female) and
ten native speakers of American English (five male; five female)
were closely matched in age/years of education (Chinese: M = 29/
19; English: M = 27/19). All subjects were strongly right-handed
(Oldfield, 1971) and exhibited normal hearing sensitivity. All
subjects gave informed consent in compliance with a protocol
approved by the Institutional Review Board of Indiana University
Purdue University Indianapolis and Clarian Health.
Stimuli consisted of 36 pairs of three-syllable Chinese utter-
ances, and 44 pairs of one-syllable Chinese utterances. Utterances
were designed with two intonation patterns (declarative, interrog-
ative) in combination with the four Chinese tones on the utterance-
final syllable (Fig. 1). Focus was held constant on the utterance-
final syllable. No adjacent syllables in the three-syllable utterances
formed bisyllabic words to minimize lexical-semantic processing.
Tone or intonation each differed in 36% of the pairs for the one-
syllable utterances, 39% of the pairs for the three-syllable utter-
ances. Stimuli that were identical in both tone and intonation
comprised 28% and 22% of the pairs in one-syllable and three-
syllable utterances, respectively.
A 52-year-old male native speaker of Mandarin was instructed
to read one- and three-syllable utterances at a conversational
speaking rate in a declarative and interrogative sentence mood. A
reading task was chosen to maximize the likelihood of simulating
normal speaking conditions as much as possible while at the same
time controlling the syntactic, prosodic, and segmental character-
istics of the spoken sentences. To enhance the naturalness of
producing the three-syllable utterances, he was told to treat them
as SVO (subject verb object) sentences with non-emphatic stress
placed on the final syllable. All items in the list were typed in
Chinese characters. A sufficient pause was provided between items
to ensure that the speaker maintained a uniform speaking rate. By
controlling the pace of presentation, we maximized the likelihood
of obtaining consistent, natural-sounding productions. To avoid
list-reading effects, extra items were placed at the top and bottom
of the list. Recordings were made in a double-walled soundproof
booth using an AKG C410 headset type microphone and a Sony
TCD-D8 digital audio tape recorder. The subject was seated and
wore a custom-made headband that maintained the microphone at a
distance of 12 cm from the lips.
Prescreening identification procedure
All one- and three-syllable utterances were presented individ-
ually in random order for identification by five native speakers of
Chinese who were naive to the purposes of the experiment. They
were asked to respond whether they heard a declarative or
interrogative intonation and to indicate the tone occurring on the
final syllable. Only those stimuli that achieved a perfect (100%)
recognition score for both intonation and tone were retained for
possible use as stimuli in our training and experimental sessions.
The experimental paradigm consisted of four active tasks (Table
1) and a passive listening task. The active tasks required discrim-
ination judgments of intonation (I) and tone (T) in paired three-
syllable (I3, T3) and one-syllable (I1, T1) utterances. Subjects were
instructed to focus their attention on either the utterance-level
intonation or the lexical tone of the final syllable, make discrim-
ination judgments, and respond by pressing a mouse button (left =
same; right = different). The control task involved passive listening
to the same utterances, either one-syllable utterances (L1) or three-
syllable utterances (L3). Subjects responded by alternately pressing
the left and right mouse button after each trial.
A scanning sequence consisted of two tasks presented in a blocked format alternating with rest periods (Fig. 2). The one-syllable and three-syllable utterance blocks contained 11 and 9 trials, respectively. The order of scanning runs and of trials within blocks was randomized for each subject. Instructions were delivered to subjects in their native language via headphones during rest periods immediately preceding each task: "listen" for passive listening to speech stimuli, "intonation" for same–different judgments on Chinese intonation, and "tone" for same–different judgments on Chinese tone. Average trial duration was about 2.9 and 3.5 s, respectively, for the one-syllable and three-syllable utterance blocks, including a response interval of 2 s.
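The trial counts and average trial durations can be checked against the 32 s task block length given in the imaging protocol. The following is a minimal sketch of that arithmetic, not part of the original analysis; the function and dictionary names are illustrative, and the durations are the approximate averages quoted in the text.

```python
# Task block length (s) as stated in the imaging protocol.
TASK_BLOCK_S = 32.0

# Trials per block and approximate mean trial duration (s) from the text.
BLOCKS = {
    "one-syllable": {"trials": 11, "trial_s": 2.9},
    "three-syllable": {"trials": 9, "trial_s": 3.5},
}


def block_duration(trials, trial_s):
    """Total stimulus-plus-response time for one block, in seconds."""
    return trials * trial_s


for name, spec in BLOCKS.items():
    total = block_duration(spec["trials"], spec["trial_s"])
    print(f"{name}: {total:.1f} s within a {TASK_BLOCK_S:.0f} s block")
```

Under these figures both block types fill nearly all of a 32 s block, which would explain why the shorter one-syllable trials are paired with a larger trial count.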
All speech stimuli were digitally edited to have equal maximum energy level in dB SPL. Auditory stimuli were presented binaurally using a computer playback system (E-Prime) and a pneumatic-based audio system (Avotec). The plastic sound conduction tubes were threaded through tightly occlusive foam eartips inside the earmuffs that attenuated the average sound pressure level of the continuous scanner noise by ~30 dB. Average intensity of all experimental stimuli was 92 dB SPL as compared to 80 dB SPL
Fig. 1. Acoustic features of sample Chinese speech stimuli. Broad-band spectrograms (SPG: 0–8 kHz) and voice fundamental frequency contours (F0: 0–400 Hz) are displayed for utterance pairs consisting of same tone/different intonation in three-syllable utterances (top left), same tone/different intonation in one-syllable utterances (top right), different tone/same intonation in three-syllable utterances (bottom left), and different tone/same intonation in one-syllable utterances (bottom right).
Table 1. Samples of Chinese tone and intonation stimuli for tasks involving one-syllable and three-syllable utterances
Note. I1 (T1) and I3 (T3) represent intonation (tone) tasks in one-syllable and three-syllable utterances, respectively.
Accuracy, reaction time, and subjective ratings of task difficulty
were used to measure task performance. Each task was self-rated by
listeners on a 1- to 5-point graded scale of difficulty (1 = easy, 3 =
medium, 5 = hard) at the end of the scanning session. Before
scanning, subjects were trained to a high level of accuracy using
stimuli different from those presented during the scanning runs: I3
(Chinese, 93% correct; English, 88%); I1 (Chinese, 92%; English,
77%); T3 (Chinese, 99%; English, 82%); T1 (Chinese, 99%;
Scanning was performed on a 1.5T Signa GE LX Horizon scanner (Waukesha, WI) equipped with birdcage transmit–receive radiofrequency head coils. Each of four 200-volume echo-planar imaging (EPI) series began with a rest interval consisting of 8 baseline volumes (16 s), followed by 184 volumes during which the two comparison tasks (32 s) alternated with intervening 16 s rest intervals, and ended with a rest interval of 8 baseline volumes (16 s) (Fig. 2). Functional data were acquired using a gradient-echo EPI pulse sequence with the following parameters: repetition time (TR) 2 s; echo time (TE) 50 ms; matrix 64 × 64; flip angle (FA) 90°; field of view (FOV) 24 × 24 cm. Fifteen 7.5-mm-thick, contiguous axial slices were used to image the entire cerebrum. Before functional imaging runs, high-resolution anatomic images were acquired in 124 contiguous axial slices using a 3D Spoiled-Grass (3D SPGR) sequence (slice thickness 1.2–1.3 mm; TR 35 ms; TE 8 ms; 1 excitation; FA 30°; matrix 256 × 128; FOV 24 × 24 cm) for purposes of anatomic localization and coregistration to a standard stereotactic system (Talairach and Tournoux, 1988). Subjects were scanned with eyes closed and room lights dimmed. The effects of head motion were minimized by using a head–neck pad and dental bite bar.
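The run timing described above can be cross-checked with a few lines of arithmetic. This is an illustrative sketch only: it assumes that each run is a strict rest, task, rest, ..., task, rest sequence (consistent with the text, but the exact layout is shown in Fig. 2, which is not reproduced here), and the function name is hypothetical.

```python
TR_S = 2.0      # repetition time (s)
VOLUMES = 200   # volumes per EPI series
REST_S = 16.0   # rest interval (8 volumes)
TASK_S = 32.0   # task block


def count_task_blocks(total_s, task_s, rest_s):
    """Number of task blocks in a rest-task-rest-...-rest run.

    Returns None when the run length does not fit that pattern exactly.
    Assumes the run opens and closes with a rest interval.
    """
    n, remainder = divmod(total_s - rest_s, task_s + rest_s)
    return int(n) if remainder == 0 else None


run_s = VOLUMES * TR_S
n_blocks = count_task_blocks(run_s, TASK_S, REST_S)
print(f"run length: {run_s:.0f} s, task blocks per run: {n_blocks}")
```

Under that assumption the 200 volumes resolve into eight task blocks per run, which would correspond to four blocks for each of the two tasks in a run.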
Image analysis was conducted using the AFNI software pack-
age (Cox, 1996). All data for a given subject were motion-
corrected to the fourth acquired volume of the first functional
imaging run. To remove differences in global intensity between
runs, the signal in each voxel was detrended across each functional
scan to remove scanner signal drift, and then normalized to its
mean intensity. Each of the four functional runs was analyzed to
obtain cross-correlation for each of three reference waveforms with
the measured fMRI time series for each voxel. The first reference
waveform corresponded to one of the four active conditions (I1, I3,
T1, T3) presented in a single run (Fig. 2). The second and third
reference waveforms corresponded to the two control conditions,
L1 and L3, respectively, presented during the two runs with the
same temporal interval for the intonation and tone conditions (L1
for I1 and T1; L3 for I3 and T3). After the resulting EPI volumes
were transformed to 1-mm isotropic voxels in Talairach coordinate
space (Talairach and Tournoux, 1988), the correlation coefficients
were converted to z scores for purposes of analyzing multisubject
fMRI data (Bosch, 2000), and spatially smoothed by a 5.2-mm
FWHM Gaussian filter to account for intersubject variation in brain
anatomy and to enhance the signal-to-noise ratio.
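Two steps in this pipeline lend themselves to a short illustration: correlating a voxel time series with a task reference waveform, and converting the smoothing kernel's FWHM to a Gaussian standard deviation. The sketch below is a simplification under stated assumptions: it uses a zero-lag Pearson correlation against a plain boxcar, it does not reproduce AFNI's actual implementation, and the time series is synthetic.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Gaussian standard deviation for a given full width at half maximum."""
    return fwhm_mm / math.sqrt(8.0 * math.log(2.0))


def pearson_r(x, y):
    """Zero-lag Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Boxcar reference: 8 rest volumes then 16 task volumes (TR = 2 s), twice.
reference = ([0.0] * 8 + [1.0] * 16) * 2

# Synthetic voxel signal (hypothetical): baseline + task effect + small ripple.
voxel = [100.0 + 2.0 * r + 0.5 * ((i % 3) - 1) for i, r in enumerate(reference)]

print(f"sigma for 5.2 mm FWHM: {fwhm_to_sigma(5.2):.3f} mm")
print(f"correlation with reference: {pearson_r(reference, voxel):.3f}")
```

A real analysis would also account for hemodynamic delay when building the reference waveform; the boxcar here only shows the shape of the computation.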
Direct comparison of active conditions (I1, I3, T1, T3) across runs was accomplished by computing the average z score for each of the four active conditions relative to its corresponding control condition. Averaged z scores for the control conditions were then subtracted from those obtained for their corresponding intonation or tone conditions (e.g., Δz_I1 = z_I1 − z_L1, Δz_I3 = z_I3 − z_L3). Evaluating each active condition relative to a control of the same temporal interval also makes it possible to compare active conditions across temporal intervals (e.g., Δz_I1 vs. Δz_I3).
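The subtraction of control from active z scores can be written out compactly. The sketch below pairs each active condition with the passive listening control of the same utterance length, as described above; the z values are made-up placeholders for a single voxel, not data from the study.

```python
# Control condition matched to each active condition by utterance length.
CONTROL_FOR = {"I1": "L1", "T1": "L1", "I3": "L3", "T3": "L3"}


def delta_z(z_scores):
    """Return the control-subtracted z score for every active condition."""
    return {
        task: z_scores[task] - z_scores[control]
        for task, control in CONTROL_FOR.items()
    }


# Hypothetical averaged z scores for one voxel (illustration only).
example_z = {"I1": 2.5, "T1": 2.0, "I3": 3.0, "T3": 1.5, "L1": 1.0, "L3": 0.5}

for task, dz in delta_z(example_z).items():
    print(f"delta z for {task}: {dz:+.2f}")
```

Because every difference is taken against a length-matched control, the resulting values for one-syllable and three-syllable tasks sit on a common footing and can be compared directly.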
Within- and between-group random effects maps (I1 vs. L1, T1
by applying voxel-wise ANOVAs on the z (e.g., Chinese z_I1 vs. Chinese z_L1) and Δz (e.g., Chinese Δz_I1 vs. English Δz_I1) values, respectively. The individual voxel threshold for between-group maps was set at P = 0.01. For within-group maps, significantly activated voxels (P < 0.001) located within a radius of 7.6 mm were grouped into clusters, with a minimum cluster size threshold corresponding to four original resolution voxels. According to a Monte Carlo simulation (AlphaSim), this clustering procedure yielded a false-positive alpha level of 0.04.
Fig. 2. Sequence and timing of conditions in each of the four functional imaging runs. I3 and I1 stand for intonation in three-syllable and one-syllable Chinese utterances, respectively; T3 and T1 stand for tone in three-syllable and one-syllable Chinese utterances, respectively; R = rest interval; L3 and L1 stand for passive listening to three-syllable and one-syllable Chinese utterances, respectively.
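For a sense of scale, the minimum cluster size can be converted to a volume. The sketch assumes that "original resolution voxels" means voxels at the EPI acquisition resolution implied by the imaging protocol (24 cm field of view, 64 × 64 matrix, 7.5 mm slices); that reading is an interpretation, and the variable names are illustrative.

```python
FOV_MM = 240.0            # field of view, 24 cm
MATRIX = 64               # in-plane matrix size
SLICE_THICKNESS_MM = 7.5  # axial slice thickness
MIN_CLUSTER_VOXELS = 4    # threshold in original resolution voxels

in_plane_mm = FOV_MM / MATRIX
voxel_volume_mm3 = in_plane_mm * in_plane_mm * SLICE_THICKNESS_MM
min_cluster_mm3 = MIN_CLUSTER_VOXELS * voxel_volume_mm3

print(f"in-plane voxel size: {in_plane_mm} mm")
print(f"acquisition voxel volume: {voxel_volume_mm3} mm^3")
print(f"minimum cluster volume: {min_cluster_mm3} mm^3")
```

Since the statistical maps were resampled to 1-mm isotropic voxels, the minimum cluster volume in cubic millimetres is also the approximate number of resampled voxels a cluster must contain.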
Nine anatomically constrained 5-mm radius spherical regions of
interest (ROI) were examined along with other regions. We chose
ROIs that have been implicated in previous studies of phonological
2003), speech perception (Binder et al., 2000; Davis and Johnsrude,
2003; Giraud and Price, 2001; Scott, 2003; Scott and Johnsrude,
2003; Scott et al., 2000; Zatorre et al., 2002), attention (Corbetta,
1998; Corbetta and Shulman, 2002; Corbetta et al., 2000; Shaywitz
Bongiolatti, 2002; Chein et al., 2003; D’Esposito et al., 2000;
Jonides et al., 1998; Newman et al., 2002; Paulesu et al., 1993;
Smith and Jonides, 1999). ROIs were symmetric in nonoverlapping
frontal, temporal, and parietal regions of both hemispheres (see
over peak location coordinates reported in previous studies. They
were then slightly adjusted to avoid overlapping of ROIs and
crossing of major anatomical boundaries. Of these coordinates, 26
out of 27 (9 ROIs × 3 coordinates) fell within 1 SD, 1 (x, mSTS)
within 2 SD of the mean published values. Similar results were
radius results because larger ROIs would have to be shifted to avoid
crossing of anatomical boundaries.
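A spherical ROI of this kind, together with its right hemisphere mirror image (Table 2 notes that right hemisphere ROIs were generated by reflecting the left hemisphere location across the midline), can be sketched on a 1-mm grid as follows. The function names are hypothetical, and the example center is one reading of the supramarginal gyrus row of Table 2, taking negative x as the left hemisphere in Talairach space.

```python
def sphere_voxels(center, radius_mm=5):
    """Set of 1-mm grid points within radius_mm of an integer center."""
    cx, cy, cz = center
    r = int(radius_mm)
    offsets = range(-r, r + 1)
    return {
        (cx + dx, cy + dy, cz + dz)
        for dx in offsets
        for dy in offsets
        for dz in offsets
        if dx * dx + dy * dy + dz * dz <= radius_mm ** 2
    }


def mirror_across_midline(voxels):
    """Reflect voxel coordinates across the midsagittal plane (x -> -x)."""
    return {(-x, y, z) for (x, y, z) in voxels}


# Assumed left hemisphere center (x, y, z) in mm; illustrative only.
left_center = (-50, -31, 28)
left_roi = sphere_voxels(left_center)
right_roi = mirror_across_midline(left_roi)

print(f"voxels per ROI: {len(left_roi)}")
print(f"hemispheres overlap: {not left_roi.isdisjoint(right_roi)}")
```

Mirroring guarantees that the two homologous ROIs contain the same number of voxels, so mean values from the left and right hemispheres are computed over equal volumes.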
The mean Δz (I1, I3, T1, T3) was calculated for each ROI and every subject. These mean Δz values within each ROI were analyzed using repeated measures mixed-model ANOVAs (SAS®) to compare activation between tasks (I1, T1, I3, T3), hemispheres (LH, RH), and groups (Chinese, English). Tasks and hemispheres were treated as fixed, within-subjects effects; groups as a fixed, between-subjects effect. Subjects were nested within groups as a random effect. It may seem reasonable to use stimulus length as a separate factor in the ANOVA, treating one-syllable and three-syllable as two levels of this factor. However, as pointed out in the Introduction, although each stimulus contained three syllables in both I3 and T3 tasks, T3 was different from I3 with respect to attentional demands. In T3, participants had to pay attention to the last syllable only, whereas in I3, they had to focus their attention on all three syllables. Treating stimulus length as a separate factor would have confounded length (1, 3) and prosodic unit (I, T).
Behavioral measures of task performance by Chinese and English groups are given in Table 3. A repeated measures ANOVA was
and Task as within-subjects factor (I1, I3, T1, T3). Results revealed
and reaction time (RT) [F(3,54) = 8.68, P < 0.0001]. Tests of simple main effects indicated that for between group comparisons, the tone task was judged to be easier for Chinese than for English listeners (T1, P < 0.0001; T3, P = 0.0004); Chinese listeners judged all tasks at a higher level of accuracy than English listeners (P < 0.01); and RTs were longer for English than for Chinese listeners when making tonal judgments (T1, P = 0.0281; T3, P = 0.007). Regardless of language background, intonation judgments took longer in the one-syllable (I1) than in the three-syllable (I3) utterances (Chinese, P =
be more difficult than T1 (P = 0.0003); more errors were made in I1 than T1 (P = 0.0001), and RTs were longer in I1 compared to T1 (P < 0.0001), I3 compared to T3 (P = 0.0339). In contrast, the
the other three tasks (P < 0.01).
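The degrees of freedom reported for the reaction time result, F(3,54), are what a standard split-plot design with 20 subjects in two groups and four task levels would produce. The sketch below shows that arithmetic under the assumption of a balanced design with no missing data; it does not identify which effect the reported F value belongs to, since both the task main effect and the group-by-task interaction share these degrees of freedom when there are two groups.

```python
def split_plot_df(n_subjects, n_groups, n_within_levels):
    """Numerator and error df for a within-subjects effect.

    Assumes a balanced design with one between-subjects factor and one
    within-subjects factor, with subjects nested within groups.
    """
    df_effect = n_within_levels - 1
    df_error = (n_subjects - n_groups) * (n_within_levels - 1)
    return df_effect, df_error


# Ten Chinese and ten English listeners, four tasks (I1, I3, T1, T3).
df_effect, df_error = split_plot_df(n_subjects=20, n_groups=2, n_within_levels=4)
print(f"F({df_effect},{df_error})")
```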
Between group comparisons
ROI-based ANOVAs revealed that the Chinese group exhibited significantly (P < 0.001) greater activity, as measured by Δz, in the left IPL relative to the English group regardless of task (I1, I3, T1, T3) (Figs. 4f and 5; Table 4). No other ROIs in either the LH or RH elicited significantly more activity in the Chinese group as compared to the English group.
In contrast, the English group showed significantly greater bilateral or right-sided activity in frontal, parietal, and temporal ROIs relative to the Chinese group (Fig. 6; Table 4). In the frontal lobe, all four ROIs (Figs. 4a–d) were more active bilaterally for the tone tasks (T1, T3). In the parietal lobe, IPS (Fig. 4e) activity was greater in both the LH and RH for T1. In the temporal lobe, the pSTG (Fig. 4i) was more active bilaterally across tasks (I1, I3, T1, T3), whereas greater activity in the aSTG (Fig. 4g) was observed across tasks in the RH only.
Table 2. Center coordinates and extents of 5-mm spherical ROIs
±45 +32 +22
±44 +10 +33
45/13 ±37 +25 +14 centered deep within the frontal operculum of the inferior frontal gyrus, extending dorsally to the lower bank of the inferior frontal sulcus, ventrally to the bordering edge of the anterior insula
±32 −48 +43 centered in and confined to the intraparietal sulcus
±50 −31 +28 centered in anteroventral aspects of the supramarginal gyrus, extending ventrally into the bordering edge of the Sylvian fissure
−8 centered in the temporal pole and wholly confined to the STG; posterior border (y = +5) was about 20 mm anterior to the medial end of the first transverse temporal sulcus (TTS)
−3 centered in the STS encompassing both the upper and lower banks of the STS; anterior border (y = −16) was contiguous with the medial border of TTS
±56 −38 +12 centered in the STG, extending ventrally into the STS; anterior border (y = −35) was about 20 mm posterior to the medial border of TTS
Notes. Stereotaxic coordinates (mm) are derived from the human brain atlas of Talairach and Tournoux (1988). a, anterior; m, middle; p, posterior; FO, frontal operculum; MFG, middle frontal gyrus; IPS, intraparietal sulcus; IPL, inferior parietal lobule; STG, superior temporal gyrus; STS, superior temporal sulcus. Right hemisphere ROIs were generated by reflecting the left hemisphere location across the midline.
Within group comparisons
Hemisphere effects for the Chinese group revealed complementary leftward and rightward asymmetries, as measured by Δz, depending on ROI and task (Table 5). Laterality differences favored the LH in the frontal aMFG (Figs. 4a and 7, upper panel) for intonation tasks only, irrespective of temporal interval (I1, I3). In the parietal lobe, significantly more activity was observed in the left IPL (Figs. 4f and 5) across tasks, and in the left IPS (Figs. 4e and 7, lower panel) for T3 (cf. Gandour et al., 2003). In the temporal lobe, activity was greater in the left pSTG (Figs. 4i and 8) and aSTG (Fig. 4g) across tasks regardless of temporal interval. In contrast, laterality differences favored the RH in the frontal mMFG (Fig. 4b) and temporal mSTS (Figs. 4h and 8) across tasks.
Hemisphere effects for the English group were restricted to frontal and temporal ROIs in the RH (Table 5). Rightward asymmetries were observed in the frontal mMFG (Fig. 4b) and temporal
the RH were identical to those for the Chinese group. No significant leftward asymmetries were observed for any task across ROIs.
Fig. 3. Location of fixed spherical ROIs in frontal (open circle), parietal (checkered circle), and temporal (barred circle) regions displayed in left sagittal sections (top and middle panels), and on the lateral surface of both hemispheres (bottom panels). LH = left hemisphere; RH = right hemisphere. Stereotactic x coordinates that appear in the top and middle panels are derived from the human brain atlas of Talairach and Tournoux (1988). See also Table 2.
Table 3. Behavioral performance and self-ratings of task difficulty
Language group | Task | Accuracy (%) | Reaction time (ms) | Difficulty (a)
Note. Values are expressed as mean and standard error (in parentheses). See also note in Table 1.
(a) Scalar units are from 1 to 5 (1 = easy; 3 = medium; 5 = hard) for self-ratings of task difficulty.
Task effects for the Chinese group revealed laterality differences,
as measured by Dz, related to the prosodic unit. Intonation
(I1, I3), when compared to tone (T1, T3), favored the LH in the
aMFG (Figs. 4a and 7). In the pMFG (Fig. 4c), I3 was greater than
T3 in the RH; I1 was greater than T1 in both hemispheres.
For both groups, a cluster analysis revealed significant (P <
.001) activation in the supplementary motor area across tasks. The
Chinese group showed predominantly right-sided activation in the
lateral cerebellum across tasks. In the caudate and thalamus,
increased activation was observed in the Chinese group for the
intonation tasks only (I1, I3), but across tasks in the English group.
Fig. 4. Comparison of mean Dz scores between language groups (Chinese, English) per task (I1, T1, I3, T3) and hemisphere (LH, RH) within each ROI. Frontal
lobe, a–d; parietal, e–f; temporal, g–i. I1 is measured by DzI1; T1 by DzT1; I3 by DzI3; T3 by DzT3. Error bars represent ±1 SE.
Fig. 5. A random effects fMRI activation map obtained from comparison of
discrimination judgments of intonation in one-syllable utterances (I1)
relative to passive listening to the same stimuli (L1) between the two
language groups (DzI1 Chinese vs. DzI1 English). Left/right sagittal sections
through stereotaxic space are superimposed onto a representative brain
anatomy. The Chinese group shows increased activation in the left IPL, as
compared to the English group, centered in ventral aspects of the
supramarginal gyrus, and extending into the bordering edge of the Sylvian
fissure. Similar activation foci in the IPL are also observed in I3 vs. L3, T1
vs. L1, and T3 vs. L3 comparisons. See also Fig. 4.
Hemispheric roles in speech prosody
The major findings of this study demonstrate that the processing of Chinese tone
and intonation is best thought of as a mosaic of multiple local
asymmetries, which allows for the possibility that different regions
may be differentially weighted in laterality depending on language-,
modality-, and task-related features (Ide et al., 1999). Earlier
hypotheses that focus on hemispheric function capture only part
of, but not the whole, phenomenon. Not all aspects of speech
prosody are lateralized to the RH. Cross-language differences in
laterality of particular brain regions depend on a listener’s implicit
knowledge of the relation between external stimulus features
(acoustic/auditory) and internal conceptual representations
(linguistic/prosodic). All regions in the frontal, temporal, and parietal lobes
that are lateralized to the LH in response to all tasks or subsets of
tasks are found in the Chinese group only (Fig. 9). Conversely, the
two regions in the temporal and frontal lobes that are lateralized to
the RH are found in both language groups. We infer that LH
laterality reflects higher-order processing of internal representations
of Chinese tone and intonation, whereas RH laterality reflects
lower-order processing of complex auditory stimuli.
Previous models of speech prosody processing in the brain have
focused on either linguistics or acoustics as the driving force
underlying hemispheric lateralization. In this study, tone and
intonation are lateralized to the LH for the Chinese group. Despite
their functional differences from a linguistic perspective, they both
recruit shared neural mechanisms in frontal, temporal, and parietal
regions of the LH. The finding that intonation is lateralized to the
LH cannot be accounted for by a model that claims that “suprasegmental
sentence level information of speech comprehension is
subserved by the RH” (Friederici and Alter, 2004, p. 268). Neither
can this finding be explained by a hypothesis based on the size of
the temporal integration window (short → LH; long → RH)
(Poeppel, 2003). Although both intonation and tone
meet Poeppel’s criteria for a long temporal integration window, they are
lateralized to the LH instead of the RH.
Instead of viewing hemispheric roles as being derived from
either acoustics or linguistics independently, we propose that
linguistics, acoustics, and task demands (Plante et al., 2002)
are all necessary ingredients for developing a neurobiological
model of speech prosody. This model relies on dynamic
interactions between the two hemispheres. Whereas the RH is
engaged in pitch processing of complex auditory signals, including
speech, we speculate that the LH is recruited to process
categorical information to support phonological processing, or
even syntactic and semantic processing (cf. Friederici and Alter,
2004). With respect to task demands, I1 elicits greater activation
than T1 in the left aMFG and bilaterally in the pMFG. These
differences cannot be explained by “prosodic frame length”
(Dogil et al., 2002) since both tone and intonation are presented
in an identical temporal context (one-syllable). Nor can they
be explained by a model that claims that segmental, lexical
(i.e., tone), and syntactic information is processed in the LH, and
suprasegmental sentence level information (i.e., intonation) in the
RH (Friederici and Alter, 2004). Rather, they most likely reflect
task demands related to retrieval of internal representations
associated with tone and intonation.
Fig. 6. Random effects fMRI activation map obtained from comparison of
discrimination judgments of tone in one-syllable utterances (T1) relative to
passive listening to the same stimuli (L1) between the two language groups
(DzT1 English vs. DzT1 Chinese). An axial section reveals increased
activation bilaterally in both frontal and parietal regions, as well as in the
supplementary motor area, for the English group relative to the Chinese
group. Similar activation foci are also observed in the T3 vs. L3
comparison. See also Fig. 4.
Table 4. Group effects per task-and-hemisphere from statistical analyses on mean Dz within each spherical ROI
Columns: Group (C > E, E > C); Hemi; Task; Frontal (aMFG, mMFG, pMFG, FO); Parietal (IPS, IPL); Temporal (aSTG, mSTS, pSTG)
Note. C = Chinese group; E = English group; Hemi = hemisphere. LH = left hemisphere; RH = right hemisphere. *F(1, 18), P < 0.05; **F(1, 18), P < 0.01;
***F(1, 18), P < 0.001. See also notes to Tables 1 and 2.
Functional heterogeneity within a spatially distributed network
Activation in the frontopolar cortex (BA 10) was bilateral
across all tasks for English listeners, but predominantly left-sided
in the intonation tasks (I1, I3) for Chinese listeners (Table 5). The
frontopolar region has extensive interconnections with auditory
regions of the superior temporal gyrus (Petrides and Pandya,
1984). Thus, bilateral activation of frontopolar cortex has been
reported in a verbal working memory paradigm when a competing
articulatory suppression task is presented (Gruber, 2001). Its
functional role is inferred to be that of integrating working memory
with the allocation of attentional resources (Koechlin et al., 1999),
or applying greater effort in memory retrieval (Buckner et al.,
1996; Schacter et al., 1996).
These cross-language differences in frontopolar activation are
likely to result from the linguistic function of suprasegmental
information in Chinese and English. As measured by RT and
accuracy, Chinese listeners are slower and less accurate in
judging intonation than tone. The relatively greater difficulty in
intonation judgments presumably reflects the fact that in Chinese,
all syllables carry tonal contours obligatorily. Tones are likely to be
processed first, as compared to intonation, due to this syllable-by-
syllable processing. By comparison, intonation contours play a
comparatively minor role in signaling differences in sentence
mood. In this study, the unmarked (i.e., minus a sentence-final
particle) yes–no interrogatives are known to carry a light functional
load (Shen, 1990).
In the present study, subjects were required to keep tone or
intonation information of the first stimulus in a pair in their
working memory while concurrently identifying the tone or intonation
of the second stimulus. Due to the functional
difference between tone and intonation for Chinese listeners,
intonation judgment of the second stimulus competes for more
attentional resources and leads to greater effort in memory retrieval
of intonation from the first stimulus. This process presumably
elicits greater activity in the left frontopolar region for intonation
tasks in Chinese listeners. English listeners, on the other hand,
employ a different processing strategy regardless of linguistic
function. Without prior knowledge of the Chinese language,
retrieving auditory information from working memory and making
discrimination judgments are presumed to be equally difficult
for tone and intonation, resulting in bilateral activation of
frontopolar cortex for all tasks.
Dorsolateral prefrontal cortex, including BA 46 and BA 9, is
involved in controlling attentional demands of tasks and maintaining
information in working memory (Corbetta and Shulman, 2002;
Knight et al., 1999; MacDonald et al., 2000; Mesulam, 1981). The
rightward asymmetry in the mMFG (BA 46) that is observed in all
tasks (I1, I3, T1, T3) in both language groups (Table 5) points to a
stage of processing that involves auditory attention and working
memory. Functional neuroimaging data reveal that auditory selective
attention tasks elicit increased activity in right dorsolateral
prefrontal cortex (Zatorre et al., 1999). In the music domain,
perceptual analysis and short-term maintenance of pitch information
underlying melodies recruits neural systems within the right
prefrontal and temporal cortex (Zatorre et al., 1994). In this study,
activation of the prefrontal mMFG and temporal mSTS is similarly
lateralized to the RH across tasks in both language groups. These
data are consistent with the idea that the right dorsolateral
prefrontal area (BA 46/9) plays a role in auditory attention that
modulates pitch perception in sensory representations beyond the
lateral belt of the auditory cortex, and actively retains pitch
information in auditory working memory (cf. Plante et al., 2002).
Albeit in the speech domain, this frontotemporal network in the RH
serves to maintain pitch information regardless of its linguistic
relevance. A frontotemporal network for auditory short-term memory
is further supported by epileptic patients who show significant
deficits in retention of tonal information after unilateral excisions
of the right frontal or temporal regions (Zatorre and Samson,
1991). In nonhuman primates, a processing stream for sound-object
identification has been proposed that projects anteriorly
along the lateral temporal cortex (Rauschecker and Tian, 2000),
leading to the lateral prefrontal cortex (Hackett et al., 1999;
Romanski et al., 1999a,b). A similar anterior processing stream
destined for the lateral prefrontal cortex in humans presumably
underlies a frontotemporal network, at least in the RH, for low-level
auditory processing of complex pitch information.
Table 5. Within-group hemisphere effects per task from statistical analyses on mean Dz within each spherical ROI
Columns: Group (C, E); Hemi (LH > RH, RH > LH); Task; Frontal (aMFG, mMFG, pMFG, FO); Parietal (IPS, IPL); Temporal (aSTG, mSTS, pSTG)
Note. *F(1, 9), P < 0.05; **F(1, 9), P < 0.01; +Tukey-adjusted t(9), P < 0.05; ++Tukey-adjusted t(9), P < 0.01. See also notes to Tables 2 and 4.
Intonation elicited greater activity relative to tone in the pMFG
(BA 9), bilaterally in the one-syllable condition, right sided only in
the three-syllable condition (Fig. 4c). The fact that I3 elicited
greater activity than T3 in the posterior MFG of the RH replicates
Gandour et al. (2003). One possible explanation focuses on the
prosodic units themselves. Tones are processed in the LH, intonation
predominantly in the RH. However, this account is untenable
because I1 elicits greater activation bilaterally as compared to T1.
Moreover, intonation (I1, I3) and tone (T1, T3) tasks separately
elicit no hemispheric laterality effects in the pMFG. Another
possible explanation has to do with the temporal interval. One
might argue that the difference between I3 and T3 is due to the
time interval of focused attention for the prosodic unit: I3 = three
syllables; T3 = last syllable only. On this view, shorter prosodic
frames are processed in the LH, longer frames in the RH. This
alternative account of pMFG activity is also ruled out because I1
elicits similar hemispheric laterality effects as I3. Instead, differential
pMFG activity related to direct comparisons between intonation
and tone is most likely related to task demands (cf. Plante
et al., 2002). As measured by RT and self-ratings of task difficulty,
intonation tasks are more difficult than tone for Chinese listeners
(Table 3). Equally significant is the fact that the English group
shows greater activation for tonal processing (T1, T3) than the
Chinese group in the pMFG bilaterally (Table 4). These findings
together are consistent with the idea that the pMFG coordinates
attentional resources required by the task.
Fig. 7. Random effects fMRI activation maps obtained from comparison of discrimination judgments of intonation (I3; upper panel) and tone (T3; bottom
panel) in three-syllable utterances relative to passive listening to the same stimuli (L3) for the Chinese group (zI3 vs. zL3; zT3 vs. zL3). In I3 vs. L3 and I1 vs. L1
(not shown), increased activity in frontopolar cortex (aMFG) shows a leftward asymmetry (upper panel; x = −35), whereas activation of the middle (mMFG)
region of dorsolateral prefrontal cortex shows the opposite laterality effect (upper panel; x = +35, +40, +45). In T3 vs. L3, IPS activity is predominant in the
LH (bottom panel; x = −35, −40, −45). In I3 (upper panel; x = +35, +40, +45) vs. T3 (lower panel; x = +35, +40, +45), activation of the right pMFG is
greater in the I3 than the T3 task. See also Fig. 4.
The fronto-opercular region (FO, BA 45/13) is activated
bilaterally in both language groups (Table 5). Activation levels
are similar across tasks (I1, I3, T1, T3). Recent neuroimaging
studies (Meyer et al., 2002, 2003) also show bilateral FO activation
in a prosodic speech condition in which a speech utterance is
reduced to speech melody by removal of all lexical and syntactic
information. Increased FO activity is presumed to reflect increased
effort in extracting syntactic, lexical-semantic, or slow pitch
information from degraded speech signals (Meyer et al., 2002,
2003), or in discriminating sequences of melodic pitch patterns
(Zatorre et al., 1994). Similarly, our tasks require increased
cognitive effort to extract tone and intonation from the auditory
stream to maintain this information in working memory.
There appear to be at least two distinct regions of activation in
the parietal cortex, one located more superiorly (IPS) in the
intraparietal sulcus and adjacent aspects of the superior parietal
lobule, another more inferiorly (IPL) in the anterior supramarginal
gyrus (SMG) near the parietotemporal boundary (cf. Becker et al.,
1999). Our findings show greater activation in the IPS bilaterally in
T1 for the English group compared to Chinese (Table 4). It has
been proposed that this area supports voluntary focusing and
shifting of attentional scanning across activated memory representations
(Chein et al., 2003; Corbetta and Shulman, 2002; Corbetta
et al., 2000; Cowan, 1995; Mazoyer et al., 2002). The efficacy of
selective attention depends on how external stimuli are encoded
into internal phonological representations. English listeners experienced
more difficulty in focusing and shifting of attention in T1
because lexically relevant pitch variations do not occur in English.
In contrast, the Chinese group shows left-sided activity in the
IPS for T3 (Table 5). This finding replicates our previous study of
Chinese tone and intonation (Gandour et al., 2003), reinforcing the
view that a left frontoparietal network is recruited for the processing
of lexical tones (Li et al., 2003). In T1, listeners extract tone
from isolated monosyllables. In T3, they extract tone from a fixed
position in a sequence of syllables, which causes repeated shifts in
attention from one item to another. These laterality differences
between T3 and T1 indicate that selective attention to discrete
linguistic constructs is a gradient neurophysiological phenomenon
in the context of task-specific demands.
The Chinese group, as compared to English, shows greater
activation across tasks (I1, I3, T1, T3) in the left ventral aspects of
the IPL (BA 40) near the parietotemporal boundary (Table 4).
Within the Chinese group, a relatively greater IPL activation on the
left is observed across tasks and without regard to the prosodic unit
(I, T) or temporal interval (1, 3). Perhaps it is the “categoricalness”
or phonological significance of the auditory stimuli that triggers
activation in this area (Jacquemot et al., 2003). This language-
specific effect can be understood from the conceptualization of the
IPL as part of an auditory-motor integration circuit in speech
perception (Hickok and Poeppel, 2000; Wise et al., 2001). Chinese
listeners possess articulatory-based representations of Chinese
tones and intonation. English listeners do not. Consequently, no
activation of this area is observed in the English group. Its LH
activity co-occurs with a leftward asymmetry in the pSTG across
tasks. Co-activation of the IPL reinforces the view that it is part of
an auditory–articulatory processing stream that connects posterior
temporal and inferior prefrontal regions. An alternative conceptualization
is that the phonological storage component of verbal
working memory resides in the IPL (Awh et al., 1996; Paulesu et
al., 1993). This notion predicts that both passive listening and
verbal working memory tasks should elicit activation in this region,
since auditory verbal information has obligatory access to the store
(Chein et al., 2003). I1, I3, T1, and T3 were all derived by
subtracting their corresponding passive listening control condition.
Contrary to what we observed, this notion would therefore predict
no increased activation in the IPL.
Fig. 8. A random effects fMRI activation map obtained from comparison of
discrimination judgments of intonation in one-syllable utterances (I1)
relative to passive listening to the same stimuli (L1) for the Chinese group
(zI1 vs. zL1). Left/right sagittal sections reveal increased mSTS activity in
the RH, projecting both ventrally and dorsally into the MTG and STG,
respectively. pSTG activity shows the opposite hemispheric effect, part of a
continuous swath of activation extending caudally from middle regions of
the STG/STS. Similar activation foci are also observed in T1 vs. L1, I3 vs.
L3, and T3 vs. L3. See also Fig. 4.
Fig. 9. Laterality effects for ROIs in the Chinese group only, and in both
Chinese and English groups, rendered on a three-dimensional LH template
for common reference. In the Chinese group (top panel), IPL, aSTG, and
pSTG are left-lateralized (LH > RH) across tasks; aMFG (I1, I3) and IPS
(T3) are left-lateralized for specific tasks. In both language groups (bottom
panel), mMFG and mSTS are right-lateralized (RH > LH) across tasks
(bottom right panel). Other ROIs do not show laterality effects. No ROI
elicited either a rightward asymmetry for the Chinese group only, or a
leftward asymmetry for both Chinese and English groups. See also Table 5.
The anterior superior temporal gyrus (aSTG) displays an LH
advantage in the Chinese group across tasks (Table 5). A reduced
RH, rather than increased LH aSTG activation, appears to
underlie this hemispheric asymmetry across all tasks. Since
intelligible speech is used in all tasks, phonological input alone
may be sufficient to explain the leftward asymmetry in the
Chinese group (Scott and Johnsrude, 2003; Scott et al., 2000).
It is also consistent with the notion that this region maps
acoustic–phonetic cues onto linguistic representations as part of
a larger auditory-semantic integration circuit in speech perception
(Giraud and Price, 2001; Scott and Johnsrude, 2003; Scott et al.,
2003). In contrast, English listeners do not have knowledge of
these prosodic representations. Consequently, they employ a
nonlinguistic pitch processing strategy across tasks and fail to
show any hemispheric asymmetry.
A language group effect is not found in hemispheric laterality
of the mSTS (BA 22/21). Both groups show greater RH activity in
the mSTS across tasks (Table 5). This suggests that this area is
sensitive to different acoustic features of the speech signal irrespective
of language experience. The rightward asymmetry may
reflect shared mechanisms underlying early attentional modulation
in processing of complex pitch patterns. In this study, subjects were
required to direct their attention to slow modulation of pitch
patterns (i.e., ~300–1000 ms) underlying either Chinese tone or
intonation. This interpretation is consistent with hemispheric roles
hypothesized for auditory processing of complex sounds in the
temporal lobe: RH for spectral processing, LH for temporal
processing (Poeppel, 2003; Zatorre and Belin, 2001; Zatorre et
al., 2002). Moreover, it is consistent with the view that right
auditory cortex is most important in the processing of dynamic
pitch variation (Johnsrude et al., 2000). Both groups show greater
activation in the right mSTS. We therefore infer that this activity
reflects a complex aspect of pitch processing that is independent of
language experience.
A leftward asymmetry in activation of the posterior part of the superior
temporal gyrus (pSTG; BA 22) is observed across tasks in the
Chinese group only (Table 5). It has been suggested that the left
pSTG, as part of a posterior processing stream, is involved in
prelexical processing of phonetic cues and features (Scott, 2003;
Scott and Johnsrude, 2003; Scott and Wise, 2003). English listeners,
however, show no leftward asymmetry in the pSTG (Table 5).
Moreover, they show greater activation bilaterally relative to the
Chinese group (Table 4). Therefore, auditory phonetic cues that are
of phonological significance in one’s native language may be
primarily responsible for this leftward asymmetry.
These findings collectively support functional segregation of
temporal lobe regions, and their functional integration as part of a
temporofrontal network (Davis and Johnsrude, 2003; Scott, 2003;
Scott and Johnsrude, 2003; Specht and Reul, 2003). LH networks
in the temporal lobe that are sensitive to phonologically relevant
parameters from the auditory signal are in anterior and posterior, as
opposed to central, regions of the STG/STS (Giraud and Price,
2001). The anterior region appears to be part of an auditory-
semantic processing stream, the posterior region part of an
auditory-motor processing stream. Both processing streams, in turn,
project to convergence areas in the frontal lobe.
Effects of task performance on hemispheric asymmetry
In this study, the BOLD signal magnitude depends on the
participant’s proficiency in a particular phonological task (Chee et
al., 2001). The two groups differ maximally in relative language
proficiency: Chinese group, 100%; English group, 0%. As reflected
in behavioral measures of task performance (Table 3), perceptual
judgments of Chinese tones require more cognitive effort by
English monolinguals due to their unfamiliarity with lexical tones.
Their unfamiliarity with the Chinese language results in greater
BOLD activation for T1 and T3, either bilateral or RH only (cf.
Chee et al., 2001). The effect of minimal language proficiency
applies only to lexical tone. Intonation, on the other hand, elicits
bilateral activation for both groups in the posterior MFG, frontal
operculum, and intraparietal sulcus (Table 4; Fig. 4). This common
frontoparietal activity implies that processing of intonation requires
similar cognitive effort for Chinese and English participants.
Cross-language comparisons provide unique insights into the
functional roles of different areas of this cortical network that are
recruited for processing different aspects of speech prosody (e.g.,
auditory, phonological). By using tone and intonation tasks, we are
able to distinguish hemispheric roles of areas sensitive to linguistic
levels of processing (LH) from those sensitive to lower-level
acoustical processing (RH). Rather than attribute processing of
speech prosody to RH mechanisms exclusively, our findings
suggest that lateralization is influenced by language experience
that shapes the internal prosodic representation of an external
auditory signal. This emerging model assumes a close interaction
between the two hemispheres via the corpus callosum. In sum, we
propose a more comprehensive model of speech prosody percep-
tion that is mediated primarily by RH regions for complex-sound
analysis, but is lateralized to task-dependent regions in the LH
when language processing is required.
Funding was provided by a research grant from the National
Institutes of Health R01 DC04584-04 (JG) and an NIH
postdoctoral traineeship (XL). We are grateful to J. Lowe, T.
Osborn, and J. Zimmerman for their technical assistance in the
MRI laboratory. Portions of this research were presented at the 11th
annual meeting of the Cognitive Neuroscience Society, San
Francisco, April 2004. Correspondence should be addressed to
Jack Gandour, Department of Audiology and Speech Sciences,
Purdue University, West Lafayette, IN 47907-2038, or via email:
Awh, E., Jonides, J., Smith, E.E., Schumacher, E.H., Koeppe, R.A., Katz,
S., 1996. Dissociation of storage and rehearsal in verbal working memory.
Psychol. Sci. 7 (1), 25–31.
Baum, S., Pell, M., 1999. The neural bases of prosody: insights from lesion
studies and neuroimaging. Aphasiology 13, 581–608.
Becker, J., MacAndrew, D., Fiez, J., 1999. A comment on the functional
localization of the phonological storage subsystem of working memory.
Brain Cogn. 41, 27–38.
Binder, J., Frost, J., Hammeke, T., Bellgowan, P., Springer, J., Kaufman, J.,
Possing, E., 2000. Human temporal lobe activation by speech and non-
speech sounds. Cereb. Cortex 10 (5), 512–528.
Blumstein, S., Cooper, W.E., 1974. Hemispheric processing of intonation
contours. Cortex 10, 146–158.
Bosch, V., 2000. Statistical analysis of multi-subject fMRI data: assessment
of focal activations. J. Magn. Reson. Imaging 11 (1), 61–64.
Brådvik, B., Dravins, C., Holtås, S., Rosen, I., Ryding, E., Ingvar, D., 1991.
Disturbances of speech prosody following right hemisphere infarcts.
Acta Neurol. Scand. 84 (2), 114–126.
Braver, T.S., Bongiolatti, S.R., 2002. The role of frontopolar cortex in
subgoal processing during working memory. NeuroImage 15 (3),
Buckner, R.L., Raichle, M.E., Miezin, F.M., Petersen, S.E., 1996. Functional
anatomic studies of memory retrieval for auditory words and
visual pictures. J. Neurosci. 16 (19), 6219–6235.
Burton, M., 2001. The role of the inferior frontal cortex in phonological
processing. Cogn. Sci. 25 (5), 695–709.
Chee, M.W., Hon, N., Lee, H.L., Soon, C.S., 2001. Relative language
proficiency modulates BOLD signal change when bilinguals perform
semantic judgments. NeuroImage 13 (6 Pt 1), 1155–1163.
Chein, J.M., Ravizza, S.M., Fiez, J.A., 2003. Using neuroimaging to evaluate
models of working memory and their implications for language
processing. J. Neurolinguist. 16, 315–339.
Corbetta, M., 1998. Frontoparietal cortical networks for directing attention
and the eye to visual locations: identical, independent, or overlapping
neural systems? Proc. Natl. Acad. Sci. U. S. A. 95 (3), 831–838.
Corbetta, M., Shulman, G.L., 2002. Control of goal-directed and stimulus-
driven attention in the brain. Nat. Rev., Neurosci. 3 (3), 201–215.
Corbetta, M., Kincade, J.M., Ollinger, J.M., McAvoy, M.P., Shulman, G.L.,
2000. Voluntary orienting is dissociated from target detection in human
posterior parietal cortex. Nat. Neurosci. 3 (3), 292–297.
Cowan, N., 1995. Sensory memory and its role in information processing.
Electroencephalogr. Clin. Neurophysiol., Suppl. 44, 21–31.
Cox, R.W., 1996. AFNI: software for analysis and visualization of functional
magnetic resonance neuroimages. Comput. Biomed. Res. 29 (3),
Davis, M.H., Johnsrude, I.S., 2003. Hierarchical processing in spoken
language comprehension. J. Neurosci. 23 (8), 3423–3431.
D’Esposito, M., Postle, B.R., Rypma, B., 2000. Prefrontal cortical contributions
to working memory: evidence from event-related fMRI studies.
Exp. Brain Res. 133 (1), 3–11.
Dogil, G., Ackermann, H., Grodd, W., Haider, H., Kamp, H., Mayer, J.,
Riecker, A., Wildgruber, D., 2002. The speaking brain: a tutorial introduction
to fMRI experiments in the production of speech, prosody and
syntax. J. Neurolinguist. 15, 59–90.
Eng, N., Obler, L., Harris, K., Abramson, A., 1996. Tone perception deficits
in Chinese-speaking Broca’s aphasics. Aphasiology 10, 649–656.
Friederici, A.D., Alter, K., 2004. Lateralization of auditory language functions:
a dynamic dual pathway model. Brain Lang. 89 (2), 267–276.
Gandour, J., Dardarananda, R., 1983. Identification of tonal contrasts in
Thai aphasic patients. Brain Lang. 18 (1), 98–114.
Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., Hutchins,
G.D., 2000. A crosslinguistic PET study of tone perception.
J. Cogn. Neurosci. 12 (1), 207–222.
Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N.,
Tong, Y., Li, X., 2002. A cross-linguistic FMRI study of spectral and
temporal cues underlying phonological processing. J. Cogn. Neurosci.
14 (7), 1076–1087.
Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L.,
Satthamnuwong, N., Lurito, J., 2003. Temporal integration of speech
prosody is shaped by language experience: an fMRI study. Brain Lang.
84 (3), 318–336.
George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A.,
Heilman, K.M., Herscovitch, P., Post, R.M., 1996. Understanding emotional
prosody activates right hemisphere regions. Arch. Neurol. 53 (7),
Giraud, A.L., Price, C.J., 2001. The constraints functional neuroimaging
places on classical models of auditory word processing. J. Cogn. Neurosci.
13 (6), 754–765.
Gruber, O., 2001. Effects of domain-specific interference on brain activation
associated with verbal working memory task performance. Cereb.
Cortex 11 (11), 1047–1055.
Hackett, T.A., Stepniewska, I., Kaas, J.H., 1999. Prefrontal connections
of the parabelt auditory cortex in macaque monkeys. Brain Res. 817
Hickok, G., Poeppel, D., 2000. Towards a functional neuroanatomy of
speech perception. Trends Cogn. Sci. 4 (4), 131–138.
Hickok, G., Buchsbaum, B., Humphries, C., Muftuler, T., 2003. Auditory-
motor interaction revealed by fMRI: speech, music, and working memory
in area Spt. J. Cogn. Neurosci. 15 (5), 673–682.
Howie, J.M., 1976. Acoustical Studies of Mandarin Vowels and Tones.
Cambridge University Press, New York.
Hsieh, L., Gandour, J., Wong, D., Hutchins, G.D., 2001. Functional heterogeneity
of inferior frontal gyrus is shaped by linguistic experience.
Brain Lang. 76 (3), 227–252.
Hughes, C.P., Chan, J.L., Su, M.S., 1983. Aprosodia in Chinese patients
with right cerebral hemisphere lesions. Arch. Neurol. 40 (12), 732–736.
Ide, A., Dolezal, C., Fernandez, M., Labbe, E., Mandujano, R., Montes, S.,
Segura, P., Verschae, G., Yarmuch, P., Aboitiz, F., 1999. Hemispheric
differences in variability of fissural patterns in parasylvian and cingulate
regions of human brains. J. Comp. Neurol. 410 (2), 235–242.
Ivry, R., Robertson, L., 1998. The Two Sides of Perception. MIT Press,
Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S., Dupoux, E., 2003.
Phonological grammar shapes the auditory cortex: a functional magnetic
resonance imaging study. J. Neurosci. 23 (29), 9541–9546.
Johnsrude, I.S., Penhune, V.B., Zatorre, R.J., 2000. Functional specificity
in the right human auditory cortex for perceiving pitch direction. Brain
123 (Pt 1), 155–163.
Jonides, J., Schumacher, E.H., Smith, E.E., Koeppe, R.A., Awh, E.,
Reuter-Lorenz, P.A., Marshuetz, C., Willis, C.R., 1998. The role of
parietal cortex in verbal working memory. J. Neurosci. 18 (13),
Klein, D., Zatorre, R., Milner, B., Zhao, V., 2001. A cross-linguistic PET
study of tone perception in Mandarin Chinese and English speakers.
NeuroImage 13 (4), 646–653.
Knight, R.T., Staines, W.R., Swick, D., Chao, L.L., 1999. Prefrontal cortex
regulates inhibition and excitation in distributed neural networks. Acta
Psychol. (Amst.) 101 (2–3), 159–178.
Koechlin, E., Basso, G., Pietrini, P., Panzer, S., Grafman, J., 1999. The role
of the anterior prefrontal cortex in human cognition. Nature 399 (6732),
Li, X., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Lowe, M.,
Tong, Y., 2003. Selective attention to lexical tones recruits left dorsal
frontoparietal network. NeuroReport 14 (17), 2263–2266.
MacDonald III, A.W., Cohen, J.D., Stenger, V.A., Carter, C.S., 2000. Dissociating
the role of the dorsolateral prefrontal and anterior cingulate
cortex in cognitive control. Science 288 (5472), 1835–1838.
J. Gandour et al. / NeuroImage 23 (2004) 344–357
Mazoyer, P., Wicker, B., Fonlupt, P., 2002. A neural network elicited by
parametric manipulation of the attention load. NeuroReport 13 (17),
Mesulam, M.M., 1981. A cortical network for directed attention and unilateral
neglect. Ann. Neurol. 10 (4), 309–325.
Meyer, M., Alter, K., Friederici, A.D., Lohmann, G., von Cramon, D.Y.,
2002. fMRI reveals brain regions mediating slow prosodic modulations
in spoken sentences. Hum. Brain Mapp. 17 (2), 73–88.
Meyer, M., Alter, K., Friederici, A.D., 2003. Functional MR imaging
exposes differential brain responses to syntax and prosody during auditory
sentence comprehension. J. Neurolinguist. 16, 277–300.
Moen, I., 1993. Functional lateralization of the perception of Norwegian
word tones—Evidence from a dichotic listening experiment. Brain
Lang. 44 (4), 400–413.
Newman, S.D., Just, M.A., Carpenter, P.A., 2002. The synchronization of
the human cortical working memory network. NeuroImage 15 (4),
Oldfield, R.C., 1971. The assessment and analysis of handedness: the
Edinburgh inventory. Neuropsychologia 9 (1), 97–113.
Paulesu, E., Frith, C.D., Frackowiak, R.S., 1993. The neural correlates
of the verbal component of working memory. Nature 362 (6418),
Pell, M.D., 1998. Recognition of prosody following unilateral brain lesion:
influence of functional and structural attributes of prosodic contours.
Neuropsychologia 36 (8), 701–715.
Pell, M.D., Baum, S.R., 1997. The ability to perceive and comprehend
intonation in linguistic and affective contexts by brain-damaged adults.
Brain Lang. 57 (1), 80–99.
Petrides, M., Pandya, D.N., 1984. Association fiber pathways to the frontal
cortex from the superior temporal region in the rhesus monkey.
J. Comp. Neurol. 273, 52–66.
Plante, E., Creusere, M., Sabin, C., 2002. Dissociating sentential prosody
from sentence processing: activation interacts with task demands.
NeuroImage 17 (1), 401–410.
Poeppel, D., 2003. The analysis of speech in different temporal integration
windows: cerebral lateralization as ‘asymmetric sampling in time’.
Speech Commun. 41 (1), 245–255.
Rauschecker, J.P., Tian, B., 2000. Mechanisms and streams for processing
of “what” and “where” in auditory cortex. Proc. Natl. Acad. Sci. U. S.
A. 97 (22), 11800–11806.
Romanski, L.M., Bates, J.F., Goldman-Rakic, P.S., 1999a. Auditory belt
and parabelt projections to the prefrontal cortex in the rhesus monkey.
J. Comp. Neurol. 403 (2), 141–157.
Romanski, L.M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P.S.,
Rauschecker, J.P., 1999b. Dual streams of auditory afferents target multiple
domains in the primate prefrontal cortex. Nat. Neurosci. 2 (12),
Schacter, D.L., Alpert, N.M., Savage, C.R., Rauch, S.L., Albert, M.S.,
1996. Conscious recollection and the human hippocampal formation:
evidence from positron emission tomography. Proc. Natl. Acad. Sci.
U. S. A. 93 (1), 321–325.
Schwartz, J., Tallal, P., 1980. Rate of acoustic change may underlie hemispheric
specialization for speech perception. Science 207, 1380–1381.
Scott, S., 2003. How might we conceptualize speech perception? The view
from neurobiology. J. Phon. 31, 417–422.
Scott, S.K., Johnsrude, I.S., 2003. The neuroanatomical and functional
organization of speech perception. Trends Neurosci. 26 (2), 100–107.
Scott, S.K., Wise, R., 2003. PET and fMRI studies of the neural basis of
speech perception. Speech Commun. 41, 23–34.
Scott, S.K., Blank, C.C., Rosen, S., Wise, R.J., 2000. Identification of a
pathway for intelligible speech in the left temporal lobe. Brain 123
(Pt 12), 2400–2406.
Scott, S.K., Leff, A.P., Wise, R.J., 2003. Going beyond the information
given: a neural system supporting semantic interpretation. NeuroImage
19 (3), 870–876.
Shaywitz, B.A., Shaywitz, S.E., Pugh, K.R., Fulbright, R.K., Skudlarski,
P., Mencl, W.E., Constable, R.T., Marchione, K.E., Fletcher, J.M.,
Klorman, R., et al., 2001. The functional neural architecture of
components of attention in language-processing tasks. NeuroImage
13 (4), 601–612.
Shen, X.-N., 1990. The Prosody of Mandarin Chinese. University of California
Press, Berkeley, CA.
Shipley-Brown, F., Dingwall, W.O., Berlin, C.I., Yeni-Komshian, G., Gordon-Salant,
S., 1988. Hemispheric processing of affective and linguistic
intonation contours in normal subjects. Brain Lang. 33 (1), 16–26.
Shulman, G.L., d’Avossa, G., Tansy, A.P., Corbetta, M., 2002. Two attentional
processes in the parietal lobe. Cereb. Cortex 12 (11), 1124–1131.
Smith, E.E., Jonides, J., 1999. Storage and executive processes in the
frontal lobes. Science 283, 1657–1661.
Specht, K., Reul, J., 2003. Functional segregation of the temporal lobes
into highly differentiated subsystems for auditory perception: an auditory
rapid event-related fMRI-task. NeuroImage 20 (4), 1944–1954.
Talairach, J., Tournoux, P., 1988. Co-planar Stereotaxic Atlas of the Human
Brain: 3-Dimensional Proportional System: An Approach to Cerebral
Imaging. Thieme Medical Publishers, New York.
Van Lancker, D., 1980. Cerebral lateralization of pitch cues in the linguistic
signal. Pap. Linguist. 13 (2), 201–277.
Van Lancker, D., Fromkin, V., 1973. Hemispheric specialization for pitch
and tone: evidence from Thai. J. Phon. 1, 101–109.
Wang, Y., Jongman, A., Sereno, J., 2001. Dichotic perception of Mandarin
tones by Chinese and American listeners. Brain Lang. 78, 332–348.
Weintraub, S., Mesulam, M.M., Kramer, L., 1981. Disturbances in prosody.
A right-hemisphere contribution to language. Arch. Neurol. 38 (12),
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., Grodd, W., 2002.
Dynamic brain activation during processing of emotional intonation:
influence of acoustic parameters, emotional valence, and sex. NeuroImage
15 (4), 856–869.
Wise, R.J., Scott, S.K., Blank, S.C., Mummery, C.J., Murphy, K., Warburton,
E.A., 2001. Separate neural subsystems within ‘Wernicke’s area’.
Brain 124 (Pt 1), 83–95.
Yiu, E., Fok, A., 1995. Lexical tone disruption in Cantonese aphasic
speakers. Clin. Linguist. Phon. 9, 79–92.
Yuan, J., Shih, C., Kochanski, G., 2002. Comparison of declarative and
interrogative intonation in Chinese. In: Bel, B., Marlien, I. (Eds.), Proceedings
of the First International Conference on Speech Prosody. Aix-en-Provence,
France, pp. 711–714 (April).
Zatorre, R.J., Belin, P., 2001. Spectral and temporal processing in human
auditory cortex. Cereb. Cortex 11 (10), 946–953.
Zatorre, R., Samson, S., 1991. Role of the right temporal neocortex in
retention of pitch in auditory short-term memory. Brain 114 (Pt 6),
Zatorre, R.J., Evans, A.C., Meyer, E., 1994. Neural mechanisms underlying
melodic perception and memory for pitch. J. Neurosci. 14 (4),
Zatorre, R.J., Mondor, T.A., Evans, A.C., 1999. Auditory attention to space
and frequency activates similar cerebral systems. NeuroImage 10 (5),
Zatorre, R.J., Belin, P., Penhune, V.B., 2002. Structure and function of
auditory cortex: music and speech. Trends Cogn. Sci. 6 (1), 37–46.