Auditory–Motor Interaction Revealed by fMRI:
Speech, Music, and Working Memory in Area Spt
Gregory Hickok, Bradley Buchsbaum, Colin Humphries,
and Tugan Muftuler
The concept of auditory–motor interaction pervades
speech science research, yet the cortical systems supporting
this interface have not been elucidated. Drawing on experimental
designs used in recent work in sensory–motor
integration in the cortical visual system, we used fMRI in an
effort to identify human auditory regions with both sensory and
motor response properties, analogous to single-unit responses
in known visuomotor integration areas. The sensory phase of
the task involved listening to speech (nonsense sentences) or
music (novel piano melodies); the “motor” phase of the task
involved covert rehearsal/humming of the auditory stimuli. A
small set of areas in the superior temporal and temporal–
parietal cortex responded both during the listening phase and
the rehearsal/humming phase. A left lateralized region in the
posterior Sylvian fissure at the parietal–temporal boundary,
area Spt, showed particularly robust responses to both phases
of the task. Frontal areas also showed combined auditory +
rehearsal responsivity consistent with the claim that the
posterior activations are part of a larger auditory–motor
integration circuit. We hypothesize that this circuit plays an
important role in speech development as part of the network
that enables acoustic–phonetic input to guide the acquisition
of language-specific articulatory-phonetic gestures; this circuit
may play a role in analogous musical abilities. In the adult, this
system continues to support aspects of speech production,
and, we suggest, supports verbal working memory.
Researchers in several fields have postulated a link
between auditory and motor representations of speech:
Neurological tradition dictates that there is a connection
between the left posterior auditory fields (Wernicke’s
area) and the left frontal articulatory systems (Broca’s
area), which support aspects of speech production
including the ability to repeat heard speech (Benson
et al., 1973). In speech perception research, the idea of
articulatory-based representations supporting speech
perception (the Motor Theory; Liberman & Mattingly,
1985) has a long history and has garnered recent
attention with the discovery of mirror neurons, “motor” cells
that also respond to the perception of action (Rizzolatti &
Arbib, 1998). In the area of verbal working
memory, scientists have argued that articulatory circuits
can be used to refresh the contents of a sensory storage
system (Wilson, 2001; Baddeley, 1992). In addition,
workers in speech development have argued that young
children must compare the speech sounds they hear
in their environment with their own speech output
attempts as a means of tuning their articulatory system
(Doupe & Kuhl, 1999). Despite the widespread agreement
that some form of auditory–motor interface system
must exist, little progress has been made in
mapping the neural basis of this network. In fact, Doupe
and Kuhl (1999), in a discussion of this point in the
context of both speech and birdsong, have stated,
“Despite its clear importance, the link between perception
and production is surprisingly ill understood in
both speech and song systems ...” (p. 606).
At the same time, work in the visual domain has been
quite successful in identifying visuomotor interface sys-
tems in the dorsal (parietal) processing stream. Several
parietal regions in monkey have been identified which
appear to be optimized for interfacing visual input with
various motor effector systems (Andersen, 1997; Rizzolatti,
Fogassi, & Gallese, 1997). For example, area AIP
contains a class of neurons which have visuomotor
response properties: They respond both to the visual
presentation of a 3-D object, and during grasping of that
object (even when grasping is carried out in the dark)
(Murata, Gallese, Kaseda, & Sakata, 1996). AIP is recip-
rocally connected to frontal area F5 that also contains
neurons responsive during grasping or manipulation
with the hand, and deactivation of either AIP or F5
produces grasping deficits (Gallese, Fadiga, Fogassi,
Luppino, & Murata, 1997). AIP, then, appears to be part
of a visuomotor integration circuit that relates percep-
tual codes for object shape/orientation to motor codes
for grasping/manipulation with the hand.
University of California, Irvine
© 2003 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience 15:5, pp. 673–682
Guided by recent success in mapping visuomotor
integration systems, we have hypothesized that
an auditory–motor interface system is located in the
inferior parietal lobe in humans, as part of the auditory
dorsal stream (Hickok & Poeppel, 2000). Consistent
with this proposal is the observation that the inferior
parietal lobe is activated during verbal working memory
tasks (Jonides et al., 1998), tasks that can be viewed as
involving a kind of auditory–motor integration (Wilson,
2001). Although most authors attribute the parietal lobe
activation to the operations of a verbal “storage” component
(the “phonological store”), we have suggested
instead that it reflects an auditory–motor interface
system which translates between auditory representations
of speech in the superior temporal lobe and motor
representations of speech in the frontal lobe (Hickok &
Poeppel, 2000). This interpretation predicts (i) that
inferior parietal areas (auditory–motor integration) as
well as portions of the superior temporal lobe (sensory
coding) should be active during verbal working memory
tasks, and (ii) that these sites should show both audi-
tory and motor response properties (analogous to
visuomotor response properties of many AIP neurons,
for example). We have reported a preliminary fMRI
study which supports these predictions (Buchsbaum,
Hickok, & Humphries, 2001). In that study, subjects
listened to sets of three multisyllabic nonsense words
(the sensory phase), and then silently rehearsed them
(the motor phase) for several seconds. Regions that
responded to both phases of the task were found in two
posterior sites in every subject: a lateral site in the
posterior superior temporal sulcus (STS), and a more
dorsal site in the left posterior Sylvian fissure at the
parietal–temporal boundary (area Spt).
Another recent report has identified the left posterior
dorsal STG (our Spt) as a site active during the motor act
of speech (Wise et al., 2001). In this PET study, this
region was active when subjects repeated a phrase out
loud and mouthed the phrase silently, but not when
subjects were asked to “think” of the phrase repeatedly.
If subjects were silently rehearsing in the “think” condition,
we would have expected Spt to be active, yet it
was not. Thus, the evidence to date has not provided
unequivocal support for the hypothesis that Spt is a
region with auditory–motor response properties analo-
gous to single-unit responses in the dorsal visual stream.
The present fMRI experiment had two goals: first, to
determine whether or not Spt demonstrates auditory–
motor response properties, and second, to determine
the stimulus specificity of the response in area Spt by
contrasting speech stimuli with melodic tonal stimuli. If
this temporal–parietal site functions as a phonological
store, as has been proposed, it should be less responsive
in tasks that involve nonphonemic stimuli. Alternatively,
if this region subserves auditory–motor integration
more generally, as we have proposed, it may be equally
responsive in phonemic and nonphonemic tasks.
Five subjects listened to, and then covertly rehearsed,
either nonsense (“jabberwocky”) sentences or melodic
tonal sequences (subjects covertly hummed the tonal
sequences) while the hemodynamic response was monitored
using fMRI. Using multiple regression analysis,
three classes of responses were identified: “auditory,”
in which the MR signal increased in response to acoustic
stimulation but not during the rehearsal phase;
“rehearsal,” in which the signal increased during the
rehearsal phase but not during auditory stimulation;
and “auditory + rehearsal,” in which the signal increased
during both the auditory and rehearsal phases of the
task. Again, articulatory rehearsal was carried out subvocally,
so activation during the rehearsal phase cannot
be a result of hearing one’s own voice. A “listen-only”
condition was also included (carried out in separate
runs) in which participants simply listened to the same
set of materials without rehearsing. Comparing activation
between the listen-only condition and the rehearse
condition provided another means to identify regions
with auditory + rehearsal responses.
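The three response classes described above amount to a decision rule applied to per-voxel regression statistics. The following is a minimal sketch of that rule, not the analysis pipeline used in the study: it assumes that a regression has already produced one t-statistic per voxel for an auditory regressor and one for a rehearsal regressor. The function name, the threshold value, and the example numbers are all hypothetical.

```python
from typing import Optional


def classify_response(t_auditory: float, t_rehearsal: float,
                      threshold: float = 3.0) -> Optional[str]:
    """Label a voxel by the task phases in which its signal increased.

    `threshold` is an illustrative cutoff, not a value taken from the study.
    Only positive t-values count, because each class is defined by a signal
    increase.
    """
    auditory = t_auditory > threshold
    rehearsal = t_rehearsal > threshold
    if auditory and rehearsal:
        return "auditory + rehearsal"
    if auditory:
        return "auditory"
    if rehearsal:
        return "rehearsal"
    return None  # no reliable increase in either phase


# Made-up t-statistics (auditory, rehearsal) for four hypothetical voxels.
voxels = {"v1": (5.2, 0.4), "v2": (0.9, 4.1), "v3": (4.8, 6.3), "v4": (1.0, 0.2)}
labels = {name: classify_response(ta, tr) for name, (ta, tr) in voxels.items()}
```

Under this rule a voxel is labeled "auditory + rehearsal" only when both statistics exceed the cutoff, which is the conjunction the text treats as the signature of auditory–motor response properties.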
Auditory responses were found bilaterally in the supe-
rior temporal lobe, but also in small foci of activation in
the right frontal cortex (lateral premotor) for both
stimulus classes. Rehearsal responses were found pre-
dominantly in the posterior frontal lobes (inferior and
middle frontal gyri) and anterior insula, although there
were foci of rehearsal activity both in the parietal (e.g.,
supramarginal gyrus) and in the superior temporal lobes
(posterior STS) for both music and speech.
Auditory + rehearsal responses, our primary focus,
were found in the left Spt in every subject for both the
speech and music conditions. In the group analysis,
Spt activations were more extensive in the left hemisphere
for speech, as well as for music, and were
centered at Talairach coordinates x = −51, y = −46,
z = 16, and x = −54, y = −39, z = 20, respectively
(see Figure 1). The group analysis also identified
auditory + rehearsal responses more ventrally within
the STS. Two STS foci were noted for speech, only in
the left hemisphere, one in the posterior sector of
STS (x = −45, y = −54, z = 4), the other in the
middle sector (x = −59, y = −30, z = 0). Two STS
activation foci were also found in the left hemisphere
for music, in similar posterior and middle locations (x = −45,
y = −55, z = 4 and x = −59, y = −32, z = 4, respectively),
and an additional focus for music was observed in the
right hemisphere (x = 53, y = −47, z = 4), but see
below for further discussion of hemisphere differences
in the STS. Auditory and rehearsal foci were also noted
in the posterior frontal cortex including the lateral
premotor and inferior frontal gyrus, consistent with
our previous report (Buchsbaum et al., 2001) (see
Figure 1). Because the focus of the present study is
inactivation experiments. In P. Thier & H.-O. Karnath (Eds.),
Parietal lobe contributions to orientation in 3D space
(pp. 255–270). Heidelberg: Springer-Verlag.
Geschwind, N. (1965). Disconnexion syndromes in animals and
man. Brain, 88, 237–294, 585–644.
Goodglass, H. (1992). Diagnosis of conduction aphasia. In S. E.
Kohn (Ed.), Conduction aphasia (pp. 39–49). Hillsdale,
Gordon, W. P. (1983). Memory disorders in aphasia: I. Auditory
immediate recall. Neuropsychologia, 21, 325–339.
Guenther, F. H., Hampson, M., & Johnson, D. (1998).
A theoretical investigation of reference frames for the
planning of speech movements. Psychological Review,
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs
through your head: A PET investigation of auditory imagery
for familiar melodies. Cerebral Cortex, 9, 697–704.
Hickok, G. (2000). Speech perception, conduction aphasia, and
the functional neuroanatomy of language. In Y. Grodzinsky,
L. Shapiro, & D. Swinney (Eds.), Language and the brain
(pp. 87–104). San Diego: Academic Press.
Hickok, G. (2001). Functional anatomy of speech perception
and speech production: Psycholinguistic implications.
Journal of Psycholinguistic Research, 30, 225–234.
Hickok, G., Erhard, P., Kassubek, J., Helms-Tillery, A. K.,
Naeve-Velguth, S., Strupp, J. P., Strick, P. L., & Ugurbil, K.
(2000). A functional magnetic resonance imaging study of
the role of left posterior superior temporal gyrus in speech
production: Implications for the explanation of conduction
aphasia. Neuroscience Letters, 287, 156–160.
Hickok, G., & Poeppel, D. (2000). Towards a functional
neuroanatomy of speech perception. Trends in Cognitive
Sciences, 4, 131–138.
Indefrey, P., & Levelt, W. J. M. (2000). The neural correlates
of language production. In M. S. Gazzaniga (Ed.), The new
cognitive neurosciences (pp. 845–865). Cambridge:
Jones, D. M., & Macken, W. J. (1996). Irrelevant tones produce
an irrelevant speech effect: Implications for phonological
coding in working memory. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 19,
Jonides, J., Schumacher, E. H., Smith, E. E., Koeppe, R. A.,
Awh, E., Reuter-Lorenz, P. A., Marshuetz, C., & Willis, C. R.
(1998). The role of parietal cortex in verbal working
memory. Journal of Neuroscience, 18, 5026–5034.
Levelt, W. J. M., Praamstra, P., Meyer, A. S., Helenius, P., &
Salmelin, R. (1998). An MEG study of picture naming.
Journal of Cognitive Neuroscience, 10, 553–567.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of
speech perception revised. Cognition, 21, 1–36.
Murata, A., Gallese, V., Kaseda, M., & Sakata, H. (1996). Parietal
neurons related to memory-guided hand manipulation.
Journal of Neurophysiology, 75, 2180–2186.
Ollinger, J. M., Shulman, G. L., & Corbetta, M. (2001).
Separating processes within a trial in event-related functional
MRI. Neuroimage, 13, 210–217.
Pinheiro, J. C. (2000). Mixed-effects models in S and S-PLUS. In
J. C. Pinheiro & D. M. Bates (Eds.), Statistics and computing.
New York: Springer.
Rizzolatti, G., & Arbib, M. (1998). Language within our grasp.
Trends in Neurosciences, 21, 188–194.
Rizzolatti, G., Fogassi, L., & Gallese, V. (1997). Parietal cortex:
From sight to action. Current Opinion in Neurobiology, 7,
Salamé, P., & Baddeley, A. (1982). Disruption of short-term
memory by unattended speech: Implications for the
structure of working memory. Journal of Verbal Learning
and Verbal Behavior, 21, 150–164.
Smith, E. E., & Jonides, J. (1997). Working memory: A view
from neuroimaging. Cognitive Psychology, 33, 5–42.
Strub, R. L., & Gardner, H. (1974). The repetition defect in
conduction aphasia: Mnestic or linguistic? Brain and
Language, 1, 241–255.
Wilson, M. (2001). The case for sensorimotor coding in working
memory. Psychonomic Bulletin and Review, 8, 44–57.
Wise, R. J. S., Scott, S. K., Blank, S. C., Mummery, C. J.,
Murphy, K., & Warburton, E. A. (2001). Separate neural
sub-systems within “Wernicke’s area”. Brain, 124, 83–95.
Woods, R. P., Grafton, S. T., Holmes, C. J., Cherry, S. R., &
Mazziotta, J. C. (1998). Automated image registration:
I. General methods and intrasubject, intramodality
validation. Journal of Computer Assisted Tomography, 22,