Behavioral/Systems/Cognitive
Functional Imaging Reveals Visual Modulation of Specific
Fields in Auditory Cortex
Christoph Kayser, Christopher I. Petkov, Mark Augath, and Nikos K. Logothetis
Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany
Merging the information from different senses is essential for successful interaction with real-life situations. Indeed, sensory integration
can reduce perceptual ambiguity, speed reactions, or change the qualitative sensory experience. It is widely held that integration occurs
at later processing stages and mostly in higher association cortices; however, recent studies suggest that sensory convergence can occur
in primary sensory cortex. A good model for early convergence proved to be the auditory cortex, which can be modulated by visual and
tactile stimulation; however, given the large number and small size of auditory fields, neither human imaging nor microelectrode
recordings have systematically identified which fields are susceptible to multisensory influences. To reconcile findings from human
imaging with anatomical knowledge from nonhuman primates, we exploited high-resolution imaging (functional magnetic resonance
imaging) of the macaque monkey to study the modulation of auditory processing by visual stimulation. Using a functional parcellation of
auditory cortex, we localized modulations to individual fields. Our results demonstrate that both primary (core) and nonprimary (belt)
auditory fields can be activated by the mere presentation of visual scenes. Audiovisual convergence was restricted to caudal fields
[prominently the core field (primary auditory cortex) and belt fields (caudomedial field, caudolateral field, and mediomedial field)] and
continued in the auditory parabelt and the superior temporal sulcus. The same fields exhibited enhancement of auditory activation by
visual stimulation and showed stronger enhancement for less effective stimuli, two characteristics of sensory integration. Together, these
findings reveal multisensory modulation of auditory processing prominently in caudal fields but also at the lowest stages of auditory
cortical processing.
Key words: fMRI; macaque monkey; multisensory integration; cross-modal; perception; auditory cortex
Introduction
Under natural conditions, we commonly perceive our environ-
ment by integrating information from most of our senses. To
accomplish this, our brain merges spatially overlapping represen-
tations from different sensory channels (Stein and Meredith,
1993). Classically, it is thought that neurons in sensory areas
respond to stimulation in one modality only, whereas neurons in
higher association areas prefer specific combinations of stimuli
(Benevento et al., 1977; Hyvarinen and Shelepin, 1979; Bruce et
al., 1981; Pandya and Yeterian, 1985). These association areas
then send multisensory signals back to lower areas and (subcor-
tical) regions involved in perception and in planning and execut-
ing actions (Stein et al., 1993). According to this hypothesis, sen-
sory integration occurs only after unisensory information has
been processed along its specific sensory hierarchy (Jones and
Powell, 1970).
Recent studies challenge this view and suggest that multisen-
sory processing already occurs in areas that are classically re-
garded as unisensory (Macaluso and Driver, 2005; Schroeder and
Foxe, 2005; Ghazanfar and Schroeder, 2006). Of our senses, the
auditory system is proving to be a particularly valuable model for
studying sensory integration. Several studies demonstrated that
visual stimulation alone activates auditory cortex (Calvert et al.,
1997; Calvert and Campbell, 2003) and that visual stimuli en-
hance auditory activations (van Atteveldt et al., 2004; Pekkola et
al., 2005; van Wassenhove et al., 2005; Lehmann et al., 2006;
Martuzzi et al., 2006). In addition, similar effects have been ob-
served for combinations of auditory and tactile stimulation (Foxe
et al., 2002; Kayser et al., 2005; Murray et al., 2005) and as a result
of multisensory attention (Jancke et al., 1999; Macaluso et al.,
2000b).
Most of these findings were obtained by using imaging meth-
ods; however, given the intricate anatomical structure of auditory
cortex, which consists of many small and proximal fields (Pan-
dya, 1995; Kaas and Hackett, 2000; Hackett et al., 2001), most
functional magnetic resonance imaging (fMRI) studies cannot
localize multisensory processes to individual fields. In particular,
human fMRI studies often lack the resolution required to disen-
tangle, or cannot functionally localize, individual auditory fields.
Other methods, which allow fine spatial localization of multisen-
sory processing, are available in animal models. Electrophysio-
logical studies revealed nonauditory modulations in several au-
ditory fields of the macaque monkey, but they were restricted by
the difficulty in sampling broad regions of cortex (Schroeder et
al., 2001; Schroeder and Foxe, 2002; Fu et al., 2003, 2004; Werner-
Reiss et al., 2003; Brosch et al., 2005; Ghazanfar et al., 2005).
Anatomical studies, however, have good resolution and spatial
coverage but cannot demonstrate whether a revealed connection
is functionally important (Pandya and Yeterian, 1985; Hackett et
al., 1998a; Rockland and Ojima, 2003). Altogether, this prohibits
our efforts to understand the human imaging results pertaining
to multisensory processing on the basis of detailed anatomical
knowledge and electrophysiological results available in other pri-
mate species.
The present study was designed to bridge this gap by exploit-
ing high-resolution fMRI of the macaque monkey and using a
recently described technique (Petkov et al., 2006) to localize nu-
merous fields in auditory cortex. Studying both the anesthetized
and alert preparation, we observed visual activations and nonlin-
ear enhancement of auditory activity specifically in caudal audi-
tory fields.
Materials and Methods
This study presents data from fMRI experiments with macaque monkeys
(Macaca mulatta). All procedures were approved by the local authorities
(Regierungspräsidium) and are in full compliance with the guidelines of
the European Community (EUVD 86/609/EEC) for the care and use of
laboratory animals.
Animal preparation. Data from anesthetized animals were obtained by
using the following protocol. After premedication with glycopyrrolate
(0.01 mg/kg, i.m.) and ketamine (15 mg/kg, i.m.), an intravenous cath-
eter was inserted into the saphenous vein. Anesthesia was induced with
fentanyl (3 μg/kg), thiopental (5 mg/kg), and succinylcholine chloride
(3 mg/kg); the animal was intubated, and anesthesia was maintained with
remifentanil (0.5–2 μg·kg⁻¹·min⁻¹). Muscle relaxation was induced
with mivacurium chloride (5 mg·kg⁻¹·h⁻¹). Physiological parameters
(heart rate, blood pressure, body temperature, blood oxygenation, and
expiratory CO₂) were monitored and kept in the desired range. Intravas-
cular volume was maintained by continuous administration of lactated
Ringer's solution (10 ml·kg⁻¹·h⁻¹) or injections of colloids (hydroxy-
ethyl starch; 10–50 ml over 1–2 min as needed). Headphones (MR Con-
ethyl starch; 10 –50 ml over 1–2 min as needed). Headphones (MR Con-
fon, Magdeburg, Germany) for sound presentation were positioned over
the ears and covered with foam (Tempur-Pedic, Lexington, KY) to atten-
uate outside sounds. The eyes were dilated with cyclopentolate, and con-
tact lenses with the appropriate dioptric power were used to focus the
visual stimulus. A super video graphics array fiber-optic system (Silent
Vision; Avotec, Stuart, FL) for the presentation of visual stimuli was
aligned with the fovea of each eye by means of a fundus camera.
Data from the behaving animal were obtained by using the following
protocol. The animal was trained to complete trials of visual fixation in
combination with minimal jaw and body movements. Training pro-
ceeded in a mock environment with operant conditioning procedures
and juice rewards. Visual stimuli were presented through a super video
graphics array fiber-optic system (Silent Vision; Avotec) that was aligned
with each eye. Eye movements were measured with a custom-made op-
tical infrared eye tracker attached to the fiber-optic system. Headphones
(MR Confon) for sound presentation were positioned over the ears and
covered with foam (Tempur-Pedic) to attenuate outside sounds.
Stimulus presentation. Visual stimuli were movies stored as audio
video interleave files and presented at 60 Hz and a resolution of 640 ×
480 pixels, covering a field of view of 30 × 23°. The movie clips were
taken from commercially available documentaries of animal wildlife and
showed different animals in their natural settings, such as grassland and
coppice. The effectiveness of the visual stimulation was verified by activ-
ity in the occipital lobe.
Auditory stimuli were stored as waveform audio format files, ampli-
fied with a Yamaha amplifier (AX-496), and delivered with MR-
compatible headphones at an average intensity of 85 dB sound
pressure level (SPL). The sound presentation was calibrated with an MR-
compatible condenser microphone [Brüel & Kjær (Stuttgart, Germany)
4188 and a 2238 Mediator sound level meter] to ensure a linear transfer
function. The headphone cups together with the foam placed around
them were measured to attenuate the scanner noise (peak intensity, 105
dB SPL) by 30 dB SPL. Importantly, with the sparse scanning sequences
used, imaging data acquisition occurs for 1.5–2 s only every 10 s, leaving
time for the presentation of sound stimuli in the absence of scanner
noise. Sound stimuli consisted of natural sounds matching the movie
clips. Synthesized noise and tone stimuli were used as localizer stimuli to
delineate the different auditory cortical fields. Single-frequency tones
and one-octave noise bursts of 50 ms duration were presented at a repe-
tition rate of 8 Hz, with an interstimulus interval of 75 ms and peak
intensities between 75 and 85 dB SPL. These sounds covered a range from
0.250 to 16 kHz in steps of one octave and were used to construct the
functional parcellation of auditory cortex used for localizing many fields
(see Fig. 2).
Combined audiovisual stimuli were obtained by presenting the movie
clip and corresponding sound simultaneously. Degraded stimuli were
created based on three natural scenes by decreasing the signal-to-noise
ratio in the auditory and visual domain (see Fig. 6 A). For the visual
stimuli, random pixel noise was introduced as follows. For each frame of
the movie, a random subset of 80% of the pixels was chosen uniformly
across the frame, and the color values of these pixels were randomly
permuted (i.e., the value of one of these pixels was assigned to another of
these pixels). For the auditory stimulus, pink noise (relative amplitude
−10 dB relative to the average intensity of the sound clips) was added to
the auditory waveform.
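For concreteness, the two degradation steps can be sketched as follows (a minimal Python/NumPy illustration; the function names, the float waveform convention, and this particular 1/f noise synthesis are our assumptions, not the original stimulus-generation code):

```python
import numpy as np

def degrade_frame(frame, fraction=0.8, rng=None):
    # Choose a random subset of pixels (80% in the experiment) and
    # randomly permute their color values among themselves.
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = frame.shape
    n = int(fraction * h * w)
    idx = rng.choice(h * w, size=n, replace=False)
    flat = frame.reshape(-1, c).copy()
    flat[idx] = flat[rng.permutation(idx)]  # reassign values within the subset
    return flat.reshape(h, w, c)

def degrade_sound(waveform, fs, rel_db=-10.0, rng=None):
    # Add pink (1/f) noise at rel_db relative to the clip's RMS intensity;
    # waveform is assumed to be a 1-D float array sampled at fs Hz.
    rng = np.random.default_rng() if rng is None else rng
    n = len(waveform)
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec[1:] /= np.sqrt(freqs[1:])          # amplitude ~ f^-1/2, i.e., 1/f power
    pink = np.fft.irfft(spec, n)
    target_rms = np.sqrt(np.mean(waveform ** 2)) * 10 ** (rel_db / 20.0)
    pink *= target_rms / np.sqrt(np.mean(pink ** 2))
    return waveform + pink
```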
Different stimulus paradigms were used for the anesthetized and be-
having animals. For the former, stimuli were presented in a pseudoran-
dom order, with each stimulus lasting 40 s. A 40 s baseline period flanked
each stimulation period (see Fig. 1B). During each stimulus or baseline
period, four data volumes were acquired (2 s acquisition time; see
below), and the scanner remained silent during the remaining time (~8 s
between successive volumes), allowing the presentation of auditory stim-
uli in the absence of scanner noise (Belin et al., 1999; Hall et al., 1999;
Maeder et al., 2001; Jancke et al., 2002; van Atteveldt et al., 2004). At least
36 repeats of each condition were obtained per experiment.
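The timing of this block paradigm can be illustrated with a toy schedule generator (the number of repeats per run shown here and the layout of the schedule are illustrative assumptions, not the original presentation software):

```python
import numpy as np

# Pseudorandom 40 s stimulus blocks flanked by 40 s baselines, with one
# volume every 10 s (sparse imaging: ~1.5-2 s of acquisition, then silence).
conditions = ["auditory", "visual", "audiovisual"]
rng = np.random.default_rng(1)
order = rng.permutation(np.repeat(conditions, 2))  # e.g., 2 repeats each

block_s, tr_s = 40.0, 10.0
schedule, t = [], 0.0
for cond in order:
    schedule.append((t, "baseline")); t += block_s
    schedule.append((t, cond));       t += block_s
schedule.append((t, "baseline"))

# Four volumes are acquired within every 40 s block (one per TR of 10 s)
acquisition_times = np.arange(0.0, t + block_s, tr_s)
```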
For the behaving animal, we relied on a trial-based paradigm (see
above, Animal preparation). Across trials, auditory, visual, and audiovi-
sual stimuli were presented in a pseudorandom order. During each ex-
periment, at least 36 repeats of each condition were obtained. A trial
began with a period of minimal body movement (4 s), after which the
animal engaged a visual fixation point. During this fixation period, a
baseline volume was acquired, followed by an 8 s stimulation interval, at
the end of which a stimulus-related volume was acquired; hence, two
volumes (one baseline and one stimulus) were acquired per trial. Data
acquisition started only after the animal successfully engaged in the fix-
ation and made only minimal body movements. Eye movements outside
the fixation window (3° radius) or any body movements aborted the trial,
and only correctly completed trials were analyzed.
Data collection. Measurements with anesthetized animals were made
on a vertical 4.7 T scanner equipped with a 40-cm-diameter bore (Bio-
spec 47/40v; Bruker Medical, Ettlingen, Germany) and a 50 mT/m ac-
tively shielded gradient coil (B-GA 26; Bruker Medical) of 26 cm inner
diameter. Measurements with the behaving animal were made on a ver-
tical 7 T scanner equipped with a 60-cm-diameter bore (Biospec 7/60v;
Bruker Medical) and an 80 mT/m actively shielded gradient coil (B-GA
38; Bruker Medical) of 38 cm inner diameter. During the experiment, the
animal’s head was positioned with a custom-made plastic head post
(Tecapeek; Ensinger, Nufringen, Germany). Signals were acquired by
placing surface coils over the auditory cortex of one hemisphere to max-
imize signal-to-noise and resolution over this area or by using whole-
head volume coils. The image slices were oriented parallel to the lateral
sulcus to capture auditory cortex within a small number of slices (see Fig.
1A). In fact, in many experiments, auditory activations were essentially
captured by two slices (see Fig. 1C).
For the anesthetized animals, functional data were acquired with a
multishot (four segments) gradient-recalled echo planar imaging se-
quence with typical parameters [echo time (TE), 20 ms; volume acquisi-
tion time (TA), 1.5 s; volume repetition time (TR), 10 s; flip angle, 30°;
spectral width, 100 kHz; on a grid of 128 × 128 voxels; 2 mm slice
thickness; 9–12 slices]. The field of view was adjusted for each animal and
was typically 6.4 × 6.4 cm for the surface coil and between 9.6 × 9.6 and
12.8 × 12.8 cm for the volume coil, resulting in voxel sizes of 0.5–2 mm³.
Anatomical images (T1-weighted) were acquired with an eight-segment,
three-dimensional, modified-driven equilibrium with Fourier transform
pulse sequence with the following parameters: TE, 4 ms; TR, 22 ms; flip
angle, 20°; spectral width, 75 kHz; 384 × 384 voxels; and with five
averages.
For the behaving animal, functional data were acquired with a multi-
shot (two segments) gradient-recalled echo planar imaging sequence
with typical parameters (TE, 9 ms; volume TA, 1.5 s; volume TR, 10 s; flip
angle, 45°; spectral width, 158 kHz; on a grid of 96 × 80 voxels; 2 mm slice
thickness; 9–12 slices). The field of view was 9.6 × 8.0 cm. Anatomical
images were acquired with a fast low-angle shot sequence with the fol-
lowing parameters: TE, 10 ms; TR, 750 ms; flip angle, 45°; spectral width,
25 kHz; 192 × 160 voxels.
All anatomical images were acquired on the same field of view as the
functional data, but they covered a larger extent in the z-direction.
Hence, despite different absolute resolutions, functional and anatomical
images were acquired in register, alleviating the problem of post hoc
alignment. Although functional and anatomical data can show relative
distortions at high fields, this was not a problem for us because we re-
stricted our analysis to a region of interest (ROI) around the lateral and
superior temporal sulcus. Within this region, optimal adjustment of pa-
rameters ensured a good register of functional and anatomical data. For
each scan, an autoshim algorithm was used to optimize the linear and
higher-order shim coils.
Data analysis. In total, we scanned six different animals. One animal
was used for the alert preparation experiments and was scanned on 7
different days (each day was a separate experiment). Five other animals
were used for the anesthetized preparation experiments (two of these
were scanned twice; the other animals were scanned once). The results
pertaining to the alert animal (see Fig. 4) were obtained by statistically
testing the seven experiments with this animal for the respective effects.
The results pertaining to the anesthetized animals (see Fig. 3) were ob-
tained by pooling all seven experiments from this preparation; hence,
two animals contributed two experiments, and the remaining animals
contributed one experiment each. This pooling of experiments seemed
reasonable because the variability within animals was similar to the vari-
ability across animals.
As described above, some experiments used a surface coil (to maxi-
mize resolution over auditory cortex), whereas others used a volume coil
(to increase coverage outside auditory cortex). As a result, we either
quantified activations only in one hemisphere or pooled data from both
hemispheres, so that each experiment contributed equally to the final
statistics (e.g., for the ROI analysis, the activations in individual condi-
tions were averaged between hemispheres).
Unimodal activations and multisensory enhancement. The data were
analyzed off-line in Matlab (Mathworks, Natick, MA). Multislice data
(volumes) were converted into time points, linear drifts were removed,
and the data were normalized to units of signal change compared with
baseline. To quantify responses, the data were averaged across repeats of
the same condition within each experiment (i.e., all repeats of the same
condition were averaged). Functional maps were computed with a ran-
domization procedure, taking into account both voxel value and spatial
cluster extent (Nichols and Holmes, 2002; Hayasaka and Nichols, 2004).
For each voxel, the activation in a given condition was computed as the
summed signal change in a spatial neighborhood of 3 × 3 voxels in the
same slice. The significance of this activation was derived from a distri-
bution of values obtained after randomizing the time series of all voxels
within the brain. Voxels reaching a significant activation (p < 0.05) in at
least one condition were termed “sensory responsive voxels,” and the
following analysis was restricted to this set of voxels.
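The following simplified single-slice sketch illustrates this randomization approach (it assumes volumes labeled as stimulus or baseline and uses a max-statistic null for multiple-comparison control; the authors' exact statistic and implementation may differ):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def activation_map(vols, is_stim, n_perm=1000, alpha=0.05, rng=None):
    # vols: (n_volumes, H, W) signal for one slice; is_stim: boolean
    # label per volume (stimulus vs baseline).
    rng = np.random.default_rng() if rng is None else rng

    def stat(labels):
        change = vols[labels].mean(axis=0) - vols[~labels].mean(axis=0)
        return uniform_filter(change, size=3) * 9.0  # summed 3x3 neighborhood

    observed = stat(is_stim)
    # Null distribution: shuffle the volume labels; keeping the maximum
    # statistic per permutation controls for multiple comparisons.
    null_max = np.array([stat(rng.permutation(is_stim)).max()
                         for _ in range(n_perm)])
    return observed > np.quantile(null_max, 1.0 - alpha)
```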
The activation strength for individual conditions was quantified as the
signal change of the blood oxygenation level-dependent (BOLD) re-
sponse (see Fig. 1 D). The signal change was averaged across the sensory
responsive voxels within each ROI (i.e., each auditory field). To compare
the activation strength across experiments, we used a normalization pro-
cedure. To account for variations in the total activation across experi-
ments, a relative response measure was obtained by dividing the signal
change in individual conditions by the sum of the signal change of all
three conditions. This relative response was then expressed in units of
percent (see Figs. 3A,4B). Such a relative response measure is advanta-
geous, because the total responsiveness (hence, the average signal
change) varies from experiment to experiment, both within and across
animals. By normalizing for this difference, the relative response measure
allows a more accurate quantification of the balance among auditory,
visual, and combined activations.
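As a worked example of this normalization with hypothetical ROI values (in % BOLD signal change from baseline):

```python
# signal change in one ROI for the three conditions (made-up values)
roi_change = {"A": 0.9, "V": 0.2, "AV": 1.3}
total = sum(roi_change.values())                       # 2.4
relative = {k: 100.0 * v / total for k, v in roi_change.items()}
# relative == {"A": 37.5, "V": ~8.3, "AV": ~54.2} percent
```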
Multisensory modulations were quantified with an ROI approach and
at the level of individual voxels. First, an ROI analysis was conducted
within individual auditory fields. The (un-normalized) signal change was
computed for each ROI and condition as above. To detect enhancement
(or suppression) of auditory activations (condition A) by a simultaneous
visual stimulus (condition AV), the difference between the respective
signal changes was quantified in units of percent: the contrast (AV −
A)/A × 100 was computed for individual fields and compared with zero
across experiments (see Figs. 3A,4B).
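A minimal sketch of this ROI contrast, with hypothetical per-experiment signal changes (the one-sample t test against zero mirrors the across-experiment comparison described above):

```python
import numpy as np
from scipy import stats

def roi_enhancement(a, av):
    # a, av: un-normalized ROI signal change, one entry per experiment
    enhancement = (av - a) / a * 100.0          # (AV - A)/A x 100, in percent
    t, p = stats.ttest_1samp(enhancement, 0.0)  # compared with zero
    return enhancement.mean(), p

a  = np.array([0.80, 1.10, 0.95, 1.30, 0.70, 1.00, 0.90])   # auditory
av = np.array([1.10, 1.45, 1.20, 1.90, 0.95, 1.30, 1.35])   # audiovisual
mean_enh, p = roi_enhancement(a, av)
```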
Second, the enhancement of individual voxels was quantified by using
a stringent criterion frequently used in imaging studies. This criterion
assumes multisensory enhancement if the activation to the multisensory
stimulus is larger than the sum of the activations for the two unisensory
stimuli (Giard and Peronnet, 1999; Calvert, 2001; Calvert et al., 2001;
Beauchamp, 2005; Kayser et al., 2005). The signal change in the audiovi-
sual condition was compared with the sum of signal changes in auditory
and visual conditions; a voxel was identified as significantly enhanced if
the contrast [AV − (A + V)] was significantly positive. Statistically, this
was implemented with the same randomization procedure as used above
[i.e., the contrast AV − (A + V) was summed across neighboring voxels
and its significance determined with a randomization procedure]. To
detect multisensory enhancement, only voxels reaching a level of p <
0.01 were considered.
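In code, the voxelwise criterion reduces to the following contrast (a sketch only; the neighborhood-summed contrast would then be passed through the same randomization test as above):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def superadditivity(a, v, av):
    # a, v, av: (H, W) signal-change maps for the three conditions
    contrast = av - (a + v)                        # AV - (A + V)
    return uniform_filter(contrast, size=3) * 9.0  # summed over 3x3 neighbors
```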
Analysis of response patterns. As a complementary analysis to the acti-
vation strength, we compared the spatial pattern of activation among
conditions (Haxby et al., 2001; Cox and Savoy, 2003; Haynes and Rees,
2006). For each experiment, the set of sensory responsive voxels was
determined on the basis of the responses to all three conditions (see
above). Then, for each condition the activation pattern was defined as the
vector containing the signal change (for this condition) of all sensory
responsive voxels within an ROI (here defined by one or several auditory
fields). Because this set of voxels is the same for each condition, all three
vectors have the same length. To compare the similarity of the activation
pattern within and across conditions, we used a split-dataset approach,
similar to that of Haxby et al. (2001): the dataset for each condition was
split into even and odd runs, and the activation pattern was computed
across each half. The similarity within a condition was computed as the
Pearson correlation coefficient of the pattern obtained from even and
odd runs, whereas the similarity across conditions was obtained by cor-
relating both patterns from both conditions with each other (and aver-
aging all four combinations). If the activation pattern differs between two
conditions, then the similarity within a condition should be higher com-
pared with the similarity across conditions; hence, comparing the differ-
ence between the two similarity values with zero is a sensitive measure for
changes in the activation pattern among conditions. In the present case,
we used this approach to test whether an additional visual stimulus alters
the auditory activation pattern in different ROIs along the rostrocaudal
axis (see Fig. 5). For each region, the similarity within the auditory con-
dition was compared with the similarity between auditory and audiovi-
sual conditions.
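A compact sketch of this split-dataset comparison (the array layout and names are our own; the rank=True option anticipates the Spearman-rank control described below):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def pattern_dissimilarity(a_runs, av_runs, rank=False):
    # a_runs, av_runs: (n_runs, n_voxels) responses of the sensory
    # responsive voxels in one ROI. Positive return values indicate that
    # the visual stimulus changed the auditory activation pattern.
    corr = spearmanr if rank else pearsonr
    a_even,  a_odd  = a_runs[0::2].mean(axis=0),  a_runs[1::2].mean(axis=0)
    av_even, av_odd = av_runs[0::2].mean(axis=0), av_runs[1::2].mean(axis=0)

    within = corr(a_even, a_odd)[0]                   # similarity within A
    across = np.mean([corr(x, y)[0]                   # A vs AV, all 4 pairs
                      for x in (a_even, a_odd)
                      for y in (av_even, av_odd)])
    return within - across
```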
A dissimilarity of activation patterns either can reflect a difference in
spatial pattern or can be the result of a simple scaling of activity for
certain conditions and voxels. We used two strategies to rule out the
possibility that simple differences in activation strength are the main
cause of the dissimilarity of auditory and audiovisual activation patterns.
First, if the most active voxels dominate the dissimilarity, then the acti-
vation patterns should become more similar when those voxels are omit-
ted. Thus, we replicated the same analysis while skipping 15% of the most
active voxels [as was done by Haxby et al. (2001)]. The set of the 15%
most active voxels was defined as follows. For each condition, those 15%
voxels with the strongest activation were identified and removed from all
conditions; then the set of “sensory responsive voxels” was recalculated
for each experiment; and, finally, the activation pattern was then defined
based on this reduced set of sensory responsive voxels. This analysis
resulted in the same rostrocaudal pattern as the previous analysis, ruling
out a dominance of the strongly active voxels (see Fig. 5, small boxes).
Second, instead of using the Pearson correlation coefficient, which is
sensitive to both the sign and magnitude of the difference between indi-
vidual pairs of numbers, to compare activation patterns, we used the
Spearman-rank correlation, which considers only the sign. The pattern
similarity analysis based on the Spearman-rank correlation also con-
firmed the results reported in Figure 5.
Functional parcellation of auditory cortex and ROIs. The primate audi-
tory cortex consists of a number of functional fields that can be distin-
guished on the basis of their anatomical and functional properties (Pan-
dya, 1995; Kosaki et al., 1997; Rauschecker, 1998; Kaas and Hackett,
2000; Hackett et al., 2001; Hackett, 2002). Three auditory fields [labeled
primary auditory cortex (A1), rostral field (R), and rostrotemporal field]
receive strong input from the ventral division of the medial geniculate
nucleus and are considered to be primary auditory cortex (see Fig. 2 A,
the core). The remaining regions receive projections from the core (and
also from the thalamus), are regarded as auditory association cortex, and
are separated into belt and parabelt fields. Many of these fields can be
distinguished on the basis of their cytoarchitecture. In addition, these
fields can be functionally distinguished on the basis of their response
selectivity to sounds of different frequency content, bandwidth, and tem-
poral envelope (Merzenich and Brugge, 1973; Rauschecker et al., 1995;
Kosaki et al., 1997; Recanzone et al., 2000; Tian and Rauschecker, 2004).
Importantly, the selectivity to sound frequency and bandwidth is topo-
graphically organized, and many fields contain ordered maps represent-
ing different sound frequencies along the rostrocaudal axis (see Fig. 2 A).
A functional parcellation of auditory cortex can be obtained by using
extensive electrophysiological mappings (Kosaki et al., 1997; Raus-
checker et al., 1997; Recanzone et al., 1999) and also by using fMRI–
BOLD measurements, as shown by recent studies (Wessinger et al., 1997,
2001; Formisano et al., 2003; Talavage et al., 2004; Petkov et al., 2006).
Different auditory fields were defined on the basis of their response
preferences to sound frequency and bandwidth. The preference for
sound frequencies changes in the rostrocaudal direction and shows mul-
tiple reversals. For example, the caudal portion of A1 prefers high fre-
quencies, and its rostral portion prefers low frequencies. Similarly, the
preference for sound bandwidth changes from core to belt: the former
prefers narrow bandwidth sounds, and the latter prefers broadband
sounds (Merzenich and Brugge, 1973; Rauschecker et al., 1995; Recan-
zone et al., 2000). These differential preferences for sound frequency and
bandwidth were used to obtain a functional parcellation of auditory
cortex; for details see Petkov et al. (2006). In short, sounds of different
frequencies (250 Hz to 16 kHz) and bandwidth (single-frequency tones
and one-octave band-pass noise) were used to establish a frequency pref-
erence map in which each voxel was assigned the frequency causing the
strongest response (see Fig. 2 B). Based on a smoothed version of this, a
frequency gradient map is computed, the reversals of which define bor-
ders between neighboring regions. Figure 2C shows the sign of this fre-
quency gradient map. In an analogous way, a bandwidth preference map
is computed, the gradient of which defines regions preferring narrow and
broadband sounds. Based on a combination of these, a full parcellation of
auditory cortex into 11 fields is obtained (3 core and 8 belt fields). Such a
functional division of auditory cortex was derived for each animal in the
present study.
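A much-simplified sketch of the gradient-reversal step is given below (it assumes responses already mapped onto a regular rostrocaudal-by-mediolateral grid; the full procedure of Petkov et al. (2006) additionally uses the bandwidth maps described above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def frequency_border_candidates(resp, freqs_khz, smooth_vox=1.5):
    # resp: (n_freqs, n_rostrocaudal, n_mediolateral) response strength
    # to each localizer frequency; freqs_khz: tested frequencies
    # (0.25-16 kHz in octave steps in this study).
    best = np.log2(np.asarray(freqs_khz))[np.argmax(resp, axis=0)]
    best = gaussian_filter(best, smooth_vox)   # smoothed best-frequency map
    grad = np.gradient(best, axis=0)           # rostrocaudal gradient
    # sign changes of the gradient mark the mirror reversals that
    # delineate borders between neighboring fields
    return np.diff(np.sign(grad), axis=0) != 0
```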
These functional ROIs were extended by adding two other regions of
the auditory cortex that surround the belt, the so-called “parabelt.” The
caudal parabelt was defined as the region on the temporal plane caudal
and lateral to the caudal belt, including the area known as the temporal-
parietal area (Leinonen et al., 1980; Preuss and Goldman-Rakic, 1991).
The rostral parabelt was defined as the region rostral to the belt and
lateral to the rostral belt, including the area known as Ts2 (Pandya and
Sanides, 1973; Hackett et al., 1998a). In addition to the parabelt, we also
analyzed a region from the middle of the upper bank of the superior
temporal sulcus, corresponding to the multisensory rostral polysen-
sory area of the superior temporal sulcus (TPOr) (Seltzer et al., 1996;
Padberg et al., 2003); this region was defined as the middle half of the
upper bank of the sulcus. The exact extent of these anatomically defined
areas cannot currently be obtained from functional activations; hence,
the areas localized from the anatomical scans by using landmarks de-
scribed by previous histological studies might not fully cover the respec-
tive area in each individual animal.
Results
Unimodal and multisensory activations in auditory cortex
Images of the BOLD response were acquired during stimulation
with naturalistic auditory only, visual only, and combined audio-
visual stimulation. The data acquisition proceeded by using a
sparse imaging sequence that allows the presentation of auditory
stimuli in the absence of scanner noise (Fig. 1B). To obtain high-
resolution images of the auditory cortex, we positioned the image
slices parallel to the lateral sulcus (Fig. 1 A), the lower bank of
which is covered by auditory cortex.
Figure 1C displays activation maps obtained from one exper-
iment with an anesthetized animal. Auditory stimulation led to
activity throughout auditory cortex. The activated voxels were
distributed in both rostrocaudal and mediolateral directions.
This broad strip of activity is in agreement with neurophysiolog-
ical findings that neurons in both primary and hierarchically
higher auditory areas respond to natural and complex sounds
(Rauschecker et al., 1995; Kosaki et al., 1997; Poremba et al., 2004;
Rauschecker and Tian, 2004). A time-course example of a region
(region 1) responding well to auditory (but not to visual) stimu-
lation is shown in Figure 1 D. Visual stimulation alone, however,
also led to activity in some parts of auditory cortex. In the exam-
ple, a few clusters of voxels occurred in auditory cortex, and these
were located mostly at its caudal end (region 2). Combining au-
ditory and visual stimulation led to robust responses throughout
auditory cortex, and in some regions (region 3) this led to an
enhancement of the response compared with the auditory-alone
condition. The localization of regions showing such enhance-
ment of activation was the goal of the present study. To deter-
mine which parts of auditory cortex were modulated by the visual
stimulus, we localized individual auditory fields for each animal,
as described next.
The primate auditory cortex consists of a number of fields that
are defined on the basis of their anatomical and functional prop-
erties (Pandya, 1995; Rauschecker, 1998; Hackett et al., 2001).
Functionally, several auditory fields can be distinguished on the
basis of a topographical pattern of response selectivity to sounds
of different frequency and bandwidth (Merzenich and Brugge,
1973; Rauschecker et al., 1995; Kosaki et al., 1997; Recanzone et
al., 2000; Tian and Rauschecker, 2004). For example, many fields
contain ordered maps representing different sound frequencies
along the rostrocaudal axis (Fig. 2 A). For the present study, we
used sounds of varying frequency and bandwidth to functionally
segregate core and belt fields, as well as anatomical constraints to
add ROIs in the parabelt (Fig. 2C,D) [for details, see Materials
and Methods and Petkov et al. (2006)]. For the example experi-
ment in Figure 1C, we indicated the parcellation of auditory cor-
tex on the left slice (Fig. 1C, white outlines). In a series of separate
experiments, we obtained a functional parcellation of auditory
cortex for each animal participating in the present study.
Audiovisual activations in individual auditory fields in core
and belt
Using an ROI analysis, with individual core and belt fields as
ROIs, we quantified the activation to auditory, visual, and com-
bined audiovisual stimulation (Fig. 3A). Across experiments with
anesthetized animals (five animals; seven
experiments), we found significant audi-
tory activation in each of the 11 fields (Fig.
3A) (p values are indicated), which dem-
onstrated the effectiveness of the natural
auditory sounds in driving both primary
and higher auditory areas. Interestingly,
activations to just visual stimulation were
significant as well: both caudal belt fields
[caudomedial (CM) and caudolateral
(CL)] exhibited weak but significant acti-
vations (7.3 ± 3.1 and 3.0 ± 1.4% of the
total response; mean and SEM across ex-
periments; t test; p < 0.05 in both fields).
This demonstrates that some auditory
fields can be activated by visual stimulation
alone and reveals regions in auditory cor-
tex with an overlap of visual and auditory
representations. Such convergence of sen-
sory information is a necessary first step
for the integration of multisensory infor-
mation (Stein and Meredith, 1993; Cal-
vert, 2001).
In addition to convergence, a second
characteristic of sensory integration is a
modulation of activity induced by one
sense in conjunction with stimulation of
another sense. Figure 3A (top right panel)
shows the activation to combined audio-
visual stimulation, which on average was
stronger compared with the auditory-only
condition (57 ± 8.7 compared with 33 ±
10.3% of the total response; mean and SD
across all fields). To quantify this system-
atically, we computed the enhancement of
audiovisual compared with auditory acti-
vations in units of percent (Fig. 3A, bot-
tom right panel). Across experiments, sig-
nificant enhancement was found in three
caudal fields: the belt fields, CL and me-
diomedial (MM), as well as the primary
field, A1 (38 ± 10, 46 ± 16, and 40 ± 15%
enhancement, respectively). Although
theoretically possible, no field showed a
significant suppression. These findings demonstrate that caudal
auditory fields are susceptible to both activation and enhance-
ment by a visual stimulus.
The analysis of enhancement at the level of ROIs might be
problematic, because the total enhancement is pooled across all
voxels within the ROI and hence might depend on the size of the
ROI (Laurienti et al., 2005). Indeed, across all ROIs, including
those with nonsignificant enhancement, the enhancement effect
was anti-correlated with the number of voxels (r = −0.47; p =
0.13). This suggests that the quantitative effect at the ROI level
might indeed depend on the size of the ROI. As a result, the
numbers of the enhancement as such have to be treated with
caution, but the consistency across experiments demonstrates
that the caudal fields indeed show an enhanced response during
combined multisensory stimulation.
To investigate multisensory enhancement at a fine spatial res-
olution, i.e., at the level of individual voxels, we tested individual
voxels for supralinear enhancement. This criterion of nonlinear
facilitation of responses is frequently used in imaging studies and
has been derived from criteria used in electrophysiological exper-
iments (Stein and Meredith, 1993; Calvert, 2001; Beauchamp,
2005; Kayser et al., 2005). It requires that the activation to the
combined audiovisual stimulus must be larger than the sum of
the activations to the two unisensory stimuli. Figure 3B shows the
enhancement map for the example experiment from Figure 1C.
Clearly, several groups of voxels clustered along the lateral and
caudal edge of auditory cortex exhibit supralinear response facil-
itation. A systematic analysis across experiments revealed that
three caudal belt fields contain a significant fraction of nonlin-
early enhanced voxels (fields CM, CL, and MM; 2.4 ± 1.1, 0.8 ±
0.3, and 4.8 ± 2.1%, respectively, of the sensory responsive voxels
within these fields). In individual experiments, like the one
shown in Figure 3B, other fields could contain voxels with signifi-
cant enhancement, but these were not consistent across experi-
ments. Together, these results demonstrate that several, but
mostly caudal, auditory fields are susceptible to both activation
and modulation by a visual stimulus and hence display charac-
teristics of sensory integration.
Figure 1. Example experiment with auditory, visual, and combined activations. A, Sagittal
image showing the alignment of image slices with auditory cortex, which lies on the lower bank
of the lateral sulcus (LS). B, Stimulus conditions (auditory only, visual only, and combined
audiovisual) were randomly presented within a scanning block. Functional data were acquired
with a sparse imaging sequence that allows auditory stimulation in the absence of the scanner
noise (notice the 8 s silent gap between successive acquisition of the imaged volumes). C,
Example data (two slices covering auditory cortex) from one session with an anesthetized
animal. Individual panels display the activation maps (p values) superimposed on anatomical
images. White outlines show the functional parcellation of auditory cortex into individual fields
(for clarity, shown only on the left slice). The parcellation at the bottom indicates the names of
individual fields (compare Fig. 2 for details). Arrows indicate locations for which the time course
is shown in D. D, Time course of the BOLD signal change of three locations from C (mean, SEM
from 36 repeats). Region 1 shows activations to auditory and audiovisual stimulation, whereas
region 2 shows visual activations. Region 3 exhibits auditory activations that are enhanced
during audiovisual stimulation (multisensory enhancement).
Audiovisual activations in the alert animal
The above experiments investigated data obtained from the anes-
thetized preparation. Experiments with anesthetized animals are
advantageous because they allow us to acquire data more quickly.
In addition, cognitive and modulatory feedback from higher as-
sociation regions to lower sensory areas is eliminated as a result of
anesthesia; in particular, attention effects are absent in the anes-
thetized preparation. Results obtained from anesthetized ani-
mals, however, might not generalize to real-life situations in a
straightforward manner. In experiments with one alert animal,
we confirmed our findings from the anesthetized preparation
and found that visual stimuli enhance auditory activations in the
caudal belt as well as in primary fields.
Figure 4A shows activation maps obtained from the alert an-
imal in one experiment. Here, auditory and audiovisual activa-
tions covered a broad region along the rostrocaudal axis, and
visual activations were strongest at the caudal end of auditory
cortex. Comparable results were obtained in a total of seven ex-
periments with this animal. Figure 4B summarizes the data for
individual fields. Both auditory and audiovisual activations were
significant in all fields (Fig. 4, p values). Significant activations to
visual stimulation occurred in several caudal fields in the belt
(fields CL, CM, and MM; 10.8 ± 3.2, 19.2 ± 4.1, and 18.6 ± 4.8%,
respectively, of the total response; mean and SEM across experi-
ments), in one rostral belt field (rostromedial; 10.3 ± 4.0%), and
in primary auditory cortex (fields A1 and R; 10.4 ± 2.7 and 6.1 ±
2.3%). This confirms the overlap of visual and auditory activa-
tion seen in the anesthetized preparation and demonstrates that
visual activations can also occur in primary auditory cortex (the
auditory core).
Enhancement of auditory activation by a simultaneous visual
stimulus was seen in several fields. By using an ROI-based anal-
ysis, significant enhancement was found in both the belt and
primary auditory cortex (Fig. 4 B, bottom right panel) (fields CL,
CM, and A1; 27.2 ± 13.8, 52.1 ± 24.3, and 205 ± 54% enhance-
ment, respectively). At the level of individual voxels, many fields
consistently exhibited voxels with significant nonlinear enhance-
ment (Fig. 4C) (with values ranging from 6.8 to 29% of all active
voxels in the belt, and from 10.7 to 12.9% in the core). These
findings unequivocally demonstrate that a visual stimulus can
significantly enhance auditory activations in the belt as well as in
primary auditory cortex.
Both the alert and anesthetized preparation exhibit similar
patterns of audiovisual enhancement. In both cases, the influence
of the visual stimulus was most pronounced in the caudal part of
auditory cortex, yet visual activations encompassed a larger num-
ber of fields in the alert animal and were quantitatively stronger in
this preparation (two-sample t test pooling fields with significant
visual activation in both preparations; p < 0.01). In addition, the
fraction of voxels with significant enhancement was larger in the
alert animal as well (p < 0.01). This suggests that multisensory
audiovisual enhancement is more pronounced in the alert animal
but does not depend on the conscious state of the animal.
Figure 2. Functional organization of auditory cortex: localizing auditory cortical fields. A,
Schematic of auditory cortex, which can be separated into core (primary auditory cortex), belt,
and parabelt regions, adapted from Hackett et al. (1998a). The preference for sound frequencies
changes in the rostrocaudal direction and shows multiple reversals. In the orthogonal direction,
the preference for sound bandwidth changes from core to belt. Abbreviations of the auditory
cortical fields are shown (see below). B–D, The differential preferences for sound frequency and
bandwidth can be used to obtain a functional parcellation of numerous auditory fields (for
details, see Materials and Methods). Borders were delineated between regions with opposing
frequency gradients at the point of mirror reversal and between regions selective to narrow or
broadband sounds. B, First, the main frequency-selective regions are approximated by using
low- and high-frequency sounds (500 Hz and 8 kHz), as shown by the activation map on the left
(p values are color-coded on an anatomical image). Then, a more detailed frequency preference
map was obtained by using multiple frequency bands (6 frequencies, equally spaced between
250 Hz and 16 kHz), as shown on the right. C, A gradient analysis (computed along the
rostrocaudal direction) results in an alternating pattern of regions with progressively increasing
or decreasing frequency selectivity and separates the regions with mirror reversed-frequency
gradients. Based on the gradient analysis, borders can be delineated from the points at which
the sign of the frequency gradient changes. The same gradient analysis is done for sound
bandwidth (tone vs noise preference maps) along the mediolateral direction (right image). D, A
functional parcellation of auditory cortex into 11 core and belt fields is obtained by combining
bandwidth and frequency maps. As additional ROIs, the caudal and rostral parabelt (CPB and
RPB) were defined; these extend rostrocaudally on the superior temporal plane and laterally on
the superior temporal gyrus. RT, Rostrotemporal; AL, anterolateral; AM, anteromedial; RM,
rostromedial; RTL, rostrotemporal-lateral; RTM, rostrotemporal-medial field.
Visual influences in the parabelt and superior
temporal sulcus
Extending the above analysis, we quantified the effect of visual
stimulation on auditory activation in the parabelt. For this anal-
ysis, we pooled alert and anesthetized preparations because they
exhibited qualitatively similar effects. Across experiments, we
found highly significant visual activation in the caudal parabelt
(19.1 ± 3.9% of the total response; p < 0.001) and weaker visual
activation in the rostral parabelt (6.8 ± 2.5% of the total re-
sponse; p < 0.05). Similarly, the fraction of voxels exhibiting
significant nonlinear enhancement was larger in the caudal com-
pared with the rostral parabelt (15.1 ± 3.6 and 10.7 ± 4.9% of all
active voxels; p < 0.001 and p < 0.05). This demonstrates that
visual activations and audiovisual enhancement affect the entire
parabelt but are more pronounced in its caudal part.
In the multisensory area TPO, which lies on the upper bank of
the superior temporal sulcus, visual activations were consider-
ably stronger compared with auditory cortex (42 ± 5.3% of the
total response; p < 10⁻⁶). The fraction of voxels showing signif-
icant nonlinear enhancement reached similar levels as in auditory
cortex (11.5 ± 3.5% of all active voxels; p < 0.01). These results
show that audiovisual activations extend outside auditory cortex
and confirm the expectation that the overlap of auditory and
visual representations is larger in regions of the brain that are
known to be multisensory, such as in the upper bank of the su-
perior temporal sulcus.
Visual influences on auditory activation patterns
Imaging studies commonly use changes of BOLD signal ampli-
tude or activity time course to localize and quantify multisensory
integration (Calvert, 2001; Martuzzi et al., 2006). These quanti-
ties are usually sampled at individual voxels or averaged across
ROIs, yet in many cases sensory integration might be a spatially
distributed process that changes large-scale activation patterns
more than the signal amplitude of individual voxels (Laurienti et
al., 2005). In this case, one could expect a change in the spatial
pattern of the BOLD response that might not be accompanied by
a significant alteration of the signal change at individual voxels.
We used a method to quantify differences in activity patterns to
assess whether a visual stimulus affects the pattern of auditory
responses in auditory cortex (Haxby et al., 2001; Cox and Savoy,
2003; Haynes and Rees, 2006).
Activation patterns of a group of voxels can be compared
across conditions. Using a split-dataset approach, we compared
the similarity of the BOLD activation pattern between auditory
only and audiovisual stimulation (Fig. 5): The (spatial) cross-
correlation of the BOLD response was computed for two patterns
of auditory activation and pairs of auditory and audiovisual acti-
vation, yielding one similarity value for auditory patterns and one
for auditory versus audiovisual patterns. The difference between
these similarity values is shown in Figure 5 (large boxes), pooled
across all experiments with anesthetized and alert animals (n =
14). This difference along the rostrocaudal axis demonstrates that
the effect of a visual stimulus is restricted to the caudal parabelt,
the caudal belt, and fields A1, MM, and mediolateral (ML) (Wil-
coxon sign-rank tests; p < 0.001 and p < 0.01, respectively). For
more rostral fields, the pattern of activation is indistinguishable
between auditory only and audiovisual stimulation.
It seems unlikely that the differences in activation patterns
between auditory and audiovisual conditions are only the result
of a difference in activation strength between these conditions.
To substantiate this quantitatively, we used two controls (see
Materials and Methods). First, we omitted those voxels with the
strongest response from the activation pattern [as has been done
previously (Haxby et al., 2001)]. This yielded the same result as
obtained with all voxels (Fig. 5, small boxes). Second, we used a
metric for comparison that is sensitive only to the sign of a dif-
ference between pairs of voxels but not the magnitude
(Spearman-rank correlation instead of Pearson correlation). The
result was again similar to that displayed in Figure 5 [median
values for the caudal parabelt, (CM, CL), and (ML, A1, MM): 0.12,
0.05, and 0.07; p < 0.001, p < 0.05, and p < 0.01, respectively;
other regions were not significantly different]. This analysis dem-
onstrates that the visual stimulus alters the auditory activation
pattern not only by scaling the signal amplitude of individual
voxels; instead, the visual stimulus introduces a spatially distinct
pattern in the auditory response. Hence, these findings extend the
results from the analysis of activation strength and demonstrate
that the visual stimulus alters activation patterns in the caudal
half of the auditory cortex, where the visual influences are
strongest.
Figure 3. Activations and multisensory enhancement. A, Schematic of core and belt fields
summarizing the activation for all experiments with anesthetized animals (n = 7). Activations
are quantified within each field as relative response (signal change for the respective condition
as a fraction of the summed signal change across all three conditions). The bottom right panel
displays the percentage enhancement of signal change from auditory to audiovisual conditions.
B, Multisensory enhancement of individual voxels. Voxels exhibiting significant nonlinear en-
hancement of responses to combined audiovisual stimulation are shown color-coded on the
same slices as used in Figure 1. The inset summarizes the fraction of sensory responsive voxels
within individual fields that exhibited significant nonlinear enhancement. In all panels, color-
coding indicates the mean across experiments, and only fields with significant activation or
enhancement are colored: t test across experiments; *p < 0.05; **p < 0.01; ***p < 0.001 or
higher.
The principle of inverse effectiveness
A prime function of integrating informa-
tion across senses is to improve our per-
ception of the environment. Hence, the
benefit of multisensory processing should
be strongest when unisensory stimuli are
least effective. For example, our percep-
tion of speech profits mostly from simul-
taneous visual cues when the auditory sig-
nal is corrupted by background noise
(Sumby and Pollack, 1954). Reflecting this,
sensory integration at the neuronal level
exhibits what is known as the principle of
inverse effectiveness (Stein and Meredith,
1993); i.e., the effect of integration is high-
est when both unisensory stimuli are min-
imally effective. In a separate set of exper-
iments (n = 6; three scans of the alert
animal and three scans of two anesthetized
animals), we used degraded audiovisual
stimuli to show that the observed en-
hancement in auditory cortex follows the
principle of inverse effectiveness (Fig. 6).
The auditory and visual stimuli were de-
graded by adding noise, and their activity
was compared with that for the response
to the original stimuli within the caudal
belt fields, which were previously shown
to be most prone to audiovisual
enhancement.
The reduced effectiveness of degraded
stimuli is demonstrated by a reduction of visual activation
(20.4 ± 2.6 vs 11.8 ± 2.3% of the total response for the original
and degraded stimulus; t test; p < 0.001) and an insignificant
change of auditory activation strength (36.0 ± 3.4 vs 37.5 ± 4.1%
of the total response; p = 0.69). Indeed, degraded stimuli led to
stronger enhancement of audiovisual activation in the ROI (57 ±
22 vs 91 ± 19% enhancement; p < 0.05) as well as an increased
fraction of voxels showing significant nonlinear enhancement
(15.1 ± 3.1 vs 28.1 ± 6.4% of the sensory responsive voxels in this
region; p < 0.05). This demonstrates that degraded audiovisual
stimuli were less effective in driving auditory cortex but caused
stronger multisensory enhancement, hence obeying the principle
of inverse effectiveness.
Discussion
Our results demonstrate that the processing of sound in auditory
cortex can be modulated by visual stimulation. This is consistent
with previous human imaging studies that used both audiovisual
speech and artificial stimuli (Giard and Peronnet, 1999; Molholm
et al., 2002; Olson et al., 2002; Calvert and Campbell, 2003; Lau-
rienti, 2004; Ojanen et al., 2005; Pekkola et al., 2005; Teder-
Salejarvi et al., 2005; van Wassenhove et al., 2005). Although
many studies could not localize the multisensory interactions to
particular auditory fields, some attributed these to primary (Cal-
vert et al., 1997; Lehmann et al., 2006; Martuzzi et al., 2006) or
caudal regions (van Atteveldt et al., 2004; Lehmann et al., 2006).
Our results support and clarify that work, and we can now
relate the human imaging work to the detailed anatomical and
electrophysiological knowledge available from the monkey. An-
atomically, many auditory fields are thought to be homologous in
humans and monkeys (Hackett et al., 2001), yet human imaging
studies often have difficulties in functionally distinguishing au-
Figure 4. Activations and multisensory enhancement in the alert animal. A, Example data (two slices covering auditory cortex) from one session with the alert animal. Individual panels display the activation maps for each condition (p values) superimposed on an anatomical image. B, Schematic of core and belt fields showing the activation strength in individual conditions for all experiments with the alert animal (n = 7). Activations are quantified as the relative response (Fig. 3). The bottom right panel displays the percentage enhancement of signal change from auditory to audiovisual conditions. C, Fraction of responsive voxels within individual fields that showed significant nonlinear enhancement. In all panels, color-coding indicates the mean across experiments, and only fields with significant activation or enhancement are colored: t test across experiments; *p < 0.05; **p < 0.01; ***p < 0.001 or higher.
Figure 5. Dissimilarity of auditory and audiovisual response patterns. The similarity of the BOLD response pattern between auditory and audiovisual conditions was quantified within individual fields. For each experiment, the data for auditory and audiovisual conditions were split in half, and the similarity (correlation) was computed within and across conditions (the top left inset shows the similarity values for the caudal parabelt for individual experiments; A/A, similarity within auditory conditions; A/AV, similarity between auditory and audiovisual conditions). This analysis was done separately for each field, and the results were averaged for fields with similar rostrocaudal positions (see color code). The main graph (large boxes) shows the difference between the similarity within auditory conditions minus the similarity of auditory to audiovisual response patterns (boxes indicate the median and 25th and 75th percentiles; lines indicate the full data range; data from all 14 experiments). The small boxes display the same result for a control in which the 15% most active voxels were omitted from each condition. Wilcoxon rank-sum test: **p < 0.01; ***p < 0.0001.
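As a rough illustration of the split-half analysis described in this caption, the Python sketch below computes the within-condition (A/A) and across-condition (A/AV) pattern correlations on synthetic data; the matrix shapes, trial counts, and response values are assumptions made only for the example.

    import numpy as np

    rng = np.random.default_rng(2)
    n_voxels, n_trials = 150, 40

    # A stable voxel pattern plus trial noise for the auditory condition, and
    # a pattern shift (not a pure scaling) for the audiovisual condition.
    pattern = rng.normal(1.0, 0.3, n_voxels)
    shift = rng.normal(0.2, 0.3, n_voxels)
    auditory = pattern[:, None] + rng.normal(0.0, 0.3, (n_voxels, n_trials))
    audiovisual = (pattern + shift)[:, None] + rng.normal(0.0, 0.3, (n_voxels, n_trials))

    def split_half_patterns(x):
        # Mean voxel pattern for each half of the trials.
        half = x.shape[1] // 2
        return x[:, :half].mean(axis=1), x[:, half:].mean(axis=1)

    a1, a2 = split_half_patterns(auditory)
    _, av2 = split_half_patterns(audiovisual)

    # Similarity within the auditory condition (A/A) and between the auditory
    # and audiovisual conditions (A/AV), as pattern correlations.
    r_within = np.corrcoef(a1, a2)[0, 1]
    r_across = np.corrcoef(a1, av2)[0, 1]
    print(f"A/A: {r_within:.2f}  A/AV: {r_across:.2f}  "
          f"difference: {r_within - r_across:.2f}")

A positive difference, as in the main graph of Figure 5, indicates that adding the visual stimulus changed the spatial response pattern rather than merely scaling it.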
One notable reason for this could be the use of group averaging and anatomical landmark-based techniques to localize activations (Desai et al., 2005), especially given the intersubject variability with regard to the exact position of functional areas within the brain (Rademacher et al., 1993; Van Essen and Drury, 1997). As a result, the human imaging literature provides good insight into multisensory integration in various behavioral paradigms but often cannot faithfully localize effects to particular fields. Here we combined high-resolution imaging with a functional approach to localize auditory fields and demonstrate that audiovisual integration is strongest in the caudal belt and parabelt but can extend into primary auditory cortex, especially in the alert animal. This suggests the interesting possibility that some of the described influences in auditory cortex could be cognitive in nature, whereas others are independent of cognitive or attentional mechanisms.
Function of early multisensory convergence
One possibility is that multisensory enhancement reflects enhanced sensory processing resulting from focused attention. Both sensory integration and attention serve to enhance perception by increasing the sensitivity to particular sensory events. Hence, it is likely that attentional and multisensory processes are mediated by similar mechanisms. Indeed, imaging studies have demonstrated that focused attention to one modality can enhance the processing and activity of colocalized stimuli in another modality (Driver and Spence, 1998; Macaluso et al., 2000a,b; Weissman et al., 2004), can suppress activity in the unattended system (Johnson and Zatorre, 2005), and can interact with multisensory enhancement (Talsma et al., 2007).
There are several reasons, however, why we believe that attention is not the only source of visual modulation of auditory cortex. In the present study, we found similar audiovisual enhancement in alert and anesthetized animals. In the alert animal, which was performing a fixation task, we cannot be sure about the balance of attention between visual and auditory stimuli. Neither can we exclude small eye movement or position effects, although the task aimed to control for these (Werner-Reiss et al., 2003; Fu et al., 2004). For the anesthetized preparation, however, anesthesia reduces activity in association areas more than activity in sensory regions and prevents cognitive and attentive mechanisms (Heinke and Schwarzbauer, 2001). Hence, the results from the anesthetized preparation cannot be explained by attentional modulation. Because the findings from both preparations were in good agreement, we conclude that the existence of audiovisual integration in auditory cortex does not depend on the animal's conscious state.
It may well be that the largely feed-forward multisensory interactions, as present in the anesthetized preparation, are further enhanced by attentive or other cognitive mechanisms. Indeed, at the quantitative level, we found a number of differences between the two preparations. The alert animal showed stronger, purely visual activations, more voxels with nonlinear enhancement, and stronger effects within primary auditory cortex. It is unclear whether these differences are caused only by the effects of anesthesia on the hemodynamic response or whether they have a different source. Previous studies, for example, identified attention, expectation, learned associations, or eye movements as possible mediators of multisensory influences in auditory cortex (Werner-Reiss et al., 2003; Fu et al., 2004; Brosch et al., 2005; Tanabe et al., 2005; Baier et al., 2006; Talsma et al., 2007).
With regard to multisensory integration within the caudal auditory cortex, an additional hypothesis can be formulated: multisensory convergence could improve the spatial localization of external events. The caudal auditory areas are supposedly involved in spatial localization, belonging to an auditory "where" processing stream (Rauschecker and Tian, 2000; Zatorre et al., 2002), and could help to bring auditory and visual information into register. The finding that multisensory enhancement occurs prominently in caudal fields supports this hypothesis (Schroeder and Foxe, 2005); however, this does not rule out additional influences of multisensory object processing (Amedi et al., 2005) or the integration of face–voice information, which has been observed in electrophysiological studies (Ghazanfar et al., 2005).
Pathways of multisensory enhancement of auditory cortex
The anatomical knowledge available for the monkey brain can suggest pathways of visual influence on early processing in auditory cortex. The visual signals could come directly from the thalamus. Several multisensory nuclei could provide such input: the suprageniculate, the limitans, and the posterior nuclei, and also the medial pulvinar.
Figure 6. Principle of inverse effectiveness: responses to degraded audiovisual stimuli. A, Examples from the original auditory and visual stimuli (left) and degraded versions of these (right). The degraded stimuli were obtained by adding noise to both the movie and the sound. B, Activation strength for auditory and visual conditions, enhancement, and fraction of voxels exhibiting significant nonlinear enhancement, separately for original and degraded stimuli (n = 6 experiments; data pooled from the caudal belt fields CM and CL). Visual activations are significantly reduced, demonstrating the reduced effectiveness of the degraded stimulus. In contrast, the activation enhancement is increased, as is the fraction of voxels exhibiting significant nonlinear enhancement, demonstrating stronger multisensory enhancement. Paired t test: *p < 0.05; ***p < 0.001.
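The degradation itself can be sketched in a few lines of Python; the additive-noise mixing rule and the 50% noise fraction below are assumptions for illustration, not the parameters used for the actual stimuli.

    import numpy as np

    rng = np.random.default_rng(3)

    def degrade(signal, noise_fraction=0.5):
        # Mix a signal with Gaussian noise scaled to the signal's own standard
        # deviation; applies equally to a sound waveform or a flattened frame.
        noise = rng.normal(0.0, signal.std(), signal.shape)
        return (1.0 - noise_fraction) * signal + noise_fraction * noise

    # Toy 440 Hz tone, one second at 48 kHz, standing in for the soundtrack.
    t = np.linspace(0.0, 1.0, 48000, endpoint=False)
    sound = np.sin(2.0 * np.pi * 440.0 * t)
    degraded_sound = degrade(sound)
    print(f"original std {sound.std():.2f}, degraded std {degraded_sound.std():.2f}")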
These nuclei have been shown to project to the auditory belt and parabelt areas (Fitzpatrick and Imig, 1978; Morel and Kaas, 1992; Pandya et al., 1994; Hackett et al., 1998b; de la Mothe et al., 2006) [also see Budinger et al. (2006) for a study in the gerbil]. All of these structures are responsive to visual stimulation. In addition, recent studies revealed that supposedly unisensory thalamic nuclei can modulate each other via the thalamic reticular nucleus (Crabtree et al., 1998; Crabtree and Isaac, 2002). It is likely that such subcortical exchange of multisensory information occurs between the visual and auditory modalities, yet this hypothesis has not been tested.
Another source of visual input to auditory cortex could be
direct projections from other early sensory areas. Projections
from auditory to visual cortex have been demonstrated in the
macaque monkey (Falchier et al., 2002; Rockland and Ojima,
2003). Although the reverse direction has not been reported in
this species, it has been shown that the primary auditory cortex of
the gerbil receives projections from secondary visual areas
(Budinger et al., 2006).
Last, feedback projections from higher association areas could mediate visual modulations. In particular, auditory cortex projects to various regions in the frontal lobe, such as the frontal eye fields and the principal sulcus, which could send reciprocal connections to auditory fields (Romanski et al., 1999a,b). The same argument also applies to areas in the intraparietal sulcus (Lewis and Van Essen, 2000a,b). Similarly, auditory cortex has strong interconnectivity with visual and multisensory areas in the superior temporal sulcus (Hackett et al., 1998a; Cappe and Barone, 2005), which project back to auditory cortex (Barnes and Pandya, 1992). This suggests many different pathways, of both the feed-forward and feedback type, that could mediate the visual modulation of auditory processing.
The neuronal basis of integration in auditory cortex
There is abundant evidence for multisensory convergence in auditory cortex from measurements of the fMRI–BOLD response. Because the BOLD signal indirectly reflects changes in neuronal activity, similar observations should be possible at the level of neuronal activity (Logothetis et al., 2001); however, because the BOLD signal arises from a population of neurons, it is not easy to predict the responses of individual neurons on the basis of imaging data. It might be that multisensory influences are mostly a subthreshold phenomenon with little effect on the firing rates of individual neurons, or with an effect on only a few of them. In this case, they might easily be missed by studies focusing on single neurons. Indeed, only a few studies have reported multisensory modulations at the level of individual neurons (Fu et al., 2003; Brosch et al., 2005); yet if multisensory influences are subthreshold processes and most prominent at the level of synaptic activity or somatic potentials, they should be detectable in local field potentials, which represent exactly this form of neuronal activity (Mitzdorf, 1985, 1987). As a matter of fact, recent studies have demonstrated audiovisual enhancement at the level of local field potentials (Ghazanfar et al., 2005) and current source densities (Schroeder and Foxe, 2002). Given the prominent link between the BOLD signal and local field potentials, this might explain why multisensory influences are often detected in imaging but not in electrophysiological studies (Logothetis et al., 2001); however, it could also be that multisensory responses are more prevalent at the single-neuron level than previously thought and only partly visible in the fMRI–BOLD responses because of a spatial pooling of different neuronal populations (Laurienti et al., 2005). Further electrophysiological studies are needed to resolve the question of whether multisensory influences on auditory cortex are a subthreshold phenomenon, existing only at the population level, or whether individual auditory neurons are able to fuse multisensory information. The present results can guide such studies by identifying the caudal auditory fields as a promising target.
References
Amedi A, von Kriegstein K, van Atteveldt NM, Beauchamp MS, Naumer MJ (2005) Functional imaging of human crossmodal identification and object recognition. Exp Brain Res 166:559–571.
Baier B, Kleinschmidt A, Muller NG (2006) Cross-modal processing in early
visual and auditory cortices depends on expected statistical relationship of
multisensory information. J Neurosci 26:12260–12265.
Barnes CL, Pandya DN (1992) Efferent cortical connections of multimodal
cortex of the superior temporal sulcus in the rhesus monkey. J Comp
Neurol 318:222–244.
Beauchamp MS (2005) Statistical criteria in FMRI studies of multisensory
integration. Neuroinformatics 3:93–114.
Belin P, Zatorre RJ, Hoge R, Evans AC, Pike B (1999) Event-related fMRI of
the auditory cortex. NeuroImage 10:417–429.
Benevento LA, Fallon J, Davis BJ, Rezak M (1977) Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Exp Neurol 57:849–872.
Brosch M, Selezneva E, Scheich H (2005) Nonauditory events of a behavioral procedure activate auditory cortex of highly trained monkeys. J Neurosci 25:6797–6806.
Bruce C, Desimone R, Gross CG (1981) Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J Neurophysiol 46:369–384.
Budinger E, Heil P, Hess A, Scheich H (2006) Multisensory processing via
early cortical stages: connections of the primary auditory cortical field
with other sensory systems. Neuroscience 143:1065–1083.
Calvert GA (2001) Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb Cortex 11:1110–1123.
Calvert GA, Campbell R (2003) Reading speech from still and moving faces:
the neural substrates of visible speech. J Cogn Neurosci 15:57–70.
Calvert GA, Bullmore ET, Brammer MJ, Campbell R, Williams SC, McGuire
PK, Woodruff PW, Iversen SD, David AS (1997) Activation of auditory
cortex during silent lipreading. Science 276:593–596.
Calvert GA, Hansen PC, Iversen SD, Brammer MJ (2001) Detection of
audio-visual integration sites in humans by application of electrophysio-
logical criteria to the BOLD effect. NeuroImage 14:427–438.
Cappe C, Barone P (2005) Heteromodal connections supporting multisen-
sory integration at low levels of cortical processing in the monkey. Eur
J Neurosci 22:2886–2902.
Cox DD, Savoy RL (2003) Functional magnetic resonance imaging (fMRI)
“brain reading”: detecting and classifying distributed patterns of fMRI
activity in human visual cortex. NeuroImage 19:261–270.
Crabtree JW, Isaac JT (2002) New intrathalamic pathways allowing modality-related and cross-modality switching in the dorsal thalamus. J Neurosci 22:8754–8761.
Crabtree JW, Collingridge GL, Isaac JT (1998) A new intrathalamic pathway
linking modality-related nuclei in the dorsal thalamus. Nat Neurosci
1:389–394.
de la Mothe LA, Blumell S, Kajikawa Y, Hackett TA (2006) Thalamic con-
nections of the auditory cortex in marmoset monkeys: core and medial
belt regions. J Comp Neurol 496:72–96.
Desai R, Liebenthal E, Possing ET, Waldron E, Binder JR (2005) Volumetric vs. surface-based alignment for localization of auditory cortex activation. NeuroImage 26:1019–1029.
Driver J, Spence C (1998) Crossmodal attention. Curr Opin Neurobiol
8:245–253.
Falchier A, Clavagnier S, Barone P, Kennedy H (2002) Anatomical evidence
of multimodal integration in primate striate cortex. J Neurosci
22:5749–5759.
Fitzpatrick KA, Imig TJ (1978) Projections of auditory cortex upon the thal-
amus and midbrain in the owl monkey. J Comp Neurol 177:555–575.
Formisano E, Kim DS, Di Salle F, van de Moortele PF, Ugurbil K, Goebel R (2003) Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron 40:859–869.
Foxe JJ, Wylie GR, Martinez A, Schroeder CE, Javitt DC, Guilfoyle D, Ritter
W, Murray MM (2002) Auditory-somatosensory multisensory process-
ing in auditory association cortex: an fMRI study. J Neurophysiol
88:540–543.
Fu KM, Johnston TA, Shah AS, Arnold L, Smiley J, Hackett TA, Garraghty PE,
Schroeder CE (2003) Auditory cortical neurons respond to somatosen-
sory stimulation. J Neurosci 23:7510–7515.
Fu KM, Shah AS, O’Connell MN, McGinnis T, Eckholdt H, Lakatos P, Smiley
J, Schroeder CE (2004) Timing and laminar profile of eye-position ef-
fects on auditory responses in primate auditory cortex. J Neurophysiol
92:3522–3531.
Ghazanfar AA, Schroeder CE (2006) Is neocortex essentially multisensory? Trends Cogn Sci 10:278–285.
Ghazanfar AA, Maier JX, Hoffman KL, Logothetis NK (2005) Multisensory
integration of dynamic faces and voices in rhesus monkey auditory cortex.
J Neurosci 25:5004–5012.
Giard MH, Peronnet F (1999) Auditory-visual integration during multimo-
dal object recognition in humans: a behavioral and electrophysiological
study. J Cogn Neurosci 11:473–490.
Hackett TA (2002) The comparative anatomy of the primate auditory cortex. In: Primate audition: behavior and neurobiology (Ghazanfar AA, ed), pp 199–226. Boca Raton, FL: CRC.
Hackett TA, Stepniewska I, Kaas JH (1998a) Subdivisions of auditory cortex and ipsilateral cortical connections of the parabelt auditory cortex in macaque monkeys. J Comp Neurol 394:475–495.
Hackett TA, Stepniewska I, Kaas JH (1998b) Thalamocortical connections
of the parabelt auditory cortex in macaque monkeys. J Comp Neurol
400:271–286.
Hackett TA, Preuss TM, Kaas JH (2001) Architectonic identification of the
core region in auditory cortex of macaques, chimpanzees, and humans.
J Comp Neurol 441:197–222.
Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliott
MR, Gurney EM, Bowtell RW (1999) “Sparse” temporal sampling in
auditory fMRI. Hum Brain Mapp 7:213–223.
Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P (2001)
Distributed and overlapping representations of faces and objects in ven-
tral temporal cortex. Science 293:2425–2430.
Hayasaka S, Nichols TE (2004) Combining voxel intensity and cluster extent with permutation test framework. NeuroImage 23:54–63.
Haynes JD, Rees G (2006) Decoding mental states from brain activity in
humans. Nat Rev Neurosci 7:523–534.
Heinke W, Schwarzbauer C (2001) Subanesthetic isoflurane affects task-
induced brain activation in a highly specific manner: a functional mag-
netic resonance imaging study. Anesthesiology 94:973–981.
Hyvarinen J, Shelepin Y (1979) Distribution of visual and somatic functions
in the parietal associative area 7 of the monkey. Brain Res 169:561–564.
Jancke L, Mirzazade S, Shah NJ (1999) Attention modulates activity in the
primary and the secondary auditory cortex: a functional magnetic reso-
nance imaging study in human subjects. Neurosci Lett 266:125–128.
Jancke L, Wustenberg T, Scheich H, Heinze HJ (2002) Phonetic perception
and the temporal cortex. NeuroImage 15:733–746.
Johnson JA, Zatorre RJ (2005) Attention to simultaneous unrelated audi-
tory and visual events: behavioral and neural correlates. Cereb Cortex
15:1609–1620.
Jones EG, Powell TP (1970) An anatomical study of converging sensory
pathways within the cerebral cortex of the monkey. Brain 93:793–820.
Kaas JH, Hackett TA (2000) Subdivisions of auditory cortex and processing
streams in primates. Proc Natl Acad Sci USA 97:11793–11799.
Kayser C, Petkov CI, Augath M, Logothetis NK (2005) Integration of touch
and sound in auditory cortex. Neuron 48:373–384.
Kosaki H, Hashikawa T, He J, Jones EG (1997) Tonotopic organization of auditory cortical fields delineated by parvalbumin immunoreactivity in macaque monkeys. J Comp Neurol 386:304–316.
Laurienti PJ (2004) Deactivations, global signal, and the default mode of
brain function. J Cogn Neurosci 16:1481–1483.
Laurienti PJ, Perrault TJ, Stanford TR, Wallace MT, Stein BE (2005) On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Exp Brain Res 166:289–297.
Lehmann C, Herdener M, Esposito F, Hubl D, di Salle F, Scheffler K, Bach DR, Federspiel A, Kretz R, Dierks T, Seifritz E (2006) Differential patterns of multisensory interactions in core and belt areas of human auditory cortex. NeuroImage 31:294–300.
Leinonen L, Hyvarinen J, Sovijarvi AR (1980) Functional properties of neu-
rons in the temporo-parietal association cortex of awake monkey. Exp
Brain Res 39:203–215.
Lewis JW, Van Essen DC (2000a) Mapping of architectonic subdivisions in the macaque monkey, with emphasis on parieto-occipital cortex. J Comp Neurol 428:79–111.
Lewis JW, Van Essen DC (2000b) Corticocortical connections of visual, sen-
sorimotor, and multimodal processing areas in the parietal lobe of the
macaque monkey. J Comp Neurol 428:112–137.
Logothetis NK, Pauls J, Augath M, Trinath T, Oeltermann A (2001) Neuro-
physiological investigation of the basis of the fMRI signal. Nature
412:150–157.
Macaluso E, Driver J (2005) Multisensory spatial interactions: a window
onto functional integration in the human brain. Trends Neurosci
28:264–271.
Macaluso E, Frith CD, Driver J (2000a) Modulation of human visual cortex by crossmodal spatial attention. Science 289:1206–1208.
Macaluso E, Frith C, Driver J (2000b) Selective spatial attention in vision
and touch: unimodal and multimodal mechanisms revealed by PET.
J Neurophysiol 83:3062–3075.
Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran JP, Pittet A, Clarke S (2001) Distinct pathways involved in sound recognition and localization: a human fMRI study. NeuroImage 14:802–816.
Martuzzi R, Murray MM, Michel CM, Thiran JP, Maeder PP, Clarke S, Meuli
RA (2007) Multisensory interactions within human primary cortices re-
vealed by BOLD dynamics. Cereb Cortex, in press.
Merzenich MM, Brugge JF (1973) Representation of the cochlear partition
of the superior temporal plane of the macaque monkey. Brain Res
50:275–296.
Mitzdorf U (1985) Current source-density method and application in cat
cerebral cortex: investigation of evoked potentials and EEG phenomena.
Physiol Rev 65:37–100.
Mitzdorf U (1987) Properties of the evoked potential generators: current
source-density analysis of visually evoked potentials in the cat cortex. Int
J Neurosci 33:33–59.
Molholm S, Ritter W, Murray MM, Javitt DC, Schroeder CE, Foxe JJ (2002)
Multisensory auditory-visual interactions during early sensory processing
in humans: a high-density electrical mapping study. Brain Res Cogn Brain
Res 14:115–128.
Morel A, Kaas JH (1992) Subdivisions and connections of auditory cortex in owl monkeys. J Comp Neurol 318:27–63.
Murray MM, Molholm S, Michel CM, Heslenfeld DJ, Ritter W, Javitt DC,
Schroeder CE, Foxe JJ (2005) Grabbing your ear: rapid auditory-
somatosensory multisensory interactions in low-level sensory cortices are
not constrained by stimulus alignment. Cereb Cortex 15:963–974.
Nichols TE, Holmes AP (2002) Nonparametric permutation tests for func-
tional neuroimaging: a primer with examples. Hum Brain Mapp 15:1–25.
Ojanen V, Mottonen R, Pekkola J, Jaaskelainen IP, Joensuu R, Autti T, Sams
M (2005) Processing of audiovisual speech in Broca’s area. NeuroImage
25:333–338.
Olson IR, Gatenby JC, Gore JC (2002) A comparison of bound and unbound audio-visual information processing in the human cerebral cortex. Brain Res Cogn Brain Res 14:129–138.
Padberg J, Seltzer B, Cusick CG (2003) Architectonics and cortical connections of the upper bank of the superior temporal sulcus in the rhesus monkey: an analysis in the tangential plane. J Comp Neurol 467:418–434.
Pandya DN (1995) Anatomy of the auditory cortex. Rev Neurol (Paris) 151:486–494.
Pandya DN, Sanides F (1973) Architectonic parcellation of the temporal
operculum in rhesus monkey and its projection pattern. Z Anat Entwick-
lungsgesch 139:127–161.
Pandya DN, Yeterian EH (1985) Architecture and connections of cortical association areas. In: Cerebral cortex: association and auditory cortices (Peters A, Jones EG, eds), pp 3–61. New York: Plenum.
Pandya DN, Rosene DL, Doolittle AM (1994) Corticothalamic connections of auditory-related areas of the temporal lobe in the rhesus monkey. J Comp Neurol 345:447–471.
Pekkola J, Ojanen V, Autti T, Jaaskelainen IP, Mottonen R, Tarkiainen A,
Sams M (2005) Primary auditory cortex activation by visual speech: an
fMRI study at 3 T. NeuroReport 16:125–128.
Petkov CI, Kayser C, Augath M, Logothetis NK (2006) Functional imaging
reveals numerous fields in the monkey auditory cortex. PLoS Biol 4:e215.
Poremba A, Malloy M, Saunders RC, Carson RE, Herscovitch P, Mishkin M
(2004) Species-specific calls evoke asymmetric activity in the monkey's temporal poles. Nature 427:448–451.
Preuss TM, Goldman-Rakic PS (1991) Architectonics of the parietal and
temporal association cortex in the strepsirhine primate Galago compared
to the anthropoid primate Macaca. J Comp Neurol 310:475–506.
Rademacher J, Caviness Jr VS, Steinmetz H, Galaburda AM (1993) Topo-
graphical variation of the human primary cortices: implications for neu-
roimaging, brain mapping, and neurobiology. Cereb Cortex 3:313–329.
Rauschecker JP (1998) Parallel processing in the auditory cortex of primates. Audiol Neurootol 3:86–103.
Rauschecker JP, Tian B (2000) Mechanisms and streams for processing of
what and where in auditory cortex. Proc Natl Acad Sci USA
97:11800–11806.
Rauschecker JP, Tian B (2004) Processing of band-passed noise in the lateral
auditory belt cortex of the rhesus monkey. J Neurophysiol 91:2578–2589.
Rauschecker JP, Tian B, Hauser M (1995) Processing of complex sounds in
the macaque nonprimary auditory cortex. Science 268:111–114.
Rauschecker JP, Tian B, Pons T, Mishkin M (1997) Serial and parallel pro-
cessing in rhesus monkey auditory cortex. J Comp Neurol 382:89–103.
Recanzone GH, Schreiner CE, Sutter ML, Beitel RE, Merzenich MM (1999) Functional organization of spectral receptive fields in the primary auditory cortex of the owl monkey. J Comp Neurol 415:460–481.
Recanzone GH, Guard DC, Phan ML (2000) Frequency and intensity re-
sponse properties of single neurons in the auditory cortex of the behaving
macaque monkey. J Neurophysiol 83:2315–2331.
Rockland KS, Ojima H (2003) Multisensory convergence in calcarine visual areas in macaque monkey. Int J Psychophysiol 50:19–26.
Romanski LM, Bates JF, Goldman-Rakic PS (1999a) Auditory belt and
parabelt projections to the prefrontal cortex in the rhesus monkey.
J Comp Neurol 403:141–157.
Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, Rauschecker
JP (1999b) Dual streams of auditory afferents target multiple domains
in the primate prefrontal cortex. Nat Neurosci 2:1131–1136.
Schroeder CE, Foxe JJ (2002) The timing and laminar profile of converging
inputs to multisensory areas of the macaque neocortex. Brain Res Cogn
Brain Res 14:187–198.
Schroeder CE, Foxe JJ (2005) Multisensory contributions to low-level, "unisensory" processing. Curr Opin Neurobiol 15:454–458.
Schroeder CE, Lindsley RW, Specht C, Marcovici A, Smiley JF, Javitt DC
(2001) Somatosensory input to auditory association cortex in the ma-
caque monkey. J Neurophysiol 85:1322–1327.
Seltzer B, Cola MG, Gutierrez C, Massee M, Weldon C, Cusick CG (1996)
Overlapping and nonoverlapping cortical projections to cortex of the
superior temporal sulcus in the rhesus monkey: double anterograde tracer
studies. J Comp Neurol 370:173–190.
Stein BE, Meredith MA (1993) Merging of the senses. Cambridge, MA:
MIT.
Stein BE, Meredith MA, Wallace MT (1993) The visually responsive neuron
and beyond: multisensory integration in cat and monkey. Prog Brain Res
95:79–90.
Sumby WH, Pollack I (1954) Visual contribution to speech intelligibility in noise. J Acoust Soc Am 26:212–215.
Talavage TM, Sereno MI, Melcher JR, Ledden PJ, Rosen BR, Dale AM (2004)
Tonotopic organization in human auditory cortex revealed by progres-
sions of frequency sensitivity. J Neurophysiol 91:1282–1296.
Talsma D, Doty TJ, Woldorff MG (2007) Selective attention and audiovi-
sual integration: is attending to both modalities a prerequisite for early
integration? Cereb Cortex, in press.
Tanabe HC, Honda M, Sadato N (2005) Functionally segregated neural substrates for arbitrary audiovisual paired-association learning. J Neurosci 25:6409–6418.
Teder-Salejarvi WA, Di Russo F, McDonald JJ, Hillyard SA (2005) Effects of spatial congruity on audio-visual multimodal integration. J Cogn Neurosci 17:1396–1409.
Tian B, Rauschecker JP (2004) Processing of frequency-modulated sounds
in the lateral auditory belt cortex of the rhesus monkey. J Neurophysiol
92:2993–3013.
van Atteveldt N, Formisano E, Goebel R, Blomert L (2004) Integration of
letters and speech sounds in the human brain. Neuron 43:271–282.
Van Essen DC, Drury HA (1997) Structural and functional analyses of hu-
man cerebral cortex using a surface-based atlas. J Neurosci 17:7079–7102.
van Wassenhove V, Grant KW, Poeppel D (2005) Visual speech speeds up
the neural processing of auditory speech. Proc Natl Acad Sci USA
102:1181–1186.
Weissman DH, Warner LM, Woldorff MG (2004) The neural mechanisms
for minimizing cross-modal distraction. J Neurosci 24:10941–10949.
Werner-Reiss U, Kelly KA, Trause AS, Underhill AM, Groh JM (2003) Eye
position affects activity in primary auditory cortex of primates. Curr Biol
13:554–562.
Wessinger CM, Buonocore MH, Kussmaul CL, Mangun GR (1997) Tonotopy in human auditory cortex examined with functional magnetic resonance imaging. NeuroImage 5:18–25.
Wessinger CM, VanMeter J, Tian B, Van Lare J, Pekar J, Rauschecker JP
(2001) Hierarchical organization of the human auditory cortex revealed
by functional magnetic resonance imaging. J Cogn Neurosci 13:1–7.
Zatorre RJ, Bouffard M, Ahad P, Belin P (2002) Where is “where” in the
human auditory cortex? Nat Neurosci 5:905–909.