Multisensory Integration Affects Visuo-Spatial Working Memory

Fabiano Botta
Sapienza University of Rome and University of Granada

Valerio Santangelo
University of Perugia and Santa Lucia Foundation, Rome, Italy

Antonino Raffone
Sapienza University of Rome and Brain Science Institute RIKEN, Tokyo, Japan

Daniel Sanabria and Juan Lupiáñez
University of Granada

Marta Olivetti Belardinelli
Sapienza University of Rome and ECONA, Interuniversity Center for Research in Cognitive Processing in Natural and Artificial Systems, Rome, Italy
the access of information into visuo-spatial working memory (VSWM). In a series of four experiments, we
compared the effectiveness of spatially-nonpredictive visual, auditory, or audiovisual cues in capturing
participants’ spatial attention towards a location where to-be-remembered visual stimuli were or were not
presented (cued/uncued trials, respectively). The results suggest that the effect of peripheral visual cues in
biasing the access of information into VSWM depends on the size of the attentional focus, whereas auditory cues
had no direct effect in biasing VSWM. Finally, spatially congruent multisensory cues showed an
enlarged attentional effect in VSWM as compared to unimodal visual cues, as a likely consequence of
multisensory integration. This latter result sheds new light on the interplay between spatial attention and
VSWM, pointing to the special role exerted by multisensory (audiovisual) cues.
Keywords: exogenous orienting, multisensory integration, crossmodal attention, attentional capture,
visuo-spatial working memory
Spatial attention plays a key role in selecting information from
the overwhelming amount of stimuli that continuously reach our
senses (Posner, 1980). More than a century ago, William James
(1890) proposed a distinction between two forms of attention,
active and passive. This distinction is usually rephrased nowadays
in terms of endogenous and exogenous spatial attention. While the
former is related to voluntary and goal-directed behavior, with
control exerted in a top-down manner, the latter is related to the
physical properties of objects and to automatic (bottom-up) pro-
cessing (Posner & Snyder, 1975; Jonides, 1981; though see San-
tangelo & Spence, 2008).
To date, researchers have mainly investigated the effect of the
orienting of spatial attention following either symbolic (endoge-
nous) or peripheral (exogenous) cues in the perceptual domain.
The main finding resulting from a number of studies is that stimuli
presented at attended locations are processed more rapidly and
more accurately (though see Prinzmetal, McCool & Park, 2005;
Prinzmetal, Park & Garrett, 2005, on this point) than those pre-
sented at unattended locations (e.g., Posner, 1980; Prinzmetal,
Presti & Posner, 1986; see Driver, 2001, for a review). However,
the selection of information exerted by spatial attention has im-
portant implications for other cognitive processes, such as the
encoding, maintenance, and retrieval of the selected information,
which is relevant for the observer’s current goals. Even though
many theoretical models have assumed a special role of attention
in active memory representation (Bundesen, 1990; Awh &
Jonides, 2001), these cognitive functions have been studied mostly
separately. This has left the interplay between attentional orienting
This article was published Online First May 9, 2011.

Fabiano Botta, Department of Psychology, Sapienza University of Rome, and Department of Experimental Psychology, University of Granada; Valerio Santangelo, Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy, and Department of Human and Educational Sciences, University of Perugia, Italy; Antonino Raffone, Department of Psychology, Sapienza University of Rome, and Laboratory for Perceptual Dynamics, Brain Science Institute RIKEN, Tokyo, Japan; Daniel Sanabria, Department of Experimental Psychology, University of Granada; Juan Lupiáñez, Department of Experimental Psychology, University of Granada; Marta Olivetti Belardinelli, Department of Psychology, Sapienza University of Rome, Laboratory for Perceptual Dynamics, Brain Science Institute RIKEN, Tokyo, Japan, and ECONA, Interuniversity Center for Research in Cognitive Processing in Natural and Artificial Systems, Rome, Italy.

We would like to thank Charles Spence, Erik van der Burg, Jan Theeuwes, and two anonymous reviewers for their helpful comments during the review process. Fabiano Botta and Juan Lupiáñez were supported by the Spanish Ministerio de Ciencia e Innovación (PSI2008-03595PSIC and EUI2009-04082) and the CSD2008-00048 CONSOLIDER INGENIO (Dirección General de Investigación). Daniel Sanabria was supported by the Spanish Ministerio de Ciencia e Innovación (SEJ-2007-63645).

Correspondence concerning this article should be addressed to Dr. Fabiano Botta, Department of Psychology, Sapienza University of Rome, Via dei Marsi 78, 00185 Roma, Italy. E-mail: fabianobottaster@gmail

Journal of Experimental Psychology: Human Perception and Performance, 2011, Vol. 37, No. 4, 1099–1109. © 2011 American Psychological Association
and memory access relatively unexplored until recently. For in-
stance, Desimone and Duncan (1995) proposed that when an
observer has to select a specific target from a multi-stimuli display,
a representation of the target will be pre-activated in working
memory¹ before the presentation of the display. This proposal
suggests a strict overlap between these two cognitive functions,
highlighting that both attention and working memory are based on
the need to extract relevant information. The main difference is
that in working memory the selection occurs in the absence of the stimulus.
Smyth and Scholey (1994) were among the first authors to
demonstrate the relationship between spatial attention and spatial
working memory empirically. They showed that their participants’
spatial memory span (as measured by the classical Corsi blocks
test; Corsi, 1972) decreased when they were engaged with a
secondary task requiring attentional shifts during the rehearsal
delay. In a similar vein, a study by Awh, Smith, and Jonides (1995)
demonstrated that the instruction to retain a location in working
memory resulted in a side benefit: When a given stimulus was
presented during the retention interval, faster responses were ob-
served for stimuli that were presented at memorized locations.
More recently, Schmidt, Vogel, Woodman and Luck (2002; cf.
Luck & Vogel, 1997), combined an attentional cuing paradigm
with a visuo-spatial working memory (VSWM) task. Schmidt et al.
reported that peripheral spatial cues increased the likelihood that
visual information presented at the cued (or nearby) locations would
be transferred into VSWM (see also Griffin & Nobre, 2003, for
similar results using endogenous cues). On the basis of these and
other studies (see Awh, Vogel, & Oh, 2006, for a review), it has
been proposed that the maintenance of information in VSWM
might be accomplished by an attention-based rehearsal mechanism
(Awh et al., 1999; Awh & Jonides, 2001; Postle, Awh, Jonides,
Smith, & D’Esposito, 2004). In particular, this hypothesis suggests
that covert shifts of spatial attention could aid information main-
tenance by providing a functional marker for location-specific
representations in working memory, in a similar way as the covert
articulation is engaged in maintaining information within the pho-
nological loop (Baddeley & Hitch, 1974).
Despite the increasing evidence and theoretical models high-
lighting a strong relationship between spatial attention and
VSWM, an important issue still remains unexplored: the interac-
tion between VSWM and the attentional bias exerted by other
sensory modalities. It is by now well-established that a spatial cue
can capture attention at the location where it was presented,
producing facilitatory effects not only within the cued modality but
also in other sensory modalities. The existence of the so-called
‘crossmodal links’ has been documented by a large body of re-
search in both the endogenous and exogenous orienting of spatial
attention (see Eimer, 2004; Spence & Driver, 2004, for reviews).
The question remains as to whether these crossmodal links also
exist in the interplay between spatial attention and VSWM.
The aim of the present study was therefore to examine the effect
of visual, auditory and audiovisual exogenous cues on storage in
VSWM, by combining a change detection-based working memory
task (cf. Luck & Vogel, 1997) with a classical cueing paradigm
(Posner, 1980). Across four experiments, we compared the effec-
tiveness of presenting spatially-nonpredictive unimodal (visual or
auditory) and multisensory (audiovisual) cues in capturing partic-
ipants’ spatial attention toward the hemifield in which a to-be-
remembered stimulus was presented (cued trials) or toward the
opposite hemifield (uncued trials). A crucial point is that we also
addressed whether multisensory cues play a special role (as com-
pared to unimodal cues) in biasing access to VSWM, thus provid-
ing an increased likelihood to recall a to-be-remembered item
presented at the cued location.
It has been proposed that crossmodal cueing effects might be
related to the activation of multisensory neurons from several brain
regions (such as the superior colliculus, the parietal lobe, the
superior temporal sulcus, and the putamen) that have been shown
to respond to stimuli presented in more than one sensory modality
(e.g., Calvert, Spence & Stein, 2004; Graziano & Gross, 1995;
Stein & Meredith, 1993). Moreover, multisensory spatially and
temporally aligned stimuli have been shown to elicit activations in
these neurons that exceed responses to single unimodal stimuli,
giving rise, in some cases, to ‘superadditive effects’ (i.e., the
activation following multisensory stimulation is larger than the
sum of activations following single unimodal stimuli). For exam-
ple, in nonhuman mammals, single-cell recording studies have
shown that neurons in the superior colliculus are activated by
auditory, visual, and tactile stimuli (Stein, Meredith, & Wallace,
1993) and sometimes respond superadditively to spatially and
temporally aligned multisensory stimuli, especially if presented at
near threshold levels (see e.g., Wallace, Meredith, & Stein, 1998;
see also Holmes & Spence, 2005, for a review). What is more,
Macaluso, Frith, and Driver (2002) indicated that multisensory
effects do not only affect heteromodal regions: Seemingly, uni-
modal areas can also be affected by crossmodal interactions, likely
by means of modulatory back-projections from multisensory to
unimodal brain areas (e.g., visual areas), which might mediate
crossmodal spatial effects (e.g., Macaluso & Driver, 2005). For
instance, Zimmer and Macaluso (2007) asked whether increased
working memory load affects crossmodal processing in the visual
cortex. They showed a reliable increase of brain activity for congru-
ent vs. incongruent visuo-tactile (bimodal) stimuli in contralateral
occipital cortex, irrespective of the current level of load. On the
basis of these findings, they concluded that processing of visuo-
tactile spatially congruent stimuli in the visual cortex was not
affected by working memory load. While Zimmer and Macaluso’s
study points to an independence between working memory load
and multisensory integration, indicating that working memory load
does not modulate multisensory processing, the relevant issue here
is to investigate whether processing of multisensory bimodal cues
can affect VSWM performance.
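The superadditivity criterion mentioned above (the multisensory response exceeds the sum of the unimodal responses) can be stated compactly. A minimal sketch, with invented firing rates purely for illustration:

```python
def is_superadditive(multisensory, unimodal_responses):
    """'Superadditive' criterion: the multisensory response exceeds
    the sum of the responses to the single unimodal stimuli."""
    return multisensory > sum(unimodal_responses)

# Hypothetical firing rates (spikes/s) for a multisensory neuron;
# the numbers are illustrative, not taken from the cited studies.
visual_only, auditory_only, audiovisual = 4.0, 3.0, 12.0
print(is_superadditive(audiovisual, [visual_only, auditory_only]))  # → True
```

A merely additive response (e.g., 6.0 spikes/s here) would fail the criterion even though it exceeds either unimodal response alone.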
Experiments 1A and 1B
In Experiment 1, we investigated the role of visual (Experiment
1A) and auditory (Experiment 1B) cues in biasing the transfer of
information into VSWM. Moreover, we used three different levels
of working memory load in order to examine whether the atten-
tional bias in VSWM depended on the number of to-be-
1Working memory enables the temporary maintenance of information in
a limited-capacity system that promotes efficient access and updating by
other cognitive processes (Baddeley & Hitch, 1974).
BOTTA ET AL.
unteers, all university students. 14 of them (6 male, mean age 23
years old, ranging from 19–28 years old) participated in Experi-
ment 1A, while the other 12 (4 male, mean age 25 years old,
ranging from 19–30 years old) participated in Experiment 1B. All
the participants had normal or corrected-to-normal visual acuity,
normal color discrimination and no history of neurological disor-
ders. We also assessed (here and in the following experiments)
their auditory localization ability by means of pretest trials in
which they had to indicate the side (left or right) of the onset of the
auditory stimulus that served as a spatial cue in the main experi-
ment. They were naı ¨ve of the purpose of the study, which lasted
for approximately 30 minutes.
The stimuli were displayed on a light grey back-
ground (x ? .312, y ? .329, 52.7 cd/m2) on a 17" CRT video
monitor (refresh rate ? 60 Hz) located in a dark and quiet room.
The distance between the participant’s head and the video monitor
was approximately 60 cm. A black fixation cross was continuously
displayed at the center of the screen. Each memory array consisted
of 4, 6, or 8 filled colored squares (each subtending a visual angle
of 0.65° ? 0.65°), which were randomly located on the screen with
the constraint that half of them were presented in the right hemi-
The group of participants consisted of 26 vol-
field and the remaining half in the left hemifield (see Figure 1).
The colored squares were located within a 4.9° × 3.5° region centered in the middle of each hemifield, at about 6.85° of visual angle from the central fixation point (center-to-center). The color of each square was randomly selected from a set of seven easily discriminable colors: brown (x = .64, y = .329, 4.5 cd/m²), blue (x = .150, y = .060, 7.22 cd/m²), green (x = .30, y = .60, 71.5 cd/m²), cyan (x = .224, y = .328, 78.7 cd/m²), red (x = .64, y = .330, 21 cd/m²), yellow (x = .419, y = .505, 92.7 cd/m²), and violet (x = .320, y = .154, 28.4 cd/m²). The visual cue (Experiment 1A) consisted of a black outlined square (subtending an angle of 1° × 1°) presented at the center of the left or right hemifield. The auditory cue (Experiment 1B) consisted of a 50-ms pure tone of 500 Hz presented at 65 dB (measured at the participant's head position) by means of two loudspeakers, located one at either side of the computer monitor (at an eccentricity of approximately 18°). The test arrays consisted of 4, 6, or 8 squares presented at the same locations as the memory array, and of a decision box (i.e., a white outlined square drawn around one of the squares; see Figure 1).
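The placement constraint described above (squares split evenly across hemifields, colors drawn from the seven-color set) can be sketched as follows; the function name, coordinate convention, and region bounds are illustrative assumptions, not the original experiment code:

```python
import random

COLORS = ["brown", "blue", "green", "cyan", "red", "yellow", "violet"]

def make_memory_array(set_size, rng=random):
    """Place set_size colored squares, half per hemifield, each with a
    color drawn at random (with replacement) from the seven-color set."""
    assert set_size % 2 == 0, "set size must split evenly across hemifields"
    squares = []
    for hemifield in ("left", "right"):
        for _ in range(set_size // 2):
            squares.append({
                "hemifield": hemifield,
                # random position within the 4.9° × 3.5° hemifield region
                "x_deg": rng.uniform(-2.45, 2.45),
                "y_deg": rng.uniform(-1.75, 1.75),
                "color": rng.choice(COLORS),
            })
    return squares

array = make_memory_array(6)
print(sum(sq["hemifield"] == "left" for sq in array))  # → 3
```

Drawing colors with replacement is consistent with the set-size-8 condition, where eight squares must share seven colors.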
Figure 1. Schematic illustration of a sequence of trials in Experiments 1 and 2. A visual starting signal (a central cross) together with two numbers (verbal load) were presented for 900 ms. After 1500 ms, either a visual (Experiment 1A), an auditory (Experiment 1B), or an audiovisual (Experiment 2) cue was presented for 100 ms (equiprobably on either the left or right side). After 50 ms, a memory array consisting of four, six, or eight to-be-remembered colored squares (Experiment 1) or six to-be-remembered colored squares (Experiment 2) was presented for 100 ms. (For display purposes, only the set-size 6 condition of Experiment 1 is shown here.) The number of squares was always balanced across the left and right hemifields. After a retention interval of 900 ms, a test array consisting of squares presented at the same locations as the memory array, together with a decision box, was presented for 2000 ms. The participants had to establish whether the color of the square marked by the decision box matched the color of the corresponding square in the memory array.

Procedure. Each trial began with the presentation of two digits just above the fixation cross for 900 ms. Participants were instructed to repeat aloud these two digits throughout the duration
of the trial, in order to minimize the contribution of verbal WM (cf.
Schmidt et al., 2002). The experimenter was always present in the
testing room in order to ensure the participants’ adherence to these
instructions. After 1,500 ms, the spatial cue (visual or auditory)
appeared for 100 ms in one of the two hemifields. After a 50 ms
blank interval from the offset of the spatial cue, the memory array
was presented for 100 ms. This array was followed by a 900 ms
blank period and then by the test array that appeared for 2000 ms
(see Figure 1). Participants had 3 seconds to respond, starting from
the onset of the test array. On half of the trials, the square marked
by the decision box had the same color as in the memory array,
while in the remaining trials the square had a different color.
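The trial sequence just described can be summarized as a timeline; the structure and names below are an illustrative sketch, not the original presentation script:

```python
# Trial timeline as described in the Procedure (durations in ms).
TRIAL_TIMELINE = [
    ("verbal_load_digits", 900),   # two digits above fixation, repeated aloud
    ("fixation_interval", 1500),   # interval before cue onset
    ("spatial_cue", 100),          # visual or auditory, left or right side
    ("blank", 50),                 # cue offset to memory-array onset
    ("memory_array", 100),         # 4, 6, or 8 colored squares
    ("retention", 900),            # blank retention interval
    ("test_array", 2000),          # test squares plus decision box
]
# Responses were accepted for 3000 ms from test-array onset.

total_ms = sum(duration for _, duration in TRIAL_TIMELINE)
print(total_ms)  # → 5550
```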
The visual (Experiment 1A) and auditory (Experiment 1B)
spatial cues were equiprobably presented on either side of the
fixation point. The cues were also spatially-nonpredictive, i.e.,
they were equiprobably presented in either the same or opposite
side as compared to the decision box, so that half of the trials were
‘cued’ and the other half were ‘uncued’. The participants made a
speeded manual response on a keypad. They pressed either the ‘L’
key with their right hand or the ‘D’ key with their left hand in order
to discriminate whether or not the color of the square within the
decision box matched the color of the square at the corresponding
location in the memory array. The response-key assignment was
counterbalanced across participants. Response accuracy was pri-
oritized over response speed. For this reason, our analyses were
focused on the accuracy data. In each experiment (1A & 1B),
participants completed three blocks of 96 experimental trials, for a total of 288 trials (12 trials for each combination of set size [3] × cue side [2] × target side [2] × color change [2]). A short rest was allowed between blocks. Before the start of the experiment, the participants performed 30 practice trials in order to familiarize themselves with the task.
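The trial count above follows from the factorial design; a quick arithmetic check (condition labels are illustrative):

```python
from itertools import product

# Factorial combination of conditions in Experiments 1A and 1B.
set_sizes = (4, 6, 8)
cue_sides = ("left", "right")
target_sides = ("left", "right")
color_change = (True, False)

cells = list(product(set_sizes, cue_sides, target_sides, color_change))
trials_per_cell = 12

total_trials = len(cells) * trials_per_cell
print(len(cells), total_trials)  # → 24 288  (i.e., three blocks of 96 trials)
```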
Results and Discussion

A 3 × 2 within-participants ANOVA with the factors of set size
(4, 6, or 8 squares) and cueing (cued vs. uncued trials) was
performed on the accuracy data.
Experiment 1A (Visual cue)
The ANOVA revealed a significant main effect of cueing, F(1, 13) = 27.3, p < .001, d = 1.06, indicating greater accuracy for cued (M = 81%) than for uncued (M = 74%) trials. In line with previous reports (e.g., Luck & Vogel, 1997; Vogel, Woodman & Luck, 2001), the analysis also revealed a significant main effect of set size, F(2, 26) = 45.3, p < .001. Bonferroni-corrected post hoc comparisons showed greater accuracy for set size 4 (M = 85%) than for set size 6 (M = 76%; p < .001, d = 1.38), which, in turn, was greater than for set size 8 (M = 71%; p < .01, d = 0.68). However, the two factors did not interact, F < 1, suggesting that the cueing condition did not differentially affect the participants' VSWM performance at different set sizes (see Figure 2A). Misses (trials in which participants did not respond) occurred only rarely (0.007% of trials overall) and were not analyzed further.
Experiment 1B (Auditory cue)
In Experiment 1B, the main effect of cueing was not statistically significant, F < 1, revealing no significant difference between cued (M = 78.7%) and uncued (M = 78.6%) trials. By contrast, the main effect of set size was significant, F(2, 22) = 58.63, p < .001. Just as in Experiment 1A, Bonferroni-corrected post hoc comparisons showed greater accuracy for set size 4 (M = 87.9%) than for set size 6 (M = 77.6%; p < .001, d = 1.59), which, in turn, was greater than for set size 8 (M = 70.5%; p < .01, d = 0.94). The interaction between the two factors was once again not significant, F < 1 (see Figure 2B). Misses occurred only rarely (0.002% of trials overall) and were not analyzed further.
The results of Experiment 1A clearly showed that visual cues
affected VSWM performance irrespective of the current working
memory load (i.e., the set size), thus confirming the role of
exogenous visual attention in biasing the transfer of information
into VSWM (see Botta, Santangelo, Raffone, Lupiáñez, & Olivetti Belardinelli, 2010; Schmidt et al., 2002, for similar results).

Figure 2. A & B) Mean percentage of response accuracy for cued and uncued trials as a function of the set size (4, 6, or 8 memory objects) in Experiments 1A and 1B. The error bars represent the standard error of the means.

On the other hand, the auditory cues used in Experiment 1B did not appear to bias the access of information into VSWM. This result appears
surprising in light of the extant literature, since the existence of
crossmodal links (as mentioned above) has been consistently documented in attentional/perceptual studies (e.g., Spence & Driver, 2004).
The inefficiency of auditory cues in biasing VSWM change-
detection performance in Experiment 1B could simply have been
due to the discrepancy between the sensory modalities in which
cues and probes were presented. Nonetheless, this explanation does not appear plausible, since it is in sharp contrast with much evidence of crossmodal auditory effects on performance in visual tasks. In our opinion, a more compelling hypothesis is that the absence of auditory effects was perhaps a consequence of an insufficient spatial overlap between the auditory cue
and to-be-remembered visual stimuli. If that is true then there are
two alternatives to explain the results in the auditory condition in
Experiment 1B. On one side, it is possible that the absence of
spatial overlap was due to the fact that the spatial resolution of
sound localization was very low in comparison with the visual
spatial resolution. It is therefore possible that the absence of any
auditory effects in biasing VSWM performance was caused by a
less precise localizability of the auditory event (Burr & Alais,
2006). Alternatively, as the auditory cue was presented more
peripherally than the visual stimuli, the absence of auditory effects
could have been due to the absence of spatial overlap between the
two events (e.g., see Eimer & Schröger, 1998). This possibility has
been tested in Experiment 4 and in the second follow-up experi-
ment (see “Results and Discussion” of Experiment 4).
Experiment 2
In Experiment 2, we decreased the spatial resolution of the
visual peripheral cues to test the hypothesis that the VSWM bias
operated by spatial attention depends on the spatial resolution of
the sensory modality stimulated by the peripheral cue (thus per-
mitting a comparison with the lower spatial resolution of the
auditory cues). In particular, we enlarged the size of the visual cue,
thus making it less precise in signaling a specific location in one or
the other hemifield. We expected that the reduced spatial resolu-
tion would make the visual cue ineffective in biasing access into
VSWM, just as for the auditory spatial cues in Experiment 1B.
Indeed, according to the ‘zoom lens’ model (Eriksen & St. James, 1986),² the attentional focus can be contracted or expanded as required by the task or instructions; as a consequence, the strength of the attentional capture decreases as the size of the focus itself increases (see also Belopolsky, Zwaan, Theeuwes, & Godijn, 2007).
In Experiment 2, we also addressed whether multisensory (au-
diovisual) cues play a special role (as compared to unimodal cues)
in biasing access in VSWM. We hypothesized that, even if audi-
tory cues presented alone were not able to produce an attentional
bias in VSWM, when presented together with visual cues they
could contribute to an enlarged attentional bias in VSWM perfor-
mance as compared with unimodal visual spatial cues (see also
Spence & Santangelo, 2009, for a review).
Method

Participants. Thirteen university students participated in Experiment 2. All participants (4 males, mean age 26 years old,
ranging from 19–28 years old) had normal or corrected-to-normal
visual acuity, normal hearing, normal color discrimination, and no
history of neurological problems. They were naïve as to the purpose of the study, which lasted approximately 30 minutes.
Stimuli and procedure. The stimuli and procedure were
identical to those used in Experiment 1, with the following excep-
tions: Now the exogenous cue could be visual (a black outlined
square, subtending a visual angle of about 5° ? 5°), auditory
(which was now presented at 80 dB, i.e., 15 dB louder than in
Experiment 1B) or audiovisual (simultaneous presentation of both
the auditory and the visual cues on the same side; see Figure 1).
We increased the sound intensity of the auditory cue to investigate
whether the absence of the effect in Experiment 1B was due to a
lack of perceptual salience. The cue types were equiprobable and
equiprobably presented on either side of fixation. Finally, since
Experiments 1A and 1B revealed no interaction between cueing and
set size, we used only the middle set size in Experiment 2,
consisting of six to-be-remembered squares.
Results and Discussion
A 3 × 2 within-participants ANOVA with the factors of cue type
(visual, auditory, audiovisual) and cueing (cued vs. uncued trials)
was carried out on the accuracy data. Given that one of our main
aims was to assess whether multisensory (audiovisual) cues elic-
ited larger cueing effects (i.e., cued–uncued difference) than single
visual/auditory cues, we also performed planned comparisons to
test for the magnitude of cueing effects elicited by each type of cue.
The analysis of the accuracy data revealed a marginally significant main effect of cueing, F(1, 12) = 4.67, p = .051, d = 0.41, indicating somewhat greater accuracy on cued than on uncued trials. The main effect of cue type failed to reach significance, F < 1, indicating that participants'
responses were overall similar following visual (M = 70.1%), auditory (M = 71.2%), and audiovisual (M = 69.6%) cues. More important, the analysis revealed a significant interaction between the two factors, F(2, 24) = 7.78, p < .01, indicating that the magnitude of the cueing effect differed across the different types of cues (see Figure 3A). Planned comparisons showed a significant cueing effect only for the multisensory (audiovisual) condition (magnitude = 8.8%; p = .003, d = 1.01), while no significant effects were found following visual (magnitude = 3.6%; p = .123, d = 0.39) or auditory (magnitude = −3.0%; p = .169, d = −0.36) cues.
Misses occurred only rarely (0.006% of trials overall).
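The cueing-effect magnitudes reported above are simply cued-minus-uncued accuracy differences. A sketch; the per-condition cell means below are reconstructed to be consistent with the reported cue-type means and magnitudes, not values taken from the article:

```python
def cueing_effect(cued_acc, uncued_acc):
    """Cueing-effect magnitude: cued minus uncued accuracy, in
    percentage points (positive = benefit at the cued location)."""
    return round(cued_acc - uncued_acc, 1)

# Reconstructed (cued, uncued) cell means, % correct; each pair averages
# to the reported cue-type mean and differs by the reported magnitude.
means = {
    "audiovisual": (74.0, 65.2),
    "visual": (71.9, 68.3),
    "auditory": (69.7, 72.7),
}
effects = {cue: cueing_effect(c, u) for cue, (c, u) in means.items()}
print(effects)  # → {'audiovisual': 8.8, 'visual': 3.6, 'auditory': -3.0}
```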
The results of Experiment 2 suggest a special role of multisen-
sory cues in biasing VSWM access. In fact, our audiovisual cues were more effective at biasing the transfer of information into VSWM, as compared to unimodal visual or auditory cues. Concerning the intramodal visual cue, consistent with
our hypothesis, reducing its spatial resolution (by increasing its
size) resulted in a decreased VSWM bias. This result is consistent
with several studies that have highlighted a decrease of the mag-
² Eriksen and St. James (1986) suggested that visual attention might
operate as the zoom lens of a camera. Thus, spatial attention can either be
focused on a specific (and circumscribed) location in the visual field, with
a high level of spatial resolution, or on a larger area of the visual space, but
with a reduced spatial resolution.
nitude of attentional orienting effects with a concurrent increase of
the size of the spatial cue, in agreement with the ‘zoom lens’ model
(e.g., Castiello & Umiltà, 1992; Eriksen & St. James, 1986; Handy,
Kingstone, & Mangun, 1996; Turatto et al., 2000). It is important
to note that the present results also add new evidence about this
effect in terms of VSWM performance.
Regarding the auditory cue, the results of Experiment 2 are con-
sistent with those of Experiment 1B, in showing no effects of auditory
cueing on VSWM performance. It is unlikely that the absence of
auditory effects reported in Experiment 1B and Experiment 2 (in
which we increased the intensity of the sound cue) was due to the low
salience of the auditory cues, as the same type of cues have been
shown to be effective in many perceptual attention tasks (e.g., Spence
& Driver, 1997; see also Spence, 2010, for a review).
When the auditory cues were presented together with visual cues
(i.e., the audiovisual cue condition) they contributed to an enlarged
attentional bias in VSWM, as compared to unimodal spatial cues. We
argue that this effect was the consequence of multisensory integration
(e.g., Wallace, Meredith, & Stein, 1998) taking place between the
visual and auditory component of the audiovisual cue. However, at
least two alternative accounts must be considered before accepting an
explanation in terms of multisensory integration. Indeed, a trivial
account of the observed results would be that our multisensory (au-
diovisual) cues simply provided increased perceptual saliency in
comparison with unimodal cues (cf. Santangelo & Spence, 2007).
Moreover, one might argue that the auditory cue in the present setting
simply involved an increase of the participants’ state of alertness,
irrespective of its lateralization. In other words, the spatial alignment
between the auditory and visual components of the audiovisual cues
would not be crucial in this case (see e.g. Ho, Santangelo, & Spence,
2009). Also the presence of a sound in our audiovisual cues could
have simply made the visual cue more localizable (see e.g. Alais &
Burr, 2004), so giving rise to an enlarged attentional bias in VSWM.
Experiment 3
To rule out these alternative hypotheses, we conducted a further
experiment in which we compared three types of peripheral cues: A
thicker visual cue (to increase its perceptual salience); a “standard”
multisensory (audiovisual) cue, identical to the one used in Experi-
ment 2; and a multisensory cue consisting of a lateralized visual cue
and a central auditory cue (audiovisual/auditory neutral).
If the enlarged effect following audiovisual cues observed in Ex-
periment 2 was a consequence of an increased perceptual salience,
then we would expect no difference in VSWM performance between
the audiovisual and the thicker visual cue. Analogously, if the mul-
tisensory effect was due to an increase in the participants’ alerting
state, then we would expect no differences between the audiovisual
and the audiovisual/auditory neutral cues. By contrast, if the effect is
observed only for the audiovisual cue, then it might be concluded that
the audiovisual effect of Experiment 2 was the consequence of the
space-based multisensory integration occurring between the two cues.
Method

Participants. The participants consisted of 22 university students. All of them (5 male, mean age 22.5 years old, ranging from 18–31 years old) had normal or corrected-to-normal visual acuity, normal hearing, normal color discrimination, and no history of
Figure 3. A) Mean percentage of accuracy for cued and uncued trials following Audiovisual (AV), Auditory (A), and Visual (V) cues in Experiment 2. B) Mean percentage of response accuracy for cued and uncued trials following Bimodal (B), Audiovisual/Auditory Neutral (A/AN), and Thicker Visual (TV) cues in Experiment 3. C) Mean percentage of accuracy for cued and uncued trials following Audiovisual (AV), Auditory (A), and Visual (V) cues in Experiment 4. In all graphs, the error bars represent the standard error of the means.
neurological problems. They were naïve as to the purpose of the
study, which lasted for approximately 30 minutes.
Stimuli and procedure. The stimuli and procedure were
identical to those used in Experiment 2, except that now the
exogenous cue could be Audiovisual (AV, which consisted of the
simultaneous presentation of both a black outlined square and a
pure tone, just as in Experiment 2); Thicker Visual (TV, which consisted of a black outlined square with a border twice as thick as that of the previously used visual cue); or Audiovisual/Auditory
Neutral (AV/AN, identical to the audiovisual cue, but with the
auditory component presented from a central location, by means of
the simultaneous presentation of the auditory cue from both the left
and right loudspeakers, and, consequently, not spatially-aligned
with the visual component; e.g., Ho et al., 2009). Note that the set
size was fixed at six to-be-remembered objects.
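As an illustration, the fully crossed design of Experiment 3 (three cue types × cued/uncued, set size six) can be sketched as a trial-list generator in Python. This is a minimal sketch: the repetition count per cell and the dictionary field names are illustrative assumptions, not details taken from the original procedure.

```python
import itertools
import random

CUE_TYPES = ["AV", "TV", "AV/AN"]  # audiovisual, thicker visual, AV/auditory-neutral
CUEING = ["cued", "uncued"]        # memory array at the cued vs. the opposite location

def build_trials(reps_per_cell=20, seed=1):
    """Fully crossed 3 x 2 within-participants design, shuffled."""
    cells = itertools.product(CUE_TYPES, CUEING)
    trials = [{"cue_type": c, "cueing": s, "set_size": 6}
              for c, s in cells for _ in range(reps_per_cell)]
    random.Random(seed).shuffle(trials)  # randomize presentation order
    return trials

trials = build_trials()
```

Crossing cue type with cueing within participants is what licenses the 3 × 2 repeated-measures ANOVA reported below.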
Results and Discussion
We performed a 3 × 2 within-participants ANOVA on the accuracy data with the factors of cue type (Audiovisual, Thicker Visual, and Audiovisual/Auditory Neutral) and cueing (cued vs. uncued trials). This analysis revealed a significant main effect of cueing, F(1, 21) = 11.1, p < .01, d = 49.8, indicating that participants responded more accurately on cued (M = 71.3%) than on uncued trials (M = 67.2%). The analysis did not reveal any significant differences between Audiovisual (M = 69.6%), Thicker Visual (M = 69.8%), and Audiovisual/Auditory Neutral (M = 68.4%) cues, F < 1. The interaction between the two factors did not reach statistical significance, F(2, 42) = 1.84, p = .17. However, just as for Experiment 2, we performed planned comparisons given their theoretical relevance to our hypotheses. These comparisons revealed once again a significant biasing effect in VSWM following audiovisual cues (magnitude = 7.3%; p = .002, d = 0.72) but not following thicker visual or audiovisual/auditory neutral cues (magnitude = 1.3%; p = .569, and magnitude = 3.4%; p = .150, respectively; see Figure 3B). Misses were rare (0.003% of trials overall).
To compare the effects elicited by audiovisual and visual cues and to further test the effect of perceptual saliency, we collapsed together the accuracy data following audiovisual cues in Experiments 2 and 3, and the accuracy data following visual cues of Experiment 2 and thicker visual cues of Experiment 3 (because they are essentially visual in nature). We then performed a 2 × 2 × 2 ANOVA on the accuracy data with the within-participants factors of cue type (audiovisual and visual) and cueing (cued vs. uncued trials) and the between-participants factor of Experiment (2 and 3). This analysis revealed a significant main effect of cueing, F(1, 33) = 18.25, p < .001, indicating that participants responded more accurately on cued (M = 72.3%) than on uncued trials (M = 67.2%). No differences were found between audiovisual (M = 69.7%) and visual (M = 69.7%) cues, F < 1. A crucial point is that there was a significant interaction between the two factors, F(1, 26) = 5.7, p = .02, indicating that the magnitude of the biasing effect in VSWM elicited by audiovisual cues (7.9%) was significantly larger than that elicited by visual cues (2.2%). This analysis confirmed the previous results, indicating an enlarged bias in VSWM following audiovisual, in comparison with unimodal visual, cues.3 Neither the between-participants factor of experiment nor the interaction between cue type, cueing, and experiment was significant (both Fs < 1), indicating that the difference in the magnitude of the biasing effects between visual and audiovisual cues was not due to any increase of perceptual salience.
The results described above are in line with the findings of
Experiment 2, while at the same time they rule out two alternative
accounts of the observed multisensory effect. Just as in Experiment
2, the audiovisual condition produced a larger attentional bias
effect which did not seem to depend on any effect of increased
perceptual saliency. These results confirm that the multisensory
effect was likely due to multisensory integration taking place
between the visual and auditory components of the audiovisual
cue. In fact, the results of Experiment 3 seem to suggest that the
two components of the audiovisual cues must be spatially congru-
ent (lateralized and on the same side) in order to be integrated and
then be effective in biasing VSWM performance more than uni-
modal (visual) cues. The latter result also ruled out any explanation
in terms of a general increase in the participants’ alerting state
provided by the auditory component of the audiovisual cue, al-
though we cannot completely rule out the possibility that the
central auditory cue was less salient than the same auditory peripheral cue.
Experiments 1–3 indicated that auditory cues were able to elicit
VSWM biases only when presented in combination with
(spatially-aligned) visual cues, a result that contrasts with the
extant literature (Spence & Driver, 1999b; Santangelo & Spence,
2007). For this reason, we conducted a final experiment to control
for some potential methodological issues that may have caused the
absence of any auditory modulations on VSWM: a) we eliminated
the articulatory suppression procedure, which might have reduced
the effectiveness of the auditory cue in modulating WM perfor-
mance; b) we substituted the pure tone with a white noise burst
(that is, a more localizable auditory stimulus; see Koelewijn et al.,
2009; Spence & Driver, 1997); c) we located the speakers behind
the screen so that they were perfectly spatially aligned with the
visual component of the cue; d) we controlled for ocular move-
ment effects by reducing the total stimulus exposure.
Participants. The participants consisted of 23 university students. All of them (3 male, mean age 23 years, range 18–35 years) had normal or corrected-to-normal visual acuity, normal hearing, normal color discrimination, and no history of neurological problems. Three participants were excluded from the analyses since their overall accuracy was 2.5 standard deviations above or below the mean. All of the participants were naïve as to the purpose of the study, which lasted approximately 30 minutes.

3 Note that the small-size visual cue of Experiment 1A (set size 6; effect size 8.7%) elicited a significantly larger cuing effect than the large-size visual cue used in Experiment 2 (3.6%), one-tailed t(25) = 1.73, p < .05.

SPATIAL ATTENTION AND VISUO-SPATIAL WORKING MEMORY

Stimuli and procedure. The stimuli and procedure were identical to those used in Experiment 2, except that now we did not use an articulatory suppression procedure, and we used a white noise burst (50 ms, presented at 70 dB) instead of the pure tone as the auditory cue and as the auditory component of the multisensory
audiovisual cue. Moreover, we positioned the loudspeakers behind
the screen, thus giving the subjective impression of a spatial
correspondence between the auditory stimuli and the location
where the visual memory items were presented. Finally, we re-
duced the total stimulus exposure, from cue presentation to memory-array offset, to 180 ms (cue 50 ms + delay 30 ms + memory array 100 ms) to control for ocular movement effects.
Results and Discussion
We performed a 3 × 2 within-participants ANOVA on the accuracy data with the factors of cue type (visual, auditory, and audiovisual) and cueing (cued vs. uncued trials). This analysis revealed a significant main effect of cueing, F(1, 19) = 5.46, p = .031, indicating that participants responded more accurately on cued (M = 68.6%) than on uncued trials (M = 64.4%). The analysis did not reveal any significant differences between Audiovisual (M = 64.7%), Auditory (M = 67.2%), and Visual (M = 67.6%) cues, F(2, 40) = 2.1, p = .13. It is crucial that the analysis revealed a significant interaction between the two factors, F(2, 38) = 3.47, p = .041, showing that the magnitude of the bias effect in VSWM differed significantly between the three cue types. Planned comparisons showed a significant cueing effect only for the audiovisual condition (magnitude = 8.6%; p = .008, d = 1.01), while no significant effects were found following visual (magnitude = 3.6%; p = .2, d = 0.39) or auditory (magnitude = 0.5%; p = .73, d = 0.07; see Figure 3C) cues. Misses were rare (0.01% of trials overall).
These findings are consistent with those shown in Experiments
1–3, pointing once again to an enlarged attentional bias effect
following multisensory audiovisual cueing in comparison with
unimodal cues. Furthermore, the absence of auditory attentional
bias observed in Experiment 4 rules out the possibility that the lack
of attentional auditory effects in Experiments 1 and 2 was due to
a low perceptual salience of the auditory cue, to an imprecise spatial alignment between the auditory cue and the visual memory items, or to interference possibly produced by the articulatory suppression task.
Overall, the results of Experiment 4 indicated that the same
auditory cues typically used in classical attentional paradigms
seem to be ineffective in biasing the information encoding in
VSWM. In any case, we made sure that the auditory cue used in
Experiment 4 was able to capture attention in a simple attentional
color discrimination task. Moreover, we further analyzed the in-
efficiency of auditory cues in biasing VSWM access. In particular,
we presented visual arrays at more eccentric locations to exclude the possibility that the lack of auditory attentional effects was due to difficult sound localization. This was done by means of two further control experiments.
In the first follow-up experiment, we asked 8 new volunteers (4
male, mean age 27.1, years, ranging from 20–38 years) to perform
a simple color discrimination task. Each trial consisted of the
presentation of a central fixation cross. After 1500 ms, an auditory
spatial cue (white noise burst) was presented for 50 ms, equiprobably in one of the two hemifields (the sound sources were spatially
aligned with the subsequent visual targets as in Experiment 4, by
positioning the loudspeakers behind the screen in spatial corre-
spondence with the target location). The target, which consisted of a colored square, could appear for 100 ms either on the same side (cued trials) or on the opposite side (uncued trials) as the auditory cue. Participants were instructed to press one of two buttons to discriminate between two possible target colors. A t-test comparison between cued and uncued trials was performed on the RT data. The analysis revealed that participants responded reliably faster on cued (355 ms) than on uncued trials (366 ms), t(7) = 3.29, p = .02. The same analysis performed on the accuracy data did not show any significant difference between cued and uncued trials (p = .46).
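The cued–uncued comparison used in these follow-ups is a standard paired-samples t-test on per-participant means; a minimal pure-Python sketch (with made-up RT values, not the study's data) is:

```python
import math

def paired_t(x, y):
    """Paired-samples t-test: returns the t statistic and the degrees of
    freedom for per-participant means in two conditions (e.g., cued vs.
    uncued RTs)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 in the denominator)
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t, n - 1

# Hypothetical per-participant mean RTs (ms) for 8 participants
uncued = [370, 362, 371, 358, 366, 369, 361, 372]
cued   = [356, 351, 360, 350, 357, 359, 352, 358]
t, df = paired_t(uncued, cued)  # positive t: uncued slower than cued
```

The test is run on the within-participant differences, which is why the reported degrees of freedom equal the number of participants minus one (here, t(7) for 8 volunteers).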
In the second follow-up experiment, we further investigated the inefficiency of auditory cues in biasing VSWM encoding. The experiment was identical to Experiment 1B, but now we ‘moved’ the to-be-memorized visual stimuli to more eccentric locations, at approximately 14° of visual angle, near the borders of the monitor and adjacent to the loudspeakers, which were located one on either side of the computer monitor (the angular separation between the middle of the visual display and the middle of the loudspeaker was approximately 6°). The logic was to maximize the spatial proximity between the auditory cue and the to-be-remembered visual stimuli, and thus to exclude the possibility that the lack of an attentional effect was due to a difficulty in localizing the sound cue. In this experiment (N = 12, 4 male, mean age 24 years, range 18–31 years), we used only one set size condition (4 squares). The auditory cues were the same as in Experiment 4. Again, no effects of auditory cueing on the transfer of information into VSWM were observed, as shown by a two-tailed t-test performed on the accuracy data from cued (69.7%) and uncued trials (70.1%) (p = .8).
Taken together, the results of these two control experiments (and
those of Experiment 4) showed that the auditory cue used here was
efficient in capturing spatial attention (in a typical cuing para-
digm), but not in biasing the transfer of information into VSWM.
General Discussion

The aim of the present study was to investigate the effect of
visual, auditory, and multisensory (audiovisual) cues on VSWM
performance. The main results were the following: a) The effect of
peripheral visual cues in biasing the access of information into
VSWM depended on the size of the visual cue (see Experiments 1
and 2); b) The visual attentional bias in VSWM did not interact
with the memory load (see Experiment 1); c) Auditory cues did not
have direct effects in biasing VSWM, presumably as a conse-
quence of their low spatial resolution (see Experiments 1a, 2 and
4); d) Multisensory cues showed enlarged attentional biasing ef-
fects in comparison with unimodal cues, as a likely consequence of
multisensory integration (see Experiments 2, 3 and 4).
We argue that in the experiments reported in this manuscript, the
cue presentation automatically led to an increase of the attentional
weight of all incoming information within the cued region. As a
consequence, visual objects in this preactivated spatial region were
more likely to “win the race” to enter the limited-capacity
VSWM. This is in line with Bundesen’s (1990) “Theory of Visual
Attention,” where the limited capacity of visual processing has the
form of a race between objects of the visual field to become encoded
in visual working memory, whose storage capacity seems to be
limited to only 3–4 objects (see Luck & Vogel, 1997). Commenting
on Posner’s costs and benefits paradigm, Bundesen (1990) argued:
“Increasing the attentional weight of an element (in relation to the
weights of other elements in the visual field) speeds processing of that
element at the expense of other elements. Accordingly, if the target
appears in the cued location, performance should improve in both
latency (the time taken to sample the required information) and
accuracy (the probability that the information is sampled from the
display). Otherwise, if the cue is misleading, performance should
degrade in both latency and accuracy” (p. 532). We assume that a
similar reasoning could be applied to our results, within an exogenous
attentional cueing framework.
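Bundesen's race idea can be illustrated with a toy simulation — a sketch under assumed parameters (six display objects, a VSWM capacity of four, and a doubled attentional weight in the cued region), not the full TVA equations:

```python
import random

def race_trial(weights, capacity, rng):
    """One encoding race: each object finishes after an exponential time
    whose rate equals its attentional weight; the first `capacity`
    finishers enter VSWM (cf. Bundesen, 1990; Luck & Vogel, 1997)."""
    finish = {i: rng.expovariate(w) for i, w in enumerate(weights)}
    return set(sorted(finish, key=finish.get)[:capacity])

def probe_accuracy(cued_weight, n_trials=20000, seed=0):
    """Proportion of simulated trials on which the probed object
    (index 0) was encoded into VSWM."""
    rng = random.Random(seed)
    weights = [cued_weight] + [1.0] * 5  # object 0 lies in the cued region
    hits = sum(0 in race_trial(weights, capacity=4, rng=rng)
               for _ in range(n_trials))
    return hits / n_trials

p_cued = probe_accuracy(cued_weight=2.0)    # cue boosts the probed object's weight
p_uncued = probe_accuracy(cued_weight=1.0)  # no bias: all weights equal
```

With equal weights, the probed object is among the first four of six finishers with probability 2/3; raising its weight increases that probability at the expense of the other objects, which is the qualitative cost-and-benefit pattern described in the quote above.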
The novelty of our results lies in the differential biasing effect in VSWM produced by visual, auditory, and audiovisual cues. Indeed, both visual (although depending on the size of the cue) and audiovisual cues produced a significant effect on VSWM performance, while auditory cues failed to show any modulatory effect on VSWM. Concerning the intramodal visual cue, we observed that increasing its size resulted in a decreased VSWM bias, in agreement with the ‘zoom lens’ model (e.g., Castiello & Umiltà, 1992; Eriksen & St. James, 1986; Handy, Kingstone, & Mangun, 1996; Turatto et al., 2000). However, alternative hypotheses should
be considered to explain this size-dependent attentional bias de-
crease. For example, Belopolsky et al. (2007) manipulated the size
of the “attentional window” and found that increasing its size led
participants to orient their attention more frequently to an irrele-
vant location. Following this logic, it is possible that the decrease
of the attentional bias that we observed with the large visual cue
might be due to a larger tendency of participants to orient attention
to the uncued location.
There are several reasons why auditory cues, which have proved to be effective in biasing perceptual processing (Driver & Spence, 1998), might have failed to bias VSWM in the present study. In the case of typical cuing paradigms, participants are required to perform a perceptual detection or discrimination of the target stimulus. Conversely, in our paradigm, the effect of spatial attention plausibly applies to the encoding of information into VSWM and to retrieval processes (Herrero, Nikolaev, Raffone, & van Leeuwen, 2009). Moreover, while typical cuing paradigms require participants to prioritize response speed, VSWM paradigms are usually based on response accuracy as the dependent variable instead. It might be argued that RT is more suitable to give information about channel selection processes, while response accuracy is more related to the channel or signal enhancement process. In fact, while channel enhancement makes the perceptual representation more veridical, channel selection can be considered as a decision process which involves a selection for action (Prinzmetal, McCool, et al., 2005; Prinzmetal, Park, et al., 2005; see also Prinzmetal, Amiri, Allen, & Edwards, 1998; Lu & Dosher, 1998). Accordingly, the absence of auditory effects in our accuracy-based task might reflect the possibility that auditory cues are more effective in terms of channel selection than channel enhancement, depending on whether attention is measured on accuracy or reaction times (see Prinzmetal, McCool, et al., 2005).
Another notable difference between the task used in this manuscript and the typical audiovisual attention task is that the present task involves multielement displays, while in Posner-like paradigms the target stimulus is usually presented alone. According to the ‘event integration hypothesis’ (Lupiáñez, Ruz, Funes, & Milliken, 2007),
exogenous cuing effects might be due to an integration process of the
cue and the target within the same event file. In this process, the
object-position binding represents a very important factor, since lo-
cation acts as the link between the different features of an object
(Treisman, 1988). Spatial position is unique in specifying a particular
object, thus segregating the features of one object from those of
another one (Hollingworth & Rasmussen, 2010; Kahneman, Treis-
man, & Gibbs, 1992). We speculate that the absence of auditory
attentional bias in VSWM, as well as the reduction of the visual attentional bias with the enlargement of the attentional focus (compare the results of Experiment 1A with those of the visual cue condition in Experiment 2), might have been due to an imprecise spatial localization of
the cue. It should be noted that, in our paradigm, the presentation of the cue at a given location is followed by the presentation of six stimuli, and the participant did not know in advance which of them would be probed in the memory test array. This implies a one-to-multiple relation between cue and stimulus locations in our task, as compared to the one-to-one relation of typical cuing paradigms. One might argue that in perceptual tasks, the integration of the cue and the target within the same object file represents a more obvious process than in the VSWM task used here, as it implies the association of only two spatio-temporally contingent stimuli.
The present study also revealed a larger effect of peripheral
multisensory cues than unimodal visual cues on VSWM. A crucial
point is that while neither the auditory nor the enlarged visual cue
(Experiments 2, 3, and 4) was able to elicit VSWM biases when
presented alone, the combination of both cues elicited a significant
bias on VSWM performance. We argued that this might be the
consequence of multisensory integration processes taking place
between the two spatial cues. This is motivated by at least two
main reasons highlighted by the results of Experiments 3 and 4.
First, it seems that the auditory component of the audiovisual cues
has to be lateralized to be effective, since no effects were observed
when ambiguous central auditory stimuli were used. Second, an
increased perceptual saliency of the visual cue (i.e., a thicker visual
cue) was not sufficient to elicit any bias in VSWM, thus indicating
that our multisensory effect required the integration between two
unimodal components to be effective.
The special role observed here for audiovisual cues in biasing
access into VSWM is in good agreement with other recent data
showing a special role for multisensory (audiovisual) cues. For
instance, multisensory spatial cues have proved to be more
resistant to the disruption elicited by increased perceptual load (see
Santangelo & Spence, 2008, for a review). According to the
perceptual load theory (e.g., Lavie, 2005), participants’ perceptual
resources are necessarily and unavoidably used to process stimuli
until they have run out. Based on this hypothesis, Santangelo and
Spence (2007) assessed attentional capture effects following audi-
tory, visual, and audiovisual (multisensory) exogenous cues under
conditions of no load versus high perceptual load. Their results
showed that all kinds of cue types attracted selective visual atten-
tion in the no load perceptual condition, while only multisensory
cues were able to capture attention in the high load condition, thus
suggesting that multisensory integration might be unique in dis-
engaging spatial attention from a simultaneous perceptually-
demanding task (see also Santangelo, Ho, & Spence, 2008).
Our findings, consistent with Santangelo and Spence’s (2007)
results, might be taken to imply that multisensory stimuli are more
effective in biasing behavioral performance than unimodal stimuli
under conditions of high memory load (although Santangelo and
Spence’s results were based on increased ‘perceptual’ load). Indeed,
in the experiments in which we used multisensory peripheral cues (i.e., Experiments 2 and 3), the memory array always consisted of six squares, which widely exceeded the capacity of VSWM maintenance and thus imposed a high memory load. Under such conditions, multisensory stimuli might represent particularly salient stimuli (as a consequence of multisensory integration processes taking place between auditory and visual components), which might be particularly effective under highly demanding conditions (Santangelo & Spence, 2007; see also Mastroberardino, Santangelo, Botta, Marucci, & Olivetti Belardinelli, 2008, for a review), such as in a supra-span VSWM task. Obviously, a more direct demonstration is required. Future studies could address this issue by varying, for example, the memory load for multisensory cues exactly as we did in the case of unimodal cues.
Although our results suggest that the components of the audiovisual cue need to be localizable for boosting the attentional weight of the visual event, many studies have shown that multisensory integration can occur independently of the cue and target locations (Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008; Dalton & Spence, 2007; Vroomen & de Gelder, 2000). We speculate that the difference between our results and other studies showing that spatial alignment is not a necessary condition for multisensory integration (see also Woods & Recanzone, 2004) is related to the spatial precision of the to-be-cued location. Actually, in our task, the visual component
signaled almost a whole hemifield. Consequently, in our experiments,
the multisensory integration process involved two spatially “impre-
cise” stimuli given by visual and auditory components of the audio-
visual cue. For this reason, we suggest that, in our paradigm, the
spatial congruence between the two sensory components represents a
necessary condition to increase the location precision and, accordingly, to allow the multisensory cuing effect. In fact, some studies indicated that the outcome of multisensory integration depends on the relative uncertainty in the individual sensory domains (Alais & Burr, 2004; Heron, Whitaker, & McGraw, 2004; Battaglia, Jacobs, & Aslin, 2003; Ernst & Banks, 2002; Sanabria, Soto-Faraco, & Spence, 2005). For instance, Heron et al. (2004) showed that when the visual location corresponded to a small target size, auditory signals had little or no influence on perceived visual location. However, with increasing visual uncertainty (larger target sizes), auditory signals exerted a significantly greater influence on perceived visual location. Nonetheless, it must be said that our results are not clear enough to draw firm conclusions regarding this point. Further research is needed to clarify this issue.
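The reliability-weighted account of integration cited above (Alais & Burr, 2004; Ernst & Banks, 2002) can be sketched as inverse-variance weighting of the two spatial estimates; the means and standard deviations below are purely illustrative numbers, not measured values:

```python
def integrate(mu_v, sigma_v, mu_a, sigma_a):
    """Maximum-likelihood cue combination: each signal is weighted by its
    reliability (1 / variance), and the fused estimate is more precise
    than either single cue."""
    w_v, w_a = 1.0 / sigma_v**2, 1.0 / sigma_a**2
    mu = (w_v * mu_v + w_a * mu_a) / (w_v + w_a)       # fused location
    sigma = (1.0 / (w_v + w_a)) ** 0.5                 # fused uncertainty
    return mu, sigma

# A precise visual signal (small cue): audition barely shifts the estimate
loc_precise, _ = integrate(mu_v=0.0, sigma_v=1.0, mu_a=10.0, sigma_a=5.0)
# An imprecise visual signal (large cue / broad focus): audition pulls harder
loc_blurry, _ = integrate(mu_v=0.0, sigma_v=5.0, mu_a=10.0, sigma_a=5.0)
```

This reproduces Heron et al.'s (2004) pattern qualitatively: the blurrier the visual signal, the larger the auditory influence on the fused location estimate.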
In sum, our results confirm previous evidence that visual peripheral cues modulate VSWM. More important, they indicate that while multisensory stimuli are particularly efficient in biasing information access into VSWM, auditory stimuli seem to be totally ineffective, in contrast with simple RT detection and discrimination crossmodal attention tasks. It will be a matter for future research to elucidate the reason for the unexpected absence of auditory attentional bias in VSWM and the optimal conditions for multimodal stimuli to bias VSWM performance.
References

Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-
optimal bimodal integration. Current Biology, 14, 257–262.
Awh, E., & Jonides, J. (2001). Overlapping mechanism of attention and
spatial working memory. Trends in Cognitive Sciences, 5, 119–126.
Awh, E., Jonides, J., Smith, E. E., Buxton, R. B., Frank, L. R., Love, T.,
. . . Wong, E. C. (1999). Rehearsal in spatial working memory: Evidence
from neuroimaging. Psychological Science, 10, 433–437.
Awh, E., Smith, E. E., & Jonides (1995). Human rehearsal processes and
the frontal lobes: PET evidence. In J. Grafman, K. Holyoak, & F. Boiler
(Eds.), Annals of the New York Academy of Sciences, Vol. 769. Structure
and functions of the human prefrontal cortex (pp. 97–119). New York:
New York Academy of Sciences.
Awh, E., Vogel, E., & Oh, S. H. (2006). Interactions between attention and
working memory. Neuroscience, 139, 201–208.
Baddeley, A., & Hitch, G. (1974). Working memory. In G. Bower (Ed.), The psychology of learning and motivation (pp. 47–89). New York: Academic Press.
Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America A, 20, 1391–1397.
Belopolsky, A. V., Zwaan, L., Theeuwes, J., & Kramer, A. F. (2007). The
size of an attentional window modulates attentional capture by color
singletons. Psychonomic Bulletin & Review, 14, 934–938.
Botta, F., Santangelo, V., Raffone, A., Lupiáñez, J., & Olivetti Belardinelli,
M. (2010). Exogenous and endogenous spatial attention effects on
visuo-spatial working memory. Quarterly Journal of Experimental Psy-
chology, 27, 1–13.
Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97, 523–547.
Burr, D., & Alais, D. (2006). Combining visual and auditory information.
Progress in Brain Research, 155, 243–258.
Calvert, G. A., Spence, C., & Stein, B. E. (Eds.). (2004). The handbook of
multisensory processes. Cambridge, MA: MIT Press.
Castiello, U., & Umiltà, C. (1992). Splitting focal attention. Journal of Experimental Psychology: Human Perception & Performance, 18, 837–848.
Corsi, P. M. (1972). Human memory and the medial temporal region of the brain (Unpublished doctoral dissertation). McGill University, Montreal, Canada.
Dalton, P., & Spence, C. (2007). Attentional capture in serial audiovisual
search tasks. Perception & Psychophysics, 69, 422–438.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual
attention. Annual Review of Neuroscience, 18, 193–222.
Driver, J. (2001). A selective review of selective attention research from
the past century. British Journal of Psychology, 92, 53–78.
Driver, J., & Spence, C. (1998). Crossmodal links in spatial attention.
Proceedings of the Royal Society of London Series B, 353, 1–13.
Eimer, M. (2004). Electrophysiology of human crossmodal spatial atten-
tion. In C. Spence, & J. Driver (Eds.), Crossmodal space and cross-
modal attention (pp. 221–245). Oxford, UK: Oxford University Press.
Eimer, M., & Schröger, E. (1998). ERP effects of intermodal attention and
cross-modal links in spatial attention. Psychophysiology, 35, 313–327.
Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and
around the field of focal attention: A zoom lens model. Perception &
Psychophysics, 40, 225–240.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic
information in a statistically optimal fashion. Nature, 415, 429–433.
Graziano, M. S., & Gross, C. G. (1995). The representation of extraper-
sonal space: A possible role for bimodal, visuo–tactile neurons. In M. S.
Gazzaniga (Ed.), The Cognitive Neurosciences (pp. 1021–1034). Cam-
bridge, MA: MIT Press.
Griffin, I. C., & Nobre, A. C. (2003). Orienting attention to locations in internal representations. Journal of Cognitive Neuroscience, 15, 1176–1194.
Handy, T., Kingstone, A., & Mangun, G. R. (1996). Spatial distribution of
visual attention: Perceptual sensitivity and response latency. Perception
& Psychophysics, 58, 613–627.
Heron, J., Whitaker, D., & McGraw, P. V. (2004). Sensory uncertainty
governs the extent of audiovisual interaction. Vision Research, 44,
Herrero, J. L., Nikolaev, A. R., Raffone, A., & van Leeuwen, C. (2009).
Selective attention in visual short-term memory consolidation. Neurore-
port, 20, 652–656.
Ho, C., Santangelo, V., & Spence, C. (2009). Multisensory warning sig-
nals: When spatial correspondence matters. Experimental Brain Re-
search, 195, 261–272.
Hollingworth, A., & Rasmussen, I. P. (2010). Binding objects to locations:
The relationship between object files and visual working memory.
Journal of Experimental Psychology: Human Perception & Perfor-
mance, 36, 543–564.
Holmes, N. P., & Spence, C. (2005). Multisensory integration: Space, time
and superadditivity. Current Biology, 15, 762–764.
James, W. (1890). The principles of psychology. Cambridge, MA:
Harvard University Press.
Jonides, J. (1981). Voluntary vs. automatic control over the mind’s eye’s
movement. In J. B. Long, & A. D. Baddeley (Eds.), Attention and
performance IX (pp. 187–203). Hillsdale, NJ: Erlbaum.
Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of
object files: Object-specific integration of information. Cognitive Psy-
chology, 24, 175–219.
Koelewijn, T., Bronkhorst, A., & Theeuwes, J. (2009). Competition be-
tween auditory and visual spatial cues during visual task performance.
Experimental Brain Research, 195, 593–602.
Lavie, N. (2005). Distracted and confused?: Selective attention under load.
Trends in Cognitive Sciences, 9, 76–82.
Lu, Z. L., & Dosher, B. (1998). External noise distinguishes attention
mechanisms. Vision Research, 38, 1183–1198.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory
for features and conjunctions. Nature, 390, 279–281.
Lupiáñez, J., Ruz, M., Funes, M. J., & Milliken, B. (2007). The manifes-
tation of attentional capture: Facilitation or IOR depending on task
demands. Psychological Research, 71, 77–91.
Macaluso, E., & Driver, J. (2005). Multisensory spatial interactions: A
window onto functional integration in the human brain. Trends in
Neurosciences, 28, 264–271.
Macaluso, E., Frith, C. D., & Driver, J. (2002). Crossmodal spatial influ-
ences of touch on extrastriate visual areas take current gaze-direction
into account. Neuron, 34, 647–658.
Mastroberardino, S., Santangelo, V., Botta, F., Marucci, F. S., & Olivetti
Belardinelli, M. (2008). How the bimodal format of presentation affects
working memory: An overview. Cognitive Processing, 9, 69–76.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Exper-
imental Psychology, 32, 3–25.
Posner, M. I., & Snyder, C. R. R. (1975). Attention and cognitive control.
In R. Solso (Ed.), Information processing and cognition: The Loyola
Symposium (pp. 55–85). Hillsdale, NJ: Erlbaum.
Postle, B. R., Awh, E., Jonides, J., Smith, E. E., & D’Esposito, M. (2004).
The where and how of attention-based rehearsal in spatial working
memory. Cognitive Brain Research, 20, 194–205.
Prinzmetal, W., Amiri, H., Allen, K., & Edwards, T. (1998). The phenom-
enology of attention: I. Color, location, orientation, and “clarity.” Jour-
nal of Experimental Psychology: Human Perception and Performance, 24, 261–282.
Prinzmetal, W., McCool, C., & Park, S. (2005). Attention: Reaction time
and accuracy reveal different mechanisms. Journal of Experimental
Psychology: General, 134, 73–92.
Prinzmetal, W., Park, S., & Garrett, R. (2005). Involuntary attention and
identification accuracy. Perception & Psychophysics, 67, 1344–1353.
Prinzmetal, W., Presti, D. E., & Posner, M. I. (1986). Does attention affect
visual feature integration? Journal of Experimental Psychology: Human
Perception & Performance, 12, 361–369.
Sanabria, D., Soto-Faraco, S., & Spence, C. (2005). Spatiotemporal inter-
actions between audition and touch depend on hand posture. Experimen-
tal Brain Research, 165, 505–514.
Santangelo, V., Ho, C., & Spence, C. (2008). Capturing spatial attention
with multisensory cues. Psychonomic Bulletin & Review, 15, 398–403.
Santangelo, V., & Spence, C. (2007). Multisensory cues capture spatial
attention regardless of perceptual load. Journal of Experimental Psychology: Human Perception and Performance, 33, 1311–1321.
Santangelo, V., & Spence, C. (2008). Is the exogenous orienting of spatial
attention truly automatic? Evidence from unimodal and multisensory
studies. Consciousness and Cognition, 17, 989–1015.
Schmidt, B. K., Vogel, E. K., Woodman, G. F., & Luck, S. J. (2002).
Voluntary and automatic attentional control of visual working memory.
Perception & Psychophysics, 64, 754–763.
Smyth, M. M., & Scholey, K. A. (1994). Interference in immediate spatial
memory. Memory & Cognition, 22, 1–13.
Spence, C. (2010). Crossmodal spatial attention. Annals of the New York Academy of Sciences, 1191, 182–200.
Spence, C., & Driver, J. (1997). Audiovisual links in exogenous covert
spatial orienting. Perception & Psychophysics, 59, 1–22.
Spence, C., & Driver, J. (1999). A new approach to the design of multimodal warning signals. In D. Harris (Ed.), Engineering psychology and cognitive ergonomics, Vol. 4: Job design, product design and human-computer interaction (pp. 455–461). Hampshire: Ashgate Publishing.
Spence, C., & Driver, J. (Eds.). (2004). Crossmodal space and crossmodal
attention. Oxford: Oxford University Press.
Spence, C., & Santangelo, V. (2009). Capturing spatial attention with
multisensory cues: A review. Hearing Research, 258, 134–142.
Stein, B. E., & Meredith, M. A. (Eds.). (1993). The merging of the senses.
Cambridge, MA: MIT Press.
Stein, B. E., Meredith, M. A., & Wallace, M. T. (1993). The visually
responsive neuron and beyond: Multisensory integration in cat and
monkey. Progress in Brain Research, 95, 79–90.
Treisman, A. M. (1988). Features and objects: The fourteenth Bartlett
memorial lecture. Quarterly Journal of Experimental Psychology, 40A,
Turatto, M., Benso, F., Facoetti, A., Galfano, G., Mascetti, G. G., & Umiltà, C. (2000). Automatic and voluntary focusing of attention. Perception & Psychophysics, 62, 935–952.
Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J.
(2008). Audiovisual events capture attention: Evidence from temporal
order judgements. Journal of Vision, 8, 1–10.
Vogel, E. K., Woodman, G. F., & Luck, S. J. (2001). Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception & Performance, 27, 92–114.
Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception:
Cross-modal effects of auditory organization on vision. Journal of
Experimental Psychology: Human Perception and Performance, 26,
Wallace, M. T., Meredith, M. A., & Stein, B. E. (1998). Multisensory integration in the superior colliculus of the alert cat. Journal of Neurophysiology, 80, 1006–1010.
Woods, T. M., & Recanzone, G. H. (2004). Multimodal interactions
evidenced by the ventriloquism effect in humans and monkeys. In C.
Spence, G. Calvert, & B. E. Stein (Eds.), Handbook of multisensory
processes (pp. 35–48). Cambridge, MA: MIT Press.
Zimmer, U., & Macaluso, E. (2007). Processing of multisensory spatial congruency can be dissociated from working memory and visuo-spatial attention. European Journal of Neuroscience, 26, 1681–1691.
Received December 1, 2009
Revision received January 18, 2011
Accepted January 30, 2011