NeuroImage xxx (xxxx) 119285
Sound-modulations of visual motion perception implicate the cortico-vestibular brain
Dorita H.F. Chang a,⁎, David Thinnes b,c, Pak Yam Au a, Danilo Maziero d, Victor Andrew Stenger d,
Scott Sinnett b, Jonas Vibell b,⁎
a Department of Psychology and The State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong
b Department of Psychology, University of Hawaiʻi at Mānoa, Hawaii, USA
c Systems Neuroscience & Neurotechnology Unit, Faculty of Medicine, Saarland University & HTW Saar, Germany
d MR Research Program, Department of Medicine, John A. Burns School of Medicine, University of Hawai'i, HI, USA
Keywords: audiovisual bounce-inducing effect
A widely used example of the intricate (yet poorly understood) intertwining of multisensory signals in the brain is the audiovisual bounce-inducing effect (ABE). In this effect, two identical objects move along the azimuth with uniform motion in opposite directions. The perceptual interpretation of the motion is ambiguous and is modulated if a transient (sound) is presented in coincidence with the point of overlap of the two objects’ motion trajectories. This phenomenon has long been written off as reflecting simple attentional or decision-making mechanisms, although the neurological underpinnings of the effect are not well understood. Using behavioural metrics concurrently with event-related fMRI, we show that sound-induced modulations of motion perception can be further modulated by changing the motion dynamics of the visual targets. The phenomenon engages the posterior parietal cortex and the parieto-insular-vestibular cortical complex, with a close correspondence between activity in these regions and behaviour. These findings suggest that the insular cortex is engaged in deriving a probabilistic perceptual solution through the integration of multisensory data.
1. Introduction

Seemingly simple perception belies its complexity. A bouncing ball or a person talking produces both auditory and visual information that must follow specific rules for us to perceive it as coherent. In one well-known (but poorly understood) demonstration of audio-visual integration, two identical disks move towards and across each other. As they meet, they can be perceived either as streaming through or as bouncing away from each other. Perceptual interpretations shift in accordance with whether a transient (typically, a sound) is presented at the temporal point of visual overlap of the disks (Fujisaki et al., 2004; Grassi and Casco, 2010, 2009; Sekuler and Sekuler, 1999; Sekuler et al., 1997; Watanabe and Shimojo, 2001; Zhou et al., 2007). This phenomenon is popularly referred to as the Audiovisual Bounce-Inducing Effect (ABE) – an extension of Metzger's (1934) original motion-grouping demonstration.
The mechanistic bases of the ABE are unclear, though various explanations favouring attentional (Kawabe and Miura, 2006; Kawachi and Gyoba, 2013; Watanabe and Shimojo, 1998), response-inference, or sensory-perceptual accounts have been offered. Attentional explanations posit that under Metzger-like (no-sound) presentations, intact attention allows the integration of the discs’ local motion signals at overlap, producing a streaming percept. This integration is disrupted by adding a transient sound (Watanabe and Shimojo, 1998), leading to greater bounce percepts. Still, the work of Grassi and Casco (2009, 2010) points to something more than mere attentional effects, showing additional perceptual modulations based on the sound envelope. Decision-making explanations posit that multisensory data are integrated probabilistically to infer the distal stimulus from the ambiguous proximal stimulus (Sekuler and Sekuler, 1999). Finally, other explanations offer that ABE interpretations can implicate earlier stages of perceptual processing, where the temporal integration of sensory signals can be altered by V1-suited features (i.e., orientation; Kawabe and Miura, 2006).
Accompanying the limited theoretical understanding of the ABE is
an even more limited collection of neuroscience data. In one fMRI study
(Bushara et al., 2003), ‘bounce’ interpretations were accompanied by
greater activity in the insula, prefrontal cortex, posterior parietal cortex
(PPC), thalamus, superior colliculus, and the cerebellar vermis, relative to ‘stream’ interpretations.

Received 7 March 2022; Received in revised form 20 April 2022; Accepted 5 May 2022

Magnetoencephalography (MEG) work has
revealed ‘bounce’ interpretations to be associated with early activity in middle frontal regions, followed by activity in the superior parietal lobule (Zvyagintsev et al., 2011). Electroencephalography (EEG) studies have investigated the role of oscillatory coherence and latencies associated with cross-modal interactions in the ABE (Hipp et al., 2011; Zhao et al., 2020, 2018, 2017). Finally, in one transcranial magnetic stimulation (TMS) study, Maniglia et al. (2012) found that offline TMS of the right PPC decreased bounce percepts.
Here, we sought to understand multisensory binding in the human brain by introducing a variation of the classic ABE stimuli to probe the involvement of a tertiary system that may have been overlooked: the visuo-vestibular system. Previous engagement of the insular cortex during the ABE has been explained away as reflecting a node for ‘multisensory’ processing (along with putatively comparable multisensory roles for the middle and prefrontal cortex) – but such accounts stop short of speculating on the phenomenon's potential integration with the vestibular system. This is puzzling, given the prominent role of this region in vestibular function. Here, we ask whether altering the visual motion profiles of the discs to move consistently or inconsistently with patterns expected in a visual world governed by the laws of gravity could further elicit perceptual and neural modulations during the presentation of ABE events. Motivation for this work comes from studies showing that the visual presentation of gravitationally-consistent motion elicits activity in the cortical vestibular system (Indovina et al., 2013, 2005). We reasoned that while we are not directly stimulating vestibular function, altering visual motion dynamics may be a good conduit for understanding the nature of insular involvement in audiovisual integration. In Experiment 1, we presented audiovisual events while manipulating their motion profiles such that they moved upwards/downwards diagonally while decelerating, accelerating, or moving at constant velocity. We observed that, consistent with the classic work (Sekuler et al., 1997), the presence of the auditory transient shifted perceptual interpretations towards a greater number of ‘bounce’ responses. For downwards (but not upwards) moving events, percepts additionally changed with changes in motion profile. To better understand the mechanistic origins of the motion-profile-based modulations of multisensory percepts, and to increase statistical power, in Experiment 2 we presented only the downwards-moving events while fMRI responses were concurrently imaged.
2. Materials and methods

2.1. Participants

Thirty subjects (14 males; mean age = 22.1 years; all right-handed) were tested in the initial behavioural experiment. An independent group of 22 subjects (13 males; mean age = 28.4 years; all right-handed except one, who was ambidextrous) was tested in the fMRI experiment. Sample sizes were determined using a power (0.95) determination based on large effect sizes (d > 0.9; N = 7) retrieved from prior fMRI work involving the ABE, for comparisons of ‘bounce’ versus ‘stream’ univariate responses in the brain (Bushara et al., 2003). All participants had normal or corrected-to-normal vision and provided written informed consent in line with ethical review and approval of the work by the Human Research Ethics Committee of The University of Hong Kong and the Internal Review Board of the University of Hawaii at Manoa. MR participants were additionally screened for MRI contraindications.
2.2. Apparatus, stimulus and task
One audiovisual event lasted 1 second. At the start of each event,
two identical white disks, each with an angular size of 0.5 deg (50.18
cd/m2) were presented at the top or bottom of the screen with an initial
horizontal separation of 6.9 deg. These two discs moved towards the di-
agonally-opposite corner, hence having trajectories that would inter-
sect at the screen center. The event terminated once the two discs
reached the diagonally-opposing corners of the screen with a lateral
separation of 6.9 deg. Hence, the full event subtended 6.9 × 6.9 deg.
Depending on the particular condition, the discs carried one of deceler-
ating (-13.8 deg/s2), accelerating (13.8 deg/s2), or constant velocity
motion (6.9 deg/s). Therefore, the average velocity of the disks was
held at 6.9 deg/s across all conditions and one event lasted 1 sec. Also
depending on the particular trial condition (i.e., sound present trials), a
brief auditory click (10 ms, 440 Hz) was presented in coincidence with
the point of the discs’ visual intersection (with sound onset synced to
the first frame of visual intersect).
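The three profiles can be checked against the stated 6.9 deg/s average velocity: with a = ±13.8 deg/s² and a fixed 6.9 deg traversal over the 1 s event, the accelerating profile must start from rest and the decelerating profile must end at rest. A minimal sketch (in Python rather than the authors' Matlab; `position` is a hypothetical helper):

```python
# Kinematics implied above (hypothetical helper, not the authors' Matlab
# code): each profile traverses 6.9 deg in the 1 s event, so the average
# velocity is 6.9 deg/s in every condition.
def position(t, v0, a):
    """Distance (deg) travelled by time t under constant acceleration a."""
    return v0 * t + 0.5 * a * t ** 2

T, TRAVEL = 1.0, 6.9
assert abs(position(T, 6.9, 0.0) - TRAVEL) < 1e-9     # constant velocity
assert abs(position(T, 0.0, 13.8) - TRAVEL) < 1e-9    # accelerating: starts at rest
assert abs(position(T, 13.8, -13.8) - TRAVEL) < 1e-9  # decelerating: ends at rest
assert abs(13.8 - 13.8 * T) < 1e-9                    # terminal velocity of 0
```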
For the behavioural experiment, stimulus configurations constituted
all possible combinations of two motion directions (upwards, down-
wards), three motion profiles (decelerating, constant velocity, acceler-
ating), and sound option (present, absent). Participants viewed the
stimuli while seated upright at a distance of 50 cm as maintained by a
head rest. For the MR experiment, all stimuli were downwards moving
in one of the three motion profiles, and all were presented with the
sound. Participants viewed the stimuli while lying supine in the bore.
Stimuli were generated on a PC using custom software written in
Matlab (The Mathworks Inc., Natick, USA) using extensions from Psy-
chtoolbox (Brainard, 1997;Pelli, 1997). For the behavioural experi-
ment, stimuli were presented on a 24-inch LCD display (1920 × 1080
resolution; 120 Hz). For the MRI experiment, stimuli were back-
projected via a mirror, onto a small projection screen (143 × 80.5 mm)
by an LCD projector (model AV 0611, Avotec LLC, Florida, USA) with
1,280 × 1,024 pixel resolution and custom-built optics. Sounds were
delivered through tympanic headphones powered by a graphic equalizer (Model Q2031B, Yamaha, LLC, Hamamatsu, Japan).
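For reference, a stimulus size in degrees of visual angle converts to on-screen pixels from the viewing geometry; a small sketch assuming a hypothetical 53 cm-wide panel (the physical screen width is not reported here):

```python
import math

def deg_to_px(deg, view_cm, screen_w_cm, screen_w_px):
    """Convert a size in degrees of visual angle to on-screen pixels."""
    size_cm = 2.0 * view_cm * math.tan(math.radians(deg) / 2.0)
    return size_cm * (screen_w_px / screen_w_cm)

# A 0.5 deg disc at the 50 cm viewing distance, on a hypothetical
# 53 cm-wide 1920-pixel panel (the panel width is an assumption):
px = deg_to_px(0.5, 50.0, 53.0, 1920)
assert 15 < px < 17   # ~0.44 cm, i.e. roughly 16 pixels on such a panel
```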
2.3. Experimental design and statistical analysis (behavioural experiment)
2.3.1. Design and procedures
Prior to the experiment proper, participants were provided with 6
practice trials to familiarize them with the task. The experiment proper
consisted of 4 runs, comprising separately, upwards motion/sound pre-
sent, upwards motion/sound absent, downwards motion/sound pre-
sent, and downwards motion/sound absent configurations. Each run
consisted of 120 trials: 40 repetitions of each of the three
types of motion dynamics (i.e., deceleration, constant velocity, acceler-
ation). On each trial, a single audiovisual event was presented (as de-
scribed above), each lasting 1 sec, followed by a response prompt. Par-
ticipants were instructed that each event (trial) would consist of two
discs in motion. On some trials, they may perceive the discs to collide
and bounce off each other. On other trials, they may perceive the discs
to stream past each other. No further information regarding the stimulus manipulations (i.e., motion profiles) or the expected bounce/stream outcomes was provided. Participants were asked to indicate
their percept (bouncing off of- or streaming through- each other) via a
left/right keypress with key-correspondences counterbalanced across
participants. Participants were allotted a maximum duration of 3 secs
to respond, after which the trial timed out and the next trial was pre-
sented. Trial (and run) order was randomized. Completion of the full
behavioural task took approximately 30 minutes and breaks were per-
mitted in between runs.
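The per-run trial composition described above (120 trials; 40 randomized repetitions of each motion profile) can be sketched as follows; `make_run` is a hypothetical helper, not the authors' Matlab code:

```python
import random

def make_run(profiles=("decel", "const", "accel"), reps=40, seed=None):
    """Randomized trial order for one run: `reps` repetitions per motion profile."""
    rng = random.Random(seed)
    trials = [p for p in profiles for _ in range(reps)]
    rng.shuffle(trials)
    return trials

run = make_run(seed=1)
assert len(run) == 120
assert all(run.count(p) == 40 for p in ("decel", "const", "accel"))
```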
2.3.2. Behavioural data analysis
Behavioural performance was quantified in terms of the proportion of ‘bounce’ responses. As response input on each trial was not permitted until after stimulus offset, reaction-time data were not considered informative and were not analyzed. Responses were analyzed with a repeated-measures analysis of variance (ANOVA) comparing motion direction (upwards/downwards, with respect to the retina), motion profile (decelerating, constant velocity, accelerating), and sound (present/absent). Data were verified to satisfy parametric assumptions, and any variance (sphericity) violations were addressed with Greenhouse-Geisser corrections. Follow-up post-hoc comparisons were conducted by means of t-tests corrected by Bonferroni adjustments (i.e., here, by correcting the threshold by the number of tests).
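The correction scheme described here adjusts the significance threshold rather than the p values themselves; a minimal sketch (`bonferroni_alpha` is a hypothetical helper):

```python
def bonferroni_alpha(alpha=0.05, n_tests=3):
    """Per-comparison threshold when correcting by the number of tests."""
    return alpha / n_tests

# Three pairwise motion-profile comparisons -> corrected threshold ~.0167,
# so an uncorrected p of .02 would not survive the triple adjustment:
thr = bonferroni_alpha(0.05, 3)
assert abs(thr - 0.05 / 3) < 1e-12
assert 0.02 > thr
```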
2.4. Experimental Design and Statistical Analysis (fMRI Experiment)
2.4.1. fMRI acquisition
Imaging data were acquired on a 3 Tesla Magnetom Prisma
(Siemens, Erlangen, Germany) at the University of Hawaii using a 64-channel phased-array head coil. Each scanning session began with the
acquisition of a high-resolution MPRAGE anatomical sequence
(TR = 2300 ms, TE = 2.98 ms, TI = 900 ms, 1 mm isotropic voxels,
256 mm FOV). Blood oxygen level-dependent (BOLD) signals were
measured with a gradient-recalled-echo-planar (EPI) sequence
(3 × 3 × 3 mm; TR 2000 ms; TE 28 ms; 32 slices; 178 volumes).
2.4.2. fMRI design and procedures
Stimulus events were presented in an event-related design. All
events were presented with the sound at the point of disc intersection
and were downwards-moving. We elected not to include the no-sound
variants of the events, nor upwards-moving variants of the sound-
present events in order to reduce run variability, and to increase power
for the conditions of interest. Notably, results from our behavioural
testing indicated that for upwards moving events, perceptual outcomes
are independent of motion profiles.
Each fMRI run began and ended with a 10 second fixation period. In
between, events (trials) comprised a 1 second stimulus (as described
above) or null (fixation only; 1 second) presentation, followed by a 5
(half of the trials) or 7 second response period during which a central
fixation was re-presented (Fig. 1A). On each trial, participants were
asked to indicate whether they perceived the two depicted discs to be
bouncing off of- or streaming through- each other via an MR-
compatible button box (with button-response assignments counterbal-
anced across participants). Responses were permitted for a maximum of
5 (or 7) seconds after which the next trial was presented. One particular
fMRI run consisted of four event types (downwards accelerating; down-
wards decelerating; downwards constant velocity motion; null-fixation)
repeated 12 times and presented in random order, for a total of 48 trials
per run. Each run lasted a total of ∼6 minutes and participants com-
pleted a total of 6 runs (288 total trials). Completion of the full scan ses-
sion took approximately 60 minutes.
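The timing described above can be sanity-checked; this sketch assumes the 5 s and 7 s response periods are split evenly across the 48 trials (as stated):

```python
# Sanity check of the run timing described above.
n_trials = 48                    # 4 event types x 12 repetitions
stimulus_s = 1
response_s = [5] * (n_trials // 2) + [7] * (n_trials // 2)
fixation_s = 10 + 10             # opening and closing fixation periods

run_s = fixation_s + n_trials * stimulus_s + sum(response_s)
assert run_s == 356              # ~6 minutes per run, as reported
assert 6 * n_trials == 288       # 6 runs -> 288 total trials
```

Note that this run length is also consistent with the three-cycles-per-run high-pass cut-off used in preprocessing (3/356 s ≈ .008 Hz).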
2.4.3. fMRI data analysis
fMRI data were analysed with BrainVoyager v22 (BrainInnovation,
B.V., Maastricht, Netherlands). Anatomical data of each observer were
used for cortex reconstruction, inflation, and flattening. Functional
(EPI) data were preprocessed using three-dimensional motion correc-
tion (three translations, three rotations), slice-time correction (with cu-
bic-spline interpolation), spatial smoothing (Gaussian filter, full-width
at half maximum, 5 mm; for General Linear Model –GLM), linear trend
removal, and high-pass filtering (three cycles per run cut-off; .008 Hz).
EPI data were then aligned to each observer's anatomical image and
transformed into Talairach space. Volumes were aligned to the first vol-
ume of the first run of each session.
The functional data were then analyzed in terms of their univariate
activity and multivariate responses (multivoxel pattern analysis,
MVPA). Univariate responses were estimated with a deconvolution [Fi-
nite Impulse Response (FIR)] model that also takes into account six mo-
tion regressors (three translation parameters, in millimeters; and three
rotation parameters, pitch, roll, yaw; in degrees). Unlike standard box-
car models, the FIR generates, for each non-baseline condition, a set of
predictors coding separate time intervals with systematic delays in rela-
tion to condition onset. For each condition here, we used 7 delays, and computed contrasts based on linear combinations of the middle three estimates (i.e., delays 3-5).

Fig. 1. (a) Schematic depicting one run in the MR protocol. A particular run began and ended with a 10 second fixation period. In between, events (trials) comprised a 1 second stimulus or null (fixation) presentation, followed by a 5 or 7 second response period during which a central fixation was re-presented. (b) Behavioural data from Experiment 1, indexed in terms of the proportion of bounce responses for each of the motion directions and profiles (deceleration, constant velocity, acceleration), presented separately for the auditory transient/cue present and absent conditions. Error bars represent +/- 1 standard error of the mean. (c) Behavioural replication from the MR component. Only downwards stimuli with transient sounds at disc coincidence were presented. Error bars represent +/- 1 standard error of the mean.
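The FIR approach described above can be illustrated with a toy design matrix (numpy; synthetic onsets; not the BrainVoyager implementation): each condition contributes one "stick" predictor per delay, and contrasts then combine the middle delays.

```python
import numpy as np

def fir_design(onsets, n_vols, n_delays=7):
    """FIR design matrix for one condition: column d is 1 at d volumes
    after each event onset (one 'stick' predictor per delay)."""
    X = np.zeros((n_vols, n_delays))
    for d in range(n_delays):
        for t in onsets:
            if t + d < n_vols:
                X[t + d, d] = 1.0
    return X

# Two synthetic events in a 20-volume run:
X = fir_design(onsets=[3, 11], n_vols=20, n_delays=7)
assert X.shape == (20, 7)
assert X[3, 0] == 1.0 and X[5, 2] == 1.0 and X[13, 2] == 1.0
# A 'middle delays' contrast then combines columns 2-4 (delays 3-5,
# counting from 1), e.g. by averaging their parameter estimates.
middle = X[:, 2:5].mean(axis=1)
assert middle.shape == (20,)
```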
We performed two sets of contrasts: In the first set of contrasts, we
estimated responses based on the stimulus (event type) presented only
(e.g., decelerating). In the second set of contrasts, we recomputed responses after dividing trials based on user-entered percepts (e.g., decelerating stimulus; “bounce”).
In order to identify regions of interest (ROIs) from which to extract
individual condition beta-weights and multivariate patterns, we first
split the data of every subject into two sets: an initial set comprising the
first run of every subject, and a second set comprising all remaining
runs. ROIs were defined as spherical ROIs (5 mm radius) centered on
the centroid of the significant clusters (qFDR < .05), as identified from
a random effects GLM using the initial set of data. Subsequent analyses
were computed by using the second, independent set of data. This ap-
proach of using spherical ROIs centered on the centroid rather than iso-
lating clusters/significant voxels identified from the initial contrasts en-
sured that we were able to identify ROIs relevant to the task at hand,
while similarly minimizing circularity issues (Kriegeskorte et al., 2010).
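Sphere-based ROI selection of this kind reduces to a distance threshold on voxel coordinates; a toy sketch on a 3 mm grid (hypothetical helper, not BrainVoyager's implementation):

```python
import numpy as np

def spherical_roi(coords_mm, center_mm, radius_mm=5.0):
    """Boolean mask over voxels whose centers lie within radius_mm of a centroid."""
    d = np.linalg.norm(coords_mm - np.asarray(center_mm, float), axis=1)
    return d <= radius_mm

# Toy 3 mm grid of 27 voxel centers around a cluster centroid at the origin:
grid = np.array([[x, y, z] for x in (-3, 0, 3)
                           for y in (-3, 0, 3)
                           for z in (-3, 0, 3)], dtype=float)
mask = spherical_roi(grid, center_mm=(0, 0, 0), radius_mm=5.0)
# The centre voxel, its 6 face neighbours (3 mm) and 12 edge neighbours
# (~4.24 mm) fall inside; the 8 corner voxels (~5.20 mm) do not.
assert mask.sum() == 19
```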
Finally, for the multivariate classifications, we chose to use a linear
SVM classifier (libSVM) (Chang, 2011) together with a multivariate fea-
ture selection algorithm, Recursive Feature Elimination (RFE) to esti-
mate spatial patterns (De Martino et al., 2008). The RFE eliminated the
necessity of selecting a somewhat arbitrary fixed number of voxels per
ROI, and instead offered estimates based on voxel subsets with the best
performance within each ROI. Briefly, for this analysis, all voxels and
their time courses were first converted to Z scores and shifted in time (4
s) to account for the hemodynamic response. This shift corresponds to
the typical peak of the BOLD response function (Serences, 2004). We
then took 80% of the training data set to compute SVM weights. The se-
lection of the data set (and computation of SVM weights) were repeated
20 times within a particular RFE step (i.e., each voxel had 20 sampled
weights, which were subsequently averaged). Following each step, we
then ordered voxels based on their (step-averaged) weight from the
highest to the lowest. Using these weights, we omitted the 5 most unin-
formative voxels, and used the rest to decode the test patterns. This
yielded an accuracy at the current voxel count/pattern. This process
was repeated until voxel count fell below 50. For each ROI, based on
the maximum mean accuracy (across all cross-validations) across all
RFE steps, we retrieved the final voxel pattern with which to compare
the accuracies of the different cross validations. For one variation of the
SVM analysis, as trial conditions were further separated based on ob-
server percepts, this resulted in an unbalanced training data set that
was corrected via the Synthetic Minority Over-sampling Technique
(SMOTE) algorithm (Chawla et al., 2002). Mean prediction accuracies were tested against chance level (0.53), which was determined via permutation tests for the data (i.e., by running 1000 SVMs with shuffled labels).
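An empirical chance level of this kind reflects the label imbalance rather than the nominal 0.50; a toy illustration (hypothetical helper; a fixed prediction vector scored against label permutations, not the authors' 1000-SVM procedure):

```python
import random

def permutation_chance(predictions, labels, n_perm=1000, seed=0):
    """Mean accuracy of a fixed prediction vector scored against shuffled
    copies of the labels: an empirical chance level for unbalanced data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        total += sum(p == s for p, s in zip(predictions, shuffled)) / len(labels)
    return total / n_perm

preds = ["bounce"] * 32 + ["stream"] * 16   # classifier biased towards 'bounce'
labels = ["bounce"] * 28 + ["stream"] * 20  # unbalanced toy ground truth
chance = permutation_chance(preds, labels)
assert 0.45 < chance < 0.60   # lands near .53 here, not the nominal .50
```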
We ran two sets of multivariate classifications. In the first set of
analyses, data were patterned regardless of perceptual outcomes. In the
second set of analyses, data were patterned separately for events re-
ported as ‘bounce’ or ‘stream’ events. We performed a support vector
machine (SVM-based) multivariate pattern analysis (MVPA) training
and testing responses for the accelerating vs constant velocity, deceler-
ating vs constant velocity, and accelerating vs decelerating events, and
then again training and testing ‘bounce’ vs ‘stream’ responses for each
of the event types and within each ROI.
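The elimination loop described in this section can be sketched as follows, with a simple mean-difference linear classifier standing in for the libSVM SVM (all names hypothetical; synthetic data; not the authors' pipeline):

```python
import numpy as np

def rfe_decode(X_tr, y_tr, X_te, y_te, drop=5, min_voxels=50):
    """Recursive feature elimination: at each step, fit a linear classifier,
    drop the `drop` voxels with the smallest absolute weights, and keep the
    best test accuracy and its voxel count."""
    voxels = np.arange(X_tr.shape[1])
    best_acc, best_k = 0.0, len(voxels)
    while len(voxels) >= min_voxels:
        mu1 = X_tr[y_tr == 1][:, voxels].mean(0)
        mu0 = X_tr[y_tr == 0][:, voxels].mean(0)
        w = mu1 - mu0                        # mean-difference 'weights'
        b = -0.5 * (mu1 + mu0) @ w
        acc = ((X_te[:, voxels] @ w + b > 0).astype(int) == y_te).mean()
        if acc > best_acc:
            best_acc, best_k = acc, len(voxels)
        voxels = voxels[np.argsort(np.abs(w))[drop:]]  # omit least informative
    return best_acc, best_k

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.standard_normal((80, 120))
X[y == 1, :10] += 1.0                # only the first 10 'voxels' carry signal
acc, k = rfe_decode(X[::2], y[::2], X[1::2], y[1::2])
assert acc > 0.6 and 50 <= k <= 120
```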
2.5. Data/code availability
Any custom code used for analyses of data in this manuscript (in conjunction with built-in tools in BrainVoyager) is available at
https://github.com/hiroshiban/FS2BV (shared with permission from
Hiroshi Ban). Anonymised data (including skull-stripped imaging data)
can be made available upon request, but require additional approval to
be sought from the Human Research Ethics Committee of The Univer-
sity of Hong Kong.
3. Results

We first examined whether the perceptual outcome elicited by the
audiovisual event was affected by the varying motion profiles and the
presence or absence of the sound transient. Responses, indexed in terms
of the proportion of ‘bounce’percepts (Fig. 1B), were entered in a 2 (di-
rection; upwards/downwards) x 3 (motion profile; decelerating, accel-
erating, constant velocity) x 2 (sound; present/absent) repeated-
measures ANOVA. The analysis indicated a significant main effect of the transient [F(1,29) = 4.28, p = .048, ηp² = .129], reflecting the fact that the proportion of ‘bounce’ responses was significantly higher with the sound transient present (mean = 0.58) versus absent (mean = 0.50). The analysis also indicated a significant direction by motion profile interaction [F(2,58) = 5.35, p = .007, ηp² = .156], which was followed up by independent one-way ANOVAs for each motion direction. The analysis for the upwards-moving stimuli indicated no significant differences in responses among the three motion profiles [F(2,58) = .072, p = .931, ηp² = .002]. By contrast, the analysis for the downwards-moving stimuli [deceleration = .41 (SE = .05); constant velocity = .50 (SE = .04); acceleration = .62 (SE = .039)] indicated that observers made a significantly greater number of ‘bounce’ responses for the accelerating vs decelerating stimuli [F(2,58) = 5.32, p = .008, ηp² = .155; t(29) = 2.61, p = .014]. Responses for the constant velocity vs decelerating stimuli [t(29) = 1.47, p = .15] and constant velocity vs accelerating stimuli [t(29) = -2.37, p = .024] did not differ (for clarity, all ps reported are uncorrected, and Bonferroni corrections are applied via threshold adjustments for the triple comparisons).
We first verified that perceptual outcomes for the downwards
moving events presented in the modified MR protocol were consistent
with effects observed in the first behavioural experiment. Of the 22 sub-
jects who participated in the MR component, behavioural responses for
three were not recorded due to equipment (i.e., button box) failure on
their acquisition date. As such, only behavioural data from the remaining 19 participants (11 males; mean age = 28.5 yrs) were included in the analyses.
Behavioural responses, indexed in terms of the proportion of ‘bounce’ responses (Fig. 1C; deceleration = 0.42 (SE = .05); constant velocity = 0.55 (SE = .04); acceleration = 0.59 (SE = .05)), were entered into a one-way repeated-measures ANOVA for the three motion profiles (deceleration, constant velocity, acceleration). Similar to the behavioural experiment, the analysis indicated a significant main effect of motion profile [F(2,36) = 3.52, p = .04, ηp² = .164]. Here, subsequent comparisons trended similarly to the behavioural data, with greater bounce responses for the accelerating vs decelerating stimuli [t(18) = -2.27, p = .03], but this contrast, together with those for the constant velocity vs decelerating [t(18) = -1.95, p = .07] and constant velocity vs accelerating comparisons [t(18) = -.79, p = .44], did not survive the triple Bonferroni adjustment.

3.2.2. fMRI: univariate responses
We first examined amplitude responses from the FIR GLM analysis, taking initially a subset of the data (1 run per subject). Using these
data, we performed a whole-brain contrast (disregarding, at first, behavioural responses), comparing responses of accelerating vs constant velocity events and of decelerating vs constant velocity events, in addition to a conjunction contrast comparing [(Acceleration-Constant Velocity) – (Deceleration-Constant Velocity)] (Fig. 2). Across these
contrasts, we found significant clusters (qFDR < .05) in bilateral cau-
date [Caud; TALx,y,z +/- 13, 12, 12], posterior parietal cortex [PPC;
TALx,y,z +/- 13, -74, 43], angular gyrus [AG; TALx,y,z +/- 31, -74, 29], posterior insula and parieto-insular vestibular cortex, hereafter referred to as the PIVC+ complex (Frank and Greenlee, 2018) [PIVC+; TALx,y,z +/- 34, -28, 18], superior temporal gyrus [STG; TALx,y,z +/- 53, -23, 12], dorsal occipital cortex [dOCC; TALx,y,z +/- 28, -74, 31], superior frontal gyrus [SFG; TALx,y,z +/- 11, 11, 50], and premotor cortex [PMC; TALx,y,z +/- 49, 0, 28].

Fig. 2. Results from a group-level GLM contrast comparing responses of accelerating vs constant velocity events and decelerating vs constant velocity events, in addition to a conjunction contrast comparing [(Acceleration-Constant Velocity) – (Deceleration-Constant Velocity)]. Contrasts were performed using the middle three FIR delays (FIRd3-5) while concatenating across behavioural responses. Responses are superimposed onto representative surface meshes of one participant. Sulci are coded in darker grey than gyri.
In order to quantify the univariate responses in these regions, we
generated spherical ROIs (5 mm rad.) centered on these clusters using
the remaining data from each subject. This approach of splitting data
for the identification of ROIs on the one hand, and quantifying response
properties with the remaining data ensures that effects are robust and unaffected by well-documented circularity issues in MR analysis (Kriegeskorte et al., 2010). For each ROI, we indexed responses for each
contrast by computing differences between the mean of the middle
three FIR delays (i.e., delays 3-5) of the corresponding conditions.
In the first set of contrasts, we computed the difference between the
means of the Acceleration (FIRd3-5) and Constant Velocity (FIRd3-5),
Deceleration (FIRd3-5) and Constant Velocity (FIRd3-5), and the Accel-
eration and Deceleration conditions. The resulting beta differences are
presented in Fig. 3. In this set of contrasts, we concatenated signals re-
gardless of perceptual outcome (i.e., “bounce”or “stream”). For each
contrast, beta differences were tested with respect to zero, holding fam-
ily-wise error at .05 via Bonferroni-correction (across the ROIs). The
analysis for the acceleration vs constant velocity comparison (Fig. 3b) on the right hemisphere indicated above-zero indices for the posterior insula [t(21) = 2.74, p = .006] and superior temporal gyrus [t(21) = 3.14, p = .002] only.

Fig. 3. Beta differences obtained from individual ROIs, computed as the difference between (a) the mean of Deceleration (FIRd3-5) and the mean of Constant Velocity (FIRd3-5), (b) between the mean of Acceleration (FIRd3-5) and the mean of Constant Velocity (FIRd3-5), and (c) between the mean of the Deceleration and Acceleration conditions. In this set of contrasts, we concatenated signals regardless of perceptual outcome (i.e., “bounce” or “stream”). Error bars represent +/- 1 standard error of the mean. Asterisks denote beta differences that are significantly different from zero.

No comparisons on the left hemisphere
survived statistical correction. Similarly, no difference-indices survived
statistical correction on either hemisphere for the deceleration vs con-
stant velocity contrast (Fig. 3a). For the deceleration vs acceleration
contrast (Fig. 3c), signals for the decelerating events were significantly
weaker than those for the accelerating events in the right PPC [t
(21) = -2.8, p= .005] only.
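Testing beta differences against zero while holding family-wise error at .05 amounts to one-sample t-tests evaluated at .05/8 across the eight ROIs; a minimal sketch with toy values (hypothetical helper; not the reported data):

```python
import math, statistics

def one_sample_t(diffs):
    """t statistic for beta differences tested against zero."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Toy beta differences (e.g., Acceleration minus Constant Velocity) for
# one ROI across n = 19 participants -- illustrative values only:
diffs = [0.21, 0.35, 0.10, 0.42, 0.05, 0.28, 0.19, 0.33, 0.12, 0.26,
         0.07, 0.31, 0.24, 0.15, 0.38, 0.02, 0.29, 0.18, 0.22]
t = one_sample_t(diffs)

# Holding family-wise error at .05 across 8 ROIs means evaluating each
# test at a corrected two-tailed threshold of .05 / 8:
assert abs(0.05 / 8 - 0.00625) < 1e-12
assert t > 2.0   # reliably above zero for this toy sample
```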
Next, and perhaps more informatively, we extracted signals for
events during which subjects reported ‘bounce’percepts. This was done
to establish better comparisons among events with varying motion profiles under comparable perceptual outcomes. Note that we were only able to do this for 19 of the 22 participants due to failure of the button box
for the remaining three as noted earlier. We then computed additional
indices that referenced each of the responses for the decelerating and
accelerating events with respect to the constant velocity event (i.e., ac-
celeration –constant velocity; deceleration –constant velocity) as well
as differencing responses for the decelerating and accelerating events
(i.e., deceleration – acceleration) (Fig. 4a-b). These difference indices
(beta differences) were tested against zero, holding family-wise error at
.05 via Bonferroni-correction. The analyses for the deceleration vs con-
stant velocity contrast indicated higher signals for the decelerating
stimulus relative to the constant velocity stimulus in right PPC [t
(18) = 2.9, p= .0045], and right PMC [t(18) = 3.96, p<.001]. The
analysis for the accelerating vs constant velocity contrast indicated
higher signals for the accelerating stimulus relative to the constant ve-
locity stimulus again in right PPC [t(18) = 2.8, p= .006], right AG [t
(18) = 3.42, p= .0015], right PIVC+ [t(18) = 3.74, p< .001], right
dOCC [t(18) = 3.37, p= .0015], and right PMC [t(18) = 2.96,
p= .004]. Finally, the analysis for the decelerating vs accelerating con-
trast indicated that indices across all ROIs did not differ from zero.
For this second set of amplitudes (i.e., obtained for the ‘bounce’per-
cepts only), we next returned to the individual (rather than differenced)
condition amplitudes (i.e., FIR delays 3-5; Fig. 4c-d) and computed a 2
(hemisphere) x 3 (motion profile) x 8 (ROI) ANOVA that indicated sig-
nificant main effects of motion profile [F(2,36) = 7.1, p= .003,
= .283], ROI [F(7,126) = 5.37, p< .001, = .23], and a three-
way hemisphere by motion profile by ROI interaction [F
(14,252) = 1.83, p= .035, = .092]. The main effect of ROI re-
flected the fact that overall signals in the PIVC+ were higher than those
in dOCC [t(18) = 3.73, p= .002]. Overall responses of all other ROIs
did not differ]. The interaction was probed with individual two-way
ANOVAs for each ROI. The analysis for the PPC indicated that signals in
the right hemisphere were significantly lower for the constant velocity
stimulus versus the two other stimuli [motion profile by hemisphere in-
teraction, F(2,36) = 3.7, p = .034, ηp² = .17]. The analysis for the
PIVC+ indicated that signals for the accelerating events were higher
than signals for the constant velocity events in the right hemisphere
[motion profile by hemisphere interaction, F(2,36) = 4.3, p= .021,
= .19]. Analyses for all other ROIs did not reveal any other differ-
ences among conditions.
Thirdly, we investigated differences in response amplitudes for the
various ROIs and motion profiles, depending on whether the subject re-
ported ‘bounce’ vs ‘stream’ percepts. In Fig. 5, response differences
computed by subtracting the mean of “Stream” (FIRd3-5) trials from
the mean of “Bounce” (FIRd3-5) reported trials are presented for each
hemisphere. Individual amplitudes (i.e., undifferenced) were then en-
tered into a 2 (hemisphere) x 3 (motion profile) x 8 (ROI) x 2 (percept:
bounce/stream) ANOVA. The analysis indicated a significant
main effect of ROI [F(7,126) = 7.79, p < .001, ηp² = .301] and a mo-
tion profile by percept interaction [F(2,36) = 3.67, p = .03,
ηp² = .169]. The main effect of ROI reflected the fact that responses
were higher for the PIVC+ than for the PPC, STG, and dOCC (all ps <
.035), and higher for the SFG vs the PPC (p = .003). The interaction
was probed by paired t-tests comparing ‘bounce’ versus ‘stream’ re-
sponses for each event type. ‘Bounce’ amplitudes trended higher than
‘stream’ amplitudes for the accelerating stimuli [t(18) = 2.35, p = .03
for accelerating; t(18) = .172, p = .87 for decelerating; t(18) = .64,
p = .53 for constant velocity], though the contrast did not survive a fur-
ther triple Bonferroni adjustment.
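As a sanity check on the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom. This brief sketch (ours, plain Python) reproduces the values reported for the ANOVA above:

```python
# Partial eta squared from an F statistic and its degrees of freedom:
# eta_p^2 = (F * df1) / (F * df1 + df2).
def partial_eta_squared(f, df1, df2):
    """Recover partial eta squared from a reported F ratio."""
    return (f * df1) / (f * df1 + df2)

# Reported effects from the 2 x 3 x 8 ANOVA on 'bounce'-trial amplitudes
motion_profile = partial_eta_squared(7.1, 2, 36)     # ~.283
roi = partial_eta_squared(5.37, 7, 126)              # ~.23
interaction = partial_eta_squared(1.83, 14, 252)     # ~.092
```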
3.2.2. fMRI: patterned (multivariate) responses
Next, we ran two sets of multivariate classifications: in the first set
of comparisons, data were patterned regardless of perceptual outcomes.
Fig. 4. Beta differences computed using data from ‘bounce’-perceived trials
only, presented separately for the (a) left and (b) right hemispheres. We dif-
ferenced responses for the deceleration vs constant velocity, acceleration vs
constant velocity, and deceleration vs acceleration events. Asterisks indicate
indices that are significantly different from zero. In (c,d), mean undifferenced
amplitudes (corresponding to the mean of FIRd3-5) are presented for each of
the main event types. Error bars represent +/- standard error of the mean.
Fig. 5. Beta differences for the various ROIs and motion profiles computed by
subtracting the mean of ‘Stream’ (FIRd3-5) trials from the mean of ‘Bounce’
(FIRd3-5) reported trials. Responses are presented separately for the (a) left
and (b) right hemispheres. Error bars represent +/- standard error of the mean.
In the second set of analyses, data were patterned separately for events
reported as ‘bounce’ or ‘stream’ events. For all SVM comparisons, clas-
sification accuracies for all ROIs were weak and did not differ signifi-
cantly from the shuffled baseline (0.53), as confirmed by t-tests versus this
baseline (FDR < 0.05). While it is entirely possible that the information
of interest here is not encoded by the brain in a multivariate manner,
we suspect that this analysis was hindered by the relatively large imbal-
ance in behavioural responses for certain subjects and stimuli (e.g., far
fewer patterns of ‘stream’ available, perhaps weakening the utility of
classification training). That said, it is also important to note that this im-
balance was corrected for in the SVM analysis.
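For illustration of the imbalance-correction step only: the reference list points to SMOTE (Chawla et al., 2002), which generates synthetic minority-class patterns by interpolating between a minority sample and one of its nearest minority neighbours. The minimal NumPy sketch below is ours, with hypothetical array shapes, not the authors' implementation:

```python
# Minimal SMOTE-style oversampling sketch (Chawla et al., 2002).
# All array names and sizes here are hypothetical placeholders.
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class patterns X_min."""
    rng = np.random.default_rng(rng)
    # Pairwise distances among minority samples (diagonal excluded)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]        # k nearest per sample
    base = rng.integers(0, len(X_min), size=n_new)   # seed samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                     # interpolation weights
    # Each synthetic pattern lies on the segment between a sample and a neighbour
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

rng = np.random.default_rng(4)
X_stream = rng.normal(size=(12, 50))   # few 'stream' patterns (hypothetical)
X_bounce = rng.normal(size=(48, 50))   # many 'bounce' patterns (hypothetical)
X_synth = smote(X_stream, n_new=len(X_bounce) - len(X_stream), rng=5)
```

The synthetic patterns would then be appended to the minority class in each training fold before classifier training.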
3.2.3. fMRI vs Behaviour
Finally, due to the prominent appearance of the right PPC and right
PIVC+ in the above analyses, we elected to further probe their rele-
vance to behaviour by correlating individual subject activity in each of
these ROIs (taking only responses for ‘bounce’percept trials), for each
event (motion profile) type, with corresponding behaviour (indexed in
terms of the proportion of bounce responses) (Fig. 6). Recall that because
stimuli presented in-bore were downwards-moving events only, motion
profile here also corresponds to Newtonian congruency (i.e., accelerating-
congruent; decelerating-incongruent). The analyses indicated a significant
correlation between behaviour and activity for decelerating events in both
the right PPC [r(17) = -.57, p = .011] and the right PIVC+ [r(17) = -.59,
p = .008], such that stronger activity in both ROIs corresponded to a lower
proportion of ‘bounce’ reports. There was also a significant correlation
between behaviour and activity for accelerating events in the right PIVC+
[r(17) = .55, p = .014], such that stronger activity in the PIVC+
corresponded to a higher proportion of ‘bounce’ reports, although this
correlation and the earlier correlation involving the right PPC do not
survive a six-way statistical correction. GLM signals and behavioural
responses for the constant velocity stimulus did not correlate [right PPC,
r(17) = -.09, p = .69; right PIVC+, r(17) = -.06, p = .80].
Fig. 6. Correlations between individual subject activity in each of the right PPC
and right PIVC+ ROIs (taking only responses for ‘bounce’ percept trials), and
the corresponding behaviour metric (indexed in terms of the proportion of
‘bounce’ responses).
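For illustration (placeholder data and names, not the authors' code), the ROI–behaviour correlation and the six-way correction described above can be sketched as:

```python
# Illustrative sketch: correlating per-subject ROI activity
# ('bounce'-trial betas) with the proportion of 'bounce' reports,
# with a Bonferroni adjustment over the six tests
# (2 ROIs x 3 motion profiles). All values are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n_subjects = 19
roi_activity = rng.normal(size=n_subjects)   # e.g., rPIVC+ 'bounce' betas
prop_bounce = rng.random(n_subjects)         # proportion of 'bounce' reports

# Pearson correlation; df = n_subjects - 2 = 17, matching the reported r(17)
r, p = stats.pearsonr(roi_activity, prop_bounce)

n_tests = 6                                  # 2 ROIs x 3 motion profiles
survives_correction = p < 0.05 / n_tests
```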
4. Discussion
In the first (behavioural) experiment, we tested the effects of alter-
ing visual motion dynamics on bounce vs stream perceptual interpreta-
tions by presenting observers with events where the two discs moved
with decelerating, constant, or accelerating motion. Sound was pre-
sented in coincidence with the point of overlap of the two objects or
was absent, and object motion was varied (i.e., upwards or down-
wards). We found that motion dynamics acted to shift perceptual inter-
pretations such that events with downwards accelerating visual motion
were more likely to be interpreted as a ‘bounce’event relative to those
with decelerating motion. Curiously, there were no differences in per-
ceptual interpretations between the three motion dynamics when stim-
uli moved in the upwards direction. The presence of the auditory tran-
sient acted to shift perceptual interpretations (across all conditions) to-
wards a greater number of ‘bounce’responses.
In the second experiment, we presented identical stimuli (down-
wards only, and with sound transients) to observers while measuring
fMRI responses concurrently with behaviour. We found that events
with accelerating- and to a lesser extent, decelerating motion, elicited
stronger responses in several cortico-vestibular regions, including the
posterior parietal cortex, angular gyrus, the posterior insula, including
the parieto-insular vestibular cortex (referred to here as the PIVC+ com-
plex), the dorsal occipital cortex, and premotor cortex, as compared to
events containing uniform motion, but only in the right hemisphere.
Notably, responses in these regions varied depending on perceptual in-
terpretation (i.e., bounce vs stream) such that they were higher during
the reports of ‘bounce’rather than ‘stream’events. Finally, responses in
the right PIVC+ in particular showed a close correspondence (correla-
tion) to behavioural metrics. We discuss the behavioural and fMRI findings in turn.
4.1. Behaviour: motion profile matters, but doesn't interact with transient
There are two observations to take away from our behavioural re-
sults. First, while the sound transient leads to increased ‘bounce’re-
sponses relative to no-sound events, altering the motion profile of the
ABE displays modulates perceptual interpretations, but only for down-
wards-moving events. For the downwards-moving events, observers re-
ported a higher proportion of ‘bounce’interpretations when the discs
were accelerating, and the smallest proportion of ‘bounce’interpreta-
tions when the discs were decelerating. Altering motion profiles had no
effect on upwards-moving events. Note, of course, that in both the behav-
ioural and MRI experiments ‘downwards’ is referenced with respect to
the retina. In the laboratory, ‘downwards’ also coincides with
gravitational down, but in the MRI this is not the case, as observers lay
supine (i.e., egocentric ‘down’ and stimulus ‘down’ are misaligned
with gravitational ‘down’). Thus, our testing configurations, and in par-
ticular our ability to replicate the behavioural findings in-bore allow us
only to speculate on the relevance of the egocentric ‘down’. We note
nevertheless that previous work has indicated that cortico-vestibular
engagement can be brought out using visual events in the fMRI even
with observers lying supine (Brannick et al., 2021; Indovina et al.,
Second, the addition of the sound transient at the visual point of co-
incidence acts to shift all conditions towards a greater proportion of
‘bounce’ reports. These data are important as they suggest that motion-
profile-based influences on ABE outcomes do not interact with, or even
require, the presence of the transient. That is, these motion-based modu-
lations are evident both with and without the transient (Fig. 1b). Mecha-
nistically, this suggests that motion influences are served by a mecha-
nism independent of that serving auditory integration during the inter-
pretation of the visual event.
Why does motion profile even matter? It is unlikely that the percep-
tual modulations observed here reflect differing general motion sensi-
tivity to objects moving upwards versus downwards, or to objects mov-
ing with varying motion profiles. To our knowledge, the only reported
directional anisotropies for motion perception have been demonstrated
in eccentric viewing, and specifically, for centripetal motion (Raymond,
1994). Contrast detection thresholds, as well as motion aftereffects for
centrally viewed motion patterns do not change as a function of direc-
tion (Levinson and Sekuler, 1975;Marshak, 1981). It is further unlikely
that the effects observed here merely reflect the degree of visual and au-
ditory information overlap. While this overlap is the greatest in the ac-
celerating events and the smallest in the decelerating events (given the
onset of the sound is synced to the first frame of visual overlap), this re-
mains true for both upwards and downwards movements. Yet, motion
profile only seems to matter for the downwards moving stimuli. In-
stead, we interpret our data in a manner much in line with Sekuler and
Sekuler's (1999) speculation on the role of the sound transient: an asser-
tion that the brain resolves ambiguous proximal stimuli while integrat-
ing Newtonian principles. A transient sound, then, helps the brain come
to a resolution about what to do with the underspecified retinal data. In a
similar vein, the varying motion profiles serve to disambiguate among
possible interpretations (i.e., here, bounce vs stream). The brain recog-
nizes an accelerating (downwards) event as more realistic in a visual
world constrained by gravity, and finds it easier to interpret than one
that accelerates upwards, or decelerates downwards (without addi-
tional force). The uniform motion event falls somewhere in between: it
is not helped by the integration of physical laws, but it is at the
very least not counter to the natural laws of gravity. Correspondingly, per-
ceptual outcomes fall somewhere in between those for the accelerating
and decelerating events. We should add that an integration of Newton-
ian physics to arrive at perceptual probabilities is not a novel concept,
and has been presented in the context of action perception (Kaiser et al.,
1992;Kominsky et al., 2017;Muchisky and Bingham, 2002;Runeson
and Frykholm, 1981;Soma Tsutsuse et al., 2020), and the perception of
inanimate and animate events (Muchisky and Bingham, 2002;Runeson
and Frykholm, 1981). Still, there is an outstanding issue to address.
Why does motion profile matter only in the downwards direction?
While we could collapse data from the two directions in accordance with
gravitational congruency (i.e., collapsing the congruent downwards ac-
celerating and upwards decelerating events, and the incongruent down-
wards decelerating and upwards accelerating events, Supplementary
Fig. S1) to better illustrate the relevance of Newtonian consistency, we
would be overlooking the fact that there is a clear perceptual indiffer-
ence to motion profiles when stimuli are moving upwards. Does this re-
flect a weaker tendency to map Newtonian principles to sensory data
for upwards moving stimuli, perhaps due to the relative infrequency with
which upwards vs downwards motion is encountered in the visual
world? Alternatively, might this reflect a difference in acceleration/de-
celeration sensitivity between the two directions (i.e., directional
anisotropy for acceleration/deceleration perception?), although as
noted earlier, there is little evidence in the available literature to sug-
gest this may be the case. While our data cannot tease apart these possi-
bilities, the mechanistic reason underlying this curious indifference for
motion profiles of upwards moving events awaits further empirical clarification.
4.2. Relevance of responses in PIVC+ and PPC to the ABE
We first surveyed the brain's response to the varying events while
disregarding perceptual outcomes. We reasoned that this analysis could
reveal the broader set of regions engaged by the physical motion profile
changes themselves, independent of the observer's in-
terpretation of the event. That is, by contrasting these responses with
those that then take the perceptual interpretation into account (e.g., ac-
celeration-bounce vs acceleration-stream), we may then arrive at re-
sponses (regions) that are relevant to the perceptual interpretation,
rather than the physical motion changes per se. We found stimulus-
relevant clusters along the caudate, PPC, AG, PIVC+, STG, dOCC, SFG,
and PMC. Many of these areas have been reported in the two prior
imaging studies with the ABE (fMRI: PPC, insula, PMC; MEG: STG,
Bushara et al., 2003;Zvyagintsev et al., 2011) but also in broader au-
diovisual work (Beauchamp et al., 2004;Calvert et al., 2001;Gao et al.,
2020;Park et al., 2010;Sekiyama et al., 2003). Note that the AG (Brod-
mann 39) is anatomically distinct from the broader IPS (and PPC), and
can be identified as a continuation of the superior/middle temporal gyri
into the inferior parietal lobule. It has an anterior boundary with the
supramarginal gyrus that is marked by the descending portion of the
sulcus of Jensen (Ribas, 2010), and a posterior boundary at the dorsal
part of the anterior occipital sulcus (Rademacher et al., 1992). Addi-
tionally, the dOCC identified here reflects Brodmann area 19; the cluster's
location corresponds approximately to that of V3d (i.e., not primary vi-
sual cortex, V1). Of these ROIs, responses of the PIVC+ and PPC ap-
peared to be sensitive to the varying event types such that right PIVC+
responses were significantly higher during the presentation of acceler-
ating (and decelerating) events versus the constant velocity event, and
responses of the right PPC (rPPC) were significantly lower during the
presentation of decelerating rather than accelerating stimuli. Compar-
ing response amplitudes during ‘bounce’only percepts (presumably tri-
als with audiovisual integration), implicated many of the same regions
as in the initial analysis. We found the rPPC, rAG, rPIVC+, rdOCC, and
rPMC to be significantly more responsive to accelerating vs constant ve-
locity events. The rPPC and rPMC were also more responsive
to decelerating vs constant velocity events. Again, of these ROIs,
event-related differences reflected in the rPIVC+ and rPPC appeared to
be particularly robust (showing up prominently across the different
variations of statistical analyses).
Notably, among the above regions, responses of the PPC and PIVC+
were response-selective (i.e., more responsive when comparing
‘bounce’vs ‘stream’reports), though only for the accelerating stimuli.
Our finding that responses of the PIVC+ were higher during ‘bounce’
reports than those of visual cortex is broadly consistent with the findings
of Bushara et al. (2003), who reported enhanced BOLD responses in
purportedly multisensory regions relative to traditionally unisensory
areas. In our case, however, it should be clear that the effects have noth-
ing to do with the availability of the sound information per se (as it was
always presented in the MRI), but relates to the utility or integration of
the sound, which, in turn, seems to depend on the motion profile exhib-
ited by the visual trajectories.
The PPC, located posterior to the postcentral sulcus and anterior to
the parieto-occipital sulcus, is a curious region due to its broad reaching
functions and propensity to respond to a variety of stimuli and tasks
(Culham and Kanwisher, 2001). While early lesion work ascribed spatial
localization to the PPC (Ungerleider, 1982), it has since been impli-
cated in stimulus-driven attention, extending to aspects of spatial cogni-
tion including the analysis of external multisensory information (Colby
et al., 1996;Corbetta and Shulman, 2002;Sack, 2009;Sack et al.,
2008). In non-human primates, activity in the parietal lobule, particu-
larly area 7 in the monkey (thought to be homologous to the SPL in the
human; Anderson, 1988), has been shown to change as a function of
spatial attention, in the visual, auditory and haptic domains (Burton et
al., 1999;Pugh et al., 1996;Vibell et al., 2017,2007). This region seems
to be particularly well-equipped to integrate multi-sensory data due to
its position as part of a broader frontoparietal network that handles rep-
resentations along viewer-centric and allocentric reference frames
(Szczepanski et al., 2013).
The second area that appears prominently in our data is the PIVC+.
It is well-demonstrated that the vestibular system (beyond its basic bal-
ance, posture and gait functions), exerts considerable influence on
other sensory systems (Angelaki and Cullen, 2008;Smith et al., 2012),
and plays roles in spatial navigation, learning, and decision-making
(Cullen and Taube, 2017;Smith and Zheng, 2013). Vestibular integra-
tion with other sensory signals begins rather early in the vestibular nu-
clei of the brain stem (Barmack, 2003). From the brainstem, signals are
sent to the thalamus (ventral posterior, ventral lateral, ventral anterior
and geniculate nuclei; e.g., Kirsch et al., 2016; Lang et al., 1979). Pos-
terior thalamic responses are subsequently projected to vestibular cor-
tex (Akbarian et al., 1992), within which the PIVC, a subregion located
anatomically along the midposterior Sylvian Fissure, is critical for
vestibular processing (Dieterich and Brandt, 2008;Lopez and Blanke,
2011). In recent years, and largely motivated by work in non-human
primates, there has been increasing belief that the human PIVC+ in
fact comprises at least two subregions: the PIVC proper, thought to cor-
respond to the nonhuman primate PIVC, and the posterior insular cor-
tex (PIC) (Frank and Greenlee, 2018;Glasser et al., 2016). The former is
situated adjacent to the insular gyrus and is primarily vestibular; the
latter is situated posteriorly, in the retroinsular cortex, and is thought to
be visuo-vestibular. Both project to brainstem vestibular nuclei. As the
exact delineation of the purported subregions is still controversial, we
present the cluster here as the PIVC+ complex. However, a tentative
segregation of our cluster into ‘anterior’ (roughly PIVC; TAL +/- 45,
-27, 13) and ‘posterior’ (roughly PIC; TAL +/- 38, -36, 13) segments
suggests that responses of both show comparable trends here (Supple-
mentary Fig. S2).
Of the two key ROIs in our data, we found responses of the right
PIVC+ to correspond more closely to behavioural metrics, although
only one particular correlation, that involving the decelerating
events, survives the six-way statistical correction. This finding fuels
speculation that vestibular data are integrated in a way that bears closely
on the perceptual resolution of the ambiguous audiovisual events. Curi-
ously, for decelerating stimuli, as rPIVC+ responses increase, observers
are less likely to report a ‘bounce’percept. By contrast, for accelerating
stimuli, as rPIVC+ responses increase, observers are more likely to re-
port a ‘bounce’percept. We speculate the role of the rPIVC+ here to be
one that signifies the extent of the event's match/mismatch with New-
tonian physics. These vestibular signals are then used to help arrive at a
conclusion as to the distal stimulus, perhaps through the vestibular net-
work's extensive connections with association (parietal) cortex. Such a
role for the PIVC+ would fall in line with Indovina et al. (2005) who
posited that the brain has a distributed network that stores an internal
G- (gravity) model. This distributed vestibular network includes areas
such as the insulae, posterior thalamus, and temporoparietal junctions
that help to calculate tilt and movement based on sensory and vestibu-
lar input. At the very least, we should conclude that rPIVC+ activity
does not merely signify audiovisual integration; if it did, responses should
not vary with motion profile. Could this region be part of a broader ‘G’
network that stores statistically-derived templates against which sen-
sory data are weighed?
Our findings of an apparent right lateralisation of the effects are
congruent with earlier TMS work (Maniglia et al., 2012) that found that
disruption of the right (but not left) PPC alters visual-auditory integra-
tion. Lateralization of our effects may relate to the widely reported
right-dominance of responses reflecting spatial attention, and in partic-
ular stimulus-driven shifts of spatial attention along the frontoparietal
network (Arrington et al., 2000;Corbetta et al., 2000). Alternatively,
our findings may reflect a more general right-dominance for cross-
modal sensory integration involving the vestibular cortex (Dieterich et
al., 2003). Interestingly, somatosensory-vestibular interactions, and vi-
suo-vestibular perception of body rotation, both appear to lateralize to
the right (Hashimoto et al., 2013; Philbeck et al., 2006). In another
study, visual (optokinetic) stimulation has been shown to activate
vestibular cortex (among other regions) with right dominance. While
we cannot be sure, there appear to be breadcrumb-like hints in the litera-
ture of a right emphasis in broader vestibular function, with lesion data
suggesting that the vestibular cortex on the right hemisphere may play
a stronger role for perceiving body rotations (Philbeck et al., 2006). Our
data might still reflect a vestibular dominance that is handedness-
dependent (Kirsch et al., 2018), as our participants were almost exclusively right-handed.
Lastly, it is worth restating that our inability to find unique pattern
representations along the visuo-vestibular network should be taken
with a grain of salt. Future work that attempts to better balance the per-
ceptual outcomes in an attempt to give the SVM classifier a better
chance at revealing multivariate response characteristics in this context
would be a worthy endeavor.
5. Conclusions
We investigated multisensory binding in the human brain by intro-
ducing a variation on the classic Metzger (1934) stimulus that alters
the visual motion profiles of the discs such that they move consistently
or inconsistently with patterns typically encountered in a gravity-
constrained visual environment. We showed that audiovisual percepts
are influenced by the motion profile of the events, and that percepts are
reflected in the brain by modulations of activity in the parieto-insular-
vestibular cortex and posterior parietal cortex. We propose that insular
engagement reported in previous audiovisual work involving visual
motion events does not simply reflect its role as a generic multisensory
integration center, but rather reflects direct involvement of visuo-
vestibular read-outs that contribute to a probabilistic determination of
the distal stimulus.
Author contributions
D.C., J.V., and S.S. conceptualized the study. D.C. and J.V. designed
the research. D.C., P.A., J.V., D.T., D.M. performed the research (behav-
ioural, fMRI data collection). D.C. and P.A. analyzed the data. All au-
thors contributed to the writing of the manuscript.
Declaration of Competing Interests
The authors declare no competing financial interests.
Acknowledgements
This work was supported by an Early Career Scheme (27612119)
and General Research Fund (176129020) grants (Research Grants
Council, Hong Kong) to D.C. and by the Foundation for the National In-
stitutes of Health Grants (R01EB028627) to A.S.
Supplementary materials
Supplementary material associated with this article can be found, in
the online version, at doi:10.1016/j.neuroimage.2022.119285.
References
Akbarian, S., Grüsser, O.J., Guldin, W.O., 1992. Thalamic connections of the vestibular
cortical fields in the squirrel monkey (Saimiri sciureus). J. Comp. Neurol. 326,
Anderson, R.A., 1988. The neurobiological basis of spatial cognition: role of the parietal
lobe. Spat. Cogn. Brain Bases Dev. 57–80.
Angelaki, D.E., Cullen, K.E., 2008. Vestibular system: the many facets of a multimodal
sense. Annu. Rev. Neurosci. 31, 125–150. https://doi.org/10.1146/
Arrington, C.M., Carr, T.H., Mayer, A.R., Rao, S.M., 2000. Neural mechanisms of visual
attention: object-based selection of a region in space. J. Cogn. Neurosci. 12 (Suppl 2),
Barmack, N.H., 2003. Central vestibular system: vestibular nuclei and posterior
cerebellum. Brain Res. Bull., Functional Anatomy of Ear Connections 60, 511–541.
Beauchamp, M.S., Lee, K.E., Argall, B.D., Martin, A., 2004. Integration of auditory and
visual information about objects in superior temporal sulcus. Neuron 41, 809–823.
Brainard, D.H., 1997. The psychophysics toolbox. Spat. Vis. 10, 433–436. https://doi.org/
Brannick, S.M., Chang, D.H.F., Vibell, J.F., 2021. Influences of posture on gravity
perception in the audiovisual bounce inducing effect. J. Vis. 21, 2864. https://
Burton, H., Abend, N.S., MacLeod, A.-M.K., Sinclair, R.J., Snyder, A.Z., Raichle, M.E.,
1999. Tactile attention tasks enhance activation in somatosensory regions of parietal
cortex: a positron emission tomography study. Cereb. Cortex 9, 662–674. https://
Bushara, K.O., Hanakawa, T., Immisch, I., Toma, K., Kansaku, K., Hallett, M., 2003.
Neural correlates of cross-modal binding. Nat. Neurosci. 6, 190–195.
Calvert, G.A., Hansen, P.C., Iversen, S.D., Brammer, M.J., 2001. Detection of audio-visual
integration sites in humans by application of electrophysiological criteria to the BOLD
effect. NeuroImage 14, 427–438. https://doi.org/10.1006/nimg.2001.0812.
Chang, C.-C., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intell.
Syst. Technol. 2, 1–27.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic
minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357.
Colby, C.L., Duhamel, J.R., Goldberg, M.E., 1996. Visual, presaccadic, and cognitive
activation of single neurons in monkey lateral intraparietal area. J. Neurophysiol. 76,
Corbetta, M., Kincade, J.M., Ollinger, J.M., McAvoy, M.P., Shulman, G.L., 2000.
Voluntary orienting is dissociated from target detection in human posterior parietal
cortex. Nat. Neurosci. 3, 292–297. https://doi.org/10.1038/73009.
Corbetta, M., Shulman, G.L., 2002. Control of goal-directed and stimulus-driven attention
in the brain. Nat. Rev. Neurosci. 3, 201–215. https://doi.org/10.1038/nrn755.
Culham, J.C., Kanwisher, N.G., 2001. Neuroimaging of cognitive functions in human
parietal cortex. Curr. Opin. Neurobiol. 11, 157–163. https://doi.org/10.1016/S0959-
Cullen, K.E., Taube, J.S., 2017. Our sense of direction: progress, controversies and
challenges. Nat. Neurosci. 20, 1465–1473. https://doi.org/10.1038/nn.4658.
De Martino, F., Valente, G., Staeren, N., Ashburner, J., Goebel, R., Formisano, E., 2008.
Combining multivariate voxel selection and support vector machines for mapping and
classification of fMRI spatial patterns. Neuroimage 43, 44–58.
Dieterich, M., Bense, S., Lutz, S., Drzezga, A., Stephan, T., Bartenstein, P., Brandt, T.,
2003. Dominance for vestibular cortical function in the non-dominant hemisphere.
Cereb. Cortex 13, 994–1007.
Dieterich, M., Brandt, T., 2008. Functional brain imaging of peripheral and central
vestibular disorders. Brain 131, 2538–2552. https://doi.org/10.1093/brain/awn042.
Frank, S.M., Greenlee, M.W., 2018. The parieto-insular vestibular cortex in humans: more
than a single area? J. Neurophysiol. 120, 1438–1450.
Fujisaki, W., Shimojo, S., Kashino, M., Nishida, S., 2004. Recalibration of audiovisual
simultaneity. Nat. Neurosci. 7, 773–778.
Gao, C., Weber, C.E., Wedell, D.H., Shinkareva, S.V., 2020. An fMRI study of affective
congruence across visual and auditory modalities. J. Cogn. Neurosci. 32, 1251–1262.
Glasser, M.F., Coalson, T.S., Robinson, E.C., Hacker, C.D., Harwell, J., Yacoub, E., Ugurbil,
K., Andersson, J., Beckmann, C.F., Jenkinson, M., 2016. A multi-modal parcellation of
human cerebral cortex. Nature 536, 171–178.
Grassi, M., Casco, C., 2010. Audiovisual bounce-inducing effect: when sound congruence
affects grouping in vision. Atten. Percept. Psychophys. 72, 378–386.
Grassi, M., Casco, C., 2009. Audiovisual bounce-inducing effect: attention alone does not
explain why the discs are bouncing. J. Exp. Psychol. Hum. Percept. Perform. 35, 235.
Hashimoto, T., Taoka, M., Obayashi, S., Hara, Y., Tanaka, M., Iriki, A., 2013. Modulation
of cortical vestibular processing by somatosensory inputs in the posterior insula. Brain
Inj 27, 1685–1691. https://doi.org/10.3109/02699052.2013.831128.
Hipp, J.F., Engel, A.K., Siegel, M., 2011. Oscillatory synchronization in large-scale cortical
networks predicts perception. Neuron 69, 387–396.
Indovina, I., Maffei, V., Bosco, G., Zago, M., Macaluso, E., Lacquaniti, F., 2005.
Representation of visual gravitational motion in the human vestibular cortex. Science
Indovina, I., Maffei, V., Pauwels, K., Macaluso, E., Orban, G.A., Lacquaniti, F., 2013.
Simulated self-motion in a visual gravity field: sensitivity to vertical and horizontal
heading in the human brain. Neuroimage 71, 114–124.
Kaiser, M.K., Proffitt, D.R., Whelan, S.M., Hecht, H., 1992. Influence of animation on
dynamical judgments. J. Exp. Psychol. Hum. Percept. Perform. 18, 669–689. https://
Kawabe, T., Miura, K., 2006. Effects of the orientation of moving objects on the perception
of streaming/bouncing motion displays. Percept. Psychophys. 68, 750–758.
Kawachi, Y., Gyoba, J., 2013. Occluded motion alters event perception. Atten. Percept.
Psychophys. 75, 491–500.
Kirsch, V., Boegle, R., Keeser, D., Kierig, E., Ertl-Wagner, B., Brandt, T., Dieterich, M.,
2018. Handedness-dependent functional organizational patterns within the bilateral
vestibular cortical network revealed by fMRI connectivity based parcellation.
Neuroimage 178, 224–237.
Kirsch, V., Keeser, D., Hergenroeder, T., Erat, O., Ertl-Wagner, B., Brandt, T., Dieterich,
M., 2016. Structural and functional connectivity mapping of the vestibular circuitry
from human brainstem to cortex. Brain Struct. Funct. 221, 1291–1308. https://
Kominsky, J.F., Strickland, B., Wertz, A.E., Elsner, C., Wynn, K., Keil, F.C., 2017.
Categories and constraints in causal perception. Psychol. Sci. 28, 1649–1662.
Kriegeskorte, N., Lindquist, M.A., Nichols, T.E., Poldrack, R.A., Vul, E., 2010. Everything
you never wanted to know about circular analysis, but were afraid to ask. J. Cereb.
Blood Flow Metab. 30, 1551–1557.
Lang, W., Büttner-Ennever, J.A., Büttner, U., 1979. Vestibular projections to the monkey
thalamus: an autoradiographic study. Brain Res 177, 3–17. https://doi.org/10.1016/
Levinson, E., Sekuler, R., 1975. The independence of channels in human vision selective
for direction of movement. J. Physiol. 250, 347–366.
Lopez, C., Blanke, O., 2011. The thalamocortical vestibular system in animals and
humans. Brain Res. Rev. 67, 119–146. https://doi.org/10.1016/
Maniglia, M., Grassi, M., Casco, C., Campana, G., 2012. The origin of the audiovisual
bounce inducing effect: a TMS study. Neuropsychologia 50, 1478–1482.
Marshak, W.P., 1981. Perceptual Integration and Differentiation of Directions in Moving
Patterns. Northwestern University (PhD Thesis).
Metzger, W., 1934. Beobachtungen über phänomenale Identität. Psychol. Forsch. 19,
Muchisky, M.M., Bingham, G.P., 2002. Trajectory forms as a source of information about
events. Percept. Psychophys. 64, 15–31. https://doi.org/10.3758/BF03194554.
Park, J.-Y., Gu, B.-M., Kang, D.-H., Shin, Y.-W., Choi, C.-H., Lee, J.-M., Kwon, J.S., 2010.
Integration of cross-modal emotional information in the human brain: An fMRI study.
Cortex 46, 161–169. https://doi.org/10.1016/j.cortex.2008.06.008.
Pelli, D.G., 1997. The VideoToolbox software for visual psychophysics: transforming
numbers into movies. Spat. Vis. 10, 437–442.
Philbeck, J.W., Behrmann, M., Biega, T., Levy, L., 2006. Asymmetrical perception of body
rotation after unilateral injury to human vestibular cortex. Neuropsychologia 44,
Pugh, K.R., Shaywitz, B.A., Shaywitz, S.E., Fulbright, R.K., Byrd, D., Skudlarski, P.,
Shankweiler, D.P., Katz, L., Constable, R.T., Fletcher, J., Lacadie, C., Marchione, K.,
Gore, J.C., 1996. Auditory selective attention: an fMRI investigation. NeuroImage 4,
Rademacher, J., Galaburda, A.M., Kennedy, D.N., Filipek, P.A., Caviness, V.S., 1992.
Human cerebral cortex: localization, parcellation, and morphometry with magnetic
resonance imaging. J. Cogn. Neurosci. 4, 352–374.
Raymond, J.E., 1994. Directional anisotropy of motion sensitivity across the visual field.
Vision Res 34, 1029–1037.
Ribas, G.C., 2010. The cerebral sulci and gyri. Neurosurg. Focus 28, E2.
Runeson, S., Frykholm, G., 1981. Visual perception of lifted weight. J. Exp. Psychol. Hum.
Percept. Perform. 7, 733–740.
Sack, A.T., 2009. Parietal cortex and spatial cognition. Behav. Brain Res. 202, 153–161.
Sack, A.T., Jacobs, C., Martino, F.D., Staeren, N., Goebel, R., Formisano, E., 2008.
Dynamic premotor-to-parietal interactions during spatial imagery. J. Neurosci. 28,
Sekiyama, K., Kanno, I., Miura, S., Sugita, Y., 2003. Auditory-visual speech perception
examined by fMRI and PET. Neurosci. Res. 47, 277–287.
Sekuler, A.B., Sekuler, R., 1999. Collisions between moving visual targets: what controls
alternative ways of seeing an ambiguous display? Perception 28, 415–432.
Sekuler, R., Sekuler, A.B., Lau, R., 1997. Sound alters visual motion perception.
Nature 385, 308.
Serences, J.T., 2004. A comparison of methods for characterizing the event-related BOLD
timeseries in rapid fMRI. Neuroimage 21, 1690–1700.
Smith, A.T., Wall, M.B., Thilo, K.V., 2012. Vestibular inputs to human motion-sensitive
visual cortex. Cereb. Cortex 22, 1068–1077. https://doi.org/10.1093/cercor/bhr179.
Smith, P., Zheng, Y., 2013. From ear to uncertainty: vestibular contributions to cognitive
function. Front. Integr. Neurosci. 7, 84. https://doi.org/10.3389/fnint.2013.00084.
Soma Tsutsuse, K., Vibell, J., Sinnett, S., 2020. Multisensory effects on causal perception.
J. Vis. 20, 1759. https://doi.org/10.1167/jov.20.11.1759.
Szczepanski, S.M., Pinsk, M.A., Douglas, M.M., Kastner, S., Saalmann, Y.B., 2013.
Functional and structural architecture of the human dorsal frontoparietal attention
network. Proc. Natl. Acad. Sci. 110, 15806–15811.
Ungerleider, L.G., Mishkin, M., 1982. Two cortical visual systems. Anal. Vis. Behav. 549–586.
Vibell, J., Klinge, C., Zampini, M., Nobre, A.C., Spence, C., 2017. Differences between
endogenous attention to spatial locations and sensory modalities. Exp. Brain Res. 235,
Vibell, J., Klinge, C., Zampini, M., Spence, C., Nobre, A.C., 2007. Temporal order is coded
temporally in the brain: early event-related potential latency shifts underlying prior
entry in a cross-modal temporal order judgment task. J. Cogn. Neurosci. 19, 109–120.
Watanabe, K., Shimojo, S., 2001. When sound affects vision: effects of auditory grouping
on visual motion perception. Psychol. Sci. 12, 109–116.
Watanabe, K., Shimojo, S., 1998. Attentional modulation in perception of visual motion
events. Perception 27, 1041–1054.
Zhao, S., Wang, Y., Feng, C., Feng, W., 2020. Multiple phases of cross-sensory interactions
associated with the audiovisual bounce-inducing effect. Biol. Psychol. 149, 107805.
Zhao, S., Wang, Y., Jia, L., Feng, C., Liao, Y., Feng, W., 2017. Pre-coincidence brain
activity predicts the perceptual outcome of streaming/bouncing motion display. Sci.
Rep. 7, 1–11.
Zhao, S., Wang, Y., Xu, H., Feng, C., Feng, W., 2018. Early cross-modal interactions
underlie the audiovisual bounce-inducing effect. NeuroImage 174, 208–218.
Zhou, F., Wong, V., Sekuler, R., 2007. Multi-sensory integration of spatio-temporal
segmentation cues: one plus one does not always equal two. Exp. Brain Res. 180,
Zvyagintsev, M., Nikolaev, A.R., Sachs, O., Mathiak, K., 2011. Early attention modulates
perceptual interpretation of multisensory stimuli. Neuroreport 22, 586–591.