Article
PDF available

Explaining the Effect of Likelihood Manipulation and Prior Through a Neural Network of the Audiovisual Perception of Space

Abstract

Results in the recent literature suggest that multisensory integration in the brain follows the rules of Bayesian inference. However, how neural circuits can realize such inference, and how it can be learned from experience, is still the subject of active research. The aim of this work is to use a recent neurocomputational model to investigate how the likelihood and the prior can be encoded in synapses, and how they affect audio-visual perception, in a variety of conditions characterized by different experience, different cue reliabilities, and temporal asynchrony. The model comprises two unisensory networks (auditory and visual) with plastic receptive fields and plastic crossmodal synapses, trained during a learning period. During training, visual and auditory stimuli are more frequent and more sharply tuned close to the fovea. After training, model simulations were performed in crossmodal conditions to assess auditory and visual perception biases: visual stimuli were positioned at different azimuths (±10° from the fovea) and coupled with an auditory stimulus at various audio-visual distances (±20°). Cue reliability was altered by using visual stimuli with two different contrast levels. Model predictions are compared with behavioral data and agree with them across conditions in which the prior and the likelihood play different roles. Finally, the effects of a different unimodal or crossmodal prior, of re-learning, of temporal correlation among input stimuli, and of visual damage (hemianopia) are tested, to illustrate the model's possible use in clarifying important multisensory problems.
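As a concrete illustration of the Bayesian estimate the abstract refers to, the sketch below fuses Gaussian auditory and visual likelihoods with a Gaussian prior centered at the fovea. The function name and all numerical values are illustrative assumptions, not taken from the model's code.

```python
import numpy as np

def bayes_av_estimate(x_a, sigma_a, x_v, sigma_v, mu_p=0.0, sigma_p=20.0):
    """Posterior-mean position (deg) for Gaussian likelihoods and a
    Gaussian prior centered at the fovea (mu_p = 0 deg). Each term is
    weighted by its inverse variance: more reliable cues pull harder."""
    w_a, w_v, w_p = 1 / sigma_a**2, 1 / sigma_v**2, 1 / sigma_p**2
    return (w_a * x_a + w_v * x_v + w_p * mu_p) / (w_a + w_v + w_p)

# High-contrast (reliable) vision captures the sound: ventriloquism.
print(bayes_av_estimate(x_a=20.0, sigma_a=8.0, x_v=10.0, sigma_v=2.0))   # ~10.5
# Low-contrast vision: the auditory cue and the foveal prior weigh more.
print(bayes_av_estimate(x_a=20.0, sigma_a=8.0, x_v=10.0, sigma_v=12.0))  # ~15.2
```

With a high-contrast visual cue the fused estimate sits near the visual position; lowering the contrast shifts weight back toward the auditory cue and the prior, which is the qualitative pattern the model is tested against.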
... Recent work has begun to explore the question of how prior information is stored at the neural level [45,46]. Generally, two types of prior information have been considered: the prior probabilities associated with unisensory estimates and the joint probability of two stimuli co-occurring. ...
... In terms of the unisensory prior information, some have proposed that it is represented by the density distribution of receptive fields, with more frequent sensory events represented by denser neural populations. Regarding the binding-tendency prior, it has been proposed that cross-modal synaptic connectivity could store this prior, with denser cross-modal synaptic connections representing a greater tendency for audiovisual information to co-occur [46]. ...
... A recent model [46] utilizing these biologically inspired prior representations was successful at reproducing common multisensory integration illusions such as visual biasing of auditory localization estimates, known as the ventriloquist effect. The authors also explored the ability of the model in a mature state to re-learn input statistical regularities and re-adjust the prior representation through training. ...
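One way to picture the "denser populations for more frequent events" proposal in the excerpts above is to place neurons' preferred positions at the quantiles of the stimulus distribution, so that the layout of the population itself stores the prior. A toy sketch under that assumption (the Gaussian stimulus statistics and the neuron count are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# Assume stimuli are Gaussian-distributed around the fovea (0 deg).
# Placing preferred positions at the quantiles of that distribution
# gives finer coverage (more neurons per degree) where stimuli are frequent.
n_neurons = 21
quantiles = (np.arange(n_neurons) + 0.5) / n_neurons
preferred = norm.ppf(quantiles, loc=0.0, scale=15.0)  # dense near 0 deg

spacing = np.diff(preferred)
print("central spacing:    %.2f deg" % spacing[n_neurons // 2])
print("peripheral spacing: %.2f deg" % spacing[0])
```

The uneven spacing (about 1.8° between central neurons versus roughly 7.7° at the edge) is the population-level counterpart of a foveal prior.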
Article
Full-text available
Integration of sensory signals that emanate from the same source, such as the sight of lip articulations and the sound of the voice of a speaking individual, can improve perception of the source signal (e.g., speech). Because momentary sensory inputs are typically corrupted with internal and external noise, there is almost always a discrepancy between the inputs, facing the perceptual system with the problem of determining whether the two signals were caused by the same source or by different sources. Thus, whether or not multisensory stimuli are integrated, and the degree to which they are bound, is influenced by factors such as the prior expectation of a common source. We refer to this factor as the tendency to bind stimuli, or for short, binding tendency. In theory, the tendency to bind sensory stimuli can be learned by experience through the acquisition of the probabilities of the co-occurrence of the stimuli. It can also be influenced by cognitive knowledge of the environment. The binding tendency varies across individuals and can also vary within an individual over time. Here, we review the studies that have investigated the plasticity of binding tendency. We discuss the protocols that have been reported to produce changes in binding tendency, the candidate learning mechanisms involved in this process, the possible neural correlates of binding tendency, and outstanding questions pertaining to binding tendency and its plasticity. We conclude by proposing directions for future research and argue that understanding mechanisms and recipes for increasing binding tendency can have important clinical and translational applications for populations or individuals with a deficiency in multisensory integration.
... In the model, each region is simulated with a single neural element. This choice is made for simplicity; given the experimental set-up of Crosse, Foxe, and Molholm (2019), we do not require either multiple units sensitive to a different spatial position in each sensory area as implemented in our previous neurocomputational spatial models (see Cuppini, Stein, & Rowland, 2018; Cuppini et al., 2011; Cuppini et al., 2012, 2014, 2017a; Magosso, Cuppini, & Ursino, 2012; Ursino, Cuppini, & Magosso, 2017; Ursino et al., 2017; Ursino et al., 2019) nor do we need multiple sensory regions sensitive to different input features, as necessary to realize semantic memory models (Cuppini, Magosso, & Ursino, 2009; Ursino, Cuppini, & Magosso, 2010; Ursino, Magosso, & Cuppini, 2009b; Ursino et al., 2018). ...
... This architectural implementation matches our previous studies and the computational models we developed to simulate MSI in various experimental conditions and for different perceptual and cognitive processes. For example, the role played by direct cross-modal connections between primary sensory regions has been found to be critical in multisensory illusions such as spatial ventriloquism, the sound-induced flash illusion and the fusion effect (Cuppini et al., 2017a; Magosso et al., 2012; Ursino, Cuppini, & Magosso, 2017; Ursino et al., 2019); the feedforward synapses converging on a multisensory region are critical to explain the integrative abilities of the Superior Colliculus and their maturation (Cuppini et al., 2018; Cuppini et al., 2011; Cuppini et al., 2012); and both mechanisms, and their maturation through specific multisensory experience, simulate and explain how the brain deals with processes of a higher level of complexity, such as the solution of the causal inference problem (Cuppini et al., 2017a) and the acquisition of MSI language abilities (Cuppini et al., 2017b). ...
... In particular, in the present paper we simulated only audio-visual stimuli at a single specific spatial position (and so we used just a single neural unit per area, representing a population of neurons that codes for that position). Previous modeling work, including more neurons to code for different azimuthal positions (Cuppini et al., 2017a; Ursino et al., 2019), demonstrated that cross-modal excitatory connections are at the basis of some illusory phenomena, such as spatial ventriloquism (Cuppini et al., 2017a; Magosso, Cuppini, & Bertini, 2017; Magosso et al., 2012) and the sound-induced flash illusion. Moreover, previous studies demonstrated that these synapses can be trained by experience to implement the prior probability of the co-occurrence of audio-visual stimuli in close temporal and spatial proximity. ...
Article
In a simple reaction time task in which auditory and visual stimuli are presented in random sequence alone (A or V) or together (AV), there is a so-called reaction time (RT) cost on trials in which the sensory modality switches (A→V) compared to when it repeats (A→A). This is always true for unisensory trials, whereas RTs to AV stimuli preceded by unisensory stimuli are statistically comparable with the Repeat condition (AV→AV). Neural facilitation for Repeat trials or neural inhibition for Switch trials could both account for these effects. Here we used a neural network model, the Multisensory Integration with Crossed Inhibitory Dynamics (MICID) model, to test the ability of these two distinct mechanisms, inhibition and facilitation, to produce the specific patterns of behavior that we see experimentally, modeling switch and repeat trials as well as the influence of the interval between the present and the previous trial. The model results are consistent with an inhibitory account in which there is competition between the different sensory modalities, rather than a facilitation account in which the preceding stimulus sensitizes the neural system to its particular sensory modality. Moreover, the model shows that multisensory integration can explain the results in the case of multisensory stimuli, where the preceding stimulus has little effect. This is due to faster dynamics for multisensory facilitation compared to cross-sensory inhibition. These findings link the cognitive framework delineated by the empirical results to a plausible neural implementation.
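The pattern described can be sketched with a toy rate model of the inhibition account. This is not the published MICID model: it simply assumes each stimulus leaves a slow inhibitory trace on the other modality's unit, so a switch trial starts against residual inhibition and crosses threshold later. All constants are illustrative.

```python
import numpy as np

def time_to_threshold(prev, curr, gap_ms=300.0, tau_h=400.0,
                      tau=20.0, dt=1.0, w_inh=0.6, thr=0.5):
    """RT proxy (ms) for a unisensory trial of modality `curr` (0 = A,
    1 = V) preceded by a trial of modality `prev`, under crossed
    inhibition only (hypothetical toy, not the MICID equations)."""
    h = np.zeros(2)               # slow cross-inhibitory traces
    h[1 - prev] = 1.0             # the previous stimulus inhibited the other unit
    h *= np.exp(-gap_ms / tau_h)  # the trace decays over the inter-trial gap
    a, t = 0.0, 0.0               # activity of the driven unit, elapsed time
    while a < thr:
        drive = max(1.0 - w_inh * h[curr], 0.0)  # input minus residual inhibition
        a += dt / tau * (-a + drive)             # leaky integration
        h *= np.exp(-dt / tau_h)
        t += dt
        if t > 2000.0:            # no response within 2 s
            return np.inf
    return t

print("repeat (A->A):", time_to_threshold(prev=0, curr=0))  # fast
print("switch (V->A):", time_to_threshold(prev=1, curr=0))  # slower
```

Lengthening `gap_ms` lets the trace decay further and shrinks the switch cost, mirroring the interval dependence the model was used to test.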
... The literature on computational models of multisensory integration has grown remarkably over the last twenty years [5]. The main computational frameworks are optimal cue combination [6], Bayesian causal inference [8], race [9] and network [10] models. Despite being based on very general mechanisms, these models are typically limited to a specific experimental paradigm (e.g. ...
Conference Paper
Full-text available
Research on the neural processes by which unisensory signals are combined to form a multisensory response has grown remarkably in recent years. Nevertheless, there is as yet no computational modelling software that spans the different paradigms and levels of explanation in multisensory integration. We introduce Scikit-NeuroMSI, a Python framework for multisensory integration modelling aimed at fostering a unifying framework that narrows the gap between neural and behavioural multisensory responses. Here we show how Scikit-NeuroMSI can be used to easily reproduce the spatial ventriloquist effect employing two different modelling approaches: near-optimal bimodal integration and Bayesian Causal Inference.
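For concreteness, here is a generic sketch of the Bayesian Causal Inference computation named above, in the style of Körding et al. (2007). It does not use the Scikit-NeuroMSI API; the function name and all noise and prior parameters are illustrative assumptions.

```python
import numpy as np

def bci_auditory_estimate(x_v, x_a, sig_v=2.0, sig_a=8.0,
                          sig_p=20.0, p_common=0.5):
    """Model-averaged auditory position estimate under Bayesian Causal
    Inference, with Gaussian likelihoods and a zero-mean spatial prior."""
    vv, va, vp = sig_v**2, sig_a**2, sig_p**2
    # Likelihood of the cue pair under one common cause (C = 1) ...
    z1 = vv * va + vv * vp + va * vp
    like_c1 = (np.exp(-0.5 * ((x_v - x_a)**2 * vp + x_v**2 * va + x_a**2 * vv) / z1)
               / (2 * np.pi * np.sqrt(z1)))
    # ... and under two independent causes (C = 2).
    like_c2 = (np.exp(-0.5 * (x_v**2 / (vv + vp) + x_a**2 / (va + vp)))
               / (2 * np.pi * np.sqrt((vv + vp) * (va + vp))))
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    # Conditional estimates: fused if common cause, audition-plus-prior if not.
    s_fused = (x_v / vv + x_a / va) / (1 / vv + 1 / va + 1 / vp)
    s_alone = (x_a / va) / (1 / va + 1 / vp)
    return post_c1 * s_fused + (1 - post_c1) * s_alone

print(bci_auditory_estimate(x_v=5.0, x_a=10.0))  # small conflict: strong capture
print(bci_auditory_estimate(x_v=5.0, x_a=30.0))  # large conflict: little capture
```

Small audiovisual conflicts yield a high common-cause posterior and near-complete visual capture of the sound (the ventriloquist effect); large conflicts break the binding and the auditory estimate stays close to the sound.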
... Critically, in this illusion, perception is influenced by learned "rules of thumb" regarding how the world ought to appear. In Bayesian terms, these rules of thumb are termed "priors", and they are used to guide quick and optimal perceptual judgments (for an outline of how priors are implemented in audiovisual perception at the neurocomputational level, see Ursino et al., 2019). Susceptibility to the SIFI appears to arise from such Bayesian optimal inference, combined with the reliability of the information encoded by each sense (Shams, Ma et al., 2005). ...
Article
Full-text available
Recent studies suggest that the lived environment can affect cognition across the lifespan. We examined, in a large cohort of older adults (n = 3447), whether susceptibility to a multisensory illusion, the Sound-Induced Flash Illusion (SIFI), was influenced by the reported urbanity of current and childhood (at age 14 years) residence. If urban environments help to shape healthy perceptual function, we predicted reduced SIFI susceptibility in urban dwellers. Participants reporting urban, compared with rural, childhood residence were less susceptible to SIFI at longer Stimulus-Onset Asynchronies (SOAs). Those currently residing in urban environments were more susceptible to SIFI at longer SOAs, particularly if they scored low on general cognitive function. These findings held even when controlling for several covariates, such as age, sex, education, social participation and cognitive ability. Exposure to urban environments in childhood may influence individual differences in perception and offer a multisensory perceptual benefit in older age.
... These types of models are also typically used in single-modal perceptual decision-making (e.g., Ratcliff, 1978; Wang, 2002; Wong and Wang, 2006). In this review, we will only focus on the accumulator and probabilistic models; the neural network (connectionist) models provide a finer-grained, more biologically plausible description of neural processes, but at the behavioral level are mostly similar to the models reviewed here (Bogacz et al., 2006; Ma et al., 2006; Wong and Wang, 2006; Ma and Pouget, 2008; Roxin and Ledberg, 2008; Liu et al., 2009; Pouget et al., 2013; Ursino et al., 2014, 2019; Zhang et al., 2016; Meijer et al., 2019). ...
Article
Full-text available
Multimodal integration is an important process in perceptual decision-making. In humans, this process has often been shown to be statistically optimal, or near optimal: sensory information is combined in a fashion that minimizes the average error in perceptual representation of stimuli. However, sometimes there are costs that come with the optimization, manifesting as illusory percepts. We review audiovisual facilitations and illusions that are products of multisensory integration, and the computational models that account for these phenomena. In particular, the same optimal computational model can lead to illusory percepts, and we suggest that more studies are needed to detect and mitigate these illusions, as artifacts in artificial cognitive systems. We provide cautionary considerations when designing artificial cognitive systems with a view to avoiding such artifacts. Finally, we suggest avenues of research toward solutions to potential pitfalls in system design. We conclude that detailed understanding of multisensory integration and the mechanisms behind audiovisual illusions can benefit the design of artificial cognitive systems.
Article
Full-text available
The brain integrates information from different sensory modalities to generate a coherent and accurate percept of external events. Several experimental studies suggest that this integration follows the principle of Bayesian estimation. However, the neural mechanisms responsible for this behavior, and its development in a multisensory environment, are still insufficiently understood. We recently presented a neural network model of audio-visual integration (Neural Computation, 2017) to investigate how a Bayesian estimator can spontaneously develop from the statistics of external stimuli. The model assumes the presence of two topologically organized unimodal areas (auditory and visual). Neurons in each area receive an input from the external environment, computed as the inner product of the sensory-specific stimulus and the receptive-field synapses, and a cross-modal input from neurons of the other modality. Based on sensory experience, synapses were trained via Hebbian potentiation and a decay term. The aim of this work is to improve the previous model by including a more realistic distribution of visual stimuli: visual stimuli have a higher spatial accuracy at the central azimuthal coordinate and a lower accuracy at the periphery. Moreover, their prior probability is higher at the center and decreases toward the periphery. Simulations show that, after training, the receptive fields of visual and auditory neurons shrink to reproduce the accuracy of the input (both at the center and at the periphery in the visual case), thus realizing the likelihood estimate of unimodal spatial position. Moreover, the preferred positions of visual neurons contract toward the center, thus encoding the prior probability of the visual input. Finally, a prior probability of the co-occurrence of audio-visual stimuli is encoded in the cross-modal synapses. The model is able to simulate the main properties of a Bayesian estimator and to reproduce behavioral data in all conditions examined. In particular, in unisensory conditions the visual estimates exhibit a bias toward the fovea, which increases with the level of noise. In cross-modal conditions, the SD of the estimates decreases when using congruent audio-visual stimuli, and a ventriloquism effect becomes evident in case of spatially disparate stimuli. Moreover, the ventriloquism effect decreases with eccentricity.
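The foveal bias that grows with noise can be illustrated with a toy read-out of such a population: preferred positions are sampled densely near the fovea (encoding the prior), and a barycenter decoder is applied to noisy activity. The sampling scheme and every parameter below are assumptions for illustration, not the model's equations.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Preferred positions placed at the quantiles of a foveally peaked
# stimulus distribution: denser coverage near 0 deg.
n = 101
preferred = norm.ppf((np.arange(n) + 0.5) / n, loc=0.0, scale=15.0)

def decode(stim_deg, noise_sd, sigma_rf=6.0, trials=2000):
    """Barycenter (activity-weighted mean) read-out of a noisy population."""
    estimates = []
    for _ in range(trials):
        act = np.exp(-(preferred - stim_deg)**2 / (2 * sigma_rf**2))
        act = np.clip(act + rng.normal(0.0, noise_sd, n), 0.0, None)
        estimates.append((act @ preferred) / act.sum())
    return np.mean(estimates)

for noise in (0.0, 0.1, 0.3):
    print(f"noise {noise}: stimulus at 10 deg decoded as {decode(10.0, noise):.1f} deg")
```

Because baseline noise spreads activity over the whole (center-heavy) population, the decoded position drifts toward the fovea as noise grows, qualitatively matching the unisensory bias reported above.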
Article
Full-text available
Individuals vary in their tendency to bind signals from multiple senses. For the same set of sights and sounds, one individual may frequently integrate multisensory signals and experience a unified percept, whereas another individual may rarely bind them and often experience two distinct sensations. While this binding/integration tendency is specific to each individual, it is not clear how plastic it is in adulthood, and how sensory experiences may cause it to change. Here, we conducted an exploratory investigation which provides evidence that (1) the brain's tendency to bind in spatial perception is plastic, (2) it can change following brief exposure to simple audiovisual stimuli, and (3) exposure to temporally synchronous, spatially discrepant stimuli provides the most effective method to modify it. These results can inform current theories about how the brain updates its internal model of the surrounding sensory world, as well as future investigations seeking to increase integration tendencies.
Article
Full-text available
Psychophysical studies have frequently found that adults with normal hearing exhibit systematic errors (biases) in their auditory localisation judgments. Here we tested (i) whether systematic localisation errors could reflect reliance on prior knowledge, as has been proposed for other systematic perceptual biases, and (ii) whether auditory localisation biases can be reduced following training with accurate visual feedback. Twenty-four normal hearing participants were asked to localise the position of a noise burst along the azimuth before, during, and after training with visual feedback. Consistent with reliance on prior knowledge to reduce sensory uncertainty, we found that auditory localisation biases increased when auditory localisation uncertainty increased. Specifically, participants mis-localised auditory stimuli as being more eccentric than they were, and did so more when auditory uncertainty was greater. However, biases also increased with eccentricity, despite no corresponding increase in uncertainty, which is not readily explained by use of a simple prior favouring peripheral locations. Localisation biases decreased (improved) following training with visual feedback, but the reliability of the visual feedback stimulus did not change the effects of training. We suggest that further research is needed to identify alternative mechanisms, besides use of prior knowledge, that could account for increased perceptual biases under sensory uncertainty.
Article
Full-text available
The brain efficiently processes multisensory information by selectively combining related signals across the continuous stream of multisensory inputs. To do so, it needs to detect correlation, lag and synchrony across the senses; optimally integrate related information; and dynamically adapt to spatiotemporal conflicts across the senses. Here we show that all these aspects of multisensory perception can be jointly explained by postulating an elementary processing unit akin to the Hassenstein–Reichardt detector—a model originally developed for visual motion perception. This unit, termed the multisensory correlation detector (MCD), integrates related multisensory signals through a set of temporal filters followed by linear combination. Our model can tightly replicate human perception as measured in a series of empirical studies, both novel and previously published. MCDs provide a unified general theory of multisensory processing, which simultaneously explains a wide spectrum of phenomena with a simple, yet physiologically plausible model.
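The filter-then-multiply principle can be sketched schematically. The version below assumes first-order exponential low-pass filters and simple mean read-outs; the published MCD differs in its filter shapes and outputs, so this is only an illustration of the principle, not the model itself.

```python
import numpy as np

def lowpass(x, tau, dt=1.0):
    """First-order exponential low-pass filter (illustrative stand-in
    for the detector's temporal filters)."""
    y = np.zeros_like(x, dtype=float)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + dt / tau * (x[t] - y[t - 1])
    return y

def correlation_detector(vis, aud, tau_fast=30.0, tau_slow=100.0):
    """Two mirror-symmetric subunits each multiply a fast-filtered version
    of one signal with a slow-filtered version of the other (Reichardt
    style); their product tracks correlation, their difference tracks lag."""
    u1 = lowpass(vis, tau_fast) * lowpass(aud, tau_slow)
    u2 = lowpass(vis, tau_slow) * lowpass(aud, tau_fast)
    return np.mean(u1 * u2), np.mean(u2 - u1)  # (correlation, lag) read-outs

t = np.arange(1000)                    # 1 ms steps
flash = (t % 200 < 10).astype(float)   # periodic 10 ms pulses
print("sync :", correlation_detector(flash, flash))
print("async:", correlation_detector(flash, np.roll(flash, 60)))  # sound lags 60 ms
```

Synchronous streams drive both subunits identically (zero lag output, maximal correlation); delaying one stream reduces the correlation read-out and pushes the lag read-out away from zero.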
Article
Full-text available
Optimal use of sensory information requires that the brain estimates the reliability of sensory cues, but the neural correlate of cue reliability relevant for behavior is not well defined. Here, we addressed this issue by examining how the reliability of a spatial cue influences neuronal responses and behavior in the owl's auditory system. We show that the firing rate and spatial selectivity changed with cue reliability due to the mechanisms generating the tuning to the sound localization cue. We found that the correlated variability among neurons strongly depended on the shape of the tuning curves. Finally, we demonstrated that the change in the neurons' selectivity was necessary and sufficient for a network of stochastic neurons to predict behavior when sensory cues were corrupted with noise. This study demonstrates that the shape of tuning curves can stand alone as a coding dimension of environmental statistics. Significance statement: In natural environments, sensory cues are often corrupted by noise and are therefore unreliable. To make the best decisions, the brain must estimate the degree to which a cue can be trusted. The behaviorally relevant neural correlates of cue reliability are debated. In this study, we used the barn owl's sound localization system to address this question. We demonstrated that the mechanisms that account for spatial selectivity also explained how neural responses changed with degraded signals. This allowed for the neurons' selectivity to capture cue reliability, influencing the population readout commanding the owl's sound-orienting behavior.
Article
Recently, experimental and theoretical research has focused on the brain's abilities to extract information from a noisy sensory environment and how cross-modal inputs are processed to solve the causal inference problem to provide the best estimate of external events. Despite the empirical evidence suggesting that the nervous system uses a statistically optimal and probabilistic approach in addressing these problems, little is known about the brain's architecture needed to implement these computations.
Article
In primates, posterior auditory cortical areas are thought to be part of a dorsal auditory pathway that processes spatial information. But how posterior (and other) auditory areas represent acoustic space remains a matter of debate. Here we provide new evidence based on functional magnetic resonance imaging (fMRI) of the macaque indicating that space is predominantly represented by a distributed hemifield code rather than by a local spatial topography. Hemifield tuning in cortical and subcortical regions emerges from an opponent hemispheric pattern of activation and deactivation that depends on the availability of interaural delay cues. Importantly, these opponent signals allow responses in posterior regions to segregate space similarly to a hemifield code representation. Taken together, our results reconcile seemingly contradictory views by showing that the representation of space follows closely a hemifield code and suggest that enhanced posterior-dorsal spatial specificity in primates might emerge from this form of coding.
Article
Recent theoretical and experimental studies suggest that in multisensory conditions, the brain performs a near-optimal Bayesian estimate of external events, giving more weight to the more reliable stimuli. However, the neural mechanisms responsible for this behavior, and its progressive maturation in a multisensory environment, are still insufficiently understood. The aim of this letter is to analyze this problem with a neural network model of audiovisual integration, based on probabilistic population coding: the idea that a population of neurons can encode probability functions to perform Bayesian inference. The model consists of two chains of unisensory neurons (auditory and visual), topologically organized. They receive the corresponding input through a plastic receptive field and reciprocally exchange plastic cross-modal synapses, which encode the spatial co-occurrence of visual-auditory inputs. A third chain of multisensory neurons performs a simple sum of auditory and visual excitations. The work includes a theoretical part and a computer simulation study. We show how a simple rule for synapse learning (consisting of Hebbian reinforcement and a decay term) can be used during training to shrink the receptive fields and encode the unisensory likelihood functions. Hence, after training, each unisensory area realizes a maximum likelihood estimate of stimulus position (auditory or visual). In cross-modal conditions, the same learning rule can encode information on prior probability into the cross-modal synapses. Computer simulations confirm the theoretical results and show that the proposed network can realize a maximum likelihood estimate of auditory (or visual) positions in unimodal conditions and a Bayesian estimate, with moderate deviations from optimality, in cross-modal conditions. Furthermore, the model explains the ventriloquism illusion and, looking at the activity in the multimodal neurons, explains the automatic reweighting of auditory and visual inputs on a trial-by-trial basis, according to the reliability of the individual cues.
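A minimal sketch of the kind of learning rule described (Hebbian reinforcement plus a decay term) acting on one unit's receptive field. Rate-coded activities, the threshold, and all constants are illustrative assumptions; the point is only that the field shrinks from a broad initial profile toward the statistics of the input.

```python
import numpy as np

rng = np.random.default_rng(2)

positions = np.arange(-40, 41).astype(float)  # azimuth grid, deg
weights = np.full(positions.size, 0.5)        # initially broad receptive field

eta, gamma = 0.05, 0.05                       # Hebbian gain and decay rate
for _ in range(5000):
    stim = rng.normal(0.0, 10.0)              # stimuli cluster near the fovea
    pre = np.exp(-(positions - stim)**2 / (2 * 4.0**2))  # input activity
    post = max(pre @ weights - 3.0, 0.0)      # thresholded output unit
    # Hebbian reinforcement where pre and post co-fire, plus a decay term:
    weights += eta * pre * post - gamma * weights * post
    np.clip(weights, 0.0, 1.0, out=weights)

half_width = (weights > weights.max() / 2).sum() / 2
print(f"receptive-field half-width after training: ~{half_width:.0f} deg")
```

At equilibrium the decay term balances the Hebbian term, so each weight settles near the average input activity at its position: peripheral weights die away and the field narrows around the frequent (foveal) stimulus locations.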
Article
Hemianopic patients retain some abilities to integrate audiovisual stimuli in the blind hemifield, showing both modulation of visual perception by auditory stimuli and modulation of auditory perception by visual stimuli. Indeed, conscious detection of a visual target in the blind hemifield can be improved by a spatially coincident auditory stimulus (auditory enhancement of visual detection), while a visual stimulus in the blind hemifield can improve localization of a spatially coincident auditory stimulus (visual enhancement of auditory localization). To gain more insight into the neural mechanisms underlying these two perceptual phenomena, we propose a neural network model including areas of neurons representing the retina, primary visual cortex (V1), extrastriate visual cortex, auditory cortex and the Superior Colliculus (SC). The visual and auditory modalities in the network interact via both direct cortical-cortical connections and subcortical-cortical connections involving the SC; the latter, in particular, integrates visual and auditory information and projects back to the cortices. Hemianopic patients were simulated by unilaterally lesioning V1, while preserving spared islands of V1 tissue within the lesion, to analyze the role of residual V1 neurons in mediating audiovisual integration. The network is able to reproduce the audiovisual phenomena in hemianopic patients, linking perceptions to neural activations, and disentangles the individual contributions of specific neural circuits and areas via sensitivity analyses. The study suggests i) a common key role of SC-cortical connections in mediating both audiovisual phenomena; ii) a different role of the visual cortices in the two phenomena: auditory enhancement of conscious visual detection is conditional on surviving V1 islands, while visual enhancement of auditory localization persists even after complete V1 damage. The present study may contribute to advancing understanding of the audiovisual dialogue between cortical and subcortical structures in healthy and unisensory deficit conditions.
Article
Previous studies have shown a surprising amount of between-subjects variability in the strength of interactions between sensory modalities. For the same set of stimuli, some subjects exhibit strong interactions, whereas others exhibit weak interactions. To date, little is known about what underlies this variability. Sensory integration in the brain could be governed by a global mechanism or by task-specific mechanisms that could be either stable or variable across time. We used a rigorous quantitative tool (Bayesian causal inference) to investigate whether integration (i.e., binding) tendencies generalize across tasks and are stable across time. We report for the first time that individuals' binding tendencies are stable across time but are task-specific. These results provide evidence against the hypothesis that sensory integration is governed by a single, global parameter in the brain.