Preprint

Balancing Active Inference and Active Learning with Deep Variational Predictive Coding for EEG


Abstract

This paper discusses representation learning from electroencephalographic (EEG) signals with deep predictive coding networks. We introduce a hierarchical probabilistic network that minimises prediction error at multiple levels. While the lowest layer predicts brain activity directly, higher layers abstract away from the data and predict sequences of the hidden states in lower layers. The network captures both expected and actual uncertainty by relating the predicted and observed mean and variance of the state posteriors. Each layer minimises (expected) surprise either with or without sampling new evidence from the layer below. This structure motivates both active learning and active inference as means to learn representations. Active learning refers to model parameter exploration, which allows the network to learn regularities, especially when they are stable between trials. Active inference refers to hidden state exploration, a process that enables dynamic inference of the current context using the learned generative model. We show that separating and weighting the internally propagated errors from those used for model weight updates makes it possible to define prediction errors independently for each learning type. We train the model on EEG data recorded during free reading and apply it to adaptive fixation-related potential (FRP) prediction.
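Since the full paper is not available here, the comparison of expected and actual uncertainty can only be illustrated schematically. The sketch below is a hypothetical fragment, not the authors' implementation: it returns a precision-weighted error on the posterior mean together with a separate error on the log-variance, which is one way a layer could register mismatches in both the state estimate and its uncertainty.

```python
import torch

def state_prediction_errors(mu_pred, logvar_pred, mu_obs, logvar_obs):
    """Hypothetical sketch: compare a predicted state posterior (mean, log-variance)
    with the observed one and return two error signals."""
    # precision-weighted error on the mean (expected uncertainty scales the residual)
    err_mu = (mu_obs - mu_pred) * torch.exp(-0.5 * logvar_pred)
    # mismatch between expected and actual uncertainty
    err_var = logvar_obs - logvar_pred
    return err_mu, err_var
```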


References
Article
This study introduces PV-RNN, a novel variational RNN inspired by predictive-coding ideas. The model learns to extract the probabilistic structures hidden in fluctuating temporal patterns by dynamically changing the stochasticity of its latent states. Its architecture attempts to address two major concerns of variational Bayes RNNs: how latent variables can learn meaningful representations, and how the inference model can transfer future observations to the latent variables. PV-RNN does both by introducing adaptive vectors mirroring the training data, whose values can then be adapted differently during evaluation. Moreover, prediction errors during backpropagation, rather than external inputs during the forward computation, are used to convey information about the external data to the network. For testing, we introduce error regression for predicting unseen sequences, inspired by predictive coding, that leverages those mechanisms. As in other variational Bayes RNNs, our model learns by maximizing a lower bound on the marginal likelihood of the sequential data, which is composed of two terms: the negative of the expectation of prediction errors, and the negative of the Kullback-Leibler divergence between the prior and the approximate posterior distributions. The model introduces a weighting parameter, the meta-prior, to balance the optimization pressure placed on those two terms. We test the model on two datasets with probabilistic structures and show that with high values of the meta-prior the network develops deterministic chaos through which the data's randomness is imitated. For low values, the model behaves as a random process. The network performs best at intermediate values, where it is able to capture the latent probabilistic structure with good generalization. Analyzing the meta-prior's impact on the network allows us to study precisely the theoretical value and practical benefits of incorporating stochastic dynamics in our model. We demonstrate better prediction performance on a robot imitation task with our model using error regression compared to a standard variational Bayes model lacking such a procedure.
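As a hedged illustration of the meta-prior idea (not the exact PV-RNN objective, whose latent structure is more involved), the sketch below weights the Gaussian KL term of a sequence lower bound by a scalar w, which plays the role of the meta-prior balancing regularization against reconstruction.

```python
import torch
import torch.nn.functional as F

def weighted_lower_bound(x, x_pred, mu_q, logvar_q, mu_p, logvar_p, w):
    """Sketch of a meta-prior-weighted ELBO: reconstruction term minus
    w times KL(q || p) between diagonal Gaussian posterior and prior."""
    recon = -F.mse_loss(x_pred, x, reduction="sum")          # negative prediction error
    kl = 0.5 * torch.sum(logvar_p - logvar_q
                         + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                         - 1.0)
    return recon - w * kl                                     # quantity to maximize
```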
Article
Successful behaviour depends on the right balance between maximising reward and soliciting information about the world. Here, we show how different types of information-gain emerge when casting behaviour as surprise minimisation. We present two distinct mechanisms for goal-directed exploration that express separable profiles of active sampling to reduce uncertainty. ‘Hidden state’ exploration motivates agents to sample unambiguous observations to accurately infer the (hidden) state of the world. Conversely, ‘model parameter’ exploration compels agents to sample outcomes associated with high uncertainty, if they are informative for their representation of the task structure. We illustrate the emergence of these types of information-gain, termed active inference and active learning, and show how these forms of exploration induce distinct patterns of ‘Bayes-optimal’ behaviour. Our findings provide a computational framework for understanding how distinct levels of uncertainty systematically affect the exploration-exploitation trade-off in decision-making.
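In standard active-inference notation, the two forms of exploration appear as separate terms of the (negative) expected free energy of a policy \pi. The sketch below follows one common decomposition and may differ from the cited paper's exact expressions.

```latex
-G(\pi) =
\underbrace{\mathbb{E}_{Q(o\mid\pi)}\big[\ln P(o \mid C)\big]}_{\text{extrinsic value}}
+ \underbrace{\mathbb{E}_{Q(o\mid\pi)}\big[D_{\mathrm{KL}}\big(Q(s \mid o,\pi)\,\|\,Q(s \mid \pi)\big)\big]}_{\text{hidden-state exploration (active inference)}}
+ \underbrace{\mathbb{E}_{Q(o,s\mid\pi)}\big[D_{\mathrm{KL}}\big(Q(\theta \mid o,s,\pi)\,\|\,Q(\theta)\big)\big]}_{\text{parameter exploration (active learning)}}
```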
Article
We present the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset combining electroencephalography (EEG) and eye-tracking recordings from subjects reading natural sentences. ZuCo includes high-density EEG and eye-tracking data of 12 healthy adult native English speakers, each reading natural English text for 4–6 hours. The recordings span two normal reading tasks and one task-specific reading task, resulting in a dataset that encompasses EEG and eye-tracking data of 21,629 words in 1107 sentences and 154,173 fixations. We believe that this dataset represents a valuable resource for natural language processing (NLP). The EEG and eye-tracking signals lend themselves to training improved machine-learning models for various tasks, in particular for information extraction tasks such as entity and relation extraction and sentiment analysis. Moreover, this dataset is useful for advancing research into the human reading and language understanding process at the level of brain activity and eye-movement.
Article
This article offers a formal account of curiosity and insight in terms of active (Bayesian) inference. It deals with the dual problem of inferring states of the world and learning its statistical structure. In contrast to current trends in machine learning (e.g., deep learning), we focus on how people attain insight and understanding using just a handful of observations, which are solicited through curious behavior. We use simulations of abstract rule learning and approximate Bayesian inference to show that minimizing (expected) variational free energy leads to active sampling of novel contingencies. This epistemic behavior closes explanatory gaps in generative models of the world, thereby reducing uncertainty and satisfying curiosity. We then move from epistemic learning to model selection or structure learning to show how abductive processes emerge when agents test plausible hypotheses about symmetries (i.e., invariances or rules) in their generative models. The ensuing Bayesian model reduction evinces mechanisms associated with sleep and has all the hallmarks of "aha" moments. This formulation moves toward a computational account of consciousness in the pre-Cartesian sense of sharable knowledge (i.e., con: "together"; scire: "to know").
Article
Predictive coding is an influential model emphasizing interactions between feedforward and feedback signals. Here, we investigated the temporal dynamics of these interactions. Two gray disks with different versions of the same stimulus, one enabling predictive feedback (a 3D-shape) and one impeding it (random-lines), were simultaneously presented on the left and right of fixation. Human subjects judged the luminance of the two disks while EEG was recorded. The choice of 3D-shape or random-lines as the brighter disk was used to assess the influence of feedback signals on sensory processing in each trial (i.e., as a measure of post-stimulus predictive coding efficiency). Independently of the spatial response (left/right), we found that this choice fluctuated along with the pre-stimulus phase of two spontaneous oscillations: a ~5 Hz oscillation in contralateral frontal electrodes and a ~16 Hz oscillation in contralateral occipital electrodes. This pattern of results demonstrates that predictive coding is a rhythmic process, and suggests that it could take advantage of faster oscillations in low-level areas and slower oscillations in high-level areas.
Article
The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. By extending the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions, we propose the convolutional LSTM (ConvLSTM) and use it to build an end-to-end trainable model for the precipitation nowcasting problem. Experiments show that our ConvLSTM network captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm for precipitation nowcasting.
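For concreteness, a minimal PyTorch-style ConvLSTM cell is sketched below. It computes all four gates with a single convolution and omits the Hadamard (peephole) terms of the original formulation, so it is a simplification rather than the paper's exact model.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Simplified ConvLSTM cell: LSTM gates computed by convolutions so that
    hidden and cell states keep their spatial layout."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state                                   # each of shape (B, hidden, H, W)
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)                  # convolutional state update
        h = o * torch.tanh(c)
        return h, c
```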
Article
This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that - through experience-dependent plasticity - becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep - and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain's generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis - evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research.
Article
Eyetracking during reading has provided a critical source of on-line behavioral data informing basic theory in language processing. Similarly, event-related potentials (ERPs) have provided an important on-line measure of the neural correlates of language processing. Recently there has been strong interest in co-registering eyetracking and ERPs from simultaneous recording to capitalize on the strengths of both techniques, but a challenge has been devising approaches for controlling artifacts produced by eye movements in the EEG waveform. In this paper we describe our approach to correcting for eye movements in EEG and demonstrate its applicability to reading. The method is based on independent components analysis, and uses three criteria for identifying components tied to saccades: (1) component loadings on the surface of the head are consistent with eye movements; (2) source analysis localizes component activity to the eyes, and (3) the temporal activation of the component occurred at the time of the eye movement and differed for right and left eye movements. We demonstrate this method's applicability to reading by comparing ERPs time-locked to fixation onset in two reading conditions. In the text-reading condition, participants read paragraphs of text. In the pseudo-reading control condition, participants moved their eyes through spatially similar pseudo-text that preserved word locations, word shapes, and paragraph spatial structure, but eliminated meaning. The corrected EEG, time-locked to fixation onsets, showed effects of reading condition in early ERP components. The results indicate that co-registration of eyetracking and EEG in connected-text paragraph reading is possible, and has the potential to become an important tool for investigating the cognitive and neural bases of on-line language processing in reading.
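For readers who want to try a related (simpler) correction, the sketch below uses MNE-Python's EOG-correlation heuristic; it is not the three-criterion procedure described in the paper, and the file name is hypothetical.

```python
# Minimal MNE-Python sketch of ICA-based ocular artifact removal. It relies on
# correlation with EOG channels rather than the paper's three criteria
# (scalp topography, source localization at the eyes, saccade-locked timing).
import mne

raw = mne.io.read_raw_fif("reading_session_raw.fif", preload=True)  # hypothetical file
raw.filter(1., 40.)                                  # band-pass helps the ICA decomposition

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)

eog_inds, scores = ica.find_bads_eog(raw)            # components correlated with EOG channels
ica.exclude = eog_inds
raw_clean = ica.apply(raw.copy())                    # reconstruct EEG without ocular components
```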
Article
The descending projections from motor cortex share many features with top-down or backward connections in visual cortex; for example, corticospinal projections originate in infragranular layers, are highly divergent and (along with descending cortico-cortical projections) target cells expressing NMDA receptors. This is somewhat paradoxical because backward modulatory characteristics would not be expected of driving motor command signals. We resolve this apparent paradox using a functional characterisation of the motor system based on Helmholtz's ideas about perception; namely, that perception is inference on the causes of visual sensations. We explain behaviour in terms of inference on the causes of proprioceptive sensations. This explanation appeals to active inference, in which higher cortical levels send descending proprioceptive predictions, rather than motor commands. This process mirrors perceptual inference in sensory cortex, where descending connections convey predictions, while ascending connections convey prediction errors. The anatomical substrate of this recurrent message passing is a hierarchical system consisting of functionally asymmetric driving (ascending) and modulatory (descending) connections: an arrangement that we show is almost exactly recapitulated in the motor system, in terms of its laminar, topographic and physiological characteristics. This perspective casts classical motor reflexes as minimising prediction errors and may provide a principled explanation for why motor cortex is agranular.
Article
We have suggested that the mirror-neuron system might be usefully understood as implementing Bayes-optimal perception of actions emitted by oneself or others. To substantiate this claim, we present neuronal simulations that show the same representations can prescribe motor behavior and encode motor intentions during action-observation. These simulations are based on the free-energy formulation of active inference, which is formally related to predictive coding. In this scheme, (generalised) states of the world are represented as trajectories. When these states include motor trajectories they implicitly entail intentions (future motor states). Optimizing the representation of these intentions enables predictive coding in a prospective sense. Crucially, the same generative models used to make predictions can be deployed to predict the actions of self or others by simply changing the bias or precision (i.e. attention) afforded to proprioceptive signals. We illustrate these points using simulations of handwriting to illustrate neuronally plausible generation and recognition of itinerant (wandering) motor trajectories. We then use the same simulations to produce synthetic electrophysiological responses to violations of intentional expectations. Our results affirm that a Bayes-optimal approach provides a principled framework, which accommodates current thinking about the mirror-neuron system. Furthermore, it endorses the general formulation of action as active inference.
Article
This paper considers prediction and perceptual categorization as an inference problem that is solved by the brain. We assume that the brain models the world as a hierarchy or cascade of dynamical systems that encode causal structure in the sensorium. Perception is equated with the optimization or inversion of these internal models, to explain sensory data. Given a model of how sensory data are generated, we can invoke a generic approach to model inversion, based on a free energy bound on the model's evidence. The ensuing free-energy formulation furnishes equations that prescribe the process of recognition, i.e. the dynamics of neuronal activity that represent the causes of sensory input. Here, we focus on a very general model, whose hierarchical and dynamical structure enables simulated brains to recognize and predict trajectories or sequences of sensory states. We first review hierarchical dynamical models and their inversion. We then show that the brain has the necessary infrastructure to implement this inversion and illustrate this point using synthetic birds that can recognize and categorize birdsongs.
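The hierarchical dynamical models referred to here are typically written in the following state-space form (a sketch in common free-energy notation, where f and g are level-specific equations of motion and observation, and w, z denote random fluctuations):

```latex
\begin{aligned}
y &= g\big(x^{(1)}, v^{(1)}\big) + z^{(1)}, &
\dot{x}^{(1)} &= f\big(x^{(1)}, v^{(1)}\big) + w^{(1)},\\
v^{(i-1)} &= g\big(x^{(i)}, v^{(i)}\big) + z^{(i)}, &
\dot{x}^{(i)} &= f\big(x^{(i)}, v^{(i)}\big) + w^{(i)}, \qquad i = 2,\dots,m.
\end{aligned}
```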
Article
We describe a model of visual processing in which feedback connections from a higher- to a lower-order visual cortical area carry predictions of lower-level neural activities, whereas the feedforward connections carry the residual errors between the predictions and the actual lower-level activities. When exposed to natural images, a hierarchical network of model neurons implementing such a model developed simple-cell-like receptive fields. A subset of neurons responsible for carrying the residual errors showed endstopping and other extra-classical receptive-field effects. These results suggest that rather than being exclusively feedforward phenomena, nonclassical surround effects in the visual cortex may also result from cortico-cortical feedback as a consequence of the visual system using an efficient hierarchical strategy for encoding natural images.
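The essence of this scheme can be sketched in a few lines of NumPy (a toy single-level, single-patch illustration with made-up dimensions; the prior terms and weight learning of the full model are omitted): inference on the causes r descends the squared residuals carried by the feedforward and feedback pathways.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(64, 16))     # generative weights: causes -> image patch
x = rng.normal(size=64)                      # "sensory" input patch
r = np.zeros(16)                             # estimated causes at this level
r_td = np.zeros(16)                          # top-down prediction of r from the level above

for _ in range(200):
    bottom_up_error = x - U @ r              # residual carried by feedforward connections
    top_down_error = r_td - r                # residual against the higher-level prediction
    r += 0.1 * (U.T @ bottom_up_error + top_down_error)
```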
Article
Two theoretical ideas have emerged recently with the ambition to provide a unifying functional explanation of neural population coding and dynamics: predictive coding and Bayesian inference. Here, we describe the two theories and their combination into a single framework: Bayesian predictive coding. We clarify how the two theories can be distinguished, despite sharing core computational concepts and addressing an overlapping set of empirical phenomena. We argue that predictive coding is an algorithmic / representational motif that can serve several different computational goals of which Bayesian inference is but one. Conversely, while Bayesian inference can utilize predictive coding, it can also be realized by a variety of other representations. We critically evaluate the experimental evidence supporting Bayesian predictive coding and discuss how to test it more directly.
Conference Paper
The current paper proposes a novel variational Bayes predictive coding RNN model, which can learn to generate fluctuated temporal patterns from exemplars. The model learns to maximize the lower bound of the weighted sum of the regularization and reconstruction error terms. We examined how this weighting can affect development of different types of information processing while learning fluctuated temporal patterns. Simulation results show that strong weighting of the reconstruction term causes the development of deterministic chaos for imitating the randomness observed in target sequences, while strong weighting of the regularization term causes the development of stochastic dynamics imitating probabilistic processes observed in targets. Moreover, results indicate that the most generalized learning emerges between these two extremes. The paper concludes with implications in terms of the underlying neuronal mechanisms for autism spectrum disorder.
Conference Paper
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
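A minimal PyTorch sketch of the two ingredients follows: the reparameterization trick, and the resulting single-sample lower-bound estimator (here under a Gaussian-decoder assumption with a squared-error reconstruction term).

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, eps ~ N(0, I); gradients flow through mu and logvar
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

def lower_bound(x, x_recon, mu, logvar):
    # negative reconstruction error minus KL(q(z|x) || N(0, I)), to be maximized
    recon = -F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon - kl
```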
Conference Paper
While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning - leveraging unlabeled examples to learn about the structure of a domain - remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network ("PredNet") architecture that is inspired by the concept of "predictive coding" from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and generalizing across video datasets. These results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
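The characteristic local error computation can be sketched as follows (a simplified fragment, not the full PredNet stack): the residual between actual and predicted activity is split into positively and negatively rectified populations, and only this error tensor is forwarded to the next layer.

```python
import torch
import torch.nn.functional as F

def prediction_error(actual, predicted):
    """Concatenate positive and negative rectified residuals along the
    channel dimension; this is the only signal passed up the hierarchy."""
    return torch.cat([F.relu(actual - predicted),
                      F.relu(predicted - actual)], dim=1)
```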
Article
This chapter discusses how many features of cortical anatomy and physiology can be understood in the light of a predictive coding theory of brain function. In Sect. 7.1, we briefly discuss the theoretical reasons to suppose that the brain is likely to use predictive coding. One key theoretical underpinning of predictive coding is the free energy principle, which argues that brains must maximize the evidence for their (generative) model of sensory inputs: A process of ‘active inference’. In Sect. 7.2, we discuss how active inference predicts commonalities in the extrinsic connections of sensory and motor systems. Such commonalities are found in their hierarchical structure (shown by laminar characteristics), their topography, their pharmacology and physiology. In Sect. 7.3, we show how the equations describing hierarchical message passing within a predictive coding scheme can be mapped on to key features of intrinsic connections, namely the canonical cortical microcircuit, and their implications for the oscillatory dynamics of different cell populations. In Sect. 7.4, we briefly review some empirical evidence for predictive coding in the brain.
Article
In this paper, we explore the inclusion of random variables into the dynamic latent state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that through the use of high-level latent random variables, our variational RNN (VRNN) is able to learn to model the kind of variability observed in highly-structured sequential data (such as speech). We empirically evaluate the proposed model against related sequential models on five sequence datasets, four of speech and one of handwriting. Our results show the importance of the role random variables can play in the RNN dynamic latent state.
Article
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion of related topics in machine learning research are also presented.
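As one concrete query strategy from this literature, entropy-based uncertainty sampling asks for the label of the unlabeled instance the current model is least sure about (a deliberately simple sketch):

```python
import numpy as np

def uncertainty_sampling(class_probs):
    """class_probs: (n_unlabeled, n_classes) predicted probabilities.
    Returns the index of the instance with maximum predictive entropy."""
    entropy = -np.sum(class_probs * np.log(class_probs + 1e-12), axis=1)
    return int(np.argmax(entropy))
```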
Conference Paper
We present an active learning scheme that exploits cluster structure in data.
Conference Paper
Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these “Stepped Sigmoid Units” are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
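As we understand the approximation (an assumption on our part; see the paper for the exact form), a noisy rectified linear unit adds Gaussian noise whose variance is the logistic sigmoid of the input before rectifying:

```python
import numpy as np

def noisy_relu(x, rng=np.random.default_rng()):
    # NReLU sketch: y = max(0, x + N(0, sigmoid(x))), with sigmoid(x) as the noise variance
    var = 1.0 / (1.0 + np.exp(-x))
    return np.maximum(0.0, x + rng.normal(0.0, np.sqrt(var)))
```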
Article
An unsupervised learning algorithm for a multilayer network of stochastic neurons is described. Bottom-up "recognition" connections convert the input into representations in successive hidden layers, and top-down "generative" connections reconstruct the representation in one layer from the representation in the layer above. In the "wake" phase, neurons are driven by recognition connections, and generative connections are adapted to increase the probability that they would reconstruct the correct activity vector in the layer below. In the "sleep" phase, neurons are driven by generative connections, and recognition connections are adapted to increase the probability that they would produce the correct activity vector in the layer above.
Article
The basic principles of linear predictive coding (LPC) are presented. Least-squares methods for obtaining the LPC coefficients characterizing the all-pole filter are described. Computational factors, instantaneous updating, and spectral estimation are discussed.
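A compact sketch of the autocorrelation (least-squares) method for the all-pole coefficients, using SciPy's Toeplitz solver (function and variable names are illustrative):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(x, order):
    """Autocorrelation-method LPC: solve the Toeplitz normal equations R a = r
    for the predictor x[n] ~ sum_k a[k] * x[n-1-k]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
```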
K. Han, H. Wen, Y. Zhang, D. Fu, E. Culurciello, and Z. Liu, "Deep predictive coding network with local recurrent processing for object recognition," in Advances in Neural Information Processing Systems, 2018, pp. 9201-9213.
A. v. d. Oord, Y. Li, and O. Vinyals, "Representation learning with contrastive predictive coding," arXiv preprint arXiv:1807.03748, 2018.
D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson, "Learning latent dynamics for planning from pixels," arXiv preprint arXiv:1811.04551, 2018.
M. Karl, M. Soelch, J. Bayer, and P. van der Smagt, "Deep variational Bayes filters: Unsupervised learning of state space models from raw data," arXiv preprint arXiv:1605.06432, 2016.
A. Doerr, C. Daniel, M. Schiegg, D. Nguyen-Tuong, S. Schaal, M. Toussaint, and S. Trimpe, "Probabilistic recurrent state-space models," arXiv preprint arXiv:1801.10395, 2018.
L. Buesing, T. Weber, S. Racaniere, S. Eslami, D. Rezende, D. P. Reichert, F. Viola, F. Besse, K. Gregor, D. Hassabis et al., "Learning and querying fast generative models for reinforcement learning," arXiv preprint arXiv:1802.03006, 2018.
K. Gregor, G. Papamakarios, F. Besse, L. Buesing, and T. Weber, "Temporal difference variational auto-encoder," arXiv preprint arXiv:1806.03107, 2018.