A short presentation of Active Inference

Jules Tsukahara, Grégoire Sergeant-Perthuis
IMJ-PRG, Sorbonne Université; LCQB, Sorbonne Université

March 2024
Abstract

This document presents a concise mathematical formulation of Active Inference. Active Inference is an algorithm that aims to model adaptive behaviors and has a wide range of applications [Da +20], in particular when the number of parameters used to describe a phenomenon is of 'reasonable' size. In the simplest setting, an agent has a generative model of the time evolution of its environment and of the consequences of its actions on its environment. The agent infers beliefs about its environment through noisy observations and plans its actions in a Bayesian fashion, i.e. rewards are stochastic and the best action is chosen by maximizing the likelihood of possible actions.
1 References

This note covers the material of the presentation given at the 'Paris Mathematical Models of Cognition and Consciousness' seminar on March 20, 2024. The main references we used to write this document are [Da +20; Da +24] and [Lal21]. We will follow [Da +20] and its specification of Active Inference. When referring to 'Active Inference' (in uppercase), we refer to a precise algorithm, first formalized in [Fri10] and reformulated in [Da +20], rather than 'active inference' (in lowercase), which is a broader concept that encapsulates cycles of action and perception, whose formulation could depart from [Fri10]. In other words, throughout the document, 'Active Inference' is synonymous with the Free Energy Principle.
2 Introduction

In Active Inference, agents are assumed to have an internal world model of their environment over which they maintain beliefs that are updated with time; such a hypothesis is sometimes coined the Bayesian Brain Hypothesis. This internal world model is an incomplete description of the agent's environment.
In this case, one could say that an agent has incomplete information about its environment. There are two possible interpretations of this world model. In the first one, the agent is 'conscious' that it has a world model that accounts for its environment and leverages it to make optimal decisions or actions. In the second interpretation, the world model is hard-coded in the agent, for example through neuronal connections that account for dependency relations between the variables of the model (see Section 5.2 of [TG21]); in this setting, the reaction of the agent follows active inference in an automated manner. In the second setting, we can therefore argue that the agent is not 'conscious' of having a world model of its environment but acts in accordance with this world model. The second interpretation is closer to the conceptual framework of Partially Observable Markov Decision Processes (POMDPs): in this setting, an agent is designed to respond optimally with incomplete knowledge of its environment. The difference between POMDPs and active inference is discussed, for example, in [FSM12].
The random variable that accounts for the state of the environment (modulo the actions of the agent) at a given time step $t$ will be denoted by $S_t$, and the associated state space, i.e., the space of all possible configurations of the environment, will be denoted by $E_{S_t}$. The environment is to be understood in a broad sense; it accounts for all the information the agent has on its 'world' and could account for some knowledge of itself, such as the position and configuration of some of its parts. $O_t$ is the random variable associated to one observation the agent can make on its environment at time $t$; the associated state space is $E_{O_t}$. The agent can choose to make an action $A_t$ from a set $E_{A_t}$. In this document, for simplicity, $E_{S_t}$, $E_{O_t}$, $E_{A_t}$ are finite sets.

We assume that at each time step, the spaces $E_{S_t}$, $E_{O_t}$, and $E_{A_t}$ are the same; in other words, for any $t, t_1$, $E_{S_t} = E_{S_{t_1}}$, $E_{O_t} = E_{O_{t_1}}$, and $E_{A_t} = E_{A_{t_1}}$. We drop the reference to time $t$ in the notation of these spaces: $E_S$, $E_O$, $E_A$. Let us denote by $T$ a time horizon which corresponds, for simplicity, to the number of steps we want the Active Inference algorithm to run for.

The state of the environment up to $T$ is encoded by the tuple $(S_0, \ldots, S_T)$, denoted $S_{0:T}$, taking values in $E_{S_{0:T}} := \prod_{0 \leq t \leq T} E_{S_t}$, which in our setting is $\prod_{0 \leq t \leq T} E_S$. Similarly, we denote by $A_{0:T}$ the tuple of actions $(A_0, \ldots, A_T)$ and by $O_{0:T}$ the tuple of observations $(O_0, \ldots, O_T)$. The associated realizations will be denoted in lowercase; for example, a realization of $(S_0, \ldots, S_T)$ is written $(s_0, \ldots, s_T)$.
3 Generative model of the environment and observations

The time evolution of the environment from time $t$ to time $t+1$ is encoded by a stochastic map, or Markov kernel, denoted by $T$. Let us denote by $\mathcal{P}(F)$ the set of probability measures over the state space $F$. A Markov kernel $K$ from state space $F$ to state space $F_1$ is a (measurable) map $K : F \to \mathcal{P}(F_1)$ which sends any element $\omega \in F$ to a probability distribution $K_\omega \in \mathcal{P}(F_1)$, which satisfies $\sum_{\omega_1 \in F_1} K_\omega(\omega_1) = 1$.
The Markov kernel $T$ that models the possible evolutions of the environment of the agent is conditioned on the possible actions of the agent at time $t$. This implies that $T : E_{A_t} \times E_{S_t} \to \mathcal{P}(E_{S_{t+1}})$. This kernel is the same for any time $t$ and encodes all possible consequences of an action $a \in E_A$ on a state $s \in E_S$ as a distribution of possible states over $E_S$. Therefore the explicit map is:
$$T : E_S \times E_A \to \mathcal{P}(E_S), \qquad (s_t, a_t) \mapsto T(\cdot \mid s_t, a_t).$$
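On finite sets, a Markov kernel is nothing more than a row-stochastic array. As a minimal illustration (not taken from [Da +20]; the sizes `N_S`, `N_A` and the helper `step` are our own illustrative choices), the kernel $T$ can be stored and sampled as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
N_S, N_A = 4, 2                      # illustrative sizes for E_S and E_A

# T[a, s] is the distribution T(. | s, a) over next states;
# the last axis sums to 1, so each T[a, s] is a probability vector.
T = rng.random((N_A, N_S, N_S))
T /= T.sum(axis=-1, keepdims=True)
assert np.allclose(T.sum(axis=-1), 1.0)

def step(s_t: int, a_t: int) -> int:
    """Sample s_{t+1} ~ T(. | s_t, a_t)."""
    return rng.choice(N_S, p=T[a_t, s_t])
```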
The state of the environment at time $t$, $S_t$, and the observation at time $t$, $O_t$, are related through a Markov kernel $f^s : E_S \to \mathcal{P}(E_O)$. Once again, we assume that this kernel does not depend on time, which corresponds to assuming that the sensors remain the same throughout the time period of the experiment. We explicitly denote by $f^s(o_t \mid s_t) \in [0,1]$ the probability of the event $o_t$ given $s_t$. Here $s$ is intended as a shorthand for the word 'sensation'.
In active inference, the sensory experience of the agent inherently includes a level of uncertainty, which is accounted for by assuming that the sensory kernel $f^s$ is random. In [Da +20], $f^s$ is itself a random variable that takes values in the space of probability kernels from $E_S$ to $\mathcal{P}(E_O)$; we consider a slightly more general version where instead the Markov kernel depends on a parameter $\Theta \in E_\Theta$, where $\Theta$ is a random variable. In this setting, $f^s_\theta$ depends on a realization $\theta \in E_\Theta$, and $f^s$ is redefined to be a kernel from $E_\Theta \times E_S$ to $\mathcal{P}(E_O)$, i.e. $f^s : E_\Theta \times E_S \to \mathcal{P}(E_O)$.
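The parameterized sensor kernel admits the same array representation, with one extra axis for $\theta$. A minimal sketch, with illustrative sizes of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
N_S, N_O, N_TH = 4, 3, 2             # illustrative sizes for E_S, E_O, E_Theta

# f_s[theta, s] is the distribution f^s(. | s, theta) over observations.
f_s = rng.random((N_TH, N_S, N_O))
f_s /= f_s.sum(axis=-1, keepdims=True)

def observe(s_t: int, theta: int) -> int:
    """Sample o_t ~ f^s(. | s_t, theta)."""
    return rng.choice(N_O, p=f_s[theta, s_t])
```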
The dependencies between the variables $S$ and $O$ can be summarized by the (directed) graphical model shown in Figure 1. Recall that a graphical model associates to a graph a factorization of a probability distribution [GR08].
[Diagram: chain $s_t \to s_{t+1}$ through the kernel $T$, with emissions $s_t \to o_t$ and $s_{t+1} \to o_{t+1}$ through $f^s(\cdot \mid s_t)$ and $f^s(\cdot \mid s_{t+1})$.]

Figure 1: Part of the generative model of states of the environment and of observations.
The probability distribution $P$ associated to Figure 1, for a given $\theta \in E_\Theta$ and $a_t \in E_A$, factorizes as follows:
$$P_{S_{t,t+1}, O_{t,t+1} \mid \theta, a_t}(s_{t,t+1}, o_{t,t+1}) = T(s_{t+1} \mid s_t, a_t)\, f^s(o_{t+1} \mid s_{t+1}, \theta)\, f^s(o_t \mid s_t, \theta) \tag{3.1}$$
Equation (3.1) represents only the part of the generative model of the agent corresponding to what happens at times $t$ and $t+1$, for a given action and the sensor associated with $\theta$. We will use the notation $\mathrm{pol}$, for policy, to condense $(a_{0:T-1})$, and $\Pi$ will denote the variable $A_{0:T-1}$. The agent assigns weights to collections of possible actions using a probability distribution, indicating its consideration of competing outcomes in relation to its choice of action. $\tau$ corresponds to a time up to which the agent has access to observations. It is greater than $0$ and less than the horizon $T$. In practice, if the agent can remember all the observations it made up to step $I$ of the Active Inference algorithm, then $I$ equals $\tau$. However, the agent can simulate the environment after $\tau$ and up to time $T$ even though it does not receive any observations; this fact is accounted for in the generative model that we will now explicitly state. Following [Da +20] and [Lal21], we introduce the generative model of Active Inference:
$$P_{S_{0:T}, O_{0:\tau}, \Pi, \Theta}(s_{0:T}, o_{0:\tau}, \mathrm{pol}, \theta) = f_\Pi(a_{0:T-1})\, f_\Theta(\theta) \prod_{0 \leq t \leq T-1} T(s_{t+1} \mid s_t, a_t) \prod_{0 \leq t \leq \tau} f^s(o_t \mid s_t, \theta)$$
Here $f_\Pi \in \mathcal{P}(E_{A_{0:T-1}})$ is a probability distribution over sequences of actions and $f_\Theta \in \mathcal{P}(E_\Theta)$ is a probability distribution over the parameter $\Theta$. The associated graphical model for a fixed sensor $\theta$ and collection of actions $a_0, \ldots, a_{T-1}$ is given in Figure 2.
[Diagram: chain $s_0 \to s_1 \to \cdots \to s_\tau \to s_{\tau+1} \to \cdots \to s_T$ through the kernels $T_{a_0}, T_{a_1}, \ldots, T_{a_{T-1}}$; only the states $s_0, \ldots, s_\tau$ emit observations $o_0, \ldots, o_\tau$ through $f^s(\cdot \mid s_t)$.]

Figure 2: Generative model of states of the environment and of observations up to time $\tau$, for a given sensor $\theta$ and collection of actions $a_0, \ldots, a_{T-1}$.
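To make the factorization concrete, the following sketch evaluates the log-probability of a trajectory under the generative model above; the dict/array encodings of $f_\Pi$, $f_\Theta$, $T$ and $f^s$ are our own illustrative conventions, reusing the arrays from the previous sketches:

```python
import numpy as np

def joint_log_prob(s, o, a, theta, f_Pi, f_Theta, T, f_s):
    """log P(s_{0:T}, o_{0:tau}, pol, theta) under the factorization above.

    s: states (s_0, ..., s_T); o: observations (o_0, ..., o_tau), tau <= T;
    a: actions (a_0, ..., a_{T-1}); f_Pi: dict mapping action tuples to
    their prior weight; f_Theta: array of prior weights over theta;
    T, f_s: the kernels encoded as in the previous sketches.
    """
    lp = np.log(f_Pi[tuple(a)]) + np.log(f_Theta[theta])
    for t in range(len(s) - 1):      # transition factors T(s_{t+1} | s_t, a_t)
        lp += np.log(T[a[t], s[t], s[t + 1]])
    for t in range(len(o)):          # observation factors f^s(o_t | s_t, theta)
        lp += np.log(f_s[theta, s[t], o[t]])
    return lp
```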
4 Active Inference algorithm: Inference and action

There are two steps in the Active Inference algorithm. The first one is the Inference step, in which the agent computes an approximation to the posterior given by conditioning the generative model on observations. Doing so implies that it updates its beliefs about future states of the environment.

The second step is a probabilistic variant of choosing the best action that maximizes a sum of rewards, given the previously computed update of beliefs on the future outcomes. This will be referred to as the Action step.
4.1 Inference step

At step $\tau$ of the Active Inference algorithm, the agent has as input the observations $o_{0:\tau}$. The agent wants to compute the posterior of the generative model $P$, that is:
$$P_{S_{0:T}, A_{0:T-1}, \Theta \mid O_{0:\tau}}(s_{0:T}, a_{0:T-1}, \theta \mid o_{0:\tau}) \tag{4.1}$$
To ease readability, we will group the triple $(S_{0:T}, A_{0:T-1}, \Theta)$ into one random variable $X$ and denote the corresponding state space by $E_X$. Under this notation, the posterior is written $P_{X \mid O_{0:\tau}}(x \mid o_{0:\tau})$.
Computing the posterior is generally intractable. Instead, we will proceed by variational inference (see Appendix A for definitions and an exposition of variational inference in a general setting); that is to say, we approximate the posterior by optimizing an entropy functional under constraints that account for fitness with respect to observations. Variational inference is thus a constrained optimization problem over the space of distributions $Q_X \in \mathcal{P}(E_X)$, with an objective function $F_1(Q_X)$ known as the variational free energy. $F_1(Q_X)$ arises naturally when minimizing the Kullback-Leibler divergence $D_{KL}(Q_X \| P_{X \mid O_{0:\tau}})$. It takes the following form:
$$F_1(Q_X) = -\sum_{x \in E_X} Q_X(x) \ln P_{X, O_{0:\tau}}(x, o_{0:\tau}) - S(Q_X) \tag{4.2}$$
with $S(Q_X)$ the entropy of $Q_X$, defined as $S(Q_X) = -\sum_{x \in E_X} Q_X(x) \ln Q_X(x)$.
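For a finite $E_X$, Equation (4.2) can be evaluated directly. A minimal sketch, assuming $Q_X$ and $\ln P_{X, O_{0:\tau}}(\cdot, o_{0:\tau})$ (with the observed $o_{0:\tau}$ already plugged in) are stored as flat arrays over $E_X$:

```python
import numpy as np

def free_energy_F1(Q_X, log_P_joint):
    """Variational free energy of Eq. (4.2) for a finite E_X.

    Q_X: array of shape (|E_X|,), a distribution over E_X;
    log_P_joint: array of the same shape, log P(x, o_{0:tau}).
    """
    nz = Q_X > 0                              # convention 0 ln 0 = 0
    entropy = -np.sum(Q_X[nz] * np.log(Q_X[nz]))
    return -np.sum(Q_X[nz] * log_P_joint[nz]) - entropy
```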
To simplify the optimization problem and to obtain better robustness with respect to variation in the observations, we assume that $Q_X$ takes values in a subspace $\mathrm{Fac}$ of $\mathcal{P}(E_X)$, implicitly defined by imposing the following factorization: $Q_X \in \mathrm{Fac}$ if and only if there are $g_\Pi \in \mathcal{P}(E_{A_{0:T-1}})$, $g_\Theta \in \mathcal{P}(E_\Theta)$, $g_t : E_{A_t} \to \mathcal{P}(E_{S_t})$ for all $t \in [0, T-1]$, and $g_T \in \mathcal{P}(E_{S_T})$ such that:
$$Q_X(x) = g_\Pi(a_{0:T-1})\, g_\Theta(\theta) \prod_{0 \leq t \leq T-1} g_t(s_t \mid a_t)\, g_T(s_T). \tag{4.3}$$
This is known as the Naïve Bayes assumption, or the mean-field approximation [Bis06]. Let
$$Q^* = \arg\min_{Q_X \in \mathrm{Fac}} F_1(Q_X). \tag{4.4}$$
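The following sketch assembles a mean-field $Q_X$ of the form (4.3) from its factors by exhaustive enumeration; it is only meant to make the factorization explicit, not to be efficient, and the container choices are ours:

```python
import numpy as np
from itertools import product

def mean_field_Q(g_Pi, g_Theta, g_list, g_T):
    """Assemble Q_X of Eq. (4.3) from its mean-field factors.

    g_Pi: dict mapping action tuples a_{0:T-1} to their weight;
    g_Theta: array of weights over theta;
    g_list[t]: array with g_list[t][a_t, s_t] = g_t(s_t | a_t);
    g_T: array with g_T[s_T] = g_T(s_T).
    Exhaustive enumeration: only usable for very small spaces.
    """
    N_S = g_T.shape[0]
    Q = {}
    for a_seq in g_Pi:
        for theta in range(len(g_Theta)):
            for s_seq in product(range(N_S), repeat=len(g_list) + 1):
                w = g_Pi[a_seq] * g_Theta[theta] * g_T[s_seq[-1]]
                for t in range(len(g_list)):
                    w *= g_list[t][a_seq[t], s_seq[t]]
                Q[(s_seq, a_seq, theta)] = w
    return Q
```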
4.2 Action step

During the Inference step, the agent updates its beliefs about the environment. To select subsequent actions, it is necessary to quantify the agent's preferred states. According to those preferences and its inferred beliefs about the environment, the agent will seek to select the actions that are most likely to produce those states. To do so, the second step of Active Inference is to compute a likelihood distribution $q_\Pi \in \mathcal{P}(E_{A_{0:\tau+1}})$ on the actions $a_{0:\tau+1}$ from a second 'free energy' term $F_2(a_{0:\tau}, a_{\tau+1})$. The likelihood satisfies the proportionality relation:
$$q_\Pi(a_{0:\tau}, a_{\tau+1} \mid Q_X) \propto e^{-F_2(a_{0:\tau}, a_{\tau+1}, Q_X)}. \tag{4.5}$$
We let $\tilde{q}_\Pi(a_{0:\tau}, a_{\tau+1} \mid Q_X) = e^{-F_2(a_{0:\tau}, a_{\tau+1}, Q_X)}$ be the approximate likelihood.
This second free energy is often referred to as the 'expected free energy' in the literature. It is the sum of two terms, referred to as 'Exploitation' and 'Exploration'. Each of them depends on the previously computed posterior approximation $Q_{S_{0:T}, \Theta \mid A_{0:T-1}, O_{0:\tau} = o_{0:\tau}}$, conditioned on the actions $A_{0:T-1}$ for the observations up to $t = \tau$. For simplicity, we write it as $Q_{S_{0:T}, \Theta \mid A_{0:T-1}, o_{0:\tau}}$. The free energy $F_2$ can thus be written as follows:
$$F_2(a_{0:\tau}, a_{\tau+1}, Q_X) = \mathrm{Exploitation}(Q_{S_{0:T}, \Theta \mid A_{0:T-1}, o_{0:\tau}}) + \mathrm{Exploration}(Q_{S_{0:T}, \Theta \mid A_{0:T-1}, o_{0:\tau}}). \tag{4.6}$$
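In code, Equation (4.5) amounts to exponentiating the negative expected free energies. A one-line sketch, assuming $F_2$ has already been evaluated per action sequence (the two summands are computed in the next two subsections):

```python
import numpy as np

def action_likelihood(F2_values):
    """Unnormalized likelihood of Eq. (4.5): q~_Pi = exp(-F2).

    F2_values: dict mapping action sequences a_{0:tau+1} to their
    expected free energy F2 (Exploitation + Exploration).
    """
    return {a: np.exp(-F2) for a, F2 in F2_values.items()}
```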
4.2.1 Exploitation

The Exploitation term corresponds to approaching the preferred next state. It is encoded by a distribution $D_{S_{\tau+1}, \Theta} \in \mathcal{P}(E_{S_{\tau+1}} \times E_\Theta)$ over possible states at time $\tau + 1$. In theory, one could also include more future states by replacing $\tau + 1$ by $t_1 : t_2$ such that $T \geq t_2 \geq t_1 \geq \tau + 1$. Following [Da +20], the full Exploitation term is given by the following Kullback-Leibler divergence:
$$\mathrm{Exploitation}(Q_{S_{0:T}, \Theta \mid A_{0:T-1}}) = D_{KL}\big(Q_{S_{\tau+1}, \Theta \mid A_{0:T-1}} \,\big\|\, D_{S_{\tau+1}, \Theta}\big). \tag{4.7}$$
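A direct transcription of the Kullback-Leibler divergence (4.7), assuming the predicted and preferred distributions over $(s_{\tau+1}, \theta)$ are stored as arrays (shapes are our illustrative convention):

```python
import numpy as np

def exploitation(Q_next, D_next):
    """Eq. (4.7): D_KL(Q_{s_{tau+1}, theta | a} || D_{s_{tau+1}, theta}).

    Q_next, D_next: arrays of shape (N_S, N_TH), each summing to 1.
    """
    nz = Q_next > 0                       # convention 0 ln 0 = 0
    return float(np.sum(Q_next[nz] * (np.log(Q_next[nz]) - np.log(D_next[nz]))))
```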
4.2.2 Exploration

The Exploration term reflects the uncertainty about future observations. Minimizing this term corresponds to choosing actions that are more likely to generate observations that convey more information about the state. This is achieved by minimizing the expected entropy of the sensation kernel $f^s$:
$$\begin{aligned}
\mathrm{Exploration}(Q_{S_\tau, \Theta \mid A_{0:T-1}}) &= \mathbb{E}_{Q_{S_\tau, \Theta \mid A_{0:T-1}}}\big[S\big(f^s_{O_{\tau+1} \mid S_\tau, \Theta}\big)\big] \\
&= -\mathbb{E}_{Q_{S_\tau, \Theta \mid A_{0:T-1}}\, f^s_{O_{\tau+1} \mid S_\tau, \Theta}}\big[\ln f^s_{O_{\tau+1} \mid S_\tau, \Theta}\big] \\
&= -\sum_{s_\tau, \theta, o_{\tau+1}} Q_{S_\tau, \Theta \mid A_{0:T-1}}(s_\tau, \theta)\, f^s(o_{\tau+1} \mid s_\tau, \theta) \ln f^s(o_{\tau+1} \mid s_\tau, \theta).
\end{aligned} \tag{4.8}$$
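Equation (4.8) averages the entropy of the sensor kernel under the current belief. A minimal sketch, reusing the array encoding of $f^s$ from Section 3:

```python
import numpy as np

def exploration(Q_s_theta, f_s):
    """Eq. (4.8): expected entropy of f^s under the belief over (s_tau, theta).

    Q_s_theta: array (N_S, N_TH) summing to 1; f_s: array (N_TH, N_S, N_O).
    """
    safe = np.where(f_s > 0, f_s, 1.0)          # avoid log(0); 0 ln 0 = 0
    H = -(f_s * np.log(safe)).sum(axis=-1)      # entropy of f^s(.|s, theta), (N_TH, N_S)
    return float(np.sum(Q_s_theta * H.T))       # average under Q
```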
The action chosen is the one that maximizes the marginal of the distribution $\tilde{q}_\Pi$ over $a_{\tau+1}$:
$$a^*_{\tau+1} = \arg\max_{a_{\tau+1} \in E_A} \sum_{\tilde{a}_{0:T} \,:\, \tilde{a}_{\tau+1} = a_{\tau+1}} \tilde{q}_\Pi(\tilde{a}_{0:T}). \tag{4.9}$$
The agent executes the action, evolving the environment. Inference can then take place again at step $\tau + 1$.
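Equation (4.9) marginalizes the unnormalized $\tilde{q}_\Pi$ onto the coordinate $a_{\tau+1}$ and takes the argmax. A sketch, assuming $\tilde{q}_\Pi$ is given as a dict over action sequences as in the earlier sketches:

```python
def select_action(q_tilde, tau):
    """Eq. (4.9): marginalize q~_Pi onto coordinate a_{tau+1}, then argmax.

    q_tilde: dict mapping action sequences (tuples indexable at tau + 1)
    to unnormalized weights, as returned by action_likelihood.
    """
    marginal = {}
    for a_seq, w in q_tilde.items():
        a = a_seq[tau + 1]                # the coordinate being chosen
        marginal[a] = marginal.get(a, 0.0) + w
    return max(marginal, key=marginal.get)
```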
A Variational inference: inference as minimization of entropy

Consider a joint distribution $P_{X,Y} \in \mathcal{P}(E \times E_1)$ over two random variables $X \in E$, $Y \in E_1$. A classical problem is, given an observation $\omega_1$ of $Y$, to compute the posterior $P_{X \mid Y}(\omega, \omega_1) = \frac{P_{X,Y}(\omega, \omega_1)}{P_Y(\omega_1)}$, with $P_Y(\omega_1) = \sum_{\omega \in E} P_{X,Y}(\omega, \omega_1)$ the marginal distribution of $Y$. However, doing so requires summing over all possible configurations of $X$, which can be computationally too costly. This is the case, for example, when $X = (X_0, \ldots, X_T)$ with $X_i \in B$ and $E = \prod_{i \leq T} B$. Instead, one resorts to variational inference to compute $P_{X \mid Y}$ approximately [Alq20]. We will now explain what variational inference is, but first let us introduce entropy and Gibbs free energy. When $E$ is a finite set, the entropy of a probability distribution $Q$ on $E$ is defined as:
$$S(Q) = -\sum_{x \in E} Q(x) \ln Q(x). \tag{A.1}$$
Let $H$ be a measurable function $H : E \to \mathbb{R}$. For $Q \in \mathcal{P}(E)$, one calls $\mathbb{E}_Q[H] - \frac{1}{\beta} S(Q)$ the Gibbs free energy; in general $\beta = 1$. An important property is that
$$-\ln \sum_{\omega \in E} e^{-\beta H(\omega)} = \inf_{Q \in \mathcal{P}(E)} \mathbb{E}_Q[\beta H] - S(Q). \tag{A.2}$$
The optimal solution to Equation (A.2) is given by the Boltzmann distribution
$$Q^*(\omega) = \frac{e^{-\beta H(\omega)}}{\sum_{\omega' \in E} e^{-\beta H(\omega')}}. \tag{A.3}$$
Let $H(\omega) = -\ln P_{X,Y}(\omega, \omega_1)$ and $\beta = 1$; then $Q^*(\omega) = P_{X \mid Y}(\omega \mid \omega_1)$. Therefore, solving the optimization problem of Equation (A.2) is equivalent to computing the posterior $P_{X \mid Y}(\omega \mid \omega_1)$. Solving Equation (A.2) over a subset of distributions $\Theta \subseteq \mathcal{P}(E)$ is called variational inference. If furthermore the Gibbs free energy is replaced by an approximation, we call it approximate variational inference.
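These identities are easy to verify numerically. A small self-check, with an arbitrary energy of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.random(6)                        # arbitrary energy on a 6-point space
beta = 1.0

Q_star = np.exp(-beta * H) / np.exp(-beta * H).sum()   # Boltzmann, Eq. (A.3)

def gibbs_free_energy(Q):
    """E_Q[beta H] - S(Q) for a strictly positive distribution Q."""
    return float(np.sum(Q * beta * H) + np.sum(Q * np.log(Q)))

lhs = -np.log(np.exp(-beta * H).sum())   # left-hand side of Eq. (A.2)
assert np.isclose(gibbs_free_energy(Q_star), lhs)
for _ in range(100):                     # any other Q has larger free energy
    Q = rng.random(6)
    Q /= Q.sum()
    assert gibbs_free_energy(Q) >= lhs - 1e-12
```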
One remarks that $\inf_{Q \in \mathcal{P}(E)} \mathbb{E}_Q[\beta H] - S(Q)$ is equivalent to $\sup_{Q \in \mathcal{P}(E)} S(Q) - \mathbb{E}_Q[\beta H]$. This last optimization problem relates, through Lagrange multipliers, to maximizing entropy under an energy constraint $U \in \mathbb{R}$:
$$\sup_{\substack{Q \in \mathcal{P}(E) \\ \mathbb{E}_Q[H] = U}} -\sum_{x \in E} Q(x) \ln Q(x). \tag{A.4}$$
In the physics literature, Equation (A.4) is referred to as MaxEnt [Kes09], which stands for the principle of maximum entropy; this principle has many applications, see [Kes09; DD18]. In this context, variational inference is called the variational principle.
A celebrated example of variational inference is called Naïve Bayes in the machine learning literature (see Chapter 8 of [Bis06]) and the mean-field approximation in statistical physics; let us now present this case. The global state space $E = \prod_{i \in I} E_i$ is the joint configuration space of variables $X_i \in E_i$. $\Theta \subseteq \mathcal{P}(E)$ is the set of product distributions, i.e., those of the form $Q(x_i, i \in I) = \prod_{i \in I} Q_i(x_i)$. One then solves, for example through gradient descent,
$$\inf_{Q \in \Theta} \mathbb{E}_Q[\beta H] - S(Q). \tag{A.5}$$
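As an illustration of (A.5) on a two-variable space, the sketch below iterates the standard mean-field coordinate updates (a closed-form alternative to the gradient descent mentioned above); all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 3, 4
H = rng.random((n1, n2))                 # energy on E = E_1 x E_2, beta = 1

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Mean-field ansatz Q(x1, x2) = Q1(x1) Q2(x2). The exact minimizer in Q1
# for fixed Q2 is softmax(-(H @ Q2)), and symmetrically for Q2; iterating
# these coordinate updates never increases the free energy.
Q1, Q2 = np.full(n1, 1 / n1), np.full(n2, 1 / n2)
for _ in range(100):
    Q1 = softmax(-(H @ Q2))
    Q2 = softmax(-(Q1 @ H))

Q = np.outer(Q1, Q2)
F_mean_field = np.sum(Q * H) + np.sum(Q * np.log(Q))   # E_Q[H] - S(Q)
```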
References

[Da +20] Lancelot Da Costa et al. "Active inference on discrete state-spaces: A synthesis". In: Journal of Mathematical Psychology 99 (2020), p. 102447.

[Da +24] Lancelot Da Costa et al. Active inference as a model of agency. 2024.

[Lal21] Rida Lali. Inférence Active pour les agents émotionnels au sein du modèle projectif de la conscience [Active Inference for emotional agents within the projective model of consciousness]. M1 internship report done under the supervision of D. Rudrauf and G. Sergeant-Perthuis. 2021.

[Fri10] Karl J. Friston. "The free-energy principle: a unified brain theory?" In: Nature Reviews Neuroscience 11 (2010), pp. 127-138. URL: https://api.semanticscholar.org/CorpusID:5053247.

[TG21] Youri Timsit and Grégoire Sergeant-Perthuis. "Towards the Idea of Molecular Brains". In: International Journal of Molecular Sciences 22.21 (2021). ISSN: 1422-0067. DOI: 10.3390/ijms222111868. URL: https://www.mdpi.com/1422-0067/22/21/11868.

[FSM12] Karl John Friston, Spyridon Samothrakis, and Read Montague. "Active inference and agency: optimal control without cost functions". In: Biological Cybernetics 106 (2012), pp. 523-541. URL: https://api.semanticscholar.org/CorpusID:253889571.

[GR08] Kevin Gimpel and Daniel Rudoy. Statistical inference in graphical models. Massachusetts Institute of Technology, Lincoln Laboratory, 2008.

[Bis06] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[Alq20] Pierre Alquier. "Approximate Bayesian Inference". In: Entropy 22.11 (Nov. 2020), p. 1272. ISSN: 1099-4300. DOI: 10.3390/e22111272. URL: https://www.mdpi.com/1099-4300/22/11/1272.

[Kes09] H. K. Kesavan. "Jaynes' maximum entropy principle". In: Encyclopedia of Optimization. Boston, MA: Springer US, 2009, pp. 1779-1782. ISBN: 978-0-387-74759-0. DOI: 10.1007/978-0-387-74759-0_312. URL: https://doi.org/10.1007/978-0-387-74759-0_312.

[DD18] Andrea De Martino and Daniele De Martino. "An introduction to the maximum entropy approach and its application to inference problems in biology". In: Heliyon 4.4 (2018), e00596. ISSN: 2405-8440. DOI: 10.1016/j.heliyon.2018.e00596. URL: https://www.sciencedirect.com/science/article/pii/S2405844018301695.