Journal of Mathematical Psychology 99 (2020) 102447
Review
Active inference on discrete state-spaces: A synthesis
Lancelot Da Costa a,b,∗, Thomas Parr b, Noor Sajid b, Sebastijan Veselic b, Victorita Neacsu b,
Karl Friston b
aDepartment of Mathematics, Imperial College London, London, SW7 2RH, United Kingdom
bWellcome Centre for Human Neuroimaging, University College London, London, WC1N 3AR, United Kingdom
article info
Article history:
Received 17 April 2020
Received in revised form 23 July 2020
Accepted 3 September 2020
Available online 6 November 2020
Keywords:
Active inference
Free energy principle
Process theory
Variational Bayesian inference
Markov decision process
Mathematical review
abstract
Active inference is a normative principle underwriting perception, action, planning, decision-making
and learning in biological or artificial agents. From its inception, its associated process theory has
grown to incorporate complex generative models, enabling simulation of a wide range of complex
behaviours. Due to successive developments in active inference, it is often difficult to see how its
underlying principle relates to process theories and practical implementation. In this paper, we try to
bridge this gap by providing a complete mathematical synthesis of active inference on discrete state-
space models. This technical summary provides an overview of the theory, derives neuronal dynamics
from first principles and relates these dynamics to biological processes. Furthermore, this paper provides
a fundamental building block needed to understand active inference for mixed generative models,
allowing continuous sensations to inform discrete representations. This paper may be used in several ways:
as a guide to research on outstanding challenges, as a practical guide on how to implement active
inference to simulate experimental behaviour, or as a pointer towards various in-silico neurophysiological
responses that may be used to make empirical predictions.
©2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
Contents
1. Introduction
2. Active inference
3. Discrete state-space generative models
4. Variational Bayesian inference
   4.1. Free energy and model evidence
   4.2. On the family of approximate posteriors
   4.3. Computing the variational free energy
5. Perception
   5.1. Plausibility of neuronal dynamics
6. Planning, decision-making and action selection
   6.1. Planning and decision-making
   6.2. Action selection, policy-independent state-estimation
   6.3. Biological plausibility
   6.4. Pruning of policy trees
   6.5. Discussion of the action–perception cycle
7. Properties of the expected free energy
8. Learning
9. Structure learning
   9.1. Bayesian model reduction
   9.2. Bayesian model expansion
10. Discussion
11. Conclusion
Declaration of competing interest
∗Corresponding author at: Department of Mathematics, Imperial College London, London, SW7 2RH, United Kingdom.
E-mail address: l.da-costa@imperial.ac.uk (L. Da Costa).
https://doi.org/10.1016/j.jmp.2020.102447
0022-2496/©2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Appendix A. More complex generative models
   A.1. Learning B and D
   A.2. Complexifying the prior over policies
   A.3. Multiple state and outcome modalities
   A.4. Deep temporal models
Appendix B. Computation of dynamics underlying perception
   B.1. Free energy conditioned upon a policy
   B.2. Free energy gradients
Appendix C. Expected free energy as reaching steady-state
Appendix D. Computing expected free energy
   D.1. Ambiguity
   D.2. Risk
   D.3. Novelty
References
1. Introduction
Active inference is a normative principle underlying percep-
tion, action, planning, decision-making and learning in biological
or artificial agents, that inherits from the free energy principle,
a theory of self-organisation in the neurosciences (Buckley et al.,
2017;Friston,2019;Friston et al.,2006). Active inference pos-
tulates that these processes may all be seen as optimising two
complementary objective functions; namely, a variational free en-
ergy, which measures the fit between an internal model and past
sensory observations, and an expected free energy, which scores
possible future courses of action in relation to prior preferences.
Active inference has been employed to simulate a wide range
of complex behaviours in neuropsychology and machine learning,
including planning and navigation (Kaplan & Friston,2018a),
reading (Friston et al.,2018b), curiosity and abstract rule learn-
ing (Friston, Lin et al.,2017), substance use disorder (Smith,
Schwartenbeck et al.,2020), approach avoidance conflict (Smith,
Kirlic et al.,2020), saccadic eye movements (Parr & Friston,
2018a), visual foraging (Mirza et al.,2016;Parr & Friston,2017a),
visual neglect (Parr & Friston,2018c), hallucinations (Adams
et al.,2013), niche construction (Bruineberg et al.,2018;Constant
et al.,2018), social conformity (Constant et al.,2019), impulsiv-
ity (Mirza et al.,2019), image recognition (Millidge,2019), and
the mountain car problem (Çatal et al.,2019;Friston, Adams et al.,
2012;Friston et al.,2009). The key idea that underwrites these
simulations is that creatures use an internal forward (generative)
model to predict their sensory input, which they use to infer
the causes of these data. In addition to simulating behaviour, active inference allows one to answer questions about an individual's psychological processes, by comparing the evidence for different mechanistic hypotheses in relation to behavioural data.
Active inference is very generic and allows one to view different models of behaviour in the same light. For example, a drift dif-
fusion model can now be seen in relation to predictive coding as
they can both be interpreted as minimising free energy through
a process of evidence accumulation (Bogacz,2017;Buckley et al.,
2017;Friston & Kiebel,2009). Similarly, a dynamic program-
ming model of choice behaviour corresponds to minimising ex-
pected free energy under the prior preference of maximising
reward (Da Costa et al., 2020). In being generic, active inference is not meant to replace any of the existing models; rather, it should be used as a tool to uncover the commitments and assumptions of more specific models.
Early formulations of active inference employed generative
models expressed in continuous space and time (for an introduc-
tion see Bogacz,2017, for a review see Buckley et al.,2017), with
behaviour modelled as a continuously evolving random dynami-
cal system. However, some processes in the brain are known to conform better to discrete, hierarchical representations than to continuous ones (e.g., visual working memory (Luck & Vogel, 1997; Zhang & Luck, 2008), state estimation via place cells (Eichenbaum et al., 1999; O'Keefe & Dostrovsky, 1971), language, etc.). Reflecting this, many of the paradigms studied in
neuroscience are naturally framed as discrete state-space prob-
lems. Decision-making tasks are a prime candidate for this, as
they often entail a series of discrete alternatives that an agent
needs to choose among (e.g., multi-arm bandit tasks (Daw et al.,
2006;Reverdy et al.,2013;Wu et al.,2018), multi-step decision
tasks (Daw et al.,2011)). This explains why – in active inference
– agent behaviour is often modelled using a discrete state-space
formulation, the particular applications of which are summarised
in Table 1. More recently, mixed generative models (Friston, Parr
et al.,2017) – combining discrete and continuous states – have
been used to model behaviour involving discrete and continu-
ous representations (e.g., decision-making and movement (Parr
& Friston,2018d), speech production and recognition (Friston,
Sajid et al.,2020), pharmacologically induced changes in eye-
movement control (Parr & Friston,2019) or reading; involving
continuous visual sampling informing inferences about discrete
semantics (Friston, Parr et al.,2017)).
Due to the pace of recent theoretical advances in active in-
ference, it is often difficult to retain a comprehensive overview
of its process theory and practical implementation. In this paper,
we hope to provide a comprehensive (mathematical) synthesis
of active inference on discrete state-space models. This techni-
cal summary provides an overview of the theory, derives the
associated (neuronal) dynamics from first principles and relates
these to known biological processes. Furthermore, this paper
and Buckley et al. (2017) provide the building blocks neces-
sary to understand active inference on mixed generative models.
This paper can be read as a practical guide on how to imple-
ment active inference for simulating experimental behaviour, or a
pointer towards various in-silico neuro- and electro-physiological
responses that can be tested empirically.
This paper is structured as follows. Section 2 is a high-level
overview of active inference. The following sections elucidate the
formulation by deriving the entire process theory from first prin-
ciples; incorporating perception, planning and decision-making.
This formalises the action–perception cycle: (1) an agent is pre-
sented with a stimulus, (2) it infers its latent causes, (3) plans
into the future and (4) realises its preferred course of action; and
repeat. This enactive cycle allows us to explore the dynamics of
synaptic plasticity, which mediate learning of the contingencies
of the world at slower timescales. We conclude in Section 9 with
an overview of structure learning in active inference.
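To preview the structure of what follows, the action–perception cycle described above can be written as a short schematic loop. This is only an illustrative sketch, not the authors' implementation: the agent and environment objects and all method names (infer_states, evaluate_policies, select_action, step, update_parameters) are hypothetical placeholders for the update equations derived in the sections that follow.

# Schematic sketch of one trial of the action-perception cycle (hypothetical API).
def active_inference_trial(agent, environment, T):
    observation = environment.reset()              # (1) the agent is presented with a stimulus
    for t in range(1, T + 1):
        agent.infer_states(observation)            # (2) perception: minimise variational free energy
        agent.evaluate_policies()                  # (3) planning: score policies by expected free energy
        action = agent.select_action()             # (4) action selection: Bayesian model average
        observation = environment.step(action)     # and repeat, sampling the next observation
    agent.update_parameters()                      # slower timescale: learning the contingencies (e.g., A)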
Table 1
Applications of active inference (discrete state-space).
Application | Description | References
Decision-making under uncertainty | Initial formulation of active inference on partially observable Markov decision processes. | Friston, Samothrakis et al. (2012)
Optimal control | Application of KL or risk-sensitive control in an engineering benchmark (the mountain car problem). | Çatal et al. (2019) and Friston, Adams et al. (2012)
Evidence accumulation | Illustrating the role of evidence accumulation in decision-making through an urns task. | FitzGerald, Moran et al. (2015) and FitzGerald, Schwartenbeck et al. (2015)
Psychopathology | Simulation of addictive choice behaviour. | Schwartenbeck, FitzGerald, Mathys, Dolan, Wurst et al. (2015)
Dopamine | The precision of beliefs about policies provides a plausible description of dopaminergic discharges. | Friston et al. (2014) and FitzGerald, Dolan et al. (2015)
Functional magnetic resonance imaging | Empirical prediction and validation of dopaminergic discharges. | Schwartenbeck, FitzGerald, Mathys, Dolan and Friston (2015)
Maximal utility theory | Evidence in favour of surprise minimisation as opposed to utility maximisation in human decision-making. | Schwartenbeck, FitzGerald, Mathys, Dolan, Kronbichler et al. (2015)
Social cognition | Examining the effect of prior preferences on interpersonal inference. | Moutoussis et al. (2014)
Exploration–exploitation dilemma | Casting behaviour as expected free energy minimising accounts for epistemic and pragmatic choices. | Friston et al. (2015)
Habit learning and action selection | Formulating learning as an inferential process and action selection as Bayesian model averaging. | Friston et al. (2016) and FitzGerald et al. (2014)
Scene construction and anatomy of time | Mean-field approximation for multi-factorial hidden states, enabling high-dimensional representations of the environment. | Friston and Buzsáki (2016) and Mirza et al. (2016)
Electrophysiological responses | Synthesising various in-silico neurophysiological responses via a gradient descent on free energy, e.g., place-cell activity, mismatch negativity, phase precession, theta sequences, theta–gamma coupling and dopaminergic discharges. | Friston, FitzGerald et al. (2017)
Structure learning, curiosity and insight | Simulation of artificial curiosity and abstract rule learning; structure learning via Bayesian model reduction. | Friston, Lin et al. (2017)
Hierarchical temporal representations | Generalisation to hierarchical generative models with deep temporal structure and simulation of reading. | Friston et al. (2018b) and Parr and Friston (2017b)
Computational neuropsychology | Simulation of visual neglect, hallucinations, and prefrontal syndromes under alternative pathological priors. | Benrimoh et al. (2018), Parr, Benrimoh et al. (2018), Parr and Friston (2018c), Parr, Rees et al. (2018) and Parr, Rikhye et al. (2019)
Neuromodulation | Use of precision parameters to manipulate exploration during saccadic searches; associating uncertainty with cholinergic and noradrenergic systems. | Parr and Friston (2017a, 2019), Sales et al. (2018) and Vincent et al. (2019)
Decisions to movements | Mixed generative models combining discrete and continuous states to implement decisions through movement. | Friston, Parr et al. (2017) and Parr and Friston (2018d)
Planning, navigation and niche construction | Agent-induced changes in the environment (generative process); decomposition of goals into subgoals. | Bruineberg et al. (2018), Constant et al. (2018) and Kaplan and Friston (2018a)
Atari games | Active inference compares favourably to reinforcement learning in the game of Doom. | Cullen et al. (2018)
Machine learning | Scaling active inference to more complex machine learning problems. | Tschantz et al. (2019)
2. Active inference
To survive in a changing environment, biological (and arti-
ficial) agents must maintain their sensations within a certain
hospitable range (i.e., maintaining homeostasis through allosta-
sis). In brief, active inference proposes that agents achieve this by
optimising two complementary objective functions, a variational
free energy and an expected free energy. In short, the former
measures the fit between an internal (generative) model of its
sensations and sensory observations, while the latter scores each
possible course of action in terms of its ability to reach the range
of ‘‘preferred’’ states of being.
Our first premise is that agents represent the world through
an internal model. Through minimisation of variational free en-
ergy, this model becomes a good model of the environment.
In other words, this probabilistic model and the probabilistic beliefs1 that it encodes are continuously updated to mirror the environment and its dynamics. Such a world model is considered to be generative, in that it is able to generate predictions about sensations (e.g., during planning or dreaming), given beliefs about future states of being.
1 By beliefs we mean Bayesian beliefs, i.e., probability distributions over a variable of interest (e.g., current position). Beliefs are therefore used in the sense of Bayesian belief updating or belief propagation, as opposed to propositional or folk psychology beliefs.
If an agent senses a heat source (e.g., an-
other agent) via some temperature receptors, the sensation of
warmth represents an observed outcome and the temperature
of the heat source a hidden state; minimisation of variational
free energy then ensures that beliefs about hidden states closely
match the true temperature. Formally, the generative model is
a joint probability distribution over possible hidden states and
sensory consequences – that specifies how the former cause
the latter – and minimisation of variational free energy enables the model to be ''inverted''; i.e., to determine the most likely hidden states given sensations. The variational free energy is the negative
evidence lower bound that is optimised in variational Bayes in
machine learning (Bishop,2006;Xitong,2017). Technically – by
minimising variational free energy – agents perform approximate
Bayesian inference (Sengupta & Friston,2016;Sengupta et al.,
2016), which enables them to infer the causes of their sensations
(e.g., perception). This is the point of contact between active infer-
ence and the Bayesian brain (Aitchison & Lengyel,2017;Friston,
2012;Knill & Pouget,2004). Crucially, agents may incorporate an
optimism bias (McKay & Dennett,2009;Sharot,2011) in their
model; thereby scoring certain ‘‘preferred’’ sensations as more
likely. This lends a higher plausibility to those courses of action
that realise these sensations. In other words, a preference is
simply something an agent (believes it) is likely to work towards.
Fig. 1. Markov blankets in active inference. This figure illustrates the Markov blanket assumption of active inference. A Markov blanket is a set of variables through
which states internal and external to the system interact. Specifically, the system must be such that we can partition it into a Bayesian network of internal states µ,
external states η, sensory states o and active states u (µ, o and u are often referred to together as particular states), with probabilistic (causal) links in the directions
specified by the arrows. All interactions between internal and external states are therefore mediated by the blanket states b. The sensory states represent the sensory
information that the body receives from the environment and the active states express how the body influences the environment. This blanket assumption is quite
generic, in that it can be reasonably assumed for a brain as well as elementary organisms. For example, when considering a bacillus, the sensory states become the
cell membrane and the active states comprise the actin filaments of the cytoskeleton. Under the Markov blanket assumption – together with the assumption that
the system persists over time (i.e., possesses a non-equilibrium steady state) – a generalised synchrony appears, such that the dynamics of the internal states can be
cast as performing inference over the external states (and vice versa) via a minimisation of variational free energy (Friston,2019;Parr et al.,2020). This coincides
with existing approaches to inference; i.e., variational Bayes (Beal,2003;Bishop,2006;Blei et al.,2017;Jordan et al.,1998). This can be viewed as the internal states
mirroring external states, via sensory states (e.g., perception), and external states mirroring internal states via active states (e.g., a generalised form of self-assembly,
autopoiesis or niche construction). Furthermore, under these assumptions the most likely courses of actions can be shown to minimise expected free energy. Note
that external states beyond the system should not be confused with the hidden states of the agent’s generative model (which model external states). In fact, the
internal states are exactly the parameters (i.e., sufficient statistics) encoding beliefs about hidden states and other latent variables, which model external states in
a process of variational free energy minimisation. Hidden and external states may or may not be isomorphic. In other words, an agent uses its internal states to
represent hidden states that may or may not exist in the external world.
To maintain homeostasis, and ensure survival, agents must
minimise surprise.2 Since the generative model scores preferred outcomes as more likely, minimising surprise corresponds to maximising model evidence.3 In active inference, this is assured
by the aforementioned processes; indeed, the variational free en-
ergy turns out to be an upper bound on surprise and minimising
expected free energy ensures preferred outcomes are realised,
thereby avoiding surprise on average.
Active inference can thus be framed as the minimisation of
surprise (Friston,2009,2010;Friston et al.,2006;Friston &
Stephan,2007) by perception and action. In discrete state models
– of the sort discussed here – this means agents select from dif-
ferent possible courses of action (i.e., policies) in order to realise
their preferences and thus minimise the surprise that they expect
to encounter in the future. This enables a Bayesian formulation
of the perception–action cycle (Fuster,1990): agents perceive
the world by minimising variational free energy, ensuring their model is consistent with past observations, and act by minimising expected free energy, to make future sensations consistent with their model. This account of behaviour can be concisely framed as self-evidencing (Hohwy, 2016).
2 In information theory, the surprise (a.k.a. surprisal) associated with an outcome under a generative model is given by −log P(o). This specifies the extent to which an observation is unusual and surprises the agent, but it does not mean that the agent consciously experiences surprise. In information theory, this kind of surprise is known as self-information.
3 In Bayesian statistics, the model evidence (often referred to as the marginal likelihood) associated with a generative model is P(o), the probability of observed outcomes according to the model (sometimes written as P(o|m), explicitly conditioning upon a model). The model evidence scores the goodness of the model as an explanation of the data that are sampled, by rewarding accuracy and penalising complexity, which avoids overfitting.
In contrast to other normative models of behaviour, active
inference is a ‘first principle’ account, which is grounded in sta-
tistical physics (Friston,2019;Parr et al.,2020). Active inference
describes the dynamics of systems that persist (i.e., do not dis-
sipate) during some timescale of interest, and that can be statis-
tically segregated from their environment—conditions which are
satisfied by biological systems. Mathematically, the first condition
means that the system is at non-equilibrium steady-state (NESS).
This implies the existence of a steady-state probability density to
which the system self-organises and returns to after perturbation
(i.e., the agent’s preferences). The statistical segregation condi-
tion is the presence of a Markov blanket (c.f., Fig. 1) (Kirchhoff
et al.,2018;Pearl,1998): a set of variables through which states
internal and external to the system interact (e.g., the skin is
a Markov blanket for the human body). Under these assump-
tions, it can be shown that the states internal to the system parameterise Bayesian beliefs about external states, and that their dynamics can be cast as a process of variational free energy minimisation (Friston, 2019; Parr et al., 2020). This coincides with existing approaches
to approximate inference (Beal,2003;Bishop,2006;Blei et al.,
2017;Jordan et al.,1998). Furthermore, it can be shown that the
most likely courses of action taken by those systems are those
which minimise expected free energy (or a variant thereof, see
Appendix C)—a quantity that subsumes many existing constructs
in science and engineering (see Section 7).
By subscribing to the above assumptions, it is possible to
describe the behaviour of viable living systems as performing
active inference—the remaining challenge is to determine the
computational and physiological processes that they implement
to do so. This paper aims to summarise possible answers to this
question, by reviewing the technical details of a process theory
for active inference on discrete state-space generative models,
first presented in Friston, FitzGerald et al. (2017). Note that it is
important to distinguish active inference as a principle (presented above) from active inference as a process theory.
The former is a consequence of fundamental assumptions about
living systems, while the latter is a hypothesis concerning the
computational and biological processes in the brain that might
implement active inference. The ensuing process theory
can then be used to predict plausible neuronal dynamics and
electrophysiological responses that are elicited experimentally.
3. Discrete state-space generative models
The generative model (Bishop,2006) expresses how the agent
represents the world. This is a joint probability distribution over
sensory data and the hidden (or latent) causes of these data.
The sorts of discrete state-space generative models used in active
inference are specifically suited to represent discrete time series
and decision-making tasks. These can be expressed as variants
of partially observable Markov decision processes (POMDPs; Åström, 1965): from simple Markov decision processes (Barto &
Sutton,1992;Stone,2019;White,2001) to generalisations in the
form of deep probabilistic (hierarchical) models (Allenby et al.,
2005;Box & Tiao,1965;Friston et al.,2018b). For clarity, the
process theory is derived for the simplest model that facilitates
understanding of subsequent generalisations; namely, a POMDP
where the agent holds beliefs about the probability of the initial
state (specified as D), the transition probabilities from one state
to the next (defined as matrix B) and the probability of outcomes
given states (i.e., the likelihood matrix A); see Fig. 2.
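As a concrete illustration of this kind of generative model, the following minimal sketch (not taken from the authors' code; all numbers are hypothetical) instantiates a two-state, two-outcome POMDP with the three ingredients just listed: an initial-state prior D, action-dependent transition matrices B and a likelihood matrix A.

import numpy as np

D = np.array([0.5, 0.5])                      # P(s_1): prior over the initial hidden state

B = {                                         # B_u: transition probabilities P(s_{tau+1} | s_tau, u)
    "stay":   np.array([[0.9, 0.1],
                        [0.1, 0.9]]),
    "switch": np.array([[0.1, 0.9],
                        [0.9, 0.1]]),
}

A = np.array([[0.8, 0.2],                     # likelihood P(o_tau | s_tau); columns index states
              [0.2, 0.8]])

# Columns are probability distributions over outcomes (A) or successor states (B)
assert np.allclose(A.sum(axis=0), 1.0)
assert all(np.allclose(b.sum(axis=0), 1.0) for b in B.values())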
As mentioned above, a substantial body of work justifies
describing certain neuronal representations with discrete state-
space generative models (e.g., Luck & Vogel,1997;Tee & Taylor,
2018;Zhang & Luck,2008). Furthermore, it has been long known
that – at the level of neuronal populations – computations occur
periodically (i.e., in distinct and sometimes nested oscillatory
bands). Similarly, there is evidence for sequential computation
in a number of processes (e.g., attention Buschman & Miller,
2010;Duncan et al.,1994;Landau & Fries,2012, visual per-
ception Hanslmayr et al.,2013;Rolls & Tovee,1994) and at
different levels of the neuronal hierarchy (Friston,2008;Friston
et al.,2018b), in line with ideas from hierarchical predictive
processing (Chao et al.,2018;Iglesias et al.,2013). This accom-
modates the fact that visual saccadic sampling of observations
occurs at a frequency of approximately 4 Hz (Parr & Friston,
2018d). The relatively slow presentation of a discrete sequence of
observations enables inferences to be performed in peristimulus
time by (much) faster neuronal dynamics.
Active inference, implicitly, accounts for fast and slow neu-
ronal dynamics. At each time-step the agent observes an out-
come, from which it infers the past, present and future (hidden)
states through perception. This underwrites a plan into the future,
by evaluating (the expected free energy of) possible policies. The
inferred (best) policies specify the most likely action, which is
executed. At a slower timescale, parameters encoding the con-
tingencies of the world (e.g., A), are inferred. This is referred to as
learning. Even more slowly, the structure of the generative model
is updated to better account for available observations—this is
called structure learning. The following sections elucidate these
aspects of the active inference process theory.
This paper will be largely concerned with deriving and in-
terpreting the inferential dynamics that agents might implement
using the generative model in Fig. 2. We leave the discussion of
more complex models to Appendix A, since the derivations are
analogous in those cases.
4. Variational Bayesian inference
4.1. Free energy and model evidence
Variational Bayesian inference rests upon minimisation of a
quantity called (variational) free energy, which bounds the im-
probability (i.e., the surprise) of sensory observations, under a
generative model. Simultaneously, free energy minimisation is
a statistical inference technique that enables the approximation
of the posterior distribution in Bayes rule. In machine learning,
this is known as variational Bayes (Beal,2003;Bishop,2006;Blei
et al.,2017;Jordan et al.,1998). Active inference agents minimise
variational free energy, enabling concomitant maximisation of
their model evidence and inference of the latent variables of their
generative model. In the following, we consider a given time point t ∈ {1, ..., T}, by which the agent has observed a sequence of outcomes o_{1:t}. The posterior about the latent causes
of sensory data is given by Bayes rule:
P(s_{1:T}, A, \pi \mid o_{1:t}) = \frac{P(o_{1:t} \mid s_{1:T}, A, \pi)\, P(s_{1:T}, A, \pi)}{P(o_{1:t})} \qquad (1)
Note that the policy π is a random variable; this frames planning as inference of the best action sequence from observations (Attias, 2003; Botvinick & Toussaint, 2012). Computing the posterior distribution requires computing the model evidence
P(o_{1:t}) = \sum_{\pi \in \Pi} \sum_{s_{1:T} \in S^T} \int P(o_{1:t}, s_{1:T}, A, \pi)\, dA,
which is intractable for
complex generative models embodied by biological and artifi-
cial systems (Friston,2008)—a well-known problem in Bayesian
statistics. An alternative to computing the exact posterior distri-
bution is to optimise an approximate posterior distribution over
latent causes Q(s1:T,A, π ), by minimising the Kullback–Leibler
(KL) divergence D_{KL} (Kullback & Leibler, 1951), a non-negative
measure of discrepancy between probability distributions. We
can use the definition of the KL divergence and Bayes rule to
arrive at the variational free energy F, which is a functional of
approximate posterior beliefs:
\begin{aligned}
0 &\le D_{KL}[Q(s_{1:T},A,\pi)\,\|\,P(s_{1:T},A,\pi \mid o_{1:t})]\\
&= \mathbb{E}_{Q(s_{1:T},A,\pi)}[\log Q(s_{1:T},A,\pi) - \log P(s_{1:T},A,\pi \mid o_{1:t})]\\
&= \mathbb{E}_{Q(s_{1:T},A,\pi)}[\log Q(s_{1:T},A,\pi) - \log P(o_{1:t},s_{1:T},A,\pi) + \log P(o_{1:t})]\\
&= \underbrace{\mathbb{E}_{Q(s_{1:T},A,\pi)}[\log Q(s_{1:T},A,\pi) - \log P(o_{1:t},s_{1:T},A,\pi)]}_{=:\,F[Q(s_{1:T},A,\pi)]} + \log P(o_{1:t})\\
\Rightarrow\; & -\log P(o_{1:t}) \le F[Q(s_{1:T},A,\pi)]
\end{aligned} \qquad (2)
From (2), one can see that varying Q to minimise the variational free energy enables us to approximate the true posterior, while simultaneously ensuring that surprise remains low. The
former offers the intuitive interpretation of the free energy as
a generalised prediction error, as minimising free energy cor-
responds to suppressing the discrepancy between predictions,
i.e., Q, and the actual state of affairs, i.e., the posterior; and indeed
for a particular class of generative models, we recover the predic-
tion error given by predictive coding schemes (see Bogacz,2017;
Buckley et al.,2017;Friston et al.,2007). Altogether, this means
Fig. 2. Example of a discrete state-space generative model. Panel 2a, specifies the form of the generative model, which is how the agent represents the world. The
generative model is a joint probability distribution over (hidden) states, outcomes and other variables that cause outcomes. In this representation, states unfold
in time, causing an observation at each time-step. The likelihood matrix A encodes the probabilities of state–outcome pairs. The policy π specifies which action
to perform at each time-step. Note that the agent’s preferences may be specified either in terms of states or outcomes. It is important to distinguish between
states (resp. outcomes) that are random variables, and the possible values that they can take in S (resp. in O), which we refer to as possible states (resp. possible
outcomes). Note that this type of representation comprises a finite number of timesteps, actions, policies, states, outcomes, possible states and possible outcomes.
In Panel 2b, the generative model is displayed as a probabilistic graphical model (Bishop,2006;Jordan et al.,1998;Pearl,1988,1998) expressed in factor graph
form (Loeliger, 2004). The variables in circles are random variables, while squares represent factors, whose specific forms are given in Panel 2a. The arrows represent
causal relationships (i.e., conditional probability distributions). The variables highlighted in grey can be observed by the agent, while the remaining variables are
inferred through approximate Bayesian inference (see Section 4) and called hidden or latent variables. Active inference agents perform inference by optimising
the parameters of an approximate posterior distribution (see Section 4). Panel 2c specifies how this approximate posterior factorises under a particular mean-field
approximation (Tanaka,1999), although other factorisations may be used (Parr, Markovic et al.,2019;Schwöbel et al.,2018). A glossary of terms used in this figure is
available in Table 2. The mathematical yoga of generative models is heavily dependent on Markov blankets. The Markov blanket of a random variable in a probabilistic
graphical model are those variables that share a common factor. Crucially, a variable conditioned upon its Markov blanket is conditionally independent of all other
variables. We will use this property extensively (and implicitly) in the text.
that variational free energy minimising agents, simultaneously,
infer the latent causes of their observations and maximise the
evidence for their generative model. One should note that the
free energy equals the surprise −log P(o1:t) only at the global free
energy minimum, when the approximate posterior Q(s1:T,A, π )
equals the true posterior P(s1:T,A, π |o1:t). Outside of the global
free energy minimum, the free energy upper bounds the surprise,
in which case, since the true posterior is generally intractable, the
tightness of the bound is generally unknowable.
To aid intuition, the variational free energy can be rearranged
into complexity and accuracy:
F[Q(s_{1:T},A,\pi)] = \underbrace{D_{KL}[Q(s_{1:T},A,\pi)\,\|\,P(s_{1:T},A,\pi)]}_{\text{Complexity}} - \underbrace{\mathbb{E}_{Q(s_{1:T},A,\pi)}[\log P(o_{1:t} \mid s_{1:T},A,\pi)]}_{\text{Accuracy}} \qquad (3)
The first term of (3) can be regarded as complexity: a simple explanation Q for observable data, one which makes few assumptions over and above the prior (i.e., with a KL divergence close to zero),
is a good explanation. In other words, a good explanation is an
accurate account of some data that requires minimal movement
for updating of prior to posterior beliefs (c.f., Occam’s principle).
The second term is accuracy; namely, the probability of the data
given posterior beliefs about model parameters Q. In other words,
how well the generative model fits the observed data. The idea
that neural representations weigh complexity against accuracy
underwrites the imperative to find the most accurate explanation
for sensory observations that is minimally complex, which has
been leveraged by things like Horace Barlow’s principle of min-
imum redundancy (Barlow,2001) and subsequently supported
empirically (Dan et al.,1996;Lewicki,2002;Olshausen & Field,
2004;Olshausen & O’Connor,2002). Fig. 3 illustrates the various
implications of minimising free energy.
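The bound in (2) and the decomposition in (3) can be checked numerically on a deliberately small example. The sketch below (illustrative numbers only; a single time step and no policies, which is a simplification of the general model above) computes the free energy of a few candidate posteriors, verifies that it always upper-bounds the surprise −log P(o), and shows that the bound is attained at the exact posterior.

import numpy as np

prior = np.array([0.5, 0.5])                   # P(s)
likelihood = np.array([[0.8, 0.2],             # P(o | s); rows index outcomes, columns index states
                       [0.2, 0.8]])
o = 0                                          # the observed outcome

evidence = likelihood[o] @ prior               # P(o) = sum_s P(o | s) P(s)
posterior = likelihood[o] * prior / evidence   # exact P(s | o)

def free_energy(q):
    complexity = np.sum(q * np.log(q / prior))         # KL[Q(s) || P(s)]
    accuracy = np.sum(q * np.log(likelihood[o]))       # E_Q[log P(o | s)]
    return complexity - accuracy                       # Eq. (3)

for q in (np.array([0.5, 0.5]), np.array([0.9, 0.1]), posterior):
    assert free_energy(q) >= -np.log(evidence) - 1e-9  # Eq. (2): F upper-bounds surprise
assert np.isclose(free_energy(posterior), -np.log(evidence))  # equality at the exact posterior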
4.2. On the family of approximate posteriors
The goal is now to minimise variational free energy with
respect to Q. To obtain a tractable expression for the variational
free energy, we need to assume a certain simplifying factori-
sation of the approximate posterior. There are many possible
forms (e.g., mean-field, marginal, Bethe, see Heskes,2006;Parr,
Markovic et al.,2019;Yedidia et al.,2005), each of which trades
off the quality of the inferences with the complexity of the
computations involved. For the purpose of this paper we use
a particular structured mean-field approximation (see Table 2
for an explanation of the different distributions and variables in
play):
\begin{aligned}
Q(s_{1:T}, A, \pi) &= Q(A)\, Q(\pi) \prod_{\tau=1}^{T} Q(s_\tau \mid \pi) \qquad (4)\\
Q(s_\tau \mid \pi) &= \mathrm{Cat}(\mathbf{s}_{\pi\tau}), \quad \mathbf{s}_{\pi\tau} \in \{x \in \mathbb{R}^m \mid x_i > 0,\ \textstyle\sum_i x_i = 1\}\\
Q(\pi) &= \mathrm{Cat}(\boldsymbol{\pi}), \quad \boldsymbol{\pi} \in \{x \in \mathbb{R}^{|\Pi|} \mid x_i > 0,\ \textstyle\sum_i x_i = 1\}\\
Q(A) &= \prod_{i=1}^{m} Q(A_{\bullet i}), \quad Q(A_{\bullet i}) = \mathrm{Dir}(\mathbf{a}_{\bullet i}), \quad \mathbf{a}_{\bullet i} \in (\mathbb{R}_{>0})^n
\end{aligned}
Table 2
Glossary of terms and notation.
Notation | Meaning | Type
S | Set of all possible (hidden) states. | Finite set of cardinality m > 0.
s_\tau | (Hidden) state at time \tau. In computations, if s_\tau evaluates to the ith possible state, it is interpreted as the ith unit vector in \mathbb{R}^m. | Random variable over S.
s_{1:t} | Sequence of hidden states s_1, ..., s_t. | Random variable over S \times \cdots \times S (t times) = S^t.
O | Set of all possible outcomes. | Finite set of cardinality n > 0.
o_\tau | Outcome at time \tau. In computations, if o_\tau evaluates to the jth possible outcome, it is interpreted as the jth unit vector in \mathbb{R}^n. | Random variable over O.
o_{1:t} | Sequence of outcomes o_1, ..., o_t. | Random variable over O \times \cdots \times O (t times) = O^t.
T | Number of timesteps in a trial of observation epochs under the generative model. | Positive integer.
U | Set of all possible actions. | Finite set.
\Pi | Set of all allowable policies, i.e., action sequences indexed in time. | Finite subset of U \times \cdots \times U (T times) = U^T.
\pi | Policy, or action sequence indexed in time. | Random variable over \Pi, or element of \Pi, depending on context.
Q | Approximate posterior distribution over the latent variables of the generative model, s_{1:T}, A, \pi. | Scalar-valued probability distribution over S^T \times \{x \in \mathbb{R}^n \mid x_i > 0, \sum_i x_i = 1\}^m \times \Pi.
F, F_\pi | Variational free energy, and variational free energy conditioned upon a policy. | Functionals of Q that evaluate to a scalar quantity.
G | Expected free energy. | Function defined on \Pi that evaluates to a scalar quantity.
Cat | Categorical distribution: probability distribution over a finite set assigning strictly positive probabilities. | Probability distribution over a finite set of cardinality k, parameterised by a real-valued vector of probabilities in \{x \in \mathbb{R}^k \mid x_i > 0, \sum_i x_i = 1\}.
Dir | Dirichlet distribution (conjugate prior of the categorical distribution): probability distribution over the parameter space of the categorical distribution, parameterised by a vector of positive reals. | Probability distribution over \{x \in \mathbb{R}^k \mid x_i > 0, \sum_i x_i = 1\}, itself parameterised by an element of (\mathbb{R}_{>0})^k.
X_{\bullet i}, X_{ki} | ith column and (k, i)th element of matrix X. | Matrix indexing convention.
\cdot, \otimes, \odot, \odot(\cdot) | Respectively: inner product, Kronecker product, element-wise product and element-wise power. Following existing active inference literature, we adopt the convention X \cdot Y := X^T Y for matrices. | Operations on vectors and matrices.
A | Likelihood matrix. The probability of the state–outcome pair (o_\tau, s_\tau), namely P(o_\tau \mid s_\tau, A), is given by o_\tau \cdot A s_\tau. | Random variable over the subset of M_{n \times m}(\mathbb{R}) with columns in \{x \in \mathbb{R}^n \mid x_i > 0, \sum_i x_i = 1\}.
B_{\pi\tau-1} | Matrix of transition probabilities from one state to the next, given action \pi_{\tau-1}. The probability of possible state s_\tau, given s_{\tau-1} and action \pi_{\tau-1}, is s_\tau \cdot B_{\pi\tau-1} s_{\tau-1}. | Matrix in M_{m \times m}(\mathbb{R}) with columns in \{x \in \mathbb{R}^m \mid x_i > 0, \sum_i x_i = 1\}.
D | Vector of probabilities of the initial state. The probability of the ith possible state occurring at time 1 is D_i. | Vector of probabilities in \{x \in \mathbb{R}^m \mid x_i > 0, \sum_i x_i = 1\}.
a, \mathbf{a} | Parameters of prior and approximate posterior beliefs about A. | Matrices in M_{n \times m}(\mathbb{R}_{>0}).
a_0, \mathbf{a}_0 | Matrices of the same size as a, \mathbf{a}, with homogeneous columns; the elements of their ith columns are denoted a_{i0}, \mathbf{a}_{i0} and defined by a_{i0} = \sum_{j=1}^n a_{ji}, \mathbf{a}_{i0} = \sum_{j=1}^n \mathbf{a}_{ji}. | Matrices in M_{n \times m}(\mathbb{R}_{>0}).
\log, \Gamma, \psi | Natural logarithm, gamma function and digamma function. By convention, these functions are taken component-wise on vectors and matrices. | Functions.
\mathbb{E}_{P(X)}[f(X)] | Expectation of a random variable f(X) under a probability density P(X), taken component-wise if f(X) is a matrix: \mathbb{E}_{P(X)}[f(X)] := \int f(X) P(X)\, dX. | Real-valued operator on random variables.
\mathbf{A} | \mathbf{A} := \mathbb{E}_{Q(A)}[A] = \mathbf{a} \odot \mathbf{a}_0^{\odot(-1)}. | Matrix in M_{n \times m}(\mathbb{R}_{>0}).
\mathbf{logA} | \mathbf{logA} := \mathbb{E}_{Q(A)}[\log A] = \psi(\mathbf{a}) - \psi(\mathbf{a}_0). Note that \mathbf{logA} \neq \log \mathbf{A}. | Matrix in M_{n \times m}(\mathbb{R}).
\sigma | Softmax function or normalised exponential: \sigma(x)_k = e^{x_k} / \sum_i e^{x_i}. | Function \mathbb{R}^k \to \{x \in \mathbb{R}^k \mid x_i > 0, \sum_i x_i = 1\}.
H[P] | Shannon entropy of a probability distribution P. Explicitly, H[P] = \mathbb{E}_{P(x)}[-\log P(x)]. | Functional over probability distributions.
This choice is driven by didactic purposes, and by the fact that this factorisation has been used extensively in the active inference literature (Friston, FitzGerald et al., 2017; Friston, Parr et al., 2017;
Friston et al.,2018b). However, the most recent software im-
plementation of active inference (available in spm_MDP_VB_X.m)
employs a marginal approximation (Parr,2019;Parr, Markovic
et al.,2019), which retains the simplicity and biological inter-
pretation of the neuronal dynamics afforded by the mean-field
approximation, while approximating the more accurate infer-
ences of the Bethe approximation. For these reasons, the marginal
free energy currently stands as the most biologically plausible.
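Table 2 introduces the posterior expectations \mathbf{A} := \mathbb{E}_{Q(A)}[A] and \mathbf{logA} := \mathbb{E}_{Q(A)}[\log A], which enter the belief updates of the following sections. As a small sketch (with hypothetical Dirichlet counts), these expectations can be computed as follows; note that the expected logarithm uses the digamma function and is not the logarithm of the expectation.

import numpy as np
from scipy.special import digamma

a = np.array([[4.0, 1.0],                  # posterior Dirichlet counts over the columns of A
              [1.0, 4.0]])
a0 = a.sum(axis=0, keepdims=True)          # column sums a_{i0}

A_bar = a / a0                             # E_{Q(A)}[A]: element-wise division by column sums
logA_bar = digamma(a) - digamma(a0)        # E_{Q(A)}[log A]; differs from np.log(A_bar)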
4.3. Computing the variational free energy
The next sections focus on producing biologically plausible
neuronal dynamics that perform perception and learning based
on variational free energy minimisation. To enable this, we first compute the variational free energy, using the factorisations of the generative model and approximate posterior (cf. Fig. 2):
\begin{aligned}
F[Q(s_{1:T},A,\pi)] &= \mathbb{E}_{Q(s_{1:T},A,\pi)}[\log Q(s_{1:T},A,\pi) - \log P(o_{1:t},s_{1:T},A,\pi)]\\
&= \mathbb{E}_{Q(s_{1:T},A,\pi)}\Big[\log Q(A) + \log Q(\pi) + \sum_{\tau=1}^{T}\log Q(s_\tau \mid \pi)\\
&\qquad - \log P(A) - \log P(\pi) - \log P(s_1) - \sum_{\tau=2}^{T}\log P(s_\tau \mid s_{\tau-1},\pi) - \sum_{\tau=1}^{t}\log P(o_\tau \mid s_\tau, A)\Big]\\
&= D_{KL}[Q(A)\,\|\,P(A)] + D_{KL}[Q(\pi)\,\|\,P(\pi)] + \mathbb{E}_{Q(\pi)}\big[F_\pi[Q(s_{1:T}\mid\pi)]\big]
\end{aligned} \qquad (5)
Fig. 3. Markov blankets and self-evidencing. This schematic illustrates the various interpretations of minimising variational free energy. Recall that the existence of
a Markov blanket implies a certain lack of influences among internal, blanket and external states. These independencies have an important consequence; internal
and active states are the only states that are not influenced by external states, which means their dynamics (i.e., perception and action) are a function of, and
only of, particular states (i.e., internal, sensory and active states); here, the variational (free energy) bound on surprise. This surprise has a number of interesting
interpretations. Given it is the negative log probability of finding a particle or creature in a particular state, minimising surprise corresponds to maximising the
value of a particle’s state. This interpretation is licensed by the fact that the states with a high probability are, by definition, attracting states. On this view, one
can then spin-off an interpretation in terms of reinforcement learning (Barto & Sutton,1992), optimal control theory (Todorov & Jordan,2002) and, in economics,
expected utility theory (Bossaerts & Murawski,2015). Indeed, any scheme predicated on the optimisation of some objective function can now be cast in terms of
minimising surprise – in terms of perception and action (i.e., the dynamics of internal and active states) – by specifying these optimal values to be the agent’s
preferences. The minimisation of surprise (i.e., self-information) leads to a series of influential accounts of neuronal dynamics; including the principle of maximum
mutual information (Linsker,1990;Optican & Richmond,1987), the principles of minimum redundancy and maximum efficiency (Barlow,1961) and the free energy
principle (Friston et al.,2006). Crucially, the average or expected surprise (over time or particular states of being) corresponds to entropy. This means that action and
perception look as if they are minimising entropy. This leads us to theories of self-organisation, such as synergetics in physics (Haken,1978;Kauffman,1993;Nicolis
& Prigogine,1977) or homeostasis in physiology (Ashby,1947;Bernard,1974;Conant & Ashby,1970). Finally, the probability of any blanket states given a Markov
blanket (m) is, on a statistical view, model evidence (MacKay,1995,2003). This means that all the above formulations are internally consistent with things like the
Bayesian brain hypothesis, evidence accumulation and predictive coding, most of which inherit from Helmholtz's notion of unconscious inference (von Helmholtz &
Southall,1962), later unpacked in terms of perception as hypothesis testing in 20th century psychology (Gregory,1980) and machine learning (Dayan et al.,1995).
where
F_\pi[Q(s_{1:T}\mid\pi)] := \sum_{\tau=1}^{T}\mathbb{E}_{Q(s_\tau\mid\pi)}[\log Q(s_\tau\mid\pi)] - \sum_{\tau=1}^{t}\mathbb{E}_{Q(s_\tau\mid\pi)Q(A)}[\log P(o_\tau\mid s_\tau,A)] - \mathbb{E}_{Q(s_1\mid\pi)}[\log P(s_1)] - \sum_{\tau=2}^{T}\mathbb{E}_{Q(s_\tau\mid\pi)Q(s_{\tau-1}\mid\pi)}[\log P(s_\tau\mid s_{\tau-1},\pi)] \qquad (6)
is the variational free energy conditioned upon pursuing a par-
ticular policy. This is the same quantity that we would have obtained by omitting A and conditioning all probability distributions in the numerator of (1) on π. In the next section, we will
see how perception can be framed in terms of variational free
energy minimisation.
5. Perception
In active inference, perception is equated with state estima-
tion (Friston, FitzGerald et al.,2017) (e.g., inferring the tempera-
ture from the sensation of warmth), consistent with the idea that
perceptions are hypotheses (Gregory,1980). To infer the (past,
present and future) states of the environment, an agent must
minimise the variational free energy with respect to Q(s1:T|π) for
each policy π. This provides the agent’s inference over hidden
states, contingent upon pursuing a given policy. Since the only
part of the free energy that depends on Q(s1:T|π) is Fπ, the
agent must simply minimise F_π. Substituting each Q(s_τ|π) by its sufficient statistics (i.e., the vector of parameters s_{πτ}), F_π becomes a function of those parameters. This enables us to rewrite (6),
conveniently in matrix form (see Appendix B for details):
F_\pi(\mathbf{s}_{\pi 1},\dots,\mathbf{s}_{\pi T}) = \sum_{\tau=1}^{T} \mathbf{s}_{\pi\tau} \cdot \log \mathbf{s}_{\pi\tau} - \sum_{\tau=1}^{t} o_\tau \cdot \mathbf{logA}\, \mathbf{s}_{\pi\tau} - \mathbf{s}_{\pi 1} \cdot \log D - \sum_{\tau=2}^{T} \mathbf{s}_{\pi\tau} \cdot \log(B_{\pi\tau-1})\, \mathbf{s}_{\pi\tau-1} \qquad (7)
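For concreteness, Eq. (7) can be transcribed almost line by line into code. The sketch below assumes, for simplicity, that A is known exactly (so that \mathbf{logA} reduces to log A); the function and argument names are illustrative only and do not correspond to any published implementation.

import numpy as np

def policy_free_energy(s_pi, observations, A, B_pi, D):
    """s_pi: (T, m) array of state beliefs under one policy; observations: indices of the
    t <= T outcomes observed so far; B_pi: list of T-1 transition matrices for this policy."""
    T = s_pi.shape[0]
    F = np.sum(s_pi * np.log(s_pi))                           # sum_tau  s_pi_tau . log s_pi_tau
    for tau, o in enumerate(observations):                    # - sum_tau  o_tau . logA s_pi_tau
        F -= np.log(A)[o] @ s_pi[tau]
    F -= s_pi[0] @ np.log(D)                                  # - s_pi_1 . log D
    for tau in range(1, T):                                   # - sum_tau  s_pi_tau . log(B) s_pi_tau-1
        F -= s_pi[tau] @ np.log(B_pi[tau - 1]) @ s_pi[tau - 1]
    return F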
This enables us to compute the variational free energy gradients (Petersen & Pedersen, 2012):
\nabla_{\mathbf{s}_{\pi\tau}} F_\pi(\mathbf{s}_{\pi 1},\dots,\mathbf{s}_{\pi T}) = \vec{1} + \log \mathbf{s}_{\pi\tau} - \begin{cases}
o_\tau \cdot \mathbf{logA} + \mathbf{s}_{\pi\tau+1} \cdot \log(B_{\pi\tau}) + \log D & \text{if } \tau = 1\\
o_\tau \cdot \mathbf{logA} + \mathbf{s}_{\pi\tau+1} \cdot \log(B_{\pi\tau}) + \log(B_{\pi\tau-1})\, \mathbf{s}_{\pi\tau-1} & \text{if } 1 < \tau \le t\\
\mathbf{s}_{\pi\tau+1} \cdot \log(B_{\pi\tau}) + \log(B_{\pi\tau-1})\, \mathbf{s}_{\pi\tau-1} & \text{if } \tau > t
\end{cases} \qquad (8)
The neuronal dynamics are given by a gradient descent on free energy (Friston, FitzGerald et al., 2017), with state estimation expressed as a softmax function of accumulated (negative) free energy gradients, which we denote by v_{πτ} (see Section 5.1 for an interpretation). The constant term \vec{1} is generally omitted, since the softmax function removes it anyway.
\dot v_{\pi\tau}(\mathbf{s}_{\pi 1},\dots,\mathbf{s}_{\pi T}) = -\nabla_{\mathbf{s}_{\pi\tau}} F_\pi(\mathbf{s}_{\pi 1},\dots,\mathbf{s}_{\pi T}), \qquad \mathbf{s}_{\pi\tau} = \sigma(v_{\pi\tau}) \qquad (9)
The softmax function σ – a generalisation of the sigmoid to vector inputs – is a natural choice, as the variational free energy gradient is a logarithm and the components of s_{πτ} must sum to one. Note the continuous-time gradient descent on the free
energy (9); although we focus on active inference with discrete
generative models, this does not preclude the belief updating
from occurring in continuous time (this is particularly important
when relating these dynamics to neurobiological processes, see
below). Yet, any numerical implementation of active inference
would implement a discretised version of (9) until convergence,
for example
v^{(k)}_{\pi\tau} = v^{(k-1)}_{\pi\tau} - \kappa\, \nabla_{\mathbf{s}^{(k-1)}_{\pi\tau}} F_\pi\big(\mathbf{s}^{(k-1)}_{\pi 1},\dots,\mathbf{s}^{(k-1)}_{\pi T}\big) \quad \text{for small } \kappa > 0, \qquad \mathbf{s}^{(k)}_{\pi\tau} = \sigma\big(v^{(k)}_{\pi\tau}\big).
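Under the same simplifying assumptions as before (A treated as known, the boundary term at τ = T handled naively), the discretised scheme above can be sketched as follows; the step size κ and the number of iterations are arbitrary illustrative choices, not values from the authors' implementation.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def infer_states(observations, A, B_pi, D, T, n_iter=16, kappa=0.25):
    m = len(D)
    s = np.full((T, m), 1.0 / m)                       # beliefs s_pi_tau, initialised flat
    v = np.log(s)                                      # 'membrane potentials'
    for _ in range(n_iter):
        for tau in range(T):
            grad = 1.0 + np.log(s[tau])                # Eq. (8), term by term
            if tau < len(observations):                # an outcome has been observed at tau
                grad -= np.log(A)[observations[tau]]
            if tau == 0:
                grad -= np.log(D)
            else:
                grad -= np.log(B_pi[tau - 1]) @ s[tau - 1]
            if tau < T - 1:
                grad -= np.log(B_pi[tau]).T @ s[tau + 1]
            v[tau] -= kappa * grad                     # accumulate negative free energy gradients
            s[tau] = softmax(v[tau])                   # firing rates as a softmax of potentials
    return s

With the toy A, B and D sketched in Section 3, infer_states([0], A, [B["stay"], B["stay"]], D, T=3) returns smoothed beliefs about past, present and future states under the policy 'stay, stay'.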
5.1. Plausibility of neuronal dynamics
The temporal dynamics expressed in (9) unfold at a much
faster timescale than the sampling of new observations (i.e.,
within timesteps) and correspond to fast neuronal processing
in peristimulus time. This is consistent with behaviour-relevant
computations at frequencies that are higher than the rate of
visual sampling (e.g., working memory (Lundqvist et al.,2016),
visual stimulus perception in humans (Hanslmayr et al.,2013)
and macaques (Rolls & Tovee,1994)).
Furthermore, these dynamics (9) are consistent with predic-
tive processing (Bastos et al.,2012;Rao & Ballard,1999) – since
active inference prescribes dynamics that minimise prediction
error – although they generalise it to a wide range of generative
models. Note that, while also a variational free energy, this sort
of prediction error (7) is not the same as that given by predictive
coding schemes (which rely upon a certain kind of continuous
state-space generative model, see Bogacz,2017;Buckley et al.,
2017;Friston et al.,2007).
Just as neuronal dynamics involve translation from post-
synaptic potentials to firing rates, (9) involves translating from a vector of real numbers (v) to a vector whose elements are bounded between zero and one (s_{πτ}), via the softmax function. As a result, it is natural to interpret the components of v as the av-
erage membrane potential of distinct neural populations, and sπτ
as the average firing rate of those populations, which is bounded
thanks to neuronal refractory periods. This is consistent with
mean-field formulations of neural population dynamics, in that
the average firing rate of a neuronal population follows a sigmoid
function of the average membrane potential (Deco et al.,2008;
Marreiros et al.,2008;Moran et al.,2013). Using the fact that a
softmax function is a generalisation of the sigmoid to vector in-
puts – here the average membrane potentials of coupled neuronal
populations – it follows that their average firing follows a softmax
function of their average potential. In this context, the softmax
function may be interpreted as performing lateral inhibition,
which can be thought of as leading to narrower tuning curves of
individual neurons and thereby sharper inferences (Von Békésy,
1967). Importantly, this tells us that state-estimation can be
performed in parallel by different neuronal populations, and a
simple neuronal architecture is sufficient to implement these
dynamics (see Parr, Markovic et al. (2019, Figure 6)).
Lastly, interpreting the dynamics in this way has a
degree of face validity, as it enables us to synthesise a wide range of biologically plausible electrophysiological responses;
including repetition suppression, mismatch negativity, violation
responses, place-cell activity, phase precession, theta sequences,
theta–gamma coupling, evidence accumulation, race-to-bound
dynamics and transfer of dopamine responses (Friston, FitzGer-
ald et al.,2017;Schwartenbeck, FitzGerald, Mathys, Dolan and
Friston,2015).
The neuronal dynamics for state estimation coincide with vari-
ational message passing (Dauwels,2007;Winn & Bishop,2005),
a popular algorithm for approximate Bayesian inference. This
follows, as we have seen, from free energy minimisation under
a particular mean-field approximation (4). If one were to use the
Bethe approximation, the corresponding dynamics coincide with
belief propagation (Bishop,2006;Loeliger,2004;Parr, Markovic
et al.,2019;Schwöbel et al.,2018;Yedidia et al.,2005), another
widely used algorithm for approximate inference. This offers a
formal connection between active inference and message pass-
ing interpretations of neuronal dynamics (Dauwels et al.,2007;
Friston, Parr et al.,2017;George,2005). In the next section, we
examine planning, decision-making and action selection.
6. Planning, decision-making and action selection
So far, we have focused on optimising beliefs about hidden
states under each policy, by minimising a variational free
energy functional of the corresponding approximate posterior
over hidden states.
In this section, we explain how planning and decision-making
arise as a minimisation of expected free energy—a function scor-
ing the goodness of each possible future course of action. We
briefly motivate how the expected free energy arises from first-
principles. This allows us to frame decision-making and action-
selection in terms of expected free energy minimisation. Finally,
we conclude by discussing the computational cost of planning
into the future.
6.1. Planning and decision-making
At the heart of active inference is a description of agents
that strive to attain a target distribution specifying the range of
preferred states of being, given a sufficient amount of time. To
work towards reaching these preferences, agents select policies
Q(π), such that their predicted states Q(s_τ, A) at some future time
point τ > t (usually, the time horizon of a policy, T) reach the
preferred states P(s_τ, A), which are specified by the generative
model. These considerations allow us to show in Appendix C
that the requisite approximate posterior over policies Q(π) is a
softmax function of the negative expected free energy G:
\[
\begin{aligned}
Q(\pi) &= \sigma(-G(\pi)) \\
G(\pi) &= \underbrace{D_{\mathrm{KL}}[\,Q(s_\tau, A \mid \pi)\,\|\,P(s_\tau, A)\,]}_{\text{Risk}} \;\underbrace{-\,\mathbb{E}_{Q(s_\tau, A \mid \pi)P(o_\tau \mid s_\tau, A)}[\log P(o_\tau \mid s_\tau, A)]}_{\text{Ambiguity}}
\end{aligned}
\tag{10}
\]
By risk, we mean the difference between predictions and prior
beliefs about the future (e.g., the quantification of losses
as in financial risk), and by ambiguity, the uncertainty associated
with future observations, given states. This means that the most
likely (i.e., best) policies minimise expected free energy. This
ensures that future courses of action are exploitative (i.e., risk
minimising) and explorative (i.e., ambiguity minimising). In par-
ticular, the expected free energy balances goal-seeking and itin-
erant novelty-seeking behaviour, given some prior preferences
or goals. Note that the ambiguity term rests on an expecta-
tion over fictive (i.e., predicted) outcomes under beliefs about
future states. This means that optimising beliefs about future
states during perception is crucial to accurately predict future
outcomes during planning. In summary, planning and decision-
making respectively correspond to evaluating the expected free
energy of different policies (which scores their goodness in re-
lation to prior preferences) and forming approximate posterior
beliefs about policies.
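As a toy illustration (not the paper's implementation), the sketch below computes risk and ambiguity for a single future time step of each policy, ignoring beliefs about A for simplicity, and then forms the posterior over policies as a softmax of the negative expected free energies; the likelihood matrix, preferences and predicted states are arbitrary.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def expected_free_energy(A, s_pred, s_pref, eps=1e-12):
    """Risk + ambiguity for one future time step of one policy (cf. (10) and (13)).

    A      : likelihood matrix, A[o, s] = P(o | s) (columns sum to one)
    s_pred : predicted states under the policy, Q(s_tau | pi)
    s_pref : preferred (prior) distribution over states, P(s_tau)
    """
    A, s_pred, s_pref = (np.asarray(x, float) for x in (A, s_pred, s_pref))
    risk = np.sum(s_pred * (np.log(s_pred + eps) - np.log(s_pref + eps)))
    H_A = -np.sum(A * np.log(A + eps), axis=0)   # entropy of each likelihood column
    ambiguity = H_A @ s_pred                      # E_{Q(s)}[ H[P(o | s)] ]
    return risk + ambiguity

# Toy usage: two policies predicting different states at time tau
A = np.eye(2)                                     # unambiguous likelihood mapping
C = np.array([0.8, 0.2])                          # prior preference over states
G = np.array([expected_free_energy(A, [0.9, 0.1], C),
              expected_free_energy(A, [0.2, 0.8], C)])
Q_pi = softmax(-G)                                # approximate posterior over policies
```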
6.2. Action selection, policy-independent state-estimation
Approximate posterior beliefs about policies allow us to obtain
the most plausible action as the action that is most likely under all
policies—this can be expressed as a Bayesian model average:
\[
u_t = \arg\max_{u \in U} \sum_{\substack{\pi \in \Pi \\ \pi_t = u}} Q(\pi). \tag{11}
\]
In addition, we obtain a policy-independent state estimate
at any time point Q(s_τ), τ ∈ {1, ..., T}, as a Bayesian model av-
erage of approximate posterior beliefs about hidden states under
policies, which may be expressed in terms of the distributions'
parameters (Q(s_τ) = Cat(s_τ), Q(s_τ | π) = Cat(s_{πτ})):
\[
Q(s_\tau) = \sum_{\pi \in \Pi} Q(s_\tau \mid \pi)\, Q(\pi) \;\Longleftrightarrow\; \mathbf{s}_\tau = \sum_{\pi \in \Pi} \mathbf{s}_{\pi\tau}\, Q(\pi) \tag{12}
\]
Note that these Bayesian model averages may be implemented
by neuromodulatory mechanisms (FitzGerald et al.,2014).
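A minimal sketch of these two Bayesian model averages, assuming policies are stored as arrays of action indices over time; the function and variable names are illustrative.

```python
import numpy as np

def select_action(Q_pi, policies, t, n_actions):
    """Most plausible action at time t, marginalising over policies (cf. (11))."""
    action_prob = np.zeros(n_actions)
    for prob, policy in zip(Q_pi, policies):
        action_prob[policy[t]] += prob           # accumulate Q(pi) onto its action
    return int(np.argmax(action_prob))

def average_state_beliefs(Q_pi, s_pi_tau):
    """Policy-independent state beliefs s_tau = sum_pi Q(pi) s_pi_tau (cf. (12)).

    s_pi_tau : array of shape (n_policies, n_states) holding Q(s_tau | pi).
    """
    return np.asarray(Q_pi) @ np.asarray(s_pi_tau)
```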
6.3. Biological plausibility
Winner take-all architectures of decision-making are already
commonplace in computational neuroscience (e.g., models of se-
lective attention and recognition (Carpenter & Grossberg,1987;
Itti et al.,1998), hierarchical models of vision (Riesenhuber &
Poggio,1999)). This is nice, since the softmax function in (10)
can be seen as providing a biologically plausible (Deco et al., 2008;
Marreiros et al., 2008; Moran et al., 2013), smooth approximation to
the maximum operation, which is known as soft winner take-all
(Maass, 2000). (A more complete treatment may include priors over
policies – usually denoted by E – and the evidence for a policy
afforded by observed outcomes – usually denoted by F; these
additional terms supplement the expected free energy, leading to an
approximate posterior of the form σ(−log E − F − G); see Friston
et al., 2018b.) In fact, the generative model,
presented in Fig. 2, can be naturally extended such that the
approximate posterior contains an (inverse) temperature param-
eter γ multiplying the expected free energy inside the softmax
function (see Appendix A.2). This temperature parameter reg-
ulates how precisely the softmax approximates the maximum
function, thus recovering winner take-all architectures for high
parameter values (technically, this converts Bayesian model av-
eraging into Bayesian model selection, where the policy corre-
sponds to a model of what the agent is doing). This parameter,
regulating precision of policy selection, has a clear biological
interpretation in terms of confidence encoded in dopaminergic
firing (FitzGerald, Dolan et al.,2015;Friston, FitzGerald et al.,
2017;Friston et al.,2014;Schwartenbeck, FitzGerald, Mathys,
Dolan and Friston,2015). Interestingly, Daw and colleagues (Daw
et al.,2006) uncovered evidence in favour of a similar model
employing a softmax function and temperature parameter in
human decision-making.
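A toy example of this effect, with made-up expected free energies: as the (inverse) temperature increases, the softmax posterior over policies concentrates on the policy with the smallest expected free energy, approximating a winner take-all selection.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

G = np.array([4.0, 3.5, 6.0, 3.6])               # hypothetical expected free energies

for gamma in (0.5, 1.0, 8.0):                    # low to high precision
    print(gamma, softmax(-gamma * G).round(3))   # posterior sharpens as gamma grows
```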
6.4. Pruning of policy trees
From a computational perspective, planning (i.e., computing
the expected free energy) for each possible policy can be cost-
prohibitive, due to the combinatorial explosion in the number of
sequences of actions when looking deep into the future. There
has been work in understanding how the brain finesses this
problem (Huys et al.,2012), which suggests a simple answer:
during mental planning, humans stop evaluating a policy as soon
as they encounter a large loss (i.e., a high value of the expected
free energy that renders the policy highly implausible). In ac-
tive inference this corresponds to using an Occam window; that
is, we stop evaluating the expected free energy of a policy if
it becomes much higher than the best (smallest expected free
energy) policy—and set its approximate posterior probability to
an arbitrarily low value accordingly. This biologically plausible
pruning strategy drastically reduces the number of policies one
has to evaluate exhaustively.
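A minimal sketch of such an Occam window; the window width (in nats) and the floor probability assigned to pruned policies are illustrative assumptions rather than values from the text.

```python
import numpy as np

def occam_window(G, window=3.0, floor=1e-6):
    """Discard policies whose expected free energy exceeds the best by > `window`.

    G : expected free energies, one per policy (lower is better).
    Returns the indices of retained policies and a renormalised posterior in which
    pruned policies receive an arbitrarily low probability.
    """
    G = np.asarray(G, float)
    keep = G <= G.min() + window                 # policies inside the Occam window
    q = np.full(G.shape, floor)                  # pruned policies: negligible mass
    q[keep] = np.exp(-(G[keep] - G.min()))       # softmax numerator for survivors
    q /= q.sum()
    return np.flatnonzero(keep), q
```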
Although effective and biologically plausible, the Occam win-
dow for pruning policy trees cannot deal with large policy spaces
that ensue with deep policy trees and long temporal horizons.
This means that pruning can only partially explain how biologi-
cal organisms perform deep policy searches. Further research is
needed to characterise the processes in which biological agents
reduce large policy spaces to tractable subspaces. One explana-
tion – for the remarkable capacity of biological agents to evaluate
deep policy trees – rests on deep (hierarchical) generative mod-
els, in which policies operate at each level. These deep models
enable long-term policies, modelling slow transitions among hid-
den states at higher levels in the hierarchy, to contextualise
faster state transitions at subordinate levels (see Appendix A).
The resulting (semi Markovian) process can then be specified in
terms of a hierarchy of limited horizon policies that are nested
over temporal scales; c.f., motor chunking (Dehaene et al.,2015;
Fonollosa et al.,2015;Haruno et al.,2003).
6.5. Discussion of the action–perception cycle
Minimising variational and expected free energy are com-
plementary and mutually beneficial processes. Minimisation of
variational free energy ensures that the generative model is a
good predictor of its environment; this allows the agent to ac-
curately plan into the future by evaluating expected free energy,
which in turn enables it to realise its preferences. In other words,
minimisation of variational free energy is a vehicle for effective
planning and reaching preferences via the expected free energy;
in turn, reaching preferences minimises the expected surprise of
future states of being.
In conclusion, we have seen how agents plan into the future
and make decisions about the best possible course of action. This
concludes our discussion of the action–perception cycle. In the
next section, we examine expected free energy in greater detail.
Then, we will see how active agents can learn the contingencies
of the environment and the structure of their generative model
at slower timescales.
7. Properties of the expected free energy
The expected free energy is a fundamental construct of inter-
est. In this section, we unpack its main features and highlight its
importance in relation to many existing theories in neurosciences
and engineering.
The expected free energy of a policy can be unpacked in a
number of ways. Perhaps the most intuitive is in terms of risk
and ambiguity:
\[
G(\pi) = \underbrace{D_{\mathrm{KL}}[\,Q(s_\tau, A \mid \pi)\,\|\,P(s_\tau, A)\,]}_{\text{Risk}} + \underbrace{\mathbb{E}_{Q(s_\tau, A \mid \pi)}\big[\mathrm{H}[P(o_\tau \mid s_\tau, A)]\big]}_{\text{Ambiguity}} \tag{13}
\]
This means that policy selection minimises risk and ambiguity.
Risk, in this setting, is simply the difference between predicted
and prior beliefs about final states. In other words, policies will
be deemed more likely if they bring about states that conform
to prior preferences. In the optimal control literature, this part of
expected free energy underwrites KL control (Todorov,2008;van
den Broek et al.,2010). In economics, it leads to risk sensitive
policies (Fleming & Sheu,2002). Ambiguity reflects the uncer-
tainty about future outcomes, given hidden states. Minimising
ambiguity therefore corresponds to choosing future states that
generate unambiguous and informative outcomes (e.g., switching
on a light in the dark).
We can express the expected free energy of a policy as a bound
on information gain and expected log (model) evidence (a.k.a.,
Bayesian risk):
\[
\begin{aligned}
G(\pi) ={} & \underbrace{\mathbb{E}_{Q}\big[D_{\mathrm{KL}}[\,Q(s_\tau, A \mid o_\tau, \pi)\,\|\,P(s_\tau, A \mid o_\tau)\,]\big]}_{\text{Expected evidence bound}} - \underbrace{\mathbb{E}_{Q}[\log P(o_\tau)]}_{\text{Expected log evidence}} - \underbrace{\mathbb{E}_{Q}\big[D_{\mathrm{KL}}[\,Q(s_\tau, A \mid o_\tau, \pi)\,\|\,Q(s_\tau, A \mid \pi)\,]\big]}_{\text{Expected information gain}} \\
\geq{} & -\underbrace{\mathbb{E}_{Q}[\log P(o_\tau)]}_{\text{Expected log evidence}} - \underbrace{\mathbb{E}_{Q}\big[D_{\mathrm{KL}}[\,Q(s_\tau, A \mid o_\tau, \pi)\,\|\,Q(s_\tau, A \mid \pi)\,]\big]}_{\text{Expected information gain}}
\end{aligned}
\tag{14}
\]
The first term in (14) is the expectation of log evidence under
beliefs about future outcomes, while the second ensures that this
expectation is maximally informed, when outcomes are encoun-
tered. Collectively, these two terms underwrite the resolution
of uncertainty about hidden states (i.e., information gain) and
outcomes (i.e., expected surprise) in relation to prior beliefs.
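To illustrate the expected information gain about hidden states (ignoring beliefs about A), the sketch below performs a fictive Bayesian update for each possible outcome and averages the resulting divergences under the predicted outcome distribution; names and inputs are illustrative.

```python
import numpy as np

def expected_information_gain(A, s_prior, eps=1e-12):
    """E_{Q(o | pi)} KL[ Q(s | o, pi) || Q(s | pi) ] for one future time step.

    A       : likelihood matrix, A[o, s] = P(o | s)
    s_prior : predicted state distribution under the policy, Q(s | pi)
    """
    A, s_prior = np.asarray(A, float), np.asarray(s_prior, float)
    o_pred = A @ s_prior                         # predicted outcomes Q(o | pi)
    gain = 0.0
    for o, p_o in enumerate(o_pred):
        if p_o < eps:
            continue
        s_post = A[o, :] * s_prior / p_o         # fictive posterior Q(s | o, pi)
        gain += p_o * np.sum(s_post * (np.log(s_post + eps) - np.log(s_prior + eps)))
    return gain
```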
When the agent’s preferences are expressed in terms of out-
comes (c.f., Fig. 2), it is useful to express risk in terms of outcomes,
as opposed to hidden states. This is most useful when the gen-
erative model is not known or during structure learning, when
the state-space evolves over time. In these cases, the risk over
hidden states can be replaced by risk over outcomes, by assuming that the
KL divergence between the predicted and true posterior (under
expected outcomes) is small:
\[
\begin{aligned}
\underbrace{D_{\mathrm{KL}}[\,Q(s_\tau, A \mid \pi)\,\|\,P(s_\tau, A)\,]}_{\text{Risk (states)}} ={} & \underbrace{D_{\mathrm{KL}}[\,Q(o_\tau \mid \pi)\,\|\,P(o_\tau)\,]}_{\text{Risk (outcomes)}} + \underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\big[D_{\mathrm{KL}}[\,Q(s_\tau, A \mid o_\tau, \pi)\,\|\,P(s_\tau, A \mid o_\tau)\,]\big]}_{\approx 0} \\
\approx{} & \underbrace{D_{\mathrm{KL}}[\,Q(o_\tau \mid \pi)\,\|\,P(o_\tau)\,]}_{\text{Risk (outcomes)}}
\end{aligned}
\tag{15}
\]
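A minimal sketch of this outcome-based risk, assuming a known likelihood matrix and a preferred outcome distribution; names are illustrative.

```python
import numpy as np

def risk_over_outcomes(A, s_pred, o_pref, eps=1e-12):
    """KL divergence between predicted and preferred outcomes (cf. (15)).

    A      : likelihood matrix P(o | s)
    s_pred : predicted states under a policy, Q(s_tau | pi)
    o_pref : preferred outcome distribution P(o_tau)
    """
    o_pred = np.asarray(A, float) @ np.asarray(s_pred, float)   # Q(o_tau | pi)
    o_pref = np.asarray(o_pref, float)
    return float(np.sum(o_pred * (np.log(o_pred + eps) - np.log(o_pref + eps))))
```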
The divergence between the predicted and true posteriors in (15)
(i.e., the term assumed to be small) constitutes an expected evidence
bound, which also appears if we express the expected free energy in
terms of intrinsic and extrinsic value:
\[
\begin{aligned}
G(\pi) ={} & -\underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}[\log P(o_\tau)]}_{\text{Extrinsic value}} + \underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\big[D_{\mathrm{KL}}[\,Q(s_\tau, A \mid o_\tau, \pi)\,\|\,P(s_\tau, A \mid o_\tau)\,]\big]}_{\text{Expected evidence bound}} \\
& - \underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\big[D_{\mathrm{KL}}[\,Q(s_\tau \mid o_\tau, \pi)\,\|\,Q(s_\tau \mid \pi)\,]\big]}_{\text{Intrinsic value (states) or salience}} - \underbrace{\mathbb{E}_{Q(o_\tau, s_\tau \mid \pi)}\big[D_{\mathrm{KL}}[\,Q(A \mid o_\tau, s_\tau, \pi)\,\|\,Q(A)\,]\big]}_{\text{Intrinsic value (parameters) or novelty}}
\end{aligned}
\tag{16}
\]
Extrinsic value is just the expected value of log evidence,
which can be associated with reward and utility in
behavioural psychology and economics, respectively (Barto et al.,
2013;Kauder,1953;Schmidhuber,2010). In this setting, ex-
trinsic value is the negative of Bayesian risk (Berger,1985),
when reward is log evidence. The intrinsic value of a policy
is its epistemic value or affordance (Friston et al.,2015). This
is just the expected information gain afforded by a particular
policy, which can be about hidden states (i.e., salience) or model
parameters (i.e., novelty). It is this term that underwrites artificial
curiosity (Schmidhuber,2006).
Intrinsic value corresponds to the expected information gain
about model parameters. It is also known as intrinsic motivation
in neurorobotics (Barto et al.,2013;Deci & Ryan,1985;Oudeyer
& Kaplan,2009), the value of information in economics (Howard,
1966), salience in the visual neurosciences and (rather confus-
ingly) Bayesian surprise in the visual search literature (Itti & Baldi,
2009;Schwartenbeck et al.,2013;Sun et al.,2011). In terms of
information theory, intrinsic value is mathematically equivalent
to the expected mutual information between hidden states in
the future and their consequences—consistent with the princi-
ples of minimum redundancy or maximum efficiency (Barlow,
1961,1974;Linsker,1990). Finally, from a statistical perspective,
maximising intrinsic value (i.e., salience and novelty) corresponds
to optimal Bayesian design (Lindley,1956) and machine learning
derivatives, such as active learning (MacKay,1992). On this view,
active learning is driven by novelty; namely, the information
gain afforded model parameters, given future states and their
outcomes. Heuristically, this curiosity resolves uncertainty about
‘‘what would happen if I did that’’ (Schmidhuber,2010). Fig. 4
illustrates the compass of expected free energy, in terms of its
special cases; ranging from optimal Bayesian design through to
Bayesian decision theory.
8. Learning
In active inference, learning concerns the dynamics of synaptic
plasticity, which are thought to encode beliefs about the con-
tingencies of the environment (Friston, FitzGerald et al.,2017)
Fig. 4. Expected free energy. This figure illustrates the various ways in which minimising expected free energy can be unpacked (omitting model parameters for
clarity). The upper panel casts action and perception as the minimisation of variational and expected free energy, respectively. Crucially, active inference introduces
beliefs over policies that enable a formal description of planning as inference (Attias,2003;Botvinick & Toussaint,2012;Kaplan & Friston,2018a). In brief, posterior
beliefs about hidden states of the world, under plausible policies, are optimised by minimising a variational (free energy) bound on log evidence. These beliefs are
then used to evaluate the expected free energy of allowable policies, from which actions can be selected (Friston, FitzGerald et al.,2017). Crucially, expected free
energy subsumes several special cases that predominate in the psychological, machine learning and economics literature. These special cases are disclosed when one
removes particular sources of uncertainty from the implicit optimisation problem. For example, if we ignore prior preferences, then the expected free energy reduces
to information gain (Lindley,1956;MacKay,2003) or intrinsic motivation (Barto et al.,2013;Deci & Ryan,1985;Oudeyer & Kaplan,2009). This is mathematically
the same as expected Bayesian surprise and mutual information that underwrite salience in visual search (Itti & Baldi,2009;Sun et al.,2011) and the organisation
of our visual apparatus (Barlow,1961,1974;Linsker,1990;Optican & Richmond,1987). If we now remove risk but reinstate prior preferences, one can effectively
treat hidden and observed (sensory) states as isomorphic. This leads to risk sensitive policies in economics (Fleming & Sheu,2002;Kahneman & Tversky,1988) or
KL control in engineering (van den Broek et al.,2010). Here, minimising risk corresponds to aligning predicted outcomes to preferred outcomes. If we then remove
ambiguity and relative risk of action (i.e., intrinsic value), we are left with extrinsic value or expected utility in economics (Von Neumann & Morgenstern,1944) that
underwrites reinforcement learning and behavioural psychology (Barto & Sutton, 1992). Bayesian formulations of maximising expected utility under uncertainty are
also known as Bayesian decision theory (Berger,1985). Finally, if we just consider a completely unambiguous world with uninformative priors, expected free energy
reduces to the negative entropy of posterior beliefs about the causes of data; in accord with the maximum entropy principle (Jaynes,1957). The expressions for
variational and expected free energy correspond to those described in the main text (omitting model parameters for clarity). They are arranged to illustrate the
relationship between complexity and accuracy, which become risk and ambiguity, when considering the consequences of action. This means that risk-sensitive policy
selection minimises expected complexity or computational cost. The coloured dots above the terms in the equations correspond to the terms that constitute the
special cases in the lower panels.
(e.g., beliefs about B, in some settings, are thought to be encoded
in recurrent excitatory connections in the prefrontal cortex (Parr,
Rikhye et al.,2019)). The fact that beliefs about matrices (e.g., A,
B) may be encoded in synaptic weights conforms to connectionist
models of brain function, as it offers a convenient way to compute
probabilities, in the sense that the synaptic weights could be
interpreted as performing matrix multiplication (as in artificial
neural networks) to predict, for example, outcomes from beliefs
about states, using the likelihood matrix A.
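For instance, with an arbitrary two-state, two-outcome likelihood matrix, predicted outcome probabilities follow from a single matrix multiplication:

```python
import numpy as np

A = np.array([[0.9, 0.1],       # hypothetical likelihood matrix, P(o | s);
              [0.1, 0.9]])      # columns sum to one
s = np.array([0.7, 0.3])        # current beliefs about hidden states

o_pred = A @ s                  # predicted outcome probabilities -> [0.66, 0.34]
```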
These synaptic dynamics (e.g., long-term potentiation and de-
pression) evolve at a slower timescale than action and percep-
tion, which is consistent with the fact that such inferences need
evidence accumulation over multiple state–outcome pairs. For
simplicity, we will assume the only variable that is learned is A,
but what follows generalises to more complex generative models
(c.f., Appendix A.1). Learning A means that approximate posterior
beliefs about A follow a gradient descent on variational free
energy. Seeing the variational free energy (5) as a function of a
(the sufficient statistic of Q(A)), we can write:
\[
\begin{aligned}
F(\mathbf{a}) &= D_{\mathrm{KL}}[\,Q(A)\,\|\,P(A)\,] - \sum_{\tau=1}^{t} \mathbb{E}_{Q(\pi)Q(s_\tau \mid \pi)Q(A)}[\,o_\tau \cdot \log(A)\, s_\tau\,] + \cdots \\
&= D_{\mathrm{KL}}[\,Q(A)\,\|\,P(A)\,] - \sum_{\tau=1}^{t} o_\tau \cdot \mathbb{E}_{Q(A)}[\log A]\; \mathbf{s}_\tau + \cdots
\end{aligned}
\tag{17}
\]
Here, we ignore the terms in (5) that do not depend on Q(A), as
these will vanish when we take the gradient. The KL-divergence