
Curiosity driven exploration through perspective transformation

Grégoire Sergeant-Perthuis, David Rudrauf, Yvain Tisserand
14 December 2022
Keywords. Partially Observable Markov Decision Process (POMDP), Perspective Taking, Epistemic Value
We explain how one can modify the formal expression of epistemic value, which drives curiosity-based exploration in an agent, to take into account the fact that the agent has its own perspective on its environment. This is an example of how perspective-taking and decision-making can be related; more generally, we propose to relate the two by exhibiting a special class of policies that accounts for perspective-taking. We suggest an experiment that can discriminate between agents exploring based on an objective representation versus a subjective perspective on their environment. This is one step toward clarifying how perspective-taking can contribute to decision-making.
1 Curiosity: expected distance between a posteriori and a priori
Active Inference is one implementation of the Bayesian Brain Hypothesis for generating behaviors similar
to those expected from adaptive agents [Friston et al., 2006, Timsit and Sergeant-Perthuis, 2021]. It
relies on an internal representation of the environment that an agent wants to explore and exploit. Using
this world model, the agent continually updates beliefs about plausible competing internal hypotheses on
the state of this environment. Under common sensory limitations, Active Inference relates to Partially Observable Markov Decision Processes (POMDPs) [Da Costa et al., 2020, Ognibene et al., 2019]. Curiosity, or epistemic value, is one of the quantities that come into play once a principle of how the agent should act is given [Friston et al., 2015]. In this note, we consider a toy model of an agent exploring an environment based on curiosity, but with a different flavor of how actions are implemented, so that they reflect perspective taking on the internal representation space of the agent.
The toy model we consider is that of an agent $A$ which is looking for an object $O$ whose position is $y \in \mathbb{R}^3$. Assume that the agent's internal representation space is the set of all possible positions of $O$, which is simply $\mathbb{R}^3$. $A$ has internal beliefs on the position of the object, encoded by a probability measure $Q_X(dx) = q_X(x)\,dx$ for $x \in \mathbb{R}^3$. These beliefs are updated according to noisy sensory observations of the real position $y \in \mathbb{R}^3$ of the object. This uncertainty is captured by a probability kernel $P_{Y|X}$ from $\mathbb{R}^3$ to $\mathbb{R}^3$, which is a (measurable) function that associates to a position $x \in \mathbb{R}^3$ in the internal space a probability measure $P_{Y|X}(dy|x)$ on the possible observations of the object. Let us denote by $P_{X,Y}(dx, dy) := p_{Y|X}(y|x)\, q_X(x)\, dx\, dy$ the probability measure on the internal space ($\mathbb{R}^3$) times the space of configurations of the environment ($\mathbb{R}^3$), and by $p_{X,Y}(x, y)$ the associated density evaluated at $x, y \in \mathbb{R}^3$.
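This setup can be sketched numerically. The snippet below is a minimal, illustrative discretization: a 1-D grid stands in for $\mathbb{R}^3$, and the Gaussian prior and sensory kernel are assumptions of this sketch, not choices made in the paper.

```python
import math

# Hypothetical discretization of the toy model: a 1-D grid stands in for R^3,
# and the Gaussian densities below are illustrative choices, not the paper's.
GRID = [i * 0.5 for i in range(-10, 11)]  # candidate positions x and observations y
DX = 0.5                                  # grid step, used as integration weight

def q_X(x, mu=0.0, sigma=2.0):
    """Prior belief density q_X(x) of agent A on the position of O."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_Y_given_X(y, x, noise=1.0):
    """Sensory kernel P_{Y|X}(dy|x): a noisy observation of the position."""
    return math.exp(-0.5 * ((y - x) / noise) ** 2) / (noise * math.sqrt(2 * math.pi))

# Joint density p_{X,Y}(x, y) = p_{Y|X}(y|x) q_X(x); its total mass on the
# (truncated) grid should be close to 1.
total = sum(p_Y_given_X(y, x) * q_X(x) * DX * DX for x in GRID for y in GRID)
```

The grid step `DX` plays the role of the Lebesgue measure $dx\,dy$; the small mass deficit in `total` comes only from truncating the grid.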
We focus on exploration based on curiosity as it is one case where the computational expression of the
concept of curiosity is salient. Following [Friston et al., 2015], curiosity can be defined as
$$C(Q_X) = \mathbb{E}_{P_Y}\!\left[ H(P_{X|Y} \,\|\, Q_X) \right] = \int p_Y(y)\, dy \int p_{X|Y}(x|y) \ln \frac{p_{X|Y}(x|y)}{q_X(x)}\, dx \qquad (1)$$

where $H$ stands for the relative entropy, also called the Kullback–Leibler divergence. The previous expression can be rewritten as

$$C(Q_X) = \int p_{X,Y}(x, y) \ln \frac{p_{X,Y}(x, y)}{p_Y(y)\, q_X(x)}\, dx\, dy \qquad (2)$$

The agent $A$ chooses its actions so as to maximize curiosity.
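The equality between the expected-KL form (1) and the joint form (2) can be checked numerically on a discretized version of the toy model; the 1-D grid and Gaussian densities below are illustrative assumptions of this sketch.

```python
import math

# Check numerically that forms (1) and (2) of curiosity agree on a 1-D grid.
# The grid and the Gaussian densities are illustrative assumptions.
XS = [i * 0.5 for i in range(-8, 9)]
D = 0.5

def gauss(z, s):
    return math.exp(-0.5 * (z / s) ** 2) / (s * math.sqrt(2 * math.pi))

q = {x: gauss(x, 2.0) for x in XS}                    # prior q_X
Zq = sum(q[x] * D for x in XS)                        # renormalize on the grid
q = {x: q[x] / Zq for x in XS}

joint = {(x, y): gauss(y - x, 1.0) * q[x] for x in XS for y in XS}  # p_{X,Y}
Zj = sum(v * D * D for v in joint.values())
joint = {k: v / Zj for k, v in joint.items()}
p_y = {y: sum(joint[(x, y)] * D for x in XS) for y in XS}           # marginal p_Y

# Form (1): C = E_{P_Y}[ H(P_{X|Y} || Q_X) ], with posterior p_{X|Y} = joint / p_Y.
C1 = sum(p_y[y] * (joint[(x, y)] / p_y[y])
         * math.log((joint[(x, y)] / p_y[y]) / q[x]) * D * D
         for x in XS for y in XS if joint[(x, y)] > 0)

# Form (2): C = integral of p_{X,Y} ln( p_{X,Y} / (p_Y q_X) ).
C2 = sum(joint[(x, y)] * math.log(joint[(x, y)] / (p_y[y] * q[x])) * D * D
         for x in XS for y in XS if joint[(x, y)] > 0)
```

The two sums agree term by term up to floating-point error, and both are positive, as expected for an expected Kullback–Leibler divergence.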
Active inference can be used as a framework to understand and model central aspects of consciousness [Rudrauf et al., 2017, Solms, 2019, Rudrauf et al., 2022]. In particular, an important feature of consciousness is to represent information in a Global Workspace [Dehaene et al., 2017] and to take subjective perspectives on an internal world model accessed within that workspace, in order to appraise possible actions based on their expected utility and epistemic value [Rudrauf et al., 2022]. In previous work [Williford et al., 2018, Rudrauf et al., 2022, Rudrauf et al., 2021], we proposed that the internal representation space is geometrically structured; more precisely, we explored the possibility of it being a 3-dimensional projective space, denoted $P^3(\mathbb{R})$. This has the advantage of making the notion of perspective taking natural: it is the choice of a projective transformation $\psi$, a projective transformation being a linear isomorphism $M_\psi \in \mathrm{GL}_4(\mathbb{R})$ up to a multiplicative constant. Among the results that support this model, it explains the moon illusion [Rudrauf et al., 2020] with, for the first time, falsifiable predictions on how strong the effect should be depending on context, and it generates adaptive and maladaptive behaviours in a manner consistent with developmental and clinical psychology (see [Rudrauf et al., 2022]).
2 How perspective taking modifies epistemic value
Although essential in psychology, notably for understanding multi-agent social interactions (empathy), perspective taking is most often absent from existing models of consciousness [Koch et al., 2016, Kleiner and Tull, 2021, Mashour et al., 2020]. Likewise, the advantages of perspective taking for cybernetics have not yet been clearly formulated (but see [Rudrauf et al., 2022]). Here, we take a step toward clarifying how perspective taking can modify behaviors. We propose to compare exploration driven by curiosity with and without perspective taking on the internal representation space.
In order to do so, we adopt the dual point of view on movement, namely that it can also be seen as a change of frame (point of view); in physics, this is commonly known as the duality between active and passive transformations. In the optimal control literature, in which the Markov Decision Process is a central concept, and in particular in the multi-agent setting [Wiering and van Otterlo, 2012], actions are thought of as transformations that change the state space. More precisely, in the standard formulation of POMDPs, policies $\pi$ are composed of actions and relate a state to a stochastic choice of actions. To simplify the presentation, we identify policies $\pi$, taken from a space of policies $\Pi$, with probability kernels $p_{S_{t+1}|S_t,\Pi}(s_{t+1} \,|\, s_t, \pi)$, where $s_{t+1}, s_t$ belong to the state space $S$; they capture how the state space $S$ can evolve under a given policy.
2.1 Epistemic value driven exploration
For example, in the toy model we consider and in the standard formulation, this translates into considering that there are several possible dynamics $p_{X_1|X_0,\Pi}(x_1|x_0,\pi)$ on the internal representation space, induced by actions (moves) of the agent (here $\pi \in \Pi$), which allow the agent to go from one configuration $x_0$ in the internal space to the next configuration $x_1$. The a priori $Q_{X_0}$ induces an a priori at the next step, $Q_{X_1}$, whose density is defined as follows:

$$q_{X_1|\Pi}(x_1|\pi) = \int p_{X_1|X_0,\Pi}(x_1|x_0,\pi)\, q_{X_0}(x_0)\, dx_0 \qquad (3)$$

Therefore each policy $\pi$ defines an a priori measure $Q_{X_1}[\pi] := q_{X_1|\Pi}(x_1|\pi)\, dx_1$. Curiosity is computed from $p_{X_1,Y|\Pi}(x_1, y|\pi) = p_{Y|X}(y|x_1)\, q_{X_1|\Pi}(x_1|\pi)$, following the same expression as Equation 2 but this time depending on the policy $\pi \in \Pi$:

$$C(Q_{X_1}[\pi]) = \int p_{X_1,Y|\Pi}(x_1, y|\pi) \ln \frac{p_{X_1,Y|\Pi}(x_1, y|\pi)}{p_{Y|\Pi}(y|\pi)\, q_{X_1|\Pi}(x_1|\pi)}\, dx_1\, dy \qquad (4)$$
When curiosity $C$ is computed with respect to policies, we will call it epistemic value. Indeed, epistemic value is the quantity that weights the different policies when exploration is driven by curiosity. In this case curiosity depends on the policy $\pi$ and the a priori $Q_{X_0}$; we will indicate this dependence by denoting $C(Q_{X_1}[\pi])$ as $C(Q, \pi)$.
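A minimal sketch of this pipeline, pushing the prior through a policy kernel (Equation 3) and scoring it (Equation 4): the 5-state space, the likelihood, and the two policies below are hypothetical, chosen only to make the comparison concrete.

```python
import math

# Epistemic value C(Q, pi) for two hypothetical policies on a 5-state space.
# The state space, likelihood, and kernels are illustrative assumptions.
S = range(5)
q0 = [0.1, 0.2, 0.4, 0.2, 0.1]            # prior Q_{X_0}

def lik(y, x):
    """P_{Y|X}: the readout is correct with prob. 0.7, else uniform noise."""
    return 0.7 if y == x else 0.075        # 0.7 + 4 * 0.075 = 1

kernels = {
    "stay":   lambda x1, x0: 1.0 if x1 == x0 else 0.0,  # keep the state
    "spread": lambda x1, x0: 0.2,                       # uniform mixing
}

def epistemic_value(pi):
    # Equation (3): prior at the next step, induced by the policy kernel.
    q1 = [sum(kernels[pi](x1, x0) * q0[x0] for x0 in S) for x1 in S]
    # Equation (4): curiosity of the induced joint p(x1, y | pi).
    joint = {(x, y): lik(y, x) * q1[x] for x in S for y in S}
    p_y = {y: sum(joint[(x, y)] for x in S) for y in S}
    return sum(j * math.log(j / (p_y[y] * q1[x]))
               for (x, y), j in joint.items() if j > 0)
```

Here the diffusing policy yields the larger epistemic value: a flatter prior leaves more to learn from the noisy readout, so exploration weighted by epistemic value would select `spread` over `stay`.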
2.2 Perspective taking as a policy
When action is seen as a change of coordinates (with possible loss of information) on the internal representation space, one must replace policies by (measurable) maps $\psi : S \to S$ ($\pi := \psi$). For example, $\psi$ can be a projective transformation from $\mathbb{R}^3$ to $\mathbb{R}^3$. Measurable maps naturally induce a probability kernel, whose expression in the toy model we consider is, for any measurable subset $A \subseteq \mathbb{R}^3$ and $x_0 \in \mathbb{R}^3$,

$$p_{X_1|X_0}(A|x_0, \psi) = 1[\psi(x_0) \in A] \qquad (5)$$

The a priori $Q_{X_1|\psi}$ then becomes the pushforward measure $\psi_* Q_{X_0}$, and epistemic value is still computed by applying Equation 2. Taking a passive point of view on action makes it possible for the agent to always be centered on itself in its internal space, that is, to leverage an egocentric perspective. Doing so makes it natural to define sensory uncertainty and any relative properties that structure the integration of information from the environment into the internal space.
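The deterministic kernel of Equation (5) and the resulting pushforward can be sketched on weighted samples; the sample set and the contraction `psi` below are illustrative stand-ins (for brevity, a scaling map rather than a projective transformation).

```python
# Equation (5) makes a measurable map psi act as a deterministic policy: the
# updated prior is the pushforward psi_* Q_{X_0}. Weighted samples stand in
# for Q_{X_0}; psi is an illustrative contraction, not a projective map.
samples = [(-2.0, 0.25), (-1.0, 0.25), (1.0, 0.25), (2.0, 0.25)]  # (x, weight)

def psi(x):
    return 0.5 * x

def pushforward_mass(f, a, b):
    """(f_* Q)([a, b]) = Q(f^{-1}([a, b])), evaluated on the samples."""
    return sum(w for x, w in samples if a <= f(x) <= b)

mass_before = pushforward_mass(lambda x: x, -1.0, 1.0)  # Q([-1, 1])
mass_after = pushforward_mass(psi, -1.0, 1.0)           # (psi_* Q)([-1, 1])
```

The contraction concentrates all the belief mass inside $[-1, 1]$: this is exactly the "magnifying the frame" effect that the discussion below exploits.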
The experimental design we propose is to compare exploration driven by curiosity for policies seen as (stochastic) actions on the state space of the agent, when policies are seen as a Euclidean change of frame versus when they are seen as a projective transformation. The following theorem states that this experiment makes it possible to discriminate whether the behavior of the agent is dictated by an objective perspective on its environment (Euclidean change of frame) or by a subjective one (projective change of frame).

Theorem (Discrimination of behaviour with respect to internal representation). If the agent explores its environment in order to search for $O$ when it has an objective representation of its environment, it stops moving when it sees $O$. But if its representation is given through projective transformations, it will seek to get closer to $O$, even after seeing $O$.

Proof. See Appendix A.
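The dichotomy of the theorem can be illustrated numerically in one dimension, using the indicator likelihood and the rewriting $C(Q,\psi) = -\int dy\, Q(\psi^{-1}(B_y(1)))\ln Q(\psi^{-1}(B_y(1)))$ derived in Appendix A. The 1-D setting, the uniform prior, and the use of a scaling map as a stand-in for a projective transformation are assumptions of this sketch.

```python
import math

# 1-D illustration: curiosity is invariant under a Euclidean shift but changes
# under a scaling map (a stand-in for a projective transformation). The prior
# Q is uniform on [-3, 3]; grid sizes are illustrative.
YS = [i * 0.01 for i in range(-1000, 1001)]
DY = 0.01

def Q_mass(a, b):
    """Mass of the uniform prior on [-3, 3] inside the interval [a, b]."""
    return max(min(b, 3.0) - max(a, -3.0), 0.0) / 6.0

def curiosity(inv_ball):
    """C(Q, psi) = -sum over y of Q(psi^{-1}(B_y(1))) ln Q(...) * dy."""
    c = 0.0
    for y in YS:
        p = Q_mass(*inv_ball(y))
        if p > 0:
            c -= p * math.log(p) * DY
    return c

c_id    = curiosity(lambda y: (y - 1.0, y + 1.0))                  # psi = identity
c_shift = curiosity(lambda y: (y - 3.5, y - 1.5))                  # shift by 2.5
c_zoom  = curiosity(lambda y: ((y - 1.0) / 2.0, (y + 1.0) / 2.0))  # psi(x) = 2x
```

The shift leaves curiosity essentially unchanged, so an agent with an objective (Euclidean) representation gains nothing by moving; the zoom, which shrinks $\psi^{-1}(B_y(1))$, strictly increases curiosity and keeps driving the agent toward $O$.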
References

[Da Costa et al., 2020] Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., and Friston, K. (2020). Active inference on discrete state-spaces: A synthesis. Journal of Mathematical Psychology, 99:102447.
[Dehaene et al., 2017] Dehaene, S., Lau, H., and Kouider, S. (2017). What is consciousness, and could
machines have it? Science, 358(6362):486–492.
[Friston et al., 2006] Friston, K., Kilner, J., and Harrison, L. (2006). A free energy principle for the
brain. Journal of Physiology-Paris, 100(1):70–87. Theoretical and Computational Neuroscience: Un-
derstanding Brain Functions.
[Friston et al., 2015] Friston, K., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T., and Pezzulo, G.
(2015). Active inference and epistemic value. Cognitive Neuroscience, 6(4):187–214.
[Kleiner and Tull, 2021] Kleiner, J. and Tull, S. (2021). The mathematical structure of integrated infor-
mation theory. Frontiers in Applied Mathematics and Statistics, 6.
[Koch et al., 2016] Koch, C., Massimini, M., Boly, M., and Tononi, G. (2016). Neural correlates of
consciousness: progress and problems. Nature Reviews Neuroscience.
[Mashour et al., 2020] Mashour, G. A., Roelfsema, P., Changeux, J.-P., and Dehaene, S. (2020). Con-
scious processing and the global neuronal workspace hypothesis. Neuron, 105(5):776–798.
[Ognibene et al., 2019] Ognibene, D., Mirante, L., and Marchegiani, L. (2019). Proactive intention recognition for joint human-robot search and rescue missions through Monte-Carlo planning in POMDP environments. In Salichs, M. A., Ge, S. S., Barakova, E. I., Cabibihan, J.-J., Wagner, A. R., Castro-González, Á., and He, H., editors, Social Robotics, pages 332–343, Cham. Springer International Publishing.
[Rudrauf et al., 2017] Rudrauf, D., Bennequin, D., Granic, I., Landini, G., Friston, K., and Williford, K.
(2017). A mathematical model of embodied consciousness. Journal of theoretical biology, 428:106–131.
[Rudrauf et al., 2020] Rudrauf, D., Bennequin, D., and Williford, K. (2020). The moon illusion explained
by the projective consciousness model. Journal of Theoretical Biology, 507:110455.
[Rudrauf et al., 2022] Rudrauf, D., Sergeant-Perthuis, G., Belli, O., Tisserand, Y., and Serugendo, G.
D. M. (2022). Modeling the subjective perspective of consciousness and its role in the control of
behaviours. Journal of Theoretical Biology, 534:110957.
[Rudrauf et al., 2021] Rudrauf, D., Sergeant-Perthuis, G., Tisserand, Y., Monnor, T., and Belli, O.
(2021). Combining the Projective Consciousness Model and Virtual Humans to assess ToM capac-
ity in Virtual Reality: a proof-of-concept.
[Solms, 2019] Solms, M. (2019). The hard problem of consciousness and the free energy principle. Fron-
tiers in Psychology, 9.
[Timsit and Sergeant-Perthuis, 2021] Timsit, Y. and Sergeant-Perthuis, G. (2021). Towards the idea of
molecular brains. International Journal of Molecular Sciences, 22(21).
[Wiering and van Otterlo, 2012] Wiering, M. and van Otterlo, M. (2012). Reinforcement Learning: State-
of-the-Art. Springer Berlin Heidelberg.
[Williford et al., 2018] Williford, K., Bennequin, D., Friston, K., and Rudrauf, D. (2018). The projective
consciousness model and phenomenal selfhood. Frontiers in Psychology, 9:2571.
A Proof of Theorem
Assume that, for any $x, y \in \mathbb{R}^3$,

$$P_{Y|X}(y|x) = 1[\|x - y\| \le 1] \qquad (6)$$

where $\|\cdot\|$ designates the Euclidean norm on $\mathbb{R}^3$, i.e. $\|x\| = \|x\|_2$. We will also denote by $B_y(1)$ the Euclidean ball of radius 1 around $y \in \mathbb{R}^3$, i.e. $B_y(1) = \{x \in \mathbb{R}^3 \,|\, \|x - y\| \le 1\}$.
Then, for any probability distribution $Q$ over $\mathbb{R}^3$, let us denote $C(Q[\psi])$ as $C(Q, \psi)$:

$$C(Q, \psi) = \int \psi_* Q(dx_1) \int dy\, 1[x_1 \in B_y(1)] \ln \frac{1[x_1 \in B_y(1)]}{\int \psi_* Q(dx_1')\, 1[x_1' \in B_y(1)]} \qquad (7)$$
$$= -\int dy\, \ln Q(\psi^{-1}(B_y(1))) \int \psi_* Q(dx_1)\, 1[x_1 \in B_y(1)] \qquad (8)$$
$$= -\int dy\, Q(\psi^{-1}(B_y(1))) \ln Q(\psi^{-1}(B_y(1))) \qquad (9)$$
When $\psi$ is a Euclidean transformation, $\psi^{-1}(B_y(1)) = B_{\psi^{-1}(y)}(1)$; therefore, in the Euclidean case,

$$C(Q, \psi) = -\int dy\, Q(B_{\psi^{-1}(y)}(1)) \ln Q(B_{\psi^{-1}(y)}(1)) = -\int dy\, Q(B_y(1)) \ln Q(B_y(1))$$

by the change of variables $y \mapsto \psi(y)$, which preserves the Lebesgue measure. In this case, curiosity is independent of the change of Euclidean frame, and not moving is a perfectly valid choice in order to maximize the agent's curiosity. Let us remark that this effect appears because the agent assumes (or believes) that it has access to the whole configuration space of $O$; if it knew it had limited access to it, through for example limited sight, we would expect the agent to look around until the object $O$ is in sight and then stop.
In the non-Euclidean case, thanks to the rewriting of curiosity as Equation 9, we show that curiosity favors transformations that shrink $B_y(1)$ through $\psi^{-1}$; indeed, if two transformations $\psi, \psi_1$ are such that

$$\psi^{-1}(B_y(1)) \subseteq \psi_1^{-1}(B_y(1)) \qquad (10)$$

then

$$-Q(\psi^{-1}(B_y(1))) \ln Q(\psi^{-1}(B_y(1))) \ge -Q(\psi_1^{-1}(B_y(1))) \ln Q(\psi_1^{-1}(B_y(1))) \qquad (11)$$
For any initial prior $Q_0$ at starting time $t = 0$, which we assume to be absolutely continuous with respect to the Lebesgue measure (i.e. $Q_0(dx) = q_0(x)\, dx$), after one step the agent updates its prior to

$$q_1(x) \propto 1[x \in B_{y_0}(1)]\, q_0(x) \qquad (12)$$

where $y_0 \in \mathbb{R}^3$ is the real position of the object $O$: the belief is supported within $B_{y_0}(1)$. By maximizing curiosity, the agent will try to shrink $\psi^{-1}(B_{\psi(y_0)}(1))$, which always lies inside the set where the measure is supported (i.e. $B_{y_0}(1)$). Shrinking $\psi^{-1}(B_{\psi(y_0)}(1))$ means magnifying the zone around $y_0$ in the agent's new frame after action. If the agent has enough actions, it will always be able to shrink $\psi^{-1}(B_{\psi(y_0)}(1))$ and therefore move closer to $y_0$.
The case where the initial prior $Q_0$ is uniform in a delimited large box around $y_0$ (large with respect to 1) is a particular case where all the computations are tractable by hand.
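For instance, in this uniform case the computation can be sketched as follows, assuming a linear $\psi$ with matrix $M$ and neglecting boundary effects; this is an illustrative derivation, not taken verbatim from the paper. Write $V$ for the volume of the box $K$ and $v := \mathrm{vol}(B_0(1)) = \tfrac{4\pi}{3}$.

```latex
% Uniform prior on a box K of volume V, so q_0 = 1/V on K; psi linear with
% matrix M, so psi^{-1}(B_y(1)) has volume v / |det M|.
\begin{align*}
Q_0\big(\psi^{-1}(B_y(1))\big) &= \frac{v}{V\,|\det M|}
  \quad \text{for } y \text{ well inside } \psi(K),
  \text{ and } \approx 0 \text{ far outside};\\
C(Q_0,\psi) &\approx -\,\mathrm{vol}(\psi(K))\,
  \frac{v}{V\,|\det M|}\,\ln\frac{v}{V\,|\det M|}
  \;=\; v\,\ln\frac{V\,|\det M|}{v},
\end{align*}
% since vol(psi(K)) = |det M| V. Curiosity thus increases with |det M|:
% transformations that magnify the frame (and hence shrink psi^{-1}(B_y(1)))
% are favored, consistently with the Theorem.
```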