Curiosity-driven exploration through perspective transformation
Grégoire Sergeant-Perthuis, David Rudrauf, Yvain Tisserand
14 December 2022
Keywords. Partially Observable Markov Decision Process (POMDP), Perspective Taking, Epistemic
Value.
Abstract
We explain how the formal expression of epistemic value, which drives curiosity-based exploration of an agent, can be modified to take into account that the agent has its own perspective on its environment. This is an example of how perspective-taking and decision-making can be related; more generally, we propose to relate the two by exhibiting a special class of policies that accounts for perspective-taking. We suggest an experiment that can discriminate between agents exploring based on an objective representation of their environment versus a subjective perspective on it. This is one step toward clarifying how perspective-taking can contribute to decision-making.
1 Curiosity: expected distance between a posteriori and a priori
Active Inference is one implementation of the Bayesian Brain Hypothesis for generating behaviors similar to those expected from adaptive agents [Friston et al., 2006, Timsit and Sergeant-Perthuis, 2021]. It relies on an internal representation of the environment that an agent wants to explore and exploit. Using this world model, the agent continually updates its beliefs over plausible competing internal hypotheses about the state of the environment. Under common sensory limitations, Active Inference relates to Partially Observable Markov Decision Processes (POMDPs) [Da Costa et al., 2020, Ognibene et al., 2019]. Curiosity, or epistemic value, is one of the quantities that come into play once a principle of how the agent should act is given [Friston et al., 2015]. In this note, we consider a toy model of an agent exploring an environment based on curiosity, but with a different flavor of how actions are implemented, so that they reflect perspective taking on the internal representation space of the agent.
The toy model we consider is that of an agent $A$ looking for an object $O$ whose position is $y \in \mathbb{R}^3$. Assume that the agent's internal representation space is the set of all possible positions of $O$, which is simply $\mathbb{R}^3$. $A$ has internal beliefs on the position of the object, encoded by a probability measure $Q_X(dx) = q_X(x)\,dx$ for $x \in \mathbb{R}^3$. These beliefs are updated according to noisy sensory observations of the real position of the object $y \in \mathbb{R}^3$. This uncertainty is captured by a probability kernel $P_{Y|X}$ from $\mathbb{R}^3$ to $\mathbb{R}^3$, which is a (measurable) function that associates to a position in the internal space $x \in \mathbb{R}^3$ a probability measure $P_{Y|X}(dy|x)$ on the possible observations of the object. Let us denote by $P_{X,Y}(dx,dy) := p_{Y|X}(y|x)\,q_X(x)\,dx\,dy$ the probability measure on the internal space ($\mathbb{R}^3$) times the space of configurations of the environment ($\mathbb{R}^3$), and by $p_{X,Y}(x,y)$ the associated density evaluated at $x, y \in \mathbb{R}^3$.
We focus on exploration based on curiosity, as it is one case where the computational expression of the concept of curiosity is salient. Following [Friston et al., 2015], curiosity can be defined as
$$C(Q_X) = \mathbb{E}_{P_Y}\!\left[ H(P_{X|Y} \,|\, Q_X) \right] = \int p_Y(y)\,dy \int p_{X|Y}(x|y) \ln \frac{p_{X|Y}(x|y)}{q_X(x)}\,dx \qquad (1)$$
where $H$ stands for the relative entropy, also called the Kullback-Leibler divergence. The previous expression can be rewritten as
$$C(Q_X) = \int p_{X,Y}(x,y) \ln \frac{p_{X,Y}(x,y)}{p_Y(y)\,q_X(x)}\,dx\,dy \qquad (2)$$
The agent $A$ chooses its actions so as to maximize curiosity.
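To fix ideas, the following is a minimal numerical sketch of Equation 2 on a discretized one-dimensional version of the model (ours, not the authors' code); the Gaussian prior, the Gaussian sensory kernel, and the grid bounds are illustrative assumptions.

```python
import numpy as np

# Minimal 1-d discretization of Equation (2): curiosity as the mutual
# information between the internal state X and the observation Y.
# Grid, prior, and sensory kernel below are illustrative assumptions.

x = np.linspace(-5.0, 5.0, 200)   # grid over the internal space
dx = x[1] - x[0]

q_x = np.exp(-0.5 * x**2)         # prior belief q_X (Gaussian, assumed)
q_x /= q_x.sum() * dx             # normalize as a density

sigma = 0.5                       # sensory noise level (assumed)
p_y_given_x = np.exp(-0.5 * ((x[None, :] - x[:, None]) / sigma) ** 2)
p_y_given_x /= p_y_given_x.sum(axis=1, keepdims=True) * dx  # rows x, cols y

p_xy = q_x[:, None] * p_y_given_x  # joint density p_{X,Y}(x, y)
p_y = p_xy.sum(axis=0) * dx        # marginal density p_Y(y)

# Equation (2): C(Q_X) = \int p_{X,Y} ln( p_{X,Y} / (p_Y q_X) ) dx dy
curiosity = np.sum(p_xy * np.log(p_xy / (p_y[None, :] * q_x[:, None]))) * dx**2
print(f"C(Q_X) = {curiosity:.4f} nats")
```

Since Equation 2 is the mutual information between $X$ and $Y$, the result is nonnegative; it shrinks as the sensory kernel gets noisier and observations carry less information about the internal state.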
Active Inference can be used as a framework to understand and model central aspects of consciousness [Rudrauf et al., 2017, Solms, 2019, Rudrauf et al., 2022]. In particular, an important feature of consciousness is to represent information in a Global Workspace [Dehaene et al., 2017] and to take subjective perspectives on an internal world model accessed within that workspace, in order to appraise possible actions based on their expected utility and epistemic value [Rudrauf et al., 2022]. In previous work [Williford et al., 2018, Rudrauf et al., 2022, Rudrauf et al., 2021], we proposed that the internal representation space is geometrically structured; more precisely, we explored the possibility of it being a 3-dimensional projective space, denoted $\mathbb{P}^3(\mathbb{R})$. This has the advantage of making the notion of perspective taking natural: it is the choice of a projective transformation $\psi$, a projective transformation being a linear isomorphism $M_\psi \in GL_4(\mathbb{R})$ up to a multiplicative constant. Among the results that support this model, it gives an explanation of the moon illusion [Rudrauf et al., 2020] with, for the first time, falsifiable predictions on how strong the effect should be depending on context, and it generates adaptive and maladaptive behaviors in a manner that is consistent with developmental and clinical psychology (see [Rudrauf et al., 2022]).
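For concreteness, the action of such a transformation can be computed via homogeneous coordinates. The sketch below (ours; the matrix entries are illustrative) applies a matrix $M_\psi \in GL_4(\mathbb{R})$ to a point of $\mathbb{R}^3$ and checks that rescaling $M_\psi$ leaves the action unchanged, as the definition up to a multiplicative constant requires.

```python
import numpy as np

def apply_projective(M: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Apply M in GL_4(R), defined up to a scalar, to x in R^3
    via homogeneous coordinates."""
    xh = np.append(x, 1.0)        # homogenize: R^3 -> P^3(R)
    yh = M @ xh
    return yh[:3] / yh[3]         # dehomogenize (assumes yh[3] != 0)

# Illustrative perspective-like map: the non-trivial last row makes the
# action non-affine, so points are rescaled depending on depth.
M = np.eye(4)
M[3, 2] = 0.3
p = np.array([1.0, 2.0, 1.0])
print(apply_projective(M, p))         # p scaled by 1/1.3
print(apply_projective(2.0 * M, p))   # identical: M acts up to scale
```

The depth-dependent rescaling produced by the last row is the kind of distortion, absent from Euclidean frames, that this family of models exploits.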
2 How perspective taking modifies epistemic value
Although essential in psychology, notably for understanding multi-agent social interactions (empathy), perspective taking is most often absent from existing models of consciousness [Koch et al., 2016, Kleiner and Tull, 2021, Mashour et al., 2020]. Likewise, the advantages of perspective taking for cybernetics have not yet been clearly formulated (but see [Rudrauf et al., 2022]). Here, we make a step in the direction of clarifying how perspective taking can modify behaviors. We propose to compare exploration driven by curiosity with and without perspective taking on the internal representation space.
In order to do so, we adopt the dual point of view on movement, namely that it can also be seen as a change of frame (point of view): in physics, this is commonly known as the duality between active and passive transformations. In the optimal control literature, in which Markov Decision Processes are a central concept, and in particular in the multi-agent setting [Wiering and van Otterlo, 2012], actions are thought of as transformations that change the state space. More precisely, in the standard formulation of POMDPs, policies $\pi$ are composed of actions and relate a state to a stochastic choice of actions. To simplify the presentation, we identify policies $\pi$, taken from a space of policies $\Pi$, with probability kernels $p_{S_{t+1}|S_t,\Pi}(s_{t+1}\,|\,s_t, \pi)$, where $s_{t+1}, s_t$ belong to the state space $S$; they capture how the state space $S$ can evolve.
2.1 Epistemic-value-driven exploration
For example, in the toy model we consider and in the standard formulation, this translates into considering that there are several possible dynamics $p_{X_1|X_0,\Pi}(x_1|x_0,\pi)$ on the internal representation space, induced by actions (moves) of the agent (here $\pi \in \Pi$), which allow the agent to go from one configuration $x_0$ in the internal space to the next configuration $x_1$. The a priori $Q_{X_0}$ induces an a priori at the next step, $Q_{X_1}$, whose density is defined as follows:
$$q_{X_1|\Pi}(x_1|\pi) = \int p_{X_1|X_0,\Pi}(x_1|x_0,\pi)\, q_{X_0}(x_0)\,dx_0 \qquad (3)$$
Therefore each policy $\pi$ defines an a priori measure $Q_{X_1}[\pi] := q_{X_1|\Pi}(x_1|\pi)\,dx_1$. Curiosity is computed from $p_{X_1,Y|\Pi}(x_1,y|\pi) = p_{Y|X}(y|x_1)\,q_{X_1|\Pi}(x_1|\pi)$ following the same expression as Equation 2, but this time depending on the policy $\pi \in \Pi$:
$$C(Q_{X_1}[\pi]) = \int p_{X_1,Y|\Pi}(x_1,y|\pi) \ln \frac{p_{X_1,Y|\Pi}(x_1,y|\pi)}{p_{Y|\Pi}(y|\pi)\, q_{X_1|\Pi}(x_1|\pi)}\,dx_1\,dy \qquad (4)$$
When curiosity $C$ is computed with respect to policies, we will call it epistemic value. Indeed, epistemic value is the quantity that weights the different policies when exploration is driven by curiosity. In this case curiosity depends on the policy $\pi$ and the a priori $Q_{X_0}$; we will indicate this dependence by denoting $C(Q_{X_1}[\pi])$ as $C(Q, \pi)$.
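As an illustration of Equations 3 and 4 together, here is a sketch (ours, with assumed Gaussian kernels and drift-style policies) that propagates the prior through each candidate policy and ranks the policies by their epistemic value.

```python
import numpy as np

# Sketch of Equations (3)-(4): each policy pi is a transition kernel
# p_{X1|X0,Pi}; propagate the prior (Eq. 3), then score the policy by
# the epistemic value (Eq. 4). All numerical choices are illustrative.

x = np.linspace(-5.0, 5.0, 200); dx = x[1] - x[0]
q_x0 = np.exp(-0.5 * x**2); q_x0 /= q_x0.sum() * dx        # prior q_{X0}

sigma = 0.5                                                # sensory noise
p_y_x = np.exp(-0.5 * ((x[None, :] - x[:, None]) / sigma) ** 2)
p_y_x /= p_y_x.sum(axis=1, keepdims=True) * dx             # p_{Y|X}

def propagate(drift, tau=0.3):
    """Equation (3): Gaussian move of size `drift` applied to the prior."""
    k = np.exp(-0.5 * ((x[:, None] + drift - x[None, :]) / tau) ** 2)
    k /= k.sum(axis=1, keepdims=True) * dx                 # rows x0, cols x1
    return (q_x0 @ k) * dx                                 # q_{X1|Pi}

def epistemic_value(q_x1):
    """Equation (4): curiosity of the propagated prior q_{X1|Pi}."""
    p_xy = q_x1[:, None] * p_y_x
    p_y = p_xy.sum(axis=0) * dx
    return np.sum(p_xy * np.log(p_xy / (p_y[None, :] * q_x1[:, None]))) * dx**2

policies = [-1.0, 0.0, 1.0]                                # candidate moves
scores = {d: epistemic_value(propagate(d)) for d in policies}
print(scores, "-> chosen move:", max(scores, key=scores.get))
```

In this translation-invariant setup the three scores come out nearly equal (up to boundary effects of the finite grid), which already hints at the Euclidean-invariance phenomenon established in the appendix.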
2.2 Perspective taking as a policy
When action is seen as a change of coordinates (with possible loss of information) on the internal representation space, one must replace policies by (measurable) maps $\psi: S \to S$ (setting $\pi := \psi$); $\psi$ can be, for example, a projective transformation from $\mathbb{R}^3$ to $\mathbb{R}^3$. Measurable maps naturally induce a probability kernel whose expression, in the toy model we consider, is defined, for any measurable subset $A \subseteq \mathbb{R}^3$ and $x_0 \in \mathbb{R}^3$, as
$$p_{X_1|X_0,\psi}(A\,|\,x_0, \psi) = \mathbb{1}[\psi(x_0) \in A] \qquad (5)$$
The a priori $Q_{X_1|\psi}$ then becomes the pushforward measure $\psi_* Q_{X_0}$, and the epistemic value is still computed by applying Equation 2. Taking a passive point of view on action makes it possible for the agent to always be centered on itself in its internal space, that is, to leverage an egocentric perspective. Doing so makes it natural to define sensory uncertainty and any relative properties that structure the integration of information from the environment into the internal space.
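Concretely, with a sample-based representation of beliefs, the pushforward is obtained by applying $\psi$ to every sample; the sketch below (ours, with an assumed perspective-like map and an assumed prior) also estimates the mass that $\psi_* Q_{X_0}$ assigns to a ball, which is the quantity the appendix works with.

```python
import numpy as np

# Equation (5): a measurable map psi updates the prior to the
# pushforward psi_* Q_{X0}. With samples of Q_{X0}, this is just psi
# applied to each sample. The map and distribution are illustrative.

rng = np.random.default_rng(0)
samples_x0 = rng.normal(loc=[0.0, 0.0, 3.0], scale=1.0, size=(10_000, 3))

def psi(pts: np.ndarray) -> np.ndarray:
    """Perspective-like projective map via homogeneous coordinates."""
    M = np.eye(4); M[3, 2] = 0.3          # depth-dependent rescaling
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    out = ph @ M.T
    return out[:, :3] / out[:, 3:4]

samples_x1 = psi(samples_x0)              # samples from psi_* Q_{X0}

# Mass assigned to a unit ball by the pushforward, e.g. psi_* Q(B_y(1)):
y = np.array([0.0, 0.0, 2.0])
prob_ball = np.mean(np.linalg.norm(samples_x1 - y, axis=1) <= 1.0)
print(f"psi_* Q(B_y(1)) is approximately {prob_ball:.3f}")
```

Representing beliefs by samples makes the pushforward trivial to compute, whereas a density representation would require the change-of-variables formula with the Jacobian of $\psi^{-1}$.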
The experimental design we propose is to compare exploration driven by curiosity, for policies seen as (stochastic) actions on the state space of the agent, when policies are seen as a Euclidean change of frame versus when they are seen as a projective transformation. The following theorem states that this experiment allows one to discriminate whether the behavior of the agent is dictated by an objective perspective on its environment (Euclidean change of frame) or by a subjective one (projective change of frame).
Theorem (Discrimination of behavior with respect to internal representation). If the agent explores its environment in order to search for $O$ when it has an objective representation of its environment, it stops moving when it sees $O$.
But if its representation is given through projective transformations, it will seek to get closer to $O$, even after seeing $O$.
Proof. See Appendix A.
References
[Da Costa et al., 2020] Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., and Friston, K. (2020).
Active inference on discrete state-spaces: A synthesis. Journal of Mathematical Psychology, 99:102447.
[Dehaene et al., 2017] Dehaene, S., Lau, H., and Kouider, S. (2017). What is consciousness, and could
machines have it? Science, 358(6362):486–492.
[Friston et al., 2006] Friston, K., Kilner, J., and Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100(1):70–87. Theoretical and Computational Neuroscience: Understanding Brain Functions.
[Friston et al., 2015] Friston, K., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T., and Pezzulo, G.
(2015). Active inference and epistemic value. Cognitive Neuroscience, 6(4):187–214.
[Kleiner and Tull, 2021] Kleiner, J. and Tull, S. (2021). The mathematical structure of integrated information theory. Frontiers in Applied Mathematics and Statistics, 6.
[Koch et al., 2016] Koch, C., Massimini, M., Boly, M., and Tononi, G. (2016). Neural correlates of
consciousness: progress and problems. Nature Reviews Neuroscience.
[Mashour et al., 2020] Mashour, G. A., Roelfsema, P., Changeux, J.-P., and Dehaene, S. (2020). Conscious processing and the global neuronal workspace hypothesis. Neuron, 105(5):776–798.
[Ognibene et al., 2019] Ognibene, D., Mirante, L., and Marchegiani, L. (2019). Proactive intention recognition for joint human-robot search and rescue missions through Monte-Carlo planning in POMDP environments. In Salichs, M. A., Ge, S. S., Barakova, E. I., Cabibihan, J.-J., Wagner, A. R., Castro-González, Á., and He, H., editors, Social Robotics, pages 332–343, Cham. Springer International Publishing.
[Rudrauf et al., 2017] Rudrauf, D., Bennequin, D., Granic, I., Landini, G., Friston, K., and Williford, K. (2017). A mathematical model of embodied consciousness. Journal of Theoretical Biology, 428:106–131.
[Rudrauf et al., 2020] Rudrauf, D., Bennequin, D., and Williford, K. (2020). The moon illusion explained
by the projective consciousness model. Journal of Theoretical Biology, 507:110455.
[Rudrauf et al., 2022] Rudrauf, D., Sergeant-Perthuis, G., Belli, O., Tisserand, Y., and Serugendo, G.
D. M. (2022). Modeling the subjective perspective of consciousness and its role in the control of
behaviours. Journal of Theoretical Biology, 534:110957.
[Rudrauf et al., 2021] Rudrauf, D., Sergeant-Perthuis, G., Tisserand, Y., Monnor, T., and Belli, O. (2021). Combining the Projective Consciousness Model and Virtual Humans to assess ToM capacity in Virtual Reality: a proof-of-concept.
[Solms, 2019] Solms, M. (2019). The hard problem of consciousness and the free energy principle. Frontiers in Psychology, 9.
[Timsit and Sergeant-Perthuis, 2021] Timsit, Y. and Sergeant-Perthuis, G. (2021). Towards the idea of
molecular brains. International Journal of Molecular Sciences, 22(21).
[Wiering and van Otterlo, 2012] Wiering, M. and van Otterlo, M. (2012). Reinforcement Learning: State-
of-the-Art. Springer Berlin Heidelberg.
[Williford et al., 2018] Williford, K., Bennequin, D., Friston, K., and Rudrauf, D. (2018). The projective
consciousness model and phenomenal selfhood. Frontiers in Psychology, 9:2571.
A Proof of the Theorem
Assume that, for any $x, y \in \mathbb{R}^3$,
$$p_{Y|X}(y|x) = \mathbb{1}[\|x - y\| \le 1] \qquad (6)$$
where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^3$, i.e. $\|x\|^2 = x_0^2 + x_1^2 + x_2^2$. We will also denote by $B_y(1)$ the Euclidean ball of radius 1 around $y \in \mathbb{R}^3$, i.e. $B_y(1) = \{x \in \mathbb{R}^3 \mid \|x - y\| \le 1\}$.
Then, for any probability distribution $Q$ over $\mathbb{R}^3$, let us denote $C(Q[\psi])$ as $C(Q, \psi)$:
$$C(Q,\psi) = \int \psi_* Q(dx_1) \int dy\, \mathbb{1}[x_1 \in B_y(1)] \ln \frac{\mathbb{1}[x_1 \in B_y(1)]}{\int \psi_* Q(dx_1')\,\mathbb{1}[x_1' \in B_y(1)]} \qquad (7)$$
$$= -\int dy\, \ln Q(\psi^{-1}(B_y(1))) \int \psi_* Q(dx_1)\, \mathbb{1}[x_1 \in B_y(1)] \qquad (8)$$
$$= -\int dy\, Q(\psi^{-1}(B_y(1))) \ln Q(\psi^{-1}(B_y(1))) \qquad (9)$$
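Equation 9 only involves the masses $Q(\psi^{-1}(B_y(1)))$, so it lends itself to a Monte Carlo estimate, using $Q(\psi^{-1}(B_y(1))) = \mathbb{P}_{x \sim Q}[\|\psi(x) - y\| \le 1]$. The sketch below (ours; the prior, the two maps, and the truncated $y$-grid are illustrative, so boundary effects make the estimates only approximate) compares a Euclidean move with a projective one.

```python
import numpy as np

# Rough Monte Carlo sketch of Equation (9):
#   C(Q, psi) = - \int dy Q(psi^{-1}(B_y(1))) ln Q(psi^{-1}(B_y(1)))
# with Q(psi^{-1}(B_y(1))) = P_{x ~ Q}[ ||psi(x) - y|| <= 1 ].

rng = np.random.default_rng(0)
xs = rng.normal(loc=[0.0, 0.0, 3.0], scale=1.0, size=(20_000, 3))  # x ~ Q

def curiosity(pushed):
    """Estimate Eq. (9) over a coarse, truncated grid of y values."""
    g = np.linspace(-4.0, 8.0, 15)
    ys = np.stack(np.meshgrid(g, g, g, indexing="ij"), -1).reshape(-1, 3)
    cell = (g[1] - g[0]) ** 3
    total = 0.0
    for y in ys:
        q = np.mean(np.linalg.norm(pushed - y, axis=1) <= 1.0)
        if q > 0.0:
            total -= q * np.log(q) * cell
    return total

euclid = xs + np.array([1.0, 0.0, 0.0])     # Euclidean move: a translation
proj = xs / (0.3 * xs[:, 2:3] + 1.0)        # simple projective rescaling
print("C(Q, translation):", curiosity(euclid))
print("C(Q, projective): ", curiosity(proj))
```

On an unbounded $y$-domain the translation would leave the estimate exactly unchanged, which is the Euclidean case treated next.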
When $\psi$ is a Euclidean transformation, $\psi^{-1}(B_y(1)) = B_{\psi^{-1}(y)}(1)$; therefore, in the Euclidean case,
$$C(Q,\psi) = -\int dy\, Q(B_{\psi^{-1}(y)}(1)) \ln Q(B_{\psi^{-1}(y)}(1)) = -\int dy\, Q(B_y(1)) \ln Q(B_y(1)),$$
by the change of variables $y \mapsto \psi(y)$, which preserves the Lebesgue measure. In this case, curiosity is independent of the change of Euclidean frame, and not moving is a perfectly valid choice in order to maximize the agent's curiosity. Let us remark that this effect appears because the agent assumes (or believes) that it has access to the whole configuration space of $O$; if it knew it had limited access to it, through for example limited sight, we would expect the agent to look around until the object $O$ is in sight and then stop.
In the non-Euclidean case, thanks to the rewriting of curiosity as Equation 9, we show that curiosity favors transformations that shrink $B_y(1)$ through $\psi^{-1}$; indeed, if two transformations $\psi, \psi_1$ are such that
$$\psi^{-1}(B_y(1)) \subseteq \psi_1^{-1}(B_y(1)) \qquad (10)$$
then
$$-Q(\psi^{-1}(B_y(1))) \ln Q(\psi^{-1}(B_y(1))) \ge -Q(\psi_1^{-1}(B_y(1))) \ln Q(\psi_1^{-1}(B_y(1))) \qquad (11)$$
For any initial prior $Q_0$ at starting time $t = 0$, which we assume to be absolutely continuous with respect to the Lebesgue measure (i.e. $Q_0(dx) = q_0(x)\,dx$), after one step the agent updates its prior as
$$q_1(x) \propto \mathbb{1}[x \in B_{y_0}(1)]\, q_0(x) \qquad (12)$$
where $y_0 \in \mathbb{R}^3$ is the real position of the object $O$: the belief is supported on $B_{y_0}(1)$. By maximizing curiosity, the agent will try to shrink $\psi^{-1}(B_{\psi(y_0)}(1))$, which is always inside the set where the measure is supported (i.e. $B_{y_0}(1)$). Shrinking $\psi^{-1}(B_{\psi(y_0)}(1))$ means magnifying the zone around $y_0$ in the agent's new frame after the action. If the agent has enough actions at its disposal, it will always be able to shrink $\psi^{-1}(B_{\psi(y_0)}(1))$ and therefore move closer to $y_0$.
The case where the initial prior $Q_0$ is uniform on a large bounded box around $y_0$ (large with respect to 1) is a particular case where all the computations are tractable by hand.
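For instance, ignoring boundary effects (a back-of-the-envelope computation under this uniformity assumption, not carried out in the note): if $Q_0$ is uniform on a box $K$ of volume $V$ and $\psi = \mathrm{id}$, then $Q_0(B_y(1)) = v/V$ for $y$ well inside $K$, with $v := \mathrm{vol}(B_y(1)) = \frac{4\pi}{3}$, while $Q_0(B_y(1))$ vanishes for $y$ far outside $K$, so Equation 9 gives
$$C(Q_0, \mathrm{id}) \approx -\int_K dy\, \frac{v}{V} \ln \frac{v}{V} = v \ln \frac{V}{v}.$$
The epistemic value thus grows logarithmically with the volume of the box, i.e. with the agent's initial uncertainty about the position of $O$.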