Curiosity driven exploration through perspective transformation

Grégoire Sergeant-Perthuis, David Rudrauf, Yvain Tisserand

14 December 2022

Keywords. Partially Observable Markov Decision Process (POMDP), Perspective Taking, Epistemic Value.

Abstract

We explain how one can modify the formal expression of epistemic value, which drives curiosity-based exploration by an agent, to take into account that the agent has its own perspective on its environment. This is an example of how perspective-taking and decision-making can be related; more generally, we propose to relate the two by exhibiting a special class of policies that accounts for perspective-taking. We suggest an experiment that can discriminate between agents exploring based on an objective representation versus a subjective perspective on their environment. This is one step toward clarifying how perspective-taking can contribute to decision-making.

1 Curiosity: expected distance between a posteriori and a priori

Active Inference is one implementation of the Bayesian Brain Hypothesis for generating behaviors similar to those expected from adaptive agents [Friston et al., 2006, Timsit and Sergeant-Perthuis, 2021]. It relies on an internal representation of the environment that an agent wants to explore and exploit. Using this world model, the agent continually updates beliefs about plausible competing internal hypotheses on the state of this environment. Under common sensory limitations, Active Inference relates to the Partially Observable Markov Decision Process (POMDP) [Da Costa et al., 2020, Ognibene et al., 2019]. Curiosity, or epistemic value, is one of the quantities that come into play once a principle of how the agent should act is given [Friston et al., 2015]. In this note, we consider a toy model of an agent exploring an environment based on curiosity, but with a different flavor of how actions are implemented, so that they reflect perspective taking on the internal representation space of the agent.

The toy model we consider is that of an agent $A$ looking for an object $O$ whose position is $y \in \mathbb{R}^3$. Assume that the agent's internal representation space is the set of all possible positions of $O$, which is simply $\mathbb{R}^3$. $A$ has internal beliefs on the position of the object, encoded by a probability measure $Q_X(dx) = q_X(x)\,dx$ for $x \in \mathbb{R}^3$. These beliefs are updated according to noisy sensory observations of the real position of the object $y \in \mathbb{R}^3$. This uncertainty is captured by a probability kernel $P_{Y|X}$ from $\mathbb{R}^3$ to $\mathbb{R}^3$, which is a (measurable) function that associates to a position in the internal space $x \in \mathbb{R}^3$ a probability measure $P_{Y|X}(dy|x)$ on the possible observations of the object. Let us denote by $P_{X,Y}(dx, dy) := P_{Y|X}(y|x)\,q(x)\,dx\,dy$ the probability measure on the internal space ($\mathbb{R}^3$) times the configuration space of the environment ($\mathbb{R}^3$), and by $p_{X,Y}(x, y)$ the associated density evaluated at $x, y \in \mathbb{R}^3$.

We focus on exploration based on curiosity, as it is one case where the computational expression of the concept of curiosity is salient. Following [Friston et al., 2015], curiosity can be defined as

$$C(Q_X) = \mathbb{E}_{P_Y}\left[ H(P_{X|Y} \,\|\, Q_X) \right] = \int p_Y(y)\,dy \int p_{X|Y}(x|y) \ln \frac{p_{X|Y}(x|y)}{q_X(x)}\,dx \qquad (1)$$

where $H$ stands for the relative entropy, also called the Kullback-Leibler divergence. The previous expression can be rewritten as

$$C(Q_X) = \int p_{X,Y}(x, y) \ln \frac{p_{X,Y}(x, y)}{p_Y(y)\,q_X(x)}\,dx\,dy \qquad (2)$$

The agent $A$ chooses its actions so as to maximize curiosity.
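As a concrete illustration, curiosity in the sense of Equation 2 can be computed exactly once the state and observation spaces are discretized. The following sketch is our own, not part of the model above; the function name, grid sizes, and example kernels are hypothetical choices.

```python
import numpy as np

def curiosity(q_x, p_y_given_x):
    """Discrete version of Eq. (2): expected KL divergence between
    the posterior p(x|y) and the prior q(x).

    q_x         : prior over internal states, shape (n_x,), sums to 1
    p_y_given_x : likelihood kernel, shape (n_x, n_y), rows sum to 1
    """
    p_xy = q_x[:, None] * p_y_given_x      # joint p(x, y)
    p_y = p_xy.sum(axis=0)                 # marginal p(y)
    ref = q_x[:, None] * p_y[None, :]      # product of marginals
    mask = p_xy > 0                        # convention 0 ln 0 = 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / ref[mask])))

q = np.ones(4) / 4
sharp = np.eye(4)              # each observation identifies the state
flat = np.ones((4, 4)) / 4     # observations carry no information
# curiosity(q, sharp) = ln 4, while curiosity(q, flat) = 0
```

A perfectly informative likelihood yields the full entropy of the prior (here $\ln 4$), while an uninformative one yields zero: there is nothing to learn, so curiosity vanishes.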

Active Inference can be used as a framework to understand and model central aspects of consciousness [Rudrauf et al., 2017, Solms, 2019, Rudrauf et al., 2022]. In particular, an important feature of consciousness is to represent information in a Global Workspace [Dehaene et al., 2017] and to take subjective perspectives on an internal world model accessed within that workspace, in order to appraise possible actions based on their expected utility and epistemic value [Rudrauf et al., 2022]. In previous work [Williford et al., 2018, Rudrauf et al., 2022, Rudrauf et al., 2021], we proposed that the internal representation space is geometrically structured; more precisely, we explored the possibility of it being a 3-dimensional projective space, denoted $P^3(\mathbb{R})$. This has the advantage of making the notion of perspective taking natural: it is the choice of a projective transformation $\psi$, a projective transformation being a linear isomorphism $M_\psi \in GL_4(\mathbb{R})$ up to a multiplicative constant. Among the results that support this model, it provides an explanation of the moon illusion [Rudrauf et al., 2020] with, for the first time, falsifiable predictions on how strong the effect should be depending on context, as well as the generation of adaptive and maladaptive behaviours in a manner consistent with developmental and clinical psychology (see [Rudrauf et al., 2022]).

2 How perspective taking modifies epistemic value

Although essential in psychology, notably for understanding multi-agent social interactions (empathy), perspective taking is most often absent from existing models of consciousness [Koch et al., 2016, Kleiner and Tull, 2021, Mashour et al., 2020]. Likewise, the advantages of perspective taking for cybernetics have not yet been clearly formulated (but see [Rudrauf et al., 2022]). Here, we take a step toward clarifying how perspective taking can modify behaviors. We propose to compare exploration driven by curiosity with and without perspective taking on the internal representation space.

In order to do so, we adopt the dual point of view on movement, namely that it can also be seen as a change of frame (point of view): in physics, this is commonly known as the duality between active and passive transformations. In the optimal control literature, in which the Markov Decision Process is a central concept, and in particular in the multi-agent setting [Wiering and van Otterlo, 2012], actions are thought of as transformations that change the state space. More precisely, in the standard formulation of a POMDP, policies $\pi$ are composed of actions and relate a state to a stochastic choice of actions. To simplify the presentation, we identify policies $\pi$, taken from a space of policies $\Pi$, with probability kernels $p_{S_{t+1}|S_t, \Pi}(s_{t+1}\,|\,s_t, \pi)$, where $s_{t+1}, s_t$ belong to the state space $S$; they capture how the state space $S$ can evolve.

2.1 Epistemic value driven exploration

For example, in the toy model we consider and in the standard formulation, this translates into considering that there are several possible dynamics $p_{X_1|X_0, \Pi}(x_1|x_0, \pi)$ on the internal representation space, induced by actions (moves) of the agent (here $\pi \in \Pi$), which allow the agent to go from one configuration $x_0$ in the internal space to the next configuration $x_1$. The a priori $Q_{X_0}$ induces an a priori at the next step, $Q_{X_1}$, whose density is defined as follows:

$$q_{X_1|\Pi}(x_1|\pi) = \int p_{X_1|X_0, \Pi}(x_1|x_0, \pi)\, q_{X_0}(x_0)\, dx_0 \qquad (3)$$

Therefore each policy $\pi$ defines an a priori measure $Q_{X_1}[\pi] := q_{X_1|\Pi}(x_1|\pi)\,dx_1$. Curiosity is computed from $p_{X_1, Y|\Pi}(x_1, y|\pi) = p_{Y|X}(y|x_1)\, q_{X_1|\Pi}(x_1|\pi)$, following the same expression as Equation 2 but this time depending on the policy $\pi \in \Pi$:

$$C(Q_{X_1}[\pi]) = \int p_{X_1, Y|\Pi}(x_1, y|\pi) \ln \frac{p_{X_1, Y|\Pi}(x_1, y|\pi)}{p_{Y|\Pi}(y|\pi)\, q_{X_1|\Pi}(x_1|\pi)}\,dx_1\,dy \qquad (4)$$

When curiosity $C$ is computed with respect to policies, we will call it epistemic value. Indeed, the epistemic value is the quantity that weights the different policies when exploration is driven by curiosity. In this case curiosity depends on the policy $\pi$ and the a priori $Q_{X_0}$; we will indicate this dependence by denoting $C(Q_{X_1}[\pi])$ as $C(Q, \pi)$.
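On a finite discretization, the epistemic value of a policy can be sketched by first propagating the prior through the policy kernel (Equation 3) and then evaluating the curiosity of Equation 4. The kernels and the two example policies below are hypothetical, chosen only to show that a policy that collapses the prior onto a single state destroys epistemic value:

```python
import numpy as np

def epistemic_value(q_x0, p_x1_given_x0, p_y_given_x):
    """C(Q, pi) on a grid: propagate the prior (Eq. 3), then compute
    the expected KL between posterior and propagated prior (Eq. 4).

    q_x0          : prior at time 0, shape (n_x,)
    p_x1_given_x0 : policy kernel, shape (n_x, n_x), rows sum to 1
    p_y_given_x   : likelihood kernel, shape (n_x, n_y), rows sum to 1
    """
    q_x1 = q_x0 @ p_x1_given_x0            # propagated prior, Eq. (3)
    p_xy = q_x1[:, None] * p_y_given_x     # joint at the next step
    p_y = p_xy.sum(axis=0)
    ref = q_x1[:, None] * p_y[None, :]
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / ref[mask])))

q0 = np.array([0.5, 0.5])
lik = np.array([[1.0, 0.0],       # state 0 is observed unambiguously
                [0.5, 0.5]])      # state 1 yields an uninformative cue
stay = np.eye(2)                  # policy: do nothing
collapse = np.array([[0.0, 1.0],
                     [0.0, 1.0]]) # policy: jump to state 1
# 'stay' keeps uncertainty that observations can resolve, so it has
# strictly positive epistemic value; 'collapse' has none.
```

Exploration driven by curiosity would weight `stay` above `collapse` here, since a prior concentrated on one state leaves nothing for observations to resolve.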

2.2 Perspective taking as a policy

When action is seen as a change of coordinates (with possible loss of information) on the internal representation space, one must replace policies by (measurable) maps $\psi: S \to S$ (setting $\pi := \psi$). For example, $\psi$ can be a projective transformation from $\mathbb{R}^3$ to $\mathbb{R}^3$. A measurable map naturally induces a probability kernel, whose expression in the toy model we consider is, for any measurable subset $A \subseteq \mathbb{R}^3$ and $x_0 \in \mathbb{R}^3$,

$$p_{X_1|X_0, \psi}(A|x_0, \psi) = 1[\psi(x_0) \in A] \qquad (5)$$

The a priori $Q_{X_1|\psi}$ then becomes the pushforward measure $\psi_* Q_{X_0}$, and the epistemic value is still computed by applying Equation 2. Taking a passive point of view on action makes it possible for the agent to always be centered on itself in its internal space, that is, to leverage an egocentric perspective. Doing so makes it natural to define sensory uncertainty and any relative properties that structure the integration of information from the environment into the internal space.
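Equation 5 says that a deterministic map $\psi$ is a special case of a policy: its kernel puts all its mass on $\psi(x_0)$, so pushing the prior through the kernel coincides with the pushforward measure $\psi_* Q$. A minimal finite-state check of this identity (the cyclic map below is a stand-in for a change of frame, not one of the model's projective transformations):

```python
import numpy as np

def kernel_from_map(psi, n):
    """Deterministic kernel of Eq. (5): row x0 puts all mass on psi(x0)."""
    k = np.zeros((n, n))
    for x0 in range(n):
        k[x0, psi(x0)] = 1.0
    return k

def pushforward(q, psi, n):
    """Pushforward measure psi_* Q on a finite state space."""
    out = np.zeros(n)
    for x, w in enumerate(q):
        out[psi(x)] += w
    return out

n = 3
psi = lambda x: (x + 1) % n        # a bijective 'change of frame'
q = np.array([0.5, 0.3, 0.2])
# Applying the kernel to the prior coincides with the pushforward:
# q @ kernel_from_map(psi, n) == pushforward(q, psi, n)
```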

The experimental design we propose is to compare exploration driven by curiosity for policies seen as (stochastic) actions on the state space of the agent, when policies are seen as a Euclidean change of frame versus when they are seen as a projective transformation. The following theorem states that this experiment can discriminate whether the behavior of the agent is dictated by an objective perspective on its environment (Euclidean change of frame) or by a subjective one (projective change of frame).

Theorem (Discrimination of behaviour with respect to internal representation). If the agent explores its environment in order to search for $O$ while having an objective representation of its environment, it stops moving as soon as it sees $O$. But if its representation is given through projective transformations, it will seek to get closer to $O$, even after seeing $O$.

Proof. See Appendix A.

References

[Da Costa et al., 2020] Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., and Friston, K. (2020).

Active inference on discrete state-spaces: A synthesis. Journal of Mathematical Psychology, 99:102447.

[Dehaene et al., 2017] Dehaene, S., Lau, H., and Kouider, S. (2017). What is consciousness, and could

machines have it? Science, 358(6362):486–492.

[Friston et al., 2006] Friston, K., Kilner, J., and Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100(1):70–87. Theoretical and Computational Neuroscience: Understanding Brain Functions.

[Friston et al., 2015] Friston, K., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T., and Pezzulo, G.

(2015). Active inference and epistemic value. Cognitive Neuroscience, 6(4):187–214.


[Kleiner and Tull, 2021] Kleiner, J. and Tull, S. (2021). The mathematical structure of integrated information theory. Frontiers in Applied Mathematics and Statistics, 6.

[Koch et al., 2016] Koch, C., Massimini, M., Boly, M., and Tononi, G. (2016). Neural correlates of

consciousness: progress and problems. Nature Reviews Neuroscience.

[Mashour et al., 2020] Mashour, G. A., Roelfsema, P., Changeux, J.-P., and Dehaene, S. (2020). Conscious processing and the global neuronal workspace hypothesis. Neuron, 105(5):776–798.

[Ognibene et al., 2019] Ognibene, D., Mirante, L., and Marchegiani, L. (2019). Proactive intention recognition for joint human-robot search and rescue missions through Monte-Carlo planning in POMDP environments. In Salichs, M. A., Ge, S. S., Barakova, E. I., Cabibihan, J.-J., Wagner, A. R., Castro-González, Á., and He, H., editors, Social Robotics, pages 332–343, Cham. Springer International Publishing.

[Rudrauf et al., 2017] Rudrauf, D., Bennequin, D., Granic, I., Landini, G., Friston, K., and Williford, K.

(2017). A mathematical model of embodied consciousness. Journal of theoretical biology, 428:106–131.

[Rudrauf et al., 2020] Rudrauf, D., Bennequin, D., and Williford, K. (2020). The moon illusion explained

by the projective consciousness model. Journal of Theoretical Biology, 507:110455.

[Rudrauf et al., 2022] Rudrauf, D., Sergeant-Perthuis, G., Belli, O., Tisserand, Y., and Serugendo, G.

D. M. (2022). Modeling the subjective perspective of consciousness and its role in the control of

behaviours. Journal of Theoretical Biology, 534:110957.

[Rudrauf et al., 2021] Rudrauf, D., Sergeant-Perthuis, G., Tisserand, Y., Monnor, T., and Belli, O. (2021). Combining the Projective Consciousness Model and Virtual Humans to assess ToM capacity in Virtual Reality: a proof-of-concept.

[Solms, 2019] Solms, M. (2019). The hard problem of consciousness and the free energy principle. Frontiers in Psychology, 9.

[Timsit and Sergeant-Perthuis, 2021] Timsit, Y. and Sergeant-Perthuis, G. (2021). Towards the idea of

molecular brains. International Journal of Molecular Sciences, 22(21).

[Wiering and van Otterlo, 2012] Wiering, M. and van Otterlo, M. (2012). Reinforcement Learning: State-of-the-Art. Springer Berlin Heidelberg.

[Williford et al., 2018] Williford, K., Bennequin, D., Friston, K., and Rudrauf, D. (2018). The projective

consciousness model and phenomenal selfhood. Frontiers in Psychology, 9:2571.

A Proof of Theorem

Assume that, for any $x, y \in \mathbb{R}^3$,

$$P_{Y|X}(y|x) = 1[\|x - y\| \leq 1] \qquad (6)$$

where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^3$, i.e. $\|x\|^2 = x_0^2 + x_1^2 + x_2^2$. We will also denote by $B_y(1)$ the Euclidean ball of radius 1 around $y \in \mathbb{R}^3$, i.e. $B_y(1) = \{x \in \mathbb{R}^3 \mid \|x - y\| \leq 1\}$.

Then, for any probability distribution $Q$ over $\mathbb{R}^3$, let us denote $C(Q[\psi])$ as $C(Q, \psi)$:

$$C(Q, \psi) = \int \psi_* Q(dx_1) \int dy\, 1[x_1 \in B_y(1)] \ln \frac{1[x_1 \in B_y(1)]}{\int \psi_* Q(dx_1)\, 1[x_1 \in B_y(1)]} \qquad (7)$$
$$= -\int dy \ln Q(\psi^{-1}(B_y(1))) \int \psi_* Q(dx_1)\, 1[x_1 \in B_y(1)] \qquad (8)$$
$$= -\int dy\, Q(\psi^{-1}(B_y(1))) \ln Q(\psi^{-1}(B_y(1))) \qquad (9)$$


When $\psi$ is a Euclidean transformation, $\psi^{-1}(B_y(1)) = B_{\psi^{-1}(y)}(1)$; therefore, in the Euclidean case,

$$C(Q, \psi) = -\int dy\, Q(B_{\psi^{-1}(y)}(1)) \ln Q(B_{\psi^{-1}(y)}(1)) = -\int dy\, Q(B_y(1)) \ln Q(B_y(1))$$

by the change of variables $y \mapsto \psi(y)$, which preserves the Lebesgue measure. In this case, curiosity is independent of the change of Euclidean frame, and not moving is a perfectly valid choice for maximizing the agent's curiosity. Let us remark that this effect appears because the agent assumes (or believes) that it has access to the whole configuration space of $O$; if it knew it had limited access to it, for example through limited sight, we would expect the agent to look around until the object $O$ is in sight and then stop.
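This Euclidean invariance can be checked numerically in a 1-D analog of the model: intervals $B_y(1)$ in place of balls, a standard normal prior in place of $Q$, and a translation in place of $\psi$. The function names and discretization below are our own:

```python
import math
import numpy as np

def ball_mass(y, shift=0.0):
    """Mass Q(psi^{-1}(B_y(1))) for Q = standard normal on R and the
    translation psi(x) = x + shift, so psi^{-1}(B_y(1)) = B_{y-shift}(1)."""
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return Phi(y - shift + 1.0) - Phi(y - shift - 1.0)

def curiosity_passive(shift, ys, dy):
    """Riemann sum for Eq. (9) with the translation psi(x) = x + shift."""
    total = 0.0
    for y in ys:
        m = ball_mass(y, shift)
        if m > 0.0:                 # convention 0 ln 0 = 0
            total -= m * math.log(m) * dy
    return total

dy = 0.01
ys = np.arange(-30.0, 30.0, dy)     # wide enough to cover both frames
# A translation of the frame leaves curiosity unchanged:
# curiosity_passive(0.0, ys, dy) and curiosity_passive(3.0, ys, dy) coincide.
```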

In the non-Euclidean case, thanks to the rewriting of curiosity as Equation 9, we show that curiosity favors transformations that shrink $B_y(1)$ through $\psi^{-1}$; indeed, if two transformations $\psi, \psi_1$ are such that

$$\psi^{-1}(B_y(1)) \subseteq \psi_1^{-1}(B_y(1)) \qquad (10)$$

then

$$-Q(\psi^{-1}(B_y(1))) \ln Q(\psi^{-1}(B_y(1))) \geq -Q(\psi_1^{-1}(B_y(1))) \ln Q(\psi_1^{-1}(B_y(1))) \qquad (11)$$

For any initial prior $Q_0$ at starting time $t = 0$, which we assume to be absolutely continuous with respect to the Lebesgue measure (i.e. $Q_0(dx) = q_0(x)\,dx$), after one step the agent updates its prior as

$$q_1(x) \propto 1[x \in B_{y_0}(1)]\, q_0(x) \qquad (12)$$

where $y_0 \in \mathbb{R}^3$ is the real position of the object $O$: the belief is supported on $B_{y_0}(1)$. By maximizing curiosity, the agent will try to shrink $\psi^{-1}(B_{\psi(y_0)}(1))$, which is always inside the set where the measure is supported (i.e. $B_{y_0}(1)$). Shrinking $\psi^{-1}(B_{\psi(y_0)}(1))$ means magnifying the zone around $y_0$ in the agent's new frame after the action. If the agent has enough actions, it will always be able to shrink $\psi^{-1}(B_{\psi(y_0)}(1))$ and therefore move closer to $y_0$.

The case where the initial prior $Q_0$ is uniform in a delimited large box around $y_0$ (large with respect to 1) is a particular case where all the computations are tractable by hand.
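The uniform-box case can also be evaluated numerically in a 1-D analog: take $Q_0$ uniform on a large interval and compare, via Equation 9, the identity frame with a magnifying scaling $\psi(x) = 2x$, whose inverse image of each ball $B_y(1)$ is half as long. The magnifying frame scores higher, in line with the shrinking argument above (scaling maps and all names here are our own illustration, not the projective transformations of the model):

```python
import math
import numpy as np

L = 20.0   # side of the 'large box'; the prior Q0 is uniform on [-10, 10]

def mass(lo, hi):
    """Q0-mass of the interval [lo, hi] under the uniform prior."""
    a, b = max(lo, -L / 2.0), min(hi, L / 2.0)
    return max(b - a, 0.0) / L

def curiosity_scaling(s, ys, dy):
    """Riemann sum for Eq. (9) with psi(x) = s * x: the inverse image
    of B_y(1) is the interval [(y - 1)/s, (y + 1)/s]."""
    total = 0.0
    for y in ys:
        m = mass((y - 1.0) / s, (y + 1.0) / s)
        if m > 0.0:                 # convention 0 ln 0 = 0
            total -= m * math.log(m) * dy
    return total

dy = 0.01
ys = np.arange(-50.0, 50.0, dy)
# The magnifying frame s = 2 halves each preimage and scores a higher
# curiosity than the identity frame s = 1.
```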
