Active inference body perception and action for humanoid robots
Guillermo Oliver‡, Pablo Lanillos‡, Gordon Cheng‡
‡Institute for Cognitive Systems, Technical University of Munich, Arcisstrasse 21, 80333 Munich, Germany
Abstract—One of the biggest challenges in robotics is interacting under uncertainty. Unlike robots, humans learn, adapt and perceive their body as a unity when interacting with the world. We hypothesize that the nervous system counteracts sensor and motor uncertainties through unconscious processes that robustly fuse the available information to approximate the state of the body and the world. Uniting perception and action under a common principle has been sought for decades, and active inference is one of the candidate unification theories. In this work, we present a humanoid robot interacting with the world by means of a brain-inspired perception and control algorithm based on the free-energy principle. Until now, active inference had only been tested in simulated examples; its application on a real robot shows the advantages of such an algorithm for real-world applications. The humanoid robot iCub was capable of performing robust reaching behaviors with both arms and active head object tracking in the visual field, despite visual noise, artificially introduced noise in the joint encoders (up to 40 degrees of deviation), differences between the model and the real robot, and misdetections of the hand.
Index Terms—Active inference, Free-energy optimization, Bio-inspired perception, Predictive coding, Humanoid robots, iCub.
I. INTRODUCTION
The medical doctor and physicist Hermann von Helmholtz
described visual perception as an unconscious mechanism that
infers the world [1]. In other words, the brain has generative
models that complete or reconstruct the world from partial
information. Nowadays, there is a scientific mainstream that
describes the inner workings of the brain as those of a
Bayesian inference machine [2], [3]. This approach holds that we weight the contribution of each cue (visual, proprioceptive, tactile, etc.) to our interpretation in a Bayesian-optimal way, taking sensor and motor uncertainties into account. This implies that the brain is able to encode uncertainty not only for perception but also for acting in the world. Optimal feedback control was proposed as a model of motor coordination [4]. Alternatively, active inference [5] posits that perception and action are two sides of the same process: an unconscious mechanism that infers and adapts to the environment. Either way, perception is inevitably connected to the body's sensors and actuators, the body being the entity of interaction [6], possibly learned through development [7].
Among the several brain theories that have arisen in the last decades, some can be unified under the free-energy principle [5].
This work has been supported by the SELFCEPTION project (www.selfception.eu), European Union Horizon 2020 Programme, under grant agreement no. 741941, and by the European Union's Erasmus+ Programme.
Fig. 1: Dual arm and head active inference. The robot dynamically infers its body configuration (transparent arm) using the prediction errors. The visual prediction error is the difference between the real visual location of the hand (red) and the predicted one (green), and it generates an action that reduces this discrepancy. In the presence of a perceptual attractor (blue), an error in the desired sensory state is produced, promoting an action towards the goal: the equilibrium point appears when the hand reaches the object. The head moves to keep the object in the visual field, improving the reaching performance.
This principle accounts for perception, action and learning through the minimization of surprise, i.e., the discrepancy between the current state and the predicted or desired one, also known as the prediction error. According to
this approach, free-energy is a way of quantifying surprise
and it can be optimized by changing the current beliefs
(perception) or by acting on the environment (action) to adjust
the difference between reality and prediction [8].
We present robotic body perception as a flexible and dy-
namic process that approximates the body latent configuration
using the error between the expected and the observed sensory
information. In this work we provide an active inference
mathematical model for a humanoid robot combining per-
ception and action, extending [9]. This model enabled the
robot to have adaptive body perception and to perform robust
reaching behaviors even under high levels of sensor noise and
discrepancies between the model and the real robot (Fig. 1).
This is due to the way the optimization framework fuses the
available information from different sources and the coupling
between action and perception.
A. Related work
Multisensory perception has been widely studied in the literature and enables a robot to combine joint information with other sensors such as images and tactile cues. Bayesian estimation has been shown to achieve robust and accurate model-based robot arm tracking [10], even under occlusion [11]. Furthermore, integrated visuomotor processes have enabled humanoid robots to learn object representations through manipulation without any prior knowledge about them [12], to learn motor representations for robust reaching [13], [14], and even visuotactile motor representations for reaching and avoidance behaviors [15].
Active inference (under the free-energy principle) includes action as a classical spinal reflex arc pathway triggered by perceptual prediction errors, and it has mainly been studied in theoretical or simulated conditions. Friston presented in [8] a theoretical motor model with two degrees of freedom as an extension of the dynamic expectation maximization algorithm. Active inference was recently studied for the control of a simulated PR2 robot [16], a one-degree-of-freedom simulated vehicle [17] and a two-degree-of-freedom simulated robot arm [18].
A first model of free-energy optimization on a real robot was presented in [9], working as an approximate Bayesian filter estimator, where the robot was able to perceive its arm location by fusing visual, proprioceptive and tactile information. However, the authors left out the action. In this work, we take one step further and model and apply active inference on the iCub robot for dual arm reaching with active head tracking. For reproducibility, the code is publicly available¹. While the arms' goal is to minimize the prediction error between the goal (object) and the end-effector visual location, the head's goal is to keep the object centered in the field of view, providing a wider and more accurate reaching range.
B. Paper organization
First, in Sec. II we explain the general mathematical free-energy optimization for perception and action. Afterwards, Sec. III describes the iCub physical model, and in Secs. IV and V we detail the active inference computational model that allows the robot to perform robust reaching and tracking tasks. Finally, Sec. VI presents the obtained results, analyzing the advantages and limitations of the proposed algorithm.
II. FREE-ENERGY OPTIMIZATION MODEL
A. Bayesian inference
According to the Bayesian inference model for the brain, the body configuration² $x$ is inferred using the available sensory data $s$ by applying Bayes' theorem:

$$p(x|s) = \frac{p(s|x)p(x)}{p(s)} \qquad (1)$$
where the posterior probability, $p(x|s)$, corresponding to the probability of body configuration $x$ given the observed data $s$, is obtained as a consequence of three antecedents: (1) the likelihood, $p(s|x)$, or compatibility of the observed data $s$ with the body configuration $x$; (2) the prior probability, $p(x)$, or current belief about the configuration before receiving the sensory data $s$, also known as the previous state belief; and (3) the marginal likelihood, $p(s)$, which corresponds to the marginalization of the likelihood of receiving the sensory data $s$ regardless of the configuration. The latter is a normalization term, $p(s) = \int_x p(s|x)p(x)\,dx$, which ensures that the posterior probability $p(x|s)$ integrates to 1 over the whole range of $x$.

The goal is to find the value of $x$ that maximizes $p(x|s)$, because it is the most likely value for the real-world body configuration given the obtained sensory data $s$. This direct method presents a great difficulty: the marginalization over all possible body states becomes intractable.

¹ To be released.
² We define body configuration or body schema as a generic way to refer to the body position in space, for instance, the joint angles of the robot.
B. Free-energy principle
The free-energy principle [5] provides a tractable solution to this obstacle: instead of calculating the marginal likelihood, the idea is to minimize the Kullback-Leibler divergence [19] between a reference distribution $q(x)$ and the real posterior $p(x|s)$, so that the former becomes a good approximation of the latter.

$$D_{KL}(q(x)\,\|\,p(x|s)) = \int q(x)\ln\frac{q(x)}{p(x|s)}dx = \int q(x)\ln\frac{q(x)}{p(s,x)}dx + \ln p(s) = -F + \ln p(s) \geq 0 \qquad (2)$$
It is important to note that the marginal likelihood, $p(s)$, is independent of the configuration variable $x$, and that $q(x)$ is a probability distribution whose integral over its entire range is 1; thus $p(s)$ falls out of the integration. Maximizing $F$, the negative of the first term, effectively minimizes the difference between the two densities, with only the marginal likelihood remaining. Unlike the whole expression of the K-L divergence, this first term can be evaluated because it depends only on the reference distribution and the knowledge about the environment we can assume the agent has, $p(s,x) = p(s|x)p(x)$. This term is defined as the variational negative free-energy [20], [21].
$$F \equiv -\int q(x)\ln\frac{q(x)}{p(s,x)}dx = \int q(x)\ln\frac{p(s,x)}{q(x)}dx \qquad (3)$$
Maximizing this expression with respect to $q(x)$ is known as free-energy optimization and results in minimizing the K-L divergence between the two distributions³.

Maximizing $F$ is equivalent to the previous goal of maximizing the posterior probability $p(x|s)$, given that all probability distributions are strictly non-negative. Considering that the second term of the K-L divergence, $\ln p(s)$, depends neither on the reference distribution $q(x)$ nor on the value of the body configuration $x$, the same value of $x$ optimizes all three quantities: $F$, $p(x|s)$ and $D_{KL}(q(x)\,\|\,p(x|s))$.

³ The expression for free-energy also appears in the literature in its positive version, without the preceding negative sign; in that case the objective of the optimization is the minimization of the expression.
According to the free-energy optimization theory, there
are two ways to minimize surprise, which accounts for the
discrepancy between the current state and the predicted or
desired one (prediction error): changing the belief (perceptual
inference) or acting on the world (active inference). Perceptual
inference and active inference optimize the value of the free-
energy expression F, while active inference also optimizes the
value of the marginal likelihood by acting on the environment
and changing the sensory data s.
Under the mean-field and Laplace approximations (the posterior is approximated from a family of tractable distributions) we can use the Maximum-A-Posteriori (MAP) estimate to approximate the mode [21] and simplify the calculation of $F$. The factorized variational density $q_i$ is assumed to have Gaussian form $\mathcal{N}(\mu_i,\Sigma_i)$. Then, defining the Laplace-encoded energy [22] as $L(s,\mu) = -\ln p(s,\mu)$, $F$ can be approximated as:

$$F = \int q(x)L(s,\mu)\,dx + \int q(x)\ln q(x)\,dx \approx L(s,\mu) + \sum_i \frac{1}{2}\left(\ln|\Sigma_i| + n_i\ln 2\pi\right) \qquad (4,\ 5)$$

where $n_i$ is the number of parameters in the $i$-th set. Finally, the modes $\mu_i$ can be computed by maximizing the expected variational energy. We can further approximate $\mu$ using gradient descent on the Laplace-encoded energy $L(s,\mu)$, assuming that the second term is constant.
C. Perceptual inference
Perceptual inference is the process of updating the inner model belief to best account for the sensory data, minimizing the prediction error.
The agent must update the most likely or optimal value of the body configuration $\mu$ at each state. This optimal value is the one that maximizes negative free-energy; therefore, a first-order iterative optimization algorithm of gradient ascent is applied to approach the local maximum of the function. In this case, this means that $\mu$ should be changed proportionally to the gradient of negative free-energy.
For static systems, this update can be done directly with the gradient ascent formulation $\dot{\mu} = \frac{\partial F}{\partial \mu}$ [9]. In dynamic systems, the time derivatives of the body configuration should also be considered. Usually the first- and second-order derivatives, $\mu'$ and $\mu''$, are used, but higher-order derivatives could also be included if their dynamic equations are known. The state variable is now a vector, $\mu = [\mu\ \mu'\ \mu'']^T$, and all values and derivatives must be updated taking into consideration the next higher-order derivative:

$$\dot{\mu} = D\mu + \frac{\partial F}{\partial \mu} \qquad (6)$$

where $D$ is the block-matrix derivative operator with the superdiagonal filled with ones.
When negative free-energy is maximized, its derivative is $\frac{\partial F}{\partial \mu} = 0$ and the system is at equilibrium: $\dot{\mu} = 0$ for static systems and $\dot{\mu} = D\mu$ for dynamic systems.
Expression (6) denotes the change of $\mu$ over time, and it is used to update the value of $\mu$ with any numerical integration method. We use a simple first-order Euler integration, where in each iteration the value is computed with a linear update, $\mu_{i+1} = \mu_i + \dot{\mu}\Delta t$, where $\Delta t = T$ is the period of the update cycle for the internal state.
D. Active inference
Active inference [8] is the extension of perceptual inference to the relationship between sensors and actions, taking into account that actions can change the world to bring the sensory data into agreement with the predictions made by the inner model.

Action plays a core role in the optimization and improves the approximation of the real distribution, therefore reducing the prediction error by minimizing free-energy. It also acts on the marginal likelihood: by changing the real configuration it modifies the sensory data $s$, obtaining new data more in concordance with the agent's belief.

In this case, the optimal value is again the one that maximizes negative free-energy, and a gradient ascent approach is taken to update the value of the action:

$$\dot{a} = \frac{\partial F}{\partial a} \qquad (7)$$

where $a$ is computed using a first-order Euler numerical integration with an explicit gain: $a_{i+1} = a_i + k_a\dot{a}\Delta t$.
The combination of active inference and perception provides
the mathematical framework for free-energy optimization.
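The corresponding action update is equally compact; again a sketch, with dF_da a placeholder for the model-specific value of $\partial F/\partial a$:

    def action_step(a, dF_da, k_a, dt):
        """One Euler step of Eq. (7) with explicit gain: a_{i+1} = a_i + k_a * dF/da * dt."""
        return a + k_a * dt * dF_da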
III. ROBOT PHYSICAL MODEL
Fig. 2: Model description. (Left) Generative model of the robot. (Right) Notation and configuration of the robot. Shown variables: internal variables $\mu$ and $\mu'$, joint angle positions $q$, actions applied to the joints $a$, the visual location of the end-effector $v$ and the causal variables $\rho$.
iCub [23] (v1.4) is a 104 cm tall, 22 kg humanoid robot that resembles a small child, with 53 degrees of freedom powered by electric motors and driven by tendons. The upper body has 38 degrees of freedom, distributed as 7 for each arm, 9 for each hand and 6 for the head (3 in the neck and 3 for the eyes). The lower body has 15 degrees of freedom: 6 for each leg and 3 more in the waist.
for each leg and 3 more in the waist. The software is built on
top of the YARP (Yet Another Robot Platform) framework,
to facilitate communication between different hardware and
software implementations in robotics [24].
The robot is divided into several kinematic chains, distributed according to its extremities. All kinematic chains are defined through homogeneous transformation matrices using the Denavit-Hartenberg convention. We focus on two kinematic chains: the one whose end-effector is the right hand (without considering its fingers) and the one ending at the left eye.
TABLE I: Considered DOF for the iCub robot

Location           Link   θᵢ (deg)             Joint name
Arm (right/left)   4      -90 + [0, 160.8]     (r/l) shoulder roll
Arm (right/left)   5      -105 + [-37, 100]    (r/l) shoulder yaw
Arm (right/left)   6      [5.5, 106]           (r/l) elbow
Head               3      90 + [-40, 30]       neck pitch
Head               5      90 + [-55, 55]       neck yaw
Without loss of generality, the arm model is defined as a three-degree-of-freedom (revolute joints) system: r_shoulder_roll, r_shoulder_yaw and r_elbow. The left eye camera observes the end-effector position and the world around it. The joints considered for the motion of the head are neck_pitch and neck_yaw.
The symbolic matrices for the kinematics of these chains, in terms of the joint variables, were obtained using Mathematica. These are the homogeneous transformation matrices for both complete chains, from the local robot origin to the end-effector reference frame, as well as their partial derivatives with respect to the three degrees of freedom.
We generated a 3-dimensional model in SolidWorks in order
to provide more accurate simulations for multi-body dynamics.
Figures 3c and 3f show the surface that defines the working
range of the robot in terms of its degrees of freedom. We used
this model to design all reaching experiments.
IV. ACTIVE INFERENCE COMPUTATIONAL MODEL FOR ICUB ARM REACHING TASK
A. Problem formulation
The body configuration, or internal variables, is defined as the joint angles. The estimated state $\mu \in \mathbb{R}^3$ is the belief the agent has about the joint angle positions, and the action $a \in \mathbb{R}^3$ is the angular velocity of those same joints. Since we use velocity control for the joints, the first-order dynamics $\mu' \in \mathbb{R}^3$ must also be considered.
$$\mu = \begin{bmatrix}\mu_1\\\mu_2\\\mu_3\end{bmatrix} \quad \mu' = \begin{bmatrix}\mu'_1\\\mu'_2\\\mu'_3\end{bmatrix} \quad a = \begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix} \qquad (8)$$
The sensory data is obtained through several input sensors that provide information about the position of the end-effector in the visual field, $s_v \in \mathbb{R}^2$, and the joint angle positions, $s_p \in \mathbb{R}^3$:

$$s_p = \begin{bmatrix}q_1\\q_2\\q_3\end{bmatrix} \quad s_v = \begin{bmatrix}v_1\\v_2\end{bmatrix} \qquad (9)$$
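In code, this state can be held in plain arrays matching the dimensions above; the following layout is a hypothetical convention used in the sketches below:

    import numpy as np

    mu     = np.zeros(3)  # belief about the joint angles (shoulder roll, shoulder yaw, elbow)
    mu_dot = np.zeros(3)  # belief about the first-order joint dynamics mu'
    a      = np.zeros(3)  # action: angular velocity command for the same joints
    s_p    = np.zeros(3)  # proprioception: joint encoder readings (q1, q2, q3)
    s_v    = np.zeros(2)  # vision: pixel coordinates (v1, v2) of the end-effector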
The likelihood $p(s|\mu)$ is made up of sensory functions of the current body configuration, while the prior $p(\mu)$ takes into account the dynamic model of the agent, which describes how this internal state changes with time. The combination of both probabilities formalizes the negative free-energy. Adapting the Laplace-encoded energy to the model described in Fig. 2:

$$p(s,\mu) = p(s|\mu)p(\mu) = p(s_p|\mu)\,p(s_v|\mu)\,p(\mu'|\mu,\rho) \qquad (10)$$
B. Negative free-energy optimization
In order to define the conditional densities for each of the terms, we must define the expressions for the sensory data. The joint angle position, $s_p$, is obtained directly from the joint angle sensors. We assume that this input is noisy and follows a normal distribution with mean at the internal value $\mu$ and variance $\Sigma_{s_p}$. The end-effector visual position, $s_v$, is defined by a non-linear function of the body configuration, obtained using the forward model of the right arm and the pinhole camera model of the robot's left eye camera. We assume that this input is noisy and follows a normal distribution with mean at the value of this function, $g(\mu) \in \mathbb{R}^2$, and variance $\Sigma_{s_v}$. The dynamic model is determined by a function that depends on both the current state $\mu$ and the causal variables $\rho$ (e.g., the position of the object to be reached in the visual plane). We assume that this input is noisy and follows a normal distribution with mean at the value of this function, $f(\mu,\rho) \in \mathbb{R}^3$, and variance $\Sigma_{s_\mu}$.

$$g(\mu) = \begin{bmatrix}g_1(\mu)\\g_2(\mu)\end{bmatrix} \quad f(\mu,\rho) = \begin{bmatrix}f_1(\mu,\rho)\\f_2(\mu,\rho)\\f_3(\mu,\rho)\end{bmatrix} \qquad (11)$$
Considering the same normal distribution assumptions for the internal state and sensory terms, the probability functions are extended over all the elements of the vectors, where $C_i = \frac{1}{\sqrt{2\pi\Sigma_{s_i}}}$:

$$p(s_p|\mu) = C_p \prod_{i=1}^{3}\exp\left(-\frac{1}{2\Sigma_{s_p}}(q_i-\mu_i)^2\right) \qquad (12)$$

$$p(s_v|\mu) = C_v \prod_{i=1}^{2}\exp\left(-\frac{1}{2\Sigma_{s_v}}(v_i-g_i(\mu))^2\right) \qquad (13)$$

$$p(\mu'|\mu,\rho) = C_\mu \prod_{i=1}^{3}\exp\left(-\frac{1}{2\Sigma_{s_\mu}}(\mu'_i-f_i(\mu,\rho))^2\right) \qquad (14)$$
The variational negative free-energy, considering the previous density functions, is obtained by applying the natural logarithm to (10). The product is transformed into a summation due to the properties of the logarithm:

$$F = \ln p(s_p|\mu) + \ln p(s_v|\mu) + \ln p(\mu'|\mu,\rho) + C = \sum_{i=1}^{3}-\frac{1}{2\Sigma_{s_p}}(q_i-\mu_i)^2 + \sum_{i=1}^{2}-\frac{1}{2\Sigma_{s_v}}(v_i-g_i(\mu))^2 + \sum_{i=1}^{3}-\frac{1}{2\Sigma_{s_\mu}}(\mu'_i-f_i(\mu,\rho))^2 + \ln C_p + \ln C_v + \ln C_\mu + C \qquad (15)$$
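Dropping the additive constants, the quadratic part of (15) can be evaluated directly; a sketch assuming the generative functions g and f and the variances are available:

    import numpy as np

    def neg_free_energy(s_p, s_v, mu, mu_dot, rho, g, f, var_p, var_v, var_mu):
        """Quadratic part of Eq. (15); the constant terms are omitted."""
        e_p = s_p - mu               # proprioceptive prediction error
        e_v = s_v - g(mu)            # visual prediction error
        e_d = mu_dot - f(mu, rho)    # dynamics prediction error
        return (-0.5 * e_p @ e_p / var_p
                - 0.5 * e_v @ e_v / var_v
                - 0.5 * e_d @ e_d / var_mu)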
The vectorial equations used in the gradient ascent formulation are obtained by differentiating the scalar free-energy term with respect to the internal state vector and the action vector (Eqs. (6) and (7)). The dependency of $F$ on the vector of internal variables $\mu$ can be calculated by applying the chain rule to the functions that depend on those internal variables. The dependency of $F$ on the vector of actions $a$ is calculated considering that the only magnitudes directly affected by the action are the values obtained from the sensors.

$$\frac{\partial F}{\partial \mu} = \frac{1}{\Sigma_{s_p}}(s_p-\mu) + \frac{1}{\Sigma_{s_v}}\frac{\partial g(\mu)}{\partial \mu}^T(s_v-g(\mu)) + \frac{1}{\Sigma_{s_\mu}}\frac{\partial f(\mu,\rho)}{\partial \mu}^T(\mu'-f(\mu,\rho)) \qquad (16)$$

$$\frac{\partial F}{\partial a} = -\left[\frac{1}{\Sigma_{s_p}}\frac{\partial s_p}{\partial a}^T(s_p-\mu) + \frac{1}{\Sigma_{s_v}}\frac{\partial s_v}{\partial a}^T(s_v-g(\mu))\right] \qquad (17)$$
Even though an angular velocity control is being carried out, the agent can also be aware of the values of the first-order dynamics, which can be updated with the same gradient ascent formulation. The dependency of $F$ on the first-order dynamics vector $\mu'$ is limited to the influence of the dynamic model:

$$\frac{\partial F}{\partial \mu'} = -\frac{1}{\Sigma_{s_\mu}}(\mu'-f(\mu,\rho)) = \frac{1}{\Sigma_{s_\mu}}(f(\mu,\rho)-\mu') \qquad (18)$$
Equation (18) shows that the update of the first-order dynamics has negative feedback, which benefits the stability of the process. It also means that at the equilibrium point the value of the derivative should be zero. Considering the equations above, the complete update equations are:

$$\dot{\mu} = \mu' + \frac{\partial F}{\partial \mu} \qquad \dot{\mu}' = \frac{\partial F}{\partial \mu'} \qquad \dot{a} = \frac{\partial F}{\partial a} \qquad (19)$$

A first-order Euler integration method is applied to update the values of $\mu$, $\mu'$ and $a$ in each iteration.
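Putting (16)-(19) together, one iteration of the optimization can be sketched as below. The object m is assumed to bundle the generative model: the functions g and f, their Jacobians dg_dmu and df_dmu, the sensor-action derivatives dsp_da and dsv_da of Sec. IV-D, and the variances. This is an illustrative sketch, not the released implementation.

    def active_inference_step(mu, mu_dot, a, s_p, s_v, rho, m, dt, k_a):
        """One iteration of Eqs. (16)-(19) with first-order Euler integration."""
        e_p = s_p - mu                # proprioceptive error
        e_v = s_v - m.g(mu)           # visual error
        e_d = mu_dot - m.f(mu, rho)   # dynamics error

        dF_dmu = (e_p / m.var_p
                  + m.dg_dmu(mu).T @ e_v / m.var_v
                  + m.df_dmu(mu, rho).T @ e_d / m.var_mu)    # Eq. (16)
        dF_da = -(m.dsp_da().T @ e_p / m.var_p
                  + m.dsv_da(mu).T @ e_v / m.var_v)          # Eq. (17)
        dF_dmu_dot = -e_d / m.var_mu                         # Eq. (18)

        mu = mu + dt * (mu_dot + dF_dmu)                     # Eq. (19)
        mu_dot = mu_dot + dt * dF_dmu_dot
        a = a + k_a * dt * dF_da
        return mu, mu_dot, a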
C. Perceptual attractor dynamics
We introduce the reaching goal as a perceptual attractor in the visual field as follows:

$$A(\mu,\rho) = \rho_3\left(\begin{bmatrix}\rho_1\\\rho_2\end{bmatrix} - \begin{bmatrix}g_1(\mu)\\g_2(\mu)\end{bmatrix}\right) \qquad (20)$$
The internal variable dynamics are defined in terms of the attractor:

$$f(\mu,\rho) = T(\mu)A(\mu,\rho) \qquad (21)$$

where $T(\mu)$ is the function that transforms the attractor vector from the target space (visual) to the joint space. The system is velocity controlled; therefore the target-space quantity is a linear velocity and the joint-space quantity is an angular velocity.
The visual Jacobian matrix that relates the visual space (2 coordinates) to the joint space (3 DOF) is rectangular, so the mapping matrix used is its generalized inverse (Moore-Penrose pseudoinverse): $T(\mu) = J_v^+(\mu)$. This matrix is calculated using the singular-value decomposition (SVD), where $J_v = U\Sigma V^T$ and $J_v^+ = V\Sigma^+ U^T$, with $\Sigma^+$ obtained by inverting the non-zero singular values.
D. Active inference
The action is set to be an angular velocity magnitude, $a$, which corresponds to the angular joint velocities in the latent space. We must calculate the expressions of the partial derivative matrices $\frac{\partial s_p}{\partial a}$ and $\frac{\partial s_v}{\partial a}$ in (17) to quantify the dependency of these sensor values on the joint velocities.

We assume that the control action $a$ is updated every cycle and therefore keeps a constant value during each interval between updates. Within each period (cycle time between updates), the equation of uniform circular motion is satisfied for each joint position. Discretizing this equation at sampling instants $T$ seconds apart, the value is updated as $q_{i+1} = q_i + a_iT$, which defines the dependency of the joint angle position on the control action.
The partial derivatives of the joint position $s_p$ with respect to the action, considering that there is no cross-influence or coupling between joint velocities and that $q_i$ and its expectation $\mu_i$ should converge at equilibrium, are given by the following expression:

$$\frac{\partial q_i}{\partial a_j} = \frac{\partial \mu_i}{\partial a_j} = \begin{cases} T & i = j,\\ 0 & \text{otherwise.}\end{cases} \qquad (22)$$
Given the dependency of the joint position on the action, we can use the chain rule to calculate the dependency for the visual sensor, $s_v$. Considering that the values of $g_1(\mu)$ and $g_2(\mu)$ should also converge to $v_1$ and $v_2$ at equilibrium, the partial derivatives are given by the following expression:

$$\frac{\partial v_i}{\partial a_j} = \frac{\partial v_i}{\partial q_j}\frac{\partial q_j}{\partial a_j} = \frac{\partial g_i(\mu)}{\partial \mu_j}\frac{\partial \mu_j}{\partial a_j} \qquad (23)$$
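Under the no-coupling assumption of (22), the chain rule in (23) amounts to scaling the visual Jacobian by the cycle period T; a sketch:

    def dsv_da(J_v_mu, T):
        """Eq. (23): dv_i/da_j = (dg_i/dmu_j) * (dq_j/da_j) = J_v[i, j] * T (Eq. 22).
        J_v_mu is the 2x3 visual Jacobian evaluated at the current belief."""
        return J_v_mu * T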
V. ACTIVE INFERENCE COMPUTATIONAL MODEL FOR ICUB HEAD OBJECT TRACKING TASK
We extend the arm reaching model to the head to obtain an object tracking behavior. The goal of this task is to maintain the object at the center of the visual field, thus increasing the effective reaching range. The two available degrees of freedom (neck yaw and pitch) are used for this purpose.
[Fig. 3 image panels; legends: (a),(b) encoders+vision / encoders / vision; (d),(e) no noise / σ=10° / σ=20° / σ=40°. Panels: (a) Sensory fusion error. (b) Sensory fusion path. (c) Working range of right arm end-effector (front). (d) Encoder noise handling error. (e) Encoder noise handling path. (f) Working range of right arm end-effector (back).]
Fig. 3: Results of right arm object reaching using sensory fusion and under noisy sensors. (Left) RMS visual error between the real visual position and the attractor position; for each run, the moment at which each attractor position is reached is marked with a number inside a colored circle. (Center) Path followed by the end-effector in the visual plane. (Right) Working range of the right hand end-effector used to calculate the attractor positions; the dark surface considers only shoulder roll and elbow (2 DOF), while the volume also considers shoulder yaw (3 DOF).
A. Problem formulation
The state for the humanoid robot head is defined by the internal variable beliefs $\mu_e \in \mathbb{R}^2$, the actions $a_e \in \mathbb{R}^2$ and the first-order dynamics $\mu'_e \in \mathbb{R}^2$. Because the end-effector in this case is the camera itself, the only sensory data is the joint angle position, $s_e \in \mathbb{R}^2$.
B. Negative free-energy optimization
The variational negative free-energy for the head motion, $F_e$, is obtained from the conditional densities, and its dependencies on the internal variables $\mu_e$, the actions $a_e$ and the first-order dynamics $\mu'_e$ are calculated:

$$F_e = \sum_{i=1}^{2}-\frac{1}{2\Sigma_{s_e}}(q_{e_i}-\mu_{e_i})^2 + \ln C_e + \sum_{i=1}^{2}-\frac{1}{2\Sigma_{s_{\mu_e}}}(\mu'_{e_i}-f_{e_i}(\mu_e,\rho))^2 + \ln C_{\mu_e} + C \qquad (24)$$

$$\frac{\partial F_e}{\partial \mu_e} = \frac{1}{\Sigma_{s_e}}(s_e-\mu_e) + \frac{1}{\Sigma_{s_{\mu_e}}}\frac{\partial f_e(\mu_e,\rho)}{\partial \mu_e}^T(\mu'_e-f_e(\mu_e,\rho))$$
$$\frac{\partial F_e}{\partial a_e} = -\frac{1}{\Sigma_{s_e}}\frac{\partial s_e}{\partial a_e}^T(s_e-\mu_e)$$
$$\frac{\partial F_e}{\partial \mu'_e} = \frac{1}{\Sigma_{s_{\mu_e}}}(f_e(\mu_e,\rho)-\mu'_e) \qquad (25)$$
C. Perceptual attractor dynamics
In order to obtain the desired motion in the visual field, an attractor is defined towards the center of the image, $(c_x, c_y)$. The attractor position $(\rho_1, \rho_2)$ is read from the visual input and is dynamically updated with the motion of the head, while the center coordinates remain constant.

$$A_e(\mu_e,\rho) = \rho_3\left(\begin{bmatrix}c_x\\c_y\end{bmatrix} - \begin{bmatrix}\rho_1\\\rho_2\end{bmatrix}\right) \qquad (26)$$

The internal variable dynamics are defined in terms of the attractor: $f_e(\mu_e,\rho) = T_e(\mu_e)A_e(\mu_e,\rho)$, with $f_e(\mu_e,\rho) \in \mathbb{R}^2$. With two pixel coordinates and two degrees of freedom, the inverse of the Jacobian matrix can be used directly as the mapping matrix in the visual space: $T_e(\mu_e) = J_{e_v}^{-1}(\mu_e)$.
VI. RESULTS
A. Experimental Setup
The iCub robot is placed in a controlled environment with
an easily recognizable object in front of it, which serves as a
perceptual attractor to produce the movement of the robot. The
values of the causal variables $\rho_1$ and $\rho_2$ are the horizontal and vertical positions of that object in the image plane, obtained from the left eye camera, and the value of $\rho_3$ is a weighting factor that adjusts the strength of the attractor.
[Fig. 4 image panels: (a) Right arm internal state and encoders. (b) Head internal state and encoders. (c) Error $(s_p-\mu)$ and $(s_e-\mu_e)$. (d) Visual coordinate 1. (e) Visual coordinate 2. (f) Error $(s_v-g(\mu))$. (g) Right arm actions. (h) Negative free-energy. (i) Robot vision.]

Fig. 4: Right arm reaching and head object tracking. Internal states are driven towards the perceptual attractor position. The differences between the internal states and the encoder values (Fig. 4c) and between the predicted and real visual locations of the end-effector (Fig. 4f) drive the actions. The algorithm optimizes free-energy and reaches the target attractor position. Fig. 4i shows the visual perception of the robot with the attractor (blue), the prediction (green) and the real position (red).
The right arm end-effector is also recognized by a visual marker placed on the hand of the robot, providing the values of $v_1$ and $v_2$. The goal is to obtain a reaching behavior in the robot. The proposed algorithm generates an action of the right arm towards the object, in order to reduce the discrepancy between the current state and the desired state imposed by the goal.
Three different experiments were performed: (1) robustness: right arm reaching towards a series of locations that the robot must follow in its visual plane; (2) dynamics evaluation: right arm reaching and active head tracking of a moving object; (3) generalization: dual arm reaching with active head tracking.

The relevant parameters of the algorithm are: (1) the variance $\Sigma_{s_p}$ of the encoder sensors, (2) the variance $\Sigma_{s_v}$ of the visual perception, (3) the variance $\Sigma_{s_\mu}$ of the attractor dynamics and (4) the action gains $k_a$. These parameters were tuned empirically with their physical meaning in mind and remained constant during the experiments, except for the encoder variance, which was increased to withstand more deviation in the noise handling condition of the first experiment.
B. Right arm reaching with sensory fusion under noisy sensors
The first experiment is performed to test the robustness of
the algorithm under two different conditions (Figure 3). The
robot has to reach four different static locations in the visual
field with the right arm. Once a location is reached the next
location becomes active. A location is considered reached when the visual position is inside a disk of five-pixel radius centered at the target location. Performance is assessed using the root mean square (RMS) error, in visual plane coordinates, between the real end-effector location (visual marker) and the target location.
Under the first condition we did not add any noise (only intrinsic sensor noise and model errors). We tested the contribution of each source of sensory information to the reaching task: vision, joint angle encoders, and both together. Figures 3a and 3b show the RMS error and the path followed by the arm, respectively. Even though the model had been verified and the camera calibrated, there was a difference between the forward model and the real robot, due to possible deviations in the parameters and motor backlash, which implies that the robot has to use the visual information to correct its real position. Employing joint angle encoders and vision together provides the best behavior when reaching the fixed locations in the visual field, achieving all positions in the shortest time. Vision alone also reaches all the positions but does not follow the optimal path, while using only the encoder values fails to reach all locations.
In the second condition, we injected Gaussian noise into the robot joint encoder readings $s_p$ in order to test the robustness against high noise. Four trials were performed with zero-mean Gaussian noise with standard deviations of 0° (control test), 10°, 20° and 40°. Figures 3d and 3e show the reaching error and the followed path for each trial. The runs with no noise and with σ = 10° achieved very similar results, the first one achieving the objectives slightly faster. With σ = 20°, the motion was affected by deviations in the path followed by the end-effector. The extreme case of σ = 40° caused oscillations and erroneous approach trajectories that produced significant delays in reaching the target locations. These results show the importance of reliable visual perception when discrepancies in the model or unreliable joint angle measurements are present.
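The injected encoder noise of this experiment can be reproduced by perturbing the readings before they enter the optimization; a sketch, assuming encoder values in radians:

    import numpy as np

    def noisy_encoders(q, sigma_deg, rng=np.random.default_rng()):
        """Add zero-mean Gaussian noise with standard deviation sigma_deg (degrees)."""
        return q + rng.normal(0.0, np.deg2rad(sigma_deg), size=q.shape)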
C. Right arm reaching of a moving object with active head

We evaluated the algorithm with the right arm model and active head for a moving object (manually operated by a human, Fig. 4i). The variable dynamics are shown in Fig. 4. The initial position of the right arm lies outside the visual plane (note the missing $v_2$ in Fig. 4e). Hence, the free-energy optimization algorithm relies only on joint measurements to produce the reaching motion until the right hand appears in the visual plane ($v_1$ enters Fig. 4d from the top). Figures 4a and 4b show both the encoder measurements $q$ and the estimated joint angles $\mu$ of the arm and head. Figures 4d and 4e show how the predicted $g$ and real $v$ visual positions of the right arm end-effector follow the perceptual attractor $\rho$, while the head tries to maintain the object at the center of the visual field $c$. Right arm actions are depicted in Fig. 4g; the stop action is produced by the sense of touch: contact on any of the pressure sensors triggers the grasping motion of the hand. Finally, Fig. 4h shows that the algorithm optimizes (maximizes) the value of the negative free-energy for both the arm and the head.
D. Generalization: Dual arm reaching and active head
We generalized the algorithm to dual arm reaching. The free-energy optimization reaching task was replicated for the left arm, obtaining a reaching motion for both arms with a tracking motion performed by the head. The result of this experiment, along with other runs of the previous experiments, can be found in the supplementary video (to be released).
VII. CONCLUSIONS
This work presents the first active inference model working on a real humanoid robot for dual arm reaching and active head object tracking. The robot, evaluated under different levels of sensor noise (up to 40 degrees of joint angle deviation), was able to reach the visual goal, compensating for the errors through free-energy optimization. The body configuration was treated as an unobserved variable, and the forward model served as an approximation of the real end-effector location that was corrected online with visual input, thus tackling model errors. The proposed approach can be generalized to whole-body reaching and can incorporate forward model learning, as shown in [9].
REFERENCES
[1] H. v. Helmholtz, Handbuch der physiologischen Optik. L. Voss, 1867.
[2] D. C. Knill and A. Pouget, “The Bayesian brain: the role of uncertainty in neural coding and computation,” Trends in Neurosciences, vol. 27, no. 12, pp. 712–719, 2004.
[3] K. Friston, “A theory of cortical responses,” Philos Trans R Soc Lond
B: Biological Sciences, vol. 360, no. 1456, pp. 815–836, 2005.
[4] E. Todorov and M. I. Jordan, “Optimal feedback control as a theory of
motor coordination,” Nature neuroscience, vol. 5, no. 11, p. 1226, 2002.
[5] K. J. Friston, “The free-energy principle: a unified brain theory?” Nature Reviews Neuroscience, vol. 11, pp. 127–138, 2010.
[6] P. Lanillos, E. Dean-Leon, and G. Cheng, “Yielding self-perception in
robots through sensorimotor contingencies,” IEEE Trans. on Cognitive
and Developmental Systems, no. 99, pp. 1–1, 2016.
[7] Y. Kuniyoshi and S. Sangawa, “Early motor development from par-
tially ordered neural-body dynamics: experiments with a cortico-spinal-
musculo-skeletal model,” Biological cybernetics, vol. 95, p. 589, 2006.
[8] K. J. Friston, J. Daunizeau, J. Kilner, and S. J. Kiebel, “Action and
behavior: a free-energy formulation,” Biological cybernetics, vol. 102,
no. 3, pp. 227–260, 2010.
[9] P. Lanillos and G. Cheng, “Adaptive robot body learning and estimation
through predictive coding,” Intelligent Robots and Systems (IROS), 2018
IEEE/RSJ Int. Conf. on, 2018.
[10] C. Fantacci, U. Pattacini, V. Tikhanoff, and L. Natale, “Visual end-
effector tracking using a 3d model-aided particle filter for humanoid
robot platforms,” in 2017 IEEE/RSJ International Conference on Intel-
ligent Robots and Systems (IROS). IEEE, 2017, pp. 1411–1418.
[11] C. Garcia Cifuentes, J. Issac, M. Wüthrich, S. Schaal, and J. Bohg, “Probabilistic articulated real-time tracking for robot manipulation,” IEEE Robotics and Automation Letters, vol. PP, 2016.
[12] A. Ude, D. Omrčen, and G. Cheng, “Making object learning and recognition an active process,” International Journal of Humanoid Robotics, vol. 5, no. 02, pp. 267–286, 2008.
[13] C. Gaskett and G. Cheng, “Online learning of a motor map for humanoid robot reaching,” in 2nd Int. Conf. on Computational Intelligence, Robotics and Autonomous Systems, Singapore, 2003.
[14] L. Jamone, M. Brandao, L. Natale, K. Hashimoto, G. Sandini, and
A. Takanishi, “Autonomous online generation of a motor representation
of the workspace for intelligent whole-body reaching,” Robotics and
Autonomous Systems, vol. 62, no. 4, pp. 556–567, 2014.
[15] A. Roncone, M. Hoffmann, U. Pattacini, L. Fadiga, and G. Metta,
“Peripersonal space and margin of safety around the body: learning
visuo-tactile associations in a humanoid robot with artificial skin,” PloS
one, vol. 11, no. 10, p. e0163713, 2016.
[16] L. Pio-Lopez, A. Nizard, K. Friston, and G. Pezzulo, “Active inference
and robot control: a case study,” J R Soc Interface, vol. 13, 2016.
[17] M. Baltieri and C. L. Buckley, “An active inference implementation of
phototaxis,” 2018 Conference on Artificial Life, no. 29, pp. 36–43, 2017.
[18] P. Lanillos and G. Cheng, “Active inference with function learning for
robot body perception,” International Workshop on Continual Unsuper-
vised Sensorimotor Learning, ICDL-Epirob, 2018.
[19] S. Kullback and R. A. Leibler, “On information and sufficiency,” The
Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 03 1951.
[20] R. Bogacz, “A tutorial on the free-energy framework for modelling
perception and learning,” Journal of Mathematical Psychology, vol. 76,
no. B, pp. 198–211, 2015.
[21] K. Friston, J. Mattout, N. Trujillo-Barreto, J. Ashburner, and W. Penny,
“Variational free energy and the laplace approximation,” Neuroimage,
vol. 34, no. 1, pp. 220–234, 2007.
[22] C. L. Buckley, C. S. Kim, S. McGregor, and A. K. Seth, “The free energy
principle for action and perception: A mathematical review,” Journal of
Mathematical Psychology, 2017.
[23] G. Metta, G. Sandini, D. Vernon, L. Natale, and F. Nori, “The icub
humanoid robot: An open platform for research in embodied cognition,”
Performance Metrics for Intelligent Systems Workshop, 01 2008.
[24] G. Metta, P. Fitzpatrick, and L. Natale, “Yarp: Yet another robot
platform,” International Journal of Advanced Robotic Systems, vol. 3(1),
pp. 43–48, 2006.