Online Learning of Visuo-Motor Coordination
in a Humanoid Robot.
A Biologically Inspired Model.
Guido Schillaci, Verena V. Hafner
Cognitive Robotics Group
Humboldt-Universität zu Berlin, Germany
Emails: guido.schillaci@informatik.hu-berlin.de,
hafner@informatik.hu-berlin.de
Bruno Lara
Cognitive Robotics Group
Universidad Autónoma del Estado de Morelos, Mexico
Email: bruno.lara@uaem.mx
Abstract—Coordinating vision with movements of the body is a
fundamental prerequisite for the development of complex motor
and cognitive skills. Visuo-motor coordination seems to rely on
processes that map spatial vision onto patterns of muscular
contraction.
In this paper, we investigate the formation and the coupling of
sensory maps in the humanoid robot Aldebaran Nao. We propose
a biologically inspired model for coding internal representations
of sensorimotor experience that can be fed with data coming
from different motor and sensory modalities, such as visual,
auditory and tactile. The model is inspired by the self-organising
properties of areas in the human brain, whose topologies are
structured by the information produced through the interaction
of the individual with the external world. In particular, Dynamic
Self-Organising Maps (DSOMs) proposed by Rougier et al. [1]
have been adopted together with a Hebbian paradigm for on-
line and continuous learning on both static and dynamic data
distributions.
Results show how the humanoid robot improves the quality
of its visuo-motor coordination over time, starting from an
initial configuration where no knowledge about how to visually
follow its arm movements is present. Moreover, plasticity of
the proposed model is tested. At a certain point during the
developmental timeline, damage to the system is simulated by
adding a perturbation to the motor command used for training
the model. Consequently, the performance of the visuo-motor
coordination is affected by an initial degradation, followed by
a new improvement as the proposed model adapts to the new
mapping.
I. INTRODUCTION
Coordinating vision with movements of the body is a fun-
damental prerequisite for the development of complex motor
and cognitive skills. In early developmental stages, infants
progressively bootstrap their attention capabilities towards a
growing number of salient events in their environment, such
as moving objects, their own body, external objects and
other individuals [2]. Developmental studies showed an early
coupling between visual and motor systems in infants [3]
and suggested a correlation between hand-eye coordination,
learning capabilities and social skills [4].
Control of movement is a capability that has been observed
to be acquired through exploration behaviours already during
prenatal stages [5]. Zoia and colleagues [5] showed that there
is no evidence of coordinated kinematic patterns in hand-
to-mouth and hand-to-eye movements in foetuses up to the
gestational age of 18 weeks. However, around the 22nd week
of gestation, foetuses perform movements that show kinematic
patterns with acceleration and deceleration phases apparently
planned according to the size and to the delicacy of the target
(facial parts, such as mouth or eyes) [5].
Work related to the development of visuo-motor coordina-
tion can be found also in the developmental robotics literature.
Metta [6] implemented an adaptive control system inspired
by biological development of visuo-motor coordination for
the acquisition of orienting and reaching behaviours on a
humanoid robot. Following a developmental paradigm, the
system starts with moving the eyes only. At this point, control
is a mixture of random and goal-directed movements. The
development proceeds with the acquisition of closed loop
gains, reflex-like modules controlling the arm sub-system,
acquisition of an eye-head coordination and of a head-arm
coordination map.
Saegusa et al. [7] studied self-body perception in a hu-
manoid robot based on the coherence of visual and pro-
prioceptive sensory feedback. A robot was programmed to
generate random arm movements and to store image cues
in a visuomotor base together with joint angles information.
Correlations between visual and physical movements were
used to predict the location of the robot’s body in the visual
input, and to recognise it.
In recent publications [8] [9], we showed how a humanoid
robot acquires hand-eye coordination and reaching skills by
exploring its movement capabilities through body babbling
and by using a biologically inspired model consisting of Self-
Organising Maps (SOMs [10]). Such a behaviour led to the
development of pointing gestures. The model architecture is
inspired by the Epigenetic Robotics Architecture [11], where
a structured association of multiple SOMs has been adopted
for mapping different sensorimotor modalities in a humanoid
robot. We also showed how a robot can deal with tool-
use when equipped with self-exploration behaviours and with
the capability to execute internal simulations of sensorimotor
Fig. 1. A screenshot of the robot Aldebaran Nao babbling its left arm
in the simulated environment Cyberbotics Webots. The bottom left window
shows the visual input grabbed from the bottom camera of the robot. A
fiducial marker (ARToolkit, www.hitl.washington.edu/artoolkit) has been used
for tagging the hand of the robot. The experiment is run in real time.
cycles [12] [13].
Visuo-motor coordination seems to rely on processes that
map spatial vision onto patterns of muscular contraction.
Such a mapping would be acquired over time through the
physical interaction of the infant with its surroundings, with
a gradual formation of internal representations already during
the early stages of development. Rochat [15] demonstrated that
infants start to show, around the age of 3 months, systematic
visual and proprioceptive self-exploration. Rochat and Morgan
suggested that infants, by the age of 12 months, already
possess a sense of a calibrated intermodal (that is, occurring
from multiple sensory modalities) space of their body, or a
body schema, that is a perceptually organised entity which
they can monitor and control [16].
Body schemas are thought to rely on mappings between
different motor and sensor modalities. Evidence from neuroscience suggests the existence of topographic maps in the brain,
which can be seen as projections of sensory receptors or of
effector systems into structured areas of the brain. These maps
self-organise throughout brain development in such a way that adjacent regions process sensory input from spatially close parts of the
body. Studies show the existence of such maps in the visual,
auditory, olfactory and somatosensory systems, as well as in
parts of the motor brain areas [17].
In this paper, we investigate the formation and the coupling
of sensory and motor maps in the humanoid robot Aldebaran
Nao, inspired by the formation of body schemas in humans.
We propose a biologically inspired model for coding internal
representations of sensorimotor experience that can be fed with
data coming from different motor and sensory modalities, such
as visual, auditory and tactile. The model is inspired by the
self-organising properties of areas in the human brain, whose
topology is structured by the sensory information produced by
the interaction of the individual with the external world.
Already in 1990, Martinetz et al. [18] proposed an ex-
tension of Kohonen’s self-organizing mapping for learning
visuo-motor coordination in a simulated robot arm with fixed
cameras. The authors used a network with three-dimensional
topology matched to the work space of the robot arm. The
system extracted the position of an object to reach from
the visual input and fed the 3D-lattice of nodes with its
coordinates. An output vector representing the arm posture was
associated with each node of the map. A training session was
run for mapping sequences of input-output relations, to learn
the required transformations for visuo-motor coordination of a
robot arm [18]. However, as Arras and colleagues [19] pointed
out, the approach proposed by Martinetz and colleagues [18]
was based on a time-dependent learning rate. While the model
worked well for the initial learning, the learning rate was subsequently kept at a constant level, which was insufficient to allow
the network to adapt to changes in the robot’s environment.
Thus, Arras et al. extended the algorithm by coupling the
learning rate to the arm positioning error estimated from
the continuous camera feedback, thus allowing for adaptation
to drastic changes in the robot’s work environment [19].
However, both approaches addressed learning of visuo-
motor coordination of a robot arm with fixed cameras, using
a model consisting of a three-dimensional map whose nodes
contain both visual input and motor output information [18]
[19].
In this paper, Dynamic Self-Organising Maps (DSOMs)
proposed by Rougier et al. [1] have been adopted as topology
preserving maps. Similarly to the algorithm presented in [19],
DSOMs allow for online and continuous learning on both
static and dynamic data distributions, thus enabling a dynamic
coupling between the environment and the model. In the ex-
periment presented here, we address visuo-motor coordination
in a humanoid robot with moving arm and camera, using
two DSOMs for coding the proprioceptive information coming
from the joint encoders of the arm and of the neck of the robot.
The two DSOMs are associated through Hebbian learning
modulated from the visual input through the interaction of
the robotic agent with its surroundings.
II. DYNAMIC SELF-ORGANISING MAPS
Classical Self-Organising Map algorithms implement de-
caying adaptation parameters for tracking data distribution.
Thus, self-organisation depends heavily on a time-dependent, decreasing learning rate and neighbourhood function. Once
the adaptation strength has decayed, the network is unable to
react to subsequent changes in the signal distribution [20].
Models such as Growing Neural Gas (GNG) have been
proposed for online and lifelong learning, and they can also adapt
to dynamic distributions [21]. GNGs have no parameters that
change over time and they allow for continuous learning,
adding units and connections, until a performance criterion
has been met [21]. Similarly, Evolving Self-Organising Maps
(ESOMs) [22] implement incremental networks that create
nodes dynamically based on the distance of the winner node
to the input data.
Rougier et al. [1] proposed the Dynamic Self-Organising
Map (DSOM), a modified SOM algorithm where the learning
rule and the neighbourhood function do not depend on time.
The authors demonstrated how the model dynamically adapts
to changing environments, or data distributions, as well as
stabilises over stationary distributions. They also reported
DSOM to perform better than classical SOM and Neural Gas
in a simulated scenario [1].
A DSOM is a structured neural map composed of nodes with fixed positions $p_i \in \mathbb{R}^q$ in the lattice, where $q$ is the dimension of the lattice (in our experiment, $q = 2$). Each node $i$ has a weight $w_i$ that is updated according to the input data pattern $v$ through a learning function and a neighbourhood function. For each input pattern $v$, a winner $s$ is determined as the node of the DSOM closest to $v$ according to the Euclidean distance. As described by Rougier et al. [1], all codes $w_i$ are then shifted towards $v$ according to the following rule:

$$\Delta w_i = \epsilon \, \lVert v - w_i \rVert \, h_\eta(i, s, v) \, (v - w_i) \qquad (1)$$

where $\epsilon$ is a constant learning rate and $h_\eta(i, s, v)$ is a neighbourhood function of the form:

$$h_\eta(i, s, v) = e^{-\frac{1}{\eta^2} \frac{\lVert p_i - p_s \rVert^2}{\lVert v - w_s \rVert^2}} \qquad (2)$$

where $\eta$ is the elasticity, or plasticity, parameter, $p_i$ is the position of node $i$ in the lattice and $p_s$ is the position of the winner node in the lattice. If $v = w_s$, then $h_\eta(i, s, v) = 0$.
The rationale behind such equations is that if a node is close
enough to the data, there is no need for other nodes to learn
anything, since the winner can represent the data. If there is
no node close enough to the data, any node learns the data
according to its own distance to the data [1].
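To make the update rule concrete, the following is a minimal Python sketch of a DSOM with a two-dimensional lattice, implementing equations (1) and (2) as stated above. The class and variable names are ours, not from the paper; the default elasticity and learning-rate values are arbitrary placeholders, and the weight initialisation and input normalisation are simplified (the experiment in Section IV initialises weights within the joint ranges).

```python
import numpy as np

class DSOM:
    """Minimal Dynamic Self-Organising Map sketch (2-D lattice), after Rougier et al. [1]."""

    def __init__(self, rows, cols, dim, elasticity=2.0, learning_rate=0.1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # Fixed node positions p_i on the lattice, normalised to the unit square.
        gy, gx = np.meshgrid(np.linspace(0, 1, rows), np.linspace(0, 1, cols), indexing="ij")
        self.positions = np.stack([gy.ravel(), gx.ravel()], axis=1)    # p_i, shape (N, 2)
        self.weights = rng.uniform(0.0, 1.0, size=(rows * cols, dim))  # codes w_i
        self.elasticity = elasticity          # eta in eq. (2)
        self.learning_rate = learning_rate    # epsilon in eq. (1)

    def winner(self, v):
        """Index of the node whose weight is closest (Euclidean distance) to input v."""
        v = np.asarray(v, dtype=float)
        return int(np.argmin(np.linalg.norm(self.weights - v, axis=1)))

    def update(self, v):
        """One online update with input pattern v, following eqs. (1) and (2)."""
        v = np.asarray(v, dtype=float)
        dist = np.linalg.norm(self.weights - v, axis=1)   # ||v - w_i|| for every node
        s = int(np.argmin(dist))                          # winner node
        if dist[s] == 0.0:                                # v == w_s: nothing to learn
            return s
        lattice_d2 = np.sum((self.positions - self.positions[s]) ** 2, axis=1)
        h = np.exp(-lattice_d2 / (self.elasticity ** 2 * dist[s] ** 2))   # eq. (2)
        self.weights += self.learning_rate * dist[:, None] * h[:, None] * (v - self.weights)
        return s
```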
However, the DSOM algorithm is not parameter free: the
elasticity parameter modulates the strength of the coupling
between nodes. If elasticity is too high, nodes cannot span
the whole space and the DSOM algorithm does not converge.
If elasticity is too low, coupling between nodes is weak and
may prevent self-organisation from occurring [1]. The effect of the
elasticity, as reported by the authors, not only depends on the
size of the network and the size of the support but also on the
initial conditions. As mentioned by Rougier and colleagues,
in order to reduce the dependency on the elasticity, the initial
configuration of the network should cover as much as possible
the entire support [1].
Nonetheless, DSOMs allow for dynamic neighbourhood and
lead to a qualitatively different self-organisation that can be
controlled using the elasticity parameter. DSOMs map the
structure or support of the distribution rather than its density, unlike many other Vector Quantisation algorithms, which map the density.
III. LEARNING VISUO-MOTOR COORDINATION
We implemented a biologically inspired model for learning
visuo-motor coordination in the Nao robot. The model consists
of two bi-dimensional DSOMs encoding the arm postures and
the head postures of the robot, respectively. Arm postures
consist of 4-dimensional vectors containing the angle positions
of the following joints of the robot: shoulder pitch, shoulder
roll, elbow yaw, elbow roll. Head postures consist of 2-
dimensional vectors containing the angle positions of the neck
joints of the robot: head yaw, head pitch.
The two DSOMs are associated through Hebbian links.
In particular, each node of the first DSOM is connected to
each node of the second DSOM, where the connection is
characterised by a weight. The weight is updated according
to a positive Hebbian rule that simulates synaptic plasticity
of the brain: the connection between a pre-synaptic neuron (a
node in the first DSOM) and a post-synaptic neuron (a node
in the second DSOM) increases if the two neurons activate
simultaneously. Thus, the model consists of two DSOMs and
a Hebbian table containing the weights of the links connecting
the two DSOMs. The size of the table is equal to the number
of nodes of the first DSOM multiplied by the number of nodes
of the second DSOM.
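As a rough sketch of how these components could be held together in code, reusing the DSOM class sketched in Section II, the sizes below follow the experiment described in Section IV; the variable names are ours.

```python
import numpy as np

# Two DSOMs (30 x 30 nodes each, as in Section IV) and a Hebbian table with one
# weight per (arm node, head node) pair. Joint dimensionalities follow the text.
ROWS, COLS = 30, 30
ARM_DIM = 4    # shoulder pitch, shoulder roll, elbow yaw, elbow roll
HEAD_DIM = 2   # head yaw, head pitch

arm_dsom = DSOM(ROWS, COLS, dim=ARM_DIM)
head_dsom = DSOM(ROWS, COLS, dim=HEAD_DIM)

# Hebbian weights start at zero, so that structure grows in an activity-dependent way.
hebbian = np.zeros((ROWS * COLS, ROWS * COLS))
```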
Learning consists of two parallel processes. The robot
executes random body babbling of its arm, that is, every 1.5 seconds it executes a motor command of its arm towards a joint configuration sampled from a uniform random distribution within its arm joint ranges¹. The first learning
process consists in updating the two DSOMs during the
execution of the random arm movement. The process of
moving the arm is decoupled from the processes of updating
the DSOMs. Instant by instant, with a frequency of 15 Hz,
the current positions of the joints of the arm are used as input
data vector for the learning rule of the arm DSOM (equations
(1) and (2)). The head DSOM is also updated using equations (1) and (2), with the current angle positions of the neck joints as input data vector, at the same frequency of 15 Hz.
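A possible shape for this first learning process, with the 1.5-second babbling timer decoupled from the 15 Hz map updates, is sketched below. The robot-interface callables (reading joint angles, sending a random arm command) are placeholders for whatever middleware is used; they are not specified in the paper. The head-command generation and the Hebbian update described next would run inside the same loop.

```python
import time
import numpy as np

UPDATE_HZ = 15.0        # frequency of the DSOM updates
BABBLE_PERIOD = 1.5     # seconds between random arm commands

def first_learning_process(arm_dsom, head_dsom, read_arm_joints, read_head_joints,
                           send_random_arm_command, duration_s):
    """Decoupled loop: random arm commands every 1.5 s, DSOM updates at 15 Hz (sketch)."""
    start = time.time()
    next_babble = 0.0
    while time.time() - start < duration_s:
        elapsed = time.time() - start
        if elapsed >= next_babble:
            send_random_arm_command()          # uniform sample within the arm joint ranges
            next_babble += BABBLE_PERIOD
        # Online updates with the current joint readings (eqs. (1) and (2)).
        arm_dsom.update(np.asarray(read_arm_joints(), dtype=float))
        head_dsom.update(np.asarray(read_head_joints(), dtype=float))
        time.sleep(1.0 / UPDATE_HZ)            # approximate rate; ignores computation time
```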
Head movements are also generated every 1.5 seconds, at the same time as the generation of arm movements. In particular, a motor command is sent to the neck joints as follows (a code sketch of this selection is given after the list):
- search for the winner node of the arm DSOM (the closest
node in the arm DSOM to the input vector represented
by the current arm joints configuration);
- select the winner node in the head DSOM as the one that
has the highest connection weight to the winner node in
the arm DSOM. If there is more than one winner node
(that is, multiple connections with the same weight), then
choose a random one from the group of winners;
- send a motor command to the joints of the neck equal to the weight vector of the winning head node.
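The sketch below follows these three steps, under the assumption that the weight vector of the selected head node can be sent directly as the neck joint target; the function and variable names are ours.

```python
import numpy as np

def head_command_from_arm(arm_dsom, head_dsom, hebbian, arm_joints, rng=None):
    """Pick a neck command from the current arm posture via the Hebbian table (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    arm_winner = arm_dsom.winner(arm_joints)              # step 1: winner in the arm DSOM
    links = hebbian[arm_winner]                           # Hebbian weights towards all head nodes
    candidates = np.flatnonzero(links == links.max())     # step 2: strongest link(s)
    head_winner = int(rng.choice(candidates))             # random tie-break among equal winners
    return head_dsom.weights[head_winner]                 # step 3: (head yaw, head pitch) target
```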
A second learning process based on a Hebbian learn-
ing paradigm is run in parallel to the first learning pro-
cess. The Hebbian learning paradigm describes an associative
connection between activities of two connected nodes [8].
Here, when the end-effector of the robot is visible from
the visual input, the connection between the winner nodes
of the two DSOMs is strengthened. The hand of the robot
has been tagged with a fiducial marker and its position in
image coordinates has been estimated using the ARToolkit
¹ In the experiment presented here, only the four joints of the left arm of the robot are used: shoulder pitch, whose joint angle position can range between -119.5 degrees and 119.5 degrees; shoulder roll (range: -18 degrees to 76 degrees); elbow yaw (range: -119.5 degrees to 119.5 degrees); elbow roll (range: -88.5 degrees to -2 degrees).
Fig. 2. Illustration of the proposed model. On the left side, the 2-dimensional lattices of the two DSOMs (arm and head) are shown. The DSOMs can be
also represented in the input space, where nodes are positioned according to their weights (right side). Lines connecting the two DSOMs represent Hebbian links with weights w ≠ 0. Thicker lines correspond to stronger Hebbian links.
(www.hitl.washington.edu/artoolkit). The bottom camera of
the Nao robot has been used for grabbing the visual input.
Thus, visuo-motor coordination can be considered as suc-
cessful if the marker tagging the end-effector of the robot is
visible from the visual input. In this case, the Hebbian learning
process updates the Hebbian table connecting the two DSOMs
as follows. If a marker is visible:
- select the pre-synaptic neuron (winner node) as the closest node $i$ in the arm DSOM to the current arm joint configuration $x$;
- select the post-synaptic neuron (winner node) as the closest node $j$ in the head DSOM to the current neck joint configuration $y$;
- strengthen the connection $w_{ij}$ between the pre- and post-synaptic neurons according to the modified positive Hebbian rule:

$$\Delta w_{ij} = \lambda \, A_i(x) \, A_j(y) \, f_c \qquad (3)$$

where $A_i(x)$ is the activation function of neuron $i$ over the Euclidean distance between its weight vector and the data pattern $x$, $\lambda$ is a scaling factor for slowing down the growth of the weights (in this experiment it is initialised to 0.01), and $f_c$ is a multiplying factor related to the distance between the perceived position of the hand (marker) in image coordinates and the center of the image grabbed from the robot camera (image size: 320 × 240). $f_c$ ranges from 1 (hand at the center of the image) to 0 (hand at the corner of the image) and it is used to make the system choose head positions that result in the hand being close to the center of the image.
As in Kajic et al. [8], the activation function of a neuron, $A(d)$, is computed as:

$$A(d) = \frac{1}{1 + \tanh(d)} \qquad (4)$$

where $d$ is the Euclidean distance between the weight vector of the node and the input pattern. All weights between the two DSOMs are initially set to zero, allowing for an activity-dependent role of structural growth in neural networks [8].
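A sketch of this second learning process, combining equations (3) and (4), is given below. It assumes the detected hand-marker position is available in image coordinates; the image-size default and the helper names are ours.

```python
import numpy as np

def activation(d):
    """A(d) = 1 / (1 + tanh(d)), eq. (4)."""
    return 1.0 / (1.0 + np.tanh(d))

def hebbian_update(arm_dsom, head_dsom, hebbian, arm_joints, head_joints,
                   marker_xy, image_size=(320, 240), lam=0.01):
    """One update of the Hebbian table (eq. (3)), run only when the marker is visible."""
    x = np.asarray(arm_joints, dtype=float)
    y = np.asarray(head_joints, dtype=float)
    i = arm_dsom.winner(x)        # pre-synaptic winner (arm DSOM)
    j = head_dsom.winner(y)       # post-synaptic winner (head DSOM)
    a_i = activation(np.linalg.norm(arm_dsom.weights[i] - x))
    a_j = activation(np.linalg.norm(head_dsom.weights[j] - y))
    # Centring factor f_c: 1 when the hand is at the image center, 0 at a corner.
    center = np.asarray(image_size, dtype=float) / 2.0
    max_dist = np.linalg.norm(center)
    f_c = 1.0 - np.linalg.norm(np.asarray(marker_xy, dtype=float) - center) / max_dist
    hebbian[i, j] += lam * a_i * a_j * f_c
    return i, j
```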
Figure 2 shows an illustration of the proposed model, which
consists of two DSOMs connected by Hebbian links.
IV. RESULTS
A preliminary experiment was run on the Cyberbotics
Webots robot simulator, where no noise was modelled in
the joint encoders. Future work will include reproducing the
experiment on a real robot.
As described in the previous section, the arm DSOM and the
head DSOM were trained with data generated through motor
babbling. Each DSOM consisted of 30 × 30 nodes. A weight
vector of four dimensions was associated with each node of
the arm DSOM, representing the positions of the following
joints: shoulder pitch, shoulder roll, elbow yaw and elbow roll.
Similarly, each node of the head DSOM was associated with
a weight vector of two dimensions, representing the following
joint positions: head yaw and head pitch. Weights of the nodes
of both DSOMs were randomly initialised within the
ranges of the corresponding joints, to reduce the effect of
elasticity dependency. As pointed out by Rougier et al. [1],
the initial configuration of the DSOM network should cover
the entire support as much as possible to reduce elasticity
dependency.
The experiment was run for almost 3 hours and 20 minutes
(197.58 minutes). It consisted in the robot generating random
arm movements and moving its head according to its cur-
rent visuo-motor coordination skills. Learning was performed
online, in parallel to the execution of the movements. It
consisted in updating the DSOM-based model with training
data represented by the current positions of the joints of the
arm and those of the head. Instant by instant, the current
arm joint configuration was used as input pattern for the arm
DSOM update rule, as described by equation (1). Similarly,
the current head joints configuration was used as input pattern
for the update rule of the head DSOM. Frequency of the
updates matched the 15 Hz frame rate of the visual input.
Therefore, during 197.58 minutes, the DSOMs were updated
using 177,823 input training patterns. In parallel to the DSOM
updates, the Hebbian table connecting the two DSOMs was
updated with the positive Hebbian rule described by equation
3, only when the hand of the robot was visible in the visual
input. During the 197.58 minutes, the hand of the robot was
detected 91,658 times. The Hebbian table was updated at each
detection.
As a measure for the quality of visuo-motor coordination,
we considered the number of times the hand of the robot was
detected from the visual input during a time window of 5
minutes. This measurement was repeated every 5 minutes for
the entire duration of the learning session (197.58 minutes).
A linear regression computed on the collected measurements
showed a positive trend (slope 12.147, intercept 2175.176),
suggesting that the quality of visuo-motor coordination, in
terms of the number of times the robot detected its hand,
improved over time.
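The trend could be computed, for instance, with an ordinary least-squares fit over the per-window detection counts, as in the sketch below; the counts shown are illustrative placeholders, not the values measured in the experiment.

```python
import numpy as np

# Illustrative placeholder counts of hand detections per 5-minute window.
detections_per_window = np.array([2100, 2180, 2150, 2230, 2260, 2310], dtype=float)
window_index = np.arange(len(detections_per_window))

# Least-squares line: a positive slope indicates improving visuo-motor coordination.
slope, intercept = np.polyfit(window_index, detections_per_window, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```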
In addition, the capability of the proposed model to adapt to
unexpected changes of the input data distributions was tested.
After the first learning session, damage to the system was
simulated by adding a perturbation to the motor command
used for training the model. In particular, arm movements
were randomly generated as in the first learning session but
the vector representing the current arm joint configuration
was affected by a perturbation. The perturbation consisted
in translating the vector of the arm motor command. The
perturbation was initialised randomly and then kept constant. In this experiment, the following perturbation
was added to the arm joints: 0.1265 radians to the shoulder
pitch joint, 1.1411 radians to the shoulder roll joint, 1.2295
radians to the elbow yaw joint and -0.2242 radians to the elbow
roll joint.
Therefore, learning continued in the perturbation regime
for 106.72 minutes. During this second learning session,
96,049 new input patterns containing the perturbation were
used for the online update of the models. As in the pre-
vious analysis, the hand-detection rate was measured over
a 5-minute window. A linear regression computed on the
collected measurements during the first 35 minutes of learning
affected by perturbation showed a negative trend (slope: -
80.286, intercept: 2685.857). In other words, there was degra-
dation of the performance of the visuo-motor coordination.
However, a new improvement in the visuo-motor coordination
was reported during the following 71.72 minutes, as confirmed
by the positive trend of the linear regression (slope: 1.2154,
intercept: 2255.264) computed over the measurements of the
third learning phase. This suggests that the proposed model
was able to partially recover from the unexpected change in
the data distributions. However, we did not analyse how fast
the model can fully recover to the original performance. This
will be addressed in future experiments, including evaluating
the results of several runs to assess the reliability of the learning
and recovery processes in response to different perturbations.
Figure 3 shows the trends of the quality of visuo-motor
coordination. The three blue segments show the linear regres-
sions of: the initial learning phase (without perturbation), the
first degradation phase under the perturbation regime and the
final phase, under the perturbation regime, characterised by a
new improvement.
We performed a similar analysis on the trend of the distance
between the detected position (in image coordinates) of the
robot’s hand and the center of the image grabbed from the
robot camera. As described in the Hebbian rule in equation 3,
the connections between pre- and post-synaptic neurons were
strengthened also according to a multiplying factor related to
such a distance. The multiplying factor, which ranged between
1 (hand at the center of the image) and 0 (hand at the corner
of the image), was used for making the system choose head
positions that resulted in the hand being close to the center of
the image. In other words, such a distance can be interpreted
as a measure for the accuracy of the head movements, where
a good movement results in the hand being visible at the center of
the image.
We expected to observe an improvement of the accuracy of
the robot’s head movements while learning, followed by an
initial degradation of the performances under the perturbation
regime. A linear regression was computed on the measurement 1 − distance(hand, center of image) / max distance, averaged over a 5-minute window as in the previous analysis.
The results of a linear regression computed on the measure-
ments of the first learning phase (197.58 minutes) showed
a slightly positive trend (slope: 0.0032, intercept: 0.60687).
A linear regression computed on the following measurements
collected during the first 35 minutes of learning under the
perturbation regime showed, as expected, a negative trend
(slope: -0.0020, intercept: 0.77210), suggesting a degradation
of the performance. However, differently from the previous analysis, there was very little improvement in the accuracy
of the head movements during the third learning phase under
the perturbation regime (the last 71.72 minutes of learning).
Although higher than in the second learning phase, the trend
of the linear regression remained negative (slope: -0.0009, intercept: 0.73715). This issue will be addressed in
future experiments.
Figures 4 and 5 show the trends of the distortion measure-
ment of the arm DSOM and of the head DSOM. Distortion
is a popular criterion for assessing the quality of a Kohonen
map [10]. It is computed as follows. For each input pattern:
- Update the DSOM using the input pattern;
- Compute the quantization error, as the distance between
the input pattern and the winner node (the closest DSOM
node to the input)
Distortion is computed as the average of the quantization
errors, that is, the sum of the calculated distances, divided
by the number of input patterns. Since we were dealing with
online learning mechanisms, only a partial set of the processed
input data was used for computing the distortion. At each
instant, only the previous 1,800 observations (corresponding to
two minutes of exploration) were used for computing the error.
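A sketch of this moving-window distortion measurement, following the two steps listed above, is given below; the function and variable names are ours.

```python
from collections import deque
import numpy as np

def update_with_distortion(dsom, input_stream, window=1800):
    """Online DSOM updates while tracking distortion over the last `window` inputs (sketch)."""
    recent_errors = deque(maxlen=window)   # quantization errors of the most recent patterns
    distortion_trace = []
    for v in input_stream:
        v = np.asarray(v, dtype=float)
        dsom.update(v)                                              # update first...
        q_err = np.linalg.norm(dsom.weights[dsom.winner(v)] - v)    # ...then quantization error
        recent_errors.append(q_err)
        distortion_trace.append(float(np.mean(recent_errors)))      # distortion = mean error
    return distortion_trace
```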
Fig. 3. The quality of visuo-motor coordination was measured as the number
of times the hand of the robot was detected from the visual input during a
time window of 5 minutes. This measurement (red line in the figure) was
repeated every 5 minutes for the entire duration of the learning session. Blue
lines show linear regressions. First learning phase: slope 12.147, intercept
2175.176; second learning phase under the perturbation regime (the first
vertical green line marks the instant when the perturbation is added): slope
-80.286, intercept 2685.857; third learning phase under perturbation regime
(re-adaptation to the new data distribution): slope 1.2154, intercept 2255.264.
The instant represented by the second green line, corresponding to the end of
the degradation phase, has been arbitrarily chosen.
Fig. 4. Distortion measurement of the arm DSOM. The green vertical line
marks the time instant (39.51, or 197.58 minutes) when the perturbation is
added to the arm commands. Errors are computed over a moving window
of 1,800 input samples (2 minutes, considering the update frequency of 15
frames per second).
Figure 4 shows a decreasing distortion for the arm DSOM
during the first 5 minutes of learning, followed by a quasi-
stationary error until the moment when the perturbation was
added to the arm command (around 197 minutes). At that point, an increase in the distortion was observed, corresponding to the change in the distribution from which the data was sampled.
Once the DSOM adapted to the new distribution, the distortion
error started to decrease and to stabilise.
The head DSOM was not affected by the perturbation, in the
current experiment. In fact, as shown in Figure 5, no significant
jumps in the distortion error signal were observed.
V. CONCLUSION
Developmental studies in humans suggest that control and
coordination of movements are capabilities that are acquired
over time through exploration behaviours and that would
Fig. 5. Distortion measurement of the head DSOM. Errors are computed over
a moving window of 1,800 input samples (2 minutes, considering the update
frequency of 15 frames per second). After the first 10 minutes of learning,
the distortion error stabilises between 0.01 and 0.02, since the underlying data
distribution is not changing.
pave the way for the acquisition of more complex skills.
Applying such a developmental paradigm to robotics could not only provide insights into the human development of cognitive and motor skills, but could also provide robots with adaptive behaviours and with the capability to
react to unexpected circumstances. This paper addressed the
challenge of autonomous acquisition of internal body repre-
sentations in artificial agents, a fundamental prerequisite for
making robots able to successfully interact with humans and
with their environment.
We investigated the formation and the coupling of sensory
maps in the humanoid robot Aldebaran Nao. In particular,
we proposed a biologically inspired model for online and
continuous learning of visuo-motor coordination. From an
engineering perspective, one might conclude that, since robots are provided with kinematic models, learning hand-eye coordination is a redundant problem. On the other hand, defining models of a robot's embodiment and of its surrounding world a priori should be avoided, since it risks producing robot behaviours that lack adaptability. Using humanoid robots equipped with neurally plausible models also provides a controlled environment for studying learning mechanisms in infants [8]. Recently, we showed how a similar
architecture based on classical Self-Organizing Maps imple-
mented on a robotic platform accounts for the development
of pointing gestures, an attentional behaviour fundamental
for social interaction and imitation learning. In particular, we
studied how a robot can develop the capability to generate
attention manipulation behaviours as failed attempts to reach
for an object [8]. The reaching capability was acquired through
self-exploration, as in the experiment presented here. However,
the motor babbling algorithm used in [8] manually generated
head movements for following the hand trajectories. Once the
babbling session was over, the collected data was used for
training the hand-eye coordination model. Here, going a step backwards in the developmental timeline, we addressed the
acquisition of visuo-motor coordination in an online fashion
and already at the babbling level, where no a priori knowledge
on how to follow hand movements was present in the system.
The model proposed here consists of two Dynamic Self-
Organising Maps associated through Hebbian links, which
allow for online learning of mappings between different senso-
rimotor modalities. As in [8], the modular organization of the
model in terms of sensory maps allows for easier identification of its components with their biological equivalents. Moreover,
results show that the model is able to adapt to dynamic data
distributions.
In particular, the aim of the experiment presented here was
to make the robot able to learn how to follow the movements of
its hand, while generating random motor commands to its arm
joints. During the random movement generation, arm and head
postures were used for updating the corresponding DSOMs in
an online fashion, while they were associated through Hebbian
learning whenever the hand of the robot was visible in the
visual input. Head movements were generated as outputs of the
proposed model. The quality of the head movements depended
on how well the DSOMs encoded the data distributions where
the arm and neck postures were sampled from, and on how
well they were associated through Hebbian learning.
Using the proposed model, the humanoid robot improved
the quality of its visuo-motor coordination over time, starting
from a random configuration where no knowledge about how
to visually follow its arm movements was present. Moreover,
the capability of the proposed model to adapt to unexpected
changes was tested. At a certain point during the devel-
opmental timeline, damage to the system was simulated by adding a perturbation to the motor command used for training the model, resulting in a translation of the original data distribution. Consequently, the performance of the visuo-motor coordination was affected by an initial degradation, followed by a new improvement as the arm DSOM adapted to the new data distribution and the Hebbian connections between the arm DSOM and the head DSOM adapted to the new mapping.
Future work will include adding a sensory map coding the
hand position of the robot between the arm and head maps.
Extending the model to represent different motor and sensory
modalities, such as visual, auditory and tactile, will also be
investigated.
ACKNOWLEDGMENT
The research leading to these results has received funding
from the European Union’s Seventh Framework Programme
(FP7/2007-2013) under grant agreement n. 609465, related
to the EARS (Embodied Audition for RobotS) project. The
authors would like to thank the members of the Cognitive
Robotics Group at the Humboldt-Universität zu Berlin for very
helpful feedback.
REFERENCES
[1] N. Rougier and Y. Boniface, “Dynamic self-organising map,” Neurocomputing, vol. 74, no. 11, pp. 1840–1847, May 2011.
[2] F. Kaplan and V. V. Hafner, “The challenges of joint attention,” Interaction Studies, vol. 7, no. 2, pp. 135–169, 2006.
[3] S. Tükel, “Development of visual-motor coordination in children with
neurological dysfunctions,” 2013.
[4] C. Yu and L. B. Smith, “Joint attention without gaze following: Human
infants and their parents coordinate visual attention to objects through
eye-hand coordination,” PLoS ONE, vol. 8, no. 11, p. e79659, 11 2013.
[5] S. Zoia, L. Blason, G. D’Ottavio, M. Bulgheroni, E. Pezzetta,
A. Scabar, and U. Castiello, “Evidence of early development of action
planning in the human foetus: a kinematic study,” Experimental Brain
Research, vol. 176, no. 2, pp. 217–226, 2007. [Online]. Available:
http://dx.doi.org/10.1007/s00221-006-0607-3
[6] G. Metta, “Babyrobot – a study on sensori-motor development,” Ph.D.
dissertation, 2000.
[7] R. Saegusa, G. Metta, and G. Sandini, “Self-body discovery based on vi-
suomotor coherence,” in 3rd Conference on Human System Interactions
(HSI), 2010, May 2010, pp. 356–362.
[8] I. Kajic, G. Schillaci, S. Bodiroza, and V. V. Hafner, “A biologically
inspired model for coding sensorimotor experience leading to the devel-
opment of pointing behaviour in a humanoid robot,” in Proceedings of
the Workshop "HRI: a bridge between Robotics and Neuroscience". 9th
ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI 2014), 2014.
[9] V. V. Hafner and G. Schillaci, “From field of view to field of reach -
could pointing emerge from the development of grasping?” Frontiers
in Computational Neuroscience, Conference Abstract: IEEE ICDL-
EPIROB 2011, 2011.
[10] T. Kohonen, “Self-organized formation of topologically correct feature
maps,” Biological cybernetics, vol. 43, no. 1, pp. 59–69, 1982.
[11] A. Morse, J. de Greeff, T. Belpaeme, and A. Cangelosi, “Epigenetic
robotics architecture (ERA),” IEEE Transactions on Autonomous Mental Development, vol. 2, no. 4, pp. 325–339, 2010.
[12] G. Schillaci, V. Hafner, and B. Lara, “Coupled inverse-forward models
for action execution leading to tool-use in a humanoid robot,” in 7th
ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI), 2012, March
2012, pp. 231–232.
[13] G. Schillaci, “Sensorimotor learning and simulation of experience
as a basis for the development of cognition in robotics,”
Ph.D. dissertation, 2014. [Online]. Available: http://edoc.hu-
berlin.de/docviews/abstract.php?id=40534
[14] H. Ritter, “Self-organizing maps for internal representations,” Psychological Research, vol. 52, no. 2-3, pp. 128–136, 1990.
[15] P. Rochat, “Self-perception and action in infancy,” Experimental Brain
Research, vol. 123, no. 1-2, pp. 102–109, 1998. [Online]. Available:
http://dx.doi.org/10.1007/s002210050550
[16] P. Rochat and R. Morgan, “Two functional orientations of
self-exploration in infancy,” British Journal of Developmental
Psychology, vol. 16, no. 2, pp. 139–154, 1998. [Online]. Available:
http://dx.doi.org/10.1111/j.2044-835X.1998.tb00914.x
[17] J. H. Kaas, “Topographic maps are fundamental to sensory processing,” Brain Research Bulletin, vol. 44, no. 2, pp. 107–112, 1997. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0361923097000944
[18] T. Martinetz, H. Ritter, and K. Schulten, “Three-dimensional neural net
for learning visuomotor coordination of a robot arm,” IEEE Transactions
on Neural Networks, vol. 1, no. 1, pp. 131–136, Mar 1990.
[19] M. K. Arras, P. W. Protzel, and D. L. Palumbo, “Automatic learning
rate adjustment for self-supervising autonomous robot control,” NASA
Technical Memorandum TM-107592, NASA Langley Research Center,
Tech. Rep., 1992.
[20] B. Fritzke, “A self-organizing network that can follow non-stationary
distributions,” in Artificial Neural Networks – ICANN’97, ser. LNCS,
W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, Eds., 1997, vol.
1327, pp. 613–618.
[21] ——, “A growing neural gas network learns topologies,” in Advances
in Neural Information Processing Systems 7. MIT Press, 1995, pp.
625–632.
[22] D. Deng and N. Kasabov, “ESOM: an algorithm to evolve self-organizing maps from online data streams,” in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), vol. 6, 2000, pp. 3–8.
... In their experiment, the robot progressed through a staged development whereby eye saccades emerged first, followed by gaze control, then primitive reaching, and followed by eventual coordinated gaze-to-touch behaviors. An extension of the approach proposed by Kajić et al. (2014) was presented by Schillaci et al. (2014), where Dynamic Self-Organizing Maps [DSOMs (Rougier and Boniface, 2011)] and a Hebbian paradigm were adopted for online and continuous learning on both static and dynamic data distributions. The authors addressed the learning of visuo-motor coordination in robots, but focused on the capability of the proposed internal model for body representations to adapt to sudden changes in the dynamics of the system. ...
... A MOSAIC-like architecture for action recognition was also presented by Schillaci et al. (2012b), where the authors also compared different learning strategies for inverse and forward model pairs (see Figure 8). In an experiment on action selection, Schillaci et al. (2012bSchillaci et al. ( , 2014 showed how a robot can deal with tool-use when equipped with self-exploration behaviors and (2014)]. The inverse model simulates the motor command (in the example, a displacement of the joints of one arm of the humanoid robot Aldebaran Nao) needed for reaching a desired sensory state, from the current state of the system. ...
... Illustration of body representation proposed bySchillaci et al. (2014) andKajić et al. (2014). The body representation is formed by two self-organizing maps [standard Kohonen SOMs inKajić et al. (2014) and Dynamic SOMs inSchillaci et al. (2014)], connected through Hebbian links. ...
Article
Full-text available
Sensorimotor control and learning are fundamental prerequisites for cognitive development in humans and animals. Evidence from behavioral sciences and neuroscience suggests that motor and brain development are strongly intertwined with the experiential process of exploration, where internal body representations are formed and maintained over time. In order to guide our movements, our brain must hold an internal model of our body and constantly monitor its configuration state. How can sensorimotor control enable the development of more complex cognitive and motor capabilities? Although a clear answer has still not been found for this question, several studies suggest that processes of mental simulation of action–perception loops are likely to be executed in our brain and are dependent on internal body representations. Therefore, the capability to re-enact sensorimotor experience might represent a key mechanism behind the implementation of higher cognitive capabilities, such as behavior recognition, arbitration and imitation, sense of agency, and self–other distinction. This work is mainly addressed to researchers in autonomous motor and mental development for artificial agents. In particular, it aims at gathering the latest developments in the studies on exploration behaviors, internal body representations, and processes of sensorimotor simulations. Relevant studies in human and animal sciences are discussed and a parallel to similar investigations in robotics is presented.
... Models of the former group mostly require only body-related sensors including proprioception and touch, e.g., [49,83,103,151,200]. The latter group additionally requires visual information and takes advantage of the relation between the internal and external sensory modalities to construct the robot's body, e.g., [97,125,158,178,183,189]. As a result, the former category requires some sort of a priori knowledge of the robot's body in terms of parameterized functions, e.g., CAD model, Forward kinematic, Inverse Kinematic, etc. ...
... Schillaci et al. [158] learn a visuomotor coordination task in a Nao humanoid robot with a model consisting of two Dynamic Self-organising maps (DSOMs) encoding the arm and head joint space input, associated by Hebbian links to simulate synaptic plasticity of the brain. Two learning processes, one for updating DSOMs and another for Hebbian learning, are employed to train the model in an online manner during the robot's motor babbling. ...
Article
Full-text available
Safe human-robot interactions require robots to be able to learn how to behave appropriately in spaces populated by people and thus to cope with the challenges posed by our dynamic and unstructured environment, rather than being provided a rigid set of rules for operations. In humans, these capabilities are thought to be related to our ability to perceive our body in space, sensing the location of our limbs during movement, being aware of other objects and agents, and controlling our body parts to interact with them intentionally. Toward the next generation of robots with bio-inspired capacities, in this paper, we first review the developmental processes of underlying mechanisms of these abilities: The sensory representations of body schema, peripersonal space, and the active self in humans. Second, we provide a survey of robotics models of these sensory representations and robotics models of the self; and we compare these models with the human counterparts. Finally, we analyze what is missing from these robotics models and propose a theoretical computational framework, which aims to allow the emergence of the sense of self in artificial agents by developing sensory representations through self-exploration.
... In the literature, many efforts have been placed to model human sensorimotor abilities with methods based on self-organizing maps (SOM) [5][6][7]. Most of these works use SOMs as a topography-preserving and dimension-reducing tool, to map several sensor readings with motor actions. ...
... However, the learning process to form a sensorimotor map takes place mainly through gradient-descent rule which makes it less biologically plausible. In [5], two dynamic SOMs (DSOM) [8] representing head and arm of a humanoid robot were used to achieve visuo-motor coordination. Yet, that model suffered from a degradation in performance when perturbations were added to motor commands. ...
Chapter
Full-text available
In this work, we present the development of a neuro-inspired approach for characterizing sensorimotor relations in robotic systems. The proposed method has self-organizing and associative properties that enable it to autonomously obtain these relations without any prior knowledge of either the motor (e.g. mechanical structure) or perceptual (e.g. sensor calibration) models. Self-organizing topographic properties are used to build both sensory and motor maps, then the associative properties rule the stability and accuracy of the emerging connections between these maps. Compared to previous works, our method introduces a new varying density self-organizing map (VDSOM) that controls the concentration of nodes in regions with large transformation errors without affecting much the computational time. A distortion metric is measured to achieve a self-tuning sensorimotor model that adapts to changes in either motor or sensory models. The obtained sensorimotor maps prove to have less error than conventional self-organizing methods and potential for further development.
... The encoded map of the learned internal representation allows the robot to "mentally imagine" the appearance and position of its body parts. Schillaci et al. (2014) learn a visuomotor coordination task in a Nao humanoid robot with a model consisting of two Dynamic Self-organising maps (DSOMs) encoding the arm and head joint space input, associated by Hebbian links to simulate synaptic plasticity of the brain. Two learning processes, one for updating DSOMs and another for Hebbian learning, are employed to train the model in an online manner during the robot's motor babbling. ...
Preprint
Full-text available
Safe human-robot interactions require robots to be able to learn how to behave appropriately in spaces populated by people and thus to cope with the challenges posed by our dynamic and unstructured environment, rather than being provided a rigid set of rules for operations. In humans, these capabilities are thought to be related to our ability to perceive our body in space, sensing the location of our limbs during movement, being aware of other objects and agents, and controlling our body parts to interact with them intentionally. Toward the next generation of robots with bio-inspired capacities, in this paper, we first review the developmental processes of underlying mechanisms of these abilities: The sensory representations of body schema, peripersonal space, and the active self in humans. Second, we provide a survey of robotics models of these sensory representations and robotics models of the self; and we compare these models with the human counterparts. Finally, we analyse what is missing from these robotics models and propose a theoretical computational framework, which aims to allow the emergence of the sense of self in artificial agents by developing sensory representations through self-exploration.
... Multisensory representation learning of the body schema enables humans to perform pose estimation of their body parts, and coordinate transformation between sensory sources, which, ultimately, enables action [12], [13]. Robotic and computational models of body-schematic representations mostly focus on exploiting sensory information from proprioceptive and tactile sensing [14]- [16], or proprioceptive and visual sensing [17]- [19] and cast the representation learning as calibration, pose estimation or visuomotor mapping. ...
Preprint
Cognitive science suggests that the self-representation is critical for learning and problem-solving. However, there is a lack of computational methods that relate this claim to cognitively plausible robots and reinforcement learning. In this paper, we bridge this gap by developing a model that learns bidirectional action-effect associations to encode the representations of body schema and the peripersonal space from multisensory information, which is named multimodal BidAL. Through three different robotic experiments, we demonstrate that this approach significantly stabilizes the learning-based problem-solving under noisy conditions and that it improves transfer learning of robotic manipulation skills.
... Most of the review was focused on integrating visual and proprioceptive information. For instance, the better part of the robotic experiments were designed in using the linear combination of basic functions for visuomotor transformations (Halgand et al., 2010;Chinellato et al., 2011;Schillaci et al., 2014). However, in these works, the tactile information was not considered at all and it would have been interesting to use an artificial skin to contribute to the representation of the body schema and its space around as an additional modality with respect to the visual and proprioceptive modalities. ...
Article
Full-text available
Representing objects in space is difficult because sensorimotor events are anchored in different reference frames, which can be either eye-, arm-, or target-centered. In the brain, Gain-Field (GF) neurons in the parietal cortex are involved in computing the necessary spatial transformations for aligning the tactile, visual and proprioceptive signals. In reaching tasks, these GF neurons exploit a mechanism based on multiplicative interaction for binding simultaneously touched events from the hand with visual and proprioception information.By doing so, they can infer new reference frames to represent dynamically the location of the body parts in the visual space (i.e., the body schema) and nearby targets (i.e., its peripersonal space). In this line, we propose a neural model based on GF neurons for integrating tactile events with arm postures and visual locations for constructing hand- and target-centered receptive fields in the visual space. In robotic experiments using an artificial skin, we show how our neural architecture reproduces the behaviors of parietal neurons (1) for encoding dynamically the body schema of our robotic arm without any visual tags on it and (2) for estimating the relative orientation and distance of targets to it. We demonstrate how tactile information facilitates the integration of visual and proprioceptive signals in order to construct the body space.
... Although many state-of-the-art learning paradigms still require batch processing, some works have proposed online strategies in the field of humanoid robot learning. A biologically inspired model for online and continuous learning of visuo-motor coordination has been proposed in [20], where dynamic self-organising maps associated through Hebbian links have been adopted for learning the visuo-motor coordination online on a Nao humanoid robot. An online learning approach to achieve reaching behaviour in a humanoid robot has been proposed in [16], where the receptive field weighted regression algorithm has been employed to learn online a representation of the robot's reachable space. ...
Article
Internal models play a key role in cognitive agents by providing on the one hand predictions of sensory consequences of motor commands (forward models), and on the other hand inverse mappings (inverse models) to realise tasks involving control loops, such as imitation tasks. The ability to predict and generate new actions in continuously evolving environments intrinsically requiring the use of different sensory modalities is particularly relevant for autonomous robots, which must also be able to adapt their models online. We present a learning architecture based on self-learned multimodal sensorimotor representations. To attain accurate forward models, we propose an online heterogeneous ensemble learning method that allows us to improve the prediction accuracy by leveraging differences of multiple diverse predictors. We further propose a method to learn inverse models on-the-fly to equip a robot with multimodal learning skills to perform imitation tasks using multiple sensory modalities. We have evaluated the proposed methods on an iCub humanoid robot. Since no assumptions are made on the robot kinematic/dynamic structure, the method can be applied to different robotic platforms.
... This method is conceptually simple and attractive but deploying it in an online manner presents some difficulties (stability / plasticity dilemma) (Wolpert and Kawato, 1998), (Wolpert et al., 2011), (Demiris and Dearden, 2005), (Schillaci and Hafner, 2011). Nonetheless, successful experiments in this direction are presented in (Schillaci et al., 2014), which integrates the multi-modal sensing problem, or in (Stoelen et al., 2012) for a trajectory tracking task and associations of labels and low-level sensory input. ...
Conference Paper
Full-text available
In this paper, we investigate the closed-loop acquisition of basic behaviours on Sphero – a real spherical robot. We impose the additional requirement of learning from scratch in a single episode. The behaviour is encoded as an inverse model for stabilization and sensory target tracking tasks using recurrent neural networks.
Article
Full-text available
Over the last twenty years, a significant part of the research in exploratory robotics partially switches from looking for the most efficient way of exploring an unknown environment to finding what could motivate a robot to autonomously explore it. Moreover, a growing literature focuses not only on the topological description of a space (dimensions, obstacles, usable paths, etc.) but rather on more semantic components, such as multimodal objects present in it. In the search of designing robots that behave autonomously by embedding life-long learning abilities, the inclusion of mechanisms of attention is of importance. Indeed, be it endogenous or exogenous, attention constitutes a form of intrinsic motivation for it can trigger motor command towards specific stimuli, thus leading to an exploration of the space. The Head Turning Modulation model presented in this paper is composed of two modules providing a robot with two different forms of intrinsic motivations leading to triggering head movements towards audiovisual sources appearing in unknown environments. First, the Dynamic Weighting module implements a motivation by the concept of Congruence, a concept defined as an adaptive form of semantic saliency specific for each explored environment. Then, the Multimodal Fusion & Inference module implements a motivation by the reduction of Uncertainty through a self-supervised online learning algorithm that can autonomously determine local consistencies. One of the novelty of the proposed model is to solely rely on semantic inputs (namely audio and visual labels the sources belong to), in opposition to the traditional analysis of the low-level characteristics of the perceived data. Another contribution is found in the way the exploration is exploited to actively learn the relationship between the visual and auditory modalities. Importantly, the robot---endowed with binocular vision, binaural audition and a rotating head---does not have access to prior information about the different environments it will explore. Consequently, it will have to learn in real-time what audiovisual objects are of \qmarks{importance} in order to rotate its head towards them. Results presented in this paper have been obtained in simulated environments as well as with a real robot in realistic experimental conditions.
Conference Paper
Full-text available
Robots are gaining more attention in the neuroscientific community as a means of verifying theoretical models of social skill development. In particular, humanoid robots which resemble the dimensions of a child offer a controlled platform to simulate interactions between humans and infants. Such robots, equipped with biologically inspired models of social and cognitive skill development, might provide invaluable insights into the learning mechanisms infants employ when interacting with others. One such mechanism, which develops in infancy, is the ability to share and direct the attention of interacting participants. Pointing behaviour underlies joint attention and is preceded by hand-eye coordination. Here, we attempt to explain how pointing emerges from sensorimotor learning of hand-eye coordination in a humanoid robot. A robot learned joint configurations for different arm postures using random body babbling. Arm joint configurations obtained in the babbling experiment were used to train biologically inspired models based on self-organizing maps. We train and analyse models with various map sizes and, depending on their configuration, relate them to different stages of sensorimotor skill development. Finally, we show that a model based on self-organizing maps implemented on a robotic platform accounts for pointing gestures when a human presents an object out of reach for the robot.
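To illustrate how babbled joint configurations can link a visual map to a motor map, the Python sketch below (our own simplification, not the cited model; the map sizes, the Hebbian link matrix and the read-out rule are assumptions, and both maps are taken as already trained) strengthens links between co-activated units and reads an arm posture back out from a gaze direction:

import numpy as np

def bmu(som, x):
    """Index of the best-matching unit in a flattened map of prototypes (n_units, dim)."""
    return int(np.argmin(np.linalg.norm(som - x, axis=1)))

def hebbian_update(links, vision_som, motor_som, gaze, posture, lr=0.1):
    """Strengthen the link between the visual and motor units co-activated by one babbling sample."""
    links[bmu(vision_som, gaze), bmu(motor_som, posture)] += lr
    return links

def point_at(links, vision_som, motor_som, gaze):
    """Recall: visual BMU -> strongest outgoing link -> motor prototype (arm posture)."""
    return motor_som[int(np.argmax(links[bmu(vision_som, gaze)]))]

# Toy usage: 25 visual units over (pan, tilt), 25 motor units over 4 arm joints
rng = np.random.default_rng(0)
vision_som, motor_som = rng.random((25, 2)), rng.random((25, 4))
links = np.zeros((25, 25))
links = hebbian_update(links, vision_som, motor_som,
                       gaze=np.array([0.2, 0.7]), posture=rng.random(4))
posture = point_at(links, vision_som, motor_som, gaze=np.array([0.2, 0.7]))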
Thesis
Full-text available
State-of-the-art robots are still not properly able to learn from, adapt to, and react to unexpected circumstances, or to autonomously and safely operate in uncertain environments. Researchers in developmental robotics address these issues by building artificial systems capable of acquiring motor and cognitive capabilities by interacting with their environment, inspired by human development. This thesis adopts a similar approach in finding some of the basic behavioural components that may allow for the autonomous development of sensorimotor and social skills in robots. Here, sensorimotor interactions are investigated as a means for the acquisition of experience. Experiments on exploration behaviours for the acquisition of arm movements, tool-use and interactive capabilities are presented. The development of social skills is also addressed, in particular joint attention, the capability to share the focus of attention between individuals. Two prerequisites of joint attention are investigated: imperative pointing gestures and visual saliency detection. The established framework of internal models is adopted for coding sensorimotor experience in robots. In particular, inverse and forward models are trained with different configurations of low-level sensory and motor data generated by the robot through exploration behaviours, observed from a human demonstrator, or acquired through kinaesthetic teaching. The internal models framework allows the generation of simulations of sensorimotor cycles. This thesis also investigates how basic cognitive skills can be implemented in a humanoid robot by allowing it to recreate the perceptual and motor experience gathered in past interactions with the external world. In particular, simulation processes are used as a basis for implementing cognitive skills such as action selection, tool-use, behaviour recognition and self-other distinction.
Article
Full-text available
The coordination of visual attention among social partners is central to many components of human behavior and human development. Previous research has focused on one pathway to the coordination of looking behavior by social partners, gaze following. The extant evidence shows that even very young infants follow the direction of another's gaze but they do so only in highly constrained spatial contexts because gaze direction is not a spatially precise cue as to the visual target and not easily used in spatially complex social interactions. Our findings, derived from the moment-to-moment tracking of eye gaze of one-year-olds and their parents as they actively played with toys, provide evidence for an alternative pathway, through the coordination of hands and eyes in goal-directed action. In goal-directed actions, the hands and eyes of the actor are tightly coordinated both temporally and spatially, and thus, in contexts including manual engagement with objects, hand movements and eye movements provide redundant information about where the eyes are looking. Our findings show that one-year-olds rarely look to the parent's face and eyes in these contexts but rather infants and parents coordinate looking behavior without gaze following by attending to objects held by the self or the social partner. This pathway, through eye-hand coupling, leads to coordinated joint switches in visual attention and to an overall high rate of looking at the same object at the same time, and may be the dominant pathway through which physically active toddlers align their looking behavior with a social partner.
Conference Paper
Full-text available
We propose a computational model based on inverse-forward model pairs for the simulation and execution of actions. The models are implemented on a humanoid robot and are used to control reaching actions with the arms. In the experimental setup, a tool has been attached to the left arm of the robot, extending its covered action space. The preliminary investigations carried out aim at studying how the use of tools modifies the body scheme of the robot. The system performs action simulations before the actual executions. For each of the arms, the predicted end-effector position is compared with the desired one, and the internal pair presenting the lowest error is selected for action execution. This allows the robot to decide whether to perform an action with its bare hand or with the hand holding the attached tool.
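The selection step described in this abstract can be written very compactly. The Python sketch below (a toy illustration with stand-in models; the inverse/forward interfaces are assumptions, not the paper's implementation) simulates each inverse-forward pair and picks the one whose predicted end-effector position is closest to the goal:

import numpy as np

def select_pair(goal, pairs):
    """Simulate each (inverse, forward) pair and return the name, command and error
    of the pair whose predicted end-effector position is closest to the goal."""
    best = (None, None, np.inf)
    for name, (inverse, forward) in pairs.items():
        cmd = inverse(goal)                      # candidate motor command
        predicted = forward(cmd)                 # internally simulated outcome
        err = float(np.linalg.norm(predicted - goal))
        if err < best[2]:
            best = (name, cmd, err)
    return best

# Toy usage with stand-in models (the real ones would be learned):
pairs = {
    "bare_hand": (lambda g: 0.9 * g, lambda c: c / 0.9),
    "tool_hand": (lambda g: 0.7 * g, lambda c: c / 0.7 + 0.05),
}
name, command, error = select_pair(np.array([0.3, 0.1, 0.2]), pairs)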
Conference Paper
Full-text available
This paper proposes a plausible approach for a humanoid robot to discover its own body parts based on the coherence of two different sensory feedbacks: vision and proprioception. The image cues of a visually salient region are stored in a visuomotor base together with the level of visuo-proprioceptive coherence. A high coherence between the motions perceived in vision and in proprioception suggests that the visually attended object in the view is correlated to the robot's own motor functions. The robot can then define the motor-correlated objects in the view as its own body parts, without prior knowledge of the body appearance or the body kinematics. The acquired visuomotor base is also useful to coordinate the head and arm posture to bring the hand inside the view, and to recognize it visually. The adaptive body-part perception paradigm is effective when the body is possibly extended by tool use. The visual and proprioceptive processes are distributed in parallel, which allows on-line perception and real-time interaction with people and objects in the environment.
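A minimal way to express the visuo-proprioceptive coherence test is a temporal correlation between the motion of a salient image region and the robot's own joint motion. The Python sketch below is our own simplification, with the correlation measure, the threshold and all names chosen as assumptions:

import numpy as np

def is_own_body_part(region_motion, joint_motion, threshold=0.8):
    """region_motion and joint_motion are 1-D arrays of motion magnitude per frame;
    a high temporal correlation tags the salient region as part of the robot's body."""
    r = float(np.corrcoef(region_motion, joint_motion)[0, 1])
    return r > threshold, r

# Toy usage: a region that moves whenever the arm moves is classified as "self"
t = np.linspace(0, 2 * np.pi, 100)
arm_speed = np.abs(np.sin(t))
region_speed = arm_speed + 0.05 * np.random.randn(100)
own, correlation = is_own_body_part(region_speed, arm_speed)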
Article
Twenty 3- and 5-month-old infants were presented with either a congruent or an incongruent (left/right reversed) on-line view of their own legs on a large TV monitor in two different experimental conditions. In one (no-object) condition, infants viewed their legs, which produced a sound each time they moved. In the other (object) condition, they viewed their legs plus an object target which produced a sound each time it was kicked. Results indicate that from 3 months of age infants tend to reverse their pattern of relative visual attention and leg movement depending on the condition. Confirming previous findings, at both ages infants looked significantly longer and were more active while looking at the incongruent view of their own legs in the no-object condition. In contrast, infants looked significantly longer and were more active while looking at the congruent view of their own legs in the object condition. These observations are interpreted as evidence that, early in the first year of life, infants express a sense of their own body as a perceptually organized entity, which they monitor and control as either an object of exploration or an agent of action in the environment.
Article
full text available here: http://www.frontiersin.org/10.3389/conf.fncom.2011.52.00017/event_abstract
Article
This work contains a theoretical study and computer simulations of a new self-organizing process. The principal discovery is that in a simple network of adaptive physical elements which receives signals from a primary event space, the signal representations are automatically mapped onto a set of output responses in such a way that the responses acquire the same topological order as that of the primary events. In other words, a principle has been discovered which facilitates the automatic formation of topologically correct maps of features of observable events. The basic self-organizing system is a one- or two-dimensional array of processing units resembling a network of threshold-logic units, and characterized by short-range lateral feedback between neighbouring units. Several types of computer simulations are used to demonstrate the ordering process as well as the conditions under which it fails.
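The ordering process described here can be reproduced in a few lines of NumPy. The sketch below is a textbook-style toy rather than the original simulation set-up; the grid size and the linear decay schedules are arbitrary choices:

import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Classical SOM training with time-decaying learning rate and neighbourhood."""
    h, w = grid
    rng = np.random.default_rng(seed)
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            lr = lr0 * (1.0 - step / n_steps)               # decaying learning rate
            sigma = sigma0 * (1.0 - step / n_steps) + 1e-3  # shrinking neighbourhood
            d = np.linalg.norm(weights - x, axis=-1)
            winner = np.unravel_index(np.argmin(d), d.shape)   # best-matching unit
            g = np.exp(-np.sum((coords - np.array(winner)) ** 2, axis=-1) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights

# Toy usage: map a 3-D uniform distribution onto a 10x10 grid
weights = train_som(np.random.rand(500, 3))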
Article
We present in this paper a variation of the self-organising map algorithm in which the original time-dependent (learning rate and neighbourhood) learning function is replaced by a time-invariant one. This allows for on-line and continuous learning on both static and dynamic data distributions. One of the properties of the newly proposed algorithm is that it does not fit the magnification law: the achieved vector density is not directly proportional to the density of the distribution, as found in most vector quantisation algorithms. From a biological point of view, this algorithm sheds light on cortical plasticity seen as a dynamic and tight coupling between the environment and the model.
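The time-invariant update can be sketched as follows; the snippet reflects our reading of the DSOM rule in Rougier et al. [1], with a constant learning rate and an elasticity parameter that scales the neighbourhood by the winner's quantisation error, while the array layout and parameter values are assumptions:

import numpy as np

def dsom_step(weights, coords, x, epsilon=0.1, elasticity=2.0):
    """One on-line, time-invariant update: the neighbourhood width scales with the
    winner's quantisation error instead of decaying with time."""
    dist = np.linalg.norm(weights - x, axis=-1)            # data-space distances
    winner = np.unravel_index(np.argmin(dist), dist.shape)
    d_win = dist[winner]
    if d_win < 1e-12:                                      # the winner already matches: no update
        return weights
    grid_d2 = np.sum((coords - np.array(winner)) ** 2, axis=-1)
    g = np.exp(-grid_d2 / (elasticity ** 2 * d_win ** 2))  # time-invariant neighbourhood
    weights += epsilon * dist[..., None] * g[..., None] * (x - weights)
    return weights

# Toy usage: one update of an 8x8 map over 2-D inputs
rng = np.random.default_rng(1)
W = rng.random((8, 8, 2))
C = np.stack(np.meshgrid(np.arange(8), np.arange(8), indexing="ij"), axis=-1)
W = dsom_step(W, C, rng.random(2))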