IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 3, NO. 1, MARCH 2011
Implicit Sensorimotor Mapping of the Peripersonal
Space by Gazing and Reaching
Eris Chinellato, Member, IEEE, Marco Antonelli, Beata J. Grzyb, and Angel P. del Pobil
Abstract—Primates often perform coordinated eye and arm
movements, contextually fixating and reaching towards nearby
objects. This combination of looking and reaching to the same
target is used by infants to establish an implicit visuomotor
representation of the peripersonal space, useful for both oculo-
motor and arm motor control. In this work, taking inspiration
from such behavior and from primate visuomotor mechanisms,
a shared sensorimotor map of the environment, built on a ra-
dial basis function framework, is configured and trained by the
coordinated control of eye and arm movements. Computational
results confirm that the approach seems especially suitable for the
problem at hand, and for its implementation on a real humanoid
robot. By exploratory gazing and reaching actions, either free
or goal-based, the artificial agent learns to perform direct and
inverse transformations between stereo vision, oculomotor, and
joint-space representations. The integrated sensorimotor map that
allows the peripersonal space to be represented contextually through
different vision and motor parameters is never made explicit, but
rather emerges thanks to the interaction of the agent with the environment.
Index Terms—Eye–arm coordination, humanoid robots, radial
basis function networks, self-supervised learning, spatial
Humans and other primates build their perception of the
surrounding space by actively interacting with nearby
stimuli, mainly looking and reaching at them. Through active
exploration, they construct a representation of the environment
useful for further interactions. The main sensory information
used to build such representation is retinotopic (visual data) and
proprioceptive (eye, neck, and arm position). A critical issue in
Manuscript received February 15, 2010; revised June 06, 2010; accepted De-
cember 23, 2010. Date of publication January 28, 2011; date of current version
March 16, 2011. This work was supported in part by the European Commis-
sion’s Seventh Framework Programme FP7/2007-2013, under Grant 217077
(EYESHOTS project), by the Ministerio de Ciencia y Innovación (DPI-2008-
06636, FPU Grant AP2007-02565, and FPI Grant BES-2009-027151), by the
Fundació Caixa-Castello-Bancaixa (P1-1B2008-51), and by the World Class
by the Ministry of Education, Science, and Technology (Grant R31-2008-000-
E. Chinellato, M. Antonelli, and B. J. Grzyb are with the Robotic Intelligence
Laboratory, Jaume I University, Castellón de la Plana 12071, Spain (e-mail:
email@example.com; firstname.lastname@example.org; email@example.com).
A. P. del Pobil is with the Robotic Intelligence Laboratory, Jaume I Univer-
sity, Castellón de la Plana, 12071, Spain, and he is also with the Department
of Interaction Science, Sungkyunkwan University, Seoul, South Korea (e-mail:
Color versions of one or more of the figures in this paper are available online
Digital Object Identifier 10.1109/TAMD.2011.2106781
this process is to coordinate movements and associate sensory
inputs, in order to obtain a coherent mental image of the envi-
ronment. Indeed, eye and arm movements often go together, as
we fixate an object before, or while, we reach it. Such combina-
tion of looking and reaching towards the same target is used to
establish a consistent, integrated visuomotor representation of
the peripersonal space.
Areas within the dorsal visual stream of the primate
brain, and more precisely, regions of the posterior parietal
cortex (PPC), are in charge of performing the reference frame
transformations required to map visual information to appro-
candidates for the role of accessing and updating a visuomotor
representation of the reachable space. It is often argued, and
increasingly accepted by the neuroscientific community, that
such an ability is achieved through the use of gain fields and basis
function representations, which make it possible to represent
stimuli simultaneously in various reference frames. In fact, the basis
function approach has the attractive feature that both head-centric
representations for arm movements and retino-centric representations
for gaze movements can be encoded concurrently in the
same neural map. In the basis function framework, explicit
encoding of targets in retino-centric coordinates is enhanced via
gain fields to hold in parallel an implicit encoding in other
reference frames. Such gain fields are found in the retino-centrically
organized eye movement areas lateral intraparietal sulcus (LIP)
and frontal eye field (FEF) and, most importantly, in
posterior parietal area V6A, as explained in Section II.
separate effectors that receive motor control via different move-
ment vectors associated to specific spatial representations, i.e.,
bined to form a unique, shared visuomotor map of the periper-
sonal space. The exploratory behavior of the robot is based on
a functional model of the tasks performed by the primate pos-
terior parietal cortex. The main building block of such a model
is a basis function framework that associates different reference
frames, giving them mutual access to each other, during plan-
ning, execution and monitoring of eye and arm movements.
We implement such a framework by relying upon findings from
human and monkey studies, especially from data on gaze direction
and arm reaching movements in monkey area V6A.
Our system should finally be able to achieve a visuomotor
the practical interaction with the environment, using both stere-
optic visual input and proprioceptive data concerning eye and
arm movements. Following this approach, the robot should nat-
urally achieve very good open-loop reaching and saccade capa-
1943-0604/$26.00 © 2011 IEEE
Fig. 1. Conceptual schema of the space representation map, with afferent and
resentation is generated and updated. The peripersonal space is
represented by a plastic map, which constitutes both knowledge of
the environment and a sensorimotor code for performing movements
and evaluating their outcome. The map is accessed and
modified by two types of information: retinotopic (visual) and
arm motor plans are devised in accordance with the map itself.
This mechanism allows us to keep both eye and arm targeting
in register and to establish a common spatial representation of
on a common code for spatial awareness obtained by a learning
procedure based on target errors of eye or arm movements to vi-
should be able to purposefully build such visuomotor map of
the environment in 3-D, simultaneously learning to look at and
reach towards different visual targets.
The perception of the space and the related sensorimotor map
are thus accessed and updated by visuomotor interaction, e.g.,
moving the gaze and the arm toward a goal position. More
interestingly, the agent has to be able to keep learning during
its normal behavior, by interacting with the world and contextually
updating its representation of the world itself. Such a goal
can be achieved by adjusting the weights of the basis functions
according to the errors observed in the oculomotor and
reaching movements as guided by the same basis functions. The
concurrence of eye and arm movements should be sufficient
to provide the appropriate error signals even when actual distances
between target and final position of the action are not
explicitly provided. We could call this approach a “self-supervised
learning” framework, in which the different modalities
supervise each other, and eye and arm movements both
improve, and obtain together a precise visuomotor representa-
that the target object has been reached, the actual gazing and
the 3-D visuomotor representation accordingly updated if nec-
essary. The presence of a tactile response, as feedback for the
correct execution of a reaching movement, provides a reliable
“master” signal to ensure the accuracy of the global representation.
The results presented in Section IV refer to the computational
implementation of the model, and we are currently
working on its development on our humanoid robot setup (see
This research builds on neuroscience findings and insights,
computational intelligence concepts and techniques, and engi-
neering goals and constraints posed by the robotic implemen-
tation. In this section we introduce the fundamental inspiring
concepts and relevant literature for each of these fields.
A. Sensorimotor Transformations in the Posterior Parietal Cortex
elaborates visual data with the main purpose of endowing the
subject with the ability of interacting with his/her environment,
and its tasks are often synthesized as “vision for action.” The
latter is dedicated to object recognition and conceptual pro-
cessing, and thus performs “vision for perception.” Although
the interaction between the two streams is necessary for most
everyday tasks, dorsal stream areas are more strictly related to
the planning and monitoring of reaching and grasping actions.
In fact, dorsal visual analysis is driven by the absolute dimension
and location of target objects, requiring continuous
transformations from retinal data to effector-based frames of
reference. The frame of reference used for arm movements is
body-centered, e.g., fixed on the shoulder. Considering a dex-
terous primate, or a humanoid robot endowed with a pan-tilt-
vergence head, head movements are also body-centered, whilst
gaze movements are head-centered, usually referred to the cyclopean
eye, an ideal midpoint between the eyes. Visual information
is retinocentric, and there are two different retinal reference
frames for a stereo system. All these different reference frames
are brought in register to each other by coordinated movements
to the same target.
The ventral stream maintains instead a contextual coding of
objects in the environment, based on their identity and meaning.
Spatially, such coding can be defined as object-centered, as it
is mainly concerned with the relative location of objects with
respect to each other. This sort of coding is not used at this stage
of our research.
The hypothesis of parallel visuomotor channels within the
dorsal stream dedicated respectively to the transport and the
preshaping components of the reach-to-grasp action is well
recognized. Anatomically, these two channels both fall inside
the dorsal stream, and are sometimes named dorso–medial and
dorso–lateral visuomotor channels. As for proximal joint
movements, the focus of interest of this research, and
according to a well established nomenclature, the most important
reach-related cortical areas are V6A and medial intraparietal
sulcus (MIP), both receiving their main input from V6 and
projecting to the dorsal premotor cortex.
Considering the functional role of the dorso–medial stream,
information regarding eye position and gaze direction is very
likely employed by area V6A in order to estimate the position
of surrounding objects and guide reaching movements toward
them. Two types of neurons have been found in V6A that allow
us to sustain this hypothesis. The receptive fields of neurons
of the first type are organized in retinotopic coordinates,
but they can encode spatial locations thanks to gaze modulation.
The receptive fields of the second type of neurons are organized
according to the real, absolute distribution of the subject's
peripersonal space. In addition, V6A contains neurons that
arguably represent the target of reaching retinocentrically, and
others that use a spatial representation. This strongly suggests
a critical role of V6A in the gradual transformation from
a retinotopic to an effector-centered frame of reference. More-
charge of performing the visuomotor transformations required
for the purposive control of proximal arm joints, integrating vi-
sual, somatosensory, and somatomotor signals in order to reach
a given target in the 3-D space.
B. The Basis Function Approach to Sensorimotor Transformations
Basis functions are building blocks that, when combined
linearly, can approximate any nonlinear function, such as those
required to map between different neural representations of the
peripersonal space (retinotopic, head-centered, arm-centered).
Basis function networks have been proposed as a computational
solution especially suitable for modeling the kind of sensorimotor
transformations performed by the posterior parietal
cortex. Networks of suitable basis functions
are in fact able to naturally reproduce the gain-field effects
often observed in parietal neurons. It was suggested that
positions of objects in the peripersonal space are coded through
the activity of parietal neurons that act as basis functions, and
any coordinate frame can be read out from such population
coding according to the task requirements.
Several different transfer functions can be used as basis func-
interaction is also nonlinear (e.g., product versus sum), and that
for their convenience and biological plausibility, are Gaussian
and sigmoid functions. For example, retinotopic maps are often
modeled by Gaussian basis functions, and eye position by sig-
moid, or logistic, functions . Learning in basis function net-
the output representation. The first step is usually unsupervised,
the second depends on errors observed during the sensorimotor
interaction with the world.
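As a minimal illustration of the scheme outlined in this subsection, the following Python sketch builds a basis function unit as the product of a Gaussian tuning to retinal position and a sigmoid tuning to eye position, so that eye position scales the retinal response (a gain field) without moving its peak. The one-dimensional setting, the product form, and all names and parameter values (`gaussian`, `sigmoid`, `basis_unit`, spreads, slopes) are assumptions made for illustration, not the configuration of the model described in this paper.

```python
import math


def gaussian(retinal_pos, center, sigma):
    """Gaussian tuning to retinal position (degrees)."""
    return math.exp(-((retinal_pos - center) ** 2) / (2.0 * sigma ** 2))


def sigmoid(eye_pos, threshold, slope):
    """Logistic tuning to eye position (degrees)."""
    return 1.0 / (1.0 + math.exp(-(eye_pos - threshold) / slope))


def basis_unit(retinal_pos, eye_pos, center=0.0, threshold=0.0,
               sigma=5.0, slope=5.0):
    """Gain-modulated unit: retinal Gaussian multiplied by eye-position sigmoid.

    All default parameter values are hypothetical.
    """
    return gaussian(retinal_pos, center, sigma) * sigmoid(eye_pos, threshold, slope)


def preferred_retinal_position(eye_pos, candidates):
    """Retinal position (among candidates) that drives the unit most strongly."""
    return max(candidates, key=lambda r: basis_unit(r, eye_pos))


retinal_axis = [float(r) for r in range(-20, 21)]

# The retinal tuning peak does not move with eye position...
peak_left = preferred_retinal_position(-10.0, retinal_axis)
peak_right = preferred_retinal_position(10.0, retinal_axis)

# ...but the response amplitude at that peak does (the gain field).
gain_left = basis_unit(0.0, -10.0)
gain_right = basis_unit(0.0, 10.0)
```

A linear readout over a population of such units, with different centers and thresholds, is what would allow several reference frames to be decoded from the same activity pattern.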
C. Connectionist Sensorimotor Transformations in Robotics
Although the use of artificial neural networks in robotics is
widespread and not at all novel, only a few works concern
visuomotor transformations involving arm movements, and the
coordinated control of gazing and reaching movements is
especially rare. Visuomotor arm control has usually been tackled
with the use of self-organizing maps (SOM). Some works
apply them to the coordination between visual information
and arm control, modeling two cameras (although not in a classical
stereo head configuration) and three-degree-of-freedom
manipulators. Neither of the above papers considers eye movements,
and their applicability to real robot setups is limited,
respectively by the huge number of required learning steps,
and by the discrete sampling of the space, which makes any
target reachable only up to a certain precision, and only after
a number of steps dependent on the sampling. A recent extension
of these works makes use of alternative SOM maps
linked to different cameras, in order to deal with occlusion, and
also takes into account the issue of obstacle avoidance. Especially
interesting is the flexibility of their system to changes in
the geometry of the effector, which we are able to reproduce
with our approach. Fuke et al. have used a SOM to model
the relation between arm movements and the perception of a
subject's own face, as supposedly performed by area ventral
intraparietal sulcus (VIP) of the primate brain.
Whilst the use of SOM networks is hence relatively common,
the employment of biologically inspired radial basis function
(RBF) networks remains relatively unexplored. Although RBF
networks have been successfully applied to the computation of inverse
kinematics, alone or together with SOM, to the best
of our knowledge only two papers describe the use of RBF
networks for visuomotor transformations. The system of
Marjanovic et al. first learns the mapping between image
coordinates and the pan/tilt encoder coordinates of the eye motors
arm position (the ballistic map). A similar learning strategy is
employed by Sun and Scassellati, who use the difference
vector between the target and the hand position in the eye-cen-
stages. Despite the similarity of their approaches to ours, some
major differences can be pointed out: first of all, we exploit
stereo vision, realizing a coordinated control of vergence and
version movements, moreover, the saccade map in  is fixed
and mainly used to provide visual feedback during the ballistic
map learning. On the other hand, our sensorimotor transformations
are bidirectional, so that our system learns to gaze towards
its hand but also to reach where it is looking. This skill is
trained through a self-supervised learning framework, in which
the different modalities supervise each other, and both improve
contextually their mapping of the space. The distribution of the
RBF centers also differs from the cited works, as we place the
neural receptive fields according to findings from neurophysio-
logical studies on monkeys.
A few attempts to tackle the problem of coordinate control of
gazing and arm movements by using neural networks, but not
forward neural network for learning to saccade toward targets,
and a recurrent neural network is employed for executing the
transformation carrying from the visual input to an appropriate
arm posture, suitable for reaching and grasping a target object.
The reaching model of Nori et al. consists in learning a
motor–motor map to direct the hand close to the fixated object,
and then activating a closed loop controller that, using the visual
distance between the hand and the target, improves reaching accu-
the importance of contextually maintaining a series of represen-
tations in different body reference frames, as suggested by neu-
roscience findings, especially those regarding posterior parietal
III. SENSORIMOTOR MAPPING OF THE PERIPERSONAL SPACE:
As mentioned in Section I and considering the theoretical
description of Section II, the main sources of inspiration for
our model are the basis function approach and neuroscience
experiments on the role of posterior parietal cortical
areas (especially V6A) during gazing and reaching actions.
The use of robot hardware constitutes a possible complication
in the realization of a model of cortical mechanisms, and
some issues that would easily be solved in simulated environments
have to be dealt with more accurately considering the
real world implementation. The approach we follow is epigenetic,
the robot being endowed with an innate knowledge of
how to move in its environment, which is later developed and
customized through exploration and interaction with visual and
tactile stimuli. Following this idea, all transformations are first
implemented on a computational model, whose final configuration
represents the genetic component of the developmental
process, and is then used as a bootstrap condition for the actual
experimental learning process by the robot.
Although in principle one representation should be enough
for all the required transformations, the number of neurons nec-
essary to contextually code for
the size of the signals to the power of
a representation maintaining both eye visual and proprioceptive
feasible, even for the brain itself. A more logical structure is one
in which a central, body-centered representation is accessed and
updated both by limb sensorimotor signals on the one hand and
visual and oculomotor signals on the other hand (see Fig. 2).
Indeed, this seems to be how the problem is solved within the
brain, in which different areas or populations of neurons in the
same areas are dedicated to different transformations. Most im-
portantly, this approach is consistent with the findings related to
position of targets, others that code for their limb-related posi-
tion, and even others that seem to maintain both codings and
look thus especially critical for performing sensorimotor trans-
formations. In this way, different representations of the same
target can be maintained contextually, and used to act on the
target if required. It is the relation between these representa-
tions that is accessed and modified by a conjunction of gaze and
reach movements to the same target. The global structure of our
model follows these principles, and is thus modular, separating
the retinal to body-centered transformation and the body-cen-
The quality and the nature of the sensory stimuli to be introduced
in the schema are quite varied, as the process of 3-D localization
requires the integration of information coming from
various sources and different modalities. Such integration can
be modeled with different levels of detail and considering alter-
native data sources and formats. In our case, we include visual
information about potential targets and proprioceptive data on
eye position and arm position. Several possible alternatives for
representing the above information can be employed. Among
possible alternatives for representing binocular information we
different signals is given by
. It is easy to see that
Fig. 2. Building blocks of the global space representation. The central
body-centered map constitutes the integrated sensorimotor representation of
the peripersonal space.
favor the composition of a cyclopean image representation with
a disparity map (under the assumption that the correspondence
problem is already solved), over the option of having separate
left and right retinotopic maps (see left side of the schema of
Fig. 2). Similarly, considering that we are modeling extrastriate
and associative visual areas, it is plausible to assume that gazing
direction is represented by version and vergence angles instead
form ocular movements and stereoptic visual information to
a body/head-centered reference frame and also, when needed,
visual target. On the right hand of the conceptual schema of
Fig. 2 we find the somatosensory/arm–motor map, required to
code arm movements. Such a map is modified by proprioceptive
and tactile feedback, and allows reaching actions to be executed
toward visual or remembered targets. The integrated map, built from
the two sides of the schema, is thus accessed and updated on
demand, as described in Section IV-A.
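The version/vergence coding of gaze direction mentioned above can be illustrated with a short Python sketch for a target in the horizontal plane. The geometry is a common textbook convention and is an assumption here: version is taken as the mean of the two eye rotations, vergence as their difference, and the `baseline` value, the coordinate frame, and the function names are hypothetical rather than taken from the robot setup.

```python
import math


def gaze_angles(x, z, baseline=0.07):
    """Version and vergence (radians) for fixating point (x, z).

    x is the lateral offset and z the depth (meters) measured from the
    cyclopean eye; the two eyes are assumed to sit at x = -baseline/2
    and x = +baseline/2.
    """
    left = math.atan2(x + baseline / 2.0, z)    # left eye rotation
    right = math.atan2(x - baseline / 2.0, z)   # right eye rotation
    version = (left + right) / 2.0              # conjugate component
    vergence = left - right                     # disconjugate component
    return version, vergence


def depth_from_vergence(vergence, baseline=0.07):
    """Fixation depth for a target straight ahead (version = 0)."""
    return (baseline / 2.0) / math.tan(vergence / 2.0)


version_near, vergence_near = gaze_angles(0.0, 0.3)
version_far, vergence_far = gaze_angles(0.0, 0.6)
recovered_depth = depth_from_vergence(vergence_near)
```

Under these assumptions a nearer target yields a larger vergence angle, which is why vergence together with version can stand in for a head-centered position.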
The exploration of the environment through saccades and
reaching movements constitutes the basic behavior that is
employed to build the visuomotor representation of the periper-
through subsequent, increasingly complex interactions. The
learning sequence is inspired by infant development.
As a first step, the system learns the association between
retinal information and gaze direction (i.e., proprioceptive eye
position). This can be done simply by successive foveation
on salient points of the binocular images. The subject looks
around and focuses the eyes on certain stimuli, thus learning
the association between retinal information and vergence and
version parameters. Then, gaze direction is associated to arm
position, e.g., moving the arm randomly and following it with
the gaze, so that each motor configuration of the arm joint is
associated to a corresponding configuration of the system for
eye motor control. In this case, proprioceptive information
regarding arm position is included in the computation, and the
vectors corresponding to reaching movements can be extracted
similarly to what is done for ocular movements. This process
makes the subject learn a bidirectional link between different
sensorimotor systems. The subject can look where its hand is
but also reach a point in space it is looking at. Later on, visual
targets are shown to the system, which is required to perform
Fig. 3. Computational framework of the visuomotor integration model. Two transformations allow a stimulus to be coded contextually in visual, oculomotor, and
arm–motor frames of reference.
both saccadic and arm reaching movements toward them. This
requires the use of both direct and inverse transformations,
and allows the sensorimotor representation of the
space to be fine-tuned. Tactile feedback can be used as a master signal, for
confirming that the target has been reached, making the whole
process substantially self-supervised.
IV. SENSORIMOTOR TRANSFORMATIONS WITH
RADIAL BASIS FUNCTIONS
So far, the model has been implemented in a simulated environment,
always taking into account the final application on
our humanoid robotic setup, currently under development. The
computational framework is depicted in Fig. 3, which is a sim-
plification of the conceptual schema of Fig. 2, in which the neck
is fixed, and thus body-centered corresponds to head-centered.
Also, there is no tactile feedback for the moment, and the con-
trol of arm movements is based purely on proprioception.
The visual input regarding a potential target is expressed with
its location in a cyclopean visual field accompanied by information
on binocular disparity; the output is the corresponding
head/body-centered representation, consisting of a potential vergence/
version movement required to foveate on the target. This trans-
basis functions, is used instead to maintain a contextual coding
of stimuli in both a body-centered and an effector based frame
of reference. It is used to recode oculomotor coordinates in arm
joint space and vice versa.
Each of the two codings of the space corresponds to a potential
movement, so that, thanks to this second transformation, the
agent is able to reach where it is looking (direct transformation)
and to foveate on the position of the hand (inverse
transformation). If one of the potential motor signals is not released,
eye and arm movements can be decoupled, i.e., the system can
for example reach a peripheral visual target without directing
the gaze toward it, only using the body-centered representation
as an intermediate step to recode visual input to arm motor re-
sponse. Similarly, arm movements can also be planned but not
executed, e.g., waiting for a cue signal in an experimental pro-
tocol of delayed reaching. Details regarding the computational
implementation of the two transformations are given next.
A. Visual to Oculomotor Transformation
with both eyes, in order to associate appropriate version and
vergence movements to retinal locations. Either left and right
retinal images or a cyclopean visual field accompanied by a dis-
parity map can be used as visual input, and we employed the
latter. Since visual processing is not the focus at this stage of
development, visual stimuli are just point-like features, similar
to the LEDs used in monkey experiments. The transformation
was implemented with an RBF network, for the theoretical rea-
sons explained above.
We tested Gaussian and sigmoid neural activation functions,
for both cyclopean visual input and disparity, and tried the
corresponding nets with different spreads. The best performance was
achieved for Gaussian-shaped units for both inputs, described
by the following equation, which is the vectorial equivalent of a
single variable Gaussian function
$$h_i(\mathbf{x}) = \exp\left(-(\mathbf{x}-\mathbf{c}_i)^{T}\,\Sigma_i^{-1}\,(\mathbf{x}-\mathbf{c}_i)\right) \qquad (1)$$
where $\mathbf{x}$ is the input vector, $\mathbf{c}_i$ the center of the
$i$th bidimensional basis function, $\Sigma_i$ the diagonal matrix with
the spreads of the unit in each of its dimensions, and $h_i(\mathbf{x})$ the
resulting activation of the $i$th unit. The two outputs of the network,
i.e., vergence and version movements required to foveate
on the given input, are computed by a linear combination of the
basis functions, according to
$$y_j = \sum_{i} w_{ij}\, h_i(\mathbf{x}) \qquad (2)$$
where $y_j$ is the $j$th output and $w_{ij}$ are
the weights that best fit input with output datasets.
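The Gaussian units and the linear readout described above can be sketched in a few lines of Python. This is only a toy instance under stated assumptions: the diagonal spread matrix is represented by one spread per input dimension, the exponent carries no additional scaling factor, and the three-unit network with its centers, spreads, and weights is entirely hypothetical.

```python
import math


def rbf_activation(x, center, spreads):
    """Gaussian unit with one spread per input dimension (diagonal matrix).

    Assumption: the exponent is the sum of squared, spread-normalized
    distances, with no extra scaling factor.
    """
    return math.exp(-sum(((x_k - c_k) / s_k) ** 2
                         for x_k, c_k, s_k in zip(x, center, spreads)))


def network_output(x, centers, spreads, weights):
    """Linear readout: output j is the weighted sum of all unit activations."""
    h = [rbf_activation(x, c, s) for c, s in zip(centers, spreads)]
    n_outputs = len(weights[0])
    return [sum(weights[i][j] * h[i] for i in range(len(h)))
            for j in range(n_outputs)]


# Hypothetical toy network: 3 units over (cyclopean position, disparity),
# 2 outputs standing for (vergence, version).
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
spreads = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
weights = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.0]]

at_center = rbf_activation((0.0, 0.0), centers[0], spreads[0])
outputs = network_output((0.0, 0.0), centers, spreads, weights)
```

Because the readout is linear in the weights, the weights can be fitted either in batch or incrementally once the centers and spreads are fixed.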
We decided to employ fixed centers, whose receptive fields
cannot move according to the input data, favoring biological
plausibility over potentially better performance. For this reason,
we distributed the radial basis functions according to a
retinotopic-like criterion (input to V6A is, at least partly, retinotopic),
following a logarithmic distribution of the centers. As for
cyclopean visual input, a logarithmic organization of
the neural receptive fields is suitable for modeling foveal
magnification, whilst for disparity it corresponds to a finer coding
for smaller disparities, actually observed in the primate visual
cortex. It is hence not surprising that the logarithmic organ-
good as by using a homogeneous distribution (0.10 mm against
is the number of units,is the value of
is the weight that connects the th RBF
Fig. 4. Distribution and spread of the radial basis functions of the visual to
oculomotor network on the cyclopean/disparity space.
1.1 mm). For setting the number of neurons, we defined an
arbitrary threshold of 0.10 mm error for
that was achieved by a 7 × 7 neural lattice in the
cyclopean/disparity input space (thus
The training points are constituted by input–output pairs,
and are provided by the simulated execution of
saccadic movements and the estimation of the target visual
displacement. More exactly, an oculomotor behavior is simulated
in which the agent: 1) is fixating a given point in space
close to a visual stimulus; and 2) performs a random ocular
movement toward a different location. The input–output pairs
are composed of the visual position (cyclopean-disparity) of
the initial stimulus as seen from the current ocular position, and
the vergence–version components of the performed movement.
This solution allows a visual to oculomotor mapping to be learned
without previous focusing skills. The underlying assumption
is that of a rich visual environment, in which visual stimuli
are always available close to the ocular fixation point. This
assumption is reasonable for biological agents, and useful for
the modeled system, but the robot has to follow a different
strategy, as explained in Section V, due to its limited visual
skills in dealing with cluttered environments.
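The fixed, logarithmically distributed lattice of centers discussed earlier in this subsection can be sketched as follows. The sketch only assumes a symmetric axis with a center at zero and geometrically growing eccentricities on either side; the values of `smallest` and `largest`, the use of the same axis for both input dimensions, and the function names are hypothetical and not the parameters used in the model.

```python
from itertools import product


def log_axis(n_side, smallest, largest):
    """Symmetric axis: 0 plus n_side log-spaced positions on either side."""
    ratio = (largest / smallest) ** (1.0 / (n_side - 1))
    positive = [smallest * ratio ** k for k in range(n_side)]
    return [-p for p in reversed(positive)] + [0.0] + positive


def log_lattice(n_side=3, smallest=1.0, largest=16.0):
    """Centers of a (2*n_side+1) x (2*n_side+1) lattice.

    The two axes stand for cyclopean position and disparity; using the
    same spacing on both is an assumption made for brevity.
    """
    axis = log_axis(n_side, smallest, largest)
    return list(product(axis, axis))


axis = log_axis(3, 1.0, 16.0)
centers = log_lattice()

# Distance between neighbouring centers on the positive half of the axis:
# it grows with eccentricity, giving a finer coding near zero.
gaps = [b - a for a, b in zip(axis[3:], axis[4:])]
```

With three positions on each side of zero, the axis has seven centers and the lattice has the 7 × 7 layout mentioned in the text.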
An initial setting of the weights is done employing the batch
learning technique of linear pseudo-inverse solution, typical for
RBF networks 
; see Fig. 4).
neuron for all inputs
matrix. We tested this first learning step on datasets of different
size, obtaining the error evolution depicted in Fig. 5. Looking
for a trade-off between performance and size of the dataset, we
of training points to employ in this stage of the process.
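The batch initialization of the weights can be illustrated with a small least-squares sketch in plain Python. It solves the normal equations by Gauss-Jordan elimination, which coincides with the pseudo-inverse solution only when the activation matrix has full column rank; that restriction, the names `H`, `Y`, and `batch_weights`, and the toy data are assumptions for illustration, not the authors' implementation.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def solve(a, b):
    """Solve a @ x = b (square a, matrix b) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def batch_weights(H, Y):
    """Least-squares weights W minimizing ||H W - Y||.

    H: one row of unit activations per training input.
    Y: one row of desired outputs per training input.
    """
    Ht = transpose(H)
    return solve(matmul(Ht, H), matmul(Ht, Y))


# Toy data: 3 training inputs, 2 units, 1 output.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[2.0], [3.0], [5.0]]
W = batch_weights(H, Y)
```

In this toy case the data are exactly consistent, so the recovered weights reproduce all three training outputs.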
To simulate the sort of learning process that would be performed
by the robot, we executed a step by step learning sequence
on the next 200 points. This second learning stage is performed
is a matrix containing the output of each RBF
, while is the output
as the number
of different size.
by applying the delta rule gradient descent technique, to update
the weights of (2) on each time step
actual output of the network,
learning rate (which we maintain constant). The use of an incremental
learning rule such as in (4), instead of a batch learning rule
as in (3), keeps the system flexible to possible changes
in visual accuracy and body kinematics. In principle, applying
the delta rule should allow the system to adapt to unavoidable hardware
asymmetries, image distortions, and also to deceptive sensory
information, as described below.
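The incremental stage can be sketched as a standard delta rule update on the readout weights. The form used below (each weight changes by the learning rate times the output error times the unit activation) is the usual textbook rule and is assumed here; the function names, the learning rate, and the toy activations are hypothetical.

```python
def predict(weights, h):
    """Network outputs for unit activations h (weights[i][j]: unit i, output j)."""
    n_outputs = len(weights[0])
    return [sum(weights[i][j] * h[i] for i in range(len(h)))
            for j in range(n_outputs)]


def delta_rule_step(weights, h, expected, learning_rate):
    """One in-place update; returns the output errors before the update."""
    actual = predict(weights, h)
    errors = [e - a for e, a in zip(expected, actual)]
    for i, h_i in enumerate(h):
        for j, err in enumerate(errors):
            weights[i][j] += learning_rate * err * h_i
    return errors


# Toy run: two units, one output, the same sample presented repeatedly.
weights = [[0.0], [0.0]]
h = [1.0, 0.5]
expected = [1.0]
history = [abs(delta_rule_step(weights, h, expected, 0.5)[0])
           for _ in range(20)]
```

On a single repeated sample the error shrinks by a constant factor at every step, which is the behavior that lets the weights track slow changes in the sensorimotor mapping.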
The inverse transformation from oculomotor to visual data
has also been implemented, using the same parameters as the
direct one. Its role is to estimate the expected retinal position
of a fixated point after a given saccadic vergence
movement. We plan to use it in the real robot to detect possible
discrepancies between expected and observed visual feedback.
We considered it most plausible for this transformation to employ
the same parameters as the direct one, such as the retinal
organization of the centers, and probably for this reason it is not as
precise as the visual to oculomotor transformation (the average
error is about 0.5 mm).
To validate the model, we are comparing its behavior with
some psychophysical effects described in the literature
regarding the tasks it executes. For example, we are checking the
model behavior in the case of deceptive visual feedback,
such as in typical experiments of saccadic adaptation. This
is done by eliciting a saccade (based on vergence/version eye
movement control) toward a given visual target, and providing
a fictitious error on the final reached position. For the
computational model, this is achieved by adding an offset to the
output. On the robot, the same effect will be obtained by moving
is the output of the neurons for input, is the
thethe expected output and
CHINELLATO et al.: IMPLICIT SENSORIMOTOR MAPPING OF THE PERIPERSONAL SPACE BY GAZING AND REACHING 49
Fig. 6. Gazing and reaching schema. At each training step the artificial agent,
either model or robot, is required to move its hand and gaze toward the same
point, and update its sensorimotor representation using the observed error.
the visual target as for human subjects. Analysis of how (as in
the saccadic adaptation protocol) such artificial displacement of
the target affects the artificial agent oculomotor and arm motor
abilities can serve as a validation of the underlying model,
and may help in advance hypotheses on saccadic adaptation
mechanisms in humans and monkeys. So far, we were able to
verify that our model do exhibit saccadic adaptation, altering its
ability to perform correct saccades according to the deceptive
feedback. The analysis of error distributions around the target
point and of error vectors is also providing interesting informa-
tion that we are currently studying with more detail, together
with collaborators from cognitive sciences.
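The deceptive-feedback test can be sketched as follows. This is only a toy 1-D illustration under stated assumptions (identity visual-to-oculomotor gain, five Gaussian centers, a fixed offset of 0.1 in arbitrary units); none of these values come from the experiments described here. The network is first trained normally, then the same target is presented repeatedly while a fictitious offset is added to the expected output.

```python
import math

CENTERS = [i / 4.0 for i in range(5)]   # hypothetical 1-D layout on [0, 1]
SPREAD = 0.25
RATE = 0.1

def activations(x):
    """Gaussian response of every basis function to input x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * SPREAD ** 2)) for c in CENTERS]

def predict(weights, x):
    return sum(w * h for w, h in zip(weights, activations(x)))

def train_step(weights, x, expected):
    """One delta rule update toward the (possibly deceptive) expected output."""
    h = activations(x)
    error = expected - sum(w * a for w, a in zip(weights, h))
    return [w + RATE * error * a for w, a in zip(weights, h)]

# 1) Baseline: learn a toy visual-to-oculomotor map (identity gain, assumed).
weights = [0.0] * len(CENTERS)
for step in range(3000):
    x = (step * 0.37) % 1.0
    weights = train_step(weights, x, expected=x)

# 2) Adaptation: same target on every trial, feedback shifted by an offset.
target, probe, offset = 0.5, 0.9, 0.1
before_target, before_probe = predict(weights, target), predict(weights, probe)
for trial in range(100):
    weights = train_step(weights, target, expected=target + offset)

shift_target = predict(weights, target) - before_target
shift_probe = predict(weights, probe) - before_probe
print(f"shift at adapted target: {shift_target:.4f}, at probe: {shift_probe:.4f}")
```

In this sketch the response at the adapted target converges to the shifted value, while a nearby probe position shifts by a smaller amount because it shares only part of the active basis functions. Comparing such shifts across positions is one way to inspect the error distributions mentioned above.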
B. Oculomotor to Arm–Motor Transformation
In this learning phase, arm movements are introduced, as
exemplified in Fig. 6. This phase is further subdivided into two
stages, free and goal-based respectively. The free exploration
[...] toward the final hand position, which allows the agent to
learn the transformation from joint space to oculomotor space and
vice versa. To [...] work in gazing movements is required, so the
second transformation can only be trained after the first one. In
the goal-oriented exploration a target object in space has to be
foveated and [...]
The choice of how to distribute the basis function neurons is
less straightforward for this second network. Automatic placing
[...] we favor biological plausibility over performance. Our main
inspiration comes from neuroscience findings regarding the
posterior parietal cortex, and especially area V6A. In a previous
work, we showed that a population of V6A neurons is properly
modeled by a basis function approach. As anticipated, this area
includes neurons having only visual response, neurons apparently
involved mainly in motor actions, and mixed neurons, activated
in all phases of sensorimotor processes. With our model we
wanted to check what computational advantages could be given
[...] two arm joints were used, and no tilt movements of the eyes,
so that the accessible environment is a 2-D space placed
horizontally in front of the subject, as in Fig. 6. This is in
any case consistent with most of the monkey experiments in which
activity in V6A was recorded.

Fig. 7. Mapping of the space according to uniform distributions in a vergence/
version oculomotor space (red), in a J1/J2 joint space (cyan), and in a standard
horizontal/depth Cartesian space (green).
At this stage of the model development, we thus want to
achieve good performance in learning the transformations
between oculomotor and arm motor space, while respecting, and
trying to emulate, the responsiveness pattern observed in area
V6A. We simulated the different types of V6A neurons with
populations of radial basis function neurons uniformly
distributed in the vergence/version space (representing
oculomotor neurons) and in the arm joint space (representing
arm–motor neurons). Homogeneous distributions are used in this
case instead of logarithmic ones, because the whole reachable
space has to be covered with the same precision. Again, we tested
both Gaussian and sigmoid functions, finding slightly better
results for the former, as for the first network.
In order to check their suitability to model the transformations
performed by V6A neurons, we trained RBF networks having the
centers distributed as in Fig. 7, red and cyan graphs, for
vergence/version and joint space respectively. V6A and nearby
areas perform all the transformations required for correct
gazing and reaching. For this reason, an important requirement
is that the same pool of artificial neurons, the centers of the
radial basis functions, has to be used in the direct and inverse
transformations, so we included both transformations in the
comparison. To avoid biasing toward one or the other
distribution, training (again 400 points) and test sets were
taken randomly from a Cartesian space. As depicted in Fig. 7, the
ranges were taken so that the overlap between the center
distributions and the training and test sets was equivalent for
the oculomotor and the joint space. The exact ranges of
vergence and version and of joints J1 and J2 employed in both
direct and inverse transformations are provided in Table I. A
further complication in the comparison between distributions is
that different neuron placements and different transformations
are optimized with different numbers of neurons and amplitudes.
We tried to normalize the various solutions as much as possible
in order to make them comparable. The pure oculomotor and joint
space distributions had 49 neurons each (7 × 7), to repeat the
population of the first network, whilst for the mixed
distribution we employed 50 neurons (5 [...]), to match the total
number of neurons as closely as possible, obtaining the placement
shown in Fig. 8. For each configuration we searched for the
values of the spreads which provide the smallest errors, to
compare their best performances. The learning process did not
differ from the visual to oculomotor case, and the same equations
apply.

Fig. 8. Radial basis functions distributed according to a mixed criterion
(oculomotor and arm motor, red stars), visualized over a typical Cartesian
training set (blue dots).

TABLE I
OCULOMOTOR AND ARM JOINT SPACE RANGES
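The neuron placements being compared can be sketched as uniform grids of centers. In the snippet below the angular ranges are placeholders and not the values of Table I, and the even split of the mixed population into two 5 × 5 grids is an assumption made for illustration; only the totals (49 and 50 neurons) are taken from the text.

```python
def uniform_grid(range_a, range_b, per_side):
    """Centers evenly spaced over a 2-D range, per_side points per axis."""
    def linspace(lo, hi, n):
        return [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [(a, b) for a in linspace(*range_a, per_side)
                   for b in linspace(*range_b, per_side)]

# Placeholder ranges in radians; NOT the values of Table I.
VERGENCE, VERSION = (0.1, 0.6), (-0.5, 0.5)
J1, J2 = (-0.8, 0.8), (0.3, 2.0)

oculomotor_only = uniform_grid(VERGENCE, VERSION, 7)       # 49 neurons
joint_only = uniform_grid(J1, J2, 7)                       # 49 neurons
# Assumed split of the mixed population: 25 oculomotor + 25 joint neurons.
mixed = ([("oculomotor", c) for c in uniform_grid(VERGENCE, VERSION, 5)] +
         [("joint", c) for c in uniform_grid(J1, J2, 5)])

print(len(oculomotor_only), len(joint_only), len(mixed))
```

How centers defined in one space respond to inputs expressed in the other is left out of this sketch, since it depends on the kinematic model of the agent.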
The results of the different tested configurations are shown in
Table II. As can be observed, the joint space distribution of
neurons is reasonably good in both transformations, from
oculomotor to joint space and its inverse, whilst the
vergence/version distribution is good only for the joint to
oculomotor transformation, which in general seems to be easier to
learn than its inverse. A mixed distribution, with both types of
neurons, gives the best results in both transformations, much
better than either distribution alone. As a further experiment,
we tried to distribute the neurons according to a forward
selection algorithm, which automatically places the centers to
best fit the training data. As shown in Table II, the results are
better than the single criterion distributions but not better
than the mixed one. The stop condition for the forward selection
algorithm was to have 50 neurons, to allow for a fair comparison
with the other methods.

TABLE II
PERFORMANCE OF RBF NETWORKS WITH NEURONS DISTRIBUTED ACCORDING
TO A VERGENCE/VERSION OCULOMOTOR SPACE (V), ARM JOINT SPACE (J),
AND MIXED SPACE (M), FOR BOTH DIRECT AND INVERSE
TRANSFORMATIONS OCULOMOTOR ↔ ARM–JOINT

Fig. 9. Typical learning curve of the oculomotor to joint-space transformation
during the adaptation to the new parameters after the kinematics of the robot
model has been changed.
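To make the idea of forward selection concrete, here is a simplified greedy variant (matching pursuit without refitting earlier weights), which is not necessarily the algorithm used for Table II. Candidate centers are the training inputs themselves, and the 1-D data, spread, and stop count are hypothetical.

```python
import math

def gaussian(x, center, spread):
    return math.exp(-((x - center) ** 2) / (2.0 * spread ** 2))

def greedy_forward_selection(inputs, targets, candidates, spread, max_neurons):
    """Pick centers one at a time, each time taking the candidate whose
    activation pattern removes the most residual squared error."""
    residual = list(targets)
    chosen, weights, history = [], [], []
    remaining = list(candidates)
    while remaining and len(chosen) < max_neurons:
        best = None
        for c in remaining:
            column = [gaussian(x, c, spread) for x in inputs]
            energy = sum(v * v for v in column)
            if energy == 0.0:
                continue
            projection = sum(r * v for r, v in zip(residual, column)) / energy
            gain = projection * projection * energy   # drop in squared error
            if best is None or gain > best[0]:
                best = (gain, c, projection, column)
        if best is None:
            break
        _, center, weight, column = best
        chosen.append(center)
        weights.append(weight)
        remaining.remove(center)                       # each center used once
        residual = [r - weight * v for r, v in zip(residual, column)]
        history.append(sum(r * r for r in residual))
    return chosen, weights, history

# Toy data (assumed): 50 samples of sin(3x) on [0, 1]; stop at 8 neurons.
inputs = [i / 49.0 for i in range(50)]
targets = [math.sin(3.0 * x) for x in inputs]
centers, weights, sq_errors = greedy_forward_selection(
    inputs, targets, candidates=inputs, spread=0.2, max_neurons=8)
print(centers, sq_errors[-1])
```

A fixed neuron count as the stop condition, as in the comparison above, corresponds to the `max_neurons` argument of this sketch.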
Apart from the sheer improvement in performance, the use
of the mixed distribution should be especially suitable for
changing working conditions. To test this hypothesis, and
to estimate the sort of results we could expect when applying the
computational framework to the robotic setup, we changed the
kinematic parameters of the robot model, and started training
the network in the new configuration from the old weights.
The parameters included in the model are five: lengths of arm
and forearm, interocular distance, and relative position of
shoulder and eyes (two parameters, supposing they are aligned in
one coordinate). Considering for example the more demanding
oculomotor to joint-space transformation, we modified the first
three of the above parameters, leaving unchanged the relative
position of shoulder and eyes. Changing the arm and forearm
lengths both from 700 to 650 mm and the interocular distance from
270 to 240 mm, the error first rises up to 40 mm, and drops back
almost to the original precision only after about 50 trials, as
shown in Fig. 9. This behavior shows the adaptability of the
system to changes in working conditions, and supports its
suitability for implementation on the robot. Moreover, this test
shows that the RBF architecture constitutes a simple body-schema
for the robot, as it implicitly represents its internal
parameters, and is able to plastically adapt to modified
conditions and altered body parts.
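The effect of the parameter change on the mapping to be learned can be sketched with a simple planar model. The geometry below is an assumption (standard two-link forward kinematics and symmetric eyes on the z = 0 line); the shoulder offset and the arm posture are placeholders, and only the 700 to 650 mm and 270 to 240 mm changes come from the text.

```python
import math

def hand_position(q1, q2, params):
    """Planar two-link arm; angles in radians, measured from the depth axis."""
    x = (params["shoulder_x"] + params["arm"] * math.sin(q1)
         + params["forearm"] * math.sin(q1 + q2))
    z = (params["shoulder_z"] + params["arm"] * math.cos(q1)
         + params["forearm"] * math.cos(q1 + q2))
    return x, z

def gaze_angles(x, z, params):
    """Vergence and version needed to fixate (x, z); eyes sit at z = 0."""
    half = params["interocular"] / 2.0
    left = math.atan2(x + half, z)      # left eye at x = -half
    right = math.atan2(x - half, z)     # right eye at x = +half
    return left - right, (left + right) / 2.0

# Lengths in mm; the shoulder offset is a placeholder, not a value from the text.
old = {"arm": 700.0, "forearm": 700.0, "interocular": 270.0,
       "shoulder_x": 200.0, "shoulder_z": 0.0}
new = dict(old, arm=650.0, forearm=650.0, interocular=240.0)

q1, q2 = -0.6, 1.8   # an arbitrary arm posture
verg_old, vers_old = gaze_angles(*hand_position(q1, q2, old), old)
verg_new, vers_new = gaze_angles(*hand_position(q1, q2, new), new)
print(f"vergence {verg_old:.4f} -> {verg_new:.4f}, "
      f"version {vers_old:.4f} -> {vers_new:.4f}")
```

The same joint angles now correspond to different gaze angles, so a network trained on the old parameters produces errors until incremental learning absorbs the change, as in the learning curve of Fig. 9.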
Recent experiments [P. Fattori, unpublished data] show that
the receptive fields of many V6A neurons indeed seem to be
distributed according to a vergence/version criterion. Less clear
is the effect of arm joints, also because of our simplification
of the actual joint space. In any case, our simulation supports
the hypothesis that a mixed population of neurons such as that
observed in V6A is especially suitable for a cortical area which
contextually codes for different reference frames. From a
pragmatic point of view, through the use of basis function
neurons set according to what is suggested by neuroscience data,
we were able to learn very accurately the direct and inverse
transformations between oculomotor and joint space, in a way
suitable for their application to the robotic setup.
V. ROBOTIC SETUP AND EXPERIMENTAL FRAMEWORK
On the robotics side, the final goal of this work is to provide
the robot with advanced skills in its interaction with the
environment, namely in the purposeful exploration of the
peripersonal space and the contextual coding and control of eye
and arm movements. On the other hand, the implementation on an
actual sensorimotor setup is a potential source of additional
insights for the computational model, hardly achievable with
simulated data. Extensive experimentation with the robot is not
yet available, and constitutes the bulk of our current work,
whose methodology is outlined below.
Our humanoid robot (Fig. 10) is endowed with a pan-tilt-vergence
stereo head with coordinated vergence/version control of
the eyes and a multijoint arm with a three-finger Barrett Hand
(not used in this work). The stimuli the robot is able to
recognize by visual processing are simple dots/crosses/blobs on
a computer monitor, or custom visual markers placed on a visible
section of the robot arm, independently of their position
in the robot field of view. The workspace is first positioned at
eye level, so that only 2-D eye and arm movements are required.
After the 2-D transformations have been successfully applied
to the robot according to the model described in the previous
section, we plan to extend it to the 3-D space, introducing tilt
movements of the head and at least one more joint for the arm.
Preliminary studies with three-input RBF transformations were
successful in this regard.
As explained above, the actual map of the peripersonal space
is learned through active exploration, following the increasingly
demanding sequence of tasks described in Section III. The
training points for the visual to oculomotor transformation are
[...] in the analysis of complex visual environments. The employed
solution is to visualize at each step a new visual target, to which
the robot has to try and perform a gazing movement using the
visual to oculomotor map itself. The gaze is very likely not to
land on the target, and the estimated residual distance of the
target from the new fixation point is used to train the network.
At the beginning, the residual error of the movement can be
very high, and this is one of the main reasons which justify the
use of bootstrap learning with weights provided by the model,
in order to start with an acceptable gazing performance, which is
then refined by online learning.

Fig. 10. Humanoid robot with detail of pan/tilt/vergence head and arm with [...]
Once the robot has learned to perform the visual to oculomotor
transformation according to the above schema, no additional
skills are required to train the oculomotor to joint space
[...] marker placed on its arm, which is moved to a random position
at each step, using the first network. The eye/arm movement
pairs are then used to train both direct and inverse
transformations between eye and arm motor representations.
Following the above principles, the robot keeps improving
its visuomotor and arm–motor skills in each gazing or reaching
movement towards nearby goals. The use of tactile feedback
upon object touching could finally constitute a master signal
that allows the exact magnitude of visual and motor errors to be
inferred. The adaptability of the RBF-based computational
framework, highlighted by experiments on saccadic adaptation and
with altered kinematic conditions, indicates that the learning
[...]
Experiments of concurrent reaching and gazing allow the
generation of an implicit representation of the peripersonal
space obtained by matching head-centered and arm-centered schemes.
Such representation remains implicit, and far from being an
actual map of the environment, it rather constitutes a skill of
the robot in interacting with it. As a first implementation of
the model, simulated experiments of coordinated reach/gaze
actions have been performed, in which there is visual tracking
of the effector but no tactile feedback. The implemented RBF
framework is capable of bidirectional transformations between
stereo visual information and oculomotor (vergence/version)
space, and between oculomotor and arm joint space. For our
modeling purposes we used insights and functional indications
coming from monkey and human studies, especially regarding
the transformations and the contextual encoding of features in
the peripersonal space performed by area V6A. The computational
structure which allows oculomotor and joint space to be jointly
represented was defined in accordance with the above studies,
supporting the hypothesis that a mixed population of neurons
is the most suitable for performing different transformations.
Additionally, the system is able to adapt to altered conditions,
such as visual distortions or modified kinematic configurations,
and experiments were performed in which the agent had its
kinematics changed and was able to learn the correct actions
associated with the new parameters.
The final, integrated representation of the peripersonal space
emerges thanks to the simulated interaction of the agent with
the environment. Such implicit representation makes it possible
to contextually represent the peripersonal space through
different vision and motor parameters. Very importantly, the
oculomotor/arm motor transformation is bidirectional, and the
underlying representations are both accessed and modified by each
exploratory action. The above schema is now being implemented on
a real humanoid torso, in which coordinated reach/gaze actions are
being used to integrate and match the sensorimotor maps. This
[...] the most fundamental component of its basic capability of
interacting with the world, and contextually updating its
representation of it.
 A. Pouget and L. H. Snyder, “Computational approaches to sen-
sorimotor transformations,” Nature Neurosci., vol. 3 Suppl, pp.
1192–1198, Nov. 2000.
 S. Deneve and A. Pouget, “Basis functions for object-centered repre-
sentations,” Neuron, vol. 37, no. 2, pp. 347–359, Jan. 2003.
 R. A. Andersen, R. M. Bracewell, S. Barash, J. W. Gnadt, and L. Fo-
gassi, “Eye position effects on visual, memory, and saccade-related ac-
tivity in areas LIP and 7A of macaque,” J. Neurosci., vol. 10, no. 4, pp.
 F. Bremmer, C. Distler, and K.-P. Hoffmann, “Eye position effects in
etal areas LIP and 7A,” J. Neurophysiol., vol. 77, pp. 962–977, 1997.
 C. R. Cassanello, A. T. Nihalani, and V. P. Ferrera, “Neuronal re-
sponses to moving targets in monkey frontal eye fields,” J. Neuro-
physiol., vol. 100, pp. 1544–1556, 2008.
 P. Fattori, D. F. Kutz, R. Breveglieri, N. Marzocchi, and C. Galletti,
“Spatial tuning of reaching activity in the medial parieto-occipital
cortex (area V6A) of macaque monkey,” Eur. J. Neurosci., vol. 22, no.
4, pp. 956–972, Aug. 2005.
 M. A. Goodale and A. D. Milner, Sight Unseen. London, U.K.: Oxford
Univ. Press, 2004.
 M. Jeannerod, “Visuomotor channels: Their integration in goal-directed
prehension,” Human Movement Science, vol. 18, no. 2, pp.
201–218, Jun. 1999.
 C. Galletti, D. F. Kutz, M. Gamberini, R. Breveglieri, and P. Fattori,
“Role of the medial parieto-occipital cortex in the control of reaching
and grasping movements,” Exp. Brain Res., vol. 153, no. 2, pp.
158–170, Nov. 2003.
 P. Fattori, M. Gamberini, D. F. Kutz, and C. Galletti, “‘Arm-reaching’
neurons in the parietal area V6A of the macaque monkey,” Eur. J.
Neurosci., vol. 13, no. 12, pp. 2309–2313, Jun. 2001.
 P. Dechent and J. Frahm, “Characterization of the human visual V6
vol. 17, no. 10, pp. 2201–2211, 2003.
 R. Caminiti, S. Ferraina, and A. B. Mayer, “Visuomotor transforma-
tions: Early cortical mechanisms of reaching,” Current Opinion Neu-
robiol., vol. 8, no. 6, pp. 753–761, Dec. 1998.
 N. Marzocchi, R. Breveglieri, C. Galletti, and P. Fattori, “Reaching
activity in parietal area V6A of macaque: Eye influence on arm activity
or retinocentric coding of reaching movements?,” Eur. J. Neurosci.,
vol. 27, no. 3, pp. 775–789, Feb. 2008.
 A. Pouget and T. J. Sejnowski, “A new view of hemineglect based on
the response properties of parietal neurones,” Philos. Trans. Roy. Soc.
B: Biol. Sci., vol. 352, no. 1360, pp. 1449–1459, Oct. 1997.
 A. Pouget, S. Deneve, and J.-R. Duhamel, “A computational perspec-
tive on the neural basis of multisensory spatial representations,” Nat.
Rev. Neurosci., vol. 3, no. 9, pp. 741–747, Sep. 2002.
 E. Salinas and P. Thier, “Gain modulation: A major computational
principle of the central nervous system,” Neuron, vol. 27, no. 1, pp.
15–21, Jul. 2000.
 G. A. Bekey and K. Y. Goldberg, Eds., Neural Networks in
Robotics. Norwell, MA: Kluwer Academic, 1993.
 T. M. Martinetz, H. J. Ritter, and K. J. Schulten, “Three-dimensional
neural net for learning visuomotor coordination of a robot arm,” IEEE
Trans. Neural Netw., vol. 1, no. 1, pp. 131–136, Mar. 1990.
 M. Jones and D. Vernon, “Using neural networks to learn
hand-eye co-ordination,” Neural Comput. Appl., vol. 2, no. 1, pp.
visuo-motor system based on multiple self-organizing maps,” JSME
Int. J. Series C Mech. Syst., Mach. Elements Manufact., vol. 49, no. 1,
pp. 230–239, 2006.
 S. Fuke, M. Ogino, and M. Asada, “Acquisition of the head-centered
peri-personal spatial representation found in VIP neuron,” IEEE
Trans. Autonom. Mental Develop., vol. 1, no. 2, pp. 131–140, Aug.
2009.
 P.-Y. Zhang and T.-S. L.-B. Song, “RBF networks-based inverse
kinematics of 6R manipulator,” Int. J. Adv. Manufact. Technol., vol. 26,
no. 1–2, pp. 144–147, Jul. 2005.
 S. Kumar, L. Behera, and T. McGinnity, “Kinematic control of a
redundant manipulator using an inverse-forward adaptive scheme with a
KSOM based hint generator,” Robot. Autonom. Syst., vol. 58, no. 5, pp.
622–633, May 2010.
 M. Marjanovic, B. Scassellati, and M. Williamson, “Self-taught visu-
ally-guided pointing for a humanoid robot,” in Proc. Int. Conf. Simu-
lation Adapt. Behav. (SAB) 1996, 1996.
 G. Sun and B. Scassellati, “A fast and efficient model for learning to
reach,” Int. J. Human. Robot., vol. 2, no. 4, pp. 391–413, 2005.
 W. Schenck, H. Hoffmann, and R. Möller, F. Schmalhofer, R. M.
Young, and G. Katz, Eds., “Learning internal models for eye-hand
coordination in reaching and grasping,” in Proc. EuroCogSci 2003,
2003, pp. 289–294.
 F. Nori, L. Natale, G. Sandini, and G. Metta, “Autonomous learning
of 3D reaching in a humanoid robot,” in Proc. IEEE Int. Conf. Intell.
Robot. Syst., San Diego, CA, 2007, pp. 1142–1147.
 K. E. Adolph and A. S. Joh, “Motor development: How infants get into
Eds.London, U.K.: Oxford Univ. Press, 2007, pp. 63–80.
monkey visual cortex: Binocular correlation and disparity selectivity,”
J. Neurosci., vol. 8, no. 12, pp. 4531–4550, Dec. 1988.
 C. M. Bishop, Neural Networks for Pattern Recognition.
U.K.: Oxford Univ. Press, 1995.
 T. Collins, K. Dore-Mazars, and M. Lappe, “Motor space structures
perceptual space: Evidence from human saccadic adaptation,” Brain
Res., vol. 1172, pp. 32–39, Aug. 2007.
 E. Chinellato, B. J. Grzyb, N. Marzocchi, A. Bosco, P. Fattori, and A.
P. del Pobil, “Eye-hand coordination for reaching in dorsal stream area
V6A: Computational lessons,” in Bioinspired Applications in Artificial
and Natural Computation, J. Mira, J. M. Ferrez, J.-R. Alvarez, F. de la
Paz, and F. J. Toledo, Eds. Berlin, Germany: Springer-Verlag, 2009,
vol. LNCS 5602, pp. 304–313.
 M. Hoffmann, H. Marques, A. Arieta, H. Sumioka, M. Lungarella, and
Mental Develop., vol. 2, no. 4, pp. 304–324, Dec. 2010.
Eris Chinellato (S’03–M’08) received the B.Sc.
degree in industrial engineering from the Università
degli Studi di Padova, Padova, Italy, in 1999, the
M.Sc. degree in artificial intelligence, with the Best
Student Prize, from the University of Edinburgh,
Edinburgh, U.K., in 2002, and the Ph.D. degree
in intelligent robotics from Jaume I University,
Castellón de la Plana, Spain, in 2008.
His interdisciplinary research is mainly focused
on the use of visual information for reaching and
grasping actions in natural and artificial systems. He
has published in influential journals and proceedings in robotics, neuroscience,
and computational neuroscience, and has served as reviewer and program
committee member for international journals and conferences.
Marco Antonelli received the M.Sc. degree in
computer engineering, with an industrial automation
specialism, from Università degli Studi di Padova,
Padova, Italy, in 2008. He is currently working
towards the Ph.D. degree at the Robotic Intelligence
Lab, Universitat Jaume I, Castellón de la Plana,
Spain.
He is collaborating with the EU-FP7 Project “Eye[...]
inspired humanoid robotics. He has published in
international conferences and journals and participated
in specialized courses on the subject. He is now involved in the development
of a model of multisensory egocentric representation of the 3-D space based on
binocular visual cues, and oculomotor and arm–motor signals.
Beata J. Grzyb received the M.Sc. degree in
computer science from Maria Curie-Sklodowska
University, Lublin, Poland. She is currently working
towards the Ph.D. degree in the Robotic Intelligence
Lab, Jaume I University, Castellón de la Plana,
Spain.
In her research, she follows the approach of cogni[...]
related to body representation, peripersonal space
representation, and perception of body effectivities, by
means of synthesizing neuroscience, developmental
psychology, and robotics, and she has already published in several journal and
[...]
Angel P. del Pobil received the B.Sc. degree in
physics in 1986, and the Ph.D. degree in engi-
neering robotics in 1991, both from the University
of Navarra, Navarra, Spain. His Ph.D. dissertation
was the winner of the 1992 National Award of the
Spanish Royal Academy of Doctors.
He is currently a Professor of Computer Science
and Artificial Intelligence at Jaume I University,
Castellón de la Plana, Spain, and is the founding
Director of the Robotic Intelligence Laboratory. He is
author or coauthor of over 120 research publications,
and has been an invited speaker at 34 tutorials, plenary talks, and seminars. His
past and present research interests include motion planning, visually guided
grasping, service robotics, mobile manipulators, visual servoing, learning
for sensor-based manipulation, and the interplay between neurobiology and