A biologically inspired meta-control navigation system for the Psikharpax rat robot.
K Caluwaerts1,2, M Staffa1,3, S N’Guyen1, C Grand1, L Dollé1
A Favre-Felix1, B Girard1 and M Khamassi1
1Institut des Systèmes Intelligents et de Robotique (ISIR) UMR7222, Université
Pierre et Marie Curie, CNRS, 4 place Jussieu, 75005 Paris, France
2Reservoir Lab, Electronics and Information Systems (ELIS) department, Ghent
University, Sint Pietersnieuwstraat 41, 9000 Ghent, Belgium
3Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli
Federico II, Via Claudio 21, 80125, Naples, Italy
A biologically inspired navigation system for the mobile rat-like robot named
Psikharpax is presented, allowing for self-localization and autonomous navigation in
an initially unknown environment. The ability of parts of the model (e.g. the strategy
selection mechanism) to reproduce rat behavioral data in various maze tasks has been
validated before in simulation. However, the capacity of the model to work on a real robot
platform had not been tested.
This article presents our work on the implementation on the Psikharpax robot
of two independent navigation strategies (a place-based planning strategy and a cue-guided
taxon strategy) and a strategy selection meta-controller. We show how our
robot can memorize which strategy was optimal in each situation by means of a
reinforcement learning algorithm. Moreover, a context detector enables the controller
to quickly adapt to changes in the environment (recognized as new contexts) and
to restore previously acquired strategy preferences when a previously experienced
context is recognized. This produces adaptivity closer to rat behavioral performance
and constitutes a computational proposition of the role of the rat prefrontal cortex
in strategy shifting. Such a brain-inspired meta-controller may also provide an
advancement for learning architectures in robotics.
1.1. The Psikharpax robot
Figure 1. The v2 Psikharpax robot.
The Psikharpax robot (Meyer et al. 2005) is designed as an artificial rat, a
robotic platform built to integrate computational models of the rat’s decision, learning,
motivational and navigation circuits. It is used for two purposes: as a tool to contribute
to neuroscience, by studying how an embodied agent can adapt in the real world
with noisy perceptions and continuous time and state spaces, and by testing current
neuroscience theories in such a context; and as a means to test the potential application to
robotics, by assessing the transferability of neurocomputational models of learning and
decision-making to robots operating in dynamic, unknown environments.
This article is the first to report on spatial navigation with the new version of
Psikharpax (v2; Fig. 1). The robot has been equipped with a rich set of sensory
devices for multimodal perception (binaural auditory equipment, artificial whiskers,
binocular vision) and sensory integration. This equipment previously made it possible to perform
tactile texture discrimination and obstacle avoidance with the whiskers (N’Guyen, Pirim &
Meyer 2010), hearing and noise localization (Bernard et al. 2010), and vision and adaptive
saccadic eye movements (N’Guyen 2010, N’Guyen, Pirim, Meyer & Girard 2010).
Here we present the upgrade of the robot’s cognitive architecture that enables the robot
to coordinate and learn multiple strategies for spatial navigation, and to adapt quickly
to environmental changes.
1.2. State of the art of neuro-inspired navigation models for robots
Several previous projects have also tested biomimetic models of rodent navigation on
robots. These projects participate in the global approach of transferring
neurocomputational models to robotics with a two-fold objective. On the one hand,
they take inspiration from the computational principles underlying mammals’ behavioural
flexibility to improve the autonomy and adaptivity of current robots.
On the other hand, they use the robot as a platform to test the robustness of current
biological hypotheses about spatial cognition beyond perfectly controlled simulations,
and try to learn more about the computational mechanisms at stake by analyzing which
solutions enabled the model to work on a physical robot (Pfeifer et al. 2007, Arbib
et al. 2008, Meyer & Guillot 2008).
Arleo and Gerstner developed a computational model of place cells (neurons
located in the hippocampus whose activity encodes an estimate of the animal’s current
position) and of head-direction cells (neurons selective for the estimated orientation of
the animal’s head) (Arleo & Gerstner 2000). With this model, they enabled a Khepera
robot to navigate in a small arena, using a navigation strategy where a reinforcement
learning algorithm learns associations between places and directions of movement (a
place recognition triggered response or PRTR strategy, see section 1.3).
Krichmar and colleagues showed how prospective and retrospective coding at the level
of place cell activity can enable a robot to efficiently solve a spatial memory task
(Krichmar et al. 2005, Fleischer et al. 2007); here too, navigation was performed by a
PRTR strategy. Barrera and Weitzenfeld proposed a hybrid PRTR strategy using a
graph, where the choice of the next action took into account the next three actions in a
prospective manner (Barrera & Weitzenfeld 2008). Their robot could solve discretized
implementations of various rodent laboratory mazes (T and radial mazes). Giovannangeli
and Gaussier developed a model of another navigation strategy consisting of planning
routes towards the goal in a topological graph (‘cognitive map’) of the environment.
Their model produced efficient navigation in both indoor and outdoor environments
(Giovannangeli & Gaussier 2008). More recently, the RatSLAM algorithm has been
implemented as a neural network inspired by the rat’s hippocampus in order to perform
efficient, continuous and long-duration Simultaneous Localization and Mapping (SLAM)
on a robotic platform placed in a large non-stationary environment (Milford & Wyeth 2010).
Planning is also used there to perform navigation.
Our contribution lies in transferring to robotics another aspect of rodent
navigation abilities: the combination of various navigation strategies, in order to benefit
from their respective strengths (accuracy, learning rate, etc.), coordinated by a meta-controller
for strategy shifting which has previously been shown to better reproduce
rodents’ behavioral performance than single navigation strategies (Dollé, Sheynikhovich,
Girard, Chavarriaga & Guillot 2010). We thus extracted the principles of each previously
studied component of the rat’s currently known cognitive architecture for navigation: place
cells, path integration component, path planner, reinforcement learner. We then focused
on the integration of these components in a brain-inspired system used for adaptive
1.3. Multiple navigation strategies in rodents
Mammals are able to use multiple strategies when faced with a navigation problem
(Trullier et al. 1997, Redish 1999, Khamassi 2007, Arleo & Rondi-Reig 2007, for reviews),
like reaching a hidden platform in a pool, the so-called Morris water maze (Morris 1984).
Among the numerous possible strategies, experimental neuroscience studies of strategy
interactions favored two main families:
• Response strategies, resulting from the learning of direct sensorimotor associations
(like swimming towards a cue indicating the platform location, which is called a taxon strategy);
• Place strategies, where the animal builds an internal representation (or cognitive
map) of the various locations of the environment, using the configuration of multiple
allocentric cues. It then uses this information to choose the direction of the next
movement by either learning place-action associations (place recognition triggered
response strategy), or, more adaptively, by planning a path in a graph connecting
the places with the actions allowing the transitions from one place to another
(topological planning strategy).
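The topological planning strategy can be illustrated with a minimal sketch. It assumes, for illustration only, that places are nodes of a graph whose edges store the movement direction (in degrees) leading from one place to the next, and it uses breadth-first search to return the first movement of a shortest route. The graph, place names and directions are hypothetical, and the planner of the model may weigh transitions differently.

```python
from collections import deque

def plan_next_direction(graph, start, goal):
    """Return the direction (degrees) of the first move on a shortest route.

    graph maps each place to {neighbour_place: direction_deg}.
    Returns None if the goal is already reached or unreachable.
    """
    if start == goal:
        return None
    visited = {start}
    queue = deque()
    # Seed the search with the moves available from the start place.
    for neighbour, direction in graph[start].items():
        visited.add(neighbour)
        queue.append((neighbour, direction))
    while queue:
        place, first_direction = queue.popleft()
        if place == goal:
            return first_direction
        for neighbour in graph.get(place, {}):
            if neighbour not in visited:
                visited.add(neighbour)
                # Remember which first move led to this branch of the search.
                queue.append((neighbour, first_direction))
    return None

# Hypothetical cognitive map: 0 = north, 90 = east, 180 = south, 270 = west.
graph = {
    "A": {"B": 0.0, "C": 90.0},
    "B": {"A": 180.0, "D": 90.0},
    "C": {"A": 270.0, "D": 0.0},
    "D": {"B": 270.0, "C": 180.0, "G": 0.0},
    "G": {"D": 180.0},
}
print(plan_next_direction(graph, "A", "G"))  # → 0.0
```

In this sketch only the first movement is returned, so the route can be recomputed at each decision; moving the goal to another node changes the suggested direction immediately, without any relearning.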
Figure 2. Overview of the (Dollé et al., 2010) model. Different strategies
(taxon/topological-map/exploration) are connected to the gating network. Each
strategy has a dedicated expert which proposes actions (ΦT for the taxon, ΦP for
the planning, ...). The gating network decides which of the experts is the winner in the
current situation, and the action Φ* proposed by this strategy is then performed.
In this paper, we first apply to the Psikharpax robotic platform the multiple
strategy switching model (see Fig. 2) proposed by (Dollé, Sheynikhovich, Girard,
Chavarriaga & Guillot 2010), which was tested in simulation to replicate rat behavioral
experimental results. We then propose an extension of this model allowing more
flexible adaptation when switching from one experimental context to another (i.e. a change
in goal location).
A premise of this model is that the multiple navigation strategies of rodents are
operated by parallel independent memory systems (Packard et al. 1989, Burgess 2008),
which can result in cooperative or competitive behaviors, depending on the experimental
protocol. The basal ganglia (BG) and the hippocampal formation (Hpc) appear to
have a central role in this circuitry. The BG can be subdivided into parallel sub-circuits
(Alexander et al. 1986), usually identified by the part of the striatum (the main BG
input nucleus) they incorporate. The BG perform action selection (Mink 1996) and use
reinforcement learning signals mediated by dopamine (Houk et al. 1995) to adapt these
selections to environmental conditions.
Response strategies are considered to rely on the projections from the sensory
and motor cortices to the basal ganglia circuits issued from the dorso-lateral striatum
(DLS) to select directions of movement, using the reinforcement learning capabilities of
the BG to learn which cue is to be followed at a given time (Graybiel 1998, Yin &
Knowlton 2006). Consistently, lesions of the striatum (or, more specifically, of the
DLS) impair or reduce the expression of response strategies while promoting place
strategies (Devan & White 1999, Packard & Knowlton 2002).
In contrast, lesions of the hippocampal system impair place strategies while sparing response strategies
(Morris 1981, Packard et al. 1989, Devan & White 1999). This suggests that response
strategies are independent of the Hpc. On the other hand, place strategies would
rely on the Hpc, with its ability to encode places in the so-called place cells (O’Keefe
& Nadel 1978), to provide the inputs they work with. The neural circuits exploiting these inputs
to either learn place-action associations or plan trajectories would be located in the
prefrontal cortex, in the ventral striatum (VS) and in the dorso-medial BG circuit
(DMS). Indeed, lesions of the DMS reduce the expression of place strategies while
promoting response strategies (Devan & White 1999, Yin & Knowlton 2004). Lesions
of the VS impair the animals’ ability to associate different places with different amounts of
reward (Albertin et al. 2000).
1.4. The computational model previously used in simulation
It is however not yet clear where and how these strategies are coordinated. The (Dollé,
Sheynikhovich, Girard, Chavarriaga & Guillot 2010) model provides a simple mechanism
able to replicate experimental results obtained by (Pearce et al. 1998) and (Devan &
White 1999): it proposes that an additional circuit, the gating network, is dedicated
to the selection of the strategy to be used, and that it uses reinforcement learning to
learn which strategy is the most efficient in each situation, based on all the inputs used
by the strategies to make their own decisions (i.e. sensory and place cell activity).
All strategies learn simultaneously: those which did not have control over the last
decision use reward/punishment signals modulated by the angular difference between
their movement suggestion and the actual one. The smaller the difference, the more the
suggested movement of a non-selected strategy is rewarded. This is a key element
of the model to explain the cooperative effects observed in animals, since the learning
process of a slow-learning strategy can thus be guided by the selections made by a fast-learning strategy.
This model made it possible to reproduce tasks in a Morris water maze where the position
of the platform (the goal) changes systematically. The model gradually learns that the
visual information given by the cue placed near the platform is more important than the
position information (which attracts the agent towards the former location of the platform).
The model also makes it possible to analyze the cooperation between strategies that allows
it to reproduce rat behavior: even after learning, both strategies each control a part of
the rat’s trajectory to the goal.
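The strategy selection and shared-learning mechanism of the gating network can be sketched as follows. This is not the published implementation: the linear gating values, the delta-rule update and the Gaussian modulation of the reward by the angular difference (with a hypothetical width `sigma_deg`) are assumptions chosen only to make the principle concrete.

```python
import math

def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in [0, 180] degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return 360.0 - d if d > 180.0 else d

def shared_reward(reward, suggested_deg, executed_deg, sigma_deg=30.0):
    """Reward seen by a strategy that did not control the last movement.

    Assumed Gaussian modulation: full reward when the suggestion matches the
    executed movement, close to zero when it points the opposite way.
    """
    delta = angular_difference(suggested_deg, executed_deg)
    return reward * math.exp(-delta ** 2 / (2.0 * sigma_deg ** 2))

def select_strategy(gating_weights, features):
    """Winner-take-all choice over linear gating values w_k . x."""
    values = {name: sum(w * x for w, x in zip(weights, features))
              for name, weights in gating_weights.items()}
    winner = max(values, key=values.get)
    return winner, values

def update_gating(gating_weights, winner, features, reward, value, lr=0.1):
    """Delta-rule update of the winner's gating weights (assumed rule)."""
    error = reward - value
    gating_weights[winner] = [w + lr * error * x
                              for w, x in zip(gating_weights[winner], features)]

# Hypothetical example: two experts, two input features (e.g. cue and place cell).
weights = {"taxon": [1.0, 0.0], "planning": [0.0, 1.0]}
features = [0.2, 0.8]
winner, values = select_strategy(weights, features)
update_gating(weights, winner, features, reward=1.0, value=values[winner])
# The losing taxon expert had suggested 100 deg while the robot moved at 90 deg.
print(winner, round(shared_reward(1.0, 100.0, 90.0), 3))
```

The modulation makes a non-selected expert learn almost as if it had been selected whenever it agreed with the winner, which is the property the text relies on to let a fast-learning strategy guide a slow one.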
However, the model was previously tested in a simulation which, although it used a
continuous state space, was perfectly controlled and thus permitted a set of crucial
simplifications:
• The model had perfect access to its position and orientation;
• Visual perception was also perfect, permitting the robot to distinguish different
landmark cues without error, and thus making it possible for the model to have
taxon submodules which learned to select a movement direction associated with
each specific landmark;
• The agent was a virtual point without a body surface, allowing holonomic motion.
Thus it is not clear whether the model can be applied to robotics in the real world, and
whether it can still reproduce rodent behavioral performance and adaptivity in such
conditions. We take inspiration from a recent strategy-shifting model to test whether it
works in the real world (with potential applications to robotics). Consequently, when part
of the model does not work (e.g. Arleo’s place cells), we keep its principle and use an
engineering method instead.
Here we present the integration of this neurocomputational model in the Psikharpax
robot, and the solutions adopted to cope with noisy perception and odometry. In
simulation, each strategy could individually solve the rat goal-seeking task, but a
combination of strategies was required to produce the same behavioral performance
as the rat (Dollé, Sheynikhovich, Girard, Chavarriaga & Guillot 2010). Here, in contrast,
we find that each strategy can solve the task only partially, but in a complementary way,
so that the combination of strategies makes it possible to solve the problem. In addition,
we show that the previous strategy-shifting mechanism can adapt to environmental changes,
but more slowly than real rats. We finally add a meta-controller to the model
which detects context switches, permits faster adaptation to environmental changes,
and quickly restores previously learned behavior when a known context is
presented again to the robot. Such a meta-controller may constitute a better model of
rat prefrontal cortical functions known to be required for adaptive strategy shifting
(Ragozzino et al. 1999, Birrell & Brown 2000, Killcross & Coutureau 2003). It may also
provide a more robust solution for strategy shifting in autonomous robots.
The first part of this paper gives a technical overview of the platform. The
theoretical foundation of our work was verified by Dollé et al. (Dollé, Sheynikhovich,
Girard, Chavarriaga & Guillot 2010) in simulation, based on almost perfect sensory
input and simulated grid cells (Ujfalussy et al. 2008). Therefore, the second and third
parts of this paper present the equivalent navigation strategies and strategy selection
mechanism for the real robot. The last part presents the results obtained in a series of
robotic tests of the model.
2. Material and Methods
2.1. Architecture overview
Our software architecture was built on the ROS‡ (Robot Operating System)
middleware. The robot runs the ROS core, and an external quad-core machine is used
for the visual system and the navigation strategies.
An overview of the software architecture of our model is given in Fig. 3. The system
consists of six distributed subsystems, each consisting of one or more ROS nodes. As
can be seen from Fig. 3, the central node of the system is the action selection node.
This node interacts with the gating network (see Section 4) to decide upon the next
action the robot will take.
Two additional mechanisms - guiding and obstacle avoidance - are not shown in the
figure. The obstacle avoidance strategy is implemented as a reflex strategy to prevent
the robot from leaving the environment. The guiding procedure is used to lead the robot
towards and from the goal at the end of failed and successful trials (see Section 5.4).
2.2. Visual processing and localization
More details of the visual system are given in the Appendix of this article. Here we
summarize how visual information concerning landmark cues in the environment is
extracted to build a map of place cells for localization. An overview of the model is
given in Fig. 4.
The robot is equipped with two small front-facing cameras with a total field of view
of about 60 degrees. While real rats have side-facing eyes with a large field of view, their
stereoscopic vision is limited to a region of about 76 degrees (Block 1969). The choice
of a small but stereoscopic field of view originally stems from experiments with saccadic eye
movements on the Psikharpax platform (N’Guyen, Pirim, Meyer & Girard 2010). We
chose to keep this setup as it allows the robot to easily distinguish nearby from distant
objects and makes it possible to extend the model to include attention. To overcome the limited
field of view, the robot is programmed to turn its head around at regular intervals.
‡ ROS is open-source and can be downloaded from http://ros.org
Figure 3. Simplified overview of the software architecture (only the most important
nodes and connections are shown)
Figure 4. Concise overview of the visual system. In this example there is a brightly-
colored star-shaped object at a distance of approx. 3.5 m from the robot’s head. The
robot sees this object through its two cameras directly connected to the BIPS hardware
(L1). The BIPS hardware extracts feature information from the visual object and this
information is coded on a set of feature neurons in the second layer of the visual system
(L2). Based on the angle at which both cameras see the cue, the disparity neurons
are activated to code the distance information (see also Fig. 6). The trust neurons
are activated based on odometric information: if the robot’s head is moving fast, the
trust drops. There are disparity, feature and trust neurons for each direction within
the field of view (not shown in the figure). Information in layer L2 is sent to layer L3
and integrated over different orientations to produce a 360-degree view.
At the lowest layer of the visual system, the cameras are directly connected to an
onboard electronic device, called the Bio-Inspired Perception System (BIPS, developed
by Brain Vision Systems, BVS) (N’Guyen 2010). This layer implements the retina and
the first layers of the visual cortex by using a neural network with inhibitory connections
to detect and track stable and saturated objects (see Fig. 5). This layer is shown as L1
in Fig. 4. Note that after this layer the raw image is discarded and only the detected
objects are used.
Figure 5. The same part of the environment seen from two different angles can result
in an object not being detected. The system should cope with this limitation through
the higher layers of the visual system.
The second layer of the visual system codes the visual information onto a set of
feature neurons. For each object, the following features are extracted: size, vertical
position, orientation, color and disparity. For each orientation within the field of view,
such a set of feature neurons exists and a detected object activates the neurons in the
direction in which it is seen. We use leaky-integrator neurons to low-pass filter the input.
The disparity codes the distance of an object with respect to the robot. Four neurons are
used to code disparity information. These neurons have a Gaussian activation function,
centered around different disparities. This results in an activation function that has a
long tail as a function of the distance (Fig. 6). Hence the robot has more precise distance
information on nearby objects.
Figure 6. Activation function of the disparity neurons as a function of the distance.
They are Gaussian as a function of the disparity, naturally resulting in a non-linear
distance scale with higher precision for nearby objects.
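The non-linear distance scale of Fig. 6 follows directly from Gaussian tuning in disparity. The sketch below assumes a pinhole stereo model (disparity = focal length * baseline / distance) with hypothetical camera parameters and neuron centres; the real system derives disparity from the angles at which both cameras see the cue, so only the qualitative behavior carries over.

```python
import math

# Hypothetical stereo parameters, not the real Psikharpax calibration.
BASELINE_M = 0.06                      # distance between the two cameras
FOCAL_PX = 500.0                       # focal length expressed in pixels
CENTRES_PX = [4.0, 8.0, 16.0, 32.0]    # preferred disparity of each of the 4 neurons
SIGMA_PX = 4.0                         # tuning width, identical for all neurons

def disparity_px(distance_m):
    """Pinhole stereo model: disparity is inversely proportional to distance."""
    return FOCAL_PX * BASELINE_M / distance_m

def disparity_neurons(distance_m):
    """Activations of four neurons with Gaussian tuning in disparity space."""
    d = disparity_px(distance_m)
    return [math.exp(-(d - c) ** 2 / (2.0 * SIGMA_PX ** 2)) for c in CENTRES_PX]

# Moving an object 10 cm further away changes the disparity far more when the
# object is close, so the same neurons resolve nearby distances more finely.
near_change = disparity_px(1.0) - disparity_px(1.1)
far_change = disparity_px(3.5) - disparity_px(3.6)
print(f"{near_change:.2f} px at 1 m vs {far_change:.2f} px at 3.5 m")
print([round(a, 2) for a in disparity_neurons(3.5)])
```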
Primates seem to use other types of disparity measures, such as relative disparity
between objects in addition to absolute disparity (Parker 2007). We tried to increase the
performance of the visual system by adding disparity neurons with other activation
functions (based on Gabor filters), but the quality of the resulting place cells (next
Section) did not increase. This is probably due to the fact that enough information is
already coded by the other feature neurons to distinguish places in the environment we
used and that a non-linear training algorithm is used.
An important part of the second layer of the visual system is formed by the so-called trust
neurons. These neurons modulate the output of the second layer to the third layer of the
visual system. The idea is to suppress noisy inputs when the visual input is unreliable.
This happens when the head of the robot moves too fast, as the neurons in the first
(tracking units) and second layer need some time to stabilize. This is easily detected by
the odometric system and hence the odometric system is used to modulate (suppress)
the connections between the second and third layers when necessary. The faster the
head movement, the less reliable the visual information. This prevents the third layer
of the visual system from being influenced by unreliable information.
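A minimal sketch of a leaky-integrator feature neuron whose output is gated by trust is given below. The exponential decay of trust with head speed, the time constant and the speed scale are assumptions; the text only states that faster head movements make the visual information less reliable and that odometry is used to suppress it.

```python
import math

def trust(head_speed_deg_s, speed_scale_deg_s=60.0):
    """Hypothetical trust value: 1 when the head is still, decaying with the
    head speed reported by odometry (the decay shape is an assumption)."""
    return math.exp(-abs(head_speed_deg_s) / speed_scale_deg_s)

class LeakyIntegrator:
    """Leaky-integrator neuron, tau * dy/dt = -y + x, integrated with Euler steps."""

    def __init__(self, tau_s=0.2):
        self.tau_s = tau_s
        self.y = 0.0

    def step(self, x, dt_s):
        self.y += (dt_s / self.tau_s) * (x - self.y)
        return self.y

def layer2_to_layer3(feature_neuron, x, head_speed_deg_s, dt_s=0.02):
    """Low-pass filter a feature input, then scale the output by trust."""
    return trust(head_speed_deg_s) * feature_neuron.step(x, dt_s)

neuron = LeakyIntegrator()
# Head still: the filtered output rises smoothly towards the input value.
still = [layer2_to_layer3(neuron, 1.0, head_speed_deg_s=0.0) for _ in range(50)]
# Fast head turn: the same activation is almost entirely suppressed.
moving = layer2_to_layer3(neuron, 1.0, head_speed_deg_s=180.0)
print(round(still[-1], 3), round(moving, 3))
```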
The third level of the visual system integrates the information from the second
layer by combining it with odometric information. This results in egocentric panoramic
information on the environment.
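The integration of successive head-centred views into a 360-degree representation can be sketched as below. It assumes 10-degree orientation bins and a 6-bin (about 60-degree) field of view; the bin size and the trust-weighted blending are hypothetical choices. Since 360 / 60 = 6, at least six head orientations are needed to fill the whole panorama, which is why the head must turn regularly.

```python
N_BINS = 36          # 10-degree orientation bins around the robot (assumed)
BIN_DEG = 360 // N_BINS
FOV_BINS = 6         # about 60 degrees of view, as for the two cameras

def integrate_view(panorama, view, head_yaw_deg, trust=1.0):
    """Blend a head-centred view into a 360-degree, body-centred panorama.

    view[i] is the activation at angle (i - FOV_BINS // 2) * BIN_DEG relative
    to the head; head_yaw_deg comes from odometry. Low trust keeps old values.
    """
    for i, activation in enumerate(view):
        angle = head_yaw_deg + (i - FOV_BINS // 2) * BIN_DEG
        idx = round(angle / BIN_DEG) % N_BINS
        panorama[idx] = (1.0 - trust) * panorama[idx] + trust * activation
    return panorama

# Six head orientations, 60 degrees apart, cover the full circle.
panorama = [0.0] * N_BINS
for yaw in range(0, 360, 60):
    integrate_view(panorama, [1.0] * FOV_BINS, yaw)
print(sum(1 for a in panorama if a > 0.0), "of", N_BINS, "bins filled")
```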
2.3. Visual place cells
The output from the neural network visual layers is high-dimensional (about 800
neurons). Because simple rate-coding neurons were used, the output can be seen as
a vector representing integrated egocentric visual information. To construct
non-directional place cells, such output vectors were summed over all orientations to
activate the same neuron (i.e. a place cell). The problem was therefore reduced to
a dimensionality reduction or clustering problem. This subsystem is indicated as PC
on Fig. A1 in the Appendix.
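Summing over orientations makes the place-cell input independent of the robot's heading, as the following sketch with a hypothetical four-bin, three-feature panorama illustrates. The function names are illustrative only.

```python
def nondirectional_input(panorama_features):
    """Sum feature vectors over all orientation bins.

    panorama_features[k] is the feature vector observed in orientation bin k.
    """
    n_features = len(panorama_features[0])
    return [sum(vec[j] for vec in panorama_features) for j in range(n_features)]

def turn_in_place(panorama_features, shift_bins):
    """The same scene after the robot turns by shift_bins orientation bins."""
    return panorama_features[shift_bins:] + panorama_features[:shift_bins]

# Hypothetical 4-bin panorama with 3 features per bin.
scene = [[1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 0]]
print(nondirectional_input(scene), nondirectional_input(turn_in_place(scene, 1)))
```

The same computation shows the cost of this choice: a scene with the same cues at different bearings (for instance, the first two bins swapped) yields the same summed vector, so bearing information is discarded before clustering.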
In a first version of the simulation model (Dollé, Sheynikhovich, Girard,
Chavarriaga & Guillot 2010a), ad hoc place cells were used, and thus the dimensionality
reduction / clustering problem was not addressed. In (Dollé, Sheynikhovich, Girard,
Ujfalussy, Chavarriaga & Guillot 2010b), a model of the hippocampus (Ujfalussy
et al. 2008) was used to autonomously create the place cells. It is based on a competitive
Hebbian-like learning rule: a number of random place cells are created and, during the
learning phase, the place cells specialize for particular input patterns using a sparseness-based
Hebbian rule, which only allows the most active input neurons to reinforce their connections.
Such an approach works very well when the number of input neurons and distinct
patterns is not too high and the patterns are well characterized by their most active
neurons. In our case, however, the input can be noisy, with typically large but
meaningless values for a few neurons in the input. When the sparseness function from
(Dollé, Sheynikhovich, Girard, Ujfalussy, Chavarriaga & Guillot 2010b) is applied to such
an input, the noise is reinforced, while useful neurons are ignored.
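The failure mode described above can be reproduced on a toy example. The update below is a simplified stand-in for a sparseness-based Hebbian rule (only the k most active inputs are reinforced), not the rule of Ujfalussy et al.; the five-neuron inputs are hypothetical. A single spurious large value takes a slot among the reinforced inputs, whereas a distance computed over the whole vector still assigns the noisy input to the correct place.

```python
def top_k_indices(x, k):
    """Indices of the k most active inputs, most active first."""
    return sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k]

def sparse_hebbian_update(weights, x, k=2, lr=0.5):
    """Simplified sparseness-based Hebbian step: only the k most active
    inputs strengthen their weights (a stand-in, not the published rule)."""
    new_weights = list(weights)
    for i in top_k_indices(x, k):
        new_weights[i] += lr * x[i]
    return new_weights

def euclidean(a, b):
    """Distance computed over the whole input vector."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

# Hypothetical 5-neuron visual inputs for two places.
proto_a = [0.9, 0.8, 0.7, 0.0, 0.0]
proto_b = [0.0, 0.0, 0.7, 0.8, 0.9]
noisy_a = [0.9, 0.8, 0.7, 0.0, 1.0]     # place A plus one spurious large value

w = sparse_hebbian_update([0.0] * 5, noisy_a)
print("reinforced inputs:", top_k_indices(noisy_a, 2), "weights:", w)
print("closest prototype:",
      "A" if euclidean(noisy_a, proto_a) < euclidean(noisy_a, proto_b) else "B")
```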
We therefore needed a technique that learns the input patterns by evaluating the
whole set of input neurons instead of only the most active ones. We initially tried linear
approaches such as Principal Component Analysis (Pearson 1901) to check if the inputs
were linearly separable. At most 4 to 5 regions could be consistently separated. This
is insufficient for good performance, as the place fields of the place cells would be too
large (only 4 to 5 distinct zones). The gating network takes input from the place cells
and hence its precision is limited by the place cells.
Implementing a Self-Organizing Map. A popular non-linear alternative for clustering is the Self-Organizing Map (SOM) (Kohonen et al. 2001). The goal of the SOM is to move the neurons in the high-dimensional input space so that they approximate the topology of the input. For each input, the Euclidean distance between the input and each neuron of the SOM is computed. The closest neuron is called the best-matching unit (BMU). The SOM is then updated by moving the BMU, as well as its neighbors, closer to the input (weighted sum). In order to avoid using a fixed number of neurons in the SOM, we used the Growing Neural Gas (GNG) algorithm (Fritzke 1995). GNGs are created incrementally: after a given number of input samples, a new neuron is inserted by splitting the neuron with the largest accumulated error (sum of distances) into two new neurons. The topology itself is also learned, by keeping the neighborhood relations between the neurons up to date.
We used the GNG algorithm to train the weights of an artificial neural network (see Appendix for a detailed description of the implementation). While we do not assume that there is a direct biological equivalent of this training algorithm, nor of the activation function (which is based on the Euclidean distance), we do not think that our model makes unrealistic assumptions about the role of the hippocampus in categorizing different places. As (Fritzke 1995) indicates, the GNG algorithm can be seen as a form of (non-linear) competitive Hebbian learning, which is the main reason why we chose this algorithm. Because the main interest of this work lies in the strategy and context switching mechanisms, we did not investigate exactly how one might implement the GNG algorithm with a biologically plausible neural network. However, this should not be a problem, as the algorithm is straightforward and only depends on the computation of the Euclidean distance (or another distance measure), the creation/removal of edges and the updating of a local error measure. A further argument for using a non-linear approach is that many non-linear algorithms can be cast as a linear technique working in a larger (possibly infinite-dimensional) feature space, by simply replacing inner products with a kernel function (i.e. the kernel function computes the inner product in a different space) (Bishop 2007). For example, the kernel trick is often applied to PCA (Kernel PCA) (Schölkopf et al. 1998), for which a Hebbian version already exists (Kim et al. 2003).
We found that the GNG technique is very flexible. When one forces the GNG to use only a small number of neurons, the GNG creates large continuous place fields (Fig. 7). When more neurons are available, the place fields are smaller.
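The SOM/GNG description above can be illustrated with a compact Growing Neural Gas in plain Python. This is a sketch of Fritzke's (1995) algorithm run on synthetic 2-D points, not the implementation described in the Appendix: the function names, the parameter values (ε_b, ε_n, λ, maximum edge age, α, error decay) and the cap of 12 units are illustrative assumptions. In Fritzke's formulation the accumulated error uses squared distances, and the new unit is inserted halfway between the unit with the largest error and its neighbor with the largest error.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_gng(samples, max_nodes=12, lam=100, eps_b=0.2, eps_n=0.006,
              age_max=50, alpha=0.5, decay=0.995, epochs=5, seed=0):
    """Minimal Growing Neural Gas after Fritzke (1995).

    Returns (weights, edges): weights maps node id -> prototype vector,
    edges maps frozenset({a, b}) -> edge age.
    """
    rng = random.Random(seed)
    weights = {0: list(rng.choice(samples)), 1: list(rng.choice(samples))}
    error = {0: 0.0, 1: 0.0}
    edges = {}
    next_id, step = 2, 0

    def neighbours(n):
        return [m for e in edges if n in e for m in e if m != n]

    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):
            step += 1
            # Best-matching unit s1 and second-best unit s2.
            s1, s2 = sorted(weights, key=lambda n: sq_dist(weights[n], x))[:2]
            # Age the edges of s1 and accumulate its squared error.
            for e in edges:
                if s1 in e:
                    edges[e] += 1
            error[s1] += sq_dist(weights[s1], x)
            # Move s1 and its topological neighbours towards the input.
            weights[s1] = [w + eps_b * (xi - w) for w, xi in zip(weights[s1], x)]
            for n in neighbours(s1):
                weights[n] = [w + eps_n * (xi - w) for w, xi in zip(weights[n], x)]
            # Create or refresh the s1-s2 edge, drop old edges and isolated units.
            edges[frozenset((s1, s2))] = 0
            edges = {e: a for e, a in edges.items() if a <= age_max}
            linked = {n for e in edges for n in e}
            for n in [n for n in weights if n not in linked]:
                del weights[n], error[n]
            # Every lam samples, insert a unit between the worst unit and its worst neighbour.
            if step % lam == 0 and len(weights) < max_nodes:
                q = max(error, key=error.get)
                f = max(neighbours(q), key=error.get)
                r, next_id = next_id, next_id + 1
                weights[r] = [(a + b) / 2 for a, b in zip(weights[q], weights[f])]
                del edges[frozenset((q, f))]
                edges[frozenset((q, r))] = 0
                edges[frozenset((r, f))] = 0
                error[q] *= alpha
                error[f] *= alpha
                error[r] = error[q]
            # Global decay of all local errors.
            for n in error:
                error[n] *= decay
    return weights, edges


# Toy usage on uniformly distributed 2-D points (synthetic, not robot data).
data_rng = random.Random(42)
data = [(data_rng.random(), data_rng.random()) for _ in range(2000)]
nodes, links = train_gng(data)
print(len(nodes), "units,", len(links), "edges")
```

On the robot, each sample would be the high-dimensional, orientation-summed visual vector rather than a 2-D point; each unit's region of the input space then acts as one place field, and raising `max_nodes` shrinks these regions, as described for Fig. 7.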
The neurons from the GNG layer project onto the planning strategy and the gating network. The resulting place cells for a GNG layer with 12 neurons are shown in Fig. 7. One can see that the best (i.e. most restricted to a particular zone) place cells are found near the borders (similar to Fig. 9 (D) of (Arleo & Rondi-Reig 2007)). This is due to the fact that there are no intramaze cues, which causes observations near the center to be more similar, and thus fewer place cells are created in this region. We used a higher number of place cells in our experiments to increase the precision of the planning strategy (Section 3.1) and the gating network (Section 4.2). However, this phenomenon still occurs (e.g. Fig. 8(b)), causing topological maps to be less dense near the center.
Figure 7. Heat map of 12 place cells (GNG with 12 output neurons), with smooth
activation (Eq. A.4). Note that there are 2 place cells (top left and first row third from
left) which only have very weak activations but large receptive fields. These cells will
thus not be important as their activation will be negligible (the topological map will
discard them). The axes of each of the images give the position in meters of the robot’s
head as recorded by a ceiling camera. Note that there are more, and more precise, place cells coding for locations near the borders of the environment.
2.3.2. Experimental testing of the place cell system
To get a rough estimate of the usefulness of the place cell system, the robot was put at 40 random places in the environment and the activation of its place cells was recorded (for this experiment, we trained 100 place cells). The Euclidean distance between the real position of the robot and the center of activation (computed by averaging over a large training set using a ceiling camera) of the most active place cell (binary activation) was computed to
estimate the precision of the place cell coding. The mean distance was found to be 16.5
cm with a standard deviation of 8.5 cm. When near the border of the environment, the
mean lies around 11 cm, which is very good, but when approaching the center of the
environment the mean distance becomes considerably larger (between 20 and 30 cm).
In (Arleo & Rondi-Reig 2007), the authors found a mean error of 6 cm, but in a smaller (0.8 m by 0.8 m) environment.
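The precision measure above can be sketched as follows, assuming that each place cell's center is the mean ceiling-camera position of the training samples on which that cell was the most active (binary activation). The function names, the data layout and the toy positions are hypothetical; the sketch only illustrates the computation, not the authors' evaluation code.

```python
import math
import statistics


def place_field_centres(winners, positions):
    """Centre of each cell: mean (x, y) over training samples where it was the winner."""
    acc = {}
    for cell, (x, y) in zip(winners, positions):
        sx, sy, n = acc.get(cell, (0.0, 0.0, 0))
        acc[cell] = (sx + x, sy + y, n + 1)
    return {cell: (sx / n, sy / n) for cell, (sx, sy, n) in acc.items()}


def localisation_error(centres, test_winners, test_positions):
    """Mean and sample standard deviation of the distance between the true
    position and the centre of the most active cell."""
    errors = [math.dist(p, centres[c]) for c, p in zip(test_winners, test_positions)]
    return statistics.mean(errors), statistics.stdev(errors)


# Toy example with made-up positions in metres (not the robot's data).
centres = place_field_centres([0, 0, 1, 1], [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.0, 1.2)])
mean_err, sd_err = localisation_error(centres, [0, 1], [(0.1, 0.3), (1.0, 1.1)])
print(f"mean {mean_err:.3f} m, sd {sd_err:.3f} m")
```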
In the ideal case, one would expect the place fields to be evenly distributed. Hence, to evaluate our place cell mechanism, we consider 100 points picked randomly from a uniform distribution on a rectangle of 2.5 m by 2.0 m. The mean distance from a point chosen uniformly at random within the rectangle to the nearest of these 100 points would be about 12 cm. The mean distance to the second closest point would be about 18 cm, and about 22.5 cm to the third closest point.
This indicates that the presented place cell system performs reasonably well,
compared to the ideal, uniformly distributed case.
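The uniform baseline (about 12, 18 and 22.5 cm) can be checked approximately with a short Monte Carlo simulation. This is an illustrative sketch, not the authors' computation; the function name, the numbers of trials and query points, and the seed are arbitrary. For a homogeneous Poisson process of the same density (λ = 100 / 5 m² = 20 points per m²) on an unbounded plane, the expected nearest-neighbor distance would be 1/(2√λ) ≈ 11.2 cm; boundary effects in the finite rectangle push the estimates slightly up.

```python
import math
import random


def mean_kth_neighbour_distances(n_points=100, width=2.5, height=2.0, k=3,
                                 trials=40, queries=500, seed=1):
    """Monte Carlo estimate of the mean distance from a uniformly random query
    point to its 1st, 2nd, ..., k-th nearest of n_points uniformly random
    points in a width x height rectangle (metres)."""
    rng = random.Random(seed)
    totals = [0.0] * k
    for _ in range(trials):
        # A fresh set of "ideal", uniformly placed place-field centres.
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_points)]
        for _ in range(queries):
            qx, qy = rng.uniform(0, width), rng.uniform(0, height)
            sq = sorted((px - qx) ** 2 + (py - qy) ** 2 for px, py in pts)
            for i in range(k):
                totals[i] += math.sqrt(sq[i])
    return [t / (trials * queries) for t in totals]


d1, d2, d3 = mean_kth_neighbour_distances()
print(f"{100 * d1:.1f} cm, {100 * d2:.1f} cm, {100 * d3:.1f} cm")
```

The query points play the role of the robot's test positions, so the printed values are directly comparable with the 16.5 cm mean error measured for the most active place cell.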
We tested the system with different numbers of place cells, splitting the data into a training and a test set. We found that 100 place cells is about the maximum one can obtain in our environment without overfitting the training data. Moreover, for higher numbers of cells, the classification results on the test set did not improve. Thus, for all subsequent experiments with the multiple navigation strategy model, we used 100 place cells. In the next section, we present the navigation strategy that is based on information from the place cell layer.
3. Navigation strategies
3.1. Planning expert
The (Dollé, Sheynikhovich, Girard, Chavarriaga & Guillot 2010a) model best replicated experimental results using a planning algorithm for the place strategy, rather than a place-recognition-triggered response. It is organized as follows: a place cell module, simulating the hippocampus, is in charge of learning internal representations of places in the environment using sensory inputs; a graph module, simulating the prefrontal cortex, learns by means of a Hebbian rule the directions of movement which are used to go from one place to another. When the goal has been found at least once, a diffusion of activity in this graph originating from the goal node generates a gradient which, when followed, leads to the goal along the shortest path (Hasselmo 2005, Martinet et al. 2011). Hence, we take a similar, graph-based approach for the planning strategy, using information from our GNG place cells.
The output of the place cells layer feeds into the topological map. In the experiments discussed in this article, the place cells feed directly into the topological map. Hence the number of locations of the topological map is equal to the number of place cells. This is not a requirement, as other subsystems (e.g. grid cells) might feed into the topological map to define the current location, and because the topological map can cluster similar inputs (e.g. different place cells projecting onto the same node in the topological map to create a coarser, but more stable, topological map). In the experiments presented here, we intentionally made the number of place cells equal to the number of nodes (locations) in the topological map (the place cells and the topological map have the same resolution). At each time step, only the most active place cell is used to define the most likely location in the topological map (all other connections are set to zero). This makes analyzing the results easier, as the performance of the topological map will be directly related to the performance of the place cells. To differentiate nodes in the topological map from place cells, we use the notation $n^{PFC}$ for nodes in the map (referring to the prefrontal cortex) and $n^{PC}$ for place cells.
3.1.1. Learning the topological map
During an exploration phase, the topological map learns connections between nodes by Hebbian learning. For this, two types of information need to be stored: the relative angle between two nodes and their mutual distance (Fig. 8(a)). To store this information, we use two sets of transition neurons for each connection between nodes (Banquet et al. 2005, Cuperlier et al. 2007). There are $N_{ANG}$ transition neurons (per set) for directional information and $N_{DIST}$ for distance information. Each node initially has connections (transition neurons) to every other node, with zero weights. The transition neurons are stored in a vector $\mathbf{v}_{k,l} = [v^1_{k,l}, \ldots, v^{N_{ANG}+N_{DIST}}_{k,l}]^T$, i.e. the subscript $k,l$ indicates the transition from node $k$ to node $l$ and the superscript indicates the affected transition neuron (orientation or distance information).
As with the distance information, the angular information is stored in a set of $N_{ANG}$ neurons with Gaussian activation functions, each centered on a fixed direction. We define the vector $\mathbf{b} = [b^1, \ldots, b^{N_{ANG}}, b^{N_{ANG}+1}, \ldots, b^{N_{ANG}+N_{DIST}}]^T$, analogous to $\mathbf{v}_{k,l}$. The first $N_{ANG}$ elements of $\mathbf{b}$ code the angular information between locations, and the last $N_{DIST}$ elements contain the distance information between two locations. That is, a vector $\mathbf{b}(t_0,t_1)$ contains the activation of the transition neurons coding for the move from the location where the robot was at time $t_0$ to the location where it is at time $t_1$.
Now, to update the neurons $v^i_{k,l}$, we iterate over a trajectory of the robot and update the weights to the transition neurons using a simple learning rule:
$$v^i_{k,l}(t+1) = (1-\delta_{k,l})\,H\!\left(\bar{n}^{L3}_{conf}-\beta\right) b^i(t_k,t) \quad \text{if } k = last \wedge l = winner$$
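A minimal Python sketch of the transition coding and of the learning rule may make the notation easier to follow. Only the update itself follows the equation above; everything else is an assumption: the population sizes ($N_{ANG} = 8$, $N_{DIST} = 5$), the Gaussian tuning widths, the 0 to 2.5 m range of preferred distances, computing $\mathbf{b}$ from a metric displacement (e.g. from odometry), taking $t_k$ as the step at which node $k$ became the winner, and the trajectory format are all hypothetical. The place cell confidence $\bar{n}^{L3}_{conf}$ is simply passed in as a number, and $H$ is read as gating the update (no change below $\beta$), as the text describes.

```python
import math

N_ANG, N_DIST = 8, 5              # hypothetical population sizes
SIGMA_ANG = 2 * math.pi / N_ANG   # hypothetical angular tuning width (rad)
MAX_DIST = 2.5                    # hypothetical largest preferred distance (m)
PREF_DIST = [MAX_DIST * i / (N_DIST - 1) for i in range(N_DIST)]
SIGMA_DIST = MAX_DIST / N_DIST    # hypothetical distance tuning width (m)


def transition_code(p0, p1):
    """b(t0, t1): Gaussian population code of the move from position p0 to p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    angle, dist = math.atan2(dy, dx), math.hypot(dx, dy)
    b = []
    for i in range(N_ANG):
        pref = 2 * math.pi * i / N_ANG
        diff = math.atan2(math.sin(angle - pref), math.cos(angle - pref))  # wrap to [-pi, pi]
        b.append(math.exp(-diff ** 2 / (2 * SIGMA_ANG ** 2)))
    for pref in PREF_DIST:
        b.append(math.exp(-(dist - pref) ** 2 / (2 * SIGMA_DIST ** 2)))
    return b


def update_transitions(v, last, winner, conf, beta, b):
    """v^i_{k,l}(t+1) = (1 - delta_{k,l}) H(conf - beta) b^i(t_k, t) for k = last, l = winner.

    Transitions absent from the dict v are all-zero weight vectors. A zero gate
    is read as "no update", as described in the text.
    """
    delta = 1 if last == winner else 0          # Kronecker delta: no self-connections
    heaviside = 1 if conf - beta >= 0 else 0    # H(0) = 1 is assumed here
    gate = (1 - delta) * heaviside
    if gate:
        v[(last, winner)] = [gate * bi for bi in b]


def learn_map(trajectory, beta=0.5):
    """trajectory: list of (place_cell_activations, conf, (x, y)) tuples (hypothetical format)."""
    v, last, last_pos = {}, None, None
    for acts, conf, pos in trajectory:
        winner = max(range(len(acts)), key=acts.__getitem__)  # most active place cell
        if last is not None and winner != last:
            update_transitions(v, last, winner, conf, beta, transition_code(last_pos, pos))
        if conf >= beta and winner != last:
            last, last_pos = winner, pos   # 'last': previous node seen with high confidence
    return v


# Toy trajectory: node 0 -> node 1 (moving east), then a low-confidence step,
# then node 1 -> node 2 (moving north).
traj = [([1, 0, 0], 0.9, (0.0, 0.0)), ([1, 0, 0], 0.9, (0.1, 0.0)),
        ([0, 1, 0], 0.9, (0.5, 0.0)), ([0, 0, 1], 0.1, (1.0, 0.0)),
        ([0, 0, 1], 0.9, (0.5, 0.5))]
v = learn_map(traj)
print(sorted(v))
```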
Here $\delta_{k,l}$ is the Kronecker delta, which prohibits connections from a node to itself. $H(x)$ is the Heaviside step function, used to prevent updating the graph when the confidence of the place cells ($\bar{n}^{L3}_{conf}$) is below a threshold $\beta$. $last$ is an index referring to the node in the topological map where the robot was when $\bar{n}^{L3}_{conf}$ was above the threshold for the last time, i.e. the previous location.
For planning in this graph, only the shortest transitions between nodes are kept, based on the distance-coding neurons. For the results presented here, we fixed the maximum number of transitions per node to 6. A learnt map is shown in Fig. 8(b). While we explained this process as a sequential algorithm (recruitment of nodes, learning of transitions, competition), it can be done online by simply adding an additional transition usage intensity neuron $v^0_{k,l}$, whose activation increases when the robot moves from $k$ to $l$, combined with a decay rate or competition factor (i.e. $-\psi\bar{v}^0_{k,l}$) added to the previous equation, which also prevents the weights from increasing without bounds.
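The pruning step can be sketched as follows. The sketch assumes that "per node" refers to outgoing transitions, and it decodes each transition's length from its distance neurons with a population-vector average (the activation-weighted mean of the preferred distances); both choices, the function names and the toy graph are hypothetical rather than taken from the paper.

```python
import heapq


def decode_distance(dist_part, pref_dist):
    """Population-vector estimate: activation-weighted mean of the preferred distances."""
    total = sum(dist_part)
    if total <= 0:
        return float("inf")   # no distance information: treat as infinitely long
    return sum(a * p for a, p in zip(dist_part, pref_dist)) / total


def prune_transitions(v, n_ang, pref_dist, max_per_node=6):
    """Keep, for each source node k, only its max_per_node shortest transitions (k, l)."""
    candidates = {}
    for (k, l), weights in v.items():
        d = decode_distance(weights[n_ang:], pref_dist)
        candidates.setdefault(k, []).append((d, l))
    return {(k, l): v[(k, l)]
            for k, cands in candidates.items()
            for _, l in heapq.nsmallest(max_per_node, cands)}


# Toy graph: node 0 has 8 outgoing transitions, each with a one-hot distance code.
PREF = [0.25 * i for i in range(10)]
N_ANG = 4
v = {(0, l): [0.0] * N_ANG + [1.0 if i == l else 0.0 for i in range(10)] for l in range(1, 9)}
v[(1, 0)] = [0.0] * N_ANG + [1.0] + [0.0] * 9
kept = prune_transitions(v, N_ANG, PREF)
print(sorted(kept))
```

Only the six shortest transitions leaving node 0 survive, while node 1, which has a single transition, is left untouched.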
3.1.2. Using the planning expert
In order to plan in this graph, the model maintains a set of neurons $g_i$, one for each node, corresponding to the reward received at each of the locations. A leak rate is added so that the robot can navigate in an environment with changing reward locations. $g_{winner}$ is the neuron assigned to the node at the robot's current position:
$$g_{winner}(t+1) = g_{winner}(t)\left(1-\tau_{learn}\,n^{PFC}_{winner}(t)\right) + \tau_{learn}\,n^{PFC}_{winner}(t)\cdots$$
$$g_j(t+1) = g_j(t+1)\,(1-\tau_{forget}) \qquad (1)$$
The second value associated with a node is the diffusion value $d_j(t)$, which is used to implement a shortest-path algorithm. The activation from the goals diffuses, or spreads out (Hasselmo 2005, Martinet et al. 2011), over the other nodes (Fig. 8(a)). To compute the equilibrium state efficiently, we used a modified Floyd-Warshall algorithm (Floyd 1962), where $d_j[iter]$ refers to the value of $d_j$ at iteration $iter$:
Algorithm 1 Computing the diffusion values
iter = 0
for i = 0; i < N_PFC; i = i + 1 do
    d_i[0] = g_i
while iter < N_PFC − 1 do
    for i = 0; i < N_PFC; i = i + 1 do
        d_i[iter + 1] = max(d_i[iter], max_{j ∈ neighbors(i)}(d_j[iter]) · ι)
    iter = iter + 1
for i = 0; i < N_PFC; i = i + 1 do
    d_i = d_i[N_PFC − 1]
This algorithm is only run when $\bar{n}^{L3}_{conf} > \beta$, i.e. when the robot has a high trust in its current position.
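Algorithm 1, and the gradient following it enables, can be sketched in Python. Here ι is assumed to be a per-transition attenuation factor with 0 < ι < 1 (it is not defined in this excerpt), neighbors(i) is taken to be the set of nodes reachable from i in one learnt transition, and `next_node` is a hypothetical helper. Since max_j(d_j)·ι = max_j(d_j·ι) for ι ≥ 0, the update can be written either way. A shortest path visits at most N_PFC − 1 transitions, so N_PFC − 1 synchronous iterations are enough to reach equilibrium; with a single goal of value g, each node ends with d_i = g·ι^h, where h is its number of transitions to the goal, so moving to the neighbor with the highest d_i follows a path with the fewest transitions.

```python
def diffusion_values(g, neighbours, iota):
    """Algorithm 1: spread goal values g over the graph, attenuated by iota per transition.

    neighbours[i] lists the nodes reachable from node i in one transition.
    """
    n = len(g)
    d = list(g)                              # d_i[0] = g_i
    for _ in range(n - 1):                   # iter = 0 .. N_PFC - 2
        # Synchronous update: every d_i[iter + 1] is computed from d[iter].
        d = [max([d[i]] + [d[j] * iota for j in neighbours[i]]) for i in range(n)]
    return d


def next_node(current, d, neighbours):
    """Follow the gradient: move to the neighbour with the highest diffusion value."""
    return max(neighbours[current], key=lambda j: d[j])


# Two routes from node 0 to the goal (node 5): 0-1-5 and 0-2-3-4-5.
nbrs = {0: [1, 2], 1: [0, 5], 2: [0, 3], 3: [2, 4], 4: [3, 5], 5: [1, 4]}
g = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
d = diffusion_values(g, nbrs, iota=0.9)
print([round(x, 3) for x in d], next_node(0, d, nbrs))
```

In the robot, this computation would only be triggered when the place cell confidence exceeds β, as stated above.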