A biologically inspired meta-control navigation system for the Psikharpax rat robot

Institut des Systèmes Intelligents et de Robotique (ISIR), Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France.
Bioinspiration & Biomimetics (Impact Factor: 2.35). 06/2012; 7(2):025009. DOI: 10.1088/1748-3182/7/2/025009
Source: PubMed


A biologically inspired navigation system for the mobile rat-like robot named Psikharpax is presented, allowing for self-localization and autonomous navigation in an initially unknown environment. The ability of parts of the model (e.g. the strategy selection mechanism) to reproduce rat behavioral data in various maze tasks has previously been validated in simulation, but the capacity of the model to work on a real robotic platform had not been tested. This paper presents our implementation on the Psikharpax robot of two independent navigation strategies (a place-based planning strategy and a cue-guided taxon strategy) together with a strategy-selection meta-controller. We show how the robot can memorize which strategy was optimal in each situation by means of a reinforcement learning algorithm. A context detector further enables the controller to adapt quickly to changes in the environment, recognized as new contexts, and to restore previously acquired strategy preferences when a formerly experienced context is recognized. This produces adaptivity closer to rat behavioral performance and constitutes a computational proposition for the role of the rat prefrontal cortex in strategy shifting. Such a brain-inspired meta-controller may also advance learning architectures in robotics.
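To make the strategy-selection mechanism concrete, here is a minimal sketch, assuming a tabular reinforcement-learning meta-controller over discrete context labels; all names, parameters, and the epsilon-greedy scheme are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

class MetaController:
    """Illustrative strategy-selection meta-controller: learns, per context,
    which navigation strategy (e.g. planning vs. taxon) tends to pay off."""

    def __init__(self, strategies, alpha=0.1, epsilon=0.1):
        self.strategies = list(strategies)  # e.g. ['planning', 'taxon']
        self.alpha = alpha                  # learning rate
        self.epsilon = epsilon              # exploration rate
        # One value per (context, strategy). A new context starts from scratch;
        # re-entering a known context restores its learned preferences.
        self.q = defaultdict(lambda: dict.fromkeys(self.strategies, 0.0))

    def select(self, context):
        """Epsilon-greedy choice among strategies for the current context."""
        if random.random() < self.epsilon:
            return random.choice(self.strategies)
        values = self.q[context]
        return max(values, key=values.get)

    def update(self, context, strategy, reward):
        """Reinforce the chosen strategy with the reward its trial produced."""
        values = self.q[context]
        values[strategy] += self.alpha * (reward - values[strategy])

# Usage: the context label is assumed to come from a separate context
# detector that flags environmental changes as new contexts.
meta = MetaController(['planning', 'taxon'])
strategy = meta.select('ctx_0')
# ... execute the chosen strategy for one trial, observe its reward ...
meta.update('ctx_0', strategy, reward=1.0)
```

The key design point mirrored here is that preferences are stored per context, so a recognized context immediately restores earlier strategy preferences instead of relearning them.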

    • " MB systems are then updated according to the action a taken by the full model in state s – even if the systems would have individ - ually favoured different actions – and the resulting new state s 0 and retrieved reward r , as previously done in other computational mod - els involving a cooperation between model - free and model - based systems ( Caluwaerts et al . , 2012 ) ."
    ABSTRACT: Gaining a better understanding of the biological mechanisms underlying the individual variation observed in response to rewards and reward cues could help to identify and treat individuals more prone to disorders of impulsive control, such as addiction. Variation in response to reward cues is captured in rats undergoing autoshaping experiments, where the appearance of a lever precedes food delivery. Although no response is required for food to be delivered, some rats (goal-trackers) learn to approach and avidly engage the magazine until food delivery, whereas other rats (sign-trackers) come to approach and avidly engage the lever. The impulsive and often maladaptive characteristics of the latter response are reminiscent of addictive behaviour in humans. In a previous article, we developed a computational model accounting for a set of experimental data regarding sign-trackers and goal-trackers. Here we show new simulations of the model to draw experimental predictions that could help further validate or refute it. In particular, we apply the model to new experimental protocols, such as injecting flupentixol locally into the core of the nucleus accumbens rather than systemically, and lesioning the core of the nucleus accumbens before or after conditioning. In addition, we discuss the possibility of removing the food magazine during the inter-trial interval. The predictions from this revised model will help us better understand the role of different brain regions in the behaviours expressed by sign-trackers and goal-trackers.
    Journal of Physiology-Paris 06/2014; 109(1-3). DOI:10.1016/j.jphysparis.2014.06.001 · 1.90 Impact Factor
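The shared update described in the quoted passage above can be sketched as follows; the expert class and its tabular Q-learning rule are illustrative assumptions, not the cited model's actual code.

```python
class ModelFreeExpert:
    """Illustrative tabular Q-learning expert."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, s_next, r):
        # Standard Q-learning step on the shared transition.
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

def update_all_experts(experts, s, a, s_next, r):
    """Every expert (model-free or model-based) is trained on the same
    transition (s, a, s_next, r) actually experienced by the full model,
    even if an individual expert would have favoured a different action."""
    for expert in experts:
        expert.update(s, a, s_next, r)
```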
    • "Neural models of memory can also detect novelty by learning sequences of states that provide a simple mean of representing pathways through the environment (Hasselmo and McClelland, 1999). Dollé (2011) and Caluwaerts et al. (2012) propose models of metacontroller for spatial navigation that select on the fly the best strategy in a given situation. A competition following by a reinforcement learning allows to associate the action that best fits to the situation. "
    ABSTRACT: Autonomy and self-improvement capabilities remain challenging in the fields of robotics and machine learning. Allowing a robot to autonomously navigate wide, unknown environments requires not only a repertoire of robust strategies to cope with miscellaneous situations, but also mechanisms of self-assessment for guiding learning and monitoring strategies. Monitoring strategies requires feedback on the behavior's quality from a fitness system in order to make correct decisions. In this work, we focus on how a second-order controller can be used to (1) manage behaviors according to the situation and (2) seek human interaction to improve skills. Following an incremental and constructivist approach, we present a generic neural architecture, based on an online novelty-detection algorithm, that can self-evaluate any sensory-motor strategy. This architecture learns contingencies between sensations and actions, predicting the expected sensation from the previous perception. The prediction error arising from surprising events provides a measure of the quality of the underlying sensory-motor contingencies. We show how a simple second-order controller (emotional system) based on prediction progress allows the system to regulate its behavior to solve complex navigation tasks, and to ask for help when it detects deadlock situations. We propose that this model could be a key structure toward self-assessment and autonomy. Several experiments demonstrate these properties for two different strategies (road following and place-cell-based navigation) in different situations.
    Frontiers in Neurorobotics 10/2013; 7:16. DOI:10.3389/fnbot.2013.00016
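As a rough illustration of the self-assessment idea in the abstract above, here is a sketch of a prediction-error monitor, assuming sensations are numeric feature vectors and a simple per-feature linear predictor; the threshold, names, and update rule are invented for the example.

```python
class PredictionMonitor:
    """Illustrative self-assessment: learns to predict the next sensation
    from the current one; a sustained high prediction error signals that
    the active sensory-motor strategy is failing."""

    def __init__(self, n_features, lr=0.05, smoothing=0.9):
        self.w = [0.0] * n_features   # crude per-feature linear predictor
        self.lr = lr
        self.smoothing = smoothing
        self.avg_error = 0.0          # smoothed surprise signal

    def step(self, sensation, next_sensation):
        """Train on one transition and return the smoothed prediction error."""
        error = 0.0
        for i, (x, y) in enumerate(zip(sensation, next_sensation)):
            predicted = self.w[i] * x
            error += abs(y - predicted)
            self.w[i] += self.lr * (y - predicted) * x   # LMS update
        self.avg_error = (self.smoothing * self.avg_error
                          + (1.0 - self.smoothing) * error)
        return self.avg_error

    def deadlocked(self, threshold=1.0):
        """Persistent surprise: time to switch strategy or ask for help."""
        return self.avg_error > threshold
```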
    • "In this study, we used a machine learning approach to visual navigation in neurorobotics, where the neural encoding is an emergent property of the function approximation used in the learning algorithm. An alternative approach is to use biologically-inspired computational modeling of the brain circuits involved in navigation in real animals (Arleo and Gerstner, 2000; Krichmar et al., 2005; Fleischer et al., 2007; Barrera and Weitzenfeld, 2008; Giovannangeli and Gaussier, 2008; Milford and Wyeth, 2010; Caluwaerts et al., 2012). Currently, the two approaches are mostly complementary. "
    ABSTRACT: Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces, which cannot be handled by standard function-approximation methods. In this study, we propose a scaled version of free-energy based reinforcement learning to achieve more robust and more efficient learning performance. The action-value function is approximated by the negative free energy of a restricted Boltzmann machine, divided by a constant scaling factor related to the size of the Boltzmann machine (the square root of the number of state nodes in this study). Our first task is a digit floor gridworld task, where the states are represented by images of handwritten digits from the MNIST data set. The purpose of the task is to investigate the proposed method's ability, through the extraction of task-relevant features in the hidden layer, to cluster images of the same digit and to cluster images of different digits that correspond to states with the same optimal action. We also test the method's robustness with respect to different exploration schedules, i.e., different settings of the initial temperature and the temperature discount rate in softmax action selection. Our second task is a robot visual navigation task, where the robot can learn its position from the different colors of the lower part of four landmarks and infer the correct corner goal area from the color of the upper part of the landmarks. The state space consists of binarized camera images with, at most, nine different colors, corresponding to 6642 binary states. For both tasks, the learning performance is compared with standard FERL and with function approximation where the action-value function is approximated by a two-layered feedforward neural network.
    Frontiers in Neurorobotics 02/2013; 7:3. DOI:10.3389/fnbot.2013.00003
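The scaled action value described in the abstract above can be written compactly. The sketch below assumes a binary RBM over the concatenated state-action vector with the standard free-energy expression, and divides by the square root of the number of state nodes as stated; variable names and shapes are assumptions for illustration.

```python
import numpy as np

def rbm_free_energy(v, W, b_vis, b_hid):
    """Free energy of a binary RBM for visible vector v:
    F(v) = -b_vis.v - sum_j softplus(b_hid_j + W_j.v)."""
    hidden_input = b_hid + W @ v                      # W: (n_hidden, n_visible)
    # np.logaddexp(0, x) is a numerically stable softplus log(1 + exp(x)).
    return -(b_vis @ v) - np.sum(np.logaddexp(0.0, hidden_input))

def scaled_ferl_q(state, action, W, b_vis, b_hid):
    """Scaled FERL action value: negative RBM free energy of the
    concatenated (state, action) vector, divided by sqrt(#state nodes)."""
    v = np.concatenate([state, action])
    return -rbm_free_energy(v, W, b_vis, b_hid) / np.sqrt(len(state))
```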