A biologically inspired meta-control navigation system for the Psikharpax rat robot.

Institut des Systèmes Intelligents et de Robotique (ISIR), Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France.
Bioinspiration &amp Biomimetics (Impact Factor: 2.41). 06/2012; 7(2):025009. DOI: 10.1088/1748-3182/7/2/025009
Source: PubMed

ABSTRACT A biologically inspired navigation system for the mobile rat-like robot named Psikharpax is presented, allowing for self-localization and autonomous navigation in an initially unknown environment. The ability of parts of the model (e.g. the strategy selection mechanism) to reproduce rat behavioral data in various maze tasks has been validated before in simulations. But the capacity of the model to work on a real robot platform had not been tested. This paper presents our work on the implementation on the Psikharpax robot of two independent navigation strategies (a place-based planning strategy and a cue-guided taxon strategy) and a strategy selection meta-controller. We show how our robot can memorize which was the optimal strategy in each situation, by means of a reinforcement learning algorithm. Moreover, a context detector enables the controller to quickly adapt to changes in the environment-recognized as new contexts-and to restore previously acquired strategy preferences when a previously experienced context is recognized. This produces adaptivity closer to rat behavioral performance and constitutes a computational proposition of the role of the rat prefrontal cortex in strategy shifting. Moreover, such a brain-inspired meta-controller may provide an advancement for learning architectures in robotics.

1 Bookmark
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: A common assumption in psychology, economics, and other fields holds that higher performance will result if extrinsic rewards (such as money) are offered as an incentive. While this principle seems to work well for tasks that require the execution of the same sequence of steps over and over, with little uncertainty about the process, in other cases, especially where creative problem solving is required due to the difficulty in finding the optimal sequence of actions, external rewards can actually be detrimental to task performance. Furthermore, they have the potential to undermine intrinsic motivation to do an otherwise interesting activity. In this work, we extend a computational model of the dorsomedial and dorsolateral striatal reinforcement learning systems to account for the effects of extrinsic and intrinsic rewards. The model assumes that the brain employs both a goal-directed and a habitual learning system, and competition between both is based on the trade-off between the cost of the reasoning process and value of information. The goal-directed system elicits internal rewards when its models of the environment improve, while the habitual system, being model-free, does not. Our results account for the phenomena that initial extrinsic reward leads to reduced activity after extinction compared to the case without any initial extrinsic rewards, and that performance in complex task settings drops when higher external rewards are promised. We also test the hypothesis that external rewards bias the competition in favor of the computationally efficient, but cruder and less flexible habitual system, which can negatively influence intrinsic motivation and task performance in the class of tasks we consider.
    Frontiers in Psychology 01/2013; 4:739. · 2.80 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Gaining a better understanding of the biological mechanisms underlying the individual variation observed in response to rewards and reward cues could help to identify and treat individuals more prone to disorders of impulsive control, such as addiction. Variation in response to reward cues is captured in rats undergoing autoshaping experiments where the appearance of a lever precedes food delivery. Although no response is required for food to be delivered, some rats (goal-trackers) learn to approach and avidly engage the magazine until food delivery, whereas other rats (sign-trackers) come to approach and engage avidly the lever. The impulsive and often maladaptive characteristics of the latter response are reminiscent of addictive behaviour in humans. In a previous article, we developed a computational model accounting for a set of experimental data regarding sign-trackers and goal-trackers. Here we show new simulations of the model to draw experimental predictions that could help further validate or refute the model. In particular, we apply the model to new experimental protocols such as injecting flupentixol locally into the core of the nucleus accumbens rather than systemically, and lesioning of the core of the nucleus accumbens before or after conditioning. In addition, we discuss the possibility of removing the food magazine during the inter-trial interval. The predictions from this revised model will help us better understand the role of different brain regions in the behaviours expressed by sign-trackers and goal-trackers.
    Journal of Physiology-Paris 06/2014; · 0.82 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Vicarious trial-and-error (VTE) is a behavior observed in rat experiments that seems to suggest self-conflict. This behavior is seen mainly when the rats are uncertain about making a decision. The presence of VTE is regarded as an indicator of a deliberative decision-making process, that is, searching, predicting, and evaluating outcomes. This process is slower than automated decision-making processes, such as reflex or habituation, but it allows for flexible and ongoing control of behavior. In this study, we propose for the first time a robotic model of VTE to see if VTE can emerge just from a body-environment interaction and to show the underlying mechanism responsible for the observation of VTE and the advantages provided by it. We tried several robots with different parameters, and we have found that they showed three different types of VTE: high numbers of VTE at the beginning of learning, decreasing numbers afterward (similar VTE pattern to experiments with rats), low during the whole learning period, and high numbers all the time. Therefore, we were able to reproduce the phenomenon of VTE in a model robot using only a simple dynamical neural network with Hebbian learning, which suggests that VTE is an emergent property of a plastic and embodied neural network. From a comparison of the three types of VTE, we demonstrated that 1) VTE is associated with chaotic activity of neurons in our model and 2) VTE-showing robots were robust to environmental perturbations. We suggest that the instability of neuronal activity found in VTE allows ongoing learning to rebuild its strategy continuously, which creates robust behavior. Based on these results, we suggest that VTE is caused by a similar mechanism in biology and leads to robust decision making in an analogous way.
    PLoS ONE 01/2014; 9(7):e102708. · 3.53 Impact Factor

Full-text (5 Sources)

Available from
May 19, 2014