Conference Paper

An actor-critic with an internal model

To read the full-text of this research, you can request a copy directly from the authors.


Current evidence suggests that the brain uses multiple systems for instrumental control, known as model-based and model-free. The former predicts action outcomes using an internal model of the agent's environment, while the latter learns to repeat previously rewarded actions. This paper proposes a neural architecture comprising both model-free and model-based reinforcement learning systems and tests the model's ability to perform target reaching with a simulated biarticulate robotic arm. Target-reaching conditions included (A) both static and dynamic target properties, (B) slowly changing robotic arm kinematics, and (C) absence of visual input. The proposed model rapidly learns an internal model of the environmental dynamics, shows target-reaching performance superior to an existing state-of-the-art model, and successfully reaches targets without visual input.
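The distinction between the two systems can be made concrete with a toy sketch. The following is a hedged illustration, not the paper's implementation: a tabular model-free TD update alongside a one-step model-based lookahead on a tiny three-state chain. All names, states, and values are assumptions for demonstration only.

```python
# Illustrative sketch (not the paper's architecture) of model-free vs
# model-based control on a tiny deterministic 3-state chain.

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Model-free: strengthen previously rewarded actions via TD learning."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def model_based_action(model, reward, Q, s, gamma=0.9):
    """Model-based: predict each action's outcome with an internal model."""
    def lookahead(a):
        s_next = model[s][a]  # internal model predicts the next state
        return reward[s_next] + gamma * max(Q[s_next].values())
    return max(model[s], key=lookahead)

# Tiny world: moving right leads toward a rewarded terminal state.
states = [0, 1, 2]
actions = ["stay", "right"]
model = {s: {"stay": s, "right": min(s + 1, 2)} for s in states}
reward = {0: 0.0, 1: 0.0, 2: 1.0}
Q = {s: {a: 0.0 for a in actions} for s in states}

q_update(Q, 1, "right", 1.0, 2)                    # model-free: learn from experience
chosen = model_based_action(model, reward, Q, 1)   # model-based: plan one step ahead
```

The model-free update only improves the entry it experienced, while the model-based lookahead can exploit the transition model immediately, which mirrors the flexibility/efficiency trade-off described above.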


... In this case, the APAC behaves exactly like DDPG. If the arbitrator always selects the action from the inverse model for each step, then the APAC becomes an exclusive deliberate planning controller which we call supervised predictive actor-critic (SPAC) [41]. The third model is when the APAC is able to arbitrate between the actions provided by the inverse model and the actor. ...
It is well established that human decision making and instrumental control use multiple systems, some of which rely on habitual action selection and some of which require deliberate planning. Deliberate planning systems predict action outcomes using an internal model of the agent's environment, while habitual action selection systems learn to automate behavior by repeating previously rewarded actions. Habitual control is computationally efficient but may be inflexible in changing environments. Conversely, deliberate planning may be computationally expensive, but flexible in dynamic environments. This paper proposes a general architecture comprising both control paradigms by introducing an arbitrator that controls which subsystem is used at any time. This system is implemented for a target-reaching task with a simulated two-joint robotic arm that combines a supervised internal model with deep reinforcement learning. By permuting target-reaching conditions, we demonstrate that the proposed model is capable of rapidly learning the kinematics of the system without a priori knowledge, and is robust to (A) changing environmental reward and kinematics, and (B) occluded vision. The arbitrator model is compared to instances of the model that rely exclusively on deliberate planning with the internal model or exclusively on habitual control. The results show how such a model can harness the benefits of both systems, making fast decisions in reliable circumstances while optimizing performance in changing environments. In addition, the proposed model learns very quickly. Finally, the system that includes internal models is able to reach the target under visual occlusion, while the purely habitual system cannot operate adequately under such conditions.
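The arbitration idea can be sketched in a few lines. This is a minimal, hedged illustration under the assumption that the arbitrator gates on a scalar reliability signal (e.g. the actor's recent prediction error); the threshold, interface, and names are illustrative, not the actual APAC code.

```python
# Hedged sketch of arbitration between a habitual (actor) action and a
# deliberate (inverse-model) action, gated by the actor's reliability.
# The scalar-error interface and threshold are illustrative assumptions.

def arbitrate(actor_action, inverse_action, actor_error, threshold=0.05):
    """Use habitual control when it has proven reliable, else plan."""
    return actor_action if actor_error < threshold else inverse_action

habitual = arbitrate(0.3, 0.7, actor_error=0.01)    # reliable -> habitual action
deliberate = arbitrate(0.3, 0.7, actor_error=0.40)  # unreliable -> planner's action
```

In this reading, always returning `actor_action` recovers pure habitual control (DDPG-like), and always returning `inverse_action` recovers the pure deliberate planner (SPAC-like), matching the two exclusive instances the paper compares against.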
Internal models are neural mechanisms that can mimic the input-output or output-input properties of the motor apparatus and external objects. Forward internal models predict sensory consequences from efference copies of motor commands. There is growing acceptance of the idea that forward models are important in sensorimotor integration as well as in higher cognitive function, but their anatomical loci and neural mechanisms are still largely unknown. Some of the most convincing evidence that the central nervous system (CNS) makes use of forward models in sensory motor control comes from studies on grip force-load force coupling. We first present a brief review of recent computational and behavioral studies that provide decisive evidence for the utilization of forward models in grip force-load force coupling tasks. Then, we used functional magnetic resonance imaging (fMRI) to measure the brain activity related to this coupling and demonstrate that the cerebellum is the most likely site for forward models to be stored.
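The grip force-load force coupling described above can be illustrated with a toy forward model: an efference copy of the lifting command lets the controller predict the load force and set grip force in anticipation, rather than waiting on delayed feedback. All constants and function names below are arbitrary assumptions for illustration.

```python
# Toy illustration of the forward-model account of grip-load coupling.
# Constants (friction, safety margin) are arbitrary assumptions.

def predicted_load(mass, accel_command, g=9.81):
    """Forward model: predict load force from the efference copy."""
    return mass * (g + accel_command)

def anticipatory_grip(mass, accel_command, friction=0.5, margin=1.2):
    """Set grip force to the predicted load, with a slip-safety margin."""
    return margin * predicted_load(mass, accel_command) / friction

static_load = predicted_load(1.0, 0.0)   # holding still: weight only
lift_grip = anticipatory_grip(1.0, 2.0)  # grip rises with the lift command
```

The key point captured here is that grip force is computed from the motor command itself, so it can co-vary with load force with no feedback delay, which is the behavioral signature the cited studies exploit.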
Although choice is often unitary on theoretical accounts, there is much empirical evidence that decisions are produced by multiple, cooperating or competing neural and psychological mechanisms. We review the evidence that decisions in humans and other animals are influenced by three systems for value learning: Pavlovian, habitual, and goal-directed. These systems are behaviorally dissociable, are mediated by at least partly differentiable brain systems, and embody distinct computational principles. We discuss how the interactions between these systems for behavioral control can produce errors, inefficiencies, and disorders involving compulsion, and how these systems relate to other dual- or multiple-system models in neuroeconomics.
Humans can point fairly accurately to memorized states with their eyes closed, despite slow or even missing sensory feedback. It is also common for arm dynamics to change during development or after injuries. We propose a biologically motivated implementation of an arm controller that includes an adaptive observer. Our implementation is based on the neural field framework, and we show how a path integration mechanism can be trained from a few examples. Our results illustrate successful generalization of path integration with a dynamic neural field, by which the robotic arm can move in arbitrary directions and at arbitrary velocities. Also, by adapting the strength of the motor effect, the observer implicitly learns to compensate for an image acquisition delay in the sensory system. Our dynamic implementation of an observer successfully guides the arm toward the target in the dark, and the model produces movements with a bell-shaped velocity profile, consistent with human behavioral data.
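The path-integration principle underlying that observer can be shown in its simplest form: with vision occluded, position is dead-reckoned by integrating commanded velocity over time. The paper's dynamic neural field is far richer than this; the sketch below shows only the integration step, and its names and timestep are assumptions.

```python
# Minimal sketch of path integration: dead-reckon 2-D hand position from
# a sequence of commanded velocities when visual feedback is absent.

def integrate_path(start, velocities, dt=0.01):
    """Accumulate velocity * dt to estimate position without vision."""
    x, y = start
    for vx, vy in velocities:
        x += vx * dt
        y += vy * dt
    return (x, y)

estimate = integrate_path((0.0, 0.0), [(1.0, 0.0)] * 100)  # 1 s at 1 m/s
```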
This review will focus on the possibility that the cerebellum contains an internal model or models of the motor apparatus. Inverse internal models can provide the neural command necessary to achieve some desired trajectory. First, we review the necessity of such a model and the evidence, based on the ocular following response, that inverse models are found within the cerebellar circuitry. Forward internal models predict the consequences of actions and can be used to overcome time delays associated with feedback control. Secondly, we review the evidence that the cerebellum generates predictions using such a forward model. Finally, we review a computational model that includes multiple paired forward and inverse models and show how such an arrangement can be advantageous for motor learning and control.
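The multiple paired forward/inverse model arrangement mentioned above (MOSAIC-style) can be sketched as follows: each forward model's prediction error is converted into a responsibility weight, which gates how much its paired inverse model contributes to the motor command. The soft-min form, temperature, and names are illustrative assumptions, not the cited computational model's exact equations.

```python
import math

# Sketch of paired forward/inverse modules: forward-model accuracy
# determines how strongly each inverse model's command is weighted.
# Soft-min form and temperature are illustrative assumptions.

def responsibilities(prediction_errors, temperature=1.0):
    """Lower forward-model prediction error -> higher responsibility."""
    scores = [math.exp(-e / temperature) for e in prediction_errors]
    total = sum(scores)
    return [s / total for s in scores]

def blended_command(inverse_commands, prediction_errors):
    """Mix inverse-model outputs, weighted by forward-model accuracy."""
    w = responsibilities(prediction_errors)
    return sum(wi * u for wi, u in zip(w, inverse_commands))

# Module 0's forward model predicts well, so its command dominates.
u = blended_command([1.0, -1.0], [0.0, 10.0])
```

This gating is what makes the arrangement advantageous for learning: only the module whose forward model fits the current context receives credit and control.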
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.