Jun Morimoto

National Institute of Informatics, Tokyo, Japan

Publications (91) · 35.92 Total Impact Points

  • ABSTRACT: In this study, we show that a movement policy can be improved efficiently using the previous experiences of a real robot. Reinforcement learning (RL) is becoming a popular approach for acquiring a nonlinear optimal policy through trial and error. However, applying RL to real robot control is considered very difficult, since it usually requires many learning trials. Such trials cannot be executed in real environments because an unrealistic amount of time is necessary and the real system's durability is limited. Therefore, instead of executing many learning trials, we propose to use a recently developed RL algorithm, importance-weighted PGPE, by which the robot can efficiently reuse previously sampled data to improve its policy parameters. We apply importance-weighted PGPE to CB-i, our real humanoid robot, and show that it can learn a target-reaching movement and a cart-pole swing-up movement in a real environment without using any prior knowledge of the task or any carefully designed initial trajectory.
    05/2014;
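The data reuse described in this abstract can be sketched as a self-normalized importance-weighted PGPE hyper-parameter update. This is an illustrative reconstruction, not the authors' implementation: the independent-Gaussian hyper-policy, the weighted baseline, and the learning rate are all assumptions.

```python
import numpy as np

def iw_pgpe_update(thetas, returns, mu_old, sigma_old, mu, sigma, lr=0.1):
    """One importance-weighted PGPE update (sketch).

    thetas  : (N, D) policy parameters sampled from N(mu_old, sigma_old^2)
    returns : (N,)   return observed for each sampled parameter vector
    mu, sigma: current hyper-parameters; old samples are reused through
    self-normalized importance weights p_new(theta) / p_old(theta).
    """
    def log_gauss(th, m, s):
        return np.sum(-0.5 * ((th - m) / s) ** 2 - np.log(s), axis=1)

    logw = log_gauss(thetas, mu, sigma) - log_gauss(thetas, mu_old, sigma_old)
    w = np.exp(logw - logw.max())
    w /= w.sum()                          # self-normalized importance weights

    b = np.sum(w * returns)               # weighted baseline
    adv = returns - b
    grad_mu = np.sum((w * adv)[:, None] * (thetas - mu) / sigma ** 2, axis=0)
    grad_sigma = np.sum((w * adv)[:, None]
                        * (((thetas - mu) ** 2 - sigma ** 2) / sigma ** 3),
                        axis=0)
    return mu + lr * grad_mu, sigma + lr * grad_sigma
```

When the sampling and current hyper-parameters coincide, the weights become uniform and the update reduces to plain PGPE with a baseline.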
  • ABSTRACT: In this paper we tackle the problem of controlling whole-body humanoid robot behavior through non-invasive brain-machine interfacing (BMI), motivated by the prospect of mapping human motor control strategies to a human-like mechanical avatar. Our solution is based on an adequate reduction of the controllable dimensionality of high-DOF humanoid motion, in line with the state-of-the-art capabilities of non-invasive BMI technologies, leaving the complementary subspace of the motion to be planned and executed by an autonomous humanoid whole-body motion planning and control framework. The results are shown in a full physics-based simulation of a 36-degree-of-freedom humanoid controlled by a user through EEG-extracted brain signals generated with a motor imagery task.
    Frontiers in Systems Neuroscience 01/2014; 8:138.
  • Giuseppe Lisi, Tomoyuki Noda, Jun Morimoto
    ABSTRACT: This paper investigates the influence of leg afferent input, induced by a leg assistive robot, on the decoding performance of a BMI system. Specifically, it focuses on a decoder based on event-related (de)synchronization (ERD/ERS) of the sensorimotor area. The EEG experiment, performed with healthy subjects, is structured as a 3 × 2 factorial design consisting of two factors: "finger tapping task" and "leg condition." The former is divided into three levels (BMI classes): left-hand finger tapping, right-hand finger tapping, and no movement (Idle). The latter is composed of two levels: leg perturbed (Pert) and leg not perturbed (NoPert). Specifically, the subjects' leg was periodically perturbed by an assistive robot in 5 out of the 10 sessions of the experiment and not moved in the remaining sessions. The aim of this study is to verify that the decoding performance of the finger tapping task is comparable between the two conditions NoPert and Pert. Accordingly, a classifier is trained to output the class of the finger tapping, given as input the features associated with the ERD/ERS. Individually for each subject, the decoding performance is statistically compared between the NoPert and Pert conditions. Results show that the decoding performance is notably above chance for all subjects under both conditions. Moreover, the statistical comparison does not highlight a significant difference between NoPert and Pert in any subject, which is confirmed by feature visualization.
    Frontiers in Systems Neuroscience 01/2014; 8:85.
  • Yuka Ariki, Tetsunari Inamura, Jun Morimoto
    Proc. of the IEEE-RAS Int'l Conf. on Humanoid Robots; 11/2013
  • ABSTRACT: Learning of goal-directed behaviors in biological systems is broadly based on associations between conditional and unconditional stimuli. This can be further classified as classical conditioning (correlation-based learning) and operant conditioning (reward-based learning). Although traditionally modeled as separate learning systems in artificial agents, numerous animal experiments point towards their cooperative role in behavioral learning. Based on this concept, the recently introduced framework of neural combinatorial learning combines the two systems, with both running in parallel to guide the overall learned behavior. Such combinatorial learning yields a faster and more efficient learner. In this work, we further improve the framework by applying a reservoir computing network (RC) as an adaptive critic unit together with reward-modulated Hebbian plasticity. Using a mobile robot system for goal-directed behavior learning, we clearly demonstrate that the reservoir critic outperforms traditional radial basis function (RBF) critics in terms of stability of convergence and learning time. Furthermore, the temporal memory in the RC allows the system to learn in a partially observable Markov decision process scenario, in contrast to a memoryless RBF critic.
    IEEE International Conference on Systems, Man and Cybernetics 2013 (IEEE SMC 2013), Manchester, U.K.; 10/2013
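As an illustration of the reservoir-critic idea above, the following sketch pairs a fixed random echo-state reservoir with a TD(0)-trained linear readout. The reservoir size, the contraction scaling, the normalized learning step, and the class name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class ReservoirCritic:
    """Echo-state-network value critic (minimal sketch): a fixed random
    recurrent reservoir provides a fading temporal memory of the state
    history, and only the linear readout is adapted with a TD(0) rule."""

    def __init__(self, n_in, n_res=100, rho=0.9, gamma=0.95, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(n_res, n_res))
        # Scale to spectral norm rho < 1 so the state map is a contraction.
        self.W = W * (rho / np.linalg.norm(W, 2))
        self.W_in = rng.normal(size=(n_res, n_in))
        self.w_out = np.zeros(n_res)
        self.x = np.zeros(n_res)
        self.gamma, self.lr = gamma, lr

    def step(self, s):
        """Advance the reservoir and return the current value estimate."""
        self.x = np.tanh(self.W @ self.x + self.W_in @ s)
        return self.w_out @ self.x

    def td_update(self, v_prev, x_prev, r, v_next):
        """TD(0) readout update; normalized step for step-size robustness."""
        delta = r + self.gamma * v_next - v_prev
        self.w_out += self.lr * delta * x_prev / (x_prev @ x_prev)
        return delta
```

With a constant input and reward, the readout converges to the discounted return r / (1 - gamma), which is a quick sanity check on the critic.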
  • ABSTRACT: Humans can effortlessly perceive an object they encounter for the first time in a possibly cluttered scene and memorize its appearance for later recognition. Such performance is still difficult to achieve with artificial vision systems because it is not clear how to define the concept of objectness in its full generality. In this paper we propose a paradigm that integrates the robot’s manipulation and sensing capabilities to detect a new, previously unknown object and learn its visual appearance. By making use of the robot’s manipulation capabilities and force sensing, we introduce additional information that can be utilized to reliably separate unknown objects from the background. Once an object has been identified, the robot can continuously manipulate it to accumulate more information about it and learn its complete visual appearance. We demonstrate the feasibility of the proposed approach by applying it to the problem of autonomous learning of visual representations for viewpoint-independent object recognition on a humanoid robot.
    Adaptive Behavior 10/2013; 21(5):328-345. · 1.11 Impact Factor
  • ABSTRACT: The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model is accurately learned from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method by combining a recently proposed model-free policy search method called policy gradients with parameter-based exploration and the state-of-the-art transition model estimator called least-squares conditional density estimation. Through experiments, we demonstrate the practical usefulness of the proposed method.
    Neural Networks. 07/2013;
  • ABSTRACT: Classical conditioning (conventionally modeled as correlation-based learning) and operant conditioning (conventionally modeled as reinforcement learning or reward-based learning) have been found in biological systems. Evidence shows that these two mechanisms strongly involve learning about associations. Based on these biological findings, we propose a new learning model to achieve successful control policies for artificial systems. This model combines correlation-based learning using input correlation learning (ICO learning) and reward-based learning using continuous actor-critic reinforcement learning (RL), thereby working as a dual learner system. The model performance is evaluated by simulations of a cart-pole system as a dynamic motion control problem and a mobile robot system as a goal-directed behavior control problem. Results show that the model can strongly improve the pole-balancing control policy, i.e., it allows the controller to learn to stabilize the pole over the largest domain of initial conditions, compared to the results obtained when using a single learning mechanism. The model can also find a successful control policy for goal-directed behavior, i.e., the robot can effectively learn to approach a given goal, compared to its individual components. Thus, the study pursued here sharpens our understanding of how two different learning mechanisms can be combined and complement each other for solving complex tasks.
    Advances in Complex Systems 07/2013; 16(02n03). · 0.65 Impact Factor
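The correlation-based half of the dual learner above can be written down compactly. The following is a sketch of the standard ICO learning rule, not the paper's exact formulation; the learning rate and the signal names are assumptions.

```python
import numpy as np

def ico_update(w, u_pred, du_reflex, mu=0.05):
    """Input correlation (ICO) learning rule (sketch): the weight of a
    predictive (conditioned) input changes in proportion to the product of
    that input and the temporal derivative of the reflex (unconditioned)
    signal, so the anticipatory pathway learns to act before the reflex
    is triggered."""
    return w + mu * u_pred * du_reflex
```

Because the derivative is positive only at reflex onset, a predictive cue that precedes the reflex accumulates positive weight, while uncorrelated inputs do not.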
  • ABSTRACT: The policy gradient approach is a flexible and powerful reinforcement learning method, particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates while maintaining their unbiasedness. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
    Neural Computation 03/2013; · 1.76 Impact Factor
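The optimal-baseline idea in item (3) above has a closed form for a constant baseline. The sketch below shows it for a generic REINFORCE-style estimator; the function name and the per-episode score representation are illustrative assumptions.

```python
import numpy as np

def optimal_baseline_gradient(score, returns):
    """Policy gradient with the variance-minimizing constant baseline
    b* = E[R ||g||^2] / E[||g||^2], where g_n is the per-episode score
    function (gradient of the log policy likelihood) and R_n the return.
    Subtracting any constant baseline keeps the estimator unbiased; this
    particular choice minimizes its variance.
    """
    g2 = np.sum(score ** 2, axis=1)          # ||g_n||^2 per episode
    b = np.sum(returns * g2) / np.sum(g2)    # optimal constant baseline
    grad = np.mean((returns - b)[:, None] * score, axis=0)
    return grad, b
```

A useful property to check: b minimizes the score-weighted squared residual over all constant baselines, so it can never do worse than using no baseline at all.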
  • T Matsubara, J Morimoto
    ABSTRACT: In this study, we propose a multi-user myoelectric interface that can easily adapt to novel users. When a user performs different motions (e.g., grasping and pinching), different EMG signals are measured. When different users perform the same motion (e.g., grasping), different EMG signals are also measured. Therefore, designing a myoelectric interface that can be used by multiple users to perform multiple motions is difficult. To cope with this problem, we propose a bilinear model for EMG signals that is composed of two linear factors: 1) user-dependent and 2) motion-dependent. By decomposing the EMG signals into these two factors, the extracted motion-dependent factors can be used as user-independent features. We can construct a motion classifier on the extracted feature space to develop the multi-user interface. For novel users, the proposed adaptation method estimates the user-dependent factor through only a few interactions. The bilinear EMG model with the estimated user-dependent factor can extract user-independent features from the novel user's data. We applied our proposed method to a recognition task of five hand gestures for robotic hand control, using four-channel EMG signals measured from the subjects' forearms. Our method resulted in 73% accuracy, which was statistically significantly different from the accuracy of standard non-multi-user interfaces, as determined by a two-sample t-test at a significance level of 1%.
    IEEE transactions on bio-medical engineering 03/2013; · 2.15 Impact Factor
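The few-shot user adaptation described above can be sketched as a least-squares problem under one illustrative bilinear formulation (y_m = Z_m s, with motion-dependent bases Z_m shared across users and a user-dependent style vector s). This formulation, the function name, and the calibration protocol are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def estimate_user_factor(Y_new, Z_motions):
    """Estimate the user-dependent factor of a bilinear EMG model (sketch).

    Y_new     : list of feature vectors from a novel user, one per motion
    Z_motions : list of motion-dependent basis matrices shared across users

    Each observation is modeled as y_m = Z_m @ s; the per-motion systems
    are stacked so the style vector s is recovered from a few calibration
    samples with a single least-squares solve.
    """
    A = np.vstack(Z_motions)       # stacked motion bases
    b = np.concatenate(Y_new)      # stacked observations from the new user
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s
```

Once s is known, dividing it out of new observations leaves motion-dependent features that a user-independent classifier can consume.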
  • Yuka Ariki, Sang-Ho Hyon, Jun Morimoto
    ABSTRACT: In this paper, we propose an imitation learning framework to generate physically consistent behaviors by estimating the ground reaction force from captured human behaviors. In the proposed framework, we first extract behavioral primitives, which are represented by linear dynamical models, from captured human movements and measured ground reaction forces by using a Gaussian mixture of linear dynamical models. Therefore, our method has little dependence on classification criteria defined by an experimenter. By switching primitives in different combinations while estimating the ground reaction force, different physically consistent behaviors can be generated. We apply the proposed method to a four-link robot model to generate squat motion sequences. The four-link robot model successfully generated the squat movements by using our imitation learning framework. To show generalization performance, we also apply the proposed method to robot models that have different torso weights and lengths from the human demonstrator and evaluate the control performance. In addition, we show that the robot model is able to recognize and imitate demonstrator movements even when the observed movements deviate from the movements used to construct the primitives. For further evaluation in a higher-dimensional state space, we apply the proposed method to a seven-link robot model. The seven-link robot model was able to generate squat-and-sway motions by using the proposed framework.
    Neural networks: the official journal of the International Neural Network Society 01/2013; 40C:32-43. · 1.88 Impact Factor
  • N. Sugimoto, J. Morimoto
    ABSTRACT: We develop a fast reinforcement learning (RL) framework that uses the approximated dynamics of a humanoid robot. Although RL is a useful nonlinear optimizer, applying it to real robotic systems is usually difficult due to the large number of iterations required to acquire suitable policies. In this study, we approximate the dynamics using data from a real robot with sparse pseudo-input Gaussian processes (SPGPs). By using SPGPs, we estimate the probability distribution considering both input-vector and output-signal variances. Since observations from robotic sensors in real environments include large noise, SPGPs can suitably approximate the stochastic dynamics of a real humanoid robot. We use the approximated dynamics to improve the performance of a movement task in a path-integral RL framework, which updates a policy from sampled trajectories of the state and action vectors and the cost. We implemented our proposed method on a real humanoid robot and tested it on a via-point reaching task. The robot achieved successful performance with fewer interactions with the real environment by using the proposed method than with a conventional approach that does not use the approximated dynamics.
    Robotics and Automation (ICRA), 2013 IEEE International Conference on; 01/2013
  • ABSTRACT: This paper reports on a novel hybrid-drive lower-extremity exoskeleton research platform, XoR2, an improved version of XoR. Its design concept, details of the new hardware, and basic experimental results are presented. The robot is designed so that it does not interfere with the user's normal walking, and it supports a 30-kg payload in addition to its own weight of 20 kg. The robot has a total of 14 joints; among them, six flexion/extension joints are powered. Pneumatic artificial muscles are combined with small high-response servo motors for the hip and knee joints, and arranged antagonistically at the hip and ankle joints to provide passive stability and variable stiffness. Preliminary experimental results on position and torque control demonstrate that the proposed mechanisms, sensors, and control systems are effective, and that hybrid drive is promising for torque-controllable, high-speed, backdrivable, mobile (but non-power-autonomous) exoskeleton robots.
    Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
  • ABSTRACT: Direct transfer of human motion trajectories to humanoid robots does not result in dynamically stable robot movements, due to the differences in human and humanoid robot kinematics and dynamics. We developed a system that converts human movements captured by a low-cost RGB-D camera into dynamically stable humanoid movements. The transfer of human movements occurs in real time. As the need arises, the developed system can smoothly transition between unconstrained movement imitation and imitation with balance control, where movement reproduction occurs in the null space of the balance controller. The developed balance controller is based on an approximate model of the robot dynamics, which is sufficient to stabilize the robot during on-line imitation. However, the resulting movements cannot be guaranteed to be optimal because the model of the robot dynamics is not exact. The initially acquired movement is therefore subsequently improved by model-free reinforcement learning, with respect to both the accuracy of reproduction and balance control.  We present experimental results in simulation and on a real humanoid robot.
    Robotics and Automation (ICRA), 2013 IEEE International Conference on; 01/2013
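The phrase "movement reproduction occurs in the null space of the balance controller" corresponds to a standard task-priority construction, sketched below. The balance-task Jacobian, the velocity-level formulation, and the function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def nullspace_imitation(J_bal, dx_bal_des, dq_imit):
    """Prioritized velocity command (sketch): the balance-task velocity is
    tracked through the pseudoinverse of its Jacobian, while the imitated
    human joint motion is projected into the null space of the balance
    task, so movement reproduction cannot disturb the balance controller.
    """
    J_pinv = np.linalg.pinv(J_bal)
    N = np.eye(J_bal.shape[1]) - J_pinv @ J_bal   # null-space projector
    return J_pinv @ dx_bal_des + N @ dq_imit
```

Because J_bal @ N = 0, the imitation term is invisible to the balance task: the commanded joint velocities always realize the desired balance-task velocity exactly (for a full-row-rank Jacobian).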
  • ABSTRACT: We introduce our pneumatic-electric (PE) hybrid actuator model and propose to use the model to derive a controller for the hybrid actuation system by an optimal control method. Our PE hybrid actuator is composed of a pneumatic artificial muscle (PAM) and an electric motor. The PE hybrid actuator is light and can generate large torque. These properties are desirable for assistive devices such as exoskeleton robots. However, to take maximal advantage of the PE hybrid system, we need to reasonably distribute the necessary torque to these redundant actuators by properly taking the distinctive characteristics of a pneumatic actuator and an electric motor into account. To do this, we use an optimal control method called iterative LQG to reasonably distribute the necessary torque to the PAM and the electric motor. The crucial issue in applying the optimal control method to the PE hybrid system is PAM modeling. We built a PAM model composed of four elements: 1) an (air-)pressure-force conversion model, 2) a contraction-rate model, 3) the time delay of the air valve, and 4) the upper limit of force generation, which depends on the contraction rate and the movable range. We apply our proposed method to a one-degree-of-freedom (one-DoF) arm with the PE hybrid actuator. The one-DoF arm successfully performed swing tasks at 0.5 Hz, 2 Hz, and 4 Hz, as well as a swing-up and stabilization task, by reasonably distributing the necessary torque to the two different actuators in both simulated and real environments.
    Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
  • ABSTRACT: This study proposes the design of an electromyography (EMG)-based force feedback controller that explicitly considers human-robot interaction for an exoskeletal assistive robot. Conventional approaches have considered only a one-directional mapping from EMG to the control input of the assistive robot. However, EMG and the force generated by the assistive robot interfere with each other; e.g., the amplitude of EMG decreases if limb movements are assisted by the robot. In our proposed method, we first derive a nonlinear mapping from the EMG signal to muscle force to estimate the human joint torque, and convert it to an assistive force using a human musculoskeletal model and the robot kinematic model. Additionally, the feedforward interaction torque is fed back into the torque controller to acquire the necessary loads. To validate the feasibility of the proposed method, an assistive one-DOF system was developed as both real equipment and a simulator. We compared the proposed method with conventional approaches using both the simulated and the real one-DOF systems. As a result, we found that the proposed model was able to estimate the necessary torque adequately to achieve stable human-robot interaction.
    Robotics and Automation (ICRA), 2013 IEEE International Conference on; 01/2013
  • ABSTRACT: Autonomous robots cannot be programmed in advance for all possible situations. Instead, they should be able to generalize the previously acquired knowledge to operate in new situations as they arise. A possible solution to the problem of generalization is to apply statistical methods that can generate useful robot responses in situations for which the robot has not been specifically instructed how to respond. In this paper we propose a methodology for the statistical generalization of the available sensorimotor knowledge in real-time. Example trajectories are generalized by applying Gaussian process regression, using the parameters describing a task as query points into the trajectory database. We show on real-world tasks that the proposed methodology can be integrated into a sensory feedback loop, where the generalization algorithm is applied in real-time to adapt robot motion to the perceived changes of the external world.
    Robotics and Autonomous Systems. 10/2012; 60(10):1327–1339.
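The idea of using task parameters as query points into a trajectory database can be sketched with plain Gaussian process regression. The squared-exponential kernel, its length scale, and the noise level below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gp_generalize(queries, trajectories, q_new, ell=1.0, noise=1e-6):
    """Trajectory generalization by GP regression (sketch).

    queries      : (N, P) task parameters of the stored example movements
    trajectories : (N, T) one stored trajectory (T samples) per example
    q_new        : (P,)   task parameters of the new situation

    Each output dimension is treated as an independent GP over the query
    space; the posterior mean at q_new blends the stored trajectories.
    """
    def k(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ell ** 2)

    K = k(queries, queries) + noise * np.eye(len(queries))
    k_star = k(q_new[None, :], queries)               # (1, N)
    return (k_star @ np.linalg.solve(K, trajectories))[0]
```

At a stored query point the posterior mean reproduces the stored trajectory (up to the noise term), and between stored points it interpolates smoothly, which is what makes the scheme usable inside a sensory feedback loop.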
  • ABSTRACT: In this study, we propose an extension of the MOSAIC architecture to control real humanoid robots. MOSAIC was originally proposed by neuroscientists to understand the human ability of adaptive control. The modular architecture of the MOSAIC model can be useful for solving nonlinear and non-stationary control problems. Both humans and humanoid robots have nonlinear body dynamics and many degrees of freedom. Since they can interact with environments (e.g., carrying objects), control strategies need to deal with non-stationary dynamics. Therefore, MOSAIC has strong potential as a human motor-control model and as a control framework for humanoid robots. Yet application of the MOSAIC model has been limited to simple simulated dynamics, since it is susceptible to observation noise and cannot be applied to partially observable systems. Our approach introduces state estimators into the MOSAIC architecture to cope with real environments. By using the extended MOSAIC model, we are able to successfully generate squatting and object-carrying behaviors on a real humanoid robot.
    Neural networks: the official journal of the International Neural Network Society 01/2012; 29-30:8-19. · 1.88 Impact Factor
  • ABSTRACT: In this paper, we propose a framework for generating coordinated periodic movements of robotic systems with external inputs. We developed an adaptive pattern generator model that is composed of a two-factor observation model with a style parameter and phase dynamics with a phase variable. The style parameter controls the spatial patterns of the generated trajectories, and the phase variable controls their temporal profiles. To validate the effectiveness of our proposed method, we applied it to a simulated humanoid model to perform biped walking behaviors coordinated with observed walking patterns and the environment. The robot successfully performed stable biped walking even when the style of the observed walking pattern and its period were suddenly changed.
    Proceedings - IEEE International Conference on Robotics and Automation 01/2012;
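Phase dynamics of the kind mentioned above are commonly written as a phase oscillator entrained to an observed rhythm. The sketch below uses the textbook coupling form; the coupling gain, step size, and function name are illustrative assumptions rather than the paper's model.

```python
import numpy as np

def phase_step(phi, omega, phi_obs, K=2.0, dt=0.01):
    """One Euler step of phase dynamics coupled to an observed oscillation
    (sketch): d(phi)/dt = omega + K * sin(phi_obs - phi).  The coupling
    entrains the generator's phase variable to the observed walking
    pattern, so the generated trajectory stays synchronized even when the
    observed period changes."""
    return phi + dt * (omega + K * np.sin(phi_obs - phi))
```

When the observed frequency differs from the generator's natural frequency by less than the coupling gain K, the phases lock with a constant lag satisfying K * sin(lag) = omega_obs - omega.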

Publication Stats

1k Citations
35.92 Total Impact Points

Institutions

  • 2013
    • National Institute of Informatics
      Tokyo, Japan
  • 2012
    • National Science Communication Institute
      Seattle, Washington, United States
  • 2010
    • Jožef Stefan Institute
      • Department of Automation, Robotics and Biocybernetics
      Ljubljana, Slovenia
  • 2005–2010
    • Advanced Telecommunications Research Institute
      Kyoto, Japan
  • 2001–2010
    • Nara Institute of Science and Technology
      • Graduate School of Information Science
      Ikoma, Nara, Japan
  • 2008
    • University of Southern California
      Los Angeles, California, United States
  • 2007–2008
    • Japan Science and Technology Agency (JST)
      Tokyo, Japan
  • 2004
    • Sony Corporation
      Tokyo, Japan
    • Kyushu Institute of Technology
      Japan