ABSTRACT: In this study, we show that a movement policy can be improved efficiently using the previous experiences of a real robot. Reinforcement Learning (RL) is becoming a popular approach for acquiring a nonlinear optimal policy through trial and error. However, applying RL to real robot control is considered very difficult, since it usually requires many learning trials. Such trials cannot be executed in real environments because they would take an unrealistic amount of time and the real system's durability is limited. Therefore, in this study, instead of executing many learning trials, we propose to use a recently developed RL algorithm, importance-weighted PGPE, by which the robot can efficiently reuse previously sampled data to improve its policy parameters. We apply importance-weighted PGPE to CB-i, our real humanoid robot, and show that it can learn a target-reaching movement and a cart-pole swing-up movement in a real environment without using any prior knowledge of the task or any carefully designed initial trajectory.
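The parameter-based gradient update with importance weighting can be sketched as follows. This is a minimal illustration assuming a Gaussian hyper-policy with independent components and simple self-normalised importance weights; the function name and these simplifications are illustrative, not the paper's exact implementation:

```python
import numpy as np

def iw_pgpe_update(mu, sigma, thetas, returns, mu_old, sigma_old, lr=0.1):
    """One importance-weighted PGPE step on the Gaussian hyper-policy
    N(mu, sigma^2) from which policy parameters theta are sampled.
    mu_old/sigma_old describe the distribution that actually generated
    the stored samples, so old rollouts can be reused consistently."""
    def log_density(theta, m, s):
        return -0.5 * np.sum(((theta - m) / s) ** 2
                             + np.log(2 * np.pi * s ** 2), axis=-1)

    # importance weights: current density over sampling density,
    # self-normalised over the batch
    w = np.exp(log_density(thetas, mu, sigma)
               - log_density(thetas, mu_old, sigma_old))
    w = w / w.sum()

    b = np.sum(w * returns)          # weighted baseline
    adv = returns - b

    # likelihood-ratio gradients of the Gaussian hyper-policy
    grad_mu = np.sum((w * adv)[:, None] * (thetas - mu) / sigma ** 2, axis=0)
    grad_sigma = np.sum((w * adv)[:, None]
                        * ((thetas - mu) ** 2 - sigma ** 2) / sigma ** 3, axis=0)
    return mu + lr * grad_mu, sigma + lr * grad_sigma
```

Because exploration happens in parameter space rather than action space, each rollout is deterministic given its sampled theta, which is what makes stored robot trajectories reusable.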
ABSTRACT: The creation and adaptation of motor behaviors is an important capability for autonomous robots. In this paper we propose an approach for altering existing robot behaviors online, where a human coach interactively changes the robot motion to achieve the desired outcome. Using hand gestures, the human coach can specify the desired modifications to the previously acquired behavior. To preserve a natural posture while performing the task, the movement is encoded in the robot's joint space using periodic dynamic movement primitives. The coaching gestures are mapped to the robot joint space via the robot Jacobian and used to create a virtual force field affecting the movement. A recursive least squares
technique is used to modify the existing movement with respect to the virtual force field. The proposed approach was evaluated on a simulated three degrees of freedom planar robot and on a real humanoid robot, where human coaching gestures were captured by an RGB-D sensor. Although our focus was on rhythmic movements, the developed approach is also applicable to discrete (point-to-point) movements.
2014 IEEE International Conference on Robotics and Automation (ICRA); 05/2014
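The recursive least squares step that folds a coaching correction into the learned movement can be sketched generically; the basis-function model, forgetting factor, and initialisation below are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def rls_update(w, P, phi, target, lam=0.995):
    """One recursive least squares step: adapt the weights w of a
    basis-function model w @ phi toward a coached target value, with a
    forgetting factor lam so newer corrections dominate older data."""
    k = P @ phi / (lam + phi @ P @ phi)   # gain vector
    e = target - w @ phi                  # prediction error
    w = w + k * e                         # weight correction
    P = (P - np.outer(k, phi @ P)) / lam  # inverse-covariance update
    return w, P
```

Because each update is O(n^2) in the number of basis functions and needs only the latest sample, the modification can run online while the coach gestures.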
ABSTRACT: In this paper we tackle the problem of controlling whole-body humanoid robot behavior through non-invasive brain-machine interfacing (BMI), motivated by the perspective of mapping human motor control strategies to a human-like mechanical avatar. Our solution is based on an adequate reduction of the controllable dimensionality of high-DOF humanoid motion, in line with the state-of-the-art capabilities of non-invasive BMI technologies, leaving the complementary subspace of the motion to be planned and executed by an autonomous humanoid whole-body motion planning and control framework. The results are shown in a full physics-based simulation of a 36-degree-of-freedom humanoid controlled by a user through EEG-extracted brain signals generated with a motor imagery task.
ABSTRACT: This paper investigates the influence of leg afferent input, induced by a leg assistive robot, on the decoding performance of a BMI system. Specifically, it focuses on a decoder based on event-related (de)synchronization (ERD/ERS) of the sensorimotor area. The EEG experiment, performed with healthy subjects, is structured as a 3 × 2 factorial design consisting of two factors: "finger tapping task" and "leg condition." The former is divided into three levels (BMI classes): left-hand finger tapping, right-hand finger tapping, and no movement (Idle); the latter is composed of two levels: leg perturbed (Pert) and leg not perturbed (NoPert). Specifically, the subjects' leg was periodically perturbed by an assistive robot in 5 out of 10 sessions of the experiment and not moved in the remaining sessions. The aim of this study is to verify that the decoding performance of the finger tapping task is comparable between the two conditions, NoPert and Pert. Accordingly, a classifier is trained to output the class of the finger tapping, given as input the features associated with the ERD/ERS. Individually for each subject, the decoding performance is statistically compared between the NoPert and Pert conditions. Results show that the decoding performance is notably above chance for all subjects under both conditions. Moreover, the statistical comparison does not highlight a significant difference between NoPert and Pert in any subject, which is confirmed by feature visualization.
ABSTRACT: Learning of goal-directed behaviors in biological systems is broadly based on associations between conditional and unconditional stimuli. This can be further classified as classical conditioning (correlation-based learning) and operant conditioning (reward-based learning). Although traditionally modeled as separate learning systems in artificial agents, numerous animal experiments point towards their co-operative role in behavioral learning. Based on this concept, the recently introduced framework of neural combinatorial learning combines the two systems, running both in parallel to guide the overall learned behavior. Such combinatorial learning yields a faster and more efficient learner. In this work, we further improve the framework by applying a reservoir computing network (RC) as an adaptive critic unit together with reward-modulated Hebbian plasticity. Using a mobile robot system for goal-directed behavior learning, we clearly demonstrate that the reservoir critic outperforms traditional radial basis function (RBF) critics in terms of stability of convergence and learning time. Furthermore, the temporal memory in RC allows the system to learn a partially observable Markov decision process scenario, in contrast to a memoryless RBF critic.
IEEE International Conference on Systems, Man and Cybernetics 2013 (IEEE SMC 2013), Manchester, U.K.; 10/2013
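A reservoir critic of the kind described, reduced to its essentials, might look like the sketch below: a fixed random recurrent reservoir supplies temporal features, and only a linear readout is trained with TD errors. The reservoir size, spectral radius, and learning rates are illustrative assumptions:

```python
import numpy as np

class ReservoirCritic:
    """Echo state network as an adaptive critic: the recurrent weights
    are fixed and random (scaled to spectral radius rho for the echo
    state property); only the linear readout w_out is adapted."""
    def __init__(self, n_in, n_res=50, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = 0.5 * rng.standard_normal((n_res, n_in))
        W = rng.standard_normal((n_res, n_res))
        self.W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))
        self.w_out = np.zeros(n_res)
        self.x = np.zeros(n_res)

    def value(self, u):
        # advance the reservoir state and read out the value estimate
        self.x = np.tanh(self.W_in @ u + self.W @ self.x)
        return self.w_out @ self.x

    def td_update(self, r, v, v_next, gamma=0.95, lr=0.05):
        # TD(0) error modulating a Hebbian update of the readout
        delta = r + gamma * v_next - v
        self.w_out += lr * delta * self.x
        return delta
```

The recurrent state carries a fading memory of past observations, which is the property the abstract credits for handling the partially observable case that a memoryless RBF critic cannot.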
ABSTRACT: Humans can effortlessly perceive an object they encounter for the first time in a possibly cluttered scene and memorize its appearance for later recognition. Such performance is still difficult to achieve with artificial vision systems because it is not clear how to define the concept of objectness in its full generality. In this paper we propose a paradigm that integrates the robot’s manipulation and sensing capabilities to detect a new, previously unknown object and learn its visual appearance. By making use of the robot’s manipulation capabilities and force sensing, we introduce additional information that can be utilized to reliably separate unknown objects from the background. Once an object has been identified, the robot can continuously manipulate it to accumulate more information about it and learn its complete visual appearance. We demonstrate the feasibility of the proposed approach by applying it to the problem of autonomous learning of visual representations for viewpoint-independent object recognition on a humanoid robot.
ABSTRACT: The goal of reinforcement learning (RL) is to let an agent learn an optimal
control policy in an unknown environment so that future expected rewards are
maximized. The model-free RL approach directly learns the policy based on data
samples. Although using many samples tends to improve the accuracy of policy
learning, collecting a large number of samples is often expensive in practice.
On the other hand, the model-based RL approach first estimates the transition
model of the environment and then learns the policy based on the estimated
transition model. Thus, if the transition model is accurately learned from a
small amount of data, the model-based approach can perform better than the
model-free approach. In this paper, we propose a novel model-based RL method by
combining a recently proposed model-free policy search method called policy
gradients with parameter-based exploration and the state-of-the-art transition
model estimator called least-squares conditional density estimation. Through
experiments, we demonstrate the practical usefulness of the proposed method.
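The model-based loop above can be illustrated on a toy linear system. In this sketch, ridge regression stands in for least-squares conditional density estimation and a grid search over linear policies stands in for PGPE, so only the structure is shown: estimate the transition model from a small data set, then optimise the policy against the learned model instead of the real system:

```python
import numpy as np

rng = np.random.default_rng(1)

# true (unknown) dynamics: s' = 0.8*s + 0.5*a + noise; reward = -s^2
def true_step(s, a):
    return 0.8 * s + 0.5 * a + 0.05 * rng.standard_normal()

# 1) collect a modest number of transitions with a random policy
S, A, S_next = [], [], []
s = 1.0
for _ in range(200):
    a = rng.standard_normal()
    s2 = true_step(s, a)
    S.append(s); A.append(a); S_next.append(s2)
    s = s2
X = np.column_stack([S, A])
y = np.array(S_next)

# 2) estimate the transition model (ridge regression as a simple
#    stand-in for least-squares conditional density estimation)
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# 3) evaluate linear policies a = theta*s purely on the learned model
def model_return(theta, s0=1.0, T=20):
    s, R = s0, 0.0
    for _ in range(T):
        s = w[0] * s + w[1] * (theta * s)   # simulated transition
        R += -s ** 2
    return R

thetas = np.linspace(-3.0, 0.0, 31)
best = thetas[np.argmax([model_return(t) for t in thetas])]
```

With an accurate model, the best policy found in simulation (here theta near -w0/w1, which cancels the state feedback) transfers to the real dynamics without spending further real samples on policy search.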
ABSTRACT: Classical conditioning (conventionally modeled as correlation-based learning) and operant conditioning (conventionally modeled as reinforcement learning or reward-based learning) have been found in biological systems. Evidence shows that these two mechanisms strongly involve learning about associations. Based on these biological findings, we propose a new learning model to achieve successful control policies for artificial systems. This model combines correlation-based learning using input correlation learning (ICO learning) and reward-based learning using continuous actor-critic reinforcement learning (RL), thereby working as a dual-learner system. The model performance is evaluated by simulations of a cart-pole system as a dynamic motion control problem and a mobile robot system as a goal-directed behavior control problem. Results show that the model can strongly improve the pole-balancing control policy, i.e., it allows the controller to learn to stabilize the pole over the largest domain of initial conditions compared with the results obtained using a single learning mechanism. The model can also find a successful control policy for goal-directed behavior, i.e., the robot learns to approach a given goal more effectively than with its individual components. Thus, the study pursued here sharpens our understanding of how two different learning mechanisms can be combined and complement each other for solving complex tasks.
Advances in Complex Systems 07/2013; 16(02n03).
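The ICO half of the dual learner uses a simple correlation rule: predictive weights change in proportion to the product of the predictive input and the temporal derivative of the reflex (unconditional) signal. A minimal sketch, with an illustrative toy trace in which one cue reliably precedes the reflex:

```python
import numpy as np

def ico_update(w, x_pred, x_reflex, x_reflex_prev, mu=0.1):
    """Input correlation (ICO) learning: each predictive weight grows
    with the correlation between its input and the temporal derivative
    of the reflex signal, so cues that precede the reflex gain weight."""
    return w + mu * x_pred * (x_reflex - x_reflex_prev)

# toy trace: cue 0 is active when the reflex rises, so its weight
# grows; cue 1 fires at an unrelated time and stays at zero
w = np.zeros(2)
reflex = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]
pred = [[0, 0], [0, 1], [1, 0], [1, 0], [0, 0], [0, 0]]
for t in range(1, len(reflex)):
    w = ico_update(w, np.array(pred[t], float), reflex[t], reflex[t - 1])
```

Since the rule needs no reward signal, it can run alongside the actor-critic learner and shape behavior from sensory correlations alone, which is the division of labor the abstract describes.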
ABSTRACT: The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained.	For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
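The optimal baseline referred to in (3) has a standard closed form for likelihood-ratio gradient estimators; a sketch of one common scalar variant (per-parameter baselines are also used in practice):

```python
import numpy as np

def optimal_baseline(returns, score_sq_norms):
    """Variance-minimising scalar baseline for likelihood-ratio
    gradients: b* = E[R * ||g||^2] / E[||g||^2], where g is the score
    (gradient of the log sampling density) of each sample. Subtracting
    any constant baseline keeps the estimator unbiased; this choice
    minimises its variance."""
    return np.sum(returns * score_sq_norms) / np.sum(score_sq_norms)
```

Note that b* weights returns by the squared score norm, so it generally differs from the plain average return that simpler baselines use.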
ABSTRACT: In this study, we propose a multi-user myoelectric interface that can easily adapt to novel users. When a user performs different motions (e.g., grasping and pinching), different EMG signals are measured. When different users perform the same motion (e.g., grasping), different EMG signals are also measured. Therefore, designing a myoelectric interface that can be used by multiple users to perform multiple motions is difficult. To cope with this problem, we propose a bilinear model for EMG signals that is composed of two linear factors: 1) user-dependent and 2) motion-dependent. By decomposing the EMG signals into these two factors, the extracted motion-dependent factors can be used as user-independent features, and a motion classifier can be constructed on the extracted feature space to develop the multi-user interface. For novel users, the proposed adaptation method estimates the user-dependent factor through only a few interactions. The bilinear EMG model with the estimated user-dependent factor can then extract user-independent features from the novel user's data. We applied the proposed method to a recognition task of five hand gestures for robotic hand control, using four-channel EMG signals measured from subject forearms. Our method achieved 73% accuracy, which differed significantly from the accuracy of standard non-multi-user interfaces (two-sample t-test at a 1% significance level).
IEEE Transactions on Biomedical Engineering 03/2013.
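The user/motion decomposition can be illustrated with a rank-1 factorisation on synthetic data; the SVD-based factorisation and scalar user factor below are simplifying assumptions for illustration, not the paper's exact bilinear model:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic training data: feature z[u, m] = a[u] * b[m], i.e. each
# observation is a user-dependent scale times a motion-dependent value
a_true = np.array([1.0, 1.5, 0.7])        # user-dependent factors
b_true = np.array([0.2, 1.0, -0.6, 0.9])  # motion-dependent factors
Z = np.outer(a_true, b_true) + 0.01 * rng.standard_normal((3, 4))

# separate the two factors with a rank-1 SVD
U, s, Vt = np.linalg.svd(Z)
a_hat = U[:, 0] * s[0]   # user factors
b_hat = Vt[0]            # motion factors = user-independent features

# novel-user adaptation: estimate the new user's scalar factor from a
# few calibration trials by least squares against the known b_hat
z_new = 1.2 * b_true + 0.01 * rng.standard_normal(4)
a_new = (z_new @ b_hat) / (b_hat @ b_hat)

# dividing out a_new maps the novel user's data onto the shared
# motion-factor space, where one classifier serves all users
z_aligned = z_new / a_new
```

The point of the alignment step is that the motion classifier is trained once on the user-independent features and reused; only the cheap scalar (in general, low-dimensional) user factor is re-estimated per user.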
ABSTRACT: In this paper, we propose an imitation learning framework to generate physically consistent behaviors by estimating the ground reaction force from captured human behaviors. In the proposed framework, we first extract behavioral primitives, represented by linear dynamical models, from captured human movements and measured ground reaction force by using a Gaussian mixture of linear dynamical models. Our method therefore depends little on classification criteria defined by an experimenter. By switching primitives in different combinations while estimating the ground reaction force, different physically consistent behaviors can be generated. We apply the proposed method to a four-link robot model to generate squat motion sequences. The four-link robot model successfully generated the squat movements using our imitation learning framework. To show generalization performance, we also apply the proposed method to robot models that have different torso weights and lengths from the human demonstrator and evaluate the control performance. In addition, we show that the robot model is able to recognize and imitate demonstrator movements even when the observed movements deviate from the movements used to construct the primitives. For further evaluation in a higher-dimensional state space, we apply the proposed method to a seven-link robot model, which was able to generate squat-and-sway motions using the proposed framework.
Neural Networks: the official journal of the International Neural Network Society 01/2013; 40C:32-43.
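Switching between linear dynamical primitives can be sketched as a responsibility-weighted prediction; the Gaussian responsibilities over primitive centers below are an illustrative stand-in for the full Gaussian-mixture formulation:

```python
import numpy as np

def mixture_lds_step(x, models, centers, sigma=1.0):
    """Soft-switching one-step prediction over linear dynamical
    primitives: x_next = sum_k r_k(x) * (A_k @ x + b_k), where the
    responsibilities r_k are Gaussian in the distance from x to each
    primitive's center and normalised to sum to one."""
    d2 = np.array([np.sum((x - c) ** 2) for c in centers])
    r = np.exp(-0.5 * d2 / sigma ** 2)
    r = r / r.sum()
    return sum(rk * (A @ x + b) for rk, (A, b) in zip(r, models))
```

Because the primitives blend smoothly according to the current state, sequencing them in different orders produces different but continuous overall behaviors, which is what allows new motion sequences to be composed from the extracted primitives.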
ABSTRACT: We develop a fast reinforcement learning (RL) framework using the approximated dynamics of a humanoid robot. Although RL is a useful non-linear optimizer, applying it to real robotic systems is usually difficult due to the large number of iterations required to acquire suitable policies. In this study, we approximate the dynamics using data from a real robot with sparse pseudo-input Gaussian processes (SPGPs). By using SPGPs, we estimate the probability distribution considering the variances of both the input vector and the output signal. In real environments, since observations from robotic sensors include large noise, SPGPs can suitably approximate the stochastic dynamics of a real humanoid robot. We use the approximated dynamics to improve the performance of a movement task in a path integral RL framework, which updates a policy from the sampled trajectories of the state and action vectors and the cost. We implemented our proposed method on a real humanoid robot and tested it on a via-point reaching task. Using the proposed method, the robot achieved successful performance with fewer interactions with the real environment than a conventional approach that does not use the simulated dynamics.
Robotics and Automation (ICRA), 2013 IEEE International Conference on; 01/2013
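The uncertainty-aware dynamics model can be illustrated with ordinary GP regression (a full GP as a simple stand-in for the sparse pseudo-input approximation; the kernel and hyperparameters are illustrative):

```python
import numpy as np

def gp_predict(X, y, Xs, ell=1.0, sf=1.0, sn=0.1):
    """Gaussian process regression with a squared-exponential kernel on
    1-D inputs: returns the predictive mean and variance at test
    points Xs, so the learned dynamics carry uncertainty estimates."""
    def k(A, B):
        d = A[:, None] - B[None, :]
        return sf ** 2 * np.exp(-0.5 * d ** 2 / ell ** 2)

    K = k(X, X) + sn ** 2 * np.eye(len(X))   # noisy training covariance
    Ks = k(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    # predictive variance of a noisy observation at each test point
    var = sf ** 2 + sn ** 2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var
```

The variance is what matters for the RL loop: predictions far from the collected robot data come back with high uncertainty, so simulated rollouts can be trusted only where the model has seen data. The sparse pseudo-input variant reduces the cubic cost of the solve above by conditioning on a small set of learned inducing points.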
ABSTRACT: The paper reports on a novel hybrid drive lower-extremity exoskeleton research platform, XoR2, an improved version of XoR. Its design concept, details of the new hardware and basic experimental results are presented. The robot is designed so that it does not interfere with the user's normal walking and supports a 30-kg payload in addition to its own weight of 20 kg. The robot has a total of 14 joints; among them six flexion/extension joints are powered. Pneumatic artificial muscles are combined with small high-response servo motors for the hip and knee joints, and arranged antagonistically at the hip and ankle joints to provide passive stability and variable stiffness. The preliminary experimental results on position and torque control demonstrate that the proposed mechanisms, sensors and control systems are effective, and hybrid drive is promising for torque-controllable, high-speed, backdrivable, mobile (but non-power-autonomous) exoskeleton robots.
Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
ABSTRACT: Direct transfer of human motion trajectories to humanoid robots does not result in dynamically stable robot movements due to the differences in human and humanoid robot kinematics and dynamics. We developed a system that converts human movements captured by a low-cost RGB-D camera into dynamically stable humanoid movements. The transfer of human movements occurs in real-time. As need arises, the developed system can smoothly transition between unconstrained movement imitation and imitation with balance control, where movement reproduction occurs in the null space of the balance controller. The developed balance controller is based on an approximate model of the robot dynamics, which is sufficient to stabilize the robot during on-line imitation. However, the resulting movements cannot be guaranteed to be optimal because the model of the robot dynamics is not exact. The initially acquired movement is therefore subsequently improved by model-free reinforcement learning, both with respect to the accuracy of reproduction and balance control. We present experimental results in simulation and on a real humanoid robot.
Robotics and Automation (ICRA), 2013 IEEE International Conference on; 01/2013
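The arrangement in which imitation happens only in the subspace unused by the balance controller has the standard task-priority form; a minimal sketch:

```python
import numpy as np

def nullspace_command(J, dx_balance, dq_imitate):
    """Task-priority joint velocities: satisfy the balance task exactly
    and reproduce the imitated motion only in its null space,
    q_dot = J^+ dx + (I - J^+ J) dq."""
    J_pinv = np.linalg.pinv(J)
    N = np.eye(J.shape[1]) - J_pinv @ J   # null-space projector
    return J_pinv @ dx_balance + N @ dq_imitate
```

Whatever `dq_imitate` contains, the projector `N` removes its component along the balance task, so imitation can never push the commanded motion off the balance objective.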
ABSTRACT: We introduce our Pneumatic-Electric (PE) hybrid actuator model and propose to use the model to derive a controller for the hybrid actuation system by an optimal control method. Our PE hybrid actuator is composed of a Pneumatic Artificial Muscle (PAM) and an electric motor. The PE hybrid actuator is light and can generate large torque. These properties are desirable for assistive devices such as exoskeleton robots. However, to take maximal advantage of the PE hybrid system, we need to reasonably distribute the necessary torque to these redundant actuators by properly taking the distinctive characteristics of a pneumatic actuator and an electric motor into account. To do this, we use an optimal control method called iterative LQG to distribute the necessary torque between the PAM and the electric motor. The crucial issue in applying the optimal control method to the PE hybrid system is PAM modeling. We built a PAM model composed of four elements: 1) an (air-)pressure-force conversion model, 2) a contraction rate model, 3) the time delay of the air valve, and 4) the upper limit of force generation, which depends on the contraction rate and the movable range. We apply our proposed method to a one-degree-of-freedom (one-DoF) arm with the PE hybrid actuator. The one-DoF arm successfully performed swing tasks at 0.5 Hz, 2 Hz, and 4 Hz, as well as a swing-up and stabilization task, by reasonably distributing the necessary torque between the two different actuators in both simulated and real environments.
Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
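The torque-distribution idea can be illustrated in its simplest static form. The quadratic effort weights below are illustrative assumptions; the paper's iterative LQG additionally accounts for the PAM's pressure-force conversion, contraction rate, valve delay, and force limits when splitting torque over time:

```python
def split_torque(tau_des, w_pam=1.0, w_motor=4.0):
    """Static effort-weighted torque split between the pneumatic muscle
    and the electric motor: minimise w_pam*u_p**2 + w_motor*u_m**2
    subject to u_p + u_m = tau_des. The closed-form solution assigns
    each actuator a share proportional to 1/w_i."""
    inv_sum = 1.0 / w_pam + 1.0 / w_motor
    u_pam = tau_des * (1.0 / w_pam) / inv_sum
    u_motor = tau_des * (1.0 / w_motor) / inv_sum
    return u_pam, u_motor
```

With the weights above, the "cheaper" PAM carries most of the load while the motor supplies the remainder; the dynamic optimal controller refines this split as the PAM's usable force changes with contraction and delay.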