Conference Paper

Using Reinforcement Learning to Improve the Stability of a Humanoid Robot: Walking on Sloped Terrain

Authors:

Abstract

To walk in a real environment, humanoid robots need to adapt to that environment, as humans do. One approach to achieving this goal is to use Machine Learning techniques that allow robots to improve their behavior over time. In this paper, we propose a system that uses Reinforcement Learning to learn the action policy that makes a robot walk in an upright position on lightly sloped terrain. To validate this proposal, experiments were conducted with a humanoid robot built for the RoboCup Humanoid League and based on DARwIn-OP. The results showed that the robot was able to walk on sloping floors, going up and down ramps, even in situations where the slope angle changes.
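The abstract does not say which Reinforcement Learning algorithm is used, so the following is only a minimal sketch of how an action policy could be learned, assuming tabular Q-learning. The state labels (a discretized torso tilt), the action labels (posture corrections), the reward, and the values of `alpha`, `gamma`, and `epsilon` are all hypothetical and not taken from the paper.

```python
import random

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Unseen (state, action) pairs are treated as having value 0.0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

# Hypothetical labels: the robot is tilted forward and a corrective
# action that brings it upright is rewarded.
ACTIONS = ["lean_back", "lean_forward", "no_change"]
q_table = {}
q_update(q_table, "tilted_forward", "lean_back", 1.0, "upright", ACTIONS)
chosen = epsilon_greedy(q_table, "tilted_forward", ACTIONS, epsilon=0.0)
```

With `epsilon=0.0` the policy is purely greedy, so after the single rewarded update above it selects `lean_back` in the `tilted_forward` state.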

... These machine learning methods have also been applied to sloped surfaces to improve the humanoid robot walking stability on sloped terrain [15]. Silva et al. [15] proposed a robotic gait correction system based on a reinforcement learning method that enabled the robot to remain upright and stable on an approximately 20° inclined board. Li et al. [16] used an accelerometer and force sensor to design a robot for climbing stairs. ...
... Similarly, the transfer function of the highpass filter in discrete Z transform is shown in (13), and the time domain equation of (13) is in (14). Then, the equation of the high-pass filter is rewritten in (15), where c, d, and e are constants. ...
Article
We designed a stable gait pattern and posture-control balance system to enable a biped humanoid robot to maintain balance and avoid falling when walking on uneven ground or slopes. In this study, we first examined the problem of gait generation and the balance of a humanoid robot and then proposed a posture-control balance system using the inertial sensors of a gyroscope and accelerometer to sense the tilt angle of the robot according to the environment. To process the data obtained by the sensors, the mean filter was applied to eliminate the noise in the data, and the complementary filter was used to properly combine the data from both the gyroscope and accelerometer. The system further modifies the gait and posture of the robot based on the results obtained through a fuzzy system to attain the angle of balance and stabilization. A robot with an open platform was used to test the implementation of the proposed algorithm, and the experimental results demonstrated that the robot could successfully maintain balance when walking uphill and downhill on uneven surfaces. Moreover, because only one parameter needs to be adjusted when applying the balance-control system, the system can be easily extended to any related humanoid robot.
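The sensor-processing chain described in this abstract (a mean filter for noise, then a complementary filter to fuse gyroscope and accelerometer data) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the axis convention used for the accelerometer tilt, and the blending weight `alpha = 0.98` are assumptions.

```python
import math

def mean_filter(samples):
    """Average a window of raw sensor samples to suppress noise."""
    return sum(samples) / len(samples)

def accel_pitch(ax, az):
    """Tilt angle (rad) from the gravity direction seen by the accelerometer.
    Assumes x points forward and z points up when the robot is upright."""
    return math.atan2(ax, az)

def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend the integrated gyroscope rate (accurate short-term, drifts)
    with the accelerometer angle (noisy short-term, drift-free)."""
    return alpha * (angle_prev + gyro_rate * dt) + (1.0 - alpha) * accel_angle

# One filter step with hypothetical readings sampled at 100 Hz.
ax = mean_filter([0.10, 0.12, 0.11])   # smoothed forward acceleration (g)
az = mean_filter([0.99, 1.00, 0.98])   # smoothed vertical acceleration (g)
angle = complementary_filter(0.0, 0.05, accel_pitch(ax, az), dt=0.01)
```

A larger `alpha` trusts the gyroscope more and reacts more slowly to the accelerometer; the right value depends on the sensors and the sampling rate.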
... Hybrid architecture argues that a robot has to deliberate while it senses and acts, and it is done by decomposing a task into sub-tasks. Tasks that are reactive, such as protecting a robot from falling or adapting the walking to sloped terrains (Silva et al. 2015), do not need to be planned or deliberated. When one of those situations occurs, the robot should simply sense and react, leaving complex behaviors, like deciding what to do or localizing itself, to the hierarchical structure. ...
... Figure 1 depicts all processes and how they are interconnected by the blackboard (bkb) to exchange data among them, considering a robot n. Several studies with humanoid robots already use the Cross Architecture as their basic architecture for the development of cognitive behaviors, for example Silva et al. (2015), Vilão et al. (2014), and Homem et al. (2017). ...
Article
This paper presents a humanoid robot framework, composed of a simulator and a telemetry interface. The framework is based on the Cross Architecture, and it is developed for the RoboCup Soccer Humanoid League domain. A simulator is an important tool for testing cognitive algorithms without handling issues of real robots; furthermore, a simulator is extremely useful for allowing reproducibility of any developed algorithm, even if there is no robot available. The proposed simulator allows an easy transfer of the algorithms developed in the simulator to real robots, as long as the robot uses the Cross Architecture as its software architecture. Then, in order to evaluate the cognitive algorithms in real robots, a telemetry interface is proposed. From this interface, it is possible to monitor any variable in the robot’s shared memory. The framework is open source and has low computational cost. Experiments were conducted in order to analyze both the simulator and the telemetry interface. Experiments performed with the simulator aim to validate the high-level strategy development and the portability to a real robot, while experiments with the telemetry interface aim to evaluate the robot behavior using, as input, the information received from the shared memory passed by all processes. The results show that the simulator can be used to test and develop new algorithms, while the telemetry can be used to monitor the robot, thus validating the framework for this domain.
... The hybrid paradigm argues that a robot has to deliberate while it senses and acts, and it is done by decomposing a task into sub-tasks. Tasks that are reactive, such as protecting a robot from falling or adapting the walking to sloped terrains [7], do not need to be planned or deliberated. When one of those situations occurs, the robot should simply sense and react, leaving complex behaviors, like deciding what to do or localizing itself, to the hierarchical structure. ...
... Several studies with humanoid robots already use the Cross Architecture as their basic architecture for the development of cognitive behaviors, for example [7], [9] and [10]. ...
Conference Paper
This paper presents a new 2D robot simulator based on the Cross Architecture for the RoboCup Soccer Humanoid League domain. A simulator is an important tool for testing cognitive algorithms in robots without having to deal with real robot problems. Moreover, a simulator is extremely useful for allowing reproducibility of any developed algorithm, even if there is no robot available. The proposed simulator allows the direct application of the algorithms developed in the simulator to a real robot that works with the Cross Architecture. In addition, the simulator is freely available, open source, and has low computational cost. Experiments were conducted in order to analyze the portability of a decision code developed in the simulator to a real robot. The results allowed us to conclude that the simulator can be used to test new algorithms, since the decisions performed by the robot in simulation and in the real world were quite similar.
... The stable biped walking pattern is verified through a simulation. Silva [25], [26] used reinforcement learning to improve the walking stability of a humanoid robot on sloped terrain and to optimize the parameter values for gait pattern generation with temporal generalization. The methods based on reinforcement learning can optimize the gait parameters and can identify the relationship between the parameters, so they have great development potential. ...
Article
An important goal for humanoid robots is to achieve fast, flexible and stable walking. In previous research, the structure and walking algorithms evolved separately, resulting in a slow evolution speed and lack of an initial design basis. This paper proposes comprehensively considering body morphology and walking patterns, exploring the relationship between them and their influence on the motion ability. The method parameterizes the body morphology and walking patterns. Then a response surface model is established to describe the complex relationship between these parameters and finally obtain the optimized parameters, which provides a reference for the structural design and gait generation.
... The anatomy of humanoid robots enables them to traverse irregular terrains, move through gaps or ascend and descend stairs. As an example, the work of [2] uses reinforcement learning to teach a kid-size humanoid robot to stand upright and walk in terrains with slight inclinations. ...
Conference Paper
Recent advances in deep learning point towards the use of computer vision systems based on Deep Neural Networks (DNNs). However, these network architectures are optimized to be executed in specialized hardware, such as in computers with Graphics Processing Units (GPU). Such hardware is rarely available in embedded computers, for instance, those used by mobile robots, so alternatives must be studied in order to guarantee that mobile systems may still benefit from the applications of deep learning. In this work, we investigate the performance of a vision system for ball detection, based on different configurations of the MobileNet Convolutional Neural Network architecture, under a constrained hardware scenario. By gradually reducing the input size and the number of parameters that compose the neural network and comparing their inference time in an Intel NUC Core i7 mini-PC, embedded in a humanoid soccer robot, we have found acceptable values for the width and resolution multipliers to be used in our soccer ball detection system during a robot-soccer match.
Article
The difficult task of creating reliable mobility for humanoid robots has been studied for decades. Even though several different walking strategies have been put forth and walking performance has substantially increased, stability still needs to catch up to expectations. Applications of Reinforcement Learning (RL) techniques are constrained by slow convergence and ineffective training. This paper develops a new robust and efficient framework based on the Robotis-OP2 humanoid robot combined with a typical trajectory-generating controller and Deep Reinforcement Learning (DRL) to overcome these limitations. This framework consists of optimizing the walking trajectory parameters and a posture balancing system. Multiple sensors of the robot are used for parameter optimization. Walking parameters are optimized using the Dueling Double Deep Q Network (D3QN), one of the DRL algorithms, in the Webots simulator. The hip strategy is adopted for the posture balancing system. Experimental studies are carried out in both simulation and real environments with the proposed framework and Robotis-OP2’s walking algorithm. Experimental results show that the robot walks more stably with the proposed framework than with Robotis-OP2’s walking algorithm. The proposed framework is expected to be beneficial for researchers studying humanoid robot locomotion.
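The two ideas combined in a Dueling Double Deep Q Network can be illustrated without any neural network code. The sketch below shows the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), and the Double DQN target, in which the online network selects the next action and the target network evaluates it. Plain lists stand in for network outputs, and the numbers and function names are made up for illustration; they are not taken from the paper.

```python
def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a)
    into Q-values, subtracting the mean advantage for identifiability."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN target: the online network picks the best next action,
    the target network supplies that action's value."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]

# Hypothetical outputs for a state with three candidate parameter changes.
q_values = dueling_q(1.0, [1.0, 2.0, 3.0])
target = double_dqn_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0])
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of plain Q-learning targets, which is the usual motivation for the "double" part.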
Conference Paper
Uneven terrain walking is one of the key challenges in bipedal walking. In this paper, we propose a motion pattern generator for slope walking in 3D dynamics using preview control of zero moment point (ZMP). In this method, the future ZMP locations are selected with respect to known slope gradient. The trajectory of the center of mass (CoM) of the robot is generated by using the preview controller to maintain the ZMP at the desired location. Two models of slope walking, namely upslope and downslope, are investigated. Continuous walking on slopes with different gradients is also studied to enable the robots to walk on uneven terrains. Since staircase walking is similar to slope walking, the slope walking trajectory generator can also be applied to the staircase walking. Simulation results show that the robot can walk on many types of slopes and stairs by using the proposed pattern generator.
Conference Paper
This paper presents the design method for a humanoid which has a network based modular structure and a standard PC architecture. Based on the proposed method, we developed DARwIn-OP which meets the requirements for an open humanoid platform. DARwIn-OP has an expandable system structure, high performance, simple maintenance, familiar development environment, and affordable prices. All resources of DARwIn-OP including source codes, circuit diagrams, mechanical CAD files, and parts information will be opened to the public.
Conference Paper
To achieve a balanced walking for a humanoid, it is necessary to estimate the dynamic stability of the system. However, in a small size humanoid with restricted system resource, it is hard to satisfy the performance level desired by dynamics analysis. Therefore, in this paper, we propose the feasible methods to generate gait pattern and stabilize walking based on coupled oscillators which have a clear correlation between oscillator parameters and system dynamics without a real time ZMP calculation. The proposed method was tested on the open humanoid platform DARwIn-OP for the evaluation, and the result showed that a real time gait pattern generation and stabilization were realized.
Article
There exists a class of two-legged machines for which walking is a natural dynamic mode. Once started on a shallow slope, a machine of this class will settle into a steady gait quite comparable to human walking, without active control or energy input. Interpretation and analysis of the physics are straightforward; the walking cycle, its stability, and its sensitivity to parameter variations are easily calculated. Experiments with a test machine verify that the passive walking effect can be readily exploited in practice. The dynamics are most clearly demonstrated by a machine powered only by gravity, but they can be combined easily with active energy input to produce efficient and dextrous walking over a broad range of terrain.
Article
This paper is devoted to the permanence of the concept of Zero-Moment Point, widely known by the acronym ZMP. Thirty-five years have elapsed since its implicit presentation (actually before being named ZMP) to the scientific community and thirty-three years since it was explicitly introduced and clearly elaborated, initially in the leading journals published in English. Its first practical demonstration took place in Japan in 1984, at Waseda University, Laboratory of Ichiro Kato, in the first dynamically balanced robot WL-10RD of the robotic family WABOT. The paper gives an in-depth discussion of source results concerning ZMP, paying particular attention to some delicate issues that may lead to confusion if this method is applied in a mechanistic manner to irregular cases of artificial gait, i.e. in the case of loss of dynamic balance of a humanoid robot. After a short survey of the history of the origin of ZMP, a very detailed elaboration of the ZMP notion is given, with a special review concerning "boundary cases" when the ZMP is close to the edge of the support polygon and "fictitious cases" when the ZMP should be outside the support polygon. In addition, the difference between ZMP and the center of pressure is pointed out. Finally, some unresolved or insufficiently treated phenomena that may yield a significant improvement in robot performance are considered.
Article
Photocopy. Supplied by British Library. Thesis (Ph. D.)--King's College, Cambridge, 1989.
Conference Paper
In this paper, we formulate gait synthesis of humanoid biped locomotion as an optimization problem subject to several constraints, e.g. zero-moment point (ZMP) constraints for dynamically stable locomotion, internal force constraints for smooth transition, and geometric constraints for walking on an uneven floor such as a sloping surface. Within this framework of gait synthesis tied to constraint functions, computational learning methods can be incorporated to further improve the gait. The effectiveness of the proposed dynamically stable gait planning and learning approach for humanoid walking on both an even floor and a sloping surface has been successfully tested on our humanoid soccer robots named Robo-Erectus, which won first place in the RoboCup 2003 Humanoid League Free Performance competition and got 4 silver awards in the RoboCup Humanoid League 2004.
Conference Paper
Simple intuitive control strategies can be used to compel bipedal robots to walk over sloped terrain. We describe an algorithm for walking dynamically and steadily over sloped terrain with unknown slope gradients and transition locations. The algorithm is developed based on geometric considerations. The overall algorithm is very simple and does not require the biped to have an extensive sensory system for walking over moderate slopes. The ground is detected blindly using only foot contact switches. Using a few simple strategies, we have compelled a simulated 7-link planar biped to walk up and down slopes and over rolling terrain.
Conference Paper
This paper describes the design and development of a new humanoid robot named Newton, which is intended for research applications and for use in the RoboCup KidSize League World Competition. The Newton robot has been designed to work without any dedicated sub-controller implemented in low-level hardware, which is often used to control the servomotors of a robot. Newton uses only a standard personal computer to do all the processing and control required by the robot. To be able to deal with all the tasks involved in the robotic soccer domain, a new software architecture is proposed. This architecture is based on the hybrid paradigm, involving sensing, decision, planning, low-level control, localization and communication. Preliminary tests show that the robot can walk properly while it performs tasks like finding the ball in an unknown position or positioning itself at the ball for kicking, exhibiting a very good performance.
Conference Paper
Simulation is often used in research and industry as a low cost, high efficiency alternative to real model testing. Simulation has also been used to develop and test powerful learning algorithms. However, parameters learned in simulation often do not translate directly to the application, especially because heavy optimization in simulation has been observed to exploit the inevitable simulator simplifications, thus creating a gap between simulation and application that reduces the utility of learning in simulation. This paper introduces Grounded Simulation Learning (GSL), an iterative optimization framework for speeding up robot learning using an imperfect simulator. In GSL, a behavior is developed on a robot and then repeatedly: 1) the behavior is optimized in simulation; 2) the resulting behavior is tested on the real robot and compared to the expected results from simulation, and 3) the simulator is modified, using a machine-learning approach to come closer in line with reality. This approach is fully implemented and validated on the task of learning to walk using an Aldebaran Nao humanoid robot. Starting from a set of stable, hand-coded walk parameters, four iterations of this three-step optimization loop led to more than a 25% increase in the robot's walking speed.
Article
Previous approaches to walking on an inclined plane for humanoid robots, including the 3-D linear inverted pendulum model (3D-LIPM) approach, were unable to modify the walking period, step length, and walking direction independently without an additional step for adjusting the center of mass (CoM) motion. Moreover, only inclination along the pitch direction was considered for walking. To solve these problems, a novel command state (CS)-based modifiable walking pattern generator for humanoid robots is proposed for modifiable walking on an inclined plane in both pitch and roll directions. The dynamic equation of the 3D-LIPM on the inclined plane in both pitch and roll directions is derived to obtain the CoM motion. Using the CoM motion, a method for modifiable walking pattern generation on the inclined plane is developed to follow a given CS composed of walking periods, step lengths, and walking directions for both legs. The effectiveness of the proposed walking pattern generator is demonstrated through both simulation and experiment for the small-sized humanoid robot, HanSaRam-IX (HSR-IX).
Article
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911) The idea of learning to make appropriate responses based on reinforcing events has its roots in early psychological theories such as Thorndike's "law of effect" (quoted above). Although several important contributions were made in the 1950s, 1960s and 1970s by illustrious luminaries such as Bellman, Minsky, Klopf and others (Farley and Clark, 1954; Bellman, 1957; Minsky, 1961; Samuel, 1963; Michie and Chambers, 1968; Grossberg, 1975; Klopf, 1982), the last two decades have witnessed perhaps the strongest advances in the mathematical foundations of reinforcement learning, in addition to several impressive demonstrations of the performance of reinforcement learning algorithms in real world tasks. The introductory book by Sutton and Barto, two of the most influential and recognized leaders in the field, is therefore both timely and welcome. The book is divided into three parts. In the first part, the authors introduce and elaborate on the essential characteristics of the reinforcement learning problem, namely, the problem of learning "policies" or mappings from environmental states to actions so as to maximize the amount of "reward"
Article
A scheme to enable the SD-2 biped robot to climb sloping surfaces is proposed. By means of sensing devices, namely position sensors on the joints and force sensors underneath the heel and toe, the robot is able to detect the transition of the supporting terrain from a flat floor to a sloping surface. An algorithm is developed for the biped robot control system to evaluate the inclination of the supporting foot and the unknown gradient, and a compliant motion scheme is then used to enable the robot to transfer from level walking to climbing the slope. While the robot walks on the slope, the gait synthesis is a simple modification of the one used for level walking. Experiments with the SD-2 biped robot show that the overall scheme, while simple to implement, is powerful and reliable enough to permit walking from level to slope or vice versa. Finally, it is argued that the proposed mechanism can be extended to quasi-dynamic and dynamic gaits.
Article
The connection between the dynamics of an object and the algorithmic level has been modified in this paper, based on two-level control. The central modification consists in introducing feedbacks, that is, a system of regulators at the level of the formed type of gait only. Such a modification originates from the assumption that a very narrow class of gait types needs to be taken into account when generating the gait. In the paper the gait has been formed on the basis of a fixed program having a kinematic-dynamic character. The kinematic part concerns the kinematic programmed connections for activating the lower extremities, while the dynamic part exposes appropriate changes in the characteristic coordinates of the compensation system. Such a connection with a minimum number of coordinates extends the possibility of solving the problem of equilibrium in motion for one type of gait without any particular algorithm that would take into account the motion coordinates and form out of them a stable motion at a higher algebraic level.
C. Zhou, P. K. Yue, J. Ni, and S.-B. Chan, "Dynamically stable gait planning for a humanoid robot to climb sloping surface," in Robotics, Automation and Mechatronics, 2004 IEEE Conference on, vol. 1, Dec. 2004, pp. 341-346.
D. H. Perico, I. J. Silva, C. O. Vilao, T. P. Homem, R. C. Destro, F. Tonidandel, and R. A. Bianchi, "Hardware and software aspects of the design and assembly of a new humanoid robot for RoboCup soccer," in Robotics: SBR-LARS Robotics Symposium and Robocontrol (SBR LARS Robocontrol), 2014 Joint Conference on. IEEE, 2014, pp. 73-78.