https://doi.org/10.1007/s10846-022-01680-7
SHORT PAPER
Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum
Lingfeng Tao¹ · Jiucai Zhang² · Xiaoli Zhang¹
Received: 18 August 2021 / Accepted: 20 June 2022
© The Author(s), under exclusive licence to Springer Nature B.V. 2022
Abstract
Dexterous manipulation tasks usually have multiple objectives, and the priorities of these objectives may vary at different phases of a manipulation task. Current methods do not consider the objective priorities or how they change during the task, which makes it hard, or even impossible, for a robot to learn a good policy. In this work, we develop a novel Adaptive Hierarchical Curriculum to guide the robot in learning manipulation tasks with multiple prioritized objectives. Our method determines the objective priorities during the learning process and updates the learning sequence of the objectives to adapt to the changing priorities at different phases. A smooth transition function is developed to mitigate the effects on learning stability when the learning sequence is updated. The proposed method is validated in a multi-objective manipulation task in which a JACO robot arm must manipulate a target surrounded by obstacles. Simulation and physical experiment results show that the proposed method outperforms the baseline methods, achieving a 92.5% success rate in 40 tests and taking, on average, 36.4% less time to finish the task.
Keywords Multi-phase multi-objective manipulation · Adaptive curriculum · Objective priority · Robot learning
1 Introduction
Dexterous manipulation is essential to increase robots' usability in assembly, healthcare, education, and living assistance. These tasks typically need to be finished in multiple phases, and each phase has multiple objectives [9, 10]. Although all phases usually share the same set of objectives [25, 30], the priorities of the objectives can vary from phase to phase, and respecting those priorities is critical to the efficiency and success rate of the manipulation task. For example, an assembly task usually has two phases: (1) approaching and (2) installation. Both phases share three objectives: (a) fast speed, (b) high precision, and (c) collision avoidance. In the first phase, the robot picks up the assembly part and moves to the target position; the objective with the top priority is to avoid touching other parts, the next is to move fast to minimize the execution time, and the lowest priority is to move precisely. In the second phase, the robot has reached the target position and is ready for installation; the priority order now changes to high precision first (to improve the installation quality), then minimizing the execution time, and finally avoiding contact with other parts.
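As a concrete illustration, the phase-dependent priority orderings of the assembly example can be written down as a simple lookup table. The following is a minimal sketch of ours, not the paper's implementation; the phase and objective names are illustrative assumptions:

    PHASE_PRIORITIES = {
        "approach": ["collision_avoidance", "speed", "precision"],
        "install": ["precision", "speed", "collision_avoidance"],
    }

    def priority_rank(phase: str, objective: str) -> int:
        # 0 means highest priority within the given phase.
        return PHASE_PRIORITIES[phase].index(objective)

    # Collision avoidance ranks first while approaching, last while installing.
    assert priority_rank("approach", "collision_avoidance") == 0
    assert priority_rank("install", "collision_avoidance") == 2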
Existing research in traditional control theory mainly focuses on weighing multiple objectives against one another with optimization methods [13], which is computationally inefficient. Although deep reinforcement learning (DRL) has proven effective in enabling robots to conduct autonomous manipulation tasks intelligently [19], the current reward formulation is usually a linear summation of the objectives' reward components, which is an implicit and inefficient way to learn the objective priorities and causes poor learning performance (i.e., the robot takes a long time to learn, or even fails to learn, a correct policy). Furthermore, the current reward mechanism is usually fixed through all phases. This one-size-fits-all solution (i.e., using the same objective priority for all phases) cannot ensure that each phase's local performance is optimal, and may lead to sub-optimal performance because the reward is not customized for each phase of the task.
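To make the contrast concrete, the following sketch (ours, with illustrative weights and component names) compares the fixed linear-summation reward described above with a phase-indexed variant whose weights follow each phase's priorities:

    FIXED_W = {"speed": 0.3, "precision": 0.3, "collision": 0.4}
    PHASE_W = {
        "approach": {"speed": 0.3, "precision": 0.1, "collision": 0.6},
        "install": {"speed": 0.2, "precision": 0.6, "collision": 0.2},
    }

    def fixed_reward(components):
        # One-size-fits-all: the same weights in every phase.
        return sum(FIXED_W[k] * v for k, v in components.items())

    def phased_reward(phase, components):
        # Phase-dependent weights make the changing priorities explicit.
        return sum(PHASE_W[phase][k] * v for k, v in components.items())

    r = {"speed": 0.8, "precision": 0.2, "collision": -1.0}  # near-collision step
    print(fixed_reward(r))               # -0.10
    print(phased_reward("approach", r))  # -0.34: collisions penalized harder
    print(phased_reward("install", r))   #  0.08: precision dominates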
* Xiaoli Zhang
xlzhang@mines.edu
Lingfeng Tao
tao@mines.edu
Jiucai Zhang
zhangjiucai@gmail.com
1 Intelligent Robotics and Systems Lab, Colorado School of Mines, 1500 Illinois St, Golden, CO 80401, USA
2 GAC R&D Center Silicon Valley, Sunnyvale, CA 94085, USA
Published online: 16 August 2022
Journal of Intelligent & Robotic Systems (2022) 106: 1
References

Article
Hierarchical impedance-based tracking control has attracted much interest recently due to its advantages of requiring no external force/torque feedback and no inertia reshaping. Desired trajectories on each hierarchy level can be asymptotically tracked following the order of priorities. However, the tasks specified by users are required to have the same dimension as the robot's DOF and to be simultaneously feasible and independent. In this paper, all these restrictions are removed and a passivity-based hierarchical tracking controller with strict priorities is developed for an arbitrary number of conflicting tasks. Based on the theory of constrained optimization, a new control objective is proposed to achieve automatic tracking of a hierarchy-consistent trajectory that is locally optimal following the hierarchy order. A formal proof of asymptotic stability for the case of tracking partially feasible, conflicting tasks is provided. The effectiveness of the proposed method is evaluated in simulations and experiments on a Franka Emika Panda robot.
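For intuition about strict task priorities, the textbook null-space-projection construction is sketched below. This is our kinematic illustration under assumed Jacobians, not the passivity-based impedance controller of the cited paper:

    import numpy as np

    def prioritized_qdot(J1, dx1, J2, dx2):
        # Strict hierarchy: task 1 is tracked exactly; task 2 acts only
        # inside the null space of task 1, so it cannot disturb task 1.
        J1_pinv = np.linalg.pinv(J1)
        N1 = np.eye(J1.shape[1]) - J1_pinv @ J1
        q1 = J1_pinv @ dx1
        q2 = np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ q1)
        return q1 + N1 @ q2

    # Toy usage: a 3-DOF system with two 1-D tasks.
    J1 = np.array([[1.0, 0.0, 0.0]])
    J2 = np.array([[1.0, 1.0, 0.0]])
    print(prioritized_qdot(J1, np.array([0.5]), J2, np.array([-0.5])))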
Article
In order to reduce the influence of the coupling between the airframe and engine of a hypersonic vehicle, an integrated aircraft-engine control method is studied in this paper. First, a control-oriented, integrated aircraft-engine mathematical model of the hypersonic vehicle is established. Then, using nonlinear dynamic inversion (NDI) and incremental nonlinear dynamic inversion (INDI), an outer-loop control algorithm for the slow attitude dynamics and an inner-loop control algorithm for the fast angular-rate dynamics are designed. Moreover, based on an online ontology model that includes the aircraft-engine coupling characteristics, a coupled control scheme for the flight attitude and engine is designed in the form of a control linkage. Finally, reference-model, error-control, online-estimation, and other modules are introduced into the NDI controller to meet the flight-quality and robustness requirements of the hypersonic vehicle. Simulation results show that the NDI-based aircraft-engine coupling control scheme achieves the expected control performance.
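As a generic illustration of the NDI technique named above (our sketch; the dynamics f, input map G, and gain k are stand-ins, not the paper's vehicle model):

    import numpy as np

    def ndi_control(x, x_ref, f, G, k=2.0):
        # For xdot = f(x) + G(x) u: cancel f and impose first-order
        # error dynamics xdot_des = k * (x_ref - x).
        xdot_des = k * (x_ref - x)
        return np.linalg.solve(G(x), xdot_des - f(x))

    # Toy usage on a 2-state system with an identity input map.
    f = lambda x: np.array([-x[1], x[0] * x[1]])
    G = lambda x: np.eye(2)
    print(ndi_control(np.array([1.0, 0.5]), np.zeros(2), f, G))  # [-1.5 -1.5]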
Conference Paper
Wheelchair-mounted robotic arms are used in rehabilitation robotics to help persons with physical impairments perform activities of daily living (ADL). However, the dexterity required by manipulation tasks makes teleoperation of the robotic arm challenging for the user, as it is difficult to control all degrees of freedom with a handheld joystick or a touch-screen device. Programming by demonstration (PbD) allows the user to demonstrate the desired behavior and enables the system to learn from the demonstrations and adapt to a new environment; the learned model can then perform a new set of actions in the new environment. Learning from demonstration includes object identification and recognition, trajectory planning, obstacle avoidance, and adaptation to a new environment, wherever necessary. Learning-based PbD learns the task through a model that captures its underlying structure; the model can be a probabilistic graphical model, a neural network, or a combination of both. PbD with learning can be generalized and applied to new situations, because the robot learns a model rather than merely memorizing and imitating the demonstration, and it also enables efficient learning with fewer demonstrations. This survey provides an overview of recent machine learning (ML) techniques used with PbD to perform dexterous manipulation tasks, enabling the robot to apply what it has learned to a new set of tasks and a new environment.
Article
In-hand manipulation and grasp adjustment with dexterous robotic hands is a complex problem that not only requires highly coordinated finger movements but must also deal with interaction variability. The control problem becomes even more complex when tactile information is introduced into the feedback loop. Traditional approaches do not consider tactile feedback and attempt to solve the problem either by relying on complex models that are not always readily available or by constraining the problem to make it more tractable. In this paper, we propose a hierarchical control approach in which a higher-level policy is learned through reinforcement learning, while low-level controllers ensure grip stability throughout the manipulation action. The low-level controllers are independent grip-stabilization controllers based on tactile feedback. These independent controllers allow reinforcement learning to explore the manipulation task's state-action space in a more structured manner. We show that this structure allows RL methods to learn the unconstrained task, which they cannot learn in a non-hierarchical setting. The low-level controllers also provide an abstraction over the tactile sensor inputs, allowing transfer to real robot platforms. We show preliminary results of transferring policies trained in simulation to the real robot hand.
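The two-level structure described above can be pictured with a short sketch. The classes, gains, and signals here are our illustrative assumptions, not the authors' controllers:

    class GripStabilizer:
        # Independent low-level controller: regulate one fingertip's
        # contact force toward a target using tactile feedback.
        def __init__(self, target_force=1.0, kp=0.5):
            self.target_force, self.kp = target_force, kp

        def correction(self, measured_force):
            return self.kp * (self.target_force - measured_force)

    def hierarchical_step(policy, stabilizers, state, tactile_forces):
        # The high-level RL policy proposes finger commands; each
        # low-level stabilizer adds a grip-stability correction on top.
        action = policy(state)
        return [a + s.correction(f)
                for a, s, f in zip(action, stabilizers, tactile_forces)]

    # Toy usage with a trivial 'policy' over three fingers.
    fingers = [GripStabilizer() for _ in range(3)]
    print(hierarchical_step(lambda s: [0.1, 0.1, 0.1], fingers,
                            None, [0.8, 1.2, 1.0]))  # ~[0.2, 0.0, 0.1]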
Article
In deep reinforcement learning, network convergence is often slow, and the network easily converges to locally optimal solutions. For environments with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with variable parameters from the perspective of sample usage. MSR dynamically adjusts the rewards of experiences with reward saltation in the experience pool, thereby increasing the agent's utilization of these experiences. We conducted experiments in a simulated obstacle-avoidance search environment for an unmanned aerial vehicle and compared the results of deep Q-network (DQN), double DQN, and dueling DQN after adding MSR. The experimental results demonstrate that, with MSR, the algorithms converge faster and obtain the global optimal solution more easily.
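A rough sketch of the MSR idea follows (ours; the threshold and gain are illustrative parameters, not the paper's values):

    import numpy as np

    def magnify_saltatory_rewards(rewards, threshold=1.0, gain=1.5):
        # Amplify stored rewards at steps where the reward jumps sharply
        # relative to the previous step, so the agent replays those
        # experiences with more effect.
        rewards = np.asarray(rewards, dtype=float)
        jumps = np.abs(np.diff(rewards, prepend=rewards[0]))
        out = rewards.copy()
        out[jumps > threshold] *= gain
        return out

    # Both sides of the spike count as saltations here.
    print(magnify_saltatory_rewards([0.0, 0.1, 5.0, 0.2]))  # [0.  0.1 7.5 0.3]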
Conference Paper
Autonomously learning a complex task takes a very long time for reinforcement learning (RL) agents. One way to learn faster is to divide a complex task into several simple subtasks and organize them into a curriculum that guides transfer learning (TL) methods to reuse knowledge in a convenient sequence. However, previous works do not take the TL method into account when building specialized curricula, leaving the burden of careful subtask selection to a human. We here contribute novel procedures for: (i) dividing the target task into simpler ones under minimal human supervision; (ii) automatically generating curricula based on object-oriented task descriptions; and (iii) using the generated curricula to reuse knowledge across tasks. Our experiments show that our proposal achieves better performance using both manually given and generated subtasks when compared to the state-of-the-art technique in two different domains.
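The curriculum-plus-transfer loop can be stated compactly. In this sketch of ours, the train function and the subtask list are illustrative stand-ins:

    def run_curriculum(subtasks, train, params=None):
        # Train on subtasks in order, reusing the learned parameters as
        # the initialization for the next (harder) task.
        for task in subtasks:
            params = train(task, init=params)
        return params

    # Dummy usage: each 'training' call just records the task it saw.
    trace = run_curriculum(["reach", "grasp", "place"],
                           train=lambda task, init: (init or []) + [task])
    print(trace)  # ['reach', 'grasp', 'place']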
Article
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks for which engineering a scripted controller would be laborious. Our experiments indicate that the combined reinforcement and imitation agent achieves significantly better performance than agents trained with reinforcement learning or imitation learning alone. We also show that these policies, trained with large visual and dynamics variations, can achieve preliminary success in zero-shot sim2real transfer. A brief visual description of this work can be viewed at https://youtu.be/EDl8SQUNjj0
Article
In this paper, an adaptive controller is developed for discrete-time linear systems that accounts for parametric uncertainty, internal and external non-parametric random uncertainties, and time-varying control-signal delay. Additionally, the proposed adaptive controller is designed to be entirely model-free. Even though these properties have been studied separately, they have not been considered all together in the adaptive-control literature. A Q-function is used to estimate the long-term performance of the proposed adaptive controller. The control policy is generated from the predicted long-term value and searches for an optimal stabilizing control signal for uncertain and unstable systems. The derived control law does not require an initial stabilizing control assumption, unlike recent work in the literature. The learning error, control-signal convergence, the minimized Q-function, and the instantaneous reward are analyzed to demonstrate the stability and effectiveness of the proposed adaptive controller in a simulation environment. Finally, key insights on the convergence of the learning parameters and control signals are provided.
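As a generic illustration of the Q-function machinery mentioned above (our tabular sketch, not the paper's formulation; the sizes and rates are assumptions):

    import numpy as np

    n_states, n_actions, alpha, gamma = 10, 4, 0.1, 0.95
    Q = np.zeros((n_states, n_actions))

    def q_update(s, a, reward, s_next):
        # Move Q(s, a) toward the observed reward plus the discounted
        # best predicted long-term value of the next state.
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

    def greedy_action(s):
        # The control policy picks the action with the best long-term estimate.
        return int(np.argmax(Q[s]))

    q_update(0, 1, reward=1.0, s_next=2)
    print(greedy_action(0))  # 1, after the single update above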