https://doi.org/10.1007/s10846-022-01680-7
SHORT PAPER
Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum
Lingfeng Tao1 · Jiucai Zhang2 · Xiaoli Zhang1
Received: 18 August 2021 / Accepted: 20 June 2022
© The Author(s), under exclusive licence to Springer Nature B.V. 2022
Abstract
Dexterous manipulation tasks usually have multiple objectives, and the priorities of these objectives may vary at different phases of a manipulation task. Current methods do not consider objective priority or its change during the task, so the robot struggles, or even fails, to learn a good policy. In this work, we develop a novel Adaptive Hierarchical Curriculum to guide the robot in learning manipulation tasks with multiple prioritized objectives. Our method determines the objective priorities during the learning process and updates the learning sequence of the objectives to adapt to the changing priorities at different phases. A smooth transition function is developed to mitigate the effect on learning stability when the learning sequence is updated. The proposed method is validated in a multi-objective manipulation task with a JACO robot arm in which the robot must manipulate a target surrounded by obstacles. The simulation and physical experiment results show that the proposed method outperforms the baseline methods, achieving a 92.5% success rate in 40 tests and taking, on average, 36.4% less time to finish the task.
Keywords Multi-phase multi-objective manipulation · Adaptive curriculum · Objective priority · Robot learning
1 Introduction
Dexterous manipulation is essential to increasing robots' usability in assembly, healthcare, education, and living assistance. These tasks typically need to be finished in multiple phases, and each phase has multiple objectives [9, 10]. Although all phases usually share the same set of objectives [25, 30], the priorities of the objectives can vary from phase to phase, and this priority structure is critical to the efficiency and success rate of the manipulation task. For example, an assembly task usually has two phases: (1) approaching and (2) installation. Both phases share three objectives: (a) fast speed, (b) high precision, and (c) collision avoidance. In the first phase, the robot picks up the assembly part and moves it to the target position. The top-priority objective is to avoid touching other parts; the next is to move fast to minimize the execution time; and the lowest priority is to move precisely. In the second phase, the robot has reached the target position and is ready for installation, and the priority order changes: high precision comes first to improve the installation quality, followed by minimizing the execution time and, last, avoiding contact with other parts.
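The phase-dependent priority structure described above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): the objective names, phase labels, decay factor, and geometric weighting scheme are all illustrative assumptions, showing only how the same three objectives can be ranked differently in each phase and mapped to descending weights.

```python
# Hypothetical sketch of phase-dependent objective priorities for the
# two-phase assembly example. Names and weight values are assumptions.

# Priority order per phase, highest priority first.
PHASE_PRIORITIES = {
    "approaching":  ["collision_avoidance", "speed", "precision"],
    "installation": ["precision", "speed", "collision_avoidance"],
}

def priority_weights(phase, decay=0.5):
    """Map a phase's ordered priorities to geometrically decaying weights."""
    order = PHASE_PRIORITIES[phase]
    return {obj: decay ** rank for rank, obj in enumerate(order)}

def phase_reward(phase, components):
    """Weighted sum of per-objective reward components for one phase."""
    weights = priority_weights(phase)
    return sum(weights[obj] * r for obj, r in components.items())

if __name__ == "__main__":
    components = {"collision_avoidance": 1.0, "speed": 0.2, "precision": 0.1}
    # In the approaching phase, collision avoidance dominates the reward;
    # in the installation phase, precision does.
    print(priority_weights("approaching"))
    print(phase_reward("approaching", components))
    print(phase_reward("installation", components))
```

Note that a fixed weighted sum would use one weight dictionary for the whole task; the point of the example is that re-ranking per phase changes which objective dominates the reward signal.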
Existing research in traditional control theory mainly focuses on weighing multiple objectives against one another with optimization methods [13], which is computationally inefficient. Although deep reinforcement learning (DRL) has proven effective in enabling robots to conduct autonomous manipulation tasks intelligently [19], the current reward formulation is usually a linear summation of the per-objective reward components, which encodes the objective priorities only implicitly and inefficiently and thus causes poor learning performance (i.e., the robot takes a long time to learn, or even fails to learn, a correct policy). Furthermore, the current reward mechanism is usually fixed across all phases. This one-fix-all solution (i.e., using the same objective priority for all phases) cannot ensure that each phase's local performance is optimal. Such solutions may lead to sub-optimal performance because the reward is not customized for each phase of the task.
* Xiaoli Zhang
xlzhang@mines.edu
Lingfeng Tao
tao@mines.edu
Jiucai Zhang
zhangjiucai@gmail.com
1 Colorado School of Mines, Intelligent Robotics and Systems Lab, 1500 Illinois St, Golden, CO 80401, USA
2 GAC R&D Center Silicon Valley, Sunnyvale, CA 94085, USA
Published online: 16 August 2022
Journal of Intelligent & Robotic Systems (2022) 106: 1