Learning robust task priorities of QP-based whole-body
torque-controllers
Marie Charbonneau1,2, Valerio Modugno2,3, Francesco Nori4, Giuseppe Oriolo3,
Daniele Pucci1 and Serena Ivaldi2
Abstract Generating complex whole-body movements for
humanoid robots is now most often achieved with multi-task
whole-body controllers based on quadratic programming. To
perform on the real robot, such controllers often require a
human expert to tune or optimize the many parameters of the
controller related to the tasks and to the specific robot, which is
generally reported as a tedious and time-consuming procedure.
This problem can be tackled by automatically optimizing some
parameters such as task priorities or task trajectories, while
ensuring constraint satisfaction, through simulation. However,
this does not guarantee that parameters optimized in simulation
will also be optimal for the real robot. As a solution, the present
paper focuses on optimizing task priorities in a robust way, by
looking for solutions which achieve desired tasks under a variety
of conditions and perturbations. This approach, which can be
referred to as domain randomization, can greatly facilitate the
transfer of optimized solutions from simulation to a real robot.
The proposed method is demonstrated using a simulation of
the humanoid robot iCub for a whole-body stepping task.
I. INTRODUCTION
Applications involving highly redundant robots, such as
humanoids, have the potential to disrupt many industries and
bring countless benefits to society. Nevertheless, the design
of efficient and safe controllers for humanoid platforms
is highly challenging, especially for applications involving
interaction with the environment. Indeed, a great number of
issues can stem from the complexity of the control algo-
rithms, task inconsistencies, or the need to guarantee constraint
satisfaction, in order to generate physically consistent
behaviors. A highly effective solution to the whole-body
control problem is to decompose a complex behavior into
several elementary tasks, according to a prioritization scheme
[1], typically either a “strict” or a “soft” task hierarchy.
With strict prioritization strategies (such as a traditional
stack-of-tasks), a fixed task hierarchy is assured by geomet-
rical conditions such as a null space task projector [2], [3],
or by the use of quadratic programming (QP) to compute
optimal control actions [4]. Conversely, prioritization in a
soft task hierarchy (based on a weighted combination of
tasks), can be achieved by assigning each task a weight
defining its relative importance [5], [6].
This work was supported by the EU H2020 program under the Marie
Sklodowska-Curie SECURE grant (n.642667), as well as the European
projects An.Dy (n.731540) and Comanoid (n.645097).
1iCub Facility, Istituto Italiano di Tecnologia, Genova, Italy.
name.surname@iit.it
2Loria, Inria Nancy - Grand Est, CNRS & Univ. Lorraine, Villers-lès-Nancy,
France. name.surname@inria.fr
3DIAG, Sapienza Università di Roma, Roma, Italy.
surname@diag.uniroma1.it
4Google DeepMind, London, UK. fnori@google.com
Fig. 1: Different iCub models performing the same whole-
body motion with several tasks. In this paper, with the purpose
of eventually easing the passage from simulated to real-world
robots, we optimize the task priorities for robustness, in order
to allow their transfer from one robot model to a different one,
without the need for re-tuning.
Usually, task priorities are designed a priori by experts,
then manually tuned to adjust the task ordering, timing,
transitions, etc. It is a tedious operation which, depending on
the complexity of the system and the task, can take a notable
portion of a researcher’s time, and can be particularly hard
in the case of whole-body control of floating-base platforms.
A recent line of research seeks to tackle this issue, aiming
to automatically learn whole-body task priorities [7], [8],
[9]. For instance in [7], task coefficients are learned for
controlling a single arm, allowing the transfer of acquired
knowledge to different tasks. In [8], task generalization is
achieved with an evolutionary algorithm employed to learn
policies for a range of tasks using parametrized motor skills.
Then, in [9], task priorities guaranteeing the non-violation
of system constraints are learned for bimanual motions.
Learning methods often use a random exploration over
numerous experiments, which could be problematic to per-
form on hardware. For this reason, training is preferably per-
formed in simulation. However, inherent differences between
simulated and real robots can render an optimal solution
untransferrable from one to the other. This gap between
reality and simulation needs to be accounted for, in order
to achieve automatic parameter tuning.
Solutions to this problem have recently been addressed by
trial-and-error algorithms [10], [11]. In [10], prior knowledge
from simulations was exploited to find acceptable behaviors
on the real robot in a few trials. In [11], a trial-and-error
learning algorithm encouraged exploration of the task space,
allowing adaptation to inaccurate models, also in a few trials.
Learning from simulation can also simplify data gathering,
but there is no guarantee that the acquired knowledge is
effective in the real world. For example, optimal control and
movement primitives are combined in [12] to find solutions
which can be easily deployed on the real robot, but they
strongly rely on the accuracy of the simulated model.
Also, since QP solvers often allow for constraint relaxation,
strict constraint satisfaction is not always ensured by
the frameworks presented above. In [9], the learning proce-
dure guarantees strict constraint fulfillment, but transferring
knowledge from simulation to reality did not fully achieve
the desired behavior. This implies that constraint satisfaction
is beneficial, but not enough to achieve transferability:
solutions which are robust rather than optimal are needed,
in order to achieve better generalization.
Looking for robust solutions is central to transfer learning,
which seeks to exploit previously acquired knowledge [13]
to allow a model trained on one task to be re-purposed on a
second related one. This approach can benefit from domain
randomization (DR) [14], which consists in randomizing
some aspects of the simulation to enrich the range of possible
environments experienced by the learner. In [15], robust
policies for pivoting a tool held in the robot’s gripper were
learned in simulation, given random friction and control
delays, such that the learned policies proved to be effective
on the real robot as well.
This work proposes to apply the idea of DR to whole-body
controllers. In this context, we want to ensure that balance is
maintained while performing a task, even if large differences
exist between the learning domain and the testing domain.
To achieve this result, we combine a DR approach with
fitness functions promoting the robustness of the learned con-
troller, and provide a method to learn robust task priorities
which achieve desired goals, while facilitating the
transfer of results from simulation to reality. The effective-
ness of the proposed method is demonstrated by optimizing
parameters in simulation for the goal of making the iCub
perform steps, and showing that it is possible to overcome
issues related to the transferability problem.
The paper is organized as follows. Section II introduces
the modeling of a floating-base robot for whole-body control.
Section III describes the task-based whole-body controller
used for validating the approach, as well as the constrained
optimization algorithm used for learning task priorities. Ex-
periments led with different models of the iCub robot are
then illustrated in Section IV. Results are discussed in Section V,
leading to the conclusion of the paper.
II. BACKGROUND
A. Notation used throughout the paper
- The set of real numbers is denoted by $\mathbb{R}$.
- The transpose operator is denoted by $(\cdot)^\top$.
- The Moore-Penrose pseudoinverse is denoted by $(\cdot)^\dagger$.
- $|v|$ denotes the Euclidean norm of a vector $v \in \mathbb{R}^n$.
- Given a time function $f(t) \in \mathbb{R}^n$, its first- and second-order time derivatives are denoted by $\dot{f}(t)$ and $\ddot{f}(t)$.
B. System Modelling
The robot is modelled as described in [3], where it is
assumed to be a free-floating system composed of n+1 links
connected with n 1-DOF (degree of freedom) joints. The robot
configuration space is characterized by $q$, composed of the
position and orientation of a base frame $B$ attached to a link
of the robot, and the joint angles $s$. The robot velocity is
expressed as $\nu = (v_B, \dot{s})$, where $v_B$ is the velocity
of the base frame expressed with respect to the inertial frame.
Applying the Euler-Poincaré formalism [16, Chapter 13.5] yields
the following system equation of motion:

$$M(q)\dot{\nu} + h(q,\nu) = \zeta\tau + \sum_{k=1}^{n_c} J_{c_k}^\top f_k \qquad (1)$$

where $M \in \mathbb{R}^{(n+6)\times(n+6)}$ is the mass matrix,
$h \in \mathbb{R}^{n+6}$ is the bias vector of Coriolis and gravity
terms, $\tau \in \mathbb{R}^n$ is a vector representing the actuation
joint torques, and $\zeta = (0_{n\times 6}, 1_n)^\top$ is a selector
matrix, in which $1_n \in \mathbb{R}^{n\times n}$ denotes the identity
matrix and $0_{n\times 6} \in \mathbb{R}^{n\times 6}$ is a zero matrix.
Assuming that the robot interacts with the environment through $n_c$
distinct wrenches (as an abuse of notation, here we do not define a
wrench as the dual of a twist), $f_k \in \mathbb{R}^6$ denotes the
k-th external wrench applied by the environment on the frame $c_k$
of the robot.

Fig. 2: Overview of the proposed method. Given task priorities $w$,
the QP-based controller computes a control input $u$ under a set of
conditions $j$ (e.g. desired trajectories, disturbances, noise). An
outer learning loop allows the optimization of the task weights.
III. METHODS
The method proposed for learning robust task priorities
(outlined in Fig. 2) relies on two main parts: (i) a QP-
based whole-body torque-controller which tracks desired task
trajectories and sends joint torque commands to the robot,
and (ii) a black-box constrained stochastic optimization pro-
cedure, posing no restriction on the structure of the learning
problem, which is used to optimize task priorities as follows:
at the end of a roll-out (i.e. the execution of a footstep), the
fitness of the obtained robot behavior is evaluated, allowing
the optimization algorithm to update the task weights.
A. QP-based whole-body torque controller
The control framework used in this paper was defined
from a stack of tasks solved by quadratic programming. The
following subsections define the control input and objective,
before describing an optimization problem allowing the
computation of a control input which achieves the objective.
1) Control input: A control input $u = (\tau^\top, F_c^\top)^\top$ is
composed of joint torques $\tau$ and contact forces $F_c$, where
$F_c \in \mathbb{R}^{6n_c}$ is a stacked vector of contact wrenches.
Defining $J_c$ as the contact Jacobian, stacked in the same way as
$F_c$, and $B = (\zeta, J_c^\top)$, the system dynamics (1) can then
be reformulated as

$$M(q)\dot{\nu} + h = Bu \qquad (2)$$
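To make the reformulation concrete, the following minimal sketch (in Python, with placeholder shapes and names that are assumptions, not the authors' code) assembles $u$ and $B$ of Eq. (2):

```python
# Minimal sketch, assuming placeholder shapes and a generic dynamics library:
# builds the control input u = (tau, F_c) and the matrix B = (zeta, J_c^T).
import numpy as np

n, n_c = 23, 2                     # joint DOF and number of contacts (e.g. both feet)
zeta = np.vstack([np.zeros((6, n)), np.eye(n)])  # selector matrix, (n+6) x n
J_c = np.zeros((6 * n_c, n + 6))   # stacked contact Jacobians (from the robot model)

B = np.hstack([zeta, J_c.T])       # (n+6) x (n + 6*n_c)

tau = np.zeros(n)                  # joint torques
F_c = np.zeros(6 * n_c)            # stacked contact wrenches
u = np.concatenate([tau, F_c])     # control input u = (tau, F_c)

# Given M ((n+6) x (n+6)) and h (n+6) from the dynamics model, Eq. (2) reads:
# M @ dnu + h == B @ u
```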
2) Control objective: The developed controller has the
following objectives, from which a stack of tasks is defined:
- Stabilize the center of mass position $X_{CoM} \in \mathbb{R}^3$
- Stabilize the stance foot pose $X_{stance} \in \mathbb{R}^6$
- Stabilize the swing foot pose $X_{swing} \in \mathbb{R}^6$
- Stabilize the neck orientation $X_{neck} \in \mathbb{R}^3$
- Track joint positions (postural task) $s$
- Minimize joint torques $\tau$
The velocity $\dot{X}_T$ of a Cartesian task $T$ can be computed
from the associated Jacobian $J_T$ and the robot velocity $\nu$
with $\dot{X}_T = J_T\nu$, where $T \in \{CoM, stance, swing, neck\}$.
Differentiating this expression yields the task acceleration:

$$\ddot{X}_T = \dot{J}_T\nu + J_T\dot{\nu} \qquad (3)$$

In view of Eq. (2) and (3), task accelerations $\ddot{X}_T$ can be
formulated as a function of the control input $u$:

$$\ddot{X}_T(u) = \dot{J}_T\nu + J_T M^{-1}(Bu - h) \qquad (4)$$

Stabilization of a Cartesian task may then be attempted by
minimizing $\tilde{\ddot{X}}_T(u) = \ddot{X}_T(u) - \ddot{X}_T^*$,
the error on the task acceleration, where $\ddot{X}_T^*$ is a
feedback term composed of reference linear accelerations obtained
with a proportional-derivative (PD) feedback control policy, and
reference angular accelerations computed as in [17, section 5.11.6,
p. 173].
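As an illustration, the PD feedback policy for the reference linear accelerations could look as follows (a sketch; the gain values and function name are assumptions, not the paper's):

```python
# Sketch of a PD feedback policy for the reference linear acceleration of a
# Cartesian task: ddX* = ddX_des + Kd*(dX_des - dX) + Kp*(X_des - X).
# Gains kp, kd are illustrative placeholders, not the paper's values.
import numpy as np

def reference_linear_acceleration(x, dx, x_des, dx_des, ddx_des,
                                  kp=100.0, kd=20.0):
    x, dx = np.asarray(x), np.asarray(dx)
    return np.asarray(ddx_des) + kd * (np.asarray(dx_des) - dx) \
        + kp * (np.asarray(x_des) - x)
```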
As for the postural task, similarly as above, in view of
Eq. (2), one can obtain $\ddot{s}(u)$, a formulation of the joint
accelerations $\ddot{s}$ as a function of the control input $u$:

$$\ddot{s}(u) = \zeta^\top M^{-1}(Bu - h) \qquad (5)$$

Stabilization of the postural task may then be attempted
by minimizing $\tilde{\ddot{s}}(u) = \ddot{s}(u) - \ddot{s}_{ref}$,
the error on the joint accelerations, with $\ddot{s}_{ref}$ obtained
through inverse kinematics by solving the following optimization
problem:

$$\ddot{s}_{ref} = \arg\min_{\ddot{s}} \frac{1}{2}\sum_T w_T \left|\tilde{\dot{\nu}}_T\right|^2 + w_s \left|\tilde{\dot{\nu}}_s\right|^2 \qquad (6)$$

In the above, the error on the robot acceleration is computed
with $\tilde{\dot{\nu}}_T = \dot{\nu} - J_T^\dagger(\ddot{X}_T^* - \dot{J}_T\nu)$
for Cartesian tasks, and
with $\tilde{\dot{\nu}}_s = \dot{\nu} - (\dot{v}_B^{*\top}, \ddot{s}^{*\top})^\top$
for the postural task. $\ddot{s}^*$ is
a feedback term computed using a PD control policy, and
$\dot{v}_B^*$ is a feedback term on the base velocity used to ensure
convergence of the base acceleration to zero.
3) QP formulation: In the context where tasks may have
relative priorities, soft task priorities allow a trade-off to be
achieved between weighted tasks. A set of task weights is
defined, in order to attribute each task a priority:

$$w = \{w_{CoM}, w_{stance}, w_{swing}, w_{neck}, w_s, w_\tau\} \qquad (7)$$

The terms $w_{CoM}, w_{stance}, w_{swing}, w_{neck} \in \mathbb{R}$ refer to
the weights associated to the CoM, stance foot, swing foot
and neck Cartesian tasks, and $w_s, w_\tau \in \mathbb{R}$ to the weights
associated to the postural task and joint torque regularization.

The control architecture described above can thus be
formulated as the following optimization problem:

$$\begin{aligned} u^* &= \arg\min_u\; \tfrac{1}{2}\, cost && (8a)\\ \text{subject to} \quad Cu &\le b && (8b) \end{aligned}$$

where the constraint Eq. (8b) ensures that the contact forces
remain within the associated friction cones. The cost function
(8a) is computed as the weighted sum of all task objectives,
$cost = \sum_T w_T |\tilde{\ddot{X}}_T(u)|^2 + w_s|\tilde{\ddot{s}}(u)|^2 + w_\tau|\tau(u)|^2$, in which
$w_T \in \mathbb{R}$ refers to the weights associated to Cartesian tasks.
Reorganizing the terms in the cost function, one can easily
verify that it has the form of a QP problem.
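For illustration, the sketch below shows how such a weighted sum of affine task errors $e_T(u) = A_T u - b_T$ (e.g. $A_T = J_T M^{-1} B$ from Eq. (4)) collapses into the standard QP form $\frac{1}{2}u^\top H u + g^\top u$ subject to $Cu \le b$. The qpsolvers package and the quadprog backend are assumptions; any QP solver accepting inequality constraints would do:

```python
# Sketch (assumed names, not the authors' implementation): each task error is
# affine in u, e(u) = A u - b, so sum_T w_T |A_T u - b_T|^2 expands to
# 0.5 u^T H u + g^T u up to a constant.
import numpy as np
from qpsolvers import solve_qp

def solve_weighted_qp(tasks, C, b_ineq):
    """tasks: list of (weight, A, b) triples with task error e(u) = A u - b."""
    n_u = tasks[0][1].shape[1]
    H = np.zeros((n_u, n_u))
    g = np.zeros(n_u)
    for w, A, b in tasks:
        H += w * A.T @ A           # accumulate sum_T w_T |A_T u - b_T|^2
        g -= w * A.T @ b
    H += 1e-9 * np.eye(n_u)        # small regularization for strict convexity
    return solve_qp(H, g, G=C, h=b_ineq, solver="quadprog")
```

The torque regularization term $w_\tau|\tau(u)|^2$ fits the same form, with $A$ a selector of the torque components of $u$ and $b = 0$.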
B. Constrained stochastic optimization
In this work, the learning problem is cast as a black-box
constrained stochastic optimization. Given a fitness function
$\phi(w): \mathbb{R}^{n_P} \to \mathbb{R}$ (with $n_P$ the number of task weights
to optimize) and sets of $n_{IC}$ inequality constraints $g$ and
$n_{EC}$ equality constraints $h$, the idea is to find an optimal
solution $w^* \in W \subseteq \mathbb{R}^{n_P}$ to the problem:

$$\begin{aligned} w^* &= \arg\max_w\; \phi(w) && (9a)\\ \text{subject to} \quad g_i(w) &\le 0, \quad i = 1,\dots,n_{IC} && (9b)\\ h_i(w) &= 0, \quad i = 1,\dots,n_{EC} && (9c) \end{aligned}$$
The Covariance Matrix Adaptation - Evolution Strat-
egy (CMA-ES), first described in [18], is a standard tool
for solving black-box optimization problems. In its basic
implementation, CMA-ES defines a multivariate Gaussian
distribution $\mathcal{N}(\bar{w}, \sigma^2, \Sigma)$ with mean $\bar{w}$, step size $\sigma^2$ and
covariance $\Sigma$. However, since CMA-ES was not originally
designed for handling constraints, this work employs a
constrained extension of it called (1+1)-CMA-ES, which uses
covariance constrained adaptation (CCA) [19]. This variant
was benchmarked in [9] for bimanual tasks, as well as in
[20] for trajectory optimization of a whole-body movement
(standing up from a chair).
A single iteration, or generation, of (1+1)-CMA-ES com-
prises several stages. First, a sample $w_1$ is drawn according
to the rule $w_1 = w + \sigma D z$, where $D$ is the Cholesky factor
of the covariance matrix $\Sigma = D^\top D$ and $z$ is a standard
normally distributed vector $z \sim \mathcal{N}(0, I)$. The fitness function
$\phi$ is evaluated for this sample, and the Gaussian distribution
is updated based on the result, as described in [21]. This
update also exploits the previous generations to change the
covariance and step size of the search distribution. Finally,
information about the successful iterations is stored in a
search path $s \in \mathbb{R}^{n_P}$. A more detailed description of the
algorithm can be found in [20].
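A highly simplified sketch of this sampling-and-acceptance loop is given below; it keeps the Cholesky-factor sampling rule and elitist acceptance, but replaces the CCA constraint handling and covariance updates of [19], [21] with a crude success-based step-size rule, purely for illustration:

```python
# Toy (1+1)-ES sketch: sample w1 = w + sigma * D z, keep the offspring only
# if its fitness improves. The real algorithm additionally adapts D (the
# Cholesky factor of the covariance) and handles constraints via CCA.
import numpy as np

def one_plus_one_es(phi, w0, sigma=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    f = phi(w)
    D = np.eye(len(w))                # Cholesky factor of the covariance
    for _ in range(iters):
        z = rng.standard_normal(len(w))
        w1 = w + sigma * D @ z        # draw a single offspring
        f1 = phi(w1)
        if f1 >= f:                   # elitist (1+1) acceptance
            w, f = w1, f1
            sigma *= 1.1              # expand step size on success
        else:
            sigma *= 0.98             # shrink step size on failure
    return w, f
```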
IV. EXPERIMENTS
This section presents experiments with the iCub robot [22].
They were performed in order to validate empirically that the
method described in Section III is capable of generating task
priorities which (i) produce robust whole-body motions, even
when contacts due to physical interaction with the environ-
ment evolve in time, and (ii) can cope with imperfections in
the robot model, disturbances and noise.
1) Robot setup: Experiments were conducted in simula-
tion using the open-source robot simulator Gazebo [23]. They
were performed with the iCub robot, using 23 DOF on legs,
arms and torso. The robot is equipped with 6 force/torque
sensors on the arms, legs and feet, allowing whole-body
dynamics computations. We tested our method using two
distinct models of the iCub with different inertial properties:
the first one with a tethered power supply and the second
one with a battery pack on the back, as shown in Fig. 1.
2) Whole-body motion: The controller described in sec-
tion III-A was developed in Matlab/Simulink using WBToolbox
[24]. It can be used either for a simulated or a real robot.
It is applied to the task of performing steps, i.e. performing
the following sequential movements for each step: move
the CoM above the stance foot, move the swing foot up
by 0.025m, move the swing foot back to its initial pose,
and move the CoM back to its initial position. The passage
from one movement to the next takes place when Cartesian
task errors are smaller than a threshold and contact forces
are consistent with the desired motion, or a maximum state
duration has been reached. Over the course of a step, the
desired stance foot pose, neck orientation and posture remain
at their initial value.
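The sequential logic can be illustrated with the following sketch (state names and thresholds are assumptions; the actual values are implementation details of the controller):

```python
# Illustrative sketch of the sequential stepping logic described above:
# advance to the next movement when task errors fall below a threshold and
# contact forces are consistent, or when a maximum state duration is reached.
STEP_SEQUENCE = ["com_over_stance", "swing_foot_up", "swing_foot_down",
                 "com_back_to_center"]

def next_phase(phase_idx, task_errors, contacts_consistent, t_in_phase,
               err_thresh=0.01, max_duration=2.5):
    converged = max(task_errors.values()) < err_thresh and contacts_consistent
    if converged or t_in_phase > max_duration:
        return min(phase_idx + 1, len(STEP_SEQUENCE) - 1)
    return phase_idx
```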
Note that over all experiments, the proportional-derivative
gains used to compute reference accelerations associated to
Cartesian and postural tasks are kept constant.
The next subsections describe the experiments we per-
formed, while Table I presents them in summary form. Train-
ing and testing were performed with different robot models
and stepping tasks, as well as under different randomized
conditions (RC) as described in Table II.
A. Training with the tethered iCub model
In order to optimize task weights, three different fitness
functions were compared. The first one, $\phi_p$, favored perfor-
mance on the Cartesian tasks and a lower deployed effort. The
second one, $\phi_r$, focused on robustness, by favoring solutions
with a smaller excursion of the ZMP position $P_{ZMP}$ with
respect to the center of the support polygon $O_{SP}$. The third
fitness function, $\phi_{pr}$, was a combination of the first two:

$$\phi_p = -\frac{1}{X_{Tmax}}\sum_{t=0}^{t_{end}}\sum_T \left|\tilde{X}_T\right|^2 - \frac{0.0001}{\tau_{max}}\sum_{t=0}^{t_{end}} |\tau|^2 \qquad (10a)$$

$$\phi_r = -\frac{1}{P_{ZMPmax}}\sum_{t=0}^{t_{end}} \left|P_{ZMP} - O_{SP}\right|^2 \qquad (10b)$$

$$\phi_{pr} = \frac{1}{2}(\phi_p + \phi_r) \qquad (10c)$$

$X_{Tmax}$, $\tau_{max}$ and $P_{ZMPmax}$ are normalization factors;
the negative signs reflect that the fitness is maximized in Eq. (9).
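A sketch of how these fitness values could be computed from logged roll-out data is given below (the array-based logging interface and argument names are assumptions):

```python
# Sketch of the fitness functions of Eq. (10), assuming trajectories logged
# over a roll-out as numpy arrays (normalization factors passed in).
import numpy as np

def fitness_p(task_errors, tau, x_t_max, tau_max):
    """phi_p: penalizes summed squared Cartesian task errors and effort."""
    return (-np.sum(task_errors ** 2) / x_t_max
            - 1e-4 * np.sum(tau ** 2) / tau_max)

def fitness_r(p_zmp, o_sp, p_zmp_max):
    """phi_r: penalizes ZMP excursion from the support-polygon center."""
    return -np.sum(np.sum((p_zmp - o_sp) ** 2, axis=1)) / p_zmp_max

def fitness_pr(phi_p_val, phi_r_val):
    """phi_pr: equal combination of performance and robustness."""
    return 0.5 * (phi_p_val + phi_r_val)
```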
Inequality constraints to the learning problem were defined
on joint position limits and torque limits: they act as an extra
safety built on top of the QP controller.
The task used for the optimization is to perform 1 step,
as described in Section IV-2. The simulation was limited to
10 seconds, allowing the robot to perform one step and shift
its weight to the side in order to start a second one, making
sure that the robot remained stable after foot touchdown.
Early termination of a simulation took place in cases where
the robot fell or the QP in Eq. (8) could not be successfully
solved (which may happen when the weights being tested are
far from optimal, and lead to a configuration in which
the QP solver simply cannot find a solution). In these cases,
a penalty of −1.5 was added to the fitness in Eq. (10).
Optimized task priorities were obtained by performing
200 iterations of (1+1)-CMA-ES [19] applied to the control
framework, with an exploration rate of 0.1; each learning ex-
periment was repeated 10 times. Since task priorities as used
in Eq. (6) and (8) are relative to each other, $w_{CoM}$ was at-
tributed a fixed value of 1. The remaining task priorities were
attributed bounds as shown at the bottom of Table I. Fur-
thermore, the weight values $w_0 = [1, 1, 1, 0.1, 1e{-}3, 1e{-}4]$
obtained by hand were verified to allow the tethered iCub
model to successfully perform the desired stepping motion,
and were therefore used as a starting point.
Moreover, to achieve robustness through domain
randomization and enable the controller to eventually
cope with real-world data, the robot was subjected to the
randomized conditions 1 to 5 (see Table II) at each learning
iteration. These conditions constitute the set of conditions
$j$ under which the controller had to perform. In particular,
tests performed in simulation showed that applying a
force of 10 N on the chest for no more than 1 second was
sufficient to destabilize the robot when using suboptimal
weights. Thus, using such disturbances during learning can
encourage the generation of robust task priorities.
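For illustration, a sampler for the randomized conditions of Table II might look as follows (the function name and dictionary layout are assumptions; the bounds follow the table):

```python
# Illustrative sampler for the randomized conditions j of Table II: up to 7
# external wrenches on the chest with random timing, direction and magnitude,
# plus sensor-noise and task randomizations.
import numpy as np

def sample_conditions(rng):
    n_f = rng.integers(0, 8)                        # RC 5: number of wrenches <= 7
    wrenches = [{
        "t_start":  round(rng.uniform(0.0, 10.0), 2),   # s, 1e-2 precision
        "duration": round(rng.uniform(0.0, 1.0), 2),    # s, at most 1 s
        "angles":   rng.uniform(0.0, 2 * np.pi, 3),     # direction (rad)
        "force":    rng.uniform(0.0, 10.0),             # N
        "torque":   rng.uniform(0.0, 10.0),             # Nm
    } for _ in range(n_f)]
    return {
        "ft_noise":   bool(rng.integers(0, 2)),         # RC 1: F/T noise on/off
        "swing_foot": rng.choice(["left", "right"]),    # RC 2
        "direction":  rng.choice(["front", "back"]),    # RC 3
        "com_offset": rng.uniform(0.0, 0.02),           # RC 4: delta <= 0.02 m
        "wrenches":   wrenches,                         # RC 5
    }
```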
An additional set of learning experiments was performed
for φpr without using DR, to assess the contribution of DR.
Results showed that 200 iterations were sufficient to
achieve strong convergence. Weights obtained with each of
the fitness functions in Eq. (10) are shown in Table I.
B. Testing 1: with the tethered iCub model
In order to validate the robustness achieved with the
optimized weights, each set of weights was tested on the
same iCub model used for training, but without being submitted
to randomized conditions. The testing task in this case was to
achieve 6 consecutive steps. The success rates achieved with
the optimized weights from each fitness function are shown
in Table I, and typical trajectories achieved using each fitness
function are shown in the left part of Fig. 3 and in Fig. 4.
C. Testing 2: with the backpacked iCub model
In order to replicate conditions similar to performing
experiments on the real robot, we prepared a second model
of the iCub, with a battery pack installed on its back. This
TABLE I: Summary of performed experiments and achieved results

Scenario | Training      | Testing 1 | Testing 2
Robot    | tethered      | tethered  | backpacked
Task     | 1 step        | 6 steps   | 6 steps
RC       | 1, 2, 3, 4, 5 | -         | 5, 6, 7

Fitness     | #  | DR  | wCoM | wstance | wswing  | wneck    | ws       | wτ       | success | success
hand tuning | 1  | No  | 1    | 1       | 1       | 1        | 1e-3     | 1e-4     | 1/1     | 0/1
φp          | 10 | Yes | 1    | 0.2±0.3 | 1.1±1.2 | (1±3)e-3 | 0.5±0.3  | (2±5)e-5 | 5/10    | 0/10
φr          | 10 | Yes | 1    | 1.6±1.2 | 1.8±1.0 | 0.1±0.1  | (4±7)e-3 | 1e-10    | 7/10    | 5/10
φpr         | 10 | Yes | 1    | 0.9±1.3 | 2.4±1.1 | 0.6±1.2  | 1e-6     | 1e-10    | 10/10   | 10/10
φpr         | 10 | No  | 1    | 1.0±0.3 | 0.1±0.3 | 0.2±0.1  | 1e-6     | (4±5)e-6 | 8/10    | 2/10
lower bounds|    |     | 1    | 1e-4    | 1e-4    | 1e-10    | 1e-6     | 1e-10    |         |
upper bounds|    |     | 1    | 10      | 10      | 10       | 10       | 0.1      |         |

Randomized conditions RC are defined in Table II. The column # gives the number of training experiments performed given each fitness function, while
only 1 set of manually tuned gains was used. In the column DR, "Yes" means that the RC defined at the top of the table were used for DR while
training; "No" means that they were disabled. The "success" columns report success rates achieved when testing the sets of trained weights.
(a) tethered model (b) backpacked model
Fig. 3: Typical CoM and feet trajectories obtained for 6 strides performed with the tethered and backpacked iCub models,
given initial weights $w_0$ or weights optimized using DR (except for "no DR") with $\phi_p$, $\phi_r$, $\phi_{pr}$. Each color denotes the use
of a different set of optimized weights. The x, y and z axes correspond respectively to the sagittal, frontal and vertical axes.
TABLE II: Randomized set of conditions j

Randomized Condition (RC) in j               | Random value
1. Gaussian noise on F/T sensor signals      | On / Off
2. Appointed swing foot                      | Left / Right
3. Direction in which swing foot is moved    | Front / Back
4. Desired CoM position moved forward by δ (m) | {δ | δ ∈ R+, δ ≤ 0.02}
5. n_F external wrenches applied on chest    | {n_F | n_F ∈ Z, n_F ≤ 7}
   and for each wrench i:
   at time t_Fi (s, with 1e-2 precision)     | {t_Fi | t_Fi ∈ R+, t_Fi ≤ 10}
   duration d_Fi (s, with 1e-2 precision)    | {d_Fi | d_Fi ∈ R+, d_Fi ≤ 1}
   direction (γ_Fi, θ_Fi, φ_Fi) (rad)        | each angle in {a | a ∈ R, a ≤ 2π}
   force magnitude F_Fi (N)                  | {F_Fi | F_Fi ∈ R, F_Fi ≤ 10}
   torque magnitude τ_Fi (Nm)                | {τ_Fi | τ_Fi ∈ R, τ_Fi ≤ 10}
6. Gaussian noise on joint velocity signals  | On
7. Gaussian noise on F/T sensor signals      | On
model was subjected to randomized conditions 5 to 7 (see
Table II), with the testing task of achieving 6 consecutive
steps. The success rates achieved with the optimized weights
from each fitness function are shown in Table I. Successful
CoM and feet trajectories obtained with $\phi_{pr}$, with and
without DR, are shown in Fig. 3. As could be expected due to
the addition of noise, the ZMP trajectories proved to be highly
noisy and are therefore not shown. These results show that
weights obtained with $\phi_{pr}$ allow for a higher robustness
of the controller. Additionally, the rate of success achieved
with DR is significantly higher than without DR,
demonstrating that DR did have a measurable impact on the
achieved robustness.

Fig. 4: Typical ZMP trajectories obtained for 6 steps per-
formed with the tethered iCub model, from left to right:
given initial weights, and weights optimized with $\phi_p$, $\phi_r$, $\phi_{pr}$.
Each color (blue, red or yellow) denotes the use of a different
set of optimized weights. The x, y and z axes correspond
respectively to the sagittal, frontal and vertical axes.
V. DISCUSSION AND CONCLUSION
In summary, the proposed method generated task priorities
for successful whole-body control of different robot models,
in 200 learning iterations. It was demonstrated by performing
training on a first model of the iCub robot, then testing on
a different model submitted to diverse working conditions.
Results achieved with φp, a fitness function favoring task
performance, were likely limited by overfitting on the model
used for learning and did not allow much robustness with
respect to disturbances. On the other hand, optimizing for
robustness with $\phi_r$ allowed higher chances of success when
conditions changed, by encouraging smaller ground reaction
forces and the generation of angular momentum through
the torso and arms. However, $\phi_r$ might have neglected the
fulfillment of tasks, which are used for keeping balance.
Instead, using $\phi_{pr}$, a fitness function combining robustness
and performance, allowed superior results to be achieved.
With $\phi_{pr}$, the swing foot placement, crucial for stability
at touchdown, was given high importance, while the neck
orientation task was given a lesser one, allowing compliance
to external forces, which facilitates recovery from external
perturbations and contact switching. As for the postural task,
its allotted low priority allows it to be used as regularization
(just like joint torques), instead of competing with Cartesian
tasks. Such a solution is interesting, as it may not have been
a priori self-evident to an expert defining task priorities.
Furthermore, the ranges over which sets of optimized weights
were obtained show that the problem has multiple local
minima. Therefore, although task priorities require proper
tuning, the controller is not highly sensitive to a single
precise adjustment of task weights.
Eventually, in order to further improve performance when
using a different robot model or moving on to experiments on
the real robot, one may be interested in performing another
round of optimization on the task priorities, for example
exploiting data-efficient learning methods, as done in [11].
Nonetheless, the robustness achieved with the proposed
method has the potential to allow higher success when
passing from simulation to real-world experiments. Further
works shall explore different DR conditions, such as ran-
domizing values associated to the dynamics of the system,
control delay, or measurement noise amplitude and delay.
The extension of the proposed method to additional param-
eters such as feedback gains will also be examined.
REFERENCES
[1] S. Ivaldi, J. Babič, M. Mistry, and R. Murphy, “Special issue on
whole-body control of contacts and dynamics for humanoid robots,”
Autonomous Robots, vol. 40, no. 3, pp. 425–428, Mar 2016.
[2] L. Saab, O. E. Ramos, F. Keith, N. Mansard, P. Souères, and J. Y.
Fourquet, “Dynamic whole-body motion generation under rigid con-
tacts and other unilateral constraints,” IEEE Transactions on Robotics,
vol. 29, no. 2, pp. 346–362, April 2013.
[3] G. Nava, F. Romano, F. Nori, and D. Pucci, “Stability analysis and
design of momentum-based controllers for humanoid robots,” in 2016
IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), Oct 2016, pp. 680–687.
[4] S. Dafarra, G. Nava, M. Charbonneau, N. Guedelha, F. Andrade,
S. Traversaro, L. Fiorio, F. Romano, F. Nori, G. Metta, and D. Pucci,
“A Control Architecture with Online Predictive Planning for Position
and Torque Controlled Walking of Humanoid Robots,” ArXiv e-prints,
July 2018.
[5] J. Salini, V. Padois, and P. Bidaud, “Synthesis of complex humanoid
whole-body behavior: A focus on sequencing and tasks transitions,”
in 2011 IEEE International Conference on Robotics and Automation,
May 2011, pp. 1283–1290.
[6] K. Otani, K. Bouyarmane, and S. Ivaldi, “Generating assistive hu-
manoid motions for co-manipulation tasks with a multi-robot quadratic
program controller,” in ICRA, 2018.
[7] N. Dehio, R. F. Reinhart, and J. J. Steil, “Multiple task optimization
with a mixture of controllers for motion generation,” in 2015 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS),
Sept 2015, pp. 6416–6421.
[8] S. Ha and C. Liu, “Evolutionary optimization for parameterized whole-
body dynamic motor skills,” in ICRA, 2016.
[9] V. Modugno, U. Chervet, G. Oriolo, and S. Ivaldi, “Learning soft
task priorities for safe control of humanoid robots with constrained
stochastic optimization,” in HUMANOIDS, 2016.
[10] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, “Robots that can
adapt like animals,” Nature, vol. 521, pp. 503–507, 2015.
[11] J. Spitz, K. Bouyarmane, S. Ivaldi, and J. B. Mouret, “Trial-and-error
learning of repulsors for humanoid qp-based whole-body control,” in
2017 IEEE-RAS 17th International Conference on Humanoid Robotics
(Humanoids), Nov 2017, pp. 468–475.
[12] D. Clever, M. Harant, K. D. Mombaur, M. Naveau, O. Stasse, and
D. Endres, “Cocomopl: A novel approach for humanoid walking
generation combining optimal control, movement primitives and learn-
ing and its transfer to the real robot HRP-2,” IEEE Robotics and
Automation Letters, vol. 2, no. 2, pp. 977–984, 2017.
[13] S. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Trans-
actions on Knowledge and Data Engineering, vol. 22, no. 10, pp.
1345–1359, 2010.
[14] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel,
“Domain randomization for transferring deep neural networks from
simulation to the real world,” in 2017 IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), Sept 2017, pp. 23–30.
[15] R. Antonova, S. Cruciani, C. Smith, and D. Kragic, “Reinforcement
learning for pivoting task,” CoRR, vol. abs/1703.00472, 2017.
[16] J. E. Marsden and T. S. Ratiu, Introduction to Mechanics and Symme-
try: A Basic Exposition of Classical Mechanical Systems. Springer
Publishing Company, Incorporated, 2010.
[17] R. Olfati-Saber, “Nonlinear control of underactuated mechanical
systems with application to robotics and aerospace vehicles,” Ph.D.
dissertation, Massachusetts Institute of Technology, Cambridge,
February 2001. [Online]. Available: http://hdl.handle.net/1721.1/8979
[18] N. Hansen and A. Ostermeier, “Completely derandomized self-
adaptation in evolution strategies,” Evolutionary Computation, vol. 9,
pp. 159–195, Jan 2001.
[19] D. V. Arnold and N. Hansen, “A (1+1)-CMA-ES for constrained
optimisation,” in GECCO, 2012, pp. 297–304.
[20] V. Modugno, G. Nava, D. Pucci, F. Nori, G. Oriolo, and S. Ivaldi,
“Safe trajectory optimization for whole-body motion of humanoids,” in
2017 IEEE-RAS 17th International Conference on Humanoid Robotics
(Humanoids), November 2017, pp. 763–770.
[21] C. Igel, T. Suttorp, and N. Hansen, “A computational efficient co-
variance matrix update and a (1+1)-CMA for evolution strategies,” in
GECCO, 2006.
[22] G. Metta, G. Sandini, D. Vernon, L. Natale, and F. Nori, “The
iCub humanoid robot: An open platform for research in embodied
cognition,” in Proceedings of the 8th Workshop on Performance
Metrics for Intelligent Systems, ser. PerMIS ’08. New York, NY,
USA: ACM, 2008, pp. 50–56.
[23] N. Koenig and A. Howard, “Design and use paradigms for gazebo,
an open-source multi-robot simulator,” in Intelligent Robots and Sys-
tems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International
Conference on, vol. 3. IEEE, 2004, pp. 2149–2154.
[24] F. Romano, S. Traversaro, D. Pucci, J. Eljaik, A. Del Prete, and
F. Nori, “A whole-body software abstraction layer for control design
of free-floating mechanical systems,” in IEEE Int. Conf. on Robotic
Computing (IRC), 2017, pp. 148–155.