Learning robust task priorities of QP-based whole-body
torque-controllers
Marie Charbonneau1,2, Valerio Modugno2,3, Francesco Nori4, Giuseppe Oriolo3,
Daniele Pucci1 and Serena Ivaldi2
Abstract— Generating complex whole-body movements for
humanoid robots is now most often achieved with multi-task
whole-body controllers based on quadratic programming. To
perform on the real robot, such controllers often require a
human expert to tune or optimize the many parameters of the
controller related to the tasks and to the specific robot, which is
generally reported as a tedious and time-consuming procedure.
This problem can be tackled by automatically optimizing some
parameters such as task priorities or task trajectories, while
ensuring constraints satisfaction, through simulation. However,
this does not guarantee that parameters optimized in simulation
will also be optimal for the real robot. As a solution, the present
paper focuses on optimizing task priorities in a robust way, by
looking for solutions which achieve desired tasks under a variety
of conditions and perturbations. This approach, which can be
referred to as domain randomization, can greatly facilitate the
transfer of optimized solutions from simulation to a real robot.
The proposed method is demonstrated using a simulation of
the humanoid robot iCub for a whole-body stepping task.
I. INTRODUCTION
Applications involving highly redundant robots, such as
humanoids, have the potential to disrupt many industries and
bring countless benefits to society. Nevertheless, the design
of efficient and safe controllers for humanoid platforms
is highly challenging, especially for applications involving
interaction with the environment. Indeed, a great number of issues can stem from the complexity of the control algorithms, from task inconsistencies, or from the need to guarantee constraint satisfaction in order to generate physically consistent behaviors. A highly effective solution to the whole-body
control problem is to decompose a complex behavior into
several elementary tasks, according to a prioritization scheme
[1], typically either a “strict” or a “soft” task hierarchy.
With strict prioritization strategies (such as a traditional
stack-of-tasks), a fixed task hierarchy is enforced by geometric conditions such as a null-space task projector [2], [3],
or by the use of quadratic programming (QP) to compute
optimal control actions [4]. Conversely, prioritization in a
soft task hierarchy (based on a weighted combination of
tasks), can be achieved by assigning each task a weight
defining its relative importance [5], [6].
This work was supported by the EU H2020 program under the Marie Skłodowska-Curie SECURE grant (n.642667), as well as the European projects An.Dy (n.731540) and Comanoid (n.645097).
1 iCub Facility, Istituto Italiano di Tecnologia, Genova, Italy. name.surname@iit.it
2 Loria, Inria Nancy - Grand Est, CNRS & Univ. Lorraine, Villers-lès-Nancy, France. name.surname@inria.fr
3 DIAG, Sapienza Università di Roma, Roma, Italy. surname@diag.uniroma1.it
4 Google DeepMind, London, UK. fnori@google.com
Fig. 1: Different iCub models performing the same whole-body motion with several tasks. In this paper, with the aim of eventually easing the passage from simulated to real-world robots, we optimize the task priorities for robustness, so that they can be transferred from one robot model to a different one without re-tuning.
Usually, task priorities are designed a priori by experts,
then manually tuned to adjust the task ordering, timing,
transitions, etc. It is a tedious operation which, depending on
the complexity of the system and the task, can take a notable
portion of a researcher’s time, and can be particularly hard
in the case of whole-body control of floating-base platforms.
A recent line of research seeks to tackle this issue, aiming
to automatically learn whole-body task priorities [7], [8],
[9]. For instance in [7], task coefficients are learned for
controlling a single arm, allowing the transfer of acquired
knowledge to different tasks. In [8], task generalization is
achieved with an evolutionary algorithm employed to learn
policies for a range of tasks using parametrized motor skills.
Then, in [9], task priorities guaranteeing the non-violation
of system constraints are learned for bimanual motions.
Learning methods often use a random exploration over
numerous experiments, which could be problematic to per-
form on hardware. For this reason, training is preferably per-
formed in simulation. However, inherent differences between
simulated and real robots can render an optimal solution
untransferrable from one to the other. This gap between
reality and simulation needs to be accounted for, in order
to achieve automatic parameter tuning.
This problem has recently been addressed with trial-and-error algorithms [10], [11]. In [10], prior knowledge from simulations was exploited to find acceptable behaviors on the real robot in a few trials. In [11], a trial-and-error learning algorithm encouraged exploration of the task space, allowing adaptation to inaccurate models, also in a few trials.
Learning from simulation can also simplify data gathering,
but there is no guarantee that the acquired knowledge is
effective in the real world. For example, optimal control and
movement primitives are combined in [12] to find solutions
which can be easily deployed on the real robot, but these solutions strongly rely on the accuracy of the simulated model.
Also, since QP solvers often allow for constraint relaxation, strict constraint satisfaction is not always ensured by the frameworks presented above. In [9], the learning procedure guarantees strict constraint fulfillment, but transferring knowledge from simulation to reality did not fully achieve the desired behavior. This implies that constraint satisfaction is beneficial, but not enough to achieve transferability: solutions which are robust rather than optimal are needed, in order to achieve a better generalization.
Looking for robust solutions is central to transfer learning,
which seeks to exploit previously acquired knowledge [13]
to allow a model trained on one task to be re-purposed on a
second related one. This approach can benefit from domain
randomization (DR) [14], which consists in randomizing
some aspects of the simulation to enrich the range of possible
environments experienced by the learner. In [15], robust
policies for pivoting a tool held in the robot’s gripper were
learned in simulation, given random friction and control
delays, such that the learned policies proved to be effective
on the real robot as well.
This work proposes to apply the idea of DR to whole-body
controllers. In this context, we want to ensure that balance is
maintained while performing a task, even if large differences
exist between the learning domain and the testing domain.
To achieve this result, we combine a DR approach with
fitness functions promoting the robustness of the learned con-
troller, and provide a method to learn robust task priorities
which achieve the desired goals, while facilitating the transfer of results from simulation to reality. The effective-
ness of the proposed method is demonstrated by optimizing
parameters in simulation for the goal of making the iCub
perform steps, and showing that it is possible to overcome
issues related to the transferability problem.
The paper is organized as follows. Section II introduces
the modeling of a floating-base robot for whole-body control.
Section III describes the task-based whole-body controller
used for validating the approach, as well as the constrained
optimization algorithm used for learning task priorities. Experiments conducted with different models of the iCub robot are then illustrated in Section IV. Results are discussed in Section V, leading to the conclusion of the paper.
II. BACKGROUND
A. Notation used throughout the paper
• The set of real numbers is denoted by $\mathbb{R}$
• The transpose operator is denoted by $(\cdot)^\top$
• The Moore-Penrose pseudoinverse is denoted by $(\cdot)^\dagger$
• $|v|$ denotes the Euclidean norm of a vector $v \in \mathbb{R}^n$
• Given a time function $f(t) \in \mathbb{R}^n$, its first- and second-order time derivatives are denoted by $\dot{f}(t)$ and $\ddot{f}(t)$
B. System Modelling
The robot is modelled as described in [3], where it is
assumed to be a free-floating system composed of n+1 links
connected with n one-degree-of-freedom (DOF) joints.

Fig. 2: Overview of the proposed method. Given task priorities $w$, the QP-based controller (Sec. III-A) computes a control input $u$ under a set of conditions $j$ (e.g. desired trajectories, disturbances, noise). An outer learning loop (Sec. III-B) allows the optimization of the task weights $w$ of Eq. (7).

The
robot configuration space is characterized by $q$, composed of the position and orientation of a base frame $B$ attached to a link of the robot, and the joint angles $s$. The robot velocity is expressed as $\nu = (v_B, \dot{s})$, where $v_B$ is the velocity of the base frame expressed with respect to the inertial frame. Applying the Euler-Poincaré formalism [16, Chapter 13.5] yields the following equation of motion of the system:

$$M(q)\,\dot{\nu} + h(q,\nu) = \zeta\tau + \sum_{k=1}^{n_c} J_{c_k}^\top f_k \qquad (1)$$
where $M \in \mathbb{R}^{(n+6)\times(n+6)}$ is the mass matrix, $h \in \mathbb{R}^{n+6}$ is the bias vector of Coriolis and gravity terms, $\tau \in \mathbb{R}^{n}$ is the vector of actuation joint torques, and $\zeta = (0_{n\times 6},\, 1_n)^\top$ is a selector matrix, in which $1_n \in \mathbb{R}^{n\times n}$ denotes the identity matrix and $0_{n\times 6} \in \mathbb{R}^{n\times 6}$ is a zero matrix. Assuming that the robot interacts with the environment through $n_c$ distinct wrenches (with a slight abuse of notation, since here a wrench is not defined as the dual of a twist), $f_k \in \mathbb{R}^6$ denotes the $k$-th external wrench applied by the environment on frame $c_k$ of the robot.
III. METHODS
The method proposed for learning robust task priorities
(outlined in Fig. 2) relies on two main parts: (i) a QP-
based whole-body torque-controller which tracks desired task
trajectories and sends joint torque commands to the robot,
and (ii) a black-box constrained stochastic optimization procedure, which poses no restriction on the structure of the learning problem. It is used to optimize the task priorities as follows: at the end of a roll-out (i.e. the execution of a footstep), the fitness of the obtained robot behavior is evaluated, allowing the optimization algorithm to update the task weights.
A. QP-based whole-body torque controller
The control framework used in this paper was defined
from a stack of tasks solved by quadratic programming. The
following subsections define the control input and objective,
before describing an optimization problem allowing the
computation of a control input which achieves the objective.
1) Control input: The control input $u = (\tau^\top,\, F_c^\top)^\top$ is composed of the joint torques $\tau$ and the contact forces $F_c$, where $F_c \in \mathbb{R}^{6 n_c}$ is a stacked vector of contact wrenches. Defining $J_c$ as the contact Jacobian, stacked in the same way as $F_c$, and $B = \begin{pmatrix} \zeta & J_c^\top \end{pmatrix}$, the system dynamics (1) can then be reformulated as

$$M(q)\,\dot{\nu} + h = B u \qquad (2)$$
2) Control objective: The developed controller has the following objectives, from which the stack of tasks is defined:
– Stabilize the center of mass position $X_{CoM} \in \mathbb{R}^3$
– Stabilize the stance foot pose $X_{stance} \in \mathbb{R}^6$
– Stabilize the swing foot pose $X_{swing} \in \mathbb{R}^6$
– Stabilize the neck orientation $X_{neck} \in \mathbb{R}^3$
– Track joint positions (postural task) $s$
– Minimize joint torques $\tau$
The velocity $\dot{X}_T$ of a Cartesian task $T$ can be computed from the associated Jacobian $J_T$ and the robot velocity $\nu$ as $\dot{X}_T = J_T \nu$, where $T \in \{CoM, stance, swing, neck\}$. Differentiating this expression yields the task acceleration:

$$\ddot{X}_T = \dot{J}_T \nu + J_T \dot{\nu} \qquad (3)$$

In view of Eq. (2) and (3), the task accelerations $\ddot{X}_T$ can be formulated as a function of the control input $u$:

$$\ddot{X}_T(u) = \dot{J}_T \nu + J_T M^{-1}(B u - h) \qquad (4)$$
Stabilization of a Cartesian task may then be attempted by minimizing $\tilde{\ddot{X}}_T(u) = \ddot{X}_T(u) - \ddot{X}^*_T$, the error on the task acceleration, where $\ddot{X}^*_T$ is a feedback term composed of reference linear accelerations obtained with a proportional-derivative (PD) feedback control policy, and reference angular accelerations computed as in [17, section 5.11.6, p. 173].
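As an illustration of the PD feedback term mentioned above, the short sketch below computes a reference linear acceleration; the function name and gain values are illustrative assumptions, and the angular part (handled as in [17]) is omitted.

```python
import numpy as np

def reference_linear_acceleration(x, x_des, xd, xd_des, xdd_des, kp=50.0, kd=14.0):
    """PD feedback policy: xdd* = xdd_des + kd*(xd_des - xd) + kp*(x_des - x)."""
    return (np.asarray(xdd_des)
            + kd * (np.asarray(xd_des) - np.asarray(xd))
            + kp * (np.asarray(x_des) - np.asarray(x)))
```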
As for the postural task, similarly as above and in view of Eq. (2), one can obtain $\ddot{s}(u)$, a formulation of the joint accelerations $\ddot{s}$ as a function of the control input $u$:

$$\ddot{s}(u) = \zeta^\top M^{-1}(B u - h) \qquad (5)$$

Stabilization of the postural task may then be attempted by minimizing $\tilde{\ddot{s}}(u) = \ddot{s}(u) - \ddot{s}_{ref}$, the error on the joint accelerations, with $\ddot{s}_{ref}$ obtained through inverse kinematics by solving the following optimization problem:

$$\ddot{s}_{ref} = \arg\min_{\ddot{s}} \; \frac{1}{2}\left[\sum_T w_T \left|\tilde{\dot{\nu}}_T\right|^2 + w_s \left|\tilde{\dot{\nu}}_s\right|^2\right] \qquad (6)$$

In the above, the error on the robot acceleration is computed as $\tilde{\dot{\nu}}_T = \dot{\nu} - J_T^\dagger\big(\ddot{X}^*_T - \dot{J}_T \nu\big)$ for the Cartesian tasks, and as $\tilde{\dot{\nu}}_s = \dot{\nu} - \big(\dot{v}_B^{*\top}\;\; \ddot{s}^{*\top}\big)^\top$ for the postural task. Here $\ddot{s}^*$ is a feedback term computed using a PD control policy, and $\dot{v}^*_B$ is a feedback term on the base velocity used to ensure convergence of the base acceleration to zero.
3) QP formulation: In a context where tasks may have relative priorities, soft task priorities allow achieving a trade-off between weighted tasks. A set of task weights is defined in order to attribute a priority to each task:

$$w = \{w_{CoM},\, w_{stance},\, w_{swing},\, w_{neck},\, w_s,\, w_\tau\} \qquad (7)$$

The terms $w_{CoM}, w_{stance}, w_{swing}, w_{neck} \in \mathbb{R}$ refer to the weights associated with the CoM, stance foot, swing foot and neck Cartesian tasks, and $w_s, w_\tau \in \mathbb{R}$ to the weights associated with the postural task and the joint torque regularization.

The control architecture described above can thus be formulated as the following optimization problem:

$$u^* = \arg\min_{u} \; \frac{1}{2}\,\mathrm{cost} \qquad (8a)$$
$$\text{subject to} \quad C u \leq b \qquad (8b)$$

where the constraint (8b) ensures that the contact forces remain within the associated friction cones. The cost function (8a) is computed as the weighted sum of all task objectives, $\mathrm{cost} = \sum_T w_T \big|\tilde{\ddot{X}}_T(u)\big|^2 + w_s \big|\tilde{\ddot{s}}(u)\big|^2 + w_\tau \big|\tau(u)\big|^2$, in which $w_T \in \mathbb{R}$ refers to the weights associated with the Cartesian tasks. Reorganizing the terms in the cost function, one can easily verify that it has the form of a QP problem.
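To make this reduction concrete, the sketch below (an illustration with random placeholder data, not the authors' implementation) assembles the Hessian and gradient of the cost in (8a), using the fact that each task error is affine in $u$, i.e. $\tilde{\ddot{X}}_T(u) = A_T u - b_T$ with $A_T = J_T M^{-1} B$. The friction-cone constraint (8b) is only noted in a comment, since enforcing it requires a proper QP solver.

```python
import numpy as np

# Assumed placeholder dimensions and data for illustration only.
n, nc = 23, 2
nu = n + 6 * nc                                   # size of u = (tau, Fc)
rng = np.random.default_rng(0)

M_inv_B = rng.standard_normal((n + 6, nu))        # stands for M^{-1} B
tasks = {                                         # (task dimension, weight) per task
    "CoM": (3, 1.0), "stance": (6, 1.0), "swing": (6, 1.0),
    "neck": (3, 0.1), "postural": (n, 1e-3),
}

H = np.zeros((nu, nu))
g = np.zeros(nu)
for name, (m, w) in tasks.items():
    J_T = rng.standard_normal((m, n + 6))         # task Jacobian (placeholder)
    b_T = rng.standard_normal(m)                  # placeholder for Xdd*_T + J_T M^{-1} h - Jdot_T nu
    A_T = J_T @ M_inv_B                           # task error is A_T u - b_T
    H += w * A_T.T @ A_T                          # quadratic part of w_T |A_T u - b_T|^2
    g += -w * A_T.T @ b_T                         # linear part

# Joint-torque regularization: tau(u) selects the first n entries of u.
S_tau = np.hstack([np.eye(n), np.zeros((n, 6 * nc))])
H += 1e-4 * S_tau.T @ S_tau

# Unconstrained minimizer, shown only for illustration; the friction-cone
# constraint C u <= b of (8b) requires an actual QP solver instead.
u_unconstrained = np.linalg.lstsq(H, -g, rcond=None)[0]
```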
B. Constrained stochastic optimization
In this work, the learning problem is cast as a black-box constrained stochastic optimization. Given a fitness function $\phi(w) : \mathbb{R}^{n_P} \rightarrow \mathbb{R}$ (with $n_P$ the number of task weights to optimize), a set of $n_{IC}$ inequality constraints $g$ and a set of $n_{EC}$ equality constraints $h$, the idea is to find an optimal solution $w^\circ \in W \subseteq \mathbb{R}^{n_P}$ to the problem:

$$w^\circ = \arg\max_{w} \; \phi(w) \qquad (9a)$$
$$\text{subject to} \quad g_i(w) \leq 0, \quad i = 1, \dots, n_{IC} \qquad (9b)$$
$$\qquad\qquad\quad\; h_i(w) = 0, \quad i = 1, \dots, n_{EC} \qquad (9c)$$
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES), first described in [18], is a standard tool for solving black-box optimization problems. In its basic implementation, CMA-ES defines a multivariate Gaussian distribution $\mathcal{N}(\bar{w}, \sigma^2, \Sigma)$ with mean $\bar{w}$, step size $\sigma^2$ and covariance $\Sigma$. However, since CMA-ES was not originally designed for handling constraints, this work employs a constrained extension of it called (1+1)-CMA-ES, which uses covariance constrained adaptation (CCA) [19]. This variant was benchmarked in [9] for bimanual tasks, as well as in [20] for trajectory optimization of a whole-body movement (standing up from a chair).
A single iteration, or generation, of (1+1)-CMA-ES comprises several stages. First, a sample $w_1$ is drawn according to the rule $w_1 = w + \sigma D z$, where $D$ is the Cholesky factor of the covariance matrix $\Sigma = D^\top D$ and $z$ is a standard normally distributed vector $z \sim \mathcal{N}(0, I)$. The fitness function $\phi$ is evaluated for this sample, and the Gaussian distribution is updated based on the result, as described in [21]. This update also exploits the previous generations to change the covariance and step size of the search distribution. Finally, information about the successful iterations is stored in a search path $s \in \mathbb{R}^{n_P}$. A more detailed description of the algorithm can be found in [20].
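The simplified sketch below illustrates only the sampling rule $w_1 = w + \sigma D z$ together with a success-based step-size update; it deliberately omits the covariance adaptation, search path and constraint handling (CCA) of the full (1+1)-CMA-ES of [19], [21], so it is a didactic approximation rather than the algorithm used in the paper.

```python
import numpy as np

def one_plus_one_es(fitness, w0, sigma=0.1, iters=200, seed=0):
    """Simplified (1+1)-ES: sample w1 = w + sigma*D*z, keep it if fitness improves."""
    rng = np.random.default_rng(seed)
    w, f_best = np.asarray(w0, dtype=float), fitness(w0)
    D = np.eye(len(w))                      # Cholesky factor of the covariance (kept fixed here)
    for _ in range(iters):
        z = rng.standard_normal(len(w))     # z ~ N(0, I)
        w1 = w + sigma * D @ z              # candidate sample
        f1 = fitness(w1)
        if f1 >= f_best:                    # accept on improvement (maximization, cf. (9a))
            w, f_best = w1, f1
            sigma *= 1.5                    # expand step size on success
        else:
            sigma *= 0.95                   # shrink step size on failure
    return w, f_best
```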
IV. EXPERIMENTS
This section presents experiments with the iCub robot [22]. They were performed in order to validate empirically that the
method described in section III is capable of generating task
priorities which (i) produce robust whole-body motions, even
when contacts due to physical interaction with the environ-
ment evolve in time, and (ii) can cope with imperfections in
the robot model, disturbances and noise.
1) Robot setup: Experiments were conducted in simula-
tion using the open-source robot simulator Gazebo [23]. They
were performed with the iCub robot, using 23 DOF on legs,
arms and torso. The robot is equipped with 6 force/torque
sensors on the arms, legs and feet, allowing whole-body
dynamics computations. We tested our method using two
distinct models of the iCub with different inertial properties:
the first one with a tethered power supply and the second
one with a battery pack on the back, as shown in Fig. 1.
2) Whole-body motion: The controller described in sec-
tion III-A was developed in Matlab/Simulink using WBTool-
box [24]. It can be used either for a simulated or a real robot.
It is applied to the task of performing steps, i.e. executing the following sequence of movements for each step: move the CoM above the stance foot, move the swing foot up by 0.025 m, move the swing foot back to its initial pose, and move the CoM back to its initial position. The passage from one movement to the next takes place when the Cartesian task errors are smaller than a threshold and the contact forces are consistent with the desired motion, or when a maximum state duration has been reached. Over the course of a step, the desired stance foot pose, neck orientation and posture remain at their initial values.
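The following sketch illustrates this sequencing logic as a small state machine; the state names, thresholds, durations and the forces_ok flag are illustrative assumptions, not the values used in the experiments.

```python
# Sub-movements of one step, executed in order.
STATES = ["com_over_stance", "lift_swing_foot", "place_swing_foot", "com_back"]

def next_state(state_idx, t_in_state, task_errors, forces_ok,
               err_thresh=0.01, max_duration=3.0):
    """Advance to the next sub-movement when Cartesian errors are small and
    contact forces agree with the desired motion, or when the maximum state
    duration has elapsed."""
    done = (max(task_errors) < err_thresh and forces_ok) or t_in_state > max_duration
    if done and state_idx + 1 < len(STATES):
        return state_idx + 1, 0.0           # move on and reset the state timer
    return state_idx, t_in_state
```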
Note that over all experiments, the proportional-derivative
gains used to compute reference accelerations associated to
Cartesian and postural tasks are kept constant.
The next subsections describe the experiments we per-
formed, while Table I presents them in summary form. Train-
ing and testing were performed with different robot models
and stepping tasks, as well as under different randomized
conditions (RC) as described in Table II.
A. Training with the tethered iCub model
In order to optimize the task weights, three different fitness functions were compared. The first one, $\phi_p$, favored performance on the Cartesian tasks and a lower deployed effort. The second one, $\phi_r$, focused on robustness, by favoring solutions with a smaller excursion of the ZMP position $P_{ZMP}$ with respect to the center of the support polygon $O_{SP}$. The third fitness function, $\phi_{pr}$, was a combination of the first two:

$$\phi_p = -\frac{1}{X_{T_{max}}} \sum_{t=0}^{t_{end}} \sum_T \left|\tilde{X}_T\right|^2 - \frac{0.0001}{\tau_{max}} \sum_{t=0}^{t_{end}} \left|\tau\right|^2 \qquad (10a)$$

$$\phi_r = -\frac{1}{P_{ZMP_{max}}} \sum_{t=0}^{t_{end}} \left|P_{ZMP} - O_{SP}\right|^2 \qquad (10b)$$

$$\phi_{pr} = \frac{1}{2}\left(\phi_p + \phi_r\right) \qquad (10c)$$

$X_{T_{max}}$, $\tau_{max}$ and $P_{ZMP_{max}}$ are normalization factors.
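As an illustration, the fitness functions (10a)-(10c) could be computed from logged roll-out data as in the sketch below; the array layout (time along the first axis) and the normalization arguments are assumptions for this example.

```python
import numpy as np

def fitness_p(task_err, tau, x_max, tau_max):
    """Eq. (10a): penalize accumulated task errors and joint torques."""
    return (-np.sum(np.asarray(task_err) ** 2) / x_max
            - 1e-4 * np.sum(np.asarray(tau) ** 2) / tau_max)

def fitness_r(zmp, o_sp, p_zmp_max):
    """Eq. (10b): penalize ZMP excursion from the support-polygon center."""
    return -np.sum(np.linalg.norm(np.asarray(zmp) - np.asarray(o_sp), axis=1) ** 2) / p_zmp_max

def fitness_pr(task_err, tau, zmp, o_sp, x_max, tau_max, p_zmp_max):
    """Eq. (10c): average of the performance and robustness fitnesses."""
    return 0.5 * (fitness_p(task_err, tau, x_max, tau_max)
                  + fitness_r(zmp, o_sp, p_zmp_max))
```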
Inequality constraints to the learning problem were defined
on joint position and torque limits: they act as an extra safety layer on top of the QP controller.
The task used for the optimization is to perform 1 step, as described in Section IV-2. The simulation was limited to 10 seconds, allowing the robot to perform one step and shift its weight to the side in order to start a second one, making sure that the robot remained stable after foot touchdown. Early termination of a simulation took place in cases where the robot fell or the QP in Eq. (8) could not be successfully solved (which may happen when the weights being tested are far from optimal and lead to a configuration in which the QP solver simply cannot find a solution). In these cases, a penalty of −1.5 was added to the fitness in Eq. (10).
Optimized task priorities were obtained by performing
200 iterations of (1+1)-CMA-ES [19] applied to the control
framework, with an exploration rate of 0.1; each learning ex-
periment was repeated 10 times. Since the task priorities used in Eq. (6) and (8) are relative to each other, $w_{CoM}$ was attributed a fixed value of 1. The remaining task priorities were attributed bounds as shown at the bottom of Table I. Furthermore, the hand-tuned weight values $w_0 = [1, 1, 1, 0.1, 10^{-3}, 10^{-4}]$ were verified to allow the tethered iCub model to successfully perform the desired stepping motion, and were therefore used as a starting point.
Moreover, to achieve robustness through domain randomization and enable the controller to eventually cope with real-world data, the robot was subjected to the randomized conditions 1 to 5 (see Table II) at each learning iteration. These conditions constitute the set of conditions $j$ under which the controller had to perform. In particular, tests performed in simulation showed that applying a force of 10 N on the chest for no more than 1 second was sufficient to destabilize the robot when using non-optimal weights. Thus, using such disturbances during learning can encourage the generation of robust task priorities.
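For illustration, sampling the randomized conditions 1 to 5 of Table II for one roll-out could look like the sketch below; the choice of uniform distributions is an assumption, since the paper only specifies the ranges.

```python
import numpy as np

def sample_conditions(rng):
    """Draw one set of randomized conditions j (RC 1-5 of Table II)."""
    n_forces = int(rng.integers(0, 8))                  # up to 7 external wrenches
    return {
        "ft_noise": bool(rng.integers(0, 2)),           # RC 1: F/T sensor noise on/off
        "swing_foot": rng.choice(["left", "right"]),    # RC 2: appointed swing foot
        "step_direction": rng.choice(["front", "back"]),# RC 3: swing foot direction
        "com_offset": rng.uniform(0.0, 0.02),           # RC 4: CoM reference shift (m)
        "wrenches": [{                                  # RC 5: pushes on the chest
            "t_start": round(rng.uniform(0.0, 10.0), 2),
            "duration": round(rng.uniform(0.0, 1.0), 2),
            "direction": rng.uniform(0.0, 2 * np.pi, size=3),
            "force": rng.uniform(0.0, 10.0),
            "torque": rng.uniform(0.0, 10.0),
        } for _ in range(n_forces)],
    }

conditions = sample_conditions(np.random.default_rng(42))
```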
An additional set of learning experiments was performed
for φpr without using DR, to assess the contribution of DR.
Results showed that 200 iterations were sufficient to
achieve strong convergence. Weights obtained with each of
the fitness functions in Eq. (10) are shown in Table I.
B. Testing 1: with the tethered iCub model
In order to validate the robustness achieved with the
optimized weights, each set of them was tested on the same iCub model used for training, but without being subjected to randomized conditions. The testing task in this case was to
achieve 6 consecutive steps. The success rates achieved with
the optimized weights from each fitness function are shown
in Table I, and typical trajectories achieved using each fitness
function are shown in the left part of Fig. 3 and in Fig. 4.
C. Testing 2: with the backpacked iCub model
In order to replicate conditions similar to performing
experiments on the real robot, we prepared a second model
of the iCub, with a battery pack installed on its back. This
TABLE I: Summary of performed experiments and achieved results

Scenario | Training      | Testing 1 | Testing 2
Robot    | tethered      | tethered  | backpacked
Task     | 1 step        | 6 steps   | 6 steps
RC       | 1, 2, 3, 4, 5 | —         | 5, 6, 7

Fitness     | #  | DR  | wCoM | wstance | wswing  | wneck    | ws       | wτ       | success (Testing 1) | success (Testing 2)
hand tuning | 1  | No  | 1    | 1       | 1       | 1        | 1e−3     | 1e−4     | 1/1   | 0/1
φp          | 10 | Yes | 1    | 0.2±0.3 | 1.1±1.2 | (1±3)e−3 | 0.5±0.3  | (2±5)e−5 | 5/10  | 0/10
φr          | 10 | Yes | 1    | 1.6±1.2 | 1.8±1.0 | 0.1±0.1  | (4±7)e−3 | 1e−10    | 7/10  | 5/10
φpr         | 10 | Yes | 1    | 0.9±1.3 | 2.4±1.1 | 0.6±1.2  | 1e−6     | 1e−10    | 10/10 | 10/10
φpr         | 10 | No  | 1    | 1.0±0.3 | 0.1±0.3 | 0.2±0.1  | 1e−6     | (4±5)e−6 | 8/10  | 2/10
lower bounds|    |     | 1    | 1e−4    | 1e−4    | 1e−10    | 1e−6     | 1e−10    |       |
upper bounds|    |     | 1    | 10      | 10      | 10       | 10       | 0.1      |       |

Randomized conditions RC are defined in Table II. The column # gives the number of training experiments performed for each fitness function, while only one manually tuned set of weights was used. In the column DR, “Yes” means that the RC defined at the top of the table were used for DR during training; “No” means that they were disabled. The two “success” columns report the success rates achieved when testing the sets of trained weights in Testing 1 and Testing 2, respectively.
(a) tethered model; (b) backpacked model
Fig. 3: Typical CoM and feet trajectories obtained for 6 strides performed with the tethered and backpacked iCub models, given initial weights $w_0$ or weights optimized with $\phi_p$, $\phi_r$, $\phi_{pr}$ (using DR, except for “no DR”). Each color denotes the use of a different set of optimized weights. The x, y and z axes correspond respectively to the sagittal, frontal and vertical axes.
TABLE II: Randomized set of conditions j

Randomized Condition (RC) in j                    | Random value
1. Gaussian noise on F/T sensor signals           | On / Off
2. Appointed swing foot                           | Left / Right
3. Direction in which the swing foot is moved     | Front / Back
4. X_CoM_d moved forward by δ (m)                 | δ ∈ R+, δ ≤ 0.02
5. n_F external wrenches applied on the chest     | n_F ∈ Z, n_F ≤ 7
   and for each wrench i:
     time of application t_Fi (s, 1e−2 precision) | t_Fi ∈ R+, t_Fi ≤ 10
     duration d_Fi (s, 1e−2 precision)            | d_Fi ∈ R+, d_Fi ≤ 1
     direction (γ_Fi, θ_Fi, φ_Fi) (rad)           | γ_Fi, θ_Fi, φ_Fi ∈ R, each ≤ 2π
     force magnitude F_Fi (N)                     | F_Fi ∈ R, F_Fi ≤ 10
     torque magnitude τ_Fi (Nm)                   | τ_Fi ∈ R, τ_Fi ≤ 10
6. Gaussian noise on joint velocity signals       | On
7. Gaussian noise on F/T sensor signals           | On
model was subjected to the randomized conditions 5 to 7 (see Table II), with the testing task being to achieve 6 consecutive steps. The success rates achieved with the optimized weights from each fitness function are shown in Table I. Successful CoM and feet trajectories obtained with $\phi_{pr}$, with and without DR, are shown in Fig. 3.

Fig. 4: Typical ZMP trajectories obtained for 6 steps performed with the tethered iCub model, from left to right: given initial weights, and weights optimized with $\phi_p$, $\phi_r$, $\phi_{pr}$. Each color (blue, red or yellow) denotes the use of a different set of optimized weights. The x, y and z axes correspond respectively to the sagittal, frontal and vertical axes.

As could be expected, due to the addition of noise, the ZMP trajectories proved to be highly
noisy and are therefore not shown. These results show that
weights obtained with φpr allow for a higher robustness
of the controller. Additionally, the rate of success achieved
with DR is significantly higher than without DR,
demonstrating that DR did have a measurable impact on the
achieved robustness.
V. DISCUSSION AND CONCLUSION
In summary, the proposed method generated task priorities
for successful whole-body control of different robot models,
in 200 learning iterations. It was demonstrated by performing
training on a first model of the iCub robot, then testing on
a different model submitted to diverse working conditions.
Results achieved with φp, a fitness function favoring task
performance, were likely limited by overfitting on the model
used for learning and did not allow much robustness with
respect to disturbances. On the other hand, optimizing for robustness with $\phi_r$ allowed higher chances of success under changing conditions, by encouraging smaller ground reaction forces and the generation of angular momentum through the torso and arms. However, $\phi_r$ might have neglected the fulfillment of the tasks, which are used for keeping balance. Instead, using $\phi_{pr}$, a fitness function combining robustness and performance, led to superior results.
With $\phi_{pr}$, the swing foot placement, crucial for stability at touchdown, was given a high importance, while the neck orientation task was given a lesser one, allowing compliance to external forces and thus facilitating recovery from external perturbations and contact switching. As for the postural task, its low allotted priority allows it to be used as a regularization term (just like the joint torques), instead of competing with the Cartesian tasks. Such a solution is interesting, as it may not have been a priori self-evident to an expert defining task priorities.
Furthermore, the ranges over which sets of optimized weights
were obtained show that the problem has multiple local
minima. Therefore, although task priorities require proper
tuning, the controller is not highly sensitive to a single
precise adjustment of task weights.
In the future, in order to further improve performance when
using a different robot model or moving on to experiments on
the real robot, one may be interested in performing another
round of optimization on the task priorities, for example
exploiting data-efficient learning methods, as done in [11].
Nonetheless, the robustness achieved with the proposed
method has the potential to allow higher success when
passing from simulation to real-world experiments. Further
work will explore different DR conditions, such as randomizing values associated with the dynamics of the system,
control delay, or measurement noise amplitude and delay.
The extension of the proposed method to additional param-
eters such as feedback gains will also be examined.
REFERENCES
[1] S. Ivaldi, J. Babič, M. Mistry, and R. Murphy, “Special issue on
whole-body control of contacts and dynamics for humanoid robots,”
Autonomous Robots, vol. 40, no. 3, pp. 425–428, Mar 2016.
[2] L. Saab, O. E. Ramos, F. Keith, N. Mansard, P. Souères, and J. Y.
Fourquet, “Dynamic whole-body motion generation under rigid con-
tacts and other unilateral constraints,” IEEE Transactions on Robotics,
vol. 29, no. 2, pp. 346–362, April 2013.
[3] G. Nava, F. Romano, F. Nori, and D. Pucci, “Stability analysis and
design of momentum-based controllers for humanoid robots,” in 2016
IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), Oct 2016, pp. 680–687.
[4] S. Dafarra, G. Nava, M. Charbonneau, N. Guedelha, F. Andrade,
S. Traversaro, L. Fiorio, F. Romano, F. Nori, G. Metta, and D. Pucci,
“A Control Architecture with Online Predictive Planning for Position
and Torque Controlled Walking of Humanoid Robots,” ArXiv e-prints,
July 2018.
[5] J. Salini, V. Padois, and P. Bidaud, “Synthesis of complex humanoid
whole-body behavior: A focus on sequencing and tasks transitions,”
in 2011 IEEE International Conference on Robotics and Automation,
May 2011, pp. 1283–1290.
[6] K. Otani, K. Bouyarmane, and S. Ivaldi, “Generating assistive hu-
manoid motions for co-manipulation tasks with a multi-robot quadratic
program controller,” in ICRA, 2018.
[7] N. Dehio, R. F. Reinhart, and J. J. Steil, “Multiple task optimization
with a mixture of controllers for motion generation,” in 2015 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS),
Sept 2015, pp. 6416–6421.
[8] S. Ha and C. Liu, “Evolutionary optimization for parameterized whole-
body dynamic motor skills,” in ICRA, 2016.
[9] V. Modugno, U. Chervet, G. Oriolo, and S. Ivaldi, “Learning soft
task priorities for safe control of humanoid robots with constrained
stochastic optimization,” in HUMANOIDS, 2016.
[10] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, “Robots that can
adapt like animals,” Nature, vol. 521, pp. 503–507, 2015.
[11] J. Spitz, K. Bouyarmane, S. Ivaldi, and J. B. Mouret, “Trial-and-error
learning of repulsors for humanoid qp-based whole-body control,” in
2017 IEEE-RAS 17th International Conference on Humanoid Robotics
(Humanoids), Nov 2017, pp. 468–475.
[12] D. Clever, M. Harant, K. D. Mombaur, M. Naveau, O. Stasse, and
D. Endres, “Cocomopl: A novel approach for humanoid walking
generation combining optimal control, movement primitives and learn-
ing and its transfer to the real robot HRP-2,” IEEE Robotics and
Automation Letters, vol. 2, no. 2, pp. 977–984, 2017.
[13] S. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Trans-
actions on Knowledge and Data Engineering, vol. 22, no. 10, pp.
1345–1359, 2010.
[14] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel,
“Domain randomization for transferring deep neural networks from
simulation to the real world,” in 2017 IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), Sept 2017, pp. 23–30.
[15] R. Antonova, S. Cruciani, C. Smith, and D. Kragic, “Reinforcement
learning for pivoting task,” CoRR, vol. abs/1703.00472, 2017.
[16] J. E. Marsden and T. S. Ratiu, Introduction to Mechanics and Symme-
try: A Basic Exposition of Classical Mechanical Systems. Springer
Publishing Company, Incorporated, 2010.
[17] R. Olfati-Saber, “Nonlinear control of underactuated mechanical
systems with application to robotics and aerospace vehicles,” Ph.D.
dissertation, Massachusetts Institute of Technology, Cambridge,
February 2001. [Online]. Available: http://hdl.handle.net/1721.1/8979
[18] N. Hansen and A. Ostermeier, “Completely derandomized self-
adaptation in evolution strategies.” Evolutionary Computation, vol. 9,
pp. 159–195, Jan 2001.
[19] D. V. Arnold and N. Hansen, “A (1+1)-CMA-ES for constrained
optimisation,” in GECCO, 2012, pp. 297–304.
[20] V. Modugno, G. Nava, D. Pucci, F. Nori, G. Oriolo, and S. Ivaldi,
“Safe trajectory optimization for whole-body motion of humanoids,” in
2017 IEEE-RAS 17th International Conference on Humanoid Robotics
(Humanoids), November 2017, pp. 763–770.
[21] C. Igel, T. Suttorp, and N. Hansen, “A computational efficient co-
variance matrix update and a (1+1)-CMA for evolution strategies,” in
GECCO, 2006.
[22] G. Metta, G. Sandini, D. Vernon, L. Natale, and F. Nori, “The
icub humanoid robot: An open platform for research in embodied
cognition,” in Proceedings of the 8th Workshop on Performance
Metrics for Intelligent Systems, ser. PerMIS ’08. New York, NY,
USA: ACM, 2008, pp. 50–56.
[23] N. Koenig and A. Howard, “Design and use paradigms for gazebo,
an open-source multi-robot simulator,” in Intelligent Robots and Sys-
tems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International
Conference on, vol. 3. IEEE, 2004, pp. 2149–2154.
[24] F. Romano, S. Traversaro, D. Pucci, J. Eljaik, A. Del Prete, and
F. Nori, “A whole-body software abstraction layer for control design
of free-floating mechanical systems,” in IEEE Int. Conf. on Robotic
Computing (IRC), 2017, pp. 148–155.