Perceptive Locomotion through Nonlinear Model
Predictive Control
Ruben Grandia1, Fabian Jenelten1, Shaohui Yang2, Farbod Farshidian1, and Marco Hutter1
Abstract—Dynamic locomotion in rough terrain requires ac-
curate foot placement, collision avoidance, and planning of the
underactuated dynamics of the system. Reliably optimizing for
such motions and interactions in the presence of imperfect
and often incomplete perceptive information is challenging. We
present a complete perception, planning, and control pipeline
that can optimize motions for all degrees of freedom of the
robot in real-time. To mitigate the numerical challenges posed
by the terrain, a sequence of convex inequality constraints is
extracted as local approximations of foothold feasibility and
embedded into an online model predictive controller. Steppability
classification, plane segmentation, and a signed distance field are
precomputed per elevation map to minimize the computational
effort during the optimization. A combination of multiple-
shooting, real-time iteration, and a filter-based line-search are
used to solve the formulated problem reliably and at high
rate. We validate the proposed method in scenarios with gaps,
slopes, and stepping stones in simulation and experimentally on
the ANYmal quadruped platform, resulting in state-of-the-art
dynamic climbing.
Index Terms—Legged Locomotion, Terrain Perception, Opti-
mal Control.
I. INTRODUCTION
INSPIRED by nature, the field of legged robotics aims to
enable the deployment of autonomous systems in rough
and complex environments. Indeed, during the recent DARPA
subterranean challenge, legged robots were widely adopted,
and highly successful [2], [3]. Still, complex terrains that
require precise foot placements, e.g., negative obstacles and
stepping stones as shown in Fig. 1, remain difficult.
A key challenge lies in the fact that both the terrain and
the system dynamics impose constraints on contact location,
force, and timing. When taking a model-based approach,
mature methods exist for perceptive locomotion with a slow,
static gait [4]–[8] and for blind, dynamic locomotion that
assumes flat terrain [9]–[11]. Learning-based controllers have
recently shown the ability to generalize blind locomotion
to challenging terrain with incredible robustness [12]–[14].
Still, tightly integrating perception to achieve coordinated and
precise foot placement remains an active research problem.
This research was supported by the Swiss National Science Foundation
(SNSF) as part of project No.188596 and by the Swiss National Science
Foundation through the National Centre of Competence in Research Robotics
(NCCR Robotics). This project has received funding from the European
Union’s Horizon 2020 research and innovation programme under grant
agreement No 780883.
1R. Grandia, F. Jenelten, F. Farshidian and M. Hutter are with the De-
partment of Mechanical and Process Engineering, ETH Zurich, Switzerland.
{rgrandia,fabianje,farbodf,mahutter}@ethz.ch.
2S. Yang is with the Automatic Control Laboratory, École polytechnique fédérale de Lausanne (EPFL), Switzerland. shaohui.yang@epfl.ch.
Manuscript received March 13, 2022; revised August 17, 2022.
Fig. 1. ANYmal walking on uneven stepping stones. In the shown config-
uration, the top foothold is 60 cm above the lowest foothold. The top right
visualizes the internal terrain representation used by the controller.
In an effort to extend dynamic locomotion to uneven terrain,
several methods have been proposed to augment foothold
selection algorithms with perceptive information [15]–[17].
These approaches build on a strict hierarchy of first selecting
footholds and optimizing torso motion afterward. This decom-
position reduces the computational complexity but relies on
hand-crafted coordination between the two modules. Addition-
ally, separating the legs from the torso optimization makes it
difficult to consider kinematic limits and collision avoidance
between limbs and terrain.
Trajectory optimization where torso and leg motions are
jointly optimized has shown impressive results in simula-
tion [18]–[20] and removes the need for engineered torso-
foot coordination. Complex motions can be automatically
discovered by including the entire terrain in the optimization.
However, computation times are often too long for online
deployment. Additionally, due to the non-convexity, non-
linearity, and discontinuity introduced by optimizing over
arbitrary terrain, these methods can get stuck in poor local
minima. Dedicated work on providing an initial guess is
needed to find feasible motions reliably [21].
This work presents a planning and control framework
that optimizes over all degrees of freedom of the robot,
considers collision avoidance with the terrain, and enables
complex dynamic maneuvers in rough terrain. The method
is centered around nonlinear Model Predictive Control (MPC)
with a multiple-shooting discretization [22], [23]. However,
in contrast to the aforementioned work, where the full terrain
is integrated into the optimization, we get a handle on the
numerical difficulty introduced by the terrain by exposing the
terrain as a series of geometric primitives that approximate
the local terrain. In this case, we use convex polygons as
foot placement constraints, but different shapes can be used
as long as they lead to well-posed constraints in the optimiza-
tion. Additionally, a signed distance field (SDF) is used for
collision avoidance. We empirically demonstrate that such a
strategy is an excellent trade-off between giving freedom to the
optimization to discover complex motions and the reliability
with which we can solve the formulated problem.
A. Contributions
We present a novel approach to locomotion in challenging
terrain where perceptive information needs to be considered
and nontrivial motions are required. The complete perception,
planning, and control pipeline contains the following contri-
butions:
•We perform simultaneous and real-time optimization of
all degrees of freedom of the robot for dynamic motions
across rough terrain. Perceptive information is encoded
through a sequence of geometric primitives that capture
local foothold constraints and a signed distance field used
for collision avoidance.
•The proposed combination of a multiple-shooting tran-
scription, sequential quadratic programming, and a filter-
based line-search enables fast and reliable online solu-
tions to the nonlinear optimal control problem.
•We provide a detailed description of the implemented
MPC problem, its integration with whole-body and re-
active control modules, and extensive experimental vali-
dation of the resulting locomotion controller.
The MPC implementation is publicly available as part of the
OCS2 toolbox¹ [24]. The implemented online segmentation
of the elevation map and the efficient precomputation of a
signed distance field are contributed to existing open-source
repositories²,³.
B. Outline
An overview of the proposed method is given in Fig. 2. The
perception pipeline at the top of the diagram runs at 20 Hz and
is based on an elevation map constructed from pointcloud in-
formation. For each map update, classification, segmentation,
and other precomputation are performed to prepare for the high
number of perceptive queries during motion optimization. At
the core of the framework, we use nonlinear MPC at 100 Hz
to plan a motion for all degrees of freedom and bring together
user input, perceptive information, and the measured state of
the robot. Finally, state estimation, whole-body torque control,
and reactive behaviors are executed at a rate of 400 Hz.
After a review of related work in section II, this paper is
structured similarly to Fig. 2. First, we present the perception
pipeline in section III. Afterward, the formulated optimal
control problem and corresponding numerical optimization
strategy are discussed in sections IV & V. We introduce the
motion execution layer in section VI. The resulting method is
evaluated on the quadrupedal robot ANYmal [25] (see Fig. 1)
in section VII, and concluded with section VIII.
¹https://github.com/leggedrobotics/ocs2
²https://github.com/leggedrobotics/elevation_mapping_cupy
³https://github.com/ANYbotics/grid_map
Fig. 2. Schematic overview of the proposed method together with the update
rate of each component.
II. RELATED WORK
A. Decomposing locomotion
When assuming a quasi-static gait with a predetermined
stepping sequence, the planning problem on rough terrain
can be simplified and decomposed into individual contact
transitions, as demonstrated in the early work on LittleDog
[4], [26]. In a one-step-ahead fashion, one can check the next
foothold for kinematic feasibility, feasibility w.r.t. the terrain,
and the existence of a statically stable transition. This problem
can be efficiently solved by sampling and checking candidate
footholds [27]. Afterward, a collision-free swing leg trajectory
to the desired foothold can be generated with CHOMP [28]
based on an SDF. Fully onboard perception and control with
such an approach were achieved by Fankhauser et al. [7].
Instead of one-step-ahead planning, an RRT graph can be built
to plan further ahead [5]. Sampling over templated foothold
transitions achieves similar results [6], [29].
In this work, we turn our attention to dynamic gaits, where
statically stable transitions between contact configurations are
not available. In model-based approaches to dynamic, percep-
tive locomotion, a distinction can be made between methods
where the foothold locations are determined separately from
the torso and those where the foothold locations and torso
motions are jointly optimized.
Several methods in which footholds are selected before
optimizing the torso motions, initially designed for flat terrain,
have been adapted to traverse rough terrain [30], [31]. These
methods typically employ some form of Raibert heuristic [32]
to select the next foothold and adapt it based on perceptive
information such as a traversability estimate [33]. The work
of Bellicoso et al. [9] was extended by including a batch
search for feasible footholds based on a given terrain map and
foothold scoring [15]. Similarly, in [16], the foot placement
is adapted based on visual information resulting in dynamic
trotting and jumping motions. In [34], the authors proposed to
train a convolutional neural network (CNN) to speed up the
online evaluation of such a foothold adaptation pipeline. This
CNN was combined with the MPC strategy in [11] to achieve
perceptive locomotion in simulation [17]. In [35] and [36], a
Reinforcement Learning (RL) policy has replaced the heuristic
foothold selection.
However, since foothold locations are chosen before op-
timizing the torso motion, their effect on dynamic stability
and kinematic feasibility is not directly considered, requiring
additional heuristics to coordinate feet and torso motions to
satisfy whole-body kinematics and dynamics. Moreover, it
becomes hard to consider collisions of the leg with the terrain
because the foothold is already fixed. In our approach, we
use the same heuristics to find a suitable nominal foothold
in the terrain. However, instead of fixing the foothold to that
particular location, a region is extracted around the heuristic
in which the foothold is allowed to be optimized.
The benefit of jointly optimizing torso and leg motions has
been demonstrated in the field of trajectory optimization. One
of the first demonstrations of simultaneous optimization of foot
placement and a zero-moment point (ZMP) [37] trajectory was
achieved by adding 2D foot locations as decision variables to
an MPC algorithm [38]. More recently, Kinodynamic [39],
Centroidal [40], [41], and full dynamics models [42], [43]
have been used for simultaneous optimization of 3D foot
locations and body motion. Alternatively, a single rigid body
dynamics (SRBD) model or other simplified torso models can
be extended with decision variables for Cartesian foothold
locations [19], [44]. Real-time capable methods have been
proposed with the specification of leg motions on position [10],
velocity [45], or acceleration level [46]. One challenge of this
line of work is the computational complexity arising from the
high dimensional models, already in the case of locomotion on
flat terrain. Our method also uses a high-dimensional model
and falls in this category. A key consideration when extending
the formulations with perceptive information has thus been to
keep computation within real-time constraints.
Finally, several methods exist that additionally optimize gait
timings or even the contact sequence together with the whole-
body motion. This can be achieved through complementarity
constraints [18], [20], [47], mixed-integer programming [48],
[49], or by explicitly integrating contact models into the opti-
mization [46], [50]. Alternatively, the duration of each contact
phase can be included as a decision variable [19], [51] or
found through bilevel optimization [52], [53]. However, such
methods are prone to poor local optima and reliably solving
the optimization problems in real-time remains challenging.
B. Terrain representation
The use of an elevation map has a long-standing history in
the field of legged robotics [54], and it is still an integral
part of many perceptive locomotion controllers today. Ap-
proaches where footholds are selected based on a local search
or sampling-based algorithm can directly operate on such a
structure. However, more work is needed when integrating the
terrain into a gradient-based optimization.
Winkler et al. [19] use an elevation map for both foot
placement and collision avoidance. The splines representing
the foot motion are constrained to start and end on the terrain
with equality constraints. An inequality constraint is used to
avoid the terrain in the middle of the swing phase. Ignoring the
discontinuity and non-convexity from the terrain makes this
approach prone to poor local minima, motivating specialized
initialization schemes [21] for this framework.
In [44], a graduated optimization scheme is used, where
a first optimization is carried out over a smoothened version
of the terrain. The solution of this first optimization is then
used to initialize an optimization over the actual elevation
map. In a similar spirit, Mordatch [18] considers a general
3D environment and uses a soft-min operator to smoothen the
closest point computation. A continuation scheme is used to
gradually increase the difficulty of the problem over consecu-
tive optimizations.
Deits et al. [55] describe a planning approach over rough ter-
rain based on mixed-integer quadratic programming (MIQP).
Similar to [8], convex safe regions are extracted from the
terrain, and footstep assignment to a region is formulated as
a discrete decision. The foothold optimization is simplified
because only convex, safe regions are considered during
planning. Furthermore, the implementation relied on manual
seeding of convex regions by a human operator. We follow the
same philosophy of presenting the terrain as a convex region
to the optimization. However, we remove the mixed-integer
aspect by pre-selecting the convex region. The benefits are
two-fold: First, we do not require a global convex decomposi-
tion of the terrain, which is a hard problem in general [56], and
instead, only produce a local convex region centered around a
nominal foothold. Second, the MIQP approach does not allow
for nonlinear costs and dynamics, which limits the range of
motions that can be expressed. We first explored the proposed
terrain representation as part of our previous work [57], but
relied on offline mapping, manual terrain segmentation, and
did not yet consider terrain collisions. In [58], we applied
this idea to wheeled-legged robots, but again relied on offline
mapping and segmentation. Moreover, as discussed in the next
section, in both [57] and [58], we used a different solver, which
was found to be insufficient for the scenarios in this work.
C. Motion Optimization
For trajectory optimization, large-scale optimization software
packages like SNOPT [59] and IPOPT [60] are popular. They are
the workhorse for offline trajectory optimization in the work of
Winkler [19], Dai [20], Mordatch [18], Posa [47], and Pardo
[42]. These works show a great range of motions in simulation,
but it typically takes minutes to hours to find a solution.
A different line of work uses specialized solvers that exploit
the sparsity that arises from a sequential decision making pro-
cess. Several variants of Differential Dynamic Programming
(DDP) [61] have been proposed in the context of robotic
motion optimization, e.g., iLQR [62], [63], SLQ [39], and
FDDP [64].
With a slightly different view on the problem, the field of
(nonlinear) model predictive control [23], [65] has specialized
in solving successive optimal control problems under real-
time constraints. See [66] for a comparison of state-of-the-
art quadratic programming (QP) solvers that form the core
of second-order optimization approaches to the nonlinear
Fig. 3. Perception pipeline overview. (A) The elevation map is filtered and classified into steppable and non-steppable cells [Section III-A]. All steppable
areas are segmented into planes [Section III-B]. After segmentation, the steppability classification is refined. (B) A signed distance field [Section III-C] and
torso reference layer [Section III-D] are precomputed to reduce the required computation time during optimization. (C) Convex foothold constraints in (21)
are obtained from the plane segmentation. The signed distance field enables collision avoidance in (23), and the torso reference is used to generate height and
orientation references [Section IV-E].
problem. For time-critical applications, the real-time iteration
scheme can be used to trade optimality for lower computa-
tional demands [67]: In a Sequential Quadratic Programming
(SQP) approach to the nonlinear problem, at each control
instance, only a single QP optimization step is performed.
The current work was initially built on top of a solver in
the first category [39]. However, a significant risk in classical
DDP-based approaches is the need to perform a nonlinear
system rollout along the entire horizon. Despite the use of a
feedback policy, these forward rollouts can diverge, especially
in the presence of underactuated dynamics. This same obser-
vation motivated Mastalli et al. to design FDDP to maintain
gaps between shorter rollouts, resulting in a formulation that is
equivalent to direct multiple-shooting formulations with only
equality constraints [22], [64]. Giftthaler et al. [68] studied
several combinations of iLQR and multiple-shooting but did
not yet consider constraints beyond system dynamics nor a
line-search procedure to determine the stepsize. Furthermore,
experiments were limited to simple, flat terrain walking.
We directly follow the multiple-shooting approach with a
real-time iteration scheme and leverage the efficient structure
exploiting QP solver HPIPM [69]. However, as also mentioned
in both [64] and [68], one difficulty lies in deciding a
stepsize for nonlinear problems, where one now has to monitor
both the violation of the system dynamics and minimization
of the cost function. To prevent an arbitrary trade-off through
a merit function, we suggest using a filter-based line-search
instead [70], which allows a step to be accepted if it reduces
either the objective function or the constraint violation. As we
will demonstrate in the result section, these choices contribute
to the robustness of the solver in challenging scenarios.
III. TERRAIN PERCEPTION AND SEGMENTATION
An overview of the perception pipeline and its relation to
the MPC controller is provided in Fig. 3. The pipeline can be
divided into three parts: (A) steppability classification and seg-
mentation, (B) precomputation of the SDF and torso reference,
and (C) integration into the optimal control problem.
The elevation map, represented as a 2.5D grid [71] with a
4 cm resolution, is provided by the GPU-based implementation
introduced in [72]. The subsequent map processing presented
in this work runs on the CPU and is made available as part of
that same open-source library. Both (A) and (B) are computed
once per map and run at 20 Hz, asynchronously to the motion
optimization in (C).
A. Filtering & Classification
The provided elevation map contains empty cells in oc-
cluded areas. As a first step, we perform inpainting by filling
each cell with the minimum value found along the occlusion
border. Afterwards, a median filter is used to reduce noise and
outliers in the map.
Steppability classification is performed by thresholding the local surface inclination and the local roughness estimated through the standard deviation [73]. Both quantities can be computed with a single pass through the considered neighbourhood of size $N$:
$$\mu = \frac{1}{N}\sum_i c_i, \quad S = \frac{1}{N}\sum_i c_i c_i^\top, \quad \Sigma = S - \mu\mu^\top, \qquad (1)$$
where $\mu$ and $S$ are the first and second moments, and $\Sigma \in \mathbb{R}^{3\times3}$ is the positive semi-definite covariance matrix of the cell positions $c_i$. The variance in the normal direction, $\sigma_n^2$, is then the smallest eigenvalue of $\Sigma$, and the surface normal, $n$, is the corresponding eigenvector. For steppability classification we use a neighbourhood of $N = 9$, and set a threshold of 2 cm on the standard deviation in the normal direction and a maximum inclination of 35°, resulting in the following classification:
$$\text{steppability} = \begin{cases} 1 & \text{if } \sigma_n \le 0.02 \text{ and } n_z \ge 0.82,\\ 0 & \text{otherwise}, \end{cases} \qquad (2)$$
where $n_z$ denotes the z-coordinate of the surface normal.
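The classification in (1)-(2) amounts to a small eigenvalue problem per neighbourhood. The following Python sketch illustrates it; the function name, thresholds passed as defaults, and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def classify_steppability(cells, sigma_max=0.02, nz_min=0.82):
    """Classify a neighbourhood of cell positions (N x 3 array) following (1)-(2)."""
    mu = cells.mean(axis=0)                    # first moment
    S = (cells.T @ cells) / cells.shape[0]     # second moment
    Sigma = S - np.outer(mu, mu)               # covariance of the cell positions
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
    sigma_n = np.sqrt(max(eigvals[0], 0.0))    # standard deviation in normal direction
    n = eigvecs[:, 0]                          # eigenvector of the smallest eigenvalue
    if n[2] < 0.0:                             # orient the surface normal upwards
        n = -n
    return (sigma_n <= sigma_max) and (n[2] >= nz_min), n
```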
B. Plane Segmentation
After the initial classification, the plane segmentation starts
by identifying continuous regions with the help of a connected
component labelling [74]. For each connected region of cells,
we compute again the covariance as in (1), where $N$ is now the number of cells in the connected region, and accept the region as a plane based on the following criteria:
$$\text{planarity} = \begin{cases} 1 & \text{if } \sigma_n \le 0.025,\ n_z \ge 0.87, \text{ and } N \ge 4,\\ 0 & \text{otherwise}. \end{cases} \qquad (3)$$
Notice that here we loosen the bound on the standard deviation
to 2.5 cm, tighten the bound on the inclination to 30◦, and add
the constraint that at least 4 cells form a region. If the planarity
condition is met, the surface normal and mean of the points
define the plane.
If a region fails the planarity condition, we trigger RANSAC
[75] on that subset of the data. The same criteria in (3) are
used to find smaller planes within the connected region. After
the algorithm terminates, all cells that have not been included
in any plane have their steppability updated and set to 0.
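The segmentation loop just described can be summarized as in the sketch below. This is a hedged illustration only: the planarity test of (3) and the RANSAC fallback are passed in as assumed callables, and SciPy's connected-component labelling stands in for the labelling of [74].

```python
import numpy as np
from scipy import ndimage

def segment_planes(steppable, cell_positions, planarity_check, ransac_planes):
    """Group steppable cells into planes (illustrative sketch of Section III-B).

    steppable:       2D boolean array from the classification in (2)
    cell_positions:  (H, W, 3) array of 3D cell positions
    planarity_check: callable implementing criteria (3), returns (ok, plane_params)
    ransac_planes:   callable that splits a failed region into smaller planes
    """
    labels, num = ndimage.label(steppable)           # connected-component labelling
    planes = []
    for region_id in range(1, num + 1):
        mask = labels == region_id
        points = cell_positions[mask]                # N x 3 points of this region
        ok, plane = planarity_check(points)
        if ok:
            planes.append((mask, plane))
        else:
            planes.extend(ransac_planes(mask, points))   # RANSAC fallback on this subset
    # cells not assigned to any plane would have their steppability reset to 0 here
    return planes
```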
At this point, we have a set of plane parameters with
connected regions of the map assigned to them. For each of
these regions, we now extract a 2D contour from the elevation
map [76], and project it along the z-axis to the plane to define
the boundary in the frame of the plane. It is important to
consider that regions can have holes, for example, when a free-
standing obstacle is located in the middle of an open floor. The
boundary of each segmented region is therefore represented
by an outer polygon together with a set of polygons that trace
enclosed holes. See Fig. 4 for an illustrative example of such
a segmented region and the local convex approximations it
permits. Finally, if the particular region allows, we shrink the
boundary inwards (and holes outwards) to provide a safety
margin. If the inscribed area is not large enough, the plane
boundary is accepted without margin. In this way we obtain a
margin where there is enough space to do so, but at the same
time we do not reject small stepping stones, which might be
crucial in certain scenarios.
C. Signed Distance Field
Before computing the SDF, we take advantage of the
classification between terrain that will be potentially stepped
on and terrain that will not be stepped on. To all cells that are
non-steppable, we add a vertical margin of 2 cm, and dilate the
elevation by one cell. The latter effectively horizontally inflates
all non-steppable areas by the map resolution. This procedure
Fig. 4. An example of a segmented region represented by a non-convex outer
polygon and two non-overlapping holes (drawn in black). Three different local
convex approximations (drawn in orange) are shown that are found around
query points with the iterative algorithm described in section IV-F.
corrects for the problem that edges tend to be underestimated
in the provided elevation map.
We use a dense 3D voxel grid, where each voxel contains
the value and 3D gradient. The previous motion plan is used
to determine the 3D volume where distance information is
needed. This volume is a bounding box that contains all
collision bodies of the last available plan with a margin of
25 cm. This way, the size and shape of the SDF grid dynam-
ically scales with the motion that is executed. Storing both
value and derivative as proposed in [77] allows for efficient
interpolation during optimization. However, in contrast to [77],
where values and gradients are cached after the first call, we
opt to precompute the full voxel grid to reduce the computation
time during optimization as much as possible.
This is possible by taking advantage of the extra structure
that the 2.5D representation provides. A detailed description
of how the SDF can be efficiently computed from an elevation
map is given in Appendix A.
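For illustration, a query of such a grid could look like the sketch below, which trilinearly interpolates the precomputed values and gradients; the array names and grid layout are assumptions rather than the actual implementation, and the query point is assumed to lie strictly inside the grid.

```python
import numpy as np

def query_sdf(values, gradients, origin, resolution, p):
    """Interpolate a precomputed dense SDF grid at a 3D point p (illustrative sketch).

    values:    (Nx, Ny, Nz) signed distances at the voxel centers
    gradients: (Nx, Ny, Nz, 3) precomputed 3D gradients
    origin:    world position of voxel (0, 0, 0); resolution: voxel edge length
    Returns the trilinearly interpolated distance and gradient.
    """
    q = (np.asarray(p) - origin) / resolution          # continuous voxel coordinates
    i = np.floor(q).astype(int)                        # lower corner index
    f = q - i                                          # fractional part in [0, 1)
    d, g = 0.0, np.zeros(3)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((1 - f[0]) if dx == 0 else f[0]) \
                  * ((1 - f[1]) if dy == 0 else f[1]) \
                  * ((1 - f[2]) if dz == 0 else f[2])  # trilinear weight of this corner
                d += w * values[i[0] + dx, i[1] + dy, i[2] + dz]
                g += w * gradients[i[0] + dx, i[1] + dy, i[2] + dz]
    return d, g
```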
D. Torso reference map
With user input defined as horizontal velocity and an
angular rate along the z-direction, it is the responsibility of
the controller to decide on the height and orientation of the
torso. We would like the torso pose to be positioned in such a
way that suitable footholds are in reach for all of the feet. We
therefore create a layer that is a smooth interpolation of all
steppable regions as described in [44]. The use of this layer to
generate a torso height and orientation reference is presented
in section IV-E.
IV. MOTION PLANNING
In this section, we describe the nonlinear MPC formulation.
In particular, we set out to define all components in the
following nonlinear optimal control problem:
$$\begin{aligned}
\underset{u(\cdot)}{\text{minimize}} \quad & \Phi(x(T)) + \int_0^T L(x(t), u(t), t)\, dt, && (4a)\\
\text{subject to:} \quad & x(0) = \hat{x}, && (4b)\\
& \dot{x} = f_c(x, u, t), && (4c)\\
& g(x, u, t) = 0, && (4d)
\end{aligned}$$
where $x(t)$ and $u(t)$ are the state and the input at time $t$, and $\hat{x}$ is the current measured state. The term $L(\cdot)$ is a time-varying running cost, and $\Phi(\cdot)$ is the cost at the terminal state $x(T)$.
The goal is to find a control signal that minimizes this cost
Fig. 5. Overview of the coordinate frames and constraints used in the
definition of the MPC problem. On the front left foot, a friction cone is
shown, defined in the terrain frame FT. On the right front foot, a swing
reference trajectory is drawn between the liftoff frame FT−and touchdown
frame FT+. Foot placement constraints are defined as a set of half-spaces
in the touchdown frame. Stance legs have collision bodies at the knee, as
illustrated on the right hind leg, while swing legs have collision bodies on
both the foot and the knee, as shown on the left hind leg.
subject to the initial condition, x0, system dynamics, fc(·), and
equality constraints, g(·). Inequality constraints are all handled
through penalty functions and will be defined as part of the
cost function in section IV-F.
A. Robot definition
We define the generalized coordinates and velocities as:
$$q = \left[\theta_B^\top, p_B^\top, q_j^\top\right]^\top, \quad \dot{q} = \left[\omega_B^\top, v_B^\top, \dot{q}_j^\top\right]^\top, \qquad (5)$$
where $\theta_B \in \mathbb{R}^3$ is the orientation of the base frame, $F_B$, in Euler angles, and $p_B \in \mathbb{R}^3$ is the position of the base in the world frame, $F_W$. $\omega_B \in \mathbb{R}^3$ and $v_B \in \mathbb{R}^3$ are the angular rate and linear velocity of the base in the body frame $F_B$. Joint positions and velocities are given by $q_j \in \mathbb{R}^{12}$ and $\dot{q}_j \in \mathbb{R}^{12}$. The collection of all contact forces is denoted by $\lambda \in \mathbb{R}^{12}$. When referring to these quantities per leg, we will use a subscript $i$, e.g., $q_i \in \mathbb{R}^3$ or $\lambda_i \in \mathbb{R}^3$. All subscripts for legs in contact are contained in the set $\mathcal{C}$. A graphical illustration
of the robot together with the defined coordinate frames is
provided in Fig. 5.
B. Torso dynamics
To derive the torso dynamics used in this work, consider
the full rigid body dynamics of the robot,
$$M(q)\ddot{q} + n(q, \dot{q}) = S^\top\tau + \tau_{dist} + \sum_{i\in\mathcal{C}} J_i^\top(q)\,\lambda_i, \qquad (6)$$
with inertia matrix $M: \mathbb{R}^{18} \to \mathbb{R}^{18\times18}$, generalized accelerations $\ddot{q} \in \mathbb{R}^{18}$, and nonlinear terms $n: \mathbb{R}^{18}\times\mathbb{R}^{18} \to \mathbb{R}^{18}$ on the left-hand side. The right-hand side contains the selection matrix $S = [0_{12\times6}, I_{12\times12}] \in \mathbb{R}^{12\times18}$, actuation torques $\tau \in \mathbb{R}^{12}$, disturbance forces $\tau_{dist} \in \mathbb{R}^{18}$, contact Jacobians $J_i: \mathbb{R}^{18} \to \mathbb{R}^{3\times18}$, and contact forces $\lambda_i \in \mathbb{R}^3$.
For these equations of motion, it is well known that for
an articulated system, the underactuated, top 6 rows are
of main interest for motion planning [51]. These so-called
centroidal dynamics govern the range of motion that can be
achieved [40], [78]. Solving the centroidal dynamics for base
acceleration gives:
$$\begin{bmatrix}\dot{\omega}_B\\ \dot{v}_B\end{bmatrix} = M_B^{-1}\left(\tau^{dist}_B - M_{Bj}\,\ddot{q}_j - n_B + \sum_{i\in\mathcal{C}} J_{B,i}^\top\lambda_i\right), \qquad (7)$$
$$= f_B(q, \dot{q}, \ddot{q}_j, \lambda, \tau^{dist}_B), \qquad (8)$$
where $M_B \in \mathbb{R}^{6\times6}$ is the compound inertia tensor at the top left of $M(q)$, and $M_{Bj} \in \mathbb{R}^{6\times12}$ is the top-right block that encodes inertial coupling between the legs and base. The other terms with subscript $B$ correspond to the top 6 rows of the same terms in (6).
To simplify the torso dynamics, we evaluate this function with zero inertial coupling forces from the joints, i.e. $M_{Bj}\ddot{q}_j = 0$. This simplification allows us to consider the legs only on velocity level and removes joint accelerations from the
formulation. From here, further simplifications would be pos-
sible. Evaluating the function at a nominal joint configuration
and zero joint velocity creates a constant inertia matrix and
gives rise to the commonly used single rigid body assumption.
While this assumption is appropriate on flat terrain, the joints
move far away from their nominal configuration in this work,
creating a significant shift in mass distribution and center of
mass location.
C. Input loopshaping
The bandwidth limitations of the series elastic actuators
used in ANYmal pose an additional constraint on the set of
motions that are feasible on hardware. Instead of trying to
accurately model these actuator dynamics, we use a frequency-
dependent cost function to penalize high-frequency content in
the contact forces and joint velocity signals [79]. For com-
pleteness, we present here the resulting system augmentation
in the time domain:
$$\dot{s}_\lambda = A_\lambda s_\lambda + B_\lambda \nu_\lambda, \quad \dot{s}_j = A_j s_j + B_j \nu_j, \qquad (9)$$
$$\lambda = C_\lambda s_\lambda + D_\lambda \nu_\lambda, \quad \dot{q}_j = C_j s_j + D_j \nu_j,$$
where $s_\lambda$ and $s_j$ are additional states, and $\nu_\lambda$ and $\nu_j$ are auxiliary inputs, associated with contact forces and joint velocities respectively. When the filters ($\nu_\lambda \to \lambda$ and $\nu_j \to \dot{q}_j$) are low-pass filters, penalizing the auxiliary input is equivalent to penalizing high-frequency content in $\lambda$ and $\dot{q}_j$.
An extreme case is obtained when choosing $A_\lambda = D_\lambda = 0$, $B_\lambda = C_\lambda = I$, in which case the auxiliary input becomes the derivative, $\dot{\lambda}$. This reduces to the common system augmentation technique that allows penalization of input rates [23]. In our case, we allow some direct control ($D \neq 0$) and select $A_\lambda = A_j = 0$, $B_\lambda = B_j = I$, $C_\lambda = \frac{100}{4}I$, $C_j = \frac{50}{3}I$, $D_\lambda = \frac{1}{4}I$, $D_j = \frac{1}{3}I$. This corresponds to a progressive increase in cost up to a frequency of 100 rad s⁻¹ for $\lambda$ and up to 50 rad s⁻¹ for $\dot{q}_j$, where high-frequency components have their cost increased by a factor of 4 and 3, respectively.
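To make the effect of these numbers concrete, one can evaluate the implied frequency-dependent weight on $\lambda$. With $A = 0$ and $B = I$, each axis of the filter $\nu_\lambda \to \lambda$ has transfer function $H(s) = C/s + D$, so a unit cost on $\nu_\lambda$ acts on $\lambda$ with weight $1/|H(j\omega)|$. The short sketch below is a numerical check of this interpretation, not part of the controller.

```python
import numpy as np

# Per-axis filter nu -> lambda with A = 0, B = 1: H(s) = C/s + D, see (9).
C, D = 100.0 / 4.0, 1.0 / 4.0

def weight_on_lambda(omega):
    """Effective penalty weight on lambda when nu carries unit cost."""
    H = C / (1j * omega) + D
    return 1.0 / np.abs(H)

for w in (1.0, 10.0, 100.0, 1000.0):
    print(f"omega = {w:7.1f} rad/s  ->  weight = {weight_on_lambda(w):.3f}")
# The weight rises towards 1/D = 4 with a corner frequency near C/D = 100 rad/s,
# matching the stated factor-of-4 penalty on high-frequency force content.
```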
D. System Dynamics
We are now ready to define the state vector $x \in \mathbb{R}^{48}$ and input vector $u \in \mathbb{R}^{24}$ used during motion optimization:
$$x = \left[\theta_B^\top, p_B^\top, \omega_B^\top, v_B^\top, q_j^\top, s_\lambda^\top, s_j^\top\right]^\top, \quad u = \left[\nu_\lambda^\top, \nu_j^\top\right]^\top. \qquad (10)$$
Putting together the robot dynamics from section IV-B and the system augmentation described in IV-C gives the continuous-time MPC model $\dot{x} = f_c(x, u, t)$:
$$\frac{d}{dt}\begin{bmatrix}\theta_B\\ p_B\\ \omega_B\\ v_B\\ q_j\\ s_\lambda\\ s_j\end{bmatrix} = \begin{bmatrix} T(\theta_B)\,\omega_B\\ R_B(\theta_B)\,v_B\\ f_B(q, \dot{q}, 0, C_\lambda s_\lambda + D_\lambda \nu_\lambda, \tau^{dist}_B)\\ C_j s_j + D_j \nu_j\\ A_\lambda s_\lambda + B_\lambda \nu_\lambda\\ A_j s_j + B_j \nu_j \end{bmatrix}, \qquad (11)$$
where $T(\theta_B): \mathbb{R}^3 \to \mathbb{R}^{3\times3}$ provides the conversion between angular body rates and Euler angle derivatives, and $R_B(\theta_B): \mathbb{R}^3 \to \mathbb{R}^{3\times3}$ provides the body-to-world rotation matrix. The disturbance wrench $\tau^{dist}_B$ is considered a parameter and is assumed constant over the MPC horizon.
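A direct transcription of (11) into code has the structure sketched below. The sketch assumes a container `params` that provides the filter matrices, the kinematic mappings $T$ and $R_B$, and the centroidal dynamics $f_B$ from a rigid-body dynamics library; names and the state layout are illustrative only.

```python
import numpy as np

def f_c(x, u, params):
    """Continuous-time MPC model (11) (illustrative sketch)."""
    theta_B, p_B = x[0:3], x[3:6]
    omega_B, v_B = x[6:9], x[9:12]
    q_j, s_lam, s_j = x[12:24], x[24:36], x[36:48]
    nu_lam, nu_j = u[0:12], u[12:24]

    lam = params.C_lam @ s_lam + params.D_lam @ nu_lam     # filtered contact forces
    dq_j = params.C_j @ s_j + params.D_j @ nu_j            # filtered joint velocities

    dtheta_B = params.T(theta_B) @ omega_B                 # Euler angle rates
    dp_B = params.R_B(theta_B) @ v_B                       # base position in the world frame
    domega_v = params.f_B(theta_B, p_B, omega_B, v_B, q_j, dq_j, lam,
                          params.tau_dist)                 # centroidal dynamics (7)-(8)

    ds_lam = params.A_lam @ s_lam + params.B_lam @ nu_lam  # filter states
    ds_j = params.A_j @ s_j + params.B_j @ nu_j
    return np.concatenate([dtheta_B, dp_B, domega_v, dq_j, ds_lam, ds_j])
```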
E. Reference generation
The user commands 2D linear velocities and an angular rate
in the horizontal plane, as well as a desired gait pattern. A
full motion and contact force reference is generated to encode
these user commands and additional motion preferences into
the cost function defined in section IV-F. This process is
carried out before every MPC iteration.
As a first step, assuming a constant input along the horizon,
a 2D base reference position and heading direction are extrap-
olated in the world frame. At each point in time, the 2D pose
is converted to a 2D position for each hip. The smoothened
elevation map, i.e. the torso reference layer shown in Fig 3, is
interpolated at the 2D hip location. The interpolated elevation
in addition to a desired nominal height, hnom, gives a 3D
reference position for each hip. A least-squares fit through the
four hip positions gives the 6DoF base reference.
The extracted base reference and the desired gait pattern
are used to derive nominal foothold locations. Here we use
the common heuristic that the nominal foothold is located
below the hip, in gravity-aligned direction, at the middle of
the contact phase [32]. Additionally, for the first upcoming
foothold, a feedback on the measured velocity is added:
$$p_{i,nom} = p_{i,hip,nom} + \sqrt{\frac{h_{nom}}{g}}\left(v_{B,meas} - v_{B,com}\right), \qquad (12)$$
where $p_{i,nom} \in \mathbb{R}^3$ is the nominal foothold, $p_{i,hip,nom} \in \mathbb{R}^3$ is the nominal foothold location directly below the hip, and $g$ is the gravitational constant. $v_{B,meas}$ and $v_{B,com}$ are the measured and commanded base velocity, respectively.
With the nominal foothold locations known, the plane seg-
mentation defined in section III-B is used to adapt the nominal
foothold locations to the perceived terrain. Each foothold is
projected onto the plane that is closest and within kinematic
limits. Concretely, we pick the reference foothold, $p_{i,proj}$, according to:
$$\underset{p_{i,proj} \in \Pi(p_{i,nom})}{\operatorname{argmin}} \;\|p_{i,nom} - p_{i,proj}\|_2^2 + w_{kin}\, f_{kin}(p_{i,proj}), \qquad (13)$$
where $\Pi(p_{i,nom})$ is a set of candidate points. For each segmented plane, we take the point within that region that is closest to the nominal foothold as a candidate. The term $f_{kin}$ is a kinematic penalty with weight $w_{kin}$ that penalizes the point if the leg extension at liftoff or touchdown is beyond a threshold and if the foothold crosses over to the opposite side of the body. Essentially, this is a simplified version of the foothold batch search algorithm presented in [15], which searches over cells of the map instead of pre-segmented planes.
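The projection in (13) reduces to a small enumeration over the segmented planes, as in the sketch below; the per-region closest-point query and the kinematic penalty $f_{kin}$ are passed in as assumed callables, and the weight is a placeholder.

```python
import numpy as np

def project_foothold(p_nom, planes, closest_point_in_region, kinematic_penalty, w_kin=1.0):
    """Select a reference foothold according to (13) (illustrative sketch).

    planes:                  list of segmented planes from Section III-B
    closest_point_in_region: callable(plane, p_nom) -> closest 3D point in that region
    kinematic_penalty:       callable(p) implementing f_kin
    """
    best_point, best_cost = None, np.inf
    for plane in planes:
        candidate = closest_point_in_region(plane, p_nom)    # one candidate per plane
        cost = np.sum((p_nom - candidate) ** 2) + w_kin * kinematic_penalty(candidate)
        if cost < best_cost:
            best_point, best_cost = candidate, cost
    return best_point
```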
After computing all projected footholds, heuristic swing
trajectories are computed with two quintic splines; from liftoff
to apex and apex to touchdown. The spline is constrained
by a desired liftoff and touchdown velocity, and an apex
location is selected in such a way that the trajectory clears the
highest terrain point between the footholds. Inverse kinematics
is used to derive joint position references corresponding to the
base and feet references. Finally, contact forces references are
computed by dividing the total weight of the robot equally
among all feet that are in contact. Joint velocity references
are set to zero.
F. Cost & Soft Inequality Constraints
The cost function (4a) is built out of several components. The running cost $L(x, u, t)$ can be split into tracking costs $L_\epsilon$, loopshaping costs $L_\nu$, and penalty costs $L_\mathcal{B}$:
$$L = L_\epsilon + L_\nu + L_\mathcal{B}. \qquad (14)$$
The motion tracking costs are used to follow the reference trajectory defined in section IV-E. Tracking errors are defined for the base, $\epsilon_B$, and for each foot, $\epsilon_i$:
$$\epsilon_B = \begin{bmatrix} \log(R_B R_{B,ref}^\top)^\vee \\ p_B - p_{B,ref} \\ \omega_B - \omega_{B,ref} \\ v_B - v_{B,ref} \end{bmatrix}, \quad \epsilon_i = \begin{bmatrix} q_i - q_{i,ref} \\ \dot{q}_i - \dot{q}_{i,ref} \\ p_i - p_{i,ref} \\ v_i - v_{i,ref} \\ \lambda_i - \lambda_{i,ref} \end{bmatrix}, \qquad (15)$$
where $\log(R_B R_{B,ref}^\top)^\vee$ is the logarithmic map of the orientation error, represented as a 3D rotation vector, and $p_i$ and $v_i$ are the foot position and velocity in the world frame. Together with diagonal, positive definite weight matrices $W_B$ and $W_i$, for which the individual elements are listed in Table I, these errors form the following nonlinear least-squares cost:
$$L_\epsilon = \frac{1}{2}\|\epsilon_B\|^2_{W_B} + \sum_{i=1}^{4} \frac{1}{2}\|\epsilon_i\|^2_{W_i}. \qquad (16)$$
As discussed in section IV-C, high-frequency content in joint velocities and contact forces is penalized through a cost on the corresponding auxiliary inputs. This cost is a simple quadratic cost:
$$L_\nu = \frac{1}{2}\nu_\lambda^\top R_\lambda \nu_\lambda + \frac{1}{2}\nu_j^\top R_j \nu_j, \qquad (17)$$
TABLE I
MOTION TRACKING WEIGHTS

Term                                   Weights
$\log(R_B R_{B,ref}^\top)^\vee$        (100.0, 300.0, 300.0)
$p_B - p_{B,ref}$                      (1000.0, 1000.0, 1500.0)
$\omega_B - \omega_{B,ref}$            (10.0, 30.0, 30.0)
$v_B - v_{B,ref}$                      (15.0, 15.0, 30.0)
$q_i - q_{i,ref}$                      (2.0, 2.0, 1.0)
$\dot{q}_i - \dot{q}_{i,ref}$          (0.02, 0.02, 0.01)
$p_i - p_{i,ref}$                      (30.0, 30.0, 30.0)
$v_i - v_{i,ref}$                      (15.0, 15.0, 15.0)
$\lambda_i - \lambda_{i,ref}$          (0.001, 0.001, 0.001)
where $R_\lambda$ and $R_j$ are constant, positive semi-definite weight matrices. To obtain an appropriate scaling and avoid further manual tuning, these matrices are obtained from the quadratic approximation of the motion tracking cost (16) with respect to $\lambda$ and $\dot{q}_j$ respectively, at the nominal stance configuration of the robot.
All inequality constraints are handled through the penalty
cost. In this work, we use relaxed barrier functions [80], [81].
This penalty function is defined as a log-barrier on the interior
of the feasible space and switches to a quadratic function at a distance $\delta$ from the constraint boundary:
$$\mathcal{B}(h) = \begin{cases} -\mu\ln(h), & h \ge \delta,\\ \frac{\mu}{2}\left[\left(\frac{h-2\delta}{\delta}\right)^2 - 1\right] - \mu\ln(\delta), & h < \delta. \end{cases} \qquad (18)$$
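A direct implementation of (18) is short; the sketch below uses placeholder values for $\mu$ and $\delta$, which are not reported here.

```python
import numpy as np

def relaxed_barrier(h, mu=1e-3, delta=1e-3):
    """Relaxed log-barrier penalty (18): log-barrier for h >= delta,
    quadratic extension below delta (mu and delta are placeholder values)."""
    h = np.asarray(h, dtype=float)
    quad = 0.5 * mu * (((h - 2.0 * delta) / delta) ** 2 - 1.0) - mu * np.log(delta)
    # guard the log argument; the quadratic branch is selected where h < delta anyway
    return np.where(h >= delta, -mu * np.log(np.maximum(h, 1e-12)), quad)
```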
The penalty is taken element-wise for vector-valued inequality
constraints. The sum of all penalties is given as follows:
$$L_\mathcal{B} = \sum_{i=1}^{4} \mathcal{B}_j\!\left(h^j_i\right) + \sum_{i \in \mathcal{C}} \left[\mathcal{B}_t\!\left(h^t_i\right) + \mathcal{B}_\lambda\!\left(h^\lambda_i\right)\right] + \sum_{c \in \mathcal{D}} \mathcal{B}_d\!\left(h^d_c\right), \qquad (19)$$
with joint limit constraints $h^j_i$ for all legs, foot placement and friction cone constraints, $h^t_i$ and $h^\lambda_i$, for legs in contact, and collision avoidance constraints $h^d_c$ for all bodies in a set $\mathcal{D}$.
The joint limit constraints contain upper bounds $\{\bar{q}_j, \bar{\dot{q}}_j, \bar{\tau}\}$ and lower bounds $\{\underline{q}_j, \underline{\dot{q}}_j, \underline{\tau}\}$ for positions, velocities, and torques:
$$h^j_i = \begin{bmatrix} \bar{q}_j - q_j \\ q_j - \underline{q}_j \\ \bar{\dot{q}}_j - \dot{q}_j \\ \dot{q}_j - \underline{\dot{q}}_j \\ \bar{\tau} - \tau \\ \tau - \underline{\tau} \end{bmatrix} \ge 0, \qquad (20)$$
where we approximate the joint torques by considering a static equilibrium in each leg, i.e., $\tau_i = J_{j,i}^\top \lambda_i$.
The foot placement constraint is a set of linear inequality
constraints in task space:
$$h^t_i = A^t_i\, p_i + b^t_i \ge 0, \qquad (21)$$
where $A^t_i \in \mathbb{R}^{m \times 3}$ and $b^t_i \in \mathbb{R}^m$ define $m$ half-space constraints in 3D. Each half-space is defined as the plane spanned by an edge of the 2D polygon and the surface normal of the touchdown terrain frame $F_{T+}$. The polygon is obtained by initializing all $m$ vertices at the reference foothold derived in section IV-E and iteratively displacing them outwards. Each vertex is displaced in a round-robin fashion until it reaches the boundary of the segmented region or until further movement would cause the polygon to become non-convex. Similar to [82], we have favoured the low computational complexity of an iterative scheme over an exact approach of obtaining a convex inner approximation. The first set of extracted constraints remains unaltered for a foot that is in the second half of the swing phase to prevent last-minute jumps in constraints.
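The iterative polygon extraction can be sketched as follows; the membership test of the segmented region is passed in as an assumed callable, and the number of vertices, step size, and convexity tolerance are illustrative choices rather than the authors' values.

```python
import numpy as np

def grow_convex_polygon(center, region_contains, m=6, step=0.01, max_iters=200):
    """Grow a convex m-gon around a reference foothold in the 2D plane frame
    (illustrative sketch of the round-robin scheme described above)."""
    angles = 2.0 * np.pi * np.arange(m) / m
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    vertices = np.tile(np.asarray(center, dtype=float), (m, 1))
    active = np.ones(m, dtype=bool)

    def is_convex(v):
        e = np.roll(v, -1, axis=0) - v                           # polygon edges
        e_next = np.roll(e, -1, axis=0)
        cross = e[:, 0] * e_next[:, 1] - e[:, 1] * e_next[:, 0]  # consecutive edge cross products
        return np.all(cross >= -1e-9)

    for _ in range(max_iters):
        if not active.any():
            break
        for i in range(m):                                       # round-robin over vertices
            if not active[i]:
                continue
            candidate = vertices.copy()
            candidate[i] += step * directions[i]
            if region_contains(candidate[i]) and is_convex(candidate):
                vertices = candidate
            else:
                active[i] = False                                # vertex has hit the boundary
    return vertices
```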
The friction cone constraint is implemented as:
$$h^\lambda_i = \mu_c F_z - \sqrt{F_x^2 + F_y^2 + \epsilon^2} \ge 0, \qquad (22)$$
with $[F_x, F_y, F_z]^\top = R_T^\top R_B \lambda_i$ defining the forces in the local terrain frame. $\mu_c$ is the friction coefficient, and $\epsilon > 0$ is a parameter that ensures a continuous derivative at $\lambda_i = 0$, and at the same time creates a safety margin [83].
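As an illustration, the smoothed cone (22) can be evaluated as below; the friction coefficient and $\epsilon$ are placeholder values, not the parameters used on the robot.

```python
import numpy as np

def friction_cone_constraint(lam, R_terrain, R_base, mu_c=0.7, eps=1.0):
    """Smoothed friction cone constraint (22) (illustrative sketch).
    Returns a value >= 0 when the contact force lies inside the relaxed cone."""
    F = R_terrain.T @ R_base @ lam        # force expressed in the local terrain frame
    Fx, Fy, Fz = F
    return mu_c * Fz - np.sqrt(Fx**2 + Fy**2 + eps**2)
```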
The collision avoidance constraint is given by evaluating the SDF at the center of a collision sphere, $p_c$, together with the required distance given by the radius, $r_c$, and a shaping function $d_{min}(t)$:
$$h^d_c = d_{SDF}(p_c) - r_c - d_{min}(t) \ge 0. \qquad (23)$$
The primary use of the shaping function is to relax the
constraint if a foot starts a swing phase from below the map.
To avoid the robot using maximum velocity to escape the
collision, we provide smooth guidance back to free space with
a cubic spline trajectory. This happens when the perceived
terrain is higher than the actual terrain, for example in case
of a soft terrain like vegetation and snow, or simply because
of drift and errors in the estimated map. The collision set D
contains collision bodies for all knees and for all feet that are
in swing phase, as visualized on the hind legs in Fig. 5.
Finally, we use a quadratic cost as the terminal cost in (4a).
To approximate the infinite horizon cost incurred after the
finite horizon length, we solve a Linear Quadratic Regulator
(LQR) problem for the linear approximation of the MPC
model and quadratic approximation of the intermediate costs
around the nominal stance configuration of the robot. The
Riccati matrix $S_{LQR}$ of the cost-to-go is used to define the quadratic cost around the reference state:
$$\Phi(x) = \frac{1}{2}\left(x - x_{ref}(T)\right)^\top S_{LQR} \left(x - x_{ref}(T)\right). \qquad (24)$$
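One hedged way to obtain $S_{LQR}$ is to solve a discrete algebraic Riccati equation for the linearized model and the quadratic cost approximation around the nominal stance configuration, as sketched below. Whether a discrete or continuous LQR formulation is used is not stated above, so this choice is an assumption.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def terminal_cost_matrix(A, B, Q, R):
    """Riccati matrix S_LQR for the terminal cost (24) (illustrative sketch).

    A, B: linearized discrete-time MPC model at the nominal stance configuration
    Q, R: quadratic approximation of the intermediate costs at that configuration
    """
    return solve_discrete_are(A, B, Q, R)   # infinite-horizon discrete Riccati solution

def terminal_cost(x, x_ref, S_lqr):
    """Evaluate Phi(x) = 0.5 * (x - x_ref)^T S_LQR (x - x_ref)."""
    dx = x - x_ref
    return 0.5 * dx @ S_lqr @ dx
```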
G. Equality constraints
For each foot in swing phase, the contact forces are required
to be zero:
$$\lambda_i = 0, \quad \forall i \notin \mathcal{C}. \qquad (25)$$
Additionally, for each foot in contact, the end-effector
velocity is constrained to be zero. For swing phases, the
reference trajectory is enforced only in the normal direction.
This ensures that the foot lifts off and touches down with
a specified velocity while leaving complete freedom of foot
placement in the tangential direction.
$$\begin{aligned} v_i &= 0, && \text{if } i \in \mathcal{C},\\ n^\top(t)\left(v_i - v_{i,ref} + k_p\,(p_i - p_{i,ref})\right) &= 0, && \text{if } i \notin \mathcal{C}, \end{aligned}$$
The surface normal, $n(t)$, is interpolated over time, since the liftoff and touchdown terrain can have a different orientation.
Algorithm 1 Real-time iteration multiple-shooting MPC
1: Given: previous solution $w^i$
2: Discretize the continuous problem to the form of (27)
3: Compute the linear-quadratic approximation (30)
4: Compute the equality constraint projection (34)
5: $\delta\tilde{w} \leftarrow$ Solve the projected QP subproblem (35)
6: $\delta w \leftarrow P\delta\tilde{w} + p$, back substitution using (33)
7: $w^{i+1} \leftarrow$ Line-Search($w^i$, $\delta w$) (Algorithm 2)
V. NUMERICAL OPTIMIZATION
We consider a direct multiple-shooting approach to trans-
forming the continuous optimal control problem into a finite-
dimensional nonlinear program (NLP) [22]. Since MPC com-
putes control inputs over a receding horizon, successive in-
stances of (27) are similar and can be efficiently warm-
started when taking an SQP approach by shifting the previous
solution. For new parts of the shifted horizon, for which no
initial guess exists, we repeat the final state of the previous
solution and initialize the inputs with the references generated
in section IV-E. Additionally, we follow the real-time iteration
scheme where only one SQP step is performed per MPC
update [84]. In this way, the solution is improved across
consecutive instances of the problem, rather than iterating until
convergence for each problem.
As an overview of the approach described in the following
sections, a pseudo-code is provided in Algorithm 1, referring
to the relevant equations used at each step. Except for the
solution of the QP in line 5, all steps of the algorithm are
parallelized across the shooting intervals. The QP is solved
using HPIPM [69].
A. Discretization
The continuous control signal $u(t)$ is parameterized over subintervals of the prediction horizon $[t, t+T]$ to obtain a finite-dimensional decision problem. This creates a grid of nodes $k \in \{0, \dots, N\}$ defining control times $t_k$ separated by intervals of duration $\delta t \approx T/(N-1)$. Around gait transitions, $\delta t$ is slightly shortened or extended such that a node falls exactly at the gait transition.

In this work, we consider a piecewise constant, or zero-order-hold, parameterization of the input. Denoting $x_k = x(t_k)$ and integrating the continuous dynamics in (11) over an interval leads to a discrete-time representation of the dynamics:
$$f^d_k(x_k, u_k) = x_k + \int_{t_k}^{t_k + \delta t} f_c(x(\tau), u_k, \tau)\, d\tau. \qquad (26)$$
The integral in (26) is numerically approximated with an inte-
gration method of choice to achieve the desired approximation
accuracy of the evolution of the continuous time system under
the zero-order-hold commands. We use an explicit second-
order Runge-Kutta scheme.
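A zero-order-hold discretization with an explicit second-order Runge-Kutta step can be written as in the sketch below; the specific RK2 variant (the midpoint rule) is an assumption, since the text does not specify it.

```python
def f_d(x_k, u_k, t_k, dt, f_c):
    """Discrete dynamics (26) under a zero-order-hold input, integrated with an
    explicit second-order Runge-Kutta (midpoint) step (illustrative sketch)."""
    k1 = f_c(x_k, u_k, t_k)                         # slope at the start of the interval
    k2 = f_c(x_k + 0.5 * dt * k1, u_k, t_k + 0.5 * dt)   # slope at the midpoint
    return x_k + dt * k2
```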
The general nonlinear MPC problem presented below can
be formulated by defining and evaluating a cost function and
constraints on the grid of nodes.
$$\begin{aligned}
\min_{X, U} \quad & \Phi(x_N) + \sum_{k=0}^{N-1} l_k(x_k, u_k), && (27a)\\
\text{s.t.} \quad & x_0 - \hat{x} = 0, && (27b)\\
& x_{k+1} - f^d_k(x_k, u_k) = 0, \quad k = 0, \dots, N-1, && (27c)\\
& g_k(x_k, u_k) = 0, \quad k = 0, \dots, N-1, && (27d)
\end{aligned}$$
where $X = [x_0^\top, \dots, x_N^\top]^\top$ and $U = [u_0^\top, \dots, u_{N-1}^\top]^\top$ are the sequences of state and input variables, respectively. The nonlinear cost and constraint functions $l_k$ and $g_k$ are discrete samples of their continuous counterparts. Collecting all decision variables into a vector, $w = [X^\top, U^\top]^\top$, problem (27) can be written as a general NLP:
$$\min_{w} \phi(w), \quad \text{s.t.} \quad \begin{bmatrix} F(w)\\ G(w) \end{bmatrix} = 0, \qquad (28)$$
where $\phi(w)$ is the cost function, $F(w)$ is the collection of initial state and dynamics constraints, and $G(w)$ is the collection of all general equality constraints.
B. Sequential Quadratic Programming (SQP)
SQP-based methods apply Newton-type iterations to the Karush-Kuhn-Tucker (KKT) optimality conditions, assuming some regularity conditions on the constraints [85]. The Lagrangian of the NLP in (28) is defined as:
$$\mathcal{L}(w, \lambda_F, \lambda_G) = \phi(w) + \lambda_F^\top F(w) + \lambda_G^\top G(w), \qquad (29)$$
with Lagrange multipliers $\lambda_F$ and $\lambda_G$ corresponding to the dynamics and equality constraints. The Newton iterations can be equivalently computed by solving the following, potentially non-convex, QP [86]:
$$\begin{aligned}
\min_{\delta w} \quad & \nabla_w \phi(w^i)^\top \delta w + \frac{1}{2}\delta w^\top B^i \delta w, && (30a)\\
\text{s.t.} \quad & F(w^i) + \nabla_w F(w^i)^\top \delta w = 0, && (30b)\\
& G(w^i) + \nabla_w G(w^i)^\top \delta w = 0, && (30c)
\end{aligned}$$
where the decision variables, $\delta w = w - w^i$, define the update step relative to the current iterate $w^i$, and the Hessian is $B^i = \nabla^2_w \mathcal{L}(w^i, \lambda_F, \lambda_G)$. Computing the solution to (30) provides a candidate decision variable update, $\delta w^i$, and updated Lagrange multipliers.
C. Quadratic Approximation Strategy
As we seek to deploy MPC on dynamic robotic platforms, it
is critical that the optimization problem in (30) is well condi-
tioned and does not provide difficulty to numerical solvers. In
particular, when $B^i$ in (30a) is positive semi-definite (p.s.d.), the resulting QP is convex and can be efficiently solved [66]. To ensure this, an approximate, p.s.d. Hessian is used instead of the full Hessian of the Lagrangian. For the tracking costs (16), the objective function has a least-squares form, in which case the Generalized Gauss-Newton approximation,
$$\nabla^2_w\!\left(\tfrac{1}{2}\|\epsilon_i(w)\|^2_{W_i}\right) \approx \nabla_w \epsilon_i(w)^\top W_i\, \nabla_w \epsilon_i(w), \qquad (31)$$
proves effective in practice [87]. Similarly, for the soft constraints, we exploit the convexity of the penalty function applied to the nonlinear constraint [88]:
$$\nabla^2_w\!\left(\mathcal{B}(h(w))\right) \approx \nabla_w h(w)^\top\, \nabla^2_h \mathcal{B}(h(w))\, \nabla_w h(w), \qquad (32)$$
where the diagonal matrix $\nabla^2_h \mathcal{B}(h(w))$ maintains the curvature information of the convex penalty functions. The contribution of the constraints to the Lagrangian in (29) is ignored in the approximate Hessian since we do not have additional structure that allows a convex approximation.
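Both approximations only require first-order derivatives of the residuals and constraints, as the following sketch illustrates for a single term; names are illustrative.

```python
import numpy as np

def tracking_cost_approximation(eps, J_eps, W):
    """Gauss-Newton approximation (31) of a tracking term 0.5 * ||eps(w)||^2_W.

    eps:   residual vector at the current iterate
    J_eps: Jacobian of the residual w.r.t. the decision variables
    W:     diagonal positive-definite weight matrix
    Returns (gradient, p.s.d. Hessian approximation)."""
    grad = J_eps.T @ W @ eps
    hess = J_eps.T @ W @ J_eps
    return grad, hess

def penalty_cost_approximation(J_h, dB_dh, d2B_dh2):
    """Convex approximation (32) of a penalty term B(h(w)) using only first-order
    constraint derivatives and the scalar penalty derivatives."""
    grad = J_h.T @ dB_dh                       # chain rule, first order
    hess = J_h.T @ np.diag(d2B_dh2) @ J_h      # curvature of the convex penalty only
    return grad, hess
```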
D. Constraint Projection
The equality constraints in IV-G were carefully chosen to have full row rank w.r.t. the control inputs, such that, after linearization, $\nabla_w G(w^i)^\top$ has full row rank in (30c). This means that the equality constraints can be eliminated before solving the QP through a change of variables [86]:
$$\delta w = P\delta\tilde{w} + p, \qquad (33)$$
where the linear transformation satisfies
$$\nabla_w G(w^i)^\top P = 0, \quad \nabla_w G(w^i)^\top p = -G(w^i). \qquad (34)$$
After substituting (33) into (30), the following QP is solved w.r.t. $\delta\tilde{w}$:
$$\begin{aligned}
\min_{\delta\tilde{w}} \quad & \nabla_{\tilde{w}} \tilde{\phi}(w^i)^\top \delta\tilde{w} + \frac{1}{2}\delta\tilde{w}^\top \tilde{B}^i \delta\tilde{w}, && (35a)\\
\text{s.t.} \quad & \tilde{F}(w^i) + \nabla_{\tilde{w}} \tilde{F}(w^i)^\top \delta\tilde{w} = 0. && (35b)
\end{aligned}$$
Because each constraint applies only to the variables at one node $k$, the coordinate transformation maintains the sparsity pattern of an optimal control problem and can be computed in parallel. Since this projected problem now only contains costs and system dynamics, solving the QP requires only one Riccati-based iteration [69]. The full update $\delta w$ is then obtained through back substitution into (33).
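For a single node, the projection (33)-(34) can be computed from a QR factorization of the transposed constraint Jacobian, as sketched below. In the controller this is done per shooting node to preserve the problem sparsity; the dense sketch here is illustrative only.

```python
import numpy as np

def constraint_projection(C, g):
    """Change of variables (33)-(34) for a linearized constraint C @ dw + g = 0,
    assuming C (m x n, m < n) has full row rank (illustrative sketch).

    Returns P with C @ P = 0 and p with C @ p = -g."""
    m, _ = C.shape
    Q, R = np.linalg.qr(C.T, mode="complete")   # C^T = Q R
    Q1, Q2 = Q[:, :m], Q[:, m:]
    R1 = R[:m, :]
    p = -Q1 @ np.linalg.solve(R1.T, g)          # particular solution, C p = -g
    P = Q2                                      # orthonormal basis of the null space of C
    return P, p
```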
E. Line-Search
To select an appropriate stepsize, we employ a line-search
based on the filter line-search used in IPOPT [60]. In contrast
to a line-search based on a merit function, where cost and
constraints are combined to one metric, the main idea is
to ensure that each update either improves the constraint
satisfaction or the cost function. The constraint satisfaction
$\theta(w)$ is measured by taking the norm of all constraints scaled by the time discretization:
$$\theta(w) = \delta t\,\left\|\left[F(w)^\top, G(w)^\top\right]^\top\right\|_2. \qquad (36)$$
In case of high or low constraint satisfaction, the behavior is adapted: when the constraint violation is beyond a set threshold, $\theta_{max}$, the focus changes purely to decreasing the constraint violation; when the constraint violation is below a minimum threshold, $\theta_{min}$, the focus changes to minimizing costs.
Compared to the algorithm presented in [60], we remove
recovery strategies and second-order correction steps, for
which there is no time in the online setting. Furthermore, the
Algorithm 2 Backtracking Line-Search
1: Hyperparameters: $\alpha_{min} = 10^{-4}$, $\theta_{max} = 10^{-2}$, $\theta_{min} = 10^{-6}$, $\eta = 10^{-4}$, $\gamma_\phi = 10^{-6}$, $\gamma_\theta = 10^{-6}$, $\gamma_\alpha = 0.5$
2: $\alpha \leftarrow 1.0$
3: $\theta_i \leftarrow \theta(w^i)$
4: $\phi_i \leftarrow \phi(w^i)$
5: Accepted $\leftarrow$ False
6: while Not Accepted and $\alpha \ge \alpha_{min}$ do
7:   $\theta_{i+1} \leftarrow \theta(w^i + \alpha\delta w)$
8:   $\phi_{i+1} \leftarrow \phi(w^i + \alpha\delta w)$
9:   if $\theta_{i+1} > \theta_{max}$ then
10:     if $\theta_{i+1} < (1 - \gamma_\theta)\theta_i$ then
11:       Accepted $\leftarrow$ True
12:     end if
13:   else if $\max(\theta_{i+1}, \theta_i) < \theta_{min}$ and $\nabla\phi(w^i)^\top\delta w < 0$ then
14:     if $\phi_{i+1} < \phi_i + \eta\alpha\nabla\phi(w^i)^\top\delta w$ then
15:       Accepted $\leftarrow$ True
16:     end if
17:   else
18:     if $\phi_{i+1} < \phi_i - \gamma_\phi\theta_i$ or $\theta_{i+1} < (1 - \gamma_\theta)\theta_i$ then
19:       Accepted $\leftarrow$ True
20:     end if
21:   end if
22:   if Not Accepted then
23:     $\alpha \leftarrow \gamma_\alpha\alpha$
24:   end if
25: end while
26: if Accepted then
27:   $w^{i+1} \leftarrow w^i + \alpha\delta w$
28: else
29:   $w^{i+1} \leftarrow w^i$
30: end if
history of iterates plays no role since we perform only one
iteration per problem.
The simplified line-search as used in this work is given in
Algorithm 2 and contains three distinct branches in which a
step can be accepted. The behavior at high constraint violation
is given by line 9, where a step is rejected if the new constraint
violation is above the threshold and worse than the current
violation. The switch to the low constraint behavior is made
in line 13: if both new and old constraint violations are low
and the current step is in a descent direction, we require that
the cost decrease satisfies the Armijo condition in line 14.
Finally, the primary acceptance condition is given in line 18,
where either a cost or constraint decrease is requested. The
small constants $\gamma_\phi$ and $\gamma_\theta$ are used to fine-tune this condition
with a required non-zero decrease in either quantity.
VI. MOTION EXECUTION
The optimized motion planned by the MPC layer consists
of contact forces and desired joint velocities. We linearly
interpolate the MPC motion plan at the 400 Hz execution
rate and apply the feedback gains derived from the Riccati
backward pass to the measured state [83]. The corresponding
torso acceleration is obtained through (8). The numerical
derivative of the planned joint velocities is used to determine a
feedforward joint acceleration. A high-frequency whole-body
controller (WBC) is used to convert the desired acceleration
tasks into torque commands [89]–[91]. A generalized mo-
mentum observer is used to estimate the contact state [92].
Additionally, the estimated external torques are filtered and
added to the MPC and WBC dynamics as described in [44].
We use the same filter setup as shown in Fig. 13 of [44].
A. Event based execution
Inevitably, the measured contact state will be different from
the planned contact state used during the MPC optimization. In
this case, the designed contact forces cannot be provided by the
whole-body controller. We have implemented simple reactive
behaviors to respond to this situation and provide feedback to
the MPC layer.
In case there is a planned contact, but no contact is mea-
sured, we follow a downward regaining motion for that foot.
Under the assumption that the contact mismatch will be short,
the MPC will start a new plan again from a closed contact
state. Additionally, we propagate the augmented system in (9)
with the information that no contact force was generated, i.e. $0 \overset{!}{=} C_\lambda s_\lambda + D_\lambda \nu_\lambda$. In this way, the MPC layer will generate
contact forces that maintain the requested smoothness w.r.t.
the executed contact forces.
When contact is measured, but no contact was planned, the
behavior depends on the planned time till contact. If contact
was planned to happen soon, the measured contact is sent
to the MPC to generate the next plan from that early contact
state. In the meantime, the WBC maintains a minimum contact
force for that foot. If no upcoming contact was planned, the
measured contact is ignored.
B. Whole-body control
The whole-body control (WBC) approach considers the full
nonlinear rigid body dynamics of the system in (6), including
the estimate of disturbance forces. Each task is formulated as
an equality constraint, inequality constraint, or least-squares
objective affine in the generalized accelerations, torques, and
contact forces. While we have used a hierarchical resolution
of tasks in the past [91], in this work, we instead use a single
QP and trade off the tracking tasks with weights. We found
that a strict hierarchy results in a dramatic loss of performance
in lower-priority tasks when inequality constraints are active.
Additionally, the complexity of solving multiple QPs and null-
space projections in the hierarchical approach is no longer
justified with the high quality motion reference coming from
the MPC.
The complete list of tasks is given in Table II. The first two
blocks of tasks enforce physical consistency and inequality
constraints on torques, forces, and joint configurations. The
joint limit constraint is derived from an exponential Control Barrier Function (CBF) [93] on the joint limits, $\underline{q}_j \le q_j \le \bar{q}_j$, resulting in the following joint acceleration constraints:
$$\ddot{q}_j + (\gamma_1 + \gamma_2)\dot{q}_j + \gamma_1\gamma_2\,(q_j - \underline{q}_j) \ge 0, \qquad (37)$$
$$-\ddot{q}_j - (\gamma_1 + \gamma_2)\dot{q}_j + \gamma_1\gamma_2\,(\bar{q}_j - q_j) \ge 0, \qquad (38)$$
with scalar parameters $\gamma_1 > 0$, $\gamma_2 > 0$. These CBF constraints guarantee that the state constraints are satisfied for all time and under the full nonlinear dynamics of the system [94].
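Rearranged, (37)-(38) give per-joint acceleration bounds that can be handed to the WBC QP, as in the sketch below; the $\gamma$ values are placeholders, not the gains used on the robot.

```python
def joint_limit_acceleration_bounds(q, dq, q_lower, q_upper, gamma1=10.0, gamma2=10.0):
    """Exponential CBF joint-limit constraints (37)-(38) as acceleration bounds
    (illustrative sketch). Enforcing ddq_min <= ddq <= ddq_max keeps q in
    [q_lower, q_upper]."""
    ddq_min = -(gamma1 + gamma2) * dq - gamma1 * gamma2 * (q - q_lower)   # from (37)
    ddq_max = -(gamma1 + gamma2) * dq + gamma1 * gamma2 * (q_upper - q)   # from (38)
    return ddq_min, ddq_max
```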
For the least-square tasks, we track swing leg motion with
higher weight than the torso reference. This prevents the robot from exploiting the leg inertia to track torso references in underactuated directions, and it ensures that the foot motion is prioritized over torso tracking when close to kinematic
TABLE II
WHOLE-BODY CONTROL TASKS

Type                 Task
$=$                  Floating base equations of motion.
                     No motion at the contact points.
$\ge$                Torque limits.
                     Friction cone constraint.
                     Joint limit barrier constraint.
$w_i^2\|\cdot\|^2$   Swing leg motion tracking ($w_i = 100.0$).
                     Torso linear and angular acceleration ($w_i = 1.0$).
                     Contact force tracking ($w_i = 0.01$).
limits. Tracking the contact force references with a low weight regulates the force distribution in case the contact configuration allows for internal forces.
Finally, the torque derived from the whole-body controller, $\tau_{wbc} \in \mathbb{R}^{12}$, is computed. To compensate for model uncertainty for swing legs, the integral of the joint acceleration error with gain $K > 0$ is added to the torque applied to the system:
$$\tau_i = \tau_{i,wbc} - K \int_{t^{sw}_0}^{t} \left(\ddot{q}_i - \ddot{q}_{i,wbc}\right) dt, \qquad (39)$$
$$= \tau_{i,wbc} - K \left(\dot{q}_i - \dot{q}_i(t^{sw}_0) - \int_{t^{sw}_0}^{t} \ddot{q}_{i,wbc}\, dt\right), \qquad (40)$$
where $t^{sw}_0$ is the start time of the swing phase. The acceleration integral can be implemented based on the measured velocity $\dot{q}_i$ and the velocity at the start of the swing phase, $\dot{q}_i(t^{sw}_0)$, as shown in (40). Furthermore, the feedback term is saturated to prevent integrator windup. For stance legs, a PD term is added around the planned joint configuration and contact-consistent joint velocity.
VII. RESULTS
ANYmal is equipped with either two dome-shaped RoboSense bpearl LiDARs, mounted at the front and back of the torso, or with four Intel RealSense D435 depth cameras mounted on each side of the robot. Elevation mapping runs at 20 Hz on an onboard GPU (Jetson AGX Xavier). Control and state estimation are executed on the main onboard CPU (Intel i7-8850H, 2.6 GHz, hexa-core) at 400 Hz, asynchronously to
the MPC optimization which is triggered at 100 Hz. Four cores
are used for parallel computation in the MPC optimization.
A time horizon of $T = 1.0$ s is used with a nominal time discretization of $\delta t \approx 0.015$ s, with a slight variation due to the adaptive discretization around gait transitions. Each multiple-shooting MPC problem therefore contains around 5000 decision variables. Parts (A) and (B) of the perception pipeline in Fig. 3 are executed on a second onboard CPU of the same kind, which provides the precomputed layers over Ethernet.
To study the performance of the proposed controller, we
report results in different scenarios and varying levels of
detail. All perception, MPC, and WBC parameters remain
constant throughout the experiments and are the same for
simulation and hardware. An initial guess for these parameters
was found in simulation, and we further fine-tuned them on
hardware. First, results for the perception pipeline in isolation
are presented in section VII-A. Second, we validate the major
design choices in simulation in section VII-B. Afterward, the
Fig. 6. Evaluation of the plane segmentation on a demo terrain [71]. The
shown map has a true size of 20×20×1 m with a resolution of 4 cm. Top
left shows the elevation map with additive uniform noise of ±2 cm plus
Gaussian noise with a standard deviation of 2 cm. Top right shows the map
after inpainting, filtering, steppability classification, and plane segmentation.
Below, four areas of interest are shown. Their original location in the map is
marked in the top right image.
proposed controller is put to the test in challenging simula-
tion, as well as hardware experiments in section VII-C. All
experiments are shown in the supplemental video [1]. Finally,
known limitations are discussed in section VII-D.
A. Perception Pipeline
The output of the steppability classification and plane seg-
mentation (part A in Fig. 3) for a demo terrain is shown in
Fig. 6. This terrain is available as part of the gridmap library
and contains a collection of slopes, steps, curvatures, rough
terrain, and missing data. The left middle image shows that
slopes and steps are, in general, well segmented. In the bottom
right image, one sees the effect of the plane segmentation on
a curved surface. In those cases, the terrain will be segmented
into a collection of smaller planes. Finally, the rough terrain
sections shown in the right middle and bottom left image show
that the method is able to recognize such terrain as one big
planar section as long as the roughness is within the specified
tolerance. These cases also show the importance of allowing
holes in the segmented regions, making it possible to exclude
just those small regions where the local slope or roughness is
outside the tolerance. A global convex decomposition of the
map would result in a much larger number of regions.
The computation time for the construction and querying
of the signed distance field is benchmarked on sub-maps of
varying sizes extracted from the demo map, see Fig. 7. As
expected, the construction time scales linearly with the SDF
size, and the query time is constant with a slight increase when
the memory size exceeds a cache level. During runtime, the
local SDF size is typically below 10^5 voxels, resulting in a
computation time well below 10 ms. Together with the map
update rate of 20 Hz, the proposed method provides the SDF
Fig. 7. Computation time for constructing and querying the signed distance
field. Submaps of the terrain in Fig. 6 are used. SDF size on the horizontal axis
denotes the total amount of data points in the SDF (width×length×height).
The query time is reported for a total of 10^3 random queries for the
interpolated value and derivative.
Fig. 8. ANYmal stepping up a box of 35 cm. Left: Without considering knee
collisions. Right: Knee collision included in the optimization.
an order of magnitude faster than methods that maintain a
general 3D voxel grid, with update rates reported around 1 Hz
[77]. Per MPC iteration, around 10^3 SDF queries are made,
making the SDF query time negligible compared to the total
duration of one MPC iteration.
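As a rough illustration of the kind of interpolated value-and-derivative query benchmarked in Fig. 7, the sketch below evaluates a voxelized SDF with SciPy's RegularGridInterpolator and a finite-difference gradient; the controller's own SDF implementation differs, so the grid layout, placeholder distance field, and derivative scheme here are assumptions for exposition only:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

resolution = 0.04  # 4 cm voxels, matching the map resolution
x = np.arange(0.0, 2.0, resolution)
y = np.arange(0.0, 2.0, resolution)
z = np.arange(0.0, 1.0, resolution)
# Placeholder SDF: signed distance to a horizontal plane at z = 0.5 m
sdf_values = np.broadcast_to(z - 0.5, (len(x), len(y), len(z))).copy()

sdf = RegularGridInterpolator((x, y, z), sdf_values)

def query(p, eps=1e-4):
    """Return the interpolated distance and a finite-difference gradient at point p."""
    d = sdf(p)[0]
    grad = np.array([(sdf(p + e)[0] - sdf(p - e)[0]) / (2 * eps)
                     for e in np.eye(3) * eps])
    return d, grad

print(query(np.array([1.0, 1.0, 0.62])))  # ~0.12 m above the plane, gradient ~ [0, 0, 1]
```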
B. Simulation
1) Collision avoidance: To highlight the importance of
considering knee collisions with the terrain, the robot is
commanded to traverse a box of 35 cm with a trotting gait
at 0.25 m/s. Fig. 8 compares the simulation result of this
scenario with and without the knee collisions considered.
The inclusion of knee collision avoidance is required to
successfully step up the box with the hind legs. As shown
in the figure, the swing trajectories are altered. Furthermore,
the base pose and last stepping location before stepping up
are adjusted to prepare for the future, showing the benefit
of considering all degrees of freedom in one optimization.
Similarly, on the way down, the foothold optimization (within
constraints) allows the feet to be placed away from the step,
avoiding knee collisions while stepping down.
Fig. 9 provides insight into the solver during the motion
performed with the knee collisions included. The four peaks
in the cost function show the effect of the collision avoidance
penalty when the legs are close to the obstacle during the step
up and step down. Most of the time, the step obtained from
the QP subproblem is accepted by the line-search with the
full stepsize of 1.0. However, between 7 and 8 s the stepsize
is decreased to prevent the constraint violation from further
rising. This happens when the front legs step down the box and
are close to collision. In those cases, the collision avoidance
penalty is highly nonlinear, and the line-search is required
to maintain the right balance between cost decrease and
Fig. 9. Solver status during the box traversal motion (including knee collision
avoidance). The first and second plots show the total cost, and constraint
violation according to (36), after each iteration. The bottom plot shows the
stepsize and the line-search branch that led to the step acceptance. 'Constraint'
refers to a step accepted in the high constraint violation branch in line 9 of
Algorithm 2, 'Cost OR Constraint' refers to the branch where either cost
or constraint decrease is accepted in line 18. Note that the low constraint
violation branch, line 13, did not occur in this experiment.
constraint satisfaction. We note that the line-search condition
for low constraint violation is typically not achieved when
using only one iteration per MPC problem.
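For readers unfamiliar with filter line-searches, the acceptance logic labeled in Fig. 9 can be paraphrased by the following simplified sketch; the thresholds and exact branch conditions are our simplification and do not reproduce Algorithm 2 line by line:

```python
def accept_step(cost_new, cost_old, viol_new, viol_old,
                viol_low=1e-6, viol_high=1e-2):
    """Simplified filter line-search acceptance test (branches as labeled in Fig. 9)."""
    if viol_old > viol_high:
        # High constraint violation: require the violation to decrease ('Constraint')
        return viol_new < viol_old
    if viol_new < viol_low and viol_old < viol_low:
        # Low constraint violation: require a cost decrease
        return cost_new < cost_old
    # Otherwise accept if either cost or constraint violation improves ('Cost OR Constraint')
    return cost_new < cost_old or viol_new < viol_old

# A backtracking loop would try alpha in {1.0, 0.5, 0.25, ...} until accept_step is True.
```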
2) Model selection: In the same scenario, we compare the
performance of the proposed dynamics for the base with those
of the commonly used single rigid body dynamics (SRBD). To
be precise, the torso dynamics in (8) are evaluated at a constant
nominal joint configuration and with zero joint velocities,
while the rest of the controller remains identical. When using
the SRBD, the model does not describe the backward shift in
the center of mass location caused by the leg configuration.
The result is that the controller with the SRBD model has
a persistent bias that makes the robot almost tip over during
the step up. This model error is quantified in Fig. 10. At a 30°
pitch angle, there is a center of mass error of 2.6 cm, resulting
in a bias of 13.3 N m at the base frame. For reference, this is
equivalent to an unmodelled payload of 3.6 kg at the tip of the
robot. The proposed model fully describes the change in inertia
and center of mass location and therefore has no difficulty
predicting the state trajectory during the step-up motion.
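As a plausibility check of these numbers, note that the moment bias is simply the robot's weight acting on the shifted CoM; assuming a total robot mass of roughly 52 kg (not stated above, and our assumption),
$$\tau_{\text{bias}} \approx m\,g\,\Delta x_{\text{CoM}} \approx 52\,\mathrm{kg} \cdot 9.81\,\mathrm{m/s^2} \cdot 0.026\,\mathrm{m} \approx 13.3\,\mathrm{N\,m},$$
which matches the reported bias at the base frame.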
3) Solver Comparison: To motivate our choice to imple-
ment a multiple-shooting solver and move away from the
DDP-based methods used in previous work, we compare both
approaches on flat terrain and the stepping stone scenario
shown in Fig. 11. In particular, we compare against iLQR [62]
and implement it with the same constraint projection, line-
search, and the Riccati Backward pass of HPIPM as described
in section V. The key difference between the algorithms lies
in the update step. For multiple-shooting, we update both state
and inputs directly: $u_k^+ = u_k + \alpha\,\delta u_k$, $x_k^+ = x_k + \alpha\,\delta x_k$. In
contrast, iLQR proceeds with a line-search over closed-loop
nonlinear rollouts of the dynamics:
$$u_k^+ = u_k + \alpha\, k_k + K_k\left(x_k^+ - x_k\right), \qquad (41)$$
$$x_{k+1}^+ = f_k^d\!\left(x_k^+,\, u_k^+\right), \qquad x_0^+ = \hat{x}, \qquad (42)$$
Fig. 10. The location of the center of mass (CoM) in heading direction for
various torso pitch angles. The first set of CoM locations is evaluated with the
true joint angles, which are obtained when aligning the legs with the gravity
direction as in the top image. This corresponds to the reference in section
IV-E, which is tracked by the MPC. The second set of CoM locations is
evaluated for the default joint angles, shown in the bottom image, as assumed
by the SRBD model.
where $K_k$ is the optimal feedback gain obtained from the
Riccati backward pass and $k_k = \delta u_k - K_k\,\delta x_k$ is the control
update. Due to this inherently single-threaded process, each
line-search for iLQR takes four times as long as for the
multi-threaded multiple-shooting. However, note that with the
hybrid multiple-shooting-iLQR variants in [68] this difference
vanishes.
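The difference between the two update rules can be made explicit in a short sketch; the linear toy dynamics, gains, and dimensions below are placeholders and only serve to contrast the direct multiple-shooting update with the sequential rollout of (41)-(42):

```python
import numpy as np

def ms_update(x, u, dx, du, alpha):
    """Multiple-shooting: shift states and inputs directly along the QP step."""
    return ([xk + alpha * dxk for xk, dxk in zip(x, dx)],
            [uk + alpha * duk for uk, duk in zip(u, du)])

def ilqr_update(x, u, dx, du, K, alpha, f, x0):
    """iLQR: closed-loop nonlinear rollout, cf. (41)-(42). Inherently sequential."""
    x_new, u_new = [x0], []
    for k in range(len(u)):
        kk = du[k] - K[k] @ dx[k]                        # feedforward term
        u_new.append(u[k] + alpha * kk + K[k] @ (x_new[k] - x[k]))
        x_new.append(f(x_new[k], u_new[k]))              # dynamics rollout
    return x_new, u_new

# Toy usage with a linear system (all quantities are placeholders)
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
f = lambda xk, uk: A @ xk + B @ uk
N = 5
x = [np.zeros(2) for _ in range(N + 1)]
u = [np.zeros(1) for _ in range(N)]
dx = [0.01 * np.ones(2) for _ in range(N + 1)]
du = [0.1 * np.ones(1) for _ in range(N)]
K = [np.zeros((1, 2)) for _ in range(N)]
x_ms, u_ms = ms_update(x, u, dx, du, alpha=1.0)
x_il, u_il = ilqr_update(x, u, dx, du, K, alpha=1.0, f=f, x0=x[0])
```

The for-loop in ilqr_update is why each iLQR line-search candidate must be rolled out sequentially, whereas the multiple-shooting update can be evaluated in parallel across shooting intervals.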
Table III reports the solvers’ average cost, dynamics con-
straint violation, and equality constraint violation for a trotting
gait in several scenarios. As a baseline, we run the multiple-
shooting solver until convergence (with a maximum of 50
iterations) instead of real-time iteration. To test the MPC
in isolation, we use the MPC dynamics as the simulator
and apply the MPC input directly. Because of the nonlinear
rollouts of iLQR, dynamics constraints are always satisfied,
and iLQR, therefore, has the edge over multiple-shooting on
this metric. However, as the scenario gets more complex and
the optimization problem becomes harder, there is a point
where the forward rollout of iLQR is unstable and diverges.
For the scenario shown in Fig. 11, this happens at the 63 %
mark, where the robot is forced to take a big leap, and at the
78 % mark, where the hind leg is close to a singularity as the
robot steps down. The continuous time
variant SLQ [39] fails in similar ways. These failure cases
are sudden, unpredictable, and happen regularly when testing
on hardware, where imperfect elevation maps, real dynamics,
and disturbances add to the challenge. The absence of long
horizon rollouts in the multiple-shooting approach makes it
more robust and better suited for the scenarios shown in this
work. For cases where both solvers are stable, we find that
the small dynamics violation left with multiple-shooting in a
real-time iteration setting does not translate to any practical
performance difference on hardware. Finally, even for the
most challenging scenario, multiple-shooting with real-time
iteration remains within 10 % cost of the baseline.
4) Contact feedback: The reactive behavior under a mis-
match in planned and sensed contact information is shown
in the accompanying video. First, the sensed terrain is set to
Fig. 11. ANYmal traversing stepping stones in simulation (right to left). The resulting state trajectories for feet and torso, and the snapshots are shown for
a traversal with the multiple-shooting solver and a trotting gait at 0.75 m/s. The marked 63 % and 78 % locations indicate where the alternative solver,
iLQR, diverges for 0.5 m/s and 0.75 m/s, respectively.
TABLE III
SOLVER COMPARISON ON FLAT TERRAIN AND STEPPING STONES. THE
BASELINE ITERATES UNTIL CONVERGENCE INSTEAD OF USING
REAL-TIME ITERATION.

                      Baseline      Multiple shooting   iLQR
Flat - 0.50 m/s
  Cost                54.19         54.17               54.18
  Dynamics Constr.    1.70 × 10^-7  3.41 × 10^-3        0.0
  Equality Constr.    2.28 × 10^-6  3.51 × 10^-3        3.51 × 10^-3
Stones - 0.25 m/s
  Cost                151.92        156.22              156.69
  Dynamics Constr.    3.58 × 10^-5  1.01 × 10^-2        0.0
  Equality Constr.    7.24 × 10^-4  2.28 × 10^-2        2.14 × 10