Content uploaded by Ruben Grandia

Author content

All content in this area was uploaded by Ruben Grandia on Aug 18, 2022

Content may be subject to copyright.

1

Perceptive Locomotion through Nonlinear Model

Predictive Control

Ruben Grandia1, Fabian Jenelten1, Shaohui Yang2, Farbod Farshidian1, and Marco Hutter1

Abstract—Dynamic locomotion in rough terrain requires ac-

curate foot placement, collision avoidance, and planning of the

underactuated dynamics of the system. Reliably optimizing for

such motions and interactions in the presence of imperfect

and often incomplete perceptive information is challenging. We

present a complete perception, planning, and control pipeline,

that can optimize motions for all degrees of freedom of the

robot in real-time. To mitigate the numerical challenges posed

by the terrain a sequence of convex inequality constraints is

extracted as local approximations of foothold feasibility and

embedded into an online model predictive controller. Steppability

classiﬁcation, plane segmentation, and a signed distance ﬁeld are

precomputed per elevation map to minimize the computational

effort during the optimization. A combination of multiple-

shooting, real-time iteration, and a ﬁlter-based line-search are

used to solve the formulated problem reliably and at high

rate. We validate the proposed method in scenarios with gaps,

slopes, and stepping stones in simulation and experimentally on

the ANYmal quadruped platform, resulting in state-of-the-art

dynamic climbing.

Index Terms—Legged Locomotion, Terrain Perception, Opti-

mal Control.

I. INTRODUCTION

INSPIRED by nature, the ﬁeld of legged robotics aims to

enable the deployment of autonomous systems in rough

and complex environments. Indeed, during the recent DARPA

subterranean challenge, legged robots were widely adopted,

and highly successful [2], [3]. Still, complex terrains that

require precise foot placements, e.g., negative obstacles and

stepping stones as shown in Fig. 1, remain difﬁcult.

A key challenge lies in the fact that both the terrain and

the system dynamics impose constraints on contact location,

force, and timing. When taking a model-based approach,

mature methods exist for perceptive locomotion with a slow,

static gait [4]–[8] and for blind, dynamic locomotion that

assumes ﬂat terrain [9]–[11]. Learning-based controllers have

recently shown the ability to generalize blind locomotion

to challenging terrain with incredible robustness [12]–[14].

Still, tightly integrating perception to achieve coordinated and

precise foot placement remains an active research problem.

This research was supported by the Swiss National Science Foundation

(SNSF) as part of project No.188596 and by the Swiss National Science

Foundation through the National Centre of Competence in Research Robotics

(NCCR Robotics). This project has received funding from the European

Union’s Horizon 2020 research and innovation programme under grant

agreement No 780883.

1R. Grandia, F. Jenelten, F. Farshidian and M. Hutter are with the De-

partment of Mechanical and Process Engineering, ETH Zurich, Switzerland.

{rgrandia,fabianje,farbodf,mahutter}@ethz.ch.

2S. Yang is with the Automatic Control Laboratory, ´

Ecole polytechnique

f´

ed´

erale de Lausanne (EPFL), Switzerland. shaohui.yang@epfl.ch.

Manuscript received March 13, 2022; revised August 17, 2022.

Fig. 1. ANYmal walking on uneven stepping stones. In the shown conﬁg-

uration, the top foothold is 60 cm above the lowest foothold. The top right

visualizes the internal terrain representation used by the controller.

In an effort to extend dynamic locomotion to uneven terrain,

several methods have been proposed to augment foothold

selection algorithms with perceptive information [15]–[17].

These approaches build on a strict hierarchy of ﬁrst selecting

footholds and optimizing torso motion afterward. This decom-

position reduces the computational complexity but relies on

hand-crafted coordination between the two modules. Addition-

ally, separating the legs from the torso optimization makes it

difﬁcult to consider kinematic limits and collision avoidance

between limbs and terrain.

Trajectory optimization where torso and leg motions are

jointly optimized has shown impressive results in simula-

tion [18]–[20] and removes the need for engineered torso-

foot coordination. Complex motions can be automatically

discovered by including the entire terrain in the optimization.

However, computation times are often too long for online

deployment. Additionally, due to the non-convexity, non-

linearity, and discontinuity introduced by optimizing over

arbitrary terrain, these methods can get stuck in poor local

minima. Dedicated work on providing an initial guess is

needed to ﬁnd feasible motions reliably [21].

This work presents a planning and control framework

that optimizes over all degrees of freedom of the robot,

considers collision avoidance with the terrain, and enables

complex dynamic maneuvers in rough terrain. The method

is centered around nonlinear Model Predictive Control (MPC)

with a multiple-shooting discretization [22], [23]. However,

in contrast to the aforementioned work, where the full terrain

is integrated into the optimization, we get a handle on the

numerical difﬁculty introduced by the terrain by exposing the

terrain as a series of geometric primitives that approximate

the local terrain. In this case, we use convex polygons as

foot placement constraints, but different shapes can be used

2

as long as they lead to well-posed constraints in the optimiza-

tion. Additionally, a signed distance ﬁeld (SDF) is used for

collision avoidance. We empirically demonstrate that such a

strategy is an excellent trade-off between giving freedom to the

optimization to discover complex motions and the reliability

with which we can solve the formulated problem.

A. Contributions

We present a novel approach to locomotion in challenging

terrain where perceptive information needs to be considered

and nontrivial motions are required. The complete perception,

planning, and control pipeline contains the following contri-

butions:

•We perform simultaneous and real-time optimization of

all degrees of freedom of the robot for dynamic motions

across rough terrain. Perceptive information is encoded

through a sequence of geometric primitives that capture

local foothold constraints and a signed distance ﬁeld used

for collision avoidance.

•The proposed combination of a multiple-shooting tran-

scription, sequential quadratic programming, and a ﬁlter-

based line-search enables fast and reliable online solu-

tions to the nonlinear optimal control problem.

•We provide a detailed description of the implemented

MPC problem, its integration with whole-body and re-

active control modules, and extensive experimental vali-

dation of the resulting locomotion controller.

The MPC implementation is publicly available as part of the

OCS2 toolbox1[24]. The implemented online segmentation

of the elevation map, and the efﬁcient precomputation of a

signed distance ﬁeld are contributed to existing open-source

repositories2, 3.

B. Outline

An overview of the proposed method is given in Fig. 2. The

perception pipeline at the top of the diagram runs at 20 Hz and

is based on an elevation map constructed from pointcloud in-

formation. For each map update, classiﬁcation, segmentation,

and other precomputation are performed to prepare for the high

number of perceptive queries during motion optimization. At

the core of the framework, we use nonlinear MPC at 100 Hz

to plan a motion for all degrees of freedom and bring together

user input, perceptive information, and the measured state of

the robot. Finally, state estimation, whole-body torque control,

and reactive behaviors are executed at a rate of 400 Hz.

After a review of related work in section II, this paper is

structured similarly to Fig. 2. First, we present the perception

pipeline in section III. Afterward, the formulated optimal

control problem and corresponding numerical optimization

strategy are discussed in sections IV & V. We introduce the

motion execution layer in section VI. The resulting method is

evaluated on the quadrupedal robot ANYmal [25] (see Fig. 1)

in section VII, and concluded with section VIII.

1https://github.com/leggedrobotics/ocs2

2https://github.com/leggedrobotics/elevation mapping cupy

3https://github.com/ANYbotics/grid map

User input

Elevation mapping

State estimation &

Disturbance observer

Whole-body &

reactive control

Nonlinear MPC

Classification,

Segmentation, &

Precomputation

ANYmal

20 Hz

100 Hz 400 Hz

Fig. 2. Schematic overview of the proposed method together with the update

rate of each component.

II. RE LATE D WORK

A. Decomposing locomotion

When assuming a quasi-static gait with a predetermined

stepping sequence, the planning problem on rough terrain

can be simpliﬁed and decomposed into individual contact

transitions, as demonstrated in the early work on LittleDog

[4], [26]. In a one-step-ahead fashion, one can check the next

foothold for kinematic feasibility, feasibility w.r.t. the terrain,

and the existence of a statically stable transition. This problem

can be efﬁciently solved by sampling and checking candidate

footholds [27]. Afterward, a collision-free swing leg trajectory

to the desired foothold can be generated with CHOMP [28]

based on an SDF. Fully onboard perception and control with

such an approach were achieved by Fankhauser et al. [7].

Instead of one-step-ahead planning, an RRT graph can be built

to plan further ahead [5]. Sampling over templated foothold

transitions achieves similar results [6], [29].

In this work, we turn our attention to dynamic gaits, where

statically stable transitions between contact conﬁgurations are

not available. In model-based approaches to dynamic, percep-

tive locomotion, a distinction can be made between methods

where the footholds locations are determined separately from

the torso and those where the foothold locations and torso

motions are jointly optimized.

Several methods in which footholds are selected before

optimizing the torso motions, initially designed for ﬂat terrain,

have been adapted to traverse rough terrain [30], [31]. These

methods typically employ some form of Raibert heuristic [32]

to select the next foothold and adapt it based on perceptive

information such as a traversability estimate [33]. The work

of Bellicoso et al. [9] was extended by including a batch

search for feasible footholds based on a given terrain map and

foothold scoring [15]. Similarly, in [16], the foot placement

is adapted based on visual information resulting in dynamic

trotting and jumping motions. In [34], the authors proposed to

train a convolutional neural network (CNN) to speed up the

online evaluation of such a foothold adaptation pipeline. This

CNN was combined with the MPC strategy in [11] to achieve

3

perceptive locomotion in simulation [17]. In [35] and [36], a

Reinforcement Learning (RL) policy has replaced the heuristic

foothold selection.

However, since foothold locations are chosen before op-

timizing the torso motion, their effect on dynamic stability

and kinematic feasibility is not directly considered, requiring

additional heuristics to coordinate feet and torso motions to

satisfy whole-body kinematics and dynamics. Moreover, it

becomes hard to consider collisions of the leg with the terrain

because the foothold is already ﬁxed. In our approach, we

use the same heuristics to ﬁnd a suitable nominal foothold

in the terrain. However, instead of ﬁxing the foothold to that

particular location, a region is extracted around the heuristic

in which the foothold is allowed to be optimized.

The beneﬁt of jointly optimizing torso and leg motions has

been demonstrated in the ﬁeld of trajectory optimization. One

of the ﬁrst demonstrations of simultaneous optimization of foot

placement and a zero-moment point (ZMP) [37] trajectory was

achieved by adding 2D foot locations as decision variables to

an MPC algorithm [38]. More recently, Kinodynamic [39],

Centroidal [40], [41], and full dynamics models [42], [43]

have been used for simultaneous optimization of 3D foot

locations and body motion. Alternatively, a single rigid body

dynamics (SRBD) model or other simpliﬁed torso models can

be extended with decision variables for Cartesian foothold

locations [19], [44]. Real-time capable methods have been

proposed with the speciﬁcation of leg motions on position [10],

velocity [45], or acceleration level [46]. One challenge of this

line of work is the computational complexity arising from the

high dimensional models, already in the case of locomotion on

ﬂat terrain. Our method also uses a high-dimensional model

and falls in this category. A key consideration when extending

the formulations with perceptive information has thus been to

keep computation within real-time constraints.

Finally, several methods exist that additionally optimize gait

timings or even the contact sequence together with the whole-

body motion. This can be achieved through complementarity

constraints [18], [20], [47], mixed-integer programming [48],

[49], or by explicitly integrating contact models into the opti-

mization [46], [50]. Alternatively, the duration of each contact

phase can be included as a decision variable [19], [51] or

found through bilevel optimization [52], [53]. However, such

methods are prone to poor local optima and reliably solving

the optimization problems in real-time remains challenging.

B. Terrain representation

The use of an elevation map has a long-standing history in

the ﬁeld of legged robotics [54], and it is still an integral

part of many perceptive locomotion controllers today. Ap-

proaches where footholds are selected based on a local search

or sampling-based algorithm can directly operate on such a

structure. However, more work is needed when integrating the

terrain into a gradient-based optimization.

Winkler et al. [19] uses an elevation map for both foot

placement and collision avoidance. The splines representing

the foot motion are constrained to start and end on the terrain

with equality constraints. An inequality constraint is used to

avoid the terrain in the middle of the swing phase. Ignoring the

discontinuity and non-convexity from the terrain makes this

approach prone to poor local minima, motivating specialized

initialization schemes [21] for this framework.

In [44], a graduated optimization scheme is used, where

a ﬁrst optimization is carried out over a smoothened version

of the terrain. The solution of this ﬁrst optimization is then

used to initialize an optimization over the actual elevation

map. In a similar spirit, Mordatch [18] considers a general

3D environment and uses a soft-min operator to smoothen the

closest point computation. A continuation scheme is used to

gradually increase the difﬁculty of the problem over consecu-

tive optimizations.

Deits et al. [55] describe a planning approach over rough ter-

rain based on mixed-integer quadratic programming (MIQP).

Similar to [8], convex safe regions are extracted from the

terrain, and footstep assignment to a region is formulated as

a discrete decision. The foothold optimization is simpliﬁed

because only convex, safe regions are considered during

planning. Furthermore, the implementation relied on manual

seeding of convex regions by a human operator. We follow the

same philosophy of presenting the terrain as a convex region

to the optimization. However, we remove the mixed-integer

aspect by pre-selecting the convex region. The beneﬁts are

two-fold: First, we do not require a global convex decomposi-

tion of the terrain, which is a hard problem in general [56], and

instead, only produce a local convex region centered around a

nominal foothold. Second, the MIQP approach does not allow

for nonlinear costs and dynamics, which limits the range of

motions that can be expressed. We ﬁrst explored the proposed

terrain representation as part of our previous work [57], but

relied on ofﬂine mapping, manual terrain segmentation, and

did not yet consider terrain collisions. In [58], we applied

this idea to wheeled-legged robots, but again relied on ofﬂine

mapping and segmentation. Moreover, as discussed in the next

section, in both [57] and [58], we used a different solver, which

was found to be insufﬁcient for the scenarios in this work.

C. Motion Optimization

For trajectory optimization, large-scale optimization soft-

ware like SNOPT [59] and IPOPT [60] are popular. They are

the workhorse for ofﬂine trajectory optimization in the work of

Winkler [19], Dai [20], Mordatch [18], Posa [47], and Pardo

[42]. These works show a great range of motions in simulation,

but it typically takes minutes to hours to ﬁnd a solution.

A different line of work uses specialized solvers that exploit

the sparsity that arises from a sequential decision making pro-

cess. Several variants of Differential Dynamic Programming

(DDP) [61] have been proposed in the context of robotic

motion optimization, e.g., iLQR [62], [63], SLQ [39], and

FDDP [64].

With a slightly different view on the problem, the ﬁeld of

(nonlinear) model predictive control [23], [65] has specialized

in solving successive optimal control problems under real-

time constraints. See [66] for a comparison of state-of-the-

art quadratic programming (QP) solvers that form the core

of second-order optimization approaches to the nonlinear

4

Torso referencePlane Segmentation

Collision Avoidance

Constraints

Reference

Generation

A B

C

Filtering & Classification

Elevation Map

Optimal Control Problem

Convex Foothold Constraints

Plane boundary

Convex constraint

Query

Projection

Signed Distance Field

(dense 3D grid)

Fig. 3. Perception pipeline overview. (A) The elevation map is ﬁltered and classiﬁed into steppable and non-steppable cells [Section III-A]. All steppable

areas are segmented into planes [Section III-B]. After segmentation, the steppablity classiﬁcation is reﬁned. (B) A signed distance ﬁeld [Section III-C] and

torso reference layer [Section III-D] are precomputed to reduce the required computation time during optimization. (C) Convex foothold constraints in (21)

are obtained from the plane segmentation. The signed distance ﬁeld enables collision avoidance in (23), and the torso reference is used to generate height and

orientation references [Section IV-E].

problem. For time-critical applications, the real-time iteration

scheme can be used to trade optimality for lower computa-

tional demands [67]: In a Sequential Quadratic Programming

(SQP) approach to the nonlinear problem, at each control

instance, only a single QP optimization step is performed.

The current work was initially built on top of a solver in

the ﬁrst category [39]. However, a signiﬁcant risk in classical

DDP-based approaches is the need to perform a nonlinear

system rollout along the entire horizon. Despite the use of a

feedback policy, these forward rollouts can diverge, especially

in the presence of underactuated dynamics. This same obser-

vation motivated Mastalli et al. to design FDDP to maintain

gaps between shorter rollouts, resulting in a formulation that is

equivalent to direct multiple-shooting formulations with only

equality constraints [22], [64]. Giftthaler et al. [68] studied

several combinations of iLQR and multiple-shooting but did

not yet consider constraints beyond system dynamics nor a

line-search procedure to determine the stepsize. Furthermore,

experiments were limited to simple, ﬂat terrain walking.

We directly follow the multiple-shooting approach with a

real-time iteration scheme and leverage the efﬁcient structure

exploiting QP solver HPIPM [69]. However, as also mentioned

in both [64] and [68], one difﬁculty is posed in deciding a

stepsize for nonlinear problems, where one now has to monitor

both the violation of the system dynamics and minimization

of the cost function. To prevent an arbitrary trade-off through

a merit function, we suggest using a ﬁlter-based line-search

instead [70], which allows a step to be accepted if it reduces

either the objective function or the constraint violation. As we

will demonstrate in the result section, these choices contribute

to the robustness of the solver in challenging scenarios.

III. TERRAIN PERCEPTION AND SEGMENTATION

An overview of the perception pipeline and its relation to

the MPC controller is provided in Fig. 3. The pipeline can be

divided into three parts: (A) steppability classiﬁcation and seg-

mentation, (B) precomputation of the SDF and torso reference,

and (C) integration into the optimal control problem.

The elevation map, represented as a 2.5D grid [71] with a

4 cm resolution is provided by the GPU based implementation

introduced in [72]. The subsequent map processing presented

in this work runs on the CPU and is made available as part of

that same open-source library. Both (A) and (B) are computed

once per map and run at 20 Hz, asynchronously to the motion

optimization in (C).

A. Filtering & Classiﬁcation

The provided elevation map contains empty cells in oc-

cluded areas. As a ﬁrst step, we perform inpainting by ﬁlling

each cell with the minimum value found along the occlusion

border. Afterwards, a median ﬁlter is used to reduce noise and

outliers in the map.

Steppablity classiﬁcation is performed by thresholding the

local surface inclination and the local roughness estimated

through the standard deviation [73]. Both quantities can be

computed with a single pass through the considered neigh-

bourhood of size N:

µ=1

NX

i

ci,S=1

NX

i

cic>

i,Σ=S−µµ>,(1)

where µand Sare the ﬁrst and second moment, and Σ∈

R3×3is the positive semi-deﬁnite covariance matrix of the cell

positions ci. The variance in normal direction, σ2

n, is then the

smallest eigenvalue of Σ, and the surface normal, n, is the

5

corresponding eigenvector. For steppability classiﬁcation we

use a neighbourhood of N= 9, and set a threshold of 2 cm

on the standard deviation in normal direction and a maximum

inclination of 35◦, resulting in the following classiﬁcation:

steppability =(1if σn≤0.02,and nz≥0.82,

0otherwise,(2)

where nzdenotes the z-coordinate of the surface normal.

B. Plane Segmentation

After the initial classiﬁcation, the plane segmentation starts

by identifying continuous regions with the help of a connected

component labelling [74]. For each connected region of cells,

we compute again the covariance as in (1), where Nis now

the number of cells in the connected region, and accept the

region as a plane based on the following criteria:

planarity =(1if σn≤0.025, nz≥0.87,and N≥4

0otherwise.(3)

Notice that here we loosen the bound on the standard deviation

to 2.5 cm, tighten the bound on the inclination to 30◦, and add

the constraint that at least 4 cells form a region. If the planarity

condition is met, the surface normal and mean of the points

deﬁne the plane.

If a region fails the planarity condition, we trigger RANSAC

[75] on that subset of the data. The same criteria in (3) are

used to ﬁnd smaller planes within the connected region. After

the algorithm terminates, all cells that have not been included

in any plane have their steppability updated and set to 0.

At this point, we have a set of plane parameters with

connected regions of the map assigned to them. For each of

these regions, we now extract a 2D contour from the elevation

map [76], and project it along the z-axis to the plane to deﬁne

the boundary in the frame of the plane. It is important to

consider that regions can have holes, for example, when a free-

standing obstacle is located in the middle of an open ﬂoor. The

boundary of each segmented region is therefore represented

by an outer polygon together with a set of polygons that trace

enclosed holes. See Fig. 4 for an illustrative example of such

a segmented region and the local convex approximations it

permits. Finally, if the particular region allows, we shrink the

boundary inwards (and holes outwards) to provide a safety

margin. If the inscribed area is not large enough, the plane

boundary is accepted without margin. In this way we obtain a

margin where there is enough space to do so, but at the same

time we do not reject small stepping stones, which might be

crucial in certain scenarios.

C. Signed Distance Field

Before computing the SDF, we take advantage of the

classiﬁcation between terrain that will be potentially stepped

on and terrain that will not be stepped on. To all cells that are

non-steppable, we add a vertical margin of 2 cm, and dilate the

elevation by one cell. The latter effectively horizontally inﬂates

all non-steppable areas by the map resolution. This procedure

Fig. 4. An example of a segmented region represented by a non-convex outer

polygon and two non-overlapping holes (drawn in black). Three different local

convex approximations (drawn in orange) are shown that are found around

query points with the iterative algorithm described in section IV-F.

corrects for the problem that edges tend to be underestimated

in the provided elevation map.

We use a dense 3D voxel grid, where each voxel contains

the value and 3D gradient. The previous motion plan is used

to determine the 3D volume where distance information is

needed. This volume is a bounding box that contains all

collision bodies of the last available plan with a margin of

25 cm. This way, the size and shape of the SDF grid dynam-

ically scales with the motion that is executed. Storing both

value and derivative as proposed in [77] allows for efﬁcient

interpolation during optimization. However, in contrast to [77],

where values and gradients are cached after the ﬁrst call, we

opt to precompute the full voxel grid to reduce the computation

time during optimization as much as possible.

This is possible by taking advantage of the extra structure

that the 2.5D representation provides. A detailed description

of how the SDF can be efﬁciently computed from an elevation

map is given in Appendix A.

D. Torso reference map

With user input deﬁned as horizontal velocity and an

angular rate along the z-direction, it is the responsibility of

the controller to decide on the height and orientation of the

torso. We would like the torso pose to be positioned in such a

way that suitable footholds are in reach for all of the feet. We

therefore create a layer that is a smooth interpolation of all

steppable regions as described in [44]. The use of this layer to

generate a torso height and orientation reference is presented

in section IV-E.

IV. MOTION PLANNING

In this section, we describe the nonlinear MPC formulation.

In particular, we set out to deﬁne all components in the

following nonlinear optimal control problem:

minimize

u(·)Φ(x(T)) + ZT

0

L(x(t),u(t), t)dt, (4a)

subject to: x(0) = ˆ

x,(4b)

˙

x=fc(x,u, t),(4c)

g(x,u, t) = 0,(4d)

where x(t)and u(t)are the state and the input at time t, and ˆ

x

is the current measured state. The term L(·)is a time-varying

running cost, and Φ(·)is the cost at the terminal state x(T).

The goal is to ﬁnd a control signal that minimizes this cost

6

Fig. 5. Overview of the coordinates frames and constraints used in the

deﬁnition of the MPC problem. On the front left foot, a friction cone is

shown, deﬁned in the terrain frame FT. On the right front foot, a swing

reference trajectory is drawn between the liftoff frame FT−and touchdown

frame FT+. Foot placement constraints are deﬁned as a set of half-spaces

in the touchdown frame. Stance legs have collision bodies at the knee, as

illustrated on the right hind leg, while swing legs have collision bodies on

both the foot and the knee, as shown on the left hind leg.

subject to the initial condition, x0, system dynamics, fc(·), and

equality constraints, g(·). Inequality constraints are all handled

through penalty functions and will be deﬁned as part of the

cost function in section IV-F.

A. Robot deﬁnition

We deﬁne the generalized coordinates and velocities as:

q=hθ>

B,p>

B,q>

ji>

,˙

q=ω>

B,v>

B,˙

q>

j>

,(5)

where θB∈R3is the orientation of the base frame, FB,

in Euler angles, pB∈R3is the position of the base in the

world frame, FW.ωB∈R3and vB∈R3are the angular

rate and linear velocity of the base in the body frame FB.

Joint positions and velocities are given by qj∈R12 and ˙

qj∈

R12. The collection of all contact forces is denoted by λ∈

R12. When referring to these quantities per leg, we will use a

subscript i, e.g. qi∈R3or λi∈R3. All subscripts for legs

in contact are contained in the set C. A graphical illustration

of the robot together with the deﬁned coordinate frames is

provided in Fig. 5.

B. Torso dynamics

To derive the torso dynamics used in this work, consider

the full rigid body dynamics of the robot,

M(q)¨

q+n(q,˙

q) = S>τ+τdist +X

i∈C

J>

i(q)λi,(6)

with inertia matrix M:R18 →R18×18, generalized acceler-

ations ¨

q∈R18, and nonlinear terms n:R18 ×R18 →R18

on the left hand side. The right hand contains the selection

matrix S= [012×6,I12×12]∈R12×18 , actuation torques

τ∈R12, disturbance forces τdist ∈R18 , contact Jacobians

Ji:R18 →R3×18, and contact forces λi∈R3.

For these equations of motion, it is well known that for

an articulated system, the underactuated, top 6 rows are

of main interest for motion planning [51]. These so-called

centroidal dynamics govern the range of motion that can be

achieved [40], [78]. Solving the centroidal dynamics for base

acceleration gives:

˙

ωB

˙

vB=M−1

B τdist

B−MBj ¨

qj−nB+X

i∈C

J>

B,i λi!,(7)

=fB(q,˙

q,¨

qj,λ,τdist

B),(8)

where MB∈R6×6is the compound inertia tensor at the top

left of M(q), and MBj ∈R6×12 is the top right block that

encodes inertial coupling between the legs and base. The other

terms with subscript Bcorrespond to the top 6 rows of the

same terms in (6).

To simplify the torso dynamics, we evaluate this func-

tion with zero inertial coupling forces from the joints, i.e.

MBj ¨

qj=0. This simpliﬁcation allows us to consider the legs

only on velocity level and removes joint accelerations from the

formulation. From here, further simpliﬁcations would be pos-

sible. Evaluating the function at a nominal joint conﬁguration

and zero joint velocity creates a constant inertia matrix and

gives rise to the commonly used single rigid body assumption.

While this assumption is appropriate on ﬂat terrain, the joints

move far away from their nominal conﬁguration in this work,

creating a signiﬁcant shift in mass distribution and center of

mass location.

C. Input loopshaping

The bandwidth limitations of the series elastic actuators

used in ANYmal pose an additional constraint on the set of

motions that are feasible on hardware. Instead of trying to

accurately model these actuator dynamics, we use a frequency-

dependent cost function to penalize high-frequency content in

the contact forces and joint velocity signals [79]. For com-

pleteness, we present here the resulting system augmentation

in the time domain:

˙

sλ=Aλsλ+Bλνλ,˙

sj=Ajsj+Bjνj,(9)

λ=Cλsλ+Dλνλ,˙

qj=Cjsj+Djνj,

where sλand sjare additional states, and νλand νjare auxil-

iary inputs, associated with contact forces and joint velocities

respectively. When the ﬁlters (νλ→λand νj→˙

qj) are

low-pass ﬁlters, penalizing the auxiliary input is equivalent to

penalizing high frequency content in λand ˙

qj.

An extreme case is obtained when choosing Aλ=Dλ=0,

Bλ=Cλ=I, in which case the auxiliary input becomes the

derivative, ˙

λ. This reduces to the common system augmenta-

tion technique that allows penalization of input rates [23].

In our case we allow some direct control (D6=0) and

select Aλ=Aj=0,Bλ=Bj=I,Cλ=100

4I,Cj=

50

3I,Dλ=1

4I,Dj=1

3I. This corresponds to a progressive

increase in cost up to a frequency of 100 rad s−1for λand up

to 50 rad s−1for ˙

qj, where high frequency components have

their cost increased by a factor of 4and 3respectively.

7

D. System Dynamics

We are now ready to deﬁne the state vector x∈R48 and

input vector u∈R24 used during motion optimization:

x=hθ>

B,p>

B,ω>

B,v>

B,q>

j,s>

λ,s>

ji>

,u=ν>

λ,ν>

j>

.(10)

Putting together the robot dynamics from section IV-B and

system augmentation described in IV-C gives the continuous

time MPC model ˙

x=fc(x,u, t):

d

dt

θB

pB

ωB

vB

qj

sλ

sj

=

T(θB)ωB

RB(θB)vB

fB(q,˙

q,0,Cλsλ+Dλνλ,τdist

B)

Cjsj+Djνj

Aλsλ+Bλνλ

Ajsj+Bjνj

,(11)

where T(θB) : R3→R3×3provides the conversion be-

tween angular body rates and Euler angle derivatives, and

RB(θB) : R3→R3×3provides the body to world rotation

matrix. The disturbance wrench τdist

Bis considered a parameter

and is assumed constant over the MPC horizon.

E. Reference generation

The user commands 2D linear velocities and an angular rate

in the horizontal plane, as well as a desired gait pattern. A

full motion and contact force reference is generated to encode

these user commands and additional motion preferences into

the cost function deﬁned in section IV-F. This process is

carried out before every MPC iteration.

As a ﬁrst step, assuming a constant input along the horizon,

a 2D base reference position and heading direction are extrap-

olated in the world frame. At each point in time, the 2D pose

is converted to a 2D position for each hip. The smoothened

elevation map, i.e. the torso reference layer shown in Fig 3, is

interpolated at the 2D hip location. The interpolated elevation

in addition to a desired nominal height, hnom, gives a 3D

reference position for each hip. A least-squares ﬁt through the

four hip positions gives the 6DoF base reference.

The extracted base reference and the desired gait pattern

are used to derive nominal foothold locations. Here we use

the common heuristic that the nominal foothold is located

below the hip, in gravity-aligned direction, at the middle of

the contact phase [32]. Additionally, for the ﬁrst upcoming

foothold, a feedback on the measured velocity is added:

pi,nom =pi,hip,nom +shnom

g(vB,meas −vB,com ),(12)

where pi,nom ∈R3is the nominal foothold, pi,hip,nom ∈R3

is the nominal foothold location directly below the hip, and gis

the gravitational constant. vB,meas and vB,com are measured

and commanded base velocity respectively.

With the nominal foothold locations known, the plane seg-

mentation deﬁned in section III-B is used to adapt the nominal

foothold locations to the perceived terrain. Each foothold is

projected onto the plane that is closest and within kinematic

limits. Concretely, we pick the reference foothold, pi,proj,

according to:

argmin

pi,proj ∈Π(pi,nom)kpi,nom −pi,pr oj k2

2+wkinfk in(pi,proj ),

(13)

where Π(pi,nom)is a set of candidate points. For each

segmented plane we take the point within that region that is

closest to the nominal foothold as a candidate. The term fkin

is a kinematic penalty with weight wkin that penalizes the

point if the leg extension at liftoff or touchdown is beyond

a threshold and if the foothold crosses over to the opposite

side of the body. Essentially, this is a simpliﬁed version of

the foothold batch search algorithm presented in [15], which

searches over cells of the map instead of pre-segmented planes.

After computing all projected footholds, heuristic swing

trajectories are computed with two quintic splines; from liftoff

to apex and apex to touchdown. The spline is constrained

by a desired liftoff and touchdown velocity, and an apex

location is selected in such a way that the trajectory clears the

highest terrain point between the footholds. Inverse kinematics

is used to derive joint position references corresponding to the

base and feet references. Finally, contact forces references are

computed by dividing the total weight of the robot equally

among all feet that are in contact. Joint velocity references

are set to zero.

F. Cost & Soft Inequality Constraints

The cost function (4a) is built out of several components.

The running cost L(x,u, t)can be split into tracking costs

L, loopshaping costs Lν, and penalty costs LB:

L=L+Lν+LB.(14)

The motion tracking cost are used to follow the reference

trajectory deﬁned in section IV-E. Tracking error are deﬁned

for the base, B, and for each foot ,i,

B=

log(RBR>

B,ref )∨

pB−pB,ref

ωB−ωB,ref

vB−vB,ref

,i=

qi−qi,ref

˙

qi−˙

qi,ref

pi−pi,ref

vi−vi,ref

λi−λi,ref

,(15)

where log(RBR>

B,ref )∨is the logarithmic map of the orien-

tation error, represented as a 3D rotation vector, and piand

viare the foot position and velocity in world frame. Together

with diagonal, positive deﬁnite, weight matrices WBand Wi,

for which the individual elements are listed in Table I, these

errors form the following nonlinear least-squares cost:

L=1

2kBk2

WB+

4

X

i=1

1

2kik2

Wi.(16)

As discussed in section IV-C, high-frequency content in

joint velocities and contact forces are penalized through a cost

on the corresponding auxiliary input. This cost is a simple

quadratic cost:

Lν=1

2ν>

λRλνλ+1

2ν>

jRjνj,(17)

8

TABLE I

MOTI ON TR ACK IN G WEI GH TS

Term Weights

log(RBR>

B,ref )∨(100.0,300.0,300.0)

pB−pB,ref (1000.0,1000.0,1500.0)

ωB−ωB,ref (10.0,30.0,30.0)

vB−vB,ref (15.0,15.0,30.0)

qi−qi,ref (2.0,2.0,1.0)

˙

qi−˙

qi,ref (0.02,0.02,0.01)

pi−pi,ref (30.0,30.0,30.0)

vi−vi,ref (15.0,15.0,15.0)

λi−λi,ref (0.001,0.001,0.001)

where Rλand Rjare constant, positive semi-deﬁnite, weight

matrices. To obtain an appropriate scaling and avoid further

manual tuning, these matrices are obtained from the quadratic

approximation of the motion tracking cost (16), with respect

to λand ˙

qjrespectively, at the nominal stance conﬁguration

of the robot.

All inequality constraints are handled through the penalty

cost. In this work, we use relaxed barrier functions [80], [81].

This penalty function is deﬁned as a log-barrier on the interior

of the feasible space and switches to a quadratic function at a

distance δfrom the constraint boundary.

B(h) = (−µln(h), h ≥δ,

µ

2h−2δ

δ2−1−µln(δ), h < δ. (18)

The penalty is taken element-wise for vector-valued inequality

constraints. The sum of all penalties is given as follows:

LB=

4

X

i=1 Bjhj

i+X

i∈C Btht

i+Bλhλ

i+X

c∈D Bdhd

c,

(19)

with joint limit constraints hj

ifor all legs, foot placement and

friction cones constraints, ht

iand hλ

i, for legs in contact, and

collision avoidance constraints hd

cfor all bodies in a set D.

The joint limits constraints contain upper {qj,˙

qj,τ}and

lower bounds {qj,˙

qj,τ}for positions, velocities, and torques:

hj

i=

qj−qj

qj−qj

˙

qj−˙

qj

˙

qj−˙

qj

τ−τ

τ−τ

≥0,(20)

where we approximate the joint torques by considering a static

equilibrium in each leg, i.e. τi=J>

j,iλi.

The foot placement constraint is a set of linear inequality

constraints in task space:

ht

i=At

i·pi+bt

i≥0,(21)

where At

i∈Rm×3, and bt

i∈Rmdeﬁne mhalf-space

constraints in 3D. Each half-space is deﬁned as the plane

spanned by an edge of the 2D polygon and the surface normal

of the touchdown terrain FT+. The polygon is obtained by

initializing all mvertices at the reference foothold derived in

section IV-E and iteratively displacing them outwards. Each

vertex is displaced in a round-robin fashion until it reaches the

boundary of the segmented region or until further movement

would cause the polygon to become non-convex. Similar to

[82], we have favoured the low computational complexity of an

iterative scheme over an exact approach of obtaining a convex

inner approximation. The ﬁrst set of extracted constraints

remain unaltered for a foot that is in the second half of the

swing phase to prevent last-minute jumps in constraints.

The friction cone constraint is implemented as:

hλ

i=µcFz−qF2

x+F2

y+2≥0,(22)

with [Fx, Fy, Fz]>=R>

TRBλi, deﬁning the forces in the

local terrain frame. µcis the friction coefﬁcient, and > 0is

a parameter that ensures a continuous derivative at λi=0,

and at the same time creates a safety margin [83].

The collision avoidance constraint is given by evaluation of

the SDF at the center of a collision sphere, pc, together with

the required distance given by the radius, rc, and a shaping

function dmin(t).

hd

c=dSDF (pc)−rc−dmin (t)≥0.(23)

The primary use of the shaping function is to relax the

constraint if a foot starts a swing phase from below the map.

To avoid the robot using maximum velocity to escape the

collision, we provide smooth guidance back to free space with

a cubic spline trajectory. This happens when the perceived

terrain is higher than the actual terrain, for example in case

of a soft terrain like vegetation and snow, or simply because

of drift and errors in the estimated map. The collision set D

contains collision bodies for all knees and for all feet that are

in swing phase, as visualized on the hind legs in Fig. 5.

Finally, we use a quadratic cost as the terminal cost in (4a).

To approximate the inﬁnite horizon cost incurred after the

ﬁnite horizon length, we solve a Linear Quadratic Regulator

(LQR) problem for the linear approximation of the MPC

model and quadratic approximation of the intermediate costs

around the nominal stance conﬁguration of the robot. The

Riccati matrix SLQR of the cost-to-go is used to deﬁne the

quadratic cost around the reference state:

Φ(x) = 1

2(x−xref (T))>SLQR (x−xref (T)) .(24)

G. Equality constraints

For each foot in swing phase, the contact forces are required

to be zero:

λi=0,∀i /∈ C.(25)

Additionally, for each foot in contact, the end-effector

velocity is constrained to be zero. For swing phases, the

reference trajectory is enforced only in the normal direction.

This ensures that the foot lifts off and touches down with

a speciﬁed velocity while leaving complete freedom of foot

placement in the tangential direction.

vi=0,if i∈ C,

n>(t) (vi−vi,ref +kp(pi−pi,ref )) = 0,if i /∈ C,

The surface normal, n(t), is interpolated over time since liftoff

and touchdown terrain can have a different orientation.

9

Algorithm 1 Real-time iteration Multiple-shooting MPC

1: Given: previous solution wi

2: Discretize the continous problem to the form of (27)

3: Compute the linear quadratic approximation (30)

4: Compute the equality constraint projection (34)

5: δ˜

w←Solve the projected QP subproblem (35)

6: δw←Pδ˜

w+p, back substitution using (33)

7: wi+1 ←Line-Search(wi,δw), (Algorithm 2)

V. NUMERICAL OPTIMIZATION

We consider a direct multiple-shooting approach to trans-

forming the continuous optimal control problem into a ﬁnite-

dimensional nonlinear program (NLP) [22]. Since MPC com-

putes control inputs over a receding horizon, successive in-

stances of (27) are similar and can be efﬁciently warm-

started when taking an SQP approach by shifting the previous

solution. For new parts of the shifted horizon, for which no

initial guess exists, we repeat the ﬁnal state of the previous

solution and initialize the inputs with the references generated

in section IV-E. Additionally, we follow the real-time iteration

scheme where only one SQP step is performed per MPC

update [84]. In this way, the solution is improved across

consecutive instances of the problem, rather than iterating until

convergence for each problem.

As an overview of the approach described in the following

sections, a pseudo-code is provided in Algorithm 1, referring

to the relevant equations used at each step. Except for the

solution of the QP in line 5, all steps of the algorithm are

parallelized across the shooting intervals. The QP is solved

using HPIPM [69].

A. Discretization

The continuous control signal u(t)is parameterized over

subintervals of the prediction horizon [t, t +T]to obtain a

ﬁnite-dimensional decision problem. This creates a grid of

nodes k∈ {0,.. . , N }deﬁning control times tkseparated by

intervals of duration δt ≈T/(N−1). Around gait transitions,

δt is slightly shortened or extended such that a node is exactly

at the gait transition.

In this work, we consider a piecewise constant, or zero-

order-hold, parameterization of the input. Denoting xk=

x(tk)and integrating the continuous dynamics in (11) over an

interval leads to a discrete time representation of the dynamics:

fd

k(xk,uk) = xk+Ztk+δt

tk

fc(x(τ),uk, t)dτ. (26)

The integral in (26) is numerically approximated with an inte-

gration method of choice to achieve the desired approximation

accuracy of the evolution of the continuous time system under

the zero-order-hold commands. We use an explicit second-

order Runge-Kutta scheme.

The general nonlinear MPC problem presented below can

be formulated by deﬁning and evaluating a cost function and

constraints on the grid of nodes.

min

X,UΦ(xN) +

N−1

X

k=0

lk(xk,uk),(27a)

s.t. x0−ˆ

x=0,(27b)

xk+1 −fd

k(xk,uk) = 0, k = 0,. .. , N −1,(27c)

gk(xk,uk) = 0, k = 0,. .. , N −1,(27d)

where X= [x>

0, . . . x>

N]>, and U= [u>

0, . . . u>

N−1]>, are

the sequences of state and input variables respectively. The

nonlinear cost and constraint functions lk, and gk, are discrete

sample of the continuous counterpart. Collecting all decision

variables into a vector, w= [X>,U>]>, problem (27) can be

written as a general NLP:

min

wφ(w),s.t. F(w)

G(w)=0,(28)

where φ(w)is the cost function, F(w)is the collection

of initial state and dynamics constraints, and G(w)is the

collection of all general equality constraints.

B. Sequential Quadratic Programming (SQP)

SQP based methods apply Newton-type iterations to

Karush-Kuhn-Tucker (KKT) optimality conditions, assuming

some regularity conditions on the constraints [85]. The La-

grangian of the NLP in (28) is deﬁned as:

L(w,λG,λH) = φ(w) + λ>

FF(w) + λ>

GG(w),(29)

with Lagrange multipliers λFand λG, corresponding to the

dynamics and equality constraints. The Newton iterations can

be equivalently computed by solving the following potentially

non-convex QP [86]:

min

δw∇wφ(wi)>δw+1

2δw>Biδw,(30a)

s.t F(wi) + ∇wF(wi)>δw=0,(30b)

G(wi) + ∇wG(wi)>δw=0,(30c)

where the decision variables, δw=w−wi, deﬁne the update

step relative to the current iteration wi, and the Hessian Bi=

∇2

wL(wi,λF,λG). Computing the solution to (30) provides a

candidate decision variable update, δwi, and updated Lagrange

multipliers.

C. Quadratic Approximation Strategy

As we seek to deploy MPC on dynamic robotic platforms, it

is critical that the optimization problem in (30) is well condi-

tioned and does not provide difﬁculty to numerical solvers. In

particular, when Biin (30a) is positive semi-deﬁnite (p.s.d),

the resulting QP is convex and can be efﬁciently solved [66].

To ensure this, an approximate, p.s.d Hessian is used instead

of the full Hessian of the Lagrangian. For the tracking costs

(16), the objective function has a least-squares form in which

case the Generalized Gauss-Newton approximation,

∇2

w1

2ki(w)k2

Wi≈ ∇wi(w)>Wi∇wi(w),(31)

10

proves effective in practice [87]. Similarly, for the soft con-

straints, we exploit to convexity of the penalty function applied

to the nonlinear constraint [88]:

∇2

w(B(h(w))) ≈ ∇wh(w)>∇2

hB(h(w))∇wh(w),(32)

where the diagonal matrix ∇2

hB(h(w)) maintains the curva-

ture information of the convex penalty functions. The contri-

bution of the constraints to the Lagrangian in (29) is ignored

in the approximate Hessian since we do not have additional

structure that allows a convex approximation.

D. Constraint Projection

The equality constraints in IV-G were carefully chosen to

have full row rank w.r.t. the control inputs, such that, after

linearization, ∇wG(wi)>has full row rank in (30c). This

means that the equality constraints can be eliminated before

solving the QP through a change of variables [86]:

δw=Pδ˜

w+p,(33)

where the linear transformation satisﬁes

∇wG(wi)>P=0,∇wG(wi)>p=−G(wi).(34)

After substituting (33) into (30), the following QP is solved

w.r.t. δ˜

w.

min

δ˜

w∇˜

w˜

φ(wi)>δ˜

w+1

2δ˜

w>˜

Biδ˜

w,(35a)

s.t ˜

F(wi) + ∇˜

w˜

F(wi)>δ˜

w=0.(35b)

Because each constraint applies only to the variables at one

node k, the coordinate transformation maintains the sparsity

pattern of an optimal control problem and can be computed

in parallel. Since this projected problem now only contains

costs and system dynamics, solving the QP only requires

one Ricatti-based iteration [69]. The full update δwis then

obtained through back substitution into (33).

E. Line-Search

To select an appropriate stepsize, we employ a line-search

based on the ﬁlter line-search used in IPOPT [60]. In contrast

to a line-search based on a merit function, where cost and

constraints are combined to one metric, the main idea is

to ensure that each update either improves the constraint

satisfaction or the cost function. The constraint satisfaction

θ(w)is measured by taking the norm of all constraints scaled

by the time discretization:

θ(w) = δt F(w)>,G(w)>>2.(36)

In case of high or low constraint satisfaction, the behavior

is adapted: When the constraint is violated beyond a set

threshold, θmax, the focus changes purely to decreasing the

constraints; when constraint violation is below a minimum

threshold, θmin, the focus changes to minimizing costs.

Compared to the algorithm presented in [60], we remove

recovery strategies and second-order correction steps, for

which there is no time in the online setting. Furthermore, the

Algorithm 2 Backtracking Line-Search

1: Hyperparameters: αmin = 10−4, θmax = 10−2, θmin = 10−6, η =

10−4, γφ= 10−6, γθ= 10−6, γα= 0.5

2: α←1.0

3: θk←θ(wi)

4: φk←φ(wi)

5: Accepted ←False

6: while Not Accepted and α≥αmin do

7: θi+1 ←θ(wi+αδw)

8: φi+1 ←φ(wi+αδw)

9: if θi+1 > θmax then

10: if θi+1 <(1 −γθ)θithen

11: Accepted ←True

12: end if

13: else if max(θi+1, θi)< θmin and ∇φ(wi)>δw<0then

14: if φi+1 < φi+ηα∇φ(wi)>δwthen

15: Accepted ←True

16: end if

17: else

18: if φi+1 < φi−γφθior θi+1 <(1 −γθ)θithen

19: Accepted ←True

20: end if

21: end if

22: if Not Accepted then

23: α←γαα

24: end if

25: end while

26: if Accepted then

27: wi+1 ←wi+αδw

28: else

29: wi+1 ←wi

30: end if

history of iterates plays no role since we perform only one

iteration per problem.

The simpliﬁed line-search as used in this work is given in

Algorithm 2 and contains three distinct branches in which a

step can be accepted. The behavior at high constraint violation

is given by line 9, where a step is rejected if the new constraint

violation is above the threshold and worse than the current

violation. The switch to the low constraint behavior is made

in line 13: if both new and old constraint violations are low

and the current step is in a descent direction, we require that

the cost decrease satisﬁes the Armijo condition in line 14.

Finally, the primary acceptance condition is given in line 18,

where either a cost or constraint decrease is requested. The

small constants γφ, and γθare used to ﬁne-tune this condition

with a required non-zero decrease in either quantity.

VI. MOT IO N EXECUTION

The optimized motion planned by the MPC layer consists

of contact forces and desired joint velocities. We linearly

interpolate the MPC motion plan at the 400 Hz execution

rate and apply the feedback gains derived from the Riccati

backward pass to the measured state [83]. The corresponding

torso acceleration is obtained through (8). The numerical

derivative of the planned joint velocities is used to determine a

feedforward joint acceleration. A high-frequency whole-body

controller (WBC) is used to convert the desired acceleration

tasks into torque commands [89]–[91]. A generalized mo-

mentum observer is used to estimate the contact state [92].

Additionally, the estimated external torques are ﬁltered and

added to the MPC and WBC dynamics as described in [44].

We use the same ﬁlter setup as shown in Fig. 13. of [44].

11

A. Event based execution

Inevitably, the measured contact state will be different from

the planned contact state used during the MPC optimization. In

this case, the designed contact forces cannot be provided by the

whole-body controller. We have implemented simple reactive

behaviors to respond to this situation and provide feedback to

the MPC layer.

In case there is a planned contact, but no contact is mea-

sured, we follow a downward regaining motion for that foot.

Under the assumption that the contact mismatch will be short,

the MPC will start a new plan again from a closed contact

state. Additionally, we propagate the augmented system in (9)

with the information that no contact force was generated, i.e.

0!

=Cλsλ+Dλνλ. In this way, the MPC layer will generate

contact forces that maintain the requested smoothness w.r.t.

the executed contact forces.

When contact is measured, but no contact was planned, the

behavior depends on the planned time till contact. If contact

was planned to happen soon, the measured contact is sent

to the MPC to generate the next plan from that early contact

state. In the meantime, the WBC maintains a minimum contact

force for that foot. If no upcoming contact was planned, the

measured contact is ignored.

B. Whole-body control

The whole-body control (WBC) approach considers the full

nonlinear rigid body dynamics of the system in (6), including

the estimate of disturbance forces. Each task is formulated as

an equality constraint, inequality constraint, or least-squares

objective afﬁne in the generalized accelerations, torques, and

contact forces. While we have used a hierarchical resolution

of tasks in the past [91], in this work, we instead use a single

QP and trade off the tracking tasks with weights. We found

that a strict hierarchy results in a dramatic loss of performance

in lower priority tasks when inequalities constraints are active.

Additionally, the complexity of solving multiple QPs and null-

space projections in the hierarchical approach is no longer

justiﬁed with the high quality motion reference coming from

the MPC.

The complete list of tasks is given in Table II. The ﬁrst two

blocks of tasks enforce physical consistency and inequality

constraints on torques, forces, and joint conﬁgurations. The

joint limit constraint is derived from an exponential Control

Barrier Function (CBF) [93] on the joint limits, qj≤qj≤qj,

resulting in the following joint acceleration constraints:

¨

qj+ (γ1+γ2)˙

qj+γ1γ2(qj−qj)≥0,(37)

−¨

qj−(γ1+γ2)˙

qj+γ1γ2(qj−qj)≥0,(38)

with scalar parameters γ1>0, γ2>0. These CBF constraints

guarantee that the state constraints are satisﬁed for all time

and under the full nonlinear dynamics of the system [94].

For the least-square tasks, we track swing leg motion with

higher weight than the torso reference. This prevents that

the robot exploits the leg inertia to track torso references in

underactuated directions, and it ensures that the foot motion

is prioritized over torso tracking when close to kinematics

TABLE II

WHO LE-B ODY C ONT ROL TAS KS

Type Task

=Floating base equations of motion.

No motion at the contact points.

≥

Torque limits.

Friction cone constraint.

Joint limit barrier constraint.

w2

ik·k2

Swing leg motion tracking (wi= 100.0).

Torso linear and angular acceleration (wi= 1.0).

Contact force tracking. (wi= 0.01).

limits. Tracking the contact forces references with a low

weight regulates the force distribution in case the contact

conﬁguration allows for internal forces.

Finally, the torque derived from the whole-body controller,

τwbc ∈R12, is computed. To compensate for model uncer-

tainty for swing legs, the integral of joint acceleration error

with gain K > 0is added to the torque applied to the system:

τi=τi,wbc −KZt

tsw

0

(¨

qi−¨

qi,wbc)dt, (39)

=τi,wbc −K ˙

qi−˙

qi(tsw

0)−Zt

tsw

0

¨

qi,wbcdt!,(40)

where tsw

0is the start time of the swing phase. The acceleration

integral can be implemented based on the measured velocity

˙

qiand the velocity at the start of the swing phase, ˙

qi(tsw), as

shown in (40). Futhermore, the feedback term is saturated to

prevent integrator windup. For stance legs, a PD term is added

around the planned joint conﬁguration and contact consistent

joint velocity.

VII. RES ULTS

ANYmal is equipped with either two dome shaped Robo-

Sense bpearl LiDARs, mounted in the front and back of

the torso, or with four Intel RealSense D435 depth cameras

mounted on each side of the robot. Elevation mapping runs at

20 Hz on an onboard GPU (Jetson AGX Xavier). Control and

state estimation are executed on the main onboard CPU (Intel

i7-8850H,2.6 GHz, Hexa-core) at 400 Hz, asynchronously to

the MPC optimization which is triggered at 100 Hz. Four cores

are used for parallel computation in the MPC optimization.

A time horizon of T= 1.0 s is used with a nominal time

discretization of δt ≈0.015 s, with a slight variation due to the

adaptive discretization around gait transitions. Each multiple-

shooting MPC problem therefore contains around 5000 deci-

sion variables. Part (A) and (B) of perception pipeline in Fig. 3

are executed on a second onboard CPU of the same kind and

provides the precomputed layers over Ethernet.

To study the performance of the proposed controller, we

report results in different scenarios and varying levels of

detail. All perception, MPC, and WBC parameters remain

constant throughout the experiments and are the same for

simulation and hardware. An initial guess for these parameters

was found in simulation, and we further ﬁne-tuned them on

hardware. First, results for the perception pipeline in isolation

are presented in section VII-A. Second, we validate the major

design choices in simulation in section VII-B. Afterward, the

12

Fig. 6. Evaluation of the plane segmentation on a demo terrain [71]. The

shown map has a true size of 20×20×1 m with a resolution of 4 cm. Top

left shows the elevation map with additive uniform noise of ±2 cm plus

Gaussian noise with a standard deviation of 2 cm. Top right shows the map

after inpainting, ﬁltering, steppability classiﬁcation, and plane segmentation.

Below, four areas of interest are shown. Their original location in the map is

marked in the top right image.

proposed controller is put to the test in challenging simula-

tion, as well as hardware experiments in section VII-C. All

experiments are shown in the supplemental video [1]. Finally,

known limitations are discussed in section VII-D.

A. Perception Pipeline

The output of the steppability classiﬁcation and plane seg-

mentation (part A in Fig. 3) for a demo terrain is shown in

Fig. 6. This terrain is available as part of the gridmap library

and contains a collection of slopes, steps, curvatures, rough

terrain, and missing data. The left middle image shows that

slopes and steps are, in general, well segmented. In the bottom

right image, one sees the effect of the plane segmentation on

a curved surface. In those cases, the terrain will be segmented

into a collection of smaller planes. Finally, the rough terrain

sections shown in the right middle and bottom left image show

that the method is able to recognize such terrain as one big

planar section as long as the roughness is within the speciﬁed

tolerance. These cases also show the importance of allowing

holes in the segmented regions, making it possible to exclude

just those small regions where the local slope or roughness is

outside the tolerance. A global convex decomposition of the

map would result in a much larger amount of regions.

The computation time for the construction and querying

of the signed distance ﬁeld is benchmarked on sub-maps of

varying sizes extracted from the demo map, see Fig. 7. As

expected, the construction time scales linearly with the SDF

size, and the query time is constant with a slight increase when

the memory size exceeds a cache level. During runtime, the

local SDF size is typically below 105voxels, resulting in a

computation time well below 10 ms. Together with the map

update rate of 20 Hz, the proposed method provides the SDF

Fig. 7. Computation time for constructing and querying the signed distance

ﬁeld. Submaps of the terrain in Fig. 6 are used. SDF size on the horizontal axis

denotes the total amount of data points in the SDF (width×length×height).

The query time is reported for the total of 103random queries for the

interpolated value and derivative.

Fig. 8. ANYmal stepping up a box of 35 cm. Left: Without considering knee

collisions. Right: Knee collision included in the optimization.

at an order of magnitude faster than methods that maintain a

general 3D voxel grid, with update rated reported around 1 Hz

[77]. Per MPC iteration, around 103SDF queries are made,

making the SDF query time negligible compared to the total

duration of one MPC iteration.

B. Simulation

1) Collision avoidance: To highlight the importance of

considering knee collisions with the terrain, the robot is

commanded to traverse a box of 35 cm with a trotting gait

at 0.25 m s−1. Fig. 8 compares the simulation result of this

scenario with and without the knee collisions considered.

The inclusion of knee collision avoidance is required to

successfully step up the box with the hind legs. As shown

in the ﬁgure, the swing trajectories are altered. Furthermore,

the base pose and last stepping location before stepping up

are adjusted to prepare for the future, showing the beneﬁt

of considering all degrees of freedom in one optimization.

Similarly, on the way down, the foothold optimization (within

constraints) allows that the feet are placed away from the step,

avoiding knee collisions while stepping down.

Fig. 9 provides insight into the solver during the motion

performed with the knee collisions included. The four peaks

in the cost function show the effect of the collision avoidance

penalty when the legs are close to the obstacle during the step

up and step down. Most of the time, the step obtained from

the QP subproblem is accepted by the line-search with the

full stepsize of 1.0. However, between 7and 8 s the stepsize

is decreased to prevent the constraint violation from further

rising. This happens when the front legs step down the box and

are close to collision. In those cases, the collision avoidance

penalty is highly nonlinear, and the line-search is required

to maintain the right balance between cost decrease and

13

0

50

100

Cost

10−6

10−4

10−2

100

Constraint violation

0 2 4 6 8 10 12 14

time [s]

0.00

0.25

0.50

1.00

Step size

Constraint

Cost OR Constraint

Fig. 9. Solver status during the box traversal motion (including knee collision

avoidance). The ﬁrst and second plots show the total cost, and constraint

violation according to (36), after each iteration. The bottom plot shows the

stepsize and the line-search branch that led to the step acceptance. ‘Constraint‘

refers to a step accepted in the high constraint violation branch in line 9 of

Algorithm 2, ‘Cost OR Constraint‘ refers to the branch where either cost

or constraint decrease is accepted in line 18. Note that the low constraint

violation branch, line 13, did not occur in this experiment.

constraint satisfaction. We note that the line-search condition

for low constraint violation is typically not achieved when

using only one iteration per MPC problem.

2) Model selection: In the same scenario, we compare the

performance of the proposed dynamics for the base with those

of the commonly used single rigid body dynamics (SRBD). To

be precise, the torso dynamics in (8) are evaluated at a constant

nominal joint conﬁguration and with zero joint velocities,

while the rest of the controller remains identical. When using

the SRBD, the model does not describe the backward shift in

the center of mass location caused by the leg conﬁguration.

The result is that the controller with the SRBD model has

a persisting bias that makes the robot almost tip over during

the step up. This model error is quantiﬁed in Fig. 10. At 30◦

pitch angle, there is a center of mass error of 2.6 cm, resulting

in a bias of 13.3 N m at the base frame. For reference, this is

equivalent to an unmodelled payload of 3.6 kg at the tip of the

robot. The proposed model fully describes the change in inertia

and center of mass location and therefore does not have any

issue to predict the state trajectory during the step up motion.

3) Solver Comparison: To motivate our choice to imple-

ment a multiple-shooting solver and move away from the

DDP-based methods used in previous work, we compare both

approaches on ﬂat terrain and the stepping stone scenario

shown in Fig. 11. In particular, we compare against iLQR [62]

and implement it with the same constraint projection, line-

search, and the Riccati Backward pass of HPIPM as described

in section V. The key difference between the algorithms lies

in the update step. For multiple-shooting, we update both state

and inputs directly: u+

k=uk+αδuk,x+

k=xk+αδxk. In

contrast, iLQR proceeds with a line-search over closed-loop

nonlinear rollouts of the dynamics:

u+

k=uk+αkk+Kkx+

k−xk,(41)

x+

k+1 =fd

k(x+

k,u+

k),x+

0=ˆ

x,(42)

x

Fig. 10. The location of the center of mass (CoM) in heading direction for

various torso pitch angles. The ﬁrst set of CoM locations is evaluated with the

true joint angles, which are obtained when aligning the legs with the gravity

direction as in the top image. This corresponds to the reference in section

IV-E, which is tracked by the MPC. The second set of CoM locations is

evaluated for the default joint angles, shown in the bottom image, as assumed

by the SRBD model.

where Kkis the optimal feedback gain obtained from the

Riccati Backward pass and kk=δuk−Kkδxkis the control

update. Due to this inherently single-threaded process, each

line-search for iLQR takes four times as long as for the

multi-threaded multiple-shooting. However, note that with the

hybrid multiple-shooting-iLQR variants in [68] this difference

vanishes.

Table III reports the solvers’ average cost, dynamics con-

straint violation, and equality constraint violation for a trotting

gait in several scenarios. As a baseline, we run the multiple-

shooting solver until convergence (with a maximum of 50

iterations) instead of real-time iteration. To test the MPC

in isolation, we use the MPC dynamics as the simulator

and apply the MPC input directly. Because of the nonlinear

rollouts of iLQR, dynamics constraints are always satisﬁed,

and iLQR, therefore, has the edge over multiple-shooting on

this metric. However, as the scenario gets more complex and

the optimization problem becomes harder, there is a point

where the forward rollout of iLQR is unstable and diverges.

For the scenario shown in Fig. 11, this happens in the place

where the robot is forced to take a big leap at the 63 %

mark and at the 78 % mark where the hind leg is close to

singularity as the robot steps down. The continuous time

variant SLQ [39] fails in similar ways. These failure cases

are sudden, unpredictable, and happen regularly when testing

on hardware, where imperfect elevation maps, real dynamics,

and disturbances add to the challenge. The absence of long

horizon rollouts in the multiple-shooting approach makes it

more robust and better suited for the scenarios shown in this

work. For cases where both solvers are stable, we ﬁnd that

the small dynamics violation left with multiple-shooting in a

real-time iteration setting does not translate to any practical

performance difference on hardware. Finally, even for the

most challenging scenario, multiple-shooting with real-time

iteration remains within 10 % cost of the baseline.

4) Contact feedback: The reactive behavior under a mis-

match in planned and sensed contact information is shown

in the accompanying video. First, the sensed terrain is set to

14

Fig. 11. ANYmal traversing stepping stones in simulation (right to left). The resulting state trajectories for feet and torso, and the snapshots are shown for

a traversal with the multiple-shooting solver and a trotting gait at 0.75 m s−1. The marked 63 % and 78 % locations indicate where the alternative solver,

iLQR, diverges for 0.5 m s−1and 0.75 m s−1, respectively.

TABLE III

SOLVER COMPARISON ON FLAT TERRAIN AND STEPPING STONES. THE

BASELINE ITERATES UNTIL CONVERGENCE INSTEAD OF USING

RE AL-TIME ITERATION.

Baseline Multiple

shooting iLQR

Flat - 0.50 m/s

Cost 54.19 54.17 54.18

Dynamics Constr. 1.70 ×10−73.41 ×10−30.0

Equality Constr. 2.28 ×10−63.51 ×10−33.51 ×10−3

Stones - 0.25 m/s

Cost 151.92 156.22 156.69

Dynamics Constr. 3.58 ×10−51.01 ×10−20.0

Equality Constr. 7.24 ×10−42.28 ×10−22.14 ×10