Natural and Robust Walking using Reinforcement
Learning without Demonstrations in
High-Dimensional Musculoskeletal Models
Pierre Schumacher1,2, Thomas Geijtenbeek3, Vittorio Caggiano4, Vikash Kumar4, Syn Schmitt5,
Georg Martius1,6, Daniel F. B. Haeufle2,7
Fig. 1: We achieve robust and energy-efficient natural walking with RL on a series of human models. Left to right: H0918, H1622, H2190, MyoLeg, and an uneven terrain environment.
Abstract—Humans excel at robust bipedal walking in complex
natural environments. In each step, they adequately tune the
interaction of biomechanical muscle dynamics and neuronal
signals to be robust against uncertainties in ground conditions [1].
However, it is still not fully understood how the nervous system
resolves the musculoskeletal redundancy to solve the multi-
objective control problem considering stability, robustness, and
energy efficiency. In computer simulations, energy minimization
has been shown to be a successful optimization target, reproducing
natural walking with trajectory optimization [2] or reflex-
based control methods [3]–[6]. However, these methods focus
on particular motions at a time and the resulting controllers are
limited when compensating for perturbations [6]–[9]. In robotics,
reinforcement learning (RL) methods recently achieved highly
stable (and efficient) locomotion on quadruped systems [10],
[11], but the generation of human-like walking with bipedal
biomechanical models has required extensive use of expert data
sets [12]–[14]. This strong reliance on demonstrations often results
in brittle policies [15] and limits the application to new behaviors,
especially considering the potential variety of movements for
high-dimensional musculoskeletal models in 3D. Achieving natural
locomotion with RL without sacrificing its incredible robustness
might pave the way for a novel approach to studying human
walking in complex natural environments [16].
Index Terms—biomechanics, motor control, reinforcement
learning, human walking
1 Max-Planck Institute for Intelligent Systems, Tübingen, Germany
2 Hertie Institute for Clinical Brain Research and Center for Integrative Neuroscience, Tübingen, Germany
3 Goatstream, The Netherlands
4 Meta AI, New York, USA
5 Institute for Modelling and Simulation of Biomechanical Systems, University of Stuttgart, Germany
6 Department of Computer Science, University of Tübingen, Germany
7 Institute of Computer Engineering, Heidelberg University, Germany
I. INTRODUCTION
The aim of this study is to demonstrate that RL methods
can generate robust controllers for musculoskeletal models.
The novelty of our work is that we do not try to achieve
human-like behavior with RL by relying on kinematic data, but
only on biologically plausible objectives in combination with
realistic biomechanical constraints embedded into simulation
engines. These evolutionary priors have the potential to be
general enough to allow for the reproduction of natural gait,
similar to the achievements of reflex control policies, but with
the potential for generating diverse and robust behaviors under
many different conditions.
We specifically propose a reward function restricted to
metrics that are considered plausible objectives for biological
organisms, while using experimental human data only to modify
the relative importance of the different metrics, similar to
[17]–[19]. This goes beyond previous works applying RL to
biomechanical models which either study low-dimensional
systems [20], make use of expert data [14] during training or
learn unrealistic movements [21], [22]. We propose a reward
function based only on walking speed, joint pain, and muscle
effort, achieving periodic gaits that resemble human walking
kinematics and ground reaction forces (GRF) more closely than
comparable RL approaches [3], [23]–[25]. Furthermore, the
learning approach generated walking in 4 different models and
2 simulation engines of differing biomechanical complexity
and accuracy with an identical training protocol and without
changing the reward function.
The simpler 2D and 3D models are comparable in complexity
and almost reach the naturalism of existing optimal control-
and reflex-based frameworks [4], [5], [26]. While their 80 or 90
muscle counterparts are substantially more challenging to con-
trol, our approach still achieved gaits with kinematics and GRFs
similar to experimental human data, albeit with more artifacts,
potentially related to biomechanical modeling accuracy. Achiev-
ing gaits in these complex models is a step towards applications
in rehabilitation, neuroscience, and computer graphics requiring
high-dimensional models in complex environments. Particularly striking is the robustness of the learned controllers, which exhibit diverse stabilization strategies when faced with dynamic perturbations, to an extent unseen in previous reflex-based controllers [8], [9], [26]–[28]. As the used reward terms are considered plausible
objectives for biological organisms, the general approach may
also be applicable to different movements. Therefore, we
believe that this approach is a useful starting point for the
community showing that RL is a viable candidate to investigate
the highly robust nature of complex human movements.
II. RESULTS
Our framework is built upon the recently published DEP-
RL [22] approach to learning feedback controllers for muscu-
loskeletal systems. DEP-RL has been shown to achieve robust
locomotion in several tasks, including running with a high-
dimensional (120 muscles) bipedal ostrich model, by proposing
a novel exploration scheme for overactuated systems. The
learned behaviors, however, still exhibited unnatural artifacts,
such as large co-contraction levels and excessive bending of
several joints.
Here, we extend that work by introducing an adaptive reward
function that accounts for biologically plausible incentives.
These incentives result in gaits that resemble human walking much more closely. Furthermore, the reward function is general
enough to generate gaits across several models with up to
90 muscles in two and three dimensions and in simulators of
differing biomechanical modeling accuracy without changes in
the weighting of the incentives in the reward function. Only the
network size was decreased for the low-dimensional models
to benefit from the computational speed-up.
A. Reward function
Building on previous work on gait optimization [5], we
found that a natural gait can be achieved with RL by using
objectives that incentivize:
1) learning to maintain a given speed without falling down,
2) minimizing effort, and
3) minimizing pain.
Thus, our reward function contains three main terms:
$$r = r_{\text{vel}} - c_{\text{effort}} - c_{\text{pain}}. \qquad (1)$$
The first term specifies the external task the agent should solve.
As we want the agent to move at a walking pace while keeping
its balance, we chose the following objective:
$$r_{\text{vel}} = \begin{cases} \exp\!\left(-(v - v_{\text{target}})^2\right) & \text{if } v < v_{\text{target}} \\ 1 & \text{otherwise}, \end{cases} \qquad (2)$$

where $v$ is the center-of-mass velocity and the target velocity $v_{\text{target}}$ is chosen to be 1.2 m/s, which is close to the average energetically optimal human walking speed [29]. The velocity reward is constant above the target velocity to improve the optimization of the auxiliary cost terms, inspired by a recent study on reward shaping in robotics [30].
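As a minimal illustration, the velocity term could be computed as in the following sketch (names and array handling are ours, not the released code):

```python
import numpy as np

V_TARGET = 1.2  # m/s, energetically optimal walking speed used in the paper

def velocity_reward(com_velocity: float, v_target: float = V_TARGET) -> float:
    """Eq. (2): Gaussian-shaped reward below the target speed, constant above it."""
    if com_velocity < v_target:
        return float(np.exp(-(com_velocity - v_target) ** 2))
    return 1.0
```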
Important for achieving natural human walking is the use
of minimal muscle effort, as the literature suggests that energy
efficiency is a key component of human locomotion [31], [32]:
$$c_{\text{effort}} = \alpha(t)\, a^3 + w_1 (u - u_{\text{prev}})^2 + w_2 N_{\text{active}}, \qquad (3)$$

where the first term penalizes muscle activity $a$ [33], the second term incentivizes smoothness of muscle excitations $u$, and the third term $N_{\text{active}}$ incentivizes a small number of active muscles (penalizing activity exceeding a certain value).
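A per-step sketch of this effort cost; the 15% activity threshold and the weights follow Tab. IV, while the averaging of the activity and smoothness terms over muscles, and all function and variable names, are our assumptions:

```python
import numpy as np

def effort_cost(activations: np.ndarray,
                excitations: np.ndarray,
                prev_excitations: np.ndarray,
                alpha_t: float,
                w1: float = 0.097,
                w2: float = 1.579,
                activity_threshold: float = 0.15) -> float:
    """Eq. (3): adaptive cubic-activity penalty + excitation smoothness + active-muscle count."""
    cubic_activity = np.mean(activations ** 3)                    # a^3, averaged over muscles (our assumption)
    smoothness = np.mean((excitations - prev_excitations) ** 2)   # (u - u_prev)^2
    n_active = float(np.sum(activations > activity_threshold))    # muscles above 15% activity (Tab. IV)
    return alpha_t * cubic_activity + w1 * smoothness + w2 * n_active
```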
From a technical standpoint, it proved challenging to effectively minimize muscle activity. A cost scale strong enough to yield energy-efficient walking later in training causes a performance collapse when enabled from the start. We therefore chose an approach rooted in constrained optimization [34]: an adaptation mechanism for the weighting parameter $\alpha(t)$ that increases the weight only when the agent performs well in the main task ($r_{\text{vel}}$) and decreases it when this constraint is violated. Concretely, we measure the performance by the task return. The details are provided in Alg. 1.
This adaptive learning mechanism can be applied to each
model and removes the need for hand-tuning of schedules. A
change in reward function over time could, however, destabilize
learning, as previously collected environment transitions are
not reflective of the current effort cost anymore [35]. We,
therefore, monitor the performance of the policy in the current
environment, while the effort cost is only applied at the moment data is sampled from the replay buffer. This relabeling of
previously collected data ensures that our off-policy algorithm
can make efficient use of the full replay buffer.
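One way to realize this relabeling, sketched under our assumptions about how transitions are stored (the released implementation may differ), is to keep the reward components separate in the buffer and recombine them with the current weight α(t) only when a batch is drawn:

```python
import numpy as np

class RelabelingBuffer:
    """Replay buffer that applies the adaptive effort weight only at sample time."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage = []  # (obs, action, next_obs, r_vel, cubic_activity, fixed_costs)

    def add(self, obs, action, next_obs, r_vel, cubic_activity, fixed_costs):
        # fixed_costs = w1*(u - u_prev)^2 + w2*N_active + c_pain (independent of alpha)
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append((obs, action, next_obs, r_vel, cubic_activity, fixed_costs))

    def sample(self, batch_size: int, alpha_t: float, rng=np.random):
        idx = rng.randint(0, len(self.storage), size=batch_size)
        batch = [self.storage[i] for i in idx]
        # Relabel: recombine stored components with the *current* weight alpha_t,
        # so previously collected transitions reflect the present effort cost.
        rewards = np.array([r_vel - alpha_t * cubic - fixed
                            for (_, _, _, r_vel, cubic, fixed) in batch])
        return batch, rewards
```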
Algorithm 1 Effort weight adaptation.
Require: threshold $\theta$, smoothing $\beta$, change in adaptation rate $\Delta\alpha$, decay term $\lambda \in [0, 1]$
  $r_{\text{mean}} \leftarrow 0$, $\alpha_t \leftarrow 0$, $s_{\text{mean}} \leftarrow 0$
  while True do
    $r \leftarrow$ train_episode()  ▷ return from episode
    $r_{\text{mean}} \leftarrow \beta\, r_{\text{mean}} + (1 - \beta)\, r$
    if $r_{\text{mean}} > \theta$ and $s_{\text{mean}} < 0.5$ then
      $\Delta\alpha \leftarrow \lambda \cdot \Delta\alpha$  ▷ performance newly high: slow down adaptation
    else if $r_{\text{mean}} > \theta$ and $s_{\text{mean}} > 0.5$ then
      $\alpha_{t+1} \leftarrow \alpha_t + \Delta\alpha$  ▷ performance high for long
    else
      $\alpha_{t+1} \leftarrow \alpha_t - \Delta\alpha$  ▷ performance too low
    end if
    $s_{\text{target}} \leftarrow 1$ if $r_{\text{mean}} > \theta$, else $0$
    $s_{\text{mean}} \leftarrow \beta\, s_{\text{mean}} + (1 - \beta)\, s_{\text{target}}$
  end while
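Translated into code, the adaptation of Alg. 1 could look like the sketch below; train_episode is a placeholder for one training episode returning its task return, the constants follow Tab. IV, and the clamp to non-negative weights is our own assumption:

```python
def adapt_effort_weight(train_episode,
                        threshold: float = 1000.0,   # theta
                        smoothing: float = 0.8,      # beta
                        delta_alpha: float = 9e-4,   # change in adaptation rate
                        decay: float = 0.9,          # lambda
                        n_episodes: int = 10_000) -> float:
    """Sketch of Alg. 1: raise the effort weight only while the task constraint is satisfied."""
    r_mean, alpha, s_mean = 0.0, 0.0, 0.0
    for _ in range(n_episodes):
        r = train_episode(alpha)                      # run one episode with the current weight
        r_mean = smoothing * r_mean + (1 - smoothing) * r
        if r_mean > threshold and s_mean < 0.5:
            delta_alpha *= decay                      # performance newly high: slow down adaptation
        elif r_mean > threshold and s_mean > 0.5:
            alpha += delta_alpha                      # performance high for long: increase weight
        else:
            alpha -= delta_alpha                      # performance too low: decrease weight
        alpha = max(alpha, 0.0)                       # keep the weight non-negative (our assumption)
        s_target = 1.0 if r_mean > threshold else 0.0
        s_mean = smoothing * s_mean + (1 - smoothing) * s_target
    return alpha
```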
TABLE I: All used models. Trunk means that the trunk and the pelvis of the model can move separately. Toes means that the toes and the rest of the foot can move separately. The designation 3D marks models that can walk in full 3D, as opposed to planar movements.

Model    # DOFs   # muscles   3D   trunk   toes   engine
H0918    9        18          –    –       –      Hyfydy
H1622    16       22          ✓    –       –      Hyfydy
H2190    21       90          ✓    ✓       –      Hyfydy
MyoLeg   20       80          ✓    –       ✓      MuJoCo

The third term $c_{\text{pain}}$ is necessary to prevent unnatural optima. One striking example is the over-use of mechanical forces at the joint limits (e.g., massive knee over-extension) to keep a straight leg while minimizing muscle activity. As this is clearly unnatural behavior, we include objectives that account for the notion of pain:

$$c_{\text{pain}} = w_3 \sum_i \tau_i^{\text{lim}} + w_4 \sum_j F_j^{\text{GRF}}, \qquad (4)$$

where $\tau_i^{\text{lim}}$ is the torque with which the joint angle limit of joint $i$ is violated (joint-limit pain) and $F_j^{\text{GRF}}$ is the vertical ground reaction force (GRF) for foot $j$ (joint-loading pain). We only penalize GRFs if they exceed 1.2 times the model's body weight [36], [37], such that all pain cost terms vanish close to the natural gait and do not further bias the solution.
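A sketch of the pain term and the resulting total reward of Eq. (1); the weights follow Tab. IV, and penalizing only the excess force above 1.2 body weights is our reading of the text rather than a confirmed implementation detail:

```python
import numpy as np

def pain_cost(joint_limit_torques: np.ndarray,
              vertical_grfs: np.ndarray,
              body_weight: float,
              w3: float = 0.131,
              w4: float = 0.073,
              grf_threshold: float = 1.2) -> float:
    """Eq. (4): joint-limit pain plus vertical GRFs exceeding 1.2 body weights."""
    limit_pain = np.sum(np.abs(joint_limit_torques))
    excess_grf = np.sum(np.maximum(vertical_grfs - grf_threshold * body_weight, 0.0))
    return w3 * limit_pain + w4 * excess_grf

def total_reward(r_vel: float, c_effort: float, c_pain: float) -> float:
    """Eq. (1): task reward minus effort and pain costs."""
    return r_vel - c_effort - c_pain
```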
We tuned the cost term weights $w_i$ for $i \in \{1, \dots, 4\}$ by
first separating the kinematic data into gait cycles for each
leg, starting and ending when the respective foot touches the
ground. The resulting data is then averaged over all gait cycles
recorded from both legs. The average trajectory is finally
compared to its equivalent obtained from experimental human
data. The experimental match, defined as the fraction of the
gait cycle for which the average simulated trajectory overlaps
within the standard deviation of experimental data, serves as
an optimization metric for our cost terms. We note that the
coefficients are identical across all joints and muscles, and stress
that no human data was used during the learning process, but
only to find weighting coefficients. This procedure is similar
to [17], with the difference that we search for values that work
across a range of models, instead of optimally for one model.
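A sketch of how this experimental-match metric could be computed under our assumptions: each gait cycle is resampled to a common grid, cycles from both legs are pooled and averaged, and we count the fraction of the cycle where the simulated mean lies inside the one-standard-deviation band of the experimental data:

```python
import numpy as np

def segment_gait_cycles(signal: np.ndarray, foot_contact: np.ndarray,
                        n_samples: int = 100) -> np.ndarray:
    """Split a joint trajectory at foot-strike events and resample each cycle."""
    strikes = np.where((foot_contact[1:] > 0) & (foot_contact[:-1] == 0))[0] + 1
    cycles = []
    for start, end in zip(strikes[:-1], strikes[1:]):
        x_old = np.linspace(0.0, 1.0, end - start)
        x_new = np.linspace(0.0, 1.0, n_samples)
        cycles.append(np.interp(x_new, x_old, signal[start:end]))
    return np.asarray(cycles)

def experimental_match(sim_cycles: np.ndarray,
                       exp_mean: np.ndarray,
                       exp_std: np.ndarray) -> float:
    """Fraction of the gait cycle where the averaged simulated trajectory
    lies inside the +-1 std band of the experimental reference.

    sim_cycles: (n_cycles, n_samples) trajectories, one row per gait cycle (both legs pooled).
    exp_mean, exp_std: (n_samples,) experimental reference on the same grid.
    """
    sim_mean = sim_cycles.mean(axis=0)
    inside = np.abs(sim_mean - exp_mean) <= exp_std
    return float(inside.mean())
```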
Finally, we initialize the models with a randomized initial state that starts with one elevated leg, while we also clip all muscle excitations to lie between 0 and 0.5 to further reduce muscle effort and mitigate asymmetries caused by the initial state distribution.
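For illustration, the excitation clipping can be realized as a thin environment wrapper; the step interface follows the common Gym convention, and the wrapper itself is our sketch rather than the released code:

```python
import numpy as np

class ClipExcitationWrapper:
    """Clips all muscle excitations to [0, 0.5] before they reach the simulator."""

    def __init__(self, env, low: float = 0.0, high: float = 0.5):
        self.env, self.low, self.high = env, low, high

    def step(self, action):
        return self.env.step(np.clip(action, self.low, self.high))
```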
B. Models
With the reward function and the RL approach described
above, we are able to learn robust control policies for several
models of human walking, with varying complexity, and
across two different simulation engines with different levels of
biomechanical accuracy (see Fig. 1):
H0918 A planar Hyfydy model with 9 degrees-of-freedom
(DOFs) and 18 muscles, based on [38].
H1622 A 3D Hyfydy model with 16 DOFs and 22 muscles,
based on [38].
H2190 A 3D Hyfydy model with 21 DOFs and 90 muscles,
and articulation between the otherwise rigid pelvis and torso,
based on [38]–[40].
MyoLeg A 3D MuJoCo model with 20 DOFs and 80
muscles, based on [39]. As for the H0918 and H1622 models,
the pelvis and torso are one rigid body part, while each foot
contains articulated toes (all five toes are joined into one body
segment). See Tab. I for a summary of the models.
C. Simulation engine
The simulation engines used for each model are indicated
in the description and are either: a) Hyfydy [41], which was
used via the SCONE Python API [37], or b) MuJoCo, which
was used via the MyoSuite [42] environment. We chose these
two engines to highlight the versatility of our approach but
also to bridge two communities: biomechanics and RL.
Hyfydy is an engine built for biomechanical accuracy. It is
closely related to the well-established OpenSim [43] framework,
matching its level of detail in muscle and deformation-based
contact-force models while providing increased computational
performance. MuJoCo is a fast simulation framework widely
used in the robotics and RL community. It also offers a
simplified muscle model with rigid tendons and resolves contact
forces using the convex Gauss Principle. The MyoSuite [42]
builds on this framework, allowing for the development of
high-dimensional muscle-control models which have recently
gained a lot of interest from the RL community [25], [44],
[45]. Both engines achieve the required computational speed
to train control policies for these high-dimensional models in
under a day. See Suppl. A for more technical details.
D. Learned behaviors
We first show that with our framework, we can train agents
across 4 different models to produce walking gaits with the
same training approach and reward function. In Figure 2 we
compare the resulting gait kinematics against experimental data,
included in the SCONE software [37], [46]. Kinematics are
shown for 5 rollouts of the most human-like policy checkpoint
that was achieved over the entire training run over 10 random
seeds, averaged over all gait cycles of both legs in a 10 s walk¹.
The results for the planar H0918 and the 3D H1622 model
look very similar to the experimental data, even though the an-
kle kinematics differ slightly. While the agents achieve the most
human-like gaits here, the models are also of limited complexity
and applicability, compared to the high-dimensional systems,
H2190 and MyoLeg. As seen in Tab. II and Fig. 2, our approach
still achieves periodic gaits resembling human kinematics with
the difficult-to-control 80 and 90 muscle models, even though
they contain more artifacts. The H2190-agent exhibits less
knee flexion and the MyoLeg-agent lacks the double-peaked
GRF structure; it also exhibits differences in the hip kinematics.
Overall, the behavior of the H2190 model appears more natural
than the one produced with the MyoLeg model, see also the
discussion in Sec. III and the supplementary videos.
Nevertheless, Tab. II shows that RL gaits not only approxi-
mate human walking but are also robust and energy-efficient
across all models, without changes in the reward function and
only minimal changes in the hyperparameters of the RL method.
¹For videos, see: https://sites.google.com/view/naturalwalkingrl
[Figure 2 panels: hip, knee, and ankle angles [rad] and GRF [BW] over the gait cycle [%] for H0918, H1622, H2190, and MyoLeg; RL in red, human data in grey.]
Fig. 2: Gait kinematics for RL agents for all models Shown are the hip, knee, ankle and GRF values averaged over 5
rollouts of 10 s walking on flat ground. We excluded rollouts that did not achieve the whole episode length to clearly highlight
the achieved kinematics. While there are slight discrepancies between experimental data (grey) and the RL behaviors (red),
especially for the high-dimensional models, the proposed reward function provides a strong starting point for researchers aiming
to create robust and natural controllers for high-dimensional musculoskeletal systems. Also, see the videos on the website.
We provide the training curves and additional metrics for 10
random seeds in Suppl. B.
In order to probe the robustness of our controllers, we
perform roll-outs on uneven terrain, which was not seen during
training. The entire training procedure was performed with flat
ground. The generated terrain contains 10 tiles of 1 m length with random slopes of ±5° and is fixed for all evaluations.
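A sketch of how such a piecewise-sloped terrain could be generated; tile count, tile length, and slope range follow the text, while the height-field representation and the fixed random seed are our assumptions:

```python
import numpy as np

def make_rough_terrain(n_tiles: int = 10, tile_length: float = 1.0,
                       max_slope_deg: float = 5.0, points_per_tile: int = 50,
                       seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Piecewise-linear height profile: n_tiles tiles with random slopes in [-5, +5] degrees."""
    rng = np.random.default_rng(seed)          # fixed seed -> identical terrain for all evaluations
    slopes = np.tan(np.deg2rad(rng.uniform(-max_slope_deg, max_slope_deg, size=n_tiles)))
    x = np.linspace(0.0, n_tiles * tile_length, n_tiles * points_per_tile)
    heights = np.zeros_like(x)
    h = 0.0
    for i, slope in enumerate(slopes):
        mask = (x >= i * tile_length) & (x <= (i + 1) * tile_length)
        heights[mask] = h + slope * (x[mask] - i * tile_length)
        h += slope * tile_length               # keep consecutive tiles connected
    return x, heights
```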
The behavior of the planar H0918-model is compared against
a popular reflex-based controller as an illustrative example,
adapted from [26], [37] and included with the SCONE software.
We were only able to use this simple reflex-based controller
with the H0918 model, as it did not produce stable gaits with
the other models. We train 5 reflex-based controllers with
different initializations until convergence, while we use the
most natural RL policy for each model and perform 20 roll-
outs with randomized initial states to test the robustness. We
chose this approach as reflex-based controllers are sensitive to
the initial simulation state; different roll-outs would be almost
identical if we had to use similar starting states.
While both approaches adequately match human kinematics
with low energy consumption in the planar case, the reflex-
based controller produces more natural gaits. However, when
exposed to uneven terrain, the RL agent achieves an average
distance of 10.42 m, which shows that it is much more robust
than the reflex controller with an average distance of 2.46 m,
see Tab. II. Both controllers also induce similar average muscle
activities over the gait cycle, with the RL agent inducing less
smooth activity, shown in Fig. 3.
With the same framework, we were also able to train
agents to learn maximum speed running, by simply using
the achieved velocity as the velocity reward in our reward
function. Additionally, the action clipping and effort costs
were omitted, as energy consumption is less critical for short
maximum performance tasks. See Suppl. C for these results.
As a showcase of the extreme robustness of the RL agents,
we generated a difficult drawbridge-terrain task with moveable
environment elements that present dynamic perturbations, see
Fig. 6b. We test the robustness of H1622 and H2190 RL
controllers in this scenario, even though they were only ever
trained on flat ground, and observe remarkable stability across
the task. We report the data in Tab. III and in the videos.
Note that we tried several alternatives to our approach which
yielded worse results. We performed experiments with different
reward terms such as a constant instead of an adaptive effort
term, with metabolic energy costs [3] or with a cost of transport
[47], [48] reward. Even though these terms sometimes lead to
small muscle activity during execution, the kinematics were
further away from human data. We conjecture that energy
TABLE II: The table shows the average cubic muscle activity (effort), the percentage match with human experimental data (exp. match), and the average distance walked on the rough terrain. Note that the exp. match metric measures the percentage of the gait cycle during which the trajectory perfectly lies inside the standard deviation of the experimental data. Even relatively natural gaits can still achieve a low metric if the angles are slightly shifted.

controller   system   avg. effort         exp. match    avg. distance [m]
reflex       H0918    0.041 ± 3 × 10⁻³    0.68 ± 0.08   2.46 ± 0.98
RL           H0918    0.013 ± 3 × 10⁻⁴    0.67 ± 0.03   10.42 ± 0.94
RL           H1622    0.015 ± 2 × 10⁻³    0.73 ± 0.01   5.6 ± 0.99
RL           H2190    0.017 ± 1 × 10⁻⁵    0.50 ± 0.01   10.59 ± 2.51
RL           MyoLeg   0.013 ± 2 × 10⁻⁴    0.43 ± 0.05   n.a.
minimization is not enough of an incentive for human-like gait
if the learning algorithm is as flexible as an RL agent. See
also Fig. 8 for ablations of our reward function.
Larger effort term exponents, penalization of contacts
between limbs or angle-based joint limit violation costs did
not lead to better behavior. The prescription of hip movement
at a certain frequency (step clock), keeping certain joint angles
in pre-specified positions or minimizing torso rotation helped
to achieve stable gaits, but prevented effort minimization and
did not lead to natural kinematics.
III. DISCUSSION
As the human biomechanical system is highly redundant,
there are many possible solutions to walking at a defined
speed. There exists strong evidence that natural human walking
is in part driven by energy-efficiency [49]. Optimal control
approaches have shown that natural walking kinematics can
be achieved if energy optimality is considered in the cost
function.²
However, in most RL approaches, energy consumption is
either ignored, or only static action regularization is used, which
affects learning but does not yield truly efficient behavior. By
introducing a single reward term schedule that adapts the
weighting of the energy term in the reward function depending
on the current performance, we achieved energy-efficient
gaits with more natural kinematics also in RL. Moreover,
the adaptation algorithm (Alg. 1) and all other reward terms
and their weighting coefficients are general enough to work—
without any changes—across 2D and 3D models with different
numbers of muscles and even different levels of biomechanical
modeling accuracy.
This is a significant step towards finding a general reward
function and framework to generate natural and robust move-
ments with RL in muscle-driven systems. Other RL frameworks
that do achieve natural muscle utilization either consider low-
dimensional systems [23] or strongly rely on motion capture
data [51] to render the learning problem feasible. Our approach
works without the use of motion capture data during training
²Some also suggest that muscle fatigue could be the driving factor to explain
the experimentally observed kinematic patterns [50].
[Figure 3 panels: activities of the HAM, BF, GLU, ILP, VAS, GAS, SOL, TA, and REF muscles over the gait cycle [%]; RL vs. reflex-based.]
Fig. 3: Muscle activation for RL agent and reflex-based
controller. We compare muscle activities for two controller
types for natural walking with the H0918 model. The activity
for the RL agent has been clipped to 0.5. We use 5 roll-outs
of the most natural RL policy and 5 reflex-based controllers
that were optimized until convergence. The initial state for the
RL agent is randomized, which would cause collapse with the
reflex controllers, as they are sensitive to the initial state.
and with few and very general reward terms and therefore may
generalize better to other movements.
In our opinion, the only comparable work is by Weng et
al. [20]. They achieved human-level kinematics on a planar
human model with 18 muscles, by crafting a multi-stage
learning curriculum affecting the weighting of seven reward
terms. As this learning curriculum contains model-specific
reward terms and adaptation procedures, we speculate that it
would have to be hand-tuned for different models.
While our approach achieved higher robustness than reflex-
based controllers and kinematics closer to natural walking than
previous demonstration-free RL approaches, several discrepancies from natural walking remain, see Fig. 2 and supplementary
videos. The low-dimensional models (H0918 and H1622) in
general do not present proper ankle rolling, while the high-
[Figure 4 panels: torso angle [deg] over time [s] for H2190 and MyoLeg.]
Fig. 4: Torso oscillations during walking. We show the torso
angle with the vertical axis for 5 rollouts of 10 s for the H2190-
and the MyoLeg-models. The MyoLeg presents stronger lateral
oscillations. The dashed line shows a straight torso posture.
dimensional models (H2190 and MyoLeg) exhibit less passive
leg-swing in the swing phase of the gait.
The behavior of the MyoLeg model deviates more strongly from
human data than the H2190, although they are similar in terms
of complexity. This is most prominent in the ankle kinematics
and the lack of double-peak structure in the GRFs in the
MyoLeg model. We also observed a tendency for unnatural
lateral torso oscillations with the MyoLeg model, see Fig. 4
and the videos.
These differences in behaviors could be related to the model
parametrization, as the MyoLeg uses a different muscle geom-
etry from H2190 and includes mesh-based contact dynamics,
which might increase learning difficulty. Alternatively, the more
elaborate biomechanical features in Hyfydy, such as elastic
tendons [52], non-linear foot-ground contact mechanics [53],
variable pennation angles [54] or error-controlled integration,
could account for the increased realism of the behaviors with
the Hyfydy models. See Suppl. A for more details on the
simulation engines.
Research on the contribution of biomechanical structures
to the emergence of natural movement [55]–[58] suggests that,
in addition to the learning method and reward function, the
biomechanical structures and modeling choices may play a
crucial role in the accurate reproduction of human gait. This
seems a plausible explanation for the increased realism in
the Hyfydy models, as previous observations in predictive
simulations suggest that e.g. an elastic tendon is beneficial for
natural gait [3], [5], [26]. We regard this as one interesting
area of future research, which could help us better understand
the fundamentals of the interaction between biomechanics and
neuronal control in human locomotion.
In conclusion, we achieved highly robust walking approach-
ing human-like kinematics and ground reaction forces. While a
higher degree of accuracy was achieved with the simpler models, we provide first promising results for difficult-to-control 80 and 90
muscle models which are of high interest for applications in
rehabilitation, neuroscience, and computer graphics. Learning
with the proposed reward function and RL framework allows for
these results across several models of differing complexity and
biomechanical modeling accuracy with only minimal changes
in the hyperparameters of the method. We hope that this inspires
researchers from both the biomechanics and the RL community
to further improve on our approach and to develop tools to
unravel the fundamentals of the generation of complex, robust,
and energy-efficient human movement.
ACKNOWLEDGMENT
Pierre Schumacher was supported by the International Max
Planck Research School for Intelligent Systems (IMPRS-IS).
This work was supported by the Cyber Valley Research Fund
(CyVy-RF-2020-11 to DH and GM).
REFERENCES
[1]
A. Patla, “Strategies for dynamic stability during adaptive human
locomotion,” IEEE Engineering in Medicine and Biology Magazine,
vol. 22, no. 2, pp. 48–52, 2003.
[2]
A. Falisse, G. Serrancolí, C. L. Dembia, J. Gillis, I. Jonkers,
and F. De Groote, “Rapid predictive simulations with complex
musculoskeletal models suggest that diverse healthy and pathological
human gaits can emerge from similar control strategies, Journal of The
Royal Society Interface, vol. 16, no. 157, p. 20190402, 2019. [Online].
Available: https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2019.
0402
[3]
J. Wang, S. Hamner, S. Delp, and V. Koltun, “Optimizing Locomotion
Controllers Using Biologically-Based Actuators and Objectives, ACM
Trans. on Graphics, vol. 31, no. 4, p. 25, 2012.
[4]
S. Song and H. Geyer, “Generalization of a muscle-reflex control model
to 3D walking,” in 2013 35th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (EMBC). IEEE,
Jul. 2013.
[5]
T. Geijtenbeek, M. van de Panne, and A. F. van der Stappen, “Flexible
muscle-based locomotion for bipedal creatures,” ACM Transactions on
Graphics, vol. 32, no. 6, 2013.
[6]
K. Veerkamp, N. Waterval, T. Geijtenbeek, C. Carty, D. Lloyd, J. Harlaar,
and M. van der Krogt, “Evaluating cost function criteria in predicting
healthy gait,” Journal of Biomechanics, vol. 123, p. 110530, jun 2021.
[7]
S. Song and H. Geyer, “A neural circuitry that emphasizes spinal feedback generates diverse behaviours of human locomotion,” The Journal of Physiology, vol. 593, no. 16, pp. 3493–3511, 2015.
[8]
S. Song and H. Geyer, “Evaluation of a Neuromechanical Walking Control
Model Using Disturbance Experiments,” Frontiers in Computational
Neuroscience, vol. 11, no. 3, 2017.
[9]
D. F. B. Haeufle, B. Schmortte, H. Geyer, R. Müller, and S. Schmitt, “The Benefit of Combining Neuronal Feedback and Feed-Forward Control for Robustness in Step Down Perturbations of Simulated Human Walking Depends on the Muscle Function,” Frontiers in Computational Neuroscience, vol. 12, no. 80, 2018.
[10]
A. Agarwal, A. Kumar, J. Malik, and D. Pathak, “Legged
locomotion in challenging terrains using egocentric vision,” in 6th
Annual Conference on Robot Learning, 2022. [Online]. Available:
https://openreview.net/forum?id=Re3NjSwf0WF
[11]
T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter,
“Learning robust perceptive locomotion for quadrupedal robots in the
wild,” Science Robotics, vol. 7, no. 62, p. eabk2822, 2022. [Online].
Available: https://www.science.org/doi/abs/10.1126/scirobotics.abk2822
[12]
V. L. Barbera, F. Pardo, Y. Tassa, M. Daley, C. Richards,
P. Kormushev, and J. Hutchinson, “OstrichRL: A musculoskeletal
ostrich simulation to study bio-mechanical locomotion,” in Deep
RL Workshop NeurIPS 2021, 2021. [Online]. Available: https:
//openreview.net/forum?id=7KzszSyQP0D
[13] S. Lee, M. Park, K. Lee, and J. Lee, “Scalable muscle-actuated human
simulation and control,” ACM Trans. Graph., vol. 38, no. 4, jul 2019.
[Online]. Available: https://doi.org/10.1145/3306346.3322972
[14]
J. Park, S. Min, P. S. Chang, J. Lee, M. S. Park, and J. Lee,
“Generative gaitnet, in ACM SIGGRAPH 2022 Conference Proceedings,
ser. SIGGRAPH ’22. New York, NY, USA: Association for Computing
Machinery, 2022. [Online]. Available: https://doi.org/10.1145/3528233.
3530717
[15]
C. Qi, P. Abbeel, and A. Grover, “Imitating, fast and slow: Robust
learning from demonstrations via decision-time planning,” 2022.
[16]
S. Song, Ł. Kidziński, X. B. Peng, C. Ong, J. Hicks, S. Levine, C. G.
Atkeson, and S. L. Delp, “Deep reinforcement learning for modeling
human locomotion control in neuromechanical simulation,” Journal of
NeuroEngineering and Rehabilitation, vol. 18, no. 1, aug 2021.
[17]
B. Berret, E. Chiovetto, F. Nori, and T. Pozzo, “Evidence for composite
cost functions in arm movement planning: An inverse optimal control
approach,” PLOS Computational Biology, vol. 7, no. 10, pp. 1–18, 10
2011. [Online]. Available: https://doi.org/10.1371/journal.pcbi.1002183
[18]
X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa, “Amp:
Adversarial motion priors for stylized physics-based character control,”
ACM Trans. Graph., vol. 40, no. 4, Jul. 2021. [Online]. Available:
http://doi.acm.org/10.1145/3450626.3459670
[19]
J. Weng, E. Hashemi, and A. Arami, “Human gait cost function varies
with walking speed: An inverse optimal control study,” IEEE Robotics
and Automation Letters, vol. 8, no. 8, pp. 4777–4784, 2023.
[20]
——, “Natural walking with musculoskeletal models using deep rein-
forcement learning,” IEEE Robotics and Automation Letters, vol. 6, no. 2,
pp. 4156–4162, 2021.
[21]
J. Xu, M. Macklin, V. Makoviychuk, Y. Narang, A. Garg, F. Ramos,
and W. Matusik, Accelerated policy learning with parallel differentiable
simulation,” in International Conference on Learning Representations,
2022. [Online]. Available: https://openreview.net/forum?id=ZSKRQMvttc
[22]
P. Schumacher, D. Haeufle, D. Büchler, S. Schmitt, and G. Martius,
“DEP-RL: Embodied exploration for reinforcement learning in
overactuated and musculoskeletal systems, in The Eleventh International
Conference on Learning Representations, 2023. [Online]. Available:
https://openreview.net/forum?id=C-xa_D3oTj6
[23]
Ł. Kidziński, S. P. Mohanty, C. F. Ong, Z. Huang, S. Zhou, A. Pechenko,
A. Stelmaszczyk, P. Jarosik, M. Pavlov, S. Kolesnikov, S. Plis, Z. Chen,
Z. Zhang, J. Chen, J. Shi, Z. Zheng, C. Yuan, Z. Lin, H. Michalewski,
P. Milos, B. Osinski, A. Melnik, M. Schilling, H. Ritter, S. F. Carroll,
J. Hicks, S. Levine, M. Salathé, and S. Delp, “Learning to run challenge
solutions: Adapting reinforcement learning methods for neuromus-
culoskeletal environments, in The NIPS ’17 Competition: Building
Intelligent Systems, S. Escalera and M. Weimer, Eds. Cham: Springer
International Publishing, 2018, pp. 121–153.
[24]
S. Song, Ł. Kidziński, X. B. Peng, C. Ong, J. Hicks, S. Levine, C. G.
Atkeson, and S. L. Delp, “Deep reinforcement learning for modeling
human locomotion control in neuromechanical simulation,” Journal of
NeuroEngineering and Rehabilitation, vol. 18, no. 1, p. 126, Aug 2021.
[Online]. Available: https://doi.org/10.1186/s12984-021-00919-y
[25]
C. Berg, V. Caggiano, and V. Kumar, “Sar: Generalization of physiological
agility and dexterity via synergistic action representation, 2023.
[26]
H. Geyer and H. Herr, “A muscle-reflex model that encodes principles
of legged mechanics produces human walking dynamics and muscle
activities,” IEEE Trans Neural Syst Rehabil Eng, vol. 18, no. 3, pp.
263–273, Jun 2010.
[27]
R. Ramadan, H. Geyer, J. Jeka, G. Schöner, and H. Reimann, A
neuromuscular model of human locomotion combines spinal reflex
circuits with voluntary movements, Scientific Reports, vol. 12, no. 1,
may 2022.
[28]
L. Schreff, D. F. B. Haeufle, J. Vielemeyer, and R. Müller, “Evaluating
anticipatory control strategies for their capability to cope with step-down
perturbations in computer simulations of human walking,” Scientific
Reports, vol. 12, no. 1, jun 2022.
[29]
B. J. Mohler, W. B. Thompson, S. H. Creem-Regehr, H. L. Pick, Jr,
and W. H. Warren, Jr, “Visual flow influences gait transition speed and
preferred walking speed,” Exp. Brain Res., vol. 181, no. 2, pp. 221–228,
Aug. 2007.
[30]
N. Rudin, D. Hoeller, M. Bjelonic, and M. Hutter, Advanced skills by
learning locomotion and local navigation end-to-end, in 2022 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS), 2022,
pp. 2497–2503.
[31]
D. Abe, Y. Fukuoka, and M. Horiuchi, “Economical speed
and energetically optimal transition speed evaluated by gross
and net oxygen cost of transport at different gradients, PLOS
ONE, vol. 10, no. 9, pp. 1–14, 09 2015. [Online]. Available:
https://doi.org/10.1371/journal.pone.0138154
[32]
P. C. Raffalt, M. K. Guul, A. N. Nielsen, S. Puthusserypady, and
T. Alkjær, “Economy, movement dynamics, and muscle activity of
human walking at different speeds, Scientific Reports, vol. 7, no. 1, p.
43986, Mar 2017. [Online]. Available: https://doi.org/10.1038/srep43986
[33]
M. Ackermann and A. J. van den Bogert, “Optimality principles
for model-based prediction of human gait,” Journal of Biomechanics,
vol. 43, no. 6, pp. 1055–1060, 2010. [Online]. Available: https:
//www.sciencedirect.com/science/article/pii/S0021929009007210
[34]
T. Zahavy, Y. Schroecker, F. Behbahani, K. Baumli, S. Flennerhag,
S. Hou, and S. Singh, “Discovering policies with DOMiNO: Diversity
optimization maintaining near optimality, in The Eleventh International
Conference on Learning Representations, 2023. [Online]. Available:
https://openreview.net/forum?id=kjkdzBW3b8p
[35]
K. Lee, L. Smith, and P. Abbeel, “Pebble: Feedback-efficient interactive
reinforcement learning via relabeling experience and unsupervised pre-
training,” in International Conference on Machine Learning, 2021.
[36]
J. Nilsson and A. Thorstensson, “Ground reaction forces at
different speeds of human walking and running, Acta Physiologica
Scandinavica, vol. 136, no. 2, pp. 217–227, 1989. [Online].
Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1748-1716.
1989.tb08655.x
[37]
T. Geijtenbeek, “SCONE: Open Source Software for Predictive
Simulation of Biological Motion,” Journal of Open Source Software,
vol. 4, no. 38, p. 1421, 2019. [Online]. Available: https://scone.software
[38]
S. L. Delp, J. P. Loan, M. G. Hoy, and F. E. Zajac, “An Interactive Graphics-Based Model of the Lower Extremity to Study Orthopaedic Surgical Procedures,” IEEE Transactions on Biomedical Engineering, vol. 37, no. 8, pp. 757–767, 1990.
[39]
A. Rajagopal, C. Dembia, M. DeMers, D. Delp, J. Hicks, and S. Delp,
“Full body musculoskeletal model for muscle-driven simulation of human
gait,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 10,
pp. 2068–2079, 2016.
[40]
M. Christophy, N. A. Faruk Senan, J. C. Lotz, and O. M. O’Reilly,
“A musculoskeletal model for the lumbar spine.” Biomechanics and
modeling in mechanobiology, vol. 11, no. 1-2, pp. 19–34, jan 2012.
[41]
T. Geijtenbeek, “The Hyfydy simulation software, 11 2021. [Online].
Available: https://hyfydy.com
[42]
V. Caggiano, H. Wang, G. Durandau, M. Sartori, and V. Kumar,
“Myosuite a contact-rich simulation suite for musculoskeletal motor
control,” https://github.com/facebookresearch/myosuite, 2022. [Online].
Available: https://sites.google.com/view/myosuite
[43]
A. Seth et al., “Opensim: Simulating musculoskeletal dynamics and
neuromuscular control to study human and animal movement, PLoS
computational biology, vol. 14, no. 7, p. e1006223, 2018.
[44]
V. Caggiano, S. Dasari, and V. Kumar. MyoDex: A Generalizable Prior
for Dexterous Manipulation. [Online]. Available: https://openreview.net/
forum?id=iYBTiYzN0A
[45]
A. S. Chiappa, A. M. Vargas, A. Z. Huang, and A. Mathis, “Latent
exploration for reinforcement learning,” 2023.
[46]
G. Bovi, M. Rabuffetti, P. Mazzoleni, and M. Ferrarin, “A multiple-task
gait analysis approach: Kinematic, kinetic and EMG reference data for
healthy young and adult subjects,” Gait & Posture, vol. 33, no. 1, pp.
6–13, jan 2011.
[47]
K. Veerkamp, N. Waterval, T. Geijtenbeek, C. Carty, D. Lloyd,
J. Harlaar, and M. van der Krogt, “Evaluating cost function criteria in
predicting healthy gait,” Journal of Biomechanics, vol. 123, p. 110530,
2021. [Online]. Available: https://www.sciencedirect.com/science/article/
pii/S0021929021003110
[48]
A. Mastrogeorgiou, A. Papatheodorou, K. Koutsoukis, and E. Papadopou-
los, “Learning energy-efficient trotting for legged robots, in Robotics in
Natural Settings. Cham: Springer International Publishing, 2023, pp.
204–215.
[49]
J. Selinger, S. O’Connor, J. Wong, and J. Donelan, “Humans
can continuously optimize energetic cost during walking, Current
Biology, vol. 25, no. 18, pp. 2452–2456, 2015. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0960982215009586
[50]
M. Ackermann and A. J. van den Bogert, “Optimality principles for
model-based prediction of human gait,” Journal of Biomechanics, vol. 43,
no. 6, pp. 1055–1060, apr 2010.
[51] S. Lee, M. Park, K. Lee, and J. Lee, “Scalable muscle-actuated human
simulation and control,” ACM Trans. Graph., vol. 38, no. 4, jul 2019.
[Online]. Available: https://doi.org/10.1145/3306346.3322972
[52]
M. Ishikawa, P. V. Komi, M. J. Grey, V. Lepola, and G.-P.
Bruggemann, “Muscle-tendon interaction and elastic energy usage
in human walking,” Journal of Applied Physiology, vol. 99,
no. 2, pp. 603–608, 2005, pMID: 15845776. [Online]. Available:
https://doi.org/10.1152/japplphysiol.00189.2005
[53]
L. Saraiva, M. Rodrigues da Silva, F. Marques, M. Tavares
da Silva, and P. Flores, “A review on foot-ground contact
modeling strategies for human motion analysis,” Mechanism and
Machine Theory, vol. 177, p. 105046, 2022. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0094114X22002932
[54]
R. Sopher, A. Amis, D. Davies, and J. Jeffers, “The influence of muscle
pennation angle and cross-sectional area on contact forces in the ankle
joint,” The Journal of Strain Analysis for Engineering Design, vol. 52,
09 2016.
[55]
K. G. Gerritsen, A. J. van den Bogert, M. Hulliger, and R. F. Zernicke, “Intrinsic Muscle Properties Facilitate Locomotor Control - A
Computer Simulation Study,” Motor Control, vol. 2, no. 3, pp. 206–220,
jul 1998.
[56]
D. F. Haeufle, S. Grimmer, and A. Seyfarth, “The role of intrinsic
muscle properties for stable hopping - stability is achieved by the force-
velocity relation,” Bioinspiration & Biomimetics, vol. 5, no. 1, p. 016004,
2010.
[57]
C. T. John, F. C. Anderson, J. S. Higginson, and S. L. Delp, “Stabilisation
of walking by intrinsic muscle properties revealed in a three-dimensional
muscle-driven simulation.” Computer methods in biomechanics and
biomedical engineering, vol. 16, no. 4, pp. 451–62, apr 2013.
[58]
I. Wochner, P. Schumacher, G. Martius, D. Büchler, S. Schmitt, and
D. Haeufle, “Learning with muscles: Benefits for data-efficiency and
robustness in anthropomorphic tasks,” in Proceedings of The 6th
Conference on Robot Learning, ser. Proceedings of Machine Learning
Research, K. Liu, D. Kulic, and J. Ichnowski, Eds., vol. 205. PMLR,
14–18 Dec 2023, pp. 1178–1188.
[59]
M. Millard, T. Uchida, A. Seth, and S. L. Delp, “Flexing computational
muscle: modeling and simulation of musculotendon dynamics.” Journal
of biomechanical engineering, vol. 135, no. 2, p. 021005, feb 2013.
[60]
K. H. Hunt and F. R. E. Crossley, “Coefficient of Restitution Interpreted
as Damping in Vibroimpact, Journal of Applied Mechanics, vol. 42,
no. 2, p. 440, jun 1975.
[61]
M. A. Sherman, A. Seth, and S. L. Delp, “Simbody: multibody dynamics
for biomedical research,” Procedia IUTAM, vol. 2, pp. 241–261, jan
2011.
[62]
S. R. Hamner and S. L. Delp, “Muscle contributions to fore-aft
and vertical body mass center accelerations over a range of running
speeds,” Journal of Biomechanics, vol. 46, no. 4, pp. 780–787, 2013.
[Online]. Available: https://www.sciencedirect.com/science/article/pii/
S0021929012006768
[63]
F. Pardo, “Tonic: A deep reinforcement learning library for fast
prototyping and benchmarking,” arXiv preprint arXiv:2011.07537, 2020.
APPENDIX
A. Simulation Engines
The Hyfydy and MuJoCo simulation engines differ in these
key areas:
Musculotendon dynamics The muscle model in Hyfydy is
based on Millard et al. [59] and includes elastic tendons, muscle
pennation, and muscle fiber damping. The MuJoCo muscle
model is based on a simplified Hill-type model, parameterized
to match existing OpenSim models [42], and supports only
rigid tendons and does not include variable pennation angles.
Contact forces Hyfydy uses the Hunt-Crossley [60] contact
model with non-linear damping to generate contact forces, with
a friction cone based on dynamic, static, and viscous friction
coefficients [61]. MuJoCo contacts are rigid, with a friction
pyramid instead of a cone, and without separate coefficients
for dynamic and viscous friction.
Contact geometry The MuJoCo model uses a convex
mesh for foot geometry, while in the Hyfydy models the foot
geometry is approximated using three contact spheres.
Integration step size Hyfydy uses an error-controlled
integrator with variable step size, while MuJoCo uses a fixed
step size and no error control. The average simulation step size
in Hyfydy is around 0.00014s (7000 Hz) for the H2190 model,
compared to the fixed MyoSuite step size of 0.001s (1000Hz)
for the MyoLeg model.
B. Training Curves
Here we present more detailed results about the training
evolution in Fig. 5. We plot the experimental match percentage between the collected gait-cycle averaged data and experimental human data, the muscle-averaged effort, the training returns, and the weight that the effort-reward term has over training.

[Figure 5 panels: exp. match [%], muscle activity, return, and effort weight over training iterations (×10⁸) for H0918, H1622, H2190, and MyoLeg.]

Fig. 5: Training curves for the walking task. We present the evolution of the match with experimental data (exp. match), the task performance, the averaged muscle activity, and the effort cost weight. The cost weight increases more slowly for the more complex models, showing its adaptive nature.
This weight is adapted over time and depends on the agent’s
performance. It increases more slowly for the complex models and saturates at smaller values. It can also be seen that the returns for the MyoLeg are generally smaller than for the other models. We observed that there was more variance over training and over different seeds for the MyoLeg-agent, leading to much smaller averaged returns. It was still possible to find a training checkpoint that achieved robust, close to human-like walking
for this model.
C. Running
We performed maximum-speed running experiments with
every model.

TABLE III: Maximum running velocity for different models, expressed in m/s, and total achieved distance in the rough terrain environment. We only show the maximum speed over 20 roll-outs for each model to show the largest velocity that we were able to achieve. For the rough terrain, we also record 20 roll-outs for each agent. We do not perform this experiment for the H0918 model, as the 3D nature of the terrain is not applicable to it, and we do not apply it to the MyoLeg model, as the terrain has not been implemented in the MuJoCo simulator.

system                  H0918   H1622         H2190          MyoLeg
max. velocity [m/s]     5.38    5.04          6.49           5.44
achieved distance [m]   n.a.    9.87 ± 4.27   10.45 ± 4.77   n.a.

While most reward terms remained identical to the natural walking case, we replaced the external task reward by the velocity of the center of mass, $r_{\text{vel}} = v$, and removed energetic constraints such as the muscle excitation clipping and the effort cost term. The gait-cycle- and leg-averaged kinematics are shown in Fig. 7. As this task is a maximum performance movement, we equalized the forces between the Hyfydy- and MuJoCo-based models, as the Hyfydy models in the main experiments are generally based on experimental data with weaker maximum isometric muscle forces [38]. Note that we added a negative reward for self-collision forces for the running tasks, as the agents would often cross their legs and hit them against each other, thereby hopping instead of running.
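A sketch of the modified task reward used for these running experiments; the self-collision penalty weight is hypothetical, since the text states only that a negative reward for self-collision forces was added, not its magnitude:

```python
def running_reward(com_velocity: float, self_collision_force: float,
                   w_collision: float = 0.01) -> float:
    """Running task: reward raw forward speed, penalize self-collisions between the legs.

    w_collision is a hypothetical weight; effort costs and excitation clipping
    are omitted for this maximum-performance task, as described above.
    """
    return com_velocity - w_collision * self_collision_force
```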
Even though there remains a stronger discrepancy between
the produced kinematics and the experimental data than for
walking, the hip movement and GRFs are generally well aligned
for the Hyfydy-models. The MyoLeg-model presents very
strong lateral torso oscillations during running, see also Fig. 6a.
In future work, biological objectives such as head stabilization
or the inclusion of arms in the model might minimize some of
these artifacts. See Tab. III for the maximum running velocities
for each model.
We also performed robustness experiments on a challenging
obstacle course, see Fig. 6b and supplementary videos.
D. Reward function ablation
We perform ablations on our reward, see Fig. 8. Throughout
all considered variations, only the full reward function leads
to gaits resembling human kinematics with low muscle activity
across all models.
E. Hyperparameters
The hyperparameter settings used for the RL agent, DEP, and the cost function are shown in Tab. IV. Non-reported RL values are left at their default settings in TonicRL [63]. See [22] for
an explanation of the DEP-specific terms. The RL parameters
were held constant, except for an increase in network capacity
for H2190 and MyoLeg.
(a) Torso oscillations during flat-ground running. We show the torso angle with the vertical axis for 5 rollouts of 10 s for the H2190 and the MyoLeg models. The MyoLeg presents strong lateral oscillations. The dashed line shows a straight torso posture.
(b) Dynamic terrain for running. We probe the robustness of our
policies trained for the H1622 and the H2190 with challenging obstacles.
The tiles of the bridge rotate around the central axis and hang
downwards, similar to a drawbridge. The agents were trained on flat
ground and only have access to proprioceptive feedback.
[Figure 7 panels: hip, knee, and ankle angles [rad] and GRF [BW] over the gait cycle [%] for H0918, H1622, H2190, and MyoLeg; RL vs. human data.]
Fig. 7: Gait-cycle kinematics for running. The experimental data shows human subjects running at 5 m/s and was extracted from [62].
[Figure 8 panels: exp. match [%] and muscle activity over training iterations (×10⁸) for H0918, H1622, and H2190, comparing the variants ours, no-adapt, no-effort, and only-vel.]
Fig. 8: Cost function ablations. We show several ablations of our cost function and plot the average match with experimental human data, as detailed in the main paper, as well as the average muscle activity. A natural gait is generally characterized by a large experimental match as well as minimal muscle activity. Different ablations are shown: The adaptive effort term is zero ($\alpha(t) = 0$): no-adapt. The entire effort cost term is zero ($c_{\text{effort}} = 0$) and we deactivate the action clipping: no-effort. We only reward with the velocity reward term ($c_{\text{effort}} = 0$ and $c_{\text{pain}} = 0$): only-vel. Only the combined cost function achieves a close resemblance to natural gait with low muscle activity. Leaving out the pain-related costs leads to the worst gait trajectories, while a combination of the effort cost terms and the adaptive cost term is needed to achieve the lowest muscle activity.
TABLE IV: Hyperparameters for all algorithms.

(a) DEP settings.
Parameter              Value
DEP κ                  1200
τ                      40
buffer size            200
bias rate              0.002
s4avg                  2
time dist (t)          5
integration pswitch    3.71 × 10⁻⁴
H_DEP                  8
test episode           3
force scale            n.a.

(b) MPO settings.
Parameter                Value
buffer size              1e6
batch size               256
steps before batches     2e5
steps between batches    1000
number of batches        30
n-step return            1
n parallel               20
n sequential             10
hidden layers            2
hidden sizes             256
lr_actor                 3 × 10⁻⁴
lr_critic                3 × 10⁻⁴
lr_dual                  1 × 10⁻²

(c) MPO setting changes for H2190 and MyoLeg.
Parameter       Value
hidden sizes    1024
lr_actor        3.53 × 10⁻⁵
lr_critic       6.08 × 10⁻⁵
lr_dual         2.13 × 10⁻³

(d) Cost function settings.
Parameter   Value         Meaning
w₁          0.097         action smoothing
w₂          1.579         number of active muscles above 15% activity
w₃          0.131         joint limit torque
w₄          0.073         GRFs above 1.2 BW
Δα          9 × 10⁻⁴      change in adaptation rate
θ           1000          performance threshold
β           0.8           running avg. smoothing
λ           0.9           decay term