Online Learning of Hybrid Models for Variable Impedance Control of a Changing-Contact Manipulation Task
University of Birmingham, UK
Honda Research Institute (EU), Germany
Abstract—Many manipulation tasks comprise discrete action sequences characterized by continuous dynamics, with the transitions between these discrete dynamic modes marked by discontinuous dynamics. The individual modes can represent
different types of contacts, surfaces, or other factors, and
different control strategies may be needed for each mode and
the transitions between modes. This paper describes a piece-wise
continuous, hybrid control framework that automatically detects
transitions between modes and incrementally learns a model
of the dynamics of each mode to support variable impedance
control in that mode. The recognition of modes is invariant to
the direction of motion and the magnitude of applied forces.
Also, new modes are identiﬁed automatically and suitable models
of the corresponding dynamics are learned. The framework is
evaluated on a robot manipulator sliding an object on a surface
along a desired motion trajectory in the presence of changes in
surface friction, applied force, or the type of contact between
the object and the surface. Experimental results indicate reliable
and efﬁcient recognition of modes, learning of dynamics models,
and variable-impedance control during task execution.
I. MOTIVATION
Consider a robot manipulator sliding an object over a
surface along a desired pattern, as shown in Figure 1. The
system’s dynamics vary markedly before and after the object
comes in contact with the surface, and based on the type
of contact (e.g., surface or edge contact), surface friction,
applied force, and other factors. We consider all such tasks that
involve changes in dynamics due to changes in the nature of
contact, i.e., changes in the interaction between two objects as
“changing-contact” tasks. Many practical manipulation tasks
are changing-contact tasks characterized by discontinuities in
the dynamics when the nature of the contact changes, making
it difﬁcult to learn a single model of the task dynamics. They
can be modeled as a hybrid system with continuous dynamics
within each of a number of discrete dynamic modes that
may need a distinct control strategy. Then, the overall
task’s dynamics are piece-wise continuous, with the system
transitioning between the individual modes over time.
Constructing separate (continuous) models for the different modes, each well-suited for operation within its mode, eliminates the need for a combined model, but it introduces the need for a transition model that chooses between the models and control strategies of the individual modes. Such a hybrid model uses the modularity of the sub-tasks to construct the overall transition model. However, it requires the ability
Fig. 1: Sliding an object along a desired pattern on three surfaces with
different values of friction; images represent different transitions.
to accurately recognize the mode at any point in time, revise
the existing dynamics models to adapt to the changes within
any given mode, and to identify and learn dynamics models for
previously unseen modes. This paper describes a framework
that addresses these requirements and enables hybrid control
of a changing-contact manipulation task by:
1) Incrementally learning a non-linear, piece-wise contin-
uous model of the dynamics of any given task from a
single demonstration, without prior knowledge of the
task, its modes, or the order in which the modes appear.
2) Incorporating a transition model that automatically clus-
ters the modes associated with any given task, and helps
transition to new modes during task execution.
3) Introducing a reduced feature representation that makes
the learning of dynamics models computationally efﬁ-
cient, and makes the identiﬁcation of modes independent
of the motion direction and magnitude of applied forces.
4) Incrementally and continuously learning and revising a probabilistic model of the dynamics of any particular mode, and using the model for variable impedance control and compliant motion within that mode.
The novelty is in the ﬁrst three contributions; the last one
builds on our prior work on variable impedance control of continuous contact tasks. To better understand the learn-
ing and control challenges faced by such a learning framework
(e.g., discretely-changing, non-linear, piecewise continuous
environment dynamics), we chose to explore a representative
changing-contact task with discretely changing dynamics: a
robot manipulator sliding an object on a surface in a desired
motion pattern. We limit sensor input to that from a force-
torque sensor on the manipulator (i.e., we do not use an
external camera), and evaluate the framework in the presence
of discrete changes in surface friction, applied force, and type of contact.
We begin with a review of related work in Section II,
followed by a description of the proposed framework in
Section III. Section IV discusses the experimental results,
followed by the conclusions in Section V.
II. RELATED WORK
There is a rich literature of methods developed to address
the learning and control problems associated with robot manipulation. This includes reinforcement learning (RL)
methods [9, 24] and recent methods based on a combination
of deep networks and RL for learning ﬂexible behaviors from
complex data [1, 4, 14, 15]. These data-driven methods require
large labeled datasets, which often need to be obtained from
multiple repetitions of the task by the robot. These require-
ments are difﬁcult to satisfy in practical domains, especially
on a physical robot. Also, the training process optimizes
parameters of speciﬁc skills (or their sequence), and the
internal representations and decision making mechanisms are
opaque, making it difficult to transfer the learned policies to
new tasks. Sim-to-real learning strategies have been developed
to reduce the need to perform training on real robots for
manipulation tasks. However, aspects of dynamics in the real
world, e.g., the continuous time dynamics of rigid bodies with
friction, are too complicated (NP-hard) to be modelled in a
real-time dynamics simulator [2, 7]. These methods are also not well-suited to a hybrid system formulation because they (implicitly or explicitly) consider a single model over the different modes of the manipulation action.
RL and optimal control methods applied to robot ma-
nipulation often assume that the underlying task dynamics
are smooth. Also, the application of learning strategies to
hybrid systems has been limited [12, 26], with many of them
focusing on bipedal locomotion [8, 17]. Planning approaches
for manipulation domains often explicitly take the multi-modal
structure of the dynamics of manipulation into account [27, 6].
However, these planning methods assume a pre-deﬁned model
of the system and prior knowledge of actions and the modes.
Unlike some existing online learning approaches, our framework does not require a periodically repeating trajectory, nor does it learn a time series of controller parameters to be used in a repeatable dynamic environment; instead, it learns to adapt its controller based on the dynamic forces it currently senses.
Many methods have shown the benefits of incorporating modes or phases into the design of controllers, and many methods have been proposed to learn controllers for such multi-phase tasks [3, 10, 13]. Different strategies for sequencing motion primitives have also been used to solve manipulation tasks. However, most of these assume that a library of modes or motion primitives already exists, or segment a sequence of primitives from human demonstrations. This
makes the learned policy dependent on the speciﬁc movements
and their sequence.
The framework described in this paper for changing-contact
manipulation draws inspiration from the approaches that in-
corporate modes in the design of controllers. However, our
framework supports (a) automatic recognition of modes and
identiﬁcation of new modes invariant to the direction of motion
and magnitude of the applied force; and (b) incremental learn-
ing and revision of dynamic models for variable impedance
control in the individual modes.
III. PROBLEM FORMULATION AND FRAMEWORK
This section ﬁrst describes the formulation of changing-
contact manipulation tasks as a piece-wise continuous hybrid
system (Section III-A). Section III-B describes the control
strategy and learning of continuous dynamics within a single
mode. The detection and learning of the discrete dynamic
modes are then explained in Section III-C.
A. Piece-wise Continuous Hybrid System
In a piece-wise continuous hybrid system, the state can be described as the tuple (m, s), where m ∈ M is a mode from the discrete set of modes M, and s ∈ S_m is an element in the continuous subspace S_m ⊆ R^d associated with m. This formulation assumes that the subspaces do not intersect or overlap, i.e., S_m ∩ S_n = ∅ for all m ≠ n. The evolution of s within a mode is determined by a discrete-time continuous function S_m(.), but the state transition is discrete and discontinuous at the boundaries between modes. Lee et al. called the boundary between modes m and m′, where the transition occurs deterministically, a guard region, denoted G_{m,m′} ⊆ S_m. In a guard region, s is transported to s_{t+1} ∈ S_{m′} through a reset function r_{m,m′}(.). The state propagation is thus governed by:

s_{t+1} = r_{m_t,m_{t+1}}(s_t) + w_t,  if s_t ∈ G_{m_t,m_{t+1}}
s_{t+1} = S_{m_t}(s_t) + w_t,          otherwise (s_t ∈ S_{m_t})
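The state propagation above can be exercised with a minimal simulation. The specific dynamics functions, guard test, and reset function below are illustrative stand-ins for the learned quantities in the paper, not its actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics_free(s):      # S_m(.) for a hypothetical free-space mode
    return 1.2 * s

def dynamics_contact(s):   # S_m(.) for a hypothetical contact mode
    return 0.5 * s + 2.0

def in_guard(s):           # guard region G_{m,m'}: illustrative threshold test
    return s > 5.0

def reset(s):              # reset function r_{m,m'}(.): discontinuous jump
    return s - 4.0

def step(mode, s, noise_std=0.01):
    w = rng.normal(0.0, noise_std)       # additive Gaussian process noise w_t
    if mode == "free" and in_guard(s):
        return "contact", reset(s) + w   # s_t in the guard: discrete mode switch
    f = dynamics_free if mode == "free" else dynamics_contact
    return mode, f(s) + w                # s_t in S_m: continuous evolution

mode, s = "free", 1.0
for _ in range(20):
    mode, s = step(mode, s)
```

After the guard is crossed, the contact-mode dynamics take over and the state settles near that mode's fixed point, illustrating piece-wise continuous evolution with one discontinuous jump.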
where w_t is additive (Gaussian) process noise. In the context of the sliding task considered in this paper, the forces and torques measured by the robot at its end-effector constitute the observable state (s) of the system, which varies continuously within each contact mode. This formulation makes the reasonable assumption that properties such as friction are continuous across the surface of each object. The control strategy guiding the object's motion and the (static or smoothly changing) environment in that mode can be considered to determine the function S_m(.) governing the evolution of s in that mode. When a mode change occurs (in a guard region), the state is transported to a new state in mode n, where the state evolution is guided by the function S_n(.). For changing-contact tasks, the changes at the guard regions are sudden and pronounced compared with the variation of the readings within a contact mode. The mode switches impose structure on manipulation tasks; the transitions can be considered triggers for changing the current model of the environment.
B. Control Strategy and Learning Dynamics in a Mode
The control strategy and the method used to learn the dynamics model for each separate mode were improved from our previous work. Our approach for learning the continuous dynamics of an individual mode uses an Incremental Gaussian Mixture Model (IGMM), which internally uses a variant of the Expectation-Maximization (EM) algorithm to fit the model. In our implementation, the GMM was incrementally fit over points X = (X_1, ..., X_T), with X_t = [S_{t−1}, D_t], where each point contains information about a previous observable state (S) along with the current values to be predicted (D). When the learned model is used during task execution, values for the next time instant are predicted as a function of the robot's current state, (D_{t+1} | S_t), using Gaussian Mixture Regression (GMR). In this work, the forward model learns to predict the end-effector forces and torques ([F_ee,t, τ_t]) from the previous end-effector force, torque, and end-effector velocity ([F_ee,t−1, τ_{t−1}, ẋ_{t−1}]). We used the magnitudes of force, torque, and end-effector velocity for learning and prediction instead of their 3D vector representations. Since the magnitudes of frictional forces and torques are independent of the direction of motion (in ideal cases), this simplified representation is sufficient to learn and predict the end-effector forces and torques along the direction of motion. The reduced representation made the learning process simpler and more computationally efficient, and also made it independent of the direction of motion. The learned model always predicts the forces and torques along (or against) the direction of motion; the components of the forces and torques along the axes of motion can be recovered when needed, or estimated using the previously measured sensor values.
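A minimal sketch of such a forward model follows. Scikit-learn's batch GaussianMixture stands in for the incremental IGMM, the training data are synthetic magnitudes from a made-up smooth relationship, and the GMR conditional is computed directly from the fitted components:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic inputs S = [|F|, |tau|, |xdot|] at t-1 and targets D = [|F|, |tau|]
# at t; the linear relationship here is illustrative, not the task's dynamics.
S = rng.uniform([1, 0.1, 0.01], [10, 1.0, 0.2], size=(500, 3))
D = np.column_stack([0.95 * S[:, 0] + 0.5 * S[:, 2],
                     0.95 * S[:, 1]]) + rng.normal(0, 0.01, (500, 2))
X = np.hstack([S, D])                      # X_t = [S_{t-1}, D_t]

gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(X)

def gmr_predict(s, gmm, d_in=3):
    """Gaussian Mixture Regression: E[D | S = s] from the joint GMM."""
    mu_s = gmm.means_[:, :d_in]
    mu_d = gmm.means_[:, d_in:]
    cov = gmm.covariances_
    # Responsibility of each component for the query input s.
    h = np.array([w * multivariate_normal.pdf(s, m, c[:d_in, :d_in])
                  for w, m, c in zip(gmm.weights_, mu_s, cov)])
    h /= h.sum()
    pred = np.zeros(gmm.means_.shape[1] - d_in)
    for k in range(gmm.n_components):
        # Per-component linear regressor from the joint covariance blocks.
        reg = cov[k, d_in:, :d_in] @ np.linalg.inv(cov[k, :d_in, :d_in])
        pred += h[k] * (mu_d[k] + reg @ (s - mu_s[k]))
    return pred

s = np.array([5.0, 0.5, 0.1])
print(gmr_predict(s, gmm))                 # predicted [|F_t|, |tau_t|]
```

Because the features are magnitudes, the same model serves any direction of motion; the directional components can be reattached downstream as described above.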
The predictions from the forward model provide the feed-forward term that cancels out the effect of the environment forces (friction) during motion, in the control equation:

u_t = λ_{t-1} k_t + K^p_t Δx_t + K^d_t Δẋ_t + u^fc_t          (2)
u^fc_t = K^fc_t ΔF_t                                           (3)
K^p_t = λ_{t-1} K^p_free + (1 − λ_{t-1}) K^p_max               (4)
λ_t = 1 − 1 / (1 + e^{−r(ε_t − ε_0)})                          (5)

where u_t is the control command to the robot (i.e., task-space force) at time t; K^p_t and K^d_t are the (positive definite) stiffness and damping matrices of the feedback controller for motion; u^fc_t is the simple force feedback control (Equation 3) for the interaction task (the direction of force control is orthogonal to the direction of motion control), with proportional gain K^fc_t for the error in task-space force ΔF; k_t is the feed-forward term (end-effector forces and torques) predicted by the forward model associated with the present mode m_t, using GMR as described previously; and Δx and Δẋ are the errors in the end-effector position and velocity at each instant. The weighting factor λ_t is a function of the accuracy of the forward model at instant t; it maps the error in prediction from the forward model (ε_t) to a value between 0 and 1 (e.g., the logistic function in Equation 5). The logistic growth rate r and the sigmoid midpoint ε_0 are hyperparameters that have to be tuned for the task. This weighting ensures that the overall control law (Equation 2) relies on the feed-forward term only if the dynamics of the mode has been learned accurately. Otherwise, the robot aims to follow the goal trajectory more accurately by prioritizing the feedback control term.
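The weighting and control computation can be sketched per axis as follows. The scalar per-axis form and all numeric gains (K^p_free, K^p_max, K^fc, r, ε_0) are illustrative assumptions; the paper uses task-space matrices tuned for the robot:

```python
import numpy as np

def logistic_weight(eps, r=50.0, eps0=0.1):
    """Eq. 5: map forward-model prediction error to a weight in (0, 1)."""
    return 1.0 - 1.0 / (1.0 + np.exp(-r * (eps - eps0)))

def control_command(lam, k_ff, dx, dxdot, dF,
                    Kp_free=50.0, Kp_max=800.0, Kfc=1.5):
    """Variable impedance control (Eqs. 2-4), scalar sketch for one axis."""
    Kp = lam * Kp_free + (1.0 - lam) * Kp_max  # Eq. 4: stiffness blend
    Kd = 2.0 * np.sqrt(Kp)                     # critical-damping constraint
    u_fc = Kfc * dF                            # Eq. 3: force feedback
    return lam * k_ff + Kp * dx + Kd * dxdot + u_fc  # Eq. 2

# Small prediction error -> lambda near 1: low stiffness, feed-forward used.
lam = logistic_weight(0.02)
# Large prediction error -> lambda near 0: stiff feedback control dominates.
lam_bad = logistic_weight(0.5)
```

With these hyperparameters, an accurate model (ε well below ε_0) yields λ ≈ 1, so the robot is compliant and relies on the predicted environment forces; an inaccurate model yields λ ≈ 0 and stiff trajectory tracking.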
Equation 4 defines how the stiffness parameter is updated at each instant in the variable-impedance control law as a function of the prediction accuracy of the forward model. K^p_max is the maximum allowed stiffness, and K^p_free is the minimum stiffness that provides accurate position tracking in the absence of all external disturbances (motion in free space). The damping term K^d_t is updated from K^p_t using the constraint for critically-damped systems. We demonstrated the advantage of using this hybrid variable impedance formulation in our prior work.
C. Contact Mode Recognition and Identiﬁcation
Our approach for recognizing known modes and identifying
new modes in changing-contact tasks is based on the obser-
vation that any change in mode is accompanied by a sudden
signiﬁcant change in the sensor readings. In our framework,
the robot responds to pronounced changes in force-torque
measurements by brieﬂy using a high-stiffness control strategy
while quickly obtaining a batch of sensor data to conﬁrm and
respond to the transition. The robot learns a new dynamics
model if a new mode is detected, and transitions to (and revises) an existing dynamics model if the transition is to a known mode.
The management of modes is based on an online incremen-
tal clustering algorithm called Balanced Iterative Reducing
and Clustering using Hierarchies (BIRCH) [29, 30]. This
algorithm incrementally and dynamically clusters incoming
data for given memory and time constraints, without having
to examine all existing data points or clusters. We used the implementation of BIRCH in the Scikit-learn library.
Each cluster is considered to represent a mode in a feature
space (more details below), with the clusters being updated
using batches of the feature data. The fraction of the input
feature vectors assigned to any existing cluster determines the
conﬁdence in the corresponding mode being the current mode.
If the highest such conﬁdence value is above a threshold,
the dynamics model of that mode is used and revised until a
mode change occurs. If the feature vectors are not sufﬁciently
similar to an existing cluster, a new cluster (i.e., mode) and
the corresponding dynamics model are constructed and revised
(see Section III-B) until a mode transition occurs.
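This mode bookkeeping can be sketched with scikit-learn's Birch. The clustering threshold, the confidence cutoff, and the synthetic two-dimensional features below are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(2)

# One global BIRCH model; n_clusters=None keeps the raw subclusters as modes.
birch = Birch(n_clusters=None, threshold=0.5)

def classify_batch(features, birch, conf_threshold=0.6):
    """Assign a batch of feature vectors to a mode (cluster) by majority vote.

    The fraction of the batch assigned to the winning cluster is the
    confidence in that mode; below the cutoff, the batch is treated as a
    new, previously unseen mode.
    """
    birch.partial_fit(features)            # incremental update, no full re-fit
    labels = birch.predict(features)
    labels_u, counts = np.unique(labels, return_counts=True)
    best = counts.argmax()
    confidence = counts[best] / len(labels)
    if confidence < conf_threshold:
        return None, confidence
    return labels_u[best], confidence

# Two well-separated synthetic "modes" in feature space.
batch_a = rng.normal([1.0, 0.2], 0.05, size=(30, 2))
batch_b = rng.normal([4.0, 1.5], 0.05, size=(30, 2))
mode_a, conf_a = classify_batch(batch_a, birch)
mode_b, conf_b = classify_batch(batch_b, birch)
```

Because BIRCH updates its clustering-feature tree incrementally, each new batch of features is absorbed without revisiting earlier data, matching the online setting described above.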
The key factor influencing the reliability and generalizability of the framework is the choice of feature representation for the modes. This representation is task dependent, but the objective is to identify one or more properties that vary substantially when a mode change occurs, while concisely and uniquely representing the modes.
For the task of sliding an object over surfaces with different
values of friction, the property that strongly inﬂuences the
end-effector forces (Fee) is the friction coefﬁcient between
the object and the surface.
Fig. 2: The torque measured at the pivot (τ) varies for different relative orientations of the object (θ), unlike the force at the tip (F_r). The object moves along ẋ, resulting in a frictional resistance F_r at the point of contact in the opposite direction.
When two objects slide over each
other at a constant velocity, F_ee is proportional to the applied normal force (R) and the friction coefficient (µ) (assuming the relative orientation of their surface normals does not change); µ can then be estimated as:

µ = ‖F_ee‖ / R          (6)

A concise feature representation for this task is thus ‖F_t‖ / R_t, which has the effect of making mode classification independent of the magnitude of the applied force.
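The invariance of this feature to both the applied force and the direction of motion can be checked numerically under the ideal sliding assumption; the friction coefficient and forces below are illustrative:

```python
import numpy as np

mu, R = 0.4, 10.0                          # illustrative friction coefficient, normal force (N)
for angle in (0.0, np.pi / 3, np.pi):      # different directions of motion
    direction = np.array([np.cos(angle), np.sin(angle), 0.0])
    F_vec = -mu * R * direction            # friction opposes the motion direction
    feature = np.linalg.norm(F_vec) / R    # |F_ee| / R, per Eq. 6
    print(feature)                         # 0.4 for every direction
```

The same value is recovered if R is scaled, so clusters built on this feature characterize the surface rather than the commanded force or motion direction.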
In a similar manner, for changes in the type of contact, end-effector orientation is a useful feature, but small changes in orientation may require different modes. A more reasonable feature is the magnitude of the end-effector torque, which can be measured using the force-torque sensor in the wrist:

‖τ‖ = l ‖F_r‖ sin θ          (7)

where F_r is the force at the tip, l is the length of the pivot arm, and θ is the orientation between the surface normals. Figure 2 indicates that for any object, τ is different for the different types of contacts. With the magnitude of the torque (‖τ‖) as the feature representation, modes can be classified independently of the motion direction and object orientation.
This representation would not work when the magnitude of the applied force differs. If we instead assume that the force measured at the wrist (F_ee) approximates the force at the tip of the object (F_r), Equations 6 and 7 imply that ‖τ_t‖ / R is invariant to the magnitude of the applied force for a fixed relative orientation between the objects in contact:

‖τ‖ / R = µ l sin θ          (8)

This ratio is constant for each mode (based on θ) provided the object geometry (l) and friction (µ) do not change. Experimental analysis revealed that this parameter by itself is insufficient to distinguish between contacts when the applied normal force changes, because the assumption about kinetic friction (F_r = µR) does not hold in many real-world situations. We thus use [‖τ‖, R] as the feature representation for this task; it supports better generalization over different normal forces while reliably distinguishing different changing contacts.
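Under the ideal kinetic-friction assumption (F_r = µR, with the wrist torque proportional to the contact force), the invariance of ‖τ‖/R to the applied normal force can be checked numerically; all numbers below are illustrative:

```python
import numpy as np

mu, l, theta = 0.4, 0.12, np.deg2rad(30)   # friction, pivot-arm length (m), contact angle
for R in (10.0, 20.0):                     # two different applied normal forces (N)
    tau = l * (mu * R) * np.sin(theta)     # torque magnitude at the wrist
    print(tau / R)                         # mu * l * sin(theta): same for both forces
```

When the kinetic-friction assumption breaks down, the ratio drifts with R, which is why the normal force is retained as a separate feature dimension rather than divided out.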
TABLE I: Control loop of framework
Input: Control parameters (K^p_max, K^p_free); dynamics models f_i, i ∈ [1, M]; current mode: m = 0.
1  while motion pattern not complete do
2    if object in contact with surface then
3      if mode transition detected then
5        m = classify_mode_learn_dynamics()
7      update and use f_m for control
Algorithm I is an overview of the framework’s control
loop for a manipulator sliding an object on a surface; it
proceeds until a desired motion pattern is completed. Control
and learning methods are used only after the object comes in
contact with the surface (lines 2-11). As described earlier, the
robot responds to a detected mode transition by setting a high
stiffness, collecting feature samples, determining the mode and
learning/revising the corresponding dynamics model (lines 3-
5). In the absence of a mode transition, the robot continues
with its current mode and dynamics model (lines 6-8).
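The control loop can be sketched in runnable form as follows. FakeRobot and the mode-handling details are placeholder stand-ins for the robot interface, the transition detector, and the transition model, not the paper's implementation:

```python
class FakeRobot:
    """Minimal scripted environment to exercise the control-loop sketch."""
    def __init__(self, steps, transition_at):
        self.t, self.steps, self.transition_at = 0, steps, transition_at
    def motion_pattern_complete(self):
        return self.t >= self.steps
    def in_contact(self):
        self.t += 1
        return self.t > 2                       # first steps are in free space
    def mode_transition_detected(self):
        return self.t in self.transition_at     # scripted force-torque jumps
    def collect_feature_batch(self):
        return ("mode-a",) if self.t < 10 else ("mode-b",)

def control_loop(robot):
    """Sketch of the Table I loop with placeholder mode handling."""
    models, mode, log = {}, None, []
    while not robot.motion_pattern_complete():
        if not robot.in_contact():
            continue                            # free-space feedback control only
        if robot.mode_transition_detected():    # high stiffness, then classify
            batch = robot.collect_feature_batch()
            mode = batch[0]                     # stand-in for classify/learn
            models.setdefault(mode, [])
        if mode is not None:
            models[mode].append(robot.t)        # revise the mode's model f_m
            log.append(mode)                    # ...and use it for control
    return models, log

models, log = control_loop(FakeRobot(steps=15, transition_at={3, 11}))
print(sorted(models))                           # ['mode-a', 'mode-b']
```

The scripted run makes contact at step 3, learns one mode, switches at step 11, and finishes with two mode models, mirroring the flow of the algorithm.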
IV. EXPERIMENTAL EVALUATION
We used a 7-DoF Franka Emika Panda manipulator robot
for our experiments. The robot had to slide an object along
a desired motion pattern on a surface, and we considered a
“changing surface” task and a “changing contact type” task to
evaluate the hypotheses. We used the root mean square error
(RMSE) in following/tracking a desired motion trajectory as
a key performance measure.
To evaluate the need for separate models, the robot was
asked to slide an object across two surfaces with markedly
different friction, with the transition point being unknown to
the robot. The robot had a dynamics model for the ﬁrst surface
(learned a priori) but no models for the second surface. In
90% of the trials, the robot was unable to complete the task.
The feed-forward values being predicted by the model for the
rougher surface were much higher than those required for the
smoother surface, making the robot overshoot (when it tran-
sitioned to the smoother surface) and reach the safety limits
of joint torques, causing the robot to stop. This indicated that
performance is unreliable with a single incrementally revised
model when there are pronounced discrete mode changes.
Further, the robot was allowed to build a new model from
scratch each time a mode switch was observed. However, the
robot had to operate with high stiffness for a longer time until a reliable forward (dynamics) model for the new mode was created, which is undesirable. On the other hand, when different models for the two surfaces were available, the robot was able to switch between them much faster, spending much less time using the high-stiffness strategy. Results and figures from these experiments are omitted for brevity.
Fig. 3: Modes detected by the transition model and their confidence values. The numbers on top (in green) indicate the confidence with which the transition model identified that mode. The number below (in red) shows the mode with the next highest confidence. "N" indicates a transition to a new mode. The red vertical lines along the x-axis indicate the actual occurrence of mode transitions.
We then experimentally evaluated the following hypotheses:
H1: The framework provides reliable and efficient performance for changing-contact manipulation tasks; and
H2: The framework's performance is robust to changes in the direction of motion and applied forces.
In other words, we examined whether the framework reliably and efficiently transitions to the appropriate model in the presence of changes in the direction of motion and applied forces.
To evaluate the hypotheses, we ﬁrst considered the changing
surface task (Figure 1). The robot had to slide an object
back and forth between two surfaces, but one of the surfaces
was randomly changed to a surface with a different value
of friction. Starting with no knowledge about the surfaces, the robot incrementally identified each dynamic mode and built a dynamics model for each mode (i.e., each distinct surface) while briefly operating under high stiffness. Once it had learned the dynamics models for the different modes, it responded to any subsequent mode transition by quickly switching to the dynamics model of the identified mode.
Figure 3 summarizes the results over one trial of this
experiment. We observe that the framework is able to identify
transitions to existing or new modes with high conﬁdence.
In each instance, the second best choice of mode is asso-
ciated with a much lower value of conﬁdence. The results
also indicate that the algorithms and the underlying feature
representation make the performance robust to changes in the
direction of motion, i.e., a new mode is not identiﬁed when
the manipulator moves over a previously seen surface in a
different direction. There is some confusion between surfaces
2 and 3 because their friction values are somewhat similar.
Figure 4 shows the absolute error in trajectory tracking
during this task and the corresponding stiffness parameters
used by the controller. The peaks in the error plot correspond
to the sudden change of surface. The prediction made by the model of the previous mode caused a momentary loss of trajectory tracking accuracy, until the robot switched to the high-stiffness mode to identify the current mode. Once the robot identified the current mode, it used lower stiffness to complete the task. As discussed previously, switching to a previously learned mode requires a much shorter period of high stiffness (and expends much less energy) compared with learning a new dynamics model from scratch. These results support hypothesis H1, and to some extent H2.
Fig. 4: Performance of the framework for the changing-surface task. Top: controller stiffness variation during the task. Bottom: absolute error in trajectory tracking. The spikes correspond to an incorrect feed-forward prediction by the previous model after the transition.
Fig. 5: The different contacts used.
Next we conducted experiments with the changing contact
type task. The robot had to slide an object along a trajectory on
a surface under three different types of contacts (Figure 5). The
robot started with no prior knowledge of the task. During each
trial, the robot approached the table to execute a particular type
of contact while maintaining a normal force of 10N. Contact
with the surface triggers a transition; the robot then slides the object (in its grip) along the surface with the same 10N normal force. This is initially done at a high stiffness if it is learning
a new dynamics model, or at a suitably low stiffness if the
transition is to an existing mode/model.
Next, Figure 6 demonstrates the robustness of the framework to motion along a direction different from that used during training. The feed-forward model predictions and the corresponding variable impedance behaviour for one of the trials are shown, along with the model chosen with the highest confidence (bottom of the figure). The identified modes match the true mode in all cases.

TABLE II: Confusion matrix of average confidence (%) across 10 trials associated with mode recognition based on the learned dynamics models for three types of contacts. Top: normal force of 10N; Bottom: normal force of 20N.
Detected Mode   Contact 1   Contact 2   Contact 3
Contact 1       83          9           16
Contact 2       2           88          1
Contact 3       14          2           79
New Mode        1           1           4
Detected Mode   Contact 1   Contact 2   Contact 3
Contact 1       81          10          17
Contact 2       3           86          1
Contact 3       15          2           77
New Mode        1           2           5

Fig. 6: Testing the previously trained transition models for motion in a different direction. Top: torques measured about the axis parallel to the surface and perpendicular to the direction of motion; the spikes in the measurements correspond to contacts. Middle: end-effector forces predicted by the forward model for the current mode. Bottom: variation in controller stiffness due to the predicted forces.
Fig. 7: Testing the previously trained transition model under a different normal force (20N instead of 10N). Top: torques measured about the axis parallel to the surface and perpendicular to the direction of motion; the spikes in the measurements correspond to making contact. Middle: end-effector forces predicted by the forward model for the current mode. Bottom: controller stiffness variation due to the predicted forces.
The framework was then tested for the same task and
contacts while applying a different (constant) normal force
on the surface during the sliding motion (Figure 7). Although
the confidence associated with the modes is a little lower and the time taken to recognize the modes is a little longer, the
framework is still able to recognize the modes correctly and
the task is completed successfully using variable impedance
control. The lower conﬁdence can be attributed to the kinetic
friction assumption (µ=F/R) being unrealistic in many real
world tasks. These results support hypotheses H1 and H2.
V. CONCLUSIONS AND FUTURE WORK
This paper described a framework that formulated changing-
contact tasks as a piece-wise continuous, hybrid system.
Our framework does not require any prior knowledge about
the environment or the objects involved. Unlike data-driven
methods that require many labeled training examples, our
framework is able to build an initial dynamics model for
each observed mode from one demonstration of the target
task, incrementally revising (and introducing new) dynamics
models during task execution; the learned models provide
smooth variable impedance control within each mode. Unlike
other existing methods for related manipulation tasks, our method is not limited to the sequence of modes seen during demonstrations. Also, unlike existing work that modeled discretely changing dynamics, our framework requires no prior information about the number of modes involved in
the task. In the context of a manipulator sliding an object
along desired motion trajectories on a surface in the presence
of changing friction, applied force, and type of contacts,
experimental results demonstrate the framework’s ability to
reliably and efﬁciently follow the desired motion trajectories
invariant to changes in the direction of motion and magnitude
of applied forces.
Our future work will address the current limitations of the framework. For instance, the current strategy of switching
between modes (and dynamics models) is not smooth, with
spikes in sensor measurements in the guard (i.e., transition)
regions. Also, we will explore tasks with many more modes.
Another direction for future research is to investigate other
examples of changing-contact manipulation tasks, and addi-
tional factors that inﬂuence such tasks. In addition, it would
also be interesting to explore the automatic selection (or
learning) of the feature representation for each changing-
contact manipulation task. The longer-term objective is to
enable reliable, efﬁcient, and smooth learning and control
in the context of a robot manipulator performing complex
assembly tasks with multiple objects in complex domains.
REFERENCES
[1] Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. arXiv preprint.
[2] D. Baraff. Coping with friction for non-penetrating rigid body simulation. ACM SIGGRAPH, 25(4):31–41, 1991.
[3] L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and I. Palunko. Reinforcement learning for control: Performance, stability, and deep approximators. Annual Reviews in Control, 46:8–28, 2018.
[4] Karol Hausman, Jost Tobias Springenberg, Ziyu Wang, Nicolas Heess, and Martin Riedmiller. Learning an embedding space for transferable robot skills. In International Conference on Learning Representations, 2018.
[5] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation.
[6] Ajinkya Jain and Scott Niekum. Efficient hierarchical robot motion planning under uncertainty and hybrid dynamics. arXiv preprint arXiv:1802.04205, 2018.
[7] Aaron M. Johnson, Samuel A. Burden, and Daniel E. Koditschek. A hybrid systems model for simple manipulation and self-manipulation systems. The International Journal of Robotics Research, 35(11):1354–1392, 2016.
[8] Nate Kohl and Peter Stone. Policy gradient reinforcement learning for fast quadrupedal locomotion. In IEEE International Conference on Robotics and Automation (ICRA), volume 3, pages 2619–2624. IEEE, 2004.
[9] George Konidaris, Scott Kuindersma, Roderic Grupen, and Andrew Barto. Robot learning from demonstration by constructing skill trees. The International Journal of Robotics Research, 31(3):360–375, 2012.
[10] M. Koval, N. Pollard, and S. Srinivasa. Pre- and post-contact policy decomposition for planar contact manipulation under uncertainty. The International Journal of Robotics Research, 35(1-3):244–264, 2016.
[11] O. Kroemer, S. Niekum, and G. Konidaris. A review of robot learning for manipulation: Challenges, representations, and algorithms. arXiv preprint arXiv:1907.03146.
[12] G. Lee, Z. Marinho, A. Johnson, G. Gordon, S. Srinivasa, and M. Mason. Unsupervised learning for nonlinear piecewise smooth hybrid systems. arXiv preprint.
[13] S. Levine, N. Wagener, and P. Abbeel. Learning contact-rich manipulation skills with guided policy search. arXiv preprint arXiv:1501.05611, 2015.
[14] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–.
[15] Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind Rajeswaran, and Emanuel Todorov. Reinforcement learning for non-prehensile manipulation: Transfer from simulation to physical system. In IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots, pages 35–42. IEEE, 2018.
[16] M. Mathew, S. Sidhik, M. Sridharan, M. Azad, A. Hayashi, and J. Wyatt. Online learning of feed-forward models for task-space variable impedance control. In International Conference on Humanoid Robots (Humanoids), 2019.
[17] Yutaka Nakamura, Takeshi Mori, Masa-aki Sato, and Shin Ishii. Reinforcement learning for a biped robot based on a CPG-actor-critic method. Neural Networks.
[18] D. Nguyen-Tuong, M. Seeger, and J. Peters. Model learning with local Gaussian process regression. Advanced Robotics, 23(15):2015–2034, 2009.
[19] S. Niekum, S. Chitta, A. Barto, B. Marthi, and S. Osentoski. Incremental semantically grounded learning from demonstration. In Robotics: Science and Systems, volume 9, Berlin, Germany, 2013.
[20] P. Pastor, M. Kalakrishnan, L. Righetti, and S. Schaal. Towards associative skill memories. In 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 309–315. IEEE, 2012.
[21] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[22] Joseph M. Romano, Kaijen Hsiao, Günter Niemeyer, Sachin Chitta, and Katherine J. Kuchenbecker. Human-inspired robotic grasp control with tactile sensing. IEEE Transactions on Robotics, 27(6):1067–1079, 2011.
[23] M. Song and H. Wang. Highly efficient incremental estimation of Gaussian mixture models for online data stream clustering. In K. L. Priddy, editor, SPIE Conference Series, volume 5803, pages 174–183, March 2005.
[24] Freek Stulp, Evangelos A. Theodorou, and Stefan Schaal. Reinforcement learning with sequences of motion primitives for robust manipulation. IEEE Transactions on Robotics, 28(6):1360–1370, 2012.
[25] H. Sung. Gaussian mixture regression and classification. PhD thesis, Rice University, 2004.
[26] Yuval Tassa and Emo Todorov. Stochastic complementarity for local control of discontinuous dynamics. 2010.
[27] Marc Toussaint, Kelsey Allen, Kevin A. Smith, and Joshua B. Tenenbaum. Differentiable physics and stable modes for tool-use and manipulation planning. In Robotics: Science and Systems, 2018.
[28] Chenguang Yang, Gowrishankar Ganesh, Sami Haddadin, Sven Parusel, Alin Albu-Schaeffer, and Etienne Burdet. Human-like adaptation of force and impedance in stable and unstable interactions. IEEE Transactions on Robotics, 27(5):918–930, 2011.
[29] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. SIGMOD Record, 25(2):103–114, June 1996.
[30] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: A new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1(2).