
Online Learning of Hybrid Models for Variable Impedance Control of a Changing-Contact Manipulation Task

Saif Sidhik

University of Birmingham, UK

Mohan Sridharan

University of Birmingham, UK

Dirk Ruiken

Honda Research Institute (EU), Germany

Abstract—Many manipulation tasks comprise discrete action sequences characterized by continuous dynamics, with the transitions between these discrete dynamic modes characterized by discontinuous dynamics. The individual modes can represent different types of contacts, surfaces, or other factors, and different control strategies may be needed for each mode and the transitions between modes. This paper describes a piece-wise continuous, hybrid control framework that automatically detects transitions between modes and incrementally learns a model of the dynamics of each mode to support variable impedance control in that mode. The recognition of modes is invariant to the direction of motion and the magnitude of applied forces. Also, new modes are identified automatically and suitable models of the corresponding dynamics are learned. The framework is evaluated on a robot manipulator sliding an object on a surface along a desired motion trajectory in the presence of changes in surface friction, applied force, or the type of contact between the object and the surface. Experimental results indicate reliable and efficient recognition of modes, learning of dynamics models, and variable-impedance control during task execution.

I. MOTIVATION

Consider a robot manipulator sliding an object over a

surface along a desired pattern, as shown in Figure 1. The

system’s dynamics vary markedly before and after the object

comes in contact with the surface, and based on the type

of contact (e.g., surface or edge contact), surface friction,

applied force, and other factors. We consider all such tasks that

involve changes in dynamics due to changes in the nature of

contact, i.e., changes in the interaction between two objects as

“changing-contact” tasks. Many practical manipulation tasks

are changing-contact tasks characterized by discontinuities in

the dynamics when the nature of the contact changes, making

it difﬁcult to learn a single model of the task dynamics. They

can be modeled as a hybrid system with continuous dynamics

within each of a number of discrete dynamic modes that

may need a distinct control strategy [11]. Then, the overall

task’s dynamics are piece-wise continuous, with the system

transitioning between the individual modes over time.

Constructing separate (continuous) models for the different modes, each well-suited for operation within a mode, eliminates the need for a combined model, but it introduces the need for a transition model that chooses from the models and the control strategies for the individual modes. Such a hybrid model [18] uses the modularity of the sub-tasks to construct the overall transition model [12]. However, it requires the ability

Fig. 1: Sliding an object along a desired pattern on three surfaces with

different values of friction; images represent different transitions.

to accurately recognize the mode at any point in time, revise

the existing dynamics models to adapt to the changes within

any given mode, and to identify and learn dynamics models for

previously unseen modes. This paper describes a framework

that addresses these requirements and enables hybrid control

of a changing-contact manipulation task by:

1) Incrementally learning a non-linear, piece-wise continuous model of the dynamics of any given task from a single demonstration, without prior knowledge of the task, its modes, or the order in which the modes appear.

2) Incorporating a transition model that automatically clusters the modes associated with any given task, and helps transition to new modes during task execution.

3) Introducing a reduced feature representation that makes the learning of dynamics models computationally efficient, and makes the identification of modes independent of the motion direction and magnitude of applied forces.

4) Incrementally and continuously learning and revising a probabilistic model of the dynamics of any particular mode, using the model for variable impedance control and compliant motion within that mode.

The novelty is in the ﬁrst three contributions; the last one

builds on our prior work on variable impedance control of

continuous contact tasks [16]. To better understand the learning and control challenges faced by such a learning framework

(e.g., discretely-changing, non-linear, piecewise continuous

environment dynamics), we chose to explore a representative

changing-contact task with discretely changing dynamics: a

robot manipulator sliding an object on a surface in a desired

motion pattern. We limit sensor input to that from a force-

torque sensor on the manipulator (i.e., we do not use an

external camera), and evaluate the framework in the presence

of discrete changes in surface friction, applied force, and type

of contact.

We begin with a review of related work in Section II,

followed by a description of the proposed framework in

Section III. Section IV discusses the experimental results,

followed by the conclusions in Section V.

II. RELATED WORK

There is a rich literature of methods developed to address the learning and control problems associated with robot manipulation [11]. This includes reinforcement learning (RL) methods [9, 24] and recent methods based on a combination of deep networks and RL for learning flexible behaviors from complex data [1, 4, 14, 15]. These data-driven methods require large labeled datasets, which often need to be obtained from multiple repetitions of the task by the robot. These requirements are difficult to satisfy in practical domains, especially on a physical robot. Also, the training process optimizes parameters of specific skills (or their sequence), and the internal representations and decision-making mechanisms are opaque, making it difficult to transfer the learned policies to new tasks. Sim-to-real learning strategies have been developed to reduce the need to perform training on real robots for manipulation tasks. However, aspects of dynamics in the real world, e.g., the continuous-time dynamics of rigid bodies with friction, are too complicated (NP-hard) to be modelled in a real-time dynamics simulator [2, 7]. These methods are also not well-suited to a hybrid system formulation because they (implicitly or explicitly) consider a single model over the different modes of the manipulation action [11].

RL and optimal control methods applied to robot manipulation often assume that the underlying task dynamics

are smooth. Also, the application of learning strategies to

hybrid systems has been limited [12, 26], with many of them

focusing on bipedal locomotion [8, 17]. Planning approaches

for manipulation domains often explicitly take the multi-modal

structure of the dynamics of manipulation into account [27, 6].

However, these planning methods assume a pre-deﬁned model

of the system and prior knowledge of actions and the modes.

Unlike online learning approaches such as [28], our framework

does not require a periodically repeating trajectory, nor does

it learn a time-series of controller parameters to be used in

a repeatable dynamic environment. The framework learns to

adapt its controller based on the current dynamic forces it

experiences.

Many methods have shown the beneﬁts of incorporating

modes or phases into the design of controllers [22], and

many methods have been proposed to learn controllers for

such multi-phase tasks [3, 10, 13]. Different strategies of

sequencing motion primitives have also been used to solve manipulation tasks. However, most of these assume that a library

of modes or motion primitives already exists [20], or segment a

sequence of primitives from human demonstrations [19]. This

makes the learned policy dependent on the speciﬁc movements

and their sequence.

The framework described in this paper for changing-contact manipulation draws inspiration from the approaches that incorporate modes in the design of controllers. However, our framework supports (a) automatic recognition of modes and identification of new modes invariant to the direction of motion and magnitude of the applied force; and (b) incremental learning and revision of dynamics models for variable impedance control in the individual modes.

III. PROBLEM FORMULATION AND FRAMEWORK

This section ﬁrst describes the formulation of changing-

contact manipulation tasks as a piece-wise continuous hybrid

system (Section III-A). Section III-B describes the control

strategy and learning of continuous dynamics within a single

mode. The detection and learning of the discrete dynamic

modes are then explained in Section III-C.

A. Piece-wise Continuous Hybrid System

In a piece-wise continuous hybrid system, the state can be described as the tuple $(m, s)$, where $m \in M$ is a mode from the discrete set of modes $M$, and $s \in S_m$ is an element of the continuous subspace $S_m \subseteq \mathbb{R}^d$ associated with $m$. This formulation assumes that the subspaces do not intersect or overlap, i.e., $S_m \cap S_n = \emptyset \;\; \forall\, m \neq n$. The evolution of $s$ within a mode is determined by a discrete-time continuous function $S_m(\cdot)$, but the state transition is discrete and discontinuous at the boundaries between modes. Lee et al. [12] call the boundary between modes $m$ and $m'$, where the transition occurs deterministically, a guard region, denoted $G_{m,m'} \subseteq S_m$. In a guard region, $s$ is transported to $s_{t+1} \in S_{m'}$ through a reset function $r_{m,m'}(\cdot)$. The state propagation is thus governed by:

$$s_{t+1} = \begin{cases} r_{m_t, m_{t+1}}(s_t) + w_t & \text{if } s_t \in G_{m_t, m_{t+1}} \\ S_{m_t}(s_t) + w_t & \text{if } s_t \in S_{m_t} \end{cases} \qquad (1)$$

where $w_t$ is additive (Gaussian) process noise. In the context of the sliding task considered in this paper, the forces and torques measured by the robot at its end-effector constitute the observable state ($s$) of the system that varies continuously within each contact mode. This formulation makes the reasonable assumption that properties such as friction are continuous across the surface of each object. The control strategy guiding the object's motion and the (static or smoothly changing) environment in that mode can be considered to determine the function $S_m(\cdot)$ governing the evolution of $s$ in that mode. When a mode change occurs (in a guard region), the dynamics is transported to a new state in mode $n$, where the state evolution is guided by the function $S_n(\cdot)$. For changing-contact tasks, the changes at the guard regions are sudden and pronounced compared with the readings within a contact mode. The mode switches impose structure on manipulation tasks; the transitions can be considered as triggers for changing the current model of the environment.
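To make the formulation concrete, the propagation rule in Equation 1 can be sketched as follows; the two modes, guard predicate, and reset function here are hypothetical illustrations, not the learned models described in the rest of this section.

```python
import numpy as np

def hybrid_step(s, mode, dynamics, guards, resets, noise_std=0.0):
    """One step of piece-wise continuous hybrid propagation (Equation 1):
    apply the reset function if s lies in a guard region, otherwise follow
    the continuous dynamics of the current mode; w_t is process noise."""
    w = np.random.normal(0.0, noise_std, size=np.shape(s))
    for (m, m_next), in_guard in guards.items():
        if m == mode and in_guard(s):
            return resets[(m, m_next)](s) + w, m_next  # discrete jump to mode m'
    return dynamics[mode](s) + w, mode                 # continuous evolution S_m(s)

# Hypothetical two-mode system: mode 0 (free motion) transitions to mode 1
# (in contact) once the state norm exceeds a threshold.
dynamics = {0: lambda s: 0.99 * s, 1: lambda s: 0.90 * s}
guards   = {(0, 1): lambda s: float(np.linalg.norm(s)) > 5.0}
resets   = {(0, 1): lambda s: 0.5 * s}

s, m = hybrid_step(np.array([6.0]), 0, dynamics, guards, resets)
# The guard fires: the state is reset and the mode switches to 1.
```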

B. Control Strategy and Learning Dynamics in a Mode

The control strategy and the method to learn the dynamics model for each separate mode were improved from our previous work [16]. Our approach for learning the continuous dynamics of an individual mode uses an Incremental Gaussian Mixture Model (IGMM) [23]. IGMM internally uses a variant of the Expectation-Maximization (EM) algorithm to fit the model. In our implementation, the GMM is incrementally fit over points $X = (X_1, \ldots, X_T)$, with $X_t = [S_{t-1}, D_t]$, where each point contains information about a previous observable state ($S$) along with the current values to be predicted ($D$). When the learned model is used during task execution, the values for the next time instant are predicted as a function of the robot's current state, $(D_{t+1} \,|\, S_t)$, using Gaussian Mixture Regression (GMR) [25]. In this work, the forward model learns to predict the end-effector forces and torques ($[F^{ee}_t, \tau_t]$) from the previous end-effector force, torque, and end-effector velocity ($[F^{ee}_{t-1}, \tau_{t-1}, \dot{x}_{t-1}]$). We used the magnitudes of force, torque, and end-effector velocity for learning and prediction instead of their 3D vector representations. Since the magnitudes of frictional forces and torques are independent of the direction of motion (in ideal cases), the simplified representation is sufficient to learn and predict the end-effector forces and torques along the direction of motion. This reduced representation of forces and torques made the learning process simpler, more computationally efficient, and also independent of the direction of motion. The learned model always predicts the forces and torques along (or against) the direction of motion. The components of force and torques along the axes of motion can be recovered when needed, or estimated using the previously measured sensor values.
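As an illustration of the GMR prediction step, the following sketch fits a GMM over joint vectors $[S_{t-1}, D_t]$ and computes the conditional mean $E[D \,|\, S]$; the data are synthetic, and the incremental IGMM update of [23] is replaced by a batch scikit-learn fit for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

def gmr_predict(gmm, s, n_in):
    """Conditional mean E[D | S = s] from a GMM fit on joint vectors [S, D]."""
    preds, weights = [], []
    for k in range(gmm.n_components):
        mu, cov = gmm.means_[k], gmm.covariances_[k]
        mu_s, mu_d = mu[:n_in], mu[n_in:]
        c_ss, c_ds = cov[:n_in, :n_in], cov[n_in:, :n_in]
        # responsibility of component k for input s (marginal over S dims)
        weights.append(gmm.weights_[k] * multivariate_normal.pdf(s, mu_s, c_ss))
        # conditional mean of D given s for component k
        preds.append(mu_d + c_ds @ np.linalg.solve(c_ss, s - mu_s))
    weights = np.array(weights) / np.sum(weights)
    return np.sum(weights[:, None] * np.array(preds), axis=0)

# Synthetic data: next-step "force magnitude" as a noisy linear function of state.
rng = np.random.default_rng(0)
S = rng.uniform(0, 1, size=(500, 1))
D = 2.0 * S + 0.01 * rng.standard_normal((500, 1))
gmm = GaussianMixture(n_components=3, covariance_type='full',
                      random_state=0).fit(np.hstack([S, D]))
pred = gmr_predict(gmm, np.array([0.5]), n_in=1)  # close to 1.0 here
```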

The predictions from the forward model provide the feed-forward term that cancels out the effect of the environment forces (friction) during motion, in the control equation:

$$u_t = K^p_t \Delta x_t + K^d_t \Delta \dot{x}_t + u^{fc}_t + \lambda_{t-1} k_t \qquad (2)$$

$$u^{fc}_t = K^f_t \Delta F_t + F^d_t \qquad (3)$$

$$K^p_t = K^p_{free} + (1 - \lambda_{t-1})(K^p_{max} - K^p_{free}) \qquad (4)$$

$$\lambda_t = 1 - \frac{1}{1 + e^{-r(\varepsilon_t - \varepsilon_0)}} \qquad (5)$$

where $u_t$ is the control command to the robot (i.e., task-space force) at time $t$; $K^p_t$ and $K^d_t$ are the (positive definite) stiffness and damping matrices of the feedback controller for motion; $u^{fc}_t$ is the simple force feedback control (Equation 3) for the interaction task (the direction of force control is orthogonal to the direction of motion control), with proportional gain $K^f_t$ for the error in task-space force $\Delta F$; $k_t$ is the feed-forward term (end-effector forces and torques) predicted by the forward model associated with the present mode $m_t$, using GMR as described previously; and $\Delta x$ and $\Delta \dot{x}$ are the errors in the end-effector position and velocity at each instant. The weighting factor $\lambda_t$ is a function of the accuracy of the forward model at instant $t$; it maps the error in prediction from the forward model ($\varepsilon_t$) to a value between 0 and 1 (such as the logistic function in Equation 5). The logistic growth rate $r$ and the sigmoid midpoint $\varepsilon_0$ are hyperparameters that have to be tuned for the task. This ensures that the overall control law (Equation 2) relies on the feed-forward term only if the dynamics of the mode is learned accurately. Otherwise, the robot aims to follow the goal trajectory more accurately by prioritizing the feedback control term.

Equation 4 defines how the stiffness parameter is updated at each instant in the variable-impedance control law as a function of the prediction accuracy of the forward model. $K^p_{max}$ is the maximum allowed stiffness, and $K^p_{free}$ is the minimum stiffness parameter that would provide accurate position tracking in the absence of all external disturbances (motion in free space). The damping term is updated as $K^d_t = \sqrt{K^p_t/4}$ using the constraint for critically-damped systems [5]. We demonstrated the advantage of using this hybrid variable impedance formulation in [16].
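The gain schedule in Equations 2-5 can be sketched as follows, with scalar gains standing in for the positive definite matrices and a single $\lambda$ used for both the feed-forward and stiffness terms; all numeric values are illustrative.

```python
import numpy as np

def impedance_command(dx, dxdot, dF, Fd, k_ff, eps, *,
                      Kp_free, Kp_max, Kf, r, eps0):
    """Variable impedance control command (Equations 2-5), scalar gains
    for brevity; the paper uses positive definite gain matrices."""
    lam = 1.0 - 1.0 / (1.0 + np.exp(-r * (eps - eps0)))   # Eq. 5
    Kp = Kp_free + (1.0 - lam) * (Kp_max - Kp_free)       # Eq. 4
    Kd = np.sqrt(Kp / 4.0)                                # critical damping [5]
    u_fc = Kf * dF + Fd                                   # Eq. 3
    u = Kp * dx + Kd * dxdot + u_fc + lam * k_ff          # Eq. 2
    return u, Kp, lam

# With a large prediction error (eps >> eps0), lam approaches 0: the stiffness
# rises toward Kp_max and the feed-forward term is effectively ignored.
u, Kp, lam = impedance_command(0.01, 0.0, 0.0, 0.0, k_ff=5.0, eps=10.0,
                               Kp_free=50.0, Kp_max=800.0, Kf=1.0,
                               r=2.0, eps0=0.5)
```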

C. Contact Mode Recognition and Identiﬁcation

Our approach for recognizing known modes and identifying new modes in changing-contact tasks is based on the observation that any change in mode is accompanied by a sudden, significant change in the sensor readings. In our framework, the robot responds to pronounced changes in force-torque measurements by briefly using a high-stiffness control strategy while quickly obtaining a batch of sensor data to confirm and respond to the transition. The robot learns a new dynamics model if a new mode is detected, and transitions to (and revises) an existing dynamics model if it is transitioning to a known mode.

The management of modes is based on an online incremental clustering algorithm called Balanced Iterative Reducing

and Clustering using Hierarchies (BIRCH) [29, 30]. This

algorithm incrementally and dynamically clusters incoming

data for given memory and time constraints, without having

to examine all existing data points or clusters. We used the

implementation of BIRCH in the Scikit-learn library [21].

Each cluster is considered to represent a mode in a feature

space (more details below), with the clusters being updated

using batches of the feature data. The fraction of the input

feature vectors assigned to any existing cluster determines the

conﬁdence in the corresponding mode being the current mode.

If the highest such conﬁdence value is above a threshold,

the dynamics model of that mode is used and revised until a

mode change occurs. If the feature vectors are not sufﬁciently

similar to an existing cluster, a new cluster (i.e., mode) and

the corresponding dynamics model are constructed and revised

(see Section III-B) until a mode transition occurs.
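A minimal sketch of this mode-management step follows, assuming synthetic 2-D feature batches and illustrative BIRCH parameters (the actual features and thresholds are task-dependent):

```python
import numpy as np
from sklearn.cluster import Birch

# Hypothetical 2-D feature batches (Section III-C): each mode forms a
# compact cluster in feature space.
rng = np.random.default_rng(1)
mode_a = rng.normal([0.2, 0.3], 0.01, size=(50, 2))
mode_b = rng.normal([0.8, 0.1], 0.01, size=(50, 2))

birch = Birch(n_clusters=None, threshold=0.1)
birch.partial_fit(mode_a)   # incremental update with a new batch of features
birch.partial_fit(mode_b)

# A new batch drawn from mode B: the fraction of feature vectors assigned
# to the winning cluster is the confidence in that mode.
batch = rng.normal([0.8, 0.1], 0.01, size=(30, 2))
labels = birch.predict(batch)
counts = np.bincount(labels)
best = int(np.argmax(counts))
confidence = counts[best] / len(batch)  # high: the batch matches one cluster
```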

The key factor influencing the reliability and generalizability of the framework is the choice of feature representation for the modes. This representation is task-dependent, but the objective is to identify one or more properties that vary substantially when a mode change occurs, while concisely and uniquely representing the modes.

For the task of sliding an object over surfaces with different values of friction, the property that strongly influences the end-effector forces ($F^{ee}$) is the friction coefficient between the object and the surface. When two objects slide over each

Fig. 2: The torque measured at the pivot ($\tau$) varies for different relative orientations of the object ($\theta$), unlike the force at the tip ($F_r$). The object is moving along $\dot{x}$, resulting in a frictional resistance $F_r$ at the point of contact in the opposite direction.

other at constant velocity, $F^{ee}$ is proportional to the applied normal force ($R$) and the friction coefficient ($\mu$) (assuming the relative orientation of their surface normals does not change); $\mu$ can then be estimated as:

$$\mu \propto \frac{\|F^{ee}\|}{R} \qquad (6)$$

A concise feature representation for this task is thus $\frac{\|F^{ee}_t\|}{R_t}$, which has the effect of making mode classification independent of the magnitude of the applied force.

In a similar manner, for changes in the type of contact, end-effector orientation is a useful feature, but small changes in orientation may require different modes. A more reasonable feature is the magnitude of the end-effector torques that can be measured using the force-torque sensor in the wrist:

$$\tau = F_r\, l \sin\theta \qquad (7)$$

where $F_r$ is the force at the tip, $l$ is the length of the pivot arm, and $\theta$ is the orientation between the surface normals. Figure 2 indicates that for any object, $\tau$ is different for the different types of contacts. With the magnitude of the torques ($\|\tau\|$) as the feature representation, modes can be classified independent of the motion direction and object orientation.

This representation would not work when the magnitude of the applied force differs. If we instead assume that the force measured at the wrist ($F^{ee}$) approximates the force at the tip of the object ($F_r$), Equations 6 and 7 imply that $\frac{\|\tau_t\|}{R_t}$ is invariant to the magnitude of the applied force for a fixed relative orientation between the objects in contact:

$$\tau = \mu R\, l \sin\theta$$

Ideally, $\frac{\|\tau\|}{R}$ is constant for each mode (based on $\theta$), provided the object geometry ($l$) and friction ($\mu$) do not change. Experimental analysis revealed that this parameter by itself is insufficient to distinguish between contacts when the applied normal force changes, because the assumption about kinetic friction ($F_r = \mu R$) does not hold in many real-world situations [2]. We thus use $[\frac{\|\tau\|}{R}, \frac{\|F^{ee}\|}{R}]$ as the feature representation for this task; it supports better generalization over different normal forces while reliably distinguishing different changing contacts.
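The resulting feature computation is simple; a sketch follows (the wrist readings below are made-up values, used only to show the normalization):

```python
import numpy as np

def mode_features(F_ee, tau, R):
    """Feature vector [||tau||/R, ||F_ee||/R] used for mode recognition.
    F_ee, tau: wrist force/torque readings; R: applied normal force."""
    return np.array([np.linalg.norm(tau) / R, np.linalg.norm(F_ee) / R])

# Doubling the applied normal force (and hence, ideally, the friction force
# and torque) leaves the features unchanged:
f1 = mode_features(F_ee=[2.0, 0.0, 0.0], tau=[0.0, 0.4, 0.0], R=10.0)
f2 = mode_features(F_ee=[4.0, 0.0, 0.0], tau=[0.0, 0.8, 0.0], R=20.0)
```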

TABLE I: Control loop of framework

Input: Control parameters $K^p_{free}$, $K^p_{max}$; dynamics models $f_i,\, i \in [1, M]$; current mode $m = 0$.

1:  while motion pattern not complete do
2:      if object in contact with surface then
3:          if mode transition detected then
4:              $K^p_t \leftarrow K^p_{max}$
5:              $m$ = classify_mode_learn_dynamics()
6:          else
7:              Update and use $f_m$ for control (Section III-B)
8:          end
9:      else
10:         $K^p_t \leftarrow K^p_{free}$
11:     end
12: end

Algorithm I is an overview of the framework’s control

loop for a manipulator sliding an object on a surface; it

proceeds until a desired motion pattern is completed. Control

and learning methods are used only after the object comes in

contact with the surface (lines 2-11). As described earlier, the

robot responds to a detected mode transition by setting a high

stiffness, collecting feature samples, determining the mode and

learning/revising the corresponding dynamics model (lines 3-

5). In the absence of a mode transition, the robot continues

with its current mode and dynamics model (lines 6-8).
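This control loop can be rendered as the following sketch; the callback arguments are hypothetical stand-ins for the framework's sensing, classification, and control routines, not its actual API.

```python
def control_loop(motion_done, in_contact, transition_detected,
                 classify_mode_learn_dynamics, update_and_control,
                 Kp_free, Kp_max):
    """Sketch of the control loop in Table I. All arguments are callbacks;
    update_and_control(m) revises mode m's model and returns the stiffness used."""
    mode, Kp = 0, Kp_free
    while not motion_done():
        if in_contact():
            if transition_detected():
                Kp = Kp_max                       # stiff, conservative control
                mode = classify_mode_learn_dynamics()
            else:
                Kp = update_and_control(mode)     # Section III-B
        else:
            Kp = Kp_free                          # free-space motion
    return mode, Kp

# Tiny simulated run: three control steps, a transition detected on the first.
_state = {'t': 0}
_trans = iter([True, False, False])
mode, Kp = control_loop(
    motion_done=lambda: (_state.__setitem__('t', _state['t'] + 1) or _state['t'] > 3),
    in_contact=lambda: True,
    transition_detected=lambda: next(_trans),
    classify_mode_learn_dynamics=lambda: 1,
    update_and_control=lambda m: 100.0,
    Kp_free=50.0, Kp_max=800.0)
```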

IV. EXPERIMENTAL EVALUATION

We used a 7-DoF Franka Emika Panda manipulator robot

for our experiments. The robot had to slide an object along

a desired motion pattern on a surface, and we considered a

“changing surface” task and a “changing contact type” task to

evaluate the hypotheses. We used the root mean square error

(RMSE) in following/tracking a desired motion trajectory as

a key performance measure.

To evaluate the need for separate models, the robot was

asked to slide an object across two surfaces with markedly

different friction, with the transition point being unknown to

the robot. The robot had a dynamics model for the ﬁrst surface

(learned a priori) but no models for the second surface. In

90% of the trials, the robot was unable to complete the task.

The feed-forward values being predicted by the model for the

rougher surface were much higher than those required for the

smoother surface, making the robot overshoot (when it transitioned to the smoother surface) and reach the safety limits

of joint torques, causing the robot to stop. This indicated that

performance is unreliable with a single incrementally revised

model when there are pronounced discrete mode changes.

Further, the robot was allowed to build a new model from

scratch each time a mode switch was observed. However, the

robot had to operate with high stiffness for a longer time

Fig. 3: Modes detected by the transition model and their conﬁdence

values. The numbers on top (in green) indicate the conﬁdence with

which the transition model identiﬁed that mode. The number below

(in red) shows the mode with the next highest conﬁdence. “N”

indicates a transition to a new mode. The red vertical lines along

the x-axis indicate the actual occurrence of mode transitions.

until a reliable forward (dynamics) model for the new mode

was created, which is undesirable. On the other hand, when

different models for the two surfaces are available, the robot

was able to switch between them much faster, spending much

less time using the high-stiffness strategy. Results and ﬁgures

from these experiments are omitted for brevity.

We then experimentally evaluated the following hypotheses:

H1: The framework provides reliable and efficient performance for changing-contact manipulation tasks; and

H2: The framework's performance is robust to changes in the direction of motion and applied forces.

These experiments examine whether the framework reliably and efficiently transitions to the appropriate model in the presence of changes in the direction of motion and applied forces.

To evaluate the hypotheses, we first considered the changing-surface task (Figure 1). The robot had to slide an object back and forth between two surfaces, but one of the surfaces was randomly changed to a surface with a different value of friction. Starting with no knowledge about the surfaces, the robot incrementally identifies each dynamic mode and builds a dynamics model for each mode (i.e., each distinct surface) while operating briefly under high stiffness. Once it has learned the dynamics models for the different modes, it responds to any subsequent mode transition by quickly transitioning to the corresponding dynamics model.

Figure 3 summarizes the results over one trial of this

experiment. We observe that the framework is able to identify

transitions to existing or new modes with high conﬁdence.

In each instance, the second best choice of mode is associated with a much lower value of confidence. The results

also indicate that the algorithms and the underlying feature

representation make the performance robust to changes in the

direction of motion, i.e., a new mode is not identiﬁed when

the manipulator moves over a previously seen surface in a

different direction. There is some confusion between surfaces

2 and 3 because their friction values are somewhat similar.

Figure 4 shows the absolute error in trajectory tracking

during this task and the corresponding stiffness parameters

used by the controller. The peaks in the error plot correspond

to the sudden change of surface. The prediction made by the model of the previous mode caused a momentary loss of trajectory tracking accuracy, until the robot switched to the high-stiffness mode to identify the current mode. Once the robot identified the current mode, it used lower stiffness to complete the task. As discussed previously, switching to

Fig. 4: Performance of framework for changing-surface task. Top:

controller stiffness variation during the task. Bottom: absolute error

in trajectory tracking. The spikes correspond to an incorrect feed-

forward prediction by the previous model after the transition.

Fig. 5: The different contacts used.

a previously learned mode requires a much shorter period of

high stiffness (and expends much less energy) compared with

learning a new dynamics model from scratch. These results

support hypothesis H1, and to some extent H2.

Next we conducted experiments with the changing contact

type task. The robot had to slide an object along a trajectory on

a surface under three different types of contacts (Figure 5). The

robot started with no prior knowledge of the task. During each

trial, the robot approached the table to execute a particular type

of contact while maintaining a normal force of 10N. Contact

with the surface triggers a transition; the robot proceeds to

slide the object (in its grip) along the surface with the force of

10N. This is initially done at a high stiffness if it is learning

a new dynamics model, or at a suitably low stiffness if the

transition is to an existing mode/model.

Next, Figure 6 demonstrates the robustness of the framework to motion along a direction different from that used

TABLE II: Confusion matrices of average confidence (%) across 10 trials associated with mode recognition based on the learned dynamics models for three types of contacts. Top: normal force of 10 N; Bottom: normal force of 20 N. Columns give the ground-truth contact type.

Top (10 N normal force):
Detected Mode | Contact 1 | Contact 2 | Contact 3
Contact 1     | 83        | 9         | 16
Contact 2     | 2         | 88        | 1
Contact 3     | 14        | 2         | 79
New Mode      | 1         | 1         | 4

Bottom (20 N normal force):
Detected Mode | Contact 1 | Contact 2 | Contact 3
Contact 1     | 81        | 10        | 17
Contact 2     | 3         | 86        | 1
Contact 3     | 15        | 2         | 77
New Mode      | 1         | 2         | 5

Fig. 6: Testing the previously trained transition models for motion

in a different direction. Top: Torques measured about axis parallel

to surface and perpendicular to direction of motion; The spikes

in the measurements correspond to contacts; Middle: End-effector

forces predicted by the forward model for the current mode; Bottom:

Variation in controller stiffness due to the predicted forces.

during training. The feed-forward model predictions and the corresponding variable impedance behaviour for one of the trials are shown, along with the model chosen with the highest confidence (bottom of the figure). The identified modes match the true mode in all cases.

Fig. 7: Testing the previously trained transition model under different

normal force (20N instead of 10N). Top: Torques measured about

axis parallel to surface and perpendicular to direction of motion; The

spikes in the measurements correspond to making contact; Middle:

End-effector forces predicted by the forward model for the current

mode; Bottom: Controller stiffness variation due to the predicted

forces.

The framework was then tested for the same task and

contacts while applying a different (constant) normal force

on the surface during the sliding motion (Figure 7). Although the confidence associated with the modes is a little lower and the time taken to recognize the modes is a little longer, the framework is still able to recognize the modes correctly and

the task is completed successfully using variable impedance

control. The lower conﬁdence can be attributed to the kinetic

friction assumption (µ=F/R) being unrealistic in many real

world tasks. These results support hypotheses H1 and H2.

V. CONCLUSIONS AND FUTURE WORK

This paper described a framework that formulated changing-

contact tasks as a piece-wise continuous, hybrid system.

Our framework does not require any prior knowledge about

the environment or the objects involved. Unlike data-driven

methods that require many labeled training examples, our

framework is able to build an initial dynamics model for

each observed mode from one demonstration of the target

task, incrementally revising (and introducing new) dynamics

models during task execution; the learned models provide

smooth variable impedance control within each mode. Unlike

other existing methods for related manipulation tasks [19], our

method is not limited to the sequence of modes seen during demonstrations. Also, unlike existing work that modeled

discretely changing dynamics [12], our framework requires

no prior information about the number of modes involved in

the task. In the context of a manipulator sliding an object

along desired motion trajectories on a surface in the presence

of changing friction, applied force, and type of contacts,

experimental results demonstrate the framework’s ability to

reliably and efﬁciently follow the desired motion trajectories

invariant to changes in the direction of motion and magnitude

of applied forces.

Our future work will address the current limitations of the framework. For instance, the current strategy of switching

between modes (and dynamics models) is not smooth, with

spikes in sensor measurements in the guard (i.e., transition)

regions. Also, we will explore tasks with many more modes.

Another direction for future research is to investigate other

examples of changing-contact manipulation tasks, and additional factors that influence such tasks. In addition, it would

also be interesting to explore the automatic selection (or

learning) of the feature representation for each changing-

contact manipulation task. The longer-term objective is to

enable reliable, efﬁcient, and smooth learning and control

in the context of a robot manipulator performing complex

assembly tasks with multiple objects in complex domains.

REFERENCES

[1] Marcin Andrychowicz, Bowen Baker, Maciek Chociej,

Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur

Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al.

Learning dexterous in-hand manipulation. arXiv preprint

arXiv:1808.00177, 2018.

[2] D. Baraff. Coping with friction for non-penetrating rigid

body simulation. ACM SIGGRAPH, 25(4):31–41, 1991.

[3] L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and I. Palunko. Reinforcement learning for control: Performance, stability, and deep approximators. Annual Reviews in Control, 46:8–28, 2018.

[4] Karol Hausman, Jost Tobias Springenberg, Ziyu Wang,

Nicolas Heess, and Martin Riedmiller. Learning an

embedding space for transferable robot skills. In Inter-

national Conference on Learning Representations, 2018.

[5] A.J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation, 25(2):328–373, 2013.

[6] Ajinkya Jain and Scott Niekum. Efﬁcient hierarchical

robot motion planning under uncertainty and hybrid

dynamics. arXiv preprint arXiv:1802.04205, 2018.

[7] Aaron M Johnson, Samuel A Burden, and Daniel E

Koditschek. A hybrid systems model for simple manip-

ulation and self-manipulation systems. The International

Journal of Robotics Research, 35(11):1354–1392, 2016.

[8] Nate Kohl and Peter Stone. Policy gradient reinforcement

learning for fast quadrupedal locomotion. In IEEE

International Conference on Robotics and Automation,

2004. Proceedings. ICRA’04. 2004, volume 3, pages

2619–2624. IEEE, 2004.

[9] George Konidaris, Scott Kuindersma, Roderic Grupen,

and Andrew Barto. Robot learning from demonstration

by constructing skill trees. The International Journal of

Robotics Research, 31(3):360–375, 2012.

[10] M. Koval, N. Pollard, and S. Srinivasa. Pre-and post-

contact policy decomposition for planar contact manip-

ulation under uncertainty. The International Journal of

Robotics Research, 35(1-3):244–264, 2016.

[11] O. Kroemer, S. Niekum, and G. Konidaris. A review of

robot learning for manipulation: Challenges, representa-

tions, and algorithms. arXiv preprint arXiv:1907.03146,

2019.

[12] G. Lee, Z. Marinho, A. Johnson, G. Gordon, S. Srini-

vasa, and M. Mason. Unsupervised learning for non-

linear piecewise smooth hybrid systems. arXiv preprint

arXiv:1710.00440, 2017.

[13] S Levine, N Wagener, and P Abbeel. Learning contact-

rich manipulation skills with guided policy search (2015).

arXiv preprint arXiv:1501.05611.

[14] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter

Abbeel. End-to-end training of deep visuomotor policies.

The Journal of Machine Learning Research, 17(1):1334–

1373, 2016.

[15] Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind

Rajeswaran, and Emanuel Todorov. Reinforcement

learning for non-prehensile manipulation: Transfer from

simulation to physical system. In IEEE International

Conference on Simulation, Modeling, and Programming

for Autonomous Robots, pages 35–42. IEEE, 2018.

[16] M. Mathew, S. Sidhik, M. Sridharan, M. Azad,

A. Hayashi, and J. Wyatt. Online Learning of Feed-

Forward Models for Task-Space Variable Impedance

Control. In International Conference on Humanoid

Robots (Humanoids), 2019.

[17] Yutaka Nakamura, Takeshi Mori, Masa-aki Sato, and

Shin Ishii. Reinforcement learning for a biped robot

based on a cpg-actor-critic method. Neural networks,

20(6):723–735, 2007.

[18] D. Nguyen-Tuong, M. Seeger, and J. Peters. Model learn-

ing with local gaussian process regression. Advanced

Robotics, 23(15):2015–2034, 2009.

[19] S. Niekum, S. Chitta, A. Barto, B. Marthi, and S. Os-

entoski. Incremental semantically grounded learning

from demonstration. In Robotics: Science and Systems,

volume 9, pages 10–15607. Berlin, Germany, 2013.

[20] P. Pastor, M. Kalakrishnan, L. Righetti, and S. Schaal.

Towards associative skill memories. In 2012 12th IEEE-

RAS International Conference on Humanoid Robots (Hu-

manoids), pages 309–315. IEEE, 2012.

[21] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,

B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,

R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour-

napeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-

learn: Machine learning in Python. Journal of Machine

Learning Research, 12:2825–2830, 2011.

[22] Joseph M Romano, Kaijen Hsiao, G¨

unter Niemeyer,

Sachin Chitta, and Katherine J Kuchenbecker. Human-

inspired robotic grasp control with tactile sensing. IEEE

Transactions on Robotics, 27(6):1067–1079, 2011.

[23] M. Song and H. Wang. Highly efﬁcient incremental

estimation of Gaussian mixture models for online data

stream clustering. In K. L. Priddy, editor, SPIE Confer-

ence Series, volume 5803, pages 174–183, March 2005.

doi: 10.1117/12.601724.

[24] Freek Stulp, Evangelos A Theodorou, and Stefan Schaal.

Reinforcement learning with sequences of motion prim-

itives for robust manipulation. IEEE Transactions on

robotics, 28(6):1360–1370, 2012.

[25] H. Sung. Gaussian mixture regression and classiﬁcation.

PhD thesis, Rice University, 2004.

[26] Yuval Tassa and Emo Todorov. Stochastic complemen-

tarity for local control of discontinuous dynamics. 2010.

[27] Marc Toussaint, Kelsey Allen, Kevin A Smith, and

Joshua B Tenenbaum. Differentiable physics and sta-

ble modes for tool-use and manipulation planning. In

Robotics: Science and Systems, 2018.

[28] Chenguang Yang, Gowrishankar Ganesh, Sami Had-

dadin, Sven Parusel, Alin Albu-Schaeffer, and Etienne

Burdet. Human-like adaptation of force and impedance

in stable and unstable interactions. IEEE transactions on

robotics, 27(5):918–930, 2011.

[29] T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An

efﬁcient data clustering method for very large databases.

SIGMOD Rec., 25(2):103–114, June 1996. ISSN 0163-

5808.

[30] Tian Zhang, Raghu Ramakrishnan, and Miron Livny.

Birch: A new data clustering algorithm and its appli-

cations. Data Mining and Knowledge Discovery, 1(2):

141–182, 1997.