Online Learning of Hybrid Models for Variable Impedance Control of a Changing-Contact Manipulation Task
Saif Sidhik
University of Birmingham, UK
Mohan Sridharan
University of Birmingham, UK
Dirk Ruiken
Honda Research Institute (EU), Germany
Abstract—Many manipulation tasks comprise discrete action sequences characterized by continuous dynamics, with the transitions between these discrete dynamic modes characterized by discontinuous dynamics. The individual modes can represent different types of contacts, surfaces, or other factors, and different control strategies may be needed for each mode and the transitions between modes. This paper describes a piece-wise continuous, hybrid control framework that automatically detects transitions between modes and incrementally learns a model of the dynamics of each mode to support variable impedance control in that mode. The recognition of modes is invariant to the direction of motion and the magnitude of applied forces. Also, new modes are identified automatically and suitable models of the corresponding dynamics are learned. The framework is evaluated on a robot manipulator sliding an object on a surface along a desired motion trajectory in the presence of changes in surface friction, applied force, or the type of contact between the object and the surface. Experimental results indicate reliable and efficient recognition of modes, learning of dynamics models, and variable-impedance control during task execution.
I. MOTIVATION
Consider a robot manipulator sliding an object over a
surface along a desired pattern, as shown in Figure 1. The
system’s dynamics vary markedly before and after the object
comes in contact with the surface, and based on the type
of contact (e.g., surface or edge contact), surface friction,
applied force, and other factors. We consider all such tasks that
involve changes in dynamics due to changes in the nature of
contact, i.e., changes in the interaction between two objects, as
“changing-contact” tasks. Many practical manipulation tasks
are changing-contact tasks characterized by discontinuities in
the dynamics when the nature of the contact changes, making
it difficult to learn a single model of the task dynamics. They
can be modeled as a hybrid system with continuous dynamics
within each of a number of discrete dynamic modes that
may need a distinct control strategy [11]. Then, the overall
task’s dynamics are piece-wise continuous, with the system
transitioning between the individual modes over time.
Constructing separate (continuous) models for the different
modes, each well-suited for operation within a mode, elim-
inates the need for a combined model but it introduces the
need for a transition model that chooses from the models and
the control strategies for the individual modes. Such a hybrid
model [18] uses the modularity of the sub-tasks to construct the
overall transition model [12].

Fig. 1: Sliding an object along a desired pattern on three surfaces with different values of friction; images represent different transitions.

However, it requires the ability
to accurately recognize the mode at any point in time, revise
the existing dynamics models to adapt to the changes within
any given mode, and to identify and learn dynamics models for
previously unseen modes. This paper describes a framework
that addresses these requirements and enables hybrid control
of a changing-contact manipulation task by:
1) Incrementally learning a non-linear, piece-wise contin-
uous model of the dynamics of any given task from a
single demonstration, without prior knowledge of the
task, its modes, or the order in which the modes appear.
2) Incorporating a transition model that automatically clus-
ters the modes associated with any given task, and helps
transition to new modes during task execution.
3) Introducing a reduced feature representation that makes
the learning of dynamics models computationally effi-
cient, and makes the identification of modes independent
of the motion direction and magnitude of applied forces.
4) Incrementally and continuously learning and revising a
probabilistic model of the dynamics of any particular
mode, using the model for variable impedance control
and compliant motion within that mode.
The novelty is in the first three contributions; the last one
builds on our prior work on variable impedance control of
continuous contact tasks [16]. To better understand the learn-
ing and control challenges faced by such a learning framework
(e.g., discretely-changing, non-linear, piecewise continuous
environment dynamics), we chose to explore a representative
changing-contact task with discretely changing dynamics: a
robot manipulator sliding an object on a surface in a desired
motion pattern. We limit sensor input to that from a force-
torque sensor on the manipulator (i.e., we do not use an
external camera), and evaluate the framework in the presence
of discrete changes in surface friction, applied force, and type
of contact.
We begin with a review of related work in Section II,
followed by a description of the proposed framework in
Section III. Section IV discusses the experimental results,
followed by the conclusions in Section V.
II. RELATED WORK
There is a rich literature of methods developed to address
the learning and control problems associated with robot ma-
nipulation [11]. This includes reinforcement learning (RL)
methods [9, 24] and recent methods based on a combination
of deep networks and RL for learning flexible behaviors from
complex data [1, 4, 14, 15]. These data-driven methods require
large labeled datasets, which often need to be obtained from
multiple repetitions of the task by the robot. These require-
ments are difficult to satisfy in practical domains, especially
on a physical robot. Also, the training process optimizes
parameters of specific skills (or their sequence), and the
internal representations and decision making mechanisms are
opaque, making it difficult to transfer the learned policies to
new tasks. Sim-to-real learning strategies have been developed
to reduce the need to perform training on real robots for
manipulation tasks. However, aspects of dynamics in the real
world, e.g., the continuous time dynamics of rigid bodies with
friction, are too complicated (NP-hard) to be modelled in a
real-time dynamics simulator [2, 7]. These methods are also
not well-suited to work with a hybrid system formulation
because they (implicitly or explicitly) consider a single model
over the different modes of the manipulation action [11].
RL and optimal control methods applied to robot ma-
nipulation often assume that the underlying task dynamics
are smooth. Also, the application of learning strategies to
hybrid systems has been limited [12, 26], with many of them
focusing on bipedal locomotion [8, 17]. Planning approaches
for manipulation domains often explicitly take the multi-modal
structure of the dynamics of manipulation into account [27, 6].
However, these planning methods assume a pre-defined model
of the system and prior knowledge of actions and the modes.
Unlike online learning approaches such as [28], our framework
does not require a periodically repeating trajectory, nor does
it learn a time-series of controller parameters to be used in
a repeatable dynamic environment. The framework learns to
adapt its controller based on the current dynamic forces it
experiences.
Many methods have shown the benefits of incorporating
modes or phases into the design of controllers [22], and
many methods have been proposed to learn controllers for
such multi-phase tasks [3, 10, 13]. Different strategies of
sequencing motion primitives have also been used to solve ma-
nipulation tasks. However, most of these assume that a library
of modes or motion primitives already exists [20], or segment a
sequence of primitives from human demonstrations [19]. This
makes the learned policy dependent on the specific movements
and their sequence.
The framework described in this paper for changing-contact
manipulation draws inspiration from the approaches that in-
corporate modes in the design of controllers. However, our
framework supports (a) automatic recognition of modes and
identification of new modes invariant to the direction of motion
and magnitude of the applied force; and (b) incremental learn-
ing and revision of dynamic models for variable impedance
control in the individual modes.
III. PROBLEM FORMULATION AND FRAMEWORK
This section first describes the formulation of changing-
contact manipulation tasks as a piece-wise continuous hybrid
system (Section III-A). Section III-B describes the control
strategy and learning of continuous dynamics within a single
mode. The detection and learning of the discrete dynamic
modes are then explained in Section III-C.
A. Piece-wise Continuous Hybrid System
In a piece-wise continuous hybrid system, the state can be described as the tuple $(m, s)$, where $m \in M$ is a mode from the discrete set of modes $M$, and $s \in S_m$ is an element of the continuous subspace $S_m \subset \mathbb{R}^d$ associated with $m$. This formulation assumes that the subspaces do not intersect or overlap, i.e., $S_m \cap S_n = \emptyset\ \forall\, m \neq n$. The evolution of $s$ within a mode is determined by a discrete-time continuous function $S_m(\cdot)$, but the state transition is discrete and discontinuous at the boundaries between modes. Lee et al. [12] called the boundaries between modes $m$ and $m'$, where the transition occurs deterministically, guard regions, denoted $G_{m,m'} \subset S_m$. In a guard region, $s$ is transported to $s_{t+1} \in S_{m'}$ through a reset function $r_{m,m'}(\cdot)$. The state propagation is thus governed by:

$$s_{t+1} = \begin{cases} r_{m_t, m_{t+1}}(s_t) + w_t & \text{if } s_t \in G_{m_t, m_{t+1}} \\ S_{m_t}(s_t) + w_t & \text{if } s_t \in S_{m_t} \end{cases} \tag{1}$$

where $w_t$ is additive (Gaussian) process noise. In the context of the sliding task considered in this paper, the forces and torques measured by the robot at its end-effector constitute the observable state ($s$) of the system, which varies continuously within each contact mode. This formulation makes the reasonable assumption that properties such as friction are continuous across the surface of each object. The control strategy guiding the object's motion and the (static or smoothly changing) environment in that mode can be considered to determine the function $S_m(\cdot)$ governing the evolution of $s$ in that mode. When a mode change occurs (in a guard region), the dynamics are transported to a new state in mode $n$, where the state evolution is guided by the function $S_n(\cdot)$. For changing-contact tasks, the changes in sensor readings at the guard regions are sudden and pronounced compared with the readings within a contact mode. The mode switches impose structure on manipulation tasks; the transitions can be considered triggers for changing the current model of the environment.
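To make the formulation concrete, the following sketch simulates the state propagation of Equation 1. The mode dynamics, guard tests, and reset maps here are hypothetical placeholders chosen for illustration, not models from the paper.

```python
# Minimal sketch of the state propagation in Equation 1. The dynamics
# functions S_m(.), guard tests for G_{m,n}, and reset maps r_{m,n}(.)
# below are hypothetical placeholders, not models from the paper.
import numpy as np

rng = np.random.default_rng(0)

def step(mode, s, mode_dynamics, guards, resets, noise_std=0.01):
    """One discrete-time step of the piece-wise continuous hybrid system."""
    w = rng.normal(0.0, noise_std, size=s.shape)      # process noise w_t
    for (m, n), in_guard in guards.items():
        if m == mode and in_guard(s):
            return n, resets[(m, n)](s) + w           # discontinuous jump to mode n
    return mode, mode_dynamics[mode](s) + w           # continuous evolution in mode m

# Example: two 1-D modes with a guard region at s >= 1.0.
dynamics = {0: lambda s: s + 0.1, 1: lambda s: s - 0.05}
guards = {(0, 1): lambda s: s[0] >= 1.0}
resets = {(0, 1): lambda s: np.array([0.5])}
mode, s = 0, np.array([0.0])
for _ in range(20):
    mode, s = step(mode, s, dynamics, guards, resets)
```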
B. Control Strategy and Learning Dynamics in a Mode
The control strategy and the method for learning the dynamics model of each separate mode build on our previous work [16]. Our approach for learning the continuous dynamics of an individual mode uses an Incremental Gaussian Mixture Model (IGMM) [23], which internally uses a variant of the Expectation-Maximization (EM) algorithm to fit the model. In our implementation, the GMM is incrementally fit over points $X = (X_1, \ldots, X_T)$, with $X_t = [S_{t-1}, D_t]$, where each point contains information about a previous observable state ($S$) along with the current values to be predicted ($D$). When the learned model is used during task execution, the values for the next time instant are predicted as a function of the robot's current state, $(D_{t+1} \mid S_t)$, using Gaussian Mixture Regression (GMR) [25]. In this work, the forward model learns to predict the end-effector forces and torques ($[F^{ee}_t, \tau_t]$) from the previous end-effector force, torque, and end-effector velocity ($[F^{ee}_{t-1}, \tau_{t-1}, \dot{x}_{t-1}]$). We used the magnitudes of force, torque, and end-effector velocity for learning and prediction instead of their 3D vector representations. Since the magnitudes of frictional forces and torques are independent of the direction of motion (in ideal cases), this simplified representation is sufficient to learn and predict the end-effector forces and torques along the direction of motion. This reduced representation makes the learning process simpler, more computationally efficient, and independent of the direction of motion. The learned model always predicts the forces and torques along (or against) the direction of motion; their components along the axes of motion can be recovered when needed, or estimated using the previously measured sensor values.
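As a concrete (if simplified) rendering of this forward model, the sketch below fits a Gaussian mixture over the joint points and implements the GMR conditioning step by hand. scikit-learn provides no incremental GMM, so periodic batch refitting with warm_start is used here as a stand-in for the IGMM of [23]; all class names, dimensions, and batch sizes are illustrative assumptions.

```python
# A sketch of the per-mode forward model, assuming scikit-learn and SciPy.
# Batch refitting with warm_start stands in for the IGMM of [23]; the GMR
# step conditions the fitted joint mixture on the current state using the
# standard conditional-Gaussian formulas.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

class ForwardModel:
    def __init__(self, n_components=3, in_dim=3, batch=20):
        self.in_dim, self.batch = in_dim, batch
        self.gmm = GaussianMixture(n_components=n_components,
                                   covariance_type="full", warm_start=True)
        self.points = []

    def update(self, s_prev, d_next):
        """Store a joint point X_t = [S_{t-1}, D_t] and periodically refit."""
        self.points.append(np.concatenate([s_prev, d_next]))
        if len(self.points) % self.batch == 0:
            self.gmm.fit(np.asarray(self.points))

    def predict(self, s):
        """GMR: E[D_{t+1} | S_t = s] under the fitted joint mixture."""
        d = self.in_dim
        w, m = [], []
        for pi, mu, cov in zip(self.gmm.weights_, self.gmm.means_,
                               self.gmm.covariances_):
            mu_s, mu_d = mu[:d], mu[d:]
            S_ss, S_ds = cov[:d, :d], cov[d:, :d]
            w.append(pi * multivariate_normal.pdf(s, mu_s, S_ss))
            m.append(mu_d + S_ds @ np.linalg.solve(S_ss, s - mu_s))
        w = np.asarray(w) / (np.sum(w) + 1e-12)   # responsibilities over components
        return np.sum(w[:, None] * np.asarray(m), axis=0)
```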
The predictions from the forward model provide the feed-
forward term that cancels out the effect of the environment
forces (friction) during motion, in the control equation:
$$u_t = K^p_t \Delta x_t + K^d_t \Delta \dot{x}_t + u^{fc}_t + \lambda_{t-1} k_t \tag{2}$$

$$u^{fc}_t = K^f_t \Delta F_t + F^d_t \tag{3}$$

$$K^p_t = K^p_{free} + (1 - \lambda_{t-1})(K^p_{max} - K^p_{free}) \tag{4}$$

$$\lambda_t = 1 - \frac{1}{1 + e^{-r(\varepsilon_t - \varepsilon_0)}} \tag{5}$$
where $u_t$ is the control command to the robot (i.e., task-space force) at time $t$; $K^p_t$ and $K^d_t$ are the (positive definite) stiffness and damping matrices of the feedback controller for motion; $u^{fc}_t$ is the simple force feedback control (Equation 3) for the interaction task (the direction of force control is orthogonal to the direction of motion control), with proportional gain $K^f_t$ for the error in task-space force $\Delta F$; $k_t$ is the feed-forward term (end-effector forces and torques) predicted by the forward model associated with the present mode $m_t$, using GMR as described previously; and $\Delta x$ and $\Delta \dot{x}$ are the errors in the end-effector position and velocity at each instant. The weighting factor $\lambda_t$ is a function of the accuracy of the forward model at instant $t$; it maps the error in prediction from the forward model ($\varepsilon_t$) to a value between 0 and 1 (e.g., through the logistic function in Equation 5). The logistic growth rate $r$ and the sigmoid midpoint $\varepsilon_0$ are hyperparameters that have to be tuned for the task. This ensures that the overall control law (Equation 2) relies on the feed-forward term only if the dynamics of the mode has been learned accurately; otherwise, the robot aims to follow the goal trajectory more accurately by prioritizing the feedback control term.
Equation 4 defines how the stiffness parameter is updated at each instant in the variable-impedance control law as a function of the prediction accuracy of the forward model. $K^p_{max}$ is the maximum allowed stiffness, and $K^p_{free}$ is the minimum stiffness that provides accurate position tracking in the absence of all external disturbances (motion in free space). The damping term is updated as $K^d_t = \sqrt{K^p_t / 4}$ using the constraint for critically-damped systems [5]. We demonstrated the advantage of this hybrid variable impedance formulation in [16].
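A compact sketch of Equations 2-5 for scalar (per-axis) gains is shown below; the gain values and tuning constants are placeholders to be chosen for the task, not values reported in the paper.

```python
# Sketch of the variable-impedance update (Eqs. 2-5) for scalar per-axis
# gains; K_p_free, K_p_max, K_f, r and eps0 are task-tuned constants
# (the defaults here are illustrative, not from the paper).
import numpy as np

def weight(eps, r=5.0, eps0=1.0):
    """Eq. 5: map forward-model prediction error eps to lambda in [0, 1]."""
    return 1.0 - 1.0 / (1.0 + np.exp(-r * (eps - eps0)))

def control(dx, dxd, dF, F_d, k_ff, lam_prev,
            K_p_free=50.0, K_p_max=800.0, K_f=1.0):
    """Eqs. 2-4: task-space command with feed-forward term k_ff."""
    K_p = K_p_free + (1.0 - lam_prev) * (K_p_max - K_p_free)  # Eq. 4
    K_d = np.sqrt(K_p / 4.0)       # critically-damped constraint [5]
    u_fc = K_f * dF + F_d          # Eq. 3: force feedback (orthogonal axis)
    return K_p * dx + K_d * dxd + u_fc + lam_prev * k_ff      # Eq. 2
```

An accurate forward model (small eps) drives lambda toward 1, so the controller leans on the feed-forward term with low stiffness; a poor model drives lambda toward 0 and restores stiff feedback tracking.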
C. Contact Mode Recognition and Identification
Our approach for recognizing known modes and identifying
new modes in changing-contact tasks is based on the obser-
vation that any change in mode is accompanied by a sudden
significant change in the sensor readings. In our framework,
the robot responds to pronounced changes in force-torque
measurements by briefly using a high-stiffness control strategy
while quickly obtaining a batch of sensor data to confirm and
respond to the transition. The robot learns a new dynamics model if a new mode is detected, and transitions to (and revises) an existing dynamics model if a known mode is detected.
The management of modes is based on an online incremen-
tal clustering algorithm called Balanced Iterative Reducing
and Clustering using Hierarchies (BIRCH) [29, 30]. This
algorithm incrementally and dynamically clusters incoming
data for given memory and time constraints, without having
to examine all existing data points or clusters. We used the
implementation of BIRCH in the Scikit-learn library [21].
Each cluster is considered to represent a mode in a feature
space (more details below), with the clusters being updated
using batches of the feature data. The fraction of the input
feature vectors assigned to any existing cluster determines the
confidence in the corresponding mode being the current mode.
If the highest such confidence value is above a threshold,
the dynamics model of that mode is used and revised until a
mode change occurs. If the feature vectors are not sufficiently
similar to an existing cluster, a new cluster (i.e., mode) and
the corresponding dynamics model are constructed and revised
(see Section III-B) until a mode transition occurs.
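The sketch below illustrates one way to realize this with scikit-learn's Birch. The distance-based confidence test and the mapping from BIRCH subclusters to mode identifiers are illustrative assumptions; the paper only states that the fraction of a feature batch assigned to an existing cluster gives the confidence in that mode.

```python
# A hedged sketch of mode management with scikit-learn's BIRCH [21, 29, 30].
# The threshold, confidence cutoff, and subcluster-to-mode mapping are
# illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.cluster import Birch

class ModeManager:
    def __init__(self, threshold=0.05, min_confidence=0.8):
        self.birch = Birch(threshold=threshold, n_clusters=None)
        self.min_confidence = min_confidence
        self.fitted = False

    def classify(self, batch):
        """Assign a batch of feature vectors to a known mode or a new one.

        Returns (mode_id, confidence). A new mode is created when too few
        samples fall within the BIRCH threshold of an existing subcluster.
        """
        if self.fitted:
            centers = self.birch.subcluster_centers_
            dists = np.linalg.norm(batch[:, None, :] - centers[None], axis=2)
            nearest = dists.argmin(axis=1)
            in_cluster = dists.min(axis=1) <= self.birch.threshold
            confidence = in_cluster.mean()
            if confidence >= self.min_confidence:
                mode = int(np.bincount(nearest[in_cluster]).argmax())
                self.birch.partial_fit(batch)    # revise the recognized mode
                return mode, float(confidence)
        self.birch.partial_fit(batch)            # low similarity: grow a new mode
        self.fitted = True
        return int(self.birch.subcluster_centers_.shape[0] - 1), 1.0
```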
The key factor influencing the reliability and generalizability of mode recognition is the choice of feature representation for the modes. This representation is task dependent, but the objective is to identify one or more properties that vary substantially when a mode change occurs while concisely and uniquely representing the modes.
For the task of sliding an object over surfaces with different
values of friction, the property that strongly influences the
end-effector forces ($F^{ee}$) is the friction coefficient between the object and the surface.

Fig. 2: The torque measured at the pivot ($\tau$) varies for different relative orientations of the object ($\theta$), unlike the force at the tip ($F_r$). The object moves along $\dot{x}$, resulting in a frictional resistance $F_r$ at the point of contact in the opposite direction.

When two objects slide over each other at constant velocity, $F^{ee}$ is proportional to the applied normal force ($R$) and the friction coefficient ($\mu$) (assuming the relative orientation of their surface normals does not change); $\mu$ can then be estimated as:

$$\mu \approx \frac{\|F^{ee}\|}{R} \tag{6}$$

A concise feature representation for this task is thus $\frac{\|F^{ee}_t\|}{R_t}$, which has the effect of making mode classification independent of the magnitude of the applied force.
In a similar manner, for changes in the type of contact, the end-effector orientation is a useful feature, but small changes in orientation may require different modes. A more reasonable feature is the magnitude of the end-effector torques, which can be measured using the force-torque sensor in the wrist:

$$\tau = F_r\, l \sin\theta \tag{7}$$

where $F_r$ is the force at the tip, $l$ is the length of the pivot arm, and $\theta$ is the orientation between the surface normals. Figure 2 indicates that, for any object, $\tau$ is different for the different types of contacts. With the magnitude of the torques ($\|\tau\|$) as the feature representation, modes can be classified independent of the motion direction and object orientation.
This representation would not work when the magnitude of the applied force differs. If we instead assume that the force measured at the wrist ($F^{ee}$) approximates the force at the tip of the object ($F_r$), Equations 6 and 7 imply that $\frac{\|\tau_t\|}{R_t}$ is invariant to the magnitude of the applied force for a fixed relative orientation between the objects in contact:

$$\tau = \mu R\, l \sin(\theta)$$

Ideally, $\frac{\|\tau\|}{R}$ is constant for each mode (based on $\theta$) provided the object geometry ($l$) and friction ($\mu$) do not change. Experimental analysis revealed that this parameter by itself is insufficient to distinguish between contacts when the applied normal force changes, because the assumption of kinetic friction ($F_r = \mu R$) does not hold in many real-world situations [2]. We thus use $[\frac{\|\tau\|}{R}, \frac{\|F^{ee}\|}{R}]$ as the feature representation for this task; it supports better generalization over different normal forces while reliably distinguishing different changing contacts.
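A minimal sketch of this combined feature computation from a single wrist force-torque sample follows; treating the z-axis as the surface normal is an assumption made here for illustration.

```python
# Sketch of the reduced feature vector [||tau||/R, ||F_ee||/R] computed
# from one wrist force-torque sample; using the z-axis as the surface
# normal is an illustrative assumption.
import numpy as np

def mode_features(f_ee, tau, normal=np.array([0.0, 0.0, 1.0])):
    """Map a force-torque reading to the mode-classification feature."""
    R = max(abs(float(f_ee @ normal)), 1e-6)   # applied normal force, guarded
    return np.array([np.linalg.norm(tau) / R, np.linalg.norm(f_ee) / R])
```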
TABLE I: Control loop of framework

Input: Control parameters: $K^p_{free}$, $K^p_{max}$; dynamics models $f_i$, $i \in [1, M]$; current mode: $m = 0$.

1   while motion pattern not complete do
2     if object in contact with surface then
3       if mode transition detected then
4         $K^p_t \leftarrow K^p_{max}$
5         m = classify_mode_learn_dynamics()
6       else
7         Update and use $f_m$ for control (Section III-B)
8       end
9     else
10      $K^p_t \leftarrow K^p_{free}$
11    end
12  end
Algorithm I (Table I) is an overview of the framework's control
loop for a manipulator sliding an object on a surface; it
proceeds until a desired motion pattern is completed. Control
and learning methods are used only after the object comes in
contact with the surface (lines 2-11). As described earlier, the
robot responds to a detected mode transition by setting a high
stiffness, collecting feature samples, determining the mode and
learning/revising the corresponding dynamics model (lines 3-
5). In the absence of a mode transition, the robot continues
with its current mode and dynamics model (lines 6-8).
IV. EXPERIMENTAL EVALUATION
We used a 7-DoF Franka Emika Panda manipulator robot
for our experiments. The robot had to slide an object along
a desired motion pattern on a surface, and we considered a
“changing surface” task and a “changing contact type” task to
evaluate the hypotheses. We used the root mean square error
(RMSE) in following/tracking a desired motion trajectory as
a key performance measure.
To evaluate the need for separate models, the robot was
asked to slide an object across two surfaces with markedly
different friction, with the transition point being unknown to
the robot. The robot had a dynamics model for the first surface
(learned a priori) but no models for the second surface. In
90% of the trials, the robot was unable to complete the task.
The feed-forward values being predicted by the model for the
rougher surface were much higher than those required for the
smoother surface, making the robot overshoot (when it tran-
sitioned to the smoother surface) and reach the safety limits
of joint torques, causing the robot to stop. This indicated that
performance is unreliable with a single incrementally revised
model when there are pronounced discrete mode changes.
Further, the robot was allowed to build a new model from
scratch each time a mode switch was observed. However, the
robot had to operate with high stiffness for a longer time
Fig. 3: Modes detected by the transition model and their confidence
values. The numbers on top (in green) indicate the confidence with
which the transition model identified that mode. The number below
(in red) shows the mode with the next highest confidence. “N”
indicates a transition to a new mode. The red vertical lines along
the x-axis indicate the actual occurrence of mode transitions.
until a reliable forward (dynamics) model for the new mode
was created, which is undesirable. On the other hand, when
different models for the two surfaces are available, the robot
was able to switch between them much faster, spending much
less time using the high-stiffness strategy. Results and figures
from these experiments are omitted for brevity.
We then experimentally evaluated the following hypotheses:
H1: The framework provides reliable and efficient performance for changing-contact manipulation tasks; and
H2: The framework's performance is robust to changes in the direction of motion and applied forces.
In particular, we examined whether the framework reliably and efficiently transitions to the appropriate model in the presence of changes in the direction of motion and applied forces.
To evaluate the hypotheses, we first considered the changing
surface task (Figure 1). The robot had to slide an object
back and forth between two surfaces, but one of the surfaces
was randomly changed to a surface with a different value
of friction. Starting with no knowledge about the surfaces, the robot incrementally identified each dynamic mode and built a dynamics model for each mode (i.e., each distinct surface) while operating briefly under high stiffness. Once it had learned the dynamics models for the different modes, it responded to any subsequent mode transition by quickly transitioning to the corresponding identified dynamics model.
Figure 3 summarizes the results over one trial of this
experiment. We observe that the framework is able to identify
transitions to existing or new modes with high confidence.
In each instance, the second best choice of mode is asso-
ciated with a much lower value of confidence. The results
also indicate that the algorithms and the underlying feature
representation make the performance robust to changes in the
direction of motion, i.e., a new mode is not identified when
the manipulator moves over a previously seen surface in a
different direction. There is some confusion between surfaces
2 and 3 because their friction values are somewhat similar.
Figure 4 shows the absolute error in trajectory tracking
during this task and the corresponding stiffness parameters
used by the controller. The peaks in the error plot correspond
to the sudden change of surface. The predictions made by the model of the previous mode caused a momentary loss of trajectory-tracking accuracy until the robot switched to the high-stiffness mode to identify the current mode. Once the robot identified the current mode, it used lower stiffness to complete the task.
Fig. 4: Performance of framework for changing-surface task. Top:
controller stiffness variation during the task. Bottom: absolute error
in trajectory tracking. The spikes correspond to an incorrect feed-
forward prediction by the previous model after the transition.
Fig. 5: The different contacts used.
As discussed previously, switching to a previously learned mode requires a much shorter period of
high stiffness (and expends much less energy) compared with
learning a new dynamics model from scratch. These results
support hypothesis H1, and to some extent H2.
Next we conducted experiments with the changing contact
type task. The robot had to slide an object along a trajectory on
a surface under three different types of contacts (Figure 5). The
robot started with no prior knowledge of the task. During each
trial, the robot approached the table to execute a particular type
of contact while maintaining a normal force of 10N. Contact
with the surface triggers a transition; the robot proceeds to
slide the object (in its grip) along the surface with a force of
10N. This is initially done at a high stiffness if it is learning
a new dynamics model, or at a suitably low stiffness if the
transition is to an existing mode/model.
Next, Figure 6 demonstrates the robustness of the framework to motion along a direction different from that used during training.
TABLE II: Confusion matrix of average confidence (%) across 10 trials for mode recognition based on the learned dynamics models for three types of contacts. Top: normal force of 10N; Bottom: normal force of 20N.

Normal force of 10N:
                      Ground Truth
Detected Mode    Contact 1   Contact 2   Contact 3
Contact 1           83           9          16
Contact 2            2          88           1
Contact 3           14           2          79
New Mode             1           1           4

Normal force of 20N:
                      Ground Truth
Detected Mode    Contact 1   Contact 2   Contact 3
Contact 1           81          10          17
Contact 2            3          86           1
Contact 3           15           2          77
New Mode             1           2           5
Fig. 6: Testing the previously trained transition models for motion
in a different direction. Top: Torques measured about axis parallel
to surface and perpendicular to direction of motion; The spikes
in the measurements correspond to contacts; Middle: End-effector
forces predicted by the forward model for the current mode; Bottom:
Variation in controller stiffness due to the predicted forces.
The feed-forward model predictions and the corresponding variable-impedance behaviour for one of the trials are shown, along with the model chosen with the highest confidence (bottom of the figure). The identified modes match the true mode in all cases.
Fig. 7: Testing the previously trained transition model under different
normal force (20N instead of 10N). Top: Torques measured about
axis parallel to surface and perpendicular to direction of motion; The
spikes in the measurements correspond to making contact; Middle:
End-effector forces predicted by the forward model for the current
mode; Bottom: Controller stiffness variation due to the predicted
forces.
The framework was then tested for the same task and
contacts while applying a different (constant) normal force
on the surface during the sliding motion (Figure 7). Although
the confidence associated with the modes is a little lower and the time taken to recognize the modes is a little longer, the framework is still able to recognize the modes correctly, and the task is completed successfully using variable impedance control. The lower confidence can be attributed to the kinetic friction assumption ($\mu = F/R$) being unrealistic in many real-world tasks. These results support hypotheses H1 and H2.
V. CONCLUSIONS AND FUTURE WORK
This paper described a framework that formulated changing-
contact tasks as a piece-wise continuous, hybrid system.
Our framework does not require any prior knowledge about
the environment or the objects involved. Unlike data-driven
methods that require many labeled training examples, our
framework is able to build an initial dynamics model for
each observed mode from one demonstration of the target
task, incrementally revising (and introducing new) dynamics
models during task execution; the learned models provide
smooth variable impedance control within each mode. Unlike
other existing methods for related manipulation tasks [19], our
method is not limited to the sequence of modes seen dur-
ing demonstrations. Also, unlike existing work that modeled
discretely changing dynamics [12], our framework requires
no prior information about the number of modes involved in
the task. In the context of a manipulator sliding an object
along desired motion trajectories on a surface in the presence
of changing friction, applied force, and type of contacts,
experimental results demonstrate the framework’s ability to
reliably and efficiently follow the desired motion trajectories
invariant to changes in the direction of motion and magnitude
of applied forces.
Our future work will address the current limitations of the framework. For instance, the current strategy of switching
between modes (and dynamics models) is not smooth, with
spikes in sensor measurements in the guard (i.e., transition)
regions. Also, we will explore tasks with many more modes.
Another direction for future research is to investigate other
examples of changing-contact manipulation tasks, and addi-
tional factors that influence such tasks. In addition, it would
also be interesting to explore the automatic selection (or
learning) of the feature representation for each changing-
contact manipulation task. The longer-term objective is to
enable reliable, efficient, and smooth learning and control
in the context of a robot manipulator performing complex
assembly tasks with multiple objects in complex domains.
REFERENCES
[1] Marcin Andrychowicz, Bowen Baker, Maciek Chociej,
Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur
Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al.
Learning dexterous in-hand manipulation. arXiv preprint
arXiv:1808.00177, 2018.
[2] D. Baraff. Coping with friction for non-penetrating rigid
body simulation. ACM SIGGRAPH, 25(4):31–41, 1991.
[3] L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and
I. Palunko. Reinforcement learning for control: Per-
formance, stability, and deep approximators. Annual
Reviews in Control, 46:8–28, 2018.
[4] Karol Hausman, Jost Tobias Springenberg, Ziyu Wang,
Nicolas Heess, and Martin Riedmiller. Learning an
embedding space for transferable robot skills. In Inter-
national Conference on Learning Representations, 2018.
[5] A.J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and
S. Schaal. Dynamical movement primitives: learning at-
tractor models for motor behaviors. Neural computation,
25(2):328–373, 2013.
[6] Ajinkya Jain and Scott Niekum. Efficient hierarchical
robot motion planning under uncertainty and hybrid
dynamics. arXiv preprint arXiv:1802.04205, 2018.
[7] Aaron M Johnson, Samuel A Burden, and Daniel E
Koditschek. A hybrid systems model for simple manip-
ulation and self-manipulation systems. The International
Journal of Robotics Research, 35(11):1354–1392, 2016.
[8] Nate Kohl and Peter Stone. Policy gradient reinforcement
learning for fast quadrupedal locomotion. In IEEE
International Conference on Robotics and Automation,
2004. Proceedings. ICRA’04. 2004, volume 3, pages
2619–2624. IEEE, 2004.
[9] George Konidaris, Scott Kuindersma, Roderic Grupen,
and Andrew Barto. Robot learning from demonstration
by constructing skill trees. The International Journal of
Robotics Research, 31(3):360–375, 2012.
[10] M. Koval, N. Pollard, and S. Srinivasa. Pre-and post-
contact policy decomposition for planar contact manip-
ulation under uncertainty. The International Journal of
Robotics Research, 35(1-3):244–264, 2016.
[11] O. Kroemer, S. Niekum, and G. Konidaris. A review of
robot learning for manipulation: Challenges, representa-
tions, and algorithms. arXiv preprint arXiv:1907.03146,
2019.
[12] G. Lee, Z. Marinho, A. Johnson, G. Gordon, S. Srini-
vasa, and M. Mason. Unsupervised learning for non-
linear piecewise smooth hybrid systems. arXiv preprint
arXiv:1710.00440, 2017.
[13] S. Levine, N. Wagener, and P. Abbeel. Learning contact-rich manipulation skills with guided policy search. arXiv preprint arXiv:1501.05611, 2015.
[14] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter
Abbeel. End-to-end training of deep visuomotor policies.
The Journal of Machine Learning Research, 17(1):1334–
1373, 2016.
[15] Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind
Rajeswaran, and Emanuel Todorov. Reinforcement
learning for non-prehensile manipulation: Transfer from
simulation to physical system. In IEEE International
Conference on Simulation, Modeling, and Programming
for Autonomous Robots, pages 35–42. IEEE, 2018.
[16] M. Mathew, S. Sidhik, M. Sridharan, M. Azad,
A. Hayashi, and J. Wyatt. Online Learning of Feed-
Forward Models for Task-Space Variable Impedance
Control. In International Conference on Humanoid
Robots (Humanoids), 2019.
[17] Yutaka Nakamura, Takeshi Mori, Masa-aki Sato, and
Shin Ishii. Reinforcement learning for a biped robot
based on a cpg-actor-critic method. Neural networks,
20(6):723–735, 2007.
[18] D. Nguyen-Tuong, M. Seeger, and J. Peters. Model learn-
ing with local gaussian process regression. Advanced
Robotics, 23(15):2015–2034, 2009.
[19] S. Niekum, S. Chitta, A. Barto, B. Marthi, and S. Os-
entoski. Incremental semantically grounded learning
from demonstration. In Robotics: Science and Systems,
volume 9, pages 10–15607. Berlin, Germany, 2013.
[20] P. Pastor, M. Kalakrishnan, L. Righetti, and S. Schaal.
Towards associative skill memories. In 2012 12th IEEE-
RAS International Conference on Humanoid Robots (Hu-
manoids), pages 309–315. IEEE, 2012.
[21] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,
B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour-
napeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-
learn: Machine learning in Python. Journal of Machine
Learning Research, 12:2825–2830, 2011.
[22] Joseph M Romano, Kaijen Hsiao, Günter Niemeyer,
Sachin Chitta, and Katherine J Kuchenbecker. Human-
inspired robotic grasp control with tactile sensing. IEEE
Transactions on Robotics, 27(6):1067–1079, 2011.
[23] M. Song and H. Wang. Highly efficient incremental
estimation of Gaussian mixture models for online data
stream clustering. In K. L. Priddy, editor, SPIE Confer-
ence Series, volume 5803, pages 174–183, March 2005.
doi: 10.1117/12.601724.
[24] Freek Stulp, Evangelos A Theodorou, and Stefan Schaal.
Reinforcement learning with sequences of motion prim-
itives for robust manipulation. IEEE Transactions on
robotics, 28(6):1360–1370, 2012.
[25] H. Sung. Gaussian mixture regression and classification.
PhD thesis, Rice University, 2004.
[26] Yuval Tassa and Emo Todorov. Stochastic complemen-
tarity for local control of discontinuous dynamics. 2010.
[27] Marc Toussaint, Kelsey Allen, Kevin A Smith, and
Joshua B Tenenbaum. Differentiable physics and sta-
ble modes for tool-use and manipulation planning. In
Robotics: Science and Systems, 2018.
[28] Chenguang Yang, Gowrishankar Ganesh, Sami Had-
dadin, Sven Parusel, Alin Albu-Schaeffer, and Etienne
Burdet. Human-like adaptation of force and impedance
in stable and unstable interactions. IEEE transactions on
robotics, 27(5):918–930, 2011.
[29] T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An
efficient data clustering method for very large databases.
SIGMOD Rec., 25(2):103–114, June 1996. ISSN 0163-
5808.
[30] Tian Zhang, Raghu Ramakrishnan, and Miron Livny.
Birch: A new data clustering algorithm and its appli-
cations. Data Mining and Knowledge Discovery, 1(2):
141–182, 1997.
Policy search methods based on reinforcement learning and optimal control can allow robots to automatically learn a wide range of tasks. However, practical applications of policy search tend to require the policy to be supported by hand-engineered components for perception, state estimation, and low-level control. We propose a method for learning policies that map raw, low-level observations, consisting of joint angles and camera images, directly to the torques at the robot's joints. The policies are represented as deep convolutional neural networks (CNNs) with 92,000 parameters. The high dimensionality of such policies poses a tremendous challenge for policy search. To address this challenge, we develop a sensorimotor guided policy search method that can handle high-dimensional policies and partially observed tasks. We use BADMM to decompose policy search into an optimal control phase and supervised learning phase, allowing CNN policies to be trained with standard supervised learning techniques. This method can learn a number of manipulation tasks that require close coordination between vision and control, including inserting a block into a shape sorting cube, screwing on a bottle cap, fitting the claw of a toy hammer under a nail with various grasps, and placing a coat hanger on a clothes rack.