
Learning Robotic Assembly Tasks with Lower Dimensional Systems by

Leveraging Physical Softness and Environmental Constraints

Masashi Hamaya1, Robert Lee2, Kazutoshi Tanaka1, Felix von Drigalski1, Chisato Nakashima3,

Yoshiya Shibata3, and Yoshihisa Ijiri1,3

Abstract— In this study, we present a novel control framework for assembly tasks with a soft robot. Typically, existing rigid robots require high frequency controllers and precise force/torque sensors for assembly tasks. The resulting robot system is complex, entailing large amounts of engineering and maintenance. Physical softness allows the robot to interact with the environment easily, so we expect soft robots to perform assembly tasks without the need for high frequency force/torque controllers and sensors. However, specific data-driven approaches are needed to deal with complex models involving nonlinearity and hysteresis, and applying these approaches directly would require collecting very large amounts of training data. To solve this problem, we argue that by leveraging softness and environmental constraints, a robot can complete tasks in lower dimensional state and action spaces, which greatly facilitates the exploration of appropriate assembly skills. We then apply a highly sample-efficient model-based reinforcement learning method to the resulting lower dimensional systems. To verify our method, we perform a simulation of peg-in-hole tasks. The results show that our method learns the appropriate skills faster than an approach that does not consider lower dimensional systems. Moreover, we demonstrate that our method works on a real robot equipped with a compliant module on the wrist.

I. INTRODUCTION

Robotic manipulation for assembly tasks presents great

potential for full automation in industrial situations. One of

the most common assembly tasks is the peg-in-hole task.

Many researchers have explored control methods for peg-in-

hole tasks [1]. However, these tasks remain challenging, since they require controllers that deal with tight tolerances and with hard-to-model effects such as contact and jamming between the peg and the hole.

To perform such peg-in-hole tasks, force controllers have

been widely applied [1]. Furthermore, recent developments

have demonstrated high precision control solutions for peg-

in-hole tasks [2], [3]. However, these approaches often employ high frequency controllers or precise force/torque sensors, and therefore largely depend on hardware

performance. If the robots could more easily interact with the

environment without such controllers or sensors, the robot

system would be much simpler.

To this end, we use physically soft robots (including

compliant components such as springs and silicone) for the

peg-in-hole tasks because the softness allows the robot to

1MH, KT, FD, and YI are with OMRON SINIC X Corporation, Tokyo,

Japan. masashi.hamaya@sinicx.com

2RL is with the Australian Centre for Robotic Vision at Queensland

University of Technology in Brisbane, Australia (work done as an intern at

OMRON SINIC X Corp.)

3YS, CN, and YI are with OMRON Corporation, Tokyo, Japan

[Figure 1: diagram of the framework. Physical softness and environmental constraints shrink the possible state space (e.g., position, velocity) and possible action space (e.g., position, velocity, force) into a lower dimensional system over the skill sequence approach, contact, fit, align, and insertion, which is learned by model-based reinforcement learning on a robot with a compliant wrist.]

Fig. 1. Proposed framework for assembly tasks with a soft robot. We develop the lower dimensional space using softness and environmental constraints to make learning tractable. Then, we apply sample-efficient model-based reinforcement learning.

interact easily with the environment. Therefore, we expect

that the soft robot will be able to perform assembly tasks

even with low frequency controllers since the softness can

morphologically handle contact-rich interactions without the need for significant engineering effort. Meanwhile, obtaining analytical models of soft robots is difficult due to the nonlinear and hysteretic nature of soft components, which makes it difficult to design controllers for them by hand. One promising approach is to use data-driven methods to address this complexity. However, if we directly apply

such approaches to the robots, we would be required to

collect a large amount of real robot data to learn the complex

dynamics in such non-linear and high dimensional systems.

To deal with this problem, we propose a novel soft

robotic control framework for peg-in-hole tasks. The key

insight of this study is that by leveraging the softness and

environmental constraints, the robot can complete the tasks in

lower dimensional state and action spaces. More concretely,

we divide the sequence of peg-in-hole skills into manipulation primitives (MPs) (e.g., fit, align, and insertion) [4], which allow us to consider only the important state variables. In addition, we do not require force controllers, as we replace the traditional rigid robot with a soft robot [5]: the action space contains only desired positions or velocities, with no desired forces. This could also make our approach accessible to a wider variety of robot systems. In

such a manner, we can considerably reduce the number of

dimensions for both state and action spaces by using softness

and environmental constraints. This facilitates exploration of

the appropriate peg-in-hole skills.

To obtain the skills for peg-in-hole tasks with as few trials as possible, we apply a sample-efficient model-based reinforcement learning method called Probabilistic Inference for Learning Control (PILCO) [6] to the lower dimensional system. PILCO considers continuous state and action spaces, and learns a nonlinear model with uncertainty using a Gaussian process (GP). By making use of the model uncertainty, this algorithm shows remarkable sample efficiency compared to existing model-free reinforcement learning [6].

This has also been applied to complex systems such as in-

hand manipulation [7] and human-robot interactions [8]. For

this reason, PILCO is suitable for learning peg-in-hole tasks,

especially in industrial scenarios where the robots need to

adapt to new environments rapidly.

We summarize our proposed framework in Fig. 1. The

contributions of this study are as follows:

• We propose a novel control framework for peg-in-hole tasks using a physically soft robot. We demonstrate that the softness and environmental constraints allow the robot to perform the tasks in lower dimensional state and action spaces.
• We employ sample-efficient model-based reinforcement learning for the tasks, which can obtain the appropriate skills from scratch within a few interactions.
• We perform a simulation to verify our method. We develop a peg-in-hole simulator with passive compliant components. The results show that our method successfully learned the peg-in-hole skills faster than a method without the lower dimensional system. Moreover, we demonstrate an experiment using a real robot with a compliant wrist.

This paper is organized as follows. Section II presents

related works for our study. We introduce our lower dimen-

sional system in Section III and learning of peg-in-hole skills

in Section IV. Section V demonstrates the simulations and

experiment. Section VI discusses our results, and Section VII

concludes.

II. RELATED WORKS

In this section, we present work related to assembly manipulation. The related works are broadly categorized into force control methods, physically soft robots, and lower dimensional systems.

A. Force control

Force control methods have been explored extensively [1].

One of the most popular approaches is the model-based

controller. For example, contact state analyses-based force

controllers have been proposed [9], [10]. Van Wyk et al. developed a force controller for multi-fingered hands [11], in which force sensors mounted on each fingertip are used to calculate the desired contact forces. Karako et al. also proposed a multi-fingered controller for ring insertion, which could compensate for position errors using a high frequency controller [2].

Most model-based controllers employ force sensors or high frequency controllers, so their performance is largely affected by the hardware. Force controllers that exclude

force/torque sensors have also been introduced [12], [13].

However, they also suffer from model errors.

Recently, learning-based controllers have attracted much

attention. Deep reinforcement learning and self-supervised

learning are also used for assembly tasks [14], [15], [16],

[17]. Model-based reinforcement learning methods have also

been proposed [3], [18].

Although all of the above methods could learn adaptive

assembly skills, they also largely depend on force/torque

sensors or accurate torque controllers. In contrast, we

use a physically soft robot and develop a learning framework

for peg-in-hole tasks to address the model complexity.

B. Physically soft robots

Softness can address collisions with the environment and

improve skill performance [19]. Some studies have demonstrated the effectiveness of softness in assembly tasks [20],

[21]. Soft tactile sensors have been developed to measure

the orientations of the grasped objects [22], [23]. Nishimura

et al. developed a passive compliant wrist and demonstrated

that a soft wrist could absorb uncertainties of states for the

peg [24]. Deaconescu et al. developed a pneumatic-driven

gripper, which allowed the correction of the misalignment

between the peg and the hole [25]. Hang et al. leveraged a soft robot for pre-grasp sliding tasks; the softness helped objects slide without the use of force controllers [5]. Deimel et al. proposed a soft hand that can grasp multiple objects with a low-dimensional representation [26].

Based on the insights provided by [5], [26], we leverage

the softness for the lower dimensional action space to improve learning efficiency in high dimensional tasks such as peg-in-hole tasks.

C. Lower dimensional system

Environmental constraints can aid manipulation skills [27]. They can also reduce the number of dimensions of the state space. Johannsmeier et al. proposed a specific skill formalism called MPs, which reduces the complexity of manipulation tasks in order to learn the optimal parameters for parameterized impedance controllers [4]. Tang et al.

presented a dimensional reduction method for learning from

demonstrations, which considered only radial directions

because of rotational symmetry between the peg and the

hole [28].

Based on this skill formalism [4], our method attempts to learn peg-in-hole skills with the soft robot while using the softness and environmental constraints, so that the robot can perform the assembly tasks in much lower dimensional state and action spaces. In addition, we employ sample-efficient model-based

reinforcement learning [6], which can obtain the peg-in-hole

skills from scratch with a few trials.

[Figure 2: the five MPs (n1: approach, n2: contact, n3: fit, n4: align, n5: insertion) with their states and actions; n1–n2 use a pre-designed position controller and n3–n5 a learned velocity controller; the setup comprises arm, compliant wrist, gripper, peg, and hole.]

Fig. 2. Schematic for peg-in-hole skill. We employ MPs [4], which divide the sequence of skills. The red and green arrows show the target state and actions on each primitive, respectively.

III. LOWER DIMENSIONAL SYSTEM WITH

SOFTNESS AND ENVIRONMENTAL

CONSTRAINTS

In this section, we introduce our proposed method, namely

a lower dimensional system using softness and environmental

constraints.

A. Preliminaries

We consider the gripper position and orientation in a

Cartesian coordinate system. We implement the compliant

wrist between the tip of the robot arm and the gripper. Here,

we assume that the gripper and the peg are tightly coupled

and that the position of the gripper can be measured. Based

on these assumptions, the dynamics of the gripper can be

given as

x_{t+1} = f(x_t, u_t) + ω,    (1)

with continuous state x ∈ R^D, command actions u ∈ R^F, and additive Gaussian noise ω. We employ a data-driven approach to obtain the policy π for the peg-in-hole task

However, if we directly apply learning approaches to the

system, the required amount of training data would increase

since the number of dimensions of the states and actions

becomes large. For this reason, we need to create a lower

dimensional system to make the learning tractable.
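As a minimal illustration of the interface this implies, the following Python sketch treats Eq. (1) as a black-box transition; `f` and `noise_std` are placeholders for the unknown dynamics and noise level, not quantities given in the paper.

```python
import numpy as np

def step(f, x, u, noise_std):
    """One step of the dynamics in Eq. (1): x_{t+1} = f(x_t, u_t) + w.

    `f` is the unknown transition function to be learned from data;
    `noise_std` is the standard deviation of the additive Gaussian noise w.
    """
    return f(x, u) + np.random.normal(0.0, noise_std, size=np.shape(x))
```

The learning problem is hard precisely because `f` is nonlinear and hysteretic and the dimensions D and F grow quickly; the next subsection shrinks both.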

B. Manipulation primitives with softness

To leverage the environmental constraints, we employ

graphical skill formulation, which segments the sequence of

the peg-in-hole skills [4]. When the peg contacts the surface

or hole, the directions in which the robot can move are constrained at each event, such as fit, align, and insertion. In this manner, the authors could reduce the target dimensionality of the task. Unlike that formulation [4], we can ignore force controllers and focus only on position and velocity controllers, because we use a compliant robot, specifically one that can absorb contacts with the environment. This can reduce the number of dimensions of the action space.

The skill graph consists of edges e ∈ E and nodes n ∈ N. We refer to the nodes as MPs. Each MP has a pre-designed position controller or a learned velocity controller. The transitions between the MPs are triggered by "conditions." Here, we define three conditions and a transition as follows (a minimal sketch of this transition logic follows the list):

1) Pre-condition C_pre initializes the skills. If ||x_T − x_init|| < δ_init, where x_init is an initial position, T is a timestep, and δ_init is a threshold for the position error, the skill proceeds to the next primitive. Otherwise, it performs the initialization again.
2) Success condition C_suc stops the skills when they are successful. If ||x_T − x_suc|| < δ_suc, it stops the task. Otherwise, it returns to the pre-condition.
3) Error condition C_err stops the skills. If the robot reaches outside the operation range, x > x_out, it returns to the pre-condition.
4) Transition e_i switches the MPs. If ||x_T − x_d|| < δ, the MP moves from n to n + 1. Otherwise, it returns to the pre-condition.
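The following Python sketch shows one way this logic could be wired together. The `MP` container, `reset` callable, and variable names are chosen here for illustration; only the conditions themselves come from the formalism above.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

@dataclass
class MP:
    """One manipulation primitive and the transition condition e_i after it."""
    run: Callable[[np.ndarray], np.ndarray]  # controller; returns the final state x_T
    x_d: np.ndarray                          # desired state for the transition
    delta: float                             # threshold on ||x_T - x_d||

def run_skill(mps, reset, x_init, d_init, x_suc, d_suc, x_out):
    """Execute the skill graph with conditions C_pre, C_suc, and C_err."""
    while True:
        x = reset()                                    # C_pre: initialize the skill
        if np.linalg.norm(x - x_init) >= d_init:
            continue                                   # initialization failed; retry
        ok = True
        for mp in mps:
            x = mp.run(x)
            if np.any(x > x_out):                      # C_err: outside operation range
                ok = False
                break
            if np.linalg.norm(x - mp.x_d) >= mp.delta: # transition e_i not triggered
                ok = False
                break
        if ok and np.linalg.norm(x - x_suc) < d_suc:   # C_suc: task succeeded
            return x
```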

Based on this formalism, we design the MPs for the peg-

in-hole task. For simplicity, we consider a 2D plane system

based on previous works [4], [28].

• n1 (Approach): The robot moves close to the surface. The state and action are given as follows:
x^(approach) = [y, z]^T, u^(approach) = [y_d, z_d]^T
In this primitive, we employ a pre-designed position controller with a motion planner to move from the initial position to the desired position.

• n2 (Contact): The robot contacts the surface.
x^(contact) = z, u^(contact) = z_d
Similar to the approach primitive, we use the position controller. Since the softness allows safe contact with the surface, we do not have to apply force controllers.

• n3 (Fit): The robot moves laterally in the y direction while keeping contact with the surface until the peg fits in the hole.
x^(fit) = [y, ẏ]^T, u^(fit) = ẏ_d
We use a velocity controller with reinforcement learning. Since the peg is pressed down onto the surface by the softness, the z direction is constrained. Therefore, we only consider the y direction for both the state and action spaces. Continued movement in this direction allows the edge of the peg to fit into the hole.

• n4 (Align): The robot is rotated in the α direction until the peg and hole are vertically aligned.
x^(align) = [α, α̇]^T, u^(align) = ẏ_d
We can focus on the state in the α rotation and the action space in the y direction only, because the peg is constrained in the hole and the robot naturally generates a rotational motion centered on the edge of the peg. We can also ignore the z direction, since the softness absorbs the deformation in the z direction.

• n5 (Insertion): The peg is inserted into the hole to the desired depth.
x^(insertion) = [z, ż]^T, u^(insertion) = [ẏ_d, ż_d]^T
We focus on the state of the z direction only, since the peg moves in that direction alone. We consider both the y and z action spaces in case the peg jams. (The resulting per-primitive spaces are summarized in the sketch below.)
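For reference, the five primitives and their reduced spaces can be listed as plain data; the identifiers are illustrative names for the variables above, with a `_d` suffix denoting desired values.

```python
# State and action variables per primitive (2D plane: y, z, alpha).
PRIMITIVES = {
    "approach":  {"state": ["y", "z"],             "action": ["y_d", "z_d"],
                  "controller": "pre-designed position"},
    "contact":   {"state": ["z"],                  "action": ["z_d"],
                  "controller": "pre-designed position"},
    "fit":       {"state": ["y", "y_dot"],         "action": ["y_dot_d"],
                  "controller": "learned velocity"},
    "align":     {"state": ["alpha", "alpha_dot"], "action": ["y_dot_d"],
                  "controller": "learned velocity"},
    "insertion": {"state": ["z", "z_dot"],         "action": ["y_dot_d", "z_dot_d"],
                  "controller": "learned velocity"},
}
```

Note that every learned primitive has at most a two-dimensional state and a two-dimensional action, versus six state and two action dimensions for the full system.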

IV. SKILL ACQUISITION BY MODEL-BASED

REINFORCEMENT LEARNING

In this section, we briefly introduce the model-based reinforcement learning method. We learn the velocity controllers of the fit, align, and insertion primitives with PILCO [6] to address complex effects such as contacts and softness. The learning goal is to obtain a control policy u^(n) = π^(n)(x^(n), θ^(n)) that maximizes the long-term reward of each primitive n. For simplicity, we denote the state x^(n), dynamics f^(n), policy π^(n), and policy parameter θ^(n) as x, f, π, and θ, respectively.

The reward is expressed as follows:

J_π(θ) = Σ_{t=0}^{T} E_{x_t}[r(x_t)],    (2)

where J_π evaluates the reward over T timesteps, θ is the policy parameter, and the immediate reward r(x_t) is given as

r(x_t) = exp(−(1/2)(x_d − x_t)^T W (x_d − x_t)),    (3)

where x_d is a desired state, and W is a diagonal weight matrix of the reward function.
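A direct transcription of Eq. (3) might look as follows; passing the diagonal of W as a vector `w` is an implementation choice made here, not something specified in the paper.

```python
import numpy as np

def saturating_reward(x, x_d, w):
    """Immediate reward of Eq. (3); `w` holds the diagonal of W."""
    e = x_d - x
    return float(np.exp(-0.5 * e @ (np.diag(w) @ e)))
```

The reward saturates near 0 far from the goal and approaches 1 at x = x_d, which keeps the policy search well behaved even for large initial errors.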

A. Model learning

PILCO employs Gaussian process regression [29] to learn the dynamics. We consider the training inputs and outputs as x̃ = [x_t^T, u_t^T]^T ∈ R^{D+F} and Δ_t = x_{t+1} − x_t ∈ R^D, respectively. Given l training inputs X̃ = [x̃_1, ..., x̃_l] and outputs y = [Δ_1, ..., Δ_l], the predictive distribution for the next timestep x_{t+1} is represented as

p(x_{t+1} | x_t, u_t) = N(x_{t+1} | μ_{t+1}, Σ_{t+1}),    (4)
μ_{t+1} = x_t + k_*^T (K + σ_ξ^2 I)^{−1} y,    (5)
Σ_{t+1} = k_** − k_*^T (K + σ_ξ^2 I)^{−1} k_*.    (6)

Here, k_* := k(X̃, x̃_t) and k_** := k(x̃_t, x̃_t), where K is the kernel matrix whose elements follow K_ij = k(x̃_i, x̃_j), and σ_ξ is a noise parameter.

We use the following squared exponential kernel function:

k(x̃_i, x̃_j) = σ_f^2 exp(−(1/2)(x̃_i − x̃_j)^T Λ^{−1} (x̃_i − x̃_j)),    (7)

where Λ is a precision matrix that expresses the characteristic lengths, and σ_f is the bandwidth parameter.
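For concreteness, a minimal NumPy sketch of the predictive equations (4)-(6) with the kernel (7), for a single output dimension of Δ, could read as follows; the vectorization choices are ours, not PILCO's, and `lam` holds the diagonal of Λ.

```python
import numpy as np

def se_kernel(A, B, lam, sf2):
    """Squared exponential kernel of Eq. (7); `lam` is the diagonal of Lambda."""
    d = A[:, None, :] - B[None, :, :]
    return sf2 * np.exp(-0.5 * np.einsum("ijk,k,ijk->ij", d, 1.0 / lam, d))

def gp_predict(X, y, x_t, lam, sf2, sn2):
    """One-step prediction of Eqs. (4)-(6) for one output dimension of Delta."""
    K = se_kernel(X, X, lam, sf2)                      # kernel matrix K
    k_s = se_kernel(X, x_t[None, :], lam, sf2)[:, 0]   # k_* = k(X~, x~_t)
    A = K + sn2 * np.eye(len(X))                       # K + sigma_xi^2 I
    mean_delta = k_s @ np.linalg.solve(A, y)           # Eq. (5): mu = x_t + mean_delta
    var = sf2 - k_s @ np.linalg.solve(A, k_s)          # Eq. (6), with k_** = sf2
    return mean_delta, var
```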

B. Control policy

We use a deterministic Gaussian process policy. Since it has high expressiveness, it is suitable for encoding complex skills. Given a test input x_*, the control policy is represented as

π(x_*, θ) = Σ_{i=1}^{H} k(m_i, x_*) (K + σ_π^2 I)^{−1} t,    (8)

where t is a training target, M = [m_1, ..., m_H] are the centers of the Gaussian basis functions, σ_π^2 is the noise variance, and k is a kernel function. The policy parameter θ is composed of M, t, and the scales of the kernel functions.
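A minimal sketch evaluating Eq. (8) for a one-dimensional action could look like this; the parameter shapes (H centers of dimension D) are assumptions made for illustration.

```python
import numpy as np

def gp_policy(x, M, t, lam, sn2_pi):
    """Deterministic GP policy of Eq. (8).

    M: (H, D) basis centers m_i; t: (H,) training targets;
    lam: (D,) kernel length-scale diagonal; sn2_pi: policy noise variance.
    """
    d = M - x                                         # x broadcast against centers
    k_s = np.exp(-0.5 * np.sum(d * d / lam, axis=1))  # k(m_i, x)
    dM = M[:, None, :] - M[None, :, :]
    K = np.exp(-0.5 * np.einsum("ijk,k,ijk->ij", dM, 1.0 / lam, dM))
    beta = np.linalg.solve(K + sn2_pi * np.eye(len(M)), t)  # (K + s^2 I)^-1 t
    return float(k_s @ beta)                          # scalar action u
```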

C. Policy evaluation and improvement

To evaluate the control policy, we need to compute the distribution of Δ_t by integrating over the distributions of x̃ and f. However, this cannot be obtained analytically. To this end, PILCO employs an approximation with an analytic moment matching technique.

The policy parameter θ is optimized by gradient descent on J_π(θ). The gradient ∂J_π(θ)/∂θ is calculated by the chain rule. We apply a standard gradient-based optimization method called conjugate gradient (CG).
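Putting Sections IV-A to IV-C together, the overall iteration can be sketched as below. The three callables stand in for data collection, GP fitting, and the moment-matching/CG step, which are not reimplemented here; only the loop structure comes from the paper.

```python
def pilco_loop(rollout, fit_model, improve_policy, theta0, n_trials):
    """Sketch of the PILCO iteration (Sec. IV); the steps are supplied as
    callables, since their internals are specified by Eqs. (2)-(8)."""
    theta, data = theta0, []
    for _ in range(n_trials):
        data += rollout(theta)                 # execute the policy, log transitions
        model = fit_model(data)                # GP dynamics regression (Sec. IV-A)
        theta = improve_policy(model, theta)   # moment matching + CG on J (Sec. IV-C)
    return theta
```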

V. SIMULATION AND EXPERIMENT

A. Simulation

To verify our method, we conducted a simple peg-in-hole simulation as a proof of concept. The goal of the

simulation was to investigate whether our lower dimensional

system could accelerate learning.

1) Setup: We developed a peg-in-hole environment with the Box2D physics engine [30]. We considered a 2D planar environment with a simplified model of our wrist, and compared our method involving the lower dimensional system with the method without it. The center block simulated the robot arm, which was controlled directly with velocity commands in the y and z directions. The peg and arm were connected by springs to emulate the compliant wrist (see Fig. 3). Note that we placed frames surrounding the center block so that the springs could not be extended too much. Although the kinematic structure differed from the real robot's (see Fig. 5), the functionality can be considered equivalent. In this simulation, we learned the fit, align, and insertion primitives. The dynamical system of our method was given in Sec. III-B. The comparison considered all dimensions: x = [y, z, α, ẏ, ż, α̇]^T, u = [ẏ_d, ż_d]^T.
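As a rough illustration of such a setup, the following pybox2d sketch couples a velocity-driven block to a peg through soft distance joints. The geometry, stiffness values, and axis convention (Box2D's x/y versus the paper's y/z plane) are assumptions for illustration, not the paper's actual simulator.

```python
from Box2D import b2World, b2PolygonShape  # pybox2d

world = b2World(gravity=(0.0, -9.8))
ground = world.CreateStaticBody(position=(0.0, 0.0),
                                shapes=b2PolygonShape(box=(5.0, 0.1)))
# Center block standing in for the arm; driven directly by velocity commands.
arm = world.CreateKinematicBody(position=(0.0, 2.0),
                                shapes=b2PolygonShape(box=(0.3, 0.3)))
peg = world.CreateDynamicBody(position=(0.0, 1.0),
                              shapes=b2PolygonShape(box=(0.05, 0.4)))
# Soft distance joints emulate the compliant wrist between arm and peg.
world.CreateDistanceJoint(bodyA=arm, bodyB=peg,
                          anchorA=arm.position, anchorB=peg.position,
                          frequencyHz=4.0, dampingRatio=0.5)

arm.linearVelocity = (0.1, 0.0)  # a lateral velocity command
world.Step(1.0 / 20.0, 10, 10)   # advance one 20 Hz control step
```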

Fig. 3. Snapshots of simulation with successfully learned policies. The peg and arm (the center block) were connected by springs to emulate the compliant

wrist. We placed frames to surround the arm so that the springs could not be extended too much.

[Figure 4: three panels (fit: y position error; align: angle error; insertion: z position error), each plotted over 20 trials, comparing "w/ lower dim. system (proposed)" against "w/o lower dim. system".]

Fig. 4. Average position or angle errors on each trial and primitive. The blue lines show the results of our proposed method. The red lines show the results excluding the lower dimensional system. Our method could learn policies faster in comparison to the other method.

[Figure 5: photograph of the setup; labels mark six motion capture cameras, the compliant wrist, and four markers.]

Fig. 5. Experimental setup composed of a 6-degree-of-freedom robot, compliant wrist, gripper, and motion capture system.

We used an open source implementation of PILCO [31]. On each primitive, PILCO learned the policy to reach the desired position. In the fit, align, and insertion primitives, the timestep T was 15, 10, and 10, respectively. We used the reward function given in Eq. (3) to reduce the position error. The control frequency was 20 Hz. Two initial data collection trials, in which the robot executed uniform random actions, were conducted, and the number of learning trials was 20. We evaluated the errors between the desired and current positions on each trial for the two methods. We expected our method to show better performance since its exploration area is smaller than that of the other method.
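This protocol can be summarized by the following sketch; `rollout` and `learn` are hypothetical wrappers around the simulator and PILCO, named here only for illustration.

```python
import numpy as np

def train_primitive(rollout, learn, x_d, T, n_random=2, n_learn=20):
    """Training schedule for one primitive: random trials, then PILCO trials.

    `rollout(policy, T)` runs one trial and returns the visited states;
    `learn(data)` refits the GP model and returns an improved policy.
    """
    random_policy = lambda x: np.random.uniform(-1.0, 1.0)   # initial exploration
    data = [rollout(random_policy, T) for _ in range(n_random)]
    errors = []
    for _ in range(n_learn):
        policy = learn(data)                     # model fit + CG policy search
        states = rollout(policy, T)
        data.append(states)
        errors.append(np.linalg.norm(states[-1] - x_d))      # final position error
    return errors
```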

2) Results of the simulation: Fig. 3 shows snapshots of the peg-in-hole skills. The robot could leverage its constraints and successfully learned the skills. Fig. 4 shows the position errors on each primitive for the two methods, illustrating the learning effects. The x-axes show the learning trials and the y-axes show the y, α, and z position errors, which are the most important states in the fit, align, and insertion primitives, respectively. We evaluated 30 learning sessions. The

error bars show the standard deviations for the sessions. Our

method shows smaller errors than the compared counterpart,

and converges in a few trials. This result demonstrates that

our method successfully learned the peg-in-hole task faster

than that without the lower dimensional system.

B. Experiment with a real robot

Next, we performed a real-robot experiment to evaluate

our method in a real environment.

1) Setup: We used a Universal Robots UR5 and developed the compliant wrist, which could be attached between the arm and a Robotiq 2F-85 gripper (see Fig. 5). Note that the peg was fixed firmly to the gripper using a 3D printed jig. The compliant wrist was composed of three springs, which allowed deformation in six directions and could switch between rigid and soft modes (see Fig. 5).

We implemented Cartesian position and velocity controllers

using MoveIt [32]. For the approach primitive, we employed

the position controller with the rigid mode. For the contact

primitive, we switched to the soft mode from the rigid one.

We provided the desired position z_d so that the tip of the peg could touch the ground when switching to the soft mode. We kept the soft mode from the contact primitive through the insertion primitive. This position was 10 mm away from the center of the hole. Then, we employed the velocity controller for the fit, align,

and insertion primitives. The diameter of the peg was 10 mm.

The tolerance between the peg and the hole was 15 µm. We

Fig. 6. Snapshots of the experiment with the real robot. Our method successfully learned peg-in-hole skills even with the real robot.

[Figure 7: three panels (fit: y position error [m]; align: angle error [rad]; insertion: z position error [m]), each plotted over five trials.]

Fig. 7. Average position or angle errors on each trial and primitive. The error bars show the standard deviations for 10 learning sessions. All the errors decreased by the fifth trial.

utilized OptiTrack, a motion capture system, to measure the position of the gripper. We connected our learning framework with the controllers for the UR5 and the motion capture system using the Robot Operating System (ROS).

We also used the same framework as in the simulation. In this real robot experiment, we focused on position or angle states alone due to the noisy observations. First, we learned the fit primitive. Then, we executed the learned fit primitive and started learning the align primitive. The same procedure was performed for the insertion primitive. The control frequency was 5 Hz. One initial data collection trial, in which the robot executed uniform random actions, was conducted, and five learning trials were undertaken. The timesteps T for the fit, align, and insertion primitives were 10, 10, and 15, respectively.
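As a hedged illustration of how such velocity actions might be streamed over ROS at 5 Hz, consider the following rospy sketch; the topic name and message type are assumptions, since the paper does not specify its MoveIt interface.

```python
import rospy
from geometry_msgs.msg import TwistStamped

rospy.init_node("soft_peg_in_hole")
# Hypothetical topic name; the paper only states that Cartesian velocity
# control was implemented with MoveIt and connected through ROS.
pub = rospy.Publisher("/cartesian_velocity_command", TwistStamped, queue_size=1)
rate = rospy.Rate(5)  # 5 Hz control frequency used in the real experiment

def send_velocity(y_dot_d, z_dot_d=0.0):
    """Publish one velocity action from the learned policy, then wait one step."""
    cmd = TwistStamped()
    cmd.header.stamp = rospy.Time.now()
    cmd.twist.linear.y = y_dot_d
    cmd.twist.linear.z = z_dot_d
    pub.publish(cmd)
    rate.sleep()
```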

2) Results of the experiment: We collected 60 and 75 data points, corresponding to a total of 12 and 15 s of interaction, for the fit and align primitives and for the insertion primitive, respectively. Fig. 6 shows snapshots of the learned peg-in-hole skills. The robot successfully learned the peg-in-hole skill on the real hardware as well. Fig. 7 shows the average position error on each trial and primitive. We tested 10 learning sessions. In the fit primitive, the errors converged by the second trial. In the align primitive, the error became smaller by the fifth trial. Finally, in the insertion primitive, the errors were small because the robot succeeded in reaching the bottom of the hole in all the trials.

We evaluated the success rate of each primitive after learning. We defined success as reaching an allowable position error at the maximum timestep T of each primitive, which allows proceeding to the next primitive. The success rates were 8/10, 9/10, and 10/10 for the fit, align, and insertion primitives, respectively. The results show that our method successfully learns peg-in-hole skills even when using a real robot.

VI. DISCUSSION

Here, we discuss the experimental results and limitations

of this study, and scope for future work.

We demonstrated that our method successfully learned the peg-in-hole tasks in most of the learning sessions within a few trials. Our method allows the robot to complete the tasks without force controllers or precise force/torque sensors. The few failure cases in the fit and align primitives were caused by overshooting. The insertion primitive was always successful because the alignment was performed precisely and jamming did not occur. It could be useful to obtain recovery skills to address jamming during insertion.

In this experiment, we only considered the position state space, as we observed noisy velocity measurements, probably due to the motion capture system or vibration of

the compliant wrist. For faster task completion and more

precise control, velocity feedback could be included in the

policy’s observations. To this end, a policy search method

robust to noisy observations [33] would be helpful. This could make our robot system even simpler by using other sensors, such as an inertial measurement unit (IMU) or a distance sensor measuring the deformation of the compliant wrist, instead of the motion capture system.

For future work, we will consider extending the proposed

method to deal with unseen tasks (e.g., different goals or peg

sizes). A meta-learning method, which can adapt to unseen

tasks within a few trials [34], would be promising. In addition,

we will explore the optimal stiffness of the compliant wrist,

which can accelerate the learning.

VII. CONCLUSION

In this study, we proposed a control framework for as-

sembly skills using a soft robot. We argued that the softness

allows the robot to interact with the environment and leverage

its constraints. This approach can lead to lower dimensional

systems and considerably ease the exploration of appropriate

skills. We employed manipulation primitives and sample-efficient model-based reinforcement learning. We performed a simulation and an experiment with a real robot. The results showed that our method successfully and efficiently learned

the needed skills in the lower dimensional system.

REFERENCES

[1] J. Xu, Z. Hou, Z. Liu, and H. Qiao, “Compare contact model-based

control and contact model-free learning: A survey of robotic peg-in-

hole assembly strategies,” arXiv preprint arXiv:1904.05240, 2019.

[2] Y. Karako, S. Kawakami, K. Koyama, M. Shimojo, T. Senoo, and

M. Ishikawa, “High-speed ring insertion by dynamic observable

contact hand,” in IEEE International Conference on Robotics and

Automation, 2019, pp. 2744–2750.

[3] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar,

and P. Abbeel, “Reinforcement learning on variable impedance con-

troller for high-precision robotic assembly,” in IEEE International

Conference on Robotics and Automation, 2019, pp. 3080–3087.

[4] L. Johannsmeier, M. Gerchow, and S. Haddadin, “A framework for

robot manipulation: Skill formalism, meta learning and adaptive con-

trol,” in IEEE International Conference on Robotics and Automation,

2019, pp. 5844–5850.

[5] K. Hang, A. S. Morgan, and A. M. Dollar, “Pre-grasp sliding manip-

ulation of thin objects using soft, compliant, or underactuated hands,”

IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 662–669,

2019.

[6] M. P. Deisenroth, D. Fox, and C. E. Rasmussen, “Gaussian processes

for data-efﬁcient learning in robotics and control,” IEEE Transactions

on Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 408–

423, 2013.

[7] P. Falco, A. Attawia, M. Saveriano, and D. Lee, “On policy learning

robust to irreversible events: An application to robotic in-hand ma-

nipulation,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp.

1482–1489, 2018.

[8] M. Hamaya, T. Matsubara, T. Noda, T. Teramae, and J. Morimoto,

“Learning assistive strategies for exoskeleton robots from user-robot

physical interaction,” Pattern Recognition Letters, vol. 99, pp. 67–76,

2017.

[9] K. Zhang, J. Xu, H. Chen, J. Zhao, and K. Chen, “Jamming analysis and force control for flexible dual peg-in-hole assembly,” IEEE

Transactions on Industrial Electronics, vol. 66, no. 3, pp. 1930–1939,

2018.

[10] X. Zhang, Y. Zheng, J. Ota, and Y. Huang, “Peg-in-hole assembly

based on two-phase scheme and f/t sensor for dual-arm robot,”

Sensors, vol. 17, no. 9, p. 2004, 2017.

[11] K. Van Wyk, M. Culleton, J. Falco, and K. Kelly, “Comparative peg-

in-hole testing of a force-based manipulation controlled robotic hand,”

IEEE Transactions on Robotics, vol. 34, no. 2, pp. 542–549, 2018.

[12] H. Park, J.-H. Bae, J.-H. Park, M.-H. Baeg, and J. Park, “Intuitive

peg-in-hole assembly strategy with a compliant manipulator,” in IEEE

International Symposium on Robotics, 2013, pp. 1–5.

[13] R. Li and H. Qiao, “Condition and strategy analysis for assembly

based on attractive region in environment,” IEEE/ASME Transactions

on Mechatronics, vol. 22, no. 5, pp. 2218–2228, 2017.

[14] M. Vecerik, O. Sushkov, D. Barker, T. Rothörl, T. Hester, and J. Scholz,

“A practical approach to insertion with variable socket position using

deep reinforcement learning,” in IEEE International Conference on

Robotics and Automation, 2019, pp. 754–760.

[15] J. Xu, Z. Hou, W. Wang, B. Xu, K. Zhang, and K. Chen, “Feedback

deep deterministic policy gradient with fuzzy reward for robotic

multiple peg-in-hole assembly tasks,” IEEE Transactions on Industrial

Informatics, vol. 15, no. 3, pp. 1658–1667, 2019.

[16] Y. Fan, J. Luo, and M. Tomizuka, “A learning framework for high

precision industrial assembly,” in IEEE International Conference on

Robotics and Automation, 2019, pp. 811–817.

[17] M. A. Lee, Y. Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-

Fei, A. Garg, and J. Bohg, “Making sense of vision and touch:

self-supervised learning of multimodal representations for contact-rich

tasks,” in IEEE International Conference on Robotics and Automation,

2019, pp. 8943–8950.

[18] S. Levine, N. Wagener, and P. Abbeel, “Learning contact-rich ma-

nipulation skills with guided policy search,” in IEEE International

Conference on Robotics and Automation, 2015, pp. 156–163.

[19] J. Morimoto, “Soft humanoid motor learning,” Science Robotics,

vol. 2, no. 13, p. eaaq0989, 2017.

[20] K.-L. Du, B.-B. Zhang, X. Huang, and J. Hu, “Dynamic analysis of

assembly process with passive compliance for robot manipulators,”

in IEEE International Symposium on Computational Intelligence in

Robotics and Automation, vol. 3, 2003, pp. 1168–1173.

[21] S.-k. Yun, “Compliant manipulation for peg-in-hole: Is passive compli-

ance a key to learn contact motion?” in IEEE International Conference

on Robotics and Automation, 2008, pp. 1647–1652.

[22] R. Li, R. Platt, W. Yuan, A. ten Pas, N. Roscup, M. A. Srinivasan,

and E. Adelson, “Localization and manipulation of small parts using

gelsight tactile sensing,” in IEEE/RSJ International Conference on

Intelligent Robots and Systems, 2014, pp. 3988–3993.

[23] K. Nozu and K. Shimonomura, “Robotic bolt insertion and tightening

based on in-hand object localization and force sensing,” in IEEE/ASME

International Conference on Advanced Intelligent Mechatronics, 2018,

pp. 310–315.

[24] T. Nishimura, Y. Suzuki, T. Tsuji, and T. Watanabe, “Peg-in-hole

under state uncertainties via a passive wrist joint with push-activate-

rotation function,” in IEEE-RAS International Conference on Hu-

manoid Robotics, 2017, pp. 67–74.

[25] T. Deaconescu and A. Deaconescu, “Pneumatic muscle-actuated ad-

justable compliant gripper system for assembly operations,” Strojniski

Vestnik-Journal of Mechanical Engineering, vol. 63, no. 4, pp. 225–

235, 2017.

[26] R. Deimel and O. Brock, “A novel type of compliant and underactuated

robotic hand for dexterous grasping,” The International Journal of

Robotics Research, vol. 35, no. 1-3, pp. 161–185, 2016.

[27] N. C. Dafle, A. Rodriguez, R. Paolini, B. Tang, S. S. Srinivasa,

M. Erdmann, M. T. Mason, I. Lundberg, H. Staab, and T. Fuhlbrigge,

“Extrinsic dexterity: in-hand manipulation with external forces,” in

IEEE International Conference on Robotics and Automation, 2014,

pp. 1578–1585.

[28] T. Tang, H.-C. Lin, and M. Tomizuka, “A learning-based framework

for robot peg-hole-insertion,” in ASME Dynamic Systems and Control

Conference, 2015, pp. 1–9.

[29] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine

learning. MIT press Cambridge, MA, 2006, vol. 2, no. 3.

“Box2D,” https://box2d.org/.

[31] “Probabilistic Inference for Learning Control (PILCO),” https://github.

com/nrontsis/PILCO.

[32] “MoveIt,” https://moveit.ros.org/.

[33] P. Parmas, C. E. Rasmussen, J. Peters, and K. Doya, “PIPPS: Flexible

model-based policy search robust to the curse of chaos,” in Interna-

tional Conference on Machine Learning, vol. 80, 2018, pp. 4065–4074.

[34] S. Sæmundsson, K. Hofmann, and M. P. Deisenroth, “Meta reinforce-

ment learning with latent variable gaussian processes,” in Conference

on Uncertainty in Artiﬁcial Intelligence, 2018.