
Learning Robotic Assembly Tasks with Lower Dimensional Systems by

Leveraging Physical Softness and Environmental Constraints

Masashi Hamaya1, Robert Lee2, Kazutoshi Tanaka1, Felix von Drigalski1, Chisato Nakashima3,

Yoshiya Shibata3, and Yoshihisa Ijiri1,3

Abstract— In this study, we present a novel control framework for assembly tasks with a soft robot. Typically, existing rigid robots require high frequency controllers and precise force/torque sensors for assembly tasks. The resulting robot system is complex, entailing large amounts of engineering and maintenance. Physical softness allows the robot to interact with the environment easily, so we expect soft robots to perform assembly tasks without the need for high frequency force/torque controllers and sensors. However, specific data-driven approaches are needed to deal with complex models involving nonlinearity and hysteresis, and applying these approaches directly would require collecting very large amounts of training data. To solve this problem, we argue that by leveraging softness and environmental constraints, a robot can complete tasks in lower dimensional state and action spaces, which greatly facilitates the exploration of appropriate assembly skills. We then apply a highly sample-efficient model-based reinforcement learning method to the resulting lower dimensional systems. To verify our method, we perform a simulation of peg-in-hole tasks. The results show that our method learns the appropriate skills faster than an approach that does not consider lower dimensional systems. Moreover, we demonstrate that our method works on a real robot equipped with a compliant module on the wrist.

I. INTRODUCTION

Robotic manipulation for assembly tasks presents great

potential for full automation in industrial situations. One of

the most common assembly tasks is the peg-in-hole task.

Many researchers have explored control methods for peg-in-

hole tasks [1]. However, these tasks remain challenging, since they require controllers that deal with tight tolerances and with hard-to-model effects such as contact and jamming between the peg and the hole.

To perform such peg-in-hole tasks, force controllers have

been widely applied [1]. Furthermore, recent developments

have demonstrated high precision control solutions for peg-

in-hole tasks [2], [3]. However, these approaches often employ high frequency controllers or precise force/torque sensors, and therefore largely depend on hardware

performance. If the robots could more easily interact with the

environment without such controllers or sensors, the robot

system would be much simpler.

To this end, we use physically soft robots (including

compliant components such as springs and silicone) for the

peg-in-hole tasks because the softness allows the robot to

1MH, KT, FD, and YI are with OMRON SINIC X Corporation, Tokyo,

Japan. masashi.hamaya@sinicx.com

2RL is with the Australian Centre for Robotic Vision at Queensland

University of Technology in Brisbane, Australia (work done as an intern at

OMRON SINIC X Corp.)

3YS, CN, and YI are with OMRON Corporation, Tokyo, Japan

[Figure 1: diagram of the framework. Physical softness and environmental constraints shrink the possible state space (e.g., position, velocity) and possible action space (e.g., position, velocity, force) into a lower dimensional system over the skill sequence approach, contact, fit, align, and insertion, which is learned by model-based reinforcement learning on a robot with a compliant wrist.]

Fig. 1. Proposed framework for assembly tasks with a soft robot. We develop the lower dimensional space using softness and environmental constraints to make learning tractable. Then, we apply sample-efficient model-based reinforcement learning.

interact easily with the environment. Therefore, we expect

that the soft robot will be able to perform assembly tasks

even with low frequency controllers since the softness can

morphologically handle contact-rich interactions without the need for significant engineering effort. Meanwhile, obtaining analytical models of soft robots is difficult due to the nonlinear and hysteretic nature of soft components, which makes it difficult to design controllers for them by hand. One promising approach is to use data-driven methods to address this complexity. However, if we directly apply

such approaches to the robots, we would be required to

collect a large amount of real robot data to learn the complex

dynamics in such non-linear and high dimensional systems.

To deal with this problem, we propose a novel soft

robotic control framework for peg-in-hole tasks. The key

insight of this study is that by leveraging the softness and

environmental constraints, the robot can complete the tasks in

lower dimensional state and action spaces. More concretely,

we divide the sequence of peg-in-hole skills into manipulation primitives (MPs) (e.g., fit, align, and insertion) [4], which allow us to consider only the important state variables. In addition, we do not require force controllers, as we replace the traditional rigid robot with a soft robot [5]: the action space contains only desired positions or velocities, with no desired forces. This could also make our approach accessible to a wider variety of robot systems. In

such a manner, we can considerably reduce the number of

dimensions for both state and action spaces by using softness

and environmental constraints. This facilitates exploration of

the appropriate peg-in-hole skills.

To obtain the skills for peg-in-hole tasks with as few trials as possible, we apply a sample-efficient model-based reinforcement learning method called Probabilistic Inference for Learning Control (PILCO) [6] to the lower dimensional system. PILCO considers continuous state and action spaces, and learns a nonlinear model with uncertainty using a Gaussian process (GP). By making use of the model uncertainty, this algorithm shows remarkable sample efficiency compared to existing model-free reinforcement learning [6].

This has also been applied to complex systems such as in-

hand manipulation [7] and human-robot interactions [8]. For

this reason, PILCO is suitable for learning peg-in-hole tasks,

especially in industrial scenarios where the robots need to

adapt to new environments rapidly.

We summarize our proposed framework in Fig. 1. The

contributions of this study are as follows:

• We propose a novel control framework for peg-in-hole tasks using a physically soft robot. We demonstrate that the softness and environmental constraints allow the robot to perform the tasks in lower dimensional state and action spaces.
• We employ sample-efficient model-based reinforcement learning for the tasks, which can obtain the appropriate skills from scratch within a few interactions.
• We perform a simulation to verify our method. We develop a peg-in-hole simulator with passive compliant components. The results show that our method successfully learned the peg-in-hole skills faster than a method without the lower dimensional system. Moreover, we demonstrate an experiment using a real robot with a compliant wrist.

This paper is organized as follows. Section II presents

related works for our study. We introduce our lower dimen-

sional system in Section III and learning of peg-in-hole skills

in Section IV. Section V demonstrates the simulations and

experiment. Section VI discusses our results, and Section VII

concludes.

II. RELATED WORKS

In this section, we present work related to assembly manipulation. The related works are broadly categorized into force control methods, physically soft robots, and lower dimensional systems.

A. Force control

Force control methods have been explored extensively [1].

One of the most popular approaches is the model-based

controller. For example, contact state analyses-based force

controllers have been proposed [9], [10]. Van Wyk et al. developed a force controller for multi-fingered hands [11], in which force sensors mounted on each fingertip are used to calculate the desired contact forces. Karako et al. also proposed a multi-fingered controller for ring insertion, which could compensate for position errors using a high frequency controller [2].

Most model-based controllers employ force sensors or high frequency controllers, so their performance is largely affected by the hardware. Force controllers that exclude

force/torque sensors have also been introduced [12], [13].

However, they also suffer from model errors.

Recently, learning-based controllers have attracted much

attention. Deep reinforcement learning and self-supervised

learning are also used for assembly tasks [14], [15], [16],

[17]. Model-based reinforcement learning methods have also

been proposed [3], [18].

Although all of the above methods could learn adaptive

assembly skills, they also largely depend on force/torque

sensors or accurate torque controllers. In contrast, we

use a physically soft robot and develop a learning framework

for peg-in-hole tasks to address the model complexity.

B. Physically soft robots

Softness can address collisions with the environment and

improve skill performance [19]. Some studies have demonstrated the effectiveness of softness in assembly tasks [20],

[21]. Soft tactile sensors have been developed to measure

the orientations of the grasped objects [22], [23]. Nishimura

et al. developed a passive compliant wrist and demonstrated

that a soft wrist could absorb uncertainties of states for the

peg [24]. Deaconescu et al. developed a pneumatic-driven

gripper, which allowed the correction of the misalignment

between the peg and the hole [25]. Hang et al. leveraged a soft robot for pre-grasp sliding tasks; the softness helped objects slide without the use of force controllers [5]. Deimel et al. proposed a soft hand that can grasp multiple objects with a low-dimensional representation [26].

Based on the insights provided by [5], [26], we leverage

the softness for the lower dimensional action space to improve learning efficiency in high dimensional tasks such as peg-in-hole tasks.

C. Lower dimensional system

Environmental constraints can aid manipulation skills [27]. They can also reduce the number of dimensions of the state space. Johannsmeier et al. proposed a specific skill formalism called MPs, which reduces the complexity of manipulation tasks in order to learn the optimal parameters for parameterized impedance controllers [4]. Tang et al.

presented a dimensional reduction method for learning from

demonstrations, which considered only radial directions

because of rotational symmetry between the peg and the

hole [28].

Based on this skill formalism [4], our method attempts to learn peg-in-hole skills with the soft robot while using the softness and environmental constraints, so that the robot can perform the assembly tasks in much lower dimensional state and action spaces. In addition, we employ sample-efficient model-based

reinforcement learning [6], which can obtain the peg-in-hole

skills from scratch with a few trials.

[Figure 2: the five MPs (n1: approach, n2: contact, n3: fit, n4: align, n5: insertion) with their states and actions; n1–n2 use a pre-designed position controller and n3–n5 a learned velocity controller; the setup comprises arm, compliant wrist, gripper, peg, and hole.]

Fig. 2. Schematic for peg-in-hole skill. We employ MPs [4], which divide the sequence of skills. The red and green arrows show the target state and actions on each primitive, respectively.

III. LOWER DIMENSIONAL SYSTEM WITH

SOFTNESS AND ENVIRONMENTAL

CONSTRAINTS

In this section, we introduce our proposed method, namely

a lower dimensional system using softness and environmental

constraints.

A. Preliminaries

We consider the gripper position and orientation in a

Cartesian coordinate system. We implement the compliant

wrist between the tip of the robot arm and the gripper. Here,

we assume that the gripper and the peg are tightly coupled

and that the position of the gripper can be measured. Based

on these assumptions, the dynamics of the gripper can be

given as

x_{t+1} = f(x_t, u_t) + ω,    (1)

with continuous state x ∈ R^D, command actions u ∈ R^F, and additive Gaussian noise ω. We employ a data-driven approach to obtain the policy π for the peg-in-hole task

However, if we directly apply learning approaches to the

system, the required amount of training data would increase

since the number of dimensions of the states and actions

becomes large. For this reason, we need to create a lower

dimensional system to make the learning tractable.
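As a minimal illustration of the interface this implies, the following Python sketch treats Eq. (1) as a black-box transition; `f` and `noise_std` are placeholders for the unknown dynamics and noise level, not quantities given in the paper.

```python
import numpy as np

def step(f, x, u, noise_std):
    """One step of the dynamics in Eq. (1): x_{t+1} = f(x_t, u_t) + w.

    `f` is the unknown transition function to be learned from data;
    `noise_std` is the standard deviation of the additive Gaussian noise w.
    """
    return f(x, u) + np.random.normal(0.0, noise_std, size=np.shape(x))
```

The learning problem is hard precisely because `f` is nonlinear and hysteretic and the dimensions D and F grow quickly; the next subsection shrinks both.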

B. Manipulation primitives with softness

To leverage the environmental constraints, we employ

graphical skill formulation, which segments the sequence of

the peg-in-hole skills [4]. When the peg contacts the surface

or hole, the directions in which the robot can move are constrained at each event, such as fit, align, and insertion. In this manner, the authors could reduce the target dimensionality of the task. Unlike that formulation [4], we can ignore force controllers and focus only on position and velocity controllers, because we use a compliant robot, specifically one that can absorb contacts with the environment. This can reduce the number of dimensions of the action space.

The skill graph consists of edges e ∈ E and nodes n ∈ N. We refer to the nodes as MPs. Each MP has a pre-designed position controller or a learned velocity controller. The transitions between the MPs are triggered by "conditions." Here, we define three conditions and a transition as follows (a minimal sketch of this transition logic follows the list):

1) Pre-condition C_pre initializes the skills. If ||x_T − x_init|| < δ_init, where x_init is an initial position, T is a timestep, and δ_init is a threshold for the position error, the skill proceeds to the next primitive. Otherwise, it performs the initialization again.
2) Success condition C_suc stops the skills when they are successful. If ||x_T − x_suc|| < δ_suc, it stops the task. Otherwise, it returns to the pre-condition.
3) Error condition C_err stops the skills. If the robot reaches outside the operation range, x > x_out, it returns to the pre-condition.
4) Transition e_i switches the MPs. If ||x_T − x_d|| < δ, the MP moves from n to n + 1. Otherwise, it returns to the pre-condition.
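The following Python sketch shows one way this logic could be wired together. The `MP` container, `reset` callable, and variable names are chosen here for illustration; only the conditions themselves come from the formalism above.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

@dataclass
class MP:
    """One manipulation primitive and the transition condition e_i after it."""
    run: Callable[[np.ndarray], np.ndarray]  # controller; returns the final state x_T
    x_d: np.ndarray                          # desired state for the transition
    delta: float                             # threshold on ||x_T - x_d||

def run_skill(mps, reset, x_init, d_init, x_suc, d_suc, x_out):
    """Execute the skill graph with conditions C_pre, C_suc, and C_err."""
    while True:
        x = reset()                                    # C_pre: initialize the skill
        if np.linalg.norm(x - x_init) >= d_init:
            continue                                   # initialization failed; retry
        ok = True
        for mp in mps:
            x = mp.run(x)
            if np.any(x > x_out):                      # C_err: outside operation range
                ok = False
                break
            if np.linalg.norm(x - mp.x_d) >= mp.delta: # transition e_i not triggered
                ok = False
                break
        if ok and np.linalg.norm(x - x_suc) < d_suc:   # C_suc: task succeeded
            return x
```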

Based on this formalism, we design the MPs for the peg-

in-hole task. For simplicity, we consider a 2D plane system

based on previous works [4], [28].

• n1 (Approach): The robot moves close to the surface. The state and action are given as follows:
x^(approach) = [y, z]^T, u^(approach) = [y_d, z_d]^T
In this primitive, we employ a pre-designed position controller with a motion planner to move from the initial position to the desired position.

• n2 (Contact): The robot contacts the surface.
x^(contact) = z, u^(contact) = z_d
Similar to the approach primitive, we use the position controller. Since the softness allows safe contact with the surface, we do not have to apply force controllers.

• n3 (Fit): The robot moves laterally in the y direction while keeping contact with the surface until the peg fits in the hole.
x^(fit) = [y, ẏ]^T, u^(fit) = ẏ_d
We use a velocity controller with reinforcement learning. Since the peg is pressed down onto the surface by the softness, the z direction is constrained. Therefore, we only consider the y direction for both the state and action spaces. Continued movement in this direction allows the edge of the peg to fit into the hole.

• n4 (Align): The robot is rotated in the α direction until the peg and hole are vertically aligned.
x^(align) = [α, α̇]^T, u^(align) = ẏ_d
We can focus on the state in the α rotation and the action space in the y direction only, because the peg is constrained in the hole and the robot naturally generates a rotational motion centered on the edge of the peg. We can also ignore the z direction, since the softness absorbs the deformation in the z direction.

• n5 (Insertion): The peg is inserted into the hole to the desired depth.
x^(insertion) = [z, ż]^T, u^(insertion) = [ẏ_d, ż_d]^T
We focus on the state of the z direction only, since the peg moves in that direction alone. We consider both the y and z action spaces in case the peg jams. (The resulting per-primitive spaces are summarized in the sketch below.)
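For reference, the five primitives and their reduced spaces can be listed as plain data; the identifiers are illustrative names for the variables above, with a `_d` suffix denoting desired values.

```python
# State and action variables per primitive (2D plane: y, z, alpha).
PRIMITIVES = {
    "approach":  {"state": ["y", "z"],             "action": ["y_d", "z_d"],
                  "controller": "pre-designed position"},
    "contact":   {"state": ["z"],                  "action": ["z_d"],
                  "controller": "pre-designed position"},
    "fit":       {"state": ["y", "y_dot"],         "action": ["y_dot_d"],
                  "controller": "learned velocity"},
    "align":     {"state": ["alpha", "alpha_dot"], "action": ["y_dot_d"],
                  "controller": "learned velocity"},
    "insertion": {"state": ["z", "z_dot"],         "action": ["y_dot_d", "z_dot_d"],
                  "controller": "learned velocity"},
}
```

Note that every learned primitive has at most a two-dimensional state and a two-dimensional action, versus six state and two action dimensions for the full system.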

IV. SKILL ACQUISITION BY MODEL-BASED

REINFORCEMENT LEARNING

In this section, we briefly introduce the model-based reinforcement learning method. We learn the velocity controllers of the fit, align, and insertion primitives with PILCO [6] to address complex effects such as contacts and softness. The learning goal is to obtain a control policy u^(n) = π^(n)(x^(n), θ^(n)) that maximizes the long-term reward of each primitive n. For simplicity, we denote the state x^(n), dynamics f^(n), policy π^(n), and policy parameter θ^(n) as x, f, π, and θ, respectively.

The reward is expressed as follows:

J_π(θ) = Σ_{t=0}^{T} E_{x_t}[r(x_t)],    (2)

where J_π evaluates the reward over T timesteps, θ is the policy parameter, and the immediate reward r(x_t) is given as

r(x_t) = exp(−(1/2)(x_d − x_t)^T W (x_d − x_t)),    (3)

where x_d is a desired state, and W is a diagonal weight matrix of the reward function.
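A direct transcription of Eq. (3) might look as follows; passing the diagonal of W as a vector `w` is an implementation choice made here, not something specified in the paper.

```python
import numpy as np

def saturating_reward(x, x_d, w):
    """Immediate reward of Eq. (3); `w` holds the diagonal of W."""
    e = x_d - x
    return float(np.exp(-0.5 * e @ (np.diag(w) @ e)))
```

The reward saturates near 0 far from the goal and approaches 1 at x = x_d, which keeps the policy search well behaved even for large initial errors.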

A. Model learning

PILCO employs Gaussian process regression [29] to learn the dynamics. We consider the training inputs and outputs as x̃ = [x_t^T, u_t^T]^T ∈ R^{D+F} and Δ_t = x_{t+1} − x_t ∈ R^D, respectively. Given l training inputs X̃ = [x̃_1, ..., x̃_l] and outputs y = [Δ_1, ..., Δ_l], the predictive distribution for the next timestep x_{t+1} is represented as

p(x_{t+1} | x_t, u_t) = N(x_{t+1} | μ_{t+1}, Σ_{t+1}),    (4)
μ_{t+1} = x_t + k_*^T (K + σ_ξ^2 I)^{−1} y,    (5)
Σ_{t+1} = k_** − k_*^T (K + σ_ξ^2 I)^{−1} k_*.    (6)

Here, k_* := k(X̃, x̃_t) and k_** := k(x̃_t, x̃_t), where K is the kernel matrix whose elements follow K_ij = k(x̃_i, x̃_j), and σ_ξ is a noise parameter.

We use the following squared exponential kernel function:

k(x̃_i, x̃_j) = σ_f^2 exp(−(1/2)(x̃_i − x̃_j)^T Λ^{−1} (x̃_i − x̃_j)),    (7)

where Λ is a precision matrix that expresses the characteristic lengths, and σ_f is the bandwidth parameter.
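For concreteness, a minimal NumPy sketch of the predictive equations (4)-(6) with the kernel (7), for a single output dimension of Δ, could read as follows; the vectorization choices are ours, not PILCO's, and `lam` holds the diagonal of Λ.

```python
import numpy as np

def se_kernel(A, B, lam, sf2):
    """Squared exponential kernel of Eq. (7); `lam` is the diagonal of Lambda."""
    d = A[:, None, :] - B[None, :, :]
    return sf2 * np.exp(-0.5 * np.einsum("ijk,k,ijk->ij", d, 1.0 / lam, d))

def gp_predict(X, y, x_t, lam, sf2, sn2):
    """One-step prediction of Eqs. (4)-(6) for one output dimension of Delta."""
    K = se_kernel(X, X, lam, sf2)                      # kernel matrix K
    k_s = se_kernel(X, x_t[None, :], lam, sf2)[:, 0]   # k_* = k(X~, x~_t)
    A = K + sn2 * np.eye(len(X))                       # K + sigma_xi^2 I
    mean_delta = k_s @ np.linalg.solve(A, y)           # Eq. (5): mu = x_t + mean_delta
    var = sf2 - k_s @ np.linalg.solve(A, k_s)          # Eq. (6), with k_** = sf2
    return mean_delta, var
```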

B. Control policy

We use a deterministic Gaussian process policy. Since it has high expressiveness, it is suitable for encoding complex skills. Given a test input x_*, the control policy is represented as

π(x_*, θ) = Σ_{i=1}^{H} k(m_i, x_*) (K + σ_π^2 I)^{−1} t,    (8)

where t is a training target, M = [m_1, ..., m_H] are the centers of the Gaussian basis functions, σ_π^2 is the noise variance, and k is a kernel function. The policy parameter θ is composed of M, t, and the scales of the kernel functions.
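A minimal sketch evaluating Eq. (8) for a one-dimensional action could look like this; the parameter shapes (H centers of dimension D) are assumptions made for illustration.

```python
import numpy as np

def gp_policy(x, M, t, lam, sn2_pi):
    """Deterministic GP policy of Eq. (8).

    M: (H, D) basis centers m_i; t: (H,) training targets;
    lam: (D,) kernel length-scale diagonal; sn2_pi: policy noise variance.
    """
    d = M - x                                         # x broadcast against centers
    k_s = np.exp(-0.5 * np.sum(d * d / lam, axis=1))  # k(m_i, x)
    dM = M[:, None, :] - M[None, :, :]
    K = np.exp(-0.5 * np.einsum("ijk,k,ijk->ij", dM, 1.0 / lam, dM))
    beta = np.linalg.solve(K + sn2_pi * np.eye(len(M)), t)  # (K + s^2 I)^-1 t
    return float(k_s @ beta)                          # scalar action u
```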

C. Policy evaluation and improvement

To evaluate the control policy, we need to compute the distribution of Δ_t by integrating over the distributions of x̃ and f. However, this cannot be obtained analytically. To this end, PILCO employs an approximation with an analytic moment matching technique.

The policy parameter θ is optimized by gradient descent on J_π(θ). The gradient ∂J_π(θ)/∂θ is calculated by the chain rule. We apply a standard gradient-based optimization method called conjugate gradient (CG).
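Putting Sections IV-A to IV-C together, the overall iteration can be sketched as below. The three callables stand in for data collection, GP fitting, and the moment-matching/CG step, which are not reimplemented here; only the loop structure comes from the paper.

```python
def pilco_loop(rollout, fit_model, improve_policy, theta0, n_trials):
    """Sketch of the PILCO iteration (Sec. IV); the steps are supplied as
    callables, since their internals are specified by Eqs. (2)-(8)."""
    theta, data = theta0, []
    for _ in range(n_trials):
        data += rollout(theta)                 # execute the policy, log transitions
        model = fit_model(data)                # GP dynamics regression (Sec. IV-A)
        theta = improve_policy(model, theta)   # moment matching + CG on J (Sec. IV-C)
    return theta
```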

V. SIMULATION AND EXPERIMENT

A. Simulation

To verify our method, we conducted a simple peg-in-hole simulation as a proof of concept. The goal of the

simulation was to investigate whether our lower dimensional

system could accelerate learning.

1) Setup: We developed a peg-in-hole environment with the Box2D physics engine [30]. We considered a 2D planar environment with a simplified model of our wrist, and compared our method involving the lower dimensional system with the method without it. The center block simulated the robot arm, which was controlled directly with velocity commands in the y and z directions. The peg and arm were connected by springs to emulate the compliant wrist (see Fig. 3). Note that we placed frames surrounding the center block so that the springs could not be extended too much. Although the kinematic structure differed from the real robot's (see Fig. 5), the functionality can be considered equivalent. In this simulation, we learned the fit, align, and insertion primitives. The dynamical system of our method was given in Sec. III-B. The comparison considered all dimensions: x = [y, z, α, ẏ, ż, α̇]^T, u = [ẏ_d, ż_d]^T.
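As a rough illustration of such a setup, the following pybox2d sketch couples a velocity-driven block to a peg through soft distance joints. The geometry, stiffness values, and axis convention (Box2D's x/y versus the paper's y/z plane) are assumptions for illustration, not the paper's actual simulator.

```python
from Box2D import b2World, b2PolygonShape  # pybox2d

world = b2World(gravity=(0.0, -9.8))
ground = world.CreateStaticBody(position=(0.0, 0.0),
                                shapes=b2PolygonShape(box=(5.0, 0.1)))
# Center block standing in for the arm; driven directly by velocity commands.
arm = world.CreateKinematicBody(position=(0.0, 2.0),
                                shapes=b2PolygonShape(box=(0.3, 0.3)))
peg = world.CreateDynamicBody(position=(0.0, 1.0),
                              shapes=b2PolygonShape(box=(0.05, 0.4)))
# Soft distance joints emulate the compliant wrist between arm and peg.
world.CreateDistanceJoint(bodyA=arm, bodyB=peg,
                          anchorA=arm.position, anchorB=peg.position,
                          frequencyHz=4.0, dampingRatio=0.5)

arm.linearVelocity = (0.1, 0.0)  # a lateral velocity command
world.Step(1.0 / 20.0, 10, 10)   # advance one 20 Hz control step
```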

Fig. 3. Snapshots of simulation with successfully learned policies. The peg and arm (the center block) were connected by springs to emulate the compliant

wrist. We placed frames to surround the arm so that the springs could not be extended too much.

[Figure 4: three panels (fit: y position error; align: angle error; insertion: z position error), each plotted over 20 trials, comparing "w/ lower dim. system (proposed)" against "w/o lower dim. system".]

Fig. 4. Average position or angle errors on each trial and primitive. The blue lines show the results of our proposed method. The red lines show the results excluding the lower dimensional system. Our method could learn policies faster in comparison to the other method.

[Figure 5: photograph of the setup; labels mark six motion capture cameras, the compliant wrist, and four markers.]

Fig. 5. Experimental setup composed of a 6-degree-of-freedom robot, compliant wrist, gripper, and motion capture system.

We used an open source implementation of PILCO [31]. On each primitive, PILCO learned the policy to reach the desired position. In the fit, align, and insertion primitives, the timestep T was 15, 10, and 10, respectively. We used the reward function given in Eq. (3) to reduce the position error. The control frequency was 20 Hz. Two initial data collection trials, in which the robot executed uniform random actions, were conducted, and the number of learning trials was 20. We evaluated the errors between the desired and current positions on each trial for the two methods. We expected our method to show better performance since its exploration area is smaller than that of the other method.
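This protocol can be summarized by the following sketch; `rollout` and `learn` are hypothetical wrappers around the simulator and PILCO, named here only for illustration.

```python
import numpy as np

def train_primitive(rollout, learn, x_d, T, n_random=2, n_learn=20):
    """Training schedule for one primitive: random trials, then PILCO trials.

    `rollout(policy, T)` runs one trial and returns the visited states;
    `learn(data)` refits the GP model and returns an improved policy.
    """
    random_policy = lambda x: np.random.uniform(-1.0, 1.0)   # initial exploration
    data = [rollout(random_policy, T) for _ in range(n_random)]
    errors = []
    for _ in range(n_learn):
        policy = learn(data)                     # model fit + CG policy search
        states = rollout(policy, T)
        data.append(states)
        errors.append(np.linalg.norm(states[-1] - x_d))      # final position error
    return errors
```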

2) Results of the simulation: Fig. 3 shows snapshots of the peg-in-hole skills. The robot could leverage its constraints and successfully learned the skills. Fig. 4 shows the position errors on each primitive for the two methods, illustrating the learning effects. The x-axes show the learning trials and the y-axes show the y, α, and z position errors, which are the most important states in the fit, align, and insertion primitives, respectively. We evaluated 30 learning sessions. The

error bars show the standard deviations for the sessions. Our

method shows smaller errors than the compared counterpart,

and converges in a few trials. This result demonstrates that

our method successfully learned the peg-in-hole task faster

than that without the lower dimensional system.

B. Experiment with a real robot

Next, we performed a real-robot experiment to evaluate

our method in a real environment.

1) Setup: We used a Universal Robots UR5 and developed the compliant wrist, which could be attached between the arm and a Robotiq 2F-85 gripper (see Fig. 5). Note that the peg was fixed firmly to the gripper using a 3D printed jig. The compliant wrist was composed of three springs, which allowed deformation in six directions and could switch between rigid and soft modes (see Fig. 5).

We implemented Cartesian position and velocity controllers

using MoveIt [32]. For the approach primitive, we employed

the position controller with the rigid mode. For the contact

primitive, we switched to the soft mode from the rigid one.

We provided the desired position z_d so that the tip of the peg could touch the ground when switching to the soft mode. We kept the soft mode from the contact primitive through the insertion primitive. This position was 10 mm away from the center of the hole. Then, we employed the velocity controller for the fit, align,

and insertion primitives. The diameter of the peg was 10 mm.

The tolerance between the peg and the hole was 15 µm. We

Fig. 6. Snapshots of the experiment with the real robot. Our method successfully learned peg-in-hole skills even with the real robot.

[Figure 7: three panels (fit: y position error [m]; align: angle error [rad]; insertion: z position error [m]), each plotted over five trials.]

Fig. 7. Average position or angle errors on each trial and primitive. The error bars show the standard deviations for 10 learning sessions. All the errors decreased by the fifth trial.

utilized OptiTrack, a motion capture system, to measure the position of the gripper. We connected our learning framework with the controllers for the UR5 and the motion capture system using the Robot Operating System (ROS).

We also used the same framework as in the simulation. In this real robot experiment, we focused on position or angle states alone due to the noisy observations. First, we learned the fit primitive. Then, we executed the learned fit primitive and started learning the align primitive. The same procedure was performed for the insertion primitive. The control frequency was 5 Hz. One initial data collection trial, in which the robot executed uniform random actions, was conducted, and five learning trials were undertaken. The timesteps T for the fit, align, and insertion primitives were 10, 10, and 15, respectively.
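As a hedged illustration of how such velocity actions might be streamed over ROS at 5 Hz, consider the following rospy sketch; the topic name and message type are assumptions, since the paper does not specify its MoveIt interface.

```python
import rospy
from geometry_msgs.msg import TwistStamped

rospy.init_node("soft_peg_in_hole")
# Hypothetical topic name; the paper only states that Cartesian velocity
# control was implemented with MoveIt and connected through ROS.
pub = rospy.Publisher("/cartesian_velocity_command", TwistStamped, queue_size=1)
rate = rospy.Rate(5)  # 5 Hz control frequency used in the real experiment

def send_velocity(y_dot_d, z_dot_d=0.0):
    """Publish one velocity action from the learned policy, then wait one step."""
    cmd = TwistStamped()
    cmd.header.stamp = rospy.Time.now()
    cmd.twist.linear.y = y_dot_d
    cmd.twist.linear.z = z_dot_d
    pub.publish(cmd)
    rate.sleep()
```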

2) Results of the experiment: We collected 60 and 75 data points, corresponding to a total of 12 and 15 s of interaction, for the fit and align primitives and for the insertion primitive, respectively. Fig. 6 shows snapshots of the learned peg-in-hole skills. The robot successfully learned the peg-in-hole skill on the real hardware as well. Fig. 7 shows the average position error on each trial and primitive. We tested 10 learning sessions. In the fit primitive, the errors converged by the second trial. In the align primitive, the error became smaller by the fifth trial. Finally, in the insertion primitive, the errors were small because the robot succeeded in reaching the bottom of the hole in all the trials.

We evaluated the success rate of each primitive after learning. We defined success as reaching an allowable position error at the maximum timestep T of each primitive, which allows proceeding to the next primitive. The success rates were 8/10, 9/10, and 10/10 for the fit, align, and insertion primitives, respectively. The results show that our method successfully learns peg-in-hole skills even when using a real robot.

VI. DISCUSSION

Here, we discuss the experimental results and limitations

of this study, and scope for future work.

We demonstrated that our method successfully learned the peg-in-hole tasks in most of the learning sessions within a few trials. Our method allows the robot to complete the tasks without force controllers or precise force/torque sensors. The few failure cases in the fit and align primitives were caused by overshooting. The insertion primitive was always successful because the alignment was performed precisely and jamming did not occur. It could be useful to obtain recovery skills to address jamming during insertion.

In this experiment, we only considered the position state space, as we observed noisy velocity measurements, probably due to the motion capture system or vibration of

the compliant wrist. For faster task completion and more

precise control, velocity feedback could be included in the

policy’s observations. To this end, a policy search method

robust to noisy observations [33] would be helpful. This could make our robot system even simpler by using other sensors, such as an inertial measurement unit (IMU) or a distance sensor measuring the deformation of the compliant wrist, instead of the motion capture system.

For future work, we will consider extending the proposed

method to deal with unseen tasks (e.g., different goals or peg

sizes). A meta-learning method, which can adapt to unseen

tasks within a few trials [34], would be promising. In addition,

we will explore the optimal stiffness of the compliant wrist,

which can accelerate the learning.

VII. CONCLUSION

In this study, we proposed a control framework for as-

sembly skills using a soft robot. We argued that the softness

allows the robot to interact with the environment and leverage

its constraints. This approach can lead to lower dimensional

systems and considerably ease the exploration of appropriate

skills. We employed manipulation primitives and sample-efficient model-based reinforcement learning. We performed a simulation and an experiment with a real robot. The results showed that our method successfully and efficiently learned

the needed skills in the lower dimensional system.

REFERENCES

[1] J. Xu, Z. Hou, Z. Liu, and H. Qiao, “Compare contact model-based

control and contact model-free learning: A survey of robotic peg-in-

hole assembly strategies,” arXiv preprint arXiv:1904.05240, 2019.

[2] Y. Karako, S. Kawakami, K. Koyama, M. Shimojo, T. Senoo, and

M. Ishikawa, “High-speed ring insertion by dynamic observable

contact hand,” in IEEE International Conference on Robotics and

Automation, 2019, pp. 2744–2750.

[3] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar,

and P. Abbeel, “Reinforcement learning on variable impedance con-

troller for high-precision robotic assembly,” in IEEE International

Conference on Robotics and Automation, 2019, pp. 3080–3087.

[4] L. Johannsmeier, M. Gerchow, and S. Haddadin, “A framework for

robot manipulation: Skill formalism, meta learning and adaptive con-

trol,” in IEEE International Conference on Robotics and Automation,

2019, pp. 5844–5850.

[5] K. Hang, A. S. Morgan, and A. M. Dollar, “Pre-grasp sliding manip-

ulation of thin objects using soft, compliant, or underactuated hands,”

IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 662–669,

2019.

[6] M. P. Deisenroth, D. Fox, and C. E. Rasmussen, “Gaussian processes

for data-efﬁcient learning in robotics and control,” IEEE Transactions

on Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 408–

423, 2013.

[7] P. Falco, A. Attawia, M. Saveriano, and D. Lee, “On policy learning

robust to irreversible events: An application to robotic in-hand ma-

nipulation,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp.

1482–1489, 2018.

[8] M. Hamaya, T. Matsubara, T. Noda, T. Teramae, and J. Morimoto,

“Learning assistive strategies for exoskeleton robots from user-robot

physical interaction,” Pattern Recognition Letters, vol. 99, pp. 67–76,

2017.

[9] K. Zhang, J. Xu, H. Chen, J. Zhao, and K. Chen, “Jamming analysis and force control for flexible dual peg-in-hole assembly,” IEEE

Transactions on Industrial Electronics, vol. 66, no. 3, pp. 1930–1939,

2018.

[10] X. Zhang, Y. Zheng, J. Ota, and Y. Huang, “Peg-in-hole assembly

based on two-phase scheme and f/t sensor for dual-arm robot,”

Sensors, vol. 17, no. 9, p. 2004, 2017.

[11] K. Van Wyk, M. Culleton, J. Falco, and K. Kelly, “Comparative peg-

in-hole testing of a force-based manipulation controlled robotic hand,”

IEEE Transactions on Robotics, vol. 34, no. 2, pp. 542–549, 2018.

[12] H. Park, J.-H. Bae, J.-H. Park, M.-H. Baeg, and J. Park, “Intuitive

peg-in-hole assembly strategy with a compliant manipulator,” in IEEE

International Symposium on Robotics, 2013, pp. 1–5.

[13] R. Li and H. Qiao, “Condition and strategy analysis for assembly

based on attractive region in environment,” IEEE/ASME Transactions

on Mechatronics, vol. 22, no. 5, pp. 2218–2228, 2017.

[14] M. Vecerik, O. Sushkov, D. Barker, T. Rothörl, T. Hester, and J. Scholz,

“A practical approach to insertion with variable socket position using

deep reinforcement learning,” in IEEE International Conference on

Robotics and Automation, 2019, pp. 754–760.

[15] J. Xu, Z. Hou, W. Wang, B. Xu, K. Zhang, and K. Chen, “Feedback

deep deterministic policy gradient with fuzzy reward for robotic

multiple peg-in-hole assembly tasks,” IEEE Transactions on Industrial

Informatics, vol. 15, no. 3, pp. 1658–1667, 2019.

[16] Y. Fan, J. Luo, and M. Tomizuka, “A learning framework for high

precision industrial assembly,” in IEEE International Conference on

Robotics and Automation, 2019, pp. 811–817.

[17] M. A. Lee, Y. Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-

Fei, A. Garg, and J. Bohg, “Making sense of vision and touch:

self-supervised learning of multimodal representations for contact-rich

tasks,” in IEEE International Conference on Robotics and Automation,

2019, pp. 8943–8950.

[18] S. Levine, N. Wagener, and P. Abbeel, “Learning contact-rich ma-

nipulation skills with guided policy search,” in IEEE International

Conference on Robotics and Automation, 2015, pp. 156–163.

[19] J. Morimoto, “Soft humanoid motor learning,” Science Robotics,

vol. 2, no. 13, p. eaaq0989, 2017.

[20] K.-L. Du, B.-B. Zhang, X. Huang, and J. Hu, “Dynamic analysis of

assembly process with passive compliance for robot manipulators,”

in IEEE International Symposium on Computational Intelligence in

Robotics and Automation, vol. 3, 2003, pp. 1168–1173.

[21] S.-k. Yun, “Compliant manipulation for peg-in-hole: Is passive compli-

ance a key to learn contact motion?” in IEEE International Conference

on Robotics and Automation, 2008, pp. 1647–1652.

[22] R. Li, R. Platt, W. Yuan, A. ten Pas, N. Roscup, M. A. Srinivasan,

and E. Adelson, “Localization and manipulation of small parts using

gelsight tactile sensing,” in IEEE/RSJ International Conference on

Intelligent Robots and Systems, 2014, pp. 3988–3993.

[23] K. Nozu and K. Shimonomura, “Robotic bolt insertion and tightening

based on in-hand object localization and force sensing,” in IEEE/ASME

International Conference on Advanced Intelligent Mechatronics, 2018,

pp. 310–315.

[24] T. Nishimura, Y. Suzuki, T. Tsuji, and T. Watanabe, “Peg-in-hole

under state uncertainties via a passive wrist joint with push-activate-

rotation function,” in IEEE-RAS International Conference on Hu-

manoid Robotics, 2017, pp. 67–74.

[25] T. Deaconescu and A. Deaconescu, “Pneumatic muscle-actuated ad-

justable compliant gripper system for assembly operations,” Strojniski

Vestnik-Journal of Mechanical Engineering, vol. 63, no. 4, pp. 225–

235, 2017.

[26] R. Deimel and O. Brock, “A novel type of compliant and underactuated

robotic hand for dexterous grasping,” The International Journal of

Robotics Research, vol. 35, no. 1-3, pp. 161–185, 2016.

[27] N. C. Dafle, A. Rodriguez, R. Paolini, B. Tang, S. S. Srinivasa,

M. Erdmann, M. T. Mason, I. Lundberg, H. Staab, and T. Fuhlbrigge,

“Extrinsic dexterity: in-hand manipulation with external forces,” in

IEEE International Conference on Robotics and Automation, 2014,

pp. 1578–1585.

[28] T. Tang, H.-C. Lin, and M. Tomizuka, “A learning-based framework

for robot peg-hole-insertion,” in ASME Dynamic Systems and Control

Conference, 2015, pp. 1–9.

[29] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine

learning. MIT press Cambridge, MA, 2006, vol. 2, no. 3.

“Box2D,” https://box2d.org/.

[31] “Probabilistic Inference for Learning Control (PILCO),” https://github.

com/nrontsis/PILCO.

[32] “MoveIt,” https://moveit.ros.org/.

[33] P. Parmas, C. E. Rasmussen, J. Peters, and K. Doya, “PIPPS: Flexible

model-based policy search robust to the curse of chaos,” in Interna-

tional Conference on Machine Learning, vol. 80, 2018, pp. 4065–4074.

[34] S. Sæmundsson, K. Hofmann, and M. P. Deisenroth, “Meta reinforce-

ment learning with latent variable gaussian processes,” in Conference

on Uncertainty in Artiﬁcial Intelligence, 2018.