Learning Robotic Assembly Tasks with Lower Dimensional Systems by
Leveraging Physical Softness and Environmental Constraints
Masashi Hamaya1, Robert Lee2, Kazutoshi Tanaka1, Felix von Drigalski1, Chisato Nakashima3,
Yoshiya Shibata3, and Yoshihisa Ijiri1,3
Abstract— In this study, we present a novel control frame-
work for assembly tasks with a soft robot. Typically, existing
hard robots require high frequency controllers and precise
force/torque sensors for assembly tasks. The resulting robot
system is complex, entailing large amounts of engineering and
maintenance. Physical softness allows the robot to interact with
the environment easily. We expect soft robots to perform assem-
bly tasks without the need for high frequency force/torque con-
trollers and sensors. However, specific data-driven approaches
are needed to deal with complex models involving nonlinearity
and hysteresis. If we were to apply these approaches directly,
we would be required to collect very large amounts of training
data. To solve this problem, we argue that by leveraging softness
and environmental constraints, a robot can complete tasks in
lower dimensional state and action spaces, which could greatly
facilitate the exploration of appropriate assembly skills. Then,
we apply a highly efficient model-based reinforcement learning
method to lower dimensional systems. To verify our method, we
perform a simulation for peg-in-hole tasks. The results show
that our method learns the appropriate skills faster than an
approach that does not consider lower dimensional systems.
Moreover, we demonstrate that our method works on a real
robot equipped with a compliant module on the wrist.
I. INTRODUCTION
Robotic manipulation for assembly tasks presents great
potential for full automation in industrial situations. One of
the most common assembly tasks is the peg-in-hole task.
Many researchers have explored control methods for peg-in-
hole tasks [1]. However, these tasks remain challenging since
they require the controllers to deal with low tolerance and
complex models such as contacts and jamming between the
peg and hole.
To perform such peg-in-hole tasks, force controllers have
been widely applied [1]. Furthermore, recent developments
have demonstrated high precision control solutions for peg-
in-hole tasks [2], [3]. However, these approaches often
employed high frequency controllers or precise force/torque
sensors, and therefore, they largely depend on hardware
performance. If the robots could more easily interact with the
environment without such controllers or sensors, the robot
system would be much simpler.
To this end, we use physically soft robots (including
compliant components such as springs and silicon) for the
peg-in-hole tasks because the softness allows the robot to
1MH, KT, FD, and YI are with OMRON SINIC X Corporation, Tokyo, Japan
2RL is with the Australian Centre for Robotic Vision at Queensland
University of Technology in Brisbane, Australia (work done as an intern at
OMRON SINIC X Corporation)
3YS, CN, and YI are with OMRON Corporation, Tokyo, Japan
[Fig. 1 diagram: the possible state space (e.g., position, velocity) and possible action space (e.g., position, velocity, force) are reduced to a lower dimensional system, on which model-based reinforcement learning produces a sequence of peg-in-hole skills for a robot with a compliant wrist.]
Fig. 1. Proposed framework for assembly tasks with a soft robot. We
develop the lower dimensional space using softness and environmental
constraints to make learning tractable. Then, we apply sample-efficient
model-based reinforcement learning.
interact easily with the environment. Therefore, we expect
that the soft robot will be able to perform assembly tasks
even with low frequency controllers since the softness can
morphologically handle contact-rich interactions without the
need for significant engineering effort. Meanwhile, obtaining
analytical models of soft robots is difficult due to the
nonlinear or hysteretic nature of soft components. This
makes designing controllers for them by hand difficult. One
promising approach is to use certain data-driven approaches
to address the complexity. However, if we directly apply
such approaches to the robots, we would be required to
collect a large amount of real robot data to learn the complex
dynamics in such non-linear and high dimensional systems.
To deal with this problem, we propose a novel soft
robotic control framework for peg-in-hole tasks. The key
insight of this study is that by leveraging the softness and
environmental constraints, the robot can complete the tasks in
lower dimensional state and action spaces. More concretely,
we divide the sequence of peg-in-hole skills into manipu-
lation primitives (MPs) (e.g., fit, align, and insertion) [4],
which allow us to consider only important state variables.
In addition, we do not require force controllers as we
replace the traditional rigid robot with soft robots [5]. We
can consider only desired position or velocity and ignore
desired force for the action space. This also could make our
approach accessible to a wider variety of robot systems. In
such a manner, we can considerably reduce the number of
dimensions for both state and action spaces by using softness
and environmental constraints. This facilitates exploration of
the appropriate peg-in-hole skills.
To obtain the skills for peg-in-hole tasks with as few
trials as possible, we apply a sample efficient model-based
reinforcement learning method called Probabilistic Inference
for Learning Control (PILCO) [6] to the lower dimensional
system. PILCO considers continuous state and action space,
and learns the nonlinear model with uncertainty using Gaus-
sian process (GP). By making use of the model uncertainty,
this algorithm shows remarkable sample efficiency when
compared to existing model-free reinforcement learning [6].
This has also been applied to complex systems such as in-
hand manipulation [7] and human-robot interactions [8]. For
this reason, PILCO is suitable for learning peg-in-hole tasks,
especially in industrial scenarios where the robots need to
adapt to new environments rapidly.
We summarize our proposed framework in Fig. 1. The
contributions of this study are as follows:
We propose a novel control framework for peg-in-hole
tasks using a physically soft robot. We demonstrate that
the softness and environmental constraints allow the
robot to perform the tasks in lower dimensional state
and action spaces.
We employ sample-efficient model-based reinforcement
learning for the tasks, which can obtain the appropriate
skills from scratch with a few interactions.
We perform a simulation to verify our method. We
develop a peg-in-hole simulator with passive compliant
components. The results show that our method success-
fully learned the peg-in-hole skills faster than a method
without the lower dimensional systems. Moreover, we
demonstrate an experiment using a real robot with a
compliant wrist.
This paper is organized as follows. Section II presents
related works for our study. We introduce our lower dimen-
sional system in Section III and learning of peg-in-hole skills
in Section IV. Section V demonstrates the simulations and
experiment. Section VI discusses our results, and Section VII
concludes this study.
II. RELATED WORK
In this section, we present related works on assembly
manipulation. The related works are broadly categorized into
impedance control methods, soft robot assembly methods,
and lower dimensional systems.
A. Force control
Force control methods have been explored extensively [1].
One of the most popular approaches is the model-based
controller. For example, contact state analyses-based force
controllers have been proposed [9], [10]. Van et al. developed
a force controller for multi-fingers [11]. Force sensors are
provided for the tips of each finger to calculate the desired
contact forces. Karako et al. also proposed a controller with
multi-fingers for ring insertion, which could compensate
for position errors using a high frequency controller [2].
Most model-based controllers employ force sensors or high
frequency controllers. Their controllers are largely affected
by hardware performance. Force controllers that exclude
force/torque sensors have also been introduced [12], [13].
However, they also suffer from model errors.
Recently, learning-based controllers have attracted much
attention. Deep reinforcement learning and self-supervised
learning are also used for assembly tasks [14], [15], [16],
[17]. Model-based reinforcement learning methods have also
been proposed [3], [18].
Although all of the above methods could learn adaptive
assembly skills, they also largely depend on force/torque
sensors or accurate torque controllers. In contrast to them, we
use a physically soft robot and develop a learning framework
for peg-in-hole tasks to address the model complexity.
B. Physically soft robots
Softness can address collisions with the environment and
improve skill performance [19]. Some studies have demon-
strated the effectiveness of softness in assembly tasks [20],
[21]. Soft tactile sensors have been developed to measure
the orientations of the grasped objects [22], [23]. Nishimura
et al. developed a passive compliant wrist and demonstrated
that a soft wrist could absorb uncertainties of states for the
peg [24]. Deaconescu et al. developed a pneumatic-driven
gripper, which allowed the correction of the misalignment
between the peg and the hole [25]. Hang et al. leveraged a
soft robot for pre-grasping tasks. The softness helped sliding
of objects without using force controllers [5]. Deimel et al.
proposed a soft hand, which can grasp multiple objects with
a few dimensional expressions [26].
Based on the insights provided by [5], [26], we leverage
the softness for the lower dimensional action space to im-
prove learning efficiency in high dimensional tasks such as
peg-in-hole tasks.
C. Lower dimensional system
Environmental constraints could help manipulation
skills [27]. They can also reduce the number of dimensions
for the state space. Johannsmeier et al. proposed a specific
skill formalism called MPs, which reduces the complexity
of manipulation tasks to learn the optimal parameters
for parameterized impedance controllers [4]. Tang et al.
presented a dimensional reduction method for learning from
demonstrations, which considered only radial directions
because of rotational symmetry between the peg and the
hole [28].
Based on skill formalism [4], our method attempts to
learn peg-in-hole skills with the soft robot while using the
softness and environmental constraints, where the robot can
perform the assembly tasks in much lower state and action
spaces. In addition, we employ sample-efficient model-based
reinforcement learning [6], which can obtain the peg-in-hole
skills from scratch with a few trials.
[Fig. 2 diagram: states and actions for the five MPs (n1: approach, n2: contact, n3: fit, n4: align, n5: insertion), with a pre-designed position controller for n1 and n2 and a learned velocity controller for n3 to n5.]
Fig. 2. Schematic for peg-in-hole skill. We employ MPs [4], which divide the sequence of skills. The red and green arrows show the target state and
actions on each primitive, respectively.
III. LOWER DIMENSIONAL SYSTEM
In this section, we introduce our proposed method, namely
a lower dimensional system using softness and environmental
constraints.
A. Preliminaries
We consider the gripper position and orientation in a
Cartesian coordinate system. We implement the compliant
wrist between the tip of the robot arm and the gripper. Here,
we assume that the gripper and the peg are tightly coupled
and that the position of the gripper can be measured. Based
on these assumptions, the dynamics of the gripper can be
given as
$x_{t+1} = f(x_t, u_t) + \omega, \quad (1)$
with continuous state $x \in \mathbb{R}^D$, command actions $u \in \mathbb{R}^F$,
and additive Gaussian noise $\omega$. We employ a data-driven
approach to obtain the policy $\pi$ for the peg-in-hole task
because of the complex dynamics derived from the softness.
However, if we directly apply learning approaches to the
system, the required amount of training data would increase
since the number of the dimensions of states and actions
becomes large. For this reason, we need to create a lower
dimensional system to make the learning tractable.
B. Manipulation primitives with softness
To leverage the environmental constraints, we employ
graphical skill formulation, which segments the sequence of
the peg-in-hole skills [4]. When the peg contacts the surface
or hole, the direction the robot can move in is constrained
for each event such as fit, align, and insertion. In this
manner, the authors could reduce the target dimensionality
of the task. Unlike the formulation [4], we can ignore the
force controller and only focus on the position and velocity
controller because we use a compliant robot, specifically
one that can absorb contacts with the environment. This can
reduce the number of dimensions for the action space.
The skill graph consists of edges $e \in E$ and nodes $n \in N$.
We refer to the nodes as MPs. Each MP has a pre-designed
position controller or learned velocity controller. The transi-
tions between the MPs are triggered by "conditions." Here,
we define three conditions and a transition as follows:
1) Pre-condition $C_{pre}$ initializes the skills. If $\|x_T - x_{init}\| < \delta_{init}$, where $x_{init}$ is an initial position, $T$
is a timestep, and $\delta_{init}$ is a threshold for the position
error, it proceeds to the next primitive. Otherwise, it
performs the initialization again.
2) Success condition $C_{suc}$ stops the skills when they are
successful. If $\|x_T - x_{suc}\| < \delta_{suc}$, it stops the task.
Otherwise, it returns to the pre-condition.
3) Error condition $C_{err}$ stops the skills. If the robot
reaches outside the operating range, $x > x_{out}$, it returns to
the pre-condition.
4) Transition $e_i$ switches the MPs. If $\|x_T - x_d\| < \delta$, the
MP moves from $n$ to $n + 1$. Otherwise, it returns to
the pre-condition.
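The condition checks above all reduce to simple norm thresholds. A minimal sketch in Python (all names and threshold values are our own hypothetical choices, not from the paper):

```python
import numpy as np

# Hypothetical thresholds and operating-range bound
DELTA_SUC, DELTA_TRANS, X_OUT = 0.005, 0.01, 0.5

def near(x_T, x_target, delta):
    # Generic proximity check shared by C_pre, C_suc, and the transition e_i
    return np.linalg.norm(np.asarray(x_T) - np.asarray(x_target)) < delta

def step_skill_graph(x_T, x_suc, x_d):
    """Return the next event for the current MP, mirroring conditions 2)-4)."""
    if np.any(np.abs(x_T) > X_OUT):      # error condition C_err
        return "return_to_pre_condition"
    if near(x_T, x_suc, DELTA_SUC):      # success condition C_suc
        return "stop"
    if near(x_T, x_d, DELTA_TRANS):      # transition e_i: n -> n + 1
        return "next_mp"
    return "continue"
```

The pre-condition would use the same `near` check against the initial position before the first primitive starts.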
Based on this formalism, we design the MPs for the peg-
in-hole task. For simplicity, we consider a 2D plane system
based on the previous works by [4], [28].
n1 (Approach): The robot moves close to the surface.
The state and action are given as follows:
$x^{(approach)} = [y, z]^\top, \quad u^{(approach)} = [y_d, z_d]^\top$
In this primitive, we employ a pre-designed position
controller with a motion planner to move from the initial
position to the desired position.
n2 (Contact): The robot contacts the surface.
$x^{(contact)} = z, \quad u^{(contact)} = z_d$
Similar to the approach primitive, we use the position
controller. Since the softness allows safe contact with
the surface, we do not have to apply force controllers.
n3 (Fit): The robot moves laterally in the $y$ direction
while keeping contact with the surface until the peg fits
in the hole.
$x^{(fit)} = [y, \dot{y}]^\top, \quad u^{(fit)} = \dot{y}_d$
We use a velocity controller with reinforcement learn-
ing. Since the peg is pressed down on the surface by the
softness, the $z$ direction can be constrained. Therefore,
we only consider the $y$ direction for both the state
and action space. Continued movement in the direction
allows the edge of the peg to fit in the hole.
n4 (Align): The robot is rotated in the $\alpha$ direction until
the peg and holes are vertically aligned.
$x^{(align)} = [\alpha, \dot{\alpha}]^\top, \quad u^{(align)} = \dot{y}_d$
We can focus on the state in the $\alpha$ rotation and action
space in the $y$ direction only because the peg is con-
strained in the hole, and the robot can naturally generate
the rotation motion whose center is the edge of the peg.
We can also ignore the $z$ direction since the softness can
absorb the deformation in the $z$ direction.
n5 (Insertion): The peg is inserted in the hole to the
desired depth.
$x^{(insertion)} = [z, \dot{z}]^\top, \quad u^{(insertion)} = [\dot{y}_d, \dot{z}_d]^\top$
We focus on the state of the $z$ direction only since the
peg moves in that direction alone. We consider both the
$y$ and $z$ action spaces in case the peg jams.
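The per-primitive reduction above amounts to index selections on the full planar state and action. A sketch (the index layout is our own assumption for illustration):

```python
import numpy as np

# Full planar state [y, z, alpha, y_dot, z_dot, alpha_dot] and action
# [y_dot_d, z_dot_d], i.e., the baseline without the lower dimensional system.
STATE_IDX = {"fit": [0, 3], "align": [2, 5], "insertion": [1, 4]}
ACTION_IDX = {"fit": [0], "align": [0], "insertion": [0, 1]}

def reduce_spaces(primitive, x_full, u_full):
    """Project the full state/action onto the lower dimensional spaces of each MP."""
    return x_full[STATE_IDX[primitive]], u_full[ACTION_IDX[primitive]]
```

For the fit primitive this yields the 2-D state [y, y_dot] and 1-D action y_dot_d instead of the full 6-D state and 2-D action, which is the reduction that makes exploration tractable.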
IV. LEARNING PEG-IN-HOLE SKILLS
In this section, we briefly introduce the model-based rein-
forcement learning method. We learn the velocity controllers
on the fit, align, and insertion primitives with PILCO [6]
to address complex systems such as contacts and softness.
The learning goal is to obtain a control policy $u^{(n)} = \pi^{(n)}(x^{(n)}, \theta^{(n)})$ that maximizes the long-term reward on
each primitive $n$. For simplicity, we denote the state $x^{(n)}$,
dynamics $f^{(n)}$, policy $\pi^{(n)}$, and policy parameter $\theta^{(n)}$ as $x$,
$f$, $\pi$, and $\theta$, respectively.
The reward is expressed as follows:
$$J^\pi(\theta) = \sum_{t=0}^{T} \mathbb{E}_{x_t}\left[r(x_t)\right], \quad (2)$$
where $J^\pi$ evaluates the reward over $T$ timesteps, $\theta$ is the
policy parameter, and an immediate reward $r(x_t)$ is given by
$$r(x_t) = \exp\left(-\frac{1}{2}(x_t - x_d)^\top W (x_t - x_d)\right), \quad (3)$$
where $x_d$ is a desired state, and $W$ is a diagonal matrix for
the weight of the reward function.
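Eq. (3) is a saturating, Gaussian-shaped reward; a direct transcription (example values below are hypothetical):

```python
import numpy as np

def reward(x, x_d, W):
    """Immediate reward of Eq. (3): exp(-1/2 (x - x_d)^T W (x - x_d))."""
    e = np.asarray(x) - np.asarray(x_d)
    return float(np.exp(-0.5 * e @ W @ e))
```

The reward is 1 at the desired state and decays smoothly with distance, so the learning signal stays informative even far from the goal.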
A. Model learning
PILCO employs Gaussian process regression [29] to learn
the dynamics. We consider the training inputs and outputs
as $\tilde{x} = [x_t^\top, u_t^\top]^\top \in \mathbb{R}^{D+F}$ and $\Delta_t = x_{t+1} - x_t \in \mathbb{R}^D$,
respectively. Given $l$ sets of training input $\tilde{X} = [\tilde{x}_1, ..., \tilde{x}_l]$
and output $y = [\Delta_1, ..., \Delta_l]$ data, the predictive distribution
for the next timestep $x_{t+1}$ is represented as
$$p(x_{t+1} \mid x_t, u_t) = \mathcal{N}(x_{t+1} \mid \mu_{t+1}, \Sigma_{t+1}), \quad (4)$$
$$\mu_{t+1} = x_t + k_*^\top (K + \sigma_\xi^2 I)^{-1} y, \quad (5)$$
$$\Sigma_{t+1} = k_{**} - k_*^\top (K + \sigma_\xi^2 I)^{-1} k_*. \quad (6)$$
Here, $k_* := k(\tilde{X}, \tilde{x}_t)$ and $k_{**} := k(\tilde{x}_t, \tilde{x}_t)$, where $K$ is a kernel
matrix, each of whose elements follows $K_{ij} = k(\tilde{x}_i, \tilde{x}_j)$, and
$\sigma_\xi$ is a noise parameter.
We use the following squared exponential kernel function:
$$k(\tilde{x}_i, \tilde{x}_j) = \sigma_f^2 \exp\left(-\frac{1}{2}(\tilde{x}_i - \tilde{x}_j)^\top \Lambda^{-1} (\tilde{x}_i - \tilde{x}_j)\right), \quad (7)$$
where $\Lambda$ is a diagonal matrix that expresses the character-
istic lengths, and $\sigma_f$ is the bandwidth parameter.
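The GP prediction and kernel above can be sketched for a single output dimension with NumPy (a toy illustration of the posterior equations, not PILCO's moment-matching machinery; all parameter values are hypothetical):

```python
import numpy as np

def se_kernel(A, B, Lambda_inv, sf2):
    """Squared exponential kernel; rows of A and B are inputs x_tilde."""
    d = A[:, None, :] - B[None, :, :]
    return sf2 * np.exp(-0.5 * np.einsum("ijk,kl,ijl->ij", d, Lambda_inv, d))

def gp_predict(X, y, x_t, Lambda_inv=None, sf2=1.0, noise=1e-6):
    """Posterior mean/variance of one state-difference dimension."""
    Lambda_inv = np.eye(X.shape[1]) if Lambda_inv is None else Lambda_inv
    K = se_kernel(X, X, Lambda_inv, sf2) + noise * np.eye(len(X))
    k_star = se_kernel(X, x_t[None, :], Lambda_inv, sf2)[:, 0]
    mean = k_star @ np.linalg.solve(K, y)            # k_*^T (K + s^2 I)^{-1} y
    var = sf2 - k_star @ np.linalg.solve(K, k_star)  # k_** - k_*^T (...)^{-1} k_*
    return mean, var
```

At a training input the posterior mean recovers the observed state difference and the variance collapses; far from the data the variance returns toward the prior, which is exactly the uncertainty PILCO exploits during planning.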
B. Control policy
We use a deterministic Gaussian process policy. Since it
has high expressiveness, it is suitable for encoding complex
skills. The control policy, given a test input $x_*$, is
represented as
$$\pi(x_*, \theta) = k(M, x_*)^\top (K + \sigma_\pi^2 I)^{-1} t,$$
where $t$ is a training target, $M = [m_1, ..., m_H]$ are the
centers of the Gaussian basis functions, $\sigma_\pi^2$ is a noise variance,
and $k$ is a kernel function. A policy parameter $\theta$ is composed
of $M$, $t$, and the scale of the kernel functions.
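The deterministic GP policy is, in effect, an RBF network over the centers $M$. A minimal sketch (names and values are hypothetical):

```python
import numpy as np

def rbf_policy(x, M, t, lengthscale=1.0, sigma_pi2=1e-6):
    """Deterministic GP policy: pi(x) = k(M, x)^T (K(M, M) + sigma_pi^2 I)^{-1} t."""
    def k(A, B):
        d = A[:, None, :] - B[None, :, :]
        return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / lengthscale ** 2)
    K = k(M, M) + sigma_pi2 * np.eye(len(M))
    k_star = k(M, x[None, :])[:, 0]
    return k_star @ np.linalg.solve(K, t)  # theta = {M, t, kernel scales}
```

Because the centers, targets, and kernel scales are all part of $\theta$, optimizing the policy reshapes both where the basis functions sit and what actions they encode.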
C. Policy evaluation and improvement
To evaluate the control policy, we need to compute the
distribution of $\Delta_t$ by integrating over the distributions of
$\tilde{x}_t$ and $f$. However, it cannot be obtained analytically. To this end,
PILCO employs an approximation with an analytic moment
matching technique.
The policy parameter $\theta$ is optimized by gradient descent
on $J^\pi(\theta)$. The gradient $\partial J^\pi(\theta)/\partial\theta$ is calculated by the
chain rule. We apply a standard gradient-based
optimization method called conjugate gradient (CG).
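The improvement step alternates rollouts under the learned model with gradient steps on the long-term reward. The loop below is a toy stand-in: a deterministic 1-D model with finite-difference gradients instead of PILCO's analytic moment-matching gradients and CG (everything here, including the dynamics and gains, is illustrative):

```python
import numpy as np

def J(theta, x0=0.0, x_d=1.0, T=10):
    """Long-term reward of a linear policy u = theta (x_d - x) on toy
    dynamics x <- x + 0.1 u, scored with a saturating reward as in Eq. (3)."""
    x, total = x0, 0.0
    for _ in range(T):
        x = x + 0.1 * theta * (x_d - x)
        total += np.exp(-0.5 * (x - x_d) ** 2)
    return total

def improve(theta, lr=0.2, iters=100, eps=1e-4):
    """Finite-difference gradient ascent on J (stand-in for the CG step)."""
    for _ in range(iters):
        grad = (J(theta + eps) - J(theta - eps)) / (2 * eps)
        theta += lr * grad
    return theta
```

In PILCO proper, each such optimization is followed by executing the improved policy on the system and refitting the GP model with the new data.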
V. SIMULATIONS AND EXPERIMENT
A. Simulation
To verify our method, we conducted a simple simulation
of the peg-in-hole for our proof of concept. The goal of the
simulation was to investigate whether our lower dimensional
system could accelerate learning.
1) Setup: We developed a peg-in-hole environment with
the Box2D physics engine [30]. We considered a 2D plane
environment with a simplified model of our wrist, and com-
pared our method involving the lower dimensional system
with the method without it. The center block simulated the
robot arm, which was controlled directly with the velocity
commands for the yand zdirections. The peg and arm
were connected by springs to emulate the compliant wrist
(see Fig. 3). Note that we placed the frames to surround the
center block so that the springs could not be extended too
much. Although the kinematic structure was different from
the real robot’s one (see Fig. 5), the functionality could be
considered equivalent. In this simulation, we learned the fit,
align, and insertion primitives. The dynamical system of our
method was given in Sec. III-B. The comparison considered
all dimensions: $x = [y, z, \alpha, \dot{y}, \dot{z}, \dot{\alpha}]^\top$, $u = [\dot{y}_d, \dot{z}_d]^\top$.
Fig. 3. Snapshots of simulation with successfully learned policies. The peg and arm (the center block) were connected by springs to emulate the compliant
wrist. We placed frames to surround the arm so that the springs could not be extended too much.
[Fig. 4 panels: y position error, angle error, and z position error over 20 learning trials; legend: w/ lower dim. system (proposed) vs. w/o lower dim. system.]
Fig. 4. Average position or angle errors on each trial and primitive. The blue lines show the results of our proposed method. The red lines show the
results excluding the lower dimensional system. Our method could learn policies faster in comparison to the other method.
Fig. 5. Experimental setup composed of a 6-degree of freedom robot,
compliant wrist, gripper, and motion capture system.
We used an open-source implementation of PILCO [31]. On each
primitive, PILCO learned the policy to reach the desired po-
sition. In the fit, align, and insertion primitives, the timestep
$T$ was 15, 10, and 10, respectively. We used the reward
function given in Eq. (3) to reduce the position error. The
control frequency was 20 Hz. Two initial data-collection trials,
in which the robot performed uniform random actions, were
conducted, and the number of learning trials was 20. We
evaluated the errors between the desired and current positions
on each trial in the two methods. We expected that our
method would show better performance since the area of
exploration was smaller than that in the other method.
2) Results of the simulation: Fig. 3 shows snapshots of
the peg-in-hole skills. The robot could leverage its constraints
and successfully learned the skills. Fig. 4 shows the position
errors on each primitive in the two methods to see the
learning effects. The x-axes show the learning trials and the
y-axes show the $y$, $\alpha$, and $z$ position errors, which are the
most important states in the fit, align, and insertion primi-
tives, respectively. We evaluated 30 learning sessions. The
error bars show the standard deviations for the sessions. Our
method shows smaller errors than the compared counterpart,
and converges in a few trials. This result demonstrates that
our method successfully learned the peg-in-hole task faster
than that without the lower dimensional system.
B. Experiment with a real robot
Next, we performed a real-robot experiment to evaluate
our method in a real environment.
1) Setup: We used a Universal Robot UR5 and devel-
oped the compliant wrist, which could be attached between
the arm and a Robotiq 2F-85 gripper (see Fig. 5). Note
that the peg was fixed firmly to the gripper using a 3D
printed jig. The compliant wrist was composed of three
springs, which allowed deformation in six directions and
could switch between the rigid and soft modes (see Fig. 5).
We implemented Cartesian position and velocity controllers
using MoveIt [32]. For the approach primitive, we employed
the position controller with the rigid mode. For the contact
primitive, we switched to the soft mode from the rigid one.
We provided the desired position zdso that the tip of the peg
could touch the ground when switching to the soft mode. We
kept the soft mode from the contact to insertion primitives.
The contact position was 10 mm away from the center of the hole.
Then, we employed the velocity controller for the fit, align,
and insertion primitives. The diameter of the peg was 10 mm.
The tolerance between the peg and the hole was 15 µm.
Fig. 6. Snapshots of the experiment with the real robot. Our method successfully learned peg-in-hole skills even with the real robot.
[Fig. 7 panels: y position error [m], angle error [rad], and z position error [m] over trials 1–5.]
Fig. 7. Average position or angle errors on each trial and primitive. The
error bars show the standard deviations for 10 learning sessions. All the
errors decreased in the fifth trial.
We utilized OptiTrack, a motion capture system, to measure the
position of the gripper. We connected our learning framework
with the controllers for UR5 and the motion capture system
using the Robot Operating System (ROS).
We also used the same framework as in the simulation.
In this real robot experiment, we focused on position or
angle states alone due to the noisy observations. First, we
learned the fit primitive. Then, we executed the learned fit
primitive and started learning the align primitive. The same
procedure was performed for the insertion primitive. The
control frequency was 5 Hz. One initial data-collection
trial, in which the robot performed uniform random actions, was
conducted, and five learning trials were undertaken. The
timesteps $T$ for the fit, align, and insertion primitives were
10, 10, and 15, respectively.
2) Results of the experiment: We collected 60 and 75 data
points, corresponding to a total of 12 and 15 s of interaction, for
the fit and align, and insertion primitives, respectively. Fig. 6
shows the snapshots for the learned peg-in-hole skills. The
robot also successfully learned the peg-in-hole skill. Fig. 7
shows the average position error on each trial and primitive.
We tested 10 learning sessions. In the fit primitive, the errors
converged by the second trial. In the align primitive, the
error became smaller in the fifth trial. Finally, in the insertion
primitive, the errors were small since the robot succeeded in
reaching the bottom of the hole in all the trials.
We evaluated the success rate on each primitive after
learning. We defined success as the allowable position error
at the maximum timestep $T$ on each primitive, which allowed
proceeding to the next primitive. The success rates were 8/10,
9/10, and 10/10 on the fit, align, and insertion primitives,
respectively. The results show that our method successfully
learns peg-in-hole skills even when using a real robot.
VI. DISCUSSION
Here, we discuss the experimental results and limitations
of this study, and scope for future work.
We demonstrated that our method successfully learned the
peg-in-hole tasks in most of the learning sessions with a
few trials. Our method allows the robot to complete the
tasks without the force controllers and precise force/torque
sensors. The failure cases in the fit and align primitives
were caused by overshoots. The insertion primitives
were always successful because the align was precisely
performed and jamming did not occur. It could be useful to
obtain recovery skills to address jamming during insertion.
In this experiment, we only considered the position state
space as we observed the noisy velocity measurements
probably due to the motion capture system or vibration of
the compliant wrist. For faster task completion and more
precise control, velocity feedback could be included in the
policy’s observations. To this end, a policy search method
robust to noisy observations [33] would be helpful. By using
this, we could make our robot system even simpler by using
other sensors, such as an inertial measurement unit (IMU) or
a distance sensor, to measure the deformation of the compliant
wrist instead of the motion capture system.
For future work, we will consider extending the proposed
method to deal with unseen tasks (e.g., different goals or peg
sizes). A meta-learning method, which can adapt to unseen
tasks with a few trials [34] would be promising. In addition,
we will explore the optimal stiffness of the compliant wrist,
which can accelerate the learning.
VII. CONCLUSION
In this study, we proposed a control framework for as-
sembly skills using a soft robot. We argued that the softness
allows the robot to interact with the environment and leverage
its constraints. This approach can lead to lower dimensional
systems and considerably ease the exploration of appropriate
skills. We employed manipulation primitives and sample
efficient model-based reinforcement learning. We performed
a simulation and an experiment with a real robot. The results
showed that our method successfully and efficiently learned
the needed skills in the lower dimensional system.
REFERENCES
[1] J. Xu, Z. Hou, Z. Liu, and H. Qiao, “Compare contact model-based
control and contact model-free learning: A survey of robotic peg-in-
hole assembly strategies,” arXiv preprint arXiv:1904.05240, 2019.
[2] Y. Karako, S. Kawakami, K. Koyama, M. Shimojo, T. Senoo, and
M. Ishikawa, “High-speed ring insertion by dynamic observable
contact hand,” in IEEE International Conference on Robotics and
Automation, 2019, pp. 2744–2750.
[3] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar,
and P. Abbeel, “Reinforcement learning on variable impedance con-
troller for high-precision robotic assembly,” in IEEE International
Conference on Robotics and Automation, 2019, pp. 3080–3087.
[4] L. Johannsmeier, M. Gerchow, and S. Haddadin, “A framework for
robot manipulation: Skill formalism, meta learning and adaptive con-
trol,” in IEEE International Conference on Robotics and Automation,
2019, pp. 5844–5850.
[5] K. Hang, A. S. Morgan, and A. M. Dollar, “Pre-grasp sliding manip-
ulation of thin objects using soft, compliant, or underactuated hands,”
IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 662–669,
[6] M. P. Deisenroth, D. Fox, and C. E. Rasmussen, “Gaussian processes
for data-efficient learning in robotics and control,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 408–
423, 2013.
[7] P. Falco, A. Attawia, M. Saveriano, and D. Lee, “On policy learning
robust to irreversible events: An application to robotic in-hand ma-
nipulation,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp.
1482–1489, 2018.
[8] M. Hamaya, T. Matsubara, T. Noda, T. Teramae, and J. Morimoto,
“Learning assistive strategies for exoskeleton robots from user-robot
physical interaction,” Pattern Recognition Letters, vol. 99, pp. 67–76,
[9] K. Zhang, J. Xu, H. Chen, J. Zhao, and K. Chen, “Jamming anal-
ysis and force control for flexible dual peg-in-hole assembly,” IEEE
Transactions on Industrial Electronics, vol. 66, no. 3, pp. 1930–1939,
[10] X. Zhang, Y. Zheng, J. Ota, and Y. Huang, “Peg-in-hole assembly
based on two-phase scheme and f/t sensor for dual-arm robot,”
Sensors, vol. 17, no. 9, p. 2004, 2017.
[11] K. Van Wyk, M. Culleton, J. Falco, and K. Kelly, “Comparative peg-
in-hole testing of a force-based manipulation controlled robotic hand,”
IEEE Transactions on Robotics, vol. 34, no. 2, pp. 542–549, 2018.
[12] H. Park, J.-H. Bae, J.-H. Park, M.-H. Baeg, and J. Park, “Intuitive
peg-in-hole assembly strategy with a compliant manipulator,” in IEEE
International Symposium on Robotics, 2013, pp. 1–5.
[13] R. Li and H. Qiao, “Condition and strategy analysis for assembly
based on attractive region in environment,” IEEE/ASME Transactions
on Mechatronics, vol. 22, no. 5, pp. 2218–2228, 2017.
[14] M. Vecerik, O. Sushkov, D. Barker, T. Rothörl, T. Hester, and J. Scholz,
“A practical approach to insertion with variable socket position using
deep reinforcement learning,” in IEEE International Conference on
Robotics and Automation, 2019, pp. 754–760.
[15] J. Xu, Z. Hou, W. Wang, B. Xu, K. Zhang, and K. Chen, “Feedback
deep deterministic policy gradient with fuzzy reward for robotic
multiple peg-in-hole assembly tasks,” IEEE Transactions on Industrial
Informatics, vol. 15, no. 3, pp. 1658–1667, 2019.
[16] Y. Fan, J. Luo, and M. Tomizuka, “A learning framework for high
precision industrial assembly,” in IEEE International Conference on
Robotics and Automation, 2019, pp. 811–817.
[17] M. A. Lee, Y. Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-
Fei, A. Garg, and J. Bohg, “Making sense of vision and touch:
self-supervised learning of multimodal representations for contact-rich
tasks,” in IEEE International Conference on Robotics and Automation,
2019, pp. 8943–8950.
[18] S. Levine, N. Wagener, and P. Abbeel, “Learning contact-rich ma-
nipulation skills with guided policy search,” in IEEE International
Conference on Robotics and Automation, 2015, pp. 156–163.
[19] J. Morimoto, “Soft humanoid motor learning,” Science Robotics,
vol. 2, no. 13, p. eaaq0989, 2017.
[20] K.-L. Du, B.-B. Zhang, X. Huang, and J. Hu, “Dynamic analysis of
assembly process with passive compliance for robot manipulators,” in
IEEE International Symposium on Computational Intelligence in
Robotics and Automation, vol. 3, 2003, pp. 1168–1173.
[21] S.-k. Yun, “Compliant manipulation for peg-in-hole: Is passive compli-
ance a key to learn contact motion?” in IEEE International Conference
on Robotics and Automation, 2008, pp. 1647–1652.
[22] R. Li, R. Platt, W. Yuan, A. ten Pas, N. Roscup, M. A. Srinivasan,
and E. Adelson, “Localization and manipulation of small parts using
gelsight tactile sensing,” in IEEE/RSJ International Conference on
Intelligent Robots and Systems, 2014, pp. 3988–3993.
[23] K. Nozu and K. Shimonomura, “Robotic bolt insertion and tightening
based on in-hand object localization and force sensing,” in IEEE/ASME
International Conference on Advanced Intelligent Mechatronics, 2018,
pp. 310–315.
[24] T. Nishimura, Y. Suzuki, T. Tsuji, and T. Watanabe, “Peg-in-hole
under state uncertainties via a passive wrist joint with push-activate-
rotation function,” in IEEE-RAS International Conference on Humanoid
Robots, 2017, pp. 67–74.
[25] T. Deaconescu and A. Deaconescu, “Pneumatic muscle-actuated ad-
justable compliant gripper system for assembly operations,” Strojniski
Vestnik-Journal of Mechanical Engineering, vol. 63, no. 4, pp. 225–
235, 2017.
[26] R. Deimel and O. Brock, “A novel type of compliant and underactuated
robotic hand for dexterous grasping,” The International Journal of
Robotics Research, vol. 35, no. 1-3, pp. 161–185, 2016.
[27] N. C. Dafle, A. Rodriguez, R. Paolini, B. Tang, S. S. Srinivasa,
M. Erdmann, M. T. Mason, I. Lundberg, H. Staab, and T. Fuhlbrigge,
“Extrinsic dexterity: in-hand manipulation with external forces,” in
IEEE International Conference on Robotics and Automation, 2014,
pp. 1578–1585.
[28] T. Tang, H.-C. Lin, and M. Tomizuka, “A learning-based framework
for robot peg-hole-insertion,” in ASME Dynamic Systems and Control
Conference, 2015, pp. 1–9.
[29] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine
learning. MIT Press, Cambridge, MA, 2006.
[30] “Box2D,”
[31] “Probabilistic Inference for Learning Control (PILCO),” https://github.
[32] “MoveIt,”
[33] P. Parmas, C. E. Rasmussen, J. Peters, and K. Doya, “PIPPS: Flexible
model-based policy search robust to the curse of chaos,” in Interna-
tional Conference on Machine Learning, vol. 80, 2018, pp. 4065–4074.
[34] S. Sæmundsson, K. Hofmann, and M. P. Deisenroth, “Meta reinforce-
ment learning with latent variable Gaussian processes,” in Conference
on Uncertainty in Artificial Intelligence, 2018.
... Our prior work presented learning assembly controllers for a soft robot [18]. We leveraged softness and environmental constraints so that the robot could complete tasks in lower-dimensional state and action spaces. ...
... We utilized a Universal Robot (UR5) and a compliant wrist, which was attached between the arm and a Robotiq 2F-85 gripper (Fig. 3) [18]. A peg was firmly affixed to the gripper using a 3D printed jig for simplicity. ...
... Three demonstrations were successful and 12 were failed ones (three demonstrations in each of four directions). In the successful demonstrations, the participant was instructed to imitate the manipulation primitives [31], [18] to help the robot perform the task more successfully. The manipulation primitives consisted of three steps: 1) fit the tip of the peg into the hole, 2) align the peg vertically while keeping contact with the hole, and 3) insert the peg to the bottom of the hole. ...
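The three-step primitive sequence described above can be sketched as a minimal state machine. This is an illustrative sketch only: the function names, thresholds, and 1-D depth/tilt state are assumptions, not the actual controller.

```python
# Hypothetical sketch of the fit -> align -> insert primitive sequence.
# State, thresholds, and primitive bodies are illustrative assumptions.

def run_primitives(depth_goal=0.03, tilt_tol=0.02):
    """Advance through the three primitives, returning the visited order."""
    state = {"depth": 0.0, "tilt": 0.3}  # peg starts tilted, outside the hole
    visited = []

    def fit_tip(s):      # 1) fit the tip of the peg into the hole
        s["depth"] = 0.005
        return s["depth"] > 0.0

    def align(s):        # 2) align the peg vertically while keeping contact
        s["tilt"] = 0.0
        return abs(s["tilt"]) < tilt_tol

    def insert(s):       # 3) insert the peg to the bottom of the hole
        s["depth"] = depth_goal
        return s["depth"] >= depth_goal

    for name, prim in [("fit", fit_tip), ("align", align), ("insert", insert)]:
        if not prim(state):
            break        # a failed primitive aborts the sequence
        visited.append(name)
    return visited, state
```

A failed primitive aborts the sequence, mirroring how each step presupposes that the previous one succeeded.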
Physically soft robots are promising for robotic assembly tasks as they allow stable contacts with the environment. Meanwhile, designing appropriate controllers is still challenging due to the difficulty of modeling contact-rich soft body interactions. In this study, we propose a novel learning system for soft robotic assembly strategies. We formulate this problem as a reinforcement learning task and design the reward function from human demonstrations. Our key insight is that the failed demonstrations can be used as constraints to avoid failed behaviors. To this end, we developed a teaching device with which humans can intuitively provide various demonstrations. Moreover, we leverage Physically-Consistent Gaussian Mixture Models to clearly assign Gaussian components to the successful and failed trials, which are originally intended to model complicated trajectories with directional and local similarity. Using the components, we then create the reference trajectories via Gaussian Mixture Regressions, which fit the successful demonstrations while considering the failed ones. Finally, we apply a sample-efficient deep model-based reinforcement learning method to obtain robust strategies with a few interactions. To validate our method, we developed a real-robot experimental system composed of a rigid collaborative robot arm with a compliant wrist and the teaching device. Our results demonstrate that our method successfully learned the assembly strategies with a higher success rate than when using only successful demonstrations.
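As a deliberately simplified stand-in for the failure-aware reference trajectory described in this abstract (the actual method uses Physically-Consistent GMM components and Gaussian Mixture Regression), one can attract a reference point toward the mean of the successful demonstrations while repelling it from the failed ones; the `repulsion` weight is a made-up illustrative parameter.

```python
import numpy as np

def reference_point(successes, failures, repulsion=0.2):
    """Toy failure-aware reference: attract to the mean of successful
    demonstrations, push away from the mean of failed ones.
    (Illustrative only; not the paper's GMR formulation.)"""
    s = np.mean(successes, axis=0)
    f = np.mean(failures, axis=0)
    return s + repulsion * (s - f)
```

The resulting point stays near the successful demonstrations but is biased away from the failed region, which is the qualitative behavior the abstract describes.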
... As soft robots are also suitable for physical human-robot interactions, learning approaches for the interactions have been proposed for assistive robots [26], [27], [28]. We used a robot with an underactuated soft wrist module composed of springs [8], [29]. The soft wrist is useful for our problem setting, as we tackle contact-rich tasks and physical interactions simultaneously. ...
... Simulation platform: We applied peg-in-hole tasks because they are difficult to achieve owing to complex dynamics from contact-richness as well as human-robot interactions. We developed a simulation that emulates a soft robot previously proposed in [8], [29] using the PyBullet environment [32]. Similarly, we deployed virtual 6 DoF compliance components between the end-effector and the tip of the arm, as shown in Fig. 2. We implemented the compliance virtually using PD controllers for the six axes such that the end-effector could return to the equilibrium length. ...
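The virtual compliance described in the snippet above can be sketched as independent PD controllers that produce a restoring wrench pulling the end-effector back toward its equilibrium pose; the gains and axis ordering here are assumptions, not the paper's actual values.

```python
import numpy as np

def compliance_wrench(pose, vel, equilibrium, kp=100.0, kd=5.0):
    """Restoring wrench from a virtual 6-DoF compliance: one PD controller
    per axis. pose, vel, equilibrium: length-6 arrays (x, y, z, roll,
    pitch, yaw). Gains are illustrative assumptions."""
    pose, vel, equilibrium = map(np.asarray, (pose, vel, equilibrium))
    return -kp * (pose - equilibrium) - kd * vel
```

At the equilibrium pose and at rest the virtual springs exert no wrench; any displacement produces an opposing force or torque on that axis, which is what lets the simulated end-effector return to its equilibrium length.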
In this study, we developed a novel learning framework from physical human-robot interactions. Owing to human domain knowledge, such interactions can be useful for facilitating learning. However, applying numerous interactions for training data might place a burden on human users, particularly in real-world applications. To address this problem, we propose formulating this as a model-based reinforcement learning problem to reduce errors during training and increase robustness. Our key idea is to develop 1) an advisory and adversarial interaction strategy and 2) a human-robot interaction model to predict each behavior. In the advisory and adversarial interactions, a human guides and disturbs the robot when it moves in the wrong and correct directions, respectively. Meanwhile, the robot tries to achieve its goal in conjunction with predicting the human's behaviors using the interaction model. To verify the proposed method, we conducted peg-in-hole experiments in a simulation and real-robot environment with human participants and a soft robot, which has an underactuated soft wrist module. The robot is suitable for our setting as it can safely interact with both the contact-rich tasks and humans. The experimental results showed that our proposed method had smaller position errors during training and a higher number of successes than the baselines without any interactions and with random interactions.
... As a practical scenario, we address a robotic assembly, more specifically a challenging peg-in-hole task with variable hole orientations using a robotic arm with a soft wrist connecting the end of the arm to the gripper with springs [13]. Although recent work has revealed the effectiveness of soft robots for contact-rich manipulation, learning their complex dynamics in MBRL is still challenging and has been done independently for every single task [14]. Through extensive simulation and real-robot experiments, we confirmed that the proposed TRANS-AM enabled soft robots to accomplish a target task in a shorter time by utilizing dynamics models acquired in source environments, compared to when conducting MBRL in the target environment from scratch. ...
... Nevertheless, the complexity of soft robotic bodies makes it hard to design controllers manually [35] and requires data-driven approaches [36]–[38]. Most recently, MBRL has been used to learn soft-robotic control tasks [14], [39]. Our study extends this line of research and proposes a transfer RL method that works effectively for sample efficiency even when state-transition dynamics are complex and unknown due to the softness of robotic arms. ...
Practical industrial assembly scenarios often require robotic agents to adapt their skills to unseen tasks quickly. While transfer reinforcement learning (RL) could enable such quick adaptation, much prior work has to collect many samples from source environments to learn target tasks in a model-free fashion, which still lacks sample efficiency on a practical level. In this work, we develop a novel transfer RL method named TRANSfer learning by Aggregating dynamics Models (TRANS-AM). TRANS-AM is based on model-based RL (MBRL) for its high-level sample efficiency, and only requires dynamics models to be collected from source environments. Specifically, it learns to aggregate source dynamics models adaptively in an MBRL loop to better fit the state-transition dynamics of target environments and execute optimal actions there. As a case study to show the effectiveness of this proposed approach, we address a challenging contact-rich peg-in-hole task with variable hole orientations using a soft robot. Our evaluations with both simulation and real-robot experiments demonstrate that TRANS-AM enables the soft robot to accomplish target tasks with fewer episodes compared to when learning the tasks from scratch.
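A hedged sketch of the aggregation idea only (the softmax weighting and all names here are assumptions, not the TRANS-AM implementation): weight each source dynamics model by how well it predicts transitions observed in the target environment, then predict with the weighted mixture.

```python
import numpy as np

def fit_weights(models, transitions, temperature=1.0):
    """Softmax over negative mean prediction error on target transitions.
    (Illustrative weighting rule, not the paper's.)"""
    errs = np.array([
        np.mean([np.linalg.norm(m(s, a) - s_next) for s, a, s_next in transitions])
        for m in models
    ])
    z = np.exp(-errs / temperature)
    return z / z.sum()

def aggregate_predict(models, weights, state, action):
    """Weighted mixture of the source models' next-state predictions."""
    preds = np.array([m(state, action) for m in models])
    return weights @ preds

# Two toy 1-D source dynamics; the target behaves like the second one.
m_slow = lambda s, a: s + 0.5 * a
m_fast = lambda s, a: s + 2.0 * a
target_data = [(np.array([0.0]), np.array([1.0]), np.array([2.0]))]
```

The source model that better explains the target transitions receives the larger weight, so the mixture's prediction moves toward the true target dynamics without learning a target model from scratch.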
... For the other algorithms, Ding et al. [154] developed a multipose force-torque state representation method and learned the dynamic model via two-layer LSTM, which transformed online policy learning offline. Hamaya et al. [155] leveraged softness and environmental constraints to reduce the dimensions of the state and action spaces and applied the Probabilistic Inference for Learning Control (PILCO) to obtain the appropriate skills. The performance of model-based RL largely depends on the accuracy of the learned model. ...
The application of robots in mechanical assembly increases the efficiency of industrial production. With the requirements of flexible manufacturing, it has become a research hotspot to accomplish diversified assembly operations safely and efficiently in unstructured environments. In recent years, many advanced robot assembly strategies have been proposed. Fault monitoring and strategy performance evaluation have also attracted more attention. To promote the development of robotic assembly, this paper systematically reviews the recent research in this field. According to the assembly process, the review separates the research contents into target recognition and searching, compliant strategies for fine insertion motion and fault monitoring. The characteristics of each method are summarized. Furthermore, a performance evaluation for assembly strategies is proposed with typical metrics. We surveyed the classical benchmarks to provide support for standardized performance evaluation. Finally, the challenges and potential directions are discussed.
... Morphological computation aims to design intrinsic dynamics with 'embodied intelligence' [24], which has been applied to designing damping for locomotion [25]. Compliance can be optimized to reduce the need for information [26] or reduce the problem dimension [27] in manipulation tasks. ...
The objective of many contact-rich manipulation tasks can be expressed as desired contacts between environmental objects. Simulation and planning for rigid-body contact continues to advance, but the achievable performance is significantly impacted by hardware design, such as physical compliance and sensor placement. Much of mechatronic design for contact is done from a continuous controls perspective (e.g. peak collision force, contact stability), but hardware also affects the ability to infer discrete changes in contact. Robustly detecting contact state can support the correction of errors, both online and in trial-and-error learning. Here, discrete contact states are considered as changes in environmental dynamics, and the ability to infer this with proprioception (motor position and force sensors) is investigated. A metric of information gain is proposed, measuring the reduction in contact belief uncertainty from force/position measurements, and developed for fully- and partially-observed systems. The information gain depends on the coupled robot/environment dynamics and sensor placement, especially the location and degree of compliance. Hardware experiments over a range of physical compliance conditions validate that information gain predicts the speed and certainty with which contact is detected in (i) monitoring of contact-rich assembly and (ii) collision detection. Compliant environmental structures are then optimized to allow industrial robots to achieve safe, higher-speed contact.
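The proposed metric can be illustrated with a toy two-state contact belief (the likelihood values are made up and this is not the paper's formulation): information gain is the entropy drop of the belief after a Bayes update on a force/position measurement.

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a discrete belief."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)

def bayes_update(prior, likelihood):
    """Bayes update of a discrete belief with per-state measurement likelihoods."""
    post = [p * l for p, l in zip(prior, likelihood)]
    z = sum(post)
    return [p / z for p in post]

def information_gain(prior, likelihood):
    """Reduction in belief uncertainty from one measurement."""
    return entropy(prior) - entropy(bayes_update(prior, likelihood))
```

With a uniform belief over {free space, in contact} and a stiff force reading far more likely under contact, the gain is large; a measurement with equal likelihoods under both states leaves the belief unchanged and yields zero gain.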
... The applicability of the jamming gripper has been researched for different purposes, such as in the feet of robots for walking on natural terrain [23], [24] and climbing walls [25]. Such soft robotics technologies are expected to be applied to the field of robotic assembly [26], [27] for high-mix low-volume production. ...
... In this framework, a task is divided into sub-tasks named "manipulation primitives." Another study [17] proposed a learning method for peg-in-hole tasks using manipulation primitives. A robot using this method learned the task faster by dividing the task into sub-tasks and using a soft wrist, which decreased the number of variables in the action and state spaces. ...
Robotic contact juggling is a challenging task in which robots must control the movement of a ball rapidly and indirectly without holding it while keeping the ball in and sometimes out of contact with the robot's body. In this work, we address the problem of learning such robotic contact juggling from trial and error via model-based reinforcement learning (MBRL). The key insight is that complex robot-ball interactions of the contact juggling actually consist of a small set of simple dynamics that each corresponds to a distinct interaction "primitive" such as touching and releasing the ball. Accordingly, we develop a tailored MBRL method that incrementally fits a set of simple dynamics models to the movements of a robot and a ball while also learning a switching model that can select a proper dynamics model depending on the current state and action. The learned model can then be used in an MBRL framework to seek optimal juggling control. We demonstrated the effectiveness of our approach on a simulator of contact juggling performed by a robotic arm.
... Although this choice compromises the reduced training time, we adopt sim2real to address this issue and argue that the use of MPs as meta-actions makes the sim to real transfer more efficient. In the same vein, Hamaya et al. [5] divide the peg-in-hole task into five steps with different action spaces and state spaces. This decomposition greatly speeds up training through dimensional reduction of action and state spaces. ...
This paper explores the idea that skillful assembly is best represented as dynamic sequences of Manipulation Primitives, and that such sequences can be automatically discovered by Reinforcement Learning. Manipulation Primitives, such as “Move down until contact”, “Slide along x while maintaining contact with the surface”, have enough complexity to keep the search tree shallow, yet are generic enough to generalize across a wide range of assembly tasks. Policies are learned in simulation, and then transferred onto the physical platform. Direct sim2real transfer (without retraining in real) achieves excellent success rates on challenging assembly tasks, such as round peg insertion with 100 micron clearance or square peg insertion with large hole position/orientation estimation errors.
We address the problem of pre-grasp sliding manipulation, which is an essential skill when a thin object cannot be directly grasped from a flat surface. Leveraging the passive reconfigurability of soft, compliant, or underactuated robotic hands, we formulate this problem as an integrated motion and grasp planning problem, and plan the manipulation directly in the robot configuration space. Rather than explicitly pre-computing a pair of valid start and goal configurations, and then in a separate step planning a path to connect them, our planner actively samples start and goal robot configurations from configuration sampleable regions modeled from the geometries of the object and support surface. While randomly connecting the sampled start and goal configurations in pairs, the planner verifies whether any connected pair can achieve the task to finally confirm a solution. The proposed planner is implemented and evaluated both in simulation and on a real robot. Given the inherent compliance of the employed Yale T42 hand, we relax the motion constraints and show that the planning performance is significantly boosted. Moreover, we show that our planner outperforms two baseline planners, and that it can deal with objects and support surfaces of arbitrary geometries and sizes.
Learning from small data sets is critical in many practical applications where data collection is time consuming or expensive, e.g., robotics, animal experiments or drug design. Meta learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen, but related, tasks. Often, this relationship between tasks is hard coded or relies in some other way on human expertise. In this paper, we frame meta learning as a hierarchical latent variable model and infer the relationship between tasks automatically from data. We apply our framework in a model-based reinforcement learning setting and show that our meta-learning model effectively generalizes to novel tasks by identifying how new tasks relate to prior ones from minimal data. This results in up to a 60% reduction in the average interaction time needed to solve tasks compared to strong baselines.