Model-free and learning-free grasping
by Local Contact Moment matching
Maxime Adjigble1, Naresh Marturi1, Valerio Ortenzi2, Vijaykumar Rajasekaran1,
Peter Corke2, and Rustam Stolkin1
Abstract—This paper addresses the problem of grasping arbitrarily shaped objects, observed as partial point-clouds, without requiring: models of the objects, physics parameters, training data, or other a-priori knowledge. A grasp metric is proposed based on Local Contact Moment (LoCoMo). LoCoMo combines zero-moment shift features, of both hand and object surface patches, to determine local similarity. This metric is then used to search for a set of feasible grasp poses with associated grasp likelihoods. LoCoMo overcomes some limitations of both classical grasp planners and learning-based approaches. Unlike force-closure analysis, LoCoMo does not require knowledge of physical parameters such as friction coefficients, and avoids assumptions about fingertip contacts, instead enabling robust contacts of large areas of hand and object surface. Unlike more recent learning-based approaches, LoCoMo does not require training data, and does not need any prototype grasp configurations to be taught by kinesthetic demonstration. We present results of real-robot experiments grasping 21 different objects, observed by a wrist-mounted depth camera. All objects are grasped successfully when presented to the robot individually. The robot also successfully clears cluttered heaps of objects by sequentially grasping and lifting objects until none remain.
I. INTRODUCTION
Robots have been routinely and reliably grasping a vast
variety of objects in manufacturing environments for several
decades. This is based on simple pre-programmed actions, on
exactly pre-defined objects, in highly structured environments.
However, autonomous, vision-guided grasping, in unstructured
environments, remains an open research problem. In this paper,
we assume that the robot has a model of itself, but does not
have any models or prior knowledge of the objects that it is
tasked with grasping. These objects may take arbitrary shape
and appear amidst clutter, observed as noisy partial point-
clouds. Our main contribution is to show how this problem
can be approached without needing either classical physics
analysis, or any learning from training data.
Classical grasping methods based on physics analysis [1],
[2] typically require the robot to have detailed knowledge of
the grasped object’s shape, mass and mass distribution, and
friction coefficients between object surfaces and hand parts. It
is common to assume point or fingertip contacts, with contacts
of large surface areas of the hand becoming analytically
intractable. More recent work has investigated a variety of
machine learning approaches to grasping [3]–[5].

1 M. Adjigble, N. Marturi, V. Rajasekaran and R. Stolkin are with the Extreme Robotics Laboratory, School of Metallurgy and Materials, University of Birmingham, UK. maxime.adjigble@gmail.com
2 V. Ortenzi and P. Corke are with the ARC Centre of Excellence for Robotic Vision, Queensland University of Technology, Brisbane QLD 4001, Australia. http://www.roboticvision.org

Figure 1. (Top-left) Point cloud of the object. (Top-right) Contact moment features for a single finger with a planar surface. Red, yellow and green encode, in this order, increasing values of the metric computed using (3). (Bottom-left) Generated grasp with the highest contact probability. (Bottom-right) Grasp executed on the robot.

Learning approaches seek to encode a more direct link between the
geometry of a scene (typically observed as a point-cloud) and
grasp hypotheses. Such methods have significantly contributed
to overcoming limitations of classical methods. However,
all of these methods require training data (some more and
some less). Most of these methods also require prototypical
grasps (pinch-grasp, power-grasp, edge-grasp etc.) to be taught
by kinesthetic demonstration or pre-programming, albeit that
learning-based methods can often adapt these pre-taught hand
configurations to new object shapes (generalisation) with some
success.
In this paper, we propose a novel algorithm for computing
robust grasp hypotheses on arbitrarily shaped objects. The
overall grasping pipeline is depicted in Fig. 1. Given a point-
cloud view of a surface, and the kinematics of the robot’s
arm and hand, our algorithm outputs a variety of feasible
grasp poses for the hand, and evaluates each according to a
novel grasp likelihood metric. A collision-free reach-to-grasp
trajectory is then sought, and the highest-likelihood reachable
grasp is executed. Like recent learning-based methods, our
method also maps observed surface shapes directly to grasp
hypotheses. However, this mapping is not achieved by learn-
ing, does not require any training data, nor does it require any
kinesthetic teaching or pre-programming of prototypical grasp
configurations. Instead, we propose a novel grasp likelihood
metric, the local contact moment probability function, which
evaluates the shape compatibility between local parts of hand
or finger surface, and local parts of an observed point-cloud.
Local contact moment (LoCoMo) is based on computing
zero-moment shift features for local parts of the observed point
cloud, and also parts of the robot’s hand. First described in the
computer graphics literature [6], zero moment shift features
represent the characteristics of limited regions of surfaces,
and are especially good at encoding information about surface
curvature, Fig. 2, which is particularly important for matching
hand parts to a grasped object. These features represent the
surface characteristics of a limited region of the point cloud,
hence they are “local” features. Also, they are computed on
the point cloud without the need of any a-priori knowledge of
the object (i.e., model-free).
Using LoCoMo as a fitness function, a point-cloud surface
can be efficiently searched for good matches to finger surface
geometry. Kinematic analysis then yields a set of feasible
grasps, with each grasp associated with a grasp likelihood.
The motion-space of the arm is then explored to find collision-
free reach-to-grasp trajectories to the highest likelihood grasp
poses.
The main contributions of this work are:
• We propose the use of zero-moment shift features [6] for robotic grasp-planning.
• We propose a new metric, the local contact moment probability function, for evaluating compatibility between the surface geometries of local parts of both object and gripper. This metric is model-free, and does not need to be learned from training data.
• We exploit the kinematics of the robot to select a subset of the graspable points, first identified by LoCoMo, that are kinematically reachable and feasible for the arm and hand system.
The remainder of this paper is structured as follows: Section
II highlights the novelties of this work with respect to related
literature. Section III describes the technical details of our
proposed method. Section IV shows the results of a number
of experiments conducted using a Schunk industrial two-finger
hand mounted on a KUKA LBR iiwa manipulator arm. Section
V provides concluding remarks.
II. RELATED WORK
Classical approaches to grasping predominantly use
physics-based analysis to compute force-closure [7]–[11].
Most of these approaches rely on a large amount of a-
priori knowledge. They typically assume that an accurate and
complete 3D model of the object is known, as well as its mass,
mass distribution and also coefficients of friction between the
object’s surfaces and parts of the robot hand. In contrast, in
many real applications, a robot may be required to grasp a
previously unknown object of arbitrary shape, observed as a
partial point-cloud view, for which friction coefficients and
mass distribution are generally unknown. Many of these classi-
cal force-closure approaches are also restricted to assumptions
of fingertip contacts only. Physics-based analysis becomes problematic when large patches of hand surface come into contact with the object, as happens in many human grasps, such as the “power grasp”, where large surfaces of the hand are wrapped around the object.
More recent approaches have explored various forms of
learning, [3], [12]–[15]. Learning-based methods overcome
some of the limitations of classical methods, and have shown
potential for generalising to grasping novel object shapes. [3]
achieved moderately successful grasping, by learning a direct
mapping between visual stimuli and motor outputs. Learning
was achieved via robots making exploratory motions coupled
with reinforcement. The system was able to synthesize novel
grasping policies, but relied on enormous amounts of training
data, involving large numbers of robots performing exploratory
actions over a long period of time. [15] minimised the amount
of reinforcement learning needed, by initiating learning from
close-to-good grasp poses by kinesthetic demonstration using
a data glove. In contrast, [13] showed significant ability to
generalise grasping to novel objects, achieved by “one-shot”
learning, i.e., the robot was taught a single grasp on a single
object, and was then able to plan successful grasps on new
shapes. [13] learned “local” models of relationships between
hand-parts and the curvatures of object surface patches. How-
ever, these must be combined with a “global” model of hand
shape, corresponding to a grasp prototype (pinch grasp, power
grasp, etc.) which is taught by demonstration. The method
therefore remains unable to synthesize novel grasp prototypes
that have not been taught.
Like the above learning approaches, our method also does
not rely on object models or physics knowledge. Like [13] it
exploits local descriptors of finger contacts (but a different
kind). However, our method requires no training data, and
can synthesize its own grasp hypotheses without any need of
demonstration.
III. METHOD
We present a method to address robotic grasping based on
the LoCoMo metric between the object and the gripper. This
similarity metric between the features on the object and the
features on the gripper is used to select viable finger poses on
the surface of the object which are then combined with the
kinematics of the gripper to form a grasp. In the following,
we assume a model of the gripper, in this case a parallel jaw
gripper.
The algorithm is given a (partial) point cloud of an object,
and first computes the zero-moment shift features on the point
cloud. The same features are extracted on the point cloud
of the gripper model. These features of object and gripper
are then used together to compute a local shape similarity
metric between object and gripper. The main idea is to find the
points that maximise the contact surface and to use only areas
of the object that match the surface curvature of the fingers
of the gripper for the grasp. Finally, a feasibility analysis is
performed to select the subset of the pairs of points, returned from the previous step, which are kinematically feasible for the gripper.

Figure 2. Local surface classification based on the zero-moment shift of the Stanford Bunny. The colors red, yellow, green and blue encode, in increasing order, the magnitude of the L1 norm of the zero-moment shift vector. High values (blue) occur on the ears, which have high curvature, and low values (red) on surfaces with low curvature. Left: ρ = 0.008; right: ρ = 0.016.
A. Feature Extraction and Matching Metric
Over the years, various local visual features have been
presented in the literature and were previously used for tasks
such as 2D/3D object recognition and pose estimation, [16]–
[20]. In this work we propose the use of zero-moment shift
features for grasping arbitrarily shaped objects.
Let $B_\rho(X)$ represent the Euclidean sphere of radius $\rho$ centered at a point $X \in \mathbb{R}^3$. Given a set of points $\mathcal{X}$ in $\mathbb{R}^3$, the zero-moment shift $n_\rho$ of the set of points $\xi = \mathcal{X} \cap B_\rho(X)$, belonging to the sphere $B_\rho(X)$, can be expressed as
$$ n_\rho = M^0_\rho(\xi) - X \tag{1} $$
$$ M^0_\rho(\xi) = \frac{1}{N} \sum_{i=1}^{N} X_i \tag{2} $$
where $M^0_\rho(\xi)$ represents the zero moment (or centroid) of the set of points $\xi$ belonging to the sphere $B_\rho(X)$, $X_i$ is a point sampled from $\xi$, and $N$ is the total number of points in $\xi$.
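As an illustration of (1) and (2), the following minimal Python sketch (our own, not the authors' implementation) computes the zero-moment shift at every point of a cloud using a fixed-radius neighbourhood query; the radius value in the usage comment is only an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def zero_moment_shift(points, rho):
    """Zero-moment shift n_rho (Eqs. 1-2) at every point of a cloud.

    points : (N, 3) array of 3D points, assumed already filtered of outliers.
    rho    : radius of the sphere B_rho(X) defining the local neighbourhood.
    Returns an (N, 3) array whose i-th row is n_rho at points[i].
    """
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    shifts = np.zeros_like(points)
    for i, x in enumerate(points):
        idx = tree.query_ball_point(x, r=rho)    # xi = X ∩ B_rho(x)
        centroid = points[idx].mean(axis=0)      # zero moment M0_rho(xi), Eq. (2)
        shifts[i] = centroid - x                 # zero-moment shift n_rho, Eq. (1)
    return shifts

# The L1 norm of each shift indicates local curvature (cf. Fig. 2): near zero
# on flat patches, large on edges and highly curved regions.
# curvature = np.abs(zero_moment_shift(cloud, rho=0.008)).sum(axis=1)
```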
The L1 norm |nρ|of the zero-moment shift is a good
indicator of the characteristics of the underlying surface of the
set of points, as shown in Fig. 2. It can be used in conjunction
with a classifier to robustly distinguish smooth surfaces from
edges, and also be used in conjunction with the first-moment
of the set of points to provide a robust surface classification for noisy point clouds or mesh models, as presented in [6]. In this work, we focus on the use of the zero-moment shift to compute a similarity metric between two arbitrary surfaces. We assume that the set of points is already preprocessed and filtered of outliers. Comparing two local surfaces is then reduced to comparing the zero-moment shifts of the two surfaces. To
this end, we introduce the LoCoMo probability function $C_\rho$ between two surfaces
$$ C_\rho = 1 - \frac{\max_x \phi(x;\, \mathbf{0}, \Sigma) - \phi(\varepsilon;\, \mathbf{0}, \Sigma)}{\max_x \phi(x;\, \mathbf{0}, \Sigma)} \tag{3} $$
$\phi$ represents the multivariate Gaussian density function
$$ \phi(X; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\!\left( -\frac{1}{2} (X - \mu)^\top \Sigma^{-1} (X - \mu) \right) \tag{4} $$
where $X, \mu \in \mathbb{R}^n$, $\Sigma$ is the covariance matrix and $n$ is the space dimension.
$\mathbf{0}$ is the null vector of $\mathbb{R}^3$, and $\varepsilon$ is the error between the two zero-moment shift vectors, defined as
$$ \varepsilon = n^1_\rho - n^2_\rho \tag{5} $$
where $n^1_\rho$ and $n^2_\rho$ are expressed in the same reference frame. $\max_x \phi(x; \cdot)$ is the maximum value of the function $\phi(x; \cdot)$ over all $x \in \mathbb{R}^3$. The zero-moment shift vectors can be projected onto the axis of the surface normal and the axis orthogonal to the normal to obtain a new set of coordinates $(n_\parallel, n_\perp, 0)$, which can be used for the computation of (5).
This LoCoMo metric based on zero-moment shift features is
extremely useful for grasping, as it provides a clear indication
of the local contact between the surfaces of a gripper and an
object.
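The following short Python sketch illustrates (3)–(5); it is our own example, and the covariance Σ used here is an arbitrary assumption rather than a value taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def locomo_probability(n_rho_a, n_rho_b, cov=1e-4 * np.eye(3)):
    """LoCoMo probability C_rho (Eq. 3) between two local surface patches.

    n_rho_a, n_rho_b : zero-moment shift vectors of the two patches,
                       expressed in the same reference frame.
    cov              : covariance Sigma of the zero-mean Gaussian phi (assumed).
    Returns a value in (0, 1]; it equals 1 when the two shifts coincide.
    """
    phi = multivariate_normal(mean=np.zeros(3), cov=cov)
    eps = np.asarray(n_rho_a) - np.asarray(n_rho_b)   # error vector, Eq. (5)
    phi_max = phi.pdf(np.zeros(3))                     # maximum of phi, at the mean
    return 1.0 - (phi_max - phi.pdf(eps)) / phi_max    # Eq. (3)
```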
B. Grasp Selection and Ranking
Selecting stable grasps is crucial to guarantee the success
of a grasp. Several analytic methods use force closure, such as
[21] and [22]. Force closure guarantees a static equilibrium be-
tween the contact forces. Furthermore, the interaction between
two surfaces in contact can be reduced to one or multiple
contact points as described in [23]. These assumptions are
necessary conditions for selecting a stable grasp; however, they are not sufficient conditions for a stable grasp, as mentioned
The problem of generating grasp candidates can be formu-
lated as sampling finger poses on the surface of the object,
and combining them using the kinematics of the gripper to
form a grasp as described in [25]. Our method computes the
contact probability Cias given by (7) for each finger and uses
the kinematics of the gripper to select a set of finger poses to
form a grasp. The local contact probability Cρis computed
for an infinitesimal surface in a sphere of radius ρ. In order
to account for the entire shape of a finger, Cρneeds to be
integrated over its entire surface. We also introduce R, the
ranking metric (given by (6)), to rank the grasps by computing
the weighted product of the contact probability for each finger.
$$ R = k \prod_{i=1}^{n_f} C_i^{w_i} \tag{6} $$
$$ C_i = \frac{1}{N_s} \sum_{j=1}^{n} C^{j, X_j}_\rho \tag{7} $$
where $k$ is a normalizing term, $w_i$ are weights satisfying $\sum_{i=1}^{n_f} w_i = 1$, $n_f$ is the number of fingers, $C_i$ is the contact probability for a finger defined in (7), $n$ is the number of points in the vicinity of the finger, $N_s$ is a normalizing term representing the maximum number of points in the vicinity of the finger, and $C^{j, X_j}_\rho$ is the local contact moment probability between a point on the point cloud and its orthogonal projection onto the surface of the gripper. More information on how to combine probability distributions can be found in [26]. A summary of the method is given in Alg. 1.
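A compact Python sketch of (6) and (7) follows; it is our illustration, and the weights, normaliser and example values in the usage comment are assumptions rather than values from the paper.

```python
import numpy as np

def finger_contact_probability(point_scores, n_s):
    """C_i (Eq. 7): LoCoMo scores of the points near one finger, normalised by
    N_s, the maximum number of points expected in that finger's vicinity."""
    return float(np.sum(point_scores)) / n_s

def grasp_rank(finger_probs, weights, k=1.0):
    """R (Eq. 6): weighted product of the per-finger contact probabilities."""
    finger_probs = np.asarray(finger_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)          # assumed to sum to 1
    return k * float(np.prod(finger_probs ** weights))

# Example for a two-finger parallel jaw gripper with equal weights (assumed values):
# R = grasp_rank([0.82, 0.64], weights=[0.5, 0.5])
```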
Algorithm 1: Grasp generation and ranking.
Data: Point cloud 𝒳, fingers' 3D model, sphere radius ρ
Result: Top-k grasps
 1: Compute the surface normal at each point X ∈ 𝒳
 2: for X ∈ 𝒳 do
 3:     Select the set of points ξ in B_ρ(X)
 4:     Compute n_ρ with (1)
 5: end
 6: for each finger do
 7:     for X ∈ 𝒳 do
 8:         Sample several finger poses P_f around X
 9:         for p ∈ P_f do
10:             Select the set of points 𝒳_s within a distance d from the surface of the finger
11:             for X_s ∈ 𝒳_s do
12:                 Project X_s onto the finger's surface
13:                 Compute C_ρ^{s,X_s} with (3)
14:             end
15:             Compute C_i with (7)
16:             Append P_f to 𝒫
17:         end
18:     end
19: end
20: Find ℱ, the set of finger poses in 𝒫 satisfying the kinematic constraints of the gripper
21: for f ∈ ℱ do
22:     Compute R with (6)
23: end
24: Order ℱ by decreasing order of R
25: Sample gripper poses from ℱ
26: return the top-k grasp poses
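To show how these pieces fit together, the sketch below mirrors the structure of Alg. 1 in Python. It reuses the functions sketched earlier, and `sample_finger_poses`, `points_near_finger`, `finger_shift_at_projection` and `kinematically_feasible_combinations` are hypothetical placeholders for the pose sampling, projection and gripper-kinematics routines; this is an illustration, not the authors' implementation.

```python
def generate_and_rank_grasps(cloud, fingers, rho, d, n_s, weights, top_k=10):
    """Illustrative sketch of Algorithm 1 (grasp generation and ranking)."""
    shifts = zero_moment_shift(cloud, rho)                    # steps 1-5 of Alg. 1
    candidates = []                                           # finger poses with C_i scores
    for finger in fingers:                                    # steps 6-19
        for x in cloud:
            for pose in sample_finger_poses(finger, x):       # hypothetical sampler
                near = points_near_finger(cloud, finger, pose, d)   # hypothetical helper
                scores = [locomo_probability(shifts[i],
                                             finger_shift_at_projection(cloud[i], finger, pose))
                          for i in near]                      # Eq. (3) per nearby point
                candidates.append((finger, pose, sum(scores) / n_s))  # Eq. (7)
    ranked = []
    for combo in kinematically_feasible_combinations(candidates):     # step 20, e.g. jaw pairs
        r = grasp_rank([c for (_, _, c) in combo], weights)           # Eq. (6)
        ranked.append((r, combo))
    ranked.sort(key=lambda g: g[0], reverse=True)             # steps 21-25
    return ranked[:top_k]                                     # step 26
```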
IV. EXPERIMENTAL RESULTS
A. Experimental setup
Our experimental setup (shown in Fig. 3) comprises a 7 degrees-of-freedom KUKA LBR iiwa arm, with a Schunk PG70 parallel jaw gripper with flat fingers mounted on its end-effector. The maximum stroke of the gripper is 68 mm.
The developed method neither requires any prior knowledge of the scene nor uses any object models. However, for each
grasping trial, the robot workspace containing test objects
is observed by moving a robot wrist-mounted Ensenso N35
depth camera to six different locations. Resulting partial point
clouds from all viewpoints are stitched together, in robot base
coordinate frame, to form a point cloud of the work scene.
After segmenting the ground plane, the resulting cloud is then
used by our method to generate grasp hypotheses. Hand-eye calibration has been performed beforehand to transform the camera-acquired point cloud data to the robot's coordinate system, as well as to simplify the computations.

Figure 3. Hardware setup used to validate the proposed grasping method: KUKA 7-DoF robot, wrist-mounted 3D camera, gripper, and test objects.

Figure 4. 21 objects used for the experiments. (left column) spring clamp, aluminum profile, multi-head screwdriver, screwdriver, plastic strawberry, golf ball; (middle column) racquetball, plastic lemon, plastic nectarine, wood block, potted meat can, electric hand drill, plastic bottle, gray pipe, white pipe; (right column) blue cup, hammer, bleach cleanser, gas knob, bamboo bowl, mustard container.
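As a concrete illustration of the point-cloud preprocessing described above (multi-view stitching in the robot base frame and ground-plane removal), the sketch below shows one possible way to do it with Open3D; it is our own example under assumed parameter values, not the authors' pipeline.

```python
import copy
import open3d as o3d

def preprocess_views(partial_clouds, base_T_cam_list, plane_dist=0.005):
    """partial_clouds  : list of o3d.geometry.PointCloud, one per camera viewpoint.
       base_T_cam_list : 4x4 transforms (robot base <- camera) from hand-eye
                         calibration, one per viewpoint.
       plane_dist      : RANSAC inlier threshold in metres for table removal (assumed)."""
    scene = o3d.geometry.PointCloud()
    for cloud, base_T_cam in zip(partial_clouds, base_T_cam_list):
        view = copy.deepcopy(cloud)
        view.transform(base_T_cam)                   # express the view in the robot base frame
        scene += view                                # stitch the partial clouds
    scene = scene.voxel_down_sample(voxel_size=0.003)        # light decimation
    _, inliers = scene.segment_plane(distance_threshold=plane_dist,
                                     ransac_n=3, num_iterations=1000)  # fit the table plane
    return scene.select_by_index(inliers, invert=True)       # keep only object points
```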
The proposed grasping method has been tested on 21 objects, shown in Fig. 4, comprising a wide variety of shapes, masses, materials, and textures. 13 of them are from the YCB object set [27]. The objects were selected to be small enough to be physically graspable by the gripper used.
Table I
SET OF OBJECTS USED FOR THE EXPERIMENT.

Object | Success rate, 1st grasp (5 trials)
bleach cleanser | 80% (4/5)
racquetball | 100% (5/5)
blue cup | 80% (4/5)
aluminium profile | 100% (5/5)
plastic bottle | 100% (5/5)
bamboo bowl | 100% (5/5)
spring clamp | 100% (5/5)
electric hand drill | 80% (4/5)
gas knob | 100% (5/5)
golf ball | 100% (5/5)
hammer | 100% (5/5)
plastic lemon | 80% (4/5)
mustard container | 100% (5/5)
plastic nectarine | 100% (5/5)
gray pipe | 100% (5/5)
potted meat can | 40% (2/5)
screwdriver | 100% (5/5)
plastic strawberry | 100% (5/5)
multi-head screwdriver | 100% (5/5)
white pipe | 60% (3/5)
wood block | 100% (5/5)
Overall success rate | 91.43% (96/105)

Two sets of experiments were conducted. First, we tested the robot's ability to grasp and lift individual objects from the surface of a table. A second set of tests was performed to analyse the robot's ability to clear randomly piled heaps of objects, by grasping and lifting objects successively until none
remained. During trials, running on an Intel Core i7-4790K CPU @ 4.00 GHz with 16 GB RAM, our method took 13.53 seconds on average to generate 1500 grasp hypotheses for a point cloud with 31183 data points, corresponding to a cluttered scene of 13 objects. This computational time is distributed as follows: the local contact moment computation takes 1.26 seconds (9.3%), the selection of finger pairs with feasible gripper kinematics takes 6.29 seconds (46.5%), and the robot's end-effector pose sampling and inverse kinematics check takes up to 5.98 seconds (44.2%).
B. Grasping individual objects
Our first experiment evaluates the robot’s ability to grasp
and lift individual objects off a flat table surface. 21 objects
were used, with five grasping trials performed on each object.
For each of the five trials, we randomly placed each object
on the table with different orientations and positions. After
capturing and registering partial point-clouds from multiple
views, points belonging to the table surface are filtered out and
the resulting object point cloud is then used to generate grasp
hypotheses, as described in Alg. 1. The grasps are ranked, and
the grasp with the highest likelihood, Eq. (6), is executed. A
grasp is recorded as successful if the robot manages to grasp
and lift the object to a post-grasp position 20 cm above the
table, and hold the object for more than 10 seconds without
dropping it.
Table I shows the results of our algorithm when grasping
objects that are individually placed on a table. Fig. 5 shows
images of successful grasps. The overall success rate for all five trials on all 21 objects is 91.43% (96 successful grasps out of 105). In 97.14% of trials (102/105), the LoCoMo algorithm suggested viable grasps, but in some of these the object was dropped for other reasons. For example, the object was heavy and the selected grasp was far from the centre of mass, placing a large torque on the gripper jaws and causing the object to twist loose. In the case of the potted meat can, the success rate was only 40% (2/5). This was due to shiny surfaces, which caused a very noisy point cloud.

Figure 5. Successful grasps for various objects. In each row, from left to right: the first image shows the point cloud of the object with the contact moment probability and the highest-ranked (planned) grasp; the second image shows the pre-grasp position of the gripper; the third image shows the grasp; the fourth image shows the post-grasp position with the object grasped.
In safety-critical, high-consequence industries, such as nu-
clear waste handling or other extreme environments, au-
tonomous robotics methods are likely to be introduced as
“operator-assistance technologies”, i.e., human-supervised au-
tonomy. In such cases, a human operator might select between several grasps that have been suggested by an autonomous grasp planner. As a small step towards exploring such a system, we repeated the first experiment; however, in each attempt we allowed a human to choose one of the best five grasp candidates suggested by the LoCoMo algorithm. In this case, grasp success rose to 98%. This suggests that improved performance might be obtained by combining LoCoMo with other kinds of information, e.g., selecting grasps which result in minimal torques.

Figure 6. Three different cluttered scenes generated for validating our approach.
C. Grasping objects from a cluttered heap
The second set of experiments was performed on cluttered,
self-occluding heaps of objects. For each heap, at least 6
objects were placed in a random pile. Three different heaps
were used (Fig. 6). The robot is tasked with clearing the heap by successively grasping and lifting objects until none remain. No ground plane segmentation was performed in this second experiment. However, the LoCoMo algorithm was able to automatically label the flat table surface as ungraspable; excluding flat surfaces and focusing attention on objects appears to be an inherent behaviour of the algorithm.
At each iteration, grasps are generated, and the highest
ranked grasp is executed. Each object is removed without
replacement if the grasp is successful, and the experiment
is repeated until all the objects are grasped or the algorithm
reports that it cannot identify any more feasible grasps. The
success of each grasp attempt is determined in the same way
as in the first experiment.
Table II shows the results for the heap-picking experiments.
We report the results of three different heaps containing at least
six objects each. For the first heap, 100% of the objects were
grasped successfully from the table, one after the other. Only
the gas knob required two trials to be successfully grasped,
with all other objects grasped on the first attempt.
For the second heap, all objects were grasped at the first
attempt, and the success rate was 100%. During its third
grasp, the robot chose to grasp and lift the bowl object, while
the bowl still held three other objects inside it (multi-head
screwdriver, plastic bottle and nectarine). In order to continue
the experiment, these objects were placed back on the table and then successfully grasped, needing only one attempt each.

Table II
CLUTTERED SCENE EXPERIMENT RESULTS.

Scene | Attempt | Object | Success / Failure
#1 | 1 | blue cup | success
#1 | 2 | golf ball | success
#1 | 3 | white pipe | success
#1 | 4 | electric hand drill | success
#1 | 5 | gas knob | failure
#1 | 6 | wood block | success
#1 | 7 | gas knob | success
#1 | 8 | plastic nectarine | success
#2 | 1 | gray pipe | success
#2 | 2 | aluminum profile | success
#2 | 3 | bamboo bowl | success
#2 | 4 | multi-head screwdriver | success
#2 | 5 | plastic bottle | success
#2 | 6 | plastic nectarine | success
#3 | 1 | mustard container | success
#3 | 2 | plastic bottle | success
#3 | 3 | spring clamp | failure
#3 | 4 | plastic lemon | success
#3 | 5 | spring clamp | success
#3 | 6 | hammer | failure
#3 | 7 | hammer | success
#3 | 8 | plastic strawberry | rolled off table
For the third heap, 83% of the objects (5 out of 6) were
successfully grasped. The spring clamp and hammer proved
to be difficult, due to sparse point clouds. However, only two
attempts were required to grasp these objects. The system did
not fail to plan a grasp for the final object (plastic strawberry). Unfortunately, lifting the hammer caused the strawberry to roll off the table, so the final object of the heap could not be grasped.
Fig. 7 shows the grasps generated in cluttered scene 1.
The robot was able to clear all three heaps successfully, the
only exception being the final object of the third heap, which
was pushed off the table during lifting of one of the other
objects.
D. Discussion
Overall, results suggest that the LoCoMo algorithm is very
promising. For lifting individual objects, a success rate of
91.43% was obtained over five different trials on 21 different objects, featuring a very wide variety of shapes and
appearances. This result is remarkable considering that the
system did not have any model or other a-priori knowledge
of the objects being grasped. Additionally, no training data
was required, and no learning was involved to obtain these
results. Moreover, in the heap-picking experiments, featuring
extreme clutter conditions, LoCoMo was able to grasp most
of the objects at the first attempt (15 out of 19 objects) and
was able to successfully grasp all objects, of all heaps, with
the exception of the final object of the final heap (plastic
strawberry) which rolled off the table during earlier activity.
Aside from a small number of unusual incidents, the pro-
posed algorithm appears to have planned robust grasps almost
100% of the time. However, we believe that we can improve robustness in several ways. We noted earlier that the set of five highest-ranked grasps occasionally contains a grasp that performs better than the highest-ranked grasp. This is because LoCoMo selects grasps based purely on the geometry of surfaces. Combining LoCoMo's robust selection of graspable geometrical features with other kinds of information, such as mass distribution [28], may enable more robust performance. Additionally, combining multiple grasp hypotheses with human-supervised autonomy appears to outperform pure autonomy based on LoCoMo alone.

Figure 7. Results of grasp execution in cluttered scenes. First row (planned grasp): point cloud of the scene and the gripper. Second row (pre-grasp): pre-grasp position of the gripper with respect to the cluttered scene. Third row (grasp): execution of the grasp. Fourth row (post-grasp): post-grasp position of the gripper. Columns show attempts 1 to 8 in chronological sequence from left to right, i.e. the first column shows the grasping of the first object, the second column the grasping of the second object, and so on. Detailed results can be found in the provided supplementary video.
V. CONCLUSION
In this paper, we proposed a novel grasp generation method,
based on the LoCoMo metric which searches for similarities
between the shape of finger surfaces, and the local shape
of an object, observed as a partial point-cloud. The metric
is based on zero-moment shift visual features, which encode
useful information about local surface curvature. Our method
does not rely on any a-priori knowledge about objects or
their physical parameters, and also does not require learning
from any kind of training data. Grasps are planned from
point-cloud images of objects, viewed from a depth-camera
mounted on the robot’s wrist. Experimental trials, with a real
robot and wide variety of objects, suggest that our method
generalises well to many shapes. We also demonstrated very
robust performance in extremely cluttered scenes. Moreover,
the algorithm is also capable of classifying certain objects
(e.g., flat table surfaces) as not graspable.
Our future work will focus on improving the performance
of the method in terms of speed and extending it to perform
multi-finger grasping. We will also focus on accomplishing complex manipulations in challenging scenarios (e.g., nuclear, automotive) by integrating the method with our previous state estimation and control methodologies [29], [30].
VI. ACKNOWLEDGEMENTS
This work forms part of the UK National Centre for
Nuclear Robotics initiative, funded by EPSRC EP/R02572X/1.
It is also supported by H2020 RoMaNS 645582, and EP-
SRC EP/P017487/1, EP/P01366X/1. Stolkin is supported by
a Royal Society Industry Fellowship. Ortenzi and Corke are
supported by the Australian Research Council Centre of Ex-
cellence for Robotic Vision (project number CE140100016).
REFERENCES
[1] A. T. Miller and P. K. Allen, “Graspit! a versatile simulator for robotic
grasping,” IEEE Robotics & Automation Magazine, vol. 11, no. 4, pp.
110–122, 2004.
[2] V.-D. Nguyen, “Constructing force-closure grasps,” The International
Journal of Robotics Research, vol. 7, no. 3, pp. 3–16, 1988.
[3] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning
hand-eye coordination for robotic grasping with deep learning and large-
scale data collection,” The International Journal of Robotics Research,
vol. 37, no. 4-5, pp. 421–436, 2018.
[4] N. Marturi, M. Kopicki, A. Rastegarpanah, V. Rajasekaran, M. Adjigble,
R. Stolkin, A. Leonardis, and Y. Bekiroglu, “Dynamic grasp and
trajectory planning for moving objects,” Autonomous Robots, in-press.
[5] A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt, “Grasp pose detection
in point clouds,” The International Journal of Robotics Research, p.
0278364917735594, 2017.
[6] U. Clarenz, M. Rumpf, and A. Telea, “Robust feature detection and
local classification for surfaces based on moment analysis,” IEEE
Transactions on Visualization and Computer Graphics, vol. 10, no. 5,
pp. 516–524, 2004.
[7] J. Weisz and P. K. Allen, “Pose error robust grasping from contact
wrench space metrics,” in Robotics and Automation (ICRA), 2012 IEEE
International Conference on. IEEE, 2012, pp. 557–562.
[8] C. Rosales, R. Suárez, M. Gabiccini, and A. Bicchi, “On the synthesis
of feasible and prehensile robotic grasps,” in Robotics and Automation
(ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp.
550–556.
[9] M. A. Roa and R. Suárez, “Computation of independent contact regions
for grasping 3-d objects,” IEEE Transactions on Robotics, vol. 25, no. 4,
pp. 839–850, 2009.
[10] D. Prattichizzo and J. C. Trinkle, “Grasping, in Springer handbook of
robotics. Springer, 2008, pp. 671–700.
[11] J.-W. Li, H. Liu, and H.-G. Cai, “On computing three-finger force-
closure grasps of 2-d and 3-d objects,” IEEE Transactions on Robotics
and Automation, vol. 19, no. 1, pp. 155–161, 2003.
[12] M. Gualtieri, A. ten Pas, K. Saenko, and R. Platt, “High precision
grasp pose detection in dense clutter, in Intelligent Robots and Systems
(IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp.
598–605.
[13] M. Kopicki, R. Detry, M. Adjigble, R. Stolkin, A. Leonardis, and J. L.
Wyatt, “One-shot learning and generation of dexterous grasps for novel
objects,” The International Journal of Robotics Research, vol. 35, no. 8,
pp. 959–976, 2016.
[14] I. Lenz, H. Lee, and A. Saxena, “Deep learning for detecting robotic
grasps,” The International Journal of Robotics Research, vol. 34, no.
4-5, pp. 705–724, 2015.
[15] H. B. Amor, O. Kroemer, U. Hillenbrand, G. Neumann, and J. Peters,
“Generalization of human grasping for multi-fingered robot hands,” in
Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International
Conference on. IEEE, 2012, pp. 2043–2050.
[16] M. Ma, N. Marturi, Y. Li, A. Leonardis, and R. Stolkin, “Region-
sequence based six-stream cnn features for general and fine-grained
human action recognition in videos,” Pattern Recognition, vol. 76, pp.
506–521, 2018.
[17] M. Ma, N. Marturi, Y. Li, R. Stolkin, and A. Leonardis, A local-global
coupled-layer puppet model for robust online human pose tracking,”
Computer Vision and Image Understanding, vol. 153, pp. 163–178,
2016.
[18] D. Smeets, J. Keustermans, D. Vandermeulen, and P. Suetens, “meshsift:
Local surface features for 3d face recognition under expression varia-
tions and partial data,” Computer Vision and Image Understanding, vol.
117, no. 2, pp. 158–169, 2013.
[19] E. Paquet, M. Rioux, A. Murching, T. Naveen, and A. Tabatabai,
“Description of shape information for 2-d and 3-d objects,” Signal
processing: Image communication, vol. 16, no. 1-2, pp. 103–122, 2000.
[20] R. B. Rusu, G. Bradski, R. Thibaux, and J. Hsu, “Fast 3d recognition and
pose using the viewpoint feature histogram, in Intelligent Robots and
Systems (IROS), 2010 IEEE/RSJ International Conference on. IEEE,
2010, pp. 2155–2162.
[21] R. M. Murray, Z. Li, S. S. Sastry, and S. S. Sastry, A mathematical
introduction to robotic manipulation. CRC press, 1994.
[22] C. Ferrari and J. Canny, “Planning optimal grasps, in Robotics and
Automation, 1992. Proceedings., 1992 IEEE International Conference
on. IEEE, 1992, pp. 2290–2295.
[23] A. Bicchi and V. Kumar, “Robotic grasping and contact: A review, in
Robotics and Automation, 2000. Proceedings. ICRA’00. IEEE Interna-
tional Conference on, vol. 1. IEEE, 2000, pp. 348–353.
[24] J. Bohg, A. Morales, T. Asfour, and D. Kragic, “Data-driven grasp
synthesis - a survey, IEEE Transactions on Robotics, vol. 30, no. 2,
pp. 289–309, 2014.
[25] M. Kopicki, R. Detry, F. Schmidt, C. Borst, R. Stolkin, and J. L. Wyatt,
“Learning dexterous grasps that generalise to novel objects by combining
hand and contact models,” in Robotics and Automation (ICRA), 2014
IEEE International Conference on. IEEE, 2014, pp. 5358–5365.
[26] S. Kaplan, “Combining probability distributions from experts in risk
analysis,” Risk Analysis, vol. 20, no. 2, pp. 155–156, 2000.
[27] B. Calli, A. Walsman, A. Singh, S. Srinivasa, P. Abbeel, and A. M.
Dollar, “Benchmarking in manipulation research: Using the yale-cmu-
berkeley object and model set, IEEE Robotics & Automation Magazine,
vol. 22, no. 3, pp. 36–52, 2015.
[28] N. Mavrakis, R. Stolkin, L. Baronti, M. Kopicki, M. Castellani et al.,
“Analysis of the inertia and dynamics of grasped objects, for choosing
optimal grasps to enable torque-efficient post-grasp manipulations, in
Humanoid Robots (Humanoids), 2016 IEEE-RAS 16th International
Conference on. IEEE, 2016, pp. 171–178.
[29] V. Ortenzi, N. Marturi, R. Stolkin, J. A. Kuo, and M. Mistry, “Vision-
guided state estimation and control of robotic manipulators which lack
proprioceptive sensors, in Intelligent Robots and Systems (IROS), 2016
IEEE/RSJ International Conference on. IEEE, 2016, pp. 3567–3574.
[30] N. Marturi, A. Rastegarpanah, C. Takahashi, M. Adjigble, R. Stolkin,
S. Zurek, M. Kopicki, M. Talha, J. A. Kuo, and Y. Bekiroglu, “Towards
advanced robotic manipulation for nuclear decommissioning: a pilot
study on tele-operation and autonomy, in Robotics and Automation for
Humanitarian Applications (RAHA), 2016 International Conference on.
IEEE, 2016, pp. 1–8.