
All content in this area was uploaded by Maxime Adjigble on Jan 05, 2022


Model-free and learning-free grasping

by Local Contact Moment matching

Maxime Adjigble1, Naresh Marturi1, Valerio Ortenzi2, Vijaykumar Rajasekaran1,

Peter Corke2, and Rustam Stolkin1

Abstract—This paper addresses the problem of grasping arbi-

trarily shaped objects, observed as partial point-clouds, without

requiring: models of the objects, physics parameters, training

data, or other a-priori knowledge. A grasp metric is proposed

based on Local Contact Moment (LoCoMo). LoCoMo combines

zero-moment shift features, of both hand and object surface

patches, to determine local similarity. This metric is then used

to search for a set of feasible grasp poses with associated grasp

likelihoods. LoCoMo overcomes some limitations of both classical

grasp planners and learning-based approaches. Unlike force-

closure analysis, LoCoMo does not require knowledge of physical

parameters such as friction coefﬁcients, and avoids assumptions

about ﬁngertip contacts, instead enabling robust contacts of large

areas of hand and object surface. Unlike more recent learning-

based approaches, LoCoMo does not require training data, and

does not need any prototype grasp conﬁgurations to be taught

by kinesthetic demonstration. We present results of real-robot

experiments grasping 21 different objects, observed by a wrist-

mounted depth camera. All objects are grasped successfully when

presented to the robot individually. The robot also successfully

clears cluttered heaps of objects by sequentially grasping and

lifting objects until none remain.

I. INTRODUCTION

Robots have been routinely and reliably grasping a vast

variety of objects in manufacturing environments for several

decades. This is based on simple pre-programmed actions, on

exactly pre-deﬁned objects, in highly structured environments.

However, autonomous, vision-guided grasping, in unstructured

environments, remains an open research problem. In this paper,

we assume that the robot has a model of itself, but does not

have any models or prior knowledge of the objects that it is

tasked with grasping. These objects may take arbitrary shape

and appear amidst clutter, observed as noisy partial point-

clouds. Our main contribution is to show how this problem

can be approached without needing either classical physics

analysis, or any learning from training data.

Classical grasping methods based on physics analysis [1],

[2] typically require the robot to have detailed knowledge of

the grasped object’s shape, mass and mass distribution, and

friction coefﬁcients between object surfaces and hand parts. It

is common to assume point or ﬁngertip contacts, with contacts

of large surface areas of the hand becoming analytically

intractable. More recent work has investigated a variety of

machine learning approaches to grasping [3]–[5]. Learning

1M.Adjigble, N. Marturi, V. Rajasekaran and R. Stolkin are with the

Extreme Robotics Laboratory, School of Metallurgy and Materials, University

of Birmingham, UK. maxime.adjigble@gmail.com

2V. Ortenzi and P. Corke are with the ARC Centre of Excellence for

Robotic Vision, Queensland University of Technology, Brisbane QLD 4001,

Australia. http://www.roboticvision.org

Figure 1. (Top-left) Point cloud of the object. (Top-right) Contact moment features for a single finger with a planar surface; red, yellow and green encode, in that order, increasing values of the metric computed using (3). (Bottom-left) Generated grasp with the highest contact probability. (Bottom-right) Grasp executed on the robot.

approaches seek to encode a more direct link between the

geometry of a scene (typically observed as a point-cloud) and

grasp hypotheses. Such methods have signiﬁcantly contributed

to overcoming limitations of classical methods. However,

all of these methods require training data (some more and

some less). Most of these methods also require prototypical

grasps (pinch-grasp, power-grasp, edge-grasp etc.) to be taught

by kinesthetic demonstration or pre-programming, albeit that

learning-based methods can often adapt these pre-taught hand

conﬁgurations to new object shapes (generalisation) with some

success.

In this paper, we propose a novel algorithm for computing

robust grasp hypotheses on arbitrarily shaped objects. The

overall grasping pipeline is depicted in Fig. 1. Given a point-

cloud view of a surface, and the kinematics of the robot’s

arm and hand, our algorithm outputs a variety of feasible

grasp poses for the hand, and evaluates each according to a

novel grasp likelihood metric. A collision-free reach-to-grasp

trajectory is then sought, and the highest-likelihood reachable

grasp is executed. Like recent learning-based methods, our

method also maps observed surface shapes directly to grasp

hypotheses. However, this mapping is not achieved by learn-

ing, does not require any training data, nor does it require any

kinesthetic teaching or pre-programming of prototypical grasp

conﬁgurations. Instead, we propose a novel grasp likelihood

metric, the local contact moment probability function, which

evaluates the shape compatibility between local parts of hand

or ﬁnger surface, and local parts of an observed point-cloud.

Local contact moment (LoCoMo) is based on computing

zero-moment shift features for local parts of the observed point

cloud, and also parts of the robot’s hand. First described in the

computer graphics literature [6], zero moment shift features

represent the characteristics of limited regions of surfaces,

and are especially good at encoding information about surface

curvature, Fig. 2, which is particularly important for matching

hand parts to a grasped object. These features represent the

surface characteristics of a limited region of the point cloud,

hence they are “local” features. Also, they are computed on

the point cloud without the need of any a-priori knowledge of

the object (i.e., model-free).

Using LoCoMo as a ﬁtness function, a point-cloud surface

can be efﬁciently searched for good matches to ﬁnger surface

geometry. Kinematic analysis then yields a set of feasible

grasps, with each grasp associated with a grasp likelihood.

The motion-space of the arm is then explored to ﬁnd collision-

free reach-to-grasp trajectories to the highest likelihood grasp

poses.

The main contributions of this work are:

•We propose the use of zero moment shift features [6] for

robotic grasp-planning.

•We propose a new metric, the local contact moment

probability function, for evaluating compatibility between

the surface geometries of local parts of both object and

gripper. This metric is model-free, and does not need to

be learned from training data.

•Exploitation of the kinematics of the robot to select a

subset of the graspable points, ﬁrst identiﬁed by LoCoMo,

that are kinematically reachable and feasible for the arm

and hand system.

The remainder of this paper is structured as follows: Section

II highlights the novelties of this work with respect to related

literature. Section III describes the technical details of our

proposed method. Section IV shows the results of a number

of experiments conducted using a Schunk industrial two-ﬁnger

hand mounted on a KUKA LBR iiwa manipulator arm. Section

V provides concluding remarks.

II. RELATED WORK

Classical approaches to grasping predominantly use

physics-based analysis to compute force-closure [7]–[11].

Most of these approaches rely on a large amount of a-

priori knowledge. They typically assume that an accurate and

complete 3D model of the object is known, as well as its mass,

mass distribution and also coefﬁcients of friction between the

object’s surfaces and parts of the robot hand. In contrast, in

many real applications, a robot may be required to grasp a

previously unknown object of arbitrary shape, observed as a

partial point-cloud view, for which friction coefﬁcients and

mass distribution are generally unknown. Many of these classi-

cal force-closure approaches are also restricted to assumptions

of fingertip contacts only. Physics-based analysis becomes problematic when large patches of hand surface come into contact with the object, as happens in many human grasps such as the “power grasp”, where large surfaces of the hand are wrapped around the object.

More recent approaches have explored various forms of

learning, [3], [12]–[15]. Learning-based methods overcome

some of the limitations of classical methods, and have shown

potential for generalising to grasping novel object shapes. [3]

achieved moderately successful grasping, by learning a direct

mapping between visual stimuli and motor outputs. Learning

was achieved via robots making exploratory motions coupled

with reinforcement. The system was able to synthesize novel

grasping policies, but relied on enormous amounts of training

data, involving large numbers of robots performing exploratory

actions over a long period of time. [15] minimised the amount

of reinforcement learning needed, by initiating learning from

close-to-good grasp poses by kinesthetic demonstration using

a data glove. In contrast, [13] showed signiﬁcant ability to

generalise grasping to novel objects, achieved by “one-shot”

learning, i.e., the robot was taught a single grasp on a single

object, and was then able to plan successful grasps on new

shapes. [13] learned “local” models of relationships between

hand-parts and the curvatures of object surface patches. How-

ever, these must be combined with a “global” model of hand

shape, corresponding to a grasp prototype (pinch grasp, power

grasp, etc.) which is taught by demonstration. The method

therefore remains unable to synthesize novel grasp prototypes

that have not been taught.

Like the above learning approaches, our method also does

not rely on object models or physics knowledge. Like [13] it

exploits local descriptors of ﬁnger contacts (but a different

kind). However, our method requires no training data, and

can synthesize its own grasp hypotheses without any need of

demonstration.

III. METHOD

We present a method to address robotic grasping based on

the LoCoMo metric between the object and the gripper. This

similarity metric between the features on the object and the

features on the gripper is used to select viable ﬁnger poses on

the surface of the object which are then combined with the

kinematics of the gripper to form a grasp. In the following,

we assume a model of the gripper, in this case a parallel jaw

gripper.

The algorithm is given a (partial) point cloud of an object,

and ﬁrst computes the zero-moment shift features on the point

cloud. The same features are extracted on the point cloud

of the gripper model. These features of object and gripper

are then used together to compute a local shape similarity

metric between object and gripper. The main idea is to ﬁnd the

points that maximise the contact surface and to use only areas

of the object that match the surface curvature of the ﬁngers

of the gripper for the grasp. Finally, a feasibility analysis is

performed to select the subset of pairs of points which are

Figure 2. Local surface classification based on the zero-moment shift of the Stanford Bunny. The colors red, yellow, green and blue encode, in increasing order, the magnitude of the L1 norm of the zero-moment shift vector. High values (blue) occur on the ears, which have high curvature, and low values (red) on surfaces with low curvature. Left: ρ = 0.008; right: ρ = 0.016.

returned from the previous action and which are kinematically

feasible for the gripper.

A. Features Extraction and Matching Metric

Over the years, various local visual features have been

presented in the literature and were previously used for tasks

such as 2D/3D object recognition and pose estimation, [16]–

[20]. In this work we propose the use of zero-moment shift

features for grasping arbitrarily shaped objects.

Let $B_\rho(X)$ represent the Euclidean sphere of radius $\rho$ centered at a point $X \in \mathbb{R}^3$. Given a set of points $\mathcal{X}$ in $\mathbb{R}^3$, the zero-moment shift $n_\rho$ of the set of points $\xi = \mathcal{X} \cap B_\rho(X)$, belonging to the sphere $B_\rho(X)$, can be expressed as

$$n_\rho = M^0_\rho(\xi) - X \qquad (1)$$

$$M^0_\rho(\xi) = \frac{1}{N} \sum_{i=1}^{N} X_i \qquad (2)$$

where $M^0_\rho(\xi)$ represents the zero moment (or centroid) of the set of points $\xi$ belonging to the sphere $B_\rho(X)$, $X_i$ is a point sampled from $\xi$, and $N$ is the total number of points in $\xi$.
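As an illustration of (1) and (2), the following is a minimal sketch of the zero-moment shift computed for every point of a cloud. The function name `zero_moment_shift`, the brute-force neighbour search and the toy planar grid are assumptions made for this example; they are not taken from the paper's implementation.

```python
import numpy as np


def zero_moment_shift(points: np.ndarray, rho: float) -> np.ndarray:
    """Zero-moment shift n_rho = centroid(neighbours within rho) - X, for every X.

    Brute-force O(N^2) neighbour search: fine for a sketch, but too slow and
    memory-hungry for a full scene cloud (a k-d tree would be the usual choice).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (N, N).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    inside = dists <= rho                       # row i marks the set xi for point i
    counts = inside.sum(axis=1, keepdims=True)  # N in Eq. (2); always >= 1
    centroids = (inside.astype(float) @ points) / counts  # Eq. (2)
    return centroids - points                   # Eq. (1)


# Toy cloud (an assumption for illustration): a flat 5 x 5 grid with unit spacing.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
grid = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(25)])
shifts = zero_moment_shift(grid, rho=1.1)
l1_norms = np.abs(shifts).sum(axis=1)
print(l1_norms[12])  # interior point: neighbours are symmetric, so the shift vanishes
print(l1_norms[0])   # corner point: neighbours lie on one side, so the shift is non-zero
```

On this toy grid the interior point has a vanishing shift while the boundary point does not, which mirrors the way the L1 norm separates flat regions from edges in Fig. 2.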

The L1 norm $|n_\rho|$ of the zero-moment shift is a good indicator of the characteristics of the underlying surface of the set of points, as shown in Fig. 2. It can be used in conjunction with a classifier to robustly distinguish smooth surfaces from edges, and also in conjunction with the first moment of the set of points to provide a robust surface classification for noisy point clouds or mesh models, as presented in [6]. In this work, we focus on the use of the zero-moment shift to compute a similarity metric between two arbitrary surfaces. We assume that the set of points has already been preprocessed and filtered of outliers. Comparing two local surfaces is then reduced to comparing the zero-moment shifts of the two surfaces. To this end, we introduce the LoCoMo probability function $C_\rho$ between two surfaces

$$C_\rho = 1 - \frac{\max_x \phi(x; \vec{0}, \Sigma) - \phi(\varepsilon; \vec{0}, \Sigma)}{\max_x \phi(x; \vec{0}, \Sigma)} \qquad (3)$$

$\phi$ represents the multivariate Gaussian density function

$$\phi(X; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left(-\frac{1}{2}(X-\mu)^\top \Sigma^{-1} (X-\mu)\right) \qquad (4)$$

where $X, \mu \in \mathbb{R}^n$, $\Sigma$ is the covariance matrix and $n$ the space dimension. $\vec{0}$ is the null vector of $\mathbb{R}^3$, and $\varepsilon$ is the error between the two zero-moment shift vectors, defined as

$$\varepsilon = n^1_\rho - n^2_\rho \qquad (5)$$

where $n^1_\rho$ and $n^2_\rho$ are expressed in the same reference frame. $\max_x \phi(x; \ldots)$ is the maximum value of the function $\phi(x; \ldots)$ over all $x \in \mathbb{R}^3$. The zero-moment shift vectors can be projected on the axis of the normal and the axis orthogonal to the normal of the surface to obtain a new set of coordinates $(n_\parallel, n_\perp, 0)$ which can be used for the computation of (5).

This LoCoMo metric based on zero-moment shift features is

extremely useful for grasping, as it provides a clear indication

of the local contact between the surfaces of a gripper and an

object.
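The LoCoMo probability function (3) can be sketched numerically as follows. Because a zero-mean Gaussian density attains its maximum at the mean, (3) reduces to the ratio $\phi(\varepsilon; \vec{0}, \Sigma) / \phi(\vec{0}; \vec{0}, \Sigma) = \exp(-\frac{1}{2}\varepsilon^\top \Sigma^{-1} \varepsilon)$, so the value is 1 for identical shifts and decays as the mismatch grows. The covariance used below (2 mm standard deviation per axis) and the function names are assumptions for illustration only; the paper does not state the value of $\Sigma$.

```python
import numpy as np


def gaussian_density(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """Multivariate Gaussian density, Eq. (4)."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    n = d.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * d @ np.linalg.solve(sigma, d)) / norm)


def locomo_probability(n1: np.ndarray, n2: np.ndarray, sigma: np.ndarray) -> float:
    """LoCoMo probability C_rho between two zero-moment shift vectors, Eq. (3)."""
    eps = np.asarray(n1, dtype=float) - np.asarray(n2, dtype=float)  # Eq. (5)
    zero = np.zeros_like(eps)
    peak = gaussian_density(zero, zero, sigma)  # the maximum over x is at the mean
    return 1.0 - (peak - gaussian_density(eps, zero, sigma)) / peak


# Hypothetical covariance: isotropic, 2 mm standard deviation per axis.
sigma = np.eye(3) * 0.002 ** 2
finger_shift = np.zeros(3)                  # e.g. the centre of a flat finger pad
print(locomo_probability(finger_shift, np.zeros(3), sigma))               # identical shifts
print(locomo_probability(finger_shift, np.array([0.0, 0.0, 0.002]), sigma))  # one-sigma mismatch
```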

B. Grasp Selection and Ranking

Selecting stable grasps is crucial to guarantee the success

of a grasp. Several analytic methods use force closure, such as

[21] and [22]. Force closure guarantees a static equilibrium be-

tween the contact forces. Furthermore, the interaction between

two surfaces in contact can be reduced to one or multiple

contact points as described in [23]. These assumptions are necessary conditions for stable grasp selection; however, they are not sufficient conditions for a stable grasp, as mentioned in [24].

The problem of generating grasp candidates can be formu-

lated as sampling ﬁnger poses on the surface of the object,

and combining them using the kinematics of the gripper to

form a grasp as described in [25]. Our method computes the

contact probability Cias given by (7) for each ﬁnger and uses

the kinematics of the gripper to select a set of ﬁnger poses to

form a grasp. The local contact probability Cρis computed

for an inﬁnitesimal surface in a sphere of radius ρ. In order

to account for the entire shape of a ﬁnger, Cρneeds to be

integrated over its entire surface. We also introduce R, the

ranking metric (given by (6)), to rank the grasps by computing

the weighted product of the contact probability for each ﬁnger.

$$R = k \prod_{i=1}^{n_f} C_i^{w_i} \qquad (6)$$

$$C_i = \frac{1}{N_s} \sum_{j=1}^{n} C_\rho^{i,X_j} \qquad (7)$$

where $k$ is a normalizing term, $w_i$ are weights satisfying $\sum_{i=1}^{n_f} w_i = 1$, $n_f$ is the number of fingers, $C_i$ is the contact probability for finger $i$ defined in (7), $n$ is the number of points in the vicinity of the finger, $N_s$ is a normalizing term representing the maximum number of points in the vicinity of the finger, and $C_\rho^{i,X_j}$ is the local contact moment probability between a point $X_j$ on the point cloud and its orthogonal projection on the surface of the gripper. More information on how to combine probability

Algorithm 1: Grasp generation and ranking.

Data: point cloud $\mathcal{X}$, fingers' 3D model, sphere radius $\rho$
Result: top-k grasps

Compute the surface normal at each point $X \in \mathcal{X}$
for each $X \in \mathcal{X}$ do
    Select the set of points $\xi$ in $B_\rho(X)$
    Compute $n_\rho$ with (1)
end
for each finger do
    for each $X \in \mathcal{X}$ do
        Sample several finger poses $P_f$ around $X$
        for each $p \in P_f$ do
            Select the set of points $\mathcal{X}_s$ within a distance $d$ from the surface of the finger
            for each $X_s \in \mathcal{X}_s$ do
                Project $X_s$ on the finger's surface
                Compute $C_\rho^{s,X_s}$ with (3)
            end
            Compute $C_i$ with (7)
            Append $p$ to $P$
        end
    end
end
Find $F$, the set of finger poses in $P$ satisfying the kinematic constraints of the gripper
for each $f \in F$ do
    Compute $R$ with (6)
end
Order $F$ by decreasing order of $R$
Sample gripper pose from $F$
return the top-k grasp poses

distributions can be found in [26]. A summary of the method

can be found in Alg. 1.
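The per-finger contact probability (7) and the ranking metric (6) can be sketched as below. The function names, the default of equal weights, and the toy candidate values are assumptions made for this example; the paper does not specify the weights or the normalizing term $k$.

```python
from typing import Optional, Sequence

import numpy as np


def finger_contact_probability(local_probs: Sequence[float], n_max: int) -> float:
    """Eq. (7): sum of the local probabilities of the n nearby points, divided by N_s."""
    probs = np.asarray(local_probs, dtype=float)
    if probs.size > n_max:
        raise ValueError("n must not exceed the normalizing term N_s")
    return float(probs.sum() / n_max)


def grasp_rank(finger_probs: Sequence[float],
               weights: Optional[Sequence[float]] = None,
               k: float = 1.0) -> float:
    """Eq. (6): weighted product of the per-finger contact probabilities."""
    c = np.asarray(finger_probs, dtype=float)
    # Assumption: equal weights when none are given.
    w = np.full(c.shape, 1.0 / c.size) if weights is None else np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return float(k * np.prod(c ** w))


# Toy candidates for a two-finger gripper: (label, C_1, C_2). Values are made up.
candidates = [("a", 0.64, 0.25), ("b", 0.50, 0.50), ("c", 0.90, 0.05)]
ranked = sorted(candidates, key=lambda g: grasp_rank(g[1:]), reverse=True)
print([label for label, *_ in ranked])
```

With a product form, a grasp in which one finger has poor contact is penalised even when the other finger matches the surface well, which is the behaviour the toy candidate "c" illustrates.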

IV. EXPERIMENTAL RESULTS

A. Experimental setup

Our experimental setup (shown in Fig. 3) comprises a 7

degrees of freedom KUKA LBR iiwa arm whose end-effector

is mounted with a Schunk PG70 parallel jaw gripper with

ﬂat ﬁngers. The maximum stroke of the gripper is 68 mm.

The developed method neither requires any prior knowledge of the scene nor uses any object models. However, for each

grasping trial, the robot workspace containing test objects

is observed by moving a robot wrist-mounted Ensenso N35

depth camera to six different locations. Resulting partial point

clouds from all viewpoints are stitched together, in robot base

coordinate frame, to form a point cloud of the work scene.

After segmenting the ground plane, the resulting cloud is then

used by our method to generate grasp hypotheses. Hand-eye

calibration has been performed beforehand to transform the

Figure 3. Hardware setup used to validate the proposed grasping method. Labelled components: KUKA 7 DoF robot, 3D camera, gripper, and test objects.

Figure 4. The 21 objects used for the experiments. (Left column) spring clamp, aluminum profile, multi-head screwdriver, screwdriver, plastic strawberry, golf ball; (middle) racquetball, plastic lemon, plastic nectarine, wood block, potted meat can, electric hand drill, plastic bottle, gray pipe, white pipe; (right column) blue cup, hammer, bleach cleanser, gas knob, bamboo bowl, mustard container.

camera-acquired point cloud data to robot’s coordinate system

as well as to simplify the computations.
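The scene-acquisition step described above (several wrist-camera views expressed in the robot base frame, followed by removal of the supporting plane) can be sketched as follows. This is only an illustration: the function names are hypothetical, each view is assumed to come with a known 4 x 4 camera-to-base transform from hand-eye calibration and forward kinematics, and the table is assumed to be a horizontal plane at a known height, which is a simplification that the paper does not state.

```python
from typing import Sequence

import numpy as np


def to_base_frame(cloud_cam: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to an (N, 3) cloud given in the camera frame."""
    homog = np.hstack([cloud_cam, np.ones((len(cloud_cam), 1))])
    return (homog @ T_base_cam.T)[:, :3]


def stitch_views(clouds_cam: Sequence[np.ndarray],
                 transforms: Sequence[np.ndarray]) -> np.ndarray:
    """Concatenate all views after expressing them in the robot base frame."""
    return np.vstack([to_base_frame(c, T) for c, T in zip(clouds_cam, transforms)])


def remove_table(cloud_base: np.ndarray, table_height: float,
                 tolerance: float = 0.005) -> np.ndarray:
    """Keep only points above an assumed horizontal table plane (z up, metres)."""
    return cloud_base[cloud_base[:, 2] > table_height + tolerance]


# Two toy views of the same two points, the second seen from a camera shifted by 1 m in x.
view = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.05]])
T_identity = np.eye(4)
T_shifted = np.eye(4)
T_shifted[:3, 3] = [1.0, 0.0, 0.0]
scene = stitch_views([view, view], [T_identity, T_shifted])
objects_only = remove_table(scene, table_height=0.0)
print(scene.shape, objects_only.shape)
```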

The proposed grasping method has been tested on 21

objects, as shown in Fig.4, comprising a wide variety of

shapes, masses, materials, and textures. 13 of them are from

the YCB object set [27]. The objects are selected such that

they are small enough to be physically graspable by the used

gripper.

Two sets of experiments were conducted. Firstly, we tested

the robot’s ability to grasp and lift individual objects from

Table I
SET OF OBJECTS USED FOR THE EXPERIMENT.

Object                    Success rate, 1st grasp (5 trials)
bleach cleanser           80% (4/5)
racquetball               100% (5/5)
blue cup                  80% (4/5)
aluminium profile         100% (5/5)
plastic bottle            100% (5/5)
bamboo bowl               100% (5/5)
spring clamp              100% (5/5)
electric hand drill       80% (4/5)
gas knob                  100% (5/5)
golf ball                 100% (5/5)
hammer                    100% (5/5)
plastic lemon             80% (4/5)
mustard container         100% (5/5)
plastic nectarine         100% (5/5)
gray pipe                 100% (5/5)
potted meat can           40% (2/5)
screwdriver               100% (5/5)
plastic strawberry        100% (5/5)
multi-head screwdriver    100% (5/5)
white pipe                60% (3/5)
wood block                100% (5/5)
Overall success rate      91.43% (96/105)

the surface of a table. A second set of tests was performed to analyse the robot’s ability to clear randomly piled heaps of objects, by grasping and lifting objects successively, until none remained. During trials, running on an Intel Core i7-4790K CPU @ 4.00 GHz with 16 GB RAM, our method took 13.53 seconds on average to generate 1500 grasp hypotheses for a point cloud with 31183 data points corresponding to a cluttered scene of 13 objects. This computational time is distributed as follows: the local contact moment computation takes 1.26 seconds (9.3%), the selection of finger pairs with feasible gripper kinematics takes 6.29 seconds (46.5%), and the sampling of the robot’s end-effector pose together with the inverse kinematics check takes up to 5.98 seconds (44.2%).
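The timing breakdown can be checked with a few lines of arithmetic. The stage labels below are shorthand introduced for this example; the durations and the number of hypotheses are the figures reported above.

```python
# Reported stage durations in seconds (labels are shorthand for this example).
stage_seconds = {
    "local contact moment": 1.26,
    "finger-pair kinematic selection": 6.29,
    "pose sampling and inverse kinematics": 5.98,
}

total = sum(stage_seconds.values())
shares = {name: 100.0 * t / total for name, t in stage_seconds.items()}
ms_per_hypothesis = 1000.0 * total / 1500  # 1500 grasp hypotheses per scene

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print(f"total: {total:.2f} s, about {ms_per_hypothesis:.1f} ms per hypothesis")
```

The three stages add up to the reported total, and the geometric matching itself is the smallest share; most of the time goes to the kinematic checks.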

B. Grasping individual objects

Our ﬁrst experiment evaluates the robot’s ability to grasp

and lift individual objects off a ﬂat table surface. 21 objects

were used, with ﬁve grasping trials performed on each object.

For each of the ﬁve trials, we randomly placed each object

on the table with different orientations and positions. After

capturing and registering partial point-clouds from multiple

views, points belonging to the table surface are ﬁltered out and

the resulting object point cloud is then used to generate grasp

hypotheses, as described in Alg.1. The grasps are ranked, and

the grasp with the highest likelihood, Eq. (6), is executed. A

grasp is recorded as successful if the robot manages to grasp

and lift the object to a post-grasp position 20 cm above the

table, and hold the object for more than 10 seconds without

dropping it.

Table I shows the results of our algorithm when grasping

objects that are individually placed on a table. Fig.5 shows

images of successful grasps. The overall success rate for all

ﬁve trials on all 21 objects is 91.43% (96 successful grasps

Figure 5. Successful grasps for various objects. In each row, from left to right: the first image (planned grasp) shows the point cloud of the object with the contact moment probability and the highest ranked grasp; the second image shows the pre-grasp position of the gripper; the third image shows the grasp; finally, the fourth image shows the post-grasp position with the object grasped.

out of 105). In 97.14% of the trials (102/105), the LoCoMo algorithm suggested viable grasps; the difference from the overall success rate corresponds to objects that were dropped for other reasons. For example, the object was heavy, and the selected

grasp was far from the centre of mass, placing a large torque

on the gripper jaws, causing the object to twist loose. In the

case of the potted meat can, the success rate was only 40%

(2/5). This was due to shiny surfaces which caused a very

noisy point cloud.
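The overall figure can be reproduced from the per-object counts of Table I. The snippet below is a simple tally; the dictionary layout and variable names are choices made for this example.

```python
# Successful first grasps out of 5 trials per object, as listed in Table I.
successes = {
    "bleach cleanser": 4, "racquetball": 5, "blue cup": 4, "aluminium profile": 5,
    "plastic bottle": 5, "bamboo bowl": 5, "spring clamp": 5, "electric hand drill": 4,
    "gas knob": 5, "golf ball": 5, "hammer": 5, "plastic lemon": 4,
    "mustard container": 5, "plastic nectarine": 5, "gray pipe": 5,
    "potted meat can": 2, "screwdriver": 5, "plastic strawberry": 5,
    "multi-head screwdriver": 5, "white pipe": 3, "wood block": 5,
}
TRIALS_PER_OBJECT = 5

n_success = sum(successes.values())
n_trials = TRIALS_PER_OBJECT * len(successes)
overall_rate = 100.0 * n_success / n_trials
imperfect = sorted(name for name, s in successes.items() if s < TRIALS_PER_OBJECT)

print(f"{n_success}/{n_trials} = {overall_rate:.2f}%")
print("objects below 100%:", imperfect)
```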

In safety-critical, high-consequence industries, such as nu-

clear waste handling or other extreme environments, au-

tonomous robotics methods are likely to be introduced as

“operator-assistance technologies”, i.e., human-supervised au-

tonomy. In such cases, a human operator might select between

Figure 6. Three different cluttered scenes generated for validating our

approach.

several grasps that have been suggested by an autonomous

grasp planner. As a small step towards exploring such a

system, we repeated the ﬁrst experiment, however in each

attempt we allowed a human to choose one of the best ﬁve

grasp candidates suggested by the LoCoMo algorithm. In this

case, grasp success rose to 98%. This suggests that improved

performance might be obtained by combining LoCoMo with

other kinds of information, e.g., selecting grasps which result

in minimal torques.

C. Grasping objects from a cluttered heap

The second set of experiments was performed on cluttered,

self-occluding heaps of objects. For each heap, at least 6

objects were placed in a random pile. Three different heaps

were used, Fig. 6. The robot is tasked with clearing the

heap, by successively grasping and lifting objects until none

remain. No ground plane segmentation was performed in this

second experiment. However, the LoCoMo algorithm was able to automatically label the flat table surface as ungraspable: excluding flat surfaces, and focusing attention on objects, appears to be an inherent behaviour of the algorithm.

At each iteration, grasps are generated, and the highest

ranked grasp is executed. Each object is removed without

replacement if the grasp is successful, and the experiment

is repeated until all the objects are grasped or the algorithm

reports that it cannot identify any more feasible grasps. The

success of each grasp attempt is determined in the same way

as in the ﬁrst experiment.
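The heap-clearing procedure described above can be sketched as a simple loop. The callbacks `observe`, `plan_grasps` and `execute`, the `(label, rank)` grasp representation and the `max_attempts` safety cap are all hypothetical stand-ins introduced for this example; in the real system they would correspond to point-cloud acquisition, Alg. 1, and robot execution with the success check of the first experiment.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Grasp = Tuple[str, float]  # (target label, ranking metric R): hypothetical representation


def clear_heap(observe: Callable[[], object],
               plan_grasps: Callable[[object], Sequence[Grasp]],
               execute: Callable[[Grasp], bool],
               max_attempts: int = 50) -> List[Tuple[str, bool]]:
    """Repeatedly execute the highest-ranked grasp until no feasible grasp is reported."""
    log: List[Tuple[str, bool]] = []
    for _ in range(max_attempts):          # safety cap, an assumption of this sketch
        candidates = plan_grasps(observe())
        if not candidates:                 # planner cannot identify any feasible grasp
            break
        best = max(candidates, key=lambda g: g[1])
        log.append((best[0], execute(best)))
    return log


# Toy simulation: two objects, one of which fails on its first attempt.
remaining: Dict[str, float] = {"cup": 0.9, "ball": 0.8}
fails_once = {"ball"}


def observe() -> Dict[str, float]:
    return dict(remaining)


def plan_grasps(scene: Dict[str, float]) -> List[Grasp]:
    return list(scene.items())


def execute(grasp: Grasp) -> bool:
    name = grasp[0]
    if name in fails_once:
        fails_once.discard(name)
        return False
    del remaining[name]                    # a successful grasp removes the object
    return True


print(clear_heap(observe, plan_grasps, execute))
```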

Table II shows the results for the heap-picking experiments.

We report the results of three different heaps containing at least

six objects each. For the ﬁrst heap, 100% of the objects were

grasped successfully from the table, one after the other. Only

the gas knob required two trials to be successfully grasped,

with all other objects grasped on the ﬁrst attempt.

For the second heap, all objects were grasped at the ﬁrst

attempt, and the success rate was 100%. During its third

grasp, the robot chose to grasp and lift the bowl object, while

the bowl still held three other objects inside it (multi-head

screwdriver, plastic bottle and nectarine). In order to continue

Table II
CLUTTERED SCENE EXPERIMENT RESULTS.

Scene   Attempt   Object                    Success / Failure
#1      1         blue cup                  success
#1      2         golf ball                 success
#1      3         white pipe                success
#1      4         electric hand drill       success
#1      5         gas knob                  failure
#1      6         wood block                success
#1      7         gas knob                  success
#1      8         plastic nectarine         success
#2      1         gray pipe                 success
#2      2         aluminum profile          success
#2      3         bamboo bowl               success
#2      4         multi-head screwdriver    success
#2      5         plastic bottle            success
#2      6         plastic nectarine         success
#3      1         mustard container         success
#3      2         plastic bottle            success
#3      3         spring clamp              failure
#3      4         plastic lemon             success
#3      5         spring clamp              success
#3      6         hammer                    failure
#3      7         hammer                    success
#3      8         plastic strawberry        rolled off table

the experiment, these objects were placed back on the table

and then successfully grasped, needing only one attempt each.

For the third heap, 83% of the objects (5 out of 6) were

successfully grasped. The spring clamp and hammer proved

to be difﬁcult, due to sparse point clouds. However, only two

attempts were required to grasp these objects. The system did

not fail to plan a grasp for the ﬁnal object (plastic strawberry).

Unfortunately, lifting the hammer caused the strawberry to roll off the table, so the grasp of this final object of the heap could not be completed.

Fig. 7 shows the generated grasps in the cluttered scene 1.

The robot was able to clear all three heaps successfully, the

only exception being the ﬁnal object of the third heap, which

was pushed off the table during lifting of one of the other

objects.

D. Discussion

Overall, results suggest that the LoCoMo algorithm is very

promising. For lifting individual objects, a success rate of

91.43% was obtained over five different trials on 21 different objects, featuring a very wide variety of shapes and

appearances. This result is remarkable considering that the

system did not have any model or other a-priori knowledge

of the objects being grasped. Additionally, no training data

was required, and no learning was involved to obtain these

results. Moreover, in the heap-picking experiments, featuring

extreme clutter conditions, LoCoMo was able to grasp most

of the objects at the ﬁrst attempt (15 out of 19 objects) and

was able to successfully grasp all objects, of all heaps, with

the exception of the ﬁnal object of the ﬁnal heap (plastic

strawberry) which rolled off the table during earlier activity.
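The heap-picking counts quoted above can be recovered from the attempt log of Table II. The tuple layout and the helper `summarise` are choices made for this example; the entries themselves are transcribed from the table, with objects keyed per scene because some objects appear in more than one heap.

```python
from typing import Dict, List, Set, Tuple

# (scene, object, outcome) in chronological order, transcribed from Table II.
ATTEMPTS: List[Tuple[int, str, str]] = [
    (1, "blue cup", "success"), (1, "golf ball", "success"),
    (1, "white pipe", "success"), (1, "electric hand drill", "success"),
    (1, "gas knob", "failure"), (1, "wood block", "success"),
    (1, "gas knob", "success"), (1, "plastic nectarine", "success"),
    (2, "gray pipe", "success"), (2, "aluminum profile", "success"),
    (2, "bamboo bowl", "success"), (2, "multi-head screwdriver", "success"),
    (2, "plastic bottle", "success"), (2, "plastic nectarine", "success"),
    (3, "mustard container", "success"), (3, "plastic bottle", "success"),
    (3, "spring clamp", "failure"), (3, "plastic lemon", "success"),
    (3, "spring clamp", "success"), (3, "hammer", "failure"),
    (3, "hammer", "success"), (3, "plastic strawberry", "rolled off table"),
]


def summarise(attempts: List[Tuple[int, str, str]]) -> Tuple[int, int, int]:
    """Return (objects in all heaps, grasped at first attempt, grasped eventually)."""
    first_outcome: Dict[Tuple[int, str], str] = {}
    grasped: Set[Tuple[int, str]] = set()
    for scene, obj, outcome in attempts:
        first_outcome.setdefault((scene, obj), outcome)  # keep the first outcome only
        if outcome == "success":
            grasped.add((scene, obj))
    n_first_try = sum(o == "success" for o in first_outcome.values())
    return len(first_outcome), n_first_try, len(grasped)


n_objects, n_first_try, n_grasped = summarise(ATTEMPTS)
print(n_objects, n_first_try, n_grasped)
```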

Aside from a small number of unusual incidents, the pro-

posed algorithm appears to have planned robust grasps almost

Figure 7. Results of grasp execution in cluttered scenes, for Attempts 1 to 8. First row (planned grasp): images of the point cloud of the scene and the gripper. Second row (pre-grasp): pre-grasp position of the gripper with respect to the cluttered scene. Third row (grasp): execution of the grasp. Fourth row (post-grasp): post-grasp position of the gripper. The chronological sequence is from left to right, i.e., the first column shows the grasping of the first object, the second column the grasping of the second object, and so on. Detailed results can be found in the provided supplementary video.

100% of the time. However, we believe that we can improve

robustness in several ways. We noted earlier that the set of

ﬁve highest-ranked grasps occasionally contains a grasp that

performs better than the highest ranked grasp. This is because

LoCoMo selects grasps based purely on the geometry of

surfaces. Combining LoCoMo’s robust selection of graspable

geometrical features, with other kinds of information such

as mass distribution [28], may enable more robust perfor-

mance. Additionally, combining multiple grasp hypotheses

with human-supervised autonomy, appears to outperform pure

autonomy based on LoCoMo alone.

V. CONCLUSION

In this paper, we proposed a novel grasp generation method,

based on the LoCoMo metric which searches for similarities

between the shape of ﬁnger surfaces, and the local shape

of an object, observed as a partial point-cloud. The metric

is based on zero-moment shift visual features, which encode

useful information about local surface curvature. Our method

does not rely on any a-priori knowledge about objects or

their physical parameters, and also does not require learning

from any kind of training data. Grasps are planned from

point-cloud images of objects, viewed from a depth-camera

mounted on the robot’s wrist. Experimental trials, with a real

robot and wide variety of objects, suggest that our method

generalises well to many shapes. We also demonstrated very

robust performance in extremely cluttered scenes. Moreover,

the algorithm is also capable of classifying certain objects

(e.g., ﬂat table surfaces) as not graspable.

Our future work will focus on improving the performance

of the method in terms of speed and extending it to perform

multi-ﬁnger grasping. We will also focus on accomplishing

complex manipulations in challenging scenarios, e.g., nuclear,

automotive etc. by integrating it with our previous state

estimation and control methodologies [29], [30].

VI. ACKNOWLEDGEMENTS

This work forms part of the UK National Centre for

Nuclear Robotics initiative, funded by EPSRC EP/R02572X/1.

It is also supported by H2020 RoMaNS 645582, and EP-

SRC EP/P017487/1, EP/P01366X/1. Stolkin is supported by

a Royal Society Industry Fellowship. Ortenzi and Corke are

supported by the Australian Research Council Centre of Ex-

cellence for Robotic Vision (project number CE140100016).

REFERENCES

[1] A. T. Miller and P. K. Allen, “Graspit! a versatile simulator for robotic

grasping,” IEEE Robotics & Automation Magazine, vol. 11, no. 4, pp.

110–122, 2004.

[2] V.-D. Nguyen, “Constructing force-closure grasps,” The International

Journal of Robotics Research, vol. 7, no. 3, pp. 3–16, 1988.

[3] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning

hand-eye coordination for robotic grasping with deep learning and large-

scale data collection,” The International Journal of Robotics Research,

vol. 37, no. 4-5, pp. 421–436, 2018.

[4] N. Marturi, M. Kopicki, A. Rastegarpanah, V. Rajasekaran, M. Adjigble,

R. Stolkin, A. Leonardis, and Y. Bekiroglu, “Dynamic grasp and

trajectory planning for moving objects,” Autonomous Robots, in-press.

[5] A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt, “Grasp pose detection

in point clouds,” The International Journal of Robotics Research, p.

0278364917735594, 2017.

[6] U. Clarenz, M. Rumpf, and A. Telea, “Robust feature detection and

local classiﬁcation for surfaces based on moment analysis,” IEEE

Transactions on Visualization and Computer Graphics, vol. 10, no. 5,

pp. 516–524, 2004.

[7] J. Weisz and P. K. Allen, “Pose error robust grasping from contact

wrench space metrics,” in Robotics and Automation (ICRA), 2012 IEEE

International Conference on. IEEE, 2012, pp. 557–562.

[8] C. Rosales, R. Suárez, M. Gabiccini, and A. Bicchi, “On the synthesis of feasible and prehensile robotic grasps,” in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 550–556.

[9] M. A. Roa and R. Suárez, “Computation of independent contact regions

for grasping 3-d objects,” IEEE Transactions on Robotics, vol. 25, no. 4,

pp. 839–850, 2009.

[10] D. Prattichizzo and J. C. Trinkle, “Grasping,” in Springer handbook of

robotics. Springer, 2008, pp. 671–700.

[11] J.-W. Li, H. Liu, and H.-G. Cai, “On computing three-ﬁnger force-

closure grasps of 2-d and 3-d objects,” IEEE Transactions on Robotics

and Automation, vol. 19, no. 1, pp. 155–161, 2003.

[12] M. Gualtieri, A. ten Pas, K. Saenko, and R. Platt, “High precision

grasp pose detection in dense clutter,” in Intelligent Robots and Systems

(IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp.

598–605.

[13] M. Kopicki, R. Detry, M. Adjigble, R. Stolkin, A. Leonardis, and J. L.

Wyatt, “One-shot learning and generation of dexterous grasps for novel

objects,” The International Journal of Robotics Research, vol. 35, no. 8,

pp. 959–976, 2016.

[14] I. Lenz, H. Lee, and A. Saxena, “Deep learning for detecting robotic

grasps,” The International Journal of Robotics Research, vol. 34, no.

4-5, pp. 705–724, 2015.

[15] H. B. Amor, O. Kroemer, U. Hillenbrand, G. Neumann, and J. Peters,

“Generalization of human grasping for multi-ﬁngered robot hands,” in

Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International

Conference on. IEEE, 2012, pp. 2043–2050.

[16] M. Ma, N. Marturi, Y. Li, A. Leonardis, and R. Stolkin, “Region-

sequence based six-stream cnn features for general and ﬁne-grained

human action recognition in videos,” Pattern Recognition, vol. 76, pp.

506–521, 2018.

[17] M. Ma, N. Marturi, Y. Li, R. Stolkin, and A. Leonardis, “A local-global

coupled-layer puppet model for robust online human pose tracking,”

Computer Vision and Image Understanding, vol. 153, pp. 163–178,

2016.

[18] D. Smeets, J. Keustermans, D. Vandermeulen, and P. Suetens, “meshsift:

Local surface features for 3d face recognition under expression

variations and partial data,” Computer Vision and Image Understanding, vol.

117, no. 2, pp. 158–169, 2013.

[19] E. Paquet, M. Rioux, A. Murching, T. Naveen, and A. Tabatabai,

“Description of shape information for 2-d and 3-d objects,” Signal

processing: Image communication, vol. 16, no. 1-2, pp. 103–122, 2000.

[20] R. B. Rusu, G. Bradski, R. Thibaux, and J. Hsu, “Fast 3d recognition and

pose using the viewpoint feature histogram,” in Intelligent Robots and

Systems (IROS), 2010 IEEE/RSJ International Conference on. IEEE,

2010, pp. 2155–2162.

[21] R. M. Murray, Z. Li, and S. S. Sastry, A mathematical

introduction to robotic manipulation. CRC press, 1994.

[22] C. Ferrari and J. Canny, “Planning optimal grasps,” in Robotics and

Automation, 1992. Proceedings., 1992 IEEE International Conference

on. IEEE, 1992, pp. 2290–2295.

[23] A. Bicchi and V. Kumar, “Robotic grasping and contact: A review,” in

Robotics and Automation, 2000. Proceedings. ICRA’00. IEEE International

Conference on, vol. 1. IEEE, 2000, pp. 348–353.

[24] J. Bohg, A. Morales, T. Asfour, and D. Kragic, “Data-driven grasp

synthesis - a survey,” IEEE Transactions on Robotics, vol. 30, no. 2,

pp. 289–309, 2014.

[25] M. Kopicki, R. Detry, F. Schmidt, C. Borst, R. Stolkin, and J. L. Wyatt,

“Learning dexterous grasps that generalise to novel objects by combining

hand and contact models,” in Robotics and Automation (ICRA), 2014

IEEE International Conference on. IEEE, 2014, pp. 5358–5365.

[26] S. Kaplan, “Combining probability distributions from experts in risk

analysis,” Risk Analysis, vol. 20, no. 2, pp. 155–156, 2000.

[27] B. Calli, A. Walsman, A. Singh, S. Srinivasa, P. Abbeel, and A. M.

Dollar, “Benchmarking in manipulation research: Using the yale-cmu-

berkeley object and model set,” IEEE Robotics & Automation Magazine,

vol. 22, no. 3, pp. 36–52, 2015.

[28] N. Mavrakis, R. Stolkin, L. Baronti, M. Kopicki, M. Castellani et al.,

“Analysis of the inertia and dynamics of grasped objects, for choosing

optimal grasps to enable torque-efﬁcient post-grasp manipulations,” in

Humanoid Robots (Humanoids), 2016 IEEE-RAS 16th International

Conference on. IEEE, 2016, pp. 171–178.

[29] V. Ortenzi, N. Marturi, R. Stolkin, J. A. Kuo, and M. Mistry, “Vision-

guided state estimation and control of robotic manipulators which lack

proprioceptive sensors,” in Intelligent Robots and Systems (IROS), 2016

IEEE/RSJ International Conference on. IEEE, 2016, pp. 3567–3574.

[30] N. Marturi, A. Rastegarpanah, C. Takahashi, M. Adjigble, R. Stolkin,

S. Zurek, M. Kopicki, M. Talha, J. A. Kuo, and Y. Bekiroglu, “Towards

advanced robotic manipulation for nuclear decommissioning: a pilot

study on tele-operation and autonomy,” in Robotics and Automation for

Humanitarian Applications (RAHA), 2016 International Conference on.

IEEE, 2016, pp. 1–8.