A Performance Evaluation of Surface Normals-based Descriptors for
Recognition of Objects Using CAD-Models
C. M. Mateo¹, P. Gil² and F. Torres²

¹University Institute for Computing Research, University of Alicante, San Vicente del Raspeig, Spain
²Department of Physics, Systems Engineering and Signal Theory, University of Alicante, San Vicente del Raspeig, Spain
{cm.mateo, pablo.gil, fernando.torres}@ua.es
Keywords:
3D Object Recognition, 3D Surface Descriptors, Surface Normal, Geometric Modelling.
Abstract:
This paper describes a study and analysis of surface normal-based descriptors for 3D object recognition. Specifically, we evaluate the behaviour of these descriptors in the recognition process using virtual models of objects created with CAD software. Later, we test them in real scenes using synthetic objects created with a 3D printer from the virtual models. In both cases, the same virtual models are used in the matching process to find similarity; the two experiments differ only in the type of views used in the tests. Our analysis evaluates three subjects: the effectiveness of the 3D descriptors depending on the camera viewpoint and the geometric complexity of the model, the runtime needed to carry out the recognition process, and the success rate in recognizing a view of an object among the models saved in the database.
1 INTRODUCTION
The 3D object recognition process has advanced significantly in recent years. In recent works, many approaches use range sensors to obtain the depth of the objects present in a scene. Depth information has changed the techniques and algorithms used to extract features from images, and has been used to design new descriptors for identifying objects in scenes captured by range sensors (Rusu, 2009) and (Lai, 2013). LiDARs, Time-of-Flight (ToF) cameras and RGBD sensors, such as Kinect or Asus Xtion PRO Live, provide depth and allow us to recover the 3D structure of a scene from a single image. The choice of sensor depends on the context and lighting conditions (indoors, outdoors) and on the specific application (guidance/navigation of robots or vehicles, people detection, human-machine interaction, object recognition and reconstruction, etc.). Furthermore, the recognition methodology applied to retrieve the 3D object shape differs depending on whether the object is rigid or non-rigid. A variety of methods for the detection of rigid and non-rigid objects were presented in (Wohlkinger et al., 2012) and (Lian et al., 2013), respectively.
In this work, rigid object recognition is performed. Rigid object recognition can be based on visual feature information such as bounding, skeleton, silhouette, colour, texture, moments, etc., or on geometric features such as normal vectors, voxels, etc., obtained from the depth information captured by a range sensor. Examples of descriptors for rigid objects based on geometric features are: PFH (Point Feature Histogram) and FPFH (Fast Point Feature Histogram) (Rusu, 2009); VFH (Viewpoint Feature Histogram) (Rusu et al., 2010); CVFH (Clustered Viewpoint Feature Histogram) (Aldoma et al., 2011); and SHOT (Signature of Histograms of Orientations) (Tombari et al., 2010). All of them describe the geometry of an object using the normal vectors to its surface, which is represented by a point cloud. Other descriptors such as ESF (Ensemble of Shape Functions) (Wohlkinger and Vincze, 2011a), SVDS (Shape Distributions on Voxel Surfaces) (Wohlkinger and Vincze, 2011b) and GRSD (Global Radius-based Surface Descriptors) (Marton et al., 2011) are based on voxels to represent the object surface. SGURF (Semi-Global Unique Reference Frames) and OUR-CVFH (Oriented, Unique and Repeatable CVFH) (Aldoma et al., 2012b) are also noteworthy descriptors because they resolve the ambiguity over the camera roll angle. SGURF is computed from a single viewpoint of the object surface, and OUR-CVFH is a combination of SGURF and CVFH. CVFH is briefly discussed below.
In this paper, 3D rigid object recognition based on object category recognition is performed. We have also introduced some novelty with respect to the evaluations presented in (Wohlkinger et al., 2012) and (Alexandre, 2012). We have created views from a virtual camera which captures information about the virtual models from different viewpoints. Afterwards, we have built the 3D rigid objects from the CAD models using a 3D printer to test whether the behavioural changes of the descriptors are significant. Thereby, the errors in the recognition process can be better controlled, since both descriptors, model and object, are computed from known, perfect geometric figures. Therefore, the recognition errors only depend on the geometry of the isolated object in the scene and on the precision of the descriptor for modelling and identifying these objects. It is important to emphasize that the evaluated descriptors cannot be used if the scene has not been previously segmented and the objects localized therein.
The rest of this paper is structured as follows. 3D descriptors based on geometric information are discussed in Section 2. In Section 3, we present the similarity measures proposed for associating objects with models. Experimental results of the descriptor evaluation are shown in Sections 4 and 5. Finally, Section 6 contains the conclusions.
2 3D DESCRIPTORS
In this paper, we work with isolated rigid objects against uncluttered backgrounds in indoor scenes. Hence, our appearance model is based on a set of different feature descriptors. In particular, five descriptors are used in the experimentation. For each descriptor type, we use the same training framework, that is, the same objects as dataset or test data. The training framework is detailed later (Section 4). The descriptors are always computed over a mesh consisting of a point cloud. The descriptors only include geometric information based on the surface shape; they do not include colour or other types of visual feature information. The idea is to evaluate 3D object recognition methods based on 3D descriptors without using additional appearance information such as colour and texture from the scene image, or position/orientation information obtained from geolocation and odometry techniques. The absence of colour and texture provides generality for working with unknown objects and reduces the runtime of the recognition task. Industrial environments frequently involve objects and pieces without this kind of information: they are made of metal or plastic with homogeneous colour and can only be differentiated by means of geometric and surface features.
The five feature descriptors based on surface normal vectors, PFH, FPFH, SHOT, VFH and CVFH, were chosen because they retrieve enough geometric information about shape. This information will give us the ability to perform further analysis on industrial pieces. In the literature, descriptors are grouped into local and global recognition pipelines. The main difference between these groups is the size of the signature and the number of signatures used to describe the surface. In the first, the descriptor is represented by one signature for each point of the surface; in the second, all viewpoint information is saved in one signature for the whole surface. A brief description of each follows:
PFH: A set of signatures from several local neighbourhoods. For each point, a 3-tuple of angles ⟨α, φ, θ⟩ is computed, representing the relation between the normals in its neighbourhood according to the Darboux frame (see the sketch after this list). Then, to compute each final signature, the method accumulates the relations among all points within the neighbourhood on the surface. The computational complexity is therefore O(nk²). The signature dimensionality is 125.
FPFH: Based on the same idea as PFH, it uses a Darboux frame to relate pairs of points within a neighbourhood of radius r when computing each local surface signature. This descriptor has linear complexity in the number of neighbours, O(nk). The approximation changes the relations between a point and its neighbours located at a distance smaller than r, adding a specific weight according to the distance between the point and each neighbour. The signature dimensionality is 33.
SHOT: In this descriptor, a partitioned spherical grid is used as the local reference frame. For each volume of the partitioned grid, a signature is computed as a histogram of cos θᵢ between the normal at each point of the surface and the normal at the query feature point. A normalization of the descriptor is required to provide robustness to point-density variations. The signature dimensionality is 352.
VFH: Based on FPFH. Each signature consists of a histogram with two components: one contains the angles ⟨α, φ, θ⟩, calculated as the angular relation between each point's normal and the normal at the point cloud's centroid, and the other represents the angles between the viewpoint and the vector determined by the surface centroid. This descriptor has complexity O(n). The signature dimensionality is 308.
CVFH: This descriptor is an extension of VFH. The basic idea is to identify an object by splitting it into a set of smooth and continuous regions or clusters. Edges, ridges and other discontinuities of the surface are not considered, because those parts are more affected by noise. A VFH descriptor is then computed for each of these clusters. CVFH describes a surface as a histogram in which each item represents the centroid-to-surface relation and the average of the normals among all points of the surface. Again, the dimensionality is 308.

Figure 1: Primitive shapes of the models: (a) Cone, (b) Cube, (c) Cylinder, (d) Prism, (e) Sphere.
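To make the ⟨α, φ, θ⟩ triple used by PFH and FPFH concrete, the following is a minimal numpy sketch of the Darboux-frame angles; it is illustrative only (the function name is ours, and PCL's actual implementation additionally reorders the point pair and guards against degenerate geometry):

```python
import numpy as np

def darboux_angles(p_s, n_s, p_t, n_t):
    """Angle triple (alpha, phi, theta) relating a source point/normal
    pair (p_s, n_s) to a target pair (p_t, n_t), as used by PFH/FPFH."""
    d = p_t - p_s
    dist = np.linalg.norm(d)
    u = n_s                              # Darboux axis 1: source normal
    v = np.cross(u, d / dist)            # axis 2: orthogonal to u and the line
    v /= np.linalg.norm(v)
    w = np.cross(u, v)                   # axis 3: completes the frame
    alpha = np.dot(v, n_t)               # cosine of angle between v and n_t
    phi = np.dot(u, d / dist)            # cosine of angle between u and the line
    theta = np.arctan2(np.dot(w, n_t), np.dot(u, n_t))
    return alpha, phi, theta
```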
Other descriptors, such as the radius-based ones (RSD and GRSD) or the voxel-based ones (SVDS and ESF), are not studied here. This decision was taken because the results shown in (Aldoma et al., 2012a) and (Alexandre, 2012) indicate that normal-based descriptors perform best with household objects, as demonstrated by the accumulated recognition rate, the ROC curve for recognition and Recall-vs-(1-Precision).
3 SIMILARITY MEASURES
Similarity measures are used to associate the CAD-model with the object view. The similarity measures are defined as distance metrics. Four types of distance metric, d_s ∈ {d_{L1}, d_{L2}, d_{χ²}, d_H}, are used to compare the CAD-model C_j, which represents an object category, with the object view in the scene. The definitions of the four distances are:
d_{L1}(p, q) = \sum_{i=1}^{n} |p_i - q_i|    (1)

d_{L2}(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}    (2)

d_{\chi^2}(p, q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{p_i + q_i}    (3)

d_{H}(p, q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{n} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}    (4)
where d_{L1} is the Manhattan distance, d_{L2} the Euclidean distance, d_{χ²} the Chi-squared distance and d_H the Hellinger distance; n is the dimensionality of the points, p and q being two arbitrary points.
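As a cross-check of equations (1)-(4), here is a minimal numpy transcription; the eps guard in the Chi-squared distance is our addition, to avoid division by zero when a histogram bin is empty in both p and q:

```python
import numpy as np

def d_l1(p, q):
    """Manhattan distance, eq. (1)."""
    return np.sum(np.abs(p - q))

def d_l2(p, q):
    """Euclidean distance, eq. (2)."""
    return np.sqrt(np.sum((p - q) ** 2))

def d_chi2(p, q, eps=1e-12):
    """Chi-squared distance, eq. (3); eps is our addition."""
    return np.sum((p - q) ** 2 / (p + q + eps))

def d_hellinger(p, q):
    """Hellinger distance, eq. (4); assumes non-negative bins."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)
```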
Each CAD-model C_j is defined by a set of views C_j = {c_{j1}, c_{j2}, ..., c_{jr}}, where r is the number of viewpoints from which the CAD-model is observed with a virtual camera. Furthermore, each view is represented by a set of descriptors defined as c_{jl} = {m^{jl}_1, m^{jl}_2, m^{jl}_3, m^{jl}_4, m^{jl}_5}, where l represents the view identifier and j the object class defined by the CAD-model. This set represents a hybrid descriptor composed of five components, one for each type of descriptor: PFH, FPFH, SHOT, VFH and CVFH. Similarly, each object O_i is defined by a set of views O_i = {o_{i1}, o_{i2}, ..., o_{in}}, where n is the number of viewpoints from which the object in the scene is captured with a virtual or real camera. Likewise, each view is represented by a set of descriptors defined as o_{ik} = {v^{ik}_1, v^{ik}_2, v^{ik}_3, v^{ik}_4, v^{ik}_5}, where k represents the view identifier and i the object identifier.
Then, the difference between each component of the CAD-model descriptor and the object descriptor is calculated according to equations (1), (2), (3) and (4). The similarity d_c between an object category C_j in the database and the object in the scene is computed by taking the minimum distance for each type of descriptor, following equation (5). The comparison is done against all models saved in the database.
d_c(O_i, C_j) = \min_{o_{ik} \in O_i,\ c_{jl} \in C_j} \left\{ d(o_{ik}, c_{jl}) \right\}    (5)

d(o_{ik}, c_{jl}) = \sqrt{d_s(o_{ik}, c_{jl})^2 + d_s(c_{jl}, o_{ik})^2}    (6)

where s denotes the kind of distance defined in equations (1), (2), (3) and (4).
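A minimal sketch of equations (5) and (6), reusing the metric functions above; descriptor views are assumed to be numpy histograms, and the function names are ours:

```python
import numpy as np

def sym_dist(d_s, o_view, c_view):
    """Symmetric per-view distance of eq. (6) for a base metric d_s."""
    return np.sqrt(d_s(o_view, c_view) ** 2 + d_s(c_view, o_view) ** 2)

def category_distance(d_s, object_views, model_views):
    """Similarity d_c of eq. (5): minimum symmetric distance over
    all (object view, model view) pairs."""
    return min(sym_dist(d_s, o, c)
               for o in object_views
               for c in model_views)

# Hypothetical usage: the recognized category is the model with the
# smallest distance to the observed object views.
# best = min(models, key=lambda C_j: category_distance(d_chi2, O_i, C_j))
```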
4 EXPERIMENTS
Test data were created to analyse the behaviour of the 3D descriptors. They were created as a dataset of the five basic shapes used as object models: a sphere, a cube, a cone, a cylinder and a triangular prism (Figure 1). These models represent different surfaces without colour, texture or any characteristic other than geometry.

Figure 2: (a) Camera poses on a tessellated sphere used to obtain views. (b) Virtual and real object views from three arbitrary poses (top, side and another), respectively.
Each CAD-model was created as a point cloud from CAD software and represents an object category to be recognized. Each model is represented by a point cloud with a variable number of points, depending on the view and the kind of shape.
The correspondence process between model and object must be consistent. For this reason, in this paper, we have evaluated this process using CAD-models. In addition, we did not use keypoints computed from the surface, so the noise due to inaccuracy in their location is almost eliminated. Therefore, factors like the repeatability of keypoints with respect to viewpoint variations cannot occur. We have used all points of the surface to analyse and evaluate the descriptor behaviour thoroughly. If we had only evaluated the descriptors with a number of points chosen from the surface, i.e. keypoints, the analysis would have been limited to the effectiveness of those keypoints. Keypoints must be chosen to avoid redundant or sparse information (keypoints too close to or too far from each other, respectively). Generally, descriptors based on keypoints are efficient, but they are less descriptive and not robust to noise. Other descriptors, such as local/regional or global descriptors, are more robust to noise. Moreover, they are useful for handling partial/complete surface information, and so they are more descriptive on objects with poor geometric structure. Therefore, they are more suitable for categorizing objects in a recognition process, as can be seen here.
In the experiments, geometric transformations are applied to the point clouds of the CAD-models shown in Figure 1. The geometric transformations simulate viewpoints of the objects in real-world scenes. The transformations applied were rotations, translations and scale changes from different camera poses (Figure 2). The recognition process consists of matching CAD-models with objects in order to associate and identify the object category. The object category is given by the greatest similarity between the object and the geometric shape of a model (Figure 3, Figure 4 and Figure 5), applying Equation (5).
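A minimal sketch of how such a simulated viewpoint can be produced from a model point cloud (the axis, angle and offset arguments are illustrative, not the exact values used in the paper):

```python
import numpy as np

def simulate_view(points, axis, angle, translation, scale=1.0):
    """Apply a rotation (Rodrigues' formula), a translation and a
    scale change to an Nx3 point cloud to simulate a camera pose."""
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    return scale * (points @ R.T) + np.asarray(translation)
```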
In order to evaluate the behaviour of the descriptors and find which works best in the recognition process, we planned two types of experiment. First, virtual objects are created from the CAD-models by selecting views to build the test database (Figure 3). Thus we guarantee that every view created for the test database is equal to some view of a CAD-model. Second, virtual objects are created from the CAD-models by applying one or more transformations to them (Figure 4). These transformations are chosen to produce views different from any view used within a model, so we ensure a total difference between the test database and the models. In this case, we worked with 42 and 38 different views in the test and model databases, respectively.
Figure 3 shows a comparison in which the matching process is performed by combining all descriptors with all distances for virtual object views without transformations. This comparison allows us to determine the capacity of the similarity measures for classifying object views into categories according to a CAD-model. The results obtained report better recognition when the matching process uses the L1 distance, while the worst results are produced by the L2 distance; in both cases this is independent of the 3D descriptor used. In addition, the L2 distance causes confusion in the recognition, as the distance matrices of PFH, FPFH and SHOT demonstrate. χ² and H provide similar results, although H is slightly better.
Figure 4 shows an interesting additional experiment. It consists of reporting recognition results with regard to the transformation level. The difficulty of the matching process is increased due to the loss of similarity between the transformed virtual object views and the models. In this case, the distance matrices of both VFH and SHOT report a growth of the confusion level in the recognition regardless of the distance metric. Furthermore, PFH and FPFH practically do not change their behaviour. Summarizing, CVFH is the most stable descriptor, even when the chosen distance metric changes or the object views are not exactly equal to any model view.
Figure 3: Distance matrices for each descriptor (PFH, FPFH, SHOT, VFH, CVFH) combined with each metric (L2, χ², H, L1) when the model set is compared with itself (Model vs Model).

Finally, we tested the behaviour of the two best descriptors using the two best similarity measures when the recognition process is performed on real physical objects. In this case, the views for the test database are obtained through an acquisition process with a Kinect. In this last experiment, the CAD-models were used to create 5 real physical objects with a 3D printer. They were printed in PLA (PolyLactic Acid, or polylactide) filament of 3 mm diameter. Printing allowed us to precisely control the size, exact shape and building material of the objects in the scene. This was done because we would not have had appropriate error handling if household objects similar to those of (Rusu, 2009) or (Alexandre, 2012) had been used in our experiments. In those cases, the errors in the recognition process might have been influenced by the properties of the building material, or by the capture and digitization process when the shapes are not exactly like the CAD-model, etc. For this reason, we built our own objects for the test database. Afterwards, we captured these real physical objects with the Kinect from different camera poses in the scene. In particular, the test data set has a total of 32 camera views for each object. These viewpoints represent rotations and translations: each object was rotated through 4 different angles (0, π/6, π/3, π/2) rad about two different axes (the major and the minor axis of the object), and translated to 4 different positions (origin, near, left and right). In this way, scale changes have also been considered.
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
432
Figure 4: Distance matrices for each descriptor (PFH, FPFH, SHOT, VFH, CVFH) with the χ² and L1 metrics when the model set is compared with the test set (Model vs Test).
The result can be seen in Figure 5, which shows the matching process between all objects and all CAD-models.
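For reference, the 32 camera views per object described above follow directly from the combinatorics (4 angles × 2 axes × 4 positions); a hypothetical enumeration:

```python
import itertools
import numpy as np

angles = [0.0, np.pi / 6, np.pi / 3, np.pi / 2]   # rotation angles (rad)
axes = ["major", "minor"]                         # rotation axes of the object
positions = ["origin", "near", "left", "right"]   # translations

# 4 angles x 2 axes x 4 positions = 32 camera views per object
poses = list(itertools.product(angles, axes, positions))
assert len(poses) == 32
```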
As Figure 4 clearly shows, CVFH is the most effective descriptor for recognizing virtual objects. It is therefore a good choice for recognizing real physical objects using views similar to those registered for the virtual objects, as shown in Figure 5. A comparison of Figures 4(i)-4(j) with Figures 5(c)-5(d) demonstrates that variations such as noise, missing points on the surface when the view is captured by the camera, or loss of surface smoothness due to noisy points in the acquisition process have worsened the matching process. Consequently, the distance between a view and a false model is closer to zero. This fact is clearly observed between the cylinder and the cone.

Figure 5: Distance matrices for the matching process between models and real scenes, for VFH and CVFH with the χ² and L1 metrics.
5 ANALYSIS AND EVALUATION
OF TIME AND ACCURACY
The behaviour of the recognition process has been evaluated with regard to the relation between runtime and accuracy. A complete set of experiments was designed. Summarizing, the recognition process consists of three steps: a) Building the database: calculation of descriptors for each view of each model saved in the database. b) Calculation of descriptors for the real and virtual (test) views. c) Matching of test views, by computing the difference between an arbitrary test view and all model views saved in the database. A sketch of step c) as an evaluation loop is shown below.
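A minimal sketch of step c) as a nearest-model classification loop, reusing sym_dist from Section 3; the database layout (a dict mapping category names to lists of view descriptors) is our assumption:

```python
def classify(test_view, database, d_s):
    """Step c): assign a test view to the category of the closest
    model view, using the symmetric distance of eq. (6)."""
    return min(database,
               key=lambda cat: min(sym_dist(d_s, test_view, c)
                                   for c in database[cat]))

def accuracy(test_views, labels, database, d_s):
    """Percentage of test views assigned to their true category."""
    hits = sum(classify(v, database, d_s) == y
               for v, y in zip(test_views, labels))
    return 100.0 * hits / len(labels)
```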
The runtime of steps a) and b) of the recognition process varies: it depends on the number of points in the view, the number of views per model, the number of models and the descriptor characteristics. Thus, we have to measure the runtime cost depending on the level of detail of each point cloud representation. Figure 6 shows the runtime for each descriptor depending on the shape. Each graph represents the runtime of all descriptors for one shape (for each shape, 162 views with different numbers of points were used). On the one hand, as observed, the runtime dependency on shape complexity is less significant than the computational complexity of the feature descriptor, since for all shapes the following relation holds: PFH >> FPFH >> SHOT >> CVFH >> VFH. However, shape complexity does affect the stability of the local feature descriptors' runtime (Figure 6(f)). VFH and CVFH are the fastest in this comparison.
Figure 6: Descriptor runtime (msec., logarithmic scale) versus number of points (300-1100) for the descriptors PFH, FPFH, SHOT, VFH and CVFH, depending on the shape: (a) Cone, (b) Cube, (c) Cylinder, (d) Prism, (e) Sphere, (f) Mean and standard errors.
Figure 7: Matching runtime (msec., logarithmic scale) for each descriptor (SHOT, PFH, FPFH, CVFH, VFH) depending on the shape (Cone, Cube, Cylinder, Prism, Sphere), using the (a) Euclidean, (b) Chi-squared, (c) Hellinger and (d) Manhattan metrics.
On the other hand, the balance between runtime and accuracy is studied for step c). First, Figure 7 shows the mean runtime of the matching process between a test view and the model database. Again, the global descriptors (VFH and CVFH) are faster than the others (by roughly 10³ times), despite the high dimensionality of their signatures. Second, Figure 8 shows the difference in accuracy between matching with model views used as test views and matching with the actual test views. In addition, accuracy is lower with local descriptors than with global descriptors. Although CVFH has the best accuracy rate, another important issue is the metric selection.
Figure 8: Accuracy rates (%) for the descriptors SHOT, PFH, FPFH, CVFH and VFH in the Model vs Model and Model vs Test experiments, depending on the metric used in the matching process: (a) Euclidean, (b) Chi-squared, (c) Hellinger, (d) Manhattan.
In terms of runtime, the choice of metric is not significant (Figure 7), but it is important in terms of accuracy (Figure 8). In the model-vs-model experiment represented in Figure 3, a 20% increase in accuracy rate is obtained when L1 is used, as observed in Figures 8(a)-8(d). Nevertheless, for the model-vs-test experiment represented in Figure 4, the best result is obtained using χ²; in this case, a 5% increase in accuracy is achieved.
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
434
6 CONCLUSIONS
This paper discusses the effectiveness of 3D descriptors based on surface normals for recognizing geometric objects. The 3D descriptors were used for real physical and virtual object recognition by means of matching with virtual geometric models. A total of 6028 tests were performed: 3800 tests (4 distances, 5 descriptors, 5 shapes and 38 views per shape) from the model-vs-model experiment, 2100 tests (2 distances, 5 descriptors, 5 shapes and 42 views per shape) from the model-vs-test experiment, and 128 tests (2 distances, 2 descriptors, one shape and 32 views) from the model-vs-real-physical-object experiment. SHOT and FPFH were run in a CPU-based parallel implementation. The computer used was an Intel Core i7-4770K processor with 16 GB of system memory and an Nvidia GeForce GTX 770 GPU. The effectiveness of the recognition process is evaluated by measuring the runtime and the precision (success rate) of the recognition process. Both depend on the type of descriptor, the resolution of the point cloud representing each object, and the level of accuracy required for the recognition.
ACKNOWLEDGEMENTS
The research leading to these results has received funding from the Spanish Government and European FEDER funds (DPI2012-32390) and the Valencia Regional Government (PROMETEO/2013/085).
REFERENCES
Aldoma, A., Marton, Z.-C., Tombari, F., Wohlkinger, W.,
Potthast, C., Zeisl, B., Rusu, R. B., Gedikli, S., and
Vincze, M. (2012a). Tutorial: Point cloud library:
Three-dimensional object recognition and 6 dof pose
estimation. In IEEE Robot. Automat. Mag., vol-
ume 19, pages 80–91.
Aldoma, A., Tombari, F., Rusu, R. B., and Vincze,
M. (2012b). Our-cvfh - oriented, unique and re-
peatable clustered viewpoint feature histogram for
object recognition and 6dof pose estimation. In
DAGM/OAGM Symposium, pages 113–122.
Aldoma, A., Vincze, M., Blodow, N., Gossow, D., Gedikli,
S., Rusu, R.B., and Bradski, G. R. (2011). Cad-model
recognition and 6dof pose estimation using 3d cues.
In IEEE International Conference on Computer Vision
Workshops, ICCV 2011 Workshops, Barcelona, Spain,
November 6-13, 2011, pages 585–592.
Alexandre, L. A. (2012). 3D descriptors for object and cate-
gory recognition: a comparative evaluation. In Work-
shop on Color-Depth Camera Fusion in Robotics at
the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), Vilamoura, Portugal.
Lai, K. (2013). Object Recognition and Semantic Scene
Labeling for RGB-D Data. PhD thesis, University of
Washington, Washington, USA.
Lian, Z., Godil, A., Bustos, B., Daoudi, M., Hermans, J.,
Kawamura, S., Kurita, Y., Lavoué, G., Van Nguyen,
H., Ohbuchi, R., Ohkita, Y., Ohishi, Y., Porikli, F.,
Reuter, M., Sipiran, I., Smeets, D., Suetens, P., Tabia,
H., and Vandermeulen, D. (2013). A comparison
of methods for non-rigid 3d shape retrieval. Pattern
Recogn., 46(1):449–461.
Marton, Z.-C., Pangercic, D., Blodow, N., and Beetz, M.
(2011). Combined 2d-3d categorization and clas-
sification for multimodal perception systems. I. J.
Robotic Res., 30(11):1378–1402.
Rusu, R. B. (2009). Semantic 3D object maps for every-
day manipulation in human living environments. PhD
thesis, Technical University Munich.
Rusu, R. B., Bradski, G.R., Thibaux, R., and Hsu, J. (2010).
Fast 3d recognition and pose using the viewpoint fea-
ture histogram. In IROS, pages 2155–2162. IEEE.
Tombari, F., Salti, S., and Stefano, L. D. (2010). Unique
signatures of histograms for local surface descrip-
tion. In Proceedings of the 11th European Conference
on Computer Vision Conference on Computer Vision:
Part III, ECCV’10, pages 356–369, Berlin, Heidel-
berg. Springer-Verlag.
Wohlkinger, W., Aldoma, A., Rusu, R. B., and Vincze, M.
(2012). 3dnet: Large-scale object class recognition
from cad models. In ICRA, pages 5384–5391. IEEE.
Wohlkinger, W. and Vincze, M. (2011a). Ensemble of
shape functions for 3d object classification. In RO-
BIO, pages 2987–2992. IEEE.
Wohlkinger, W. and Vincze, M. (2011b). Shape distri-
butions on voxel surfaces for 3d object classification
from depth images. In ICSIPA, pages 115–120. IEEE.
APerformanceEvaluationofSurfaceNormals-basedDescriptorsforRecognitionofObjectsUsingCAD-Models
435