3D Point Cloud Registration for Localization using a Deep Neural Network Auto-Encoder
Gil Elbaz Tamar Avraham Anath Fischer
Technion - Israel Institute of Technology
gil.elbaz@alumni.technion.ac.il tammya@cs.technion.ac.il meranath@technion.ac.il
Abstract
We present an algorithm for registration between a large-scale point cloud and a close-proximity scanned point cloud, providing a localization solution that is fully independent of prior information about the initial positions of the two point cloud coordinate systems. The algorithm, denoted LORAX, selects super-points (local subsets of points) and describes the geometric structure of each with a low-dimensional descriptor. These descriptors are then used to infer potential matching regions for an efficient coarse registration process, followed by a fine-tuning stage. The set of super-points is selected by covering the point clouds with overlapping spheres and then filtering out those of low-quality or nonsalient regions. The descriptors are computed using state-of-the-art unsupervised machine learning, utilizing the technology of deep neural network based auto-encoders.

This novel framework provides a strong alternative to the common practice of using manually designed key-point descriptors for coarse point cloud registration. Utilizing super-points instead of key-points allows the available geometric data to be better exploited to find the correct transformation. Encoding local 3D geometric structures using a deep neural network auto-encoder instead of traditional descriptors continues the trend seen in other computer vision applications and indeed leads to superior results. The algorithm is tested on challenging point cloud registration datasets; its advantages over previous approaches, as well as its robustness to density changes, noise, and missing data, are shown.
1. Introduction
1.1. Overview
Point clouds, similarly to images, capture semantic information describing the objects in the world around us. In contrast to image data, which holds a two-dimensional projection of the scene in a fixed grid, a point cloud is a set of unorganized three-dimensional points in a unified coordinate system, capturing 3D spatial information. Methods for point cloud data analysis have been developed throughout the past few decades [1], and the significance of research in this field is trending upwards due to advances in affordable high-quality 3D scanning technology [2], machine learning breakthroughs, and new interesting applications.

Figure 1: Registration between a close-proximity point cloud (colored) and a large-scale point cloud (grayscale)
Point cloud registration is defined as finding the transformation between two separate point cloud coordinate systems. It is key for Simultaneous Localization and Mapping (SLAM) [3, 4] and 3D reconstruction of scenes [5], and it has become central in vision-based autonomous driving [6]. Much progress has been achieved, yet major challenges remain, such as registration of large-scale point clouds with low scene overlap and without prior positional information.
Outdoor localization today relies heavily on GPS technology, accompanied by ground-based augmentation systems to improve accuracy. This localization requires receiving signals from multiple satellites simultaneously, with an accuracy that varies greatly with the number of available satellites, the weather, and physical obstructions blocking or altering the signal path. The technology is inaccurate and unreliable in areas around the world with little to no ground infrastructure [7].
In this paper we focus on a localization technique that relies on registering a large-scale point cloud and a small-scale point cloud scanned within the same scene at different times. The registration is independent of proximity information between the clouds' initial coordinate systems. We define the two point clouds: a "global point cloud", made up of a large scanned outdoor scene with a coordinate system fixed to a real-world geographic coordinate system, and a "local point cloud", a substantially smaller point cloud captured online at an unknown location and orientation within the global point cloud scene. The transformation between the local and global point clouds is calculated using a machine learning analysis of their geometry. This serves as a high-quality localization method for outdoor environments that is completely independent of GPS. See Fig. 1.
1.2. Related Work
Registration algorithms are divided into those dealing with coarse registration and those dealing with fine registration. Coarse registration algorithms make no prior proximity assumptions about the point cloud positions and aim for a coarse alignment, i.e., a lenient loss function policy. Fine registration algorithms assume that the input point clouds are approximately aligned; they thus utilize the initial proximity between the points to fine-tune the alignment between the point cloud coordinate systems. Fine registration can be used when the two clouds are acquired consecutively with large overlap between their scenes, or as a follow-up to a coarse registration procedure.
Numerous point cloud coarse registration methods have been developed [8], yet coarse registration remains an open challenge with much room for improvement. In the Fast Point Feature Histogram (FPFH) algorithm [9, 10], a histogram-based descriptor is calculated for each point within the point cloud, over multiple scales. Salient histograms that persist over the multiple scales are labeled as key-points, which are then matched to find the registration between the point clouds. Other descriptors for locating and describing key-points have been suggested as well; see [11] for a survey. Some examples are 3D-SIFT [12], NARF [13], and SHOT [11]. Many complex hand-coded features have been proposed, with the same goal of being invariant to rotation and translation, and robust to noise.
In the field of 2D computer vision, a similar development period of hand-coded features came to an abrupt end due to breakthrough research in deep learning [14]. Using deep learning methods, much more advanced features (with complexity beyond human design) are learned from within the data, advancing major fields in 2D computer vision such as detection, classification, segmentation, localization, and registration [15]. These methods focus almost exclusively on 2D data; the unstructured, continuous, and large nature of point cloud datasets prevents a straightforward adaptation to 3D space. In order to utilize the 3D data in our method, the point cloud is densely sampled, and the 2.5-dimensional data of each local surface is captured and combined. The advanced tools of deep learning are applied to this data, in the form of unsupervised machine learning for high-quality dimension reduction [16], as a critical stage of the coarse point cloud registration.
A different approach, based on linear plane matching, was developed for the coarse registration of airborne LIDAR point clouds [17]. By relying on the presence of linear structures, this approach is limited to specific dataset classes.
The problem of fine registration between point clouds has been intensively studied, and high-quality solutions now exist for online applications such as SLAM [3, 4]. The solutions revolve around the Iterative Closest Point (ICP) algorithm [18] and its improvements [19]. A noteworthy fine registration method based on the correlation of Extended Gaussian Images in the Fourier domain [20] was proposed as an alternative to ICP, although its final stage again relied on iterations of ICP for fine-tuning. Fine registration is not the focus of this research, although to achieve end-to-end registration the standard ICP algorithm is utilized in the final stages.

All of the above registration methods were designed for input point cloud pairs that are similar in order of magnitude and low in point count (under 1 million points).
1.3. Contribution
This work proposes and tests two original approaches, applied for the first time as a basis for point cloud registration:

1. Using super-points (selected by a Random Sphere Cover Set) as the basic units for matching, instead of the commonly used key-points or local linear structures. This utilizes a much wider variety of geometric structures and better exploits the available data for finding the correct transformation. In addition, it makes the complexity of the rest of the algorithm correlate with the surface area covered in the point cloud scene rather than with the number of points in the scene. It is simple, fast, and allows for scalability.

2. Encoding local 3D geometric structures using a deep neural network auto-encoder, a method that provides state-of-the-art encoding in image analysis applications. By adapting the data and applying this method within the developed algorithmic pipeline, features are created from within the data that are shown to outperform manually designed local geometry features.

We show here that combining these ideas produces promising registration results on multiple challenging datasets. The method is generic in the sense that it can work with any data regardless of the type of sensor or scene.
While most registration algorithms deal with similar-sized point clouds, we take on the unique problem setup of point clouds that differ significantly in size, and we designed our algorithm to be effective on large-scale scanned data. Although the algorithm requires an initial offline stage, the online stages can be implemented efficiently in parallel, making it suitable for real-time applications.
2. The LORAX Registration Algorithm
We focus on the registration of two point clouds: a global point cloud depicting a large outdoor area, and a small local point cloud captured from within the global point cloud scene. The global point cloud can contain as many as ~100 million 3D points, while the local point cloud is two to three orders of magnitude smaller.

In this section we present the LOcalization by Registration using a deep Auto-encoder reduced Cover Set (LORAX) algorithm.
2.1. Algorithm Overview
The algorithm includes the following steps:
1. Division of the point clouds into super-points using the new Random Sphere Cover Set algorithm.
2. Selection of a normalized local coordinate system for each super-point.
3. Projection of super-point data onto 2D depth maps.
4. Saliency detection and filtration of super-points.
5. Dimension reduction using a Deep Neural Network Auto-Encoder.
6. Finding candidate matches between correlating descriptors.
7. Coarse registration using a localized search.
8. Iterative Closest Point fine-tuning.

Next, each stage of the algorithm is explained in detail and analyzed.
2.2. Random Sphere Cover Set (RSCS)
First, the set of super-points (SPs) that will be used as the basic units for the matching process is defined. Each super-point is a subset of points describing a local surface. Overlapping is allowed (i.e., one point can be included in several super-points). To obtain coverage of almost all points in a cloud (~95%) with a manageable representation, we suggest the following iterative procedure: (1) randomly select a point P that does not yet belong to any SP; (2) define a new SP as the set of points located inside the sphere of a fixed radius R_sphere with P as its center.
Figure 2: Coverage of points vs. RSCS iterations
This simple procedure, which we refer to as RSCS, has interesting properties that allow the R_sphere parameter to be estimated. In random sphere packing, non-overlapping spheres were shown to fill approximately 64% of an enclosed 3D region [21]. Let V_local be the volume of a sphere encompassing the local point cloud, and let m be the number of matches used in the final stage of the algorithm. To ensure that the minimum of m SP pairs will be matched, we select R_sphere so that it will be possible to randomly pack 2m spheres inside the volume V_local:
$$ R_{\text{sphere}} \leq \left( \frac{3}{4\pi} \cdot \frac{0.64}{2m} \cdot V_{\text{local}} \right)^{1/3} \tag{1} $$
An estimation of the number of SPs created by the RSCS algorithm, given the intrinsic parameters of the local and global point clouds, is analyzed in the Supplementary Material. It is shown that the method covers points of the point cloud at an exponentially decaying rate. Fig. 2 shows the percentage of points covered as a function of RSCS iterations. The RSCS algorithm is applied once on the global point cloud and multiple times on the local point cloud, for robustness in the later stages. The N_SP^local SPs found from the multiple applications of RSCS on the local point cloud are combined into a single set representing the local point cloud in the next stages of the algorithm, as shown in Fig. 7(a) and (b).
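To make the procedure concrete, below is a minimal sketch of RSCS and of the Eq. (1) radius estimate in Python/NumPy. It assumes the cloud is an (N, 3) array; the function names, the coverage target, and the handling of the residual ~5% of points are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sphere_radius(v_local, m):
    # Eq. (1): radius at which ~2m spheres can be randomly packed
    # into V_local, using the 64% random-packing density of [21]
    return (3.0 / (4.0 * np.pi) * 0.64 / (2 * m) * v_local) ** (1.0 / 3.0)

def rscs(points, r_sphere, target_coverage=0.95, rng=None):
    """Random Sphere Cover Set: cover ~95% of `points` with overlapping
    spheres of radius r_sphere. Returns a list of index arrays, one per SP."""
    rng = np.random.default_rng() if rng is None else rng
    covered = np.zeros(len(points), dtype=bool)
    super_points = []
    while covered.mean() < target_coverage:
        # (1) randomly select a point that does not yet belong to any SP
        center = points[rng.choice(np.flatnonzero(~covered))]
        # (2) the new SP is every point inside the sphere around it;
        # overlap is allowed, so already-covered points may join too
        members = np.flatnonzero(
            np.sum((points - center) ** 2, axis=1) <= r_sphere ** 2)
        super_points.append(members)
        covered[members] = True
    return super_points
```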
2.3. Selection of a Normalized Local Coordinate System for each Super-point
The local coordinate system for an SP is defined as follows: the origin is set to be the centroid of the SP; the axes of the SP are then set using Singular Value Decomposition (SVD) on the estimated covariance matrix of the points within the SP.

The assumption that each SP describes a surface of the scene is utilized in this stage. For points sampled from a surface, the covariance matrix has two large eigenvalues of similar size and one significantly smaller eigenvalue, signifying that the points are mostly scattered in two dimensions while the third dimension has significantly lower variance. The z-axis is set to be the third eigenvector. In order to define the x-axis, the mean height of each discrete radial slice of the SP is calculated and inserted into a polar histogram; the x-axis is then set to the direction corresponding to the largest bin. This local coordinate system creates invariance to the location and orientation of the SP, while preserving its geometry.
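A minimal sketch of this normalization in Python/NumPy follows. The number of angular bins in the polar histogram is an illustrative assumption (the paper does not specify it), and ties in the SVD ordering are ignored.

```python
import numpy as np

def normalize_super_point(sp_points, n_bins=36):
    """Bring a super-point into its normalized local coordinate system."""
    centered = sp_points - sp_points.mean(axis=0)       # origin at the centroid
    # SVD of the covariance: columns of u are principal directions,
    # sorted by decreasing variance; the third one becomes the z-axis
    u, _, _ = np.linalg.svd(np.cov(centered.T))
    z = u[:, 2]
    # coordinates in the surface plane and heights above it
    a, b, h = centered @ u[:, 0], centered @ u[:, 1], centered @ z
    # polar histogram: mean height per angular slice; x-axis = largest bin
    theta = np.arctan2(b, a)
    bins = ((theta + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    mean_h = np.array([h[bins == i].mean() if np.any(bins == i) else -np.inf
                       for i in range(n_bins)])
    phi = (np.argmax(mean_h) + 0.5) / n_bins * 2 * np.pi - np.pi
    x = np.cos(phi) * u[:, 0] + np.sin(phi) * u[:, 1]
    y = np.cross(z, x)                                  # right-handed frame
    return centered @ np.stack([x, y, z], axis=1)       # points in local frame
```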
2.4. Depth Map Projection
After bringing each SP into a local coordinate system, the SPs can be directly compared. However, the results would be completely unreliable, as they would be affected by variation in point density and by random noise. A dimension reduction is crucial to mitigate these effects. To this end, the continuous point location data is converted into a discrete image format of size [dim1, dim1]. The SP is scaled to the dimensions of the image dim1 (we used dim1 = 64), after which the z-axis height of each point is projected onto the depth map at its corresponding pixel. Finally, the image is cropped to [dim2, dim2] (we used dim2 = 32), to remove the circular edges of the SP in the depth map. See Fig. 3(a) and (b).
(a) Example SP (b) The depth map (c) The reconstruction
Figure 3: Super-point depth map projection
To reduce the effects of noise and varying densities, a max filter and then a mean filter are applied to the image. This modification of the SP information can be visualized by reconstructing the SP from the depth map. As shown in Fig. 3(c), the reconstruction reliably holds the same geometric shapes and qualities as the original SP point cloud, while providing complete coverage of the unknown sparse regions.
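The projection and filtering can be sketched as follows; the filter window size is an assumption, since the paper does not state the kernel sizes used.

```python
import numpy as np
from scipy.ndimage import maximum_filter, uniform_filter

def depth_map(local_pts, dim1=64, dim2=32, win=3):
    """Project a normalized super-point onto a [dim2, dim2] depth map."""
    xy, z = local_pts[:, :2], local_pts[:, 2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    # scale x, y into [0, dim1) pixel coordinates
    pix = ((xy - lo) / (hi - lo + 1e-9) * (dim1 - 1)).astype(int)
    img = np.zeros((dim1, dim1))
    for (c, r), height in zip(pix, z - z.min()):
        img[r, c] = max(img[r, c], height)   # keep max height per pixel
    # max filter fills sparse gaps, then mean filter smooths noise
    img = uniform_filter(maximum_filter(img, size=win), size=win)
    off = (dim1 - dim2) // 2                 # crop the circular SP edges
    return img[off:off + dim2, off:off + dim2]
```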
2.5. Saliency Detection and Filtration
For the fastest and best-quality registration, the number of irrelevant SPs that pass through this pipeline should be reduced. Irrelevant SPs are filtered out by three criteria: density, geometric properties, and saliency levels.

Density Test: The density is measured both in absolute terms and in comparison to other SPs. This means that SPs containing fewer than N_d points are filtered out. In addition, SPs with relatively few points in comparison to their K nearest neighbor SPs (Euclidean distance is measured between SP centroids) are also filtered out.

Geometric Quality Test: The height of each SP within its individual local coordinate system is measured. Low-height SPs, which signal flat surfaces, are filtered out.

Saliency Test: SP depth maps from the global point cloud are reshaped into a column "depth vector" of length dim2^2. A Principal Component Analysis (PCA) is performed on the set of depth vectors. The SPs (from the local and global point clouds) that are accurately reconstructed using only the first three eigenvectors have geometric characteristics commonly found within the dataset, and are thus filtered out. This reduces the chance of matches between similar SPs located in different areas of the point clouds.
2.6. Auto-Encoder Dimension Reduction
A key stage in this algorithm is the comparison between SP geometries from the global and local point clouds. The comparison of high-dimensional objects is prone to large noise and variance, even between objects with identical semantic meaning. To compare the semantics of the SP geometry, the dimension of the depth map images must be reduced while retaining maximum geometric information. We constructed and tested two separate dimension reduction methods in this research: the first is based on PCA and the second on a Deep Auto-Encoder (DAE).
2.6.1 Linear Dimension Reduction
The PCA method lowers the dimension of the data by projecting it onto a lower-dimensional linear hyperplane. A basis of k eigenvectors is calculated from the depth vectors of the global point cloud SPs (similarly to the saliency detection, but here k > 3). The k coefficients obtained by projecting a depth vector onto these eigenvectors define the superposition required to recreate each SP. This compact feature, holding the geometric information of the SP, is denoted the Principal Component Analysis super-point Feature (PCAF). It is important to note that PCA creates a linear projection of the data, which results in high data loss and a relatively large reduced form. This method serves as a comparable benchmark for the DAE method.
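Both the saliency test of Section 2.5 and the PCAF baseline rest on the same PCA over global-cloud depth vectors; a minimal sketch follows, where the reconstruction-error threshold of the saliency test is an illustrative assumption.

```python
import numpy as np

def fit_pca(depth_vectors):
    """depth_vectors: (N, dim2*dim2) matrix of global-cloud SP depth maps."""
    mean = depth_vectors.mean(axis=0)
    # rows of vt are covariance eigenvectors, strongest first
    _, _, vt = np.linalg.svd(depth_vectors - mean, full_matrices=False)
    return mean, vt

def pcaf(depth_vector, mean, vt, k=10):
    """PCAF descriptor: the k > 3 projection coefficients."""
    return (depth_vector - mean) @ vt[:k].T

def is_salient(depth_vector, mean, vt, rel_err_thresh=0.1):
    """Saliency test: SPs reconstructed accurately from the first three
    eigenvectors alone carry common geometry and are filtered out."""
    coeffs = (depth_vector - mean) @ vt[:3].T
    recon = mean + coeffs @ vt[:3]
    rel_err = (np.linalg.norm(depth_vector - recon)
               / (np.linalg.norm(depth_vector - mean) + 1e-9))
    return rel_err > rel_err_thresh         # True means: keep this SP
```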
2.6.2 Deep Auto-Encoder Dimension Reduction
It was shown that DAE neural networks [16] yield state-of-the-art image compression. Here we use this technique to obtain compact representations of the 2.5D super-point geometry captured in the depth maps.

DAE neural network architectures are made up of encoder and decoder stages. The encoder stage starts with the input layer, which is connected to hidden layers that gradually decrease in dimension until reaching the requested compact dimension. The decoder stage starts with the compact representation of the data; each succeeding hidden layer is of a higher dimension, until the output layer dimension, equal to the input dimension, is reached. The loss function is defined as the pixel-wise error between the input and output layers, optimizing the network to achieve the best compact representation of the image.

Figure 4: Deep Auto-encoder architecture

To design a DAE that would fit our application, we performed extensive empirical testing and optimization. We concluded that a network with 4 fully connected hidden layers, using the sigmoid non-linear activation function between each of the layers and dropout (DO) on the input layer, returns satisfactory results. To further reduce the number of parameters in the network, the 4th and 1st hidden layers, as well as the 3rd and 2nd layers, mirror each other with identical weights, while retaining individually learned bias values. This weight sharing means that the backpropagation learning algorithm is constrained to optimize the same weights for the encoding and decoding processes [22]. The compact dimension of the reduction is defined by the encoder output dimension. The architecture uses the following dimensions: 1024 (input), [1024, 128] (L1), [128, 10] (L2), [10, 128] (L3), [128, 1024] (L4), 1024 (output). See Fig. 4.
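A sketch of this tied-weight architecture in PyTorch is given below. The dropout probability, initialization, and optimizer are illustrative assumptions; only the layer dimensions, sigmoid activations, weight sharing, and pixel-wise loss are taken from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedDAE(nn.Module):
    """Deep auto-encoder with mirrored (tied) encoder/decoder weights and
    individually learned biases. Dropout probability is an assumption."""
    def __init__(self, dims=(1024, 128, 10), p_drop=0.2):
        super().__init__()
        d_in, d_hid, d_code = dims
        self.w1 = nn.Parameter(torch.randn(d_hid, d_in) * 0.01)    # shared L1/L4
        self.w2 = nn.Parameter(torch.randn(d_code, d_hid) * 0.01)  # shared L2/L3
        self.b1 = nn.Parameter(torch.zeros(d_hid))   # encoder biases
        self.b2 = nn.Parameter(torch.zeros(d_code))
        self.b3 = nn.Parameter(torch.zeros(d_hid))   # decoder biases (not tied)
        self.b4 = nn.Parameter(torch.zeros(d_in))
        self.drop = nn.Dropout(p_drop)               # dropout on the input layer

    def encode(self, x):                             # x: (batch, 1024) depth vectors
        h = torch.sigmoid(F.linear(self.drop(x), self.w1, self.b1))
        return torch.sigmoid(F.linear(h, self.w2, self.b2))   # 10-dim SAF

    def decode(self, code):                          # transposed (mirrored) weights
        h = torch.sigmoid(F.linear(code, self.w2.t(), self.b3))
        return torch.sigmoid(F.linear(h, self.w1.t(), self.b4))

    def forward(self, x):
        return self.decode(self.encode(x))

# unsupervised training step: pixel-wise reconstruction loss
model = TiedDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(64, 1024)                         # stand-in for a depth-map batch
opt.zero_grad()
loss = F.mse_loss(model(batch), batch)
loss.backward()
opt.step()
```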
A combination of data-driven and synthetic depth maps is utilized to initially train the deep neural network; 100,000 super-point depth maps were used to train the proposed DAE. The training stage is unsupervised, i.e., no manual annotation of data is required and the network is initialized with random weights. This training process is offline, and the network can be improved by updating it periodically with additional point cloud data acquired from scanned local clouds. The offline training process can be lengthy, but the encoding portion can be activated online inexpensively and quickly.
This compact low-dimensional representation can be seen as a feature capturing the geometric information of the entire SP. It represents the SP as a fixed lower-dimensional vector whose size is not correlated with the number of points in the SP. This is a substantial reduction in complexity in comparison to competing methods that compute local descriptors at each point. The SP Auto-encoder based Feature (SAF) can be used for many tasks, such as detection or classification of 3D objects, while here we optimize it for registration.

Figure 5: Visualization of deep auto-encoder input, reduction, and reconstruction
Fig. 5 shows examples of depth maps input into the DAE, reduced to a 10-dimensional SAF (shown as a 5x2 matrix enlarged for better visualization), and then reconstructed to the original dimensions through the decoder. The height of the depth map is translated into color: blue corresponds to zero height and dark red corresponds to maximal height. The reconstruction is not identical to the input, but it does capture the general geometry of the SP. This is optimal for robustness to noise and small changes (crucial for our application) while capturing the significant SP geometric properties.

To further show the effectiveness of the DAE, the features learned from within the data are analyzed and compared to the eigenvectors calculated by the PCA method. To do this, an independent activation of each dimension in the SAF vector is input into the decoder to visualize what the DAE has learned. See Fig. 6.
The eigenvectors of the PCA and the independent decoder activations of the DAE are the "building blocks" of the compact representations created by the two methods. Fig. 6(a) shows the growing complexity of each eigenvector, sorted by eigenvalue. This is in contrast to the unstructured DAE activation images; see Fig. 6(b). The PCA method can be approximated as a single hidden layer neural network with only linear functions [23]. This means that in order to represent complex geometry, complex eigenvectors are needed. Due to the multi-layered superposition of values in the DAE, complex geometry is represented using relatively simple "building blocks".
2.7. Selecting Candidates for Matching
After describing each SP by a SAF vector, we select a set of similarly described SP pairs to act as candidates for matching. By measuring the Euclidean distance between SAF features, each SP in the local point cloud is paired with its K nearest neighbors from the global point cloud (we set K to 3). When the distance associated with the (i+1)-th nearest neighbor is significantly larger than that associated with the i-th nearest neighbor, we filter out candidates i+1 to K. Note that the number of candidates, P_candidates, is on the order of O(N_SP^local). To get a sense of the number of candidates selected, consider a problem set with a global point cloud of 10 million points and a local point cloud of 500 thousand points. For this set we get approximately 200 SPs from the local point cloud, meaning that about 550 candidates are selected. See Fig. 7(c).
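This selection step can be sketched with a KD-tree over SAF space; the distance-jump ratio that triggers filtering is an illustrative assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def candidate_pairs(local_safs, global_safs, K=3, jump_ratio=1.5):
    """Pair each local SP with up to K nearest global SPs in SAF space,
    dropping neighbors that follow a significant distance jump."""
    tree = cKDTree(global_safs)
    dists, idxs = tree.query(local_safs, k=K)    # shapes: (N_local, K)
    pairs = []
    for i, (d, j) in enumerate(zip(dists, idxs)):
        keep = 1
        while keep < K and d[keep] <= jump_ratio * d[keep - 1]:
            keep += 1
        pairs.extend((i, j[c]) for c in range(keep))
    return pairs   # list of (local SP index, global SP index)
```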
(a) PCA eigenvectors
(b) DAE independent decoder activations
Figure 6: Compact Representation Vectors
2.8. Coarse Registration by Localized Search
To find the 6DoF (6 degrees of freedom) transformation between the point clouds, at least 3 matches are required (we used m = 6 for robustness). Dealing with a search space of size at least $\binom{P_{\text{candidates}}}{m}$ is impractical (over $10^{13}$ for the example above). We therefore consider for each iteration only m candidate pairs for which all global cloud points can be encompassed in a sphere with a volume not exceeding V_local. This reduces the search space of transformation options by 8 to 9 orders of magnitude (reducing the options in the example above to about 40,000). We use a RANSAC [24] procedure, iteratively selecting 6 candidate pairs, computing a transformation, and checking the consensus by measuring the average (physical) distance between transformed points in the local point cloud and their nearest neighbors in the global point cloud. (To save runtime, we transform only a diluted version of the local point cloud.) We tested 10,000 random selections (about 1/4 of the search space). Instead of selecting only the best-scoring transformation as the result of the coarse registration step, we record the 5 best transformations (T1, ..., T5) in which the local point clouds are non-overlapping. The fine-tuning step (described in the next section) is then applied to each, and the one that yields the best-scoring fine registration is finally selected.

Figure 7: RSCS and matching candidates (each point is colored according to the last RSCS iteration to cover it): (a) global point cloud with RSCS; (b) local point cloud with RSCS; (c) matching candidate connections
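The following sketch shows the locality-constrained RANSAC loop with a standard Kabsch rigid fit. For brevity it keeps only the single best transformation rather than the 5 best non-overlapping ones, and it tests the locality constraint with a simple bounding-sphere check; both simplifications, like the function names, are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def rigid_fit(src, dst):
    """Least-squares rotation + translation (Kabsch) mapping src -> dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - cs).T @ (dst - cd))
    r = vt.T @ u.T
    if np.linalg.det(r) < 0:                 # avoid reflections
        vt[-1] *= -1
        r = vt.T @ u.T
    return r, cd - r @ cs

def coarse_registration(pairs, local_cent, global_cent, diluted_local,
                        global_tree, r_local, m=6, iters=10000, rng=None):
    """RANSAC over m candidate pairs whose global centroids fit inside a
    sphere of radius ~r_local; scores by mean NN distance of the diluted
    local cloud. global_tree is a cKDTree over the global point cloud."""
    rng = np.random.default_rng() if rng is None else rng
    best = (np.inf, None)
    for _ in range(iters):
        sel = rng.choice(len(pairs), size=m, replace=False)
        g = np.array([global_cent[pairs[i][1]] for i in sel])
        if np.linalg.norm(g - g.mean(axis=0), axis=1).max() > r_local:
            continue                         # locality constraint violated
        l = np.array([local_cent[pairs[i][0]] for i in sel])
        r, t = rigid_fit(l, g)
        moved = diluted_local @ r.T + t
        score = global_tree.query(moved)[0].mean()   # mean NN distance
        if score < best[0]:
            best = (score, (r, t))
    return best

# usage: global_tree = cKDTree(global_points)
```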
2.9. Iterative Closest Point Fine-tuning
Simple Iterative Closest Point (ICP) fine-tuning is performed, initialized by each of the T1, ..., T5 transformations. The registration with the lowest ICP loss is chosen, defining the LORAX output transformation. This step stems from the realization that the "closest" coarse registration result does not always correspond to the correct registration result, as there are many local minima in the optimization function. The best fine registration is shown empirically to correspond to the best coarse registration in about 75% of the cases, the second-best in about 18% of the cases, and the third-best in about 4%. This stage can be replaced by any fine-tuning approach.
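A plain point-to-point ICP loop suffices here; the sketch below reuses the rigid_fit helper from the coarse-registration sketch above, with an assumed convergence tolerance.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(local_pts, global_pts, r0, t0, iters=50, tol=1e-4):
    """Refine a coarse (R, t); each of T1, ..., T5 would be refined this
    way and the transformation with the lowest final loss kept."""
    tree = cKDTree(global_pts)
    r, t, prev = r0, t0, np.inf
    for _ in range(iters):
        moved = local_pts @ r.T + t
        d, idx = tree.query(moved)           # closest global point per point
        loss = d.mean()                      # the ICP loss being minimized
        if abs(prev - loss) < tol:
            break
        prev = loss
        r, t = rigid_fit(local_pts, global_pts[idx])  # re-fit to matches
    return r, t, loss
```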
2.10. Efficiency Discussion
Our current implementation was not optimized for real-time performance. However, this algorithm does have the potential to be incorporated in field equipment and perform real-time localization, given that the global point cloud is captured ahead of time via aerial LIDAR or stereographic reconstruction. The neural network training process and the calculations on the global point cloud, up to the dimension reduction stage, are designed to run offline. The compact descriptors from the global cloud may be saved into the online equipment along with a downsampled version of the global point cloud. Once a local point cloud is captured online from an unknown position within the global scene, the SP division, normalization, saliency detection, and DAE dimension reduction stages can be carried out in parallel for each SP independently. Then the KD-tree based KNN candidate selection [25], the RANSAC based localized candidate search [26], and ICP [27] can be accomplished efficiently in parallel as well (using multiple CPUs and/or a GPU). The code for the RSCS method and SAF descriptor is available at: https://github.com/gilbaz/LORAX.
3. Experiments and Results
The advantage of the RSCS SP creation over the FPFH persistent key-point detection, and the descriptive quality of SAF in comparison to the FPFH descriptor, are shown throughout the experiments. Each stage of the registration was extensively tested using the "Challenging Datasets for Point Cloud Registration Algorithms" [28], matching a close-proximity point cloud to a global large-scale point cloud of the same scene captured in two different seasons. In addition, controlled experiments were carried out using large-scale aerial point clouds, in order to better understand the effects of different types of noise on the registration.
3.1. Challenging Dataset Registration
The datasets utilized contain many point clouds of outdoor scenes, captured by a ground LIDAR scanner over multiple seasons. By stitching together the point cloud laser scans captured in each scene for each season, an authentic global point cloud is created. This point cloud data is ideal for testing the LORAX algorithm. We test it by registering local point clouds to a global point cloud of the same scene, captured in a different season. This setup is indeed challenging; the scenes contain some rigid stationary objects such as a gazebo structure, lamp poles, and benches, but also inconsistent objects like people, bushes, and tree branches. The challenge is elevated by missing information due to scanning angles and occlusions.

See Fig. 1 for an example of LORAX's results. The colors indicate the relative distance between each point in the local point cloud (after being registered) and its nearest neighbor in the global point cloud, where green is closer and red is farther.

We test and compare the registration performance with RSCS super-points vs. with key-points, and with FPFH descriptors vs. PCAF and vs. SAF. The results are summarized in Table 1.
For the comparison to key-point based methods we followed [9]. The 'RSCS+FPFH' method computes the FPFH descriptor corresponding to the center point of each RSCS super-point. Each result reported is the average performance over 12 local point clouds, 9 from the 'Gazebo' dataset and 3 from the 'Wood' dataset. For each registration result we measure the relative translational error (RTE) and the relative rotation error (RRE), as used and defined in [29]. When the RTE is below a predefined threshold (we used 1 meter), the registration result is defined as a binary 'success'. For each test we report the binary success rate and the average RTE and RRE scores of the successful tests. All methods use the same fine-tuning procedure and therefore achieve similar RTEs, yet they produce different RREs and binary success rates, indicating the quality and robustness of the coarse registration.
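Under our reading of [29], RTE is the norm of the translation residual and RRE the angle of the rotation residual; a sketch under that interpretation:

```python
import numpy as np

def rte_rre(r_est, t_est, r_gt, t_gt):
    """Relative translational / rotational errors of an estimated rigid
    transform vs. ground truth (our reading of the metrics of [29])."""
    rte = np.linalg.norm(t_est - t_gt)                   # meters
    r_res = r_est.T @ r_gt                               # residual rotation
    cos = np.clip((np.trace(r_res) - 1.0) / 2.0, -1.0, 1.0)
    rre = np.degrees(np.arccos(cos))                     # degrees
    return rte, rre

def binary_success(rte, thresh=1.0):
    return rte < thresh                                  # 1 m RTE threshold
```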
Table 1 shows that using a combination of RSCS for the point cloud subdivision and the DAE to create SAF yields the most robust and highest-quality registration results. The robustness gained from RSCS is evident in the comparison of KP+FPFH to RSCS+FPFH. The comparison of RSCS+FPFH to RSCS+PCAF and RSCS+SAF shows the advantage of using machine-learning based features over manually designed features, as well as the advantage of SAF over PCAF.
3.2. Noise, Occlusion, and Density Sensitivity Tests
To further test the quality and limitations of LORAX, we used a few urban outdoor scene point clouds provided by [30]. These point clouds, of approximately 1.5 million points each, depict large areas of 250 square meters. See Fig. 8 for an example.

Figure 8: Aerial scanned point cloud depicting a hill with surrounding houses and roads. An example from the dataset of [30].
Method        RRE [degrees]   RTE [meters]   Binary Success Rate
KP+FPFH       12.2 ± 4.8      0.44 ± 0.2     8/12 (67%)
RSCS+FPFH      9.1 ± 2.6      0.43 ± 0.24    8/12 (67%)
RSCS+PCAF      7.2 ± 2.3      0.40 ± 0.32    8/12 (67%)
*RSCS+SAF      2.5 ± 1.2      0.42 ± 0.27    10/12 (83%)

Table 1: Registration results
In order to control different parameters of noise, density, and occlusion, we performed semi-synthetic experiments in which small point clouds with radii of 15-50 meters were cropped from the large original point clouds. The LORAX and KP+FPFH registration algorithms were tested on altered versions of the local point clouds, matching them to the original global point cloud.

The modifications included: (1) adding random noise by randomly moving 10%, 20%, or 50% of the points a uniformly distributed distance of up to 3 meters; (2) randomly removing 10%, 20%, or 50% of the points to test sensitivity to the density of the cloud; and (3) simulating occlusions by removing 10%, 20%, or 50% of the points within local random spheres. See Fig. 9.
(a) Original cropped local point cloud (b) Same cloud with simulated noise and occlusions

Figure 9: Simulating noise, density change, and occlusions
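The three perturbations can be sketched as follows; the occlusion-sphere radius is an illustrative assumption, as the paper does not state the sphere size.

```python
import numpy as np

def perturb(points, frac=0.2, mode="RN", occ_radius=3.0, rng=None):
    """Apply one perturbation: RN = random relocation (up to 3 m),
    DS = downsampling, OC = occlusion via random spheres."""
    rng = np.random.default_rng() if rng is None else rng
    pts = points.copy()
    n = len(pts)
    if mode == "RN":                        # move frac of the points
        sel = rng.choice(n, size=int(frac * n), replace=False)
        direction = rng.normal(size=(len(sel), 3))
        direction /= np.linalg.norm(direction, axis=1, keepdims=True)
        pts[sel] += direction * rng.uniform(0, 3.0, size=(len(sel), 1))
        return pts
    if mode == "DS":                        # drop frac of the points
        keep = rng.choice(n, size=int((1 - frac) * n), replace=False)
        return pts[keep]
    if mode == "OC":                        # carve out random spheres
        removed = np.zeros(n, dtype=bool)
        while removed.mean() < frac:
            c = pts[rng.integers(n)]
            removed |= np.sum((pts - c) ** 2, axis=1) < occ_radius ** 2
        return pts[~removed]
    raise ValueError(mode)
```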
50 randomly cropped point clouds from 3 full scenes were tested to analyze the effects of downsampling (density change) (DS), random relocation noise (RN), and occlusions (OC). Fig. 10 summarizes the results. To clarify, each point on the graph represents the average binary success rate of 50 registration tests given the defined noise specifications. In this experiment the binary success rate is defined by an RTE threshold of 2.5 meters (due to the large scale of the global point cloud).
These results show high robustness to point density, due to the depth map projection stage. Random noise has little effect on our algorithm, due to the SAF representation, which captures only the major geometric characteristics. Occlusion is the hardest defect to deal with: the algorithm overcomes occlusions at a low level, but is greatly hindered otherwise. Overall we see that LORAX is not sensitive to substantial levels of random noise, density change, and occlusion, and that its robustness deteriorates only at extreme levels. The KP+FPFH algorithm (dashed line) returned a low binary success rate even when tested on clean local point clouds, due to the lack of "key-point"-inducing scene features in many sections of the global point cloud. These results add confidence in the direction of this research.

Figure 10: Noise, Occlusion, and Density Sensitivity Tests
4. Conclusion
This paper presented LORAX, an innovative point cloud registration algorithm. With the goal of outdoor localization, this algorithm deals with the challenges of a multiple-orders-of-magnitude difference in the number of points between the two registered point clouds and of a large total number of points involved. Two original approaches were presented: 1) using super-points (selected by a Random Sphere Cover Set) as the basic units for matching, instead of key-points, and 2) encoding local 3D geometric structures using a deep neural network auto-encoder. We have shown that the combination of these ideas yields promising registration results on challenging datasets. The method is generic in the sense that it can work with any data regardless of the type of sensor or scene. Moreover, while it includes an offline training stage, the online stages can be implemented efficiently in parallel, making it suitable for serving real-time applications.
In future work, we intend to adapt this approach to similar-sized point clouds with small scene overlaps. Another interesting direction is designing a multi-scale super-point version of this algorithm. Finally, using a convolutional auto-encoder with an input of height and color information could produce excellent super-point features useful for a wide range of point cloud analysis tasks.
Acknowledgments
This research was supported partially by the Technion and the Magnet Omek Consortium, Ministry of Industry and Trade, Israel. The authors would like to acknowledge Elbit Systems Ltd. for providing data for this research.
References
[1] Gary K. L. Tam, Zhi-Quan Cheng, Yu-Kun Lai, Frank C. Langbein, Yonghuai Liu, David Marshall, Ralph R. Martin, Xian-Fang Sun, and Paul L. Rosin. Registration of 3D point clouds and meshes: A survey from rigid to nonrigid. IEEE Transactions on Visualization and Computer Graphics, 19(7):1199-1217, 2013.
[2] Brent Schwarz. Mapping the world in 3D. Nature Photonics, 4(7):429-430, 2010.
[3] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localization and mapping: Part I. IEEE Robotics & Automation Magazine, 13(2):99-110, 2006.
[4] Jakob Engel, Thomas Schöps, and Daniel Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision, pages 834-849. Springer, 2014.
[5] Norbert Haala and Martin Kada. An update on automatic 3D building reconstruction. ISPRS Journal of Photogrammetry and Remote Sensing, 65(6):570-580, 2010.
[6] Jesse Levinson, Jake Askeland, Jan Becker, Jennifer Dolson, David Held, Soeren Kammel, J. Zico Kolter, Dirk Langer, Oliver Pink, Vaughan Pratt, et al. Towards fully autonomous driving: Systems and algorithms. In Intelligent Vehicles Symposium (IV), 2011 IEEE, pages 163-168. IEEE, 2011.
[7] US DoD. Global Positioning System standard positioning service performance standard. Assistant Secretary of Defense for Command, Control, Communications, and Intelligence, 2001.
[8] Ben Bellekens, Vincent Spruyt, Rafael Berkvens, Rudi Penne, and Maarten Weyn. A benchmark survey of rigid 3D point cloud registration algorithms.
[9] Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast Point Feature Histograms (FPFH) for 3D registration. In Robotics and Automation, 2009. ICRA '09. IEEE International Conference on, pages 3212-3217. IEEE, 2009.
[10] Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, and Michael Beetz. Aligning point cloud views using persistent feature histograms. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3384-3391. IEEE, 2008.
[11] Federico Tombari, Samuele Salti, and Luigi Di Stefano. Unique signatures of histograms for local surface description. In European Conference on Computer Vision, pages 356-369. Springer, 2010.
[12] Paul Scovanner, Saad Ali, and Mubarak Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th ACM International Conference on Multimedia, pages 357-360. ACM, 2007.
[13] Bastian Steder, Radu Bogdan Rusu, Kurt Konolige, and Wolfram Burgard. NARF: 3D range image features for object recognition. In Workshop on Defining and Solving Realistic Perception Problems in Personal Robotics at the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), volume 44, 2010.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.
[15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436-444, 2015.
[16] Geoffrey E. Hinton and Ruslan R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[17] Hangbin Wu and Hongchao Fan. Registration of airborne LiDAR point clouds by matching the linear plane features of building roof facets. Remote Sensing, 8(6):447, 2016.
[18] Paul J. Besl and Neil D. McKay. Method for registration of 3-D shapes. In Robotics-DL Tentative, pages 586-606. International Society for Optics and Photonics, 1992.
[19] Szymon Rusinkiewicz and Marc Levoy. Efficient variants of the ICP algorithm. In 3-D Digital Imaging and Modeling, 2001. Proceedings. Third International Conference on, pages 145-152. IEEE, 2001.
[20] Ameesh Makadia, Alexander Patterson, and Kostas Daniilidis. Fully automatic registration of 3D point clouds. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), volume 1, pages 1297-1304. IEEE, 2006.
[21] G. D. Scott and D. M. Kilgour. The density of random close packing of spheres. Journal of Physics D: Applied Physics, 2(6):863, 1969.
[22] Yann LeCun et al. Generalization and network design strategies. Connectionism in Perspective, pages 143-155, 1989.
[23] Terence D. Sanger. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2(6):459-473, 1989.
[24] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395, 1981.
[25] Kun Zhou, Qiming Hou, Rui Wang, and Baining Guo. Real-time KD-tree construction on graphics hardware. ACM Transactions on Graphics (TOG), 27(5):126, 2008.
[26] Donghwa Lee, Hyongjin Kim, and Hyun Myung. GPU-based real-time RGB-D 3D SLAM. In Ubiquitous Robots and Ambient Intelligence (URAI), 2012 9th International Conference on, pages 46-48. IEEE, 2012.
[27] Deyuan Qiu, Stefan May, and Andreas Nüchter. GPU-accelerated nearest neighbor search for 3D registration. In International Conference on Computer Vision Systems, pages 194-203. Springer, 2009.
[28] François Pomerleau, Ming Liu, Francis Colas, and Roland Siegwart. Challenging data sets for point cloud registration algorithms. The International Journal of Robotics Research, 31(14):1705-1711, 2012.
[29] Yanxin Ma, Yulan Guo, Jian Zhao, Min Lu, Jun Zhang, and Jianwei Wan. Fast and accurate registration of structured point clouds with small overlaps.
[30] Elbit Systems Ltd.

Supplementary resource (1)

... These can alternatively be learned [16,36] from voxelixed patches [4,33,88] or from point clouds [19-21, 28, 87]. The learning can be supervised by labels [16] or achieved without them [28,35,44,89]. Our method falls into the latter category, as our contrastive loss motivates our encoder to separate instances by approximating geodesic distances [83] without training data labels. ...
Preprint
Many 3D tasks such as pose alignment, animation, motion transfer, and 3D reconstruction rely on establishing correspondences between 3D shapes. This challenge has recently been approached by matching of semantic features from pre-trained vision models. However, despite their power, these features struggle to differentiate instances of the same semantic class such as "left hand" versus "right hand" which leads to substantial mapping errors. To solve this, we learn a surface-aware embedding space that is robust to these ambiguities. Importantly, our approach is self-supervised and requires only a small number of unpaired training meshes to infer features for new 3D shapes at test time. We achieve this by introducing a contrastive loss that preserves the semantic content of the features distilled from foundational models while disambiguating features located far apart on the shape's surface. We observe superior performance in correspondence matching benchmarks and enable downstream applications including in-part segmentation, pose alignment, and motion transfer. The project site is available at https://lukas.uzolas.com/SurfaceAware3DFeaturesSite.
... Deep-learning-based methods leverage neural networks and have been used to extract more salient descriptors [10], [11]. However, they lack explainability and need to be pre-trained. ...
Preprint
Point cloud registration is a critical problem in computer vision and robotics, especially in the field of navigation. Current methods often fail when faced with high outlier rates or take a long time to converge to a suitable solution. In this work, we introduce a novel algorithm for point cloud registration called SANDRO (Splitting strategy for point cloud Alignment using Non-convex anD Robust Optimization), which combines an Iteratively Reweighted Least Squares (IRLS) framework with a robust loss function with graduated non-convexity. This approach is further enhanced by a splitting strategy designed to handle high outlier rates and skewed distributions of outliers. SANDRO is capable of addressing important limitations of existing methods, as in challenging scenarios where the presence of high outlier rates and point cloud symmetries significantly hinder convergence. SANDRO achieves superior performance in terms of success rate when compared to the state-of-the-art methods, demonstrating a 20% improvement from the current state of the art when tested on the Redwood real dataset and 60% improvement when tested on synthetic data.
... Based on the different types of input data, we can further broadly classify the existing methods into two categories. Firstly, methods based on structured data (Zeng et al. 2017;Elbaz, Avraham, and Fischer 2017;Choy, Park, and Koltun 2019;Ao et al. 2021), which typically involve projecting the point cloud into 2D images or constructing 3D volume, and directly applying 2D CNN or 3D CNN. Secondly, methods based on raw point clouds (Aoki et al. 2019;Du et al. 2019;Huang, Mei, and Zhang 2020;Bai et al. 2020;Xu et al. 2021;Huang et al. 2022b), which directly use the point cloud data as input. ...
Preprint
The discriminative feature is crucial for point cloud registration. Recent methods improve the feature discriminative by distinguishing between non-overlapping and overlapping region points. However, they still face challenges in distinguishing the ambiguous structures in the overlapping regions. Therefore, the ambiguous features they extracted resulted in a significant number of outlier matches from overlapping regions. To solve this problem, we propose a prior-guided SMoE-based registration method to improve the feature distinctiveness by dispatching the potential correspondences to the same experts. Specifically, we propose a prior-guided SMoE module by fusing prior overlap and potential correspondence embeddings for routing, assigning tokens to the most suitable experts for processing. In addition, we propose a registration framework by a specific combination of Transformer layer and prior-guided SMoE module. The proposed method not only pays attention to the importance of locating the overlapping areas of point clouds, but also commits to finding more accurate correspondences in overlapping areas. Our extensive experiments demonstrate the effectiveness of our method, achieving state-of-the-art registration recall (95.7\%/79.3\%) on the 3DMatch/3DLoMatch benchmark. Moreover, we also test the performance on ModelNet40 and demonstrate excellent performance.
... A parallel track of works follow a view based scheme [10,20], where the sub-spaces of 3D information in form of projections or depth map are learned with well studied 2D networks. Promising potential, these methods do not cover for sparse point sets. ...
Preprint
We present PPFNet - Point Pair Feature NETwork for deeply learning a globally informed 3D local feature descriptor to find correspondences in unorganized point clouds. PPFNet learns local descriptors on pure geometry and is highly aware of the global context, an important cue in deep learning. Our 3D representation is computed as a collection of point-pair-features combined with the points and normals within a local vicinity. Our permutation invariant network design is inspired by PointNet and sets PPFNet to be ordering-free. As opposed to voxelization, our method is able to consume raw point clouds to exploit the full sparsity. PPFNet uses a novel N-tuple\textit{N-tuple} loss and architecture injecting the global information naturally into the local descriptor. It shows that context awareness also boosts the local feature representation. Qualitative and quantitative evaluations of our network suggest increased recall, improved robustness and invariance as well as a vital step in the 3D descriptor extraction performance.
... The approach was evaluated in realistic urban setting and in derelict buildings. Elbaz et al [19] used segments produced from a Random Sphere Cover Set -overlapping point cloud spheres where each point of the original point cloud could be part of more than one sphere. These spheres were then projected to 2D depth images and processed by a deep auto-encoder. ...
Preprint
Localization in challenging, natural environments such as forests or woodlands is an important capability for many applications from guiding a robot navigating along a forest trail to monitoring vegetation growth with handheld sensors. In this work we explore laser-based localization in both urban and natural environments, which is suitable for online applications. We propose a deep learning approach capable of learning meaningful descriptors directly from 3D point clouds by comparing triplets (anchor, positive and negative examples). The approach learns a feature space representation for a set of segmented point clouds that are matched between a current and previous observations. Our learning method is tailored towards loop closure detection resulting in a small model which can be deployed using only a CPU. The proposed learning method would allow the full pipeline to run on robots with limited computational payload such as drones, quadrupeds or UGVs.
... Hence, it becomes impossible to compute global descriptors from the bag-of-word approach to do Li-DAR based place recognition. Most existing approaches circumvent this problem by using readings from the Global Positioning System (GPS) to provide coarse localization, followed by point cloud registration, e.g. the iterative closest point (ICP) [33] or autoencoder based registration [9], for pose-estimation. As a result, LiDAR based localization is largely neglected since GPS might not be always available, despite the fact that much more accurate localization results can be obtained from LiDAR compared to images due to the availability of precise depth information. ...
Preprint
Unlike its image based counterpart, point cloud based retrieval for place recognition has remained as an unexplored and unsolved problem. This is largely due to the difficulty in extracting local feature descriptors from a point cloud that can subsequently be encoded into a global descriptor for the retrieval task. In this paper, we propose the PointNetVLAD where we leverage on the recent success of deep networks to solve point cloud based retrieval for place recognition. Specifically, our PointNetVLAD is a combination/modification of the existing PointNet and NetVLAD, which allows end-to-end training and inference to extract the global descriptor from a given 3D point cloud. Furthermore, we propose the "lazy triplet and quadruplet" loss functions that can achieve more discriminative and generalizable global descriptors to tackle the retrieval task. We create benchmark datasets for point cloud based retrieval for place recognition, and the experimental results on these datasets show the feasibility of our PointNetVLAD. Our code and the link for the benchmark dataset downloads are available in our project website. http://github.com/mikacuy/pointnetvlad/
Preprint
Large-scale scene point cloud registration with limited overlap is a challenging task due to computational load and constrained data acquisition. To tackle these issues, we propose a point cloud registration method, MT-PCR, based on Modality Transformation. MT-PCR leverages a BEV capturing the maximal overlap information to improve the accuracy and utilizes images to provide complementary spatial features. Specifically, MT-PCR converts 3D point clouds to BEV images and eastimates correspondence by 2D image keypoints extraction and matching. Subsequently, the 2D correspondence estimates are then transformed back to 3D point clouds using inverse mapping. We have applied MT-PCR to Terrestrial Laser Scanning and Aerial Laser Scanning point cloud registration on the GrAco dataset, involving 8 low-overlap, square-kilometer scale registration scenarios. Experiments and comparisons with commonly used methods demonstrate that MT-PCR can achieve superior accuracy and robustness in large-scale scenes with limited overlap.
Article
Point cloud registration serves as a basis for vision and robotic applications including 3D reconstruction and mapping. Despite significant improvements on the quality of results, recent deep learning approaches are computationally expensive and power-hungry, making them difficult to deploy on resource-constrained edge devices. To tackle this problem, in this paper, we propose a fast, accurate, and robust registration for low-cost embedded FPGAs. Based on a parallel and pipelined PointNet feature extractor, we develop custom accelerator cores namely PointLKCore and ReAgentCore, for two different learning-based methods. They are both correspondence-free and computationally efficient as they avoid the costly feature matching step involving nearest-neighbor search. The proposed cores are implemented on the Xilinx ZCU104 board and evaluated using both synthetic and real-world datasets, showing substantial improvements in the trade-off between runtime and registration quality. They run 58.27–63.21x faster than ARM Cortex-A53 CPU and offer 1.54–10.91x speedups over Intel Xeon CPU and Nvidia Jetson boards, while consuming less than 1W and achieving 133.63–267.57x energy-efficiency compared to Nvidia GeForce GPU. The proposed cores are more robust to noise and large initial misalignments than the classical methods and quickly find reasonable solutions in less than 4–12ms, demonstrating real-time performance.
Chapter
The emergence of advanced 3D sensing technologies, such as LiDAR, has significantly increased the availability of point cloud data, driving the need for robust analytics through deep learning. Point clouds, with their detailed spatiotemporal structures, are vital across numerous applications, requiring innovative approaches for effective interpretation and utilization. This chapter delves into the intersection of deep learning and point cloud analytics, covering essential tasks like point classification and semantic segmentation. It then explores place recognition, object retrieval, and registration, emphasizing their importance in interpreting dynamic environments. This chapter concludes with an examination of multimodal analysis, showcasing the synergistic potential of integrating point cloud data with other data modalities. Each section systematically unpacks the problems, general solution strategies, seminal contributions, and emerging trends, encapsulating the state-of-the-art in deep-learning-based point cloud analytics and paving the way for future advancements in the field.
Article
Full-text available
This paper presents a new approach for the registration of airborne LiDAR point clouds by finding and matching corresponding linear plane features. Linear plane features are a type of common feature in an urban area and are convenient for obtaining feature parameters from point clouds. Using such linear feature parameters, the 3D rigid body coordination transformation model is adopted to register the point clouds from different trajectories. The approach is composed of three steps. In the first step, an OpenStreetMap-aided method is applied to select simply-structured roof pairs as the corresponding roof facets for the registration. In the second step, the normal vectors of the selected roof facets are calculated and input into an over-determined observation system to estimate the registration parameters. In the third step, the registration is be carried out by using these parameters. A case dataset with a two trajectory point cloud was selected to verify the proposed method. To evaluate the accuracy of the point cloud after registration, 40 checkpoints were manually selected; the results of the evaluation show that the general accuracy is 0.96 m, which is approximately 1.6 times the point cloud resolution. Furthermore, two overlap zones were selected to measure the surface-difference between the two trajectories. According to the analysis results, the average surface-distance is approximately 0.045-0.129 m.
Article
Full-text available
Advanced user interface sensors are able to observe the environment in three dimensions with the use of specific optical techniques such as time-of-flight, structured light or stereo vision. Due to the success of modern sensors, which are able to fuse depth and color information of the environment, a new focus on different domains appears. This survey studies different state-of-the-art registration algorithms, which are able to determine the motion between two corresponding 3D point clouds. This survey starts from a mathematical field of view by explaining two deterministic methods, namely Principle Component Analysis (PCA) and Singular Value Decomposition (SVD), towards more iteratively methods such as Iterative Closest Point (ICP) and its variants. We compare the performance of the different algorithms to their precision and robustness based on a real world dataset. The main contribution of this survey consists of the performance benchmark that is based on a real world dataset, which includes 3D point clouds of a Microsoft Kinect camera, and a mathematical overview of different registration methods, which are commonly used in applications such as simultaneous localization and mapping, and 3D-scanning. The outcome of our benchmark concludes that the ICP point-to-surface method is the most precise algorithm. Beside the precision, the result for the robustness we can conclude that a combination of applying a ICP point-to-point method after an SVD method gives the minimum error. I. INTRODUCTION This article, which is an extended version of the conference paper [1], contains new results that defines the robustness and the precision of the different registration algorithms. With the advent of inexpensive depth sensing devices, robotics, computer vision and ambient application technology research has shifted from 2D imaging and Laser Imaging Detection And Ranging (LIDAR) scanning towards real-time reconstruction of the environment based on 3D point cloud data. On the one hand, there are structured light based sensors such as the Microsoft Kinect and Asus Xtion sensor, which generate a structured point cloud, sampled on a regular grid, and on the other hand, there are many time-of-flight based sensors such as the Softkinetic Depthsense camera, which yield an unstructured point cloud. These point clouds can either be used directly to detect and recognize objects in the environment where ambient technology is been used, or can be integrated over time to completely reconstruct a 3D map of the camera's surroundings [2], [3], [4]. However, in the latter case, point clouds obtained at different time instances need to be aligned, a process that is often referred to as registration. Registration
Article
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
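A toy deep auto-encoder makes the mechanism described here concrete: backpropagation adjusts the parameters of every layer to minimize a single reconstruction loss. The sketch below uses PyTorch; the layer sizes, names, and training snippet are illustrative assumptions, not any published architecture.

```python
import torch
import torch.nn as nn

class ToyAutoEncoder(nn.Module):
    """Toy auto-encoder: compresses a flattened input patch into a short
    code and reconstructs it. Layer widths are illustrative only."""
    def __init__(self, in_dim=1024, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

# One training step: backpropagation computes how every internal
# parameter should change to reduce the reconstruction error.
model = ToyAutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 1024)                 # a dummy batch
recon, _ = model(x)
loss = nn.functional.mse_loss(recon, x)
opt.zero_grad()
loss.backward()
opt.step()
```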
Article
Three-dimensional surface registration transforms multiple three-dimensional data sets into the same coordinate system so as to align overlapping components of these sets. Recent surveys have covered different aspects of either rigid or nonrigid registration, but seldom discuss them as a whole. Our study serves two purposes: 1) to give a comprehensive survey of both types of registration, focusing on three-dimensional point clouds and meshes, and 2) to provide a better understanding of registration from the perspective of data fitting. Registration is closely related to data fitting, comprising three core interwoven components: model selection, correspondences and constraints, and optimization. Studying these components 1) provides a basis for comparing the novelties of different techniques, 2) reveals the similarity of rigid and nonrigid registration in terms of problem representations, and 3) shows how overfitting arises in nonrigid registration and why interest in intrinsic techniques is increasing. We further summarize practical issues of registration, including initialization and evaluation, and discuss some of our own observations, insights, and foreseeable research trends.
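The data-fitting view the survey takes can be written compactly. The objective below is a standard formulation added here for clarity, not an equation quoted from the survey:

```latex
% Rigid registration as least-squares data fitting: find the rotation R
% and translation t that best explain the correspondences (p_i, q_i).
\begin{equation}
(R^{*}, t^{*}) \;=\;
\operatorname*{arg\,min}_{R \in SO(3),\; t \in \mathbb{R}^{3}}
\sum_{i=1}^{n} \left\| R\,p_i + t - q_i \right\|^{2}
\end{equation}
```

Nonrigid registration replaces the pair $(R, t)$ with a richer deformation model, which is exactly where the overfitting discussed above enters.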
Article
The number of registration solutions in the literature has bloomed recently. The iterative closest point algorithm, for example, can be considered the backbone of many laser-based localization and mapping systems. Although these solutions are widely used, comparing them on a fair basis remains a common challenge. The main limitation is the lack of accurate ground truth in current data sets, which usually cover environments over only a small range of organization levels. In computer vision, the Stanford 3D Scanning Repository pushed the point cloud registration and object modeling fields forward by providing high-quality scanned objects with precise localization. We aim to provide similarly high-caliber working material to the robotics and computer vision communities, but with sceneries instead of objects. We propose eight point cloud sequences acquired in locations covering the environmental diversity that modern robots are likely to encounter, ranging from inside an apartment to a woodland area. The core of the data sets consists of 3D laser point clouds, with supporting data (gravity, magnetic north, and GPS) given for each pose. A special effort has been made to ensure global positioning of the scanner with mm-range precision, independent of environmental conditions. This will allow for the development of improved registration algorithms for mapping challenging environments, such as those found in real-world situations.
Conference Paper
We propose a direct (feature-less) monocular SLAM algorithm which, in contrast to current state-of-the-art direct methods, allows building large-scale, consistent maps of the environment. Along with highly accurate pose estimation based on direct image alignment, the 3D environment is reconstructed in real time as a pose graph of keyframes with associated semi-dense depth maps. These are obtained by filtering over a large number of pixelwise small-baseline stereo comparisons. The explicitly scale-drift-aware formulation allows the approach to operate on challenging sequences, including large variations in scene scale. Two key novelties are the major enablers: (1) a novel direct tracking method which operates on sim(3), thereby explicitly detecting scale drift, and (2) an elegant probabilistic solution to incorporate the effect of noisy depth values into tracking. The resulting direct monocular SLAM system runs in real time on a CPU.
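Direct image alignment of this kind minimizes a photometric error over a warp. The following is the generic form of such an objective, written here for clarity rather than copied from the paper; $\xi$ parameterizes the sim(3) camera motion and $\omega$ the projective warp:

```latex
% Direct (photometric) alignment: find the sim(3) motion xi minimizing
% intensity differences between the reference keyframe I_ref and the new
% image I, over pixels p with estimated depth D(p).
\begin{equation}
E(\xi) \;=\; \sum_{p \in \Omega}
\left\| I_{\mathrm{ref}}(p) - I\!\bigl(\omega(p, D(p), \xi)\bigr) \right\|_{\delta}
\end{equation}
```

Here $\|\cdot\|_{\delta}$ denotes a robust (e.g. Huber) norm, which downweights the noisy depth values mentioned above.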
Conference Paper
To register structured point clouds with large rotations and small overlaps, this paper presents an algorithm based on direction angles and the projection information of dense points. The algorithm fully exploits the geometric information of structured environments and consists of two parts: rotation estimation and translation estimation. For rotation estimation, a direction angle is defined for each point cloud, and the rotation matrix is obtained by comparing the difference between the distributions of these angles. For translation estimation, the point clouds are projected onto three orthogonal planes, and a correlation operation is performed on the projection images to calculate the translation vector. Experiments conducted on several datasets demonstrate that the proposed algorithm outperforms state-of-the-art approaches in terms of both accuracy and efficiency.
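A simplified sketch of the translation-estimation half (correlating projection images) might look as follows. Building the occupancy images and the rotation-estimation half are omitted, and the interface is hypothetical:

```python
import numpy as np

def translation_by_projection(src_img, dst_img, cell_size):
    """Recover a 2D translation between two same-shape occupancy images
    built by projecting point clouds onto a plane, via FFT-based
    cross-correlation. Repeating this on the three orthogonal planes
    yields the full 3D translation vector."""
    F = np.fft.fft2(src_img)
    G = np.fft.fft2(dst_img)
    corr = np.fft.ifft2(F * np.conj(G)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the image back to negative offsets.
    if dy > src_img.shape[0] // 2:
        dy -= src_img.shape[0]
    if dx > src_img.shape[1] // 2:
        dx -= src_img.shape[1]
    # The correlation peak sits at minus the shift from src to dst.
    return -dy * cell_size, -dx * cell_size
```

An occupancy image can be built with np.histogram2d over two coordinates of the cloud; the returned offsets are in the same metric units as cell_size.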
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
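The architecture described here can be condensed into a few lines of PyTorch. The channel widths and the 3x227x227 input size below follow the commonly cited configuration and are not stated in this abstract:

```python
import torch.nn as nn

# AlexNet-style network: five convolutional layers (some followed by
# max pooling), three fully connected layers, dropout in the FC layers,
# and a final 1000-way classifier. Expects 3x227x227 input images.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),   # logits; softmax is applied in the loss
)
```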
Conference Paper
This paper proposes a GPU (graphics processing unit)-based real-time RGB-D (red-green-blue depth) 3D SLAM (simultaneous localization and mapping) system. RGB-D data contain a 2D image and per-pixel depth information. First, 6-DOF (degree-of-freedom) visual odometry is obtained through the 3D-RANSAC (three-dimensional random sample consensus) algorithm with image features, and a projective ICP (iterative closest point) algorithm then refines the odometry estimate using the depth information. To speed up feature extraction and ICP computation, GPU-based parallel computation is performed. After loop closures are detected, a graph-based SLAM algorithm optimizes the trajectory of the sensor and the 3D map.
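A minimal sketch of the RANSAC odometry step follows (pure NumPy, illustrative only; the projective ICP refinement, GPU parallelism, and graph optimization stages are omitted, and all names are hypothetical):

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid transform mapping P onto Q (both (n, 3))."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - p0).T @ (Q - q0))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, q0 - R @ p0

def ransac_rigid(P, Q, iters=500, thresh=0.05, seed=0):
    """6-DOF pose from putative 3D feature matches P[i] <-> Q[i],
    robust to outliers: sample minimal 3-point sets, keep the pose
    with the largest inlier consensus, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = kabsch(P[idx], Q[idx])
        resid = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return kabsch(P[best_inliers], Q[best_inliers])  # final refit on inliers
```

In the pipeline described above, the pose returned here would seed the projective ICP refinement over the dense depth data.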
Article
Models of randomly packed hard spheres exhibit some features of the properties of simple liquids, e.g. the packing density and the radial distribution. The value of the maximum packing density of spheres can be determined from models if care is taken to ensure random packing at the boundary surfaces and if correction is made for volume errors at the boundaries. Experiments for both the random 'loose' and the random close-packed densities are reported with one-eighth-inch plexiglass, nylon and steel balls in air, and also with steel balls immersed in oil. A series of measurements of the random close-packed density has been made with up to 80,000 steel balls and with the aid of a mechanical vibrator. A computer analysis of the results permits a one-step, two-parameter extrapolation to infinite volume. The figure so obtained for the random close-packed density is 0.6366 ± 0.0005, which represents an improvement in precision over previous results by an order of magnitude.
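The extrapolation step can be illustrated with a short fit. The functional form below, with the boundary error scaling as the surface-to-volume ratio N^(-1/3), is an assumption chosen for illustration; the paper's exact two-parameter model is not given in this abstract.

```python
import numpy as np

def extrapolate_density(n_balls, densities):
    """Two-parameter extrapolation of measured packing density to
    infinite volume. Assuming boundary volume errors scale with the
    surface-to-volume ratio, i.e. ~ N**(-1/3), fit
    rho(N) = rho_inf + c * N**(-1/3) and read off the intercept."""
    x = np.asarray(n_balls, dtype=float) ** (-1.0 / 3.0)
    c, rho_inf = np.polyfit(x, np.asarray(densities, dtype=float), 1)
    return rho_inf

# Usage with made-up measurements, not the paper's data:
# extrapolate_density([1e4, 2e4, 4e4, 8e4], [0.628, 0.631, 0.633, 0.635])
```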