Deep learning for 2D scan matching and loop closure

Authors:
Jiaxin Li1, Huangying Zhan2, Ben M. Chen3, Ian Reid2, Gim Hee Lee4
Abstract: Although 2D LiDAR based Simultaneous Localization
and Mapping (SLAM) is a relatively mature topic
nowadays, the loop closure problem remains challenging due
to the lack of distinctive features in 2D LiDAR range scans.
Existing research can be roughly divided into correlation based
approaches e.g. scan-to-submap matching and feature based
methods e.g. bag-of-words (BoW). In this paper, we solve loop
closure detection and relative pose transformation using 2D
LiDAR within an end-to-end Deep Learning framework. The
algorithm is verified with simulation data and on an Unmanned
Aerial Vehicle (UAV) flying in an indoor environment. The loop
detection ConvNet alone achieves an accuracy of 98.2% in
loop closure detection. With a verification step using the scan
matching ConvNet, the false positive rate drops to around
0.001%. The proposed approach processes 6000 pairs of raw
LiDAR scans per second on an Nvidia GTX1080 GPU.
I. INTRODUCTION
The ability to detect and complete loop closure cor-
rectly is critical to many autonomous robotic applications.
Firstly, loop closure is a major component in the back-end
graph optimization in the front-end / back-end framework
of SLAM [1]. It helps to eliminate odometry errors that
are accumulated in the front-end for long term operations.
Secondly, localization in a prior map becomes possible
with loop closure. A direct extension of such ability is
to allow recovery from the kidnapped robot problem [2],
which significantly improves the robustness of the system.
Thirdly, cooperative mapping with multiple robots requires
loop closure algorithms, so that the submaps from various
robots can be merged into a global consistent map.
In the community of computer vision, loop closure has ex-
perienced intensive research in the past few decades. Mature
algorithms such as DBoW2, FAB-MAP are widely used in
state-of-the-art visual SLAM systems including ORB-SLAM
and LSD-SLAM etc. In contrast, little attention has
been given to loop closure with 2D LiDAR sensors. Most
popular solutions such as FastSLAM [3] and Hector SLAM
[4] are LiDAR-based odometry without loop closure. Only
very recently was LiDAR based loop closure implemented
together with odometry in the Google Cartographer [5].
In comparison to images, 2D scans encode much less
information. The lack of rich intensity gradients, which
are most commonly used in image-based feature extraction,
makes it extremely difficult to extract distinctive features for
loop closure detection. Furthermore, some structured features
e.g. corners have weak variations in the range measurements.
Consequently, a naive migration of feature detectors and
descriptors from computer vision is impractical [6].

The first two authors contributed equally. The source code of this work
is released at http://uav.ece.nus.edu.sg/li2017deep.html
1 Jiaxin Li is with the Graduate School for Integrative Science &
Engineering, National University of Singapore (NUS). jli@u.nus.edu
2 Huangying Zhan and Ian Reid are with The University of Adelaide and
The Australian Centre of Excellence in Robotic Vision.
3 Ben M. Chen is with the Department of Electrical and Computer
Engineering, NUS.
4 Gim Hee Lee is with the Department of Computer Science, NUS.

Existing works on detecting loop closures mainly focus
on designing proper feature extraction, followed by Nearest
Neighbor (NN) search, BoW retrieval or classifier etc. In-
spired by the recent astonishing success of deep learning al-
gorithms in extracting and classifying features, we approach
the 2D LiDAR loop closure problem as a classification
problem, to answer whether a pair of laser scans were captured
at nearby locations. It is shown that our formulation leads to
extremely high accuracy. Moreover, by making use of state-
of-the-art deep learning framework and the computational
power of contemporary GPUs, loop closure can be performed
in a short time with exhaustive matching, which was believed
to be impractical previously [7].
The main contribution of this paper is to introduce a
solution for scan matching and loop closure detection using
2D LiDAR scans within a single deep learning framework
as shown in Figure 1. The design, training and analysis
of our proposed deep network are discussed in Section III.
In Section IV, the proposed algorithm is validated by both
simulation and real world applications to provide reliable
detection of loop closure at an extremely high speed. In
addition, the visualization and analysis of the trained network
give a novel insight into the type of features that are critical
to scan matching and loop closure detection. Finally, the
conclusion is elaborated in Section V.
II. RELATED WORK
Early attempts utilized Monte Carlo localization to solve
2D LiDAR based loop closure, but the idea of searching in
the pose space is not scalable. Subsequently, Stachniss et
al. [8] improved standard Rao-Blackwellized algorithm by
associating both occupancy grid map and topological map
to each particle. In [9], Neira et al. achieved linear time
relocation by applying geometric constraints on features in
a stochastic map.
1) Feature Based Approach: A majority of research fo-
cuses on feature based approaches, i.e. detect and describe
features from either single scan or submap, and employ
various data association or classification algorithms to detect
loop closure. In [10], Bosse and Zlot described submaps with
orientation histogram, projection histogram, and a novel en-
tropy sequence of projection histogram. Also, they proposed
an exhaustive approach to deal with unstructured environments,
which common correlation methods cannot handle.
In 2009, Bosse and Zlot [11] further extended their work by
proposing a methodology to design keypoint locations and
descriptors that models the region around a keypoint.
In 2010, Tipaldi and Arras [6] proposed the Fast Laser Interest
Region Transform (FLIRT) algorithm that uses a curvature based
detector and a beta grid descriptor. The former defines an
integral operator to map the scan curve into a multi-scale
parameterization. The latter encodes occupancy probabilities
and the variance in a polar histogram. This work was further
improved by Tipaldi et al. in 2013 [12], where Geometrical
FLIRT Phrases are introduced as an efficient retrieval
approach. The extracted FLIRT features from 10000 training
scans are clustered into a few hundred words, i.e. a vocabulary.
By using a modified Inverse Document Frequency (IDF)
technique, in which scan ID numbers are stored against the words,
place recognition can be greatly accelerated. Also, Tipaldi et
al. proposed a geometry verification step to compare any two
scans by checking the clockwise ordering of their FLIRT
features. This idea of utilizing geometrical information was
extended by Himstedt et al. in 2014 [7], where the relative
orientation of the landmarks and the distance of co-occurring
landmarks are encoded into the original FLIRT descriptor.
2) Optimization Based Approach: In 2015, Olson [13]
adopted the multi-resolution image pyramid method to
achieve robust scan-to-scan matching. In addition, a heap
structure is utilized to realize “one-to-many” and “many-
to-many” search, which makes it possible to perform loop
closure in large scale environments. A year later, Hess et al.
published the Google Cartographer LiDAR SLAM algorithm
[5], which implements odometry together with a real-time
loop closure algorithm. The key idea is building multiple
submaps, and aligning new scans to nearby submaps to
generate constraints on a graph. The real-time alignment is
achieved by a branch-and-bound approach.
3) Learning with LiDAR: The benefits of deep learning
have not been fully felt in the field of geometric matching
of 2D LiDAR data. Other machine learning techniques were
employed for loop-closure problem. In 2009, Granstrom et
al. [14] used an AdaBoost classifier and designed 20 hand-
crafted features e.g. average range etc. to serve as the input
to the classifier. The detection rate is 85% and is likely to
be higher if deep learning algorithms were employed.
Nicolai et al. [15] utilized ConvNets on processing 3D
LiDAR scans for odometry, but the results are disappointing
compared with existing scan matching techniques. The closest
work to ours is by Pfeiffer et al. [16], who trained a
target-oriented navigation model to give steering commands.
However, their work focused on deep learning with motion
planning, which is different from ours.
III. DESIGN OF THE DEEP NETWORK
This section describes our end-to-end deep network for
the scan matching and loop closure detection problems. Our
network shown in Figure 1 first learns the features needed for
both scan matching and loop closure detection in a common
stack of convolutional layers. Next, these features are passed
on to two separate fully connected layers for scan matching
and loop closure detection respectively.
A. Fully Connected Layers
One of the fully connected layer stacks is designed to do
regression and compute the relative transformation between
two given scans. The other fully connected layer stack is
designed to do classification and determine the presence of
a loop closure opportunity.
1) Scan Matching: Given any pair of LiDAR scans $s_i$,
the objective is to find the relative pose transformation
$T = [\Delta x, \Delta y, \Delta\theta]^T$ between the two scans.
$\Delta x$ and $\Delta y$ represent the translation, and $\Delta\theta$
is the rotation angle. Let us denote $g(\cdot)$
as the unknown function that maps $s_i$ to $T$:
$$T = g(s_i). \qquad (1)$$
The function $g(\cdot)$ is learned during the training phase. The
cost to be optimized is defined as the Euclidean loss
$\|T_{\text{label}} - T\|_2$, where $T_{\text{label}}$ represents the label, i.e. ground truth.
2) Loop Closure Detection: Scan-to-scan loop closure
detection checks whether a place is revisited given a pair
of scans $s_i$. This can be formulated as a classification
problem, where class 0 represents no loop closure and class
1 represents loop closure detected. We design and train a
ConvNet as a scoring function $f(s_i) \in \mathbb{R}^2$ to assign scores
to the two classes with a pair of scans as input. The cost to
be minimized is the cross-entropy loss,
$$L_i = -\log(p) = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) \qquad (2)$$
where $y_i \in \{0, 1\}$ is the ground truth label, and $f_j$ represents
the $j$-th element in the vector $f$. Here $p = e^{f_{y_i}} / \sum_j e^{f_j}$ is the
Softmax function. For instance, for a pair of scans $s_i$ that
belongs to the same place, $p = e^{f_1} / (e^{f_0} + e^{f_1})$. If the scoring
function $f = [f_0, f_1]^T$ gives $f_1 \gg f_0$, the cross-entropy
loss $L_i$ will be near zero. On the contrary, the loss $L_i$ will be
large if $f_1 \ll f_0$.
In a probabilistic view, the Softmax function $p$ can be
interpreted as a normalized probability of assigning the correct
label $y_i$ given the input $s_i$ and the network $f$. Minimizing
the cross-entropy loss $L_i$ is equivalent to minimizing the negative
log likelihood of the correct class, i.e. performing Maximum
Likelihood Estimation (MLE).
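The behaviour of the cross-entropy loss in Equation (2) can be checked numerically. The following is a minimal sketch in plain Python, not the training code of this work; the function name `cross_entropy_loss` and the example scores are illustrative assumptions.

```python
import math

def cross_entropy_loss(scores, label):
    """Return -log(softmax(scores)[label]) using a stable log-sum-exp."""
    m = max(scores)  # subtract the maximum to avoid overflow in exp()
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

# A true loop pair (label 1) scored with f1 >> f0 gives a loss near zero,
# while f1 << f0 gives a large loss. The scores are made-up examples.
loss_when_right = cross_entropy_loss([-5.0, 5.0], 1)
loss_when_wrong = cross_entropy_loss([5.0, -5.0], 1)
```

With equal scores for both classes the loss equals log 2, which corresponds to a softmax probability of 0.5 for the correct class.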
B. Convolutional Neural Network
Both g(.)and f(.)are complex functions that take scans
as input and generate either pose transformation or loop
closure classification. We use ConvNets to learn the features
needed for these functions because they are proven to be ca-
pable of modeling highly non-linear problems. Furthermore,
the choice of ConvNet is inspired by previous efforts on
SLAM and loop closure detection. Most early algorithms
like FastSLAM rely on features / landmarks to perform
localization, while recent loop closure solutions with LiDAR
or camera depend heavily on detecting and matching features.
It is well known that convolutional layers in a ConvNet can
be interpreted as transforming inputs into feature representations,
which are used for regression or classification later in
the fully connected layers. A consistent result in the deep
learning community is that the learned features outperform
hand-crafted ones in most situations.

Fig. 1. Our deep network for scan matching and loop closure detection. There is a common stack of convolutional layers with two fully connected layers
for each task.

Based on the assumption that the features needed for scan
matching are similar to features needed for loop closure
detection, we design a single stack of convolutional layers
to learn the features needed for g(.)and f(.)as shown in
Figure 1. This assumption is verified during the training
phase stated in Section III-C. The design of the convolutional
layers follows the ResNet framework [17]. Similar to results
in the computer vision community, we found that the residual
structure has superior performance in both scan matching and
loop closure detection compared with traditional networks.
C. Training the Network
1) Training the Scan Matching: The input is a pair of laser
scans, i.e. a 2×1081 matrix, for the scan matching network.
The output is a vector $T = [\Delta x, \Delta y, \Delta\theta]^T$. We set up a UAV
simulation platform to get millions of samples for training.
The simulator is built with Robot Operating System (ROS)
and Gazebo, where the UAV is mounted with a Hokuyo
UTM-30LX range finder with a maximum range of 30 m
and a field-of-view of 270°. Multiple indoor environments
are constructed for data acquisition. The maximum linear
velocity of the simulated drone is 0.8 m/s, and the maximum
yaw velocity is 0.5 rad/s. The ground truth positions
$[x_j, y_j, \theta_j]^T$ and the range measurements are recorded at a
rate of 10 Hz. A training sample can be acquired according
to Equation (3), where $c$ is an integer to control the interval
between two range measurements.
$$\begin{aligned}
\Delta x &= (x_{j+c} - x_j)\cos\theta_j + (y_{j+c} - y_j)\sin\theta_j \\
\Delta y &= (y_{j+c} - y_j)\cos\theta_j - (x_{j+c} - x_j)\sin\theta_j \\
\Delta\theta &= \theta_{j+c} - \theta_j
\end{aligned} \qquad (3)$$
For example, the actual frequency of generating
$[\Delta x, \Delta y, \Delta\theta]^T$ is 2 Hz when $c = 5$. Here we choose
$c = \{5, 10, 20\}$, i.e. the training samples are a combination
of 2 Hz, 1 Hz and 0.5 Hz data. Considering the maximum
velocity of the simulated UAV, $\Delta x$ and $\Delta y$ are 1.6 m at most
while $\Delta\theta$ is 1 rad at maximum in the training dataset. In
addition, samples of $c = \{-5, -10, -20\}$ are added so that
both forward and backward motions can be captured. A
training dataset consisting of around 3 million samples is
created for the scan matching ConvNet.
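The label generation of Equation (3) can be sketched as follows. This is an illustrative reading of the equation, not the released data pipeline: the names `relative_pose` and `make_samples` are hypothetical, poses are assumed to be `(x, y, theta)` tuples logged at 10 Hz, and the angle difference is left unwrapped because Equation (3) does not mention wrapping.

```python
import math

def relative_pose(pose_j, pose_jc):
    """Express pose_jc in the frame of pose_j, following Equation (3).

    Each pose is (x, y, theta) in the world frame.
    """
    xj, yj, tj = pose_j
    xc, yc, tc = pose_jc
    dx_w, dy_w = xc - xj, yc - yj  # displacement in the world frame
    dx = dx_w * math.cos(tj) + dy_w * math.sin(tj)
    dy = dy_w * math.cos(tj) - dx_w * math.sin(tj)
    dtheta = tc - tj  # not wrapped to (-pi, pi]; wrapping is not stated in the text
    return dx, dy, dtheta

def make_samples(poses, c):
    """Pair every index j with j + c (c may be negative) and compute the label."""
    n = len(poses)
    return [(j, j + c, relative_pose(poses[j], poses[j + c]))
            for j in range(n) if 0 <= j + c < n]
```

Calling `make_samples(poses, 5)` on a 10 Hz log yields pairs that are 0.5 s apart, i.e. the 2 Hz data mentioned above, and `make_samples(poses, -5)` yields the corresponding backward-motion samples.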
2) From Scan Matching to Loop Detection: The
dataset generation for loop closure detection is similar
to that for scan matching. Samples of
$c = \{\pm 5, \pm 10, \pm 20, \pm 50, \pm 500, \pm 2500\}$ are built. The labels are
set to 1, i.e. loop detected, if the pose transformation satisfies
$\Delta x < 1.6$, $\Delta y < 1.6$, $\Delta\theta < 1$. On the other hand, samples
with large transformations are set to label 0.
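The labeling rule can be written as a short predicate. This is a sketch under one stated assumption: the thresholds are applied to absolute values, since samples with negative intervals (backward motion) are included; the text itself gives the inequalities without absolute-value bars. The function name `loop_label` is hypothetical.

```python
def loop_label(dx, dy, dtheta, max_xy=1.6, max_theta=1.0):
    """Return 1 (loop) if the relative pose is within the thresholds, else 0.

    Assumption: thresholds are compared against magnitudes.
    Units are metres for dx, dy and radians for dtheta.
    """
    within = abs(dx) < max_xy and abs(dy) < max_xy and abs(dtheta) < max_theta
    return int(within)
```

Under this rule a pair 1.0 m apart with a 0.3 rad heading difference is labeled 1, while a pair 5 m apart is labeled 0 regardless of heading.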
Loop closure detection shares the same convolutional
layers with the scan matching ConvNet. By back-propagating
only through the fully connected layers, we obtained a test accuracy
of 98.2%. We also attempted to train the loop detection
ConvNet from scratch without the pre-trained convolutional
layers, but this resulted in a lower testing accuracy
of 95%. This result supports our hypothesis that the
features extracted by the convolutional layers of the scan matching
ConvNet can be directly applied to detecting loop closures.
D. Visualization and Analysis
In order to understand the semantic meaning of learned
features, the learned convolutional kernels, convolved output
(feature maps) and the neuron responses are visualized.
1) Visualization: There are two methods adopted in this
work for the visualization. The first method focuses on kernels
and feature maps. After visualizing kernels learned in all
convolutional layers, we found that the kernels usually follow
a few patterns in each layer, except those learned to be
close to zero. For example, in Figure 2(a), there are
typically two patterns in the first convolutional layer: a
concave kernel and a convex kernel.
The second visualization method concentrates on the
receptive field with high neuron response, i.e., we try to find
the range measurements that can produce high values after
convolution with the kernels. In Figure 2(c), the 1081 range
measurements generate a feature map of length 360 after
convolving with a 1×7 kernel (stride equal to 3). Each
value in the feature map corresponds to a receptive field in
the input scan, i.e., 7 range measurements. The high response
receptive fields are visualized in Figures 2 and 3.
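The mapping between a feature-map position and its receptive field follows from the standard 1D convolution arithmetic. The sketch below uses hypothetical helper names and treats padding as a free parameter, because the text does not state it. Without padding, a 1×7 kernel with stride 3 over 1081 samples yields 359 outputs, so the reported length of 360 is consistent with a small amount of padding (3 to 5 samples in total).

```python
def conv1d_output_length(n_in, kernel, stride, pad_total=0):
    """Number of outputs of a 1D convolution (floor convention)."""
    return (n_in + pad_total - kernel) // stride + 1

def receptive_field(out_index, kernel, stride, pad_left=0):
    """Inclusive (first, last) input indices seen by one output value.

    Indices below 0 or beyond the last sample fall in the padded region.
    """
    first = out_index * stride - pad_left
    return first, first + kernel - 1

no_padding = conv1d_output_length(1081, 7, 3)                 # 359
with_padding = conv1d_output_length(1081, 7, 3, pad_total=4)  # 360
```

Given the index of a high-response feature-map value, `receptive_field` returns the 7 range measurements to highlight in the input scan.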
2) Analysis: From the visualization results shown in
Figure 2(b) and Figure 2(c), it is found that the kernels in the
first convolutional layer have an effect similar to distance
filters. Kernel 0 responds strongly to the small range measurements
while kernel 1 responds to the large measurements.
The 2D Cartesian space visualization supports this interpretation.
In Figure 2(b), the receptive fields with high response are
the points that are close to the origin, while in Figure 2(c),
the distant points produce high response.

Fig. 2. (a) 64 kernels learned in the first convolutional layer. (b) The first row is the visualization of Kernel 0; the second row plots the feature map
of Kernel 0. In the original input scan shown in the third row, highlighted areas are the receptive fields corresponding to the top-5 feature map responses. The
intensity of the highlighted area is positively correlated to the strength of the response. The fourth row draws the input scan in 2D Cartesian space where the high
response measurements are highlighted in red. (c) Similar visualization result based on Kernel 1.

Fig. 3. (a) Kernel 1 in the 7th convolutional layer and corresponding visualization. (b) Kernel 1 in the 11th convolutional layer and corresponding results. (c)
High response examples from the deepest residual layer.

Further investigations are carried out on deeper layers.
Some visualization results from various convolutional layers,
including kernels and feature maps, are plotted in Figure 3(a)
and 3(b). Also, some high response receptive fields from the
deepest convolutional layer are shown in 2D Cartesian coordinates,
as in Figure 3(c). Unfortunately, the physical meanings
of the kernels in deeper layers are not as obvious as those
in the first convolutional layer (CONV1). Some distant lines or
close lines, and points at the edge of lines, produce high response.
Surprisingly, there are few corner points that generate high response in
the ConvNet. Although it is not clear why these regions
are emphasized by ConvNets, the visualization results are
inspiring. It may be a good choice to pay more attention to
edge points, distant lines and close lines when dealing with
geometrical perception of 2D laser scans.
E. Geometrical Validation
After training, the loop closure detection ConvNet
achieves a testing accuracy of 98.2% that outperforms
existing algorithms. However, the 1.8% error may still lead to
false alarms in large scale applications. Therefore a validation
scheme is proposed to reduce the false positive rate to almost
zero. In the case that the loop closure detection ConvNet
reports a positive prediction, the same pair of input scans
$(s_0, s_1)$ is put into the scan matching ConvNet to get a pose
transformation $T = [\Delta x, \Delta y, \Delta\theta]^T$. We align $s_1$ with $s_0$
using the predicted transformation $T$ in the Cartesian space:
$$s'_{1i} = \begin{bmatrix} \cos\Delta\theta & -\sin\Delta\theta \\ \sin\Delta\theta & \cos\Delta\theta \end{bmatrix} \begin{bmatrix} s_{1ix} \\ s_{1iy} \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}, \qquad (4)$$
$$r_i = \| s'_{1i} - \mathrm{NN}_{s_0}(s'_{1i}) \|_2, \qquad (5)$$
where $\mathrm{NN}_{s_0}(s'_{1i})$ is the nearest neighbor search of the
transformed point $s'_{1i}$ in the scan $s_0$ in the Cartesian space.
Intuitively, the error vector $r$ measures how well the two
scans align given a geometrical transformation.
Inspired by [18], a robust t-distribution weighting $w_i$ is
applied to the alignment cost so that the alignment error
caused by occlusion and noise etc. can be handled properly:
$$w_i = \frac{v + 1}{v + (r_i / \sigma)^2}. \qquad (6)$$
$\sigma$ in (6) is calculated iteratively using
$$\sigma^2 = \frac{1}{n} \sum_i r_i^2 \, \frac{v + 1}{v + (r_i / \sigma)^2}, \qquad (7)$$
where the tunable parameter $v = 5$ is a typical choice.
A robust alignment error $\rho(r) = \sum_i w_i r_i$ is computed for
each positive classification result, to filter out those with high
alignment error. The effect of such geometrical validation is
twofold. First, it rejects false positive classifications of
the loop detection ConvNet. Second, the validation also fails,
and the candidate is rejected, when the loop closure classification
gives a correct prediction but the scan matching regression is
inaccurate. The geometrical validation thus benefits the system by
rejecting both incorrect and inaccurate loop closures.
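Equations (4) to (7) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: scans are assumed to be already converted to lists of `(x, y)` points, the nearest neighbor search is a brute-force loop, the starting value of σ², the fixed number of iterations and the small floor on σ² are choices made here, and all function names are hypothetical. The rejection threshold on ρ(r) is not given in the text and is therefore left out.

```python
import math

def transform_scan(points, dx, dy, dtheta):
    """Equation (4): rotate each point by dtheta, then translate by (dx, dy)."""
    c, s = math.cos(dtheta), math.sin(dtheta)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

def nn_residuals(moved, reference):
    """Equation (5): distance from each moved point to its nearest reference point."""
    return [min(math.hypot(px - qx, py - qy) for qx, qy in reference)
            for px, py in moved]

def t_weights(residuals, v=5.0, iterations=20):
    """Equations (6)-(7): fixed-point iteration for sigma^2, then the weights."""
    n = len(residuals)
    # Initial guess: mean squared residual (an assumption of this sketch).
    sigma2 = max(sum(r * r for r in residuals) / n, 1e-12)
    for _ in range(iterations):
        sigma2 = sum(r * r * (v + 1.0) / (v + r * r / sigma2)
                     for r in residuals) / n
        sigma2 = max(sigma2, 1e-12)  # floor avoids division by zero
    return [(v + 1.0) / (v + r * r / sigma2) for r in residuals]

def robust_alignment_error(scan0, scan1, pose, v=5.0):
    """rho(r) = sum_i w_i * r_i after aligning scan1 to scan0 with pose."""
    moved = transform_scan(scan1, *pose)
    residuals = nn_residuals(moved, scan0)
    weights = t_weights(residuals, v)
    return sum(w * r for w, r in zip(weights, residuals))
```

A correct pose prediction drives the error toward zero, while a wrong prediction for the same scan pair leaves a large error, which is the quantity used to filter loop closure candidates.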
IV. EXPERIMENTS
The scan matching and loop closure detection ConvNets
are tested in a simulation platform, as well as on a UAV
flying in an indoor environment. The experiments focus
on detecting and performing loop closure. Our scan-to-scan
framework makes it easy to embed the ConvNets into any
modern 2D LiDAR SLAM algorithms. In particular, we
enhance the Hector SLAM proposed by Kohlbrecher et al.
[4] with our proposed approach, which results in significant
improvement in simulation and UAV navigation.
In the ConvNet enhanced SLAM, odometry is performed
by the Hector SLAM and keyframes are created incrementally
so that each pair of adjacent keyframes satisfies
$\Delta x < 1.6$ m, $\Delta y < 1.6$ m, $\Delta\theta < 1.0$ rad, which is the
same as the setting in Section III-C. On the creation of
each keyframe, it is checked against every keyframe created
before using the loop closure detection ConvNet. The few
keyframes created immediately before the new keyframe are
skipped because it is unnecessary to perform loop closure for
neighbor measurements. The possible loop closing keyframes
reported by the loop detection ConvNet will go through the
geometrical validation using the scan matching ConvNet. For
the confirmed loop closures, the pose transformations are
further refined with standard Iterative Closest Point (ICP)
algorithm, based on the prediction of the scan matching
ConvNet. A graph is built to capture the detected loops and
the respective transformation.
A. Simulation
A diverse indoor environment, shown in the bottom-left of
Figure 4(b), is built with ROS and Gazebo, with the size of
around 60 m × 60 m. In addition, Gaussian noise $N(0, 0.05^2)$
is added to the measurements of the simulated Hokuyo UTM-30LX range
finder. The loop closure detection follows the brute-force
searching strategy mentioned above. During the simulation
shown in Figure 4(a), the simulated drone travels for 143m,
resulting in 288 keyframes. In total 39340 laser scan pairs
are tested with the ConvNet, shown in Table I.
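The brute-force search can be sketched as a double loop over keyframes. The text states only that "the few keyframes" immediately preceding a new keyframe are skipped; the value 7 used below is an inference made here, chosen because it reproduces both reported pair counts (39340 pairs for 288 keyframes, and 780 pairs for the 47 keyframes of the flight experiment). The function name is hypothetical.

```python
def brute_force_pairs(num_keyframes, skip_recent):
    """List (old, new) keyframe index pairs checked by exhaustive search.

    Each new keyframe is compared with every earlier keyframe except the
    skip_recent keyframes created immediately before it.
    """
    pairs = []
    for new in range(num_keyframes):
        for old in range(new - skip_recent):  # empty when new <= skip_recent
            pairs.append((old, new))
    return pairs

SKIP_RECENT = 7  # inferred from the reported pair counts, not stated in the text
simulation_pairs = len(brute_force_pairs(288, SKIP_RECENT))
flight_pairs = len(brute_force_pairs(47, SKIP_RECENT))
```

Each returned pair would be passed to the loop detection ConvNet, and positive predictions would then go through the geometrical validation.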
In the simulation, we try performing SLAM with only
the ConvNets, i.e., the scan matching ConvNet instead of
the Hector SLAM plays the role of odometry. The resulting
green trajectory is not perfect but still in accordance with our
expectation. In odometry, any tiny transformation inaccuracy
will accumulate and finally result in a large error. Usually it
is difficult to train a regression ConvNet to perfectly track a
highly non-linear function like scan matching.

TABLE I
PERFORMANCE OF LOOP CLOSURE DETECTION
(sim.: simulation; fly.: UAV flight; GV: geometrical validation)

Experiment     Scan Pairs   TP   FP    Precision   Recall   Acc.
sim. w/o GV    39340        6    115   5.0%        100%     99.7%
sim. w/ GV     39340        6    0     100%        100%     100%
fly. w/o GV    780          13   26    33.3%       28.8%    92.5%
fly. w/ GV     780          13   0     100%        28.8%    95.9%

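The precision, recall and accuracy columns of Table I can be recomputed from the counts. The table lists TP and FP only, so the number of false negatives (FN) is an assumption in this sketch: 0 for the simulation follows from the 100% recall, while 32 for the flight is inferred here because 13 / (13 + 32) ≈ 28.9% is the closest match to the reported 28.8%. The function name is hypothetical.

```python
def detection_metrics(total_pairs, tp, fp, fn):
    """Return (precision, recall, accuracy) as fractions."""
    tn = total_pairs - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total_pairs
    return precision, recall, accuracy

# FN values are inferred, not listed in Table I (see the note above).
sim_without_gv = detection_metrics(39340, tp=6, fp=115, fn=0)
fly_without_gv = detection_metrics(780, tp=13, fp=26, fn=32)
fly_with_gv = detection_metrics(780, tp=13, fp=0, fn=32)
```

Under these assumptions the recomputed values agree with the table to within about 0.1 percentage points, with the small differences explained by rounding versus truncation.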
In Figure 4(a), the original Hector SLAM, represented by
the blue line, begins drifting after a few meters because
of the augmented scan noise and the challenging diverse
environment. When the drone returns to the starting point,
loop closure is detected and performed by the ConvNets.
The trajectory after loop closure is shown in the black line,
which has competitive accuracy compared to the Google
Cartographer [5] algorithm. Because of the modern deep
learning framework, our approach performs thousands of
detections per second, which is significantly faster than
traditional algorithms including the Google Cartographer.
B. Real-time Loop Closure on UAV
To evaluate the performance and robustness in practical
environment, a UAV is navigated with the proposed ConvNet
enhanced Hector SLAM. The occupancy map of the flight
environment is shown in the bottom-right of Figure 4(b).
1) UAV Setup: As shown in Figure 4(b), the self-built
UAV is equipped with a Hokuyo UTM-30LX range finder.
An Intel NUC computer is mounted to run a ROS based
navigation software, which includes the UAV localization
and an A-star based path planning algorithm. The sensor data
from the range finder are sent back to a workstation in real
time via WiFi. The workstation runs the proposed CNN
enhanced Hector SLAM algorithm and sends the localization
result back to the UAV. Similarly, brute-force searching
strategy is used, and loop closures are conducted using the
ICP refined scan matching. In addition, a VICON system is
used to record the ground truth trajectory of the UAV.
2) Results: Although the room is relatively neat for LiDAR
sensors, significant drift can be found for the original Hector
SLAM algorithm. That is probably because the fluctuation
in the height of the UAV, including the takeoff and landing
process, violates the planar assumption of 2D SLAM. In
other words, the changes of range measurements come
from not only the planar movement of the UAV, but also
the environment’s structure that varies in different heights.
Again, with the proposed ConvNet based loop closure, the
optimized trajectory in black line is much closer to the
ground truth, shown in Figure 4(c). In total there are 47
keyframes, and the results are illustrated in Table I.
Fig. 4. (a) The simulation shows effective loop closure with the proposed solution (plotted trajectories: ConvNet + Hector SLAM, Hector SLAM, ConvNet, Google Cartographer, ground truth). (b) The hardware setup of the UAV equipped with a Hokuyo UTM-30LX
range finder and the VICON system. (c) The proposed approach significantly improves the localization accuracy when navigating a UAV (plotted trajectories: ConvNet + Hector SLAM, Hector SLAM, VICON ground truth).

C. Run Time
With an Nvidia GTX1080 GPU of the Pascal architecture,
it takes about 1 s to process 6000 laser scan pairs. In our
case, loop closure can be detected as long as the relative
location of any pair of scans satisfies $\Delta x < 1.6$, $\Delta y < 1.6$,
$\Delta\theta < 1$. Therefore, taking as an example keyframes that are
created once the robot travels for 1 m or steers for 0.6 rad,
our approach is able to perform brute-force loop detection over
6 km of trajectory per second. This was previously infeasible given the range
measurements as the only input. Moreover, if some other
searching strategies, like those taking position uncertainty
into account, are employed, the proposed approach can
process a huge map in a short time. Also, such a deep
learning based algorithm can be even faster in the near future
because of the rapid development of GPUs.
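The run-time estimate is simple arithmetic, sketched below with hypothetical function names. It assumes a constant throughput of 6000 scan pairs per second and one keyframe per metre of travel, as in the example above, and ignores the geometrical validation and ICP refinement steps.

```python
def trajectory_checked_per_second(pairs_per_second, keyframe_spacing_m):
    """Metres of past trajectory one new keyframe can be compared against in 1 s."""
    return pairs_per_second * keyframe_spacing_m

def exhaustive_check_seconds(num_keyframes, pairs_per_second):
    """Time to test every unordered keyframe pair once."""
    total_pairs = num_keyframes * (num_keyframes - 1) // 2
    return total_pairs / pairs_per_second

coverage_m = trajectory_checked_per_second(6000, 1.0)  # 6000 m, i.e. 6 km
```

For instance, checking all 41328 unordered pairs among 288 keyframes would take roughly 7 s at this throughput.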
V. CONCLUSION
In this paper, we proposed a deep learning based frame-
work to solve the loop closure problem for 2D LiDAR.
To the best of our knowledge, this is the first work that
performs scan matching and loop closure detection using
ConvNets. According to experiments in simulation and on
a UAV, our approach achieves the accuracy of 98.2% and
false positive rate of around 0.001%, even in unseen and
noisy environment. With a contemporary GPU, our approach
is able to process 6000 pairs of laser scans per second.
In addition, it is revealed that features learned by the
ConvNets concentrate on the furthest and the nearest range
measurements, which differ a lot from previous hand-crafted
scan features. The visualization and analysis of ConvNets
show inspiring results for understanding the types of features
that are critical for geometrical perception.
Although the deep learning based framework achieves
satisfying loop detection performance, the scan matching
ConvNet is relatively inaccurate compared to traditional
scan-to-scan or scan-to-map methods. In loop closure de-
tection, any scan-to-scan approach always suffers from the
intrinsic ambiguity that similar scan measurements may not
belong to the same place. To utilize the modern ideas of scan-
to-map and submap-to-submap, combining Recurrent Neural
Networks (RNNs) or Long Short Term Memory (LSTM)
Networks with ConvNets should be a promising direction.
ACKNOWLEDGEMENTS
This work was partially supported by a NUS start-up grant
R-252-000-636-133.
REFERENCES
[1] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira,
I. D. Reid, and J. J. Leonard, “Simultaneous localization and map-
ping: Present, future, and the robust-perception age,” arXiv preprint
arXiv:1606.05830, 2016.
[2] S. Thrun, W. Burgard, and D. Fox, Probabilistic robotics. MIT press,
2005.
[3] M. Montemerlo, S. Thrun, D. Koller, B. Wegbreit, et al., “FastSLAM:
A factored solution to the simultaneous localization and mapping
problem,” in AAAI/IAAI, 2002, pp. 593–598.
[4] S. Kohlbrecher, J. Meyer, O. von Stryk, and U. Klingauf, “A flex-
ible and scalable slam system with full 3d motion estimation,” in
Proc. IEEE International Symposium on Safety, Security and Rescue
Robotics (SSRR), November 2011.
[5] W. Hess, D. Kohler, H. Rapp, and D. Andor, “Real-time loop closure
in 2d lidar slam,” in Robotics and Automation (ICRA), 2016 IEEE
International Conference on, 2016, pp. 1271–1278.
[6] G. D. Tipaldi and K. O. Arras, “Flirt-interest regions for 2d range
data,” in Robotics and Automation (ICRA), 2010 IEEE International
Conference on, 2010, pp. 3616–3622.
[7] M. Himstedt, J. Frost, S. Hellbach, H.-J. Böhme, and E. Maehle,
“Large scale place recognition in 2d lidar scans using geometrical
landmark relations,” in Intelligent Robots and Systems (IROS), 2014
IEEE/RSJ International Conference on, 2014, pp. 5030–5035.
[8] C. Stachniss, G. Grisetti, D. Hähnel, and W. Burgard, “Improved
rao-blackwellized mapping by adaptive sampling and active loop-closure,”
in Proceedings of the Workshop on Self-Organization of AdaptiVE
behavior (SOAVE), 2004, pp. 1–15.
[9] J. Neira, J. D. Tardós, and J. A. Castellanos, “Linear time vehicle
relocation in slam,” in Robotics and Automation (ICRA), 2003 IEEE
International Conference on, 2003, pp. 427–433.
[10] M. Bosse and R. Zlot, “Map matching and data association for
large-scale two-dimensional laser scan-based slam,” The International
Journal of Robotics Research, vol. 27, no. 6, pp. 667–691, 2008.
[11] ——, “Keypoint design and evaluation for place recognition in 2d
lidar maps,” Robotics and Autonomous Systems, vol. 57, no. 12, pp.
1211–1224, 2009.
[12] G. D. Tipaldi, L. Spinello, and W. Burgard, “Geometrical flirt phrases
for large scale place recognition in 2d range data,” in Robotics and
Automation (ICRA), 2013 IEEE International Conference on, 2013,
pp. 2693–2698.
[13] E. Olson, “M3rsm: Many-to-many multi-resolution scan matching,” in
Robotics and Automation (ICRA), 2015 IEEE International Conference
on, 2015, pp. 5815–5821.
[14] K. Granstrom, J. Callmer, F. Ramos, and J. Nieto, “Learning to detect
loop closure from range data,” in Robotics and Automation (ICRA),
2009 IEEE International Conference on, 2009, pp. 15–22.
[15] A. Nicolai, R. Skeele, C. Eriksen, and G. A. Hollinger, “Deep learning
for laser based odometry estimation,” in Deep Learning Workshop at
Robotics: Science and Systems Conference 2016, Oct 2016.
[16] M. Pfeiffer, M. Schaeuble, J. Nieto, R. Siegwart, and C. Cadena,
“From perception to decision: A data-driven approach to end-to-
end motion planning for autonomous ground robots,” arXiv preprint
arXiv:1609.07910, 2016.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2016, pp. 770–778.
[18] C. Kerl, J. Sturm, and D. Cremers, “Robust odometry estimation
for rgb-d cameras,” in Robotics and Automation (ICRA), 2013 IEEE
International Conference on, 2013, pp. 3748–3754.