VISUAL ODOMETRY CORRECTION BASED ON
LOOP CLOSURE DETECTION
L. CARAMAZANA, R. ARROYO and L. M. BERGASA
Robesafe Research Group, Department of Electronics, University of Al-
calá (UAH). E-mail contact: lidia.caramazana.zarzosa@gmail.com
An essential requirement in the fields of robotics and intelligent transportation systems is to know the position of a mobile robot over time, as well as the trajectory that it describes, by using on-board sensors. In this paper, we propose a novel approach focused on the use of cameras as perception sensors for visual localization in unknown environments. Our system performs a robust visual odometry, where correction algorithms based on loop closure detection are applied to correctly identify the location of a robot in long-term situations. In order to satisfy the previous conditions, we carry out a methodological improvement of some classic computer vision techniques. In addition, new algorithms are implemented with the aim of compensating the drift produced in the visual odometry calculation along the traversed path. Accordingly, our main goal is to obtain an accurate estimation of the position, orientation and trajectory followed by an autonomous vehicle. Sequences of images acquired by an on-board stereo camera system are analyzed without any previous knowledge about the real environment. Several results obtained from these sequences are presented to demonstrate the benefits of our proposal.
Open Conference on Future Trends in Robotics

1 Introduction

In recent years, the estimation of the pose of an autonomous robot or vehicle using computer vision techniques has become a topic of great interest in the robotics community. This is due to the improvements in camera features and to their reduced costs with respect to other sensors traditionally used for localization tasks, such as GPS, IMU, range-based or ultrasound sensors, among others. Besides, the proliferation of visual SLAM systems (Bailey et al., 2006) has extended the application of camera-based approaches for determining the global location of a mobile robot in an unknown environment.
In this context, visual odometry (Nister et al., 2004) has the goal of estimating the position and orientation of a robot or vehicle by analyzing an image sequence acquired by cameras without previous information about locations. Each pair of images is matched through its keypoints in order to calculate the translation and rotation between two poses of the vehicle. Unfortunately, visual odometry typically accumulates drift when long periods of time are considered, which means that localization cannot be completely reliable in these cases. For this reason, in extended trajectories the information provided by standard visual odometry algorithms contains errors in long-term conditions.
According to the previous considerations, in this work we propose
a novel approach based on loop closure detection using ABLE (Arroyo et
al., 2014) for correcting the drift in visual odometry, which is initially
processed by means of the LIBVISO library (Kitt et al., 2010). With the
aim of solving the problems related to the drift, our system recognizes re-
visited places and recalculates a corrected pose. We contribute a method
that uses this information to estimate the deviation between the revisited
pose and the previous one. In order to validate our proposal, image se-
quences from the publicly available KITTI dataset (Geiger et al., 2013) are
employed.
2 Method for Visual Odometry: The LIBVISO Algorithm
The visual odometry algorithm provided by LIBVISO makes it possible to determine the six degrees of freedom (rotation and translation) in a visual localization system. In our work, stereo cameras are employed for image acquisition. Because of this, intrinsic and extrinsic camera parameters are needed to correctly perform the matching between the stereo images. In our case, the tests performed on the KITTI dataset are carried out using the specific camera parameters defined in (Geiger et al., 2013). The application of a stereo camera approach provides higher robustness to our global system, because it avoids the scale ambiguities that are common when monocular cameras are used for visual odometry computation.
The methodology behind our implementation derived from LIBVISO is mainly based on a trifocal geometry between the images. Initially, some keypoints are detected, and their main features are extracted and matched for each pair of consecutive images, as shown in the example presented in Fig. 1. Taking into account the obtained matches, the movement of the autonomous robot or vehicle is estimated by processing a trifocal tensor that associates the keypoints between three frames of the same static scene.
Fig. 1. A representation of the movement estimated over an example image using
visual odometry, jointly with a diagram of the applied trifocal tensor.
In addition, the implementation of LIBVISO detects outliers using RANSAC (Scaramuzza et al., 2009), which allows the rejection of atypical values produced by erroneous matches and improves the odometry results with respect to schemes without this filtering technique. However, this is not sufficient to avoid drift over time, as will be evidenced in the results section. For this reason, we contribute a more robust approach based on a refined correction of the poses using loop closure information.
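Conceptually, each frame-to-frame estimate is composed with the previous global pose to build the trajectory. The following sketch illustrates this integration restricted to the xz plane (with hypothetical names; LIBVISO itself estimates full 6-DoF transforms):

```python
import math

def accumulate_poses(relative_motions):
    """Compose per-frame (dx, dz, dyaw) estimates into global (x, z) poses.

    Illustrative sketch of odometry integration: any per-step error is
    carried into every later pose, which is exactly the drift discussed
    in the text.
    """
    x, z, yaw = 0.0, 0.0, 0.0
    trajectory = [(x, z)]
    for dx, dz, dyaw in relative_motions:
        # Rotate the local displacement into the global frame, then translate.
        x += dx * math.cos(yaw) - dz * math.sin(yaw)
        z += dx * math.sin(yaw) + dz * math.cos(yaw)
        yaw += dyaw
        trajectory.append((x, z))
    return trajectory

# Four identical forward steps, each followed by a 90-degree turn, trace
# a square and return to the origin (a perfect loop closure).
print(accumulate_poses([(0.0, 1.0, math.pi / 2)] * 4)[-1])
```

With noisy real estimates, the final pose of such a loop does not land back on the origin, and the gap is the drift that the correction of Section 4 compensates.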
3 Method for Loop Closure Detection: The ABLE Algorithm
Some previous studies recently carried out by our research group on the topic of visual loop closure detection (Arroyo et al., 2014) are now applied
to correct the drift derived from the previous visual odometry computation
stage. The developed method for identifying when a place is revisited is
named ABLE (Able for Binary-appearance Loop-closure Evaluation).
The main goal of this algorithm is to describe places visually in order to provide similarity measurements between them, elucidating whether a loop closure exists or not. Typically, ABLE computes global LDB binary features (Yang et al., 2012) for image description. In this case, disparity information obtained from the stereo images is also added to the descriptor. Apart from this, a variant of the description method initially designed in ABLE is contributed in this paper, where the recently proposed AKAZE features (Alcantarilla et al., 2013) are tested as the core of the global description approach instead of LDB. We implement it to evaluate the robustness and efficiency of AKAZE, which adds gradient information in a nonlinear scale space to obtain a description invariant to scale and rotation.
After describing the images, the binary features (d) computed for each frame are matched to determine whether they are similar enough to consider a revisited place. In the case of binary descriptors such as LDB or AKAZE, the Hamming distance can be applied in matching, which provides great efficiency, because it consists of a simple XOR operation (⊕) followed by a basic sum of bits, as formulated in Equation (1). The obtained similarity values are stored in a distance matrix (M), and these values are used to detect loop closures when high similarity measurements are obtained:

M(i, j) = hamming(d_i, d_j) = bitsum(d_i ⊕ d_j)    (1)
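As an illustration of Equation (1), the matching of global binary descriptors can be sketched in a few lines (the descriptors below are toy bit strings, not actual LDB or AKAZE outputs):

```python
def hamming_distance(d1: int, d2: int) -> int:
    """Equation (1): XOR the two binary descriptors and count the set bits."""
    return bin(d1 ^ d2).count("1")

def distance_matrix(descriptors):
    """Pairwise Hamming distances between all frame descriptors (matrix M)."""
    n = len(descriptors)
    return [[hamming_distance(descriptors[i], descriptors[j])
             for j in range(n)] for i in range(n)]

# Toy 8-bit descriptors: frames 0 and 2 describe nearly the same place.
frames = [0b10110100, 0b01001011, 0b10110101]
M = distance_matrix(frames)
print(M[0][2])  # 1: a low distance flags a loop closure candidate
```

In practice the descriptors are hundreds of bits long, and the low-distance entries of M form the diagonals visible in the matrices of Fig. 4.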
4 Our Proposal for Visual Odometry Correction
The information about the loops identified in the previous system stage is
now used to correct the visual odometry. Here, we contribute the formulation of our method to perform these corrections. After a revisited place is detected in a specific frame, the drift of the pose currently estimated by the visual odometry algorithm is compensated by taking into account the pose obtained when the place was previously traversed. In this case, we consider corrections for the xz plane, where the deviation (∆) between the current pose (p_c) and the previous one (p_p) is calculated as follows:

∆x(t) = |x_c(t) − x_p(t)|    (2)
∆z(t) = |z_c(t) − z_p(t)|    (3)

Then, the current poses are updated (x'(t), z'(t)) in the x and z axes using the previously estimated deviation:

x'(t) = x(t) + ∆x(t)    (4)
z'(t) = z(t) + ∆z(t)    (5)

Besides, an average deviation (∆x_avg, ∆z_avg) is subsequently computed after detecting the first pose corresponding to a loop closure. This information is employed to correct the poses in the rest of the trajectory, where n is the number of processed frames:

∆x_avg = (1/n) · Σ_{t=1}^{n} ∆x(t)    (6)
∆z_avg = (1/n) · Σ_{t=1}^{n} ∆z(t)    (7)

After calculating the average deviations in the loop zone, the poses in the remaining frames are updated according to the following equations:

x'(t) = x(t) + ∆x_avg    (8)
z'(t) = z(t) + ∆z_avg    (9)
The application of the formulated pose corrections improves the accuracy initially obtained by using visual odometry alone, without considering the progressive drift, as corroborated in the next section of results.
5 Evaluation and Main Results
Our proposal is evaluated on the KITTI Odometry dataset (Geiger et al., 2013), which contains 22 sequences recorded on different car routes around Karlsruhe (Germany). GPS ground-truth measurements are available, and a ground-truth for loop closure was also defined in (Arroyo et al., 2014).
In Fig. 2, it can be seen how the visual odometry measurements
obtained by LIBVISO without correction have a considerable drift with re-
spect to the GPS ground-truth. The maps presented correspond to some
significant sequences from the KITTI dataset, which are also presented in
Fig. 3 to show the matches of the loop closures detected using ABLE.
In addition, Fig. 4 depicts some examples of distance matrices
processed by means of ABLE using LDB and AKAZE descriptors as core.
The detected loop closures correspond to the diagonals in the matrices. Be-
sides, Fig. 5 introduces precision-recall results about ABLE performance
in loop closure detection depending on the descriptor used as core. Apart
from LDB and AKAZE, we also test other typical descriptors such as
HOG (Dalal et al., 2005), SURF (Bay et al., 2008), BRIEF (Calonder et
al., 2010) and ORB (Rublee et al., 2011). These results demonstrate the superior performance of LDB and of the new approach based on AKAZE.
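For reference, each point of such a precision-recall curve comes from thresholding the distance matrix and comparing the predicted loop closures against the ground-truth pairs. A generic sketch (not ABLE's actual evaluation code):

```python
def precision_recall(predicted, ground_truth):
    """Precision and recall of predicted loop closure pairs.

    predicted and ground_truth are sets of (i, j) frame index pairs;
    sweeping the similarity threshold that generates `predicted`
    produces one (precision, recall) point per threshold value.
    """
    tp = len(predicted & ground_truth)  # correctly detected loops
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(ground_truth) if ground_truth else 1.0
    return precision, recall

# Two of three predictions are true loops; one true loop is missed.
print(precision_recall({(10, 2), (20, 5), (30, 7)},
                       {(10, 2), (20, 5), (40, 9)}))
```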
Finally, Fig. 6 evidences the improved accuracy of our proposal based on visual odometry with loop closure corrections, where it can be seen how the drift is reduced with respect to the original visual odometry.
Fig. 2. Results for visual odometry without correction in the KITTI dataset: (a) Sequence 00, (b) Sequence 02, (c) Sequence 08.

Fig. 3. Results for loop closure detection in the KITTI dataset: (a) Sequence 00, (b) Sequence 02, (c) Sequence 08.

Fig. 4. Examples of distance matrices from the Sequence 06 of the KITTI dataset: (a) ground-truth, (b) using LDB, (c) using AKAZE.

Fig. 5. Precision-recall curves obtained for the Sequence 00 of the KITTI dataset.

Fig. 6. Results for visual odometry with loop closure correction in the KITTI dataset: (a) Sequence 05, (b) Sequence 06.

6 Conclusions and Future Works

In this work, we have defined and validated our system based on a robust visual odometry estimation using loop closure corrections. The results presented along the paper demonstrate the benefits of this proposal, such as the visible reduction of the progressive drift accumulated over time.

The contributions presented can be divided into three main areas. First of all, the implementation of the initial stereo visual odometry system based on LIBVISO. Secondly, the application of ABLE for loop closure detection, including a new approach based on AKAZE features. And finally, the formulation of a complete method for correcting the visual odometry estimations using the information about the detected loop closures.

In future works, we plan to improve our visual odometry model using optimizations based on algorithms such as Levenberg-Marquardt.

Acknowledgements

This research has received funding from the RoboCity2030-III-CM project (S2013/MIT-2748), supported by the CAM and Structural Funds of the EU.

References

(Alcantarilla et al., 2013) Alcantarilla, P. F., Nuevo, J., and Bartoli, A., 2013, "Fast explicit diffusion for accelerated features in nonlinear scale spaces," in British Machine Vision Conference (BMVC), pp. 1-13.
(Arroyo et al., 2014) Arroyo, R., Alcantarilla, P. F., Bergasa, L. M., Yebes,
J. J., and Bronte, S., 2014, “Fast and effective visual place recognition us-
ing binary codes and disparity information," in IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), pp. 3089-3094.
(Bailey et al., 2006) Bailey, T., and Durrant-Whyte, H., 2006, “Simultane-
ous Localisation And Mapping (SLAM): Part II State of the art,” IEEE
Robotics and Automation Magazine (RAM), 13(3): 108-117.
(Bay et al., 2008) Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L., 2008,
"Speeded-Up Robust Features (SURF)," Computer Vision and Image Un-
derstanding (CVIU), 110(3): 346-359.
(Calonder et al., 2010) Calonder, M., Lepetit, V., Strecha, C., and Fua, P.,
2010, “BRIEF: Binary Robust Independent Elementary Features," in Eu-
ropean Conference on Computer Vision (ECCV), pp. 778-792.
(Dalal et al., 2005) Dalal, N., and Triggs, B., 2005, "Histograms of
Oriented Gradients for human detection," IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), vol. 2, pp. 886-893.
(Geiger et al., 2013) Geiger, A., Lenz, P., Stiller, C., and Urtasun, R.,
2013, "Vision meets robotics: The KITTI dataset," International Journal of
Robotics Research (IJRR), 32(11): 1231-1237.
(Kitt et al., 2010) Kitt, B., Geiger, A., and Lategahn, H., 2010, “Visual
odometry based on stereo image sequences with RANSAC-based outlier
rejection scheme," in IEEE Intelligent Vehicles Symp. (IV), pp. 486-492.
(Nister et al., 2004) Nister, D., Naroditsky, O., and Bergen, J., 2004, "Vis-
ual odometry," in IEEE Conference on Computer Vision and Pattern Rec-
ognition (CVPR), vol. 1, pp. 652-659.
(Rublee et al., 2011) Rublee, E., Rabaud, V., Konolige, K., and Bradski,
G., 2011, "ORB: An efficient alternative to SIFT or SURF," in Interna-
tional Conference on Computer Vision (ICCV), pp. 2564-2571.
(Scaramuzza et al., 2009) Scaramuzza, D., Fraundorfer, F., and Siegwart,
R., 2009, “Real-time monocular visual odometry for on-road vehicles with
1-point RANSAC,” in IEEE International Conference on Robotics and Au-
tomation (ICRA), pp. 4293-4299.
(Yang et al., 2012) Yang, X., and Cheng, K. T., 2012, “LDB: An ultra-fast
feature for scalable augmented reality on mobile devices," in International
Symposium on Mixed and Augmented Reality (ISMAR), pp. 49-57.
... Loop closure is a constraint that allows to estimate the drift when a place is revisited. Papers [10] and [11] use different approaches to detect loop closure, and different ways to estimate and to correct the drift. Another way to reduce the errors is to use filtering methods, in order to merge sensors datas, like in [12] and [13]. ...
... In our proposed method, we use loop closure constraint to reduce drift .We can classify loop closure detection methods in two categories. Appearance-based approaches like [10], [11], [14] and [15]. The second category is more recent and based on Convolution Neural Networks (CNN), like [16] and [17]. ...
... In [10], the correction is based on estimating the drift between the pose currently estimated by the visual odometry algorithm and the pose given when the place was previously visited. The main idea of [11] is to measure transformation between first and last frame. ...
Article
Full-text available
This paper presents a vision-based localization framework based on visual odometry.Visual odometry is a classic approach to incrementally estimate robot motion even in GPS denied environment, by tracking features in successive images. As it is subject to drift, this paper proposes to call a convolutional neural netwok and visual memory to improve process accuracy. In fact, our framework is made of two main steps. First, the robot builds its visual memory by annotating places with their ground truth positions. Dedicated data structures are made to store referenced images and their positions. Then, during navigation step, we use loop closure corrected visual odometry. A siamese convolutional neural network allows us to detect already visited positions. It takes as input current image and an already stored one. If the place is recognized, the drift is then quantified using the stored position. Drift correction is conducted by an original two levels correction process. The first level is directly applied to the estimation by substracting the error. The second level is applied to the graph itself using iterative closest point method, to match the estimated trajectory graph to the ground truth one. Experiments showed that the proposed localization method has a centimetric accuracy.
... D'autres méthodes existent pour résoudre le problème de la dérive de l'odométrie visuelle par une optimisation sur les dernières poses estimées des caméras avec un algorithme d'ajustement de faisceaux (en anglais Bundle Adjustment) (Mouragnon et al., 2006) (Gong et al., 2015). En outre, la technique de fermeture de boucle (en anglais Loop-Closure) permet de corriger les défauts de la carte créée et de la maintenir spatialement cohérente (Birem, 2015) (Caramazana et al., 2016). ...
Thesis
La problématique qui va être abordée dans cette thèse est la localisation d’un robot mobile. Ce dernier, équipé de capteurs bas-coût, cherche à exploiter le maximum d’informations possibles pour répondre à un objectif fixé au préalable. Un problème de fusion de données sera traité d’une manière à ce qu’à chaque situation, le robot saura quelle information utiliser pour se localiser d’une manière continue. Les données que nous allons traiter seront de différents types. Dans nos travaux, deux propriétés de localisation sont désirées: la précision et la confiance. Pour pouvoir le contrôler, le robot doit connaître sa position d’une manière précise et intègre. En effet, la précision désigne le degré d’incertitude métrique lié à la position estimée. Elle est retournée par un filtre de fusion. Si en plus, le degré de certitude d’être dans cette zone d’incertitude est grand, la confiance dans l’estimation sera élevée et cette estimation sera donc considérée comme intègre. Ces deux propriétés sont généralement liées. C’est pourquoi, elles sont souvent représentées ensemble pour caractériser l'estimation retournée de la pose du robot. Dans ce travail nous rechercherons à optimiser simultanément ces deux propriétés.Pour tirer profit des différentes techniques existantes pour une estimation optimale de la pose du robot,nous proposons une approche descendante basée sur l’exploitation d’une carte environnementale définie dans un référentiel absolu. Cette approche utilise une sélection a priori des meilleures mesures informatives parmi toutes les sources de mesure possibles. La sélection se fait selon un objectif donné (de précision et de confiance), l’état actuel du robot et l’apport informationnel des données.Comme les données sont bruitées, imprécises et peuvent également être ambiguës et peu fiables, la prise en compte de ces limites est nécessaire afin de fournir une évaluation de la pose du robot aussi précise et fiable que possible. 
Pour cela, une focalisation spatiale et un réseau bayésien sont utilisés pour réduire les risques de mauvaises détections. Si malgré tout de mauvaises détections subsistent, elles seront gérées par un processus de retour qui réagit de manière efficace en fonction des objectifs souhaités.Les principales contributions de ce travail sont d'une part la conception d'une architecture de localisation multi-sensorielle générique et modulaire de haut niveau avec un mode opératoire descendant. Nous avons utilisé la notion de triplet perceptif qui représente un ensemble amer, capteur, détecteur pour désigner chaque module perceptif. À chaque instant, une étape de prédiction et une autre de mise à jour sont exécutées. Pour l’étape de mise à jour, le système sélectionne le triplet le plus pertinent (d'un point de vue précision et confiance) selon un critère informationnel. L’objectif étant d’assurer une localisation intègre et précise, notre algorithme a été écrit de manière à ce que l’on puisse gérer les aspects ambiguïtés.D'autre part, l'algorithme développé permet de localiser un robot dans une carte de l'environnement. Pour cela, une prise en compte des possibilités de mauvaises détections suite aux phénomènes d'ambiguïté a été considérée par le moyen d'un processus de retour en arrière. En effet, ce dernier permet d'une part de corriger une mauvaise détection et d'autre part d'améliorer l’estimation retournée de la pose pour répondre à un objectif souhaité.
... In our tests, 100 values of θ are taken in order to obtain well-defined curves. Apart from evaluation purposes, the distance values registered in M can be used in real application for correcting SLAM or visual odometry errors based on loop closure detection (Caramazana et al. 2016). A threshold (θ ) must be applied to discern if the similarity is sufficient to consider a loop closure between two places, as stated in Eq. 21: ...
Article
Full-text available
Visual topological localization is a process typically required by varied mobile autonomous robots, but it is a complex task if long operating periods are considered. This is because of the appearance variations suffered in a place: dynamic elements, illumination or weather. Due to these problems, long-term visual place recognition across seasons has become a challenge for the robotics community. For this reason, we propose an innovative method for a robust and efficient life-long localization using cameras. In this paper, we describe our approach (ABLE), which includes three different versions depending on the type of images: monocular, stereo and panoramic. This distinction makes our proposal more adaptable and effective, because it allows to exploit the extra information that can be provided by each type of camera. Besides, we contribute a novel methodology for identifying places, which is based on a fast matching of global binary descriptors extracted from sequences of images. The presented results demonstrate the benefits of using ABLE, which is compared to the most representative state-of-the-art algorithms in long-term conditions.
... Moreover, Fig. 6 shows how the distance matrices generated by OpenABLE are applicable to loop closure detection for tasks related to visual odometry correction, which is studied in works such as [24]. We present several mapping results, where loop closures are annotated while they are identified by OpenABLE. ...
Conference Paper
Full-text available
Visual information is a valuable asset in any perception scheme designed for an intelligent transportation system. In this regard, the camera-based recognition of locations provides a higher situational awareness of the environment, which is very useful for varied localization solutions typically needed in long-term autonomous navigation, such as loop closure detection and visual odometry or SLAM correction. In this paper we present OpenABLE, an open-source toolbox contributed to the community with the aim of helping researchers in the application of these kinds of lifelong localization algorithms. The implementation follows the philosophy of the topological place recognition method named ABLE, including several new features and improvements. These functionalities allow to match locations using different global image description methods and several configuration options, which enable the users to control varied parameters in order to improve the performance of place recognition depending on their specific problem requisites. The applicability of our toolbox in visual localization purposes for intelligent vehicles is validated in the presented results, jointly with comparisons to the main state-of-the-art methods.
Conference Paper
Full-text available
We present a novel approach for place recognition and loop closure detection based on binary codes and disparity information using stereo images. Our method (ABLE-S) applies the Local Difference Binary (LDB) descriptor in a global framework to obtain a robust global image description, which is initially based on intensity and gradient pairwise comparisons. LDB has a higher descriptiveness power than other popular alternatives such as BRIEF, which only relies on intensity. In addition, we integrate disparity information into the binary descriptor (D-LDB). Disparity provides valuable information which decreases the effect of some typical problems in place recognition such as perceptual aliasing. The KITTI Odometry dataset is mainly used to test our approach due to its varied environments, challenging situations and length. Additionally, a loop closure ground-truth is introduced in this work for the KITTI Odometry benchmark with the aim of standardizing a robust evaluation methodology for comparing different previous algorithms against our method and for future benchmarking of new proposals. Attending to the presented results, our method allows a fast and more effective visual loop closure detection compared to state-of-the-art algorithms such as FAB-MAP, WI-SURF and BRIEF-Gist.
Article
Full-text available
We present a novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research. In total, we recorded 6 hours of traffic scenarios at 10–100 Hz using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. The scenarios are diverse, capturing real-world traffic situations, and range from freeways over rural areas to inner-city scenes with many static and dynamic objects. Our data is calibrated, synchronized and timestamped, and we provide the rectified and raw image sequences. Our dataset also contains object labels in the form of 3D tracklets, and we provide online benchmarks for stereo, optical flow, object detection and other tasks. This paper describes our recording platform, the data format and the utilities that we provide.
Conference Paper
Full-text available
This paper presents a system capable of recovering the trajectory of a vehicle from the video input of a single camera at a very high frame-rate. The overall frame-rate is limited only by the feature extraction process, as the outlier removal and the motion estimation steps take less than 1 millisecond with a normal laptop computer. The algorithm relies on a novel way of removing the outliers of the feature matching process.We show that by exploiting the nonholonomic constraints of wheeled vehicles it is possible to use a restrictive motion model which allows us to parameterize the motion with only 1 feature correspondence. Using a single feature correspondence for motion estimation is the lowest model parameterization possible and results in the most efficient algorithms for removing outliers. Here we present two methods for outlier removal. One based on RANSAC and the other one based on histogram voting. We demonstrate the approach using an omnidirectional camera placed on a vehicle during a peak time tour in the city of Zurich. We show that the proposed algorithm is able to cope with the large amount of clutter of the city (other moving cars, buses, trams, pedestrians, sudden stops of the vehicle, etc.). Using the proposed approach, we cover one of the longest trajectories ever reported in real-time from a single omnidirectional camera and in cluttered urban scenes, up to 3 kilometers.
Conference Paper
Full-text available
A common prerequisite for many vision-based driver assistance systems is the knowledge of the vehicle's own movement. In this paper we propose a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images. Our method is directly based on the trifocal geometry between image triples, thus no time expensive recovery of the 3-dimensional scene structure is needed. The only assumption we make is a known camera geometry, where the calibration may also vary over time. We employ an Iterated Sigma Point Kalman Filter in combination with a RANSAC-based outlier rejection scheme which yields robust frame-to-frame motion estimation even in dynamic environments. A high-accuracy inertial navigation system is used to evaluate our results on challenging real-world video sequences. Experiments show that our approach is clearly superior compared to other filtering techniques in terms of both, accuracy and run-time.
Conference Paper
We propose to use binary strings as an efficient feature point descriptor, which we call BRIEF. We show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and U-SURF on standard benchmarks and show that it yields a similar or better recognition performance, while running in a fraction of the time required by either.
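The core of a BRIEF-style descriptor can be sketched in a few lines (a simplified illustration, not the reference implementation; in particular, BRIEF smooths the patch first and draws its test locations from a specific distribution, which this sketch glosses over): each bit records whether one pixel is darker than another at a fixed pair of locations, and matching reduces to XOR plus a bit count.

```python
import numpy as np

def brief_describe(patch, pairs):
    """BRIEF-style descriptor: one bit per intensity comparison.

    patch : 2-D array of pixel intensities (e.g. a 32x32 patch).
    pairs : (n, 4) array of pre-drawn test locations (y1, x1, y2, x2).
    Returns the n bits packed into a uint8 array.
    """
    bits = patch[pairs[:, 0], pairs[:, 1]] < patch[pairs[:, 2], pairs[:, 3]]
    return np.packbits(bits)

def hamming(d1, d2):
    """Descriptor distance = number of differing bits (XOR + popcount)."""
    return int(np.unpackbits(d1 ^ d2).sum())

# A fixed random test pattern, shared by all descriptors.
rng = np.random.default_rng(1)
pairs = rng.integers(0, 32, size=(256, 4))

patch_a = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
patch_b = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
da, db = brief_describe(patch_a, pairs), brief_describe(patch_b, pairs)
```

The 256-bit descriptor occupies 32 bytes, and the Hamming comparison needs no floating-point arithmetic, which is why matching is so much faster than with L2-compared descriptors such as SURF.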
Conference Paper
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
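The cell-histogram stage described above can be sketched as follows (a deliberately coarse illustration under assumed defaults of 8-pixel cells and 9 unsigned orientation bins; the block normalization and overlapping-block steps the abstract highlights are omitted): each pixel votes into its cell's orientation histogram with a weight equal to its gradient magnitude.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, nbins=9):
    """Coarse HOG sketch: per-cell histograms of gradient orientation,
    weighted by gradient magnitude (no block normalization shown).
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    h, w = img.shape
    out = np.zeros((h // cell, w // cell, nbins))
    for cy in range(h // cell):
        for cx in range(w // cell):
            sl = np.s_[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            b = (ang[sl] / (180.0 / nbins)).astype(int) % nbins
            np.add.at(out[cy, cx], b.ravel(), mag[sl].ravel())
    return out

# A horizontal intensity ramp: all gradient energy lies along x (bin 0).
img = np.tile(np.arange(16, dtype=float), (16, 1))
hist = hog_cell_histograms(img)
```

A production HOG additionally interpolates votes between neighbouring bins and cells and L2-normalizes over overlapping blocks, the steps the abstract identifies as critical for detection quality.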
Article
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision.
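The integral-image trick that the abstract credits for SURF's speed is simple to sketch (an illustrative implementation, not taken from the paper): after one cumulative-sum pass, the sum of any axis-aligned box is obtained from four table lookups, so box filters cost O(1) regardless of their size.

```python
import numpy as np

def integral_image(img):
    """Summed-area table; ii[y, x] = sum of img[:y, :x].

    A leading row and column of zeros makes box-sum indexing uniform.
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in four table lookups (O(1) per box)."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
central = box_sum(ii, 1, 1, 3, 3)   # sum of the central 2x2 block -> 30
```

SURF approximates its Hessian-based Gaussian derivative filters with box filters evaluated this way, which is why detection cost is independent of filter scale.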
Conference Paper
The efficiency, robustness and distinctiveness of a feature descriptor are critical to the user experience and scalability of a mobile Augmented Reality (AR) system. However, existing descriptors are either too compute-expensive to achieve real-time performance on a mobile device such as a smartphone or tablet, or not sufficiently robust and distinctive to identify correct matches from a large database. As a result, current mobile AR systems still only have limited capabilities, which greatly restrict their deployment in practice. In this paper, we propose a highly efficient, robust and distinctive binary descriptor, called Local Difference Binary (LDB). LDB directly computes a binary string for an image patch using simple intensity and gradient difference tests on pairwise grid cells within the patch. A multiple gridding strategy is applied to capture the distinct patterns of the patch at different spatial granularities. Experimental results demonstrate that LDB is extremely fast to compute and to match against a large database due to its high robustness and distinctiveness. Compared to the state-of-the-art binary descriptor BRIEF, primarily designed for speed, LDB has similar computational efficiency, while achieving greater accuracy and 5x faster matching speed when matching over a large database with 1.7M+ descriptors.
Article
This tutorial provides an introduction to the Simultaneous Localisation and Mapping (SLAM) method and the extensive research on SLAM that has been undertaken. Part I of this tutorial described the essential SLAM problem. Part II of this tutorial (this paper) is concerned with recent advances in computational methods and in new formulations of the SLAM problem for large scale and complex environments. SLAM is the process by which a mobile robot can build a map of the environment and at the same time use this map to compute its location. The past decade has seen rapid and exciting progress in solving the SLAM problem together with many compelling implementations of SLAM methods. The great majority of work has focused on improving computational efficiency while ensuring consistent and accurate estimates for the map and vehicle pose. However, there has also been much research on issues such as non-linearity, data association and landmark characterisation, all of which are vital in achieving a practical and robust SLAM implementation. This tutorial focuses on the recursive Bayesian formulation of the SLAM problem in which probability distributions or estimates of absolute or relative locations of landmarks and vehicle pose are obtained. Part I of this tutorial surveyed the development of the essential SLAM algorithm in state-space and particle-filter form, described a number of key implementations and cited locations of source code and real-world data for evaluation of SLAM algorithms. Part II of this tutorial (this paper) surveys the current state of the art in SLAM research with a focus on three key areas: computational complexity, data association, and environment representation. Much of the mathematical notation and essential concepts used in this paper are defined in Part I of this tutorial and so are not repeated here. SLAM, in its naive form, scales quadratically with the number of landmarks in a map.
For real-time implementation, this scaling is potentially a substantial limitation in the use of SLAM methods. Section II surveys the many approaches that have been developed to reduce this complexity. These include linear-time state augmentation, sparsification in information form, partitioned updates and submapping methods. A second major hurdle to overcome in implementation of SLAM methods is to correctly associate observations of landmarks with landmarks held in the map. Incorrect association can lead to catastrophic failure of the SLAM algorithm. Data association is particularly important when a vehicle returns to a previously mapped region after a long excursion; the so-called 'loop-closure' problem. Section III surveys current data association methods used in SLAM. These include batch-validation methods that exploit constraints inherent in the SLAM formulation, appearance-based methods, and multi-hypothesis techniques. The third development discussed in this tutorial is the trend towards richer appearance-based models of landmarks and maps. While initially motivated by problems in data association and loop closure, these methods have resulted in qualitatively different methods of describing the SLAM problem, focusing on trajectory estimation rather than landmark estimation. Section IV surveys current developments in this area along a number of lines including delayed mapping, the use of non-geometric landmarks, and trajectory estimation methods. SLAM methods have now reached a state of considerable maturity. Future challenges will centre on methods enabling large scale implementations in increasingly unstructured environments and especially in situations where GPS-like solutions are unavailable or unreliable: in urban canyons, under foliage, underwater or on remote planets.