Proceedings of 2019 Asian-Pacific Conference on Aerospace Technology and Science
August 28-31, 2019, Taiwan
Performance Analysis of Visual/Inertial Integrated
Positioning in Typical Urban Scenarios of Hong Kong
Xiwei Bai1, Weisong Wen2 and Li-Ta Hsu*1
1Interdisciplinary Division of Aeronautical and Aviation Engineering, Hong Kong
Polytechnic University, Hong Kong
2Department of Mechanical Engineering, Hong Kong Polytechnic University, Hong
Kong
*Corresponding author: Li-Ta Hsu
E-mail: lt.hsu@polyu.edu.hk, Tel.: +852-3400-8061
Abstract
There is an increasing demand for accurate and robust positioning in many application
domains, such as unmanned aerial vehicles (UAVs) and autonomous driving vehicles
(ADVs). The integration of visual odometry and the inertial navigation system (INS) has
been extensively studied to fulfill this positioning requirement. Visual odometry can
provide aided positioning by matching consecutive image frames. However, it can be
sensitive to illumination conditions and feature availability in urban environments. In
this paper, we evaluate the performance of tightly coupled visual/inertial integrated
positioning in a typical urban scenario of Hong Kong based on an existing
state-of-the-art visual/inertial integration algorithm. The performance of visual/inertial
integrated positioning is tested and validated in a typical urban scenario of Hong Kong
which includes numerous dynamic participants, such as vehicles, pedestrians and trucks.
The results show that visual/inertial integration can be degraded in scenes with excessive
dynamic objects.
Keywords: Positioning; Navigation; Sensor fusion; Visual/inertial integrated
positioning system
1. Introduction
Accurate and robust positioning is essential for unmanned aerial vehicles (UAVs) [1]
and autonomous driving vehicles (ADVs) [2] in urban areas. The integration of visual
odometry [3] and the inertial navigation system (INS) has been extensively studied to fulfill
this positioning requirement. The visual/inertial integrated positioning method is a
promising solution for autonomous systems. Visual odometry extracts feature points from
camera images and matches them with previous frames to provide aided positioning [4].
However, it can be sensitive to illumination conditions and feature availability. A low-cost
inertial navigation system can provide high-frequency attitude and acceleration
measurements. The recently proposed tightly coupled visual/inertial integration method [5]
can obtain prominent positioning performance in constrained scenarios with sufficient
environmental features and ideal illumination conditions.
A factor graph [6] is employed to integrate the visual odometry, visual loop closure [5]
and INS raw measurements. At present, visual odometry methods fall into two categories:
feature-based methods and direct methods that do not rely on features. For the former, a
method was proposed to reduce noise effects in the sequential trajectory reconstruction
process, which improves the accuracy of feature point matching [7]. Direct Sparse
Odometry (DSO) is a visual odometry method based on a direct structure-and-motion
formulation, so it can sample pixels from all image regions with an intensity gradient [8].
Unfortunately, visual odometry alone is not robust to lighting conditions, dynamic changes
and image quality, and the motion estimate drifts without loop closure. These factors make
localization difficult in outdoor environments [9-11]. Moreover, the performance of
visual/inertial integration can be challenged or degraded in scenarios with excessive
dynamic objects, because geometric constraints are violated in dynamic environments and
the optical flow characteristics are affected [12].
In this paper, we evaluate the performance of tightly coupled visual/inertial integrated
positioning in a typical urban scenario of Hong Kong based on the work in [5]. The tested
scenario contains numerous dynamic participants, such as vehicles, pedestrians and trucks.
Firstly, the state-of-the-art INS pre-integration technique [13] is employed to obtain the
transformation between consecutive frames from the INS raw measurements and derive
the INS factor. Then, feature-based visual odometry is conducted based on feature
matching to derive the visual odometry factor. Finally, we make use of Ceres [14] to solve
the factor graph optimization and obtain the optimal estimate of the positioning state set.
The performance of visual/inertial integrated positioning is tested and validated in a typical
urban scenario of Hong Kong. The results show that the visual/inertial integration can be
affected in scenes with excessive dynamic objects.
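To make the integration pipeline concrete, the following is a minimal sketch (not the authors' implementation) of how a visual odometry factor and an INS factor between consecutive key frames can be combined in a Ceres-based factor graph. It assumes a simplified planar (x, y, yaw) state and hypothetical relative-motion measurements purely for illustration; the evaluated system in [5] estimates the full 6-DOF state with IMU pre-integration, feature reprojection and loop-closure factors.

```cpp
#include <ceres/ceres.h>

#include <cmath>

// Relative-motion factor between two planar poses (x, y, yaw). The measured
// relative pose can come from visual odometry or from IMU pre-integration.
struct RelativePoseFactor {
  RelativePoseFactor(double dx, double dy, double dyaw, double weight)
      : dx_(dx), dy_(dy), dyaw_(dyaw), w_(weight) {}

  template <typename T>
  bool operator()(const T* const pose_i, const T* const pose_j,
                  T* residual) const {
    using std::cos;
    using std::sin;
    // Express pose_j in the frame of pose_i.
    const T c = cos(pose_i[2]);
    const T s = sin(pose_i[2]);
    const T px = c * (pose_j[0] - pose_i[0]) + s * (pose_j[1] - pose_i[1]);
    const T py = -s * (pose_j[0] - pose_i[0]) + c * (pose_j[1] - pose_i[1]);
    const T pyaw = pose_j[2] - pose_i[2];
    residual[0] = T(w_) * (px - T(dx_));
    residual[1] = T(w_) * (py - T(dy_));
    residual[2] = T(w_) * (pyaw - T(dyaw_));
    return true;
  }

  double dx_, dy_, dyaw_, w_;
};

int main() {
  // Hypothetical relative-motion measurements between three key frames.
  const double vo_meas[2][3] = {{1.0, 0.0, 0.05}, {1.1, 0.1, 0.00}};   // visual odometry
  const double ins_meas[2][3] = {{1.0, 0.0, 0.04}, {1.0, 0.1, 0.01}};  // IMU pre-integration

  double poses[3][3] = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}};  // states to estimate

  ceres::Problem problem;
  for (int k = 0; k < 2; ++k) {
    // Visual odometry factor; the Huber loss limits the impact of outliers
    // such as features tracked on moving objects.
    problem.AddResidualBlock(
        new ceres::AutoDiffCostFunction<RelativePoseFactor, 3, 3, 3>(
            new RelativePoseFactor(vo_meas[k][0], vo_meas[k][1], vo_meas[k][2], 1.0)),
        new ceres::HuberLoss(1.0), poses[k], poses[k + 1]);
    // INS factor derived from the pre-integrated inertial measurements.
    problem.AddResidualBlock(
        new ceres::AutoDiffCostFunction<RelativePoseFactor, 3, 3, 3>(
            new RelativePoseFactor(ins_meas[k][0], ins_meas[k][1], ins_meas[k][2], 2.0)),
        nullptr, poses[k], poses[k + 1]);
  }
  problem.SetParameterBlockConstant(poses[0]);  // anchor the first pose

  ceres::Solver::Options options;
  options.linear_solver_type = ceres::DENSE_QR;
  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);
  return 0;
}
```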
The main contributions of this paper are listed as follows:
1) We evaluate the performance of a visual/inertial integrated positioning system in an
urban scenario of Hong Kong with numerous dynamic objects.
2) We analyze the performance of the visual/inertial integrated positioning versus the
quality of visual feature tracking.
The rest of the paper is organized as follows: we discuss the methodology in Section 2
based on the VINS-Fusion framework. The performance analysis is shown in Section 3.
Finally, the conclusion of this research is summarized in Section 4.
2. Methodology
The evaluated visual/inertial integrated positioning framework is based on the work in
[5]. The flowchart is shown in Figure 1. The inputs are the measurements from the
IMU and the images from a monocular camera. The system starts with pre-processing of
the raw camera and IMU measurements. The initialization then provides all the values
needed for the nonlinear optimization. During initialization, loosely coupled sensor
fusion is used to obtain the initial values: the poses of all frames in the sliding window
are first estimated by pure visual structure from motion (SfM) and then aligned with the
IMU pre-integration, thereby recovering the attitude, velocity, gravity vector and 3D
feature locations. The output is the position and orientation estimate. The details of the
evaluated visual/inertial integrated positioning algorithm can be found in [5]. We
evaluate this technique for vehicle localization in a deep urban area.
Fig. 1. Flowchart of the evaluated visual/inertial positioning system.
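As a hedged illustration of the pre-integration step mentioned above, the sketch below accumulates raw IMU samples between two key frames into a single relative-motion delta. It assumes a planar (x, y, yaw) model with gravity-compensated accelerations; the actual pre-integration used in [5] operates on SO(3), handles sensor biases and also propagates the measurement covariance.

```cpp
#include <cmath>
#include <vector>

struct ImuSample {
  double ax, ay;  // body-frame accelerations [m/s^2] (gravity removed)
  double wz;      // yaw rate [rad/s]
  double dt;      // sample interval [s]
};

struct PreintegratedDelta {
  double dx = 0.0, dy = 0.0;    // relative translation in the frame of key frame i
  double dvx = 0.0, dvy = 0.0;  // relative velocity change
  double dyaw = 0.0;            // relative heading change
};

// Accumulate all IMU samples between two consecutive key frames into one
// relative-motion delta, which later serves as the INS factor.
PreintegratedDelta Preintegrate(const std::vector<ImuSample>& samples) {
  PreintegratedDelta d;
  for (const ImuSample& s : samples) {
    // Rotate the body-frame acceleration into the frame of the first key frame.
    const double c = std::cos(d.dyaw), sn = std::sin(d.dyaw);
    const double ax = c * s.ax - sn * s.ay;
    const double ay = sn * s.ax + c * s.ay;
    // Simple Euler propagation, adequate for illustration.
    d.dx += d.dvx * s.dt + 0.5 * ax * s.dt * s.dt;
    d.dy += d.dvy * s.dt + 0.5 * ay * s.dt * s.dt;
    d.dvx += ax * s.dt;
    d.dvy += ay * s.dt;
    d.dyaw += s.wz * s.dt;
  }
  return d;
}
```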
3. Experiment Results
a) Experiment Setup
The sensor setup is shown on the left-hand side of Figure 2, and the data were collected
on 12 April 2019. The IMU (Xsens MTi-30) is used to collect high-frequency attitude
and acceleration measurements. The monocular camera is used to capture consecutive
images. Both the IMU and the camera are installed on top of a vehicle. The reference
trajectory is provided by a NovAtel SPAN-CPT (an RTK GNSS/INS integrated positioning
system). A dynamic experiment is conducted in an urban scenario of Hong Kong. The
yellow curve on the right-hand side of Figure 2 shows the tested trajectory.
The tested scenarios are shown in Figure 3. The illumination varies during the test,
which introduces a significant challenge. Dynamic vehicles pass through the scene,
which can severely distort the feature tracking process [5]. In short, illumination and
dynamic objects are the two major challenges. We believe that the evaluated scenario
represents a challenging case for visual/inertial integrated positioning, which is crucial
for autonomous driving.
Fig. 2. The experiment setup.
Fig. 3. Snapshots of the evaluated scenarios with numerous dynamic objects.
b) Performance analysis
In this paper, we focus on analyzing the performance of an existing state-of-the-art
visual/inertial integrated positioning system in urban canyons.
In order to evaluate the performance of the visual/inertial integrated positioning system,
three aspects are analysed:
1) 2D positioning error vs. velocity error: this part analyses the relationship between
the 2D positioning error and the velocity error of the visual/inertial integrated
positioning.
2) 2D positioning error vs. mean number of feature tracking [5]: this part analyses
the impact of the average number of times each visual feature is tracked in each key
frame on the 2D positioning [5]. More details on feature tracking can be found in
[5].
3) 2D positioning error vs. feature track difference: this part analyses the difference in
the mean number of feature tracking between two consecutive key frames, i.e. the
current mean tracking count minus that of the previous key frame (a computation
sketch is given after this list).
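Under an assumed logging format (the Epoch structure below is a hypothetical placeholder), the three analysed quantities can be computed per epoch as sketched here: the 2D positioning error against the reference, the mean number of times the features of a key frame have been tracked, and the feature track difference between consecutive key frames.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

struct Epoch {
  double est_e, est_n;             // estimated east/north position [m]
  double ref_e, ref_n;             // reference (SPAN-CPT) position [m]
  std::vector<int> track_lengths;  // times each feature in the key frame has been tracked
};

// 1) 2D positioning error of one epoch.
double PositionError2D(const Epoch& e) {
  return std::hypot(e.est_e - e.ref_e, e.est_n - e.ref_n);
}

// 2) Mean number of times the features of one key frame have been tracked.
double MeanFeatureTracking(const Epoch& e) {
  if (e.track_lengths.empty()) return 0.0;
  const double sum =
      std::accumulate(e.track_lengths.begin(), e.track_lengths.end(), 0.0);
  return sum / e.track_lengths.size();
}

// 3) Feature track difference: mean tracking count of the current key frame
//    minus that of the previous key frame.
double FeatureTrackDifference(const Epoch& current, const Epoch& previous) {
  return MeanFeatureTracking(current) - MeanFeatureTracking(previous);
}
```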
The trajectories of the visual/inertial and reference positioning are shown in Figure 4.
Table I shows the 2D positioning performance of the evaluated visual/inertial integrated
positioning system. The red curve represents the reference trajectory and the green
curve represents the positioning from the evaluated visual/inertial integration. Firstly, a
mean positioning error of 34.21 meters is obtained with the evaluated method, with a
standard deviation of 15.49 meters. Moreover, the maximum error reaches 67.32 meters,
which is not acceptable for autonomous driving vehicle localization. The second column
shows the 2D velocity error during the experiment; its mean and standard deviation are
0.92 and 0.79 respectively. The detailed results for the 2D positioning error and velocity
error can be found in Figure 5. The third column shows the mean number of feature
tracking, with a mean of 35.56 and a standard deviation of 49.31. The last column shows
the mean number of feature tracking difference during the experiment, with a mean of
5.72 and a standard deviation of 9.29.
TABLE I
POSITIONING PERFORMANCE OF THE EVALUATED VISUAL/INERTIAL INTEGRATED POSITIONING

| Items         | 2D error (m) | 2D velocity error | Mean number of feature tracking | Mean feature tracking difference |
| Mean error    | 34.21        | 0.92              | 35.56                           | 5.72                             |
| Std           | 15.49        | 0.79              | 49.31                           | 9.29                             |
| Maximum error | 67.32        | 4.13              | -                               | -                                |
Fig. 4. The trajectories of the visual/inertial positioning (green curve) and reference
trajectory (red curve).
As Figure 5 shows, the top panel presents the reference and VIO velocities and the
bottom panel presents the 2D positioning error. We find that the 2D error increases
significantly during epochs 20 to 60 and 100 to 200, when the 2D velocity error exceeds
0.92 (the mean error). Moreover, when the VIO velocity exceeds 10 during epochs 330
to 350, the 2D error also increases slightly. In short, the performance of the
visual/inertial integrated positioning is correlated with the reference and VIO velocities.
Fig. 5. Positioning error vs. reference and VIO velocity.
As Figure 6 shows, the top panel presents the mean number of feature tracking and the
bottom panel presents the 2D positioning error. We find that the 2D error increases
significantly during epochs 100 to 200, when the mean number of feature tracking
is limited (less than 20). As Figure 3 shows, the many dynamic objects and the
illumination conditions cause the number of features to decrease. Moreover, when the
velocity is zero (the vehicle stops), the mean number of feature tracking rises
significantly (epochs 70 to 80 and 270 to 330). In short, the performance of the
visual/inertial integrated positioning is correlated with the number of feature tracking.
Fig. 6. Positioning error vs. mean number of feature tracking.
As Figure 7 shows, the top panel presents the mean number of feature tracking
difference and the bottom panel presents the 2D positioning error. We find that the 2D
error increases dramatically during epochs 20 to 60, when the mean number of feature
tracking difference changes significantly (by more than 50). Interestingly, when the 2D
error is very large during epochs 100 to 200, the feature tracking difference fluctuates
only slightly; as Figure 3 shows, many moving objects keep the number of tracked
features low (less than 20) over this period. In short, the performance of the
visual/inertial integrated positioning is correlated with the feature tracking difference.
Fig. 7. Positioning error vs. mean number of feature tracking difference.
In conclusion, the performance of the visual/inertial integration is affected by the
following factors: the velocity error between the reference and VIO, the mean number of
feature tracking, and the mean number of feature tracking difference. More importantly,
numerous dynamic objects on the road reduce the number of tracked features and lead
to large 2D positioning errors. In the future, we plan to use YOLO, a real-time object
detection system, to detect the moving objects and then remove them.
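A possible form of this planned mitigation is sketched below (it is not part of the evaluated system): the bounding boxes of moving objects reported by a detector such as YOLO are used to discard the tracked features that fall inside them before they contribute to the visual measurements. The Box and Feature types are hypothetical placeholders, and the detector itself is not shown.

```cpp
#include <vector>

struct Box { float x_min, y_min, x_max, y_max; };  // detector output, in pixels
struct Feature { float u, v; int id; };            // tracked feature, in pixels

// Return true if the feature lies inside any detected moving-object box.
bool InsideAnyBox(const Feature& f, const std::vector<Box>& boxes) {
  for (const Box& b : boxes) {
    if (f.u >= b.x_min && f.u <= b.x_max && f.v >= b.y_min && f.v <= b.y_max)
      return true;
  }
  return false;
}

// Keep only the features that do not belong to detected moving objects.
std::vector<Feature> RemoveDynamicFeatures(const std::vector<Feature>& features,
                                           const std::vector<Box>& moving_object_boxes) {
  std::vector<Feature> kept;
  for (const Feature& f : features) {
    if (!InsideAnyBox(f, moving_object_boxes)) kept.push_back(f);
  }
  return kept;
}
```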
References
[1] F. Nex and F. Remondino, "UAV for 3D mapping applications: a review," Applied
geomatics, vol. 6, no. 1, pp. 1-15, 2014.
[2] C. Urmson et al., "Autonomous driving in urban environments: Boss and the urban
challenge," Journal of Field Robotics, vol. 25, no. 8, pp. 425-466, 2008.
[3] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, "ORB-SLAM: a versatile and
accurate monocular SLAM system," IEEE Transactions on Robotics, vol. 31, no. 5,
pp. 1147-1163, 2015.
[4] F. Steinbrücker, J. Sturm, and D. Cremers, "Real-time visual odometry from dense
RGB-D images," in 2011 IEEE International Conference on Computer Vision
Workshops (ICCV Workshops), 2011, pp. 719-722: IEEE.
[5] T. Qin, P. Li, and S. Shen, "Vins-mono: A robust and versatile monocular
visual-inertial state estimator," IEEE Transactions on Robotics, vol. 34, no. 4, pp.
1004-1020, 2018.
[6] V. Indelman, S. Williams, M. Kaess, and F. Dellaert, "Factor graph based
incremental smoothing in inertial navigation systems," in Information Fusion
(FUSION), 2012 15th International Conference on, 2012, pp. 2154-2161: IEEE.
[7] S. Fiani and M. Fravolini, "A robust monocular visual odometry algorithm for
autonomous robot application," IFAC Proceedings Volumes, vol. 43, no. 16, pp.
551-556, 2010.
[8] J. Engel, V. Koltun, and D. Cremers, "Direct sparse odometry," IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611-625, 2017.
[9] R. Gonzalez, F. Rodriguez, J. L. Guzman, C. Pradalier, and R. Siegwart, "Control of
off-road mobile robots using visual odometry and slip compensation," Advanced
Robotics, vol. 27, no. 11, pp. 893-906, 2013.
[10] K. Nagatani, A. Ikeda, G. Ishigami, K. Yoshida, and I. Nagai, "Development of a
visual odometry system for a wheeled robot on loose soil using a telecentric
camera," Advanced Robotics, vol. 24, no. 8-9, pp. 1149-1167, 2010.
[11] N. Nourani-Vatani and P. V. K. Borges, "Correlation-based visual odometry for
ground vehicles," Journal of Field Robotics, vol. 28, no. 5, pp. 742-768, 2011.
[12] M. R. U. Saputra, A. Markham, and N. Trigoni, "Visual SLAM and structure from
motion in dynamic environments: A survey," ACM Computing Surveys (CSUR), vol.
51, no. 2, p. 37, 2018.
[13] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard, and F. Dellaert, "iSAM2:
Incremental smoothing and mapping using the Bayes tree," The International Journal
of Robotics Research, vol. 31, no. 2, pp. 216-235, 2012.
[14] S. Agarwal and K. Mierle, "Ceres solver: Tutorial & reference," Google Inc, vol. 2,
p. 72, 2012.