Towards a real-time aerial image mosaicing solution
Alexander Kern
, Markus Bobbe and Ulf Bestmann
TU Braunschweig, Germany
ABSTRACT
Real-time aerial image mosaicing is a crucial task for future search and rescue missions. Solving the correspondence problem, estimating a valid transformation and visualizing the result is computationally intensive. It becomes more challenging if the flying platform is a small micro air vehicle (MAV), which significantly limits the available payload margin.
This paper proposes a robust algorithm that is able to create and update photo maps in a fixed period of time. The approach uses a high number of features and strict match filtering to allow robust matching without additional sensor information. Subsequently, the ability of today's most common single board computers (SBC) to run the presented algorithm is examined. Together with a lightweight board camera, the setup weighs less than 100 grams, allowing even small MAVs to generate maps in real time.
1 INTRODUCTION
In recent years the development of unmanned aircraft has considerably reduced the cost and effort required to generate aerial images. As a result, aerial mapping is becoming more common in various fields such as agriculture or construction site documentation. Common mapping results are 2D pseudo-orthophotos and 3D surface meshes. Both are commonly generated using structure from motion or photogrammetry workflows, which are started after the survey flight and usually take several hours.
The capability for cost- and time-efficient aerial mapping is also needed by various emergency response units. In civil catastrophe scenarios, for instance after earthquakes or tsunamis, these maps can be used by rescue workers to get an overview of the situation. While a 3D mesh of the area generated by structure from motion or photogrammetry best fulfills the need for an all-around perspective, it is computationally intensive and therefore too time consuming. Time is of critical importance for coordinating emergency aid effectively and saving human lives.
This paper aims to provide a solution for a 2D real-time mapping implementation. In its first half a suitable mosaicing algorithm is discussed and presented. Afterwards the hardware requirements are determined and compared to hardware available today. The ultimate goal is to create a lightweight standalone package capable of performing the image acquisition and mapping task independently of the aerial vehicle it is attached to.
Contact: al.kern@tu-braunschweig.de
Figure 1: General approach to image mosaicing.
2 REAL-TIME CAPABLE MOSAICING ALGORITHM
In this section the focus lies on the algorithms used for image mosaicing. The choice is strongly influenced by the requirements of real-time performance. Classic approaches use bundle adjustment to minimize the reprojection error over all images in a single iterative step and thereby obtain a globally consistent solution, but this is not feasible for real-time applications. The computation time is too high and grows with every image taken, as the complexity is $O((m+n)^3)$, where $m$ is the number of images and $n$ is the number of structure points [1]. Real-time performance imposes the requirement of a constant maximum time frame for every stitching iteration. This time frame has to be independent of the number of images that have already been taken. The extensive OpenCV 3.0 C++ library is well suited to this requirement and is therefore used for this task.
2.1 Feature Extraction
Figure 1 shows the general approach to image mosaicing. As soon as a new image is acquired and sent to the pipeline, the first step is initiated. This step combines the detection of markable points (keypoints) and the description of each keypoint's local pixel neighbourhood. For this task the ORB algorithm is used. It is based on the FAST keypoint detector and the BRIEF descriptor [2]. Because ORB uses a binary descriptor, descriptors can be compared with very efficient logical operations, which suits a real-time capable system very well.
2.2 Feature Matching
If the input image is the first of the series, it is saved together with all its features, since there is no other data it can be stitched to. After the acquisition and feature extraction of the next image, the search for corresponding points is initiated. To ensure a fixed time frame the current input image is only stitched to the previous one, so matches have to be found between these two sets of features. This is the most crucial part, as it strongly influences the final transformation. After generating a list of matches by brute-force comparison of their descriptors, it is therefore beneficial to filter these results. A common method to identify good matches is the ratio test [3]. It compares the best and the second best match for each feature by their Euclidean distance. A good match is unlikely to have a second best match that is almost as close; a close second best match would be an indicator for a homogeneous image section.
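The ratio test described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the toy two-dimensional descriptors and the threshold `ratio=0.75` are assumptions chosen for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(query, train, ratio=0.75):
    """Return (query_index, train_index) pairs that pass the ratio test.

    `ratio` is an assumed threshold, not a value taken from the paper.
    """
    matches = []
    for qi, q in enumerate(query):
        # Brute force: distance from this query descriptor to every train descriptor.
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the test needs a best and a second best candidate
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Keep the match only if the best candidate is clearly closer.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches

# Toy descriptors: the second query point has two equally good candidates.
query = [(0.0, 0.0), (5.0, 5.0)]
train = [(0.1, 0.0), (9.0, 9.0), (5.0, 4.0), (5.0, 6.0)]
print(ratio_test_matches(query, train))  # [(0, 0)]
```

The second query descriptor is rejected because its two nearest candidates are equally distant, which is exactly the ambiguous situation the test is meant to catch.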
Additionally, the previously introduced characteristic of ORB as a binary descriptor is used. It describes the neighbourhood of a feature as a string of ones and zeros. By comparing two descriptors bit by bit and quantifying the difference by the Hamming distance, a statement about the similarity of two features can be made. That way the set of raw matches can be reduced to a set of best matches. Figure 2 shows the movement of the matches between two consecutive frames. On the top the movement of the raw matches is displayed, whereas on the bottom the filtered matches can be seen. Figure 3 shows the resulting stitched scene based on 43 aerial images. It should be noted that reducing the matches drastically can decrease the stability of the transformation despite their quality, as the remaining matches may not be spread uniformly across the image and therefore constrain the transformation less well. A good compromise between quality and quantity of matches should therefore be aimed for. It can be achieved by selecting proper parameters for the pipeline, depending mainly on the processing power and the overall performance requirements.
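To make the Hamming distance filter concrete, the following sketch stores each binary descriptor as a Python integer. The helper names, the toy 8-bit descriptors and the threshold `max_distance=2` are assumptions for illustration only; the paper does not state the threshold it uses.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")

def filter_by_hamming(matches, query, train, max_distance):
    """Keep only matches whose descriptors differ in at most max_distance bits."""
    return [(qi, ti) for qi, ti in matches
            if hamming(query[qi], train[ti]) <= max_distance]

# Toy 8-bit descriptors; real ORB descriptors have 256 bits.
query = [0b10110010, 0b00001111]
train = [0b10110011, 0b11110000]
raw_matches = [(0, 0), (1, 1)]

print(hamming(query[0], train[0]))  # 1
print(filter_by_hamming(raw_matches, query, train, max_distance=2))  # [(0, 0)]
```

The XOR followed by a bit count is the kind of cheap logical operation that makes binary descriptors attractive for a fixed time budget.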
To improve the overall stability of the algorithm by minimizing the total drift error, it is furthermore beneficial to use an initial guess of the position to compute image neighbourhood relations. This can be done by Kalman or particle filtering, additional sensor data like GPS, or simple linear extrapolation of image centroids. If neighbours can be found within a defined range, features matched with the previous frame can be reprojected into the neighbouring images to increase the number of observations and ultimately improve the transformations. At this step advanced state-of-the-art algorithms perform local bundle adjustment to optimize camera pose and 3D world points for even higher accuracy [4]. Considering the increasing complexity that comes with numerical optimization, a simpler approach was chosen for our system. The neighbourhood relations are used to compute additional matches between more than two frames, resulting in reduced error propagation, which is evaluated in section 4.
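The simplest of the listed options, linear extrapolation of image centroids followed by a range search, could look roughly like this. The function names, the `exclude_last` parameter and the example coordinates are hypothetical; the paper does not specify how the search range is chosen.

```python
import math

def extrapolate_centroid(prev, last):
    """Predict the next image centroid by continuing the last displacement."""
    return (2 * last[0] - prev[0], 2 * last[1] - prev[1])

def find_neighbours(predicted, centroids, max_range, exclude_last=1):
    """Indices of stored frames whose centroid lies within max_range of the prediction.

    The most recent `exclude_last` frames are skipped because the new image
    is matched against the previous frame anyway.
    """
    candidates = centroids[:len(centroids) - exclude_last]
    return [i for i, c in enumerate(candidates)
            if math.dist(predicted, c) <= max_range]

# Centroids (in map pixels) of frames flown out and back on a parallel leg.
centroids = [(0, 0), (100, 0), (200, 0), (200, 80), (100, 80)]
predicted = extrapolate_centroid(centroids[-2], centroids[-1])
print(predicted)                                    # (0, 80)
print(find_neighbours(predicted, centroids, 120))   # [0]
```

Here the predicted position of the next frame falls next to frame 0 from the previous leg, so that frame becomes a candidate for additional matches.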
Figure 2: Movement of the matched features from two consecutive frames with unfiltered matches on the top and filtered by ratio test and Hamming distance on the bottom.
Figure 3: Stitched result with unfiltered matches on the left and with ratio and Hamming filtering on the right for a test dataset.
2.3 Transformation estimation
After generating features and determining their corresponding points, the transformation estimation starts. The most common method for real-time capable mosaicing approaches is the identification of the homography. It describes the relative position of two planes by a 3x3 matrix with 8 degrees of freedom, called perspective transformation. However, in practice it has proven to be more reliable to use pitch and roll stabilised image data for mapping, reducing the overall complexity of the homography to
$$H = \begin{pmatrix} A & t \\ 0^T & 1 \end{pmatrix} \qquad (1)$$
where $A$ represents a rotation and scaling matrix with a single rotation angle and $t$ the translation vector. Thus with 3 corresponding points the similarity transformation can be identified. It is important to be aware that the homography estimation minimizes the geometric error between the input image and the reference image. However, the reference image in the pipeline is always the last stitched image. To achieve globally consistent solutions it is therefore convenient to first transform all features used for the calculation into the reference coordinate system and then compute the homography [5]. That way the input image is aligned relative to the global reference and not to the untransformed previous image. Additionally, random sample consensus (RANSAC) is applied to reject outliers and improve the transformation accuracy.
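As an illustration of the reduced model in equation (1), the sketch below fits rotation, scale and translation to point correspondences with a closed-form least-squares solution. It is an assumption-laden stand-in, not the authors' method: it omits the RANSAC outlier rejection mentioned above, and the function name and sample points are hypothetical.

```python
def estimate_similarity(src, dst):
    """Least-squares scale, rotation and translation mapping src points onto dst.

    Points are treated as complex numbers, so the model is dst = a * src + b,
    where a encodes scale and rotation and b the translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = q_mean - a * p_mean
    # Matrix of equation (1): A = [[Re a, -Im a], [Im a, Re a]], t = (Re b, Im b).
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag],
            [0.0, 0.0, 1.0]]

# Three correspondences generated with scale 2, rotation 90 degrees, shift (3, 4).
src = [(0, 0), (1, 0), (0, 1)]
dst = [(3, 4), (3, 6), (1, 4)]
for row in estimate_similarity(src, dst):
    print([round(v, 6) for v in row])
```

In a pipeline like the one described, such a fit would typically be wrapped in a RANSAC loop so that a few wrong matches cannot distort the result.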
2.4 Image composition
If a valid transformation matrix was found, the input image must then be composited visually with the rest of the data. This step carries a high risk of breaking the real-time requirement, as the growing global map must be updated in a fixed time frame. Therefore only the section of the map that actually changed should be considered. A common approach to achieve this is to project the corners of the input image into the global reference system using the previously estimated homography. As a result the maximum extent of the warped input image is known, and the visualization update can be restricted to this region of interest.
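The region of interest can be derived from the four projected image corners, as the following sketch shows. The helper names and the example homography are hypothetical; only the 480 x 360 image size is taken from the paper.

```python
def project(H, x, y):
    """Apply a 3x3 homography (nested lists) to the point (x, y)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def warped_roi(H, width, height):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of the warped image."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    xs, ys = zip(*(project(H, x, y) for x, y in corners))
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical transform: 90 degree rotation plus a shift into the map.
H = [[0.0, -1.0, 1000.0],
     [1.0, 0.0, 500.0],
     [0.0, 0.0, 1.0]]
print(warped_roi(H, 480, 360))  # (640.0, 500.0, 1000.0, 980.0)
```

Only the map pixels inside this box need to be redrawn, so the update cost depends on the size of one image and not on the size of the whole map.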
Additionally, to increase performance even further, the visualization process can be decoupled from the rest of the pipeline with respect to image resolution. Feature extraction and matching work on lower resolution data, which is fast but still stable, and the calculated matrix is scaled to the full resolution afterwards. This approach also implies sending only raw image data and the corresponding transformation information to the user. Once received, the frame can be aligned in high resolution to the global reference using the full processing power of the ground station. Furthermore, with this implementation, data packets lost during transmission between UAV and ground station are not a relevant problem, as the on-board solution stays consistent the whole time.
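Scaling the matrix from working resolution to full resolution amounts to a change of coordinates. A minimal sketch, assuming that both the input image and the reference map are scaled by the same factor `k` (a hypothetical parameter name), is:

```python
def scale_homography(H_low, k):
    """Convert a homography estimated on images downscaled by factor k
    into the equivalent homography for the full-resolution images.

    Implements H_full = S * H_low * S^-1 with S = diag(k, k, 1).
    """
    return [[H_low[0][0], H_low[0][1], H_low[0][2] * k],
            [H_low[1][0], H_low[1][1], H_low[1][2] * k],
            [H_low[2][0] / k, H_low[2][1] / k, H_low[2][2]]]

# Hypothetical example: matrix estimated at 480 pixel width, display at 1920 pixel width.
H_low = [[1.0, 0.0, 10.0],
         [0.0, 1.0, -5.0],
         [0.0, 0.0, 1.0]]
print(scale_homography(H_low, 4))
```

Rotation and scale are unaffected by the change of resolution; only the translation grows by the factor `k`, and any perspective terms shrink by it.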
3 HARDWARE SELECTION
In the previous section a potential real-time capable image mosaicing algorithm was described. The next step is to choose lightweight hardware that is able to satisfy the defined requirements.
3.1 SBC Selection
The performance of the proposed algorithm is mainly influenced by two parameters: the image resolution and the number of extracted features. However, these parameters are not independent of each other. A high resolution image has a lot of details that can be detected as markable points. The higher the resolution, the more markable points can be detected and the more features can be extracted. To ensure that matching features are found, it is therefore necessary to extract as many markable points as possible while keeping the computation time low. This is aggravated by the fact that brute-force matching is used: every feature of one image is compared with every feature of the other, resulting in $O(n^2)$ complexity.
Following the approach given by [6], a sufficiently capable single board computer (SBC) can now be selected. Their framework quantifies the performance of the developer's computer in OpenCV by running two different standard algorithms and measuring their respective processing times $T_1$ and $T_2$. The first is a simple one (complexity $C_1 = 0\%$), the second a complex one ($C_2 = 100\%$). By interpolating linearly between these two sampling points the user is able to identify the complexity $C_{alg}$ of their own algorithm, with measured processing time $T_{alg}$, as
$$C_{alg} = \frac{T_{alg} - T_1}{T_2 - T_1} C_2 + C_1 \qquad (2)$$
In the next step [6] ran both algorithms on common SBC hardware, which defines the processing times $T'_1$ and $T'_2$ on each board. With $C_{alg}$ known from the developer's computer, the processing time $T'_{alg}$ on the SBC can then be estimated using
$$T'_{alg} = \frac{C_{alg} - C_1}{C_2} (T'_2 - T'_1) + T'_1 \qquad (3)$$
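Equations (2) and (3) are a pair of linear interpolations and can be written down directly. In the sketch below the function names and all timing values are made up for illustration; they are not measurements from the paper or from [6].

```python
def algorithm_complexity(t_alg, t1, t2, c1=0.0, c2=100.0):
    """Equation (2): complexity of an algorithm from timings on the developer's computer."""
    return (t_alg - t1) / (t2 - t1) * c2 + c1

def estimated_time_on_board(c_alg, t1_board, t2_board, c1=0.0, c2=100.0):
    """Equation (3): estimated processing time of the algorithm on a given board."""
    return (c_alg - c1) / c2 * (t2_board - t1_board) + t1_board

# All timings below are made-up illustration values in milliseconds.
c_alg = algorithm_complexity(t_alg=12.0, t1=2.0, t2=42.0)
t_board = estimated_time_on_board(c_alg, t1_board=10.0, t2_board=210.0)
print(c_alg)             # 25.0 (percent)
print(t_board)           # 60.0 (milliseconds)
print(1000.0 / t_board)  # estimated frames per second
```

The frame rate estimate for a board is then simply the reciprocal of the estimated processing time.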
Table 1 shows the resulting processing performance of the proposed mosaicing algorithm with a working resolution of 480 x 360 pixels and roughly 500 extracted features per image on different SBCs. Furthermore, table 2 shows the specifications of the boards, which allows choosing the best matching component with regard to size and weight. Overall the Brix board is the best compromise between performance and weight. It also outperforms the Intel NUC while being less than half as heavy. However, considering the goal of a very lightweight setup, the Odroid XU3 offers the best solution. In our final system an Odroid XU4 was selected, as it comes with the same processing hardware but fewer peripherals.

| Board       | Estimated fps |
|-------------|---------------|
| ITX i7      | 31.3          |
| Brix        | 53.5          |
| NUC         | 43.3          |
| Odroid XU3  | 12.3          |
| Odroid U3   | 8.1           |
| ITX atom    | 9.3           |
| Jetson      | 6.6           |
| Rasp. Pi B  | 0.6           |

Table 1: Processing performance for the proposed algorithm with different SBCs [7].
3.2 Camera specifications
First it is important to determine the general mission requirements. In an emergency aid scenario a human should be identifiable on the aerial images. Therefore a resolution of at least 5 pixels per 30 cm should be maintained, resulting in a ground sampling distance (GSD) of 6 cm / pixel. With the image resolution of 480 x 360 pixels defined in section 3.1, this leads to an image ground footprint of 28.8 x 21.6 m. The vehicle operation altitude can vary but was chosen to be at least 30 m to prevent collisions with high trees or other obstacles.
Figure 4: UI-1221LE camera developed by IDS.
For a lightweight setup the use of board cameras is a promising option. Figure 4 shows the µEye UI-1221LE developed by IDS. With a resolution of 0.36 megapixels, a maximum of 87.2 fps and a global shutter it is a suitable camera for this mission. The focal length of the camera lens can be deduced from the definition of the GSD
$$GSD = \frac{h_{rel} \, d}{f \, n_{Pixel}} \qquad (4)$$
with the relative altitude $h_{rel}$ above ground, the camera chip width $d$ and the image width $n_{Pixel}$. Applying $h_{rel}$ = 30 m, $d$ = 4.52 mm, $n_{Pixel}$ = 480 pixels and GSD = 6 cm / pixel, solving equation 4 for the focal length gives
$$f = \frac{h_{rel} \, d}{GSD \, n_{Pixel}} = 4.7 \text{ mm} \qquad (5)$$
Finally, to guarantee a stable stitching process a high overlap between two consecutive frames is recommended. Tests showed that an overlap of at least 75% is required.
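The numbers in this section can be reproduced with a few lines of arithmetic. The sketch uses the values given in the text; the function names are hypothetical, and the assumption that the 360 pixel image side points along the flight direction is ours and is not stated in the paper.

```python
def ground_sampling_distance(h_rel, chip_width, focal_length, n_pixel):
    """Equation (4): GSD in metres per pixel (all lengths in metres)."""
    return h_rel * chip_width / (focal_length * n_pixel)

def focal_length_for_gsd(h_rel, chip_width, gsd, n_pixel):
    """Equation (5): focal length in metres that yields the requested GSD."""
    return h_rel * chip_width / (gsd * n_pixel)

gsd = 0.30 / 5                      # 5 pixels per 30 cm gives 0.06 m / pixel
footprint = (480 * gsd, 360 * gsd)  # ground footprint in metres
f = focal_length_for_gsd(h_rel=30.0, chip_width=4.52e-3, gsd=gsd, n_pixel=480)
# Ground distance between exposures at 75% overlap, assuming (not stated in
# the paper) that the 360 pixel image side points along the flight direction.
step = (1 - 0.75) * footprint[1]

print(round(gsd, 3), [round(v, 1) for v in footprint])  # 0.06 [28.8, 21.6]
print(round(f * 1000, 1), "mm")                         # 4.7 mm
print(round(step, 1), "m between frames")               # 5.4 m between frames
```

Under the stated assumption, a new frame would be needed at least every 5.4 m of ground track to keep 75% overlap at this altitude.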
Figure 5: Processing time distribution for each image with image loading (green), feature detection (red) and total processing time (blue).
4 EVALUATION
The proposed algorithm was tested on the selected hardware using an available dataset. First the performance of the SBC is analyzed and compared with the estimation made in the previous section. Subsequently the accuracy of the calculated trajectory is evaluated using a GPS reference. The data set used was published by [8] in 2016 and includes 381 images of a village captured by a UAV at a height of 165 m. Following section 3.2 the image resolution should be at least 480x360 pixels at a height of 30 m with 75% overlap. The data set exceeds the latter two parameters, which results in higher stability of the algorithm at lower ground resolution. The results cannot be transferred directly to the mission requirements, as the resolution is not sufficient to identify humans reliably. However, it should be sufficient to verify the general performance and accuracy capabilities. Additionally, the resolution can be adjusted in the real scenario for visualization purposes, as proposed in section 2.4.

| Name          | Processor           | Memory | Weight [g] | Power at 100% load [W] | Volume [cm^3] |
|---------------|---------------------|--------|------------|------------------------|---------------|
| mini-ITX i7   | Intel i7-4770S      | 16 GB  | 684        | 68                     | 1815          |
| Brix          | Intel i7-4500       | 8 GB   | 172        | 26                     | 261           |
| NUC           | Intel i5-4250U      | 8 GB   | 550        | 20                     | 661           |
| Odroid XU3    | Samsung Exynos 5422 | 2 GB   | 70         | 11                     | 131           |
| Odroid U3     | Samsung Exynos 4412 | 2 GB   | 52         | 7                      | 79            |
| mini-ITX atom | Intel Atom D2500    | 8 GB   | 427        | 24                     | 1270          |
| Jetson        | Cortex A15          | 2 GB   | 185        | 13                     | 573           |
| Rasp. Pi B    | ARM1176JZF-S        | 512 MB | 69         | 4                      | 95            |

Table 2: Specifications of the compared SBCs [7].
Figure 6: Trajectory generated by the GPS receiver (blue) and the stitching algorithm (red) with additional neighbour frames.
4.1 SBC Performance
The extrapolations made in section 3.1 indicate that the Odroid XU4 runs the stitching pipeline at 12.3 FPS. Using the given data set, a mean runtime of 103 ms per frame was achieved. This corresponds to 9.7 FPS for the designed system, about 21% below the estimate. The processing time for each image is displayed in figure 5. It can be seen that reading the image from the hard drive (green) accounts for a significant share of the mean processing time, while feature matching (red) defines the overall variance. Apart from small variations around the mean value, the processing time can be regarded as constant per frame.
4.2 GPS - Image trajectory comparison
Figure 7: Trajectory generated by the GPS receiver (blue) and the stitching algorithm (red) without additional neighbour frames.
To evaluate the accuracy of the transformations and therefore the quality of the mapping process, the UAV trajectories are compared. The trajectory produced by the proposed algorithm was obtained by tracking the x- and y-coordinates of the image centroids in the global reference frame. The scale was extracted by identifying markable centroids in the images and measuring their distance in satellite images. Calculating the meters per pixel and applying this information to the rest of the data produced the red output in figure 6. The GPS trajectory measured by the UAV (standalone, single frequency receiver) is displayed in blue. Figure 7 in contrast shows the calculated trajectory for the same data set without additional neighbour frame matching. It can be seen that in this case the trajectories are visually consistent only for the first leg flown. Even though the trajectories were aligned visually by assuming the error to be minimal at the start, the error of the transformation estimation propagates, grows over time and degrades the mapping solution. For future analysis another possibility might be to align the starting points, calculate a least squares transformation for the first centroids and apply it to the rest of the data. That way the error propagation can be calculated directly for every single measurement, allowing a more detailed analysis.
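A per-frame error of the kind suggested for future analysis could be computed along the following lines. This is only a sketch under strong assumptions: both trajectories are assumed to share the same axis orientation, the start points are simply shifted onto each other instead of fitting a least squares transformation, and all names and sample values are hypothetical.

```python
import math

def metres_per_pixel(pixel_a, pixel_b, metres_between):
    """Map scale from two map pixels whose real-world distance is known."""
    return metres_between / math.dist(pixel_a, pixel_b)

def trajectory_errors(centroids_px, gps_m, scale):
    """Per-frame distance in metres between the stitched and the GPS trajectory.

    Both trajectories are shifted so that their first points coincide,
    i.e. the error is assumed to be zero at the start.
    """
    x0, y0 = centroids_px[0]
    gx0, gy0 = gps_m[0]
    errors = []
    for (x, y), (gx, gy) in zip(centroids_px, gps_m):
        stitched = ((x - x0) * scale, (y - y0) * scale)
        reference = (gx - gx0, gy - gy0)
        errors.append(math.dist(stitched, reference))
    return errors

# Made-up sample data: map pixels for the stitched centroids, metres for GPS.
scale = metres_per_pixel((0, 0), (300, 400), 100.0)   # 0.2 m per pixel
centroids_px = [(10, 10), (60, 10), (110, 15)]
gps_m = [(500.0, 200.0), (510.0, 200.0), (520.0, 200.0)]
print([round(e, 3) for e in trajectory_errors(centroids_px, gps_m, scale)])
```

Plotting these errors over the frame index would show directly where drift accumulates and where neighbour frame matching pulls the solution back.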
Figure 8: Public image sequence data set visualized by the proposed stitching algorithm
Figure 9: Height measurements by GPS (blue) and stitching algorithm (red). Horizontal axis: frame (0 to 450); vertical axis: height [m] (0 to 180).
Additional analysis is possible by decomposing the homography into rotation, scale and translation. By multiplying the resulting scale with the initial UAV height of 165 m, a direct comparison to the GPS height measurements can be made. Figure 9 shows this comparison. The total height measured by image processing is displayed in red, while the GPS data is shown in blue. A notable feature of the plot is its sinusoidal character. This is also an indication of the reduced error propagation achieved by neighbour frame matching. The first leg flown by the UAV ends at frame 77, followed by the second leg until frame 105. In exactly this period the error drops to zero. This pattern repeats throughout the flight, following the flight pattern of the UAV, with a steadily growing error offset.
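For a homography of the reduced form in equation (1), the decomposition is straightforward. The sketch below assumes that the scale is expressed relative to the first frame, so that a scale of 1 corresponds to the initial height; the function names and the example matrix are hypothetical.

```python
import math

def decompose_similarity(H):
    """Split a similarity homography (equation 1) into scale, rotation and translation."""
    a, c = H[0][0], H[1][0]
    scale = math.hypot(a, c)
    angle_deg = math.degrees(math.atan2(c, a))
    translation = (H[0][2], H[1][2])
    return scale, angle_deg, translation

def height_from_scale(scale, initial_height=165.0):
    """Height estimate as described in the text: scale times the initial height."""
    return scale * initial_height

# Hypothetical accumulated transform: scale 1.1, rotation 90 degrees.
H = [[0.0, -1.1, 30.0],
     [1.1, 0.0, -12.0],
     [0.0, 0.0, 1.0]]
scale, angle_deg, translation = decompose_similarity(H)
print(round(scale, 3), round(angle_deg, 1), translation)
print(round(height_from_scale(scale), 1), "m")
```

Evaluating this for every frame yields a height curve that can be laid over the GPS heights.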
5 CONCLUSION
A lightweight setup for image mosaicing was introduced together with a real-time capable stitching algorithm. The feature based approach solves the problem using only image data and no additional sensor information. Tests with a public dataset showed promising results: extracting a high number of markable points and filtering the matches afterwards reduces the error propagation of the homography estimation.
The evaluation of available SBCs revealed that the Odroid XU4 is capable of running the pipeline at an average of 9.7 FPS. In combination with a board camera like the UI-1221LE the whole setup is light and can be developed further into a standalone module, making real-time mapping available for a variety of MAVs. Final flight tests with the defined setup are planned for the very near future and will show whether additional camera stabilization is required.
REFERENCES
[1] Kaushik Mitra and Rama Chellappa. A scalable projective bundle adjustment algorithm using the L infinity norm. Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 2008.
[2] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: an efficient alternative to SIFT or SURF. 2011 International Conference on Computer Vision, 2011.
[3] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.
[4] Raúl Mur-Artal, J. M. M. Montiel, and Juan D. Tardós. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.
[5] Taygun Kekec, Alper Yildirim, and Mustafa Unel. A new approach to real-time mosaicing of aerial images. Robotics and Autonomous Systems, 62:1755–1767, 2014.
[6] Dries Hulens, Jon Verbeke, and Toon Goedemé. How to choose the best embedded processing platform for on-board UAV image processing? VISAPP 2015, 2015.
[7] Dries Hulens. Embedded processing board selection tool. www.eavise.be/hulens/selectiontool.html, 2016.
[8] Shuhui Bu, Yong Zhao, Gang Wan, and Zhenbao Liu. Map2DFusion: Real-time incremental UAV image mosaicing based on monocular SLAM. 2016.
... Nonetheless, with the advent of binary descriptors such as ORB [14], which has been proven to be fast to compute and match, some works [16], [11], [9] have exploited the ORB features to generate an aerial image mosaicing while fulfilling real-time performance up to 30 fps. However, the number of features extracted and the image resolution have been restricted to a minimum, thus preventing to deal with high resolution images, reducing the scope of the application context. ...
... Therefore, this graph gives us a panorama of the drone's speed and the frames taken to generate the panorama, such that from frame 300 to 600 the trajectory speed remains constant and 141 frames were used to generate the panorama. Note that the flight is made up of few frames due to the drone's speed was For the sake of comparison, Table I shows the mean total time, considering a trajectory with constant speed, compared to the standard binary descriptor-based feature matching using OpenCV, and those results reported in [16], [11], [9]. Even though the mean total time reported is quite similar, the proposed approach handles a higher image resolution and a larger amount of extracted descriptors without any parallel processing involving DSP, GPU or multi-core processing, but only the organisation model based on LSH and binary descriptors. ...
... Our approach 2200 800 × 480 35 OpenCV matching 2200 800 × 480 240 [16] n/m 720 × 576 192.6 [11] n/m 320 × 240 6.001 [9] 500 480 × 360 31.3 . An accurate homography translates into an accurate image stitching. ...
... Common implementations can be found in consumer robot vacuum cleaners, and they are an increasingly important component of autonomous vehicle navigation systems. More importantly, SLAM solutions can produce real-time data, and, although less accurate than photogrammetry, they provide an excellent alternative to the mosaicking of drone-based imagery (Bu et al., 2016;Kern et al., 2016). Therefore, we will aim to implement a SLAM algorithm, such as ORB-SLAM2 We are now developing tests to decide the most efficient way of SLAM implementation. ...
Article
Full-text available
This paper presents new developments on drone-based automated survey for the detection of individual items or fragments of material culture visible on the ground surface. Since the publication of our original proof of concept, awarded with the Journal of Archaeological Science and Society for Archaeological Sciences Emerging Investigator Award 2019, additional funding has allowed us to implement a series of improvements to the method. These aim to improve detection capabilities and the extraction of items' shapes and increase flight autonomy, control, area covered per flight and the type of environments in which the method can be applied while reducing computing needs, processing time and expertise necessary for its application. This paper provides an account of the methods followed to achieve these objectives, their preliminary results and the current development for their implementation into a free and open-source system that can be used by the archaeological community at large.
... Given its robustness and fast matching, this binary descriptor has been recently used for real-time aerial image mosaicing. In [17], ORB descriptors are extracted and matched by using a brute force approach. Accordingly, once outliers are removed using RANSAC, a perspective transformation in the form of either a homography or a rigid transformation matrix is computed to stitch the images. ...
Article
Full-text available
This paper presents a GPU-based real-time approach for generating high-definition (HD) aerial image mosaics. The cumbersome process of registering HD images is addressed by a parallel scheme that rapidly matches binary features. The proposed feature matcher takes advantage of the fast ORB (oriented FAST and rotated BRIEF) descriptor and its attainable arrangement into hash tables. By exploiting the best functionalities of binary descriptors and hashing-based data structures, the process of creating HD mosaics is accelerated. On average, real-time performance of 14.5 ms is achieved in a frame-to-frame process, for input images of 2.7 K resolution (2704 × 1521). For evaluation purposes in terms of robustness and speed, we selected two image registration methods for comparison. The first method uses the feature extractor and matcher modules of the well-known ORB-SLAM. The second comparison is carried out against the standard KNN-based matcher of OpenCV. The experiments were conducted under different conditions and scenarios, and the proposed approach exhibits a speed-up of 10.5 times compared to ORB-SLAM-based approach and 36.5 times compared to the OpenCV matcher. Therefore, this research widens the range of applications for aerial mosaicing, since the proposed system is capable of creating high-detail panoramas of large sites while acquiring data.
Chapter
Real world applications of UAV imagery are growing at a rate faster than ever. Along with this growth comes the need to process the UAV images and extract useful information from them. This paper illustrates a comprehensive python-based algorithm to stitch multiple images gathered from a single UAV and then perform a landscape scan to identify features and other non-homogeneities in the ortho-mosaiced image. The methodology introduced for image stitching involves key point detection using the SIFT algorithm and key point matching using KNN and RANSAC algorithms. The methodology introduced for object identification involves the computation of intensity changes between blocks of pixels in the horizontal direction (H-Scan). These intensity changes are then sorted and filtered before being mapped to feature types such as house roofs, mud trails, forest cover, etc. depending on the image being analyzed. The results indicate that the algorithm can extract meaningful information such as the location and intensity of features from the ortho-mosaiced image. The computational power required to implement this algorithm is extremely minimal, making it a good preliminary algorithm to use for mosaicing and analyzing a set of overlapping UAV images.KeywordsUnmanned aerial vehiclesImage processingObject identificationOrtho-Mosaicing
Conference Paper
Full-text available
For a variety of tasks, complex image processing algorithms are a necessity to make UAVs more autonomous. Often, the processing of images of the on-board camera is performed on a ground station, which severely limits the operating range of the UAV. Often, offline processing is used since it is difficult to find a suitable hardware platform to run a specific vision algorithm on-board the UAV. First of all, it is very hard to find a good trade-off between speed, power consumption and weight of a specific hardware platform and secondly, due to the variety of hardware platforms, it is difficult to find a suitable hardware platform and to estimate the speed the user's algorithm will run on that hardware platform. In this paper we tackle those problems by presenting a framework that automatically determines the most-suited hardware platform for each arbitrary complex vision algorithm. Additionally, our framework estimates the speed, power consumption and flight time of this algorithm for a variety of hardware platforms on a specific UAV. We demonstrate this methodology on two real-life cases and give an overview of the present top processing CPU-based platforms for on-board UAV image processing.
Article
Full-text available
This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.
Conference Paper
Full-text available
The traditional bundle adjustment algorithm for structure from motion problem has a computational complexity of O((m+n)3) per iteration and memory requirement of O(mn(m+n)), where m is the number of cameras and n is the number of structure points. The sparse version of bundle adjustment has a computational complexity of O(m3+mn) per iteration and memory requirement of O(mn). Here we propose an algorithm that has a computational complexity of O(mn(radicm+radicn)) per iteration and memory requirement of O(max(m,n)). The proposed algorithm is based on minimizing the Linfin norm of reprojection error. It alternately estimates the camera and structure parameters, thus reducing the potentially large scale optimization problem to many small scale subproblems each of which is a quasi-convex optimization problem and hence can be solved globally. Experiments using synthetic and real data show that the proposed algorithm gives good performance in terms of minimizing the reprojection error and also has a good convergence rate.
Conference Paper
Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments how ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Article
We present a new image mosaicing technique that uses sequential aerial images captured from a camera and is capable of creating consistent large scale mosaics in real-time. To find the alignment of every new image, we use all the available images in the mosaic that intersect the new image instead of using only the previous one. To detect image intersections in an efficient manner, we utilize the Separating Axis Theorem, a geometric tool from computer graphics which is used for collision detection. Moreover, after a certain number of images are added to the mosaic, a novel affine refinement procedure is carried out to increase global consistency. Finally, gain compensation and multi-band blending are optionally used as offline steps to compensate for photometric defects and seams caused by misregistrations. The proposed approach is tested on some public datasets and compared with two state-of-the-art algorithms. Results are promising and show the potential of our algorithm in various practical scenarios.
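The Separating Axis Theorem test mentioned in this abstract can be sketched as follows. This is only an illustrative sketch, not the cited authors' implementation: it assumes each image footprint in the mosaic is a convex polygon given as a list of (x, y) corners, and the function names are hypothetical.

```python
def _edge_normals(poly):
    """Yield one (unnormalized) normal per edge of a convex polygon."""
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        yield (y1 - y2, x2 - x1)  # perpendicular to the edge direction


def _project(poly, axis):
    """Return the (min, max) extent of the polygon projected onto axis."""
    dots = [x * axis[0] + y * axis[1] for x, y in poly]
    return min(dots), max(dots)


def convex_polygons_intersect(a, b):
    """Separating Axis Theorem test for two convex polygons.

    Two convex polygons are disjoint exactly when their projections onto
    at least one edge normal do not overlap. Polygons that merely touch
    are reported as intersecting here.
    """
    for axis in list(_edge_normals(a)) + list(_edge_normals(b)):
        min_a, max_a = _project(a, axis)
        min_b, max_b = _project(b, axis)
        if max_a < min_b or max_b < min_a:
            return False  # found a separating axis
    return True


# Hypothetical footprints: a unit square and a rotated square (diamond).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
diamond = [(0.7, 1.4), (1.4, 0.7), (2.1, 1.4), (1.4, 2.1)]
print(convex_polygons_intersect(square, diamond))
```

The bounding boxes of the two example footprints overlap, but the footprints themselves do not. This is the case where a test on the actual quadrilateral avoids attempting to match against a non-overlapping image.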