Pattern Recognition Letters
journal homepage: www.elsevier.com
Efficient adaptive non-maximal suppression algorithms for homogeneous spatial keypoint distribution
Oleksandr Bailo, Francois Rameau∗∗, Kyungdon Joo, Jinsun Park, Oleksandr Bogdan, In So Kweon
Department of Electrical Engineering, KAIST, Daejeon, 34141, Republic of Korea
ABSTRACT
Keypoint detection usually results in a large number of keypoints which are mostly clustered,
redundant, and noisy. These keypoints often require special processing like Adaptive Non-Maximal
Suppression (ANMS) to retain the most relevant ones. In this paper, we present three new efficient
ANMS approaches which ensure a fast and homogeneous repartition of the keypoints in the image.
For this purpose, a square approximation of the search range to suppress irrelevant points is proposed
to reduce the computational complexity of the ANMS. To further speed up the proposed approaches,
we also introduce a novel strategy to initialize the search range based on image dimension which leads
to a faster convergence. An exhaustive survey and comparisons with already existing methods are
provided to highlight the effectiveness and scalability of our methods and the initialization strategy.
© 2018 Elsevier Ltd. All rights reserved.
1. Introduction
Keypoint detection is often the first step for various tasks
such as SLAM [14], panorama stitching [4], camera calibration [3],
and visual tracking [12, 5]. Therefore, this stage potentially
affects the robustness, stability, and accuracy of the
aforementioned applications. In the past decade, we have witnessed
significant advances in keypoint detectors leading to major im-
provements in terms of accuracy, speed, and repeatability. But
while the detection of keypoints has been intensively studied,
ensuring their homogeneous spatial distribution has attracted a
rather low level of attention. It is well known that spatial point
distribution is crucial to avoiding problematic cases like de-
generated configurations (for structure from motion or SLAM)
or redundant information (i.e. cluster of points) as depicted
in Fig. 1. Moreover, a homogeneous and unclustered point dis-
tribution might speed up most computer vision pipelines since a
lower number of keypoints is needed to cover the whole image.
One of the most effective solutions to ensure well-distributed keypoint detection is to apply an Adaptive Non-Maximal Suppression (ANMS) algorithm on the keypoints extracted by a detector. However, despite all the advantages offered by such approaches, these methods have been rarely used in practice due to their high computational complexity. To overcome this limitation, we propose three novel approaches called Range Tree ANMS (RT ANMS), K-d Tree ANMS (K-dT ANMS), and Suppression via Square Covering (SSC). The developed algorithms aim to efficiently select the strongest and well-distributed keypoints across the image. We achieve such performance using a square search range approximation which is initialized in an optimal and intuitive manner (see Fig. 2).
Fig. 1: Keypoint detection: (a) TopM NMS, (b) bucketing, (c) proposed ANMS. The bottom right subimage represents the coverage and clusteredness of keypoints computed using a Gaussian kernel. The red color in the subimage stands for a dense cluster of points, while the blue color represents an uncovered area.
∗∗Corresponding author: Tel.: +8210-3355-7120; E-mail address: frameau@rcv.kaist.ac.kr (F. Rameau)
An abundant number of experiments are used to demon-
strate the relevance of our ANMS algorithms in terms of speed,
spatial distribution, and memory efficiency. Furthermore, we
experimentally highlight that ANMS is a beneficial step for
SLAM, which drastically improves the accuracy of the motion
estimation while using a restricted number of keypoints.
Fig. 2: Algorithm’s workflow: (a) keypoint detection in the original image (depicted in blue), (b) sorting keypoints by strength and initialization of the search range, (c) conceptual representation of our ANMS algorithm, where every column represents a search range guess (orange boxes) of a binary search iterated until the queried number of points is reached (100 in this example), while every row depicts the iterations through the input points, (d) final result where the red dots represent the selected keypoints. In the depicted example, the successive guesses are: range size 250p (9 detected points), 125p (34), 63p (126), 94p (58), and 70p (105).
To sum up, the contributions of this paper are the following:
• Three novel and efficient ANMS algorithms
• A new and optimal initialization of the search range
• An extensive series of experiments against state-of-the-art
• Efficient and optimized ANMS codes are made available at https://github.com/BAILOOL/ANMS-Codes.
This paper is organized as follows. In Section 2, we provide
an extensive literature review of existing approaches. The nota-
tions as well as proposed methods are introduced in Section 3.
Finally, a large number of experiments is provided in Section 4
followed by a brief conclusion (Section 5).
2. Related work
In this section, we report existing methods that have been de-
veloped to improve the spatial distribution of keypoints. These
approaches can be divided into three categories: bucketing ap-
proaches, Non-Maximal Suppression (NMS), and ANMS.
2.1. Bucketing approach
Currently, the bucketing-based point detection approach [10]
is the most common technique used to ensure good repartition
of the keypoints. This approach is relatively simple: the source
image is partitioned into a grid and keypoints are detected in
each grid cell. The bucketing-based approach is efficient for
detecting keypoints all over the image; however, it is unable to
avoid the presence of redundant information (i.e., clusters of
keypoints).
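As a rough illustration of the bucketing idea described above, the following Python sketch partitions the image into a grid and keeps the strongest keypoints of each cell. The function name `bucket_keypoints`, the `(x, y, score)` tuple layout, and the `per_cell` parameter are assumptions made for this example and are not taken from [10].

```python
from collections import defaultdict


def bucket_keypoints(keypoints, width, height, cols, rows, per_cell):
    """Keep the `per_cell` strongest keypoints in each cell of a cols x rows grid.

    keypoints: iterable of (x, y, score) tuples (hypothetical layout).
    """
    cell_w = width / cols
    cell_h = height / rows
    cells = defaultdict(list)
    for x, y, score in keypoints:
        # Clamp so that points on the right/bottom border fall into the last cell.
        cx = min(int(x / cell_w), cols - 1)
        cy = min(int(y / cell_h), rows - 1)
        cells[(cx, cy)].append((x, y, score))

    selected = []
    for members in cells.values():
        # Strongest first, then truncate to the per-cell budget.
        members.sort(key=lambda kp: kp[2], reverse=True)
        selected.extend(members[:per_cell])
    return selected
```

Note that two strong keypoints lying one pixel apart in the same cell are both kept by this sketch, which is the clustering limitation mentioned above.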
2.2. Non-maximal suppression
NMS (also referred to as TopM) is often used to remove a
large number of keypoints which are mostly redundant or noisy
responses of the keypoint detectors. The most common ap-
proach for NMS [15] consists of suppressing the weakest key-
points using an empirically determined threshold. Thereafter,
the clusteredness is often reduced by suppressing the keypoints
which do not belong to a local maximum within a particular radius.
NMS is a straightforward and fast way to reject unnecessary
corners, but, in many real-world situations, this approach
leads to a very limited spatial spread of the keypoints
(see Fig. 1(a)).
It should be noted that certain works have recently attempted
to improve the NMS stage by introducing a novel adaptive
cornerness score calculation taking into consideration the lo-
cal contrast around the keypoints [16]. Thus, these approaches
tend to improve the spatial distribution as well as the robustness
against illumination variations. However, they suffer from the
point clustering effect inherent to NMS approaches.
2.3. Adaptive non-maximal suppression
ANMS methods have been developed to tackle the aforemen-
tioned drawbacks. These techniques enforce better keypoint
spatial distribution by jointly taking into account the cornerness
strength and the spatial localization of the keypoints. The very
first ANMS approach was proposed by Brown et al. [4]. The
authors initially introduced this concept to robustify the image
matching for panorama stitching. In that work, the keypoints
are suppressed based on their corner strength and the location
of the closest strong keypoint. Unfortunately, the original im-
plementation of this ANMS has a quadratic complexity which
is not suitable for real-time applications such as SLAM.
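To make the quadratic cost concrete, here is a minimal Python sketch of this style of ANMS. It assumes that each keypoint's suppression radius is the distance to the closest strictly stronger keypoint and that the `m` keypoints with the largest radii are kept; the function name `anms_quadratic` and the `(x, y, score)` layout are hypothetical, and any robustness factor used in [4] is left out.

```python
import math


def anms_quadratic(keypoints, m):
    """Quadratic-time ANMS sketch: keep the m keypoints with the largest radius.

    keypoints: list of (x, y, score) tuples (hypothetical layout).
    """
    radii = []
    for i, (xi, yi, si) in enumerate(keypoints):
        # Suppression radius = distance to the closest strictly stronger keypoint.
        r = math.inf
        for j, (xj, yj, sj) in enumerate(keypoints):
            if j != i and sj > si:
                r = min(r, math.hypot(xi - xj, yi - yj))
        radii.append((r, i))
    # Largest radius first; the globally strongest keypoint has an infinite radius.
    radii.sort(key=lambda item: item[0], reverse=True)
    return [keypoints[i] for _, i in radii[:m]]
```

The nested loop over all pairs is what gives the O(n^2) behavior discussed above.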
To overcome this problem, multiple attempts to reduce the
computational time of ANMS have been investigated. For
instance, Cheng et al. [7] proposed an algorithm using a 2-
dimensional k-d tree for space-partitioning of high-dimensional
data. Using this data structure, the keypoints are separated into
rectangular image regions. Then, from each cell, the strongest
features are selected as the output sample set. This algorithm
was extended by Behrens et al. [1] using a general tree data
structure. While these methods perform faster than the tra-
ditional ANMS [4], they do not necessarily output homoge-
neously distributed points.
More recently, Gauglitz et al. [8] have proposed two comple-
mentary approaches that reportedly perform in a subquadratic
run time. In the first approach, the authors have chosen to use
an approximate nearest neighbor algorithm [6] which relies on
a randomized search tree [17]. The second algorithm named
Suppression via Disk Covering (SDC) aims to further boost the
performance of the ANMS. The algorithm simulates an approx-
imate radius-nearest neighbor query by superimposing a grid
onto the keypoints and approximating the Euclidean distance
between keypoints by the distance between the centers of the
cells into which they fall.
Our proposed approaches tackle the limitations of previous
works while maintaining favorable efficiency and scalability.
3. Methodology
In this section, we state the problem and propose several efficient algorithms which ensure a homogeneous repartition of keypoints in the image. Specifically, we cover ANMS based on Tree Data Structure (TDS) (which includes K-dT and RT ANMS) followed by Suppression via Square Covering (SSC). Lastly, we provide a derivation of the initialization of the search range to further speed up the proposed algorithms.
3.1. Problem statement
Most of the recent ANMS approaches share a common pipeline. The set of two-dimensional ($d = 2$) input keypoints $P_{in} = \{p^i_{in}\}_{i=1}^{n}$ of size $n = \mathrm{card}(P_{in})$ (where $\mathrm{card}(\cdot)$ stands for the cardinality operator) is extracted by the detector and sorted according to the cornerness score of the points. Further, the keypoints in $P_{in}$ are iteratively processed to compute a smaller and better-distributed set of output keypoints $P_{out} = \{p^i_{out}\}_{i=1}^{m}$ of size $m = \mathrm{card}(P_{out})$, where $m$ is defined by the user. The output set of points ensures a good spatial coverage all over the image while avoiding clustering. This homogeneous point distribution is enforced by a spatial consistency check in an adaptive search range of size $w$ ($w$ is the radius of a circle or half the side of a square, depending on the approach used) defining the suppression neighborhood around a candidate point $p_{in}$. The range $w$ is adjusted until the number of retrieved points is close to $m$ according to a certain threshold $m \pm t$, where $t$ represents a user-defined tolerance.
3.2. ANMS based on Tree Data Structure
Using a data structure is a common way to approach the ANMS problem [8]. However, previous attempts have resulted in relatively inefficient implementations (Section 2). In addition, as observed in [1], after the ANMS step, there are still regions in the image containing a high level of clusteredness. In this section, we propose an efficient algorithm which relies on more suitable data structures and maintains good spatial keypoint distribution. K-dimensional Tree [13] (K-dT) and Range Tree [2] (RT) have been used for this purpose.
First, K-dT is a binary search tree where the data in each node is a K-dimensional point in space. Using this data structure allows space partitioning to organize points in a K-dimensional space. This partitioning can be used to efficiently retrieve the set of points $P_w$ which fall into a defined range around a particular point. On the other hand, RT is an alternative to K-dT. RT is a binary search tree where the data in each node contains an associated structure that is a $(d-1)$-dimensional RT. Compared to K-dTs, RTs offer faster query times in exchange for worse storage complexity (see Table 1). While these two data structures are essentially different, from a high-level perspective, the algorithm is generic and appropriate for any data structure that is capable of retrieving the set of points within the defined range. Therefore, we describe both proposed algorithms (i.e., K-dT ANMS and RT ANMS) within this subsection.
Table 1: TDS time and storage analysis.
| Operation | K-d Tree | Range Tree |
|---|---|---|
| Insert (time) | $O(\log n)$ | $O(\log^d n)$ |
| Query (time) | $O(n^{1-1/d} + \mathrm{card}(P_w))$ | $O(\log^d n + \mathrm{card}(P_w))$ |
| Delete (time) | $O(\log n)$ | $O(\log^d n)$ |
| Storage | $O(n)$ | $O(n \log^{d-1} n)$ |
The TDS is built on keypoints $P_{in}$ sorted in decreasing order of strength (i.e., cornerness score). This TDS is used in our algorithm as a way to efficiently obtain the nearest neighbors of a particular keypoint given a search range. This search range is determined by the binary search that tries to guess the most appropriate search range $w$ to satisfy the queried number of keypoints. For every $w$ guess, the nearest neighbors of each keypoint (processed in decreasing order of strength) are suppressed in a way that they will not be considered in further iterations under the selected $w$. For this purpose, the index list $Idxs$ is used to keep track of the uncovered keypoints. The binary search terminates when the number of retrieved keypoints is close to the number of queried keypoints $m$ according to a tolerance threshold $m \pm t$. The outline is provided in Algorithm 1.
The proposed algorithm has similarities to the algorithm pre-
sented in [8] where the authors have chosen to use an approxi-
mate nearest neighbor algorithm which relies on a randomized
search tree. However, that algorithm [8] is not optimally ef-
ficient since it performs both query and delete operations for
each candidate keypoint in Pin per radius guess. Furthermore,
it requires dynamically adding/removing keypoints to the tree
which drastically slows performance. In contrast, our algorithms
achieve comparable results with a single query operation
per search range guess, which makes them more efficient and
scalable.
Algorithm 1: ANMS based on TDS
Input: keypoints P_in extracted by the detector
Output: spatially distributed keypoints P_out
sort P_in by strength
build TDS on sorted P_in
initialize binary search boundaries (Sec. 3.4)
while binary search for search range w do
    P_out = ∅
    initialize Idxs with all as selected
    for p_i ∈ P_in do
        if p_i ∈ Idxs then
            P_out = P_out ∪ p_i
            P_w = TDS.query(p_i, w)
            Idxs = Idxs \ P_w
    if |card(P_out) − m| ≤ t then return P_out
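The control flow of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: a brute-force square range query stands in for the K-d tree or range tree, the search range is restricted to integer values, and the names `anms_tds`, `query`, and the `(x, y, score)` layout are hypothetical. A real implementation would replace `query` with the tree lookup.

```python
def anms_tds(keypoints, m, t, low, high):
    """Sketch of Algorithm 1 with a brute-force range query standing in for the TDS.

    keypoints: list of (x, y, score); m: queried count; t: tolerance;
    low/high: integer bounds of the binary search on the search range w.
    """
    pts = sorted(keypoints, key=lambda kp: kp[2], reverse=True)

    def query(i, w):
        # Stand-in for TDS.query(p_i, w): indices inside the square of half-side w.
        xi, yi, _ = pts[i]
        return [j for j, (x, y, _) in enumerate(pts)
                if abs(x - xi) <= w and abs(y - yi) <= w]

    selected = []
    while low <= high:
        w = (low + high) // 2
        uncovered = [True] * len(pts)      # plays the role of Idxs
        selected = []
        for i in range(len(pts)):
            if uncovered[i]:
                selected.append(i)
                for j in query(i, w):
                    uncovered[j] = False
        if abs(len(selected) - m) <= t:
            break
        if len(selected) > m:
            low = w + 1                    # too many points: enlarge the range
        else:
            high = w - 1                   # too few points: shrink the range
    return [pts[i] for i in selected]
```

If the binary search runs out of candidates before the tolerance is met, the sketch returns the result of the last guess.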
3.3. Suppression via square covering
We have compared both K-dT ANMS and RT ANMS and
observed similar performance in terms of keypoint repeatabil-
ity and clusteredness (see Section 4.4). It is worth mentioning
that while in the case of K-dT the range of search is defined by
the radius around the candidate point, RT uses a square approx-
imation of the search range. This square approximation can
potentially boost the speed performance of the ANMS.
One of the key approximations which makes SDC [8] efficient is the simulation of an approximate radius-nearest neighbor query by superimposing a grid $G_w$ onto the keypoints and approximating the Euclidean distance between keypoints by the distance between the centers of the cells into which they fall. While this approximation performs well, it still requires computing the Euclidean distance between a large number of keypoints. This is a crucial concern since the number of computations increases as the number of keypoints grows.
To tackle the aforementioned problem, we propose to apply the square approximation to the SDC [8] algorithm. In particular, once the grid $G_w$ is set, we try to cover the cells which lie within $2w$ (determined by binary search) regardless of where exactly the points are located inside this square range of coverage. This drastically boosts the performance of the algorithm since covering of the cells is simply performed by traversing through a square search range without the need for Euclidean distance computation. The pseudo-code is in Algorithm 2.
Algorithm 2: Suppression via Square Covering (SSC)
Input: keypoints P_in extracted by the detector
Output: spatially distributed keypoints P_out
sort P_in by strength
initialize binary search boundaries (Sec. 3.4)
while binary search for suppression side w do
    set resolution of grid G_w = w/2
    uncover cells of G_w
    P_out = ∅
    for p_i ∈ P_in do
        if cell G_w[p_i] is not covered then
            P_out = P_out ∪ p_i
            cover cells of G_w around p_i with square of side 2w
    if |card(P_out) − m| ≤ t then return P_out
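A minimal Python sketch of Algorithm 2 is given below. It assumes integer values of $w$ with `low >= 1`, keypoints given as `(x, y, score)` tuples with coordinates inside the image, and a hypothetical function name `ssc`; it is meant to illustrate the covering step rather than to reproduce the released code.

```python
def ssc(keypoints, m, t, width, height, low, high):
    """Sketch of Suppression via Square Covering.

    keypoints: list of (x, y, score) with 0 <= x < width and 0 <= y < height.
    low/high: integer bounds (low >= 1) of the binary search on w.
    """
    pts = sorted(keypoints, key=lambda kp: kp[2], reverse=True)
    selected = []
    while low <= high:
        w = (low + high) // 2
        c = w / 2.0                                # grid resolution G_w = w / 2
        n_cols = int(width / c) + 1
        n_rows = int(height / c) + 1
        covered = [[False] * n_cols for _ in range(n_rows)]
        selected = []
        for x, y, score in pts:
            row, col = int(y / c), int(x / c)
            if covered[row][col]:
                continue
            selected.append((x, y, score))
            # Cover the 5 x 5 block of cells around the point's cell; with a
            # cell size of w / 2 this approximates the square of side 2w.
            for r in range(max(row - 2, 0), min(row + 2, n_rows - 1) + 1):
                for q in range(max(col - 2, 0), min(col + 2, n_cols - 1) + 1):
                    covered[r][q] = True
        if abs(len(selected) - m) <= t:
            break
        if len(selected) > m:
            low = w + 1                            # too many points: enlarge w
        else:
            high = w - 1                           # too few points: shrink w
    return selected
```

The inner loops only flip booleans in the grid, so no Euclidean distance is evaluated at any point.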
3.4. Initialization of search range
Similar to our proposed algorithms, the SDC [8] uses binary search to guess the appropriate search range. In this previous work, the upper bound $a_h$ of the binary search is set to the image width $W_I$, while the lower one $a_l$ is set to 1. This often results in unnecessary iterations and decreases the convergence speed. To tackle this problem, we propose a novel and elegant way to precompute the bounds for the binary search which drastically decreases the number of iterations until convergence and, in turn, improves the speed of the algorithm.
Fig. 3: Graphical representation of the optimal point distribution. Bounding boxes of different colors represent the search range around the candidate points.
Our problem statement is the following: we want to homogeneously distribute the $m$ queried points on the image without any clusters. To do so, we try to cover the image with squares of side $2a_h$ with a minimal distance between the square centers of $a_h + 1$ (see Fig. 3). Given $2a_h$ and $W_I$, we can calculate the maximum number of squares that perfectly fit in a row of the image. We define this row as a set of squares placed at the same height in the image where the first and the last square in a row are perfectly aligned with the image borders. If there are $q$ points (i.e., square centers) inside each row, then there are $q - 1$ distances in each row between these points. In addition, the left and right extreme points are located at a distance $a_h$ from the left and right borders of the image. Thus, we can express the image width $W_I$ in terms of $a_h$ and $q$:
$$W_I = 2a_h + (a_h + 1)(q - 1), \tag{1}$$
hence, from Equation (1), the number of points in each row is:
$$q = \frac{W_I - a_h + 1}{a_h + 1}. \tag{2}$$
Similarly, the maximum number of square centers $l$ fitting within the image height $H_I$ satisfies:
$$H_I = 2a_h + (a_h + 1)(l - 1). \tag{3}$$
The queried number of points $m$ is equal to the product of $q$ and $l$. By substituting $l = \frac{m}{q}$ into Equation (3) and substituting $q$ from Equation (2), we obtain the following equation:
$$(m - 1)a_h^2 + a_h(W_I + 2m + H_I) + m + W_I - H_I W_I = 0. \tag{4}$$
Solving this equation for $a_h$ yields two solutions, one of which is always negative, while the other one gives us the final estimation of the square side:
$$a_h = -\frac{H_I + W_I + 2m - \sqrt{\Delta}}{2(m - 1)}, \tag{5}$$
where the discriminant of the quadratic Equation (4) is:
$$\Delta = 4W_I + 4m + 4H_I m + H_I^2 + W_I^2 - 2W_I H_I + 4W_I H_I m. \tag{6}$$
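As a small worked example of Equations (5) and (6), the Python sketch below evaluates the upper bound for a given image size and queried number of points. The function name `upper_bound` is hypothetical, and the rounding to whole pixels mentioned in the text is left to the caller.

```python
import math


def upper_bound(width, height, m):
    """Upper bound a_h of the binary search, following Equations (5) and (6)."""
    # Discriminant of the quadratic Equation (4), written as in Equation (6).
    delta = (4 * width + 4 * m + 4 * height * m + height ** 2 + width ** 2
             - 2 * width * height + 4 * width * height * m)
    # Non-negative root of the quadratic, Equation (5); requires m > 1.
    return -(height + width + 2 * m - math.sqrt(delta)) / (2 * (m - 1))
```

For a 1280 × 720 image and m = 800, this gives an upper bound of roughly 32 pixels instead of the image width of 1280 pixels.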
It is worth mentioning that, while the solution tries to allocate as many points as possible, it does not guarantee that the number of points on the image will be exactly equal to $m$. This happens for several reasons. First of all, the fraction $\frac{m}{q}$ might not produce an integer value for $l$. Secondly, in the implementation, we round the obtained value of $a_h$ since the minimum unit of an image is 1 pixel.
The lower bound $a_l$ of the binary search can be determined by looking closely at the worst possible point distribution. This happens when all $n$ input points are located in a single square on the image with no space between them. Given such a distribution, we want to retrieve at least $m$ queried points by filling this space with the smallest possible squares of side $2a_l$. This can be mathematically expressed as $m(2a_l)^2 = n$. Therefore, the equation for the lower bound of the binary search is the following:
$$a_l = \frac{1}{2}\sqrt{\frac{n}{m}}. \tag{7}$$
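Equation (7) translates directly into code. The short Python sketch below uses a hypothetical function name `lower_bound` and returns the unrounded value.

```python
import math


def lower_bound(n, m):
    """Lower bound a_l of the binary search, following Equation (7)."""
    # n: number of input keypoints, m: number of queried keypoints.
    return 0.5 * math.sqrt(n / m)
```

For example, with n = 3200 input keypoints and m = 800 queried keypoints, the relation m(2a_l)^2 = n yields a_l = 1.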
4. Results
4.1. Time and storage complexity
The detailed time complexity analysis is provided in Table 2. All of the presented algorithms (listed in the ‘Method’ column) rely on preprocessing (i.e., sorting by strength) the input keypoints. For this purpose, we utilize a sorting algorithm with an average performance of $O(n \log n)$. Additionally, K-dT and RT ANMS rely on a TDS which has to be populated with the input keypoints. This is performed by inserting (see Table 1 for complexity) the $n$ keypoints one by one into the data structure, resulting in an overall $O(n \log n)$ and $O(n \log^d n)$ complexity, respectively. The query time for each algorithm to select appropriate keypoints is stated in the ‘Query’ column of Table 2. Specifically, the TopM algorithm simply retrieves $m$ keypoints from an already sorted list in $O(m)$. The traditional ANMS [4] (designated ‘Brown’ in our experiments) requires the computation of the minimum distance between every pair of keypoints in $O(n^2)$, followed by sorting in $O(n \log n)$ and keypoint retrieval in $O(m)$. Since the rest of the algorithms (SDC, K-dT ANMS, RT ANMS, and SSC) rely on a binary search to find the appropriate search range in $O(\log w_{ini})$, the total query time complexity can be obtained by multiplying the number of search range guesses by the complexity of keypoint selection per guess. The total time complexity listed in ‘Total (approximated)’ gives us the following insight into the algorithms’ performance. The TopM approach outperforms all other methods in terms of speed due to its simplicity. Furthermore, SDC, K-dT ANMS, RT ANMS, and SSC are asymptotically faster than the traditional ‘Brown’ [4].
The storage complexity evaluation is shown in Table 2. Methods that do not rely on any data structure (e.g., TopM, ‘Brown’ [4], SDC, SSC) occupy at most the memory needed to hold the input and output keypoints, resulting in $O(n + m)$ complexity. These methods have better storage complexity than K-dT and RT ANMS, which additionally require memory for storing the TDS (see Table 1).
Overall, due to the sophisticated time complexity estimation,
it is challenging to highlight a clear winner among the fastest
ANMS algorithms (SDC, K-dT ANMS, RT ANMS, and SSC).
In order to provide a qualitative evaluation of the algorithms,
we have performed an extensive evaluation of all methods.
4.2. Synthetic and real experiments
First of all, to fairly assess the speed performance of the different algorithms, a large series of synthetic experiments has been performed. For this purpose, a set of randomly distributed 2D points is generated on a synthetic image of resolution 1280 × 720 pixels. Further, a random cornerness score is individually assigned to every point to simulate the behavior of keypoint detection in a natural image. The number of 2D points is in the range [800, 11000] with a step of 100. Every test is repeated 1000 times to ensure an unbiased estimation of the algorithms’ speed. The queried number of points is fixed to 800, while the search range $w$ is initialized to the image width. We intentionally did not use our initialization technique (Section 3.4) for this test to provide a fair comparison with SDC [8]. It should be noted that Brown [4] has been removed from these experiments for the sake of clarity (i.e., scale inconsistency) since this method is significantly slower than the proposed approaches.
Fig. 4: Comparison of methods (RT ANMS, SSC, SDC, K-dT ANMS, TopM) on synthetic data against the number of points: (a) mean processing time (ms), (b) standard deviation (ms).
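The synthetic input described above could be generated along the following lines. The uniform distribution of positions and scores, the `(x, y, score)` layout, and the names `synthetic_keypoints` and `point_counts` are assumptions of this sketch; the text only states that points and scores are random.

```python
import random


def synthetic_keypoints(n, width=1280, height=720, seed=None):
    """Generate n uniformly distributed 2D points with random cornerness scores."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height), rng.random())
            for _ in range(n)]


# Point counts used in the sweep: 800, 900, ..., 11000.
point_counts = list(range(800, 11001, 100))
```

Each entry of `point_counts` would then be evaluated repeatedly (1000 times in the experiment) with a queried number of 800 points.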
The mean computational time and the standard deviation against the number of points are shown in Fig. 4(a) and Fig. 4(b), respectively. Through this experiment, it is noticeable that the TopM algorithm drastically outperforms more sophisticated approaches (but provides a very unsatisfying point distribution in practice, see Sec. 4.4). Other algorithms show interesting characteristics. SSC is more efficient than the other ANMS algorithms both in stability (i.e., the standard deviation remains very low) and speed. On the other hand, SDC demonstrates satisfying results but suffers from a lack of stability for a low number of input keypoints. Indeed, this approach is more efficient than RT ANMS when the number of input points is large; however, this tendency is reversed when the detected keypoints do not exceed 5000. Moreover, SDC is less scalable than our SSC approach. Finally, despite its efficiency for a small number of points, K-dT ANMS loses its advantage for more than 2000 points.
While the assessment with synthetic data is a good evaluation showing clear tendencies, the distribution of keypoints in real images can be different from the ones obtained synthetically. Therefore, we propose an extensive series of evaluations using real images. For this purpose, we select 1000 images from KITTI [9] (Sequence 00) and detect keypoints using FAST [15] with a threshold th = 5. Such a relatively low threshold results in a large number of detected keypoints (i.e., > 10000 keypoints per image). Subsequently, the keypoints are sorted by their strength. Further, we iteratively select a fixed number of the strongest keypoints starting from 100 until reaching 10000. The step of such selection is 100, which results in 100 tests per image. The ANMS algorithms return a fixed percentage of the input number of keypoints. For instance, for 1000 input keypoints and a queried percentage of 10%, the number of queried keypoints is 100. We have applied 10 different ratios in the range [10%, 100%] with a step of 10%. Several representative results from these evaluations are provided in Fig. 5.
This extensive evaluation demonstrates that SSC clearly outperforms all other methods in terms of speed. Overall, different conclusions from those obtained with the synthetic experiments can be drawn. Indeed, SDC remains efficient for a relatively small number of queried keypoints (Fig. 5(a)) but tends to be less effective when a large number of input and output keypoints are processed. This can be explained by the substantial number of Euclidean distance comparisons to be computed for a dense set of input keypoints (Fig. 5(b) and (c)). In this respect, our RT ANMS scales more efficiently even when many output points are requested (see Fig. 5(b) and (c)). Finally, our K-dT ANMS becomes inefficient for a large number of points due to the relatively slow query time of this data structure (see Fig. 5(c)).
Table 2: Time and storage complexity.
| Method | Preprocess | Build | Query | Total (approximated) | Storage complexity |
|---|---|---|---|---|---|
| TopM | $O(n \log n)$ | - | $O(m)$ | $O(n \log n)$ | $O(n + m)$ |
| Brown | $O(n \log n)$ | - | $O(n^2 + n \log n + m)$ | $O(n^2)$ | $O(n + m)$ |
| SDC | $O(n \log n)$ | - | $O(\log w_{ini} \cdot (n + mr))$ | $O(n \log n + \log w_{ini} \cdot (n + mr))$ | $O(n + m)$ |
| K-dT ANMS | $O(n \log n)$ | $O(n \log n)$ | $O(\log w_{ini} \cdot (n + n^{1-1/d} + \sum \mathrm{card}(P_w)))$ | $O(n \log n + \log w_{ini} \cdot (n + \sum \mathrm{card}(P_w)))$ | $O(n + \mathrm{card}(P_w) + m)$ |
| RT ANMS | $O(n \log n)$ | $O(n \log^d n)$ | $O(\log w_{ini} \cdot (n + \log^d n + \sum \mathrm{card}(P_w)))$ | $O(n \log^d n + \log w_{ini} \cdot (n + \sum \mathrm{card}(P_w)))$ | $O(n \log^{d-1} n + \mathrm{card}(P_w) + m)$ |
| SSC | $O(n \log n)$ | - | $O(\log w_{ini} \cdot (n + 4m))$ | $O(n \log n + \log w_{ini} \cdot (n + 4m))$ | $O(n + m)$ |
Fig. 5: Mean processing time (ms) vs. number of input keypoints for 1000 images, for SDC, K-dT ANMS, RT ANMS, and SSC with a queried percentage of (a) 10 percent, (b) 40 percent, and (c) 70 percent. Subfigures (a)-(c) show linear scale of the y axis.
Fig. 6: Number of iterations until convergence (with and without initialization).
4.3. Effect of proposed initialization
In this experiment, we evaluate the impact of our initialization on the speed of the different methods. For this purpose, the same real-image experimental setup described in Section 4.2 is used. However, in this case, we employ our initialization approach (Section 3.4). Two criteria have been utilized to determine the advantages offered by this technique. The first is the number of iterations needed to reach the number of queried points (see Fig. 6). The second one is the overall speed-up provided to each method (see Table 3).
For every single method, the number of necessary iterations has been reduced by a factor of three, leading to a significant speed-up. It is noticeable that certain approaches are more affected by this initialization. For instance, this is the case for the K-dT ANMS approach, which has been sped up by a factor of 2.6. However, some other algorithms, such as our RT ANMS, have been only moderately improved by our bounds calculation. This can be explained by the nature of the RT structure itself. In fact, with a closer initialization (i.e., a smaller search range), the total number of expensive queries increases.
Table 3: Speedup provided by initialization (average over 1000 images).
| Method | Without initialization (ms) | With initialization (ms) | Speedup |
|---|---|---|---|
| SDC | 7.4 | 3.1 | 2.4x |
| K-dT ANMS | 17.5 | 6.8 | 2.6x |
| RT ANMS | 8.9 | 7.3 | 1.2x |
| SSC | 2.0 | 1.4 | 1.4x |
4.4. Clusteredness
The main advantage of using an ANMS strategy is the un-
clustered and well-distributed set of keypoints resulting from
this process. Indeed, this feature allows us to avoid redun-
dant information typically occurring with commonly used ap-
proaches such as bucketing keypoint detection and standard
NMS. To evaluate the clusteredness we have reproduced the
experiment suggested in [16], where the authors propose an ap-
propriate metric to evaluate this criterion. For this evaluation,
the image is divided into a regular grid of 10 × 10 cells to
compute the number of points lying in every single cell. The
standard deviation of the number of corners per cell is utilized as
the clusteredness metric since it is representative of the homo-
geneity of the spatial distribution.
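The metric described above can be sketched in a few lines of Python. The function name `clusteredness`, the use of raw per-cell counts, and the choice of the population standard deviation are assumptions of this example; any normalization applied in the reported plots is not reproduced here.

```python
import statistics


def clusteredness(points, width, height, grid=10):
    """Standard deviation of the number of points per cell of a grid x grid partition.

    points: iterable of (x, y) positions inside a width x height image.
    """
    counts = [0] * (grid * grid)
    for x, y in points:
        # Clamp so that points on the right/bottom border fall into the last cell.
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        counts[row * grid + col] += 1
    return statistics.pstdev(counts)
```

A perfectly even distribution (one point per cell) gives 0, while packing all points into a single cell gives a large value.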
To provide a statistically valid evaluation of the clusteredness for every single approach, 2000 randomly selected images from the KITTI dataset [9] are used. In this experiment, th = 12 and the number of queried keypoints $m$ varies from 100 to 700. The obtained results are shown in Fig. 8. All the ANMS approaches provide similar outputs in terms of spatial distribution, which can also be observed in Fig. 7. As for the bucketing approach (grid size 7 × 5), it produces a better spatial distribution than TopM but cannot match the performance of the ANMS strategies. This can be explained by the fact that the bucketing approach is designed to ensure good spatial distribution but does not solve the problem of point clusters.
Fig. 7: Keypoint detection: (a) K-dT ANMS, (b) RT ANMS, (c) SSC. The red dots represent selected keypoints. In this experiment, th = 12 and m = 100.
Fig. 8: Mean and standard deviation of the clusteredness over 1000 images, plotted against the number of points (100 to 700) for RT ANMS, SSC, SDC, Brown, K-dT ANMS, TopM, and Bucketing.
4.5. Application to SLAM
SLAM is one of the applications where the spatial distribu-
tion of the keypoints on the image is crucial. Therefore, we
have included our ANMS solutions in a stereo-SLAM algo-
rithm which is conceptually close to S-PTAM [14]. Specifi-
cally, the keypoints are detected on both stereo-images using
the FAST detector (th=12) and filtered by our ANMS algorithms to
reach 750 points. These stereo points are matched together us-
ing a line search strategy and triangulated to initialize the 3D
map. Motion tracking is performed using a RANSAC P3P [11]
algorithm. Finally, the mapping is achieved by refining the
structure and the motion together via a local bundle adjustment
scheme. For this evaluation, we have utilized all the training se-
quences from the KITTI dataset [9] where an accurate ground
truth is provided. The mean translation error (in percent) and
rotation error (in degrees) per sequence are computed with the
metrics recommended by [9].
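For readers unfamiliar with the triangulation step, the following sketch shows how a matched stereo keypoint can be converted into a 3D point. It assumes a rectified stereo pair with a pinhole model, so that matches lie on the same image row. The function name and the parameter names (fx, fy, cx, cy, baseline) are illustrative, and the sketch does not reproduce the triangulation code of our SLAM system.

```python
def triangulate_rectified(u_left, u_right, v, fx, fy, cx, cy, baseline):
    """Triangulate one match from a rectified stereo pair (assumed setup).

    u_left, u_right: column of the keypoint in the left and right images.
    v: common row of the keypoint (rectified images).
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    baseline: distance between the two cameras, in meters.
    Returns (X, Y, Z) in the left camera frame, in meters.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = fx * baseline / disparity  # depth from disparity
    x = (u_left - cx) * z / fx     # back-projection along the image x axis
    y = (v - cy) * z / fy          # back-projection along the image y axis
    return x, y, z
```

Because the depth is inversely proportional to the disparity, distant points with small disparities are the most sensitive to matching noise.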
The results of the entire experiment are available in Fig. 9.
Regarding the translational error, a clear tendency is noticeable:
the TopM algorithm performs markedly worse than the other
approaches, while the bucketing approach tends to perform better
than TopM but never provides better results than the ANMS
methods. All the ANMS approaches provide comparable results.
The same tendency is observed for the rotation estimation,
although the detection of well-distributed keypoints is less
crucial for rotation. The error discrepancy between the different
sequences may be explained by the various contexts in which the
sequences have been acquired. For instance, in Seq00 the large
rotational error can be explained by the high number of turns in
the sequence, while Seq04 (which exhibits a very low rotational
error) mostly consists of a short and straight line. Moreover,
Seq01 is interesting to analyze because it is probably the most
challenging: the vehicle travels at high speed through a
relatively empty scene (low texture). These factors make the
keypoints particularly difficult to track. Under these conditions,
ANMS algorithms show even more significant improvements. Another
observation is the improved robustness to moving objects. Our
approaches also show very promising results in sequences
containing one or more moving objects (for instance in Seq04).
With ANMS, only a few keypoints are detected on moving objects,
while the majority belong to the rigid background. Therefore, the
outliers are more efficiently removed by a robust estimation step
(RANSAC in our SLAM). Finally, in Fig. 9, the slight error
discrepancy between ANMS methods is mostly due to the inherent
randomness of the point tracking strategy, noise, and numerical
error typical of real image experiments. Nevertheless, we can
conclude that all the ANMS approaches compared in this paper
significantly improve the SLAM algorithm in a very similar manner.
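The benefit of a lower outlier ratio for the robust estimation step can be illustrated with the textbook RANSAC iteration bound, N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s the minimal sample size, and p the desired confidence. The sketch below evaluates this bound. The function name, the confidence p = 0.99, and the sample size s = 3 (chosen to mirror P3P) are assumptions, and the numbers are not measurements from our experiments.

```python
import math


def ransac_iterations(inlier_ratio, sample_size=3, confidence=0.99):
    """Iterations needed to draw at least one all-inlier sample
    with probability `confidence` (standard RANSAC bound)."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    p_good = inlier_ratio ** sample_size  # probability that a sample is outlier-free
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))
```

Under these assumptions, 4 iterations suffice when 90% of the matches are inliers, whereas 35 are needed when only 50% are, which shows how keeping keypoints off moving objects can lighten the RANSAC step.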
In Fig. 10, we present a qualitative comparison of our SSC
algorithm against the bucketing strategy. For this evaluation,
we use the New College dataset [18] (see Fig. 1) consisting
of 50000 stereo image pairs, covering 2.5 km with a hand-held
stereo camera (multiple loops and challenging scenarios).
Through this experiment, it is clear that our ANMS approach
significantly reduces drift over the sequence compared to the
bucketing approach. This drift is particularly obvious in the
side view (see Fig. 10(b)). Note that the TopM algorithm is not
depicted in this figure for the sake of clarity (very large drift).
4.6. Discussion on proposed methods
ANMS approaches are beneficial under specific contexts and
conditions. They are appropriate for pose estimation (SLAM,
panorama stitching, etc.), self-calibration, and Structure from
Motion (SfM). Similarly, Schauwecker et al. [16] have
demonstrated that a good distribution of the points in the images
results in better sparse stereo matching. However, ANMS is not
limited to these topics and can be appropriate for many real-time
approaches. For example, it might be the case for Bag-of-Words
place recognition, where well-distributed points can lead to a
stronger description of the image. Although SDC [8] was
originally developed by its authors for planar tracking purposes,
we believe that ANMS might be counter-productive for visual
tracking under certain conditions (e.g., small target, cluttered
scene). Other techniques requiring a dense cluster of points on a
salient part of the image (e.g., point-based obstacle detection)
would probably not be improved by ANMS.
In this paper, we have proposed three ANMS techniques
named K-dT ANMS, RT ANMS, and SSC to homogeneously
distribute keypoints on the image. While ANMS methods pro-
vide visually and statistically (analyzed by Z-test) similar out-
puts in terms of spatial distribution, SSC demonstrates the best
time and scalability performance. Therefore, this algorithm is
Fig. 9: Experimental results of different methods on SLAM: (a) mean translational error (%), (b) rotational error (°). Both panels are plotted per sequence (Seq00 to Seq10) for Brown, RT ANMS, SSC, SDC, K-dT ANMS, TopM, and Bucketing.
Fig. 10: Trajectories computed on the New College dataset using our SSC approach and the bucketing strategy: (a) top view (X and Z in meters), (b) side view (X, Y, and Z in meters).
advisable for use when an application requires real-time
performance even when the number of input points is relatively
high. This would include, for example, real-time SLAM or visual
odometry. On the other hand, since K-dT ANMS and RT ANMS rely on
a TDS to store the input points, they can be used in situations
where keypoints need to be reused. A good example is large-scale
SfM, where many re-projections onto the images have to be
performed to aggregate new images. Thus, these approaches can be
accelerated by using the same structure for keypoint detection
and for point matching. Compared to K-dT ANMS, RT ANMS offers
faster query time but requires more storage memory. Therefore, a
user should consider this tradeoff when choosing among the
proposed TDS-based ANMS methods.
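As a reminder of the principle discussed above, the sketch below combines a square suppression range with a binary search on its width until roughly m keypoints remain. It is a simplified illustration and not the released SSC code: it checks neighbors exactly through a hash grid instead of covering cells, and it uses naive search bounds [0, max(W, H)] rather than the initialization proposed in this paper. All function and parameter names are assumptions.

```python
def select_with_width(sorted_pts, width):
    """Greedy suppression with a square neighborhood.

    sorted_pts must be ordered by decreasing strength. A point is kept only
    if no previously kept point is closer than `width` along both axes.
    Returns the indices of the kept points.
    """
    cell = max(width, 1e-9)  # hash-grid cell size, avoids division by zero
    grid = {}                # (cell_x, cell_y) -> indices of kept points
    kept = []
    for idx, (x, y, _) in enumerate(sorted_pts):
        cx, cy = int(x // cell), int(y // cell)
        # Any kept point within `width` on both axes lies in the 3x3 block
        # of cells around (cx, cy), so only those cells are inspected.
        neighbours = (
            j
            for nx in (cx - 1, cx, cx + 1)
            for ny in (cy - 1, cy, cy + 1)
            for j in grid.get((nx, ny), ())
        )
        if any(abs(sorted_pts[j][0] - x) < width and
               abs(sorted_pts[j][1] - y) < width for j in neighbours):
            continue
        kept.append(idx)
        grid.setdefault((cx, cy), []).append(idx)
    return kept


def ssc_like(keypoints, m, img_w, img_h, tolerance=0.1, max_iter=64):
    """Binary search on the square width until about m keypoints remain.

    keypoints: iterable of (x, y, strength) tuples.
    Returns between m and m * (1 + tolerance) keypoints when the search
    converges, otherwise the closest result with at least m keypoints.
    """
    pts = sorted(keypoints, key=lambda k: k[2], reverse=True)
    if len(pts) <= m:
        return pts
    low, high = 0.0, float(max(img_w, img_h))  # naive bounds (assumption)
    best = None
    kept = []
    for _ in range(max_iter):
        width = 0.5 * (low + high)
        kept = select_with_width(pts, width)
        if len(kept) >= m:
            best = kept
            if len(kept) <= m * (1.0 + tolerance):
                break      # within tolerance of the queried number
            low = width    # too many points: enlarge the squares
        else:
            high = width   # too few points: shrink the squares
    if best is None:
        best = kept
    return [pts[i] for i in best]
```

Tighter initial bounds reduce the number of calls to `select_with_width`, which is the expensive part of each iteration.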
5. Conclusion
In this paper, we have presented three novel ANMS tech-
niques (codes are provided) to homogeneously distribute de-
tected keypoints in the image. Through an extensive series of
experiments, we have highlighted the effectiveness and scalability
of our approaches. Furthermore, we have demonstrated
the positive impact of our ANMS strategies on visual SLAM.
The presented results show that ANMS is a beneficial step for
improving SLAM performance. Another major contribution of
this paper is the binary search boundaries initialization which
drastically reduces the number of iterations needed to retain the
queried number of points. The proposed initialization is de-
signed to be suitable for any ANMS relying on binary search.
The current ANMS approaches are designed to handle con-
ventional images, but may perform poorly on non-uniform spa-
tial resolution induced by distortion (e.g. fisheye lens, catadiop-
tric system, etc.). Naturally, the extension of this work will
focus on this problem by proposing an ANMS applicable to the
unified spherical model.
Acknowledgment
This research was supported by the Shared Sensing for Coop-
erative Cars Project funded by Bosch (China) Investment Ltd.
The second author was supported by Korea Research Fellow-
ship (KRF) Program through the NRF funded by the Ministry
of Science, ICT and Future Planning (2015H1D3A1066564).
References
[1] A. Behrens and H. Röllinger. Analysis of feature point distributions for
fast image mosaicking algorithms. Acta Polytechnica, 50(4), 2010.
[2] R. Berinde. Efficient implementations of range trees. 2007.
[3] Y. Bok, H. Ha, and I. Kweon. Automated checkerboard detection and
indexing using circular boundaries. Pattern Recognition Letters, 71:66–
72, 2016.
[4] M. Brown, R. Szeliski, and S. Winder. Multi-image matching using multi-
scale oriented patches. In CVPR, 2005.
[5] S. Buoncompagni, D. Maio, D. Maltoni, and S. Papi. Saliency-based
keypoint selection for fast object detection and matching. Pattern Recog-
nition Letters, 62:32–40, 2015.
[6] T. Chan. A minimalist's implementation of an approximate nearest neighbor
algorithm in fixed dimensions. See https://goo.gl/cvDjAs, 2006.
[7] Z. Cheng, D. Devarajan, and R. Radke. Determining vision graphs for
distributed camera networks using feature digests. EURASIP Journal on
Applied Signal Processing, 2007(1):220–220, 2007.
[8] S. Gauglitz, L. Foschini, M. Turk, and T. Höllerer. Efficiently selecting
spatially distributed keypoints for visual tracking. In ICIP, 2011.
[9] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driv-
ing? the kitti vision benchmark suite. In CVPR, 2012.
[10] B. Kitt, A. Geiger, and H. Lategahn. Visual odometry based on stereo
image sequences with ransac-based outlier rejection scheme. In IV, 2010.
[11] L. Kneip, D. Scaramuzza, and R. Siegwart. A novel parametrization of
the perspective-three-point problem for a direct computation of absolute
camera position and orientation. In CVPR, 2011.
[12] Q. Miao, G. Wang, C. Shi, X. Lin, and Z. Ruan. A new framework
for on-line object tracking based on surf. Pattern Recognition Letters,
32(13):1564–1571, 2011.
[13] M. Muja and D. Lowe. Scalable nearest neighbor algorithms for high
dimensional data. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 36(11):2227–2240, 2014.
[14] T. Pire, T. Fischer, J. Civera, P. De Cristóforis, and J. Berlles. Stereo
parallel tracking and mapping for robot localization. In IROS, 2015.
[15] E. Rosten and T. Drummond. Machine learning for high-speed corner
detection. In ECCV, 2006.
[16] K. Schauwecker, R. Klette, and A. Zell. A new feature detector and stereo
matching method for accurate high-performance sparse stereo matching.
In IROS, 2012.
[17] R. Seidel and C. Aragon. Randomized search trees. Algorithmica,
16(4):464–497, 1996.
[18] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The new
college vision and laser data set. The International Journal of Robotics
Research, 28(5):595–599, May 2009.