
Pattern Recognition Letters

journal homepage: www.elsevier.com

Efficient adaptive non-maximal suppression algorithms for homogeneous spatial keypoint distribution

Oleksandr Bailo, Francois Rameau∗∗, Kyungdon Joo, Jinsun Park, Oleksandr Bogdan, In So Kweon

Department of Electrical Engineering, KAIST, Daejeon, 34141, Republic of Korea

ABSTRACT

Keypoint detection usually results in a large number of keypoints which are mostly clustered, redundant, and noisy. These keypoints often require special processing like Adaptive Non-Maximal Suppression (ANMS) to retain the most relevant ones. In this paper, we present three new efficient ANMS approaches which ensure a fast and homogeneous repartition of the keypoints in the image. For this purpose, a square approximation of the search range to suppress irrelevant points is proposed to reduce the computational complexity of the ANMS. To further speed up the proposed approaches, we also introduce a novel strategy to initialize the search range based on image dimensions, which leads to faster convergence. An exhaustive survey and comparisons with existing methods are provided to highlight the effectiveness and scalability of our methods and the initialization strategy.

© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Keypoint detection is often the first step of various tasks such as SLAM [14], panorama stitching [4], camera calibration [3], and visual tracking [12, 5]. Therefore, this stage potentially affects the robustness, stability, and accuracy of the aforementioned applications. In the past decade, we have witnessed significant advances in keypoint detectors, leading to major improvements in terms of accuracy, speed, and repeatability. But while the detection of keypoints has been intensively studied, ensuring their homogeneous spatial distribution has attracted comparatively little attention. It is well known that the spatial point distribution is crucial to avoiding problematic cases such as degenerate configurations (for structure from motion or SLAM) or redundant information (i.e., clusters of points), as depicted in Fig. 1. Moreover, a homogeneous and unclustered point distribution might speed up most computer vision pipelines since fewer keypoints are needed to cover the whole image.

One of the most effective solutions to ensure well-distributed keypoint detection is to apply an Adaptive Non-Maximal Suppression (ANMS) algorithm on the keypoints extracted by a detector. However, despite all the advantages offered by such approaches, these methods have rarely been used in practice due to their high computational complexity. To overcome this limitation, we propose three novel approaches called Range Tree ANMS (RT ANMS), K-d Tree ANMS (K-dT ANMS), and Suppression via Square Covering (SSC). The developed algorithms aim to efficiently select the strongest and well-distributed keypoints across the image. We achieve such performance using a square search range approximation which is initialized in an optimal and intuitive manner (see Fig. 2).

∗∗Corresponding author. Tel.: +8210-3355-7120; e-mail: frameau@rcv.kaist.ac.kr (F. Rameau).

Fig. 1: Keypoint detection: (a) TopM NMS, (b) bucketing, (c) proposed ANMS. The bottom-right subimage represents the coverage and clusteredness of keypoints computed using a Gaussian kernel. The red color in the subimage stands for a dense cluster of points, while the blue color represents an uncovered area.

An abundant number of experiments are used to demonstrate the relevance of our ANMS algorithms in terms of speed, spatial distribution, and memory efficiency. Furthermore, we experimentally highlight that ANMS is a beneficial step for SLAM, which drastically improves the accuracy of the motion estimation while using a restricted number of keypoints.

Fig. 2: Algorithm's workflow: (a) keypoint detection in the original image (depicted in blue), (b) sorting keypoints by strength and initialization of the search range, (c) conceptual representation of our ANMS algorithm, where every column represents a search range guess (orange boxes) through a binary search process iterated until the queried number of points is reached (100 in this example), while every row depicts the iterations through the input points, (d) final result where the red dots represent the selected keypoints.

To sum up, the contributions of this paper are the following:

• Three novel and efficient ANMS algorithms
• A new and optimal initialization of the search range
• An extensive series of experiments against the state of the art
• Efficient and optimized ANMS codes made available at https://github.com/BAILOOL/ANMS-Codes

This paper is organized as follows. In Section 2, we provide an extensive literature review of existing approaches. The notations as well as the proposed methods are introduced in Section 3. Finally, a large number of experiments are provided in Section 4, followed by a brief conclusion (Section 5).

2. Related work

In this section, we report existing methods that have been developed to improve the spatial distribution of keypoints. These approaches can be divided into three categories: bucketing approaches, Non-Maximal Suppression (NMS), and ANMS.

2.1. Bucketing approach

Currently, the bucketing-based point detection approach [10] is the most common technique used to ensure a good repartition of the keypoints. This approach is relatively simple: the source image is partitioned into a grid and keypoints are detected in each grid cell. The bucketing-based approach is efficient for detecting keypoints all over the image; however, it is unable to avoid the presence of redundant information (i.e., clusters of keypoints).
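The grid-based selection described above can be sketched as follows. This is a minimal illustration applied to an already-detected point set; the function name, grid size, and per-cell budget are our own assumptions, not part of [10]:

```python
def bucket_keypoints(points, scores, width, height, grid=(7, 5), per_cell=2):
    """Partition the image into grid[0] x grid[1] cells and keep the
    `per_cell` strongest keypoints inside each cell."""
    cell_w, cell_h = width / grid[0], height / grid[1]
    cells = {}
    for (x, y), s in zip(points, scores):
        key = (min(int(x // cell_w), grid[0] - 1),
               min(int(y // cell_h), grid[1] - 1))
        cells.setdefault(key, []).append((s, (x, y)))
    kept = []
    for bucket in cells.values():
        # strongest responses first within each cell
        bucket.sort(key=lambda e: e[0], reverse=True)
        kept.extend(p for _, p in bucket[:per_cell])
    return kept
```

Note that, as the paragraph above points out, points inside a single cell may still form a cluster; bucketing only guarantees coverage across cells.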

2.2. Non-maximal suppression

NMS (also referred to as TopM) is often used to remove a large number of keypoints which are mostly redundant or noisy responses of the keypoint detectors. The most common approach to NMS [15] consists of suppressing the weakest keypoints using an empirically determined threshold. Thereafter, the clusteredness is often reduced by suppressing the keypoints which do not belong to a local maximum within a particular radius. NMS is a straightforward and fast way to reject unnecessary corners but, in many real-world situations, this approach leads to a very limited spatial dissemination of the keypoints (see Fig. 1(b)).

It should be noted that certain works have recently attempted to improve the NMS stage by introducing a novel adaptive cornerness score calculation taking into consideration the local contrast around the keypoints [16]. Thus, these approaches tend to improve the spatial distribution as well as the robustness against illumination variations. However, they suffer from the point clustering effect inherent to NMS approaches.
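The plain radius-based NMS described above can be sketched as follows; this is a simplified quadratic-time illustration with names of our choosing, not the optimized detector-internal version of [15]:

```python
def radius_nms(points, scores, radius, min_score):
    """Keep only keypoints that are the score maximum within `radius`,
    after discarding responses below `min_score`."""
    # threshold-based suppression of the weakest keypoints
    strong = [(p, s) for p, s in zip(points, scores) if s >= min_score]
    # process in decreasing order of strength
    strong.sort(key=lambda e: e[1], reverse=True)
    kept = []
    r2 = radius * radius
    for p, s in strong:
        # keep p only if no already-kept (stronger) point is within radius
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 > r2 for q, _ in kept):
            kept.append((p, s))
    return [p for p, _ in kept]
```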

2.3. Adaptive non-maximal suppression

ANMS methods have been developed to tackle the aforementioned drawbacks. These techniques enforce a better keypoint spatial distribution by jointly taking into account the cornerness strength and the spatial localization of the keypoints. The very first ANMS approach was proposed by Brown et al. [4]. The authors initially introduced this concept to robustify the image matching for panorama stitching. In that work, the keypoints are suppressed based on their corner strength and the location of the closest stronger keypoint. Unfortunately, the original implementation of this ANMS has a quadratic complexity which is not suitable for real-time applications such as SLAM.

To overcome this problem, multiple attempts to reduce the computational time of ANMS have been investigated. For instance, Cheng et al. [7] proposed an algorithm using a two-dimensional k-d tree for space partitioning of high-dimensional data. Using this data structure, the keypoints are separated into rectangular image regions. Then, from each cell, the strongest features are selected as the output sample set. This algorithm was extended by Behrens et al. [1] using a general tree data structure. While these methods perform faster than the traditional ANMS [4], they do not necessarily output homogeneously distributed points.

More recently, Gauglitz et al. [8] have proposed two complementary approaches that reportedly perform in subquadratic run time. In the first approach, the authors have chosen to use an approximate nearest-neighbor algorithm [6] which relies on a randomized search tree [17]. The second algorithm, named Suppression via Disk Covering (SDC), aims to further boost the performance of the ANMS. The algorithm simulates an approximate radius-nearest-neighbor query by superimposing a grid onto the keypoints and approximating the Euclidean distance between keypoints by the distance between the centers of the cells into which they fall.

Our proposed approaches tackle the limitations of previous works while maintaining favorable efficiency and scalability.

3. Methodology

In this section, we describe the problem statement and propose several efficient algorithms which ensure a homogeneous repartition of keypoints in the image. Specifically, we cover ANMS based on a Tree Data Structure (TDS) (which includes the K-dT and RT ANMSs) followed by Suppression via Square Covering (SSC). Lastly, we provide a derivation of the initialization of the search range to further speed up the proposed algorithms.

3.1. Problem statement

Most of the recent ANMS approaches share a common pipeline. The set of two-dimensional ($d = 2$) input keypoints $P_{in} = \{p_i^{in}\}_{i=1}^{n}$ of size $n = \mathrm{card}(P_{in})$ (where $\mathrm{card}(\cdot)$ stands for the cardinality operator) is extracted by the detector and sorted according to the cornerness score of the points. Further, the keypoints in $P_{in}$ are iteratively processed to compute a smaller and better-distributed set of output keypoints $P_{out} = \{p_i^{out}\}_{i=1}^{m}$ of size $m = \mathrm{card}(P_{out})$, where $m$ is defined by the user. The output set of points ensures a good spatial coverage all over the image while avoiding clustering. This homogeneous point distribution is enforced by a spatial consistency check in an adaptive search range of size $w$ ($w$ is the radius of a circle or half the side of a square, depending upon which approach is used) defining the suppression neighborhood around a candidate point $p^{in}$. The radius $w$ is adjusted until the number of retrieved points is close to $m$ according to a certain tolerance $m \pm t$, where $t$ represents a user-defined tolerance threshold.

3.2. ANMS based on Tree Data Structure

Using a data structure is a common way to approach the ANMS problem [8]. However, previous attempts have resulted in relatively inefficient implementations (Section 2). In addition, as observed in [1], after the ANMS step there are still regions in the image containing a high level of clusteredness. In this section, we propose an efficient algorithm which relies on more suitable data structures and maintains a good spatial keypoint distribution. The K-dimensional Tree [13] (K-dT) and the Range Tree [2] (RT) have been used for this purpose.

First, the K-dT is a binary search tree where the data in each node is a K-dimensional point in space. Using this data structure allows space partitioning to organize points in a K-dimensional space. This partitioning can be used to efficiently retrieve the set of points $P_w$ which falls into a defined range around a particular point. On the other hand, the RT is an alternative to the K-dT. The RT is a binary search tree where each node contains an associated structure that is a $(d-1)$-dimensional RT. Compared to K-dTs, RTs offer faster query times in exchange for worse storage complexity (see Table 1). While these two data structures are essentially different, from a high-level perspective the algorithm is generic and appropriate for any data structure that is capable of retrieving the set of points within a defined range. Therefore, we describe both proposed algorithms (i.e., K-dT ANMS and RT ANMS) within this subsection.

Table 1: TDS time and storage analysis.

K-d Tree: Insert $O(\log n)$; Query $O(n^{1-1/d} + \mathrm{card}(P_w))$; Delete $O(\log n)$; Storage $O(n)$.
Range Tree: Insert $O(\log^d n)$; Query $O(\log^d n + \mathrm{card}(P_w))$; Delete $O(\log^d n)$; Storage $O(n \log^{d-1} n)$.

The TDS is built on the keypoints $P_{in}$ sorted in decreasing order of strength (i.e., cornerness score). This TDS is used in our algorithm as a way to efficiently obtain the nearest neighbors of a particular keypoint given a search range. This search range is determined by a binary search that tries to guess the most appropriate search range $w$ to satisfy the queried number of keypoints. For every guess of $w$, the nearest neighbors of each keypoint (processed in decreasing order of strength) are suppressed such that they will not be considered in further iterations under the selected $w$. For this purpose, the index list $Idxs$ is used to keep track of the uncovered keypoints. The binary search terminates when the number of retrieved keypoints is close to the number of queried keypoints $m$ according to a tolerance threshold $m \pm t$. The outline is provided in Algorithm 1.

The proposed algorithm has similarities to the algorithm presented in [8], where the authors have chosen to use an approximate nearest-neighbor algorithm which relies on a randomized search tree. However, that algorithm [8] is not optimally efficient since it performs both query and delete operations for each candidate keypoint in $P_{in}$ per radius guess. Furthermore, it requires dynamically adding/removing keypoints to the tree, which drastically slows performance. In contrast, our algorithms achieve comparable results with a single query operation per search range guess, which makes them more efficient and scalable.

Algorithm 1: ANMS based on TDS
  Input: keypoints $P_{in}$ extracted by the detector
  Output: spatially distributed keypoints $P_{out}$
  sort $P_{in}$ by strength
  build $TDS$ on sorted $P_{in}$
  initialize binary search boundaries (Sec. 3.4)
  while binary search for search range $w$ do
      $P_{out} = \emptyset$
      initialize $Idxs$ with all keypoints marked as selected
      for $p_i \in P_{in}$ do
          if $p_i \in Idxs$ then
              $P_{out} = P_{out} \cup p_i$
              $P_w = TDS.query(p_i, w)$
              $Idxs = Idxs \setminus P_w$
      if $|\mathrm{card}(P_{out}) - m| \le t$ then return $P_{out}$
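The loop structure of Algorithm 1 can be sketched in Python as follows. For brevity, the tree query is stubbed by a brute-force range search; a real implementation would substitute a k-d tree or range tree, and the class and function names are ours:

```python
import math

class BruteForceRange:
    """Stand-in for the TDS; a real implementation would use a k-d tree
    or range tree to get sub-linear range queries."""
    def __init__(self, points):
        self.points = points

    def query(self, center, w):
        cx, cy = center
        return {i for i, (x, y) in enumerate(self.points)
                if math.hypot(x - cx, y - cy) <= w}

def tds_anms(points, scores, m, tol, w_lo, w_hi):
    """Binary-search the suppression range w until roughly m
    spatially separated keypoints survive (cf. Algorithm 1)."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    tds = BruteForceRange(points)
    result = [points[i] for i in order[:m]]   # fallback if search exhausts
    while w_lo <= w_hi:
        w = (w_lo + w_hi) / 2
        alive = set(order)                    # Idxs: not yet suppressed
        selected = []
        for i in order:
            if i in alive:
                selected.append(i)
                alive -= tds.query(points[i], w)   # suppress neighbors
        if abs(len(selected) - m) <= tol:
            return [points[i] for i in selected]
        if len(selected) > m:
            w_lo = w + 1   # too many survivors: enlarge suppression range
        else:
            w_hi = w - 1
    return result
```

A single range query per surviving keypoint and per range guess is performed, mirroring the efficiency argument made above against [8].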

3.3. Suppression via square covering

We have compared both K-dT ANMS and RT ANMS and observed similar performance in terms of keypoint repeatability and clusteredness (see Section 4.4). It is worth mentioning that while in the case of the K-dT the search range is defined by a radius around the candidate point, the RT uses a square approximation of the search range. This square approximation can potentially boost the speed performance of the ANMS.

One of the key approximations which makes SDC [8] efficient is the radius-nearest-neighbor query obtained by superimposing a grid $G_w$ onto the keypoints and approximating the Euclidean distance between keypoints by the distance between the centers of the cells into which they fall. While this approximation performs well, it still requires computing the Euclidean distance between a large number of keypoints. This is a crucial concern since the number of computations increases as the number of keypoints grows.

To tackle the aforementioned problem, we propose to apply the square approximation to the SDC [8] algorithm. In particular, once the grid $G_w$ is set, we cover the cells which lie within $2w$ (determined by binary search) of a selected point, regardless of where exactly the points are located inside this square range of coverage. This drastically boosts the performance of the algorithm since the covering of the cells is simply performed by traversing a square search range without the need for Euclidean distance computation. The pseudo-code is given in Algorithm 2.

Algorithm 2: Suppression via Square Covering (SSC)
  Input: keypoints $P_{in}$ extracted by the detector
  Output: spatially distributed keypoints $P_{out}$
  sort $P_{in}$ by strength
  initialize binary search boundaries (Sec. 3.4)
  while binary search for suppression side $w$ do
      set resolution of grid $G_w = w/2$
      uncover all cells of $G_w$
      $P_{out} = \emptyset$
      for $p_i \in P_{in}$ do
          if cell $G_w[p_i]$ is not covered then
              $P_{out} = P_{out} \cup p_i$
              cover cells of $G_w$ around $p_i$ with a square of side $2w$
      if $|\mathrm{card}(P_{out}) - m| \le t$ then return $P_{out}$
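A compact sketch of Algorithm 2 follows. Integer grid handling and variable names are our own simplifications; the reference implementations live in the repository linked in the introduction:

```python
def ssc(points, scores, m, tol, width, height, w_lo, w_hi):
    """Suppression via Square Covering (cf. Algorithm 2): mark grid
    cells inside a square of side ~2w around each kept point, so no
    Euclidean distances are ever computed."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    result = [points[i] for i in order[:m]]     # fallback if search exhausts
    while w_lo <= w_hi:
        w = (w_lo + w_hi) // 2
        cell = max(w // 2, 1)                   # grid resolution G_w = w/2
        cols, rows = width // cell + 1, height // cell + 1
        covered = [[False] * cols for _ in range(rows)]
        selected = []
        for i in order:
            cx = int(points[i][0] // cell)
            cy = int(points[i][1] // cell)
            if not covered[cy][cx]:
                selected.append(i)
                # cover every cell within the square of side 2w
                half = int(w // cell)
                for yy in range(max(0, cy - half), min(rows, cy + half + 1)):
                    for xx in range(max(0, cx - half), min(cols, cx + half + 1)):
                        covered[yy][xx] = True
        if abs(len(selected) - m) <= tol:
            return [points[i] for i in selected]
        if len(selected) > m:
            w_lo = w + 1
        else:
            w_hi = w - 1
    return result
```

The inner double loop touches only grid cells, which is the source of the speed-up over SDC's per-point distance checks discussed above.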

3.4. Initialization of search range

Similar to our proposed algorithms, the SDC [8] uses a binary search to guess the appropriate search range. In this previous work, the upper bound $a_h$ of the binary search is set to the image width $W_I$, while the lower one $a_l$ is set to 1. This often results in unnecessary iterations and decreases the convergence speed. To tackle this problem, we propose a novel and elegant way to precompute the bounds of the binary search, which drastically decreases the number of iterations until convergence and, in turn, improves the speed of the algorithm.

Our problem statement is the following: we want to homogeneously distribute the $m$ queried points on the image without any clusters. To do so, we try to cover the image with squares of side $2a_h$ with a minimal distance $a_h + 1$ between the square centers (see Fig. 3). Given $2a_h$ and $W_I$, we can calculate the maximum number of squares that perfectly fit in a row of the image. We define this row as a set of squares placed at the same height in the image, where the first and the last square in a row are perfectly aligned with the image borders.

Fig. 3: Graphical representation of the optimal point distribution. Bounding boxes of different colors represent the search range around the candidate points.

If there are $q$ points (i.e., square centers) inside each row, then there are $q - 1$ distances in each row between these points. In addition, the left and right extreme points are located at a distance $a_h$ from the left and right borders of the image. Thus, we can express the image width $W_I$ in terms of $a_h$ and $q$:

$$W_I = 2a_h + (a_h + 1)(q - 1), \qquad (1)$$

hence, from Equation (1), the number of points in each row is:

$$q = \frac{W_I - a_h + 1}{a_h + 1}. \qquad (2)$$

Similarly, the maximum number of square centers $l$ possibly fitting within the image height $H_I$ satisfies:

$$H_I = 2a_h + (a_h + 1)(l - 1). \qquad (3)$$

The queried number of points $m$ is equal to the product of $q$ and $l$. By substituting $l = \frac{m}{q}$ into Equation (3) and substituting $q$ from Equation (2), we obtain the following quadratic equation:

$$(m - 1)\,a_h^2 + a_h(W_I + 2m + H_I) + m + W_I - H_I W_I = 0. \qquad (4)$$

Solving this equation for $a_h$ yields two solutions, one of which is always negative, while the other one gives us the final estimate of the square side:

$$a_h = \frac{-(H_I + W_I + 2m) + \sqrt{\Delta}}{2(m - 1)}, \qquad (5)$$

where the discriminant of the quadratic Equation (4) is:

$$\Delta = 4W_I + 4m + 4H_I m + H_I^2 + W_I^2 - 2W_I H_I + 4W_I H_I m. \qquad (6)$$

It is worth mentioning that, while this solution tries to allocate as many points as possible, it does not guarantee that the number of points on the image will be exactly equal to $m$. This happens for several reasons. First of all, the fraction $\frac{m}{q}$ might not produce an integer value for $l$. Secondly, in the code implementation, we round the obtained value of $a_h$ since the minimum unit of an image is 1 pixel.

The lower bound $a_l$ of the binary search can be determined by looking closely at the worst possible point distribution. This happens when all $n$ input points are located in a single square on the image with no space between them. Given such a distribution, we want to retrieve at least $m$ queried points by filling this space with the smallest possible squares of side $2a_l$. This can be mathematically expressed as $m(2a_l)^2 = n$. Therefore, the lower bound of the binary search is:

$$a_l = \frac{1}{2}\sqrt{\frac{n}{m}}. \qquad (7)$$
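Equations (5)-(7) transcribe directly into code; the sketch below uses our own function and variable names:

```python
import math

def search_range_bounds(width, height, n, m):
    """Upper bound a_h (Eqs. 4-6) and lower bound a_l (Eq. 7)
    for the binary search over the suppression range."""
    # discriminant of the quadratic, Eq. (6)
    disc = (4 * width + 4 * m + 4 * height * m + height ** 2 + width ** 2
            - 2 * width * height + 4 * width * height * m)
    # positive root of Eq. (4) gives the upper bound, Eq. (5)
    a_h = (-(height + width + 2 * m) + math.sqrt(disc)) / (2 * (m - 1))
    # worst-case fully clustered distribution gives the lower bound, Eq. (7)
    a_l = 0.5 * math.sqrt(n / m)
    return a_l, a_h
```

As a sanity check, for a 1280 × 720 image and m = 100 queried points, the resulting spacing $a_h + 1$ indeed packs roughly $q \cdot l \approx 100$ square centers into the image.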

4. Results

4.1. Time and storage complexity

The detailed time complexity analysis is provided in Table 2. All of the presented algorithms (listed in the 'Method' column) rely on preprocessing (i.e., sorting by strength) the input keypoints. For this purpose, we utilize a sorting algorithm with an average performance of $O(n \log n)$. Additionally, the K-dT and RT ANMSs rely on a TDS which has to be populated with the input keypoints. This is performed by inserting (see Table 1 for complexity) the $n$ keypoints one by one into the data structure, resulting in overall $O(n \log n)$ and $O(n \log^d n)$ complexity, respectively. The query time for each algorithm to select appropriate keypoints is stated in the 'Query' column of Table 2. Specifically, the TopM algorithm simply retrieves $m$ keypoints from an already sorted list in $O(m)$. The traditional ANMS [4] algorithm (designated 'Brown' in our experiments) requires the computation of the minimum distance between every pair of keypoints, which takes $O(n^2)$, followed by sorting in $O(n \log n)$ and keypoint retrieval in $O(m)$. Since the rest of the algorithms (SDC, K-dT ANMS, RT ANMS, and SSC) rely on a binary search to find the appropriate search range in $O(\log w_{ini})$, the total query time complexity can be obtained by multiplying the number of search range guesses by the complexity of keypoint selection per guess. The total time complexity listed in 'Total (approximated)' gives us the following insight into the algorithms' performance. Obviously, the TopM approach clearly outperforms all other methods in terms of speed due to its simplicity. Furthermore, SDC, K-dT ANMS, RT ANMS, and SSC are certainly asymptotically faster than the traditional 'Brown' [4].

The storage complexity evaluation is also shown in Table 2. Methods that do not rely on any data structure (e.g., TopM, 'Brown' [4], SDC, SSC) at most occupy the memory necessary to hold the input and output keypoints, resulting in $O(n + m)$ complexity. These methods demonstrate better storage complexity than the K-dT and RT ANMSs, which additionally require memory for storing the TDS (see Table 1).

Overall, due to the involved time complexity expressions, it is challenging to single out a clear winner among the fastest ANMS algorithms (SDC, K-dT ANMS, RT ANMS, and SSC). In order to provide a qualitative evaluation of the algorithms, we have performed an extensive evaluation of all methods.

4.2. Synthetic and real experiments

First of all, to fairly assess the speed performance of the different algorithms, a large series of synthetic experiments has been performed. For this purpose, a set of randomly distributed 2D points is generated on a synthetic image of resolution 1280 × 720 px. Further, a random cornerness score is individually assigned to every point to simulate the behavior of keypoint detection in a natural image. The number of 2D points is in the range [800, 11000] with a step of 100. Every test is repeated 1000 times to ensure an unbiased estimation of the algorithms' speed. The queried number of points is fixed to 800, while the search range $w$ is initialized to the image width. We intentionally did not use our initialization technique (Section 3.4) for this test to provide a fair comparison with SDC [8]. It should be noted that Brown [4] has been removed from these experiments for the sake of clarity (i.e., scale inconsistency) since this method is significantly slower than the proposed approaches.

Fig. 4: Comparison of methods on synthetic data: (a) mean processing time, (b) standard deviation.

The mean computational time and the standard deviation against the number of points per iteration are shown in Fig. 4(a) and Fig. 4(b), respectively. Through this experiment, it is noticeable that the TopM algorithm drastically outperforms the more sophisticated approaches (but provides a very unsatisfying point distribution in practice, see Sec. 4.4). The other algorithms show interesting characteristics. SSC is indisputably more efficient than any existing algorithm both in stability (i.e., the standard deviation remains very low) and speed. On the other hand, SDC demonstrates satisfying results but suffers from a lack of stability for a low number of input keypoints. Indeed, this approach is more efficient than RT ANMS when the number of input points is large; however, this tendency is reversed when the detected keypoints do not exceed 5000. Moreover, SDC is less scalable than our SSC approach. Finally, despite its efficiency for a small number of points, K-dT ANMS loses its advantage for more than 2000 points.

While the assessment with synthetic data is a good evaluation showing clear tendencies, the distribution of keypoints in real images can differ from the ones obtained synthetically. Therefore, we propose an extensive series of evaluations using real images. For this purpose, we select 1000 images from KITTI [9] (Sequence 00) and detect keypoints using FAST [15] with a threshold $th = 5$. Such a relatively low threshold results in a large number of detected keypoints (i.e., >10000 keypoints per image). Subsequently, the keypoints are sorted by their strength. Further, we iteratively select a fixed number of the strongest keypoints, starting from 100 until reaching 10000. The step of this selection is 100, which results in 100 tests per image. The ANMS algorithms return a fixed percentage of the input number of keypoints. For instance, for 1000 input keypoints and a queried percentage of 10%, the number of queried keypoints is 100. We have applied 10 different ratios in the range [10%, 100%] with a step of 10%. Several representative results from these evaluations are provided in Fig. 5.

This extensive evaluation demonstrates that SSC clearly outperforms all other methods in terms of speed. Overall, different conclusions from those obtained with the synthetic experiments can be drawn. Indeed, SDC remains efficient for a relatively small number of queried keypoints (Fig. 5(a)) but tends to be less effective when a large number of input and output keypoints are processed. This can be explained by the substantial number of Euclidean distance comparisons to be computed for a dense set of input keypoints (Fig. 5(b) and (c)). In this respect, our RT ANMS scales more efficiently even when many output points are requested (see Fig. 5(b) and (c)). Finally, our K-dT ANMS becomes inefficient for a large number of points due to the relatively slow query time of this data structure (see Fig. 5(c)).

Table 2: Time and storage complexity.

Method | Preprocess | Build | Query | Total (approximated) | Storage
TopM | $O(n \log n)$ | - | $O(m)$ | $O(n \log n)$ | $O(n + m)$
Brown | $O(n \log n)$ | - | $O(n^2 + n \log n + m)$ | $O(n^2)$ | $O(n + m)$
SDC | $O(n \log n)$ | - | $O(\log w_{ini} \cdot (n + m/\varepsilon_r))$ | $O(n \log n + \log w_{ini} \cdot (n + m/\varepsilon_r))$ | $O(n + m)$
K-dT ANMS | $O(n \log n)$ | $O(n \log n)$ | $O(\log w_{ini} \cdot (n + n^{1-1/d} + \sum \mathrm{card}(P_w)))$ | $O(n \log n + \log w_{ini} \cdot (n + \sum \mathrm{card}(P_w)))$ | $O(n + \mathrm{card}(P_w) + m)$
RT ANMS | $O(n \log n)$ | $O(n \log^d n)$ | $O(\log w_{ini} \cdot (n + \log^d n + \sum \mathrm{card}(P_w)))$ | $O(n \log^d n + \log w_{ini} \cdot (n + \sum \mathrm{card}(P_w)))$ | $O(n \log^{d-1} n + \mathrm{card}(P_w) + m)$
SSC | $O(n \log n)$ | - | $O(\log w_{ini} \cdot (n + 4m))$ | $O(n \log n + \log w_{ini} \cdot (n + 4m))$ | $O(n + m)$

Fig. 5: Mean processing time vs. number of input keypoints for 1000 images, for queried percentages of (a) 10%, (b) 40%, and (c) 70%. Subfigures (a)-(c) show a linear scale on the y axis.

Fig. 6: Number of iterations until convergence (with and without initialization).

4.3. Effect of proposed initialization

In this experiment, we evaluate the impact of our initialization on the speed of the different methods. For this purpose, the same real-image experimental setup described in Section 4.2 is used. However, in this case, we employ our initialization approach (Section 3.4). Two criteria have been utilized to determine the advantages offered by this technique. The first is the number of iterations needed to reach the queried number of points (see Fig. 6). The second one is the overall speed-up provided to each method (see Table 3).

For every single method, the number of necessary iterations has been reduced by a factor of three, leading to a significant speed-up. It is noticeable that certain approaches are more affected by this initialization. For instance, this is the case for the K-dT ANMS approach, which has been sped up by a factor of 2.6×. However, some other algorithms, such as our RT ANMS, have been only moderately improved by our bounds calculation. This can be explained by the nature of the RT structure itself: with a closer initialization (i.e., a smaller search range), the total number of expensive queries increases.

Table 3: Speedup provided by initialization (average over 1000 images).

Method | Without initialization (ms) | With initialization (ms) | Speedup
SDC | 7.4 | 3.1 | 2.4×
K-dT ANMS | 17.5 | 6.8 | 2.6×
RT ANMS | 8.9 | 7.3 | 1.2×
SSC | 2.0 | 1.4 | 1.4×

4.4. Clusteredness

The main advantage of using an ANMS strategy is the unclustered and well-distributed set of keypoints resulting from this process. Indeed, this property allows us to avoid the redundant information typically occurring with commonly used approaches such as bucketing keypoint detection and standard NMS. To evaluate the clusteredness, we have reproduced the experiment suggested in [16], where the authors propose an appropriate metric for this criterion. For this evaluation, the image is divided into a regular grid of 10 × 10 cells to compute the number of points lying in every single cell. The standard deviation of the number of corners per cell is utilized as the clusteredness metric since it is representative of the homogeneity of the spatial distribution.
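The metric above is straightforward to compute. A sketch in plain Python (function name and signature are ours):

```python
def clusteredness(points, width, height, grid=10):
    """Standard deviation of per-cell keypoint counts on a grid x grid
    partition of the image; lower values mean a more homogeneous
    spatial distribution."""
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        gx = min(int(x * grid / width), grid - 1)
        gy = min(int(y * grid / height), grid - 1)
        counts[gy][gx] += 1
    flat = [c for row in counts for c in row]
    mean = sum(flat) / len(flat)
    # population standard deviation of the cell counts
    return (sum((c - mean) ** 2 for c in flat) / len(flat)) ** 0.5
```

A perfectly uniform layout (one point per cell) scores 0, while a single dense cluster scores high, matching the intuition behind the metric of [16].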

To provide a statistically valid evaluation of the clusteredness for every single approach, 2000 randomly selected images from the KITTI dataset [9] are used. In this experiment, $th = 12$ and the number of queried keypoints $m$ varies between 100 and 700. The obtained results are visible in Fig. 8. We can clearly notice that all the ANMS approaches provide similar outputs in terms of spatial distribution, which can also be observed in Fig. 7. As to the bucketing approach (grid size 7 × 5), it produces a better spatial distribution than TopM but cannot meet the performance of the ANMS strategies. This can be explained by the fact that the bucketing approach is designed to ensure a good spatial distribution but does not solve the problem of point clusters.

Fig. 7: Keypoint detection: (a) K-dT ANMS, (b) RT ANMS, (c) SSC. The red dots represent selected keypoints. In this experiment, $th = 12$ and $m = 100$.

Fig. 8: Mean and standard deviation of the clusteredness over 1000 images.

4.5. Application to SLAM

SLAM is one of the applications where the spatial distribution of the keypoints on the image is crucial. Therefore, we have included our ANMS solutions in a stereo-SLAM algorithm which is conceptually close to S-PTAM [14]. Specifically, keypoints are detected on both stereo images using FAST (th = 12) and filtered by our ANMS algorithms to reach 750 points. These stereo points are matched together using a line-search strategy and triangulated to initialize the 3D map. Motion tracking is performed using a RANSAC P3P [11] algorithm. Finally, the mapping is achieved by refining the structure and the motion together via a local bundle adjustment scheme. For this evaluation, we have used all the training sequences from the KITTI dataset [9], where an accurate ground truth is provided. The mean translation error (in percent) and rotation error (in degrees) per sequence are computed with the metrics recommended by [9].
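As background for the triangulation step mentioned above, a rectified stereo pair relates the horizontal disparity d to depth via Z = f·b/d. The sketch below illustrates this standard relation only; it is not the implementation used in our pipeline, and the function name and parameter values are hypothetical.

```python
def triangulate(u_left, u_right, v, f, cx, cy, baseline):
    """Back-project a matched point from a rectified stereo pair.

    Depth follows Z = f * baseline / d, with d the horizontal
    disparity; X and Y are recovered via the pinhole model.
    Returns None when the disparity is non-positive (bad match
    or point at infinity)."""
    d = u_left - u_right
    if d <= 0:
        return None
    Z = f * baseline / d
    X = (u_left - cx) * Z / f
    Y = (v - cy) * Z / f
    return (X, Y, Z)
```

For example, with a focal length of 700 px, a 0.54 m baseline, and a 10 px disparity, the recovered depth is 37.8 m, which is on the scale one would expect for an outdoor driving sequence.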

The results of the entire experiment are shown in Fig. 9. Regarding the translational error, a clear tendency is noticeable. For instance, the TopM algorithm is particularly inefficient compared to the other approaches. On the other hand, the bucketing approach tends to perform better than TopM but never outperforms the ANMS methods. All the ANMS approaches provide comparable results. The same tendency is observed for the rotation estimation; however, for rotation, the detection of well-distributed keypoints is less crucial. The error discrepancy between the different sequences may be explained by the various contexts in which the sequences have been acquired. For instance, in Seq00 the large rotational error can be explained by the high number of turns in the sequence, while Seq04 (which exhibits a very low rotational error) mostly consists of a short, straight line. Moreover, Seq01 is interesting to analyze because it is probably the most challenging: the vehicle travels at high speed through a relatively empty scene (low texture). These factors make the keypoints particularly difficult to track, and under these conditions the ANMS algorithms show even more significant improvements. Another observation is the improved robustness to moving objects. Our approaches also show very promising results in sequences containing one or more moving objects (for instance, Seq04). With ANMS, only a few keypoints are detected on moving objects while the majority belong to the rigid background; therefore, the outliers are more efficiently removed by a robust estimation step (RANSAC in our SLAM). Finally, in Fig. 9, the slight error discrepancy between the ANMS methods is mostly due to the inherent randomness of the point tracking strategy, noise, and the numerical errors typical of real-image experiments. Nevertheless, we can certainly conclude that all the ANMS approaches compared in this paper significantly improve the SLAM algorithm in a very similar manner.

In Fig. 10, we propose a qualitative comparison of our SSC algorithm against the bucketing strategy. For this evaluation, we use the New College dataset [18] (see Fig. 1), consisting of 50,000 stereo image pairs covering 2.5 km, acquired with a handheld stereo camera (multiple loops and challenging scenarios). Through this experiment, it is clear that our ANMS approach significantly reduces drift over the sequence compared to the bucketing approach. This drift is particularly obvious in the side view (see Fig. 10(b)). Note that the TopM algorithm is not depicted in this figure for the sake of clarity (very large drift).

4.6. Discussion on proposed methods

Certainly, ANMS approaches are beneficial in specific contexts and conditions. They are appropriate for pose estimation (SLAM, panorama stitching, etc.), self-calibration, and Structure from Motion (SfM). Similarly, Schauwecker et al. [16] have demonstrated that a good dissemination of the points in the images results in better sparse stereo matching. However, ANMS is not limited to these topics and can be appropriate for many real-time approaches. For example, this might be the case for Bag-of-Words place recognition, where well-distributed points can lead to a stronger description of the image. While the authors originally developed SDC [8] for planar tracking purposes, we believe that ANMS might be counter-productive for visual tracking under certain conditions (e.g., a small target or a cluttered scene). Other techniques requiring a dense cluster of points on a salient part of the image (e.g., point-based obstacle detection) would probably not be improved by ANMS.

In this paper, we have proposed three ANMS techniques named K-dT ANMS, RT ANMS, and SSC to homogeneously distribute keypoints on the image. While the ANMS methods provide visually and statistically (as analyzed by a Z-test) similar outputs in terms of spatial distribution, SSC demonstrates the best runtime and scalability performance. Therefore, this algorithm is advisable when an application requires real-time performance even with a relatively high number of input points; this would include, for example, real-time SLAM or visual odometry. On the other hand, since K-dT and RT ANMS are based on a TDS to store the input points, they can be used in situations where the keypoints need to be reused. A good example is large-scale SfM, where many re-projections onto the images have to be performed to aggregate new images; these approaches can thus be accelerated by using the same structure for keypoint detection and for point matching. Compared to K-dT ANMS, RT ANMS offers a faster query time but requires more storage memory. Therefore, a user should consider this tradeoff when choosing among the proposed TDS-based ANMS methods.

Fig. 9: Experimental results of the different methods on SLAM: (a) mean translational error (%), (b) mean rotational error (°).

Fig. 10: Trajectories computed on the New College dataset using our SSC approach and the bucketing strategy: (a) top view, (b) side view.
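All of the proposed methods share the same outer mechanism: a binary search on the suppression range until the number of surviving keypoints matches the query m. The following sketch makes that pattern concrete. It is a brute-force illustration of the principle, not our optimized SSC implementation: the square (Chebyshev) approximation of the search range follows the paper, but the initial bounds here are arbitrary constants rather than the image-dimension-based initialization we propose, and all names are illustrative.

```python
def select(keypoints, width):
    """Greedy suppression: keep a point only if no already-kept point
    lies within Chebyshev distance `width` (the square approximation
    of the circular search range). Keypoints are (x, y, response),
    assumed sorted strongest first."""
    kept = []
    for x, y, _ in keypoints:
        if all(max(abs(x - kx), abs(y - ky)) > width for kx, ky in kept):
            kept.append((x, y))
    return kept

def anms_binary_search(keypoints, m, tol=2, low=1.0, high=500.0):
    """Bisect the suppression width until the number of retained
    points falls within `tol` of the requested m."""
    kps = sorted(keypoints, key=lambda k: -k[2])  # strongest first
    result = [(x, y) for x, y, _ in kps[:m]]      # fallback answer
    for _ in range(30):
        mid = 0.5 * (low + high)
        kept = select(kps, mid)
        if abs(len(kept) - m) <= tol:
            return kept
        if len(kept) > m:
            low = mid   # too many survivors: widen the suppression
        else:
            high = mid  # too few survivors: shrink the suppression
        result = kept
    return result
```

Since the number of survivors decreases monotonically with the suppression width, each bisection step halves the search interval, and a tighter initialization of `low` and `high` directly reduces the number of iterations, which is precisely what our proposed initialization exploits.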

5. Conclusion

In this paper, we have presented three novel ANMS techniques (code is provided) to homogeneously distribute detected keypoints in the image. Through an extensive series of experiments, we have highlighted the effectiveness and scalability of our approaches. Furthermore, we have demonstrated the positive impact of our ANMS strategies on visual SLAM; the presented results show that ANMS is a beneficial step for improving SLAM performance. Another major contribution of this paper is the binary search boundary initialization, which drastically reduces the number of iterations needed to retain the queried number of points. The proposed initialization is designed to be suitable for any ANMS relying on binary search.

The current ANMS approaches are designed to handle conventional images, but they may perform poorly on the non-uniform spatial resolution induced by distortion (e.g., a fisheye lens or a catadioptric system). Naturally, the extension of this work will focus on this problem by proposing an ANMS applicable to the unified spherical model.

Acknowledgment

This research was supported by the Shared Sensing for Cooperative Cars Project funded by Bosch (China) Investment Ltd. The second author was supported by the Korea Research Fellowship (KRF) Program through the NRF funded by the Ministry of Science, ICT and Future Planning (2015H1D3A1066564).

References

[1] A. Behrens and H. Röllinger. Analysis of feature point distributions for fast image mosaicking algorithms. Acta Polytechnica, 50(4), 2010.
[2] R. Berinde. Efficient implementations of range trees. 2007.
[3] Y. Bok, H. Ha, and I. Kweon. Automated checkerboard detection and indexing using circular boundaries. Pattern Recognition Letters, 71:66–72, 2016.
[4] M. Brown, R. Szeliski, and S. Winder. Multi-image matching using multi-scale oriented patches. In CVPR, 2005.
[5] S. Buoncompagni, D. Maio, D. Maltoni, and S. Papi. Saliency-based keypoint selection for fast object detection and matching. Pattern Recognition Letters, 62:32–40, 2015.
[6] T. Chan. A minimalist's implementation of an approximate nearest neighbor algorithm in fixed dimensions. See https://goo.gl/cvDjAs, 2006.
[7] Z. Cheng, D. Devarajan, and R. Radke. Determining vision graphs for distributed camera networks using feature digests. EURASIP Journal on Applied Signal Processing, 2007(1):220–220, 2007.
[8] S. Gauglitz, L. Foschini, M. Turk, and T. Höllerer. Efficiently selecting spatially distributed keypoints for visual tracking. In ICIP, 2011.
[9] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
[10] B. Kitt, A. Geiger, and H. Lategahn. Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme. In IV, 2010.
[11] L. Kneip, D. Scaramuzza, and R. Siegwart. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In CVPR, 2011.
[12] Q. Miao, G. Wang, C. Shi, X. Lin, and Z. Ruan. A new framework for on-line object tracking based on SURF. Pattern Recognition Letters, 32(13):1564–1571, 2011.
[13] M. Muja and D. Lowe. Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2227–2240, 2014.
[14] T. Pire, T. Fischer, J. Civera, P. De Cristóforis, and J. Berlles. Stereo parallel tracking and mapping for robot localization. In IROS, 2015.
[15] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In ECCV, 2006.
[16] K. Schauwecker, R. Klette, and A. Zell. A new feature detector and stereo matching method for accurate high-performance sparse stereo matching. In IROS, 2012.
[17] R. Seidel and C. Aragon. Randomized search trees. Algorithmica, 16(4):464–497, 1996.
[18] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The New College vision and laser data set. The International Journal of Robotics Research, 28(5):595–599, May 2009.