Speeded Up Detection of Squared Fiducial Markers
Francisco J. Romero-Ramirez (1), Rafael Muñoz-Salinas (1,2,*), Rafael Medina-Carnicer (1,2)
Abstract
Squared planar markers have become a popular method for pose estimation in applications such as autonomous robots, unmanned vehicles and virtual trainers. The markers allow estimating the position of a monocular camera with minimal cost, high robustness, and speed. One only needs to create markers with a regular printer, place them in the desired environment so as to cover the working area, and then register their locations from a set of images. Nevertheless, marker detection is a time-consuming process, especially as the image dimensions grow. Modern cameras are able to acquire high-resolution images, but fiducial marker systems have not kept pace in terms of computing speed. This paper proposes a multi-scale strategy for speeding up marker detection in video sequences by wisely selecting the most appropriate scale for detection, identification and corner estimation. The experiments conducted show that the proposed approach outperforms the state-of-the-art methods without sacrificing accuracy or robustness. Our method is up to 40 times faster than the state of the art, achieving over 1000 fps on 4K images without any parallelization.
Keywords: Fiducial Markers, Marker Mapping, SLAM.
1. Introduction
Pose estimation is a common task in many applications such as autonomous robots [1, 2, 3], unmanned vehicles [4, 5, 6, 7, 8] and virtual assistants [9, 10, 11, 12], among others.
Cameras are cheap sensors that can be effectively used for this task. In the ideal case, natural features such as keypoints or texture [13, 14, 15, 16] are employed to create a map of the environment. Although some of the traditional problems of these methods have been solved in the last few years, others remain. For instance, they are subject to filter stability issues or significant computational requirements.
In any case, artificial landmarks are a popular approach for camera pose estimation. Square fiducial markers, composed of an external black square border and an internal identification code, are especially attractive because the camera pose can be estimated from the four corners of a single marker [17, 18, 19, 20]. The recent work of [21] is
* Corresponding author
Email addresses: fj.romero@uco.es (Francisco J. Romero-Ramirez), in1musar@uco.es (Rafael Muñoz-Salinas), rmedina@uco.es (Rafael Medina-Carnicer)
(1) Departamento de Informática y Análisis Numérico, Edificio Einstein, Campus de Rabanales, Universidad de Córdoba, 14071, Córdoba, Spain. Tlfn: (+34) 957 212 289
(2) Instituto Maimónides de Investigación en Biomedicina (IMIBIC), Avenida Menéndez Pidal s/n, 14004, Córdoba, Spain. Tlfn: (+34) 957 213 861
a step forward in the use of this type of marker in large-scale problems. One only needs to print the set of markers with a regular printer, place them in the area in which the camera must move, and take a set of pictures of the markers. The pictures are then analyzed and the three-dimensional marker locations automatically obtained. Afterward, a single image spotting a marker is enough to estimate the camera pose.
Despite the recent advances, marker detection can be a time-consuming process. Considering that the systems requiring localization have, in many cases, limited resources, such as mobile phones or aerial vehicles, the computational effort of localization should be kept to a minimum. The computing time employed in marker detection is a function of the image size: the larger the images, the slower the process. On the other hand, high-resolution images are preferable since markers can be detected with high accuracy even if far from the camera. The continuous reduction in the cost of cameras, along with the increase in their resolution, makes it necessary to develop methods able to reliably detect markers in high-resolution images.
The main contribution of this paper is a novel method for detecting square fiducial markers in video sequences. The proposed method relies on the idea that markers can be detected in smaller versions of the image, and employs a multi-scale approach to speed up computation while maintaining precision and accuracy. In addition, the system is able to dynamically adapt its parameters in order
to achieve maximum performance in the analyzed video sequence. Our approach has been extensively tested and compared with the state-of-the-art methods for marker detection. The results show that our method is more than an order of magnitude faster than state-of-the-art approaches without compromising robustness or accuracy, and without requiring any type of parallelism.
The remainder of this paper is structured as follows. Section 2 discusses the works most related to ours. Section 3 details our proposal for speeding up the detection of markers. Finally, Section 4 gives an exhaustive analysis of the proposed method and Section 5 draws some conclusions.
2. Related works
Fiducial marker systems are commonly used for camera localization and tracking when robustness, precision, and speed are required. In the simplest case, points are used as fiducial markers, such as LEDs, retroreflective spheres or planar dots [22, 23]. However, their main drawback is the need for a method to solve the assignment problem, i.e., assigning a unique and consistent identifier to each element over time. In order to ease the problem, a common solution consists of adding an identifying code into each marker. Examples of this are planar circular markers [24, 25] and 2D barcodes [26, 27]; some authors have even proposed markers designed using evolutionary algorithms [28].
Amongst all proposed approaches, those based on squared planar markers have gained popularity. These markers consist of an external black border and an internal code (most often binary) that uniquely identifies each marker (see Fig. 1). Their main advantage is that the pose of the camera can be estimated from a single marker.
ARToolKit [29] is one of the pioneering proposals. It employs markers with a custom pattern that is identified by template matching. This identification method, however, is prone to error and not very robust to illumination changes. In addition, the method's sensitivity degrades as the number of markers increases. As a consequence, other authors improved that work by using binary BCH codes [30] (which allow more robust error detection) and named it ARToolKit+ [31]. The project was halted and followed by the Studierstube Tracker project [32], which is proprietary. Similar to the ARToolKit+ project is the discontinued project ARTag [33].
BinARyID [34] is one of the first systems that proposed a method for generating customizable marker codes. Instead of using a predefined set of codes, it generates the desired number of codes for each particular application. However, it does not consider the possibility of error detection and correction. AprilTags [18], in contrast, proposed methods for error detection and correction, but its approach is not suitable for a large number of markers.
The work ArUco [17] is probably the most popular system for marker detection nowadays. It adapts to non-uniform illumination and is very robust, performing error detection and correction on the binary codes employed. In addition, the authors proposed a method to obtain optimal binary codes (in terms of inter-marker distance) using Mixed Integer Linear Programming [35]. Chilitags [36] is a variation of ArUco that employs a simpler method for decoding the marker binary codes. As we show in the experimental section, the method performs poorly on high-resolution images.
The recent work [21] is a step towards the applicability of such methods to large areas, proposing a method for estimating the three-dimensional location of a set of markers freely placed in the environment (Fig. 1). Given a set of images taken with a regular camera (such as a mobile phone), the method automatically estimates their location. This is an important step that allows extending the robust localization of fiducial markers to very large areas.
Although all fiducial marker systems aim at maximum speed in their design, few specific solutions have been proposed to speed up the detection process. The work of Johnston et al. [37] is an interesting example in which the authors speed up computation by parallelizing the image segmentation process. Nevertheless, both speed and computing power are crucial aspects, especially if the localization system needs to be embedded in devices with limited resources.
Our work can be seen as an improvement of the ArUco system, which, in our experience, is one of the most reliable fiducial marker systems nowadays (see Sect. 4 for further details). We propose a novel method for marker detection and identification that speeds up the computing time in video sequences by wisely exploiting temporal information and applying a multi-scale approach. In contrast to previous works, no parallelization is required in our method, thus making it especially attractive for mobile devices with limited computational resources.
3. Speeded up marker detection
This section provides a detailed explanation of the method proposed for speeding up the detection of squared planar markers. First, Sect. 3.1 provides an overview of the pipeline employed in the previous work, ArUco [17], for marker detection and identification, highlighting the parts of the process amenable to acceleration. Then,
Figure 1: Detection and identification pipeline of ArUco. (a) Original image. (b) Image thresholded using an adaptive method. (c) Contours extracted. (d) Filtered contours that approximate four-corner polygons. (e) Canonical image computed for one of the squared contours detected. (f) Binarization after applying Otsu's method.
Sect. 3.2 explains the proposed method to speed up the process.
3.1. Marker detection and identification in ArUco
The main steps for marker detection and identification proposed in ArUco [17] are depicted in Figure 1. Given the input image I (Figure 1a), the following steps are taken:
Image segmentation (Figure 1b). Since the designed markers have an external black border surrounded by a white space, the borders can be found by segmentation. In their approach, a local adaptive method is employed: the mean intensity value m of each pixel is computed using a window of size w_t. The pixel is set to zero if its intensity is greater than m - c, where c is a constant value. This method is robust and obtains good results for a wide range of values of its parameters w_t and c.
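A minimal sketch of this thresholding with OpenCV follows; the concrete values of w_t and c are illustrative assumptions. OpenCV's C parameter is subtracted from the local mean, matching the m - c rule described above.

```python
import cv2

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
w_t, c = 7, 7  # assumed parameter values; w_t must be odd
# THRESH_BINARY_INV sets a pixel to zero when its intensity exceeds
# the local mean minus c, so dark marker borders come out white.
thresholded = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
    cv2.THRESH_BINARY_INV, blockSize=w_t, C=c)
```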
Contour extraction and filtering (Figures 1(c,d)). The contour-following algorithm of Suzuki and Abe [38] is employed to obtain the set of contours from the thresholded image. Since most of the extracted contours correspond to irrelevant background elements, a filtering step is required. First, contours that are too small are discarded. Second, the remaining contours are approximated by their most similar polygon using the Douglas-Peucker algorithm [39]. Those that do not approximate well to a four-corner convex polygon are discarded from further processing.
Marker code extraction (Figures 1(e,f)). The next step consists of analyzing the inner region of the remaining contours to determine which of them are valid markers. To do so, perspective projection is first removed by computing the homography matrix, and the resulting canonical image (Fig. 1e) is thresholded using Otsu's method [40]. The binarized image (Fig. 1f) is divided into a regular grid and each element is assigned a binary value according to the majority of the pixels in the cell. For each marker candidate, it is necessary to determine whether it belongs to the set of valid markers or if it is a background element. Four possible identifiers are obtained for each candidate, corresponding to the four possible rotations of the canonical image. If any of the identifiers belongs to the set of valid markers, the candidate is accepted.
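A minimal sketch of this step with OpenCV, under assumed names: corners is a 4x2 float32 array with the candidate's corners in clockwise order, gray is the grayscale input, and n_cells (the grid size, including the border cells) depends on the marker dictionary.

```python
import cv2
import numpy as np

def extract_code(gray, corners, size=32, n_cells=6):
    # Remove perspective: warp the contour region to a canonical image.
    dst = np.float32([[0, 0], [size - 1, 0],
                      [size - 1, size - 1], [0, size - 1]])
    H = cv2.getPerspectiveTransform(corners, dst)
    canonical = cv2.warpPerspective(gray, H, (size, size))
    # Otsu's method picks the binarization threshold automatically.
    _, binary = cv2.threshold(canonical, 0, 1,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Each grid cell takes the majority value of its pixels.
    cell = size // n_cells
    return np.array([[int(binary[r*cell:(r+1)*cell,
                                 c*cell:(c+1)*cell].mean() > 0.5)
                      for c in range(n_cells)] for r in range(n_cells)])
```

The four rotations of the returned bit matrix would then be matched against the dictionary of valid codes.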
Subpixel corner refinement. The last step consists of estimating the location of the corners with subpixel accuracy. To do so, the method employs a linear regression of the marker's contour pixels. In other words, it estimates the lines of the marker sides employing all the contour pixels and computes their intersections. This method, however, is not reliable for uncalibrated cameras with short focal lengths (such as fisheye cameras) since they usually exhibit high distortion.
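This line-regression refinement can be sketched as follows, assuming side_pixels is a list of four (N, 2) arrays holding the contour pixels of each marker side (how the contour is split into sides is not detailed here):

```python
import numpy as np

def refine_corners(side_pixels):
    # Fit each side as a homogeneous line (a, b, c), ax + by + c = 0,
    # by total least squares: the right singular vector of the centered
    # points with the smallest singular value is the line normal.
    lines = []
    for pts in side_pixels:
        mean = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - mean)
        a, b = vt[-1]
        lines.append(np.array([a, b, -(a * mean[0] + b * mean[1])]))
    # The intersection of two homogeneous lines is their cross product.
    corners = []
    for i in range(4):
        p = np.cross(lines[i], lines[(i + 1) % 4])
        corners.append(p[:2] / p[2])
    return np.array(corners)
```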
When analyzing the computing times of this pipeline, it can be observed that the Image segmentation and the Marker code extraction steps consume most of the computing time. The time employed in the image segmentation step is proportional to the image size, which also influences the length of the contours extracted and thus the computing time employed in the Contour extraction and filtering step. The extraction of the canonical image (in the Marker code extraction step) involves two operations. First, computing the homography matrix, which is cheap. But then, the inner region of each contour must be warped to create the canonical image. This step requires accessing the image pixels of the contour region and performing an interpolation in order to obtain the canonical image. The main problem is that the time required to obtain the canonical image depends on the size of the observed contour: the larger a contour in the original image, the more time is required to obtain the canonical image. Moreover, since most of the contours obtained do not belong to markers, the system may spend a large amount of time computing canonical images that will later be rejected.
A simpler approach to solving that problem would be to directly sample a few sets of pixels from the inner region of the marker. This is the method employed in ChiliTags. However, as shown in the experimental section, it is prone to many false negatives.
Figure 2: Process pipeline. Main steps for fast detection and identification of squared planar markers. (a) Original input image. (b) Resized image for marker search. (c) Thresholded image. (d) Rectangles found (pink). (e) Markers detected with their corresponding identifications. The image pyramid is used to speed up homography computation. (f) The corners obtained in (e) are upsampled to find their location in the original image with subpixel precision.
3.2. Proposed method
The key ideas of our proposal for speeding up the computation are explained below. First, while the adaptive thresholding method employed in ArUco is robust to many illumination conditions without altering its parameters, it is a time-consuming process that requires a convolution. By taking advantage of temporal information, the adaptive thresholding method is replaced by a global thresholding approach.
Second, instead of using the original input image, a smaller version is employed. This is based on the fact that, in most cases, the markers useful for camera pose estimation must have a minimum size. Imagine an image of dimensions 1920x1080 pixels, in which a marker is detected as a small square with a side length of 10 pixels. Indeed, the estimation of the camera pose is not reliable at such a small resolution. Thus, one might want to set a minimum length for the markers employed for camera pose estimation. For instance, let us say that we only use markers with a minimum side length of τ̇_i = 100 pixels, i.e., with a total area of 10,000 pixels. Another situation in which we can set a limit on the length of markers is when processing video sequences: it is clear that the length of a marker must be similar to its length in the previous frame.
Now, let us also think about the size of the canonical images employed (Figure 1e). The smaller the image, the faster the detection process but the poorer the image quality. Our experience, however, indicates that very reliable detection of the binary code can be obtained from very small canonical images, such as 32x32 pixels. In other words, all the rectangles detected in the image, no matter their side length, are reduced to canonical images of side length τ_c = 32 pixels for the purpose of identification.
Our idea, then, is to employ a reduced version of the input image, using the scale factor τ_c/τ̇_i, so as to speed up the segmentation step. In the reduced image, the smallest allowed markers, with a side length of 100 pixels in the original image, appear as rectangles with a side length of 32 pixels. As a consequence, there will be no loss of quality when they are converted into the canonical image.
This idea has one drawback: the corner locations extracted in the low-resolution image are not as good estimates as the ones that can be obtained in the original image. Thus, the pose estimated with them will have a higher error. To solve that problem, a corner upsampling step is included, in which the precision of the corners is refined up to subpixel accuracy in the original input image by employing an image pyramid.
Finally, it must be considered that the generation of the canonical image is a very time-consuming operation (even if the process is done in the reduced image) that is proportional to the contour length. We propose a method to perform the extraction of the canonical images in almost constant time (independently of the contour length) by wisely employing the image pyramid.
Below, there is a detailed explanation of the main steps of the proposed method, using Figure 2 to ease the explanation.
1. Image Resize: Given the input image I (Fig. 2a), the first step consists of obtaining a resized version I^r (Fig. 2b) that will be employed for segmentation. As previously pointed out, the size of the reduced image is calculated as:

$$I^r_w = \frac{\tau_c}{\dot{\tau}_i} I_w, \qquad I^r_h = \frac{\tau_c}{\dot{\tau}_i} I_h, \qquad (1)$$

where the subscripts w and h denote width and height,
Figure 3: Pyramidal Warping. Scene showing three markers at different resolutions. The left column shows the canonical images warped from the pyramid of images. Larger markers are warped from smaller images. For each marker, the image of the pyramid that minimizes the warping time while preserving the resolution is selected.
respectively. In order to decouple the desired minimum marker size from the input image dimensions, we define τ̇_i as:

$$\dot{\tau}_i = \tau_c + \max(I_w, I_h)\,\tau_i, \qquad \tau_i \in [0, 1], \qquad (2)$$
where the normalized parameter τ_i indicates the minimum marker size as a value in the range [0, 1]. When τ_i = 0, the reduced image is the same size as the original image. As τ_i tends to one, the image I^r becomes smaller, and consequently, the computational time required for the following steps is reduced. The impact of this parameter on the final speedup is measured in the experimental section.
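A minimal sketch of Eqs. (1)-(2) with OpenCV; τ_c = 32 follows the paper, while the function name is ours:

```python
import cv2

def resize_for_detection(image, tau_i, tau_c=32):
    # Eq. (2): minimum marker side in pixels, decoupled from image size.
    h, w = image.shape[:2]
    tau_i_dot = tau_c + max(w, h) * tau_i
    # Eq. (1): reduce the image by the scale factor tau_c / tau_i_dot.
    scale = tau_c / tau_i_dot
    return cv2.resize(image, (int(w * scale), int(h * scale)))
```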
2. Image Segmentation: As already indicated, a global threshold method is employed using the following policy. If no markers were detected in the previous frame, a random threshold search is performed. The random process is repeated up to three times using the range of threshold values [10, 240]. For each tested threshold value, the whole pipeline explained below is performed. If, after this number of attempts, no marker is found, it is assumed that no markers are visible in the frame. If at least one marker is detected, a histogram is created using the pixel values of all detected markers. Then, Otsu's algorithm [40] is employed to select the optimal threshold for the next frame. The calculated threshold is applied to I^r in order to obtain I^t (Fig. 2c). As we show experimentally, the proposed method can adapt to smooth and abrupt illumination changes.
3. Contour Extraction and Filtering: First, contours are extracted from the image I^t using the Suzuki and Abe algorithm [38]; then, small contours are removed. Since the extracted contours will rarely be square (due to perspective projection), their perimeter is employed for rejection purposes: those with a perimeter smaller than P(τ_c) = 4τ_c pixels are rejected. For the remaining contours, a polygonal approximation is performed using the Douglas-Peucker algorithm [39], and those that do not approximate to a convex polygon of four corners are also rejected. Finally, the remaining contours are the marker candidates (Fig. 2d).
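A sketch with OpenCV, whose findContours implements the Suzuki-Abe algorithm and approxPolyDP the Douglas-Peucker algorithm; the approximation tolerance of 5% of the perimeter is an assumed value:

```python
import cv2

def find_candidates(binary, tau_c=32):
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_NONE)
    candidates = []
    for cnt in contours:
        perimeter = cv2.arcLength(cnt, True)
        if perimeter < 4 * tau_c:          # reject too-small contours
            continue
        poly = cv2.approxPolyDP(cnt, 0.05 * perimeter, True)
        # Keep only convex four-corner polygons.
        if len(poly) == 4 and cv2.isContourConvex(poly):
            candidates.append(poly.reshape(4, 2))
    return candidates
```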
4. Image Pyramid Creation: An image pyramid I = (I^0, ..., I^n), consisting of a set of resized versions of I, is created. I^0 denotes the original image, and each subsequent image I^i is created by subsampling I^{i-1} by a factor of two. The number n of images in the pyramid is such that the dimensions of the smallest image are close to τ_c x τ_c, i.e.,

$$n = \operatorname*{argmin}_{v \,|\, I^v \in \mathcal{I}} \left| (I^v_w \cdot I^v_h) - \tau_c^2 \right|. \qquad (3)$$
5. Marker Code Extraction: In this step, the canonical images of the remaining contours must be extracted and then binarized. Our method uses the pyramid of images previously computed to ensure that the process is performed in constant time, independently of the input image and contour sizes. The key principle is selecting, for each contour, the image from the pyramid in which the contour length is most similar to the canonical image perimeter P(τ_c). In this manner, warping is faster.

Let us consider a detected contour ϑ in I^r, and denote by P(ϑ)^j its perimeter in the image I^j of the pyramid. Then, the best image I^h for homography computation is selected as:

$$h = \operatorname*{argmin}_{j \in \{0, 1, \ldots, n\}} \left| P(\vartheta)^j - P(\tau_c) \right|. \qquad (4)$$
The pyramidal warping method employed can be better understood in Fig. 3, which shows a scene with three markers at different distances. The left images represent the canonical images obtained, while the right images show the pyramid of images. In our method, the canonical image of the smallest marker is extracted from the largest image in the pyramid (top row of Fig. 3).
Figure 4: Test sequences. (a) The set of 16 markers employed for evaluation. There are four markers from each method tested: ArUco, AprilTags, ArToolKit+ and ChiliTags. (b-e) Images from the video sequences used for testing. The markers appear as small as in (b), and as big as in (e), where the marker represents 40% of the total image area.
As the length of the marker increases, smaller images of the pyramid are employed to obtain the canonical view. This guarantees that the canonical image is obtained in almost constant time using the minimum possible computation.
Finally, for each canonical image, Otsu's method [40] is employed for binarization, and the inner code is analyzed to determine whether it corresponds to a valid marker. This is a very cheap operation.
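A sketch of Eq. (4) plus the warping, under assumed names: corners_r holds the candidate's corners in the reduced image, and scale_r maps reduced-image coordinates to original-image coordinates (pyramid level 0).

```python
import cv2
import numpy as np

def warp_canonical(pyramid, corners_r, scale_r, tau_c=32):
    # The perimeter at pyramid level j is the perimeter in the original
    # image divided by 2**j; pick the level closest to the canonical
    # perimeter P(tau_c) = 4 * tau_c (Eq. 4).
    perim0 = cv2.arcLength(np.float32(corners_r), True) * scale_r
    h = min(range(len(pyramid)),
            key=lambda j: abs(perim0 / 2**j - 4 * tau_c))
    corners_h = np.float32(corners_r) * scale_r / 2**h
    dst = np.float32([[0, 0], [tau_c - 1, 0],
                      [tau_c - 1, tau_c - 1], [0, tau_c - 1]])
    M = cv2.getPerspectiveTransform(corners_h, dst)
    return cv2.warpPerspective(pyramid[h], M, (tau_c, tau_c))
```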
6. Corner Upsampling: So far, markers have been detected in the image I^r. However, it is necessary to precisely localize their corners in the original image I. As previously indicated, the precision of the estimated camera pose is directly influenced by the precision of the corner localization. Since the difference in size between the images I and I^r can be very large, a direct upsampling can lead to errors. Instead, we proceed in incremental steps, looking for the corners in progressively larger versions of the image I^r until the image I is reached.

For the corner upsampling task, the image I^i of the pyramid with the size most similar to I^r is selected in the first place, i.e.,

$$I^i = \operatorname*{argmin}_{I^v \in \mathcal{I}} \left| (I^v_w \cdot I^v_h) - (I^r_w \cdot I^r_h) \right|. \qquad (5)$$
Then, the position of each contour corner in the image I^i is computed by simply upsampling the corner locations. This is, however, an approximate estimate that does not precisely indicate the corner position in the image I^i. Thus, a corner refinement process is performed in the vicinity of each corner so as to find its best location in the selected image I^i. For that purpose, the method implemented in the OpenCV library [41] has been employed. Once the search is done in I^i for all corners, the operation is repeated for the image I^{i-1}, until I^0 is reached. In contrast to the ArUco approach, this one is not affected by lens distortions.
7. Estimation of τ_i: The parameter τ_i has a direct influence on the computation time: the higher it is, the faster the computation. A naive approach consists of setting a fixed value for this parameter. However, when processing video sequences, the parameter can be automatically adjusted at the end of each frame. In the first image of the sequence, the parameter τ_i is set to zero, so markers of any size are detected. Then, for the next frame, τ_i is set to a value slightly smaller than the size of the smallest marker detected in the previous frame. In this way, markers can be detected even if the camera moves away from them. Therefore, the parameter τ_i can be dynamically updated as:

$$\tau_i = (1 - \tau_s)\, P(\vartheta_s)/4, \qquad (6)$$

where ϑ_s is the marker with the smallest perimeter found in the image, and τ_s is a factor in the range (0, 1] that accounts for the camera motion speed. For instance, when τ_s = 0.1, it means that in the next frame, τ_i is such that markers 10% smaller than the smallest marker in the current image will be sought. If no markers are detected in a frame, τ_i is set to zero so that in the next frame markers of any size can be detected.
Figure 5: SpeedUp of ArUco3 compared to ArUco, ArToolKit+, ChiliTags and AprilTags for resolutions: 4K (3840x2160), 1080p (1920x1080), 720p (1280x720), 600p (800x600) and 480p (640x480). The horizontal axis represents the percentage of area occupied by the markers in each frame, and the vertical axis indicates how many times ArUco3 is faster.
As can be observed, the proposed pipeline includes a number of differences with respect to the original ArUco pipeline that significantly increase the processing speed, as we show next.
4. Experiments and results
This section shows the results obtained to validate the methodology proposed for the detection of fiducial markers.
First, in Sect. 4.1, the computing times of our proposal are compared to the best alternatives found in the literature: AprilTags [18], ChiliTags [36], ArToolKit+ [31], as well as ArUco [17], which is included in the OpenCV library³. Then, Sect. 4.2 analyzes and compares the sensitivity of the proposed method with the above-mentioned methods. The main goal is to demonstrate that our approach is able to reliably detect the markers with a very high true positive ratio, under a wide range of marker resolutions, while keeping the false positive rate at zero. Afterward, Sect. 4.3 studies the impact of the different system parameters on speed and sensitivity, while Sect. 4.4 evaluates the precision of the corner estimation. Finally, Sect. 4.5 shows the performance of the proposed method in a realistic video sequence with occlusions, illumination changes, and scale changes.
To carry out the first three experiments, several videos were recorded in our laboratory. Figure 4(b-e) shows some images of the video sequences employed. For these tests, a panel with a total of 16 markers was printed (Figure 4a), four from each of the fiducial marker systems employed. The sequences were recorded at different distances at a frame rate of 30 fps using an Honor 5 mobile phone at 4K resolution. The videos employed are publicly available⁴ for evaluation purposes.
³ https://opencv.org/
⁴ https://mega.nz/#F!DnA1wIAQ!6f6owb81G0E7Sw3EfddUXQ
In the videos, there are frames in which the markers appear as small as in Figure 4b, where the area of each marker occupies only 0.5% of the image, and frames in which the marker appears as big as in Figure 4e, where it occupies 40% of the total image area. In total, the video sequences recorded sum up to 10,666 frames. The video frames have been processed at different resolutions so that the impact of the image resolution on the computing time can be analyzed. In particular, the following standard image resolutions have been employed: 4K (3840x2160), 1080p (1920x1080), 720p (1280x720), 600p (800x600) and 480p (640x480).
All tests were performed using an Intel® Core™ i7-4700HQ 8-core processor with 8 GB RAM and Ubuntu 16.04 as the operating system. However, only one execution thread was employed in the tests performed.
It must be indicated that the code generated as part of this work has been publicly released as version 3 of the popular ArUco library⁵. Hence, in the experiments section, the method proposed in this paper is referred to as ArUco3.
4.1. Speedup
This section compares the computing times of the proposed method with the most commonly used alternatives: AprilTags, ArToolKit+, ChiliTags, and ArUco. To do so, we compute the speedup of our approach as the ratio between the computing time of an alternative (t_1) and the computing time of ArUco3 (t_2) in processing the same image:

$$\mathrm{SpeedUp} = t_1 / t_2. \qquad (7)$$
In our method, the value τ_c = 32 was employed in all the sequences, while τ_i and the segmentation threshold were automatically computed as explained in Steps 2 and 7 of the proposed method (Sect. 3.2).
⁵ http://www.uco.es/grupos/ava/node/25
Table 1: Mean computing times (milliseconds) of the different steps of the proposed method for different resolutions.

Step                                       480p    600p    720p    1080p   2160p
Step 1: Image Resize                       0.037   0.050   0.057   0.068   0.101
Step 2: Image Segmentation                 0.044   0.048   0.059   0.084   0.351
Step 3: Contour Extraction and Filtering   0.219   0.250   0.301   0.403   1.109
Step 4: Image Pyramid Creation             0.037   0.076   0.096   0.186   0.476
Step 5: Marker Code Extraction             0.510   0.519   0.542   0.547   0.583
Step 6: Corner Upsampling                  0.058   0.065   0.079   0.096   0.134
Total time (ms)                            0.903   1.009   1.133   1.384   2.755
Fig. 5 shows the speedup of our approach for different image resolutions. The horizontal axis represents the relative area occupied by the marker in the image, while the vertical axis represents the speedup. A total of 30 speed measurements were performed for each image, taking the median computing time for our evaluation. In the tests, the speedup is evaluated as a function of the observed marker area in order to better understand the behavior of our approach.
The tests conducted clearly show that the proposed method (ArUco3) is faster than the rest of the methods, and that the speedup increases with the image resolution and with the observed marker area. Compared to the ArUco implementation in the OpenCV library, the proposed method is significantly faster, achieving a minimum speedup of 17 at 4K resolution, and up to 40 in the best case.
In order to properly analyze the computing times of the different steps of the proposed method (Sect. 3.2), Table 1 shows a summary for different image resolutions. Likewise, Fig. 6 shows the percentage of the total time required by each step. Please notice that Step 7 (Eq. 6) has been omitted because its computing time is negligible.
As can be seen, the two most time-consuming operations are Steps 3 and 5. In particular, Step 5 requires special attention, since it proves the validity of the multi-scale method proposed for marker warping. It can be observed in the table that the amount of time employed by Step 5 is constant across all resolutions. In other words, the computing time does not increase significantly with the image resolution. Also notice how the time of Step 3 increases at 2160p. This is because this step involves operations that depend on the image dimensions, which grow quadratically. An interesting future work is to develop methods that reduce the time for contour extraction and filtering in high-resolution images.
In any case, considering the average total computing time, the proposed method achieves on average more than 360 fps at 4K resolution and more than 1000 fps at the lowest resolution, without any parallelism.
Figure 6: Main steps ArUco3 times. Percentage of the total computation time required by each of the steps for resolutions: 4K, 1080p, 720p, 600p and 480p.
4.2. Sensitivity analysis
Correct detection of markers is a critical aspect that must be analyzed to verify that the proposed algorithm is able to discard the irrelevant information present in the scene, extracting exclusively marker information. Fig. 7 shows the True Positive Rate (TPR) of the proposed method as a function of the area occupied by the marker in the image for different image resolutions.
As can be observed, below a certain marker area, the detection is not reliable. This is because the observed marker area is very small, making it difficult to distinguish the different bits of the inner binary code. Once the observed area of the marker reaches a certain limit, the proposed method achieves perfect detection at all resolutions. It must be remarked that the False Positive Rate is zero in all cases tested. Since it is a binary problem, the True Negative Rate is one (TNR = 1 - FPR).
For a comparative performance evaluation of ArUco3 and the other methods, the TPR has been analyzed individually and the results are shown in Fig. 7. As can be observed, ArUco behaves exactly like ArUco3. AprilTags, however, shows very poor behavior at all resolutions, especially as the marker or image size increases. As already commented in Sect. 2, AprilTags does not rely on warping the marker image but instead subsamples a few pixels of the image in order to obtain the binary code. This may be one of the reasons for its poor performance. ArToolKit+ behaves reasonably well across all the image resolutions and marker areas, while Chilitags shows a somewhat unreliable behavior at all resolutions but 480p.

In conclusion, the proposed approach behaves similarly to the previous version of ArUco.
Figure 7: True Positive Ratio. Mean true positive ratio (TPR) for ArUco3, Chilitags, ArUco, ArToolKit+ and AprilTags for resolutions 4K, 1080p, 720p, 600p and 480p, as a function of the observed area for the set of markers.
4.3. Analysis of parameters
The computing time and robustness of the proposed method depend mainly on two parameters: τ_i, which indicates the minimum size of the markers detected, and τ_c, the size of the canonical image.

The parameter τ_i has an influence on the computing time, since it determines the size of the resized image I^r (Eq. 1). We have analyzed the speed as a function of this parameter and the results are shown in Fig. 8. The figure represents on the horizontal axis the value of τ_i, and on the vertical axis the average speed (measured in frames per second) in the sequences analyzed, independently of the observed marker area. A different line is depicted for each image resolution. In this case, the parameter was fixed to τ_c = 32.
It can be observed that the curves follow a similar pattern in the five cases analyzed. In general, the maximum increase in speed is obtained in the range of values τ_i ∈ (0, 0.2). Beyond that point, the improvement becomes marginal. To better understand the impact of this parameter, Table 2 shows the reduction of the input image size for different values of τ_i. For instance, when τ_i = 0.02, the resized image I^r is 48% smaller than the original input image I (see Eq. 1). Beyond τ_i = 0.2, the resized image is so small that further reduction has little impact on the speedup, because other steps have a fixed computing time, such as Step 5 (Marker Code Extraction).
Table 2: Image size reduction for different values of τ_i.

τ_i              0.01   0.015   0.02   0.1    0.2
Size reduction   0%     31%     48%    82%    90%
In any case, it must be noticed that the proposed method is able to achieve 1000 fps at 4K resolution when detecting markers larger than 10% (τ_i = 0.1) of the image area, and the same limit of 1000 fps is achieved at 1080p resolution for τ_i = 0.05.
With regard to the parameter τ_c, it indirectly influences the speed since it determines the size of the resized image (Eq. 1): the smaller it is, the smaller the resized image I^r. Nevertheless, this parameter also has an influence on the correct detection of the markers. The parameter indicates the size of the canonical images used to identify the binary code of the markers. If the canonical image is very small, pixels are mixed up, and identification is not robust. Consequently, the goal is to determine the minimum value of τ_c that achieves the best TPR. Fig. 9 shows the TPR obtained for different configurations of the parameter τ_c. As can be seen, for low values of the parameter τ_c (between 8 and 32) the system shows problems in the detection of markers. However, beyond τ_c = 32 there is no further improvement in the TPR. Thus, we conclude that the value τ_c = 32 is the best choice.
Figure 8: Parameter τ_i. Speed of the proposed method as a function of the parameter τ_i for the different resolutions tested.

Figure 9: Parameter τ_c. True positive rate obtained by different configurations of the parameter τ_c.

Figure 10: Vertex jitter measured for the different marker systems.
4.4. Precision of corner detection
An important aspect to consider in the detection of the markers is vertex jitter, which refers to the noise in the estimation of the corners' locations. These errors are problematic because they propagate to the estimation of the camera pose. In our method, a corner upsampling step (Step 6 in Sect. 3.2) is proposed to refine the corner estimates from the reduced image I^r to the original image I. This section analyzes the proposed method, comparing its results with the other marker systems.
In order to perform the experiments, the camera was placed at a fixed position recording the set of markers already presented in Fig. 4a. Since the camera is not moving, the average location estimated for each corner can be considered to be the correct one (i.e., a Gaussian error distribution is assumed). Then, the standard deviation is an error measure for the localization of the corners. The process was repeated a total of six times at varying distances and the results obtained are shown in Fig. 10 as box plots. Table 3 indicates the average error of each method.
Table 3: Vertex jitter analysis: standard deviations of the different methods in estimating the marker corners.

Method               ArUco   ArUco3   Chilitags   AprilTags   ArToolKit+
Average error (pix)  0.140   0.161    0.174       0.225       0.432
As can be observed, the ArUco system obtains the best results, followed by our proposal ArUco3. However, the difference between both methods is only 0.02 pixels, which is too small to be considered relevant. Chilitags shows behavior similar to ArUco and ArUco3, while AprilTags and ArToolKit+ exhibit worse performance.
4.5. Video sequence analysis
This section aims at showing the behavior of the proposed system in a realistic scenario. For that purpose, four markers were placed in an environment with irregular lighting and a video sequence was recorded using a 4K mobile phone camera. Figure 11(a-e) shows frames 1, 665, 1300, 1700 and 2100 of the video sequence. At the start of the sequence, the camera is around five meters away from the markers. The camera approaches the markers and then moves away again. As can be seen, around frame 650 (Figure 11b), the user temporarily occludes the markers.
Figure 11f shows the values of the parameter τ_i automatically calculated along the sequence, and Figure 11g the processing speed. As can be observed, the system is able to automatically adapt the value of τ_i according to the observed marker area, thus adapting the computing speed of the system. The maximum speed is obtained around frame 1300, when the camera is closest to the markers.

It can also be observed that around frame 650, when the user occludes the markers with his hand, the system is unable to detect any marker. Thus, the system searches the full-resolution image (τ_i = 0) and the speed decreases. However, when the markers are observed again, the system recovers its speed.
Finally, Figure 11h shows the threshold values employed for segmentation in each frame. As can be seen, the system adapts to the illumination changes. Along the sequence, the system does not produce any false negatives or false positives.
5. Conclusions and future work
This paper has proposed a novel approach for detecting fiducial markers aimed at maximizing speed while preserving accuracy and robustness. The proposed method
Figure 11: Video Sequence in a realistic scenario. (a-e) Frames of the video sequence. The camera approaches the markers and then moves away. The user occludes the camera temporarily. (f) Evolution of the parameter τ_i automatically computed. (g) Speed of the proposed method in each frame of the sequence. (h) Thresholds automatically computed for each frame. The system adapts to illumination changes.
is especially designed to take advantage of the increasing camera resolutions available nowadays. Instead of detecting markers in the original image, a smaller version of the image is employed, in which the detection can be done at a higher speed. By wisely employing a multi-scale image representation, the proposed method is able to find the position of the marker corners with subpixel accuracy in the original image. The size of the processed image, as well as the threshold employed for segmentation, are dynamically adapted in each frame considering the information of the previous one. As a consequence, the system speed dynamically adapts in order to achieve maximum performance.
As shown experimentally, the proposed method outperforms the state-of-the-art systems in terms of computing speed, without compromising sensitivity or precision. Our method is between 17 and 40 times faster than the ArUco approach implemented in the OpenCV library. When compared to other approaches such as Chilitags, AprilTags, and ArToolKit+, our method achieves even higher speedups.
As possible future work, we consider investigating the use of the proposed method with fisheye cameras, analyzing its performance in the presence of high distortion and comparing it with detection on rectified images. We also plan to characterize the performance when multiple fiducial markers with significantly different scales are present in the same image.
Our system, which is publicly available as open-source code⁶, is a cost-effective tool for fast and precise self-localization in applications such as robotics, unmanned vehicles or augmented reality.
Acknowledgments
This project has been funded under projects TIN2016-75279-P and IFI16/00033 (ISCIII) of the Spanish Ministry of Economy, Industry and Competitiveness, and FEDER.
⁶ http://www.uco.es/grupos/ava/node/25
References

[1] R. Sim, J. J. Little, Autonomous vision-based robotic exploration and mapping using hybrid maps and particle filters, Image and Vision Computing 27 (1) (2009) 167-177. Canadian Robotic Vision 2005 and 2006.
[2] A. Pichler, S. C. Akkaladevi, M. Ikeda, M. Hofmann, M. Plasch, C. Wögerer, G. Fritz, Towards shared autonomy for robotic tasks in manufacturing, Procedia Manufacturing 11 (Supplement C) (2017) 72-82. 27th International Conference on Flexible Automation and Intelligent Manufacturing, FAIM2017, 27-30 June 2017, Modena, Italy.
[3] R. Valencia-Garcia, R. Martinez-Béjar, A. Gasparetto, An intelligent framework for simulating robot-assisted surgical operations, Expert Systems with Applications 28 (3) (2005) 425-433.
[4] A. Broggi, E. Dickmanns, Applications of computer vision to intelligent vehicles, Image and Vision Computing 18 (5) (2000) 365-366.
[5] T. Patterson, S. McClean, P. Morrow, G. Parr, C. Luo, Timely autonomous identification of UAV safe landing zones, Image and Vision Computing 32 (9) (2014) 568-578.
[6] D. González, J. Pérez, V. Milanés, Parametric-based path generation for automated vehicles at roundabouts, Expert Systems with Applications 71 (2017) 332-341.
[7] J. L. Sanchez-Lopez, J. Pestana, P. de la Puente, P. Campoy, A reliable open-source system architecture for the fast designing and prototyping of autonomous multi-UAV systems: Simulation and experimentation, Journal of Intelligent & Robotic Systems (2015) 1-19.
[8] M. Olivares-Mendez, S. Kannan, H. Voos, Vision based fuzzy control autonomous landing with UAVs: From V-REP to real experiments, in: Control and Automation (MED), 2015 23rd Mediterranean Conference on, 2015, pp. 14-21.
[9] S. Pflugi, R. Vasireddy, T. Lerch, T. M. Ecker, M. Tannast, N. Boemke, K. Siebenrock, G. Zheng, Augmented marker tracking for peri-acetabular osteotomy surgery, in: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017, pp. 937-941.
[10] J. P. Lima, R. Roberto, F. Simões, M. Almeida, L. Figueiredo, J. M. Teixeira, V. Teichrieb, Markerless tracking system for augmented reality in the automotive industry, Expert Systems with Applications 82 (2017) 100-114.
[11] P. Chen, Z. Peng, D. Li, L. Yang, An improved augmented reality system based on AndAR, Journal of Visual Communication and Image Representation 37 (2016) 63-69. Weakly supervised learning and its applications.
[12] S. Khattak, B. Cowan, I. Chepurna, A. Hogue, A real-time reconstructed 3D environment augmented with virtual objects rendered with correct occlusion, in: Games Media Entertainment (GEM), 2014 IEEE, 2014, pp. 1-8.
[13] J. Engel, T. Schöps, D. Cremers, LSD-SLAM: Large-scale direct monocular SLAM, 2014.
[14] R. Mur-Artal, J. M. M. Montiel, J. D. Tardós, ORB-SLAM: A versatile and accurate monocular SLAM system, IEEE Transactions on Robotics 31 (5) (2015) 1147-1163.
[15] Cooperative pose estimation of a fleet of robots based on interactive points alignment, Expert Systems with Applications 45 (2016) 150-160.
[16] S.-h. Zhong, Y. Liu, Q.-c. Chen, Visual orientation inhomogeneity based scale-invariant feature transform, Expert Systems with Applications 42 (13) (2015) 5658-5667.
[17] S. Garrido-Jurado, R. Muñoz-Salinas, F. J. Madrid-Cuevas, M. J. Marín-Jiménez, Automatic generation and detection of highly reliable fiducial markers under occlusion, Pattern Recognition 47 (6) (2014) 2280-2292.
[18] E. Olson, AprilTag: A robust and flexible visual fiducial system, in: Robotics and Automation (ICRA), 2011 IEEE International Conference on, 2011, pp. 3400-3407.
[19] F. Ababsa, M. Mallem, Robust camera pose estimation using 2D fiducials tracking for real-time augmented reality systems, in: Proceedings of the 2004 ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry, VRCAI '04, 2004, pp. 431-435.
[20] V. Mondéjar-Guerra, S. Garrido-Jurado, R. Muñoz-Salinas, M. J. Marín-Jiménez, R. Medina-Carnicer, Robust identification of fiducial markers in challenging conditions, Expert Systems with Applications 93 (1) (2018) 336-345.
[21] R. Muñoz-Salinas, M. J. Marín-Jimenez, E. Yeguas-Bolivar, R. Medina-Carnicer, Mapping and localization from planar markers, Pattern Recognition 73 (January 2018) 158-171.
[22] K. Dorfmüller, H. Wirth, Real-time hand and head tracking for virtual environments using infrared beacons, in: Proceedings CAPTECH'98, Springer, 1998, pp. 113-127.
[23] M. Ribo, A. Pinz, A. L. Fuhrmann, A new optical tracking system for virtual and augmented reality applications, in: Proceedings of the IEEE Instrumentation and Measurement Technical Conference, 2001, pp. 1932-1936.
[24] V. A. Knyaz, R. V. Sibiryakov, The development of new coded targets for automated point identification and non-contact surface measurements, in: 3D Surface Measurements, International Archives of Photogrammetry and Remote Sensing, Vol. XXXII, part 5, 1998, pp. 80-85.
[25] L. Naimark, E. Foxlin, Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker, in: Proceedings of the 1st International Symposium on Mixed and Augmented Reality, ISMAR '02, IEEE Computer Society, Washington, DC, USA, 2002, pp. 27-36.
[26] J. Rekimoto, Y. Ayatsuka, CyberCode: designing augmented reality environments with visual tags, in: Proceedings of DARE 2000 on Designing Augmented Reality Environments, DARE '00, ACM, New York, NY, USA, 2000, pp. 1-10.
[27] M. Rohs, B. Gfeller, Using camera-equipped mobile phones for interacting with real-world objects, in: Advances in Pervasive Computing, 2004, pp. 265-271.
[28] M. Kaltenbrunner, R. Bencina, reacTIVision: a computer-vision framework for table-based tangible interaction, in: Proceedings of the 1st International Conference on Tangible and Embedded Interaction, TEI '07, ACM, New York, NY, USA, 2007, pp. 69-74.
[29] H. Kato, M. Billinghurst, Marker tracking and HMD calibration for a video-based augmented reality conferencing system, in: Augmented Reality, 1999 (IWAR '99) Proceedings, 2nd IEEE and ACM International Workshop on, 1999, pp. 85-94.
[30] S. Lin, D. J. Costello, Error Control Coding, Second Edition, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2004.
[31] D. Wagner, D. Schmalstieg, ARToolKitPlus for pose tracking on mobile devices, in: Computer Vision Winter Workshop, 2007, pp. 139-146.
[32] D. Schmalstieg, A. Fuhrmann, G. Hesina, Z. Szalavári, L. M. Encarnação, M. Gervautz, W. Purgathofer, The Studierstube augmented reality project, Presence: Teleoperators and Virtual Environments 11 (1) (2002) 33-54.
[33] M. Fiala, Designing highly reliable fiducial markers, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (7) (2010) 1317-1324.
[34] D. Flohr, J. Fischer, A lightweight ID-based extension for marker tracking systems, in: Eurographics Symposium on Virtual Environments (EGVE) Short Paper Proceedings, 2007, pp. 59-64.
[35] S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas, R. Medina-Carnicer, Generation of fiducial marker dictionaries using mixed integer linear programming, Pattern Recognition 51 (2016) 481-491.
[36] Q. Bonnard, S. Lemaignan, G. Zufferey, A. Mazzei, S. Cuendet, N. Li, A. Özgür, P. Dillenbourg, Chilitags 2: Robust fiducial markers for augmented reality and robotics (2013). URL http://chili.epfl.ch/software
[37] D. Johnston, M. Fleury, A. Downton, A. Clark, Real-time positioning for augmented reality on a custom parallel machine, Image and Vision Computing 23 (3) (2005) 271-286.
[38] S. Suzuki, K. Abe, Topological structural analysis of digitized binary images by border following, Computer Vision, Graphics, and Image Processing 30 (1) (1985) 32-46.
[39] D. H. Douglas, T. K. Peucker, Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, Cartographica: The International Journal for Geographic Information and Geovisualization 10 (2) (1973) 112-122.
[40] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics 9 (1) (1979) 62-66.
[41] G. Bradski, A. Kaehler, Learning OpenCV: Computer Vision in C++ with the OpenCV Library, 2nd Edition, O'Reilly Media, Inc., 2013.
Conference Paper
Full-text available
Camera Pose Estimation (CPE) is vital in augmented reality, virtual reality, and assisted living applications (AAL). While many software solutions exist, hardware-based solutions are more complex due to resource constraints (like memory, timing, etc.) This work uses a Field Programmable Gate Array (FPGA) based hardware accelerator to detect square binary fiducial markers for CPE, employing a single scan crack run-length algorithm for contour detection. This low-complexity method processes frames pixel by pixel, eliminating the need for buffering. High-Level Synthesis (HLS) is used for hardware development, with a co-design approach mapping fiducial detection to hardware and solving CPE via software using the Perspective-n-Point (PnP) method. The system, deployed on an FPGA-SoC, is tested on synthetic and indoor datasets. Results show successful marker detection at various resolutions and distances. The design has been prototyped and tested on a Zedboard with Xilinx Zynq®-7000 SoC. Keywords— Camera pose estimation, single scan run length algorithm, connected component algorithm (CCA), HLS, FPGA, fiducial marker.
... Each bulls-eye will be accompanied by four corner markers and one horizontal line marker as shown in Figure 4. These corner markers are ArUco markers [22][23] which are very robust binary square fiducial markers. The line marker is used to set the vertical position of the bulls-eye using infra-red sensor TCRT5000 to detect the line marker. ...
Article
Full-text available
Shooting exercises in Indonesia typically use simple bulls-eye targets on wooden boards with sand backstops, requiring manual setup and score calculation. This setup is inefficient, especially for long-range shooting, as operators must walk far to retrieve targets, and bullets embedded in sand are hard to recycle. This project developed an advanced shooting target featuring a bullet collector, semi-automatic target setup, automatic scoring, and target monitoring. A system with such complete features is not available in the market. This target system has a roll of bulls-eye paper and the roller is powered by a servo motor controlled by a switch to command a fresh new page of bulls-eye its positioning is helped by an infrared sensor to detect markers in the paper for correct positioning. This system is equipped with a bullet collector system by directing the bullet to a container using 45 0 angled armor and a layer of sand in the container to stop the bullet. This system is also equipped with a camera pointing to the bulls-eye paper and its output is transmitted to a monitor close to the shooter to identify bullet tracks for evaluating his shooting performance and to improve his shooting strategy. The image from the same camera is used for image processing with the OpenCV library and Python scripts to calculate the shooting score automatically. Several physical tests have been conducted and the system proves to perform reasonably well in the tests with some errors of around 3% for single bullet holes and simple multiple bullet holes. Based on test results, the pistol bullets have quite different properties from the rifle bullets. Pistol bullets follow the impact deflection with a coefficient of restitution e = 0 while rifle bullets follow the impact deflection with e ≈ 0.5. The pistol bullets are completely disintegrated after impact while the rifle bullets are just distorted. This is an open access article under the CC BY-SA license
Article
Unmanned Surface Vehicles (USVs) are commonly used as mobile docking stations for Unmanned Aerial Vehicles (UAVs) to ensure sustained operational capabilities. Conventional vision-based techniques based on horizontally-placed fiducial markers for autonomous landing are not only susceptible to interference from lighting and shadows but are also restricted by the limited Field of View (FOV) of the visual system. This study proposes a method that integrates an improved minimum snap trajectory planning algorithm with an event-triggered vision-based technique to achieve autonomous landing on a small USV. The trajectory planning algorithm ensures trajectory smoothness and controls deviations from the target flight path, enabling the UAV to approach the USV despite the visual system’s limited FOV. To avoid direct contact between the UAV and the fiducial marker while mitigating the interference from lighting and shadows on the marker, a landing platform with a vertically placed fiducial marker is designed to separate the UAV landing area from the fiducial marker detection region. Additionally, an event-triggered mechanism is used to limit excessive yaw angle adjustment of the UAV to improve its autonomous landing efficiency and stability. Experiments conducted in both terrestrial and river environments demonstrate that the UAV can successfully perform autonomous landing on a small USV in both stationary and moving scenarios.
Chapter
Full-text available
Augmented Reality (AR) allows workers to construct buildings accurately and intuitively without the need for traditional tools like 2-D drawings and rulers. However, accurately tracking worker’s pose remains a significant challenge in existing experiments due to their continuous and irregular movement. This research discusses a series of methods using cameras and algorithms to achieve the 6-DoF pose tracking function and reveal the relationship between each method and corresponding tracking accuracy in order to figure out a robust approach of AR-assisted assembly. This paper begins with a consideration of the possible limitations of existing methods including the image drift associated with visual SLAM and the time-consuming nature of fiducial markers. Next, the entire hardware and software framework was introduced, which elaborates on how the motion capture system is integrated into the AR-assisted assembly system. Then, some experiments have been carried out to demonstrate the connection between the system set up and pose tracking accuracy. This research shows the possibility to easily finish assembly task based on AR technology by integrating motion capture system.
Conference Paper
Full-text available
This paper is focused on the design of a vision based control approach for the autonomous landing task of Vertical Take-off and Landing (VTOL) Unmanned Aerial Vehicles (UAVs). Here is presented the setup of a simulated environment developed in V-REP connected to ROS, and its uses for tuning a vision based control approach. In this work, a Fuzzy control approach was proposed to command the UAV's vertical, longitudinal, lateral and orientation velocities. The UAV's pose estimation was done based on a vision algorithm and the knowledge of the landing target. Real experiments with a quadrotor landing in a moving platform are also presented.
Article
Full-text available
Squared planar markers are a popular tool for fast, accurate and robust camera localization, but its use is frequently limited to a single marker, or at most, to a small set of them for which their relative pose is known beforehand. Mapping and localization from a large set of planar markers is yet a scarcely treated problem in favour of keypoint-based approaches. However, while keypoint detectors are not robust to rapid motion, large changes in viewpoint, or significant changes in appearance, fiducial markers can be robustly detected under a wider range of conditions. This paper proposes a novel method to simultaneously solve the problems of mapping and localization from a set of squared planar markers. First, a quiver of pairwise relative marker poses is created, from which an initial pose graph is obtained. The pose graph may contain small pairwise pose errors, that when propagated, leads to large errors. Thus, we distribute the rotational and translational error along the basis cycles of the graph so as to obtain a corrected pose graph. Finally, we perform a global pose optimization by minimizing the reprojection errors of the planar markers in all observed frames. The experiments conducted show that our method performs better than Structure from Motion and visual SLAM techniques.
Article
Full-text available
During the process of design and development of an autonomous Multi-UAV System, two main problems appear. The first one is the difficulty of designing all the modules and behaviors of the aerial multi-robot system. The second one is the difficulty of having an autonomous prototype of the system for the developers that allows to test the performance of each module even in an early stage of the project. These two problems motivate this paper. A multipurpose system architecture for autonomous multi-UAV platforms is presented. This versatile system architecture can be used by the system designers as a template when developing their own systems. The proposed system architecture is general enough to be used in a wide range of applications, as demonstrated in the paper. This system architecture aims to be a reference for all designers. Additionally, to allow for the fast prototyping of autonomous multi-aerial systems, an Open Source framework based on the previously defined system architecture is introduced. It allows developers to have a flight proven multi-aerial system ready to use, so that they can test their algorithms even in an early stage of the project. The implementation of this framework, introduced in the paper with the name of “CVG Quadrotor Swarm”, which has also the advantages of being modular and compatible with different aerial platforms, can be found at https://github.com/Vision4UAV/cvg_quadrotor_swarmwith a consistent catalog of available modules. The good performance of this framework is demonstrated in the paper by choosing a basic instance of it and carrying out simulation and experimental tests whose results are summarized and discussed in this paper.
Article
Full-text available
Square-based fiducial markers are one of the most popular approaches for camera pose estimation due to its fast detection and robustness. In order to maximize their error correction capabilities, it is required to use an inner binary codification with a large inter-marker distance. This paper proposes two Mixed Integer Linear Programming (MILP) approaches to generate configurable square-based fiducial marker dictionaries maximizing their inter-marker distance. The first approach guarantees the optimal solution, however, it can only be applied to relatively small dictionaries and number of bits since the computing times are too long for many situations. The second approach is an alternative formulation to obtain suboptimal dictionaries within restricted time, achieving results that still surpass significantly the current state of the art methods.
Article
Full-text available
This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.
Conference Paper
We developed and validated a small, easy to use and cost-effective augmented marker-based hybrid navigation system for peri-acetabular osteotomy (PAO) surgery. The hybrid system consists of a tracking unit directly placed on the patient's pelvis, an augmented marker with an integrated inertial measurement unit ('MU) attached to the patient's acetabular fragment and the host computer. The tracking unit sends a live video stream of the marker to the host computer where the marker's pose is estimated. The augmented marker with the 'MU sends its pose estimate to the host computer where we apply sensor fusion to compute the final marker pose estimate. The host computer then tracks the orientation of the acetabular fragment during peri-acetabular osteotomy surgery. Anatomy registration is done using a previously developed registration device. A Kalman filter-based sensor fusion was added to complete the system. A plastic bone study was performed for validation between an optical tracking-based navigation system and our proposed system. Mean absolute difference for inclination and anteversion was 1.63 degrees and 1.55 degrees, respectively. The results show that our system is able to accurately measure the orientation of the acetabular fragment.
Article
Urban environments are becoming more and more complex because several factors as consecutive crossroads or lanes changes. These scenarios demand specific infrastructures—i.e. roundabouts, for improving traffic flow compared with traditional intersections. A roundabout removes timeouts associated with traffic lights at crossroads and trajectory conflicts among drivers. However, it is a challenging scenario for both humans and automated vehicles. This work presents a path planning method for automated vehicle driving at roundabouts. The proposed system achieves a G¹ continuous path, minimizing curvature steps to increase smoothness, dividing the driving process in three stages: entrance maneuver, driving within the roundabout and exit maneuver. Parametric equations are generated to deal with automated roundabout driving. This approach allows a real time planning considering two-lane roundabouts, taking different exits. Tests in simulated environments and on our prototype platform—Cybercar—validate the system on real urban environments, showing the proper behavior of the system.
Article
AndAR is a project applied to develop Mobile Augmented Reality (MAR) applications on the android platform. The existing registration technologies of AndAR are still base on markers assume that all frames from all videos contain the target objects. With the need of practical application, the registration based on natural features is more popular, but the major limitation of the registration is that many of them are based on low-level visual features. This paper improves AndAR by introducing the planar natural features. The key of registration based on planar natural features is to get the homography matrix which can be calculated with more than 4 pairs of matching feature points, so a 3D registration method based on ORB and optical flow is proposed in this paper. ORB is used for feature point matching and RANSAC is used to choose good matches, called inliers, from all the matches. When the ratio of inliers is more than 50% in some video frame, inliers tracking based on optical flow is used to calculate the homography matrix in the latter frames and when the number of inliers successfully tracked is less than 4, then it goes back to ORB feature point matching again. The result shows that the improved AndAR can augment not only reality based on markers but also reality based on planar natural features in near real time and the hybrid approach can not only improve speed but also extend the usable tracking range.
Article
Scale-invariant feature transform (SIFT) is an algorithm to detect and describe local features in images. In the last fifteen years, SIFT plays a very important role in multimedia content analysis, such as image classification and retrieval, because of its attractive character on invariance. This paper intends to explore a new path for SIFT research by making use of the findings from neuroscience. We propose a more efficient and compact scale-invariant feature detector and descriptor by simulating visual orientation inhomogeneity in human system. We validate that visual orientation inhomogeneity SIFT (V-SIFT) can achieve better or at least comparable performance with less computation resource and time cost in various computer vision tasks under real world conditions, such as image matching and object recognition. This work also illuminates a wider range of opportunities for integrating the inhomogeneity of visual orientation with other local position-dependent detectors and descriptors.