Generation of fiducial marker dictionaries using mixed integer linear programming
S. Garrido-Jurado¹, R. Muñoz-Salinas, F.J. Madrid-Cuevas, R. Medina-Carnicer
Computing and Numerical Analysis Department, Córdoba University, Spain
Abstract
Square-based fiducial markers are one of the most popular approaches for camera pose estimation due to their fast detection and robustness. In order to maximize their error correction capabilities, it is required to use an inner binary codification with a large inter-marker distance. This paper proposes two Mixed Integer Linear Programming (MILP) approaches to generate configurable square-based fiducial marker dictionaries maximizing their inter-marker distance. The first approach guarantees the optimal solution; however, it can only be applied to relatively small dictionaries and numbers of bits, since the computing times are too long for many situations. The second approach is an alternative formulation that obtains suboptimal dictionaries within restricted time, achieving results that still significantly surpass the current state-of-the-art methods.
Keywords: fiducial markers, MILP, mixed integer linear programming, augmented reality, computer vision.
1. Introduction
Camera pose estimation is a common problem in numerous computer vision applications such as robot navigation [1, 2] or augmented reality [3, 4, 5], and is usually based on obtaining correspondences between environment and image points. While the use of natural features, such as key points or textures [6, 7, 8, 9], is a very popular strategy that does not require altering the environment, the use of fiducial markers is still of great importance since it provides point correspondences more robustly, efficiently and precisely.
In particular, square-based fiducial markers are the most popular in the field of augmented reality [4, 10, 11] since a single marker provides the four points required to estimate the camera pose (given that the camera is properly calibrated). In general, square-based markers use an inner binary code for identification, error detection and correction.
The detection process for this type of marker can be split into two main steps. The first step is the candidate search, which consists of finding square shapes in the image that look like markers. The second step is the identification stage, where the inner codification of the candidates is analyzed in order to determine whether they really are markers and whether they belong to the considered set of valid ones, also known as the dictionary.
A key aspect of such dictionaries is the inter-marker distance [10], which is the minimum Hamming distance between the binary codes of the markers, considering the four possible rotations. This distance defines the maximum number of bits that can be corrected without producing an inter-marker confusion error, i.e. a marker being erroneously identified as a different one. As a consequence, the inter-marker distance is directly related to the error correction capabilities of a dictionary. The larger the inter-marker distance, the lower the false negative and inter-marker confusion rates and, therefore, the higher the robustness of the process.
Corresponding author
Email addresses: i52gajus@uco.es (S. Garrido-Jurado), rmsalinas@uco.es (R. Muñoz-Salinas), fjmadrid@uco.es (F.J. Madrid-Cuevas), rmedina@uco.es (R. Medina-Carnicer)
¹ Computing and Numerical Analysis Department, Edificio Einstein, Campus de Rabanales, Córdoba University, 14071, Córdoba, Spain, Tel.: (+34) 957212255
For instance, Figure 1 shows an example of an inter-marker confusion error and the importance of large inter-marker distances. The first two markers are separated by a short distance of only 1 bit, while the third marker has larger distances of at least 5 bits to the other markers. As can be seen in Figures 1d,e, a single erroneous bit is enough to cause a wrong identification of the second marker. On the other hand, the third marker is correctly identified despite having a higher number of errors.
Most related works propose their own predefined dictionary of markers with a fixed number of markers and bits, and a constant inter-marker distance. However, using a predefined dictionary for every application is not the optimal approach. Instead, if the number of required markers and their size is known, it is preferable to create a custom dictionary that maximizes the inter-marker distance and, consequently, the error detection and correction capabilities. Although this is the tendency of the latest proposals [12, 13], they rely on heuristic approaches, none of which is optimal.
This paper presents two novel dictionary generation methods based on the Mixed Integer Linear Programming (MILP) paradigm. The first MILP model proposed guarantees the optimal inter-marker distance for a specific number of markers and bits. To our knowledge, it is the first approach in the literature that assures optimal results in terms of inter-marker distance. However, since the convergence time of this model is too long for many applications, we also propose an alternative MILP formulation that converges faster and, although it does not guarantee optimality, produces results that surpass the current state of the art significantly. The two approaches presented in this paper constitute relevant improvements to the error correction capabilities of fiducial marker systems.
Preprint submitted to Pattern Recognition, October 5, 2015
[Figure 1: panels (a) to (e) for the markers m0, m1 and m2]
Figure 1: Example of inter-marker confusion error and how it is avoided with large inter-marker distances. (a) Original marker images. The marker distances are D(m0, m1) = 1, D(m0, m2) = 5 and D(m1, m2) = 6. (b) Real image with the markers placed in the environment. (c) Marker images after removing the perspective. (d) Inner bits extracted from the images in (c). Erroneous bits are highlighted in red. (e) Final identifiers assigned to each marker. It can be observed that m1 has been confused with m0. On the other hand, m2, despite a higher number of errors, can be correctly identified since its distance to the other markers is larger. (Best seen in color.)
The rest of the paper is structured as follows. Section 2 reviews related work. Section 3 presents a mathematical formalization of the problem. Section 4 details the proposed MILP models for our problem. Finally, Section 5 presents the experiments carried out, and Section 6 draws some conclusions.
2. Related work
Fiducial markers are synthetic elements placed in the working area either to facilitate the camera pose estimation task or for labeling purposes. They are specially designed to be easily detected even at low resolutions, and most applications require not one but many different markers (a dictionary). Thus, the ability to identify them uniquely is an important feature. Several fiducial marker systems have been proposed in the literature, as shown in Figure 2.
The simplest proposals are those based on fiducial points. These markers constitute a single scene point and are usually based on LEDs, retroreflective spheres or planar dots [14, 15]. Their identification is typically based on the relative positions of different points, which can be a limiting and complex process.
A direct evolution of the point markers are the circular markers [16, 17] (Fig. 2a). These markers are similar to the previous ones except that they include information (as circular sectors or concentric rings) to facilitate the identification process. Their main drawback is that they provide only one correspondence point per marker.
Other types of fiducial markers are based on blob detection. For instance, CyberCode [18] and VisualCode [19] (Fig. 2b,c) are based on the same technology as QR and Maxicode [20] codes, but provide several correspondence points. Other popular fiducial markers based on blob detection are the ReacTIVision amoeba markers [21], which are designed using genetic algorithms (Fig. 2d).
One of the most popular families of fiducial markers is the square-based one. These markers contain a black border to ease their detection and employ their inner region for identification purposes. Their main benefit is that each marker provides four prominent points (i.e. its four corners) which can be easily detected and employed as correspondence points, thus allowing camera pose estimation using a single marker.
In this category, one of the most popular systems is ARToolKit [4], an open source project which has been extensively used in the last decade, especially in the academic community. ARToolKit markers include a pattern in their inner region for identification which can be customized by the user (Fig. 2e). Despite its popularity, it presents some drawbacks. Firstly, it uses a template matching strategy to identify the markers, which produces a high false positive rate [22]. Secondly, the square detection is based on global thresholding, which makes it highly sensitive to the lighting conditions.
Instead of using template matching, the majority of square-based marker systems employ a binary codification in their inner region [23, 10, 11]. Most approaches use a codification based on classic methods of signal coding, such as CRC codes [24], achieving a more robust identification and facilitating the error detection and correction processes.
[Figure 2 panels: (a) Intersense, (b) CyberCode, (c) VisualCode, (d) ReacTIVision, (e) ARToolKit, (f) Matrix, (g) ARTag, (h) ARToolKit Plus, (i) AprilTags, (j) ArUco]
Figure 2: Examples of fiducial markers proposed in previous works.
Matrix [23] is one of the first and simplest proposals; it uses a binary code with redundant bits for error detection (Fig. 2f). ARTag [10] (Fig. 2g) is based on the same idea but employs a more robust codification. Furthermore, it improves the square detection by using an edge-based method instead of the global thresholding of ARToolKit. ARTag provides its marker dictionary in a specific order so that the inter-marker distance is maximized. Its main drawbacks are that the marker size is fixed to 6 × 6 bits and that the error correction process can correct up to one bit, independently of the inter-marker distance of the marker subset used.
ARToolKit Plus [11] (Fig. 2h) improves some of the features of its predecessor ARToolKit. Firstly, it employs a dynamic method to update the global threshold depending on the pixel values in the previously detected markers. Secondly, as in ARTag, it provides a binary codification for marker identification. The first ARToolKit Plus version includes 512 markers whose codification is based on repeating a 9-bit identifier four times, achieving a minimum distance of four bits between any pair of markers. The last known version, instead, proposes a dictionary of 4096 markers with 6 × 6 bits based on a BCH codification [25] that presents a minimum distance of two bits, so that error correction is not possible. The ARToolKit Plus project was halted and followed by the Studierstube Tracker [26] project, which is not publicly available.
The main problem of codifications based on classic coding techniques is that they need to deal with the different marker rotations, which negatively affects the inter-marker distances of the generated dictionaries. Some approaches, such as QR and Maxicode [20], employ special anchor points to remove the rotation ambiguity. However, this complicates the detection process and, more importantly, these anchor points are vulnerable to errors since they are not protected by any coding system.
Most recent approaches rely on heuristics to select a set of markers with large inter-marker distances, considering rotation. However, since the search space is very large even for small dictionaries with a low number of bits, an exhaustive search is unfeasible and optimality cannot be guaranteed.
One of the first and simplest proposals is the BinARyID system [27], whose dictionary generation process is based on selecting those markers that keep a minimum Hamming distance of one to any of the previously selected markers, so that rotation ambiguities are avoided. The markers are analyzed in ascending order until the desired number of markers is reached. Its main problems are that it does not allow error detection and correction (since the distance is one) and that the generation times can be prohibitive for large dictionaries.
The generation method proposed by the AprilTags library [13] (Fig. 2i) employs an approach similar to that of BinARyID, but with some significant improvements. Firstly, the minimum Hamming distance can be provided by the user, generating dictionaries with larger inter-marker distances and, hence, allowing error detection and correction. Secondly, instead of analyzing markers one by one in ascending order, larger increments are performed based on a heuristic approach, so that markers with larger inter-marker distances are found faster. Finally, selected markers also need to satisfy a minimum geometric complexity to increase the number of bit transitions. Its main downside is that the generation time is still very long, especially for large marker sizes.
In [12], the ArUco coding system is presented (Fig. 2j). Its generation is based on maximizing both the inter-marker distance and the number of bit transitions. Contrary to AprilTags, the minimum inter-marker distance does not need to be provided by the user; instead, it is automatically derived during the generation process. However, it has two main downsides. Firstly, it uses a time-consuming stochastic search strategy. Secondly, as the marker size increases, its memory requirements grow exponentially, limiting the maximum size to which it can be applied. Like ARTag, ArUco sorts the generated markers in a list so as to maximize the inter-marker distance.
This paper proposes two novel approaches to generate square-based fiducial marker dictionaries based on Mixed Integer Linear Programming (MILP) [28]. MILP methods can achieve the optimal results of a mathematical model represented by linear relationships in which some unknowns are constrained to be integers. MILP problems receive special attention from the community since they fit many real-life situations, such as embedded system design [29], industrial processes [30], automatic scheduling [31], distribution systems [32] or trajectory planning [33]. However, these kinds of problems are known to be NP-hard and, as a consequence, there have been many efforts to develop techniques that speed up the convergence process.
In contrast to previous works, our first proposed method achieves the optimal dictionary in terms of inter-marker distance for any number of markers and bits. To our knowledge, it is the first approach in the literature that guarantees the optimal inter-marker distances. However, it suffers from the curse of dimensionality, and the computing times become too long as the size of the dictionaries or the number of bits increases. Thus, we propose a second approach that achieves suboptimal dictionaries within restricted computing times and compares very favorably to the dictionaries obtained by previous works.
As shown in the experimental section, our methods obtain state-of-the-art dictionaries (in terms of inter-marker distances), which are a relevant improvement to the error correction capabilities of square fiducial marker systems.
3. Problem formulation
The most relevant aspects to consider during the design of a marker dictionary are the false positive rate, the false negative rate and the inter-marker confusion rate. The first two are usually handled using error detection and correction techniques. On the other hand, the inter-marker confusion rate depends only on the distance among the dictionary markers. If the distance between two markers is short, a marker could be confused with another one with just a few bit modifications, and the error could not be detected. As a consequence, this value also affects the maximum number of erroneous bits that can be corrected.
Let us denote by $\mathbb{D}$ the set of all possible markers of $n \times n$ bits and by $\mathbb{D}^d$ the set of all vectors formed by exactly $d$ markers of $\mathbb{D}$. An element $D = (m^1, m^2, m^3, \ldots, m^d)$ of the set $\mathbb{D}^d$ is named a dictionary, where the super-index indicates the marker position in the dictionary.
An automatic dictionary generation process consists of selecting a dictionary $D$ from the set $\mathbb{D}^d$ so that each $m^i$, $i \in \{1, \ldots, d\}$, is as far from each $m^j$, $j \in \{1, \ldots, d\}$, $j \neq i$, as possible. In general, the problem is to find the dictionary $D^*$ that maximizes the desired criterion $\tau(D)$:
$$D^* = \operatorname*{argmax}_{D \in \mathbb{D}^d} \{\tau(D)\}. \tag{1}$$
The criterion $\tau(D)$ employed in this work is the dictionary inter-marker distance, which is the minimum Hamming distance between any two markers of $D$. This value is of great importance since it indicates the maximum number of erroneous bits that can be corrected: $\lfloor (\tau(D) - 1)/2 \rfloor$. If the number of erroneous bits of a marker is lower than or equal to this value, it can be guaranteed that the closest marker in the dictionary is the correct one. However, if the number of erroneous bits exceeds this value, the assumption does not hold and, hence, the error correction cannot be performed reliably because the closest marker might not be the correct one (inter-marker confusion mistake). Thus, the objective of a dictionary generation method is to maximize the function $\tau(D)$.
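The correction bound above can be illustrated with a minimal Python sketch. The names `hamming`, `correctable_bits` and `identify`, as well as the two 6-bit codes, are hypothetical and chosen only for illustration; for brevity the sketch ignores marker rotations and treats `tau` as the minimum plain Hamming distance of the dictionary.

```python
from typing import Optional, Sequence, Tuple

Marker = Tuple[int, ...]


def hamming(a: Marker, b: Marker) -> int:
    """Number of positions in which two equally sized bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def correctable_bits(tau: int) -> int:
    """Largest number of bit errors that can always be corrected: floor((tau - 1) / 2)."""
    return max(0, (tau - 1) // 2)


def identify(bits: Marker, dictionary: Sequence[Marker], tau: int) -> Optional[int]:
    """Return the index of the closest marker, or None if it is too far away.

    Assumption of this sketch: rotations are ignored and `tau` is the minimum
    plain Hamming distance between the dictionary markers.
    """
    distances = [hamming(bits, m) for m in dictionary]
    best = min(range(len(dictionary)), key=distances.__getitem__)
    return best if distances[best] <= correctable_bits(tau) else None


# Two hypothetical 6-bit codes at Hamming distance 5, so up to 2 errors are correctable.
codes = [(0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 1, 0)]
two_errors = identify((1, 1, 0, 0, 0, 0), codes, tau=5)    # still decoded as marker 0
three_errors = identify((1, 1, 1, 0, 0, 0), codes, tau=5)  # now closer to marker 1
```

With three erroneous bits the observation derived from marker 0 lies within the correction radius of marker 1, which is the inter-marker confusion case described in the text.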
[Figure 3 layout, bits listed row by row:]
m:      m1 m2 m3 / m4 m5 m6 / m7 m8 m9
R1(m):  m7 m4 m1 / m8 m5 m2 / m9 m6 m3
R2(m):  m9 m8 m7 / m6 m5 m4 / m3 m2 m1
R3(m):  m3 m6 m9 / m2 m5 m8 / m1 m4 m7
Figure 3: Example of 90-degree rotations of a marker, m, composed of 3 × 3 bits. The marker m is represented by its bits in row-major order (m1, ..., m9). As can be observed, after one rotation the bits are permuted and the obtained marker, R1(m), is represented by (m7, m4, m1, m8, m5, m2, m9, m6, m3). The same process repeats for the remaining rotations.
Since an exhaustive evaluation of the entire search space
is not feasible, a MILP model to find the optimal solution
is proposed in this work.
Let us start by defining the marker named $i$, i.e. a component of a dictionary $D$, as a binary matrix $m^i$ of size $n \times n$:
$$m^i = (m^i_1, m^i_2, m^i_3, \ldots, m^i_{n \times n}) \mid m^i_k \in \{0, 1\}, \tag{2}$$
where $m^i_k$ denotes the $k$-th bit of the marker matrix $m^i$ assuming a row-major order.
Note that a marker detected in an image can be rotated with respect to its original position and, thus, the marker bits will also be rotated. As shown in Fig. 3, a marker rotation can be formulated as a permutation of the marker bits. For a marker $m^i$, let us define its analogous set, $A(m^i)$, as the set of the markers obtained after the three possible rotations plus the marker itself:
$$A(m^i) = \bigcup_{l=0}^{3} R_l(m^i), \tag{3}$$
where $R_l$ is an operator that rotates the marker bits $l \times 90$ degrees in the clockwise direction. In fact, all the markers in an analogous set can be considered equivalent solutions of the search space.
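As an illustration of the rotation operator and the analogous set, the following Python sketch applies the clockwise rotation as an index permutation on a row-major tuple. The function names `rotate_cw` and `analogous_set` are hypothetical; integer labels are used instead of bits so that the permutation can be compared with Fig. 3.

```python
from typing import List, Tuple

Marker = Tuple[int, ...]


def rotate_cw(marker: Marker, n: int) -> Marker:
    """Rotate a row-major n x n marker by 90 degrees clockwise."""
    # Cell (r, c) of the rotated marker receives the value stored at (n - 1 - c, r).
    return tuple(marker[(n - 1 - c) * n + r] for r in range(n) for c in range(n))


def analogous_set(marker: Marker, n: int) -> List[Marker]:
    """Return [R0(m), R1(m), R2(m), R3(m)]: the marker followed by its three rotations."""
    result = [marker]
    for _ in range(3):
        result.append(rotate_cw(result[-1], n))
    return result


# Using the labels 1..9 instead of bits makes the permutation visible.
labels = tuple(range(1, 10))
r0, r1, r2, r3 = analogous_set(labels, 3)
print(r1)  # (7, 4, 1, 8, 5, 2, 9, 6, 3)
```

The list returned by `analogous_set` may contain repeated entries when a marker is rotation-symmetric; a set could be used instead if only distinct elements are needed.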
Since our goal is to obtain a dictionary that maximizes the inter-marker distance, the distance between two markers $m^i$, $m^j$, with $i, j \in \{1, \ldots, d\}$, $i \neq j$, must be properly defined. Considering that all the elements in an analogous set are equivalent, the distance between markers from different analogous sets is defined as:
$$D(m^i, m^j) = \min_{m^k \in A(m^j)} \{H(m^i, m^k)\}, \tag{4}$$
where the function $H$ is the Hamming distance between two markers.
Furthermore, since we want to obtain the camera pose with respect to the marker, its corners must be identified unequivocally. Therefore, the distance of a marker to the rest of the elements in its analogous set must be considered too. This distance, referred to as the marker self-distance, is defined as:
$$S(m^i) = \min_{m^k \in A'(m^i)} \{H(m^i, m^k)\}, \tag{5}$$
where $A'(m^i)$ is the analogous set of $m^i$ without the marker itself:
$$A'(m^i) = A(m^i) - \{m^i\}. \tag{6}$$
In the end, the objective function $\tau(D)$ in Eq. 1 is the minimum among the marker self-distances and the distances between any pair of markers in the dictionary:
$$\tau(D) = \min \left\{ \min_{m^i \in D} S(m^i), \; \min_{\substack{m^i, m^j \in D \\ m^i \neq m^j}} D(m^i, m^j) \right\}. \tag{7}$$
This function represents the inter-marker distance of a dictionary, and the goal of the dictionary generation methods proposed in this work is to maximize it.
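The definitions in Eqs. 4 to 7 can be evaluated directly for a given dictionary. The sketch below is one possible Python rendering with hypothetical function names. It makes one assumption that the text does not state: the self-distance always compares a marker with its three proper rotations, so a rotation-symmetric marker obtains a self-distance of 0.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Marker = Tuple[int, ...]


def rotations(marker: Marker, n: int) -> List[Marker]:
    """The marker followed by its 90, 180 and 270 degree clockwise rotations."""
    out = [marker]
    for _ in range(3):
        m = out[-1]
        out.append(tuple(m[(n - 1 - c) * n + r] for r in range(n) for c in range(n)))
    return out


def hamming(a: Marker, b: Marker) -> int:
    return sum(x != y for x, y in zip(a, b))


def marker_distance(a: Marker, b: Marker, n: int) -> int:
    """Eq. 4: minimum Hamming distance from a to any rotation of b."""
    return min(hamming(a, r) for r in rotations(b, n))


def self_distance(m: Marker, n: int) -> int:
    """Eq. 5, assuming the three proper rotations are always compared."""
    return min(hamming(m, r) for r in rotations(m, n)[1:])


def tau(dictionary: Sequence[Marker], n: int) -> int:
    """Eq. 7: minimum over all self-distances and all pairwise distances."""
    values = [self_distance(m, n) for m in dictionary]
    values += [marker_distance(a, b, n) for a, b in combinations(dictionary, 2)]
    return min(values)


# Hypothetical 2 x 2 dictionary in row-major order.
example = [(1, 0, 0, 0), (1, 1, 0, 1)]
print(tau(example, 2))  # 2
```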
Figure 1 shows a marker detection example which illustrates the benefits of using dictionaries with large inter-marker distances. A dictionary composed of 3 markers of 6 × 6 bits is shown in Fig. 1a,b. While the distance between m0 and m1 is only one bit, the distances to m2 are larger: D(m0, m2) = 5 and D(m1, m2) = 6. The markers have been detected using a standard marker detection process like the one in [12]. Figure 1c shows the images obtained after removing the perspective distortion of each marker and Figure 1d shows the bits extracted from each image. It can be observed that the images in Figure 1c contain a significant amount of noise, which usually produces errors during the bit extraction process. Erroneous bits are highlighted in red in Figure 1d. Finally, Figure 1e shows the identifiers assigned to each of the markers applying a maximum error correction of 2 bits. The marker m1 has been erroneously identified as m0 due to the erroneously extracted bit and the short marker distance. Note that a single error is enough to make m1 identical to m0. On the other hand, m2 has been correctly identified despite presenting two erroneous bits. Due to the large marker distance, it is less likely that m2 gets erroneously identified during the identification step.
3.1. Maximum inter-marker distance: $\tau^n_{\max}$
In order to reduce the search space and accelerate the MILP convergence, it is of great importance to know the maximum possible value of the objective function $\tau(D)$. Let us denote by $\tau^n_{\max}$ the maximum possible inter-marker distance of a dictionary with markers of $n \times n$ bits. In this section, the derivation of this value is presented, as already done in [12].
The simplest possible dictionary is composed of a single marker. In that case, the marker self-distance constitutes the inter-marker distance of the dictionary. If a second marker is added to the dictionary, the new inter-marker distance will be smaller than or equal to the previous one. As a consequence, the maximum theoretical value $\tau^n_{\max}$ is given only by the self-distance (Eq. 5) of the first marker.
Figure 4: Example of a quartet in a 4 × 4 bits marker.

Group  Quartets                                          Hamming distances
                                                         90 deg   180 deg   270 deg
Q1     0000, 1111                                        0        0         0
Q2     1000, 0100, 0010, 0001, 1110, 0111, 1011, 1101    2        2         2
Q3     1100, 0110, 0011, 1001                            2        4         2
Q4     0101, 1010                                        4        0         4
Table 1: Quartet groups and the Hamming distances they provide in each rotation.

Quartet            Group   Hamming distances
                           90 degrees   180 degrees   270 degrees
1                  Q3      2            4             2
2                  Q3      2            4             2
3                  Q4      4            0             4
4                  Q3      2            4             2
Total distances            10           12            10
$\tau^4_{\max}$            min(10, 12, 10) = 10
Table 2: Quartet assignment for a 4 × 4 marker (C = 4) to obtain $\tau^4_{\max} = 10$. It can be observed that the sequence {Q3, Q3, Q4} is repeated until all the quartets in the marker are filled.
The key to understanding the derivation of $\tau^n_{\max}$ is the concept of quartet, which is the set of four bits that interchange their positions at each rotation (see Figure 4). As can be observed, these four bits do not interact with the rest of the bits when the marker is rotated. Hence, a quartet contributes to the marker self-distance independently of the rest of the quartets. In general, the number of quartets of a marker is $C = \lfloor n^2/4 \rfloor$. If $n$ is odd, the central bit constitutes a quartet by itself that can be ignored since it does not influence the self-distance.
Since a quartet is composed of 4 bits, there is a total of 16 different possible quartets. A brief study shows that some of the quartets provide the same Hamming distances in each rotation. For the purpose of calculating $\tau^n_{\max}$, these quartets can be considered equivalent and grouped into the same quartet group, $Q_i$. Table 1 shows the 4 different quartet groups and the Hamming distances they provide in each rotation.
For instance, the quartet 1100 contributes Hamming distances (2, 4, 2) as it rotates:
H(1100, 0110) = 2; H(1100, 0011) = 4; H(1100, 1001) = 2,
and quartet 1001, which belongs to the same quartet group, contributes the same Hamming distances:
H(1001, 1100) = 2; H(1001, 0110) = 4; H(1001, 0011) = 2.
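The grouping in Table 1 can be reproduced by enumerating the 16 quartets. The short Python sketch below assumes that the four bits of a quartet are written in rotation order, so that a 90-degree rotation is a cyclic shift by one position, as in the example 1100, 0110, 0011, 1001. The name `rotation_profile` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def rotation_profile(quartet):
    """Hamming distances between a quartet and its 90, 180 and 270 degree rotations.

    The four bits are listed in rotation order, so a 90 degree rotation is a
    cyclic shift by one position (1100 -> 0110 -> 0011 -> 1001).
    """
    return tuple(
        sum(a != b for a, b in zip(quartet, quartet[-s:] + quartet[:-s]))
        for s in (1, 2, 3)
    )


# Group the 16 quartets by the distances they contribute in each rotation.
groups = defaultdict(list)
for quartet in product((0, 1), repeat=4):
    groups[rotation_profile(quartet)].append("".join(map(str, quartet)))

for profile, members in sorted(groups.items()):
    print(profile, members)
```

The four distinct profiles obtained correspond to the groups Q1 to Q4 of Table 1.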
Hence, the problem of obtaining the maximum marker self-distance consists of assigning each quartet to a quartet group so that the minimum marker distance over the three rotations is maximized. If this is understood as a multiobjective problem, the Pareto front is composed of the quartet groups $Q_3$ and $Q_4$, since they dominate all the other solutions. Thus, the problem reduces to assigning each quartet to one of the two quartet groups $Q_3$ and $Q_4$. Then, it can be easily deduced that the maximum value $\tau^n_{\max}$ is obtained by assigning the groups $\{Q_3, Q_3, Q_4\}$ (in this order) repeatedly until all the quartets of a marker are completed.
For instance, for a 4 × 4 marker ($C = 4$), $\tau^4_{\max}$ is obtained by assigning the groups $\{Q_3, Q_3, Q_4, Q_3\}$, as shown in Table 2.
In general, the maximum inter-marker distance is calculated as:
$$\tau^n_{\max} = 2 \left\lfloor \frac{4C}{3} \right\rfloor. \tag{8}$$
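As a sanity check of Eq. 8, the closed form can be compared with an exhaustive enumeration of quartet group assignments for small marker sizes. The sketch below is only an illustration: the function names are hypothetical, and the per-group distances are copied from Table 1.

```python
from itertools import combinations_with_replacement

# Hamming distances contributed by each quartet group at 90, 180 and 270 degrees (Table 1).
GROUP_PROFILES = {"Q1": (0, 0, 0), "Q2": (2, 2, 2), "Q3": (2, 4, 2), "Q4": (4, 0, 4)}


def tau_max(n: int) -> int:
    """Eq. 8 with C = floor(n^2 / 4) quartets."""
    c = (n * n) // 4
    return 2 * ((4 * c) // 3)


def tau_max_by_enumeration(n: int) -> int:
    """Best achievable self-distance over every multiset of quartet groups."""
    c = (n * n) // 4
    best = 0
    for assignment in combinations_with_replacement(GROUP_PROFILES.values(), c):
        totals = [sum(profile[r] for profile in assignment) for r in range(3)]
        best = max(best, min(totals))
    return best


for n in range(2, 9):
    print(n, tau_max(n), tau_max_by_enumeration(n))
```

For n = 4 both functions return 10, in agreement with Table 2.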
4. Proposed solutions
This section presents our proposals to generate marker
dictionaries using MILP. First, a short introduction to MILP
is given. Then, our first model is presented, which obtains
optimal solutions. Finally, our second model, which obtains
suboptimal solutions within restricted time is presented.
4.1. Mixed Integer Linear Programming (MILP)
An integer linear programming (ILP) problem is a mathematical optimization or feasibility program in which some or all of the decision variables are restricted to be integers and the objective function and the constraints are linear. The canonical form of an integer linear program is:
$$\begin{aligned} \text{maximize} \quad & c^T x \\ \text{subject to} \quad & A x \leq b \\ & x \geq 0 \\ & x \in \mathbb{Z}, \end{aligned} \tag{9}$$
where $x$ is the vector of decision variables, $c$ is the coefficient vector of the objective function, $A$ is the coefficient matrix of the constraints and $b$ is the vector of constant terms. The last constraint forces the decision variables to be integers, although in practice this constraint can be applied to some or all of the variables. If all variables are constrained to be integers, the problem is known as a Pure Integer Linear Programming (PILP) problem; otherwise, it is known as a Mixed Integer Linear Programming (MILP) problem.
Whereas Linear Programming problems belong to complexity class P [34], which means they are efficiently solvable, ILP and MILP problems are known to be NP-hard due to the integer restriction and, thus, no polynomial-time algorithm is known for solving them [35]. In fact, the particular case where the decision variables are binary is one of the problems in the well-known list of Karp's 21 NP-complete problems [36]. In practice, this also means that the computing time required by a given instance is hard to predict, either analytically or experimentally.
However, these kinds of problems have been extensively studied in the literature and many techniques have been proposed in order to obtain the optimal solution efficiently. The main techniques are based on the Branch and Cut method [37], which is an iterative process that combines the Branch and Bound [38] algorithm and the use of cutting planes [39].
Branch and Bound is an optimization algorithm based on a search tree which is explored by partitioning the search space at each node. It is composed of a branching step and a bounding step.
During the branching step, a node is split into several branches by dividing the possible values of a specific variable, so that the union of all the branches covers all the possibilities. In the MILP procedure, this is performed by splitting the different values that a decision variable can take.
The bounding step determines the lower and upper bounds of the optimization function in a particular branch. If these bounds cannot surpass the current best solution, the branch is pruned, reducing the search space. In the MILP case, non-integral solutions to LP relaxations, i.e. the problem without considering the integer constraints, serve as upper bounds and integral solutions serve as lower bounds.
Finally, cutting planes can be applied during the optimization process to further reduce the search space. Cutting planes generate new restrictions for the model that are satisfied by any feasible solution, i.e. any integer solution, but violated by the current solution of the LP relaxation, so that the next non-integer optimal solution should be closer to the integer one.
The Branch and Cut algorithm explores the tree until it finds the optimal solution; nevertheless, many feasible non-optimal solutions can also be found during the process.
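The branching and bounding steps can be illustrated on a toy problem that is unrelated to marker dictionaries. The following Python sketch applies Branch and Bound to a 0/1 knapsack problem, chosen here only because its LP relaxation can be solved with a simple greedy fill; it does not include cutting planes and is not the procedure implemented by MILP solvers. All names are hypothetical, and item weights are assumed to be positive.

```python
def lp_relaxation_bound(values, weights, capacity, start, value):
    """Upper bound: optimum of the LP relaxation for the undecided items.

    Items are assumed sorted by decreasing value/weight ratio, in which case
    filling greedily and taking a fraction of the first item that does not
    fit solves the relaxed (fractional) knapsack exactly.
    """
    bound = float(value)
    for v, w in zip(values[start:], weights[start:]):
        if w <= capacity:
            capacity -= w
            bound += v
        else:
            bound += v * capacity / w
            break
    return bound


def knapsack_branch_and_bound(values, weights, capacity):
    """Maximize the total value of 0/1 item choices under a weight capacity."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    vals = [values[i] for i in order]
    wts = [weights[i] for i in order]
    best = 0  # value of the best integer solution found so far (lower bound)
    stack = [(0, 0, capacity)]  # node = (next undecided item, value so far, remaining capacity)
    while stack:
        index, value, remaining = stack.pop()
        best = max(best, value)  # leaving the undecided items out is always feasible
        if index == len(vals):
            continue
        if lp_relaxation_bound(vals, wts, remaining, index, value) <= best:
            continue  # bounding step: this branch cannot beat the incumbent
        stack.append((index + 1, value, remaining))  # branch x_index = 0
        if wts[index] <= remaining:
            stack.append((index + 1, value + vals[index], remaining - wts[index]))  # branch x_index = 1
    return best


print(knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # 220
```

As in the MILP case described above, the relaxed solution provides the upper bound of a branch, while every integer solution encountered updates the lower bound used for pruning.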
4.2. Optimal Dictionary
In order to obtain dictionaries with optimal inter-marker distances, we propose a MILP model to obtain the maximum value of the cost function $\tau(D)$. This model is processed by a MILP solver so that the Branch and Cut algorithm is applied to reduce the search space and speed up the convergence to the optimum.
The decision variables of the proposed model are the bits $m^i_j$ of the dictionary markers, i.e., one binary variable per bit. In order to formulate the MILP problem, let us rewrite the objective function in Eq. 7 as:
$$\tau(D) = \min \left\{ \min_{\substack{m^i \in D \\ m^k \in A'(m^i)}} \{H(m^i, m^k)\}, \; \min_{\substack{m^i, m^j \in D \\ m^i \neq m^j \\ m^k \in A(m^j)}} \{H(m^i, m^k)\} \right\}. \tag{10}$$
Thus, our goal can be stated as maximizing the minimum of a set of Hamming distances, some of which are self-distances and the others are distances between pairs of markers. The Hamming distance between two markers can be expressed as:
$$H(m^i, m^j) = \sum_{k=1}^{n \times n} m^i_k \oplus m^j_k, \tag{11}$$
where $\oplus$ is the exclusive-or operator. Since this is a non-linear operation, it must be reformulated as a linear one in order to be represented in a MILP model. This is accomplished by introducing, for each exclusive-or operation, a new auxiliary binary decision variable, $\delta$, and the following set of constraints:
$$\begin{aligned} m^i_k \oplus m^j_k &= m^i_k + m^j_k - 2\delta \\ \delta &\leq m^i_k \\ \delta &\leq m^j_k \\ \delta &\geq m^i_k + m^j_k - 1. \end{aligned} \tag{12}$$
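The linearization in Eq. 12 can be checked by enumerating the four combinations of two bits. The following Python sketch (with hypothetical function names) lists the values of the auxiliary variable allowed by the three inequalities and evaluates the resulting linear expression.

```python
from itertools import product


def feasible_deltas(a: int, b: int):
    """Values of the auxiliary binary variable allowed by the inequalities in Eq. 12."""
    return [d for d in (0, 1) if d <= a and d <= b and d >= a + b - 1]


def linear_xor(a: int, b: int) -> int:
    """Evaluate a XOR b through the linear expression a + b - 2 * delta."""
    (delta,) = feasible_deltas(a, b)  # exactly one feasible value is expected
    return a + b - 2 * delta


for a, b in product((0, 1), repeat=2):
    print(a, b, feasible_deltas(a, b), linear_xor(a, b))
```

For every pair of bits, the inequalities leave a single feasible value for the auxiliary variable, equal to the logical AND of the two bits, and the linear expression then matches the exclusive-or.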
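Finally, since the objective function is the minimum of a set of values, a new auxiliary decision variable, $\tau$, is added to represent the minimum of all the Hamming distances (Eq. 10). The proposed problem formulation is then defined as:
$$\begin{aligned}
& \text{maximize } \tau \\
& \text{subject to, } \forall m^i \in D, \\
& \text{(I)} \quad H(m^i, m^k) - \tau \geq 0 \quad \forall m^k \in A(m^j), \; \forall m^j \in D, \; j \neq i \\
& \text{(II)} \quad H(m^i, m^k) - \tau \geq 0 \quad \forall m^k \in A'(m^i) \\
& \text{(III)} \quad 0 \leq \tau \leq \tau^n_{\max} \\
& \text{(IV)} \quad I(m^i) - I(m^k) \geq 0 \quad \forall m^k \in A'(m^i) \\
& \text{(V)} \quad I(m^i) - I(m^{i+1}) < 0 \quad \forall m^{i+1} \in D \\
& \text{(VI)} \quad \sum_{m^j \in D} \sum_{k=1}^{n \times n} m^j_k \geq \frac{d \, n^2}{2},
\end{aligned} \tag{13}$$
so that the minimum distance $\tau$ is maximized, ensuring that every Hamming distance is greater than or equal to this value. Note that after the optimization process, the variable $\tau$ will contain the value of $\tau(D)$.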
Constraint (I) guarantees that the distance between any
pair of markers in the dictionary is greater than or equal
to τ, while constraint (II) guarantees the same condition
for all the self-distances. Note that the previous model is
a simplified version since each Hamming distance is repre-
sented by a sum of exclusive-or operations (which include
the decision variables associated to the marker bits, see Eq.
11). Furthermore, each exclusive-or operation requires the
addition of the inequalities in Eq. 12 and the auxiliary
variables $\delta$. We have decided not to represent all this auxiliary
information in Eq. 13 for the sake of clarity.
Constraint (III) defines an upper bound for the value of
$\tau$. This bound is the maximum inter-marker distance $\tau^n_{\max}$,
which is theoretically obtained in Sec. 3.1. However, if an
optimal dictionary with fewer markers has been previously
generated, its optimal objective value can be employed as
the upper bound, since it is not possible to surpass the optimal
solution of a dictionary with fewer markers. This modification
accelerates the convergence process.
Finally, in order to reduce the search space, constraints
(IV), (V) and (VI) are added to remove symmetric solutions.
Note that for an optimal solution of the model, another
optimal solution can be obtained by just rotating any of the
markers (i.e., selecting another marker from its analogous set
$\mathcal{A}(m^i)$). To avoid this, only the markers with the highest
encoded value in the analogous sets are considered. This
is achieved by constraint (IV), where $I(m^i)$ represents the
number encoded by the marker bits:
$$
I(m^i) = \sum_{k=1}^{n \times n} 2^{k-1} m^i_k. \tag{14}
$$
Another symmetry arises from the fact that a dictionary is
a vector of markers, thus, permutations of its elements lead
to equivalent solutions. Constraint (V) is added to avoid
this symmetry by forcing a strict ascending order of the
markers.
Finally, an optimal solution can be converted into another
one by just inverting all its bits. To avoid this symmetry,
constraint (VI) requires the total number of ones to be
higher than or equal to the total number of zeros. Thus, only
one of the two opposite solutions is valid. The parameter $d$
is the cardinality of the dictionary.
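The three symmetry-breaking constraints can be illustrated with a short sketch. It assumes that the analogous set of a marker contains the marker and its three 90° rotations, as the text's reference to "rotating any of the markers" suggests, and that markers are stored as row-major tuples of bits. All function names are illustrative and do not belong to any library.

```python
from typing import List, Tuple

Marker = Tuple[int, ...]  # n*n bits in row-major order (assumed layout)


def rotate90(m: Marker, n: int) -> Marker:
    """Rotate an n x n marker by 90 degrees clockwise."""
    return tuple(m[(n - 1 - c) * n + r] for r in range(n) for c in range(n))


def rotations(m: Marker, n: int) -> List[Marker]:
    """Assumed analogous set: the marker followed by its three rotations."""
    out = [m]
    for _ in range(3):
        out.append(rotate90(out[-1], n))
    return out


def encoded_value(m: Marker) -> int:
    """Number encoded by the marker bits (Eq. 14), with bit k weighted 2^(k-1)."""
    return sum(bit << k for k, bit in enumerate(m))


def satisfies_iv(m: Marker, n: int) -> bool:
    """Constraint (IV): m has the highest encoded value among its rotations."""
    return all(encoded_value(m) >= encoded_value(r) for r in rotations(m, n)[1:])


def satisfies_v(dictionary: List[Marker]) -> bool:
    """Constraint (V): encoded values appear in strict ascending order."""
    values = [encoded_value(m) for m in dictionary]
    return all(a < b for a, b in zip(values, values[1:]))


def satisfies_vi(dictionary: List[Marker]) -> bool:
    """Constraint (VI): at least as many ones as zeros in the whole dictionary."""
    ones = sum(sum(m) for m in dictionary)
    total = sum(len(m) for m in dictionary)
    return 2 * ones >= total
```

For a 2 × 2 marker with a single active bit, for example, only the rotation that places the bit in the last position satisfies constraint (IV), so the other three equivalent rotations are discarded from the search.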
4.3. Suboptimal Dictionary
The previous model achieves the optimal inter-marker dis-
tance results. However, as shown in Sec. 5, the convergence
times are too long despite the efforts made to reduce the
search space. As a consequence, it can only be applied to
generate dictionaries with a relatively small number of markers
and bits. In this section, an alternative formulation that
obtains suboptimal results in much less time is proposed.
In this case, instead of generating all the markers in a
single MILP optimization step, an iterative method, where
markers are generated incrementally, is proposed. At each
iteration, $t$, a new MILP model is defined to generate the
next marker, $m^t$, which maximizes its distance to all the
previously generated markers and its self-distance. As in the
previous case, the decision variables of this model are the
marker bits. Then, the objective function at the $t$-th
iteration is defined as:
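$$
\begin{aligned}
\tau(D_t) &= \min \left\{ S(m^t),\; \min_{m^i \in D_{t-1}} \{ D(m^t, m^i) \} \right\} \\
&= \min \left\{
\min_{m^k \in \mathcal{A}'(m^t)} H(m^t, m^k),\;
\min_{\substack{m^i \in D_{t-1} \\ m^k \in \mathcal{A}(m^i)}} \{ H(m^t, m^k) \}
\right\}.
\end{aligned} \tag{15}
$$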
Once again, our goal is equivalent to the maximization of
the minimum of a set of Hamming distances, some of them
related to the new marker self-distance and the rest related
to its distance to the previous markers. The self-distances
can be expressed using the same transformation described
for the previous model (Eq. 12). However, this transforma-
tion is not required to calculate the distance between two
markers since only the bits of one of them are variables. As
a consequence, the transformation in Eq. 12 can be refor-
mulated as:
$$
m^i_k \oplus m^j_k =
\begin{cases}
m^i_k, & \text{if } m^j_k = 0 \\
1 - m^i_k, & \text{otherwise},
\end{cases} \tag{16}
$$
where $m^i_k$ is an unknown bit represented by a decision
variable and $m^j_k$ is a bit from a marker in $D_{t-1}$.
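A consequence of Eq. 16 is that the Hamming distance between the unknown marker and a fixed one is an affine function of the unknown bits, so no auxiliary variables are needed. The sketch below (the helper `linear_hamming` is a hypothetical name) builds the constant term and the coefficients for a given fixed marker and verifies the identity by enumeration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def linear_hamming(fixed: Sequence[int]) -> Tuple[int, List[int]]:
    """Return (constant, coeffs) with H(m, fixed) = constant + sum(coeffs[k] * m[k])."""
    # Every fixed bit equal to 1 contributes the term (1 - m_k): constant 1, coefficient -1.
    # Every fixed bit equal to 0 contributes the term m_k: constant 0, coefficient +1.
    constant = sum(fixed)
    coeffs = [1 - 2 * bit for bit in fixed]
    return constant, coeffs


fixed = (1, 0, 1, 1)
constant, coeffs = linear_hamming(fixed)

# Compare against the direct bit-by-bit Hamming distance for every possible marker.
for m in product((0, 1), repeat=len(fixed)):
    direct = sum(a != b for a, b in zip(m, fixed))
    assert direct == constant + sum(c * x for c, x in zip(coeffs, m))
```

Only the self-distance terms, where both operands are decision variables, still require the auxiliary variables of Eq. 12.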
The proposed model is similar to the previous one,
including the $\tau$ variable which represents the $\tau(D)$ value.
However, in this case the number of involved Hamming
distances is smaller and so is the total number of constraints.
The proposed MILP formulation is as follows:
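$$
\begin{array}{lll}
\text{maximize} & \tau & \\
\text{subject to} & & \\
\text{(I)} & H(m^t, m^k) - \tau \geq 0 & \forall m^k \in \mathcal{A}'(m^t) \\
\text{(II)} & H(m^t, m^k) - \tau \geq 0 & \forall m^k \in \mathcal{A}(m^i),\ \forall m^i \in D_{t-1} \\
\text{(III)} & 0 \leq \tau \leq \tau(D_{t-1}) & \\
\text{(IV)} & I(m^t) - I(m^k) \geq 0 & \forall m^k \in \mathcal{A}'(m^t).
\end{array} \tag{17}
$$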
As in the optimal model, the first two constraints bound the
distances from below. Constraint (I) guarantees that the
self-distance of the newly generated marker, $m^t$, is greater
than or equal to $\tau$. Constraint (II) guarantees the same
condition for the marker distance (Eq. 4) between $m^t$ and
any previously generated marker.
The upper bound of $\tau$ in constraint (III) reflects the
fact that the maximum cost of a solution is lower than or
equal to the maximum cost of the previous dictionary. This
assumption is only valid if the previous markers have been
created using the same suboptimal iterative method. For
the first generated marker, the value $\tau^n_{\max}$ (Sec. 3.1) can be
employed for constraint (III). Finally, constraint (IV) is the
same as in the optimal model, i.e., selecting the marker
with the highest encoded value from its analogous set.
The proposed method works iteratively, i.e., a new MILP
model is generated and solved to obtain a new marker at
each iteration. This process repeats until the desired num-
ber of markers, d, is generated. It must be noted that, at
each iteration, the optimal solution may not be unique, and
that the selection of a solution will condition the subsequent
iterations. As a result, the marker dictionary obtained by
this method is not optimal, nor unique, contrary to the pre-
vious model. However, the processing time spent by a MILP
solver is much shorter compared to the optimal formulation
and the results, as shown in Sec. 5, are still remarkable.
The whole process is summarized in Algorithm 1.
Algorithm 1 Suboptimal dictionary generation.
  D_0 ← ∅                            # Empty dictionary
  for t from 1 to d do
    Generate MILP Model for D_t      # See Model in Eq. 17
    m^t ← Solve MILP Model           # Get optimal marker
    D_t ← D_{t−1} ∪ m^t              # Add to previous markers
  end for
  Return last dictionary, D_d
Additionally, in order to ensure the convergence within a
restricted amount of time, a time limit can be set to each
MILP model. Thus, if the optimization is not finished when
the limit is reached, the best feasible solution obtained at
that moment is selected. This is a common strategy in
Mixed Integer Programming to guarantee convergence when
suboptimal solutions are allowed.
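The iterative scheme of Algorithm 1 can be sketched in a few lines. Note that this sketch does not solve the MILP model of Eq. 17: the solver call is replaced by an exhaustive search over all candidate markers, which is only practical for very small markers such as 3 × 3 bits. It also assumes that the analogous set consists of the four 90° rotations of a marker, and it omits constraint (IV) and the time limit. All names are illustrative.

```python
from itertools import product
from typing import List, Tuple

Marker = Tuple[int, ...]  # n*n bits in row-major order (assumed layout)


def rotate90(m: Marker, n: int) -> Marker:
    """Rotate an n x n marker by 90 degrees clockwise."""
    return tuple(m[(n - 1 - c) * n + r] for r in range(n) for c in range(n))


def rotations(m: Marker, n: int) -> List[Marker]:
    """Assumed analogous set: the marker followed by its three rotations."""
    out = [m]
    for _ in range(3):
        out.append(rotate90(out[-1], n))
    return out


def hamming(a: Marker, b: Marker) -> int:
    """Number of positions in which two markers differ."""
    return sum(x != y for x, y in zip(a, b))


def self_distance(m: Marker, n: int) -> int:
    """Minimum distance from m to its own non-trivial rotations."""
    return min(hamming(m, r) for r in rotations(m, n)[1:])


def marker_distance(m: Marker, other: Marker, n: int) -> int:
    """Minimum distance from m to any rotation of another marker."""
    return min(hamming(m, r) for r in rotations(other, n))


def next_marker(dictionary: List[Marker], n: int) -> Tuple[Marker, int]:
    """Exhaustive stand-in for solving the model in Eq. 17."""
    best, best_tau = None, -1
    for cand in product((0, 1), repeat=n * n):
        tau = self_distance(cand, n)
        for prev in dictionary:
            tau = min(tau, marker_distance(cand, prev, n))
        if tau > best_tau:  # ties are resolved in favour of the first candidate
            best, best_tau = cand, tau
    return best, best_tau


def generate_dictionary(n: int, d: int) -> Tuple[List[Marker], List[int]]:
    """Algorithm 1: add one marker per iteration and record the value reached."""
    dictionary: List[Marker] = []
    taus: List[int] = []
    for _ in range(d):
        marker, tau = next_marker(dictionary, n)
        dictionary.append(marker)
        taus.append(tau)
    return dictionary, taus


if __name__ == "__main__":
    markers, taus = generate_dictionary(n=3, d=4)
    for m, tau in zip(markers, taus):
        print(m, tau)
```

Because ties are resolved by enumeration order, a different tie-breaking rule would generally lead to a different dictionary, which mirrors the remark above that the result of the iterative method is not unique.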
5. Experiments and results
This section shows the results of the experimentation car-
ried out to validate our proposals. First, the optimal formu-
lation is studied. Then, the suboptimal model is analyzed in
terms of inter-marker distances and generation times. The
obtained results have been compared to those produced by
the best alternatives in the literature, ArUco [12], AprilTags
[13], ARTag [10] and ARToolKitPlus [11].
The Gurobi optimizer v5.6 [40] has been chosen to solve
the MILP models since it is the solver achieving the fastest
convergence times for our models. The default Gurobi con-
figuration has shown to be adequate for our proposals, since
most of the parameters are configured automatically based
on the model characteristics. All tests were performed using
the twelve cores of a system equipped with an Intel Core
i7-3930K 3.20 GHz processor, 16 GB of RAM and Ubuntu
14.04 as operating system with a load average of 0.1.
It must be noted that the dictionaries generated by our
proposals have been made publicly available as part of the
ArUco library [12].
5.1. Optimal formulation
The generation of a marker dictionary is an off-line process
which is typically performed only once. As a consequence,
the generation time is not a critical aspect of the process.
However, the main problem of the proposed optimal model
is that its generation times are too long for a high number
of markers and bits. Note that the search space
for the optimal model is $2^{n \times n \times d}$ ($d$ being the number of
generated markers), which indicates an exponential growth.
From our experimentation, it has been observed that for
markers of sizes bigger than 5×5 bits, the convergence times
are too long to study the results, thus, our experimentation
has been restricted to smaller sizes. Even for marker sizes
of 3 ×3 and 4 ×4 bits, we have been limited to dictionaries
of 37 and 8 markers respectively.
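To give a sense of scale for the search space mentioned above, the following sketch counts the binary decision variables for the two largest configurations reported in this section and prints the order of magnitude of the number of possible assignments. The function name is illustrative, and the count ignores the reductions obtained from the symmetry-breaking constraints.

```python
def search_space_bits(n: int, d: int) -> int:
    """Number of marker-bit decision variables for d markers of n x n bits."""
    return n * n * d


for n, d in ((3, 37), (4, 8)):
    bits = search_space_bits(n, d)
    # The number of decimal digits minus one gives the order of magnitude of 2^bits.
    magnitude = len(str(2 ** bits)) - 1
    print(f"{n}x{n} bits, d={d}: 2^{bits} assignments, about 10^{magnitude}")
```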
Figure 5 shows the generation times for the optimal
model. For each dictionary size, the $\tau(D)$ value obtained
in the previous generation was used as an upper bound of
the objective function, as explained in the model
description in Sec. 4.2.

Figure 5: Generation times (in days) for the optimal formulation proposal as a
function of the dictionary size, d, for 3 × 3 and 4 × 4 bits. As can be
observed, the generation times increase considerably with the dictionary
size and the number of bits. A formal study for bigger marker sizes or
number of markers is not feasible.
It can be observed that the convergence times are indeed
considerably long. For instance, the generation of an opti-
mal dictionary of 37 markers and 3 ×3 bits lasted 6 days,
and the generation of a dictionary of only 8 markers and
4 × 4 bits lasted more than 7 days. Due to this time
limitation, we have not been able to study the optimal
dictionaries for a higher number of bits or dictionary sizes, nor to
compare the optimal results with the rest of the methods.
Nevertheless, it must be noted that the formulation is suitable
for those applications where the required number of markers
and marker size are not too high, keeping in mind that dic-
tionary generation is an off-line process which is necessary
to perform only once.
Furthermore, these long times justify the suboptimal
model proposal which converges notably faster, allowing the
generation of bigger dictionaries, both in number of markers
and bits.
5.2. Suboptimal formulation
5.2.1. Analysis of dictionary distances
This section compares the distances obtained by the sub-
optimal formulation with those obtained by the ArUco,
AprilTags, ARTag and ARToolKitPlus methods. To that
end, dictionaries with up to 250 markers have been gen-
erated, covering in our opinion the requirements of most
fiducial marker applications. The marker sizes have been
selected from a range of sizes from 4 ×4 to 25 ×25 bits.
A time restriction of 150 seconds has been set to solve
each MILP model in order to ensure the convergence of the
optimization process within restricted time. Once the limit
is reached, the best feasible solution at that moment is se-
lected.
The ArUco method is an iterative process which employs
an objective distance value. This value is decremented af-
ter an amount of unproductive iterations. To compare the
results in the same conditions, the ArUco method was also
configured to decrement this value after 150 seconds of un-
productive iterations.
In the AprilTags method, the objective distance has to be
specified by the user and the method does not propose any
specific condition to reduce this value. Thus, we have employed
the same condition as in the ArUco case, i.e., reducing
the objective distance after 150 seconds of unproductive
iterations.
The marker dictionaries of ARTag and ARToolKitPlus
are fixed and their markers are composed of 6 × 6 bits. Thus,
they cannot be compared for different marker sizes. In the
ARTag case, different dictionary sizes are obtained by taking
the specific subset of markers in the order recommended
by the authors. On the other hand, ARToolKitPlus does not
provide a recommended order and hence its dictionary size
is also fixed.
Figure 6 shows the mean τ(D) value for 30 executions as
a function of the dictionary size and for different marker
sizes. The results of the different executions only present
deviations in the intervals where the objective distance is
reduced, which correspond to the slopes of the curves. In
the flat regions, there is no deviation and the same result is
achieved for all the executions.
As can be observed, the proposed suboptimal method
outperforms all the other proposals. For the
smallest size, 4 × 4 bits, there are no remarkable differences
since the search space is smaller. However, the improvements
increase with the marker size. For marker sizes of
6 × 6 bits and bigger, the suboptimal method achieves
results which clearly surpass the other methods. For instance,
for a dictionary composed of 22 markers of 10 × 10 bits,
the suboptimal model achieves a $\tau(D)$ value of 47 while the
second best alternative, the ArUco method, achieves a value
of 42. This implies that the suboptimal dictionary can
correct up to 23 erroneous bits whereas the ArUco method can
only correct up to 20 bits, a difference of 3 bits. For markers
of 25 × 25 bits, the difference increases to 15 bits.
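The correction figures quoted in this section are consistent with the usual coding-theory bound, by which a minimum distance of $\tau$ allows the correction of up to $\lfloor (\tau - 1)/2 \rfloor$ erroneous bits. The sketch below applies that bound to the distances reported here. The bound is stated as an assumption inferred from the reported numbers, and the function name is illustrative.

```python
def correctable_bits(tau: int) -> int:
    """Largest number of bit errors correctable with minimum distance tau."""
    return (tau - 1) // 2


# Distances reported for 22 markers of 10 x 10 bits (suboptimal and ArUco).
for tau in (47, 42):
    print(tau, correctable_bits(tau))
```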
It is also remarkable how the results of the AprilTags
method notably degrade as the marker size increases. This
indicates that the employed strategy, which is based on an
ascending order search, is not suitable for large search spaces.
The ARToolKitPlus library proposes two different
dictionaries, also known as ARToolKitPlus Simple and
ARToolKitPlus BCH. However, both of them are fixed and, contrary
to ARTag, they do not provide a recommended order and,
consequently, the inter-marker distance cannot be analyzed
as a function of the dictionary size. Instead, we have
compared against the ARToolKitPlus dictionaries by generating
dictionaries with the same characteristics in terms of
dictionary size and number of bits. The ARToolKitPlus Simple
dictionary is composed of 512 markers of 6 × 6 bits and achieves
a $\tau(D)$ value of 4, while the ARToolKitPlus BCH dictionary
is composed of 4096 markers of 6 × 6 bits, achieving a $\tau(D)$
value of 2.

Figure 6: Minimum inter-marker distances $\tau(D)$ for the suboptimal formulation and the ArUco, AprilTags and ARTag methods, as a function
of the dictionary size, d, for different numbers of bits: (a) 4 × 4 bits, (b) 6 × 6 bits, (c) 10 × 10 bits and (d) 25 × 25 bits. The ARTag dictionary
is only shown for 6 × 6 bits since its dictionary is fixed. It can be observed that the suboptimal model outperforms the results of the other methods.

Dictionary              Size            Marker size    τ(D)
                                                       Original   Suboptimal   ArUco   AprilTags
ARToolKitPlus Simple    512 markers     6 × 6 bits     4          11           10      10
ARToolKitPlus BCH       4096 markers    6 × 6 bits     2          9            8       8

Table 3: Inter-marker distance comparison between the dictionaries from the ARToolKitPlus library and those generated by ArUco, AprilTags
and our suboptimal proposal with the same characteristics, i.e. same number of markers and bits. The column Original indicates the
inter-marker distance of the original ARToolKitPlus dictionaries. The results of the original dictionaries are significantly lower than those obtained
by the rest of the methods. The results of our suboptimal proposal surpass the other approaches, allowing the correction of one more bit in
comparison to ArUco and AprilTags.
Table 3 shows the results obtained by the suboptimal,
ArUco and AprilTags methods in comparison to the AR-
ToolKitPlus dictionaries. It can be observed that the results
of the original ARToolKitPlus dictionaries are considerably
low in comparison to the rest of methods. For instance, in
the case of ARToolKitPlus Simple, the original dictionary
can perform an error correction of 1 bit, whereas the ArUco
and AprilTags dictionaries can correct up to 4 bits due to the
larger inter-marker distance of 10. However, our suboptimal
proposal achieves an inter-marker distance of 11, allowing a
maximum error correction of 5 bits and surpassing the rest
of the methods.
The same situation occurs for the ARToolKitPlus BCH case:
the original dictionary cannot perform error correction, while
ArUco and AprilTags can correct up to 3 bits due to an
inter-marker distance of 8. Once again, our suboptimal approach
surpasses the rest of the methods, achieving a maximum
inter-marker distance of 9 and allowing error correction of up to 4
bits.
The experimentation shows that the proposed suboptimal
model produces dictionaries with the longest inter-marker
distances in the literature, incrementing the error correction
capabilities and only surpassed by the results of the optimal
model, which is not applicable to a high number of markers
and bits.
Figure 7: Generation times (in seconds) for the suboptimal, ArUco and AprilTags methods as a function of the dictionary size, d, for different
numbers of bits: (a) 4 × 4 bits, (b) 6 × 6 bits, (c) 10 × 10 bits and (d) 25 × 25 bits. The suboptimal times are, in general, shorter than the
times of the other approaches, although the difference decreases as the marker size increases.
5.2.2. Generation Time
This section analyses the times employed by the subop-
timal, ArUco and AprilTags methods to generate the dic-
tionaries shown in the previous section. As for ARTag and
ARToolKitPlus, the comparison is not feasible since their
dictionaries are fixed and, hence, there is no generation pro-
cess.
Figure 7 shows the mean generation times for 30 execu-
tions as a function of the dictionary size for different number
of bits. As can be observed, the generation times are notably
shorter compared to the optimal case. For instance, the gen-
eration of an optimal dictionary composed by 6 markers and
4×4 bits needs more than 1 day, while the suboptimal model
employed less than 20 seconds to generate a dictionary of 250
markers with the same marker size.
Also, the generation times of the suboptimal method are,
in general, shorter than those of ArUco and AprilTags.
These differences are especially relevant for smaller marker
sizes. AprilTags achieves shorter generation times only
for 25 × 25 bits. However, as it has been shown in
Sec. 5.2.1, the inter-marker distances obtained by AprilTags
in this case are completely unsatisfactory in comparison to
those obtained by our proposal or the ArUco method.
As for the ArUco and AprilTags results, it can be noted
that there are some slopes in the plots, where the times
increase sharply. This is especially remarkable for small
marker sizes (4 ×4 and 6 ×6 bits). These peaks correspond
to those dictionary sizes where the objective distance is de-
creased. Since the objective distance can only be reduced
by reaching the time limit, the generation time increases
considerably with each reduction.
On the other hand, the suboptimal proposal reduces the
objective distance by adjusting its bounds during the
MILP optimization based on the linear relaxation of the
problem. This means that the time limit does not need
to be reached every time the objective distance is reduced,
which explains why the suboptimal times are significantly
shorter than the times of the other approaches for 4 × 4 bits.
However, as the marker size increases, the times of the
suboptimal method start to grow and some sharp slopes
appear (similarly to those on the ArUco or AprilTags curves).
These peaks also correspond to the dictionary sizes where
the time limit is reached. In these cases, the solver takes the
best feasible solution found until that moment and continues
with the next generation.
Note that for the three methods, after the upper bound of
the objective function is reduced, the next generated mark-
ers are less restricted and they can be generated faster,
which explains the flat lines after each slope.
For the biggest marker size, 25 ×25 bits, there is a high
number of objective distance reductions so that the slopes
become less distinguishable in the three cases.
5.2.3. Comparison to optimal dictionary
In this section, the distances obtained by the subopti-
mal method are compared to those obtained by the optimal
model. However, as it is shown in Section 5.1, the experi-
mentation carried out with the optimal model is limited to
small dictionaries and number of bits due to its time com-
plexity. As a consequence, the results are not enough to
draw any conclusion. Table 4 summarizes the distance re-
sults for dictionaries up to 8 markers and 4 ×4 bits. The
distances of the suboptimal model correspond to the mean
of 30 executions.
As it can be observed, the maximum difference between
the distances of the optimal and suboptimal models is 2,
although, as it has been stated, we cannot draw a conclusion
due to the reduced number of results.
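The maximum difference stated above can be reproduced directly from the values listed in Table 4. The short sketch below copies those values for 4 × 4 bits and computes the gap for each dictionary size. The function name is illustrative.

```python
from typing import List, Sequence

# Values for 4 x 4 bits and d = 1..8, as listed in Table 4.
OPTIMAL = [10, 8, 8, 8, 8, 8, 8, 8]
SUBOPTIMAL = [10, 8, 8, 7.37, 6.2, 6, 6, 6]


def distance_gaps(optimal: Sequence[float], suboptimal: Sequence[float]) -> List[float]:
    """Difference between optimal and suboptimal distances for each dictionary size."""
    return [o - s for o, s in zip(optimal, suboptimal)]


gaps = distance_gaps(OPTIMAL, SUBOPTIMAL)
print(max(gaps))
```

The gap is zero for up to three markers and reaches its maximum from six markers onwards.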
6. Conclusions
This paper has proposed two novel methods to obtain
fiducial marker dictionaries based on the Mixed Integer Lin-
ear Programming paradigm. The first model, contrary to
any of the previous methods, guarantees the optimality of
the dictionary in terms of inter-marker distance for any num-
ber of bits and markers. However, the generation times are
too long for many practical situations. The second method
proposes an iterative formulation that, although it does not
guarantee optimality, achieves better results than the
state-of-the-art methods within restricted time.
As a consequence, the dictionaries generated with our pro-
posals allow the detection and correction of a higher number
of erroneous bits than previous approaches. These results
lead to a direct improvement in the marker detection pro-
cess.
Finally, it must be noted that the dictionaries generated
by our proposals have been made publicly available as part
of the ArUco library [12].
7. Acknowledgments
We are grateful for the financial support provided by the
Science and Technology Ministry of Spain and FEDER
(projects TIN2012-32952 and BROCA).
d    τ(D)
     Optimal    Suboptimal
1    10         10
2    8          8
3    8          8
4    8          7.37
5    8          6.2
6    8          6
7    8          6
8    8          6

Table 4: Suboptimal distances compared to the optimal distances for
4 × 4 bits and dictionary size up to 8 markers.
References
[1] B. Williams, M. Cummins, J. Neira, P. Newman,
I. Reid, J. Tardós, A comparison of loop closing
techniques in monocular SLAM, Robotics and Autonomous
Systems (2009) 1188–1197.
[2] E. Royer, M. Lhuillier, M. Dhome, J.-M. Lavest,
Monocular vision for mobile robot localization and au-
tonomous navigation, International Journal of Com-
puter Vision 74 (3) (2007) 237–260.
[3] R. T. Azuma, A survey of augmented reality, Presence
6 (1997) 355–385.
[4] H. Kato, M. Billinghurst, Marker tracking and HMD
calibration for a video-based augmented reality confer-
encing system, in: Proceedings of the 2nd IEEE and
ACM International Workshop on Augmented Reality,
IWAR ’99, IEEE Computer Society, Washington, DC,
USA, 1999, pp. 85–94.
[5] V. Lepetit, P. Fua, Monocular model-based 3d tracking
of rigid objects: A survey, in: Foundations and Trends
in Computer Graphics and Vision, 2005, pp. 1–89.
[6] W. Daniel, R. Gerhard, M. Alessandro, T. Drummond,
S. Dieter, Real-time detection and tracking for aug-
mented reality on mobile phones, IEEE Transactions
on Visualization and Computer Graphics 16 (3) (2010)
355–368.
[7] G. Klein, D. Murray, Parallel tracking and mapping
for small AR workspaces, in: Proceedings of the 2007
6th IEEE and ACM International Symposium on Mixed
and Augmented Reality, ISMAR ’07, IEEE Computer
Society, Washington, DC, USA, 2007, pp. 1–10.
[8] K. Mikolajczyk, C. Schmid, Indexing based on scale
invariant interest points., in: ICCV, 2001, pp. 525–531.
[9] D. G. Lowe, Object recognition from local scale-
invariant features, in: Proceedings of the International
Conference on Computer Vision-Volume 2 - Volume 2,
ICCV ’99, IEEE Computer Society, Washington, DC,
USA, 1999, pp. 1150–1157.
[10] M. Fiala, Designing highly reliable fiducial markers,
IEEE Trans. Pattern Anal. Mach. Intell. 32 (7) (2010)
1317–1324.
[11] D. Wagner, D. Schmalstieg, ARToolKitPlus for pose
tracking on mobile devices, in: Computer Vision Win-
ter Workshop, 2007, pp. 139–146.
[12] S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas,
M. Marín-Jiménez, Automatic generation and
detection of highly reliable fiducial markers under
occlusion, Pattern Recognition 47 (6) (2014) 2280–2292.
[13] E. Olson, AprilTag: A robust and flexible visual fiducial
system, in: Proceedings of the IEEE International Con-
ference on Robotics and Automation (ICRA), IEEE,
2011, pp. 3400–3407.
[14] K. Dorfmüller, H. Wirth, Real-time hand and head
tracking for virtual environments using infrared
beacons, in: Proceedings CAPTECH'98, Springer,
1998, pp. 113–127.
[15] M. Ribo, A. Pinz, A. L. Fuhrmann, A new optical
tracking system for virtual and augmented reality ap-
plications, in: In Proceedings of the IEEE Instrumenta-
tion and Measurement Technical Conference, 2001, pp.
1932–1936.
[16] V. A. Knyaz, R. V. Sibiryakov, The development of
new coded targets for automated point identification
and non-contact surface measurements, in: 3D Surface
Measurements, International Archives of Photogram-
metry and Remote Sensing, Vol. XXXII, part 5, 1998,
pp. 80–85.
[17] L. Naimark, E. Foxlin, Circular data matrix fiducial
system and robust image processing for a wearable
vision-inertial self-tracker, in: Proceedings of the 1st In-
ternational Symposium on Mixed and Augmented Real-
ity, ISMAR ’02, IEEE Computer Society, Washington,
DC, USA, 2002, pp. 27–36.
[18] J. Rekimoto, Y. Ayatsuka, CyberCode: designing aug-
mented reality environments with visual tags, in: Pro-
ceedings of DARE 2000 on Designing augmented reality
environments, DARE ’00, ACM, New York, NY, USA,
2000, pp. 1–10.
[19] M. Rohs, B. Gfeller, Using camera-equipped mobile
phones for interacting with real-world objects, in: Ad-
vances in Pervasive Computing, 2004, pp. 265–271.
[20] E. Ottaviani, A. Pavan, M. Bottazzi, E. Brunelli,
F. Caselli, M. Guerrero, A common image processing
framework for 2d barcode reading, in: Image Processing
and Its Applications. Seventh International Conference
on (Conf. Publ. No. 465), Vol. 2, 1999, pp. 652–655.
[21] M. Kaltenbrunner, R. Bencina, reacTIVision: a
computer-vision framework for table-based tangible in-
teraction, in: Proceedings of the 1st international con-
ference on Tangible and embedded interaction, TEI ’07,
ACM, New York, NY, USA, 2007, pp. 69–74.
[22] M. Fiala, Comparing ARTag and ARToolKit Plus fidu-
cial marker systems, in: IEEE International Workshop
on Haptic Audio Visual Environments and their Appli-
cations, 2005, pp. 147–152.
[23] J. Rekimoto, Matrix: A realtime object identifica-
tion and registration method for augmented reality, in:
Third Asian Pacific Computer and Human Interaction,
Kangawa, Japan, IEEE Computer Society, 1998, pp.
63–69.
[24] W. Peterson, D. Brown, Cyclic codes for error detec-
tion, Proceedings of the IRE 49 (1) (1961) 228–235.
[25] S. Lin, D. Costello, Error Control Coding: Fundamen-
tals and Applications, Prentice Hall, 1983.
[26] D. Schmalstieg, A. Fuhrmann, G. Hesina, Z. Szalavári,
L. M. Encarnação, M. Gervautz, W. Purgathofer,
The Studierstube augmented reality project, Presence:
Teleoper. Virtual Environ. 11 (1) (2002) 33–54.
[27] D. Flohr, J. Fischer, A Lightweight ID-Based Extension
for Marker Tracking Systems, in: Eurographics Sym-
posium on Virtual Environments (EGVE) Short Paper
Proceedings, 2007, pp. 59–64.
[28] A. Schrijver, Theory of Linear and Integer Program-
ming, John Wiley & Sons, Inc., New York, NY, USA,
1986.
[29] R. Niemann, P. Marwedel, An algorithm for hard-
ware/software partitioning using mixed integer linear
programming, Design Automation for Embedded Sys-
tems 2 (2) (1997) 165–193.
[30] C.-W. Hui, Y. Natori, An industrial application using
mixed-integer programming technique: A multi-period
utility system model, Computers and Chemical Engi-
neering 20, Supplement 2 (0) (1996) 1577–1582.
[31] H. Morais, P. Kádár, P. Faria, Z. A. Vale, H. Khodr,
Optimal scheduling of a renewable micro-grid in an
isolated load area using mixed-integer linear program-
ming, Renewable Energy 35 (1) (2010) 151–156.
[32] T. Gönen, Distribution-system planning using mixed-
integer programming, Generation, Transmission and
Distribution, IEE Proceedings C 128 (1981) 70–79(9).
[33] A. Richards, J. P. How, Aircraft trajectory planning
with collision avoidance using mixed integer linear
programming, American Control Conference (ACC) 3
(2002) 1936–1941 vol.3.
[34] J. A. Nelder, R. Mead, A simplex method for function
minimization, The computer journal 7 (4) (1965) 308–
313.
[35] A. Schrijver, Theory of Linear and Integer Program-
ming, John Wiley & Sons, Chichester, 1986.
[36] M. R. Garey, D. S. Johnson, Computers and intractabil-
ity: a guide to the theory of NP-completeness. 1979,
San Francisco, LA: Freeman.
[37] M. Padberg, G. Rinaldi, A branch-and-cut algorithm
for the resolution of large-scale symmetric traveling
salesman problems, SIAM review 33 (1) (1991) 60–100.
[38] E. L. Lawler, D. E. Wood, Branch-and-bound methods:
A survey, Operations research 14 (4) (1966) 699–719.
[39] H. Marchand, A. Martin, R. Weismantel, L. Wolsey,
Cutting planes in integer and mixed integer program-
ming, Discrete Applied Mathematics 123 (1) (2002)
397–446.
[40] I. Gurobi Optimization, Gurobi optimizer reference
manual, http://www.gurobi.com (2014).
14
... While highprecision interferometers can measure these irregularities in a lab environment before deployment, scaling this measurement process for mass production and after deployment remains a major hurdle. Synthesizing novel viewpoints from aberrated image sequences is therefore non-trivial, as the surface irregularities introduce non-radially symmetric distortions and optical artifacts that conventional calibration techniques struggle to handle [10,15,23,29,35,36]. These distorted features cannot be reliably matched to their warped counterparts, especially in complex scenes [15,17,21,25,39]. ...
... Synthesizing novel viewpoints from aberrated image sequences is therefore non-trivial, as the surface irregularities introduce non-radially symmetric distortions and optical artifacts that conventional calibration techniques struggle to handle [10,15,23,29,35,36]. These distorted features cannot be reliably matched to their warped counterparts, especially in complex scenes [15,17,21,25,39]. Recent advancements have explored embedding camera parameter tuning within reconstruction objectives, allowing for synthesis with noisy calibration parameters [18,35] and reducing dependency on precise initializations. ...
Preprint
Full-text available
Recent extended reality headsets and field robots have adopted covers to protect the front-facing cameras from environmental hazards and falls. The surface irregularities on the cover can lead to optical aberrations like blurring and non-parametric distortions. Novel view synthesis methods like NeRF and 3D Gaussian Splatting are ill-equipped to synthesize from sequences with optical aberrations. To address this challenge, we introduce SynthCover to enable novel view synthesis through protective covers for downstream extended reality applications. SynthCover employs a Refractive Field that estimates the cover's geometry, enabling precise analytical calculation of refracted rays. Experiments on synthetic and real-world scenes demonstrate our method's ability to accurately model scenes viewed through protective covers, achieving a significant improvement in rendering quality compared to prior methods. We also show that the model can adjust well to various cover geometries with synthetic sequences captured with covers of different surface curvatures. To motivate further studies on this problem, we provide the benchmarked dataset containing real and synthetic walkable scenes captured with protective cover optical aberrations.
... Visual fiducial markers [21] are artificial markers that can help with localization when seen with a camera. Visual fiducial markers are binary patterns that can be identified and decoded from an RGB data stream (camera input). ...
... In contrast to TagSLAM, we use the same tag in different orientations (rotated 90°) on each face of the cuboid. This reduces the number of markers in the dictionary and thereby enhances the performance of ArUco marker detection [36] [21]. In addition, we propose a reduced marker placement algorithm to reduce the number of markers required to cover a mapped indoor environment. ...
Article
Large indoor spaces with complex layouts are often difficult to navigate. Indoor spaces in hospitals, universities, shopping complexes, etc., carry multi-modal information through text and symbols. Hence, it is difficult for Blind and Visually Impaired (BVI) people to independently navigate such spaces. Indoor environments are usually GPS-denied; therefore, Bluetooth-based, WiFi-based, or range-based methods are used for localization. These methods incur high setup costs, lack good accuracy, and sometimes need specialized sensing equipment. We propose a Visual Assist (VA) system for the indoor navigation of BVI individuals using visual fiducial markers for localization. State-of-the-art (SOTA) approaches for localization using visual fiducial markers use fixed cameras with a limited field of view. We employ a pan-tilt turret-mounted camera, which provides a 360° field of view for enhanced marker tracking. We therefore need fewer markers for mapping and navigation. We further use our localization model to enhance existing SLAM methods, namely Hector SLAM, ORB-SLAM, and UcoSLAM. The efficacy of the proposed system is measured on three metrics: Root Mean Square Error (RMSE), Average Distance to Nearest Neighbours (ADNN), and Absolute Trajectory Error (ATE). The proposed system offers accurate trajectory tracking up to ±8 cm. ADNN and RMSE of Hector SLAM, ORB-SLAM, and UcoSLAM improve by 9.1%, 8.9%, and 7%, respectively, while ATE is reduced by 6.7%, 4.5%, and 5.2%.
... Furthermore, the size in the transversal plane (XY), in pixels, is used to scale the Compton and RGB images properly through a transformation factor that relates the two units, meters and pixels. For this purpose, we developed an algorithm based on the ArUco C++ library [34,35], which was chosen for its robustness and straightforward integration into our C++-based software. The geometric calibration of the RGB and γ-ray cameras was accomplished by means of dedicated laboratory measurements using a ²²Na source in combination with fiducial markers, following the common procedure described in previous works [6,8]. ...
Preprint
Nuclear energy production is inherently tied to the management and disposal of radioactive waste. Enhancing classification and monitoring tools is therefore crucial, with significant socioeconomic implications. This paper reports on the applicability and performance of a high-efficiency, cost-effective and portable Compton camera for detecting and visualizing low- and medium-level radioactive waste from the decommissioning and regular operation of nuclear power plants. The results demonstrate the good performance of Compton imaging for this type of application, both in terms of image resolution and reduced measuring time. A technology readiness level of TRL7 has thus been achieved with this system prototype, as demonstrated with dedicated field measurements carried out at the radioactive-waste disposal plant of El Cabril (Spain) using a plurality of radioactive-waste drums from decommissioned nuclear power plants. The performance of the system has been enhanced by means of computer-vision techniques in combination with advanced Compton-image reconstruction algorithms based on Maximum-Likelihood Expectation Maximization. Finally, we also show the feasibility of 3D tomographic reconstruction from a series of relatively short measurements around the objects of interest. The potential of this imaging system to enhance nuclear waste management makes it a promising innovation for the nuclear industry.
... Jessica appears and disappears in the workspace, making the robot have to adapt its motion frequently. Jessica is tracked by an ArUco [24] marker via a camera. Throughout the experiment, the robot traverses between both ends of the table. ...
Preprint
With the goal of efficiently computing collision-free robot motion trajectories in dynamically changing environments, we present results of a novel method for Heuristics Informed Robot Online Path Planning (HIRO). Dividing robot environments into static and dynamic elements, we use the static part for initializing a deterministic roadmap, which provides a lower bound of the final path cost as informed heuristics for fast path-finding. These heuristics guide a search tree to explore the roadmap during runtime. The search tree examines the edges using a fuzzy collision checking concerning the dynamic environment. Finally, the heuristics tree exploits knowledge fed back from the fuzzy collision checking module and updates the lower bound for the path cost. As we demonstrate in real-world experiments, the closed-loop formed by these three components significantly accelerates the planning procedure. An additional backtracking step ensures the feasibility of the resulting paths. Experiments in simulation and the real world show that HIRO can find collision-free paths considerably faster than baseline methods with and without prior knowledge of the environment.
... the first applied the Perspective-n-Point (P3P) method (Lu, 2018) to the robot target region, whose four coplanar corners have known positions. The second involved attaching an ArUco tag to the robot (Romero-Ramirez et al., 2018; Garrido-Jurado et al., 2016), see Fig. 2. The objective of this section is to demonstrate that CL-SABER has comparable performance to these methods in terms of capturing relative pose trends. If so, then it is a valid scheme for regulating range in the absence of direct depth measurement or of cooperation. ...
Article
This paper explores range and bearing angle regulation of a leader–follower using monocular vision. The main challenge is that monocular vision does not directly provide a range measurement. The contribution is a novel concurrent learning (CL) approach, called CL Subtended Angle and Bearing Estimator for Relative pose (CL-SABER), which achieves range regulation without communication, persistency of excitation, or known geometry and is demonstrated on a physical robot platform. A history stack estimates target size, which augments the Kalman filter (KF) with a range pseudomeasurement. The target is followed to scale without drift, persistency of excitation requirements, prior knowledge, or additional measurements. Finite excitation is required to achieve parameter convergence and perform steady-state regulation using CL-SABER. Evaluation using simulation and mobile robot experiments in special Euclidean planar space (SE(2)) shows that the new method provides stable and consistent range regulation, as demonstrated by the inter-rater reliability, including in noisy and high leader acceleration environments.
... This 3D print is suitable for evaluating surface reconstructions, as its shape, color, occlusion and illumination are demanding. Four 3D-printed holders, differing in height and in the inclination of the uppermost surface, are placed on the ear and carry custom-made multimodal markers: an ArUco [29,30] marker for microscope camera pose estimation and CT X-Spot spherical markers (1.5 mm diameter, titanium CT markers, Beekley Medical) placed 1 mm below the ArUco marker origin (Fig. 2). ...
Article
Purpose: Multi-zoom microscopic surface reconstructions of operating sites, especially in ENT surgeries, would allow multimodal image fusion for determining the amount of resected tissue, for recognizing critical structures, and novel tools for intraoperative quality assurance. State-of-the-art three-dimensional model creation of the surgical scene is challenged by the surgical environment, illumination, and the homogeneous structures of skin, muscle, bones, etc., that lack invariant features for stereo reconstruction. Methods: An adaptive near-infrared pattern projector illuminates the surgical scene with optimized patterns to yield accurate dense multi-zoom stereoscopic surface reconstructions. The approach does not impact the clinical workflow. The new method is compared to state-of-the-art approaches and is validated by determining its reconstruction errors relative to a high-resolution 3D-reconstruction of CT data. Results: 200 surface reconstructions were generated for 5 zoom levels with 10 reconstructions for each object illumination method (standard operating room light, microscope light, random pattern and adaptive NIR pattern). For the adaptive pattern, the surface reconstruction errors ranged from 0.5 to 0.7 mm, as compared to 1–1.9 mm for the other approaches. The local reconstruction differences are visualized in heat maps. Conclusion: Adaptive near-infrared (NIR) pattern projection in microscopic surgery allows dense and accurate microscopic surface reconstructions for variable zoom levels of small and homogeneous surfaces. This could potentially aid in microscopic interventions at the lateral skull base and potentially open up new possibilities for combining quantitative intraoperative surface reconstructions with preoperative radiologic imagery.
... In addition to the driving data, we record eye-tracking (ET) data of the human drivers using a VPS 19 [27] ET system. We use ArUco [28] markers displayed on the screen showing the video to transform the ET video to the RGB stream. Because we were streaming the RGB video to the control station, we recorded that stream separately to capture exactly what the driver sees, including, for example, camera artifacts. ...
Preprint
Dynamic Vision Sensors (DVS) offer a unique advantage in control applications due to their high temporal resolution and asynchronous event-based data. Still, their adoption in machine learning algorithms remains limited. To address this gap and promote the development of models that leverage the specific characteristics of DVS data, we introduce the Multi-Modal Dynamic-Vision-Sensor Line Following dataset (MMDVS-LF). This comprehensive dataset is the first to integrate multiple sensor modalities, including DVS recordings, RGB video, odometry, and Inertial Measurement Unit (IMU) data, from a small-scale standardized vehicle. Additionally, the dataset includes eye-tracking and demographic data of drivers performing a Line Following task on a track. With its diverse range of data, MMDVS-LF opens new opportunities for developing deep learning algorithms and conducting data science projects across various domains, supporting innovation in autonomous systems and control applications.
Article
We propose an image recognition method that automatically measures the size of deterioration from images photographed with a digital camera, in order to improve the efficiency of inspections of the communication tunnels that house communication cables. The proposed method detects equipment and deterioration areas in the photographed images using deep learning, and estimates pixel resolution (mm/pixel) from the detection results for metal rack areas and their known size information. The actual deterioration size is measured from the pixel resolution and the detection result for the deterioration area. We evaluated the detection results of the proposed method using 1,700 images, and confirmed that the proposed method achieves an F-measure of 0.92 for the metal rack areas and 0.84 for the exposed rebar areas. We compared the values measured by the proposed method with those measured by measuring tape for 91 exposed rebars, and found a high correlation coefficient of 0.9773. The measurement performance of the proposed method is sufficient to determine the scale of deterioration, and the method can be applied to the inspection of communication tunnels.
Article
This paper presents a fiducial marker system especially suited to camera pose estimation in applications such as augmented reality and robot localization. Three main contributions are presented. First, we propose an algorithm for generating configurable marker dictionaries (in size and number of bits) following a criterion to maximize the inter-marker distance and the number of bit transitions. In the process, we derive the maximum theoretical inter-marker distance that dictionaries of square binary markers can have. Second, a method for automatically detecting the markers and correcting possible errors is proposed. Third, a solution to the occlusion problem in augmented reality applications is shown. To that aim, multiple markers are combined with an occlusion mask calculated by color segmentation. The experiments conducted show that our proposal obtains dictionaries with higher inter-marker distances and lower false negative rates than state-of-the-art systems, and provides an effective solution to the occlusion problem.
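The inter-marker distance named in this abstract can be illustrated with a minimal sketch. It assumes the usual rotation-aware definition for square binary markers: the distance between two markers is the smallest Hamming distance over the four 90° rotations of one of them, a marker's self-distance is the smallest Hamming distance to its own non-trivial rotations, and the dictionary distance is the minimum of all these values. The function names (`marker_distance`, `self_distance`, `dictionary_distance`) are hypothetical and are not taken from the ArUco library.

```python
import numpy as np


def hamming(a, b):
    """Number of differing bits between two equally sized binary grids."""
    return int(np.count_nonzero(a != b))


def marker_distance(a, b):
    """Smallest Hamming distance between a and the four 90-degree rotations of b."""
    return min(hamming(a, np.rot90(b, k)) for k in range(4))


def self_distance(a):
    """Smallest Hamming distance between a marker and its own rotations (90, 180, 270)."""
    return min(hamming(a, np.rot90(a, k)) for k in range(1, 4))


def dictionary_distance(markers):
    """Minimum over all self-distances and all pairwise marker distances."""
    dists = [self_distance(m) for m in markers]
    for i in range(len(markers)):
        for j in range(i + 1, len(markers)):
            dists.append(marker_distance(markers[i], markers[j]))
    return min(dists)


# Toy 2x2 markers, chosen only for illustration (real markers are larger).
m1 = np.array([[1, 0], [0, 0]])
m2 = np.array([[1, 1], [1, 0]])
tau = dictionary_distance([m1, m2])
```

Under this assumed definition, a rotationally symmetric marker (for example an all-zero grid) has self-distance 0, which is why such markers would be unusable when orientation must be recovered.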
Article
This paper surveys the field of augmented reality (AR), in which 3D virtual objects are integrated into a 3D real environment in real time. It describes the medical, manufacturing, visualization, path planning, entertainment, and military applications that have been explored. This paper describes the characteristics of augmented reality systems, including a detailed discussion of the tradeoffs between optical and video blending approaches. Registration and sensing errors are two of the biggest problems in building effective augmented reality systems, so this paper summarizes current efforts to overcome these problems. Future directions and areas requiring further research are discussed. This survey provides a starting point for anyone interested in researching or using augmented reality.
Article
The central task of close-range photogrammetry is the solution of the correspondence problem, i.e. the determination of the image coordinates of a given space point in two or more images. The paper presents the results of developing and testing new coded targets for automated identification and coordinate determination of marked points. The developed coded targets are invariant to location and rotation, and allow reliable detection and localisation in textured images and precise calculation of centre coordinates. The coded targets have been used for automated 3D coordinate measurements by the photogrammetric station of the State Research Institute for Aviation System. The methodology of automated identification and the 3D measurement technique are given. Estimates of system performance and measurement accuracy, and approaches to achieving high accuracy, are also presented.
Article
An algorithm is described for solving large-scale instances of the Symmetric Traveling Salesman Problem (STSP) to optimality. The core of the algorithm is a "polyhedral" cutting-plane procedure that exploits a subset of the system of linear inequalities defining the convex hull of the incidence vectors of the Hamiltonian cycles of a complete graph. The cuts are generated by several identification procedures that have been described in a companion paper. Whenever the cutting-plane procedure does not terminate with an optimal solution the algorithm uses a tree-search strategy that, as opposed to branch-and-bound, keeps on producing cuts after branching. The algorithm has been implemented in FORTRAN. Two different linear programming (LP) packages have been used as the LP solver. The implementation of the algorithm and the interface with one of the LP solvers is described in sufficient detail to permit the replication of our experiments. Computational results are reported with up to 42 STSPs with sizes ranging from 48 to 2,392 nodes. Most of the medium-sized test problems are taken from the literature; all others are large-scale real-world problems. All of the instances considered in this study were solved to optimality by the algorithm in "reasonable" computation times.