RUNE-Tag: a High Accuracy Fiducial Marker with Strong Occlusion Resilience
Filippo Bergamasco, Andrea Albarelli, Emanuele Rodolà and Andrea Torsello
Dipartimento di Scienze Ambientali, Informatica e Statistica
Università Ca' Foscari Venezia - via Torino, 155 - 30172 Venice, Italy
http://www.dais.unive.it
Abstract
Over the last decades fiducial markers have provided widely adopted tools to add reliable model-based features into an otherwise general scene. Given their central role in many computer vision tasks, countless different solutions have been proposed in the literature. Some designs are focused on the accuracy of the recovered camera pose with respect to the tag; others concentrate on reaching high detection speed or on recognizing a large number of distinct markers in the scene. In such a crowded area both the researcher and the practitioner are licensed to wonder if there is any need to introduce yet another approach. Nevertheless, with this paper, we would like to present a general-purpose fiducial marker system that can be deemed to add some valuable features to the pack. Specifically, by exploiting the projective properties of a circular set of sizeable dots, we propose a detection algorithm that is highly accurate. Further, applying a dot pattern scheme derived from error-correcting codes allows for robustness with respect to very large occlusions. In addition, the design of the marker itself is flexible enough to accommodate different requirements in terms of pose accuracy and number of patterns. The overall performance of the marker system is evaluated in an extensive experimental section, where a comparison with a well-known baseline technique is presented.
1. Introduction
A fiducial marker is, in its broadest definition, any artificial object consistent with a known model that is placed in a scene. At the current state of the art, such artifacts are still the only choice whenever a high level of precision and repeatability in image-based measurement is required. This is, for instance, the case with accurate camera pose estimation, 3D structure-from-motion or, more in general, any flavor of vision-driven dimensional assessment task. Of course, a deluge of approaches have been proposed in order to obtain a reasonable performance by relying only on natural features already present in the scene. To this end, several repeatable and distinctive interest point detection and matching techniques have been proposed over the years. While in some scenarios such approaches can obtain satisfactory results, they still suffer from shortcomings that severely hinder their broader use. Specifically, the lack of a well-known model limits their usefulness in pose estimation and, even when such a model can be inferred (for instance by using bundle adjustment), its accuracy heavily depends on the correctness of localization and matching. Moreover, the availability and quality of natural features in a scene is not guaranteed in general. Indeed, the surface smoothness found in most man-made objects can easily lead to scenes that are very poor in features. Finally, photometric inconsistencies due to reflective or translucent materials jeopardize the repeatability of the detected points. For these reasons, it is not surprising that artificial fiducial tags continue to be widely used and are still an active research topic. For practical purposes, most markers are crafted in such a way as to be easily detected and recognized in images produced by a pinhole-modeled camera. In this sense, their design leverages the projective invariants that characterize geometrical entities such as lines, planes and conics. It is reasonable to believe that circular dots were among the first shapes used. In fact, circles appear as ellipses under projective transformations and the associated conic is invariant with respect to the point of view of the camera. This allows both for an easy detection and a quite straightforward rectification of the circle plane. In his seminal work, Gatrell [7] proposes to use a set of highly contrasted concentric circles and to validate a candidate marker by exploiting the compatibility between the centroids of the ellipses found. By alternating white and black circles, a few bits of information can be encoded in the marker itself. In [3] the concentric circle approach is enhanced by adding colors and multiple scales. In [11] and [16] dedicated "data rings" are added to the fiducial design. A set of four circles located at the corners of a square is adopted by [4]: in this case an identification pattern is placed at the centroid of the four dots in order to distinguish between different targets. This ability to recognize the viewed markers is very important
(a) Concentric Circles (b) Intersense (c) ARToolkit (d) ARTag (e) RUNE-43 (f) RUNE-129
Figure 1. Some examples of fiducial markers that differ both in the detection technique and in the pattern used for recognition. In the first two, detection happens by finding ellipses, and the coding is held respectively by the color of the rings in (a) and by the appearance of the sectors in (b). The black square border enables detection in (c) and (d), but while ARToolkit uses image correlation to differentiate markers, ARTag relies on error-correcting binary codes. Finally, in (e) and (f) we show two examples of RUNE-Tags.
for complex scenes where more than a single fiducial is required; furthermore, the availability of a coding scheme allows for an additional validation step and lowers the number of false positives. While coded patterns are widely used (see for instance [18, 5, 15]), it is interesting to note that many papers suggest the use of the cross ratio among detected points [19, 20, 12] or lines [21]. A clear advantage of the cross ratio is that, being projectively invariant, the recognition can be made without the need of any rectification of the image. Unfortunately, this comes at the price of a low overall number of distinctively recognizable patterns. In fact, the cross ratio is a single scalar with a strongly non-uniform distribution [8], and this limits the number of well-spaced different values that can possibly be generated. The projective invariance of lines is also frequently used in the design of fiducial markers. Almost invariably this feature is exploited by detecting the border edges of a highly contrasted quadrilateral block. This happens, for instance, with the very well known ARToolkit [10] system, which is freely available and adopted in countless virtual reality applications. Thanks to its easy detection and the high accuracy that can be obtained in pose recovery [14], this solution is retained in many recent approaches, such as ARTag [6] and ARToolkitPlus [22]. These two latter methods replace the recognition technique of ARToolkit, which is based on image correlation, with a binary coded pattern (see Fig. 1). The use of an error-correcting code makes the marker identification very robust; in fact, we can deem these designs the most successful from an applicative point of view.
In this paper we introduce a novel fiducial marker system that takes advantage of the same basic features for detection and recognition purposes. The marker is characterized by a circular arrangement of dots at fixed angular positions in one or more concentric rings. Within this design, the projective properties of both the atomic dots and the rings they compose are exploited to make the processing fast and reliable. In the following section we describe the general nature of our marker, the algorithm proposed for its detection and the coding scheme to be used for robust recognition. In the experimental section we validate the proposed approach by comparing its performance with two widely used marker systems and by testing its robustness under a wide range of noise sources.
2. Rings of Unconnected Ellipses
The proposed tag is built by partitioning a disc into several evenly distributed sectors. Each sector, in turn, can be divided into a number of concentric rings, which we call levels. Each pair made up of a sector and a level defines a slot where a dot can be placed. Finally, each dot is a circular feature whose radius is proportional to the radius of the level at which the dot is placed. Within this design the regular organization of the dots enables easy localization and, by properly populating each slot, it is possible to bind some information to the tag. In Fig. 1(e) a tag built with 43 sectors and just one level is shown. In Fig. 1(f) we add two more levels: note that the dot size decreases for the inner levels. We will explain in the following sections how this structure is also flexible and well suited to dealing with many scenarios.
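To make the slot structure concrete, the sketch below lays out dot centers for a given number of sectors and levels. It is only an illustration: the actual radii and dot-size ratios of RUNE-Tag are design parameters not specified here, so `ring_radius` and `level_gap` are assumed values.

```python
import numpy as np

def dot_positions(sectors=43, levels=3, ring_radius=1.0, level_gap=0.25):
    """Map each (sector, level) slot to a dot center on the marker plane.

    ring_radius and level_gap are illustrative parameters; the real marker
    fixes the ratio between each level's radius and the size of its dots.
    """
    slots = {}
    for level in range(levels):
        r = ring_radius - level * level_gap  # inner levels have smaller rings
        for sector in range(sectors):
            theta = 2.0 * np.pi * sector / sectors  # fixed angular positions
            slots[(sector, level)] = (r * np.cos(theta), r * np.sin(theta))
    return slots
```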
2.1. Fast and Robust Detection in Projective Images
Both the dots and the ideal rings on which they are disposed appear as ellipses under a general projective transform. Thus, the first step of the localization procedure is to try to locate the dots by finding all the ellipses in the scene. For this purpose we use the ellipse detector supplied by the OpenCV [1] library, but any other suitable technique would be fine. The dot candidates found at this stage can be considered the starting point for our algorithm.
sidered the starting point for our algorithm. A common ap-
proach would consists in the use of a RANSAC scheme on
features centers in order to locate the dots that belong to the
Total ellipses 10 50 100 500
Full (RANSAC) 252 2118760 75287520 >1010
Proposed method 45 1225 4950 124750
Figure 2. Number of maximum steps required for ellipse testing.
Figure 3. Steps of the ring detection: in (a) the feasible view directions $r_1$, $r_2$ are evaluated for each ellipse (with complexity $O(n)$); in (b) for each compatible pair of ellipses the feasible rings of radius $r$ are estimated (with complexity $O(n^2)$); in (c) the dot votes are counted, the code is recovered and the best candidate ring is accepted (figure best viewed in color).
Unfortunately, five points are needed to characterize an ellipse; thus the use of RANSAC (especially with many false positives) could lead to an intractable problem (see Fig. 2). Since the marker itself can contain more than one hundred dots, it is obvious that this approach is not feasible.
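The counts in Fig. 2 are the binomial coefficients $\binom{n}{5}$ (minimal samples for a RANSAC ellipse hypothesis) and $\binom{n}{2}$ (ellipse pairs tested by the proposed method), so they can be reproduced directly:

```python
from math import comb

for n in (10, 50, 100, 500):
    # 5-point RANSAC hypotheses vs. pairwise tests of the proposed method
    print(n, comb(n, 5), comb(n, 2))
```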
A possible alternative could be the use of some specialized Hough transform [23], but this solution would not work either, since the relatively low number of dots (coupled with the high dimensionality of the parameter space) hinders the ability of the bins to accumulate enough votes for a reliable detection. In order to cluster the dot candidates into coherent rings we need to exploit some additional information. Specifically, after the initial ellipse detection the full conic associated to each dot candidate is known. While from this single conic it is not possible to recover the full camera pose, we can nevertheless estimate a rotation that transforms the ellipse into a circle. Following [2], the first step for recovering such a rotation is to normalize the conic associated to the dot, obtaining:
associated to the dot, obtaining:
Q=
A B D
f
B C E
f
D
fE
fF
f2
where fis the focal length of the camera that captured the
scene and Ax2+ 2Bxy +Cy2+ 2Dx + 2Ey +F= 0
is the implicit equation of the ellipse found. The Qis then
decomposed via SVD:
$$Q = V \Lambda V^T \qquad \text{with} \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)$$
The required rotation can thus be computed (up to some parameters) as:

$$R = V \begin{pmatrix} g\cos\alpha & s_1\, g\sin\alpha & s_2\, h \\ \sin\alpha & -s_1 \cos\alpha & 0 \\ s_1 s_2\, h\cos\alpha & s_2\, h\sin\alpha & -s_1\, g \end{pmatrix} \qquad \text{with} \quad g = \sqrt{\frac{\lambda_2 - \lambda_3}{\lambda_1 - \lambda_3}}, \quad h = \sqrt{\frac{\lambda_1 - \lambda_2}{\lambda_1 - \lambda_3}}$$
Here $\alpha$ is an arbitrary rotation around the normal of the marker plane. Since we are not interested in the complete pose (which is not even possible to recover), we can just fix such angle to $0$ and obtain:

$$R = V \begin{pmatrix} g & 0 & s_2\, h \\ 0 & -s_1 & 0 \\ s_1 s_2\, h & 0 & -s_1\, g \end{pmatrix}$$
Finally, $s_1$ and $s_2$ are two free signs, which leave us with four possible rotation matrices, defining four different orientations. Two of these orientations can be eliminated, as they are discordant with the line of sight. The other two must be evaluated for each detected ellipse: we can define them as $r_1$ and $r_2$ (see Fig. 3(a)).
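A minimal NumPy sketch of this rectifying-rotation recovery follows. It uses an eigendecomposition in place of the SVD (legitimate here since $Q$ is symmetric), assumes the eigenvalues are sorted as $\lambda_1 \geq \lambda_2 \geq \lambda_3$, and the final visibility filter is an assumed stand-in for the paper's line-of-sight test.

```python
import numpy as np

def candidate_rotations(A, B, C, D, E, F, f):
    """Two feasible rectifying rotations r1, r2 for one detected ellipse,
    following the normalization and decomposition described above."""
    Q = np.array([[A,     B,     D / f],
                  [B,     C,     E / f],
                  [D / f, E / f, F / f**2]])
    # Q is symmetric, so its SVD coincides with an eigendecomposition
    # up to signs; sort eigenvalues so that l1 >= l2 >= l3.
    w, V = np.linalg.eigh(Q)
    idx = np.argsort(w)[::-1]
    l1, l2, l3 = w[idx]
    V = V[:, idx]
    g = np.sqrt((l2 - l3) / (l1 - l3))
    h = np.sqrt((l1 - l2) / (l1 - l3))
    rotations = []
    for s1 in (1.0, -1.0):
        for s2 in (1.0, -1.0):
            R = V @ np.array([[g,           0.0, s2 * h],
                              [0.0,         -s1, 0.0],
                              [s1 * s2 * h, 0.0, -s1 * g]])
            rotations.append(R)
    # Keep the two orientations whose plane normal faces the camera
    # (assumed stand-in for the line-of-sight test in the text).
    return [R for R in rotations if R[2, 2] > 0][:2]
```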
At this point it is possible to search for whole markers. For all pairs of detected ellipses, the rotations are combined to form four feasible rotation pairs. These are filtered by eliminating the pairs with an inner product above a fixed threshold; then the best pair of rotations is selected by applying the average of the rotations (as quaternions) to both ellipses and choosing the pair with the minimal distance between the mean radii of the rectified ellipses. The rationale of the filtering is to avoid choosing ellipses with discordant orientations (as the marker is planar), while the compatibility score takes into account that dots on the same ring should be exactly the same size on the rectified plane. If a good average rotation $r$ is found, then exactly two hypotheses about the ring location can be made. In fact, we know both the angle between the camera and marker planes and the size of the dots on the rectified plane. Since the ratio between the radius of each level and that of the dots it contains is known and constant (regardless of the level), we can estimate the radius of the ring. Finally, we can fit such a ring of known radius to the two dots examined and thus reproject the two possible solutions on the image plane (Fig. 3(b)).
Figure 4. Detection grid for a RUNE-Tag with multiple levels.
In this way we get two main advantages. The first is that at most $O(n^2)$ candidate rings have to be tested, where $n$ is the number of ellipses found (in Fig. 2 we can see that the problem becomes tractable). The second advantage is that, as opposed to many other approaches, the vote binning and the recovery of the code happen entirely in the image space, thus no picture rectification is required. Note that the counting of the dots happens by reprojecting the circular grid made by sectors and levels onto the image (Fig. 3(c)). Of course, if more than one ring is expected, we need to project the additional levels both inward and outward (see Fig. 4). This is due to the fact that, even if a correct ring is detected, we still do not know at which level it is located, since the size of the dots is scaled accordingly.
2.2. Marker Recognition and Coding Strategies
Once the candidate ellipses are found, we are left with two coupled problems: the first is that of assigning correspondences between the candidate ellipses and the circles in the marker or, equivalently, finding an alignment around the orthogonal axis of the marker; the second is that of recognizing which of several markers we are dealing with. The problem is further complicated by the fact that misdetections and occlusions make the matching non-exact. Here we chose to cast the problem into the solid and well-developed mathematical framework of coding theory, where the circle pattern corresponds to a code with clearly designed properties and error-correcting capabilities. In what follows we give a brief review of the theory needed to build and decode the markers. We refer to [13] for a more in-depth introduction to the field.
A block code of length $n$ over a set of symbols $S$ is a set $C \subseteq S^n$; the elements of a code are called codewords. The Hamming distance $d_H : S^n \times S^n \to \mathbb{N}$ is the number of symbols that differ between two codewords, i.e.,

$$d_H(u, v) = |\{\, i \;\text{s.t.}\; u_i \neq v_i,\; i = 1 \ldots n \,\}|$$

The Hamming distance of a code is the minimum distance between all the codewords: $d_H(C) = \min_{u,v \in C,\, u \neq v} d_H(u, v)$. A code with Hamming distance $d$ can detect $d - 1$ errors and correct $\lfloor (d-1)/2 \rfloor$ errors or $d - 1$ erasures (i.e., situations in which we have unreadable rather than wrong symbols).
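For concreteness, the Hamming distance of codewords and the distance of a small code can be computed directly; a brief sketch (helper names are ours):

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions in which two equal-length codewords differ."""
    return sum(a != b for a, b in zip(u, v))

def code_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming(u, v) for u, v in combinations(code, 2))

# A toy code over S = {0, 1} with distance 3: it detects 2 errors and
# corrects 1 error or 2 erasures.
assert code_distance([(0, 0, 0), (1, 1, 1)]) == 3
```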
Let $q \in \mathbb{N}$ be such that $q = p^k$ for a prime $p$ and an integer $k \geq 1$; we denote with $\mathbb{F}_q$ the field with $q$ elements. A linear code $C$ is a $k$-dimensional vector subspace of $(\mathbb{F}_q)^n$, where the symbols are taken over the field $\mathbb{F}_q$. A linear code of length $n$ and dimension $k$ has $q^k$ distinct codewords and is subject to the Singleton bound $d \leq n - k + 1$; thus, with a fixed code length $n$, higher error-correcting capabilities are paid for with a smaller number of available codewords.
In our setting we map the point patterns around the circle to a codeword but, since on a circle we do not have a starting position of the code, we have to take into account all cyclic shifts of a pattern. A linear code $C$ is called cyclic if any cyclic shift of a codeword is still a codeword, i.e.

$$(c_0, \ldots, c_{n-1}) \in C \;\Rightarrow\; (c_{n-1}, c_0, \ldots, c_{n-2}) \in C\,.$$

There is a bijection between the vectors of $(\mathbb{F}_q)^n$ and the residue classes of $\mathbb{F}_q[x]$ modulo division by $x^n - 1$:

$$v = (v_0, \ldots, v_{n-1}) \;\leftrightarrow\; v_0 + v_1 x + \cdots + v_{n-1} x^{n-1}\,.$$

Multiplying the polynomial form of a codeword by $x$ modulo $x^n - 1$ corresponds to a cyclic shift:

$$x\,(c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}) = c_{n-1} + c_0 x + \cdots + c_{n-2} x^{n-1}\,.$$

Further, $C$ is a cyclic code if and only if $C$ is an ideal of the quotient of the polynomial ring $\mathbb{F}_q[x]$ modulo division by $x^n - 1$. This means that all cyclic codes in polynomial form are multiples of a monic generator polynomial $g(x)$ which divides $x^n - 1$ in $\mathbb{F}_q[x]$. Thus, if $g(x)$ is a generator polynomial of degree $m$, all codewords can be obtained by mapping any polynomial $p(x) \in \mathbb{F}_q[x]$ of degree at most $n - m - 1$ into $p(x)\,g(x) \bmod x^n - 1$.
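The polynomial view translates directly into code. The sketch below (names are ours) multiplies coefficient vectors over $\mathbb{Z}_q$ modulo $x^n - 1$; encoding a message $p(x)$ is then just `polymul_mod(p, g, n, q)`, and multiplying by $x$ reproduces the cyclic shift.

```python
def polymul_mod(p, g, n, q):
    """Multiply polynomials p(x) g(x) over Z_q and reduce modulo x^n - 1.

    Coefficient lists are ordered from degree 0 upward; reducing modulo
    x^n - 1 simply wraps exponents around, since x^n = 1.
    """
    out = [0] * n
    for i, a in enumerate(p):
        for j, b in enumerate(g):
            out[(i + j) % n] = (out[(i + j) % n] + a * b) % q
    return out

# Multiplying by x performs one cyclic shift of the codeword.
c = [1, 0, 1, 1, 0]
assert polymul_mod(c, [0, 1], n=5, q=2) == [0, 1, 0, 1, 1]
```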
Using a cyclic code of distance $2e + 1$ guarantees that we can correct $e$ misdetections, regardless of the actual alignment of the patterns. Moreover, we can decode the marker used and recover the alignment at the same time. Since all the cyclic shifts of a codeword are codewords, we can group the codewords into cyclic equivalence classes, such that two codewords are in the same class if one can be obtained from the other with a cyclic shift. Clearly, the number of elements in a cyclic equivalence class divides $n$, so by choosing $n$ prime we only have classes where all the codewords are distinct, or classes composed of one element, i.e., constant codewords with $n$ repetitions of the same symbol. The latter group comprises at most $q$ codewords and can be easily eliminated. In our marker setting, the choice of the marker is encoded by the cyclic equivalence class, while the actual alignment of the circles can be obtained from the detected element within the class.
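To make the class/alignment split concrete, a small sketch (our own helper, not from the paper) picks a canonical representative per class: the class identifies the marker, and the shift that attains the representative recovers the angular alignment.

```python
def cyclic_class_and_shift(codeword):
    """Return the canonical (lexicographically smallest) rotation of a
    codeword and the shift index that produces it."""
    n = len(codeword)
    rotations = [tuple(codeword[k:] + codeword[:k]) for k in range(n)]
    best = min(range(n), key=lambda k: rotations[k])
    return rotations[best], best
```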
In this paper we restrict our analysis to the correction of random errors or erasures, but it is worth noting that cyclic codes have also been used to detect and correct burst errors, i.e. errors that are spatially coherent, as in the case of occlusions.
Figure 5. Evaluation of the accuracy in the camera pose estimation with respect to different scene conditions. The three plots report the angular pose error Δα [rad] (log scale) as a function of, respectively, the noise σ, the Gaussian blur window size [px], and the angle of view [rad], for RUNE-43, RUNE-129, ARToolkit and ARToolkitPlus. Examples of the detected features are shown for RUNE-129 (first image column) and ARToolkitPlus (second image column).
Specifically, we experiment with two distinct codes. The first code (RUNE-43) is formed of a single circular pattern of circles that can be present or absent in 43 different angular slots. In this situation we encode the pattern as a vector in $(\mathbb{Z}_2)^{43}$, where $\mathbb{Z}_2$ is the ring of remainder classes modulo 2. For this code we chose the generator polynomial

$$g(x) = (1 + x^2 + x^4 + x^7 + x^{10} + x^{12} + x^{14})\,(1 + x + x^3 + x^7 + x^{11} + x^{13} + x^{14})$$

which provides a cyclic code of dimension 15, giving 762 different markers (equivalence classes) with a minimum distance of 13, allowing us to correct up to 6 errors.
The second code (RUNE-129) is formed of three concentric patterns of circles in 43 different angular slots. In this situation we have 8 possible patterns for each angular slot. We hold out the pattern with no circles to detect erasures due to occlusions and encode the remaining 7 as a vector in $(\mathbb{Z}_7)^{43}$. For this code we chose the generator polynomial

$$\begin{aligned} g(x) ={} & (1 + 4x + x^2 + 6x^3 + x^4 + 4x^5 + x^6)\,(1 + 2x^2 + 2x^3 + 2x^4 + x^6) \\ & (1 + x + 3x^2 + 5x^3 + 3x^4 + x^5 + x^6)\,(1 + 5x + 5x^2 + 5x^4 + 5x^5 + x^6) \\ & (1 + 6x + 2x^3 + 6x^5 + x^6)\,(1 + 6x + 4x^2 + 3x^3 + 4x^4 + 6x^5 + x^6) \end{aligned}$$

providing a cyclic code of dimension 7, which gives 19152 different markers with a minimum distance of 30, allowing us to correct up to 14 errors, or 29 erasures, or any combination of $e$ errors and $c$ erasures such that $2e + c \leq 29$. For efficient algorithms to decode the patterns and correct the errors we refer to the literature [13].
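The stated parameters can be sanity-checked in a few lines: the two factor lists multiply out to generators of degree 28 and 36, hence dimensions $43 - 28 = 15$ and $43 - 36 = 7$; the marker counts follow from grouping the non-constant codewords into cyclic classes of size 43 (43 is prime), and the error/erasure trade-off from the bound $2e + c \leq d - 1$.

```python
n = 43  # angular slots; prime, so non-constant cyclic classes have size 43

# RUNE-43: binary code of dimension 15, minus the 2 constant codewords.
assert (2**15 - 2) % n == 0 and (2**15 - 2) // n == 762

# RUNE-129: code over Z_7 of dimension 7, minus the 7 constant codewords.
assert (7**7 - 7) % n == 0 and (7**7 - 7) // n == 19152

def correctable(d, errors, erasures):
    """A code of minimum distance d handles e errors and c erasures
    whenever 2e + c <= d - 1."""
    return 2 * errors + erasures <= d - 1

assert correctable(13, 6, 0)    # RUNE-43: up to 6 errors
assert correctable(30, 14, 0)   # RUNE-129: up to 14 errors...
assert correctable(30, 0, 29)   # ...or up to 29 erasures
```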
2.3. Estimation of the Camera Pose
By using the detected and labelled ellipses it is now possible to estimate the camera pose. Since the geometry of the original marker is known, any algorithm that solves the PnP problem can be used. In our tests we used the solvePnP function available in OpenCV.
Figure 6. Evaluation of the accuracy in the camera pose estimation of RUNE-Tag with respect to occlusion (left column) and illumination gradient (right column). The plots report the angular pose error Δα [rad] (log scale) as a function of, respectively, the percentage of occlusion and the gradient steepness, for RUNE-43 and RUNE-129.
Figure 7. Evaluation of the recognition time, respectively, when adding artificial false ellipses in the scene (left column) and with several markers (right column); the latter plot reports the time [sec.] against the number of tags (from 1 to 10) for RUNE-43 and RUNE-129.
However, it should be noted that, while the estimated ellipse centers can be good enough for the detection step, it may be reasonable to refine them in order to recover a more accurate pose. Since this is done only when a marker is found and recognized, we can indulge and dedicate a little more computational resources to this stage. In this paper we used the robust ellipse refinement presented in [17]. In addition to a more accurate localization, it could also be useful to correct the projective displacement of the ellipse centers. However, according to our tests, such correction gives in general no advantage and sometimes leads to slightly less accurate results. Finally, we also tried the direct method outlined in [9], but we obtained very unstable results, especially with small and skewed ellipses.
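A minimal sketch of this pose-recovery step follows, assuming the dots have already been detected, refined and labelled; the synthetic correspondences and camera intrinsics below are placeholders that only serve to make the call self-contained.

```python
import numpy as np
import cv2

# Known model: 43 dot centers on the marker plane (z = 0), unit ring.
n_dots = 43
ang = np.linspace(0.0, 2.0 * np.pi, n_dots, endpoint=False)
dots_3d = np.stack([np.cos(ang), np.sin(ang), np.zeros(n_dots)], axis=1)

K = np.array([[800.0,   0.0, 320.0],   # assumed pinhole intrinsics
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# In the real pipeline these come from the refined ellipse centers;
# here they are fabricated from a known pose just to be runnable.
rvec_gt, tvec_gt = np.array([0.1, -0.2, 0.05]), np.array([0.0, 0.0, 5.0])
dots_2d, _ = cv2.projectPoints(dots_3d, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec = cv2.solvePnP(dots_3d, dots_2d, K, None)
assert ok
```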
3. Experimental Validation
In this section the accuracy and speed of the RUNE-Tag fiducial markers are evaluated and compared with the results obtained by ARToolkit and ARToolkitPlus. Both tags with one level (RUNE-43) and three levels (RUNE-129) are tested. All the experiments have been performed on a typical desktop PC equipped with a 1.6 GHz Intel Core Duo processor. The accuracy of the recovered pose is measured as the angular difference between the ground-truth camera orientation and the pose obtained. Such ground truth is known since the test images are synthetically generated under different conditions of noise, illumination, viewing direction, etc. The implementations of ARToolkit and ARToolkitPlus used are the ones freely available at the respective websites. The real images are taken with a 640x480 CMOS webcam.
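The paper does not spell out how the angular difference is computed; one standard realization (our reading of the Δα measure used in the plots) is the angle of the relative rotation:

```python
import numpy as np

def angular_error(R_gt, R_est):
    """Angle (radians) of the relative rotation between the ground-truth
    and the estimated camera orientation."""
    R = R_gt.T @ R_est
    cos_theta = (np.trace(R) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```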
3.1. Accuracy and Baseline Comparisons
In Fig. 5 the accuracy of our markers is evaluated. In the first test, additive Gaussian noise was applied to images with an average view angle of 0.3 radians and no artificial blur. The performance of all methods gets worse with increasing levels of noise, and ARToolkitPlus, while in general more accurate than ARToolkit, breaks when dealing with noise with a standard deviation greater than 80 (pixel intensities go from 0 to 255). Both RUNE-43 and RUNE-129 always recover a more faithful pose. We think that this is mainly due to the larger number of correspondences used to solve the PnP problem. In fact, we can observe that in all the experiments RUNE-129 performs consistently better than RUNE-43. Unlike additive noise, Gaussian blur seems to have a more limited effect on all the techniques. This is mainly related to the fact that all of them perform a preliminary edge detection step, which in turn applies a convolution kernel; thus it is somewhat expected that an additional blur does not severely affect the marker localization. Finally, it is interesting to note that oblique angles lead to a higher accuracy (as long as the markers are still recognizable). This is explained by observing that the reprojection constraints become stronger as the angle of view increases. Overall, these experiments confirm that RUNE-Tag always outperforms the other two tested techniques by about one order of magnitude.
Figure 8. Some examples of behaviour in real videos with occlusion. In (a) and (b) an object is placed inside the marker and the setup is rotated. In (c) and (d) the pose is recovered after medium and severe occlusion.
In practical terms the improvement is not negligible: in fact, an error as low as $10^{-3}$ radians still produces a jitter of 1 millimeter when projected over a distance of 1 meter. While this is a reasonable performance for augmented reality applications, it can be unacceptable for obtaining precise contactless measures.
3.2. Resilience to Occlusion and Illumination
One of the main characteristics of RUNE-Tag is that it is very robust to occlusion. In Section 2.2 we observed that RUNE-129 can be used to distinguish between about 20,000 different tags and still be robust to occlusions as large as about 2/3 of the dots. By choosing different cyclic coding schemes it is possible to push this robustness even further, at the price of a lower number of available tags. In the first column of Fig. 6 we show how occlusion affects the accuracy of the pose estimation (i.e. how well the pose is estimated with fewer dots, regardless of the ability to recognize the marker). Although a linear decrease of the accuracy with respect to the occlusion can be observed, the precision is still quite reasonable even when most of the dots are not visible. In Fig. 9 we show the recognition rate of the two proposed designs with respect to the percentage of marker area occluded. In the second column of Fig. 6 the robustness to an illumination gradient is examined. The gradient itself is measured in units per pixel (i.e. the quantity added to each pixel value for each pixel of distance from the image center). Overall, the proposed methods are not affected very much by the illumination gradient and break only when it becomes very large (in our setup an illumination gradient of 1 implies that pixels are completely saturated at 255 pixels from the image center).
Occlusion    0%     10%    20%    50%    70%
RUNE-43      100%   69%    40%    0%     0%
RUNE-129     100%   100%   100%   100%   67%

Figure 9. Recognition rate of the two proposed marker configurations with respect to the percentage of area occluded.
3.3. Performance Evaluation
Our tag system is designed for improved accuracy and robustness rather than for high detection speed. This is quite apparent in Fig. 7, where we can see that the recognition can require from a minimum of about 15 ms (RUNE-43 with one tag and no noise) to a maximum of about 180 ms (RUNE-129 with 10 tags). By comparison, ARToolkitPlus is about an order of magnitude faster [22]. However, it should be noted that, despite being slower, the frame rates reachable by RUNE-Tag (from 60 to about 10 fps) can still be deemed usable even for real-time applications (in particular when few markers are viewed at the same time).
3.4. Behaviour with Real Images
In addition to the evaluation with synthetic images, we also performed some qualitative tests on real videos. In Fig. 8 some experiments with common occlusion scenarios are presented. In the first two shots an object is placed inside a RUNE-43 marker in a typical setup used for image-based shape reconstruction. In the following two frames a RUNE-129 marker is tested for its robustness to moderate and severe occlusion. Finally, in Fig. 10 an inherent shortcoming of our design is highlighted: the high density exhibited by the more packed markers may result in a failure of the ellipse detector when the tag is far away from the camera or very angled, causing the dots to become too small or blended.
Figure 10. Recognition fails when the marker is angled and far away from the camera and the ellipses blend together.
4. Conclusions
The proposed fiducial marker system exhibits several advantages over the current range of designs. It is very resistant to occlusion thanks to its code-theoretic design, and it offers very high accuracy in pose estimation. In fact, our experimental validation shows that the precision of the pose recovery can be about an order of magnitude higher than the current state-of-the-art. This advantage is maintained even with a significant level of artificial noise, blur, illumination gradient and with up to 70% of the features occluded. Further, the design of the marker itself is quite flexible, as it can be adapted to accommodate a larger number of different codes or a higher resilience to occlusion. In addition, the identity between the features to be detected and the pattern to be recognized leaves plenty of space in the marker interior for any additional payload or even for placing a physical object for reconstruction tasks. Finally, while slower than other techniques, this novel method is fast enough to be used in real-time applications. Of course these enhancements do not come without some drawbacks. Specifically, the severe packing of circular points can lead the ellipse detector to wrongly merge features at low resolution. This effectively reduces the maximum distance at which a target can be recognized. However, this limitation can easily be relieved by using a simpler marker, such as RUNE-43, which allows for a more extended range while still providing a satisfactory precision.
References
[1] G. Bradski and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 1st edition, 2008.
[2] Q. Chen, H. Wu, and T. Wada. Camera calibration with two arbitrary coplanar circles. In European Conference on Computer Vision - ECCV, 2004.
[3] Y. Cho, J. Lee, and U. Neumann. A multi-ring color fiducial system and a rule-based detection method for scalable fiducial-tracking augmented reality. In Proceedings of the International Workshop on Augmented Reality, 1998.
[4] D. Claus and A. W. Fitzgibbon. Reliable automatic calibration of a marker-based position tracking system. In IEEE Workshop on Applications of Computer Vision, 2005.
[5] M. Fiala. ARTag, a fiducial marker system using digital techniques. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Washington, DC, USA, 2005. IEEE Computer Society.
[6] M. Fiala. Designing highly reliable fiducial markers. IEEE Trans. Pattern Anal. Mach. Intell., 32(7), 2010.
[7] L. Gatrell, W. Hoff, and C. Sklair. Robust image features: Concentric contrasting circles and their image extraction. In Proc. of Cooperative Intelligent Robotics in Space, Washington, USA, 1991. SPIE.
[8] D. Q. Huynh. The cross ratio: A revisit to its probability density function. In Proceedings of the British Machine Vision Conference BMVC 2000, 2000.
[9] J. Kannala, M. Salo, and J. Heikkilä. Algorithms for computing a planar homography from conics in correspondence. In British Machine Vision Conference, 2006.
[10] H. Kato and M. Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality, Washington, DC, USA, 1999. IEEE Computer Society.
[11] V. A. Knyaz, H. O. Group, and R. V. Sibiryakov. The development of new coded targets for automated point identification and non-contact surface measurements. In 3D Surface Measurements, International Archives of Photogrammetry and Remote Sensing, 1998.
[12] R. van Liere and J. D. Mulder. Optical tracking using projective invariant marker pattern properties. In Proceedings of the IEEE Virtual Reality Conference. IEEE Press, 2003.
[13] J. H. van Lint. Introduction to Coding Theory. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1998.
[14] M. Maidi, J.-Y. Didier, F. Ababsa, and M. Mallem. A performance study for camera pose estimation using visual marker based tracking. Mach. Vision Appl., 21, 2010.
[15] J. Mooser, S. You, and U. Neumann. Tricodes: A barcode-like fiducial design for augmented reality media. In Multimedia and Expo, IEEE International Conference on, 2006.
[16] L. Naimark and E. Foxlin. Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker. In Proceedings of the 1st International Symposium on Mixed and Augmented Reality, ISMAR '02, Washington, DC, USA, 2002. IEEE Computer Society.
[17] J. Ouellet and P. Hebert. Precise ellipse estimation without contour point extraction. Mach. Vision Appl., 21, 2009.
[18] J. Rekimoto and Y. Ayatsuka. CyberCode: designing augmented reality environments with visual tags. In DARE '00: Proceedings of DARE 2000 on Designing Augmented Reality Environments, New York, NY, USA, 2000. ACM.
[19] L. Teixeira, M. Loaiza, A. Raposo, and M. Gattass. Augmented reality using projective invariant patterns. In Advances in Visual Computing, volume 5358 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2008.
[20] V. S. Tsonis, K. V. Chandrinos, and P. E. Trahanias. Landmark-based navigation using projective invariants. In Proceedings of the 1998 IEEE Intl. Conf. on Intelligent Robots and Systems, Victoria, Canada, 1998. IEEE Computer Society.
[21] A. van Rhijn and J. D. Mulder. Optical tracking using line pencil fiducials. In Proceedings of the Eurographics Symposium on Virtual Environments, 2004.
[22] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg. Real time detection and tracking for augmented reality on mobile phones. IEEE Transactions on Visualization and Computer Graphics, 99, 2010.
[23] X. Yu, H. W. Leong, C. Xu, and Q. Tian. A robust and accumulator-free ellipse Hough transform. In Proceedings of the 12th Annual ACM International Conference on Multimedia, New York, NY, USA, 2004. ACM.