
RUNE-Tag: a High Accuracy Fiducial Marker with Strong Occlusion Resilience

Filippo Bergamasco, Andrea Albarelli, Emanuele Rodolà and Andrea Torsello

Dipartimento di Scienze Ambientali, Informatica e Statistica

Università Ca' Foscari Venezia - via Torino, 155 - 30172 Venice, Italy

http://www.dais.unive.it

Abstract

Over the last decades ﬁducial markers have provided

widely adopted tools to add reliable model-based features

into an otherwise general scene. Given their central role

in many computer vision tasks, countless different solutions

have been proposed in the literature. Some designs are focused on the accuracy of the recovered camera pose with respect to the tag; others concentrate on reaching high detection speed or on recognizing a large number of distinct markers in the scene. In such a crowded area both the researcher and the practitioner are licensed to wonder if there

is any need to introduce yet another approach. Neverthe-

less, with this paper, we would like to present a general pur-

pose ﬁducial marker system that can be deemed to add some

valuable features to the pack. Speciﬁcally, by exploiting

the projective properties of a circular set of sizeable dots,

we propose a detection algorithm that is highly accurate.

Further, applying a dot pattern scheme derived from error-

correcting codes, allows for robustness with respect to very

large occlusions. In addition, the design of the marker itself

is ﬂexible enough to accommodate different requirements in

terms of pose accuracy and number of patterns. The overall

performance of the marker system is evaluated in an exten-

sive experimental section, where a comparison with a well-

known baseline technique is presented.

1. Introduction

A ﬁducial marker is, in its broadest deﬁnition, any arti-

ﬁcial object consistent with a known model that is placed

in a scene. At the current state-of-the-art such artifacts are

still only choice whenever a high level of precision and re-

peatability in image-based measurement is required. This

is, for instance, the case with accurate camera pose esti-

mation, 3D structure-from-motion or, more in general, any

ﬂavor of vision-driven dimensional assessment task. Of

course a deluge of approaches has been proposed in order to obtain a reasonable performance by relying only on natural features already present in the scene. To this end, several repeatable and distinctive interest point detection and matching techniques have been proposed over the

years. While in some scenarios such approaches can obtain

satisfactory results, they still suffer from shortcomings that

severely hinder their broader use. Speciﬁcally, the lack of a

well known model limits their usefulness in pose estimation

and, even when such a model can be inferred (for instance

by using bundle adjustment) its accuracy heavily depends

on the correctness of localization and matching. Moreover,

the availability and quality of natural features in a scene is

not guaranteed in general. Indeed, the surface smoothness

found in most man-made objects can easily lead to scenes

that are very poor in features. Finally, photometric inconsistencies due to reflective or translucent materials jeopardize the repeatability of the detected points. For these reasons, it

is not surprising that artiﬁcial ﬁducial tags continue to be

widely used and are still an active research topic. For prac-

tical purposes, most markers are crafted in such a way as to

be easily detected and recognized in images produced by a

pinhole-modeled camera. In this sense, their design leverages the projective invariants that characterize geometrical entities such as lines, planes and conics. It is reasonable to

believe that circular dots were among the ﬁrst shapes used.

In fact, circles appear as ellipses under projective transfor-

mations and the associated conic is invariant with respect

to the point of view of the camera. This allows both for

an easy detection and a quite straightforward rectiﬁcation

of the circle plane. In his seminal work Gatrell [7] pro-

poses to use a set of highly contrasted concentric circles

and to validate a candidate marker by exploiting the com-

patibility between the centroids of the ellipses found. By

alternating white and black circles a few bits of informa-

tion can be encoded in the marker itself. In [3] the con-

centric circle approach is enhanced by adding colors and

multiple scales. In [11] and [16] dedicated “data rings” are

added to the ﬁducial design. A set of four circles located

at the corner of a square is adopted by [4]: in this case an

identiﬁcation pattern is placed at the centroid of the four

dots in order to distinguish between different targets. This

ability to recognize the viewed markers is very important


(a) Concentric Circles (b) Intersense (c) ARToolkit (d) ARTag (e) RUNE-43 (f) RUNE-129

Figure 1. Some examples of ﬁducial markers that differ both for the detection technique and for the pattern used for recognition. In the

ﬁrst two, detection happens by ﬁnding ellipses and the coding is respectively held by the color of the rings in (a) and by the appearance

of the sectors in (b). The black square border enables detection in (c) and (d), but while ARToolkit uses image correlation to differentiate

markers, ARTag relies on error-correcting binary codes. Finally, in (e) and (f) we show two examples of RUNE-Tags.

for complex scenes where more than a single fiducial is required; furthermore, the availability of a coding scheme allows for an additional validation step and lowers the number of false positives. While coded patterns are widely used

(see for instance [18,5,15]) it is interesting to note that

many papers suggest the use of the cross ratio among de-

tected points [19, 20, 12] or lines [21]. A clear advantage of the cross ratio is that, being projectively invariant, the recognition can be made without the need of any rectification of the image. Unfortunately this comes at the price of a low

overall number of distinctively recognizable patterns. In

fact the cross ratio is a single scalar with a strongly non-

uniform distribution [8] and this limits the number of well-

spaced different values that can possibly be generated. Also

the projective invariance of lines is frequently used in the

design of ﬁducial markers. Almost invariably this feature

is exploited by detecting the border edges of a highly con-

trasted quadrilateral block. This happens, for instance, with

the very well known ARToolkit [10] system which is freely

available and adopted in countless virtual reality applica-

tions. Thanks to its easy detection and the high accuracy

that can be obtained in pose recovery [14], this solution is

retained in many recent approaches, such as ARTag [6] and

ARToolkitPlus [22]. These two latter methods replace the

recognition technique of ARToolkit, which is based on im-

age correlation, with a binary coded pattern (see Fig. 1).

The use of an error-correcting code makes the marker identification very robust; in fact, we can deem these designs the most successful from an applicative point of view.

In this paper we introduce a novel ﬁducial marker system

that takes advantage of the same basic features for detection

and recognition purposes. The marker is characterized by a

circular arrangement of dots at fixed angular positions in one

or more concentric rings. Within this design, the projec-

tive properties of both the atomic dots and the rings they

compose are exploited to make the processing fast and re-

liable. In the following section we describe the general na-

ture of our marker, the algorithm proposed for its detection

and the coding scheme to be used for robust recognition. In

the experimental section we validate the proposed approach

by comparing its performance with two widely used marker

systems and by testing its robustness under a wide range of

noise sources.

2. Rings of Unconnected Ellipses

The proposed tag is built by partitioning a disc in sev-

eral evenly distributed sectors. Each sector, in turn, can be

divided into a number of concentric rings, which we call

levels. Each pair made up of a sector and a level deﬁnes a

slot where a dot can be placed. Finally, each dot is a circu-

lar feature whose radius is proportional to the radius of the

level at which the dot is placed. Within this design the reg-

ular organization of the dots enables easy localization and,

by properly populating each slot, it is possible to bind some

information to the tag. In Fig. 1(e) a tag built with 43 sec-

tors and just one level is shown. In Fig. 1(f) we add two

more levels: note that the dot size decreases for the inner

levels. We will explain in the following sections how this

structure is also flexible and well suited to deal with many

scenarios.

2.1. Fast and Robust Detection in Projective Images

Both the dots and the ideal rings on which they are disposed appear as ellipses under a general projective transform.

Thus, the ﬁrst step of the localization procedure is to try

to locate the dots by ﬁnding all the ellipses in the scene.

For this purpose we use the ellipse detector supplied by the

OpenCV [1] library, but any other suitable technique would

be fine. The dot candidates found at this stage can be considered the starting point for our algorithm. A common approach would consist in the use of a RANSAC scheme on feature centers in order to locate the dots that belong to the

Total ellipses      10    50        100        500
Full (RANSAC)       252   2118760   75287520   >10^10
Proposed method     45    1225      4950       124750

Figure 2. Number of maximum steps required for ellipse testing.
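The gap between the two enumeration strategies in Fig. 2 is purely combinatorial: a general conic needs 5 points, while the proposed method only examines pairs. A minimal sketch reproducing the table (the helper names are ours, not from the paper):

```python
from math import comb

def ransac_hypotheses(n):
    """Hypotheses for fitting an ellipse by RANSAC: 5 points per conic."""
    return comb(n, 5)

def pairwise_hypotheses(n):
    """The proposed method only enumerates pairs of detected ellipses."""
    return comb(n, 2)

for n in (10, 50, 100, 500):
    print(n, ransac_hypotheses(n), pairwise_hypotheses(n))
```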


(a) Estimation of the feasible plane orientations (b) Candidate ring estimation (c) Dot vote counting

Figure 3. Steps of the ring detection: in (a) the feasible view directions are evaluated for each ellipse (with complexity O(n)), in (b) for each compatible pair of ellipses the feasible rings are estimated (with complexity O(n^2)), in (c) the dot votes are counted, the code is recovered and the best candidate ring is accepted (figure best viewed in color).

same marker (if any) and to separate them from false

positives. Unfortunately, ﬁve points are needed to charac-

terize an ellipse, thus the use of RANSAC (especially with

many false positives) could lead to an intractable problem

(see Fig. 2). Since the marker itself can contain more than

one hundred dots, it is obvious that this approach is not fea-

sible. A possible alternative could be the use of some spe-

cialized Hough Transform [23], but also this solution would

not work since the relatively low number of dots (coupled

with the high dimensionality of the parameter space) hin-

ders the ability of the bins to accumulate enough votes for a

reliable detection. In order to cluster dot candidates into coherent rings we need to exploit some additional information. Specifically, after the initial ellipse detection the full conic associated to each dot candidate is known. While from this single conic it is not possible to recover the full camera pose, we can nevertheless estimate a rotation that transforms the ellipse into a circle. Following [2], the first step for recovering such a rotation is to normalize the conic associated to the dot, obtaining:

$$ Q = \begin{pmatrix} A & B & -D/f \\ B & C & -E/f \\ -D/f & -E/f & -F/f^2 \end{pmatrix} $$

where f is the focal length of the camera that captured the scene and $Ax^2 + 2Bxy + Cy^2 + 2Dx + 2Ey + F = 0$ is the implicit equation of the ellipse found. Q is then decomposed via SVD:

$$ Q = V \Lambda V^T \quad \text{with} \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3) $$

The required rotation can thus be computed (up to some parameters) as:

$$ R = V \begin{pmatrix} g\cos\alpha & s_1 g\sin\alpha & s_2 h \\ \sin\alpha & -s_1\cos\alpha & 0 \\ s_1 s_2 h\cos\alpha & s_2 h\sin\alpha & -s_1 g \end{pmatrix} $$

with

$$ g = \sqrt{\frac{\lambda_2-\lambda_3}{\lambda_1-\lambda_3}}, \qquad h = \sqrt{\frac{\lambda_1-\lambda_2}{\lambda_1-\lambda_3}} $$

Here $\alpha$ is an arbitrary rotation around the normal of the marker plane. Since we are not interested in the complete pose (which is not even possible to recover) we can just fix such angle to 0 and obtain:

$$ R = V \begin{pmatrix} g & 0 & s_2 h \\ 0 & -s_1 & 0 \\ s_1 s_2 h & 0 & -s_1 g \end{pmatrix} $$
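A numpy sketch of this per-ellipse rectifying-rotation recovery may clarify the construction. It is our own illustration, not the paper's code: the function name is hypothetical, a symmetric eigendecomposition stands in for the SVD of the symmetric matrix Q, and an eigenvalue-ordering convention is assumed so that λ1 ≥ λ2 > 0 > λ3:

```python
import numpy as np

def candidate_rotations(A, B, C, D, E, F, f):
    """Rotations that rectify the image ellipse
    A x^2 + 2B xy + C y^2 + 2D x + 2E y + F = 0 back to a circle,
    given the focal length f (one matrix per sign choice s1, s2)."""
    Q = np.array([[A,      B,      -D / f],
                  [B,      C,      -E / f],
                  [-D / f, -E / f, -F / f**2]])
    lam, V = np.linalg.eigh(Q)            # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]         # reorder: lambda1 >= lambda2 >= lambda3
    lam, V = lam[order], V[:, order]
    if lam[1] < 0:                        # fix the arbitrary overall conic sign
        lam, V = -lam[::-1], V[:, ::-1]
    l1, l2, l3 = lam
    g = np.sqrt(max((l2 - l3) / (l1 - l3), 0.0))
    h = np.sqrt(max((l1 - l2) / (l1 - l3), 0.0))
    rots = []
    for s1 in (1, -1):                    # the two free signs of the paper
        for s2 in (1, -1):
            M = np.array([[g,           0,   s2 * h],
                          [0,           -s1, 0],
                          [s1 * s2 * h, 0,   -s1 * g]])
            rots.append(V @ M)
    return rots
```

For a fronto-parallel circle (e.g. x^2 + y^2 - 4 = 0) the recovered plane normal is, as expected, aligned with the optical axis.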

Finally, s1 and s2 are two free signs, which leave us with four possible rotation matrices, defining four different orientations. Two of these orientations can be eliminated, as they are discordant with the line of sight. The other two must be evaluated for each detected ellipse: we can define them as r1 and r2 (see Fig. 3(a)). At this point it is possible to search for whole markers. For all the pairs of detected ellipses the rotations are combined to form four feasible rotation pairs. These are filtered by eliminating the pairs with an inner product above a fixed threshold, and then the best pair of rotations is selected by applying the average of the rotations (as quaternions) to both ellipses and choosing the pair with the minimal distance between the mean radii of the rectified ellipses. The rationale of the filtering is to avoid choosing ellipses with discordant orientations (as the marker is planar), and the compatibility score takes into account that the dots on the same ring should be exactly the same size on the rectified plane. If a good average rotation r is found then exactly two hypotheses about the ring location can be made. In fact we know both the angle between camera and marker planes and the size of the dots on the rectified plane. Since the ratio between the radii of each level and the dots that it contains is known and constant (regardless of the level) we can estimate the radius of the ring. Finally we can fit such a ring of known radius to the two dots examined and thus reproject on the image plane the two possible solutions (Fig. 3(b)). In this way we get


Figure 4. Detection grid for a Rune-Tag with multiple levels

two main advantages. The first one is that at most O(n^2) candidate rings have to be tested, where n is the number of the ellipses found (in Fig. 2 we can see that the problem becomes tractable). The second advantage is that, as opposed to many other approaches, the vote binning and the recovery of the code happen entirely in the image space, thus no picture rectification is required. Note that the counting of the dots happens by reprojecting the circular grid made by sectors and levels on the image (Fig. 3(c)). Of course if more than one ring is expected we need to project the additional levels both inward and outward (see Fig. 4). This is due to the fact that even if a correct ring is detected we still do not know at which level it is located, since the ratio of the dots is scaled accordingly.

2.2. Marker Recognition and Coding Strategies

Once the candidate ellipses are found we are left with two coupled problems: the first is that of assigning correspondences between the candidate ellipses and the circles in the marker, or, equivalently, finding an alignment around the orthogonal axis of the marker; the second is that of recognizing which of several markers we are dealing with. The problem is further complicated by the fact that misdetections and occlusions make the matching non-exact.

Here we chose to cast the problem into the solid and well-

developed mathematical framework of coding theory where

the circle pattern corresponds to a code with clearly de-

signed properties and error-correcting capabilities. In what

follows we will give a brief review of the theory needed to

build and decode the markers. We refer to [13] for a more

in-depth introduction to the ﬁeld.

A block code of length n over a set of symbols S is a set $C \subset S^n$, and the elements of a code are called codewords. The Hamming distance $d_H : S^n \times S^n \to \mathbb{N}$ is the number of symbols that differ between two codewords, i.e.,

$$ d_H(u, v) = |\{ i \text{ s.t. } u_i \neq v_i,\ i = 1 \dots n \}| $$

The Hamming distance of a code is the minimum distance between all the codewords: $d_H(C) = \min_{u,v \in C} d_H(u, v)$. A code with Hamming distance d can detect d-1 errors and correct $\lfloor (d-1)/2 \rfloor$ errors or d-1 erasures (i.e., situations in which we have unreadable rather than wrong symbols).
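These definitions translate directly into code. A toy sketch of our own (not the paper's implementation):

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions in which two equal-length codewords differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

def code_distance(code):
    """Minimum pairwise Hamming distance over all distinct codewords."""
    return min(hamming(u, v) for u, v in combinations(code, 2))

# A 3-repetition code over {0,1}: distance 3, so it detects 2 errors
# and corrects floor((3-1)/2) = 1 error.
rep3 = ["000", "111"]
```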

Let $q \in \mathbb{N}$ such that $q = p^k$, for a prime p and an integer $k \geq 1$; we denote with $\mathbb{F}_q$ the field with q elements. A linear code C is a k-dimensional vector sub-space of $(\mathbb{F}_q)^n$, where the symbols are taken over the field $\mathbb{F}_q$. A linear code of length n and dimension k has $q^k$ distinct codewords and is subject to the singleton bound: $d \leq n - k + 1$; thus, with a fixed code length n, higher error-correcting capabilities are paid with a smaller number of available codewords.

In our setting we map the point patterns around the circle to a codeword, but, since on a circle we do not have a starting position of the code, we have to take into account all cyclic shifts of a pattern. A linear code C is called cyclic if any cyclic shift of a codeword is still a codeword, i.e.

$$ (c_0, \dots, c_{n-1}) \in C \Rightarrow (c_{n-1}, c_0, \dots, c_{n-2}) \in C. $$

There is a bijection between the vectors of $(\mathbb{F}_q)^n$ and the residue classes of $\mathbb{F}_q[x]$ modulo division by $x^n - 1$:

$$ v = (v_0, \dots, v_{n-1}) \Leftrightarrow v_0 + v_1 x + \dots + v_{n-1} x^{n-1}. $$

Multiplying the polynomial form of a codeword by x modulo $x^n - 1$ corresponds to a cyclic shift:

$$ x(c_0 + c_1 x + \dots + c_{n-1} x^{n-1}) = c_{n-1} + c_0 x + \dots + c_{n-2} x^{n-1}. $$

Further, C is a cyclic code if and only if C is an ideal of the quotient ring $\mathbb{F}_q[x]/(x^n - 1)$. This means that all cyclic codes in polynomial form are multiples of a monic generator polynomial g(x) which divides $x^n - 1$ in $\mathbb{F}_q[x]$. Thus, if g(x) is a generator polynomial of degree m, all codewords can be obtained by mapping a polynomial $p(x) \in \mathbb{F}_q[x]$ of degree at most $n - m - 1$ into $p(x)g(x) \bmod x^n - 1$.
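The shift-as-multiplication identity is easy to check numerically. A small sketch over F_q (our own helper, coefficients stored lowest degree first):

```python
def polymul_mod(c, m, q, n):
    """Multiply two coefficient vectors over F_q modulo x^n - 1."""
    out = [0] * n
    for i, a in enumerate(c):
        for j, b in enumerate(m):
            out[(i + j) % n] = (out[(i + j) % n] + a * b) % q
    return out

# Multiplying by x (coefficient vector [0, 1]) is exactly a cyclic shift.
c = [1, 0, 1, 1, 0]                     # c(x) = 1 + x^2 + x^3 in F_2[x]/(x^5 - 1)
shifted = polymul_mod(c, [0, 1], 2, 5)  # (c0,...,c4) -> (c4, c0,..., c3)
```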

Using a cyclic code of distance 2e+1 guarantees that we can correct e misdetections, regardless of the actual alignment of the patterns. Moreover, we can decode the marker used and recover the alignment at the same time. Since all the cyclic shifts are codewords, we can group the codewords into cyclic equivalence classes, such that two codewords are in the same class if one can be obtained from the other with a cyclic shift. Clearly, the number of elements in a cyclic equivalence class divides n, so by choosing n prime, we only have classes where all the codewords are distinct, or classes composed of one element, i.e., constant codewords with n repetitions of the same symbol. The latter group is composed of at most q codewords and can be easily eliminated. In our marker setting, the choice of the marker is encoded by the cyclic equivalence class, while the actual alignment of the circles can be obtained from the detected element within the class.
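The class-size argument can be illustrated on all binary words of a small prime length (a toy demonstration of ours, not restricted to codewords of any particular code): every class has either n distinct elements or is one of the constant words.

```python
def cyclic_classes(codewords):
    """Group words into cyclic equivalence classes."""
    seen, classes = set(), []
    for w in codewords:
        if w in seen:
            continue
        cls = {w[-i:] + w[:-i] for i in range(len(w))}  # all cyclic shifts
        seen |= cls
        classes.append(cls)
    return classes

# All 32 binary words of prime length n = 5: two constant-word classes
# of size 1 ("00000", "11111") and six classes of size 5.
words = [format(i, "05b") for i in range(32)]
classes = cyclic_classes(words)
```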

In this paper we are restricting our analysis to the cor-

rection of random errors or erasures, but it is worth noting

that cyclic codes have been used to detect and correct burst

errors, i.e. errors that are spatially coherent, as is the case with occlusions.


[Figure 5 plots: pose error Δα [rad] (log scale) versus noise σ, Gaussian blur window size [px], and angle of view [rad], for Rune-43, Rune-129, ARToolkit and ARToolkitPlus.]

Figure 5. Evaluation of the accuracy in the camera pose estimation with respect to different scene conditions. Examples of the detected features are shown for RUNE-129 (first image column) and ARToolkitPlus (second image column).

Specifically, we experiment with two distinct codes. The first code (RUNE-43) is formed of a single circular pattern of circles that can be present or absent in 43 different angular slots. In this situation we encode the pattern as a vector in $(\mathbb{Z}_2)^{43}$, where $\mathbb{Z}_2$ is the remainder class modulo 2. For this code we chose the generator polynomial

$$ g(x) = (1 + x^2 + x^4 + x^7 + x^{10} + x^{12} + x^{14})(1 + x + x^3 + x^7 + x^{11} + x^{13} + x^{14}) $$

which provides a cyclic code of dimension 15 giving 762 different markers (equivalence classes) with a minimum distance of 13, allowing us to correct up to 6 errors.
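The stated parameters can be cross-checked from the generator polynomial alone. A sketch of ours (assuming the printed coefficients are transcribed correctly): the product has degree 28, so the dimension is k = 43 - 28 = 15, and the 2^15 - 2 non-constant codewords split into (2^15 - 2)/43 = 762 cyclic classes.

```python
def gf2_mul(a, b):
    """Multiply two polynomials over F_2 (coefficient lists, low degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= x & y
    return out

def from_exponents(exps):
    """Build a F_2 coefficient list from the exponents with coefficient 1."""
    p = [0] * (max(exps) + 1)
    for e in exps:
        p[e] = 1
    return p

g1 = from_exponents([0, 2, 4, 7, 10, 12, 14])
g2 = from_exponents([0, 1, 3, 7, 11, 13, 14])
g = gf2_mul(g1, g2)

n = 43
k = n - (len(g) - 1)           # code dimension: length minus generator degree
markers = (2 ** k - 2) // n    # non-constant cyclic classes, as in the paper
```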

The second code (RUNE-129) is formed of three concentric patterns of circles in 43 different angular slots. In this situation we have 8 possible patterns for each angular slot. We hold out the pattern with no circles to detect erasures due to occlusions and we encode the remaining 7 as a vector in $(\mathbb{Z}_7)^{43}$. For this code we chose the generator polynomial

$$ g(x) = (1 + 4x + x^2 + 6x^3 + x^4 + 4x^5 + x^6)(1 + 2x^2 + 2x^3 + 2x^4 + x^6)(1 + x + 3x^2 + 5x^3 + 3x^4 + x^5 + x^6)(1 + 5x + 5x^2 + 5x^4 + 5x^5 + x^6)(1 + 6x + 2x^3 + 6x^5 + x^6)(1 + 6x + 4x^2 + 3x^3 + 4x^4 + 6x^5 + x^6) $$

providing a cyclic code of dimension 7 which gives 19152 different markers with a minimum distance of 30, allowing us to correct up to 14 errors, or 29 erasures, or any combination of e errors and c erasures such that $2e + c \leq 29$. For efficient algorithms to decode the patterns and correct the errors we refer to the literature [13].
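The same arithmetic cross-check works over F_7 (again assuming the printed factors are transcribed correctly): the six degree-6 factors give a generator of degree 36, hence k = 43 - 36 = 7, and (7^7 - 7)/43 = 19152 non-constant cyclic classes.

```python
def gf7_mul(a, b):
    """Multiply two polynomials over F_7 (coefficient lists, low degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % 7
    return out

# The six factors of the RUNE-129 generator, as printed in the paper.
factors = [
    [1, 4, 1, 6, 1, 4, 1],
    [1, 0, 2, 2, 2, 0, 1],
    [1, 1, 3, 5, 3, 1, 1],
    [1, 5, 5, 0, 5, 5, 1],
    [1, 6, 0, 2, 0, 6, 1],
    [1, 6, 4, 3, 4, 6, 1],
]
g = [1]
for f in factors:
    g = gf7_mul(g, f)

n = 43
k = n - (len(g) - 1)           # 43 - 36 = 7
markers = (7 ** k - 7) // n    # the paper's marker count
```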

2.3. Estimation of the Camera Pose

By using the detected and labelled ellipses it is now pos-

sible to estimate the camera pose. Since the geometry of the

original marker is known any algorithm that solves the PnP

problem can be used. In our tests we used the solvePnP function available from OpenCV. However, it should be noted


[Figure 6 plots: pose error Δα [rad] (log scale) versus occlusions [%] and gradient steepness, for Rune-43 and Rune-129.]

Figure 6. Evaluation of the accuracy in the camera pose estimation of RUNE-Tag with respect to occlusion (left column) and illumination gradient (right column).

[Figure 7 plots: recognition time [sec.] versus number of false ellipses and number of tags, for Rune-Tag 129 and Rune-Tag 43.]

Figure 7. Evaluation of the recognition time respectively when adding artificial false ellipses in the scene (left column) and with several markers (right column).

that, while the estimated ellipse centers can be good enough for the detection step, it could be reasonable to refine them in order to recover a more accurate pose. Since this is done only when a marker is found and recognized, we can afford to dedicate a little more computational resources to this stage. In this paper we used the robust ellipse refinement presented in [17]. In addition to a more accurate localization, it could be useful to also correct the projective displacement of the ellipse centers. However, according to our tests, such correction gives in general no advantage and sometimes leads to slightly less accurate results. Finally, we also tried the direct method outlined in [9], but we obtained very unstable results, especially with small and skewed ellipses.
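As a concrete illustration of this step, here is a numpy-only planar-pose sketch: a DLT homography from the labelled dot centers followed by pose extraction. It is a simplified stand-in of ours for the full PnP solver (the paper itself calls OpenCV's solvePnP, and the robust ellipse refinement of [17] is not reproduced here):

```python
import numpy as np

def pose_from_planar_points(obj_xy, img_xy, K):
    """Camera pose from >= 4 coplanar marker points (marker plane z = 0):
    DLT homography obj -> image, then R, t extracted from K^-1 H."""
    A = []
    for (X, Y), (u, v) in zip(obj_xy, img_xy):
        A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    # Null vector of the DLT system gives the homography up to scale.
    H = np.linalg.svd(np.asarray(A))[2][-1].reshape(3, 3)
    B = np.linalg.inv(K) @ H              # B ~ [r1 | r2 | t] up to scale
    s = 1.0 / np.linalg.norm(B[:, 0])
    if B[2, 2] < 0:                       # keep the marker in front of the camera
        s = -s
    r1, r2, t = s * B[:, 0], s * B[:, 1], s * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)           # re-orthonormalize the rotation
    return U @ Vt, t
```

With noise-free synthetic projections this recovers the ground-truth pose to numerical precision; with real detections the refined dot centers of [17] would be fed in instead.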

3. Experimental Validation

In this section the accuracy and speed of the Rune-Tag fiducial markers are evaluated and compared with the results obtained by ARToolkit and ARToolkitPlus. Both tags with one level (RUNE-43) and three levels (RUNE-129) are tested. All the experiments have been performed on a typical desktop PC equipped with a 1.6GHz Intel Core Duo processor. The accuracy of the recovered pose is measured as the angular difference between the ground truth camera orientation and the pose obtained. Such ground truth is known since the test images are synthetically generated under different conditions of noise, illumination, viewing direction, etc. The implementations of ARToolkit and ARToolkitPlus used are the ones freely available at the respective websites. The real images are taken with a 640x480 CMOS webcam.

3.1. Accuracy and Baseline Comparisons

In Fig. 5 the accuracy of our markers is evaluated. In the first test additive Gaussian noise was applied to images with an average view angle of 0.3 radians and no artificial blur added. The performance of all methods gets worse with increasing levels of noise, and ARToolkitPlus, while in general more accurate than ARToolkit, breaks when dealing with noise with a std. dev. greater than 80 (pixel intensities go from 0 to 255). Both RUNE-43 and RUNE-129 always recover a more faithful pose. We think that this is mainly due to the larger number of correspondences used to solve the PnP problem. In fact, we can observe that in all the experiments RUNE-129 performs consistently better than RUNE-43. Unlike additive noise, Gaussian blur seems to have a more limited effect on all the techniques. This is mainly related to the fact that all of them perform a preliminary edge detection step, which in turn applies a convolution kernel. Thus it is somewhat expected that an additional blur does not severely affect the marker localization. Finally, it is interesting to note that oblique angles lead to a higher accuracy (as long as the markers are still recognizable). This is explained by observing that the constraint of the reprojection increases with the angle of view. Overall these experiments confirm that Rune-Tag always outperforms the other two tested techniques by about one order of


Figure 8. Some examples of behaviour in real videos with occlusion. In (a) and (b) an object is placed inside the marker and the setup is rotated. In (c) and (d) the pose is recovered after medium and severe occlusion.

magnitude. In practical terms the improvement is not negligible: in fact, an error as low as 10^-3 radians still produces a jitter of 1 millimeter when projected over a distance of 1 meter. While this is a reasonable performance for augmented reality applications, it can be unacceptable for obtaining precise contactless measures.

3.2. Resilience to Occlusion and Illumination

One of the main characteristics of Rune-Tag is that it is very robust to occlusion. In section 2.2 we observed that RUNE-129 can be used to distinguish between about 20,000 different tags and still be robust to occlusions as large as about 2/3 of the dots. By choosing different cyclic coding schemes it is possible to push this robustness even further, at the price of a lower number of available tags. In the first column of Fig. 6 we show how occlusion affects the accuracy of the pose estimation (i.e. how well the pose is estimated with fewer dots, regardless of the ability to recognize the marker). Although a linear decrease of the accuracy with respect to the occlusion can be observed, the precision is still quite reasonable even when most of the dots are not visible. In Fig. 9 we show the recognition rate of the two proposed designs with respect to the percentage of marker area occluded. In the second column of Fig. 6 the robustness to illumination gradient is examined. The gradient itself is measured in units per pixel (i.e. the quantity to add to each pixel value for each pixel of distance from the image center). Overall, the proposed methods are not affected very much by the illumination gradient and break only when it becomes very large (in our setup an illumination gradient of 1 implies that pixels are completely saturated at 255 pixels from the image center).

Occlusion   0%     10%    20%    50%    70%
RUNE-43     100%   69%    40%    0%     0%
RUNE-129    100%   100%   100%   100%   67%

Figure 9. Recognition rate of the two proposed marker configurations with respect to the percentage of area occluded.

3.3. Performance Evaluation

Our tag system is designed for improved accuracy and robustness rather than for high detection speed. This is quite apparent in Fig. 7, where we can see that the recognition could require from a minimum of about 15 ms (RUNE-43 with one tag and no noise) to a maximum of about 180 ms (RUNE-129 with 10 tags). By comparison, ARToolkitPlus is about an order of magnitude faster [22]. However, it should be noted that, despite being slower, the frame rates reachable by Rune-Tag (from 60 to about 10 fps) can still be deemed usable even for real-time applications (in particular when few markers are viewed at the same time).

3.4. Behaviour with Real Images

In addition to the evaluation with synthetic images we also performed some qualitative tests on real videos. In Fig. 8 some experiments with common occlusion scenarios are presented. In the first two shots an object is placed inside a RUNE-43 marker in a typical setup used for image-based shape reconstruction. In the following two frames a RUNE-129 marker is tested for its robustness to moderate and severe occlusion. At last, in Fig. 10 an inherent shortcoming of our design is highlighted. The high density exhibited by the more packed markers may result in a failure of the ellipse detector when the tag is far away from the camera or very angled, causing the dots to become too small or blended.

Figure 10. Recognition fails when the marker is angled and far away from the camera and the ellipses blend together.


4. Conclusions

The proposed fiducial marker system exhibits several advantages over the current range of designs. It is very resistant to occlusion thanks to its code-theoretic design, and it offers very high accuracy in pose estimation. In fact, our experimental validation shows that the precision of the pose recovery can be about an order of magnitude higher than the current state-of-the-art. This advantage is maintained even with a significant level of artificial noise, blur, illumination gradient and with up to 70% of the features occluded. Further, the design of the marker itself is quite flexible, as it can be adapted to accommodate a larger number of different codes or a higher resilience to occlusion. In addition, the identity between the features to be detected and the pattern to be recognized leaves plenty of space in the marker interior for any additional payload or even for placing a physical object for reconstruction tasks. Finally, while slower than other techniques, this novel method is fast enough to be used in real-time applications. Of course these enhancements do not come without some drawbacks. Specifically, the severe packing of circular points can lead the ellipse detector to wrongly merge features at low resolution. This effectively reduces the maximum distance at which a target can be recognized. However, this limitation can easily be relieved by using a simpler marker, such as RUNE-43, which allows for a more extended range while still providing a satisfactory precision.

References

[1] G. Bradski and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 1st edition, 2008.

[2] Q. Chen, H. Wu, and T. Wada. Camera calibration with two arbitrary coplanar circles. In European Conference on Computer Vision - ECCV, 2004.

[3] Y. Cho, J. Lee, and U. Neumann. A multi-ring color fiducial system and a rule-based detection method for scalable fiducial-tracking augmented reality. In Proceedings of International Workshop on Augmented Reality, 1998.

[4] D. Claus and A. W. Fitzgibbon. Reliable automatic calibration of a marker-based position tracking system. In IEEE Workshop on Applications of Computer Vision, 2005.

[5] M. Fiala. ARTag, a fiducial marker system using digital techniques. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Washington, DC, USA, 2005. IEEE Computer Society.

[6] M. Fiala. Designing highly reliable fiducial markers. IEEE Trans. Pattern Anal. Mach. Intell., 32(7), 2010.

[7] L. Gatrell, W. Hoff, and C. Sklair. Robust image features: Concentric contrasting circles and their image extraction. In Proc. of Cooperative Intelligent Robotics in Space, Washington, USA, 1991. SPIE.

[8] D. Q. Huynh. The cross ratio: A revisit to its probability density function. In Proceedings of the British Machine Vision Conference BMVC 2000, 2000.

[9] J. Kannala, M. Salo, and J. Heikkilä. Algorithms for computing a planar homography from conics in correspondence. In British Machine Vision Conference, 2006.

[10] H. Kato and M. Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality, Washington, DC, USA, 1999. IEEE Computer Society.

[11] V. A. Knyaz, H. O. Group, and R. V. Sibiryakov. The development of new coded targets for automated point identification and non-contact surface measurements. In 3D Surface Measurements, International Archives of Photogrammetry and Remote Sensing, 1998.

[12] R. V. Liere and J. D. Mulder. Optical tracking using projective invariant marker pattern properties. In Proceedings of the IEEE Virtual Reality Conference. IEEE Press, 2003.

[13] J. H. V. Lint. Introduction to Coding Theory. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1998.

[14] M. Maidi, J.-Y. Didier, F. Ababsa, and M. Mallem. A performance study for camera pose estimation using visual marker based tracking. Mach. Vision Appl., 21, 2010.

[15] J. Mooser, S. You, and U. Neumann. Tricodes: A barcode-like fiducial design for augmented reality media. In Multimedia and Expo, IEEE International Conference on, 2006.

[16] L. Naimark and E. Foxlin. Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker. In Proceedings of the 1st International Symposium on Mixed and Augmented Reality, ISMAR '02, Washington, DC, USA, 2002. IEEE Computer Society.

[17] J. Ouellet and P. Hebert. Precise ellipse estimation without contour point extraction. Mach. Vision Appl., 21, 2009.

[18] J. Rekimoto and Y. Ayatsuka. CyberCode: designing augmented reality environments with visual tags. In DARE '00: Proceedings of DARE 2000 on Designing augmented reality environments, New York, NY, USA, 2000. ACM.

[19] L. Teixeira, M. Loaiza, A. Raposo, and M. Gattass. Augmented reality using projective invariant patterns. In Advances in Visual Computing, volume 5358 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2008.

[20] V. S. Tsonisp, K. V. Ch, and P. E. Trahaniaslj. Landmark-based navigation using projective invariants. In Proceedings of the 1998 IEEE Intl. Conf. on Intelligent Robots and Systems, Victoria, Canada, 1998. IEEE Computer Society.

[21] A. van Rhijn and J. D. Mulder. Optical tracking using line pencil fiducials. In Proceedings of the Eurographics Symposium on Virtual Environments, 2004.

[22] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg. Real time detection and tracking for augmented reality on mobile phones. IEEE Transactions on Visualization and Computer Graphics, 99, 2010.

[23] X. Yu, H. W. Leong, C. Xu, and Q. Tian. A robust and accumulator-free ellipse Hough transform. In Proceedings of the 12th annual ACM international conference on Multimedia, New York, NY, USA, 2004. ACM.