A Self-localization System with Global Error Reduction and Online Map-Building Capabilities

Karam Shaya, Aaron Mavrinac, Jose Luis Alarcon Herrera, and Xiang Chen

Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ontario, Canada
{shayak,mavrin1,alarconj,xchen}@uwindsor.ca

Abstract. An economical self-localization system which uses a monocular camera and a set of artificial landmarks is presented herein. The system represents the surrounding environment as a topological graph where each node corresponds to an artificial landmark and each edge corresponds to a relative pose between two landmarks. The edges are weighted based on an error metric (related to pose uncertainty), and a shortest path algorithm is applied to the map to compute the path corresponding to the least aggregate weight. This path is used to localize the camera with respect to a global coordinate system whose origin lies on an arbitrary reference landmark (i.e., the destination node of the path). The proposed system does not require a preliminary training process, as it builds and updates the map online. Experimental results demonstrate the performance of the system in reducing the global error associated with large-scale localization.

1 Introduction

With the growing demand for autonomous robots in industrial, medical, domestic, and other domains, a large portion of research in the robotics industry has been geared toward the development and improvement of localization systems. For the purposes of this paper, existing localization systems in the literature will be divided into two categories: those that obtain their data from multiple sensors (e.g., [8], [1], [5]) and those that obtain them from a single sensor (e.g., [11], [15], [16]). The former type of system takes a sensor fusion approach. One major advantage of sensor fusion is the availability of multiple sources of data, through which the robot may verify the readings of its individual sensors and reduce the overall error of its pose estimates. The disadvantages of multiple sensors are added complexity (in the localization algorithm and hardware design), larger form factor, and increased cost. Conversely, systems that use single sensors tend to be simpler, smaller, and less expensive; however, they do not have the redundancy and fusion of multiple independent sensor measurements and must therefore use internal methods to reduce estimation error. The focus of our work is on reducing the estimation error of a single-sensor localization system.

This research is supported in part by an NSERC Discovery Grant to Prof. Xiang Chen.

C.-Y. Su, S. Rakheja, H. Liu (Eds.): ICIRA 2012, Part III, LNAI 7508, pp. 13–22, 2012.
© Springer-Verlag Berlin Heidelberg 2012


Compared to the sensing modalities used in other solutions (e.g., odometry, sonar, and laser), two-dimensional images provide a robot's localization algorithm with more data about the environment [9]. They can be used by a localization algorithm to detect, identify, and estimate the pose of objects in a scene.

The nature of the map-building process of visual localization yields two particular types of pose estimation errors: local error – which originates from error and noise in image capture and affects pose estimates made with respect to coordinate systems in the image – and global error – which arises from the accumulation of local error and affects pose estimates made with respect to a global coordinate system that may not necessarily be in the image. Due to the influence local error has on global error, reducing the former results in a reduction of the latter. This may be achieved, for example, through markers (i.e., artificial landmarks) that can be accurately detected and discerned to help keep local error at a minimum [7], [4], [10].

In this paper, the problem will be approached from a different perspective. While local error reduction can be characterized as a low-level means of reducing global error, the proposed method will take a more direct approach by representing the surrounding environment as a mathematical graph whose edge weights reflect the effects of local error. Graphical representation of similar vision-based pose estimation problems has previously been applied to such areas as multiview registration of 3D scenes [14] and large-scale extrinsic calibration of camera networks [3]. By applying a shortest path algorithm, the graph can be optimized to yield the paths of minimum global error (i.e., accumulated local error). To the best of the authors' knowledge, the minimization of global error in a localization system using the graphical approach outlined herein has not been presented in the literature.

The remainder of this paper is organized as follows: Section 2 defines the preliminary concepts relating to pose composition, graphs, and the shortest path algorithm; Section 3 states the necessary assumptions and presents the proposed self-localization system's method and algorithm; Section 4 demonstrates the performance of the system in reducing the effect of global error; and Section 5 concludes the paper by providing reflections on the results.

2 Preliminaries

2.1 Pose Composition

A pose Pαβ is a rigid three-dimensional Euclidean transformation from the coordinate system of object α to the coordinate system of object β. This may be referred to as the pose of object α with respect to object β.

The inverse of pose Pαβ may be denoted Pαβ⁻¹ or Pβα. The former notation will be used here to emphasize that Pαβ is the available direct estimate.

Successive pose transformations may be composed into a single pose:

Pαγ(p) = (Pαβ ∘ Pβγ)(p) = Pβγ(Pαβ(p))   (1)


Note that a left-composition convention is used to better illuminate the sequence of pose transformations.

The details of pose inversion and composition vary depending on the representation used. The reader is directed to any of the numerous texts on Euclidean geometry for a treatment appropriate to his or her working representation.

2.2 Relative Pose Calculation

Referring to Fig. 1, the relative pose transformation of a marker α with respect to another marker β is determined through pose composition to be

Pαβ = Pαc ∘ Pβc⁻¹   (2)

where Pαc and Pβc are the poses of α and β with respect to the camera c (assumed to be available). Note that this calculation is made possible by the fact that both markers are in the camera's FOV (shaded area in Fig. 1) at the same time; more specifically, pose estimates are taken from the same captured camera frame.

Fig. 1. Relative Pose Calculation – the camera can calculate the relative pose transformation between multiple markers by capturing them concurrently in its FOV
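The compositions of Eqs. (1) and (2) can be sketched in code. The following is a minimal illustration in 2D (the paper's poses are full 3D Euclidean transforms; 2D keeps the sketch short), with hypothetical helper names; a pose (x, y, θ) rotates a point by θ and then translates it by (x, y).

```python
import math

# Minimal 2D sketch of pose composition and inversion. A pose is a tuple
# (x, y, theta): rotate a point by theta, then translate by (x, y).

def apply_pose(P, p):
    """Map a point p from frame alpha to frame beta via P = P_ab."""
    x, y, th = P
    px, py = p
    c, s = math.cos(th), math.sin(th)
    return (c * px - s * py + x, s * px + c * py + y)

def compose(P_ab, P_bg):
    """Left-composition per Eq. (1): (P_ab o P_bg)(p) = P_bg(P_ab(p))."""
    xa, ya, tha = P_ab
    xb, yb, thb = P_bg
    c, s = math.cos(thb), math.sin(thb)
    return (c * xa - s * ya + xb, s * xa + c * ya + yb, tha + thb)

def invert(P):
    """Inverse pose: compose(P, invert(P)) is the identity transform."""
    x, y, th = P
    c, s = math.cos(th), math.sin(th)
    # Translation of the inverse is -R(-theta) * t.
    return (-(c * x + s * y), -(-s * x + c * y), -th)

def relative_pose(P_ac, P_bc):
    """Eq. (2): pose of marker alpha w.r.t. marker beta, from one frame."""
    return compose(P_ac, invert(P_bc))
```

Note that both camera-frame poses must come from the same captured frame, exactly as the text requires.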

2.3 Marker Graph

The marker graph (based on the calibration graph introduced by Mavrinac et al. [12]) is a method of representing a set of markers as a topological map. It is a weighted, undirected graph GM = (M, EM, WM), where M is the set of detected markers in the system, EM is a set of edges, and WM is the set of weights corresponding to the edges in EM. The existence of an edge {α, β} ∈ EM indicates that a relative pose transformation from marker α to marker β (or vice versa) is available.

Since it is trivial to invert a pose, the availability of Pαβ implies availability of Pβα. The edge weight (wαβ ∈ ℝ⁺) ∈ WM is the estimation uncertainty of Pαβ.

A path p = ⟨α, …, β⟩ in GM, from node α to node β, represents a sequence of pose transformations which may be composed to yield Pαβ. If p = ⟨v1, v2, …, vn⟩,

P1,n = P1,2 ∘ P2,3 ∘ ··· ∘ Pn−1,n   (3)

where Pi,j is the pose transformation from vi to vj. If any Pi,j is not available, Pi,j = Pj,i⁻¹. The aggregate error associated with this pose is

w1,n = ∑ₖ₌₂ⁿ wk−1,k   (4)

which is the length of path p.
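A marker graph and the path length of Eq. (4) can be sketched with a plain adjacency dictionary; the node names and weight values below are illustrative, not from the paper.

```python
# Sketch of a marker graph G_M as an undirected weighted adjacency dict.
graph = {}

def add_edge(graph, a, b, w):
    """Record edge {a, b} with weight w (symmetric, since poses invert)."""
    graph.setdefault(a, {})[b] = w
    graph.setdefault(b, {})[a] = w

def path_weight(graph, path):
    """Aggregate error of Eq. (4): sum of edge weights along the path."""
    return sum(graph[path[k - 1]][path[k]] for k in range(1, len(path)))

add_edge(graph, "W", "X", 2.0)
add_edge(graph, "X", "Y", 1.5)
add_edge(graph, "Y", "R", 0.5)
```

Storing each edge under both endpoints mirrors the fact that the availability of Pαβ implies the availability of Pβα with the same uncertainty.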

2.4 Localization Graph

The localization graph is essentially a marker graph that includes the camera c as an additional node. It is a weighted, undirected graph GL = (L, EL, WL), where L = {c} ∪ M, EL is the set of edges between nodes in L, and WL is, again, the set of associated edge weights. The localization graph is incrementally updated as the camera c moves through the environment. Note that GL ⊃ GM.

2.5 Shortest Path Algorithm

Shortest path algorithms can be applied to graphs to find the minimum topological path between two nodes. The famous shortest path algorithm by Dijkstra [6] solves the single-source shortest path problem for undirected graphs with non-negative edge weights. Since globally localizing a camera involves obtaining its pose with respect to a single source (i.e., the global coordinate system), Dijkstra's algorithm is appropriate for computing the shortest path of edges from said source to the camera. It returns the path yielding the minimum aggregate error defined in (4).
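Dijkstra's algorithm over such a weighted graph can be sketched with the standard library's heap; this is a minimal illustration, not the authors' implementation, and the adjacency-dictionary format is an assumption of the sketch.

```python
import heapq

def dijkstra(graph, source, target):
    """Minimum-aggregate-weight path from source to target.

    graph: dict mapping node -> {neighbour: weight}, weights non-negative.
    Returns the path as a list of nodes, or None if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    # Walk the predecessor chain back from the target.
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

With non-negative weights, the first time the target is popped its tentative distance is final, which is why the loop can break there.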

3 Global Localization

3.1 Assumptions

Internal Calibration and Pose Estimation. It is assumed that there exists some means by which a camera may estimate, from a single view, its relative three-dimensional pose with respect to a calibration target of known structure [17], [2]. This normally implies that the camera is internally calibrated.

Marker Constraints. It is assumed that (i) if a marker m ∈ M is connected to an edge, it remains fixed in its position, and (ii) the selected reference node R corresponds to a marker that is available and detectable in the environment.

Map Updates. It is assumed that any operation that updates GM simultaneously updates GL, and vice versa.


3.2 Problem Deﬁnition

The problem of global localization using computer vision is formalized as follows: Given a monocular camera c, a set of markers M, and an arbitrary global reference frame R ∈ M, find PcR as c traverses the environment.

Let us further define a set of markers V which are in the camera's current FOV (e.g., the shaded area in Fig. 1). Then, it can be noted that PcR may either be obtained directly from R (when R ∈ V) or indirectly from another marker v ∈ V (when R ∉ V), assuming there exists a path in GL from c to R.

It is additionally desirable to decrease the global error by using the path p yielding the minimum aggregate error (as defined in (4)) for each estimated PcR.

3.3 Self-localization Method

The method will be explained with the aid of an example. Suppose it is desired to find the pose of a monocular camera c within an environment consisting of markers M = {W, X, Y, Z, R}, as shown in Fig. 2. In this case, R is selected as the global reference frame, so the problem is to find PcR. As mentioned previously, there are two ways of finding PcR: either directly through R (when R is in the FOV) or indirectly through the intermediate markers W, X, Y, Z (when R is not in the FOV).

As shown in Fig. 3, the localization graph is connected, assuming that the direct pose estimates in Fig. 2 are all available. Thus, the camera c can be localized with respect to the common reference frame R using any of the markers in the map.

The minimum requirement to achieve global localization is that there must exist a path from the current position of c to the reference frame R in GL. Additional edges may yield shorter paths (i.e., pose compositions with lower aggregate error). The positioning of the set of markers M should be chosen appropriately. Note that there is no disadvantage, aside from the additional effort of positioning and obtaining pose estimates, to increasing the size of M.

Fig. 2. Camera Pose Estimation – arrows indicate the direct pose estimate of one object (camera or marker) with respect to another, where the pose of the object on the arrow's tail is given with respect to that on the arrow's head


In this example, direct pose estimates PWc, PWX, PYX, PZX, PYZ, PYR, and PZR are obtained, along with their respective pose uncertainties. The availability of direct pose estimates is encapsulated in the localization graph of Fig. 3.

Fig. 3. Localization Graph for System in Fig. 2 – edge weights represent pose estimation uncertainty values

The solution is obtained through composition of the estimated poses, according to (3), where the shortest paths are computed using Dijkstra's algorithm or similar. As an example, suppose the shortest path from c to R in GL is ⟨c, W, X, Y, Z, R⟩. Then,

PcR = PWc⁻¹ ∘ PWX ∘ PYX⁻¹ ∘ PYZ ∘ PZR   (5)

as per (3). The associated aggregate error is wcR = wWc + wWX + wYX + wYZ + wZR.
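Composing the stored estimates along a shortest path, with the fall-back Pi,j = Pj,i⁻¹ when only the reverse estimate is held (cf. Section 2.3), can be sketched as follows. Poses are again simplified to 2D tuples (x, y, θ) for brevity, and the dictionary layout is an assumption of the sketch.

```python
import math

# Sketch of composing pose estimates along a path, cf. Eqs. (3) and (5).
# Estimates are stored in a dict keyed by ordered pairs (i, j); when only
# the reverse estimate (j, i) exists, its inverse is used instead.

def compose(P_ab, P_bg):
    """Left-composition of 2D poses (x, y, theta)."""
    xa, ya, tha = P_ab
    xb, yb, thb = P_bg
    c, s = math.cos(thb), math.sin(thb)
    return (c * xa - s * ya + xb, s * xa + c * ya + yb, tha + thb)

def invert(P):
    """Inverse of a 2D pose (x, y, theta)."""
    x, y, th = P
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), -(-s * x + c * y), -th)

def pose_along_path(poses, path):
    """Compose P_{1,n} = P_{1,2} o ... o P_{n-1,n} along the given path."""
    total = None
    for i, j in zip(path, path[1:]):
        P = poses.get((i, j))
        if P is None:                  # only the reverse estimate exists
            P = invert(poses[(j, i)])
        total = P if total is None else compose(total, P)
    return total
```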

Map Updating. When there are multiple markers in the FOV, the relative poses between all possible pairs of markers in the frame are calculated, and connecting edges (with associated weights) are created between them. If in a subsequent frame an edge is re-detected and is found to have a lower weight than the existing edge, its relative pose transformation and weight overwrite those corresponding to the existing edge. In this way, the edges of the graph maintain the minimum weights (and thus the relative poses with the least uncertainty) at all times.
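The overwrite-only-if-better rule can be sketched in a few lines; the edge store and names below are illustrative, and a real implementation would keep the oriented pose and its inverse rather than a single value per edge.

```python
# Sketch of the map-update rule: an edge's pose and weight are replaced
# only when a re-detection in a later frame has lower uncertainty.

def update_edge(edges, a, b, new_pose, new_weight):
    """edges maps frozenset({a, b}) -> (pose, weight); keep the minimum."""
    key = frozenset((a, b))
    old = edges.get(key)
    if old is None or new_weight < old[1]:
        edges[key] = (new_pose, new_weight)

edges = {}
update_edge(edges, "W", "X", "P_WX_v1", 3.0)   # first detection: stored
update_edge(edges, "W", "X", "P_WX_v2", 4.5)   # higher weight: ignored
update_edge(edges, "X", "W", "P_WX_v3", 2.0)   # lower weight: overwrites
```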

3.4 Self-localization Algorithm

In the formal expression of the algorithm (Algorithm 1), let primed (′) variables represent calculations made in the current frame (e.g., P′ij represents the relative pose between markers i and j as calculated from the current frame). Three previously undefined functions are used in this algorithm: the first, calcw({α, β}), calculates the weight of the edge connecting nodes α and β based on an appropriately derived error metric; the second, con(G, s, d), returns True if there exists a path from s to d in G; the third, sp(G, m, R), returns the shortest path ps from m to R in G (using Dijkstra's algorithm) in the form defined in Section 2.3. The boolean variable n will be used to indicate that a map update has occurred.


Algorithm 1. Proposed Self-Localization Algorithm (Finding P′cR)

1: M ← {R}
2: EM, WM, V ← ∅
3: n ← False
4: loop
5:   Capture frame
6:   if V ≠ ∅ then
7:     M ← M ∪ V
8:     if |V| > 1 then
9:       for all {vi, vj} ∈ V² do
10:        P′ij ← P′ic ∘ P′jc⁻¹
11:        P′ji ← P′ij⁻¹
12:        if {vi, vj} ∉ EM then
13:          EM ← EM ∪ {vi, vj}
14:          wij, wji ← ∞
15:          WM ← WM ∪ {wij, wji}
16:        end if
17:        if calcw({vi, vj}) < wij then
18:          wij, wji ← calcw({vi, vj})
19:          Pij ← P′ij
20:          Pji ← P′ji
21:          n ← True
22:        end if
23:      end for
24:      if n = True then
25:        for all {m ∈ M | con(GM, m, R)} do
26:          ps ← sp(GM, m, R)
27:          PmR ← P(ps,1),(ps,2)
28:          wmR ← w(ps,1),(ps,2)
29:          for k = 2 → |ps| − 1 do
30:            PmR ← PmR ∘ P(ps,k),(ps,k+1)
31:            wmR ← wmR + w(ps,k),(ps,k+1)
32:          end for
33:        end for
34:      end if
35:    end if
36:    if R ∈ V then
37:      P′cR ← P′Rc⁻¹
38:    else if ∃{v ∈ V | con(GM, v, R)} then
39:      vm ← argmin v∈V (calcw({v, c}) + wvR)
40:      P′cR ← P′vm,c⁻¹ ∘ Pvm,R
41:    end if
42:  end if
43:  n ← False
44: end loop


4 Experimental Results

An experiment was performed in an indoor environment using an internally calibrated ICube NS4133BU camera, a set of markers, a marker detection algorithm, and the proposed self-localization algorithm. The purpose of the experiment is to demonstrate the proposed algorithm's performance in reducing global error.

The metric used for quantifying the edge weights of the localization graph is based on the findings of Schweighofer and Pinz [13]. In their paper, they demonstrate that the pose ambiguity of planar targets is affected by changes in the position and/or orientation of the target in the scene. For instance, pose ambiguity was found to increase with the distance between the camera and the target. Based on this, a simple function f(d), where d is the perpendicular distance between the camera and a marker, can be used as a metric for the edge weights. When the edge is defined by two markers, the function is applied to each marker individually and the two results are averaged to obtain the weight of the connecting edge.
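The distance-based metric can be sketched as follows. The averaging over the two markers follows the description above; the exact exponential form is an assumption of the sketch (taken here as f(d) = 3^d − 1), and the function names are illustrative.

```python
# Sketch of the distance-based edge weight metric. d_a and d_b are the
# perpendicular camera-to-marker distances for the two markers defining
# the edge; f is one of the candidate weight functions.

def linear(d):
    return d

def quadratic(d):
    return d ** 2

def exponential(d, base=3.0):
    # Assumed form of the exponential metric: base**d - 1.
    return base ** d - 1.0

def edge_weight(d_a, d_b, f=exponential):
    """Apply f to each marker's distance and average the two results."""
    return 0.5 * (f(d_a) + f(d_b))
```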

The general procedure is outlined as follows:

– The markers are attached to a wall in the configuration shown in Fig. 4(a).
– The camera is traversed through the room (facing the markers throughout) while the marker detection and self-localization algorithms are running.
– A map with reference R is built and updated online by the self-localization algorithm (refer to Fig. 4(b)).
– The camera is positioned such that only marker Y is visible in its FOV.
– From this known position, the localization accuracy is compared between the shortest path obtained from our algorithm and the 35 other possible (simple) paths between markers Y and R.

Three types of functions for the edge weights were tested on the map: linear (f(d) = d), quadratic (f(d) = d²), and exponential (f(d) = 3^d − 1). Each function was applied to the 36 possible paths from Y to R. In all cases, the shortest path corresponding to the minimum aggregate error (obtained from our algorithm) yielded the lowest ground truth positional error compared to the 35 other paths. The average, minimum, and maximum ground truth errors for the 36 paths were identical for all three functions: 67.9, 29.1, and 117.8 mm, respectively. Furthermore, it was observed that, compared to the other two, the exponential function showed the best correlation between the aggregate error and the ground truth positional error. Fig. 5 shows this correlation in the form of a line of best fit with correlation coefficient R ≈ 0.74, using the exponential function f(d) = 50^d − 1. The bottom left-most data point in this figure graphically shows how the minimum aggregate error corresponds to the minimum ground truth positional error of the 36 paths for the given metric.

Fig. 4. Experimental Map – (a) arrangement of markers in the environment, (b) topological representation of the map

Fig. 5. Ground Truth vs. Aggregate Error – finding a correlation using the exponential edge weight function f(d) = 50^d − 1

The experiment was repeated using the same exponential error function, except the three lower markers were moved closer to the camera and tilted horizontally. In this case, the minimum ground truth error corresponded to the third least aggregate error, and there was a negative correlation of R ≈ 0.47 between the aggregate and ground truth errors. This indicates that this particular error function may not be suitable for all configurations of markers.

5 Conclusion

A proposed topological mapping approach is applied to a self-localization system to reduce global error and eliminate the need for an offline map-building mode. The map is incrementally built and updated as the camera moves through the environment. A shortest path algorithm is applied to the map to find the path of least aggregate error based on an appropriate edge weight metric. Experiments were conducted with online map-building, using a weight metric based on the perpendicular distance between the camera and each marker. The results demonstrate the effectiveness of the system in reducing global error, but emphasize the importance of deriving an appropriate error metric for the edge weights.


References

1. Anjum, M., Park, J., Hwang, W., il Kwon, H., Hyeon Kim, J., Lee, C., Soo Kim, K., il Dan Cho, D.: Sensor data fusion using unscented Kalman filter for accurate localization of mobile robots. In: 2010 International Conference on Control Automation and Systems (ICCAS), pp. 947–952 (October 2010)
2. Ansar, A., Daniilidis, K.: Linear pose estimation from points or lines. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(5), 578–589 (2003)
3. Brand, M., Antone, M., Teller, S.: Spectral solution of large-scale extrinsic camera calibration as a graph embedding problem. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3022, pp. 262–273. Springer, Heidelberg (2004)
4. Chen, X., Li, R., Wang, X., Tian, Y., Huang, Q.: A novel artificial landmark for monocular global visual localization of indoor robots. In: 2010 International Conference on Mechatronics and Automation (ICMA), pp. 1314–1319 (August 2010)
5. Choi, H., Kim, D.Y., Hwang, J.P., Kim, E., Kim, Y.O.: CV-SLAM using ceiling boundary. In: 2010 5th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 228–233 (June 2010)
6. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik 1(1), 269–271 (1959)
7. Fiala, M.: ARTag, a fiducial marker system using digital techniques. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 590–596 (June 2005)
8. Hwang, S.Y., Park, J.T., Song, J.B.: Autonomous navigation of a mobile robot using an upward-looking camera and sonar sensors. In: 2010 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO), pp. 40–45 (October 2010)
9. Kitanov, A., Bisevac, S., Petrovic, I.: Mobile robot self-localization in complex indoor environments using monocular vision and 3D model. In: 2007 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, pp. 1–6 (September 2007)
10. Lim, H., Lee, Y.S.: Real-time single camera SLAM using fiducial markers. In: ICCAS-SICE, pp. 177–182 (August 2009)
11. Lv, Q., Zhou, W., Liu, J.: Realization of odometry system using monocular vision. In: 2006 International Conference on Computational Intelligence and Security, vol. 2, pp. 1841–1844 (November 2006)
12. Mavrinac, A., Chen, X., Tepe, K.: An automatic calibration method for stereo-based 3D distributed smart camera networks. Computer Vision and Image Understanding 114(8), 952–962 (2010)
13. Schweighofer, G., Pinz, A.: Robust pose estimation from a planar target. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(12), 2024–2030 (2006)
14. Sharp, G., Lee, S., Wehe, D.: Multiview registration of 3D scenes by minimizing error between coordinate frames. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(8), 1037–1050 (2004)
15. Van Hamme, D., Veelaert, P., Philips, W.: Robust monocular visual odometry by uncertainty voting. In: 2011 IEEE Intelligent Vehicles Symposium (IV), pp. 643–647 (June 2011)
16. Yu, Y., Pradalier, C., Zong, G.: Appearance-based monocular visual odometry for ground vehicles. In: 2011 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pp. 862–867 (July 2011)
17. Zhang, Z.: A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(11), 1330–1334 (2000)