Content uploaded by Aaron Mavrinac
Author content
All content in this area was uploaded by Aaron Mavrinac on Sep 20, 2017
Content may be subject to copyright.
A Self-localization System with Global Error
Reduction and Online Map-Building Capabilities
Karam Shaya, Aaron Mavrinac, Jose Luis Alarcon Herrera, and Xiang Chen
Department of Electrical and Computer Engineering, University of Windsor,
Windsor, Ontario, Canada
{shayak,mavrin1,alarconj,xchen}@uwindsor.ca
Abstract. An economical self-localization system which uses a monoc-
ular camera and a set of artificial landmarks is presented herein. The
system represents the surrounding environment as a topological graph
where each node corresponds to an artificial landmark and each edge
corresponds to a relative pose between two landmarks. The edges are
weighted based on an error metric (related to pose uncertainty) and a
shortest path algorithm is applied to the map to compute the path corre-
sponding to the least aggregate weight. This path is used to localize the
camera with respect to a global coordinate system whose origin lies on
an arbitrary reference landmark (i.e., the destination node of the path).
The proposed system does not require a preliminary training process, as
it builds and updates the map online. Experimental results demonstrate
the performance of the system in reducing the global error associated
with large-scale localization.
1 Introduction
With the growing demand for autonomous robots in industrial, medical, domes-
tic, and other domains, a large portion of research in the robotics industry has
been geared toward the development and improvement of localization systems.
For the purposes of this paper, existing localization systems in the literature
will be divided into two categories: those that obtain their data from multiple
sensors (e.g., [8], [1], [5]) and those that obtain them from a single sensor (e.g.,
[11], [15], [16]). The former type of system takes a sensor fusion approach. One
major advantage of sensor fusion is the availability of multiple sources of data,
through which the robot may verify the readings of its individual sensors and
reduce the overall error of its pose estimates. The disadvantages of multiple sen-
sors are added complexity (in the localization algorithm and hardware design),
larger form factor, and increased cost. Conversely, systems that use single sen-
sors tend to be simpler, smaller, and less expensive; however, they do not have
the redundancy and fusion of multiple independent sensor measurements and
must therefore use internal methods to reduce estimation error. The focus of our
work is on reducing the estimation error of a single sensor localization system.
This research is supported in part by NSERC-Discovery Grant to Prof. Xiang Chen.
C.-Y. Su, S. Rakheja, H. Liu (Eds.): ICIRA 2012, Part III, LNAI 7508, pp. 13–22, 2012.
c
Springer-Verlag Berlin Heidelberg 2012
14 K. Shaya et al.
Compared to the sensing modalities used in other solutions (e.g., odometry,
sonar, and laser), two dimensional images provide a robot’s localization algo-
rithm with more data about the environment [9]. They can be used by a lo-
calization algorithm to detect, identify, and estimate the pose of objects in a
scene.
The nature of the map-building process of visual localization yields two par-
ticular types of pose estimation errors: local error – which originates from error
and noise in image capture and affects pose estimations made with respect to
coordinate systems in the image – and global error – which arises from the ac-
cumulation of local error and affects pose estimations made with respect to a
global coordinate system that may not necessarily be in the image. Due to the
influence local error has on global error, reducing the former would result in
a reduction of the latter. This may be achieved, for example, through markers
(i.e., artificial landmarks) that can be accurately detected and discerned to help
keep local error at a minimum [7], [4], [10].
In this paper, the problem will be approached from a different perspective.
While local error reduction can be characterized as a low-level means of re-
ducing global error, the proposed method will take on a more direct approach
by representing the surrounding environment as a mathematical graph whose
edge weights reflect the effects of local error. Graphical representation of similar
vision-based pose estimation problems has previously been applied to such areas
as multiview registration of 3D scenes [14] and large-scale extrinsic calibration
of camera networks [3]. By applying a shortest path algorithm, the graph can
be optimized to yield the paths of minimum global error (i.e., accumulated local
error). To the best of the authors’ knowledge, the minimization of global error in
a localization system using the graphical approach outlined herein has not been
presented in literature.
The remainder of this paper is organized as follows: Section 2 defines the pre-
liminary concepts relating to pose composition, graphs, and the shortest path
algorithm; Section 3 states the necessary assumptions and presents the pro-
posed self-localization system’s method and algorithm; Section 4 demonstrates
the performance of the system in reducing the effect of global error; and Section
5 concludes the paper by providing reflections on the results.
2 Preliminaries
2.1 Pose Composition
AposePαβ is a rigid three dimensional Euclidean transformation from the co-
ordinate system of object αto the coordinate system of object β.Thismaybe
referred to as the pose of object αwith respect to object β.
The inverse of pose Pαβ may be denoted P−1
αβ or Pβα. The former notation
will be used here to emphasize that Pαβ is the available direct estimate.
Successive pose transformations may be composed into a single pose:
Pαγ(p)=(Pαβ ◦Pβγ )(p)=Pβγ(Pαβ (p)) (1)
Localization System with Global Error Reduction and Online Map-Building 15
Note that a left-composition convention is used to better illuminate the sequence
of pose transformations.
The details of pose inversion and composition vary depending on the repre-
sentation used. The reader is directed to any of the numerous texts on Euclidean
geometry for a treatment appropriate to his or her working representation.
2.2 Relative Pose Calculation
Refering to Fig. 1, the relative pose transformation of a marker αwith respect
to another marker βis determined through pose composition to be
Pαβ =Pαc ◦P−1
βc (2)
where Pαc and Pβc are the poses of αand βwith respect to the camera c
(assumed to be available). Note that this calculation is made possible by the
fact that both markers are in the camera’s FOV (shaded area in Fig. 1) at the
same time; more specifically, pose estimates are taken from the same captured
camera frame.
Fig. 1. Relative Pose Calculation - the camera can calculate the relative pose trans-
formation between multiple markers by capturing them concurrently in its FOV
2.3 Marker Graph
The marker graph (based on the calibration graph introduced by Mavrinac et
al. [12]) is a method of representing a set of markers as a topological map.
It is a weighted, undirected graph GM=(M,E
M,WM), where Mis the set
of detected markers in the system, EMis a set of edges, and WMis the set of
weights corresponding to the edges in EM. The existence of an edge {α, β}∈EM
indicates that a relative pose transformation from marker αto marker β(or vice
versa) is available.
Since it is trivial to invert a pose, the availability of Pαβ implies availability of
Pβα. The edge weight (wαβ ∈R+)∈W
Mis the estimation uncertainty of Pαβ.
Apathp=α,...,βin GM,fromnodeαto node β, represents a se-
quence of pose transformations which may be composed to yield Pαβ .Ifp=
v1,v
2,...,v
n,
P1,n =P1,2◦P2,3◦···◦Pn−1,n (3)
16 K. Shaya et al.
where Pi,j is the pose transformation from vito vj.IfanyPi,j is not available,
Pi,j =P−1
j,i . The aggregate error associated with this pose is
w1,n =
n
2
wk−1,k (4)
which is the length of path p.
2.4 Localization Graph
The localization graph is essentially a marker graph that includes the camera
cas an additional node. It is a weighted, undirected graph GL=(L,E
L,WL),
where L=c∪M,ELis the set of edges between nodes L,andWLis, again, the
set of associated edge weights. The localization graph is incrementally updated
as the camera cmoves through the environment. Note that GL⊃G
M.
2.5 Shortest Path Algorithm
Shortest path algorithms can be applied to graphs to find the minimum topolog-
ical path between two nodes. The famous shortest path algorithm by Dijkstra
[6] solves the single-source shortest path problem of undirected graphs with non-
negative edge weights. Since globally localizing a camera involves obtaining its
pose with respect to a single source (i.e., the global coordinate system), Dijk-
stra’s algorithm is appropriate for computing the shortest path of edges from
said source to the camera. It returns the path yielding the minimum aggregate
error defined in (4).
3 Global Localization
3.1 Assumptions
Internal Calibration and Pose Estimation. It is assumed that there exists
some means by which a camera may estimate, from a single view, its relative
three dimensional pose with respect to a calibration target of known structure
[17], [2]. This normally implies that the camera is internally calibrated.
Marker Constraints. It is assumed that (i) if a marker m∈Mis connected
to an edge, it remains fixed in its position, and (ii) the selected reference node
Rcorresponds to a marker that is available and detectable in the environment.
Map Updates. It is assumed that any operation that updates GMsimultane-
ously updates GL,andviceversa.
Localization System with Global Error Reduction and Online Map-Building 17
3.2 Problem Definition
The problem of global localization using computer vision is formalized as follows:
Given a monocular camera c,asetofmarkersM, and an arbitrary global
reference frame R∈M, find PcR as ctraverses the environment.
Let us further define a set of markers Vwhich are in the camera’s current FOV
(e.g., shaded area in Fig. 1). Then, it can be noted that PcR may either be
obtained directly from R(when R∈V) or indirectly from another marker v∈V
(when R/∈V), assuming there exists a path in GLfrom cto R.
It is additionally desirable to decrease the global error by using the path p
yielding the minimum aggregate error (as defined in (4)) for each estimated PcR.
3.3 Self-localization Method
The method will be explained with the aid of an example. Suppose it is desired
to find the pose of a monocular camera cwithin an environment consisting of
markers M={W, X, Y , Z, R}, as shown in Fig. 2. In this case, Ris selected as the
global reference frame, so the problem is to find PcR . As mentioned previously,
there are two ways of finding PcR : either directly through R(when Ris in the
FOV) or indirectly through intermediate markers W, X, Y , Z (when Ris not in
the FOV).
As shown in Fig. 3, the localization graph is connected assuming that the
direct pose estimates in Fig. 2 are all available. Thus, the camera ccan be
localized with respect to the common reference frame Rusing any of the markers
in the map.
The minimum requirement to achieve global localization is that there must
exist a path from the current position of cto the reference frame Rin GL.
Additional edges may yield shorter paths (i.e., pose compositions with lower
aggregate error). The positioning of the set of markers Mshould be chosen
appropriately. Note that there is no disadvantage, aside from additional effort
positioning and obtaining pose estimates, to increasing the size of M.
Fig. 2. Camera Pose Estimation - arrows indicate the direct pose estimate of one object
(camera or marker) with respect to another, where the pose of the object on the arrow’s
tail is given with respect to that on the arrow’s head
18 K. Shaya et al.
In this example, direct pose estimates PWc,PWX,PYX,PZX,PYZ,PYR
,and
PZR are obtained, along with their respective pose uncertainties. The availability
of direct pose estimates is encapsulated in the localization graph of Fig. 3.
Fig. 3. Localization Graph for System in Fig. 2 - edge weights represent pose estimation
uncertainty values
The solution is obtained through composition of the estimated poses, accord-
ing to (3), where the shortest paths are computed using Dijkstra’s algorithm
or similar. As an example, suppose the shortest path from cto Rin GLis
c, W, X, Y, Z, R. Then,
PcR =P−1
Wc ◦PWX ◦P−1
YX ◦PYZ ◦PZR (5)
as per (3). The associated aggregate error is wcR =wWc +wWX +wYX+wYZ+
wZR.
Map Updating. When there are multiple markers in the FOV, the relative
poses between all possible pairs of markers in the frame are calculated and
connecting edges (with associated weights) are created between them. If in a
subsequent frame an edge is re-detected and is found to have a lower weight than
the existing edge, its relative pose transformation and weight overwrite those
corresponding to the existing edge. In this way, the edges of the graph maintain
the minimum weights (and thus the relative poses with the least uncertainty) at
all times.
3.4 Self-localization Algorithm
In the formal expression of the algorithm (Algorithm 1), let primed ()variables
represent calculations made in the current frame (e.g., P
ij represents the relative
pose between markers iand jas calculated from the current frame). Three pre-
viously undefined functions are used in this algorithm: The first, calcw({α, β}),
calculates the weight of the edge connecting nodes αand βbased on an appropri-
ately derived error metric; the second, con(G,s,d), returns True if there exists a
path from sto din G; the third, sp(G,m,R), returns the shortest path psfrom
mto Rin G(using Dijkstra’s algorithm) in the form defined in Section 2.3. The
boolean variable nwill be used to indicate that a map update has occurred.
Localization System with Global Error Reduction and Online Map-Building 19
Algorithm 1. Proposed Self-Localization Algorithm (Finding P
cR)
1: M←{R}
2: EM,WM,V←∅
3: n←False
4: loop
5: Capture frame
6: if V =∅then
7: M←M∪V
8: if |V| >1then
9: for all {vi,v
j}∈V
2do
10: P
ij ←P
ic ◦P−1
jc
11: P
ji ←P−1
ij
12: if {vi,v
j}/∈EMthen
13: EM←EM∪{vi,v
j}
14: wij ,w
ji ←∞
15: WM←W
M∪wij ,w
ji
16: end if
17: if calcw({vi,v
j})<w
ij then
18: wij ,w
ji ←calcw({vi,v
j})
19: Pij ←P
ij
20: Pji ←P
ji
21: n←True
22: end if
23: end for
24: if n=True then
25: for all {m∈M|con(GM,m,R)}do
26: ps←sp(GM,m,R)
27: PmR ←Pps,1,ps,2
28: wmR ←wps,1,ps,2
29: for k=2→|ps|−1do
30: PmR ←PmR ◦Pps,k,ps,k+1
31: wmR ←wmR +wps,k,ps,k+1
32: end for
33: end for
34: end if
35: end if
36: if R∈V then
37: P
cR =P−1
Rc
38: else if ∃{v∈V |con(GM,v,R)}then
39: vm←argmin
v∈V
(calcw({v, c})+wvR)
40: P
cR ←P−1
vmc◦PvmR
41: end if
42: end if
43: n=False
44: end loop
20 K. Shaya et al.
4 Experimental Results
An experiment was performed in an indoor environment using an internally cali-
brated ICube NS4133BU Camera, a set of markers, a marker detection algorithm,
and the proposed self-localization algorithm. The purpose of the experiment is
to demonstrate the proposed algorithm’s performance in reducing global error.
The metric used for quantifying the edge weights of the localization graph
is based on the findings of Schweighofer and Pinz [13]. In their paper, they
demonstrate that pose ambiguity of planar targets is affected by the change in
position and/or orientation of the target in the scene. For instance, the pose
ambiguity was found to increase with the distance between the camera and
the target. Based on this, a simple function f(d), where dis the perpendicular
distance between the camera and a marker, can be used as a metric for the edge
weights. When the edge is defined by two markers, the function is applied to
each marker individually and the two results are averaged to obtain the weight
of the connecting edge.
The general procedure is outlined as follows:
–The markers are attached to a wall in the configuration shown in Fig. 4(a).
–The camera is traversed through the room (facing the markers throughout)
while the marker detection and self-localization algorithms are running.
–A map with reference Ris built and updated online by the self-localization
algorithm (refer to Fig. 4(b)).
–The camera is positioned such that only marker Yis visible in its FOV.
–From this known position, the localization accuracy was compared between
the shortest path obtained from our algorithm and the 35 other possible
(simple) paths between markers Yand R.
Three types of functions for edge weights were tested on the map: linear (f(d)=
d), quadratic (f(d)=d2), and exponential (f(d)=3
d−1). Each function was
applied to the 36 possible paths from Yto R. In all the cases, the shortest path
corresponding to the minimum aggregate error (obtained from our algorithm)
Fig. 4. Experimental Map - (a) arrangement of markers in the environment, (b) topo-
logical representation of the map
Localization System with Global Error Reduction and Online Map-Building 21
Fig. 5. Ground Truth vs. Aggregate Error - finding a correlation using exponential
edge weight function f(d)=50
d−1
yielded the lowest ground truth positional error compared to the 35 other paths.
The average, minimum, and maximum ground truth errors for the 36 paths
were identical for all three functions: 67.9, 29.1, and 117.8 mm, respectively.
Furthermore, it was observed that compared to the other two, the exponential
function showed the best correlation between the aggregate error and the ground
truth positional error. Fig. 5 shows this correlation in the form of a line of
best fit with correlation coefficient R≈0.74, using the exponential function
f(d)=50
d−1. The bottom left-most data point in this figure graphically shows
how the minimum aggregate error corresponds to the minimum ground truth
positional error of the 36 paths for the given metric.
The experiment was repeated using the same exponential error function, ex-
cept the three lower markers were moved closer to the camera and tilted hori-
zontally. In this case, the minimum ground truth error corresponded to the third
least aggregate error and there was a negative correlation of R≈0.47 between
the aggregate and ground truth errors. This indicates that this particular error
function may not be suitable for all configurations of markers.
5Conclusion
A proposed topological mapping approach is applied to a self-localization sys-
tem to reduce global error and eliminate the need for an offline map-building
mode. The map is incrementally built and updated as the camera moves through
the environment. A shortest path algorithm is applied to the map to find the
path of least aggregate error based on an appropriate edge weight metric. Ex-
periments were done with online map-building and using a weight metric based
on the perpendicular distance between the camera and each marker. The re-
sults demonstrate the effectiveness of the system in reducing global error but
emphasize the importance of deriving an appropriate error metric for the edge
weights.
22 K. Shaya et al.
References
1. Anjum, M., Park, J., Hwang, W., il Kwon, H., Hyeon Kim, J., Lee, C., Soo Kim,
K., il Danr Cho, D.: Sensor data fusion using unscented kalman filter for accu-
rate localization of mobile robots. In: 2010 International Conference on Control
Automation and Systems (ICCAS), pp. 947–952 (October 2010)
2. Ansar, A., Daniilidis, K.: Linear pose estimation from points or lines. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 25(5), 578–589 (2003)
3. Brand, M., Antone, M., Teller, S.: Spectral Solution of Large-Scale Extrinsic Cam-
era Calibration as a Graph Embedding Problem. In: Pajdla, T., Matas, J(G.) (eds.)
ECCV 2004. LNCS, vol. 3022, pp. 262–273. Springer, Heidelberg (2004)
4. Chen, X., Li, R., Wang, X., Tian, Y., Huang, Q.: A novel artificial landmark for
monocular global visual localization of indoor robots. In: 2010 International Con-
ference on Mechatronics and Automation (ICMA), pp. 1314–1319 (August 2010)
5. Choi, H., Kim, D.Y., Hwang, J.P., Kim, E., Kim, Y.O.: Cv-slam using ceiling
boundary. In: 2010 the 5th IEEE Conference on Industrial Electronics and Appli-
cations (ICIEA), pp. 228–233 (June 2010)
6. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische
Mathematik 1(1), 269–271 (1959)
7. Fiala, M.: Artag, a fiducial marker system using digital techniques. In: IEEE Com-
puter Society Conference on Computer Vision and Pattern Recognition, CVPR
2005, vol. 2, pp. 590–596 (June 2005)
8. Hwang, S.Y., Park, J.T., Song, J.B.: Autonomous navigation of a mobile robot
using an upward-looking camera and sonar sensors. In: 2010 IEEE Workshop on
Advanced Robotics and its Social Impacts (ARSO), pp. 40–45 (October 2010)
9. Kitanov, A., Bisevac, S., Petrovic, I.: Mobile robot self-localization in complex in-
door environments using monocular vision and 3d model. In: 2007 IEEE/ASME
International Conference on Advanced Intelligent Mechatronics, pp. 1–6 (Septem-
ber 2007)
10. Lim, H., Lee, Y.S.: Real-time single camera slam using fiducial markers. In: ICCAS-
SICE, pp. 177–182 (August 2009)
11. Lv, Q., Zhou, W., Liu, J.: Realization of odometry system using monocular vi-
sion. In: 2006 International Conference on Computational Intelligence and Security,
vol. 2, pp. 1841–1844 (Novemebr 2006)
12. Mavrinac, A., Chen, X., Tepe, K.: An automatic calibration method for stereo-
based 3d distributed smart camera networks. Computer Vision and Image Under-
standing 114(8), 952–962 (2010)
13. Schweighofer, G., Pinz, A.: Robust pose estimation from a planar target. IEEE
Transactions on Pattern Analysis and Machine Intelligence 28(12), 2024–2030
(2006)
14. Sharp, G., Lee, S., Wehe, D.: Multiview registration of 3d scenes by minimizing
error between coordinate frames. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence 26(8), 1037–1050 (2004)
15. Van Hamme, D., Veelaert, P., Philips, W.: Robust monocular visual odometry by
uncertainty voting. In: 2011 IEEE Intelligent Vehicles Symposium (IV), pp. 643–
647 (June 2011)
16. Yu, Y., Pradalier, C., Zong, G.: Appearance-based monocular visual odometry
for ground vehicles. In: 2011 IEEE/ASME International Conference on Advanced
Intelligent Mechatronics (AIM), pp. 862–867 (July 2011)
17. Zhang, Z.: A flexible new technique for camera calibration. IEEE Transactions on
Pattern Analysis and Machine Intelligence 22(11), 1330–1334 (2000)