Conference PaperPDF Available

A Self-localization System with Global Error Reduction and Online Map-Building Capabilities


Abstract and Figures

An economical self-localization system which uses a monocular camera and a set of artificial landmarks is presented herein. The system represents the surrounding environment as a topological graph where each node corresponds to an artificial landmark and each edge corresponds to a relative pose between two landmarks. The edges are weighted based on an error metric (related to pose uncertainty) and a shortest path algorithm is applied to the map to compute the path corresponding to the least aggregate weight. This path is used to localize the camera with respect to a global coordinate system whose origin lies on an arbitrary reference landmark (i.e., the destination node of the path). The proposed system does not require a preliminary training process, as it builds and updates the map online. Experimental results demonstrate the performance of the system in reducing the global error associated with large-scale localization.
Content may be subject to copyright.
A Self-localization System with Global Error
Reduction and Online Map-Building Capabilities
Karam Shaya, Aaron Mavrinac, Jose Luis Alarcon Herrera, and Xiang Chen
Department of Electrical and Computer Engineering, University of Windsor,
Windsor, Ontario, Canada
Abstract. An economical self-localization system which uses a monoc-
ular camera and a set of artificial landmarks is presented herein. The
system represents the surrounding environment as a topological graph
where each node corresponds to an artificial landmark and each edge
corresponds to a relative pose between two landmarks. The edges are
weighted based on an error metric (related to pose uncertainty) and a
shortest path algorithm is applied to the map to compute the path corre-
sponding to the least aggregate weight. This path is used to localize the
camera with respect to a global coordinate system whose origin lies on
an arbitrary reference landmark (i.e., the destination node of the path).
The proposed system does not require a preliminary training process, as
it builds and updates the map online. Experimental results demonstrate
the performance of the system in reducing the global error associated
with large-scale localization.
1 Introduction
With the growing demand for autonomous robots in industrial, medical, domes-
tic, and other domains, a large portion of research in the robotics industry has
been geared toward the development and improvement of localization systems.
For the purposes of this paper, existing localization systems in the literature
will be divided into two categories: those that obtain their data from multiple
sensors (e.g., [8], [1], [5]) and those that obtain them from a single sensor (e.g.,
[11], [15], [16]). The former type of system takes a sensor fusion approach. One
major advantage of sensor fusion is the availability of multiple sources of data,
through which the robot may verify the readings of its individual sensors and
reduce the overall error of its pose estimates. The disadvantages of multiple sen-
sors are added complexity (in the localization algorithm and hardware design),
larger form factor, and increased cost. Conversely, systems that use single sen-
sors tend to be simpler, smaller, and less expensive; however, they do not have
the redundancy and fusion of multiple independent sensor measurements and
must therefore use internal methods to reduce estimation error. The focus of our
work is on reducing the estimation error of a single sensor localization system.
This research is supported in part by NSERC-Discovery Grant to Prof. Xiang Chen.
C.-Y. Su, S. Rakheja, H. Liu (Eds.): ICIRA 2012, Part III, LNAI 7508, pp. 13–22, 2012.
Springer-Verlag Berlin Heidelberg 2012
14 K. Shaya et al.
Compared to the sensing modalities used in other solutions (e.g., odometry,
sonar, and laser), two dimensional images provide a robot’s localization algo-
rithm with more data about the environment [9]. They can be used by a lo-
calization algorithm to detect, identify, and estimate the pose of objects in a
The nature of the map-building process of visual localization yields two par-
ticular types of pose estimation errors: local error – which originates from error
and noise in image capture and affects pose estimations made with respect to
coordinate systems in the image – and global error – which arises from the ac-
cumulation of local error and affects pose estimations made with respect to a
global coordinate system that may not necessarily be in the image. Due to the
influence local error has on global error, reducing the former would result in
a reduction of the latter. This may be achieved, for example, through markers
(i.e., artificial landmarks) that can be accurately detected and discerned to help
keep local error at a minimum [7], [4], [10].
In this paper, the problem will be approached from a different perspective.
While local error reduction can be characterized as a low-level means of re-
ducing global error, the proposed method will take on a more direct approach
by representing the surrounding environment as a mathematical graph whose
edge weights reflect the effects of local error. Graphical representation of similar
vision-based pose estimation problems has previously been applied to such areas
as multiview registration of 3D scenes [14] and large-scale extrinsic calibration
of camera networks [3]. By applying a shortest path algorithm, the graph can
be optimized to yield the paths of minimum global error (i.e., accumulated local
error). To the best of the authors’ knowledge, the minimization of global error in
a localization system using the graphical approach outlined herein has not been
presented in literature.
The remainder of this paper is organized as follows: Section 2 defines the pre-
liminary concepts relating to pose composition, graphs, and the shortest path
algorithm; Section 3 states the necessary assumptions and presents the pro-
posed self-localization system’s method and algorithm; Section 4 demonstrates
the performance of the system in reducing the effect of global error; and Section
5 concludes the paper by providing reflections on the results.
2 Preliminaries
2.1 Pose Composition
AposePαβ is a rigid three dimensional Euclidean transformation from the co-
ordinate system of object αto the coordinate system of object β.Thismaybe
referred to as the pose of object αwith respect to object β.
The inverse of pose Pαβ may be denoted P1
αβ or Pβα. The former notation
will be used here to emphasize that Pαβ is the available direct estimate.
Successive pose transformations may be composed into a single pose:
Pαγ(p)=(Pαβ Pβγ )(p)=Pβγ(Pαβ (p)) (1)
Localization System with Global Error Reduction and Online Map-Building 15
Note that a left-composition convention is used to better illuminate the sequence
of pose transformations.
The details of pose inversion and composition vary depending on the repre-
sentation used. The reader is directed to any of the numerous texts on Euclidean
geometry for a treatment appropriate to his or her working representation.
2.2 Relative Pose Calculation
Refering to Fig. 1, the relative pose transformation of a marker αwith respect
to another marker βis determined through pose composition to be
Pαβ =Pαc P1
βc (2)
where Pαc and Pβc are the poses of αand βwith respect to the camera c
(assumed to be available). Note that this calculation is made possible by the
fact that both markers are in the camera’s FOV (shaded area in Fig. 1) at the
same time; more specifically, pose estimates are taken from the same captured
camera frame.
Fig. 1. Relative Pose Calculation - the camera can calculate the relative pose trans-
formation between multiple markers by capturing them concurrently in its FOV
2.3 Marker Graph
The marker graph (based on the calibration graph introduced by Mavrinac et
al. [12]) is a method of representing a set of markers as a topological map.
It is a weighted, undirected graph GM=(M,E
M,WM), where Mis the set
of detected markers in the system, EMis a set of edges, and WMis the set of
weights corresponding to the edges in EM. The existence of an edge {α, β}∈EM
indicates that a relative pose transformation from marker αto marker β(or vice
versa) is available.
Since it is trivial to invert a pose, the availability of Pαβ implies availability of
Pβα. The edge weight (wαβ R+)∈W
Mis the estimation uncertainty of Pαβ.
Apathp=α,...,βin GM,fromnodeαto node β, represents a se-
quence of pose transformations which may be composed to yield Pαβ .Ifp=
P1,n =P1,2P2,3◦···Pn1,n (3)
16 K. Shaya et al.
where Pi,j is the pose transformation from vito vj.IfanyPi,j is not available,
Pi,j =P1
j,i . The aggregate error associated with this pose is
w1,n =
wk1,k (4)
which is the length of path p.
2.4 Localization Graph
The localization graph is essentially a marker graph that includes the camera
cas an additional node. It is a weighted, undirected graph GL=(L,E
where L=c∪M,ELis the set of edges between nodes L,andWLis, again, the
set of associated edge weights. The localization graph is incrementally updated
as the camera cmoves through the environment. Note that GL⊃G
2.5 Shortest Path Algorithm
Shortest path algorithms can be applied to graphs to find the minimum topolog-
ical path between two nodes. The famous shortest path algorithm by Dijkstra
[6] solves the single-source shortest path problem of undirected graphs with non-
negative edge weights. Since globally localizing a camera involves obtaining its
pose with respect to a single source (i.e., the global coordinate system), Dijk-
stra’s algorithm is appropriate for computing the shortest path of edges from
said source to the camera. It returns the path yielding the minimum aggregate
error defined in (4).
3 Global Localization
3.1 Assumptions
Internal Calibration and Pose Estimation. It is assumed that there exists
some means by which a camera may estimate, from a single view, its relative
three dimensional pose with respect to a calibration target of known structure
[17], [2]. This normally implies that the camera is internally calibrated.
Marker Constraints. It is assumed that (i) if a marker m∈Mis connected
to an edge, it remains fixed in its position, and (ii) the selected reference node
Rcorresponds to a marker that is available and detectable in the environment.
Map Updates. It is assumed that any operation that updates GMsimultane-
ously updates GL,andviceversa.
Localization System with Global Error Reduction and Online Map-Building 17
3.2 Problem Definition
The problem of global localization using computer vision is formalized as follows:
Given a monocular camera c,asetofmarkersM, and an arbitrary global
reference frame R∈M, find PcR as ctraverses the environment.
Let us further define a set of markers Vwhich are in the camera’s current FOV
(e.g., shaded area in Fig. 1). Then, it can be noted that PcR may either be
obtained directly from R(when R∈V) or indirectly from another marker v∈V
(when R/∈V), assuming there exists a path in GLfrom cto R.
It is additionally desirable to decrease the global error by using the path p
yielding the minimum aggregate error (as defined in (4)) for each estimated PcR.
3.3 Self-localization Method
The method will be explained with the aid of an example. Suppose it is desired
to find the pose of a monocular camera cwithin an environment consisting of
markers M={W, X, Y , Z, R}, as shown in Fig. 2. In this case, Ris selected as the
global reference frame, so the problem is to find PcR . As mentioned previously,
there are two ways of finding PcR : either directly through R(when Ris in the
FOV) or indirectly through intermediate markers W, X, Y , Z (when Ris not in
the FOV).
As shown in Fig. 3, the localization graph is connected assuming that the
direct pose estimates in Fig. 2 are all available. Thus, the camera ccan be
localized with respect to the common reference frame Rusing any of the markers
in the map.
The minimum requirement to achieve global localization is that there must
exist a path from the current position of cto the reference frame Rin GL.
Additional edges may yield shorter paths (i.e., pose compositions with lower
aggregate error). The positioning of the set of markers Mshould be chosen
appropriately. Note that there is no disadvantage, aside from additional effort
positioning and obtaining pose estimates, to increasing the size of M.
Fig. 2. Camera Pose Estimation - arrows indicate the direct pose estimate of one object
(camera or marker) with respect to another, where the pose of the object on the arrow’s
tail is given with respect to that on the arrow’s head
18 K. Shaya et al.
In this example, direct pose estimates PWc,PWX,PYX,PZX,PYZ,PYR
PZR are obtained, along with their respective pose uncertainties. The availability
of direct pose estimates is encapsulated in the localization graph of Fig. 3.
 
Fig. 3. Localization Graph for System in Fig. 2 - edge weights represent pose estimation
uncertainty values
The solution is obtained through composition of the estimated poses, accord-
ing to (3), where the shortest paths are computed using Dijkstra’s algorithm
or similar. As an example, suppose the shortest path from cto Rin GLis
c, W, X, Y, Z, R. Then,
PcR =P1
as per (3). The associated aggregate error is wcR =wWc +wWX +wYX+wYZ+
Map Updating. When there are multiple markers in the FOV, the relative
poses between all possible pairs of markers in the frame are calculated and
connecting edges (with associated weights) are created between them. If in a
subsequent frame an edge is re-detected and is found to have a lower weight than
the existing edge, its relative pose transformation and weight overwrite those
corresponding to the existing edge. In this way, the edges of the graph maintain
the minimum weights (and thus the relative poses with the least uncertainty) at
all times.
3.4 Self-localization Algorithm
In the formal expression of the algorithm (Algorithm 1), let primed ()variables
represent calculations made in the current frame (e.g., P
ij represents the relative
pose between markers iand jas calculated from the current frame). Three pre-
viously undefined functions are used in this algorithm: The first, calcw({α, β}),
calculates the weight of the edge connecting nodes αand βbased on an appropri-
ately derived error metric; the second, con(G,s,d), returns True if there exists a
path from sto din G; the third, sp(G,m,R), returns the shortest path psfrom
mto Rin G(using Dijkstra’s algorithm) in the form defined in Section 2.3. The
boolean variable nwill be used to indicate that a map update has occurred.
Localization System with Global Error Reduction and Online Map-Building 19
Algorithm 1. Proposed Self-Localization Algorithm (Finding P
1: M←{R}
2: EM,WM,V←
3: nFalse
4: loop
5: Capture frame
6: if V =then
7: M←M∪V
8: if |V| >1then
9: for all {vi,v
10: P
ij P
ic P1
11: P
ji P1
12: if {vi,v
13: EMEM∪{vi,v
14: wij ,w
ji ←∞
15: WM←W
Mwij ,w
16: end if
17: if calcw({vi,v
ij then
18: wij ,w
ji calcw({vi,v
19: Pij P
20: Pji P
21: nTrue
22: end if
23: end for
24: if n=True then
25: for all {m∈M|con(GM,m,R)}do
26: pssp(GM,m,R)
27: PmR Pps,1,ps,2
28: wmR wps,1,ps,2
29: for k=2→|ps|−1do
30: PmR PmR Pps,k,ps,k+1
31: wmR wmR +wps,k,ps,k+1
32: end for
33: end for
34: end if
35: end if
36: if R∈V then
37: P
cR =P1
38: else if ∃{v∈V |con(GM,v,R)}then
39: vmargmin
(calcw({v, c})+wvR)
40: P
cR P1
41: end if
42: end if
43: n=False
44: end loop
20 K. Shaya et al.
4 Experimental Results
An experiment was performed in an indoor environment using an internally cali-
brated ICube NS4133BU Camera, a set of markers, a marker detection algorithm,
and the proposed self-localization algorithm. The purpose of the experiment is
to demonstrate the proposed algorithm’s performance in reducing global error.
The metric used for quantifying the edge weights of the localization graph
is based on the findings of Schweighofer and Pinz [13]. In their paper, they
demonstrate that pose ambiguity of planar targets is affected by the change in
position and/or orientation of the target in the scene. For instance, the pose
ambiguity was found to increase with the distance between the camera and
the target. Based on this, a simple function f(d), where dis the perpendicular
distance between the camera and a marker, can be used as a metric for the edge
weights. When the edge is defined by two markers, the function is applied to
each marker individually and the two results are averaged to obtain the weight
of the connecting edge.
The general procedure is outlined as follows:
The markers are attached to a wall in the configuration shown in Fig. 4(a).
The camera is traversed through the room (facing the markers throughout)
while the marker detection and self-localization algorithms are running.
A map with reference Ris built and updated online by the self-localization
algorithm (refer to Fig. 4(b)).
The camera is positioned such that only marker Yis visible in its FOV.
From this known position, the localization accuracy was compared between
the shortest path obtained from our algorithm and the 35 other possible
(simple) paths between markers Yand R.
Three types of functions for edge weights were tested on the map: linear (f(d)=
d), quadratic (f(d)=d2), and exponential (f(d)=3
d1). Each function was
applied to the 36 possible paths from Yto R. In all the cases, the shortest path
corresponding to the minimum aggregate error (obtained from our algorithm)
Fig. 4. Experimental Map - (a) arrangement of markers in the environment, (b) topo-
logical representation of the map
Localization System with Global Error Reduction and Online Map-Building 21
     
Fig. 5. Ground Truth vs. Aggregate Error - finding a correlation using exponential
edge weight function f(d)=50
yielded the lowest ground truth positional error compared to the 35 other paths.
The average, minimum, and maximum ground truth errors for the 36 paths
were identical for all three functions: 67.9, 29.1, and 117.8 mm, respectively.
Furthermore, it was observed that compared to the other two, the exponential
function showed the best correlation between the aggregate error and the ground
truth positional error. Fig. 5 shows this correlation in the form of a line of
best fit with correlation coefficient R0.74, using the exponential function
d1. The bottom left-most data point in this figure graphically shows
how the minimum aggregate error corresponds to the minimum ground truth
positional error of the 36 paths for the given metric.
The experiment was repeated using the same exponential error function, ex-
cept the three lower markers were moved closer to the camera and tilted hori-
zontally. In this case, the minimum ground truth error corresponded to the third
least aggregate error and there was a negative correlation of R0.47 between
the aggregate and ground truth errors. This indicates that this particular error
function may not be suitable for all configurations of markers.
A proposed topological mapping approach is applied to a self-localization sys-
tem to reduce global error and eliminate the need for an offline map-building
mode. The map is incrementally built and updated as the camera moves through
the environment. A shortest path algorithm is applied to the map to find the
path of least aggregate error based on an appropriate edge weight metric. Ex-
periments were done with online map-building and using a weight metric based
on the perpendicular distance between the camera and each marker. The re-
sults demonstrate the effectiveness of the system in reducing global error but
emphasize the importance of deriving an appropriate error metric for the edge
22 K. Shaya et al.
1. Anjum, M., Park, J., Hwang, W., il Kwon, H., Hyeon Kim, J., Lee, C., Soo Kim,
K., il Danr Cho, D.: Sensor data fusion using unscented kalman filter for accu-
rate localization of mobile robots. In: 2010 International Conference on Control
Automation and Systems (ICCAS), pp. 947–952 (October 2010)
2. Ansar, A., Daniilidis, K.: Linear pose estimation from points or lines. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 25(5), 578–589 (2003)
3. Brand, M., Antone, M., Teller, S.: Spectral Solution of Large-Scale Extrinsic Cam-
era Calibration as a Graph Embedding Problem. In: Pajdla, T., Matas, J(G.) (eds.)
ECCV 2004. LNCS, vol. 3022, pp. 262–273. Springer, Heidelberg (2004)
4. Chen, X., Li, R., Wang, X., Tian, Y., Huang, Q.: A novel artificial landmark for
monocular global visual localization of indoor robots. In: 2010 International Con-
ference on Mechatronics and Automation (ICMA), pp. 1314–1319 (August 2010)
5. Choi, H., Kim, D.Y., Hwang, J.P., Kim, E., Kim, Y.O.: Cv-slam using ceiling
boundary. In: 2010 the 5th IEEE Conference on Industrial Electronics and Appli-
cations (ICIEA), pp. 228–233 (June 2010)
6. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische
Mathematik 1(1), 269–271 (1959)
7. Fiala, M.: Artag, a fiducial marker system using digital techniques. In: IEEE Com-
puter Society Conference on Computer Vision and Pattern Recognition, CVPR
2005, vol. 2, pp. 590–596 (June 2005)
8. Hwang, S.Y., Park, J.T., Song, J.B.: Autonomous navigation of a mobile robot
using an upward-looking camera and sonar sensors. In: 2010 IEEE Workshop on
Advanced Robotics and its Social Impacts (ARSO), pp. 40–45 (October 2010)
9. Kitanov, A., Bisevac, S., Petrovic, I.: Mobile robot self-localization in complex in-
door environments using monocular vision and 3d model. In: 2007 IEEE/ASME
International Conference on Advanced Intelligent Mechatronics, pp. 1–6 (Septem-
ber 2007)
10. Lim, H., Lee, Y.S.: Real-time single camera slam using fiducial markers. In: ICCAS-
SICE, pp. 177–182 (August 2009)
11. Lv, Q., Zhou, W., Liu, J.: Realization of odometry system using monocular vi-
sion. In: 2006 International Conference on Computational Intelligence and Security,
vol. 2, pp. 1841–1844 (Novemebr 2006)
12. Mavrinac, A., Chen, X., Tepe, K.: An automatic calibration method for stereo-
based 3d distributed smart camera networks. Computer Vision and Image Under-
standing 114(8), 952–962 (2010)
13. Schweighofer, G., Pinz, A.: Robust pose estimation from a planar target. IEEE
Transactions on Pattern Analysis and Machine Intelligence 28(12), 2024–2030
14. Sharp, G., Lee, S., Wehe, D.: Multiview registration of 3d scenes by minimizing
error between coordinate frames. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence 26(8), 1037–1050 (2004)
15. Van Hamme, D., Veelaert, P., Philips, W.: Robust monocular visual odometry by
uncertainty voting. In: 2011 IEEE Intelligent Vehicles Symposium (IV), pp. 643–
647 (June 2011)
16. Yu, Y., Pradalier, C., Zong, G.: Appearance-based monocular visual odometry
for ground vehicles. In: 2011 IEEE/ASME International Conference on Advanced
Intelligent Mechatronics (AIM), pp. 862–867 (July 2011)
17. Zhang, Z.: A flexible new technique for camera calibration. IEEE Transactions on
Pattern Analysis and Machine Intelligence 22(11), 1330–1334 (2000)
... In particular, planar markers [1]- [6], which are designed to be easily detected and associated across images, find extensive use in laboratory and commercial settings (factories, warehouses, mines, etc.). In applications that perform planar marker-based SfM or SLAM [7]- [10], there is a basic need to estimate the 6DOF pose of an observed marker relative to the camera coordinate frame. This is often solved as a special case of planar pose estimation (PPE), which functions by determining the relative pose between a plane of known dimensions and its projection onto the image [11]- [13]. ...
... Marker-based SfM/SLAM is an active research area [7]- [10], [20]. Marker ambiguity is not dealt with explicitly in [7], [9], [20], though [9] combined feature-based SfM with marker-based SfM. ...
... Marker-based SfM/SLAM is an active research area [7]- [10], [20]. Marker ambiguity is not dealt with explicitly in [7], [9], [20], though [9] combined feature-based SfM with marker-based SfM. Munoz-Salinas et al. applied the ratio test of [13] in their marker-based SfM [8] and SLAM pipeline [10]. ...
Full-text available
Planar markers are useful in robotics and computer vision for mapping and localisation. Given a detected marker in an image, a frequent task is to estimate the 6DOF pose of the marker relative to the camera, which is an instance of planar pose estimation (PPE). Although there are mature techniques, PPE suffers from a fundamental ambiguity problem, in that there can be more than one plausible pose solutions for a PPE instance. Especially when localisation of the marker corners is noisy, it is often difficult to disambiguate the pose solutions based on reprojection error alone. Previous methods choose between the possible solutions using a heuristic criteria, or simply ignore ambiguous markers. We propose to resolve the ambiguities by examining the consistencies of a set of markers across multiple views. Our specific contributions include a novel rotation averaging formulation that incorporates long-range dependencies between possible marker orientation solutions that arise from PPE ambiguities. We analyse the combinatorial complexity of the problem, and develop a novel lifted algorithm to effectively resolve marker pose ambiguities, without discarding any marker observations. Results on real and synthetic data show that our method is able to handle highly ambiguous inputs, and provides more accurate and/or complete marker-based mapping and localisation.
... As a solution to these issues, several authors have suggested the use of fiducial markers to enhance the pose estimation process (Davison et al. 2007;Yamada et al. 2009;Klopschitz and Schmalstieg 2007;Shaya et al. 2012;Neunert et al. 2015). The SPM-SLAM method (Rafael et al. 2019) addresses some of the aforementioned limitations by utilizing squared fiducial markers instead of natural features. ...
Full-text available
Fiducial markers are a cost-effective solution for solving labeling and monocular localization problems, making them valuable tools for augmented reality (AR), robot navigation, and 3D modeling applications. However, with the development of many marker detection systems in the last decade, it has become challenging for new users to determine which is best suited for their needs. This paper presents a qualitative and quantitative evaluation of the most relevant marker systems. We analyze the available alternatives in the literature, describe their differences and limitations, and conduct detailed experiments to compare them in terms of sensitivity, specificity, accuracy, computational cost, and performance under occlusion. To our knowledge, this study provides the most comprehensive and updated comparison of fiducial markers. In the Conclusion section, we offer recommendations on which method to use based on the application requirements.
... For instance, a tag-based precision landing on a recharging station for automated energy replenishment of micro UAVs was proposed in [56]. However, few studies have focused on using tags as a significant part of their localization solution [57,58]. ...
Automated visual data collection using autonomous unmanned aerial vehicles (UAVs) can improve the accessibility and accuracy of the frequent data required for indoor construction inspections and tracking. However, robust localization, as a critical enabler for autonomy, is challenging in ever-changing, cluttered, GPS-denied indoor construction environments. Rapid alterations and repetitive low-texture areas on indoor construction sites jeopardize the reliability of typical vision-based solutions. This research proposes a tag-based visual-inertial localization method for off-the-shelf UAVs with only a camera and an inertial measurement unit (IMU). Given that tag locations are known in the BIM, the proposed method estimates the UAV's global pose by fusing inertial data and tag measurements using an on-manifold extended Kalman filter (EKF). The root-mean-square error (RMSE) achieved in our experiments in laboratory and simulation, being as low as 2 − 5 cm, indicates the potential of deploying the proposed method for autonomous navigation of low-cost UAVs in indoor construction environments.
... An alternative to tracking natural keypoints in images is detecting fiducial markers. Klopschitz and other authors [59][60][61] introduced methods for solving visual SLAM based on markers. However, they do not consider optimizing the estimated marker poses for the ambiguity problem. ...
Full-text available
Artificial marker mapping is a useful tool for fast camera localization estimation with a certain degree of accuracy in large indoor and outdoor environments. Nonetheless, the level of accuracy can still be enhanced to allow the creation of applications such as the new Visual Odometry and SLAM datasets, low-cost systems for robot detection and tracking, and pose estimation. In this work, we propose to improve the accuracy of map construction using artificial markers (mapping method) and camera localization within this map (localization method) by introducing a new type of artificial marker that we call the smart marker. A smart marker consists of a square fiducial planar marker and a pose measurement system (PMS) unit. With a set of smart markers distributed throughout the environment, the proposed mapping method estimates the markers’ poses from a set of calibrated images and orientation/distance measurements gathered from the PMS unit. After this, the proposed localization method can localize a monocular camera with the correct scale, directly benefiting from the improved accuracy of the mapping method. We conducted several experiments to evaluate the accuracy of the proposed methods. The results show that our approach decreases the Relative Positioning Error (RPE) by 85% in the mapping stage and Absolute Trajectory Error (ATE) by 50% for the camera localization stage in comparison with the state-of-the-art methods present in the literature.
... Later, Klopschitz et al. [31] proposed a method where Structure from Motion was first employed to reconstruct the scene, and then the markers were located from the recovered camera poses. Karam et al. [32] presented a method to create a map of markers as a pose graph where nodes represent markers and edge the relative pose between them. Finally, Neunert et al. [33] proposed to fuse inertial information with fiducial markers using an EKF. ...
Full-text available
This paper proposes a novel approach for Simultaneous Localization and Mapping by fusing natural and artificial landmarks. Most of the SLAM approaches use natural landmarks (such as keypoints). However, they are unstable over time, repetitive in many cases or insufficient for a robust tracking (e.g. in indoor buildings). On the other hand, other approaches have employed artificial landmarks (such as squared fiducial markers) placed in the environment to help tracking and relocalization. We propose a method that integrates both approaches in order to achieve long-term robust tracking in many scenarios. Our method has been compared to the start-of-the-art methods ORB-SLAM2 [1] and LDSO [2] in the public dataset Kitti [3], Euroc-MAV [4], TUM [5] and SPM [6], obtaining better precision, robustness and speed. Our tests also show that the combination of markers and keypoints achieves better accuracy than each one of them independently.
Visual simultaneous localization and mapping (VSLAM) is an appropriate method for positioning and navigation of intelligent unmanned systems under Global Navigation Satellite Systems (GNSS)-denied environment, but it is still facing some dilemmas in repetitive large-scale environments. In this article, a VSLAM method based on bifocal-binocular vision is proposed. By introducing the binocular camera with different focal lengths, the perception ability of the system in vast space is improved as the designed cameras could complement each other at different working distances. Meanwhile, considering the inherent structure of the scene, additional optimization is proposed to reduce the accumulated error based on the markers distribution knowledge obtained from online placement inference. The algorithm proposed in this article significantly improves the stability and accuracy of the VSLAM system in repetitive large-scale scenes, and is validated in both virtual datasets and real-world environments.
Simultaneous Localization and Mapping (SLAM) is a key problem in the field of Artificial Intelligence and mobile robotics that addresses the problem of localization and mapping when a prior map of the workspace is not accessible. The determination of the SLAM problem has gained significant research momentum up to recent times. In this paper, firstly the problem of SLAM, its general model, framework, the difficulties, and leading approaches are described. Secondly, the progress of SLAM solving algorithms is surveyed throughout history. Pre-development, early SLAM solving algorithms, recent and present methods are presented and the progression of the state-of-art is reviewed based on the impact of leading approaches. We have selected some of the most important approaches of all time (1986–2019) to understand the research development, current trends, and intellectual structure of SLAM. Furthermore, from the trend of recent studies and the existence of difficult problems, a brief but sufficient review in the visual SLAM with the most outstanding approaches is presented. This paper provides one single sufficient review that allows researchers to understand the trend of SLAM, where it has come from, where it is going to and what needs to be more investigated in the SLAM-related field area. The future, in other words, the potential most important approaches inspiring the future researches in the SLAM problem can be seen. This paper will be an efficient overview and a valuable survey for introducing the SLAM solving approaches in mobile robotics as well as the general application of SLAM.
Simultaneous Localization and Mapping is the process of simultaneously creating a map of the environment while navigating in it. Most of the SLAM approaches use natural features (e.g. keypoints) that are unstable over time, repetitive in many cases or their number insufficient for a robust tracking (e.g. in indoor buildings). Other researchers, on the other hand, have proposed the use of artificial landmarks, such as squared fiducial markers, placed in the environment to help tracking and relocalization. This paper proposes a novel SLAM approach by fusing natural and artificial landmarks in order to achieve long-term robust tracking in many scenarios. Our method has been compared to the start-of-the-art methods ORB-SLAM2 [1], LDSO [2] and SPM-SLAM [3] in the public datasets Kitti [4], Euroc-MAV [5], TUM [6] and SPM [3], obtaining better precision, robustness and speed. Our tests also show that the combination of markers and keypoints achieves better accuracy than each one of them independently.
Conference Paper
Full-text available
This paper presents a sensor-data-fusion method using an Unscented Kalman Filter (UKF), to implement an accurate localization system for mobile robots. Integration of data from various sensors using an efficient sensor fusion algorithm is required to achieve a continuous and accurate localization of mobile robots. We use data from low cost accelerometer, gyroscope, and encoders to obtain robot motion information. The UKF, used as an efficient sensor fusion algorithm, is an advanced filtering technique which outperforms the widely-used Extended Kalman Filter (EKF) in many applications. The system is able to compensate for the slip errors by switching between two different UKF models built for slip and no-slip cases. Since the accelerometer error accumulates with time because of the double integration, the data from accelerometer is only used in slip model of the UKF. The results obtained from UKF sensor fusion algorithm are compared with the results from an accurate distance laser sensor. The experimental results show that the system is able to accurately track the motion of the robot in various robot motion scenarios including the scenario when robot's encoders data is not reliable due to the slip occurrence.
Conference Paper
Full-text available
Visual odometry system is an important part in the navigation task. It is a remedy to the traditional odometry, which can give precise distance and course information in unknown or slippery situations. This paper proposes a monocular vision odometry system. It facilitates matching keypoints in different frames sharing the same world coordinates as one necessary constraint. By accurately matching the keypoints from image frames, we can successfully use coordinates transformation to estimate the distance and the rotation angles of the vehicles
This paper presents a method for computing the visual odometry of ground vehicles. By mounting a downward-looking camera on the vehicle, it can recover the motion of the vehicle only by using the video input. Different from other feature based algorithms, the key technique deployed in this system is an appearance-based approach — template matching. A rotated template matching approach is used for motion estimation. For each two consecutive images captured from the camera, an image patch is first selected from the first image, and then the image patch is rotated by every angle within a predefined range and matched against the second image. The best result from the matching indicates both the position translation and heading changes. For performance evaluation, this algorithm is tested on the indoor granite surface. In the end, we give a comparison of the result between wheel odometry and visual odometry.
This paper deals with the autonomous navigation scheme for a mobile robot in indoor environment using an upward-looking camera and sonar sensors. Corner and lamp features are extracted from the sequential ceiling images, and these features are used as landmarks in the SLAM (simultaneous localization and mapping) process. Combining lamp information with the conventional corner feature-based approach provides accurate pose estimation, since lamp features are robustly detected and associated in most indoor environments. The extracted features are used in the EKF (extended Kalman filter) to estimate both robot pose and feature positions. Based on the pose estimation from the SLAM process, autonomous exploration is achieved by applying driving gains to exploration nodes. The sonar sensors are adopted to detect most obstacles including glasses and black surfaces. The proposed scheme is a low-cost solution to autonomous mobile robot navigation since it can be implemented with a web camera and a small number of sonar sensors. Experimental results show that the proposed scheme works successfully in real environments.
Conference Paper
This paper addresses the problem of large scale multiview registration of range images captured from unknown viewing directions. To reduce the computational burden, we decouple the local problem of pairwise registration on neighboring views from the global problem of distribution of accumulated errors. We define the global problem over the graph of neighboring views, and we show that this graph can be decomposed into a set of cycles such that the optimal transformation parameters for each cycle can be solved in closed form. We then describe an iterative procedure that can be used to integrate the solutions for the set of cycles across the graph. This method for error distribution does not require point correspondences between views, and therefore can be used together with robot odometry or any method of pairwise registration. Experimental results demonstrate the effectiveness of this technique on range images of an indoor facility.
We consider a graph with n vertices, all pairs of which are connected by an edge; each edge is of given positive length. The following two basic problems are solved. Problem 1: construct the tree of minimal total length between the n vertices. (A tree is a graph with one and only one path between any two vertices.) Problem 2: find the path of minimal total length between two given vertices.
Conference Paper
In this paper, we consider the problem of mobile robot pose estimation using only visual information from a single camera and odometry readings. A focus is on building complex environmental models, fast online rendering and real-time complex and noisy image segmentation. The 3D model of the robot's environment is built using a professional freeware computer graphics tool named Blender and pre-stored in the memory of the robot's on-board computer. Estimation of the mobile robot pose as a stochastic variable is done by correspondences of image lines, extracted using Random Window Randomized Hough Transform line detection algorithm, and model lines, predicted using odometry readings and 3D environment model. The camera model and ray tracing algorithm are also described. Developed algorithms are experimentally tested using a Pioneer 2DX mobile robot.
Conference Paper
GPS by itself is not dependable in urban environments, due to signal reception issues such as multi-path effects or occlusion. Other sensor data is required to keep track of the vehicle in absence of a reliable GPS signal. We propose a new method to use a single on-board consumer-grade camera for vehicle motion estimation. The method is based on the tracking of ground plane features, taking into account the uncertainty on their backprojection as well as the uncertainty on the vehicle motion. A Hough-like parameter space vote is employed to extract motion parameters from the uncertainty models. The method is easy to calibrate and designed to be robust to outliers and bad feature quality. Experimental results show good accuracy and high reliability, with a positional estimate within 2 metres for a 400 metre elapsed distance.