Visual path following using only monocular vision for urban environments.
ABSTRACT This document provides a summary of a short video with the same title. The video shows the French intelligent transportation vehicle CyCab performing visual path following using only monocular vision. All phases of the process are shown with a spoken commentary. In the teaching phase, the user drives the robot manually while images from the camera are stored. Key images with corresponding image features are stored as a map together with 2D and 3D local information. In the navigation phase, CyCab follows the learned path by tracking the image features projected from the map and by applying a simple visual servoing control law.
- Source available from: Anthony Remazeilles
ABSTRACT: Autonomous cars will likely play an important role in the future. A vision system designed to support outdoor navigation for such vehicles has to deal with large dynamic environments, changing imaging conditions, and temporary occlusions by other moving objects. This paper presents a novel appearance-based navigation framework relying on a single perspective vision sensor, which is aimed at resolving the above issues. The solution is based on a hierarchical environment representation created during a teaching stage, when the robot is controlled by a human operator. At the top level, the representation contains a graph of key-images with extracted 2D features enabling robust navigation by visual servoing. The information stored at the bottom level makes it possible to efficiently predict the locations of features which are currently not visible, and eventually to (re-)start their tracking. The outstanding property of the proposed framework is that it enables robust and scalable navigation without requiring a globally consistent map, even in interconnected environments. This result has been confirmed by realistic off-line experiments and successful real-time navigation trials in public urban areas.
2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 18-23 June 2007, Minneapolis, Minnesota, USA; 01/2007
Conference Paper: Outdoor autonomous navigation using monocular vision
ABSTRACT: In this paper, a complete system for outdoor robot navigation is presented. It uses only monocular vision. The robot is first guided on a path by a human. During this learning step, the robot records a video sequence. From this sequence, a three dimensional map of the trajectory and the environment is built. When this map has been computed, the robot is able to follow the same trajectory by itself. Experimental results carried out with an urban electric vehicle are shown and compared to the ground truth.
Intelligent Robots and Systems, 2005. (IROS 2005). 2005 IEEE/RSJ International Conference on; 09/2005
Conference Paper: 3D navigation based on a visual memory
ABSTRACT: This paper addresses the design of a control law for vision-based robot navigation. The method proposed is based on a topological representation of the environment. Within this context, a learning stage enables a graph to be built in which nodes represent views acquired by the camera, and edges denote the possibility for the robotic system to move from one image to another. A path finding algorithm then gives the robot a collection of views describing the environment it has to go through in order to reach its desired position. This article focuses on the control law used for controlling the robot's motion online. The particularity of this control law is that it does not require any reconstruction of the environment, and does not force the robot to converge towards each intermediary position in the path. Landmarks matched between consecutive views of the path are considered as successive features that the camera has to observe within its field of view. An original visual servoing control law, using specific features, ensures that the robot navigates within the visibility path. Simulation results demonstrate the validity of the proposed approach.
Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on; 06/2006
Visual path following using only monocular vision for urban environments
Albert Diosi, Fabien Spindler, Anthony Remazeilles, Siniša Šegvić and François Chaumette
Abstract—This document provides a summary of a short
video with the same title. The video shows the French intelligent
transportation vehicle CyCab performing visual path following
using only monocular vision. All phases of the process are
shown with a spoken commentary. In the teaching phase, the
user drives the robot manually while images from the camera
are stored. Key images with corresponding image features are
stored as a map together with 2D and 3D local information.
In the navigation phase, CyCab follows the learned path by
tracking the image features projected from the map and by
applying a simple visual servoing control law.
Autonomous transportation vehicles will likely play an
important role in the future. The video corresponding to this
summarizing text shows a system capable of learning and
executing a path using only one perspective camera.
The video shows a CyCab, a French-made, 4-wheel-drive,
4-wheel-steered intelligent vehicle designed to carry 2
passengers. In our CyCab, all computations except
the low-level control are carried out on a laptop with a 2 GHz
Centrino processor. A forward-looking B&W Allied Vision
Marlin (F-131B) camera with a 70° field of view is mounted on
the robot at a height of 65 cm. The camera is used in its auto
shutter mode, and the images are scaled down to 320x240.
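As a rough worked example of what these camera figures imply, the sketch below derives an approximate focal length in pixels for the scaled-down image. It assumes an ideal pinhole camera without lens distortion and that the 70° value is the horizontal field of view; neither assumption is stated in the text, and `focal_length_px` is a hypothetical helper name.

```python
import math


def focal_length_px(image_width_px: float, horizontal_fov_deg: float) -> float:
    """Pinhole focal length in pixels from image width and horizontal field of view."""
    half_fov_rad = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov_rad)


# Assumed inputs: 320-pixel-wide image, 70 degree horizontal field of view.
f_px = focal_length_px(320, 70.0)
print(round(f_px, 1))
```

Under these assumptions the focal length comes out at roughly 228.5 pixels. A calibrated camera model would replace this estimate in practice.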
II. DETAILS ON VISUAL NAVIGATION
This section briefly describes the implemented visual
navigation framework. The teaching of the robot, i.e. the
mapping of the environment, is described first, followed
by a description of the navigation process, which consists of
localization and robot control.
Learning a path (i.e. mapping) starts with the manual
driving of the robot on a reference path while processing
(or storing for off-line mapping) the images from the robot’s
camera. From the images an internal representation of the
path is created, as summarized in fig. 1. The mapping starts
by finding Harris points in the first image, initializing a
Kanade-Lucas-Tomasi (KLT) feature tracker and saving
the first image as the first reference image. The KLT tracker
was modified to compensate for changes in the illumination.
In the next step a new image is acquired and the tracked
features are updated. Tracking is abandoned for features whose
appearance has changed with respect to the previous reference image.
The presented work has been performed within the French national
project Predit Mobivip and project Robea Bodega.
The authors are with IRISA/INRIA Rennes, Campus Beaulieu, 35042
Rennes cedex, France. Email: firstname.lastname@example.org
Fig. 1. The steps involved in building a representation of a path from a
sequence of images, i.e. mapping.
The rest of the features are then used to estimate the 3D
geometry between the previous reference and the current
image. In the 3D geometry estimation, the essential matrix is
recovered using a calibrated algorithm in a random sampling
framework. If the 3D reconstruction error is low and there are
enough tracked features, a new image is acquired. Otherwise
the current image is saved as the next reference image. The
relative pose of the current image with respect to the previous
reference image and the 2D and 3D coordinates of the point
features shared with the previous reference image are also
saved. Then the tracker is reinitialized with new Harris points
added to the old ones and the processing loop continues with
acquiring a new image.
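The mapping loop described above can be sketched as follows. This is a minimal outline and not the authors' implementation: the feature detector, the tracker and the geometry estimation are passed in as hypothetical callables (`detect`, `track`, `estimate_geometry`), and the thresholds `max_error` and `min_features` are illustrative values that do not come from the text.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


@dataclass
class ReferenceNode:
    image: Any
    features_2d: list                       # features shared with the previous reference
    points_3d: list = field(default_factory=list)
    relative_pose: Any = None               # pose w.r.t. the previous reference image


def build_map(images: Iterable[Any],
              detect: Callable[[Any, list], list],
              track: Callable[[Any, list], list],
              estimate_geometry: Callable[[ReferenceNode, list], tuple],
              max_error: float = 1.0,       # assumed threshold
              min_features: int = 20        # assumed threshold
              ) -> List[ReferenceNode]:
    """Select reference images along a taught path (sketch under assumptions)."""
    frames = iter(images)
    first = next(frames)
    features = detect(first, [])            # e.g. Harris points in the first image
    nodes = [ReferenceNode(first, features)]
    for image in frames:
        # The tracker is expected to drop features whose appearance changed.
        features = track(image, features)
        pose, points_3d, error = estimate_geometry(nodes[-1], features)
        if error <= max_error and len(features) >= min_features:
            continue                        # geometry still reliable: acquire next image
        # Otherwise the current image becomes the next reference image.
        nodes.append(ReferenceNode(image, features, points_3d, pose))
        features = detect(image, features)  # add new points to the old ones
    return nodes
```

In a real system `detect` and `track` would wrap a Harris detector and a KLT tracker, and `estimate_geometry` would recover the essential matrix inside a random sampling loop, returning the relative pose, the triangulated points and a reconstruction error.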
The resulting map (fig. 2) is used during autonomous
navigation in the localization module to provide stable image
points for image-based visual servoing.
Fig. 2. The map consists of reference images, 2D and 3D information.
During navigation, the point features from the map are projected into the
current image and tracked.
The localization process during navigation is depicted
in fig. 3. The navigation process is started with the user
selecting a reference image close to the robot’s current
location. Then an image is acquired and matched to the
selected reference image. The matching is done using Lowe’s
SIFT descriptors. Estimating the camera pose from
the matched points makes it possible to project map points from the
reference image into the current image. The projected points
are then used to initialize a KLT tracker. Next, a new image
is acquired and the point positions are updated by the tracker.
Using the tracked points a three-view geometry calculation
is performed between the previous reference, current and
next reference image (fig. 2). If the current image is found
to precede the next reference image, then points from the
map are reprojected into the current image. The projected
points are used to resume the tracking of points currently
not tracked and to stop the tracking of points which are far
from their projections. A new image is acquired next and the
whole cycle continues with tracking. However, if it is found
that the current image comes after the next reference image,
a topological transition is made, i.e. the reference image following
the next one becomes the new next reference image. The tracker is
then reinitialized with points from the map and the process
continues with acquiring a new image.
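The reprojection step in the cycle above can be illustrated with a small sketch. It assumes a pinhole camera with zero skew, an intrinsic matrix `K`, and a pose `(R, t)` that maps reference-frame coordinates into the current camera frame. The function names and the pixel threshold `max_dist_px` are hypothetical and are not taken from the paper.

```python
import math


def project_point(K, R, t, X):
    """Project a 3D map point X into the image: x ~ K (R X + t). Returns None if behind the camera."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0.0:
        return None
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return (u, v)


def reconcile_tracks(tracked, projected, max_dist_px=5.0):
    """Resume lost points at their projections and drop points far from their projections.

    tracked:   dict point_id -> (u, v) currently followed by the tracker
    projected: dict point_id -> (u, v) or None, map points projected into the image
    """
    updated = {}
    for point_id, proj in projected.items():
        if proj is None:
            continue                          # not projectable in this view
        pos = tracked.get(point_id)
        if pos is None:
            updated[point_id] = proj          # resume tracking at the projection
        elif math.hypot(pos[0] - proj[0], pos[1] - proj[1]) <= max_dist_px:
            updated[point_id] = pos           # tracker and map agree: keep the track
        # otherwise the track is too far from its projection and is stopped
    return updated
```

A fuller version would also discard projections that fall outside the image borders before handing the points back to the tracker.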
Fig. 3. Visual localization during navigation.
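The topological transition in the localization cycle can be sketched as below. The text decides whether the current image precedes or follows the next reference image with a three-view geometry calculation. This sketch substitutes a much simpler stand-in test, which is an assumption: given a relative pose `(R, t)` mapping next-reference coordinates into the current camera frame, the current camera is taken to have passed the reference if its centre lies in front of the reference camera along the optical (z) axis. All names are hypothetical.

```python
def camera_center_z(R, t):
    """z-coordinate of the current camera centre in the next-reference frame.

    With X_cur = R X_ref + t, the camera centre is C = -R^T t.
    """
    return -(R[0][2] * t[0] + R[1][2] * t[1] + R[2][2] * t[2])


def update_next_reference(next_index, R, t, n_references):
    """Return (new_next_index, transitioned) after testing the current image.

    A transition moves on to the following reference image, if there is one.
    """
    passed = camera_center_z(R, t) > 0.0
    if passed and next_index + 1 < n_references:
        return next_index + 1, True   # caller should reinitialize the tracker from the map
    return next_index, False          # keep tracking towards the same reference
```

When the function reports a transition, the caller would reinitialize the tracker with points from the map, as described in the text.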
C. Motion Control
In the motion control scheme the robot is not required
to accurately reach each reference image of the path, nor
to follow accurately the learned path since that may not be
useful during navigation. In practice, the exact motion of the
robot should be controlled by an obstacle avoidance module
which we plan to implement soon. Therefore a simple
control algorithm was implemented in which the difference
between the x-coordinates (assuming the forward-facing camera’s
horizontal axis is orthogonal to the axis of robot rotation)
of the feature centroids in the current and next reference
images is fed back into the motion controller of the robot
as the steering angle.
The translational velocity is set to a constant value, except
during turns, where it is reduced (to a smaller constant value)
to ease the tracking of quickly moving features in the image.
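A minimal sketch of such a control rule is given below. The gain, the sign convention (positive steering meaning a left turn), the two speed values and the rule used to detect a turn are all assumptions made for illustration; the text only states that the centroid difference is fed back as the steering angle and that the speed is reduced in turns.

```python
def centroid_x(points):
    """Mean x image coordinate of a non-empty list of (u, v) feature positions."""
    return sum(u for u, _ in points) / len(points)


def steering_command(current_pts, next_ref_pts, gain=0.01):
    """Steering angle proportional to the centroid x-difference (hypothetical gain).

    Both lists are assumed to hold the same matched features. Features lying to
    the right of their reference positions produce a negative (right) command
    under the assumed sign convention.
    """
    error_px = centroid_x(current_pts) - centroid_x(next_ref_pts)
    return -gain * error_px


def forward_speed(steering, cruise=1.0, turning=0.5, turn_threshold=0.1):
    """Constant speed, reduced to a smaller constant when the steering command is large."""
    return turning if abs(steering) > turn_threshold else cruise


current = [(170.0, 100.0), (190.0, 120.0)]
next_ref = [(150.0, 100.0), (170.0, 120.0)]
steer = steering_command(current, next_ref)
speed = forward_speed(steer)
```

Because only the horizontal centroid offset is used, this rule needs neither the recovered pose nor the 3D points, which are only used to keep the set of tracked features stable.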
III. ADDITIONAL INFORMATION
The concept of the framework has been evaluated using
simulations in [4], while the feature tracker and the complete
vision subsystem have been described in [6], [7]. An experimental
evaluation of the navigation framework can be found in [2].
IV. RELATED WORK
In [5] a CyCab follows a prerecorded trajectory using a
camera with a fisheye lens. Unlike in our work, an accurate
3D model of the path is created using a dense set of
reference images with global bundle adjustment. After
the scale is corrected using odometry measurements, the
robot accurately follows the reference path using pose based
In [1] a robot navigated outdoors with 2D image information
only. During mapping, image features were tracked and
their image patches together with their x image coordinates
were saved. During navigation, the robot control was based
on simple rules applied to the tracked feature coordinates in
the next reference and current image.
The work described in [3], aimed at indoor navigation,
can deal with occlusion of features using 3D information.
A local 3D reconstruction is done between two reference
omnidirectional images. During navigation, tracked features
which have been occluded get reprojected back into the
current image. The recovered pose of the robot is used to
guide the robot towards the target image.
REFERENCES
[1] Z. Chen and S. T. Birchfield. Qualitative vision-based mobile robot
navigation. In ICRA, Orlando, 2006.
[2] A. Diosi, A. Remazeilles, S. Segvic, and F. Chaumette. Outdoor
visual path following experiments. Submitted to IROS’07.
[3] T. Goedeme, T. Tuytelaars, G. Vanacker, M. Nuttin, and L. Van Gool.
Feature based omnidirectional sparse visual path following. In IROS,
Edmonton, Canada, August 2005.
[4] A. Remazeilles, P. Gros, and F. Chaumette. 3D navigation based on a
visual memory. In ICRA’06, 2006.
[5] E. Royer, J. Bom, M. Dhome, B. Thuillot, M. Lhuillier, and F. Marmoiton.
Outdoor autonomous navigation using monocular vision. In
IROS, pages 3395–3400, Edmonton, Canada, August 2005.
[6] S. Segvic, A. Remazeilles, and F. Chaumette. Enhancing the point
feature tracker by adaptive modelling of the feature support. In ECCV,
Graz, Austria, 2006.
[7] S. Segvic, A. Remazeilles, A. Diosi, and F. Chaumette. Large scale
vision based navigation without an accurate global reconstruction.
Accepted to CVPR’07, Minneapolis, June 2007.