IMAGE MATCHING ROBUST TO CHANGES IN IMAGING CONDITIONS WITH A CAR-MOUNTED CAMERA
Naoko Enami, Norimichi Ukita, Masatsugu Kidode
Nara Institute of Science and Technology
8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, JAPAN
ABSTRACT
In this paper, we propose a matching method for images captured at different times and under different capturing conditions. The images to be matched are low-resolution and low frame-rate images captured asynchronously. To cope
with this difficulty, previous and current panoramic images
are created from sequential images which are rectified based
on the view direction of a camera, and then compared. In ad-
dition, in order to allow the matching method to be applicable
to images captured under varying conditions, (1) for differ-
ent lanes, enlarged/reduced panoramic images are compared
with each other, and (2) robustness to noise and changes in illumination is improved by using edge features. To confirm the effectiveness of the proposed method, we conducted experiments matching real images captured under various capturing conditions.
Index Terms— Image Matching, Car Mounted Camera,
GPS, Panoramic Image
1. INTRODUCTION
Car navigation systems with digital maps are widely used in society. For accurate navigation, the map should be up-to-date. However, most map information is updated on the basis of surveillance fieldwork.
Map information includes road information and streetscape information. Road information describes roads (e.g., the locations of road markers and traffic signals, the number of lanes). Streetscape information provides the locations of stores/buildings that can be landmarks and destinations for drivers. Road information is scheduled to be compiled as a database by government ministries. Therefore, if streetscape information can also be updated automatically by detecting changes in streetscapes, all the map information can be updated efficiently.
Methods for detecting changes in streetscapes by comparing their images have been proposed. One method uses satellite images. A change in a streetscape is detected from the difference between segments in images captured at the same location at different times. However, some important information, such as signs, cannot be obtained because the images are captured from the sky. Another method uses a car-mounted omni-directional camera and an off-the-shelf GPS. Since the sides of buildings are observed, the information obtained by this method is more detailed than that obtained from satellite images. Image sequences observed at roughly the same location are first extracted from a number of images captured at different times with position information. Since the off-the-shelf GPS has a margin of error of about 15m, the images at the same location are extracted by matching between the sequences. A change in the streetscape is then detected from the pixel difference between the extracted images.
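As a rough illustration of this pixel-difference idea (a minimal sketch, not the cited method itself; the file names and the threshold value are hypothetical):

import cv2

# Load a previous and a current image assumed to show the same location
# (file names are placeholders).
prev = cv2.imread("previous.jpg", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("current.jpg", cv2.IMREAD_GRAYSCALE)

# Per-pixel absolute difference; large values indicate a possible change.
diff = cv2.absdiff(prev, curr)

# Threshold the difference to obtain a binary change mask
# (the threshold 40 is an arbitrary illustrative value).
_, change_mask = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)
cv2.imwrite("change_mask.png", change_mask)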
Since this previous method assumes that an omni-directional camera captures images under convenient conditions (i.e., high frame-rate capturing at a low car speed), (1) a small change is difficult to detect because the resolution per area of the omni-directional camera is low, and (2) wide areas cannot be observed simultaneously. Image observation under the above conditions is not practical for public automobiles, that is, dedicated probe cars are required, and it is impossible to secure a large number of probe cars in order to observe wide areas simultaneously.
Consequently, we propose a map-update system with the following features:
• Wide areas can be observed simultaneously by obser-
vation from a number of normal automobiles.
• The system assumes that each automobile has an or-
dinary low-resolution car-mounted camera, of the type
which is often used in a drive-recorder, and an off-the-
shelf GPS receiver.
• The collected data are sent from each automobile to the server system via wireless communication. The server system analyzes the collected images in order to detect changes in streetscapes.
As with the previous method, our system first matches
previous and current images captured in the same location.
Any change is then detected based on the difference between
the matched images. However, our system has the following
problems that need solving:
• An off-the-shelf GPS receiver has a margin of error of about 15m.
• Images captured at different times look different due to changes in illumination and weather conditions.
• Since images are captured from a number of normal automobiles, the camera positions and view directions are different.
• The frame rate of the images is low due to wireless communication, which results in large differences in the image-capturing positions.
• Since automobiles run in different lanes, the appearance of the streetscapes in the images changes significantly.
While the first two problems are also dealt with by existing methods, the last three are specific to our system.
We propose an image matching method that solves all the
problems. This matching method must be robust to changes
in capturing conditions that are due to observation by various
car-mounted cameras at different times.
2. MAP UPDATE SYSTEM
In this section, we describe the details of the map update system. This system collects detailed streetscape information from a number of automobiles, and changes in streetscapes are detected by using the collected information.
For simultaneous wide-area observation, normal automobiles
shown in Fig.1(a) collect information while running freely
with a monocular camera and a GPS. The camera position and view direction are different for each automobile. The
camera is assumed to be set behind the rearview mirror. Its view direction points in the automobile's moving direction, as shown in Fig.2. In Fig.1(b), the collected infor-
mation (i.e., image, positioning information, capturing time,
internal parameters of the camera, camera positions) is sent
to the map update system via wireless communication, such as the cell-phone network that is used by an existing map-delivery system. Because of the wireless communication, images must be compressed (e.g., JPEG) and low-resolution (e.g., 640×480 pixels) in order to decrease data traffic. However, unlike an omni-directional image that captures a 360-degree view in a 360×240-pixel image, a normal camera can capture the details
[Fig. 1 diagram: client automobiles, each with a monocular camera and a car-navigation system, send (a) input data (images, positions, capturing conditions) to the server (map-update system) via wireless communication; the server (1) records the input data, (2) matches the previous and input data, (3) detects the change in streetscapes, and (4) updates the map, producing (b) output data: the newest map.]
Fig. 1. Navigation map update system.
Fig. 2. Camera mounted in a car.
of a streetscape (e.g., a store sign) even in a low-resolution image, as shown in Fig.3. Under the following imaging conditions, a maximum of 450KB is transmitted per second: the file size of an image is about 100KB-150KB, and the frame rate is 3fps. The bandwidth of a current cell-phone is about 269kbps, but for the next-generation cell-phone it is 20Mbps in an experimental measurement. Therefore, if the above-mentioned compressed images are received selectively from only several cars, it is possible for the server system to collect enough information online to update a map.
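A quick back-of-the-envelope check of these figures (a minimal sketch in Python; the numbers are the ones quoted above):

# Data-rate check for the figures quoted above.
file_size_kb = 150          # upper bound on one JPEG image (KB)
frame_rate = 3              # frames per second
per_car_kbps = file_size_kb * frame_rate * 8   # KB/s -> kbit/s

print(per_car_kbps)         # 3600 kbit/s = 3.6 Mbit/s (i.e., 450 KB/s)

# A 20 Mbit/s next-generation link could therefore carry streams
# from roughly this many cars at once:
print(20_000 // per_car_kbps)   # 5 cars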
The system updates the map using the collected information, as illustrated in Fig.1:
(1) The previous-information database is compiled from the collected input data.
(2) The system matches the previous and current image sequences that are captured at the same location.
(3) The change is detected by comparing the matched images.
(4) The map is updated based on the detected change.
Fig. 3. (a): Observed image in our system; (b): omni-directional image in the previous method.
3. IMAGE MATCHING ROBUST TO CHANGES IN
THE CAPTURING CONDITIONS
3.1. Image Matching using Panoramic Images
In our method, previous and current images captured at exactly the same location are matched by comparing the sequences that are extracted based on the position information. However, in our system the capturing intervals are lengthened in order to send the images via wireless communication. Because of different car speeds, the capturing timing is not synchronized between sequences. Even if the current image is compared with the previous image captured at the location nearest to where the current image is captured (Fig.4(a)), simply searching for an overlapping region between these two images is difficult because the region is small due to the long capturing intervals. There-
fore, our method matches the images using their panoramic
images. The panoramic image is created by connecting time-
series images in each previous and current sequence. Since
the overlapping region is large in the panoramic image, stable
matching is possible even if the capturing interval is long and
the capturing timing changes (Fig.4(b)).
Several methods for generating panoramic images have been proposed: (1) one creates a panoramic image by concatenating images captured by a camera that moves along a straight line parallel to a streetscape and stops at regular intervals, and (2) others create a panoramic image from an image sequence captured by a moving camera. Some of these use line-scanned images captured by a car-mounted camera. Furthermore, image blurring can be rectified by using the estimated depth of an object in the panoramic image. In another method, a 3D textured urban model is generated by projecting observed line-scanned images onto a CAD model by using a laser range scanner. These additional procedures allow us to generate more informative panoramic images. However, none of these methods can deal with differences in capturing conditions such as moving speed and camera angle. Differences in these conditions result in a corrupted panoramic image. As mentioned before, these differences inevitably exist in our map-update system.
Our method copes with these differences by (1) rectify-
ing captured images based on camera angle and (2) adjusting
the width of each image concatenated in chronological order
based on camera speed. With the images thus rectified and
adjusted, panoramic images acquired at different times can
be similar. The similar panoramic images facilitate finding an
overlapping region between them (i.e., facilitate matching the previous and current images).
As with a panoramic image, EPI (Epipolar Plane Image)
analysis is a popular way to improve the robustness of an-
alyzing an observed scene by connecting time-series images. Several methods for a car-mounted camera have also been proposed.
[Fig. 4 diagram: (a) matching the input image at frame t+1 with each previous image; (b) matching a panoramic image generated from the input against a panoramic image generated from the previous images.]
Fig. 4. Image matching.
For example, there are methods for 3D reconstruction and arbitrary view generation with an EPI image, and for arbitrary view generation by matching the EPI images of multiple cameras. However, in our proposed system, normal automobiles
often move unpredictably with undesired movements, for ex-
ample, swaying up-down and left-right, and changing speed.
These undesired movements make EPI analysis more likely
to fail. The negative influence of these movements has been confirmed experimentally.
The features for matching are extracted from the less-changed streetscape area that faces the car's lane. This is because the streetscape area has characteristic features for identifying the capturing position. However, since the view direction points in the automobile's moving direction, the streetscape areas are significantly distorted at the left side of the image, as shown in Fig.5. Although the camera points in the automobile's
moving direction in our system, the camera should face the streetscape to observe images suitable for identification. In order to create such images virtually, a part of the captured image is cut out and projected onto a panoramic image plane using a projective transformation.
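A minimal sketch of such a projective transformation with OpenCV; the corner coordinates of the cut-out region and the target size are hypothetical:

import cv2
import numpy as np

img = cv2.imread("frame.jpg")  # placeholder input frame

# Hypothetical corners of the streetscape region in the captured image
# (a quadrilateral distorted by perspective), listed clockwise.
src = np.float32([[80, 60], [300, 90], [300, 400], [80, 430]])

# Target rectangle on the virtual panoramic image plane.
w, h = 220, 340
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# Projective transformation (homography) from the image to the plane.
H = cv2.getPerspectiveTransform(src, dst)
patch = cv2.warpPerspective(img, H, (w, h))
cv2.imwrite("patch.png", patch)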
3.2. The problems that occur due to the change in capturing conditions
Our matching method must cope with the changes in capturing conditions that occur due to various factors (e.g., the number and variety of automobiles and cameras, and different capturing times). In what follows, for each case we describe (1) the change in capturing conditions that affects the appearance of the panoramic image and (2) how to cope with it.
(1) Changes in the view direction.
Since the view directions of cameras differ, the appearances of the images differ too, even if they are captured at the same location, as shown in Fig.5(a) and (c). This difference disturbs the matching of the panoramic images. Therefore, a virtual image is created from the captured image so that its view direction coincides with
the automobile’s moving direction. The view direction
of a real camera is obtained with reference to the au-
tomobile’s moving direction and used to generate the
virtual image. The stability of the image matching can
be improved using the panoramic images that are created from these virtual images.
Fig. 5. Images captured with different camera-view directions and in different lanes: (a) in a left lane and (b) in a right lane with the same camera-view direction; (c) in a left lane with a different camera-view direction.
In the line-scan method mentioned above, too, a panoramic
image is created from the images in which the roadside
is observed. In that system, however, an undistorted panoramic image can easily be created by connecting high-frame-rate line-scan images observed by a fixed camera. In our system, on the other hand, image compensation based on the camera rotation angle is required to obtain a panoramic image suitable for image matching.
(2) Changes in the weather conditions
The illumination changes greatly depending on the capturing time. Because the capturing time is included in the capturing information sent from the client, image sequences captured at about the same time of day can be found. However, comparing the images is still difficult because of changes in brightness due to the weather conditions. In the existing method, image matching is done by DP matching using the color information in the images. However, color information is susceptible to changes in the weather conditions, and if the illumination changes excessively, as it can in our system, it is not suitable as a feature. Therefore, the images are matched using edge features, which are robust to changes in illumination (see the sketch after this list).
(3) Changes such as roadside trees and signs
Regions such as roadside trees and temporary signs change over time and disturb the matching between current and previous images. These regions are therefore removed from the edge images. The resulting edge images enable image matching using characteristic features (e.g., boundaries of buildings) that are robust to changes over time.
(4) Changes in driving lanes
If images are observed in different lanes, the appearance of the streetscapes is also different because the distance between the camera and the streetscape changes, as shown in Fig.5(a) and (b). In order to match these images correctly, they are compared while changing their sizes, as shown in the sketch below.
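A minimal sketch of items (2) and (4), assuming OpenCV; the Canny thresholds, scale range, and file names are illustrative rather than the paper's values. Edges serve as illumination-robust features, and resized versions of the current panorama are compared with all regions of the previous one:

import cv2
import numpy as np

prev_pano = cv2.imread("prev_pano.png", cv2.IMREAD_GRAYSCALE)
curr_pano = cv2.imread("curr_pano.png", cv2.IMREAD_GRAYSCALE)

# (2): edge features are more robust to illumination changes than
# raw color values (Canny thresholds here are illustrative).
prev_edges = cv2.Canny(prev_pano, 50, 150)
curr_edges = cv2.Canny(curr_pano, 50, 150)

# (4): compare resized versions of the current panorama with all regions
# of the previous one, to absorb appearance changes between lanes.
best = (-1.0, None, None)  # (score, scale, location)
for scale in np.linspace(0.7, 1.4, 8):
    templ = cv2.resize(curr_edges, None, fx=scale, fy=scale)
    if templ.shape[0] > prev_edges.shape[0] or templ.shape[1] > prev_edges.shape[1]:
        continue  # the template must fit inside the searched image
    scores = cv2.matchTemplate(prev_edges, templ, cv2.TM_CCORR_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    if max_val > best[0]:
        best = (max_val, scale, max_loc)

score, scale, loc = best
print(f"best correlation {score:.3f} at scale {scale}, offset {loc}")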
[Fig. 6 flowchart: previous data and input data captured at the same location are matched. Panoramic image generation for matching robust to capturing conditions: 1. rotation rectification (a. FOE estimation; b. rotation angle estimation; c. rectification); 2. panoramic image generation (a. region extraction; b. projection to a panoramic plane; c. panoramic image concatenation); 3. feature extraction for matching (a. edge extraction; b. noise removal). Image sequence matching: 4. matching (a. template enlargement/shrinkage; b. correlation calculation).]
Fig. 6. Image matching process.
3.3. Image matching process
Fig.6 illustrates the process flow of our method.
(1) Image rectification based on the camera rotation (Sec.3.4)
1. FOE estimation: The FOE is estimated using the optical flows observed in the images.
2. Camera rotation estimation: The rotation angle of the camera is estimated using the FOE.
3. Image rectification: Each observed image is rectified so that the images appear as if captured while the camera is directed in the moving direction of the automobile.
(2) Panoramic image generation from an image sequence (Sec.3.5)
1. Region extraction: A part of each image (hereafter called the rectangle image) whose width is equal to the inter-frame displacement is extracted from the rectified image.
2. Image projection: The rectangle image is projected onto a panoramic image plane parallel to the automobile's moving direction.
3. Panoramic image generation: The panoramic image is created by concatenating the rectangle images.
(3) Feature extraction for image matching (Sec.3.6)
Edges are extracted from the panoramic image, and noise is eliminated.
(4) Image matching (Sec.3.7)
For image matching, variously sized versions of the current panoramic image are compared with all regions in the previous panoramic image.
3.4. Rotation Rectification
First, we describe how to estimate the FOE in images. When
an automobile with the camera moves forward, the FOE indi-
cates its moving direction.
In practice, the FOE is calculated as follows. First of all, feature points are extracted from the image at time t. Each point is tracked in the image at t + 1 based on
correlation of its local image in order to calculate its optical
flow. The feature point should be extracted in a streetscape
region that consists of characteristic appearances and that is
always visible from the camera. It is, however, difficult to robustly estimate the flow of a point near the image center, where a scene distant from the camera is observed, because its displacement between sequential frames is very small and leads to a noisy flow. The flow around an outer region can, on the other hand, be robustly estimated because its displacement is large enough, provided that the point correspondence between sequential frames is established. However, the large displacement makes the optical flow estimation itself difficult. Taking this trade-off into account, the feature points are extracted around the mid point1. The flows of these points are computed based on their tracking results between N frames2. The centroid of the intersections of these flows is regarded as the FOE in the image at t.
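A sketch of this FOE estimation. The paper takes the centroid of the flow intersections over N frames; the sketch below solves the equivalent least-squares intersection of the flow lines from a single frame pair, using OpenCV's standard tracker and illustrative parameter values:

import cv2
import numpy as np

def estimate_foe(frame_t, frame_t1):
    """Estimate the FOE from optical flow between two grayscale frames."""
    # Restrict feature extraction to the mid band (x = 120..170 in a
    # VGA image), as in the paper's footnote.
    mask = np.zeros_like(frame_t)
    mask[:, 120:170] = 255
    pts = cv2.goodFeaturesToTrack(frame_t, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7,
                                  mask=mask)
    # Pyramidal Lucas-Kanade tracking into the next frame.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(frame_t, frame_t1, pts, None)
    ok = status.ravel() == 1
    p = pts.reshape(-1, 2)[ok]
    q = nxt.reshape(-1, 2)[ok]

    # Each flow defines a line through p_i with direction d_i = q_i - p_i.
    # The FOE minimizes the summed squared distance to all these lines:
    # solve (sum N_i) x = sum N_i p_i with N_i = I - d_i d_i^T / |d_i|^2.
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for pi, qi in zip(p, q):
        d = qi - pi
        n2 = float(d @ d)
        if n2 < 1e-6:
            continue  # near-zero flow is unreliable; skip it
        N = np.eye(2) - np.outer(d, d) / n2
        A += N
        b += N @ pi
    return np.linalg.solve(A, b)  # FOE (x, y) in image coordinates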
If the optical axis of the camera coincides with the 3D line from the optical center of the camera to the FOE in the image plane, the image plane is considered to be perpendicular to the moving direction of the automobile. The rotation angle of the camera is represented by the pan (θ) and tilt (φ) angles between the optical axis and this 3D line, and is obtained by

\theta = \tan^{-1}\left(\frac{x_F}{fl}\right), \quad \phi = \tan^{-1}\left(\frac{y_F}{fl}\right),

where (x_F, y_F) are the image coordinates of the FOE measured from the image center, and fl denotes the focal length of the camera, which is known because all required camera parameters are sent with the observed images from an automobile.
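For concreteness, a tiny numeric sketch of the formula above (the FOE offset and focal length values are hypothetical):

import numpy as np

# Pan/tilt of the camera from the FOE; coordinates and focal length
# are in pixels, with the FOE expressed relative to the image center.
foe_x, foe_y = 52.0, -18.0   # hypothetical FOE offset from image center
fl = 700.0                   # hypothetical focal length in pixels

theta = np.arctan2(foe_x, fl)   # pan angle (rad)
phi = np.arctan2(foe_y, fl)     # tilt angle (rad)
print(np.degrees(theta), np.degrees(phi))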
Each observed image is then rectified with the rotation angle of the camera. Fig.7 illustrates the plane consisting of the
optical axis and the x axis of the image in a 3D coordinate
system. Let virtual image plane B be perpendicular to the op-
tical axis, while observed image plane A is rotated by θ and
φ. To obtain a pixel value for each point b in B for generating
a rectified image, the intersection, a, of A and the line that is
determined by b and the optical center Oc is computed. The 2D coordinates of a in A can be calculated by rotating a by −θ and −φ; the x-y coordinates of a 3D point, b, in B indicate
1In the experiments shown in this paper, the feature points are extracted between x = 120 and x = 170 in a VGA image.
2In the experiments shown in this paper, N is set to 3 because any point is observed through at least 3 frames in our captured image sequences.
Fig. 7. Rotation rectification.
[Fig. 8 diagram: image plane B at positions t+1 and t+2 along the moving direction; extracted image regions are projected onto panoramic image plane C, which is parallel to the streetscape in the scene.]
Fig. 8. Panoramic image generation.
its 2D image coordinates whose image center is at the intersection of the optical axis and B. That is, a is rotated by the rotation matrix $R = R_x(-\phi)\,R_y(-\theta)$, where $R_x$ and $R_y$ denote rotations about the x and y axes, respectively. This process is performed for all pixels in B.
As a result, the rotation-rectified image can be generated.
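One way to realize this per-pixel rotation is a single homography warp, H = K R K^-1, under the pinhole camera model. The following is a sketch under that assumption; sign conventions may need adjusting for a particular camera setup:

import cv2
import numpy as np

def rectify_rotation(img, theta, phi, fl):
    """Warp an image so its optical axis points in the moving direction.

    Rotating every viewing ray by -theta (pan) and -phi (tilt) is
    implemented as the single homography H = K R K^-1 (pinhole model).
    """
    h, w = img.shape[:2]
    K = np.array([[fl, 0, w / 2.0],
                  [0, fl, h / 2.0],
                  [0, 0, 1.0]])
    # Pan: rotation about the y axis; tilt: rotation about the x axis.
    Ry = np.array([[np.cos(-theta), 0, np.sin(-theta)],
                   [0, 1, 0],
                   [-np.sin(-theta), 0, np.cos(-theta)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(-phi), -np.sin(-phi)],
                   [0, np.sin(-phi), np.cos(-phi)]])
    H = K @ Rx @ Ry @ np.linalg.inv(K)
    return cv2.warpPerspective(img, H, (w, h))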
3.5. Panoramic Image Generation
A rectangle image is created by extracting a part of an ob-
served image. The rectangle image is used for generating a
panoramic image (Fig.8). If the rectangle image is extracted
from an outer region, its appearance is deformed significantly
due to perspective projection. This deformation makes it difficult to smoothly concatenate the sequential rectangle images for generating a panoramic image. Although the deformation around the image center is small, features suitable for image matching cannot be obtained from this region because everything looks tiny. The rectangle image is, therefore, extracted from the mid point between the left side and the center of the image.
The width of the rectangle image should be determined by the horizontal displacement between two sequential frames in order to suppress the gap between sequential rectangle images. The displacement is determined from the horizontal component of the optical flow, which is also employed for obtaining the FOE. However, while the FOE needs to be estimated only once, unless the camera mounted in an automobile is moved, a rectangle image is extracted from every observed image. That is, feature tracking for the optical flow must be performed for every frame.
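A minimal sketch of this strip extraction and concatenation, assuming rotation-rectified frames and per-frame horizontal displacements from the optical flow; the cut position follows the VGA geometry described above, but the exact value is illustrative:

import numpy as np

def build_panorama(frames, displacements, cut_x=160):
    """Concatenate vertical strips into a panoramic image (sketch).

    frames: rotation-rectified grayscale frames in time order.
    displacements: per-frame horizontal optical-flow magnitude in pixels
        (the same flow used for FOE estimation).
    cut_x: column around which each strip is cut; 160 is the midpoint
        between the left edge and the center of a 640-pixel-wide frame.
    """
    strips = []
    for img, d in zip(frames, displacements):
        w = max(1, int(round(d)))     # strip width follows the car speed
        x0 = max(0, cut_x - w // 2)
        strips.append(img[:, x0:x0 + w])
    return np.hstack(strips)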