# Image Registration in a Coarse Three-Dimensional Virtual Environment



Volume 25 (2006), number 1 pp. 69–82

COMPUTER GRAPHICS forum


R. G. Laycock and A. M. Day

School of Computing Sciences, University of East Anglia, Norwich, UK

robert.laycock@uea.ac.uk, amd@cmp.uea.ac.uk

Abstract

In recent years, the availability of off-the-shelf geometric data for an urban environment has increased. During rendering, ground level images are mapped onto the façades of the buildings to improve the visual quality of the scene. This paper focuses on a technique that enables ground level images to be automatically integrated into an existing coarse three-dimensional environment. The approach utilises the planar nature of architectural scenes to enable the automatic extraction of a building façade from an image and its registration into the virtual environment.

Keywords: image registration, virtual environment, texture mapping

ACM CCS: I.4.8 Image Processing and Computer Vision: Scene Analysis

1. Introduction

The development of technologies in the remote sensing and photogrammetry fields has led to an increase in the availability of large three-dimensional urban models. Many applications, such as virtual tourism and urban planning, require these models to be rendered in real time. To improve the visualization of such models, images are applied to the geometry. A typical approach is to overlay an aerial image onto the 2.5D geometric model [1,2]. This is fast to undertake, and the result is adequate for aerial views of the scene. However, it is less than desirable for ground level walkthrough applications, because the aerial image is stretched over the building façades, which are an integral part of the model during a walkthrough. Consequently, ground level images should be incorporated into the model to improve the visual quality of the rendered scene.

2. Related Work

A variety of approaches have been developed that would facilitate the integration of ground level images into an initial three-dimensional model. One possibility would be to obtain a three-dimensional model and the corresponding texture maps simultaneously using structure-from-multiple-view techniques [3–5]. The resulting model would then be interactively merged with, and automatically refined to fit, the initial virtual environment. The difficulty with these techniques is that many architectural scenes are planar. Therefore, the relationship between the automatically identified corresponding features is often better modelled using a planar homography than using the fundamental matrix. Furthermore, the model will become distorted owing to occluding objects. To alleviate the problems of occlusion, interactive modelling systems have been developed, such as those described in [6] and [7]. In these systems, the user is able to generate a model by constructing various primitives such as cuboids and prisms. The primitives are fitted to the image automatically to obtain the model. However, these interactive systems involve too much user input when modelling a large area. Consequently, vehicle-borne systems have been developed by Früh and Zakhor [2] and by Zhao and Shibasaki [8]. Früh and Zakhor present a procedure that enables the models to be registered with the three-dimensional data. While these approaches are fast at acquiring the model and the textures, they require additional hardware such as laser scanners. A further disadvantage is the limited field of view, since the cameras and lasers are fixed on a tripod. This limitation in capturing images that cover the entire building façade is reduced if the camera is permitted to move freely.

© 2006 The Authors

Journal compilation © 2006 The Eurographics Association and Blackwell Publishing Ltd. Published by Blackwell Publishing, 9600 Garsington Road, Oxford 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.


Submitted August 2004

Revised April 2005

Accepted July 2005


R. G. Laycock and A. M. Day/Image Registration in a Coarse 3D Virtual Environment

Many techniques have been developed to perform image registration onto small objects in a controlled environment [9–11]. In an urban environment, approaches such as those described in [12–16] have been developed. These techniques match structures in the image with structures in the range data. However, they are difficult to apply to image registration in a coarse three-dimensional urban environment, because the mesh representing the building façades has a low resolution. Brenner and Haala [1] integrated the images into the model by exploiting the property that building façades are largely planar structures. The images were registered to the building façades by manual selection of four corresponding points. These points could be the four corners of the building in the image and the corresponding points in the model, which would be sufficient to transform the image plane from a projective space to the metric space of the model. This manual process has several disadvantages when attempting to model a large environment. It is laborious, and it assumes that four points are always available in the images. This assumption is not always valid. In a residential environment, objects such as vegetation or fencing will often occlude the base of the building, leaving only two corner feature points visible in the image.

This paper presents an approach for the automatic registration of ground level images to the building façades contained in a coarse three-dimensional urban model. It functions in the presence of occluding objects.

3. Generating the Urban Model

In this work, the urban model has been generated by combining Light Detection and Ranging (LIDAR) elevation data and LandLine.Plus data, as discussed in [17]. The LandLine.Plus data provide building footprints for the urban model as well as a road network. The building footprints are extruded by the height values contained in the LIDAR elevation data and are capped with a roof model. The roof models are automatically generated for a given building footprint polygon. The road network from the LandLine.Plus data is partitioned into line segments representing the streets, where the end points of the streets coincide with junction points in the road network.

The system developed has the functionality to print a map indicating the area to be modelled and the street identification numbers. A user can obtain images for every building along each street labelled in the map. This provides a mapping from the ground level images to the buildings as they are viewed in the virtual scene. Figure 1 indicates the mapping of building images to the building façades in the virtual scene: each ray cast perpendicular to the road centre line intersects a building wall in the virtual scene. In addition, Figure 1 illustrates the result of extruding the building footprints in the LandLine.Plus data by the amount indicated by the LIDAR height data. This model permits a user to obtain a set of images for which the corresponding façade in the virtual model is known for each image. The remainder of this paper details the technique that enables a building to be identified in the ground level image and mapped onto the façade in the virtual model automatically.

Figure 1: Mapping ground level images to buildings in the Virtual Urban Environment.
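The paper does not give code for the perpendicular-ray step. The following is a minimal 2D sketch that assumes the road centre line is a segment from `road_a` to `road_b` and that the building walls are footprint edges in the same ground-plane coordinates. The function names `ray_segment_hit` and `facade_hit` are hypothetical. A ray is cast perpendicular to the road from a point on its centre line, and the nearest wall it hits is taken to be the façade for that image.

```python
import math


def ray_segment_hit(origin, direction, seg):
    """Distance s >= 0 along origin + s*direction to segment seg, or None."""
    (ox, oy), (dx, dy) = origin, direction
    (ax, ay), (bx, by) = seg
    ex, ey = bx - ax, by - ay
    denom = dx * ey - dy * ex            # 2D cross product: direction x edge
    if abs(denom) < 1e-12:
        return None                      # ray parallel to the wall
    wx, wy = ax - ox, ay - oy
    s = (wx * ey - wy * ex) / denom      # position along the ray
    u = (wx * dy - wy * dx) / denom      # position along the wall (0..1 inside)
    if s >= 0.0 and 0.0 <= u <= 1.0:
        return s
    return None


def facade_hit(road_a, road_b, t, walls, side=1):
    """Cast a ray perpendicular to the road at parameter t (0..1).

    Returns (wall_index, distance) for the nearest wall hit, or None.
    side=+1 or -1 selects which side of the road the ray is cast to.
    """
    (ax, ay), (bx, by) = road_a, road_b
    length = math.hypot(bx - ax, by - ay)
    px, py = ax + t * (bx - ax), ay + t * (by - ay)
    nx, ny = -side * (by - ay) / length, side * (bx - ax) / length
    best = None
    for index, wall in enumerate(walls):
        s = ray_segment_hit((px, py), (nx, ny), wall)
        if s is not None and (best is None or s < best[1]):
            best = (index, s)
    return best
```

Consider a road running along the x axis from (0, 0) to (100, 0) and a square footprint spanning x = 40 to 60 and y = 10 to 30. A ray from the road midpoint hits the south wall first, 10 units away. A ray at t = 0.1 misses the building altogether.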

4. Overview

While complex architectural structures exist in an urban environment, a high percentage of the buildings present are simpler. Performing image registration using an interactive approach [6] is time consuming and labour intensive. In a virtual urban environment, close attention is paid to the modelling of landmark buildings, and it is these buildings that often consist of complicated geometry. However, for the urban environment to be realistic, the less important buildings should also be included. These buildings are much simpler in their geometric representation and can frequently be modelled accurately using large planar surfaces. An automatic method for registering images onto this category of buildings would be beneficial, freeing the user to concentrate on modelling the landmark buildings. The main contribution of this paper is a technique to automatically extract the texture maps for these simpler buildings and register them onto their façades.

The façades consist of a dominant planar structure. Therefore, to map the building façade in the image to the building façade in the model, four corresponding coplanar points, no three of which are collinear, are required. Equivalently, four corresponding lines, no three of which are concurrent, can be used. These four features are sufficient to determine the planar homography that maps the plane under a projective transformation. For the homography to be accurate over the entire façade, the four features should be close to the four extremal points of the planar surface representing the building. Consequently, the four features may take the form of the four corner points of the building façade. However, it is difficult to extract these four corners explicitly from the image data, because the evidence for a feature point is small and it may easily become occluded. To alleviate this problem, the four line segments representing a quadrilateral surrounding the building façade will be used. The quadrilateral consists of the following:

- two line segments representing the building walls;
- a line parallel to the dominant horizontal direction in the façade, passing through the highest point on the roof;
- a line along the base of the building.

Automatically extracting line segments from an image is more reliable than point extraction, because the evidence for a line segment is based on many pixels in the image.
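To illustrate the homography step, the sketch below estimates a 3 × 3 homography from four point correspondences using the standard direct linear transform (DLT) with NumPy. The paper itself works with four lines rather than four points. The two formulations are equivalent: if points transform by H, lines transform by the inverse transpose of H. The function names are hypothetical. The example coordinates, which map image corners to an 8 m by 6 m façade, are invented for illustration.

```python
import numpy as np


def homography_from_points(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ src, from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 . X) / (h3 . X) and v = (h2 . X) / (h3 . X), with X = (x, y, 1)
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)     # right singular vector of the smallest singular value
    return H / H[2, 2]           # fix the arbitrary scale (assumes H[2, 2] != 0)


def apply_homography(H, point):
    """Map a 2D point through H and dehomogenise."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w


# Hypothetical example: image corners of a façade mapped to an 8 m x 6 m rectangle.
corners_img = [(10, 20), (200, 30), (190, 250), (15, 240)]
corners_facade = [(0, 0), (8, 0), (8, 6), (0, 6)]
H = homography_from_points(corners_img, corners_facade)
```

With exactly four correspondences the system has a one-dimensional null space, so the recovered H maps each image corner onto its façade corner exactly, up to rounding.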

In residential environments, the base line is often occluded by foreground objects. Therefore, only three lines can be extracted, representing the top and two sides of a bounding quadrilateral for the façade. Three lines do not provide enough constraints to map from a projective to a metric space. Consequently, the image of the building is first upgraded to an affine image via the identification of the vanishing line [5]. Using the three lines and the aspect ratio of the building, the affine image may then be transformed to a metric image. This enables four lines to be identified, and hence the homography that maps the building image onto the building façade to be determined.

5. Building Façade Identification

This section presents the method used to identify the three building façade lines in the image. The aim is to take a ground level image and identify the two outermost vertical edges of the building façade. Between these two line segments, a further set of line segments can be extracted. These line segments form a connected path, denoted the roof profile, which separates the building façade planes from the roof planes. The third line passes through the point of maximum y coordinate on this roof profile. Its orientation is the dominant direction of those line segments on the building façade that are oriented closer to the horizontal than to the vertical. The approach assumes that the image has not been rotated significantly about the optical axis, so that the dominant orientation of these near-horizontal lines can be chosen. Figure 2 illustrates a ground level image with the three building lines labelled. The roof profile is illustrated as a dashed black line.
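The sketch below shows one way to choose the orientation of the third line and anchor it at the roof profile. It assumes that segments are given as end-point pairs and that the y axis points upwards, so the highest roof point has the maximum y, as in the text. The dominant direction is taken as the length-weighted mode of an angle histogram with 1° bins. The weighting, the bin size and the function names are assumptions, not details from the paper.

```python
import math
from collections import Counter


def dominant_near_horizontal(segments, bin_deg=1.0):
    """Length-weighted modal orientation (degrees) of near-horizontal segments."""
    hist = Counter()
    for (x1, y1), (x2, y2) in segments:
        ang = math.degrees(math.atan2(y2 - y1, x2 - x1))
        ang = (ang + 90.0) % 180.0 - 90.0      # fold the undirected angle into [-90, 90)
        if abs(ang) < 45.0:                    # closer to horizontal than vertical
            hist[round(ang / bin_deg)] += math.hypot(x2 - x1, y2 - y1)
    return max(hist, key=hist.get) * bin_deg   # raises ValueError if no such segment


def top_line(roof_profile, angle_deg):
    """Point and unit direction of the third line (y axis assumed to point up)."""
    top = max(roof_profile, key=lambda p: p[1])   # roof-profile vertex with maximum y
    a = math.radians(angle_deg)
    return top, (math.cos(a), math.sin(a))
```

Vertical wall edges fall outside the ±45° window and therefore do not influence the chosen direction. The line is then anchored at the highest vertex of the roof profile.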

Figure 2: Perspective image including the three building façade line segments. The roof profile is shown using dashed black line segments.

Adopting a colour- or texture-based segmentation technique to identify these lines would be problematic, owing to the large variation in building wall material that can occur in architectural scenes. Consequently, the structure in the image is analysed instead, since a building façade will be the largest planar object in the image. To facilitate the structure analysis, two images are acquired with a small translation between them. During data capture, the camera is translated by approximately 1/30 to 1/50 of the distance to the building. This range is typically used to obtain images with enough variation in pixel displacement to aid free stereo viewing [18]. It permits features at different depth levels to be identified, allowing for noise that may occur during data capture. These images are captured for each building façade specified in the printed map of the virtual model, as indicated in Section 3. The following procedure summarises the building façade identification algorithm developed to process the stereo pair.

Procedure: Building Façade Identification

(1) Extraction and validation of corresponding features in the two images.

(2) Identification of the set of lines, L, that lie on the dominant planar structure.

(3) Identification of the lines in L that lie near the edge of the planar structure.

(4) Generation of a roof profile for each pair of lines found in step 3, by searching for a path between the lines.

(5) Selection of the pair of lines which, combined with its roof profile, generates the largest façade polygon.

(6) Generation of the three lines from the largest façade polygon.

Each of these stages in the method will now be discussed

in more detail.

5.1. Generating Corresponding Features

To enable the structure in an image to be analysed, corresponding features in the two perspective images of the scene are required. In this work, both corresponding points and corresponding lines are used. The aim is to employ these two types of corresponding features in a process that identifies the dominant planar structure in the scene. It is important to obtain good coverage of features across the image. Consequently, the KLT tracker [19] is used to generate feature points at the many image locations where high detail exists. Using feature points alone to identify the dominant planar structure may be prone to error, because many feature points may easily lie on a virtual plane rather than on an actual physical plane in world space. To alleviate this problem, corresponding lines are also used. Owing to their varying orientations, lines are less likely to define a virtual plane on which many of the other lines also lie.

5.1.1. Generating Corresponding Lines

The first stage in extracting corresponding lines is to extract all the edge pixels. These occur at sharp changes in image intensity and are identified using the Canny edge detection algorithm [20]. Line segments are fitted to these edge pixels as described in [5]. The process is repeated in the second image, so that a set of line segments exists for both the first and the second image of the scene. These lines are matched by solving an assignment problem, where the aim is to assign each line in the second image to a line in the first image. Since the solution to an assignment problem is a one-to-one mapping, dummy lines are generated to ensure that the first and second sets of lines have the same cardinality. Solving the assignment problem yields a set of line segment matches for which the total cost of assigning each line segment in image one to a segment in image two is minimal. Each line segment in image one is assigned a cost describing how well it matches each segment in image two. This cost depends on the similarity between the orientations of the line segments, the displacement between the midpoints of the segments, and the image intensity surrounding the line segments. These quantities are typical of many line matching algorithms, such as those presented in [21] and [22].

Given the small translation and minimal rotation between the two images, a line segment in the first image will be both close to its counterpart in the second image and at a similar orientation. These two factors are employed to reduce the search space, since line segment pairs that are too far apart or at significantly different orientations should not be considered as potential match candidates. Consequently, an infinite cost is assigned to a line segment pair if the segments' midpoints are more than 20 pixels apart, or if the angle between their orientations is larger than 20°. The orientations of two corresponding line segments may differ slightly because of the discrete pixel grid of the images, or because a minor rotation occurs during the translation of the camera. The 20° threshold is therefore used to cull invalid matches, as it is large enough to ensure that valid matches are retained. The 20 pixel threshold on the Euclidean distance between a line segment pair's midpoints is chosen on the basis of the maximum disparity observed in the image. The maximum disparity depends on the nearest object in the scene, whose distance is often approximately constant along a given street owing to building boundary constraints. Therefore, the threshold may be adjusted manually on a per-street basis if required. In addition, an infinite cost is used for matching a line segment to a dummy line. If the line segments are considered suitable match candidates, the cost is computed from the image intensity surrounding them.

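Equation (1) presents the cost function that is minimised during the generation of a solution to the assignment problem. The cost function c(i, j) defines the cost of matching line i in the second image with line j in the first image.

$$
c(i,j) =
\begin{cases}
\infty, & \operatorname{dist}(\operatorname{mid}(i), \operatorname{mid}(j)) > 20 \\
\infty, & \operatorname{angle}(i,j) > 20^{\circ} \\
\operatorname{SAD}(i,j), & \text{otherwise}
\end{cases}
\qquad
\operatorname{SAD}(i,j) = \min_{-\operatorname{mag}(j) \le k \le \operatorname{mag}(i)} \frac{1}{\operatorname{mag}(j)}\,\operatorname{cost}(i,j,k)
\tag{1}
$$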
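The matching step is sketched below as a toy example, assuming segments are given as end-point pairs. It applies the two gates of Equation (1), midpoint distance and orientation difference, both with a threshold of 20. It then pads the smaller set with dummy entries so that the cost matrix is square, and solves the assignment problem by brute force over permutations. Brute force is only practical for a handful of lines; for realistic sizes a Hungarian-type solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be used instead.

Several details are assumptions rather than the paper's method:
- the finite constant `BIG` stands in for the infinite cost;
- `pair_cost` uses a placeholder cost instead of the intensity-based SAD term;
- the function names are hypothetical.

```python
import itertools
import math

BIG = 1e9   # finite stand-in for the "infinite" cost (assumption)


def midpoint(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def orientation(seg):
    """Undirected segment orientation in degrees, in [0, 180)."""
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def angle_between(s, t):
    d = abs(orientation(s) - orientation(t))
    return min(d, 180.0 - d)


def pair_cost(seg2, seg1, max_dist=20.0, max_angle=20.0):
    """Gated cost of matching seg2 (image two) with seg1 (image one)."""
    dist = math.dist(midpoint(seg2), midpoint(seg1))
    if dist > max_dist or angle_between(seg2, seg1) > max_angle:
        return BIG
    # Placeholder for the intensity-based SAD term of Equation (1) (assumption).
    return dist + angle_between(seg2, seg1)


def match_lines(lines2, lines1):
    """Assign lines in image two to lines in image one; returns {i2: i1}."""
    n = max(len(lines2), len(lines1))
    cost = [[BIG] * n for _ in range(n)]          # dummy rows/columns stay at BIG
    for i, s2 in enumerate(lines2):
        for j, s1 in enumerate(lines1):
            cost[i][j] = pair_cost(s2, s1)
    # Exhaustive search over permutations: only suitable for tiny examples.
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    # Pairs at BIG cost (dummy or gated out) are reported as unmatched.
    return {i: j for i, j in enumerate(best) if cost[i][j] < BIG}
```

A line in image two with no plausible partner is forced onto a dummy column and therefore does not appear in the result.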

In Equation (1), the normalized sum of absolute differences (SAD) in image intensity is used to obtain a match cost. The normalized SAD value is computed by sliding line segment j along the direction of line segment i. At each position k, line segment j partially overlaps line segment i, and a match cost is computed so that the best matching position for the line segment can be identified. The match cost is the absolute error between two overlapping windows surrounding line segments i and j. Each window consists of two parts, a left half and a right half. These are generated from the pixels lying one pixel from the line segment in the left and right half-spaces, respectively. The cost is evaluated at each position, and the minimum is stored as the match cost for line segments i and j. Given the costs between each candidate line segment pair, the assignment problem can be solved to generate a set of corresponding lines. Figure 3 illustrates the result of matching the line segments in image one with the line segments in image two. Example line segments are labelled from one to eight, with both lines in a corresponding pair given the same label. Line segments that do not have a match are shown as dashed black and white lines.
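The sliding SAD term can be sketched as follows. The sketch assumes that the intensities one pixel to the left and right of each segment have already been sampled at unit spacing along the segment; the sampling itself is not shown. The offset k runs over −mag(j) ≤ k ≤ mag(i), as in Equation (1).

Two choices here are assumptions rather than details from the text:
- the cost is averaged over the overlapping samples;
- offsets at which less than half of segment j overlaps segment i are skipped, so that near-empty overlaps cannot produce a spuriously low cost.

The function name is hypothetical.

```python
def sliding_sad(left_i, right_i, left_j, right_j, min_overlap=0.5):
    """Best (cost, k) when sliding segment j along segment i.

    left_* and right_* are intensities sampled one pixel to the left and right
    of each segment, one sample per pixel along its length (mag = len(left_*)).
    k is the offset of j's first sample relative to i's first sample.
    """
    mag_i, mag_j = len(left_i), len(left_j)
    best = (float("inf"), None)
    for k in range(-mag_j, mag_i + 1):
        overlap = [t for t in range(mag_j) if 0 <= t + k < mag_i]
        if len(overlap) < min_overlap * mag_j:
            continue                      # too little overlap (assumption)
        cost = sum(abs(left_i[t + k] - left_j[t]) + abs(right_i[t + k] - right_j[t])
                   for t in overlap)
        cost /= len(overlap)              # average over the overlap (assumption)
        if cost < best[0]:
            best = (cost, k)
    return best
```

If segment j is an exact sub-run of segment i starting at sample 2, the best offset is k = 2 with zero cost.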

5.1.2. Validation of Corresponding Feature Points

During the data capture stage, the camera is translated by a small amount. As a result, the feature points and lines are displaced from one image to the next in a direction corresponding to the translation vector of the camera. Furthermore, the magnitudes of these displacement vectors indicate the relative distances of the features from the camera. This concept has been employed in many algorithms to generate a depth map [23]. In this section, however, it is used to aid the validation of the corresponding features. The feature matching algorithms developed for generating corresponding features may produce false matches for a variety of reasons. For example, a feature identified in one image may become occluded, and therefore not be visible, in the second image. In addition, a feature point in image one may have a neighbourhood similar to that of many pixels in the second image, causing the KLT tracker to generate false matches. To detect such matches, two histograms are generated: one of the direction of the displacement vectors and one of their magnitude. These histograms are analysed so that features that can be considered outliers are removed from the corresponding feature set. An outlier is any feature whose displacement vector is not sufficiently close to the modal magnitude, or not sufficiently close to the modal orientation. Features are located at integer pixel positions, and a small amount of rotation may occur between the two images. Reliable features will therefore be found within an area surrounding the modal value rather than exactly at it. Consequently, a valid range is identified within each histogram by locating the first occurrence of three consecutive empty bins in either direction from the modal value. This enables outliers with an inconsistent displacement magnitude or an inconsistent orientation to be identified.

Figure 3: The matching of the lines in the two images. The top image is the first image and the bottom image is the second image, taken after a small translation of the camera. Matching line segments are coloured identically, and eight example line pairs are labelled.
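A minimal sketch of the histogram validation follows. It can be applied to either histogram, magnitude or direction. Starting from the modal bin, it walks outwards in both directions until it meets three consecutive empty bins, and keeps the range of occupied bins in between. The bin width is an assumption, as is the tie-breaking rule for the mode. The direction histogram is treated linearly here, ignoring the wrap-around at 360°. The function names are hypothetical.

```python
import math
from collections import Counter


def valid_interval(values, bin_width, empty_run=3):
    """Range around the histogram mode, ending before `empty_run` empty bins."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    mode = max(bins, key=lambda b: (bins[b], -b))   # ties broken towards smaller bins

    def walk(step):
        b = last_occupied = mode
        empties = 0
        while empties < empty_run:                  # stop at the first run of empty bins
            b += step
            if bins.get(b, 0):
                last_occupied, empties = b, 0
            else:
                empties += 1
        return last_occupied

    return walk(-1) * bin_width, (walk(+1) + 1) * bin_width


def inlier_mask(values, bin_width, empty_run=3):
    """True for values inside the valid interval, False for outliers."""
    lo, hi = valid_interval(values, bin_width, empty_run)
    return [lo <= v < hi for v in values]
```

Consider displacement magnitudes clustered around 5 pixels with one stray value of 12 pixels. With 0.5 pixel bins, the valid interval is [4.5, 6.5), so only the stray value is flagged as an outlier.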

5.1.3. Generating Corresponding Line Segments

The validation procedure is straightforward to apply to the feature points, but it is not directly applicable to the corresponding lines. This is because the line-matching algorithm generates a set of potential corresponding line pairs rather than line segment pairs: the end points of two matched segments need not correspond. Therefore, to use the validation procedure for a given line pair, two corresponding points are required, one on each line in the two images. A later stage of the building façade extraction algorithm also benefits from corresponding points on the line segments that are good match candidates. These corresponding features are identified by exploiting the fact that most line segments lie on planar objects in the scene. A projective transformation preserves incidence, so the intersection of two coplanar lines maps to the intersection of their images. Therefore, for a given line, its intersection points with the other lines in the scene provide potential corresponding feature points. Such a point is valid provided that the two lines meeting at it also intersect in world space. Since this stage of the algorithm works in image space, it is not known whether the lines meet in world space. However, many lines will lie in the same plane, and these will produce many valid intersection points. These intersection points are transformed from image one to image two by a two-by-two line-to-line homography [24]. Consequently, identifying the set of consistent intersection points on the line allows the homography to be computed and the corresponding features for this line to be determined. The largest consistent set may be identified by exhaustively testing every possible combination of three intersection points on each line. Three points are sufficient to determine the line-to-line homography using Equation (2).

$$
\mathbf{x}_2 = H_l\,\mathbf{x}_1, \tag{2}
$$

where $\mathbf{x}_1$ is the point on the line in image one, represented in homogeneous form $(\lambda, \mu)$, and $\mathbf{x}_2$ is the corresponding point in image two, also represented in homogeneous form. $H_l$ represents the two-by-two line homography. The number of inliers to this homography is calculated using the symmetric transfer error [24]. A feature with an error less than the user-specified threshold of 0.5 is considered an inlier.
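The sketch below fits the two-by-two line-to-line homography of Equation (2) from three correspondences and scores inliers with a symmetric transfer error. It assumes each point is given by its line parameter λ with μ = 1. The error is taken as the sum of the squared forward and backward transfer distances in λ units, compared against the threshold of 0.5. These modelling choices and the function names are assumptions, not the authors' code. The exact null vector of the 3 × 4 linear system is obtained from its signed 3 × 3 minors, so no linear-algebra library is needed.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_line_homography(pairs):
    """2x2 H_l (up to scale) from three (x1, x2) pairs of line parameters, mu = 1.

    Each pair gives x2 * (h21*x1 + h22) - (h11*x1 + h12) = 0; the null vector of the
    resulting 3x4 system is given by its signed 3x3 minors.
    """
    rows = [(-x1, -1.0, x2 * x1, x2) for x1, x2 in pairs]   # unknowns h11, h12, h21, h22
    h = [(-1) ** k * det3([[r[c] for c in range(4) if c != k] for r in rows])
         for k in range(4)]
    return [[h[0], h[1]], [h[2], h[3]]]


def transfer(H, x):
    """Apply a 1D projectivity to the line parameter x (with mu = 1)."""
    (a, b), (c, d) = H
    return (a * x + b) / (c * x + d)


def inverse(H):
    """Adjugate of H, which equals its inverse up to an irrelevant scale."""
    (a, b), (c, d) = H
    return [[d, -b], [-c, a]]


def symmetric_transfer_error(H, x1, x2):
    """Squared forward plus squared backward transfer distance (assumption)."""
    return (x2 - transfer(H, x1)) ** 2 + (x1 - transfer(inverse(H), x2)) ** 2


def inliers(H, pairs, threshold=0.5):
    """Correspondences whose symmetric transfer error is below the threshold."""
    return [(x1, x2) for x1, x2 in pairs if symmetric_transfer_error(H, x1, x2) < threshold]
```

The exhaustive search described above would call `fit_line_homography` on every triple of candidate points, for example with `itertools.combinations`, and keep the triple whose homography has the most inliers.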

Reducing the search space via the cross ratio

An exhaustive search of all the intersection points on a given line segment allows the line segment's end points to be determined by calculating the line-to-line homography. However, exhaustively testing all possible sets of three intersection points is time consuming, and many of the sets can be culled beforehand. Although it is not known whether the lines meeting at each intersection point actually intersect in world space, it is known that the points on a given line are collinear in the image. The cross ratio of four collinear points is invariant under a projective transformation, and it therefore provides a constraint on the four points. Consequently, each combination of four points from the set of all intersection points can be tested by calculating the cross ratio specified in Equation (3). Each point is represented as a homogeneous point (λ, μ). The arguments to the equation, a, b, c and d, are the values of the parameter λ for the four points in the parametric equation of the line. For all these points μ = 1, since all the points lie on the line in image space. If the cross ratio is the same in image one and image two, the four points can be taken to be collinear in world space, lying on either a physical or a virtual line.

$$
\operatorname{crossratio}(a,b,c,d) = \frac{(a-b)(c-d)}{(a-c)(b-d)} \tag{3}
$$
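The cross ratio test can be sketched in a few lines. Following Equation (3), each argument is the line parameter λ of one intersection point. The tolerance `tol` used to decide that two cross ratios are equal is an assumption, as is the helper name `same_cross_ratio`.

```python
def cross_ratio(a, b, c, d):
    """Equation (3): cross ratio of four collinear points given by their parameter lambda."""
    return ((a - b) * (c - d)) / ((a - c) * (b - d))


def same_cross_ratio(params1, params2, tol=1e-3):
    """True if four points have (nearly) the same cross ratio in image one and image two."""
    return abs(cross_ratio(*params1) - cross_ratio(*params2)) < tol
```

Under the one-dimensional projectivity λ ↦ (2λ + 1)/(λ + 3), the points 0, 1, 2 and 5 map to 1/3, 3/4, 1 and 11/8. Both sets have cross ratio 3/8, so the test accepts them. A non-projective warp such as λ ↦ λ² maps them to 0, 1, 4 and 25, whose cross ratio is 21/96, so that set is rejected.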

For each combination of four points whose cross ratio is constant between the two images, a set of three points is generated. The three points chosen are those most widely dispersed along the line, since they provide the best representation of the line segment. The maximum set of inliers under the line-to-line homography is then determined using these sets of three points. Applying the cross ratio substantially reduces the total number of sets to be tested, so every remaining set of three points can be considered during the computation of the line-to-line homography. Figure 4 shows the two images with the computed end points of the line segments. The lines on the main building façade have their end points accurately positioned. The next section details how the dominant planar structure may be determined using the valid set of correspondences.

5.2. Identification of the Dominant Planar Structure

Given the set of corresponding feature points and lines, this section describes the algorithm used to determine the dominant planar object in the image. A planar object transforms from one image to another under a projective transformation given by a three-by-three planar homography, H. As mentioned previously, computing such a homography from feature points may identify virtual planes rather than physical planes in world space. Therefore, the corresponding lines are used to compute the planar homography. Four corresponding lines are sufficient to solve Equation (4) for the planar homography.

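$$
\mathbf{l}_i^{\top} H\,\mathbf{x}_i = 0, \tag{4}
$$

where $\mathbf{l}_i = (l_{ia}, l_{ib}, l_{ic})$ is the $i$th line in image one and $\mathbf{x}_i = (x_{ix}, x_{iy}, x_{iw})$ is the $i$th point on the corresponding line in image two. Each end point of a segment in image two contributes one such equation. Four corresponding line segments therefore give the eight equations needed to determine H up to scale.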

Figure 4: The line segment matcher. The top image represents the first image and the bottom image represents the second image, taken after a small translation of the camera. The end points of the line segments on the building façade are correctly positioned.

The majority of the lines will lie on the plane supported by the building façade. Consequently, the homography sought is the one that is computed from four lines and generates the set of inliers, PS, with the largest cardinality. Testing all possible combinations of four lines from the complete set of lines in the image is too computationally expensive; moreover, many of these sets will lie in the same plane, so many redundant tests would be carried out. The RANSAC [25] algorithm is therefore employed to take sets of four lines at random and to find the largest set of inliers. At each iteration, the homography is computed using four corresponding lines, and the inliers are identified using the symmetric transfer error between the line segments' end points.
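The sampling loop can be sketched generically. In the sketch below, `fit` and `error` are hypothetical callables: in the setting above, `fit` would solve Equation (4) from four line correspondences and `error` would return the symmetric transfer error of a match. `ransac_iterations` uses the commonly quoted bound on the number of samples; the 0.99 confidence is an assumed default, not a value from the paper.

```python
import math
import random


def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Samples needed so that, with probability `confidence`, at least one
    random sample of `sample_size` matches contains only inliers."""
    p_all_inliers = inlier_ratio ** sample_size
    if p_all_inliers <= 0.0:
        raise ValueError("inlier ratio must be positive")
    if p_all_inliers >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))


def ransac(matches, fit, error, threshold, iterations, sample_size=4, rng=random):
    """Generic RANSAC: keep the model whose inlier set has the largest cardinality."""
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(matches, sample_size)
        try:
            model = fit(sample)
        except ValueError:  # degenerate sample, e.g. three concurrent lines
            continue
        inliers = [m for m in matches if error(model, m) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

For example, `ransac_iterations(0.5)` returns 72: if half of the line correspondences lie on the façade plane, about 72 random four-line samples give a 99% chance of drawing at least one sample made entirely of inliers.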

5.3. Edge Classification

The set of line segments extracted from image one comprises edges that lie within an object and edges that lie at object boundaries. The aim of this part of the algorithm is to identify the line segments that were generated from edge pixels incident to the building façade boundary. This is achieved by first employing the planar homography to map all the features from image one into image two. For each feature, a displacement vector is defined between the projected feature from image one and its corresponding feature in image two.

© 2006 The Authors
Journal compilation © 2006 The Eurographics Association and Blackwell Publishing Ltd.

R. G. Laycock and A. M. Day/Image Registration in a Coarse 3D Virtual Environment

Figure 5: The displacement vectors for the features transformed via the planar homography. White features are in front of the plane, black line segments represent features behind the plane and the black dots are features on the plane.

Figure 5 illustrates the displacement vectors. Features on the building plane are inliers and therefore have displacement vectors of small magnitude, whereas features located on objects off this plane have larger magnitudes. Furthermore, features behind the plane are displaced in one direction and features in front of the plane are displaced in a different direction. These properties enable the features to be classified as in front of the plane, behind the plane or on the plane, and the figure illustrates this classification. The displacement directions of the features in front of the plane and of those behind the plane are determined by identifying the two peaks in the bimodal histogram of the angles of the feature displacement vectors. The use of the displacement vector is discussed in [24], where the planar homography is used to construct a binary space partition of a scene.
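A sketch of the classification by displacement vector, assuming H maps points from image one into image two. The on-plane tolerance (`on_plane_tol`, in pixels), the 36-bin angle histogram and the rule used to pick the second peak (the fullest bin more than a quarter-turn from the first) are assumptions; the text does not specify them. The sketch only separates the two displacement directions, labelled 'A' and 'B'; deciding which one means 'in front of' and which means 'behind' the plane is left to the caller.

```python
import math


def project(H, p):
    """Map an inhomogeneous point p = (x, y) through the 3x3 homography H."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def classify_by_displacement(H, pts1, pts2, on_plane_tol=1.0, bins=36):
    """Label each correspondence 'on', 'A' or 'B' (the two dominant displacement directions)."""
    disp = []
    for p1, p2 in zip(pts1, pts2):
        q = project(H, p1)  # feature from image one mapped into image two
        disp.append((p2[0] - q[0], p2[1] - q[1]))
    width = 2 * math.pi / bins
    hist = [0] * bins
    angles = [math.atan2(dy, dx) % (2 * math.pi) for dx, dy in disp]
    for (dx, dy), a in zip(disp, angles):
        if math.hypot(dx, dy) > on_plane_tol:  # only off-plane features vote
            hist[int(a / width) % bins] += 1
    # Two peaks of the (assumed bimodal) histogram: the fullest bin, then the
    # fullest bin more than a quarter-turn away from it.
    first = max(range(bins), key=lambda b: hist[b])
    far = [b for b in range(bins) if min(abs(b - first), bins - abs(b - first)) > bins // 4]
    second = max(far, key=lambda b: hist[b])
    centres = [(first + 0.5) * width, (second + 0.5) * width]
    labels = []
    for (dx, dy), a in zip(disp, angles):
        if math.hypot(dx, dy) <= on_plane_tol:
            labels.append("on")
        else:
            # circular angular distance to each peak centre
            gap = [abs((a - c + math.pi) % (2 * math.pi) - math.pi) for c in centres]
            labels.append("A" if gap[0] <= gap[1] else "B")
    return labels
```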

5.3.1. Planar Map Generation

The corresponding feature points and line segments provide planar information at discrete points throughout the image. During edge classification, each line segment is classified by inspecting the neighbouring features and their planar information. To facilitate locating the neighbouring features of a given line segment, a planar map is generated by constructing a Delaunay triangulation of all the features in the corresponding feature set. This feature set includes all the valid corresponding points plus points along the line segments in the set PS. Each feature is classified into one of three categories: on the plane, behind the plane or in front of the plane. The triangles in the triangulation are rendered to a grey scale image and are shaded on the basis of the class of feature at each of their vertices. Barycentric coordinates are used to linearly interpolate the class of each vertex across the triangle. Figure 6 illustrates the per-pixel planar map, representing features in front of the plane in white, on the plane as grey level 128 and behind the plane in black. Gradation of these grey levels occurs between the feature points.
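The shading step can be sketched as a small software rasteriser. The sketch assumes the Delaunay triangulation is computed elsewhere (for example with `scipy.spatial.Delaunay`) and fills a single triangle using the grey levels stated above (255 in front, 128 on the plane, 0 behind). The name `shade_triangle` is hypothetical, and a real implementation would more likely render the triangles with graphics hardware.

```python
import math

GREY = {"front": 255, "on": 128, "behind": 0}  # grey levels used in the planar map


def shade_triangle(image, tri, classes):
    """Rasterise one triangle into the planar map, linearly interpolating the
    vertex grey levels with barycentric coordinates."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    g0, g1, g2 = (GREY[c] for c in classes)
    det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    if det == 0:
        return  # degenerate (zero-area) triangle
    h, w = len(image), len(image[0])
    ys = range(max(0, math.floor(min(y0, y1, y2))), min(h - 1, math.ceil(max(y0, y1, y2))) + 1)
    xs = range(max(0, math.floor(min(x0, x1, x2))), min(w - 1, math.ceil(max(x0, x1, x2))) + 1)
    for y in ys:
        for x in xs:
            a = ((y1 - y2) * (x - x2) + (x2 - x1) * (y - y2)) / det
            b = ((y2 - y0) * (x - x2) + (x0 - x2) * (y - y2)) / det
            c = 1.0 - a - b
            if min(a, b, c) >= -1e-9:  # pixel lies inside (or on) the triangle
                image[y][x] = round(a * g0 + b * g1 + c * g2)
```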

Figure 6: The per-pixel planar map. A grey level of 128 corresponds to image areas on the plane. White pixels are in front of the plane and black pixels are behind the plane. Gradations of these values are obtained by interpolation in areas where features do not exist.

Using the planar map, each line segment is tested to determine whether it lies on the planar façade by searching along two arrays of pixels. These two arrays are constructed from the pixels one unit from the line segment in the left half space and one unit from the line segment in the right half space. If pixels classified as being behind the plane exist in both the left and right arrays of the planar map, then the line segment is classified as off the plane. The remaining unclassified line segments undergo further classification to separate those on the boundary of the building façade from those within the boundary. The line segments within the building façade boundary do not contain any behind-the-plane features in their left or right half arrays. The line segments that have behind-the-plane features in only one array may be determined to be object boundary edges and are inserted into a set of object boundary line segments, L. The half space of the line that contains the features classified as on the plane is likely to be the half space where the building façade exists.
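A sketch of the half-space test on the planar map. The threshold below which a pixel counts as 'behind the plane' (`behind_threshold = 64`) is an assumption, since the interpolated map contains intermediate grey levels, and which normal is called 'left' is a convention chosen here. The returned flag plays the role of the Boolean planar used in Section 5.4.

```python
import math


def classify_segment(planar_map, p, q, behind_threshold=64):
    """Return 'off', 'interior' or ('boundary', facade_on_left) for the segment p-q.

    planar_map: rows of grey levels (0 behind, 128 on, 255 in front of the plane).
    The 'left' array lies along the normal (-dy, dx); with image y pointing down
    this is a convention, not necessarily the visual left.
    """
    (x0, y0), (x1, y1) = p, q
    length = math.hypot(x1 - x0, y1 - y0)
    nx, ny = -(y1 - y0) / length, (x1 - x0) / length  # unit normal to the segment
    h, w = len(planar_map), len(planar_map[0])
    steps = max(int(length), 1)
    behind = {1: False, -1: False}  # +1: left array, -1: right array
    for i in range(steps + 1):
        t = i / steps
        x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        for side in (1, -1):
            px, py = round(x + side * nx), round(y + side * ny)  # one unit from the segment
            if 0 <= px < w and 0 <= py < h and planar_map[py][px] < behind_threshold:
                behind[side] = True
    if behind[1] and behind[-1]:
        return "off"
    if not behind[1] and not behind[-1]:
        return "interior"
    # Behind-the-plane pixels on one side only: the facade lies on the other side.
    return ("boundary", behind[-1])
```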

5.3.2. Generating Additional Corresponding Features

The accuracy of the planar map depends on the accuracy and distribution of the corresponding feature points throughout the image. The previous sections showed how the KLT tracker was employed and how the line segment matcher was developed to extract corresponding features. In practice, these features alone sometimes do not provide good coverage of the entire image, leading to an inaccurate planar map; hence, the identification of line segments near the boundary of the building façade is unreliable.

A common cause of sparse corresponding features is that ground level images frequently contain many pixels representing the sky. These regions should be classified as behind the plane, but, being homogeneous, they offer no distinguishing marks from which corresponding pixels may be extracted. Consequently, additional features are automatically inserted into the sky region by performing a seed fill from each pixel in the top row of the image. The algorithm recursively adds pixels, creating a mask of similar pixels, while the red, green and blue channels are sufficiently similar. A threshold value of 10 was used for each of the colour channels to ensure that only closely matching pixels are connected. To avoid overfilling into other similarly coloured objects, the seed fill algorithm was modified so that the fill cannot pass near edge pixels. The new features are subsequently distributed within the image region covered by the masks generated by the seed fill. Figure 5 illustrates the additional features behind the plane inserted into the image regions representing the sky.
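The sky fill can be sketched as an iterative, queue-based flood fill, which avoids the recursion-depth limits a literally recursive fill would hit on large images. The per-channel threshold of 10 comes from the text; comparing each pixel with the neighbour it was reached from, the square `edge_margin` used to keep the fill away from edge pixels, and the regular grid spacing `step` for placing the new features are all assumptions.

```python
from collections import deque


def sky_mask(image, edges, threshold=10, edge_margin=2):
    """Seed fill from every top-row pixel, growing while colours stay similar.

    image: rows of (r, g, b) tuples; edges: rows of booleans from an edge detector.
    """
    h, w = len(image), len(image[0])
    # Block every pixel within edge_margin of an edge so the fill cannot leak past it.
    blocked = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                for yy in range(max(0, y - edge_margin), min(h, y + edge_margin + 1)):
                    for xx in range(max(0, x - edge_margin), min(w, x + edge_margin + 1)):
                        blocked[yy][xx] = True
    mask = [[False] * w for _ in range(h)]
    queue = deque()
    for x in range(w):  # one seed per pixel in the top row
        if not blocked[0][x]:
            mask[0][x] = True
            queue.append((x, 0))
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not mask[ny][nx] and not blocked[ny][nx]:
                # Every colour channel must differ by at most `threshold`.
                if all(abs(a - b) <= threshold for a, b in zip(image[y][x], image[ny][nx])):
                    mask[ny][nx] = True
                    queue.append((nx, ny))
    return mask


def sky_features(mask, step=4):
    """Place extra 'behind the plane' features on a regular grid inside the mask."""
    return [(x, y) for y in range(0, len(mask), step)
            for x in range(0, len(mask[0]), step) if mask[y][x]]
```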

5.4. Candidate Building Façade Generation

During this stage, the line segments that represent the object boundary edges are joined to construct candidate building façade polygons. This is achieved by first partitioning the set of line segments, L, which lie on the dominant planar structure, into two groups: those oriented more horizontally than vertically, HL, and those oriented more vertically than horizontally, VL. Since the images are assumed to be taken without rotating the camera about the optical axis, significant distortion of the horizontal lines is avoided. Each line in the set VL carries a Boolean variable, planar, indicating whether the planar façade is located in the left half space of the line segment. If planar is false, then the planar façade is located in the right half space. A set of candidate vertical line pairs is generated from the set VL using Equation (5):

$$C = \{(l, k) \in VL \times VL \mid \neg\, l.\mathit{planar} \wedge k.\mathit{planar} \wedge \mathrm{onleft}(l, k)\}, \tag{5}$$

where onleft(l, k) returns true if the line segment l is completely in the left half space of the line segment k.
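Equation (5) and the horizontal/vertical partition can be sketched as follows. The `Segment` type is hypothetical, and `onleft` is approximated for near-vertical segments by requiring both end points of l to lie at smaller x than the infinite line through k at the same height; this matches the intent of Equation (5) for the vertical candidates but is an assumption rather than the paper's definition.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Segment:
    x0: float
    y0: float
    x1: float
    y1: float
    planar: bool = False  # True if the facade lies in the left half space

    def is_vertical(self):
        """'More vertical than horizontal', used to split L into HL and VL."""
        return abs(self.y1 - self.y0) > abs(self.x1 - self.x0)

    def x_at(self, y):
        """x of the infinite line through the segment at height y (not for horizontal lines)."""
        t = (y - self.y0) / (self.y1 - self.y0)
        return self.x0 + t * (self.x1 - self.x0)


def onleft(l, k):
    """Approximation: both end points of l lie at smaller x than the line through k."""
    return all(x < k.x_at(y) for x, y in ((l.x0, l.y0), (l.x1, l.y1)))


def partition(segments):
    hl = [s for s in segments if not s.is_vertical()]
    vl = [s for s in segments if s.is_vertical()]
    return hl, vl


def candidate_pairs(vl):
    """Equation (5): facade to the right of l, to the left of k, and l left of k."""
    return [(l, k) for l, k in product(vl, vl)
            if not l.planar and k.planar and onleft(l, k)]
```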

Each element of the candidate façade set, C, forms a hypothesis for the two vertical lines of the building façade. These hypotheses are verified by constructing a path from the left edge to the right edge. The path is generated by traversing a graph, G(V, E), in which each vertex in V corresponds to a line segment from the set L. This path is referred to as the roof profile and is illustrated in Figure 2. The aim is to identify the path that forms the largest façade polygon between the current left and right edges. Consequently, all the paths in the graph are enumerated using a depth first search, DFS(G, startVertex, endVertex), and the path that creates the largest valid façade polygon is selected. The following procedure formally describes the algorithm.

(1) For each $(l, k) \in C$:

(2) Construct a graph $G(V, E)$, where

$$V = \{l\} \cup \{k\} \cup \{e \in L \mid \neg(\mathrm{onleft}(e, l) \vee \neg\,\mathrm{onleft}(e, k))\},$$

$$E = \{(i, j) \in V \times V \mid \mathrm{validIntersection}(i, j)\}.$$

(3) Construct the candidate façades by extracting all paths from vertex $l$ to vertex $k$:

$$VP = \{p \in \mathrm{DFS}(G, l, k) \mid \mathrm{validpath}(p)\}.$$

(4) Choose the largest polygon from the candidate façades:

$$\max_{vp \in VP} \mathrm{area}(vp).$$

In step 2, an edge is generated in the graph between two vertices if the two line segments represented by the vertices intersect at a valid intersection point. The validIntersection function checks that the intersection point occurs within the image region, lies between the two candidate vertical lines, l and k, and is above the minimum y coordinate of the two vertical lines. Validating the intersection points limits the total number of edges in the graph and permits all the paths to be enumerated using a depth first traversal of the graph from the vertex representing edge l to the vertex representing edge k. The procedure provides many paths from the left vertical line to the right vertical line, and each path is tested to ensure that it forms a valid façade. A façade is validated by constructing a façade polygon comprising the line segments l and k joined by the path identified during the depth first traversal. The polygon is completed by inserting a base line segment between the end point of l and the end point of k. The façade is valid if the polygon does not contain any feature that is behind the planar façade object. Features in front of the plane are allowed within the façade polygon, since these could be due to foreground, and therefore potentially occluding, objects. The polygon is tested by inspecting the pixel values in the planar map illustrated in Figure 6.
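Steps (2) to (4) can be sketched with an adjacency list, a depth first enumeration of simple paths and the shoelace formula for polygon area. Corners are formed by intersecting consecutive segments on the path, and the base joins the lower end points of l and k (image y grows downwards). The names are hypothetical, the graph is assumed to have been built already using validIntersection, and the planar-map check is passed in as the hypothetical predicate `is_valid`.

```python
def intersect(s, t):
    """Intersection of the infinite lines through segments s and t (None if parallel)."""
    (x1, y1), (x2, y2) = s
    (x3, y3), (x4, y4) = t
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-12:
        return None
    a, b = x1 * y2 - y1 * x2, x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d, (a * (y3 - y4) - (y1 - y2) * b) / d)


def all_paths(adj, start, end, path=()):
    """Depth first enumeration of every simple path from start to end."""
    path = path + (start,)
    if start == end:
        yield path
        return
    for nxt in adj.get(start, ()):
        if nxt not in path:
            yield from all_paths(adj, nxt, end, path)


def polygon_area(pts):
    """Shoelace formula."""
    n = len(pts)
    return abs(sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                   for i in range(n))) / 2


def lower_end(seg):
    return max(seg, key=lambda p: p[1])  # image y grows downwards


def facade_polygon(path, segs):
    """Base of l, the intersections along the path, then the base of k."""
    corners = [lower_end(segs[path[0]])]
    for u, v in zip(path, path[1:]):
        p = intersect(segs[u], segs[v])
        if p is None:
            return None
        corners.append(p)
    corners.append(lower_end(segs[path[-1]]))
    return corners


def best_facade(adj, segs, l, k, is_valid=lambda poly: True):
    best, best_area = None, -1.0
    for path in all_paths(adj, l, k):
        poly = facade_polygon(path, segs)
        if poly is not None and is_valid(poly):
            area = polygon_area(poly)
            if area > best_area:
                best, best_area = poly, area
    return best, best_area
```

In a small example where a flat roof line and a pair of gable lines both connect l to k, the gable path encloses the larger area and is selected, which is how enumerating every path recovers the roof profile.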

The largest valid façade polygon is obtained for each pair of line segments in the candidate set C. Each of these polygons is then modified to enable the largest polygon to be chosen: the base line segment is moved to the bottom of the image. This gives higher priority to building façades that are both wider and taller in the image. The largest polygon is used in the next stage to recover the affine properties of the image, thereby moving one step closer to registering the façade image to the model in the virtual environment.

6. Affine Rectification

The motivation for recovering the affine properties of the building façade polygon is that three building façade lines are then sufficient to register the building façade image to the virtual model. The affine rectification is achieved using a two-step procedure. The first step identifies the vanishing line, which is defined by two vanishing points in the image. The second step defines a projective transformation, P, using this vanishing line [26]. The projective transformation warps the image so that the planar building façade is rectified up to an affine transformation.

The automatic vanishing point detection algorithm presented by Lee et al. [15] has been adopted to extract the vanishing points corresponding to the two dominant line orientations on the building façade polygon. This algorithm was chosen because it uses a Gaussian sphere to identify the largest sets of concurrent lines in the vertical and horizontal directions; consequently, the approach works for lines that are parallel and therefore intersect at infinity. For each set of concurrent lines, the mean intersection point is calculated. The two intersection points resulting from the two largest concurrent line sets define the vanishing points, $\mathbf{v}_1$ and $\mathbf{v}_2$. These two vanishing points define the vanishing line, $(l_a, l_b, l_c)^{\top} = \mathbf{v}_1 \times \mathbf{v}_2$. The vanishing line is inserted into a projective transformation matrix, P, as presented in [26]:

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ l_a & l_b & l_c \end{pmatrix}. \tag{6}$$
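A small sketch of Equation (6): the vanishing line is the cross product of the two vanishing points, and P sends both vanishing points to points at infinity (third coordinate zero), so image lines through them become parallel after warping. P is singular when l_c = 0, i.e. when the vanishing line passes through the image origin; the check for this is an addition for the sketch, not something the text discusses. When both vanishing points are already at infinity, the first two entries of the vanishing line vanish and P is affine, which matches the case below in which no affine rectification is needed.

```python
def cross(a, b):
    """Cross product of two homogeneous 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def affine_rectifier(v1, v2):
    """Build P of Equation (6) from two vanishing points (homogeneous 3-tuples)."""
    la, lb, lc = cross(v1, v2)  # vanishing line through v1 and v2
    if lc == 0:
        raise ValueError("vanishing line passes through the origin; P would be singular")
    return [[1, 0, 0], [0, 1, 0], [la, lb, lc]]


def apply(M, p):
    """Multiply a 3x3 matrix by a homogeneous 3-vector."""
    return tuple(sum(M[r][c] * p[c] for c in range(3)) for r in range(3))
```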

By warping the image by P, the affine rectification of the image plane is completed. If the lines in the image that are incident to the vanishing points are already parallel, then no affine rectification of the plane is required. This often occurs if the camera centre is sufficiently distant from the object of interest, or if the camera is positioned such that it focuses on the centre of the planar surface with a viewing direction orthogonal to the plane.

7. Metric Rectification

After the planar façade is rectified up to an affine transformation, the lines in the two dominant directions will be parallel. An affine transformation matrix, A, can then be applied to obtain an image rectified up to a metric transformation. The metric image has the invariant properties of a projective image and an affine image, plus the property that ratios of lengths are invariant. The invariance of length ratios enables the aspect ratio of the building façade in the virtual model to be used to identify the four corners of the building façade in the metric image.

Figure 7: The perspective image of a building façade is upgraded through an affine rectification to a metric rectification. The three image lines on the affine image are used to recover the metric properties of the planar façade.

Figure 2 illustrates the building façade image with the three lines labelled $l_1$, $l_2$ and $l_3$. In this example, the building façade did not require affine rectification, since both the horizontal lines and the vertical lines are parallel. Therefore, the dominant direction of the horizontal lines on the façade may be employed to determine the orientation of the line $l_2$. However, if the image is not affinely rectified, then the set of 'horizontal' lines from the perspective image should be transformed into the affine image and the dominant orientation selected from the new parallel line set. Figure 7 illustrates a ground level image under a perspective, affine and metric transformation. The line $l_2$ was selected using the orientation of the dominant horizontal line set after its transformation from the perspective to the affine image.

Figure 8: A: The affine rectified image. B: The pseudo metric rectified image. C: Four corner points mapped from the pseudo metric image to the perspective image. D: The final registered building façade. The light grey area with black cross hatching denotes an area of transparent pixels.

The three image lines correspond to the virtual façade lines on the model, $L_1$, $L_2$ and $L_3$, labelled in Figure 8, and these line correspondences are related via an affine transformation matrix, A. A is computed by using two equations from each line correspondence to form a set of linear equations. This set of linear equations is identical to Equation (4), used when identifying the planar homography. However, since only an affine transformation is required, the matrix A has (0, 0, 1) as its last row, and therefore only six equations, or three line correspondences, are required to determine A. The top right of Figure 8 presents a pseudo metric image: the result of transforming the affine image so that the image lines are mapped onto the scene lines.

However, the result has not been recovered up to a metric transformation, since the affine transformation has provided only a single orthogonality constraint on the image. Liebowitz and Zisserman [26] describe the metric rectification of a plane using a known angle between lines and show that, given only a single orthogonality constraint, the resultant plane will be rectified so that the lines are orthogonal, although an ambiguity remains between the scales in the two directions of the orthogonal line pair. In this paper, the metric rectification is completed by calculating the height of the building in the pseudo metric image. The four corners of the pseudo metric image are transformed back into the affine image. These four points are labelled 1, 2, 3 and 4 in the top left of Figure 8, and they allow the two scales to be determined in the x and y directions. Since linear interpolation (the ratio of lengths along a line) is invariant under an affine transformation, the scale in the x direction can be computed as the ratio between the width, imageWidth, of the pseudo metric image and the distance, distWidth, between the points 2 and 3 in the affine image. The scale in the y direction can be computed similarly. The ratio of these two scales provides the relative scale in the image. Multiplying the relative scale by the width of the building façade in the pseudo metric image, imageWidth, gives the width of the building façade, widthInYScale, at the same scale as the y axis of the image. Dividing widthInYScale by the aspect ratio, aspectRatio, of the building gives the height of the building façade within the pseudo metric image. Equation (7) defines how the minimum y coordinate of the building façade is determined.

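$$\begin{aligned}
\mathit{scaleInX} &= \mathit{imageWidth} / \mathit{distWidth} \\
\mathit{scaleInY} &= \mathit{imageHeight} / \mathit{distHeight} \\
\mathit{relativeScale} &= \mathit{scaleInY} / \mathit{scaleInX} \\
\mathit{widthInYScale} &= \mathit{imageWidth} \times \mathit{relativeScale} \\
\mathit{minYCoord} &= \mathit{imageHeight} - \mathit{widthInYScale} / \mathit{aspectRatio}
\end{aligned} \tag{7}$$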
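Equation (7) transcribed directly, with the variable names converted to Python style. aspectRatio is assumed to be the façade width divided by its height in the virtual model, which is what the final division implies.

```python
def min_y_coord(image_width, image_height, dist_width, dist_height, aspect_ratio):
    """Equation (7): y coordinate of the top edge of the facade in the pseudo metric image."""
    scale_in_x = image_width / dist_width      # pseudo metric width / affine distance 2-3
    scale_in_y = image_height / dist_height    # same for the vertical direction
    relative_scale = scale_in_y / scale_in_x
    width_in_y_scale = image_width * relative_scale  # facade width in y-axis units
    return image_height - width_in_y_scale / aspect_ratio
```

For example, with a 400 × 300 pseudo metric image, distWidth = 200, distHeight = 100 and an aspect ratio of 4, the relative scale is 1.5, the width in y-scale units is 600, and the top edge lies at y = 150.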

Once the height of the building façade has been calculated, the two coordinates of the base of the building are known. These two points are mapped via $(PA)^{-1}$ to obtain the location of these corner points in the original perspective image. The four points are illustrated in the bottom left of Figure 8. The final stage of the algorithm is to register the image to the façade in the virtual environment.

8. Image Registration

Given the four corner points on the building façade in the perspective image, $\mathbf{x}_{1\ldots4}$, and the four corner points in the virtual model, $\mathbf{x}'_{1\ldots4}$, the planar homography, H, can be calculated using the Direct Linear Transform algorithm [27]. The bottom right of Figure 8 illustrates the mapping that has registered the building façade image onto the building façade in the virtual model. To improve the model for visualization, the area of the image outside the façade polygon is set to be transparent. This area is shaded light grey and overlaid with black cross hatching in the figure.
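A sketch of the four-point registration. For brevity it uses the inhomogeneous form of the DLT, fixing h33 = 1 and solving an 8 × 8 system by Gaussian elimination; the SVD-based formulation with the coordinate normalisation advocated in [27] is more robust and is not reproduced here. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ u = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. three collinear corners)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        u[r] = (m[r][n] - sum(m[r][c] * u[c] for c in range(r + 1, n))) / m[r][r]
    return u


def homography_from_points(src, dst):
    """Inhomogeneous DLT: H (with h33 = 1) such that dst ~ H src for four point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def map_point(H, p):
    """Apply H to an inhomogeneous point and dehomogenise."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Mapping the four façade corners found in the perspective image onto the corners of the model façade in this way gives the homography used to warp the image into a texture for that façade.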

9. Results

Figure 9 illustrates the results obtained by executing the procedure to automatically identify the building façade texture maps on four building images. During the algorithm, the roof profiles are extracted to define the building façade polygon, and any pixels outside this polygon are shaded grey in the figure. Each texture map was generated in 10–15 seconds on a 1.8 GHz Pentium computer.

9.1. Automatic Geometric Refinement

The generation of the three-dimensional urban environment incorporated low-resolution altitude data to raise each footprint to the correct height. Given the limited resolution of this data, automatic roof modelling techniques were used to assign each building a roof model. While this improves the visualization of the model, the assigned roofs are not necessarily true to the buildings at the actual location.


Figure 9: The left column illustrates four ground level images. Each image is the first image from a pair of images. The right column presents the texture map obtained after automatically detecting the three bounding building façade lines.

During the image registration algorithm, the roof profile is identified. Consequently, the roof profile may be employed to correct the roof geometry in the coarse virtual model. Figure 11 illustrates the building façade texture maps from Figure 9 assigned to the geometry in an urban environment. It can be seen how the gable end is automatically adjusted to accommodate the texture map.

Figure 10: The concept of partitioning a building façade image into a wall map and an object map, which are combined during rendering using the multiple texture units on the graphics card.

9.2. Removal of Occlusions

The image registration algorithm enables the transformation of the ground level images onto the building façades in an urban environment. The resultant texture maps contain occluding objects, which become projected onto the building façades during rendering. To enable a realistic scene to be rendered for a walkthrough application, the occluding objects should be removed from the texture maps. This problem consists of two parts: the first involves the identification of the occluding objects in the texture maps, and the second concerns the synthesis of pixels to fill the occluded regions.

Figure 11: Texture maps are assigned to the building façades and the roof geometry is altered according to the roof profile.

The identification of the occluding objects is based on the following assumption. Many building façades in a residential environment are constructed using a particular wall material such as flint, concrete or brick. Additional objects, including windows, doors and decorative wall patterns, are either inset into or overlaid on the wall material. This property enables the segmentation of the building façade to proceed by collecting all the pixels that represent the wall material into a single region. The remaining pixels may be grouped together to form objects, and a planar segmentation is undertaken to remove the objects labelled as being in the foreground. Removing the occluding objects leaves a texture map containing undefined pixel values, which may be filled using inpainting techniques [28]. However, image inpainting on large regions is time consuming and can produce undesirable artefacts. Consequently, an alternative procedure, described in [29], may be used; it exploits the fact that the wall material of the building façade may be expressed as a periodic pattern. This enables the wall material to be represented using a repeatable wall map, which may be tiled seamlessly across the surface of the façade. The other objects, which are coplanar with the building façade, may be inserted into a transparent texture map called the object map. During rendering, the wall map is assigned to the first texture unit of a graphics card and the object map is assigned to the second texture unit. The two textures are subsequently composited in real time using multitexturing. Figure 10 provides an overview of this approach.
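The two-texture composite can be sketched on the CPU as an 'over' blend of the object map onto the tiled wall map. The paper does not state which texture combine mode is used, so standard alpha blending is an assumption here, as are the data layouts (RGB tuples for the wall tile, RGBA tuples with alpha in [0, 1] for the object map). Wrapping the coordinates with the modulo operator plays the role of a repeating texture address mode.

```python
def composite(wall_tile, object_map):
    """Blend an RGBA object map over a wall tile repeated across the facade.

    wall_tile:  rows of (r, g, b); tiled by wrapping the coordinates.
    object_map: rows of (r, g, b, a) with a in [0, 1]; a = 0 is transparent.
    """
    th, tw = len(wall_tile), len(wall_tile[0])
    out = []
    for y, row in enumerate(object_map):
        out_row = []
        for x, (r, g, b, a) in enumerate(row):
            wr, wg, wb = wall_tile[y % th][x % tw]  # repeating (wrap) addressing
            out_row.append((round(a * r + (1 - a) * wr),
                            round(a * g + (1 - a) * wg),
                            round(a * b + (1 - a) * wb)))
        out.append(out_row)
    return out
```

Keeping the small periodic wall tile separate from the sparse object map is what lets the wall texture repeat seamlessly while windows and doors keep their positions on the façade.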

10. Conclusions

In this paper, an algorithm has been presented that enables two images of an architectural scene to be automatically processed to identify the main building façade. It achieves this using the assumption that many buildings consist of a dominant planar structure and that the planar structure contains many line segments. The line segments extracted from the image are used to construct a set of corresponding line pairs. The end points of these lines will not necessarily match, due to occlusions or image noise. Consequently, the end points are calculated by identifying a homography that relates one line in the first image to its corresponding line in the second image. The homography is computed using the points obtained by intersecting the line with the other lines in the image, and the homography that relates the most intersection points is selected. The search space is reduced by employing the cross ratio, thereby ensuring that the intersection points used to calculate the homography are collinear. The resulting line segments enable the building façade to be identified by generating a planar homography. The largest set of line segments related via a planar homography is chosen, and a polygon surrounding these segments is generated. The building façade polygon is constructed by enumerating the possible paths from the left vertical line segment to the right vertical line segment using a depth first traversal. By enumerating the paths in the graph, the approach is capable of extracting the division between the roof planes and the building façade planes. Figure 11 illustrates the result of automatically generating four texture maps, which are applied to the surfaces of the building façades in the urban environment.

11. Future Work

Image registration relies on the building façade being a large planar object that contains many line segments. This assumption has led to the successful application of the approach to texture mapping vernacular detached buildings. While this category of building is frequently found in an urban environment, it is often impossible to capture the entire planar object in a single image. This could be because there are limited vantage points from which to acquire the images, or because the subject is a large building or group of buildings. The latter is a problem because such objects would not fit into the image plane while still providing sufficient image resolution for building façade texture maps viewed at close range. Therefore, to improve the flexibility of the approach, further work is required to enable the technique to work on a sequence of images containing the planar object. Image mosaics would form an integral part of such an approach, where the dominant planar object, the building façade, would be used to align all the images into the same coordinate frame. A planar homography is computed during the creation of the image mosaic, and this could facilitate the construction of the planar map. Consequently, the planar map may be employed to enable the extraction of the building façade.

The removal of the perspective distortion via metric rectification is only correct for the dominant planar structure of the building façade. If the building consists of other planar surfaces, these should also be rectified up to a metric transformation. To avoid misclassifying the additional geometry as an occluder, and to subsequently assist in the perspective correction of these smaller planar surfaces, the building footprints in the LandLine.Plus data can be consulted. These footprints contain the cross section of the building and therefore offer insight into additional planar surfaces that should be rectified. If the footprint indicates the presence of many planes, then it may be possible to solve the image registration problem via the direct computation of the projection matrix, using at least six image-to-model line correspondences.

References

1. C. Brenner and N. Haala. Fast production of virtual reality city models. In Proceedings of ISPRS Comm. IV Symposium, Stuttgart, September 7–10, IAPRS, 32(4), pp. 323–330, 1998.


2. C. Früh and A. Zakhor. Fast 3D model generation in urban environments. International Conference on Multisensor Fusion and Integration for Intelligent Systems, Baden-Baden, Germany, pp. 165–170, August 2001.

3. O. Faugeras, L. Robert, S. Laveau, G. Csurka, C. Zeller, C. Gauclin and I. Zoghlami. 3-D reconstruction of urban scenes from image sequences. Computer Vision and Image Understanding, 69(3): 292–309, 1998.

4. A. Dick, P. Torr and R. Cipolla. Automatic 3D modeling of architecture. In Proceedings of the 11th British Machine Vision Conference, Bristol, pp. 372–381, 2000.

5. D. Liebowitz, A. Criminisi and A. Zisserman. Creating architectural models from images. Computer Graphics Forum, 18(3): 39–50, 1999.

6. S. Gibson, R. J. Hubbold, J. Cook and T. L. Howard. Interactive reconstruction of virtual environments from video sequences. Computers and Graphics, 27(2), 2003.

7. P. Debevec, C. Taylor and J. Malik. Modelling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Proceedings of ACM SIGGRAPH, pp. 11–20, 1996.

8. H. Zhao and R. Shibasaki. Reconstructing a textured CAD model of an urban environment using vehicle-borne laser range scanners and line cameras. Machine Vision and Applications, 14: 35–41, 2003.

9. C. Rocchini, P. Cignoni and P. Scopigno. Multiple texture stitching and blending on 3D objects. In 10th Workshop on Rendering, Eurographics, Granada, Spain, pp. 173–180, 1999.

10. T. Tipdecho. Automatic image registration between image and object spaces. In Proceedings of Open Source GIS - GRASS Users Conference 2002, Trento, Italy, 2002.

11. H. Lensch, W. Heidrich and H. Seidel. Automated texture registration and stitching for real world models. Pacific Graphics, pp. 317–327, 2000.

12. X. Wang, S. Totaro, F. Taillandier, A. Hanson and S. Teller. Recovering façade texture and microstructure from real world images. In Proceedings of the 2nd International Workshop on Texture Analysis and Synthesis at ECCV, pp. 145–149, 2002.

13. C. Jaynes and M. Partington. Pose calibration using approximately planar urban structure. Asian Conference on Computer Vision, 1999.

14. I. Stamos and P. Allen. 3-D model construction using range and image data. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, S.C., pp. 531–536, 2000.

15. S. C. Lee, R. Nevatia and S. K. Jung. Automatic integration of façade textures into 3D building models with a projective geometry based line clustering. Computer Graphics Forum, 21(3): 511–519, 2002.

16. S. Neumann, S. You, J. Hu, B. Jiang and J. W. Lee. Augmented virtual environments (AVE): Dynamic fusion of imagery and 3D models. IEEE Virtual Reality 2003, Los Angeles, California, pp. 61–70, 2003.

17. R. G. Laycock and A. M. Day. Automatically generating large urban environments based on the footprint data of buildings. ACM Solid Modelling 2003, Seattle, pp. 346–351, 2003.

18. F. Waack. http://www.stereoscopy.com/library/waack-ch-4.html, 2004.

19. J. Shi and C. Tomasi. Good features to track. In IEEE Conference on CVPR, pp. 593–600, 1994.

20. J. F. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 679–698, 1986.

21. N. Ayache and B. Faverjon. Efficient registration of stereo images by matching graph descriptions of edge segments. IJCV, 1(2): 107–131, 1987.

22. J. J. Guerrero and C. Sagüés. Robust line matching and estimate of homographies simultaneously. Pattern Recognition and Image Analysis, 2652/2003, pp. 297–307, 2003.

23. K. Muhlmann, D. Maier, J. Hesser and R. Manner. Calculating dense disparity maps from colour stereo images, an efficient implementation. In IEEE Conference on Computer Vision and Pattern Recognition, Kauai Marriott, Hawaii, pp. 30–31, 2001.

24. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000, ISBN 0-521-62304-9.

25. M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. CACM, 24(6): 381–395, 1981.

26. D. Liebowitz and A. Zisserman. Metric rectification for perspective images of planes. In Proceedings of IEEE Conference on CVPR, Santa Barbara, California, pp. 482–488, 1998.


27. R. I. Hartley. In defense of the eight-point algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6): 580–593, 1997.

28. A. Telea. An inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1), ACM Press, pp. 25–36, 2004.

29. R. G. Laycock and A. M. Day. Automatic techniques for texture mapping in virtual urban environments. Computer Graphics International 2004, pp. 586–589, 2004.
