
LINEAR GLOBAL MOSAICS FOR UNDERWATER SURVEYING

Nuno Gracias, João Paulo Costeira and José Santos-Victor∗

Instituto Superior Técnico & Instituto de Sistemas e Robótica

Av. Rovisco Pais, 1049–001 Lisboa Codex, Portugal

Abstract. An important feature for autonomous underwater vehicles equipped with video cameras on survey missions is the ability to quickly generate a wide area view of the sea floor. This paper presents a method for the fast creation of globally consistent video mosaics. A closed-form solution for the estimation of the global image motion is presented. It uses a least-squares criterion over a residual vector which is linear in the homography parameters. Aiming at real-time operation, a fast implementation is described using recursive least-squares, which permits the creation of globally consistent mosaics during video acquisition. The application to underwater imagery is illustrated by the creation of video mosaics suitable for surveying or autonomous navigation.

Key Words. Global image registration; recursive least-squares; underwater mosaicing; computer vision

1. INTRODUCTION

This paper addresses the problem of the fast creation of globally consistent mosaics. We present a simple formulation based on an affine description of the image motion. This model allows the mosaic creation problem to be formulated as the minimization of the norm of a vector of residues which is a sparse linear combination of the coordinates of point sets resulting from matching several image pairs. The linear nature of the problem allows fast solutions to be obtained using least squares methods.

The methodology in this paper can be used for efficiently creating navigation maps for autonomous underwater vehicles, or in ROV-assisted human surveying of an underwater region. For computer vision applications requiring higher registration accuracy, the method is valuable in providing an initial global motion estimate. This estimate can serve as the initial value for a subsequent, finer registration step involving more specific motion models and non-linear optimization.

An important feature for autonomous underwater vehicles equipped with video cameras is the ability to quickly generate a wide area view of the sea floor. Such a view can easily be interpreted by a human operator on a survey mission or be used as a spatial representation for navigation. Compared with land or aerial environments, light underwater is subject to intense attenuation and scattering. These factors severely limit the definition and range of underwater imagery. Under such conditions, video mosaicing methods are well suited to creating large visual representations of the sea floor through the registration of many close-range images.

∗Email: {ngracias,jpc,jasv}@isr.ist.utl.pt. This work was (partially) supported by the FCT, the FCT Programa Operacional Sociedade de Informação (POSI) in the frame of QCA III.

Underwater video mapping commonly requires the registration of large sets of images of the region of interest (Gracias et al., 2003; Negahdaripour and Firoozfam, 2001). Most commonly, the image registration is performed pair-wise in chronological order (Gracias and Santos-Victor, 2000; Plakas and Trucco, 2000). The resulting motion estimates are then concatenated to infer the relation between any pair of images. However, even small amounts of noise in the estimation process may result in a large accumulated error. This is most noticeable if the image sequence revisits regions of the scene that were captured some time before, as happens with looping camera trajectories.
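The growth of the accumulated error can be illustrated with a deliberately simplified model. The Python/NumPy sketch below is our own illustration, not part of the paper: it assumes a 1-D translation-only motion in which every pair-wise estimate carries independent Gaussian noise. Concatenating the estimates turns the errors into a random walk, so the spread of the position error after k steps grows roughly like sigma times the square root of k. The function name and parameter values are hypothetical.

```python
import numpy as np

def chained_position_error(n_steps, sigma, n_trials, rng):
    """Error in the position of frame k obtained by concatenating k pair-wise estimates.

    Assumption: 1-D translation-only motion with independent N(0, sigma^2)
    error on every pair-wise estimate.
    """
    pairwise_errors = rng.normal(0.0, sigma, size=(n_trials, n_steps))
    return np.cumsum(pairwise_errors, axis=1)  # frame k accumulates k errors

rng = np.random.default_rng(0)
err = chained_position_error(n_steps=100, sigma=0.5, n_trials=2000, rng=rng)
spread = err.std(axis=0)  # spread[k - 1] is approximately sigma * sqrt(k)
```

A loop closure supplies a direct measurement between distant frames, which is the kind of extra constraint that a global registration can exploit to bound this drift.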

1.1. Related work on global registration

A number of authors have tackled the problem of registration for looping camera trajectories in order to create spatially coherent mosaics (Sawhney et al., 1998). Bundle adjustment techniques from the photogrammetry literature have been successfully adapted to image registration applications (McLauchlan and Jaenicke, 2000). A common feature of such approaches is the use of an observation model that imposes non-linear constraints on the motion parameters (Gracias and Santos-Victor, 2001; Duffin and Barrett, 1998), thus requiring off-line minimization, which is often highly time-consuming.

IAV2004 - PREPRINTS
5th IFAC/EURON Symposium on Intelligent Autonomous Vehicles
Instituto Superior Técnico, Lisboa, Portugal
July 5-7, 2004

Recently, Unnikrishnan and Kelly (2002) addressed the problem of efficiently distorting strip mosaics in order to close loops smoothly. The proposed solution has low computational complexity and is best suited to cases where the number of temporally distant overlaps is small compared to the number of adjacent ones.

In (Davis, 1998), a least squares solution is outlined for the global registration of images captured with no camera translation. The elements of pair-wise homographies are used as data in a linear system of equations for registering each image on a common reference frame. However, the issue of the independent scale factors arising from the use of projective homographies is not addressed.

Garcia et al. (2002) address the problem of estimating the position of an AUV while constructing a mosaic. The issue of looping trajectories is dealt with using a Kalman filter with an augmented state vector. Part of our paper addresses the same issues, but within a recursive least squares framework, where there is no need to explicitly estimate the position of the camera with respect to the mosaic, nor to model the dynamics of the vehicle deploying the camera.

The simple observation structure that arises from using the affine motion model for global registration has been overlooked in the mosaic creation literature. The main contribution of our paper lies in the formulation of the global registration problem in the least squares framework. This framework enables a simple and fast implementation, which we illustrate on underwater mosaics.

2. METHODOLOGY

The methodology in this paper is divided into two parts. The first addresses the registration as a batch problem, using a linear least squares criterion. The second describes a recursive formulation intended for real-time operation.

2.1. Least Squares Solution to the Global Mosaic

In this section we assume that a sequence of image frames has been acquired. For each pair of overlapping images $(i, j)$, we assume that a number $P_{ij}$ of point correspondences have been found. Let $\mathbf{x}^n_{i,j} = \left[u^n_{i,j}\;\; v^n_{i,j}\right]^T$ be the 2-D image coordinates of the $n$th point measured in the coordinate frame of image $j$, which matches point $\mathbf{x}^n_{j,i}$, measured in the coordinate frame of image $i$.

We now address the problem of global image motion estimation. As we are interested in obtaining a fast solution, we formulate the problem as the minimization of the norm of a residual vector which is linear in the image motion parameters. The residual vector contains the differences in the coordinates of selected image points which are in correspondence, once mapped to a common reference frame.

We will assume that the image motion between frames can be adequately described by an affine model, represented by a 3 × 3 affine homography matrix with 6 parameters. Ideally, for application cases with unconstrained camera motion (such as 3D camera translation and rotation), one would use a full projective collineation, as this is the most general and accurate model for describing the mapping between two or more images obtained by projective cameras looking at a plane. However, the affine model is the most general model which allows the residual vector to be expressed as a linear combination of the motion parameters for more than two images.

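The following linear formulation assumes the presence of a common reference frame where the residuals are measured. Let $H_{\mathrm{Ref},i}$ be the affine homography that maps the coordinate frame of image $i$ onto the reference frame, such that

$$H_{\mathrm{Ref},i} = \begin{bmatrix} h^{(i)}_1 & h^{(i)}_2 & h^{(i)}_3 \\ h^{(i)}_4 & h^{(i)}_5 & h^{(i)}_6 \\ 0 & 0 & 1 \end{bmatrix}.$$

Let $H_{i,j}$ be the affine homography relating frames $i$ and $j$. The corresponding point residues are defined as

$$\mathbf{r}^n_{i,j} = C\left(\mathbf{x}^n_{j,i}\right) \cdot \vec{h}^{(i)} - C\left(\mathbf{x}^n_{i,j}\right) \cdot \vec{h}^{(j)},$$

where, for a point $\mathbf{x} = [u\;\; v]^T$,

$$C(\mathbf{x}) = \begin{bmatrix} u & v & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & u & v & 1 \end{bmatrix}, \qquad \vec{h}^{(i)} = \begin{bmatrix} h^{(i)}_1 & h^{(i)}_2 & h^{(i)}_3 & h^{(i)}_4 & h^{(i)}_5 & h^{(i)}_6 \end{bmatrix}^T.$$

Each residue is thus the difference between the reference-frame positions of two matched points, and it is linear in the parameters of both homographies.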
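The residue definition can be checked numerically. The Python/NumPy sketch below is our illustration (the paper gives no code): it builds $C(\mathbf{x})$ and the parameter vector $\vec{h}^{(i)}$ from an affine $H_{\mathrm{Ref},i}$, then confirms that the residue vanishes when the two matched points are the images of the same reference-frame point. Helper names and numeric values are hypothetical.

```python
import numpy as np

def C(x):
    """Affine observation matrix C(x) for a point x = (u, v)."""
    u, v = x
    return np.array([[u, v, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, u, v, 1.0]])

def h_vec(H):
    """Stack the six free parameters [h1 .. h6] of an affine H_Ref,i."""
    return H[:2, :].ravel()

def residual(x_in_i, x_in_j, h_i, h_j):
    """r = C(x_{j,i}) h^(i) - C(x_{i,j}) h^(j): both points mapped to the reference frame."""
    return C(x_in_i) @ h_i - C(x_in_j) @ h_j

# Two affine image-to-reference homographies and one point p of the reference frame
H_ref_i = np.array([[1.0, 0.1, 5.0], [-0.1, 1.0, 2.0], [0.0, 0.0, 1.0]])
H_ref_j = np.array([[0.9, 0.0, 40.0], [0.05, 1.1, -3.0], [0.0, 0.0, 1.0]])
p = np.array([60.0, 30.0, 1.0])
x_i = (np.linalg.inv(H_ref_i) @ p)[:2]  # where p appears in image i
x_j = (np.linalg.inv(H_ref_j) @ p)[:2]  # where p appears in image j
r = residual(x_i, x_j, h_vec(H_ref_i), h_vec(H_ref_j))
```

Because $C(\mathbf{x})$ depends only on the measured coordinates, every residue row is a fixed linear function of the unknown parameters, which is what makes the stacked system below linear.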

Given a set of $N$ images, we assume that a set of $M$ homographies was found by pair-wise matching such that the whole image set is connected, i.e., every image can be related to every other by appropriately cascading the homographies. This condition implies $M \geq N - 1$, where the case $M = N - 1$ corresponds to the minimal set of matches that still connects all images, such as simple time-sequential matching, where each new image is matched only with its predecessor.
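The connectivity requirement can be tested with a union-find pass over the matched pairs. Note that $M \geq N - 1$ is necessary but not sufficient: the pairs must also link every image. The sketch below is a generic union-find check written for illustration; the function name and example pairs are hypothetical.

```python
def is_connected(n_images, pairs):
    """True if the match graph (images as nodes, matched pairs as edges) is connected."""
    parent = list(range(n_images))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps the trees shallow
            a = parent[a]
        return a

    for i, j in pairs:
        parent[find(i)] = find(j)  # merge the two components
    return len({find(k) for k in range(n_images)}) == 1

sequential = [(k, k + 1) for k in range(4)]  # N = 5 images, M = N - 1 = 4 pairs
loop_closed = sequential + [(0, 4)]          # an extra non-consecutive match
```

For example, `is_connected(5, [(0, 1), (1, 2), (2, 0), (3, 4)])` is false even though $M = N - 1$, because images 3 and 4 are never related to the rest.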


Let $\tilde{\beta}$ be defined as the $6N \times 1$ vector containing all the stacked elements of $H_{\mathrm{Ref},i}$ for all images, $\tilde{\beta} = \left[h^{(1)}_1 \ldots h^{(1)}_6\;\; h^{(2)}_1 \ldots h^{(N)}_6\right]^T$. By combining the equations of the point residues for all points, a linear system of equations can be written in the form

$$R = \tilde{X} \cdot \tilde{\beta} \qquad (2.1)$$

where $R$ is a vector of stacked point residues and $\tilde{X}$ is a sparse matrix obtained from the coordinates of the point matches. For a total of $P$ matched points, $\tilde{X}$ is sized $2P \times 6N$ and has at most six non-zero elements per row (three for each of the two images involved), i.e., at most $12P$ non-zero elements.

The sought solution is obtained by minimizing the norm of the residue vector. However, in order to find a unique solution, we must establish the common reference frame where the point coordinate differences are measured. A simple way to do this is to select one of the image frames as the common mosaic reference frame. Without loss of generality, we select the first image frame as the reference frame, which is expressed as $\vec{h}^{(1)} = [1\; 0\; 0\; 0\; 1\; 0]^T$.

An unconstrained system of linear equations can be formed from eq. 2.1 by excluding $\vec{h}^{(1)}$ from $\tilde{\beta}$. Let $\beta$ be defined as $\beta = \left[h^{(2)}_1 \ldots h^{(2)}_6\;\; h^{(3)}_1 \ldots h^{(N)}_6\right]^T$. Equation 2.1 can then be expressed as

$$R = \begin{bmatrix} \tilde{X}_1 & X \end{bmatrix} \cdot \begin{bmatrix} \vec{h}^{(1)} \\ \beta \end{bmatrix} = X \cdot \beta - Y$$

where $\tilde{X}_1$ and $X$ are matrices of sizes $2P \times 6$ and $2P \times 6(N-1)$ respectively, and $Y = -\tilde{X}_1 \cdot \vec{h}^{(1)}$.

Using the $L_2$ norm, the global mosaicing problem can be stated as the classic unconstrained least squares problem of finding the estimate $\hat{\beta}$ such that

$$\hat{\beta} = \arg\min_{\beta} \left\| X \cdot \beta - Y \right\|^2,$$

for which the closed-form solution is $\hat{\beta} = \left(X^T X\right)^{-1} X^T Y$. However, the computation of $X^T X$ should be avoided, as it squares the condition number of the problem and thus limits the accuracy of the solution. In this paper we have used a method based on the QR decomposition of $X$ (Press et al., 1988).

The simplicity of the least squares formulation and the sparse structure of the matrix $X$ allow for a fast solution to the global mosaicing problem. This motivates a recursive least squares formulation, suited for real-time applications, which is presented in the following section.
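A minimal batch solver along these lines is sketched below in Python with NumPy. It stores $\tilde{X}$ densely for readability (a sparse representation would be preferable at scale), drops the reference image's six columns to form $X$ and $Y$, and solves with a QR factorization rather than the normal equations. The function names, the `(i, j, pts_i, pts_j)` match format and the synthetic three-image example are our own assumptions; `np.linalg.solve` on the triangular factor stands in for a dedicated back-substitution.

```python
import numpy as np

def C(x):
    u, v = x
    return np.array([[u, v, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, u, v, 1.0]])

H1 = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # h^(1): the first image is the reference

def build_X_tilde(n_images, matches):
    """matches: list of (i, j, pts_i, pts_j), 0-based; pts_i[n] (frame i) matches pts_j[n] (frame j)."""
    rows = []
    for i, j, pts_i, pts_j in matches:
        for x_i, x_j in zip(pts_i, pts_j):
            block = np.zeros((2, 6 * n_images))
            block[:, 6 * i:6 * i + 6] += C(x_i)
            block[:, 6 * j:6 * j + 6] -= C(x_j)
            rows.append(block)
    return np.vstack(rows)  # 2P x 6N, at most 6 non-zeros per row

def solve_global(n_images, matches):
    Xt = build_X_tilde(n_images, matches)
    X1, X = Xt[:, :6], Xt[:, 6:]        # split off the reference image's columns
    Y = -X1 @ H1
    Q, R = np.linalg.qr(X)              # X = Q R, so X^T X is never formed
    beta = np.linalg.solve(R, Q.T @ Y)  # R is upper triangular
    return beta.reshape(n_images - 1, 6)  # rows: h^(2) ... h^(N)

def project(H, p):
    """Coordinates in an image of the reference point p, given that image's H_Ref."""
    return (np.linalg.inv(H) @ np.append(p, 1.0))[:2]

# Synthetic example: three images, all three pairs matched (a closed loop)
rng = np.random.default_rng(0)
H_true = [np.eye(3)]
for _ in range(2):
    H = np.eye(3)
    H[:2, :2] += 0.1 * rng.standard_normal((2, 2))
    H[:2, 2] = rng.uniform(-50.0, 50.0, 2)
    H_true.append(H)

matches = []
for i, j in [(0, 1), (1, 2), (0, 2)]:
    ref_pts = rng.uniform(0.0, 180.0, size=(5, 2))
    matches.append((i, j, [project(H_true[i], p) for p in ref_pts],
                    [project(H_true[j], p) for p in ref_pts]))
est = solve_global(3, matches)
```

With noise-free matches the estimate reproduces the true parameters; with real matches it returns the least-squares compromise among all pair-wise constraints.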

2.2. Recursive Least Squares

We now assume that we have an image stream and that we are able to match each new incoming image with (at least) the previous incoming image.

The recursive formulation is based on two distinct estimate updates, namely an observation update and an order update. The observation update corresponds to the inclusion of new data arising from a match between a pair of previously acquired images, whereas the order update corresponds to the inclusion of a new image.

For the order update we assume that the new image is only matched with the previous one. This assumption is justified by the fact that, in practical applications, there is a large overlap between time-adjacent frames, and thus matching a new image with the previous one has a high probability of success. Furthermore, we will take advantage of the special observation structure.

The general notation for the recursive formulation is the following. Let $\hat{\beta}_t$, $X_t$ and $Y_t$ be the instances of $\hat{\beta}$, $X$ and $Y$ at discrete time instant $t$. Let $N_t$ be the number of images at $t$, so that $\hat{\beta}_t$ is a $6(N_t - 1) \times 1$ vector. As before, we consider the first image to be the reference frame.

2.2.1. Observation Update

An observation update corresponds to appending a $2P_{ij} \times 6(N_t - 1)$ matrix $x_t$ to $X_{t-1}$ and the corresponding $2P_{ij} \times 1$ vector $y_t$ to $Y_{t-1}$. The updated estimate $\hat{\beta}_t$ is obtained recursively from $\hat{\beta}_{t-1}$, involving $X_{t-1}$, $Y_{t-1}$, $x_t$ and $y_t$. The simplest recursive formulation involves the storage and updating of the inverse of the autocorrelation matrix, $M_t^{-1} = \left(X_t^T X_t\right)^{-1}$. However, we have used the square root filter, which maintains a factorization of the form $M_t^{-1} = S_t S_t^T$ and compares favorably in terms of stability. Details on the formulation and implementation of this filter can be found in (Pollock, 1999).
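The observation update can be illustrated with the textbook covariance-form block RLS recursion, which propagates $M_t^{-1}$ directly. This is a swapped-in, simpler technique: the square root filter used here propagates $S_t$ instead, for better numerical stability, but both produce the same estimate in exact arithmetic. The function name is hypothetical; the example compares the recursive result with a batch least-squares solve.

```python
import numpy as np

def rls_observation_update(beta, M_inv, x, y):
    """Covariance-form block RLS: M_inv = (X^T X)^{-1}, new rows x (k x n), targets y (k,)."""
    S = np.eye(len(y)) + x @ M_inv @ x.T        # k x k innovation matrix
    K = np.linalg.solve(S, x @ M_inv).T         # gain M_inv x^T S^{-1} (S, M_inv symmetric)
    beta_new = beta + K @ (y - x @ beta)        # correct with the new residuals
    M_inv_new = M_inv - K @ x @ M_inv           # matrix inversion lemma
    return beta_new, M_inv_new

rng = np.random.default_rng(0)
X0, Y0 = rng.standard_normal((20, 6)), rng.standard_normal(20)
M0_inv = np.linalg.inv(X0.T @ X0)
beta0 = M0_inv @ X0.T @ Y0                      # batch estimate from the first 20 rows
x, y = rng.standard_normal((4, 6)), rng.standard_normal(4)
beta1, M1_inv = rls_observation_update(beta0, M0_inv, x, y)
```

Only a $k \times k$ system is solved per update, where $k = 2P_{ij}$ is the number of new rows.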

2.2.2. Order Update

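The order update involves enlarging $\hat{\beta}_{t-1}$ to accommodate the homography parameters for the new image. The new estimate is found by solving $X_t \cdot \hat{\beta}_t = Y_t$, where

$$X_t = \begin{bmatrix} X_{t-1} & 0 \\ A & B \end{bmatrix}, \qquad \hat{\beta}_t = \begin{bmatrix} \hat{\beta}_{t-1} \\ \vec{h}^{(N_t)} \end{bmatrix} \qquad \text{and} \qquad Y_t = \begin{bmatrix} Y_{t-1} \\ y_t \end{bmatrix}.$$

Matrices $A$ and $B$ are sized $2P_{n-1,n} \times 6(N_t - 2)$ and $2P_{n-1,n} \times 6$ respectively. Updating $\hat{\beta}_t$ implies the computation of $\vec{h}^{(N_t)}$, which can be obtained by solving

$$A \cdot \hat{\beta}_{t-1} + B \cdot \vec{h}^{(N_t)} = y_t.$$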

As we assume that each new image is matched only with the previous one, $A$ has the following structure: $A = \begin{bmatrix} 0 & D \end{bmatrix}$, where $D$ is a $2P_{n-1,n} \times 6$ matrix. Using a least squares criterion, $\vec{h}^{(N_t)}$ is given by

$$\vec{h}^{(N_t)} = \left(B^T B\right)^{-1} B^T \left(y_t - D \cdot \vec{h}^{(N_t - 1)}\right),$$

where $\vec{h}^{(N_t - 1)}$ are the lower 6 elements of $\hat{\beta}_{t-1}$. Note that the computation of $\vec{h}^{(N_t)}$ is fast due to the small sizes of $B$ and $D$.
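The order update for a new image matched only with its predecessor can be sketched as below. $D$ and $B$ are built from the affine observation matrix $C(\mathbf{x})$ with the sign convention of the residue definition, so $y_t = 0$; this assumes the previous image is not the reference image (otherwise its known parameters would move into $y_t$). `np.linalg.lstsq` returns the same least-squares solution as $(B^T B)^{-1} B^T(\cdot)$ without forming $B^T B$. Function names and numeric values are hypothetical.

```python
import numpy as np

def C(x):
    u, v = x
    return np.array([[u, v, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, u, v, 1.0]])

def order_update_beta(beta_prev, pts_prev, pts_new):
    """Append h^(N_t) for a new image matched only with image N_t - 1 (not the reference)."""
    D = np.vstack([C(x) for x in pts_prev])   # rows acting on h^(N_t - 1)
    B = -np.vstack([C(x) for x in pts_new])   # rows acting on h^(N_t)
    y_t = np.zeros(D.shape[0])                # zero because the reference is not involved
    h_prev = beta_prev[-6:]                   # lower 6 elements of beta_{t-1}
    h_new, *_ = np.linalg.lstsq(B, y_t - D @ h_prev, rcond=None)
    return np.concatenate([beta_prev, h_new])

def project(H, p):
    return (np.linalg.inv(H) @ np.append(p, 1.0))[:2]

H_prev = np.array([[1.0, 0.05, 20.0], [-0.05, 1.0, 10.0], [0.0, 0.0, 1.0]])  # image 2
H_new = np.array([[0.98, 0.06, 35.0], [-0.04, 1.01, 12.0], [0.0, 0.0, 1.0]])  # image 3
ref_pts = np.array([[40.0, 30.0], [150.0, 40.0], [60.0, 120.0], [140.0, 110.0]])
beta_prev = H_prev[:2].ravel()                # beta_{t-1} holds h^(2) only
beta_t = order_update_beta(beta_prev,
                           [project(H_prev, p) for p in ref_pts],
                           [project(H_new, p) for p in ref_pts])
```

When $B$ is square and invertible (three virtual points, as in Section 3.3), the new rows can be satisfied exactly, so keeping $\hat{\beta}_{t-1}$ unchanged loses no least-squares optimality.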

We now need to update $S_t$ in order to use the square root filter subsequently. The relation between $S_t$ and $S_{t-1}$ can be compactly expressed as

$$\left(S_t S_t^T\right)^{-1} = \begin{bmatrix} G & A^T B \\ B^T A & B^T B \end{bmatrix},$$

where $G = \left(S_{t-1} S_{t-1}^T\right)^{-1} + A^T A$. Note that for a large number of images the matrix $G$ will be much larger than $B^T B$.

Taking into consideration the execution speed requirements, we are interested in computing $S_t S_t^T$ without requiring the explicit inversion of $G$. Using the formulas of inversion by partition, $S_t S_t^T$ can be computed as

$$S_t S_t^T = \begin{bmatrix} P & Q \\ Q^T & T \end{bmatrix},$$

where

$$T = \left(B^T B - B^T A G^{-1} A^T B\right)^{-1}, \qquad Q = -\left(G^{-1} A^T B\right) \cdot T, \qquad P = G^{-1} - Q B^T A G^{-1}.$$

Having $G^{-1}$, the above expressions require a few matrix additions and multiplications, and the inversion of a $6 \times 6$ matrix. The inverse of $G$ can be efficiently computed using the Woodbury formula,

$$G^{-1} = \left(E^{-1} + A^T A\right)^{-1} = E - E \cdot A^T \cdot \left(I + A \cdot E \cdot A^T\right)^{-1} \cdot A \cdot E,$$

where $E = S_{t-1} S_{t-1}^T$ and $I$ is the $2P_{n-1,n} \times 2P_{n-1,n}$ identity matrix. This formula requires the inversion of a $2P_{n-1,n} \times 2P_{n-1,n}$ matrix. Finally, $S_t$ is recovered from $S_t S_t^T$ using a Cholesky factorization algorithm.
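The $S_t$ update can be verified numerically. The sketch below applies the Woodbury formula to obtain $G^{-1}$, fills the blocks of $S_t S_t^T$ by inversion by partition, and takes a Cholesky factor; the result is compared with a direct inverse of $X_t^T X_t$. Random dense $A$ and $B$ are used for the check (in the algorithm $A = [0\; D]$), the function name is hypothetical, and `np.linalg.cholesky` returns the lower-triangular factor.

```python
import numpy as np

def order_update_S(S_prev, A, B):
    """Return S_t with S_t S_t^T = (X_t^T X_t)^{-1}, X_t = [[X_{t-1}, 0], [A, B]]."""
    E = S_prev @ S_prev.T                              # E = (X_{t-1}^T X_{t-1})^{-1}
    k = A.shape[0]
    # Woodbury: G^{-1} = (E^{-1} + A^T A)^{-1}, needing only a k x k solve
    G_inv = E - E @ A.T @ np.linalg.solve(np.eye(k) + A @ E @ A.T, A @ E)
    AtB = A.T @ B
    T = np.linalg.inv(B.T @ B - AtB.T @ G_inv @ AtB)   # 6 x 6 inverse
    Q = -(G_inv @ AtB) @ T
    P = G_inv - Q @ AtB.T @ G_inv                      # AtB.T equals B^T A
    StSt = np.block([[P, Q], [Q.T, T]])
    StSt = 0.5 * (StSt + StSt.T)                       # remove round-off asymmetry
    return np.linalg.cholesky(StSt)

rng = np.random.default_rng(1)
X_prev = rng.standard_normal((30, 12))
S_prev = np.linalg.cholesky(np.linalg.inv(X_prev.T @ X_prev))
A = rng.standard_normal((6, 12))
B = rng.standard_normal((6, 6))
S_t = order_update_S(S_prev, A, B)
X_t = np.block([[X_prev, np.zeros((30, 6))], [A, B]])
```

Apart from matrix products, the cost is dominated by one $k \times k$ solve and one $6 \times 6$ inverse, independent of the number of images.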

3. IMPLEMENTATION

This section details the algorithms we used to validate the approach and discusses some implementation details that influence the performance. All benchmarks refer to a 1.6 GHz processor and acquired images of 180 × 135 pixels.

[Flowchart: Acquire 2 images → Match images → Initialize RLS structure → Acquire new image → Match with previous → RLS order update → Detect unmatched image pair of high superposition (Y/N) → Match image pair of high superposition (Y/N) → RLS observation update.]

Fig. 3.1. Step sequence for the recursive construction of the mosaic.

3.1. Pair-wise matching

An essential building block for the methods in this paper is the algorithm for finding point correspondences between two images of the same planar scene. The algorithm is summarized below; further details can be found in (Gracias, 2003).

A set of point features corresponding to textured areas is extracted from one of the images. For each feature (defined as a small square image patch centered at the detected corner location), a prospective match is found in the other image using normalized cross-correlation. A robust estimation technique, based on a Least Median of Squares criterion and random sampling, is then used to remove outliers. The typical total execution time is 1 second for 50 inliers selected out of 70 matched features.
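The robust step can be sketched as follows, assuming the correspondences from the cross-correlation search are already available. This is a generic LMedS estimator for an affine homography using random 3-point samples, not the authors' implementation; the number of trials and the inlier rule (a 2.5σ cut-off on a Rousseeuw-style robust scale estimate) are assumptions, and all names are hypothetical. The synthetic data mimic the 50-inlier, 70-feature case mentioned above.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine H (3x3) such that dst is approximately H [src; 1]."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)  # 3 x 2
    H = np.eye(3)
    H[:2, :] = M.T
    return H

def sq_residuals(H, src, dst):
    mapped = src @ H[:2, :2].T + H[:2, 2]
    return np.sum((mapped - dst) ** 2, axis=1)

def lmeds_affine(src, dst, n_trials=200, rng=None):
    """Least Median of Squares over random minimal (3-point) samples."""
    rng = np.random.default_rng() if rng is None else rng
    best_med, best_H = np.inf, None
    for _ in range(n_trials):
        idx = rng.choice(len(src), size=3, replace=False)
        H = fit_affine(src[idx], dst[idx])
        med = np.median(sq_residuals(H, src, dst))
        if med < best_med:
            best_med, best_H = med, H
    # Assumed inlier rule: robust scale estimate with a 2.5 sigma cut-off
    n = len(src)
    sigma = max(1.4826 * (1 + 5 / (n - 3)) * np.sqrt(best_med), 1e-6)
    inliers = sq_residuals(best_H, src, dst) <= (2.5 * sigma) ** 2
    return fit_affine(src[inliers], dst[inliers]), inliers  # refit on all inliers

rng = np.random.default_rng(0)
H_true = np.array([[1.02, 0.05, 5.0], [-0.04, 0.98, -3.0], [0.0, 0.0, 1.0]])
src = rng.uniform(0.0, 180.0, size=(70, 2))
dst = src @ H_true[:2, :2].T + H_true[:2, 2] + rng.normal(0.0, 0.05, size=(70, 2))
dst[50:] = rng.uniform(0.0, 180.0, size=(20, 2))  # 20 gross outliers
H, inliers = lmeds_affine(src, dst, rng=rng)
```

LMedS needs no preset inlier threshold to rank hypotheses, and it tolerates up to about half of the matches being outliers.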

3.2. Recursive Mosaic Algorithm

An algorithm for the recursive construction of mosaics was implemented. This algorithm combines pair-wise image matching, superposition detection and recursive least squares updates to create a globally consistent mosaic on-line. The overall algorithmic flow is presented in Figure 3.1.

The recursive procedure requires the initialization of the data structures it maintains. The initial values of $\hat{\beta}_0$, $X_0$, $Y_0$ and $S_0$ are obtained from the point matches of two initial frames. After initialization, the algorithm runs two nested cycles. The outer cycle corresponds to the inclusion of a new image, while the inner cycle exploits the superposition between previously acquired images.

A new image is matched against the last one. The resulting point matches are used to perform the order update of the RLS filter, as described in Section 2.2.2. As this image may also overlap other images, we search for superposition between non-consecutive images. If a large overlap is found between an unmatched image pair, pair-wise matching is attempted. If it succeeds, $\hat{\beta}$ is updated using the square root filter. This cycle of superposition detection, matching and RLS update is repeated until no unmatched overlapping image pairs remain or all attempted matchings fail. Then a new image is processed.

We measure the amount of superposition between any pair of images by composing the corresponding inter-image homography from the current estimate $\hat{\beta}$.
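The overlap measure itself is not specified in the text. One simple possibility, sketched below as an assumption, composes $H_{i,j} = H_{\mathrm{Ref},i}^{-1} H_{\mathrm{Ref},j}$ from the current estimates and counts the fraction of a pixel grid of image $j$ that lands inside image $i$. The image size defaults to the 180 × 135 pixels used in the benchmarks; the grid step and function name are hypothetical.

```python
import numpy as np

def overlap_fraction(H_ref_i, H_ref_j, width=180, height=135, step=5):
    """Fraction of image j's sample grid that maps inside image i (hypothetical measure)."""
    H_ij = np.linalg.inv(H_ref_i) @ H_ref_j      # frame j -> reference -> frame i
    us, vs = np.meshgrid(np.arange(0, width, step) + step / 2,
                         np.arange(0, height, step) + step / 2)
    pts = np.stack([us.ravel(), vs.ravel(), np.ones(us.size)])
    mapped = H_ij @ pts                          # affine: last row stays 1
    inside = ((mapped[0] >= 0) & (mapped[0] < width) &
              (mapped[1] >= 0) & (mapped[1] < height))
    return inside.mean()

H_shift = np.array([[1.0, 0.0, 90.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
same = overlap_fraction(np.eye(3), np.eye(3))   # identical placement
half = overlap_fraction(np.eye(3), H_shift)     # shifted by half the image width
```

Image pairs whose measure exceeds a chosen threshold, and which have not yet been matched, would be the candidates for the inner matching cycle.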

3.3. Implementation Considerations

Both batch and recursive least squares methods can be straightforwardly adapted to more restricted models of image motion. For underwater or aerial surveying, it is often preferable to use the 4 d.o.f. similarity, which is suited to setups where the image plane is approximately parallel to the scene (Gracias, 2003). This motion model was used in some of the experiments. The image-to-reference-frame homography is defined as

$$H_{\mathrm{Ref},i} = \begin{bmatrix} h^{(i)}_1 & -h^{(i)}_2 & h^{(i)}_3 \\ h^{(i)}_2 & h^{(i)}_1 & h^{(i)}_4 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \vec{h}^{(i)} = \begin{bmatrix} h^{(i)}_1 & h^{(i)}_2 & h^{(i)}_3 & h^{(i)}_4 \end{bmatrix}^T.$$

All other matrices and vectors are sized accordingly.
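For the similarity model the observation matrix changes accordingly. Expanding $H_{\mathrm{Ref},i}\,[u\;\, v\;\, 1]^T$ gives

$$C(\mathbf{x}) = \begin{bmatrix} u & -v & 1 & 0 \\ v & u & 0 & 1 \end{bmatrix},$$

which the short sketch below checks numerically; $\vec{h}^{(1)} = [1\; 0\; 0\; 0]^T$ plays the role of the reference. Function names and parameter values are hypothetical.

```python
import numpy as np

def C_sim(x):
    """Observation matrix for the 4-d.o.f. similarity, h = [h1, h2, h3, h4]."""
    u, v = x
    return np.array([[u, -v, 1.0, 0.0],
                     [v, u, 0.0, 1.0]])

def H_sim(h):
    """Image-to-reference homography of the similarity model."""
    h1, h2, h3, h4 = h
    return np.array([[h1, -h2, h3],
                     [h2, h1, h4],
                     [0.0, 0.0, 1.0]])

h = np.array([0.9, 0.2, 12.0, -7.0])
x = np.array([30.0, 50.0])
via_C = C_sim(x) @ h                        # linear in the parameters
via_H = (H_sim(h) @ np.append(x, 1.0))[:2]  # direct mapping of x
```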

To improve execution speed, the number of point correspondences used in the global estimation methods was reduced to the minimum required by the motion model, namely 3 points for the affine model and 2 for the similarity. Note that the pair-wise image matching is still performed with a large set of points, and that the resulting homography is computed from a large set of inliers. This homography is then used to compute 3 (or 2) virtual points for the global estimation.
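Generating the virtual points can be sketched as follows. The placement of the virtual points is not stated in the text; spreading them over the image corners is our assumption, chosen because well-spread points keep the small system well conditioned. The helper name is hypothetical. The last lines confirm that three virtual correspondences reproduce the affine homography exactly.

```python
import numpy as np

def virtual_points(H_ij, model="affine", width=180, height=135):
    """Hypothetical helper: minimal correspondences synthesized from a pair-wise homography.

    H_ij maps frame j coordinates to frame i; the corner locations are an assumption.
    """
    if model == "affine":
        pts_j = np.array([[0.0, 0.0], [width, 0.0], [0.0, height]])
    elif model == "similarity":
        pts_j = np.array([[0.0, 0.0], [width, height]])
    else:
        raise ValueError("model must be 'affine' or 'similarity'")
    homog = np.hstack([pts_j, np.ones((len(pts_j), 1))])
    pts_i = (homog @ H_ij.T)[:, :2]
    return pts_i, pts_j

H_ij = np.array([[0.97, 0.08, 42.0], [-0.06, 1.03, -15.0], [0.0, 0.0, 1.0]])
pts_i, pts_j = virtual_points(H_ij)
# Three virtual correspondences determine the affine homography exactly
M = np.linalg.solve(np.hstack([pts_j, np.ones((3, 1))]), pts_i)  # 3 x 2
H_rec = np.vstack([M.T, [0.0, 0.0, 1.0]])
```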

4. RESULTS

The following results were obtained using underwater video sequences captured from a submersible. The first sequence contains 85 images that were acquired while the camera was undergoing a 3-loop trajectory. The average superposition between time-consecutive frames is 55%. Figure 4.1 shows the result of simple sequential image matching. Several sources of error, such as a non-planar scene, limited matching resolution and the affine camera model, lead to error accumulation, which is visible in the repeated patterns corresponding to the same ground features. The mosaic of Figure 4.2 was created using batch least squares with the affine motion model and 309 pairs of matched images.

Fig. 4.1. Mosaic created from sequential image matching. The effect of the accumulated error is visible on the marked regions, corresponding to the same features on the sea floor.

Fig. 4.2. Global mosaic using batch least squares.

A second sequence of 129 image frames, selected from 6 minutes of video, was used to create the mosaic of Figure 4.3 with the recursive mosaic algorithm. Upon completion, 272 pairs of images had been matched.

Fig. 4.3. Global mosaic using recursive least squares.

For comparison, Table 4.1 presents the optimization time required to obtain a global motion estimate using batch least squares (BLS) and non-linear least squares (NLLS). The NLLS method is described in (Gracias, 2003), where it was used for topology estimation. Although much slower, NLLS has the advantage of coping with more specific non-linear motion models, such as the constant-scale similarity.

Table 4.1. Execution time (in seconds) for batch (BLS) and non-linear least squares (NLLS).

Sequence   Model        BLS (s)   NLLS (s)
First      Affine       0.16      13
Second     Similarity   0.14      20

5. CONCLUSIONS AND FUTURE WORK

In this paper we have presented a least squares approach for the creation of globally consistent mosaics. The approach allows for a linear formulation using point matches between pairs of images, and for the fast estimation of the image motion. Both batch and recursive implementations were detailed. Owing to its low computational demand, an important advantage of the recursive implementation is the possibility of performing image registration, loop detection and trajectory correction in an integrated fashion. For underwater vision applications, this methodology allows wide views of the sea floor to be created during image acquisition, which benefits human survey missions and map building for autonomous navigation.

Future work includes extending the method to deal with geographic data, such as sensor readings taken during acquisition and world points of known location. We are also investigating the use of non-linear image motion models without resorting to bundle adjustment techniques.

6. REFERENCES

Davis, J. (1998). Mosaics of scenes with moving objects. In: Proc. of the Conference on Computer Vision and Pattern Recognition. Santa Barbara, CA, USA.

Duffin, K. and W. Barrett (1998). Globally optimal image mosaics. In: Graphics Interface. pp. 217–222.

Garcia, R., J. Puig, P. Ridao and X. Cufi (2002). Augmented state Kalman filtering for AUV navigation. In: Proc. Int. Conf. on Robotics and Automation. Washington DC, USA. pp. 4010–4015.

Gracias, N. (2003). Mosaic-based Visual Navigation for Autonomous Underwater Vehicles. PhD thesis. Instituto Superior Técnico. Lisbon, Portugal.

Gracias, N. and J. Santos-Victor (2000). Underwater video mosaics as visual navigation maps. Computer Vision and Image Understanding 79(1), 66–91.

Gracias, N. and J. Santos-Victor (2001). Underwater mosaicing and trajectory reconstruction using global alignment. In: Proc. of the Oceans 2001 Conference. Honolulu, Hawaii, USA. pp. 2557–2563.

Gracias, N., S. Zwaan, A. Bernardino and J. Santos-Victor (2003). Mosaic based Navigation for Autonomous Underwater Vehicles. Journal of Oceanic Engineering.

McLauchlan, P. and A. Jaenicke (2000). Image mosaicing using sequential bundle adjustment. In: Proc. of the British Machine Vision Conference BMVC2000. Bristol, U.K.

Negahdaripour, S. and P. Firoozfam (2001). Positioning and image mosaicing of long image sequences; Comparison of selected methods. In: Proc. of the IEEE Oceans 2001 Conference. Honolulu, Hawaii, USA.

Plakas, C. and E. Trucco (2000). Developing a real-time robust video tracker. In: Proc. of the IEEE Oceans 2000 Conference. Providence, Rhode Island, USA.

Pollock, D. (1999). A Handbook of Time-Series Analysis, Signal Processing and Dynamics. Academic Press.

Press, W., S. Teukolsky, W. Vetterling and B. Flannery (1988). Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.

Sawhney, H., S. Hsu and R. Kumar (1998). Robust video mosaicing through topology inference and local to global alignment. In: Proc. European Conf. on Computer Vision. Freiburg, Germany.

Unnikrishnan, R. and A. Kelly (2002). A constrained optimization approach to globally consistent mapping. In: Proc. Int. Conf. on Intell. Robots and Systems. Lausanne, Switzerland.