Linear Global Mosaics For Underwater Surveying

Nuno Gracias, João Paulo Costeira and José Santos-Victor∗
Instituto Superior Técnico & Instituto de Sistemas e Robótica
Av. Rovisco Pais, 1049–001 Lisboa Codex, Portugal
Abstract. An important feature for autonomous underwater vehicles equipped with video cameras in survey missions is the ability to quickly generate a wide area view of the sea floor. This paper presents a method for the fast creation of globally consistent video mosaics. A closed-form solution for the estimation of the global image motion is presented. It uses a least-squares criterion over a residual vector which is linear in the homography parameters. Aiming at real-time operation, a fast implementation is described using recursive least-squares, which permits the creation of globally consistent mosaics during video acquisition. The application to underwater imagery is illustrated by the creation of video mosaics capable of being used for surveying or autonomous navigation.
Key Words. Global image registration; recursive least-squares; underwater mosaicing; computer vision
1. INTRODUCTION
This paper addresses the problem of the fast creation of globally consistent mosaics. We present a simple formulation based on an affine description of the image motion. This model allows the mosaic creation problem to be formulated as the minimization of the norm of a vector of residues which is a sparse linear combination of the coordinates of point sets resulting from matching several image pairs. The linear nature of the problem allows fast solutions to be obtained using least squares methods.
The methodology in this paper can be used for efficiently creating navigation maps for autonomous underwater vehicles, or in ROV-assisted human surveying of an underwater region. For computer vision applications requiring higher registration accuracy, the method is valuable in providing an initial global motion estimate. This estimate can serve as the initial value for a subsequent finer registration step, involving more specific motion models and non-linear optimization.
An important feature for autonomous underwater vehicles equipped with video cameras is the ability to quickly generate a wide area view of the sea floor. Such a view can easily be interpreted by a human operator on a survey mission or be used as a spatial representation for navigation. When compared with land or aerial environments, the light underwater is subject to intense attenuation and scattering. These factors severely limit the definition and range of underwater imagery. Under such conditions, video mosaicing methods are suited to creating large visual representations of the sea floor, through the registration of many close-range images.
∗ Email: {ngracias,jpc,jasv}@isr.ist.utl.pt. This work was (partially) supported by the FCT, the FCT Programa Operacional Sociedade de Informação (POSI) in the frame of QCA III.
Underwater video mapping commonly requires the registration of large sets of images of the region of interest (Gracias et al., 2003; Negahdaripour and Firoozfam, 2001). Most commonly, the registration is performed pair-wise, between images in chronological order (Gracias and Santos-Victor, 2000; Plakas and Trucco, 2000). The resulting motion estimates are then concatenated to infer the relation between any pair of images. However, even small amounts of noise in the estimation process may result in large accumulated error. This is most noticeable if the image sequence contains regions of the scene that have been captured some time before, as happens with loop camera trajectories.
1.1. Related work on global registration
A number of authors have tackled the problem of registration for camera loop trajectories in order to create spatially coherent mosaics (Sawhney et al., 1998). Bundle adjustment techniques from the photogrammetry literature have been successfully adapted to image registration applications (McLauchlan and Jaenicke, 2000). A common feature of such approaches is the use of an observation model that imposes non-linear constraints on the motion parameters (Gracias and Santos-Victor, 2001; Duffin and Barrett, 1998), thus requiring off-line minimization which is often highly time-consuming.
IAV2004 PREPRINTS, 5th IFAC/EURON Symposium on Intelligent Autonomous Vehicles, Instituto Superior Técnico, Lisboa, Portugal, July 5-7, 2004
Recently, Unnikrishnan and Kelly (2002) addressed the problem of efficiently distorting strip mosaics in order to close loops in a smooth way. The proposed solution has low computational complexity and is best suited for the case where the number of temporally distant overlaps is small compared to the adjacent ones.
In (Davis, 1998), a least squares solution is outlined for the global registration of images captured under no translation. The elements of pairwise homographies are used as data in a linear system of equations for registering each image on a common reference frame. However, the issue of independent scale factors arising from the use of projective homographies is not addressed.
Garcia et al. (2002) address the problem of estimating the position of an AUV while constructing a mosaic. The issue of looping trajectories is dealt with using a Kalman filter with an augmented state vector. Part of our paper addresses the same issue, but using a recursive least squares framework, where there is no need to explicitly estimate the position of the camera with respect to the mosaic, nor for a dynamic model of the motion of the vehicle deploying the camera.
The simple observation structure that arises from using the affine motion model for global registration has been overlooked in the mosaic creation literature. The main contribution of our paper lies in the formulation of the global registration problem in the least squares framework. This framework enables a simple and fast implementation, which we illustrate on underwater mosaics.
2. METHODOLOGY
The methodology in this paper is divided into two parts. The first addresses the registration as a batch problem, using a linear least squares criterion. The second describes a recursive formulation intended for real-time operation.
2.1. Least Squares Solution to the Global Mosaic
In this section we will assume that a sequence of image frames has been acquired. For each pair of overlapping images $(i,j)$, we will assume that a number $P_{ij}$ of point correspondences have been found. Let $x^n_{i,j} = \left(u^n_{i,j}, v^n_{i,j}\right)$ be the 2-D image coordinates of the $n$th point measured in the coordinate frame of image $j$, which matches point $x^n_{j,i}$, measured in the coordinate frame of image $i$.
We will now address the problem of global image motion estimation. As we are interested in obtaining a fast solution, we will formulate the problem as the minimization of the norm of a residual vector which is linear in the image motion parameters. The residual vector contains the differences in the coordinates of selected image points which are in correspondence and are mapped to a common reference frame.
We will assume that the image motion between frames can be adequately described by an affine model, represented by a 3 × 3 affine homography matrix of 6 parameters. Ideally, for the application cases where there is unconstrained camera motion (such as 3D camera translation and rotation), one would use a full projective collineation, as this is the most general and accurate model for describing the mapping between two or more images obtained by projective cameras looking at a plane. However, the affine model is the most general model which allows the residual vector to be expressed as a linear combination of the motion parameters for more than two images.
The following linear formulation assumes the presence of a common reference frame where the residuals are measured. Let $H_{Ref,i}$ be the affine homography that maps the coordinate frame of image $i$ onto the reference frame, such that
$$H_{Ref,i} = \begin{bmatrix} h^{(i)}_1 & h^{(i)}_2 & h^{(i)}_3 \\ h^{(i)}_4 & h^{(i)}_5 & h^{(i)}_6 \\ 0 & 0 & 1 \end{bmatrix}.$$
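Let $H_{i,j}$ be the affine homography relating frames $i$ and $j$. The corresponding point residues are defined as $r^n_{i,j} = C\left(x^n_{j,i}\right) \cdot \vec{h}^{(i)} - C\left(x^n_{i,j}\right) \cdot \vec{h}^{(j)}$, where
$$C(x) = \begin{bmatrix} u & v & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & u & v & 1 \end{bmatrix}, \qquad \vec{h}^{(i)} = \begin{bmatrix} h^{(i)}_1 & h^{(i)}_2 & h^{(i)}_3 & h^{(i)}_4 & h^{(i)}_5 & h^{(i)}_6 \end{bmatrix}^T.$$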
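Written out component by component, the residue for a single correspondence reads as follows. The first display is only an expansion of the definition above. The second display, with the extra parameters $h_7$ and $h_8$, is an illustrative assumption about how a full projective model would be parameterised and is not taken from the paper.

```latex
% Affine model: each component of the residue is linear in the
% twelve parameters h^{(i)}_1..h^{(i)}_6 and h^{(j)}_1..h^{(j)}_6.
r^{n}_{i,j} =
\begin{bmatrix}
  h^{(i)}_1 u^{n}_{j,i} + h^{(i)}_2 v^{n}_{j,i} + h^{(i)}_3 \\
  h^{(i)}_4 u^{n}_{j,i} + h^{(i)}_5 v^{n}_{j,i} + h^{(i)}_6
\end{bmatrix}
-
\begin{bmatrix}
  h^{(j)}_1 u^{n}_{i,j} + h^{(j)}_2 v^{n}_{i,j} + h^{(j)}_3 \\
  h^{(j)}_4 u^{n}_{i,j} + h^{(j)}_5 v^{n}_{i,j} + h^{(j)}_6
\end{bmatrix}

% Projective model (h_7, h_8 are hypothetical labels): a point (u, v)
% maps to a ratio of linear terms, so the mapped coordinates are no
% longer linear in the parameters.
u' = \frac{h_1 u + h_2 v + h_3}{h_7 u + h_8 v + 1}, \qquad
v' = \frac{h_4 u + h_5 v + h_6}{h_7 u + h_8 v + 1}
```

With the affine model both images of a pair contribute linearly to the residue, which is what permits stacking all pairs into a single linear system.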
Given a set of $N$ images, we assume that a set of $M$ homographies were found by pair-wise matching, such that the whole image set is connected, i.e., every image can be related to every other by appropriately cascading the homographies. This condition implies $M \geq N - 1$, where the case $M = N - 1$ would correspond to having each image matched with only one other image, such as the case of simple time sequential matching.
Let $\tilde{\beta}$ be defined as the $6N \times 1$ vector containing all the stacked elements of $H_{Ref,i}$ for all images, $\tilde{\beta} = \begin{bmatrix} h^{(1)}_1 & \ldots & h^{(1)}_6 & h^{(2)}_1 & \ldots & h^{(N)}_6 \end{bmatrix}^T$. By combining the equations of the point residues for all points, a linear system of equations can be written in the form
$$R = \tilde{X} \cdot \tilde{\beta} \qquad (2.1)$$
where $R$ is a vector of stacked point residues and $\tilde{X}$ is a sparse matrix obtained from the coordinates of the point matches. For a total of $P$ matched points, $\tilde{X}$ is sized $2P \times 6N$ and has a maximum of $6P$ nonzero elements.
The sought solution will be obtained by minimizing the norm of the residues vector. However, in order to find a unique solution we must establish the common reference frame where the point coordinate differences are measured. A simple way to do this is to select one of the image frames as the common mosaic reference frame. Without loss of generality, we will select the first image frame as the reference frame, which is expressed as $\vec{h}^{(1)} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}^T$.
An unconstrained system of linear equations can be formed from eq. 2.1 by excluding $\vec{h}^{(1)}$ from $\tilde{\beta}$. Let $\beta$ be defined as $\beta = \begin{bmatrix} h^{(2)}_1 & \ldots & h^{(2)}_6 & h^{(3)}_1 & \ldots & h^{(N)}_6 \end{bmatrix}^T$. Equation 2.1 can be expressed as
$$R = \begin{bmatrix} \tilde{X}_1 & X \end{bmatrix} \cdot \begin{bmatrix} \vec{h}^{(1)} \\ \beta \end{bmatrix} = X \cdot \beta - Y$$
where $\tilde{X}_1$ and $X$ are matrices of sizes $2P \times 6$ and $2P \times 6(N-1)$ respectively, and $Y = -\tilde{X}_1 \cdot \vec{h}^{(1)}$.
Using the $L_2$ norm, the global mosaicing problem can be stated as the classic unconstrained least squares problem of finding the estimate $\hat{\beta}$ such that $\hat{\beta} = \arg\min_{\beta} \left\| X \cdot \beta - Y \right\|^2$, for which the closed form solution is $\hat{\beta} = \left(X^T X\right)^{-1} X^T \cdot Y$. However, the computation of $X^T X$ should be avoided as it can lead to a large condition number and thus limit the accuracy of the solution. In this paper we have used a method based on the QR decomposition of $X$ (Press et al., 1988).
The simplicity of the least squares formulation and the sparse structure of the matrix $X$ allow for a fast solution to the global mosaicing problem. This motivates a recursive least squares formulation, suited for real-time applications, that will be presented in the following section.
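The batch formulation of this section can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names and the `matches` data layout are assumptions made here, image indices are 0-based so image 0 plays the role of the reference frame, and $X$ is built as a dense array for readability, whereas a practical implementation would exploit its sparsity.

```python
import numpy as np

def C(x):
    """2x6 coefficient matrix C(x) of the affine model for a point x = (u, v)."""
    u, v = x
    return np.array([[u, v, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, u, v, 1.0]])

def batch_global_registration(matches, n_images):
    """Estimate the affine parameters of every image by batch least squares.

    matches: list of (i, j, pts_i, pts_j) with 0-based image indices, where
             pts_i[n] (frame i) corresponds to pts_j[n] (frame j).
    Image 0 is the reference frame (identity motion).
    Returns an (n_images, 6) array whose row k holds h^(k).
    """
    h_ref = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
    X_rows, Y_rows = [], []
    for i, j, pts_i, pts_j in matches:
        for x_i, x_j in zip(pts_i, pts_j):
            row = np.zeros((2, 6 * n_images))      # one block row of X-tilde
            row[:, 6 * i:6 * i + 6] += C(x_i)
            row[:, 6 * j:6 * j + 6] -= C(x_j)
            X_rows.append(row[:, 6:])              # columns of the unknowns
            Y_rows.append(-row[:, :6] @ h_ref)     # Y = -X1 . h^(1)
    X = np.vstack(X_rows)
    Y = np.concatenate(Y_rows)
    Q, R = np.linalg.qr(X)                         # reduced QR, X = Q R
    beta = np.linalg.solve(R, Q.T @ Y)             # solves R beta = Q^T Y
    return np.vstack([h_ref, beta.reshape(-1, 6)])
```

The sketch assumes that the image set is connected and that each matched pair contributes at least three non-collinear points, so that $X$ has full column rank.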
2.2. Recursive Least Squares
We will now assume that we have an image stream and that we are able to match each new incoming image with (at least) the previous incoming image. The recursive formulation is based on two distinct estimate updates, namely an observation update and an order update. The observation update corresponds to the inclusion of new data arising from a match between a pair of previously acquired images, whereas the order update corresponds to the inclusion of a new image.
For the order update we assume that the new image is only matched with the previous one. This assumption is justified by the fact that, in practical applications, we have large superposition between time-adjacent frames and thus the matching of a new image with the previous one has a high probability of being successful. Furthermore, we will take advantage of the special structure of the observations.
The general notation for the recursive formulation is the following. Let $\hat{\beta}_t$, $X_t$ and $Y_t$ be the instances, at discrete time instant $t$, of $\hat{\beta}$, $X$ and $Y$. Let $N_t$ be the number of images at $t$, so that $\hat{\beta}_t$ is a $6(N_t - 1) \times 1$ vector. As before, we will consider the first image to be the reference frame.
2.2.1. Observation Update
A new observation update corresponds to appending a $2P_{ij} \times 6(N_t - 1)$ matrix $x_t$ to $X_{t-1}$ and the corresponding $2P_{ij} \times 1$ vector $y_t$ to $Y_{t-1}$. The updated estimate $\hat{\beta}_t$ is obtained recursively from $\hat{\beta}_{t-1}$, involving $X_{t-1}$, $Y_{t-1}$, $x_t$ and $y_t$. The simplest recursive formulation involves the storage and updating of the inverse of the autocorrelation matrix $M_t^{-1} = \left(X_t^T X_t\right)^{-1}$. However, we have used the square root filter, which maintains a factorization of the form $M_t^{-1} = S_t S_t^T$ and compares favorably in terms of stability. Details on the formulation and implementation of this filter can be found in (Pollock, 1999).
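As an illustration of the observation update, the sketch below implements what the text calls the simplest recursive formulation, which stores and updates $M_t^{-1}$ directly. It is not the square root filter used in the paper, and the function name and argument layout are assumptions made here.

```python
import numpy as np

def rls_observation_update(beta, M_inv, x_t, y_t):
    """Recursive least squares update for a block of new observations.

    beta:  current estimate, shape (p,)
    M_inv: current inverse autocorrelation matrix (X^T X)^{-1}, shape (p, p)
    x_t:   new rows appended to X, shape (m, p)
    y_t:   new entries appended to Y, shape (m,)
    Returns the updated (beta, M_inv).
    """
    m = x_t.shape[0]
    S = np.eye(m) + x_t @ M_inv @ x_t.T          # m x m innovation matrix
    # Gain K = M_inv x_t^T S^{-1}; S and M_inv are symmetric.
    K = np.linalg.solve(S, x_t @ M_inv).T
    beta_new = beta + K @ (y_t - x_t @ beta)     # correct with the new residual
    M_inv_new = M_inv - K @ x_t @ M_inv          # matrix inversion lemma
    return beta_new, M_inv_new
```

Only an $m \times m$ system is solved per update, with $m = 2P_{ij}$, so the cost of the inversion does not grow with the number of images.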
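2.2.2. Order Update
The order update involves enlarging $\hat{\beta}_{t-1}$ to accommodate the homography parameters for the new image. The new estimate is found by solving $X_t \cdot \hat{\beta}_t = Y_t$ where
$$X_t = \begin{bmatrix} X_{t-1} & 0 \\ A & B \end{bmatrix}, \quad \hat{\beta}_t = \begin{bmatrix} \hat{\beta}_{t-1} \\ h^{(N_t)} \end{bmatrix} \quad \text{and} \quad Y_t = \begin{bmatrix} Y_{t-1} \\ y_t \end{bmatrix}.$$
Matrices $A$ and $B$ are sized $2P_{n-1,n} \times 6(N_t - 2)$ and $2P_{n-1,n} \times 6$ respectively. The updating of $\hat{\beta}_t$ implies the computation of $h^{(N_t)}$, which can be obtained by solving
$$A \cdot \hat{\beta}_{t-1} + B \cdot h^{(N_t)} = y_t.$$
As we assume that each new image is matched with the previous, $A$ has the following structure: $A = \begin{bmatrix} 0 & D \end{bmatrix}$, where $D$ is a $2P_{n-1,n} \times 6$ matrix. Using a least squares criterion, $h^{(N_t)}$ is given by
$$h^{(N_t)} = \left(B^T B\right)^{-1} B^T \left( y_t - D \cdot h^{(N_t - 1)} \right)$$
where $h^{(N_t - 1)}$ are the lower 6 elements of $\hat{\beta}_{t-1}$. Note that the computation of $h^{(N_t)}$ is a fast process due to the small sizes of $B$ and $D$.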
We now need to update $S_t$ in order to use the square root filter afterwards. The relation between $S_t$ and $S_{t-1}$ can be compactly expressed as
$$\left(S_t S_t^T\right)^{-1} = \begin{bmatrix} G & A^T B \\ B^T A & B^T B \end{bmatrix}$$
where $G = \left(S_{t-1} S_{t-1}^T\right)^{-1} + A^T A$. Note that for a large number of images the matrix $G$ will be much larger than $B^T B$.
Taking into consideration the execution speed requirements, we are interested in computing $S_t S_t^T$ without requiring the explicit inversion of $G$. Using the formulas of inversion by partition, $S_t S_t^T$ can be computed as
$$S_t S_t^T = \begin{bmatrix} P & Q \\ Q^T & T \end{bmatrix}$$
where
$$T = \left(B^T B - B^T A G^{-1} A^T B\right)^{-1}, \qquad Q = -\left(G^{-1} A^T B\right) \cdot T, \qquad P = G^{-1} - Q B^T A G^{-1}.$$
Having $G^{-1}$, the above expressions require few matrix additions and multiplications, and the inversion of a $2P_{n-1,n} \times 2P_{n-1,n}$ matrix. The inverse of $G$ can be efficiently computed using the Woodbury formula,
$$G^{-1} = \left(E^{-1} + A^T A\right)^{-1} = E - E \cdot A^T \cdot \left(I + A \cdot E \cdot A^T\right)^{-1} \cdot A \cdot E$$
where $E = S_{t-1} S_{t-1}^T$ and $I$ is the $2P_{n-1,n} \times 2P_{n-1,n}$ identity matrix. Again, the above formula implies the inversion of a $2P_{n-1,n} \times 2P_{n-1,n}$ matrix. Finally, $S_t$ is recovered from $S_t S_t^T$ using a Choleski factorization algorithm.
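The order update can be sketched as follows, under the assumption that the product $E = S_{t-1} S_{t-1}^T$ is available as a dense array. The function name and argument layout are illustrative choices made here. The upper-left block of the partitioned inverse is written in the standard form $P = G^{-1} - Q\,(A^T B)^T G^{-1}$.

```python
import numpy as np

def rls_order_update(beta, E, D, B, y_t):
    """Enlarge the estimate with the parameters of a newly acquired image.

    beta: current estimate, shape (p,), whose last 6 entries belong to
          the previous image
    E:    current inverse autocorrelation matrix (X^T X)^{-1}, shape (p, p)
    D, B: coefficient blocks of the previous and new image, shape (2P, 6)
    y_t:  right-hand side of the new equations, shape (2P,)
    Returns the enlarged estimate and the enlarged inverse autocorrelation.
    """
    p, m = beta.size, D.shape[0]
    A = np.hstack([np.zeros((m, p - 6)), D])          # A = [0 D]
    # Least squares estimate of the parameters of the new image.
    h_new = np.linalg.solve(B.T @ B, B.T @ (y_t - D @ beta[-6:]))
    # Woodbury formula: G^{-1} = E - E A^T (I + A E A^T)^{-1} A E.
    AE = A @ E
    G_inv = E - AE.T @ np.linalg.solve(np.eye(m) + AE @ A.T, AE)
    # Inversion by partition of [[G, A^T B], [B^T A, B^T B]].
    U = A.T @ B
    T = np.linalg.inv(B.T @ B - U.T @ G_inv @ U)
    Q = -G_inv @ U @ T
    P = G_inv - Q @ U.T @ G_inv
    M_inv = np.block([[P, Q], [Q.T, T]])
    return np.concatenate([beta, h_new]), M_inv
```

A factor $S_t$ could then be obtained with `np.linalg.cholesky(M_inv)`, which returns a lower triangular $L$ with $L L^T$ equal to its argument.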
3. IMPLEMENTATION
This section details the algorithms we used to validate the approach and discusses some implementation details that influence the performance. All benchmarks refer to a 1.6 GHz processor and acquired images of 180 × 135 pixels.
[Flowchart boxes, with Y/N decision branches: Acquire 2 images; Match images; Initialize RLS structure; Acquire new image; Match with previous; RLS order update; Detect unmatched image pair of high superposition; Match image pair of high superposition; RLS observation update.]
Fig. 3.1. Step sequence for the recursive construction of the mosaic.
3.1. Pair-wise matching
An essential building block for the methods in this paper is the algorithm for finding point correspondences between two images of the same planar scene. The algorithm is summarized in the following. Further details can be found in (Gracias, 2003).
A set of point features, corresponding to textured areas, is extracted from one of the images. For each feature (defined as a small square image patch centered at the detected corner location), a prospective match is found in the other image, using normalized cross-correlation. A robust estimation technique is used to remove outliers using a Least Median of Squares criterion and random sampling. Typical total execution time is 1 second, for 50 inliers selected out of 70 matched features.
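The outlier removal step can be illustrated with a generic Least Median of Squares estimator for the affine model. This is a sketch under stated assumptions rather than the procedure of (Gracias, 2003): the number of random trials, the robust scale factor (borrowed from the usual one-dimensional heuristic) and the 2.5 sigma inlier rule are choices made here, and all function names are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least squares affine parameters h (6,) mapping src (n, 2) onto dst (n, 2)."""
    n = len(src)
    X = np.zeros((2 * n, 6))
    X[0::2, 0:2] = src
    X[0::2, 2] = 1.0
    X[1::2, 3:5] = src
    X[1::2, 5] = 1.0
    h, *_ = np.linalg.lstsq(X, dst.reshape(-1), rcond=None)
    return h

def apply_affine(h, pts):
    """Map points (n, 2) with the affine parameters h."""
    return np.column_stack([pts @ h[0:2] + h[2], pts @ h[3:5] + h[5]])

def lmeds_affine(src, dst, n_trials=200, seed=0):
    """Least Median of Squares over random minimal samples of 3 matches.

    Returns the affine parameters refitted on the selected inliers and the
    boolean inlier mask. Requires more than 6 matches.
    """
    rng = np.random.default_rng(seed)
    best_h, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(len(src), size=3, replace=False)
        h = fit_affine(src[idx], dst[idx])
        med = np.median(np.sum((apply_affine(h, src) - dst) ** 2, axis=1))
        if med < best_med:
            best_h, best_med = h, med
    # Robust scale estimate from the best median (assumed heuristic).
    sigma = 1.4826 * (1.0 + 5.0 / (len(src) - 6)) * np.sqrt(best_med)
    r2 = np.sum((apply_affine(best_h, src) - dst) ** 2, axis=1)
    inliers = r2 <= (2.5 * sigma) ** 2
    return fit_affine(src[inliers], dst[inliers]), inliers
```

The median criterion tolerates up to half of the matches being wrong, which is why it suits correlation-based matching where a sizeable fraction of prospective matches may be false.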
3.2. Recursive Mosaic Algorithm
An algorithm for the recursive construction of mosaics was implemented. This algorithm combines the pair-wise image matching, superposition detection and recursive least squares updates, to create a globally consistent mosaic on-line. The overall algorithmic flow is presented in Figure 3.1.
The recursive procedure requires the initialization of the data structures it maintains. The initial values of $\hat{\beta}_0$, $X_0$, $Y_0$ and $S_0$ are obtained from the point matches of two initial frames. Next, the algorithm contains two nested cycles. The outer cycle corresponds to the inclusion of a new image while the inner cycle exploits the superposition between previously acquired images.
A new image is matched against the last one. The resulting point matches are used to update the order of the RLS filter, as described in Section 2.2.2. As this image may also overlap other images, we search for superposition between non-consecutive images. If large overlap is found between an unmatched image pair, the pair-wise matching is attempted. If it succeeds, then $\hat{\beta}$ is updated using the square root filter. This cycle of superposition detection, matching and RLS update is performed until no unmatched overlapping image pairs are found or all attempted image matches fail. Then a new image is processed.
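The two nested cycles can be sketched as a control-flow skeleton. Every callable passed in (`match`, `init`, `order_update`, `observation_update`, `find_overlaps`) is a hypothetical placeholder for the corresponding step of the text. The sketch also assumes that matching a new image with the previous one always succeeds and that pairs whose matching failed are retried after the next image arrives, neither of which is stated in the paper.

```python
def build_mosaic_online(frames, match, init, order_update,
                        observation_update, find_overlaps):
    """Control flow of the recursive mosaic construction.

    frames:             iterable of images (at least two)
    match(a, b):        point matches between two images, or None on failure
    init(m):            initial filter state from the first two frames
    order_update(s, m): state after adding a new image
    observation_update(s, pair, m): state after adding a non-consecutive match
    find_overlaps(s, n): candidate (i, j) index pairs with large superposition
    Returns the final state and the set of matched index pairs.
    """
    frames = iter(frames)
    images = [next(frames), next(frames)]
    state = init(match(images[0], images[1]))
    matched = {(0, 1)}
    for new_image in frames:                      # outer cycle: new image
        images.append(new_image)
        n = len(images) - 1
        state = order_update(state, match(images[n - 1], images[n]))
        matched.add((n - 1, n))
        attempted = set()
        while True:                               # inner cycle: overlaps
            candidates = [p for p in find_overlaps(state, len(images))
                          if p not in matched and p not in attempted]
            if not candidates:
                break
            pair = candidates[0]
            attempted.add(pair)
            result = match(images[pair[0]], images[pair[1]])
            if result is not None:
                state = observation_update(state, pair, result)
                matched.add(pair)
    return state, matched
```

Because the overlap candidates are recomputed after every successful update, a correction of the trajectory can expose further overlapping pairs within the same inner cycle.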
We measure the amount of superposition between any pair of images by composing the corresponding inter-image homography, using the current $\hat{\beta}$.
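One possible way to turn the composed homography into a superposition measure is sketched below. The paper does not say how the overlap is quantified, so the grid sampling, the function names and the default grid size are assumptions made for illustration.

```python
import numpy as np

def to_matrix(h):
    """3x3 affine homography built from the 6 parameters h."""
    return np.array([[h[0], h[1], h[2]],
                     [h[3], h[4], h[5]],
                     [0.0, 0.0, 1.0]])

def overlap_fraction(h_i, h_j, width, height, n=20):
    """Approximate fraction of image j that falls inside image i.

    h_i, h_j: affine parameters mapping frames i and j onto the reference
    width, height: image size in pixels (same for both images)
    n: number of sample points per axis (cell centres of an n x n grid)
    """
    # Inter-image homography: frame j -> reference -> frame i.
    H_ij = np.linalg.inv(to_matrix(h_i)) @ to_matrix(h_j)
    us = (np.arange(n) + 0.5) * width / n
    vs = (np.arange(n) + 0.5) * height / n
    u, v = np.meshgrid(us, vs)
    pts = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])
    m = H_ij @ pts
    inside = (m[0] >= 0) & (m[0] <= width) & (m[1] >= 0) & (m[1] <= height)
    return float(inside.mean())
```

For example, two images of 180 × 135 pixels related by a horizontal shift of 90 pixels give `overlap_fraction([1, 0, 0, 0, 1, 0], [1, 0, 90, 0, 1, 0], 180, 135)` equal to 0.5.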
3.3. Implementation Considerations
Both batch and recursive least squares methods can straightforwardly be adapted to more restricted models of image motion. For the case of underwater or aerial surveying, it is often preferable to use the 4 d.o.f. similarity model, which is suited for setups where the image plane is approximately parallel to the scene (Gracias, 2003). This motion model was used in some of the experiments. The image to reference frame homography is defined as
$$H_{Ref,i} = \begin{bmatrix} h^{(i)}_1 & -h^{(i)}_2 & h^{(i)}_3 \\ h^{(i)}_2 & h^{(i)}_1 & h^{(i)}_4 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad \vec{h}^{(i)} = \begin{bmatrix} h^{(i)}_1 & h^{(i)}_2 & h^{(i)}_3 & h^{(i)}_4 \end{bmatrix}^T.$$
All other matrices and vectors are sized accordingly.
To improve execution speed, the number of point coordinates used in the global estimation methods was reduced to the minimum required for the motion model, namely 3 points for the affine model and 2 for the similarity. Note that the pair-wise image matching is performed with a large set of points, and that the resulting homography is computed from a large set of inliers. The homography is used to compute 2 (or 3) virtual points for the global estimation.
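The similarity model and the virtual points can be illustrated as follows. The coefficient matrix follows from the homography defined above, whereas the locations chosen for the virtual points and both function names are assumptions of this sketch.

```python
import numpy as np

def C_similarity(x):
    """2x4 coefficient matrix of the 4 d.o.f. similarity model for x = (u, v).

    C_similarity(x) @ [h1, h2, h3, h4] gives the mapped point
    (h1*u - h2*v + h3, h2*u + h1*v + h4).
    """
    u, v = x
    return np.array([[u, -v, 1.0, 0.0],
                     [v,  u, 0.0, 1.0]])

def virtual_matches(H_ij, width, height, n_points=2):
    """Replace a large inlier set by a few virtual correspondences.

    H_ij: 3x3 homography mapping frame j onto frame i, estimated beforehand
          from the full set of inliers
    n_points: 2 for the similarity model, 3 for the affine model
    Returns (pts_i, pts_j), each of shape (n_points, 2).
    """
    # Hypothetical choice: well separated, non-collinear points in image j.
    candidates = np.array([[0.25 * width, 0.25 * height],
                           [0.75 * width, 0.75 * height],
                           [0.25 * width, 0.75 * height]])
    pts_j = candidates[:n_points]
    ph = np.column_stack([pts_j, np.ones(n_points)])
    mapped = (H_ij @ ph.T).T
    pts_i = mapped[:, :2] / mapped[:, 2:3]
    return pts_i, pts_j
```

With two virtual points per matched pair, each pair adds only four rows to $X$, which keeps the matrices inverted in the recursive updates very small.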
4. RESULTS
The following results were obtained using underwater video sequences captured from a submersible. The first sequence contains 85 images that were acquired while the camera was undergoing a 3 loop trajectory. The average superposition between time consecutive frames is 55%. Figure 4.1 shows the result of simple sequential image matching. Several sources of error, such as the non-planar scene, limited matching resolution and the affine camera model, lead to the error accumulation which is visible in the repetitive patterns corresponding to the same ground features. The mosaic of Figure 4.2 was created using batch least squares with the affine motion model and 309 pairs of matched images.
Fig. 4.1. Mosaic created from sequential image matching. The effect of the accumulated error is visible on the marked regions, corresponding to the same features on the sea floor.
Fig. 4.2. Global mosaic using batch least squares.
A second sequence of 129 image frames, selected
from a larger set of 6 minutes of video, was used to
create the mosaic of Figure 4.3, using the recursive
mosaic algorithm. Upon completion, 272 pairs of
images were matched.
For comparison, Table 4.1 presents the optimization time required to obtain a global motion estimate using batch (BLS) and non-linear least squares (NLLS). The NLLS method is described in (Gracias, 2003), where it was used for topology estimation. Although much slower, the NLLS presents the advantage of coping with more specific non-linear motion models, such as the constant scale similarity.
Fig. 4.3. Global mosaic using recursive least squares.
Table 4.1. Execution time (in seconds) for batch (BLS) and non-linear least squares (NLLS).
Sequence   Model        BLS    NLLS
First      Affine       0.16   13
Second     Similarity   0.14   20
5. CONCLUSIONS AND FUTURE WORK
In this paper we have presented a least squares approach for the creation of globally consistent mosaics. The approach allows for a linear formulation using point matches between pairs of images, and the fast estimation of the image motion. Both batch and recursive implementations were detailed. Due to the low computational demand, an important advantage of the recursive implementation is the possibility of performing the image registration, loop detection and trajectory correction in an integrated fashion. For underwater vision applications, this methodology allows for the creation of wide views of the sea floor during image acquisition, thus being of benefit in human surveying missions or in map building for autonomous navigation.
Future work includes the extension of the method to deal with geographic data, such as sensor readings during acquisition, and world points of known location. We are also investigating the use of non-linear image motion models without resorting to bundle adjustment techniques.
6. REFERENCES
Davis, J. (1998). Mosaics of scenes with moving objects. In: Proc. of the Conference on Computer Vision and Pattern Recognition. Santa Barbara, CA, USA.
Duffin, K. and W. Barrett (1998). Globally optimal image mosaics. In: Graphics Interface. pp. 217–222.
Garcia, R., J. Puig, P. Ridao and X. Cufi (2002). Augmented state Kalman filtering for AUV navigation. In: Proc. Int. Conf. on Robotics and Automation. Washington DC, USA. pp. 4010–4015.
Gracias, N. (2003). Mosaic-based Visual Navigation for Autonomous Underwater Vehicles. PhD thesis. Instituto Superior Técnico. Lisbon, Portugal.
Gracias, N. and J. Santos-Victor (2000). Underwater video mosaics as visual navigation maps. Computer Vision and Image Understanding 79(1), 66–91.
Gracias, N. and J. Santos-Victor (2001). Underwater mosaicing and trajectory reconstruction using global alignment. In: Proc. of the Oceans 2001 Conference. Honolulu, Hawaii, U.S.A. pp. 2557–2563.
Gracias, N., S. Zwaan, A. Bernardino and J. Santos-Victor (2003). Mosaic based Navigation for Autonomous Underwater Vehicles. Journal of Oceanic Engineering.
McLauchlan, P. and A. Jaenicke (2000). Image mosaicing using sequential bundle adjustment. In: Proc. of the British Machine Vision Conference BMVC2000. Bristol, U.K.
Negahdaripour, S. and P. Firoozfam (2001). Positioning and image mosaicing of long image sequences; Comparison of selected methods. In: Proc. of the IEEE Oceans 2001 Conference. Honolulu, Hawaii, USA.
Plakas, C. and E. Trucco (2000). Developing a real-time robust video tracker. In: Proc. of the IEEE Oceans 2002 Conference. Providence, Rhode Island, USA.
Pollock, D. (1999). A Handbook of Time-Series Analysis, Signal Processing and Dynamics. Academic Press.
Press, W., S. Teukolsky, W. Vetterling and B. Flannery (1988). Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.
Sawhney, H., S. Hsu and R. Kumar (1998). Robust video mosaicing through topology inference and local to global alignment. In: Proc. European Conf. on Computer Vision. Freiburg, Germany.
Unnikrishnan, R. and A. Kelly (2002). A constrained optimization approach to globally consistent mapping. In: Proc. Int. Conf. on Intell. Robots and Systems. Lausanne, Switzerland.