
Parametric Image Alignment Using Enhanced

Correlation Coefficient Maximization

Georgios D. Evangelidis and

Emmanouil Z. Psarakis

Abstract—In this work, we propose the use of a modified version of the

correlation coefficient as a performance criterion for the image alignment problem.

The proposed modification has the desirable characteristic of being invariant with

respect to photometric distortions. Since the resulting similarity measure is a

nonlinear function of the warp parameters, we develop two iterative schemes for

its maximization, one based on the forward additive approach and the second on

the inverse compositional method. As is customary in iterative optimization, in

each iteration, the nonlinear objective function is approximated by an alternative

expression for which the corresponding optimization is simple. In our case, we

propose an efficient approximation that leads to a closed-form solution (per

iteration) which is of low computational complexity, the latter property being

particularly strong in our inverse version. The proposed schemes are tested

against the Forward Additive Lucas-Kanade and the Simultaneous Inverse

Compositional (SIC) algorithm through simulations. Under noisy conditions and

photometric distortions, our forward version achieves more accurate alignments

and exhibits faster convergence, whereas our inverse version performs similarly

to the SIC algorithm but at a lower computational complexity.

Index Terms—Image registration, motion estimation, gradient methods,

parametric motion, correlation coefficient.


1 INTRODUCTION

The parametric image alignment problem consists of finding a

transformation which aligns two image profiles. The profiles can

either be entire images, as in the image registration problem [1],

[2], or subimages, as in the region tracking [3], [4], [5], motion

estimation [6], [7], [8], [9], and stereo correspondence [10], [11]

problems. In image registration, the alignment problem needs to

be solved only once, whereas, in region tracking, a template image

has to be matched over a sequence of images. Finally, in motion

estimation and stereo correspondences, the goal is to find the

correspondence for all image points in a pair of images.

The alignment problem can be seen as a mapping between the

coordinate systems of two images; therefore, the first step toward

its solution is the suitable selection of a geometric transformation

that adequately models this mapping. Existing models are

basically parametric [12] and their exact form heavily depends

on the specific application and the strategy selected to solve the

alignment problem [3], [13]. The class of affine transformations

and, in particular, several special cases (such as pure translation) have

been the center of attention in many applications [1], [2], [3], [4],

[6], [10], [11], [13]. Alternative approaches rely on projective

transformations (homography) and, more generally, on nonlinear

transformations [5], [13], [14], [15].

Once the geometric parametric transformation has been defined,

the alignment problem reduces itself to a parameter estimation

problem. Therefore, the second step toward its solution consists of

coming up with an appropriate performance measure, that is, an

objective function. The latter, when optimized, will yield the

optimum parameter estimates. Most existing approaches adopt

measures that rely on $l_p$ norms of the error between either the whole

image profiles (pixel-based techniques) or a specific feature of the

image profiles (feature-based techniques) [12]. Clearly, the $l_2$ norm is

by far the most popular choice [1], [3], [6], [7], [9], [10], [13],

[15], [16]. The $l_2$-based objective function is usually referred to as the

Sum-Squared-Differences (SSD) measure and the corresponding

optimization problem is known as the SSD technique [5], [9].

Variations on this approach have been proposed for the important

problem of optical flow determination [5], [7], [17], and robust

versions that can combat outliers were developed in [18].

For the optimum parameter estimation, all existing objective

functions require nonlinear optimization techniques. Depending

on the adopted solution strategy, the corresponding techniques

can be broadly classified into two categories. The first includes

gradient-based or differential approaches and the second includes

direct search techniques [12]. Gradient-based schemes, because of

their low computational cost, are regarded as better suited to

computer vision applications [13], [19]. They are, however, characterized by

noticeable convergence failure whenever homogeneous areas

and/or single slanted edges (aperture problem [20]) are present.

Meaningless estimates may also arise whenever we have strong

displacement values. Direct search techniques, on the other hand,

do not suffer the latter drawback. Indeed, these approaches can

easily accommodate large motions since they rely on global image

searches. Unfortunately, the latter require an exceedingly high

computational cost, which grows further with the

fine quantization needed for accurate estimates [6].

Efforts to reduce complexity by adopting interpolation instead of

fine quantization or hybrid techniques that combine the two

classes can be found in [9], [15].

A common assumption encountered in most existing techniques

is the brightness constancy of corresponding points or regions in the

two profiles [20]. However, this assumption is valid only in specific

cases and it is obviously violated under varying illumination

conditions. Therefore, it becomes clear that, in a practical situation, it is

important that the alignment algorithm be able to take into account

illumination changes. Alignment techniques that compensate for

photometric distortions in contrast and brightness have been

proposed in [1], [6], [8], [10], [16]. Alternative schemes make use

of a set of basis images for handling arbitrary lighting conditions [3],

[21] or use spatially dependent photometric models [7].

In this paper, we adopt a recently proposed similarity measure

[11], the enhanced correlation coefficient, as our objective function for

the alignment problem. Our measure is characterized by two very

desirable properties. First, it is invariant to photometric distortions

in contrast and brightness. Second, although it is a nonlinear

function of the parameters, the iterative scheme we are going to

develop for the optimization problem will turn out to be linear,

thus requiring reduced computational complexity. Despite the

resemblance of our final algorithm to well-known variants of the

Lucas-Kanade alignment method which take lighting changes into

account [10], [19], its performance, as we are going to see, is

notably superior. We would like to mention that the enhanced

correlation coefficient criterion was successfully applied to the

problem of 1D translation estimation in stereo correspondence [11]

and 2D translation estimation in registration [2].

The remainder of this paper is organized as follows: In

Section 2, we formulate the parametric image alignment problem.

Section 3 contains our main analytic results, namely, the definition

of our objective function, the development of a forward and an

inverse compositional iterative scheme for its optimization, and

the relation of the proposed schemes to existing SSD techniques. In

Section 4, our schemes are tested in a number of experiments against the currently most popular algorithms, namely, the Lucas-Kanade and Simultaneous Inverse Compositional (SIC) methods. Finally, Section 5 contains our conclusions.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 10, OCTOBER 2008

The authors are with the Signal Processing and Communications Lab, Department of Computer Engineering and Informatics, University of Patras, 26504 Rio-Patras, Greece. E-mail: {evagelid, psarakis}@ceid.upatras.gr.

Manuscript received 17 Jan. 2007; revised 12 Feb. 2008; accepted 7 Apr. 2008; published online 2 May 2008.

Recommended for acceptance by F. Dellaert.

For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-0026-0107.

Digital Object Identifier no. 10.1109/TPAMI.2008.113.

0162-8828/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.

2 PROBLEM FORMULATION

Suppose we are given a pair of image profiles (intensities) $I_r(\mathbf{x})$, $I_w(\mathbf{y})$, where the first is the reference or template image and the second is the warped image, and where $\mathbf{x} = [x_1, x_2]^t$, $\mathbf{y} = [y_1, y_2]^t$ denote coordinates. Suppose also that we are given a set of coordinates $\mathcal{T} = \{\mathbf{x}_k,\ k = 1,\ldots,K\}$ in the reference image, which is called the target area. The alignment problem consists of finding the corresponding coordinate set in the warped image. Of course, we are not interested in arbitrary correspondences but, rather, in those that are structured and can be modeled with a well-defined vector mapping $\mathbf{y} = \boldsymbol{\phi}(\mathbf{x};\mathbf{p})$, where $\mathbf{p} = [p_1, \ldots, p_N]^t$ is a vector of unknown parameters. Such correspondence problems often arise in practice, with the most common case being motion estimation in a sequence of images. In this application, due to the relative motion between scene and camera, whole (target) areas appear differently in time.

Assuming that a transformation model is given (and under the validity of the brightness constancy assumption), the alignment problem is simply reduced to the problem of estimating the parameters $\mathbf{p}$ such that

$$I_r(\mathbf{x}) = I_w(\boldsymbol{\phi}(\mathbf{x};\mathbf{p})), \quad \forall\, \mathbf{x} \in \mathcal{T}. \tag{1}$$

In order to have a chance of obtaining a unique solution, it is necessary that the number $N$ of unknown parameters does not exceed the number $K$ of target coordinates. Of course, in practice, we usually have $N \ll K$, which suggests that (1) is an overdetermined system of (nonlinear) equations.

Most existing algorithms attempt to compute the parameter vector $\mathbf{p}$ by minimizing the difference or the dissimilarity of the two profiles. Dissimilarity is expressed through an objective function $E(\mathbf{p})$ which involves the $l_p$ norm of the intensity difference of the two images. Since, in real applications, due to different viewing directions and/or different illumination conditions, the brightness constancy assumption is violated, it is necessary to include an additional photometric transformation $\Psi(I;\boldsymbol{\alpha})$ that accounts for the photometric changes and which is parameterized by a vector of unknown parameters $\boldsymbol{\alpha}$. A typical optimization problem has the following form:

$$\min_{\mathbf{p},\boldsymbol{\alpha}} E(\mathbf{p},\boldsymbol{\alpha}) = \min_{\mathbf{p},\boldsymbol{\alpha}} \sum_{\mathbf{x} \in \mathcal{T}} \left| I_r(\mathbf{x}) - \Psi\big(I_w(\boldsymbol{\phi}(\mathbf{x};\mathbf{p})), \boldsymbol{\alpha}\big) \right|^p. \tag{2}$$

We must mention that optimization problems of the form of (2) are often ill-posed and it is usually necessary to impose extra regularity (smoothness) conditions in order to obtain an acceptable solution [17].

Solving the optimization problem is clearly not a simple task because of the nonlinearity involved in the correspondence part. The computational complexity and estimation quality of the existing schemes depend on the specific $l_p$ norm and the models used for warping and photometric distortion. As far as the norm power $p$ is concerned, most methods use $p = 2$ (euclidean norm). This will also be the case in our approach, which we detail in the next section.

3 PROPOSED CRITERION AND MAIN RESULTS

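Under the warping transformation $\boldsymbol{\phi}(\mathbf{x};\mathbf{p})$, the coordinates $\mathbf{x}_k$, $k = 1,\ldots,K$, of the target area $\mathcal{T}$ are mapped into the coordinates $\mathbf{y}_k(\mathbf{p}) = \boldsymbol{\phi}(\mathbf{x}_k;\mathbf{p})$, $k = 1,\ldots,K$. Let us define the reference vector $\mathbf{i}_r$ and the corresponding warped vector $\mathbf{i}_w(\mathbf{p})$ as

$$\begin{aligned} \mathbf{i}_r &= [I_r(\mathbf{x}_1)\ I_r(\mathbf{x}_2) \cdots I_r(\mathbf{x}_K)]^t, \\ \mathbf{i}_w(\mathbf{p}) &= [I_w(\mathbf{y}_1(\mathbf{p}))\ I_w(\mathbf{y}_2(\mathbf{p})) \cdots I_w(\mathbf{y}_K(\mathbf{p}))]^t, \end{aligned} \tag{3}$$

and denote with $\bar{\mathbf{i}}_r$ and $\bar{\mathbf{i}}_w(\mathbf{p})$ their zero-mean versions, which are obtained by subtracting from each vector its corresponding arithmetic mean. We then propose the following criterion to quantify the performance of the warping transformation with parameters $\mathbf{p}$:

$$E_{ECC}(\mathbf{p}) = \left\| \frac{\bar{\mathbf{i}}_r}{\|\bar{\mathbf{i}}_r\|} - \frac{\bar{\mathbf{i}}_w(\mathbf{p})}{\|\bar{\mathbf{i}}_w(\mathbf{p})\|} \right\|^2, \tag{4}$$

where $\|\cdot\|$ denotes the usual euclidean norm.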

It is apparent from (4) that our criterion is invariant to bias and

gain changes. This also suggests that our measure is going to be

invariant to any photometric distortions in brightness and/or in

contrast. Consequently, to a first approximation, we can completely

disregard the photometric transformation and concentrate solely on

the geometric. It is also interesting to mention that our measure

exhibits statistical robustness against outliers, as is reported in [22].

All of these positive characteristics clearly support our expectation

that the proposed criterion will turn out to be a suitable objective

function for the parametric image alignment problem.
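The invariance to bias and gain claimed above can be checked numerically. The following is a minimal sketch, not the authors' code: the intensity values, the gain of 2.5, the bias of -40, and the helper names `zero_mean`, `norm`, and `e_ecc` are all assumptions made for illustration. Note that the invariance holds for positive gains only, since a negative gain flips the sign of the normalized vector.

```python
import math

def zero_mean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def e_ecc(i_r, i_w):
    """Criterion (4): squared distance between the normalized zero-mean vectors."""
    r, w = zero_mean(i_r), zero_mean(i_w)
    nr, nw = norm(r), norm(w)
    return sum((a / nr - b / nw) ** 2 for a, b in zip(r, w))

# Illustrative intensities (made up for this sketch).
i_r = [10.0, 12.0, 9.0, 15.0, 11.0]
i_w = [11.0, 12.5, 9.5, 14.0, 10.0]

# Photometric distortion of the warped profile:
# gain 2.5 (contrast) and bias -40 (brightness), both hypothetical.
i_w_distorted = [2.5 * a - 40.0 for a in i_w]

base = e_ecc(i_r, i_w)
distorted = e_ecc(i_r, i_w_distorted)
print(base, distorted)  # the two values agree up to rounding
```

Because both vectors are centered and scaled to unit norm before they are compared, no photometric parameters need to be estimated in this sketch.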

3.1 Performance Measure Optimization

Once the performance measure is specified, we then continue with its minimization in order to compute the optimum parameter values. It is straightforward to prove that minimizing $E_{ECC}(\mathbf{p})$ is equivalent to maximizing the following enhanced correlation coefficient [11]:

$$\rho(\mathbf{p}) = \frac{\bar{\mathbf{i}}_r^t\, \bar{\mathbf{i}}_w(\mathbf{p})}{\|\bar{\mathbf{i}}_r\|\, \|\bar{\mathbf{i}}_w(\mathbf{p})\|} = \hat{\mathbf{i}}_r^t\, \frac{\bar{\mathbf{i}}_w(\mathbf{p})}{\|\bar{\mathbf{i}}_w(\mathbf{p})\|}, \tag{5}$$

where, for simplicity, we denote with $\hat{\mathbf{i}}_r = \bar{\mathbf{i}}_r / \|\bar{\mathbf{i}}_r\|$ the normalized version of the zero-mean reference vector, which is constant. Notice that, even if $\bar{\mathbf{i}}_w(\mathbf{p})$ depends linearly on the parameter vector $\mathbf{p}$, the resulting objective function is still nonlinear with respect to $\mathbf{p}$ due to the normalization of the warped vector. This, of course, suggests that its maximization requires nonlinear optimization techniques.
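The equivalence between minimizing the criterion and maximizing the correlation coefficient follows from expanding the squared norm of the difference of two unit vectors. In the short derivation below, $\mathbf{a}$ and $\mathbf{b}$ are shorthand introduced here (not in the original text) for the normalized zero-mean reference and warped vectors:

$$\mathbf{a} = \frac{\bar{\mathbf{i}}_r}{\|\bar{\mathbf{i}}_r\|}, \quad \mathbf{b} = \frac{\bar{\mathbf{i}}_w(\mathbf{p})}{\|\bar{\mathbf{i}}_w(\mathbf{p})\|}, \quad E_{ECC}(\mathbf{p}) = \|\mathbf{a} - \mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\,\mathbf{a}^t\mathbf{b} = 2 - 2\rho(\mathbf{p}).$$

Since $E_{ECC}(\mathbf{p})$ is a decreasing affine function of $\rho(\mathbf{p})$, the two optimization problems share the same optimizers.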

As was mentioned in Section 1, maximizing $\rho(\mathbf{p})$ can be

performed either by using direct search or by gradient-based

approaches. Here, we are going to use the latter. As is customary

in iterative techniques, we are going to replace the original

optimization problem with a sequence of secondary optimizations.

Each secondary optimization relies on the outcome of its

predecessor, thus generating a chain of parameter estimates which

hopefully converges to the desired optimizing vector. At each

iteration, we do not have to optimize the objective function but an

approximation to this function. Of course, the approximation must

be selected so that the resulting optimizers are simple to compute.

Next, let us introduce the approximation we are going to apply for

our objective function and derive the solution that maximizes it.

Assume that $\mathbf{p}$ is “close” to some nominal parameter vector $\tilde{\mathbf{p}}$ and write $\mathbf{p} = \tilde{\mathbf{p}} + \Delta\mathbf{p}$, where $\Delta\mathbf{p}$ denotes a vector of perturbations. Let $\tilde{\mathbf{y}} = \boldsymbol{\phi}(\mathbf{x};\tilde{\mathbf{p}})$ be the warped coordinates under the nominal parameter vector and $\mathbf{y} = \boldsymbol{\phi}(\mathbf{x};\mathbf{p})$ under the perturbed ones. Considering the intensity of the warped image at coordinates $\mathbf{y}$ and applying a first-order Taylor expansion with respect to the parameters, we can write

$$I_w(\mathbf{y}) \approx I_w(\tilde{\mathbf{y}}) + [\nabla_{\mathbf{y}} I_w(\tilde{\mathbf{y}})]^t\, \frac{\partial \boldsymbol{\phi}(\mathbf{x};\tilde{\mathbf{p}})}{\partial \mathbf{p}}\, \Delta\mathbf{p}, \tag{6}$$

where $\nabla_{\mathbf{y}} I_w(\tilde{\mathbf{y}})$ denotes the gradient vector of length 2 of the intensity function $I_w(\mathbf{y})$ of the warped image, evaluated at the nominal warped coordinates $\tilde{\mathbf{y}}$. Since $\boldsymbol{\phi}(\mathbf{x};\mathbf{p})$ is a vector transformation of length 2 (in order to yield the warped coordinates), $\partial \boldsymbol{\phi}(\mathbf{x};\tilde{\mathbf{p}}) / \partial \mathbf{p}$ denotes the size $2 \times N$ Jacobian matrix of the transform with respect to the parameters, evaluated at the nominal parameter values. Note that we have silently assumed that the intensity function $I_w$ and the warping transformation $\boldsymbol{\phi}$ are of sufficient smoothness to allow for the existence of the required partial derivatives.

We can now apply (6) for all coordinates $\mathbf{x}_k$, $k = 1,\ldots,K$, of the target area $\mathcal{T}$. This will yield the following linearized version of the warped vector with parameters $\mathbf{p}$:

$$\mathbf{i}_w(\mathbf{p}) \approx \mathbf{i}_w(\tilde{\mathbf{p}}) + G(\tilde{\mathbf{p}})\,\Delta\mathbf{p}, \tag{7}$$

where $G(\tilde{\mathbf{p}})$ denotes the size $K \times N$ Jacobian matrix of the warped intensity vector with respect to the parameters, evaluated at the nominal parameter values $\tilde{\mathbf{p}}$. In order to specify exactly this matrix, let us assume that the warping transformation is of the form

$$\boldsymbol{\phi}(\mathbf{x};\mathbf{p}) = [\phi_1(\mathbf{x};\mathbf{p}),\ \phi_2(\mathbf{x};\mathbf{p})]^t, \tag{8}$$

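where $\phi_1$, $\phi_2$ are scalar functions. Then, the $(k,n)$ element of the matrix $G$ can be written as

$$G(\tilde{\mathbf{p}})_{k,n} = \sum_{i=1}^{2} \left( \left.\frac{\partial I_w(\mathbf{y})}{\partial y_i}\right|_{\mathbf{y} = \mathbf{y}_k(\tilde{\mathbf{p}})} \times \left.\frac{\partial \phi_i(\mathbf{x}_k;\mathbf{p})}{\partial p_n}\right|_{\mathbf{p} = \tilde{\mathbf{p}}} \right), \tag{9}$$

where $k = 1,\ldots,K$; $n = 1,\ldots,N$, and we recall that $\mathbf{y} = [y_1, y_2]^t$ are the coordinates in the warped image.

We now need to compute the zero-mean version of the warped vector. With the help of (7), we obtain the following approximation of the objective function $\rho(\mathbf{p})$ defined in (5):

$$\rho(\mathbf{p}) \approx \rho(\Delta\mathbf{p}|\tilde{\mathbf{p}}) = \hat{\mathbf{i}}_r^t\, \frac{\bar{\mathbf{i}}_w(\tilde{\mathbf{p}}) + \bar{G}(\tilde{\mathbf{p}})\Delta\mathbf{p}}{\|\bar{\mathbf{i}}_w(\tilde{\mathbf{p}}) + \bar{G}(\tilde{\mathbf{p}})\Delta\mathbf{p}\|}, \tag{10}$$

where $\bar{G}(\tilde{\mathbf{p}})$ and $\bar{\mathbf{i}}_w(\tilde{\mathbf{p}})$ are the column-zero-mean versions of $G(\tilde{\mathbf{p}})$ and $\mathbf{i}_w(\tilde{\mathbf{p}})$, respectively.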

From now on, let us, for notational simplicity, drop the dependence of the warped vectors on $\mathbf{p}$; we can then write our previous approximation as follows:

$$\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}}) = \frac{\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w + \hat{\mathbf{i}}_r^t\bar{G}\Delta\mathbf{p}}{\sqrt{\|\bar{\mathbf{i}}_w\|^2 + 2\,\bar{\mathbf{i}}_w^t\bar{G}\Delta\mathbf{p} + \Delta\mathbf{p}^t\bar{G}^t\bar{G}\Delta\mathbf{p}}}. \tag{11}$$

Although $\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}})$ is nonlinear in $\Delta\mathbf{p}$, its maximization is simple and results in a closed-form expression. This is a consequence of the next theorem, which provides the necessary result.

Theorem 1. Consider the scalar function

$$f(\mathbf{x}) = \frac{u + \mathbf{u}^t\mathbf{x}}{\sqrt{v + 2\mathbf{v}^t\mathbf{x} + \mathbf{x}^tQ\mathbf{x}}}, \tag{12}$$

where $u$, $v$ are scalars; $\mathbf{u}$, $\mathbf{v}$ are vectors of length $N$; $Q$ is a square, symmetric, and positive definite matrix of size $N$; and $v$, $\mathbf{v}$, $Q$ are such that

$$v > \mathbf{v}^tQ^{-1}\mathbf{v}; \tag{13}$$

then, as far as the maximal value of $f(\mathbf{x})$ is concerned, we distinguish the following two cases:

Case $u > \mathbf{u}^tQ^{-1}\mathbf{v}$: Here, we have a maximum, specifically

$$\max_{\mathbf{x}} f(\mathbf{x}) = \sqrt{\frac{(u - \mathbf{u}^tQ^{-1}\mathbf{v})^2}{v - \mathbf{v}^tQ^{-1}\mathbf{v}} + \mathbf{u}^tQ^{-1}\mathbf{u}}, \tag{14}$$

which is attainable for

$$\mathbf{x} = Q^{-1}\left\{ \frac{v - \mathbf{v}^tQ^{-1}\mathbf{v}}{u - \mathbf{u}^tQ^{-1}\mathbf{v}}\,\mathbf{u} - \mathbf{v} \right\}. \tag{15}$$

Case $u \le \mathbf{u}^tQ^{-1}\mathbf{v}$: Here, we have a supremum which is equal to

$$\sup_{\mathbf{x}} f(\mathbf{x}) = \sqrt{\mathbf{u}^tQ^{-1}\mathbf{u}} \tag{16}$$

and can be approached arbitrarily close by selecting

$$\mathbf{x} = Q^{-1}\{\lambda\mathbf{u} - \mathbf{v}\}, \tag{17}$$

with $\lambda$ a positive scalar of sufficiently large value.¹

Proof. The proof makes repeated use of the Schwarz inequality. All details are presented in the Appendix. □
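The first case of Theorem 1 can be sanity-checked numerically for $N = 2$. The sketch below is only an illustration under stated assumptions: the values of `u`, `uv`, `v`, `vv`, and `Q` are hypothetical and were picked so that (13) and the condition of the first case hold, and the helper names are not from the paper. It evaluates $f$ at the point given by (15), compares the result with the closed-form maximum (14), and checks that random points do not exceed it.

```python
import math
import random

def inv2(Q):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = Q
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(M, x):
    return (M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def f(x, u, uv, v, vv, Q):
    """The function of (12): (u + u^t x) / sqrt(v + 2 v^t x + x^t Q x)."""
    return (u + dot(uv, x)) / math.sqrt(v + 2 * dot(vv, x) + dot(x, matvec(Q, x)))

# Hypothetical data: scalars u, v; vectors uv, vv; positive definite Q.
u, uv = 1.0, (0.5, -0.3)
v, vv = 4.0, (0.2, 0.1)
Q = ((2.0, 0.5), (0.5, 1.0))

Qi = inv2(Q)
a = u - dot(uv, matvec(Qi, vv))   # u - u^t Q^{-1} v  (positive: first case)
b = dot(uv, matvec(Qi, uv))       # u^t Q^{-1} u
c = v - dot(vv, matvec(Qi, vv))   # v - v^t Q^{-1} v  (positive: condition (13))

f_max = math.sqrt(a * a / c + b)  # closed-form maximum (14)
lam = c / a
x_star = matvec(Qi, (lam * uv[0] - vv[0], lam * uv[1] - vv[1]))  # maximizer (15)
f_star = f(x_star, u, uv, v, vv, Q)

# Random probing with a fixed seed: no sample should exceed the maximum.
rng = random.Random(0)
best_random = max(
    f((rng.uniform(-20, 20), rng.uniform(-20, 20)), u, uv, v, vv, Q)
    for _ in range(5000)
)
print(f_star, f_max, best_random)
```

Along the family $\mathbf{x} = Q^{-1}\{\lambda\mathbf{u} - \mathbf{v}\}$ the function reduces to $(a + \lambda b)/\sqrt{c + \lambda^2 b}$ with the three scalars computed above, which is why the maximizer corresponds to $\lambda = c/a$.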

Let us now examine whether we can apply Theorem 1 for the maximization of $\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}})$ defined in (11). For this, we need to verify the validity of (13). For the problem of interest, this translates into the following inequality: $\|\bar{\mathbf{i}}_w\|^2 > \bar{\mathbf{i}}_w^tP_G\bar{\mathbf{i}}_w$, where $P_G = \bar{G}(\bar{G}^t\bar{G})^{-1}\bar{G}^t$. This relation is trivially satisfied because $P_G$ is an orthogonal projection operator (i.e., $P_G^2 = P_G$ and $P_G^t = P_G$) and, therefore, we can write

$$\|\bar{\mathbf{i}}_w\|^2 = \|P_G\bar{\mathbf{i}}_w\|^2 + \|[I - P_G]\bar{\mathbf{i}}_w\|^2 \ge \|P_G\bar{\mathbf{i}}_w\|^2 = \bar{\mathbf{i}}_w^tP_G\bar{\mathbf{i}}_w, \tag{18}$$

where $I$ denotes the identity matrix. We have equality if and only if $[I - P_G]\bar{\mathbf{i}}_w = 0$, which is true whenever $\bar{\mathbf{i}}_w$ is a linear combination of the columns of $\bar{G}$. Clearly, the probability of this happening is zero, especially under the presence of noise. Consequently, the desired inequality, for all practical purposes, is strict.

Since we can apply Theorem 1, according to (15), the optimizing perturbation is equal to

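$$\Delta\mathbf{p} = (\bar{G}^t\bar{G})^{-1}\bar{G}^t \left\{ \frac{\|\bar{\mathbf{i}}_w\|^2 - \bar{\mathbf{i}}_w^tP_G\bar{\mathbf{i}}_w}{\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w - \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w}\,\hat{\mathbf{i}}_r - \bar{\mathbf{i}}_w \right\}, \tag{19}$$

when $\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w > \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w$; or, according to (17),

$$\Delta\mathbf{p} = (\bar{G}^t\bar{G})^{-1}\bar{G}^t \{\lambda\hat{\mathbf{i}}_r - \bar{\mathbf{i}}_w\}, \tag{20}$$

when $\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w \le \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w$, where $\lambda$ must be selected so that the resulting $\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}})$ satisfies $\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}}) > \rho(0|\tilde{\mathbf{p}})$. In other words, we would like to select a perturbation that will increase the correlation and will make it nonnegative. The following lemma provides possible values for $\lambda$.

Lemma 1. Let $\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w \le \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w$ and define the following two values for $\lambda$:

$$\lambda_1 = \sqrt{\frac{\bar{\mathbf{i}}_w^tP_G\bar{\mathbf{i}}_w}{\hat{\mathbf{i}}_r^tP_G\hat{\mathbf{i}}_r}}, \quad \lambda_2 = \frac{\hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w - \hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w}{\hat{\mathbf{i}}_r^tP_G\hat{\mathbf{i}}_r}. \tag{21}$$

Then, for $\lambda \ge \lambda_1$, we have that $\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}}) > \rho(0|\tilde{\mathbf{p}})$; for $\lambda \ge \lambda_2$, that $\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}}) \ge 0$; finally, for $\lambda \ge \max\{\lambda_1, \lambda_2\}$, we have both inequalities valid.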

Proof. By substituting the value of $\Delta\mathbf{p}$ from (20) in (11), the objective function becomes the following function of $\lambda$:

$$f(\lambda) = \frac{\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w - \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w + \lambda\,\hat{\mathbf{i}}_r^tP_G\hat{\mathbf{i}}_r}{\sqrt{\|\bar{\mathbf{i}}_w\|^2 - \bar{\mathbf{i}}_w^tP_G\bar{\mathbf{i}}_w + \lambda^2\,\hat{\mathbf{i}}_r^tP_G\hat{\mathbf{i}}_r}}. \tag{22}$$

It is easy to verify that the derivative of $f(\lambda)$ is nonnegative; therefore, $f(\lambda)$ is increasing in $\lambda$. This suggests that, for $\lambda \ge \lambda_2$, we have $f(\lambda) \ge 0$. Notice now that, for $\lambda = \lambda_1$, we can write

$$f(\lambda_1) = \frac{\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w - \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w + \sqrt{(\bar{\mathbf{i}}_w^tP_G\bar{\mathbf{i}}_w)(\hat{\mathbf{i}}_r^tP_G\hat{\mathbf{i}}_r)}}{\|\bar{\mathbf{i}}_w\|} \ge \rho(0|\tilde{\mathbf{p}}), \tag{23}$$

with the last inequality being a consequence of applying the Schwarz inequality on $\hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w$ and recalling that $P_G$ is an orthogonal projection operator. □

Remarks. One should expect, as $\bar{\mathbf{i}}_w$ approaches $\bar{\mathbf{i}}_r$, to use mostly (19) since, for $\bar{\mathbf{i}}_w \approx \bar{\mathbf{i}}_r$, we have $\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w \approx \hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_r > \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_r \approx \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w$. It is interesting, however, to note that, if one insists on using (19) at all times, then, whenever $\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w \le \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w$ holds, we end up with a negative correlation $\rho(\Delta\mathbf{p}|\tilde{\mathbf{p}})$ (this being true even if $\rho(0|\tilde{\mathbf{p}}) > 0$) which is always smaller than $\rho(0|\tilde{\mathbf{p}})$. In other words, instead of increasing the correlation coefficient (as is the desired goal), in this case, we decrease it. This clearly suggests that it is preferable to use (20) with a value of $\lambda$, as indicated in Lemma 1, (21).

1. More precisely, we mean that, for every $\epsilon > 0$, there exists a sufficiently large scalar $\lambda_\epsilon$ such that the resulting $f(\mathbf{x})$ is $\epsilon$ close to the upper bound.

3.2 Forward Additive ECC Iterative Algorithm

Let us now translate the above results into an iterative scheme in order to obtain the solution to the original nonlinear optimization problem. Assuming that estimate $\mathbf{p}_{j-1}$ of the parameter vector is available from iteration $j-1$, we can compute $\bar{\mathbf{i}}_w(\mathbf{p}_{j-1})$ and $\bar{G}(\mathbf{p}_{j-1})$; then, we can approximate $\rho(\mathbf{p})$ following (10) with the help of $\rho(\Delta\mathbf{p}_j|\mathbf{p}_{j-1})$ and optimize this approximation with respect to $\Delta\mathbf{p}_j$. This will lead to the parameter update rule $\mathbf{p}_j = \mathbf{p}_{j-1} + \Delta\mathbf{p}_j$. As is indicated in Step S4, we stop iterating whenever the norm of the updating vector $\Delta\mathbf{p}_j$ becomes smaller than some predefined threshold value $T$. The iteration steps are summarized in Table 1 and we call the corresponding algorithm the Forward Additive ECC (FA-ECC).

Given the number $K$ of pixels in the target area $\mathcal{T}$ and the parameter vector estimate $\mathbf{p}_{j-1}$ of length $N$, the complexity per iteration of the proposed scheme can be easily estimated. From Table 1 and taking into account that, usually, $K \gg N$, we realize that the most computationally demanding part is Step S3, which involves the computation of $\Delta\mathbf{p}_j$ with the help of (19) or (20). As we can see, in this step, we need to form the matrix $\bar{G}^t\bar{G}$, which requires $O(KN^2)$ operations. This is the leading complexity in our algorithm since all other steps require at most $O(KN)$ per iteration.
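The iteration described above can be sketched for a toy one-parameter problem. This is a hedged illustration rather than the authors' implementation: it assumes a 1D analytic signal in place of an image (so no interpolation is needed), a pure translation warp $\phi(x;p) = x + p$ with $N = 1$, and hypothetical names (`I_w`, `dI_w`, `fa_ecc_1d`, `tol`). With a single parameter, $\bar{G}$ is a column vector `g` and $(\bar{G}^t\bar{G})^{-1}$ is the scalar `1 / gg`.

```python
import math

# Toy 1D "warped image" and its derivative (both assumed for illustration).
def I_w(y):
    return math.sin(0.3 * y) + 0.5 * math.cos(0.11 * y)

def dI_w(y):
    return 0.3 * math.cos(0.3 * y) - 0.055 * math.sin(0.11 * y)

def zero_mean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fa_ecc_1d(i_r, xs, p0=0.0, tol=1e-10, max_iter=50):
    """Forward additive ECC for the translation warp phi(x; p) = x + p."""
    r = zero_mean(i_r)
    nr = math.sqrt(dot(r, r))
    r_hat = [a / nr for a in r]                   # normalized zero-mean reference
    p = p0
    for _ in range(max_iter):
        w = zero_mean([I_w(x + p) for x in xs])   # zero-mean warped vector
        g = zero_mean([dI_w(x + p) for x in xs])  # zero-mean Jacobian (K x 1)
        gg = dot(g, g)
        rw = dot(r_hat, w)
        rPw = dot(r_hat, g) * dot(g, w) / gg      # r_hat^t P_G w
        wPw = dot(g, w) ** 2 / gg                 # w^t P_G w
        rPr = dot(r_hat, g) ** 2 / gg             # r_hat^t P_G r_hat
        if rw > rPw:
            lam = (dot(w, w) - wPw) / (rw - rPw)  # scalar of update (19)
        else:
            # update (20) with lambda chosen as in Lemma 1, (21)
            lam = max(math.sqrt(wPw / rPr), (rPw - rw) / rPr)
        dp = (lam * dot(g, r_hat) - dot(g, w)) / gg
        p += dp
        if abs(dp) < tol:                         # stopping rule on the update norm
            break
    return p

# Reference profile: shifted copy of the signal with gain 2 and bias 10.
xs = list(range(50))
true_p = 1.5
i_r = [2.0 * I_w(x + true_p) + 10.0 for x in xs]
p_est = fa_ecc_1d(i_r, xs)
print(p_est)
```

The gain and bias applied to the reference are never estimated, which reflects the photometric invariance of the criterion. In a real 2D setting `I_w` and `dI_w` would be replaced by interpolated image intensities and gradients, and `gg` by the $N \times N$ matrix $\bar{G}^t\bar{G}$.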

3.3 Inverse Compositional ECC Iterative Algorithm

When the alignment problem is restricted to specific classes of parametric models, it is possible to devise more computationally efficient versions since certain parts of the algorithm can be computed offline [3], [13], [15]. If, for example, we adopt the methodology proposed in [19], we can come up with the Inverse Compositional ECC (IC-ECC) version of our algorithm which has the significantly reduced complexity $O(KN)$ per iteration. We briefly mention that the methodology found in [3], [13] relies on interchanging the role of $\mathbf{i}_w$ and $\mathbf{i}_r$. Consequently, matrix $G$ becomes the Jacobian matrix of the reference intensity vector and since the warping function for this vector is the identity, matrix $G$ is constant and $\bar{G}^t\bar{G}$ can be computed offline. The latter is the reason behind the one order of magnitude reduction in computational complexity. The outline of our alternative algorithmic version IC-ECC can be easily obtained from Table 1 by appropriately modifying our FA-ECC version.

Regarding inverse algorithms (additive and compositional) as

well as the forward compositional algorithm [15], we should point

out that they can be applied only to specific classes of warps. It is

also known that inverse algorithms are more susceptible to noisy

conditions than their forward counterparts [13]. These important

weaknesses limit the usage of such algorithms in practice.

3.4 Relation to Existing SSD-Based Measures

In this section, we are going to derive our performance measure in

a different way. This will also help us in relating it to the two

currently most popular SSD approaches in the literature. For our

analysis, we are going to assume that photometric distortion is

limited only to global brightness and contrast changes. Under this

simple type of photometric changes, we can define the following

performance measure for our parametric alignment problem:

$$E(\mathbf{p},\boldsymbol{\alpha}) = \|\alpha_1\mathbf{i}_w(\mathbf{p}) + \alpha_2 - \mathbf{i}_r\|^2, \tag{24}$$

where $\boldsymbol{\alpha} = [\alpha_1\ \alpha_2]^t$ is the parameter vector for the photometric transformation. Our goal, of course, is to minimize the objective function with respect to all parameters. Regarding the first photometric parameter, we must point out that negative values of $\alpha_1$ produce the inversion effect, where colors are reversed. Consequently, if there exists the a priori knowledge that such a color inversion cannot take place, then it is logical to limit $\alpha_1$ only to positive values. Now, if we first minimize the objective function with respect to $\alpha_1$, $\alpha_2$, we obtain the following interesting result:

$$E(\mathbf{p}) = \min_{\alpha_1 \ge 0,\ \alpha_2} E(\mathbf{p},\boldsymbol{\alpha}) = \|\bar{\mathbf{i}}_r\|^2 \left( 1 - [\max\{\rho(\mathbf{p}), 0\}]^2 \right), \tag{25}$$

where $\rho(\mathbf{p})$ is the correlation function defined in (5). Notice that, since the reference image is constant, so is the norm $\|\bar{\mathbf{i}}_r\|^2$ contained in the previous relation; therefore, further minimization with respect to $\mathbf{p}$ is equivalent to minimizing the term $(1 - [\max\{\rho(\mathbf{p}), 0\}]^2)$. But, this expression is decreasing in $\rho(\mathbf{p})$; consequently, we can equivalently maximize the correlation function $\rho(\mathbf{p})$, thus recovering our criterion. The final optimization problem makes a lot of sense. Indeed, notice that, since $\rho(\mathbf{p})$ is free of photometric distortions (the simple type we consider here) and under the knowledge that there is no color inversion, it is quite plausible to look for the most positive correlation.

TABLE 1: Outline of the Proposed Forward Additive ECC (FA-ECC) Refinement Algorithm

If we drop the constraint $\alpha_1 \ge 0$, then the minimization of the objective function in (25) is the optimization problem proposed by Fuh and Maragos [6]. Optimizing first with respect to $\alpha_1$, $\alpha_2$ yields

$$E_{FM}(\mathbf{p}) = \min_{\alpha_1,\ \alpha_2} E(\mathbf{p},\boldsymbol{\alpha}) = \|\bar{\mathbf{i}}_r\|^2 \left( 1 - \rho^2(\mathbf{p}) \right). \tag{26}$$

Notice that the resulting measure is now a decreasing function of $|\rho(\mathbf{p})|$; therefore, any further minimization with respect to $\mathbf{p}$ is equivalent to maximizing the absolute value $|\rho(\mathbf{p})|$ of the correlation function. It is clear that this optimization problem does not take into account the prior knowledge that there is no color inversion. In [6], maximization was achieved by adopting an exhaustive search approach in the N-D quantized parameter space. Clearly, in a noncolor-inversion situation, such a search will give rise to the correct maximum positive correlation (provided, of course, that the warped image does not contain parts that are the negative of the target area). However, as we mentioned in Section 1, exhaustive search approaches are characterized by high computational complexity, which becomes exceedingly demanding when we are interested in fine subpixel accuracy.
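The closed forms (25) and (26) can be verified numerically for a fixed pair of profiles. The sketch below is an illustration under assumptions: the intensity vectors are made up, and `min_ssd` uses the standard least-squares solution for a gain and a bias (gain from the zero-mean vectors, clipped at zero for the constrained case, then the bias that matches the means). The second profile is a sign-inverted copy, so that the constrained and unconstrained minima differ.

```python
import math

def mean(v):
    return sum(v) / len(v)

def zero_mean(v):
    m = mean(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rho(i_r, i_w):
    """Correlation coefficient (5) for fixed vectors."""
    r, w = zero_mean(i_r), zero_mean(i_w)
    return dot(r, w) / math.sqrt(dot(r, r) * dot(w, w))

def min_ssd(i_r, i_w, nonneg_gain):
    """Minimum over (alpha_1, alpha_2) of || alpha_1 * i_w + alpha_2 - i_r ||^2."""
    r, w = zero_mean(i_r), zero_mean(i_w)
    a1 = dot(r, w) / dot(w, w)           # unconstrained least-squares gain
    if nonneg_gain:
        a1 = max(a1, 0.0)                # constraint alpha_1 >= 0
    a2 = mean(i_r) - a1 * mean(i_w)      # optimal bias for this gain
    return sum((a1 * wk + a2 - rk) ** 2 for rk, wk in zip(i_r, i_w))

# Illustrative profiles (made up): one positively correlated, one inverted.
i_r = [10.0, 12.0, 9.0, 15.0, 11.0]
i_w = [3.0, 5.0, 2.0, 6.5, 4.5]
i_w_inv = [-a for a in i_w]

nr2 = dot(zero_mean(i_r), zero_mean(i_r))   # ||i_r zero-mean||^2
for w in (i_w, i_w_inv):
    c = rho(i_r, w)
    print(min_ssd(i_r, w, True), nr2 * (1 - max(c, 0.0) ** 2))   # two sides of (25)
    print(min_ssd(i_r, w, False), nr2 * (1 - c ** 2))            # two sides of (26)
```

For the inverted profile the unconstrained measure is as small as for the original one, which illustrates why dropping the constraint cannot distinguish a correct alignment from its color-inverted counterpart.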

Although not proposed in [6], alternatively, we could adopt an iterative approach similar to the one suggested for our measure. If, however, we attempt to maximize $|\rho(\mathbf{p})|$ using the same approximation as in (10), then one can show that the optimum perturbation $\Delta\mathbf{p}$ is always given by (19). As was indicated in our remarks (after Lemma 1), adopting this strategy may result in negative correlations corresponding to local minima for $\rho(\mathbf{p})$ instead of the desired maxima. In other words, there are more chances for the iterative algorithm to be locked in erroneous local extrema than is the case with our approach.

An alternative measure arises if, in (24), we interchange the roles of $\mathbf{i}_w$ and $\mathbf{i}_r$, that is,

$$E(\mathbf{p},\boldsymbol{\alpha}) = \|\alpha_1\mathbf{i}_r + \alpha_2 - \mathbf{i}_w(\mathbf{p})\|^2. \tag{27}$$

This is the approach adopted by Lucas and Kanade [10] and it is known to generate, along with its variants, the most widely used algorithms in practice. Following similar steps as in the previous two cases, let us first minimize with respect to the two photometric parameters. This yields

$$E_{LK}(\mathbf{p}) = \min_{\alpha_1,\ \alpha_2} E(\mathbf{p},\boldsymbol{\alpha}) = \|\bar{\mathbf{i}}_w(\mathbf{p})\|^2 \left( 1 - \rho^2(\mathbf{p}) \right). \tag{28}$$

We observe in the current outcome that the resulting criterion has two terms that depend on the parameters $\mathbf{p}$, namely, the familiar part $\{1 - \rho^2(\mathbf{p})\}$ and the magnitude of the warped image $\|\bar{\mathbf{i}}_w(\mathbf{p})\|^2$ (which is not constant). Therefore, minimizing $E_{LK}(\mathbf{p})$ with respect to the parameters involves the minimization of the combination of the two terms. The first observation is that this criterion will not necessarily produce the same solution as our measure. Second, due to the term $\|\bar{\mathbf{i}}_w(\mathbf{p})\|^2$, it is clear that an iterative algorithm can lock in solutions which result in $\|\bar{\mathbf{i}}_w(\mathbf{p})\|^2 \approx 0$ (for example, areas with uniform intensity). And, third, because of the term $\rho^2(\mathbf{p})$, the algorithm can lock in negative correlations.

Despite the previous observations, the Lucas-Kanade performance measure gives rise to the most popular iterative algorithms for the image alignment problem. For this reason, we are going to use it as a point of reference and compare it against our scheme. Consequently, let us present its Forward Additive LK (FA-LK) updating version in more detail. Substituting the linear approximation of $\bar{\mathbf{i}}_w(\mathbf{p})$ in (28), then minimizing with respect to $\Delta\mathbf{p}$, we obtain the following optimum updating perturbation:

$$\Delta\mathbf{p}_{LK} = (\bar{G}^t\bar{G})^{-1}\bar{G}^t \left\{ \frac{\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w - \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w}{1 - \hat{\mathbf{i}}_r^tP_G\hat{\mathbf{i}}_r}\,\hat{\mathbf{i}}_r - \bar{\mathbf{i}}_w \right\}, \tag{29}$$

which is applicable at all times. Comparing (19) with (29), we realize that the difference is only in the scalar quantity that precedes the vector $\hat{\mathbf{i}}_r$. As we are going to see, this seemingly slight variation, in combination with (20), will result in significant performance improvements.
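One way to arrive at (29), sketched here for convenience, is to minimize (27) jointly over the photometric parameters and the perturbation after substituting the linearization (7). This is equivalent to first minimizing over the photometric parameters and then over $\Delta\mathbf{p}$. The scalar $\beta = \alpha_1\|\bar{\mathbf{i}}_r\|$ is shorthand introduced for this sketch and does not appear in the original text. Eliminating $\alpha_2$ replaces every vector by its zero-mean version, so the problem becomes

$$\min_{\beta,\ \Delta\mathbf{p}} \left\| \beta\,\hat{\mathbf{i}}_r - \bar{\mathbf{i}}_w - \bar{G}\Delta\mathbf{p} \right\|^2.$$

For fixed $\beta$, the least-squares solution and its residual are

$$\Delta\mathbf{p}(\beta) = (\bar{G}^t\bar{G})^{-1}\bar{G}^t\{\beta\,\hat{\mathbf{i}}_r - \bar{\mathbf{i}}_w\}, \qquad [I - P_G]\,(\beta\,\hat{\mathbf{i}}_r - \bar{\mathbf{i}}_w),$$

and minimizing the squared norm of the residual over $\beta$, using $\hat{\mathbf{i}}_r^t\hat{\mathbf{i}}_r = 1$, gives

$$\beta = \frac{\hat{\mathbf{i}}_r^t[I - P_G]\bar{\mathbf{i}}_w}{\hat{\mathbf{i}}_r^t[I - P_G]\hat{\mathbf{i}}_r} = \frac{\hat{\mathbf{i}}_r^t\bar{\mathbf{i}}_w - \hat{\mathbf{i}}_r^tP_G\bar{\mathbf{i}}_w}{1 - \hat{\mathbf{i}}_r^tP_G\hat{\mathbf{i}}_r},$$

which is the scalar multiplying $\hat{\mathbf{i}}_r$ in (29).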

For the Lucas-Kanade approach, it is possible to define a special

SSD-based measure that can handle arbitrary linear appearance

variations. For its minimization, an iterative algorithm that makes

use of the inverse additive update rule was proposed in [3] by

Hager and Belhumeur. Based on the same SSD measure, Baker

et al. [19], by adopting the inverse compositional approach,

proposed several variants of the Hager-Belhumeur algorithm.

Among these alternative algorithmic schemes, the SIC algorithm is

reported to have the best performance [19]. Therefore, this

algorithm will also be tested in the next section.

4 SIMULATION RESULTS

In this section, we perform a number of simulations in order to evaluate our FA-ECC and IC-ECC algorithmic versions. As we mentioned above, we will also simulate the FA-LK algorithmic version that copes with photometric distortions and the SIC algorithm, which is considered to be the most effective inverse LK scheme. For all aspects affecting the simulation experiments, we made an effort to stay exactly within the framework specified in [13], [19]. To model the warping process, we are going to use the class of affine transformations. The 2D rigid body and similarity transformations are members of this class. Furthermore, the Jacobian of the affine model is a constant matrix, meaning that it can be computed offline. Before proceeding with the presentation of our simulation results, let us first briefly present the experimental setup and the figures of merit we are going to adopt.
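To illustrate why the affine Jacobian can be precomputed, the sketch below writes the warp as a linear function of its six parameters, so that its derivative with respect to the parameters depends only on the pixel coordinates. The parameter ordering `[a11, a21, a12, a22, t1, t2]` and the function names are assumptions chosen for this example; the paper's own ordering may differ.

```python
import numpy as np

def affine_warp(x, p):
    """Affine warp A x + t with p = [a11, a21, a12, a22, t1, t2] (assumed order)."""
    A = np.array([[p[0], p[2]],
                  [p[1], p[3]]])
    return A @ x + p[4:6]

def affine_jacobian(x):
    """Derivative of the warp w.r.t. p at point x; p itself does not appear."""
    return np.array([[x[0], 0.0, x[1], 0.0, 1.0, 0.0],
                     [0.0, x[0], 0.0, x[1], 0.0, 1.0]])

x = np.array([3.0, -2.0])   # an arbitrary pixel location
J = affine_jacobian(x)      # valid for every parameter vector p
```

Because `J` is the same for every `p`, it can be evaluated once per pixel before the iterations start.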

4.1 Experimental Setup and Figures of Merit

In order to create a reference and a warped image, we follow the procedure proposed in [13]. In brief, let $I(x)$ be a given image and $x_i$, $i = 1, 2, 3$, the coordinates of three points which define the boundaries of the desired target area. We perturb these points by adding Gaussian noise $N(0, \sigma_p^2)$ ($\sigma_p$ captures the strength of the geometric deformation), select a vector $x_0$ such that the points $x_0 + x_i$, $i = 1, 2, 3$, lie in the interior of the support of the given image, and define the parameter vector $p_r$ of the affine transformation that maps the original points to the translated noisy ones. We apply this transformation to all points of the target area to warp it. With the help of bilinear interpolation, we compute the new intensities. This process defines the reference profile $I_r(x)$. For the warped image, we use the given one.

All algorithms are initialized in the same way, namely, $p_0 = [1\ 0\ 0\ 1\ x_0^t]^t$. At iteration $j$, each algorithm provides the parameter estimates $p_j$. In order to measure the quality of this estimate, we use the following quantity:

$$e(j) = \frac{1}{6} \sum_{i=1}^{3} \left\| \phi(x_i; p_r) - \phi(x_i; p_j) \right\|^2, \qquad (30)$$

which quantifies the existing squared error between the exact warped version of the points $x_i$, $i = 1, 2, 3$, and their estimated counterparts.
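The point-based setup and the error measure (30) can be sketched as follows. This is an illustration under stated assumptions: the affine map is stored as a 2x3 matrix rather than as a parameter vector, and the corner coordinates, the offset `x0`, and all function names are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """2x3 affine matrix M with dst_i = M @ [src_i, 1] for three point pairs."""
    src_h = np.column_stack([src, np.ones(3)])  # (3, 3) homogeneous source points
    return np.linalg.solve(src_h, dst).T        # exact when the points are not collinear

def warp_points(M, pts):
    """Apply the affine map M to an array of 2D points of shape (n, 2)."""
    return pts @ M[:, :2].T + M[:, 2]

def alignment_error(M_ref, M_est, pts):
    """e(j) of (30): summed squared point distance divided by 6."""
    diff = warp_points(M_ref, pts) - warp_points(M_est, pts)
    return np.sum(diff ** 2) / 6.0

rng = np.random.default_rng(0)
pts = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])  # hypothetical target points
sigma_p = 6.0
x0 = np.array([50.0, 40.0])                               # hypothetical offset
noisy = pts + rng.normal(scale=sigma_p, size=pts.shape) + x0

M_ref = fit_affine(pts, noisy)                   # plays the role of p_r
M_init = np.hstack([np.eye(2), x0[:, None]])     # identity plus translation, as in p_0
e0 = alignment_error(M_ref, M_init, pts)         # error before any iteration
```

With three points and two coordinates each, the factor 1/6 makes `e0` the mean squared coordinate error.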

By averaging this error over many realizations that differ in the

point noise realization, we can compute the Mean Square Distance

(MSD) value. Obviously, by computing this value in each iteration

of an algorithm, we form a sequence that captures its learning

ability. Of course, it is unrealistic to expect that any of the


algorithms will converge at all times. This is particularly apparent for high values of $\sigma_p$. For this reason, in order to quantify the algorithmic performance in a meaningful way and have the right picture of this convergence characteristic, we adopt the idea followed in [13], namely, to define the MSD but conditioned on the event that all of the competing algorithms have converged. By “convergence,” we mean that $e(j_{max}) \le T_{MSD}$. In other words, we consider that an algorithm has converged when its squared error $e(j)$ at a prescribed maximal iteration $j_{max}$ is below a certain threshold level $T_{MSD}$.
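A minimal sketch of the conditioned MSD follows, assuming the per-run errors $e(j)$ are stored in an array indexed by algorithm, run, and iteration, and that decibels are obtained with $10\log_{10}$. The array layout, the function name, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def conditional_msd_db(errors, t_msd):
    """MSD curves in dB, averaged over runs where all algorithms converged.

    errors : (A, R, J + 1) array of e(j) for A algorithms, R runs,
             and iterations 0..J (the last index plays the role of j_max).
    """
    converged = errors[:, :, -1] <= t_msd   # (A, R) convergence flags
    all_converged = converged.all(axis=0)   # runs in which every algorithm converged
    msd = errors[:, all_converged, :].mean(axis=1)
    return 10.0 * np.log10(msd)

# Toy data: 2 algorithms, 3 runs, iterations 0 and 1.
errors = np.array([[[4.0, 1.0], [4.0, 0.1], [4.0, 9.0]],
                   [[4.0, 0.5], [4.0, 0.1], [4.0, 0.2]]])
curves = conditional_msd_db(errors, t_msd=1.0)
```

In the toy data the third run is excluded for both algorithms because the first algorithm fails on it, which is the point of the conditioning: every curve is averaged over the same set of runs.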

The second quantity which is of importance is clearly the percentage of converging (PoC) runs. Therefore, we define this quantity as being the percentage of runs in which an algorithm converges up to a predefined maximal iteration $j_{max}$. PoC will be depicted as a function of the point standard deviation $\sigma_p$, which is the most important factor that affects the performance of all algorithms.

Since it is only natural to prefer an algorithm that converges quickly with high probability, we propose a third figure of merit that captures exactly this aspect. Specifically, for characteristic values of $\sigma_p$ and thresholds $T_{MSD}$, we apply the algorithms for a maximal number of iterations $j_{MAX}$. Then, we compute the cumulative PoC achieved by each algorithm as $j_{max}$ increases from 0 to $j_{MAX}$. This third figure of merit is proposed here for the first time.
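The second and third figures of merit can be sketched in a few lines. This is one possible reading, in which the cumulative PoC at iteration $j$ is the percentage of runs whose error at that iteration is at most $T_{MSD}$; the function names and the toy error table are assumptions.

```python
import numpy as np

def poc(errors, t_msd, j_max):
    """Percentage of runs with e(j_max) <= T_MSD for one algorithm.

    errors : (R, J + 1) array of e(j) over R runs and iterations 0..J.
    """
    return 100.0 * np.mean(errors[:, j_max] <= t_msd)

def cumulative_poc(errors, t_msd):
    """PoC evaluated for every j_max from 0 up to J (the role of j_MAX)."""
    return 100.0 * np.mean(errors <= t_msd, axis=0)

# Toy data: 4 runs, iterations 0..3.
errors = np.array([[5.0, 2.0, 0.5, 0.1],
                   [5.0, 0.8, 0.3, 0.2],
                   [5.0, 4.0, 3.0, 2.0],
                   [5.0, 3.0, 0.9, 0.4]])
curve = cumulative_poc(errors, t_msd=1.0)
```

A curve that rises early and saturates high corresponds to an algorithm that converges both quickly and often.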

In all of the experiments, we use the “Takeo” image as the warped profile and generate a reference image as was previously described. We make 5,000 realizations of image pairs and we add independent and identically distributed, zero-mean Gaussian intensity noise of standard deviation $\sigma_i$ before running the competing algorithms. Although in [13], [19] we find three different scenarios, here, due to lack of space, we only focus on the one where we add noise to both image profiles (since this is the most interesting from a practical viewpoint).

4.2 First Experiment

In this experiment, for the intensity noise, we use a standard deviation $\sigma_i$, which corresponds to eight gray levels, and compare the convergence characteristics of the competing algorithms for a maximum number of iterations $j_{max} = 15$ (see footnote 2) and $T_{MSD} = 1$ pixel$^2$. Figs. 1a, 1b, and 1c depict the convergence profiles of the algorithms for different values of $\sigma_p$. We observe the appearance of an MSD floor value in each algorithm which is due to the presence of the intensity noise. Fig. 1d presents the corresponding PoC as a function of $\sigma_p$.

As we can see, each algorithm attains a different MSD floor value, with our FA-ECC version converging to the lowest one and at a rate which can be significantly better. Specifically, for weak geometric deformations, all algorithms reach almost comparable floor values and have comparable convergence rates, with FA-ECC being slightly faster than its rivals. However, in the case of medium to strong deformations, FA-ECC reaches an MSD floor value which is 3 dB lower than the inverse versions and slightly lower than the FA-LK algorithm. In addition, the convergence rate of FA-ECC is significantly superior compared to all other algorithms. Regarding our IC-ECC version, as we can see, it has performance comparable to the SIC algorithm. The same characteristics also apply to PoC, where FA-ECC exhibits a larger percentage of successful convergences while IC-ECC matches the performance of SIC.

Regarding the third figure of merit, we applied the algorithms for a maximal number of iterations $j_{MAX} = 100$. In order to test the accuracy of the alignment, we selected a threshold value $T_{MSD} = (1/18\ \text{pixel})^2$ (i.e., $-25$ dB), assuring that $T_{MSD}$ is higher than the MSD floor value of all competing algorithms. Fig. 3a depicts the corresponding curves for three values of $\sigma_p$. As we can see, for weak deformations, all algorithms are almost completely successful after the 10th iteration. When, however, the geometric deformation becomes stronger, FA-ECC outperforms its competitors significantly. Again, IC-ECC is comparable to SIC.
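As a quick check on the decibel figures quoted in this experiment, and assuming (as the numbers suggest) that MSD values are converted to decibels with $10\log_{10}$, the threshold and the 3 dB gap work out as follows:

```latex
\begin{align*}
10\log_{10}\!\left[\left(\tfrac{1}{18}\right)^{2}\right]
  &= -20\log_{10} 18 \approx -25.1\ \text{dB},\\
3\ \text{dB}
  &\;\Longleftrightarrow\; 10^{3/10} \approx 2 .
\end{align*}
```

Under this convention, a floor that is 3 dB lower corresponds to an MSD roughly half as large in pixel$^2$.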

4.3 Second Experiment

In this simulation, we consider the realistic case of photometrically distorted images under noisy conditions. We consider two different scenarios. We impose the photometric distortion 1) on the reference image and 2) on the warped one. Since all competing algorithms perfectly compensate for linear photometric distortions, we consider a nonlinear transformation of the form $I(x) \leftarrow (I(x) + 20)^{0.9}$, which is applied to the intensity of each image pixel. We repeat the same set of simulations as in the first experiment, only now we impose the photometric distortion before adding intensity noise.
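The distortion step can be sketched as below: the nonlinear map is applied per pixel and the Gaussian intensity noise is added afterward, in that order. The function names and the small test image are assumptions for illustration; no clipping or quantization is applied because the text does not mention any.

```python
import numpy as np

def photometric_distortion(image):
    """Apply I(x) <- (I(x) + 20)^0.9 to every pixel."""
    return (np.asarray(image, dtype=np.float64) + 20.0) ** 0.9

def distort_then_add_noise(image, sigma_i=8.0, rng=None):
    """Distort first, then add zero-mean Gaussian noise of std sigma_i."""
    rng = np.random.default_rng() if rng is None else rng
    distorted = photometric_distortion(image)
    return distorted + rng.normal(scale=sigma_i, size=distorted.shape)

image = np.array([[0.0, 100.0],
                  [200.0, 255.0]])   # arbitrary gray levels
clean = photometric_distortion(image)
noisy = distort_then_add_noise(image, sigma_i=8.0, rng=np.random.default_rng(0))
```

Because the exponent differs from 1, the map cannot be undone by a gain and an offset, which is why it acts as a photometric model mismatch for the competing algorithms.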

The results we obtained are shown in Fig. 2. As we can see, the performance of our forward algorithm seems to be almost unaffected, achieving, under both scenarios, almost the same and the lowest MSD floor value. On the other hand, the performance of both inverse algorithms and the FA-LK scheme seems to be severely affected. Comparing Fig. 2 with Fig. 1, we observe that, under the first scenario, FA-ECC performs even better than before. In fact, the MSD floor value is now 3 and 5 dB lower than the value attained by the FA-LK algorithm and the inverse algorithms, respectively. We should note here that the MSD floor is due not only to the intensity noise but also to the photometric model mismatch. Under the second scenario, all algorithms achieve the same MSD floor value. As far as PoC is concerned, we observe a rather steady and robust behavior for the forward algorithms under both scenarios, while the inverse schemes, under the first scenario, exhibit a significant performance reduction as compared to the second one.

Finally, we present the corresponding curves of the third figure of merit in Fig. 3b under the first scenario since, under the second one, both the inverse and the FA-ECC algorithms exhibited a similar performance. As in the previous experiment, we permit a maximal number of 100 iterations with a threshold $T_{MSD} = (1/10\ \text{pixel})^2$ (i.e., $-20$ dB), since now we have higher MSD floor values. Again, FA-ECC outperforms the other algorithms. Comparing Fig. 3a with Fig. 3b, we can also notice a robust and consistent behavior of FA-ECC with respect to intensity noise and photometric distortion model mismatch.

Footnote 2. In order to make the different MSD floor values achieved by the competing algorithms in Figs. 1a, 1b, and 1c and Figs. 2a, 2b, and 2c visible, 30 iterations are shown.

Fig. 1. MSD in decibels as a function of number of iterations under the presence of noise ($\sigma_i = 8$ gray levels). (a) $\sigma_p = 2$. (b) $\sigma_p = 6$. (c) $\sigma_p = 10$. In (d), PoC as a function of $\sigma_p$.

In summary, we can safely conclude that our proposed schemes are preferable to the corresponding variants of the LK algorithm. Clearly, our forward version is more effective than the forward LK scheme regarding both speed and percentage of convergence. On the other hand, our inverse version has performance which is comparable to the performance of SIC, which is the best inverse version of the LK algorithm. However, the point that makes our IC-ECC version preferable to SIC is the reduced computational complexity, which is $O(KN)$ as compared to SIC, which requires $O(K(N+2)^2)$ operations.
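To give a feel for the size of this gap, the ratio of the two operation counts can be evaluated for the affine model used in the simulations. The worked figure below assumes that $N$ denotes the number of warp parameters (six for the affine model) and $K$ the number of pixels, and it ignores the constants hidden by the $O(\cdot)$ notation, so it is indicative only.

```latex
\[
\frac{K\,(N+2)^2}{K\,N} = \frac{(N+2)^2}{N}
\quad\Longrightarrow\quad
\left.\frac{(N+2)^2}{N}\right|_{N=6} = \frac{64}{6} \approx 10.7 .
\]
```

The ratio is independent of $K$ and grows roughly linearly with $N$, so the advantage would be larger for warps with more parameters.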

We should also mention that we evaluated the algorithms under diverse uncertainty conditions. Only in the case of zero intensity noise (in other words, when the warped image follows the warping model exactly) did we observe the performance of both inverse algorithms and the FA-ECC to be similar and to outperform the FA-LK algorithm in all figures of merit. This performance difference can, in fact, become quite significant if the geometric deformations are strong (i.e., $\sigma_p \ge 6$). However, due to lack of space, we cannot present these results in more detail.

5 CONCLUSIONS

In this paper, we have proposed a new $\ell_2$-based iterative algorithm tailored to the parametric image alignment problem. The new scheme is aimed at maximizing the Enhanced Correlation Coefficient function, which constitutes a measure that is robust against geometric and photometric distortions. The optimal parameters were obtained by iteratively solving a sequence of approximate nonlinear optimization problems which enjoy a simple closed-form solution with low computational cost. In addition, based on the inverse compositional update rule, we developed an efficient modification of the forward algorithm. Our iterative schemes were compared against two variants of the LK algorithm through numerous simulations. Under ideal conditions, the proposed algorithms and the SIC algorithm exhibited similar performance, outperforming the forward LK algorithm. However, in the more realistic case of noisy conditions and photometric distortions, our forward algorithm exhibited a noticeably superior performance in convergence speed, accuracy, and percentage of convergence.

APPENDIX

PROOF OF THEOREM 1

The proof of Theorem 1 relies on the application of the Schwarz inequality. In order to simplify our presentation, let us impose the following change of variables (boldface $\mathbf{u}$, $\mathbf{v}$ denote the vectors and plain $u$, $v$ the scalars of Theorem 1):

$$z = Q^{1/2} x + Q^{-1/2} \mathbf{v}, \quad \tilde{u} = Q^{-1/2} \mathbf{u}, \quad \tilde{v} = Q^{-1/2} \mathbf{v}; \qquad (31)$$

then the function we want to optimize becomes a function of $z$ and has the form

$$f(z) = \frac{(u - \tilde{u}^t \tilde{v}) + \tilde{u}^t z}{\sqrt{(v - \|\tilde{v}\|^2) + \|z\|^2}}. \qquad (32)$$

Note that condition $v > \|\tilde{v}\|^2$ guarantees that the quantity under the square root in the denominator is positive.

Let us first consider the case $u - \tilde{u}^t \tilde{v} > 0$; then we can define

$$\tilde{z} = \begin{bmatrix} z^t & \sqrt{v - \|\tilde{v}\|^2} \end{bmatrix}^t, \quad \tilde{w} = \begin{bmatrix} \tilde{u}^t & \dfrac{u - \tilde{u}^t \tilde{v}}{\sqrt{v - \|\tilde{v}\|^2}} \end{bmatrix}^t \qquad (33)$$

and our objective function becomes

$$f(z) = \frac{\tilde{w}^t \tilde{z}}{\|\tilde{z}\|} \le \frac{|\tilde{w}^t \tilde{z}|}{\|\tilde{z}\|} \le \|\tilde{w}\|, \qquad (34)$$

with the last inequality being the result of applying the Schwarz inequality. Now notice that $\|\tilde{w}\|$ is constant, constituting an upper bound to our objective function. This bound is attainable when both inequalities become equalities. From the Schwarz inequality, we know that we have equality whenever we select $\tilde{z} = \lambda \tilde{w}$, where $\lambda$ is some scalar quantity. Under this selection, in order for the first inequality to become equality, we need $\lambda > 0$. From $\tilde{z} = \lambda \tilde{w}$, by equating the last vector elements, we conclude that $\lambda = (v - \|\tilde{v}\|^2)/(u - \tilde{u}^t \tilde{v})$, which is positive only when $u - \tilde{u}^t \tilde{v} > 0$, yielding $z = (v - \|\tilde{v}\|^2)\,\tilde{u}/(u - \tilde{u}^t \tilde{v})$. It is interesting to note that, when $u - \tilde{u}^t \tilde{v} \le 0$, the upper bound $\|\tilde{w}\|$ is not tight (not attainable) and, therefore, this case needs the separate treatment that follows.

Fig. 2. MSD in decibels as a function of number of iterations for photometrically distorted reference (solid lines) and warped (dashed lines) image under the presence of noise ($\sigma_i = 8$ gray levels). (a) $\sigma_p = 2$. (b) $\sigma_p = 6$. (c) $\sigma_p = 10$. In (d), PoC as a function of $\sigma_p$.

Fig. 3. PoC as a function of iterations: (a) noisy images ($\sigma_i = 8$ gray levels) and (b) noisy ($\sigma_i = 8$ gray levels) and photometrically distorted images.

When $u - \tilde{u}^t \tilde{v} \le 0$, in order to find the supremum, we apply the following inequalities:

$$f(z) = \frac{(u - \tilde{u}^t \tilde{v}) + \tilde{u}^t z}{\sqrt{(v - \|\tilde{v}\|^2) + \|z\|^2}} \le \frac{\tilde{u}^t z}{\sqrt{(v - \|\tilde{v}\|^2) + \|z\|^2}} \le \|\tilde{u}\| \frac{\|z\|}{\sqrt{(v - \|\tilde{v}\|^2) + \|z\|^2}} \le \|\tilde{u}\|. \qquad (35)$$

The first inequality is true because of the nonpositivity of $u - \tilde{u}^t \tilde{v}$ (from our assumption); for the second, we applied the Schwarz inequality in the numerator; finally, for the last, we used the fact that the ratio is smaller than 1. We observe that, in this case, we end up with a different (smaller) upper bound. In order to verify its tightness (i.e., whether it constitutes a supremum), we use the selection prescribed by the Schwarz inequality, that is, $z = \lambda \tilde{u}$ again with $\lambda > 0$, and compute the corresponding value of the objective function. By letting $\lambda \to \infty$, we realize that we converge to $\|\tilde{u}\|$. This suggests that, for sufficiently large $\lambda$, we can approach the desired upper bound arbitrarily close (but there is no finite $z$ for which we can attain it exactly!). This concludes the proof.

□
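Both cases of the proof can be checked numerically on (32). The sketch below uses arbitrary test values; the shorthand `a` for $u - \tilde{u}^t \tilde{v}$, `b` for $v - \|\tilde{v}\|^2$, and `u_t` for $\tilde{u}$ is introduced here for brevity and is not the paper's notation.

```python
import numpy as np

def f(z, a, b, u_t):
    """f(z) of (32), with a = u - u~^t v~ and b = v - ||v~||^2 > 0."""
    return (a + u_t @ z) / np.sqrt(b + z @ z)

rng = np.random.default_rng(0)
u_t = np.array([1.0, -2.0, 0.5, 3.0])   # arbitrary stand-in for u~
b = 2.0

# Case a > 0: the bound ||w~|| is attained at z* = b u~ / a.
a_pos = 0.7
z_star = b * u_t / a_pos
bound_pos = np.sqrt(u_t @ u_t + a_pos ** 2 / b)   # ||w~|| from (33)
value_at_z_star = f(z_star, a_pos, b, u_t)
samples = rng.normal(scale=5.0, size=(1000, 4))
max_sampled = max(f(z, a_pos, b, u_t) for z in samples)

# Case a <= 0: f(lambda u~) increases toward ||u~|| but never reaches it.
a_neg = -0.7
lambdas = np.array([1.0, 1e2, 1e4, 1e6])
values = np.array([f(lam * u_t, a_neg, b, u_t) for lam in lambdas])
bound_neg = np.linalg.norm(u_t)
```

In the first case `value_at_z_star` coincides with `bound_pos` and no random sample exceeds it; in the second case `values` stays strictly below `bound_neg` while approaching it as `lambdas` grows, matching the supremum that is not attained.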

ACKNOWLEDGMENTS

This work was supported by the General Secretariat for Research

and Technology of the Greek Government as part of the project

“XROMA,” PENED 01.

REFERENCES

[1] S. Periaswamy and H. Farid, “Elastic Registration in the Presence of Intensity Variation,” IEEE Trans. Medical Imaging, vol. 22, no. 7, pp. 865-874, 2003.

[2] I. Karybali, E.Z. Psarakis, K. Berberidis, and G.D. Evangelidis, “Efficient Image Registration with Subpixel Accuracy,” Proc. 14th European Signal Processing Conf., 2006.

[3] G.D. Hager and P.N. Belhumeur, “Efficient Region Tracking with Parametric Models of Geometry and Illumination,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 10, pp. 1025-1039, Oct. 1998.

[4] J. Shi and C. Tomasi, “Good Features to Track,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 1994.

[5] M. Gleicher, “Projective Registration with Difference Decomposition,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 1997.

[6] C. Fuh and P. Maragos, “Motion Displacement Estimation Using an Affine Model for Image Matching,” Optical Eng., vol. 30, no. 7, pp. 881-887, 1991.

[7] Y. Altunbasak, R.M. Mersereau, and A.J. Patti, “A Fast Parametric Motion Estimation Algorithm with Illumination and Lens Distortion Correction,” IEEE Trans. Image Processing, vol. 12, no. 4, pp. 395-408, 2003.

[8] B.K.P. Horn and E.J. Weldon, “Direct Methods for Recovering Motion,” Int’l J. Computer Vision, vol. 2, no. 1, pp. 51-76, 1988.

[9] P. Anandan, “A Computational Framework and an Algorithm for the Measurement of Visual Motion,” Int’l J. Computer Vision, vol. 2, no. 3, pp. 283-310, 1989.

[10] B.D. Lucas and T. Kanade, “An Iterative Image Registration Technique with an Application to Stereo Vision,” Proc. Seventh Int’l Joint Conf. Artificial Intelligence, 1981.

[11] E.Z. Psarakis and G.D. Evangelidis, “An Enhanced Correlation-Based Method for Stereo Correspondence with Sub-Pixel Accuracy,” Proc. 10th IEEE Int’l Conf. Computer Vision, 2005.

[12] R. Szeliski, Handbook of Mathematical Models of Computer Vision, N. Paragios, Y. Chen, and O. Faugeras, eds., chapter 17. Springer, 2005.

[13] S. Baker and I. Matthews, “Lucas-Kanade 20 Years On: A Unifying Framework: Part 1. The Quantity Approximated, the Warp Update Rule, and the Gradient Descent Approximation,” Int’l J. Computer Vision, vol. 56, no. 3, pp. 221-255, 2004.

[14] M.J. Black and Y. Yacoob, “Tracking and Recognizing Rigid and Non-Rigid Facial Motions Using Local Parametric Models of Image Motion,” Proc. Fifth IEEE Int’l Conf. Computer Vision, 1995.

[15] H. Shum and R. Szeliski, “Construction of Panoramic Image Mosaics with Global and Local Alignment,” Int’l J. Computer Vision, vol. 36, no. 2, pp. 101-130, 2000.

[16] S. Negahdaripour and C.H. Yu, “A Generalized Brightness Change Model for Computing Optical Flow,” Proc. Fourth IEEE Int’l Conf. Computer Vision, 1993.

[17] B.K.P. Horn and B.G. Schunck, “Determining Optical Flow,” Artificial Intelligence, vol. 17, pp. 185-203, 1981.

[18] M.J. Black and P. Anandan, “A Framework for the Robust Estimation of Optical Flow,” Proc. Fourth IEEE Int’l Conf. Computer Vision, 1993.

[19] S. Baker, R. Gross, and I. Matthews, “Lucas-Kanade 20 Years On: A Unifying Framework: Part 3,” CMU-RI-TR-03-35, Robotics Inst., Carnegie Mellon Univ., 2004.

[20] B.K.P. Horn, Robot Vision. MIT Press, McGraw-Hill, 1986.

[21] P. Hallinan, “A Low-Dimensional Representation of Human Faces for Arbitrary Lighting Conditions,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 1994.

[22] V. Barnett and T. Lewis, Outliers in Statistical Data. John Wiley & Sons, 1978.


IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 10, OCTOBER 2008