Deformable face mapping for person identification.
ABSTRACT This paper introduces a novel deformable model for face mapping and its application to automatic person identification. While most face recognition techniques directly model the face, our goal is to model the transformation between face images of the same person. As a global face transformation may be too complex to be modeled in its entirety, it is approximated by a set of local transformations with the constraint that neighboring transformations must be consistent with each other. Local transformations and neighboring constraints are embedded within the probabilistic framework of a two-dimensional hidden Markov model (2-D HMM). Experimental results on a face identification task show that the new approach compares favorably to the popular Fisherfaces algorithm.
DEFORMABLE FACE MAPPING FOR PERSON IDENTIFICATION
Florent Perronnin*, Jean-Luc Dugelay
Multimedia Communications Department
BP 193, F-06904 Sophia Antipolis Cedex
Department of Electrical and Computer Engineering
University of California
Santa Barbara, CA 93106-9560
1. INTRODUCTION
Realistic models of the face are required for a wide variety of ap-
plications including facial animation, classification of facial ex-
pressions or face recognition. Face recognition is a challenging
pattern classification problem as face images of the same person
are subject to variations in facial expressions, pose, illumination
conditions, presence or absence of eyeglasses or facial hair, etc.
The focus of this paper will be on the first source of variability.
While most face recognition algorithms attempt to build for each person a face model which is intended to describe as accurately as possible his/her intra-face variability, in this paper we model a transformation between face images of the same person. To avoid the excessive complexity of direct modeling of the global face transformation, we propose to split it into a set of local transformations and to impose neighborhood consistency on these local transformations. Local transformations and neighboring constraints are naturally embedded within the flexible probabilistic framework of the 2-D HMM.
Deformable models of the face have already been applied to face recognition. The basic approach of Elastic Graph Matching (EGM) [1] is to match two face graphs in an elastic manner. The quality of a match is evaluated with a cost function C = C_m + ρC_d, where C_m and C_d are respectively the costs of local matchings and local distortions, and ρ controls the rigidity of the matching. Wiskott et al. [2] elaborated on the idea with Elastic Bunch Graph Matching (EBGM) and both algorithms were later improved, especially to
*This work was supported in part by France Telecom Research.
†This work was supported in part by the NSF under grants EIA-9986057 and EIA-0080134, the University of California MICRO program, Dolby Laboratories, Inc., Lucent Technologies, Inc., Mindspeed Technologies, Inc., and Qualcomm, Inc.
0-7803-7750-8/03/$17.00 ©2003 IEEE
weight the different parts of the face according to their discriminatory power [3, 4]. The two major differences between the above elastic approaches and the new approach presented in this paper lie 1) in the use of the HMM framework, which provides efficient formulae to compute the likelihood that a template image F_T and a query image F_Q belong to the same person given the face transformation model M, i.e. P(F_T|F_Q, M), and to train automatically all the parameters of M, and 2) in the use of a shared deformable model of the face M for all individuals, which is particularly useful when little enrollment data is available.
The remainder of this paper is organized as follows. A high-level description of the 2-D HMM as a probabilistic model for face transformation is given in the next section. Sections 3 and 4 provide a quantitative formulation for local transformations and neighborhood consistency, respectively. In section 5 we briefly introduce Turbo-HMMs (T-HMMs) to approximate the computationally intractable 2-D HMMs [5]. Section 6 summarizes experimental results for a face identification task on the FERET face database [6], showing that the proposed approach can significantly outperform the popular Fisherfaces technique [7].
2. FACE TRANSFORMATION MODEL
A global face transformation is too complex for direct modeling. We hence propose to approximate it with a set of local transformations. These transformations should be as simple as possible for an efficient implementation, while the composition of all local transformations, i.e. the resulting global transformation, should be rich enough to model a wide range of facial deformations. However, if we allow all possible combinations of local transformations, the model might become over-flexible and "succeed" in patching together very different faces. This naturally leads to the second component of our framework: a neighborhood coherence constraint whose purpose is to provide context information. It must be emphasized that such neighborhood consistency rules produce dependence in the local transformation selection, and the optimal solution must therefore involve a global decision. To combine the local transformation and consistency costs, we embed the system within a probabilistic framework using 2-D HMMs.
At any location on the face, the system is in one of a finite set of states. If we assume that the 2-D HMM is first-order Markovian, the probability of the system entering a particular state at a given location, i.e. the transition probability, depends on the state of the system at the horizontally and vertically adjacent locations. At each position, an observation is emitted by the state according to an emission probability distribution. In our framework, local transformations correspond to the states of the 2-D HMM and the target/template image data is the collection of emitted observations. Emission probabilities model the cost associated with a local mapping. These transformations or states are "hidden" and information on them can only be extracted through the observations. Transition probabilities relate states of neighboring regions and implement the consistency rules.
3. LOCAL TRANSFORMATIONS
Feature vectors are extracted on a sparse grid from the template image F_T and on a dense grid from the query image F_Q, as done in EGM [1]. Each vector summarizes local properties of the face. We then apply a set of local geometric transformations to the vectors extracted from F_T. Each transformation maps a feature vector of F_T to a feature vector in F_Q. Translation, rotation and scaling are examples of simple geometric transformations and may be useful to model local deformations of the face. In the remainder of this paper, we restrict the set of geometric transformations to translations, as a small global affine transformation can be approximated by a set of local translations.
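The sparse/dense grid geometry and the translation states can be sketched as follows; the 16-px and 4-px grid steps come from Sec. 6.4, while the ±12 px translation range and the helper names are illustrative assumptions:

```python
import numpy as np

# Hypothetical sketch of the state space: each state of the 2-D HMM is a
# local translation r = (rx, ry) mapping a template grid point into the
# query image.  Grid steps (16 px sparse / 4 px dense) follow Sec. 6.4;
# the +/- 12 px translation range is an assumption for illustration.
TEMPLATE_STEP, QUERY_STEP, MAX_SHIFT = 16, 4, 12

# Candidate translations, quantized to the dense (query) grid step.
translations = [(rx, ry)
                for rx in range(-MAX_SHIFT, MAX_SHIFT + 1, QUERY_STEP)
                for ry in range(-MAX_SHIFT, MAX_SHIFT + 1, QUERY_STEP)]

def matching_position(z, r):
    """Coordinates z' = z + r of the query feature matched to template point z."""
    return (z[0] + r[0], z[1] + r[1])

# Sparse 7x7 template grid on a 128x128 face (points at 16, 32, ..., 112).
template_grid = [(x, y)
                 for x in range(16, 128, TEMPLATE_STEP)
                 for y in range(16, 128, TEMPLATE_STEP)]
```

With these choices the template grid indeed yields the 7 x 7 = 49 observations per template image reported in Sec. 6.4.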
We now formulate the emission probabilities. Let o_{i,j} be the observation extracted from F_T at position (i,j) (c.f. Fig. 1) and let q_{i,j} be the associated state (i.e. the translation). If r = (r_x, r_y) is a translation vector, the probability that at position (i,j) the system emits observation o_{i,j}, given that it is in state q_{i,j} = r, is b_r(o_{i,j}) = P(o_{i,j}|q_{i,j} = r, λ), where λ = (λ_M, λ_Q). We clearly separate the HMM parameters into Face Dependent (FD) parameters λ_Q, which are extracted from F_Q, and Face Independent Transformation (FIT) parameters λ_M, i.e. the parameters of the shared transformation M that can be trained reliably by pooling together the training images of all individuals.
Let z_{i,j} = (x_{i,j}, y_{i,j}) denote the coordinates of observation o_{i,j} in F_T and let z'_{i,j} be the coordinates of the matching feature in F_Q: z'_{i,j} = z_{i,j} + r. The emission probability b_r(o_{i,j}) represents the cost of matching these feature vectors. b_r(o_{i,j}) is modeled with a mixture of Gaussians, as linear combinations of Gaussians have the ability to approximate arbitrarily shaped densities:

b_r(o_{i,j}) = Σ_k w^r_{i,j,k} b_{r,k}(o_{i,j})

Fig. 1. Local matching

where the b_{r,k}(o_{i,j})'s are the component densities and the w^r_{i,j,k}'s are the mixture weights, which must satisfy the following constraint: ∀(i,j) and ∀r, Σ_k w^r_{i,j,k} = 1. Each component density is an N-variate Gaussian of the form:

b_{r,k}(o_{i,j}) = (2π)^{-N/2} |Σ^r_{i,j,k}|^{-1/2} exp( -(1/2) (o_{i,j} - μ^r_{i,j,k})^T (Σ^r_{i,j,k})^{-1} (o_{i,j} - μ^r_{i,j,k}) )

where μ^r_{i,j,k} and Σ^r_{i,j,k} are respectively the mean and covariance matrix of the Gaussian, N is the size of the feature vectors and |·| denotes the determinant operator. This HMM is non-stationary as the Gaussian parameters depend on the position (i,j).

Let m_{i,j} be the feature vector extracted from the matching block in F_Q. We use a bi-partite model which separates the mean into additive FD and FIT parts:

μ^r_{i,j,k} = m_{i,j} + δ^r_{i,j,k}

where m_{i,j} is the FD part of the mean and δ^r_{i,j,k} is a FIT offset. Intuitively, b_r(o_{i,j}) should be approximately centered on m_{i,j} and maximum near it.
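The emission probability above can be sketched numerically; the diagonal covariance and all variable names are assumptions of this sketch, not details given in the paper:

```python
import numpy as np

def emission_prob(o, m, deltas, sigmas2, weights):
    """Sketch of b_r(o_{i,j}) = sum_k w_k N(o; m + delta_k, Sigma_k).

    o       : observation vector from the template image F_T (length N)
    m       : face-dependent mean, the matching feature from F_Q
    deltas  : (K, N) face-independent offsets delta_k (bipartite mean m + delta_k)
    sigmas2 : (K, N) per-component variances (diagonal covariance is an
              assumption of this sketch, not stated in the paper)
    weights : (K,) mixture weights, summing to 1
    """
    mu = m + deltas                                    # (K, N) bipartite means
    diff2 = (o - mu) ** 2 / sigmas2
    log_norm = -0.5 * np.log(2 * np.pi * sigmas2).sum(axis=1)
    comp = np.exp(log_norm - 0.5 * diff2.sum(axis=1))  # N-variate Gaussians
    return float(np.dot(weights, comp))
```

For a single zero-offset component with unit variance, the density peaks at o = m, matching the intuition that b_r(o_{i,j}) is centered on the matching feature.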
4. NEIGHBORHOOD CONSISTENCY
The neighborhood consistency of the transformation is ensured via the transition probabilities of the 2-D HMM. If we assume that the 2-D HMM is a first-order Markov process, the transition probabilities are of the form P(q_{i,j}|q_{i,j-1}, q_{i-1,j}, λ). However, we show in the next section that a 2-D HMM can be approximated by a Turbo-HMM (T-HMM): a set of horizontal and vertical 1-D HMMs that "communicate" through an iterative process. The transition probabilities of the corresponding horizontal and vertical 1-D HMMs are respectively:

a^H_{i,j}(r, r') = P(q_{i,j} = r | q_{i,j-1} = r', λ)
a^V_{i,j}(r, r') = P(q_{i,j} = r | q_{i-1,j} = r', λ)

Invariance to global shifts in face images is a desirable property. Hence we choose a^H and a^V to be of the form:

a^H_{i,j}(r, r') = a^H_{i,j}(δr),  a^V_{i,j}(r, r') = a^V_{i,j}(δr)

where δr = r - r'. a^H_{i,j} and a^V_{i,j} model respectively the horizontal and vertical elastic properties of the face at position (i,j) and are part of the face transformation model M. If we assume F_T and F_Q have the same scale and orientation, then a^H_{i,j} and a^V_{i,j} should have two properties: they should preserve both local distance, i.e. r and r' should have the same norm, and ordering, i.e. r and r' should have the same direction. A separable parametric transition probability that satisfies the two previous properties is of the form a^H_{i,j}(δr) = a^{H,x}_{i,j}(δr_x) a^{H,y}_{i,j}(δr_y), normalized such that Σ_{δr_x} a^{H,x}_{i,j}(δr_x) = 1 and Σ_{δr_y} a^{H,y}_{i,j}(δr_y) = 1. Similar formulae can be derived for the vertical transition probabilities.

We assume in the remainder that the initial occupancy probability of the 2-D HMM is uniform to ensure invariance to global translations of face images. To summarize, the parameters we need to estimate are the FIT parameters, i.e. the w's, δ's, Σ's and the transition probabilities a^H_{i,j}'s and a^V_{i,j}'s.
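The shift-invariant transition probabilities a(δr) can be illustrated with a small normalized table; the exponential penalty and the `elasticity` parameter are illustrative choices of this sketch, not the paper's exact parametric form:

```python
import numpy as np

def make_transition_table(shifts, elasticity):
    """Sketch of a shift-invariant transition probability a(delta_r).

    Following the text, a(r, r') depends only on delta_r = r - r'.  The
    exponential penalty exp(-|delta_r| / elasticity) is an illustrative
    choice; the table is normalized so the probabilities sum to 1.
    """
    penalties = np.exp(-np.abs(np.asarray(shifts, dtype=float)) / elasticity)
    return penalties / penalties.sum()

# One table per axis: a separable a(delta_r) = a_x(delta_rx) * a_y(delta_ry).
shifts = [-8, -4, 0, 4, 8]          # delta_r values on the 4-px grid
a_x = make_transition_table(shifts, elasticity=4.0)
a_y = make_transition_table(shifts, elasticity=4.0)
```

A smaller `elasticity` concentrates the mass on δr = 0 (a more rigid face region), which is the role the transition matrices play in the model M.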
Fig. 2. Neighborhood consistency
5. TURBO-HMMS
While HMMs have been extensively applied to one-dimensional problems [8], the complexity of their extension to two dimensions grows exponentially with the data size and is intractable in most cases of interest. Turbo-HMMs (T-HMMs) were introduced in [5], in reference to the celebrated turbo error-correcting codes, to approximate the computationally intractable 2-D HMMs. A T-HMM consists of horizontal and vertical 1-D HMMs that "communicate" through an iterative process.
T-HMMs rely on the following approximation of the joint likelihood of observations O and states Q given the HMM parameters λ:

P(O, Q|λ) ≈ Π_j [ P(o^V_j, q^V_j | λ^V_j) Π_i P(q_{i,j} | o^H_i, λ^H_i) ]

where o^H_i and o^V_j are respectively the i-th row and j-th column of observations, λ^H_i and λ^V_j are the i-th row and j-th column of model parameters, and q^V_j is the j-th column of states. Each term P(o^V_j, q^V_j | λ^V_j) corresponds to a 1-D vertical HMM and Π_i P(q_{i,j} | o^H_i, λ^H_i) is in effect a horizontal prior for column j. We can derive the dual formula where 1-D horizontal HMMs communicate through the use of a vertical prior.
The computation of P(F_T|F_Q, M), i.e. of P(O|λ), is based on a modified version of the forward-backward algorithm which is applied successively and iteratively on the rows and columns until the horizontal and vertical priors reach some kind of agreement [5]. This algorithm is clearly linear in the size of the data. It must be underlined that we do not obtain one unique score but one horizontal and one vertical score. Combining these two scores is a classical problem of decision fusion. As experiments showed that these scores were generally close, we simply averaged the log-likelihoods. Although this simple heuristic may not be optimal, it provided good results. While EGM only takes into account the best transformation during the score computation, we take into account all possible transformations weighted according to their probability, which should yield a more robust score.
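Since each row and column of a T-HMM is scored by an ordinary 1-D HMM, the building blocks can be sketched as a standard forward pass plus the log-likelihood averaging heuristic described above; the function names and interfaces are our own assumptions:

```python
import numpy as np

def forward_loglik(log_b, log_a, log_pi):
    """Log-likelihood of a 1-D HMM via the forward algorithm.

    Within a T-HMM, each row and each column is scored by such a 1-D pass.
    log_b: (T, S) emission log-probs, log_a: (S, S) transition log-probs,
    log_pi: (S,) initial state log-probs.
    """
    alpha = log_pi + log_b[0]
    for t in range(1, len(log_b)):
        # log-sum-exp over previous states, written out for self-containment
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_a)) + log_b[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def fused_score(horizontal_logliks, vertical_logliks):
    """Average the horizontal and vertical log-likelihood scores, the simple
    decision-fusion heuristic used in the paper."""
    return 0.5 * (np.mean(horizontal_logliks) + np.mean(vertical_logliks))
```

Because all transformations are summed over (rather than only the best path, as in Viterbi-style EGM scoring), the forward pass realizes the "all transformations weighted by their probability" behavior described above.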
During training, we present pairs of pictures (a template and a query image) that belong to the same person and optimize the transformation parameters λ_M to increase the likelihood value (Maximum Likelihood Estimation). This is another advantage of the proposed approach, as we can train all model parameters while, to the best of our knowledge, EGM's rigidity parameter (which has the same function as our transition probabilities) must be hand-tuned.
6. EXPERIMENTAL RESULTS
6.1. The Database
All the following experiments were carried out on a subset of the FERET face database [6]. 1,000 individuals were extracted: 500 for training the face deformation model and 500 for testing the performance. We use two images (one target and one query image) per training or test individual. This means that test individuals are enrolled with a single image. Target images are extracted from the gallery (FA images) and query images from the FB probe. FA and FB images are frontal views of the face that exhibit large variabilities in terms of facial expressions. Images are pre-processed to extract 128x128 normalized facial regions. For this purpose, we used the coordinates of the eyes and the tip of the nose provided with each image.
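The normalization step can be sketched as a similarity transform estimated from the eye coordinates; the target eye positions inside the 128x128 crop are illustrative assumptions, as the paper only states which landmarks are used:

```python
import numpy as np

def similarity_from_eyes(left_eye, right_eye,
                         target_left=(32.0, 40.0), target_right=(96.0, 40.0)):
    """Scale/rotation/translation mapping the detected eye segment onto
    fixed target positions in a 128x128 face crop (targets are assumptions).

    Returns (A, t) such that a point p maps to A @ p + t.
    """
    src = np.subtract(right_eye, left_eye)
    dst = np.subtract(target_right, target_left)
    scale = np.hypot(*dst) / np.hypot(*src)
    angle = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    c, s = np.cos(angle), np.sin(angle)
    A = scale * np.array([[c, -s], [s, c]])
    t = np.asarray(target_left) - A @ np.asarray(left_eye)
    return A, t
```

A third landmark such as the nose tip would allow a full affine fit; this two-point version shows the idea with the fewest moving parts.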
6.2. Gabor Features
We used Gabor features, which have been successfully applied to face recognition [1, 2, 3, 9] and facial analysis [10]. Gabor wavelets are plane waves restricted by a Gaussian envelope and can be characterized by the wave vector k_{μ,ν} = k_ν exp(iφ_μ), where k_ν = k_max/f^ν with ν ∈ [1, N] and φ_μ = πμ/M with μ ∈ [1, M]. μ and ν define respectively the orientation and scale of k_{μ,ν}. After preliminary experiments, we chose the following set of parameters, which yielded the best results with both our Fisherfaces baseline and the proposed algorithm: N = 5, M = 8, σ = 2π, k_max = π/4 and f = √2.
For each image, we normalized the feature coefficients to zero mean and unit variance, which performs a divisive contrast normalization [10].
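A sketch of such a Gabor kernel, assuming the standard DC-free form used by Wiskott et al. [2] and Liu and Wechsler [9] together with the parameters above (the kernel size is an illustrative choice):

```python
import numpy as np

def gabor_kernel(nu, mu, size=33, sigma=2 * np.pi, k_max=np.pi / 4,
                 f=np.sqrt(2), M=8):
    """Sketch of the Gabor wavelet psi_{mu,nu} with the parameters of Sec. 6.2.

    The closed form used here is the standard DC-free Gabor kernel of
    Wiskott et al. [2] / Liu and Wechsler [9]; the paper's own equation is
    assumed to match it.
    """
    k_norm = k_max / f ** nu                     # scale nu
    phi = np.pi * mu / M                         # orientation mu
    kx, ky = k_norm * np.cos(phi), k_norm * np.sin(phi)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = x ** 2 + y ** 2
    envelope = (k_norm ** 2 / sigma ** 2) * np.exp(-k_norm ** 2 * z2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)  # DC-free term
    return envelope * carrier

def normalize_features(feats):
    """Zero-mean, unit-variance normalization of the feature coefficients."""
    return (feats - feats.mean()) / feats.std()
```

With N = 5 scales and M = 8 orientations, a filter bank of 40 such kernels is convolved with the image and the magnitudes (or complex responses) at the grid points form the feature vectors.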
6.3. The Baseline: Fisherfaces
While Principal Component Analysis (PCA) is a dimension reduction technique which is optimal with respect to data compression, it is in general sub-optimal for recognition. For such a task, Fisher's Linear Discriminant (FLD) should be preferred to PCA. The idea of FLD is to select a subspace that maximizes the ratio of the inter-class variability to the intra-class variability. However, the straightforward application of this principle to face recognition is often impossible due to the high dimensionality of the feature space. A method called Fisherfaces was developed to overcome this issue [7]: one first applies PCA to reduce the dimension of the feature space and then performs the standard FLD.

For a fair comparison, we did not apply Fisherfaces directly to the gray-level images but to the Gabor features, as done for instance in [9]. A feature vector was extracted every four pixels in the horizontal and vertical directions, and the concatenation of all these vectors formed the Gabor representation of the face. In [9], various metrics were tested: the L1, L2 (Euclidean), Mahalanobis and cosine distances. We chose the Mahalanobis metric, which consistently outperformed all other distances. The best Fisherfaces identification rate is 93.2% with 300 Fisherfaces.
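The PCA-then-FLD pipeline can be sketched in a few lines of numpy; this is a compact illustration of the Fisherfaces idea, not the exact baseline implementation:

```python
import numpy as np

def fisherfaces(X, y, n_pca, n_fld):
    """Minimal Fisherfaces sketch: PCA for dimension reduction, then FLD.

    X: (n_samples, n_features) feature vectors, y: class labels.
    Returns the data mean and a projection matrix W; a sample x is
    projected as (x - mean) @ W.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    # PCA via SVD; keep the top n_pca components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pca].T                      # (n_features, n_pca)
    Z = Xc @ P
    # FLD in PCA space: maximize inter-class over intra-class scatter.
    Sw = np.zeros((n_pca, n_pca))
    Sb = np.zeros((n_pca, n_pca))
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc - Z.mean(axis=0), mc - Z.mean(axis=0))
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_fld]
    W = evecs[:, order].real              # (n_pca, n_fld)
    return mu, P @ W
```

In the baseline the projected vectors would then be compared with the Mahalanobis distance; any classifier in the reduced space illustrates the same point.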
6.4. Performance of the Novel Algorithm
To reduce the computational load, and for a fair comparison with Fisherfaces, the precision of a translation vector r was limited to
Fig. 3. Performance of the proposed algorithm.
4 pixels in both horizontal and vertical directions. Therefore, a feature vector m was extracted every 4 pixels of the query images, as was done for Fisherfaces (dense grid). For each template image, a feature vector o was extracted every 16 pixels in both horizontal and vertical directions (sparse grid), which resulted in 7 x 7 = 49 observations per template image. We tried a smaller step size for template images, but this resulted in marginal improvements of the performance at the expense of a much higher computational load.
To train single Gaussian mixtures, for each training couple (F_T, F_Q) we first align approximately each block in F_T with the corresponding block in F_Q and initialize the Gaussian parameters. Transition probabilities are initialized uniformly. Then the λ_M parameters are re-estimated using the modified Baum-Welch algorithm. To train multiple Gaussians per mixture, we implemented an iterative splitting/re-training strategy.
We measured the impact of using multiple Gaussian mixtures to weight the different parts of the face and of using multiple horizontal and vertical transition matrices to model the elastic properties of the various parts of the face. In both cases, we used face symmetry to reduce the number of parameters to estimate. Hence, we tried one mixture for the whole face (i.e. w^r_{i,j,k} = w^r_k, δ^r_{i,j,k} = δ^r_k and Σ^r_{i,j,k} = Σ^r_k) and one mixture for each position (using face symmetry, this resulted in 4 x 7 = 28 mixtures). We tried one horizontal and one vertical transition matrix for the whole face and one horizontal and one vertical transition matrix at each position (using face symmetry, this resulted in 3 x 7 = 21 horizontal and 4 x 6 = 24 vertical transition matrices). This made four test configurations. The performance is shown in Fig. 3 as a function of the number of Gaussians per mixture.
While applying weights to the different parts of the face provides a significant increase in performance, modeling the various elasticity properties of the face had a limited impact and resulted in small but consistent improvements. The best performance is a 96.0% identification rate. Applying a simple McNemar's test of significance [11], we can hence guarantee with more than 99% confidence that our approach performs significantly better than Fisherfaces.
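McNemar's test compares the trials on which exactly one of the two classifiers is correct; a minimal sketch with the common continuity-corrected chi-square approximation (the counts below are illustrative, not the paper's actual error tables):

```python
# Minimal sketch of McNemar's test of significance for comparing two
# classifiers on the same test set.
def mcnemar_chi2(b, c):
    """b: trials only classifier A got right; c: trials only B got right.

    Returns the continuity-corrected chi-square statistic (1 d.o.f.).
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# chi2 > 6.635 corresponds to p < 0.01 with 1 degree of freedom, i.e. a
# difference significant at more than 99% confidence.
significant_99 = mcnemar_chi2(34, 10) > 6.635
```

Only the discordant trials enter the statistic; trials where both systems agree carry no information about which one is better.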
The score computation based on T-HMMs is very efficient: once the Gabor features are extracted from F_T and F_Q, it takes only 15 ms for our best system with 16 Gaussians per mixture to compute the score on a Pentium IV 2 GHz.
7. CONCLUSION
We presented a novel deformable model of the face and applied it successfully to face recognition. In our framework, the shared face deformation is approximated with a set of local transformations with the constraint that neighboring transformations must be consistent with each other. Local transformations and neighboring constraints are embedded within a probabilistic framework using an approximation of the intractable 2-D HMMs: the Turbo-HMM.

As the objective of this work was not modeling face deformation per se, but the face recognition problem, it is noteworthy that Maximum Likelihood Estimation is generally not optimal. It may be advantageous to train the HMM parameters under discriminative criteria such as the Minimum Classification Error (MCE) or its approximation via the Maximum Mutual Information Estimation (MMIE) criterion.
8. REFERENCES

[1] M. Lades, J. C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R. P. Würtz and W. Konen, "Distortion invariant object recognition in the dynamic link architecture," IEEE Trans. on Computers, vol. 42, no. 3, Mar 1993.
[2] L. Wiskott, J.-M. Fellous, N. Krüger and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. on PAMI, vol. 19, no. 7, pp. 775-779, Jul 1997.
[3] B. Duc, S. Fischer and J. Bigün, "Face authentication with Gabor information on deformable graphs," IEEE Trans. on Image Processing, vol. 8, no. 4, Apr 1999.
[4] A. Tefas, C. Kotropoulos and I. Pitas, "Using support vector machines to enhance the performance of elastic graph matching for frontal face recognition," IEEE Trans. on PAMI, vol. 23, no. 7, pp. 735-746, Jul 2001.
[5] F. Perronnin, J.-L. Dugelay and K. Rose, "Iterative decoding of two-dimensional hidden Markov models," in ICASSP, 2003, vol. 3, pp. 329-332.
[6] P. J. Phillips, H. Wechsler, J. Huang and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," Image and Vision Computing Journal, vol. 16, no. 5, pp. 295-306, 1998.
[7] P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. on PAMI, vol. 19, no. 7, pp. 711-720, Jul 1997.
[8] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications," Proc. of the IEEE, vol. 77, no. 2, Feb 1989.
[9] C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition," IEEE Trans. on Image Processing, vol. 11, no. 4, pp. 467-476, Apr 2002.
[10] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman and T. J. Sejnowski, "Classifying facial actions," IEEE Trans. on PAMI, vol. 21, pp. 974-989, Oct 1999.
[11] L. Gillick and S. J. Cox, "Some statistical issues in the comparison of speech recognition algorithms," in ICASSP, 1989, vol. 1, pp. 532-535.