DISTANT FACE RECOGNITION BASED ON SPARSE-STEREO RECONSTRUCTION
Ham M. Rara, Shireen Y. Elhabian, Asem M. Ali, Mike Miller, Thomas L. Starr, Aly A. Farag
CVIP Laboratory, University of Louisville, KY, USA
ABSTRACT

We introduce a framework for face recognition at a distance based on sparse-stereo reconstruction. We develop a 3D acquisition system that consists of two CCD stereo cameras mounted on pan-tilt units with an adjustable baseline. We first detect the facial region and extract its landmark points, which are used to initialize an AAM mesh fitting algorithm. The fitted mesh vertices provide point correspondences between the left and right images of a stereo pair. Stereo-based reconstruction is then used to infer the 3D coordinates of the mesh vertices. We perform experiments with different features extracted from these vertices for face recognition. The cumulative match characteristic (CMC) curves generated with the proposed framework confirm its feasibility for long-distance recognition of human faces relative to the state of the art.
Index Terms— stereo-reconstruction, face recognition
1. INTRODUCTION

Face recognition is a challenging task that has been an active research area for the past three decades. Initially, most efforts were directed towards 2D face recognition. This approach uses the projection of the 3D human face onto the 2D image plane acquired by digital cameras. The recognition problem is then formulated as follows: given a still image, identify or verify one or more persons in the scene using a stored database of face images. Most solutions proposed by different researchers first detect one or more faces in the given image. They then extract facial features that can be used for recognition.
Recently, there has been growing interest in face recognition at a distance. Yao et al. created a face video database acquired from long distances and with high magnifications, both indoors and outdoors, under uncontrolled surveillance conditions. They developed a comprehensive processing algorithm to deal with the image degradations caused by long-distance acquisition, and they improved recognition rates. Medioni et al. presented an approach to identify non-cooperative individuals at a distance by inferring 3D shape from a sequence of images.
In this paper, we propose to use active appearance models (AAMs) to provide sparse point correspondences. Given a stereo pair (left and right images), we first detect the facial region using the logRGB space. Landmark points such as the eye centers, mouth center, and nose tip are then extracted to initialize the AAM mesh fitting algorithm. We exploit the correspondence between the fitted mesh vertices of the left and right images to perform stereo reconstruction.
Because facial stereo databases are lacking, we built our own passive stereo acquisition setup to acquire a stereo database. Our setup consists of a stereo pair of high-resolution cameras with an adjustable baseline. The setup allows the user to pan, tilt, zoom, and focus the cameras so that the centers of their fields of view converge on the subject's nose tip. We used our acquisition system to capture stereo pairs of 30 subjects at different ranges.
The paper is organized as follows. Section 2 describes stereo-based reconstruction, Section 3 discusses AAM fitting, Section 4 enumerates the features used for recognition, and Section 5 presents experimental results. Section 6 discusses the results, and Section 7 presents conclusions and future work.
2. STEREO-BASED RECONSTRUCTION
For this work, we use the known orientation between the two cameras to estimate 3D points. The geometric relationship between the two cameras is illustrated in Fig. 1.
Figure 1: General stereo pair setup, where OL and OR are the left and right camera coordinate systems and Ow is the world coordinate system.
978-1-4244-5654-3/09/$26.00 ©2009 IEEE, ICIP 2009
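The system parameters are known: baseline $B$ (m), focal length $f$ (mm), pan angle $\Phi$ (degrees), and camera scale factor $k_\alpha$ (pixel/mm). The scene point $(X, Y, Z)$ can therefore be reconstructed from its projections $p$ and $q$ using the geometry shown in Fig. 1, assuming $p_y = q_y$. Consider a symmetric configuration: $O_w$ lies at the midpoint of the baseline, each camera is panned inward by $\Phi$, and image coordinates are measured in pixels from the principal point. For this configuration, we first calculate

$$F = k_\alpha f, \qquad u_L = p_x\cos\Phi + F\sin\Phi, \qquad w_L = F\cos\Phi - p_x\sin\Phi,$$
$$u_R = q_x\cos\Phi - F\sin\Phi, \qquad w_R = F\cos\Phi + q_x\sin\Phi.$$

Then, we compute the reconstructed scene point $(X, Y, Z)$ as

$$Z = \frac{B\,w_L\,w_R}{u_L w_R - u_R w_L}, \qquad X = -\frac{B}{2} + Z\,\frac{u_L}{w_L}, \qquad Y = Z\,\frac{p_y}{w_L}.$$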
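As a hedged illustration, the triangulation for one AAM vertex pair can be sketched in Python. The sketch assumes a symmetric verged rig:

- The world origin Ow is at the midpoint of the baseline.
- The cameras sit at x = -B/2 and x = +B/2, each panned inward by the same angle.
- Pixel coordinates are measured from the principal point.
- The rays are intersected in the x-z plane using only the horizontal image coordinates.

The function name `triangulate_verged` and this geometry are our assumptions, not necessarily the exact formulation used by the authors.

```python
import math


def triangulate_verged(p, q, baseline_m, focal_mm, k_alpha, pan_deg):
    """Triangulate one scene point (X, Y, Z) from a verged stereo pair.

    Assumed geometry (illustrative): world origin at the baseline midpoint,
    left/right cameras at x = -B/2 and x = +B/2, both panned inward by
    pan_deg, image coordinates in pixels relative to the principal point.
    Only the left row coordinate p[1] is used, consistent with p_y = q_y.
    """
    F = k_alpha * focal_mm                  # focal length in pixels
    phi = math.radians(pan_deg)
    c, s = math.cos(phi), math.sin(phi)
    px, py = p
    qx = q[0]
    # x and z components of each viewing ray, rotated into the world frame.
    uL, wL = px * c + F * s, F * c - px * s
    uR, wR = qx * c - F * s, F * c + qx * s
    denom = uL * wR - uR * wL
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; cannot triangulate")
    Z = baseline_m * wL * wR / denom        # ray intersection in the x-z plane
    X = -baseline_m / 2 + Z * uL / wL
    Y = Z * py / wL
    return X, Y, Z


# Usage: parallel cameras (pan 0), F = 100 px/mm * 10 mm = 1000 px.
print(triangulate_verged((100, 50), (-100, 50), 0.5, 10.0, 100.0, 0.0))  # -> (0.0, 0.125, 2.5)
```

With a zero pan angle the function reduces to the familiar parallel-stereo relations Z = BF/d, X = -B/2 + Z p_x/F, and Y = Z p_y/F, where d = p_x - q_x is the disparity.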
3. ACTIVE APPEARANCE MODEL (AAM) FITTING
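Matthews and Baker use independent AAMs, in which shape and appearance are modeled separately. The shape $s$ can be expressed as the sum of a base shape $s_0$ and a linear combination of $n$ shape vectors $s_i$,

$$s = s_0 + \sum_{i=1}^{n} p_i s_i,$$

where $p_i$ are the shape parameters. Similarly, the appearance $A(\mathbf{x})$ can be expressed as the sum of the base appearance $A_0(\mathbf{x})$ and a linear combination of $m$ basis images $A_i(\mathbf{x})$,

$$A(\mathbf{x}) = A_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{x}),$$

where the pixels $\mathbf{x}$ lie on the base mesh $s_0$. Fitting the AAM to an input image $I$ means minimizing the error image between the appearance $A(\mathbf{x})$ and the input image warped onto the base mesh, $I(\mathbf{W}(\mathbf{x};\mathbf{p}))$. Here $\mathbf{W}(\mathbf{x};\mathbf{p})$ is the piecewise affine warp defined by the shape parameters. The quantity minimized is

$$\sum_{\mathbf{x} \in s_0} \Big[ A_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{x}) - I(\mathbf{W}(\mathbf{x};\mathbf{p})) \Big]^2 .$$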
For this work, the error image is minimized using the project-out version of the inverse compositional image alignment (ICIA) algorithm of Matthews and Baker.
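The project-out variant relies on one observation. For an orthonormal appearance basis, the optimal appearance parameters have a closed form, so the error image can be evaluated in the subspace orthogonal to span{A_i}.

The NumPy sketch below shows only this projection step, assuming orthonormal basis images. It omits the piecewise affine warp, the steepest-descent images, and the inverse compositional Gauss-Newton update. The function name `project_out_error` is hypothetical.

```python
import numpy as np


def project_out_error(A0, A, warped):
    """Project appearance variation out of the AAM error image.

    A0:     (N,) base appearance sampled at the base-mesh pixels.
    A:      (m, N) appearance basis images, assumed to have orthonormal rows.
    warped: (N,) input image warped onto the base mesh, I(W(x; p)).
    Returns the error component orthogonal to span{A_i} and the
    appearance parameters that minimize the full squared error.
    """
    e = warped - A0
    lam = A @ e                  # closed-form optimum for an orthonormal basis
    e_perp = e - A.T @ lam       # residual outside the appearance subspace
    return e_perp, lam


# Synthetic example: 50 base-mesh pixels, 4 orthonormal basis images.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 4)))   # orthonormal columns
A = Q.T
A0 = rng.standard_normal(50)
warped = A0 + A.T @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.01 * rng.standard_normal(50)
e_perp, lam = project_out_error(A0, A, warped)
print(np.round(lam, 2), float(e_perp @ e_perp))
```

Because the projected-out cost no longer depends on the appearance parameters, the shape update can be computed first. The parameters `lam` can then be recovered afterwards in one step.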
AAM Initialization: To facilitate successful fitting, the AAM mesh is initialized from the detected face landmarks (eye centers, mouth center, and nose tip). Fig. 2 shows the detected face landmarks. After these features are detected, the AAM base mesh is warped to these points.
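The text does not specify how the base mesh is warped to the detected landmarks. One plausible choice, sketched below as an assumption, is a least-squares 2D similarity transform (scale, rotation, translation). The transform is estimated from the four landmark correspondences and then applied to every base-mesh vertex. Complex arithmetic keeps the closed-form solution short.

The function names and the landmark coordinates are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares 2D similarity transform z -> a*z + b mapping src onto dst.

    src, dst: equal-length lists of (x, y) landmark positions. The complex
    coefficient a encodes scale and rotation; b encodes translation.
    """
    zs = [complex(x, y) for x, y in src]
    zd = [complex(x, y) for x, y in dst]
    n = len(zs)
    ms, md = sum(zs) / n, sum(zd) / n
    cs = [z - ms for z in zs]
    cd = [z - md for z in zd]
    a = sum(w * z.conjugate() for z, w in zip(cs, cd)) / sum(abs(z) ** 2 for z in cs)
    b = md - a * ms
    return a, b


def apply_similarity(a, b, pts):
    """Apply z -> a*z + b to a list of (x, y) points (e.g. all base-mesh vertices)."""
    return [((a * complex(x, y) + b).real, (a * complex(x, y) + b).imag) for x, y in pts]


# Hypothetical base-mesh landmarks: left eye, right eye, nose tip, mouth center.
base_landmarks = [(-30.0, 0.0), (30.0, 0.0), (0.0, 35.0), (0.0, 60.0)]
detected = [(212.0, 140.0), (268.0, 146.0), (236.0, 178.0), (233.0, 202.0)]
a, b = fit_similarity(base_landmarks, detected)
initial_mesh = apply_similarity(a, b, base_landmarks)
```

In practice the same `a` and `b` would be applied to all vertices of the base mesh s0, not only to the four landmarks.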
The logRGB space is used to detect candidate face regions. Each candidate is accepted or rejected as a face by attempting to detect a pair of eyes, and eye detection is based on the concept of eigeneyes. Mouth center detection transforms the image by a linear combination of the red, green, and blue components of the RGB color space; the transformed image emphasizes lip pixels. The nose tip is estimated by finding the centers of mass of two nostril candidates and taking the mean of these centers. Nostril candidates are selected by their low red response in the RGB space and by their distance to the mouth and eye centers.
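The nose-tip rule described above takes the mean of the two nostril centroids, which is easy to state in code.

The candidate-selection step below is a hypothetical simplification. It keeps blobs whose centroid lies between the eye row and the mouth row, then takes the two with the lowest mean red value. The actual thresholds and distance constraints are not given in the text.

```python
def centroid(pixels):
    """Center of mass of a blob given as a list of (row, col) pixel coordinates."""
    n = len(pixels)
    return sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n


def select_nostrils(candidates, eye_row, mouth_row):
    """Pick two nostril blobs (hypothetical rule).

    candidates: list of (pixels, mean_red) pairs. Keeps blobs whose centroid
    lies strictly between the eye row and the mouth row (rows grow downward)
    and returns the two with the lowest mean red response.
    """
    inside = [(px, red) for px, red in candidates
              if eye_row < centroid(px)[0] < mouth_row]
    if len(inside) < 2:
        raise ValueError("fewer than two nostril candidates")
    inside.sort(key=lambda item: item[1])
    return inside[0][0], inside[1][0]


def nose_tip(nostril_a, nostril_b):
    """Nose tip taken as the mean of the two nostril centroids."""
    (ra, ca), (rb, cb) = centroid(nostril_a), centroid(nostril_b)
    return (ra + rb) / 2, (ca + cb) / 2
```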
Figure 2: Detection of face features.
4. FEATURES FOR FACE RECOGNITION
For face recognition, we compare four ways of using the 3D face vertices, derived from AAM fitting and stereo reconstruction, to identify probe images against the gallery:

(a) feature vectors derived from Principal Component Analysis (PCA) of the 3D vertices;
(b) a goodness-of-fit criterion (Procrustes distance) after rigidly registering a probe with a gallery subject;
(c) feature vectors from PCA of the x-y plane projections of the 3D vertices;
(d) the same procedure as (b), but using the x-y plane projections of the 3D vertices.

The use of the x-y plane projections is explained in the experimental results section.
Principal Component Analysis (PCA): To use PCA for classification, the first step is to compute the matrix $P$ of principal components from a training database, i.e., the leading eigenvectors of the training covariance matrix. The feature vectors $Y$ are then obtained as $Y = P^{T}X$, where $X$ is the centered input data.
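A minimal NumPy sketch of this feature extraction follows, under these assumptions:

- Each training row is one flattened vertex vector, e.g. 68 × 3 = 204 values for 3D vertices or 68 × 2 = 136 for x-y projections.
- Identification uses nearest-neighbor matching with Euclidean distance in the feature space. This matching rule is our assumption.

With samples stored as rows, `(X - mean) @ P` is the row-wise form of Y = P^T X. The function names are hypothetical.

```python
import numpy as np


def fit_pca(train, k):
    """Learn the k leading principal components from training vectors.

    train: (num_samples, d) array, one flattened vertex vector per row.
    Returns the training mean and P, a (d, k) matrix with orthonormal columns.
    """
    mean = train.mean(axis=0)
    # Right singular vectors of the centered data are the covariance
    # eigenvectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, Vt[:k].T


def pca_features(x, mean, P):
    """Row-wise form of Y = P^T X: project centered sample(s) onto the components."""
    return (x - mean) @ P


def rank_gallery(probe_feat, gallery_feats):
    """Gallery indices sorted by Euclidean distance to the probe (closest first)."""
    dists = np.linalg.norm(gallery_feats - probe_feat, axis=1)
    return np.argsort(dists)
```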
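Goodness-of-fit criterion (Procrustes): The squared Procrustes distance between two shapes $\mathbf{x}_1$ and $\mathbf{x}_2$, each with $n$ points, is the sum of squared point distances after alignment:

$$P_d^2 = \sum_{j=1}^{n} \big[ (x_{j1} - x_{j2})^2 + (y_{j1} - y_{j2})^2 \big],$$

with an additional $(z_{j1} - z_{j2})^2$ term for 3D shapes. Rigid alignment solves for the rotation $R$ and translation $\mathbf{t}$ that minimize $\lVert \mathbf{x}_2 - (R\mathbf{x}_1 + \mathbf{t}) \rVert^2$.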
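For the x-y projections, rigid Procrustes alignment has a compact closed form when 2D points are treated as complex numbers. Centering removes translation. The optimal rotation is the unit-modulus complex number with the phase of the sum of conj(a_j) b_j over the centered points.

The sketch below returns the squared distance after this alignment. The function names are hypothetical. For 3D vertices, the rotation would instead come from an SVD-based (Kabsch) solution.

```python
def xy_projection(vertices_3d):
    """Orthographic projection onto the x-y plane (drop z)."""
    return [(x, y) for x, y, _ in vertices_3d]


def rigid_procrustes_distance_2d(a_pts, b_pts):
    """Squared Procrustes distance after rigidly aligning a_pts onto b_pts.

    Rigid here means rotation + translation (no scaling, no reflection).
    Both lists must hold corresponding points in the same order, e.g. the
    same AAM vertex indices for a probe and a gallery face.
    """
    za = [complex(x, y) for x, y in a_pts]
    zb = [complex(x, y) for x, y in b_pts]
    n = len(za)
    ma, mb = sum(za) / n, sum(zb) / n
    ca = [z - ma for z in za]            # centering removes translation
    cb = [z - mb for z in zb]
    cross = sum(w * z.conjugate() for z, w in zip(ca, cb))
    rot = cross / abs(cross) if abs(cross) > 0 else 1.0   # unit complex = rotation
    return sum(abs(w - rot * z) ** 2 for z, w in zip(ca, cb))


def identify(probe_xy, gallery_xy):
    """Return gallery ids (dict keys) sorted from best to worst Procrustes fit."""
    return sorted(gallery_xy,
                  key=lambda gid: rigid_procrustes_distance_2d(probe_xy, gallery_xy[gid]))
```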
5. EXPERIMENTAL RESULTS
To test our framework, we used our 3D acquisition system to build a human face database of 30 subjects at different ranges in controlled environments. Each subject stands in front of the system while the cameras are adjusted so that the centers of their fields of view converge on the subject's nose tip. Our database consists of a gallery captured at 3 meters and two probe sets captured at 3 and 15 meters.
The AAM is trained on images from the gallery. Fig. 3 shows AAM fitting results on probe images at two different ranges. The vertices of the final AAM meshes in the left and right images form a set of corresponding point pairs, which can be used for stereo reconstruction. Fig. 4 shows stereo reconstruction results for three faces (two from the same subject, one from a different subject). They are visualized as x-y, x-z, and y-z projections after rigid alignment to one of the faces. In the x-y projections, shapes from the same subject look more alike and shapes from different subjects look more distinct. This is the main reason for using x-y projections as features in Sec. 4.
Figure 3: AAM Fitting Results. First column is the input image.
Second column is the final AAM result with superimposed best-fit
texture. Third column shows final AAM vertices.
Fig. 5 shows the cumulative match characteristic (CMC) curves for the four types of feature vectors described in Sec. 4, using the 3-meter and 15-meter probe sets.
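For reference, a CMC curve can be computed from a probe-by-gallery distance matrix as sketched below. The sketch assumes that a lower distance means a better match and that each probe's identity appears in the gallery. The function name and data layout are illustrative.

```python
def cmc_curve(dist, probe_ids, gallery_ids, max_rank=None):
    """Cumulative match characteristic from a distance matrix.

    dist[i][j]: distance between probe i and gallery entry j (lower = better).
    Entry r-1 of the result is the fraction of probes whose true identity
    appears among the r closest gallery entries.
    """
    n_gal = len(gallery_ids)
    max_rank = max_rank or n_gal
    hits = [0] * max_rank
    for i, pid in enumerate(probe_ids):
        order = sorted(range(n_gal), key=lambda j: dist[i][j])
        ranked_ids = [gallery_ids[j] for j in order]
        if pid in ranked_ids:
            rank = ranked_ids.index(pid)         # 0-based rank of first correct match
            for r in range(rank, max_rank):
                hits[r] += 1
    return [h / len(probe_ids) for h in hits]


# Toy example: three probes against a three-subject gallery.
dist = [[0.1, 0.5, 0.9], [0.4, 0.2, 0.3], [0.05, 0.6, 0.1]]
print(cmc_curve(dist, ["A", "B", "C"], ["A", "B", "C"]))
```

The rank-1 value of the curve is the identification rate, and the curve is non-decreasing by construction.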
6. DISCUSSION OF RESULTS
From Fig. 5, we can draw three conclusions:

(a) both 2D Procrustes and 2D PCA outperform 3D Procrustes and 3D PCA;
(b) the goodness-of-fit criterion (Procrustes) outperforms PCA in both 2D and 3D;
(c) recognition degrades as the distance increases.
Figure 4: Reconstruction results. The 3D points are visualized as projections in the x-y, x-z, and y-z planes. Red (circle) and green (diamond) markers belong to the same subject, while the blue (square) marker is that of a different subject.

Conclusion (a) can be explained with the help of Fig. 6, which shows the top view of a simple stereo system. Ol and Or are the centers of projection, and pl, pr, ql, qr are points on the left and right images. Assume that the y-coordinates of the four image points are equal (see Sec. 2). Then pl and pr reconstruct P, ql and qr reconstruct Q, and so on.

A small change in the correspondence changes the reconstructed 3D points substantially, so the Euclidean distances between them are large. When the points are projected onto the x-y plane, however, their 2D Euclidean distances are considerably smaller. A similar effect likely occurs with the correspondences between the AAM vertices of the left and right images.

In effect, stereo acts as pose correction for the vertices in Fig. 3. The left and right AAM vertices cannot support recognition by themselves because of pose variation. After stereo reconstruction and orthographic projection onto the x-y plane, however, the true 2D shape of the face is recovered.
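The argument based on Fig. 6 can be checked numerically. The sketch below assumes a rectified parallel pair (zero pan angle) and perturbs only the right-image match, i.e., the disparity. Because X = Z x_l/F and Y = Z y_l/F, the change in X or Y equals the depth error scaled by x_l/F or y_l/F. That factor is small for points near the image center.

The baseline, focal length, and pixel values below are hypothetical, and the function names are illustrative.

```python
def reconstruct_parallel(xl, yl, disparity, B, F):
    """Rectified parallel stereo: Z = B*F/d, X = Z*xl/F, Y = Z*yl/F."""
    Z = B * F / disparity
    return Z * xl / F, Z * yl / F, Z


def correspondence_sensitivity(xl, yl, disparity, d_err, B, F):
    """Absolute change in (X, Y, Z) when the right-image match (hence the
    disparity) is off by d_err pixels while the left-image point is fixed."""
    X0, Y0, Z0 = reconstruct_parallel(xl, yl, disparity, B, F)
    X1, Y1, Z1 = reconstruct_parallel(xl, yl, disparity + d_err, B, F)
    return abs(X1 - X0), abs(Y1 - Y0), abs(Z1 - Z0)


# Hypothetical rig: 1 m baseline, 20000 px focal length, point about 15 m away.
B, F = 1.0, 20000.0
d = B * F / 15.0
dX, dY, dZ = correspondence_sensitivity(200.0, -150.0, d, 2.0, B, F)
print(f"dX = {dX * 1000:.2f} mm, dY = {dY * 1000:.2f} mm, dZ = {dZ * 100:.2f} cm")
```

With these values a 2-pixel correspondence error moves the depth by about 2 cm but moves X by only about 0.2 mm. This is the asymmetry the x-y projection exploits.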
Conclusion (b) relates to the primary purpose of PCA. PCA minimizes reconstruction error in the mean-square-error (MSE) sense rather than maximizing class separability. Projecting the original shape vectors onto a low-dimensional space may therefore discard information that is useful for classification. Rigid alignment with Procrustes involves no dimensionality reduction, and similar shapes are expected to have a small Procrustes distance after rigid alignment.
Results are expected to degrade with distance, since images captured at longer range are acquired under less ideal conditions. Medioni et al. address identification at a distance using 3D shape recovered from a sequence of captured images. Our results at the 15-meter range (Fig. 5(b)) are better than their 9-meter results. However, much of this gap may be due to their uncontrolled acquisition environment versus our controlled one.
The results in Fig. 5 are based on experiments that use all 68 AAM vertices. Additional experiments investigate the effect of using only a subset of the vertices for recognition. Here, the points are the x-y projections of the original 3D vertices. Recognition uses the goodness-of-fit criterion, since this combination performs best in Fig. 5.

Fig. 7 illustrates the test cases, in which the number of facial points gradually increases. The first case contains only the positions of the two eyes, the nose tip, and the mouth center. Case 5 considers the outlines of the facial parts. The last case uses all points produced by AAM fitting.
Fig. 8 shows the CMC curves for the test cases in Fig. 7. Two conclusions can be drawn:

(a) with the 3-meter probe set, Case 3 is sufficient for perfect recognition;
(b) with the 15-meter probe set, Cases 3 to 5 outperform the full set of AAM points at rank 1.

For the 15-meter probe set, the points inside the mouth contour and the face outline points contribute to the error. This is expected from the correspondence problem. The face outline points and the inner mouth points lie in homogeneous regions of the face, so they are prone to correspondence artifacts that affect depth estimation (see Fig. 6).
These conclusions hold at least for this database. Further work will involve testing them on larger databases, e.g., by increasing the number of subjects in our database.
7. CONCLUSIONS AND FUTURE WORK
In this paper, we have studied the use of sparsely reconstructed points from the AAM vertices of a stereo pair in the context of long-distance recognition. Using our database of images taken at the 3-meter and 15-meter ranges, we have shown the potential of using these few vertices rather than the whole set of points of the human face. Results show performance comparable to the state of the art. The next steps are to increase the size of our database and to refine the correspondence between the AAM vertices of the left and right images.
Figure 5: Cumulative match characteristic (CMC) curves: (a)
using the 3-meter probe set and (b) 15-meter probe set.
Figure 6: Simple stereo illustration. 3D reconstruction is sensitive
to the correspondence problem but projection to the x-y plane
minimizes the error.
Figure 7: Visualization for test cases involving increasing the
number of points used for recognition.
Figure 8: Cumulative match characteristic (CMC) curves of the test cases in Fig. 7: (a) using the 3-meter probe set and (b) using the 15-meter probe set.
REFERENCES

W. Zhao, R. Chellappa, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, 35, pp. 399–458, 2003.
Yao et al., "Improving long range and high magnification face recognition: Database acquisition, evaluation, and enhancement," CVIU, 111(2), 2008.
G. Medioni et al., "Non-Cooperative Persons Identification at a Distance with 3D Face Modeling," BTAS, 2007.
I. Matthews and S. Baker, "Active Appearance Models Revisited," IJCV, 2004.
W. Huang, Q. Sun, C. Lam, and J. Wu, "A robust approach to face and eyes detection from images with cluttered background."
E. Gomez et al., "Biometric identification system by lip shape," Proc. 36th Annual International Carnahan Conference on Security Technology, 2002.
P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using Class Specific Linear Projection," IEEE Trans. PAMI, 19(7), 1997.
T. F. Cootes and C. J. Taylor, "Statistical Models of Appearance for Computer Vision," Technical Report, University of Manchester, UK, March 2004.