Frontal view recognition in multiview video
sequences
I. Kotsia1,2, N. Nikolaidis1,2 and I. Pitas1,2
1Aristotle University of Thessaloniki, Department of Informatics, Box 451, 54124, Greece
2Informatics and Telematics Institute, CERTH, Greece
Abstract—In this paper, a novel method is proposed as a solution to the problem of frontal view recognition from multiview image sequences. Our aim is to correctly identify the view that corresponds to the camera placed in front of a person, or the camera whose view is closest to a frontal one. By doing so, frontal face images of the person can be acquired and used in face or facial expression recognition techniques that require frontal faces to achieve satisfactory results. The proposed method first applies the Discriminant Non-negative Matrix Factorization (DNMF) algorithm to the input images acquired from every camera. The output of the algorithm is then used as input to a Support Vector Machines (SVM) system that classifies the head poses acquired from the cameras into two classes, corresponding to the frontal and non-frontal poses. Experiments conducted on the IDIAP database demonstrate that the proposed method achieves an accuracy of 98.6% in frontal view recognition.
I. INTRODUCTION
The aim of the proposed method is to take advantage
of existing face or facial expression recognition techniques
that require frontal faces. The scenario under consideration
includes multiple cameras that are placed at certain known
angles in a convergent setup in order to properly capture the
movements of a person. In such a scenario, the proposed
algorithm can be used to identify the view that corresponds to
the camera placed in front of a person, or the camera whose
view is closer to a frontal one. By doing so, frontal face
images of the person can be acquired and fed to a face or
facial expression recognition technique that requires frontal
faces. The face or facial expression recognition problem task
is thus approached in a multiview environment, leading to
viewindependent face or facial expression recognition.
Two cases can be handled by such an approach. The
first case assumes that the person’s head pose remains the
same throughout the video sequence, whereas the second one
assumes that the person’s head pose changes through time.
In the latter case, the camera that provides a frontal view
should be detected at each frame. It should be noted that the
proposed technique can also be used in an analogous manner
to utilize existing frontal face or facial expression
recognition techniques in a multiview environment.
For the proposed method, the images (frames) acquired from each camera are used as input to the Discriminant Non-negative Matrix Factorization (DNMF) algorithm. DNMF is a matrix decomposition algorithm that extends the Non-negative Matrix Factorization (NMF) algorithm. NMF is an unsupervised algorithm that allows only additive combinations of non-negative components, whereas DNMF resulted from an attempt to introduce discriminant information into the NMF decomposition in a supervised manner. The NMF and DNMF algorithms will be presented analytically below. DNMF decomposes an image into a linear combination of basis images. The DNMF output, namely the decomposition coefficients, is then fed into a Support Vector Machines (SVM) system that performs the final classification into the two desired classes (frontal or non-frontal facial images). A diagram of the proposed system is depicted in Figure 1.
Fig. 1. Diagram of the proposed system
II. DISCRIMINANT NONNEGATIVE MATRIX
FACTORIZATION ALGORITHMS
In this Section, the Non-negative Matrix Factorization
(NMF) algorithm and the procedure followed to formulate its
variant, the DNMF approach [1], are briefly presented.
Let an image be scanned row-wise so as to form a vector x = [x_1 ... x_F]^T for the NMF algorithm. The basic idea behind NMF is to approximate the image x (with small approximation error) by a linear combination of a set of basis images in Z ∈ ℜ_+^{F×M}, whose coefficients are the elements of h ∈ ℜ_+^M, such that x ≈ Zh. In order to train the NMF, the matrix X is constructed, where x_{ij} is the ith element of the jth image vector. In other words, the jth column of X is the facial image x_j. NMF aims at finding two matrices Z and H such that:
X ≈ ZH.
(1)
Obviously, the application of NMF requires the evaluation
of the basis images in Z. This is done by a training phase that
requires a set of training images x1...xT.
After the NMF decomposition, the facial image x_j can be
written as x_j ≈ Zh_j, where h_j is the jth column of H.
Thus, the M columns of the matrix Z can be considered as
the M basis images and the vector h_j as the weight vector that
corresponds to the image x_j. The vector h_j can also be considered
as the projection of x_j onto a lower dimensional space.
The cost for the decomposition (1) can be defined as the
sum of all KL divergences for all images in the database:
D(X‖ZH) = Σ_j KL(x_j‖Zh_j) = Σ_{i,j} ( x_{i,j} ln( x_{i,j} / Σ_k z_{i,k}h_{k,j} ) + Σ_k z_{i,k}h_{k,j} − x_{i,j} ).   (2)
The NMF factorization is the outcome of the following optimization problem:
min_{Z,H} D(X‖ZH)  subject to  z_{i,k} ≥ 0, h_{k,j} ≥ 0, Σ_i z_{i,j} = 1, ∀j.   (3)
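For concreteness, the optimization (3) is typically solved with multiplicative update rules. The following Python sketch implements the standard Lee–Seung updates for the KL cost (2), with the basis columns renormalized to satisfy the sum-to-one constraint of (3); it illustrates plain NMF only, not the discriminant variant derived below.

```python
import numpy as np

def nmf_kl(X, M, n_iter=200, seed=0):
    """Sketch of NMF under the KL cost of Eq. (2).

    X: (F, T) non-negative data matrix, one image per column.
    M: number of basis images (columns of Z).
    Returns Z (F, M) with unit column sums, and H (M, T).
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    Z = rng.random((F, M)) + 0.1
    H = rng.random((M, T)) + 0.1
    s = Z.sum(axis=0)
    Z /= s
    H *= s[:, None]                                # compensate so ZH is unchanged
    for _ in range(n_iter):
        R = X / (Z @ H + 1e-12)                    # element-wise ratio X / (ZH)
        H *= (Z.T @ R) / (Z.sum(axis=0)[:, None] + 1e-12)
        R = X / (Z @ H + 1e-12)
        Z *= (R @ H.T) / (H.sum(axis=1)[None, :] + 1e-12)
        s = Z.sum(axis=0)                          # renormalize: sum_i z_ij = 1
        Z /= s
        H *= s[:, None]
    return Z, H

def kl_cost(X, Z, H):
    """KL divergence D(X || ZH) of Eq. (2)."""
    V = Z @ H + 1e-12
    return np.sum(X * np.log((X + 1e-12) / V) + V - X)
```

Because the renormalization rescales the columns of Z and compensates the rows of H by the same factors, the product ZH, and hence the cost (2), is unaffected by it.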
In order to formulate the DNMF algorithm, let the matrix X contain all the facial images, organized in two classes r = {1, 2}. The first class consists of the frontal images, while the second one consists of the non-frontal images. The jth column of X is the ρth image of the rth image class. Thus, j = Σ_{i=1}^{r−1} N_i + ρ, where N_i is the cardinality of the image class i. It should be noted that the frontal image class consists of the images corresponding to one camera view, provided that the person does not move (pan, roll) his head during the acquisition; in this case, the images from the other cameras make up the non-frontal image class. If this is not the case, the images should be assigned to the two classes manually.
The columns of the matrix H are divided into two sets, each set containing the vectors h_j corresponding to each class r. The vector h_j that corresponds to the jth column of the matrix H is the coefficient vector for the ρth facial image of the rth class and will be denoted as η_ρ^{(r)} = [η_{ρ,1}^{(r)} ... η_{ρ,M}^{(r)}]^T. The mean vector of the vectors η_ρ^{(r)} for the class r is denoted as μ^{(r)} = [μ_1^{(r)} ... μ_M^{(r)}]^T and the mean of all classes as μ = [μ_1 ... μ_M]^T. Then, the within-class scatter matrix for the coefficient vectors h_j is defined as:
S_w = Σ_{r=1}^{K} Σ_{ρ=1}^{N_r} (η_ρ^{(r)} − μ^{(r)})(η_ρ^{(r)} − μ^{(r)})^T   (4)
whereas the between-class scatter matrix is defined as:
S_b = Σ_{r=1}^{K} N_r (μ^{(r)} − μ)(μ^{(r)} − μ)^T.   (5)
The matrix S_w defines the scatter of the coefficient vectors around their class means. The dispersion of samples that belong to the same class around their corresponding mean should be as small as possible. A convenient measure for this dispersion is the trace of S_w.
The matrix S_b defines the scatter of the mean vectors of all classes around the global mean μ. Each class should lie as far as possible from the other classes; therefore, the trace of S_b should be as large as possible.
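As an illustration of definitions (4) and (5), the two scatter matrices can be computed from the coefficient matrix H and the class labels as follows (μ is taken here as the global mean of all coefficient vectors):

```python
import numpy as np

def scatter_matrices(H, labels):
    """Within- and between-class scatter of the coefficient vectors (Eqs. 4-5).

    H: (M, N) matrix whose columns are the coefficient vectors h_j.
    labels: length-N array of class indices r.
    Returns (Sw, Sb), both of shape (M, M).
    """
    mu = H.mean(axis=1, keepdims=True)            # global mean of all vectors
    M = H.shape[0]
    Sw = np.zeros((M, M))
    Sb = np.zeros((M, M))
    for r in np.unique(labels):
        Hr = H[:, labels == r]
        mur = Hr.mean(axis=1, keepdims=True)      # class mean
        D = Hr - mur
        Sw += D @ D.T                             # scatter around class mean
        Sb += Hr.shape[1] * (mur - mu) @ (mur - mu).T
    return Sw, Sb
```

For this choice of μ, the within- and between-class scatter matrices add up to the total scatter of the coefficient vectors, which provides a convenient correctness check.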
To formulate the DNMF method [2], discriminant constraints inspired by the minimization of Fisher's criterion [2] have been incorporated into the NMF decomposition. The DNMF cost function is given by:
D_d(X‖ZH) = D(X‖ZH) + γ tr[S_w] − δ tr[S_b]   (6)
where γ and δ are non-negative constants. The update rules that guarantee a non-increasing behavior of (6) for the weights h_{k,j} and the bases z_{i,k}, under the constraints of (3), can be found in [2].
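Given a factorization and the class labels, the DNMF objective (6) can be evaluated directly; the sketch below combines the KL term of (2) with the traces of (4) and (5) (the values of γ and δ are illustrative, not those used in [2]):

```python
import numpy as np

def dnmf_cost(X, Z, H, labels, gamma=0.1, delta=0.1):
    """DNMF objective of Eq. (6): KL fit plus the discriminant terms.

    The traces tr[Sw] and tr[Sb] are accumulated as sums of squared
    deviations, avoiding the explicit scatter matrices.
    """
    V = Z @ H + 1e-12
    kl = np.sum(X * np.log((X + 1e-12) / V) + V - X)   # D(X || ZH), Eq. (2)
    mu = H.mean(axis=1, keepdims=True)                 # global mean
    tr_sw = 0.0
    tr_sb = 0.0
    for r in np.unique(labels):
        Hr = H[:, labels == r]
        mur = Hr.mean(axis=1, keepdims=True)
        tr_sw += np.sum((Hr - mur) ** 2)               # tr[Sw] contribution
        tr_sb += Hr.shape[1] * np.sum((mur - mu) ** 2) # tr[Sb] contribution
    return kl + gamma * tr_sw - delta * tr_sb
```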
Once the basis images have been calculated by the application of DNMF on the training face images, the facial image acquired from a certain camera is projected onto the derived lower dimensional feature space, g̃ = Z^T x, and is then fed into an SVM system that decides whether the facial image under examination is frontal or not. A brief description of the SVM system used [3] is presented below.
III. SUPPORT VECTOR MACHINES CLASSIFIER
In order to decide whether the facial image under examination is frontal or not, the output of the DNMF algorithm is used as input to a two-class SVM system. The SVM is trained with the frontal pose images in the set U_1 = {(g_j, y_j), j = 1,...,M, y_j = 1} as positive examples and all non-frontal pose images in U_2 = {(g_j, y_j), j = 1,...,K, y_j = −1} as negative examples, where g_j is the output of the DNMF algorithm and y_j is the image label.
The SVMs used for our experiments were proposed in [3] and are a variant of the typical maximum margin SVMs. They are inspired by the optimization of Fisher's discriminant ratio and incorporate statistical information about the classes under examination. The typical maximum margin SVM, as well as the variant that was used for the experiments, is presented in detail below.
A. Maximum margin SVMs
In order to train the SVM, the following minimization problem has to be solved [4]:
min_{w_k, b_k, ξ^k}  (1/2) w_k^T w_k + C_k Σ_{j=1}^{N} ξ_j^k   (7)
subject to the separability constraints:
y_j^k (w_k^T φ(g_j) + b_k) ≥ 1 − ξ_j^k,  ξ_j^k ≥ 0,  j = 1,...,N   (8)
where b_k is the bias for the kth SVM, ξ^k = [ξ_1^k, ..., ξ_N^k]^T is the slack variable vector and C_k is the term that penalizes the training errors.
After solving the optimization problem (7) subject to the separability constraints (8) ([5], [6]), the function that decides whether the facial image corresponds to a frontal pose is:
f_k(g) = sign(w_k^T φ(g) + b_k)   (9)
where G is an arbitrary dimensional Hilbert space [7] and φ : ℜ^L → G. In this formulation, a nonlinear mapping φ is used to map the data to a high dimensional feature space; for a linear SVM, φ(g) = g. This mapping is defined by a positive kernel function h(g_i, g_j) that specifies an inner product in the feature space and satisfies the Mercer condition [5], [6]:
h(g_i, g_j) = φ(g_i)^T φ(g_j).   (10)
The functions used as SVM kernels were the d-degree polynomial function:
h(g_i, g_j) = (g_i^T g_j + 1)^d   (11)
and the Radial Basis Function (RBF) kernel:
h(g_i, g_j) = exp(−γ ‖g_i − g_j‖^2)   (12)
where γ is the spread of the Gaussian function.
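Both kernels (11) and (12) are straightforward to implement; a minimal sketch:

```python
import numpy as np

def poly_kernel(gi, gj, d=2):
    """d-degree polynomial kernel of Eq. (11)."""
    return (gi @ gj + 1.0) ** d

def rbf_kernel(gi, gj, gamma=0.1):
    """RBF kernel of Eq. (12); gamma is the spread parameter."""
    diff = gi - gj
    return np.exp(-gamma * (diff @ diff))
```

Note that h(g, g) = 1 for the RBF kernel and that both kernels are symmetric in their arguments, as required by (10).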
B. SVMs proposed in [3]
In order to form the optimization problem of the SVMs proposed in [3], we should define the within-class scatter matrix of the training set:
S_w^k = Σ_{g_i∈U_1} (g_i − μ_k^1)(g_i − μ_k^1)^T + Σ_{g_i∈U_2} (g_i − μ_k^2)(g_i − μ_k^2)^T   (13)
where μ_k^1 and μ_k^2 are the mean vectors of the classes U_1 and U_2, respectively. It is assumed that the within-class scatter matrix S_w^k is invertible (which is true in our case, since the dimensionality of the vector g_i is typically smaller than the number of available training examples). The optimization problem of the modified SVMs is [3]:
min_{w_k, b_k, ξ^k}  w_k^T S_w^k w_k + C_k Σ_{j=1}^{N} ξ_j^k   (14)
subject to the separability constraints (8) (here we refer to the
linear case where φ(g) = g).
The linear decision function that decides whether the facial
image under examination corresponds to a frontal pose or not,
is:
f_k(g) = sign(w_k^T g + b_k) = sign( (1/2) Σ_{j=1}^{N} y_j^k a_j^k g_j^T (S_w^k)^{−1} g + b_k ).   (15)
IV. EXPERIMENTAL RESULTS
Due to the lack of multiview data with accompanying ground truth, experiments were performed on the IDIAP database [8]. The database comprises 23 video sequences involving people engaged in natural activities. In total, 16 different subjects participate in the video database.
Fig. 2. An example of the IDIAP database
Fig. 3. Examples of frontal (upper row) and non-frontal (lower row) facial images from the IDIAP database
contains head pose ground truth in the form of pan, tilt and roll
angles (i.e. Euler angles with respect to the camera coordinate
system) for each frame of the video sequences.
Face detection and tracking were applied on the images
acquired from the video cameras and the resulting Regions
Of Interest (ROI) were fed into the DNMF algorithm. An
example of the results of a face tracker for a video from the
IDIAP database is shown in Figure 2.
For the experiments, appropriate ground-truth data were extracted from the IDIAP database. The images regarded as frontal facial images included images with slight pan and roll movements, taking into consideration that in a multiview environment the camera positions might be such that no camera captures a perfectly frontal image. In this case, the view that is closest to a frontal one should be detected. Examples of facial images that were assigned to the frontal facial pose class (allowing a head displacement of 10° in all axes) are depicted in the first row of Figure 3, while the second row shows examples from the non-frontal facial class (head rotation of more than 10° in any axis).
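The assignment of ground-truth pose labels described above reduces to thresholding the Euler angles provided by the database; a minimal sketch (the use of all three angles is an assumption for illustration):

```python
def is_frontal(pan, tilt, roll, threshold_deg=10.0):
    """Label a head pose as frontal when every rotation angle (in
    degrees) stays within the threshold; 10 degrees corresponds to
    the setting reported in the experiments."""
    return max(abs(pan), abs(tilt), abs(roll)) <= threshold_deg
```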
The most usual approach for testing the generalization performance of an SVM classifier is the leave-one-out cross-validation approach [9], which enables maximal use of the available data and evaluates the averaged classification accuracy on the test dataset. A variant of this approach was used in our case. More specifically, all facial images contained in the database were divided into two classes, corresponding to frontal and non-frontal poses, according to the range of degrees we defined as an acceptable head rotation in each axis. Five sets, each containing 20% of the images of each class, chosen randomly, were created. One such set was used as the test set, while the remaining four sets formed the training set. After the classification procedure was performed, the samples forming the test set were incorporated into the current training set, and a new set of samples (20% of the samples for each class) was extracted to form the new test set. The remaining samples formed the new training set. This procedure was repeated five times. The average classification accuracy was calculated as the mean value of the percentages of correctly classified facial images.
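The rotating five-fold protocol described above can be sketched as follows (per-class stratification is omitted here for brevity):

```python
import random

def rotating_folds(n_samples, n_folds=5, seed=0):
    """Shuffle once, split the sample indices into five disjoint 20%
    chunks, and use each chunk in turn as the test set while the
    remaining chunks form the training set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, test
```

Averaging the per-fold accuracies over the five rotations then gives the reported average classification accuracy.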
The confusion matrix has also been computed. The confusion matrix is an n×n matrix (n being the number of classes) containing information about the actual class label lab_ac in its columns and the label obtained through classification lab_cl in its rows. The diagonal entries of the confusion matrix are the percentages of facial images that are correctly classified, while the off-diagonal entries are the percentages corresponding to misclassification rates.
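Following the convention stated above (actual labels in columns, classifier outputs in rows, entries as percentages of each actual class), the confusion matrix can be computed as:

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes=2):
    """Confusion matrix with actual class labels indexing the columns
    and classifier outputs indexing the rows; each column is
    normalized to percentages of its actual class."""
    C = np.zeros((n_classes, n_classes))
    for a, p in zip(actual, predicted):
        C[p, a] += 1
    return 100.0 * C / C.sum(axis=0, keepdims=True)
```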
The accuracies achieved when the head rotation for frontal images was within 5°, 10°, 15° and 20° in each axis were equal to 98.6%, 98.2%, 95.6% and 94.9%, respectively. The confusion matrix of the experiments when the acceptable head rotation for a frontal pose was within 10° is presented in Table 1. All the above results were achieved using an RBF kernel with γ = 0.1.
V. FUTURE WORK
The proposed frontal view detection algorithm will be combined with existing facial expression recognition algorithms that require frontal view images, in order to judge their performance as a single system on multiview data. Research towards facial expression recognition algorithms that work on multiview data and exploit all available views will also be conducted. Later on, when 3D reconstructions of persons in a scene become available, methods that operate on such data will be investigated.
VI. CONCLUSION
Frontal view recognition in multiview video sequences has been investigated in this paper. A novel method that uses the DNMF algorithm in combination with an SVM system to detect the frontal pose in images acquired from a camera has been proposed. Experiments performed on the IDIAP database yielded an accuracy of 98.6% in frontal view recognition.
ACKNOWLEDGMENTS
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 211471 (i3DPost).
REFERENCES
[1] S. Zafeiriou, A. Tefas, I. Buciu, and I. Pitas, “Exploiting discriminant
information in nonnegative matrix factorization with application to
frontal face verification,” IEEE Transactions on Neural Networks, vol. 17,
no. 3, pp. 683–695, 2006.
[2] ——, “Exploiting discriminant information in nonnegative matrix factor
ization with application to frontal face verification,” IEEE Transactions
on Neural Networks, vol. 17, no. 3, pp. 683 – 695, 2006.
[3] A. Tefas, C. Kotropoulos, and I. Pitas, “Using support vector machines
to enhance the performance of elastic graph matching for frontal face
authentication,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 23, no. 7, pp. 735–746, 2001.
[4] C. W. Hsu and C. J. Lin, “A comparison of methods for multiclass Support
Vector Machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2,
pp. 415–425, March 2002.
[5] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[6] C. J. C. Burges, “A tutorial on Support Vector Machines for Pattern
Recognition,” Data Mining and Knowledge discovery, vol. 2, no. 2, 1998.
[7] B. Scholkopf, S. Mika, C. Burges, P. Knirsch, K.R. Muller, G. Ratsch,
and A. Smola, “Input space vs. feature space in kernelbased methods,”
IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 1000–1017,
1999.
[8] S. Ba and J.M. Odobez, “Evaluation of multiple cues head pose
estimation algorithms in natural environments,” in IEEE International
Conference on Multimedia and Expo (ICME), Amsterdam, 2005.
[9] I. Cohen, N. Sebe, S. Garg, L. S. Chen, and T. S. Huang, "Facial expression recognition from video sequences: temporal and static modelling," Computer Vision and Image Understanding, vol. 91, pp. 160–187, 2003.