An improvement on PCA algorithm for Face Recognition
Vo Dinh Minh Nhat and Sungyoung Lee
Kyung Hee University, South Korea
Abstract. Principal Component Analysis (PCA) is an important and well-developed technique in image recognition, and to date many linear discrimination methods have been put forward. Despite these efforts, some weaknesses persist in the traditional PCA. In this paper, we propose a new PCA-based method that overcomes one drawback of the traditional PCA method. In face recognition, where the training data are labeled, a projection is often required to emphasize the discrimination between the clusters. PCA may fail to accomplish this, no matter how easy the task is, because it is an unsupervised technique: the directions that maximize the scatter of the data are not necessarily adequate to discriminate between clusters. We therefore propose a new PCA-based scheme that straightforwardly takes data labeling into consideration and improves the performance of the recognition system. Experimental results show that our method achieves better performance than the traditional PCA method.
Index Terms – Principal component analysis, face recognition.
1. Introduction
Principal component analysis (PCA), also known as the Karhunen-Loeve expansion, is a classical feature extraction and data representation technique widely used in the areas of pattern recognition and computer vision. Sirovich and Kirby [1], [2] first used PCA to efficiently represent pictures of human faces. They argued that any face image could be reconstructed approximately as a weighted sum of a small collection of images that define a facial basis (eigenimages), plus a mean image of the face. Within this context, Turk and Pentland [3] presented the well-known Eigenfaces method for face recognition in 1991. Since then, PCA has been widely investigated and has become one of the most successful approaches in face recognition. However, Wiskott et al. [10] pointed out that PCA could not capture even the simplest invariance unless this information is explicitly provided in the training data.
Recently, two PCA-related methods, independent component analysis (ICA) and kernel principal component analysis (Kernel PCA), have received wide attention. Bartlett et al. [11] and Draper et al. [12] proposed using ICA for face representation and found that it was better than PCA when cosines were used as the similarity measure (however, their performance was not significantly different when the Euclidean distance was used). Yang [14] used Kernel PCA for face feature extraction and recognition and showed that the Kernel Eigenfaces method outperforms the classical Eigenfaces method. However, ICA and Kernel PCA are both computationally more expensive than PCA: reported experimental results showed that ICA and Kernel PCA require, on average, about 8.7 and 3.2 times the computation time of PCA, respectively.
In face recognition, where the data are labeled, a projection is often required to emphasize the discrimination between the clusters. PCA may fail to accomplish this, no matter how easy the task is, because it is an unsupervised technique: the directions that maximize the scatter of the data are not necessarily adequate to discriminate between clusters. In this paper, we propose a PCA scheme that straightforwardly takes data labeling into consideration and improves the performance of the recognition system. The remainder of this paper is organized as follows. In Section 2, the traditional PCA method is reviewed. The idea of the proposed method and its algorithm are described in Section 3. In Section 4, experimental results on the ORL and Yale face image databases are presented to demonstrate the effectiveness of our method. Finally, conclusions are drawn in Section 5.
2. Principal Component Analysis
Let us consider a set of N sample images {x_1, x_2, ..., x_N} taking values in an n-dimensional image space, and let A = [x_1 - \mu, x_2 - \mu, ..., x_N - \mu] \in \mathbb{R}^{n \times N} be the matrix of centered samples, where \mu \in \mathbb{R}^n is the mean image of all samples. Let us also consider a linear transformation mapping the original n-dimensional image space into an m-dimensional feature space, where m < n. The new feature vectors y_k \in \mathbb{R}^m are defined by the following linear transformation:

    y_k = W^T x_k,  k = 1, 2, ..., N,                                   (1)

where W \in \mathbb{R}^{n \times m} is a matrix with orthonormal columns. If the total scatter matrix is defined as

    S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T = A A^T,              (2)

where N is the number of sample images, then after applying the linear transformation W^T, the scatter of the transformed feature vectors {y_1, y_2, ..., y_N} is W^T S_T W. In PCA, the projection W_opt is chosen to maximize the determinant of the total scatter matrix of the projected samples, i.e.,

    W_opt = arg max_W |W^T S_T W| = [w_1 w_2 ... w_m],                  (3)

where {w_i | i = 1, 2, ..., m} is the set of n-dimensional eigenvectors of S_T corresponding to the m largest eigenvalues. It can also be proved that PCA finds the projection that maximizes the trace of the total scatter matrix of the projected samples, i.e.,

    W_opt = arg max_W trace(W^T S_T W) = arg max_W \sum_{i=1}^{m} w_i^T S_T w_i.   (4)
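The PCA procedure reviewed above can be sketched in a few lines of numerical code. The following is a minimal illustration (the function name `pca_projection` is ours, not the paper's):

```python
import numpy as np

def pca_projection(X, m):
    """Classical PCA as reviewed above: project onto the m eigenvectors
    of the total scatter matrix S_T with the largest eigenvalues.

    X : (N, n) array of N images, each flattened to an n-vector.
    m : target dimension, m < n.
    """
    mu = X.mean(axis=0)            # mean image
    A = (X - mu).T                 # centered samples, shape (n, N)
    S_T = A @ A.T                  # total scatter matrix, eq. (2)
    vals, vecs = np.linalg.eigh(S_T)
    W = vecs[:, ::-1][:, :m]       # eigenvectors for the m largest eigenvalues
    Y = X @ W                      # feature vectors y_k = W^T x_k, eq. (1)
    return W, Y
```

For face images n is usually much larger than N, so in practice one works with an N x N matrix instead of S_T itself; Section 3 uses exactly such a trick for the weighted variant.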
3. Our proposed PCA
In the following, we show that PCA finds the projection that maximizes the sum of all squared pairwise distances between the projected data elements, and we then propose our approach. First, we review some necessary background.
The Laplacian is a key entity for describing pairwise relationships between data elements. It is an N x N symmetric positive-semidefinite matrix, characterized by having zero row and column sums. Let L be an N x N Laplacian and z = (z_1, z_2, ..., z_N)^T \in \mathbb{R}^N; then, since L_{ii} = -\sum_{j \neq i} L_{ij}, we have

    z^T L z = \sum_i L_{ii} z_i^2 + 2 \sum_{i<j} L_{ij} z_i z_j = -\sum_{i<j} L_{ij} (z_i - z_j)^2.   (5)

Let Y \in \mathbb{R}^{N \times m} be the matrix whose rows are the projected data elements y_1^T, ..., y_N^T, and let r^1, ..., r^m \in \mathbb{R}^N be the m columns of Y. Applying (5) to each column, we have

    \sum_{k=1}^{m} (r^k)^T L r^k = -\sum_{i<j} L_{ij} d^2(y_i, y_j),   (6)

where d(y_i, y_j) is the Euclidean distance. Now we turn to proving the following theorem, and then develop it into our approach.

Theorem 1. PCA computes the m-dimensional projection that maximizes

    \sum_{i<j} d^2(y_i, y_j).                                          (7)

Before proving this theorem, we define the N x N unit Laplacian, denoted L^u, by L^u = N I - U, where I is the identity matrix and U is the matrix of all ones. Since the columns of A are centered, A U A^T = 0, so

    A L^u A^T = A (N I - U) A^T = N A A^T - A U A^T = N A A^T = N S_T.

Proof. The centered projected coordinates are r^k = A^T w_k, and centering does not change pairwise distances. Taking L = L^u in (6), whose off-diagonal entries are all -1, we get

    \sum_{i<j} d^2(y_i, y_j) = \sum_{k=1}^{m} (r^k)^T L^u r^k = \sum_{k=1}^{m} w_k^T A L^u A^T w_k = N \sum_{k=1}^{m} w_k^T S_T w_k,

which, by (4), is exactly N times the objective maximized by PCA.
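The identity A L^u A^T = N S_T and the pairwise-distance form of the PCA objective are easy to verify numerically. The following sketch uses randomly generated synthetic data (sizes and names are illustrative, not from the paper):

```python
import numpy as np

# Numerical check of the identities behind Theorem 1.
rng = np.random.default_rng(1)
N, n, m = 12, 7, 3
X = rng.normal(size=(N, n))
A = (X - X.mean(axis=0)).T            # centered data matrix, (n, N)
S_T = A @ A.T                         # total scatter matrix
L_u = N * np.eye(N) - np.ones((N, N)) # unit Laplacian L^u = N*I - U

# A L^u A^T collapses to N * S_T because the centered columns of A sum to zero.
assert np.allclose(A @ L_u @ A.T, N * S_T)

# For any W with orthonormal columns, the sum of squared pairwise distances
# of the projected points equals N * trace(W^T S_T W).
W, _ = np.linalg.qr(rng.normal(size=(n, m)))
Y = (X - X.mean(axis=0)) @ W
sq_dists = sum(np.sum((Y[i] - Y[j]) ** 2)
               for i in range(N) for j in range(i + 1, N))
assert np.allclose(sq_dists, N * np.trace(W.T @ S_T @ W))
```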
Formulating PCA as in (7) suggests a straightforward generalization: simply replace the unit Laplacian with a general one in the target function. In the notation of Theorem 1, this means that the m-dimensional projection will maximize a weighted sum of squared distances instead of an unweighted sum. Hence, it is natural to call such a projection method weighted PCA.
Let us formalize this idea. Let wt_{ij} = wt_{ji} >= 0 be symmetric nonnegative pairwise weights, with wt_{ij} measuring how important it is for us to place the data elements i and j further apart in the low-dimensional space. Define the N x N Laplacian L^w by

    L^w_{ij} = -wt_{ij} for i \neq j,   L^w_{ii} = \sum_{j \neq i} wt_{ij}.   (8)

Generalizing (7), weighted PCA seeks the m-dimensional projection that maximizes

    \sum_{i<j} wt_{ij} d^2(y_i, y_j),                                   (9)

and this is obtained by taking the m highest eigenvectors of A L^w A^T. The proof is the same as that of Theorem 1, with L^u replaced by L^w. Two issues still need solving. The first is how to choose the weights wt_{ij}; we take wt_{ij} = 1 / d(x_i, x_j), so that pairs that are close in the input space receive large weights. The second is how to get the eigenvectors of A L^w A^T \in \mathbb{R}^{n \times n}, because this is a very big matrix. Let D be the diagonal matrix of the N eigenvalues of A^T A L^w \in \mathbb{R}^{N \times N} and let V be the matrix whose columns are the corresponding eigenvectors, so that A^T A L^w V = V D. Multiplying both sides on the left by A L^w gives

    (A L^w A^T)(A L^w V) = (A L^w V) D.                                 (10)

From (10), we see that A L^w V is the matrix whose columns are the first N eigenvectors of A L^w A^T, and D is the diagonal matrix of eigenvalues.
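A direct implementation of this weighted PCA can be sketched as follows. This is a minimal sketch under our reading of the scheme (`weighted_pca` is our name, and the small `eps` guard against coincident points is our addition):

```python
import numpy as np

def weighted_pca(X, m, eps=1e-12):
    """Weighted PCA sketch: the top-m eigenvectors of A L^w A^T,
    computed through the small N x N matrix A^T A L^w as in eq. (10)."""
    N = X.shape[0]
    A = (X - X.mean(axis=0)).T                     # centered data, (n, N)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    wt = 1.0 / (dist + eps)                        # weights wt_ij = 1/d(x_i, x_j)
    np.fill_diagonal(wt, 0.0)
    L_w = np.diag(wt.sum(axis=1)) - wt             # weighted Laplacian, eq. (8)
    # Eigen-decomposition of the small matrix A^T A L^w; its eigenvalues
    # are real and nonnegative (a product of two PSD matrices).
    vals, V = np.linalg.eig(A.T @ A @ L_w)
    order = np.argsort(-vals.real)[:m]
    W = (A @ L_w @ V.real)[:, order]               # eigenvectors of A L^w A^T
    return W / np.linalg.norm(W, axis=0)           # unit-norm directions
```

In face recognition n (the number of pixels) is far larger than N, so operating on the N x N matrix rather than the n x n matrix A L^w A^T is what makes the method practical.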
4. Experimental results
This section evaluates the performance of our proposed algorithm (named WPCA) compared with that of the original PCA algorithm, using the ORL and Yale face image databases. In our experiments, we first tested the recognition rates with different numbers of training samples: k (= 2, 3, 4, 5) images of each subject are randomly selected from the database for training, and the remaining images of each subject are used for testing. For each value of k, 30 runs are performed with different random partitions between the training set and the testing set. For each k-training-samples experiment, we also tested the recognition rates with different numbers of dimensions d, ranging from 2 to 10. Tables 1 and 2 show the average recognition rates (%) on the ORL database and the Yale database, respectively. In Fig. 1, we can see that our method achieves a better recognition rate than the traditional PCA method.
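The evaluation protocol just described can be sketched as follows. This is our illustrative code, not the authors': `project_fn` stands in for any projection learner (such as PCA or WPCA), and the nearest-neighbor rule is an assumption, since the classifier is not spelled out above.

```python
import numpy as np

def average_recognition_rate(X, labels, k, d, project_fn, runs=30, seed=0):
    """For each run, pick k images per subject at random for training and
    use the rest for testing; classify each projected test image by its
    nearest training neighbor; return the mean rate over all runs."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    rates = []
    for _ in range(runs):
        train, test = [], []
        for subject in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == subject))
            train.extend(idx[:k])
            test.extend(idx[k:])
        train, test = np.asarray(train), np.asarray(test)
        W = project_fn(X[train], d)              # learn a d-dim projection
        Y_train, Y_test = X[train] @ W, X[test] @ W
        dists = np.linalg.norm(Y_test[:, None, :] - Y_train[None, :, :], axis=2)
        pred = labels[train][dists.argmin(axis=1)]
        rates.append(float(np.mean(pred == labels[test])))
    return float(np.mean(rates))
```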
Table 1. The average recognition rates (%) on the ORL database for d = 2, 4, 6, 8, 10

Table 2. The average recognition rates (%) on the Yale database for d = 2, 4, 6, 8, 10
Fig. 1. The recognition rate (%) graphs on the two databases: recognition rate vs. number of training samples, and recognition rate vs. number of dimensions
5. Conclusions
A new PCA-based method for face recognition has been proposed in this paper. The proposed method overcomes one drawback of the traditional PCA method: PCA may fail to emphasize the discrimination between the clusters, no matter how easy the task is, because it is an unsupervised technique, and the directions that maximize the scatter of the data are not necessarily adequate to discriminate between clusters. We therefore proposed a new PCA-based scheme that straightforwardly takes data labeling into consideration and improves the performance of the recognition system. The effectiveness of the proposed approach is demonstrated by our experiments on the ORL and Yale face databases. While this approach is perhaps not a novel technique in face recognition, it improves the performance of the traditional PCA approach, whose complexity is lower than that of LDA or ICA.

References
[1] L. Sirovich and M. Kirby, “Low-Dimensional Procedure for Characterization of Human Faces,” J. Optical Soc. Am., vol. 4, pp. 519-524, 1987.
[2] M. Kirby and L. Sirovich, “Application of the KL Procedure for the Characterization of Human Faces,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 103-108, Jan. 1990.
[3] M. Turk and A. Pentland, “Eigenfaces for Recognition,” J. Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[4] A. Pentland, “Looking at People: Sensing for Ubiquitous and Wearable Computing,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 107-119, Jan. 2000.
[5] M.A. Grudin, “On Internal Representations in Face Recognition Systems,” Pattern Recognition, vol. 33, no. 7, pp. 1161-1177, 2000.
[6] G.W. Cottrell and M.K. Fleming, “Face Recognition Using Unsupervised Feature Extraction,” Proc. Int’l Neural Network Conf., pp. 322-325, 1990.
[7] D. Valentin, H. Abdi, A.J. O’Toole, and G.W. Cottrell, “Connectionist Models of Face Processing: A Survey,” Pattern Recognition, vol. 27, no. 9, pp. 1209-1230, 1994.
[8] P.S. Penev and L. Sirovich, “The Global Dimensionality of Face Space,” Proc. Fourth IEEE Int’l Conf. Automatic Face and Gesture Recognition, pp. 264-270, 2000.
[9] L. Zhao and Y. Yang, “Theoretical Analysis of Illumination in PCA-Based Vision Systems,” Pattern Recognition, vol. 32, no. 4, pp. 547-564, 1999.
[10] L. Wiskott, J.M. Fellous, N. Krüger, and C. von der Malsburg, “Face Recognition by Elastic Bunch Graph Matching,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, July 1997.
[11] M.S. Bartlett, J.R. Movellan, and T.J. Sejnowski, “Face Recognition by Independent Component Analysis,” IEEE Trans. Neural Networks, vol. 13, no. 6, pp. 1450-1464, 2002.
[12] B.A. Draper, K. Baek, M.S. Bartlett, and J.R. Beveridge, “Recognizing Faces with PCA and ICA,” Computer Vision and Image Understanding: Special Issue on Face Recognition, in press.
[13] P.C. Yuen and J.H. Lai, “Face Representation Using Independent Component Analysis,” Pattern Recognition, vol. 35, no. 6, pp. 1247-1257, 2002.
[14] M.H. Yang, “Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Kernel Methods,” Proc. Fifth IEEE Int’l Conf. Automatic Face and Gesture Recognition, pp. 215-220, May 2002.
[15] Y. Koren and L. Carmel, “Robust Linear Dimensionality Reduction,” IEEE Trans. Visualization and Computer Graphics, vol. 10, no. 4, pp. 459-470, July-Aug. 2004.