
An improvement on PCA algorithm for Face Recognition

Vo Dinh Minh Nhat and Sungyoung Lee

Kyung Hee University, South Korea

vdmnhat@oslab.khu.ac.kr

Abstract. Principal Component Analysis (PCA) is an important and well-developed technique in image recognition, and to date many linear discrimination methods have been put forward. Despite these efforts, the traditional PCA retains some weaknesses. In this paper, we propose a new PCA-based method that overcomes one drawback of the traditional PCA method. In face recognition, where the training data are labeled, a projection is often required to emphasize the discrimination between the clusters. PCA may fail to accomplish this, no matter how easy the task is, because it is an unsupervised technique: the directions that maximize the scatter of the data are not necessarily adequate for discriminating between clusters. We therefore propose a new PCA-based scheme that can straightforwardly take data labels into account and thereby improves the performance of the recognition system. Experimental results show that our method achieves better performance than the traditional PCA method.

Index Terms – Principal component analysis, face recognition.

1. Introduction

Principal component analysis (PCA), also known as Karhunen-Loeve expansion, is

a classical feature extraction and data representation technique widely used in the

areas of pattern recognition and computer vision. Sirovich and Kirby [1], [2] first used

PCA to efficiently represent pictures of human faces. They argued that any face

image could be reconstructed approximately as a weighted sum of a small collection

of images that define a facial basis (eigenimages), and a mean image of the face.

Within this context, Turk and Pentland [3] presented the well-known Eigenfaces

method for face recognition in 1991. Since then, PCA has been widely investigated

and has become one of the most successful approaches in face recognition [4], [5],

[6], [7]. However, Wiskott et al. [10] pointed out that PCA could not capture even the

simplest invariance unless this information is explicitly provided in the training data.

Recently, two PCA-related methods, independent component analysis (ICA) and kernel principal component analysis (Kernel PCA), have received considerable attention.

Bartlett et al. [11] and Draper et al. [12] proposed using ICA for face representation

and found that it was better than PCA when cosines were used as the similarity


measure (however, their performance was not significantly different when the Euclidean distance was used). Yang [14] used Kernel PCA for face feature extraction and

recognition and showed that the Kernel Eigenfaces method outperforms the classical

Eigenfaces method. However, ICA and Kernel PCA are both computationally more

expensive than PCA. The experimental results in [14] showed that the ratio of the computation time required by ICA, Kernel PCA, and PCA is, on average, 8.7 : 3.2 : 1.0.

In face recognition, where the data are labeled, a projection is often required to emphasize the discrimination between the clusters. PCA may fail to accomplish this, no matter how easy the task is, because it is an unsupervised technique: the directions that maximize the scatter of the data are not necessarily adequate for discriminating between clusters. In this paper, we propose a PCA scheme that straightforwardly takes data labeling into account, which improves the performance of the recognition system. The remainder of this paper is organized as follows. In Section 2, the

traditional PCA method is reviewed. The idea of the proposed method and its

algorithm are described in Section 3. In Section 4, experimental results are presented

on the ORL and the Yale face image databases to demonstrate the effectiveness of

our method. Finally, conclusions are presented in Section 5.

2. Principal Component Analysis

Let us consider a set of N sample images \(\{x_1, x_2, \ldots, x_N\}\), \(x_i \in \mathbb{R}^n\), taking values in an n-dimensional image space, and let \(A = [\bar{x}_1 \; \bar{x}_2 \; \cdots \; \bar{x}_N] \in \mathbb{R}^{n \times N}\) with \(\bar{x}_i = x_i - \mu\), where \(\mu \in \mathbb{R}^n\) is the mean image of all samples. Let us also consider a linear transformation mapping the original n-dimensional image space into an m-dimensional feature space, where \(m < n\). The new feature vectors \(y_k \in \mathbb{R}^m\) are defined by the following linear transformation:

\[ y_k = W^T \bar{x}_k \quad \text{and} \quad Y = W^T A \qquad (1) \]

where \(k = 1, 2, \ldots, N\) and \(W \in \mathbb{R}^{n \times m}\) is a matrix with orthonormal columns.

If the total scatter matrix is defined as

\[ S_T = A A^T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T \qquad (2) \]

where N is the number of sample images, then after applying the linear transformation W, the scatter of the transformed feature vectors \(\{y_1, y_2, \ldots, y_N\}\) is \(W^T S_T W\). In PCA, the projection \(W_{opt}\) is chosen to maximize the determinant of the total scatter matrix of the projected samples, i.e.,

\[ W_{opt} = \arg\max_W \left| W^T S_T W \right| = [w_1 \; w_2 \; \cdots \; w_m] \qquad (3) \]

where \(\{w_i \mid i = 1, 2, \ldots, m\}\) is the set of n-dimensional eigenvectors of \(S_T\) corresponding to the m largest eigenvalues.

It can also be proved that PCA finds the projection that maximizes the trace of the total scatter matrix of the projected samples, i.e., the quantity

\[ \operatorname{trace}\left(W_{opt}^T S_T W_{opt}\right) = \sum_{i=1}^{m} w_i^T S_T w_i \qquad (4) \]

is maximized.
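As a concrete illustration, the procedure of Eqs. (1)-(3) can be sketched in a few lines of numpy. This is our own sketch, not code from the paper: the name `pca_basis` and the toy data are illustrative assumptions, and for real face images one would typically diagonalize the smaller \(N \times N\) matrix \(A^T A\) rather than \(S_T\) directly.

```python
import numpy as np

def pca_basis(X, m):
    """Eqs. (1)-(3): X is n x N with one image per column; returns the
    n x m matrix W of the m leading eigenvectors of S_T, plus the mean."""
    mu = X.mean(axis=1, keepdims=True)     # mean image mu in R^n
    A = X - mu                             # centered data matrix A
    S_T = A @ A.T                          # total scatter matrix, Eq. (2)
    _, vecs = np.linalg.eigh(S_T)          # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :m]               # m largest eigenvalues -> W_opt, Eq. (3)
    return W, mu

# project samples with Eq. (1): y_k = W^T (x_k - mu)
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))          # 50 toy "images" of dimension 20
W, mu = pca_basis(X, 5)
Y = W.T @ (X - mu)
```

The columns of W are orthonormal because `numpy.linalg.eigh` returns an orthonormal eigenbasis of the symmetric matrix \(S_T\).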

3. Our proposed PCA

In the following, we show that PCA finds the projection that maximizes the sum of all squared pairwise distances between the projected data elements, and we then propose our approach. First, we review some necessary background.

The Laplacian is a key entity for describing pairwise relationships between data elements. It is a symmetric positive-semidefinite \(N \times N\) matrix characterized by having zero row and column sums. Let L be an \(N \times N\) Laplacian and \(z = [z_1 \; z_2 \; \cdots \; z_N]^T \in \mathbb{R}^N\); then we have

\[ z^T L z = \sum_{i} L_{ii} z_i^2 + 2 \sum_{i<j} L_{ij} z_i z_j = \sum_{i<j} (-L_{ij})(z_i^2 + z_j^2) + 2 \sum_{i<j} L_{ij} z_i z_j = -\sum_{i<j} L_{ij} (z_i - z_j)^2 \qquad (5) \]

where the second equality uses the zero row sums, \(L_{ii} = -\sum_{j \neq i} L_{ij}\).

Let \(r_1, r_2, \ldots, r_m \in \mathbb{R}^N\) be the m columns of \(Y^T\). Applying (5), we have

\[ \sum_{k=1}^{m} r_k^T L r_k = \sum_{k=1}^{m} \left( -\sum_{i<j} L_{ij} \left( (r_k)_i - (r_k)_j \right)^2 \right) = -\sum_{i<j} L_{ij} \, d(y_i, y_j)^2 \qquad (6) \]

where \(d(y_i, y_j)\) is the Euclidean distance. We now turn to proving the following theorem, which we then develop into our approach.

Theorem 1. PCA computes the m-dimensional projection that maximizes

\[ \sum_{i<j} d(y_i, y_j)^2 \qquad (7) \]

Before proving this theorem, we define the \(N \times N\) unit Laplacian \(L^u\) by \(L^u_{ij} = N\delta_{ij} - 1\). We have

\[ A L^u A^T = A (N I - U) A^T = N S_T - A U A^T = N S_T \qquad (8) \]

where I is the identity matrix and U is the matrix of all ones. The last equality is due to the fact that the coordinates are centered, so \(A U = 0\). By (6), and since \(r_i = A^T w_i\), we get

\[ \sum_{i<j} d(y_i, y_j)^2 = \sum_{i=1}^{m} r_i^T L^u r_i = \sum_{i=1}^{m} w_i^T A L^u A^T w_i = N \sum_{i=1}^{m} w_i^T S_T w_i \qquad (9) \]

which, by (4), is exactly the quantity that PCA maximizes. This proves the theorem.
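Equation (9) can likewise be checked numerically. The following sketch, our own illustration on random data, confirms that the sum of squared pairwise distances of the projected samples equals \(N \cdot \operatorname{trace}(W^T S_T W)\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, m = 8, 12, 3
X = rng.standard_normal((n, N))
A = X - X.mean(axis=1, keepdims=True)      # centered data, so A @ ones = 0
S_T = A @ A.T                              # total scatter matrix

# any W with orthonormal columns works; here, the top-m eigenvectors of S_T
W = np.linalg.eigh(S_T)[1][:, ::-1][:, :m]
Y = W.T @ A                                # projected samples y_1..y_N (columns)

# sum of squared pairwise Euclidean distances of the projected samples
pairwise = sum(np.sum((Y[:, i] - Y[:, j]) ** 2)
               for i in range(N) for j in range(i + 1, N))
# Eq. (9): pairwise == N * trace(W^T S_T W)
```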

Formulating PCA as in (7) suggests a straightforward generalization: simply replace the unit Laplacian with a general one in the target function. In the notation of Theorem 1, this means that the m-dimensional projection will maximize a weighted sum of squared distances instead of an unweighted sum. Hence, it is natural to call such a projection method weighted PCA.

Let us formalize this idea. Let \(\{wt_{ij}\}_{i,j=1}^{N}\) be symmetric nonnegative pairwise weights, with \(wt_{ij}\) measuring how important it is for us to place the data elements i and j far apart in the low-dimensional space. Define the \(N \times N\) Laplacian \(L^w\) by

\[ L^w_{ij} = \begin{cases} \sum_{k \neq i} wt_{ik} & i = j \\ -wt_{ij} & i \neq j \end{cases} \]

Generalizing (7), weighted PCA seeks the m-dimensional projection that maximizes \(\sum_{i<j} wt_{ij} \, d(y_i, y_j)^2\); this projection is obtained by taking the m top eigenvectors of the matrix \(A L^w A^T \in \mathbb{R}^{n \times n}\). The proof is the same as that of Theorem 1, with \(L^u\) replaced by \(L^w\). Two things still need to be resolved: how to define \(wt_{ij}\), and how to compute the eigenvectors of \(A L^w A^T\), which is a very large matrix. Since the training data are labeled, we define the weights as

\[ wt_{ij} = \begin{cases} 1 / d(x_i, x_j) & x_i, x_j \text{ in the same class} \\ 0 & \text{otherwise} \end{cases} \]

For the eigenvectors, let D be the diagonal matrix of the N eigenvalues of \(A^T A L^w \in \mathbb{R}^{N \times N}\) and let V be the matrix whose columns are the corresponding eigenvectors; then we have

\[ A^T A L^w V = V D \iff (A L^w A^T)(A L^w V) = (A L^w V) D \qquad (10) \]

From (10), we see that \(A L^w V\) is the matrix whose columns are the first N eigenvectors of \(A L^w A^T\), and D is the diagonal matrix of the corresponding eigenvalues.
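Putting the pieces together, the proposed weighted PCA can be sketched as follows. This is our own illustrative implementation, not the authors' code: the function name `wpca` is hypothetical, and we assume same-class samples are pairwise distinct so that \(1/d(x_i, x_j)\) is well defined.

```python
import numpy as np

def wpca(X, labels, m):
    """Weighted PCA sketch: X is n x N (one sample per column), labels is a
    length-N class vector. Returns the mean and an n x m projection basis."""
    n, N = X.shape
    mu = X.mean(axis=1, keepdims=True)
    A = X - mu                               # centered data matrix

    # weights: wt_ij = 1/d(x_i, x_j) for distinct same-class pairs, else 0
    Wt = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j and labels[i] == labels[j]:
                Wt[i, j] = 1.0 / np.linalg.norm(X[:, i] - X[:, j])
    Lw = np.diag(Wt.sum(axis=1)) - Wt        # weighted Laplacian L^w

    # Eq. (10): eigen-decompose the small N x N matrix A^T A L^w, then map
    # its leading eigenvectors V to eigenvectors A L^w V of A L^w A^T
    vals, V = np.linalg.eig(A.T @ A @ Lw)
    top = np.argsort(-vals.real)[:m]
    U = (A @ Lw @ V[:, top]).real
    return mu, U / np.linalg.norm(U, axis=0)  # normalized basis columns

# toy usage: 3 classes, 4 samples each, in 10 dimensions
rng = np.random.default_rng(3)
X = rng.standard_normal((10, 12))
labels = np.repeat([0, 1, 2], 4)
mu, U = wpca(X, labels, 2)
```

Because \(A L^w A^T\) is symmetric, its eigenvectors (the columns of U) come out mutually orthogonal, just as in ordinary PCA.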

4. Experimental results

This section evaluates the performance of our propoped algorithm compared with

that of the original PCA algorithm and proposed algorithm (named WPCA) based on

using ORL and Yale face image database. In our experiments, firstly we tested the

recognition rates with different number of training samples. (2,3,4,5)

k k =

images

Page 5

An improvement on PCA algorithm for Face Recognition 5

of each subject are randomly selected from the database for training and the

remaining images of each subject for testing. For each value of k , 30 runs are

performed with different random partition between training set and testing set. And

for each k training samples experiment, we tested the recognition rates with different

number of dimensions , d , which are from 2 to 10. Table 1& 2 shows the average

recognition rates (%) with ORL database and Yale database respectively. In Fig. 1,

we can see that our method achieves the better recognition rate compared to the

traditional PCA.
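The evaluation protocol described above can be sketched as follows. This harness is our own reconstruction under stated assumptions: the paper does not specify its classifier, so we use a 1-nearest-neighbour rule in the projected space, and `run_trials`, `pca_fn`, and the synthetic stand-in data are hypothetical names and data, not the ORL or Yale images.

```python
import numpy as np

def run_trials(X, labels, k, d, project_fn, runs=30, seed=0):
    """Average 1-NN recognition rate over `runs` random splits with
    k training images per subject and a d-dimensional projection."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    rates = []
    for _ in range(runs):
        train, test = [], []
        for c in np.unique(labels):          # k random images per subject
            idx = rng.permutation(np.where(labels == c)[0])
            train += list(idx[:k])
            test += list(idx[k:])
        mu, W = project_fn(X[:, train], labels[train], d)
        Ytr = W.T @ (X[:, train] - mu)       # projected training samples
        Yte = W.T @ (X[:, test] - mu)        # projected test samples
        # squared Euclidean distances: one row per test sample
        dists = ((Yte[:, :, None] - Ytr[:, None, :]) ** 2).sum(axis=0)
        pred = labels[np.asarray(train)][dists.argmin(axis=1)]
        rates.append((pred == labels[test]).mean())
    return float(np.mean(rates))

def pca_fn(Xtr, ytr, d):
    """Plain PCA baseline: top-d eigenvectors of the total scatter matrix."""
    mu = Xtr.mean(axis=1, keepdims=True)
    A = Xtr - mu
    return mu, np.linalg.eigh(A @ A.T)[1][:, ::-1][:, :d]

# synthetic stand-in for a face database: 5 subjects, 8 images each
rng = np.random.default_rng(0)
n_dim, n_sub, per = 30, 5, 8
means = 3.0 * rng.standard_normal((n_dim, n_sub))
labels = np.repeat(np.arange(n_sub), per)
X = means[:, labels] + rng.standard_normal((n_dim, n_sub * per))
rate = run_trials(X, labels, k=4, d=4, project_fn=pca_fn, runs=5)
```

Any projection routine with the same signature, such as the proposed WPCA, can be passed as `project_fn` for a side-by-side comparison.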

Table 1. The recognition rates (%) on the ORL database

  d    method    k=2      k=3      k=4      k=5
  2    PCA       39.69    40.36    38.75    37.00
       WPCA      44.24    44.84    41.62    41.33
  4    PCA       61.56    66.79    63.75    68.00
       WPCA      62.11    68.49    67.86    72.57
  6    PCA       69.69    70.00    78.33    79.50
       WPCA      71.22    72.75    82.35    84.57
  8    PCA       78.13    78.21    83.75    85.50
       WPCA      81.35    82.09    85.76    88.97
  10   PCA       78.49    80.36    86.25    89.00
       WPCA      82.05    82.72    89.03    91.39

Table 2. The recognition rates (%) on the Yale database

  d    method    k=2      k=3      k=4      k=5
  2    PCA       40.56    42.50    43.10    57.22
       WPCA      42.95    45.17    53.20    59.30
  4    PCA       58.33    74.17    71.67    72.78
       WPCA      62.37    77.89    73.11    75.01
  6    PCA       66.48    78.33    83.10    83.89
       WPCA      69.18    80.62    87.13    84.55
  8    PCA       70.93    81.67    88.81    87.22
       WPCA      73.44    84.47    90.72    88.92
  10   PCA       76.11    86.67    90.71    88.33
       WPCA      78.14    90.49    94.06    91.77

[Figure omitted: four panels of recognition-rate curves, showing recognition rate vs. number of training samples and recognition rate vs. number of dimensions for the ORL and Yale databases, with one curve for PCA and one for WPCA]

Fig. 1. The recognition rate (%) graphs on the two databases

5. Conclusions

A new PCA-based method for face recognition has been proposed in this paper. The proposed PCA-based method can overcome a drawback that exists in the