IEICE TRANS. INF. & SYST., VOL.E92–D, NO.12 DECEMBER 2009
Face Recognition Based on Nonlinear DCT Discriminant Feature
Extraction Using Improved Kernel DCV
Sheng LI†, Student Member, Yong-fang YAO†, Xiao-yuan JING†a), Heng CHANG†, Shi-qiang GAO†,
David ZHANG††, and Jing-yu YANG†††, Nonmembers
SUMMARY  This letter proposes a nonlinear DCT discriminant feature extraction approach for face recognition. The proposed approach first selects appropriate DCT frequency bands according to their levels of nonlinear discrimination. Then, this approach extracts nonlinear discriminant features from the selected DCT bands by presenting a new kernel discriminant method, i.e., the improved kernel discriminative common vector (KDCV) method. Experiments on the public FERET database show that this new approach is more effective than several related methods.
key words: DCT frequency bands selection, the improved KDCV, nonlinear DCT feature extraction, face recognition
1. Introduction

Discrete cosine transform (DCT) is a widely used image processing technique, and discriminant analysis is an effective image feature extraction and recognition technique. To date, many linear discriminant methods have been put forward, such as linear discriminant analysis (LDA), direct LDA (DLDA), and discriminative common vector (DCV). In our previous work, we combined the DCT and linear discriminant techniques and presented a DCT-LDA face recognition method which outperforms some conventional linear discriminant methods. As an extension of the linear discriminant technique, the kernel-based nonlinear discriminant analysis technique has now been widely applied to the field of pattern recognition. Baudat et al. developed a commonly used generalized discriminant analysis (GDA) method for nonlinear discrimination. Jing et al. put forward a kernel DCV (KDCV) method. Shen et al. combined Gabor wavelets and GDA for face identification and verification. In this paper, we develop DCT-LDA and propose a nonlinear DCT discriminant feature extraction approach for face recognition. First, we provide the representation of DCT frequency bands and select appropriate bands. Second, we extract the nonlinear discriminant features from the selected bands by presenting a new kernel discriminant method, that is, the improved KDCV method, which takes advantage of both kernel discriminative common vectors and different vectors. The nearest neighbor classifier is adopted to classify the extracted features. We employ a large public face database, the FERET database, as the test data. Experiments demonstrate the effectiveness of the proposed new approach.

Manuscript received May 8, 2009.
Manuscript revised July 24, 2009.
†The authors are with the Nanjing University of Posts and Telecommunications, Nanjing, 210003 China.
††The author is with the Dept. of Computing, Hong Kong Polytechnic University, Hong Kong.
†††The author is with the Nanjing University of Science and Technology, Nanjing, China.
2. The Proposed Approach

Suppose that each gray image sample in the sample set X is of size C × D, where C ≥ D. We perform the two-dimensional DCT on each image and obtain its transformed image. Figure 1 shows demonstration images of the DCT transform: (a) an original image; (b) its DCT transformed image and the frequency bands representation. According to Fig. 1(b), most of the information or energy of a face image is concentrated in the upper-left corner, that is, in the low frequency bands. We use a half square ring to express the k-th frequency band, where 1 ≤ k ≤ C. If we select the k-th band, we keep the corresponding frequency band values of the DCT transformed image; otherwise, we set the band values to zero.
Fig. 1  Face demonstration images of DCT transform and filtering.

Copyright © 2009 The Institute of Electronics, Information and Communication Engineers

The nonlinear discriminant analysis technique generally uses the kernel-based method to realize a conceptual nonlinear transformation from an input feature space into a high-dimensional space. With respect to a given nonlinear mapping function $\Phi$, the input data space $R^n$ can be mapped into the kernel space $F$: $\Phi : R^n \to F$, $x \mapsto \Phi(x)$. Suppose that there are $c$ known pattern classes in the sample set $X$ and that $l_i$ is the number of training samples of the $i$-th class, so there are a total of $M = \sum_{i=1}^{c} l_i$ training samples. Let $S_B^\Phi$ and $S_W^\Phi$ represent the between-class scatter matrix and the within-class scatter matrix in $F$, respectively. They are defined as:
$$S_B^\Phi = \sum_{i=1}^{c} l_i\,(\mu_i^\Phi - \mu^\Phi)(\mu_i^\Phi - \mu^\Phi)^T, \qquad
S_W^\Phi = \sum_{i=1}^{c} \sum_{m=1}^{l_i} (\Phi(x_m^i) - \mu_i^\Phi)(\Phi(x_m^i) - \mu_i^\Phi)^T,$$

where $x_m^i$ is the $m$-th training sample of the $i$-th class, $\mu_i^\Phi$ is the mean value of the $i$-th class samples, and $\mu^\Phi$ is the mean value of all training samples. Define the Fisher discriminant criterion in $F$ as:

$$J(\varphi) = \frac{\varphi^T S_B^\Phi \varphi}{\varphi^T S_W^\Phi \varphi}.$$

The optimal discriminant vector $\varphi$ can be expressed by a linear combination of the observations in $F$; we have $\varphi = \sum_{j=1}^{M} a_j \Phi(x_j) = H\alpha$, where $H = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_M)]$ and $\alpha = (a_1, a_2, \ldots, a_M)^T$. We thus obtain:

$$J(\alpha) = \frac{\alpha^T (K U K)\,\alpha}{\alpha^T (K (I_N - U) K)\,\alpha},$$

where $K$ is an $M \times M$ symmetric kernel matrix, $K_{ij} = \Phi(x_i)^T \Phi(x_j)$, $i, j = 1, 2, \cdots, M$; $I_N$ is an $M \times M$ identity matrix; and $U$ is an $M \times M$ block diagonal matrix, $U = \mathrm{diag}(U_1, \ldots, U_c)$, where $U_i$ $(i = 1, 2, \cdots, c)$ is an $l_i \times l_i$ matrix, the elements of which are equal to $1/l_i$. Letting

$$\tilde{S}_B = K U K \quad\text{and}\quad \tilde{S}_W = K (I_N - U) K,$$

the nonlinear discriminant capability of a band can be calculated as

$$F = \frac{\mathrm{tr}(\tilde{S}_B)}{\mathrm{tr}(\tilde{S}_W)},$$

where $\mathrm{tr}(\cdot)$ indicates the trace of a matrix, and $\tilde{S}_B$ and $\tilde{S}_W$ separately represent the corresponding between-class scatter matrix and within-class scatter matrix in the kernel space. In the experiments, we calculate the nonlinear discriminant capability of every DCT frequency band, rank the bands in descending order, and select the part of the bands with the largest discriminant capabilities.
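As a rough sketch of how a band's nonlinear discriminant capability $\mathrm{tr}(\tilde{S}_B)/\mathrm{tr}(\tilde{S}_W)$ could be computed with a Gaussian kernel; the helper names and the default `sigma` are our assumptions, not the paper's:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma**2)) for rows x_i of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def band_discriminability(X, labels, sigma=1.0):
    """Nonlinear discriminant capability F = tr(KUK) / tr(K(I - U)K)."""
    K = gaussian_kernel_matrix(X, sigma)
    M = len(labels)
    U = np.zeros((M, M))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        U[np.ix_(idx, idx)] = 1.0 / len(idx)   # l_i x l_i block of 1/l_i
    S_B = K @ U @ K                            # between-class, kernel space
    S_W = K @ (np.eye(M) - U) @ K              # within-class, kernel space
    return np.trace(S_B) / np.trace(S_W)
```

Bands would then be ranked by this score in descending order and the top-scoring subset kept.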
Next, we present an improved KDCV method to extract nonlinear discriminant features from the selected DCT bands as follows:

Step 1. Compute the common vectors and different vectors: The principle of KDCV is to acquire the nonlinear projection transform in the null space of $\tilde{S}_W$:

$$J(W) = \arg\max_{|W^T \tilde{S}_W W| = 0} |W^T \tilde{S}_B W|,$$

where $\tilde{S}_W$ is an $M \times M$ symmetric matrix, and $M$ is the total number of training samples. Let $V$ be the non-null space of $\tilde{S}_W$ and $V^\perp$ be the null space of $\tilde{S}_W$. For any sample $\Phi(x_m^i)$ of the sample set $X$ in the kernel space, we have:

$$\Phi(x_m^i) = \Phi(x_m^i)_{com} + \Phi(x_m^i)_{dif}, \quad \Phi(x_m^i)_{com} \in V^\perp,\ \Phi(x_m^i)_{dif} \in V,$$

where $\Phi(x_m^i)_{com}$ and $\Phi(x_m^i)_{dif}$ separately represent the common vector and different vector parts of $\Phi(x_m^i)$. $\Phi$ indicates a given nonlinear mapping function. In this paper, $\Phi$ is expressed by the Gaussian kernel function $k(c_1, c_2) = \exp(-\|c_1 - c_2\|^2 / 2\sigma^2)$. Note that for all the samples of the $i$-th class, their common vector parts are the same.

Step 2. Compute the optimal projection transforms $W_{com}$ and $W_{dif}$: According to KDCV, we use the common vectors to construct the scatter matrix $S_{com}^\Phi$, where $S_{com}^\Phi$ is the common vector scatter matrix, and then get the projection transform $W_{com}$ composed of the eigenvectors of $S_{com}^\Phi$ with nonzero eigenvalues. We obtain the following kernel discriminative common vectors:

$$y_{com}^i = W_{com}^T \Phi(x_m^i), \quad i = 1, 2, \cdots, c,$$

where $y_{com}^i$ is identical for the $i$-th class and the feature dimension of $y_{com}^i$ is $c - 1$.

In this paper, we use the different vectors to calculate $W_{dif}$. Take $\Phi(x_m^i)_{dif}$ of each training sample to construct the corresponding within-class scatter matrix $S_{W\,dif}^\Phi$ and between-class scatter matrix $S_{B\,dif}^\Phi$. $W_{dif}$ is designed to satisfy the Fisher discriminant criterion: $W_{dif}$ is composed of the eigenvectors corresponding to the nonzero eigenvalues of $(S_{W\,dif}^\Phi)^{-1} S_{B\,dif}^\Phi$. We obtain the following kernel discriminative different vectors:

$$y_{m\,dif}^i = W_{dif}^T \Phi(x_m^i)_{dif}, \quad i = 1, 2, \cdots, c,$$

where the feature dimension of $y_{m\,dif}^i$ is $c - 1$.

Step 3. Construct the synthesized optimal projection transform $W$ for the improved KDCV: $W$ is constructed by serially combining $W_{com}$ and $W_{dif}$:

$$W = [W_{com}, W_{dif}].$$

The proposed approach is implemented as follows:
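To make the common/different decomposition concrete, the following toy sketch computes common-vector parts in the input space (no kernel mapping), using the null space of the within-class scatter matrix; the data, threshold, and function name are illustrative only:

```python
import numpy as np

def common_vector_parts(classes):
    """Project every sample onto the null space of the within-class scatter.

    `classes` is a list of (l_i, n) arrays, one per class, with n larger than
    the total number of samples (the small-sample-size case), so the null
    space of S_W is nonempty. Returns one (l_i, n) array of common-vector
    parts per class; within a class, all rows come out identical.
    """
    n = classes[0].shape[1]
    Sw = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in classes)
    w, V = np.linalg.eigh(Sw)
    B = V[:, w > 1e-10]            # orthonormal basis of the range of S_W
    P = np.eye(n) - B @ B.T        # projector onto the null space of S_W
    return [x @ P.T for x in classes]
```

The within-class variation of each sample lies in the range of S_W, so projecting onto the null space removes it and leaves a class-specific common vector.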
(i) Perform DCT on each image sample of X. Select appropriate DCT frequency bands for the transformed images and express the selected bands in the form of a feature vector. We thus obtain a one-dimensional sample set.

(ii) Use the improved KDCV to compute the optimal projection transform $W$ in the kernel space. We acquire the kernel discriminative common vector $y_{com}$ and the kernel discriminative different vector $y_{dif}$:

$$y = W^T \Phi(x) = [y_{com}^T, y_{dif}^T]^T.$$

Normalize $y_{com}$ and $y_{dif}$ by their 2-norms, where $\|\cdot\|$ represents the 2-norm of a vector. We thus obtain a new sample set Y corresponding to X.

(iii) Take the nearest neighbor classifier with the cosine distance measure to classify Y. The distance $d(\cdot)$ between two arbitrary samples $y_1$ and $y_2$ is defined by

$$d(y_1, y_2) = -\frac{y_1^T y_2}{\|y_1\|\,\|y_2\|}.$$

3. Experimental Results
We use a large public face image database, the FERET database, as the test data. In the experiment, it contains 600 frontal facial images corresponding to 200 individuals, with each person contributing 3 images. The images in this database were captured under various illuminations and facial expressions. Each image is 384 × 256 with 256 gray levels. Since many images in this database include the background and the body chest region, we adopt the FERET protocol preprocessing software to automatically crop every image sample. That is, the facial region of each image is cropped by using the centers of the eyes and mouth, and the cropped images are resized to 128 × 128. Figure 2 shows demonstration face images of five subjects.
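Returning to step (iii) above, the cosine-distance nearest-neighbor rule can be sketched as follows (a minimal sketch; the function names are ours):

```python
import numpy as np

def cosine_distance(y1, y2):
    """d(y1, y2) = -(y1 . y2) / (||y1|| ||y2||); smaller means more similar."""
    return -float(y1 @ y2) / (np.linalg.norm(y1) * np.linalg.norm(y2))

def nearest_neighbor(query, gallery, labels):
    """Return the label of the gallery sample with the smallest distance."""
    dists = [cosine_distance(query, g) for g in gallery]
    return labels[int(np.argmin(dists))]
```

Because the distance is the negated cosine similarity, minimizing it picks the gallery feature vector most aligned in direction with the query, regardless of magnitude.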
In the test, two images of each subject are randomly
chosen for training, and the remaining one is used for test-
ing. We perform the two-dimensional DCT on the images.
Figure 3 shows the nonlinear discriminability values of all
DCT bands by using Formula (6).
According to the nonlinear discriminability values shown in Fig. 3, we rank all frequency bands in descending order and in turn choose the part of them with the largest nonlinear discriminability values. Figure 4 shows the recognition rates of the selected DCT frequency bands. When using the selected bands from No. 1 to No. 23, we obtain the highest recognition rate of 99%.
We compare the proposed approach with five representative discriminant methods: the DCT-LDA method, the generalized discriminant analysis (GDA) method, the DCT-GDA method, the DCT-KDCV method, and the Gabor-GDA method. DCT-GDA and DCT-KDCV use the same method of selecting frequency bands as the proposed approach. In the test, the numbers of selected frequency bands are 22, 25, and 21 for DCT-LDA, DCT-GDA, and DCT-KDCV, respectively. All compared methods use the same classifier, i.e., the nearest neighbor classifier. Table 1 shows the recognition rates of all compared methods. The proposed approach performs better than the other methods. From Table 1, the proposed approach improves the recognition rate by at least 1.5% (= 99% − 97.5%). Besides, compared with the commonly used GDA method, it saves 75.64% (= (65.76 − 16.02)/65.76 × 100%) of the computing time (in seconds), where the computing time indicates the time for achieving the discriminant features.

Fig. 2  Demonstration face images of five subjects.
Fig. 3  Nonlinear discriminability values of all DCT bands.
Fig. 4  Recognition rates of selected DCT frequency bands.
Table 1  Recognition rates (%) of compared methods.
4. Conclusion

The proposed nonlinear DCT discriminant approach is clearly superior to DCT-LDA, and it can extract more effective nonlinear discriminant features than the two representative methods of GDA and KDCV. It also outperforms the Gabor-GDA method. By selecting the DCT frequency bands, our approach also improves the computing speed of nonlinear discriminant methods.
Acknowledgments

The work described in this paper was fully supported by the National Natural Science Foundation of China (NSFC) under Project No. 60772059, the Natural Science Research Foundation of Jiangsu Province Universities under Project No. 07KJB520081, and the Research Foundation of Nanjing University of Posts and Telecommunications under Project Nos. NY207027 and NY208051.
References

[1] Z.M. Hafed and M.D. Levine, "Face recognition using the discrete cosine transform," Int. J. Comput. Vis., vol.43, no.3, pp.167–188, 2001.
[2] D. Zhang, X.Y. Jing, and J. Yang, Biometric Image Discrimination (BID) Technologies, IGP/INFOSCI/IRM Press, 2006.
[3] A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol.22, no.1, pp.4–37, 2000.
[4] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognit., vol.34, no.12, pp.2067–2070, 2001.
[5] H. Cevikalp, M. Neamtu, M. Wilkes, and A. Barkana, "Discriminative common vectors for face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol.27, no.1, pp.4–13, 2005.
[6] X.Y. Jing and D. Zhang, "A face and palmprint recognition approach based on discriminant DCT feature extraction," IEEE Trans. Syst., Man Cybern. B, vol.34, no.6, pp.2405–2415, 2004.
[7] G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Comput., vol.12, no.10, pp.2385–2404, 2000.
[8] X.Y. Jing, D. Zhang, J.Y. Yang, Y.F. Yao, and M. Li, "Face and … sample biometrics recognition," Pattern Recognit., vol.40, no.11, 2007.
[9] L.L. Shen, L. Bai, and M. Fairhurst, "Gabor wavelets and general discriminant analysis for face identification and verification," Image Vis. Comput., vol.25, no.5, pp.553–563, 2007.
[10] M.B. Gulmezoglu, V. Dzhafarov, and A. Barkana, "The common vector approach and its relation to principal component analysis," IEEE Trans. Speech Audio Process., vol.9, no.6, pp.655–662, 2001.
[11] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol.22, no.10, pp.1090–1104, 2000.