Sparsely Encoded Local Descriptor for Face
Recognition
Zhen Cui1,2,3, Shiguang Shan1,2, Xilin Chen1,2, Lei Zhang4
1 Key Lab of Intelligent Information Processing, Institute of Computer Technology, Chinese Academy of Science (CAS), Beijing,
100190, China
2 Graduate University of CAS, Beijing, 100190, China
3 School of Computer Science and Technology, Huaqiao University, Xiamen, 361021,China
4 Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
{zcui, sgshan}@ jdl.ac.cn
Abstract—In this paper, a novel Sparsely Encoded Local Descriptor (SELD) is proposed for face recognition. Compared with previous methods based on K-means or Random-projection trees, a sparsity constraint is introduced in our dictionary learning and subsequent image encoding, which implies a more stable and discriminative face representation. Sparse coding also leads to an image descriptor formed by summation of sparse coefficient vectors, which is quite different from existing descriptors based on codeword appearance frequency (histograms). Extensive experiments on both the FERET and the challenging LFW databases show the effectiveness of the proposed SELD method. On the LFW dataset in particular, recognition accuracy comparable to the best known results is achieved.
Keywords—local descriptor; Random-projection tree; sparse coding; face verification; face identification
I. INTRODUCTION
In the past few decades, face recognition has attracted significant attention due to its wide potential applications in public security, law enforcement, etc. Numerous methods and techniques have been presented, as surveyed in [1], and considerable progress has been achieved. Currently, many face recognition systems are able to work well under well-controlled conditions with cooperative users. However, as revealed by the MBGC [2] and LFW [3] evaluations, face recognition in uncontrolled environments with uncooperative users remains a great challenge. To successfully address this problem, how to represent faces plays the key role.
In the past decade, local-descriptor-based face representation, which models image micro-patterns, has flourished [4,5,6,7,8,9,10,11,12,13,14,15], owing to its robustness to extrinsic variations. A large portion of these methods are based on manually designed local patterns. For example, by combining the signs of the differences between the central pixel's intensity and those of its neighboring pixels, LBP [7] implicitly encodes the micro-patterns of the input image, such as flat areas, spots, lines and edges. Since the sign is invariant to monotonic photometric change, LBP is robust to lighting variation to some extent. Many LBP variants and extensions have also been proposed. Zhao and Pietikäinen extended LBP to the spatio-temporal domain [12]. In order to make LBP robust to random and quantization noise in near-uniform face regions, Local Ternary Patterns (LTP) [13] have been proposed. By combining Gabor filtering with LBP, Local Gabor Binary Patterns (LGBP) [9] were proposed to extend LBP to multiple resolutions and orientations. Later on, the Histogram of Gabor Phase Patterns [8] and the Local Gabor XOR Pattern [14] were further proposed to exploit Gabor phase information. In addition, some local descriptors originally proposed for other object recognition tasks, such as Histograms of Oriented Gradients (HOG) [10] and SIFT [11,15], were also introduced for face recognition. Manually designing local patterns avoids a complicated learning process. Nevertheless, creating an optimal descriptor is non-trivial: one has to balance discriminative power against robustness to data variance.
In contrast to the above hand-crafted approaches, whose patterns are predefined manually, texton-based methods typically learn visual primitives as codewords from a large number of local face image patches and use the frequency of the codewords as the face representation. Considering that high-level facial semantic features consist of low-level micro visual structures, Meng et al. proposed Local Visual Primitives (LVP) for modeling and recognition [16], learning the LVP by K-means clustering. Xie et al. [17] further applied K-means clustering to patch sets sampled from Gabor-filtered face images, quantized the codes of each patch, and finally concatenated block-based histograms of the patterns to describe the whole face. Ahonen et al. [18] also used K-means clustering to build a local filter-response codebook. More recently, Cao et al. [6] pointed out that quantized codes based on K-means tend to have an uneven distribution, so the resulting code histogram is less informative and less compact; they therefore applied Random-projection trees [19] in place of K-means clustering.
Another recent advance in the face recognition field is the sparse-representation-based method. In [20], Wright et al. proposed to recognize a face by finding its sparse coefficients with respect to the whole training set used as the dictionary, and seeking the subject whose samples yield the smallest reconstruction error using the corresponding sparse coefficients. In the case of multiple well-aligned samples per person, the method reports impressive accuracy, especially when faces are partially occluded. Zhang et al. [21] incorporated the face labels in the dictionary-learning stage to
[Figure: pipeline stages — pre-processed input image → sampling template → pixel-wise sampled intensity vectors → pixel-wise sparse code vectors (via a sparse dictionary learnt offline) → block-wise sparse code summation → concatenated code summation → PCA → SELD feature]
Figure 1. Overview of the proposed SELD-based face representation extraction
obtain an efficient dictionary that retains representative power while making the dictionary discriminative. Yang et al. [22] used local Gabor features for the sparse representation and proposed an associated Gabor occlusion computing algorithm to handle occluded face images. The above methods are all based on holistic representations, and are thus not as robust as local methods. Furthermore, the sparse representation method proposed in [20] only works in the scenario where each subject has multiple enrolled face images; therefore, it cannot be applied to the face verification scenario evaluated by the FRGC or LFW protocols.
To address the above problems, in this paper we enhance the texton-learning-based local descriptor method by introducing sparse coding, and thus propose the Sparsely Encoded Local Descriptor (SELD). Simply speaking, in our SELD method a sparsity constraint is introduced during both local visual primitive dictionary learning and the subsequent image encoding, which is distinctly unlike the K-means clustering or Random-projection trees of previous methods [6, 16, 17]. As recently validated by many researchers, sparsity implies more discriminative power and greater stability of the representation.
Another big difference between our SELD method and previous texton-based methods [6, 16, 17] is that our description is based on sparse coefficient vectors rather than codeword frequencies (histograms). Specifically, during the image encoding stage the coefficient vector of the sparse coding is computed at each image position, and these vectors are then summed together to form the local descriptor of each image block. Compared with frequency (histogram) based methods, the coefficient vector acts like a soft clustering and thus implies more robustness to variations in image appearance.
Compared with the sparse representation method in [20], our SELD method is a more general face representation. As a face descriptor, our method can be easily applied to face verification and to face identification with a single sample per person, both of which are impossible for methods like [20].
The proposed SELD method is extensively validated by experiments on two face databases: the Labeled Faces in the Wild (LFW) [3], which is designed for unconstrained face verification, and the FERET database [23], which is used for face identification. On LFW especially, besides comparing our method with previous methods based on K-means or Random-projection trees, we also compare it with existing state-of-the-art approaches that have reported the best known results on LFW. Comparable accuracy is achieved by our method.
II. SPARSELY ENCODED LOCAL DESCRIPTOR
In this section, we first present the flowchart of the
proposed SELD method for face recognition. Then we describe
the critical steps of our method in detail, including how to learn
the sparse dictionary and how to sparsely encode a face image.
A. Overview of SELD for face recognition
As mentioned above, our method is essentially an enhanced texton-based method. Therefore, it follows a similar idea to the bag-of-words method. The main differences lie in the sparsity constraint in dictionary learning and in the non-frequency-based descriptor. The overall flowchart of the proposed SELD-based face representation method is illustrated in Fig. 1 and explained as follows.
As shown in Figure 1, we first align and normalize the original images geometrically and filter them with a DoG filter to remove both high-frequency noise and low-frequency illumination variations. Then, at each pixel, an intensity vector is formed by sampling the intensities of its neighboring pixels according to a predefined sampling template. In the next step, the intensity vector at each pixel is sparsely encoded with the offline-learned sparse dictionary under a non-negativity constraint, which generates a sparse code vector, i.e., the sparse coefficient vector. With these sparse code vectors computed, the face image is spatially partitioned into blocks, and the sparse code vectors in each block are summed together to form a descriptor of the block. Next, the accumulated vectors of all the blocks are concatenated to form a single vector, whose dimensionality is finally reduced by principal component analysis (PCA) to generate the SELD feature of the input face image. For face recognition or verification, the cosine similarity of two SELD features can be used to match two face images.
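The sample → encode → block-sum → concatenate flow above can be sketched as follows. This is a minimal illustration, not the authors' code: `extract_seld` is a hypothetical helper, and `encode` stands in for any non-negative sparse coder over the learned dictionary.

```python
import numpy as np

def extract_seld(image, dictionary, encode, block_grid=(5, 10), patch=5):
    """Sketch of the SELD pipeline: per-pixel patches are sparsely
    encoded, the codes are summed per block, and the block descriptors
    are concatenated. `encode(dictionary, y)` must return a
    non-negative coefficient vector of length K."""
    h, w = image.shape
    r = patch // 2
    hb, wb = block_grid
    K = dictionary.shape[1]
    pooled = np.zeros((hb, wb, K))
    for i in range(r, h - r):
        for j in range(r, w - r):
            y = image[i - r:i + r + 1, j - r:j + r + 1].ravel().astype(float)
            y -= y.mean()                      # zero mean
            n = np.linalg.norm(y)
            if n > 0:
                y /= n                         # unit length
            code = encode(dictionary, y)       # sparse, non-negative coefficients
            bi = min((i - r) * hb // (h - 2 * r), hb - 1)
            bj = min((j - r) * wb // (w - 2 * r), wb - 1)
            pooled[bi, bj] += code             # block-wise summation
    return pooled.reshape(-1)                  # concatenated descriptor (before PCA)
```

The returned vector would then be PCA-projected and compared by cosine similarity, as described above.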
B. Sparse Dictionary Learning
Research on general overcomplete dictionaries mostly commenced over the past decade and is still intensely ongoing. Such dictionaries greatly enrich the possible representations of a signal. Given an overcomplete dictionary matrix D = [d1, d2, …, dK] ∈ R^(n×K), K > n, that contains K prototype signal-atoms, a signal y ∈ R^n can be represented as a sparse linear combination of these atoms.
In this paper, we use the K-SVD algorithm [24] to train the overcomplete dictionary. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. Formally, given a training set with N samples, K-SVD's objective function is

    min_{D,X} ‖Y − DX‖²_F   s.t. ∀i, ‖xᵢ‖₀ ≤ T₀                (1)

where X = (x1, x2, …, xN) with xᵢ ∈ R^K being the sparse coefficient vector for training sample yᵢ, K is the number of codewords in the dictionary, T₀ bounds the number of non-zero coefficients, and ‖·‖₀ is the ℓ₀ norm.
The K-SVD algorithm has two stages. In the first stage, D is fixed, and the above optimization problem reduces to a search for sparse representations with the coefficients summarized in the matrix X; it may be solved by any pursuit algorithm. The second stage updates the dictionary together with the non-zero coefficients. In this stage, the algorithm updates each column dₖ of the dictionary and the corresponding coefficients x_R^k, the k-th row in X. The objective function (1) can be rewritten as

    ‖Y − DX‖²_F = ‖(Y − Σ_{i≠k} dᵢ x_R^i) − dₖ x_R^k‖²_F = ‖Eₖ − dₖ x_R^k‖²_F        (2)

Then SVD is applied to Eₖ to get Eₖ = UΔVᵀ. We choose the first column of U as the updated dₖ, and the first column of V multiplied by Δ(1,1) as the updated x_R^k.
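The rank-1 dictionary-update step above can be sketched in NumPy. This is a minimal, hypothetical implementation (`ksvd_update_atom` is our own name), restricted, as in the standard K-SVD formulation, to the samples whose k-th coefficient is non-zero so that sparsity is preserved.

```python
import numpy as np

def ksvd_update_atom(Y, D, X, k):
    """One K-SVD update of atom k and of the k-th coefficient row,
    assuming the current factorization Y ≈ D @ X."""
    omega = np.nonzero(X[k])[0]              # samples that actually use atom k
    if omega.size == 0:
        return D, X
    # Residual without atom k's contribution, restricted to those samples
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                        # updated atom: first left singular vector
    X[k, omega] = s[0] * Vt[0]               # updated row: scaled first right vector
    return D, X
```

Looping this update over all k, alternated with a pursuit step for X, gives one K-SVD iteration.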
With the above K-SVD method, our sparse dictionary is learned by the following steps:
1) Pre-process each normalized image in the training face image set by DoG filtering.
2) Sample patches of size p×p pixels from the DoG-filtered images to form the patch set S. If we have N training images and sample c patches from each image, there are c×N patches in S. All sampled patches are normalized to zero mean and unit length.
3) Apply the K-SVD algorithm to the patch set S to construct the sparse dictionary D ∈ R^(n×K), where K is the number of codewords in the dictionary.
In the above learning algorithm, one practical question is how many patches should be sampled for training. In principle, it seems that we should sample as densely as possible to obtain a large number of patches. However, this implies a very time-consuming K-SVD. Fortunately, we empirically find that only thousands of patches are sufficient for our purpose, so the patches can be sampled rather sparsely in each image.
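The patch collection of steps 1)–3) (minus the DoG filtering itself) can be sketched as follows; `sample_training_patches` is a hypothetical helper, and c, the per-image patch count, is an assumed parameter.

```python
import numpy as np

def sample_training_patches(images, c=200, p=5, seed=0):
    """Draw c random p×p patches from each (already DoG-filtered)
    image and normalize each patch to zero mean and unit length.
    Returns an n×(c·N) matrix whose columns are K-SVD training samples."""
    rng = np.random.default_rng(seed)
    patches = []
    for img in images:
        h, w = img.shape
        for _ in range(c):
            i = rng.integers(0, h - p + 1)    # top-left corner, rows
            j = rng.integers(0, w - p + 1)    # top-left corner, cols
            v = img[i:i + p, j:j + p].ravel().astype(float)
            v -= v.mean()                     # zero mean
            n = np.linalg.norm(v)
            if n > 0:
                patches.append(v / n)         # unit length
    return np.stack(patches, axis=1)
```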
C. SELD-based face representation
After the sparse dictionary D is learned, we describe in this section how to utilize it to extract the SELD feature of any input face image. As shown in Fig. 1, given an input image already normalized and filtered by DoG, we first sample patches centered at each pixel and normalize the sampled intensity vectors to zero mean and unit length. Then, we apply sparse coding to encode the sampled intensity vectors into sparse code vectors. Formally, let the sampled intensity vector at pixel (i,j) be y_ij. Its sparse code vector α_ij is then computed by the following optimization:

    min_{α_ij} ‖y_ij − Dα_ij‖²₂ + λ‖α_ij‖₁   s.t. α_ij ≥ 0        (3)

where ‖·‖₁ is the ℓ₁ norm. As shown in (3), in our implementation a non-negativity constraint is introduced to guarantee that all the entries of the sparse coefficient vector α_ij are non-negative. The reason we impose this constraint is that, intuitively, we need an additive combination of the codewords to represent each patch. This is also consistent with our subsequent summation of the sparse code vectors in each image block.
After encoding, the input image is converted into a "sparse code" map. The encoded image is then divided into an m×n grid of blocks, and all the non-negative codes in each block are added to form one sparse code vector for that block. Next, the accumulated vectors of all the blocks are concatenated together to form a single vector describing the whole face image.
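Problem (3) can be handed to many off-the-shelf ℓ1 solvers; as a minimal, hypothetical sketch (not the authors' solver), projected ISTA suffices: a gradient step on the quadratic term, an ℓ1 shrink, and a clip to the non-negative orthant.

```python
import numpy as np

def encode_nonneg(D, y, lam=0.1, n_iter=300):
    """Non-negative sparse coding of one patch, a sketch of problem (3):
    min_a 0.5*||y - D a||^2 + lam*||a||_1  s.t. a >= 0,
    solved by projected ISTA."""
    K = D.shape[1]
    a = np.zeros(K)
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(n_iter):
        g = D.T @ (D @ a - y)                # gradient of the quadratic term
        a = np.maximum(a - (g + lam) / L, 0.0)   # shrink by lam/L, project to a >= 0
    return a
```

The returned α_ij are exactly the non-negative codes that get summed block-wise above.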
If we used the concatenated vector as the final descriptor, the dimensionality of the resulting face feature could be very high. A high-dimensional feature not only invites the curse of dimensionality but also incurs large memory and computation costs. Therefore, Principal Component Analysis (PCA) is applied to further reduce the dimensionality and obtain the final SELD feature.
With the extracted SELD feature, many metrics can be used to compute the similarity or distance between two face images, for face verification by thresholding or for identification by the nearest-neighbor rule. In this paper, we select the most commonly used cosine similarity.
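The PCA reduction and cosine matching can be sketched as follows; `pca_fit` and `cosine_similarity` are hypothetical helper names, and the energy-based component selection mirrors the idea of keeping a fixed fraction of the total variance.

```python
import numpy as np

def pca_fit(F, energy=0.98):
    """PCA on a descriptor matrix F (one descriptor per row): return the
    mean and the leading components covering `energy` of the variance."""
    mu = F.mean(axis=0)
    U, s, Vt = np.linalg.svd(F - mu, full_matrices=False)
    var = s ** 2
    d = int(np.searchsorted(np.cumsum(var) / var.sum(), energy)) + 1
    return mu, Vt[:d]                        # components as rows

def cosine_similarity(a, b):
    """Cosine similarity between two (PCA-projected) SELD features."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Verification then reduces to thresholding `cosine_similarity` on a pair; identification picks the gallery face with the largest score.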
III. EXPERIMENTS
In order to evaluate the proposed approach, we carry out extensive experiments on the LFW benchmark [3], where we compare our method not only with previous methods based on K-means or Random-projection trees but also with existing state-of-the-art approaches that have reported the best known results on LFW. Finally, we also compare our method with the K-means method for face identification on the FERET database [23].
A. Experimental setting
The LFW benchmark is designed for unconstrained face verification, with face images containing large variations in pose, age, expression, race and illumination. There are two evaluation modes proposed by the LFW organizers: the image-restricted and the image-unrestricted training modes. This paper only considers the restricted mode. Under this mode, the whole standard testing set consists of ten subsets, and each subset contains 300 same-person pairs and 300 different-person pairs. The performance of an algorithm is measured by a 10-fold cross-validation procedure, and the final average recognition rate serves as the evaluation criterion. For face identification, we use the FERET database and its evaluation protocol.
In all experiments, DoG filters with σ1 = 2.0 and σ2 = 2.0 are used. The size of the sampling template is set to 5×5, the default dictionary size to 256, and the PCA preserves 98% of the total energy.
B. Face verification evaluation on LFW
The original size of each image in LFW is 250×250 pixels. All face images are cropped to 80×150 pixels by simply cutting out the center of the aligned versions provided by Wolf et al. [25]. In the block-wise SELD feature extraction stage, the face images are divided into 5×10 blocks to obtain 50 summed sparse code vectors.
Experiment 1: comparison with K-means and Random-projection tree
The first experiment aims to validate the discriminative power and stability of the proposed SELD by evaluating on LFW under the image-restricted evaluation mode. We compare the proposed sparse dictionary learning method with the previous methods, i.e., K-means and Random-projection tree [19]. In this experiment, 500 images are randomly selected from the LFW training set to train the dictionary. Note that we find the size of the training set has little influence on the final performance.
The mean accuracy curves of the three methods are shown in Fig. 2, with the horizontal axis representing the number of codewords in the dictionary. Meanwhile, the ROC curves of the three methods with 256 codewords are plotted in Fig. 3. From these two figures, it is clear that the proposed method outperforms the previous methods in terms of both mean accuracy and ROC. In particular, the ROC curves show that our method works impressively better than the other two methods when the false positive rate is small. Please note that we have tried our best to faithfully implement the Random-projection tree algorithm of [19], but its performance is still slightly inferior to K-means, which differs slightly from the results reported in [6].
In addition, from Fig. 2 we find that the performance of all three methods increases with the number of codewords. To balance the computational cost, however, we select 256 as the default codeword number in the following experiments.
Figure 2. Performance comparison vs. learning method. We studied the mean accuracy of the learned descriptors using three learning methods: K-means, Random-projection tree, and the proposed sparse coding, with different codeword numbers. Note that PCA is not used here, so that the encodings themselves are compared.
Figure 3. The effects of the three different encoding methods in terms of ROC curves.
Figure 4. The effects of removing the first R dimensions of the SELD feature generated by PCA.
Experiment 2: effect of removing some leading PCA features
In the second experiment, we investigate the effect of removing the first R dimensions from the SELD feature obtained by PCA. The reason we focus on this point is that there are large variations in the LFW images, which implies that the leading eigenvectors mostly encode variations in lighting, pose, and other large variances rather than identity. Therefore, we conjecture that removing some of the leading eigenvectors should improve performance. The results are plotted in Fig. 4. From the figure, it is clear that this conjecture is validated: by removing the first several dimensions, an improvement of about 4 percentage points can be achieved.
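Dropping the leading components can be sketched as follows; `seld_after_pca_drop` and the parameters R and d are hypothetical names chosen for illustration.

```python
import numpy as np

def seld_after_pca_drop(F, R=3, d=50):
    """Sketch of Experiment 2: PCA-project descriptors (rows of F)
    onto the leading d components, then discard the first R of them,
    which (per the text) mostly encode lighting/pose variation."""
    mu = F.mean(axis=0)
    U, s, Vt = np.linalg.svd(F - mu, full_matrices=False)
    d = min(d, Vt.shape[0])                  # cannot keep more components than exist
    Z = (F - mu) @ Vt[:d].T                  # leading-d PCA projection
    return Z[:, R:]                          # drop the R highest-variance dimensions
```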
Experiment 3: fusion of multiple block-wise SELD
In the above experiments, a fixed block partitioning, i.e., 5×10 blocks, is used to extract a single SELD feature. Intuitively, there are other ways to partition the image, yielding multiple block-wise SELD features. Thus, when matching a pair of images, multiple similarities can be computed and fused by the sum rule or by an SVM. Following this idea, we try five different block partitioning modes: 5×10, 4×8, 3×6, 2×4 and 1×10, similar to the hierarchical spatial pyramid structure [26]. We call this method "Multiple block-wise SELD fusion" for short.
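Sum-rule fusion over the partitions can be sketched as follows (a minimal illustration with hypothetical names; training an SVM on the per-partition score vector is the alternative mentioned above).

```python
import numpy as np

def fused_similarity(feats_a, feats_b):
    """Sketch of 'Multiple block-wise SELD fusion' by the sum rule:
    feats_a/feats_b map a partition name (e.g. '5x10') to that
    partition's SELD descriptor for each image; the per-partition
    cosine similarities are simply summed."""
    score = 0.0
    for key in feats_a:
        a, b = feats_a[key], feats_b[key]
        score += float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return score
```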
Figure 5. Our face verification methods compared with one of the state-of-the-art methods in [6], which learns its dictionary with Random-projection trees.
Figure 6. Face verification comparison on the LFW benchmark under the restricted protocol.
In Fig. 5, the above method and single SELD are evaluated on LFW under the image-restricted mode and compared with one of the best known methods on this test [6]. In Fig. 5, "Single LE+holistic" means using the single best LE descriptor to represent the holistic face, with a Random-projection tree as the encoder. "Multi LE+comp", which reports the best performance in [6], not only divides the holistic face into 9 components by component-level face alignment, but also fuses four differently encoded local descriptors fed into a linear SVM; it additionally exploits pose-adaptive adjustment. In contrast, our "Multiple block-wise SELD fusion" method exploits only a single sampling template and a single dictionary, and the sparse coding is conducted only once, so it is evidently more elegant than "Multi LE+comp".
From Fig. 5, it is clear that "Multiple block-wise SELD fusion" performs impressively better than "Single SELD", which implies that different block partitioning strategies are mutually complementary. Fig. 5 also shows that the proposed "Single SELD" consistently outperforms "Single LE" by 4–5 percentage points, and that our "Multiple block-wise SELD fusion" is comparable to "Multi LE+comp". However, as mentioned above, "Multi LE+comp" in [6] is more complicated than our "Multiple block-wise SELD fusion" method.
Comparison with other best known results on LFW
To better validate our method, we also compare it with other previous state-of-the-art approaches [5,6,27,28,29] on the same LFW evaluation, as shown in Fig. 6. From the figure, it is clear that our method is among the best. In addition, it is worth pointing out that, besides the training data in LFW, the method in [27] utilizes external large-scale datasets for feature extraction and classifier design.
C. Face identification on FERET
The LFW evaluation focuses on face verification. In this section, we perform experiments on the FERET dataset [23] to verify the performance of our approach for face identification.
According to the FERET evaluation protocol, algorithms are evaluated against different categories of images covering variations such as lighting change, subjects wearing glasses, and the time between the acquisition dates of the images. The database consists of one standard gallery (1196 images of 1196 subjects) and four probe sets: Fb (1195 images of 1195 subjects), Fc (194 images of 194 subjects), Duplicate I (722 images of 243 subjects, abbreviated as DupI), and Duplicate II (234 images of 75 subjects, abbreviated as DupII).
On this database, we align and normalize the face images to 80×88 pixels. 300 frontal images are randomly selected from the FERET training CD as the training set to learn the dictionary. In our method, the sampling patch size is 5×5 pixels, and the face images are divided into 6×6 blocks.
The results of our method are compared with the K-means and LVP [16] methods in Table 1. In order to compare with LVP, we apply the histogram intersection as the similarity measure. From Table 1, we can make several observations. First, all methods perform almost equally on the Fb set. Second, the SELD method is superior to the K-means-based local encoding method, with improvements of 8, 4 and 4 percentage points on the Fc, DupI and DupII sets respectively. Third, our method also significantly outperforms the LVP method [16] except on the Fb set.
D. Discussion
As shown in the above experiments, the proposed SELD method works impressively better than similar methods based on K-means or Random-projection trees. So, what is the source of the performance gain? To answer this question, we need to analyze the main differences between our method and the previous ones. As mentioned above, the differences lie in two points: 1) a sparsity constraint is introduced in our method during dictionary learning, as well as during image encoding with the dictionary; 2) block-wise summation of the sparse coding coefficients is used rather than the appearance frequency (histogram) of the codewords.
TABLE I. RECOGNITION PERFORMANCE ON THE FERET DATABASE

Descriptor   Fb     Fc     DupI   DupII
LVP [16]     0.97   0.70   0.66   0.50
K-means      0.95   0.76   0.64   0.58
SELD         0.95   0.84   0.68   0.62
The key to the first difference is "sparsity", which has been proven in the recent literature to be very effective for both representation and discrimination. Compared with K-means or Random-projection trees, which learn the local visual patterns that appear most frequently, sparse dictionary learning focuses more on effectively representing the local patches, which endows the codewords with more representational power.
The second difference reflects the contrast between exclusive model selection and additive model summation. Given a sampled patch, traditional methods always encode it as one single codeword by vector quantization, whereas our method encodes it as a vector associated with all the codewords. In other words, traditional methods express a patch with a single codeword, while our method expresses it with a linear combination of all the codewords, which we believe leads to a more stable representation.
IV. CONCLUSION
In this paper, we propose the Sparsely Encoded Local Descriptor (SELD) for robust face recognition. Unlike traditional K-means or Random-projection tree based dictionary learning methods, our sparse dictionary is learned under a sparsity constraint. The encoding procedure is likewise changed from exclusive codeword assignment to sparse coding. Correspondingly, the codeword-frequency-based descriptor is replaced by a block-wise summation of sparse coding coefficient vectors. These characteristics endow the proposed SELD with better performance, as validated by the LFW evaluation, where results comparable to the best known ones are achieved.
In this paper, the proposed SELD method is validated only on the face recognition problem. Nevertheless, SELD is not limited to face recognition, since it is not specially designed for faces. We will therefore apply it to other possible applications, e.g., object categorization.
ACKNOWLEDGMENT
This work is partially supported by the Natural Science Foundation of China under contracts No. 61025010 and No. 60833013; the National Basic Research Program of China (973 Program) under contract 2009CB320902; and the Hi-Tech Research and Development Program of China under contract No. 2009AA01Z317.
REFERENCES
[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, “Face
recognition: A literature survey,” ACM Comput. Surv., vol. 35, no. 4,
pp.399–458, 2003.
[2] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O’Toole,
D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, H.
Sahibzada, J. A. Scallan III, and S. Weimer, “Overview of the Multiple
Biometrics Grand Challenge,” in Proc. International Conference on
Biometrics, 2009.
[3] G. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled faces
in the wild: A database for studying face recognition in unconstrained
environments,” University of Massachusetts, Amherst, Technical Report
07-49, 2007.
[4] G. Hua and A. Akbarzadeh, “A robust elastic and partial matching
metric for face recognition,” in Proc. ICCV, 2009.
[5] Y. Taigman, L. Wolf, and T. Hassner, “Multiple One-Shots for
utilizing class label information,” in Proc. BMVC, 2009.
[6] Z. Cao, Q. Yin, J. Sun, and X. Tang, “Face recognition with
learning-based descriptor,” in Proc. CVPR, 2010.
[7] T. Ahonen, A. Hadid, and M. Pietikäinen, “Face description with local
binary patterns: application to face recognition,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037–2041,
2006.
[8] B. Zhang, S. Shan, X. Chen, and W. Gao, “Histogram of Gabor phase
patterns (HGPP): a novel object representation approach for face
recognition,” IEEE Transactions on Image Processing, vol. 16, no. 1,
pp. 57–68, 2007.
[9] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, “Local Gabor binary
pattern histogram sequence (LGBPHS): a novel non-statistical model for
face representation and recognition,” in Proc. ICCV, 2005.
[10] A. Albiol, D. Monzo, A. Martin, J. Sastre, and A. Albiol, “Face
recognition using HOG–EBGM,” Pattern Recognition Letters, vol. 29,
pp. 1537–1543, 2008.
[11] D. Lowe, “Distinctive image features from scale-invariant keypoints,”
International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[12] G. Zhao and M. Pietikäinen, “Local binary pattern descriptors for
dynamic texture recognition,” in Proc. ICPR, vol. 2, pp. 211–214, 2006.
[13] X. Tan and B. Triggs, “Enhanced local texture feature sets for face
recognition under difficult lighting conditions,” in International
Workshop on Analysis and Modeling of Faces and Gestures, 2007.
[14] S. Xie, S. Shan, X. Chen, and J. Chen, “Fusing Local Patterns of Gabor
Magnitude and Phase for Face Recognition,” IEEE Transactions on
Image Processing, vol. 19, no. 5, pp. 1349–1361, May 2010.
[15] M. Bicego, A. Lagorio, E. Grosso, and M. Tistarelli, “On the Use of
SIFT Features for Face Authentication,” in Computer Vision and Pattern
Recognition Workshop (CVPRW'06), 35, 2006.
[16] X. Meng, S. Shan, X. Chen, and W. Gao, “Local visual primitives (LVP)
for face modeling and recognition,” in Proc. International Conference on
Pattern Recognition, 2006, pp. 536–539.
[17] S. Xie, S. Shan, X. Chen, X. Meng, and W. Gao, “Learned local Gabor
patterns for face representation and recognition,” Signal Processing,
vol. 89, no. 12, pp. 2333–2344, 2009.
[18] T. Ahonen and M. Pietikäinen, “Image description using joint
distribution of filter bank responses,” Pattern Recognition Letters,
vol. 30, no. 4, pp. 368–376, 2009.
[19] Y. Freund, S. Dasgupta, M. Kabra, and N. Verma, “Learning the
structure of manifolds using random projections,” in Proc. NIPS, 2007.
[20] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face
recognition via sparse representation,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[21] Q. Zhang and B. Li, “Discriminative K-SVD for dictionary learning in
face recognition,” in Proc. CVPR, 2010.
[22] M. Yang and L. Zhang, “Gabor Feature based Sparse Representation
for Face Recognition with Gabor Occlusion Dictionary,” in Proc. ECCV,
2010.
[23] P. Phillips, H. Moon, S. Rizvi, and P. Rauss, “The FERET evaluation
methodology for face-recognition algorithms,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104,
2000.
[24] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for
designing overcomplete dictionaries for sparse representation,” IEEE
Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[25] L. Wolf, T. Hassner, and Y. Taigman, “Similarity scores based on
background samples,” in Proc. ACCV, 2009.
[26] J. Yang, K. Yu, and T. Huang, “Supervised translation-invariant sparse
coding,” in Proc. CVPR, 2010.
[27] N. Kumar, A. Berg, P. Belhumeur, and S. Nayar, “Attribute and Simile
classifiers for face verification,” in Proc. ICCV, 2009.
[28] M. Guillaumin, J. Verbeek, and C. Schmid, “Is that you? Metric
learning approaches for face identification,” in Proc. ICCV, 2009.
[29] N. Pinto, J. DiCarlo, and D. Cox, “How far can you get with a modern
face recognition test set using only simple features?” in Proc. CVPR, 2009.