Robust sparse coding for face recognition
ABSTRACT Recently the sparse representation (or coding) based classification (SRC) has been successfully used in face recognition. In SRC, the testing image is represented as a sparse linear combination of the training samples, and the representation fidelity is measured by the $l_2$-norm or $l_1$-norm of the coding residual. Such a sparse coding model actually assumes that the coding residual follows a Gaussian or Laplacian distribution, which may not be accurate enough to describe the coding errors in practice. In this paper, we propose a new scheme, namely the robust sparse coding (RSC), by modeling the sparse coding as a sparsity-constrained robust regression problem. The RSC seeks the MLE (maximum likelihood estimation) solution of the sparse coding problem, and it is much more robust to outliers (e.g., occlusions, corruptions, etc.) than SRC. An efficient iteratively reweighted sparse coding algorithm is proposed to solve the RSC model. Extensive experiments on representative face databases demonstrate that the RSC scheme is much more effective than state-of-the-art methods in dealing with face occlusion, corruption, lighting and expression changes, etc.

Robust Sparse Coding for Face Recognition
Meng Yang
Hong Kong Polytechnic Univ.
Lei Zhang∗
Jian Yang
Nanjing Univ. of Sci. & Tech.
David Zhang
Hong Kong Polytechnic Univ.
1. Introduction
As a powerful tool for statistical signal modeling, sparse representation (or sparse coding) has been successfully used in image processing applications [16], and has recently led to promising results in face recognition [24, 25, 27] and texture classification [15]. Based on the finding that natural images can generally be coded by structural primitives (e.g., edges and line segments) that are qualitatively similar in form to simple cell receptive fields [18], sparse coding techniques represent a natural image using a small number of atoms parsimoniously chosen from an over-complete dictionary. Intuitively, the sparsity of the coding coefficient vector can be measured by its $l_0$-norm, which counts the number of nonzero entries in a vector. Since combinatorial $l_0$-norm minimization is an NP-hard problem, $l_1$-norm minimization, as the closest convex function to $l_0$-norm minimization, is widely employed in sparse coding, and it has been shown that $l_0$-norm and $l_1$-norm minimizations are equivalent if the solution is sufficiently sparse [3]. In general, the sparse coding problem can be formulated as

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1 \quad \text{s.t.} \quad \|\boldsymbol{y} - D\boldsymbol{\alpha}\|_2^2 \le \varepsilon, \qquad (1)$$

where $\boldsymbol{y}$ is a given signal, $D$ is the dictionary of coding atoms, $\boldsymbol{\alpha}$ is the coding vector of $\boldsymbol{y}$ over $D$, and $\varepsilon > 0$ is a constant.

∗Corresponding author. This research is supported by the Hong Kong General Research Fund (PolyU 5351/08E).
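As a minimal sketch of how a problem of the form of Eq. (1) can be solved numerically, the block below minimizes the unconstrained Lagrangian counterpart $\frac{1}{2}\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_1$ with the iterative soft-thresholding algorithm (ISTA). ISTA, the function names `soft_threshold` and `ista`, the factor 1/2, the value of `lam` and the random test data are all assumptions made for illustration; they are not the solver or the settings used in this paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, lam=0.01, n_iter=500):
    """Minimize 0.5 * ||y - D a||_2^2 + lam * ||a||_1 by ISTA (illustrative solver)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)              # gradient of the quadratic fidelity term
        a = soft_threshold(a - step * grad, step * lam)
    return a

# Hypothetical toy problem: a 3-sparse code over a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 60))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(60)
a_true[[3, 17, 42]] = [1.0, -0.8, 0.6]
y = D @ a_true
a_hat = ista(D, y)
```

The constrained form in Eq. (1) and this Lagrangian form trace out the same family of solutions as $\varepsilon$ and $\lambda$ vary, which is why a penalized solver is a convenient stand-in here.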
Face recognition (FR) is among the most visible and challenging research topics in computer vision and pattern recognition [29], and many methods, such as Eigenfaces [21], Fisherfaces [2] and SVM [7], have been proposed in the past two decades. Recently, Wright et al. [25] applied sparse coding to FR and proposed the sparse representation based classification (SRC) scheme, which achieves impressive FR performance. By coding a query image $\boldsymbol{y}$ as a sparse linear combination of the training samples via the $l_1$-norm minimization in Eq. (1), SRC classifies the query image $\boldsymbol{y}$ by evaluating which class of training samples yields the minimal reconstruction error with the associated coding coefficients. In addition, by introducing an identity matrix $I$ as a dictionary to code the outlier pixels (e.g., corrupted or occluded pixels):

$$\min_{\boldsymbol{\alpha},\boldsymbol{\beta}} \|[\boldsymbol{\alpha};\boldsymbol{\beta}]\|_1 \quad \text{s.t.} \quad \boldsymbol{y} = [D, I] \cdot [\boldsymbol{\alpha};\boldsymbol{\beta}], \qquad (2)$$

the SRC method shows high robustness to face occlusion and corruption. In [9], Huang et al. proposed a sparse representation recovery method that is invariant to image-plane transformation to deal with misalignment and pose variation in FR, while in [22] Wagner et al. proposed a sparse representation based method that can deal with face misalignment and illumination variation. Instead of directly using the original facial features, Yang and Zhang [27] used Gabor features in SRC to greatly reduce the size of the occlusion dictionary and substantially improve the FR accuracy.
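The SRC decision rule described above (pick the class whose training samples give the smallest reconstruction error with their own coefficients) can be sketched as follows. The function name `src_classify`, the `labels` array and the toy dictionary are hypothetical; the coding vector `alpha` is assumed to have been computed beforehand by some $l_1$ solver.

```python
import numpy as np

def src_classify(D, labels, y, alpha):
    """Return the class whose atoms, with their own coefficients, best reconstruct y."""
    labels = np.asarray(labels)
    residuals = {}
    for c in np.unique(labels):
        alpha_c = np.where(labels == c, alpha, 0.0)   # keep only class-c coefficients
        residuals[int(c)] = float(np.linalg.norm(y - D @ alpha_c))
    best = min(residuals, key=residuals.get)
    return best, residuals

# Hypothetical toy example: 4 unit-norm atoms, two per class.
D = np.eye(3)[:, [0, 1, 2, 0]]
labels = [0, 0, 1, 1]
alpha = np.array([0.0, 0.1, 0.9, 0.0])   # most of the energy sits on a class-1 atom
y = D @ alpha
best, residuals = src_classify(D, labels, y, alpha)
```

In this toy case the class-1 atoms leave a residual of 0.1 and the class-0 atoms a residual of 0.9, so the query is assigned to class 1.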
The sparse coding model in Eq. (1) is widely used in the literature. There are two main issues with this model. The first is whether the $l_1$-norm constraint $\|\boldsymbol{\alpha}\|_1$ is good enough to characterize the signal sparsity. The second is whether the $l_2$-norm term $\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_2^2 \le \varepsilon$ is effective enough to characterize the signal fidelity, especially when the observation $\boldsymbol{y}$ is noisy or has many outliers. Much work has been done on the first issue by modifying the sparsity constraint. For example, Liu et al. [14] added a nonnegative constraint to the sparse coefficient $\boldsymbol{\alpha}$; Gao et al. [4] introduced a Laplacian term of the coefficient in sparse coding; Wang et al. [23] used the weighted $l_2$-norm for the sparsity constraint. In addition, Ramirez et al. [19] proposed a framework of universal sparse modeling to design sparsity regularization terms. Bayesian methods have also been used for designing the sparsity regularization terms [11].
The above developments of the sparsity regularization term in Eq. (1) improve the sparse representation in different aspects; however, to the best of our knowledge, little work has been done on improving the fidelity term $\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_2^2$, except that in [24, 25] the $l_1$-norm was used to define the coding fidelity (i.e., $\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_1$). In fact, the fidelity term has a high impact on the final coding results because it ensures that the given signal $\boldsymbol{y}$ can be faithfully represented by the dictionary $D$. From the viewpoint of maximum likelihood estimation (MLE), defining the fidelity term with the $l_2$- or $l_1$-norm actually assumes that the coding residual $\boldsymbol{e} = \boldsymbol{y} - D\boldsymbol{\alpha}$ follows a Gaussian or Laplacian distribution. In practice, however, this assumption may not hold well, especially when occlusions, corruptions and expression variations occur in the query face images. So the conventional $l_2$- or $l_1$-norm based fidelity term in the sparse coding model of Eq. (1) may not be robust enough in these cases. Meanwhile, these problems cannot be well solved by modifying the sparsity regularization term.
To improve the robustness and effectiveness of sparse representation, we propose a so-called robust sparse coding (RSC) model in this paper. Inspired by robust regression theory [1, 10], we design the signal fidelity term as an MLE-like estimator, which minimizes some function (associated with the distribution of the coding residuals) of the coding residuals. The proposed RSC scheme utilizes the MLE principle to robustly regress the given signal with sparse regression coefficients, and we transform the minimization problem into an iteratively reweighted sparse coding problem. A reasonable weight function is designed for applying RSC to FR. Our extensive experiments on benchmark face databases show that RSC achieves much better performance than existing sparse coding based FR methods, especially when there are complicated variations of face images, such as occlusions, corruptions and expressions.
The rest of this paper is organized as follows. Section 2 presents the proposed RSC model. Section 3 presents the algorithm of RSC and some analyses, such as convergence and complexity. Section 4 conducts the experiments, and Section 5 concludes the paper.
2. Robust Sparse Coding (RSC)
2.1. The RSC model
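The traditional sparse coding model in Eq. (1) is equivalent to the so-called LASSO problem [20]:

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{y} - D\boldsymbol{\alpha}\|_2^2 \quad \text{s.t.} \quad \|\boldsymbol{\alpha}\|_1 \le \sigma, \qquad (3)$$

where $\sigma > 0$ is a constant, $\boldsymbol{y} = [y_1; y_2; \cdots; y_n] \in \mathbb{R}^n$ is the signal to be coded, $D = [\boldsymbol{d}_1, \boldsymbol{d}_2, \cdots, \boldsymbol{d}_m] \in \mathbb{R}^{n \times m}$ is the dictionary with column vector $\boldsymbol{d}_j$ being the $j$th atom, and $\boldsymbol{\alpha}$ is the coding coefficient vector. In our problem of FR, the atom $\boldsymbol{d}_j$ is a training face sample (or its dimensionality-reduced feature) and hence the dictionary $D$ is the training dataset.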
We can see that the sparse coding problem in Eq. (3) is essentially a sparsity-constrained least squares estimation problem. It is known that the least squares solution is the MLE solution only when the residual $\boldsymbol{e} = \boldsymbol{y} - D\boldsymbol{\alpha}$ follows the Gaussian distribution. If $\boldsymbol{e}$ follows the Laplacian distribution, the MLE solution will be

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{y} - D\boldsymbol{\alpha}\|_1 \quad \text{s.t.} \quad \|\boldsymbol{\alpha}\|_1 \le \sigma. \qquad (4)$$

Actually Eq. (4) is essentially another expression of Eq. (2) because both of them can have the following Lagrangian formulation: $\min_{\boldsymbol{\alpha}} \{\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_1 + \lambda\|\boldsymbol{\alpha}\|_1\}$ [26].
In practice, however, the distribution of the residual $\boldsymbol{e}$ may be far from Gaussian or Laplacian, especially when there are occlusions, corruptions and/or other variations. Hence, the conventional sparse coding models in Eq. (3) (or Eq. (1)) and Eq. (4) (or Eq. (2)) may not be robust and effective enough for face image representation.
In order to construct a more robust model for sparse coding of face images, in this paper we propose to find an MLE solution of the coding coefficients. We rewrite the dictionary $D$ as $D = [\boldsymbol{r}_1; \boldsymbol{r}_2; \cdots; \boldsymbol{r}_n]$, where row vector $\boldsymbol{r}_i$ is the $i$th row of $D$. Denote by $\boldsymbol{e} = \boldsymbol{y} - D\boldsymbol{\alpha} = [e_1; e_2; \cdots; e_n]$ the coding residual. Then each element of $\boldsymbol{e}$ is $e_i = y_i - \boldsymbol{r}_i\boldsymbol{\alpha}$, $i = 1, 2, \cdots, n$. Assume that $e_1, e_2, \cdots, e_n$ are independently and identically distributed according to some probability density function (PDF) $f_{\boldsymbol{\theta}}(e_i)$, where $\boldsymbol{\theta}$ denotes the parameter set that characterizes the distribution. Without considering the sparsity constraint of $\boldsymbol{\alpha}$, the likelihood of the estimator is $L_{\boldsymbol{\theta}}(e_1, e_2, \cdots, e_n) = \prod_{i=1}^{n} f_{\boldsymbol{\theta}}(e_i)$, and MLE aims to maximize this likelihood function or, equivalently, minimize the objective function $-\ln L_{\boldsymbol{\theta}} = \sum_{i=1}^{n} \rho_{\boldsymbol{\theta}}(e_i)$, where $\rho_{\boldsymbol{\theta}}(e_i) = -\ln f_{\boldsymbol{\theta}}(e_i)$.
With consideration of the sparsity constraint of $\boldsymbol{\alpha}$, the MLE of $\boldsymbol{\alpha}$, namely the robust sparse coding (RSC), can be formulated as the following minimization:

$$\min_{\boldsymbol{\alpha}} \sum_{i=1}^{n} \rho_{\boldsymbol{\theta}}(y_i - \boldsymbol{r}_i\boldsymbol{\alpha}) \quad \text{s.t.} \quad \|\boldsymbol{\alpha}\|_1 \le \sigma. \qquad (5)$$
In general, we assume that the unknown PDF $f_{\boldsymbol{\theta}}(e_i)$ is symmetric, and $f_{\boldsymbol{\theta}}(e_i) < f_{\boldsymbol{\theta}}(e_j)$ if $|e_i| > |e_j|$. So $\rho_{\boldsymbol{\theta}}(e_i)$ has the following properties: $\rho_{\boldsymbol{\theta}}(0)$ is the global minimum of $\rho_{\boldsymbol{\theta}}(e_i)$; $\rho_{\boldsymbol{\theta}}(e_i) = \rho_{\boldsymbol{\theta}}(-e_i)$; $\rho_{\boldsymbol{\theta}}(e_i) > \rho_{\boldsymbol{\theta}}(e_j)$ if $|e_i| > |e_j|$. Without loss of generality, we let $\rho_{\boldsymbol{\theta}}(0) = 0$.
From Eq. (5), we can see that the proposed RSC model is essentially a sparsity-constrained MLE problem. In other words, it is a more general sparse coding model, while the conventional sparse coding models in Eq. (3) and Eq. (4) are special cases of it when the coding residual follows Gaussian and Laplacian distributions, respectively.
By solving Eq. (5), we can get the MLE solution to $\boldsymbol{\alpha}$ with sparsity constraint. Clearly, one key problem is how to determine the distribution $\rho_{\boldsymbol{\theta}}$ (or $f_{\boldsymbol{\theta}}$). Explicitly taking $f_{\boldsymbol{\theta}}$ as a Gaussian or Laplacian distribution is simple but not effective enough. In this paper, we do not determine $\rho_{\boldsymbol{\theta}}$ directly to solve Eq. (5). Instead, with the above mentioned general assumptions on $\rho_{\boldsymbol{\theta}}$, we transform the minimization problem in Eq. (5) into an iteratively reweighted sparse coding problem, and the resulting weights have a clear physical meaning, i.e., outliers will have low weight values. By iteratively computing the weights, the MLE solution of RSC can be found efficiently.
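To make the two special cases concrete, the following short derivation evaluates $\rho_{\boldsymbol{\theta}}(e_i) = -\ln f_{\boldsymbol{\theta}}(e_i)$ for zero-mean Gaussian and Laplacian densities. The scale symbols $s$ and $b$ are notation introduced here for illustration only (they do not appear elsewhere in the paper), with $s$ chosen to avoid a clash with the constraint constant $\sigma$.

```latex
% Gaussian residual with (assumed) standard deviation s > 0:
f(e_i) = \frac{1}{\sqrt{2\pi}\,s}\exp\!\left(-\frac{e_i^2}{2s^2}\right)
\;\Longrightarrow\;
\sum_{i=1}^{n} -\ln f(e_i) = \frac{1}{2s^2}\,\|\boldsymbol{e}\|_2^2 + n\ln\!\left(\sqrt{2\pi}\,s\right)

% Laplacian residual with (assumed) scale b > 0:
f(e_i) = \frac{1}{2b}\exp\!\left(-\frac{|e_i|}{b}\right)
\;\Longrightarrow\;
\sum_{i=1}^{n} -\ln f(e_i) = \frac{1}{b}\,\|\boldsymbol{e}\|_1 + n\ln(2b)
```

Dropping the additive constants and the positive scale factors, which do not change the minimizer over $\boldsymbol{\alpha}$, recovers the $l_2$ fidelity term of Eq. (3) and the $l_1$ fidelity term of Eq. (4), respectively.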
2.2. The distribution induced weights
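Let $F_{\boldsymbol{\theta}}(\boldsymbol{e}) = \sum_{i=1}^{n} \rho_{\boldsymbol{\theta}}(e_i)$. We can approximate $F_{\boldsymbol{\theta}}(\boldsymbol{e})$ by its first order Taylor expansion in the neighborhood of $\boldsymbol{e}_0$: $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e}) = F_{\boldsymbol{\theta}}(\boldsymbol{e}_0) + (\boldsymbol{e} - \boldsymbol{e}_0)^T F'_{\boldsymbol{\theta}}(\boldsymbol{e}_0) + R_1(\boldsymbol{e})$, where $R_1(\boldsymbol{e})$ is the high order residual term, and $F'_{\boldsymbol{\theta}}(\boldsymbol{e})$ is the derivative of $F_{\boldsymbol{\theta}}(\boldsymbol{e})$. Denote by $\rho'_{\boldsymbol{\theta}}$ the derivative of $\rho_{\boldsymbol{\theta}}$; then $F'_{\boldsymbol{\theta}}(\boldsymbol{e}_0) = [\rho'_{\boldsymbol{\theta}}(e_{0,1}); \rho'_{\boldsymbol{\theta}}(e_{0,2}); \cdots; \rho'_{\boldsymbol{\theta}}(e_{0,n})]$, where $e_{0,i}$ is the $i$th element of $\boldsymbol{e}_0$.
In sparse coding, it is usually expected that the fidelity term is strictly convex. So we approximate the residual term as $R_1(\boldsymbol{e}) = 0.5(\boldsymbol{e} - \boldsymbol{e}_0)^T W (\boldsymbol{e} - \boldsymbol{e}_0)$, where $W$ is a diagonal matrix because the elements in $\boldsymbol{e}$ are independent and there is no cross term between $e_i$ and $e_j$, $i \ne j$, in $F_{\boldsymbol{\theta}}(\boldsymbol{e})$. Since $F_{\boldsymbol{\theta}}(\boldsymbol{e})$ reaches its minimal value (i.e., 0) at $\boldsymbol{e} = 0$, we also require that $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e})$ has its minimal value at $\boldsymbol{e} = 0$. Letting $\tilde{F}'_{\boldsymbol{\theta}}(0) = 0$, we have the diagonal element of $W$ as

$$W_{i,i} = \omega_{\boldsymbol{\theta}}(e_{0,i}) = \rho'_{\boldsymbol{\theta}}(e_{0,i}) / e_{0,i}. \qquad (6)$$

According to the properties of $\rho_{\boldsymbol{\theta}}(e_i)$, $\rho'_{\boldsymbol{\theta}}(e_i)$ will have the same sign as $e_i$. So each $W_{i,i}$ is a nonnegative scalar. Then $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e})$ can be written as $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e}) = \frac{1}{2}\|W^{1/2}\boldsymbol{e}\|_2^2 + b$, where $b$ is a scalar value determined by $\boldsymbol{e}_0$. Since $\boldsymbol{e} = \boldsymbol{y} - D\boldsymbol{\alpha}$, the RSC model in Eq. (5) can be approximated by

$$\min_{\boldsymbol{\alpha}} \|W^{1/2}(\boldsymbol{y} - D\boldsymbol{\alpha})\|_2^2 \quad \text{s.t.} \quad \|\boldsymbol{\alpha}\|_1 \le \sigma, \qquad (7)$$

which is clearly a weighted LASSO problem. Because the weight matrix $W$ needs to be estimated using Eq. (6), Eq. (7) is a local approximation of the RSC in Eq. (5) at $\boldsymbol{e}_0$, and the minimization procedure of RSC can be transformed into an iteratively reweighted sparse coding problem with $W$ being updated using the residuals of the previous iteration via Eq. (6). Each $W_{i,i}$ is a nonnegative scalar, so the weighted LASSO in each iteration is a convex problem, which can be solved easily by methods such as l1_ls [12].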
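One practical consequence of Eq. (7) is that, because $W$ is diagonal and nonnegative, the weighted problem can be handed to any ordinary LASSO solver after scaling each row of $D$ and each entry of $\boldsymbol{y}$ by $\sqrt{W_{i,i}}$. The sketch below checks this identity numerically; the helper name `reweight_problem` and the random data are illustrative assumptions.

```python
import numpy as np

def reweight_problem(D, y, w):
    """Fold diagonal weights w (W = diag(w), w >= 0) into the dictionary and the signal."""
    s = np.sqrt(w)
    return s[:, None] * D, s * y      # W^(1/2) D and W^(1/2) y

# Hypothetical data, used only to verify the identity.
rng = np.random.default_rng(1)
D = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
w = rng.uniform(0.0, 1.0, size=5)
alpha = rng.standard_normal(3)

Dw, yw = reweight_problem(D, y, w)
weighted_fidelity = np.sum(w * (y - D @ alpha) ** 2)     # ||W^(1/2)(y - D alpha)||_2^2
plain_fidelity = np.linalg.norm(yw - Dw @ alpha) ** 2    # ||yw - Dw alpha||_2^2
```

The two fidelity values agree for any `alpha`, so the sparsity constraint is untouched and only the data passed to the solver change.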
Since $W$ is a diagonal matrix, its element $W_{i,i}$ (i.e., $\omega_{\boldsymbol{\theta}}(e_i)$) is the weight assigned to each pixel of the query image $\boldsymbol{y}$. Intuitively, in FR the outlier pixels (e.g., occluded or corrupted pixels) should have low weight values. Thus, with Eq. (7) the determination of the distribution $\rho_{\boldsymbol{\theta}}$ is transformed into the determination of the weight $W$. Considering that the logistic function has properties similar to the hinge loss function in SVM [28], we choose it as the weight function

$$\omega_{\boldsymbol{\theta}}(e_i) = \frac{\exp(\mu\delta - \mu e_i^2)}{1 + \exp(\mu\delta - \mu e_i^2)}, \qquad (8)$$

where $\mu$ and $\delta$ are positive scalars. Parameter $\mu$ controls the decreasing rate from 1 to 0, and $\delta$ controls the location of the demarcation point. With Eq. (8), Eq. (6) and $\rho_{\boldsymbol{\theta}}(0) = 0$, we get

$$\rho_{\boldsymbol{\theta}}(e_i) = -\frac{1}{2\mu}\left(\ln\left(1 + \exp(\mu\delta - \mu e_i^2)\right) - \ln\left(1 + \exp(\mu\delta)\right)\right). \qquad (9)$$

The original sparse coding models in Eqs. (3) and (4) can be interpreted by Eq. (7). The model in Eq. (3) is the case of letting $\omega_{\boldsymbol{\theta}}(e_i) = 2$. The model in Eq. (4) is the case of letting $\omega_{\boldsymbol{\theta}}(e_i) = 1/|e_i|$. Compared with the models in Eqs. (3) and (4), the proposed weighted LASSO in Eq. (7) has the following advantage: outliers (usually the pixels with big residuals) are adaptively assigned low weights to reduce their effect on the regression estimation, so that the sensitivity to outliers is greatly reduced. The weight function of Eq. (8) is bounded in [0,1]. Although the model in Eq. (4) also assigns low weights to outliers, its weight function is not bounded. The weights of pixels with very small residuals will have nearly infinite values. This reduces the stability of the coding process.
The convexity of the RSC model (Eq. (5)) depends on the form of $\rho_{\boldsymbol{\theta}}(e_i)$ or the weight function $\omega_{\boldsymbol{\theta}}(e_i)$. If we simply let $\omega_{\boldsymbol{\theta}}(e_i) = 2$, the RSC degenerates to the original sparse coding problem (Eq. (3)), which is convex but not effective. The RSC model is not convex with the weight function defined in Eq. (8). However, for FR, a good initialization can always be obtained, and our RSC algorithm described in the next section can always find a locally optimal solution, which has very good FR performance as validated in the experiments in Section 4.
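The relation between the weight function of Eq. (8) and the induced $\rho_{\boldsymbol{\theta}}$ of Eq. (9) can be checked numerically. The sketch below is a minimal illustration in plain Python; the function names `weight` and `rho` and the sample values $\mu = 8$, $\delta = 1$ are assumptions chosen only for the check, and `rho` as written assumes $\mu\delta$ is moderate (as with $\mu = c/\delta$ in Section 4.1) so that the exponential does not overflow.

```python
import math

def weight(e, mu, delta):
    """Eq. (8), written as 1 / (1 + exp(mu * (e^2 - delta))) to avoid overflow."""
    z = mu * (e * e - delta)
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))

def rho(e, mu, delta):
    """Eq. (9): the fidelity function whose derivative satisfies rho'(e) = e * weight(e)."""
    return -(math.log1p(math.exp(mu * delta - mu * e * e))
             - math.log1p(math.exp(mu * delta))) / (2.0 * mu)

mu, delta = 8.0, 1.0
e, h = 0.5, 1e-5
numeric_derivative = (rho(e + h, mu, delta) - rho(e - h, mu, delta)) / (2.0 * h)
analytic_derivative = e * weight(e, mu, delta)   # Eq. (6): rho'(e) = omega(e) * e
```

At $e_i^2 = \delta$ the weight is exactly 0.5, which is why $\delta$ acts as the demarcation point between pixels treated as inliers and pixels treated as outliers.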
3. Algorithm of RSC
As discussed in Section 2.2, the implementation of RSC can be an iterative process, and in each iteration it is a convex $l_1$-minimization problem. In this section we propose such an iteratively reweighted sparse coding (IRSC) algorithm to solve the RSC minimization.
3.1. Iteratively reweighted sparse coding (IRSC)
Although in general the RSC model can only have a locally optimal solution, fortunately in FR we are able to have a very reasonable initialization to achieve good performance. When a testing face image $\boldsymbol{y}$ comes, in order to initialize the weight, we should first estimate the coding residual $\boldsymbol{e}$ of $\boldsymbol{y}$. We can initialize $\boldsymbol{e}$ as $\boldsymbol{e} = \boldsymbol{y} - \boldsymbol{y}_{ini}$, where $\boldsymbol{y}_{ini}$ is some initial estimation of the true face from the observation $\boldsymbol{y}$. Because we do not know which class the testing face image $\boldsymbol{y}$ belongs to, a reasonable $\boldsymbol{y}_{ini}$ can be set as the mean image of all training images. In this paper, we simply compute $\boldsymbol{y}_{ini}$ as

$$\boldsymbol{y}_{ini} = \boldsymbol{m}_D, \qquad (10)$$

where $\boldsymbol{m}_D$ is the mean image of all training samples.
With the initialized $\boldsymbol{y}_{ini}$, our algorithm to solve the RSC model, namely Iteratively Reweighted Sparse Coding (IRSC), is summarized in Algorithm 1.
When RSC converges, we use the same classification strategy as in SRC [25] to classify the face image $\boldsymbol{y}$.
3.2. The convergence of IRSC
The weighted sparse coding in Eq. (7) is a local approximation of RSC in Eq. (5), and in each iteration the objective function value of Eq. (5) decreases by the IRSC algorithm. Since the original cost function of Eq. (5) is lower bounded ($\ge 0$), the iterative minimization procedure in IRSC will converge.
The convergence is achieved when the difference of the weight between adjacent iterations is small enough. Specifically, we stop the iteration if the following holds:

$$\|W^{(t)} - W^{(t-1)}\|_2 / \|W^{(t-1)}\|_2 < \gamma, \qquad (12)$$

where $\gamma$ is a small positive scalar.
3.3. Complexity analysis
The complexity of both SRC and the proposed IRSC mainly lies in the sparse coding process, i.e., Eq. (3) and Eq. (7). Supposing that the dimensionality $n$ of the face feature is fixed, the complexity of the sparse coding model in Eq. (3) basically depends on the number of dictionary atoms, i.e., $m$. The empirical complexity of commonly used $l_1$-regularized sparse coding methods (such as l1_ls [12]) to solve Eq. (3) or Eq. (7) is $O(m^{\varepsilon})$ with $\varepsilon \approx 1.5$ [12]. For FR without occlusion, SRC [25] performs sparse coding once and then uses the residuals associated with each class to classify the face image, while RSC needs several iterations (usually 2 iterations) to finish the coding. Thus in this case, the complexity of RSC is higher than that of SRC.
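Algorithm 1 Iteratively Reweighted Sparse Coding
Input: Normalized test sample $\boldsymbol{y}$ with unit $l_2$-norm, dictionary $D$ (each column of $D$ has unit $l_2$-norm) and $\boldsymbol{y}_{rec}^{(1)}$ initialized as $\boldsymbol{y}_{ini}$.
Output: $\boldsymbol{\alpha}$
Start from $t = 1$:
1: Compute residual $\boldsymbol{e}^{(t)} = \boldsymbol{y} - \boldsymbol{y}_{rec}^{(t)}$.
2: Estimate weights as
$$\omega_{\boldsymbol{\theta}}(e_i^{(t)}) = \frac{\exp\left(\mu^{(t)}\delta^{(t)} - \mu^{(t)}(e_i^{(t)})^2\right)}{1 + \exp\left(\mu^{(t)}\delta^{(t)} - \mu^{(t)}(e_i^{(t)})^2\right)}, \qquad (11)$$
where $\mu^{(t)}$ and $\delta^{(t)}$ are parameters estimated in the $t$th iteration (please refer to Section 4.1 for their setting).
3: Sparse coding:
$$\boldsymbol{\alpha}^{*} = \arg\min_{\boldsymbol{\alpha}} \|(W^{(t)})^{1/2}(\boldsymbol{y} - D\boldsymbol{\alpha})\|_2^2 \quad \text{s.t.} \quad \|\boldsymbol{\alpha}\|_1 \le \sigma,$$
where $W^{(t)}$ is the estimated diagonal weight matrix with $W_{i,i}^{(t)} = \omega_{\boldsymbol{\theta}}(e_i^{(t)})$.
4: Update the sparse coding coefficients:
If $t = 1$, $\boldsymbol{\alpha}^{(t)} = \boldsymbol{\alpha}^{*}$;
If $t > 1$, $\boldsymbol{\alpha}^{(t)} = \boldsymbol{\alpha}^{(t-1)} + \eta^{(t)}(\boldsymbol{\alpha}^{*} - \boldsymbol{\alpha}^{(t-1)})$;
where $0 < \eta^{(t)} < 1$ is the step size, and a suitable $\eta^{(t)}$ should make $\sum_{i=1}^{n} \rho_{\boldsymbol{\theta}}(e_i^{(t)}) < \sum_{i=1}^{n} \rho_{\boldsymbol{\theta}}(e_i^{(t-1)})$. $\eta^{(t)}$ can be searched from 1 to 0 by the standard line-search process [8]. (Since both $\boldsymbol{\alpha}^{(t-1)}$ and $\boldsymbol{\alpha}^{*}$ belong to the convex set $Q = \{\|\boldsymbol{\alpha}\|_1 \le \sigma\}$, $\boldsymbol{\alpha}^{(t)}$ will also belong to $Q$.)
5: Compute the reconstructed test sample:
$$\boldsymbol{y}_{rec}^{(t)} = D\boldsymbol{\alpha}^{(t)},$$
and let $t = t + 1$.
6: Go back to step 1 until the condition of convergence (described in Section 3.2) is met, or the maximal number of iterations is reached.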
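The steps of Algorithm 1 can be sketched as follows. This is a simplified illustration under several stated assumptions, not the released implementation: the inner weighted problem is solved in Lagrangian form by ISTA instead of l1_ls; the step size $\eta^{(t)}$ is fixed to 1 (no line search); the norm in the stopping rule of Eq. (12) is taken as the Euclidean norm of the diagonal of $W$; the normalization of $\boldsymbol{y}$ is omitted; and the function names, the clipping of the exponent and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_l1_coding(D, y, w, lam, n_iter=300):
    """ISTA on 0.5 * ||W^(1/2)(y - D a)||_2^2 + lam * ||a||_1 (stand-in for l1_ls)."""
    s = np.sqrt(w)
    Dw, yw = s[:, None] * D, s * y
    L = np.linalg.norm(Dw, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    if L == 0.0:
        return a
    for _ in range(n_iter):
        a = soft_threshold(a - Dw.T @ (Dw @ a - yw) / L, lam / L)
    return a

def logistic_weights(e, tau=0.8, c=8.0):
    """Eq. (11) with delta = psi_a(k), k = floor(tau * n), and mu = c / delta."""
    psi = np.sort(e ** 2)
    k = max(int(np.floor(tau * e.size)), 1)
    delta = max(psi[k - 1], 1e-12)            # guard against a zero demarcation point
    z = np.clip((c / delta) * (e ** 2 - delta), -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(z))

def irsc(D, y, lam=1e-3, tau=0.8, c=8.0, gamma=1e-2, max_iter=10):
    y_rec = D.mean(axis=1)                    # Eq. (10): mean of the training samples
    w_prev, alpha, w = None, None, None
    for _ in range(max_iter):
        e = y - y_rec                         # step 1: residual
        w = logistic_weights(e, tau, c)       # step 2: weights
        alpha = weighted_l1_coding(D, y, w, lam)   # steps 3-4 with eta fixed to 1
        y_rec = D @ alpha                     # step 5: reconstruction
        if w_prev is not None and \
                np.linalg.norm(w - w_prev) < gamma * np.linalg.norm(w_prev):
            break                             # step 6: stopping rule of Eq. (12)
        w_prev = w
    return alpha, w

# Hypothetical test: the query equals one atom, with 10 grossly corrupted pixels.
rng = np.random.default_rng(0)
D = rng.standard_normal((100, 40))
D /= np.linalg.norm(D, axis=0)
y = D[:, 5].copy()
y[:10] += 5.0
alpha, w = irsc(D, y)
```

In this synthetic setting the corrupted pixels end up with weights close to zero, so they contribute almost nothing to the weighted fidelity term, which is the behavior the reweighting is designed to produce.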
For FR with occlusion or corruption, SRC needs to use an identity matrix to code the occluded or corrupted pixels, as shown in Eq. (2). In this case the complexity of SRC is $O((m + n)^{\varepsilon})$. Considering the fact that $n$ is often much greater than $m$ in sparse coding based FR (e.g., $n = 8086$, $m = 717$ in the experiments with pixel corruption and block occlusion in [25]), the complexity of SRC becomes very high when dealing with occlusion and corruption.
The computational complexity of our proposed RSC is $O(k m^{\varepsilon})$, where $k$ is the number of iterations. Note that $k$ depends on the percentage of outliers in the face image. In our experience, when there is a small percentage of outliers, RSC converges in only two iterations. If there is a big percentage of outliers (e.g., occlusion, corruption, etc.), RSC can converge in 10 iterations. So for FR with occlusion, the complexity of RSC is generally much lower than that of SRC. In addition, in the iteration of IRSC we can delete the element $y_i$ that has a very small weight because this implies that $y_i$ is an outlier. Thus the complexity of RSC can be further reduced (e.g., in FR with real disguise on the AR database, about 30% of the pixels could be deleted in each iteration on average).
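As a rough worked example of the comparison above, the empirical cost model $O(m^{\varepsilon})$ with $\varepsilon \approx 1.5$ can be evaluated for the sizes quoted from [25]. The sketch ignores the constants hidden by the $O(\cdot)$ notation, so the numbers indicate relative order of magnitude only; the function name `coding_cost` is hypothetical.

```python
def coding_cost(num_atoms, exponent=1.5):
    """Empirical cost model of one l1-regularized coding run, up to a constant factor."""
    return num_atoms ** exponent

n, m = 8086, 717                       # feature dimension and dictionary size from [25]
src_cost = coding_cost(m + n)          # SRC codes over the extended dictionary [D, I]
rsc_cost_2 = 2 * coding_cost(m)        # RSC with k = 2 iterations (few outliers)
rsc_cost_10 = 10 * coding_cost(m)      # RSC with k = 10 iterations (many outliers)
ratio_10 = src_cost / rsc_cost_10      # how much cheaper RSC is under this model
```

Under this model, even with ten iterations RSC remains several times cheaper than SRC with the identity occlusion dictionary.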
4. Experimental Results
In this section, we perform experiments on benchmark face databases to demonstrate the performance of RSC (source codes accompanying this work are available at http://www.comp.polyu.edu.hk/~cslzhang/code.htm). We first discuss the parameter selection of RSC in Section 4.1; in Section 4.2, we test RSC for FR without occlusion on three face databases (Extended Yale B [5, 13], AR [17], and Multi-PIE [6]). In Section 4.3, we demonstrate the robustness of RSC to random pixel corruption, random block occlusion and real disguise. All the face images are cropped and aligned by using the locations of the eyes, which are provided by the face databases (except for Multi-PIE, for which we manually locate the positions of the eyes). For all methods, the training samples are used as the dictionary $D$ in sparse coding.
4.1. Parameter selection
In the weight function Eq. (8), there are two parameters, $\delta$ and $\mu$, which need to be calculated in Step 2 of IRSC. $\delta$ is the parameter of the demarcation point. When the square of the residual is larger than $\delta$, the weight value is less than 0.5. In order to make the model robust to outliers, we compute the value of $\delta$ as follows.
Denote by $\boldsymbol{\psi} = [(e_1)^2, (e_2)^2, \cdots, (e_n)^2]$. By sorting $\boldsymbol{\psi}$ in ascending order, we get the reordered array $\boldsymbol{\psi}_a$. Let $k = \lfloor \tau n \rfloor$, where scalar $\tau \in (0,1]$, and $\lfloor \tau n \rfloor$ outputs the largest integer not exceeding $\tau n$. We set $\delta$ as

$$\delta = \boldsymbol{\psi}_a(k). \qquad (13)$$

Parameter $\mu$ controls the decreasing rate of the weight value from 1 to 0. Here we simply let $\mu = c/\delta$, where $c$ is a constant. In the experiments, unless otherwise specified, $c$ is set as 8; $\tau$ is set as 0.8 for FR without occlusion, and 0.5 for FR with occlusion. In addition, in our experiments, we solve the (weighted) sparse coding (in Eq. (2), Eq. (3) or Eq. (7)) by its unconstrained Lagrangian formulation. Taking Eq. (3) as an example, its Lagrangian form is $\min_{\boldsymbol{\alpha}} \{\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_1\}$, and the default value for the multiplier, $\lambda$, is 0.001.
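The rule of Eq. (13) together with $\mu = c/\delta$ can be illustrated with a small worked example. The function name `select_parameters` and the five residual values are hypothetical, and the guard `max(..., 1)` for very small $\tau n$ is an added assumption, since the text does not say how $k = 0$ would be handled.

```python
import math

def select_parameters(residuals, tau=0.8, c=8.0):
    """Return (delta, mu) with delta = psi_a(k), k = floor(tau * n), and mu = c / delta."""
    psi_a = sorted(e * e for e in residuals)     # squared residuals in ascending order
    k = max(math.floor(tau * len(psi_a)), 1)     # 1-based index into psi_a
    delta = psi_a[k - 1]
    mu = c / delta
    return delta, mu

# Five residuals, one of which (5.0) plays the role of an outlier pixel.
residuals = [0.1, -0.2, 0.3, 5.0, -0.4]
delta, mu = select_parameters(residuals, tau=0.5)   # tau = 0.5: setting for occlusion
```

Here $n = 5$ and $\tau n = 2.5$, so $k = 2$ and $\delta$ is the second smallest squared residual, about 0.04, giving $\mu$ of about 200. Because $\delta$ is taken from the sorted residuals, a single very large residual such as 5.0 does not shift the demarcation point.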
4.2. Face recognition without occlusion
We first validate the performance of RSC in FR with variations such as illumination and expression changes but without occlusion. We compare RSC with popular methods such as nearest neighbor (NN), nearest subspace (NS), linear support vector machine (SVM), and the recently developed SRC [25].
In the experiments, PCA (i.e., Eigenfaces [21]) is used to reduce the dimensionality of the original face features, and the Eigenface features are used for all the competing methods. Denote by $P$ the subspace projection matrix computed by applying PCA to the training data. Then in RSC, the sparse coding in step 3 of IRSC becomes:

$$\boldsymbol{\alpha}^{*} = \arg\min_{\boldsymbol{\alpha}} \|P(W^{(t)})^{1/2}(\boldsymbol{y} - D\boldsymbol{\alpha})\|_2^2 \quad \text{s.t.} \quad \|\boldsymbol{\alpha}\|_1 \le \sigma.$$

Table 1. Face recognition rates on the Extended Yale B database without occlusion.

Dim        30      84      150     300
NN         66.3%   85.8%   90.0%   91.6%
NS         63.6%   94.5%   95.1%   96.0%
SVM        92.4%   94.9%   96.4%   97.0%
SRC [25]   90.9%   95.5%   96.8%   98.3%
RSC        91.3%   98.1%   98.4%   99.4%

1) Extended Yale B Database: The Extended Yale B [5, 13] database contains about 2,414 frontal face images of 38 individuals. We used the cropped and normalized 54×48 face images, which were taken under varying illumination conditions. We randomly split the database into two halves. One half (about 32 images per person) was used as the dictionary, and the other half for testing. Table 1 shows the recognition rates versus feature dimension for NN, NS, SVM, SRC and RSC. It can be seen that RSC achieves better results than the other methods in all dimensions, except that RSC is slightly worse than SVM when the dimension is 30. When the dimension is 84, RSC achieves about 3% improvement in recognition rate over SRC. The best recognition rate of RSC is 99.4%, compared to 91.6% for NN, 96.0% for NS, 97.0% for SVM, and 98.3% for SRC.
2) AR database: As in [25], a subset (with only illumination and expression changes) that contains 50 males and 50 females was chosen from the AR dataset [17]. For each subject, the seven images from Session 1 were used for training, and the other seven images from Session 2 for testing. The images are cropped to 60×43. The comparison of RSC and its competing methods is given in Table 2. Again, we can see that RSC performs much better than the other four methods in all dimensions, except that RSC is slightly worse than SRC when the dimension is 30. Nevertheless, when the dimension is too low, none of the methods can achieve a very high recognition rate. On the other dimensions, RSC outperforms SRC by about 3%. SVM does not give good results in this experiment because there are not enough training samples (7 samples per class here) and there are high variations between the training set and the testing set. The maximal recognition rates of RSC, SRC, SVM, NS and NN are 96.0%,