Vol. 23 no. 9 2007, pages 1106–1114
MSVM-RFE: extensions of SVM-RFE for multiclass gene selection
on DNA microarray data
Xin Zhou and David P. Tuck*
Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510, USA
Received on December 7, 2006; revised on January 24, 2007; accepted on January 26, 2007
Associate Editor: David Rocke
Motivation: Given the thousands of genes and the small number of
samples, gene selection has emerged as an important research
problem in microarray data analysis. Support Vector Machine—
Recursive Feature Elimination (SVM-RFE) is one of a group
of recently described algorithms which represent the state-of-the-art
for gene selection. Just like SVM itself, SVM-RFE was originally
designed to solve binary gene selection problems. Several groups
have extended SVM-RFE to solve multiclass problems using one-
versus-all techniques. However, the genes selected from one binary
gene selection problem may reduce the classification performance in
other binary problems.
Results: In the present study, we propose a family of four extensions
to SVM-RFE (called MSVM-RFE) to solve the multiclass gene
selection problem, based on different frameworks of multiclass
SVMs. By simultaneously considering all classes during the gene
selection stages, our proposed extensions identify genes leading to
more accurate classification.
Supplementary information: Supplementary materials, including a
detailed review of both binary and multiclass SVMs, and complete
experimental results, are available at Bioinformatics online.
1 INTRODUCTION
Microarray technology allows researchers to measure the
quantity of mRNA for tens of thousands of genes simulta-
neously. It has the power to create a comprehensive overview of
the gene regulatory network. Studies on DNA microarray
data offer opportunities for advancing fundamental biological
research and clinical practice (Golub et al., 1999; Ramaswamy
et al., 2001; Ross et al., 2000; Staunton et al., 2001; West
et al., 2001).
Normally, a microarray data set contains a large number
of genes (usually several thousand or more) and a relatively
small number of samples or conditions (usually fewer than 100).
Among all the genes, many are irrelevant, insignificant or
redundant to the discriminant problem under investigation.
Hence the identification of informative genes, which have the
greatest power for classification, is of fundamental and
practical importance to the investigation of specific discriminant problems, such as cancer versus non-cancer (Alon et al., 1999) or different tumor types (Ramaswamy et al., 2001).
Many gene selection algorithms have been proposed in the
context of microarray data analysis over the past few years.
However, most of the studies are related to binary (two-class)
gene selection problems (Golub et al., 1999; Guyon et al., 2002;
Lee et al., 2003; Zhou and Mao, 2005a,b; Zhang et al., 2006,
to name a few), and only a few involve multiclass (i.e. more
than two classes) gene selection and classification (Chai and
Domeniconi, 2004; Li et al., 2004; Ramaswamy et al., 2001;
Tibshirani et al., 2002). As the multiclass problem is
intrinsically more difficult and presents more challenges, it is
worthy of further investigation.
The well-known SVM-RFE (Support Vector Machine—
Recursive Feature Elimination) (Guyon et al., 2002) is a simple
and efficient algorithm which conducts gene selection in a
backward elimination procedure. It has been widely applied in
analyzing high-dimensional biological data, such as gene
expression data (Frank et al., 2006; Ramaswamy et al., 2001;
Zheng et al., 2006), sequence analysis (Das et al., 2006), and
protein mass spectra data (Hilario et al., 2006). Just like SVM itself, SVM-RFE was initially proposed for binary problems. Some
researchers (Chai and Domeniconi, 2004; Ramaswamy et al.,
2001) employed a simple one-versus-all technique to apply
SVM-RFE to solve multiclass problems. However, the genes selected from one binary gene selection problem may reduce the classification performance in other binary problems.
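For illustration only, and not the authors' implementation, the backward-elimination loop of SVM-RFE (Guyon et al., 2002) can be sketched as follows. A plain subgradient-descent linear SVM stands in for a QP solver (an assumption made for brevity), and one gene, the one with the smallest squared weight, is removed per iteration:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Train a linear SVM (hinge loss + L2 penalty) by batch subgradient
    descent. A minimal stand-in for a QP solver; y must be in {-1, +1}."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # samples violating the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, n_keep=2):
    """Binary SVM-RFE: retrain on the surviving genes and drop the gene
    with the smallest squared weight, until n_keep genes remain."""
    surviving = list(range(X.shape[1]))
    while len(surviving) > n_keep:
        w, _ = train_linear_svm(X[:, surviving], y)
        worst = int(np.argmin(w ** 2))          # least informative gene
        surviving.pop(worst)
    return surviving
```

On synthetic data where only the first two of six features carry class signal, the loop retains exactly those two features.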
In the present study, we propose four extensions of SVM-
RFE to solve the multiclass gene selection problem. By taking
all classes simultaneously into consideration during the gene
selection stage, our proposed extensions provide genes leading
to more accurate classification performance.
The paper is organized as follows. The binary and multiclass
SVMs, as well as SVM-RFE, are first briefly reviewed in
Section 2. In Section 3, we discuss the existing extension of
SVM-RFE for multiclass problems, and propose four new
MSVM-RFE algorithms based on different frameworks of
multiclass SVMs to overcome the shortcoming of the existing extension. The performance of the MSVM-RFE algorithms is finally tested on six microarray data sets in Section 4.
*To whom correspondence should be addressed.
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com
Consider $n$ training data pairs $S = \{\langle \mathbf{x}_i, y_i \rangle\}$, $i = 1,\dots,n$, where $\mathbf{x}_i \in \mathbb{R}^p$ is a feature vector representing the $i$-th sample, and $y_i$ is the class label of $\mathbf{x}_i$. For a binary problem, $y_i \in \{-1, 1\}$, while for a $k$-class ($k > 2$) problem, $y_i \in \{1, 2, \dots, k\}$.
2.1 Support vector machines
Support vector machines (SVMs) (Vapnik, 1998) are powerful
classification algorithms that have shown state-of-the-art
performance in a variety of biological classification tasks.
SVMs are not very sensitive to the curse of dimensionality, and
they are well-suited to work with high dimensional data, such
as microarray gene expression data (Brown et al., 2000;
Furey et al., 2000).
The first generation of SVMs was only designed for binary
classification. However, most real-life diagnostic tasks are not
binary, and solving multiclass classification problems is much
harder than solving binary ones (Mukherjee, 2003; Rifkin et al.,
2003). Fortunately, several algorithms have been proposed for
extending binary SVMs to multiclass classification. These
algorithms are grouped into two types. One is by constructing
and combining several binary SVM classifiers. One-versus-all
(OVA) and one-versus-one (OVO) SVMs (Kreßel, 1999) are
two typical methods in this type. The other type, called
‘all-together’, is to directly solve one optimization problem
which takes all classes into consideration (Crammer and Singer,
2001; Weston and Watkins, 1999; Lee et al., 2004). In this
section, we will briefly introduce the binary and multiclass
SVMs. A detailed review of both binary and multiclass SVMs,
including their mathematical formulations, is presented in the Supplementary Material.

Binary SVMs. Intuitively, an SVM searches for a hyperplane with maximal distance between itself and the closest samples from each of the two classes (Vapnik, 1998). The decision function of an SVM, just as that of other linear classifiers, is $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$, where $\mathbf{w} = [w_1, w_2, \dots, w_p]^T$ is the weight vector and $b$ is a scalar. The mechanism of SVMs is to minimize the following optimization problem:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i \qquad (1)$$

subject to $y_i[\mathbf{w}^T\mathbf{x}_i + b] \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1,\dots,n$,
where the predefined parameter C is a trade-off between
training accuracy and generalization. Note that the SVM is
originally designed for the binary classification problem, hence
$y_i \in \{-1, 1\}$ as we mentioned before. The solution of this
optimization problem is given by solving the corresponding
dual problem, a quadratic programming (QP) problem with $n$ variables.

Multiclass SVMs: one-versus-all. This is the simplest and probably the earliest implementation for multiclass SVMs (Bottou et al., 1994). Here, $k$ binary SVM classifiers are constructed: class 1 (positive) versus all other classes (negative), class 2 versus all other classes, ..., class $k$ versus all other classes. The decision function of the $r$-th classifier ($r = 1,\dots,k$), $f_r(\mathbf{x}) = \mathbf{w}_r^T\mathbf{x} + b_r$, where $\mathbf{w}_r = [w_{r1}, w_{r2}, \dots, w_{rp}]^T$ is the $r$-th weight vector and $b_r$ is a scalar, can be obtained by solving the following optimization problem,

$$\min_{\mathbf{w}_r,\,b_r,\,\boldsymbol{\xi}^r} \ \frac{1}{2}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\xi_i^r \qquad (2)$$

subject to $z_i^r[\mathbf{w}_r^T\mathbf{x}_i + b_r] \ge 1 - \xi_i^r$, $\xi_i^r \ge 0$, $i = 1,\dots,n$,

where $z_i^r$ is the class label of $\mathbf{x}_i$ for the $r$-th classifier, that is, $z_i^r = 1$ if $y_i = r$ and $-1$ otherwise. Note that for multiclass cases $y_i \in \{1, 2, \dots, k\}$. When all the $k$ classifiers are constructed, the combined $k$ binary classifiers $[f_1, f_2, \dots, f_k]^T$ predict the class $y$ of a (test) sample $\mathbf{x}$ as the one corresponding to the maximal value of the $k$ binary classifiers, that is,

$$y = \arg\max_{r = 1,\dots,k} f_r(\mathbf{x}).$$
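The OVA construction and the above decision rule can be sketched as follows. This is an illustrative sketch only: a least-squares linear fit (an assumption made for brevity) stands in for solving QP (2) for each class.

```python
import numpy as np

def lstsq_classifier(X, z):
    """Least-squares linear fit, used here as a stand-in for a binary SVM
    solver; returns a weight vector w and bias b with f(x) = w.x + b."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef[:-1], coef[-1]

def ova_fit(X, y, k):
    """Train k one-versus-all classifiers: for classifier r, samples of
    class r get label +1 (z_i^r = 1) and all other samples -1."""
    models = []
    for r in range(1, k + 1):
        z = np.where(y == r, 1.0, -1.0)
        models.append(lstsq_classifier(X, z))
    return models

def ova_predict(models, X):
    """Predict the class whose classifier gives the maximal decision value,
    i.e. y = argmax_r f_r(x)."""
    scores = np.column_stack([X @ w + b for w, b in models])
    return 1 + np.argmax(scores, axis=1)     # classes are labeled 1..k
```

On well-separated three-class data, the argmax rule recovers all training labels.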
Practically, we solve the dual problem of (2), whose number of variables is the same as the number of data points, so $k$ QP problems with $n$ variables are solved in this implementation of OVA SVMs.

Multiclass SVMs: one-versus-one. Another major method based on multiple binary SVMs is called the OVO method (Kreßel, 1999). This method involves the construction of $k(k-1)/2$ SVM classifiers, where each one is trained on data from two classes. In total we need to solve $k(k-1)/2$ QP problems, each with fewer than $n$ variables.

Hsu and Lin (2002) compared several multiclass SVMs, and their experiments indicate that the OVO SVM is more suitable for practical use (with large or medium sample size) concerning both accuracy and running speed. However, some researchers (Statnikov et al., 2005; Yeang et al., 2001) found that, in small sample sized microarray data, OVO SVMs perform worse than OVA SVMs and other ‘all-together’ multiclass SVMs. This is also observed in our experimental studies (refer to the Supplementary Material). The reason is perhaps that each binary classifier in the OVO SVM uses only a fraction of the total training samples, and the small training set might make these binary classifiers subject to overfitting.
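A corresponding sketch of the OVO scheme, again with a least-squares fit as an illustrative stand-in for the binary SVM solver, and a simple majority vote to combine the $k(k-1)/2$ pairwise decisions:

```python
import numpy as np
from itertools import combinations

def lstsq_classifier(X, z):
    # Least-squares linear fit as a stand-in for a binary SVM solver.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef[:-1], coef[-1]

def ovo_fit(X, y, k):
    """Train k(k-1)/2 pairwise classifiers, each using only the samples
    of its two classes (labels in 1..k)."""
    models = {}
    for r, s in combinations(range(1, k + 1), 2):
        mask = (y == r) | (y == s)
        z = np.where(y[mask] == r, 1.0, -1.0)
        models[(r, s)] = lstsq_classifier(X[mask], z)
    return models

def ovo_predict(models, X):
    """Combine pairwise decisions by majority vote over classes."""
    k = max(max(pair) for pair in models)
    votes = np.zeros((X.shape[0], k + 1), dtype=int)
    for (r, s), (w, b) in models.items():
        pred = X @ w + b
        votes[pred >= 0, r] += 1             # pair winner gets one vote
        votes[pred < 0, s] += 1
    return votes[:, 1:].argmax(axis=1) + 1
```

Note that each pairwise model here sees only two classes' samples, which is exactly the small-training-set effect discussed above.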
Multiclass SVMs: method by Weston and Watkins. This is the first ‘all-together’ implementation of multiclass SVMs by solving one single optimization problem (Vapnik, 1998; Weston and Watkins, 1999). The idea is similar to the OVA approach. It simultaneously constructs $k$ binary classifiers, where the $r$-th function, $\mathbf{w}_r^T\mathbf{x} + b_r$, separates class $r$ from the others. The formulation is as follows:

$$\min \ P(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\sum_{r=1}^{k}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\sum_{r \ne y_i}\xi_i^r$$

subject to $\mathbf{w}_{y_i}^T\mathbf{x}_i + b_{y_i} \ge \mathbf{w}_r^T\mathbf{x}_i + b_r + 2 - \xi_i^r$, $\xi_i^r \ge 0$, $i = 1,\dots,n$, $r \in \{1,\dots,k\} \setminus \{y_i\}$.
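To make the Weston and Watkins formulation concrete, the primal objective and the slacks implied by its constraints can be evaluated directly for any candidate parameters. This is an illustrative numpy sketch, not part of the paper:

```python
import numpy as np

def ww_objective(W, b, X, y, C=1.0):
    """Evaluate the Weston-Watkins primal objective.
    W: (k, p) stacked weight vectors, b: (k,) biases, y: labels in 1..k.
    The slack xi_i^r is the violation of the margin constraint
    w_{y_i}.x_i + b_{y_i} >= w_r.x_i + b_r + 2."""
    scores = X @ W.T + b                      # (n, k) decision values
    n = X.shape[0]
    own = scores[np.arange(n), y - 1]         # w_{y_i}.x_i + b_{y_i}
    xi = np.maximum(0.0, scores + 2 - own[:, None])
    xi[np.arange(n), y - 1] = 0.0             # no slack for r = y_i
    return 0.5 * np.sum(W ** 2) + C * xi.sum()
```

For example, with $k = 2$, $\mathbf{w}_1 = [1]$, $\mathbf{w}_2 = [-1]$, zero biases and a single sample $\mathbf{x}_1 = [1]$ of class 1, the constraint holds with zero slack and the objective reduces to the regularization term alone.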
Ramaswamy,S. et al. (2001) Multiclass cancer diagnosis using tumor gene
expression signatures. Proc. Natl. Acad. Sci., 98, 15149–15154.
Rifkin,R. et al. (2003) An analytical method for multiclass molecular cancer
classification. SIAM Review, 45, 706–723.
Ross,D.T. et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet., 24, 227–235.
Statnikov,A. et al. (2005) A comprehensive evaluation of multicategory
classification methods for microarray gene expression cancer diagnosis.
Bioinformatics, 21, 631–643.
Staunton,J.E. et al. (2001) Chemosensitivity prediction by transcriptional
profiling. Proc. Natl. Acad. Sci., 98, 10787–10792.
Su,A.I. et al. (2001) Molecular classification of human carcinomas by use of gene
expression signatures. Cancer Res., 61, 7388–7393.
Tibshirani,R. et al. (2002) Diagnosis of multiple cancer types by shrunken
centroids of gene expression. Proc. Natl. Acad. Sci., 99, 6567–6572.
Vapnik,V.N. (1998) Statistical Learning Theory. John Wiley and Sons, New York.
West,M. et al. (2001) Predicting the clinical status of human breast cancer by
using gene expression profiles. Proc. Nat. Acad. Sci., 98, 11462–11467.
Weston,J. and Watkins,C. (1999) Support vector machines for multiclass pattern
recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks.
Yeang,C.-H. et al. (2001) Molecular classification of multiple tumor types.
Bioinformatics, 17(Suppl. 1), S316–S322.
Zhang,H.H. et al. (2006) Gene selection using support vector machines with non-
convex penalty. Bioinformatics, 22, 88–95.
Zheng,C. et al. (2006) Gene expression profiling of CD34+ cells identifies a
molecular signature of chronic myeloid leukemia blast crisis. Leukemia, 20,
Zhou,X. and Mao,K.Z. (2005a) Gene selection of DNA microarray data based
on Regularization Networks. In Gallagher,M., Hogan,J.M. and Maire,F. (eds.), IDEAL, Vol. 3578 of Lecture Notes in Computer Science, Springer, pp. 414–
Zhou,X. and Mao,K.Z. (2005b) LS Bound based gene selection for DNA
microarray data. Bioinformatics, 21, 1559–1564.
Zhou,X. and Mao,K.Z. (2006) The ties problem resulting from counting-based
error estimators and its impact on gene selection algorithms. Bioinformatics,