
Vol. 23 no. 9 2007, pages 1106–1114

doi:10.1093/bioinformatics/btm036

BIOINFORMATICS

ORIGINAL PAPER

Gene expression

MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data

Xin Zhou and David P. Tuck*

Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510, USA

Received on December 7, 2006; revised on January 24, 2007; accepted on January 26, 2007

Associate Editor: David Rocke

ABSTRACT

Motivation: Given the thousands of genes and the small number of

samples, gene selection has emerged as an important research

problem in microarray data analysis. Support Vector Machine—

Recursive Feature Elimination (SVM-RFE) is one of a group

of recently described algorithms which represent the state-of-the-art

for gene selection. Just like SVM itself, SVM-RFE was originally

designed to solve binary gene selection problems. Several groups

have extended SVM-RFE to solve multiclass problems using one-

versus-all techniques. However, the genes selected from one binary

gene selection problem may reduce the classification performance in

other binary problems.

Results: In the present study, we propose a family of four extensions

to SVM-RFE (called MSVM-RFE) to solve the multiclass gene

selection problem, based on different frameworks of multiclass

SVMs. By simultaneously considering all classes during the gene

selection stages, our proposed extensions identify genes leading to

more accurate classification.

Contact: david.tuck@yale.edu

Supplementary information: Supplementary materials, including a

detailed review of both binary and multiclass SVMs, and complete

experimental results, are available at Bioinformatics online.

1 INTRODUCTION

Microarray technology allows researchers to measure the

quantity of mRNA for tens of thousands of genes simulta-

neously. It has the power to create a comprehensive overview of

the gene regulatory network. Studies on DNA microarray

data offer opportunities for advancing fundamental biological

research and clinical practice (Golub et al., 1999; Ramaswamy

et al., 2001; Ross et al., 2000; Staunton et al., 2001; West

et al., 2001).

Normally, a microarray data set contains a large number

of genes (usually several thousand or more) and a relatively

small number of samples or conditions (usually ≤ 100).

Among all the genes, many are irrelevant, insignificant or

redundant to the discriminant problem under investigation.

Hence the identification of informative genes, which have the

greatest power for classification, is of fundamental and

practical importance to the investigation of specific discrimi-

nant problems, such as cancer versus non-cancer (Alon et al., 1999) or different tumor types (Golub et al., 1999; Ramaswamy et al., 2001).

Many gene selection algorithms have been proposed in the

context of microarray data analysis over the past few years.

However, most of the studies are related to binary (two-class)

gene selection problems (Golub et al., 1999; Guyon et al., 2002;

Lee et al., 2003; Zhou and Mao, 2005a,b; Zhang et al., 2006,

to name a few), and only a few involve multiclass (i.e. more

than two classes) gene selection and classification (Chai and

Domeniconi, 2004; Li et al., 2004; Ramaswamy et al., 2001;

Tibshirani et al., 2002). As the multiclass problem is

intrinsically more difficult and presents more challenges, it is

worthy of further investigation.

The well-known SVM-RFE (Support Vector Machine—

Recursive Feature Elimination) (Guyon et al., 2002) is a simple

and efficient algorithm which conducts gene selection in a

backward elimination procedure. It has been widely applied in

analyzing high-dimensional biological data, such as gene

expression data (Frank et al., 2006; Ramaswamy et al., 2001;

Zheng et al., 2006), sequence analysis (Das et al., 2006), and

protein mass spectra data (Hilario et al., 2006). Just like the SVM, SVM-RFE was initially proposed for binary cases. Some

researchers (Chai and Domeniconi, 2004; Ramaswamy et al.,

2001) employed a simple one-versus-all technique to apply

SVM-RFE to solve multiclass problems. However, the

genes selected from one binary gene selection problem

may reduce the classification performance in other binary

problems.

In the present study, we propose four extensions of SVM-

RFE to solve the multiclass gene selection problem. By taking

all classes simultaneously into consideration during the gene

selection stage, our proposed extensions provide genes leading

to more accurate classification performance.

The paper is organized as follows. The binary and multiclass

SVMs, as well as SVM-RFE, are first briefly reviewed in

Section 2. In Section 3, we discuss the existing extension of

SVM-RFE for multiclass problems, and propose four new MSVM-RFE algorithms, each based on a different framework of multiclass SVMs, to overcome the shortcoming of

the existing extension. The performance of MSVM-RFE

algorithms is finally tested with six microarray data sets in

Section 4.



*To whom correspondence should be addressed.


? The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org



2 BACKGROUND

Consider $n$ training data pairs $S = \{(\mathbf{x}_i, y_i)\}$, $i = 1,\ldots,n$, where $\mathbf{x}_i \in \mathbb{R}^p$ is a feature vector representing the $i$-th sample, and $y_i$ is the class label of $\mathbf{x}_i$. For a binary problem, $y_i \in \{-1, 1\}$, while for a $k$-class ($k > 2$) problem, $y_i \in \{1, 2,\ldots,k\}$.

2.1 Support vector machines

Support vector machines (SVMs) (Vapnik, 1998) are powerful

classification algorithms that have shown state-of-the-art

performance in a variety of biological classification tasks.

SVMs are not very sensitive to the curse of dimensionality, and

they are well-suited to work with high dimensional data, such

as microarray gene expression data (Brown et al., 2000;

Furey et al., 2000).

The first generation of SVMs was only designed for binary

classification. However, most real-life diagnostic tasks are not

binary, and solving multiclass classification problems is much

harder than solving binary ones (Mukherjee, 2003; Rifkin et al.,

2003). Fortunately, several algorithms have been proposed for

extending binary SVMs to multiclass classification. These

algorithms are grouped into two types. One is by constructing

and combining several binary SVM classifiers. One-versus-all

(OVA) and one-versus-one (OVO) SVMs (Kreßel, 1999) are

two typical methods in this type. The other type, called

‘all-together’, is to directly solve one optimization problem

which takes all classes into consideration (Crammer and Singer,

2001; Weston and Watkins, 1999; Lee et al., 2004). In this

section, we will briefly introduce the binary and multiclass

SVMs. A detailed review of both binary and multiclass SVMs,

including their mathematical formulations, is presented in the

Supplementary Material.

2.1.1 Binary SVMs. Intuitively, an SVM searches for a hyperplane with maximal distance between itself and the closest samples from each of the two classes (Vapnik, 1998). The decision function of an SVM, like that of other linear classifiers, is $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$, where $\mathbf{w} = [w_1, w_2,\ldots,w_p]^T$ is the weight vector and $b$ is a scalar. The SVM solves the following optimization problem:
$$\text{minimize } P(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i, \qquad (1)$$
$$\text{subject to } y_i[\mathbf{w}^T\mathbf{x}_i + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\ldots,n,$$

where the predefined parameter C is a trade-off between

training accuracy and generalization. Note that the SVM is

originally designed for the binary classification problem, hence

$y_i \in \{-1, 1\}$ as we mentioned before. The solution of this

optimization problem is given by solving the corresponding

dual problem, a quadratic programming (QP) problem with

n variables.

2.1.2 Multiclass SVMs: one-versus-all. This is the simplest and probably the earliest implementation of multiclass SVMs (Bottou et al., 1994). Here, $k$ binary SVM classifiers are constructed: class 1 (positive) versus all other classes (negative), class 2 versus all other classes, ..., class $k$ versus all other classes. The decision function of the $r$-th classifier ($r = 1,\ldots,k$) is $f_r(\mathbf{x}) = \mathbf{w}_r^T\mathbf{x} + b_r$, where $\mathbf{w}_r = [w_{r1}, w_{r2},\ldots,w_{rp}]^T$ is the $r$-th weight vector and $b_r$ is a scalar; it is obtained by solving the following optimization problem:
$$\text{minimize } P_r(\mathbf{w}_r, \boldsymbol{\xi}_r) = \frac{1}{2}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\xi_i^r, \qquad (2)$$
$$\text{subject to } z_i^r[\mathbf{w}_r^T\mathbf{x}_i + b_r] \ge 1 - \xi_i^r, \quad \xi_i^r \ge 0, \quad i = 1,\ldots,n,$$
where $z_i^r$ is the class label for the $r$-th classifier, that is, $z_i^r = 1$ if $y_i = r$ and $-1$ otherwise. Note that for multiclass cases $y_i \in \{1, 2,\ldots,k\}$. When all $k$ classifiers have been constructed, the combined binary classifiers $[f_1, f_2,\ldots,f_k]^T$ predict the class $y$ of a (test) sample $\mathbf{x}$ as the one whose binary classifier attains the maximal value, that is,
$$y = \arg\max_r f_r(\mathbf{x}).$$
In practice we solve the dual problem of (2), whose number of variables equals the number of training samples, so $k$ QP problems with $n$ variables are solved in this implementation of multiclass SVMs.
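The OVA construction and argmax decision rule above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' C/C++ implementation: a crude subgradient solver (an assumption for brevity) stands in for the dual QP solver of problem (2), and all function names are invented.

```python
import numpy as np

def train_binary_svm(X, z, C=1.0, lr=0.01, epochs=300):
    """Crude subgradient stand-in for the QP of problem (2):
    minimizes (1/2)||w||^2 + C * sum of hinge losses; z in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = z * (X @ w + b) < 1            # margin violators
        gw = w - C * (z[viol, None] * X[viol]).sum(axis=0)
        gb = -C * z[viol].sum()
        w -= lr * gw
        b -= lr * gb
    return w, b

def train_ova(X, y, k, C=1.0):
    """k one-versus-all classifiers: class r (+1) against the rest (-1)."""
    W, b = np.zeros((k, X.shape[1])), np.zeros(k)
    for r in range(k):
        z = np.where(y == r, 1.0, -1.0)       # relabel as z_i^r
        W[r], b[r] = train_binary_svm(X, z, C)
    return W, b

def predict_ova(W, b, X):
    """Decision rule y = argmax_r f_r(x)."""
    return np.argmax(X @ W.T + b, axis=1)
```

On a toy three-cluster problem this recovers the cluster labels; a production implementation would of course solve the dual QP of (2), as LIBSVM does.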

2.1.3 Multiclass SVMs: one-versus-one. Another major method based on multiple binary SVMs is the OVO method (Kreßel, 1999). This method constructs $k(k-1)/2$ SVM classifiers, each trained on data from two classes. In total, $k(k-1)/2$ QP problems with fewer than $n$ variables must be solved.

Hsu and Lin (2002) compared several multiclass SVMs, and their experiments indicate that the OVO SVM is the most suitable for practical use (with large or medium sample sizes) in terms of both accuracy and running speed. However, some researchers (Statnikov et al., 2005; Yeang et al., 2001) found that, on small-sample-sized microarray data, OVO SVMs perform worse than OVA SVMs and other 'all-together' multiclass SVMs. The same is observed in our experimental studies (refer to the Supplementary Material). The reason is perhaps that each binary classifier in the OVO SVM uses only a fraction of the total training samples, and the small training set may make these binary classifiers prone to overfitting.

2.1.4 Multiclass SVMs: method by Weston and Watkins (WW). This is the first 'all-together' implementation of multiclass SVMs, which solves one single optimization problem (Vapnik, 1998; Weston and Watkins, 1999). The idea is similar to the OVA approach. It simultaneously constructs $k$ binary classifiers, where the $r$-th function, $\mathbf{w}_r^T\mathbf{x} + b_r$, separates class $r$ from the others. The formulation is as follows:
$$\text{minimize } P(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\sum_{r=1}^{k}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\sum_{r=1, r\ne y_i}^{k}\xi_i^r, \qquad (3)$$
$$\text{subject to } \mathbf{w}_{y_i}^T\mathbf{x}_i + b_{y_i} \ge \mathbf{w}_r^T\mathbf{x}_i + b_r + 2 - \xi_i^r, \quad \xi_i^r \ge 0, \quad i = 1,\ldots,n, \quad r \in \{1,\ldots,k\} \setminus \{y_i\}.$$



The decision function is the same as that of OVA SVMs, that is,
$$y = \arg\max_r (\mathbf{w}_r^T\mathbf{x} + b_r). \qquad (4)$$
The QP problem arising from (3) has $(k-1)n$ variables.

Given that the computational complexity of the QP

optimization algorithm is polynomial in the number of

variables, it is computationally much more expensive to solve

multiclass classification using ‘all-together’ methods than using

methods based on binary SVMs.

2.1.5 Multiclass SVMs: method by Crammer and Singer (CS). This method is similar to WW, but it uses fewer slack variables in the constraints of the optimization problem, and there is no bias term $b$ in the decision function (Crammer and Singer, 2001). Basically, this method solves the following optimization problem:
$$\text{minimize } P(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\sum_{r=1}^{k}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\xi_i, \qquad (5)$$
$$\text{subject to } \mathbf{w}_{y_i}^T\mathbf{x}_i - \mathbf{w}_r^T\mathbf{x}_i \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad r \ne y_i, \quad i = 1,\ldots,n.$$

The decision function is
$$y = \arg\max_r (\mathbf{w}_r^T\mathbf{x}).$$
The corresponding dual problem of (5) is a QP problem with $kn$ variables. Although this QP problem is larger than the one arising from WW, Crammer and Singer (2001) decomposed the single problem into multiple QP problems of reduced size, which saves much computation.

2.1.6 Multiclass SVMs: method by Lee, Lin and Wahba (LLW). This method has the promising theoretical property that its solution for the multiclass problem resembles the Bayes rule asymptotically. The formulation is as follows:
$$\text{minimize } P(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\sum_{r=1}^{k}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\sum_{r=1, r\ne y_i}^{k}\xi_i^r, \qquad (6)$$
$$\text{subject to } \mathbf{w}_r^T\mathbf{x}_i + b_r \le -\frac{1}{k-1} + \xi_i^r, \quad \xi_i^r \ge 0, \quad i = 1,\ldots,n, \quad r \in \{1,\ldots,k\} \setminus \{y_i\},$$
$$\sum_{r=1}^{k}(\mathbf{w}_r^T\mathbf{x}_i + b_r) = 0, \quad i = 1,\ldots,n.$$
The formulation presented here is slightly different from that in Lee et al. (2004); here we assume that a linear kernel and equal misclassification costs are employed. Please refer to Lee et al. (2004) for the general form. The final decision function is the same as (4). The single QP dual problem of (6) has $(k-1)n$ variables.

2.2 Support Vector Machine—Recursive Feature Elimination (SVM-RFE)

SVM-RFE (Guyon et al., 2002) conducts feature selection in a sequential backward elimination manner, which starts with all the features and discards one feature at a time. Just like the SVM, SVM-RFE was initially proposed for binary problems. The squared coefficients $w_j^2$ ($j = 1,\ldots,p$) of the weight vector $\mathbf{w}$ obtained from the binary problem (1) are employed as feature ranking criteria. Intuitively, the features with the largest weights are the most informative. Thus, in each iteration of SVM-RFE one trains the SVM classifier, computes the ranking criterion $w_j^2$ for all features, and discards the feature with the smallest ranking criterion. The procedure is repeated until a small subset of features is obtained.

In fact, the magnitude of $w_j^2$ corresponds to the approximate change in the criterion of (1), $J = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i$, when the $j$-th feature is discarded. This is explained by the Optimal Brain Damage (OBD) algorithm (LeCun et al., 1990). The criterion $J$ can be expanded in a Taylor series to second order,
$$\Delta J(j) = \frac{\partial J}{\partial w_j}\Delta w_j + \frac{\partial^2 J}{\partial w_j^2}(\Delta w_j)^2 + O\big((\Delta w_j)^3\big). \qquad (7)$$
At the optimum of $J$, the first-order term can be neglected¹, and Equation (7) becomes $\Delta J(j) \approx (\Delta w_j)^2$. If we denote by $J(j)$ the value of $J$ when the $j$-th feature is removed (by setting the corresponding weight to 0), it follows that
$$J(j) \approx J + w_j^2. \qquad (8)$$
Therefore, removing the feature with the smallest $w_j^2$ causes the least increase of $J$, which at the same time preserves generalization performance. In other words, SVM-RFE aims to find the gene subset that yields the minimal criterion $J$. This criterion was also employed by Zhou and Mao (2006) in a sequential forward selection.
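The backward-elimination loop just described can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: as a loudly flagged assumption, a regularized least-squares fit stands in for retraining the SVM of problem (1) at each step, and the function names are invented; only the $w_j^2$ ranking and the recursive structure mirror SVM-RFE.

```python
import numpy as np

def fit_weights(X, y, lam=1e-2):
    """Stand-in for retraining the SVM of problem (1): a regularized
    least-squares weight vector; y in {-1, +1}. A real SVM-RFE would
    solve the QP of (1) here."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def svm_rfe(X, y, n_keep):
    """Backward elimination: refit on the surviving features, rank them
    by w_j^2, discard the one with the smallest criterion, repeat."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = fit_weights(X[:, remaining], y)
        worst = int(np.argmin(w ** 2))   # smallest w_j^2, cf. Eq. (8)
        del remaining[worst]             # discard one feature, then refit
    return remaining
```

Note that the classifier is retrained after every elimination, which is what distinguishes RFE from ranking all features once by a single weight vector.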

jwill cause the

3 EXTENSIONS OF SVM-RFE FOR MULTICLASS PROBLEMS

3.1 Existing extension: OVA-RFE

SVM-RFE (Guyon et al., 2002) was originally designed for binary problems. Some researchers (Chai and Domeniconi, 2004; Ramaswamy et al., 2001; Rifkin et al., 2003) employed RFE for multiclass gene selection by ranking genes for each class separately in the OVA manner.

For a $k$-class problem ($k \ge 3$), $k$ binary SVM classifiers are constructed using the OVA method described in Section 2.1.2. For the $r$-th binary classification problem, SVM-RFE is carried out to identify a feature subset $S_r$ for class $r$ against all other classes. After the $k$ feature subsets are selected, the final subset for the whole multiclass problem is the union of the $k$ subsets. This extension of SVM-RFE is called OVA-RFE in the present work.

¹Actually, the criterion for OBD is not $J = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i$ but the Lagrangian function from (1), that is, $L = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i[y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 + \xi_i] - \sum_{i=1}^{n}\beta_i\xi_i$, because $\partial L/\partial w_j = 0$ while $\partial J/\partial w_j \ne 0$. However, at the optimum of (1) and the corresponding dual problem, $J = L$; moreover, $\partial^2 J/\partial w_j^2 = \partial^2 L/\partial w_j^2 = 1$, so for simplicity we use $J$ instead of $L$. In the following, similar simplified forms are used for the extensions of SVM-RFE.



The $k$ subsets $S_r$ ($r = 1,\ldots,k$) are identified independently. According to Sections 2.1.2 and 2.2, the subset $S_r$ minimizes the following criterion:
$$J_r = \frac{1}{2}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\xi_i^r. \qquad (9)$$
However, there is no guarantee that the final selected feature subset ($S_1 \cup S_2 \cup \cdots \cup S_k$) simultaneously minimizes all $k$ criteria $J_1, J_2,\ldots,J_k$. For example, some features selected from one binary problem may reduce the classification performance in other binary problems. As a consequence, the feature subset selected by OVA-RFE might not be an optimal set.

3.2 New extensions: MSVM-RFEs

To overcome the limitations of OVA-RFE, we propose

MSVM-RFE algorithms for feature selection on multiclass

data. As there are two types of multiclass SVMs: methods

based on multiple binary SVMs and ‘all-together’ methods, our

proposed extensions are derived from these two groups,

respectively.

3.2.1 MSVM-RFE based on OVA SVMs. According to our discussion in Section 3.1, the optimal feature subset should simultaneously minimize the $k$ criteria in Equation (9) with $r = 1,\ldots,k$. Therefore, multiclass feature selection is cast as a multi-objective optimization problem. The simplest way to solve a multi-objective optimization problem is the Weighting Objectives Method (Geoffrion, 1968). Assume that the $k$ classes contribute equally to the classification task. With equal weights, the multi-objective optimization becomes the following single-objective optimization problem:
$$\text{minimize } J = \frac{1}{k}\sum_{r=1}^{k}J_r = \frac{1}{k}\sum_{r=1}^{k}\left(\frac{1}{2}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\xi_i^r\right). \qquad (10)$$
The OBD algorithm (LeCun et al., 1990) is employed to solve this optimization problem. As in the binary SVM-RFE, the criterion $J$ in (10) is expanded in a Taylor series to second order,
$$\Delta J(j) \approx \frac{1}{k}\sum_{r=1}^{k}\left(\frac{\partial J_r}{\partial w_{rj}}\Delta w_{rj} + \frac{\partial^2 J_r}{\partial w_{rj}^2}(\Delta w_{rj})^2\right). \qquad (11)$$
At the optimum of $J$, $\Delta J(j) \approx \frac{1}{k}\sum_{r=1}^{k}(\Delta w_{rj})^2$. As in the binary SVM-RFE, it follows that
$$J(j) \approx J + \frac{1}{k}\sum_{r=1}^{k}(w_{rj})^2, \qquad (12)$$
where $J(j)$ is the value of $J$ when the $j$-th feature is eliminated. In consequence, removing the feature with the smallest $\sum_{r=1}^{k}w_{rj}^2$ will cause the least increase of $J$. Therefore, $\sum_{r=1}^{k}w_{rj}^2$ can be employed as a ranking criterion in the recursive elimination procedure described in Algorithm 1. Compared with OVA-RFE, which identifies features from each class independently, this new extension takes all $k$ classes into consideration simultaneously.

It is straightforward to apply this technique to other methods based on binary SVMs, such as the OVO SVM (Kreßel, 1999). However, the extension based on OVO SVMs does not perform as well as the other extensions proposed in this paper on small-sample-sized microarray data (see experimental results in the Supplementary Material). This might be a direct consequence of the overfitting of OVO classifiers mentioned in Section 2.1.3. With large sample sizes, we expect the extension based on OVO SVMs to perform as well as or better than the other extensions proposed here.
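The ranking criterion $c_j = \sum_{r=1}^{k} w_{rj}^2$ is a one-liner given the $k \times p$ matrix of OVA weight vectors; a minimal NumPy sketch (the matrix `W` below is made-up illustration data, not a result from the paper, and in a full run it would be recomputed by retraining the $k$ OVA SVMs after each elimination step):

```python
import numpy as np

def msvm_rfe_rank(W):
    """Per-gene criterion c_j = sum_r w_rj^2 (Eq. (12), up to the 1/k
    factor, which does not affect the ranking); W has shape (k, p),
    one OVA weight vector per row."""
    return (W ** 2).sum(axis=0)

# One elimination step on a made-up 3-class, 3-gene weight matrix:
W = np.array([[ 0.9, 0.10, -0.4],
              [-0.7, 0.00,  0.5],
              [ 0.2, 0.05, -0.6]])
scores = msvm_rfe_rank(W)
worst_gene = int(np.argmin(scores))  # gene 1 has the smallest criterion here
```

Because all $k$ weight vectors contribute to each gene's score, a gene that is useless for every class is eliminated early, whereas OVA-RFE would keep it as long as a single binary problem ranked it highly.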

3.2.2 MSVM-RFE based on 'all-together' methods. We take WW (Weston and Watkins, 1999; Section 2.1.4) as an example to illustrate our extensions of the 'all-together' methods. The criterion used in the OBD algorithm (LeCun et al., 1990) is simply the cost function in (3), that is,
$$J = \frac{1}{2}\sum_{r=1}^{k}\|\mathbf{w}_r\|^2 + C\sum_{i=1}^{n}\sum_{r=1, r\ne y_i}^{k}\xi_i^r. \qquad (13)$$
As in the binary SVM-RFE, the criterion $J$ is expanded in a Taylor series to second order,
$$\Delta J(j) \approx \sum_{r=1}^{k}\frac{\partial J}{\partial w_{rj}}\Delta w_{rj} + \sum_{r=1}^{k}\frac{\partial^2 J}{\partial w_{rj}^2}(\Delta w_{rj})^2. \qquad (14)$$
At the optimum of $J$, it follows that
$$J(j) \approx J + \sum_{r=1}^{k}w_{rj}^2. \qquad (15)$$
So $\sum_{r=1}^{k}w_{rj}^2$ is still an appropriate ranking criterion for 'all-together' SVMs.

The recursive elimination procedure of our proposed extensions is summarized as Algorithm 1. We propose four MSVM-RFE algorithms, each based on a different multiclass SVM: MSVM-RFE(OVA), MSVM-RFE(WW), MSVM-RFE(CS) and MSVM-RFE(LLW). To reduce the computational cost, the algorithm can be generalized to discard more than one gene at each step, although discarding several genes at a time may degrade the selection performance.



4 EXPERIMENTS AND DISCUSSION

We tested our proposed MSVM-RFE algorithms on six

microarray data sets. The specifications of the data sets are

listed in Table 1. Please refer to the Supplementary Material for

more detailed data description and preprocessing.

For assessment of gene selection algorithms, Ambroise and

McLachlan (2002) suggested external cross-validation, in which

the gene selection and validation are performed on different

parts of the sample set, to obtain an unbiased performance

estimate. In this work, we employed external 4-fold stratified

cross-validation to evaluate the selection performance. The

stratified cross-validation (Breiman et al., 1984) is slightly

different from regular cross-validation. In k-fold stratified

cross-validation, a data set is randomly partitioned into k

equally sized folds such that the class distribution in each fold is

approximately the same as that in the entire data set. In

contrast, regular cross-validation randomly partitions the data

set into k-folds without considering class distributions. Kohavi

(1995) reported that stratified cross-validation has smaller bias

and variance than regular cross-validation. To obtain a more

reliable estimate, the external 4-fold cross-validation process

was repeated 100 times using different partitions of the data.
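A stratified partition of the kind described can be built by splitting each class's shuffled indices round-robin across the folds; a minimal sketch (illustrative, not the authors' code):

```python
import numpy as np

def stratified_folds(y, k, seed=0):
    """Partition sample indices into k folds so that each fold's class
    proportions approximately match those of the full data set."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        for pos, i in enumerate(idx):   # deal this class round-robin
            folds[pos % k].append(int(i))
    return folds
```

In the external cross-validation scheme, each fold in turn serves as the held-out test set while gene selection is run from scratch on the remaining folds, so the test samples never influence which genes are selected.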

We compared our MSVM-RFE algorithms with two multiclass gene selection algorithms: BSS/WSS and OVA-RFE. BSS/WSS (the ratio of between-group to within-group sums of squares) was proposed by Dudoit et al. (2002) for gene selection in multiclass problems. BSS/WSS is an individual feature ranking criterion. For a gene $j$, the ratio is
$$\frac{BSS(j)}{WSS(j)} = \frac{\sum_{i=1}^{n}\sum_{r=1}^{k} I(y_i = r)(\bar{x}_{rj} - \bar{x}_{\cdot j})^2}{\sum_{i=1}^{n}\sum_{r=1}^{k} I(y_i = r)(x_{ij} - \bar{x}_{rj})^2},$$
where $I(\cdot)$ is the indicator function, $\bar{x}_{\cdot j}$ denotes the average expression level of gene $j$ across all samples, and $\bar{x}_{rj}$ denotes the average expression level of gene $j$ across samples belonging to class $r$. Intuitively, genes with large BSS/WSS ratios (that is, relatively large variation among classes and relatively small variation within classes) are likely relevant to class separation. OVA-RFE is described in Section 3.1.

In Algorithm 1, the MSVM-RFE algorithms remove the one gene with the smallest ranking criterion in each step. However, the number of genes in microarray data is very large (normally several thousand), so removing only one gene per step would make the computational cost very high. Therefore, in the present work, the genes whose ranking criteria fall in the bottom 10% of the remaining genes were removed at each step to expedite the selection procedure.
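The BSS/WSS ratio above is straightforward to vectorize over all genes at once; a minimal sketch (illustrative, not the authors' C/C++ code):

```python
import numpy as np

def bss_wss(X, y):
    """BSS(j)/WSS(j) for every gene j; X is (samples x genes),
    y holds integer class labels."""
    grand = X.mean(axis=0)                      # xbar_{.j}
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for r in np.unique(y):
        Xr = X[y == r]
        cmean = Xr.mean(axis=0)                 # xbar_{rj}
        bss += len(Xr) * (cmean - grand) ** 2   # between-class scatter
        wss += ((Xr - cmean) ** 2).sum(axis=0)  # within-class scatter
    return bss / wss
```

The inner sum over samples of class $r$ collapses to $n_r(\bar{x}_{rj} - \bar{x}_{\cdot j})^2$ for the numerator, which is why one pass over the classes suffices.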

To select relevant genes using the MSVM-RFE algorithms, it is important to find an appropriate value of the parameter $C$ used to construct the multiclass SVMs during the recursive elimination procedure. In general, the appropriate $C$ may vary as multiclass SVM classifiers are trained on different gene subsets. However, tuning $C$ for each SVM classifier is not easy, and such tuning would demand substantially increased computation. In the present study, we employed a fixed $C$ during the RFE procedure. The parameter was tuned with a simple technique: we assessed the performance of the BSS/WSS algorithm using external cross-validation with a sequence of given values of $C$ (for example, $C = 10, 1, 0.1, 0.01, 0.001, 0.0001$). The optimal value of $C$ is chosen as the one which gives the minimal external cross-validation error on the selected gene subsets. For the GCM data set, $C$ was set to 0.1 for the OVA SVM and the corresponding RFE algorithms, such as OVA-RFE and MSVM-RFE(OVA). Likewise, $C = 0.1$ for the LLW-related algorithms, while $C = 0.01$ for the WW- and CS-related algorithms.

Another issue is deciding the number of genes to select. In general, finding the optimal number of genes is very difficult. In the present study, we employed the solution of Li et al. (2004). A set of simulations was conducted on the GCM dataset by varying the number of selected genes. When the number of selected genes exceeds 400, the selection performance does not increase significantly, and the variance of the external cross-validation error is small and stable. The same results were observed on the other datasets. Hence, a maximum of 400 genes was identified in all experiments.

All programs were written in C/C++. We used LIBSVM (Chang and Lin, 2001) to implement OVA SVMs and the corresponding gene selection algorithms, including OVA-RFE and MSVM-RFE(OVA). BSVM (Hsu and Lin, 2002) was employed to implement the WW² and CS related algorithms. As no fast implementation of LLW SVMs is available, we implemented this SVM framework using the OOQP package (Gertz and Wright, 2003) to solve the arising QP problems. The QP problem from LLW SVMs has $(k-1)n$ variables, so the computational cost is very high, especially for large data sets. In this work, we only tested the selection performance when $k, 2k, \ldots$ genes were selected, using LLW SVMs on the GCM and 11_Tumors data sets.

The results and genes selected on the GCM data set are

presented here to illustrate typical results (all experimental results are available in the Supplementary Material). For the GCM data set, we compared the selection

performances of six gene selection algorithms, as shown in

Figure 1. As an individual ranking method, the performance of

Table 1. Description of the data sets

Data set       Samples  Genes  Classes  Reference
GCM            198      14122  14       Ramaswamy et al. (2001)
11_Tumors      174      9700   11       Su et al. (2001)
NCI Ross       58       5643   8        Ross et al. (2000)
NCI Staunton   58       3144   8        Staunton et al. (2001)
Lung cancer    203      5343   5        Bhattacharjee et al. (2001)
MLL            72       10930  3        Armstrong et al. (2002)

²The WW SVM algorithm implemented in BSVM is slightly different from the original algorithm. BSVM solves a modified optimization problem, which adds a term $\sum_{r=1}^{k}b_r^2$ to the objective function (3). By doing so, the corresponding dual problem is simplified and is easy to solve using fast decomposition techniques. Considering its fast running speed, we used BSVM for classification and gene selection on the large data sets (the 11_Tumors and GCM datasets), but for the other, smaller data sets we still used the original WW SVM algorithm implemented with the OOQP package (Gertz and Wright, 2003).
