Article

MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data.

Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510, USA.
Bioinformatics (Impact Factor: 4.62). 06/2007; 23(9):1106-14. DOI: 10.1093/bioinformatics/btm036
Source: PubMed

ABSTRACT MOTIVATION: Given the thousands of genes and the small number of samples, gene selection has emerged as an important research problem in microarray data analysis. Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is one of a group of recently described algorithms that represent the state-of-the-art for gene selection. Like SVM itself, SVM-RFE was originally designed to solve binary gene selection problems. Several groups have extended SVM-RFE to multiclass problems using one-versus-all techniques; however, the genes selected for one binary subproblem may reduce classification performance on the other binary problems. RESULTS: In the present study, we propose a family of four extensions to SVM-RFE (called MSVM-RFE), based on different multiclass SVM frameworks, to solve the multiclass gene selection problem. By considering all classes simultaneously during the gene selection stages, our proposed extensions identify genes that lead to more accurate classification.
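The recursive elimination loop underlying SVM-RFE (in the binary setting the paper starts from) can be sketched with scikit-learn, whose RFE implementation ranks features by the magnitude of the linear SVM weights and iteratively removes the lowest-ranked ones. The synthetic data and parameter values below are illustrative, not taken from the paper.

```python
# Minimal binary SVM-RFE sketch: a linear SVM is refit repeatedly,
# dropping the features with the smallest |w_i| at each iteration.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic "microarray-like" data: many features, few samples.
X, y = make_classification(n_samples=60, n_features=200,
                           n_informative=10, random_state=0)

# step=0.1 removes the lowest-ranked 10% of remaining features per
# iteration until n_features_to_select remain.
svm = SVC(kernel="linear", C=1.0)
rfe = RFE(estimator=svm, n_features_to_select=10, step=0.1)
rfe.fit(X, y)

selected = [i for i, keep in enumerate(rfe.support_) if keep]
print(len(selected))  # 10 features retained
```

The multiclass extensions discussed in the paper differ precisely in how this per-feature ranking score is computed when several classes must be considered at once rather than one binary subproblem at a time.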

    ABSTRACT: We first present a feature selection method based on a multilayer perceptron (MLP) neural network, called feature selection MLP (FSMLP). We explain how FSMLP can select essential features and discard derogatory and indifferent ones. Such a method may pick up some useful but dependent (e.g., correlated) features, not all of which may be needed. We then propose a general scheme for feature selection with "controlled redundancy" (CoR). The proposed scheme, named FSMLP-CoR, can select features with controlled redundancy for both classification and function approximation/prediction problems. We also propose a new, more effective training scheme named mFSMLP-CoR. The idea is general in nature and can be used with other learning schemes as well. We demonstrate the effectiveness of the algorithms on several data sets, including a synthetic one, and show that the selected features are adequate to solve the problem at hand. Here we use a measure of linear dependency to control redundancy; the use of nonlinear measures of dependency, such as mutual information, is straightforward. The proposed schemes have several advantages. They do not require explicit evaluation of feature subsets: feature selection is integrated into the design of the decision-making system, so the method can consider all features together and retain whatever is necessary. They can account for possible subtle nonlinear interactions among features, as well as between the features, the tool, and the problem being solved, and they can control the level of redundancy in the selected features. Of the two learning schemes, mFSMLP-CoR not only improves the performance of the system but also significantly reduces the dependence of the network's behavior on the initialization of connection weights.
    IEEE transactions on neural networks and learning systems 01/2015; 26(1):35-50. DOI:10.1109/TNNLS.2014.2308902 · 4.37 Impact Factor
    ABSTRACT: Support vector machines (SVMs) are considered a powerful tool for classification and have demonstrated strong performance in various fields. First presented for binary problems, SVMs have been extended in several ways to the multiclass case, with good results in practice. However, the presence of noise or redundant variables can reduce their performance, hence the need for variable selection. In this work, we are interested in determining the relevant explanatory variables for an SVM model in the case of multiclass discrimination (MSVM). The criterion proposed here consists in identifying such variables using one of the upper bounds on generalization error specific to MSVM models, known as the radius-margin bound [1]. A score derived from this bound establishes the order of relevance of the variables; the optimal subset is then selected using a forward method. Experiments are conducted on simulated and real data, and some results are compared with those of other MSVM-based variable selection methods.
    The International Conference on Artificial Intelligence and Pattern Recognition (AIPR2014), Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia; 11/2014
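The forward-selection step described in that abstract can be illustrated with scikit-learn's SequentialFeatureSelector. Note the hedge: this sketch scores candidate variables by cross-validated accuracy, not by the radius-margin bound the cited work uses; it only shows the greedy forward loop itself, on an illustrative multiclass dataset.

```python
# Forward variable selection for a multiclass SVM: start from the
# empty set and greedily add the variable that most improves the
# cross-validated score. (Criterion here is CV accuracy, standing in
# for the radius-margin-bound score of the cited method.)
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes, 4 variables

svm = SVC(kernel="linear")  # multiclass handled internally by SVC
sfs = SequentialFeatureSelector(svm, n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support().sum())  # 2 variables retained
```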