MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data.

Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510, USA.
Bioinformatics (Impact Factor: 4.62). 06/2007; 23(9):1106-14. DOI: 10.1093/bioinformatics/btm036
Source: PubMed

ABSTRACT MOTIVATION: Given the thousands of genes and the small number of samples, gene selection has emerged as an important research problem in microarray data analysis. Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is one of a group of recently described algorithms that represent the state of the art in gene selection. Like SVM itself, SVM-RFE was originally designed to solve binary gene selection problems. Several groups have extended SVM-RFE to multiclass problems using one-versus-all techniques. However, the genes selected for one binary gene selection problem may reduce classification performance on the other binary problems. RESULTS: In the present study, we propose a family of four extensions to SVM-RFE (called MSVM-RFE) that solve the multiclass gene selection problem, each based on a different multiclass SVM framework. By considering all classes simultaneously during the gene selection stages, our proposed extensions identify genes that lead to more accurate classification.
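The central idea, ranking genes with a criterion computed from all classes at once rather than from separate binary problems, can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' implementation. It assumes a linear Crammer-Singer multiclass SVM without a bias term, trained by plain subgradient descent (only one of several multiclass SVM frameworks), a ranking score equal to the sum of a gene's squared weights over all class weight vectors, and removal of one gene per round. The function names `train_cs_svm` and `msvm_rfe` and the hyperparameters `lam`, `lr`, and `epochs` are hypothetical.

```python
import numpy as np


def train_cs_svm(X, y, n_classes, lam=0.01, lr=0.1, epochs=200):
    """Linear Crammer-Singer multiclass SVM (no bias term), fitted by
    full-batch subgradient descent on
        lam/2 * ||W||^2 + mean_i max(0, 1 + max_{k != y_i} w_k.x_i - w_{y_i}.x_i).
    Returns W with one weight vector per class, shape (n_classes, n_features).
    """
    n, d = X.shape
    W = np.zeros((n_classes, d))
    rows = np.arange(n)
    for _ in range(epochs):
        scores = X @ W.T                        # (n, n_classes) class scores
        rival = scores.copy()
        rival[rows, y] = -np.inf                # exclude each sample's true class
        k_star = rival.argmax(axis=1)           # highest-scoring wrong class
        loss = 1.0 + rival[rows, k_star] - scores[rows, y]
        active = loss > 0                       # samples violating the margin
        # C holds +1 for the rival class and -1 for the true class of each
        # active sample, so C.T @ X is the summed hinge subgradient w.r.t. W.
        C = np.zeros((n, n_classes))
        C[rows[active], k_star[active]] = 1.0
        C[rows[active], y[active]] = -1.0
        W -= lr * (lam * W + (C.T @ X) / n)
    return W


def msvm_rfe(X, y, n_classes, n_keep=1, **svm_kwargs):
    """Recursive feature elimination driven by one multiclass SVM.

    Each round retrains the SVM on the surviving features, scores every
    feature by the sum of its squared weights over ALL class weight vectors,
    and removes the lowest-scoring feature. Returns the surviving features
    followed by the eliminated ones, most recently removed first.
    """
    remaining = list(range(X.shape[1]))
    eliminated = []                             # filled worst-first
    while len(remaining) > n_keep:
        W = train_cs_svm(X[:, remaining], y, n_classes, **svm_kwargs)
        score = (W ** 2).sum(axis=0)            # one score per surviving feature
        worst = int(np.argmin(score))
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]


if __name__ == "__main__":
    # Toy data: 3 classes separated only by features 0 and 1;
    # features 2-5 are pure noise.
    rng = np.random.default_rng(0)
    y = np.repeat(np.arange(3), 20)
    X = rng.normal(size=(y.size, 6))
    angle = 2 * np.pi * y / 3
    X[:, 0] = 3 * np.cos(angle) + 0.3 * rng.normal(size=y.size)
    X[:, 1] = 3 * np.sin(angle) + 0.3 * rng.normal(size=y.size)
    # Features 0 and 1 are expected at the front of the ranking.
    print(msvm_rfe(X, y, n_classes=3))
```

Removing one gene per round keeps the sketch simple. With thousands of genes, RFE variants commonly remove a fraction of the remaining genes per round to reduce the number of SVM retrainings.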

  • ABSTRACT: We first present a feature selection method based on a multilayer perceptron (MLP) neural network, called feature selection MLP (FSMLP). We explain how FSMLP can select essential features and discard derogatory and indifferent features. Such a method may pick up some useful but dependent (e.g., correlated) features, not all of which may be needed. We then propose a general scheme for feature selection with "controlled redundancy" (CoR). The proposed scheme, named FSMLP-CoR, can select features with controlled redundancy for both classification and function approximation/prediction problems. We also propose a new, more effective training scheme named mFSMLP-CoR. The idea is general and can also be used with other learning schemes. We demonstrate the effectiveness of the algorithms on several data sets, including a synthetic data set. We also show that the selected features are adequate to solve the problem at hand. Here, we use a measure of linear dependency to control redundancy; using nonlinear measures of dependency, such as mutual information, is straightforward. The proposed schemes have several advantages. They do not require explicit evaluation of feature subsets. Feature selection is integrated into the design of the decision-making system, so the system can consider all features together and pick up whatever is necessary. Our methods can account for possible subtle nonlinear interactions between features, as well as interactions among features, tools, and the problem being solved. They can also control the level of redundancy in the selected features. Of the two learning schemes, mFSMLP-CoR not only improves the performance of the system but also significantly reduces the dependence of the network's behavior on the initialization of the connection weights.
    IEEE Transactions on Neural Networks and Learning Systems 01/2015; 26(1):35-50. DOI:10.1109/TNNLS.2014.2308902 · 4.37 Impact Factor
  • ABSTRACT: Perception of sound categories is an important aspect of auditory perception. The extent to which the brain's representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding with MVPA methods, relatively few studies have addressed multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high-resolution fMRI and MVPA methods. Notably, we decoded multiple sound categories simultaneously, using multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. The results show that, for all classifications, the MSVM-RFE model learned the functional relation between the multiple sound categories and the corresponding evoked spatial patterns, and classified unlabeled sound-evoked patterns significantly above chance. This indicates that multiple sound categories can be decoded not only within but also across subjects. However, across-subject variation affects classification performance more than within-subject variation, as shown by the significantly lower classification accuracies of the across-subject analysis. Sound category-selective brain maps identified from the multi-class classification revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This accords with previous studies and suggests that information in the spatially distributed patterns may reflect a more abstract perceptual level of representation of sound categories. Further, we show that across-subject classification performance can be significantly improved by averaging the fMRI images over items, because averaging reduces the irrelevant variation between different items of the same sound category and in turn increases the proportion of signal relevant to sound categorization.
    PLoS ONE 02/2015; 10(2):e0117303. DOI:10.1371/journal.pone.0117303 · 3.53 Impact Factor
  • ABSTRACT: Support vector machines (SVMs) are considered a powerful classification tool and perform well in many fields. Originally introduced for binary problems, SVMs have been extended to the multiclass case in several ways, with good results in practice. However, noisy or redundant variables can degrade their performance, hence the need for variable selection. In this work, we are interested in determining the relevant explanatory variables for an SVM model in the case of multiclass discrimination (MSVM). The proposed criterion identifies such variables using an upper bound on the generalization error specific to MSVM models, known as the radius-margin bound [1]. A score derived from this bound ranks the variables by relevance, and the optimal subset is then selected by forward selection. Experiments are conducted on simulated and real data, and some results are compared with those of other MSVM variable selection methods.
    The International Conference on Artificial Intelligence and Pattern Recognition (AIPR2014), Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia; 11/2014