Gene selection using genetic algorithm and support vectors machines
ABSTRACT In this paper, we present a gene selection method based on a genetic algorithm (GA) and support vector machines (SVM) for cancer
classification. First, the Wilcoxon rank sum test is used to filter noisy and redundant genes in high-dimensional microarray
data. Then, different highly informative gene subsets are selected by GA/SVM using different training sets. The final
subset, consisting of highly discriminating genes, is obtained by analyzing how frequently each gene appears in the
different gene subsets. The proposed method is tested on three open datasets: leukemia, breast cancer, and colon cancer data.
The results show that the proposed method has excellent selection and classification performance, especially on the breast cancer
data, where it yields 100% classification accuracy using only four genes.
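
The first and last stages described in the abstract (rank-sum filtering and frequency analysis) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `top_k` and `min_count` parameters, and the toy data are assumptions made for the example. The GA/SVM search itself is not implemented here; its output is stood in for by a hand-written list of gene subsets.

```python
import math
from collections import Counter


def rank_sum_z(x, y):
    """Wilcoxon rank-sum z-score for two samples (normal approximation).

    Ties receive average ranks; no tie correction is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def filter_genes(samples, labels, top_k):
    """Keep the top_k genes with the largest |z| between class 0 and class 1."""
    scores = []
    for g in range(len(samples[0])):
        x = [s[g] for s, lab in zip(samples, labels) if lab == 0]
        y = [s[g] for s, lab in zip(samples, labels) if lab == 1]
        scores.append((abs(rank_sum_z(x, y)), g))
    best = sorted(scores, reverse=True)[:top_k]
    return sorted(g for _, g in best)


def frequent_genes(subsets, min_count):
    """Return genes that appear in at least min_count of the given subsets."""
    counts = Counter(g for subset in subsets for g in set(subset))
    return sorted(g for g, c in counts.items() if c >= min_count)


# Toy expression matrix: rows are samples, columns are genes (made-up values).
samples = [[1, 5, 2], [2, 3, 1], [3, 4, 4], [10, 4, 3], [11, 5, 5], [12, 3, 6]]
labels = [0, 0, 0, 1, 1, 1]
kept = filter_genes(samples, labels, top_k=2)

# Hypothetical subsets, standing in for GA/SVM runs on different training sets.
ga_subsets = [[0, 2], [0, 1], [0, 2, 3]]
final = frequent_genes(ga_subsets, min_count=2)
print(kept, final)
```

In this sketch the frequency threshold is a free parameter; the abstract does not say how the cutoff is chosen.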
- Source available from: duke.cs.duke.edu
ABSTRACT: ...selection. In this chapter, we review current methods of feature selection, focusing especially on the many recent results that have been reported in the context of gene expression analysis. Then we present a new Bayesian EM algorithm that jointly accomplishes the classifier design and feature selection tasks. By combining these two problems and solving them together, we identify only those features that are most useful in performing the classification itself. Experimental results are presented...
- ABSTRACT: Classification analysis of microarray gene expression data has been widely used to uncover biological features and to distinguish closely related cell types that often appear in the diagnosis of cancer. However, the number of dimensions of gene expression data is often very high, e.g., in the hundreds or thousands. Accurate and efficient classification of such high-dimensional data remains a contemporary challenge. In this paper, we propose a comprehensive vertical sample-based KNN/LSVM classification approach with weights optimized by genetic algorithms for high-dimensional data. Experiments on common gene expression datasets demonstrated that our approach can achieve high accuracy and efficiency at the same time. The improvement in speed is mainly related to the vertical data representation, P-tree, and its optimized logical algebra. The high accuracy is due to the combination of a KNN majority voting approach and a local support vector machine approach that makes optimal decisions at the local level. As a result, our approach could be a powerful tool for high-dimensional gene expression data analysis. (Patents are pending on the P-tree technology. This work is partially supported by GSA Grant ACT#:K96130308.)
Journal of Biomedical Informatics 09/2004; 37(4):240-8. · 2.13 Impact Factor
- ABSTRACT: Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This paper presents an approach to the multi-criteria optimization problem of feature subset selection using a genetic algorithm. Our experiments demonstrate the feasibility of this approach for feature subset selection in the automated design of neural networks for pattern classification and knowledge discovery.
1 Introduction
Many practical pattern classification tasks (e.g., medical diagnosis) require learning of an appropriate classification function that assigns a given input pattern (typically represented using a vector of attribute or feature values) to one of a finite set of classes. The choice of features, attributes, or measurements used to represent patterns that are presented to a classifier affects (among other things):
  - The accuracy of the classification function that can be learn...
06/1997