• ABSTRACT: F-score is a widely used filter criterion for gene selection in multiclass cancer classification. This ranking criterion may become biased towards classes that have a surplus of between-class sum of squares, resulting in inferior classification performance. To alleviate this problem, we propose to compute individual class-wise between-class sums of squares with Pareto frontal analysis to rank genes. We tested our approach on four multiclass cancer gene expression datasets and the results show improvement in classification performance.
    Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, 7th European Conference, EvoBIO 2009, Tübingen, Germany, April 15-17, 2009, Proceedings; 01/2009
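
The ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scaling of each class term by the within-class sum of squares, the small `eps` guard, and the use of successive non-dominated fronts as the ranking are all assumptions made for illustration.

```python
def classwise_bss(values, labels, eps=1e-12):
    """Per-class between-class sum of squares for one gene.

    Assumption: each class term n_k * (mean_k - overall_mean)^2 is scaled
    by the within-class sum of squares, as in an F-score-style criterion.
    """
    classes = sorted(set(labels))
    overall = sum(values) / len(values)
    bss, wss = [], 0.0
    for c in classes:
        group = [v for v, lab in zip(values, labels) if lab == c]
        mu = sum(group) / len(group)
        bss.append(len(group) * (mu - overall) ** 2)
        wss += sum((v - mu) ** 2 for v in group)
    # eps avoids division by zero when a gene has no within-class spread
    return [b / (wss + eps) for b in bss]


def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores):
    """Group gene indices into successive non-dominated fronts (front 0 ranks first)."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

Under these assumptions, a gene that separates only one class well can still reach the first front, which is the kind of even-handedness across classes that a single summed score may hide.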
  • ABSTRACT: Normalization is a prerequisite for almost all follow-up steps in microarray data analysis. Accurate normalization across different experiments and phenotypes assures a common base for comparative yet quantitative studies using gene expression data. In this paper, we report a novel normalization approach, the iterative nonlinear regression (INR) method, which exploits concurrent identification of invariantly expressed genes (IEGs) and implementation of nonlinear regression normalization. The INR scheme features an iterative process that performs the following two steps alternately: (1) selection of IEGs and (2) estimation of the nonlinear regression function for normalization. We demonstrate the principle and performance of the INR approach on two real microarray data sets. As compared to major peer methods (e.g., the linear regression method, the Loess method and the iterative ranking method), the INR method shows improved performance in achieving low expression variance across replicates and excellent fold-change preservation for differently expressed genes.
    The Open Applied Informatics Journal 11/2007; 107(11):11-19.
  • ABSTRACT: Microarray gene expressions provide new opportunities for molecular classification of heterogeneous diseases. Although various reported classification schemes show impressive performance, most existing gene selection methods are suboptimal and are not well-matched to the unique characteristics of the multicategory classification problem. Matched design of the gene selection method and a committee classifier is needed for identifying a small set of gene markers that achieve accurate multicategory classification while being both statistically reproducible and biologically plausible. We report a simpler and yet more accurate strategy than previous works for multicategory classification of heterogeneous diseases. Our method selects the union of one-versus-everyone (OVE) phenotypic up-regulated genes (PUGs) and matches this gene selection with a one-versus-rest support vector machine (OVRSVM). Our approach provides even-handed gene resources for discriminating both neighboring and well-separated classes. Consistent with the OVRSVM structure, we evaluated the fold changes of OVE gene expressions and found that only a small number of high-ranked genes were required to achieve superior accuracy for multicategory classification. We tested the proposed PUG-OVRSVM method on six real microarray gene expression data sets (five public benchmarks and one in-house data set) and two simulation data sets, observing significantly improved performance with lower error rates, fewer marker genes, and higher performance sustainability, as compared to several widely-adopted gene selection and classification methods. The MATLAB toolbox, experiment data and supplement files are available at http://www.cbil.ece.vt.edu/software.htm.
    Journal of Machine Learning Research 01/2010; 11:2141-2167. · 2.85 Impact Factor
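
The gene selection step described in this abstract can be sketched as follows. This is a hedged illustration rather than the published MATLAB toolbox: the function names, the definition of the OVE fold change as a class mean divided by the largest mean among the other classes, the requirement that the ratio exceed 1, and the `top_n` parameter are assumptions. Expression values are assumed to be strictly positive, and the OVRSVM classifier itself is not shown.

```python
def ove_fold_changes(values, labels):
    """One-versus-everyone fold change of one gene for each class.

    Assumption: the fold change for class c is its mean expression divided
    by the largest mean among all other classes (values must be positive).
    """
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        group = [v for v, lab in zip(values, labels) if lab == c]
        means[c] = sum(group) / len(group)
    return {
        c: means[c] / max(means[o] for o in classes if o != c)
        for c in classes
    }


def select_pugs(expr, labels, top_n):
    """Union over classes of the top_n up-regulated genes per class.

    expr maps a gene name to its list of expression values, aligned with labels.
    """
    per_gene = {g: ove_fold_changes(v, labels) for g, v in expr.items()}
    selected = set()
    for c in sorted(set(labels)):
        ranked = sorted(per_gene, key=lambda g: per_gene[g][c], reverse=True)
        # keep only genes that are actually up-regulated in class c
        selected.update(g for g in ranked[:top_n] if per_gene[g][c] > 1.0)
    return selected
```

Taking the same number of genes per class before forming the union is one way to read the "even-handed gene resources" mentioned in the abstract; the selected genes would then feed a one-versus-rest classifier.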
