ABSTRACT: Microarray experiments generate quantitative expression measurements for thousands of genes simultaneously, which is useful for phenotype classification of many diseases. Our proposed phenotype classifier is an ensemble method with k top-scoring decision rules. Each rule involves a number of genes, a rank comparison relation among them, and a class label. Current classifiers are also ensembles of k top-scoring decision rules, but some of them fix the number of genes in each rule as a pair or a triple. In this paper, we generalize the number of genes involved in each rule, which ranges from 2 to N. Generalizing the number of genes increases the robustness and the reliability of the classifier for the class prediction of an independent sample. Our algorithm saves resources by combining shorter rules to build a longer rule. It converges rapidly toward its high-scoring rule list by implementing several heuristics. The parameter k is determined by applying leave-one-out cross-validation to the training dataset.
IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews) 04/2010; 40(2-40):216-226. DOI:10.1109/TSMCC.2009.2036594 · 2.17 Impact Factor
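The rule-ensemble idea in the abstract above can be illustrated with a minimal sketch, restricted to gene pairs for brevity. The scoring used here (the between-class difference in how often one gene's value falls below the other's) is an assumption borrowed from the general top-scoring-pair idea, not the paper's exact procedure, and the names `pair_score`, `top_k_rules`, and `predict` are hypothetical.

```python
from itertools import combinations


def pair_score(X, y, i, j):
    """Difference between class 0 and class 1 in how often x[i] < x[j] holds."""
    freq = {}
    for label in (0, 1):
        rows = [x for x, c in zip(X, y) if c == label]
        freq[label] = sum(x[i] < x[j] for x in rows) / len(rows)
    return freq[0] - freq[1]


def top_k_rules(X, y, k):
    """Return the k highest-scoring rules (i, j, label).

    A rule reads: if x[i] < x[j], vote for `label`; otherwise vote for the
    other class. Scoring is an assumed, illustrative choice.
    """
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        s = pair_score(X, y, i, j)
        label = 0 if s > 0 else 1
        scored.append((abs(s), i, j, label))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(i, j, label) for _, i, j, label in scored[:k]]


def predict(rules, x):
    """Majority vote over the rules; a tie resolves to class 0."""
    votes = [label if x[i] < x[j] else 1 - label for i, j, label in rules]
    return int(sum(votes) * 2 > len(votes))


# Toy data: 4 samples, 3 genes, two classes.
X = [[1, 5, 3], [2, 6, 1], [5, 1, 4], [6, 2, 7]]
y = [0, 0, 1, 1]
rules = top_k_rules(X, y, k=2)
```

Because each rule compares values only within a single sample, the votes do not change under any monotone rescaling of that sample. Extending a rule from a pair to a longer chain of comparisons, as the abstract describes, would replace the single `x[i] < x[j]` test with an ordering test over several genes.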
ABSTRACT: The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways: we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and we reduced the number of genes involved in the classification by selecting a small set of informative genes. This allowed us to make maximal use of the abundant microarray data being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that is applicable to integrated microarrays and to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. First, we performed a direct integration of individual microarrays by transforming each expression value into a rank value within its sample, and identified informative genes by calculating the number of swaps needed to reach a perfectly split sequence. Second, we built a classifier, a parameter-free ensemble method, using only the pre-selected informative genes. Using this classifier, derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset.
Information Sciences 01/2008; 178(1):88-105. DOI:10.1016/j.ins.2007.08.013 · 4.04 Impact Factor
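The first stage described in the abstract above (within-sample rank transformation, then a swap count per gene) might be sketched as follows. The abstract does not define the swap count precisely, so this sketch assumes that samples are sorted by a gene's rank value and that the score is the minimum number of adjacent swaps needed to group the resulting class-label sequence by class. All function names are hypothetical, and ties are broken by position rather than handled specially.

```python
def rank_within_sample(sample):
    """Replace each expression value with its rank (1 = lowest) in the sample."""
    order = sorted(range(len(sample)), key=lambda g: sample[g])
    ranks = [0] * len(sample)
    for r, g in enumerate(order, start=1):
        ranks[g] = r
    return ranks


def swaps_to_split(labels):
    """Minimum adjacent swaps to group a 0/1 label sequence by class.

    Counts pairs with a 1 before a 0 (swaps needed for 'all 0s first'),
    then takes the cheaper of the two possible orientations.
    """
    ones_seen = 0
    inversions = 0
    for c in labels:
        if c == 1:
            ones_seen += 1
        else:
            inversions += ones_seen
    zeros = len(labels) - ones_seen
    return min(inversions, zeros * ones_seen - inversions)


def gene_swap_score(ranked, y, g):
    """Swap count for gene g; a lower score means a cleaner class split (assumed)."""
    order = sorted(range(len(ranked)), key=lambda s: ranked[s][g])
    return swaps_to_split([y[s] for s in order])


# Toy data: rank-transform each sample, then score each gene.
X = [[1.0, 5.0, 3.0], [2.0, 6.0, 1.0], [5.0, 1.0, 4.0], [6.0, 2.0, 7.0]]
y = [0, 0, 1, 1]
ranked = [rank_within_sample(x) for x in X]
scores = [gene_swap_score(ranked, y, g) for g in range(3)]
```

Working on ranks rather than raw values is what lets arrays from different experiments be pooled directly, since ranks within a sample do not depend on each platform's measurement scale.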
ABSTRACT: Microarrays produce expression measurements for thousands of genes simultaneously, which is useful for phenotype classification. We performed a direct integration of individual microarrays with the same biological objectives by converting each expression value into a rank value within its sample, and built a classifier based on rank comparison. Our classifier is an ensemble method with k top-scoring decision rules. Each rule contains a number of genes, a relationship between those genes, and a class label. Current classifiers fix the number of genes in each rule as a pair or a triple. In this paper, we generalized the number of genes involved in each rule. Generalizing the number of genes increases the robustness and the reliability of the classifier. Our algorithm saves resources by combining shorter rules to build a longer rule, converges rapidly toward its high-scoring rule list, and outperforms the current methods in run-time and classification accuracy.
Frontiers in the Convergence of Bioscience and Information Technologies, 2007. FBIT 2007; 11/2007