
Gene Selection Using Logistic Regressions Based on AIC, BIC and MDL Criteria

New Mathematics and Natural Computation (NMNC), 01/2005; 1(1): 129-145.
Source: RePEc

ABSTRACT In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables (gene expressions) and the small number of experimental conditions. Many gene-selection and classification methods have been proposed; however, most of them treat gene selection and classification separately rather than within a single model. We propose a Bayesian approach to gene selection using the logistic regression model. The Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the minimum description length (MDL) principle are used to construct the posterior distribution of the chosen genes. The same logistic regression model is then used for cancer classification. Issues of fast implementation for these methods are discussed. The proposed methods are tested on several data sets, including hereditary breast cancer, small round blue-cell tumor, lymphoma and acute leukemia data. The experimental results indicate that the proposed methods achieve high classification accuracy on these data sets. Some robustness and sensitivity properties of the proposed methods are also discussed. Finally, combining logistic-regression-based gene selection with other classification methods, and logistic-regression-based classification with other gene-selection methods, is considered.
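The abstract states that AIC, BIC and MDL are used to build a posterior distribution over genes for a logistic regression model, but it gives no formulas. The sketch below illustrates one common way such a criterion-based weighting can work, under stated assumptions: each gene is scored by its own one-gene logistic regression (the paper may well use multi-gene models), AIC = -2 log L + 2k and BIC = -2 log L + k ln n are the standard definitions, and scores are turned into normalized weights proportional to exp(-score/2), the usual BIC approximation to posterior model probability. It is not the authors' algorithm. MDL is left out because its exact form is not given here, and all function names (`fit_logistic_1d`, `rank_genes`, etc.) and the toy data are hypothetical.

```python
import math

# Minimal sketch (assumption): score each gene with a one-gene logistic regression,
# turn AIC/BIC scores into normalized weights exp(-score/2), and rank genes by weight.
# MDL is omitted because the abstract does not give its exact form.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(z):
    # Stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def fit_logistic_1d(x, y, iters=50, tol=1e-10):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson; return (b0, b1, loglik)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            r, w = yi - p, p * (1.0 - p)
            g0 += r
            g1 += r * xi            # gradient of the log-likelihood
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi      # Fisher information X^T W X
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step (X^T W X)^{-1} g
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # log p = -softplus(-eta), log(1 - p) = -softplus(eta)
    loglik = sum(-yi * softplus(-(b0 + b1 * xi)) - (1 - yi) * softplus(b0 + b1 * xi)
                 for xi, yi in zip(x, y))
    return b0, b1, loglik

def information_criteria(loglik, k, n):
    """AIC = -2 logL + 2k, BIC = -2 logL + k ln n."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)

def criterion_weights(scores):
    """Normalized weights proportional to exp(-score/2), shifted by the min for stability."""
    m = min(scores)
    raw = [math.exp(-(s - m) / 2.0) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]

def rank_genes(genes, y, criterion="bic"):
    """genes: list of per-gene expression vectors; returns (ranked indices, weights, fits)."""
    n, scores, fits = len(y), [], []
    for x in genes:
        b0, b1, ll = fit_logistic_1d(x, y)
        aic, bic = information_criteria(ll, k=2, n=n)
        scores.append(bic if criterion == "bic" else aic)
        fits.append((b0, b1))
    w = criterion_weights(scores)
    order = sorted(range(len(genes)), key=lambda j: -w[j])
    return order, w, fits

if __name__ == "__main__":
    # Toy data (hypothetical): 8 samples, 2 genes; gene 0 tracks the class label.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    genes = [[0.1, 0.4, 0.3, 1.2, 0.9, 1.5, 1.8, 1.1],
             [0.5, 1.0, 0.2, 0.8, 0.6, 0.3, 0.9, 0.4]]
    order, w, fits = rank_genes(genes, y)
    b0, b1 = fits[order[0]]
    # Classify a new sample with the same logistic model used for selection.
    print(order, [round(v, 3) for v in w], round(sigmoid(b0 + b1 * 1.4), 3))
```

With BIC, the exp(-score/2) weights approximate posterior model probabilities under equal prior weights on the candidate genes. When the classes are perfectly separable by a gene (common with few samples and many genes), the maximum-likelihood coefficients diverge, so a practical version would need regularization or a prior on the coefficients.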



Keywords: Bayesian approach, Bayesian information criterion, cancer classification, classification accuracies, classification methods, data sets, experimental conditions, experimental results, fast implementation issues, gene expressions, gene selection, gene-selection methods, hereditary breast cancer, logistic regression model, logistic-regression-based classification, microarray-based cancer classification, minimum description length, posterior distribution, proposed methods, treat gene selection

Xiaobo Zhou