Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7: 673-679.

Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA.
Nature Medicine (Impact Factor: 27.36). 07/2001; 7(6):673-9. DOI: 10.1038/89044
Source: PubMed


The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification. Expression of several of these genes has been reported in SRBCTs, but most have not been associated with these cancers. To test the ability of the trained ANN models to recognize SRBCTs, we analyzed additional blinded samples that were not previously used for the training procedure, and correctly classified them in all cases. This study demonstrates the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.

Available from: Markus Ringnér
  • Source
    • "With this abundance of gene expression data nowadays, the researchers have the opportunity to do cancer classification using gene expression data. In recent years, a lot of machine learning methods have been proposed to do cancer classification using gene expression data such as clustering-based methods [1], [2], k-nearest neighbor method [3], artificial neural network method [4], and support vector machine method [5], to name a few. However, there still exist a lot of issues needed to be identified and understood. "
    ABSTRACT: A successful classification of different tumor types is essential for successful treatment of cancer. However, most prior cancer classification methods are clinical-based and have inadequate diagnostic ability. Cancer classification using gene expression data is very important in cancer diagnosis and drug discovery. The introduction of DNA microarray techniques has made it possible to monitor the expression of thousands of genes simultaneously. With this abundance of gene expression data, researchers now have the opportunity to classify cancers using gene expression data. In recent years, many machine learning methods have been proposed for this task, such as clustering-based methods, the k-nearest neighbor method, the artificial neural network method, and the support vector machine method, to name a few. In this paper, we present the un-normalized graph p-Laplacian semi-supervised learning methods. These methods will be applied to the patient-patient network constructed from the gene expression data to predict the tumor types of all patients in the network. These methods are based on the assumption that the labels of two adjacent patients in the network are likely to be the same. The experiments show that the un-normalized graph p-Laplacian semi-supervised learning methods are at least as good as the current state-of-the-art network-based method (the un-normalized graph Laplacian based semi-supervised learning method) and often lead to better classification accuracy.
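The assumption stated in this abstract, that adjacent patients in the network tend to share a label, can be illustrated with a minimal sketch of graph-based label propagation. This is not the authors' p-Laplacian method: it is the simpler harmonic (p = 2, un-normalized Laplacian) case, solved by repeated neighbor averaging, and the function name, the edge-weight matrix layout, and the use of `None` for unlabeled patients are all assumptions made for illustration.

```python
def propagate_labels(weights, labels, n_classes, n_iter=200):
    """Harmonic label propagation on a patient-patient network (illustrative).

    weights: symmetric n x n list of lists of non-negative edge weights,
             with zeros on the diagonal (hypothetical similarity network).
    labels:  list of length n holding a class index, or None if unlabeled.
    Returns a predicted class index for every node.
    """
    n = len(weights)
    # One score per class for each node; labeled nodes are one-hot and clamped.
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            scores[i][y] = 1.0

    for _ in range(n_iter):
        new_scores = [row[:] for row in scores]
        for i in range(n):
            if labels[i] is not None:
                continue  # known labels never change
            degree = sum(weights[i])
            if degree == 0:
                continue  # isolated node: scores stay at zero
            # Each unlabeled node takes the weighted average of its neighbors.
            for c in range(n_classes):
                new_scores[i][c] = sum(
                    weights[i][j] * scores[j][c] for j in range(n)
                ) / degree
        scores = new_scores

    # Ties (including isolated nodes) fall back to the lowest class index.
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]
```

In this sketch an unlabeled patient inherits the label that dominates among its strongly connected neighbors, which is the smoothness assumption the abstract describes; the p-Laplacian variants replace the quadratic smoothness penalty with a p-th power one.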
  • Source
    • "In the literature, various classifiers have been investigated in order to find the best classifier. It seems that the NN and various types of NN [29] [36] [57] [6] [56] [68] [74] [81] [69] [16], k nearest neighbors [61] [13], k-means algorithms [32], Fuzzy c-means algorithm [11], bayesian networks [4], vector quantization based classifier [59], manifold methods [18] [80], fuzzy approaches [54] [58] [30] [60], complementary learning fuzzy neural network [64] [65] [66] [67], ensemble learning [55] [8] [27] [50], logistic regression, support vector machines [22] [5] [82] [73] [63] [46] [70], LSVM [44], wavelet transform [28] as well as radial basis-support vector machines [51] have been investigated successfully in classification and cancer detection. But the recently developed classifiers such as brain emotional learning (BEL) networks [42] have not been examined in this field. "
    ABSTRACT: In this paper, a novel hybrid method is proposed based on Principal Component Analysis (PCA) and Brain Emotional Learning (BEL) network for the classification tasks of gene-expression microarray data. BEL network is a computational neural model of the emotional brain which simulates its neuropsychological features. The distinctive feature of BEL is its low computational complexity, which makes it suitable for high dimensional feature vector classification. Thus BEL can be adopted in pattern recognition in order to overcome the curse of dimensionality problem. In the experimental studies, the proposed model is utilized for the classification problems of the small round blue cell tumors (SRBCTs), high grade gliomas (HGG), lung, colon and breast cancer datasets. According to the results based on 5-fold cross validation, the PCA–BEL provides an average accuracy of 100%, 96%, 98.32%, 87.40% and 88% in these datasets respectively. Therefore, it can be used effectively in gene-expression microarray classification tasks.
    Computers in Biology and Medicine 09/2014; 54:180–187. DOI:10.1016/j.compbiomed.2014.09.008 · 1.24 Impact Factor
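The 5-fold cross-validation protocol behind the accuracies reported above can be sketched as follows. The BEL network itself is not reproduced here: a nearest-centroid classifier stands in for it purely as a placeholder, and all function names, the contiguous fold layout, and the list-of-lists data format are assumptions for illustration.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_centroids(X, y):
    """Placeholder classifier: mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], x))
    return min(centroids, key=sq_dist)


def cross_validated_accuracy(X, y, k=5):
    """Average held-out accuracy over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        centroids = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
        correct = sum(predict_centroid(centroids, X[i]) == y[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

Because the folds here are contiguous, samples should be shuffled (or the split stratified by class) beforehand so that every training fold contains every tumor class. Any dimensionality reduction such as PCA should also be fitted on the training folds only, so that held-out samples do not leak into the model.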
  • Source
    • "In this section, we will discuss the implementation of coinertia analysis (CIA) to cross-platform visualization in MADE4 and ADE4 to perform multivariate analysis of microarray datasets. To demonstrate, PCA was applied on 4 childhood tumors (NB, BL-NHL, EWS, and RMS) from a microarray gene expression profiling study [52]. From these data, a subset (khan$train, 206 genes × 64 cases), each case's factor denoting the respective class (khan$train classes, length = 64), and a gene annotation's data frame are accessible in aforementioned dataset in MADE4: "
    ABSTRACT: When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method.
    BioMed Research International 08/2014; 2014:213656. DOI:10.1155/2014/213656 · 2.71 Impact Factor
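As a rough illustration of the feature extraction idea reviewed in this abstract, the sketch below computes the first principal component of an expression matrix by power iteration, using only the Python standard library. It is not taken from any of the reviewed software packages; the function name, the samples-as-rows layout, and the fixed random seed are assumptions.

```python
import math
import random


def first_principal_component(X, n_iter=100, seed=0):
    """Return (direction, scores) for the leading principal component.

    X: list of samples, each a list of expression values (rows = samples,
       columns = genes; this orientation is an assumption of the sketch).
    """
    n, d = len(X), len(X[0])
    # Center each gene (column) at zero mean.
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]

    # A random start avoids beginning exactly orthogonal to the component.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(n_iter):
        # Apply the covariance matrix implicitly: C v = X^T (X v) / (n - 1).
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) / (n - 1)
             for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left along the current direction
        v = [x / norm for x in w]

    # Project every sample onto the component: the reduced representation.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores
```

Each sample is thereby summarized by a single score instead of its full gene vector; further components could be obtained by deflating the data and repeating, though dedicated packages handle this more robustly.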