Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks.

Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA.
Nature Medicine (Impact Factor: 28.05). 07/2001; 7(6):673-9. DOI: 10.1038/89044
Source: PubMed

ABSTRACT The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification. Expression of several of these genes has been reported in SRBCTs, but most have not been associated with these cancers. To test the ability of the trained ANN models to recognize SRBCTs, we analyzed additional blinded samples that were not previously used for the training procedure, and correctly classified them in all cases. This study demonstrates the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.

  • Frontiers in Bioscience 01/2003; 8(1-3):s913. · 4.25 Impact Factor
  • ABSTRACT: Background: Applying machine learning methods to microarray gene expression profiles for disease classification is a popular way to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches, in which the expression of each gene is treated independently, suffer from low prediction accuracy and are difficult to interpret biologically. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in the literature are either unsupervised or applied to two-class datasets, there is good scope to address such limitations by proposing novel methodologies. Results: A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of the expression of its constituent genes. Gene weights are determined by the optimisation model in such a way that the resulting pathway activity has the optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. Conclusions: The model was evaluated on a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well on multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
    BMC Bioinformatics 12/2014; 15(1):390. · 2.67 Impact Factor
  • ABSTRACT: Because gene expression profiles in normal cells differ from those in cancer cells, gene expression tests can help diagnose cancer. However, gene expression data are usually highly variable-dependent, high-dimensional, and very noisy, so it is not appropriate to train or test a forecasting model on the original data. Based on the unique properties of gene expression data, this paper develops a new statistical dimension reduction method called the horizon-vertical dimension reduction method (HVDRM). Applying HVDRM reduces the feature set dimension from 2000 to 5. The extracted feature set is then used to train an artificial neural network (ANN) and a fuzzy neural network (FNN). The two trained models are then passed to the classification system to determine whether a test sample is normal. Three experiments are conducted to test validity: original data with an ANN, reduced feature data with an ANN, and reduced feature data with an FNN. The FNN achieves the best testing accuracy. It is concluded that the proposed HVDRM is an effective method for extracting feature data and that the FNN is more suitable than the ANN as the forecasting model for the given cancer gene detection task.
    2013 International Conference on Fuzzy Theory and Its Applications (iFUZZY); 12/2013
