Analysis of DNA microarray expression data

Biometric Research Branch, Division of Cancer Treatment & Diagnosis, National Cancer Institute, 9000 Rockville Pike, Bethesda, MD 20892-7434, USA.
Best practice & research. Clinical haematology (Impact Factor: 2.12). 07/2009; 22(2):271-82. DOI: 10.1016/j.beha.2009.07.001
Source: PubMed


DNA microarrays are powerful tools for studying biological mechanisms and for developing prognostic and predictive classifiers for identifying the patients who require treatment and are the best candidates for specific treatments. Because microarrays produce so much data from each specimen, they offer great opportunities for discovery and great dangers of producing misleading claims. Microarray-based studies require clear objectives for selecting cases and appropriate analysis methods. Effective analysis of microarray data, where the number of measured variables is orders of magnitude greater than the number of cases, requires specialized statistical methods which have recently been developed. Recent literature reviews indicate that serious problems of analysis exist in a substantial proportion of publications. This manuscript attempts to provide a non-technical summary of the key principles of statistical design and analysis for studies that utilize microarray expression profiling.



Available from: Richard Simon, Sep 17, 2014
  • Source
    • "Developing predictive and prognostic classifiers to recognize the patient highly requires action and forms as the most excellent candidate form for specific treatments. As microarrays construct as much of data from every specimen, [2] the method provides with the greater opportunity for discovering huge dangers on misleading claims. DNA microarrays provide enormous occasion for discovery and progress of predictive oncology but with a greater tradeoff value between the opportunity and mounting false claims. "
    ABSTRACT: Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor a higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using a proximity measure of improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.
    Full-text · Article · Jun 2014

  • ABSTRACT: In this paper, a novel classifier is proposed to classify microarray data using principal curves. Principal curves are the non-linear generalization of principal components. Intuitively, a principal curve 'passes through the middle of the data cloud'. As a new classification technique, the Principal Curve-based classifier (PC) involves a novel way of computing a principal curve for each class using the training data. A test sample is given the class label of the principal curve that is closest to it according to Expected Squared Error. Experimental results illustrate that the PC performs better than other existing approaches when the sample size is very small.
    No preview · Conference Paper · Aug 2010