The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA.
Nature Biotechnology (Impact Factor: 32.44). 08/2010; 28(8):827-38. DOI: 10.1038/nbt.1665
Source: PubMed

ABSTRACT Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.

  • ABSTRACT: We hypothesized that distinct biological processes might be associated with prognosis and chemotherapy sensitivity in the different types of breast cancers. We performed gene set analyses with BRB-ArrayTools statistical software including 2331 functionally annotated gene sets (ie, lists of genes that correspond to a particular biological pathway or biochemical function) assembled from Ingenuity Pathway Analysis and Gene Ontology databases corresponding to almost all known biological processes. Gene set analysis was performed on gene expression data from three cohorts of 234, 170, and 175 patients with HER2-normal lymph node-negative breast cancer who received no systemic adjuvant therapy to identify gene sets associated with prognosis and three additional cohorts of 198, 85, and 62 patients with HER2-normal stage I-III breast cancer who received preoperative chemotherapy to identify gene sets associated with pathological complete response to therapy. These analyses were performed separately for estrogen receptor (ER)-positive and ER-negative breast cancers. Interaction between gene sets and survival and treatment response by breast cancer subtype was assessed in individual datasets and also in pooled datasets. Statistical significance was estimated with a permutation test. All statistical tests were two-sided. For ER-positive cancers, from 370 to 434 gene sets were associated with prognosis (P ≤ .05) and from 209 to 267 gene sets were associated with chemotherapy response in analysis by individual dataset. For ER-positive cancers, 131 gene sets were associated with prognosis and 69 were associated with pathological complete response (P ≤ .001) in pooled analysis. Increased expression of cell cycle-related gene sets was associated with poor prognosis, and B-cell immunity-related gene sets were associated with good prognosis. For ER-negative cancers, from 175 to 288 gene sets were associated with prognosis and from 212 to 285 gene sets were associated with chemotherapy response. In pooled analyses of ER-negative cancers, 14 gene sets were associated with prognosis and 23 were associated with response. Gene sets involved in sphingolipid and glycolipid metabolism were associated with better prognosis and those involved in base excision repair, cell aging, and spindle microtubule regulation were associated with chemotherapy response. Different biological processes were associated with prognosis and chemotherapy response in ER-positive and ER-negative breast cancers.
    CancerSpectrum Knowledge Environment 02/2011; 103(3):264-72. · 14.07 Impact Factor
  • ABSTRACT: A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.
    Genome Biology 09/2013; 14(9):R95. · 10.30 Impact Factor
  • ABSTRACT: A new approach for variable influence on projection (VIP) is described, which takes full advantage of the orthogonal projections to latent structures (OPLS) model formalism for enhanced model interpretability. The approach therefore includes not only the predictive components in OPLS but also the orthogonal components. Four variants of VIP adapted to OPLS have been developed, tested and compared using three different data sets: one synthetic with known properties and two real-world cases. Copyright © 2014 John Wiley & Sons, Ltd.
    Journal of Chemometrics 05/2014; · 1.94 Impact Factor

