The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models

National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA.
Nature Biotechnology (Impact Factor: 41.51). 08/2010; 28(8):827-38. DOI: 10.1038/nbt.1665
Source: PubMed

ABSTRACT: Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
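The central design point of MAQC-II, evaluating a frozen model exactly once on external data never used during development, can be sketched in a few lines. The snippet below is an illustration of that protocol, not project code: the arrays are synthetic stand-ins for microarray profiles, the classifier choice is an assumption, and the Matthews correlation coefficient (MCC) is used because MAQC-II adopted it as its primary performance metric.

```python
# Illustrative sketch of the MAQC-II validation protocol (not project code):
# develop and tune on training data only, then score once on external data.
# All data here are synthetic placeholders for microarray profiles.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 500))      # 100 training samples x 500 probes
y_train = rng.integers(0, 2, 100)          # binary endpoint labels
X_external = rng.normal(size=(60, 500))    # external validation set
y_external = rng.integers(0, 2, 60)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Internal cross-validation: the only performance estimate available while
# the model is still being developed.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
internal = cross_val_score(model, X_train, y_train, cv=cv,
                           scoring="matthews_corrcoef")

# One frozen model, one pass over the external set: the honest estimate.
model.fit(X_train, y_train)
external = matthews_corrcoef(y_external, model.predict(X_external))
print(f"internal CV MCC {internal.mean():.2f} / external MCC {external:.2f}")
```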

    • "The prediction performance of the models trained on DrugMatrix and tested on TG-GATEs provides supporting evidence of the validity of our approach since significant classification accuracy was achieved across datasets despite the difference in experimental conditions (dose and time) of the two datasets, and the known dataset-to-dataset bias inherent in the Affymetrix microarray platform [27], [28]. To further evaluate the best achievable classification performance, we next applied our random resampling scheme within the TG-GATEs. "
    ABSTRACT:
    Background: Despite an overall decrease in the incidence of and mortality from cancer, about 40% of Americans will be diagnosed with the disease in their lifetime, and around 20% will die of it. Current approaches to testing chemicals for carcinogenicity rely on the 2-year rodent bioassay, which is costly and time-consuming. As a result, fewer than 2% of the chemicals on the market have actually been tested. However, evidence accumulated to date suggests that gene expression profiles from model organisms exposed to chemical compounds reflect underlying mechanisms of action, and that these toxicogenomic models could be used to predict chemical carcinogenicity.
    Results: In this study, we used a rat-based microarray dataset from the NTP DrugMatrix Database to test the ability of toxicogenomics to model carcinogenicity. We analyzed 1,221 gene-expression profiles obtained from rats treated with 127 well-characterized compounds, including genotoxic and non-genotoxic carcinogens. We built a classifier that predicts a chemical's carcinogenic potential with an AUC of 0.78, and validated it on an independent dataset from the Japanese Toxicogenomics Project consisting of 2,065 profiles from 72 compounds. Finally, we identified differentially expressed genes associated with chemical carcinogenesis, and developed novel data-driven approaches for the molecular characterization of the response to chemical stressors.
    Conclusion: Here, we validate a toxicogenomic approach to predicting carcinogenicity and provide strong evidence that, with a larger set of compounds, we should be able to improve the sensitivity and specificity of the predictions. We found that the prediction of carcinogenicity is tissue-dependent, and the results also confirm and expand upon previous studies implicating DNA damage, the peroxisome proliferator-activated receptor, the aryl hydrocarbon receptor, and regenerative pathology in the response to carcinogen exposure.
    PLoS ONE 07/2014; 9(7):e102579. DOI:10.1371/journal.pone.0102579 · 3.23 Impact Factor
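The cross-dataset validation described in this entry can be sketched as follows, under stated assumptions: the arrays are synthetic placeholders for DrugMatrix-style training profiles and TG-GATEs-style external profiles, and the random-forest classifier is an illustrative choice, not necessarily the authors' method.

```python
# Hedged sketch of cross-dataset validation (not the authors' code): fit a
# carcinogenicity classifier on one toxicogenomic data set and score it on
# a second, independently generated one. Arrays below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X_drugmatrix = rng.normal(size=(1221, 200))   # training profiles
y_drugmatrix = rng.integers(0, 2, 1221)       # carcinogen labels
X_tggates = rng.normal(size=(2065, 200))      # external test profiles
y_tggates = rng.integers(0, 2, 2065)

clf = RandomForestClassifier(n_estimators=500, random_state=1)
clf.fit(X_drugmatrix, y_drugmatrix)

# AUC on the independent data set: dose, time and platform batch effects
# differ between studies, so this is a conservative performance estimate.
auc = roc_auc_score(y_tggates, clf.predict_proba(X_tggates)[:, 1])
print(f"cross-dataset AUC: {auc:.2f}")
```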
    • "However, since no method is perfect, each pre-processing pipeline removes a somewhat different aspect of the “noise”. Indeed, groups around the world have focused on identifying the “optimal” pre-processing technique for different types of data [37,38]. The principle of ensemble classification is that by combining pre-processing approaches we can select the parts of the data which are reliable across the multiple approaches. "
    ABSTRACT:
    Background: The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is caused in part by differing error structures between datasets and their incomplete removal by pre-processing algorithms.
    Methods: To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients).
    Results: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble approach improved classification for a majority of signatures.
    Conclusions: Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not spuriously improve performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.
    BMC Bioinformatics 06/2014; 15(1):170. DOI:10.1186/1471-2105-15-170 · 2.58 Impact Factor
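The ensemble idea in this entry, classifying with several pre-processing pipelines and combining the calls, can be sketched as below. This illustrates the principle rather than the paper's implementation; the three pipelines, the synthetic expression matrix and the signature-scoring rule (mean expression of a gene set, thresholded at the median) are all assumptions.

```python
# Minimal sketch of ensemble classification over pre-processing pipelines:
# pre-process the same raw matrix several ways, score a signature under
# each pipeline, and combine the per-pipeline calls by majority vote.
import numpy as np

rng = np.random.default_rng(2)
raw = rng.lognormal(size=(50, 1000))                  # 50 samples x 1000 probes
signature = rng.choice(1000, size=15, replace=False)  # hypoxia-like gene set

def quantile_norm(x):
    """Map each sample's values onto the mean sorted profile."""
    ranks = x.argsort(axis=1).argsort(axis=1)
    mean_sorted = np.sort(x, axis=1).mean(axis=0)
    return mean_sorted[ranks]

pipelines = {
    "log2": lambda x: np.log2(x + 1),
    "quantile": lambda x: quantile_norm(np.log2(x + 1)),
    "zscore": lambda x: ((np.log2(x + 1) - np.log2(x + 1).mean(axis=0))
                         / np.log2(x + 1).std(axis=0)),
}

votes = []
for name, pre in pipelines.items():
    expr = pre(raw)
    score = expr[:, signature].mean(axis=1)    # mean signature expression
    votes.append(score > np.median(score))     # per-pipeline class call

# Majority vote across pipelines gives the ensemble classification.
ensemble_call = np.sum(votes, axis=0) > len(pipelines) / 2
print(ensemble_call[:10])
```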
    • "Reference materials, controls, and QC standards need to be defined for clinical grade RNASeq in the same way these are becoming standardized for clinical DNASeq. An advantage for the clinical RNASeq field is the availability of the highly qualified human reference MAQC-A and MAQC-B reference materials and the extensive data on tissue-specific expression of potential housekeeping genes from exhaustive microarray profiling (79). This approach has been utilized to test and aid data correction in RNASeq in research settings and may find easy integration into clinical practice as well (80). "
    ABSTRACT: Over the past decade, next-generation sequencing (NGS) technology has seen meteoric growth in platforms, technologies and supporting bioinformatics, allowing its widespread and rapid uptake in research settings. More recently, NGS-based genomic data have been exploited to better understand disease development and the patient characteristics that influence response to a given therapeutic intervention. Cancer, a disease characterized and driven by the tumor genetic landscape, is particularly amenable to NGS-based diagnostic (Dx) approaches. NGS-based technologies are also well suited to studying cancer development, progression and the emergence of resistance, all key factors in the development of next-generation cancer Dxs. Yet, to achieve the promise of NGS-based patient treatment, drug developers will need to overcome a number of operational, technical, regulatory, and strategic challenges. Here, we provide a succinct overview of the state of the clinical NGS field in terms of the available clinically targeted platforms and sequencing technologies. We discuss the various operational and practical aspects of clinical NGS testing that will facilitate or limit the uptake of such assays in routine clinical care. We examine the current strategies for analytical validation and Food and Drug Administration (FDA) approval of NGS-based assays, and the ongoing efforts to standardize clinical NGS and build quality control standards for it. We review the rapidly evolving companion diagnostic (CDx) landscape for NGS-based assays, highlighting the key areas of concern and suggesting strategies to mitigate risk. The review concludes with a series of strategic questions facing drug developers and a discussion of the likely future course of NGS-based CDx development efforts.
    Frontiers in Oncology 04/2014; 4:78. DOI:10.3389/fonc.2014.00078
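The quoted passage above mentions using reference materials and housekeeping genes to test and correct RNA-Seq data. A minimal sketch of one such correction, rescaling each sample by its housekeeping-gene level, follows; the gene choices (ACTB, GAPDH) and the geometric-mean scaling are illustrative assumptions, not a validated clinical protocol.

```python
# Hedged sketch of housekeeping-gene-based correction for RNA-Seq counts
# (illustrative assumptions throughout; not a clinical method).
import numpy as np

rng = np.random.default_rng(3)
genes = ["ACTB", "GAPDH"] + [f"GENE{i}" for i in range(998)]
counts = rng.poisson(lam=50.0, size=(8, 1000)).astype(float)  # 8 samples

hk_idx = [genes.index(g) for g in ("ACTB", "GAPDH")]  # housekeeping genes

# Per-sample factor: geometric mean of housekeeping counts. Rescaling every
# sample toward the cohort average damps library-depth differences.
hk = counts[:, hk_idx] + 1.0                  # +1 avoids log(0)
sample_factor = np.exp(np.log(hk).mean(axis=1))
target = sample_factor.mean()
normalized = counts * (target / sample_factor)[:, None]

print(normalized[:, hk_idx].round(1))         # housekeeping levels now aligned
```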