ABSTRACT: Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
ABSTRACT: Current advances in genomics, proteomics, and metabonomics are expected to bring a constellation of benefits to human health. Classification that applies supervised learning methods to omics data, one of the molecular classification approaches, has enjoyed a growing role in clinical applications. However, the utility of a molecular classifier will not be fully appreciated unless its quality is carefully validated. Clinical omics data are usually noisy, with far more independent variables than subjects and, possibly, a skewed subject distribution. Given that, the consensus approach holds an advantage over a single classifier. Thus, this review focuses mainly on how to validate a molecular classifier using Decision Forest (DF), a robust consensus approach. We recommend that a molecular classifier be assessed with respect to overall prediction accuracy, prediction confidence and chance correlation, which can be readily achieved in DF. The commonalities and differences between external validation and cross-validation are also discussed for perspective use of these methods to validate a DF classifier. In addition, the advantages of using consensus approaches for identification of potential biomarkers are also rationalized. Although specific DF examples are used in this review, the provided rationales and recommendations should be equally applicable to other consensus methods.
Toxicology mechanisms and methods 01/2006; 16(2-3):59-68. · 1.37 Impact Factor
ABSTRACT: Standard controls and best practice guidelines advance acceptance of data from research, preclinical and clinical laboratories by providing a means for evaluating data quality. The External RNA Controls Consortium (ERCC) is developing commonly agreed-upon and tested controls for use in expression assays, a true industry-wide standard control.
ABSTRACT: The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630-631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676-5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology.
We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, instead of inherent technical differences among different platforms, as suggested by Tan et al. and Marshall.
Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories as well as the merits of various data analysis procedures. Thus, we are progressively coordinating the MAQC project, a community-wide effort for microarray quality control.
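The comparison of gene selection methods described in the abstract above can be illustrated with a minimal sketch. It is not the authors' analysis: the toy data, the function names, and the use of a pooled-variance t statistic as a stand-in for p-value ranking are all assumptions made for illustration (with equal group sizes across genes, ranking by |t| gives the same order as ranking by p-value). SAM is not shown. The sketch ranks genes by fold change and by t statistic, then measures how much two selected gene lists overlap.

```python
import math
from statistics import mean, variance


def log2_fold_change(a, b):
    """Difference of mean log2 intensities (group b minus group a)."""
    return mean(b) - mean(a)


def pooled_t(a, b):
    """Student t statistic with pooled variance.

    Assumption: used here as a proxy for p-value ranking, since for fixed
    group sizes a larger |t| always means a smaller p-value.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / se if se > 0 else 0.0


def top_genes(data, score, n):
    """Return the set of n gene ids with the largest absolute score."""
    ranked = sorted(data, key=lambda g: abs(score(*data[g])), reverse=True)
    return set(ranked[:n])


def concordance(list_a, list_b):
    """Fraction of genes in list_a that also appear in list_b."""
    return len(list_a & list_b) / max(len(list_a), 1)


# Hypothetical log2 intensities: gene -> (control replicates, treated replicates)
toy = {
    "g1": ([1.0, 1.2, 0.8], [3.0, 3.2, 2.8]),   # large change, low noise
    "g2": ([2.0, 2.0, 2.1], [2.2, 2.2, 2.3]),   # tiny change, very low noise
    "g3": ([5.0, 4.0, 6.0], [5.5, 6.5, 4.5]),   # moderate change, high noise
}

by_fc = top_genes(toy, log2_fold_change, 2)
by_t = top_genes(toy, pooled_t, 2)
print(sorted(by_fc), sorted(by_t), concordance(by_fc, by_t))
```

In this toy case the two rankings disagree on the second gene: the t statistic favors a tiny but very consistent change, while fold change favors the larger but noisier one. Applying `concordance` to gene lists selected on two different platforms would give a simple cross-platform overlap measure of the kind the abstract discusses.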
ABSTRACT: Although differentiation of leukemic blasts to dendritic cells (DC) has promise in vaccine strategies, the mechanisms underlying this differentiation and the differences between leukemia and normal progenitor-derived DC are largely undescribed. In the case of chronic myeloid leukemia (CML), understanding the relationship between the induction of DC differentiation and the expression of the BCR-ABL oncogene has direct relevance to CML biology as well as the development of new therapeutic approaches. We now report that direct activation of protein kinase C (PKC) by the phorbol ester PMA in the BCR-ABL(+) CML cell line K562 and primary CML blasts induced nonterminal differentiation into cells with typical DC morphology (cytoplasmic dendrites), characteristic surface markers (MHC class I, MHC class II, CD86, CD40), chemokine and transcription factor expression, and ability to stimulate T cell proliferation (equivalent to normal monocyte-derived DC). PKC-induced differentiation was associated with down-regulation of BCR-ABL mRNA expression, protein levels, and kinase activity. This down-regulation appeared to be signaled through the mitogen-activated protein kinase pathway. Therefore, PKC-driven differentiation of CML blasts into DC-like cells suggests a potentially novel strategy to down-regulate BCR-ABL activity, yet raises the possibility that CML-derived DC vaccines will be less effective in presenting leukemia-specific Ags.
The Journal of Immunology 09/2003; 171(4):1780-91. · 5.52 Impact Factor
ABSTRACT: In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip arrays, part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth's Genetics Institute involving 95 HG-U95A human GeneChip arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance-mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix's (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
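The summarization idea in the abstract above, a robust average across arrays that removes probe-specific effects, can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: the abstract does not name a fitting procedure, so Tukey's median polish is assumed here as the robust fit of an additive probe-plus-array model; background adjustment and normalization are assumed to have been done already; and the function names and input values are hypothetical.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Fit x[i][j] ~ overall + row_eff[i] + col_eff[j] by Tukey's median polish.

    Rows are probes, columns are arrays; values are assumed to be log2 PM
    intensities that were already background-adjusted and normalized.
    """
    rows, cols = len(matrix), len(matrix[0])
    resid = [list(r) for r in matrix]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(rows):
            row_eff[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep column (array) medians out of the residuals.
        col_med = [median([resid[i][j] for i in range(rows)]) for j in range(cols)]
        for j in range(cols):
            col_eff[j] += col_med[j]
            for i in range(rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once a full sweep no longer changes anything noticeably.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


def summarize_probe_set(log2_pm):
    """Per-array expression estimate: overall level plus array effect."""
    overall, _, col_eff, _ = median_polish(log2_pm)
    return [overall + c for c in col_eff]


# Hypothetical probe set: 3 probes (rows) measured on 3 arrays (columns).
log2_pm = [
    [7.0, 8.0, 9.0],    # low-affinity probe
    [8.0, 9.0, 10.0],
    [9.0, 10.0, 11.0],  # high-affinity probe
]
expression = summarize_probe_set(log2_pm)
print(expression)
```

Because medians rather than means are swept out, a single outlying probe value has limited influence on the per-array estimates, which is the sense in which such an average is robust.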
ABSTRACT: In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix system with the objective of improving upon currently used measures of gene expression.