Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

University of Houston, Houston, Texas, United States
Journal of Molecular Diagnostics (Impact Factor: 4.85). 09/2005; 7(3):357-67. DOI: 10.1016/S1525-1578(10)60565-X
Source: PubMed


We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating value when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had a Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by Basic Local Alignment Search Tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45% and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed an average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed a misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene.
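The per-gene concordance measure in the abstract (the share of matched measurements with Pearson r at or above 0.7) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the gene labels, and the toy expression values are all hypothetical, and probes are assumed to have been matched to a shared gene identifier beforehand.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / (sd_x * sd_y)


def fraction_concordant(platform_a, platform_b, threshold=0.7):
    """Fraction of genes present on both platforms with r >= threshold.

    Each argument maps a gene identifier to its expression values,
    listed in the same sample order on both platforms.
    """
    shared = sorted(set(platform_a) & set(platform_b))
    if not shared:
        return 0.0
    hits = sum(
        1 for gene in shared
        if pearson(platform_a[gene], platform_b[gene]) >= threshold
    )
    return hits / len(shared)


# Hypothetical toy data: four samples, gene labels invented for illustration.
affy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
    "GENE_C": [5.0, 1.0, 2.0, 2.0],  # no counterpart on the other platform
}
cdna = {
    "GENE_A": [2.0, 4.1, 6.2, 7.9],  # tracks the Affymetrix profile closely
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with it
    "GENE_D": [1.0, 1.5, 0.5, 2.0],  # no counterpart on the other platform
}

print(fraction_concordant(affy, cdna))
```

Genes found on only one platform drop out of the comparison entirely, which mirrors the "missing genes" problem the abstract names as one cause of lost accuracy.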

Available from: Kevin Robert Coombes,
    • "The expression patterns of untransfected cells and of several clones from cells transfected with the empty vector or a dominant negative form of human Cdk5 were compared by microarray analysis. Microarrays and cDNA synthesis were performed following the GeneChip® WT cDNA Synthesis and Amplification Kit (Affymetrix) and following the Affymetrix GeneChip® Whole Transcript (WT) Sense Target standard protocol [21,22]. The array data were summarized and normalized with the RMA algorithm using the Affymetrix Expression Console software."
    ABSTRACT: Cyclin-dependent kinase-5 (Cdk5) is over-expressed in both neurons and microvessels in hypoxic regions of stroke tissue and has a significant pathological role following hyper-phosphorylation leading to calpain-induced cell death. Here, we have identified a critical role of Cdk5 in cytoskeleton/focal dynamics, wherein its activator, p35, redistributes along actin microfilaments of spreading cells, co-localising with p(Tyr15)Cdk5 and talin/integrin beta-1 at the lamellipodia in polarising cells. Cdk5 inhibition (roscovitine) resulted in actin-cytoskeleton disorganisation, prevention of protein co-localisation and inhibition of movement. Cells expressing the Cdk5 (D144N) kinase mutant were unable to spread, migrate, or form tube-like structures or sprouts, while Cdk5 wild-type over-expression showed enhanced motility and angiogenesis in vitro, which was maintained during hypoxia. Gene microarray studies demonstrated myocyte enhancer factor (MEF2C) as a substrate for Cdk5-mediated angiogenesis in vitro. MEF2C showed nuclear co-immunoprecipitation with Cdk5 and almost complete inhibition of differentiation and sprout formation following siRNA knock-down. In hypoxia, insertion of Cdk5/p25-inhibitory peptide (CIP) vector preserved and enhanced in vitro angiogenesis. These results demonstrate the existence of critical and complementary signalling pathways through Cdk5 and p35, whose coordination is required for successful angiogenesis under sustained hypoxic conditions.
    PLoS ONE 09/2013; 8(9):e75538. DOI:10.1371/journal.pone.0075538 · 3.23 Impact Factor
    • "This particularly affects predictive signatures derived from gene expression microarray data. For example, a drop in predictive accuracy across two different technology platforms measuring a common set of samples has been found [7]. The misclassification rate rose from 2% to 19.5% in this study."
    ABSTRACT: The increasing number of gene expression microarray studies represents an important resource in biomedical research. As a result, gene expression based diagnosis has entered clinical practice for patient stratification in breast cancer. However, the integration and combined analysis of microarray studies remains a challenge. We assessed the potential benefit of data integration on the classification accuracy and systematically evaluated the generalization performance of selected methods on four breast cancer studies comprising almost 1000 independent samples. To this end, we introduced an evaluation framework which aims to establish good statistical practice and a graphical way to monitor differences. The classification goal was to correctly predict estrogen receptor status (negative/positive) and histological grade (low/high) of each tumor sample in an independent study which was not used for the training. For the classification we chose support vector machines (SVM), predictive analysis of microarrays (PAM), random forest (RF) and k-top scoring pairs (kTSP). Guided by considerations relevant for classification across studies, we developed a generalization of kTSP, which we evaluated in addition. Our derived version (DV) aims to improve the robustness of the intrinsic invariance of kTSP with respect to technologies and preprocessing. For each individual study the generalization error was benchmarked via complete cross-validation and was found to be similar for all classification methods. The misclassification rates were substantially higher in classification across studies, when each single study was used as an independent test set while all remaining studies were combined for the training of the classifier. However, with an increasing number of independent microarray studies used in the training, the overall classification performance improved. DV performed better than the average and showed slightly less variance. In particular, the better predictive results of DV in across-platform classification indicate higher robustness of the classifier when trained on single channel data and applied to gene expression ratios. We present a systematic evaluation of strategies for the integration of independent microarray studies in a classification task. Our findings in across-studies classification may guide further research aimed at the construction of more robust and reliable methods for stratification and diagnosis in clinical practice.
    BMC Bioinformatics 12/2009; 10(1):453. DOI:10.1186/1471-2105-10-453 · 2.58 Impact Factor
    • "The re-examination of the NCI cancer cell lines [Carter et al., 2005] using sequence-driven probe matching, described earlier, exemplifies the importance of ensuring the appropriate comparisons of probes/genes are made in cross-platform analyses. Similarly, Stec et al. [2005] compared platforms using either UniGene identifiers or by sequence matching using BLAST alignments. They found higher correlations when the Affymetrix probe identifiers were sequence-matched to ensure they fell within the cDNA probes."
    ABSTRACT: DNA microarray technologies are used in a variety of biological disciplines. The diversity of platforms and analytical methods employed has raised concerns over the reliability, reproducibility and correlation of data produced across the different approaches. Initial investigations (years 2000-2003) found discrepancies in the gene expression measures produced by different microarray technologies. Increasing knowledge and control of the factors that result in poor correlation among the technologies has led to much higher levels of correlation among more recent publications (years 2004 to present). Here, we review the studies examining the correlation among microarray technologies. We find that with improvements in the technology (optimization and standardization of methods, including data analysis) and annotation, analysis across platforms yields highly correlated and reproducible results. We suggest several key factors that should be controlled in comparing across technologies, and are good microarray practice in general.
    Environmental and Molecular Mutagenesis 06/2007; 48(5):380-94. DOI:10.1002/em.20290 · 2.63 Impact Factor