Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

University of Houston, Houston, Texas, United States
Journal of Molecular Diagnostics (Impact Factor: 3.96). 09/2005; 7(3):357-67. DOI: 10.1016/S1525-1578(10)60565-X
Source: PubMed

ABSTRACT: We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had a Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by Basic Local Alignment Search Tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45% and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed an average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed a misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene.
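
The cross-platform concordance figure in the abstract (the share of matched genes with Pearson r ≥ 0.7) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `pearson` and `fraction_concordant`, the gene identifiers, and the synthetic expression values are all assumptions made for illustration, and probe matching (by UniGene or BLAST) is assumed to have been done already, so both platforms are keyed by the same gene identifier.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def fraction_concordant(platform_a, platform_b, threshold=0.7):
    """Share of genes present on both platforms whose r is >= threshold.

    Each argument maps a gene identifier to a list of expression values,
    one per sample, with samples in the same order on both platforms.
    """
    shared = sorted(set(platform_a) & set(platform_b))
    if not shared:
        raise ValueError("no genes in common between the two platforms")
    rs = {g: pearson(platform_a[g], platform_b[g]) for g in shared}
    hits = sum(1 for r in rs.values() if r >= threshold)  # NaN never counts
    return hits / len(shared), rs


# Synthetic demonstration only: 33 samples, 200 hypothetical genes.
rng = random.Random(0)
n_samples = 33
affy, cdna = {}, {}
for i in range(200):
    signal = [rng.gauss(0, 1) for _ in range(n_samples)]
    noise_sd = rng.choice([0.3, 1.0, 3.0])  # assumed platform-specific noise
    affy[f"gene{i}"] = [s + rng.gauss(0, 0.3) for s in signal]
    cdna[f"gene{i}"] = [s + rng.gauss(0, noise_sd) for s in signal]

frac, per_gene_r = fraction_concordant(affy, cdna)
print(f"{frac:.0%} of matched genes have r >= 0.7")
```

Genes missing from either platform are dropped by the set intersection, which mirrors one of the two causes of accuracy loss the abstract names (missing genes); the other cause, differing measurements for the same gene, shows up as a low per-gene r.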

  • Source
    ABSTRACT: Patients with clinically and pathologically similar breast tumors often have very different outcomes and treatment responses. Current prognostic markers allocate the majority of breast cancer patients to the high-risk group, yielding high sensitivities at the expense of specificities below 20%, leading to considerable overtreatment, especially in lymph node-negative patients. Seventy percent would be cured by surgery and radiotherapy alone in this group. Thus, precise and early indicators of metastasis are highly desirable to reduce overtreatment. Previous prognostic RNA-profiling studies have focused only on the protein-coding part of the genome; however, the human genome contains thousands of long non-coding RNAs (lncRNAs), and this unexplored field has large potential for the identification of novel prognostic markers. We evaluated lncRNA microarray data from 164 primary breast tumors from adjuvant-naïve patients with a mean follow-up of 18 years. Eighty-two patients who developed detectable distant metastasis were compared to 82 patients in whom no metastases were diagnosed. For validation, we determined the prognostic value of the lncRNA profiles by comparing the ability of the profiles to predict metastasis in two additional, previously published cohorts. We showed that lncRNA profiles could distinguish metastatic patients from non-metastatic patients with sensitivities above 90% and specificities of 64-65%. Furthermore, classifications were independent of traditional prognostic markers and time to metastasis. To our knowledge, this is the first study investigating the prognostic potential of lncRNA profiles. Our study suggests that lncRNA profiles provide additional prognostic information and may contribute to the identification of early breast cancer patients eligible for adjuvant therapy, as well as early breast cancer patients who could avoid unnecessary systemic adjuvant therapy. This study emphasizes the potential role of lncRNAs in breast cancer prognosis.
    Breast cancer research: BCR 04/2015; 17(1):55. DOI:10.1186/s13058-015-0557-4 · 5.88 Impact Factor
  • Source
    ABSTRACT: There are many potential sources of variability in a microarray experiment. Variation can arise from many aspects of the collection and processing of samples for gene expression analysis. Oligonucleotide-based arrays are thought to minimize one source of variability, as identical oligonucleotides are expected to recognize the same transcripts during hybridization. We demonstrate that although the probes on the U133A GeneChip arrays are identical in sequence to probes designed for the U133 Plus 2.0 arrays, the values obtained from an experimental hybridization can be quite different. Nearly half of the probesets in common between the two array types can produce slightly different values from the same sample. Nearly 70% of the individual probes in these probesets produced array-specific differences. The context of the probe may also contribute some bias to the final measured value of gene expression. At a minimum, this should add an extra level of caution when considering the direct comparison of experiments performed in two microarray formats. More importantly, this suggests that it may not be possible to know which value is the most accurate representation of a biological sample when comparing two formats.
    BMC Genomics 02/2006; 7:153. DOI:10.1186/1471-2164-7-153 · 4.04 Impact Factor
  • Source
    ABSTRACT: DNA microarray technologies are used in a variety of biological disciplines. The diversity of platforms and analytical methods employed has raised concerns over the reliability, reproducibility and correlation of data produced across the different approaches. Initial investigations (years 2000-2003) found discrepancies in the gene expression measures produced by different microarray technologies. Increasing knowledge and control of the factors that result in poor correlation among the technologies has led to much higher levels of correlation among more recent publications (years 2004 to present). Here, we review the studies examining the correlation among microarray technologies. We find that with improvements in the technology (optimization and standardization of methods, including data analysis) and annotation, analysis across platforms yields highly correlated and reproducible results. We suggest several key factors that should be controlled in comparing across technologies, and are good microarray practice in general.
    Environmental and Molecular Mutagenesis 06/2007; 48(5):380-94. DOI:10.1002/em.20290 · 2.55 Impact Factor