Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips.
ABSTRACT: We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating value when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChips and Millennium Pharmaceuticals cDNA arrays. When UniGene was used to match probes, only 30% of all corresponding gene expression measurements on the two platforms had a Pearson correlation coefficient r ≥ 0.7. Correlation varied substantially between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by Basic Local Alignment Search Tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix data and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracies of 45% and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed an average misclassification error rate of 2% on the original data, which rose to 19.5% when they were tested on data from the other platform. Random five-gene classifiers showed a misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform, because of missing genes and probe sequence differences that produce differing measurements for the same gene.
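The probe-level correlation analysis described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' pipeline: it assumes each platform's data is a dictionary mapping a probe ID to log-scale expression values for the same samples in the same order, and that probes have already been matched (by UniGene cluster or BLAST identity) into a list of pairs. The function names, probe IDs, and toy values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles over the same samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def fraction_well_correlated(
    affy: Dict[str, List[float]],
    cdna: Dict[str, List[float]],
    probe_pairs: List[Tuple[str, str]],
    threshold: float = 0.7,
) -> float:
    """Fraction of matched (Affymetrix, cDNA) probe pairs with r >= threshold.

    Pairs with an undefined correlation are excluded from the denominator.
    """
    rs = [pearson_r(affy[a], cdna[c]) for a, c in probe_pairs]
    rs = [r for r in rs if not math.isnan(r)]
    if not rs:
        return math.nan
    return sum(r >= threshold for r in rs) / len(rs)


# Hypothetical toy data: 4 samples, 2 matched probe pairs.
affy = {"A1": [1.0, 2.0, 3.0, 4.0], "A2": [4.0, 3.0, 2.0, 1.0]}
cdna = {"C1": [1.1, 2.1, 2.9, 4.2], "C2": [1.0, 2.0, 3.0, 4.0]}
print(fraction_well_correlated(affy, cdna, [("A1", "C1"), ("A2", "C2")]))  # 0.5
```

Because one cDNA probe can be paired with several Affymetrix probe sets, calling `pearson_r` on each pairing separately exposes the probe-set-to-probe-set variation the abstract reports, rather than hiding it in a single gene-level value.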
ABSTRACT: Multiple commercial microarrays for measuring genome-wide gene expression levels are currently available, including oligonucleotide and cDNA, single- and two-channel formats. This study reports the results of gene expression measurements generated from identical RNA preparations using three commercially available microarray platforms. RNA was collected from PANC-1 cells grown in serum-rich medium and at 24 h after the removal of serum. Three biological replicates were prepared for each condition, and three experimental replicates were produced for the first biological replicate. RNA was labeled and hybridized to microarrays from three major suppliers according to the manufacturers' protocols, and gene expression measurements were obtained using each platform's standard software. For each platform, gene targets from a subset of 2,009 common genes were compared. Correlations in gene expression levels and comparisons of significant gene expression changes in this subset were calculated and showed considerable divergence across the platforms, suggesting the need for industrial manufacturing standards and for further independent, thorough validation of the technology.
Nucleic Acids Research 11/2003; 31(19):5676-84. · 8.03 Impact Factor
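One way to compare "significant gene expression changes" across platforms can be illustrated with a short sketch. The abstract does not give the study's statistical method, so this is only an assumed stand-in: it averages replicate intensities per gene, computes a log2 ratio of the serum-removed (treated) to the serum-rich (control) condition on each platform, calls a gene "changed" at a hypothetical two-fold cutoff, and measures overlap of the changed-gene sets with the Jaccard index. All names and values are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Set


def log2_ratios(
    treated: Dict[str, List[float]], control: Dict[str, List[float]]
) -> Dict[str, float]:
    """Per-gene log2(mean treated intensity / mean control intensity).

    Intensities must be positive; genes missing from either condition are skipped.
    """
    return {
        g: math.log2(mean(treated[g]) / mean(control[g]))
        for g in treated
        if g in control
    }


def changed_genes(ratios: Dict[str, float], min_abs_log2: float = 1.0) -> Set[str]:
    """Genes whose |log2 ratio| reaches the cutoff (1.0 = two-fold, a hypothetical choice)."""
    return {g for g, r in ratios.items() if abs(r) >= min_abs_log2}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two gene sets: |A & B| / |A | B| (1.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


# Hypothetical replicate intensities for three common genes on two platforms.
ctrl = {g: [100.0, 100.0] for g in ("g1", "g2", "g3")}
platform_x = log2_ratios({"g1": [400.0, 400.0], "g2": [100.0, 100.0], "g3": [110.0, 90.0]}, ctrl)
platform_y = log2_ratios({"g1": [200.0, 200.0], "g2": [50.0, 50.0], "g3": [100.0, 100.0]}, ctrl)
print(jaccard(changed_genes(platform_x), changed_genes(platform_y)))  # 0.5
```

In this toy case both platforms agree that g1 changes, but only platform Y calls g2, so the overlap is one gene out of two: the kind of divergence the abstract describes, on a much smaller scale.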
ABSTRACT: DNA microarrays, used to measure the expression of thousands of genes simultaneously, hold promise for future application in efficient screening of therapeutic drugs. This will be aided by the development and population of a database with gene expression profiles corresponding to biological responses to exposures to known compounds whose toxicological and pathological endpoints are well characterized. Such databases could then be interrogated using profiles of biological responses to drugs or to developmental or environmental exposures. A positive correlation with an archived profile could provide some knowledge of the potential effects of the tested compound or exposure. We have previously shown that cDNA microarrays can be used to generate chemical-specific gene expression profiles that can be distinguished across and within compound classes, using clustering, simple correlation, or principal component analyses. In this report, we test the hypothesis that knowledge can be gained regarding the nature of blinded samples, using an initial training set composed of gene expression profiles derived from the livers of rats exposed to clofibrate, Wyeth 14,643, gemfibrozil, or phenobarbital for 24 h or 2 weeks. Highly discriminant genes were derived from our database training set using approaches including linear discriminant analysis (LDA) and genetic algorithm/K-nearest neighbors (GA/KNN). Using these genes to analyze coded liver RNA samples derived from 24-h, 3-day, or 2-week exposures to phenytoin, diethylhexylphthalate, or hexobarbital led to successful prediction of whether these samples came from the livers of rats exposed to enzyme inducers or to peroxisome proliferators.
This validates our initial hypothesis and lends credibility to the concept that the further development of a gene expression database for chemical effects will greatly enhance the hazard identification processes.
Toxicological Sciences 07/2002; 67(2):232-40. · 4.65 Impact Factor
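The KNN half of the GA/KNN approach mentioned above can be illustrated with a short sketch. It covers only the classification step: the genetic-algorithm gene selection and the LDA analysis are omitted, and the gene features, class labels, and values are hypothetical. Each training profile is assumed to be a vector of expression values over the already-selected discriminant genes, and a coded sample is labeled by majority vote among its k nearest training profiles.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(
    train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 3
) -> str:
    """Predict a class for one coded sample by majority vote of its k nearest
    training profiles (Euclidean distance over the selected genes)."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training profiles")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common() keeps first-encountered order, so among tied
    # labels the one with the nearest member wins.
    return votes.most_common(1)[0][0]


# Hypothetical two-gene profiles from a training set of known compounds.
training = [
    ([2.0, 0.1], "peroxisome proliferator"),
    ([1.8, 0.0], "peroxisome proliferator"),
    ([0.1, 1.9], "enzyme inducer"),
    ([0.0, 2.2], "enzyme inducer"),
]
print(knn_predict(training, [1.9, 0.2]))  # peroxisome proliferator
```

Here k = 3 is an arbitrary illustrative choice; with only two classes, an odd k avoids most voting ties.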
Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer.
ABSTRACT: The goal of this study was to examine the feasibility of developing a multigene predictor of pathologic complete response (pCR) to the sequential weekly paclitaxel and fluorouracil + doxorubicin + cyclophosphamide (T/FAC) neoadjuvant chemotherapy regimen for breast cancer. All patients underwent one-time pretreatment fine-needle aspiration to obtain RNA from the cancer for transcriptional profiling using cDNA arrays containing 30,721 human sequence clones. After profiling, clinical results were available for 42 patients; 24 were used for predictive marker discovery and 18 formed an independent validation set. Thirty-one percent of patients had pCR (six discovery and seven validation), defined as disappearance of all invasive cancer in the breast after completion of chemotherapy. We could identify no single marker that was sufficiently associated with pCR to be used as an individual predictor. A multigene model with 74 markers (P ≤ .09) was built using data from the discovery samples and tested on the validation samples. Overall, a 78% (14 of 18) predictive accuracy was observed, with a 100% (three of three) positive predictive value for pCR, a 73% (11 of 15) negative predictive value, a sensitivity of 43% (three of seven), and a specificity of 100% (11 of 11). The expected response rate to T/FAC neoadjuvant therapy in unselected patients is 28%. Our results suggest that transcriptional profiling has the potential to identify a gene expression pattern in breast cancer that may lead to clinically useful predictors of pCR to T/FAC neoadjuvant therapy.
Journal of Clinical Oncology 07/2004; 22(12):2284-93. · 18.37 Impact Factor
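The validation figures reported above are internally consistent, which can be checked by reconstructing the 2×2 table: a sensitivity of 3 of 7 and a specificity of 11 of 11 imply 3 true positives, 4 false negatives, 11 true negatives, and 0 false positives, with pCR as the positive class. The sketch recomputes each metric from those four counts; the function name is hypothetical, and denominators are not guarded against zero because none is zero here.

```python
from typing import Dict, Tuple


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Tuple[int, int]]:
    """Return each 2x2 metric as a (numerator, denominator) pair, with pCR as the positive class."""
    return {
        "accuracy": (tp + tn, tp + fp + tn + fn),
        "positive predictive value": (tp, tp + fp),
        "negative predictive value": (tn, tn + fn),
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
    }


# Validation-set counts implied by the abstract: 3 of 7 pCR cases and
# 11 of 11 non-pCR cases were predicted correctly.
metrics = diagnostic_metrics(tp=3, fp=0, tn=11, fn=4)
for name, (num, den) in metrics.items():
    print(f"{name}: {num}/{den} = {num / den:.0%}")
```

Running it prints accuracy 14/18 = 78%, positive predictive value 3/3 = 100%, negative predictive value 11/15 = 73%, sensitivity 3/7 = 43%, and specificity 11/11 = 100%, matching the abstract.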