The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

Applied Bioinformatics of Cancer Research Group, Breakthrough Research Unit, Edinburgh Cancer Research Centre, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XR, UK.
BMC Medical Genomics (Impact Factor: 3.47). 09/2008; 1:42. DOI: 10.1186/1755-8794-1-42
Source: PubMed Central

ABSTRACT Background: The number of gene expression studies in the public domain is rapidly increasing,
representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at
the raw transcript level, even when the RNA is from comparable sources and has been processed
on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix
data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined
for meaningful meta-analyses.
Results: A series of validation datasets comparing breast cancer and normal breast cell lines
(MCF7 and MCF10A) were generated to examine the variability between datasets generated using
different amounts of starting RNA, alternative protocols, different generations of Affymetrix
GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are
introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple
batch mean-centering was found to significantly reduce the level of inter-experimental variation,
allowing raw transcript levels to be compared across datasets with confidence. By accounting for
dataset-specific bias, we were able to assemble the largest gene expression dataset of primary
breast tumours to date (n = 1107), from six previously published studies. Using this meta-dataset, we
demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in
differentially expressed genes and more accurate prognostic predictions. However, this is highly
dependent upon the composition of the datasets and patient characteristics.
Conclusion: Multiplicative, systematic biases are introduced at many stages of microarray
experiments. When these are reconciled, raw data can be directly integrated from different gene
expression datasets, leading to new biological findings with increased statistical power.
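
The batch mean-centering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes expression values are already log2-transformed, so that a multiplicative bias on the raw scale becomes an additive offset that subtracting each gene's per-batch mean removes. The function name `batch_mean_center`, the gene label and the toy values are hypothetical.

```python
from statistics import mean


def batch_mean_center(log_expr, batches):
    """Subtract each gene's within-batch mean from its log2 expression values.

    log_expr: dict mapping gene name -> list of log2 values, one per sample
    batches:  list of batch (dataset) labels, one per sample, in the same order
    """
    centered = {}
    for gene, values in log_expr.items():
        # Mean of this gene within each batch (dataset)
        batch_means = {
            b: mean(v for v, label in zip(values, batches) if label == b)
            for b in set(batches)
        }
        # Remove the batch-specific offset from every sample
        centered[gene] = [v - batch_means[label] for v, label in zip(values, batches)]
    return centered


# Toy data (hypothetical): batch B carries a 4-fold multiplicative bias,
# which is an offset of +2 on the log2 scale.
batches = ["A", "A", "B", "B"]
log_expr = {"GENE1": [5.0, 7.0, 7.0, 9.0]}

centered = batch_mean_center(log_expr, batches)
print(centered["GENE1"])  # [-1.0, 1.0, -1.0, 1.0]
```

After centering, the two batches share the same scale for this gene. As the abstract cautions, centering also removes genuine differences in average expression between datasets, so the result depends on how comparable the sample composition of each dataset is.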

  •
    ABSTRACT: Biological heterogeneity represents a major obstacle for cancer treatment. Therefore, characterization of treatment-relevant tumor heterogeneity is necessary to develop more effective therapies in the future. Here, we uncovered population heterogeneity among PAX/FOXO1-positive alveolar rhabdomyosarcoma by characterizing pro-survival networks initiated by FGFR4 signaling. We found that FGFR4 signaling rescues only subgroups of alveolar rhabdomyosarcoma cells from apoptosis induced by compounds targeting the IGF1R-PI3K-mTOR pathway. Differences in both pro-apoptotic machinery and FGFR4-activated signaling are involved in the different behaviour of the phenotypes. Pro-apoptotic stress induced by the kinase inhibitors is sensed by Bim/Bad in rescue cells and by Bmf in non-rescue cells. Anti-apoptotic ERK1/2 signaling downstream of FGFR4 is long-lasting in rescue and short-termed in most non-rescue cells. Gene expression analysis detected signatures specific for these two groups also in biopsy samples. The different cell phenotypes are present in different ratios in alveolar rhabdomyosarcoma tumors and can be identified by AP2β expression levels. Hence, inhibiting FGFR signaling might represent an important strategy to enhance efficacy of current RMS treatments. © 2014 Wiley Periodicals, Inc.
    International Journal of Cancer 02/2014; · 6.20 Impact Factor
  •
    ABSTRACT: Yes-associated protein (YAP1) is frequently reported to function as an oncogene in many types of cancer, but in breast cancer results remain controversial. We set out to clarify the role of YAP1 in breast cancer by examining gene and protein expression in subgroups of patient material and by downregulating YAP1 in vitro and studying its role in response to the widely used anti-estrogen tamoxifen. YAP1 protein intensity was scored as absent, weak, intermediate or strong in two primary breast cancer cohorts (n = 144 and n = 564) and mRNA expression of YAP1 was evaluated in a gene expression dataset (n = 1107). Recurrence-free survival was analysed using the log-rank test and Cox multivariate analysis was used to test for independence. WST-1 assay was employed to measure cell viability and a luciferase ERE (estrogen responsive element) construct was used to study the effect of tamoxifen, following downregulation of YAP1 using siRNAs. In the ER+ (Estrogen Receptor alpha positive) subgroup of the randomised cohort, YAP1 expression was inversely correlated to histological grade and proliferation (p = 0.001 and p = 0.016, respectively) whereas in the ER- (Estrogen Receptor alpha negative) subgroup YAP1 expression correlated positively to proliferation (p = 0.005). Notably, low YAP1 mRNA was independently associated with decreased recurrence-free survival in the gene expression dataset, specifically for the luminal A subgroup (p < 0.001) which includes low proliferating tumours of lower grade, usually associated with a good prognosis. This subgroup specificity led us to hypothesize that YAP1 may be important for response to endocrine therapies, such as tamoxifen, extensively used for luminal A breast cancers. In a tamoxifen randomised patient material, absent YAP1 protein expression was associated with impaired tamoxifen response which was significant upon interaction analysis (p = 0.042). 
    YAP1 downregulation resulted in increased progesterone receptor (PgR) expression and a delayed and weaker tamoxifen response, in support of the clinical data. Decreased YAP1 expression is an independent prognostic factor for recurrence in the less aggressive luminal A breast cancer subgroup, likely due to the decreased tamoxifen sensitivity conferred by YAP1 downregulation.
    BMC Cancer 02/2014; 14(1):119. · 3.33 Impact Factor
  •
    ABSTRACT: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
    PLoS ONE 01/2014; 9(6):e100335. · 3.73 Impact Factor
