The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

Applied Bioinformatics of Cancer Research Group, Breakthrough Research Unit, Edinburgh Cancer Research Centre, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XR, UK.
BMC Medical Genomics (Impact Factor: 3.47). 09/2008; 1:42. DOI: 10.1186/1755-8794-1-42
Source: PubMed Central

ABSTRACT Background: The number of gene expression studies in the public domain is rapidly increasing,
representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at
the raw transcript level, even when the RNA is from comparable sources and has been processed
on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix
data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined
for meaningful meta-analyses.
Results: A series of validation datasets comparing breast cancer and normal breast cell lines
(MCF7 and MCF10A) were generated to examine the variability between datasets generated using
different amounts of starting RNA, alternative protocols, different generations of Affymetrix
GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are
introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple
batch mean-centering was found to significantly reduce the level of inter-experimental variation,
allowing raw transcript levels to be compared across datasets with confidence. By accounting for
dataset-specific bias, we were able to assemble the largest gene expression dataset of primary
breast tumours to date (1107), from six previously published studies. Using this meta-dataset, we
demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in
differentially expressed genes and more accurate prognostic predictions. However, this is highly
dependent upon the composition of the datasets and patient characteristics.
Conclusion: Multiplicative, systematic biases are introduced at many stages of microarray
experiments. When these are reconciled, raw data can be directly integrated from different gene
expression datasets leading to new biological findings with increased statistical power.
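
As a minimal sketch of the batch mean-centering idea described in the Results, the Python below subtracts each gene's mean within each dataset (batch) from log2 expression values. A multiplicative bias becomes an additive offset on the log scale, so per-batch centering cancels it. The function name `batch_mean_center`, the toy values and the 4x bias factor are illustrative assumptions, not the authors' code or data, and the sketch assumes the batches have comparable sample composition (a caveat the abstract itself raises).

```python
import math
from collections import defaultdict


def batch_mean_center(log_expr, batches):
    """Mean-centre each gene within each batch (hypothetical helper).

    log_expr: list of per-gene rows, each a list of log2 expression
              values with one value per sample.
    batches:  list of batch (dataset) labels, one per sample.
    Returns new rows with each gene's per-batch mean subtracted.
    """
    # Group sample (column) indices by batch label.
    columns = defaultdict(list)
    for idx, label in enumerate(batches):
        columns[label].append(idx)

    centered = []
    for row in log_expr:
        new_row = list(row)
        for idxs in columns.values():
            mean = sum(row[i] for i in idxs) / len(idxs)
            for i in idxs:
                new_row[i] = row[i] - mean
        centered.append(new_row)
    return centered


# Toy data (made up): two genes, the same three samples measured in two batches.
batches = ["A", "A", "A", "B", "B", "B"]
raw = [
    [100.0, 200.0, 400.0, 100.0, 200.0, 400.0],
    [50.0, 50.0, 200.0, 50.0, 50.0, 200.0],
]

# Assumed 4x multiplicative bias applied to every value in batch B.
biased = [
    [value * (4.0 if label == "B" else 1.0) for value, label in zip(row, batches)]
    for row in raw
]

# On the log2 scale the 4x factor is a constant offset of +2 in batch B.
log_biased = [[math.log2(value) for value in row] for row in biased]

centered = batch_mean_center(log_biased, batches)
```

After centering, matching samples from batch A and batch B carry the same value for each gene, so the two batches can be compared directly. If the batches differed in composition (for example, different proportions of tumour subtypes), centering would also remove part of the real biological difference.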

  • ABSTRACT: The Salvador/Warts/Hippo (Hippo) signaling pathway defines a novel signaling cascade regulating cell contact inhibition, organ size control, cell growth, proliferation, apoptosis and cancer development in mammals. The upstream regulation of this pathway has been less well defined than the core kinase cassette. KIBRA has been shown to function as an upstream member of the Hippo pathway by influencing the phosphorylation of LATS and YAP, but functional consequences of these biochemical changes have not been previously addressed. We show that in MCF10A cells, loss of KIBRA expression displays epithelial-to-mesenchymal transition (EMT) features, which are concomitant with decreased LATS and YAP phosphorylation, but not MST1/2. In addition, ectopic KIBRA expression antagonizes YAP via the serine 127 phosphorylation site and we show that KIBRA, Willin and Merlin differentially regulate genes controlled by YAP. Finally, reduced KIBRA expression in primary breast cancer specimens correlates with the recently described claudin-low subtype, an aggressive sub-group with EMT features and a poor prognosis. Oncogene advance online publication, 21 May 2012; doi:10.1038/onc.2012.196.
    Oncogene 05/2012; · 8.56 Impact Factor
  • ABSTRACT: Biological heterogeneity represents a major obstacle for cancer treatment. Therefore, characterization of treatment-relevant tumor heterogeneity is necessary to develop more effective therapies in the future. Here, we uncovered population heterogeneity among PAX/FOXO1-positive alveolar rhabdomyosarcoma by characterizing pro-survival networks initiated by FGFR4 signaling. We found that FGFR4 signaling rescues only subgroups of alveolar rhabdomyosarcoma cells from apoptosis induced by compounds targeting the IGF1R-PI3K-mTOR pathway. Differences in both pro-apoptotic machinery and FGFR4-activated signaling are involved in the different behaviour of the phenotypes. Pro-apoptotic stress induced by the kinase inhibitors is sensed by Bim/Bad in rescue cells and by Bmf in non-rescue cells. Anti-apoptotic ERK1/2 signaling downstream of FGFR4 is long-lasting in rescue and short-termed in most non-rescue cells. Gene expression analysis detected signatures specific for these two groups also in biopsy samples. The different cell phenotypes are present in different ratios in alveolar rhabdomyosarcoma tumors and can be identified by AP2β expression levels. Hence, inhibiting FGFR signaling might represent an important strategy to enhance efficacy of current RMS treatments. © 2014 Wiley Periodicals, Inc.
    International Journal of Cancer 02/2014; · 6.20 Impact Factor
  • ABSTRACT: Genomic technology continues to advance, and data derived from non-small-cell lung cancer (NSCLC) tumor specimens in conjunction with clinical information are accumulating at an exponential rate. Application of this information to clinical practice for the treatment of patients with NSCLC lags behind the promise of individualized patient management based on genomic medicine. Testing treatment decisions based on genomic information in cancer clinical trials is only now being addressed. How best to incorporate the myriad of potentially available molecular diagnostics into treatment algorithms is not yet clear. Many hurdles and much work remain for the development of true, individualized treatment strategies for NSCLC based on molecular staging. Here we review some of the successes, frustrations and obstacles that exist to further progress in the field.
    Expert Review of Respiratory Medicine 08/2010; 4(4):499-508.
