The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

Applied Bioinformatics of Cancer Research Group, Breakthrough Research Unit, Edinburgh Cancer Research Centre, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XR, UK.
BMC Medical Genomics (Impact Factor: 3.91). 09/2008; 1:42. DOI: 10.1186/1755-8794-1-42
Source: PubMed

ABSTRACT Background
The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses.

Results
A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (1107 tumours), from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics.
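
The batch mean-centering step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes expression values are already on a log2 scale (so that a multiplicative bias becomes an additive offset), and the function name `batch_mean_center`, the array layout and the toy data are all hypothetical.

```python
import numpy as np


def batch_mean_center(log_expr, batches):
    """Subtract each gene's within-batch mean from log-scale expression.

    log_expr: array of shape (n_genes, n_samples), assumed log2 values.
    batches:  sequence of length n_samples giving each sample's batch label.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    batches = np.asarray(batches)
    centered = np.empty_like(log_expr)
    for label in np.unique(batches):
        cols = batches == label  # boolean mask selecting this batch's samples
        batch_mean = log_expr[:, cols].mean(axis=1, keepdims=True)
        centered[:, cols] = log_expr[:, cols] - batch_mean
    return centered


# Toy data (hypothetical): the same 3 samples measured in two batches,
# each batch adding its own per-gene offset on the log2 scale.
rng = np.random.default_rng(0)
true_log = rng.normal(8.0, 1.0, size=(5, 3))
offset_a = rng.normal(0.0, 0.5, size=(5, 1))
offset_b = rng.normal(0.0, 0.5, size=(5, 1))
combined = np.hstack([true_log + offset_a, true_log + offset_b])
labels = ["A", "A", "A", "B", "B", "B"]

centered = batch_mean_center(combined, labels)
```

In this toy case the two batches contain identical samples, so after centering the columns from batch A and batch B coincide; with real datasets the batches differ in composition, so agreement would only be approximate.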

Conclusion
Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data from different gene expression datasets can be directly integrated, leading to new biological findings with increased statistical power.
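
One way to see why mean-centering can reconcile a multiplicative bias is the short derivation below. It rests on a simplifying assumption that is not stated in the abstract: dataset b scales the true level t_{gs} of gene g in sample s by a gene-specific factor β_{gb}, with n_b samples in the dataset. All symbols are introduced here for illustration only.

```latex
\begin{align*}
x_{gsb} &= \beta_{gb}\, t_{gs} \\
\log_2 x_{gsb} &= \log_2 \beta_{gb} + \log_2 t_{gs} \\
\log_2 x_{gsb} - \frac{1}{n_b}\sum_{s' \in b} \log_2 x_{gs'b}
  &= \log_2 t_{gs} - \frac{1}{n_b}\sum_{s' \in b} \log_2 t_{gs'}
\end{align*}
```

The bias term cancels, but the result still depends on the mean true level within each dataset, which is consistent with the caveat that the outcome depends on dataset composition and patient characteristics.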

  • Source
    ABSTRACT: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
    PLoS ONE 06/2014; 9(6):e100335. DOI:10.1371/journal.pone.0100335 · 3.53 Impact Factor
  • Source
    ABSTRACT: Primary systemic treatment for ovarian cancer is surgery, followed by platinum-based chemotherapy. Platinum-resistant cancers progress/recur in approximately 25% of cases within six months. We aimed to identify clinically useful biomarkers of platinum resistance. A database of ovarian cancer transcriptomic datasets including treatment and response information was set up by mining the GEO and TCGA repositories. Receiver operating characteristic (ROC) analysis was performed in R for each gene and these were then ranked using their achieved area under the curve (AUC) values. The most significant candidates were selected and functionally evaluated in vitro in four epithelial ovarian cancer cell lines (SKOV-3, CAOV-3, ES-2 and OVCAR-3), using gene silencing combined with drug treatment in viability and apoptosis assays. We collected 94 tumor samples and the strongest candidate was validated by IHC and qRT-PCR in these. Altogether, 1,452 eligible patients were identified. Based on the ROC analysis, the eight most significant genes were JRK, CNOT8, RTF1, CCT3, NFAT2CIP, MEK1, FUBP1 and CSDE1. Silencing of MEK1, CSDE1, CNOT8 and RTF1, and pharmacological inhibition of MEK1 caused significant sensitization in the cell lines. Of the eight genes, JRK (p = 3.2E-05), MEK1 (p = 0.0078), FUBP1 (p = 0.014) and CNOT8 (p = 0.00022) also correlated to progression-free survival. The correlation between the best biomarker candidate MEK1 and survival was validated in two independent cohorts by qRT-PCR (n = 34, HR = 5.8, p = 0.003) and IHC (n = 59, HR = 4.3, p = 0.033). We identified MEK1 as a promising prognostic biomarker candidate correlated to response to platinum-based chemotherapy in ovarian cancer.
    BMC Cancer 11/2014; 14(1):837. DOI:10.1186/1471-2407-14-837 · 3.32 Impact Factor
  • Source
    ABSTRACT: Yes-associated protein (YAP1) is frequently reported to function as an oncogene in many types of cancer, but in breast cancer results remain controversial. We set out to clarify the role of YAP1 in breast cancer by examining gene and protein expression in subgroups of patient material and by downregulating YAP1 in vitro and studying its role in response to the widely used anti-estrogen tamoxifen. YAP1 protein intensity was scored as absent, weak, intermediate or strong in two primary breast cancer cohorts (n = 144 and n = 564) and mRNA expression of YAP1 was evaluated in a gene expression dataset (n = 1107). Recurrence-free survival was analysed using the log-rank test and Cox multivariate analysis was used to test for independence. WST-1 assay was employed to measure cell viability and a luciferase ERE (estrogen responsive element) construct was used to study the effect of tamoxifen, following downregulation of YAP1 using siRNAs. In the ER+ (Estrogen Receptor alpha positive) subgroup of the randomised cohort, YAP1 expression was inversely correlated to histological grade and proliferation (p = 0.001 and p = 0.016, respectively) whereas in the ER- (Estrogen Receptor alpha negative) subgroup YAP1 expression correlated positively to proliferation (p = 0.005). Notably, low YAP1 mRNA was independently associated with decreased recurrence-free survival in the gene expression dataset, specifically for the luminal A subgroup (p < 0.001) which includes low proliferating tumours of lower grade, usually associated with a good prognosis. This subgroup specificity led us to hypothesize that YAP1 may be important for response to endocrine therapies, such as tamoxifen, extensively used for luminal A breast cancers. In tamoxifen-randomised patient material, absent YAP1 protein expression was associated with impaired tamoxifen response which was significant upon interaction analysis (p = 0.042). YAP1 downregulation resulted in increased progesterone receptor (PgR) expression and a delayed and weaker tamoxifen response, in support of the clinical data. Decreased YAP1 expression is an independent prognostic factor for recurrence in the less aggressive luminal A breast cancer subgroup, likely due to the decreased tamoxifen sensitivity conferred by YAP1 downregulation.
    BMC Cancer 02/2014; 14(1):119. DOI:10.1186/1471-2407-14-119 · 3.32 Impact Factor
