Correction of technical bias in clinical microarray data improves concordance with known biological information

Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Harvard Medical School, Boston, MA 02115, USA.
Genome Biology (Impact Factor: 10.81). 02/2008; 9(2):R26. DOI: 10.1186/gb-2008-9-2-r26
Source: PubMed


The performance of gene expression microarrays has been well characterized using controlled reference samples, but the performance on clinical samples remains less clear. We identified sources of technical bias affecting many genes in concert, thus causing spurious correlations in clinical data sets and false associations between genes and clinical variables. We developed a method to correct for technical bias in clinical microarray data, which increased concordance with known biological relationships in multiple data sets.

Available from: PubMed Central · License: CC BY
  • Source
    • "Samples were hybridized on a microarray slide containing almost 44,000 probes per array coding for ~14,000 gene transcripts, indicating that for a subset of genes more than one probe was present. If the position of the probe is nearer to the 3' end of the corresponding gene, signal intensity is expected to be higher [25] and chance of incorrect signal by variations in RNA integrity is smaller [26]. Therefore, the expression of the probes corresponding to the most 3' ends of genes was used for the analysis [27,28]."
    ABSTRACT: Background: Genes and signalling pathways involved in pluripotency have been studied extensively in mouse and human pre-implantation embryos and embryonic stem (ES) cells. The unsuccessful attempts to generate ES cell lines from other species including cattle suggest that other genes and pathways are involved in maintaining pluripotency in these species. To investigate which genes are involved in bovine pluripotency, expression profiles were generated from morula, blastocyst, trophectoderm and inner cell mass (ICM) samples using microarray analysis. As MAPK inhibition can increase the NANOG/GATA6 ratio in the inner cell mass, additionally blastocysts were cultured in the presence of a MAPK inhibitor and changes in gene expression in the inner cell mass were analysed. Results: Between morula and blastocyst 3,774 genes were differentially expressed and the largest differences were found in blastocyst up-regulated genes. Gene ontology (GO) analysis shows lipid metabolic process as the term most enriched with genes expressed at higher levels in blastocysts. Genes with higher expression levels in morulae were enriched in the RNA processing GO term. Of the 497 differentially expressed genes comparing ICM and TE, the expression of NANOG, SOX2 and POU5F1 was increased in the ICM, confirming their evolutionary preserved role in pluripotency. Several genes implicated to be involved in differentiation or fate determination were also expressed at higher levels in the ICM. Genes expressed at higher levels in the ICM were enriched in the RNA splicing and regulation of gene expression GO terms. Although NANOG expression was elevated upon MAPK inhibition, SOX2 and POU5F1 expression showed little increase. Expression of other genes in the MAPK pathway including DUSP4 and SPRY4, or influenced by MAPK inhibition such as IFNT, was down-regulated. Conclusion: The data obtained from the microarray studies provide further insight in gene expression during bovine embryonic development. They show an expression profile in pluripotent cells that indicates a pluripotent, epiblast-like state. The inability to culture ICM cells as stem cells in the presence of an inhibitor of MAPK activity together with the reported data indicates that MAPK inhibition alone is not sufficient to maintain a pluripotent character in bovine cells. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1448-x) contains supplementary material, which is available to authorized users.
    Full-text · Article · Apr 2015 · BMC Genomics
  • Source
    • "Data from 2 independent experiments were averaged and only probes with a log2-ratio above 1 or below -1 were considered. Only probes with log intensity >8 and <14 were taken into account, to avoid non-linear effects caused by the noise floor at low intensities or by saturation at high intensities [19]."
    ABSTRACT: The amygdala is a brain structure considered a key node for the regulation of neuroendocrine stress response. Stress-induced response in the amygdala is accomplished through neurotransmitter activation and an alteration of gene expression. MicroRNAs (miRNAs) are important regulators of gene expression in the nervous system and are very well suited effectors of stress response for their ability to reversibly silence specific mRNAs. In order to study how acute stress affects miRNA expression in the amygdala, we analyzed the miRNA profile after two hours of mouse restraint, by microarray analysis and reverse transcription real time PCR. We found that miR-135a and miR-124 were negatively regulated. Among in silico predicted targets we identified the mineralocorticoid receptor (MR) as a target of both miR-135a and miR-124. Luciferase experiments and endogenous protein expression analysis upon miRNA upregulation and inhibition allowed us to demonstrate that miR-135a and miR-124 are able to negatively affect the expression of the MR. The increased levels of the amygdala MR protein after two hours of restraint, which we analyzed by western blot, negatively correlate with miR-135a and miR-124 expression. These findings point to a role of miR-135a and miR-124 in acute stress as regulators of the MR, an important effector of early stress response.
    Full-text · Article · Sep 2013 · PLoS ONE
  • Source
    • "In some cases the normalization algorithm itself may be a source of bias [5]. Several technical factors can be inferred post hoc from raw microarray data; e.g. the level of negative control probes can indicate changes in the noise floor, or the width of the distribution of expression values can indicate dynamic range [3]. We do not necessarily expect the bias metrics to be independent of each other. "
    ABSTRACT: Gene expression profiles of clinical cohorts can be used to identify genes that are correlated with a clinical variable of interest such as patient outcome or response to a particular drug. However, expression measurements are susceptible to technical bias caused by variation in extraneous factors such as RNA quality and array hybridization conditions. If such technical bias is correlated with the clinical variable of interest, the likelihood of identifying false positive genes is increased. Here we describe a method to visualize an expression matrix as a projection of all genes onto a plane defined by a clinical variable and a technical nuisance variable. The resulting plot indicates the extent to which each gene is correlated with the clinical variable or the technical variable. We demonstrate this method by applying it to three clinical trial microarray data sets, one of which identified genes that may have been driven by a confounding technical variable. This approach can be used as a quality control step to identify data sets that are likely to yield false positive results.
    Preview · Article · Apr 2013 · PLoS ONE
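
The projection described in the last abstract above (each gene placed on a plane by its association with a clinical variable and with a technical nuisance variable) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Pearson correlation as the measure of association, and the function names, the toy expression matrix, and the technical metric (imagined here as something like a per-sample negative-control level) are all hypothetical.

```python
import numpy as np

def correlate_rows(expr, covariate):
    """Pearson correlation of each row (gene) of expr with a per-sample covariate."""
    expr = np.asarray(expr, dtype=float)
    cov = np.asarray(covariate, dtype=float)
    expr_c = expr - expr.mean(axis=1, keepdims=True)  # center each gene
    cov_c = cov - cov.mean()                          # center the covariate
    denom = np.sqrt((expr_c ** 2).sum(axis=1) * (cov_c ** 2).sum())
    # Genes (or covariates) with zero variance get a correlation of 0, not NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, expr_c @ cov_c / denom, 0.0)

def project_genes(expr, clinical, technical):
    """Return an (n_genes, 2) array: column 0 is the correlation with the
    clinical variable, column 1 the correlation with the technical variable."""
    return np.column_stack([
        correlate_rows(expr, clinical),
        correlate_rows(expr, technical),
    ])

# Hypothetical toy data: 3 genes measured on 6 samples.
clinical = np.array([0, 0, 0, 1, 1, 1])         # e.g. responder status (assumed)
technical = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])  # e.g. a bias metric (assumed)
expr = np.vstack([
    5.0 + 2.0 * clinical,   # gene driven by the clinical variable
    1.0 * technical,        # gene driven by the technical variable
    np.full(6, 4.0),        # constant gene
])

coords = project_genes(expr, clinical, technical)
```

Plotting `coords[:, 0]` against `coords[:, 1]` gives one point per gene. Under these assumptions, genes far along the second axis track the technical variable more than the clinical one and would be candidates for closer scrutiny.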