Binding Profiles of Chromatin-Modifying Proteins Are Predictive for Transcriptional Activity and Promoter-Proximal Pausing

Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
Journal of computational biology: a journal of computational molecular cell biology (Impact Factor: 1.74). 02/2012; 19(2):126-38. DOI: 10.1089/cmb.2011.0258
Source: PubMed

ABSTRACT The establishment and maintenance of proper gene expression patterns is essential for stable cell differentiation. Using unsupervised learning techniques, chromatin states have been linked to discrete gene expression states, but these models cannot predict continuous gene expression levels, nor do they reveal detailed insight into the chromatin-based control of gene expression. Here, we employ regularized regression techniques to link, in a quantitative manner, binding profiles of chromatin proteins to gene expression levels and promoter-proximal pausing of RNA polymerase II in Drosophila melanogaster on a genome-wide scale. We apply stability selection to reliably detect interactions of chromatin features and predict several known, suggested, and novel proteins and protein pairs as transcriptional activators or repressors. Our integrative analysis reveals new insights into the complex interplay of transcriptional regulators in the context of gene expression. Supplementary Material is available at

4 Reads
  • Source
    • "Additional evidence for cooperation between ASH1 and FSH during gene activation is provided by our recent study, applying regression models in order to predict gene expression based on chromatin binding profiles. In this quantitative modeling framework, the two proteins form an interaction pair [31]. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Background The question of how cells re-establish gene expression states after cell division is still poorly understood. Genetic and molecular analyses have indicated that Trithorax group (TrxG) proteins are critical for the long-term maintenance of active gene expression states in many organisms. A generally accepted model suggests that TrxG proteins contribute to maintenance of transcription by protecting genes from inappropriate Polycomb group (PcG)-mediated silencing, instead of directly promoting transcription. Results and discussion Here we report a physical and functional interaction in Drosophila between two members of the TrxG, the histone methyltransferase ASH1 and the bromodomain and extraterminal family protein FSH. We investigated this interface at the genome level, uncovering a widespread co-localization of both proteins at promoters and PcG-bound intergenic elements. Our integrative analysis of chromatin maps and gene expression profiles revealed that the observed ASH1-FSH binding pattern at promoters is a hallmark of active genes. Inhibition of FSH-binding to chromatin resulted in global down-regulation of transcription. In addition, we found that genes displaying marks of robust PcG-mediated repression also have ASH1 and FSH bound to their promoters. Conclusions Our data strongly favor a global coactivator function of ASH1 and FSH during transcription, as opposed to the notion that TrxG proteins impede inappropriate PcG-mediated silencing, but are dispensable elsewhere. Instead, our results suggest that PcG repression needs to overcome the transcription-promoting function of ASH1 and FSH in order to silence genes.
    Genome biology 02/2013; 14(2):R18. DOI:10.1186/gb-2013-14-2-r18 · 10.81 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The random forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables and return measures of variable importance. This paper synthesizes 10 years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is paid to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research. © 2012 Wiley Periodicals, Inc.
    Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 11/2012; 2(6). DOI:10.1002/widm.1072 · 1.59 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an "expression quantitative trait locus" (eQTL). Whereas the impact of cellular context on expression levels in general is well established, a lot less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how "dynamic eQTL" were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total number of 3,941 eQTL we detected 2,729 static eQTL, 1,187 eQTL were conditionally active in one or several cell types, and 70 eQTL affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus, demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation. The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.
    PLoS Genetics 06/2013; 9(6):e1003514. DOI:10.1371/journal.pgen.1003514 · 7.53 Impact Factor
Show more