Inferring direct DNA binding from ChIP-seq
relative DNA-binding affinity of these proteins in vivo. However, standard motif enrichment analysis and motif discovery approaches sometimes fail to correctly identify the binding
motif for the ChIP-ed factor. To overcome this problem, we propose ‘central motif enrichment analysis’ (CMEA), which is based
on the observation that the positional distribution of binding sites matching the direct-binding motif tends to be unimodal,
well centered and maximal in the precise center of the ChIP-seq peak regions. We describe a novel visualization and statistical
analysis tool—CentriMo—that identifies the region of maximum central enrichment in a set of ChIP-seq peak regions and displays
the positional distributions of predicted sites. Using CentriMo for motif enrichment analysis, we provide evidence that one
transcription factor (Nanog) has a different binding affinity in vivo than in vitro and that another (E2f1) binds DNA cooperatively, and we confirm the in vivo affinity of NFIC, rescuing a difficult ChIP-seq data set. In another data set, CentriMo strongly suggests that there is no
evidence of direct DNA binding by the ChIP-ed factor (Smad1). CentriMo is now part of the MEME Suite software package available
at http://meme.nbcr.net. All data and output files presented here are available at: http://research.imb.uq.edu.au/t.bailey/sd/Bailey2011a.
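The idea of central enrichment described above can be illustrated with a small sketch. This is not CentriMo's implementation: it assumes one best site per peak region, a uniform null distribution over possible site positions, and a plain binomial upper-tail test with no multiple-testing correction. The function names `binomial_upper_tail` and `central_enrichment` and the example offsets are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p):
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def central_enrichment(offsets, region_len, motif_len, half_width):
    """Test whether best sites cluster near the region centers.

    offsets    -- signed distance (bp) of each region's best site from
                  the center of that region, one value per region
    region_len -- length of every peak region (assumed equal)
    motif_len  -- motif width, used to count possible site positions
    half_width -- the central window covers offsets in [-half_width, half_width]
    """
    # Assumption: under the null, a site is equally likely at any position.
    n_positions = region_len - motif_len + 1
    n_central = min(2 * half_width + 1, n_positions)
    p_null = n_central / n_positions

    hits = sum(1 for d in offsets if abs(d) <= half_width)
    return hits, binomial_upper_tail(hits, len(offsets), p_null)


if __name__ == "__main__":
    # Made-up offsets for ten 100 bp regions and a 10 bp motif.
    example_offsets = [0, 1, -2, 3, -1, 0, 2, 40, -35, 4]
    hits, p_value = central_enrichment(example_offsets, 100, 10, 5)
    print(f"{hits} of {len(example_offsets)} sites are central, p = {p_value:.3g}")
```

Repeating the test over a range of `half_width` values and keeping the most significant window would approximate the "region of maximum central enrichment", although the p-values would then need adjusting for the number of windows tried.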
- Source: Available from Deborah L Gumucio
ABSTRACT: Background: The Hedgehog (Hh) signaling pathway, acting through three homologous transcription factors (GLI1, GLI2, GLI3) in vertebrates, plays multiple roles in embryonic organ development and adult tissue homeostasis. At the level of the genome, GLI factors bind to specific motifs in enhancers, some of which are hundreds of kilobases removed from the gene promoter. These enhancers integrate the Hh signal in a context-specific manner to control the spatiotemporal pattern of target gene expression. Importantly, a number of genes that encode Hh pathway molecules are themselves targets of Hh signaling, allowing pathway regulation by an intricate balance of feed-back activation and inhibition. However, surprisingly few of the critical enhancer elements that control these pathway target genes have been identified despite the fact that such elements are central determinants of Hh signaling activity. Recently, ChIP studies have been carried out in multiple tissue contexts using mouse models carrying FLAG-tagged GLI proteins (GLIFLAG). Using these datasets, we tested whether a meta-analysis of GLI binding sites, coupled with a machine learning approach, could reveal genomic features that could be used to empirically identify Hh-regulated enhancers linked to loci of the Hh signaling pathway. Results: A meta-analysis of four existing GLIFLAG datasets revealed a library of GLI binding motifs that was substantially more restricted than the potential sites predicted by previous in vitro binding studies. A machine learning method (kmer-SVM) was then applied to these datasets and enriched k-mers were identified that, when applied to the mouse genome, predicted as many as 37,000 potential Hh enhancers. For functional analysis, we selected nine regions which were annotated to putative Hh pathway molecules and found that seven exhibited GLI-dependent activity, indicating that they are directly regulated by Hh signaling (78% success rate).
Conclusions: The results suggest that Hh enhancer regions share common sequence features. The kmer-SVM machine learning approach identifies those features and can successfully predict functional Hh regulatory regions in genomic DNA surrounding Hh pathway molecules and likely, other Hh targets. Additionally, the library of enriched GLI binding motifs that we have identified may allow improved identification of functional GLI binding sites.
- "Use of both the LDwGBM and NPwGBM datasets for prediction incorporated data from the GLI1 FLAG (predominately activator) and GLI3 FLAG (predominantly repressor) transcription factors in two diverse contexts (neuronal precursor and limb development). The length of 600 bp was selected based on motif enrichment analysis of the LD and NP datasets using MEME-ChIP and Centrimo. This analysis showed that, within the ChIP-chip LD dataset, enrichment for the location of GLI motifs (green line) has a broad profile that spans 200 bp to either side of the midpoint (Additional file 3: Figure S1A)."
ABSTRACT: Algae have enormous potential as bio-factories for the efficient production of a wide array of high-value products, and eventually as a source of renewable biofuels. However, tools for engineering the nuclear genomes of algae remain scarce and limited in functionality. In this study, synthetic algal promoters (saps) were generated as a tool for increasing nuclear gene expression and as a model for understanding promoter elements and structure in green algae. Promoters were generated to mimic native cis-motif elements, structure, and overall nucleotide composition of top expressing genes from Chlamydomonas reinhardtii. Twenty-five saps were used to drive expression of a fluorescent reporter in transgenic algae. A majority of the promoters were functional in vivo and seven were identified to drive expression of the fluorescent reporter better than the current best endogenous promoter in C. reinhardtii, the chimeric hsp70/rbs2 promoter. Further analysis of the best synthetic promoter, sap11, revealed a new DNA motif essential for promoter function that is widespread and highly conserved in C. reinhardtii. These data demonstrate the utility of synthetic promoters to drive gene expression in green algae, and lay the groundwork for the development of a suite of saps capable of driving the robust and complex gene expression that will be required for algae to reach their potential as an industrial platform for photosynthetic bio-manufacturing.
- "Sequence from −1000 bp upstream to +500 bp downstream of the validated 5′ UTR start sites was analyzed for new motifs using DREME. Then the promoter sequences were analyzed by CentriMo to identify POWRS or DREME motifs that are enriched in specific regions relative to the TSS."
ABSTRACT: Three transcription factors (TFs), OxyR, SoxR, and SoxS, play a critical role in transcriptional regulation of the defense system for oxidative stress in bacteria. However, their full genome-wide regulatory potential is unknown. Here, we perform a genome-scale reconstruction of the OxyR, SoxR, and SoxS regulons in Escherichia coli K-12 MG1655. Integrative data analysis reveals that a total of 68 genes in 51 transcription units (TUs) belong to these regulons. Among them, 48 genes showed more than 2-fold changes in expression level under single-TF-knockout conditions. This reconstruction expands the genome-wide roles of these factors to include direct activation of genes related to amino acid biosynthesis (methionine and aromatic amino acids), cell wall synthesis (lipid A biosynthesis and peptidoglycan growth), and divalent metal ion transport (Mn(2+), Zn(2+), and Mg(2+)). Investigating the co-regulation of these genes with other stress-response TFs reveals that they are independently regulated by stress-specific TFs. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
- ", 2013). As genome-wide TF-binding data from ChIP experiments can be a source of information for inferring the relative in vivo DNA-binding affinity of TFs (Bailey and Machanick, 2012; Seo et al., 2014), we calculated the signal-to-noise (S/N) ratios of the OxyR-, SoxR-, and SoxS-binding peaks and used them as a proxy of the in vivo binding intensity of each binding site (Table S1)."
Questions & Answers about this publication
- If a transcription factor functions as both an activator and a repressor, are the DNA-binding sites of both cases similar?
Thank you. Actually, I am searching for putative genes regulated by a transcription factor. I took some candidate genes from proteomics data and predicted the DNA-binding sites using the Gibbs website. I got three kinds of results: a) genes that are positively regulated by the transcription factor (TF); b) genes that are negatively regulated by the TF; c) genes that are both positively and negatively regulated by the TF. However, the three sets of results do not match, which is why I am asking the question above.
If you can identify different genomic regions corresponding to different cases, you can compare pairs of them using some new features of the CentriMo algorithm, not yet published (these features were not in the version we published in 2012). You can use these features at the web form:
Select “Any Localization” and “Binomial Test + Fisher Exact Test (comparative)”.
For inputs you need two FASTA-format sets of DNA binding regions to compare (ideally each region is 500 bp and contains the region of interest) and a compendium of MEME-format motifs (several are provided through the web form).
Result: you will see which motifs are relatively more enriched in one set of binding regions than in the other. You can reverse the roles by running again with the two sets of DNA sequences entered in the opposite order.
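To give a feel for the comparative step, here is a minimal sketch of a one-sided Fisher exact test on a 2x2 table of counts (sequences with and without a motif site, in each of the two sets). It only illustrates the statistic; it is not the CentriMo code, and the function name `fisher_exact_greater` and the counts in the example are made up.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test on the table [[a, b], [c, d]].

    a -- sequences in set 1 with a motif site
    b -- sequences in set 1 without a site
    c -- sequences in set 2 with a motif site
    d -- sequences in set 2 without a site

    Returns the probability of seeing at least `a` site-containing
    sequences in set 1 when the row and column totals are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)
    # Sum hypergeometric probabilities for tables at least as extreme.
    # math.comb returns 0 when k > n, so impossible tables add nothing.
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 regions in set 1 have a site, 10 of 100 in set 2.
    p_value = fisher_exact_greater(30, 70, 10, 90)
    print(f"p = {p_value:.3g}")
```

Swapping the two rows tests the opposite direction, which mirrors rerunning the web form with the two sequence sets in the other order.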