Inferring direct DNA binding from ChIP-seq

Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia and Department of Computer Science, Rhodes University, Grahamstown 6140, South Africa.
Nucleic Acids Research (Impact Factor: 9.11). 05/2012; 40(17):e128. DOI: 10.1093/nar/gks433

ABSTRACT

Genome-wide binding data from transcription factor ChIP-seq experiments is the best source of information for inferring the
relative DNA-binding affinity of these proteins in vivo. However, standard motif enrichment analysis and motif discovery approaches sometimes fail to correctly identify the binding
motif for the ChIP-ed factor. To overcome this problem, we propose ‘central motif enrichment analysis’ (CMEA), which is based
on the observation that the positional distribution of binding sites matching the direct-binding motif tends to be unimodal,
well centered and maximal in the precise center of the ChIP-seq peak regions. We describe a novel visualization and statistical
analysis tool—CentriMo—that identifies the region of maximum central enrichment in a set of ChIP-seq peak regions and displays
the positional distributions of predicted sites. Using CentriMo for motif enrichment analysis, we provide evidence that one
transcription factor (Nanog) has different binding affinity in vivo than in vitro, that another binds DNA cooperatively (E2f1), and confirm the in vivo affinity of NFIC, rescuing a difficult ChIP-seq data set. In another data set, CentriMo strongly suggests that there is no
evidence of direct DNA binding by the ChIP-ed factor (Smad1). CentriMo is now part of the MEME Suite software package available
at http://meme.nbcr.net. All data and output files presented here are available at: http://research.imb.uq.edu.au/t.bailey/sd/Bailey2011a.
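
The central enrichment idea described in the abstract can be illustrated with a small sketch. This is not CentriMo's implementation: it assumes each peak region contributes one best-scoring site, that all regions have the same number of possible site positions, and that under the null hypothesis the best site is equally likely at any position. The function names (`binomial_tail`, `most_enriched_central_window`) and the example data are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def most_enriched_central_window(starts, n_positions):
    """Scan centered windows and return (width, count, p_value) for the
    window whose count of best sites is least likely under a uniform null.

    starts      -- start index (0-based) of the best site in each region
    n_positions -- number of possible start positions per region
                   (assumed: region length - motif width + 1)
    """
    center = (n_positions - 1) / 2
    n_seqs = len(starts)
    best = None
    # Use widths with the same parity as n_positions so that each window
    # is exactly symmetric about the center; skip the full-width window.
    first_width = 1 if n_positions % 2 == 1 else 2
    for width in range(first_width, n_positions, 2):
        count = sum(1 for s in starts if abs(s - center) < width / 2)
        p_null = width / n_positions  # chance a uniform site lands inside
        p_value = binomial_tail(count, n_seqs, p_null)
        if best is None or p_value < best[2]:
            best = (width, count, p_value)
    # Note: the minimum over windows is NOT corrected for multiple testing.
    return best


# Hypothetical data: 20 best sites near the center, 4 scattered elsewhere.
example_starts = [50] * 10 + [49, 51] * 5 + [0, 10, 90, 100]
width, count, p_value = most_enriched_central_window(example_starts, 101)
print(width, count, p_value)
```

A real analysis would also need to handle regions of differing length, sequences without a qualifying site, and correction for the number of windows and motifs tested.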

Available from: Philip Machanick, Mar 05, 2014
  • Source
    • ", 2013). As genome-wide TF-binding data from ChIP experiments can be a source of information for inferring the relative in vivo DNA-binding affinity of TFs (Bailey and Machanick, 2012; Seo et al., 2014), we calculated the signal-to-noise (S/N) ratios of the OxyR-, SoxR-, and SoxS-binding peaks and used them as a proxy of the in vivo binding intensity of each binding site (Table S1). "
    ABSTRACT: Three transcription factors (TFs), OxyR, SoxR, and SoxS, play a critical role in transcriptional regulation of the defense system for oxidative stress in bacteria. However, their full genome-wide regulatory potential is unknown. Here, we perform a genome-scale reconstruction of the OxyR, SoxR, and SoxS regulons in Escherichia coli K-12 MG1655. Integrative data analysis reveals that a total of 68 genes in 51 transcription units (TUs) belong to these regulons. Among them, 48 genes showed more than 2-fold changes in expression level under single-TF-knockout conditions. This reconstruction expands the genome-wide roles of these factors to include direct activation of genes related to amino acid biosynthesis (methionine and aromatic amino acids), cell wall synthesis (lipid A biosynthesis and peptidoglycan growth), and divalent metal ion transport (Mn(2+), Zn(2+), and Mg(2+)). Investigating the co-regulation of these genes with other stress-response TFs reveals that they are independently regulated by stress-specific TFs. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
    Full-text · Article · Aug 2015 · Cell Reports
  • Source
    • "Motifs of the PASs from all these five categories were analyzed using the following method. Motifs in regions flanking (±300 nt) PASs of the lincRNAs were detected based on the DREME motif discovery algorithm (Bailey, 2011), followed by a stringent statistical assessment (E-value < 1e-10) of the positional preference of the motifs, using CentriMo (Bailey and Machanick, 2012). "
    ABSTRACT: Long noncoding RNAs (lncRNAs) play essential regulatory roles in the human cancer genome. Many identified lncRNAs are transcribed by RNA polymerase II and polyadenylated, and long intervening noncoding RNAs (lincRNAs) have been widely used in lncRNA research. To date, the mechanism of lincRNA polyadenylation in cancer remains poorly understood. In this paper, we first report a comprehensive map of global lincRNA polyadenylation sites (PASs) in five human cancer genomes; second, we propose a grouping method based on the pattern of gene expression and the manner of alternative polyadenylation (APA); third, we investigate the distribution of motifs surrounding PASs. Our analysis reveals that about 70% of PASs are located in the sense strand of lincRNAs, and that more than 90% of PASs in the antisense strand of lincRNAs are located in intron regions. In addition, around 40% of lincRNA genes with PASs have APA sites. Four prominent motifs, i.e., AATAAA, TTTTTTTT, CCAGSCTGG, and RGYRYRGTGG, were detected in the sequences surrounding PASs in normal and cancer tissues. Furthermore, a novel algorithm based on a support vector machine (SVM) is proposed to recognize the lincRNA PASs of tumor tissues. The algorithm achieves accuracies of up to 96.55% and 89.48% for distinguishing tumor lincRNA PASs from non-polyadenylation sites and from non-lincRNA PASs, respectively.
    Full-text · Article · Jul 2014 · Computational Biology and Chemistry
  • Source
    • "MEME-ChIP provides three different output formats: HTML, XML, and text. The output can be viewed in MEME output format, DREME output format, as well as in CentriMo [47] and TOMTOM [36] report formats [10]. "
    ABSTRACT: ChIP-Seq (chromatin immunoprecipitation sequencing) has an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users run multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong).
    Full-text · Article · Feb 2014 · Biology Direct

Questions & Answers about this publication

  • Philip Machanick added an answer in Transcription Factors:
    If a transcription factor functions as both an activator and a repressor, are the DNA-binding sites of both cases similar?

    Thank you. Actually, I am searching for putative genes regulated by a transcription factor. I took some candidate genes from proteomics data and predicted the DNA-binding sites using the Gibbs website. I got three kinds of results: a) genes that are positively regulated by the transcription factor (TF); b) genes that are negatively regulated by the TF; c) genes that are both positively and negatively regulated by the TF. However, the three sets of results do not match. Hence my question above.

    Philip Machanick

    If you can identify different genomic regions corresponding to different cases, you can compare pairs of them using some new features of the CentriMo algorithm, not yet published (these features were not in the version we published in 2012). You can use these features at the web form:

    http://meme.nbcr.net/meme/cgi-bin/centrimo.cgi

    Select “Any Localization” and “Binomial Test + Fisher Exact Test (comparative)”.

    For inputs you need two FASTA-format sets of DNA binding regions to compare (ideally each region is 500 bp long and contains the region of interest) and a compendium of MEME-format motifs (several are provided through the web form).

    Result: you will see which motifs are relatively more enriched in the one set of binding regions vs. the other. You can reverse the roles by running again, with the order in which you enter the DNA sequences reversed.
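
As a rough illustration of the comparative idea (not CentriMo's actual code), a one-sided Fisher exact test can ask whether a motif's best sites fall in a chosen region more often in one sequence set than in the other. The function name `fisher_exact_greater` and the counts below are hypothetical, and how CentriMo chooses the region and combines this with the binomial test is not specified here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table

                      site in region   site elsewhere
        primary set         a                b
        control set         c                d

    Returns the probability, with all margins fixed, of seeing at least
    `a` primary sequences with a site in the region.
    """
    row1 = a + b          # size of the primary set
    col1 = a + c          # total sequences with a site in the region
    total = a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, x) * comb(total - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return favorable / comb(total, col1)


# Hypothetical counts: 8 of 10 primary regions vs. 2 of 10 control regions
# have their best site inside the region being tested.
p_primary_vs_control = fisher_exact_greater(8, 2, 2, 8)
# Swapping the two sets mirrors "running again with the order reversed".
p_control_vs_primary = fisher_exact_greater(2, 8, 8, 2)
print(p_primary_vs_control, p_control_vs_primary)
```

A small p-value in one direction and a large one in the other suggests the motif is relatively more enriched in the first set, which matches the interpretation of the comparative output described above.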
