Inferring direct DNA binding from ChIP-seq

Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia and Department of Computer Science, Rhodes University, Grahamstown 6140, South Africa.
Nucleic Acids Research (Impact Factor: 8.81). 05/2012; 40(17):e128. DOI: 10.1093/nar/gks433
Source: PubMed

ABSTRACT Genome-wide binding data from transcription factor ChIP-seq experiments is the best source of information for inferring the relative DNA-binding affinity of these proteins in vivo. However, standard motif enrichment analysis and motif discovery approaches sometimes fail to correctly identify the binding motif for the ChIP-ed factor. To overcome this problem, we propose 'central motif enrichment analysis' (CMEA), which is based on the observation that the positional distribution of binding sites matching the direct-binding motif tends to be unimodal, well centered and maximal in the precise center of the ChIP-seq peak regions. We describe a novel visualization and statistical analysis tool, CentriMo, that identifies the region of maximum central enrichment in a set of ChIP-seq peak regions and displays the positional distributions of predicted sites. Using CentriMo for motif enrichment analysis, we provide evidence that one transcription factor (Nanog) has different binding affinity in vivo than in vitro and that another (E2f1) binds DNA cooperatively, and we confirm the in vivo affinity of NFIC, rescuing a difficult ChIP-seq data set. In another data set, CentriMo strongly suggests that there is no evidence of direct DNA binding by the ChIP-ed factor (Smad1). CentriMo is now part of the MEME Suite software package available at All data and output files presented here are available at:
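
The idea of central enrichment described in the abstract can be illustrated with a minimal Python sketch. It is not the CentriMo implementation: the function names, the log-odds scoring matrix format, the use of a single best site per sequence, and the binomial test against a uniform null are all assumptions made for illustration, and no multiple-testing correction is applied across the windows tried.

```python
import math

def best_site_offset(seq, pwm):
    """Offset of the best-scoring site's center from the sequence center.

    pwm is a list of dicts mapping 'A', 'C', 'G', 'T' to log-odds scores
    (hypothetical format). Returns None if seq is shorter than the motif.
    """
    width = len(pwm)
    best_score, best_start = -math.inf, None
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        if score > best_score:  # ties keep the leftmost site
            best_score, best_start = score, start
    if best_start is None:
        return None
    return (best_start + width / 2) - len(seq) / 2

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def most_enriched_central_window(offsets, seq_len, motif_width, max_half_width):
    """Return (half_width, sites_in_window, p_value) for the centered window
    with the smallest binomial p-value.

    Assumes all peak regions have the same length seq_len.
    """
    # Offsets a site could take if it were placed uniformly at random (null model).
    possible = [(s + motif_width / 2) - seq_len / 2
                for s in range(seq_len - motif_width + 1)]
    best = None
    for half_width in range(max_half_width + 1):
        p_null = sum(abs(o) <= half_width for o in possible) / len(possible)
        hits = sum(abs(o) <= half_width for o in offsets)
        p_value = binomial_tail(hits, len(offsets), p_null)
        if best is None or p_value < best[2]:
            best = (half_width, hits, p_value)
    return best

# Toy usage: ten identical 20 bp "peaks" with an ACGT site in the exact center.
toy_pwm = [{b: (2.0 if b == c else -2.0) for b in "ACGT"} for c in "ACGT"]
toy_peaks = ["G" * 8 + "ACGT" + "G" * 8] * 10
toy_offsets = [best_site_offset(s, toy_pwm) for s in toy_peaks]
toy_result = most_enriched_central_window(toy_offsets, 20, 4, 5)
```

A motif whose best sites pile up near offset zero yields a narrow window with a very small p-value, whereas a motif whose sites are spread evenly along the peak regions shows no such window.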


Available from: Philip Machanick, Mar 05, 2014
  • Source
    ABSTRACT: Molecular interactions between protein complexes and DNA mediate essential gene-regulatory functions. Uncovering such interactions by chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-Seq) has recently become the focus of intense interest. We here introduce quantitative enrichment of sequence tags (QuEST), a powerful statistical framework based on the kernel density estimation approach, which uses ChIP-Seq data to determine positions where protein complexes contact DNA. Using QuEST, we discovered several thousand binding sites for the human transcription factors SRF, GABP and NRSF at an average resolution of about 20 base pairs. MEME motif-discovery tool-based analyses of the QuEST-identified sequences revealed DNA binding by cofactors of SRF, providing evidence that cofactor binding specificity can be obtained from ChIP-Seq data. By combining QuEST analyses with Gene Ontology (GO) annotations and expression data, we illustrate how general functions of transcription factors can be inferred.
    Nature Methods 10/2008; 5(9):829-34. DOI:10.1038/nmeth.1246 · 25.95 Impact Factor
  • Source
    ABSTRACT: We developed Trawler, the fastest computational pipeline to date, to efficiently discover over-represented motifs in chromatin immunoprecipitation (ChIP) experiments and to predict their functional instances. When we applied Trawler to data from yeast and mammals, 83% of the known binding sites were accurately called, often with other additional binding sites, providing hints of combinatorial input. Newly discovered motifs and their features (identity, conservation, position in sequence) are displayed on a web interface.
    Nature Methods 08/2007; 4(7):563-5. DOI:10.1038/nmeth1061 · 25.95 Impact Factor
  • Source
    ABSTRACT: The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency table based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.
    Nucleic Acids Research 02/2004; 32(4):1372-81. DOI:10.1093/nar/gkh299 · 8.81 Impact Factor
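
The last abstract above describes testing whether a library motif is over- or under-represented in a set of sequences. The following Python sketch shows one simple way such a test could be set up. It is not Clover's actual statistic: the set score (mean of each sequence's best site score), the mononucleotide shuffling null, and all function names are assumptions chosen for illustration.

```python
import random

def max_site_score(seq, pwm):
    """Best log-odds score of any site in seq (assumes len(seq) >= len(pwm))."""
    width = len(pwm)
    return max(sum(pwm[i][seq[s + i]] for i in range(width))
               for s in range(len(seq) - width + 1))

def set_score(seqs, pwm):
    """Mean best-site score over a set of sequences (hypothetical set statistic)."""
    return sum(max_site_score(s, pwm) for s in seqs) / len(seqs)

def representation_p_values(seqs, pwm, n_shuffles=200, seed=0):
    """Empirical p-values for over- and under-representation of a motif.

    The null distribution comes from shuffling the letters within each
    sequence, which preserves base composition but destroys site structure.
    """
    rng = random.Random(seed)
    observed = set_score(seqs, pwm)
    at_least = at_most = 0
    for _ in range(n_shuffles):
        shuffled = ["".join(rng.sample(s, len(s))) for s in seqs]
        null_score = set_score(shuffled, pwm)
        at_least += null_score >= observed
        at_most += null_score <= observed
    # Add-one correction keeps the empirical p-values strictly positive.
    p_over = (at_least + 1) / (n_shuffles + 1)
    p_under = (at_most + 1) / (n_shuffles + 1)
    return observed, p_over, p_under
```

Running this for every motif in a library and ranking by `p_over` (or `p_under`) gives a rough screen; a real analysis would also need a correction for the number of motifs tested.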