The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding

Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), P.O. Box 8905, NO-7491 Trondheim, Norway.
BMC Bioinformatics (Impact Factor: 2.67). 07/2012; 13:176. DOI: 10.1186/1471-2105-13-176
Source: PubMed

ABSTRACT Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq) is the most frequently used method to identify the binding sites of transcription factors. Active binding sites can be seen as peaks in enrichment profiles when the sequencing reads are mapped to a reference genome. However, the profiles are normally noisy, making it challenging to identify all significantly enriched regions in a reliable way and with an acceptable false discovery rate.
We present the Triform algorithm, an improved approach to automatic peak finding in ChIP-Seq enrichment profiles for transcription factors. The method uses model-free statistics to identify peak-like distributions of sequencing reads, taking advantage of improved peak definition in combination with known characteristics of ChIP-Seq data.
Triform outperforms several existing methods in the identification of representative peak profiles in curated benchmark data sets. We also show that Triform in many cases is able to identify peaks that are more consistent with biological function, compared with other methods. Finally, we show that Triform can be used to generate novel information on transcription factor binding in repeat regions, which represents a particular challenge in many ChIP-Seq experiments. The Triform algorithm has been implemented in R, and is available via
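To make the idea of testing for a "peak-like distribution" of reads concrete, the following is a minimal sketch of one possible shape test. It is not taken from the paper and should not be read as the Triform statistic: the function names, the three-window layout, and the Poisson assumption on read counts are all assumptions made for illustration. The test asks whether a center window holds more reads than its two flanking windows, using a normal approximation.

```python
import math


def peak_shape_z(coverage, center, width):
    """Z-score for 'center window is higher than both flanks combined'.

    Hypothetical helper, not the published Triform statistic.
    coverage: list of per-position read counts
    center:   index of the candidate peak summit
    width:    size of each of the three adjacent windows
    """
    start = center - width // 2
    if start - width < 0 or start + 2 * width > len(coverage):
        raise ValueError("windows extend beyond the coverage profile")

    left = sum(coverage[start - width:start])
    mid = sum(coverage[start:start + width])
    right = sum(coverage[start + width:start + 2 * width])

    # The contrast 2*mid - left - right has expectation zero on a flat profile.
    contrast = 2 * mid - left - right
    # Assuming independent Poisson counts: Var = 4*mid + left + right.
    variance = 4 * mid + left + right
    if variance == 0:
        return 0.0
    return contrast / math.sqrt(variance)


def one_sided_p(z):
    """Upper-tail p-value of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# A flat profile gives no evidence of a peak; a raised center does.
flat = [5] * 30
peaked = [1] * 10 + [10] * 10 + [1] * 10
z_flat = peak_shape_z(flat, center=15, width=10)
z_peak = peak_shape_z(peaked, center=15, width=10)
print(z_flat, one_sided_p(z_flat))
print(z_peak, one_sided_p(z_peak))
```

A real peak finder would also need to compare against a control sample and correct for multiple testing across the genome; both are left out of this sketch.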


Available from: Morten Beck Rye, Jun 02, 2015
  • Source
    ABSTRACT: Motivation: ChIP-Seq is the standard method to identify genome-wide DNA-binding sites for transcription factors (TFs) and histone modifications. There is a growing need to analyze experiments with biological replicates, especially for epigenomic experiments where variation among biological samples can be substantial. However, tools that can perform group comparisons are currently lacking. Results: We present a peak-calling prioritization pipeline (PePr) for identifying consistent or differential binding sites in ChIP-Seq experiments with biological replicates. PePr models read counts across the genome among biological samples with a negative binomial distribution and uses a local variance estimation method, ranking consistent or differential binding sites more favorably than sites with greater variability. We compared PePr with commonly used and recently proposed approaches on eight TF datasets and show that PePr uniquely identifies consistent regions with enriched read counts, high motif occurrence rate and known characteristics of TF binding based on visual inspection. For histone modification data with broadly enriched regions, PePr identified differential regions that are consistent within groups and outperformed other methods in scaling False Discovery Rate (FDR) analysis.
    Bioinformatics 06/2014; 30(18). DOI:10.1093/bioinformatics/btu372 · 4.62 Impact Factor
  • Source
    ABSTRACT: The CRP-family transcription factor NtcA, universally found in cyanobacteria, was initially discovered as a regulator operating N control. It responds to the N regime signaled by the internal 2-oxoglutarate levels, an indicator of the C to N balance of the cells. Canonical NtcA-activated promoters bear an NtcA-consensus binding site (GTAN8TAC) centered at about 41.5 nucleotides upstream from the transcription start point. In strains of the Anabaena/Nostoc genera, NtcA is pivotal for the differentiation of heterocysts in response to N stress. In this study, we have used chromatin immunoprecipitation followed by high-throughput sequencing to identify the whole catalog of NtcA-binding sites in cells of the filamentous, heterocyst-forming cyanobacterium Anabaena sp. PCC 7120 three hours after the withdrawal of combined N. NtcA has been found to bind to 2,424 DNA regions in the genome of Anabaena, which have been ascribed to 2,153 genes. Interestingly, only a small proportion of those genes are involved in N assimilation and metabolism, and 65% of the binding regions were located intragenically. The distribution of NtcA-binding sites identified here reveals the largest bacterial regulon described to date. Our results show that NtcA has a much wider role in the physiology of the cell than previously thought, acting both as a global transcriptional regulator and possibly also as a factor influencing the superstructure of the chromosome (and plasmids).
    BMC Genomics 01/2014; 15(1):22. DOI:10.1186/1471-2164-15-22 · 4.04 Impact Factor
  • ABSTRACT: Next-generation sequencing (NGS) technologies have been used in diverse ways to investigate various aspects of chromatin biology by identifying genomic loci that are bound by transcription factors, occupied by nucleosomes or accessible to nuclease cleavage, or loci that physically interact with remote genomic loci. However, reaching sound biological conclusions from such NGS enrichment profiles requires many potential biases to be taken into account. In this Review, we discuss common ways in which biases may be introduced into NGS chromatin profiling data, approaches to diagnose these biases and analytical techniques to mitigate their effect.
    Nature Reviews Genetics 09/2014; 15(11). DOI:10.1038/nrg3788 · 39.79 Impact Factor