The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding

Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), P.O. Box 8905, NO-7491 Trondheim, Norway.
BMC Bioinformatics (Impact Factor: 2.67). 07/2012; 13:176. DOI: 10.1186/1471-2105-13-176
Source: PubMed

ABSTRACT Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq) is the most frequently used method to identify the binding sites of transcription factors. Active binding sites can be seen as peaks in enrichment profiles when the sequencing reads are mapped to a reference genome. However, the profiles are normally noisy, making it challenging to identify all significantly enriched regions in a reliable way and with an acceptable false discovery rate.
We present the Triform algorithm, an improved approach to automatic peak finding in ChIP-Seq enrichment profiles for transcription factors. The method uses model-free statistics to identify peak-like distributions of sequencing reads, taking advantage of improved peak definition in combination with known characteristics of ChIP-Seq data.
Triform outperforms several existing methods in the identification of representative peak profiles in curated benchmark data sets. We also show that Triform in many cases is able to identify peaks that are more consistent with biological function, compared with other methods. Finally, we show that Triform can be used to generate novel information on transcription factor binding in repeat regions, which represents a particular challenge in many ChIP-Seq experiments. The Triform algorithm has been implemented in R, and is available via http://tare.medisin.ntnu.no/triform.
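
To make the idea of a "peak-like distribution" of reads concrete, here is a minimal sketch in Python. It is not Triform's published statistic or its R implementation. The function names, the three-window layout, the normal approximation, and the `min_z` threshold are all assumptions chosen for illustration. The score is a second difference of window totals, so it is positive only when the centre window rises above both flanks, and a steady slope scores zero.

```python
import math


def peak_shape_z(coverage, start, width):
    """Score how peak-like the window [start, start + width) is.

    Hypothetical helper: compares the centre window total with the totals
    of the two adjacent flanking windows of the same width.
    """
    left = sum(coverage[start - width:start])
    center = sum(coverage[start:start + width])
    right = sum(coverage[start + width:start + 2 * width])
    # Second difference: positive when the centre exceeds both flanks,
    # zero for flat or linearly sloping coverage.
    diff = 2 * center - left - right
    # Assumed variance of diff if the window totals behaved like
    # independent counts (an illustrative simplification).
    var = 4 * center + left + right
    return diff / math.sqrt(var) if var > 0 else 0.0


def find_peak_like_windows(coverage, width, min_z=3.0):
    """Return (start, z) for every window whose shape score reaches min_z."""
    hits = []
    for start in range(width, len(coverage) - 2 * width + 1):
        z = peak_shape_z(coverage, start, width)
        if z >= min_z:
            hits.append((start, z))
    return hits


if __name__ == "__main__":
    # Toy profile: flat background with one narrow enriched region.
    toy = [1] * 20 + [10] * 5 + [1] * 20
    for start, z in find_peak_like_windows(toy, width=5):
        print(start, round(z, 2))
```

A shape-based score of this kind ignores regions that are merely high or steadily rising and responds only to a local bump. Real data would also need a control sample, strand information, and multiple-testing correction, none of which this sketch attempts.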

Available from: Morten Beck Rye, Aug 28, 2015
  • Source
    • "The result of the Q-PCR analysis (Additional file 1: Figure S1), confirmed a substantial enrichment in the NtcA-dependent promoter. Immunoprecipitated and input DNA samples were subjected to high-throughput sequencing and the results were analyzed using the Triform algorithm [16] and mapped onto the genome of Anabaena sp. PCC 7120 [17] (Additional file 2: Table S1). "
    ABSTRACT: The CRP-family transcription factor NtcA, universally found in cyanobacteria, was initially discovered as a regulator of N control. It responds to the N regime signaled by the internal 2-oxoglutarate levels, an indicator of the C to N balance of the cells. Canonical NtcA-activated promoters bear an NtcA-consensus binding site (GTAN8TAC) centered at about 41.5 nucleotides upstream from the transcription start point. In strains of the Anabaena/Nostoc genera, NtcA is pivotal for the differentiation of heterocysts in response to N stress. In this study, we have used chromatin immunoprecipitation followed by high-throughput sequencing to identify the whole catalog of NtcA-binding sites in cells of the filamentous, heterocyst-forming cyanobacterium Anabaena sp. PCC 7120 three hours after the withdrawal of combined N. NtcA has been found to bind to 2,424 DNA regions in the genome of Anabaena, which have been ascribed to 2,153 genes. Interestingly, only a small proportion of those genes are involved in N assimilation and metabolism, and 65% of the binding regions were located intragenically. The distribution of NtcA-binding sites identified here reveals the largest bacterial regulon described to date. Our results show that NtcA has a much wider role in the physiology of the cell than previously thought, acting both as a global transcriptional regulator and possibly also as a factor influencing the superstructure of the chromosome (and plasmids).
    BMC Genomics 01/2014; 15(1):22. DOI:10.1186/1471-2164-15-22 · 4.04 Impact Factor
  • Source
    ABSTRACT: Chromatin immunoprecipitation combined with massively parallel sequencing (ChIP-seq) is widely used to study protein-chromatin interactions or chromatin modifications at the genome-wide level. Sequence reads that accumulate locally in the genome (peaks) reveal loci of selectively modified chromatin or specific sites of chromatin-binding factors. Computational approaches (peak callers) have been developed to identify the global pattern of these sites, most of which assess the deviation from background by applying distribution statistics. We have implemented MeDiChISeq, a regression-based approach, which, by following a learning process, defines a representative binding pattern from the investigated ChIP-seq dataset. Using this model, MeDiChISeq identifies significant genome-wide patterns of chromatin-bound factors or chromatin modification. MeDiChISeq has been validated for various publicly available ChIP-seq datasets and extensively compared with other peak callers. MeDiChISeq has a high resolution when identifying binding events, a high degree of peak-assessment reproducibility in biological replicates, a low level of false calls and a high true discovery rate when evaluated in the context of gold-standard benchmark datasets. Importantly, this approach can be applied not only to 'sharp' binding patterns, like those retrieved for transcription factors (TFs), but also to the broad binding patterns seen for several histone modifications. Notably, we show that at high sequencing depths, MeDiChISeq outperforms other algorithms due to its powerful peak shape recognition capacity, which facilitates discerning significant binding events from spurious background enrichment patterns that are enhanced with increased sequencing depths.
    BMC Genomics 11/2013; 14(1):834. DOI:10.1186/1471-2164-14-834 · 4.04 Impact Factor
  • Source
    ABSTRACT: Motivation: ChIP-Seq is the standard method to identify genome-wide DNA-binding sites for transcription factors (TFs) and histone modifications. There is a growing need to analyze experiments with biological replicates, especially for epigenomic experiments where variation among biological samples can be substantial. However, tools that can perform group comparisons are currently lacking. Results: We present a peak-calling prioritization pipeline (PePr) for identifying consistent or differential binding sites in ChIP-Seq experiments with biological replicates. PePr models read counts across the genome among biological samples with a negative binomial distribution and uses a local variance estimation method, ranking consistent or differential binding sites more favorably than sites with greater variability. We compared PePr with commonly used and recently proposed approaches on eight TF datasets and show that PePr uniquely identifies consistent regions with enriched read counts, high motif occurrence rate and known characteristics of TF binding based on visual inspection. For histone modification data with broadly enriched regions, PePr identified differential regions that are consistent within groups and outperformed other methods in scaling False Discovery Rate (FDR) analysis.
    Bioinformatics 06/2014; 30(18). DOI:10.1093/bioinformatics/btu372 · 4.62 Impact Factor