Nucleosome positioning signals in genomic DNA

Bioinformatics Program, Boston University, Boston, MA 02215, USA.
Genome Research (Impact Factor: 14.63). 09/2007; 17(8):1170-7. DOI: 10.1101/gr.6101007
Source: PubMed


Although histones can form nucleosomes on virtually any genomic sequence, DNA sequences show considerable variability in their binding affinity. We have used DNA sequences of Saccharomyces cerevisiae whose nucleosome binding affinities have been experimentally determined (Yuan et al. 2005) to train a support vector machine to identify the nucleosome formation potential of any given sequence of DNA. The DNA sequences whose nucleosome formation potential is most accurately predicted are those that contain strong nucleosome-forming or nucleosome-inhibiting signals and are found within nucleosome-length stretches of genomic DNA with continuous nucleosome formation or inhibition signals. We have accurately predicted the experimentally determined nucleosome positions across a well-characterized promoter region of S. cerevisiae and identified strong periodicity within 199 center-aligned mononucleosomes studied recently (Segal et al. 2006), even though no periodicity information was used to train the support vector machine. Our analysis suggests that only a subset of nucleosomes is likely to be positioned by intrinsic sequence signals. This observation is consistent with the available experimental data and is inconsistent with the proposal of a nucleosome positioning code. Finally, we show that intrinsic nucleosome positioning signals are both more inhibitory and more variable in promoter regions than in open reading frames in S. cerevisiae.
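As a rough illustration of how a DNA sequence can be turned into numeric input for a support vector machine, the sketch below computes normalized k-mer frequencies. The function name `kmer_features` and the choice of k = 1 to 3 are assumptions made for this example; the abstract does not state the feature set, kernel, or parameters actually used.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k_values=(1, 2, 3)):
    """Return normalized k-mer frequencies for every k in k_values.

    Hypothetical helper for illustration only; windows containing
    characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    features = {}
    for k in k_values:
        # Slide a window of width k over the sequence.
        windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        counts = Counter(w for w in windows if set(w) <= alphabet)
        total = sum(counts.values())
        # Emit every possible k-mer so all sequences share one feature layout.
        for kmer in map("".join, product("ACGT", repeat=k)):
            features[kmer] = counts[kmer] / total if total else 0.0
    return features


feats = kmer_features("ACGTACGTTTAA")
print(len(feats), feats["A"], feats["AC"])
```

Because every possible k-mer gets an entry, feature vectors from different sequences line up column by column and could be passed to any SVM implementation together with labels derived from measured binding affinities.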


Available from: Robert E Thurman
    • "Support vector machine (SVM) is a supervised classification algorithm based on statistical learning theory. In recent years, the SVM algorithm has been widely used in the area of bioinformatics (Chen and Lin, 2010; Chen and Lin, 2012; Chen et al., 2012a,b; Ding et al., 2012; Lin et al., 2010; Liu et al., 2012; Macalpine et al., 2010; Peckham et al., 2007; Yuan et al., 2013). In present work, the free software LIBSVM (version 3.12) was used to distinguish between Ori and non-Ori regions in eukaryotic organisms (Chang and Lin, 2011)."
    ABSTRACT: Identification of replication origins is crucial for the faithful duplication of genomic DNA. The frequencies of single nucleotides and dinucleotides, GC/AT bias and GC/AT profile in the vicinity of Arabidopsis thaliana replication origins were analyzed in the present work. The guanine content or cytosine content is higher in origins of replication (Ori) than in non-Ori regions. The SS (S = G or C) dinucleotides are favoured in Ori whereas WW (W = A or T) dinucleotides are favoured in non-Ori. The GC/AT bias and GC/AT profile in Ori are significantly different from those in non-Ori. Furthermore, by inputting DNA sequence features into a support vector machine, we distinguished between the Ori and non-Ori regions in A. thaliana. The total prediction accuracy is about 69.5% as evaluated by 10-fold cross-validation. This result suggests that, apart from DNA sequence, deciphering the selection of replication origins must integrate many other factors, including nucleosome positioning, DNA methylation and histone modification. In addition, by comparing predictive performance we found that the predictive accuracy of the SVM using sequence features in the context of the WS language is significantly better than that using the RY language. The same conclusion was also obtained in S. cerevisiae and D. melanogaster.
    Biosystems 10/2014; 124. DOI:10.1016/j.biosystems.2014.07.001 · 1.55 Impact Factor
    • "The predictions are often based on various features of the primary DNA sequences [9,17,19–22]. For example, Peckham et al. (2007) have incorporated variable-length k-mers into a Support Vector Machine (SVM) to predict nucleosomal DNA in yeast [19]. Lee et al. (2007) have used a lasso regression model on a set of DNA sequence and structural features to enable nucleosomal DNA prediction [17]."
    ABSTRACT: The identification of important factors that affect nucleosome formation is critical to clarify nucleosome-forming mechanisms and the role of the nucleosome in gene regulation. Various features reported in the literature led to our hypothesis that multiple features can together contribute to nucleosome formation. Therefore, we compiled 779 features and developed a pattern discovery and scoring algorithm, FFN (Finding Features for Nucleosomes), to identify feature patterns that are differentially enriched in nucleosome-forming sequences and nucleosome-depletion sequences. Applying FFN to genome-wide nucleosome occupancy data in yeast and human, we identified statistically significant feature patterns that may influence nucleosome formation, many of which are common to the two species. We found that both sequence and structural features are important in nucleosome occupancy prediction. We discovered that, even for the same feature combinations, variations in feature values may lead to differences in predictive power. We demonstrated that the identified feature patterns could be used to assist nucleosomal sequence prediction.
    Genomics 08/2014; 104(2). DOI:10.1016/j.ygeno.2014.07.002 · 2.28 Impact Factor
    • "As established for the parental strain (Figure 1a), the ~60 bp fragments of the T. kodakarensis genome protected from MN digestion by only HTkA or HTkB assembly in vivo and in vitro also contained 10 bp helical-periodicity repeats of AA/AT/TA/TT and GG/GC/CG/CC dinucleotides, offset by 5 bp, and pentamers containing only A and/or T were under-represented, and those containing only G and/or C were over-represented, relative to their occurrences in the T. kodakarensis genome (Additional file 1: Figure S1). Together these results confirm that the positions at which HTkA and HTkB assemble to form archaeal nucleosomes are predominantly determined by the T. kodakarensis genome sequence and, as concluded from eukaryotic nucleosome studies [7,33,38,43-46], from an archaeal genome sequence [39], it should be possible to predict where archaeal nucleosomes will preferentially assemble in vivo."
    ABSTRACT: Background: Histone wrapping of DNA into nucleosomes almost certainly evolved in the Archaea, and predates Eukaryotes. In Eukaryotes, nucleosome positioning plays a central role in regulating gene expression and is directed by primary sequence motifs that together form a nucleosome positioning code. The experiments reported were undertaken to determine if archaeal histone assembly conforms to the nucleosome positioning code. Results: Eukaryotic nucleosome positioning is favored and directed by phased helical repeats of AA/TT/AT/TA and CC/GG/CG/GC dinucleotides, and disfavored by longer AT-rich oligonucleotides. Deep sequencing of genomic DNA protected from micrococcal nuclease digestion by assembly into archaeal nucleosomes has established that archaeal nucleosome assembly is also directed and positioned by these sequence motifs, both in vivo in Methanothermobacter thermautotrophicus and Thermococcus kodakarensis and in vitro in reaction mixtures containing only one purified archaeal histone and genomic DNA. Archaeal nucleosomes assembled at the same locations in vivo and in vitro, with much reduced assembly immediately upstream of open reading frames and throughout the ribosomal rDNA operons. Providing further support for a common positioning code, archaeal histones assembled into nucleosomes on eukaryotic DNA and eukaryotic histones into nucleosomes on archaeal DNA at the same locations. T. kodakarensis has two histones, designated HTkA and HTkB, and strains with either but not both histones deleted grow normally but do exhibit transcriptome differences. Comparisons of the archaeal nucleosome profiles in the intergenic regions immediately upstream of genes that exhibited increased or decreased transcription in the absence of HTkA or HTkB revealed substantial differences but no consistent pattern of changes that would correlate directly with archaeal nucleosome positioning inhibiting or stimulating transcription. Conclusions: The results obtained establish that an archaeal histone and a genome sequence together are sufficient to determine where archaeal nucleosomes preferentially assemble and where they avoid assembly. We confirm that the same nucleosome positioning code operates in Archaea as in Eukaryotes and presumably therefore evolved with the histone-fold mechanism of DNA binding and compaction early in the archaeal lineage, before the divergence of Eukaryotes.
    BMC Genomics 06/2013; 14(1):391. DOI:10.1186/1471-2164-14-391 · 3.99 Impact Factor
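The roughly 10 bp dinucleotide periodicity mentioned in the excerpts above can be probed with a simple lag count. The sketch below marks every position where a chosen dinucleotide starts and counts how often two marks are separated by each distance. The function names and the synthetic test sequence are assumptions made for illustration; this is not the analysis pipeline of any of the cited studies.

```python
WW = {"AA", "AT", "TA", "TT"}  # dinucleotides made only of A and/or T


def dinucleotide_indicator(seq, dinucleotides=WW):
    """Return a 0/1 list marking positions where a listed dinucleotide starts."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] in dinucleotides else 0
            for i in range(len(seq) - 1)]


def lag_counts(indicator, max_lag=20):
    """Map each lag d to the number of position pairs (i, i + d) both marked."""
    n = len(indicator)
    return {
        d: sum(indicator[i] * indicator[i + d] for i in range(n - d))
        for d in range(1, max_lag + 1)
    }


# Synthetic sequence with one AA every 10 bp (not real genomic data).
seq = ("AA" + "GCGCGCGC") * 6
counts = lag_counts(dinucleotide_indicator(seq))
print(counts[5], counts[10], counts[20])
```

On real nucleosomal fragments the signal would be far noisier, so counts like these would normally be averaged over many aligned sequences and compared against a shuffled-sequence background before drawing conclusions.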
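The WS and RY "languages" compared in the replication-origin abstract above are two-letter recodings of DNA: W = A or T and S = G or C, as defined there, and, following the standard IUPAC codes, R = A or G (purine) and Y = C or T (pyrimidine). A minimal sketch of the recoding and of dinucleotide frequencies in the reduced alphabet follows; the helper names are hypothetical and the feature definitions of the cited study may differ.

```python
from collections import Counter

# Translation tables for the two reduced alphabets.
WS_TABLE = str.maketrans("ACGT", "WSSW")  # weak (A/T) vs strong (G/C)
RY_TABLE = str.maketrans("ACGT", "RYRY")  # purine (A/G) vs pyrimidine (C/T)


def recode(seq, table):
    """Rewrite a DNA sequence in a two-letter alphabet."""
    return seq.upper().translate(table)


def dinucleotide_freqs(seq):
    """Return the relative frequency of each overlapping dinucleotide."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {pair: c / len(pairs) for pair, c in counts.items()}


dna = "GGCCAATT"
ws = recode(dna, WS_TABLE)
ry = recode(dna, RY_TABLE)
print(ws, ry, dinucleotide_freqs(ws))
```

Frequencies computed this way in either alphabet could serve as SVM input features, which is one way to compare how much signal each recoding retains.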