Nucleosome positioning signals in genomic DNA

Bioinformatics Program, Boston University, Boston, MA 02215, USA.
Genome Research (Impact Factor: 14.63). 09/2007; 17(8):1170-7. DOI: 10.1101/gr.6101007
Source: PubMed


Although histones can form nucleosomes on virtually any genomic sequence, DNA sequences show considerable variability in their binding affinity. We have used DNA sequences of Saccharomyces cerevisiae whose nucleosome binding affinities have been experimentally determined (Yuan et al. 2005) to train a support vector machine to identify the nucleosome formation potential of any given sequence of DNA. The DNA sequences whose nucleosome formation potential is most accurately predicted are those that contain strong nucleosome-forming or nucleosome-inhibiting signals and are found within nucleosome-length stretches of genomic DNA with continuous nucleosome formation or inhibition signals. We have accurately predicted the experimentally determined nucleosome positions across a well-characterized promoter region of S. cerevisiae and identified strong periodicity within 199 center-aligned mononucleosomes studied recently (Segal et al. 2006) despite there being no periodicity information used to train the support vector machine. Our analysis suggests that only a subset of nucleosomes are likely to be positioned by intrinsic sequence signals. This observation is consistent with the available experimental data and is inconsistent with the proposal of a nucleosome positioning code. Finally, we show that intrinsic nucleosome positioning signals are both more inhibitory and more variable in promoter regions than in open reading frames in S. cerevisiae.
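The abstract does not say how sequences are represented for the support vector machine; a citing excerpt further down this page describes the model as using variable-length k-mers. As a rough illustration only, the Python sketch below turns a DNA sequence into a fixed-length vector of k-mer frequencies for k = 1 up to a chosen maximum. The function names, the default `max_k`, and the per-k normalization are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(max_k):
    """Return every DNA k-mer for k = 1..max_k in a fixed, reproducible order."""
    return ["".join(p)
            for k in range(1, max_k + 1)
            for p in product("ACGT", repeat=k)]


def kmer_features(seq, max_k=3):
    """Map a DNA sequence to k-mer frequencies (hypothetical feature scheme).

    Each k-mer count is divided by the number of length-k windows in the
    sequence, so sequences of different lengths give comparable vectors.
    k-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter()
    for k in range(1, max_k + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1

    features = []
    for kmer in kmer_vocabulary(max_k):
        windows = max(len(seq) - len(kmer) + 1, 1)  # avoid division by zero
        features.append(counts[kmer] / windows)
    return features


if __name__ == "__main__":
    vec = kmer_features("ACGTACGT", max_k=2)
    print(len(vec))  # 4 mononucleotides + 16 dinucleotides
```

Vectors built this way, paired with labels for nucleosome-forming and nucleosome-inhibiting sequences, could be passed to any SVM implementation; which kernel and which value of `max_k` are appropriate would have to be checked against the paper itself.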


Available from: Robert E Thurman
    • "Using computational prediction models for nucleosome occupancy, exonic regions were found to be enriched for sequences that favored nucleosome positioning, while the sequences flanking the exons were depleted of these high-affinity nucleosome sequences (Schwartz et al. 2009). As most exons are short (150 bp or so) and also have high GC content (Zhu et al. 2009), it was suggested that nucleosomes, which occupy 147 bp of DNA and also have higher occupancy over GC-rich regions (Kiyama and Trifonov 2002; Segal et al. 2006; Peckham et al. 2007), were preferentially placed over exonic regions. The question then arises as to whether nucleosomes are simply present over exons due to sequence preferences, or do they actually play a role in splicing?"
    ABSTRACT: Nature has devised sophisticated cellular machinery to process mRNA transcripts produced by RNA Polymerase II, removing intronic regions and connecting exons together, to produce mature RNAs. This process, known as splicing, is very closely linked to transcription. Alternative splicing, or the ability to produce different combinations of exons that are spliced together from the same genomic template, is a fundamental means of regulating protein complexity. Similar to transcription, both constitutive and alternative splicing can be regulated by chromatin and its associated factors in response to various signal transduction pathways activated by external stimuli. This regulation can vary between different cell types, and interference with these pathways can lead to changes in splicing, often resulting in aberrant cellular states and disease. The epithelial to mesenchymal transition (EMT), which leads to cancer metastasis, is influenced by alternative splicing events of chromatin remodelers and epigenetic factors such as DNA methylation and non-coding RNAs. In this review, we will discuss the role of epigenetic factors including chromatin, chromatin remodelers, DNA methyltransferases, and microRNAs in the context of alternative splicing, and discuss their potential involvement in alternative splicing during the EMT process.
    Full-text · Article · Jul 2015 · Biochemistry and Cell Biology
    • "Support vector machine (SVM) is a supervised classification algorithm based on statistical learning theory. In recent years, the SVM algorithm has been widely used in the area of bioinformatics (Chen and Lin, 2010; Chen and Lin, 2012; Chen et al., 2012a,b; Ding et al., 2012; Lin et al., 2010; Liu et al., 2012; Macalpine et al., 2010; Peckham et al., 2007; Yuan et al., 2013). In present work, the free software LIBSVM (version 3.12) was used to distinguish between Ori and non-Ori regions in eukaryotic organisms (Chang and Lin, 2011)."
    ABSTRACT: Identification of replication origins is crucial for the faithful duplication of genomic DNA. The frequencies of single nucleotides and dinucleotides, GC/AT bias and GC/AT profile in the vicinity of Arabidopsis thaliana replication origins were analyzed in the present work. The guanine content or cytosine content is higher in origin of replication (Ori) than in non-Ori. The SS (S = G or C) dinucleotides are favoured in Ori whereas WW (W = A or T) dinucleotides are favoured in non-Ori. GC/AT bias and GC/AT profile in Ori are significantly different from those in non-Ori. Furthermore, by inputting DNA sequence features into a support vector machine, we distinguished between the Ori and non-Ori regions in A. thaliana. The total prediction accuracy is about 69.5% as evaluated by 10-fold cross-validation. This result suggests that, apart from DNA sequence, deciphering the selection of replication origins must integrate many other factors including nucleosome positioning, DNA methylation, histone modification, etc. In addition, by comparing predictive performance we found that the predictive accuracy of the SVM using sequence features in the context of the WS language is significantly better than that of the RY language. Furthermore, the same conclusion was also obtained in S. cerevisiae and D. melanogaster.
    Full-text · Article · Oct 2014 · Biosystems
    • "The predictions are often based on various features of the primary DNA sequences [9,17,19–22]. For example, Peckham et al. (2007) have incorporated variable-length k-mers into a Support Vector Machine (SVM) to predict nucleosomal DNA in yeast [19]. Lee et al. (2007) have used a lasso regression model on a set of DNA sequence and structural features to enable nucleosomal DNA prediction [17]."
    ABSTRACT: The identification of important factors that affect nucleosome formation is critical to clarify nucleosome-forming mechanisms and the role of the nucleosome in gene regulation. Various features reported in the literature led to our hypothesis that multiple features can together contribute to nucleosome formation. Therefore, we compiled 779 features and developed a pattern discovery and scoring algorithm FFNs (Finding Features for Nucleosomes) to identify feature patterns that are differentially enriched in nucleosome-forming sequences and nucleosome-depletion sequences. Applying FFN to genome-wide nucleosome occupancy data in yeast and human, we identified statistically significant feature patterns that may influence nucleosome formation, many of which are common to the two species. We found that both sequence and structural features are important in nucleosome occupancy prediction. We discovered that, even for the same feature combinations, variations in feature values may lead to differences in predictive power. We demonstrated that the identified feature patterns could be used to assist nucleosomal sequence prediction.
    Full-text · Article · Aug 2014 · Genomics