Search of regular sequences in promoters from eukaryotic genomes.

Bioengineering Centre of Russian Academy of Sciences, 117312 Moscow, Pr-t 60-tya Oktyabrya, 7/1, Russian Federation.
Computational biology and chemistry (Impact Factor: 1.37). 04/2009; 33(3):196-204. DOI: 10.1016/j.compbiolchem.2009.03.001
Source: PubMed

ABSTRACT In this paper, the notion of "regularity" is introduced to describe the structural features of DNA sequences. This notion extends the term "latent periodicity". A novel method for revealing regularity, based on the runs test, is described. A search for regular sequences in eukaryotic promoters has shown that more than 60% of them possess regularity at a statistically significant level. Possible biological functions of regularity are discussed, together with the possibility of using this characteristic for promoter annotation.

  • Source
    ABSTRACT: In this paper we perform a genome-wide analysis of H. sapiens promoters. To this aim, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we have compared the promoter database of H. sapiens with those of other species. We have found that the above-mentioned features characterize also the evolutionary content appearing in mammalian promoters, at variance with ancestral species in the phylogenetic tree, that exhibit a definitely lower level of differentiation among promoters.
    PLoS ONE 01/2014; 9(1):e85260. DOI:10.1371/journal.pone.0085260 · 3.53 Impact Factor
  • Source
    ABSTRACT: We analyzed the periodic patterns in E. coli promoters and compared the distributions of the corresponding patterns in promoters and in the complete genome to elucidate their function. Except for the three-base periodicity, which coincides with that in the coding regions and grows stronger in the region downstream from the transcription start (TS), all other salient periodicities peak upstream of TS. We found that helical periodicities with lengths close to the B-helix pitch (~10.2–10.5 bp) and the A-helix pitch (~10.8–11.1 bp) coexist in the genomic sequences. We mapped the distributions of stretches with A-, B-, and Z-like DNA periodicities onto the E. coli genome. All three periodicities tend to concentrate within non-coding regions as their intensity becomes stronger, and they prevail in the promoter sequences. The comparison with available experimental data indicates that promoters with the most pronounced periodicities may be related to the supercoiling-sensitive genes.
    Genomics 03/2011; 98(3):223–231. DOI:10.1016/j.ygeno.2011.06.006 · 2.79 Impact Factor
  • Source
    ABSTRACT: We describe a new mathematical method for finding very diverged short tandem repeats containing a single indel. The method involves comparison of two frequency matrices: a first matrix for the subsequence before the shift and a second one for the subsequence after it. The measure of comparison is based on matrix similarity. The approach was applied to the genomes of Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae. They were investigated for the presence of tandem repeats with repeat lengths of 2–11 nucleotides, excluding lengths of 3, 6 and 9 nucleotides. The number of phase shift regions in these genomes was approximately 2.2×10^4, 1.5×10^4 and 1.7×10^2, respectively. Type I error was less than 5%. The mean length of fuzzy periodicity and phase shift regions was about 220 nucleotides. The regions of fuzzy periodicity having a single insertion or deletion occupy substantial parts of the genomes: 5%, 3% and 0.3%, respectively. Fewer than 10% of these regions have been detected previously. That is, the number of such regions in the genomes of C. elegans, D. melanogaster and S. cerevisiae is dramatically higher than has been revealed by any known method. We suppose that some of the detected regions of fuzzy periodicity could be regions for protein binding.
    Computational biology and chemistry 04/2014; 51C:12-21. DOI:10.1016/j.compbiolchem.2014.03.004 · 1.37 Impact Factor
