Search of regular sequences in promoters from eukaryotic genomes.

Bioengineering Centre of Russian Academy of Sciences, 117312 Moscow, Pr-t 60-tya Oktyabrya, 7/1, Russian Federation.
Computational biology and chemistry (Impact Factor: 1.37). 04/2009; 33(3):196-204. DOI: 10.1016/j.compbiolchem.2009.03.001
Source: PubMed

ABSTRACT: In this paper, the notion of "regularity" is introduced to describe the structural features of DNA sequences. This notion extends the term "latent periodicity". A novel method for revealing regularity, based on the runs test, is described. A search for regular sequences in eukaryotic promoters has shown that more than 60% of them possess regularity at a statistically significant level. Possible biological functions of regularity are discussed, together with the possibility of using this characteristic for promoter annotation.
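The abstract names the runs test as the basis of the method but gives no details. As a rough illustration only, the classical Wald-Wolfowitz runs test on a two-symbol sequence can be sketched as below. This is not the authors' regularity measure: the function names `runs_test` and `encode_strong_weak`, the strong/weak (G,C versus A,T) encoding, and the normal approximation are all assumptions made for this sketch.

```python
import math


def encode_strong_weak(seq):
    """Hypothetical encoding: True for strong bases (G, C), False otherwise."""
    return [base in "GC" for base in seq.upper()]


def runs_test(bits):
    """Classical Wald-Wolfowitz runs test on a two-symbol sequence.

    Returns (number of runs, z-score, two-sided p-value), using the
    normal approximation to the distribution of the run count.
    """
    n1 = sum(1 for b in bits if b)
    n2 = len(bits) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("sequence must contain both symbols")

    # A new run starts wherever two neighbouring symbols differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)

    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var == 0:
        raise ValueError("sequence too short for the normal approximation")

    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return runs, z, p


if __name__ == "__main__":
    # Illustrative input only, not data from the paper.
    print(runs_test(encode_strong_weak("GAGAGAGAGA")))
```

A strictly alternating sequence has more runs than expected by chance and gives a positive z-score, while long blocks of the same symbol give a negative one. The normal approximation is unreliable for very short sequences.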

  • ABSTRACT: We describe a new mathematical method for finding highly diverged short tandem repeats containing a single indel. The method compares two frequency matrices: one for the subsequence before the phase shift and one for the subsequence after it. The comparison measure is based on matrix similarity. The approach was applied to the genomes of Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae, which were searched for tandem repeats with repeat lengths of 2 to 11 nucleotides, excluding lengths of 3, 6 and 9 nucleotides. The number of phase shift regions in these genomes was approximately 2.2×10^4, 1.5×10^4 and 1.7×10^2, respectively. The type I error rate was below 5%. The mean length of fuzzy periodicity and phase shift regions was about 220 nucleotides. Regions of fuzzy periodicity with a single insertion or deletion occupy substantial parts of the genomes: 5%, 3% and 0.3%, respectively. Fewer than 10% of these regions had been detected previously. That is, the number of such regions in the genomes of C. elegans, D. melanogaster and S. cerevisiae is dramatically higher than has been revealed by previously known methods. We suppose that some of the detected regions of fuzzy periodicity could be protein-binding regions.
    Computational biology and chemistry 04/2014; 51C:12-21. · 1.37 Impact Factor
  • ABSTRACT: In this paper we perform a genome-wide analysis of H. sapiens promoters. To this end, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we have compared the promoter database of H. sapiens with those of other species. We have found that the above-mentioned features also characterize the evolutionary content of mammalian promoters, in contrast to ancestral species in the phylogenetic tree, which exhibit a markedly lower level of differentiation among promoters.
    PLoS ONE 01/2014; 9(1):e85260. · 3.73 Impact Factor
