A model of evolution with constant selective pressure for regulatory DNA sites

Institute for Information Transmission Problems (the Kharkevich Institute) of RAS, Bolshoi Karetny pereulok, 19, GSP-4, Moscow, 127994, Russia.
BMC Evolutionary Biology (Impact Factor: 3.41). 02/2007; 7(1):125. DOI: 10.1186/1471-2148-7-125
Source: PubMed

ABSTRACT Molecular evolution is usually described assuming a neutral or weakly non-neutral substitution model. Recently, new data have become available on the evolution of sequence regions under selective pressure, e.g. transcription factor binding sites. To reconstruct the evolutionary history of such sequences, one needs evolutionary models that take into account a substantial constant selective pressure.
We present a simple evolutionary model with a single preferred (consensus) nucleotide and the neutral substitution model adopted for all other nucleotides. This evolutionary model has a rate matrix in which all substitutions that do not involve the consensus nucleotide occur at the same rate. The model has two time scales for reaching the stationary distribution; in the general case only one of the two rate parameters can be evaluated from the stationary distribution. In the middle-time zone, a counterintuitive behavior was observed for some parameter values, with the probability of conservation for a non-consensus nucleotide greater than that for the consensus nucleotide. Such an effect can be observed only in the case of weak preference for the consensus nucleotide, when the probability of observing the consensus nucleotide in the stationary distribution is less than 1/2. If the substitution rate is represented as a product of mutation and fixation, only the fixation can be calculated from the stationary distribution. The observed conservation of non-consensus nucleotides does not take place if the elements of the mutation matrix are identical, and can be related to the reduced mutation rate between the non-consensus nucleotides. This bias can have no effect on the stationary distribution of nucleotide frequencies calculated over the ensemble of multiple alignments, e.g. transcription factor binding sites upstream of different sets of co-regulated orthologous genes.
The derived model can be used as a null model when analyzing the evolution of orthologous transcription factor binding sites. In particular, our findings show that a nucleotide preferred at some position of a multiple alignment of binding sites for some transcription factor in the same genome is not necessarily the most conserved nucleotide in an alignment of orthologous sites from different species. However, this effect can take place only in the case of a mutation matrix whose elements are not identical.
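
The behavior described in the abstract can be illustrated with a minimal sketch. The parameterization below is an assumption made for illustration and is not taken from the paper: state 0 is the consensus nucleotide, `a` is the rate of leaving the consensus for each non-consensus nucleotide, `b` is the rate of returning to the consensus, and `c` is the common rate between non-consensus nucleotides. The function names and the numerical values are hypothetical as well. Under these assumptions the stationary distribution depends only on the ratio of `a` to `b`, and the relaxation rates are 3a + b and b + 3c.

```python
import math

def rate_matrix(a, b, c):
    """Rate matrix Q (assumed form); state 0 is the consensus, 1..3 are not."""
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i == j:
                continue
            if i == 0:
                q[i][j] = a   # consensus -> non-consensus
            elif j == 0:
                q[i][j] = b   # non-consensus -> consensus
            else:
                q[i][j] = c   # between non-consensus nucleotides
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q

def matmul(x, y):
    """Plain square-matrix product."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, t, squarings=10, terms=12):
    """P(t) = exp(Q t), by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def stationary(a, b):
    """Stationary distribution of the assumed Q; it does not depend on c."""
    total = 3 * a + b
    return [b / total] + [a / total] * 3

# Hypothetical values: weak consensus preference, slow non-consensus exchange.
a, b, c = 1.0, 1.5, 0.1
t = 0.5
q = rate_matrix(a, b, c)
pi = stationary(a, b)
p = transition_matrix(q, t)

print("stationary distribution:", [round(x, 4) for x in pi])
print("relaxation rates:", 3 * a + b, b + 3 * c)
print("P(consensus conserved at t):", round(p[0][0], 4))
print("P(non-consensus conserved at t):", round(p[1][1], 4))
```

With these illustrative values the consensus nucleotide has stationary probability 1/3, which is below 1/2, and at the intermediate time `t = 0.5` the conservation probability of a non-consensus nucleotide exceeds that of the consensus nucleotide. Raising `c` shortens the second time scale and weakens the effect, in line with the abstract's remark about a reduced mutation rate between non-consensus nucleotides.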

Available from: Vsevolod Makeev, Jul 06, 2015
  • ABSTRACT: Comparative computer-assisted analysis was used to study putative GlpR regulons responsible for the metabolism of glycerol and glycerol-3-phosphate in genomes of alpha-, beta-, and gamma-proteobacteria. New palindromic GlpR-binding signals were identified in gamma-proteobacteria; the consensus sequences are TGTTCGATAACGAACA for Enterobacteriaceae, wTTTTCGTATACGAAAAw for Pseudomonadaceae, and AATGCTCGATCGAGCATT for Vibrionaceae. Signals in alpha- and beta-proteobacteria were also identified: they contain 3-4 direct TTTCGTT repeats separated by 3-4 nucleotide pairs.
    Molekuliarnaia biologiia 01/2003; 37(5):843-9.
  • ABSTRACT: In bioinformatics, binding of transcription regulatory factors to their cognate binding sites is usually described by a sequence-specific binding energy, which is estimated from a training sample of sites. This model implies that all binding sites with binding energy above some threshold are functional, and that site sequence variations should be considered neutral as long as they do not reduce this energy below the threshold. To quantify this energy, the binding profile (positional weight matrix, PWM) model or a consensus-based model is usually applied. Here we show that in many cases the available data are not sufficient to construct a relevant PWM, and that a modified consensus-based model could describe binding properties more effectively. Further, using data on binding sites of several transcription factors, we demonstrate that some non-consensus nucleotides in "orthologous sites" (that is, binding sites of the same factor upstream of orthologous genes), which have been believed to be irrelevant to regulation or even to hinder it, are evolutionarily very stable and specific for the regulated gene. For each pair of genomes considered, the number of substitutions between non-consensus nucleotides is far less than the expected number of neutral substitutions. Moreover, in several positions of binding sites regulating different genes, there are non-consensus nucleotides conserved in distant genomes. This means that there exists a selection pressure that results in the stability of non-consensus nucleotides.
    Gene 04/2005; 347(2):255-63. DOI:10.1016/j.gene.2004.12.013 · 2.08 Impact Factor
  • ABSTRACT: Assessment of the evolutionary process is crucial for understanding the effect of protein structure and function on sequence evolution and for many other analyses in molecular evolution. Here, we used simulations to study how taxon sampling affects accuracy of parameter estimation and topological inference in the absence of branch length asymmetry. With maximum-likelihood analysis, we find that adding taxa dramatically improves both support for the evolutionary model and accurate assessment of its parameters when compared with increasing the sequence length. Using a method we call “doppelgänger trees,” we distinguish the contributions of two sources of improved topological inference: greater knowledge about internal nodes and greater knowledge of site-specific rate parameters. Surprisingly, highly significant support for the correct general model does not lead directly to improved topological inference. Instead, substantial improvement occurs only with accurate assessment of the evolutionary process at individual sites. Although these results are based on a simplified model of the evolutionary process, they indicate that in general, assuming processes are not independent and identically distributed among sites, more extensive sampling of taxonomic biodiversity will greatly improve analytical results in many current sequence data sets with moderate sequence lengths.
    Molecular Biology and Evolution 01/2001; DOI:10.1093/oxfordjournals.molbev.a026286 · 14.31 Impact Factor