Information, probability, and the abundance of the simplest RNA active sites

Department of Computer Science, University of Colorado at Boulder, 430 UCB, Boulder, CO 80309-0430, USA.
Frontiers in Bioscience (Impact Factor: 4.25). 02/2008; 13:6060-71. DOI: 10.2741/3137
Source: PubMed

ABSTRACT The abundance of simple but functional RNA sites in random-sequence pools is critical for understanding the emergence of RNA functions in nature and in the laboratory today. The complexity of a site is typically measured in terms of information, i.e., the Shannon entropy of the positions in a multiple sequence alignment. However, this calculation can be incorrect by many orders of magnitude. Here we compare several methods for estimating the abundance of RNA active-site patterns in the context of in vitro selection (SELEX), highlighting the strengths and weaknesses of each. Among these methods is a new approach that yields confidence bounds for the exact probability of finding specific kinds of RNA active sites. We show that all of the methods that take modularity into account provide far more accurate estimates of this probability than the informational methods, and that fast approximate methods are suitable for a wide range of RNA motifs.
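The informational measure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code or their new method: the function names, the toy alignment, and the simple 2^(-I) conversion from information to a per-position match probability are assumptions made here for illustration, and no small-sample correction is applied. It shows only the kind of naive estimate that the abstract says can be wrong by many orders of magnitude.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the nucleotides in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_content(alignment, alphabet_size=4):
    """Sum over columns of (maximum entropy - observed entropy), in bits.

    Assumes equal-length, gap-free sequences and a uniform background
    over the alphabet (hypothetical simplifications for this sketch).
    """
    max_bits = math.log2(alphabet_size)
    return sum(max_bits - column_entropy(col) for col in zip(*alignment))


def naive_match_probability(alignment):
    """Naive informational estimate: chance that one random window matches."""
    return 2.0 ** -information_content(alignment)


# Hypothetical toy alignment: four conserved positions and one variable one.
toy_alignment = ["GGAUC", "GGAUC", "GGCUC", "GGUUC"]

bits = information_content(toy_alignment)
print(f"information content: {bits:.2f} bits")
print(f"naive match probability: {naive_match_probability(toy_alignment):.5f}")
```

Because this estimate treats the site as a single contiguous block, it ignores modularity, i.e. the many ways separate modules of a motif can be placed within a longer random sequence, which is the effect the modularity-aware methods compared in the paper are meant to capture.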

  • ABSTRACT: A testable, explicit origin for Darwinian behavior, feasible on a chaotic early Earth, would aid origins discussion. Here I show that a pool receiving unreliable supplies of unstable ribonucleotide precursors can recurrently fill this role. By using numerical integration, the differential equations governing a sporadically fed pool are solved, yielding quantitative constraints for the proliferation of molecules that also have a chemical phenotype. For example, templated triphosphate nucleotide joining is >10^4-fold too slow, suggesting that a group more reactive than pyrophosphate activated primordial nucleotides. However, measured literature rates are sufficient if the Initial Darwinian Ancestor (IDA) resembles a 5'-5' cofactor-like dinucleotide RNA, synthesized via activation with a phosphorimidazolide-like group. A sporadically fed pool offers unforeseen advantages; for example, the pool hosts a novel replicator which is predominantly unpaired, even though it replicates. Such free template is optimized for effective selection during its replication. Pool nucleotides are also subject to a broadly based selection that impels the population toward replication, effective selection, and Darwinian behavior. Such a primordial pool may have left detectable modern traces. A sporadically fed ribonucleotide pool also fits a recognizable early Earth environment, has recognizable modern descendants, and suits the early shape of the phylogenetic tree of Earthly life. Finally, analysis points to particular data now needed to refine the hypothesis. Accordingly, a kinetically explicit chemical hypothesis for a terran IDA can be justified, and informative experiments seem readily accessible. Key Words: Cofactor; RNA; Origin of life; Replication; Initial Darwinian Ancestor (IDA). Astrobiology 12, 870-883.
    Astrobiology 09/2012; 12(9):870-83. DOI:10.1089/ast.2012.0860 · 2.51 Impact Factor
  • ABSTRACT: Sojourn-times provide a versatile framework to assess the statistical significance of motifs in genome-wide searches even under non-Markovian background models. However, the large state spaces encountered in genomic sequence analyses make the exact calculation of sojourn-time distributions computationally intractable in long sequences. Here, we use coupling and analytic combinatoric techniques to approximate these distributions in the general setting of Polish state spaces, which encompass discrete state spaces. Our approximations are accompanied by explicit, easy-to-compute error bounds for the total variation distance. Broadly speaking, if [Formula: see text] is the random number of times a Markov chain visits a certain subset [Formula: see text] of states in its first [Formula: see text] transitions, then we can usually approximate the distribution of [Formula: see text] for [Formula: see text] of order [Formula: see text], where [Formula: see text] is the largest integer for which the exact distribution of [Formula: see text] is accessible and [Formula: see text] is an ergodicity coefficient associated with the probability transition kernel of the chain. This gives access to approximations of sojourn-times in the intermediate regime where [Formula: see text] is perhaps too large for exact calculations, but too small to rely on Normal approximations or on the stationarity assumptions underlying Poisson and compound Poisson approximations. As proof of concept, we approximate the distribution of the number of matches with a motif in promoter regions of C. elegans. Mathematical properties of the proposed ergodicity coefficients and connections with additive functionals of homogeneous Markov chains as well as ergodicity of non-homogeneous Markov chains are also explored.
    Journal of Mathematical Biology 06/2013; 69(1). DOI:10.1007/s00285-013-0690-6 · 2.39 Impact Factor

