Information, probability, and the abundance of the simplest RNA active sites

Department of Computer Science, University of Colorado at Boulder, 430 UCB, Boulder, CO 80309-0430, USA.
Frontiers in Bioscience (Impact Factor: 3.52). 02/2008; 13(16):6060-71. DOI: 10.2741/3137
Source: PubMed


The abundance of simple but functional RNA sites in random-sequence pools is critical for understanding emergence of RNA functions in nature and in the laboratory today. The complexity of a site is typically measured in terms of information, i.e. the Shannon entropy of the positions in a multiple sequence alignment. However, this calculation can be incorrect by many orders of magnitude. Here we compare several methods for estimating the abundance of RNA active-site patterns in the context of in vitro selection (SELEX), highlighting the strengths and weaknesses of each. We include in these methods a new approach that yields confidence bounds for the exact probability of finding specific kinds of RNA active sites. We show that all of the methods that take modularity into account provide far more accurate estimates of this probability than the informational methods, and that fast approximate methods are suitable for a wide range of RNA motifs.
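
As an illustration of the informational measure the abstract refers to, the following minimal sketch computes the per-column information of a toy alignment (2 bits minus the Shannon entropy of each column, for a four-letter alphabet) and converts the total into the naive probability 2^(-I). The function names and the toy alignment are hypothetical and are not taken from the paper. The sketch assumes uniform base composition, applies no small-sample correction, and considers the motif at a single fixed position. It shows only the simple informational estimate that the abstract says can be wrong by many orders of magnitude, not the modularity-aware methods the paper compares.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGU"):
    """Information (bits) of one alignment column: log2(|alphabet|) - H(column)."""
    counts = Counter(column)
    total = sum(counts[base] for base in alphabet)
    if total == 0:
        # Column holds no recognized bases (e.g. all gaps): treat as uninformative.
        return 0.0
    entropy = 0.0
    for base in alphabet:
        p = counts[base] / total
        if p > 0:
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy


def alignment_information(seqs):
    """Total information (bits), summed over columns of equal-length sequences."""
    return sum(column_information(col) for col in zip(*seqs))


def naive_site_probability(seqs):
    """Naive informational estimate: chance that a random sequence of the same
    length matches the site at one fixed position, assuming uniform bases."""
    return 2.0 ** (-alignment_information(seqs))


# Hypothetical toy alignment: two invariant columns, one two-way column,
# and one fully variable column.
toy_alignment = ["GACU", "GACA", "GAGC", "GAGG"]

if __name__ == "__main__":
    print(alignment_information(toy_alignment))   # 5.0 bits
    print(naive_site_probability(toy_alignment))  # 0.03125
```

Here the two invariant columns contribute 2 bits each, the two-way column contributes 1 bit, and the fully variable column contributes none, giving 5 bits and a naive probability of 1/32.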

    • "By many kinds of calculations, RNA and ribozymes are likely to have played early roles in life on Earth (Atkins et al. 2011). However, even small known ribozymes can contain dozens of required ribonucleotides, making them statistically infrequent (Kennedy et al. 2008), unstable because adjacent nucleotides can be aligned for easy hydrolysis (Soukup and Breaker 1999), burdened with replication that is easily poisoned by chirally related sugars (Joyce et al. 1984b), and difficult to extricate from stable double-stranded replicative intermediates (Sievers and Von Kiedrowski 1994; Engelhart et al. 2013). "
    ABSTRACT: Simple nucleotide templating activities are of interest as potential primordial reactions. Here we describe the acceleration of 5'-5' AppA synthesis by 3'-5' poly(U) under normal solution conditions. This reaction is apparently templated via complementary U:A base-pairing, despite the involvement of two different RNA backbones, because poly(U), unlike other polymers, significantly stimulates AppA synthesis. These interactions occur in moderate (K⁺) and (Mg²⁺) and are temperature sensitive, being more efficient at 10°C than at 4°C, but absent at 20°C. The reaction is only slightly pH sensitive, despite potentially relevant substrate pKa's. Kinetic data explicitly support production of AppA by interaction of stacked 2MeImpA and pA nucleotides paired with a single molecule of U template. At a lower rate, AppA can also be produced by a chemical reaction between 2MeImpA and pA, without participation of poly(U). Molecular modeling suggests that 5'-5' joining between stacked or concurrently paired A's can occur without major departures from normal U-A helical coordinates. So, coenzyme-like 5'-5' purine dinucleotides might be readily synthesized from 3'-5' RNAs with complementary sequences. © 2015 Puthenvedu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
    Article · Aug 2015 · RNA
    • "The modeled route to this goal (Fig. 1) combines several straightforward ideas. Firstly, I have previously emphasized the value of molecular simplicity in making a structure accessible to primitive synthesis (Illangasekare and Yarus, 1999; Kennedy et al., 2008; Yarus, 2011b). Aminoacyl-RNA synthesis via a 5 nt ribozyme is a characterized experimental example (Yarus, 2011b); such a molecule would occur via untemplated RNA synthesis as soon as mildly activated nucleotides existed. "
    ABSTRACT: A testable, explicit origin for Darwinian behavior, feasible on a chaotic early Earth, would aid origins discussion. Here I show that a pool receiving unreliable supplies of unstable ribonucleotide precursors can recurrently fill this role. By using numerical integration, the differential equations governing a sporadically fed pool are solved, yielding quantitative constraints for the proliferation of molecules that also have a chemical phenotype. For example, templated triphosphate nucleotide joining is >10^4 too slow, suggesting that a group more reactive than pyrophosphate activated primordial nucleotides. However, measured literature rates are sufficient if the Initial Darwinian Ancestor (IDA) resembles a 5'-5' cofactor-like dinucleotide RNA, synthesized via activation with a phosphorimidazolide-like group. A sporadically fed pool offers unforeseen advantages; for example, the pool hosts a novel replicator which is predominantly unpaired, even though it replicates. Such free template is optimized for effective selection during its replication. Pool nucleotides are also subject to a broadly based selection that impels the population toward replication, effective selection, and Darwinian behavior. Such a primordial pool may have left detectable modern traces. A sporadically fed ribonucleotide pool also fits a recognizable early Earth environment, has recognizable modern descendants, and suits the early shape of the phylogenetic tree of Earthly life. Finally, analysis points to particular data now needed to refine the hypothesis. Accordingly, a kinetically explicit chemical hypothesis for a terran IDA can be justified, and informative experiments seem readily accessible. Key Words: Cofactor; RNA; Origin of life; Replication; Initial Darwinian Ancestor (IDA). Astrobiology 12, 870-883.
    Article · Sep 2012 · Astrobiology
    • "Finally, the second loop in the Trp site has several specific implications. First, previous estimates of the probability of finding particular types of RNA sites (Knight and Yarus 2003; Knight et al. 2005; Kennedy et al. 2008) may be inflated by failing to take into account undetectable, but nonetheless important parts of the active site, such as those revealed here. Second, embedding the site in a random-sequence background provides an effective means for detecting such ... "
    [Figure 7 caption: (A) A newly defined tryptophan binding site derived from selection, construction, mutagenesis, and massed sequence analysis of selected pools.]
    ABSTRACT: Conservation is often used to define essential sequences within RNA sites. However, conservation finds only invariant sequence elements that are necessary for function, rather than finding a set of sequence elements sufficient for function. Biochemical studies in several systems, including the hammerhead ribozyme and the purine riboswitch, find additional elements, such as loop-loop interactions, required for function yet not phylogenetically conserved. Here we define a critical test of sufficiency: We embed a minimal, apparently sufficient motif for binding the amino acid tryptophan in a random-sequence background and ask whether we obtain functional molecules. After a negative result, we use a combination of three-dimensional structural modeling, selection, designed mutations, high-throughput sequencing, and bioinformatics to explore functional insufficiency. This reveals an essential unpaired G in a diverse structural context, varied sequence, and flexible distance from the invariant internal loop binding site identified previously. Addition of the new element yields a sufficient binding site by the insertion criterion, binding tryptophan in 22 out of 23 tries. Random insertion testing for site sufficiency seems likely to be broadly revealing.
    Article · Oct 2010 · RNA