An Analysis of Large rRNA Sequences Folded by a Thermodynamic Method

Department of Molecular, Cellular, and Developmental Biology, University of Colorado at Boulder 80309-0347, USA.
Folding and Design 02/1996; 1(6):419-30. DOI: 10.1016/S1359-0278(96)00058-2
Source: PubMed


The secondary structure of RNA can be predicted by the thermodynamics-based method of Zuker and Turner. The accuracy of the method's secondary structure predictions for rRNA can be assessed by using as reference the currently available rRNA secondary structure models that have been derived from comparative analysis of rRNA sequence alignments.
We folded 72 23S rRNA sequences with the Zuker-Turner method and scored the resulting secondary structure predictions against the comparative model. Empirically, trends in the score were observed as a function of the phylogenetic memberships of the sequences and as a function of the base pairs' secondary structural contexts. Further, three parameters were found that (anti-)correlate with the score.
Three semiquantitative predictors of score were found: % of noncanonical base pairs, % of hairpin loops that were stable tetraloops, and sequence %G + C. The folding of rRNA is a tractable problem and thermodynamics-based folding algorithms, in particular, are useful in the study of this folding problem even for large RNA molecules (e.g. 16S and 23S rRNA).
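
As a rough illustration of how a predicted fold might be scored against a comparative model, the sketch below compares two structures written in dot-bracket notation. The notation, the function names, and the definition of the score as the percentage of reference base pairs that also appear in the prediction are assumptions made for this example; the abstract does not spell out the scoring formula. A helper for sequence %G + C, one of the three predictors named above, is included.

```python
def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs (0-based, i < j) in a dot-bracket string."""
    stack = []
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def score_prediction(predicted, reference):
    """Percentage of reference base pairs recovered by the prediction.

    Assumed definition for illustration only, not necessarily the paper's.
    """
    ref_pairs = parse_dot_bracket(reference)
    if not ref_pairs:
        return 0.0
    pred_pairs = parse_dot_bracket(predicted)
    return 100.0 * len(pred_pairs & ref_pairs) / len(ref_pairs)


def percent_gc(sequence):
    """Percentage of G and C residues in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    reference = "((((....))))"   # hypothetical comparative model
    predicted = "(((......)))"   # hypothetical thermodynamic prediction
    print(score_prediction(predicted, reference))
    print(percent_gc("GGCCAAUU"))
```

Comparing sets of index pairs keeps the score independent of how each structure was produced, so the same function could be applied to any prediction method that reports base pairs.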


Available from: Robin R. Gutell
  • Source
    • "Intron and endonuclease nomenclature follows that of Lambowitz & Belfort (1993), and the intergenic open reading frames (ORFs) are numbered according to Foury et al. (1998). The sizes of ribosomal RNAs (rRNA) were inferred from the model elaborated for their equivalents from S. cerevisiae mitochondria (Konings & Gutell, 1995; Fields & Gutell, 1996). "
    ABSTRACT: We determined the complete sequence of the 71 355-bp-long mitochondrial genome of Saccharomyces paradoxus entirely by direct sequencing of purified mitochondrial DNA (mtDNA). This mtDNA possesses the same features as that of its close relative Saccharomyces cerevisiae: an A + T content of 85.9%, a set of genes coding for the three components of cytochrome oxidase, cytochrome b, three subunits of ATPase, both ribosomal subunits, a gene for a ribosomal protein, the rnpB gene, a tRNA package (24) and the yeast genetic code. Genes are interrupted by nine group I and group II introns, two of which are in positions unknown in S. cerevisiae, but recognized in Saccharomyces pastorianus. The gene products are related to those of S. cerevisiae, and the identity of amino acid residues varies from 100% for cox2 to 83% for rps3. The remarkable differences from S. cerevisiae are (1) a different gene order (translocation of trnF-trnT1-trnV-cox3-trnfM-rnpB-trnP and transposition of trnW-rns), (2) the occurrence of two unusual GI introns, (3) eight active ori elements, and (4) a reduced number of GC clusters and divergent intergenic spacers. Despite these facts, the sequenced S. paradoxus mtDNA introduced to S. cerevisiae was able to support the respiratory function to the same extent as the original mtDNAs.
    Full-text · Article · Jul 2012 · FEMS Yeast Research
  • Source
    • "Given an RNA sequence and the thermodynamic model, efficient dynamic programming algorithms exist for finding a minimum free energy secondary structure [14-17,13,4]. Energy minimization is not as accurate as comparative analysis [18,13], but unlike comparative analysis, it can be applied to single RNA sequences. "
    ABSTRACT: We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm. We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm - this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN - have comparable overall performance with different strengths and weaknesses.
    Full-text · Article · Feb 2006 · BMC Bioinformatics
  • Source
    • "We considered any base-pair with a contact distance of 100 nt or less to be "short-range," a contact distance of 101–500 nt to be "mid-range," and a contact distance of 501 or greater to be "long-range." The majority of base-pairs in the 16S and 23S rRNA secondary structure models predicted with comparative analysis were short-range (Table 5), and previous studies have established that short-range base-pairs are predicted more accurately than long-range base-pairs[29,30]. In this section, we: 1) compared the accuracies of the short-range interactions predicted with Mfold 3.1 and Mfold 2.3, 2) compared the number of short-, mid-, and long-range base-pairs in the comparative models with those predicted by Mfold 3.1, and 3) determined the relationship between the base-pair prediction accuracy and the contact distance for 16S rRNA. "
    ABSTRACT: A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1. The average prediction accuracy for a 16S or 23S rRNA sequence with Mfold 3.1 is 41%, while the prediction accuracies for the majority of 16S and 23S rRNA structures tested are between 20% and 60%, with some having less than 20% prediction accuracy. The average prediction accuracy was 71% for 5S rRNA and 69% for tRNA. The majority of the 5S rRNA and tRNA sequences have prediction accuracies greater than 60%. The prediction accuracy of 16S rRNA base-pairs decreases exponentially as the number of nucleotides intervening between the 5' and 3' halves of the base-pair increases. Our analysis indicates that the current set of nearest-neighbor energy parameters, in conjunction with the Mfold folding algorithm, is unable to consistently and reliably predict an RNA's correct secondary structure. For 16S or 23S rRNA structure prediction, Mfold 3.1 offers little improvement over Mfold 2.3. However, the nearest-neighbor energy parameters do work well for shorter RNA sequences such as tRNA or 5S rRNA, or for larger rRNAs when the contact distance between the base-pairs is less than 100 nucleotides.
    Full-text · Article · Sep 2004 · BMC Bioinformatics
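
The short-, mid-, and long-range classes quoted in the last excerpt can be sketched as a small classifier. This is a minimal illustration under stated assumptions: contact distance is taken here as the difference between the two paired positions, a distance of exactly 501 is assigned to the long-range class, and the names `contact_class` and `class_counts` are hypothetical and do not come from the cited work.

```python
from collections import Counter

# Thresholds taken from the quoted excerpt; the handling of exactly 501 nt
# as long-range is an assumption of this sketch.
SHORT_MAX = 100
MID_MAX = 500


def contact_class(i, j):
    """Classify a base pair (i, j) by contact distance, assumed here to be |j - i|."""
    distance = abs(j - i)
    if distance <= SHORT_MAX:
        return "short-range"
    if distance <= MID_MAX:
        return "mid-range"
    return "long-range"


def class_counts(pairs):
    """Count how many base pairs fall into each contact-distance class."""
    return Counter(contact_class(i, j) for i, j in pairs)


if __name__ == "__main__":
    # Hypothetical base pairs given as (5' position, 3' position).
    example_pairs = [(1, 50), (5, 20), (1, 300), (1, 900)]
    for name, count in sorted(class_counts(example_pairs).items()):
        print(name, count)
```

Tabulating counts per class for both a comparative model and a prediction would give the kind of side-by-side comparison the excerpt describes.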