A Seed-Based Method for Predicting Common Secondary Structures in Unaligned RNA Sequences.
ABSTRACT The prediction of RNA secondary structure can be improved by incorporating comparative analysis of homologous sequences.
However, most existing comparative approaches are vulnerable to alignment errors. Here we use unaligned sequences to devise
a seed-based method for predicting RNA secondary structures. The central idea of our method can be described by three major
steps: 1) to detect all possible stems in each sequence using the so-called position matrix, which indicates the paired or
unpaired information for each position in the sequence; 2) to select the seeds for RNA folding by finding and assessing the
conserved stems across all sequences; 3) to predict RNA secondary structures on the basis of the seeds. We tested our method
on data sets composed of RNA sequences with known secondary structures. Our method achieves an average accuracy (measured as sensitivity)
of 69.93% for single-sequence tests, 72.97% for two-sequence tests, and 79.27% for three-sequence tests. The results show that
our method can predict RNA secondary structure with higher accuracy than Mfold.
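
As a rough illustration of step 1 and of the accuracy measure, the sketch below enumerates candidate stems in a single sequence from a boolean pairing matrix and computes sensitivity as the fraction of known base pairs that are recovered. It is not the authors' implementation: the abstract does not specify the position matrix, so the pairing rules (Watson-Crick plus G-U wobble), the `min_len` and `min_loop` thresholds, and the function names `pairing_matrix`, `find_stems`, and `sensitivity` are all assumptions made for this example.

```python
# Illustrative sketch only. Pairing rules, thresholds, and all function
# names are assumptions; the abstract does not define them.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq):
    """Return an n x n boolean matrix; [i][j] is True if bases i and j can pair."""
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def find_stems(seq, min_len=3, min_loop=3):
    """Enumerate candidate stems as (i, j, length) tuples.

    A stem of the given length pairs i+k with j-k for k in 0..length-1.
    Only stems that cannot be extended outward are reported.
    """
    n = len(seq)
    m = pairing_matrix(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not m[i][j]:
                continue
            # Skip pairs that merely continue a stem starting further out.
            if i > 0 and j < n - 1 and m[i - 1][j + 1]:
                continue
            length = 0
            # Extend inward while at least min_loop bases stay unpaired.
            while i + length < j - length - min_loop and m[i + length][j - length]:
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def sensitivity(predicted_pairs, known_pairs):
    """Fraction of known base pairs that appear in the prediction."""
    known = set(known_pairs)
    if not known:
        return 0.0
    return len(known & set(predicted_pairs)) / len(known)


if __name__ == "__main__":
    print(find_stems("GGGGAAAACCCC"))
    print(sensitivity([(0, 11), (1, 10)], [(0, 11), (1, 10), (2, 9), (3, 8)]))
```

In a multi-sequence setting, stems found this way in each sequence would be the raw material from which conserved stems (the seeds of step 2) are selected; how that selection and scoring is done is not described in the abstract and is not attempted here.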
- ABSTRACT: Non-coding RNA (ncRNA) genes produce functional RNA molecules rather than encoding proteins. However, almost all means of gene identification assume that genes encode proteins, so even in the era of complete genome sequences, ncRNA genes have been effectively invisible. Recently, several different systematic screens have identified a surprisingly large number of new ncRNA genes. Non-coding RNAs seem to be particularly abundant in roles that require highly specific nucleic acid recognition without complex catalysis, such as in directing post-transcriptional regulation of gene expression or in guiding RNA modifications. Nature Reviews Genetics 01/2002; 2(12):919-29.
- ABSTRACT: Rfam is a comprehensive collection of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars. Rfam aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences. The data provide the first glimpses of conservation of multiple ncRNA families across a wide taxonomic range. A small number of large families are essential in all three kingdoms of life, with large numbers of smaller families specific to certain taxa. Recent improvements in the database are discussed, together with challenges for the future. Rfam is available on the Web at http://www.sanger.ac.uk/Software/Rfam/ and http://rfam.wustl.edu/. Nucleic Acids Research 02/2005; 33(Database issue):D121-4.
- ABSTRACT: Computer codes for computation and comparison of RNA secondary structures, the Vienna RNA package, are presented. They are based on dynamic programming algorithms and aim at predictions of structures with minimum free energies as well as at computations of the equilibrium partition functions and base pairing probabilities. An efficient heuristic for the inverse folding problem of RNA is introduced. In addition we present compact and efficient programs for the comparison of RNA secondary structures based on tree editing and alignment. All computer codes are written in ANSI C. They include implementations of modified algorithms on parallel computers with distributed memory. Performance analysis carried out on an Intel Hypercube shows that parallel computing becomes gradually more and more efficient the longer the sequences are. Monatshefte fuer Chemie/Chemical Monthly 01/1994; 125(2):167-188.