Article
Alignment of RNA base pairing probability matrices
Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstrasse 17, Vienna, Austria.
Bioinformatics (Impact Factor: 4.98). 10/2004; 20(14):2222-7. DOI: 10.1093/bioinformatics/bth229 Source: PubMed
ABSTRACT
Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics.
We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and the alignment problem simultaneously, as in the classical Sankoff algorithm, we use McCaskill's approach to compute base pairing probability matrices, which effectively incorporate the information on the energetics of each sequence. A novel, simplified variant of Sankoff's algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment.
The programs pmcomp and pmmulti described in this contribution are implemented in Perl and can be downloaded together with the example datasets from http://www.tbi.univie.ac.at/RNA/PMcomp/. A web server is available at http://rna.tbi.univie.ac.at/cgibin/pmcgi.pl
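The idea of aligning two base pairing probability matrices with a simplified Sankoff-style recursion can be illustrated with a small sketch. This is not the authors' Perl implementation: the gap penalty, the probability cutoff, the log-odds pair weight, the 1/0 sequence score and the function names are all assumptions chosen for illustration, and the pairing probabilities are supplied by hand rather than computed with McCaskill's algorithm.

```python
from functools import lru_cache
from math import log

GAP = -1.0    # assumed gap penalty (hypothetical value)
P_MIN = 0.01  # assumed cutoff below which a base pair is ignored


def pair_weight(p):
    """Map a pairing probability to a weight in (0, 1]; None if below the cutoff."""
    if p <= P_MIN:
        return None
    return log(p / P_MIN) / log(1.0 / P_MIN)


def seq_score(a, b):
    """Toy sequence score: 1 for identical nucleotides, 0 otherwise."""
    return 1.0 if a == b else 0.0


def align_score(seq_a, prob_a, seq_b, prob_b):
    """Best joint sequence/structure score of two sequences.

    prob_a and prob_b are dicts mapping 0-based index pairs (i, j), i < j,
    to pairing probabilities. Only the score is returned, no traceback.
    """

    @lru_cache(maxsize=None)
    def S(i, j, k, l):
        # Best score for aligning seq_a[i..j] with seq_b[k..l] (inclusive).
        len_a, len_b = j - i + 1, l - k + 1
        if len_a <= 0:
            return GAP * max(len_b, 0)
        if len_b <= 0:
            return GAP * len_a
        best = max(
            S(i + 1, j, k, l) + GAP,  # position i of A against a gap
            S(i, j, k + 1, l) + GAP,  # position k of B against a gap
            S(i + 1, j, k + 1, l) + seq_score(seq_a[i], seq_b[k]),  # unpaired match
        )
        # Pair (i, h) in A matched with pair (k, q) in B, followed by the rest.
        for h in range(i + 1, j + 1):
            w_a = pair_weight(prob_a.get((i, h), 0.0))
            if w_a is None:
                continue
            for q in range(k + 1, l + 1):
                w_b = pair_weight(prob_b.get((k, q), 0.0))
                if w_b is None:
                    continue
                enclosed = S(i + 1, h - 1, k + 1, q - 1) + w_a + w_b
                best = max(best, enclosed + S(h + 1, j, q + 1, l))
        return best

    return S(0, len(seq_a) - 1, 0, len(seq_b) - 1)


if __name__ == "__main__":
    pairs = {(0, 4): 0.9}  # hand-made probabilities: first and last base pair
    print(align_score("GAAAC", pairs, "CAAAG", pairs))
    print(align_score("GAAAC", {}, "CAAAG", {}))
```

In this toy example the two hairpins differ in sequence at the paired positions, so the structure-aware score exceeds the sequence-only score. The sketch enumerates all subsequence pairs and is only practical for very short inputs.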

 "A more detailed presentation of probing algebra products will be given in a yet unpublished work [42]. RNA alignment folding: As in test case Ali, we study the behavior of Pareto fronts in a (re)implementation of RNAalifold [43,44]. We analyze structure prediction with the MFE and MEA algebras and a covariance model algebra COVAR following the definitions of [45]. "

 ", we start by introducing dynamic programming matrices S and D in analogy to the dynamic programming matrices of LocARNA [22] and PMcomp [34]. Thus, we define the entry S_{i,j;k,l} as the best subscore for i…j and k…l. "
ABSTRACT: Background: The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. This same problem also arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task? Results: Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is preprocessed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA's algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query.
We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence. Conclusions: Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high-throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side. Availability: Source code of the free software LocARNAscan 1.0 and supplementary data are available at http://www.bioinf.unileipzig.de/Software/LocARNAscan.
 "Hence, the grouping has to be performed according to sequence and structure. Various algorithmic approaches have been introduced to determine structural similarities and to derive consensus structure patterns for structural RNAs with low sequence identity (Bompfünewerer et al., 2008; Bradley et al., 2008; Gorodkin et al., 1997; Havgaard et al., 2005; Heyne et al., 2009; Höchsmann et al., 2003; Hofacker et al., 2004; Mathews and Turner, 2002; Sankoff, 1985; Siebert and Backofen, 2005; Will et al., 2007). A first approach toward the clustering of miRNAs has been achieved in Kaczkowski et al. (2009), where "
ABSTRACT: Motivation: The computational search for novel microRNA (miRNA) precursors often involves some sort of structural analysis with the aim of identifying which types of structures are prone to being recognized and processed by the cellular miRNA maturation machinery. A natural way to tackle this problem is to perform clustering over the candidate structures along with known miRNA precursor structures. Mixed clusters then allow the identification of candidates that are similar to known precursors. Given the large number of pre-miRNA candidates that can be identified in single-genome approaches, even after applying several filters for precursor robustness and stability, a conventional structural clustering approach is infeasible. Results: We propose a method to represent candidate structures in a feature space, which summarizes key sequence/structure characteristics of each candidate. We demonstrate that proximity in this feature space is related to sequence/structure similarity, and we select candidates that have a high similarity to known precursors. Additional filtering steps are then applied to further reduce the number of candidates to those with greater transcriptional potential. Our method is compared with another single-genome method (Triplet-SVM) on two datasets, showing better performance on one and comparable performance on the other, for larger training sets. Additionally, we show that our approach allows for a better interpretation of the results. Availability and Implementation: The MinDist method is implemented using Perl scripts and is freely available at http://www.cravela.org/?mindist=1. Contact: backofen@informatik.unifreiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.