Alignment of RNA base pairing probability matrices

Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstrasse 17, Vienna, Austria.
Bioinformatics (Impact Factor: 4.98). 10/2004; 20(14):2222-7. DOI: 10.1093/bioinformatics/bth229
Source: PubMed


Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics.
We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. We do not attempt to solve the folding and the alignment problem simultaneously, as Sankoff's classical algorithm does. Instead, we use McCaskill's approach to compute base pairing probability matrices, which effectively incorporate the information on the energetics of each sequence. A novel, simplified variant of Sankoff's algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment.
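McCaskill's algorithm derives base pairing probabilities from the partition function of the full nearest-neighbour energy model. The Python sketch below illustrates the same inside/outside idea under a deliberately simplified, hypothetical energy model, so it is not the Turner model or the ViennaRNA implementation. In this toy model, every allowed Watson-Crick or GU pair contributes the same energy `pair_energy`, and a hairpin must enclose at least `min_loop` bases. `Q(i, j)` is the partition function of `seq[i..j]`. `P[i][k]` is then filled from the outermost pairs inward. The names `pair_probabilities`, `pair_energy`, `kT` and `min_loop` are illustrative assumptions. `kT = 0.6` is only a placeholder close to RT at 37 °C in kcal/mol.

```python
import math
from functools import lru_cache
from typing import List

# Hypothetical toy model: canonical Watson-Crick and GU pairs only.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def pair_probabilities(
    seq: str, pair_energy: float = -1.0, kT: float = 0.6, min_loop: int = 3
) -> List[List[float]]:
    """Return P with P[i][k] = Boltzmann probability that base i pairs with k (i < k).

    Toy energy model (an assumption, not the Turner model): every base pair
    contributes pair_energy, and a hairpin encloses at least min_loop bases.
    """
    n = len(seq)
    b = math.exp(-pair_energy / kT)  # Boltzmann weight of a single base pair

    def can_pair(i: int, k: int) -> bool:
        return k - i > min_loop and (seq[i], seq[k]) in ALLOWED_PAIRS

    @lru_cache(maxsize=None)
    def Q(i: int, j: int) -> float:
        """Partition function over all nested structures on seq[i..j] (inclusive)."""
        if j < i:
            return 1.0  # empty interval: only the empty structure
        total = Q(i + 1, j)  # case 1: base i is unpaired
        for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
            if can_pair(i, k):
                total += b * Q(i + 1, k - 1) * Q(k + 1, j)
        return total

    Z = Q(0, n - 1)
    P = [[0.0] * n for _ in range(n)]
    # Outside step: visit pairs by decreasing span, so every pair (p, q)
    # that could enclose (i, k) already has its probability.
    for span in range(n - 1, min_loop, -1):
        for i in range(n - span):
            k = i + span
            if not can_pair(i, k):
                continue
            inner = b * Q(i + 1, k - 1)
            # (i, k) enclosed by no pair: independent structures left and right
            prob = Q(0, i - 1) * inner * Q(k + 1, n - 1) / Z
            # (i, k) enclosed, with (p, q) as its innermost enclosing pair
            for p in range(i):
                for q in range(k + 1, n):
                    if P[p][q] > 0.0:
                        prob += P[p][q] * Q(p + 1, i - 1) * inner * Q(k + 1, q - 1) / Q(p + 1, q - 1)
            P[i][k] = prob
    return P
```

For `"GAAAC"`, only the G-C pair (0, 4) is possible, so `P[0][4]` equals b/(1 + b). The outside pass here costs O(n^4) and is meant for short toy sequences only. McCaskill's algorithm organizes the computation more efficiently and uses the full loop-based energy model. For real data, base pairing probability matrices are usually taken from an established tool such as ViennaRNA's `RNAfold` with the `-p` option.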
The programs pmcomp and pmmulti described in this contribution are implemented in Perl and can be downloaded, together with the example datasets, from [URL]. A web server is available at [URL].
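The second step described above, a Sankoff-style comparison of two base pairing probability matrices, can be illustrated with the minimal Python sketch below. It is not PMcomp's actual recursion or scoring. The function name `align_pair_probabilities` and all scoring choices are hypothetical:

- the arc weight `P_A(h, j) + P_B(q, l)`;
- the probability cutoff `p_min`;
- the `gap`, `match` and `mismatch` scores.

`S(i, j, k, l)` holds the best subscore for `seq_a[i:j]` against `seq_b[k:l]`. The last positions of both ranges are either gapped, aligned as unpaired bases, or treated as the right ends of two matched arcs. Only the score is returned and the traceback is omitted. The sketch runs in O(n^3 m^3) time and O(n^2 m^2) memory, so it suits only short sequences.

```python
from functools import lru_cache
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def align_pair_probabilities(
    seq_a: str,
    seq_b: str,
    p_a: Matrix,
    p_b: Matrix,
    gap: float = -1.0,
    match: float = 0.5,
    mismatch: float = 0.0,
    p_min: float = 0.01,
) -> float:
    """Score of the best simultaneous sequence/structure alignment of A and B.

    S(i, j, k, l) is the best subscore for seq_a[i:j] against seq_b[k:l]
    (half-open ranges). All scoring parameters are illustrative assumptions.
    """

    def substitution(x: str, y: str) -> float:
        return match if x == y else mismatch

    def arc_weight(h: int, j: int, q: int, l: int) -> Optional[float]:
        """Weight for matching arc (h, j) of A with arc (q, l) of B, or None."""
        pa, pb = p_a[h][j], p_b[q][l]
        if pa < p_min or pb < p_min:
            return None  # at least one pair is too improbable to be matched
        return pa + pb  # hypothetical additive weight

    @lru_cache(maxsize=None)
    def S(i: int, j: int, k: int, l: int) -> float:
        if i == j:
            return gap * (l - k)  # rest of B aligned against gaps
        if k == l:
            return gap * (j - i)  # rest of A aligned against gaps
        ja, lb = j - 1, l - 1  # last positions of the two ranges
        best = max(
            S(i, ja, k, l) + gap,  # seq_a[ja] against a gap
            S(i, j, k, lb) + gap,  # seq_b[lb] against a gap
            S(i, ja, k, lb) + substitution(seq_a[ja], seq_b[lb]),  # aligned, unpaired
        )
        # seq_a[ja] closes an arc (h, ja) and seq_b[lb] closes an arc (q, lb):
        # match the two arcs, then score the regions outside and inside them.
        for h in range(i, ja):
            for q in range(k, lb):
                w = arc_weight(h, ja, q, lb)
                if w is not None:
                    best = max(best, S(i, h, k, q) + S(h + 1, ja, q + 1, lb) + w)
        return best

    return S(0, len(seq_a), 0, len(seq_b))
```

Usage example with two sequences that share no matching bases but a common, likely pair (0, 2):

```python
P = [[0.0, 0.0, 0.9], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(align_pair_probabilities("GAC", "CUG", P, P))  # 1.8
```

Without the matrices, the best alignment of `"GAC"` and `"CUG"` scores 0. With them, the matched arcs contribute 0.9 + 0.9. This is the kind of structure-driven signal the abstract relies on when sequence similarity is low.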

Available from: Peter F. Stadler, Oct 10, 2015
    • "…we start by introducing dynamic programming matrices S and D in analogy to the dynamic programming matrices of LocARNA [22] and PMcomp [34]. Thus, we define the entry S_{i,j;k,l} as the best subscore for i…j and k…l."
    ABSTRACT: Background: The search for distant homologs has become an important issue in genome annotation. Divergent homologs that have lost recognizable sequence similarity pose a particular difficulty. The same problem arises in recognizing novel members of large RNA classes, such as snoRNAs or microRNAs, that consist of families unrelated by common descent. Current homology search tools for structured RNAs either rely entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task? Results: Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can optionally be added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine, we devised a semi-global scanning variant of LocARNA's algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data, we observe that including thermodynamic stability helps, but only when the query carries extremely little sequence information. We further observe that sensitivity is limited mainly by the accuracy of the predicted local structures of the target sequence. Conclusions: We demonstrate that a purely structure-based homology search is feasible in principle. It is nevertheless unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will, however, profit from high-throughput methods for determining RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side. Availability: Source code of the free software LocARNAscan 1.0 and supplementary data are available at [URL].
    Algorithms for Molecular Biology 04/2013; 8(1):14. DOI:10.1186/1748-7188-8-14 · 1.46 Impact Factor
    • "Hence, the grouping has to be performed according to sequence and structure. Various algorithmic approaches have been introduced to determine structural similarities and to derive consensus structure patterns for structural RNAs with low sequence identity (Bompfünewerer et al., 2008; Bradley et al., 2008; Gorodkin et al., 1997; Havgaard et al., 2005; Heyne et al., 2009; Höchsmann et al., 2003; Hofacker et al., 2004; Mathews and Turner, 2002; Sankoff, 1985; Siebert and Backofen, 2005; Will et al., 2007). A first approach toward the clustering of miRNAs has been achieved in Kaczkowski et al. (2009), where …"
    ABSTRACT: Motivation: The computational search for novel microRNA (miRNA) precursors often involves some form of structural analysis. Its aim is to identify which types of structure are likely to be recognized and processed by the cellular miRNA-maturation machinery. A natural way to tackle this problem is to cluster the candidate structures together with known miRNA precursor structures. Mixed clusters then allow the identification of candidates that are similar to known precursors. Single-genome approaches can yield a large number of pre-miRNA candidates, even after several filters for precursor robustness and stability are applied. For that reason, a conventional structural clustering approach is infeasible. Results: We propose a method to represent candidate structures in a feature space, which summarizes key sequence/structure characteristics of each candidate. We demonstrate that proximity in this feature space is related to sequence/structure similarity, and we select candidates that have a high similarity to known precursors. Additional filtering steps are then applied to further reduce the number of candidates to those with greater transcriptional potential. We compare our method with another single-genome method (TripletSVM) on two datasets. For larger training sets, it performs better on one and comparably on the other. Additionally, we show that our approach allows for a better interpretation of the results. Availability and Implementation: The MinDist method is implemented as Perl scripts and is freely available at [URL]. Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2012; 28(23). DOI:10.1093/bioinformatics/bts574 · 4.98 Impact Factor
    • "The Sankoff-simplification introduced by PMcomp (5) significantly reduces the run-time by using a simplified energy model based on base pair probability matrices. Due to this idea, an alignment is obtained in two steps. "
    ABSTRACT: Due to recent algorithmic progress, tools for the gold standard of comparative RNA analysis, namely Sankoff-style simultaneous alignment and folding, are now readily applicable. Such approaches, however, compare RNAs with respect to a simultaneously predicted, single, nested consensus structure. We introduce a web server that makes multiple alignment of RNAs available in cases where this limitation of the standard approach is critical. It provides a complete and convenient interface to the RNA structure alignment tool 'CARNA'. This tool uniquely supports RNAs with multiple conserved structures per RNA and aligns pseudoknots intrinsically. These features are highly desirable for aligning riboswitches, RNAs with conserved folding pathways, or pseudoknots. We represent structural input and output information as base pair probability dot plots. This allows great flexibility in the input, ranging from fixed structures to structure ensembles, and enables immediate visual analysis of the results. In contrast to conventional Sankoff-style approaches, 'CARNA' optimizes all structural similarities in the input simultaneously, for example across an entire RNA structure ensemble. Even compared with already costly Sankoff-style alignment, 'CARNA' solves an intrinsically much harder problem by applying advanced, constraint-based algorithmic techniques. 'CARNA' is specialized to the alignment of RNAs with several conserved structures. Nevertheless, a Bralibase 2.1 benchmark shows that its performance on RNAs in general is on par with state-of-the-art general-purpose RNA alignment tools. The web server is freely available at [URL].
    Nucleic Acids Research 06/2012; 40(Web Server issue):W49-53. DOI:10.1093/nar/gks491 · 9.11 Impact Factor