Alignment of RNA base pairing probability matrices.

Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstrasse 17, Vienna, Austria.
Bioinformatics (Impact Factor: 5.32). 10/2004; 20(14):2222-7. DOI: 10.1093/bioinformatics/bth229
Source: PubMed

ABSTRACT Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics.
We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and the alignment problem simultaneously, as in the classical Sankoff algorithm, we use McCaskill's approach to compute base pairing probability matrices, which effectively incorporate the information on the energetics of each sequence. A novel, simplified variant of Sankoff's algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment.
The programs pmcomp and pmmulti described in this contribution are implemented in Perl and can be downloaded together with the example datasets. A web server is also available.
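The idea of extracting a maximum-weight common structure from two base pairing probability matrices can be illustrated with a small recursion. What follows is a minimal sketch under stated assumptions, not the pmcomp implementation: the function name `align_pair_matrices`, the scoring parameters (`gap`, `match`, `mismatch`, `p_min`) and the use of raw pairing probabilities as pair weights are all hypothetical choices made for illustration. The matrices are passed as dictionaries mapping a position pair to its probability, as they might be obtained from a McCaskill-style partition function computation.

```python
from functools import lru_cache


def align_pair_matrices(a, b, pa, pb, gap=-1.0, match=1.0, mismatch=0.0, p_min=0.1):
    """Score the best joint alignment and common structure of a and b.

    pa and pb map (i, j) with i < j to a base pairing probability.
    All scoring parameters are illustrative assumptions.
    """
    # Ignore unlikely pairs; this keeps the inner search small.
    pairs_a = {ij: p for ij, p in pa.items() if p >= p_min}
    pairs_b = {kl: p for kl, p in pb.items() if p >= p_min}

    @lru_cache(maxsize=None)
    def s(i, j, k, l):
        # Best score for aligning a[i..j] with b[k..l] (inclusive bounds).
        if i > j:  # a is exhausted: the rest of b aligns to gaps
            return gap * max(0, l - k + 1)
        if k > l:  # b is exhausted: the rest of a aligns to gaps
            return gap * (j - i + 1)
        best = max(
            s(i + 1, j, k, l) + gap,  # a[i] against a gap
            s(i, j, k + 1, l) + gap,  # b[k] against a gap
            s(i + 1, j, k + 1, l) + (match if a[i] == b[k] else mismatch),
        )
        # Match a base pair (i, h) in a with a base pair (k, q) in b.
        for h in range(i + 1, j + 1):
            if (i, h) not in pairs_a:
                continue
            for q in range(k + 1, l + 1):
                if (k, q) not in pairs_b:
                    continue
                enclosed = s(i + 1, h - 1, k + 1, q - 1)
                paired = enclosed + pairs_a[(i, h)] + pairs_b[(k, q)]
                best = max(best, paired + s(h + 1, j, q + 1, l))
        return best

    return s(0, len(a) - 1, 0, len(b) - 1)


if __name__ == "__main__":
    # Two hairpin-like toy sequences whose closing pair differs in sequence.
    pa = {(0, 4): 0.75}
    pb = {(0, 4): 0.75}
    print(align_pair_matrices("GAAAC", "CAAAG", pa, pb))
```

In this toy case the closing pair G-C in one sequence and C-G in the other contributes nothing to a sequence-only score, but the shared pairing probability lets the recursion align the two pairs with each other. The folding energetics enter only through the precomputed matrices, which is the simplification over solving folding and alignment simultaneously.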

  • ABSTRACT: The current pairwise RNA (secondary) structural alignment algorithms are based on Sankoff's dynamic programming algorithm from 1985. Sankoff's algorithm requires O(N^6) time and O(N^4) space, where N denotes the length of the compared sequences, and thus its applicability is very limited. The current literature offers many heuristics for speeding up Sankoff's alignment process, some making restrictive assumptions on the length or the shape of the RNA substructures. We show how to speed up Sankoff's algorithm in practice via non-heuristic methods, without compromising optimality. Our analysis shows that the expected time complexity of the new algorithm is O(N^4 σ(N)), where σ(N) converges to O(N), assuming a standard polymer folding model which was supported by experimental analysis. Hence, our algorithm speeds up Sankoff's algorithm by a linear factor on average. In simulations, our algorithm speeds up computation by a factor of 3-12 for sequences of length 25-250. Code and data sets are available, upon request.
    Journal of computational biology: a journal of computational molecular cell biology 08/2010; 17(8):1051-65. · 1.69 Impact Factor
  • ABSTRACT: MOTIVATION: The calculation of reliable alignments for structured RNA is still considered an open problem. One approach is to incorporate secondary structure information into the optimisation criteria by using a weighted sum of sequence and structure components as the objective function. Since it is not clear how to choose the weighting parameters, we use multi-objective optimisation to calculate a set of Pareto-optimal RNA sequence-structure alignments. The solutions in this set then represent all possible trade-offs between the different objectives, independent of any prior weighting. RESULTS: We present a practical multi-objective dynamic programming algorithm, a new method for calculating the set of Pareto-optimal solutions to the pairwise RNA sequence-structure alignment problem. In selected examples, we show the usefulness of this approach and its advantages over state-of-the-art single-objective algorithms. AVAILABILITY: The source code of our software (ISO C++11) is freely available and is licensed under the GNU GPLv3. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 04/2013; · 5.47 Impact Factor
  • ABSTRACT: Motivation: There is increasing evidence of pervasive transcription, resulting in hundreds of thousands of ncRNAs of unknown function. Standard computational analysis tasks for inferring functional annotations like clustering require fast and accurate RNA comparisons based on sequence and structure similarity. The gold standard for the latter is Sankoff's algorithm [3], which simultaneously aligns and folds RNAs. Because of its extreme time complexity of O(n^6), numerous faster "Sankoff-style" approaches have been suggested. Several such approaches introduce heuristics based on sequence alignment, which compromises the alignment quality for RNAs with sequence identities below 60% [1]. Avoiding such heuristics, as e.g. in LocARNA [4], has been assumed to prohibit time complexities better than O(n^4), which strongly limits large-scale applications.
    Proceedings of the 17th international conference on Research in Computational Molecular Biology; 04/2013
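
The notion of a Pareto-optimal set used in the multi-objective alignment abstract above can be made concrete with a small filter. This is a minimal sketch, assuming each candidate alignment has already been scored as a (sequence score, structure score) tuple where larger is better. The function name `pareto_front` and the example scores are hypothetical, and the brute-force comparison shown here is not the dynamic programming algorithm that abstract describes.

```python
def pareto_front(points):
    """Return the points that no other point dominates (all objectives maximized)."""

    def dominates(p, q):
        # p dominates q if it is at least as good everywhere and better somewhere.
        at_least_as_good = all(x >= y for x, y in zip(p, q))
        strictly_better = any(x > y for x, y in zip(p, q))
        return at_least_as_good and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (sequence score, structure score) pairs for five alignments.
    scores = [(5, 1), (4, 3), (3, 3), (2, 6), (1, 2)]
    print(pareto_front(scores))
```

Each surviving point is a different trade-off between the two objectives, which is why no weighting between sequence and structure has to be fixed in advance.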
