Alignment of RNA base pairing probability matrices.

Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstrasse 17, Vienna, Austria.
Bioinformatics (Impact Factor: 4.62). 10/2004; 20(14):2222-7. DOI: 10.1093/bioinformatics/bth229
Source: PubMed

ABSTRACT Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics.
We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and the alignment problem simultaneously, as in the classical Sankoff algorithm, we use McCaskill's approach to compute base pairing probability matrices, which effectively incorporate the information on the energetics of each sequence. A novel, simplified variant of Sankoff's algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment.
The programs pmcomp and pmmulti described in this contribution are implemented in Perl and can be downloaded together with the example datasets from A web server is available at
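To make the central object of the abstract concrete, here is a minimal Python sketch of what a base pairing probability matrix contains. It is not the pmcomp/pmmulti code and not McCaskill's algorithm: McCaskill's approach obtains these probabilities by dynamic programming over a realistic energy model, whereas this sketch enumerates every secondary structure of a short sequence by brute force. The energy model (a fixed bonus per base pair), the constants PAIR_ENERGY, RT and MIN_LOOP, and the function names are all assumptions made for illustration.

```python
import math

# Toy assumptions (not from the paper): canonical and GU pairs only,
# every base pair contributes the same energy, and a hairpin loop must
# enclose at least MIN_LOOP unpaired bases.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3
PAIR_ENERGY = -1.0  # hypothetical energy per base pair
RT = 0.6            # hypothetical thermal energy, same units as PAIR_ENERGY


def structures(seq, i, j):
    """Yield every non-crossing set of base pairs on seq[i..j] (inclusive)."""
    if j - i <= MIN_LOOP:
        # Segment is empty or too short to hold any pair.
        yield frozenset()
        return
    # Case 1: position i stays unpaired.
    yield from structures(seq, i + 1, j)
    # Case 2: position i pairs with some k; the rest splits into the
    # region enclosed by (i, k) and the region to the right of k.
    for k in range(i + MIN_LOOP + 1, j + 1):
        if (seq[i], seq[k]) in PAIRS:
            for inner in structures(seq, i + 1, k - 1):
                for outer in structures(seq, k + 1, j):
                    yield inner | outer | {(i, k)}


def pair_probabilities(seq):
    """Return the symmetric matrix P where P[i][k] is the Boltzmann-weighted
    fraction of structures in which positions i and k are paired."""
    n = len(seq)
    probs = [[0.0] * n for _ in range(n)]
    partition_function = 0.0
    for structure in structures(seq, 0, n - 1):
        weight = math.exp(-PAIR_ENERGY * len(structure) / RT)
        partition_function += weight
        for i, k in structure:
            probs[i][k] += weight
            probs[k][i] += weight
    return [[value / partition_function for value in row] for row in probs]


matrix = pair_probabilities("GGGAAACCC")
print(round(matrix[0][8], 3))  # probability that the first G pairs with the last C
```

Brute-force enumeration grows exponentially with sequence length, so this is only usable for very short sequences. It is meant to show what the matrix entries mean: each entry summarizes the whole structure ensemble of one sequence, which is the information the alignment step then compares between sequences.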

  • ABSTRACT: Functional RNA molecules often are conserved in their secondary structure rather than in their primary sequence. To assess functional similarity, primary sequence as well as secondary structure information needs to be taken into account. Based on a Sankoff-style algorithm (cf. [1]) for sequence-structure alignment, we developed a method which results in a set of Pareto-optimal alignments, so that a prior weighting of the structure and alignment objectives is not necessary. We also show that a conventional algorithm which calculates an optimal alignment regarding a single objective function may not always be able to find all biologically relevant secondary structures.
  • ABSTRACT: Motivation: There is increasing evidence of pervasive transcription, resulting in hundreds of thousands of ncRNAs of unknown function. Standard computational analysis tasks for inferring functional annotations like clustering require fast and accurate RNA comparisons based on sequence and structure similarity. The gold standard for the latter is Sankoff's algorithm [3], which simultaneously aligns and folds RNAs. Because of its extreme time complexity of O(n^6), numerous faster "Sankoff-style" approaches have been suggested. Several such approaches introduce heuristics based on sequence alignment, which compromises the alignment quality for RNAs with sequence identities below 60% [1]. Avoiding such heuristics, as e.g. in LocARNA [4], has been assumed to prohibit time complexities better than O(n^4), which strongly limits large-scale applications.
    Proceedings of the 17th international conference on Research in Computational Molecular Biology; 04/2013
  • ABSTRACT: Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
    Nucleic Acids Research 11/2014; · 8.81 Impact Factor
