Local sequence-structure motifs in RNA

Chair for Bioinformatics at the Institute of Computer Science, Friedrich-Schiller-Universitaet Jena, Ernst-Abbe-Platz 2, D-07743 Jena, Germany.
Journal of Bioinformatics and Computational Biology (Impact Factor: 0.78). 01/2005; 2(4):681-98. DOI: 10.1142/S0219720004000818
Source: PubMed


Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are still lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or more RNA molecules are often only spatially local, while possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to both sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality in RNA and prove that the defined locality corresponds to connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has a worst-case time complexity of O(n^2 · m^2 · max(n, m)) and a space complexity of only O(n · m). An implementation of our algorithm is available; its runtime is competitive with that of global sequence-structure alignment.
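The locality notion in the abstract can be made concrete with a small connectivity check. The sketch is illustrative only and is not the lssa algorithm. It assumes a single nested secondary structure in dot-bracket notation. It also interprets "atomic bonds" as backbone bonds between consecutive nucleotides and "non-atomic bonds" as base pairs; that reading is our assumption, not the paper's formal definition. The function names `parse_dot_bracket` and `is_connected_substructure` are hypothetical.

```python
from collections import deque


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), i < j, 0-based, from a nested dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def is_connected_substructure(positions: set[int], pairs: list[tuple[int, int]]) -> bool:
    """Check whether `positions` form a single connected component.

    Assumed edge types (our interpretation, see lead-in):
      * backbone bonds between consecutive positions k and k + 1
      * base-pair bonds (i, j) whose two ends both lie in `positions`
    """
    if not positions:
        return False
    neighbours: dict[int, list[int]] = {p: [] for p in positions}
    for p in positions:
        if p + 1 in positions:  # backbone bond
            neighbours[p].append(p + 1)
            neighbours[p + 1].append(p)
    for i, j in pairs:
        if i in positions and j in positions:  # base-pair bond
            neighbours[i].append(j)
            neighbours[j].append(i)
    # Breadth-first search from an arbitrary start position.
    start = next(iter(positions))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == positions
```

For the structure `((..))...((..))`, the positions {0, 1, 4, 5} are connected through the two outer base pairs even though they are not contiguous in sequence. Taking both hairpins without the linker positions 6 to 8 instead yields two separate components.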

    • "Using additional computational optimizations, the complexity of LocARNA could be reduced to quartic time and quadratic memory consumption, making it currently one of the most efficient versions of the Sankoff algorithm. Several improvements and extensions of LocARNA have been discussed before: to additionally reduce LocARNA’s runtime, ExpaRNA-P [36,37] utilizes a fast structural filtering method based on local structural motifs [38,39]; REAPR [40] introduces a multiple alignment-based banding method to realign eukaryotic whole-genome alignments based on RNA structure; recently, [41] introduces the very efficient LocARNA descendant SPARSE; and LocARNA-P [42] extends LocARNA by computing reliabilities, thus enabling new applications of Sankoff-style alignment. None of these approaches, however, addressed efficient scanning."
    ABSTRACT: Background: The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. This same problem also arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task? Results: Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA’s algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query.
We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence. Conclusions: Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high-throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side. Availability: Source code of the free software LocARNAscan 1.0 and supplementary data are available.
    Article · Apr 2013 · Algorithms for Molecular Biology
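LocARNAscan is described as a semi-global scanning variant of LocARNA's sequence-structure alignment. The sketch shows only the semi-global boundary conditions on plain sequences: the whole query must be aligned, while the matching window may start and end anywhere in the target. It ignores base pairing probabilities entirely and uses hypothetical linear scoring parameters, so it is not LocARNAscan's algorithm. The function name `scan_semiglobal` is likewise hypothetical. Keeping a single DP row gives O(m) memory for a target of length m, in the spirit of the low-memory goal mentioned in the abstract.

```python
def scan_semiglobal(query: str, target: str, match: int = 2,
                    mismatch: int = -1, gap: int = -2) -> tuple[int, int]:
    """Align all of `query` against the best-scoring window of `target`.

    Sequence-only sketch with hypothetical linear scores.
    Returns (best_score, end) where `end` is the exclusive end index of the
    best-matching target window.
    """
    m = len(target)
    # Row 0 is all zeros: the alignment may start anywhere in the target for free.
    prev = [0] * (m + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (m + 1)
        curr[0] = prev[0] + gap  # query prefix aligned entirely to gaps
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # (mis)match
                          prev[j] + gap,      # gap in target
                          curr[j - 1] + gap)  # gap in query
        prev = curr
    # Free end gaps in the target: take the best cell of the last row.
    best_end = max(range(m + 1), key=lambda j: prev[j])
    return prev[best_end], best_end
```

For example, `scan_semiglobal("ACGU", "GGACGUAA")` returns `(8, 6)`: the query matches the target window `target[2:6]` exactly.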
    • "Therefore, we need to optimize similarity under the additional constraint that the motifs should be matched (approximately) to each other. Another example is the enhancement of RNA (or even protein) comparison by employing knowledge of the structure of the RNAs and proteins [19] [10] [1] [21] [12]. Such tasks can get arbitrarily complicated when one wants to combine different kinds of such prior knowledge in one comparison of sequences."
    ABSTRACT: Aligning DNA and protein sequences is a core technique in molecular biology. Often, it is desirable to include partial prior knowledge and conditions in an alignment. Going beyond prior work, we aim at integrating such side constraints in free combination into alignment algorithms. The most common and successful technique for efficient alignment algorithms is dynamic programming (DP). However, a weakness of DP is that one cannot include additional constraints without specifically tailoring a new DP algorithm. Here, we discuss a declarative approach that is based on constraint techniques and show how it can be extended by formulating additional knowledge as constraints. We take special care to obtain the efficiency of DP for sequence alignment. This is achieved by careful modeling and applying proper solving strategies. Finally, we apply our method to scanning for RNA motifs in large sequences. This case study demonstrates how the new approach can be used in real biological problems. A prototypic implementation of the method is available.
    Article · Jun 2008 · Constraints
    • "These can be divided roughly into two main categories, depending on the exact notion of locality under consideration. The first category (Backofen and Will, 2004; Currey et al., 1998; Wang and Zhang, 2000) defines locality in the structural sense, thus allowing large gaps in the sequences not to be considered as relevant in the alignment score. The second category (Chen et al., 2002; Giegerich et al., 2003) defines locality in the sequential sense, thus extending the well-understood notion of locality in strings to RNA sequences."
    ABSTRACT: Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking affine gap penalties into account when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to consider not only sequential features but also structural features of the inspected molecule. This makes the computation more challenging and usually restricts comparison to small RNAs. In this paper we introduce two local metrics for comparing RNAs that extend the Smith-Waterman metric and its normalized version used for string comparison. We also present a global RNA alignment algorithm which handles affine gap penalties. Our global algorithm runs in O(m^2 n(1 + lg(n/m))) time, while our local algorithms run in O(m^2 n(1 + lg(n/m))) and O(n^2 m) time, respectively, where m ≤ n are the lengths of the two given RNAs. These time complexities are comparable to the time complexity of any known RNA alignment algorithm. Furthermore, both our global and local algorithms are robust to the selection of arbitrary scoring schemes.
    Article · Oct 2007 · Journal of Computational Biology
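For plain strings, Smith-Waterman local alignment with affine gap penalties can be computed with Gotoh's three-matrix recurrence. This is the string setting that the abstract extends to RNA. The sketch is sequence-only and does not implement the RNA algorithms of the cited paper. The function name and scoring parameters are hypothetical, and a gap of length k is assumed to cost gap_open + (k - 1) * gap_extend. As written, it runs in O(nm) time and O(nm) space.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(a: str, b: str, match: int = 2, mismatch: int = -1,
                          gap_open: int = -3, gap_extend: int = -1) -> int:
    """Best local alignment score of strings a and b with affine gaps (Gotoh).

    Hypothetical scores; a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # H: best local score of an alignment ending at (i, j), floored at 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # E: best score ending with b[j-1] aligned to a gap (horizontal move).
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    # F: best score ending with a[i-1] aligned to a gap (vertical move).
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these parameters, `smith_waterman_affine("ACGTACGT", "ACGTTTTACGT")` scores 11: eight matches (16) minus a single gap of length 3 (-3 - 1 - 1). Under affine costs, one long gap is cheaper than three separate ones.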