Identifying complete RNA structural ensembles including pseudoknots.
ABSTRACT The close relationship between RNA structure and function underlines the significance of accurately predicting RNA structures from sequence information. Structural topologies such as pseudoknots are of particular interest due to their ubiquity and direct involvement in RNA function, but identifying pseudoknots is a computationally challenging problem and existing heuristic approaches usually perform poorly for RNA sequences of even a few hundred bases. We survey the performance of pseudoknot prediction methods on a data set of full-length RNA sequences representing varied sequence lengths, and biological RNA classes such as RNase P RNA, Group I Intron, tmRNA and tRNA. Pseudoknot prediction methods are compared with minimum free energy and suboptimal secondary structure prediction methods in terms of correct base-pairs, stems and pseudoknots and we find that the ensemble of suboptimal structure predictions succeeds in identifying correct structural elements in RNA that are usually missed in MFE and pseudoknot predictions. We propose a strategy to identify a comprehensive set of non-redundant stems in the suboptimal structure space of a RNA molecule by applying heuristics that reduce the structural redundancy of the predicted suboptimal structures by merging slightly varying stems that are predicted to form in local sequence regions. This reduced-redundancy set of structural elements consistently outperforms more specialized approaches.in data sets. Thus, the suboptimal folding space can be used to represent the structural diversity of an RNA molecule more comprehensively than optimal structure prediction approaches alone.