ProbKnot: Fast prediction of RNA secondary structure including pseudoknots

Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York 14642, USA.
RNA (Impact Factor: 4.94). 10/2010; 16(10):1870-80. DOI: 10.1261/rna.2125310
Source: PubMed


It is a significant challenge to predict RNA secondary structures including pseudoknots. Here, a new algorithm capable of predicting pseudoknots of any topology, ProbKnot, is reported. ProbKnot assembles maximum expected accuracy structures from computed base-pairing probabilities in O(N^2) time, where N is the length of the sequence. The performance of ProbKnot was measured by comparing predicted structures with known structures for a large database of RNA sequences with fewer than 700 nucleotides. The percentage of known pairs correctly predicted was 69.3%. Additionally, the percentage of predicted pairs in the known structure was 61.3%. This performance is the highest of four tested algorithms that are capable of pseudoknot prediction. The program is available for download at:
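
The abstract does not spell out how pairs are selected from the base-pairing probabilities. A minimal sketch in Python, assuming the rule that two nucleotides are paired when each is the other's most probable partner, might look like the following. The function name `mutual_max_pairs`, the `threshold` parameter, and the toy probability matrix are hypothetical and for illustration only; computing the probabilities themselves and any helix-level post-processing are omitted.

```python
from typing import List, Tuple


def mutual_max_pairs(prob: List[List[float]],
                     threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Pair i and j when each is the other's most probable partner.

    `prob` is assumed to be a symmetric N x N matrix of base-pairing
    probabilities with zeros on the diagonal (an assumption of this sketch).
    """
    n = len(prob)
    # One scan per row finds each nucleotide's best partner: O(N^2) in total.
    best = [max(range(n), key=lambda j: prob[i][j]) for i in range(n)]

    pairs = []
    for i in range(n):
        j = best[i]
        # Keep the pair once (i < j), only if the choice is mutual and the
        # probability exceeds the (hypothetical) threshold.
        if j > i and best[j] == i and prob[i][j] > threshold:
            pairs.append((i, j))
    return pairs


# Toy 4-nucleotide matrix, invented for illustration (not real data).
toy = [
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.05, 0.8],
    [0.9, 0.05, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.0],
]
print(mutual_max_pairs(toy))  # [(0, 2), (1, 3)]
```

The two pairs returned for the toy matrix cross each other (0 < 1 < 2 < 3), which is the simplest pseudoknot pattern. Because nothing in this selection rule forbids crossing pairs, and each matrix entry is examined a constant number of times, the sketch is consistent with the "any topology" and O(N^2) statements in the abstract.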



Available from: Stanislav Bellaousov, Sep 30, 2015
  • Source
    • "Examples of these methods include quasi-Monte Carlo searches and genetic algorithms. These methods cannot guarantee the most optimal structure and cannot determine the accuracy of a given prediction toward optimality [9] [10] [11] [12] [13]. A different approach to pseudoknot prediction adopts dynamic programming to predict the tractable subclass of pseudoknots based on complex thermodynamic models in "
    ABSTRACT: RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. For kinetic reasons, real RNA secondary structures often correspond to local rather than global optima. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study reports that RNA folding accords with the golden mean, based on a statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND that are validated by NMR or X-ray. The length ratios of domains in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length. These points are the important golden-section points of the sequence. With this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding.
    BioMed Research International 07/2014; 2014:690340. DOI:10.1155/2014/690340 · 2.71 Impact Factor
  • Source
    • "Method 3: First run SimFold on S and G to obtain result G′—a pseudoknot-free structure that contains G. Then let Gupdated be the secondary structure of S containing the relaxed stems of G′ that include the base pairs of G. By a relaxed stem, we mean a secondary structure containing stacked base pairs, bulges of size 1 and internal loops of maximum size of 3 (i.e., either the symmetric loop of 1×1 or the non-symmetric loop of 1×2 or 2×1 but no other loop types; this is motivated by common practice [65]). Then run method 2 on S and Gupdated, and store the result. "
    ABSTRACT: Background: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results: We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions: Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at
    BMC Bioinformatics 05/2014; 15(1):147. DOI:10.1186/1471-2105-15-147 · 2.58 Impact Factor
  • Source
    • "It has been expanded to include methods for predicting bimolecular structure (12), conserved structures in multiple homologs (21–24) and siRNA design (9). Several methods are available for predicting structures for a single sequence, including maximum expected accuracy (25), stochastic sampling (26), exhaustive traceback (27) and pseudoknot prediction (28). Graphical user interfaces are provided for Microsoft Windows, Macintosh OS-X and Linux. "
    ABSTRACT: RNAstructure is a software package for RNA secondary structure prediction and analysis. This contribution describes a new set of web servers to provide its functionality. The web server offers RNA secondary structure prediction, including free energy minimization, maximum expected accuracy structure prediction and pseudoknot prediction. Bimolecular secondary structure prediction is also provided. Additionally, the server can predict secondary structures conserved in either two homologs or more than two homologs. Folding free energy changes can be predicted for a given RNA structure using nearest neighbor rules. Secondary structures can be compared using circular plots or the scoring methods, sensitivity and positive predictive value. Additionally, structure drawings can be rendered as SVG, postscript, jpeg or pdf. The web server is freely available for public use at:
    Nucleic Acids Research 04/2013; 41(Web Server issue). DOI:10.1093/nar/gkt290 · 9.11 Impact Factor
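
Sensitivity and positive predictive value, the two scoring measures named in the abstracts above, can be illustrated with a short Python sketch. It assumes exact matching of base pairs; published benchmarks may use a more lenient matching rule, which is not modeled here. The function name `score_pairs` and the example pair lists are hypothetical.

```python
from typing import Iterable, Tuple

Pair = Tuple[int, int]


def score_pairs(known: Iterable[Pair],
                predicted: Iterable[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for a predicted structure.

    sensitivity = fraction of known pairs that were predicted
    PPV         = fraction of predicted pairs that are in the known structure
    Pairs are compared exactly, regardless of the order of the two indices
    (an assumption of this sketch).
    """
    known_set = {tuple(sorted(p)) for p in known}
    pred_set = {tuple(sorted(p)) for p in predicted}
    correct = len(known_set & pred_set)

    # Guard against empty structures to avoid division by zero.
    sensitivity = correct / len(known_set) if known_set else 0.0
    ppv = correct / len(pred_set) if pred_set else 0.0
    return sensitivity, ppv


# Invented example: 3 known pairs, 4 predicted pairs, 2 in common.
known_pairs = [(0, 10), (1, 9), (2, 8)]
predicted_pairs = [(0, 10), (1, 9), (3, 7), (4, 6)]
sens, ppv = score_pairs(known_pairs, predicted_pairs)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")  # sensitivity=0.667, PPV=0.500
```

Reporting both numbers matters because a method can raise sensitivity simply by predicting more pairs, at the cost of a lower positive predictive value.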