ProbKnot: Fast prediction of RNA secondary structure including pseudoknots

Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York 14642, USA.
RNA (Impact Factor: 4.62). 10/2010; 16(10):1870-80. DOI: 10.1261/rna.2125310
Source: PubMed

ABSTRACT It is a significant challenge to predict RNA secondary structures including pseudoknots. Here, a new algorithm capable of predicting pseudoknots of any topology, ProbKnot, is reported. ProbKnot assembles maximum expected accuracy structures from computed base-pairing probabilities in O(N^2) time, where N is the length of the sequence. The performance of ProbKnot was measured by comparing predicted structures with known structures for a large database of RNA sequences with fewer than 700 nucleotides. The percentage of known pairs correctly predicted was 69.3%. Additionally, the percentage of predicted pairs in the known structure was 61.3%. This performance is the highest of four tested algorithms that are capable of pseudoknot prediction. The program is available for download at:
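The abstract does not spell out how structures are assembled from the pairing probabilities. As a minimal sketch, assume a pair i-j is accepted when each nucleotide is the other's most probable partner; this needs only one pass over the N x N probability matrix and places no restriction on crossing pairs, so pseudoknots can appear. The two reported percentages are then computed as the fraction of known pairs that were predicted and the fraction of predicted pairs that are known. All function names, the toy matrix, and the exact-match scoring are illustrative assumptions, not the ProbKnot program's interface.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def symmetric_matrix(n: int, entries: Dict[Pair, float]) -> List[List[float]]:
    """Build a symmetric n x n pairing-probability matrix from sparse entries."""
    prob = [[0.0] * n for _ in range(n)]
    for (i, j), p in entries.items():
        prob[i][j] = p
        prob[j][i] = p
    return prob


def mutual_max_pairs(prob: List[List[float]], threshold: float = 0.0) -> Set[Pair]:
    """Assumed rule: keep (i, j) when j is i's best partner and i is j's best partner."""
    n = len(prob)
    # Most probable partner of every nucleotide: one scan of the matrix, O(N^2).
    best = [max(range(n), key=lambda j, i=i: prob[i][j]) for i in range(n)]
    pairs: Set[Pair] = set()
    for i in range(n):
        j = best[i]
        if i < j and best[j] == i and prob[i][j] > threshold:
            pairs.add((i, j))
    return pairs


def is_crossing(a: Pair, b: Pair) -> bool:
    """Two pairs cross (a pseudoknot) when exactly one end of b lies inside a."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def sensitivity(predicted: Set[Pair], known: Set[Pair]) -> float:
    """Fraction of known pairs that were predicted (exact match only)."""
    return len(predicted & known) / len(known) if known else 0.0


def ppv(predicted: Set[Pair], known: Set[Pair]) -> float:
    """Fraction of predicted pairs that occur in the known structure."""
    return len(predicted & known) / len(predicted) if predicted else 0.0


# Toy probabilities (made up for illustration, not computed from a real sequence).
toy = symmetric_matrix(8, {(0, 4): 0.9, (2, 6): 0.8, (1, 4): 0.4, (1, 5): 0.3})
predicted = mutual_max_pairs(toy)
known = {(0, 4), (2, 6), (3, 7)}
print(sorted(predicted), sensitivity(predicted, known), ppv(predicted, known))
```

In the toy matrix, nucleotide 1 prefers 4, but 4 prefers 0, so 1 stays unpaired; the two accepted pairs cross each other, which a nested-only dynamic programming fold could not return.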

  • ABSTRACT: Artificial gene synthesis requires consideration of nucleotide sequence development as well as long DNA molecule assembly protocols. The nucleotide sequence of the molecule must meet many conditions including particular preferences of the host organism for certain codons, avoidance of specific regulatory subsequences, and a lack of secondary structures that inhibit expression. The chemical synthesis of DNA molecules has limitations in terms of strand length; thus, the creation of artificial genes requires the assembly of long DNA molecules from shorter fragments. In the approach presented, the algorithm and the computer program address both tasks: developing the optimal nucleotide sequence to encode a given peptide for a given host organism and determining the long DNA assembly protocol. These tasks are closely connected; a change in codon usage may lead to changes in the optimal assembly protocol, and the lack of a simple assembly protocol may be addressed by changing the nucleotide sequence. The computer program presented in this study was tested with real data from an experiment in a wet biological laboratory to synthesize a peptide. The benefit of the presented algorithm and its application is the shorter time, compared to polymerase cycling assembly, needed to produce a ready synthetic gene.
    BioMed Research International 01/2015; 2015(2015):1-8. DOI:10.1155/2015/413262 · 2.71 Impact Factor
  • ABSTRACT: The ongoing effort to detect and characterize physical entanglement in biopolymers has so far established that knots are present in many globular proteins and also abound in viral DNA packaged inside bacteriophages. RNA molecules, however, have not yet been systematically screened for the occurrence of physical knots. We have accordingly undertaken the systematic profiling of the several thousand RNA structures present in the Protein Data Bank (PDB). The search identified no more than three deeply knotted RNA molecules. These entries are rRNAs of about 3,000 nt solved by cryo-EM. Their genuine knotted state is, however, doubtful based on the detailed structural comparison with homologs of higher resolution, which are all unknotted. Compared with the case of proteins and viral DNA, the observed incidence of knots in available RNA structures is, therefore, practically negligible. This fact suggests that either evolutionary selection or thermodynamic and kinetic folding mechanisms act toward minimizing the entanglement of RNA to an extent that is unparalleled by other types of biomolecules. A possible general strategy for designing synthetic RNA sequences capable of self-tying in a twist-knot fold is finally proposed.
    Proceedings of the National Academy of Sciences 02/2015; DOI:10.1073/pnas.1418445112 · 9.81 Impact Factor
  • ABSTRACT: Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
    Nucleic Acids Research 11/2014; 42(22). DOI:10.1093/nar/gku1172 · 8.81 Impact Factor
