Multi-Objective Genetic Algorithm for Pseudoknotted RNA Sequence Design

Graduate School of Science and Technology, Hirosaki University Hirosaki, Japan.
Frontiers in Genetics 04/2012; 3:36. DOI: 10.3389/fgene.2012.00036
Source: PubMed

ABSTRACT: RNA inverse folding is a computational technique for designing RNA sequences that fold into a user-specified secondary structure. Although pseudoknots are functionally important motifs in RNA structures, fewer reports have been published on the inverse folding of pseudoknotted RNAs than on pseudoknot-free RNA design. In this paper, we present a new version of MODENA, the multi-objective genetic algorithm (MOGA) that we previously proposed for pseudoknot-free RNA inverse folding. In the new version of MODENA, (i) a new crossover operator is implemented and (ii) the pseudoknot prediction methods IPknot and HotKnots are used to evaluate the designed RNA sequences, which makes the inverse folding of pseudoknotted RNAs possible. The new version of MODENA with the new crossover operator was benchmarked on a dataset of natural pseudoknotted RNA secondary structures, and we found that MODENA can successfully design more pseudoknotted RNAs than the other pseudoknot design algorithm. In addition, a sequence constraint function newly implemented in this version of MODENA was tested by designing RNA sequences that fold into the pseudoknotted structure of a hepatitis delta virus ribozyme; as a result, we successfully designed eight RNA sequences. The new version of MODENA is downloadable from http://rna.eit.hirosaki-u.ac.jp/modena/.
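
As a rough illustration of the ideas named in the abstract, the sketch below parses a pseudoknotted structure written in extended dot-bracket notation, measures how far a predicted structure is from the target, and keeps the non-dominated candidates under two minimised objectives. It is not MODENA's implementation: the function names, the choice of objectives (base-pair distance and a free-energy value) and the numbers in the usage lines are assumptions made for illustration only, and in practice the predicted structures would come from a folding tool such as IPknot or HotKnots.

```python
from typing import List, Sequence, Set, Tuple

# Bracket families; the second and third families are what allow the
# notation to express crossing (pseudoknotted) base pairs.
BRACKETS = {"(": ")", "[": "]", "{": "}"}


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Parse extended dot-bracket notation into a set of (i, j) base pairs."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks = {open_: [] for open_ in BRACKETS}
    pairs: Set[Tuple[int, int]] = set()
    for j, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(j)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def pair_distance(a: str, b: str) -> int:
    """Count base pairs that occur in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i for i, p in enumerate(scores)
        if not any(dominates(q, p) for k, q in enumerate(scores) if k != i)
    ]


# Hypothetical usage: one target and three made-up candidate evaluations,
# each scored as (distance to target, free energy in kcal/mol).
target = "((..[[..))..]]"
predicted = ["((..[[..))..]]", "((......))....", "((......))...."]
energies = [-5.0, -7.0, -4.0]  # illustrative values, not real predictions
scores = [(pair_distance(target, s), e) for s, e in zip(predicted, energies)]
front = pareto_front(scores)
```

A genetic algorithm of this kind would typically carry the non-dominated candidates forward and apply crossover and mutation to produce the next generation; those operators are omitted here because the abstract does not describe them.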

  • ABSTRACT: Background: RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three-dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. Results: In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aim of the development was to address the hitherto mostly ignored extension of the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods on several data sets totalling 769 real and predicted single-structure targets, and on 292 two-structure targets. It performed as well as or better than all existing methods at finding sequences that folded in silico into the target structure, without the heavy bias towards CG base pairs that was observed for all other top-performing methods. On the two-structure targets it also performed well, generating a perfect design for about 80% of the targets. Conclusions: Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two-structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
    BMC Bioinformatics 10/2012; 13(1):260. DOI:10.1186/1471-2105-13-260 · 2.67 Impact Factor
  • ABSTRACT: In this paper we present a sampling framework for RNA structures of fixed topological genus. We introduce a novel, linear-time, uniform sampling algorithm for RNA structures of fixed topological genus g, for arbitrary g>0. Furthermore, we develop a linear-time sampling algorithm for RNA structures of fixed topological genus g that are weighted by a simplified, loop-based energy functional. For this process the partition function of the energy functional has to be computed once, which has O(n^2) time complexity.
    Mathematical biosciences 07/2013; 245(2). DOI:10.1016/j.mbs.2013.07.014 · 1.49 Impact Factor
  • ABSTRACT: RNAs play fundamental roles in cellular processes. The function of an RNA is highly dependent on its three-dimensional conformation, which is referred to as RNA tertiary structure. Since the prediction or experimental determination of these structures is very difficult, many studies focus on problems associated with RNA secondary structure. Here, we consider the RNA inverse folding problem, in which an RNA secondary structure is given as a target structure and the goal is to design an RNA sequence that folds into the target structure. In this paper, we introduce a new evolutionary algorithm for the RNA inverse folding problem. Our algorithm, entitled Evolutionary RNA Design (ERD), generates a sequence whose Minimum Free Energy (MFE) structure is the same as the target structure. We compare our algorithm with the INFO-RNA, MODENA, RNAiFold, and NUPACK approaches on some biological test sets. The results presented in this paper indicate that for longer structures our algorithm performs better than the other mentioned algorithms in terms of energy range, accuracy, speedup, and nucleotide distribution. In particular, the RNA sequences generated by our method are much more reliable and more similar to natural RNA sequences. The web server and source code are available at http://mostafa.ut.ac.ir/corna/erd.
    Bioinformatics 01/2014; 30(9). DOI:10.1093/bioinformatics/btu001 · 4.62 Impact Factor