Article

Solving the RNA inverse folding problem through target structure decomposition and Multiobjective Evolutionary Computation


... In [15] and [16], where two RNA inverse folding methods are presented, the authors also review methods published from 1994 to 2022. A common general approach is to begin the process with a first candidate sequence initialized so that it is more or less similar to the target structure, and then apply some algorithm to iteratively modify it until a defined stopping criterion is reached or, ideally, it successfully folds into the desired structure. ...
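The general approach described in this excerpt (seed a candidate compatible with the target, then modify it iteratively until it folds correctly or a stopping criterion is met) can be sketched as below. This is a minimal illustration and not any of the reviewed methods: `fold` is a hypothetical callable supplied by the caller that returns a dot-bracket string, and all function names, the acceptance rule and the default parameters are assumptions.

```python
import random

# Watson-Crick pairs used to seed and mutate paired positions (assumption: no GU wobble).
PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")]


def pair_table(structure):
    """Map each paired position to its partner (0-based); unpaired positions get -1."""
    table = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table


def initial_sequence(structure, rng):
    """Seed sequence: complementary bases at paired positions, 'A' elsewhere."""
    seq = ["A"] * len(structure)
    for i, j in enumerate(pair_table(structure)):
        if j > i:
            seq[i], seq[j] = rng.choice(PAIRS)
    return "".join(seq)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def design(target, fold, max_iters=1000, seed=0):
    """Iteratively mutate a candidate until fold(candidate) == target or max_iters is hit."""
    rng = random.Random(seed)
    table = pair_table(target)
    seq = initial_sequence(target, rng)
    best = hamming(fold(seq), target)
    for _ in range(max_iters):
        if best == 0:
            return seq
        cand = list(seq)
        i = rng.randrange(len(cand))
        if table[i] >= 0:
            # Mutate both partners together so the pair stays complementary.
            cand[i], cand[table[i]] = rng.choice(PAIRS)
        else:
            cand[i] = rng.choice("ACGU")
        cand = "".join(cand)
        score = hamming(fold(cand), target)
        if score <= best:  # accept moves that do not worsen the structure distance
            seq, best = cand, score
    return seq if best == 0 else None
```

In practice `fold` would wrap a real secondary structure predictor; it is left abstract here so that the sketch stays self-contained.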
... eM2dRNAs: Enhanced Multiobjective Metaheuristic to Design RNA Sequences [16] is an extension of the aforementioned multiobjective evolutionary algorithm m2dRNAs. The primary enhancement focuses on decomposing the target structure into smaller, more manageable substructures through a recursive process. ...
... X ={ (1,30), (2,29), (3,28), (4,12), (5,11), (15,27), (16,26), 6, 7, 8, 9, 10, 13, 14, 17, 18, 19, 20, 21, 22, ...
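The set X in this excerpt lists base pairs as position tuples and unpaired nucleotides as single positions. A minimal sketch of how such elements could be extracted from a dot-bracket string is shown below; the function name and the 1-based indexing are assumptions chosen to match the notation of the excerpt, and the example structure is illustrative, not the one from the excerpt (which is truncated).

```python
def structure_elements(dot_bracket):
    """Return (pairs, unpaired) with 1-based positions from a dot-bracket string."""
    stack, pairs, unpaired = [], [], []
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch == ".":
            unpaired.append(pos)
        else:
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs), unpaired


pairs, unpaired = structure_elements("((..))")
print(pairs, unpaired)
```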
Article
Full-text available
RNA design, also known as the RNA inverse folding problem, involves discovering a nucleotide sequence that folds into a target structure. This problem has been addressed with a wide range of approaches, which have progressively improved the ability to solve it in a reasonable time. Despite all these efforts, to date no method has completely solved the problem. We present GREED-RNA, a new RNA design algorithm based on a simple greedy evolutionary strategy. Its main feature is the use of several objective functions (base-pair distance, Hamming distance, probability over ensemble, partition function, ensemble defect and GC-content) to select the best solution in each iteration, changing their weight according to the problem-solving conditions. The performance of GREED-RNA was tested using the Eterna100 benchmark, which is widely used in this area and has never been fully solved by any method. In addition, a comparative analysis against several published RNA design methods considering three metrics (solved structures, success rate and execution time) allowed us to verify that GREED-RNA performs better than previously developed methods, thus improving the current ability to solve this problem. This tool also allows users to select a range within which the GC-content of the solution sequences must fall. Source code and results are available at https://github.com/iARN-unex/GREED-RNA.
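Three of the objective functions named in this abstract need no thermodynamic model and can be sketched directly. The definitions below are the common textbook ones (base-pair distance as the size of the symmetric difference of the two pair sets, Hamming distance over dot-bracket strings, GC-content as a fraction); whether GREED-RNA computes them in exactly this way is an assumption, and the function names are illustrative.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) base pairs (0-based) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


def hamming_distance(a, b):
    """Number of positions whose dot-bracket symbol differs."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gc_content(sequence):
    """Fraction of G and C nucleotides in the sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def gc_in_range(sequence, low, high):
    """True if the GC-content falls inside the user-selected range."""
    return low <= gc_content(sequence) <= high
```

A predicted structure that matches the target scores zero on both distances, which is the usual success criterion on benchmarks such as Eterna100.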
... When benchmarked on training-testing subsets of our training set, we found that RhoDesign outperformed alternative models, including LEARNA [20], Meta-LEARNA [20], RiboLogic [21], Monte Carlo tree search (MCTS)-RNA [27], gRNAde [28], RDesign [29] and eM2dRNAs (enhanced M2dRNAs) [30] (Fig. 1c, Supplementary Table 1 and 'Comparison with other models'). Because here the TM score and RMSD depend on RhoFold-predicted 3D structures, these metrics are bounded by imperfect values corresponding to fully recovered sequences, and we find that RhoDesign-generated sequences approach these bounds (Fig. 1c). ...
... LEARNA [20], Meta-LEARNA [20], RiboLogic [21], MCTS-RNA [27], gRNAde [28], RDesign [29] and eM2dRNAs [30] represent different models that have been developed for the task of RNA sequence generation. ...
Article
Full-text available
RNAs represent a class of programmable biomolecules capable of performing diverse biological functions. Recent studies have developed accurate RNA three-dimensional structure prediction methods, which may enable new RNAs to be designed in a structure-guided manner. Here, we develop a structure-to-sequence deep learning platform for the de novo generative design of RNA aptamers. We show that our approach can design RNA aptamers that are predicted to be structurally similar, yet sequence dissimilar, to known light-up aptamers that fluoresce in the presence of small molecules. We experimentally validate several generated RNA aptamers to have fluorescent activity, show that these aptamers can be optimized for activity in silico, and find that they exhibit a mechanism of fluorescence similar to that of known light-up aptamers. Our results demonstrate how structural predictions can guide the targeted and resource-efficient design of new RNA sequences.
... Methods published after that interval are reviewed in [28]. We summarize here the most relevant ones to this work, which are those that used the same benchmark as us (Eterna100, as we will see) and provide the full list of solved structures. ...
... B. eM2dRNAs algorithm. eM2dRNAs [28] is an improved version of m2dRNAs. This algorithm begins with the recursive decomposition of the input target structure, which simplifies the problem to be solved. ...
Article
Full-text available
At present, designing an RNA sequence that folds into a specific secondary structure is a problem that is not fully solved, due to its exponentially increasing complexity. To address this matter, many computational methods have been developed, but none of them has been able to solve Eterna100 completely and in an affordable time. Eterna100 is a widely recognized benchmark used to test the performance of RNA inverse folding algorithms. In previous publications we presented the m2dRNAs tool, a Multiobjective Evolutionary Algorithm, and its extension eM2dRNAs, which added a recursive decomposition of the target structure, thus simplifying the problem. At that time they improved the ability to solve the RNA inverse folding problem, but were still unable to complete the Eterna100 benchmark. Here we introduce ES+eM2dRNAs, an improvement of eM2dRNAs that optimizes the decomposition process, since an inherent drawback was identified in it. A comparative study of this new tool against its predecessors and other RNA design methods was performed using the two current versions of the Eterna100 benchmark. ES+eM2dRNAs was shown to be the best in all performance indicators considered (number of structures solved, success rate, and total run time). Moreover, it is able to solve two Eterna100 structures for which none of the compared methods had ever found a solution.
Article
Full-text available
Background We study in this work the inverse folding problem for RNA, which is the discovery of sequences that fold into given target secondary structures. Results We implement a Lévy mutation scheme in an updated version of an evolutionary inverse folding algorithm and apply it to the design of RNAs with and without pseudoknots. We find that the Lévy mutation scheme increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. Compared to , CPU time is higher but more successful in finding designed sequences that fold correctly into the target structures. Conclusion We propose that a Lévy flight offers a better standard mutation scheme for optimizing RNA design. Our new version of is available on GitHub as a python script and the benchmark results show improved performance on both and the datasets, compared to existing inverse folding tools.
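A Lévy mutation scheme of the kind mentioned in this abstract draws the size of each mutation step from a heavy-tailed distribution, so most steps change a single nucleotide while a few change many. The sketch below is one way to discretize that idea, assuming a truncated power law P(k) ∝ k^(-c) over the number of mutated positions; the exponent, the function names and the choice of positions are assumptions and may differ from the scheme used in the paper.

```python
import random


def levy_step_count(n, exponent=1.5, rng=random):
    """Draw k in 1..n with probability proportional to k ** -exponent."""
    sizes = range(1, n + 1)
    weights = [k ** -exponent for k in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]


def levy_mutate(sequence, exponent=1.5, rng=random):
    """Mutate a heavy-tailed number of distinct positions to a different base."""
    k = levy_step_count(len(sequence), exponent, rng)
    mutated = list(sequence)
    for pos in rng.sample(range(len(sequence)), k):
        # Pick a base different from the current one so every chosen position changes.
        mutated[pos] = rng.choice([b for b in "ACGU" if b != mutated[pos]])
    return "".join(mutated)


rng = random.Random(42)
print(levy_mutate("GGGAAACCC", rng=rng))
```

A smaller exponent makes large jumps more frequent; a large exponent approaches ordinary single-point mutation.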
Article
Full-text available
Novel tools for in silico design of RNA constructs such as riboregulators are required in order to reduce time and cost to production for the development of diagnostic and therapeutic advances. Here, we present MoiRNAiFold, a versatile and user-friendly tool for de novo synthetic RNA design. MoiRNAiFold is based on Constraint Programming and it includes novel variable types, heuristics and restart strategies for Large Neighborhood Search. Moreover, this software can handle dozens of design constraints and quality measures and improves features for RNA regulation control of gene expression, such as Translation Efficiency calculation. We demonstrate that MoiRNAiFold outperforms any previous software in benchmarking structural RNA puzzles from EteRNA. Importantly, with regard to biologically relevant RNA designs, we focus on RNA riboregulators, demonstrating that the designed RNA sequences are functional both in vitro and in vivo. Overall, we have generated a powerful tool for de novo complex RNA design that we make freely available as a web server (https://moiraibiodesign.com/design/).
Article
Full-text available
Motivation: Predicting the secondary structure of a ribonucleic acid (RNA) sequence is useful in many applications. Existing algorithms [based on dynamic programming] suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Results: We present a novel alternative O(n^3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models. Availability and implementation: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100 000 nt). Supplementary information: Supplementary data are available at Bioinformatics online.
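To illustrate the cubic scaling that this abstract attributes to classical dynamic programming, the sketch below implements Nussinov-style base-pair maximization. It is a simplified stand-in chosen for brevity: it is neither the LinearFold algorithm nor an energy-based folding model, and the pairing rules and `min_loop` default are assumptions.

```python
# Allowed pairs: Watson-Crick plus GU wobble (assumption for this sketch).
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with at least min_loop unpaired bases per hairpin."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Bottom-up over increasing span: O(n^2) cells, O(n) work per cell -> O(n^3).
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in CAN_PAIR:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_max_pairs("GGGAAACCC"))  # three GC pairs closing an AAA hairpin loop
```

The three nested loops over `span`, `i` and `k` are the source of the cubic runtime; a left-to-right scan with beam pruning, as the abstract describes, avoids enumerating all of these states.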
Article
Full-text available
Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants’ moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million player moves on 12 of the most-played Eterna puzzles as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN) EternaBrain that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players’ moves are partially stereotyped. Pipelining this CNN’s move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and achieves similar or better performance than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.
Article
Full-text available
Nucleic acids can be designed to be nano-machines, pharmaceuticals, or probes. RNA secondary structures can form the basis of self-assembling nanostructures. There are only four natural RNA bases, therefore it can be difficult to design sequences that fold to a single, specified structure because many other structures are often possible for a given sequence. One approach taken by state-of-the-art sequence design methods is to select sequences that fold to the specified structure using stochastic, iterative refinement. The goal of this work is to accelerate design. Many existing iterative methods select and refine sequences one base pair and one unpaired nucleotide at a time. Here, the hypothesis that sequences can be preselected in order to accelerate design was tested. To this aim, a database was built of helix sequences that demonstrate thermodynamic features found in natural sequences and that also have little tendency to cross-hybridize. Additionally, a database was assembled of RNA loop sequences with low helix-formation propensity and little tendency to cross-hybridize with either the helices or other loops. These databases of preselected sequences accelerate the selection of sequences that fold with minimal ensemble defect by replacing some of the trial and error of current refinement approaches. When using the database of preselected sequences as compared to randomly chosen sequences, sequences for natural structures are designed 36 times faster, and random structures are designed six times faster. The sequences selected with the aid of the database have similar ensemble defect as those sequences selected at random. The sequence database is part of RNAstructure package at http://rna.urmc.rochester.edu/RNAstructure.html.
Article
Full-text available
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
Article
Full-text available
Background: The design of multi-stable RNA molecules has important applications in biology, medicine, and biotechnology. Synthetic design approaches profit strongly from effective in-silico methods, which substantially reduce the need for costly wet-lab experiments. Results: We devise a novel approach to a central ingredient of most in-silico design methods: the generation of sequences that fold well into multiple target structures. Based on constraint networks, our approach supports generic Boltzmann-weighted sampling, which enables the positive design of RNA sequences with specific free energies (for each of multiple, possibly pseudoknotted, target structures) and GC-content. Moreover, we study general properties of our approach empirically and generate biologically relevant multi-target Boltzmann-weighted designs for an established design benchmark. Our results demonstrate the efficacy and feasibility of the method in practice as well as the benefits of Boltzmann sampling over the previously best multi-target sampling strategy, even for the case of negative design of multi-stable RNAs. Besides empirical studies, we finally justify the algorithmic details by a fundamental theoretical result about multi-stable RNA design, namely the #P-hardness of the counting of designs. Conclusion: This work introduces a novel, flexible, and effective approach to multi-target RNA design, which promises broad applicability and extensibility. Our free software is available at: https://github.com/yannponty/RNARedPrint. Supplementary data are available online. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2784-7) contains supplementary material, which is available to authorized users.
Article
Full-text available
Background: Artificially synthesized RNA molecules provide important ways for creating a variety of novel functional molecules. State-of-the-art RNA inverse folding algorithms can design simple and short RNA sequences of specific GC content that fold into the target RNA structure. However, their performance is not satisfactory in complicated cases. Results: We present a new inverse folding algorithm called MCTS-RNA, which uses Monte Carlo tree search (MCTS), a technique that has shown exceptional performance in Computer Go recently, to represent and discover the essential part of the sequence space. To obtain high accuracy, initial sequences generated by MCTS are further improved by a series of local updates. Our algorithm has an ability to control the GC content precisely and can deal with pseudoknot structures. Using common benchmark datasets for evaluation, MCTS-RNA showed a lot of promise as a standard method of RNA inverse folding. Conclusion: MCTS-RNA is available at https://github.com/tsudalab/MCTS-RNA.
Article
Full-text available
Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse, which was devised when the inverse RNA folding problem was formulated. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
Article
Full-text available
RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information.
Article
Full-text available
Background: Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Currently, the design of RNA molecules that fold into a specific structure (known as RNA inverse folding) within biotechnological applications lacks the ability to incorporate pseudoknot structures into the design. Hairpin-(H)- and kissing hairpin-(K)-type pseudoknots cover a wide range of biologically functional pseudoknots and can be represented on a secondary structure level. Results: The RNA inverse folding program antaRNA, which takes secondary structure, target GC-content and sequence constraints as input, is extended to provide solutions for such H- and K-type pseudoknotted secondary structure constraints. We demonstrate the easy and flexible interchangeability of modules within the antaRNA framework by incorporating pKiss as a structure prediction tool capable of predicting the mentioned pseudoknot types. The performance of the approach is demonstrated on a subset of the PseudoBase++ dataset. Conclusions: This new service is available via a standalone version and is also part of the Freiburg RNA Tools webservice. Furthermore, antaRNA is available in Galaxy and is part of the RNA-workbench Docker image.
Article
Full-text available
RNAs are attractive molecules as biological parts for synthetic biology. In particular, the capacity for conformational change, which can be encoded in designer RNAs, enables us to create multistable molecular switches that function in biological circuits. Although various algorithms for designing such RNA switches have been proposed, the previous algorithms optimize the RNA sequences against a weighted sum of objective functions, where empirical weights among objective functions are used. In addition, an RNA design algorithm for multiple pseudoknot targets is currently not available. We developed a novel computational tool, MODENA, for automatically designing RNA sequences which fold into multiple target secondary structures. Our algorithm designs RNA sequences based on a multi-objective genetic algorithm, by which we can explore the RNA sequences having good objective function values without empirical weight parameters among the objective functions. Our algorithm has great flexibility by virtue of this weight-free nature. We benchmarked our multi-target RNA design algorithm with the datasets of two, three, and four target structures and found that our algorithm shows better or comparable design performance compared with the previous algorithms, RNAdesign and Frnakenstein. In addition to the benchmarks with pseudoknot-free datasets, we benchmarked MODENA with two-target pseudoknot datasets and found that MODENA can design RNAs which have the target pseudoknotted secondary structures and whose free energies are close to the lowest free energy. Moreover, we applied our algorithm to a ribozyme-based ON-switch which takes a ribozyme-inactive secondary structure when the theophylline aptamer structure is assumed. Currently, MODENA is the only RNA design software which can be applied to multiple pseudoknot targets. Successful design results for the multiple targets and an RNA device indicate the usefulness of our multi-objective RNA design algorithm.
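The weight-free selection this abstract describes rests on Pareto dominance: a candidate is kept if no other candidate is at least as good on every objective and strictly better on one. The sketch below shows that filter for objectives to be minimized. It is a generic illustration, not MODENA's code, and the example objective values are made up.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored on two minimized objectives,
# e.g. (structure distance, free-energy gap).
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))  # (3, 3) is dominated by (2, 2)
```

Because no objective is collapsed into a weighted sum, the front keeps trade-off solutions such as `(1, 5)` and `(5, 1)` side by side instead of forcing a single ranking.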
Article
Full-text available
RNA sequence design has been studied for at least as long as the classical folding problem. While for the latter the functional fold of an RNA molecule is to be found, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design becomes a crucial step to generate novel biochemical components. In this article, the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that additionally comply with an adjustable full-range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics, and its superior performance is shown on biological datasets. Availability: http://www.bioinf.uni-freiburg.de/Software/antaRNA. Contact: backofen@informatik.uni-freiburg.de. © The Author(s) 2015. Published by Oxford University Press.
Article
Full-text available
Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. The web server, source code and Linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Article
Full-text available
Background: The function of an RNA in cellular processes is directly related to its structure. The free energy of an RNA structure is another important key to its function, as only some structures with a specific level of free energy can take part in cellular reactions. Therefore, to perform a specific function, a particular RNA structure with a specific level of free energy is required. For a given RNA structure, the goal of the RNA design problem is to design an RNA sequence that folds into the given structure. To mimic the biological features of RNA sequences and structures, some sequence and energy constraints should be considered in designing RNA. Although the level of free energy is important, it is not considered in the available approaches for the RNA design problem. Results: In this paper, we present a new version of our evolutionary algorithm for the RNA design problem, entitled ERD, and extend it to handle some sequence and energy constraints. In the sequence constraints, one can restrict sequence positions to a fixed nucleotide or to a subset of nucleotides. As for the energy constraint, one can specify an interval for the free energy of the designed sequences. We compare our algorithm with the INFO-RNA, MODENA, NUPACK, and RNAiFold approaches for some artificial and natural RNA secondary structures and constraints. Conclusions: The results indicate that our algorithm outperforms the other mentioned approaches in terms of accuracy, speedup, divergency, nucleotide distribution, and similarity to natural RNA sequences. In particular, the RNA sequences designed by our method are much more reliable and similar to their natural counterparts. The generated sequences are more diverse and their nucleotide distribution is closer to the natural one. The ERD tool and web server are freely available at http://www.mostafa.ut.ac.ir/corna/erd-cons/.
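The two constraint types described in this abstract (positions restricted to a fixed nucleotide or a subset of nucleotides, and a free-energy interval) can be checked as sketched below. The abstract does not give ERD's input syntax, so expressing subsets with IUPAC nucleotide codes is an assumption, as are the function names; the free energy is taken as a number computed elsewhere by a folding tool.

```python
# IUPAC nucleotide codes mapped to the RNA bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def satisfies_sequence_constraint(sequence, constraint):
    """True if every base is allowed by the IUPAC code at the same position."""
    if len(sequence) != len(constraint):
        return False
    return all(base in IUPAC[code] for base, code in zip(sequence, constraint))


def within_energy_interval(free_energy, low, high):
    """True if the free energy (e.g. in kcal/mol) lies in the closed interval [low, high]."""
    return low <= free_energy <= high


# Fix the first base to G and the last to U; leave the middle unconstrained.
print(satisfies_sequence_constraint("GCAU", "GNNU"))
```

A design loop would typically reject or repair any candidate for which either check fails before evaluating its fold.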
Article
Full-text available
Significance Self-assembling RNA molecules play critical roles throughout biology and bioengineering. To accelerate progress in RNA design, we present EteRNA, the first internet-scale citizen science “game” scored by high-throughput experiments. A community of 37,000 nonexperts leveraged continuous remote laboratory feedback to learn new design rules that substantially improve the experimental accuracy of RNA structure designs. These rules, distilled by machine learning into a new automated algorithm EteRNABot, also significantly outperform prior algorithms in a gauntlet of independent tests. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.
Article
Full-text available
RNAs play fundamental roles in cellular processes. The function of an RNA is highly dependent on its three-dimensional conformation, which is referred to as the RNA tertiary structure. Since the prediction or experimental determination of these structures is very difficult, many works focus on the problems associated with the RNA secondary structure. Here, we consider the RNA inverse folding problem, in which an RNA secondary structure is given as a target structure and the goal is to design an RNA sequence that folds into the target structure. In this paper, we introduce a new evolutionary algorithm for the RNA inverse folding problem. Our algorithm, entitled Evolutionary RNA Design (ERD), generates a sequence whose Minimum Free Energy (MFE) structure is the same as the target structure. We compare our algorithm with the INFO-RNA, MODENA, RNAiFold, and NUPACK approaches for some biological test sets. The results presented in this paper indicate that for longer structures our algorithm performs better than the other mentioned algorithms in terms of the energy range, accuracy, speedup, and nucleotide distribution. In particular, the RNA sequences generated by our method are much more reliable and similar to natural RNA sequences. The web server and source code are available at http://mostafa.ut.ac.ir/corna/erd. Contact: mgtabesh@ut.ac.ir.
Article
Full-text available
Molecular-scale computing has been explored since 1989 owing to the foreseeable limitation of Moore's law for silicon-based computation devices. With the potential of massive parallelism, low energy consumption and capability of working in vivo, molecular-scale computing promises a new computational paradigm. Inspired by the concepts from the electronic computer, DNA computing has realized basic Boolean functions and has progressed into multi-layered circuits. Recently, RNA nanotechnology has emerged as an alternative approach. Owing to the newly discovered thermodynamic stability of a special RNA motif (Shu et al. 2011 Nat. Nanotechnol. 6, 658-667 (doi:10.1038/nnano.2011.105)), RNA nanoparticles are emerging as another promising medium for nanodevice and nanomedicine as well as molecular-scale computing. Like DNA, RNA sequences can be designed to form desired secondary structures in a straightforward manner, but RNA is structurally more versatile and more thermodynamically stable owing to its non-canonical base-pairing, tertiary interactions and base-stacking property. A 90-nucleotide RNA can exhibit 4^90 nanostructures, and its loops and tertiary architecture can serve as a mounting dovetail that eliminates the need for external linking dowels. Its enzymatic and fluorogenic activity creates diversity in computational design. Varieties of small RNA can work cooperatively, synergistically or antagonistically to carry out computational logic circuits. The riboswitch and enzymatic ribozyme activities and its special in vivo attributes offer a great potential for in vivo computation. Unique features in transcription, termination, self-assembly, self-processing and acid resistance enable in vivo production of RNA nanoparticles that harbour various regulators for intracellular manipulation. With all these advantages, RNA computation is promising, but it is still in its infancy. Many challenges still exist. Collaborations between RNA nanotechnologists and computer scientists are necessary to advance this nascent technology.
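For scale, the figure quoted in this abstract for a 90-nucleotide RNA is the number of distinct sequences over a four-letter alphabet. Reading it that way is an interpretation of the abstract's wording; the arithmetic itself is:

```latex
4^{90} = 2^{180} = \left(2^{10}\right)^{18} = 1024^{18} \approx 1.5 \times 10^{54}
```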
Article
Full-text available
Synthetic biology is a rapidly emerging discipline with long-term ramifications that range from single-molecule detection within cells to the creation of synthetic genomes and novel life forms. Truly phenomenal results have been obtained by pioneering groups - for instance, the combinatorial synthesis of genetic networks, genome synthesis using BioBricks, and hybridization chain reaction (HCR), in which stable DNA monomers assemble only upon exposure to a target DNA fragment, biomolecular self-assembly pathways, etc. Such work strongly suggests that nanotechnology and synthetic biology together seem poised to constitute the most transformative development of the 21st century. In this paper, we present a Constraint Programming (CP) approach to solve the RNA inverse folding problem. Given a target RNA secondary structure, we determine an RNA sequence which folds into the target structure; i.e. whose minimum free energy structure is the target structure. Our approach represents a step forward in RNA design - we produce the first complete RNA inverse folding approach which allows for the specification of a wide range of design constraints. We also introduce a Large Neighborhood Search approach which allows us to tackle larger instances at the cost of losing completeness, while retaining the advantages of meeting design constraints (motif, GC-content, etc.). Results demonstrate that our software, RNAiFold, performs as well or better than all state-of-the-art approaches; nevertheless, our approach is unique in terms of completeness, flexibility, and the support of various design constraints. The algorithms presented in this paper are publicly available via the interactive webserver http://bioinformatics.bc.edu/clotelab/RNAiFold ; additionally, the source code can be downloaded from that site.
Article
Background: RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three-dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. Results: In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods on several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Conclusions: Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
Article
Computer codes for computation and comparison of RNA secondary structures, the Vienna RNA package, are presented. They are based on dynamic programming algorithms and aim at predictions of structures with minimum free energies as well as at computations of the equilibrium partition functions and base pairing probabilities. An efficient heuristic for the inverse folding problem of RNA is introduced. In addition we present compact and efficient programs for the comparison of RNA secondary structures based on tree editing and alignment. All computer codes are written in ANSI C. They include implementations of modified algorithms on parallel computers with distributed memory. Performance analysis carried out on an Intel Hypercube shows that parallel computing becomes increasingly efficient as the sequences get longer.
Article
RNA inverse folding is a computational technology for designing RNA sequences which fold into a user-specified secondary structure. Although pseudoknots are functionally important motifs in RNA structures, fewer reports have been published on the inverse folding of pseudoknotted RNAs than on pseudoknot-free RNA design. In this paper, we present a new version of our multi-objective genetic algorithm (MOGA), MODENA, which we have previously proposed for pseudoknot-free RNA inverse folding. In the new version of MODENA, (i) a new crossover operator is implemented and (ii) pseudoknot prediction methods, IPknot and HotKnots, are used to evaluate the designed RNA sequences, allowing us to perform the inverse folding of pseudoknotted RNAs. The new version of MODENA with the new crossover operator was benchmarked with a dataset composed of natural pseudoknotted RNA secondary structures, and we found that MODENA can successfully design more pseudoknotted RNAs compared to the other pseudoknot design algorithm. In addition, a sequence constraint function newly implemented in the new version of MODENA was tested by designing RNA sequences which fold into the pseudoknotted structure of a hepatitis delta virus ribozyme; as a result, we successfully designed eight RNA sequences. The new version of MODENA is downloadable from http://rna.eit.hirosaki-u.ac.jp/modena/.
Article
Roulette-wheel selection is a frequently used method in genetic and evolutionary algorithms or in modeling of complex networks. Existing routines select one of N individuals using search algorithms of O(N) or O(log(N)) complexity. We present a simple roulette-wheel selection algorithm, which typically has O(1) complexity and is based on stochastic acceptance instead of searching. We also discuss a hybrid version, which might be suitable for highly heterogeneous weight distributions, found, for example, in some models of complex networks. With minor modifications, the algorithm might also be used for sampling with fitness cut-off at a certain value or for sampling without replacement.
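The stochastic-acceptance idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `roulette_select` and its parameters are hypothetical, and the sketch assumes non-negative weights with at least one positive value. A random individual is proposed uniformly and accepted with probability `w_i / w_max`, so no cumulative-sum search is needed.

```python
import random


def roulette_select(weights, w_max=None, rng=random):
    """Return index i with probability weights[i] / sum(weights).

    Uses stochastic acceptance: propose a uniformly random index and
    accept it with probability weights[i] / w_max. Pass a precomputed
    w_max to avoid the O(N) max() scan on every call.
    """
    if not weights:
        raise ValueError("weights must not be empty")
    if w_max is None:
        w_max = max(weights)
    if w_max <= 0:
        raise ValueError("at least one weight must be positive")
    n = len(weights)
    while True:
        i = rng.randrange(n)                    # uniform proposal
        if rng.random() < weights[i] / w_max:   # accept proportionally to weight
            return i
```

The expected number of proposals per selection is `w_max` divided by the mean weight, which is why the abstract notes that strongly heterogeneous weight distributions may call for a hybrid scheme.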
Article
Artificially synthesized RNA molecules have recently come under study since such molecules have a potential for creating a variety of novel functional molecules. When designing artificial RNA sequences, secondary structure should be taken into account since functions of noncoding RNAs strongly depend on their structure. RNA inverse folding is a methodology for computationally exploring the RNA sequences folding into a user-given target structure. In the present study, we developed a multi-objective genetic algorithm, MODENA (Multi-Objective DEsign of Nucleic Acids), for RNA inverse folding. MODENA explores the approximate set of weak Pareto optimal solutions in the objective function space of two objective functions, a structure stability score and a structure similarity score. MODENA can simultaneously design multiple different RNA sequences in a single run, whose lowest free energies range from a very stable value to a higher value near those of natural counterparts. MODENA and previous RNA inverse folding programs were benchmarked with 29 target structures taken from the Rfam database, and we found that MODENA can successfully design 23 RNA sequences folding into the target structures; this result is better than those of the other benchmarked RNA inverse folding programs. The multi-objective genetic algorithm gives a useful framework for a functional biomolecular design. Executable files of MODENA can be obtained at http://rna.eit.hirosaki-u.ac.jp/modena/.
Article
Pseudoknots in RNA structures make visualization of RNA structures difficult. Even if a pseudoknot itself is represented without a crossing, visualization of the entire RNA structure with a pseudoknot often results in a drawing with crossings between the pseudoknot and other structural elements, and requires additional intervention by the user to ensure that the structure graph is overlap-free. Many programs such as web services prefer to obtain an overlap-free graph in one shot rather than get a graph with overlaps to be edited. There are few programs for visualizing RNA pseudoknots, and PseudoViewer has been almost the only program that automatically draws RNA secondary structures with pseudoknots. The previous version of PseudoViewer visualizes all the known types of RNA pseudoknots as planar drawings, but visualizes some hypothetical pseudoknots as non-planar drawings. We developed a new version of PseudoViewer for efficiently visualizing large RNA structures with any type of pseudoknot, both known and hypothetical, as planar drawings in one shot. It is about 10 times faster than the previous algorithm, and produces a more compact and aesthetic structure drawing. PseudoViewer3 supports both web services and web applications. The new version of PseudoViewer, PseudoViewer3, is available at http://pseudoviewer.inha.ac.kr.
Article
An improved dynamic programming algorithm is reported for RNA secondary structure prediction by free energy minimization. Thermodynamic parameters for the stabilities of secondary structure motifs are revised to include expanded sequence dependence as revealed by recent experiments. Additional algorithmic improvements include reduced search time and storage for multibranch loop free energies and improved imposition of folding constraints. An extended database of 151,503 nt in 955 structures determined by comparative sequence analysis was assembled to allow optimization of parameters not based on experiments and to test the accuracy of the algorithm. On average, the predicted lowest free energy structure contains 73% of known base-pairs when domains of fewer than 700 nt are folded; this compares with 64% accuracy for previous versions of the algorithm and parameters. For a given sequence, a set of 750 generated structures contains one structure that, on average, has 86% of known base-pairs. Experimental constraints, derived from enzymatic and flavin mononucleotide cleavage, improve the accuracy of structure predictions.
Article
The structure of RNA molecules is often crucial for their function. Therefore, secondary structure prediction has gained much interest. Here, we consider the inverse RNA folding problem, which means designing RNA sequences that fold into a given structure. We introduce a new algorithm for the inverse folding problem (INFO-RNA) that consists of two parts: a dynamic programming method for good initial sequences, followed by an improved stochastic local search that uses an effective neighbor selection method. During the initialization, we design a sequence that among all sequences adopts the given structure with the lowest possible energy. For the selection of neighbors during the search, we use a kind of look-ahead of one selection step applying an additional energy-based criterion. Afterwards, the pre-ordered neighbors are tested using the actual optimization criterion of minimizing the structure distance between the target structure and the mfe structure of the considered neighbor. We compared our algorithm to RNAinverse and RNA-SSD for artificial and biological test sets. Using INFO-RNA, we performed better than RNAinverse and in most cases, we gained better results than RNA-SSD, probably the best inverse RNA folding tool on the market. www.bioinf.uni-freiburg.de?Subpages/software.html.
Chapter
The RNA Inverse Folding problem comes from computational biology. The goal is to find a molecule that has a given folding. It is important for scientific fields such as bioengineering, pharmaceutical research, biochemistry, synthetic biology and RNA nanostructures. Nested Monte Carlo Search has given excellent results for this problem. We propose to adapt and evaluate different Monte Carlo Search algorithms for the RNA Inverse Folding problem.
Article
The RNA inverse folding problem is a bioinformatics problem where the objective is to find an RNA sequence that folds into a given target secondary structure. In this work, we use Evolutionary Computation to solve a new and innovative multiobjective definition of this problem. In this new multiobjective definition of the problem, we have considered the similarity between target and predicted structures as a constraint, and three objective functions: (i) Partition Function (free energy of the ensemble), (ii) Ensemble Diversity and (iii) Nucleotides Composition. The Multiobjective Metaheuristic To Design RNA Sequences (m2dRNAs) proposed in this paper is compared against other RNA inverse folding methods published in the literature, such as RNAinverse, RNA-SSD, INFO-RNA, MODENA, NUPACK, fRNAkenstein, DSS-Opt, RNAiFOLD, antaRNA, ERD, and Eterna players. After a comprehensive comparative study on two well-known benchmarks (Rfam and Eterna100), we conclude that m2dRNAs is capable of obtaining very promising results in terms of both quality of RNA designs and required runtime. The source code of m2dRNAs is available at http://arco.unex.es/arl/m2dRNAs-sourcecode.zip.
Article
Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation function to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs we demonstrate state of the art results on an important molecular property prediction benchmark, results we believe are strong enough to justify retiring this benchmark.
Conference Paper
Most biological processes, including the expression levels of genes and the translation of DNA to produce proteins within cells, depend on RNA sequences, and the structure of an RNA plays a vital role in its function. The RNA design problem refers to the design of an RNA sequence that folds into a given secondary structure. However, the vast number of possible nucleotide combinations makes this an NP-hard problem. To solve the RNA design problem, a number of researchers have tried to implement algorithms using local stochastic search, context-free grammars, global sampling or evolutionary programming approaches. In this paper, we examine SIMARD, an RNA design algorithm that implements simulated annealing techniques. We also propose QPS, a mutation operator for SIMARD that pre-selects high quality sequences. Furthermore, we present experimental results of SIMARD compared to eight other RNA design algorithms using the Rfam dataset. The experimental results indicate that SIMARD shows promising results in terms of Hamming distance between the designed sequence and the target structure, and outperforms ERD in terms of free energy.
Chapter
One of the long-standing principles of molecular biology is that DNA acts as a template for transcription of messenger RNAs, which serve as blueprints for protein translation. A rapidly growing number of exceptions to this rule have been reported over the past decades: they include long known classes of RNAs involved in translation such as transfer RNAs and ribosomal RNAs, small nuclear RNAs involved in splicing events, and small nucleolar RNAs mainly involved in the modification of other small RNAs, such as ribosomal RNAs and transfer RNAs. More recently, several classes of short regulatory non-coding RNAs, including piwi-associated RNAs, endogenous short-interfering RNAs and microRNAs have been discovered in mammals, which act as key regulators of gene expression in many different cellular pathways and systems. Additionally, the human genome encodes several thousand long non-protein coding RNAs >200 nucleotides in length, some of which play crucial roles in a variety of biological processes such as epigenetic control of chromatin, promoter-specific gene regulation, mRNA stability, X-chromosome inactivation and imprinting. In this chapter, we will introduce several classes of short and long non-coding RNAs, describe their diverse roles in mammalian gene regulation and give examples for known modes of action.
Conference Paper
RNA structures are important for many biological processes in the cell. One important function of RNA is as a catalytic element. Ribozymes are RNA sequences that fold to form active structures that catalyze important chemical reactions. The folded structure of these RNAs is very important; only specific conformations maintain these active structures, so the RNA must fold in a specific way. The RNA design problem describes the prediction of an RNA sequence that will fold into a given RNA structure. Solving this problem allows researchers to design RNA; they can decide on what folded secondary structure is required to accomplish a task, and the algorithm will give them a primary sequence to assemble. However, there are far too many possible primary sequence combinations to test sequentially to see if they would fold into the structure. Therefore we must employ heuristic algorithms to attempt to solve this problem. This paper introduces SIMARD, an evolutionary algorithm that uses an optimization technique called simulated annealing to solve the RNA design problem. We analyze three different cooling schedules for the annealing process: 1) an adaptive cooling schedule, 2) a geometric cooling schedule, and 3) a geometric cooling schedule with warm up. Our results show that an adaptive annealing schedule may not be more effective at minimizing the Hamming distance between the target structure and our folded sequence’s structure when compared with geometric schedules. The results also show that warming up in a geometric cooling schedule may be useful for optimizing SIMARD.
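To make the terms in this abstract concrete, the following Python sketch shows a geometric cooling schedule, a Hamming distance between two dot-bracket structures, and an acceptance test. It is not SIMARD's implementation: all function names are hypothetical, and the Metropolis acceptance rule is the textbook simulated-annealing criterion, assumed here because the abstract does not state which rule SIMARD uses.

```python
import math
import random


def hamming_distance(a, b):
    """Number of positions at which two equal-length dot-bracket strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def geometric_temperatures(t0, alpha, steps):
    """Yield T_k = t0 * alpha**k for k = 0 .. steps-1 (0 < alpha < 1 cools)."""
    t = t0
    for _ in range(steps):
        yield t
        t *= alpha


def accept(delta, temperature, rng=random):
    """Metropolis rule (assumed): always accept improvements, and accept a
    worsening of size delta > 0 with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

In an annealing loop, `delta` would be the change in Hamming distance to the target structure caused by a candidate mutation, and the temperature would be drawn from `geometric_temperatures` at each step.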
Article
Regulatory RNAs have become integral components of the synthetic biology and bioengineering toolbox for controlling gene expression. We recently expanded this toolbox by creating small transcription activating RNAs (STARs) that act by disrupting the formation of a target transcriptional terminator hairpin placed upstream of a gene. While STARs are a promising addition to the repertoire of RNA regulators, much work remains to be done to optimize the fold activation of these systems. Here we apply rational RNA engineering strategies to improve the fold activation of two STAR regulators. We demonstrate that a combination of promoter strength tuning and multiple RNA engineering strategies can improve fold activation from 5.4-fold to 13.4-fold for a STAR regulator derived from the pbuE riboswitch terminator. We then validate the generality of our approach and show that these same strategies improve fold activation from 2.1-fold to 14.6-fold for an unrelated STAR regulator, opening the door to creating a range of additional STARs to use in a broad array of biotechnologies. We also establish that the optimizations preserve the orthogonality of these STARs between themselves and a set of RNA transcriptional repressors, enabling these optimized STARs to be used in sophisticated circuits. This article is protected by copyright. All rights reserved.
Article
We have implemented a method for the design of RNA sequences that should fold to arbitrary secondary structures. A popular energy model allows one to take the derivative with respect to composition, which can then be interpreted as a force and used for Newtonian dynamics in sequence space. Combined with a negative design term, one can rapidly sample sequences which are compatible with a desired secondary structure via simulated annealing. Results for 360 structures were compared with those from another nucleic acid design program using measures such as the probability of the target structure and an ensemble-weighted distance to the target structure.
Article
We describe an algorithm for designing the sequence of one or more interacting nucleic acid strands intended to adopt a target secondary structure at equilibrium. Sequence design is formulated as an optimization problem with the goal of reducing the ensemble defect below a user-specified stop condition. For a candidate sequence and a given target secondary structure, the ensemble defect is the average number of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of unpseudoknotted secondary structures. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, candidate mutations are evaluated on the leaf nodes of a tree-decomposition of the target structure. During leaf optimization, defect-weighted mutation sampling is used to select each candidate mutation position with probability proportional to its contribution to the ensemble defect of the leaf. As subsequences are merged moving up the tree, emergent structural defects resulting from crosstalk between sibling sequences are eliminated via reoptimization within the defective subtree starting from new random subsequences. Using a Θ(N^3) dynamic program to evaluate the ensemble defect of a target structure with N nucleotides, this hierarchical approach implies an asymptotic optimality bound on design time: for sufficiently large N, the cost of sequence design is bounded below by 4/3 the cost of a single evaluation of the ensemble defect for the full sequence. Hence, the design algorithm has time complexity Ω(N^3). For target structures containing N ∈ {100, 200, 400, 800, 1600, 3200} nucleotides and duplex stems ranging from 1 to 30 base pairs, RNA sequence designs at 37°C typically succeed in satisfying a stop condition with ensemble defect less than N/100. Empirically, the sequence design algorithm exhibits asymptotic optimality and the exponent in the time complexity bound is sharp.
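The ensemble defect defined in this abstract can be computed directly once equilibrium base-pair probabilities are known. The Python sketch below is a simplified illustration, not NUPACK code: the function names are hypothetical, indices are 0-based, and the pair probabilities are assumed to be supplied by some partition function calculation as a dictionary `{(i, j): p}` with `i < j`. For each nucleotide it adds the probability of being in the wrong state: paired to the wrong partner or unpaired when the target pairs it, or paired at all when the target leaves it unpaired.

```python
def pair_table(dot_bracket):
    """Map each position to its partner index, or None if unpaired."""
    partner = [None] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def ensemble_defect(pair_probs, target):
    """Expected number of incorrectly paired nucleotides for `target`.

    pair_probs: dict {(i, j): probability} with i < j (assumed input).
    """
    n = len(target)
    partner = pair_table(target)
    paired_prob = [0.0] * n  # probability that position i is paired at all
    for (i, j), p in pair_probs.items():
        paired_prob[i] += p
        paired_prob[j] += p
    defect = 0.0
    for i in range(n):
        j = partner[i]
        if j is None:
            correct = 1.0 - paired_prob[i]  # target wants i unpaired
        else:
            correct = pair_probs.get((min(i, j), max(i, j)), 0.0)
        defect += 1.0 - correct
    return defect
```

A defect of 0 means the ensemble always adopts the target; the stop condition quoted in the abstract corresponds to `ensemble_defect(...) < len(target) / 100`.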
Article
The Nucleic Acid Package (NUPACK) is a growing software suite for the analysis and design of nucleic acid systems. The NUPACK web server (http://www.nupack.org) currently enables: Analysis: thermodynamic analysis of dilute solutions of interacting nucleic acid strands. Design: sequence design for complexes of nucleic acid strands intended to adopt a target secondary structure at equilibrium. Utilities: evaluation, display, and annotation of equilibrium properties of a complex of nucleic acid strands. NUPACK algorithms are formulated in terms of nucleic acid secondary structure. In most cases, pseudoknots are excluded from the structural ensemble. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
Article
The classic model of the temporal variation of speculative prices (Bachelier 1900) assumes that successive changes of a price Z(t) are independent Gaussian random variables. But, even if Z(t) is replaced by log Z(t), this model is contradicted by facts in four ways, at least: (1) Large price changes are much more frequent than predicted by the Gaussian; this reflects the “excessively peaked” (“leptokurtic”) character of price relatives, which has been well-established since at least 1915. (2) Large practically instantaneous price changes occur often, contrary to prediction, and it seems that they must be explained by causal rather than stochastic models. (3) Successive price changes do not “look” independent, but rather exhibit a large number of recognizable patterns, which are, of course, the basis of the technical analysis of stocks. (4) Price records do not look stationary, and statistical expressions such as the sample variance take very different values at different times; this nonstationarity seems to put a precise statistical model of price change out of the question.
Article
The crystal structure of sodium guanylyl-3′,5′-cytidine (GpC) nonahydrate has been determined by X-ray diffraction procedures and refined to an R value of 0.054. GpC crystallizes with four molecules per monoclinic unit cell, space group C2, with cell dimensions: . Two molecules of GpC related by the 2-fold axis of the crystal form a small segment of right-handed, anti-parallel double-helical RNA in the crystal. Guanine is paired to cytosine through three hydrogen bonds of lengths 2.91, 2.95 and 2.86 Å. The bases along each strand are heavily stacked at a distance of about 3.4 Å. The fragments form skewed flattened rods within the lattice by the inter-molecular stacking of guanines with each other and the stacking of cytosine with the guanosine O1′ atom. The sodium cations are bound only to the ionized phosphate groups in this structure and exhibit face-sharing octahedral co-ordination. The sodium cations serve to bridge the rods of GpC fragments and organize them into sheets within the crystal. There are 18 water molecules per double-helical fragment which are all part of the first co-ordination shell of nitrogen, oxygen or sodium atoms.
Article
The crystal structure of sodium adenylyl-3′,5′-uridine (ApU) hexahydrate has been determined by X-ray diffraction procedures and refined to an R factor of 0.057. ApU crystallizes with two molecules per asymmetric unit in a monoclinic unit cell, space group P21, with cell dimensions: . The two independent molecules of ApU form a small segment of right-handed antiparallel double-helical RNA in the crystal, with Watson-Crick base-pairing between adenine and uracil. This is the first time that this Watson-Crick base-pair has been seen unambiguously at atomic resolution and it is also the first time that a nucleic acid fragment with double-helical symmetry has been seen at atomic resolution. The distance between the C1′ atoms of the adenine-uracil base-pair is slightly shorter than the analogous distance seen in guanine-cytosine base-pairs. The bases in each strand are heavily stacked. One sodium cation binds to the phosphates, as expected; however, the other sodium cation binds on the dyad axis in the minor groove of the double helix. It is co-ordinated directly to the two uracil carbonyl groups which protrude into the minor groove and is shielded from the nearest phosphates by a shell of water. This binding appears to be sequence-specific for ApU. One of the adenines also forms a pair of hydrogen bonds to a nearby ribose, utilizing N6 and N7. The 12 water molecules per double-helical fragment are all part of the first co-ordination shell. The ions and the symmetry of the double-helical fragment are the major organizing elements of the solvent region.
Article
A novel application of dynamic programming to the folding problem for RNA enables one to calculate the full equilibrium partition function for secondary structure and the probabilities of various substructures. In particular, both the partition function and the probabilities of all base pairs are computed by a recursive scheme of polynomial order N^3 in the sequence length N. The temperature dependence of the partition function gives information about melting behavior for the secondary structure. The pair binding probabilities, the computation of which depends on the partition function, are visually summarized in a “box matrix” display and this provides a useful tool for examining the full ensemble of probable alternative equilibrium structures. The calculation of this ensemble representation allows a proper application and assessment of the predictive power of the secondary structure method, and yields important information on alternatives and intermediates in addition to local information about base pair opening and slippage. The results are illustrated for representative tRNA, 5S RNA, and self-replicating and self-splicing RNA molecules, and allow a direct comparison with enzymatic structure probes. The effect of changes in the thermodynamic parameters on the equilibrium ensemble provides a further sensitivity check to the predictions.
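The cubic-time recursion mentioned in this abstract can be illustrated with a deliberately simplified energy model. The Python sketch below is not the algorithm of the paper, which uses loop-based thermodynamic parameters: here every canonical base pair is assumed to contribute the same energy `pair_energy`, and the names `partition_function`, `pair_energy`, `rt` and `min_loop` are hypothetical. The recursion splits on whether the last base of an interval is unpaired or paired with some position `k`, so each pseudoknot-free structure is counted exactly once.

```python
import math

# Watson-Crick and G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, rt=0.6163, min_loop=3):
    """Partition function over pseudoknot-free structures of `seq`.

    Simplified model (assumption): each base pair contributes the Boltzmann
    weight exp(-pair_energy / rt). rt defaults to roughly RT in kcal/mol at
    37 degrees C. min_loop is the minimum number of unpaired bases in a hairpin.
    """
    n = len(seq)
    w = math.exp(-pair_energy / rt)
    # z[i][j] is the partition function of the half-open slice seq[i:j];
    # empty and too-short slices keep the initial value 1 (open chain only).
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            total = z[i][last]  # case 1: the last base is unpaired
            for k in range(i, last - min_loop):
                if (seq[k], seq[last]) in CANONICAL:
                    # case 2: k pairs with last; left part and enclosed part
                    # fold independently.
                    total += z[i][k] * w * z[k + 1][last]
            z[i][j] = total
    return z[0][n]
```

With `pair_energy=0` every structure has weight 1, so the function simply counts the admissible structures, which is a convenient sanity check.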
Article
We describe the RNA folding problem and contrast it with the much more difficult protein folding problem. RNA has four similar monomer units, whereas proteins have 20 very different residues. The folding of RNA is hierarchical in that secondary structure is much more stable than tertiary folding. In RNA the two levels of folding (secondary and tertiary) can be experimentally separated by the presence or absence of Mg2+. Secondary structure can be predicted successfully from experimental thermodynamic data on secondary structure elements: helices, loops, and bulges. Tertiary interactions can then be added without much distortion of the secondary structure. These observations suggest a folding algorithm to predict the structure of an RNA from its sequence. However, to solve the RNA folding problem one needs thermodynamic data on tertiary structure interactions, and identification and characterization of metal-ion binding sites. These data, together with force versus extension measurements on single RNA molecules, should provide the information necessary to test and refine the proposed algorithm.
Article
The function of many RNAs depends crucially on their structure. Therefore, the design of RNA molecules with specific structural properties has many potential applications, e.g. in the context of investigating the function of biological RNAs, of creating new ribozymes, or of designing artificial RNA nanostructures. Here, we present a new algorithm for solving the following RNA secondary structure design problem: given a secondary structure, find an RNA sequence (if any) that is predicted to fold to that structure. Unlike the (pseudoknot-free) secondary structure prediction problem, this problem appears to be hard computationally. Our new algorithm, "RNA Secondary Structure Designer (RNA-SSD)", is based on stochastic local search, a prominent general approach for solving hard combinatorial problems. A thorough empirical evaluation on computationally predicted structures of biological sequences and artificially generated RNA structures as well as on empirically modelled structures from the biological literature shows that RNA-SSD substantially outperforms the best known algorithm for this problem, RNAinverse from the Vienna RNA Package. In particular, the new algorithm is able to consistently solve structures for which RNAinverse is unable to find solutions. The RNA-SSD software is publicly available under the name of RNA Designer at the RNASoft website (www.rnasoft.ca).
Article
It is fifty years since the first chemical synthesis of a dinucleoside phosphate and a dinucleotide with natural 3'→5'-internucleotide linkages was reported. The main developments in the methodology of oligo- and poly-nucleotide synthesis that have taken place since are described.
Article
Multi-objective evolutionary algorithms (MOEAs) that use non-dominated sorting and sharing have been criticized mainly for: (1) their O(MN^3) computational complexity (where M is the number of objectives and N is the population size); (2) their non-elitism approach; and (3) the need to specify a sharing parameter. In this paper, we suggest a non-dominated sorting-based MOEA, called NSGA-II (Non-dominated Sorting Genetic Algorithm II), which alleviates all of the above three difficulties. Specifically, a fast non-dominated sorting approach with O(MN^2) computational complexity is presented. Also, a selection operator is presented that creates a mating pool by combining the parent and offspring populations and selecting the best N solutions (with respect to fitness and spread). Simulation results on difficult test problems show that NSGA-II is able, for most problems, to find a much better spread of solutions and better convergence near the true Pareto-optimal front compared to the Pareto-archived evolution strategy and the strength-Pareto evolutionary algorithm - two other elitist MOEAs that pay special attention to creating a diverse Pareto-optimal front. Moreover, we modify the definition of dominance in order to solve constrained multi-objective problems efficiently. Simulation results of the constrained NSGA-II on a number of test problems, including a five-objective, seven-constraint nonlinear problem, are compared with another constrained multi-objective optimizer, and the much better performance of NSGA-II is observed.
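The O(MN^2) non-dominated sorting step named in this abstract can be sketched as follows. This is an illustrative Python version under stated assumptions, not the reference implementation: all objectives are assumed to be minimized, constraint handling and crowding distance are omitted, and the function names are hypothetical. Each solution records which solutions it dominates and how many dominate it; fronts are then peeled off by decrementing those counters.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Return a list of fronts, each a list of indices into `points`.

    Front 0 holds the non-dominated solutions, front 1 the solutions that are
    dominated only by members of front 0, and so on.
    """
    n = len(points)
    dominated = [[] for _ in range(n)]  # solutions that p dominates
    counts = [0] * n                    # number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    current = 0
    while fronts[current]:
        next_front = []
        for p in fronts[current]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:  # q is dominated only by earlier fronts
                    next_front.append(q)
        fronts.append(next_front)
        current += 1
    return fronts[:-1]  # drop the trailing empty front
```

The pairwise comparison loop performs N^2 dominance checks of cost M each, which gives the O(MN^2) bound quoted above.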