RNA Design Optimization: A Survey and Recent Advances
Denny Chen Dai
Senior Supervisor: Kay C. Wiese
School of Computing Science
Simon Fraser University
cda18@cs.sfu.ca
Abstract
The RNA design problem is a recently emerging research topic motivated by applications such as customized drug design and the self-assembly of RNA nano-objects. This paper surveys recent advances in RNA design. We discuss the empirical hardness of the problem as well as the combinatorial properties of its underlying sequence-structure map. We review existing algorithmic solutions and compare them. An algorithm performance prediction model is introduced and its relevance to RNA design is discussed. We conclude by proposing that RNA design could be extended into a multi-objective optimization problem, a research direction worth further exploration.
1 Introduction
RNA is a single-stranded molecule whose strand can fold back onto itself. An RNA molecule can be viewed as a strand over four types of bases: Adenine (A), Cytosine (C), Guanine (G), and Uracil (U). Intramolecular base pairs form bonds between different nucleotides (nt) of the same strand. AU and GC are called Watson-Crick base pairs and are the most common in RNAs. However, non-Watson-Crick base pairing also occurs, for example the GU and AC wobble pairs [22] [24]. In summary, the most stable and most frequently observed pairs are GC, AU, and GU, together with their mirrors CG, UA, and UG; these are called the canonical base pairs.
RNA secondary structure is formed through base pairing between nucleotides at different positions of the primary sequence. Physically, these pairing relations correspond to hydrogen bonds between the bases, and each secondary structure is associated with a free energy; different secondary structures therefore have different free energy levels. An RNA secondary structure is most stable in its ground state, the conformation that achieves the minimum free energy (MFE) among all possible secondary structure conformations.
In the classical RNA secondary structure prediction problem, we seek the MFE structure of a given RNA primary sequence. Efficient dynamic programming algorithms [16] find the MFE structure in $O(n^3)$ time for a sequence of length $n$. However, this is achieved under a simplified free energy model; under a complete energy model, the problem appears to be NP-hard. The reverse problem, RNA design, also appears to be NP-hard [15] [26]. In RNA design, given a target structure, we seek an RNA primary sequence that folds into this structure as its MFE ground state. Finding such sequences involves exploring an exponentially large sequence space whose size far surpasses that of the structure space [27]. Recent advances in solving the design problem are motivated by a number of promising applications such as customized drug design and the self-assembly of RNA nano-objects.

This work is submitted as a supporting document for the PhD depth examination at the School of Computing Science, Simon Fraser University.
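The cubic running time arises because the algorithm fills an $n \times n$ table and each cell considers $O(n)$ split points. As a minimal sketch, assuming the much simpler Nussinov base-pair maximization model instead of a thermodynamic energy model (every canonical pair scores one unit and loops cost nothing), the recursion can be written as below. This is not the algorithm of [16]; the function name `nussinov` and the minimum of three unpaired bases inside a hairpin are illustrative choices.

```python
# Canonical pairs as listed in Section 1: GC, AU, GU and their mirrors.
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}
MIN_LOOP = 3  # a pair (i, j) needs j - i > 3, i.e. at least 3 unpaired bases inside


def nussinov(seq):
    """Maximise the number of canonical base pairs of seq in O(n^3) time.

    Returns (number_of_pairs, structure in dot-bracket notation).
    """
    n = len(seq)
    # best[i][j] = maximum number of pairs in the substring seq[i:j] (half-open)
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(MIN_LOOP + 2, n + 1):      # shorter substrings cannot pair
        for i in range(n - length + 1):
            j = i + length
            score = best[i][j - 1]                 # last base seq[j-1] left unpaired
            for k in range(i, j - 1 - MIN_LOOP):   # seq[j-1] paired with seq[k]
                if seq[k] + seq[j - 1] in CANONICAL:
                    score = max(score, best[i][k] + best[k + 1][j - 1] + 1)
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < MIN_LOOP + 2:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - 1 - MIN_LOOP):
            if (seq[k] + seq[j - 1] in CANONICAL
                    and best[i][k] + best[k + 1][j - 1] + 1 == best[i][j]):
                structure[k], structure[j - 1] = "(", ")"
                stack.extend([(i, k), (k + 1, j - 1)])
                break
    return best[0][n], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

The three nested loops over `length`, `i` and `k` are where the $O(n^3)$ bound comes from; real MFE folders keep this table-filling shape but replace the unit pair score with loop-based free energies.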
This survey provides a brief overview of recent advances in RNA design. The rest of the paper is organized as follows. In Section 2, we introduce the RNA design problem and discuss issues related to the empirical hardness of solving it, together with computational methods for estimating this hardness. In Section 3, we give a brief literature review of existing algorithmic solutions for RNA design and present our recently developed local search method, rnaDesign. In Section 4, we introduce the topic of algorithm performance prediction and present a regression model for predicting RNA design performance. In Section 5, we discuss the feasibility of formulating RNA design as a multi-objective optimization problem, and conclusions are drawn in Section 6.
2 RNA Design Problem
An RNA primary sequence is a string of length $N$ over the alphabet $\Sigma = \{A, C, G, U\}$, whose letters represent the four nucleotides. RNA sequences fold into particular spatial structures in order to carry out their biological functions. The secondary structure of an RNA molecule is a coarse-grained simplification of the more complex three-dimensional tertiary structure. Formally, given a single-stranded RNA sequence $r = (r_1, r_2, \ldots, r_N)$ of length $N$, where $r_i \in \{A, C, G, U\}$ for $i \in [1, N]$, a secondary structure is a set of ordered base pairs $(i, j)$ with $1 \le i < j \le N$ satisfying the following constraints:
1. $j - i > 3$, i.e. a base cannot pair with any of its three nearest downstream neighbors, so every hairpin loop contains at least three unpaired nucleotides, and
2. $\{i, j\} \cap \{i', j'\} = \emptyset$ for any two distinct pairs $(i, j)$ and $(i', j')$, i.e. the base pairs do not conflict: each nucleotide takes part in at most one pair.
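The two constraints can be checked mechanically. The sketch below is a hypothetical helper, not taken from any library, that validates a set of 1-based pairs against them. Note that, exactly as stated, the two constraints do not exclude crossing (pseudoknotted) pairs, so this check does not exclude them either.

```python
def is_valid_structure(pairs, n):
    """Return True if the 1-based base pairs satisfy the two constraints.

    pairs: a set of (i, j) tuples; n: the sequence length N.
    """
    used = set()
    for i, j in pairs:
        if not 1 <= i < j <= n:        # ordered pair inside the sequence
            return False
        if j - i <= 3:                 # constraint 1: j - i > 3
            return False
        if i in used or j in used:     # constraint 2: no nucleotide in two pairs
            return False
        used.update((i, j))
    return True


print(is_valid_structure({(1, 9), (2, 8)}, 9))   # True
print(is_valid_structure({(1, 4)}, 9))           # False: j - i = 3
print(is_valid_structure({(1, 9), (1, 8)}, 9))   # False: position 1 used twice
```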
In RNA design, we search for RNA primary sequences that fold into a predefined secondary structure. In this problem, a target structure $S^*$ is given, which defines a set of pairing relationships among nucleotides:

$$\{(i, j) \mid i, j \in [1, N]\} \tag{1}$$

where $N$ is the size of the structure, and $i, j$ are nucleotide positions between which a pairing bond exists. For a given RNA primary sequence $r$, a minimum free energy structure exists, representing the ground-state secondary structure of $r$:

$$S^o_r = \arg\min_{S_r \in \Omega} e(S_r) \tag{2}$$

Here $\Omega$ denotes the set of all possible structures over $r$, and $e(S_r)$ is the thermodynamic free energy of $S_r$. In the design problem, given the target structure $S^*$, the objective is to find a sequence $r$ whose MFE structure $S^o_r$ conforms to $S^*$:

$$d(S^o_r, S^*) = 0 \tag{3}$$

where $d$ is a structure distance between two RNA secondary structures.
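The text leaves the distance $d$ unspecified. One common choice, assumed here, is the base-pair distance: the number of pairs present in one structure but not in the other. The sketch also assumes structures are written in dot-bracket notation (parentheses for paired positions, dots for unpaired ones); all helper names are hypothetical.

```python
def pairs_from_dot_bracket(db):
    """Convert a dot-bracket string into a set of 1-based base pairs (i, j)."""
    stack, pairs = [], set()
    for pos, ch in enumerate(db, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(s1, s2):
    """Number of base pairs present in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(pairs_from_dot_bracket(s1) ^ pairs_from_dot_bracket(s2))


def meets_design_objective(mfe_structure, target):
    """Equation (3) with d taken to be the base-pair distance."""
    return base_pair_distance(mfe_structure, target) == 0


print(base_pair_distance("((....))", "(......)"))      # 1: pair (2, 7) is missing
print(meets_design_objective("((....))", "((....))"))  # True
```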
2.1 Combinatorial Properties and Empirical Hardness
Solving the RNA design problem requires searching the large combinatorial space of all possible sequences for one that folds into the predefined structure. For an RNA secondary structure of $n$ nucleotides, the underlying sequence space contains $4^n$ distinct candidates. It is also known that the total number of distinct RNA secondary structures is much smaller, with an estimated upper bound [15] of

$$0.7137 \, n^{-3/2} \, 2.2888^{n} \tag{4}$$
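To make the gap between the two spaces concrete, the structures allowed by the definition in Section 2 can be counted exactly, ignoring sequence compatibility. This sketch additionally assumes that pairs may not cross (the usual pseudoknot-free model) and uses the standard recursion on the partner of the last position: $S(m) = S(m-1) + \sum_{k=1}^{m-4} S(k-1)\,S(m-k-1)$ with $S(m) = 1$ for $m \le 4$. The code prints Equation (4) alongside for comparison, but does not check how closely the two agree.

```python
def count_structures(n):
    """Exact number of secondary structures on n positions.

    Every pair (i, j) needs j - i > 3 and pairs may not cross; sequence
    compatibility is ignored. The last position m is either unpaired or paired
    with some k <= m - 4, which splits the rest into independent regions of
    lengths k - 1 (outside the pair) and m - k - 1 (inside the pair).
    """
    counts = [1] * (n + 1)  # lengths 0..4 admit only the empty structure
    for m in range(5, n + 1):
        total = counts[m - 1]
        for k in range(1, m - 3):
            total += counts[k - 1] * counts[m - k - 1]
        counts[m] = total
    return counts[n]


def estimate_eq4(n):
    """Equation (4): 0.7137 * n^(-3/2) * 2.2888^n."""
    return 0.7137 * n ** -1.5 * 2.2888 ** n


for n in (20, 50, 100):
    exact = count_structures(n)
    print(n, exact, f"{estimate_eq4(n):.3e}", f"structures/sequences = {exact / 4 ** n:.3e}")
```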
Therefore, there is a many-to-one mapping from the sequence space to the secondary structure space. A sequence $r$ maps to a structure $s$ if and only if $s$ is the MFE structure of $r$. Here we do not consider the possibility of one sequence folding into two MFE structures (RNA switches). Two sequences $r_1$ and $r_2$ that map to the same structure are called neutral counterparts, indicating that both share the same native MFE structure. For any given structure $s$, the set of all sequences that map to $s$ constitutes a combinatorial set called the neutral network. The earlier work [15] studies the combinatorial properties of neutral networks, and [7] addresses their biological importance in terms of evolutionary transitions as well as genotype-phenotype correlation.
The sequence-structure map was investigated empirically in [27]. It was found that neutral sequences percolate throughout the whole sequence space and that they are also highly connected. Here, an undirected connection between neutral sequences $r_1$ and $r_2$ exists if and only if the Hamming distance between $r_1$ and $r_2$ is 1. In evolutionary dynamics, such a connection represents a one-point mutation from one RNA primary sequence to another. A high degree of connectivity in the neutral network indicates that RNA secondary structure tends to be conserved under sequence mutation. The mutational landscape of RNA sequences was studied in [28], where its biological relevance was also discussed.
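The connectivity rule above (an edge between neutral sequences at Hamming distance 1) translates directly into code. In this minimal sketch, `fold` is a placeholder for any function that returns a sequence's MFE structure, for example a wrapper around an RNA folding package; it is an assumption, not a specific library call.

```python
ALPHABET = "ACGU"


def hamming(r1, r2):
    """Hamming distance between two equal-length sequences."""
    if len(r1) != len(r2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(r1, r2))


def one_point_mutants(r):
    """Yield every sequence at Hamming distance exactly 1 from r (3 * len(r) in total)."""
    for i, base in enumerate(r):
        for b in ALPHABET:
            if b != base:
                yield r[:i] + b + r[i + 1:]


def neutral_neighbors(r, fold):
    """One-point mutants of r that keep r's MFE structure, i.e. r's neighbors
    in the neutral network. `fold` maps a sequence to its MFE structure."""
    target = fold(r)
    return [m for m in one_point_mutants(r) if fold(m) == target]


print(len(list(one_point_mutants("ACGU"))))  # 12
```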
Recent work [26] demonstrated the NP-completeness of RNA design by first transforming the design problem into an inverse HMM problem. In the inverse HMM, the hidden path is equivalent to the RNA secondary structure, and the sequence emitted along that path is equivalent to the RNA primary sequence. A polynomial reduction from 3-SAT then shows that no polynomial-time algorithm can solve RNA design unless P = NP.
2.2 Sequence Neighborhood Boundary Estimation
The many-to-one mapping implies that, to solve the RNA design problem, one only needs to locate the neutral network of the target secondary structure. Once one sequence of this network is found, further solutions can be reached from it through neutral one-point mutations. However, the distribution of neutral networks in the sequence space is largely unknown, because mapping it would require exhaustively enumerating a sequence space whose size is exponential in the sequence length, which is beyond current computing power.
Schuster et al. empirically studied the structure density surface of arbitrary RNA sequences and showed that a small-radius Hamming neighborhood ball exists around an arbitrary sequence [27]: within such a ball the structure coverage is high, so that most common secondary structures are the MFE structure of at least one sequence inside the ball. Therefore, an estimate of the Hamming radius of this neighborhood ball gives an empirical upper bound on how far one must search to find arbitrary neutral networks in the sequence space.
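How large is a Hamming ball? Choosing the $d$ positions that differ and one of the three alternative bases at each gives $|B_k| = \sum_{d=0}^{k} \binom{n}{d}\, 3^d$ sequences within radius $k$ of a sequence of length $n$. The sketch below evaluates this standard count; it shows that even a small radius covers a very large set of sequences, while still being a vanishing fraction of the $4^n$ sequence space. The function name is hypothetical.

```python
from math import comb


def hamming_ball_size(n, k, alphabet_size=4):
    """Number of length-n sequences within Hamming distance k of a fixed sequence.

    Choose the d positions that differ (comb(n, d) ways) and one of the
    alphabet_size - 1 alternative letters at each of them.
    """
    return sum(comb(n, d) * (alphabet_size - 1) ** d for d in range(min(k, n) + 1))


n = 100
for k in (1, 5, 10):
    size = hamming_ball_size(n, k)
    print(k, size, f"fraction of sequence space = {size / 4 ** n:.3e}")
```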
To estimate the radius of the neighborhood ball, Schuster et al. proposed a closest-approach distance measure that gives an upper bound on the minimum distance between sequences folding into two different structures. However, this method is computationally costly and becomes infeasible for long RNA sequences.
We recently proposed an alternative method, the Incremental Redundancy Estimation (IRE) approach, for determining the sequence neighborhood boundary [8]. Our experiments confirmed the existence of small-radius neighborhood balls centered at arbitrary sequences, with a radius much smaller than the sequence length. The method also scales well with sequence length and estimates the Hamming boundary of long sequences with good accuracy. A consequence of this work is that the Hamming boundary of the neighborhood ball gives an empirical constraint for algorithms that explore the sequence space to solve RNA design; exploiting the combinatorial properties of the neighborhood structure may also inform the design of novel algorithms for the problem.
3 RNA Design Algorithms
In the literature, heuristic search is established as the standard technique for tackling the RNA design problem, as it enables an effective and efficient exploration of the high-dimensional sequence-structure space. RNA design is the inverse of the classical RNA secondary structure prediction problem, and its study provides complementary insights into the theory of evolutionary dynamics [7]. General design issues are addressed in [12]. Typically, a good RNA design should exhibit both high sequence affinity and high structure specificity:
• Sequence affinity requires the target folding energy $e(r, S^*)$ to be low, indicating thermodynamic stability. Here $r$ is the RNA primary sequence, $S^*$ is the target structure for design, and $e(r, S^*)$ is the thermodynamic free energy of sequence $r$ when it adopts structure $S^*$.
• Structure specificity imposes the primary constraint on the design problem, requiring that a designed sequence $r$ have a native MFE structure $S^o_r$ similar to the target $S^*$. In other words, the structural distance $d(S^o_r, S^*)$ should be small.
An early approach, RNAinverse [16], performs an adaptive local search in sequence space, minimizing a cost function defined as the distance between the native fold and the target fold. The native fold is the MFE structure of the sequence currently under investigation by the local search; the target fold is the secondary structure that the design must achieve. The performance of the algorithm, however, is quite sensitive to the initial search point (the starting sequence), and when the search is trapped in local optima RNAinverse can fail to return a solution. More recent works [2] [1] (RNA-SSD) apply stochastic local search to the problem. RNA-SSD first applies a hierarchical decomposition procedure that recursively breaks the full-length structure down into structural components; a local search is then run on each substructure to find a suitable subsequence, and in a final step the full-length sequence is assembled to form the solution. Empirical results [1] show that RNA-SSD outperforms RNAinverse in terms of both runtime and problem solvability. Another hybrid algorithm, INFO-RNA [6], combines a dynamic programming (DP) procedure with an adaptive random walk: the DP generates initial sequence(s) heuristically, and the subsequent local search improves solution quality. The designability of the design problem over different alphabets was studied in [5], which also presented a deterministic branch-and-bound algorithm.
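The adaptive-walk idea can be sketched as follows: mutate one position at a time and keep the mutant whenever the distance between its fold and the target does not increase. This is a simplified illustration, not RNAinverse's actual implementation. The base-pair distance is an assumed choice of structure distance, and `toy_fold` is a deliberately trivial, hypothetical stand-in for an MFE folder (it only zips a single stem inward from both ends), used so that the example runs without external software; in practice a thermodynamic folding routine would be passed as `fold`.

```python
import random

PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}  # canonical pairs from Section 1


def bp_distance(s1, s2):
    """Base-pair distance between two dot-bracket structures (assumed choice of d)."""
    def pairs(db):
        stack, out = [], set()
        for pos, ch in enumerate(db):
            if ch == "(":
                stack.append(pos)
            elif ch == ")":
                out.add((stack.pop(), pos))
        return out
    return len(pairs(s1) ^ pairs(s2))


def adaptive_walk(target, fold, steps=5000, rng=None):
    """Accept a random one-point mutation whenever the distance between its fold
    and the target does not increase; stop early once the distance reaches 0."""
    rng = rng or random.Random()
    seq = [rng.choice("ACGU") for _ in target]
    dist = bp_distance(fold("".join(seq)), target)
    for _ in range(steps):
        if dist == 0:
            break
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice([b for b in "ACGU" if b != old])
        new_dist = bp_distance(fold("".join(seq)), target)
        if new_dist <= dist:
            dist = new_dist   # keep neutral and improving mutations
        else:
            seq[pos] = old    # revert mutations that move away from the target
    return "".join(seq), dist


def toy_fold(seq, min_loop=3):
    """Hypothetical stand-in for an MFE folder (not a thermodynamic model): zips a
    single stem inward from both ends while the end bases form canonical pairs."""
    db = ["."] * len(seq)
    i, j = 0, len(seq) - 1
    while j - i > min_loop and seq[i] + seq[j] in PAIRS:
        db[i], db[j] = "(", ")"
        i, j = i + 1, j - 1
    return "".join(db)


seq, dist = adaptive_walk("((((....))))", toy_fold, rng=random.Random(1))
print(seq, toy_fold(seq), dist)
```

Because mutations that increase the distance are reverted, the distance never grows during the walk. With a realistic folding function the walk can still stall in a local optimum, which is the sensitivity to the starting sequence noted above.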
3.1 Local Search for RNA Design
We recently proposed a novel local search algorithm, rnaDesign, for solving the problem [9]. Existing algorithms use local search as a supplementary procedure for improving solution quality [16] [6]. Although empirical results show that combining local search with ad hoc design methods leads to improved performance, it also inevitably increases the complexity of both algorithm design and implementation. We demonstrated that an adaptive local search procedure on its own is capable of solving the design problem, and showed that in certain cases it outperforms a hybrid method. Our rnaDesign algorithm is a special case of Simulated Annealing [20] in which a combination of three heuristic schemes is used and the annealing temperature is kept fixed throughout the optimization process. A further motivation for this work is that, since local search performance is closely tied to the optimization problem being investigated, a performance analysis of the algorithm provides insights into the empirical hardness of RNA design as well as its combinatorial properties.
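With the temperature held fixed, simulated annealing reduces to a Metropolis search: a move that worsens the cost by $\Delta > 0$ is accepted with probability $e^{-\Delta/T}$, and non-worsening moves are always accepted. The generic sketch below shows only this acceptance rule. The neighborhood and cost functions are placeholders, the toy usage (recovering a fixed string) is purely illustrative, and the sketch does not reproduce rnaDesign's three heuristic schemes.

```python
import math
import random


def fixed_temperature_sa(initial, neighbor, cost, temperature, steps, rng=None):
    """Simulated annealing with a constant temperature (temperature must be > 0).

    Moves that do not increase the cost are always accepted; a move that
    increases it by delta > 0 is accepted with probability exp(-delta / temperature).
    Returns the best solution seen and its cost.
    """
    rng = rng or random.Random()
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Toy usage (illustration only): recover a fixed string by single-letter changes.
GOAL = "GGGAAAUCC"


def toy_cost(s):
    return sum(a != b for a, b in zip(s, GOAL))


def toy_neighbor(s, rng):
    i = rng.randrange(len(s))
    return s[:i] + rng.choice([b for b in "ACGU" if b != s[i]]) + s[i + 1:]


print(fixed_temperature_sa("A" * len(GOAL), toy_neighbor, toy_cost,
                           temperature=0.5, steps=2000, rng=random.Random(0)))
```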
4 Performance Prediction for RNA Design
It is empirically known that many hard combinatorial optimization problems can be solved efficiently using metaheuristic algorithms [3]. Metaheuristics (also known as stochastic local search, or SLS [17]) are abstract algorithmic frameworks that employ a high-level search strategy to explore the solution space of the target problem effectively and efficiently. The No Free Lunch Theorem [29] states that, although one algorithm may outperform another on a particular problem or problem instance, no algorithm is optimal across all problems. Therefore, the best optimization strategy is to develop algorithms specialized to the specific problem under consideration. Experimental studies have shown that the performance of a given metaheuristic is affected by its meta-parameter setting (MPS) [17] [11]. Part of the research effort thus involves tuning the MPS to achieve optimal algorithm performance. However, this process is time consuming, as one has to explore a combinatorial parameter space whose size may itself be exponential in the number of parameters. Furthermore, algorithm performance may vary across problem instances: a meta-parameter setting that is optimal for one problem may not be optimal for another.

As a result, there is growing interest in applying machine learning techniques to tune the MPS automatically (either online or as a prior configuration step). This leads to more robust algorithm design, since the behavior of the SLS can then be adjusted on a per-problem or per-instance basis. An accurate prediction of algorithm performance is a prerequisite for this goal. Another motivation for this topic is the study of the empirical hardness of the optimization problem itself: since a prediction model correlates characteristics of the problem (input features) with the performance of the algorithm (target values), we expect to answer questions such as under what circumstances a given problem (or problem instance) is hard to solve, or which factors govern the performance of the algorithms being studied.
One recent work [19] demonstrates the use of linear regression models for predicting the performance of SLS algorithms on the SAT problem. [18] applies similar methods to find an optimal parameter configuration on a per-instance basis for SAT. [23] studies the empirical hardness of a combinatorial auction problem, applying linear and nonlinear regression models to identify key problem features that affect algorithm performance. Our previous work [10] demonstrated the use of parametric and non-parametric regression models for algorithm performance prediction on the RNA design problem. Our regression models include ridge regression (linear regression with regularization), the Nadaraya-Watson kernel method, and the Classification and Regression Tree (CART) model [4] [25]. In that work, we focused on predicting the performance of the RNAinverse algorithm [16] and evaluated our models on various secondary structure instances, including two biological data sets and one random data set. We showed that the non-parametric models (kernel and CART) outperform the parametric method (ridge) on the biological data sets, and found that the selection of input features is of crucial importance for prediction accuracy¹. We also found that the CART model identifies key structural features that affect algorithm performance, making it an intuitive tool for investigating the empirical hardness of RNA design.
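To make the parametric versus non-parametric contrast concrete, here is a minimal pure-Python sketch of ridge regression (a single feature with an unpenalized intercept, closed form) and Nadaraya-Watson kernel regression with a Gaussian kernel. The kernel, the bandwidth, and the single-feature restriction for ridge are assumptions of this sketch; the feature values and runtimes are hypothetical placeholders, not the feature set or data of [10]. CART is not sketched.

```python
import math


def ridge_1d(xs, ys, lam):
    """Ridge regression y ~ b + w * x with an unpenalised intercept b.

    Minimises sum((y - b - w*x)^2) + lam * w^2; the closed form is
    w = Sxy / (Sxx + lam) on centred data and b = mean(y) - w * mean(x).
    """
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)
    return w, my - w * mx


def nadaraya_watson(train_x, train_y, query, bandwidth):
    """Predict at `query` as a Gaussian-kernel-weighted average of training targets."""
    weights = [
        math.exp(-sum((a - q) ** 2 for a, q in zip(x, query)) / (2 * bandwidth ** 2))
        for x in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; use a larger bandwidth")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical data: one structural feature per target structure vs. observed runtime.
features = [10.0, 20.0, 30.0, 40.0]
runtimes = [1.0, 2.1, 2.9, 4.2]
print(ridge_1d(features, runtimes, lam=1.0))
print(nadaraya_watson([(f,) for f in features], runtimes, (25.0,), bandwidth=5.0))
```

The ridge model commits to a straight line whose slope is shrunk by `lam`, whereas the kernel estimate adapts locally to the data, which is one intuition for why non-parametric models can fit irregular performance data better.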
Our work enables accurate prediction of algorithm performance on the RNA design problem. By applying the prediction model to a given design algorithm, we can predict structure designability (solvability)² on unseen secondary structures; furthermore, analyzing the model itself may help answer questions such as why some structures are hard or easy to design, or which structural components contribute to the overall design difficulty.
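Following the definition of designability in footnote 2, the measure can be computed from the outcomes of repeated independent runs. In this minimal sketch, the representation of a run as a `(solved, runtime)` pair and the function name are assumptions; the value is returned as a fraction (multiply by 100 for a percentage).

```python
def designability(runs, time_bound):
    """Fraction of independent runs that returned a valid design within time_bound.

    runs: list of (solved, runtime) pairs from repeated independent runs of a
    stochastic design algorithm on a single target structure.
    """
    if not runs:
        raise ValueError("at least one run is required")
    successes = sum(1 for solved, runtime in runs if solved and runtime <= time_bound)
    return successes / len(runs)


# Hypothetical outcomes of four runs (runtimes in seconds).
print(designability([(True, 1.0), (True, 5.0), (False, 2.0), (True, 12.0)], time_bound=10.0))  # 0.5
```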
5 Multi-Objective Optimization for RNA Design
There is increasing interest in applying computational methods to design RNA molecules that satisfy specific constraints. For example, regarding the physical aspects of RNA, natural RNAs have been found to differ from random RNA sequences in a number of physical measurements [13], including thermodynamic stability, mutational robustness, linguistic complexity and folding efficiency (kinetics). One such goal is therefore to design RNA sequences whose characteristics resemble those of natural RNAs.
Under the thermodynamic stability requirement, a desirable RNA sequence should fold into its MFE structure with a low energy; it may also have few suboptimal structure alternatives, so that the designed sequence is stable. Furthermore, we may search for sequences whose MFE structures are insensitive to parameter perturbations in the free energy model.
For mutational robustness, we look for RNA sequences whose MFE structure remains unchanged under one-point or k-point mutations. The degree of neutrality, which measures the average portion of the structure that remains intact after a one-point mutation, can serve as such a measure. Alternatively, the deleterious effect [28] provides an overall measure of robustness by investigating the k-point mutation landscape of a given RNA sequence in the sequence space.
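One way to operationalize the degree of neutrality described above, assumed here, is to average, over all one-point mutants, the fraction of the original MFE base pairs that survive in the mutant's MFE structure. `fold` is a placeholder for a real folding routine, and the toy fold in the usage example is hypothetical.

```python
def _pairs(db):
    """Set of (i, j) base pairs (0-based) in a dot-bracket string."""
    stack, out = [], set()
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            out.add((stack.pop(), pos))
    return out


def degree_of_neutrality(seq, fold, alphabet="ACGU"):
    """Average fraction of the base pairs of fold(seq) still present in the
    MFE structure of each one-point mutant of seq.

    `fold` maps a sequence to a dot-bracket MFE structure (placeholder here).
    """
    reference = _pairs(fold(seq))
    if not reference:
        raise ValueError("the MFE structure of seq has no base pairs")
    fractions = []
    for i, base in enumerate(seq):
        for b in alphabet:
            if b != base:
                mutant = seq[:i] + b + seq[i + 1:]
                kept = len(reference & _pairs(fold(mutant)))
                fractions.append(kept / len(reference))
    return sum(fractions) / len(fractions)


# Hypothetical fold for illustration: the hairpin exists only if the first base is G.
def toy_fold(s):
    return "((....))" if s[0] == "G" else "........"


print(degree_of_neutrality("GAAAAAAC", toy_fold))  # 0.875: 21 of 24 mutants keep both pairs
```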
For linguistic complexity, we examine the content of RNA sequences and search for repeated patterns that conform to a given requirement. Linguistic complexity has been found to be lower in various natural RNAs than in random sequences, which makes it an informative criterion for RNA design.
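This sketch assumes one common definition of linguistic complexity: the number of distinct substrings observed in the sequence divided by the maximum number that could occur, where for each length $k$ at most $\min(4^k, n-k+1)$ distinct substrings are possible. Other variants exist (for example, products over substring lengths), and the exact measure used in [13] is not specified here.

```python
def linguistic_complexity(seq, alphabet_size=4):
    """Distinct substrings observed in seq divided by the maximum possible number.

    For each length k, at most min(alphabet_size**k, n - k + 1) distinct
    substrings can occur. Enumerates O(n^2) substrings, so it is intended
    for short sequences.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    observed = sum(len({seq[i:i + k] for i in range(n - k + 1)}) for k in range(1, n + 1))
    possible = sum(min(alphabet_size ** k, n - k + 1) for k in range(1, n + 1))
    return observed / possible


print(linguistic_complexity("AAAA"))  # 0.4: only 4 of 10 possible substrings occur
print(linguistic_complexity("ACGU"))  # 1.0
```

A repetitive sequence scores low because it reuses the same substrings, matching the observation that natural RNAs tend to have lower linguistic complexity than random sequences.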
¹ In the literature, the feature selection problem (FSP) itself has been studied, for example in [21].
² Designability is often used to characterize the performance of a given stochastic local search algorithm: the algorithm is run multiple times independently on the problem instance, and the percentage of successful runs that return a valid result within a given runtime bound is recorded.
We may also consider the kinetic aspects of RNA and aim to design RNA sequences that achieve a certain level of folding efficiency. For folding efficiency [14], we measure the total number of elementary steps required to fold a given RNA sequence into its native MFE structure. Natural RNAs have been found to have persistent meta-stable states with relatively short folding times. Empirical experiments also show that arbitrary RNA sequences usually have frustrated energy landscapes with high energy barriers and rugged regions.
The research question is therefore how to design RNA sequences that satisfy various physical constraints while folding into a prescribed secondary structure. Since multiple objectives exist in this optimization process, we are solving a Multi-Objective Optimization Problem (MOOP). A MOOP requires the simultaneous optimization of several possibly conflicting objective functions, and its typical solution is a set of non-dominated candidate solutions. In Operations Research, decisions are made from a set of candidate strategies, and the choice of one strategy over another represents a trade-off among the objectives. In a MOOP, optimal solutions are called Pareto optimal (or Pareto efficient): a solution $s^*$ is Pareto optimal if there exists no feasible solution $s$ that improves at least one objective function value without simultaneously worsening another.
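The Pareto definition translates into a dominance test and a filter that keeps the non-dominated candidates. This sketch assumes every objective is minimized; the candidate scores are hypothetical pairs of objectives (for example, a structure distance and a stability penalty), not data from any design study.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b, with every objective minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated objective vectors, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored on two objectives, both minimised.
candidates = [(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]
print(pareto_front(candidates))  # [(1, 5), (2, 2), (3, 1)]
```

The quadratic filter is adequate for small candidate sets; the returned front is exactly the set of trade-off solutions among which a final design would be chosen.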
6 Conclusion and Discussion
In this survey, we presented the RNA design problem and its algorithmic solutions. We showed that RNA design is an empirically hard problem and that heuristic search is established as the standard technique for tackling it. Solving the RNA design problem requires an efficient and effective exploration of the high-dimensional sequence space in search of candidates that fold into a given target structure. Understanding the underlying sequence-structure combinatorics is therefore crucial, both for interpreting algorithm performance and for developing new algorithms for the problem. We also introduced the algorithm performance prediction model and discussed its importance for the study of empirical hardness. We identified a number of recently proposed RNA design criteria and proposed solving RNA design as a multi-objective optimization problem. In summary, RNA design is a promising research topic with various open issues to be further explored.
Acknowledgments
This work is supported and funded by Dr. Kay C. Wiese (senior supervisor) and the School of
Computing Science at Simon Fraser University.
References
[1] Rosalía Aguirre-Hernández, Holger H. Hoos, and Anne Condon. Computational RNA Secondary Structure Design: Empirical Complexity and Improved Methods. BMC Bioinformatics, 8(34), 2007.
[2] Mirela Andronescu, Anthony P. Fejes, Frank Hutter, Holger H. Hoos, and Anne Condon. A new algorithm for RNA secondary structure design. J. Mol. Biol., 336(3):607–624, February 2004.
[3] Christian Blum and Andrea Roli. Metaheuristics in combinatorial optimization: Overview and
conceptual comparison. ACM Comput. Surv., 35(3):268–308, 2003.
[4] Leo Breiman and Jerome Friedman. Classification and Regression Trees. Chapman and Hall,
1984.
[5] Bernd Burghardt and Alexander K. Hartmann. RNA Secondary Structure Design. Physical
Review E, 75(021920), 2007.
[6] Anke Busch and Rolf Backofen. INFO-RNA: a fast approach to inverse RNA folding. Bioinformatics, 22(15):1823–1831, 2006.
[7] Matthew C. Cowperthwaite and Lauren Ancel Meyers. How Mutational Networks Shape Evolution: Lessons from RNA Models. Annual Review of Ecology, Evolution, and Systematics, 38(1):203–230, 2007.
[8] Denny C. Dai, Herbert H. Tsang, and Kay C. Wiese. An incremental redundancy estimation for the sequence neighborhood boundary. Proceedings of The 4th Canadian Student Conference on Biomedical Computing and Bioinformatic, March 2009.
[9] Denny C. Dai, Herbert H. Tsang, and Kay C. Wiese. rnaDesign: Local search for RNA secondary structure design. Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, April 2009.
[10] Denny C. Dai and Kay C. Wiese. Performance prediction for RNA design using parametric and non-parametric regression model. Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, April 2009.
[11] Denny Chen Dai. A comparative study of metaheuristic algorithms for the fertilizer optimization problem. Master’s thesis, Department of Computer Science, University of Saskatchewan, August 2006.
[12] Robert M. Dirks, Milo Lin, Erik Winfree, and Niles A. Pierce. Paradigms for computational
nucleic acid design.
[13] N. Dromi, A. Avihoo, and D. Barash. Reconstruction of natural RNA sequences from RNA shape, thermodynamic stability, mutational robustness, and linguistic complexity by evolutionary computation. Journal of Biomolecular Structure and Dynamics, 26:147–62, 2008.
[14] Christoph Flamm, Walter Fontana, Ivo L. Hofacker, and Peter Schuster. RNA folding at elementary step resolution. RNA, 6:325–338, 2000.
[15] Walter Gruner, Robert Giegerich, Dirk Strothmann, Christian Reidys, Jacqueline Weber, Ivo L. Hofacker, Peter F. Stadler, and Peter Schuster. Analysis of RNA Sequence Structure Maps by Exhaustive Enumeration. Working Papers 95-10-099, Santa Fe Institute, Oct 1995.
[16] Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. S. Bonhoeffer, Manfred Tacker, and Peter Schuster. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. (Chemical Monthly), 125:167–188, 1994.
[17] Holger H. Hoos and Thomas Stützle. Stochastic Local Search: Foundations and Applications. Morgan Kaufmann, 2005.
[18] Frank Hutter and Youssef Hamadi. Parameter adjustment based on performance prediction:
Towards an instance-aware problem solver. Technical Report MSR-TR-2005-125, Microsoft
Research, December 2005.
[19] Frank Hutter, Youssef Hamadi, Holger H. Hoos, and Kevin Leyton-Brown. Performance prediction and automated tuning of randomized and parametric algorithms. In Principles and Practice of Constraint Programming (CP-06), Lecture Notes in Computer Science 4204, pages 213–228. Springer Berlin, 2006.
[20] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[21] Daphne Koller and Mehran Sahami. Toward optimal feature selection. In International Conference on Machine Learning, pages 284–292, 1996.
[22] Neocles Leontis and Eric Westhof. Geometric nomenclature and classification of RNA base
pairs. RNA, 7(4):499–512, 2001.
[23] Kevin Leyton-Brown, Eugene Nudelman, and Yoav Shoham. Learning the empirical hardness of optimization problems: The case of combinatorial auctions. In Constraint Programming (CP), pages 556–572, 2002.
[24] Uma Nagaswamy, Maia Larios-Sanz, James Hury, Shakaala Collins, Zhengdong Zhang, Qin Zhao, and George E. Fox. NCIR: a database of non-canonical interactions in known RNA structures. Nucleic Acids Research, 30(1):395–397, 2002.
[25] Brian D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press,
1996.
[26] Michael Schnall-Levin, Leonid Chindelevitch, and Bonnie Berger. Inverting the Viterbi Algorithm: an Abstract Framework for Structure Design. In William W. Cohen, Andrew McCallum, and Sam T. Roweis, editors, ICML, volume 307 of ACM International Conference Proceeding Series, pages 904–911. ACM, 2008.
[27] Peter Schuster, Walter Fontana, Peter F. Stadler, and Ivo L. Hofacker. From sequences to shapes and back: A case study in RNA secondary structures. Proceedings: Biological Sciences, 255(1344):279–284, 1994.
[28] Jérôme Waldispühl, Srinivas Devadas, Bonnie Berger, and Peter Clote. Efficient Algorithm for Probing the RNA Mutation Landscape. PLoS Computational Biology, 4(8), 2008.
[29] David H. Wolpert and William G. Macready. No free lunch theorems for search. Technical report, Santa Fe Institute, 1995.
ResearchGate has not been able to resolve any citations for this publication.
Thesis
Full-text available
Hard combinatorial optimization (CO) problems pose challenges to traditional algorithmic solutions. The search space usually contains a large number of local optimal points and the computational cost to reach a global optimum may be too high for practical use. In this work, we conduct a comparative study of several state-of-the-art metaheuristic algorithms for hard CO problems solving. Our study is motivated by an industrial application called the Fertilizer Blends Optimization. We focus our study on a number of local search metaheuristics and analyze their performance in terms of both runtime efficiency and solution quality. We show that local search granularity (move step size) and the downhill move probability are two major factors that affect algorithm performance, and we demonstrate how experimental tuning work can be applied to obtain good performance of the algorithms. Our empirical result suggests that the well-known Simulated Annealing (SA) algorithm showed the best performance on the fertilizer problem. The simple Iterated Improvement Algorithm (IIA) also performed surprisingly well by combining strict uphill move and random neighborhood selection. A novel approach, called Delivery Network Model (DNM) algorithm, was also shown to be competitive, but it has the disadvantage of being very sensitive to local search granularity. The constructive local search method (GRASP), which combines heuristic space sampling and local search, outperformed IIA without a construction phase; however, the improvement in performance is limited and generally speaking, local search performance is not sensitive to initial search positions in our studied fertilizer problem.
Article
Full-text available
Recent advances in molecular biology and computation have enabled evolutionary biologists to develop models that explicitly capture molecular structure. By including complex and realistic maps from genotypes to phenotypes, such models are yielding important new insights into evolutionary processes. In particular, computer simulations of evolving RNA structure have inspired a new conceptual framework for thinking about patterns of mutational connectivity and general theories about the nature of evolutionary transitions, the evolutionary ascent of nonoptimal phenotypes, and the origins of mutational robustness and modular structures. Here, we describe this class of RNA models and review the major conceptual contributions they have made to evolutionary biology.
Article
Full-text available
We present the Vienna RNA package, a set of computer programs for computing and comparing RNA secondary structures. The programs are based on dynamic programming algorithms and aim at predicting minimum free energy structures as well as computing equilibrium partition functions and base-pairing probabilities. An efficient heuristic for the RNA inverse folding problem is introduced. In addition, we present compact and efficient programs for comparing RNA secondary structures based on tree editing and alignment. All computer codes are written in ANSI C. They include implementations of modified algorithms for parallel computers with distributed memory. Performance analysis carried out on an Intel Hypercube shows that parallel computing becomes increasingly efficient as the sequences get longer.
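The inverse folding problem asks for a sequence whose predicted structure matches a given target. The sketch below is not the Vienna package's heuristic: it assumes a toy folding model (Nussinov-style base-pair maximization with a minimum hairpin loop of three, rather than free energy minimization) and a generic adaptive walk that keeps any mutation that does not increase the base-pair distance to the target. The names `fold`, `pairs_of`, `bp_distance`, and `design` are hypothetical.

```python
import random
from typing import Set, Tuple

# Watson-Crick and G-U wobble pairs allowed by the toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin


def fold(seq: str) -> str:
    """Toy folding model: maximize the number of nested base pairs (Nussinov)."""
    n = len(seq)
    # dp[i][j] = maximum number of pairs in the half-open slice seq[i:j]
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            best = dp[i + 1][j]  # seq[i] left unpaired
            for k in range(i + MIN_LOOP + 1, j):  # seq[i] paired with seq[k]
                if (seq[i], seq[k]) in PAIRS:
                    best = max(best, dp[i + 1][k] + 1 + dp[k + 1][j])
            dp[i][j] = best
    # Traceback to a dot-bracket string.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i <= 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j):
            if (seq[i], seq[k]) in PAIRS and dp[i][j] == dp[i + 1][k] + 1 + dp[k + 1][j]:
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k))
                stack.append((k + 1, j))
                break
    return "".join(structure)


def pairs_of(structure: str) -> Set[Tuple[int, int]]:
    """Base pairs (i, j) of a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    return len(pairs_of(s1) ^ pairs_of(s2))


def design(target: str, steps: int = 2000, seed: int = 0) -> Tuple[str, int]:
    """Adaptive walk: mutate, keep mutations that do not increase the distance."""
    rng = random.Random(seed)
    partner = {}
    for i, j in pairs_of(target):
        partner[i], partner[j] = j, i
    pair_choices = sorted(PAIRS)
    seq = [rng.choice("ACGU") for _ in target]
    for i, j in pairs_of(target):  # start from a target-compatible sequence
        seq[i], seq[j] = rng.choice(pair_choices)
    cost = bp_distance(fold("".join(seq)), target)
    for _ in range(steps):
        if cost == 0:
            break
        pos = rng.randrange(len(seq))
        trial = seq[:]
        if pos in partner:  # mutate both ends of a target pair together
            i, j = sorted((pos, partner[pos]))
            trial[i], trial[j] = rng.choice(pair_choices)
        else:
            trial[pos] = rng.choice("ACGU")
        trial_cost = bp_distance(fold("".join(trial)), target)
        if trial_cost <= cost:
            seq, cost = trial, trial_cost
    return "".join(seq), cost


if __name__ == "__main__":
    target = "((((...))))"
    designed, dist = design(target)
    print(designed, fold(designed), "distance:", dist)
```

Because the toy model ignores stacking and loop energies, a sequence it accepts need not fold into the target under a real energy model; an MFE predictor would be the obvious replacement for `fold`.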
We propose a new approach for understanding the algorithm-specific empirical hardness of $\mathcal{NP}$-hard problems. In this work we focus on the empirical hardness of the winner determination problem (an optimization problem arising in combinatorial auctions) when solved by ILOG's CPLEX software. We consider nine widely used problem distributions and sample randomly from a continuum of parameter settings for each distribution. We identify a large number of distribution-nonspecific features of data instances and use statistical regression techniques to learn, evaluate and interpret a function from these features to the predicted hardness of an instance.
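An empirical hardness model of this kind maps instance features to a predicted runtime. As a minimal sketch, assuming plain linear least squares on log10 runtime (the abstract says only "statistical regression techniques"), the code below fits weights through the normal equations with a small ridge term. `fit_linear`, `fit_hardness_model`, and `predict` are hypothetical names, and the usage data are synthetic, generated from a known rule rather than measured.

```python
import math
from typing import List, Sequence


def fit_linear(features: Sequence[Sequence[float]], targets: Sequence[float],
               ridge: float = 1e-6) -> List[float]:
    """Least-squares weights [bias, w1, ..., wd] via ridge-regularized normal equations."""
    rows = [[1.0, *x] for x in features]
    d = len(rows[0])
    # A = X^T X + ridge * I,  b = X^T y
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(d):
            if k != col:
                factor = a[k][col] / a[col][col]
                for j in range(col, d):
                    a[k][j] -= factor * a[col][j]
                b[k] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(d)]


def fit_hardness_model(features: Sequence[Sequence[float]],
                       runtimes: Sequence[float]) -> List[float]:
    """Fit log10 runtime, since runtimes of hard instances may span orders of magnitude."""
    return fit_linear(features, [math.log10(t) for t in runtimes])


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Predicted log10 runtime for the feature vector x."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


if __name__ == "__main__":
    # Synthetic data only: runtimes generated from a known log-linear rule.
    xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
    times = [10 ** (0.5 + 2.0 * f1 - f2) for f1, f2 in xs]
    w = fit_hardness_model(xs, times)
    print("weights:", [round(v, 3) for v in w])
```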
The application of metaheuristics to combinatorial optimization problems is a rapidly growing field of research. This is due to the importance of combinatorial optimization problems for both the scientific and the industrial world. We survey the most important metaheuristics in use today from a conceptual point of view. We outline the different components and concepts used in the various metaheuristics in order to analyze their similarities and differences. Two very important concepts in metaheuristics are intensification and diversification. These are the two forces that largely determine the behaviour of a metaheuristic. They are in some ways contrary but also complementary to each other. We introduce a framework, which we call the I&D frame, to relate different intensification and diversification components to one another. After outlining the advantages and disadvantages of different metaheuristic approaches, we conclude by pointing out the importance of hybridizing metaheuristics as well as integrating metaheuristics with other optimization methods.
Machine learning can be utilized to build models that predict the runtime of search algorithms for hard combinatorial problems. Such empirical hardness models have previously been studied for complete, deterministic search algorithms. In this work, we demonstrate that such models can also make surprisingly accurate predictions of the runtime distributions of incomplete and randomized search methods, such as stochastic local search algorithms. We also show for the first time how information about an algorithm's parameter settings can be incorporated into a model, and how such models can be used to automatically adjust the algorithm's parameters on a per-instance basis in order to optimize its performance. Empirical results for Novelty+ and SAPS on structured and unstructured SAT instances show very good predictive performance and significant speedups of our automatically determined parameter settings when compared to the default and best fixed distribution-specific parameter settings.
Tuning an algorithm's parameters for robust and high performance is a tedious and time-consuming task that often requires knowledge about both the domain and the algorithm of interest. Furthermore, the optimal parameter configuration may differ considerably across problem instances. In this report, we define and tackle the algorithm configuration problem, which is to automatically choose the optimal parameter configuration for a given algorithm on a per-instance basis. We employ an indirect approach that predicts algorithm runtime for the problem instance at hand and each (continuous) parameter configuration, and then simply chooses the configuration that minimizes the prediction. This approach is based on similar work by Leyton-Brown et al. [LBNS02, NLBD+04], who tackle the algorithm selection problem [Ric76] (given a problem instance, choose the best algorithm to solve it). While all previous studies of runtime prediction focused on tree search algorithms, we demonstrate that it is possible to predict fairly accurately the runtime of SAPS [HTH02], one of the best-performing stochastic local search algorithms for SAT. We also show that our approach automatically picks parameter configurations that speed up SAPS by an average factor of more than two when compared to its default parameter configuration. Finally, we introduce sequential Bayesian learning to the problem of runtime prediction, enabling an incremental learning approach and yielding very informative estimates of predictive uncertainty.
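The indirect approach described above reduces per-instance configuration to an argmin over predicted runtimes. The following is a minimal sketch under two stated assumptions: the single continuous parameter is discretized to a grid, and a hypothetical runtime model, `toy_predictor`, stands in for a learned one (for instance a regression model over instance features and parameter values). `choose_configuration` and `GRID` are likewise hypothetical names.

```python
from typing import Callable, Iterable, Sequence

# predict(instance_features, parameter_value) -> predicted log10 runtime
Predictor = Callable[[Sequence[float], float], float]


def choose_configuration(predict: Predictor, instance: Sequence[float],
                         candidates: Iterable[float]) -> float:
    """Per-instance configuration: pick the candidate with the lowest predicted runtime."""
    return min(candidates, key=lambda p: predict(instance, p))


def toy_predictor(instance: Sequence[float], alpha: float) -> float:
    """Hypothetical runtime model: feature 0 scales the cost, and the best
    parameter value is assumed to equal feature 1."""
    return 1.0 + 0.5 * instance[0] + (alpha - instance[1]) ** 2


# Continuous parameter discretized to a grid (an assumption of this sketch).
GRID = [0.5 + 0.1 * i for i in range(20)]

if __name__ == "__main__":
    for inst in ([2.0, 1.3], [0.0, 0.8]):
        print(inst, "->", round(choose_configuration(toy_predictor, inst, GRID), 2))
```

A grid keeps the sketch simple; with a smooth learned model, a continuous optimizer over the parameter range would be a natural alternative.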