
Abstract

The ability of biomolecules such as DNA, RNA or proteins to fold into a well-defined native state is a prerequisite for biologically functional molecules. A reasonable level of coarse-graining is needed in order to treat biomolecules within a theoretical framework. Kinetics and structure formation processes of biopolymers are crucially determined by the topological details of the underlying (free) energy landscape. We present a generic, problem-independent framework for exploring the low-energy portion of the energy landscape of discrete systems and apply it to the energy landscape of lattice proteins.
Energy Landscapes and Dynamics of Biopolymers
Michael T. Wolfinger¹, W. Andreas Svrcek-Seiler¹, Christoph Flamm², Ivo L. Hofacker¹, Peter F. Stadler²
¹ Institute for Theoretical Chemistry, University of Vienna, Austria
² Department of Computer Science, University of Leipzig, Germany
Tel: +43 1 4277 52747 Fax: +43 1 4277 52793 Email: {mtw,svrci,xtof,ivo,studla}@tbi.univie.ac.at Web: http://www.tbi.univie.ac.at/
Lattice Proteins
The HPNX model is used to study general properties of lattice heteropolymers. Within this simplified model, a conformation is regarded as a self-avoiding walk on a two- or three-dimensional lattice.
Left: 74-mer lattice protein on the 2D square lattice (SQ). Middle: 27-mer on the 3D simple cubic (SC) lattice. Right: Interaction scheme for the HPNX model used here.
The 20-letter alphabet of amino acids is reduced to a four-letter alphabet: hydrophobic (H), positive (P), negative (N) and neutral (X) residues. Energy is evaluated via a pair potential with attractive interactions when two beads are neighbors in the lattice but not along the chain. Lattice heteropolymers offer the advantage of modeling the general properties of proteins at relatively low computational cost. However, they represent a crude abstraction by implying fixed bond lengths and angles.
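The contact-based energy evaluation described above can be sketched in a few lines of Python. This is a minimal illustration for the 2D square lattice only. The coordinate-list representation, the function names, and in particular the numerical values in `CONTACT_ENERGY` are assumptions made for this example; the interaction scheme actually used on the poster is given in a figure that is not reproduced here.

```python
from itertools import combinations

# Assumed, illustrative contact energies for an HPNX-style alphabet.
# Keys are alphabetically sorted residue pairs; all other pairs count as 0.
CONTACT_ENERGY = {
    ("H", "H"): -4.0,
    ("N", "P"): -1.0,
    ("P", "P"): 1.0,
    ("N", "N"): 1.0,
}


def is_valid_walk(coords):
    """True if coords is a self-avoiding walk with unit-length bonds."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        abs(x1 - x2) + abs(y1 - y2) == 1
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def energy(sequence, coords):
    """Sum pair energies over beads adjacent on the lattice but not along the chain."""
    if len(sequence) != len(coords) or not is_valid_walk(coords):
        raise ValueError("sequence and conformation do not match")
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if j - i == 1:
            continue  # neighbors along the chain do not contribute
        (xi, yi), (xj, yj) = coords[i], coords[j]
        if abs(xi - xj) + abs(yi - yj) == 1:
            pair = tuple(sorted((sequence[i], sequence[j])))
            total += CONTACT_ENERGY.get(pair, 0.0)
    return total


# A 4-mer folded into a unit square: the two terminal H beads form one contact.
print(energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))
```

Under the assumed table, the folded 4-mer has a single H-H contact, while a fully stretched chain has no contacts and therefore zero energy.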
Energy Landscapes
The energy landscape of a biopolymer molecule is a complex surface of the free energy versus the conformational degrees of freedom. Energy landscapes are conveniently visualized by barrier trees that give an impression of the overall shape and topology of the landscape [2].
Schematic representation of an energy landscape and its associated barrier tree. Local minima are labeled with numbers (1-5), saddle points with lowercase letters (a-d). The global minimum is marked with an asterisk.
Constructing an energy landscape requires:
1. a set $X$ of configurations,
2. a notion $M$ of neighborhood on $X$, and
3. an energy function $f: X \to \mathbb{R}$.
The conformation space $X$ of a (biopolymer) sequence $S$ is the total set of configurations compatible with this sequence. The move set $M$ is an order relation on $X$, defining adjacency between the elements of $X$.
Energy landscape of a 74-mer lattice protein on the SQ lattice, calculated via the flooding algorithm with an energy threshold of -95. The four lowest local minima (corresponding structures listed below) on the right are not attached to the rest of the tree.
The move set crucially determines the topology of the underlying energy landscape. Here we use non-local, ergodic pivot moves that give rise to a fixed neighborhood relation $N \subseteq X \times X$. A walk between two conformations $x$ and $y$ is a list of conformations $x = x_1, \ldots, x_{m+1} = y$ such that $\forall\, 1 \le i \le m: N(x_i, x_{i+1})$. Given a threshold $\eta$, the lower part of the energy landscape (written as $X^{\eta}$) consists of all conformations $x$ such that $E(S, x) \le \eta$.
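To make the notion of a pivot-move neighborhood concrete, the following sketch enumerates pivot neighbors of a conformation on the 2D square lattice. It is an illustration under stated assumptions, not the implementation behind the poster: the coordinate-tuple representation, the name `pivot_neighbors`, and the choice to transform only the chain segment after the pivot are all hypothetical.

```python
# The seven non-identity symmetry operations of the square lattice,
# applied to coordinates taken relative to the pivot bead.
SYMMETRIES = [
    lambda x, y: (-y, x),    # rotation by 90 degrees
    lambda x, y: (-x, -y),   # rotation by 180 degrees
    lambda x, y: (y, -x),    # rotation by 270 degrees
    lambda x, y: (x, -y),    # reflection across the x-axis
    lambda x, y: (-x, y),    # reflection across the y-axis
    lambda x, y: (y, x),     # reflection across the diagonal
    lambda x, y: (-y, -x),   # reflection across the anti-diagonal
]


def pivot_neighbors(coords):
    """Return all self-avoiding conformations reachable by one pivot move."""
    coords = tuple(coords)
    n = len(coords)
    result = set()
    for k in range(1, n - 1):          # interior beads serve as pivots
        px, py = coords[k]
        head = coords[:k + 1]          # this part of the chain stays fixed
        for sym in SYMMETRIES:
            tail = []
            for x, y in coords[k + 1:]:
                dx, dy = sym(x - px, y - py)
                tail.append((px + dx, py + dy))
            candidate = head + tuple(tail)
            # keep only self-avoiding results that differ from the input
            if len(set(candidate)) == n and candidate != coords:
                result.add(candidate)
    return result


straight = ((0, 0), (1, 0), (2, 0))
for neighbor in sorted(pivot_neighbors(straight)):
    print(neighbor)
```

Because every symmetry operation has an inverse in the same set, the resulting neighborhood relation is symmetric: if $y$ is a pivot neighbor of $x$, then $x$ is a pivot neighbor of $y$. The sketch does not identify conformations that differ only by a global rotation or reflection.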
Schematic representation of the flooding algorithm (left plot). Starting from a certain conformation, all neighbor conformations are calculated repeatedly until all conformations in a certain region of the energy landscape are found.
Since exhaustive enumeration of all possible structures is only applicable to very short chains (the lattice protein folding problem was shown to be NP-hard), we developed an algorithm for selectively investigating the low-energy part of the energy landscape [5]. This approach starts at low-energy conformations and enumerates all “accessible” conformations. To illustrate the idea: to generate the lower part completely, one starts with all local minima $x$ with $E(S, x) \le \eta$. Iteratively, one visits all conformations that are neighbors of already seen conformations and stay below the energy threshold $\eta$. Two conformations $x$ and $y$ are mutually accessible at the level $\eta$ (written as $x \overset{\eta}{\leftrightarrow} y$) if there is a walk from $x$ to $y$ such that all conformations $z$ in the walk satisfy $E(S, z) \le \eta$ [2]. The saddle height $\hat{f}(x, y)$ of $x$ and $y$ is defined by
$$\hat{f}(x, y) = \min\{\eta \mid x \overset{\eta}{\leftrightarrow} y\}.$$
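For a small, fully enumerated landscape, the saddle height defined above can be computed by adding states in order of increasing energy and merging connected components with a union-find structure: the energy at which $x$ and $y$ first share a component is the lowest threshold at which they are mutually accessible. The sketch below assumes that all states can be listed explicitly; the function name and its arguments are hypothetical, and this is not presented as the procedure of reference [2].

```python
def saddle_height(states, energy, neighbors, x, y):
    """Lowest threshold at which x and y are joined by a walk, or None."""
    parent = {}

    def find(s):
        # union-find lookup with path halving
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for s in sorted(states, key=energy):
        parent[s] = s
        for t in neighbors(s):
            if t in parent:                # t lies at or below the current level
                parent[find(t)] = find(s)  # merge the two components
        if x in parent and y in parent and find(x) == find(y):
            return energy(s)
    return None


# Toy one-dimensional landscape: state i has energy E[i], neighbors are i-1 and i+1.
E = [0, 3, 1, 5, 2, 4, 0.5]
chain_neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(E)]
print(saddle_height(range(len(E)), lambda i: E[i], chain_neighbors, 0, 2))
```

In the toy landscape the only walk from state 0 to state 2 passes through state 1, so the saddle height equals the energy of state 1.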
Given the set of all local minima $X^{\eta}_{\min}$ below threshold $\eta$, the lower energy part $X^{\eta}$ of the energy landscape is given by
$$X^{\eta} = \{\, y \mid \exists\, x \in X^{\eta}_{\min} : \hat{f}(x, y) \le \eta \,\}.$$
Since the complete set of local minima $X^{\eta}_{\min}$ usually is not available, one can also start from a restricted set of low-energy conformations $X_{\text{init}}$ and hope to enumerate a large part of the low-energy conformations.
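The enumeration of accessible conformations below a threshold can be sketched as a breadth-first search over the neighborhood relation. The version below is generic: the caller supplies the energy function and the neighbor generator, both of which are placeholders here, and the name `flood` is an assumption of this example rather than the name used in reference [5].

```python
from collections import deque


def flood(initial, energy, neighbors, eta):
    """Collect every state reachable from `initial` without exceeding energy eta."""
    seen = {x for x in initial if energy(x) <= eta}
    queue = deque(seen)
    while queue:
        x = queue.popleft()
        for y in neighbors(x):
            if y not in seen and energy(y) <= eta:
                seen.add(y)        # hash set of visited states
                queue.append(y)
    return seen


# Toy one-dimensional landscape: state i has energy E[i], neighbors are i-1 and i+1.
E = [0, 3, 1, 5, 2, 4, 0.5]
chain_neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(E)]
print(sorted(flood({0}, lambda i: E[i], chain_neighbors, eta=3)))
```

Memory use grows with the number of states below the threshold, since every visited state is kept in the `seen` set. Starting from an incomplete initial set can leave regions undiscovered: in the toy landscape, flooding from state 0 alone with `eta=4` never reaches states 4 to 6, because the connecting state 3 lies above the threshold.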
Refolding Paths
The figure at the bottom of the left column illustrates a common problem with the calculation of barrier trees based on the flooding approach: saddle heights are not known a priori, resulting in non-connected trees. To overcome this, we developed a breadth-first-search heuristic for estimating minimal refolding paths between two arbitrary structures.
Energy profiles of the refolding process between two lattice protein structures from the barrier tree above. (Six panels, labeled E = -72, E = -76, E = -80, E = -81, E = -92 and E = -93; vertical axis: energy.)
Starting from a given conformation, we iteratively generate a predefined number of neighbor conformations with the constraint that adjacent structures have a lower (Hamming) distance to the target. Optionally, we also allow a few indirect steps on the way to the target, i.e. moves that result in a larger distance. All visited structures are stored in a hash, enabling an iterative approximation of low-energy refolding paths.
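One way to read the heuristic described above is as a width-limited breadth-first search that only accepts steps reducing the distance to the target and tracks the highest energy along each partial path. The sketch below follows that reading and omits the optional indirect steps. The function name, the `width` parameter, and the toy landscape on bit tuples are assumptions made for illustration, so the result is an estimate of the barrier, not a guaranteed minimum.

```python
def refolding_path(start, target, energy, neighbors, distance, width=10):
    """Return (barrier, path) for the best direct path found, or None."""
    best = None
    frontier = [(energy(start), [start])]
    visited = {start: energy(start)}   # structure -> lowest barrier seen so far
    while frontier:
        candidates = []
        for barrier, path in frontier:
            current = path[-1]
            if current == target:
                if best is None or barrier < best[0]:
                    best = (barrier, path)
                continue
            d = distance(current, target)
            for nxt in neighbors(current):
                if distance(nxt, target) >= d:
                    continue           # direct steps only: must get closer
                new_barrier = max(barrier, energy(nxt))
                if nxt in visited and visited[nxt] <= new_barrier:
                    continue           # already reached at least as cheaply
                visited[nxt] = new_barrier
                candidates.append((new_barrier, path + [nxt]))
        candidates.sort(key=lambda c: c[0])
        frontier = candidates[:width]  # keep a predefined number of candidates
    return best


# Toy example on bit tuples: a move flips one bit, distance is the Hamming distance.
TOY_ENERGY = {
    (0, 0, 0): 0, (1, 0, 0): 5, (0, 1, 0): 1, (0, 0, 1): 3,
    (1, 1, 0): 2, (0, 1, 1): 4, (1, 0, 1): 1, (1, 1, 1): 0,
}
flip = lambda s: [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]
hamming = lambda a, b: sum(u != v for u, v in zip(a, b))
print(refolding_path((0, 0, 0), (1, 1, 1), TOY_ENERGY.get, flip, hamming))
```

The search always terminates because the distance to the target strictly decreases along every path. Re-running it with a larger `width` trades run time for a better chance of finding a lower barrier.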
Connected barrier tree of the 74-mer lattice protein. The saddle between the two leftmost structures is at $E \le -93$, the saddle connecting these states to the ground state is at $E = -77$.
Dynamics
A reduced dynamics can be formulated as a Markov process by means of macrostates (i.e. basins in the barrier tree) and Arrhenius-like transition rates between them [4]. The transition rate to reach state $\beta$ from state $\alpha$ typically looks like
$$r_{\beta\alpha} = \Gamma_{\beta\alpha} \exp\left(-(E_{\beta\alpha} - G_{\alpha})/kT\right)$$
where $\Gamma_{\beta\alpha}$ is a pre-exponential entropic factor, $E_{\beta\alpha}$ is the energy of the saddle point between states $\alpha$ and $\beta$, and $G_{\alpha}$ is the free energy of basin $\alpha$.
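The rate expression above, combined with the master equation $dp_\beta/dt = \sum_{\alpha} (r_{\beta\alpha} p_\alpha - r_{\alpha\beta} p_\beta)$, can be illustrated with a small two-basin example. The sketch assumes a unit pre-exponential factor, $kT = 1$, and made-up energies, and it uses a plain explicit Euler integrator as a stand-in for whatever numerical scheme is used in reference [4].

```python
import math


def arrhenius_rate(saddle_energy, basin_free_energy, kT=1.0, prefactor=1.0):
    """Rate for leaving a basin over a saddle: prefactor * exp(-(E - G) / kT)."""
    return prefactor * math.exp(-(saddle_energy - basin_free_energy) / kT)


def propagate(populations, rates, dt, steps):
    """Explicit Euler integration; rates[b][a] is the rate from state a to state b."""
    p = list(populations)
    n = len(p)
    for _ in range(steps):
        change = [0.0] * n
        for a in range(n):
            for b in range(n):
                if a != b:
                    flux = rates[b][a] * p[a] * dt  # probability moving a -> b
                    change[a] -= flux
                    change[b] += flux
        p = [pi + ci for pi, ci in zip(p, change)]
    return p


# Two basins with assumed free energies, joined by one saddle at energy 2.0.
G = [0.0, -1.0]
saddle = 2.0
rates = [
    [0.0, arrhenius_rate(saddle, G[1])],   # row 0: rates into basin 0
    [arrhenius_rate(saddle, G[0]), 0.0],   # row 1: rates into basin 1
]
print(propagate([1.0, 0.0], rates, dt=0.1, steps=2000))
```

With a symmetric prefactor the long-time populations approach the Boltzmann ratio $p_1/p_0 = \exp(-(G_1 - G_0)/kT)$, which gives a quick consistency check on the sign convention of the rate formula.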
Reduced refolding dynamics between two selected states of the 74-mer. Several macrostates are populated temporarily, whereas all conformations find the target state after approx. 2 million time steps. (Axes: time, logarithmic from 10^-2 to 10^6; population percentage from 0 to 1.)
Visualization
Results from the macrostate dynamics are usually in good agreement with exact folding simulations obtained from Pinfold, a modified Monte Carlo type algorithm that was originally implemented for investigating RNA folding trajectories [1].
To facilitate the investigation of folding trajectories, we developed a graphical user interface for efficiently analyzing the results from Pinfold [3].
This novel framework not only allows for a rapid investigation of folding kinetics, but also provides a powerful method for further refinement of biopolymer folding landscapes.
References
[1] C. Flamm, W. Fontana, I. Hofacker, and P. Schuster. RNA folding kinetics at elementary step resolution. RNA, 6:325–338, 2000.
[2] C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
[3] S. Pötzsch, G. Scheuermann, M. T. Wolfinger, C. Flamm, and P. F. Stadler. Visualization of lattice-based protein folding simulations. In 10th International Conference on Information Visualization (IV06), 2006.
[4] M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Efficient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004.
[5] M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 74(4):726–732, 2006.