Abstract

The ability of biomolecules like DNA, RNA or proteins to fold into a well-defined native state is a prerequisite for biologically functional molecules. A reasonable level of coarse-graining is needed in order to treat biomolecules within a theoretical framework. Kinetics and structure formation processes of biopolymers are crucially determined by the topological details of the underlying (free) energy landscape. We present a generic, problem-independent framework for exploration of the low-energy portion of the energy landscape of discrete systems and apply it to the energy landscape of lattice proteins.
Energy Landscapes and Dynamics of Biopolymers
Michael T. Wolfinger (1), W. Andreas Svrcek-Seiler (1), Christoph Flamm (2), Ivo L. Hofacker (1), Peter F. Stadler (2)
(1) Institute for Theoretical Chemistry, University of Vienna, Austria
(2) Department of Computer Science, University of Leipzig, Germany
Tel: +43 1 4277 52747 Fax: +43 1 4277 52793 Email: {mtw,svrci,xtof,ivo,studla}@tbi.univie.ac.at Web: http://www.tbi.univie.ac.at/
Lattice Proteins
The HPNX model is used to study general properties of lattice heteropolymers. Within this simplified model, a conformation is regarded as a self-avoiding walk on a two- or three-dimensional lattice.
Left: 74-mer lattice protein on the 2D square lattice (SQ). Middle: 27-mer on the 3D simple cubic (SC) lattice. Right: Interaction scheme for the HPNX model used here.
The 20-letter alphabet of amino acids is reduced to a four-letter alphabet: hydrophobic (H), positive (P), negative (N) and neutral (X) residues. Energy is evaluated via a pair potential with attractive interactions when two beads are neighbors in the lattice but not along the chain. Lattice heteropolymers offer the advantage of modeling the general properties of proteins at relatively low computational cost. However, they represent a crude abstraction by implying fixed bond lengths and angles.
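The energy evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poster gives its HPNX interaction scheme in a figure that is not reproduced in the text, so the values in `PAIR_ENERGY` are placeholders, and the function names are hypothetical.

```python
from itertools import combinations

# ASSUMED pair potential, for illustration only. The interaction scheme
# actually used on the poster is shown in a figure, not in the text.
# Pairs that are not listed (anything involving X, H-P, H-N) contribute 0.
PAIR_ENERGY = {
    frozenset("H"): -4.0,   # H-H contact
    frozenset("P"): 1.0,    # P-P contact
    frozenset("N"): 1.0,    # N-N contact
    frozenset("PN"): -1.0,  # P-N contact
}


def lattice_distance(a, b):
    """Manhattan distance between two lattice points (2D or 3D)."""
    return sum(abs(u - v) for u, v in zip(a, b))


def is_self_avoiding_walk(coords):
    """True if consecutive beads are lattice neighbors and no site is reused."""
    bonded = all(lattice_distance(a, b) == 1 for a, b in zip(coords, coords[1:]))
    return bonded and len(set(coords)) == len(coords)


def contact_energy(sequence, coords):
    """Sum the pair potential over beads adjacent on the lattice but not in the chain."""
    if len(sequence) != len(coords) or not is_self_avoiding_walk(coords):
        raise ValueError("conformation is not a self-avoiding walk for this sequence")
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        # j - i > 1 skips chain neighbors; distance 1 means a lattice contact
        if j - i > 1 and lattice_distance(coords[i], coords[j]) == 1:
            energy += PAIR_ENERGY.get(frozenset((sequence[i], sequence[j])), 0.0)
    return energy
```

With these placeholder values, the 4-mer `"HPPH"` folded into a unit square has a single H-H contact, while the same sequence stretched out has none. Coordinates are tuples of integers, so the same code covers the SQ and SC lattices.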
Energy Landscapes
The energy landscape of a biopolymer molecule is a complex surface of the free energy versus the conformational degrees of freedom. Energy landscapes are conveniently visualized by barrier trees that give an impression of the overall shape and topology of the landscape [2].
Schematic representation of an energy landscape and its associated barrier tree. Local minima are labeled with numbers (1-5), saddle points with lowercase letters (a-d). The global minimum is marked with an asterisk.
Things needed to construct an energy landscape:
1. a set $X$ of configurations,
2. a notion $M$ of neighborhood on $X$, and
3. an energy function $f : X \to \mathbb{R}$.
The conformation space $X$ of a (biopolymer) sequence $S$ is the total set of configurations compatible with this sequence. The move set $M$ is an order relation on $X$, defining adjacency between the elements of $X$.
Energy landscape of a 74-mer lattice protein on the SQ lattice, calculated via the flooding algorithm with an energy threshold of -95. The lowest 4 local minima (corresponding structures listed below) on the right are not attached to the rest of the tree.
The move set crucially determines the topology of the underlying energy landscape. Here we use non-local, ergodic pivot moves that give rise to a fixed neighborhood relation $N \subseteq X \times X$. A walk between two conformations $x$ and $y$ is a list of conformations $x = x_1, \dots, x_{m+1} = y$ such that $\forall\, 1 \le i \le m : N(x_i, x_{i+1})$. Given a threshold $\eta$, the lower part of the energy landscape (written as $X^{\eta}$) consists of all conformations $x$ such that $E(S, x) \le \eta$.
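To make the neighborhood relation concrete, here is a minimal Python sketch of pivot moves on the 2D square lattice. It is an illustration and not the implementation used for the poster: the function name `pivot_neighbors` and the choice to always transform the chain end after the pivot bead are assumptions.

```python
# The seven non-identity symmetry operations of the square lattice,
# acting on coordinates taken relative to the pivot bead.
SYMMETRIES = [
    lambda x, y: (-y, x),    # rotation by 90 degrees
    lambda x, y: (-x, -y),   # rotation by 180 degrees
    lambda x, y: (y, -x),    # rotation by 270 degrees
    lambda x, y: (x, -y),    # reflection about the x axis
    lambda x, y: (-x, y),    # reflection about the y axis
    lambda x, y: (y, x),     # reflection about the diagonal
    lambda x, y: (-y, -x),   # reflection about the anti-diagonal
]


def pivot_neighbors(coords):
    """Return the set of self-avoiding walks reachable by one pivot move."""
    coords = tuple(map(tuple, coords))
    found = set()
    for k in range(1, len(coords) - 1):          # interior beads act as pivots
        px, py = coords[k]
        for op in SYMMETRIES:
            tail = []
            for x, y in coords[k + 1:]:
                dx, dy = op(x - px, y - py)      # transform relative to the pivot
                tail.append((px + dx, py + dy))
            new = coords[:k + 1] + tuple(tail)
            # keep only genuinely new, self-avoiding conformations
            if new != coords and len(set(new)) == len(new):
                found.add(new)
    return found
```

Because every operation in `SYMMETRIES` has its inverse in the same list, the relation generated this way is symmetric: if $y$ is a pivot neighbor of $x$, then $x$ is a pivot neighbor of $y$.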
Schematic representation of the flooding algorithm (left plot). Starting from a certain conformation, all neighbor conformations are calculated repeatedly until all conformations in a certain region of the energy landscape are found.
Since exhaustive enumeration of all possible structures is only applicable to very short chains (the lattice protein folding problem was shown to be NP-hard), we developed an algorithm for investigating the low energy part of the energy landscape selectively [5]. This approach starts at low energy conformations and enumerates all “accessible” conformations. To exemplify the idea, for generating the lower part completely one starts with all local minima $x$ with $E(S, x) \le \eta$. Iteratively, one visits all conformations that are neighbors of already seen conformations and stay below the energy threshold $\eta$. Two conformations $x$ and $y$ are mutually accessible at the level $\eta$ (written as $x \overset{\eta}{\longleftrightarrow} y$) if there is a walk from $x$ to $y$ such that all conformations $z$ in the walk satisfy $E(S, z) \le \eta$ [2]. The saddle height $\hat{f}(x, y)$ of $x$ and $y$ is defined by

$$\hat{f}(x, y) = \min\{\eta \mid x \overset{\eta}{\longleftrightarrow} y\}.$$
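On a landscape small enough to search exhaustively, the saddle height can be computed as a minimax path problem: among all walks from $x$ to $y$, find the one whose highest energy is lowest. The sketch below uses a Dijkstra-style search for this. It is a generic illustration, not the method of [2]; the `energy` and `neighbors` callables are assumed to be supplied by the caller, and states must be hashable and mutually comparable so they can sit in the heap.

```python
import heapq


def saddle_height(start, goal, energy, neighbors):
    """Lowest threshold at which start and goal are mutually accessible.

    Returns None if no walk connects the two states.
    """
    best = {start: energy(start)}        # best known walk maximum per state
    heap = [(best[start], start)]
    while heap:
        level, x = heapq.heappop(heap)
        if x == goal:
            return level                 # first pop of the goal is optimal
        if level > best[x]:
            continue                     # stale heap entry
        for y in neighbors(x):
            # the cost of a walk is its highest energy, not a sum
            new_level = max(level, energy(y))
            if new_level < best.get(y, float("inf")):
                best[y] = new_level
                heapq.heappush(heap, (new_level, y))
    return None
```

For a one-dimensional toy landscape with energies `[0, 3, 1, 5, 2]` and nearest-neighbor moves, the saddle height between positions 0 and 2 is 3, and between positions 0 and 4 it is 5.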
Given the set of all local minima $X^{\eta}_{\min}$ below threshold $\eta$, the lower energy part $X^{\eta}$ of the energy landscape is given by

$$X^{\eta} = \{y \mid \exists\, x \in X^{\eta}_{\min} : \hat{f}(x, y) \le \eta\}.$$
Since the complete set of local minima $X^{\eta}_{\min}$ usually is not available, one can also start from a restricted set of low energy conformations $X_{\mathrm{init}}$ and hope to enumerate a large part of the low energy conformations.
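The flooding idea can be sketched as a breadth-first enumeration that never steps above the threshold. This is a minimal, problem-independent illustration and not the implementation of [5]; the function name `flood` and the `energy` and `neighbors` callables are assumptions.

```python
from collections import deque


def flood(initial, energy, neighbors, threshold):
    """Enumerate all states reachable from `initial` without exceeding `threshold`."""
    seen = {x for x in initial if energy(x) <= threshold}
    queue = deque(seen)
    while queue:
        x = queue.popleft()
        for y in neighbors(x):
            # visit neighbors of already seen states that stay below the threshold
            if y not in seen and energy(y) <= threshold:
                seen.add(y)
                queue.append(y)
    return seen
```

Started from an incomplete initial set, the result contains only the regions that are connected to it below the threshold, which is why some low-energy regions can be missed. Memory is the limiting resource, since every visited state is kept in `seen`.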
Refolding Paths
The barrier tree of the 74-mer shown above illustrates a common problem with the calculation of barrier trees based on the flooding approach: saddle heights are not known a priori, resulting in non-connected trees. To overcome this, we developed a breadth-first-search heuristic for estimating minimal refolding paths between two arbitrary structures.
Energy profiles of the refolding process between two lattice protein structures from the barrier tree above. The vertical axis is energy; the six panels are labeled E = -72, E = -76, E = -80, E = -81, E = -84 and E = -92.
Starting from a given conformation, we iteratively generate a predefined number of neighbor conformations with the constraint that adjacent structures have a lower (Hamming) distance to the target. Optionally, we also allow a few indirect steps on the way to the target, i.e. those moves that result in a larger distance. All visited structures are stored in a hash, enabling an iterative approximation of low-energy refolding paths.
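A width-limited breadth-first search in the spirit of this description might look as follows. It is a simplified sketch, not the heuristic used for the poster: it takes only direct steps (indirect steps are omitted), the `width` parameter stands in for the "predefined number" of conformations kept per round, and all names are hypothetical. States are assumed to be equal-length sequences so that a Hamming distance is defined.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(u != v for u, v in zip(a, b))


def refolding_path(start, target, energy, neighbors, width=10):
    """Estimate a low-barrier path; returns (barrier, path) or None."""
    # hash of visited states: state -> (highest energy on the path so far, predecessor)
    visited = {start: (energy(start), None)}
    frontier = [start]
    while frontier and target not in visited:
        candidates = {}
        for x in frontier:
            barrier_x = visited[x][0]
            for y in neighbors(x):
                if hamming(y, target) >= hamming(x, target):
                    continue                      # direct steps only
                barrier_y = max(barrier_x, energy(y))
                known = candidates.get(y) or visited.get(y)
                if known is None or barrier_y < known[0]:
                    candidates[y] = (barrier_y, x)
        # keep only the `width` candidates with the lowest barrier so far
        frontier = sorted(candidates, key=lambda s: candidates[s][0])[:width]
        for y in frontier:
            visited[y] = candidates[y]
    if target not in visited:
        return None
    path, x = [], target
    while x is not None:                          # follow predecessors back to start
        path.append(x)
        x = visited[x][1]
    path.reverse()
    return visited[target][0], path
```

The barrier returned is an upper bound on the true saddle height between the two states, since only a subset of all walks is examined. Rerunning with a larger `width` can only widen the search, which is one way to approximate the path iteratively.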
Connected barrier tree of the 74-mer lattice protein. The saddle between the two leftmost structures is at E = -93, the saddle connecting these states to the ground state is at E = -77.
Dynamics
A reduced dynamics can be formulated as a Markov process by means of macrostates (i.e. basins in the barrier tree) and Arrhenius-like transition rates between them [4]. The transition rate to reach state $\beta$ from state $\alpha$ typically looks like

$$r_{\beta\alpha} = \Gamma_{\beta\alpha} \exp\left(-(E_{\beta\alpha} - G_{\alpha})/kT\right)$$

where $\Gamma$ is a pre-exponential entropic factor, $E_{\beta\alpha}$ is the energy of the saddle point between states $\alpha$ and $\beta$ and $G_{\alpha}$ is the free energy of basin $\alpha$.
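The rate expression and the resulting Markov process can be illustrated with a small pure-Python sketch. Several simplifications are assumptions made here for brevity: a single constant prefactor replaces $\Gamma_{\beta\alpha}$, the basin free energies are passed in directly, and the master equation is integrated with a plain explicit Euler scheme rather than whatever integrator was used in [4].

```python
import math


def arrhenius_rates(basin_free_energy, saddle_energy, kT=1.0, prefactor=1.0):
    """rates[b][a] is the rate to reach basin b from basin a.

    saddle_energy[b][a] is the saddle energy between a and b, or None
    if the two basins are not directly connected.
    """
    n = len(basin_free_energy)
    rates = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b and saddle_energy[b][a] is not None:
                barrier = saddle_energy[b][a] - basin_free_energy[a]
                rates[b][a] = prefactor * math.exp(-barrier / kT)
    return rates


def propagate(population, rates, dt, steps):
    """Integrate dp/dt = R p with explicit Euler steps (dt must be small)."""
    p = list(population)
    n = len(p)
    for _ in range(steps):
        change = [0.0] * n
        for a in range(n):
            for b in range(n):
                if a != b:
                    flux = rates[b][a] * p[a] * dt   # probability moving a -> b
                    change[a] -= flux
                    change[b] += flux
        p = [pi + ci for pi, ci in zip(p, change)]
    return p
```

Because every flux leaving one basin enters another, the total population is conserved. For two basins with free energies -1 and -3 joined by a saddle at 0 and kT = 1, the long-time population of the deeper basin approaches $1/(1 + e^{-2})$, the Boltzmann ratio implied by the two rates.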
Reduced refolding dynamics between two selected states of the 74-mer. Several macrostates are populated temporarily, whereas all conformations find the target state after approx. 2 million time steps. (Axes: time on a logarithmic scale from 10^-2 to 10^6; population percentage from 0 to 1.)
Visualization
Results from the macrostate dynamics are usually in good agreement with exact folding simulations obtained from Pinfold, a modified Monte Carlo type algorithm that was originally implemented for the investigation of RNA folding trajectories [1]. To facilitate the investigation of folding trajectories, we developed a graphical user interface for efficiently analyzing the results from Pinfold [3]. This novel framework not only allows for a rapid investigation of folding kinetics, but also provides a powerful method for further refinement of biopolymer folding landscapes.
References
[1] C. Flamm, W. Fontana, I. Hofacker, and P. Schuster. RNA folding kinetics at elementary step resolution. RNA, 6:325–338, 2000.
[2] C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
[3] S. Pötzsch, G. Scheuermann, M. T. Wolfinger, C. Flamm, and P. F. Stadler. Visualization of lattice-based protein folding simulations. In 10th International Conference on Information Visualization (IV06), 2006.
[4] M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Efficient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004.
[5] M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 74(4):726–732, 2006.