Energy Landscapes and Dynamics of Biopolymers
Michael T. Wolfinger¹, W. Andreas Svrcek-Seiler¹, Christoph Flamm², Ivo L. Hofacker¹, Peter F. Stadler²
¹ Institute for Theoretical Chemistry, University of Vienna, Austria
² Department of Computer Science, University of Leipzig, Germany
Tel: +43 1 4277 52747 Fax: +43 1 4277 52793 Email: {mtw,svrci,xtof,ivo,studla}@tbi.univie.ac.at Web: http://www.tbi.univie.ac.at/
The ability of biomolecules like DNA, RNA or proteins to fold into a well-defined native state is a prerequisite for biologically functional molecules. A reasonable level of coarse-graining is needed in order to treat biomolecules within a theoretical framework. Kinetics and structure formation processes of biopolymers are crucially determined by the topological details of the underlying (free) energy landscape. We present a generic, problem-independent framework for exploration of the low-energy portion of the energy landscape of discrete systems and apply it to the energy landscape of lattice proteins.
Lattice Proteins
The HPNX model is used to study general properties of lattice heteropolymers. Within this simplified model, a conformation is regarded as a self-avoiding walk on a two- or three-dimensional lattice.
Left: 74-mer lattice protein on the 2D square lattice (SQ). Middle: 27-mer on the 3D simple cubic (SC) lattice. Right: Interaction scheme for the HPNX model used here.
The 20-letter alphabet of amino acids is reduced to a four-letter alphabet: hydrophobic (H), positive (P), negative (N) and neutral (X) residues. Energy is evaluated via a pair potential with attractive interactions when two beads are neighbors in the lattice but not along the chain. Lattice heteropolymers offer the advantage of modeling the general properties of proteins at relatively low computational cost. However, they represent a crude abstraction by implying fixed bond lengths and angles.
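The contact-based energy evaluation described above can be sketched in a few lines. This is a minimal illustration on the 2D square lattice, not the implementation used for the poster: the contact energies in `CONTACT_ENERGY` are placeholder values chosen for the example (the actual interaction scheme is given only in the figure), and the names `walk_to_coords` and `energy` are hypothetical.

```python
from itertools import combinations

# Placeholder contact energies (an assumption for illustration only; they are
# not read off the poster's interaction scheme). Unlisted pairs contribute 0.
CONTACT_ENERGY = {
    frozenset("H"): -4.0,   # H-H contact
    frozenset("PN"): -1.0,  # P-N contact
}

# Absolute moves on the 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk_to_coords(moves):
    """Convert a move string into bead coordinates.

    Returns None if the walk visits a lattice site twice,
    i.e. if it is not self-avoiding.
    """
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords if len(set(coords)) == len(coords) else None


def energy(sequence, coords, table=CONTACT_ENERGY):
    """Sum pair energies over beads adjacent on the lattice but not along the chain."""
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < 2:
            continue  # consecutive beads are chain neighbors, not contacts
        (xi, yi), (xj, yj) = coords[i], coords[j]
        if abs(xi - xj) + abs(yi - yj) == 1:
            total += table.get(frozenset((sequence[i], sequence[j])), 0.0)
    return total
```

For the four-bead sequence `HXXH` folded as `URD`, the first and last beads form the only contact, so `energy("HXXH", walk_to_coords("URD"))` evaluates to the H-H entry of the table.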
Energy Landscapes
The energy landscape of a biopolymer molecule is a complex surface of the free energy versus the conformational degrees of freedom. Energy landscapes are conveniently visualized by barrier trees that give an impression of the overall shape and topology of the landscape [2].
Schematic representation of an energy landscape and its associated barrier tree. Local minima are labeled with numbers (1-5), saddle points with lowercase letters (a-d). The global minimum is marked with an asterisk.
Three ingredients are needed to construct an energy landscape:
1. a set $X$ of configurations,
2. a notion $M$ of neighborhood on $X$, and
3. an energy function $f: X \to \mathbb{R}$.
The conformation space $X$ of a (biopolymer) sequence $S$ is the total set of configurations compatible with this sequence. The move set $M$ is an order relation on $X$, defining adjacency between the elements of $X$.
[Figure: barrier tree; energy axis from -125.0 to -95.0]
Energy landscape of a 74-mer lattice protein on the SQ lattice,
calculated via the flooding algorithm with an energy threshold of
-95. The lowest 4 local minima (corresponding structures listed
below) on the right are not attached to the rest of the tree.
The move set crucially determines the topology of the underlying energy landscape. Here we use non-local, ergodic pivot moves that give rise to a fixed neighborhood relation $N$ on $X \times X$. A walk between two conformations $x$ and $y$ is a list of conformations $x = x_1, \dots, x_{m+1} = y$ such that $N(x_i, x_{i+1})$ holds for all $1 \le i \le m$. Given a threshold $\eta$, the lower part of the energy landscape (written as $X^{\le\eta}$) consists of all conformations $x$ such that $E(S, x) \le \eta$.
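To make the neighborhood relation concrete, the following sketch enumerates pivot-move neighbors on the 2D square lattice. It assumes a conformation is encoded as a string of absolute moves (`U`, `D`, `L`, `R`), which is a choice made for this example and not necessarily the encoding used by the authors; all function names are hypothetical. A pivot move picks a bead, applies one of the seven non-trivial lattice symmetries to the tail behind it, and is accepted only if the result is still self-avoiding.

```python
STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
ROT90 = {"U": "L", "L": "D", "D": "R", "R": "U"}   # counter-clockwise quarter turn
MIRROR = {"U": "U", "D": "D", "L": "R", "R": "L"}  # reflection that swaps left and right

# (mirror, quarter_turns) pairs for the 7 non-identity symmetries of the square lattice.
SYMMETRIES = [(m, q) for m in (False, True) for q in range(4) if m or q]


def is_self_avoiding(moves):
    """True if the walk encoded by the move string never revisits a site."""
    x = y = 0
    seen = {(0, 0)}
    for m in moves:
        dx, dy = STEP[m]
        x, y = x + dx, y + dy
        if (x, y) in seen:
            return False
        seen.add((x, y))
    return True


def transform(move, mirror, quarter_turns):
    """Apply an optional reflection followed by quarter turns to one move letter."""
    if mirror:
        move = MIRROR[move]
    for _ in range(quarter_turns):
        move = ROT90[move]
    return move


def pivot_neighbors(moves):
    """All self-avoiding conformations reachable from `moves` by one pivot move."""
    result = set()
    for k in range(1, len(moves)):  # bead k is the pivot; moves[k:] is the tail
        for mirror, q in SYMMETRIES:
            tail = "".join(transform(c, mirror, q) for c in moves[k:])
            cand = moves[:k] + tail
            if cand != moves and is_self_avoiding(cand):
                result.add(cand)
    return result


def is_walk(conformations):
    """Check that consecutive conformations in the list are pivot neighbors."""
    return all(b in pivot_neighbors(a)
               for a, b in zip(conformations, conformations[1:]))
```

In this encoding, conformations that differ only by a rotation or reflection of the whole chain count as distinct; a production implementation would likely normalize them. Because the symmetry set is closed under inversion, the resulting neighborhood relation is symmetric.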
[Figure: schematic energy landscape plots with basins labeled 1 to 4, energy axis E and threshold η]
Schematic representation of the flooding algorithm (left plot). Starting from a certain conformation, all neighbor conformations are calculated repeatedly until all conformations in a certain region of the energy landscape are found.
Since exhaustive enumeration of all possible structures is only applicable to very short chains (the lattice protein folding problem was shown to be NP-hard), we developed an algorithm for investigating the low-energy part of the energy landscape selectively [5]. This approach starts at low-energy conformations and enumerates all "accessible" conformations. To exemplify the idea, for generating the lower part completely one starts with all local minima $x$ with $E(S, x) \le \eta$. Iteratively, one visits all conformations that are neighbors of already seen conformations and stay below the energy threshold $\eta$. Two conformations $x$ and $y$ are mutually accessible at the level $\eta$ (written as $x \overset{\eta}{\longleftrightarrow} y$) if there is a walk from $x$ to $y$ such that all conformations $z$ in the walk satisfy $E(S, z) \le \eta$ [2]. The saddle height $\hat{f}(x, y)$ of $x$ and $y$ is defined by
$$\hat{f}(x, y) = \min\{\eta \mid x \overset{\eta}{\longleftrightarrow} y\}.$$
Given the set of all local minima $X^{\le\eta}_{\min}$ below threshold $\eta$, the lower energy part $X^{\le\eta}$ of the energy landscape is given by
$$X^{\le\eta} = \{y \mid \exists x \in X^{\le\eta}_{\min} : \hat{f}(x, y) \le \eta\}.$$
Since the complete set of local minima $X^{\le\eta}_{\min}$ usually is not available, one can also start from a restricted set of low-energy conformations $X_{\mathrm{init}}$ and hope to enumerate a large part of the low-energy conformations.
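The flooding idea and the saddle height definition can be illustrated on a toy landscape. The sketch below is generic in the sense that it only needs a neighbor function and an energy function; the seven-state linear landscape, the function names and the use of a priority queue for the saddle height are assumptions made for this example and are not taken from [5].

```python
import heapq
from collections import deque


def flood(init, neighbors, energy, eta):
    """Enumerate all states reachable from `init` without exceeding energy eta."""
    seen = {x for x in init if energy(x) <= eta}
    queue = deque(seen)
    while queue:
        x = queue.popleft()
        for y in neighbors(x):
            if y not in seen and energy(y) <= eta:
                seen.add(y)
                queue.append(y)
    return seen


def saddle_height(x, y, neighbors, energy):
    """Smallest eta at which x and y are mutually accessible (None if disconnected).

    Minimax variant of Dijkstra's algorithm: the cost of a walk is the
    highest energy encountered along it.
    """
    counter = 0  # tie-breaker so states themselves are never compared
    heap = [(energy(x), counter, x)]
    done = set()
    while heap:
        level, _, z = heapq.heappop(heap)
        if z in done:
            continue
        done.add(z)
        if z == y:
            return level
        for w in neighbors(z):
            if w not in done:
                counter += 1
                heapq.heappush(heap, (max(level, energy(w)), counter, w))
    return None


# Toy landscape (hypothetical values): seven states on a line.
ENERGY = [1.0, 3.0, 0.0, 5.0, 2.0, 4.0, 2.5]


def toy_energy(i):
    return ENERGY[i]


def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(ENERGY)]
```

Flooding from the minima at states 2 and 4 with threshold 3.0 yields `{0, 1, 2, 4}`: state 4 is found but stays isolated, because its saddle to state 2 lies at 5.0, above the threshold. This is the same effect that leaves parts of a barrier tree unconnected when the threshold is below the relevant saddle.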
Refolding Paths
The figure at the bottom of the left column illustrates a common problem with the calculation of barrier trees based on the flooding approach: saddle heights are not known a priori, resulting in non-connected trees. To overcome this, we developed a breadth-first search heuristic for estimating minimal refolding paths between two arbitrary structures.
[Figure: seven energy profile panels titled E = -72, E = -76, E = -80, E = -81, E = -84, E = -92 and E = -93; vertical axis: Energy, from -130 to -70]
Energy profiles of the refolding process between two lattice protein structures from the barrier tree above.
Starting from a given conformation, we iteratively generate a predefined number of neighbor conformations with the constraint that adjacent structures have a lower (Hamming) distance to the target. Optionally, we also allow a few indirect steps on the way to the target, i.e. moves that result in a larger distance. All visited structures are stored in a hash table, enabling an iterative approximation of low-energy refolding paths.
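A breadth-limited search of this kind might look as follows. This is a simplified sketch under stated assumptions rather than the authors' heuristic: it only takes direct steps (indirect steps are omitted), it ranks candidates by the highest energy seen along their path, and it is demonstrated on a hypothetical three-bit toy landscape with single-bit flips instead of lattice proteins with pivot moves. The names `refolding_path`, `flips` and `TOY_ENERGY` are made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(p != q for p, q in zip(a, b))


def refolding_path(start, target, neighbors, energy, width=10):
    """Estimate a low-barrier path from start to target.

    Returns (barrier, path), where barrier is the highest energy along the
    path, or None if no direct path is found. The result is an upper bound
    on the true saddle height, not a guaranteed optimum.
    """
    if start == target:
        return energy(start), [start]
    best = None
    visited = {start: energy(start)}  # structure -> lowest barrier reached so far
    frontier = [(energy(start), [start])]
    while frontier:
        candidates = {}
        for barrier, path in frontier:
            here = path[-1]
            for cand in neighbors(here):
                if hamming(cand, target) >= hamming(here, target):
                    continue  # direct steps only: must move closer to the target
                b = max(barrier, energy(cand))
                if b >= visited.get(cand, float("inf")):
                    continue  # already reached at least as cheaply
                visited[cand] = b
                candidates[cand] = (b, path + [cand])
        complete = candidates.pop(target, None)
        if complete is not None and (best is None or complete[0] < best[0]):
            best = complete
        # keep only the `width` most promising partial paths
        frontier = sorted(candidates.values(), key=lambda e: e[0])[:width]
    return best


# Hypothetical toy landscape on bit strings of length 3.
TOY_ENERGY = {"000": 0, "100": 5, "010": 2, "001": 3,
              "110": 4, "101": 1, "011": 6, "111": -1}


def flips(s):
    """Single-bit-flip neighbors of a bit string."""
    return [s[:i] + ("1" if c == "0" else "0") + s[i + 1:] for i, c in enumerate(s)]
```

With `width=10` the search on the toy landscape finds the path `000, 001, 101, 111` with barrier 3. With `width=1` it commits early to the locally cheapest step and returns a path with barrier 4, which shows why the result should be read as an estimate that improves as more candidates are kept.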
[Figure: barrier tree; energy axis from -125.0 to -75.0]
Connected barrier tree of the 74-mer lattice protein. The saddle between the two leftmost structures is at E = −93, the saddle connecting these states to the ground state is at E = −77.
Dynamics
A reduced dynamics can be formulated as a Markov process by means of macrostates (i.e. basins in the barrier tree) and Arrhenius-like transition rates between them [4]. The transition rate to reach state $\beta$ from state $\alpha$ typically looks like
$$r_{\beta\alpha} = \Gamma_{\beta\alpha} \exp\left(-(E^{*}_{\beta\alpha} - G_{\alpha})/kT\right)$$
where $\Gamma_{\beta\alpha}$ is a pre-exponential entropic factor, $E^{*}_{\beta\alpha}$ is the energy of the saddle point between states $\alpha$ and $\beta$, and $G_{\alpha}$ is the free energy of basin $\alpha$.
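As an illustration of how such rates define a Markov process on macrostates, the sketch below builds a rate matrix from basin free energies and saddle energies and integrates the master equation dp/dt = R p. Several simplifications are assumptions of this example: the prefactor Γ is set to 1, the three-basin values are invented, and a plain explicit Euler scheme is used only to keep the code dependency-free (a matrix exponential would be the more usual choice).

```python
import math


def arrhenius_rate(saddle, g_alpha, kT=1.0, prefactor=1.0):
    """Rate for leaving basin alpha over a saddle: prefactor * exp(-(E* - G_alpha)/kT)."""
    return prefactor * math.exp(-(saddle - g_alpha) / kT)


def rate_matrix(G, saddles, kT=1.0):
    """Build R with R[b][a] = rate from basin a to basin b.

    G is a list of basin free energies; saddles maps a pair (a, b) to the
    saddle energy between the two basins. Diagonal entries are set so that
    every column sums to zero, which conserves total probability.
    """
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for (a, b), e_star in saddles.items():
        R[b][a] = arrhenius_rate(e_star, G[a], kT)
        R[a][b] = arrhenius_rate(e_star, G[b], kT)
    for a in range(n):
        R[a][a] = -sum(R[b][a] for b in range(n) if b != a)
    return R


def propagate(p, R, t_end, dt):
    """Integrate dp/dt = R p with explicit Euler steps of size dt."""
    n = len(p)
    for _ in range(int(round(t_end / dt))):
        p = [p[i] + dt * sum(R[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


# Hypothetical three-basin example (values chosen for illustration only).
G = [0.0, 1.0, 2.0]
SADDLES = {(0, 1): 3.0, (1, 2): 2.5}
R = rate_matrix(G, SADDLES)
p_final = propagate([0.0, 0.0, 1.0], R, t_end=200.0, dt=0.05)
```

Starting with all population in the highest basin, the distribution relaxes toward the Boltzmann weights exp(-G/kT) of the basins, since rates of this form satisfy detailed balance when Γ is symmetric.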
[Figure: population percentage (axis from 0 to 1) versus time (logarithmic axis from 10^-2 to 10^6)]
Reduced refolding dynamics between two selected states of the 74-mer. Several macrostates are populated temporarily, whereas all conformations find the target state after approx. 2 million time steps.
Visualization
Results from the macrostate dynamics are usually in good agreement with exact folding simulations obtained from Pinfold, a modified Monte Carlo type algorithm that was originally implemented for investigating RNA folding trajectories [1].
To facilitate the investigation of folding trajectories, we developed a graphical user interface for efficiently analyzing the results from Pinfold [3].
This novel framework not only allows for a rapid investigation of folding kinetics, but also provides a powerful method for further refinement of biopolymer folding landscapes.
References
[1] C. Flamm, W. Fontana, I. Hofacker, and P. Schuster. RNA folding kinetics
at elementary step resolution. RNA, 6:325–338, 2000.
[2] C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier
trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
[3] S. Pötzsch, G. Scheuermann, M. T. Wolfinger, C. Flamm, and P. F.
Stadler. Visualization of lattice-based protein folding simulations. In 10th
International Conference on Information Visualization (IV06), 2006.
[4] M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F.
Stadler. Efficient computation of RNA folding dynamics. J. Phys. A:
Math. Gen., 37(17):4731–4741, 2004.
[5] M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler.
Exploring the lower part of discrete polymer model energy landscapes.
Europhys. Lett., 74(4):726–732, 2006.