Energy Landscapes and Dynamics of Biopolymers
Michael T. Wolfinger¹, W. Andreas Svrcek-Seiler¹, Christoph Flamm², Ivo L. Hofacker¹, Peter F. Stadler²
¹ Institute for Theoretical Chemistry, University of Vienna, Austria
² Department of Computer Science, University of Leipzig, Germany
Tel: +43 1 4277 52747 Fax: +43 1 4277 52793 Email: {mtw,svrci,xtof,ivo,studla}@tbi.univie.ac.at Web: http://www.tbi.univie.ac.at/
The ability of biomolecules like DNA, RNA or proteins to fold into a well-defined native state is a prerequisite for biologically functional molecules. A reasonable level of coarse-graining is needed in order to treat biomolecules within a theoretical framework. Kinetics and structure formation processes of biopolymers are crucially determined by the topological details of the underlying (free) energy landscape. We present a generic, problem-independent framework for exploration of the low-energy portion of the energy landscape of discrete systems and apply it to the energy landscape of lattice proteins.
Lattice Proteins
The HPNX model is used to study general properties of lattice heteropolymers. Within this simplified model, a conformation is regarded as a self-avoiding walk on a two- or three-dimensional lattice.
Left: 74-mer lattice protein on the 2D square lattice (SQ). Middle: 27-mer on the 3D simple cubic (SC) lattice. Right: Interaction scheme for the HPNX model used here.
The 20-letter alphabet of amino acids is reduced to a four-letter alphabet: hydrophobic (H), positive (P), negative (N) and neutral (X) residues. Energy is evaluated via a pair potential with attractive interactions when two beads are neighbors in the lattice but not along the chain. Lattice heteropolymers offer the advantage of modeling the general properties of proteins at relatively low computational cost. However, they represent a crude abstraction by implying fixed bond lengths and angles.
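The energy evaluation described above can be sketched in a few lines of Python. This is a minimal illustration for the 2D square lattice only. The contact energies in `CONTACT_ENERGY` are placeholder values assumed for the example (the actual HPNX interaction scheme is given in the figure and is not reproduced here), and the absolute move encoding `U/D/L/R` and all function names are hypothetical.

```python
# Placeholder contact energies (ASSUMED for illustration; not the poster's scheme).
CONTACT_ENERGY = {
    frozenset("H"): -4.0,   # H-H contact (assumed value)
    frozenset("PN"): -1.0,  # P-N contact (assumed value)
}

# Absolute moves on the 2D square lattice (hypothetical encoding).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk_to_coords(moves):
    """Convert a move string into the list of lattice positions it visits."""
    pos = (0, 0)
    coords = [pos]
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        coords.append(pos)
    return coords


def energy(sequence, moves):
    """Sum pair energies over beads adjacent on the lattice but not along the chain."""
    coords = walk_to_coords(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if len(set(coords)) != len(coords):
        raise ValueError("walk is not self-avoiding")
    index_at = {c: i for i, c in enumerate(coords)}
    total = 0.0
    for i, (x, y) in enumerate(coords):
        # Looking only right and up counts every lattice contact exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                pair = frozenset((sequence[i], sequence[j]))
                total += CONTACT_ENERGY.get(pair, 0.0)
    return total


# A 4-mer folded into a unit square: beads 0 and 3 form one H-H contact.
print(energy("HPNH", "RUL"))  # -4.0
```

With these placeholder values a fully stretched chain has energy 0, since it forms no non-bonded contacts.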
Energy Landscapes
The energy landscape of a biopolymer molecule is a complex surface of the free energy versus the conformational degrees of freedom. Energy landscapes are conveniently visualized by barrier trees that give an impression of the overall shape and topology of the landscape [2].
Schematic representation of an energy landscape and its associated barrier tree. Local minima are labeled with numbers (1-5), saddle points with lowercase letters (a-d). The global minimum is marked with an asterisk.
Three ingredients are needed to construct an energy landscape:
1. a set X of configurations,
2. a notion M of neighborhood on X, and
3. an energy function f : X → R.
The conformation space X of a (biopolymer) sequence S is the total set of configurations compatible with this sequence. The move set M is an order relation on X, defining adjacency between the elements of X.
[Figure: barrier tree of the 74-mer; vertical axis: energy.]
Energy landscape of a 74-mer lattice protein on the SQ lattice, calculated via the flooding algorithm with an energy threshold of −95. The lowest 4 local minima (corresponding structures listed below) on the right are not attached to the rest of the tree.
The move set crucially determines the topology of the underlying energy landscape. Here we use non-local, ergodic pivot moves that give rise to a fixed neighborhood relation $N \subseteq X \times X$. A walk between two conformations $x$ and $y$ is a list of conformations $x = x_1, \dots, x_{m+1} = y$ such that $\forall\, 1 \le i \le m : N(x_i, x_{i+1})$. Given a threshold $\eta$, the lower part of the energy landscape (written as $X^{\le\eta}$) consists of all conformations $x$ such that $E(S, x) \le \eta$.
[Figure: schematic of the flooding algorithm; basins labeled 1-4, energy axis E with threshold η.]
Schematic representation of the flooding algorithm (left plot). Starting from a certain conformation, all neighbor conformations are calculated repeatedly until all conformations in a certain region of the energy landscape are found.
Since exhaustive enumeration of all possible structures is only applicable to very short chains (the lattice protein folding problem was shown to be NP-hard), we developed an algorithm for selectively investigating the low-energy part of the energy landscape [5]. This approach starts at low-energy conformations and enumerates all "accessible" conformations. To exemplify the idea: to generate the lower part completely, one starts with all local minima $x$ with $E(S, x) \le \eta$. Iteratively, one visits all conformations that are neighbors of already seen conformations and do not exceed the energy threshold $\eta$. Two conformations $x$ and $y$ are mutually accessible at the level $\eta$ (written as $x \looparrowleft\!\eta\!\looparrowright y$) if there is a walk from $x$ to $y$ such that all conformations $z$ in the walk satisfy $E(S, z) \le \eta$ [2]. The saddle height $\hat{f}(x, y)$ of $x$ and $y$ is defined by
$$\hat{f}(x, y) = \min\{\eta \mid x \looparrowleft\!\eta\!\looparrowright y\}.$$
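On a small, explicitly stored landscape, the saddle height defined above is a minimax path problem: the smallest level at which some walk connects the two conformations. The following sketch computes it with a Dijkstra-style search. It assumes the landscape fits in memory as two dictionaries. The toy energies, the adjacency lists and the function name are made up for illustration and are not taken from the poster.

```python
import heapq


def saddle_height(energy, neighbors, x, y):
    """Smallest eta such that some walk from x to y stays at or below eta.

    Returns None if y cannot be reached from x at all.
    """
    best = {x: energy[x]}          # lowest known walk maximum per conformation
    heap = [(energy[x], x)]
    while heap:
        level, u = heapq.heappop(heap)
        if u == y:
            return level
        if level > best[u]:
            continue               # stale heap entry
        for v in neighbors[u]:
            cand = max(level, energy[v])
            if cand < best.get(v, float("inf")):
                best[v] = cand
                heapq.heappush(heap, (cand, v))
    return None


# Toy landscape (hypothetical): two routes lead from c to e, via d or via f.
toy_energy = {"a": -5, "b": -2, "c": -4, "d": -1, "e": -6, "f": -3, "g": -7}
toy_neighbors = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d", "f"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["c", "e"], "g": [],
}

print(saddle_height(toy_energy, toy_neighbors, "c", "e"))  # -3 (via f)
print(saddle_height(toy_energy, toy_neighbors, "a", "e"))  # -2 (b is the bottleneck)
```

For lattice proteins the conformation space is far too large to store explicitly, which is why the text restricts the search to the part of the landscape below a threshold.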
Given the set $X^{\le\eta}_{\min}$ of all local minima below threshold $\eta$, the lower energy part $X^{\le\eta}$ of the energy landscape is given by
$$X^{\le\eta} = \{y \mid \exists x \in X^{\le\eta}_{\min} : \hat{f}(x, y) \le \eta\}.$$
Since the complete set of local minima $X^{\le\eta}_{\min}$ usually is not available, one can also start from a restricted set of low-energy conformations $X_{\text{init}}$ and hope to enumerate a large part of the low-energy conformations.
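The flooding idea can be sketched as a breadth-first enumeration that never steps above the threshold. The version below is a generic illustration, not the implementation from [5]. The `neighbors` and `energy` callables, the one-dimensional toy landscape and all names are assumptions made for the example.

```python
from collections import deque


def flood(initial, neighbors, energy, eta):
    """Enumerate every conformation reachable from `initial` via walks at or below eta."""
    seen = {x for x in initial if energy(x) <= eta}
    queue = deque(seen)
    while queue:
        x = queue.popleft()
        for y in neighbors(x):
            if y not in seen and energy(y) <= eta:
                seen.add(y)
                queue.append(y)
    return seen


# Toy landscape (hypothetical): ten states on a line, neighbors are adjacent indices.
E = [-3, -1, -4, 0, -5, -2, 1, -6, -2, -3]


def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(E)]


def line_energy(i):
    return E[i]


print(sorted(flood({2}, line_neighbors, line_energy, eta=-1)))     # [0, 1, 2]
print(sorted(flood({2, 7}, line_neighbors, line_energy, eta=-1)))  # [0, 1, 2, 7, 8, 9]
```

In the second call the basin around state 4 lies below the threshold but is never found, because no seed was placed in it and the barriers on both sides exceed `eta`. This mirrors the situation described in the text when only a restricted initial set is available.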
Refolding Paths
The barrier tree of the 74-mer shown above illustrates a common problem with the calculation of barrier trees based on the flooding approach: saddle heights are not known a priori, so minima separated from the rest by saddles above the chosen threshold remain unattached, resulting in non-connected trees. To overcome this, we developed a breadth-first search heuristic for estimating minimal refolding paths between two arbitrary structures.
[Figure: seven energy profile panels titled E = −72, −76, −80, −81, −84, −92 and −93; vertical axis: Energy.]
Energy profiles of the refolding process between two lattice protein structures from the barrier tree above.
Starting from a given conformation, we iteratively generate a predefined number of neighbor conformations with the constraint that each new structure has a lower (Hamming) distance to the target than its predecessor. Optionally, we also allow a few indirect steps on the way to the target, i.e. moves that result in a larger distance. All visited structures are stored in a hash table, enabling an iterative approximation of low-energy refolding paths.
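A stripped-down version of such a distance-guided breadth-first search might look as follows. This is a sketch under several assumptions that are not in the poster: conformations are equal-length strings, the bounded layer keeps the lowest-energy candidates (the text only says that a predefined number is generated), and the optional indirect steps are left out. The bit-flip neighborhood and `toy_energy` are hypothetical stand-ins for pivot moves and the HPNX energy.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def refolding_path(start, target, neighbors, energy, width=10):
    """Layered breadth-first search keeping at most `width` states per layer.

    Only moves that reduce the Hamming distance to the target are followed.
    Returns a list of states from start to target, or None if none was found.
    """
    parent = {start: None}        # hash table of all visited states
    layer = [start]
    while layer and target not in parent:
        candidates = []
        for x in layer:
            for y in neighbors(x):
                if y not in parent and hamming(y, target) < hamming(x, target):
                    parent[y] = x
                    candidates.append(y)
        candidates.sort(key=energy)       # ASSUMED selection rule: lowest energy first
        layer = candidates[:width]
    if target not in parent:
        return None
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def path_barrier(path, energy):
    """Highest energy along the path: an upper bound on the true saddle height."""
    return max(energy(s) for s in path)


# Toy setting (hypothetical): bit strings, single-bit flips as moves.
def flip_neighbors(s):
    return [s[:i] + ("1" if c == "0" else "0") + s[i + 1:] for i, c in enumerate(s)]


def toy_energy(s):
    return 1.0 * s.count("1") - 2.0 * s.count("11")


path = refolding_path("0000", "1111", flip_neighbors, toy_energy, width=2)
print(path, path_barrier(path, toy_energy))
```

Because the search is a heuristic, the barrier it reports only bounds the saddle height from above. Repeating the search with a larger `width` can lower the estimate.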
[Figure: barrier tree; vertical axis: energy.]
Connected barrier tree of the 74-mer lattice protein. The saddle between the two leftmost structures is at E = −93, the saddle connecting these states to the ground state is at E = −77.
Dynamics
A reduced dynamics can be formulated as a Markov process by means of macrostates (i.e. basins in the barrier tree) and Arrhenius-like transition rates between them [4]. The transition rate to reach state $\beta$ from state $\alpha$ typically looks like
$$r_{\beta\alpha} = \Gamma_{\beta\alpha} \exp\left(-(E^{*}_{\beta\alpha} - G_{\alpha})/kT\right)$$
where $\Gamma_{\beta\alpha}$ is a pre-exponential entropic factor, $E^{*}_{\beta\alpha}$ is the energy of the saddle point between states $\alpha$ and $\beta$, and $G_{\alpha}$ is the free energy of basin $\alpha$.
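To make the rate expression concrete, the sketch below builds a rate matrix from basin free energies and saddle energies and integrates the resulting master equation dp/dt = R p. Everything numerical here is assumed for illustration: three macrostates with made-up energies, kT = 1, a prefactor Γ = 1 for all transitions, and a crude explicit Euler integrator. It is not the method of [4].

```python
import math


def arrhenius_rate(saddle_energy, basin_free_energy, kT=1.0, prefactor=1.0):
    """r = Gamma * exp(-(E* - G_alpha) / kT) for leaving basin alpha over the saddle."""
    return prefactor * math.exp(-(saddle_energy - basin_free_energy) / kT)


def rate_matrix(free_energy, saddle, kT=1.0):
    """Build R with R[b][a] = rate from a to b; every column sums to zero."""
    n = len(free_energy)
    R = [[0.0] * n for _ in range(n)]
    for (a, b), e_star in saddle.items():
        R[b][a] = arrhenius_rate(e_star, free_energy[a], kT)
        R[a][b] = arrhenius_rate(e_star, free_energy[b], kT)
    for a in range(n):
        R[a][a] = -sum(R[b][a] for b in range(n) if b != a)
    return R


def propagate(R, p0, dt, steps):
    """Explicit Euler integration of dp/dt = R p (needs a small time step)."""
    p = list(p0)
    n = len(p)
    for _ in range(steps):
        dp = [sum(R[i][j] * p[j] for j in range(n)) for i in range(n)]
        p = [p[i] + dt * dp[i] for i in range(n)]
    return p


# Hypothetical three-basin example: 0 and 2 are connected only through basin 1.
G = [-3.0, -2.0, -5.0]                  # basin free energies (assumed)
SADDLES = {(0, 1): -1.0, (1, 2): -1.5}  # saddle energies (assumed)

R = rate_matrix(G, SADDLES)
p = propagate(R, [1.0, 0.0, 0.0], dt=0.05, steps=4000)
print([round(x, 4) for x in p])
```

Since both directions of a transition share the same saddle energy, these rates obey detailed balance, so the populations relax to Boltzmann weights proportional to exp(−G/kT). For stiff systems with widely separated rates, a matrix exponential or an implicit integrator would be a better choice than Euler steps.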
[Figure: population percentage (0 to 1) versus time on a logarithmic axis.]
Reduced refolding dynamics between two selected states of the 74-mer. Several macrostates are populated temporarily, whereas all conformations find the target state after approx. 2 million time steps.
Visualization
Results from the macrostate dynamics are usually in good agreement with exact folding simulations obtained from Pinfold, a modified Monte Carlo type algorithm that was originally implemented for the investigation of RNA folding trajectories [1].
To facilitate the investigation of folding trajectories, we developed a graphical user interface for efficiently analyzing the results from Pinfold [3].
This novel framework not only allows for a rapid investigation of folding kinetics, but also provides a powerful method for further refinement of biopolymer folding landscapes.
References
[1] C. Flamm, W. Fontana, I. Hofacker, and P. Schuster. RNA folding kinetics at elementary step resolution. RNA, 6:325–338, 2000.
[2] C. Flamm, I. L. Hofacker, P. F. Stadler, and M. T. Wolfinger. Barrier trees of degenerate landscapes. Z. Phys. Chem., 216:155–173, 2002.
[3] S. Pötzsch, G. Scheuermann, M. T. Wolfinger, C. Flamm, and P. F. Stadler. Visualization of lattice-based protein folding simulations. In 10th International Conference on Information Visualization (IV06), 2006.
[4] M. T. Wolfinger, W. A. Svrcek-Seiler, C. Flamm, I. L. Hofacker, and P. F. Stadler. Efficient computation of RNA folding dynamics. J. Phys. A: Math. Gen., 37(17):4731–4741, 2004.
[5] M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler. Exploring the lower part of discrete polymer model energy landscapes. Europhys. Lett., 74(4):726–732, 2006.