Multi-overlap simulations for transitions between reference configurations.
ABSTRACT We introduce a procedure to construct weight factors, which flatten the probability density of the overlap with respect to some predefined reference configuration. This allows one to overcome free-energy barriers in the overlap variable. Subsequently, we generalize the approach to deal with the overlaps with respect to two reference configurations so that transitions between them are induced. We illustrate our approach by simulations of the brain peptide Met-enkephalin with the ECEPP/2 (Empirical Conformational Energy Program for Peptides) energy function using the global-energy-minimum and the second-lowest-energy states as reference configurations. The free energy is obtained as a function of the dihedral and the root-mean-square distances from these two configurations. The latter allows one to identify the transition state and to estimate its associated free-energy barrier.

Article: Prediction, determination and validation of phase diagrams via the global study of energy landscapes
International Journal of Materials Research (formerly Zeitschrift fuer Metallkunde) 02/2009; 100(02):135–152.
ABSTRACT: In this article, we review the generalised-ensemble algorithms. Three well-known methods, namely the multicanonical algorithm, simulated tempering and the replica-exchange method, are described first. Both Monte Carlo and molecular dynamics versions of the algorithms are presented. We then present further extensions of the above three methods. Finally, we discuss the relations among the multicanonical algorithm, the Wang–Landau method and metadynamics.
Molecular Simulation 12/2012; 38.
ABSTRACT: We propose a molecular dynamics method for the multi-overlap algorithm. By utilizing a non-Boltzmann weight factor, this method realizes a random walk in the overlap space at a constant temperature and explores widely in the configurational space, where the overlap of a configuration with respect to a reference state is a measure for structural similarity. We can obtain detailed information about the free-energy landscape and the transition states among any specific reference conformations at that temperature. We also introduce a multidimensional extension of the multi-overlap algorithm. Applying this multidimensional method to a pentapeptide, Met-enkephalin, we demonstrate its effectiveness.
Chemical Physics Letters 12/2004; 400(4–6):308–313.
arXiv:cond-mat/0305055v1 [cond-mat.stat-mech] 3 May 2003
Multi-Overlap Simulations for Transitions between Reference Configurations
Bernd A. Berg (1,2), Hirochi Noguchi (3,*) and Yuko Okamoto (3,4)
(Emails: berg@csit.fsu.edu, noguchi@ims.ac.jp, okamotoy@ims.ac.jp )
(1) Department of Physics, Florida State University, Tallahassee, FL 32306, USA
(2) School of Computational Science and Information Technology, Florida State University, Tallahassee, FL 32306, USA
(3) Department of Theoretical Studies, Institute for Molecular Science, Okazaki, Aichi 444-8585, Japan
(4) Department of Functional Molecular Science, Graduate University for Advanced Studies, Okazaki, Aichi 444-8585, Japan
(printed February 2, 2008)
We introduce a new procedure to construct weight factors, which flatten the probability density of the overlap with respect to some predefined reference configuration. This allows one to overcome free-energy barriers in the overlap variable. Subsequently, we generalize the approach to deal with the overlaps with respect to two reference configurations so that transitions between them are induced. We illustrate our approach by simulations of the brain peptide Met-enkephalin with the ECEPP/2 energy function using the global-energy-minimum and the second-lowest-energy states as reference configurations. The free energy is obtained as a function of the dihedral and the root-mean-square distances from these two configurations. The latter allows one to identify the transition state and to estimate its associated free-energy barrier.
PACS: 05.10.Ln, 87.53.Wz, 87.14.Ee, 87.15.Aa
I. INTRODUCTION
Markov chain Monte Carlo (MC) simulations, for instance by means of the Metropolis method [1], are well suited to simulate generalized ensembles. Generalized ensembles do not occur in nature, but are of relevance for computer simulations (see [2–4] for recent reviews). They may be designed to overcome free-energy barriers, which are encountered in Metropolis simulations of the Gibbs-Boltzmann canonical ensemble. Generalized ensembles still allow for rigorous estimates of canonical expectation values, because the ratios between their weight factors and the canonical Gibbs-Boltzmann weights are exactly known.
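Because these ratios are known exactly, canonical averages follow from generalized-ensemble samples by reweighting. The following is a minimal Python sketch, assuming each sample is stored as a tuple (observable value, logarithm of the generalized weight it was drawn with, energy) and that the canonical weight is exp(−βE); the function name `canonical_average` is hypothetical.

```python
import math

def canonical_average(samples, beta):
    """Reweight generalized-ensemble samples to the canonical ensemble at inverse temperature beta.

    samples: list of (obs, log_w_gen, energy), where log_w_gen is the logarithm of the
    generalized-ensemble weight under which the sample was drawn (assumed format).
    """
    # log of the reweighting factor w_canonical / w_generalized for every sample
    log_r = [-beta * energy - log_w for _, log_w, energy in samples]
    shift = max(log_r)  # subtract the maximum before exponentiating to avoid overflow
    num = den = 0.0
    for (obs, _, _), lr in zip(samples, log_r):
        r = math.exp(lr - shift)
        num += obs * r
        den += r
    return num / den

# Flat generalized weights (log_w_gen = 0): the higher-energy sample is down-weighted by exp(-beta).
example = canonical_average([(0.0, 0.0, 0.0), (3.0, 0.0, 1.0)], math.log(2.0))
```

With β = ln 2 the second sample receives half the weight of the first, so the example evaluates to (0·1 + 3·0.5)/1.5 = 1.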
Umbrella sampling [5] was one of the earliest generalized-ensemble algorithms. In the multicanonical approach [6,7] one uses weights, defined through a microcanonical temperature, which correspond, in a selected energy range, to a working estimate of the inverse density of states. Expectation values of the canonical ensembles can be constructed for a wide temperature range, hence the name "multicanonical". Here, "working estimate" means that running the updating procedure with the (fixed) multicanonical weight factors covers the desired energy range. The Markov process exhibits random walk behavior and moves in cycles from the maximum (or above) to the minimum (or below) of the chosen energy range, and back. A working estimate of the multicanonical weights allows for calculations of the spectral density and all related thermodynamical observables with any desired accuracy by simply increasing the MC statistics. Thus, we have a two-step approach: the first step is to obtain the working estimate of the weights, and the second step is to perform a long production run with these weights. There is no need for that estimate to converge towards the exact inverse spectral density. Once the working estimate of the weights exists, MC simulations with frozen weights converge and allow one to calculate thermodynamical observables with, in principle, arbitrary precision. Various methods, ranging from finite-size scaling estimates [8] in the case of suitable systems to general-purpose recursions [9–11], are at our disposal to obtain a working estimate of the weights.

* Present address: Theory II, Institute of Solid State Research, Forschungszentrum Jülich, D-52425 Jülich, Germany. Email: hi.noguchi@fz-juelich.de.
In the present article we deal with a variant of the multicanonical approach: instead of flattening the energy distribution, we construct weights to flatten the probability density of the overlap with a given reference configuration. This allows one to overcome free-energy barriers in the overlap variable and to obtain accurate estimates of thermodynamic observables at overlap values which are rare in the canonical ensemble. A similar concept was previously used in spin glass simulations [12], but there is a crucial difference: in Ref. [12] the weighting was done for the self-overlap of two replicas of the system, and a proper name would be multi-self-overlap simulations, while in the present article we are dealing with the overlap to a predefined configuration.
We next generalize our approach to deal with two reference configurations so that transitions between them become covered; our method then allows one to estimate the transition state and its associated free-energy barrier. We have in mind situations where experimentalists have determined the reference configurations and observed transitions between them, but an understanding of the free-energy landscape between the configurations is missing. An example would be the conversion from a configuration with α-helix structures to a native structure which is mostly in β-sheet form, as is the case for β-lactoglobulin [13,14].
The paper is organized as follows. In the next section we describe the algorithmic details, using first one and then two reference configurations. In particular, a two-step updating procedure is defined, which is typically more efficient than the conventional one-step updating. Moreover, based on sums of uniformly distributed random numbers, a method to obtain a working estimate of the multi-overlap weights is introduced. In section III we illustrate the method for a simulation of the pentapeptide Met-enkephalin. Our simulations use the all-atom energy function ECEPP/2 (Empirical Conformational Energy Program for Peptides [15]) and rely on its implementation in the computer package SMMP (Simple Molecular Mechanics for Proteins [16]). We use as reference configurations the global-energy-minimum (GEM) state, which has been determined by many authors [17–21], and the second-lowest-energy state, as identified in Refs. [19,22]. While our overlap definition relies on a distance definition in the space of the dihedral angles, it turns out that for the data analysis the use of the root-mean-square (rms) distance is crucial. It is only in the latter variable that one obtains a clear picture of the transition saddle point in the two-dimensional free-energy diagram. In the final section a summary of the present results and an outlook with respect to future applications are given.
II. MULTI-OVERLAP METROPOLIS ALGORITHM
In this section we explain the details of our multi-overlap algorithm. The overlap of a configuration with a reference configuration is defined in the next subsection. In the following two subsections we discuss the multi-overlap weights and the details of the updating. To achieve step one of the method, i.e., the construction of a working estimate of the multi-overlap weights, one could employ a recursion similar to the one used in [12] or explore the approach of [11]. Instead, we decided to test a new method: at infinite temperature, β = 0, the overlap distributions can be calculated analytically (see subsection II D). We use this as the starting point and estimate the overlap weights at the desired temperature by increasing β in sufficiently small steps so that the entire overlap range remains covered. In the final subsection we combine the overlaps with respect to two distinct reference configurations to cover the transition region between them.
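The idea of carrying the weights from β = 0 to the target temperature in small steps can be sketched in Python. This is a minimal illustration under stated assumptions: the actual simulation is replaced by a hypothetical callback `sample_hist(beta, w)` that returns a normalized histogram over distance bins, and the per-step update w ← w/h is the simplest flat-histogram rule, swapped in here for illustration rather than taken from the authors' procedure. The toy model (bin d with degeneracy C(N, d) and energy E = d) is likewise hypothetical.

```python
import math

def refine_weights(sample_hist, w_beta0, betas, updates_per_beta=2):
    """Carry multi-overlap weights from beta = 0 up to betas[-1] in small steps.

    sample_hist(beta, w) -> normalized histogram over the distance bins obtained
    with weights w at inverse temperature beta (hypothetical simulation callback).
    """
    w = list(w_beta0)
    for beta in betas:
        for _ in range(updates_per_beta):
            h = sample_hist(beta, w)
            # Flat-histogram update w <- w / h; bins that were never visited keep their weight.
            w = [wi / hi if hi > 0.0 else wi for wi, hi in zip(w, h)]
            top = max(w)
            w = [wi / top for wi in w]  # the overall normalization is arbitrary
    return w

# Toy model (hypothetical): bin d = 0..N has degeneracy C(N, d) and energy E = d.
N = 19

def toy_hist(beta, w):
    """Exact histogram of the toy model under weights w (stands in for an MC run)."""
    p = [math.comb(N, d) * math.exp(-beta * d) * w[d] for d in range(N + 1)]
    z = sum(p)
    return [x / z for x in p]

w_start = [1.0 / math.comb(N, d) for d in range(N + 1)]  # exact flat weights at beta = 0
w_final = refine_weights(toy_hist, w_start, [0.1 * i for i in range(1, 11)])
```

Because the toy histogram is computed exactly, one update per β already flattens it. In a real Monte Carlo run the small β increments are what keep every bin visited, so that w/h is usable across the whole overlap range.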
A. Definition of the overlap
There is a considerable amount of freedom in defining the overlap of two configurations. For instance, one may rely on the rms distance between configurations, and in subsection III D we analyze some of our results in this variable. However, the computation of the rms distance is slow, and for MC calculations it is important to rely on a computationally fast definition. Therefore, we define the overlap in the space of dihedral angles, as already done in [24], by
$$ q = \frac{n-d}{n}\,, \tag{1} $$
where $n$ is the number of dihedral angles and $d$ is the distance between configurations defined by
$$ d = |v - v^1| = \frac{1}{\pi}\sum_{i=1}^{n} d_a(v_i, v^1_i)\,. \tag{2} $$
Here, $v_i$ is our generic notation for the dihedral angle $i$, $-\pi < v_i \le \pi$, and $v^1$ is the vector of dihedral angles of the reference configuration. The distance $d_a(v_i, v'_i)$ between two angles is defined by
$$ d_a(v_i, v'_i) = \min\left(|v_i - v'_i|,\; 2\pi - |v_i - v'_i|\right). \tag{3} $$
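The symbol $|\cdot|$ defines a metric on the space of dihedral-angle configurations; in particular, the triangle inequality holds,
$$ |v^1 - v^2| \le |v^1 - v| + |v - v^2|\,. \tag{4} $$
For a single angle we have
$$ 0 \le |v_i - v^1_i| \le \pi \;\Rightarrow\; 0 \le d \le n\,. \tag{5} $$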
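Equations (1)–(3) translate directly into code. The following is a minimal Python sketch, assuming angles in radians stored as plain sequences; the function names are illustrative and not taken from SMMP.

```python
import math

def angle_distance(a, b):
    """d_a of Eq. (3): shortest separation of two angles on the circle, in [0, pi]."""
    diff = abs(a - b) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)

def dihedral_distance(v, v_ref):
    """d of Eq. (2): (1/pi) * sum_i d_a(v_i, v_ref_i), so that 0 <= d <= n."""
    return sum(angle_distance(a, b) for a, b in zip(v, v_ref)) / math.pi

def overlap(v, v_ref):
    """q of Eq. (1): q = (n - d)/n, equal to 1 at the reference configuration."""
    n = len(v_ref)
    return (n - dihedral_distance(v, v_ref)) / n
```

Angles just below +π and just above −π are treated as neighbors, which is the point of the minimum in Eq. (3).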
At β = 0 (i.e., infinite temperature) each dihedral angle $v_i$ is uniformly distributed on $(-\pi, \pi]$, so that
$$ d_i = \frac{1}{\pi}\, d_a(v_i, v^1_i) \tag{6} $$
is a uniformly distributed random variable in the range $0 \le d_i \le 1$, and the distance $d$ in (2) becomes the sum of $n$ such uniformly distributed random variables, which allows for an exact calculation of its distribution.
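The statement that each $d_i$ is uniform on [0, 1] at β = 0 is easy to check numerically. The sketch below assumes independent uniform angles as the β = 0 ensemble and an arbitrary random reference configuration; the helper name is hypothetical. For a sum of $n$ independent uniform variables the mean is $n/2$ and the variance $n/12$, so with $n = 19$ the sample values should approach 9.5 and about 1.583.

```python
import math
import random

def infinite_temperature_distances(n, samples, seed=0):
    """Draw dihedral distances d at beta = 0, where each angle is uniform on (-pi, pi]."""
    rng = random.Random(seed)
    v_ref = [rng.uniform(-math.pi, math.pi) for _ in range(n)]  # arbitrary reference
    out = []
    for _ in range(samples):
        d = 0.0
        for ref in v_ref:
            diff = abs(rng.uniform(-math.pi, math.pi) - ref) % (2.0 * math.pi)
            d += min(diff, 2.0 * math.pi - diff) / math.pi  # d_i of Eq. (6), in [0, 1]
        out.append(d)
    return out

n = 19
ds = infinite_temperature_distances(n, 20000)
mean = sum(ds) / len(ds)
var = sum((x - mean) ** 2 for x in ds) / (len(ds) - 1)
```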
B. Multi-overlap weights
We choose a reference configuration of $n$ dihedral angles $v^1_i$ ($i = 1, \dots, n$) to define the dihedral distance (2). We want to simulate the system with weight factors that lead to a random walk (RW) process in the dihedral distance $d$,
$$ d < d_{\min} \;\to\; d > d_{\max} \;\text{and back}\,. \tag{7} $$
Here, $d_{\min}$ is chosen sufficiently small so that one can claim that the reference configuration has been reached, e.g., a few percent of $n/2$, which is the average $d$ at $T = \infty$. The value of $d_{\max}$ has to be sufficiently large to introduce a considerable amount of disorder, e.g., $d_{\max} = n/2$. In the following we call one event of the form (7) a random walk cycle (RWC).
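Counting random walk cycles in a recorded time series of $d$ is a small bookkeeping task. A minimal Python sketch follows; the function name and the convention that the low visit ending one cycle also starts the next are assumptions made here for illustration.

```python
def count_random_walk_cycles(d_series, d_min, d_max):
    """Count random walk cycles of Eq. (7): d < d_min -> d > d_max -> d < d_min.

    A cycle is completed each time the series returns below d_min after having
    been above d_max since the previous visit below d_min.
    """
    cycles = 0
    seen_low = False   # reached d < d_min (start of a cycle)
    seen_high = False  # reached d > d_max after the last low visit
    for d in d_series:
        if d < d_min:
            if seen_low and seen_high:
                cycles += 1
            seen_low, seen_high = True, False  # this low visit starts the next cycle
        elif d > d_max and seen_low:
            seen_high = True
    return cycles
```

For example, the series 0.1, 5, 12, 5, 0.2, 11, 0.3 with $d_{\min} = 0.5$ and $d_{\max} = 9.5$ contains two complete cycles.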
One possibility is to choose weight factors which give a flat probability density in the dihedral distance range $0 \le d \le n/2$ and which keep the $d$-dependence of the weight constant for $d \ge n/2$, so that the density falls off for $d > n/2$. This is quite similar to multimagnetical simulations [8], for which the external magnetic field takes the place of the reference configuration. The analogy becomes obvious when the external field is defined via a ghost spin, which couples to all other spins. For instance, the spins $\vec{s}$ of the Heisenberg ferromagnet are three-dimensional vectors of magnitude $\vec{s}^{\,2} = 1$. Their interaction with an external magnetic field $\vec{H}$ can be written as
$$ \vec{H}\cdot\sum_i \vec{s}_i = H \sum_i \vec{s}_H\cdot\vec{s}_i = N H q\,, \tag{8} $$
where $\vec{s}_H$ is the unit vector in the direction of the magnetic field, $\vec{s}_i$ is the Heisenberg spin at site $i$, $N$ is the number of spins, and $q$ is the overlap of the spin configuration with the reference configuration $\vec{s}_H$:
$$ q = \frac{1}{N}\sum_i \vec{s}_H\cdot\vec{s}_i\,. \tag{9} $$
Using the multi-overlap language [12], the multimagnetical [8] weight factors may then be rewritten as
$$ \exp\left(-\beta E + S(q)\right) = w_c(E)\, w_q(q)\,, \tag{10} $$
where
$$ w_c(E) = \exp(-\beta E)\,, \tag{11} $$
and $E = -\sum_{\langle ij\rangle} \vec{s}_i\cdot\vec{s}_j$ is the energy function of the Heisenberg ferromagnet (the sum is over nearest-neighbor spins). Here, $S(q)$ has the meaning of a microcanonical entropy of the overlap parameter, which has to be determined so that the probability density becomes flat in $q$. Weights for other than the flat distribution have also been discussed in the literature, e.g., Ref. [25], on which we shall comment in connection with figure 7 below.
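The choice described at the start of this subsection (flat density for $0 \le d \le n/2$, $d$-independent weight beyond) can be written as a weight table built from an estimated distance density. A minimal sketch, assuming such a density estimate on an increasing grid is already available and strictly positive up to the cut; the names are hypothetical, and the relation $w_q \propto 1/P$ is the standard flat-histogram choice.

```python
def cutoff_flat_weights(density, d_values, d_cut):
    """Weights that flatten a distance density for d <= d_cut and stay constant beyond.

    density: estimated probability density P(d) on the grid d_values (assumed increasing,
    with d_values[0] <= d_cut and P > 0 up to d_cut).
    """
    weights = []
    w_cut = None
    for d, p in zip(d_values, density):
        if d <= d_cut:
            w_cut = 1.0 / p          # flat region: w_q(d) proportional to 1/P(d)
        weights.append(w_cut)        # beyond d_cut the last flat-region weight is reused
    return weights

# Toy density on five grid points, cut at d = 2 (hypothetical numbers).
density = [0.1, 0.2, 0.4, 0.2, 0.1]
weights = cutoff_flat_weights(density, [0.0, 1.0, 2.0, 3.0, 4.0], 2.0)
reweighted = [p * w for p, w in zip(density, weights)]  # flat up to d_cut, then falling off
```

Here the reweighted density is 1, 1, 1, 0.5, 0.25 (up to normalization): flat up to the cut and decaying beyond it as in the unweighted run.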
C. The updating procedure
In essence, there are two ways to implement the update:
1. Combine the multi-overlap and the canonical weights into one probability, which is accepted or rejected in one random step.
2. Accept or reject the multi-overlap and the canonical probabilities sequentially in two random steps.
1. One-step updating
As defined in equations (10) and (11), the weight factor is a product of $w_c(E)$ and $w_q(d)$, where $w_c(E)$ is the usual canonical Gibbs-Boltzmann factor and $w_q(d)$ is the multi-overlap weight factor; here we use the distance $d$ from the reference configuration (instead of the overlap $q$) as argument. As is clear from equation (1), the use of either $q$ or $d$ as argument is equivalent, while in the presentation of results either variable can have intuitive advantages. In the one-step updating we combine the weights to
$$ w(E,d) = w_c(E)\, w_q(d)\,, \tag{12} $$
and accept or reject newly proposed configurations in the standard Metropolis way. Notably, the calculation of $w_q(d)$ (a simple table lookup) is very fast compared with the calculation of $w_c(E)$, which requires the energy of the proposed configuration. Therefore, the following two-step procedure is of interest.
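A minimal Python sketch of the one-step acceptance with the combined weight (12), working with logarithms of the weights to avoid overflow; the argument names are hypothetical. Note that the new energy has to be known for every proposal, which is exactly the cost the two-step variant tries to avoid.

```python
import math
import random

def one_step_accept(log_wq_old, log_wq_new, energy_old, energy_new, beta, rng=random):
    """Metropolis acceptance with the combined weight w(E, d) = w_c(E) w_q(d) of Eq. (12).

    log_wq_* are logarithms of tabulated multi-overlap weights; w_c(E) = exp(-beta E).
    """
    log_ratio = (log_wq_new - log_wq_old) - beta * (energy_new - energy_old)
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)
```

Passing an object with a `random()` method as `rng` makes the decision reproducible in tests.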
2. Two-step updating
Suppose that the present configuration is $(d,E)$ and a new configuration $(d',E')$ is proposed:
$$ (d,E) \to (d',E')\,. \tag{13} $$
We can sequentially first accept or reject with the $w_q(d)$ probabilities and then, conditionally on the $d$-part having been accepted, with the $w_c(E)$ probabilities.
Proof: We show detailed balance for an update of a dihedral angle and its inverse with the two-step procedure, assuming, as usual, a symmetric proposal probability. There are four cases with probabilities of acceptance
$$ P_i\,, \quad i = 1,2,3,4\,. \tag{14} $$
They are listed in the following:
1. $w_q(d') \ge w_q(d)$ and $w_c(E') \ge w_c(E)$: $\;P_1 = 1$, (15)
2. $w_q(d') \ge w_q(d)$ and $w_c(E') < w_c(E)$: $\;P_2 = w_c(E')/w_c(E)$, (16)
3. $w_q(d') < w_q(d)$ and $w_c(E') \ge w_c(E)$: $\;P_3 = w_q(d')/w_q(d)$, (17)
4. $w_q(d') < w_q(d)$ and $w_c(E') < w_c(E)$: $\;P_4 = w_q(d')\,w_c(E')/[w_q(d)\,w_c(E)]$. (18)
For the inverse move
$$ (d',E') \to (d,E) \tag{19} $$
with probabilities of acceptance
$$ P'_i\,, \quad i = 1,2,3,4\,, \tag{20} $$
the cases are:
1. $w_q(d) \le w_q(d')$ and $w_c(E) \le w_c(E')$: $\;P'_1 = w_q(d)\,w_c(E)/[w_q(d')\,w_c(E')]$, (21)
2. $w_q(d) \le w_q(d')$ and $w_c(E) > w_c(E')$: $\;P'_2 = w_q(d)/w_q(d')$, (22)
3. $w_q(d) > w_q(d')$ and $w_c(E) \le w_c(E')$: $\;P'_3 = w_c(E)/w_c(E')$, (23)
4. $w_q(d) > w_q(d')$ and $w_c(E) > w_c(E')$: $\;P'_4 = 1$. (24)
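For the ratios we find
$$ \frac{P_i}{P'_i} = \frac{w_q(d')\,w_c(E')}{w_q(d)\,w_c(E)}\,, \tag{25} $$
independently of $i = 1,2,3,4$. This is the detailed-balance condition for the combined weight (12); therefore, we have constructed a valid Metropolis updating procedure.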
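The two-step procedure and the check of Eq. (25) can be sketched in Python. Assumptions: weights are positive floats, the proposal is symmetric, and `new_wc` is a hypothetical callable that computes $w_c(E')$ on demand, so the expensive energy evaluation is skipped whenever the cheap $w_q$ test already rejects. The function `ratio_check` enumerates the four cases of Eqs. (15)–(18) from two distinct $w_q$ and two distinct $w_c$ values.

```python
import itertools
import random

def two_step_probability(wq_old, wq_new, wc_old, wc_new):
    """Overall acceptance probability of the two-step update (product of both tests)."""
    return min(1.0, wq_new / wq_old) * min(1.0, wc_new / wc_old)

def two_step_move(wq_old, wq_new, wc_old, new_wc, rng=random):
    """One two-step Metropolis decision.

    new_wc is a (hypothetical) callable returning w_c(E'); it is only invoked
    when the cheap multi-overlap test has been passed.
    """
    if rng.random() >= min(1.0, wq_new / wq_old):
        return False  # rejected by the w_q test, energy never evaluated
    return rng.random() < min(1.0, new_wc() / wc_old)

def ratio_check(wq_pair, wc_pair, tol=1e-12):
    """Check Eq. (25) for all four cases built from two w_q and two w_c values."""
    for wq_d, wq_dp in itertools.permutations(wq_pair, 2):
        for wc_e, wc_ep in itertools.permutations(wc_pair, 2):
            forward = two_step_probability(wq_d, wq_dp, wc_e, wc_ep)
            backward = two_step_probability(wq_dp, wq_d, wc_ep, wc_e)
            target = (wq_dp * wc_ep) / (wq_d * wc_e)
            if abs(forward / backward - target) > tol * target:
                return False
    return True
```

For instance, `ratio_check([0.3, 0.8], [0.5, 0.2])` covers all four sign combinations of the two weight changes and returns `True`.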
D. Sums of uniformly distributed random variables
To calculate the overlap weights at infinite temperature, we consider the sum
$$ u^r = x^r_1 + \dots + x^r_n \tag{26} $$
of the random variables $x^r_j$ ($j = 1, \dots, n$), each uniformly distributed in the interval $[0,1)$, and derive a recursion formula for the probability density $f_n(u)$ of this distribution. Care is taken to cast the recursion in a form which allows for a numerically stable implementation [26] over a reasonably large range of $n$.

Let us recall the probability density of the uniform distribution:
$$ f_1(x) = \begin{cases} 1, & \text{for } 0 \le x < 1, \\ 0, & \text{otherwise.} \end{cases} \tag{27} $$
To derive the recursion formula for the probability density of the random variable (26), it is convenient to cast it in the form
$$ f_n(u) = \sum_{k=1}^{n} f_{n,k}(x_k) \quad \text{with} \quad x_k = u - k + 1\,, \tag{28} $$
where
$$ f_{n,k}(x) = \begin{cases} \sum_{i=0}^{n-1} a^i_{n,k}\, x^i, & \text{for } 0 \le x < 1, \\ 0, & \text{otherwise.} \end{cases} \tag{29} $$
The master formula for the recursion is obtained from the convolution
$$ f_n(u) = \int_0^u f_1(u-v)\, f_{n-1}(v)\, dv\,. \tag{30} $$
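Let now $u = x + k - 1$ with $0 \le x < 1$; equations (27), (28), and (29) imply
$$ f_{n,k}(x) = \int_{k-2+x}^{k-1+x} f_{n-1}(v)\, dv = \int_x^1 f_{n-1,k-1}(y)\, dy + \int_0^x f_{n-1,k}(y)\, dy\,, \tag{31} $$
where $f_{n-1,0} \equiv 0$ and $f_{n-1,n} \equiv 0$, since $f_{n-1}$ vanishes outside $[0, n-1)$.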
Using equation (29) and performing the integrations, we obtain
$$ f_{n,k}(x) = \sum_{i=0}^{n-2} a^i_{n-1,k-1}\,\frac{1}{i+1} - \sum_{i=0}^{n-2} a^i_{n-1,k-1}\,\frac{x^{i+1}}{i+1} + \sum_{i=0}^{n-2} a^i_{n-1,k}\,\frac{x^{i+1}}{i+1}\,. \tag{32} $$
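Expanding in powers of $x$ and comparing (29) with (32) allows one to calculate the coefficients $a^i_{n,k}$ recursively in a numerically robust way:
$$ a^0_{n,k} = \sum_{j=0}^{n-2} \frac{a^j_{n-1,k-1}}{j+1}\,, \qquad a^{j+1}_{n,k} = \frac{a^j_{n-1,k} - a^j_{n-1,k-1}}{j+1} \quad (j = 0, \dots, n-2)\,, \tag{33} $$
starting from $a^0_{1,1} = 1$ and with $a^j_{n-1,0} = a^j_{n-1,n} = 0$.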
Once the coefficients $a^i_{n,k}$ are available, one can easily evaluate the probability densities $f_n(u)$ and the corresponding cumulative distribution functions.

The probability density (28) takes its maximum value for $u = n/2$. Due to the central limit theorem the fall-off behavior is Gaussian as long as $u$ stays sufficiently close to $n/2$. In the tails, for $u \to 0$ or $u \to n$, the fall-off is much faster than Gaussian, namely an exponential of an exponential, as follows from extreme value statistics [27].
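The recursion (33) is short enough to implement directly. The following minimal Python sketch uses hypothetical function names and ordinary floating-point arithmetic; rows $k = 0$ and $k = n + 1$ of the coefficient table are zero padding that supplies $a_{n-1,0} = a_{n-1,n} = 0$ at the ends of the support. The result can be cross-checked against the known closed form for sums of uniform variables (the Irwin–Hall distribution).

```python
import math

def piece_coefficients(n):
    """Coefficients a[k][i] of Eq. (29) for the density of a sum of n U[0,1) variables.

    Row k (1 <= k <= n) holds the polynomial on u = x + k - 1, 0 <= x < 1.
    Rows 0 and n + 1 are zero padding that supply the boundary terms of Eq. (33).
    """
    a = [[0.0], [1.0], [0.0]]  # n = 1: f_{1,1}(x) = 1
    for m in range(2, n + 1):
        new = [[0.0] * m for _ in range(m + 2)]
        for k in range(1, m + 1):
            lo, hi = a[k - 1], a[k]  # a_{m-1,k-1} and a_{m-1,k}
            new[k][0] = sum(c / (j + 1) for j, c in enumerate(lo))
            for j in range(m - 1):
                new[k][j + 1] = (hi[j] - lo[j]) / (j + 1)
        a = new
    return a

def density(u, a):
    """f_n(u) from Eq. (28); n is implied by the size of the coefficient table a."""
    n = len(a) - 2
    if not 0.0 <= u < n:
        return 0.0
    k = int(u) + 1
    x = u - (k - 1)
    return sum(c * x**i for i, c in enumerate(a[k]))

def cdf(u, a):
    """Cumulative distribution: complete pieces below u plus the integral of the current piece."""
    n = len(a) - 2
    if u <= 0.0:
        return 0.0
    if u >= n:
        return 1.0
    k = int(u) + 1
    x = u - (k - 1)
    total = sum(c / (i + 1) for row in a[1:k] for i, c in enumerate(row))
    return total + sum(c * x ** (i + 1) / (i + 1) for i, c in enumerate(a[k]))
```

For $n = 2$ the table reproduces the triangular density ($x$ on the first piece, $1 - x$ on the second). At β = 0 the flat multi-overlap weights would then follow as $w_q(d) \propto 1/f_n(d)$, with $d$ in the role of $u$.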
E. Combination of two weights
In the following the weights with superscript $j$, $w^j_q(d_j)$, correspond to two distinct reference configurations $v^j$ ($j = 1, 2$), and $d_j$ is the distance from the configuration at hand to the configuration $v^j$. Let us assume that multi-overlap simulations with respect to the two reference configurations have been carried out and that the weights $w^1_q(d_1)$ and $w^2_q(d_2)$ have been determined so that they sample their distance distributions approximately uniformly.

We want to construct combined weights $w^{12}_q(d_1,d_2)$ which lead to a RW process between the configurations $v^1$ and $v^2$. Our choice is
$$ w^{12}_q(d_1,d_2) = \begin{cases} w^1_q(d_1), & \text{for } d_1 < d_2, \\ c_j\, w^2_q(d_2), & \text{for } d_1 \ge d_2. \end{cases} \tag{34} $$
The constant $c_j$, with $j$ either 1 or 2, is introduced to allow for smooth transitions from $d_1 < d_2$ to $d'_1 \ge d'_2$ and vice versa.

FIG. 1. Reference configuration 1. Only the backbone structure is shown. The N-terminus is on the left-hand side and the C-terminus on the right-hand side. The dotted lines stand for hydrogen bonds. The figure was created with RasMol [23].

We determine $c_j$ from the analysis of either run 1 (or run 2), which are the (one-reference-configuration) simulations leading to the weights $w^1_q(d_1)$ (or $w^2_q(d_2)$). The constant $c_1$ is found from run 1 by scanning the time series for configurations for which $d_1 \ge d_2$ holds and which have a one-update transition $(d_1,d_2) \to (d'_1,d'_2)$ with $d'_1 < d'_2$. From these configurations $k$ we determine the constant $c_1$ so that
$$ \sum_k w^1_q[d_1(k)] = c_1 \sum_k w^2_q[d_2(k)] \tag{35} $$
holds. Similarly, run 2 may be used to get $c_2$. It turns out that the normalized weights almost agree in the transition region and, therefore, the patching (34) works. The dependence of the constant on the run used for its determination is small, and it appears not worthwhile to explore more sophisticated methods.
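The patching (34) and the matching condition (35) can be sketched as follows. Assumptions: the run-1 time series of $(d_1, d_2)$ is recorded after every single update, so that consecutive entries are one-update transitions, and the single-reference weights are available as callables; all names and numbers are hypothetical.

```python
def boundary_crossings(series):
    """Entries (d1, d2) with d1 >= d2 whose successor in the series has d1' < d2'."""
    return [(a1, a2) for (a1, a2), (b1, b2) in zip(series, series[1:])
            if a1 >= a2 and b1 < b2]

def estimate_c1(crossings, w1, w2):
    """Eq. (35): c1 such that sum_k w1[d1(k)] = c1 * sum_k w2[d2(k)] (needs >= 1 crossing)."""
    return sum(w1(d1) for d1, _ in crossings) / sum(w2(d2) for _, d2 in crossings)

def combined_weight(d1, d2, w1, w2, c):
    """Eq. (34): w1(d1) for d1 < d2, c * w2(d2) for d1 >= d2."""
    return w1(d1) if d1 < d2 else c * w2(d2)

# Hypothetical run-1 time series and constant toy weights.
series = [(3.0, 1.0), (2.0, 2.5), (2.6, 2.4), (2.0, 3.0)]

def w1(d):
    return 2.0

def w2(d):
    return 0.5

c1 = estimate_c1(boundary_crossings(series), w1, w2)
```

With these toy weights $c_1 = 4$, so the combined weight takes the same value on both sides of $d_1 = d_2$.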
It is straightforward to implement the Metropolis updating with respect to the weights (34). For the transition
$$ (d_1,d_2) \to (d'_1,d'_2)\,, \tag{36} $$
one has to distinguish four more cases:
1. $d_1 < d_2$ and $d'_1 < d'_2$, (37)
2. $d_1 < d_2$ and $d'_1 \ge d'_2$, (38)
3. $d_1 \ge d_2$ and $d'_1 < d'_2$, (39)
4. $d_1 \ge d_2$ and $d'_1 \ge d'_2$. (40)
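The four cases (37)–(40) only determine which branch of (34) enters the numerator and the denominator of the acceptance ratio. A minimal Python sketch of the multi-overlap stage of the update follows, assuming log-weights given as callables and $\log c_j$ passed in; in the two-step scheme the $w_c$ test would follow separately. All names are hypothetical.

```python
import math
import random

def transition_case(d1, d2, d1_new, d2_new):
    """Return the case number 1-4 of Eqs. (37)-(40) for (d1, d2) -> (d1', d2')."""
    before = 0 if d1 < d2 else 1   # 0: w1(d1) branch of (34), 1: c * w2(d2) branch
    after = 0 if d1_new < d2_new else 1
    return 1 + 2 * before + after

def log_combined_weight(d1, d2, log_w1, log_w2, log_c):
    """Logarithm of Eq. (34)."""
    return log_w1(d1) if d1 < d2 else log_c + log_w2(d2)

def accept(d1, d2, d1_new, d2_new, log_w1, log_w2, log_c, rng=random):
    """Metropolis test with respect to the combined weights (34) only."""
    log_ratio = (log_combined_weight(d1_new, d2_new, log_w1, log_w2, log_c)
                 - log_combined_weight(d1, d2, log_w1, log_w2, log_c))
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)
```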
FIG. 2. Reference configuration 2. See the caption of figure
1 for details.
As an alternative to the approach outlined, one may combine $d_1$ and $d_2$ into a new variable $\theta_d$ for which the weights are then calculated as in the one-dimensional case. A suitable choice along this line is
$$ \theta_d = \frac{2}{\pi} \arctan\left(\frac{d_1}{d_2}\right). \tag{41} $$
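A short sketch of Eq. (41): $\theta_d$ runs from 0 at reference configuration 1 ($d_1 = 0$) to 1 at reference configuration 2 ($d_2 = 0$), so one-dimensional weights can then be tabulated in $\theta_d$. Using `math.atan2` instead of a literal division is an implementation choice made here to cover $d_2 = 0$; for $d_1, d_2 \ge 0$ it agrees with $\arctan(d_1/d_2)$.

```python
import math

def theta_d(d1, d2):
    """Eq. (41): theta_d = (2/pi) arctan(d1/d2), in [0, 1] for non-negative distances.

    atan2 handles d2 = 0 (configuration identical to reference 2), giving theta_d = 1.
    """
    if d1 == 0.0 and d2 == 0.0:
        raise ValueError("the two reference configurations coincide")
    return 2.0 / math.pi * math.atan2(d1, d2)
```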
III. MET-ENKEPHALIN SIMULATIONS
In the following we introduce two reference configurations. Subsequently, we discuss first the results for simulations with one reference configuration and then those involving both reference configurations.
A. The reference configurations
Met-enkephalin has the amino-acid sequence Tyr-Gly-Gly-Phe-Met. We fix the peptide-bond dihedral angles ω to 180°, which implies that the total number of variable dihedral angles is n = 19. We neglect solvent effects, as in previous works. The low-energy configurations of Met-enkephalin in the gas phase have been classified into several groups of similar structures [19,22]. Two reference configurations, called configuration 1 and configuration 2, are used in the following and depicted in figures 1 and 2, respectively. Configuration 1 has a β-turn structure with