arXiv:cond-mat/0305055v1 [cond-mat.stat-mech] 3 May 2003
Multi-Overlap Simulations for Transitions between Reference Configurations
Bernd A. Berg1,2, Hirochi Noguchi3,∗ and Yuko Okamoto3,4
(E-mails: berg@csit.fsu.edu, noguchi@ims.ac.jp, okamotoy@ims.ac.jp )
1Department of Physics, Florida State University, Tallahassee, FL 32306, USA
2School of Computational Science and Information Technology
Florida State University, Tallahassee, FL 32306, USA
3Department of Theoretical Studies, Institute for Molecular Science
Okazaki, Aichi 444-8585, Japan
4Department of Functional Molecular Science, Graduate University for Advanced Studies
Okazaki, Aichi 444-8585, Japan
(printed February 2, 2008)
We introduce a new procedure to construct weight factors, which flatten the probability density of
the overlap with respect to some pre-defined reference configuration. This allows one to overcome
free energy barriers in the overlap variable. Subsequently, we generalize the approach to deal with the
overlaps with respect to two reference configurations so that transitions between them are induced.
We illustrate our approach by simulations of the brain peptide Met-enkephalin with the ECEPP/2
energy function using the global-energy-minimum and the second lowest-energy states as reference
configurations. The free energy is obtained as a function of the dihedral and the root-mean-square
distances from these two configurations. The latter allows one to identify the transition state and
to estimate its associated free energy barrier.
PACS: 05.10.Ln, 87.53.Wz, 87.14.Ee, 87.15.Aa
I. INTRODUCTION
Markov chain Monte Carlo (MC) simulations, for in-
stance by means of the Metropolis method [1], are well
suited to simulate generalized ensembles. Generalized en-
sembles do not occur in nature, but are of relevance for
computer simulations (see [2–4] for recent reviews). They
may be designed to overcome free energy barriers, which
are encountered in Metropolis simulations of the Gibbs-
Boltzmann canonical ensemble. Generalized ensembles
do still allow for rigorous estimates of the canonical ex-
pectation values, because the ratios between their weight
factors and the canonical Gibbs-Boltzmann weights are
exactly known.
∗Present address: Theory II, Institute of Solid State Research, Forschungszentrum Jülich, D-52425 Jülich, Germany. E-mail: hi.noguchi@fz-juelich.de.
Umbrella sampling [5] was one of the earliest
generalized-ensemble algorithms. In the multicanonical
approach [6,7] one weights with a microcanonical tem-
perature, which corresponds, in a selected energy range,
to a working estimate of the inverse density of states. Ex-
pectation values of the canonical ensembles can be con-
structed for a wide temperature range, hence the name
“multicanonical”. Here, “working estimate” means that
running the updating procedure with the (fixed) multi-
canonical weight factors covers the desired energy range.
The Markov process exhibits random walk behavior and
moves in cycles from the maximum (or above) to the
minimum (or below) of the chosen energy range, and
back. A working estimate of the multicanonical weights
allows for calculations of the spectral density and all re-
lated thermodynamical observables with any desired ac-
curacy by simply increasing the MC statistics. Thus, we
have a two-step approach: The first step is to obtain
the working estimate of the weights, and the second step
is to perform a long production run with these weights.
There is no need for that estimate to converge towards
the exact inverse spectral density. Once the working es-
timate of the weights exists, MC simulations with frozen
weights converge and allow one to calculate thermody-
namical observables with, in principle, arbitrary preci-
sion. Various methods, ranging from finite-size scaling
estimates [8] in case of suitable systems to general pur-
pose recursions [9–11], are at our disposal to obtain a
working estimate of the weights.
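The statement that frozen generalized-ensemble weights still yield canonical estimates can be illustrated by a short reweighting sketch. It assumes that each sampled configuration comes with its energy and with the logarithm of the weight factor used in the simulation; the function name `canonical_average` and its arguments are hypothetical and are not taken from the paper or from SMMP.

```python
import math

def canonical_average(obs, energy, log_w, beta):
    """Estimate a canonical expectation value at inverse temperature beta
    from samples generated with weight factors exp(log_w) (assumed known)."""
    # Logarithm of the exactly known ratio: canonical weight / simulation weight.
    log_r = [-beta * e - lw for e, lw in zip(energy, log_w)]
    # Subtract the maximum for numerical stability; it cancels in the ratio.
    shift = max(log_r)
    r = [math.exp(x - shift) for x in log_r]
    return sum(ri * oi for ri, oi in zip(r, obs)) / sum(r)
```

If the simulation weights already equal the canonical ones, log_w = −βE, all ratios are identical and the estimate reduces to the plain sample mean.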
In the present article we deal with a variant of the mul-
ticanonical approach: Instead of flattening the energy
distribution, we construct weights to flatten the proba-
bility density of the overlap with a given reference con-
figuration. This allows one to overcome free energy barriers
in the overlap variable and to get accurate estimates of
thermodynamic observables at overlap values which are
rare in the canonical ensemble. A similar concept was
previously used in spin glass simulations [12], but there
is a crucial difference: In Ref. [12] the weighting was
done for the self-overlap of two replicas of the system
and a proper name would be multi-self-overlap simula-
tions, while in the present article we are dealing with the
overlap to a predefined configuration.
We next generalize our approach to deal with two ref-
erence configurations so that transitions between them
are covered; the method then allows one to estimate
the transition state and its associated free energy
barrier. We have in mind situations where experimen-
talists determined the reference configurations and ob-
served transitions between them, but an understanding
of the free energy landscape between the configurations
is missing. An example would be the conversion from a
configuration with α-helix structures to a native struc-
ture which is mostly β sheet, as is the case for
β-lactoglobulin [13,14].
The paper is organized as follows: In the next sec-
tion we describe the algorithmic details, using first one
and then two reference configurations. In particular, a
two-step updating procedure is defined, which is typi-
cally more efficient than the conventional one-step up-
dating. Moreover, based on the sums of uniformly dis-
tributed random numbers, a method to obtain a working
estimate of the multi-overlap weights is introduced. In
section III we illustrate the method for a simulation with
the pentapeptide Met-enkephalin. Our simulations use
the all-atom energy function ECEPP/2 (Empirical Con-
formational Energy Program for Peptides [15]) and rely
on its implementation in the computer package SMMP
(Simple Molecular Mechanics for Proteins [16]). We use
as reference configurations the global energy minimum
(GEM) state, which has been determined by many au-
thors [17–21], and the second lowest-energy state, as
identified in Refs. [19,22]. While our overlap definition
relies on a distance definition in the space of the dihe-
dral angles, it turns out that for the data analysis the
use of the root-mean-square (rms) distance is crucial. It
is only in the latter variable that one obtains a clear pic-
ture of the transition saddle point in the two-dimensional
free energy diagram. In the final section a summary of
the present results and an outlook with respect to future
applications are given.
II. MULTI-OVERLAP METROPOLIS
ALGORITHM
In this section we explain the details of our multi-
overlap algorithm. The overlap of a configuration with
respect to a reference configuration is defined in the next
subsection. In the second subsection we discuss details of the
updating. To achieve step one of the method, i.e., the
construction of a working estimate of the multi-overlap
weights, one could employ a recursion similar to the one
used in [12] or explore the approach of [11]. Instead of
doing so, we decided to test a new method: At infinite
temperature, β = 0, the overlap distributions can be cal-
culated analytically (see subsection IID). We use this
as a starting point and estimate the overlap weights at the
desired temperature by increasing β in sufficiently small
steps so that the entire overlap range remains covered.
In the final subsection we define the overlap with respect
to two distinct reference configurations to cover the tran-
sition region between them.
A. Definition of the overlap
There is a considerable amount of freedom in defining
the overlap of two configurations. For instance, one may
rely on the rms distance between configurations, and in
subsection IIID we analyze some of our results in this
variable. However, the computation of the rms distance
is slow and for MC calculations it is important to rely on
a computationally fast definition. Therefore, we define
the overlap in the space of dihedral angles, as was
already done in Ref. [24], by
$$ q = (n - d)/n , \tag{1} $$
where n is the number of dihedral angles and d is the
distance between configurations defined by
$$ d = ||v - v^1|| = \frac{1}{\pi} \sum_{i=1}^{n} d_a(v_i, v_i^1) . \tag{2} $$
Here, $v_i$ is our generic notation for the dihedral angle $i$,
$-\pi < v_i \le \pi$, and $v^1$ is the vector of dihedral angles of the
reference configuration. The distance $d_a(v_i, v_i')$ between
two angles is defined by
$$ d_a(v_i, v_i') = \min(|v_i - v_i'|, \, 2\pi - |v_i - v_i'|) . \tag{3} $$
The symbol ||.|| defines a norm in a vector space. In
particular, the triangle inequality holds
$$ ||v^1 - v^2|| \le ||v^1 - v|| + ||v - v^2|| . \tag{4} $$
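For a single angle we have, with $|v_i - v_i^1|$ understood as the
periodic distance $d_a(v_i, v_i^1)$ of (3),
$$ 0 \le |v_i - v_i^1| \le \pi \ \Rightarrow \ 0 \le d \le n . \tag{5} $$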
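The definitions (1) to (3) translate directly into a few lines of code. The following is a minimal sketch, assuming that configurations are given as plain sequences of dihedral angles in radians; the function names are illustrative and do not correspond to routines of SMMP. The modulo operation is an added safeguard for angles outside the interval (−π, π] and has no effect inside it.

```python
import math

def angle_distance(a, b):
    """Eq. (3): distance between two angles on the circle, in [0, pi]."""
    diff = abs(a - b) % (2.0 * math.pi)  # safeguard; no-op for -pi < a, b <= pi
    return min(diff, 2.0 * math.pi - diff)

def dihedral_distance(v, v_ref):
    """Eq. (2): d = (1/pi) * sum_i d_a(v_i, v_ref_i), so that 0 <= d <= n."""
    if len(v) != len(v_ref):
        raise ValueError("configurations must have the same number of angles")
    return sum(angle_distance(a, b) for a, b in zip(v, v_ref)) / math.pi

def overlap(v, v_ref):
    """Eq. (1): q = (n - d) / n, which is 1 for identical configurations."""
    n = len(v)
    return (n - dihedral_distance(v, v_ref)) / n
```

Each evaluation costs only n subtractions and comparisons, which is the reason for preferring this definition over the rms distance inside the MC updating.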
At β = 0 (i.e., infinite temperature)
$$ d_i = \frac{1}{\pi} d_a(v_i, v_i^1) \tag{6} $$
is a uniformly distributed random variable in the range
$0 \le d_i \le 1$ and the distance d in (2) becomes the sum
of n such uniformly distributed random variables, which
allows for an exact calculation of its distribution.
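For illustration, the probability density of a sum of n independent random variables that are uniform on [0, 1] has a standard closed form (the Irwin-Hall distribution). The sketch below evaluates that textbook formula; it is not necessarily the procedure the authors use in their own calculation, and the function name `irwin_hall_pdf` is an assumption made here.

```python
import math

def irwin_hall_pdf(x, n):
    """Density at x of the sum of n independent uniform [0, 1] variables:
    f(x) = 1/(n-1)! * sum_{k=0}^{floor(x)} (-1)^k C(n, k) (x - k)^(n-1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if x < 0.0 or x > n:
        return 0.0
    total = 0.0
    for k in range(int(math.floor(x)) + 1):
        total += (-1) ** k * math.comb(n, k) * (x - k) ** (n - 1)
    return total / math.factorial(n - 1)
```

The density is symmetric about its mean n/2. Because the sum alternates in sign, floating-point cancellation grows with n, so for large n an exact rational or higher-precision evaluation may be preferable.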
B. Multi-overlap weights
We choose a reference configuration of n dihedral an-
gles $v_i^1$, $(i = 1, \ldots, n)$ to define the dihedral distance (2).
We want to simulate the system with weight factors that
lead to a random walk (RW) process in the dihedral dis-
tance d,
$$ d < d_{\min} \ \to \ d > d_{\max} \quad \text{and back} . \tag{7} $$
Here, $d_{\min}$ is chosen sufficiently small so that one can
claim that the reference configuration has been reached,
e.g., a few percent of n/2, which is the average d at
[Figure 11: contour plot with horizontal axis rms 1 and vertical axis rms 2, both ranging from 0 to 6; marked points A1, B1, and C.]
FIG. 11. Free-energy landscape at T = 250 K with respect
to rms distances (Å) from the two reference configurations,
F(rms1,rms2). Contour lines are drawn every $2 k_B T$. The
labels A1 and B1 indicate the positions for the local-minimum
states at T = 250 K that originate from the reference config-
uration 1 and the reference configuration 2, respectively. The
label C stands for the saddle point that corresponds to the
transition state.
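A landscape of this kind can be estimated from sampled pairs of rms distances through the standard relation F = −k_B T ln P, which fixes F up to an additive constant. The sketch below assumes that the samples are either canonical at the temperature of interest or come with reweighting factors to that ensemble; the function name, the binning scheme, and the optional `weights` argument are hypothetical and do not reproduce the authors' analysis code.

```python
import math

def free_energy_landscape(points, kT, bin_width, weights=None):
    """Return {(i, j): F} with F = -kT * ln P(bin), shifted so that min F = 0.
    points: sequence of (rms1, rms2) pairs; weights: optional reweighting factors."""
    if not points:
        raise ValueError("need at least one sample")
    if weights is None:
        weights = [1.0] * len(points)
    hist = {}
    for (x, y), w in zip(points, weights):
        key = (int(math.floor(x / bin_width)), int(math.floor(y / bin_width)))
        hist[key] = hist.get(key, 0.0) + w
    total = sum(hist.values())
    # Bins that were never visited have no estimate and are simply absent.
    free = {key: -kT * math.log(h / total) for key, h in hist.items() if h > 0.0}
    f_min = min(free.values())
    return {key: f - f_min for key, f in free.items()}
```

With the returned values expressed in units of kT, contour lines drawn every 2 k_B T correspond to the levels 0, 2, 4, and so on.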
IV. SUMMARY AND CONCLUSIONS
We have outlined an approach to perform MC simu-
lations which yield the free-energy distribution between
two reference configurations. The multi-overlap weights
for this purpose were obtained by a novel, iterative pro-
cess. The main point of this iterative process is not that
it is supposed to be more efficient than the recursion that
was used in the multi-self-overlap simulations of Ref. [12],
but that it is an entirely independent approach, which
starts from an analytically controlled limit. Recursions
like the one used in [12] are not “foolproof”. For in-
stance, while most of the spin glass replicas in Ref. [12]
were well-behaved, a few did not complete their recur-
sion after more than an entire year of single processor
CPU time. Similar situations could be encountered in
all-atom simulations of larger peptides, where the normal
multicanonical weight recursion as well as a similar multi-
overlap weight recursion could fail. The present method
then provides an alternative, approaching the physical
region from a different limit.
Notably, our multi-overlap approach is well suited
to be combined with a recently introduced biased
Metropolis sampling [30]. Namely, the configurations
at higher temperatures required by that method are also
needed for our particular multi-overlap recursion, so that
no extra simulations are required in this respect.
FIG. 12. The transition state between reference configura-
tions 1 and 2. See the caption of figure 1 for details.
On the physical side, we have found that entropy ef-
fects are rather important for a small peptide. The ef-
fects of entropy on the folding of real proteins in realistic
solvent have yet to be studied in detail.
We have also performed the analysis of this paper for
Met-enkephalin with variable ω angles and, in particular,
simulated with combined weights at a number of temper-
atures. The results found are quite similar to those re-
ported in this paper. In future work we intend to analyze
the transitions between reference configurations for larger
systems of actual interest like β-lactoglobulin.
ACKNOWLEDGMENTS
We are grateful for the financial support from the Joint
Studies Program of the Institute for Molecular Science
(IMS). One of the authors (B.B.) would like to thank
the IMS faculty and staff for their kind hospitality dur-
ing his stay in spring 2002. In part, this work was sup-
ported by grants from the US Department of Energy un-
der contract DE-FG02-97ER40608 (for B.B.), from the
Research Fellowships of the Japan Society for the Promo-
tion of Science for Young Scientists (for H.N.) and from
the Research for the Future Program of the Japan Soci-
ety for the Promotion of Science (JSPS-RFTF98P01101)
(for Y.O.).
[Figure 13: contour plot with horizontal axis rms 1 and vertical axis rms 2, both ranging from 0 to 6.]
FIG. 13. Internal energy landscape at T = 250 K with re-
spect to rms distances (Å) from the two reference configura-
tions, U(rms1,rms2). Contour lines are drawn every $2 k_B T$.
[Figure 14: contour plot with horizontal axis rms 1 and vertical axis rms 2, both ranging from 0 to 6.]
FIG. 14. Entropy landscape at T = 250 K with respect
to rms distances (Å) from the two reference configurations,
−TS(rms1,rms2). Contour lines are drawn every $2 k_B T$.
[1] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H.
Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).
[2] U.H. Hansmann and Y. Okamoto, in Ann. Rev. Comput.
Phys. VI, D. Stauffer (ed.) (World Scientific, Singapore,
1999) p. 129.
[3] A. Mitsutake, Y. Sugita, and Y. Okamoto, Biopolymers
(Peptide Science) 60, 96 (2001).
[4] B.A. Berg, Comp. Phys. Commun. 104, 52 (2002).
[5] G.M. Torrie and J.P. Valleau, J. Comput. Phys. 23, 187
(1977).
[6] B.A. Berg and T. Neuhaus, Phys. Lett. B 267, 249
(1991).
[7] B.A. Berg and T. Celik, Phys. Rev. Lett. 69, 2292 (1992).
[8] B.A. Berg, U.H. Hansmann, and T. Neuhaus, Phys. Rev.
B 47, 497 (1993).
[9] B.A. Berg, J. Stat. Phys. 82, 323 (1996).
[10] Y. Sugita and Y. Okamoto, Chem. Phys. Lett. 329, 261
(2000).
[11] F. Wang and D.P. Landau, Phys. Rev. Lett. 86, 2050
(2001).
[12] B.A. Berg, A. Billoire, and W. Janke, Phys. Rev. B 61,
12143 (2000).
[13] K. Kuwajima, H. Yamaya, S. Miwa, S. Sugai, and T.
Nagamura, FEBS Lett. 221, 115 (1987).
[14] D. Hamada, S. Segawa, and S. Goto, Nature Struct. Biol.
3, 868 (1996).
[15] M.J. Sippl, G. Némethy, and H.A. Scheraga, J. Phys.
Chem. 88, 6231 (1984) and references given therein.
[16] F. Eisenmenger, U.H. Hansmann, S. Hayryan, and C.-K.
Hu, Comp. Phys. Commun. 138, 192 (2001).
[17] Z. Li and H.A. Scheraga, Proc. Natl. Acad. Sci. USA 84,
6611 (1987).
[18] B. von Freyberg and W.J. Braun, J. Comput. Chem. 12,
1065 (1991).
[19] Y. Okamoto, T. Kikuchi and H. Kawai, Chem. Lett.
1992, 1275 (1992).
[20] U.H. Hansmann and Y. Okamoto, J. Comput. Chem. 14,
1333 (1993).
[21] H. Meirovitch, E. Meirovitch, A.G. Michel, and M.
Vásquez, J. Phys. Chem. 98, 6241 (1994).
[22] A. Mitsutake, U.H. Hansmann, and Y. Okamoto, J. Mol.
Graph. and Model. 16, 226 (1998).
[23] R.A. Sayle and E.J. Milner-White, Trends Biochem. Sci.
20, 374 (1995).
[24] U.H. Hansmann, M. Masuya and Y. Okamoto, Proc.
Natl. Acad. Sci. USA 94, 10652 (1997).
[25] B. Hesselbo and R. Stinchcombe, Phys. Rev. Lett. 74,
2151 (1995).
[26] B.A. Berg, Markov Chain Monte Carlo Simulations and
Their Statistical Analysis I and II, books in preparation.
[27] E.J. Gumbel, Statistics of Extremes, Columbia Univer-
sity Press, New York, 1958.
[28] W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T.
Vetterling, Numerical Recipes in Fortran, Second Edi-
tion, Cambridge University Press, Cambridge, 1992.
[29] U.H. Hansmann, Y. Okamoto, and J.N. Onuchic, PRO-
TEINS: Structure, Function, and Genetics 34, 472
(1999).
[30] B.A. Berg, cond-mat/0209413, to be published in Phys.
Rev. Lett.