
Coarse Master Equations for Peptide Folding Dynamics†

Nicolae-Viorel Buchete‡ and Gerhard Hummer*

Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases,

National Institutes of Health, Bethesda, Maryland 20892-0520

Received: August 2, 2007; In Final Form: October 25, 2007

We construct coarse master equations for peptide folding dynamics from atomistic molecular dynamics

simulations. A maximum-likelihood propagator-based method allows us to extract accurate rates for the

transitions between the different conformational states of the small helix-forming peptide Ala5. Assigning

the conformational states by using transition paths instead of instantaneous molecular coordinates suppresses

the effects of fast non-Markovian dynamics. The resulting master equations are validated by comparing their

analytical correlation functions with those obtained directly from the molecular dynamics simulations. We

find that the master equations properly capture the character and relaxation times of the entire spectrum of

conformational relaxation processes. By using the eigenvectors of the transition rate matrix, we are able to

systematically coarse-grain the system. We find that a two-state description, with a folded and an unfolded

state, roughly captures the slow conformational dynamics. A four-state model, with two folded and two unfolded

states, accurately recovers the three slowest relaxation processes with time scales between 1.5 and 7 ns. The

master equation models not only give access to the slow conformational dynamics but also shed light on the

molecular mechanisms of the helix-coil transition.

1. Introduction

Molecular dynamics (MD) simulations provide a unique tool

to explore the structure, energetics, and dynamics of biomol-

ecules. Broad spatial and temporal scales are accessible, ranging

from atomic motions on quantum energy surfaces to protein

binding and association in macromolecular assemblies. Across

the different time and length scales, MD simulations offer an

exhaustively detailed description of the motion of every atom

and the resulting thermodynamic and kinetic properties.1

However, the great temporal and spatial resolution of MD

comes at a price: time scale limitations and potentially

overwhelming detail. On one hand, the characteristic times of

many biologically relevant processes remain out of reach.

Despite enormous progress in accelerating the simulations, the

∼1-fs time step used in the integration of the equations of

motion is set firmly by fast time scales of molecular vibrations

and collisions, requiring ∼10^15 integration steps to reach the

seconds time scale, well beyond the reach of current technology.

On the other hand, the level of detail contained in a simulation

trajectory, with the path of every single protein atom or water

molecule traced out in space, can be overwhelming. As a result,

the interpretation of simulation data for complex molecular

systems itself requires increasingly sophisticated analysis meth-

odologies.

Master equations2-10 provide a powerful framework within

which both challenges can be addressed. By describing the

system at a coarser level, master equations enable analytical

treatments and thereby address the time scale problem; at the

same time, they focus from the onset on the dynamics of the

most relevant aspects of the system, thus aiding in the

subsequent analysis.

The formal development of master equations is well-rooted

in statistical mechanics and closely connected to the projection-

operator formalism.11 In essence, the motions of the detailed

molecular system are projected onto a discrete representation

of the conformation space, and equations of motion for this

projection are developed. The functions to be projected on

typically have finite support, defining cells in conformation

space. For proteins, one can, for instance, use the number of

“correct bonds” or native amino acid conformations,12,13 dihedral

angles, or amino acid contacts. The master equation then

describes the dynamics of the populations in each of the

respective states.

Master equations can be exact representations of the system;

for instance, of lattice models in protein folding.14 Here, we

are concerned with systems in which they provide, at best, good

approximations. With the implicit assumption of Markovian (i.e.,

memoryless) dynamics, the resulting approximate master equa-

tions can be cast in the form of kinetic rate equations (see eqs

1 and 2, below). However, other representations are possible.

For instance, the Fokker-Planck equation for diffusion dis-

cretized in space also takes on the form of a master equation.15

Accurate diffusion models can thus be obtained from simulation

trajectories by first fitting a master equation.16,17

MD simulations provide a means to construct approximate

master equations that accurately incorporate detailed molecular

information. By projecting the simulation trajectories onto the

states of the master equation model, one can effectively

parametrize the equations of motion in the coarse space. Similar

projection procedures have been used in coarse molecular

dynamics.18

A common assumption is that the dynamics within the space

of coarse coordinates is, indeed, Markovian. In practice, however, this is

unlikely to hold. Unless one uses a very large coarse

† Part of the “Attila Szabo Festschrift”.

* To whom correspondence should be addressed. E-mail: gerhard.hummer@nih.gov.

‡ Current address: School of Physics, University College Dublin, Belfield, Dublin 4, Ireland.


J. Phys. Chem. B 2008, 112, 6057-6069

10.1021/jp0761665 CCC: $40.75© 2008 American Chemical Society

Published on Web 01/31/2008


space that includes coordinates not only for the solute (here, a

protein) but also the solvent,19,20 the neglected degrees of

freedom will not simply average out but lead to non-Markovian

dynamics, at least at short times.

Finding optimal methods to partition the conformation space

of a biomolecule into discrete states remains a central problem

in building coarse models.4,5,21-29 A number of methods have

been developed for this purpose,30-32 but the challenge remains

to avoid arbitrary decisions in state assignments and to develop

fully automated and unbiased state partitioning algorithms.33 In

this work, we are not concerned with identifying optimal coarse

states of a molecular system. Instead, we explore if one can

construct useful master equations given an assignment in which

states are reasonably well identified, but not well enough that

the dynamics is truly Markovian.

The focus here is on three main aspects: how can one

construct master equation models that accurately capture the

dynamics over a broad range of time scales? How sensitive are

the resulting models to details in their construction, in particular,

the assignment of conformations to coarse states? And how can

one analyze the resulting master equation models to extract

useful minimal descriptions of the molecular motions?

The paper is organized as follows. We will first introduce

relevant aspects of the master equation formalism, including

analytic expressions for propagators and correlation functions.

We will then describe methods to construct master equations

from simulation data. To illustrate and test this formalism, we

will study the helix-coil transition of a blocked Ala5 pentapep-

tide in water. Short polyalanine peptides have been studied

extensively using both experimental and theoretical methods.34-47

Ala5 is small enough to permit an exhaustive investigation with

MD simulations. Nevertheless, it is complex enough to require

2^5 = 32 conformational states, even if one uses only “helix”

and “coil” designations for each of the five amino acids. To

construct master equations for the populations of the 32 states,

we perform MD simulations of Ala5 in water covering 1 µs

each at 250, 300, and 350 K. To assign the conformational states,

we will use two different methods: one using instantaneous

dihedral angles and the other transition paths. From the resulting

trajectories in the space of the 32 states, we will construct master

equations first using lifetimes and branching probabilities and

then using a maximum likelihood propagator-based (MLPB)

method that better suppresses the problems arising from fast

non-Markovian dynamics. The validity of the resulting master

equations will be assessed by comparing correlation functions

from actual simulation trajectories with the analytical predictions

of the master equation models. By analyzing the eigenstates of

the master equation models, we show that the slow conforma-

tional dynamics can be described roughly by using two states

(folded/helix and unfolded/coil), and accurately by using four

states. This additional coarse-graining allows us to explore the

mechanisms of the helix-coil transition in the Ala5 system.

2. Theory

2.1. Master Equation. To construct a master equation for

the equilibrium conformational dynamics of the peptide, we

follow the approach of ref 10. We divide the peptide configu-

ration space into N non-overlapping cells. As shown by

Zwanzig,11 the populations in the cells evolve according to a

generalized, integro-differential master equation. If the dynamics

of the populations is non-Markovian but with rapidly decaying

memory, we can describe the time evolution accurately in terms

of a simple master equation,

$$\dot{P}_i(t) = \sum_{j=1}^{N} \left[ k_{ij} P_j(t) - k_{ji} P_i(t) \right] \tag{1}$$

where $P_i(t)$ is the population of state $i$, $k_{ji} \ge 0$ is the rate of transitions from state $i$ to state $j$, and $\dot{P}_i \equiv dP_i/dt$. In vector-matrix notation, eq 1 becomes

$$\dot{\mathbf{P}}(t) = \mathbf{K}\mathbf{P}(t) \tag{2}$$

where the $N \times N$ rate matrix $\mathbf{K}$ has off-diagonal elements $k_{ji} \ge 0$ and diagonal elements $k_{ii} = -\sum_{j \ne i} k_{ji} < 0$. For a connected state space, we have a unique, stationary equilibrium distribution,

$$\mathbf{K}\mathbf{P}_{\mathrm{eq}} \equiv 0 \tag{3}$$

that we assume to be nonzero and normalized, $P_{\mathrm{eq}}(i) > 0$ and $\sum_{i=1}^{N} P_{\mathrm{eq}}(i) = 1$. By invoking the condition of detailed balance,

$$k_{ij} P_{\mathrm{eq}}(j) = k_{ji} P_{\mathrm{eq}}(i) \tag{4}$$

we define the elements of the symmetrized rate matrix as

$$k_{ij}^{\mathrm{sym}} = P_{\mathrm{eq}}^{-1/2}(i)\, k_{ij}\, P_{\mathrm{eq}}^{1/2}(j) = (k_{ij} k_{ji})^{1/2} \tag{5}$$

or in matrix notation,

$$\mathbf{K}_{\mathrm{sym}} = \mathbf{P}_{\mathrm{eq}}^{-1/2}\, \mathbf{K}\, \mathbf{P}_{\mathrm{eq}}^{1/2} \tag{6}$$

with $\mathbf{P}_{\mathrm{eq}} = \mathrm{diag}[P_{\mathrm{eq}}(1), P_{\mathrm{eq}}(2), ..., P_{\mathrm{eq}}(N)]$.

2.2. Eigenvalues and Eigenvectors. The symmetrized rate matrix $\mathbf{K}_{\mathrm{sym}}$ has eigenvalues $\lambda_n$ with corresponding orthonormal eigenvectors $\phi_n$ given by

$$\mathbf{K}_{\mathrm{sym}} \phi_n = \lambda_n \phi_n \tag{7}$$

One of the eigenvalues is zero, and all others are real and negative. In the following, we assume that the eigenvalues are sorted by magnitude: $\lambda_1 = 0 > \lambda_2 \ge \lambda_3 \ge ... \ge \lambda_N$. The $\lambda_n$ are also the eigenvalues of the original, nonsymmetric rate matrix, with corresponding left and right eigenvectors $\psi_n^L$ and $\psi_n^R$, respectively, that satisfy

$$\mathbf{K} \psi_n^R = \lambda_n \psi_n^R, \qquad \psi_n^L \mathbf{K} = \lambda_n \psi_n^L \tag{8}$$

The $i$th elements of the left and right eigenvectors are related to the corresponding elements of the eigenvectors of the symmetrized rate matrix through

$$\psi_n^L(i) = P_{\mathrm{eq}}^{-1/2}(i)\, \phi_n(i), \qquad \psi_n^R(i) = P_{\mathrm{eq}}^{1/2}(i)\, \phi_n(i) \tag{9}$$

It follows from eq 3 that the first right eigenvector (for eigenvalue $\lambda_1 = 0$) of $\mathbf{K}$ is given by the equilibrium population, $\psi_1^R(i) = P_{\mathrm{eq}}(i)$. From eq 9, one then finds that $\phi_1(i) = P_{\mathrm{eq}}^{1/2}(i)$ and $\psi_1^L(i) = 1$. From the orthonormality of the $\phi_n$, one obtains the normalization conditions

6058 J. Phys. Chem. B, Vol. 112, No. 19, 2008

Buchete and Hummer


$$\sum_{i=1}^{N} \psi_n^L(i)\, \psi_m^L(i)\, P_{\mathrm{eq}}(i) = \sum_{i=1}^{N} \psi_n^R(i)\, \psi_m^R(i) / P_{\mathrm{eq}}(i) = \sum_{i=1}^{N} \psi_n^R(i)\, \psi_m^L(i) = \sum_{i=1}^{N} \phi_n(i)\, \phi_m(i) = \delta_{nm} \tag{10}$$

where $\delta_{nm}$ is the Kronecker delta. It follows that $\sum_{i=1}^{N} \psi_n^R(i) = \sum_{i=1}^{N} P_{\mathrm{eq}}(i)\, \psi_n^L(i) = \delta_{1n}$.

2.3. Propagators and Correlation Functions. The solution to the master eq 2 in terms of a matrix exponential is

$$\mathbf{P}(t) = \exp(\mathbf{K}t)\, \mathbf{P}(0) \tag{11}$$

By using the spectral decomposition, we can express the time-dependent populations as

$$\mathbf{P}(t) = \sum_{n=1}^{N} \psi_n^R \left[ \psi_n^L \cdot \mathbf{P}(0) \right] e^{\lambda_n t} \tag{12}$$

The projection of the initial state $\mathbf{P}(0)$ onto each of the left-hand eigenvectors $\psi_n^L$ thus determines the amplitude of the corresponding exponential phases with relaxation time $\tau_n = -1/\lambda_n$. The right-hand eigenvectors give the weight of the phase for each of the states. It follows that the propagators (or Green’s function), defined as the probability of being in state $j$ at time $t$, given that the system was in state $i$ at time 0, can be written as

$$p(j, t \mid i, 0) = \left[ e^{\mathbf{K}t} \right]_{ji} = \sum_{n=1}^{N} \psi_n^R(j)\, \psi_n^L(i)\, e^{\lambda_n t} \tag{13}$$

The rate matrix $\mathbf{K}$ is therefore the generator of the dynamics, and its matrix exponential defines a transition matrix for time $t$. Indeed, rather than using $\mathbf{K}$ itself, we could use the transition matrix $\exp(\mathbf{K}t)$ to propagate the system over discrete time intervals $t$.

Following the formalism of Bicout and Szabo,15 we express correlation functions in terms of eigenmodes. Specifically, we use eq 13 to calculate correlation functions in state space. If $s(t) \in \{1, 2, ..., N\}$ is the state of the system at time $t$, and $\mathbf{f}$ and $\mathbf{g}$ are vectors with elements $f[i]$ and $g[i]$, respectively, then the correlation function $\langle g[s(t)]\, f[s(0)] \rangle$ can be written as

$$\tilde{c}(t) = \langle g[s(t)]\, f[s(0)] \rangle = \lim_{T \to \infty} T^{-1} \int_0^T g[s(\tau + t)]\, f[s(\tau)]\, d\tau = \sum_{i,j=1}^{N} g[j]\, f[i]\, p(j, t \mid i, 0)\, P_{\mathrm{eq}}(i) = \sum_{n=1}^{N} e^{\lambda_n t} (\mathbf{g} \cdot \psi_n^R)(\mathbf{f} \cdot \psi_n^R) \tag{14}$$

The corresponding normalized correlation function $c(t) = [\tilde{c}(t) - \tilde{c}(\infty)] / [\tilde{c}(0) - \tilde{c}(\infty)]$ becomes

$$c(t) = \frac{\sum_{n=2}^{N} e^{\lambda_n t} (\mathbf{g} \cdot \psi_n^R)(\mathbf{f} \cdot \psi_n^R)}{\sum_{n=2}^{N} (\mathbf{g} \cdot \psi_n^R)(\mathbf{f} \cdot \psi_n^R)} \tag{15}$$

where we used that $\lambda_1 = 0$. If we project the trajectory in state space onto the left-hand eigenvectors, $\mathbf{f} = \psi_m^L$ and $\mathbf{g} = \psi_l^L$, and use their normalization eq 10, we obtain single-exponential decays (without explicit normalization!):

$$\tilde{C}_{lm}(t) = \langle \psi_m^L[s(t)]\, \psi_l^L[s(0)] \rangle = \delta_{lm}\, e^{\lambda_m t} \tag{16}$$

with amplitude 1 if $m = l$ and 0 if $m \ne l$. Note that for $n > 1$ the normalized and unnormalized auto-correlation functions are identical: $C_{nn}(t) = [\tilde{C}_{nn}(t) - \tilde{C}_{nn}(\infty)] / [\tilde{C}_{nn}(0) - \tilde{C}_{nn}(\infty)] = \tilde{C}_{nn}(t)$. Later, we will use eq 16 to validate the fitted master equation against actual simulation trajectories.

2.4. Coarse Graining of the Master Equation. In many practical cases, the number of states, $N$, will be large, and one would like to reduce that number to $M$ states by grouping the $N$ states into $M < N$ nonoverlapping classes. Such clustering will be particularly useful if a gap exists in the sorted eigenvalue spectrum beyond eigenvalue $M$; that is, $\lambda_2 \gtrsim \lambda_3 \gtrsim ... \gtrsim \lambda_M \gg \lambda_{M+1}$.

The spectral decompositions of the time-dependent populations and state-space correlation functions, eqs 12 and 14, are suggestive with respect to such further coarse graining. A grouping corresponds to finding a set of vectors $\mathbf{v}_m$ with elements $v_m(i) = 0$ or 1 and $\sum_{m=1}^{M} v_m(i) = 1$ that provide the support for class $m$ [i.e., if $v_m(i) = 1$, then state $i$ belongs to class $m$]. The unnormalized number correlation function for being in class $m$ is then $\tilde{c}_m(t) = \langle v_m[s(t)]\, v_m[s(0)] \rangle$. The corresponding normalized correlation function is $c_m(t) = [\tilde{c}_m(t) - \tilde{c}_m(\infty)] / [\tilde{c}_m(0) - \tilde{c}_m(\infty)]$. Following Bicout and Szabo,48 one can search for the grouping that makes the normalized number correlation function as single-exponential as possible for the slowest relaxation time. By using eq 14, the normalized number correlation function for class $m$ becomes

$$c_m(t) = \frac{\sum_{n=2}^{N} e^{\lambda_n t} (\mathbf{v}_m \cdot \psi_n^R)^2}{\sum_{n=2}^{N} (\mathbf{v}_m \cdot \psi_n^R)^2} = \frac{\sum_{n=2}^{N} e^{\lambda_n t} (\mathbf{v}_m \cdot \psi_n^R)^2}{f(1 - f)} \tag{17}$$

where $f = \mathbf{v}_m \cdot \mathbf{P}_{\mathrm{eq}}$ is the equilibrium population of class $m$. The “best” two-state representation can then be found by maximizing the amplitude of the slowest relaxation, $\lambda_2$ (note: $\lambda_1 = 0$), over the elements of the projection, $v_m(i) \in \{0, 1\}$:

$$\max_{v_m(i) \in \{0,1\}} \frac{(\mathbf{v}_m \cdot \psi_2^R)^2}{\mathbf{v}_m \cdot \mathbf{P}_{\mathrm{eq}}\, (1 - \mathbf{v}_m \cdot \mathbf{P}_{\mathrm{eq}})} \tag{18}$$

For continuous $v_m(i)$, the maximum would be $v_m(i) = \psi_2^L(i)$, which according to eq 16 would give a single exponential with the optimal amplitude of 1. For the discrete $v_m(i)$, ignoring the denominator, a large amplitude will be obtained if the sign is retained, $v_m(i) = 1$ for $\psi_2^L(i) > 0$, and $v_m(i) = 0$ for $\psi_2^L(i) \le 0$. [We note that $\psi_n^R(i)$ and $\psi_n^L(i)$ have the same sign, as follows from eq 9.]

To cluster the system into three states, one would maximize the amplitudes for the slowest two relaxation processes corresponding to eigenvalues $\lambda_2$ and $\lambda_3$. Following arguments as in the two-state case, using signs of the eigenvectors leads to an approximate solution. After the appropriate sign change, $\psi_2^L(i)$ and $\psi_3^L(i)$ are both positive, negative and positive, or positive and negative, respectively. Such grouping of the states according to the sign structure of the left-hand eigenvectors is, indeed, the result of a careful analysis using Perron clustering theory.49 To minimize the effect of numerical noise, in a robust version of Perron clustering,31,32,50 one can alternatively group the states on the basis of their distance in the projection onto the $\psi_n^L$ instead of using the sign structure. We note that related methods have been used in dimensionality reduction.51,52
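As an illustration of eqs 6-9 and of the sign-based two-state grouping, the following minimal sketch computes the spectral quantities with NumPy and evaluates the objective of eq 18. It is not the code used in this work: the function names and the four-state toy rate matrix (two rapidly interconverting pairs joined by one slow step, with uniform equilibrium populations) are assumptions introduced here for demonstration only.

```python
import numpy as np

def spectral_decomposition(k, peq):
    """Eigenvalues and left/right eigenvectors of K via the symmetrized matrix."""
    k = np.asarray(k, dtype=float)
    root = np.sqrt(np.asarray(peq, dtype=float))
    # eq 6: (K_sym)_ij = Peq(i)^(-1/2) * k_ij * Peq(j)^(1/2)
    k_sym = k * root[None, :] / root[:, None]
    k_sym = 0.5 * (k_sym + k_sym.T)        # remove round-off asymmetry
    lam, phi = np.linalg.eigh(k_sym)
    order = np.argsort(lam)[::-1]          # lambda_1 = 0 first, then decreasing
    lam, phi = lam[order], phi[:, order]
    psi_l = phi / root[:, None]            # eq 9; column n is eigenvector n
    psi_r = phi * root[:, None]
    return lam, psi_l, psi_r

def two_state_grouping(k, peq):
    """Group states by the sign of psi_2^L and report the eq 18 amplitude."""
    peq = np.asarray(peq, dtype=float)
    lam, psi_l, psi_r = spectral_decomposition(k, peq)
    v = (psi_l[:, 1] > 0).astype(float)    # v(i) = 1 where psi_2^L(i) > 0
    f = v @ peq                            # equilibrium population of the class
    amplitude = (v @ psi_r[:, 1]) ** 2 / (f * (1.0 - f))
    return v, amplitude, -1.0 / lam[1]     # also return tau_2 = -1/lambda_2

# Hypothetical toy model (arbitrary time units): columns sum to zero.
K_EXAMPLE = np.array([
    [-10.0,  10.0,   0.0,   0.0],
    [ 10.0, -10.1,   0.1,   0.0],
    [  0.0,   0.1, -10.1,  10.0],
    [  0.0,   0.0,  10.0, -10.0],
])
PEQ_EXAMPLE = np.full(4, 0.25)

v, amplitude, tau2 = two_state_grouping(K_EXAMPLE, PEQ_EXAMPLE)
print("class membership:", v)
print("amplitude of slowest mode:", amplitude)
print("slowest relaxation time:", tau2)
```

In this toy model the two fast pairs end up in different classes, and the amplitude is close to 1, as expected when a clear spectral gap follows λ2. Because the overall sign of an eigenvector is arbitrary, which of the two classes receives the label 1 is not meaningful.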

In a variant of the above approach, one could group the states according to their splitting or commitment probability. The splitting probability, $\sigma(i)$, of a state $i$ is the probability of reaching a certain state (or group of states) before reaching another state (or group) when starting from state $i$.53 Transition states can be identified as states with a splitting probability near 1/2 with respect to, say, fully folded and completely unfolded states of a protein.54-58 For a two-state-like system, with a spectral gap after $\lambda_2$, Berezhkovskii and Szabo59 showed that $\sigma(i)$ is accurately approximated by the left-hand eigenvectors, shifted and scaled to the interval [0, 1],

$$\sigma_n(i) = \frac{\psi_n^L(i) - \min_j \psi_n^L(j)}{\max_j \psi_n^L(j) - \min_j \psi_n^L(j)} \tag{19}$$

for $n = 2$. This relation suggests a grouping of the states in which those with $\sigma_2(i) < 1/2$ belong to one class, and those with $\sigma_2(i) \ge 1/2$, to the other. In practice, this grouping produces results very similar to those based on the sign of $\psi_2^L(i)$. The extension to multiple states involves calculating the $\sigma_n(i)$ also for $n \ge 3$.

Using the same idea, we can also calculate the exact splitting probability, $F_i$, between two states representing the two groupings. Here, those two states will be the completely unfolded state, 1, and the fully folded state, $N = 32$. The vector $\mathbf{F}$ then satisfies the linear equation

$$\mathbf{F}\mathbf{K} = 0 \tag{20}$$

with “boundary conditions” $F_1 = 0$ and $F_N = 1$. Again, states would be grouped according to whether $F_i$ is greater or less than 1/2.

3. Constructing Coarse Master Equations from Simulation Trajectories

We are here interested in molecular systems in solution, such that in general, one cannot calculate the elements of the transition rate matrix easily using transition state theory.60 Instead, we will assume that transitions have been observed in equilibrium simulations (possibly of short duration, but appropriately initialized10). However, if some of the transitions are too slow for direct sampling, one could augment the direct simulations with transition-path sampling calculations.61-63

In the following, we will describe two methods to construct coarse master equations from simulation data: a simple, yet potentially less accurate lifetime-based (LB) method and a more involved maximum-likelihood (or Bayesian) propagator-based method. In both methods, we assume that we have assigned the conformations at discrete times, $t_\alpha$, along the trajectory to states $s(t_\alpha) \in \{1, 2, ..., N\}$. We note, however, that the proper assignment of states itself poses a major challenge. As will be discussed in detail below, a poor assignment manifests itself in strongly non-Markovian dynamics, in which changes in the assigned state often do not reflect actual transition events.

3.1. Lifetime-Based Estimate of the Rate Matrix. We identify transitions as times $t_\alpha$ at which the assigned state changes, $s(t_\alpha) \ne s(t_{\alpha-1})$. Let $N_{ji}$ be the number of $i \to j$ transitions. Detailed balance, and microscopic time reversibility, require that for a long trajectory, $N_{ij} = N_{ji}$. To enforce detailed balance, we symmetrize the matrix of transitions: $N_{ij}^{\mathrm{sym}} = N_{ij} + N_{ji}$. The branching probability of going from state $i$ directly to state $j$ is $\Pi_{ji} = N_{ji}^{\mathrm{sym}} / \sum_{l=1}^{N} N_{li}^{\mathrm{sym}}$. $T_i$ is the average time the system spends in state $i$, as estimated from the mean duration of all visits to $i$. We can then define the corresponding transition rate as

$$k_{ji} = \frac{\Pi_{ji}}{T_i} = \frac{N_{ji}^{\mathrm{sym}}}{T_i \sum_{l=1}^{N} N_{li}^{\mathrm{sym}}} \tag{21}$$

for $i \ne j$ and

$$k_{ii} = -T_i^{-1} = -\sum_{j=1\,(j \ne i)}^{N} k_{ji} \tag{22}$$

These relations would be exact if the trajectory were, indeed, generated by a kinetic rate equation.

As we will show, even though the LB method is simple and easy to implement when long equilibrium trajectories are available, it is sensitive to the assignment of conformational states. With the LB approach, one needs to estimate accurate lifetimes, $T_i$, as well as accurate branching probabilities, $\Pi_{ij}$, both of which are difficult to obtain if transitions are not always properly identified.

3.2. Maximum-Likelihood Propagator-Based Method. An alternative method for extracting the rate matrix $\mathbf{K}$ uses estimates of the propagators $p(j, t \mid i, 0)$ obtained from the simulation trajectories.10 The elements of the rate matrix are determined either by maximizing the likelihood or by Bayesian inference. Likelihood-based and Bayesian approaches proved useful also in the analysis of single-molecule experiments64,65 and nonlinear dynamical systems.66,67

For a Markovian trajectory $s(t_\alpha) \in \{1, 2, ..., N\}$ in the space of the $N$ states, the probabilities of successive transitions are independent. Given a rate matrix $\mathbf{K}$, the likelihood of a trajectory is the product of propagators, as defined in eq 13,

$$L = \prod_{t_\alpha}^{T_{\mathrm{tot}} - \Delta t} p[s(t_\alpha + \Delta t), \Delta t \mid s(t_\alpha), 0] \tag{23}$$

where $\Delta t = t_{\alpha+1} - t_\alpha$ is the time interval (or lag time) used to calculate the propagator, and $T_{\mathrm{tot}}$ is the total simulation time. Here, we will use lag times between 1 ps and ∼1 ns. If $N_{ji} = N_{ji}(\Delta t)$ is the number of times the trajectory is in state $i$ at time $t$ and state $j$ at time $t + \Delta t$, then the likelihood can also be written as a product over states instead of times:

$$L = \prod_{i=1}^{N} \prod_{j=1}^{N} [p(j, \Delta t \mid i, 0)]^{N_{ji}} \tag{24}$$

To find the optimal transition rates, $k_{ij}$, given a simulation trajectory, we maximize the likelihood function, $L$. Alternatively, we could use Bayesian inference with an appropriate prior.10 Because of detailed balance, not all $k_{ij}$ are independent. In particular, by using eq 4, we can write the rate matrix in terms of $N - 1$ independent equilibrium probabilities $P_{\mathrm{eq}}(i)$ and the $N(N - 1)/2$ elements above the diagonal, or a total of $(N + 2)(N - 1)/2$ independent and positive elements. The number of parameters is further reduced if not all states are connected. Here, we will assume that transitions can occur only between neighboring states. Correspondingly, $k_{ij} = 0$ if $i - 1$ and $j - 1$ differ by more than one bit in binary notation. For $N = 2^5 = 32$ states, that reduces the number of free parameters from 527 to 111. Moreover, to improve the statistics for large lag times $\Delta t$, we will collect propagators using sliding windows. Strictly speaking, since those windows overlap, the likelihood in eq 23 would not factorize, but that will be ignored.
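The parameter counts quoted above can be checked with a few lines of Python. This is only a bookkeeping sketch; the function names are our own and are not part of the original analysis.

```python
def are_neighbors(i, j):
    """States i, j in 1..N are neighbors if i-1 and j-1 differ in exactly one bit."""
    return bin((i - 1) ^ (j - 1)).count("1") == 1

def count_parameters(n_residues):
    """Return (fully connected count, neighbor-only count) of free parameters."""
    n = 2 ** n_residues
    # N - 1 equilibrium probabilities plus N(N - 1)/2 rates above the diagonal
    full = (n + 2) * (n - 1) // 2
    # With neighbor-only transitions, keep one rate per connected pair i < j
    pairs = sum(
        1
        for i in range(1, n + 1)
        for j in range(i + 1, n + 1)
        if are_neighbors(i, j)
    )
    return full, (n - 1) + pairs

print(count_parameters(5))  # (527, 111)
```

For five residues, the 80 connected pairs are the edges of a five-dimensional hypercube, which together with the 31 independent equilibrium probabilities gives the 111 parameters.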

The procedure then is to first assign the conformations along

the simulation trajectories to states. From the resulting time

series in state space, s(tα), we calculate the matrix Nji of the

number of transitions from state i to j after a fixed lag time ∆t.

We then find the parameters of the “optimal” rate matrix by

maximizing the likelihood, L. This is done by minimizing the

negative logarithm of the likelihood function, -ln L. Specifi-

cally, we perform simulated annealing using a Metropolis Monte

Carlo algorithm with parameters ln Peq(i) and ln kij (j > i).10 To

evaluate the likelihood for a given rate matrix, we use eq 13

with eigenvectors and eigenvalues obtained by diagonalizing

the symmetrized rate matrix, eq 5, and then using eqs 7 and 9.
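The likelihood evaluation described above can be sketched as follows. The code computes the propagator of eq 13 from the eigen-decomposition of the symmetrized rate matrix and returns -ln L of eq 24 for a given count matrix. It covers only the objective function; the simulated-annealing search is not shown. The function names and the two-state example rates are assumptions made for illustration.

```python
import numpy as np

def propagator(k, peq, lag_time):
    """p[j, i] = p(j, lag_time | i, 0), eq 13, via the symmetrized rate matrix."""
    k = np.asarray(k, dtype=float)
    root = np.sqrt(np.asarray(peq, dtype=float))
    k_sym = k * root[None, :] / root[:, None]      # eq 6
    lam, phi = np.linalg.eigh(0.5 * (k_sym + k_sym.T))
    psi_r = phi * root[:, None]                    # eq 9
    psi_l = phi / root[:, None]
    # sum_n psi_n^R(j) psi_n^L(i) exp(lambda_n * lag_time)
    return (psi_r * np.exp(lam * lag_time)) @ psi_l.T

def neg_log_likelihood(k, peq, counts, lag_time):
    """-ln L of eq 24, with counts[j, i] = N_ji at the given lag time."""
    p = np.maximum(propagator(k, peq, lag_time), 1e-300)  # guard against log(0)
    return -np.sum(np.asarray(counts) * np.log(p))

# Hypothetical two-state example: rate 1.0 for 0 -> 1 and 2.0 for 1 -> 0.
K_TRUE = np.array([[-1.0, 2.0], [1.0, -2.0]])
PEQ_TRUE = np.array([2.0 / 3.0, 1.0 / 3.0])
LAG = 0.1

# Idealized counts proportional to the joint probability p(j, LAG | i, 0) Peq(i).
expected_counts = 1000.0 * propagator(K_TRUE, PEQ_TRUE, LAG) * PEQ_TRUE[None, :]

print(neg_log_likelihood(K_TRUE, PEQ_TRUE, expected_counts, LAG))
print(neg_log_likelihood(2.0 * K_TRUE, PEQ_TRUE, expected_counts, LAG))
```

With these idealized counts, the rate matrix that generated them gives a lower -ln L than the same matrix with all rates doubled, which is the behavior an optimizer would exploit.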

By comparing the results for different lag times, ∆t, we can

explore to what degree the dynamics is Markovian in the space

of the coarse states. If it were truly Markovian, then the rate

matrices for different lag times should agree within statistical

noise. We expect that the eigenvalue corresponding to the

slowest decaying mode will be most sensitive to non-Markovian

dynamics. In particular, incorrectly identified transitions between

states tend to speed up the overall relaxation. That effect is most

pronounced for short lag times, suggesting that the time scale

of the slowest relaxation will increase with the lag time if the

dynamics is not strictly Markovian in the state space used. We

note that if the lag-time dependence is large, indicating strongly

non-Markovian behavior, one can attempt to expand the state

space.

We note further that here $\Pi_{ji} = N_{ji}^{\mathrm{sym}} / \sum_{l=1}^{N} N_{li}^{\mathrm{sym}}$ approximates the propagator, $\Pi_{ji} \approx p(j, \Delta t \mid i, 0)$. Accordingly, if one is interested primarily in the long-time dynamics, one can use powers $\mathbf{\Pi}^k$ for $k = 1, 2$, etc., to propagate by $k\Delta t$ directly:4,5,68 $p(j, k\Delta t \mid i, 0) \approx [\mathbf{\Pi}^k]_{ji}$. We note, however, that if the dynamics is, indeed, generated by a rate matrix $\mathbf{K}$, then the resulting transition matrix $\mathbf{\Pi}$ should have eigenvalues $\mu_n = \exp(\Delta t\, \lambda_n)$, and eigenvectors identical to those of $\mathbf{K}$. In practice, that may not be the case. In particular, even if one uses the symmetrized $\mathbf{N}^{\mathrm{sym}}$ that satisfies detailed balance, the eigenvalues of the corresponding transition matrix $\mathbf{\Pi}$ can actually be negative. The reason is, in effect, that the diagonal and off-diagonal elements of $\mathbf{N}^{\mathrm{sym}}$ obtained as simulation data samples are not necessarily balanced. Eigenvalues that are negative with statistical significance indicate that the distributions of lifetimes $T_i$ in the states $i$ are nonexponential, which would suggest the presence of hidden states.
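The construction of the transition matrix from sliding-window counts, and the check for negative eigenvalues, can be sketched as follows. State indices are zero-based here, the helper names are our own, and the two short trajectories are artificial examples chosen so that the result can be verified by hand. The relaxation times use the relation µn = exp(Δt λn), so that τn = -Δt / ln µn.

```python
import numpy as np

def count_matrix(states, n_states, lag):
    """counts[j, i] = number of times the trajectory is in i at t and j at t + lag."""
    counts = np.zeros((n_states, n_states))
    for start, end in zip(states[:-lag], states[lag:]):   # sliding window
        counts[end, start] += 1.0
    return counts

def transition_matrix(counts):
    """Pi[j, i] = N_sym[j, i] / sum_l N_sym[l, i]; every state must be visited."""
    sym = counts + counts.T
    return sym / sym.sum(axis=0, keepdims=True)

def transition_eigenvalues(counts):
    """Eigenvalues of Pi, largest first, from a similar symmetric matrix."""
    sym = counts + counts.T
    col = sym.sum(axis=0)
    similar = sym / np.sqrt(np.outer(col, col))
    return np.linalg.eigvalsh(similar)[::-1]

def implied_timescales(mu, lag_time):
    """tau_n = -lag_time / ln(mu_n) for n >= 2; NaN where mu_n is not in (0, 1)."""
    slow = mu[1:]
    tau = np.full(slow.shape, np.nan)
    ok = (slow > 0.0) & (slow < 1.0)
    tau[ok] = -lag_time / np.log(slow[ok])
    return tau

traj = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
counts = count_matrix(traj, 2, lag=1)
mu = transition_eigenvalues(counts)
print(transition_matrix(counts))
print(mu, implied_timescales(mu, lag_time=1.0))

# A strictly alternating trajectory has no exponential lifetimes and
# produces a negative eigenvalue, so no relaxation time is defined.
alternating = [0, 1] * 5
mu_alt = transition_eigenvalues(count_matrix(alternating, 2, lag=1))
print(mu_alt, implied_timescales(mu_alt, lag_time=1.0))
```

Repeating the calculation for several lag times and comparing the resulting relaxation times is one simple way to probe the lag-time dependence discussed above.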

4. Molecular Dynamics Simulations

We used the GROMACS 3.3 package69,70 to run molecular dynamics simulations of blocked alanine pentapeptide (Ala5) with explicit solvent, using the AMBER-GSS force field46 ported to GROMACS.71 This force field is essentially identical to the AMBER-94 implementation (ref 72) with peptide (Φ, Ψ) torsional potentials removed to reproduce experimental helix-coil equilibria73 and no scaling of 1-4 van-der-Waals interactions.71 Ala5 was capped with an ACE group (CH3CO-) at the N terminus, and an NME group (-NHCH3) at the C terminus.72 We used periodic boundary conditions and the particle mesh Ewald (PME) method74,75 with a real-space cutoff distance of 10 Å and a grid width smaller than 1 Å.75 All simulations were performed in the NPT ensemble with a coupling coefficient of 5 ps.76,70 Two-femtosecond time steps were used in conjunction with constrained bonds of hydrogen atoms.77,78 Structures were saved at intervals of 1 ps. The simulation box contained 1050 explicit TIP3P water molecules,79 the total system size being 3212 atoms. The results presented in this paper were obtained by analyzing four trajectories of 250 ns each at each of three different temperatures (250, 300, and 350 K), for a total combined simulation time of 3 µs, or 1 µs at each temperature. The trajectories were initiated from different peptide conformations: fully folded, completely unfolded, and two intermediate states (11111, 00000, 10101, and 01010 in the binary notation described below). At each temperature, it was verified that the results of the four 250-ns trajectories did not differ significantly.

4.1. Coarse-Grained Conformation Space. Here, we are

not concerned with identifying the best method for coarse

graining the conformational space of a molecular system.

Although no general recipe is available for biomolecular

systems, some guidelines have been proposed recently for the

case of small peptides.31,33-80 Instead, we will demonstrate how

one can construct useful master equations given a typical coarse-

graining procedure that separates states reasonably well, but not

necessarily well enough that the assumption of Markovian

dynamics embodied in the master equation is fully justified.

We adopt a description of conformational states based on

the (Φ, Ψ) backbone dihedral angles18,31 of each residue. Figure

1 shows the Ramachandran free energy profile in the (Φ, Ψ)

plane averaged over the five residues.

As illustrated in Figure 1, we separate the conformations of

individual amino acids into helical and nonhelical states (denoted

by 1 and 0, respectively). The helical states are dominated by

a minimum near Φ ≈ Ψ ≈ -50°. The nonhelical (or coil40)

states are dominated by polyproline-II- and β-strand-like

conformations, with minima near Ψ ≈ 120° and Φ ≈ -70°

and -120°, respectively. Other conformations (in particular, the

left-handed helical configuration in the upper right-hand quad-

rant of Figure 1) are short-lived and will be grouped together

with these extended configurations into the coil state.

Figure 1. Ramachandran free energy surface for Ala5 from four 250-ns simulations using the Amber-GSS force field.46 The assignments of the conformational states are illustrated by the green rectangle (CBA, purely conformation-based assignment) and by the circular boundaries (white for TBA15 and black for TBA75).

We denote conformations of the Ala5 peptide in a five-digit binary notation, starting with the N-terminal residue on the left. For instance, “01011” indicates that the first and third residues are in the coil state, with the second, fourth, and fifth residue in the helical state. Occasionally, we will also use a decimal notation, with states between 1 and 32 defined as their binary coding plus 1 (i.e., “01011” would then be state 12, with 1 the completely unfolded state, and 32 the fully helical state “11111”).
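The mapping between the binary and decimal notations can be written compactly in Python. The two helper names below are our own and serve only to illustrate the convention.

```python
def state_index(code):
    """Decimal state number (1..2^n) of a binary helix/coil string such as '01011'."""
    return int(code, 2) + 1

def state_code(index, n_residues=5):
    """Binary helix/coil string of a decimal state number, N-terminal residue first."""
    return format(index - 1, "0{}b".format(n_residues))

print(state_index("01011"))  # 12
print(state_code(32))        # 11111
```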

4.2. Assigning Peptide Conformations to States. To assign

peptide conformations along the simulation trajectories to states,

we will explore two different procedures. In the conventional

“conformation-based assignment” (CBA), we define a rectan-

gular region in the Ramachandran plane that covers the helical

minimum (Figure 1). If the instantaneous dihedral angle pair is

within that region, the state is helical (“1”); if it is outside, the

state is “0”.

CBA faces the problem of fast recrossings in and out of the

rectangular region. As shown by Bolhuis et al.19 and confirmed

by Ma and Dinner,20 the (Φ, Ψ) backbone dihedral angles alone

are poor reaction coordinates for the “helix-to-coil” transition

of the peptide backbone, despite being useful as coordinates in

diffusion models.16,18 As a consequence, the actual slow

transitions between helical and coil states will likely be masked

by fast “recrossings” (caused by misassigned states) that do not

correspond to actual transition events.

To address this problem of assigning conformations to states

in the absence of a good reaction coordinate, we employ ideas

from transition path sampling.63,81 We assume that we have a

decent “order parameter” [here, (Φ, Ψ)] that allows us to assign

conformations deep inside the helical and coil regions, respec-

tively, and performs poorly only in the intermediate “transition”

regions. With some confidence, we can then identify transition

paths as those trajectory segments that connect without recross-

ing between the unambiguously helical and coil regions (i.e.,

they go from one region directly to the other).63 Realizing that

equilibration within the states is fast, it is advantageous to define

the two regions relatively stringently about the respective free

energy minima in order to identify proper transition paths. Even

though trajectories will then frequently leave the respective

regions, equilibrium excursions that do not amount to actual

transitions will rapidly return, and only actual transitions will

cross over from one region to the other.

In the resulting “transition-based assignment” (TBA), we

define two circular regions (of radius R in degrees) located around

the dominant minima in the (Φ, Ψ) Ramachandran plane (Figure

1). First, transition paths are identified that connect the two

circular regions. The segments between the transition paths are

then assigned to the respective states (helix or coil). The first

half of the transition path is assigned to the initial state; the

second half, to the final state. For larger radii R, transitions

become more frequent.
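The TBA steps described above can be sketched for a single residue as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the circle centers are passed in by the caller, a frame inside both circles (possible only for large R) is arbitrarily counted as helical, and for a transition path with an odd number of frames the extra frame goes to the final state.

```python
import math


def core_label(phi, psi, helix_center, coil_center, radius):
    """Return "H" or "C" if (phi, psi) lies within `radius` degrees of the
    corresponding center, else None. Angle differences are wrapped to
    [-180, 180) to respect the periodicity of the dihedral angles."""
    def dist(center):
        dphi = (phi - center[0] + 180.0) % 360.0 - 180.0
        dpsi = (psi - center[1] + 180.0) % 360.0 - 180.0
        return math.hypot(dphi, dpsi)
    if dist(helix_center) <= radius:
        return "H"
    if dist(coil_center) <= radius:
        return "C"
    return None


def tba_assign(core):
    """Assign every frame to "H" or "C" from per-frame core labels
    ("H", "C", or None for frames outside both circles)."""
    visits = [i for i, lab in enumerate(core) if lab is not None]
    if not visits:
        raise ValueError("trajectory never enters either core region")
    states = list(core)
    # Frames before the first and after the last core visit take that label.
    for i in range(visits[0]):
        states[i] = core[visits[0]]
    for i in range(visits[-1] + 1, len(core)):
        states[i] = core[visits[-1]]
    for a, b in zip(visits, visits[1:]):
        gap = range(a + 1, b)
        if core[a] == core[b]:
            # Excursion that returned to the same region: no transition.
            for i in gap:
                states[i] = core[a]
        else:
            # Transition path: first half to the initial state,
            # second half to the final state.
            half = len(gap) // 2
            for k, i in enumerate(gap):
                states[i] = core[a] if k < half else core[b]
    return states
```

For example, the label sequence `["H", None, None, "H", None, None, None, None, "C", None, "C"]` contains one excursion and one transition path, and only the latter changes the assigned state.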

Figure 2a illustrates the effect of conformational assignment

for a time series of the Ψ dihedral angle. In states assigned

using CBA, the trajectory recrosses frequently between states

with positive and negative Ψ. For TBA with a stringent

condition R = 15°, most of the fast recrossings are identified

as equilibrium excursions from the minima, resulting in less

frequent crossings between states. With a less stringent

condition, R = 75°, the TBA state trajectory shows slightly more

frequent recrossings.

By using the TBA method to assign states, we hope to

eliminate much of the fast non-Markovian dynamics associated

with a poor choice of reaction coordinate. As we will show

below by comparing the results for CBA and TBA with more

or less stringent criteria, this is, indeed, the case.

The effect of the different methods to assign states is apparent

in the average lifetimes, Ti, in each of the 32 states (Figure 3a).

As expected, CBA based on the instantaneous conformation

leads to short lifetimes, likely reflecting non-Markovian recross-

ing events. Assigning states on the basis of transition paths

results in significantly longer lifetimes. For the most long-lived fully helical state 11111, the lifetime varies by an order of magnitude, from ∼200 ps (CBA) to ∼2 ns (TBA15 with R = 15°). Using a less stringent criterion (TBA75 with R = 75°) leads to a somewhat shorter lifetime (∼1 ns for 11111). The effect of the assignment is also evident in the distributions of lifetimes (Figure 3b) averaged over all states. The CBA distributions are shifted toward short lifetimes in comparison to the TBA15 and TBA75 results, from which fast recrossings have been effectively eliminated.

Figure 2. State assignment using CBA and TBA methods. (a) Time series (symbols) of the Ψ dihedral angle, illustrating the different conformation assignments (red lines). Pure CBA assignment (bottom panel) using Ψ ranges (light and gray shaded regions) to separate helical (yellow) and coil states (green) is contrasted with TBA, which uses Ψ ranges (gray shaded) for the helical and coil regions with widths of 15° (top) and 75° (middle panel). In TBA, conformations (black) located between the coil and helical regions are classified according to the neighboring regions. The assignment changes only if an actual transition is detected. (b) A 250-ns MD trajectory mapped onto 32, 4, and 2 states (top to bottom).

Figure 3. Lifetimes in the 32 conformational states. (a) Average lifetimes Ti (log scale) at 300 K. (b) Distribution of lifetimes without distinction of state. Results are shown for state assignments using CBA (red), TBA15 (blue), and TBA75 (green). The corresponding total numbers of observed transition events are also shown.
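Average lifetimes of the kind shown in Figure 3a can be estimated from a discrete state trajectory along the following lines. This is a sketch with hypothetical function names; in particular, discarding the first and last visits (which are cut off by the trajectory ends) is one possible convention assumed here, not necessarily the one used for the figure.

```python
from collections import defaultdict


def dwell_times(states, dt=1.0):
    """Return {state: [dwell times]} for all completed visits.

    `states` is the sequence of assigned states, one per saved frame, and
    `dt` is the saving interval (for example 1.0 for 1 ps).
    """
    runs = []  # each entry is [state, number of consecutive frames]
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    out = defaultdict(list)
    # Skip the first and last runs, which are truncated by the trajectory.
    for s, n in runs[1:-1]:
        out[s].append(n * dt)
    return dict(out)


def mean_lifetimes(states, dt=1.0):
    """Average dwell time per state, in the units of `dt`."""
    return {s: sum(v) / len(v) for s, v in dwell_times(states, dt).items()}
```

Applied to CBA and TBA state trajectories of the same simulation, such an estimate makes the effect of fast recrossings on the apparent lifetimes directly visible.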

5. Results and Discussion

5.1. Rate Matrix from Lifetime-Based Method. By using

the lifetime-based method, we determined rate matrices from

the four 250-ns MD trajectories of solvated Ala5 peptides at

300 K. Figure 4 shows the extracted equilibrium probabilities

Peq(i) and relaxation times τn, both obtained from an eigenvalue

analysis. The error bars, corresponding to one estimated standard

error of the mean, were obtained from block averages.

As expected, we find that the equilibrium populations,

Peq(i), are essentially identical for assignments of states using

the CBA and TBA methods. However, the relaxation times

depend strongly on how peptide conformations are assigned to

states. In particular, the most slowly decaying mode has a

relaxation time of <1 ns for CBA; for TBA, that time varies

from ∼2 to ∼6 ns for less or more stringent assignments (R =

75° and 15°, respectively).

We will next use the MLPB method to construct rate matrices.

This allows us to vary the time lag, in an effort to avoid

problems from fast non-Markovian dynamics in the projections

onto the coarse states. As we will show, for long lag times (∆t

∼1 ns), all methods, including CBA, will converge, giving rate

matrices very similar to those obtained by TBA15 with lag times

as short as 1 ps.

5.2. Rate Matrix from Maximum-Likelihood Propagator-

Based Method. In constructing a master equation model, we

are ultimately interested in predicting and understanding the slow

molecular motions. If our assumption of a master equation, eq

1, were strictly valid, the resulting rate matrix would be

independent of the lag time (within statistical errors). However,

it seems reasonable to expect that at short times, such a model

may not fully capture the fast molecular motions. The goal here

is to properly extract the main characteristics of the slow

molecular motions without interference from fast motions,

because those are not captured by the master equation model.

Therefore, we need a procedure that uses trajectory data at long

time scales, where the dynamics should be sufficiently Mark-

ovian.

The MLPB method provides such a procedure by allowing us to use transition data with finite lag times. Its input is the number of times, Nji(∆t), that the system is found in state j given that it was in state i a time ∆t earlier, independent of the states visited at intervening times. The rate matrix is inferred by maximizing the likelihood of these data. Here, we use the fact that the propagators for the master equation can be calculated by matrix diagonalization, eq 13.
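The input data and the quantity being maximized can be sketched as follows. Two assumptions are made and should be checked against the method section: the counts are accumulated over all overlapping windows of the trajectory, and the log-likelihood has the multinomial form ln L = Σ Nji ln p(j, ∆t | i, 0). The function names are hypothetical, and the propagator is simply passed in rather than computed from a trial rate matrix.

```python
import math


def count_matrix(states, n_states, lag):
    """counts[j][i] = number of times the trajectory is in state j at frame
    t + lag given that it was in state i at frame t (1-based state labels)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(states) - lag):
        i = states[t] - 1
        j = states[t + lag] - 1
        counts[j][i] += 1
    return counts


def log_likelihood(counts, propagator):
    """Sum of N_ji * ln p(j, lag | i, 0) over all observed pairs.

    `propagator[j][i]` would be obtained from the trial rate matrix (eq 13
    in the text); entries with zero counts do not contribute.
    """
    total = 0.0
    for j, row in enumerate(counts):
        for i, n in enumerate(row):
            if n:
                total += n * math.log(propagator[j][i])
    return total
```

An optimizer would vary the rate matrix, recompute the propagator for the chosen lag time, and keep the rates that maximize this sum.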

We have estimated the 32 × 32 rate matrices for Ala5 for

lag times from 1 ps (the interval at which structures were saved) to

1 ns (on the order of the time for a single dihedral angle to

remain in the same state). Figure 5 shows values of the estimated

equilibrium populations at 300 K for two states [(a) Peq(11111)

and (b) Peq(11110)] as well as the slowest (τ2) and fastest (τ32)

relaxation times. The rate matrices were extracted from the four

250-ns trajectories at 300 K by using the MLPB method. For

comparison, we show results for assignments of states using

TBA15, TBA75, and CBA. For reference, we also include the

results for LB rate matrices.

We find that the equilibrium populations are independent of both lag time and conformation assignment. In contrast, the relaxation times depend strongly on the lag time for CBA, somewhat less so for TBA75, and relatively little for TBA15. At the shortest lag time, ∆t = 1 ps, the MLPB relaxation times are identical to those obtained by using the LB rate matrices for the corresponding conformation assignment. However, as the lag time increases, the slowest relaxation time τ2 increases for CBA from ∼0.5 ns at ∆t = 1 ps to ∼6.5 ns at ∆t = 1 ns. Over the same range, τ2 from TBA15 changes from ∼6 ns to ∼7 ns, with statistical errors of ∼1 ns.

Figure 6a shows the relaxation times, defined as the negative reciprocals τn = -1/λn of the eigenvalues of the rate matrix, as a function of n for different lag times and conformation assignments. For TBA15, we find that the relaxation times are practically independent of lag times between 1 and 300 ps. For CBA, all relaxation times increase as the lag time increases. TBA75 is intermediate, with a weak lag-time dependence. We find similar behavior for the lifetimes Ti in each of the 32 states (Figure 6b), as obtained from the rate matrix according to eq 22.
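The two quantities compared here, relaxation times τn = -1/λn and lifetimes Ti = -1/kii, can be illustrated on the simplest possible case. The sketch below assumes a rate matrix whose element k_ji is the rate from state i to state j, so that each column sums to zero; the function names and the numerical rates in the usage note are hypothetical.

```python
def lifetimes(rate_matrix):
    """T_i = -1 / k_ii for each state i (diagonal of the rate matrix)."""
    return [-1.0 / rate_matrix[i][i] for i in range(len(rate_matrix))]


def two_state_relaxation_time(k_21, k_12):
    """Relaxation time of a two-state master equation.

    With K = [[-k_21, k_12], [k_21, -k_12]], the eigenvalues are 0 and
    -(k_21 + k_12), so tau_2 = -1 / lambda_2 = 1 / (k_21 + k_12).
    """
    return 1.0 / (k_21 + k_12)
```

For illustrative rates of 0.2 and 0.3 per ns, the lifetimes are 5 ns and about 3.3 ns, whereas the single relaxation time is 2 ns. The relaxation time is shorter than either lifetime because both forward and backward rates contribute to the decay toward equilibrium.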

From Figures 5 and 6, we conclude that to estimate the slow molecular dynamics properly, it is important to (1) extend the analysis to sufficiently large lag times where the assumption of Markovian dynamics is justified, and (2) assign conformations to states with care. The first point effectively rules out the direct LB estimate of the rate matrix, with the exception of a conformation assignment that strongly suppresses the effects of non-Markovian dynamics (here, TBA15). Assigning states on the basis of transition paths (TBA15) is clearly superior to assignment on the basis of the instantaneous conformations (CBA). But remarkably, at lag times ∆t ≈ 1 ns, the results for all propagator-based methods converge. This convergence implies that at the nanosecond time scale, the non-Markovian character is already sufficiently weak, even for the CBA state trajectories with their poor conformation assignment.

Figure 4. Populations and relaxation times from the LB master equation. (a) Equilibrium probability distribution Peq(i) for the 32 coarse states of Ala5 at 300 K. The inset shows the corresponding free energy vs the fraction of helical residues, illustrating the temperature dependence of the activation barrier between the 11111 (fully helical) and 00000 (completely unfolded) states. (b) Relaxation times estimated as τi = -1/λi by diagonalization of the LB rate matrices.

Nevertheless, estimating the fastest relaxation processes

accurately becomes increasingly difficult at large lag times ∆t.

Figure 5c shows that as ∆t exceeds 0.1 ns, one no longer obtains

a reliable estimate of τ32. The reason is that this fastest relaxation

is effectively at equilibrium for ∆t > τ32. With the TBA15 rate

matrix for ∆t = 10 ps, we achieve a good tradeoff

between accuracy at short and long times. In the following, we

will use that rate matrix as a reference.

5.3. Validation of the Master Equation: Time Correlation

Functions. As demonstrated by the results presented above, the

coarse master equation extracted from given simulation trajec-

tories depends sensitively on the way it was constructed. It is

therefore of paramount importance to validate the master

equation model against the simulation data. By construction,

the model is nearly guaranteed to reproduce the observed

equilibrium properties [here, Peq(i)]. A more challenging valida-

tion compares the actual simulation dynamics with that predicted

from the master equation.

If the simulation trajectories are projected onto the left-hand eigenvectors, $\psi_n^{L}$, of the rate matrix, the master equation model predicts, according to eq 16, that the corresponding autocorrelation functions should decay as single exponentials with relaxation times τn = -1/λn. The cross-correlation between projections onto two distinct modes, l and m, should be strictly zero.

We can thus use eq 16 to validate (1) the time scales of the different relaxation processes and (2) the character of the modes themselves, as captured in the weights $\psi_n^{L}(i)$ of state i in mode n. Normally, for any projection other than an eigenmode, the correlation function will be multiexponential with contributions from all N = 32 modes. Therefore, if we find single-exponential decay with the predicted relaxation times, we gain confidence not only in the temporal properties of the rate matrix but also in its “geometric” character, as reflected in its eigenmodes.

To calculate the correlation functions Clm(t) defined in eq 16 in practice, we use the following relation,

$$\tilde{C}_{lm}(t = n\,\delta t) = \frac{1}{N-n} \sum_{i=1}^{N-n} \psi_m^{L}(s_{i+n})\, \psi_l^{L}(s_i) \qquad (25)$$

for the discrete time series of states $s_i = s(t_i) \in \{1, 2, \ldots, 32\}$ obtained from the simulation trajectories, with $\psi_l^{L}(s_i)$ the $s_i$-th element of the vector $\psi_l^{L}$, $\delta t = t_{i+1} - t_i = 1$ ps, and N the number of saved structures. Note that the lag times ∆t are integer multiples of δt.

Figure 5. Populations and relaxation times from the MLPB method as a function of lag time, ∆t, and for different conformational assignments (CBA, TBA15, and TBA75). (a) Peq(11111) of the most populated, fully helical state. (b) Peq(11110) of the state with a nonhelical C-terminal residue. (c) Fastest extracted relaxation time, τ32 = -1/λ32. (d) Slowest extracted relaxation time, τ2 = -1/λ2 (note: λ1 = 0). The disconnected data points (on the far left) show the corresponding results for the LB method (see Figure 4).

Figure 6. Relaxation times and lifetimes from the MLPB master equation. (a) Relaxation times, τn, of the 31 decaying modes. (b) Lifetimes Ti = -1/kii of the 32 states. Results are shown for different lag times, ∆t, and conformational assignments (top to bottom: TBA15, TBA75, and CBA).

Figure 7 (top) shows the decay of the normalized autocorrelation functions Cnn(t) for n = 2, 3, and 4 using the eigenvectors of the reference rate matrix (i.e., for ∆t = 10 ps, TBA15). Note that C11(t) ≡ 1 is not shown. We find (1) that the correlation functions obtained for conformation assignments using CBA and TBA15 are nearly identical, and (2) that they agree very well with the exponential correlation functions predicted from the reference master equation model (τn from Figure 6a upper panel, ∆t = 10 ps). We note further that the non-Markovian “noise” in the CBA trajectories produces only a small amplitude. Also shown in Figure 7 (bottom) is the cross-correlation function $\tilde{C}_{23}(t)$, which, as predicted by eq 16, is practically zero within the noise level.
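The estimator of eq 25 translates almost line by line into code. The sketch below uses hypothetical function names and assumes that the left-hand eigenvectors are available as plain sequences indexed by state (element `s - 1` holding the weight of state `s`); computing the eigenvectors themselves is outside its scope.

```python
def mode_correlation(states, psi_l, psi_m, n):
    """Estimate the correlation of eq 25 at lag t = n * dt.

    states       : sequence of 1-based state indices s_1 .. s_N
    psi_l, psi_m : left-hand eigenvectors for modes l and m
    n            : lag in units of the saving interval
    """
    N = len(states)
    if not 0 <= n < N:
        raise ValueError("lag must satisfy 0 <= n < N")
    total = 0.0
    for i in range(N - n):
        total += psi_m[states[i + n] - 1] * psi_l[states[i] - 1]
    return total / (N - n)


def normalized_autocorrelation(states, psi, max_lag):
    """C_nn(t) / C_nn(0) for lags 0 .. max_lag."""
    c0 = mode_correlation(states, psi, psi, 0)
    return [mode_correlation(states, psi, psi, n) / c0
            for n in range(max_lag + 1)]
```

If the master equation is valid, the normalized autocorrelation for mode n should follow exp(-t/τn), and `mode_correlation` evaluated for two different modes should fluctuate around zero.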

The good agreement between the correlation functions

calculated using the relatively poor CBA method to assign states

and the substantially better TBA15 method may seem surprising

at first sight. However, the Cnn(t) correlation functions are

closely related to the number correlation functions of kinetics,

which can capture the correct slow population relaxation, even

if the projection onto the states is relatively poor. To illustrate

this point, Figure 8a schematically shows a 2D bistable free

energy surface. In a projection onto x, A′ states in the lower

right quadrant appear to be part of B, even though they actually

belong to the metastable state A. Similarly, B′ states in the

upper left quadrant appear as part of A in the projection. To

estimate the effects of this poor CBA-type state assignment, in

which A and B′ are grouped together, as well as B and A′, we

approximate the kinetics of the 2D system by the scheme in

Figure 8b. If the intrawell relaxation is fast compared to the

interwell relaxation and the population of misassigned states is

small (r ≫ s ≫ ?), then the slow phase of the normalized number

correlation function for grouped states A + B′ and B + A′,

respectively, has the exact relaxation rate, given by the first

nonzero eigenvalue of the rate matrix, -λ2 ≈ 2r?/(r + s). The

amplitude of this slow phase is approximately (r - s)^2/(r + s)^2

≈ 1 (i.e., the square of the relative difference in the populations

of A and A′). With r ≫ s and a correspondingly small relative

population of A′, the amplitude is close to one. Therefore, even

for CBA with its poor projection, one approximately recovers

the correct number correlation function, exp(λ2t). In contrast,

propagators sampled at very short lag times will be compromised

by the poor projection. In particular, for the Ala5 dynamics

analyzed with the MLPB method, frequent crossings similar to

those between A and B′, and B and A′ are responsible for the

much faster relaxations, τn, obtained from CBA with short lag

times ∆t < 200 ps (Figure 5d).

In Figure 9a, we compare the relaxation times τn from the

reference master equation to the correlation times obtained for

the TBA15 projections Cnn(t) by (a) fitting an exponential

relaxation, and (b) integrating Cnn(t) from 0 to the first (noise-

induced) crossing of the zero-axis. Remarkably, we find that

the relaxation times of the master equation agree with those

from simulation over the entire spectrum n = 2 to 32. Although

one could certainly have expected that the master equation

accurately describes the slowest motions (τ2), it may seem

somewhat surprising that it also captures the fastest motions

(τ32) without “leakage” of slow motions into the predicted fast

modes. However, for the Ala5 peptide studied here, the

backbone dihedral angles indeed provide a fairly complete set

of coordinates to describe motions on time scales beyond a few

tens of picoseconds. Therefore, our set of 32 states appears to

be reasonably complete at that time scale.

Nevertheless, there is some indication of weak non-Markovian

dynamics, even beyond ∆t = 100 ps, because the slowest

relaxation time, τ2, gradually increases from ∼6 to ∼7 ns (Figure

5d). A similar value is, indeed, obtained from the exponential

fit to the simulation correlation function (Figure 9a and Table

1). However, the change in the relaxation time with the lag time

∆t in Figure 5d is within the statistical error, precluding a more

definite analysis.

Overall, we conclude from the agreement between the actual

and predicted correlation functions that the master equation

model accurately captures the conformational dynamics with

respect to both time scales and the character of its modes.

5.4. Temperature Dependence. To explore the temperature

dependence of the folding kinetics of Ala5, we analyzed

simulations at 250, 300, and 350 K. The inset of Figure 9a shows

the relaxation times obtained from the reference master equa-

tions on a double logarithmic scale. Note that in supercooled

solution (250 K) the conformational sampling is very slow,

resulting in poor statistics. Interestingly, the spectrum approximately

follows a power law, τn ∝ n^(-α), with an exponent

of α ≈ 1.1, with the exception of the first eigenvalue.
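One simple way to extract such an exponent is a straight-line fit on the double logarithmic scale. The text does not state how α was determined, so the following is only a sketch of that generic approach, with a hypothetical function name; excluding the slowest mode from the input is left to the caller.

```python
import math


def fit_power_law(ns, taus):
    """Least-squares line through (ln n, ln tau).

    Returns (alpha, prefactor) such that tau is approximately
    prefactor * n ** (-alpha).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in taus]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)
```

Calling `fit_power_law(range(3, 33), taus)` on the relaxation times for n = 3 to 32 would skip the slowest mode, which the text identifies as the exception to the power law.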

Figure 7. Mode correlation functions Cnm(t) from the MLPB master equation and MD simulation at 300 K. The top panel compares the autocorrelation functions Cnn(t) from the MLPB master equation (relaxation time for ∆t = 100 ps lag time; Figure 6a top) for modes n = 2 to 4 to the simulation results obtained using CBA (circles) and TBA15 (dashed lines). Note that the CBA and TBA give nearly identical correlation functions. The bottom panel compares C22(t) (red, circles) and the unnormalized $\tilde{C}_{23}(t)$ (black, crosses) from the CBA trajectory to the MLPB master equation predictions (lines).

Figure 8. Projected dynamics. (a) Contour lines of a schematic bistable free energy surface. In projection onto x, A and B′ states on the left (x < 0), and B and A′ states on the right (x > 0) are grouped together, respectively. (b) Kinetic scheme to explore effects of state misassignment.

Figure 9b shows that the different relaxation times τn obey an Arrhenius-like temperature dependence (eq 26). The activation energies, En, associated with the different exponential phases fall into different classes: E2 ≈ 30 kJ/mol and En ≈ 22 kJ/mol for 3 ≤ n ≤ 9. The larger value for E2 can be explained by the fact that the corresponding process involves the breaking and formation of backbone hydrogen bonds, in addition to changing the backbone dihedral angles.
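Activation energies of this kind follow from a linear fit of ln τn against 1/T. The sketch below assumes an ordinary least-squares fit over the available temperatures and uses the molar gas constant in place of kB so that the result is in kJ/mol; the function name and the synthetic numbers in the usage note are illustrative only.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def arrhenius_fit(temperatures, taus):
    """Fit ln(tau) = ln(A) + E / (R T) by least squares.

    Returns (E in kJ/mol, A in the units of tau).
    """
    xs = [1.0 / T for T in temperatures]
    ys = [math.log(tau) for tau in taus]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # equals E / R
    return slope * R_KJ_PER_MOL_K, math.exp(my - slope * mx)
```

As a consistency check, relaxation times generated from τ = A exp(E/RT) with E = 30 kJ/mol at 250, 300, and 350 K are mapped back to E = 30 kJ/mol by this fit. With only three temperatures, the statistical uncertainty of a fitted En from real data would need to be assessed separately.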

5.5. Conformational Clustering: Reduced Representation of the Master Equation. The presence of a significant gap in the relaxation spectrum (Figure 9a) after the first eigenvalue suggests that the system can be represented approximately by two states alone: folded (F) and unfolded (U). To assign each of the 32 microstates to U or F, we use (1) the sign of the elements of the left-hand eigenvector $\psi_2^{L}(i) = \phi_2(i)/\phi_1(i)$, (2) σ2(i) >/< 0.5 according to eq 19, and (3) Fi >/< 0.5 according to eq 20 with Fi = 0 for i = 1 (“00000”) and Fi = 1 for i = 32 (“11111”).

Figure 10a shows the resulting assignments of each of the 32 states. In particular, we find that states 11111 and 11110 consistently belong to the folded state F. In contrast, 01111 and 01110 with a nonhelical residue at the N terminus belong to the folded state according to σ2(i) and Fi but not according to the sign structure of $\psi_2^{L}(i) = \phi_2(i)/\phi_1(i)$. Figure 2b shows one of the resulting two-state trajectories.

This and other ambiguities raise the question of how useful the groupings into folded and unfolded states are. To address that question, we calculate the normalized number correlation function c(t) = 〈δn(t) δn(0)〉/〈δn^2〉, where δn(t) = n(t) - 〈n(t)〉, and n(t) is 1 if the system is in one of the folded states and 0 otherwise. As illustrated in Figure 8 and discussed above, the number correlation function can capture the correct slow population relaxation, even if the projection onto the states is poor. We can therefore use c(t) as a reference. Specifically, we fit the c(t) calculated from the MD data to a single exponential (beyond ∼1 ns). To assess the quality of the grouping, we compare the resulting relaxation times to that obtained for projecting onto the full 32-state system (∼7 ns; Figure 9).

Table 1 lists the amplitudes and relaxation times of the single-exponential fits to the number correlation functions for the three different two-state groupings. We find that the two clusterings according to φ2(i)/φ1(i) and σ2(i) >/< 0.5 produce consistent results, with relaxation times of ∼6.3 ns and amplitudes of ∼0.9 in the slow phase, as compared to a relaxation time of 7 ns and amplitude of 0.96 for the 32-state trajectory. In contrast, the grouping according to the “exact” splitting probabilities Fi produces a worse result, with a relaxation time of only 5.5 ns and an amplitude of 0.76. Moreover, when we actually fit a two-state model to the simulation data using the MLPB approach and a lag time of 100 ps, we obtain a relaxation time of only 3.8 ns (Figure 11). We conclude from these discrepancies that the two-state models built using the φ2(i)/φ1(i) and σ2(i) >/< 0.5 clusterings give an adequate, although not perfect, description of the kinetics.

Figure 9. Relaxation times for Ala5. (a) τn computed using the eigenvalues of the MLPB rate matrix (black stars) and from the simulation autocorrelation functions Cnn(t) via exponential fits (green circles) and time integration of Cnn(t) (blue triangles). The inset shows data for T = 250 K (red), 300 K (green), and 350 K (blue) on a double logarithmic scale. (b) Arrhenius-like dependence of relaxation times on inverse temperature (data corresponding to the circles from the inset of part a).

$$\tau_n = A_n \exp(E_n / k_B T) \qquad (26)$$

TABLE 1: Relaxation Times, τn, for the Full 32-State System and for the 2- and 4-State Reduced Representations^a

                       n     an      τn [ps]
32-State System
                       2     0.964   7030
                       3     0.997   1990
                       4     0.982   1490
                       5     0.967   1170
                       ...
                       32    0.997   104
2-State Projection
s(t) from φ2/φ1        2     0.862   6380
s(t) from σ2(i)        2     0.929   6280
s(t) from Fi           2     0.764   5560
4-State Projection
                       2     0.929   7110
                       3     0.996   1770
                       4     0.978   1360

^a The relaxation times, τn, were obtained from single-exponential fits [an exp(-t/τn)] to the normalized autocorrelation functions Cnn(t), excluding an initial fast relaxation from the fit. The Cnn(t) were obtained from the simulation trajectories by projecting them onto the $\psi_n^{L}$ according to eq 25. For the two-state system, the coarse trajectories s(t) were obtained by clustering states using the sign of φ2(i)/φ1(i) or the splitting probabilities σ2(i) and Fi, as indicated. The four-state clustering was performed as illustrated in Figure 10.

To improve the reduced model, we expand the state space to four states. The reason to skip the three-state model is that τ3 ≈ τ4 according to Figure 9a. To group the states, we project the 32 states onto $\psi_2^{L}(i) = \phi_2(i)/\phi_1(i)$, $\psi_3^{L}(i) = \phi_3(i)/\phi_1(i)$, and $\psi_4^{L}(i) = \phi_4(i)/\phi_1(i)$. As shown in Figure 10a and b, $\psi_2^{L}$ separates the folded from the unfolded states and corresponds to the slowest relaxation (7 ns). $\psi_3^{L}$ splits the folded state into two groups, but leaves the unfolded state intact, and $\psi_4^{L}$ splits the unfolded state into two groups without affecting the folded state. However, instead of grouping by the sign structure of the $\psi_n^{L}(i)$ (dashed zero axis in Figure 10b), we use the shifted and rescaled σ2(i), σ3(i), and σ4(i) to group the states (solid red lines in Figure 10b). We end up with four states: two folded states, F1 and F2, and two unfolded states, U1 and U2.

The configurations belonging to each of the four states are indicated by color in Figure 10a: F1 (purple), F2 (red), U1 (black), and U2 (blue). We find that the folded substates F1 and F2 jointly contain all states with at least three consecutive helical dihedral angles (“111”) within the four N-terminal residues and, thus, all structures with at least one (i, i + 4) α-helical hydrogen bond in that segment. The difference between F1 and F2 is that in F1, the N-terminal residue is in a coil state, whereas in F2, it is helical. (We note that this indicates slow relaxation at the N terminus and fast relaxation at the C terminus, consistent with earlier findings.45,82) Figure 10c shows representative molecular structures of folded (F1 and F2) and unfolded (U1 and U2) conformations. Hydrogen bonds are shown in green. Also shown are representative “transition states” (TS1 and TS2) with splitting probabilities close to 0.5. Figure 2b shows one of the trajectories projected onto the four states.
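The simpler of the two grouping rules, clustering by the sign structure of the left-hand eigenvectors, can be sketched as follows. Note that the grouping actually adopted in the text uses the shifted and rescaled σn(i) with a threshold of 0.5 (eq 19, not reproduced here), so this sketch illustrates only the sign-based alternative. The function names are hypothetical, and because the overall sign of an eigenvector is arbitrary, which sign is called "folded" or "F1" is an assumption of the example.

```python
def left_projection(phi_n, phi_1):
    """Left-hand eigenvector weights phi_n(i) / phi_1(i) for every state i."""
    return [a / b for a, b in zip(phi_n, phi_1)]


def four_state_labels(psi2, psi3, psi4):
    """Group states by the signs of the three slowest left-hand modes.

    psi2 separates folded from unfolded, psi3 splits the folded group,
    and psi4 splits the unfolded group. Positive psi2 is taken to mean
    "folded" here, which depends on the eigenvector sign convention.
    """
    labels = []
    for p2, p3, p4 in zip(psi2, psi3, psi4):
        if p2 > 0:
            labels.append("F1" if p3 > 0 else "F2")
        else:
            labels.append("U1" if p4 > 0 else "U2")
    return labels
```

Mapping each frame of the 32-state trajectory through these labels gives a four-state trajectory of the kind shown in Figure 2b.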

To validate the four-state model, we could calculate the

number correlation functions for each of the four states.

However, those will in general be multiexponential. Instead,

we construct a 4 × 4 rate matrix using the MLPB procedure

for the TBA15 trajectory projected onto the four states. This

rate matrix provides us with new, four-element left-hand

eigenvectors onto which we can project the simulation trajec-

tories. Ideally, the autocorrelation functions calculated according

to eqs 16 and 25 should be single-exponential.

Table 1 lists the amplitudes and relaxation times for the three decaying correlation functions obtained by exponential fits. If the four-state model captured the slow relaxations properly, the time constants would be the same as those corresponding to the first three nonzero eigenvalues of the 32-state model, and the amplitudes would be one. Indeed, we find excellent agreement. In particular, the slowest relaxation has a time constant of 7.1 ns, as compared to 7.0 ns for the 32-state system, and the amplitude is 0.93. We conclude that the projections onto the left-hand eigenvectors provide excellent (though not entirely unambiguous) criteria to group states into clusters.

Figure 10. Coarse graining of conformation space. (a) Eigenvector-based clustering of conformational states using the sign of φ2(i)/φ1(i) (top), shifted and rescaled σn(i) (middle panels, n = 2, 3, 4), and splitting probability Fi (bottom, see eq 20). (b) Clustering of states using φ3/φ1 (top) and φ4/φ1 (bottom) plotted vs φ2/φ1. The solid horizontal and vertical red lines indicate σn = 0.5 used in the four-state clustering. Clustering according to the sign structure would use the blue dashed lines, instead. (c) Representative molecular structures of folded (F1 and F2) and unfolded states (U1 and U2). (i, i + 4) hydrogen bonds are indicated in green. Transition states (TS1 and TS2) were selected from states with σ2(i) ≈ 0.5.

Figure 11. Transition rates (in units of µs^-1) for the 2-state (left) and 4-state reduced representations obtained by eigenvector-based clustering of the 32 conformational states of Ala5 at T = 300 K.

Figure 11 shows the kinetic rate coefficients of the two-state

and four-state models, both obtained from MLPB analysis.

Interestingly, the four-state model is fully connected. However,

the relaxations between U1 and U2 and between F1 and F2

are, indeed, faster than between U and F states, consistent with

the ordering of the eigenvalues associated with the different

relaxation processes.

In summary, we found that a two-state representation with a

folded and an unfolded state provides a good approximation to

the full dynamics. A four-state description, with two folded and

two unfolded states, recovers accurately the first three relaxation

processes.

6. Conclusions

We showed that a 32-state master equation could accurately

describe the conformational dynamics of blocked Ala5, a short

helix-forming peptide. Our detailed analysis demonstrates that

accurate master equations are obtained (1) if conformational

states are assigned by using transition paths (TBA) rather than

instantaneous conformations (CBA) and (2) if a propagator-

based method is used with long lag times (here, 10 ps to 1 ns).

To identify transition paths in the TBA method, using dihedral

angles is a reasonable choice, without the need for high-quality

reaction coordinates.63,81 To extract rates, propagator-based

methods are superior to direct, lifetime-based approaches

because they suppress the effects of fast non-Markovian

dynamics. Accordingly, even for trajectories with poorly as-

signed states, the propagator-based (MLPB) method leads to

systematically better estimates of the intrinsic transition rates.

Here, we used long equilibrium simulations (covering 3 × 1

µs) to parametrize the master equation. However, as we showed,

transition matrices for lag times as short as 1 ps can be used to

construct accurate master equation models, as long as transition-

path methods are used to assign states. Accordingly, one can

use either properly initialized short MD runs10 or replica-

exchange molecular dynamics runs73,83 with swapping intervals

>1 ps to obtain the input transition matrices. In the case of

replica-exchange simulations, states could be assigned using

TBA by following the trajectories in temperature space.84

As an important aspect, we showed that master equation

models can be explicitly validated by comparing predicted

correlation functions to those obtained directly from simulation.

Specifically, we projected the simulation trajectories onto the

left-hand eigenvectors of the rate matrix and found that the

resulting auto-correlation functions have single-exponential

decays with relaxation times as predicted from the eigenvalues

of the rate matrix. We also showed that the cross-correlations

between the projections vanish, as predicted. The excellent

agreement of the correlation functions shows that the master

equation captures not only the relaxation times but also the

character of the different relaxation modes.

Gaps in the eigenvalue spectrum of the master equation also

suggest further reductions from 32 to 2 and 4 states, respectively.

This coarse graining is achieved by grouping states together

into clusters according to their projection onto the eigenvectors

of the slowest-decaying modes. We show that we can reduce

the system roughly to two states, folded and unfolded. The

folded state is found to contain all structures with at least one

(i, i + 4)-type α-helical hydrogen bond within the four

N-terminal amino acids. However, the number correlation

function for the two-state projection decays substantially faster

than the slowest relaxation process seen in the actual simulation

trajectories. A more faithful representation of the full dynamics

is obtained by using four states, two each for the unfolded and

folded ensembles. The slow relaxation in the folded state, with

a relaxation time of ∼2 ns, is associated with the transition of

the N-terminal amino acid between helix and coil states. In

summary, we found that constructing master equations

using propagator-based methods not only allows us to explore

the long-time dynamics but also provides detailed insights into

the mechanisms of the folding dynamics of a peptide.
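
The eigenvector-based coarse graining summarized above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: states are grouped by the sign pattern of their components in the slowest non-stationary left-hand eigenvectors, which is one simple way to cluster states "according to their projection" onto those modes. The four-state rate matrix, its populations, and the function names are hypothetical.

```python
import numpy as np

def left_eigenvectors(K, p_eq):
    """Eigenvalues (ascending) and left eigenvectors psi[:, n] of a rate matrix K
    (K[j, i] = rate i -> j) that satisfies detailed balance with respect to p_eq."""
    root = np.sqrt(p_eq)
    K_sym = K * root[np.newaxis, :] / root[:, np.newaxis]
    lam, u = np.linalg.eigh(0.5 * (K_sym + K_sym.T))
    return lam, u / root[:, np.newaxis]

def lump_by_sign(K, p_eq, n_slow):
    """Assign each state an integer label from the signs of its components
    in the n_slow slowest non-stationary left eigenvectors."""
    lam, psi = left_eigenvectors(K, p_eq)
    slow = psi[:, -1 - n_slow:-1]  # skip the last column (stationary mode, eigenvalue 0)
    signs = (slow > 0).astype(int)
    return signs @ (2 ** np.arange(n_slow))

# Hypothetical model: states {0, 1} and {2, 3} exchange quickly within each pair,
# while the 1 <-> 2 transition connecting the pairs is slow.
p_eq = np.array([0.3, 0.2, 0.3, 0.2])
sym = np.zeros((4, 4))
sym[0, 1] = sym[1, 0] = 0.1
sym[2, 3] = sym[3, 2] = 0.1
sym[1, 2] = sym[2, 1] = 0.001
K = sym / p_eq[np.newaxis, :]
np.fill_diagonal(K, -K.sum(axis=0))  # columns sum to zero

labels = lump_by_sign(K, p_eq, n_slow=1)               # two coarse states
coarse_populations = np.bincount(labels, weights=p_eq)  # equilibrium weight of each coarse state
print(labels, coarse_populations)
```

With `n_slow=1` the grouping yields at most two coarse states; using more slow modes yields up to 2**n_slow sign patterns, not all of which need to be populated. Which label (0 or 1) a cluster receives depends on the arbitrary overall sign of the eigenvector.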

Acknowledgment. We thank Dr. Attila Szabo and Dr.

Alexander Berezhkovskii for many helpful and stimulating

discussions. This research utilized the high-performance com-

putational capabilities of the Biowulf Linux cluster at the NIH

(http://biowulf.nih.gov), and it was supported by the Intramural

Research Program of the NIDDK, NIH.

References and Notes

(1) Mohanty, D.; Elber, R.; Thirumalai, D.; Beglov, D.; Roux, B. J.

Mol. Biol. 1997, 272, 423-442.

(2) Schütte, C.; Fischer, A.; Huisinga, W.; Deuflhard, P. J. Comput.

Phys. 1999, 151, 146-168.

(3) Huisinga, W.; Schütte, C.; Stuart, A. M. Commun. Pure Appl. Math.

2003, 56, 234-269.

(4) Swope, W. C.; Pitera, J. W.; Suits, F. J. Phys. Chem. B 2004, 108,

6571-6581.

(5) Swope, W. C.; Pitera, J. W.; Suits, F.; Pitman, M.; Eleftheriou,

M.; Fitch, B. G.; Germain, R. S.; Rayshubski, A.; Ward, T. J. C.; Zhestkov,

Y.; Zhou, R. J. Phys. Chem. B 2004, 108, 6582-6594.

(6) Levy, Y.; Jortner, J.; Berry, R. S. Phys. Chem. Chem. Phys. 2002,

4, 5052-5058.

(7) Chekmarev, D. S.; Ishida, T.; Levy, R. M. J. Phys. Chem. B 2004,

108, 19487-19495.

(8) de Groot, B. L.; Daura, X.; Mark, A. E.; Grubmüller, H. J. Mol.

Biol. 2001, 309, 299-313.

(9) Becker, O. M.; Karplus, M. J. Chem. Phys. 1997, 106, 1495-1517.

(10) Sriraman, S.; Kevrekidis, I. G.; Hummer, G. J. Phys. Chem. B 2005,

109, 6479-6484.

(11) Zwanzig, R. J. Stat. Phys. 1983, 30, 255-262.

(12) Zwanzig, R.; Szabo, A.; Bagchi, B. Proc. Natl. Acad. Sci. U.S.A.

1992, 89, 20-22.

(13) Bryngelson, J. D.; Wolynes, P. G. J. Phys. Chem. 1989, 93, 6902-

6915.

(14) Schonbrun, J.; Dill, K. Proc. Natl. Acad. Sci. U.S.A. 2003, 100,

12678-12682.

(15) Bicout, D. J.; Szabo, A. J. Chem. Phys. 1998, 109, 2325-2338.

(16) Hummer, G. New J. Phys. 2005, 7, 34.

(17) Best, R. B.; Hummer, G. Phys. Rev. Lett. 2006, 96, 228104.

(18) Hummer, G.; Kevrekidis, I. G. J. Chem. Phys. 2003, 118, 10762-

10773.

(19) Bolhuis, P. G.; Dellago, C.; Chandler, D. Proc. Natl. Acad. Sci.

U.S.A. 2000, 97, 5877-5882.

(20) Ma, A.; Dinner, A. R. J. Phys. Chem. B 2005, 109, 6769-6779.

(21) Hummer, G. J. Chem. Phys. 2004, 120, 516.

(22) Rhee, Y. M.; Pande, V. S. J. Phys. Chem. B 2005, 109, 6780-

6786.

(23) Ren, W.; Vanden-Eijnden, E.; Maragakis, P.; E, W. J. Chem. Phys.

2005, 123, 134109.

(24) Elmer, S. P.; Park, S.; Pande, V. S. J. Chem. Phys. 2005, 123,

114902.

(25) Elmer, S. P.; Park, S.; Pande, V. S. J. Chem. Phys. 2005, 123,

114903.

(26) Snow, C. D.; Sorin, E. J.; Rhee, Y. M.; Pande, V. S. Annu. Rev.

Biophys. Biomol. Struct. 2005, 34, 43-69.

(27) Singhal, N.; Pande, V. S. J. Chem. Phys. 2005, 123, 204909.

(28) Singhal, N.; Snow, C. D.; Pande, V. S. J. Chem. Phys. 2004, 121,

415.

(29) van der Spoel, D.; Seibert, M. M. Phys. Rev. Lett. 2006, 96, 238102.

(30) Chodera, J. D.; Swope, W. C.; Pitera, J. W.; Dill, K. A. Multiscale

Model. Sim. 2006, 5, 1214-1226.

6068 J. Phys. Chem. B, Vol. 112, No. 19, 2008


(31) Noé, F.; Horenko, I.; Schütte, C.; Smith, J. C. J. Chem. Phys. 2007,

126, 155102.

(32) Kube, S.; Weber, M. J. Chem. Phys. 2007, 126, 024103.

(33) Chodera, J. D.; Singhal, N.; Pande, V. S.; Dill, K. A.; Swope, W.

C. J. Chem. Phys. 2007, 126, 155101.

(34) Poland, D. C.; Scheraga, H. A. Theory of the helix-coil transition;

Academic Press: New York, 1970.

(35) Scheraga, H. A.; Vila, J. A.; Ripoll, D. R. Biophys. Chem. 2002,

101, 255-265.

(36) Graf, J.; Nguyen, P. H.; Stock, G.; Schwalbe, H. J. Am. Chem.

Soc. 2007, 129, 1179-1189.

(37) Wang, T.; Du, D.; Gai, F. Chem. Phys. Lett. 2003, 370, 842-848.

(38) Thompson, P. A.; Eaton, W. A.; Hofrichter, J. Biochemistry 1997,

36, 9200-9210.

(39) Doshi, U.; Muñoz, V. Chem. Phys. 2004, 307, 129-136.

(40) Buchete, N. V.; Straub, J. E. J. Phys. Chem. B 2001, 105, 6684-

6697.

(41) van Giessen, A. E.; Straub, J. E. J. Chem. Phys. 2005, 122, 024904.

(42) van Giessen, A. E.; Straub, J. E. J. Chem. Theory Comput. 2006,

2, 674-684.

(43) Daidone, I.; D’Abramo, M.; Di Nola, A.; Amadei, A. J. Am. Chem.

Soc. 2005, 127, 14825-14832.

(44) Hummer, G.; García, A. E.; Garde, S. Phys. Rev. Lett. 2000, 85,

2637-2640.

(45) Hummer, G.; García, A. E.; Garde, S. Proteins 2001, 42, 77-84.

(46) Nymeyer, H.; García, A. E. Proc. Natl. Acad. Sci. U.S.A. 2003,

100, 13934-13939.

(47) Margulis, C. J.; Stern, H. A.; Berne, B. J. J. Phys. Chem. B 2002,

106, 10748-10752.

(48) Bicout, D. J.; Szabo, A. Protein Sci. 2000, 9, 452-465.

(49) Deuflhard, P.; Huisinga, W.; Fischer, A.; Schütte, C. Linear Algebra

Appl. 2000, 315, 39-59.

(50) Deuflhard, P.; Weber, M. Linear Algebra Appl. 2005, 398, 161-

184.

(51) Belkin, M.; Niyogi, P. Neural Comput. 2003, 15, 1373-1396.

(52) Coifman, R. R.; Lafon, S.; Lee, A. B.; Maggioni, M.; Nadler, B.;

Warner, F.; Zucker, S. Proc. Natl. Acad. Sci. U.S.A. 2004, 102, 7426-

7431.

(53) Onsager, L. Phys. Rev. 1938, 54, 554-557.

(54) Du, R.; Pande, V. S.; Grosberg, A. Y.; Tanaka, T.; Shakhnovich,

E. S. J. Chem. Phys. 1998, 108, 334-350.

(55) Geissler, P. L.; Dellago, C.; Chandler, D. J. Phys. Chem. B 1999,

103, 3706-3710.

(56) Best, R. B.; Hummer, G. Proc. Natl. Acad. Sci. U.S.A. 2005, 102,

6732-6737.

(57) Snow, C. D.; Rhee, Y. M.; Pande, V. S. Biophys. J. 2006, 91, 14-

24.

(58) Berezhkovskii, A.; Szabo, A. J. Chem. Phys. 2006, 125, 104902.

(59) Berezhkovskii, A.; Szabo, A. J. Chem. Phys. 2004, 122, 014503.

(60) Wales, D. J. Mol. Phys. 2002, 100, 3285-3305.

(61) Dellago, C.; Bolhuis, P. G.; Chandler, D. J. Chem. Phys. 1999,

110, 6617-6625.

(62) van Erp, T. S.; Moroni, D.; Bolhuis, P. G. J. Chem. Phys. 2003,

118, 7762-7774.

(63) Hummer, G. J. Chem. Phys. 2004, 120, 516-523.

(64) Andrec, M.; Levy, R. M.; Talaga, D. S. J. Phys. Chem. A 2003,

107, 7454-7464.

(65) Kou, S. C.; Xie, X. S.; Liu, J. S. Appl. Stat. 2005, 54, 1-28.

(66) McSharry, P. E.; Smith, L. A. Phys. Rev. Lett. 1999, 83, 4285-

4288.

(67) Meyer, R.; Christensen, N. Phys. Rev. E: Stat. Phys., Plasmas,

Fluids, Relat. Interdiscip. Top. 2000, 62, 3535-3542.

(68) Hinrichs, N. S.; Pande, V. S. J. Chem. Phys. 2007, 126, 244101.

(69) van der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A.

E.; Berendsen, H. J. C. J. Comput. Chem. 2005, 26, 1701-1718.

(70) Lindahl, E.; Hess, B.; van der Spoel, D. J. Mol. Model. 2001, 7,

306-317.

(71) Sorin, E. J.; Pande, V. S. Biophys. J. 2005, 88, 2472-2493.

(72) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K.

M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman,

P. A. J. Am. Chem. Soc. 1995, 117, 5179-5197.

(73) García, A. E.; Sanbonmatsu, K. Y. Proc. Natl. Acad. Sci. U.S.A.

2002, 99, 2782-2787.

(74) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089-

10092.

(75) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.;

Pedersen, L. G. J. Chem. Phys. 1995, 103, 8577.

(76) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola,

A.; Haak, J. R. J. Chem. Phys. 1984, 81, 3684-3690.

(77) Miyamoto, S.; Kollman, P. A. J. Comput. Chem. 1992, 13, 952-

962.

(78) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. J.

Comput. Chem. 1997, 18, 1463-1472.

(79) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.;

Klein, M. L. J. Chem. Phys. 1983, 79, 926-935.

(80) Jayachandran, G.; Vishal, V.; Pande, V. S. J. Chem. Phys. 2006,

124, 164902.

(81) Bolhuis, P. G.; Chandler, D.; Dellago, C.; Geissler, P. L. Annu.

Rev. Phys. Chem. 2002, 53, 291-318.

(82) Tobias, D. J.; Brooks, C. L. Biochemistry 1991, 30, 6059-6070.

(83) Andrec, M.; Felts, A. K.; Gallicchio, E.; Levy, R. M. Proc. Natl.

Acad. Sci. U.S.A. 2005, 102, 6801-6806.

(84) Buchete, N.-V.; Hummer, G. Submitted, [http://arxiv.org/abs/0710.5533].
