
Coarse-Grained (Multiscale)

Simulations in Studies

of Biophysical and

Chemical Systems

Shina C.L. Kamerlin,1,2 Spyridon Vicatos,1 Anatoly Dryga,1 and Arieh Warshel1

1Department of Chemistry, University of Southern California, Los Angeles, California 90089;

email: vicatos@usc.edu, warshel@usc.edu

2Department of Organic Chemistry, Arrhenius Laboratory, Stockholm University,

S-10691 Stockholm, Sweden; email: l.kamerlin@gmx.com

Annu. Rev. Phys. Chem. 2011. 62:41–64

First published online as a Review in Advance on

October 28, 2010

The Annual Review of Physical Chemistry is online at

physchem.annualreviews.org

This article’s doi:

10.1146/annurev-physchem-032210-103335

Copyright © 2011 by Annual Reviews.

All rights reserved

0066-426X/11/0505-0041$20.00

Keywords

simplified models, reference potential approaches, free-energy landscapes,

QM/MM approaches, long-timescale simulations, renormalization

Abstract

Recent years have witnessed an explosion in computational power, leading to

attempts to model ever more complex systems. Nevertheless, there remain

cases for which the use of brute-force computer simulations is clearly not

the solution. In such cases, great benefit can be obtained from the use of

physically sound simplifications. The introduction of such coarse graining

can be traced back to the early usage of a simplified model in studies of

proteins. Since then, the field has progressed tremendously. In this review,

we cover both key developments in the field and potential future directions.

Additionally, particular emphasis is given to two general approaches, namely

the renormalization and reference potential approaches, which allow one to

move back and forth between the coarse-grained (CG) and full models,

as these approaches provide the foundation for CG modeling of complex

systems.


Annu. Rev. Phys. Chem. 2011.62:41-64. Downloaded from www.annualreviews.org

by Mr. Sankarampadi ARAVAMUDHAN on 05/27/11. For personal use only.



Coarse graining: a means to study complex systems by smoothing away the fine details of the full explicit system (e.g., using pseudo-atoms)

CG: coarse-grained

1. INTRODUCTION

Computer modeling of the function of macromolecules and related systems presents a problem

of enormous complexity. Although available computer power has increased rapidly, there are,

nevertheless, still many cases for which the use of brute-force simulations is clearly not the best

approach. Furthermore, there exist many systems whose nature was correctly elucidated even

before the emergence of the current level of computing power. This issue can be most dramatically

illustrated by considering the possibility of using ab initio representations of the entire protein in

the study of the action of molecular motors. Obviously, such an approach faces enormous difficulties. However,

there also exist far less dramatic examples of cases in which one cannot (and perhaps should not)

progress without the use of a simplified model, and, in fact, recent years have witnessed a growing

appreciation of the fact that the simulation of complex systems, and, in particular, the modeling

of biological function, can greatly benefit from physically sound simplifications. The idea of such

coarse graining first appeared in protein folding studies and has since become a common, well-

accepted, and powerful strategy. This review considers the developments in the field, covering

both advances and new directions. We try to emphasize that often focusing on minute details is

not the best way to model a complex system. We also emphasize the importance of capturing the

relevant physical features, as well as the strategies that allow one to move back and forth between

the coarse-grained (CG) and explicit models.

The idea of using a simplified model in computational studies of proteins dates back to Levitt

& Warshel’s (LW) simplified model for protein folding (1), as well as the much simpler Gō model

(2), which emerged at the end of the same year as the LW model. The field of multiscale modeling

of proteins and related systems has grown tremendously since that time, and this review considers

developments in the field. We also emphasize the critical difference between a simplified complete model and an explicit but incomplete model, as well as approaches that allow one to move between

models of different degrees of sophistication.

In general, one may start from Einstein’s advice to “make everything as simple as possible,

but not simpler.” Here, one should of course define what the question is, and what level of

understanding is desired. It is also important to keep in mind the available computer power

and its ability to give a stable (converging) result at that given moment in the history of the

field. An excellent example is provided by the early difficulties with acceptance of the role of

simplified models. People hoped for a complete Hamiltonian representation of subsystems, where

in fact a soft sphere dipolar model for the solvation of molecules in water was suitable (3). This

is representative of a time during which studies of two water molecules and an ion were tractable

(e.g., 4), with a complete Hamiltonian (but not, of course, with a Hamiltonian relevant for the

complete solute-solvent system). A system with a simplified model for each water molecule, but

with a physical representation of the surface between the explicit system and the bulk, turns out to

be an excellent model for microscopic studies of solvation effects (see, e.g., 5, 6). A preoccupation

with minute details might prevent the realization of this point, thus slowing progress in the field.

Similarly, a sometimes dangerous assumption is that the ability to run long trajectories (such as a millisecond trajectory for an ion channel) represents a major breakthrough that will yield great insight into a system. This can be quite problematic, for example,

because understanding selectivity requires running multiple simulations and exploring the ef-

fect of different parameters on the overall current (e.g., 7). The same holds true for a single

long trajectory that explores protein folding (8). Of course, with increasing computational power,

it will be possible to run multiple long simulations using all-atom models. However, by that

time, many of the key problems will probably have already been resolved by the use of simpler

models.


Reference potential: a simplified (e.g., coarse-grained) potential that can be used as a reference for a full explicit potential, thus providing significant savings in computational cost

This review discusses the broad usage of CG models. Here, some recent relevant reviews of

this topic (which include 9–12) can also be useful. In addition to pointing out the wider scope

of the field, we also emphasize two general approaches, namely the renormalization approach,

which allows one to move from a full to a simplified model, and the reference potential approach, which allows one to go from the simplified to the full model. We believe that keeping these approaches in mind provides an excellent way to understand the foundations of CG

modeling.

2. PROTEIN FOLDING AS AN EXAMPLE OF THE NEED

FOR COARSE-GRAINED MODELS

Probably the earliest example of the CG idea in biology is the development of the simplified

protein folding model (1). That is, protein folding presented an enormous challenge, in light of

what came to be known as the Levinthal paradox (13), where it seemed that it was close to impossible

to rationalize how a protein with so many degrees of freedom is capable of folding within any

reasonable timescale. In 1974, we tried to attack this fundamental problem, and, realizing that

even the minor energy minimization of a protein took an extremely long time, we moved to

a seemingly drastic simplification (while still retaining the main physics of the problem). This

resulted in the replacement of the protein side chains by spheres with an effective potential that

implicitly represented the average potential of the solvated side chains. The main chain was

represented by virtual bonds between the Cα’s. This model was surprisingly effective and in fact

provided the first reasonable physically based solution of the Levinthal paradox, by finding several

native structures while starting from the unfolded state (see Figure 1 and the discussion in 1).

The success of this model led to significant criticism (see, e.g., 14). Today, the best response to such criticism has simply been the widespread adoption of this model (see below). At any rate, a

further useful simplification that emerged at the same time was a model that kept the helices of

the simplified model in a fixed helical configuration (15).
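As a concrete (hypothetical) illustration of this kind of simplification, the sketch below collapses an all-atom residue to two pseudo-atoms: a Cα site for the virtual main-chain trace and a single effective side-chain center. The coordinates and the centroid rule are illustrative assumptions, not the actual LW potential.

```python
import numpy as np

# Hypothetical all-atom coordinates for one residue (illustrative values).
residue = {
    "CA": np.array([0.0, 0.0, 0.0]),
    "CB": np.array([1.5, 0.0, 0.0]),
    "CG": np.array([2.2, 1.1, 0.0]),
    "OD1": np.array([3.4, 1.1, 0.5]),
}

def coarse_grain(residue):
    """Collapse a residue to two pseudo-atoms: the C-alpha position
    (for the virtual main-chain trace) and the geometric center of the
    side-chain atoms (a stand-in for an effective side-chain sphere)."""
    side = [xyz for name, xyz in residue.items() if name != "CA"]
    return residue["CA"], np.mean(side, axis=0)

ca, side_center = coarse_grain(residue)
```

In an actual CG force field, the sphere would also carry an effective potential representing the average behavior of the solvated side chain, not just its position.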

A related model was introduced by Gō and coworkers (2) shortly after our model. In its early

versions, this model considered a chain of nonintersecting units of a given length on the two-dimensional (2D) square lattice and explored interesting formal issues such as the partition function

of the simplified model. However, the early version of this model sacrificed too much physics to

be considered a realistic model of a protein and thus could not be used (before the introduction of

major changes) to explore the protein folding puzzle. While this is clearly an interesting approach

that can provide some useful basic information, constraining the system to a 2D square lattice

simplifies and limits the correspondence between this 2D model and the actual complex structure

of a protein, making it too primitive to produce a real protein topology (17–20). This brings us to the point that different simplifications are needed for different problems, and the relationship between the CG model and the full model should always be considered. In fact, as discussed below,

it is crucial to be able to move between the CG and full models.

The study of protein folding by simplified models has become a major research field. A

significant amount of work (e.g., 21–25) was done using the simple Gō model (which has become known as a lattice model). In contrast to this, other works that tried to be more realistic used the LW model, which has been termed an off-lattice model. The selection of which

model to use depends on the question being asked, and both models have provided enormous insight into the folding process, the corresponding landscape, and timescales (26), as well as encouraging more experimental studies and the initial attempts at all-atom simulations (e.g., 27, 28).


[Figure 1 plot: energy (kcal mol−1) and RMS deviation (Å) versus cycle number, 0–1,000 cycles, for the starting and final conformations.]

Figure 1

The folding trajectory produced by the coarse-grained model of Reference 1. This simulation was initiated

from an extended starting conformation, with α of all terminal helices set to 180◦, with the exception of

residues 48 to 58, where α = 45◦. No other knowledge whatsoever was used about the native protein during

the simulation. At the end of each minimization cycle, the conformation was thermalized; i.e., thermal

fluctuations were reintroduced, and the conformation was considered to be vibrating around the minimum

(such that each mode has an average kinetic energy of kT/2, where k is the Boltzmann constant and T is the

absolute temperature). The thermal vibration is then suddenly stopped, and a new starting conformation for

the next pass of energy minimization is generated from the structure at this point. Such normal-mode

thermalization avoids nonproductive changes in the protein conformation because it knows which

combination of angle changes should cause the greatest change in conformation for a given energy increase.

The normal-mode treatment presents what is probably the first realistic dynamical treatment of large-

amplitude protein motions. This figure was originally presented in Reference 1. Adapted with permission

from Macmillan Publishers Ltd: Nature, copyright 1975.

Despite the emergence of a powerful simplified model for protein folding, the lingering

question remains as to what the relationship between the simplified model and the corresponding

explicit model actually is. This appears to be crucial in light of the increasing interest in protein

landscapes, which is an important element in attempts to explore the energetics and kinetics of

the folding process (for a review, see 21, and references therein). Furthermore, the ability to

explore and sample large regions in the accessible conformational space can help investigators

improve the description of functional properties, as well as explore the possible relationships

between landscape and function (e.g., 21, 25, 29). Unfortunately, the detailed sampling of protein

landscapes requires enormous computational resources. Thus, it is important to develop multiscale

approaches that allow one to effectively generate the free-energy surface for folding and related

processes. A general way to resolve this problem was introduced some time ago (30), in which we

proposed the use of a simplified model as a reference potential for explicit calculations of folding free energies. This approach is discussed further in Section 3.

QM/MM: quantum mechanical/molecular mechanical

PMF: potential of mean force

3. USING THE COARSE-GRAINED MODEL AS A REFERENCE

POTENTIAL FOR EXPLICIT CALCULATIONS

The use of CG models leaves one with the question of how capable the given model is at re-

producing the corresponding full model. Fortunately, it is possible to systematically resolve this

issue by generating the results (i.e., the free energy and some dynamical features) of the explicit

model using the simplified model as a reference potential. More specifically, following the idea

introduced in Reference 30, we can use the CG model as a reference for the full model. In this

approach, which is described in Figure 2, we evaluate the free energy of moving between two

states in the explicit model by evaluating the corresponding free energy with the CG model and

then just evaluating the free energy of moving from the CG surface to the surface of the full model

at the end points.
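In the notation of Figure 2, this cycle closes as (a sketch of the bookkeeping, using the figure's symbols):

```latex
\Delta g_{\mathrm{ep}}
  = \Delta g_{\mathrm{sp}}
  + \Delta\Delta g^{\mathrm{final}}_{\mathrm{sp}\to\mathrm{ep}}
  - \Delta\Delta g^{\mathrm{initial}}_{\mathrm{sp}\to\mathrm{ep}}
```

That is, the expensive explicit-model free energy is obtained from the cheap coarse-grained free energy plus two end-point perturbations.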

The above idea has been demonstrated to work in a protein folding study (30), and other

workers (e.g., 25, 31, 32) have also recently explored related strategies. As clarified further below,

wehaveexploitedthesameideainawiderangeofproblems,includingtheaccelerationofquantum

mechanical/molecular mechanical (QM/MM) calculations (33–37; see also Section 4), as well as

path integral calculations of nuclear quantum mechanical effects (38, 39). In subsequent sections,

we also consider several key implementations of the reference potential idea.

One possible application of our reference potential approach is the evaluation of the effect of

mutations on protein stability. Although this can be done by evaluating the folding potential of

mean force (PMF) for both systems, it is much simpler to use a thermodynamic cycle of the type

presented in Figure 2, as was done in Reference 40, in which we studied mutations of ubiquitin.

The reference potential idea considered above provides a promising strategy in the field of

enzyme design, in which it can be used to evaluate the binding free energy of rate-determining

transition states. This can be done by focusing on the electrostatic free-energy contribution, while

[Figure 2 schematic: initial and final states on the simplified (Δg_sp) and explicit (Δg_ep) surfaces, connected by the vertical legs ΔΔg_sp→ep(initial) and ΔΔg_sp→ep(final).]

Figure 2

The thermodynamic cycle used to calculate the change (Δg_ep) in free energy for a generic process in an explicit system. Having calculated the free-energy change of the corresponding simplified model, Δg_sp, umbrella sampling can be used to calculate the free-energy changes ΔΔg_sp→ep for the initial and final states to obtain Δg_ep.


using the cycle described in Reference 40. Reference 41 discusses the potential of this approach

in enzyme design.

4. QM/MM AND RELATED MULTISCALE MODELS

One of the best demonstrations of the multilevel strategy for modeling biological systems involves

the development of the QM/MM approach (16) with its crucial electrostatic embedding idea

(for reviews see, e.g., 33, 42). Over the years, different versions of the QM/MM model have

emerged, although all approaches share the idea of treating the reactive part of the system using

a quantum mechanical approach and embedding this part in a system that is treated on a simpler

level. This model has rapidly become one of the most popular approaches for studying chemical

reactivity in general and enzyme function in particular (e.g., 33, and references therein; 43–46). The

main strategies currently being adopted for performing QM/MM calculations are summarized in

Figure 3, and here we provide a brief overview of each.

Although QM/MM approaches have clearly become an essential tool for modeling enzymatic

reactions (at least until it is possible to represent all of the enzyme quantum mechanically), it

is clearly important to use this tool correctly. Here an important issue is to perform extensive

configurational sampling during the course of the simulation, as the use of QM/MM simulations

without proper sampling is not so effective. The difficulties arising from performing only limited

[Figure 3 schematic: three QM/MM partitionings, each with a high-level region, a boundary region, and a low-level region: the EVB model (VB in MM), the QM(ai)/MM model (QM(ai) in MM), and the CDFT/FDFT model (DFT in CDFT/FDFT).]

Figure 3

An overview of different quantum mechanical/molecular mechanical (QM/MM) approaches. Illustrated here

are the empirical valence bond (EVB) approach, an ab initio QM(ai)/MM approach, and the constrained/

frozen density functional theory (CDFT/FDFT) approaches.


FEP: free-energy perturbation

Linear response approximation (LRA): an approximation that assumes the system has the same solvent force constants in the initial and final states

EVB: empirical valence bond

Constrained/frozen density functional theory (CDFT/FDFT): approaches that split the system into two regions, both treated using ab initio DFT; the electron densities of atoms in the outer region are either frozen (FDFT) or constrained (CDFT)

energy minimization have been highlighted elsewhere (33), but we would like to point out that

one of the most significant shortcomings of such an approach is that, as the enzyme active-site

landscape is quite complex, estimating the QM/MM reaction path along a fixed reaction coordinate

can reflect artificial minima. This issue is particularly challenging when dealing with ab initio QM

systems in QM(ai)/MM simulations. Addressing the need for properly sampling ab initio surfaces

has led to several important advances (e.g., 33, 35, 47–53), many of which (e.g., 33, 35, 49, 50, 52,

53) exploit our idea (34, 54) of utilizing a classical potential as a reference for QM/MM calculations.

The key point in the use of a reference potential for QM/MM calculations is the evaluation of

the free energy of moving from the reference potential to the ab initio potential. This free energy

can be evaluated either by means of a single-step free-energy perturbation (FEP) approach or by

means of the linear response approximation (LRA). Although both approaches are viable in prin-

ciple, the LRA approach is particularly powerful in that it allows one to obtain a reasonable result

even in cases in which the ab initio and reference potentials are significantly different. Here the empirical valence bond (EVB) approach [which has been discussed in detail elsewhere (e.g., 55, 56) and

powerful reference potential (for discussion see, e.g., 56). Such an approach has been effectively

and successfully applied to the study of activation barriers both in solution and in proteins (e.g.,

34, 36, 54, 57), most notably in the case of Reference 36, which used the EVB as a reference

potential for QM(ai)/MM calculations to successfully resolve the highly controversial issue (58)

of the energetics of the reference reaction for haloalkane dehalogenase in solution. Specifically,

this study clarified that the earlier EVB estimate (58) of the catalytic effect and the effect of the

enzyme is quantitatively correct, and it is presently the only true QM(ai)/MM study that considers

the free-energy surface of the haloalkane dehalogenase reaction in the protein and in solution.
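The single-step FEP and LRA estimates mentioned above can be sketched numerically as follows; the surfaces, sample sets, and the toy constant-shift check are illustrative assumptions, not a real QM/MM calculation.

```python
import numpy as np

kT = 0.596  # kcal/mol at ~300 K

def fep_one_step(gap_ref):
    """Single-step FEP estimate of the free energy of moving from the
    reference to the target potential, from samples of the energy gap
    dE = E_target - E_ref collected on the reference surface."""
    gap_ref = np.asarray(gap_ref)
    return -kT * np.log(np.mean(np.exp(-gap_ref / kT)))

def lra(gap_on_ref, gap_on_target):
    """Linear response approximation: average the same gap over
    trajectories run on the reference and on the target surfaces."""
    return 0.5 * (np.mean(gap_on_ref) + np.mean(gap_on_target))

# Toy check (an assumption for illustration): if the target is the
# reference shifted by a constant c, the free-energy difference is c.
rng = np.random.default_rng(0)
c = 2.0
gap = c + rng.normal(0.0, 0.3, 5000)  # fluctuating gap samples
print(lra(gap, gap))       # close to c
print(fep_one_step(gap))   # close to c, minus a small fluctuation term
```

The practical point made in the text shows up here: the exponential average in FEP is dominated by rare low-gap samples when the two potentials differ strongly, whereas the LRA average of the gap remains well behaved.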

Recently, we also advanced the idea of performing QM(ai)/MM-FEP calculations of solvation

free energies using a classical reference potential, by means of a powerful approach in which

the solute environment is represented by an average solvent potential, which is then added to

the solute Hamiltonian (33, 35). This approach has been demonstrated to lead to computational

time savings of up to 1,000 times in QM(ai)/MM-FEP calculations of solvation free energies of

simple systems in which the solute structure is kept fixed during the simulation.

The EVB approach is much more than just an effective reference potential; it is in fact probably the most powerful current QM/MM approach when one is interested in long-timescale simulations and extensive sampling (see, e.g., 55, 56). Here, however, we only mention the nature of the energy

coordinate, x, which is taken as the energy gap between the diabatic states. This selection (55, 59)

is particularly powerful when one tries to represent the entire many-dimensional solvent space by

a single coordinate (see 60), as it guarantees accelerated convergence for processes in condensed

phases because it captures the main physics of the solvent response. The energy gap as a reaction

coordinate has been successfully used to study a wide range of complex problems, such as in the

case of ATP synthase, in which the energy gap was capable of describing the coupling between

the mechanical and chemical steps (61), or in the cases of chorismate mutase (29) and adenylate

kinase (62). We also note that the power of the EVB energy-gap mapping is increasingly being

appreciated by other workers (63–65).
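A minimal two-state sketch of the energy-gap coordinate follows; the harmonic diabats and constant coupling are assumptions for illustration (the actual EVB parameterization is far richer).

```python
import numpy as np

def evb_ground_state(e1, e2, h12):
    """Lower eigenvalue of the 2x2 EVB Hamiltonian [[e1, h12], [h12, e2]]."""
    return 0.5 * (e1 + e2) - 0.5 * np.sqrt((e1 - e2) ** 2 + 4.0 * h12 ** 2)

# Assumed diabatic surfaces: two shifted harmonic wells (arbitrary units).
q = np.linspace(-2.0, 3.0, 501)            # one collective coordinate
eps1 = 0.5 * 40.0 * q ** 2                 # reactant diabat
eps2 = 0.5 * 40.0 * (q - 1.0) ** 2 + 5.0   # product diabat, offset by 5
h12 = 3.0                                  # constant off-diagonal coupling

e_ground = evb_ground_state(eps1, eps2, h12)
x = eps1 - eps2   # the energy-gap reaction coordinate

# For these wells x is monotonic in q, so it orders configurations from
# reactant-like (x << 0) to product-like (x >> 0), and the coupling
# lowers the adiabatic surface below both diabats everywhere.
```

Because x is defined from the energies themselves, it automatically folds the many-dimensional solvent response into a single coordinate, which is the property the text highlights.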

Metadynamics/paradynamics: alternative approaches for examining an entire complex system using high-level ab initio calculations; the core of both approaches is to identify the best potential that represents the full explicit potential

Simplified folding model: a simplified approach to studying protein folding by modeling the side chains as, e.g., beads instead of modeling the full explicit system

The idea of embedding the QM model into a simpler model has been advanced on a level that can be called QM/QM/MM, which is best demonstrated by the so-called frozen density functional theory (FDFT) and constrained density functional theory (CDFT) approaches (37, 47, 54, 57, 66, 67). In the FDFT approach, a very large part of the entire system is represented by a QM(DFT) approach. However, the region immediately surrounding the internal region is represented by fixed DFT densities (37). In contrast, the CDFT approach allows the surroundings to relax by a freeze-and-thaw approach (and thus constrains the surrounding densities rather than freezes them). The CDFT and FDFT approaches have been recently adopted by key research groups

(e.g., 68–70) and are reviewed in detail in, e.g., Reference 71. The FDFT approach also provides

an ideal way to embed the central region and its surroundings without the problems associated

with the link-atom treatment.

The CDFT approach has been demonstrated to be extremely effective for the evaluation of

the diabatic free-energy functional as well as for the exploration of the mixing between diabatic

states (66) (i.e., the H_ij term) and for the evaluation of QM(ai)/MM free-energy surfaces that take

into account the free energy associated with both the substrate and solvent motions, which in turn

allows one to obtain a free-energy barrier that properly reflects the solute entropy (72).

As what could perhaps be considered a testimony to the increasing popularity of this approach,

there are several recent examples in the literature of some of the most important CDFT ideas

reappearing in other forms. The work of Wu & Van Voorhis (73), which might be seen as a breakthrough (74–77), is effectively an adaptation of key CDFT ideas (37, 54, 57, 67) for a somewhat

different approach (73) that emphasizes fixing the diabatic densities by Lagrange multipliers (73),

rather than by the physically based approach used, e.g., in References 37, 47, 54, 57, 66, and 67.

Here, it is also important to clarify a critical point, which is easily misunderstood: That is, diabatic

states are never (and should not be) unique. They are simply a useful mathematical representation

for solving the physics of the real adiabatic system. In general, it is important to force the diabatic

states to reproduce the physics of the reactant and products, and, at least in the case of the crucial

charge-transfer reactions, this approach is much more effective than the seemingly rigorous use of

Lagrange multipliers (73). The CDFT approach follows the EVB philosophy and considers the

wave functions as being Löwdin orthogonalized, and, of course, the corresponding H_ij is different

from the one used by Wu & Van Voorhis (73), with both providing correct physics if one uses the

corresponding diabatic states (see 78).

Another recent development in the field, which has achieved great popularity, is the

metadynamics approach of Laio & Parrinello (79). This approach, which is in many ways similar

to earlier ideas (e.g., 30, 34, 80, 81) and in some respects to the idea of using a CG reference

potential, has been reviewed in detail in Reference 82, so here we only cover it briefly, pointing

out its similarities, as well as its shortcomings, relative to the reference potential approach.

At its core, the strategy of the metadynamics model involves attempting to build the best

potential, whose addition to the actual potential will result in a flat surface, i.e., the potential that

is the closest to −E(r). This is done by iteratively building successive Gaussian potentials that fill

the deepest wells along a small number of user-defined chemically relevant collective variables. In

this way, the system is allowed to escape over the lowest transition state as soon as the growing

biasing potential and the underlying free energy well exactly counterbalance each other, which,

as the title of the original Laio & Parrinello work suggests (79), effectively allows the simulation

to escape free-energy minima.
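The Gaussian-filling idea can be illustrated with a one-dimensional toy model; the double well, deposition schedule, and Langevin parameters below are all assumptions for illustration, not the original algorithm's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def dU(q):                    # gradient of an assumed double well (q^2 - 1)^2
    return 4.0 * q * (q * q - 1.0)

centers = []                  # deposited Gaussian centers
height, width = 0.15, 0.2     # assumed deposition parameters

def dbias(q):                 # gradient of the accumulated bias potential
    return sum(-height * (q - c) / width ** 2 *
               np.exp(-((q - c) ** 2) / (2 * width ** 2)) for c in centers)

q, dt, kT = -1.0, 1e-3, 0.1   # start in the left well; the barrier is 10 kT
crossed = False
for step in range(30000):
    if step % 100 == 0:
        centers.append(q)     # deposit a Gaussian at the current position
    q += -(dU(q) + dbias(q)) * dt + np.sqrt(2 * kT * dt) * rng.normal()
    if q > 0.9:               # reached the other basin
        crossed = True
        break

# Unbiased, a 10-kT barrier is essentially never crossed on this timescale;
# the growing bias fills the initial well and lets the walker escape.
```

The accumulated Gaussians approximate the negative of the free energy along the chosen collective variable, which is exactly the "best potential that flattens the surface" described above.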

Although the metadynamics approach has been elegantly formulated, the philosophy behind it

is almost identical to our earlier approach of using a reference potential (see 33 for background).

That is, constructing a potential that makes the landscape flat (as the metadynamics approach

does) is similar to using a simplified folding model as a reference potential (30) and is additionally

quite similar to the general use of a reference potential for accelerated sampling, which has been

part of our QM/MM-FEP studies for a long time. Furthermore, although it is currently quite

popular to use approaches in which the best reaction coordinate is not assumed a priori (83–85),

using chemical knowledge could be superior to a blind search (even though many workers prefer

black-box approaches that limit user input). More specifically, using an approach that we call

paradynamics, we take a physically based reference potential and make it as close as possible to

E(r). This is done by first evaluating the real potential on a rough grid of n × m points [a search

that can be done by very short molecular dynamics (MD) simulations with a constraint on each grid point or, even better, by evaluating E(r_nm)] and then subsequently fitting the EVB potential to the grid. The fitted EVB is further refined by minimizing ⟨E_EVB − E_real⟩, which is evaluated by running trajectories at the reactant and transition-state regions of both the EVB and the QM/MM surfaces. This refinement is done automatically using the derivatives of the energy gap with regard to the EVB parameters. Once the average energy gap has been minimized, it is rather easy to get the LRA estimate of the free energy for moving from the EVB to the QM/MM surfaces, as these two surfaces are quite similar. We recently (86) demonstrated that this approach is a highly powerful strategy that requires less effort than current implementations of the metadynamics approach.

Free-energy landscape: the detailed dependence of the free energy of a system on its coordinates; the corrugation of such landscapes is currently a topic of great interest in studies of catalysis and folding

The success of the paradynamics approach is a reflection of the fact that the EVB approach

not only is a powerful semiempirical QM/MM approach in its own right, but also makes an ideal

reference potential for higher-level simulations. This is because it stores a tremendous amount of

chemical information, while simultaneously facilitating extensive sampling. Therefore, although

metadynamics is in principle a useful tool, it is also an expensive approach, both in terms of

computational cost (due to the need for repeated calls to the QM) and in terms of manpower (due

to the need for identifying the correct collective variables to obtain meaningful results, which can

pose a nontrivial challenge).
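The grid-fit-plus-LRA refinement described in this section can be caricatured as follows; the polynomial surrogate stands in for the fitted EVB potential, and the uniform "trajectory frames" are an illustrative shortcut, not actual sampling.

```python
import numpy as np

rng = np.random.default_rng(2)

def target(q):
    """Stand-in for an expensive target (e.g., QM/MM) energy; hypothetical."""
    return 0.3 * q ** 4 - 1.2 * q ** 2 + 0.1 * q

# Step 1: evaluate the target on a rough grid.
grid = np.linspace(-2.0, 2.0, 9)

# Step 2: fit a cheap surrogate reference potential to the grid energies
# (least squares here; a stand-in for fitting EVB parameters).
ref = np.poly1d(np.polyfit(grid, target(grid), 4))

# Step 3: LRA estimate of the residual free energy of moving from the
# fitted reference to the target, from the gap evaluated on samples
# standing in for trajectory frames on each surface.
q_ref = rng.uniform(-1.5, 1.5, 1000)
q_tgt = rng.uniform(-1.5, 1.5, 1000)
dG = 0.5 * (np.mean(target(q_ref) - ref(q_ref)) +
            np.mean(target(q_tgt) - ref(q_tgt)))

# In this toy the quartic surrogate can match the target exactly, so the
# average gap, and hence the LRA correction, is essentially zero.
```

The design point is the same as in the text: the better the fit minimizes the average gap, the smaller and better-converged the final correction from the reference to the expensive surface.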

5. REDUCING THE DIMENSIONALITY OF THE

FREE-ENERGY LANDSCAPE

One of the key issues when using reduced models is the ability to capture the main physics of the

given system. A case in point is the description of the free-energy landscapes for reacting enzymes.

An excellent example (29) has been provided by a study of the catalytic landscape of chorismate

mutase. This study was performed by using a CG model to generate the free-energy landscape of

the enzyme, followed by an explicit EVB evaluation of the activation barriers for the chemical step

in different regions of the landscape, defined by the conformational and chemical coordinates.

Additionally, the approach introduced in Reference 29 has also been used in a detailed study of

adenylate kinase (62; see Section 6).

Now, of course, an important issue when modeling the catalytic landscape is the choice of a

proper reaction coordinate. At present, the most effective possibility is arguably provided by the

EVB formalism, which is considered in Section 4.
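In the two-state case, the EVB reaction coordinate is simply the gap between the diabatic energies, and the ground-state surface is the lower eigenvalue of a 2x2 Hamiltonian. A minimal sketch of this standard construction (all numerical values here are hypothetical):

```python
import numpy as np

def evb_ground_state(e1, e2, h12):
    """Lower eigenvalue of the 2x2 EVB Hamiltonian [[e1, h12], [h12, e2]];
    the energy gap e1 - e2 serves as the generalized reaction coordinate."""
    return 0.5 * (e1 + e2) - 0.5 * np.sqrt((e1 - e2) ** 2 + 4.0 * h12 ** 2)

# At the diabatic crossing (e1 == e2), the off-diagonal coupling h12
# lowers the adiabatic energy by exactly h12.
print(evb_ground_state(50.0, 50.0, 10.0))  # 40.0
```

Because the gap coordinate is defined by the diabatic energies themselves, it remains well defined for any environment, which is what makes it so convenient for condensed-phase sampling.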

An excellent example of the use of a reduced coordinate is the description of the folding

landscape in terms of an order parameter, ρ (87), which serves as a measure of the degree of

similarity of each state with the native conformation. The energy has been demonstrated, on

average, to be a decreasing function of ρ (87), in line with the assumption of minimal frustration.
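One common way to realize such an order parameter is the fraction of native contacts retained in a given conformation. The toy function below is our own illustration of that idea and is not necessarily the exact definition used in Reference 87:

```python
def order_parameter(native_contacts, current_contacts):
    """Fraction of native contacts present in the current state; a rho-like
    similarity measure (0 = no native contacts, 1 = fully native)."""
    return len(native_contacts & current_contacts) / len(native_contacts)

# Hypothetical residue-pair contact sets.
native = {(1, 5), (2, 6), (3, 7), (4, 8)}
partly_folded = {(1, 5), (2, 6), (9, 12)}
print(order_parameter(native, partly_folded))  # 0.5
```

Under minimal frustration, averaging the energy over states binned by such a coordinate should yield a function that decreases as the native state is approached.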

6. RENORMALIZING LONG-TIMESCALE PROCESSES

One of the most interesting problems in CG modeling is the challenge of simulating long-timescale

events. The most obvious approach is the use of a friction term to reflect the effect of the implicit

thermal bath, using the Einstein (88) or Wang & Uhlenbeck (89) formulations. In fact, the use

of frictional models to describe biological and chemical problems has a long history (e.g., 90–93).

However, when focusing on obtaining similar physics in the full and CG systems, and on study-

ing long-timescale biological processes, we cannot safely use the frictions obtained by standard

frictional models (see below).
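For reference, the friction enters through the standard Langevin equation of motion, with the random-force variance fixed by the fluctuation-dissipation relation. A minimal (Euler-Maruyama) integration step might look as follows; the parameter values are arbitrary and the scheme is a sketch, not any production integrator:

```python
import numpy as np

def langevin_step(x, v, force, gamma, dt, kT=0.596, mass=1.0, rng=None):
    """One Euler-Maruyama step of Langevin dynamics: the friction term
    -gamma*m*v dissipates energy, while the random force, with variance
    2*gamma*m*kT/dt per step, returns it (fluctuation-dissipation)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    random_force = np.sqrt(2.0 * gamma * mass * kT / dt) * rng.standard_normal()
    v = v + dt * (force(x) - gamma * mass * v + random_force) / mass
    return x + dt * v, v
```

A standard sanity check for the noise amplitude is to run many such steps in a harmonic well and verify that the average of v**2 approaches kT/m.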

Our main approach for studying long-timescale processes involves taking an explicit all-atom

potential model and the corresponding simplified model (with its simplified free-energy landscape)

and running MD simulations and Langevin dynamics (LD) on each model, respectively. (In this context, the renormalization approach is a means of moving from a simplified model to a full explicit model by ensuring that there is a correspondence between the simplified and full models.) This

is done while imposing a series of constraints on both models, which force these systems to

move along a given reaction coordinate on different timescales. Here, larger constraints force

faster motion (this approach is illustrated in Figure 4 and described more rigorously in 94). The

optimal friction is obtained by using the same constraints for the simplified and explicit models and

adjusting the friction in the simplified model until the timescale for the motion (for each constraint)

becomes equivalent in the two models. However, the main point of the renormalization process

is that we can force the explicit model to undergo large structural changes within reasonable

computational time by using a large constraint, but it is essential to also interpolate the results to

cases without constraints (which would require enormous computational time). The use of our

renormalization approach has been recently validated on the well-defined test case of a simplified

ion channel and expanded to provide an extremely powerful approach, not only for long-timescale

simulations, but also to obtain free-energy surfaces (94).
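Schematically, the matching step can be viewed as a one-parameter search: for each constraint, the CG friction is tuned until the constrained crossing time reproduces the all-atom value. The toy search below is our own paraphrase of that idea, not the actual implementation of Reference 94:

```python
def calibrate_friction(crossing_time_cg, target_time_md, gamma_grid):
    """Return the CG friction coefficient whose constrained crossing time
    best matches the all-atom (MD) crossing time for the same constraint."""
    return min(gamma_grid,
               key=lambda g: abs(crossing_time_cg(g) - target_time_md))

# Toy model: in the overdamped regime, crossing time grows with friction.
toy_crossing_time = lambda gamma: 0.01 * gamma  # ps
print(calibrate_friction(toy_crossing_time, 1.5, [100.0, 150.0, 200.0]))  # 150.0
```

Repeating the search over a series of progressively weaker constraints, and then interpolating toward the unconstrained limit, corresponds to the extrapolation step emphasized in the text.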

Our validation of the renormalization approach (94) is important, in light of the possibility that

it may be perceived that elegant nonequilibrium-type approaches (such as those of 95 and 96) can

provide an effective way to obtain free-energy profiles (see the discussion in 94). Here our point

has been that the ultimate validation of a simulation approach does not come from the formal

elegance of the approach, but rather from its ability to obtain converging results for the systems

of interest. In this respect, considering the difficulties with attempts to estimate effective friction

using a standard treatment, we insist on the idea that there must be some correspondence between

the full and CG models and that the best way to obtain this correspondence is by simultaneously

adjusting the friction and PMF of the CG model until the best agreement between the two models

is reached. We believe that this seemingly pedestrian approach is actually far more promising than

other approaches with more sophisticated formulations.

Here it is also useful to consider our recent study (62), which explored the idea that slow-

timescale conformational motions play a major role in enzyme catalysis (see the discussion in 62).

Although there are clear logical flaws with this argument, which have been discussed in detail

in, e.g., References 62 and 97, it is nevertheless crucial to explore this proposal by simulating

the relevant processes on the millisecond timescale. Such a study was performed by moving from

the full explicit model to a simplified CG model, and then to an even simpler 2D model, which

represents the landscape and dynamics in the space defined by the conformational and chemical

coordinates of the enzyme under study (62). However, this was achieved by means of a simple

single barrier potential in the 2D model, rather than by first evaluating the actual PMF in the

full or CG models (because the conclusions were not restricted to any specific barrier shape).
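A single-barrier 2D potential of this kind can be as simple as a pair of double wells in the conformational and chemical coordinates; the function below is a hypothetical stand-in for the actual surface used in Reference 62:

```python
def toy_2d_surface(q_conf, q_chem, barrier=5.0):
    """Hypothetical 2D landscape (kcal/mol) with minima at q = +/-1 along
    each dimensionless coordinate and a single barrier of height `barrier`
    separating them at q = 0."""
    double_well = lambda q: barrier * (q * q - 1.0) ** 2
    return double_well(q_conf) + double_well(q_chem)

print(toy_2d_surface(1.0, -1.0))  # 0.0 (a minimum)
print(toy_2d_surface(0.0, 1.0))   # 5.0 (top of the conformational barrier)
```

Because the conclusions of the study did not depend on the detailed barrier shape, such a generic form is sufficient for exploring the coupling between the two coordinates.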

The renormalization results for this system from a more recent study with weaker forces are de-

picted in Figure 4 (94). As seen from this figure, we can find friction constants that satisfy the


Figure 4

An outline of our renormalization approach for studying long-timescale processes. (Left column) Representations of (top) the all-atom explicit model, (middle) a coarse-grained model in which the protein side chains are represented as spheres, and (bottom) our 2D simplified model, in which the system is described by two effective dimensionless coordinates, Q1 and Q2, which describe the conformational transition and chemical step, respectively. (Right column) Plots showing the corresponding time required for crossing the conformational barrier for each model, using constraints and friction coefficients of different magnitudes. The time required to cross the conformational barrier is similar in all three models (for the same constraint). (Bottom center) An actual renormalized surface, obtained using the 2D model, on which it is then possible to run Langevin dynamics without a constraint (illustrated by the transparent plot above the surface). For further information on this approach, we refer readers to Reference 62 and the main text.


[Figure 4 panels: 2D surfaces in the chemical and conformational coordinates (regions I-III), with plots of Qconf versus time (ps) for constraints of ΔE = 780 and 1,090 kcal mol−1 and friction coefficients γ = 135 and 220 ps−1.]

renormalization requirement. However, it seems that even in the challenging case of conformational changes, it is important and possible to further refine the renormalization procedure (see 94).

At any rate, generating a reasonable 2D surface with a reasonable range of frictions allowed

us to explore the time dependency of long-timescale processes. This was found to be particularly

important in providing the first direct proof that the dynamical proposal is invalid (62). In

this respect, it is important to point out that, although the use of the renormalization treatment for examining large conformational changes still needs further refinement and validation, our finding of the absence of dynamical coupling between the chemical and conformational coordinates in enzymes remains completely valid. That is, our conclusions

were obtained by examining all the ranges of reasonable frictions, which included changing the

corrugation of the 2D model. It was also confirmed using the much more explicit CG model.

The power of the renormalization approach has also been demonstrated when exploring the

selectivity of the KcsA ion channel (7), as discussed in Section 7, and the same approach has been

effectively applied to studies of proton transport (PTR) (see Section 9), as well as to the study of

vectorial translocation discussed in Section 8. It is also useful to mention that the long-timescale

behavior of the conformational coordinates has been effectively explored by the use of CG models

(e.g., 98).

At this point, it is important to clarify some recent confusion (99, 100) with regard to the

potential of approaches such as transition path sampling, and the assumption that relatively short

runs can be used to study millisecond processes (101). In fact, running short reactive trajecto-

ries, which may be useful for exploring the reaction coordinate, cannot tell us much about the

probability of climbing high activation barriers. The same misunderstandings have appeared in a

recent proposal (99) that downhill trajectories can be used to explore the long-timescale coupling

between conformational motions and chemical catalysis. Of course, there is no way to explore

these issues without simulating long-timescale barrier climbing processes. In conclusion, perhaps

the ultimate renormalization approach would involve the emergence of a method that can use the

CG simulations to obtain trajectories of the explicit models directly, and there are several options

for such a strategy.

7. SIMULATING LONG-TIMESCALE PROTON AND ION TRANSPORT

There exists significant current interest in MD simulations that account for changes in ionization

states during the simulated process (e.g., 102, 103). However, the current models do not consider

the time dependency of the PTR process. To advance in this challenging field, we combined

our approach of time-dependent Monte Carlo (MC) simulations of PTR processes (104) and the

simplified protein model in studies of pH-dependent MD (40).

Our model uses a simplified version of the EVB approach, which takes the energetics of any

possible proton transfer (PT) step into account. Here, the MC moves are based on the electrostatic

energies of the CG model and are then scaled by the characteristic PT time to correspond to the

rate constant predicted by transition-state theory. The barrier for the PT moves is then given

by a modified Marcus expression (105). This allows us to convert an MC procedure to a time-

dependent simulation by exploiting the isomorphism between the probability obtained from the

MC procedure and the probability factor of transition-state theory (see 40 for details).
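The essence of the mapping is that an accepted MC move is assigned a physical duration through the transition-state-theory probability factor, with the barrier taken from a Marcus-type expression. A minimal sketch (the classical Marcus form is used here for illustration; the modified expression of Reference 105 includes additional terms):

```python
import math

KB_T = 0.596  # kcal/mol at roughly 300 K

def marcus_barrier(dg0, lam):
    """Classical Marcus activation barrier (kcal/mol) for a transfer step
    with reaction free energy dg0 and reorganization energy lam."""
    return (lam + dg0) ** 2 / (4.0 * lam)

def move_time(dg0, lam, tau_pt=1.0e-12):
    """Scale the characteristic PT time tau_pt (seconds) by the inverse
    TST Boltzmann factor to get the physical time assigned to an MC move."""
    return tau_pt * math.exp(marcus_barrier(dg0, lam) / KB_T)

print(marcus_barrier(0.0, 40.0))  # 10.0
```

The same bookkeeping extends to electron transfer steps, with a characteristic time reflecting the corresponding pre-exponential factor (see Section 9).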

Renormalized simulations have been used to study PTR in carbonic anhydrase (106) and

gramicidin (105). Both simulation studies established that PTR in proteins is controlled by the

electrostatic free-energy barrier, rather than by the Grotthus mechanism (see 107 for discus-

sion). However, when simulation studies become too time-consuming, one can move to the MC

approach. This approach has been used in Reference 104 to study PTR in cytochrome c oxidase.

PDLD/S: semimicroscopic protein dipoles-Langevin dipoles approach. QCFF/PI: quantum mechanical consistent force field method for pi electron systems.

The above renormalization approaches have also been exploited and demonstrated to be highly

powerful when exploring the selectivity of the KcsA potassium channels (7). This was done by

evaluating the free energy for the penetration of a single ion by means of the PDLD/S-LRA model

and then evaluating the ion-ion interaction on the fly by using a dielectric function, εeff(r), which

represents the effective dielectric for charge-charge interactions by means of a distance-dependent

function that typically changes from approximately 20 at short distances to the bulk value (see 108).

The friction constant was evaluated by the same renormalization strategy mentioned above, and

our treatment of the charge-charge interaction appears to provide an extremely effective way

to study long-timescale processes in ion channels. We note that practically the same charge-

charge treatment was also used recently by Coalson and coworkers (109) (which can be verified

by examining the nature of the dielectric constant).
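The idea can be sketched with any smooth function that starts near 20 at contact and saturates at the bulk value; the exponential form below is our own illustrative choice, not necessarily the function used in Reference 108:

```python
import math

def eps_eff(r, eps_short=20.0, eps_bulk=80.0, r0=3.0):
    """Distance-dependent dielectric for charge-charge interactions:
    ~eps_short at contact, approaching eps_bulk at large separation
    (r and r0 in angstroms)."""
    return eps_bulk - (eps_bulk - eps_short) * math.exp(-r / r0)

def ion_ion_energy(q1, q2, r):
    """Screened Coulomb interaction in kcal/mol (charges in e, r in A);
    332 is the usual electrostatic conversion factor."""
    return 332.0 * q1 * q2 / (eps_eff(r) * r)
```

Evaluating such a screened interaction on the fly is far cheaper than recomputing the full electrostatic response at every step, which is what makes the treatment attractive for long-timescale channel simulations.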

8. SIMULATING VECTORIAL PROCESSES

The use of CG models has been effective in situations in which the details of the simulated system

are not completely clear. A case in point is a recent study (110) of the nature of the vectorial

translocation of a single-stranded DNA by translocases. This study focused on the electrostatic

interaction between the protein and the DNA main-chain-ionized phosphate group. The use

of the PDLD/S-LRA electrostatic potential for the simplified system generated a unique free-

energy surface with a clear valley that leads in one direction, thus supporting a vectorial process.

Running LD simulations on the corresponding system, and recovering unidirectional translocation

(Figure 5), where the energy of ATP hydrolysis is coupled to the translocation process, verified the

simple insight provided by inspecting the surface. It should be noted that our CG simulations are,

in fact, the first fully consistent simulations of a biological vectorial process in which the results are

not assumed a priori or introduced by phenomenological parameters (see the discussion in 110).

9. SIMULATING LIGHT-INDUCED PHOTOBIOLOGICAL ELECTRON AND PROTON TRANSPORT

The study of light-induced electron transport and PTR can be divided into two limits. On the

short timescale (i.e., from a few femtoseconds up to 100 ps), one can use explicit all-atom simula-

tions to explore the nature of the corresponding primary events [see, e.g., studies of the primary

photochemical event in bacteriorhodopsin (111)] and photosynthesis (112). On the long timescale

(i.e., from 100 ps to seconds), one can use CG models to explore the behavior of the system. Here,

it would be useful to perform LD simulations of the PTR, following the initial LD treatment

of the change in the pKa of the key ionizable group. An ideal system for such a study is bacteriorhodopsin, in which the nature of the activation barriers for the primary PT was recently explored

(113) by combining the QCFF/PI and the EVB methods in a unified QM/MM framework. It was

established that the initial charge-separation process, which leads to the primary PT, has sufficient

excess free energy to drive the subsequent PT process and, in doing so, provided the first glimpse

into the energetics of protein conformational change. Obviously, the use of the LD simulations

described in Section 6, and even MC simulations, can be effective in exploring the subsequent

steps and the overall pumping process.

The simulation of electron transport processes poses another important challenge. Here we

introduced many of the key approaches (see 112) for obtaining the relevant activation barriers from

direct simulations. As far as the present review is concerned, it is instructive to note that one

can explore long-range electron transport steps by the same MC used for the PT, but with a

characteristic time that reflects the corresponding pre-exponential factor (see 104).

[Figure 5 panels: time dependency (up to ~160,000 ps) of the R'' and Q'' coordinates, with snapshots T1, D1, E1, and T2 along the translocation path.]

Figure 5

Simulating vectorial translocation in hexameric helicases. Shown here is the simulated time dependency of the R'' and Q'' coordinates corresponding to the DNA and the protein, respectively (for details see 110), as well as snapshots along the translocation path, for a case with a low barrier of 4 kcal mol−1 for the relevant transition (note that simulations with higher barriers gave similar results, albeit with longer translocation times). Figure adapted from Reference 110.

10. ELECTROSTATIC MODELING AS A MULTILEVEL STRATEGY

As pointed out in our recent review, electrostatic effects provide what is arguably the most impor-

tant correlation between structure and function. A crucial question is what level of simplification

would still be able to capture the correct physics of electrostatic effects in macromolecules.

In principle, there are three general ways the solvent and/or protein can be modeled. The first is

to represent all solvent and/or protein atoms explicitly, using a fully microscopic model (114, 115).

However, such an explicit solvation model is computationally expensive, particularly for larger

systems [although with increasing computer power, it can be used for many applications (108)]. An

effective CG simplification involves representing the solvent molecules as polarizable Langevin

dipoles on a grid (16, 108) or as soft-sphere dipoles (3). The limit of the simplification is obtained

by using continuum models (for discussion see 108, 116). Unfortunately, such models are based

on phenomenological considerations, which make them completely dependent on procedure in


studies of proteins (108). Thus one of the best compromises, with the most consistent connection

between macro- and microscopic models, is the PDLD/S-LRA model (see 117, 118 for discussion).

This model provides a clear definition of the protein dielectric, in terms of the few factors that

are not treated explicitly (namely, the limited protein relaxation and the limited representation of

water penetration during the charging process).

11. ADDITIONAL APPROACHES AND SYSTEMS

As the realization of the power of the CG approach in molecular modeling is rapidly growing,

there are an increasing number of studies and directions that are being taken (for a recent review,

see, e.g., 9). In light of space limitations, we only mention some select approaches here.

Recently, we have witnessed a major productive effort in the CG modeling of membranes by

Marrink and coworkers (119), who developed the MARTINI force field, which uses extensive

calibration of the building blocks of the CG force field against thermodynamic data. At present,

the model shows reasonable behavior for lipid bilayers, in terms of the stress profile across the

bilayer and its tendency to form pores, as well as accurate agreement with all-atom simulations

for the free energies of lipid desorption and, to some extent, flip-flopping across the bilayer.

Another recent approach is the dynamic linear response theory by Essiz & Coalson (120).

This approach has common elements with our LRA approach for free-energy calculations and in

exploring the energetics of large conformational changes in proteins (61), as well as with our use of

the LRA in simulating fast relaxation processes (60, 112), although the LRT formulation is elegant

and independent of our formulation. The linear response theory approach is aimed at studies of

the response of a macromolecular system, such as a protein, to a change in the potential of the

system, such as a change that is induced when a ligand bound to a well-defined binding pocket

within a protein dissociates from the binding pocket. Such studies are formally similar to our use

of the linear response theory approach in studies of the response to light-induced charge-transfer

processes (112, 121).

Another recent development is Calderon and coworkers' (122) idea of summarizing the state of complex systems by the time series of low-dimensional system observables: nonequilibrium trajectories are used, by means of a surrogate process approximation method, to extract both equilibrium quantities and kinetic parameters, which are sometimes used to describe dynamics occurring over longer timescales than those explored in the simulations. This interesting approach uses time-series techniques (123, 124) to estimate a low-dimensional stochastic

differential equation. These equations approximate the dynamics of an observed signal, which can

come from either a computer simulation or an experiment. In their recent study (122), the authors

applied such stochastic differential equations to approximate the various statistical properties as-

sociated with steered MD simulations of ion transport across a channel protein, which, in many

respects, resembles our earlier renormalization approach considered in Section 6.

Savelyev & Papoian (125, 126) used an approach whose essence is to match correlators obtained

from atomistic and CG simulations, for observables that explicitly enter the CG Hamiltonian,

which leads to the equivalency of the corresponding partition functions. This resulted in a one-

step renormalization process to reach a consensus between the two models, which allows for the

reproduction of many-body effects at low computational cost, while increasing the likelihood of

finding unique solutions for the CG force-field parameter values.
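In spirit, such a procedure adjusts the CG parameters until the CG averages of the observables conjugate to those parameters match their atomistic counterparts. The toy update below is our own paraphrase of that matching idea, not the actual scheme of References 125 and 126:

```python
import numpy as np

def match_correlators(params, atomistic_avgs, cg_avgs, step=0.1):
    """Nudge each CG Hamiltonian parameter against the mismatch between the
    CG and atomistic averages of its conjugate observable; iterating such
    updates drives the two sets of correlators toward agreement."""
    mismatch = np.asarray(cg_avgs, float) - np.asarray(atomistic_avgs, float)
    return np.asarray(params, float) - step * mismatch

print(match_correlators([1.0], [2.0], [3.0]))  # [0.9]
```

When the matched observables are exactly those entering the CG Hamiltonian, agreement of the correlators implies the equivalency of the corresponding partition-function derivatives, which is the source of the method's one-step character.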

Finally, it might be useful to mention here the quantum classical path approach (38, 127–130)

as a powerful multiscale approach, which exploits our idea of transferring between two potentials

(which is basically the scenario illustrated in Figure 2). This approach evaluates the nuclear

quantum mechanical corrections to free-energy surfaces by using a perturbation from a classical
