CHARMM: the biomolecular simulation program. J Comput Chem 30:1545

Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
Journal of Computational Chemistry (Impact Factor: 3.59). 06/2009; 30(10):1545-614. DOI: 10.1002/jcc.21287

ABSTRACT

CHARMM (Chemistry at HARvard Molecular Mechanics) is a highly versatile and widely used molecular simulation program. It has been developed over the last three decades with a primary focus on molecules of biological interest, including proteins, peptides, lipids, nucleic acids, carbohydrates, and small molecule ligands, as they occur in solution, crystals, and membrane environments. For the study of such systems, the program provides a large suite of computational tools that include numerous conformational and path sampling methods, free energy estimators, molecular minimization, dynamics, and analysis techniques, and model-building capabilities. The CHARMM program is applicable to problems involving a much broader class of many-particle systems. Calculations with CHARMM can be performed using a number of different energy functions and models, from mixed quantum mechanical-molecular mechanical force fields, to all-atom classical potential energy functions with explicit solvent and various boundary conditions, to implicit solvent and membrane models. The program has been ported to numerous platforms in both serial and parallel architectures. This article provides an overview of the program as it exists today with an emphasis on developments since the publication of the original CHARMM article in 1983.

CHARMM: The Biomolecular Simulation Program
B. R. BROOKS,¹* C. L. BROOKS III,²,³* A. D. MACKERELL, Jr.,⁴* L. NILSSON,⁵* R. J. PETRELLA,⁶,⁷* B. ROUX,⁸* Y. WON,⁹* G. ARCHONTIS, C. BARTELS, S. BORESCH, A. CAFLISCH, L. CAVES, Q. CUI, A. R. DINNER, M. FEIG, S. FISCHER, J. GAO, M. HODOSCEK, W. IM, K. KUCZERA, T. LAZARIDIS, J. MA, V. OVCHINNIKOV, E. PACI, R. W. PASTOR, C. B. POST, J. Z. PU, M. SCHAEFER, B. TIDOR, R. M. VENABLE, H. L. WOODCOCK, X. WU, W. YANG, D. M. YORK, M. KARPLUS⁶,¹⁰*
¹ Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892
² Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109
³ Department of Biophysics, University of Michigan, Ann Arbor, Michigan 48109
⁴ Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Maryland 21201
⁵ Department of Biosciences and Nutrition, Karolinska Institutet, SE-141 57, Huddinge, Sweden
⁶ Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138
⁷ Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
⁸ Department of Biochemistry and Molecular Biology, University of Chicago, Gordon Center for Integrative Science, Chicago, Illinois 60637
⁹ Department of Chemistry, Hanyang University, Seoul 133-792, Korea
¹⁰ Laboratoire de Chimie Biophysique, ISIS, Université de Strasbourg, 67000 Strasbourg, France
Received 12 September 2008; Revised 24 February 2009; Accepted 3 March 2009
DOI 10.1002/jcc.21287
Published online 14 May 2009 in Wiley InterScience (www.interscience.wiley.com).
© 2009 Wiley Periodicals, Inc. J Comput Chem 30: 1545–1614, 2009
Key words: biomolecular simulation; CHARMM program; molecular mechanics; molecular dynamics; molecular modeling; biophysical computation; energy function
Additional Supporting Information may be found in the online version of this article.
Correspondence to: B. R. Brooks; e-mail: brbrooks@helix.nih.gov or C. L. Brooks III; e-mail: brookscl@umich.edu or A. D. MacKerell, Jr.; e-mail: alex@outerbanks.umaryland.edu or L. Nilsson; e-mail: Lennart.Nilsson@ki.se or R. J. Petrella; e-mail: petrella@fas.harvard.edu or B. Roux; e-mail: roux@uchicago.edu or Y. Won; e-mail: won@hanyang.ac.kr or M. Karplus; e-mail: marci@tammy.harvard.edu
Contract/grant sponsors: NSF, NIH, DOE, Accelrys, CNRS, NHLBI
I. Introduction
Understanding how biological macromolecular systems (proteins, nucleic acids, lipid membranes, carbohydrates, and their complexes) function is a major objective of current research by computational chemists and biophysicists. The hypothesis underlying computational models of biological macromolecules is that the behavior of such systems can be described in terms of the basic physical principles governing the interactions and motions of their elementary atomic constituents. The models are, thus, rooted in the fundamental laws of physics and chemistry, including electrostatics, quantum mechanics and statistical mechanics. The challenge now is in the development and application of methods, based on such well-established principles, to shed light on the structure, function, and properties of often complex biomolecular systems. With the advent of computers, the scope of molecular dynamics (MD; see footnote for naming conventions)† and other simulation techniques has evolved from the study of simple hard-sphere models of liquids in the 1950s [1], to that of models of more complex atomic and molecular liquids in the 1960s [2,3], and to the study of proteins in the 1970s [4]. Biological macromolecular systems of increasing size and complexity, including nucleic acids, viruses, membrane proteins, and macromolecular assemblies, are now being investigated using these computational methods.
The power and usefulness of atomic models based on realistic microscopic interactions for investigating the properties of a wide variety of biomolecules, as well as other chemical systems, have been amply demonstrated. The methodology and applications have been described in numerous books [5–10] and reviews [11–13]. Studies of such systems have now reached a point where computational models often have an important role in the design and interpretation of experiments. Of particular interest is the possibility of employing molecular simulations to obtain information that is difficult to determine experimentally [14,15]. A dictionary definition of ‘simulation’ is, in fact, ‘the examination of a problem, often not subject to direct experimentation,’ and it is this broad meaning that is intended here. Typical studies range from those concerned with the structures, energies, and vibrational frequencies of small molecules, through those dealing with Monte Carlo and MD simulations of pure liquids and solutions, to analyses of the conformational energies and fluctuations of large molecules in solution or in crystal environments.
As the field of biomolecular computation continues to evolve, it is essential to retain maximum flexibility and to have available a wide range of computational methods for the implementation of novel ideas in research and its applications. The need to have an integrated approach for the development and application of such computational biophysical methods has led to the introduction of a number of general-purpose programs, some of which are widely distributed in academic and commercial environments. Several [16–21] were described in a special 2005 issue of Journal of Computational Chemistry (JCC). One of the programs, CHARMM (Chemistry at HARvard Molecular Mechanics), was not included in that publication because an article was not prepared in time for the issue. CHARMM was first described in JCC in 1983 [22], although its earlier implementations had already been used to study biomolecules for a number of years [23].
CHARMM is a general and flexible molecular simulation and modeling program that uses classical (empirical and semiempirical) and quantum mechanical (QM) (semiempirical or ab initio) energy functions for molecular systems of many different classes, sizes, and levels of heterogeneity and complexity. The original version of the program, although considerably smaller and more limited than CHARMM is at present, made it possible to build the system of interest, optimize the configuration using energy minimization techniques, perform a normal mode or MD simulation, and analyze the simulation results to determine structural, equilibrium, and dynamic properties. This version of CHARMM [22,24] was able to treat isolated molecules, molecules in solution, and molecules in crystalline solids. The information for computations on proteins, nucleic acids, prosthetic groups (e.g., heme groups), and substrates was available as part of the program. A large set of analysis facilities was provided, which included static structure and energy comparisons, time series, correlation functions and statistical properties of molecular dynamic trajectories, and interfaces to computer graphics programs. Over the years, CHARMM has been ported to many different machines and platforms, in both serial and parallel implementations of the code; and it has been made to run efficiently on many types of computer systems, from single processor PCs, Mac and Linux workstations, to machines based on vectorial or multicore processors, to distributed-memory clusters of Linux machines, and large, shared-memory supercomputer installations. Equally important, the structure of the program has provided a robust framework for incorporating new ideas and methodologies, many of which did not even exist when CHARMM was first designed and coded in the late 1970s. Some examples are implicit solvent representations, free energy perturbation methods, structure refinement based on X-ray or NMR data, transition path sampling, locally enhanced sampling with multiple copies, discretized Feynman path integral simulations, quantum mechanical/molecular mechanical (QM/MM) simulations, and the treatment of induced polarization. The ability of the basic framework of CHARMM to accommodate new methods without large-scale restructuring of the code is one of the major reasons for the continuing success of the program as a vehicle for the development of computational molecular biophysics.
The primary goal of this article is to provide an overview of CHARMM as it exists today, focusing on the developments of the program during the 25 years since the publication of the first article describing the CHARMM program in 1983 [22]. In addition, the current article briefly reviews the origin of the program, its management, its distribution to a broad group of users, and future directions in its development. Some familiarity with the original CHARMM article is assumed. Although many details of CHARMM usage, such as input commands and options, are included, full documentation is available online at www.charmm.org, as well as with all distributions of the program. The present work also provides, de facto, a review of the current state of the art in computational molecular biophysics. Consequently, it should be of interest not only to the CHARMM user community, but also to scientists employing other programs.
† Method abbreviations, e.g., MD for molecular dynamics and MEP for minimum energy path, and module names, e.g., PBEQ for the PB module, as well as preprocessor keywords (see Section XI.B.), are in all caps. CHARMM commands, subcommands, or command options are in italics with the first four letters capitalized. (The parser in CHARMM uses only the first four letters of a command; however, it is case-insensitive.) The term ‘keyword’ is reserved for preprocessor keywords, not command options. File and directory names are enclosed in quotation marks, e.g., ‘build’ directory. The ‘module’ designation refers to portions of CHARMM source code that form a modular functional unit, not necessarily a Fortran module.
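
The naming-conventions footnote states that the CHARMM parser uses only the first four letters of a command and is case-insensitive. A minimal Python sketch of that matching rule is given here for orientation; it is not CHARMM's parser. The function name `match_command` and the short command table are hypothetical, and treating inputs shorter than four letters as needing an exact match is an assumption.

```python
# Hypothetical, deliberately tiny command table written in the article's
# convention (first four letters capitalized).
KNOWN_COMMANDS = ["ENERgy", "MINImize", "DYNAmics", "STREam", "SCALar"]


def match_command(word, commands=KNOWN_COMMANDS):
    """Return the command whose first four letters match `word`, ignoring case.

    Only the first four characters of `word` are compared, so any trailing
    characters are irrelevant. Returns None when nothing matches.
    """
    key = word[:4].upper()
    for command in commands:
        if command[:4].upper() == key:
            return command
    return None


print(match_command("dynamics"))
print(match_command("MiNi"))
print(match_command("xyz"))
```

Because only four letters are significant, two commands sharing the same first four letters could not be distinguished by this rule; the sketch simply returns the first match.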
II. Overview of the Program
The central motivation for creating and developing the molecular simulation program CHARMM is to provide an integrated environment that includes a wide range of tools for the theoretical investigation of complex macromolecular systems, with particular emphasis on those that are important in biology. To achieve this, the program is self-contained and has been designed to be versatile, extensible, portable, and efficient. CHARMM strikes a balance between general efficiency (the ability of the end user to easily set up, run, and analyze a project) and extensibility/versatility (the ability of the program to support new implementations and the use of many methods and approaches). This section provides an introduction to some general aspects of the CHARMM program and its use, including the essential elements of a typical CHARMM project. In what follows, detailed descriptions are given of most of the program’s features.
II.A. Outline of a Generic CHARMM Project
A typical research project with CHARMM can be described in very general terms based on the information flow in the program, which is schematically illustrated in Figure 1. The user begins a project by first setting up the atomic model representing the system of interest (see also Section IX.A.). This consists of importing the ‘residue’ topologies file (RTF) and force field parameters (PRM), generating the ‘protein’ structure file (PSF), and assembling a complete configuration (coordinates) of all the atoms in the system; the quotes around ‘residue’ and ‘protein’ indicate that the same (historical) notation is used when the program is applied to molecules in general. For molecules and moieties that have been parameterized, such as proteins, nucleic acids, and lipids, standard CHARMM PRM and RTF files can be used, and the setup procedure is straightforward if most of the coordinates are known. For molecules not included in the standard libraries, CHARMM is designed to allow for the use of a virtually unlimited variety of additional molecular topologies and force field parameters. (The available force fields are discussed in Section III.) For calculations involving multiple copies of a structure, such as reaction path calculations in which the coordinates of the two end structures are derived from X-ray crystallographic data, consistency of atom labels is required across all of the copies, particularly for chemically equivalent atoms (e.g., Cδ1 and Cδ2 of Tyr). CHARMM provides a set of general tools for facilitating the setup and manipulation of the molecular system (e.g., coordinate transformations and the construction of missing coordinates; Sections IX.B. and C.) and for imposing a variety of constraints (Section V.B.) and restraints (Section III.F.) on the system, where appropriate; restraints allow changes in the property of interest with an energetic penalty, while constraints fix the property, usually to user-specified values. The user can specify a number of options for the calculation of nonbonded interactions and can choose to impose any of a number of boundary conditions on the system (Section IV). To carry out the calculations in an acceptable length of real time, the user must consider tradeoffs in accuracy/complexity versus efficiency (Section XII) when selecting the model to be employed in the calculations; in addition, he or she may need to use a parallel compilation of the code or to utilize time-saving features such as lookup tables (Section X). There are currently two Web-based interface utilities that can be used to facilitate the setup phase of a CHARMM project, CHARMM-GUI [25] and CHARMMing [26].
Figure 1. Diagram depicting the general scheme of the information flow in a CHARMM project. Information from data and parameter files (top row cylinders) and the input file (second row trapezoid) is first used to fill CHARMM data structures, which are then used by the energy routines and related modules (some of which are listed in the central grey box) to calculate the energy and its derivatives. This information is then used by various CHARMM modules for production calculations (second row from the bottom), which generate data in output files or internal data structures (bottom row) that are analyzed to obtain final results. Key: cylinders: data files; trapezoid: input file; white rectangles: data structures; shaded rectangles: CHARMM functionalities/modules; PDB: protein data bank; COOR, PSF, and PARA: internal CHARMM data structures for system coordinates, system topology/connectivity (PSF), and energy function parameters, respectively; NB energy: nonbonded energy; QM/MM: combined quantum mechanical/molecular mechanical methods; PME: Particle-Mesh Ewald summation method; LRC: long-range corrections for truncated van der Waals interactions; Impl solv: implicit solvation models; PBEQ: PB electrostatics module; Ext elec: Extended electrostatics; CMAP: backbone dihedral angle correction term for all-atom protein representation; Pol mod: polarizable models; Pathways: reaction pathway calculations; FE estimates: methods for estimating free energy differences.
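
The distinction drawn in Section II.A between restraints (deviation is allowed at an energetic cost) and constraints (the property is fixed) can be made concrete with a small sketch. The Python below is illustrative only and does not reproduce CHARMM's implementation: the function names, the harmonic form with a factor of one half, and the symmetric equal-mass displacement are all assumptions chosen for brevity.

```python
import math


def restraint_energy(distance, target, force_constant):
    """Harmonic restraint: the distance may deviate from `target`,
    but every deviation is penalized energetically."""
    return 0.5 * force_constant * (distance - target) ** 2


def apply_distance_constraint(a, b, target):
    """Constraint: move both points along their connecting axis so that
    their separation equals `target` exactly.

    Assumes equal masses (each point moves by the same amount) and that
    the two points do not coincide.
    """
    delta = [bi - ai for ai, bi in zip(a, b)]
    length = math.sqrt(sum(d * d for d in delta))
    shift = 0.5 * (length - target) / length
    new_a = [ai + shift * d for ai, d in zip(a, delta)]
    new_b = [bi - shift * d for bi, d in zip(b, delta)]
    return new_a, new_b


# A restrained distance of 1.2 against a target of 1.0 costs energy ...
print(restraint_energy(1.2, 1.0, 100.0))
# ... whereas a constrained pair is reset to the target separation.
print(apply_distance_constraint([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], 1.0))
```

In the restrained case the system remains free to sample distances around the target; in the constrained case that degree of freedom is removed altogether.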
The project may require a preproduction stage: e.g., for an MD simulation, the usual procedure is to minimize the system structure (often obtained from crystallographic or NMR data), to heat the system to the desired temperature, and then to equilibrate it. Once this is done, the project enters the production stage, during which the atomic conformation of the system may be refined, explored, and sampled by the application of various computational procedures. These procedures may consist, among other possibilities, of performing energy minimization, propagating MD or Langevin dynamics trajectories, sampling with Metropolis Monte Carlo or grid-based search algorithms, obtaining thermodynamic free energy differences via free energy perturbation computations, performing transition path sampling, or calculating normal modes of vibrations. With such methodologies, it is possible to simulate the time evolution of the molecular system, optimize and generate conformations according to various statistical mechanical ensembles, characterize collective motions, and explore the energy landscape along particular reaction pathways. Some computational techniques (e.g., so-called ‘alchemical’ free energy simulations) include the consideration of ‘unphysical’ intermediate states to improve the calculation of physical observables, including the free energy, entropy, and enthalpy change due to a mutation or conformational transition. These algorithms and methods, which are central to many theoretical studies of biological macromolecules and other mesoscopic systems, are discussed in Sections V, VI, and VII.
Although several key quantities are normally monitored during the production stage of a project, additional system properties may have to be determined by postprocessing the data, e.g., to calculate free energy changes from the coordinates or diffusion coefficients from the velocities saved during one or more MD trajectories. These derived quantities, whose calculation is described in Section VIII, may include time series, correlation functions, or other properties related to experimental observables. Finally, the advanced CHARMM user in some cases will have extended the program’s functionality in the course of carrying out his or her project, either by creating CHARMM scripts (Section II.C.), writing external code as an adjunct, utilizing internal ‘hooks’ to the CHARMM source code (Section IX.A.), or directly modifying one or more source code modules. After such developmental code has been made to conform to CHARMM coding standards and tested, it should be submitted to the CHARMM manager so as to be considered for inclusion in future distributions of the program (Section XI).
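
As an illustration of the kind of postprocessing mentioned above, a diffusion coefficient can be estimated from saved velocities through the Green-Kubo relation, D = (1/3) ∫ ⟨v(0)·v(t)⟩ dt. The Python sketch below assumes the velocities have already been read into nested lists (frames, then atoms, then x/y/z components) saved at a uniform time step; the function names and data layout are hypothetical and are not part of CHARMM, whose own analysis facilities are described in Section VIII.

```python
def velocity_autocorrelation(velocities, max_lag):
    """Average v(t0) . v(t0 + lag) over all time origins and atoms.

    `velocities[frame][atom]` is an (vx, vy, vz) triple.
    Requires max_lag < number of frames. Returns a list of length max_lag + 1.
    """
    n_frames = len(velocities)
    n_atoms = len(velocities[0])
    vacf = []
    for lag in range(max_lag + 1):
        total = 0.0
        count = 0
        for t0 in range(n_frames - lag):
            for atom in range(n_atoms):
                v0 = velocities[t0][atom]
                v1 = velocities[t0 + lag][atom]
                total += v0[0] * v1[0] + v0[1] * v1[1] + v0[2] * v1[2]
                count += 1
        vacf.append(total / count)
    return vacf


def diffusion_coefficient(vacf, dt):
    """Green-Kubo estimate: one third of the trapezoidal-rule integral
    of the velocity autocorrelation function sampled every `dt`."""
    integral = sum(0.5 * (vacf[i] + vacf[i + 1]) * dt
                   for i in range(len(vacf) - 1))
    return integral / 3.0
```

In practice the integral must be truncated at a lag where the correlation function has decayed into noise, so the result depends on the trajectory length and on the chosen `max_lag`.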
II.B. Functional Multiplicity of CHARMM
An important feature of CHARMM is that many specific computational tasks (e.g., the calculation of a free energy or the determination of a reaction pathway) can be accomplished in more than one way. This diversity has two major functions. First, the best method to use often depends on the specific nature of the problem being studied. Second, within a given type of problem or method, the level of approximation that achieves the best balance between accuracy requirements and computational resources often depends on the system size and complexity. A typical example arises in the class of models that are used to represent the effect of the surrounding solvent on a macromolecule. The most realistic representation treats the solvent environment by explicitly including the water molecules (as well as any counter ions, crystal neighbors, or membrane lipids, if they are present), and imposing periodic boundary conditions (PBC), which mimic an infinite system by reproducing the central cell [7,8] (see Section IV.B.). Systems varying from tens to even hundreds of thousands of particles can be simulated with such all-explicit-atom models for hundreds of nanoseconds using currently available computational resources, such as large, distributed memory clusters of nodes and parallel program architectures. However, a drawback of treating solvated systems in this way is that most of the computing time (often more than 90%) is used for simulating the solvent rather than the parts of the system of primary interest. Consequently, an alternative approach is often used in which the influence of the solvent is incorporated implicitly with an effective mean-field potential (i.e., without the inclusion of actual water molecules in the calculation). This approach can greatly reduce the computational cost of a calculation for a protein relative to the use of explicit solvent, often by 100-fold or more, and captures many of the equilibrium properties of the solvent. However, it introduces approximations, so that hydrodynamic and frictional solvent effects, as well as the role of water structure, are usually not accounted for in the implicit solvent approach. A variety of implicit solvent models, with differing accuracy and efficiency profiles, are available in CHARMM; a detailed discussion can be found in Section III.D. An intermediate approach between all-atom PBC simulations and implicit solvent models involves simulating only a small region explicitly in the presence of a reduced number of explicit solvent molecules, while applying an effective solvent boundary potential (SBP) to mimic the average influence of the surrounding solvent [27–29]. The SBP approach is often advantageous in simulations requiring an explicit, atomic representation of water in a limited region of the system, e.g., in the study of a reaction taking place in the active site of a large enzyme [30]. The choice of solvent representation for a project thus depends on several factors, including the accuracy requirements of the calculation, the type of data being sought, the system size, and the computational resources and (real) time available.
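
Under periodic boundary conditions, interactions are commonly evaluated between each particle and the nearest periodic image of its partner (the minimum image convention). The sketch below illustrates this for a cubic cell of edge `box_length`; the cubic geometry and the function names are assumptions made for illustration, and the code is not taken from CHARMM.

```python
def minimum_image(delta, box_length):
    """Map one Cartesian separation component onto its nearest periodic
    image, i.e. into the interval [-box_length/2, box_length/2]."""
    return delta - box_length * round(delta / box_length)


def pbc_distance(a, b, box_length):
    """Distance between points a and b in a cubic periodic cell,
    using the nearest image of b relative to a."""
    squared = 0.0
    for ai, bi in zip(a, b):
        d = minimum_image(ai - bi, box_length)
        squared += d * d
    return squared ** 0.5


# Two particles near opposite faces of a 10-unit box are close neighbors
# through the periodic boundary, not 9 units apart.
print(pbc_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), 10.0))
```

The same idea explains why the central cell alone suffices: every image is generated from it by lattice translations, so no additional coordinates need to be stored.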
II.C. The CHARMM Scripting Language
Although CHARMM can be run interactively, as is often done when the CHARMM graphics facility (GRAPHX) is being used, intensive computational projects are normally executed in batch mode through the use of input files (see Fig. 2). A set of command structures, including GOTO, STREam, and IF-ELSE-ENDIf structures, corresponding to the respective control-flow statements in source code, provides the basis for a powerful high-level scripting language that permits the general and flexible control of complicated simulation protocols and facilitates the prototyping of new methods. The various functionalities of CHARMM can easily be combined in almost any way using these command structures in scripts to satisfy the requirements of a particular project. In general, the order of CHARMM commands is limited only by the data required by the command. For example, the energy cannot be calculated unless the arrays holding the coordinates, parameters, and structural topology, etc., have already been filled (see Fig. 1). The command parser allows the substitution of numerous variables, which are set either internally by the program during execution (for example, the current number of atoms is accessible as ‘?natom’), or externally by the user (for example, a user may initially issue the command ‘SET temperature 298.15,’ and then substitute its value as ‘@temperature’ on any command line in the CHARMM input script). All components of the most recent energy evaluation, as well as the results of many other calculations, are available as internal CHARMM variables (?identifier). The numerical values for the variables can then be written to an external file, further processed, or used in control statements (‘IF ?ener .lt. 2500 THEN...’). Arrays of these variables can also be constructed (e.g., ‘segid1,’ ‘segid2,’ ..., ‘segid10’) and referenced (@segid@@j). The parser has a robust interpreter of arithmetic expressions (CALC), which can be used to evaluate algebraic functions of these variables using basic mathematical operations, including random number generation. Variable values may also be passed to the program at the start of execution. In addition, it is possible to call other CHARMM scripts as subroutines (STREam ... RETUrn), and to access operating system commands (SYSTem); depending on the operating system, CHARMM can use environment variables in filenames. In addition, the SCALar command facility performs arithmetic and statistical manipulations on internal CHARMM vectors (e.g., coordinates, forces, charges, masses, user-defined arrays). CHARMM variables and arrays can be read from (GET, SCALar READ) or written to (ECHO, WRITe TITLe, SCALar WRITe) external files, with or without header information, allowing, for example, easy access from external graphing programs. The extent of printing can be controlled with the PRNLevel and WRNLevel commands, which take integers in the range of −10 (print no messages or warnings) to +11 (print all). In general, values larger than 5 (default) will result in output that is not needed for production calculations but may be useful for debugging and script-checking purposes. For example, PRNLevel 8 will print the name of every energy-based subroutine as it is called.
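
The two kinds of substitution described above (‘@name’ for user-set variables and ‘?name’ for values set internally by the program) can be mimicked in a few lines. This Python sketch is only an analogy for how such a substitution pass might work; it is not CHARMM's parser. The function `substitute`, the case-insensitive lookup of variable names, and the numerical values used in the example are assumptions, and nested references of the ‘@segid@@j’ kind are deliberately not handled.

```python
import re


def substitute(line, user_vars, internal_vars):
    """Replace @name with user-set values and ?name with program-set values.

    Both tables are plain dicts keyed by lower-case variable name.
    An undefined variable raises KeyError rather than being left in place.
    """
    def make_replacer(table, kind):
        def replace(match):
            name = match.group(1).lower()
            if name not in table:
                raise KeyError(f"undefined {kind} variable: {name}")
            return str(table[name])
        return replace

    line = re.sub(r"@(\w+)", make_replacer(user_vars, "user"), line)
    line = re.sub(r"\?(\w+)", make_replacer(internal_vars, "internal"), line)
    return line


# Hypothetical values: the user has set 'temperature'; the program reports 'natom'.
user_vars = {"temperature": 298.15}
internal_vars = {"natom": 904}
print(substitute("T=@temperature N=?natom", user_vars, internal_vars))
```

Keeping the two tables separate mirrors the distinction in the text: one namespace is written by the user, the other is refreshed by the program as calculations complete.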
Since CHARMM input files can take the form of miniprograms written in the interpretive language of CHARMM commands, common tasks can be coded in a general way at the script level. As examples, standard input scripts have been written for the addition of explicit solvent to a system, and a series of scripts has been developed that automates the setup of the initial configuration for a membrane–protein MD simulation (see Fig. 3) [31–33]. It is also possible to implement complex methods and simulation protocols at the level of the input file without changing the source code. For example, the Random Expulsion method [34] has been implemented in this way in a study of ligand escape from a nuclear receptor [35] (see Fig. 4); see also Blondel et al. [36]. Another example is the development and parameterization of a coarse-grained model of an amphipathic polypeptide which was used to investigate the kinetics of amyloid aggregation [37]. The flexibility of the scripting language is such that one could implement Metropolis Monte Carlo sampling in a few lines directly from the input files (though this would run less efficiently than the dedicated MC module). In addition, the scripting language is used extensively when performing the calculations required for the optimization of force field parameters (see next section).
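
To indicate how little logic Metropolis Monte Carlo sampling requires, the acceptance rule and a toy sampling loop are sketched below in Python rather than in the CHARMM scripting language. The one-dimensional harmonic energy, the step size, and all names are illustrative assumptions; energies are taken to be in kcal/mol so that the Boltzmann constant is about 0.0019872 kcal/(mol K).

```python
import math
import random

K_BOLTZMANN = 0.0019872041  # kcal/(mol K)


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept a trial move with probability min(1, exp(-delta_e / kT))."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (K_BOLTZMANN * temperature))


def sample_harmonic(n_steps, temperature=298.15, force_constant=1.0,
                    step=0.5, seed=0):
    """Sample x from exp(-U/kT) with U = 0.5 * force_constant * x**2."""
    rng = random.Random(seed)
    x = 0.0
    energy = 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        trial_energy = 0.5 * force_constant * trial * trial
        if metropolis_accept(trial_energy - energy, temperature, rng):
            x, energy = trial, trial_energy
        samples.append(x)  # a rejected move repeats the current state
    return samples


samples = sample_harmonic(10000)
print(sum(s * s for s in samples) / len(samples))
```

For a long enough run, the printed mean of x² should approach kT divided by the force constant (about 0.59 at 298.15 K with the values assumed here), which offers a simple consistency check on the sampler.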
III. Atomic Potential Energy Function
The relationship between structure and energy is an essential element of many computational studies based on detailed atomic models. The potential energy function, by custom called a force field, is used to calculate the potential energy of the system and its derivatives from the coordinates corresponding to the structure or conformation. It has two aspects: the mathematical form and the empirical parameters. In CHARMM, the topology (RTF) and parameter (PRM) files (see Fig. 1), along with the polymer sequence, allow the potential energy function to be fully defined. First derivatives of the potential energy are used to determine the atomic forces, which are required for MD simulation and energy minimization. Second derivatives of the potential energy, which are required for the calculation of vibrational spectra and for some energy minimization algorithms, are also available. In a program like CHARMM, which is undergoing continuous development, changes in the force field and the rest of the code are often linked and developments in both made in concert.
Figure 2. CHARMM input file for an MD simulation of BPTI and a simple analysis of the resulting trajectory. This is similar in form to that used in the first MD simulation of a protein [4]. The example uses the CHARMM22 all-hydrogen force field, with topology descriptions for standard amino acids, and the interaction parameters in the text files ‘top_all22_prot.inp’ and ‘par_all22_prot.inp,’ respectively. A PDB file is used to provide the amino acid sequence and the atomic coordinates; depending on the source of the PDB file, some manual editing may be required. Coordinates for hydrogen atoms are constructed using the HBUILD algorithm, SHAKE constraints are applied to all bonds, and the dynamics run is started at 35 K with heating in 50 K increments at 0.2 ps intervals to a final temperature of 285 K. Specifications for the calculation of nonbonded interactions are also given on the dynamics command line. Coordinates are saved every 100 steps to a binary file, which is reopened after the simulation and used to compute the average structure and RMS fluctuations. Other examples can be found at www.charmm.org.
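
The link between energy derivatives and forces can be checked numerically. This minimal Python sketch uses a generic one-dimensional harmonic bond term (the force constant and reference length are made-up illustrative values, not CHARMM parameters) and compares the analytic force, F = −dU/dr, with a central finite difference; the second derivative is estimated in the same way.

```python
FORCE_CONSTANT = 300.0    # illustrative value, energy per length squared
REFERENCE_LENGTH = 1.5    # illustrative value


def bond_energy(r):
    """Generic harmonic bond term U(r) = k * (r - r0)**2."""
    return FORCE_CONSTANT * (r - REFERENCE_LENGTH) ** 2


def analytic_force(r):
    """Force is minus the first derivative: F = -dU/dr = -2k (r - r0)."""
    return -2.0 * FORCE_CONSTANT * (r - REFERENCE_LENGTH)


def numerical_force(energy, r, h=1.0e-5):
    """Central-difference estimate of -dU/dr."""
    return -(energy(r + h) - energy(r - h)) / (2.0 * h)


def numerical_second_derivative(energy, r, h=1.0e-4):
    """Central-difference estimate of d2U/dr2 (equal to 2k for this term)."""
    return (energy(r + h) - 2.0 * energy(r) + energy(r - h)) / (h * h)


r = 1.6
print(analytic_force(r), numerical_force(bond_energy, r))
print(numerical_second_derivative(bond_energy, r))
```

Comparing analytic and finite-difference derivatives in this way is a common sanity check when a new energy term is added to any simulation code.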
Because force fields are approximations to the exact potential
energy, they are expected to improve over time. The goals of
force field development involve at least three factors; they are
accuracy, breadth, and speed. Accuracy can be defined as the
extent to which calculations using a force field can reproduce
experimental observables. Breadth refers to the range of moi-
eties, molecules, and systems to which a force field can be
applied at the required level of accuracy. Speed is the relative
efficiency of calculations using one force field over another, all
else being equal; this often depends largely on the level of detail
of the models, although the form of implementation can also
have a role. In addition, the introduction of improvements to a
given force field must be balanced by the need for stability of
the force field (i.e. constancy of the form and parameters) over
time. This is particularly true of accuracy gains: while improved
accuracy in a given force field may be desired, continual change
would make comparison of results from different versions of the
force field problematic. In CHARMM, there have been continual
force field developments over the years, many of which are dis-
cussed, including the development of force fields based on more
detailed atomic representations (e.g., all atom, polarizable) and
applicability to more molecular types (e.g. DNA, carbohydrates,
lipids). At the same time, an effort has been made not to
change validated and well-tested force fields, thereby facilitat-
ing comparison of results from studies performed at different
times and in different laboratories. Notably, the only modification
to the protein part of the all-atom fixed-point-charge CHARMM force
field [38] since May 1993 has been the addition of a dihedral
correction term (CMAP; see Section III.C. later); the nucleic acid
part of this force field [39–41] has remained unchanged since 1998.
III.A. Molecular Mechanics Force Fields
The general form of the potential energy function most com-
monly used in CHARMM for macromolecular simulations is
based on fixed point charges and is shown in eq. (1) (see also
Brooks et al. [22] and Section IX.A.).
Figure 3. The KcsA K+ channel (helical ribbons) embedded in an
explicit dipalmitoyl phosphatidylcholine (DPPC) phospholipid membrane
(stick figures; fatty acids are white and head groups are red, green,
and white) bathed by a 150 mM KCl aqueous salt solution (blue and
green spheres represent potassium and chloride ions, respectively,
and water molecules outside the membrane are shown in blue). The
simulation system, consisting of 40,000 atoms, was used to compute a
multi-ion PMF governing ion conduction [33] through the channel and
to determine the sources of its ionic selectivity [723] (from
Bernèche and Roux [33]).
Figure 4. Four different (A–D) ligand escape pathways (shown as
grey spheres along black guiding lines) identified using Random
Acceleration Molecular Dynamics [35] in the ligand binding domain of
the retinoic acid receptor. Helices are shown as ribbons, and the
retinoic acid ligand in the bound initial state is shown as red and
gold spheres (from Carlsson et al. [35]).
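\begin{equation}
\begin{aligned}
U(\vec{R}) ={} & \sum_{\text{bonds}} K_b (b - b_0)^2
  + \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2
  + \sum_{\text{Urey--Bradley}} K_{UB} (S - S_0)^2 \\
& + \sum_{\text{dihedrals}} K_\varphi \left(1 + \cos(n\varphi - \delta)\right)
  + \sum_{\text{impropers}} K_\omega (\omega - \omega_0)^2 \\
& + \sum_{\substack{\text{non-bonded} \\ \text{pairs}}}
  \left\{ \epsilon^{\min}_{ij} \left[
  \left(\frac{R^{\min}_{ij}}{r_{ij}}\right)^{12}
  - 2 \left(\frac{R^{\min}_{ij}}{r_{ij}}\right)^{6} \right]
  + \frac{q_i q_j}{4\pi\epsilon_0 \epsilon r_{ij}} \right\} \\
& + \sum_{\text{residues}} U_{\text{CMAP}}(\varphi, \psi)
\end{aligned}
\tag{1}
\end{equation}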
The potential energy, $U(\vec{R})$, is a sum over individual terms
representing the internal and nonbonded contributions as a function
of the atomic coordinates. Internal terms include bond ($b$),
valence angle ($\theta$), Urey–Bradley (UB, $S$), dihedral angle
($\varphi$), improper angle ($\omega$), and backbone torsional
correction (CMAP, $\varphi$, $\psi$) contributions, as shown in
eq. (1). The parameters $K_b$, $K_\varphi$, $K_{UB}$, $K_\theta$, and
$K_\omega$ are the respective force constants and the variables with
the subscript 0 are the respective equilibrium values. All the
internal terms are taken to be harmonic, except the dihedral angle
term, which is a sinusoidal expression; here $n$ is the multiplicity
or periodicity of the dihedral angle and $\delta$ is the phase
shift. The all-atom implementations of the CHARMM
force field include all possible valence and dihedral angles for
bonded atoms, and the dihedral angle term about a given bond
may be expanded in a Fourier series of up to six terms. Most
commonly, one dihedral angle term is used, though two or more
have been introduced in some cases. In addition, for the protein
main chain, a numerical correction term, called CMAP, has been
implemented (see later). For three bonded atoms A–B–C, the
Urey–Bradley term is a quadratic function of the distance, S,
between atoms A and C. The improper dihedral angle term is
used at branchpoints; that is, for atoms A, B, and D bonded to a
central atom, C, the term is a quadratic function of the (pseudo)-
dihedral angle defined by A–B–C–D. Both the Urey–Bradley
and improper dihedral terms are used to optimize the fit to
vibrational spectra and out-of-plane motions. In the polar hydro-
gen models (models in which CH3, CH2, and CH groups are
treated as single extended atoms; see later), the improper dihe-
dral angle term is also required to prevent inversion of chirality
(e.g., about the Cα atom in proteins). Although the improper
dihedral term is used very generally in the CHARMM force fields,
the Urey–Bradley term tends to be used only in special cases.
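The internal terms described above can be sketched in a few lines. The following is a minimal illustration of the functional forms in eq. (1) only, not CHARMM's implementation; the function names and all parameter values are hypothetical and chosen purely for the example.

```python
import math

def harmonic(k, x, x0):
    """Harmonic term K*(x - x0)**2 as written in eq. (1) (no factor of 1/2)."""
    return k * (x - x0) ** 2

def dihedral(terms, phi):
    """Sum of K*(1 + cos(n*phi - delta)) over (K, n, delta) tuples.

    Passing several tuples corresponds to expanding the term about one
    bond in a short Fourier series. Angles are in radians.
    """
    return sum(k * (1.0 + math.cos(n * phi - delta)) for k, n, delta in terms)

# Hypothetical parameters, for illustration only.
e_bond = harmonic(k=300.0, x=1.55, x0=1.53)              # bond stretched by 0.02
e_dihe = dihedral([(0.2, 3, 0.0)], math.radians(60.0))   # threefold term at its minimum
```

The same `harmonic` function covers the bond, valence angle, Urey–Bradley, and improper terms, since they share one functional form and differ only in the coordinate supplied.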
Nonbonded terms include Coulombic interactions between the point
charges ($q_i$ and $q_j$) and the Lennard–Jones (LJ) 6–12 term, which
is used for the treatment of the core-core repulsion and the
attractive van der Waals dispersion interaction. Nonbonded
interactions are calculated between all atom pairs within a
user-specified interatomic cutoff distance, except for covalently
bonded atom pairs (1,2 interactions) and atom pairs separated by two
covalent bonds (1,3 interactions). The relative dielectric constant,
$\epsilon$, is set to one in calculations with explicit solvent,
corresponding to the permittivity of vacuum, $\epsilon_0$. In
addition, the electrostatic term can be scaled using other values for
the dielectric constant or a distance-dependent dielectric; in the
latter, the electrostatic term is inversely proportional to
$r_{ij}^2$, the distance between the interacting atoms squared.
Expressions for $\epsilon$ used for implicit solvent model
calculations are discussed in Section III.D. CHARMM also contains an
explicit hydrogen bonding term, which is not used in the current
generation of CHARMM force fields, but remains as a supported energy
term for the purposes of facilitating model development and hydrogen
bonding analysis [42].
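As a hedged illustration of the electrostatic term, the sketch below evaluates the Coulomb pair energy with either a constant or a distance-dependent dielectric. The function name and the numerical conversion factor (which folds $1/4\pi\epsilon_0$ into units of kcal·Å/(mol·e²)) are assumptions of this example, not values taken from the text.

```python
# Assumed conversion factor for charges in units of e and distances in Å,
# giving energies in kcal/mol.
COULOMB_CONSTANT = 332.0716

def coulomb(qi, qj, r, eps=1.0, distance_dependent=False):
    """Pair energy q_i*q_j / (4*pi*eps0 * eps * r).

    With distance_dependent=True the effective dielectric is eps*r, so the
    energy falls off as 1/r**2 rather than 1/r.
    """
    dielectric = eps * r if distance_dependent else eps
    return COULOMB_CONSTANT * qi * qj / (dielectric * r)

# Unit charges of opposite sign, 2 Å apart.
e_constant = coulomb(1.0, -1.0, 2.0)                                   # eps = 1
e_screened = coulomb(1.0, -1.0, 2.0, eps=2.0, distance_dependent=True)
```

Under these assumptions the distance-dependent form weakens the same interaction by a factor of $\epsilon r$ relative to the unscreened one, which is the behaviour the paragraph describes.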
In the LJ term, the well depth is represented by
$\epsilon^{\min}_{ij}$, where $i$ and $j$ are the indices of the
interacting atoms, $r_{ij}$ is the interatomic distance, and
$R^{\min}_{ij}$ is the distance at which the LJ term has its minimum.
Typically, $\epsilon^{\min}_{ii}$ and $R^{\min}_{i}$ are obtained for
individual atom types and then combined to yield
$\epsilon^{\min}_{ij}$ and $R^{\min}_{ij}$ for the interacting atoms
via a standard combination rule. In the current CHARMM force fields,
the $\epsilon^{\min}_{ij}$ values are obtained via the geometric mean
($\epsilon^{\min}_{ij} = \sqrt{\epsilon^{\min}_{ii}\,\epsilon^{\min}_{jj}}$)
and $R^{\min}_{ij}$ via the arithmetic mean,
$R^{\min}_{ij} = (R^{\min}_{i} + R^{\min}_{j})/2$. Other LJ combining
rules are also supported, e.g.,
$R^{\min}_{ij} = \sqrt{R^{\min}_{i} R^{\min}_{j}}$, allowing for the
use of alternative force fields in CHARMM (see later). Separate
LJ parameters and a scaling factor for electrostatics can be used
for the nonbonded interactions between atoms separated by three
covalent bonds (1,4 interactions). The Buckingham potential [43]
has recently been added as an alternative to the simple LJ for
treating the core repulsion. The Morse potential [44], often used for
bond-breaking, is also implemented.
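The combination rules and the LJ term of eq. (1) can be written out as a short sketch. This is an illustration of the formulas as stated in the text, with hypothetical function names and parameter values; it treats the well depth as a positive number, which is an assumption of the example.

```python
import math

def combine(eps_i, eps_j, rmin_i, rmin_j, geometric_rmin=False):
    """Return (eps_ij, rmin_ij) from per-atom-type values.

    eps_ij is always the geometric mean; rmin_ij is the arithmetic mean by
    default, or the geometric mean when geometric_rmin is True.
    """
    eps_ij = math.sqrt(eps_i * eps_j)
    if geometric_rmin:
        rmin_ij = math.sqrt(rmin_i * rmin_j)
    else:
        rmin_ij = 0.5 * (rmin_i + rmin_j)
    return eps_ij, rmin_ij

def lennard_jones(eps_ij, rmin_ij, r):
    """LJ 6-12 term of eq. (1): eps * [(Rmin/r)**12 - 2*(Rmin/r)**6]."""
    x6 = (rmin_ij / r) ** 6
    return eps_ij * (x6 * x6 - 2.0 * x6)

# Hypothetical per-type parameters (kcal/mol and Å).
eps_ij, rmin_ij = combine(0.1, 0.4, 3.0, 4.0)
e_at_minimum = lennard_jones(eps_ij, rmin_ij, rmin_ij)
```

Evaluating the term at $r_{ij} = R^{\min}_{ij}$ returns the negative of the well depth, which is a quick consistency check on the 12 and 6 coefficients.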
The simple form for the potential energy used in eq. (1) rep-
resents a compromise between accuracy and speed. For biomole-
cules at or near room temperature, the harmonic representation
is generally adequate, though approximate, and the same holds
true for the use of the LJ potential for the van der Waals inter-
actions. However, alternative force fields with additional correc-
tion terms are available in CHARMM (Section III.B.) and can
be used to check the results obtained with eq. (1). The earliest
force field in CHARMM was based on an extended-atom (united
atom) model, in which no hydrogen atoms were included explic-
itly. The omitted hydrogens were treated instead as part of the
atom to which they were bonded [45,46]. These ‘extended atom’
force fields typically required the explicit hydrogen bonding
term mentioned earlier. A significant advance beyond the early
models was based on the finding that the distance and angle
dependencies of hydrogen bonds could be treated accurately by
the LJ and electrostatic terms alone if the so-called polar
hydrogens (OH and NH) were treated explicitly [47]. This eliminated
the need for the inclusion of explicit hydrogen bonding terms and
led to the creation of PARAM19 [48], called ‘the polar hydrogen
model’ for simulations of proteins. This model, which was first
developed in the mid 1980s [47], is still widely used, particularly
in simulations of proteins with an implicit treatment of the solvent
(Section III.D.).
All-atom representations are the basis of the present generation
of CHARMM force fields and were designed for simulations with
explicit solvent. In these force fields, an effort was made to
optimize the parameters using model compounds representative of
the moieties that make up the macromolecules [49]. Testing was done
against a variety of experimentally determined structural and
thermodynamic properties of model compounds and macromolecules,
augmented by QM calculations. A balance
of polar interactions (e.g., hydrogen bonds) between protein–protein,
protein–water, and water–water interactions was maintained in the
parameterization. CHARMM uses a slightly modified form of the TIP3P
water model [50], which includes LJ parameters for the hydrogens as
well as the oxygen [48,51]. The properties of the model are not
significantly altered [52–54], because the hydrogens
($r_{\min}$ = 0.2245 Å) are well inside the van der Waals spheres of
the oxygens ($r_{\min}$ = 1.7682 Å, OH bond length = 0.9572 Å). The
modification was introduced to avoid singularities in the use of
integral equations for representing the solvent [55]; it is not
important for explicit-solvent MD simulations.
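The geometric argument about the modified TIP3P hydrogens can be checked with simple arithmetic. The sketch below assumes, for illustration, that each quoted $r_{\min}$ value is read as the radius of a sphere centred on its atom; the variable names are hypothetical.

```python
# Values quoted in the text, in Å.
R_MIN_H = 0.2245   # hydrogen r_min
R_MIN_O = 1.7682   # oxygen r_min
OH_BOND = 0.9572   # O-H bond length

# Farthest point of the hydrogen sphere, measured from the oxygen centre.
h_outer_reach = OH_BOND + R_MIN_H

# Positive margin means the hydrogen sphere lies entirely inside the oxygen sphere.
margin = R_MIN_O - h_outer_reach
```

Under that reading the hydrogen sphere extends to about 1.18 Å from the oxygen centre, leaving a margin of roughly 0.59 Å inside the oxygen sphere.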
Currently, the all-atom models in CHARMM include the CHARMM22
force field for proteins [56], the CHARMM27 force field for nucleic
acids [39,41], and force fields for lipids [57–59]. A limited set of
parameters for carbohydrates is available [60], with a more extensive
set under development [61] (Brady, J. W.; Pastor, R. W.;
MacKerell, A. D., Jr.; work in progress).
These force fields have been designed to be compatible,
allowing for studies of heterogeneous systems. The nucleic acid
and lipid force fields are significant improvements over earlier
all-atom models produced in the 1990s [62,63]; the gains were
achieved through extensive testing with macromolecular simulations
and improved QM benchmarks [59]. In addition, force field
parameters are available for a variety of modified protein and
nucleic acid moieties and prosthetic groups [41,64,65]. Moreover, a
description of the appropriate methods for extending the
CHARMM all-atom force fields to new molecules or moieties
has been published [49], and tools for carrying out this type of
extension are available via the CHARMM Web page at
http://www.charmm.org. The all-atom CHARMM force fields, with a
few improvements described later, have been applied to many
different systems and shown to be adequate for quantitative
studies (e.g., free energy simulations). Separately, an extended
version of the CHARMM all-atom force fields for the treatment
of candidate drug-like molecules is currently under development.
Combined with a flexible parameter reader and automated RTF
generation, this ‘generalized’ force field will be particularly
useful for screening of drug candidates (Brooks, B. R.; MacKer-
ell, A. D., Jr.; work in progress).
III.B. Additional Supported Force Fields
Access to multiple, highly optimized, and well-tested force fields
for simulations of biological macromolecules is useful for
assessing the robustness of the computational results. In addition
to the force fields developed specifically for CHARMM, versions of
the AMBER nucleic acid and protein force fields [66,67], the OPLS
protein force fields [68] with the TIP3P or TIP4P water models
[50,69], and the nucleic acid force field from Bristol-Myers Squibb
[70] have been integrated for use with other parts of the CHARMM
program. The SPC [71], SPC/E [72], and ST2 [73] water models are also
available. A recent comparison of simulations with the CHARMM22,
AMBER, and OPLS force fields showed that the three models give good
results that are similar for the structural properties of three
proteins [69]. Since that study, the CHARMM force field has been
improved by adding a spline-based 2D dihedral energy correction term
(CMAP) for the protein backbone (see Section III.C.) [74]. For the
free energy of hydration of 15 amino acid side chain analogs, the
CHARMM22, AMBER, and OPLS force fields yielded comparable deviations
(of about 1 kcal/mol) from the experimental values [75,76]. A
simulation of the conformational dynamics of the eight principal
deoxyribo- and ribonucleosides using long explicit-solvent
simulations showed that the CHARMM27 force field yields a description
in agreement with experiment and provides an especially accurate
representation of the ribose moiety [77]. This study also details a
comparison of simulations using the CHARMM27 and AMBER nucleic acid
force fields, performed with CHARMM. A simulation study described by
Reddy et al. [78] compares the different force fields available in
CHARMM for B-DNA oligomers. In addition, CHARMM has been shown to
yield quantitative agreement with NMR imino proton exchange
experiments on base opening [79–81].
CHARMM also includes the Merck Molecular Force Field (MMFF) [82,83]
and the Consistent Force Field (CFF) [84,85]. These force fields use
so-called ‘Class II’ potential energy functions that differ from that
in eq. (1) by the addition of cross terms between different internal
coordinates (e.g., terms that couple the bond lengths and angles) and
alternative methods for the treatment of the nonbonded interactions.
The CFF force field is based on the early force field of Lifson and
Warshel [86]. The MMFF force field is specifically designed to be
used within the
CHARMM program for the study of a wide range of organic
compounds of pharmaceutical interest. CHARMM is able to
read PDB, MERCK, or MOL2 formatted files, including MOL2
databases, so as to support large-scale virtual drug screening.
Also, a script is available that transforms the MMFF parameter-
ization for a given molecule so as to be consistent with the
standard CHARMM force field.
III.C. Recent Extensions and Current Developments
Improved Backbone Dihedral Angle Potential
An important advance for the accurate calculation of the internal
energies of biomolecules is the introduction of a multidimensional
spline fitting procedure [74,87]. It allows for any target energy
surface associated with two dihedral angles to be added to the
potential energy function in eq. (1). The use of the spline
function, referred to as CMAP, corrects certain small systematic
errors in the description of the protein backbone by the all-atom
CHARMM force field. The CMAP correction, which is based on
ab initio QM calculations, as well as structure-based potentials
of mean force, significantly improves the structural and dynamic
results obtained with MD simulations of proteins in crystalline
and solution environments [74,88]. Additional simulations have
shown improved agreement with NH order parameters as measured by
NMR [89]. The spline function is expected to be generally useful for
improving the representation of the internal flexibility of
biopolymers when the available data indicate that corrections are
required [90].
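To make the idea of a grid-based two-dihedral correction concrete, the sketch below interpolates a periodic table of correction energies at an arbitrary ($\varphi$, $\psi$) pair. It deliberately substitutes bilinear interpolation for the spline described in the text, so it shows only the lookup structure and periodic wrapping; the function name, grid layout, and grid values are all hypothetical.

```python
def cmap_bilinear(grid, phi, psi):
    """Interpolate a periodic correction grid at (phi, psi), in degrees.

    grid[i][j] is the correction at phi = -180 + i*step and
    psi = -180 + j*step, with step = 360/len(grid). Bilinear interpolation
    is a simplified stand-in for a spline, used here only for illustration.
    """
    n = len(grid)
    step = 360.0 / n
    u = ((phi + 180.0) % 360.0) / step
    v = ((psi + 180.0) % 360.0) / step
    i0, j0 = int(u) % n, int(v) % n
    i1, j1 = (i0 + 1) % n, (j0 + 1) % n   # wrap around the periodic boundary
    fu, fv = u - int(u), v - int(v)
    return ((1 - fu) * (1 - fv) * grid[i0][j0] + fu * (1 - fv) * grid[i1][j0]
            + (1 - fu) * fv * grid[i0][j1] + fu * fv * grid[i1][j1])

# Hypothetical 4 x 4 grid (90 degree spacing) of correction energies.
grid = [[float(i + j) for j in range(4)] for i in range(4)]
midpoint = cmap_bilinear(grid, -135.0, -180.0)   # halfway between two grid points
```

A smoother interpolant is needed in practice because forces require continuous derivatives of the surface, which bilinear interpolation does not provide at grid lines.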
Treatment of Induced Polarization
A refinement in the fixed charge distribution of the standard
CHARMM biomolecular force field is the incorporation of the
influence of induced electronic polarization. Polarization is
expected to have particularly important effects on the structure,
energetics, and dynamics of systems containing charged (e.g.,
metal ions) or highly polar species. There is also an indication
that polarization effects can be significant in accurately modeling
the nonpolar hydrocarbon core of lipid membranes [91,92].
Although the physics of polarization is well understood, there
are problems associated with introducing it into biomolecular
simulations. They concern the choice of a suitable mathematical
representation, the design of efficient computational algorithms,
and the reparameterization of the force field. The three most
promising representations are the fluctuating charge model
introduced by Rick and Berne [93], which is based on the
charge-equalization principle [94], the classical Drude oscillator
model (also called the Shell model) [95], and the induced point
dipole model [96–98]. Patel and Brooks [99] have developed and tested
a polarizable CHARMM force field for proteins based on a
charge-equalization scheme (CHEQ module). It is currently being used
in molecular simulations to explore the role of electronic
polarizability in proteins and peptides in solution [99,100], at
phase boundaries in alcohols [101,102] and alkanes [103], and in the
conductance of ion channels [92]. MacKerell, Roux and coworkers
are exploring a polarizable model based on the classical Drude
oscillator methods [104] and have developed the SWM4-DP polarizable
water model [105,106], which has been used to simulate DNA in
solution [107]. A recent parameterization of alkanes [108], alcohols
[109,110], aromatics [111], ethers [112], amides [113], and small
ions [114] demonstrates the ability of Drude oscillator-based
polarizabilities to reproduce a set of experimental observables that
are incorrectly modeled by force fields with fixed charges. Examples
include the dielectric constants of neat alkanes [108], water–ethanol
mixtures with concentrations that vary over the full molar fraction
range [109,113], and liquid N-methylacetamide, as well as the
excess concentration of large, polarizable anions found at the
air–water interface [115–118]. Gao and coworkers have used
polarizable intermolecular potential functions, PIPFs, that model
electronic polarization with an induced point dipole approach to
study polarization effects in a series of organic liquids including
alkanes, alcohols, and amides [96,98,119]; the results obtained with
the induced-dipole model were found to be in good accord with
those obtained from combined QM/MM simulations in which
polarization effects were introduced with QM calculations.
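For readers unfamiliar with the Drude oscillator picture, the following sketch shows how a charged auxiliary particle on a harmonic spring produces an induced dipole. It uses the textbook relations for a single isolated oscillator in a uniform static field (force balance $k_D d = q_D E$, hence polarizability $\alpha = q_D^2 / k_D$), which is an assumption of this example rather than a description of CHARMM's implementation; names and values are hypothetical and units are arbitrary but consistent.

```python
def drude_polarizability(q_d, k_d):
    """Isotropic polarizability of one Drude oscillator: alpha = q_d**2 / k_d."""
    return q_d ** 2 / k_d

def induced_dipole(q_d, k_d, field):
    """Induced dipole of an isolated oscillator relaxed in a static field.

    The Drude particle is displaced until the spring force balances the
    electrostatic force: k_d * d = q_d * field.
    """
    displacement = q_d * field / k_d
    return q_d * displacement

alpha = drude_polarizability(q_d=2.0, k_d=4.0)
mu = induced_dipole(q_d=2.0, k_d=4.0, field=0.5)   # equals alpha * field
```

In a many-particle simulation the oscillators interact with each other and with the permanent charges, so the relaxed positions are no longer given by this closed form and must be found iteratively or propagated dynamically.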
In all three induced polarization methods, the polarization
is modeled as additional dynamical degrees of freedom that are
propagated according to extended Lagrangian algorithms. This
treatment avoids the need to introduce computationally inefficient
approaches based on iterative self-consistent field (SCF)
methods [104,120]. Efforts are currently underway to obtain complete
sets of protein, nucleic acid, and lipid parameters for these
polarizable force fields.
The polarizable models described here represent ongoing
combined code and parameter developments that will be incor-
porated into the next generation of CHARMM force fields. Once
this has been accomplished, it will be possible to carry out addi-
tional comparative studies (i.e., simulations with and without
polarization) to determine the types of problems for which the
use of such polarizable force fields is important.
III.D. Implicit Solvent Methods
Although MD simulations in which a large number of solvent
molecules are included provide the most detailed representation
of a solvated biomolecular system (see later), incorporating the
influence of the solvent implicitly via an effective mean-field
potential can provide a cost-efficient alternative that is suffi-
ciently accurate for solving many problems of interest. Although
implicit solvent simulations have computational requirements
(CPU and memory) that can be close to those for vacuum calcu-
lations, they avoid many of the artifacts present in the latter,
such as large deviations from crystal structures, excessive num-
bers of salt bridges, and fluctuations that are too small relative
to crystallographic B factors. The reduction in computer time
obtained with implicit models, relative to the use of an explicit
solvent environment, can be important for problems requiring
extensive conformational searching, such as simulations of peptide
and protein folding [121–123] and studies of the conformational
changes in large assemblies [122,124]. Implicit solvent approaches
allow the estimation of solvation free energies while avoiding
the statistical errors associated with averages extracted from
simulations with a large number of solvent molecules. Examples
of this type of approach are the MM/GBSA or MM/PBSA
approaches to approximate free energies [125], $pK_a$ calculations
for ligands in a protein environment [126–129], and scoring protein
conformations in ab initio folding or homology modeling studies
[130–133]. An implicit solvent also permits arbitrarily large
atomic displacements of the solute without solvent clashes, leading
to more efficient conformational sampling in Monte Carlo
and grid-based algorithms. Recently developed implicit membrane
models, by analogy with implicit water (or other solvent)
models, facilitate the study of proteins embedded in membranes
[134–139]. Implicit solvent representations are also useful as
conceptual tools for analyzing the results of simulations generated
with explicit solvent molecules and for better understanding
the nature of solvation phenomena [140,141]. Finally, the
instantaneous solvent relaxation that is inherent in implicit
solvation models is useful for the study of macromolecular
conformational changes over the ‘simulation-accessible’ nanosecond
or shorter timescales, as in forced unfolding MD simulations of
proteins [142], versus the experimental microsecond to millisecond
timescales.
Treating the solvent explicitly in this type of calculation can
introduce artifacts because of possible coupling between the sol-
vent relaxation, which occurs on the nanosecond timescale, and
the sped-up conformational change.
Several implicit solvent approaches are available in
CHARMM, which effectively extend the number of available
force fields in the program. The implicit solvent models differ
both in their theoretical framework (e.g., the surface area-based
empirical solvation potentials versus the approximate continuum
models based on generalized Born theory) and in their
implementation. A comparison of five of the effective (implicit
solvent) free energy surfaces for three peptides known to have stable
conformations in solution is presented by Steinbach [143]. Good
agreement between results obtained with implicit and explicit
solvent has been observed for the potential of mean force (PMF)
as a function of the end-to-end distance of a 12-residue peptide
[144] and as a function of the radius of gyration of a six-residue
peptide [145]. The implicit solvent methods currently available
in CHARMM are outlined below. A comparison of the speeds
of several of the methods with vacuum and explicit solvent cal-
culations is also presented.
Solvent-Accessible Surface Area Models
One of the earliest and simplest implicit solvent models implemented
in CHARMM, and currently the fastest one in the program, is based on
the solvent-accessible surface area (SASA) [146]. Models of this kind
make the assumption that the solvation free energy of each part of a
molecule is proportional to its SASA; i.e., they approximate the
contribution arising from solute interactions with the first
solvation shell by use of a term that is a sum of all of these
individual ‘self-energy’ contributions. In the original formulation
by Eisenberg and coworkers [147,148], the solvation free energy term
was expressed as $G_H = \sum H_i f_i + C_i$, where $H_i$ is the
hydrophobicity of an individual protein residue, $f_i$ is the
fraction of the residue’s surface that is available to solvent, the
$C_i$’s are constants, and the sum is over all residues in the
molecule. The method was subsequently refined by the introduction of
atomic solvation parameters (ASPs), which are the atomic analogues of
the $H_i$ factors, and the solvation energy term was written as a sum
over individual atomic contributions (without the constant terms)
[147,148]. This form of the SASA model has largely replaced
the Wesson and Eisenberg formulation, although the latter is still
available in CHARMM (along with a derivative form for membranes).
The current CHARMM implementation of the SASA model [149] uses the
polar hydrogen (PARAM19) potential energy, has two ASPs, calculates
the SASA analytically [150], and includes approximate solvent
shielding effects for the charges. One ASP value in the CHARMM SASA
model is negative, favoring the direct solvation of polar groups, and
the other is positive, approximating the hydrophobic effect on
nonpolar groups [149]. The two parameters were optimized to be
consistent with the simplified treatment of electrostatic
interactions based on the neutralization of charged groups [151] and
the use of distance-dependent dielectric screening (with
$\epsilon(r) = 2r$). The charge neutralization and
distance-dependent dielectric address, in an approximate way,
solvent shielding of the electrostatic interactions that is not
accounted for in the simpler SASA-based solvation models. How-
ever, in the present approach the shielding does not depend on the
environment (i.e., given the same interatomic distance, a pair of
charges in the interior of a protein feels the same screening as a
pair of charges at the protein surface) so that it is most accurate
for peptides and small proteins, where most of the atoms are on or
near the surface. The change in the SASA, as a function of the
system coordinates, can be used to obtain forces for minimization
and dynamics. In part because the surface area calculation is ana-
lytic and based on interatomic distances, the SASA model is fast
and has been shown to be useful in computationally demanding
problems, such as the analysis of interactions in icosahedral viral
capsids [152]. The two-ASP SASA model has been used for
investigating the folding mechanism of structured peptides [153–156]
and small proteins [157], as well as the reversible mechanical
unfolding of a helical peptide [158]. Moreover, simulations of the
early steps of aggregation of amyloid-forming peptides using the SASA
model have provided evidence of the importance of side chain
interactions [159,160] and elucidated the role of aggregation
‘hot-spots’ along the polypeptide sequence [161]. Because of the
efficiency of the two-ASP SASA model [149], most of the studies
mentioned involved simulations of several microseconds in length,
which have yielded adequate sampling of the peptide systems at
equilibrium. A SASA model based on the all-atom representation is
also present in CHARMM as part of the RUSH module [162] (see
CHARMM documentation).
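The atomic-solvation-parameter form of the SASA energy is a weighted sum of exposed areas, which the sketch below spells out for a two-parameter model. The function name, the class labels, the ASP values, and the areas are hypothetical and serve only to show the sign convention described above (negative for polar groups, positive for nonpolar groups).

```python
def sasa_solvation_energy(exposed_areas, asp):
    """Sum of ASP * exposed area over atoms.

    exposed_areas: iterable of (atom_class, area) pairs, area in Å^2.
    asp: mapping from atom class to its solvation parameter in kcal/(mol Å^2).
    """
    return sum(asp[atom_class] * area for atom_class, area in exposed_areas)

# Hypothetical two-parameter set: polar atoms are favourably solvated,
# nonpolar atoms pay a hydrophobic penalty.
ASP = {"polar": -0.06, "nonpolar": 0.012}

energy = sasa_solvation_energy([("polar", 20.0), ("nonpolar", 50.0)], ASP)
```

Because the energy depends on coordinates only through the exposed areas, forces follow from the derivatives of those areas with respect to atomic positions, which is why an analytic area calculation matters for dynamics.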
Gaussian Solvation Free Energy Model (EEF1)
A related model, referred to as EEF1 [151], combines an
excluded-volume implicit solvation model with a modified version of
the polar hydrogen energy function (PARAM19 atomic representation).
The model is similar in spirit to SASA/ASP but does not
require the calculation of the SASA. In EEF1, as in the SASA/ASP
model, the solvation free energy is considered to be the
sum of contributions from the system’s constituent elements.
The solvation free energy of each group of atoms in the EEF1
model is equal to the solvation free energy that the same group
has in a reference (model) compound, minus the solvation lost
due to the presence of other protein groups around it (solvent
exclusion effect). A Gaussian function is used to describe the
decay of the solvation free energy density with distance. Group
contributions to the solvation free energy were obtained from an
analysis of experimental solvation free energy data for model
compounds [163,164]. In addition to the solvent-exclusion effect, the
dielectric screening of electrostatic interactions by water is
accounted for by the use of a distance-dependent dielectric constant
and the neutralization of ionic side chains; the latter is
essential for the EEF1 model, and was also adopted in the two-ASP
SASA model [149,153]. MD simulations with EEF1 are about
1.7 times slower than vacuum simulations but significantly faster
than most of the other solvation models in CHARMM (see
later). The model has been tested extensively. It yields modest
deviations from crystal structures in MD simulations at room
temperature and unfolding pathways that are in satisfactory
agreement with explicit solvent simulations. The model has been
used to discriminate native conformations from misfolded decoys [130]
and to determine the folding free energy landscape of a β-hairpin
[165,166]. Other studies include the exploration of partially
unfolded states of α-lactalbumin [167], a series of studies of
protein unfolding [142,168–170], the investigation of coupled
unfolding/dissociation of the p53 tetramerization domain [171], the
identification of stable building blocks in proteins [172], an
analysis of the energy landscape of polyalanine [173], an analysis of
the heat capacity change on protein denaturation [174], the packing
of secondary structural elements of proteins into the correct
tertiary structural folds [175], and calculations of the
contributions to protein–ligand binding free energies [176]. EEF1 has
been used by Baker and coworkers in successful protein–protein
docking [177] and protein design studies [178]. An implicit membrane
model based on EEF1 is available in CHARMM [135]. An updated
parameterization based on PMF calculations for ionizable side
chains [179] is referred to as EEF1.1 [135]. EEF1 has also been
adapted for use with the all-atom CHARMM22 energy function [180],
but this formulation has not yet been extensively tested.
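The reference-minus-exclusion structure of a Gaussian solvent-exclusion model can be sketched as follows. The text states only that a Gaussian describes the decay of the solvation free energy density; the specific normalisation used here (the density integrated over all space outside the group recovers approximately `dg_free`) is an assumption of this example, and all names and parameter values are hypothetical.

```python
import math

def solvation_density(r, dg_free, r_vdw, lam):
    """Gaussian solvation free energy density of a group at distance r.

    Assumed form: f(r) * 4*pi*r**2 = alpha * exp(-x**2), with
    x = (r - r_vdw) / lam and alpha = 2 * dg_free / (sqrt(pi) * lam).
    """
    x = (r - r_vdw) / lam
    alpha = 2.0 * dg_free / (math.sqrt(math.pi) * lam)
    return alpha * math.exp(-x * x) / (4.0 * math.pi * r * r)

def group_solvation(dg_ref, dg_free, r_vdw, lam, neighbours):
    """Reference solvation free energy minus what neighbouring groups exclude.

    neighbours: iterable of (distance, volume) pairs for surrounding groups.
    """
    lost = sum(solvation_density(r, dg_free, r_vdw, lam) * vol
               for r, vol in neighbours)
    return dg_ref - lost

# Hypothetical polar group: fully exposed, then with one neighbour 4 Å away.
exposed = group_solvation(-5.0, -5.0, 2.0, 3.5, [])
buried = group_solvation(-5.0, -5.0, 2.0, 3.5, [(4.0, 20.0)])
```

With a favourable (negative) reference value, each neighbouring group makes the result less negative, which is the solvent exclusion effect described in the paragraph above.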
Screened Coulomb Potentials Implicit Solvent Model (SCPISM)
The SCPISM continuum model uses a screened Coulomb potential
to describe solvent-shielded interactions, based on the Debye
theory of liquids [181,182]. In the SCPISM model, the standard
electrostatic component of the force field (Coulomb interaction in
vacuo) is replaced by terms that describe both the screened
electrostatic interactions and the self-energy of each atom. Hydrogen
bonding modulation [183] and nonelectrostatic solvent-induced
forces (e.g., hydrophobicity) are included in the recent version.
The current implementation in CHARMM can be used for
energy evaluations, minimization, and MD simulations. It has
recently been shown that the SCPISM model preserves the main
structural properties of proteins (of up to 75 amino acids) in
long (>35 ns) Langevin dynamics simulations, as well as hydrogen
bond patterns of residues at the protein/solvent interface [88].
For a 15,000-atom system, MD simulations with this method
(using an all-atom model) are approximately five times slower
than with EEF1 (which uses a polar hydrogen model representation).
Implicit Solvent with Reference Interaction Site Model (RISM)
The RISM module in CHARMM implements the reference inter-
action site model.
184
This is based on an approximate statistical
mechanical theory that involves the site–site Ornstein–Zernike
integral equation and makes possible the calculation of the aver-
age solvent radial pair correlation function around a molecular
solute. The calculated site–site radial distribution functions g(r)
and pair correlation functions c(r) can then be used to determine
quantities such as the PMF between two solvated molecules, and
the excess chemical potential of solvation of a solute in a sol-
vent. The method was first used to characterize the effect of sol-
vent on the flexibility of alanine dipeptide.
55
The change in the
solvent g(r) on solvation can be determined, which allows for
the decomposition of the excess chemical potential into the
energy and entropy of solvation.
185
Further development would
be required for the application of the method to larger peptides
and small proteins, which is now feasible given the availability
of fast computers.
186
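As a small illustration of how a radial distribution function leads to a PMF, the sketch below applies the standard relation W(r) = -k_B T ln g(r). It is not the CHARMM RISM implementation; the function name `pmf_from_rdf`, the choice of kcal/mol units, and the sample g(r) values are assumptions made for this example.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pmf_from_rdf(g_values, temperature=300.0):
    """Return W(r) = -k_B T ln g(r) for each sampled g(r) value.

    Points where g(r) is zero are mapped to +inf (never sampled).
    """
    kt = K_B * temperature
    return [math.inf if g <= 0.0 else -kt * math.log(g) for g in g_values]

# Hypothetical g(r) samples: excluded core, first peak, bulk limit
g_samples = [0.0, 2.5, 1.0]
w = pmf_from_rdf(g_samples)
```

A peak in g(r) above 1 gives a negative (favorable) PMF, and the bulk limit g(r) = 1 gives zero.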
Poisson–Boltzmann (PB) Continuum Electrostatics
The PB equation provides the basis for the most accurate contin-
uum models of solvation effects on electrostatic interactions.
Thus PB models are used as the standards for other continuum
models, but have the drawback that they are computationally in-
tensive, though still less costly than the use of explicit solvent.
The linearized PB equation for macroscopic continuum media
has the form:
r eðrÞr/ðrÞðÞðÞjðrÞ
2
/ðrÞ¼4pqðrÞ (2)
where / is the electrostatic potential and e, j and q are the spa-
tially varying dielectric constant, ionic screening, and atomic
charge density, respectively. This formulation is based on the
assumption that, at a given position in space, the polarization
density of the solvent and the local cationic and anionic den-
sities are linearly proportional to the local electric field and local
electrostatic potential, respectively. At physiologic ionic strength
and lower charge densities, the linear and nonlinear forms of the
PB equation give equivalent results
187
; use of the nonlinear
form, which is more computationally costly, is recommended in
cases where the charge density is too high for the linear approxi-
mation to hold. This can be true at low ionic strength for nucleic
acid systems. In the CHARMM program (PBEQ module), the
PB equation is solved numerically using an iterative finite-differ-
ence relaxation algorithm
188,189
by mapping the system (i.e., $\epsilon$,
$\kappa$, and $\rho$) onto a discrete spatial grid. The PBEQ module can
handle the linear and nonlinear forms of the PB equation, as
well as a partially linearized form inspired by the 3D-PLHNC
closure of Kovalenko and Hirata.
190
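To make the idea of finite-difference relaxation concrete, the following sketch solves a one-dimensional analogue of the linearized PB equation by Gauss-Seidel iteration. It is a toy model under stated assumptions (one dimension, uniform grid, Dirichlet boundaries, hypothetical function name `solve_linear_pb_1d`) and does not reproduce the PBEQ module's algorithm or interface.

```python
import math

def solve_linear_pb_1d(eps, kappa2, rho, h, phi_left, phi_right,
                       max_iter=50000, tol=1e-10):
    """Gauss-Seidel relaxation for d/dx(eps dphi/dx) - kappa2*phi = -4*pi*rho.

    eps, kappa2, rho are lists sampled on a uniform grid of spacing h;
    the potential is held fixed at both ends (Dirichlet boundaries).
    """
    n = len(rho)
    phi = [0.0] * n
    phi[0], phi[-1] = phi_left, phi_right
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(1, n - 1):
            # Dielectric constant at the midpoints between grid nodes
            e_plus = 0.5 * (eps[i] + eps[i + 1])
            e_minus = 0.5 * (eps[i] + eps[i - 1])
            new = (e_plus * phi[i + 1] + e_minus * phi[i - 1]
                   + 4.0 * math.pi * rho[i] * h * h)
            new /= e_plus + e_minus + kappa2[i] * h * h
            max_change = max(max_change, abs(new - phi[i]))
            phi[i] = new
        if max_change < tol:
            break
    return phi

# Uniform medium, no fixed charge: the potential decays roughly as exp(-kappa*x)
n, h = 101, 0.1
phi = solve_linear_pb_1d([1.0] * n, [1.0] * n, [0.0] * n, h, 1.0, 0.0)
```

In this test case the screened potential at x = 1 is close to exp(-1), the Debye-Hückel decay expected for a unit screening parameter.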
For the linear PB model,
the electrostatic solvation free energy is calculated as
$$\Delta G_{\mathrm{elec}} = \frac{1}{2}\sum_i q_i\,\phi_{\mathrm{rf}}(i) \qquad (3)$$
where $q_i$ is the charge on particle i and $\phi_{\mathrm{rf}}(i)$ is the reaction field
at the position of particle i (usually obtained by subtracting the
electrostatic potential in vacuum from that calculated with the
dielectric solvent environment). This can also be expressed as
191
$$\Delta G_{\mathrm{elec}} = \frac{1}{2}\sum_{i,j} q_i\,M_{\mathrm{rf}}(i,j)\,q_j \qquad (4)$$
where $M_{\mathrm{rf}}(i,j)$ is the reaction field Green function matrix. The
PBEQ module in CHARMM
191,192
computes the electrostatic
potential and the solvation free energy using this approach. The
accuracy of continuum electrostatic models is sensitive to the
choice of the atomic radii used for setting the dielectric bound-
ary between the solute and the solvent. For accurate PB calcula-
tions with the PBEQ module, optimized sets of atomic protein
and nucleic acid Born-like radii have been determined using
MD simulations and free energy perturbation calculations with
explicit water molecules.
192,193
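The two expressions for the solvation free energy, eqs. (3) and (4), can be sketched in a few lines. This is only an illustration: the reaction field matrix here is supplied by hand (analytically for a single ion at the center of a sphere), the function names are hypothetical, and the conversion factor `COULOMB` is an assumed value rather than one taken from CHARMM.

```python
COULOMB = 332.0637  # kcal*Angstrom/(mol*e^2); an assumed conversion factor

def reaction_field_potential(m_rf, charges):
    """phi_rf(i) = sum_j M_rf(i, j) * q_j."""
    return [sum(m_ij * q_j for m_ij, q_j in zip(row, charges)) for row in m_rf]

def solvation_energy(m_rf, charges):
    """Electrostatic solvation free energy, 1/2 * sum_i q_i * phi_rf(i)."""
    phi_rf = reaction_field_potential(m_rf, charges)
    return 0.5 * sum(q_i * phi_i for q_i, phi_i in zip(charges, phi_rf))

def born_ion_matrix(radius, eps_in=1.0, eps_out=80.0):
    """Analytic 1x1 reaction field matrix for an ion at the center of a sphere."""
    return [[-COULOMB * (1.0 / eps_in - 1.0 / eps_out) / radius]]

# Unit charge in a 2 Angstrom cavity
dg = solvation_energy(born_ion_matrix(2.0), [1.0])
```

For the single ion this reduces to the Born formula; in a real calculation the matrix elements would come from the numerical PB solution rather than a closed form.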
Continuum electrostatic calcula-
tions with the optimized atomic radii provide an implicit solvent
approach that is generally useful; examples are the studies of
nucleic acids and their complexes with proteins
194,195
and of
MM/PBSA calculations on kinase inhibitor affinities.
196
The
PBEQ module also has a number of features that can be used in
electrostatic calculations related to biological membranes.
32,197
In particular, it can be employed to calculate the transmembrane
potential profile and the induced capacitive surface charge corre-
sponding to a given transmembrane potential difference, which
is essential for examining conformational changes driven by an
electrostatic voltage difference across the membrane.
197,198
In addition to the standard Dirichlet boundary conditions
(fixed potential on the edge of the grid), a number of options for
imposing alternative boundary conditions on the edge of the finite
grid are available; they include conducting boundary conditions
(zero electrostatic potential), periodic boundary conditions in
three dimensions, and planar periodic boundary conditions in two
dimensions. The latter are useful for calculations involving planar
membranes. The average electrostatic potential over user-speci-
fied parts of the system can also be calculated (PBAVerage sub-
command); this is used, for example, in charge-scaling proce-
dures. It is also possible to use the result from a coarse grid to set
up the boundary conditions of a finer grid, focusing on a small
region of interest. The PBEQ module is not limited to the most
common applications of the finite-difference PB equation, which
involve determining the effective solvation of a solute in a given
conformation. An accurate method for calculating the analytic
first derivative of the finite-difference PB solvation free energy
with respect to the atomic coordinates of the solute (electrostatic
solvation forces) has also been implemented.
191
It allows the
PBEQ module to be used in combination with several of the other
tools available in CHARMM for investigating the properties of
biological macromolecules (i.e., energy minimization, MD,
reaction path optimization, normal modes, etc.). Since the PB cal-
culation treats the effect of solvent only on the electrostatic inter-
actions, it is often combined with methods for estimating the
hydrophobic contribution. The simplest one approximates the
term as proportional to the SASA, but in recent years more
sophisticated approaches have been developed. For example,
AGBNP in the Impact program
199
and PBSA in Amber
200
account for both cavity and solute–solvent dispersion interactions.
Smooth ‘Conductor-Like Screening Model’
(COSMO) Solvation Model
Solvation boundary element methods based on the COSMO
201
model have proved to be stable and efficient. This model relies
on an electrostatic variational principle that is exact for a
conductor, and with certain corrections, provides useful, approxi-
mate results for many solvents over a broad range of dielectric
constants.
202–204
For such a model, the solvent reaction field potential can be
represented as the potential arising from a surface charge distri-
bution that lies at the dielectric boundary. This allows study of a
two-dimensional surface problem instead of a three-dimensional
volume problem. An advantage is that it is often easier to refine
the discretization of the two-dimensional boundary element sur-
face than to increase the resolution of a three-dimensional grid
in a finite-difference PB calculation. In the COSMO approach,
the numerical solution of the variational problem involves the
discretization of the cavity surface into tesserae that are used to
expand the solvent polarization density from which the reaction
field potential is derived. A difficulty that can arise in the sur-
face discretization used in these methods involves ensuring con-
tinuity of the solvation energy and its derivatives with respect to
the atomic coordinates, which is critical for stable molecular
mechanics optimization procedures and dynamics simulations.
The smooth COSMO method developed by York and Karplus
205
addresses this problem and provides a stable and efficient
boundary element method solvation model that can be used in a
variety of applications. The method utilizes Gaussian surface
elements to avoid singularities in the surface element interaction
matrix, and a switching function that allows surface elements to
smoothly appear or disappear as atoms become exposed or bur-
ied. The energy surface in this formulation has been demon-
strated to have smooth analytic derivatives, and the method has
been recently integrated into the semiempirical MNDO97
206
program interfaced with CHARMM.
207,208
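The role of a switching function can be illustrated with a generic example. The sketch below uses a quintic smoothstep polynomial, which is an assumption chosen for simplicity and is not the specific function of the smooth COSMO method; the names `smooth_switch` and `element_weight` and the radii `r_in` and `r_out` are hypothetical.

```python
def smooth_switch(x):
    """Quintic smoothstep: 0 for x <= 0, 1 for x >= 1, smooth in between.

    The value and its first and second derivatives are continuous at both ends.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x * x * (10.0 - 15.0 * x + 6.0 * x * x)

def element_weight(distance_to_neighbor, r_in, r_out):
    """Weight of a surface element as a neighboring atom approaches.

    The element is fully buried (weight 0) inside r_in and fully
    exposed (weight 1) beyond r_out; both radii are hypothetical inputs.
    """
    return smooth_switch((distance_to_neighbor - r_in) / (r_out - r_in))
```

Because the weight changes continuously between 0 and 1, a surface element never appears or disappears abruptly, which is what keeps the energy and its derivatives continuous.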
The smooth COSMO method, like the COSMO method, has
some computational advantages (in both speed and memory
requirements) over the PB method that arise from the discretiza-
tion procedure. The convergence of the numerical solution in all
three of the methods depends on the resolution of the grids, and
in the case of the COSMO methods, the lower dimensionality of
the grid used to discretize the numerical problem leads generally
to increased computational efficiency and lower demands on
computer memory. However, the COSMO methods are less gen-
eral than the PB method in that the latter can treat spatially
varying dielectric constants and effects of ion concentration in a
more straightforward manner.
Generalized Born Electrostatics
Implicit solvent models based on the generalized Born (GB) for-
malism share the same underlying dielectric continuum model
for the solvent as the Poisson or PB methods. However, GB
theories replace the time-consuming iterative solution for obtain-
ing the electrostatic potential required in finite-difference PB
calculations in eq. (2) by the solvent-induced reaction field
energy as approximated by a pairwise sum over interacting
charges, $q_i$,
209–213
$$\Delta G_{\mathrm{elec}}^{\epsilon_p \to \epsilon_w} = -\frac{1}{2}\left(\frac{1}{\epsilon_p} - \frac{1}{\epsilon_w}\right)\sum_{i,j}\frac{q_i q_j}{\sqrt{r_{ij}^2 + \alpha_i\alpha_j\exp\left(-\frac{r_{ij}^2}{F\,\alpha_i\alpha_j}\right)}} \qquad (5)$$
In this expression $\epsilon_p$ and $\epsilon_w$ are the interior and exterior dielectric
constants, $r_{ij}$ is the distance between atoms i and j, and $\alpha_i$ is the
effective Born radius of atom i, which is chosen to match the
self-energy of charge i at its position in the system (i.e., $\alpha$ varies
with the position of the atoms). The empirical factor F modu-
lates the length-scale of the Gaussian term and typically ranges
from 2 to 10, with 4 being the most commonly used value.
209
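The pairwise sum of eq. (5) is simple to evaluate once the effective Born radii are known. The sketch below assumes the radii are given as input, uses an assumed unit conversion factor `COULOMB`, and a hypothetical function name `gb_energy`; it is not the CHARMM GB code.

```python
import math

COULOMB = 332.0637  # kcal*Angstrom/(mol*e^2); an assumed conversion factor

def gb_energy(coords, charges, born_radii, eps_p=1.0, eps_w=80.0, f_factor=4.0):
    """Pairwise GB solvation energy following the form of eq. (5).

    The double sum runs over all i and j, so the i == j terms supply
    the Born self-energy of each charge.
    """
    prefactor = -0.5 * COULOMB * (1.0 / eps_p - 1.0 / eps_w)
    total = 0.0
    for xi, qi, ai in zip(coords, charges, born_radii):
        for xj, qj, aj in zip(coords, charges, born_radii):
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            denom = math.sqrt(r2 + ai * aj * math.exp(-r2 / (f_factor * ai * aj)))
            total += qi * qj / denom
    return prefactor * total

# A single unit charge with a 2 Angstrom Born radius reduces to the Born formula
dg_ion = gb_energy([(0.0, 0.0, 0.0)], [1.0], [2.0])
```

At large separations the exponential term vanishes and each pair term approaches the screened Coulomb form $q_i q_j / r_{ij}$, while at zero separation it reduces to the Born self-energy.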
Equation (5) assumes that the shielded electrostatic interactions
arising in the dielectric environment can be expressed as a
superposition of pairwise terms. This is the so-called ‘pairwise
shielding approximation’. The efficiency of the GB approach
lies in the possibility of estimating the effective atomic Born
radii using a computationally inexpensive scheme. For example,
the Coulomb field approximation assumes that the dielectric
displacement for a set of charges embedded in a low dielectric
cavity behaves like the Coulomb field of these charges in
vacuum,
213,214
leading to the following expression for $\alpha_i$:
$$\frac{1}{\alpha_i} = \frac{1}{R_i} - \frac{1}{4\pi}\int_{\mathrm{solute},\,r>R_i}\frac{1}{r^4}\,dV \qquad (6)$$
where $R_i$ is usually the atomic van der Waals radius of atom i.
Many generalized Born theories approximate the volume inte-
gral, carried out over the entire solute cavity, by a discrete sum
of overlapping spheres
211,212
or Gaussians.
213
Alternative meth-
ods have also been devised to carry out the integration, with
moderate computational cost, either by reformulating the volume
integral into a surface integral
215
or by directly using analytical
integration techniques borrowed from density functional
theory.
134,216,217
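As a check on eq. (6), consider an idealized geometry, assumed here purely for illustration: an atom of radius $R_i$ at the center of a spherical solute of radius $R_s$. The solute volume outside $R_i$ is then a spherical shell, and the integral can be done in closed form:

```latex
\frac{1}{\alpha_i}
  = \frac{1}{R_i} - \frac{1}{4\pi}\int_{R_i}^{R_s} \frac{4\pi r^2}{r^4}\,dr
  = \frac{1}{R_i} - \left(\frac{1}{R_i} - \frac{1}{R_s}\right)
  = \frac{1}{R_s}
```

The effective Born radius therefore equals the radius of the whole solute, $\alpha_i = R_s$, whereas for an isolated atom the integration region is empty and $\alpha_i = R_i$.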
Several implicit solvent schemes based on the pairwise
shielding approximation exist in CHARMM. The first to be
implemented in CHARMM was the Analytic Continuum Elec-
trostatics (ACE) model developed by Schaefer and Karplus.
213
This model is based on the Coulomb field approximation and
the pairwise summation utilizing Gaussian functions as described
earlier.
213
Applications of the model include MD simulations
and studies of the folding of proteins and peptides.
121,218
An
improved version of ACE, called ACE2, is now available and
should be used in most applications with the PARAM19 polar
hydrogen force field. Also implemented in CHARMM is a
‘standard’ GB model following the formulation of Qiu et al.
211
This approach utilizes a pairwise sum over atoms to provide
estimates of the atomic Born radii (solution to eq. 6 earlier).
219
It is optimized for use with the PARAM 19 polar hydrogen
force field described earlier, with which it yields mean-absolute
errors of 1–2% in the calculated solvation energies when com-
pared with Poisson solutions using the same dielectric boundary.
This model, accessed in CHARMM via the GBORn command
(GENBORN preprocessor keyword), has been integrated with a
number of other methods, such as free energy perturbation cal-
culations and replicas. It has proven useful in folding studies of
peptides and proteins,
220
the investigation of helix to coil transi-
tions,
221
and binding free energy calculations.
222
The description of the solvent boundary at the molecular sur-
face in the ACE and standard GB methods can lead to problems
that arise from the presence of microscopic, solvent-inaccessible
voids of high dielectric in the interior of larger biomolecules. One
approach used in PB calculations is to fill the voids with neutral
spheres of low dielectric constant.
223
In an alternative approach,
the integral formulation described by eq. (6) can be evaluated
numerically with methods drawn from density functional
theory.
216
This method can be extended with analytical approxi-
mations for the molecular volume or a van der Waals-based sur-
face with a smooth switching function similar to that used by Im
et al. in the context of the PB equation.
191
The molecular volume
approximation is implemented in the Generalized Born/Molecular
Volume (GBMV) model,
217
the smoothed van der Waals surface
in the GBSW model.
134
These approaches provide results that are
comparable to ‘exact’ continuum Poisson theory.
224
However,
they are considerably more time-consuming than the simpler
models. The GBSW model is approximately five times as expen-
sive as corresponding vacuum simulations, and the GBMV model
is 6–10 times as expensive (see also next subsection). The GBMV
and GBSW models have been applied to protein–ligand interac-
tions,
225
protein–protein and protein–DNA interactions,
141
pH-
coupled MD
127,129
and protein folding/scoring in structure predic-
tion.
132
Key in improving the accuracy of these models have been
extensions beyond the Coulomb field approximation described in
eq. (6) earlier,
216,217
which is exact only for a single charge at the
center of a spherical cavity.
226
The FACTS model (fast analytical
continuum treatment of solvation) is a recently developed GB
method in which the effective Born radius of each atom is esti-
mated efficiently by using empirical formulas for approximating
the volume and spatial symmetry of the solvent that is displaced
by its neighboring atoms.
227
Apart from the factor F in eq. (5),
the GB implementations in CHARMM involve empirical volume
parameters for the calculation of the Born radii in eq. (6). The
ACE model uses type-dependent atomic volumes derived by aver-
aging over high-resolution structures in the PDB,
228
and a single
adjustable (smoothing) parameter. The value normally chosen for
this parameter (1.3) gives the best agreement between the solute
volume description underlying ACE (the superposition of Gaussians)
and the solute cavity model that is used in the standard
finite-difference PB methods.
Currently, the focus in GB developments has begun to shift
away from matching PB results and toward reproducing explicit
solvent simulations and experimental data through reparameteri-
zation of the models.
138,229
Recent examples demonstrate that
the resulting class of implicit solvent force fields can reproduce
folding equilibria for both helical and β-hairpin peptides, as
illustrated in Figure 5a for the folding of Trp-zip, a small β-hairpin
peptide.
Speed Comparison of Implicit Solvent Models
Since reducing the required computer time is one of the primary
reasons for the use of implicit solvent models, approximate tim-
ings obtained for small- to medium-sized systems are given in
Table 1. The fourth column lists the computational cost for each
model relative to a corresponding vacuum calculation using the
same system, cutoff distances, atomic representation, and condi-
tions. By this ‘intrinsic cost’ measure, which gives an indica-
tion of the speed of the implicit solvent term calculation, per se,
the implicit models are all in the range of 1.7 to 10 times slower
than vacuum. As expected, the cost of the explicit water calcula-
tions (using periodic boundary conditions and particle mesh
Ewald summations; see Section IV.B.) is much greater than that
of the implicit models; i.e., explicit solvent calculations are
approximately 20–200 times slower than the corresponding vac-
uum calculations, depending on the size of the system, the num-
ber of water molecules used, and the atomic representation used
for the solute. Column 5 of the table lists the computational cost
for each model, using its recommended cutoff distances and
atomic representation, relative to a vacuum calculation on the
same system using an 8 Å cutoff and a polar hydrogen representation.
By this ‘actual cost’ measure, which relates the speeds
of the models when they are used as recommended (default pa-
rameters), the implicit models vary in speed by a factor of 50 or
more. These differences arise primarily from the fact that the
models employ different atomic representations (all-hydrogen vs.
polar hydrogen) and nonbonded cutoff distances (8 Å in SASA
vs. up to 20 Å in the others), in addition to having different
intrinsic speeds or costs. The polar-hydrogen model has approximately
half as many atoms as the all-hydrogen model for
proteins, so that there are approximately one quarter as many pairwise
interactions in models 1 and 2 as in models 3–6. The
longer nonbonded cutoff distances for models 4–6 mean that
larger numbers of pairwise intramolecular protein interactions
are taken into account. The actual cost, rather than the intrinsic
cost, must be used to estimate the relative computer times that
will be required for calculations with the given models. For
example, MD simulations with the SASA model are up to 100–
200 times faster than explicit water simulations.
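The scaling argument behind the difference between the intrinsic and actual costs can be made explicit with a rough estimate. The sketch below assumes a uniform atom density, so that the number of nonbonded pairs within the cutoff scales as (number of atoms) × (density) × (cutoff volume); the function name `relative_pair_count` is hypothetical, and the estimate is only an upper bound for a finite solute.

```python
def relative_pair_count(atom_ratio, cutoff_ratio):
    """Estimated ratio of nonbonded pairs within the cutoff.

    Assumes a uniform atom density, so the count scales with
    (number of atoms) * (density) * (cutoff volume).
    """
    return atom_ratio ** 2 * cutoff_ratio ** 3

# All-hydrogen vs. polar-hydrogen at the same cutoff: about 4x the pairs
same_cutoff = relative_pair_count(2.0, 1.0)
# Same representation, 20 Angstrom vs. 8 Angstrom cutoff
longer_cutoff = relative_pair_count(1.0, 20.0 / 8.0)
```

For a protein of finite size, atoms near the surface have fewer neighbors within a long cutoff than the uniform-density estimate suggests, so the actual increase with cutoff is smaller than the cubic factor.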
Implicit Membrane Models
In the same spirit as the implicit solvent (water) potentials,
implicit membrane representations reduce the required computer
time by modeling the membrane environment about a solute
(often an embedded protein or peptide) as one or more continu-
ous distributions. Formulations based upon either PB theory
(GB-like models)
230
or Gaussian solvation energy density distri-
butions (an EEF1-type model)
135
have been developed. The first
GB/IM model was developed as an extension of the simple two-
dielectric form of the GB theory
219
by splitting the integral in
eq. (6) into intramembrane and extramembrane parts.
136
This
model has been shown to reproduce the positions of helices
within a biological membrane. The introduction of a smooth
switching function to describe the solute–solvent boundary
134
and the reformulation of the integration schemes for eq.
(6)
216,217
have led to the introduction of a GB model that per-
mits arbitrarily shaped low-dielectric volumes to be ‘embedded’
in the high-dielectric solvent.
231
This model has been developed
in the GBSW and GBMV modules, and it has been applied to
the simulation and folding of integral membrane peptides and
proteins
232
with direct comparisons to measured properties from
solid-state NMR experiments
137
; it has also been used in studies
of the insertion of peptides into membranes
233
and peptide asso-
ciation and oligomerization in membrane environments.
234
Stud-
ies of the mechanism by which insertion of designed peptides
into membrane bilayers proceeds, as illustrated in Figure 5b,
demonstrate the utility of implicit models in the exploration of
membrane-mediated phenomena.
An EEF1-type model for implicit solvent and membrane
studies (IMM1)
135
has been implemented in CHARMM. Like
EEF1,
151
the method utilizes Gaussian functions to describe the
extent of burial of atoms in different regions (i.e., the aqueous
solvent versus the bilayer membrane). IMM1 has been extended
so as to account for the surface potential due to anionic lip-
ids,
139
the transmembrane potential,
235
and the treatment of
membrane proteins with an aqueous pore.
236
It has been used to
obtain insights into the forces that drive transmembrane helix
association,
180,237
calculate pH-dependent absolute membrane
binding free energies,
238
and determine the voltage-dependent
energetics of alamethicin monomers.
235
Figure 5. Combining replica-exchange molecular dynamics with implicit solvent. (a) Folding of the
Trp-zip peptide.
229
A consistent parameterization of the CHARMM all-hydrogen force field and the
GBSW implicit solvent model was used, with 16 replicas in a temperature range of 270 to 550 K. The
left panel shows the distribution of potential energy values from the 270 K window. The right panel
provides a comparison of the most populated cluster from the simulations and the NMR-derived structure;
the backbone RMSD between the two structures is 1 Å. (b) Implicit membrane/implicit solvent
replica-exchange molecular dynamics simulations
233
of a designed 19-residue peptide, WALP-19. The
peptide inserts into the membrane via a mechanism involving the following steps: (1) migration to
the membrane-water interface as a partially unstructured peptide; (2) formation of helical structure
via D-hairpin conformations; (3) helical elongation through thermal fluctuations to 80% helical; and
(4) N-terminal insertion across the membrane.
Determination of Ionization States
Accurately simulating the electrostatic properties of a protein
depends upon the correct determination of the charged state of all
ionizable residues. The ionization state of a residue is determined
by the free energy difference between its protonated and unproto-
nated forms at a given pH. This can be expressed in terms of the
change in pKa (ΔpKa) of the amino acid in a protein relative to
the intrinsic pKa of the amino acid in solution. Correspondingly,
the free energy of transfer of the charged amino acid from the sol-
vent to the protein environment is equal to the reversible work
required to ionize the side chain in the protein minus the work
needed to ionize it in an isolated peptide in bulk water.
239
Although ΔpKa can also be calculated using free energy perturbation
with explicit solvent molecules (see Section VI), a PB or GB
treatment representing the solvent as a dielectric continuum usu-
ally offers a convenient and reasonably accurate approximation,
because the change in pKa tends to be dominated by electrostatic
contributions to the solvation free energy. The calculation of pKa
shifts can be done with the finite-difference PBEQ module.
191,192,240
Estimates of the pKa based on the PB equation can
be improved by introducing conformational sampling; e.g., calculated
pKa shifts obtained by averaging over the coordinates from
an MD simulation (see Section VIII) are usually more accurate
than what is calculated with a single structure.
240–243
In some
cases, there is a strong coupling between the ionization states of
the residues and the predominant conformation of a protein. To
address this issue, a methodology has been implemented that
combines the calculation of pKa with the generalized Born methods
described earlier and MD. This approach, called pH-MD,
127,129
provides a means of coupling changes in protein and
peptide conformations with changes in the proton occupancy of ti-
tratable residues. The methodology utilizes an extended Lagran-
gian to dynamically propagate the proton occupancy variables,
which evolve in the electrostatic field of the protein/solvent envi-
ronment through the GBMV
216
or GBSW
134
models. The pH-MD
method, which has been successfully applied to a number of pro-
tein systems,
127–129
extends the range of techniques that are avail-
able for accurately representing electrostatic interactions in sol-
vated biological systems.
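The relation between a free energy difference, the resulting pKa shift, and the protonation state at a given pH can be sketched as follows. The sign convention (free energy of deprotonation in the protein minus that in solution), the function names, and the numerical inputs are assumptions made for this example; the code illustrates the thermodynamic relation only and is unrelated to the pH-MD implementation.

```python
import math

R_GAS = 0.0019872041  # gas constant in kcal/(mol K)

def pka_shift(ddg_deprotonation, temperature=298.15):
    """pKa shift from the change in deprotonation free energy (kcal/mol).

    ddg_deprotonation = dG_deprot(protein) - dG_deprot(solution); a positive
    value means deprotonation is harder in the protein, raising the pKa.
    """
    return ddg_deprotonation / (math.log(10.0) * R_GAS * temperature)

def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch fraction of sites carrying the proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Hypothetical residue: model pKa 4.0, deprotonation 1.36 kcal/mol harder in the protein
pka_protein = 4.0 + pka_shift(1.36)
fraction_at_ph7 = protonated_fraction(7.0, pka_protein)
```

At room temperature a free energy difference of about 1.36 kcal/mol corresponds to one pKa unit, which is why modest errors in the electrostatic solvation energy translate into noticeable errors in predicted ionization states.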
III.E. Quantum Mechanical/Molecular Mechanical Methods
Because the QM treatment of an entire biological macromole-
cule requires very large amounts of computer time, combined
QM/MM potentials are commonly used to study chemical and
biological processes involving bond cleavage and formation,
such as enzymatic reactions. In this approach, a small region
(the QM region) of the system, whose electronic structural
changes are of interest, is treated quantum mechanically and the
remainder of the system (the MM region) is represented by a
classical MM force field. Typically, the former is a solute or the
active site of an enzyme, while the latter includes the parts of
the protein and the solvent environment that are not involved in
the reaction.

Table 1. Approximate Relative Computational Costs of MD Calculations Using Various Solvation Models
in CHARMM (Version c34b1) for Proteins in the Approximate Range of 50 to 500 Residues in Size
(750 to 7500 Atoms in the All-H Representation).

Model       Atomic representation   Outer NB cutoff (Å)   Intrinsic cost   Actual cost
1) SASA     polar H                 8                     1.5–1.9          1.5–1.9
2) EEF1     polar H                 10                    1.6–1.7          2–3
3) SCPISM   all H                   14                    1.7              10–16
4) ACE      all H                   20                    3.5–4.5          60–80
5) GBSW     all H                   20                    4.5–6            70–100
6) GBMV     all H                   20                    6–10             100–175
7) TIP3P    all H (solute)          16                    20–60            200–500+
8) TIP3P    polar H (solute)        16                    50–200           200–500+

Intrinsic cost: cost relative to vacuum with the solvation model-specific cutoff and atomic representation.
Actual cost: cost relative to vacuum with an 8 Å cutoff and a polar H atomic representation.
The ‘atomic representation’ column indicates whether the solvation model is based on a polar hydrogen
(PARAM19) or an all-hydrogen (PARAM22) atomic model. (In the TIP3P calculations, this applies only to the
protein, since the water model is unchanged.) The ‘outer NB cutoff’ column gives the outer cutoff distance for
nonbonded interactions recommended for the model. The relative costs, or speeds, of the various solvent models show a
much greater variability when they are all compared to a single vacuum calculation on a given system (last column,
‘actual cost’) than they do when each model is compared to a vacuum calculation that uses the same atomic
representation and cutoff distance (fourth column, ‘intrinsic cost’). See text. The TIP3P results (7, 8) are for calculations
using 30–60 times as many explicit water molecules as protein residues. The TIP3P calculations have a higher
computational cost relative to vacuum when the simpler and faster polar H model is used for the protein. All
benchmarking was performed on an Intel Pentium 4 3.20 GHz CPU with an ifort (9.0) CHARMM compilation and repeated on
a 1.6 GHz AMD Opteron CPU with a gnu (gcc-4.2) compilation, using a non-bonded list update frequency of 10
steps/update.

QM/MM methods were first used for studying polyene electronic excitations in 1972
244
and carbonium ion stabili-
zation in the active site of lysozyme in 1976.
245
Energy calcula-
tions based on the QM/MM methodology were carried out for
reactions in solution and in enzymes several years later.
246
In the QM/MM approach, electrostatic effects as well as
steric contributions from the environment are incorporated
directly into the electronic structure calculations of the reactive
region, affecting its charge polarization and chemical reactiv-
ity.
247
A QM/MM potential employing semiempirical QM mod-
els (QUANTUM module) was first implemented in CHARMM
in 1987,
248,249
through the incorporation of parts of the MOPAC
program.
250
It was used for the first MD free energy simulation
of an S_N2 reaction in aqueous solution
248
; numerous applications
to enzymatic reactions have since been published (see, for example,
Refs. 251–256). Because of its ability to treat bond-forming
and bond-breaking processes, to describe both the electronic
ground state and excited states,
257
and to reduce the required
computer time dramatically relative to full QM calculations, the
QM/MM approach has become the method of choice for study-
ing chemical reactions in condensed phases and in macromolec-
ular systems such as enzymes and ribozymes.
258,259
In addition
to the MOPAC-based QUANTUM module and its derivative
SQUANTM, the semiempirical, self-consistent charge density
functional tight-binding (SCC-DFTB) methods have been imple-
mented directly in CHARMM.
260
Also, a number of external
electronic structure programs have been interfaced with
CHARMM and its MM force fields for use in the QM part of
QM/MM calculations. In this subsection, the key features of the
QM/MM module in CHARMM are summarized. Details of the
theory and applications can be found in Refs. 247, 249, 256 and 261.
Treatment of Boundary Atoms
In a combined QM/MM method, the most difficult part of the sys-
tem to model is the covalent boundary between the QM and MM
regions
249,262
; this problem is avoided if the boundary is between
molecules (e.g., between a ‘QM’ ligand and an ‘MM’ solvated
protein). For the general case, there are three main criteria that the
boundary between the QM and MM regions should satisfy.
263
First,
the charge polarization at the boundary should closely approximate
that obtained from QM calculations for the entire system. The effec-
tive electronegativity of a boundary atom in the MM region should
be the same as that of a real QM atom. Second, the geometry at the
boundary must be correct. Finally, the torsional potential energy sur-
face at the boundary should be consistent with the surfaces arising
from both QM and MM calculations.
Three approaches for treating the QM/MM boundary have
been implemented in CHARMM. They are:
Hydrogen link atom.
246,249,264
In this most commonly used
approach, the valency of the QM fragment is saturated by a
hydrogen atom that is introduced into the system along the
covalent bond between the QM and MM regions. Although
the link-atom approach has been used in numerous studies, it
introduces additional degrees of freedom into the system; in
addition, partial charges on the MM atoms that are closest to
the link-atom must be removed to avoid convergence difficul-
ties. The latter problem has been solved by the use of a dou-
ble link-atom method
265
that incorporates a balanced bond sat-
uration of both the QM and MM fragments.
Delocalized Gaussian MM (DGMM) charges.
266
This method
incorporates the delocalized character of charge densities on
MM atoms using Gaussian functions, and it has been success-
fully combined with the double link atom approach. The
method greatly simplifies the rules governing QM/MM elec-
trostatic interactions.
Generalized Hybrid Orbital (GHO) method.
263
This method
partitions the system at an sp³ atom. The boundary atom is
included in both the QM calculation, with a fully optimized
hybrid orbital and three auxiliary orbitals, and also the MM
force field, through the retention of the classical partial
charge. The method is an extension of the frozen, localized
orbital approach,
267
and it neither introduces nor eliminates
degrees of freedom. The GHO method has been implemented
in CHARMM for semiempirical,
263
SCC-DFTB,
268
ab initio
Hartree-Fock,
269
and DFT
270
quantum chemical models, the
latter two through the GAMESS-US interface.
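The geometric part of the hydrogen link atom approach, placing the hydrogen along the covalent bond between the QM and MM atoms, can be sketched as below. The function name `place_link_atom` and the default 1.09 Å distance (a typical C-H bond length) are assumptions for illustration; the sketch does not cover the treatment of MM partial charges or the CHARMM commands used to define link atoms.

```python
import math

def place_link_atom(qm_pos, mm_pos, bond_length=1.09):
    """Position a hydrogen link atom on the QM-MM bond vector.

    The hydrogen sits bond_length away from the QM atom, pointing toward
    the MM atom. The 1.09 Angstrom default is an assumed C-H distance.
    """
    bond = [m - q for q, m in zip(qm_pos, mm_pos)]
    norm = math.sqrt(sum(c * c for c in bond))
    if norm == 0.0:
        raise ValueError("QM and MM atoms coincide")
    return tuple(q + bond_length * c / norm for q, c in zip(qm_pos, bond))

# Hypothetical C(QM)-C(MM) bond of 1.53 Angstrom along x
link = place_link_atom((0.0, 0.0, 0.0), (1.53, 0.0, 0.0))
```

Because the link atom position is a function of the two boundary atom positions in this construction, it illustrates only the geometry; the additional degrees of freedom mentioned above arise when the link atom is instead allowed to move freely.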
QM/MM Interactions
The interactions between the QM and