Network models for molecular kinetics
Cell Research | Vol 20 No 6 | June 2010
Network models for molecular kinetics and their initial
applications to human health
Gregory R Bowman1, Xuhui Huang2, 3, Vijay S Pande1, 4
1Biophysics Program, Stanford University, Stanford, CA 94305, USA; 2Department of Chemistry, The Hong Kong University of
Science and Technology, Kowloon, Hong Kong; 3Department of Bioengineering, Stanford University, Stanford, CA 94305, USA;
4Department of Chemistry, Stanford University, Stanford, CA 94305, USA
Correspondence: Vijay S Pande
Molecular kinetics underlies all biological phenomena and, like many other biological processes, may best be
understood in terms of networks. These networks, called Markov state models (MSMs), are typically built from
physical simulations. Thus, they are capable of quantitative prediction of experiments and can also provide an intuition for
complex conformational changes. Their primary application has been to protein folding; however, these technologies
and the insights they yield are transferable. For example, MSMs have already proved useful in understanding human
diseases, such as protein misfolding and aggregation in Alzheimer’s disease.
Keywords: Markov state models, molecular dynamics, simulations, protein folding, conformational change, Alzheimer’s
Cell Research (2010) 20:622-630. doi:10.1038/cr.2010.57; published online 27 April 2010
Much of today’s biological research is motivated by
the desire to understand higher-order processes more
fully. For example, our desire to understand human
development and disease helps drive research on
cell biology. In turn, our desire to understand cells has
motivated us to understand signaling pathways and gene
networks. At each of these levels, networks (entities
connected by arrows based on their relationships to one
another) have proven to be a valuable way of
representing knowledge (Figure 1).
Now our desire to understand the molecular underpinnings
of biology and disease is motivating research into
molecular kinetics. For example, it has recently been
discovered that small oligomers of just a few Aβ peptides
may be the toxic elements in Alzheimer’s disease.
Determining their structures could aid in designing drugs
to prevent their formation; however, the structural
heterogeneity of these oligomers makes accurate structural
characterization with conventional methods difficult.
Fortunately, computational modeling can capture both
the dominant structures and dynamics of these molecules.
Another example is proteomics. Now that genomics
has given us the sequence of the human genome, there
is a push for high-throughput structure prediction to
obtain the structures of all the proteins encoded therein.
Information-based methods have proved useful for small
globular proteins, but physical models (which capture
both thermodynamics and kinetics) are likely required to
push to systems like membrane proteins for which less
structural data is available. Since the majority of drugs
on the market target membrane proteins, such information
would again be valuable for designing therapeutics.
© 2010 IBCB, SIBS, CAS All rights reserved 1001-0602/10 $ 32.00
Figure 1 Example networks. (A) A signaling pathway with a
cell (large green rectangle), proteins (colored ovals), a nucleus
(dashed green circle), repression as blunted arrows, movement
into the nucleus as a dashed arrow, and transcription as a solid
black arrow. (B) The cell cycle with stages G1, S, G2, and M.
As in many other fields, networks are a valuable
framework for representing knowledge of molecular
kinetics. In particular, networks called Markov state
models (MSMs) are proving to be a powerful means of
understanding processes like protein folding and
conformational changes [3-12]. The power of MSMs derives
from the fact that they are essentially maps of the
conformational space accessible to a system. That is, like a
road map with roads labeled with speed limits and cities
labeled with populations, MSMs give the probability
that a protein or other molecule will be in a certain set
of conformations (called a state or node) and describe
where it can go next and how quickly. Figure 2 shows
a portion of an MSM as an example. Examining these
maps can give tremendous insight into processes like
protein folding and could even suggest how to manipulate
these processes with small molecules, mutations or
other perturbations. Models with sufficient resolution can
also yield quantitative agreement with, or even prediction
of, experimental observables like folding rates and
MSMs are typically constructed from simulation
trajectories (i.e., series of conformations that were visited
one after another in a physical simulation of a system,
like a protein) [4, 5, 10, 12-14]. Because of the temporal
relationship between conformations in a trajectory, it is
possible to group conformations that can interconvert
rapidly into states and then determine the connectivity
between states by counting the number of times a
simulation went from one state to another. By employing
these kinetic definitions, one ensures that the system’s
dynamics can be modeled reasonably well by assuming
stochastic transitions between states [3-7, 10, 11, 13, 15].
Thus, it is possible to perform analyses, such as identifying
the most probable conformations at equilibrium. In
addition, one can naturally vary the temporal and spatial
resolution of an MSM by changing the definition of what
it means to interconvert rapidly or slowly [4, 5, 15-17].
By choosing a long timescale cutoff, one can obtain
humanly comprehensible models with just a few metastable
(or long-lived) states that capture large conformational
changes, like folding. Such coarse-grained models are
useful for gaining an intuition for a system. With a short
timescale cutoff, on the other hand, one can obtain a
model with many states. By using such high-resolution
models, one sacrifices ease of comprehension for more
quantitative agreement with experiments [4, 5, 18].
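
The counting procedure described above can be illustrated with a minimal Python sketch. It assumes the conformations have already been grouped into states, so that a trajectory is simply a sequence of integer state labels. The function names, the toy trajectory, and the plain row normalization are illustrative assumptions; they are not the MSMBuilder interface or the estimators used in the cited studies. The sketch counts transitions at a chosen lag time, converts the counts to transition probabilities, and then extracts the equilibrium populations and the relaxation (implied) timescales from the eigen-decomposition of the transition matrix.

```python
import numpy as np

def count_transitions(dtraj, n_states, lag=1):
    """Count how often the trajectory goes from state i to state j in `lag` steps."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize counts into transition probabilities.

    Assumes every state was left at least once (no all-zero rows).
    """
    return counts / counts.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Equilibrium populations: left eigenvector of T with eigenvalue 1."""
    evals, evecs = np.linalg.eig(T.T)
    idx = np.argmax(evals.real)
    pi = np.abs(evecs[:, idx].real)
    return pi / pi.sum()

def implied_timescales(T, lag=1):
    """Relaxation timescales t_i = -lag / ln(lambda_i) for eigenvalues below 1."""
    evals = np.sort(np.linalg.eigvals(T).real)[::-1][1:]
    evals = evals[evals > 0]  # the logarithm is undefined for non-positive values
    return -lag / np.log(evals)

# Toy state-label trajectory (made up for illustration, three states).
dtraj = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 0])

counts = count_transitions(dtraj, n_states=3, lag=1)
T = transition_matrix(counts)
pi = stationary_distribution(T)
timescales = implied_timescales(T, lag=1)

print("transition matrix:\n", T)
print("equilibrium populations:", pi)
print("implied timescales (in trajectory steps):", timescales)
```

In this picture, the slowest implied timescales correspond to transitions between long-lived states, so a timescale cutoff of the kind discussed above amounts to deciding which of these relaxation processes a model should resolve.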
To date, MSMs have mostly been used to understand
phenomena like peptide and protein folding [4, 5, 13,
17, 19-23], RNA folding [6, 16], and conformational
changes [8, 9, 24]. Having been validated by these studies,
they are now being applied to important topics in
human health. Examples include protein aggregation in
Alzheimer’s disease and vesicle fusion, an
important step in influenza infection.
The remainder of this review will be divided into
four major sections. First, we review the application of
MSMs to biomolecular folding, one of the driving problems
behind the development of this technology. The
next section focuses on the application of MSMs to human
health, particularly protein misfolding diseases and
influenza. This is followed by a review of MSM methodology
and a discussion of efficiently capturing long timescales
with short simulations and MSMs in the last two sections.
Protein and RNA folding
Protein folding is one of the biological problems that
has driven the development of MSMs. From a biophysical
point of view, it is simply amazing how proteins
collapse to specific structures so quickly given the
astronomical number of possible conformations they can
Figure 2 An example MSM for the villin headpiece. Shown here
are four clusters of conformations automatically identified by
MSMBuilder. Each cluster represents a state of the villin protein.
Arrows indicate transitions between states, also identified by
MSMBuilder. The group or cluster representing the native state
(right-most) was accurately identified. Its members match the
crystal structure (shown in darker blue and magenta) with an
average root mean square deviation (RMSD) of 1.8 Å. Courtesy
of Joy Ku and Gregory R. Bowman, reproduced with permission
from Biomedical Computation Review.