Rapid simulation of protein motion: merging flexibility, rigidity and normal mode analyses.
ABSTRACT Protein function frequently involves conformational changes with large amplitude on timescales which are difficult and computationally expensive to access using molecular dynamics. In this paper, we report on the combination of three computationally inexpensive simulation methods--normal mode analysis using the elastic network model, rigidity analysis using the pebble game algorithm, and geometric simulation of protein motion--to explore conformational change along normal mode eigenvectors. Using a combination of ElNemo and First/Froda software, large-amplitude motions in proteins with hundreds or thousands of residues can be rapidly explored within minutes using desktop computing resources. We apply the method to a representative set of six proteins covering a range of sizes and structural characteristics and show that the method identifies specific types of motion in each case and determines their amplitude limits.
J. E. Jimenez-Roldan1,2, R. B. Freedman2, R. A. Römer1, S. A.
1 Department of Physics and Centre for Scientific Computing, University of Warwick,
Coventry, CV4 7AL, UK
2 School of Life Sciences, University of Warwick, Coventry, CV4 7AL, UK
Revision: 1.77, compiled 27 January 2012
Keywords: protein rigidity, protein flexibility, normal mode analysis, conformational change,
domain motion, rapid simulation, coarse-grained methods, multi-scale methods.
Protein conformational changes and dynamic behaviour are fundamental for processes
such as catalysis, regulation, and substrate recognition. The time scales of motions
involved in enzyme function span multiple orders of magnitude, from the picosecond
timescale of local side-chain rotations to the microsecond-to-millisecond timescale of
whole-domain motions. Empirical-potential molecular dynamics (MD) has proved to be a
valuable tool for investigating molecular motions, but specialized expertise, large-scale
computing resources and weeks or months of compute time are required to explore
protein motion on simulation time scales greater than tens of nanoseconds. There is
clearly a need for methods that permit exploration of possible large conformational
motions of proteins in a rational, albeit somewhat simplified, fashion with minimal
computational resources. Such explorations allow for the generation of hypotheses about
large conformational motion and protein function that can then be investigated with
detailed MD simulations and by experimental methods such as FRET.
arXiv:1201.5531v1 [q-bio.BM] 25 Jan 2012
The computational cost of simulations can be reduced by coarse-graining (CG), that is,
averaging out atomic degrees of freedom so as to represent groups of atoms by a
single site [4, 5], and/or by simplifying the intersite interactions. Different levels
of simplification can then be combined in multi-scale methods [6,7]. Here, we shall
consider three such methods in particular. Pebble-game rigidity analysis, implemented
in First, provides valuable information on the distribution of rigid and flexible
regions in a structure. Geometric simulation in the Froda algorithm uses rigidity
information to explore flexible motion [11,12]. Normal mode analysis of a coarse-grained
elastic network model (ENM), implemented in ElNemo [13, 14], generates
eigenvectors for low-frequency motion, which are potential sources of functional motion
and conformational change [15–19].
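Pebble counting is the core of the rigidity analysis in First. As an illustration only, the sketch below implements the simpler two-dimensional bar-joint pebble game (each joint carries two pebbles; a new bar is independent if four pebbles can be gathered on its two endpoints, otherwise it is redundant). It is not the three-dimensional variant that First applies to proteins, and the class and method names (`PebbleGame2D`, `add_bar`, `free_dof`) are hypothetical.

```python
class PebbleGame2D:
    """Toy 2D bar-joint pebble game: classifies bars as independent or redundant."""

    def __init__(self, n_joints):
        self.pebbles = [2] * n_joints                 # two free pebbles (DOF) per joint
        self.out = [set() for _ in range(n_joints)]   # i -> j: a pebble on i covers bar (i, j)

    def _draw_pebble(self, start, blocked):
        """Depth-first search along directed bars from `start`, never entering `blocked`,
        for a free pebble; if found, reverse the path so the pebble moves onto `start`."""
        parent = {start: None}
        seen = {start, blocked}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in list(self.out[node]):
                if nxt in seen:
                    continue
                seen.add(nxt)
                parent[nxt] = node
                if self.pebbles[nxt] > 0:
                    self.pebbles[nxt] -= 1
                    while parent[nxt] is not None:    # reverse each bar on the path
                        prev = parent[nxt]
                        self.out[prev].remove(nxt)
                        self.out[nxt].add(prev)
                        nxt = prev
                    self.pebbles[start] += 1
                    return True
                stack.append(nxt)
        return False

    def add_bar(self, u, v):
        """Return True if bar (u, v) is independent, False if it is redundant."""
        while self.pebbles[u] + self.pebbles[v] < 4:
            if self.pebbles[u] < 2 and self._draw_pebble(u, v):
                continue
            if self.pebbles[v] < 2 and self._draw_pebble(v, u):
                continue
            return False                              # four pebbles unreachable
        self.pebbles[u] -= 1                          # one pebble now covers the new bar
        self.out[u].add(v)
        return True

    def free_dof(self):
        """Free pebbles = 3 rigid-body DOF (in 2D) + internal floppy modes."""
        return sum(self.pebbles)


square = PebbleGame2D(4)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    square.add_bar(u, v)
print(square.free_dof())      # 4: three rigid-body DOF plus one floppy hinge motion
square.add_bar(0, 2)          # one diagonal rigidifies the square
print(square.free_dof())      # 3
print(square.add_bar(1, 3))   # False: the second diagonal is redundant
```

Redundant bars correspond to over-constrained (stressed) regions, and the count of remaining free pebbles gives the number of independent internal motions.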
Our approach in this study is to bias the generation of new conformations in
First/Froda along an eigenvector predicted by ElNemo as a low-frequency mode.
The bias directs the motion of the atoms in the direction of the eigenvector while
the geometric constraint system maintains physically reasonable bonding and steric
contacts and prevents the build-up of distortions that occur in a linear projection.
The method is outlined schematically in Figure 1. We apply our method to a set of six
proteins ranging in size from 58 to 1605 residues. We find that flexible motion in an
all-atom model can be explored to large amplitudes in a few CPU-minutes, until further
motion is limited by bonding or steric constraints; this point represents the calculated
limit of motion along that mode.
2.1. Protein selection
We deliberately selected six proteins for analysis that are diverse in function, structural
characteristics and size, ranging from 58 to 1605 residues. For each protein, we selected
a representative high-resolution structure from the Protein Data Bank (PDB). The
proteins and their PDB codes are listed in Table 1 and their structures are shown in
Figure 2 with colour coding according to the results of rigidity analysis.
Bovine pancreatic trypsin inhibitor (BPTI) is a small well-studied protease inhibitor
of 58 amino acids, comprising mainly random-coil structure plus two antiparallel β-
strands and two short α-helices; the protein has only a small hydrophobic core,
but is additionally stabilized by three intrachain disulphide bonds [21,22]. Mammalian
mitochondrial cytochrome c is a classic electron-transfer protein containing a
redox-active haem group bound within a primarily α-helical protein fold. These two were
selected as contrasting small proteins.
As medium-size proteins we selected α1-antitrypsin and the core catalytic domain of
the motor protein kinesin. The former is a protease inhibitor of the serpin family
which operates via a ‘bait’ mechanism comparable to that of a mouse-trap, involving a
very significant conformational change, whereas the latter is a mechanochemical device
that transduces the chemical energy of ATP hydrolysis into mechanical work, specifically
the depolymerisation of microtubules in the case of this KinI kinesin. Both proteins
comprise an extensive β-sheet core flanked by several α-helices.
Protein disulphide-isomerase (PDI) is a large protein (with more than 500 residues)
comprising four distinct domains, each with a thioredoxin-like fold, connected by two short
and one longer linker; the protein has both redox and molecular chaperone activity
and intramolecular flexibility is essential for its action in facilitating oxidative folding
of secretory proteins [26, 27]. The largest protein selected is an integral membrane
protein (a bacterial protein of 1605 residues) that operates as a pentameric ligand-gated
ion channel (pLGIC); it comprises an extracellular — mainly β-sheet — domain and
a membrane-embedded domain, mainly comprising α-helices which form the lining of
the ion-channel; the mechanisms of ion permeation and channel gating are not yet
completely understood but it is clear that a conformational change is required for
2.2. Rigidity analysis and energy cutoff selection
We add the hydrogen atoms absent from the PDB X-ray crystal structures using the
software Reduce, then remove alternate conformations and renumber the hydrogen
atoms in PyMOL. This produces usable input files for First rigidity analysis. For
each protein we produce a “rigidity dilution” or rigid cluster decomposition (RCD)
plot (displayed in the supplementary Figure S2). The plots show the dependence of
the protein rigidity on an energy cutoff parameter, Ecut, which determines the set of
hydrogen bonds to be included in the rigidity analysis. The tertiary structures, with
residues coloured according to the rigid cluster to which they belong, are shown in
Figures S3–S8 for each of the selected energy cutoffs.
Previous studies suggested that Ecut should be at most −0.1 kcal/mol (that is, at least
0.1 kcal/mol in magnitude) in order to eliminate a large number of very weak hydrogen
bonds, and that a natural choice is near the ‘room temperature’ energy of −0.6 kcal/mol.
We have recently discussed the criteria for a robust selection of Ecut. For each protein
we have selected several energy cutoffs at which to explore flexible motion, as listed in
Table 1. A higher (less negative) cutoff energy retains more hydrogen bonds and hence
increases the number of constraints included in the simulation, which is expected to
restrict protein motion. In each case we have used at least one cutoff at which the
protein is largely rigid (in the range −0.1 kcal/mol to −0.7 kcal/mol) and at least one
lower cutoff at which the protein is largely flexible (in the range −0.5 kcal/mol to
−2.2 kcal/mol).
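The sign convention for Ecut is easy to get backwards, so a minimal sketch of the selection rule may help. It assumes that a hydrogen bond is kept when its energy is at or below the cutoff (energies are negative, so "below" means stronger); the energies are made-up illustrative values, not data from this study, and `select_hbonds` is a hypothetical helper rather than part of First.

```python
from typing import List, Tuple

# (donor atom index, acceptor atom index, energy in kcal/mol); illustrative only
HBond = Tuple[int, int, float]


def select_hbonds(hbonds: List[HBond], e_cut: float) -> List[HBond]:
    """Keep hydrogen bonds at least as strong as the cutoff (energy <= e_cut)."""
    return [hb for hb in hbonds if hb[2] <= e_cut]


# Made-up energies spanning very weak to very strong hydrogen bonds
hbonds = [(1, 10, -0.05), (2, 11, -0.3), (3, 12, -0.8), (4, 13, -1.5), (5, 14, -2.5)]

for e_cut in (-0.1, -0.6, -2.2):
    kept = select_hbonds(hbonds, e_cut)
    print(f"Ecut = {e_cut:5.1f} kcal/mol: {len(kept)} hydrogen bonds kept as constraints")
```

Lowering the cutoff from −0.1 to −2.2 kcal/mol progressively removes constraints, which is the "dilution" traced out by an RCD plot.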
2.3. Normal modes of motion
We obtain the normal modes of motion using the ENM implemented in ElNemo
software [13,14]. This generates, for each protein, a set of eigenvectors and associated
eigenvalues. Other implementations of elastic network models are also available, for
example the AD-ENM of Zheng et al.
The low-frequency modes are expected to have the largest amplitudes and thus be
most significant for large conformational changes. However, the six lowest-frequency
modes (modes 1 to 6) are trivial combinations of rigid-body translations and rotations
of the entire protein. For illustration, here we consider the five lowest-frequency non-
trivial modes, that is modes 7 to 11 for each protein. We will denote these modes as
m7, m8, ..., m11.
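To make the mode numbering concrete, here is a minimal sketch of a generic anisotropic elastic network model: identical springs between Cα sites closer than a cutoff, whose Hessian is diagonalised. ElNemo's own implementation differs in its details; the function names, the 15 Å default cutoff and the random stand-in coordinates are assumptions for illustration. Because the Hessian is invariant under rigid-body motion, its six lowest eigenvalues are zero, so modes m7 to m11 are columns 7 to 11 (1-based) of the eigenvector matrix.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Hessian of a generic anisotropic network model: identical Hookean springs
    (force constant `gamma`) between every pair of sites closer than `cutoff` (Å)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2        # off-diagonal 3x3 super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block         # diagonal blocks: rows sum to zero
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess


def normal_modes(coords, cutoff=15.0):
    """Eigenvalues (ascending) and eigenvectors (as columns) of the ANM Hessian."""
    return np.linalg.eigh(anm_hessian(coords, cutoff))


def nontrivial_modes(evecs, first=7, last=11):
    """Columns for modes m7..m11 (1-based), skipping the six rigid-body modes."""
    return evecs[:, first - 1:last]


rng = np.random.default_rng(0)
ca = rng.uniform(0.0, 10.0, size=(20, 3))   # stand-in C-alpha coordinates
evals, evecs = normal_modes(ca, cutoff=100.0)
print(np.round(evals[:8], 6))               # six ~zero rigid-body eigenvalues first
m7_to_m11 = nontrivial_modes(evecs)         # shape (3N, 5)
```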
The mode eigenvectors are predicted on the basis of a single protein conformation.
The amplitude to which a mode can be projected may be limited by bonding and/or
steric constraints that are not evident in the input structure or fully captured by the
ENM. A linear projection of all the residues in the protein along a mode eigenvector
introduces unphysical distortions of the interatomic bonding. Avoiding this while
projecting a mode to finite amplitude typically requires one or more cycles of a combined
method: the mode is projected linearly until distortions become evident, and the resulting
structure is then relaxed using constrained MD/molecular mechanics [33–35]. We explore an
alternative method for projection of modes to large amplitudes, using rigidity analysis
and geometric simulation.
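Why a linear projection distorts bonding can be seen in a deliberately extreme case: even a pure rigid rotation, written as an eigenvector and projected linearly as x(a) = x0 + a v, stretches every bond, because a mode eigenvector is only the first-order (infinitesimal) description of the motion. The three-site chain and the helper names below are illustrative assumptions, not taken from ElNemo or First.

```python
import numpy as np


def project_linear(coords, mode, amplitude):
    """Move every site along a normalised mode eigenvector: x(a) = x0 + a * v."""
    v = mode.reshape(coords.shape) / np.linalg.norm(mode)
    return coords + amplitude * v


def max_bond_strain(coords0, coords1, bonds):
    """Largest absolute change in bond length (Å) over the listed bonded pairs."""
    i, j = np.array(bonds).T
    d0 = np.linalg.norm(coords0[j] - coords0[i], axis=1)
    d1 = np.linalg.norm(coords1[j] - coords1[i], axis=1)
    return float(np.max(np.abs(d1 - d0)))


# Three C-alpha sites 3.8 Å apart; the "mode" is an infinitesimal rotation about z
chain = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
rotation_mode = np.cross([0.0, 0.0, 1.0], chain).ravel()   # v_i = z x r_i
bonds = [(0, 1), (1, 2)]

for a in (0.0, 2.0, 8.0):
    strain = max_bond_strain(chain, project_linear(chain, rotation_mode, a), bonds)
    print(f"amplitude {a:4.1f}: max bond strain {strain:.3f} Å")
```

The strain grows with amplitude even though the true motion (a rotation) preserves all bond lengths, which is why linearly projected structures need relaxation or, as here, a constraint-enforcing method.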
2.4. Froda mobility simulation
Geometric simulation, implemented in the Froda module within First [10,36], explores
the flexible motion available to a protein with a given pattern of rigidity and flexibility.
New conformations are generated by applying a small random perturbation to all atomic
positions; Froda then reapplies bonding and steric constraints to produce an acceptable
new conformation. Motion can be biased by including a directed component to the
perturbation. The capability to use a mode eigenvector as a bias was implemented in
First/Froda by one of us (SAW) and has been briefly reported previously. The
combination of ElNemo and First/Froda, illustrated schematically in Figure 1, is
described in detail in supplementary material (section S1).
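The perturb-then-restore cycle with a directed bias can be sketched in a few lines. This is a toy, not Froda: Froda's constraint handling (rigid clusters, hinges, steric overlap checks on an all-atom model) is considerably more elaborate. Here a bead chain with fixed 3.8 Å bonds is perturbed randomly plus a small step along a mode direction, and bond lengths are then restored by SHAKE-like iterative corrections. All function names and parameter values (`step`, `bias`, sweep counts) are hypothetical.

```python
import numpy as np


def restore_bonds(x, bonds, lengths, n_sweeps=500, tol=1e-8):
    """SHAKE-like sweeps: nudge each bonded pair back towards its target length."""
    x = x.copy()
    for _ in range(n_sweeps):
        worst = 0.0
        for (i, j), target in zip(bonds, lengths):
            d = x[j] - x[i]
            r = np.linalg.norm(d)
            err = r - target
            worst = max(worst, abs(err))
            shift = 0.5 * err * d / r          # split the correction between both ends
            x[i] += shift
            x[j] -= shift
        if worst < tol:
            break
    return x


def biased_step(x, mode, bonds, lengths, rng, step=0.05, bias=1.0):
    """Random perturbation plus a directed component along `mode`, then re-impose bonds."""
    direction = mode.reshape(x.shape) / np.linalg.norm(mode)
    trial = x + rng.normal(scale=step, size=x.shape) + bias * step * direction
    return restore_bonds(trial, bonds, lengths)


rng = np.random.default_rng(1)
chain = np.array([[3.8 * k, 0.0, 0.0] for k in range(6)])
bonds = [(k, k + 1) for k in range(5)]
lengths = [3.8] * 5
# Hypothetical bias direction: infinitesimal rotation of the chain about its centre
mode = np.cross([0.0, 0.0, 1.0], chain - chain.mean(axis=0)).ravel()

x = chain
for _ in range(200):
    x = biased_step(x, mode, bonds, lengths, rng)
```

The key design point carried over from the text is that each new conformation is accepted only after the constraints are re-imposed, so errors cannot accumulate the way they do in a linear projection.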
Since the displacement from one conformation to the next is small, we record
only every 100th conformation and continue the run for typically several thousand
conformations. The run is considered complete when no further projection along
the mode eigenvector is possible (due to steric clashes or bonding constraints); this
manifests itself in slow generation of new conformations and poor reproducibility in the
results of independent runs. We have performed Froda mobility simulation for each
protein at several selected values of Ecut, see section 2.2.
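The recording scheme and a simple saturation check can be sketched as follows. The completion criterion described above is slow generation of new conformations and poor reproducibility between runs; the function `has_plateaued` below is only a hypothetical proxy that looks at the recorded fitted-RMSD trace, and its `window` and `min_gain` thresholds are arbitrary assumptions. The RMSD data are synthetic.

```python
def record_trace(rmsd_per_step, every=100):
    """Keep only every `every`-th conformation's RMSD, mirroring sparse recording."""
    return rmsd_per_step[::every]


def has_plateaued(trace, window=10, min_gain=0.05):
    """Hypothetical stopping proxy: True if the recorded fitted RMSD (Å) rose by
    less than `min_gain` over the last `window` recorded frames."""
    if len(trace) <= window:
        return False
    return trace[-1] - trace[-1 - window] < min_gain


# Synthetic run: near-linear rise, then saturation at 3 Å once motion is jammed
rmsd = [min(0.001 * s, 3.0) for s in range(6000)]
trace = record_trace(rmsd)          # 60 recorded frames
print(has_plateaued(trace[:20]))    # still in the linear phase
print(has_plateaued(trace))         # saturated
```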
During conformation generation we track the fitted RMSD between α carbons
of the initial and current conformation. This measure is discussed in more detail in
supplementary material (section 2.5). To project a mode to an amplitude of several Å
in fitted RMSD typically takes a few CPU-minutes on a single processor. We carry out
five parallel simulations for each structure, mode and direction of motion and monitor
the evolution of fitted RMSD during each run, as illustrated in Figures 3–6.
2.5. Raw and fitted RMSD
The RMSDs reported in Table S1 are α carbon RMSDs from the input structure
to a generated conformation, obtained after least-squares fitting using the PyMOL
intra_fit command. These values differ somewhat from the raw RMSD values reported
by Froda in its output files, which are calculated without any fitting being carried out.
In particular, the fitted RMSD saturates once further motion along the mode direction
is no longer possible, due to steric clashes or limits imposed by covalent or noncovalent
bonding constraints. The raw RMSD is greater than the fitted RMSD and tends not
to saturate, but rather to continue to increase slowly, once the motion is effectively
jammed. The reason for this different behaviour is a small difference in the statistical
weighting given to each residue by ElNemo and by First/Froda. In the elastic
network modelling, every residue is given equal statistical weight. The non-trivial mode
eigenvectors thus generated have no significant component of rigid-body motion for the
whole structure. In Froda, however, the bias is applied to an all-atom representation of
the structure; and thus the bias applied to a residue with many atoms affects the whole-
body motion of the structure more than the bias applied to a residue with few atoms.
The motion in Froda therefore acquires a small component of rigid-body translation
and rotation, which increases the raw RMSD. Least-squares fitting to the input structure
removes the rigid-body components, so the fitted RMSD detects actual conformational change.
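The distinction between raw and fitted RMSD, and the origin of the rigid-body drift, can be illustrated with a short sketch. Fitting here uses the standard Kabsch least-squares superposition, which plays the same role as PyMOL's fitting but is not PyMOL's code; the coordinates are random stand-ins. The last lines show the weighting argument: two residues displaced equally and oppositely give zero net translation with equal per-residue weights (as in the ENM), but a nonzero one when weighted by atom counts (here 7 and 24, the all-atom counts of glycine and tryptophan residues, chosen only for illustration).

```python
import numpy as np


def raw_rmsd(a, b):
    """RMSD between matched coordinate sets (N x 3) with no superposition."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def fitted_rmsd(a, b):
    """RMSD after optimal least-squares superposition of a onto b (Kabsch algorithm)."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a_c.T @ b_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # avoid an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return raw_rmsd(a_c @ rot.T, b_c)


def net_translation(displacements, weights):
    """Weighted mean displacement of a per-residue displacement field (N x 3)."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * displacements).sum(axis=0) / w.sum()


rng = np.random.default_rng(0)
ca = rng.uniform(0.0, 20.0, size=(50, 3))           # stand-in C-alpha coordinates
theta = np.deg2rad(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ca @ rz.T + np.array([5.0, 0.0, 0.0])       # pure rigid-body motion
print(raw_rmsd(ca, moved), fitted_rmsd(ca, moved))  # large raw value, fitted ~0

disp = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
print(net_translation(disp, [1, 1]))                # equal weights: no net drift
print(net_translation(disp, [7, 24]))               # atom-count weights: net drift
```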
The effects of fitting are illustrated in Figure S1a for mode m7 of structure
1BPI. The raw RMSD values increase almost linearly during the generation of 10000
conformers, whereas the fitted RMSD values saturate for conformers from ≈ 5000 up to
10000. Conformers 5000 and 10000 differ by ≈ 3 Å in raw RMSD but by only ≈ 0.8 Å
in fitted RMSD. Superpositions of conformers 0, 5000 and 10000 with and without
fitting, shown in Figures S1b,c, show that conformers 5000 and 10000 are indeed very
similar to each other. Hence, fitting structures before calculating RMSD values allows
us to identify real conformation change and remove the component of rigid-body motion
introduced by Froda.
3.1. Conformer generation with mode bias
The output of the mobility simulations for BPTI (1BPI) is summarized in Figures 3a–c.
The evolution of RMSD for each of m7, ..., m11, during runs at two selected values of
Ecut, is represented in Figures 3b and 3c respectively. In all cases, we observe an initial
phase in which the RMSD increases almost linearly, as the protein explores the mode
direction without encountering significant steric or bonding constraints on the motion.
During this phase, generation of new conformations in Froda is very rapid and the