Nonkinetic modeling of the mechanical unfolding of multimodular proteins: theory and experiments.
ABSTRACT We introduce and discuss a novel approach called back-calculation for analyzing force spectroscopy experiments on multimodular proteins. The relationship between the histograms of the unfolding forces for different peaks, corresponding to a different number of not-yet-unfolded protein modules, is exploited in such a manner that the sole distribution of the forces for one unfolding peak can be used to predict the unfolding forces for other peaks. The scheme is based on a bootstrap prediction method and does not rely on any specific kinetic model for multimodular unfolding. It is tested and validated in both theoretical/computational contexts (based on stochastic simulations) and atomic force microscopy experiments on (GB1)(8) multimodular protein constructs. The prediction accuracy is so high that the predicted average unfolding forces corresponding to each peak for the GB1 construct are within only 5 pN of the averaged directly-measured values. Experimental data are also used to illustrate how the limitations of standard kinetic models can be aptly circumvented by the proposed approach.
-
Citations (0)
-
Cited In (0)
Page 1
Nonkinetic Modeling of the Mechanical Unfolding of Multimodular
Proteins: Theory and Experiments
F. Benedetti,†C. Micheletti,‡§G. Bussi,§S. K. Sekatskii,†and G. Dietler†*
†Laboratory of Physics of Living Matter, Ecole Polytechnique Fe ´de ´rale de Lausanne (EPFL), Lausanne, Switzerland; and‡Scuola
Internazionale Superiore di Studi Avanzati and§Consiglio Nazionale delle Ricerche-Istituto per l’Officina dei Materiali Democritos and Italian
Institute of Technology, SISSA Unit, Trieste, Italy
ABSTRACT
on multimodular proteins. The relationship between the histograms of the unfolding forces for different peaks, corresponding to
a different number of not-yet-unfolded protein modules, is exploited in such a manner that the sole distribution of the forces for
one unfolding peak can be used to predict the unfolding forces for other peaks. The scheme is based on a bootstrap prediction
method and does not rely on any specific kinetic model for multimodular unfolding. It is tested and validated in both theoretical/
computational contexts (based on stochastic simulations) and atomic force microscopy experiments on (GB1)8multimodular
protein constructs. The prediction accuracy is so high that the predicted average unfolding forces corresponding to each
peak for the GB1 construct are within only 5 pN of the averaged directly-measured values. Experimental data are also used
to illustrate how the limitations of standard kinetic models can be aptly circumvented by the proposed approach.
We introduce and discuss a novel approach called back-calculation for analyzing force spectroscopy experiments
INTRODUCTION
During the last decade, single-molecule force spectroscopy
experiments based on optical tweezers or atomic force spec-
troscopy have acquired increasing importance for character-
izing properties of individual proteins, as well as protein
complexes. Among the hundreds of such studies carried out
so far, it is particularly worth mentioning force spectroscopy
investigations of multimodular proteins. These constructs
typically consist of a series of protein modules that are cova-
lentlylinkedattheirends(see,e.g.,(1–12)).Uponpullingthe
constructs at both ends, a series of unfolding events are
observed. The forces at which the unfolding events occur
carry awealth ofinformation about the unfolding mechanics
andkineticsoftheconstructmodules(3,13,14).Customarily,
this information is extracted by analyzing the force distri-
bution obtained by gathering together the succession of un-
folding forces over repeated stretching experiments and
analyzing them with different methods, such as Monte Carlo
simulation andregression tozero force (5,15–19).Thescope
and utility of these commonly employed analysis techniques
can be considerably extended by examining separately the
distribution of the forces associated with the first, second,
etc., unfolding event in the constructs. This approach, which
so far has been applied only limitedly (5,20), is particularly
appropriate and useful whenthe constructconsistsofrepeats
of the same type of globular protein (such as I27 or GB1). In
fact, because of the identical nature of the modules, it is ex-
pected that the forces associated with the various unfolding
events depend on the number of unfolded modules still
present on the molecule, but that the statistical distributions
should nevertheless be tied by a definite relationship, as
pointed out in previous studies (5,20,21,22). To the best of
ourknowledge,suchdependencehasnotyetbeenadequately
exploredorexploitedinexperimentalcontexts.Furthermore,
one may envisage using only the limited information con-
tained in the experimental distribution of one single group
ofunfoldingforcestopredict with highaccuracytheaverage
unfolding forces of all other groups. This issue also has not
been addressed before, and we therefore investigate it in
this study.
The problem is here attacked at two levels. First we
adopt a simplified analytical scheme, which implicitly relies
on a standard kinetic model for the unfolding of the protein
modules (Evans’s theory). This method, which builds on a
treatment introduced in previous studies (20,21,23), com-
bines a transparent analytical formulation with the sim-
plicity of implementation and use. Yet the simplifying
assumptions that allow for the exact analytical treatment
of the model come at a disadvantage, since the predicted
probability distributions for the unfolding forces of the
various peaks can be significantly different from the mea-
sured ones.
This limitation can be overcome by using the alternative
and more general phenomenological approach introduced
and discussed here for the first time that we know of. The
scheme, based on the bootstrap statistics and termed back-
calculation, is parameter-free and does not rely on any spe-
cific kinetic model. The method merely uses the probability
distribution of forces associated with one of the unfolding
events (the first, the second, etc.) and predicts the distribu-
tion of forces of all other events. The method is validated
against data obtained from stochastic simulations (both Lan-
gevin and Monte Carlo) and from atomic force microscopy
(AFM) experiments carried out on multimodular GB1 con-
structs. In all cases, the average forces associated with any
Submitted April 27, 2011, and accepted for publication July 1, 2011.
*Correspondence: giovanni.dietler@epfl.ch
Editor: Laura Finzi.
? 2011 by the Biophysical Society
0006-3495/11/09/1504/9$2.00
doi: 10.1016/j.bpj.2011.07.047
1504Biophysical JournalVolume 101September 20111504–1512
Page 2
unfolding event are well predicted by back-calculation.
Deviations from the experimental measured values are of
only 5 pN, a quantity that is smaller than the uncertainty
typically associated with experimental estimates for protein
unfolding forces.
MATERIALS AND METHODS
Experiment
Our experiments were performed on multimeric constructs consisting of
eight GB1 modules, hereafter denoted as (GB1)8, and dissolved in Tris/
HCl buffer (10 mM, pH 7.5) at a concentration of ~20 mg/ml (1,2,24).
The force-extension curves of (GB1)8 were measured by means of a
commercially available AFM system (Picoforce AFM Nanoscope IIIa,
Bruker, Madison, WI) using a V-shaped silicon nitride cantilever (NP,
Bruker). The spring constant of the lever was measured from thermal fluc-
tuation measurements (25) as part of the AFM calibration procedure and
was found to be equal to 0.0575 N/m. The constructs were pulled along
the x direction at the speed x ¼ 2180 nm/s. Further details of the standard
experimental protocol that was followed can be found in a recently pub-
lished note (21).
A typical experimental force-extension curve is presented in Fig. 1.
Because the AFM tip will not necessarily pick the construct at its free
end, the number of modules trapped between the anchored end and the
AFM tip can be <8. As a matter offact, among the curvespresenting a clear
detachment peak, the most numerous group was the one displaying six
unfolding peaks. We therefore limited considerations to this set of force-
extension curves. The curves were analyzed using Hooke (26) an open-
source software package designed to analyze the force spectroscopy curves.
Hooke was further used to analyze the data from Langevin simulations of
the stretching of multimodular protein constructs (see below).
Numerical simulations
Two different computational approaches, namely Monte Carlo and Lange-
vin simulations, were used to study the mechanical unfolding of multimod-
ular protein constructs. In both cases, the pulled construct is assumed to be
anchored at one end while the other is pulled at fixed speed. The end-to-end
distance of each protein module projected along the pulling direction, x, is
used as an effective order parameter to describe the module state. This
corresponds to considering the system as being effectively one-dimen-
sional, as in the sketch of Fig. 2 a. This is a good approximation, since
for the typical unfolding forces at play in our experiments, the end-to-end
distanceofourconstruct,asobtainedfromawormlike-chain (WLC)model,
FIGURE 1
experiment on a (GB1)8construct. The lower curve shows the force as the
AFM tip approaches the substrate until the contact is established. The upper
curve represents the force while the tip is retracted from the substrate and
shows a series of unfolding events of the construct picked up by the tip.
Notice that this trace displays six unfolding events. The effect of an aspe-
cific interaction at the beginning of the retracting trace is observed. The
peak forces leading to the unfolding of the various modules are measured
with respect to the background provided by the constant part of the retract-
ing curve.
Typical force-extension curve recorded in an AFM stretching
0510 15
x (nm)
0
5
10
15
20
25
30
U(x)/kB T
xF
xT
x2
xU
x1
GFT
GFU
x0
x1
x2
x3
x4
x5
a
b
FIGURE 2
module of the construct in the Langevin simulation (see Eq. 1). A large
value of A, equal to 100 pN ? nm2, was used in Eq. 1 to enforce the
constraint that the modules cannot be stretched beyond the nominal GB1
contour length, Lc¼ 18 nm. The reference end-to-end separation of the
folded state, xF, is set equal to 4 nm and the the end-to-end separation
between the folded state and the transition (T) state, Dx ¼ xT? xF, is
0.5 nm. The reference end-to-end separation of the unfolded state, xU, is
equal to ðLc? xTÞ=2. Consistent with what has been established in previous
studies, the barrier separating the folded and transition states is set equal to
DGFT¼ 20kBT, whereas the barrier between the folded and unfolded states
has the value DGFU¼ 5kBT, with the temperature, T, equal to 300 K (i.e.,
kBTz4:2pN nm). For simplicity, the curvature, kT, is set equal to kF.The
value of kFin turn is set to 4DGFT=ðxT? xFÞ2¼ 1344 pN/nm to ensure
the continuity of the potential and its derivative at the midpoint,
x1¼ ðxFþ xTÞ=2, where the first two parabolas in Eq. 1 meet. The value
of kUwas set to be much smaller than kF, at kU¼ kF/500 ¼ 2.69 pN/nm.
The value of x2was finally obtained by the requirement of continuity of
the potential. To avoid an excessive parameterization of the model, flexible
linkers in the construct are described as unfolded protein modules. (Inset)
Protein model used in Langevin simulations. (b) A force-extension curve
obtained with the Langevin simulation applied to a model construct that
initially comprised six folded modules intercalated by seven linkers (each
linker has the same length as an unfolded module).
(a) Illustration of the anharmonic spring potential for one
Biophysical Journal 101(6) 1504–1512
Modeling of the Unfolding of Proteins1505
Page 3
is expected to be almost equal to its contour length, so that fluctuations in
the y and z directions can be neglected.
Depending on the value of the end-to-end separation, each module is
considered as being folded (F) or unfolded (U); these two states are sepa-
rated by a barrier of potential energy whose height is modulated by the
applied tensile force. The effective potential energy, U(x), is modeled
explicitly in the Langevin scheme, where one integrates the stochastic
equation of motion for each of the tethered modules in the construct that
is being pulled. By contrast, no explicit representation of the construct is
considered in the Monte Carlo approach. The latter, in fact, is employed
to model the succession of discrete unfolding events occurring at force-
dependent rates.
The two methods clearly embody rather different strategies for simu-
lating the stretching experiments and, also in view of the different parame-
ters used in the corresponding stochastic simulations, are useful to probe
the generality and transferability of the back-calculation method (BC)
proposed here.
A detailed description of the two methods is provided hereafter.
Langevin simulations
With reference to the sketch in the inset of Fig. 2 a, the anchored end of the
construct is located at x0¼ 0, while the other end (x4in the sketch) is
attached to the moving AFM tip. For simplicity, to parallel what is done
in the Monte Carlo scheme below, the latter is modeled as a Hookean spring
(x4–x5in the sketch) with spring constant kAFM¼ 0.01 N/m. Each protein
module behaves as an anharmonic spring; the associated free-energy
profile, U(x), is shown in Fig. 2 a and described by the expression
8
<
>
UðxÞ ¼
A
Lc? xþ
>
>
>
>
>
>
>
>
>
:
1
2kFðx ? xFÞ2if x<x1
DGFT?1
DGFUþ1
2kTðx ? xTÞ2if x1<x<x2
2kUðx ? xUÞ2otherwise
:
(1)
The model parameters are chosen to be consistent with the overall shape
of the potential energy typically found in proteins (27) and are provided in
the caption to Fig. 2. In particular, the contour length of each module is
equal to the nominal contour length of GB1, Lc¼ 18 nm, the reference
end-to-end separation of the folded state is xF¼ 4 nm, and its distance to
the transition state is Dx ¼ 0.5 nm.
In multimodular protein constructs, each protein module is connected to
the next via a short peptidic linker of length 1.5 nm. To keep at a minimum
the number of parameters in the model, we described these linkers, which
clearly do not undergo any transition upon stretching, by unfolded protein
modules. To do so, we initially prepared the pristine construct as a succes-
sion of folded modules with initial end-to-end separation equal to xF, inter-
calated with unfolded modules with initial end-to-end separation equal to
xU.The potential energy barrier separating the F and U states is sufficiently
high that an initially unfolded module will not spontaneously refold over
the short timespan of the model stretching experiment.
The total potential energy ofthe homomeric modulechain composedof n
protein modules, ‘ linkers, and the AFM tip is given by
Hðx1;x2;::xn;xnþ‘þ1Þ ¼
X
þ1
nþ‘
i¼1
Uðxi? xi?1Þ
2kAFMðxnþ‘þ1? xnþ‘Þ2:
(2)
The time evolution of the key construct positions, xi¼1;.nþ‘; follows the
overdamped Langevin dynamics:
g_ xi ¼ ?vH
vxiþ hðtÞ;
(3)
where g ¼ 4.4 ? 10?5pN s/nm is the friction coefficient appropriate to
yield (according to Kramer’s theory) a spontaneous unfolding rate (at
zero applied force) equal to koff¼ 10?2s?1. hðtÞ is a Gaussian white noise
with zero mean and variance equal to 2kBTg (kBis the Boltzmann constant
and T ¼ 300 K is the temperature). Notice that the derivative of the poten-
tial U (entailed by the derivative of H) is not continuous in x2.
The stochastic equations of motion were integrated numerically with a
time step of 1 ns. After an initial equilibration, the position of the AFM
tip, xnþ‘þ1 is moved at constant velocity, xnþ‘þ1ðtÞ ¼ vt, with n ¼
500 nm/s. This velocity value is commonly employed in stretching simula-
tions and falls in the typical range of pulling velocities used in experiments
(28). The typical time span required to unfold all the n ¼ 6 modules in the
constant-velocity simulation was 0.25 s.
The force/extension curve of the system is obtained by recording the
restoring force experienced by the AFM tip, f ¼ kAFMðxnþ‘þ1? xnþ‘Þ,
as a function of the AFM tip position, xnþ‘þ1, as shown in Fig. 2 b. Several
hundred such curves were collected and analyzed with Hooke after per-
forming a time average over windows of duration 0.15 ms to mimic the
finite time resolution of a typical experiment.
Monte Carlo simulations
As anticipated at the beginning of the section, the Monte Carlo approach
(here implemented as in studies by Rief and colleagues (28,29) and Zinober
and colleagues (30)) provides a phenomenological approach to the kinetics
of mechanical unfolding. The advantage of its transparent formulation is
balanced by the highly simplified nature of the model. In particular, by
contrast with the Langevin modeling of biopolymer stretching employed
here and in other approaches (31), no explicit representation of the module
constructs is considered, and the linkers are not accounted for. In addition,
the pullingactionis assumedto act equallyon all the nmodules,causingthe
same steady increase of the end-to-end separation for each of them. Notice
that because of the limitedsound velocity in the chain, this condition is only
approximately realized in Langevin schemes and experiments (where other
effects, such as viscosity, can be at play). In any case, the lower the pulling
rate the better the approximation is expected to be.
Within the above assumption, the end-to-end distance (equal to zero at
the initial time, t ¼ 0) of each one of the n modules at time t is equal to
ðyt ? FðtÞ=kAFMÞ=n. In this study, we considered n ¼ 6 and v ¼ 500 nm/s,
and the effective spring constant of the AFM tip is set to kAFM¼ 0.01 N/m.
Notice that kAFMis smaller than the nominal spring constant of the tip used
in our typical stretching experiments. This is because kAFMstands for an
effective spring that, in addition to the AFM tip, accounts for stiffness of
the folded modules, which are not explicitly included in our Monte Carlo
scheme. With this simplified description, the loading rate is not dependent
on the number of folded modules, which brings the Monte Carlo closer to
the BC assumptions, as discussed later. We underline that our goal here is to
provide a benchmarkfor the BC and notto reproducethe experimental data,
so a qualitative picture is satisfactory at this stage. The instantaneous force
experienced by each module is computed from the theoretical force-exten-
sion curve, fWLC(x) of an equilibrated WLC with contour length Lc¼ 18 nm
(appropriate for GB1) and a persistence length of lp¼ 0.4 nm. The progres-
sive loading of the modules is followed at time increments of duration
Dt ¼ 1.6 ? 10?5s. At the (discrete) time, t, the probability that one of
the modules yields and becomes unfolded is computed within the Evans
approximation (13) disregarding the refolding probability:
?fWLCðntÞ ? Dx
pðtÞ ¼ koffexp
kBT
?
Dt;
(4)
where kBis the Boltzmann constant and T ¼ 300 K is the system tempera-
ture. The effectivevalues koff¼ 0.11s?1and Dx ¼ 1.44A˚are obtained from
Biophysical Journal 101(6) 1504–1512
1506 Benedetti et al.
Page 4
a fit of the experimentaldata usingEvans’s theory as in Benedetti et al. (20).
The fittingprocedureensuresthatthe unfoldingforcesfall in a rangesimilar
to the experimental ones, although a precise match is neither expected nor
sought. The Monte Carlo scheme consists of drawing a random number,
uniformly distributed in the [0,1] interval, for each of the n protein modules
and comparing it with p(t). An unfolding event occurs when one of the n
random numbers is smaller than p(t). The associated unfolding force is
recorded and the calculation is next repeated with the n – 1 modules. The
statistical distribution of the unfolding forces for each value of n was ob-
tained from 1000 repeats of the Monte Carlo unfolding simulations.
Analytically solvable model
Simple analyticalexpressions for the probabilitydistributionsof the unfold-
ing forces, and the associated meanvalues and variance, as a function of the
numberofdomains,n,canbeobtainedbyintroducingafurthersimplification
besides the ones introduced for the Monte Carlo scheme. Specifically, each
proteinmoduleistreatedasaharmonicspring(asintheLangevinapproach)
ratherthanaWLC,andtheunfoldingprocessfollowstheBell-Evanstheory.
Within these assumptions, the probability distribution of unfolding forces
has been previously worked out both for single-chain stretching (see, e.g.,
Hummer and Szabo (23)) and for multimodular constructs (20,21). For
completeness, and for the purposes of better discussing the phenomenolog-
ical BC method, an analogous derivation is provided here.
Let us consider a model construct consisting of n0þ ‘ harmonic springs:
the n0initially folded modules have spring constant equal to kF, whereas the
remaining ‘ have a smaller spring constant, kU,as appropriate for unfolded
modules. The model construct is subject to the AFM pulling action (the
AFM tip is again modeled as a harmonic spring with constant kAFM).
Because the tip is pulled at constant velocity, v, the tensile force experi-
enced at time t by each construct is equal to
nt
n=kFþ ðn0? n þ ‘Þ=kUþ 1=kAFMhkeffnt;
where keffis the effective spring constant of the construct in series with the
AFM tip and its inverse decreases with n as k?1
k?1
the completely unfolded construct, and A is a correction term that describes
the dependence of the spring constant on the number of folded modules.
Following Evans’s theory, the survival probability that any one module
has remained folded up to time t is equal to (23)
fðtÞ ¼
(5)
eff¼ ðn0? n þ ‘Þk?1
Uþ
AFMþ nk?1
Fhk?1ð1 ? AnÞ. Here k?1is the inverse spring constant of
S1ðtÞhexp
"
?
ZfðtÞ
0
koffe
vkeff
fDx
kBT
df
#
:
(6)
The probability that all the n modules have remained folded up to time t,
or equivalently up to the loading force fðtÞ ¼ nt keff, is simply obtained by
raising the above expression to the power n,
SnðtÞ ¼ S1ðtÞn:
(7)
By differentiating Snwith respect to f, one obtains the probability distri-
bution, pnðfÞ, for the force at which the first unfolding event occurs in
a chain of n modules.
The sought expression is
0
B
pnðfÞf exp
B
@
fDx
kBT?nð1 ? AnÞkoff
Dxkv
kBT
e
fDx
kBT
1
C
C
A;
(8)
where the proportionality factor, containing the normalization of the prob-
ability distribution, was omitted.
Since the function above is typically nonnegligible only for positive f, we
can compute its average and variance integrating over ½?N;þN?, which
leads to the analytical result
2
Dxkn
hfin¼ ?kBT
Dx
664g þ log
koff
kBT
þ log?n ? An2?
3
775
(9)
s2
n¼
p2
6
?Dx
kBT
?2
(10)
where g ? 0:577 is the Euler-Mascheroni constant.
We remark here that the variance is independent of the number of folded
modules,n,intheconstruct.Thisresultisrelatedtotheempiricalobservation
that in typical stretching experiments of a single protein construct, the vari-
ance of the unfolding force is largely independent of the loading rate (23).
If the dependence of the springconstant on the number offoldedmodules
can be neglected, the average unfolding force acquires a particularly simple
expression:
hfin¼ ?g
a?log½bn?
a
;
(11)
where the parameters a and b are obtained from the average force and
variance for a given n: a ¼ p=
Finally, we notice that for all values of n, the expression of Eq. 8 corre-
sponds to a Gumbel extremal distribution (32) with the fat tail extending
toward low values of the force, f. Accordingly, theviability of the analytical
model to capture the statistical properties of the unfolding forces measured
for a given value of n can be ascertained by checking whether the forces
follow the Gumbel distribution. To address this point, we employed the
Anderson-Darling test and computed the significance level to which one
can support the null hypothesis that the data originate from a Gumbel distri-
bution. According to custom, the threshold of 5% statistical significance
was used to accept or reject the null hypothesis.
ffiffiffiffiffiffiffi ffi
6s2
p
and b ¼ exp½?g ? ahfin?=n.
Back-calculation
The previous analytical results rely on a definite kinetic model (Evans’s
theory) and on the harmonic modeling of the elastic response of the AFM
tip and the protein modules. These effects could be included in a more
general theoretical framework which, however, would not yield simple
analytical calculations.
This difficulty can be circumvented using a simple and physically
appealing phenomenological approach, which we term the back-calculation
method, described hereafter. The method is parameter-free, as it relies on
the knowledge of the empirical probability distribution of the unfolding
forces at one particular value of n. This reference distribution can be
used straightforwardly to predict the average value of the force and its vari-
ance at all other values of n. The scheme is best illustrated assuming that the
reference distribution is the one for n ¼ 1, pn¼1ðfÞ. This distribution is
directly obtained from the data gathered in the stretching experiments or
from the stochastic simulations (Fig. 3). In the same spirit of the Monte
Carlo and the analytically solvable model, we assume that the loading
rate is sufficiently low that at any given time, all modules experience the
same instantaneous tensile force applied at their ends, f, and that each of
them can unfold independently from the others. We also assume that the
stiffness of the construct, defined as the derivative of f with respect to its
length, is not dependent on the number of folded constructs, n. This is
equivalent to considering A ¼ 0 in the analytical model, and it is realistic
for investigated cases (for counterexamples, see King et al. (21)). Under
Biophysical Journal 101(6) 1504–1512
Modeling of the Unfolding of Proteins1507
Page 5
these assumptions, without resorting to any kinetic model or lengthy
stochastic simulations, the average unfolding force associated with the
nth peak, hfin, is computed by drawing n random numbers distributed ac-
cording to pn¼1ðfÞ and taking the smallest of them as the force at which
one of the n modules first unfolds. The averagevalue of the unfolding force,
hfin(and its variance), is clearly obtained by repeating the batch force
sampling process several times.
OnemayusetheorderedlistofNmeasurementstoconstructacumulative
probability distribution interpolated linearly between consecutive measured
values. The cumulative distribution is next straightforwardly used (see
Chapter 7.3 in Press et al. (33)) to sample, with the correct weight, the n
force values. Describing the process in terms of the cumulative distribution
has also the following important advantage. It is possible to exploit the
simple relationship of Eq. 7 (which is based on the assumption of indepen-
dence and hence valid regardless of the specific underlying kinetic process)
to generate data for unfolding forces of the nth peak starting from the data
obtained for a peak with a different order, say mth.
In fact, indicating by
Zf
?N
the cumulative distribution for the unfolding forces of the mth peak, it can
be determined that the corresponding cumulative distribution for the nth
peak is
QnðfÞ ¼ 1 ? ð1 ? QmðfÞÞn=m:
QmðfÞ ¼
df0pmðf0Þ
(12)
It is importantto stressthat the aboverelationshipsare ofhigh conceptual
and practical interest for recovering the distribution of unfolding forces of
one peak, say n ¼ 1, starting from a peak of higher order, say m ¼ 2.
A detailed description of how this backward extrapolation can be practi-
cally implemented in a numerical scheme is provided in the Appendix,
and the results are provided in the Supporting Material. The results dis-
cussed hereafter are produced with a more refined method where the prob-
ability pn¼1ðfÞ is obtained from fitting the histogram of the raw force
measurements with a convolution of Gaussians using the kernel density
estimation (KDE) (34) (Fig. 3). Data are sampled according to this distribu-
tion using either the cumulative distribution, or the rejection scheme (see
Chapter 7.3 in Press et al. (33)).
RESULTS AND DISCUSSION
Forall the three systems ofinterest (the GB1 experiment and
the Monte Carlo and Langevin simulations), we analyzed
the data of the force-versus-extension (or equivalently
force-versus-time) curves. In all three cases, the data per-
tained to the stretching of constructs of n0¼ 6 modules,
and therefore, the few curves that did not display a clear
presence of six force peaks were discarded.
The peaks were indexed in an inverse order with respect
to their order of appearance in the stretching experiment.
Specifically, the peak of order n ¼ 6 corresponds to the
peak observed first (when six folded modules were present
before the unfolding event), whereas peak order n ¼ 1 corre-
sponds to the unfolding event for which only one module
was present before the unfolding event and occurring imme-
diately before the construct detachment from the support.
The peak force data for each value of n were next considered
(see Benedetti et al. (20) for details on the automated peak
division procedure) and used to compute the histograms
reflecting the force distribution. The probability distribution
is obtained with a convolution of Gaussians using the KDE
method mentioned in the Methods section. The resulting
normalized distribution of the forces, pn(f), at which a single
module unfolds in the Monte Carlo scheme and GB1 exper-
iments is shown in Fig. 3. The best-fit Gaussian convolu-
tions were used to obtain a robust estimate of the average
unfolding force and its standard deviation (SD) at each value
of n. The results are provided in Tables 1–3 and Figs. 4–6.
The best-fit distribution for the last surviving peak, n ¼ 1,
was typically used as input for the back-calculation and
analytically solvable methods to obtain predictions for the
average unfolding forces at all values of n. For the case of
highest practical interest, namely, the GB1 experiment, the
distribution of unfolding forces of all other peaks,
n ¼ 2; 3; 4; 5; 6, was also used to predict the unfolding
forces of other peaks (see Table S1 in the Supporting
Material).
a
b
FIGURE 3
for the n ¼ 1 peaks (i.e., near the detachment point) obtained for (a) Monte
Carlo simulation and (b) GB1pulling experiments. The continuous line in
both cases represents the Gaussian KDE estimated from the raw data.
The histograms are both normalized.
Normalized probability distribution of the unfolding forces
TABLE 1Unfolding forces from Monte Carlo simulations
Comparison with n ¼ 1 back-calculated values
n654321
Average, Monte Carlo data
Average, n ¼ 1 BC
SD, Monte Carlo data
SD, n ¼ 1 BC
All values are given in pN.
142
138
40
29
146
142
35
31
150
148
35
32
157
156
35
31
166
168
37
33
187
—
35
—
Biophysical Journal 101(6) 1504–1512
1508 Benedetti et al.
Page 6
Monte Carlo data
We start by discussing the application of the method to data
generated using the Monte Carlo procedure. Of the three
sets of data (from experiment and Langevin and Monte
Carlo simulations), this set is the one that is expected to
be most appropriately captured by back-calculation. The
Monte Carlo scheme indeed builds on the identical kinetic
status of all the modules, and during this process, only the
total contour length changes, with very mild effect on the
loading rate.
By using the n ¼ 1 data, it is indeed seen in Table 1 that
the mean values of the predicted and measured unfolding
forces are in good agreement for all peaks n ¼ 2.6, with
differences always <5 pN. The agreement is readily
perceived in Fig. 4, where it is seen that the BC data up to
n ¼ 4 fall within the statistical uncertainty of the Monte
Carlo data, and only the forces predicted at n ¼ 5 and n ¼
6 present SDs of ~2.5 from the Monte Carlo data.
A more challenging quantity to compare is the second
moment of the distribution, which is the variance or, equiv-
alently, the SD. For the latter quantity, the agreement is still
good. The deviation of the Monte Carlo and back-calculated
values, jsBC? sdataj=ðsBCþ sdataÞ, is typically within 10%
and is worst for the last peak, n ¼ 6, for which it is 16%.
The results of the analytical model present an accord with
the Monte Carlo data that is comparable with their agree-
ment with the BC. This is illustrated by the dashed line in
Fig. 4, which reports the analytical predictions based on
the Monte Carlo data for n ¼ 1 (data for this case and other
values of n are provided in the Supporting Material). The
good accord is nontrivial in view of the fact that the simpli-
fied analytical treatment describes the folded protein
domains as harmonic springs, whereas the Monte Carlo
data were generated employing a WLC model for each
domain. We carried out the Anderson-Darling statistical
test described in the Methods section and established that
the Monte Carlo data for n ¼ 1 (and higher values, too)
are compatible with an underlying Gumbel distribution.
This reinforces the applicability of the simplified analytical
scheme in the model Monte Carlo context.
Langevin data
The same analysis was repeated for the data generated using
the Langevin scheme, which contains several differences
from the Monte Carlo scheme. Specifically, the Langevin
scheme does not enforce either Evans’s kinetics or the
same precise behavior of all folded modules in the chain.
In addition, it accounts for the presence of model linkers
between the folded modules, and finally, values of Dx and
koffare appreciably different from those in the Monte Carlo
case.
FIGURE 4
Carlo (circles) and BC (diamonds) from the distribution of unfolding forces
of the peak order n ¼ 1 stemming from the Monte Carlo simulation and
kinetic model (dashed line). The statistical error (mean 5 SD) is the
same size as the symbols, ~0.5 pN.
Average unfolding force versus peak order for the Monte
FIGURE 5
(circles) and BC (diamonds) from the distribution of unfolding forces of
the peak order n ¼ 1 stemming from the Langevin simulation and kinetic
model (dashed line). The statistical errors (mean 5 SD) are shown with
error bars for the Langevin simulation, whereas for the BC data, the errors
are the same size as the symbols (~0.5 pN).
Average unfolding force versus peak order for the Langevin
TABLE 2 Unfolding forces from Langevin simulations
Comparison with n ¼ 1 back-calculated values
n654321
Average, Langevin simulation
Average, n ¼ 1 BC
SD, Langevin simulation
SD, n ¼ 1 BC
All values are given in pN.
70
70
16
11
73
72
14
11
74
74
13
12
78
78
13
13
84
83
14
14
92
—
15
—
TABLE 3 Unfolding forces for GB1
Comparison with n ¼ 1 and n ¼ 2 back-calculated values
n654321
Average, experiment
Average, n ¼ 1 BC
Average, n ¼ 2 BC
SD – experiment
SD, n ¼ 1 BC
SD, n ¼ 2 BC
Accuracy in prediction of the SD is improved when data from n ¼ 2 are
used. All values are given in pN.
124
121
123
25
17
21
128
125
127
30
18
21
129
128
131
31
19
22
137
134
137
31
21
24
146
143
—
30
25
—
162
—
160
31
—
36
Biophysical Journal 101(6) 1504–1512
Modeling of the Unfolding of Proteins1509
Page 7
As is visible from Table 2 and Fig. 5 also for the Langevin
context, the performance of the back-calculation method is
good and, with the exception of the point for the fourth
peak (which compared to the trend of the other data points
appears to be an outlier), the average predicted values of un-
folding forces are all within about one SD of the Langevin
data. As for the Monte Carlo data, the predicted SDs are
also consistent with the measured ones, and the largest rela-
tive error, again found for the peak with the largest extrap-
olation, n ¼ 6, is 14%.
As shown in Fig. 5, the performance of the analytical
model based on the n ¼ 1 data is not dissimilar from that
of the back-calculation (the detailed results are again re-
ported in the Supporting Material). Indeed, also in this
context, the Anderson-Darling test indicates that distribu-
tions of the unfolding forces are compatible with a Gumbel
distribution.
Experimental data on (GB1)8
Finally, we turned to the experimental data, which clearly
represent the greatest challenge. Because of the complex
interplay of the several factors that impact on the stretching
process, and because the pulling rate is not particularly low,
it may not be expected a priori that the system unfolding
response might be well captured by the back-calculation.
In particular, it is not obvious a priori that the unfolding
events of various peaks in the chain can be appropriately
described as statistically independent events. In fact, corre-
lations can arise in nearby protein moduli because of the
limited sound velocity in the chain or because of contact
interactions. Moreover, given the small number of experi-
mental samples, 47 measurements for each force peak, it
is not simple to obtain a reference histogram from the exper-
iment or to pin a distribution, even when using the KDE
interpolation scheme. Thus, any defect in the starting distri-
bution is consequently amplified by the back-calculation
method.
Despite these caveats, the predictive capability of the
back-calculation method for the average unfolding forces
was found to be very good also in this case. The level of
agreement can be appreciated by examining Table 3 and
Fig. 6. The increasing underestimation, as a function of n,
of the sample SD (predicted from the n ¼ 1 peak) is prob-
ably ascribable to the fewer-than-expected measurements
at low forces. This is readily demonstrated by starting the
back-calculation from the second peak, n ¼ 2, which by
covering lower values of unfolding forces can reproduce
very well not only the mean unfolding forces at all other
values of n, but also the corresponding SDs.
In light of this consideration, thevery good consistency of
the back-calculation data with the measured distribution is
very remarkable, and testifies to the robust applicability of
the method.
It is particularly instructive to discuss the performance
of the analytical method as well. Neither the average un-
folding forces nor their SDs are dissimilar from the experi-
mental ones (see Fig. 6 and Supporting Material). However,
unlike in the cases for the Monte Carlo and Langevin data,
thisagreementdoesnotstanduptocloser statisticalscrutiny.
In fact, the Anderson-Darling statistical test indicates that
the experimental data do not follow the Gumbel extremal
statistics entailed by the analytical model at each value of
n (see Eq. 8). In fact, the null hypothesis for the n ¼ 1
peak is supported with a confidence level of <1%. The
same applies for the n ¼ 2 peak as well (in spite of the
fact that a more and more pronounced Gumbel-like char-
acter is expected as n increases).
The above observations demonstrate the utility of the
back-calculation approach in the context of practical in-
terest. Indeed, the phenomenology of systems such as multi-
modular constructs of GB1 can be too rich to be well
accounted for by Evans’s theory. In such contexts, a good
control/prediction of the unfolding forces for varying
numbers of surviving modules can be made only starting
from the phenomenological distribution.
CONCLUSIONS
We present a systematic investigation of the statistical
properties of the forces associated with the first, second,
etc., unfolding events in a multimodular construct. We intro-
duced a phenomenological scheme, termed back-calcula-
tion, which, using as sole input the distribution of the
forces associated with a certain unfolding event (e.g., the
first), predicts the force distribution of all other events. We
stress that the method follows a bootstrap approach starting
from the raw force-extension measurements. In particular,
it does not rely on any model of mechanical response for
protein unfolding kinetics.
At a general level, it is shown that the standard procedure
of analyzing experimental stretching data by grouping
together forces associated with all unfolding events, could
FIGURE 6
imental data (circles) and BC (diamonds) from the distribution of unfolding
forces of the peak order n ¼ 1 stemming from the experiments and kinetic
model (dashed line). The statistical errors (mean 5 SD) are shown with
error bars, whereas the BC data have errors of the same value as the size
of the symbols (~0.5 pN).
Average unfolding forceversus peak order for the GB1 exper-
Biophysical Journal 101(6) 1504–1512
1510Benedetti et al.
Page 8
be more profitably replaced by considering the events sepa-
rately with equal order of appearance. To the best of our
knowledge, the possibility of applying such a scheme to
analyze experimental data has not been explored before.
Second, a comparison of the experimental distributions of
unfolding forces with that predicted by standard kinetic
models reveals appreciable discrepancies, thus preventing
their use as reliable descriptors of the mechanical unfolding
process. This fact is consistent with previous independent
investigations (27).
In addition, the approach has several implications for the
design/analysis of stretching experiments of multimodular
constructs. First, its simplicity makes the back-calculation
particularly appealing as a simple and transparent scheme
for the interpretation of experimental data. In this respect,
an interesting applicative avenue is offered by heteroge-
neous multimodular constructs, for which the back-cal-
culation can offer a term of reference apt for highlighting
composition-dependent modulations of the mechanical re-
sponse. Second, it offers a simple, parameter-free phenom-
enological approach for predicting the distributions of the
various unfolding peaks using a negligible computational
effort. In this respect, it presents major advantages com-
pared to the more computationally intensive stochastic
(Monte Carlo or Langevin) numerical approaches. Finally,
it can be applied to the design of biomaterials starting
from their molecular modular components (e.g., choosing
an appropriate number of repeats), with unfolding forces
falling in a desired range, or to precondition a pulling exper-
iment (choice of pulling speed, stiffness of the AFM tip) so
that the mechanical response is profiled with a desired
resolution. A study of the latter aspects is underway.
The numerical implementations (C programming lan-
guage) of the back-calculation techniques are available
upon request from the authors.
APPENDIX
The procedure used to predict the force distribution for peak n given a set of
experimental measurements for peak m is discussed here in detail. As a first
step, an ordered table FðmÞ
i
, with i ¼ 1;.;N, is built that contains the N
measured forces for peak m.
To resample new data from the same distribution, the procedure is as
follows:
Extract a uniform random number r ˛½0;1?.
Find i such that i<Nr<i þ 1.
The extracted force is computed as ði ? Nr þ 1ÞFðmÞ
The last step is based on a liner interpolation of the cumulate of the
distribution.
To extract data corresponding to the distribution of a different peak n,
which can be larger or smaller than m, the procedure has to be modified
as follows:
i
þ ðNr ? iÞFðmÞ
iþ1.
Extract a uniform random number r ˛½0;1?.
Compute r0¼ 1 ? ð1 ? rÞm=n.
Find i such that i<Nr0<i þ 1.
The extracted force is computed as ði ? Nr0þ 1ÞFðmÞ
i
þ ðNr0? iÞFðmÞ
iþ1.
SUPPORTING MATERIAL
One figure and one table are available at http://www.biophysj.org/biophysj/
supplemental/S0006-3495(11)00938-6.
This work was supported by the Swiss National Science Foundation (grant
No. 205320-131828) and by the Italian Ministry of Education (MIUR).
REFERENCES
1. Cao, Y., and H. B. Li. 2007. Polyprotein of GB1 is an ideal artificial
elastomeric protein. Nat. Mater. 6:109–114.
2. Cao, Y., C. Lam, ., H. B. Li. 2006. Nonmechanical protein can have
significant mechanical stability. Angew. Chem. Int. Ed. 45:642–645.
3. Carrion-Vazquez, M., A. F. Oberhauser, ., J. M. Fernandez. 1999.
Mechanical and chemical unfolding of a single protein: a comparison.
Proc. Natl. Acad. Sci. USA. 96:3694–3699.
4. Sandal, M., F. Valle, ., B. Samorı `. 2008. Conformational equilibria in
monomeric a-synuclein at the single-molecule level. PLoS Biol. 6:e6.
5. Brockwell, D. J., G. S. Beddard, ., S. E. Radford. 2005. Mechanically
unfolding the small, topologically simple protein L. Biophys. J. 89:
506–519.
6. Marszalek, P. E., H. Lu, ., J. M. Fernandez. 1999. Mechanical unfold-
ing intermediates in titin modules. Nature. 402:100–103.
7. Li, H., W. A. Linke, ., J. M. Fernandez. 2002. Reverse engineering of
the giant muscle protein titin. Nature. 418:998–1002.
8. Carrion-Vazquez, M., H. B. Li, ., J. Fernandez. 2003. The mechanical
stability of ubiquitin is linkage dependent. Nat. Struct. Mol. Biol. 10:
738–743.
9. Li, H., M. Carrion-Vazquez, ., J. Fernandez. 2000. Point mutations
alter the mechanical stability of immunoglobulin modules. Nat. Struct.
Mol. Biol. 7:1117–1120.
10. Lee, G., K. Abdi, ., P. E. Marszalek. 2006. Nanospring behaviour of
ankyrin repeats. Nature. 440:246–249.
11. Oberhauser, A. F., P. K. Hansma, ., J. M. Fernandez. 2001. Stepwise
unfolding of titin under force-clamp atomic force microscopy. Proc.
Natl. Acad. Sci. USA. 98:468–472.
12. Sorce, B., S. Sabella, ., P. P. Pompa. 2009. Single-molecule mechan-
ical unfolding of amyloidogenic b2-microglobulin: the force-spectros-
copy approach. ChemPhysChem. 10:1471–1477.
13. Evans, E., and K. Ritchie. 1997. Dynamic strength of molecular
adhesion bonds. Biophys. J. 72:1541–1555.
14. Garg, A. 1995. Escape-field distribution for escape from a metastable
potential well subject to a steadily increasing bias field. Phys. Rev. B.
51:15592–15595.
15. Imparato, A., F. Sbrana, and M. Vassalli. 2008. Reconstructing the free
energy landscape of a polyprotein by single-molecule experiments.
Europhys. Lett. 82:58006.
16. Oberhauser, A. F., and M. Carrio ´n-Va ´zquez. 2008. Mechanical
biochemistry of proteins one molecule at a time. J. Biol. Chem. 283:
6617–6621.
17. Best, R., D. J. Brockwell, ., J. Clarke. 2003. Force mode AFM as
a tool for protein folding studies. Anal. Chim. Acta. 479:87–105.
18. Brockwell, D. J., G. S. Beddard, ., S. E. Radford. 2002. The effect
of core destabilization on the mechanical resistance of I27. Biophys. J.
83:458–472.
19. Aioanei, D., B. Samorı `, and M. Brucale. 2009. Maximum likelihood
estimation of protein kinetic parameters under weak assumptions
from unfolding force spectroscopy experiments. Phys. Rev. E Stat.
Nonlin. Soft Matter Phys. 80:61916.
20. Benedetti, F., S. K. Sekatskii, and G. Dietler. 2011. Single-molecule
force spectroscopy of multimodular proteins: a new method to extract
kinetic unfolding parameters. J. Adv. Microsc. Res. 6:1–6.
Biophysical Journal 101(6) 1504–1512
Modeling of the Unfolding of Proteins 1511
Page 9
21. King, W. T., M. H. Su, and G. L. Yang. 2010. Monte Carlo simulation
of mechanical unfolding of proteins based on a simple two-state model.
Int. J. Biol. Macromol. 46:159–166.
22. Dietz, H., F. Berkemeier, ., M. Rief. 2006. Anisotropic deformation
response of single protein molecules. Proc. Natl. Acad. Sci. USA.
103:12724–12728.
23. Hummer, G., and A. Szabo. 2008. Thermodynamics and kinetics of
single-molecule force spectroscopy. In Theory and Evaluation of
Single-Molecule Signals. E. Barkai, F. L. H. Brown, M. Orrit, and
H. Yang, editors. World Scientific, Singapore. 139–175.
24. Li, H. 2007. Engineering proteins with tailored nanomechanical prop-
erties: a single molecule approach. Org. Biomol. Chem. 5:3399–3406.
25. Florin, E., M. Rief, ., H. Gaub. 1995. Sensing specific molecular
interactions with the atomic force microscope. Biosens. Bioelectron.
10:895–901.
26. Sandal,M., F. Benedetti, ., B. Samorı `. 2009.Hooke:an open software
platform for force spectroscopy. Bioinformatics. 25:1428–1430.
27. Schlierf, M., and M. Rief. 2006. Single-molecule unfolding force
distributions reveal a funnel-shaped energy landscape. Biophys. J. 90:
L33–L35.
28. Rief, M., M. Gautel, ., H. E. Gaub. 1997. Reversible unfolding of
individual titin immunoglobulin domains by AFM. Science. 276:
1109–1112.
29. Rief, M., J. M. Fernandez, and H. E. Gaub. 1998. Elastically coupled
two-level systems as a model for biopolymer extensibility. Phys. Rev.
Lett. 81:4764–4767.
30. Zinober, R. C., D. J. Brockwell, ., D. A. Smith. 2002. Mechanically
unfolding proteins: the effect of unfolding history and the supramolec-
ular scaffold. Protein Sci. 11:2759–2765.
31. Berkovich, R., S. Garcia-Manyes, ., J. M. Fernandez. 2010. Collapse
dynamics of single proteins extended by force. Biophys. J. 98:2692–
2701.
32. Gumbel, E. 2004. Statistics of Extremes. Dover, Mineola, NY.
33. Press, W., S. Teukolsky, ., B. Flannery. 2007. Numerical Recipes,
3rd ed.: The Art of Scientific Computing. Cambridge University Press,
Cambridge, United Kingdom.
34. Silverman, B. W. 1986. Density Estimation for Statistics and Data
Analysis. Chapman & Hall/CRC, Boca Raton, FL.
Biophysical Journal 101(6) 1504–1512
1512Benedetti et al.
View other sources
Hide other sources
-
Available from Sergey K Sekatskii · 7 Dec 2012
-
Available from epfl.ch