Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 20-22, January 1992
ROBERT ZWANZIG, ATTILA SZABO, AND BIMAN BAGCHI*
Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, Building 2, National Institutes of Health,
Bethesda, MD 20892
Contributed by Robert Zwanzig, October 7, 1991
Levinthal's paradox is that finding the native
folded state of a protein by a random search among all possible
configurations can take an enormously long time. Yet proteins
can fold in seconds or less. Mathematical analysis of a simple
model shows that a small and physically reasonable energy bias
against locally unfavorable configurations, of the order of a few
kT, can reduce Levinthal's time to a biologically significant
size.
Lectures and articles dealing with protein folding dynamics
often begin with a reference to the Levinthal "paradox" (1,
2†). The main point of this paper is to show by mathematical
analysis of a simple model that Levinthal's paradox becomes
irrelevant to protein folding when some of the interactions
between amino acids are taken into account.
How long does it take for a protein to fold up into its native
structure? In a standard illustration of the Levinthal paradox,
each bond connecting amino acids can have several (e.g.,
three) possible states, so that a protein of, say, 101 amino
acids could exist in $3^{100} = 5 \times 10^{47}$ configurations. Even if the
protein is able to sample new configurations at the rate of $10^{13}$
per second, or $3 \times 10^{20}$ per year, it will take $10^{27}$ years to try
them all. Levinthal concluded that random searches are not
an effective way of finding the correct state of a folded
protein. Nevertheless, proteins do fold, and in a time scale of
seconds or less. This is the paradox.
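The arithmetic behind this estimate can be checked in a few lines. This is only a sketch using the illustrative numbers quoted above; the year length of 3.156 × 10^7 s is an assumption, not a value given in the text.

```python
# Levinthal's estimate with the illustrative numbers from the text.
n_bonds = 100                   # 101 amino acids are joined by 100 bonds
states_per_bond = 3
configurations = states_per_bond ** n_bonds       # exact integer, about 5e47

rate_per_second = 1e13          # configurations sampled per second
seconds_per_year = 3.156e7      # assumed length of a year
rate_per_year = rate_per_second * seconds_per_year    # about 3e20

years_to_try_all = configurations / rate_per_year
print(f"configurations   ~ {configurations:.1e}")
print(f"samples per year ~ {rate_per_year:.1e}")
print(f"years to try all ~ {years_to_try_all:.1e}")
```

The result is of the order of 10^27 years, in line with the figure quoted in the text.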
A clue to the resolution of the paradox is suggested by
Dawkins (3) in a discussion of evolution by the accumulation
of small changes. He gave a more whimsical example of a
similar paradox: how long will a random search take to
produce Hamlet's remark "Methinks it is like a weasel"?
This statement contains 28 characters, including 5 spaces;
and there are 27 possible choices for each location, 26 letters
and a space. A monkey typing randomly would probably
require about $27^{28} \approx 10^{40}$ key strokes. Dawkins observed that
if the monkey cannot change those letters that are already
correctly in place, Hamlet's remark may be reached by a
random search in only a few thousand key strokes.
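A fully biased search of this kind is easy to simulate. The sketch below is one possible reading of the rule, not Dawkins' own program: every character that is still wrong is retyped at random once per generation, and characters that are already correct are frozen. The function name, the seed, and the number of runs are illustrative choices.

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"         # 28 characters, 5 of them spaces
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "        # 26 letters and a space

def generations_to_target(rng):
    """Count generations until TARGET is reached when correct characters are frozen."""
    current = [rng.choice(ALPHABET) for _ in TARGET]
    generations = 0
    while "".join(current) != TARGET:
        generations += 1
        for pos, wanted in enumerate(TARGET):
            if current[pos] != wanted:          # only wrong characters are retyped
                current[pos] = rng.choice(ALPHABET)
    return generations

rng = random.Random(0)                          # fixed seed for repeatability
runs = [generations_to_target(rng) for _ in range(200)]
mean_generations = sum(runs) / len(runs)
mean_attempts = mean_generations * len(TARGET)  # one generation counted as 28 attempts
print(f"mean generations ~ {mean_generations:.0f}")
print(f"mean attempts    ~ {mean_attempts:.0f}")
```

Counting each generation as 28 attempts, the mean comes out at a few thousand attempts, compared with roughly 10^40 for the unbiased search.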
In both examples, folding proteins or writing Hamlet,
biased searches are much more effective than completely
random searches. Of course this is well known; in protein
folding simulations, potential energy functions provide the
necessary bias for Monte Carlo methods (4) and for molecular
dynamics methods (5). However, these methods rely heavily
on computation and are not amenable to easy mathematical
analysis. The goal of this paper is to provide the mathematical
analysis of Levinthal's paradox for a highly simplified model
of protein folding.
A first-passage time calculation shows that for an unbiased
random search, Levinthal's protein folding estimate is essentially
correct. But if a modest amount of bias is introduced,
for example by imposing an energy cost of a few kT
for locally incorrect bond configurations, the first-passage
time to the fully correct state can be very much shorter. In
fact, this time can become biologically significant.
Model and Results
Since the goal is not to understand the folding of any
particular protein, but only to present an elementary resolution
of Levinthal's paradox, precise details of the protein
structure will be ignored. Consequently, the model to be
treated is not expected to be directly useful in the theory of
protein folding. It allows for only one of the many kinds of
energetic effects that are known to be involved in folding a
protein.
The protein is a chain of $N + 1$ amino acids and $N$ bonds.
The connecting bond between two neighboring amino acids
can be characterized as "correct" or "incorrect." (Correct
means native in biology and "Shakespearean" in writing
Hamlet.) There may be several ways that this bond can be
incorrect; these will all be lumped together. Correct bonds
are labeled c, and incorrect bonds are labeled i. A typical
configuration of the chain is cciiciccciic. The "perfect" or
fully correct state is the one consisting of all c's and no i's.
The problem treated here is: starting with an arbitrary
distribution of correct and incorrect bonds, and some rule for
making changes, find how long it takes to get to the perfect
chain for the first time.
The rule for making changes is the main issue. These
changes cannot be entirely random; they must be governed
by physical chemical laws. The simplest nontrivial assumption
one can make is that a correct bond can become incorrect
($c \to i$) with the rate $k_0$ and an incorrect bond can become
correct ($i \to c$) with the rate $k_1$ and that these changes occur
entirely independently. As a result, the number $S$ of incorrect
bonds in the protein configuration changes in time. The
first-passage time to the perfect state is the elapsed time,
starting from some arbitrary initial S, to arrive for the first
time at $S = 0$. The mean first-passage time $\tau(S)$ is the average
of this elapsed time over all ways of getting from S to S = 0.
Then the mean first-passage time from a configuration with
$S$ incorrect bonds to the perfect configuration is approximately

$$\tau(S) \approx \frac{1}{N k_0}\left(1 + \frac{k_0}{k_1}\right)^{N}. \tag{1}$$

(The exact result is given later in Eq. 16.) This is asymptotically
correct for large $N$ if $k_0$ is not too small. The time $\tau$ is
essentially independent of the starting $S$; even if the starting
configuration is close to perfect, there is a significant probability
that it will wander further away before reaching $S = 0$.
The mean first-passage time for a fully biased search,
where the change $c \to i$ is not allowed so that $k_0 = 0$, is

$$\tau(S) = \frac{1}{k_1}\sum_{n=1}^{S}\frac{1}{n} \approx \frac{1}{k_1}\left(\ln S + 0.577\right). \tag{2}$$
*Permanent address: Indian Institute of Science, Bangalore, India.
†It is amusing to note that ref. 1, very often cited in connection with
Levinthal's paradox, in fact does not contain anything about it. Ref.
2, cited only a few times, contains Levinthal's estimate of folding
time.
In this limit, $\tau$ is independent of $N$ and has a logarithmic
dependence on $S$. This is the formula to use in connection
with Dawkins' "weasel." It gives a value for $\tau$ of the order
of 105 generations (one generation is 28 attempts), which is
what one sees in a computer simulation of a fully biased
random search. The derivation of these formulae will be
given later.
Up to this point, the protein was characterized only by $N$
and the two rate constants. However, it is useful to make a
specific interpretation of the ratio $k_0/k_1$. The kinetic scheme
for a single bond is

$$c \underset{k_1}{\overset{k_0}{\rightleftharpoons}} i.$$

The ratio of the rate constants is an equilibrium constant,

$$K = \frac{k_0}{k_1} = \frac{[i]_{\mathrm{eq}}}{[c]_{\mathrm{eq}}}.$$

Then $[c]_{\mathrm{eq}} = 1/(1 + K)$ and $[i]_{\mathrm{eq}} = K/(1 + K)$. Although the
separate rate constants may involve collision frequencies,
Brownian motion over potential barriers, or other dynamical
effects, $K$ does not. It is strictly thermodynamic. The rate $k_0$
or $k_1$ only sets the overall time scale for $\tau(S)$.
The equilibrium constant can be found from statistical
mechanics. Suppose that there are $\nu + 1$ possible kinds of
bond. The correct bond has degeneracy 1 and energy $\varepsilon_c$, and
the incorrect bonds have degeneracy $\nu$ and energy $\varepsilon_i = \varepsilon_c + U$.
Thus $U$ is an energy penalty for making an incorrect bond.
Then by working out the equilibrium statistical thermodynamics,
one finds

$$K = \frac{k_0}{k_1} = \nu e^{-U/kT}. \tag{5}$$

When $U = 0$, or there is no penalty, the mean first-passage
time is

$$\tau_L = \frac{1}{N k_0}(\nu + 1)^{N},$$

where $(\nu + 1)^N$ is the number of possible configurations and
$N k_0$ is the sampling rate. This is the formula that is usually
used in discussions of Levinthal's paradox.
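For a sense of scale, the unbiased time $\tau_L = (\nu + 1)^N / (N k_0)$ can be evaluated directly. The sketch below uses the parameter values the text adopts for its figure ($N = 100$, $\nu = 2$, $k_1 = 10^9$ s$^{-1}$); the function names and the year length are assumptions made for illustration.

```python
import math

def equilibrium_constant(nu, u_over_kt):
    """K = k0/k1 = nu * exp(-U/kT)."""
    return nu * math.exp(-u_over_kt)

def levinthal_time(n_bonds, nu, k0):
    """Unbiased (U = 0) mean first-passage time, (nu + 1)**N / (N * k0)."""
    return (nu + 1) ** n_bonds / (n_bonds * k0)

N, NU, K1 = 100, 2, 1e9              # K1 in 1/s
SECONDS_PER_YEAR = 3.156e7           # assumed length of a year

K = equilibrium_constant(NU, 0.0)    # no penalty, so K equals nu
k0 = K * K1
tau_L_years = levinthal_time(N, NU, k0) / SECONDS_PER_YEAR
frac_incorrect = K / (1 + K)         # equilibrium fraction of incorrect bonds

print(f"K = {K:.1f}, equilibrium fraction incorrect = {frac_incorrect:.3f}")
print(f"tau_L ~ {tau_L_years:.1e} years")
```

With no penalty, two thirds of the bonds are incorrect at equilibrium and the search time is astronomically long.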
But if there is a penalty, so that $k_0/k_1$ is small, $\tau$ can become
much smaller. This is shown dramatically in Fig. 1. The graph
was drawn using the exact formula for $\tau(S)$ given in Eq. 16;
the approximate formula in Eq. 1 gives slightly smaller values
for $\tau$ when $U/kT$ is big. This graph is based on $N = 100$, $\nu = 2$,
and $S = 66$. The rate constants were arbitrarily chosen as
$k_1 = 10^{9}$ s$^{-1}$ for $i \to c$ and $k_0 = 2\exp(-U/kT) \times 10^{9}$ s$^{-1}$ for
$c \to i$. This choice satisfies Eq. 5. As in Metropolis Monte
Carlo simulations, $k_1$ is taken to be independent of temperature,
so that the entire temperature dependence comes from
the energy penalty in making an incorrect bond. The figure
shows the mean first-passage time, in years, as a function of
$U/kT$. According to Eq. 2, the first-passage time in the limit
of infinite $U/kT$ is about $1.5 \times 10^{-16}$ year or $5 \times 10^{-9}$ s.

Fig. 1. Mean first-passage time, in years, as a function of the
energy bias $U/kT$.

The figure shows that the first-passage time becomes
biologically significant (of the order of 1 second) when $U/kT$
is greater than about 2. One may argue that the chosen value
of $k_1$ is only an uninformed guess, but one must remember
that the graph covers a range of more than 40 orders of
magnitude. If $k_1$ is changed by a few orders of magnitude, the
vertical axis is shifted by that amount. Then the energy at
which the resulting first-passage time is 1 second shifts to a
bit more or a bit less than $2kT$. Evidently, reasonable changes
in $k_1$ do not affect the qualitative conclusion. Levinthal's time
is greatly reduced by a very modest and physically reasonable
modification in the way that the dynamics is handled.
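The trend described here can be reproduced roughly with the leading-order estimate $\tau \approx (1 + K)^N / (N k_0)$, where $K = \nu e^{-U/kT}$ and $k_0 = K k_1$. The sketch below is not the exact calculation behind the figure, and the approximation degrades once $NK$ is no longer large; the function name and the year length are assumptions.

```python
import math

N, NU, K1 = 100, 2, 1e9              # parameters quoted in the text; K1 in 1/s
SECONDS_PER_YEAR = 3.156e7           # assumed length of a year

def tau_approx(u_over_kt):
    """Leading-order mean first-passage time in seconds: (1 + K)**N / (N * k0)."""
    K = NU * math.exp(-u_over_kt)
    k0 = K * K1
    return (1.0 + K) ** N / (N * k0)

for u in (0.0, 1.0, 2.0, 3.0, 4.0):
    tau = tau_approx(u)
    print(f"U/kT = {u:.0f}: tau ~ {tau:.2e} s = {tau / SECONDS_PER_YEAR:.2e} yr")
```

In this approximation the time falls from more than 10^28 years at $U = 0$ to about one second near $U/kT = 2$.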
Now the derivation of the above results is outlined. The
method, based on the theory of first-passage times, has
already been applied by Bryngelson and Wolynes (6) in a
much more ambitious treatment of protein folding. Ref. 7
gives a useful review of the theory of first-passage times in the
context of chemical kinetics. Here, emphasis is put on the
mathematical formulation of the problem and not on details
of its solution.
The number of incorrect bonds is $S$; the number of correct
bonds is $N - S$. The rate at which $S \to S + 1$ is the number
of correct bonds times the rate $k_0$ of changing a correct bond
into an incorrect one,

$$\mathrm{rate}(S \to S + 1) = (N - S)k_0.$$

The rate at which $S \to S - 1$ is the number of
incorrect bonds times the rate $k_1$ of changing an incorrect
bond into a correct one,

$$\mathrm{rate}(S \to S - 1) = S k_1.$$
The probability that there are $S$ incorrect bonds at time $t$ is
denoted by $P(S, t)$. This changes by gains from $S - 1$ and $S + 1$
and losses to $S - 1$ and $S + 1$. The gain-loss or master
equation is

$$\frac{d}{dt}P(S, t) = (N - S + 1)k_0 P(S - 1, t) + (S + 1)k_1 P(S + 1, t) - (N - S)k_0 P(S, t) - S k_1 P(S, t).$$
The end points S = 0 and S = N are handled by requiring that
P(-1, t) and P(N + 1, t) are both equal to 0.
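The jump process behind this master equation can also be sampled directly. The following is a minimal sketch of a standard stochastic (Gillespie-type) simulation, which is not a method used in the paper: $S$ moves up with rate $(N - S)k_0$ and down with rate $S k_1$ until it first reaches 0. The small system size, rate values, seed, and function name are hypothetical choices made so that the run is fast.

```python
import random

def first_passage_time(n_bonds, k0, k1, s_start, rng):
    """Follow one stochastic trajectory of S and return the time it first hits 0."""
    s, t = s_start, 0.0
    while s > 0:
        up = (n_bonds - s) * k0        # rate(S -> S + 1)
        down = s * k1                  # rate(S -> S - 1)
        total = up + down
        t += rng.expovariate(total)    # exponential waiting time to the next jump
        if rng.random() * total < up:
            s += 1
        else:
            s -= 1
    return t

rng = random.Random(1)                 # fixed seed for repeatability
N_SMALL, K0, K1 = 10, 0.5, 1.0         # hypothetical example, arbitrary time units
samples = [first_passage_time(N_SMALL, K0, K1, 5, rng) for _ in range(5000)]
mean_fpt = sum(samples) / len(samples)
print(f"estimated mean first-passage time from S = 5: {mean_fpt:.2f}")
```

Averaging many such trajectories gives an estimate of the mean first-passage time that can be compared with the analytical treatment that follows.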
The standard procedure for using a master equation to find
mean first-passage times is as follows. Write the differential
equations for P in matrix form as
$$\frac{d}{dt}P(S, t) = \sum_{S'} W(S, S')P(S', t).$$
Impose an absorbing boundary condition at $S = 0$, so that
only the states $S = 1$ to $N$ are involved. Then the fundamental
equation that determines the mean first-passage times is

$$\sum_{S_0} \tau(S_0) W(S_0, S) = -1,$$

or, more explicitly,

$$S k_1[\tau(S - 1) - \tau(S)] + (N - S)k_0[\tau(S + 1) - \tau(S)] = -1,$$

for all $S$ between 1 and $N$. It is obvious that $\tau(0)$ must vanish
and $\tau(N + 1)$ is never needed. This determines all the other
$\tau(S)$.
It is not hard to solve these equations. The procedure is
analogous to what one does in finding mean first-passage
times from the Smoluchowski equation. One first solves for
the differences $\Delta(S) = \tau(S + 1) - \tau(S)$, with $\Delta(0) = \tau(1)$
and $N k_1 \Delta(N - 1) = 1$, and then sums the $\Delta(S)$ to get $\tau(S)$.
The solution, easily verified by substitution, is

$$\tau(S) = \frac{1}{N k_1}\sum_{n=0}^{S-1}\binom{N-1}{n}^{-1}\sum_{m=n+1}^{N}\binom{N}{m}K^{m-n-1}.$$

For $S = 1$ this reduces to $\tau(1) = (1/N k_0)[(1 + K)^N - 1]$.
By using the integral identity

$$\sum_{m=n+1}^{N}\binom{N}{m}K^{m} = K^{n+1}(n + 1)\binom{N}{n+1}\int_0^1 dx\,(1 - x)^n(1 + Kx)^{N-n-1}$$

and then changing to the new variable $y = (1 - x)/(1 + Kx)$,
the double sum defining $\tau(S)$ may be reduced to a single
integral,

$$\tau(S) = \frac{1}{k_0}(1 + K)^N K\int_0^1 dy\,\frac{1 - y^S}{1 - y}(1 + Ky)^{-N-1}. \tag{16}$$

For large $N$, the integral is dominated by the contribution
from small $y$. It is very weakly dependent on $S$. Its asymptotic
form for large $N$ is given by

$$\tau(S) \approx \frac{1}{N k_0}(1 + K)^N\left[1 + 1!(NK)^{-1} + 2!(NK)^{-2} + \cdots\right].$$

The $S$-dependent parts of $\tau$ are generally negligible in comparison
with the leading term $(1 + K)^N$. This is the result
stated in Eq. 1.
This asymptotic approximation is not valid if $k_0$ is too
small. In the limit $k_0 \to 0$, the integral in Eq. 16 can be
evaluated easily and leads to Eq. 2.
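The steps of this derivation can be checked numerically. The sketch below solves the difference equations $S k_1 \Delta(S - 1) = 1 + (N - S)k_0 \Delta(S)$ by backward recursion from $N k_1 \Delta(N - 1) = 1$, and compares the result with a Simpson's-rule evaluation of the single-integral form $\tau(S) = (1/k_1)(1 + K)^N \int_0^1 dy\,[(1 - y^S)/(1 - y)](1 + Ky)^{-N-1}$. The function names, the small system size, and the quadrature settings are illustrative assumptions, not part of the paper.

```python
def tau_recursion(n_bonds, k0, k1):
    """Return [tau(0), ..., tau(N)] by solving for D(S) = tau(S + 1) - tau(S)."""
    diffs = [0.0] * n_bonds
    diffs[n_bonds - 1] = 1.0 / (n_bonds * k1)          # boundary condition at S = N
    for s in range(n_bonds - 1, 0, -1):
        diffs[s - 1] = (1.0 + (n_bonds - s) * k0 * diffs[s]) / (s * k1)
    taus = [0.0]                                       # tau(0) = 0 (absorbing state)
    for d in diffs:
        taus.append(taus[-1] + d)
    return taus

def tau_integral(n_bonds, k0, k1, s, steps=2000):
    """Simpson's-rule evaluation of the single-integral form of tau(S); steps must be even."""
    K = k0 / k1

    def integrand(y):
        geometric = sum(y ** n for n in range(s))      # equals (1 - y**s) / (1 - y)
        return geometric * (1.0 + K * y) ** (-n_bonds - 1)

    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (1.0 + K) ** n_bonds / k1 * total * h / 3.0

N_SMALL, K0, K1 = 10, 0.5, 1.0                         # hypothetical small example
exact = tau_recursion(N_SMALL, K0, K1)
for s in (1, 5, 10):
    print(s, exact[s], tau_integral(N_SMALL, K0, K1, s))

# Fully biased limit k0 = 0: tau(S) becomes the harmonic sum divided by k1.
biased = tau_recursion(N_SMALL, 0.0, K1)
print(biased[N_SMALL], sum(1.0 / n for n in range(1, N_SMALL + 1)))
```

Writing the factor $(1 - y^S)/(1 - y)$ as a finite geometric sum avoids the removable singularity at $y = 1$ in the quadrature.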
We thank William A. Eaton for helpful comments.
1. Levinthal, C. (1968) J. Chim. Phys. 65, 44-45.
2. Levinthal, C. (1969) in Mössbauer Spectroscopy in Biological
Systems, Proceedings of a Meeting held at Allerton House,
Monticello, IL, eds. Debrunner, P., Tsibris, J. C. M. & Munck,
E. (University of Illinois Press, Urbana), pp. 22-24.
3. Dawkins, R. (1987) The Blind Watchmaker (Norton, New York),
4. Skolnick, J., Kolinski, A. & Yaris, R. (1988) Proc. Natl. Acad.
Sci. USA 85, 5057-5061.
5. Honeycutt, J. D. & Thirumalai, D. (1990) Proc. Natl. Acad. Sci.
USA 87, 3526-3529.
6. Bryngelson, J. D. & Wolynes, P. G. (1989) J. Phys. Chem. 93,
7. Weiss, G. H. (1967) Adv. Chem. Phys. 13, 1-18.
Biophysics: Zwanzig et al.