
Proc. Natl. Acad. Sci. USA

Vol. 89, pp. 20-22, January 1992

Biophysics

Levinthal's paradox

ROBERT ZWANZIG, ATTILA SZABO, AND BIMAN BAGCHI*

Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, Building 2, National Institutes of Health,

Bethesda, MD 20892

Contributed by Robert Zwanzig, October 7, 1991

ABSTRACT

Levinthal's paradox is that finding the native folded state of a protein by a random search among all possible configurations can take an enormously long time. Yet proteins can fold in seconds or less. Mathematical analysis of a simple model shows that a small and physically reasonable energy bias against locally unfavorable configurations, of the order of a few kT, can reduce Levinthal's time to a biologically significant size.

Lectures and articles dealing with protein folding dynamics

often begin with a reference to the Levinthal "paradox" (1,

2†). The main point of this paper is to show by mathematical

analysis of a simple model that Levinthal's paradox becomes

irrelevant to protein folding when some of the interactions

between amino acids are taken into account.

How long does it take for a protein to fold up into its native

structure? In a standard illustration of the Levinthal paradox,

each bond connecting amino acids can have several (e.g.,

three) possible states, so that a protein of, say, 101 amino

acids could exist in 3^100 = 5 × 10^47 configurations. Even if the

protein is able to sample new configurations at the rate of 10^13

per second, or 3 × 10^20 per year, it will take 10^27 years to try

them all. Levinthal concluded that random searches are not

an effective way of finding the correct state of a folded

protein. Nevertheless, proteins do fold, and in a time scale of

seconds or less. This is the paradox.
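The arithmetic behind this estimate can be checked in a few lines. The sketch below only restates the numbers quoted above (three states per bond, 100 bonds, 10^13 samples per second); the variable names and the 365.25-day year are assumptions made for illustration.

```python
# Back-of-the-envelope check of the Levinthal estimate quoted in the text.
STATES_PER_BOND = 3                      # "several (e.g., three) possible states"
N_BONDS = 100                            # 101 amino acids -> 100 connecting bonds
RATE_PER_SECOND = 1e13                   # configurations sampled per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed length of a year

configurations = STATES_PER_BOND ** N_BONDS       # exact integer 3^100
rate_per_year = RATE_PER_SECOND * SECONDS_PER_YEAR
years_to_try_all = configurations / rate_per_year

print(f"configurations   = {configurations:.2e}")
print(f"samples per year = {rate_per_year:.2e}")
print(f"years to try all = {years_to_try_all:.2e}")
```

The three printed values should agree with the rounded figures in the text to within their stated precision.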

A clue to the resolution of the paradox is suggested by

Dawkins (3) in a discussion of evolution by the accumulation

of small changes. He gave a more whimsical example of a

similar paradox: how long will a random search take to

produce Hamlet's remark "Methinks it is like a weasel"?

This statement contains 28 characters, including 5 spaces;

and there are 27 possible choices for each location, 26 letters

and a space. A monkey typing randomly would probably

require about 27^28 ≈ 10^40 key strokes. Dawkins observed that

if the monkey cannot change those letters that are already

correctly in place, Hamlet's remark may be reached by a

random search in only a few thousand key strokes.
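Dawkins' observation is easy to try numerically. The following is a minimal sketch, not Dawkins' own program: it assumes an upper-case target string, defines one "generation" as retyping every still-incorrect position once at random, and averages over 500 trials with a fixed seed. The function name and these choices are illustrative assumptions.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"      # 28 characters, including 5 spaces
ALPHABET = string.ascii_uppercase + " "      # 26 letters and a space

def locked_search_generations(rng):
    """Count generations until TARGET is reached when correct characters are frozen.

    One generation retypes every still-incorrect position once, at random.
    """
    current = [rng.choice(ALPHABET) for _ in TARGET]
    generations = 0
    while any(c != t for c, t in zip(current, TARGET)):
        for pos, target_char in enumerate(TARGET):
            if current[pos] != target_char:      # correct letters stay in place
                current[pos] = rng.choice(ALPHABET)
        generations += 1
    return generations

rng = random.Random(0)                       # fixed seed so the run is repeatable
trials = [locked_search_generations(rng) for _ in range(500)]
mean_generations = sum(trials) / len(trials)

unbiased_keystrokes = len(ALPHABET) ** len(TARGET)   # 27^28
print(f"unbiased search: about {unbiased_keystrokes:.1e} key strokes")
print(f"locked search:   about {mean_generations:.0f} generations on average")
```

Under these assumptions the locked search finishes in roughly a hundred generations, compared with about 10^40 key strokes for the unbiased search.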

In both examples, folding proteins or writing Hamlet,

biased searches are much more effective than completely

random searches. Of course this is well known; in protein

folding simulations, potential energy functions provide the

necessary bias for Monte Carlo methods (4) and for molecular

dynamics methods (5). However, these methods rely heavily

on computation and are not amenable to easy mathematical

analysis. The goal of this paper is to provide the mathematical

analysis of Levinthal's paradox for a highly simplified model

of protein folding.

A first-passage time calculation shows that for an unbiased

random search, Levinthal's protein folding estimate is es-

sentially correct. But if a modest amount of bias is intro-

duced, for example by imposing an energy cost of a few kT

for locally incorrect bond configurations, the first-passage

time to the fully correct state can be very much shorter. In

fact, this time can become biologically significant.

Model and Results

Since the goal is not to understand the folding of any

particular protein, but only to present an elementary resolu-

tion of Levinthal's paradox, precise details of the protein

structure will be ignored. Consequently, the model to be

treated is not expected to be directly useful in the theory of

protein folding. It allows for only one of the many kinds of

energetic effects that are known to be involved in folding a

real protein.

The protein is a chain of N + 1 amino acids and N bonds.

The connecting bond between two neighboring amino acids

can be characterized as "correct" or "incorrect." (Correct

means native in biology and "Shakespearean" in writing

Hamlet.) There may be several ways that this bond can be

incorrect; these will all be lumped together. Correct bonds

are labeled c, and incorrect bonds are labeled i. A typical

configuration of the chain is cciiciccciic. The "perfect" or

fully correct state is the one consisting of all c's and no i's.

The problem treated here is: starting with an arbitrary

distribution of correct and incorrect bonds, and some rule for

making changes, find how long it takes to get to the perfect

chain for the first time.

The rule for making changes is the main issue. These

changes cannot be entirely random; they must be governed

by physical chemical laws. The simplest nontrivial assumption

one can make is that a correct bond can become incorrect

(c → i) with the rate k0 and an incorrect bond can become

correct (i → c) with the rate k1 and that these changes occur

entirely independently. As a result, the number S of incorrect

bonds in the protein configuration changes in time. The

first-passage time to the perfect state is the elapsed time,

starting from some arbitrary initial S, to arrive for the first

time at S = 0. The mean first-passage time τ(S) is the average

of this elapsed time over all ways of getting from S to S = 0.

Then the mean first-passage time from a configuration with

S incorrect bonds to the perfect configuration is approximately

$$\tau(S) \approx \frac{1}{N k_0}\left(1 + \frac{k_0}{k_1}\right)^{N}. \qquad [1]$$

(The exact result is given later in Eq. 16.) This is asymptotically

correct for large N if k0 is not too small. The time τ is

essentially independent of the starting S; even if the starting

configuration is close to perfect, there is a significant prob-

ability that it will wander further away before reaching S =

0. The mean first-passage time for a fully biased search,

where the change c → i is not allowed so that k0 = 0, is

$$\tau(S) = \frac{1}{k_1} \sum_{j=1}^{S} \frac{1}{j}. \qquad [2]$$

*Permanent address: Indian Institute of Science, Bangalore, India.

†It is amusing to note that ref. 1, very often cited in connection with

Levinthal's paradox, in fact does not contain anything about it. Ref.

2, cited only a few times, contains Levinthal's estimate of folding

times.


The publication costs of this article were defrayed in part by page charge

payment. This article must therefore be hereby marked "advertisement"

in accordance with 18 U.S.C. §1734 solely to indicate this fact.


In this limit, τ is independent of N and has a logarithmic

dependence on S. This is the formula to use in connection

with Dawkins' "weasel." It gives a value for τ of the order

of 105 generations (one generation is 28 attempts), which is

what one sees in a computer simulation of a fully biased

random search. The derivation of these formulae will be

given later.
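The "weasel" value can be reproduced from the harmonic sum for the fully biased search, τ(S) = (1/k1) Σ 1/j for j = 1 to S. The sketch below assumes that a retyped position becomes correct at rate k1 = 1/27 per generation and that a typical random starting string has 27 of its 28 positions wrong; both are illustrative assumptions, as is the function name.

```python
from fractions import Fraction

def fully_biased_mfpt(s_incorrect, k1):
    """Fully biased search (k0 = 0): tau(S) = (1/k1) * sum_{j=1}^{S} 1/j."""
    harmonic = sum(Fraction(1, j) for j in range(1, s_incorrect + 1))
    return float(harmonic) / k1

# Assumed inputs: 27 symbols, so k1 = 1/27 per generation; 27 positions start wrong.
weasel_generations = fully_biased_mfpt(27, k1=1 / 27)
print(f"weasel: about {weasel_generations:.0f} generations")
```

With these inputs the result is close to the figure of 105 generations quoted in the text; starting with all 28 positions wrong changes it by about one generation.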

Up to this point, the protein was characterized only by N

and the two rate constants. However, it is useful to make a

specific interpretation of the ratio k0/k1. The kinetic scheme

for a single bond is

$$\frac{d}{dt}[c] = -k_0[c] + k_1[i], \qquad [c] + [i] = 1. \qquad [3]$$

The ratio of the rate constants is an equilibrium constant,

$$[i]_{eq}/[c]_{eq} = k_0/k_1 = K. \qquad [4]$$

Then [c]eq = 1/(1 + K) and [i]eq = K/(1 + K). Although the

separate rate constants may involve collision frequencies,

Brownian motion over potential barriers, or other dynamical

effects, K does not. It is strictly thermodynamic. The rate k0

or k1 only sets the overall time scale for τ(S).

The equilibrium constant can be found from statistical

mechanics. Suppose that there are ν + 1 possible kinds of

bond. The correct bond has degeneracy 1 and energy ε_c, and

the incorrect bonds have degeneracy ν and energy ε_i = ε_c +

U. Thus U is an energy penalty for making an incorrect bond.

Then by working out the equilibrium statistical thermodynamics,

one finds

$$K = k_0/k_1 = \nu\, e^{-U/kT}. \qquad [5]$$
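The step from degeneracies and energies to the equilibrium constant can be written out explicitly. This is a sketch assuming ordinary Boltzmann weights for a single bond; the single-bond partition function Z is a symbol introduced here for illustration only.

```latex
\begin{aligned}
Z &= e^{-\varepsilon_c/kT} + \nu\, e^{-\varepsilon_i/kT}, \\
[c]_{eq} &= \frac{e^{-\varepsilon_c/kT}}{Z}, \qquad
[i]_{eq} = \frac{\nu\, e^{-\varepsilon_i/kT}}{Z}, \\
K &= \frac{[i]_{eq}}{[c]_{eq}}
   = \nu\, e^{-(\varepsilon_i - \varepsilon_c)/kT}
   = \nu\, e^{-U/kT}.
\end{aligned}
```

For example, ν = 2 and U = 2kT give K = 2e^{-2} ≈ 0.27, so at equilibrium roughly one bond in five is incorrect.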

Discussion

When U = 0, or there is no penalty, the mean first-passage

time becomes

$$\tau_L = \frac{1}{N k_0}(\nu + 1)^{N}, \qquad [6]$$

where (ν + 1)^N is the number of possible configurations and

N k0 is the sampling rate. This is the formula that is usually

used in discussions of Levinthal's paradox.

But if there is a penalty, so that k0/k1 is small, τ can become much smaller. This is shown dramatically in Fig. 1. The graph was drawn using the exact formula for τ(S) given in Eq. 16; the approximate formula in Eq. 1 gives slightly smaller values for τ when U/kT is big. This graph is based on N = 100, ν = 2, and S = 66. The rate constants were arbitrarily chosen as k1 = 10^9 s^-1 for i → c and k0 = 2 exp(-U/kT) × 10^9 s^-1 for c → i. This choice satisfies Eq. 5. As in Metropolis Monte Carlo simulations, k1 is taken to be independent of temperature, so that the entire temperature dependence comes from the energy penalty in making an incorrect bond. The figure shows the mean first-passage time, in years, as a function of U/kT. According to Eq. 2, the first-passage time in the limit of infinite U/kT is about 1.5 × 10^-16 year or 5 × 10^-9 s.

The figure shows that the first-passage time becomes biologically significant (of the order of 1 second) when U/kT is greater than about 2. One may argue that the chosen value of k1 is only an uninformed guess, but one must remember that the graph covers a range of more than 40 orders of magnitude. If k1 is changed by a few orders of magnitude, the vertical axis is shifted by that amount. Then the energy at which the resulting first-passage time is 1 second shifts to a bit more or a bit less than 2kT. Evidently, reasonable changes in k1 do not affect the qualitative conclusion. Levinthal's time is greatly reduced by a very modest and physically reasonable modification in the way that the dynamics is handled.

FIG. 1. Mean first-passage time, in years, as a function of the energy bias U/kT.
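The trend in Fig. 1 can be sketched numerically from the approximate formula τ ≈ (1/N k0)(1 + K)^N with K = k0/k1 = ν exp(-U/kT), using the parameter values quoted above. This is only the leading asymptotic term, not the exact formula used to draw the figure, and it is not meant for very large U/kT, where k0 becomes too small; the harmonic-sum limit for k0 = 0 is evaluated separately. Function and variable names are illustrative assumptions.

```python
import math

N = 100                                  # number of bonds
NU = 2                                   # number of incorrect kinds of bond
S = 66                                   # initial number of incorrect bonds
K1 = 1e9                                 # s^-1, rate for i -> c
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed length of a year

def approx_mfpt_seconds(u_over_kt):
    """Leading term (1/(N*k0)) * (1 + K)^N with K = nu * exp(-U/kT)."""
    big_k = NU * math.exp(-u_over_kt)
    k0 = big_k * K1
    return (1 + big_k) ** N / (N * k0)

for u in (0, 1, 2, 3, 4):
    tau = approx_mfpt_seconds(u)
    print(f"U/kT = {u}: tau ~ {tau:.2e} s = {tau / SECONDS_PER_YEAR:.2e} yr")

# Fully biased limit (k0 = 0): tau = (1/k1) * sum_{j=1}^{S} 1/j.
tau_limit = sum(1 / j for j in range(1, S + 1)) / K1
print(f"U/kT -> infinity: tau = {tau_limit:.2e} s = {tau_limit / SECONDS_PER_YEAR:.2e} yr")
```

In this sketch the time falls from an astronomically large value at U = 0 to about one second near U/kT = 2, and the infinite-bias limit comes out near 5 × 10^-9 s, consistent with the values stated in the text.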

Mathematical Derivation

Now the derivation of the above results is outlined. The

method, based on the theory of first-passage times, has

already been applied by Bryngelson and Wolynes (6) in a

much more ambitious treatment of protein folding. Ref. 7

gives a useful review of the theory of first-passage times in the

context of chemical kinetics. Here, emphasis is put on the

mathematical formulation of the problem and not on details

of its solution.

The number of incorrect bonds is S; the number of correct bonds is N - S. The rate at which S → S + 1 is the number of correct bonds times the rate k0 of changing a correct bond into an incorrect one,

$$\mathrm{rate}(S \to S + 1) = (N - S)\,k_0. \qquad [7]$$

Similarly, the rate at which S → S - 1 is the number of incorrect bonds times the rate k1 of changing an incorrect bond into a correct one,

$$\mathrm{rate}(S \to S - 1) = S\,k_1. \qquad [8]$$

The probability that there are S incorrect bonds at time t is

denoted by P(S,t). This changes by gains from S - 1 and S

+ 1 and losses to S - 1 and S + 1. The gain-loss or master

equation is

$$\frac{d}{dt}P(S,t) = (N - S + 1)\,k_0\,P(S - 1,t) + (S + 1)\,k_1\,P(S + 1,t) - (N - S)\,k_0\,P(S,t) - S\,k_1\,P(S,t). \qquad [9]$$

The end points S = 0 and S = N are handled by requiring that

P(-1, t) and P(N + 1, t) are both equal to 0.
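The gain and loss terms of the master equation can be collected into a rate matrix and checked for consistency. The sketch below builds that matrix for a small chain and verifies two properties: probability is conserved (each column sums to zero), and the binomial distribution expected for independent bonds with K = k0/k1 is stationary. The parameter values and the function name are illustrative assumptions.

```python
from math import comb

def rate_matrix(n_bonds, k0, k1):
    """Return W with dP(S)/dt = sum over S' of W[S][S'] * P(S'), for S = 0..N."""
    size = n_bonds + 1
    w = [[0.0] * size for _ in range(size)]
    for s in range(size):
        if s + 1 <= n_bonds:
            w[s + 1][s] += (n_bonds - s) * k0    # S -> S + 1: a correct bond breaks
        if s - 1 >= 0:
            w[s - 1][s] += s * k1                # S -> S - 1: an incorrect bond heals
        w[s][s] -= (n_bonds - s) * k0 + s * k1   # total loss out of state S
    return w

N, K0, K1 = 10, 0.3, 1.0                         # small illustrative values
W = rate_matrix(N, K0, K1)

# Probability conservation: every column of W sums to zero.
column_sums = [sum(W[row][col] for row in range(N + 1)) for col in range(N + 1)]

# Binomial equilibrium for independent bonds, with K = k0/k1.
K = K0 / K1
p_eq = [comb(N, s) * K**s / (1 + K) ** N for s in range(N + 1)]
dp_dt = [sum(W[s][sp] * p_eq[sp] for sp in range(N + 1)) for s in range(N + 1)]

print("max |column sum| =", max(abs(c) for c in column_sums))
print("max |dP_eq/dt|   =", max(abs(d) for d in dp_dt))
```

Both printed quantities should vanish to within floating-point rounding. The end points need no special handling in this form because the rates out of the allowed range, (N - S)k0 at S = N and S k1 at S = 0, are zero.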

The standard procedure for using a master equation to find

mean first-passage times is as follows. Write the differential

equations for P in matrix form as

$$\frac{d}{dt}P(S,t) = \sum_{S'} W(S,S')\,P(S',t). \qquad [10]$$

Impose an absorbing boundary condition at S = 0, so that only the states S = 1 to N are involved. Then the fundamental equation that determines the mean first-passage times is

$$\sum_{S_0} \tau(S_0)\,W(S_0,S) = -1, \quad \text{all } S, \qquad [11]$$

or, more explicitly,

$$S k_1[\tau(S - 1) - \tau(S)] + (N - S)k_0[\tau(S + 1) - \tau(S)] = -1, \qquad [12]$$

for all S between 1 and N. It is obvious that τ(0) must vanish and τ(N + 1) is never needed. This determines all the other τ(S).

It is not hard to solve these equations. The procedure is analogous to what one does in finding mean first-passage times from the Smoluchowski equation. One first solves for the differences Δτ(S) = τ(S + 1) - τ(S), with Δτ(0) = τ(1) and N k1 Δτ(N - 1) = 1 (Eq. 12 at S = N), and then sums the Δτ(S) to get τ(S). The solution, easily verified by substitution, is

$$\tau(S) = \frac{1}{N k_1} \sum_{n=0}^{S-1} \binom{N-1}{n}^{-1} \sum_{m=n+1}^{N} \binom{N}{m} K^{m-n-1}. \qquad [13]$$

In particular,

$$\tau(1) = \frac{1}{N k_0}\left[(1 + K)^{N} - 1\right]. \qquad [14]$$

By using the integral identity

$$\sum_{m=n+1}^{N} \binom{N}{m} K^{m} = K^{n+1}(n + 1)\binom{N}{n+1} \int_0^1 dx\,(1 - x)^{n}(1 + Kx)^{N-n-1} \qquad [15]$$

and then changing to the new variable y = (1 - x)/(1 + Kx), the double sum defining τ(S) may be reduced to a single integral,

$$\tau(S) = \frac{1}{k_0}(1 + K)^{N} K \int_0^1 dy\,\frac{1 - y^{S}}{1 - y}\,(1 + Ky)^{-N-1}. \qquad [16]$$

For large N, the integral is dominated by the contribution from small y. It is very weakly dependent on S. Its asymptotic form for large N is given by

$$\tau(S) \approx \frac{1}{N k_0}(1 + K)^{N}\left[1 + 1!\,(NK)^{-1} + 2!\,(NK)^{-2} + \cdots\right]. \qquad [17]$$

The S-dependent parts of τ are generally negligible in comparison with the leading term (1 + K)^N. This is the result stated in Eq. 1.

This asymptotic approximation is not valid if k0 is too small. In the limit k0 → 0, the integral in Eq. 16 can be evaluated easily and leads to Eq. 2.
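The difference equations for the mean first-passage time can also be solved numerically as a check on the closed forms. The sketch below is not the authors' procedure in code; it assumes the recursion obtained by writing Eq. 12 in terms of the differences delta(S) = τ(S + 1) - τ(S), namely delta(N - 1) = 1/(N k1) and delta(S - 1) = [1 + (N - S) k0 delta(S)]/(S k1), and then sums the differences starting from τ(0) = 0. The parameter values are those quoted for Fig. 1 with U/kT = 2; all names are illustrative.

```python
import math

def exact_mfpt(n_bonds, k0, k1):
    """Solve Eq. 12 with tau(0) = 0 by backward recursion on the differences.

    delta[s] = tau(s + 1) - tau(s). Eq. 12 at S = N gives N*k1*delta[N-1] = 1;
    for S < N it gives delta[S-1] = (1 + (N - S)*k0*delta[S]) / (S*k1).
    """
    delta = [0.0] * n_bonds
    delta[n_bonds - 1] = 1.0 / (n_bonds * k1)
    for s in range(n_bonds - 1, 0, -1):
        delta[s - 1] = (1.0 + (n_bonds - s) * k0 * delta[s]) / (s * k1)
    tau = [0.0]
    for d in delta:
        tau.append(tau[-1] + d)
    return tau                                   # tau[S] for S = 0..N

N, NU, K1 = 100, 2, 1e9                          # values quoted for Fig. 1
U_OVER_KT = 2.0
K = NU * math.exp(-U_OVER_KT)                    # K = nu * exp(-U/kT)
K0 = K * K1

tau = exact_mfpt(N, K0, K1)
eq14 = ((1 + K) ** N - 1) / (N * K0)             # closed form for tau(1)
eq1 = (1 + K) ** N / (N * K0)                    # leading asymptotic term

print(f"tau(1)  numerical = {tau[1]:.4e} s, closed form  = {eq14:.4e} s")
print(f"tau(66) numerical = {tau[66]:.4e} s, leading term = {eq1:.4e} s")

# Fully biased limit k0 = 0 should reproduce the harmonic sum (1/k1) * sum 1/j.
tau_biased = exact_mfpt(N, 0.0, K1)
eq2 = sum(1 / j for j in range(1, 67)) / K1
print(f"k0 = 0: tau(66) = {tau_biased[66]:.4e} s, harmonic sum = {eq2:.4e} s")
```

All terms in the recursion are positive, so it avoids the cancellation that a direct evaluation of alternating sums would suffer. In this sketch τ(1) and τ(66) differ only slightly, which illustrates the weak dependence on the starting S.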

We thank William A. Eaton for helpful comments.

1. Levinthal, C. (1968) J. Chim. Phys. 65, 44-45.

2. Levinthal, C. (1969) in Mossbauer Spectroscopy in Biological Systems, Proceedings of a Meeting held at Allerton House, Monticello, IL, eds. Debrunner, P., Tsibris, J. C. M. & Munck, E. (University of Illinois Press, Urbana), pp. 22-24.

3. Dawkins, R. (1987) The Blind Watchmaker (Norton, New York), pp. 46-50.

4. Skolnick, J., Kolinski, A. & Yaris, R. (1988) Proc. Natl. Acad. Sci. USA 85, 5057-5061.

5. Honeycutt, J. D. & Thirumalai, D. (1990) Proc. Natl. Acad. Sci. USA 87, 3526-3529.

6. Bryngelson, J. D. & Wolynes, P. G. (1989) J. Phys. Chem. 93, 6902-6915.

7. Weiss, G. H. (1967) Adv. Chem. Phys. 13, 1-18.
