Alternate pathways for folding in the flavodoxin fold family revealed by a nucleation-growth model.
ABSTRACT A recent study of experimental results for flavodoxin-like folds suggests that proteins from this family may exhibit a similar, signature pattern of folding intermediates. We study the folding landscapes of three proteins from the flavodoxin family (CheY, apoflavodoxin, and cutinase) using a simple nucleation and growth model that accurately describes both experimental and simulation results for the transition state structure, and the structure of on-pathway and misfolded intermediates for CheY. Although the landscape features of these proteins agree in basic ways with the results of the study, the simulations exhibit a range of folding behaviours consistent with two alternate folding routes corresponding to nucleation and growth from either side of the central beta-strand.
COMMUNICATION
Erik D. Nelson* and Nick V. Grishin
Howard Hughes Medical
Institute, University of Texas
Southwestern Medical Center
6001 Forest Park Blvd., Room
ND10.124, Dallas, TX
75235-9050, USA
© 2006 Published by Elsevier Ltd.
Keywords: fold families; equilibrium intermediates; non-native interactions
*Corresponding author
From a folding perspective, the topology of a protein is interpreted by the shape of its native backbone, which loosely determines the pattern of atom-to-atom cross-links between its amino acid residues. Over the past several years, simple theoretical and computational models based essentially on topology and minimal entropy loss [1–3] have demonstrated that native topology is a “first order” effect deciding the way a protein folds [4–12]. While the data so far still provide a very incomplete picture, they suggest that if we could provide any consistent description of protein folding it would be that evolutionary changes which, roughly speaking, conserve topology [13–15] act as perturbations affecting mainly the depths of intermediates and the heights of free energy barriers on a protein’s folding landscape rather than the basic mechanism [16–18] that allows it to fold.
However, among these results there have now appeared a growing number of excursions away from an axiomatic correspondence between folding and topology that must somehow find a place within this picture [19–24]. For example, the small proteins L and G share an almost identical, symmetric topology, but both proteins nucleate one of their two β-sheets preferentially, breaking the symmetry of the native fold [20,21]. The small, all-helical proteins Im7 and Im9 share essentially the same topology, but Im7 folds through an on-pathway intermediate in which a distorted arrangement of its helices is stabilised by non-native interactions [22,23]. Perhaps it is not so surprising that the folding mechanisms of these proteins are varied. Their native shapes are not frustrated mechanically [7], so they should have greater freedom to respond to structural and energetic perturbations, and their responses (the modulation of intermediates and pathways by these perturbations) may even be somewhat continuous.
On the other hand, even small perturbations such as amino acid substitutions can sometimes cause discrete interconversions of protein structure within a fold family (for instance, changing β-strands to β-helices [24,25]). Moreover, the structural family of a protein (its fold type or fold classification) often allows large loop insertions, sometimes within secondary structure units, and the substitution of one secondary structure type for another, all of which can affect the entropy of its folding units, the pattern of native contacts between them, and the capacity of these units to evolve more favourable contacts. Accordingly, this more flexible interpretation of topology (fold type) should permit more substantial variations to occur among protein folding mechanisms.
0022-2836/$ - see front matter © 2006 Published by Elsevier Ltd.
E-mail address of the corresponding author: enelson@spirit.sdsc.edu
doi:10.1016/j.jmb.2006.02.026 J. Mol. Biol. (2006) xx, 1–8
ARTICLE IN PRESS
The landscape features that define the folding pathways of larger proteins (~200 amino acid residues) are more discrete, and should have more capacity to accommodate perturbations. These features still appear to be guided by native topology [5,26]; however, given the larger and less predictable variations in structure that can be admitted into the fold families of larger proteins, a manifestly pathway-like protein could conceal, in an evolutionary sense, alternate folding routes due to multiple folding units that are responsive to preferential stabilization by a suitable accumulation of these perturbations. Therefore, as with proteins L and G, a purely structural classification of protein families can permit substantial variations among the folding routes of a given fold type, but for larger proteins this may start to define “discrete spectra” of mechanical differences, or “modes” for folding within a family.
If multiple routes do exist for a particular fold type, when does nature choose from among them, and when does it admit mixtures of the routes? These types of problems are just now beginning to be explored [19,27], and they are of interest not simply in terms of the physics of how proteins fold but because they may provide information about low-lying conformational sub-states that decide how proteins function. Because of the complexities involved in obtaining this information experimentally, simple, computationally efficient folding models, such as those recently used to describe protein transition state structures [28–37], could be very useful to infer folding properties and thus direct the process of these measurements more effectively. Here, we use one of these models for a detailed exploration of CheY and two other large proteins from the flavodoxin fold family [27].
The model is one of an extremely simple type in which amino acid residues are allowed to exist in just two states, either folded (frozen) or unfolded (a discussion of the model is given in the Appendix). Its energetics are heterogeneous and Gō-like, the interaction between any two amino acid residues being proportional to the number of atom-to-atom contacts that would exist between them in the native crystal structure of the protein. Each collective state of the amino acid residues is intended to represent a small micro-ensemble consisting of the conformational states of unfolded segments constrained by the frozen amino acid residues and the cross-links that form between them. The entropy of the micro-ensembles is described using simple estimates from polymer theory in which the unfolded segments are modelled as random flight (Gaussian) chains and only the space occupied by frozen parts of the molecule is excluded.
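As a rough illustration of the bookkeeping described above, the following sketch scores a two-state configuration with a Gō-like energy and evaluates the closure entropy of an ideal Gaussian loop. It is not the authors' implementation: the function names, the energy scale `eps`, the toy contact matrix, and the use of the textbook ideal-chain exponent 3/2 (with excluded volume ignored) are all assumptions made for illustration.

```python
import math

def go_energy(frozen, contacts, eps=1.0):
    """Go-like energy of a two-state configuration.

    frozen   : list of booleans, True if the residue is folded (frozen)
    contacts : symmetric matrix of native atom-to-atom contact counts
    eps      : hypothetical energy per contact (arbitrary units)
    Only pairs in which both residues are frozen contribute.
    """
    n = len(frozen)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if frozen[i] and frozen[j]:
                total += contacts[i][j]
    return -eps * total

def loop_entropy(loop_length, exponent=1.5):
    """Closure entropy (units of k_B) of an ideal Gaussian loop.

    Uses the standard ideal-chain estimate, -exponent * ln(loop_length),
    up to an additive constant; excluded volume is ignored here.
    """
    return -exponent * math.log(loop_length)

# Toy four-residue example (contact counts are made up for illustration).
contacts = [
    [0, 0, 3, 1],
    [0, 0, 0, 2],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
]
frozen = [True, False, True, True]
energy = go_energy(frozen, contacts)   # only pairs (0,2), (0,3), (2,3) count
entropy = loop_entropy(10)             # a ten-residue unfolded loop
```

In this toy configuration residue 1 is unfolded, so its two native contacts with residue 3 do not contribute to the energy.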
In current applications of this model [31–34], the micro-ensembles are limited to very simple objects (for example, a nucleus or nuclei with two or fewer loops) for the sake of simplifying the computations. However, it is known that these approximations begin to break down for proteins of roughly 100 or more amino acid residues, precisely where the fine scale features of folding start to matter less and where, due to its speed, the model could be of most use. In a recent paper [36], we developed an approach to sample more complex micro-ensemble topologies excluded in previous work in order to investigate larger proteins with multiple folding units. We found that including these topologies often led to qualitative improvements in the calculation of transition state structure, and that the dominantly occurring micro-ensembles turned out to have a simple scaling form (see the Appendix) for which an explicit calculation of excluded volume effects [38] of the type noted above would not be too forbidding. Although we account for these effects in only an order of magnitude sort of way, this approximation seems to be enough to draw the kinds of conclusions we need for this work.
The CheY topology studied here seems particularly well suited to description by this model. The transition state structure of CheY (3chy.pdb) compares relatively well with available protein engineering data [39,40,43] (correlation coefficient 0.62, or 0.94 if volume increasing mutations are excluded) and the model detects the misfolded and on-pathway intermediate states thought to reflect topological frustration [7,27] between interior (β-sheet) and exterior (α-helix) layers of the fold that bridge two weakly interpenetrating domains [43] on either side of the central β3 strand. The level of agreement is surprising since the misfolded intermediate [4,39,40] is thought to result from the dynamical connection between these layers and lead to a non-native distortion of the helices, yet we observe the intermediate in a model with native-only interactions and no explicit dynamical constraints (Figures 1 and 2). On-pathway, the agreement is surprising as well. In crossing the transition state, CheY nucleates from its N-terminal domain and growth is thought to proceed by strands of the β-sheet, which frustrates the accretion of α-helices onto the exterior. Again, this is exactly what we observe in our simulations. In rough agreement with the experimental results of Lopez-Hernandez & Serrano [40,43], the nuclear region includes β1-α1-β2 and part of α2 (we refer to regions on either side of the central β3 strand as domains A and B below). The minima in the free energy profile (Figure 2) register with the formation of β-strands and the helices start to form just before the maxima, so that the conflict in stability between interior and exterior regions of the fold is periodically resolved in crossing the barriers. The unusual unfolding and refolding features of helices α4 and α5 in Figure 1(a) and the accentuation of the intermediate barrier after β4 in Figure 2 may signify non-native interactions in the actual folding path, as we explain below.
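To make the notion of minima and maxima in the profile F(q) concrete, the sketch below builds a profile from sampled statistical weights and locates its interior extrema. It assumes the conventional definition F(q) = −ln Z(q) in units of k_BT, where Z(q) is the summed weight of sampled states with q frozen residues; that definition, the function names, and the toy numbers are assumptions for illustration, since the details of the calculation are deferred to the Appendix.

```python
import math

def free_energy_profile(weights_by_q):
    """F(q) = -ln(sum of state weights at q), in units of k_B*T.

    weights_by_q[q] is a list of (hypothetical) Boltzmann-like weights of
    the sampled states that have q frozen residues.
    """
    return [-math.log(sum(w)) for w in weights_by_q]

def local_extrema(profile):
    """Return (minima, maxima): interior indices q that are strict extrema."""
    minima, maxima = [], []
    for q in range(1, len(profile) - 1):
        left, here, right = profile[q - 1], profile[q], profile[q + 1]
        if here < left and here < right:
            minima.append(q)
        elif here > left and here > right:
            maxima.append(q)
    return minima, maxima

# Toy profile (made-up values) with two basins separated by barriers.
toy_profile = [0.0, 2.0, 1.0, 3.0, 0.5, 1.5, -2.0]
minima, maxima = local_extrema(toy_profile)
```

With a profile in hand, the positions of the extrema can be compared with the values of q at which individual strands and helices become folded.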
The flavodoxin study of Bollen & van Mierlo [27] suggests that proteins from the same fold family (CheY, cutinase and Anabaena apoflavodoxin in this instance) may exhibit a similar pattern of on- and off-pathway intermediates. These proteins have lengths ranging from 128 to 197 amino acid residues and very low sequence identity, and protein engineering results exist only for the smallest member, CheY. Interestingly, both cutinase (1agy.pdb) and apoflavodoxin (1ftg.pdb) contain a number of flexible loop insertions (in cutinase these include α-helical fragments) at points where α-helices would connect to β-strands in the B (C-terminal) domain of CheY. These insertions could relax the interior–exterior frustration effect suggested by these authors and allow for greater stability of the B domain, which could lead to variations among flavodoxin fold pathways.
Our results for cutinase and apoflavodoxin do share many of the same features described for CheY. As in CheY, the key folding event is growth of the nucleus up to and across the β3 strand dividing the A and B domains of the fold. Also, each protein exhibits, to varying degrees, the signal of a misfolded intermediate in which helices but not strands or loops (except in the nucleus) are folded, and minima (maxima) in the landscape register with the formation of β-strands (α-helices), consistent with frustration between the interior and exterior regions of the protein. However, at least for apoflavodoxin, the structural mechanism for folding is quite different. The nucleus of apoflavodoxin is on the opposite side (C-terminal, or B-side) of the β3 strand, including most of the C-terminal helix α6, strand β5, helix α5 and connecting loops (see Figures 3 and 4), and growth proceeds toward the N-terminal end of the β-sheet. This result is at first difficult to accept given the simplicity of the model and the fact that part of the protein (the N-terminal strand β1) is confined in its interior, and we will return to this subject below. However, we note here that the numbers of atom-to-atom contacts per residue in the native states of CheY and apoflavodoxin are also weighted in opposite directions (see Figure 4), and this effect, together with the structural differences in the folds, seems to explain the results of the simulations.
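The contact profile referred to here follows the definition given in the caption of Figure 4: atom-to-atom contacts per residue, counted within 5 Å between non-nearest neighbor residues, divided by the mean and coarse grained in blocks of two. A minimal sketch of that calculation is given below. The input layout (a list of atom coordinates per residue), the reading of "non-nearest neighbor" as a sequence separation of at least two, and the function name are assumptions; parsing of the coordinate files is omitted.

```python
import math

def contact_profile(residue_atoms, cutoff=5.0, block=2):
    """Normalised per-residue contact profile, coarse grained in blocks.

    residue_atoms : list (one entry per residue) of lists of (x, y, z)
                    atom coordinates in Angstrom
    cutoff        : contact distance in Angstrom
    block         : number of consecutive residues averaged together
    """
    n = len(residue_atoms)
    counts = [0] * n
    for i in range(n):
        # Skip the residue itself and its nearest neighbours in sequence.
        for j in range(i + 2, n):
            for a in residue_atoms[i]:
                for b in residue_atoms[j]:
                    if math.dist(a, b) < cutoff:
                        counts[i] += 1
                        counts[j] += 1
    mean = sum(counts) / n
    scaled = [c / mean if mean > 0 else 0.0 for c in counts]
    # Average the scaled counts over consecutive blocks of residues.
    profile = []
    for k in range(0, n, block):
        chunk = scaled[k:k + block]
        profile.append(sum(chunk) / len(chunk))
    return profile

# Toy example: five single-atom "residues" (coordinates are made up).
toy = [
    [(0.0, 0.0, 0.0)],
    [(3.0, 0.0, 0.0)],
    [(0.0, 4.0, 0.0)],
    [(3.0, 4.0, 0.0)],
    [(50.0, 0.0, 0.0)],
]
profile = contact_profile(toy)
```

In the toy input, residues 0 and 2 and residues 1 and 3 are each 4 Å apart and count as contacts, residues 0 and 3 sit exactly at 5 Å and do not, and the distant fifth residue has no contacts.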
The atom-to-atom contact profile for cutinase, similar to its folding landscape, could best be pictured as intermediate between those of CheY and apoflavodoxin. As in CheY, the formation of strands tends to line up with minima in the free energy profile, but now the helices are included more within the minima. Although we do not present folding plots for cutinase, it is useful to summarize the results. First, the CheY helix α4 is unstructured in cutinase, so its C-terminal helix gets indexed as α4. In the unfolded wing of the cutinase free energy profile, part of its domain B is folded, including helices α3 and α4, and strands β4 and β5. Although α3 remains frozen into the folded wing of the profile, most of the segments unfold near q = L/2 (L is the length of the protein) and are “simultaneously” replaced by domain A, α2, and β3 before the reaction proceeds. The folding plots have an all or none character that suggests the exchange of B-like for A-like nuclei is part of the folding pathway [27], even though the molecule begins this process from a partially misfolded state.
[Figure 1: plots of Pn(q) (vertical axis, 0 to 1) against q (horizontal axis) for panels (a) and (b), with curves labelled by sub-structure index 1–5; panel (c) is a ribbon diagram with α1, α5, β1 and β5 labelled.]
Figure 1. Projection of the folding landscape onto (a) α-helices and (b) β-strands for CheY. Pn(q) is the probability that sub-structure n (helix or strand 1–5) is folded when there are q frozen amino acid residues. The folding process is stepwise, nucleated by domain A (β1-α1-β2) at q ≈ 38 and proceeding to accrete each section, αn–βn+1, in order along the interior β-sheet. After the protein is nucleated, the addition of each new helix (strand) leads to a maximum (minimum) in the free energy profile (Figure 2). The misfolded “helical” intermediate observed by Clementi and co-workers is detected near q ≈ 24. Across this region, the strands and loops in domain B remain totally unfolded, the number of nuclei jumps (the probability of four nuclei reaching about 0.1 at q = 24) and the distribution of nuclear sizes changes abruptly from bimodal (distributed about 2 and q amino acid residues) to unimodal (distributed about two amino acid residues, the segment size used in the simulations) to bimodal before reaching the transition state. (c) Ribbon diagram of the CheY crystal structure. Light blue regions indicate amino acid residues with native contacts defined by Nelson & Grishin [7] and Shea et al. [10]
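The folding plots discussed here are projections Pn(q), the probability that sub-structure n is folded when q residues are frozen (see the caption of Figure 1). A minimal sketch of how such a projection could be estimated from a weighted sample of two-state configurations follows. The criterion for calling a sub-structure folded (at least a fraction `threshold` of its residues frozen), the data layout, and the names are assumptions; the criterion actually used is not stated in this part of the text.

```python
from collections import defaultdict

def substructure_probability(states, weights, segment, threshold=0.5):
    """Estimate P_n(q) for one sub-structure from weighted sampled states.

    states    : list of configurations, each a list of 0/1 (1 = frozen)
    weights   : statistical weight of each configuration
    segment   : (start, end) residue indices of the sub-structure, end exclusive
    threshold : hypothetical fraction of frozen residues needed to call the
                sub-structure folded
    Returns a dict mapping q to the weighted fraction of states with q
    frozen residues in which the sub-structure is folded.
    """
    start, end = segment
    folded_weight = defaultdict(float)
    total_weight = defaultdict(float)
    for state, w in zip(states, weights):
        q = sum(state)
        total_weight[q] += w
        fraction = sum(state[start:end]) / (end - start)
        if fraction >= threshold:
            folded_weight[q] += w
    return {q: folded_weight[q] / total_weight[q] for q in sorted(total_weight)}

# Toy sample of four-residue configurations (weights are made up).
states = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [1, 1, 1, 1]]
weights = [3.0, 1.0, 1.0, 1.0]
p_first_half = substructure_probability(states, weights, (0, 2))
```

In the toy sample, the two configurations with q = 2 carry weights 3 and 1, and only the first has residues 0 and 1 frozen, so the estimate at q = 2 is three quarters.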
Aside from structural processes, the results above appear roughly consistent with the experimental data. The sizes of free energy barriers are comparable in scale to the results reported by Bollen & van Mierlo, and although it is difficult to establish the topography of the landscape near the misfolded intermediate, the profiles seem as if they could be classified in a similar way. For example, the CheY kinetics were analysed with both on- and off-pathway models by the Serrano group to indicate that they lead to the same results [40]. This is consistent with the fact that the main transition state can be reached by a partial exchange of helical structure in domain B for nuclear structure in domain A, as is indicated by our own results. However, in apoflavodoxin and cutinase domain B folds first, so the exchange should be qualitatively different, and this may explain why an off-pathway kinetic model [27] could describe these two experiments better.
Does this over-simplified model predict the basic
signature of the folding landscapes?
The model appears to be operating as intended. (i) The transition state structure of the CheY topology agrees well with experiment. (ii) Complex diagrams (nested, inter-linked loop, etc. [36]) are very infrequent in simulations for this fold type. (iii) There are very few contacts between domain A and domain B (after strand β3), so the nuclei in these two regions are free to fold in parallel (see the Appendix).
[Figure 2: plot of the free energy profile F(q) (vertical axis) against q (horizontal axis, 0 to 128), with markers ω and τ and basins labelled β3, β4 and β5.]
Figure 2. Structural events along the free energy profile, F(q), of CheY. The misfolded helical intermediate is centered at q = 24 (ω), and the nucleus, β1-α1-β2, is formed at q = 38 (τ). Each major basin in the profile corresponds to the completion of a helix αn (left side of basin), formation of a strand βn+1 (middle of basin), and partial formation of the following helix αn+1 (right side of basin). The structure of the misfolded intermediate, and the registry of helices with maxima in F(q), indicate topological frustration between the β-interior and α-exterior of the protein as suggested by Bollen & van Mierlo [27]. The depths of minima (heights of maxima) in this region reflect loop closure events that are sensitive to the entropy approximations used in these types of models. The basic structure of the profile is in agreement with that in Clementi et al. [4] except for the placement of the transition state.
[Figure 3: plots of Pn(q) (vertical axis, 0 to 1) against q (horizontal axis, 0 to 150) for (a) α-helices 1–6 and (b) β-strands 1–5.]
Figure 3. Projection of the folding landscape onto (a) α-helices and (b) β-strands for apoflavodoxin. The helix indices follow the crystal structure data in which α2-α3 corresponds to the CheY helix α2. The strand indices are the same in all three of the flavodoxin proteins. The nucleus of apoflavodoxin includes part of the C-terminal helix α6, all of β5, most of a large loop l6 preceding, or inserted into, β5, a small part of helix α5 and the loop preceding it (see Figure 4). As the transition state is crossed, the rest of α6 forms, and folding continues to alternate from α to β moving from the C to the N-terminal end until the protein is folded. Again, there are two minima (basins) in the folded wing of the free energy profile, comparable in size to those of CheY, that register with the formation of βn-αn−1 layers. The signature of an intermediate with helical structure is visible near q = 36.
(iv) The patterns of atom-to-atom contacts are consistent with the way each protein folds, and although the entropy cost to freeze unfolded segments of proteins depends on amino acid composition [25], it seems unlikely that including this dependence could lead to something concerted enough to reverse the effect in Figure 4. Finally, (v) in mechanical unfolding [7] of apoflavodoxin, both domain A (the CheY nuclear region) and the helix-strand combination α6-β5 in domain B (the apoflavodoxin nucleus) are dynamically confined by their local environments, moving essentially as fixed units while the protein unfolds and remaining so long after the core of the protein is exposed to solvent.
As we noted above, non-native interactions can have a substantial impact on, or even control, the folding of certain proteins, and some of our results seem to suggest these effects. Although the model does not include non-native interactions directly, proteins do, and the results may reflect their absence in the model at certain points along the folding profiles. The effects of non-native interactions have never been examined using this type of model, and hence it is difficult to decide when they could be present, or what signature they would leave on the model kinetics. Consequently, we decided to look at the Im7 folding landscape, where these effects have been mapped out [22,23].
Im7 folds through a single intermediate in which three of its four helices (α1, α2, and α4) are structured but distorted non-natively, maximizing the burial of hydrophobic side-chains that would be exposed had the helices adopted their native positions. In crossing the transition state into the native fold, the helices acquire their native orientations, and the binding site for helix α3 is exposed, allowing it to fold and ultimately lock the whole protein into its native structure. Our results for Im7 are shown in Figure 5. Its sister protein, Im9, folds across a smooth free energy barrier but still shows some indications of an intermediate, perhaps suggesting the results seen at low pH [23]. Both proteins condense into relatively large, partially unfolded ensembles (Figure 5(b)) due to exposed side-chains in the turn regions of the folds. This situation can be improved a bit by extending the contact radii or by including the dependence of the entropy on amino acid type; however, the results here are still very instructive.
Again, in the intermediate, parts of the protein are stabilized by non-native interactions. When the transition state is crossed, these stabilizing contacts are exchanged for native contacts and the energetics of the protein and the model converge. Any qualitative differences that exist between the model and the protein due to the missing non-native interactions should be evident before the transition region, where these interactions are lost and the differences between the two pathways are reversed. Regions of the protein that are stabilized by non-native interactions in the intermediate should be less stable in the model and may tend to fold late, while regions that are not stabilized by these interactions would tend to fold early. Because this behaviour is reversed on crossing the transition state, it should be evident (if the effect is strong enough) as some type of “wrinkle” in the time order for folding the sub-structures involved in the intermediate, and this is exactly what we observe.
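One simple way to look for such a “wrinkle” numerically is to count how many times the folding probability of a sub-structure crosses a chosen level as q increases. A sub-structure that folds once crosses the level a single time, whereas one that folds, unfolds and then refolds crosses it three times. The sketch below assumes the probability is available as a list indexed by q; the level of 0.5, the function name, and the toy curves are illustrative assumptions rather than part of the model.

```python
def count_crossings(p_of_q, level=0.5):
    """Count how often a folding probability curve crosses `level`.

    p_of_q : list of probabilities P_n(q), ordered by increasing q
    level  : hypothetical probability separating "unfolded" from "folded"
    """
    crossings = 0
    previous_side = None
    for p in p_of_q:
        side = p >= level
        if previous_side is not None and side != previous_side:
            crossings += 1
        previous_side = side
    return crossings

# Toy curves (made-up values): one folds once, the other folds,
# unfolds and refolds as q increases.
monotonic = [0.0, 0.1, 0.4, 0.7, 0.9, 1.0]
wrinkled = [0.0, 0.2, 0.8, 0.6, 0.3, 0.2, 0.7, 1.0]
n_monotonic = count_crossings(monotonic)
n_wrinkled = count_crossings(wrinkled)
```

A count greater than one flags a sub-structure whose place in the folding order is not fixed along the profile.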
The folding orders for sub-structures in the protein and in the model converge on the right side of the transition barrier just after the major intermediate (we refer to this point as q* in Figure 5(b)). Within the model intermediate, helices α1 and α2 are structured, and as the transition barrier is crossed helix α3 folds, unfolds, and then refolds after helix α4, converging with the experiments. The barrier is a residue of the competition between (i) the free energy of freezing helix α4 leaving the loop including helix α3 unfolded and
[Figure 4: plots of the contact profile ci (vertical axis, 0 to 2) against residue index i (horizontal axis) for panels (a) and (b).]
Figure 4. Profile of atom-to-atom contacts, ci, for (a) CheY and (b) apoflavodoxin. ci is the number of atom-to-atom contacts with amino acid i divided by the mean (a contact is registered when atoms from non-nearest neighbor amino acids are less than 5 Å apart; the Figure is coarse grained in blocks of two amino acids). Shaded bars in the lower part of the Figures indicate the nuclear regions. For each fold, the number of contacts between domain A and the “nuclear part” of domain B (after the dividing β3 strand) is about the same as that for two amino acids. The local accumulations of contacts and the opposing slopes of the profiles coincide with the location of nuclei and the direction of their growth.