Chain Length Dependence of Apomyoglobin Folding: Structural Evolution from
Misfolded Sheets to Native Helices†
Clement C. Chow,§ Charles Chow,§ Vinodhkumar Raghunathan,‡ Theodore J. Huppert,§ Erin B. Kimball,§ and
Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, and
Department of Chemistry, University of Washington, Seattle, Washington 98195
Received December 5, 2002; Revised Manuscript Received April 12, 2003
ABSTRACT: Very little is known about how protein structure evolves during the polypeptide chain elongation
that accompanies cotranslational protein folding. This in vitro model study is aimed at probing how
conformational space evolves for purified N-terminal polypeptides of increasing length. These peptides
are derived from the sequence of an all-α-helical single domain protein, sperm whale apomyoglobin
(apoMb). Even at short chain lengths, ordered structure is found. The nature of this structure is strongly
chain length dependent. At relatively short lengths, a predominantly non-native β-sheet conformation is
present, and self-associated amyloid-like species are generated. As chain length increases, α-helix
progressively takes over, and it replaces the β-strand. The observed trends correlate with the specific
fraction of solvent-accessible nonpolar surface area present at different chain lengths. The C-terminal
portion of the chain plays an important role by promoting a large and cooperative overall increase in
helical content and by consolidating the monomeric association state of the full-length protein. Thus, a
native-like energy landscape develops late during apoMb chain elongation. This effect may provide an
important driving force for chain expulsion from the ribosome and promote nearly-posttranslational folding
of single domain proteins in the cell. Nature has been able to overcome the above intrinsic misfolding
trends by modulating the composition of the intracellular environment. An imbalance or improper
functioning by the above modulating factors during translation may play a role in misfolding-driven
Significant progress has been made in recent years in
understanding the physical principles and the mechanisms
governing the folding of single domain proteins (1-7). Since
Anfinsen’s key experiments (8), most of the work in this
area has focused on the mechanisms by which sequence
encodes structure. Folding is usually envisaged as the
convergence of an ensemble of disordered conformations
(i.e., the unfolded state) toward a lower energy spatially
ordered compact structure. The initial conditions of these
experiments involve a chemically or photochemically gener-
ated denatured state. The process is then followed as a
function of time, once the solution has rapidly been switched
to conditions thermodynamically favoring the folded con-
formation. The physical forces acting on the polypeptide
chain on its way to the native conformation are then probed,
and experimental information is gained on the energy
landscapes of the folded protein. These in vitro experiments
assume that the pertinent species to be studied is the full-
A poorly explored aspect in protein folding is how
precisely structure develops as a function of chain length,
starting from the N-terminus and proceeding toward the
C-terminus. This is a biologically relevant question since,
within the ribosomal machinery of the cell, proteins are
vectorially synthesized from the N-terminus toward the
C-terminus. Most importantly, translation rates in both
prokaryotic and eukaryotic systems are far slower than the
typical folding rates of single domain proteins at physiologi-
cally relevant temperatures (Table 1). The above kinetic
argument suggests that there is ample chance for conforma-
tional equilibration to take place cotranslationally, before
synthesis of the full length chain has been completed.
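The kinetic argument above can be made concrete with a short calculation. The following minimal sketch compares the time needed to synthesize a chain with the half-life of a first-order folding process. The elongation rate and folding rate constant used here are illustrative placeholders chosen only to show the arithmetic; they are assumptions, not values taken from Table 1, and the function names are hypothetical.

```python
import math


def translation_time_s(n_residues: int, elongation_rate_aa_per_s: float) -> float:
    """Time (s) needed to synthesize a chain at a constant elongation rate."""
    return n_residues / elongation_rate_aa_per_s


def folding_half_life_s(k_fold_per_s: float) -> float:
    """Half-life (s) of a first-order folding process with rate constant k."""
    return math.log(2) / k_fold_per_s


def half_lives_during_translation(
    n_residues: int, elongation_rate_aa_per_s: float, k_fold_per_s: float
) -> float:
    """Number of folding half-lives that elapse while the chain is synthesized."""
    return translation_time_s(n_residues, elongation_rate_aa_per_s) / folding_half_life_s(
        k_fold_per_s
    )


# Illustrative placeholders (assumptions, not data from the paper).
N_RESIDUES = 200          # chain length used for the comparison
ELONGATION_RATE = 10.0    # residues per second, assumed
K_FOLD = 100.0            # s^-1, assumed single-domain folding rate constant

t_synth = translation_time_s(N_RESIDUES, ELONGATION_RATE)
ratio = half_lives_during_translation(N_RESIDUES, ELONGATION_RATE, K_FOLD)
print(f"synthesis time: {t_synth:.1f} s")
print(f"folding half-lives elapsed during synthesis: {ratio:.0f}")
```

Whenever the ratio is much larger than 1, the nascent chain has, under these assumptions, ample time to equilibrate conformationally before synthesis is complete.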
While it is clear that other complicating factors (e.g.,
presence of chaperones, molecular crowding) may play a role
during cotranslational folding/misfolding events in the in-
tracellular environment, this work takes a model system
approach and investigates how polypeptide conformation and
energy landscapes evolve with chain elongation under native-
like conditions (i.e., in the absence of any denaturing agents).
It is important to emphasize upfront that this strategy is not
aimed at representing the complex conditions found in the
cell’s natural milieu (e.g., high viscosity and molecular
crowding, ribosome, dynamic equilibrium with cotransla-
tionally active chaperones), but rather, at establishing physi-
cal principles and expected trends of the polymer chain
encoding a protein sequence as it gets elongated. These
principles can then be exploited to devise tests that more
rigorously take the cell’s environment into account.
†This work was supported by the American Heart Association (Grant
0160492Z), the National Science Foundation (Grant 0215368), the
Research Corporation (Research Innovation Award), and the
Milwaukee Foundation (Shaw Scientist Award).
* To whom correspondence should be addressed. Phone: (608) 262-5430.
Fax: (608) 262-5430. E-mail: firstname.lastname@example.org.
‡University of Washington.
§University of Wisconsin.
Biochemistry 2003, 42, 7090-7099
10.1021/bi0273056 CCC: $25.00 © 2003 American Chemical Society
Published on Web 05/20/2003
The chain length dependence of homopolymer and block-
copolymer model polypeptides has been studied before.
Scheraga (9, 10), Blout (11), and Loh (12) investigated the
secondary structure evolution of α-helical, polyproline II
helical, and β-sheet polypeptides. Hodges and Litowski (13),
and Toniolo et al. (14) studied the chain length dependence
of coiled coil and 3₁₀ helix formation, respectively. A
different approach has been adopted by Wright and Dyson
(15-18), and Carey and Tasayco (19-21), who have
analyzed the folding of protein segments corresponding to
individual secondary structure modules of globular proteins
with the goal of learning about locally versus globally driven
secondary structure formation in folding. Fersht and co-
workers examined the behavior of polypeptide fragments of
increasing length derived from the chymotrypsin inhibitor
II (CI-2) (22, 23) and barnase (24) sequences.
The target protein of this work is apomyoglobin (apoMb),1
a well-characterized all α-helical single domain protein
(Figure 1a) whose full length chain in vitro folding pathways
have been well-studied (25-29). The A, G, and H helices
form first as part of a molten globule intermediate (Figure
2a). The additional secondary and tertiary structure modules
are subsequently formed within a cooperative step character-
ized by single exponential kinetics (25). We have examined
36, 77, and 119 amino acid long N-terminal fragments.
Lengths were designed to match ends of individual helices
in the native state (Figure 1b). All species were compared
to the 153 amino acid full length protein. Figure 2 (panels b
and c) illustrates two possible limiting models by which a
polypeptide may fold as its chain elongates from the N- to
the C-terminus. The first model (panel b) postulates that the
elongating polypeptide chain is largely dynamic and structur-
ally disordered until most of the polypeptide chain has been
synthesized. According to the alternative limiting model
(panel c), both secondary and tertiary interactions develop
in concert and are extremely close to the corresponding
structures found in the full length protein. The present study
aims at testing the above models and the existence of any
alternative folding/misfolding motifs.
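As a small illustration of the fragment series described above, the sketch below builds N-terminal truncations of the stated lengths (36, 77, and 119 residues) from a full-length chain of 153 residues and reports the fraction of the chain each one covers. The sequence is a placeholder string, not the actual apoMb sequence, and the helper names are hypothetical.

```python
FULL_LENGTH = 153                 # residues in full-length apoMb (from the text)
FRAGMENT_LENGTHS = (36, 77, 119)  # N-terminal fragment lengths (from the text)


def n_terminal_fragment(sequence: str, length: int) -> str:
    """Return the first `length` residues of `sequence` (residues 1..length)."""
    if not 0 < length <= len(sequence):
        raise ValueError("fragment length must be between 1 and the chain length")
    return sequence[:length]


def fraction_of_full_length(length: int, full_length: int = FULL_LENGTH) -> float:
    """Fraction of the full-length chain covered by an N-terminal fragment."""
    return length / full_length


# Placeholder sequence (assumption): 'X' stands in for every residue.
placeholder_sequence = "X" * FULL_LENGTH

for n in FRAGMENT_LENGTHS + (FULL_LENGTH,):
    fragment = n_terminal_fragment(placeholder_sequence, n)
    coverage = fraction_of_full_length(len(fragment))
    print(f"1-{n}: {len(fragment)} residues, {coverage:.1%} of the full-length chain")
```

Substituting the real sequence for the placeholder would yield the fragment sequences themselves; the coverage values depend only on the lengths given in the text.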
MATERIALS AND METHODS
N-Terminal Fragment Preparation. The wild type apoMb
gene was obtained from Steve Sligar (University of Illinois
at Urbana-Champaign) and subsequently subcloned into a
pETBlue-1 vector (Novagen, Madison, WI). Briefly, the
apoMb gene insert was obtained by PCR, blunted by T4
DNA polymerase, and finally subcloned into the pETBlue-1
vector at the EcoRV restriction site. The genes for the 1-77
and 1-119 apoMb N-terminal fragments were produced by
engineering stop codon point mutations into appropriate
regions of the parent plasmid (QuikChange Site-Directed
Mutagenesis Kit; Stratagene, La Jolla, CA). The gene for
the 1-36 fragment was generated by introducing a Met point
mutation in the parent gene at a position corresponding to
amino acid 36. After expression and reverse phase HPLC
1Abbreviations: apoMb, apomyoglobin; GnHCl, guanidine hydrochloride;
CD, circular dichroism; TFA, trifluoroacetic acid; HFIP,
hexafluoro-2-propanol; far-UV CD, circular dichroism in the far
ultraviolet region; FT-IR, Fourier transform infrared.
Table 1: Kinetic Parameters for the Observed Translation Rates of Prokaryotic and Eukaryotic Species Both in Vivo and in Cell-Free Systems
translation time for a 200 residue protein (s)
prokaryotic, in vivo
eukaryotic, in vivo
prokaryotic, cell-free system
eukaryotic, cell-free system
eukaryotic, cell-free system
In Vitro Protein Folding Rates
folding rate (s⁻¹)
3-10 × 10⁻⁵
7 × 10⁻⁷-1
5 × 10⁻³-7
1-3 × 10⁴
single domain, α helix
single domain, β sheet
(tail-spike protein, luciferase)
FIGURE 1: (a) Three-dimensional structure of sperm whale myoglobin
(apoMb). The letters denote corresponding helices found in
the native structure. Coordinates were derived from the X-ray
structure of carbonmonoxymyoglobin (72). The image was created
using the MOLSCRIPT (73) and RASTER3D (74) software. (b)
Schematic representation of the N-terminal apoMb fragments
studied in this work. The length and color coding of the different
segments match those of the helices found in myoglobin’s native
state (see panel a).
Chain Length Dependence of Protein Folding/Misfolding
Biochemistry, Vol. 42, No. 23, 2003 7091
32. Pace, C. N., Shirley, B. A., and Thomson, J. A. (1990) in Protein
Structure: A Practical Approach (Creighton, T. E., Ed.) pp 311-
330, IRL Press, Oxford.
33. Surewicz, W. K., Mantsch, H. H., and Chapman, D. (1993)
Biochemistry 32, 389-394.
34. Surewicz, W. K., Mantsch, H. H., and Chapman, D. (1989) J.
Mol. Struct. 214, 143-147.
35. Dieudonne, D., Mendelsohn, R., Farid, R. S., and Flach, C. R.
(2001) Biochim. Biophys. Acta 1511, 99-112.
36. Simonetti, M., and Bello, C. D. (2001) Biopolymers 62, 95-108.
37. Naiki, H., Higuchi, K., and Hosokawa, M. (1989) Anal. Biochem.
38. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132.
39. Tsodikov, O. V., Record, M. T., and Sergeev, Y. V. (2002) J.
Comput. Chem. 23, 600-609.
40. Sreerama, N., and Woody, R. W. (1993) Anal. Biochem. 209, 32-
41. Sreerama, N., and Woody, R. W. (2000) Anal. Biochem. 282, 252-
42. Singh, B. R. (2000) Infrared Analysis of Peptides and Proteins,
Oxford University Press, New York.
43. Lakowicz, J. R. (1999) pp 186-189, Kluwer Academic/Plenum
Publishers, New York.
44. Pallitto, M., and Murphy, R. M. (2001) Biophys. J. 81, 1805-
45. Neira, J. L., and Fersht, A. R. (1999) J. Mol. Biol. 285, 1309-
46. Baldwin, R. L. (1999) Nat. Struct. Biol. 6, 814-817.
47. Chothia, C. (1976) J. Mol. Biol. 105, 1-14.
48. Kauzmann, W. (1959) Adv. Prot. Chem. 14, 1-63.
49. Dill, K. A. (1985) Biochemistry 24, 1501-1509.
50. Taniuchi, H. (1970) J. Biol. Chem. 245, 5459-5468.
51. Frydman, J., and Hartl, U. F. (1996) Science 272, 1497-1502.
52. Fandrich, M., Fletcher, M. A., and Dobson, C. M. (2001) Nature
53. Booth, D. R., Sunde, M., Bellotti, V., Robinson, C. V., Hutchinson,
W. L., Fraser, P. E., Hawkins, P. M., Dobson, C. M., Radford, S.
E., Blake, C. C. F., and Pepys, M. B. (1997) Nature 385, 787-
54. Dobson, C. M. (2002) Nature 418, 729-730.
55. Zimmerman, S. B., and Minton, A. P. (1993) Annu. Rev. Biophys.
Biomol. Struct. 22, 27-75.
56. Minton, A. P. (1981) Biophys. J. 78, 101-109.
57. Minton, A. P. (2000) Curr. Opin. Struct. Biol. 10, 34-39.
58. Van Den Berg, B., Wain, R., Dobson, C. M., and Ellis, R. J. (2000)
EMBO J. 19, 3870-3875.
59. Ban, N., Nissen, P., Hansen, J., Moore, P. B., and Steitz, T. A.
(2000) Science 289, 905-920.
60. Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N.,
and Noller, H. F. (1999) Science 285, 2095-2104.
61. Martin, K. A., and Miller, O. L. (1983) Dev. Biol. 98, 338-348.
62. Yoshida, T., Wakiyama, M., Yazaki, K., and Miura, K. (1997) J.
Electron Microsc. 46, 503-506.
63. Nicholls, C. D., McLure, K. G., Shields, M. A., and Lee, P. W.
K. (2002) J. Biol. Chem. 277, 12937-12945.
64. Gilmore, R., Coffey, M. C., Leone, G., McLure, K., and Lee, P.
W. (1996) EMBO J. 15, 2651-2658.
65. Komar, A. A., Kommer, A., Krasheninnikov, I. A., and Spirin,
A. S. (1997) J. Biol. Chem. 272, 10646-10651.
66. Frydman, J. (2001) Annu. Rev. Biochem. 70, 603-647.
67. Netzer, W. J., and Hartl, F. U. (1997) Nature 388, 343-349.
68. Netzer, W. J., and Hartl, F. U. (1998) Trends Biochem. Sci. 23,
69. Fedorov, A. N., and Baldwin, T. O. (1998) Methods Enzymol.
70. Clark, P. L., and King, J. (2001) J. Biol. Chem. 276, 25411-
71. Herbst, R., Schafer, U., and Seckler, R. (1997) J. Biol. Chem.
72. Kuriyan, J., Wilz, S., Karplus, M., and Petsko, G. A. (1986) J.
Mol. Biol. 192, 133-154.
73. Kraulis, P. J. (1991) J. Appl. Crystallogr. 24, 946-950.
74. Merritt, E. A., and Bacon, D. J. (1997) Methods Enzymol. 277,
75. Pace, C. N. (1990) TibTech 8, 93-98.