Identifying the amylome, proteins capable
of forming amyloid-like fibrils
Lukasz Goldschmidtᵃ, Poh K. Tengᵃ, Roland Riekᵇ, and David Eisenbergᵃ,¹
ᵃHoward Hughes Medical Institute, University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles,
CA 90095-1570; and
ᵇLaboratory of Physical Chemistry, Eidgenössische Technische Hochschule Zurich, CH-8093 Zürich, Switzerland
Contributed by David S Eisenberg, January 4, 2010 (sent for review December 3, 2009)
The amylome is the universe of proteins that are capable of form-
ing amyloid-like fibrils. Here we investigate the factors that enable
a protein to belong to the amylome. A major factor is the presence
in the protein of a segment that can form a tightly complementary
interface with an identical segment, which permits the formation
of a steric zipper—two self-complementary beta sheets that form
the spine of an amyloid fibril. Another factor is sufficient confor-
mational freedom of the self-complementary segment to interact
with other molecules. Using RNase A as a model system, we vali-
date our fibrillogenic predictions by the 3D profile method based
on the crystal structure of NNQQNY and demonstrate that a specific
residue order is required for fiber formation. Our genome-wide
analysis revealed that self-complementary segments are found in
almost all proteins, yet not all proteins form amyloids. The implica-
tion is that chaperoning effects have evolved to constrain self-
complementary segments from interaction with each other.
3D profile ∣ ribonuclease A ∣ Rosetta energy ∣ steric zipper
Seventy-five years ago, the pioneering biophysicist William Astbury
speculated that every protein might have a fibrous state
as well as a globular state (1). Astbury was the first to describe the
cross-beta fibril diffraction pattern, now accepted as the definitive
signature of the amyloid state of proteins. Astbury’s observation
was on a denatured protein, albumin in poached egg white.
Today it is established that amyloid diseases, including Alzheimer’s
and prion diseases, are associated with elongated, unbranched
protein fibrils (2, 3). However, functional proteins are
also found in the amyloid state. These include the egg stalk of the
green lace-wing fly (4), the Pmel17 protein associated with skin
pigmentation (5), and a large number of secretory hormones (6).
Conversely, in the past decade, Pertinhez et al. (7) and others
(8–10) have shown that many globular proteins can be converted
to the amyloid state by a variety of denaturing processes, suggesting
that conversion may be generally applicable to all proteins. So
the question arises, to what extent is this conjecture true? That is,
how large is the amylome?
Computer algorithms have been proposed to answer a somewhat
broader question: What is the aggregation propensity of a
given protein sequence? Aggregates, in general, include amyloid-like
fibrils but also other types of fibrils and nonfibrillar aggregates.
TANGO (11) identifies beta-aggregating regions of
proteins by using a statistical mechanics algorithm based on
the physico-chemical principles of beta-sheet formation. For each
residue, it calculates the energy of structural states derived from
statistical and empirical considerations and then computes the
occupancy of the beta-aggregation conformational state. Although
beta-aggregation propensity by itself is not necessarily indicative
of amyloid formation, it plays a major role in determining
the tendency to ultimately form organized structures such as amyloid
fibrils. However, as stated by the authors, TANGO cannot be
used to quantitatively compare aggregation propensities between
different proteins and is unable to accurately predict low levels of
aggregation propensity. The Dobson and Vendruscolo groups developed
an algorithm that predicts the regions of protein sequences
that are most important in promoting aggregation and
amyloid formation. Zyggregator (12), an implementation of this
algorithm, computes a Zagg score profile, taking into account both
the intrinsic aggregation propensity computed from the protein
sequence and the structural protection provided by the folded
form of the protein. PASTA, the prediction of amyloid structure
aggregation algorithm (13), uses a pairwise energy function that
computes the propensity of two residues found within a beta
sheet facing one another on neighboring strands. It is based on
the key assumption that a universal mechanism is responsible
for beta-sheet formation in globular proteins and fibrillar aggre-
gates. PASTA’s ability to predict the registry of the intermolecular
hydrogen bonds formed between amyloidogenic segments allows
it to identify the portions of the sequence forming the cross-beta
core as well as to discriminate whether the intermolecular
beta-strands are parallel or antiparallel. These three methods
are sequence-based and do not take 3D structural features di-
rectly into account.
Here we focus on the factors that permit a protein to convert to
the amyloid state. We extend the 3D profile method (14) to
computationally identify, within all putative proteins of three gen-
omes, segments with high fibrillation propensity (HP) that can
form a “steric zipper”—two self-complementary beta sheets, giv-
ing rise to the spine of an amyloid fibril. Our approach differs
from those discussed above in that it relies mainly on structural
information to evaluate the likelihood that a particular sequence
can form fibrils, in contrast to previous algorithms that rely
mainly on sequence information. For structures deposited in the
Protein Data Bank (PDB), we also establish the localization and
geometry of such segments in folded proteins. Using bovine pan-
creatic ribonuclease A (RNase A) as a model system, we experi-
mentally validate the accuracy of our predictions and investigate
the effect of sequence and residue composition. We demonstrate
that our prediction method of self-complementary, fibrillizing
segments within proteins is effective; that fibrillation propen-
sity depends on the specific residue sequence; and that self-
complementary segments on the surface of proteins are rare, sug-
gesting that chaperoning effects have evolved to constrain such
segments from interaction with each other (15).
Identification of Protein Segments Having High Propensity for Fibril-
lation. In previous studies, we found that particular short
(4–10 residue) segments of proteins are capable of forming amy-
loid-like fibrils (16–19). Atomic structures of fibril-like crystals
formed by these segments revealed that these segments are
self-complementary, stacking into pairs of beta sheets, whose side
chains extend and interdigitate. On the basis of the first two atomic
structures (of segments GNNQQNY and NNQQNY), we
Author contributions: L.G., P.K.T., R.R., and D.E. designed research; L.G. and P.K.T.
performed research; L.G., P.K.T., R.R., and D.E. analyzed data; and L.G. and D.E. wrote
the paper.
The authors declare no conflict of interest.
1To whom correspondence should be addressed. E-mail: email@example.com.
This article contains supporting information online at www.pnas.org/cgi/content/full/
www.pnas.org/cgi/doi/10.1073/pnas.0915166107 PNAS ∣ February 23, 2010 ∣ vol. 107 ∣ no. 8 ∣ 3487–3492
3D Profile Method. Our fibrillogenic predictions by the 3D profile method are
available online in the ZipperDB database at http://services.mbi.ucla.edu/
zipperdb. The database is easily searchable by protein identifier, name, or
sequence. For each protein, we provide the fibrillogenic propensity profile
and, for the hexapeptide segments, Rosetta energies and our steric zipper
models. Users will be able to submit their own protein sequences for analysis
in the near future.
Datasets. E. coli, S. cerevisiae, and H. sapiens genomes were retrieved from
the National Center for Biotechnology Information Web site. A nonredun-
dant set of the PDB based on 50% sequence identity containing 12,846 pro-
teins was retrieved from the PDB in December 2007.
Fibrillation Propensity Predictions. Fibrillation propensities were computed
for all proline-free six-residue segments in the protein sequences by using
the 3D profile method (14). To avoid complications from disulphide bonding,
cysteines were replaced with serines during modeling. Briefly, we
used the backbone coordinates from the crystal structure of NNQQNY as a
template and evaluated the energetic fit of a putative sequence by “threading”
it onto the template with RosettaDesign (20). Segments that have a low
Rosetta energy (−23 kcal/mol or lower) and a steric zipper-like interface, as
judged by their shape complementarity (>0.7), are classified as having HP.
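The windowing and classification rules in this paragraph can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the Rosetta energy and shape complementarity are assumed to be computed elsewhere (for example by threading with RosettaDesign) and passed in as plain numbers.

```python
from typing import Iterator, Tuple

ENERGY_CUTOFF = -23.0  # kcal/mol, threshold quoted in the text
SC_CUTOFF = 0.7        # shape complementarity threshold quoted in the text


def hexapeptides(sequence: str) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, segment) for every proline-free six-residue window.

    Cysteines are written as serines, mirroring the modeling substitution.
    """
    seq = sequence.upper()
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        if "P" in window:
            continue  # proline-containing windows are not scored
        yield i, window.replace("C", "S")


def has_high_propensity(energy: float, shape_complementarity: float) -> bool:
    """Apply both thresholds: low energy and a zipper-like interface."""
    return energy <= ENERGY_CUTOFF and shape_complementarity > SC_CUTOFF
```

For a sequence of length n this yields at most n − 5 windows, so a profile over a whole protein is simply the list of per-window energies indexed by start position.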
For larger datasets, we estimated the propensity by using a simplified “tri-
plet” method, which takes into account only the interactions of the three
inward-facing residues at the steric zipper interface (residues 1, 3, and 5
of each hexapeptide). The three outward-facing residues are modeled as ala-
nines, which significantly reduces computational complexity at the expense
of accuracy. We find that this method introduces an average error of
4 kcal∕mol per hexapeptide and has the tendency to overestimate the pro-
pensity by calculating a lower energy than the 3D profile method for a given
hexapeptide. See SI Appendix.
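One way to picture the triplet simplification is as a cache keyed on the three inward-facing residues. The sketch below assumes a caller-supplied scoring function (`score_fn`, hypothetical) that stands in for the full 3D profile calculation; none of these names come from the original work.

```python
from typing import Callable, Dict


def triplet_key(hexapeptide: str) -> str:
    """Return the inward-facing residues (positions 1, 3 and 5, 1-based)."""
    if len(hexapeptide) != 6:
        raise ValueError("expected a six-residue segment")
    return hexapeptide[0::2]


def alanine_masked(hexapeptide: str) -> str:
    """Replace the outward-facing residues (positions 2, 4 and 6) with alanine."""
    return "".join(
        res if i % 2 == 0 else "A" for i, res in enumerate(hexapeptide)
    )


def triplet_energy(
    hexapeptide: str,
    score_fn: Callable[[str], float],
    cache: Dict[str, float],
) -> float:
    """Score the alanine-masked segment once per triplet and reuse the result."""
    key = triplet_key(hexapeptide)
    if key not in cache:
        cache[key] = score_fn(alanine_masked(hexapeptide))
    return cache[key]
```

Because only three positions vary, at most 19³ = 6,859 proline-free triplets ever need to be scored, compared with 19⁶ (about 47 million) proline-free hexapeptides. That is where the saving in computation comes from, and also why the estimate ignores any contribution from the outward-facing side chains.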
Aggregation Propensity Predictions with TANGO. Aggregation propensity of
peptide segments was predicted with TANGO (11) by using default para-
meters. For segments that originated from natural protein sequences, the
full length wild-type protein sequence was used; for designed control
segments, a construct consisting of the segment of interest flanked by six
glycines on both sides was used. We classified the segment as high propensity
if TANGO assigned a score of ≥1 for any residue within the segment, which
is a very liberal threshold because TANGO reported peak propensities of 93
(A-beta), 97 (beta microglobulin), 4.5 (superoxide dismutase), and 2.5 (islet
amyloid polypeptide) in our benchmarks.
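The construct design and the threshold rule above can be sketched in a few lines. The example assumes the per-residue TANGO scores have already been obtained and parsed into a list; it does not call TANGO, and the function names are illustrative only.

```python
from typing import Sequence


def flanked_construct(segment: str, n_gly: int = 6) -> str:
    """Build the control construct: the segment flanked by glycines on both sides."""
    return "G" * n_gly + segment + "G" * n_gly


def is_high_tango(
    per_residue_scores: Sequence[float],
    start: int,
    length: int = 6,
    threshold: float = 1.0,
) -> bool:
    """True if any residue inside the segment scores at or above the threshold.

    `start` is the 0-based index of the segment within the scored sequence.
    """
    window = per_residue_scores[start:start + length]
    return any(score >= threshold for score in window)
```

For a designed control segment scored inside the glycine-flanked construct, `start` would be 6; for a natural segment it is the segment's offset in the full-length wild-type sequence.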
Segment Exposure in Protein Structures. Segment exposure was calculated
from the protein structure containing the said segment by using the solvent
accessible surface area reported by DSSP (30). The residue exposure reported
here is equal to the solvent accessibility of the residue normalized by the
maximum solvent accessibility of the corresponding residue type free in
space. The segment exposure is the average exposure of the residues con-
tained therein. Exposed segments are those whose exposure is at least equal
to the median exposure over the entire set, which was determined to be 15%.
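The exposure calculation reduces to a normalization, an average, and a median cutoff. The sketch below assumes the absolute accessibilities have already been read from DSSP output and that the caller supplies a table of maximum accessibilities per residue type; the table values are not given in the text, so none are hard-coded here.

```python
from statistics import median
from typing import Dict, List, Mapping, Sequence


def residue_exposure(
    residue: str, accessibility: float, max_accessibility: Mapping[str, float]
) -> float:
    """Solvent accessibility normalized by the maximum for that residue type."""
    return accessibility / max_accessibility[residue]


def segment_exposure(
    segment: str,
    accessibilities: Sequence[float],
    max_accessibility: Mapping[str, float],
) -> float:
    """Average relative exposure over the residues of a segment."""
    if len(segment) != len(accessibilities):
        raise ValueError("one accessibility value is needed per residue")
    values = [
        residue_exposure(res, acc, max_accessibility)
        for res, acc in zip(segment, accessibilities)
    ]
    return sum(values) / len(values)


def exposed_segments(exposures: Dict[str, float]) -> List[str]:
    """Keep segments whose exposure is at least the median over the whole set."""
    cutoff = median(exposures.values())
    return [seg for seg, value in exposures.items() if value >= cutoff]
```

Using the median as the cutoff means roughly half of all segments are labeled exposed by construction, so the label is relative to the dataset rather than an absolute measure of accessibility.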
Segment Geometry. The alpha carbon rmsd of each hexapeptide to the
NNQQNY backbone was computed with the Kabsch algorithm (31). Segments
with rmsd <0.75 Å are deemed to have compatible beta-strand geometry.
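Written out, the geometric test is a root-mean-square deviation over the six alpha carbons after optimal superposition. The notation below is introduced here for illustration and is not taken from the original: x_i are the alpha carbon positions of the segment, y_i those of the NNQQNY template, and the rotation R is the standard Kabsch solution obtained from a singular value decomposition.

```latex
\begin{align}
\tilde{x}_i &= x_i - \tfrac{1}{6}\sum_{j=1}^{6} x_j, \qquad
\tilde{y}_i = y_i - \tfrac{1}{6}\sum_{j=1}^{6} y_j \\
H &= \sum_{i=1}^{6} \tilde{x}_i\,\tilde{y}_i^{\mathsf T} = U \Sigma V^{\mathsf T} \\
R &= V\,\mathrm{diag}\bigl(1,\,1,\,\operatorname{sign}\det(V U^{\mathsf T})\bigr)\,U^{\mathsf T} \\
\mathrm{rmsd} &= \sqrt{\tfrac{1}{6}\sum_{i=1}^{6}
  \bigl\lVert R\,\tilde{x}_i - \tilde{y}_i \bigr\rVert^{2}},
\qquad \text{compatible if } \mathrm{rmsd} < 0.75\ \text{\AA}
\end{align}
```

The sign term in R guards against an improper rotation (a reflection), which would otherwise allow a mirror-image backbone to superimpose with a misleadingly low rmsd.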
Functional and Conserved Residues. The UniProt database was queried for
active site residue annotations in proteins. Segments containing one or more
such active site residues were classified as functional. Separately, a multiple
sequence alignment was prepared with BLAST from the National Center for
Biotechnology Information nonredundant gene database, by using an expectation
value threshold of 1 × 10⁻¹⁰. Pairwise similarity scores were computed
from the alignment by using the BLOSUM62 matrix to rank conserved resi-
dues within a protein candidate; various conservation rank thresholds (such
as top 5 or 10 residues) were used to identify conserved residues.
Segment Shuffling. Given a hexapeptide segment, all 720 (6!) sequence
permutations were enumerated. This operation rearranges the residues
within a segment while keeping its composition constant. For any rearranged
sequence that was present in any protein in the dataset, the fibrillation pro-
pensity was computed by using the 3D profile method as described above.
The exposure of the permuted segment was computed from the correspond-
Fibril Formation and Electron Microscopy. Lyophilized, synthetic peptides
(CSBio) were dissolved in various buffer, solvent, and salt solutions and incu-
bated at 37°C with vigorous shaking. Samples were applied directly to
hydrophilic 400-mesh carbon-coated formvar support films mounted on copper
grids (Ted Pella). After 2.5 min, the grids were rinsed with 0.2 μm-filtered
distilled water and stained for 1 min with 1% uranyl acetate. Dried grids were
examined with a Philips CM120 transmission electron microscope at an accelerating
voltage of 120 kV.
1. Astbury WT, Dickinson S, Bailey K (1935) The x-ray interpretation of the denaturation
and the structure of the seed globulins. Biochem J 29:2351–2360.
2. Terry WD, et al. (1973) Structural identity of Bence Jones and amyloid fibril proteins in
a patient with plasma cell dyscrasia and amyloidosis. J Clin Invest 52:1276–1281.
3. Sunde M, et al. (1997) Common core structure of amyloid fibrils by synchrotron x-ray
diffraction. J Mol Biol 273:729–739.
4. Geddes AJ, Parker KD, Atkins ED, Beighton E (1968) “Cross-beta” conformation in
proteins. J Mol Biol 32:343–358.
5. Kelly J, Balch WE (2003) Amyloid as a natural product. J Cell Biol 161:461–462.
6. Maji SK, et al. (2009) Functional amyloids as natural storage of peptide hormones in
pituitary secretory granules. Science 325:328–332.
7. Pertinhez TA, et al. (2001) Amyloid fibril formation by a helical cytochrome. FEBS Lett
8. Fink AL (1998) Protein aggregation: Folding aggregates, inclusion bodies and amyloid.
Fold Des 3:R9–23.
9. Sunde M, Blake CC (1998) From the globular to the fibrous state: Protein structure and
structural conversion in amyloid formation. Q Rev Biophys 31:1–39.
10. Chiti F, Dobson CM (2009) Amyloid formation by globular proteins under native
conditions. Nat Chem Biol 5:15–22.
11. Fernandez-Escamilla A-M, Rousseau F, Schymkowitz J, Serrano L (2004) Prediction of
sequence-dependent and mutational effects on the aggregation of peptides and
proteins. Nat Biotechnol 22:1302–1306.
12. Tartaglia GG, et al. (2008) Prediction of aggregation-prone regions in structured
proteins. J Mol Biol 380:425–436.
13. Trovato A, Chiti F, Maritan A, Seno F (2006) Insight into the structure of amyloid fibrils
from the analysis of globular proteins. PLoS Comput Biol 2:e170.
14. Thompson MJ, et al. (2006) The 3D profile method for identifying fibril-forming
segments of proteins. Proc Natl Acad Sci USA 103:4074–4078.
15. Dobson CM (2004) Seminars in Cell and Developmental Biology, ed Ellis J Vol 15
(Associated Press, New York), pp 3–16.
16. Balbirnie M, Grothe R, Eisenberg D (2001) An amyloid-forming peptide from the yeast
prion Sup35 reveals a dehydrated beta-sheet structure for amyloid. Proc Natl Acad Sci USA 98:2375–2380.
17. Nelson R, et al. (2005) Structure of the cross-beta spine of amyloid-like fibrils. Nature
18. Sambashivan S, Liu Y, Sawaya MR, Gingery M, Eisenberg D (2005) Amyloid-like fibrils of
ribonuclease A with three-dimensional domain-swapped and native-like structure.
19. Sawaya MR, et al. (2007) Atomic structures of amyloid cross-beta spines reveal varied
steric zippers. Nature 447:453–457.
20. Kuhlman B, Baker D (2000) Native protein sequences are close to optimal for their
structures. Proc Natl Acad Sci USA 97:10383–10388.
21. Rousseau F, Serrano L, Schymkowitz JW (2006) How evolutionary pressure against pro-
tein aggregation shaped chaperone specificity. J Mol Biol 355(5):1037–1047.
22. Ivanova MI, Thompson MJ, Eisenberg D (2006) A systematic screen of beta(2)-micro-
globulin and insulin for amyloid-like segments. Proc Natl Acad Sci USA 103:4079–4082.
23. Wiltzius JJW, et al. (2009) Molecular mechanisms for protein-encoded inheritance. Nat
Struct Mol Biol 16:973–978.
24. Hurle MR, Helms LR, Li L, Chan W, Wetzel R (1994) A role for destabilizing amino acid
replacements in light-chain amyloidosis. Proc Natl Acad Sci USA 91:5446–5450.
25. Lai Z, Colon W, Kelly JW (1996) The acid-mediated denaturation pathway of transthyr-
etin yields a conformational intermediate that can self-assemble into amyloid.
26. Teng PK, Eisenberg D (2009) Short protein segments can drive a non-fibrillizing protein
into the amyloid state. Protein Eng Des Sel 22:531–536.
27. Pepys MB, et al. (1993) Human lysozyme gene mutations cause hereditary systemic
amyloidosis. Nature 362:553–557.
28. Elam JS, et al. (2003) An alternative mechanism of bicarbonate-mediated peroxidation
by copper-zinc superoxide dismutase: rates enhanced via proposed enzyme-associated
peroxycarbonate intermediate. J Biol Chem 278:21032–21039.
29. Raman B, et al. (2005) AlphaB-crystallin, a small heat-shock protein, prevents the
amyloid fibril growth of an amyloid beta-peptide and beta2-microglobulin. Biochem
30. Kabsch W, Sander C (1983) Identical pentapeptides with different backbones.
31. Kabsch W (1976) A solution of the best rotation to relate two sets of vectors.
Acta Crystallogr A32:922–923.