Sampling the N-terminal proteome of human blood
David Wildes and James A. Wells1
Departments of Pharmaceutical Chemistry and Cellular and Molecular Pharmacology, University of California, San Francisco, Byers Hall, 1700 4th Street,
San Francisco, CA 94158
Contributed by James A. Wells, December 15, 2009 (sent for review October 15, 2009)
The proteomes of blood plasma and serum represent a potential
gold mine of biological and diagnostic information, but challenges
such as dynamic range of protein concentration have hampered
efforts to unlock this resource. Here we present a method to label
and isolate N-terminal peptides from human plasma and serum.
This process dramatically reduces the complexity of the sample
by eliminating internal peptides. We identify 772 unique N-terminal
peptides in 222 proteins, ranging over six orders of magnitude
in abundance. This approach is highly suited for studying natural
proteolysis in plasma and serum. We find internal cleavages in
plasma proteins created by endo- and exopeptidases, providing
information about the activities of proteolytic enzymes in blood,
which may be correlated with disease states. We also find signatures
of signal peptide cleavage, coagulation and complement activation,
and other known proteolytic processes, in addition to a
large number of cleavages that have not been reported previously,
including over 200 cleavages of blood proteins by aminopeptidases.
Finally, we can identify substrates of specific proteases
by exogenous addition of the protease combined with N-terminal
isolation and quantitative mass spectrometry. In this way we identified
proteins cleaved in human plasma by membrane-type serine
protease 1, an enzyme linked to cancer progression. These studies
demonstrate the utility of direct N-terminal labeling by subtiligase
to identify and characterize endogenous and exogenous proteolysis
in human plasma and serum.
plasma ∣ protease ∣ proteomics ∣ serum ∣ biomarker
The proteomes of human blood serum and plasma contain a
vast amount of useful information about the state of the body
in health and disease. Because the blood contacts virtually every
cell and tissue throughout the body, it contains many proteins and
other chemicals that may report on health and disease. In addition,
blood collection is simple and minimally invasive, making it
a medium of choice for many classical diagnostic tests. Unfortunately,
the blood proteome has been challenging to exploit for
discovery of protein biomarkers, because of the large number
of unique proteins and their degradation products and the broad
range of protein concentrations (from millimolar to picomolar or
below) in serum and plasma. Just 22 proteins are estimated to
make up 99% of the blood proteome by mass. Promising candidates
for diagnostic markers, such as cytokines, growth factors,
and cancer-specific antigens, may be more than a billionfold less
abundant than the major blood proteins (1). Immunoaffinity
depletion of certain abundant proteins is typically employed to
improve dynamic range, though it has potential disadvantages,
including high cost and the possibility of removing low-abundance
species that bind to highly abundant proteins (2, 3).
Many biomarker discovery efforts search for variations in total
abundance of particular proteins. This approach is simple to
implement and conceptually straightforward but may miss potentially
informative variation in a sample. A given protein in blood
may be posttranslationally modified in myriad ways, including
differential glycosylation, sulfation, oxidation, glycation, proteolysis,
and many others. Modified proteins may be informative about
disease states; for instance, glycated hemoglobin (HbA1c) and
serum albumin are useful markers for diabetes mellitus (4). This
information is lost when protein abundance alone is measured.
Specific enrichment of modified species can address these
challenges, because separating modified from unmodified
peptides greatly reduces sample complexity. Proteolysis is an
excellent candidate for this strategy. Most blood proteins are
subject to at least one proteolytic cleavage, when the N-terminal
secretory signal is removed during biogenesis. Additional
cleavages may occur in the secretory pathway, and proteins
may be further processed by endo- and exoproteases acting in
biological processes such as coagulation and complement. These
proteolytic processes may be perturbed in disease, and disease-
specific, protease-derived new N-termini in blood may be a
valuable class of biomarkers.
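As an illustration of how an observed N terminus maps back to a cleavage event, the following minimal Python sketch locates an N-terminal peptide in a precursor sequence and reports the residue preceding it (P1). The sequence, the signal peptide length, and the function and class names are hypothetical, chosen only for illustration; they are not taken from this study.

```python
from typing import NamedTuple, Optional


class NTerminus(NamedTuple):
    start: int          # 1-based position of the first residue of the peptide
    p1: Optional[str]   # residue immediately before the cleavage site, if any
    kind: str           # coarse classification of the N terminus


def locate_n_terminus(precursor: str, peptide: str, signal_len: int) -> NTerminus:
    """Map an observed N-terminal peptide onto its precursor sequence."""
    idx = precursor.find(peptide)
    if idx < 0:
        raise ValueError("peptide not found in precursor sequence")
    start = idx + 1
    p1 = precursor[idx - 1] if idx > 0 else None
    if start == 1:
        kind = "unprocessed N terminus"
    elif start == signal_len + 1:
        kind = "signal peptide removal"
    else:
        kind = "other cleavage"
    return NTerminus(start, p1, kind)


# Hypothetical 24-residue precursor with an assumed 14-residue signal peptide.
PRECURSOR = "MKTLLLAVLLGASADEPRKVTGHS"
SIGNAL_LEN = 14

mature = locate_n_terminus(PRECURSOR, "DEPRK", SIGNAL_LEN)
internal = locate_n_terminus(PRECURSOR, "VTGHS", SIGNAL_LEN)
print(mature)
print(internal)
```

In this toy case the first peptide starts directly after the assumed signal peptide, whereas the second starts after a lysine further into the sequence and would be classed as an additional cleavage.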
Recently, we and others have developed a number of complementary
chemical methods to isolate and identify N-terminal
peptides in proteins, on the basis of positive or negative enrichment
strategies (reviewed in refs. 5, 6). Here we apply one such
method to isolate and identify the products of proteolysis of
blood proteins, an underexploited class of potential biomarkers.
These methods and data can further illuminate the role of
proteases in blood biology and could provide a strategy for
blood-based biomarker discovery.
Results and Discussion
N-Terminal Enrichment Strategy. Specific labeling and isolation of
protein N termini is challenging, because of the similar reactivity
of N-terminal α-amines and the >20-fold more abundant
ϵ-amines of lysine side chains. We have addressed this challenge
by employing an engineered enzyme called subtiligase. Subtiligase
is a double mutant (S221C/P225A) of the serine protease
subtilisin BPN′ from Bacillus amyloliquefaciens (7), containing
additional modifications that enhance stability (8, 9). It lacks
detectable protease activity but is capable of cleaving peptide
glycolate esters, forming a thioester enzyme intermediate that
can be transferred onto free protein and peptide N termini.
Subtiligase exhibits absolute specificity for N-terminal α-amines
over lysine ϵ-amines, making it an excellent tool for N-terminal
labeling. Our group has previously described a subtiligase-based
method for labeling, isolation, and enrichment of protein N termini
in cell lysates (10). This protocol was modified for plasma
and serum labeling and is shown schematically in Fig. 1A. N-terminal
peptides are isolated with a serine-tyrosine dipeptide tag, which
adds a characteristic mass shift to all labeled precursor ions and
produces two prominent fragment ions in all MS/MS spectra.
Detection of this tag provides strong evidence that a peptide was
tagged by subtiligase, enriched, and recovered.
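The size of the mass shift can be estimated with a short Python sketch. It assumes, as a simplification not stated numerically in the text, that the tag adds exactly one serine and one tyrosine residue to the N-terminal α-amine through a peptide bond, and it uses standard monoisotopic residue masses. The function name and the example precursor are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for serine and tyrosine.
SER_RESIDUE = 87.03203
TYR_RESIDUE = 163.06333

# Assumed mass added by a Ser-Tyr tag joined by a peptide bond.
TAG_SHIFT = SER_RESIDUE + TYR_RESIDUE


def tagged_mz(untagged_mz: float, charge: int) -> float:
    """Expected m/z of a tagged precursor, given the untagged m/z and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return untagged_mz + TAG_SHIFT / charge


# Hypothetical doubly charged precursor at m/z 500.0 before tagging.
print(round(TAG_SHIFT, 5))
print(round(tagged_mz(500.0, 2), 5))
```

Under this assumption the shift is about 250.095 Da, which appears as about 125.048 m/z units for a doubly charged ion.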
Author contributions: D.W. and J.A.W. designed research; D.W. performed research;
D.W. and J.A.W. analyzed data; and D.W. and J.A.W. wrote the paper.
The authors declare no conflict of interest.
¹To whom correspondence should be addressed. E-mail: email@example.com.
This article contains supporting information online at www.pnas.org/cgi/content/full/
www.pnas.org/cgi/doi/10.1073/pnas.0914495107
PNAS ∣ March 9, 2010 ∣ vol. 107 ∣ no. 10 ∣ 4561–4566
Materials and Methods
Materials. Details of materials used in this study may be found in SI Text.
Sample Labeling and Workup. Samples (2 mL) were biotinylated with
subtiligase and peptide ester for 60 min at room temperature. Proteins were
reduced, alkylated with iodoacetamide, captured on immobilized streptavidin,
digested with trypsin, and released from the resin with TEV protease.
Additional details of the labeling and workup are provided in SI Text.
LC-MS/MS Acquisition and Data Processing. Peptides were subjected to offline
strong cation exchange fractionation and then to C18 chromatography
coupled directly to a QSTAR Pulsar or QSTAR Elite mass spectrometer.
Additional details of data acquisition and processing are in SI Text.
Peptide Identification. Database searches to identify peptides were performed
by using Protein Prospector v. 5.2.2 (UCSF Mass Spectrometry Facility,
prospector.ucsf.edu) and the December 2008 release of Swiss-Prot. Additional
details of peptide identification are in SI Text. Labeled and unlabeled
peptides are listed in Dataset S1, and annotated peaklists are provided in
Detection of MT-SP1 Substrates. Samples (1 mL) of EDTA plasma were treated
with 1 μM MT-SP1 for 10, 30, and 60 s and then quenched by addition of
4-(2-aminoethyl)benzenesulfonyl fluoride and PMSF. An untreated sample
was used for a zero time point. Samples were processed as described above.
After desalting, they were labeled with iTRAQ reagents by using the protocol
provided by the manufacturer, as follows: 0 s, mass 114; 10 s, mass 115;
30 s, mass 116; 60 s, mass 117. Additional details of iTRAQ analysis are provided
in SI Text.
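The channel assignment above can be turned into a simple time course for each peptide. The Python sketch below is a minimal illustration, not the analysis used in this study: the reporter intensities are invented, and the function names and the twofold threshold (`min_fold`) are hypothetical choices.

```python
# iTRAQ reporter mass -> MT-SP1 treatment time in seconds (from the labeling scheme).
REPORTER_TO_SECONDS = {114: 0, 115: 10, 116: 30, 117: 60}


def time_course(reporter_intensities):
    """Return (seconds, ratio to the 0 s channel) pairs, ordered by time."""
    baseline = reporter_intensities[114]
    if baseline <= 0:
        raise ValueError("zero time point (114) intensity must be positive")
    return [
        (REPORTER_TO_SECONDS[channel], reporter_intensities[channel] / baseline)
        for channel in sorted(REPORTER_TO_SECONDS)
    ]


def increases_with_time(course, min_fold=2.0):
    """True if ratios never decrease and the final ratio reaches min_fold.

    min_fold is an arbitrary illustrative threshold.
    """
    ratios = [ratio for _, ratio in course]
    monotonic = all(b >= a for a, b in zip(ratios, ratios[1:]))
    return monotonic and ratios[-1] >= min_fold


# Invented reporter intensities for one labeled N-terminal peptide.
example = {114: 100.0, 115: 250.0, 116: 600.0, 117: 900.0}
course = time_course(example)
print(course)
print(increases_with_time(course))
```

An N terminus generated by the added protease would be expected to rise across the 115, 116, and 117 channels relative to 114, whereas N termini already present in untreated plasma should give ratios near 1.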
ACKNOWLEDGMENTS. We thank A.L. Burlingame, D. Maltby, and J.C. Trinidad
for assistance with design and execution of mass spectrometry experiments,
N.J. Agard, E.D. Crawford, and C.M. Jackson for critical reading of the
manuscript, and members of the Wells and Burlingame groups for helpful
discussions. Active MT-SP1 was a generous gift from E.L. Madison (Catalyst
Biosciences, South San Francisco, CA). This work was supported by National
Institutes of Health (NIH) Grant F32GM079931 (to D.W.) and NIH Grant R01
GM081051 (to J.A.W.). Mass spectrometry was performed at the Bio-Organic
Biomedical Mass Spectrometry Resource at UCSF (A.L. Burlingame, Director)
supported by the Biomedical Research Technology Program of the NIH
National Center for Research Resources, NIH NCRR P41RR001614 and NIH
1. Anderson NL, Anderson NG (2002) The human plasma proteome: History, character,
and diagnostic prospects. Mol Cell Proteomics 1:845–867.
2. Liu T, et al. (2006) Evaluation of multiprotein immunoaffinity subtraction for plasma
proteomics and candidate biomarker discovery using mass spectrometry. Mol Cell Pro-
3. Zolotarjova N, et al. (2005) Differences among techniques for high-abundant protein
depletion. Proteomics 5:3304–3313.
4. Cohen MP, Clements RS (1999) Measuring glycated proteins: Clinical and methodological
aspects. Diabetes Technol Ther 1:57–70.
5. Agard NJ, Wells JA (2009) Methods for the proteomic identification of protease
substrates. Curr Opin Chem Biol 13:503–509.
6. Doucet A, et al. (2008) Metadegradomics: Toward in vivo quantitative degradomics of
proteolytic post-translational modifications of the cancer proteome. Mol Cell Proteo-
7. Abrahmsen L, et al. (1991) Engineering subtilisin and its substrates for efficient
ligation of peptide bonds in aqueous solution. Biochemistry 30:4151–4159.
8. Atwell S, Wells JA (1999) Selection for improved subtiligases by phage display. Proc
Natl Acad Sci USA 96:9497–9502.
9. Chang TK, Jackson DY, Burnier JP, Wells JA (1994) Subtiligase: A tool for semisynthesis
of proteins. Proc Natl Acad Sci USA 91:12544–12548.
10. Mahrus S, et al. (2008) Global sequencing of proteolytic cleavage sites in apoptosis by
specific labeling of protein N termini. Cell 134:866–876.
11. Hortin GL, Sviridov D, Anderson NL (2008) High-abundance polypeptides of the human
plasma proteome comprising the top 4 logs of polypeptide abundance. Clin Chem
12. Elias JE, Haas W, Faherty BK, Gygi SP (2005) Comparative evaluation of mass spectrometry
platforms used in large-scale proteomics investigations. Nat Methods
13. States DJ, et al. (2006) Challenges in deriving high-confidence protein identifications
from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol
14. Brown JL, Roberts WK (1976) Evidence that approximately eighty per cent of the
soluble proteins from Ehrlich ascites cells are N-alpha-acetylated. J Biol Chem
15. Timmer JC, et al. (2007) Profiling constitutive proteolytic events in vivo. Biochem J
16. Yi J, Kim C, Gelfand CA (2007) Inhibition of intrinsic proteolytic activities moderates
preanalytical variability and instability of human plasma. J Proteome Res 6:1768–1781.
17. Page MJ, Macgillivray RTA, Di Cera E (2005) Determinants of specificity in coagulation
proteases. J Thromb Haemost 3:2401–2408.
18. Sim RB, Tsiftsoglou SA (2004) Proteases of the complement system. Biochem Soc Trans
19. Schwartz D, Gygi SP (2005) An iterative statistical approach to the identification of
protein phosphorylation motifs from large-scale data sets. Nat Biotechnol
20. Jackson CM, Nemerson Y (1980) Blood coagulation. Annu Rev Biochem 49:765–811.
21. Sahu A, Lambris JD (2001) Structure and biology of complement protein C3, a connecting
link between innate and acquired immunity. Immunol Rev 180:35–48.
22. Hugli TE (1990) Structure and function of C3a anaphylatoxin. Curr Top Microbiol
23. McDonald L, Beynon RJ (2006) Positional proteomics: Preparation of amino-terminal
peptides as a strategy for proteome simplification and characterization. Nat Protocols
24. Schilling O, Overall CM (2008) Proteome-derived, database-searchable peptide
libraries for identifying protease cleavage sites. Nat Biotechnol 26:685–694.
25. Taylor RC, Cullen SP, Martin SJ (2008) Apoptosis: Controlled demolition at the cellular
level. Nat Rev Mol Cell Biol 9:231–241.
26. Janowski R, Abrahamson M, Grubb A, Jaskolski M (2004) Domain swapping in
N-truncated human cystatin C. J Mol Biol 341:151–160.
27. Liz MA, Gomes CM, Saraiva MJ, Sousa MM (2007) ApoA-I cleaved by transthyretin has
reduced ability to promote cholesterol efflux and increased amyloidogenicity. J Lipid
28. Villanueva J, et al. (2008) A sequence-specific exopeptidase activity test (SSEAT) for
"functional" biomarker discovery. Mol Cell Proteomics 7:509–518.
29. Olofsson MH, et al. (2007) Cytokeratin-18 is a useful serum biomarker for early
determination of response of breast carcinomas to chemotherapy. Clin Cancer Res
30. Fuchs CS, et al. (2008) Plasma insulin-like growth factors, insulin-like binding protein-3,
and outcome in metastatic colorectal cancer: Results from Intergroup Trial N9741.
Clin Cancer Res 14:8263–8269.
31. Harris FM, et al. (2003) Carboxyl-terminal-truncated apolipoprotein E4 causes
Alzheimer's disease-like neurodegeneration and behavioral deficits in transgenic mice.
Proc Natl Acad Sci USA 100:10966–10971.
32. Mizuguchi M, et al. (2008) Unfolding and aggregation of transthyretin by the
truncation of 50 N-terminal amino acids. Proteins 72:261–269.
33. Keshishian H, et al. (2007) Quantitative, multiplexed assays for low abundance
proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol
Cell Proteomics 6:2212–2229.
34. Keshishian H, et al. (2009) Quantification of cardiovascular biomarkers in patient plasma
by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics
35. Addona TA, et al. (2009) Multi-site assessment of the precision and reproducibility of
multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotech-
36. Anderson NL, et al. (2004) Mass spectrometric quantitation of peptides and proteins
using stable isotope standards and capture by anti-peptide antibodies (SISCAPA).
J Proteome Res 3:235–244.
37. Marnett AB, Craik CS (2005) Papa's got a brand new tag: Advances in identification of
proteases and their substrates. Trends Biotechnol 23:59–64.
38. List K, Bugge TH, Szabo R (2006) Matriptase: Potent proteolysis on the cell surface.
Mol Med 12:1–7.
39. Uhland K (2006) Matriptase and its putative role in cancer. Cell Mol Life Sci
40. Rawlings ND, et al. (2008) MEROPS: The peptidase database. Nucleic Acids Res
41. Crooks GE, Hon G, Chandonia J-M, Brenner SE (2004) WebLogo: A sequence logo
generator. Genome Res 14:1188–1190.