Sampling the N-terminal proteome of human blood
David Wildes and James A. Wells¹
Departments of Pharmaceutical Chemistry and Cellular and Molecular Pharmacology, University of California, San Francisco, Byers Hall, 1700 4th Street,
San Francisco, CA 94158
Contributed by James A. Wells, December 15, 2009 (sent for review October 15, 2009)
The proteomes of blood plasma and serum represent a potential
gold mine of biological and diagnostic information, but challenges
such as dynamic range of protein concentration have hampered
efforts to unlock this resource. Here we present a method to label
and isolate N-terminal peptides from human plasma and serum.
This process dramatically reduces the complexity of the sample
by eliminating internal peptides. We identify 772 unique N-term-
inal peptides in 222 proteins, ranging over six orders of magnitude
in abundance. This approach is highly suited for studying natural
proteolysis in plasma and serum. We find internal cleavages in
plasma proteins created by endo- and exopeptidases, providing
information about the activities of proteolytic enzymes in blood,
which may be correlated with disease states. We also find signa-
tures of signal peptide cleavage, coagulation and complement ac-
tivation, and other known proteolytic processes, in addition to a
large number of cleavages that have not been reported previously,
including over 200 cleavages of blood proteins by aminopepti-
dases. Finally, we can identify substrates from specific proteases
by exogenous addition of the protease combined with N-terminal
isolation and quantitative mass spectrometry. In this way we iden-
tified proteins cleaved in human plasma by membrane-type serine
protease 1, an enzyme linked to cancer progression. These studies
demonstrate the utility of direct N-terminal labeling by subtiligase
to identify and characterize endogenous and exogenous proteoly-
sis in human plasma and serum.
plasma ∣ protease ∣ proteomics ∣ serum ∣ biomarker
The proteomes of human blood serum and plasma contain a vast amount of useful information about the state of the body in health and disease. Because the blood contacts virtually every cell and tissue throughout the body, it contains many proteins and other chemicals that may report on health and disease. In addition, blood collection is simple and minimally invasive, making it a medium of choice for many classical diagnostic tests. Unfortunately, the blood proteome has been challenging to exploit for discovery of protein biomarkers, because of the large number of unique proteins and their degradation products and the broad range of protein concentrations (from millimolar to picomolar or below) in serum and plasma. Just 22 proteins are estimated to make up 99% of the blood proteome by mass. Promising candidates for diagnostic markers, such as cytokines, growth factors, and cancer-specific antigens, may be more than a billionfold less abundant than the major blood proteins (1). Immunoaffinity depletion of certain abundant proteins is typically employed to improve dynamic range, though it has potential disadvantages, including high cost and the possibility of removing low-abundance species that bind to highly abundant proteins (2, 3).
Many biomarker discovery efforts search for variations in total abundance of particular proteins. This approach is simple to implement and conceptually straightforward but may miss potentially informative variation in a sample. A given protein in blood may be posttranslationally modified in myriad ways, including differential glycosylation, sulfation, oxidation, glycation, proteolysis, and many others. Modified proteins may be informative about disease states; for instance, glycated hemoglobin (HbA1c) and serum albumin are useful markers for diabetes mellitus (4). This information is lost when protein abundance alone is measured.
Specific enrichment of modified species can address these
challenges, because separating modified from unmodified
peptides greatly reduces sample complexity. Proteolysis is an
excellent candidate for this strategy. Most blood proteins are
subject to at least one proteolytic cleavage, when the N-terminal
secretory signal is removed during biogenesis. Additional
cleavages may occur in the secretory pathway, and proteins
may be further processed by endo- and exoproteases acting in
biological processes such as coagulation and complement. These
proteolytic processes may be perturbed in disease, and disease-
specific, protease-derived new N-termini in blood may be a
valuable class of biomarkers.
Recently, we and others have developed a number of comple-
mentary chemical methods to isolate and identify N-terminal
peptides in proteins, on the basis of positive or negative enrich-
ment strategies (reviewed in refs. 5, 6). Here we apply one such
method to isolate and identify the products of proteolysis of
blood proteins, an underexploited class of potential biomarkers.
These methods and data can further illuminate the role of
proteases in blood biology and could provide a strategy for
blood-based biomarker discovery.
Results and Discussion
N-Terminal Enrichment Strategy. Specific labeling and isolation of
protein N termini is challenging, because of the similar reactivity
of N-terminal α-amines and the >20-fold more abundant
ϵ-amines of lysine side chains. We have addressed this challenge
by employing an engineered enzyme, called subtiligase. Subtiligase is a double mutant (S221C/P225A) of the serine protease
subtilisin BPN′ from Bacillus amyloliquefaciens (7), containing
additional modifications that enhance stability (8, 9). It lacks
detectable protease activity but is capable of cleaving peptide
glycolate esters, forming a thioester enzyme intermediate that
can be transferred onto free protein and peptide N termini.
Subtiligase exhibits absolute specificity for N-terminal α-amines
over lysine ϵ-amines, making it an excellent tool for N-terminal
labeling. Our group has previously described a subtiligase-based
method for labeling, isolation, and enrichment of protein N ter-
mini in cell lysates (10). This protocol was modified for plasma
and serum labeling and is shown schematically in Fig. 1A. N-terminal
peptides are isolated with a serine-tyrosine (SY) dipeptide tag,
providing a characteristic mass shift to all labeled
precursor ions as well as two prominent fragment ions in all
MS/MS spectra. This tag provides strong evidence for subtiligase
tagging, enrichment, and recovery.
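As a rough check on the mass shift mentioned above, the tag's contribution can be computed from standard monoisotopic residue masses. This is a minimal sketch, assuming the tag adds exactly one serine and one tyrosine residue (the SY dipeptide left after TEV release) and nothing else; the function names are hypothetical, and the two diagnostic fragment ions are not modeled because their identities are not given here.

```python
# Monoisotopic residue masses (Da) of the two tag residues.
RESIDUE_MASS = {
    "S": 87.03203,   # serine residue, C3H5NO2
    "Y": 163.06333,  # tyrosine residue, C9H9NO2
}


def tag_mass_shift(tag: str = "SY") -> float:
    """Neutral mass added to a peptide by an N-terminal tag made of these residues."""
    return sum(RESIDUE_MASS[aa] for aa in tag)


def mz_shift(charge: int, tag: str = "SY") -> float:
    """Shift in observed precursor m/z for a tagged peptide at the given charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return tag_mass_shift(tag) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"z={z}: +{mz_shift(z):.4f} m/z")
```

For the SY tag this gives a neutral shift of about 250.095 Da, which appears as roughly 125.05 m/z units for a doubly charged precursor.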
Author contributions: D.W. and J.A.W. designed research; D.W. performed research;
D.W. and J.A.W. analyzed data; and D.W. and J.A.W. wrote the paper.
The authors declare no conflict of interest.
¹To whom correspondence should be addressed: E-mail: email@example.com.
This article contains supporting information online at www.pnas.org/cgi/content/full/
www.pnas.org/cgi/doi/10.1073/pnas.0914495107 PNAS ∣ March 9, 2010 ∣ vol. 107 ∣ no. 10 ∣ 4561–4566
The N-Terminal Proteome of Blood. By using our N-terminal
enrichment technique, we identified 772 unique N termini in
222 proteins in human serum and plasma (Dataset S1), with
an overall peptide false discovery rate estimated at 1.0% by a
target-decoy strategy. We found N termini in blood proteins with
concentrations spanning at least six orders of magnitude, with
excellent coverage in the top four logs of abundance, where
we detected over 70% of the 150 most abundant proteins (11)
(Fig. 2 and Table S1).
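The 1.0% peptide false discovery rate quoted above comes from a target-decoy strategy. The sketch below illustrates that general idea and is not the procedure used in this study (which is described in SI Text). Spectra are assumed to have been searched against real (target) and reversed or shuffled (decoy) sequences, and at a given score cutoff the FDR is estimated as decoy matches divided by target matches. The function names and this simple ratio estimator are assumptions.

```python
from typing import List, Optional, Tuple

Hit = Tuple[float, bool]  # (search score, True if the match is to a decoy sequence)


def estimate_fdr(hits: List[Hit], threshold: float) -> float:
    """Estimated FDR among hits scoring at or above `threshold`.

    Uses the simple ratio (# decoy hits) / (# target hits).
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    if targets == 0:
        # No targets passed: zero FDR if nothing passed, otherwise unbounded.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def threshold_for_fdr(hits: List[Hit], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score at which the estimated FDR is <= max_fdr (None if none)."""
    best = None
    # Try each observed score as a cutoff, from most to least stringent.
    for score in sorted({s for s, _ in hits}, reverse=True):
        if estimate_fdr(hits, score) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    hits = [(50.0, False), (45.0, False), (40.0, True),
            (38.0, False), (30.0, True), (20.0, False)]
    print(threshold_for_fdr(hits, 0.01))
```

Production pipelines usually convert such estimates to monotone q-values. This loop simply keeps the lowest cutoff that meets the target.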
The number of unique termini found in each protein varied
greatly and was generally consistent with the role of proteolysis
in protein function. For example, multiple N termini were found
in many coagulation and complement factors. We also found pro-
teolysis of abundant proteins where the biological function of
proteolytic cleavage is less clear. It is possible that some of these
cleavages represent nonspecific cleavage of abundant proteins by
blood proteases, although it is notable that the correlation be-
tween number of N termini discovered and protein abundance
is weak. Indeed, the abundant plasma protein alpha-1 acid
glycoprotein yields no detectable N termini.
We assessed the reproducibility of our labeling and enrichment
strategy by performing three technical replicate experiments on a
single sample of citrated plasma (Fig. 2C and Dataset S1). We
found substantial overlap between these three samples, with
29% of peptides found in all three experiments and 56% found
in at least two. This level of overlap between technical replicates
is well within the range expected for the mass spectrometry
techniques that we used (12) and suggests that our N-terminal
labeling does not result in major variations between samples.
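The overlap percentages above can be computed from the set of peptides identified in each replicate. This is a minimal sketch that assumes the denominator is the union of all distinct peptides across the three runs (the text does not state the denominator explicitly). Replicate names and peptide strings are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def replicate_overlap(replicates: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fractions of all distinct peptides found in every replicate and in at least two."""
    union = set().union(*replicates.values())
    if not union:
        return {"all": 0.0, "at_least_two": 0.0}
    in_all = set.intersection(*replicates.values())
    # A peptide is in at least two replicates if it is in some pairwise intersection.
    at_least_two: Set[str] = set()
    for a, b in combinations(replicates.values(), 2):
        at_least_two |= a & b
    return {
        "all": len(in_all) / len(union),
        "at_least_two": len(at_least_two) / len(union),
    }


if __name__ == "__main__":
    runs = {  # hypothetical peptide identifications
        "rep1": {"A", "B", "C", "D"},
        "rep2": {"A", "B", "E"},
        "rep3": {"A", "C", "F"},
    }
    print(replicate_overlap(runs))
```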
The subcellular localizations, as annotated in Swiss-Prot, for
each of the 222 proteins we report are shown in Fig. 3A. As would
be expected for a survey of blood proteins, 67% are known to be
secreted. The proportion of annotated secreted proteins reported
here is substantially higher than the 50% found in a recently com-
piled, high confidence list of proteins found in plasma (13), likely
reflecting a bias inherent in N-terminal labeling. Intracellular
proteins arising from both tissue leakage in vivo and cell lysis dur-
ing sample collection and preparation are likely to be acetylated
on their native N termini (14), rendering them undetectable in
the absence of internal proteolytic processing. Thus our method
is more sensitive to secreted proteins whose N termini are free
after signal sequence removal, and this is reflected in the propor-
tion of secreted protein identifications.
Endoproteolysis and Exoproteolysis in Blood. Most blood proteins
are subject to proteolytic cleavage by a multitude of proteases
in the secretory pathway and the extracellular environment.
Tracking these proteolytic events could shed light on important
biological processes in health and disease. We compared the N
termini that we found to annotations in the Swiss-Prot and MER-
OPS databases to identify known termini resulting from well-
understood biological processes (Fig. 3B). Interestingly, 81%
of the N termini that we found are not annotated in either da-
tabase. Annotated signal peptide cleavages, within five residues
of predicted signal processing sites, account for 11% of our data,
with annotated propeptide cleavages making up another 4.5%.
Plasmin activity on fibrinogen (2%), cleavage of “bait loops”
in protease inhibitors (0.9%), and removal of initiator methio-
nines (0.4%) are also represented. Consistent with previous stu-
dies (15), a significant number of N termini (28%) appear to arise
from aminopeptidase processing of peptides derived from endo-
protease cleavage, indicated by systematic laddering of products
(Fig. 3C). In total, 112 termini in 53 proteins are subject to ami-
nopeptidase trimming, ranging from removal of a single amino
acid to long ladders of aminopeptidase-processed termini. Trimming occurs on termini derived from a variety of endoprotease cleavages, including signal and propeptide removal, reactive center loop of serpin cleavage, and cleavages of unknown significance.
Fig. 1. Method for specific, enzymatic labeling of N termini in serum. (A) Schematic of workflow. Subtiligase is used to transfer a peptide containing biotin and a TEV protease-cleavable linker onto protein N termini. Proteins are then captured on streptavidin beads and trypsinized, removing all but the N-terminal tryptic peptide. Trypsinization on beads reduces unlabeled background created from sample precipitation in solution digests. N-terminal peptides are released with TEV protease for strong cation exchange fractionation and MS/MS analysis. Release leaves an SY-dipeptide tag on the N terminus.
Fig. 2. Concentration distribution of proteins and reproducibility of the method. (A) A subset of 110 proteins of established abundance (11) is plotted by mean molar concentration in plasma. Representative low, medium, and high abundance proteins are labeled. (B) The number of N termini detected in each protein is shown, arranged in order of abundance. Proteins depicted in this plot are given in Table S1. (C) Venn diagram showing results of three replicate experiments on a single sample of citrated plasma.
[Figure axes: Log (plasma concentration (M)); # of termini]
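This section uses two position-based annotation rules: laddering as a signature of aminopeptidase trimming, and a five-residue window for assigning signal peptide cleavages. Both can be sketched as simple rules over the start positions of observed N termini. The sketch assumes each terminus is represented by the residue number of its first amino acid within one protein. The helper names and the minimum ladder length of two are illustrative choices, not parameters taken from the study.

```python
from typing import List


def find_ladders(starts: List[int], min_length: int = 2) -> List[List[int]]:
    """Group N-terminal start positions of one protein into runs of consecutive residues.

    A run such as [25, 26, 27] is the pattern expected if an aminopeptidase
    removes one residue at a time from a terminus first created at position 25.
    """
    ladders: List[List[int]] = []
    run: List[int] = []
    for pos in sorted(set(starts)):
        if run and pos == run[-1] + 1:
            run.append(pos)  # extends the current ladder by one residue
        else:
            if len(run) >= min_length:
                ladders.append(run)
            run = [pos]  # start a new candidate ladder
    if len(run) >= min_length:
        ladders.append(run)
    return ladders


def near_signal_site(start: int, signal_site: int, window: int = 5) -> bool:
    """True if a terminus lies within `window` residues of an annotated signal cleavage site."""
    return abs(start - signal_site) <= window
```

For example, termini beginning at residues 705, 706, and 707 of one protein form a single ladder. This is the pattern expected from cleavage after R704 followed by stepwise trimming.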
Proteolysis in Serum and Plasma. We investigated the N-terminal
proteomes of human serum and plasma collected with three
different anticoagulants, with increasingly stringent suppression
of proteolytic activity: citrate, EDTA, and the proprietary P100
system (BD). Serum is expected to differ from plasma because
of the initiation of coagulation, resulting in cleavage of coagula-
tion factors, release of platelet granule contents, and a general
increase in proteolytic activity (16). A comparison of the types
of N termini found in serum and the three plasmas is shown
in Fig. 3D. The overall differences are modest but some patterns
are evident. Serum and citrated plasma are enriched for N
termini of unknown significance. EDTA and P100 plasma have
proportionately fewer unknown cleavages and a concomitant
increase in signal peptide and other annotated cleavages. This
increase is consistent with a higher background of proteolysis
in serum and citrated plasma, leading to cleavages of abundant
proteins after sample collection. Serum and citrate appear similar
in their background proteolysis levels, as has been shown pre-
viously (16). EDTA is a stronger inhibitor of plasma proteases
than citrate, and the lower proportion of unknown cleavages
in EDTA plasma reflects this. In these experiments the additional
protease inhibition provided by P100 tubes does not significantly
improve the results, though others have shown a reduction in pro-
teolysis of specific substrates (16). Interestingly, aminopeptidase-
derived termini comprised an approximately equal proportion of
all samples, suggesting either that this activity is not affected by
any of these anticoagulants or that these termini reflect in vivo
proteolytic processing that is not affected by the conditions of sample collection.
Sequence and Structural Determinants of Proteolytic Cleavage. In or-
der to understand the nature of the proteolytic enzymes acting on
proteins in the blood, we investigated the cleavage sequences of
the 461 endoprotease-derived N termini of unknown significance
in our dataset (Fig. 4A). A preference is seen for basic (R, K)
residues preceding the cut site, and small (G, S, A) residues fol-
lowing, which is consistent with many endoproteases, including
those of the coagulation (17) and complement (18) pathways.
Efforts to discover a simple recognition motif(s) by using the
program MotifX (19) did not reveal a clear consensus sequence;
evidently either the protease(s) responsible have relatively low
specificity at the level of the primary structure of the cleavage
site or the proteolysis we see is because of the action of many
different proteases with varied specificity.
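The sequence logo analysis above rests on two steps: extracting the P4–P4′ window around each cleavage, and computing per-position information content. The sketch below assumes 0-based indexing with the cut between `p1_index` and the next residue, and pads windows that run off either end of the protein with '-'. It applies no small-sample correction, so its maximum is log2(20) ≈ 4.32 bits rather than the 4.2 bits quoted for the logo in Fig. 4. It is not the logo software used in the study, and the example sequence is hypothetical.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cleavage_window(sequence: str, p1_index: int, width: int = 4) -> str:
    """Return the P4..P4' window for a cut between p1_index and p1_index + 1 (0-based)."""
    window = []
    for i in range(p1_index - width + 1, p1_index + width + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else "-")
    return "".join(window)


def information_content(windows: List[str]) -> List[float]:
    """Per-position information content in bits: log2(20) minus Shannon entropy.

    Padding characters are ignored; no small-sample correction is applied.
    """
    max_bits = math.log2(len(AMINO_ACIDS))
    result = []
    for column in zip(*windows):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            result.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        result.append(max_bits - entropy)
    return result
```

With the hypothetical sequence "MKTVGRALYAQ", `cleavage_window("MKTVGRALYAQ", 5)` returns "TVGRALYA", the P4–P4′ window for a cut after the arginine.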
Proteolytic processing is important for the activation and inactivation of factors involved in coagulation and complement cascades (18, 20). Proteolysis in these systems has been characterized
extensively in vitro, and we compared our data to the in vitro findings for some of these proteins (Fig. 4). Prothrombin lies at the
center of the coagulation cascade and is activated to thrombin by
a series of discrete cleavages, shown as gaps in the rectangular
representation of the protein in Fig. 4. Thrombin cleaves fibrinogen, initiating fibrin polymerization to form a clot (20).
We identified the expected activating cleavages of thrombin, in
addition to a few cleavages of unknown significance within the
In contrast to prothrombin, we see much more heterogeneous
cleavage of complement C3. C3 is extensively proteolyzed
throughout its life cycle in blood. After an activating cleavage
to C3a and C3b by C3 convertase complexes, factor I and other
enzymes inactivate C3b by additional cleavages. A series of dis-
crete fragments has been defined in vitro (21). Our findings in-
dicate that C3b inactivation by factor I in vivo may be more
heterogeneous than previously appreciated. Whereas the overall
distribution of fragments is consistent in our data, we detect clus-
ters of cleavages at the domain boundaries (shown as vertical ar-
rows in Fig. 4C) rather than isolated, discrete cuts. Interestingly,
these areas of heterogeneous proteolysis cluster internally only in
specific fragments of C3: C3a, C3g, and C3f. C3a in particular is a
mediator of inflammation, and internal cleavages within it may
antagonize this function (22). These data suggest that even
well-studied proteolytic cascades may yield unique insights from
such global analysis, made possible by N-terminal proteomics.
Proteome Simplification by N-Terminal Isolation. N-terminal isola-
tion reduces each protein in a mixture to one or a few peptides,
which potentially has the advantage of reducing the interference
caused by abundant proteins (23). For example, serum albumin
produces about 100 tryptic peptides but has only a single N ter-
minus, meaning that N-terminal labeling may reduce albumin
peptides by 100-fold. In practice, we detected internal peptides
from serum albumin (22 total) but at substantially reduced abun-
dance compared to the parent protein. This advantage has been
described in previous efforts to characterize N-terminal peptides,
both by depletion of all internal peptides from a trypsin-digested
sample (23) and by positive enrichment of N termini following
selective chemical blockage of lysine residues (15). Each of these
N-terminal enrichment methods has unique advantages and disadvantages and thus, in aggregate, are likely to provide complementary information. For plasma and serum, subtiligase labeling has some advantage in that it does not rely on the absolute efficiency of internal peptide depletion or lysine blocking.
Fig. 3. The N-terminal proteome of human blood. (A) The subcellular locations, as annotated in Swiss-Prot (www.uniprot.org), of the proteins detected in this study. (B) Cleavage site annotations of detected N termini, according to Swiss-Prot and MEROPS (40) databases. (C) Evidence of aminopeptidase trimming of N termini in three proteins. Similar degradation was seen in 112 N termini. (D) Comparison of cleavage annotations found in serum and plasma collected with three different anticoagulants.
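The proteome simplification argument above (many tryptic peptides per protein, but only one N-terminal peptide) can be illustrated with an in silico digest. This is a minimal sketch assuming the common simplified trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages. Real digests deviate from this rule. The example sequence is hypothetical, not serum albumin, and the function names are illustrative.

```python
import re
from typing import List

# Zero-width split points: after K or R, but not when the next residue is P.
TRYPSIN_RULE = r"(?<=[KR])(?!P)"


def tryptic_peptides(sequence: str) -> List[str]:
    """Fully cleaved in silico trypsin digest (no missed cleavages)."""
    return [p for p in re.split(TRYPSIN_RULE, sequence) if p]


def fold_reduction(sequence: str) -> int:
    """Fold reduction in peptides per protein when only the N-terminal peptide is kept.

    After N-terminal capture and on-bead digestion only the first peptide is
    retained, so the count drops from len(digest) to 1.
    """
    return len(tryptic_peptides(sequence))


if __name__ == "__main__":
    seq = "MAKPLRGDKEQWR"  # hypothetical sequence
    peptides = tryptic_peptides(seq)
    print(peptides, "-> retained:", peptides[0])
```

Applied to a protein with about 100 tryptic peptides, such as serum albumin in the text, the same count gives the roughly 100-fold reduction described above.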
Additional reduction in complexity is afforded by the sequence
specificity of subtiligase. Subtiligase is very promiscuous toward
N-terminal sequences, but it disfavors certain N-terminal
residues, including acidic side chains and proline (9). Several
abundant serum proteins, including serum albumin and apolipo-
labeled by subtiligase, reducing their relative abundance in our
experiments. Other abundant plasma proteins (e.g., alpha-1 acid
glycoprotein) have chemically blocked N termini. Perhaps be-
cause of these factors, we found little benefit from pretreatment
of plasma to remove the 12 most abundant proteins (Fig. S1),
although it is likely that targeted depletion of abundant proteins
with many N termini (e.g., C3, fibrinogen) would improve
It should be noted that sampling only particular peptides from
each protein can limit protein identification. N-terminal tryptic
peptides may be too short or too long or ionize poorly, rendering
them difficult to identify by database matching. With only a
single digest (in this case trypsin), this method is limited to
sampling rather than comprehensive coverage of the N-terminal
proteome. Alternate digestions with different specificities should
increase coverage, as demonstrated in other N-terminal proteo-
mic studies (24).
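Two limits are discussed above: subtiligase disfavors certain first residues, and the N-terminal peptide must have a length that allows reliable matching. These can be combined into a rough predictor of whether a given free N terminus is likely to be observed, and whether an alternate digest might rescue it. This is a sketch under stated assumptions. The Glu-C rule is simplified to cleavage after glutamate only, the 6–30 residue length window is hypothetical, and the function names are illustrative.

```python
import re
from typing import Optional

# Simplified cleavage rules as zero-width regex split points.
# The Glu-C entry (after E only) is an assumption; its specificity depends on buffer.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "gluc": r"(?<=E)",
}

# First residues reported in the text as disfavored by subtiligase: acidic side chains and proline.
DISFAVORED_FIRST_RESIDUES = set("DEP")


def n_terminal_fragment(sequence: str, enzyme: str) -> str:
    """First peptide produced from a free N terminus by the given cleavage rule."""
    pieces = [p for p in re.split(RULES[enzyme], sequence) if p]
    return pieces[0] if pieces else ""


def likely_observable(sequence: str, enzyme: str, min_len: int = 6, max_len: int = 30) -> bool:
    """Heuristic: a labelable first residue and an N-terminal peptide of matchable length.

    The default 6-30 residue window is hypothetical, not a value from the study.
    """
    if not sequence or sequence[0] in DISFAVORED_FIRST_RESIDUES:
        return False
    return min_len <= len(n_terminal_fragment(sequence, enzyme)) <= max_len


def first_suitable_enzyme(sequence: str) -> Optional[str]:
    """Return the first rule (in RULES order) giving a likely observable N-terminal peptide."""
    for enzyme in RULES:
        if likely_observable(sequence, enzyme):
            return enzyme
    return None
```

For a hypothetical N terminus "SRAGLLEQWPTCVALMGK", trypsin yields only the dipeptide "SR", whereas the simplified Glu-C rule yields "SRAGLLE". This is the kind of case in which an alternate digestion could increase coverage.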
N-Terminal Proteomics and Biomarker Discovery. Specific proteases
may be up-regulated in diverse disease states, and their activity
may leave a mark on the plasma proteome. Proteolytic products
of certain intracellular events, including apoptotic and necrotic
cell death (25), may also be released into the blood, where they
may serve as useful markers of these processes. Proteolytic frag-
ments of normal plasma proteins are also directly implicated in
the pathogenesis of certain diseases, such as amyloidoses and
atherosclerosis (26, 27). Within our N-terminal peptide dataset,
we find examples of all of these classes (Table 1).
The widespread exoproteolysis we observe in our experiments
may represent useful patterns to monitor health and disease.
Tempst and coworkers have recently correlated exoprotease ac-
tivities in blood to metastatic cancer (28). The 112 exoprotease-
sensitive peptide sequences we identify here, derived from 53
proteins, greatly increase the number of potential sequences
available and should expand the scope of this approach to biomarker discovery.
Proteolytic products of apoptosis are of significant interest in
biomarker discovery. Apoptosis, a form of programmed cell
death implicated in the response of cancer cells to chemotherapy
and radiation, is executed by a family of cysteine proteases called
caspases (25). Products of caspase proteolysis could serve as mar-
kers of successful cancer therapy. For example, increase in a cas-
pase-derived peptide from the intracellular protein cytokeratin
18 (CK-18) in the serum of breast cancer patients was correlated
with 5-year survival in one study (29). The method we describe
here may identify other such caspase-derived markers. Intrigu-
ingly, we find a peptide derived from cleavage of the abundant
protein gelsolin after the sequence DQTD. This cleavage has
been reported in cell culture screens for apoptotic caspase sub-
strates (10). In addition to this putative caspase-derived peptide,
we also find some intracellular proteins with internal proteolytic
cleavages, including abundant cellular proteins such as actin, as
well as less abundant proteins, including the Ran-specific GTPase
activating protein. How these proteins are cleaved and how they
reach the blood is unclear at present, but they may also represent
useful markers for disease states.
Proteolytically cleaved peptides and proteins in blood are not
only proxies for intracellular disease states; in some cases, they
are directly involved in pathogenesis. Several possible examples
of this occur in the data we report here, including proteolysis of
components of the insulin-like growth factor (IGF) signaling axis that have been implicated in cancer progression (30), proteolysis of transthyretin and apolipoprotein E that may be involved in senile systemic amyloidosis and Alzheimer’s disease (31, 32), and proteolysis of apolipoprotein AI that may play a role in atherosclerosis and cardiovascular disease (27).
Table 1. Putative disease-associated proteolytic cleavages detected in this study
Protein or peptide | Disease(s) | References
[Protein and reference entries not recoverable; surviving Disease(s) entries: colorectal cancer, androgen-insensitive prostate cancer; senile systemic amyloidosis; cerebral hemorrhage with …]
Fig. 4. Patterns of endoproteolysis. (A) Sequence logo of the eight residues (P4–P4′) surrounding the cleavage site of all nonannotated, endoproteolytic cleavages. The y axis denotes information content and has a maximum value of 4.2. Logo created by … prothrombin and complement C3. Swiss-Prot annotated cleavage sites divide the proteins … cleavage sites detected in this study are indicated … LC, thrombin light chain.
Fig. 5. Plots of iTRAQ signal for representative putative MT-SP1 substrates. ● A2M 705; ○ A2M 707; ▴ A2M 707; ▵ A2M 720; ▪ complement C3 713; □ complement C3 741.
Multiple reaction monitoring (MRM) methods combined with
stable isotope labeled peptide standards have shown great pro-
mise for quantification of biomarkers in plasma. These methods
are sensitive, showing a limit of quantitation in the low nanogram/
milliliter range in some cases (33, 34), and reproducible across
multiple laboratories (35). The sensitivity of MRM methods
can be further enhanced by specific enrichment of peptides by
using peptide-directed antibodies (36). N-terminal enrichment
may offer a similar benefit for enhancing detection of peptides
specifically associated with protease activity in blood. Whereas
the sensitivity and reproducibility of N-terminal peptide isolation
remain to be demonstrated in this context, the fact that we have
detected proteins present at the nanomolar to high picomolar le-
vel (e.g., VEGF-D and osteopontin) by using relatively insensitive
survey MS/MS methods suggests that this is a promising area for
Identification of Membrane-Type Serine Protease 1 (MT-SP1) Sites in
Human Plasma. In addition to identifying proteolyzed products
generally present in blood, our method can also be used to iden-
tify substrates of specific proteases. Proteolytic enzymes make up
2% of the human genome, but the biological significance of most
is unknown, owing partly to the difficulty in identifying natural
substrates (37). A sensitive method to detect specific protease
substrates must rely on quantitative proteomic methods, in order
to identify N termini that increase with time after exogenous ad-
dition or activation of a protease. As a proof of concept, we ex-
plored the substrates of MT-SP1 in plasma.
MT-SP1 is present on a variety of epithelial cell types, is natu-
rally shed into the blood, and is up-regulated in certain cancers. It
is essential for development of a functional epidermal barrier,
likely because of its role in processing profilaggrin to filaggrin
(38). However, MT-SP1 has also been shown to process a number
of other substrates, including prohepatocyte growth factor activa-
tor, prourokinase plasminogen activator, and protease activated
receptor 2 (39).
MT-SP1 is rapidly inhibited in plasma, with a half-life of activ-
ity of 30 s (Fig. S2 and SI Text). We therefore explored a 60-s time
course of proteolysis by labeling increasing time points with in-
creasing isobaric tag for relative and absolute quantitation
(iTRAQ) reporter ion masses. Of 86 peptides identified in this
experiment, 13 showed a large (>5-fold) change in iTRAQ signal
over the time course and were identified as putative substrates,
listed in Table 2. Plots of the iTRAQ ratio vs. time for represen-
tative peptides are shown in Fig. 5.
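The time-course design above assigns each quench time to one iTRAQ reporter channel (0 s, 114; 10 s, 115; 30 s, 116; 60 s, 117, per Materials and Methods). The sketch below shows the downstream filtering. It assumes reporter intensities have already been extracted for each peptide and normalizes them to the 0-s channel. The fold-change cutoff is left as a parameter because the text (>5-fold) and the Table 2 footnote (>3-fold) quote different values. The 30-s half-life is modeled as simple first-order decay, which is an assumption. Under that assumption, a quarter of the added activity remains at 60 s. Peptide names are hypothetical.

```python
from typing import Dict, List

# Reporter-ion mass assigned to each time point (s), per Materials and Methods.
CHANNEL_BY_TIME = {0: 114, 10: 115, 30: 116, 60: 117}


def remaining_activity(t_s: float, half_life_s: float = 30.0) -> float:
    """Fraction of MT-SP1 activity left at time t, assuming first-order inactivation."""
    return 0.5 ** (t_s / half_life_s)


def time_course(reporters: Dict[int, float]) -> List[float]:
    """Reporter intensities ordered by time and normalized to the 0-s (114) channel."""
    baseline = reporters[CHANNEL_BY_TIME[0]]
    if baseline <= 0:
        raise ValueError("0-s reporter intensity must be positive")
    return [reporters[CHANNEL_BY_TIME[t]] / baseline for t in sorted(CHANNEL_BY_TIME)]


def putative_substrates(peptides: Dict[str, Dict[int, float]], min_fold: float) -> List[str]:
    """Peptides whose 60-s signal exceeds the 0-s signal by more than min_fold."""
    return [name for name, rep in peptides.items() if time_course(rep)[-1] > min_fold]


if __name__ == "__main__":
    data = {  # hypothetical reporter intensities keyed by iTRAQ channel
        "pep1": {114: 100.0, 115: 400.0, 116: 700.0, 117: 900.0},
        "pep2": {114: 100.0, 115: 110.0, 116: 120.0, 117: 130.0},
    }
    print(putative_substrates(data, min_fold=5.0))
```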
Several putative substrates of MT-SP1 may be of functional in-
terest. We observe multiple cuts in the bait loop of the protease
inhibitor α-2 macroglobulin (A2M), consistent with apparent
A2M inhibition in peptide experiments (Fig. S2 and SI Text).
Two cleavages in complement C3, located in the C3a anaphylatoxin domain, are also of interest. Free C3a is a potent mediator
of inflammation, whose function requires key C-terminal amino
acids (22). Interestingly, these residues are removed by one of the
two observed cleavages. Whereas our data cannot distinguish
whether this cleavage occurs in free C3a or in intact C3, it is
an intriguing observation, suggesting the possibility that cell-surface MT-SP1 has the ability to inactivate C3a and reduce inflammation.
N termini that increase in abundance after addition of MT-SP1
are not necessarily direct substrates of the protease. MT-SP1 is
known to activate urokinase plasminogen activator (uPA) (39), which may process other proteins,
or activate plasmin, leading to further indirect proteolysis. We
detect several cleavages after lysine residues in fibrinogen Aα that
could result from either direct cleavage by MT-SP1 or indirect
cleavage through plasmin activation (Table 2). In addition, we de-
tect evidence of rapid aminopeptidase processing following MT-
SP1 cleavage of a single site in A2M. Cleavage after R704 in A2M
is consistent with the known sequence preferences of MT-SP1.
However, we also find cleavages after V705 and G706. These
are expected to be poor substrates of MT-SP1 and are unlikely
to be cleaved at the same rate as R704. More likely, this terminus
is subject to rapid aminopeptidase processing after exposure by
MT-SP1 cleavage. This observation is consistent with an active
interplay between endo- and exoproteases in blood.
Here we have described a method for labeling and enrichment of
N-terminal peptides from proteins in blood serum and plasma,
allowing us to identify the sequences of the sites of proteolytic
action in blood. We discovered many N termini corresponding
to known biological processes. In addition, over half of the N ter-
mini we found have not been reported in protein databases. Some
of these cuts may represent biologically significant substrates for
blood proteases, whereas others may be cleavages that occur dur-
ing sample collection and storage. These results may impact the
choice of representative peptides for MS-based quantification of
blood proteins; we identify protease-sensitive species that could
introduce significant variability if they were used for this purpose.
We also have demonstrated the utility of N-terminal labeling to
identify substrates of proteases acting in blood. We anticipate
using this method, particularly in combination with sensitive la-
bel-free quantification approaches, as a means to rapidly profile
the actions of proteases in blood, ranging from endogenous blood
coagulation and tissue surface proteases to pathogen-associated proteases.
N-terminal peptide isolation should simplify blood proteome
digests by enriching for one or a few peptides per protein.
Whereas this may reduce the sensitivity for detecting certain
proteins, such as those with N-terminal peptides that perform
poorly in the mass spectrometer or are too short for reliable
database matching, it likely has advantages for discovery of
biomarkers associated with proteolytic processes. It selects for
a suite of analytes that can be monitored by sensitive MRM meth-
ods, potentially without the need to enrich each individual ana-
lyte peptide by immunoaffinity methods. We believe that the
application of this method holds promise for future biomarker discovery.
Table 2. Putative substrates of MT-SP1
[Table body only partially recoverable; surviving protein entries: fibrinogen α chain (two entries); heavy chain H1; heavy chain H2; heavy chain H4. Surviving site entries: 140 TVGR ALYA 77.5; 658 AGSRMNFR 12.6.]
*Numbering according to Swiss-Prot database.
†Substrates are defined as those peptides showing more than 3-fold change in iTRAQ signal after 60 s.
Materials and Methods
Materials. Details of materials used in this study may be found in SI Text.
Sample Labeling and Workup. Samples (2 mL) were biotinylated with
subtiligase and peptide ester for 60 min at room temperature. Proteins were
reduced, alkylated with iodoacetamide, captured on immobilized streptavi-
din, digested with trypsin, and released from the resin with TEV protease.
Additional details of the labeling and workup are provided in SI Text.
LC-MS/MS Acquisition and Data Processing. Peptides were subjected to offline
strong cation exchange fractionation and then to C18 chromatography
coupled directly to a QSTAR Pulsar or QSTAR Elite mass spectrometer.
Additional details of data acquisition and processing are in SI Text.
Peptide Identification. Database searches to identify peptides were per-
formed by using Protein Prospector v. 5.2.2 (UCSF Mass Spectrometry Facility,
prospector.ucsf.edu) and the December 2008 release of Swiss-Prot. Addi-
tional details of peptide identification are in SI Text. Labeled and unlabeled
peptides are listed in Dataset S1, and annotated peaklists are provided in
Detection of MT-SP1 Substrates. Samples (1 mL) of EDTA plasma were treated
with 1 μM MT-SP1 for 10, 30, and 60 s and then quenched by addition of
4-(2-aminoethyl)benzenesulfonyl fluoride and PMSF. An untreated sample
was used for a zero time point. Samples were processed as described above.
After desalting, they were labeled with iTRAQ reagents as follows: 0 s, mass
114; 10 s, mass 115; 30 s, mass 116; 60 s, mass 117, by using the protocol pro-
vided by the manufacturer. Additional details of iTRAQ analysis are provided
in SI Text.
ACKNOWLEDGMENTS. We thank A.L. Burlingame, D. Maltby, and J.C. Trinidad
for assistance with design and execution of mass spectrometry experiments,
N.J. Agard, E.D. Crawford, and C.M. Jackson for critical reading of the
manuscript, and members of the Wells and Burlingame groups for helpful
discussions. Active MT-SP1 was a generous gift from E.L. Madison (Catalyst
Biosciences, South San Francisco, CA). This work was supported by National
Institutes of Health (NIH) Grant F32GM079931 (to D.W.) and NIH Grant R01
GM081051 (to J.A.W.). Mass spectrometry was performed at the Bio-Organic
Biomedical Mass Spectrometry Resource at UCSF (A.L. Burlingame, Director)
supported by the Biomedical Research Technology Program of the NIH
National Center for Research Resources, NIH NCRR P41RR001614 and NIH
1. Anderson NL, Anderson NG (2002) The human plasma proteome: History, character,
and diagnostic prospects. Mol Cell Proteomics 1:845–867.
2. Liu T, et al. (2006) Evaluation of multiprotein immunoaffinity subtraction for plasma
proteomics and candidate biomarker discovery using mass spectrometry. Mol Cell Proteomics
3. Zolotarjova N, et al. (2005) Differences among techniques for high-abundant protein
depletion. Proteomics 5:3304–3313.
4. Cohen MP, Clements RS (1999) Measuring glycated proteins: Clinical and methodological
aspects. Diabetes Technol Ther 1:57–70.
5. Agard NJ, Wells JA (2009) Methods for the proteomic identification of protease
substrates. Curr Opin Chem Biol 13:503–509.
6. Doucet A, et al. (2008) Metadegradomics: Toward in vivo quantitative degradomics of
proteolytic post-translational modifications of the cancer proteome. Mol Cell Proteomics
7. Abrahmsen L, et al. (1991) Engineering subtilisin and its substrates for efficient
ligation of peptide bonds in aqueous solution. Biochemistry 30:4151–4159.
8. Atwell S, Wells JA (1999) Selection for improved subtiligases by phage display. Proc
Natl Acad Sci USA 96:9497–9502.
9. Chang TK, Jackson DY, Burnier JP, Wells JA (1994) Subtiligase: A tool for semisynthesis
of proteins. Proc Natl Acad Sci USA 91:12544–12548.
10. Mahrus S, et al. (2008) Global sequencing of proteolytic cleavage sites in apoptosis by
specific labeling of protein N termini. Cell 134:866–876.
11. Hortin GL, Sviridov D, Anderson NL (2008) High-abundance polypeptides of the human
plasma proteome comprising the top 4 logs of polypeptide abundance. Clin Chem
12. Elias JE, Haas W, Faherty BK, Gygi SP (2005) Comparative evaluation of mass spectro-
metry platforms used in large-scale proteomics investigations. Nat Methods
13. States DJ, et al. (2006) Challenges in deriving high-confidence protein identifications
from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol
14. Brown JL, Roberts WK (1976) Evidence that approximately eighty per cent of the
soluble proteins from Ehrlich ascites cells are N-alpha-acetylated. J Biol Chem
15. Timmer JC, et al. (2007) Profiling constitutive proteolytic events in vivo. Biochem J
16. Yi J, Kim C, Gelfand CA (2007) Inhibition of intrinsic proteolytic activities moderates
preanalytical variability and instability of human plasma. J Proteome Res 6:1768–1781.
17. Page MJ, Macgillivray RTA, Di Cera E (2005) Determinants of specificity in coagulation
proteases. J Thromb Haemost 3:2401–2408.
18. Sim RB, Tsiftsoglou SA (2004) Proteases of the complement system. Biochem Soc Trans
19. Schwartz D, Gygi SP (2005) An iterative statistical approach to the identification of
protein phosphorylation motifs from large-scale data sets. Nat Biotechnol
20. Jackson CM, Nemerson Y (1980) Blood coagulation. Annu Rev Biochem 49:765–811.
21. Sahu A, Lambris JD (2001) Structure and biology of complement protein C3, a connecting
link between innate and acquired immunity. Immunol Rev 180:35–48.
22. Hugli TE (1990) Structure and function of C3a anaphylatoxin. Curr Top Microbiol
23. McDonald L, Beynon RJ (2006) Positional proteomics: Preparation of amino-terminal
peptides as a strategy for proteome simplification and characterization. Nat Protocols
24. Schilling O, Overall CM (2008) Proteome-derived, database-searchable peptide
libraries for identifying protease cleavage sites. Nat Biotechnol 26:685–694.
25. Taylor RC, Cullen SP, Martin SJ (2008) Apoptosis: Controlled demolition at the cellular
level. Nat Rev Mol Cell Biol 9:231–241.
26. Janowski R, Abrahamson M, Grubb A, Jaskolski M (2004) Domain swapping in
N-truncated human cystatin C. J Mol Biol 341:151–160.
27. Liz MA, Gomes CM, Saraiva MJ, Sousa MM (2007) ApoA-I cleaved by transthyretin has
reduced ability to promote cholesterol efflux and increased amyloidogenicity. J Lipid
28. Villanueva J, et al. (2008) A sequence-specific exopeptidase activity test (SSEAT) for
"functional" biomarker discovery. Mol Cell Proteomics 7:509–518.
29. Olofsson MH, et al. (2007) Cytokeratin-18 is a useful serum biomarker for early
determination of response of breast carcinomas to chemotherapy. Clin Cancer Res
30. Fuchs CS, et al. (2008) Plasma insulin-like growth factors, insulin-like binding protein-3,
and outcome in metastatic colorectal cancer: Results from Intergroup trial N9741.
Clin Cancer Res 14:8263–8269.
31. Harris FM, et al. (2003) Carboxyl-terminal-truncated apolipoprotein E4 causes
Alzheimer's disease-like neurodegeneration and behavioral deficits in transgenic mice.
Proc Natl Acad Sci USA 100:10966–10971.
32. Mizuguchi M, et al. (2008) Unfolding and aggregation of transthyretin by the
truncation of 50 N-terminal amino acids. Proteins 72:261–269.
33. Keshishian H, et al. (2007) Quantitative, multiplexed assays for low abundance
proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol
Cell Proteomics 6:2212–2229.
34. Keshishian H, et al. (2009) Quantification of cardiovascular biomarkers in patient plas-
ma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics
35. Addona TA, et al. (2009) Multi-site assessment of the precision and reproducibility of
multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol
36. Anderson NL, et al. (2004) Mass spectrometric quantitation of peptides and proteins
using stable isotope standards and capture by anti-peptide antibodies (SISCAPA).
J Proteome Res 3:235–244.
37. Marnett AB, Craik CS (2005) Papa's got a brand new tag: Advances in identification of
proteases and their substrates. Trends Biotechnol 23:59–64.
38. List K, Bugge TH, Szabo R (2006) Matriptase: Potent proteolysis on the cell surface.
Mol Med 12:1–7.
39. Uhland K (2006) Matriptase and its putative role in cancer. Cell Mol Life Sci
40. Rawlings ND, et al. (2008) MEROPS: The peptidase database. Nucleic Acids Res
41. Crooks GE, Hon G, Chandonia J-M, Brenner SE (2004) WebLogo: A sequence logo
generator. Genome Res 14:1188–1190.
www.pnas.org/cgi/doi/10.1073/pnas.0914495107 Wildes and Wells