Brief Communication
https://doi.org/10.1038/s41592-019-0498-4
Department of Genome Sciences, University of Washington, Seattle, WA, USA. *e-mail: jvillen@uw.edu
Proteins can be phosphorylated at neighboring sites, resulting in different functional states, and studying the regulation of these sites has been challenging. Here we present Thesaurus, a search engine that detects and quantifies phosphopeptide positional isomers from parallel reaction monitoring and data-independent acquisition mass spectrometry experiments. We apply Thesaurus to analyze phosphorylation events in the PI3K/AKT signaling pathway and identify neighboring sites with distinct regulation.
Hundreds of thousands of amino acids in thousands of proteins are estimated to be actively phosphorylated in every human cell¹. Many proteins are phosphorylated at neighboring sites², and over half of the sites in multi-phosphorylated proteins are within four amino acids of each other³. Several well-studied proteins use neighboring phosphorylation sites to act as switches (MAPK⁴ and CDC4 (ref. 5)), timers (PER⁶) or negative inhibition toggles (IRS1 (ref. 7)), but global analysis of these phosphorylation clusters has remained impractical. Tandem mass spectrometry (MS/MS) of tryptic peptides is a key tool for discovering and quantifying sites of protein phosphorylation. Typical phosphoproteomic workflows use data-dependent acquisition (DDA) to collect MS/MS spectra based on precursor m/z as peptides chromatographically elute. Site localization software tools such as Ascore⁸ assign the most likely phosphorylation position for each peptide using site-specific fragment ions. To increase the number of distinct peptides that are sampled, DDA dynamically excludes peptides of the same m/z from being sampled repeatedly within a narrow elution time window. However, phosphopeptides that exist as multiple positional isomers are difficult to sample and assign using DDA because they have the same mass, similar retention times and many shared fragment ions.
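The following Python sketch makes the idea of shared fragment ions concrete. It splits the singly charged b and y ions of two single-phosphate positional isomers into shared ions and site-specific ions. It only illustrates the concept and is not Ascore's or Thesaurus' algorithm: it ignores ion masses, neutral losses and higher charge states. The peptide AGSPSPEK and all function names are hypothetical.

```python
"""Which b/y fragment ions distinguish two phosphopeptide positional isomers?

Illustrative sketch: a fragment is "site-specific" for a pair of isomers if it
contains the phosphate in one isomer but not in the other. Every other fragment
has the same mass in both isomers and is therefore shared.
"""


def fragment_labels(length: int) -> list[str]:
    """All b and y ion labels for a peptide: b1..b(n-1) and y1..y(n-1)."""
    return [f"b{k}" for k in range(1, length)] + [f"y{k}" for k in range(1, length)]


def carries_phosphate(label: str, site: int, length: int) -> bool:
    """True if fragment `label` contains residue `site` (0-based index)."""
    ion_type, k = label[0], int(label[1:])
    if ion_type == "b":
        return site < k            # b_k covers residues 0..k-1
    return site >= length - k      # y_k covers residues n-k..n-1


def split_fragments(sequence: str, site_a: int, site_b: int) -> tuple[list[str], list[str]]:
    """Partition fragment labels into (shared, site_specific) for two isomers."""
    n = len(sequence)
    shared, specific = [], []
    for label in fragment_labels(n):
        same = carries_phosphate(label, site_a, n) == carries_phosphate(label, site_b, n)
        (shared if same else specific).append(label)
    return shared, specific


if __name__ == "__main__":
    # Hypothetical peptide with neighboring serines at 0-based indices 2 and 4.
    shared, specific = split_fragments("AGSPSPEK", 2, 4)
    print("site-specific:", specific)
    print(f"{len(shared)} of {len(shared) + len(specific)} fragments are shared")
```

For AGSPSPEK phosphorylated at either serine, only b3, b4, y4 and y5 discriminate between the two isomers. The other 10 of the 14 b/y ions are identical in both, which is one reason neighboring isomers are hard to tell apart.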
Parallel reaction monitoring (PRM) and data-independent acquisition (DIA) are alternative approaches that systematically collect MS/MS spectra across the chromatographic elution profile of peptides, improving quantitative reproducibility. While PRM methods target specific peptide precursors⁹, DIA methods acquire MS/MS spectra systematically across the m/z space¹⁰. These methods are free of both intensity biases during data collection and active exclusion of previously sequenced precursors, which makes it possible to detect closely eluting positional isomers. Despite the strengths of these methods, assigning phosphorylation to a specific amino acid remains difficult. Recently, Rosenberger et al.¹¹ reported IPF, a peptide-centric tool that uses OpenSwath¹² to determine the most likely positional isomer from the fragment ions in a peak. An alternative spectrum-centric approach, PIQED¹³, deconvolves DIA data with DIA-Umpire¹⁴ so that site localization tools originally designed for DDA can be applied. Finally, Specter¹⁵ deconvolves DIA signals using linear combinations of library spectra and in some instances can resolve positional isomers. A limitation of IPF and PIQED is that they compete potential positional isomers with similar retention times against each other, and only the best-scoring isomer is reported. Specter, on the other hand, was not designed for phosphopeptide localization and lacks site localization statistics. Here, we extend these approaches and present a new DIA and PRM search engine named Thesaurus, which is designed specifically to look for positional isomers.
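A toy simulation can illustrate the contrast between active exclusion and systematic sampling. This Python sketch is a deliberate simplification and does not model any real instrument. DDA is reduced to a top-1 selection with a fixed exclusion time. DIA is reduced to fixed 20 m/z isolation windows, each fragmented once per cycle. The isomer names, m/z value, elution times and timing parameters are all hypothetical.

```python
"""Toy contrast between DDA dynamic exclusion and DIA window cycling.

Hypothetical model: two positional isomers share one precursor m/z and elute a
few seconds apart. Times are integer seconds to avoid floating-point edge cases.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Isomer:
    name: str
    mz: float
    start: int  # first second of elution
    end: int    # last second of elution (inclusive)

    def eluting(self, t: int) -> bool:
        return self.start <= t <= self.end


def dda_sampled(isomers, scan_step=1, exclusion_s=30, tol=0.01, t_max=1300):
    """Top-1 DDA: fragment the first eluting, non-excluded precursor per survey
    scan (list order stands in for intensity), then exclude its m/z (+/- tol)
    for `exclusion_s` seconds."""
    excluded = []          # list of (mz, excluded_until_time)
    sampled = set()
    for t in range(0, t_max, scan_step):
        for iso in isomers:
            blocked = any(abs(iso.mz - mz) <= tol and t < until for mz, until in excluded)
            if iso.eluting(t) and not blocked:
                sampled.add(iso.name)
                excluded.append((iso.mz, t + exclusion_s))
                break      # top-1: at most one MS/MS per survey scan
    return sampled


def dia_sampled(isomers, windows, cycle_s=2, t_max=1300):
    """DIA: every cycle fragments every isolation window [lo, hi), so a precursor
    is sampled in every cycle while it elutes, with no exclusion."""
    counts = {iso.name: 0 for iso in isomers}
    for t in range(0, t_max, cycle_s):
        for lo, hi in windows:
            for iso in isomers:
                if lo <= iso.mz < hi and iso.eluting(t):
                    counts[iso.name] += 1
    return counts


if __name__ == "__main__":
    isomers = [Isomer("pS3", 600.25, 1200, 1212), Isomer("pS5", 600.25, 1215, 1227)]
    windows = [(400 + 20 * i, 420 + 20 * i) for i in range(30)]  # 400-1000 m/z
    print(dda_sampled(isomers))           # {'pS3'}
    print(dia_sampled(isomers, windows))  # {'pS3': 7, 'pS5': 6}
```

With a 30 s exclusion, the DDA loop fragments only the first-eluting isomer, because the second isomer shares its m/z and elutes while that m/z is still excluded. The DIA loop samples the shared window in every cycle and records spectra for both isomers.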
Thesaurus detects phosphopeptides with EncyclopeDIA and a spectrum library¹⁶. Using these detections as retention time anchors, it then iteratively finds new positional isomers that share many of the same fragment ions but differ in their phosphorylation site-specific ions (Methods and Supplementary Fig. 1). Thesaurus can detect multiple co-eluting positional isomers because it calculates localization probabilities directly from an interference distribution, rather than by competing isomers against each other. For each phosphopeptide, Thesaurus determines every possible positional isomer and extracts the corresponding site-specific fragment ion signals. Each ion has a unique frequency of interference across the experiment, and this frequency is highest for low-m/z ions (Supplementary Fig. 2). Thesaurus uses this frequency to calculate a background distribution for each run and precursor isolation window, because these distributions depend on peptide mass and various acquisition settings. Localization P values are calculated as the probability that all site-specific ions could have been observed by chance in this background distribution. These P values are then corrected for false discovery rate (FDR) using the Benjamini–Hochberg method. Thesaurus detects positional isomers that are absent from the spectrum library by generating synthetic spectra with shifted fragment ions. It can also quantify positional isomers whose precursor signals are convolved: it uses site-specific ions to determine peak boundaries and then includes additional fragment ions that fit that peak shape.
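The localization statistics described above can be sketched as follows. This sketch makes several assumptions of its own and is not Thesaurus' implementation. First, an ion's background frequency is the fraction of spectra from one run and isolation window that contain a peak within a fixed m/z tolerance. Second, site-specific ions interfere independently, so the chance that all of them appear is the product of their frequencies. All function names and the tolerance are hypothetical. The Benjamini–Hochberg step is the standard step-up adjustment.

```python
"""Sketch of interference-based localization P values with BH correction.

Assumptions (not taken from Thesaurus): background frequency = fraction of
spectra in one run/isolation window with a peak near the ion's m/z, and
site-specific ions interfere independently of one another.
"""
from itertools import combinations
from math import prod

ACCEPTORS = set("STY")  # residues that can carry a phosphate


def positional_isomers(sequence: str, n_phospho: int) -> list[tuple[int, ...]]:
    """Every way to place n_phospho phosphates on S/T/Y residues (0-based)."""
    sites = [i for i, aa in enumerate(sequence) if aa in ACCEPTORS]
    return list(combinations(sites, n_phospho))


def interference_frequency(spectra, mz, tol=0.01):
    """Fraction of spectra (lists of peak m/z) with a peak within tol of mz."""
    if not spectra:
        raise ValueError("need at least one background spectrum")
    hits = sum(any(abs(peak - mz) <= tol for peak in peaks) for peaks in spectra)
    return hits / len(spectra)


def localization_p_value(freqs):
    """P(all site-specific ions appear by chance), assuming independence."""
    return prod(freqs)


def benjamini_hochberg(p_values):
    """BH-adjusted P values (q-values), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0                     # also caps adjusted values at 1
    for rank in range(m, 0, -1):          # walk from largest P to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(positional_isomers("AGSPSPEK", 1))  # [(2,), (4,)]
    # Hypothetical background spectra (peak m/z lists) from one isolation window.
    background = [[101.07, 120.08, 147.11], [101.07, 175.12], [120.08, 250.0], [300.1]]
    freqs = [interference_frequency(background, mz) for mz in (101.07, 120.08)]
    p = localization_p_value(freqs)           # 0.5 * 0.5 = 0.25
    print(benjamini_hochberg([p, 0.01, 0.9]))
```

In this sketch, each candidate isomer gets its own P value from its own site-specific ions, and all P values are adjusted together. No isomer has to beat another to be reported, so two co-eluting isomers can both pass the FDR threshold.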
We validated Thesaurus using a synthetic phosphopeptide DIA dataset described previously¹¹ (Supplementary Fig. 3) and found that it produced both more detections and more accurate error estimates than IPF and PIQED. In addition to correctly localizing 240 synthetic phosphopeptides, Thesaurus also identified and flagged 11 products of a gas-phase phosphate rearrangement (Supplementary Fig. 4). We further demonstrated Thesaurus’ performance with phosphopeptides derived from serum-stimulated HeLa cells. Previously, we reported a human phosphopeptide library based on nearly a thousand DDA experiments¹⁷. Here we used a subset of this library containing 82,029 phosphopeptides, in which 44% of phosphopeptides are phosphorylated at multiple positions (Supplementary Fig. 5). Thesaurus detected an average of 10,780 phosphopeptides across four technical replicates (Supplementary Dataset 1), corresponding to an average of 6,288 confidently localized positional isomers (Supplementary Fig. 6a). Among phosphopeptides containing multiple acceptor sites, approximately 13% were phosphorylated at multiple positions (Supplementary Fig. 6b).
Thesaurus: quantifying phosphopeptide positional isomers
Brian C. Searle, Robert T. Lawrence, Michael J. MacCoss and Judit Villén*
NATURE METHODS | VOL 16 | AUGUST 2019 | 703–706 | www.nature.com/naturemethods 703