
Chromatogram libraries improve peptide detection and quantification by data independent acquisition mass spectrometry

Springer Nature
Nature Communications

Data independent acquisition (DIA) mass spectrometry is a powerful technique that is improving the reproducibility and throughput of proteomics studies. Here, we introduce an experimental workflow that uses this technique to construct chromatogram libraries that capture fragment ion chromatographic peak shape and retention time for every detectable peptide in a proteomics experiment. These coordinates calibrate protein databases or spectrum libraries to a specific mass spectrometer and chromatography setup, facilitating DIA-only pipelines and the reuse of global resource libraries. We also present EncyclopeDIA, a software tool for generating and searching chromatogram libraries, and demonstrate the performance of our workflow by quantifying proteins in human and yeast cells. We find that by exploiting calibrated retention time and fragmentation specificity in chromatogram libraries, EncyclopeDIA can detect 20–25% more peptides from DIA experiments than with data dependent acquisition-based spectrum libraries alone.
Brian C. Searle1,2, Lindsay K. Pino1, Jarrett D. Egertson1, Ying S. Ting1, Robert T. Lawrence1, Brendan X. MacLean1, Judit Villén1 & Michael J. MacCoss1
DOI: 10.1038/s41467-018-07454-w OPEN
1Department of Genome Sciences, University of Washington, Seattle, WA, USA. 2Proteome Software, Portland, OR, USA. Correspondence and requests for
materials should be addressed to M.J.M. (email: maccoss@uw.edu)
NATURE COMMUNICATIONS | (2018) 9:5128 | DOI: 10.1038/s41467-018-07454-w | www.nature.com/naturecommunications 1
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Over the past two decades, the continued refinement of proteomics methods using liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) has enabled a deeper understanding of human biology and disease1,2. Recently, data independent acquisition3,4 (DIA), in which the mass spectrometer systematically acquires MS/MS spectra irrespective of whether or not a precursor signal is detected, has emerged as a powerful alternative approach to data dependent acquisition5 (DDA) for proteomics experiments. In current DIA workflows, the instrument cycle is structured such that the same MS/MS spectrum window is collected every 1–5 s, enabling quantitative measurements using fragment ions instead of precursor ions. This approach produces data analogous to targeted parallel reaction monitoring (PRM), except that instead of targeting specific peptides, quantitative data are acquired across a predefined mass to charge (m/z) range. One trade-off is that to cover the m/z space where the majority of peptides exist, the mass spectrometer must be tuned to produce MS/MS spectra with wide precursor isolation windows that often contain multiple peptides at the same time. These additional peptides produce interfering fragment ions, and database search engines for DDA that rely on a precursor isolation window of at most a few daltons can struggle to detect the signal for a particular peptide from that background interference. The PAcIFIC approach6 attempts to overcome this difficulty by using multiple gas-phase fractionated injections of the same sample to increase precursor isolation at the cost of both sample and instrument time.
Spectrum-centric tools7,8 attempt to deconvolve peptide signals from DIA data by time aligning elution peaks for both fragment and precursor ions. In contrast, peptide-centric tools analyze DIA measurements to look for individual peptides across all spectra in a precursor isolation window. Spectrum library search tools for DIA data9–12 use fragmentation patterns and relative retention times from previously collected DDA data. Other tools such as PECAN13 query DIA data using just peptide sequences and their predicted fragmentation pattern without requiring a spectrum library. While library searching can achieve better sensitivity than PECAN, the approach is limited to detecting only analytes represented in the library. In addition, the quality of library-based detections is only as strong as the quality of the library itself. Because mapping fragmentation patterns and retention times across instruments and platforms is difficult, many researchers prefer to simultaneously acquire both DDA and DIA data from their samples14,15. While this implicitly increases the acquisition time and sample consumption, it becomes possible to detect peptides using the DDA data while making peptide quantitation measurements using the DIA data. However, detection sensitivity is inherently limited to that of the DDA data.
Typically, tens to hundreds of biological samples are processed and analyzed using LC-MS/MS in quantitative proteomics experiments. The regularity of DIA allows researchers to make peptide detections in one sample and transfer those detections to other samples16. Here, we extend this concept by collecting certain runs where data acquisition is tuned to improve peptide detection rates, while collecting other runs with a focus on quantification accuracy and throughput. These runs can be searched using a typical DDA spectrum library-based workflow, a pure DIA workflow using PECAN, or spectrum-centric search methods based on DIA-Umpire8 or Spectronaut Pulsar. Results from runs dedicated to peptide detection are formed into a DIA-based chromatogram library. In a chromatogram library, we catalog retention time, precursor mass, peptide fragmentation patterns, and known interferences that identify each peptide on our instrumentation within a specific sample matrix. Furthermore, we report the development of EncyclopeDIA, a library search engine that takes advantage of chromatogram libraries, and we demonstrate a substantial gain in sensitivity over typical DIA and DDA workflows. This tool is instrument vendor neutral and available as an open source project with both a GUI and command line interface.
Results
The EncyclopeDIA workflow. EncyclopeDIA comprises several algorithms for DIA data analysis (Fig. 1b) that can search for peptides using either DDA-based spectrum libraries or DIA-based chromatogram libraries. In addition, the EncyclopeDIA executable contains the Walnut search engine, which is a performance optimized re-implementation of the PECAN algorithm13 to search protein sequence FASTA databases (see Supplementary Note 1 for further details). The algorithms in the EncyclopeDIA workflow are described in full detail in the Methods section. Briefly, the EncyclopeDIA workflow starts with reading raw MS/MS data in mzML files into an SQLite database designed for querying fragment spectra across precursor isolation windows. If fragment spectra are collected using overlapping windows, they are deconvoluted on the fly during file reading. Libraries are read as DLIB (DDA-based spectrum libraries) or ELIB (DIA-based chromatogram libraries). EncyclopeDIA determines the highest scoring retention time point corresponding to each library spectrum (as well as a paired reverse sequence decoy) using a scoring system modeled after the X!Tandem HyperScore17. Fifteen auxiliary match features (not based on retention time) are calculated at this time point. These features are aggregated and submitted to Percolator 3.1 (ref. 18), a semi-supervised SVM algorithm for interpreting target/decoy peptide detections, for a first pass validation. EncyclopeDIA generates a retention time model from peptides detected at 1% FDR using a non-parametric kernel density estimation algorithm that follows the density mode across time. Any target or decoy peptide in the feature set that does not match the retention time model is reconsidered up to five times until we find a highest scoring retention time point that matches the model. The retention time-curated feature sets are submitted to Percolator for final pass validation at 1% peptide FDR.
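The target/decoy validation described above can be illustrated with a minimal sketch. It is not Percolator, which trains a semi-supervised SVM over many features; it only shows how a 1% FDR cutoff might be estimated from a single score when each target peptide is paired with a reversed-sequence decoy. The function names, the choice to keep the C-terminal residue fixed when reversing, and the idea of ranking on one score are all assumptions made for illustration.

```python
from typing import List, Tuple


def reverse_decoy(sequence: str) -> str:
    """Reverse a peptide but keep its C-terminal residue in place.

    Keeping the last residue (often K or R for tryptic peptides) is one
    common decoy convention; it is an assumption here, not necessarily
    what EncyclopeDIA does.
    """
    if len(sequence) < 2:
        return sequence
    return sequence[:-1][::-1] + sequence[-1]


def q_values(scored: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign q-values to (score, is_decoy) pairs; higher scores are better.

    FDR at a threshold is estimated as (#decoys / #targets) at or above it.
    The q-value is the minimum FDR at which a match would be accepted.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Enforce monotonicity by sweeping from the worst score upward.
    running_min = float("inf")
    result = []
    for (score, is_decoy), fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        result.append((score, is_decoy, running_min))
    result.reverse()
    return result


def accept_targets(scored, fdr_threshold: float = 0.01) -> List[float]:
    """Return scores of target matches with q-value <= threshold."""
    return [s for s, is_decoy, q in q_values(scored)
            if not is_decoy and q <= fdr_threshold]
```

In a real pipeline the ranking score would come from the combined feature model rather than a single raw score, and the same filter would be applied once after each Percolator pass.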
Chromatogram library generation. Gas-phase fractionated DIA uses multiple injections with data acquisition methods that are tiled to span different precursor isolation windows6. With modern instrumentation it is possible to collect near proteome-wide DIA measurements with equivalently narrow precursor isolation to DDA using as few as six gas-phase fractionated injections. Previously we have shown that this type of DIA experiment can produce substantially richer peptide detection lists than similarly acquired DDA experiments13. In addition, it is much easier to detect low abundance peptides from gas-phase fractionated DIA using library search engines or database search engines than when searching wide-window DIA runs, which attempt to collect near proteome-wide measurements with a single injection. However, the total instrument time and sample that this strategy requires make it impractical for large quantitative experiments. We propose an approach to collecting DIA data using chromatogram libraries that leverages the deep sampling of gas-phase fractionated DIA while still maintaining high throughput (Fig. 1a).
In addition to collecting wide-window DIA experiments of each biological sample, we also collect narrow-window gas-phase fractionated DIA runs of pooled subaliquots of those samples. We detect peptides from the resulting narrow precursor isolation windows using library search engines (such as EncyclopeDIA) or DIA-specific database search engines (such as Walnut). To generate a chromatogram library, we catalog the retention time, peak shape, fragmentation patterns, and known interferences of detected peptides filtered to a 1% global peptide FDR. With DIA experiments we expect interference, so rather than removing impure library spectra and spectra that contain multiple peptides19, we simply retain only +1H and +2H fragment ions for expected B-type and Y-type ions. Due to gas-phase fractionated tiling, each peptide is only represented in the narrow-window data once, which eliminates the need for spectrum averaging19 or best spectrum selection20 typically used by DDA-based library curation tools. In the chromatogram library for each peptide we retain only the highest scoring charge state (as determined by Percolator) to limit the search space.
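As an illustration of which fragments a library entry would retain under this rule, the sketch below enumerates singly and doubly charged b- and y-type ion m/z values for an unmodified peptide. The monoisotopic residue masses are standard values; the function name and output layout are hypothetical and are not taken from EncyclopeDIA.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(peptide: str, charges=(1, 2)) -> List[Tuple[str, int, float]]:
    """Return (label, charge, m/z) for every b- and y-type ion."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(peptide)
    ions = []
    for i in range(1, n):
        b_neutral = sum(masses[:i])              # first i residues
        y_neutral = sum(masses[n - i:]) + WATER  # last i residues plus water
        for z in charges:
            ions.append((f"b{i}", z, (b_neutral + z * PROTON) / z))
            ions.append((f"y{i}", z, (y_neutral + z * PROTON) / z))
    return ions


for label, z, mz in fragment_ions("PEPTIDE")[:4]:
    print(label, z, round(mz, 4))
```

Modified residues would need their mass shifts added to the table, which this sketch leaves out.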
Chromatogram libraries implicitly contain a subset of the peptides found in DDA-based spectrum libraries, but the peptides they do contain have chromatographic and fragmentation data calibrated specifically to DIA experiments on that instrumentation platform. One limitation is that peptides that cannot be detected in narrow-window DIA runs of the pooled sample will not be searched for in subsequent runs. We feel that very few quantitatively reliable peptides will be detectable in the wide-window data that are not also detectable in the narrow data and that the smaller search space represented by chromatogram libraries can increase the significance of peptide detections21. In cases where rare variants are important to a study or if samples are likely to represent very disparate proteomes, it may also be possible to generate chromatogram libraries from multiple batches of narrow-window acquisitions from different sample pools.
In this study we generated a chromatogram library using peptides derived from HeLa S3 cell lysates. First, we used Skyline to assemble a HeLa-specific DDA-based spectrum library containing 166.4k unique peptides (representing 9947 protein groups) from 39 raw files acquired for other experiments. These files were collected from SCX and high-pH reverse phase fractions acquired with a Thermo Q Exactive tandem mass spectrometer using multiple HPLC gradients to vary the local peptide matrix. Using this library as a starting point, we constructed a chromatogram library from six gas-phase fractionated DIA runs with 52 overlapping 4 m/z-wide windows. We collected these runs with a Thermo Q-Exactive HF tandem mass spectrometer using a 90 min linear gradient. After overlap deconvolution, these experiments produced 300 2 m/z-wide windows, which is analogous to targeted PRM acquisition except that we are targeting all precursors between 400.43 and 1000.70 m/z. Following the scheme in Fig. 1a, we searched the narrow-window data against a HeLa-specific DDA spectrum library (166k unique peptides), producing a chromatogram library containing 99.6k unique peptides, and an analogous search against the Pan-Human spectrum library22 (159k unique peptides) produced a chromatogram library containing 91.1k unique peptides. We also produced a third library containing 53.2k unique peptides using Walnut to detect peptides directly from the narrow-window DIA data using a Uniprot Human FASTA database. The difference in chromatogram library size between searching DDA-based spectrum libraries with EncyclopeDIA and searching a FASTA database (1143k unique +2H/+3H tryptic peptides) with Walnut arises in part because the spectrum library represents a more targeted search space, while additionally including expected post-translationally modified (oxidized and acetylated) peptides, as well as peptides with multiple missed cleavages and expected +4H/+5H/+6H peptides.
Fig. 1 An approach for quantifying peptides with chromatogram libraries. a The chromatogram library generation workflow. Briefly, in addition to collecting wide-window DIA experiments on each quantitative replicate, a pool containing peptides from every condition is measured using several staggered narrow-window DIA experiments. After deconvolution, these narrow-window experiments have 2 m/z precursor isolation, which is analogous to targeted parallel reaction monitoring (PRM) experiments, except effectively targeting every peptide between 400 and 1000 m/z. We detect peptide anchors from these experiments using either EncyclopeDIA (searching a DDA spectrum library) or PECAN/Walnut (using a protein database) and chromatographic data about each peptide is stored in a chromatogram library with retention times, peak shape, fragment ion intensities, and known interferences tuned specifically for the LC/MS/MS setup. EncyclopeDIA then uses these precise coordinates for m/z, time, and intensity to detect peptides in the quantitative samples. b The EncyclopeDIA algorithmic workflow for searching spectrum and chromatogram libraries. After reading and deconvoluting DIA raw files, EncyclopeDIA calculates several retention time independent feature scores for each peptide that are amalgamated and FDR corrected with Percolator. Using high confidence peptide detections, EncyclopeDIA retention time aligns detections to the library, determines the retention time accuracy, and reconsiders outliers. After a second FDR correction with Percolator, EncyclopeDIA autonomously picks fragment ion transitions that fit each non-parametrically calculated peak shape and quantifies peptides using these ions.
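The window counts quoted for the narrow-window runs follow from simple arithmetic, sketched below as a rough model under stated assumptions. Each staggered acquisition is treated as two interleaved sets of windows offset by half a window width, so that deconvolution yields effective windows half as wide. The range is rounded to 400–1000 m/z, split into six 100 m/z gas-phase fractions, rather than the exact 400.43–1000.70 m/z boundaries. The function is hypothetical and ignores the extra edge windows that a real method needs.

```python
def effective_windows(mz_start: float, mz_end: float, acquired_width: float) -> int:
    """Number of effective isolation windows after overlap deconvolution.

    Assumes staggered windows offset by half their width, so each
    deconvoluted window is acquired_width / 2 wide.
    """
    effective_width = acquired_width / 2.0
    return round((mz_end - mz_start) / effective_width)


# Six gas-phase fractions of 100 m/z each, acquired with 4 m/z windows.
narrow_total = sum(
    effective_windows(start, start + 100, acquired_width=4.0)
    for start in range(400, 1000, 100)
)
print(narrow_total)  # 300 effective 2 m/z windows across 400-1000 m/z

# One wide-window run with 24 m/z windows over the same range.
print(effective_windows(400, 1000, acquired_width=24.0))  # 50 effective 12 m/z windows
```

Under this model a single wide-window injection covers the same range as the six narrow-window injections, at one sixth of the precursor selectivity.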
Comparison of spectrum and chromatogram library searches. We evaluated the chromatogram library strategy using peptides derived from a HeLa S3 cell lysate as a representative high-complexity proteome. In addition to generating the library, we also collected triplicate wide-window DIA runs with 52 overlapping 24 m/z-wide windows from the same sample using the same 90 min linear gradient. We found an average of 72.3k peptides when searching against the chromatogram library constructed using a HeLa-specific DDA-based spectrum library. Corroborating experiments from other groups23,24, with DIA we can detect up to 2× more peptides than our benchmark top-20 DDA experiments (Supplementary Figure 1). While Bruderer et al.23 found a significant performance drop when comparing results from previously acquired global libraries (such as the Pan-Human library22) to experiment-specific DDA spectrum libraries, we did not find a similar drop when searching chromatogram libraries generated from the Pan-Human library. This result indicates that our approach enables the reuse of previously acquired global libraries intended as community standards without requiring the generation of experiment-specific DDA libraries. In a sense, chromatogram libraries provide a calibration step that replaces the data in DDA spectrum libraries, or the fragmentation models in database search engines, with DIA-specific fragmentation and HPLC/column-specific retention times. Despite this increased detection rate, we still find that DIA produces more consistent results compared to DDA, as indicated by the overlap in peptide detections between triplicate injections (Fig. 2b, c). This agrees with previous reports that DIA quantification is both more uniform13 and more accurate8,10,12. While other library search tools such as Skyline25 cannot make use of all chromatogram library features, we find that Skyline still produces higher detection rates when searching chromatogram libraries as compared to both the HeLa-specific and Pan-Human DDA-based spectrum libraries (Supplementary Figure 2).
Fig. 2 Untargeted peptide detections using DDA and DIA. We used EncyclopeDIA to search chromatogram and spectrum libraries, while we used Comet and Walnut to search DDA and DIA data directly using FASTA protein databases. Every search was performed independently without any run-to-run alignment. a The number of peptide detections at 1% peptide FDR in triplicate HeLa injections. Orange shaded areas indicate pairwise comparisons of FASTA searches versus FASTA-based chromatogram library searches. Purple (or green) shaded areas indicate pairwise comparisons of searches of a cell line-specific DDA library (or the Pan-Human DDA library) versus a chromatogram library derived from that DDA library. b The overlap in HeLa S3 peptide detections between replicates using DDA searched by Comet and c using DIA searched by EncyclopeDIA, where the sizes of the Venn diagram circles in HeLa analyses are consistent with the number of detections. d The number of peptide detections at 1% peptide FDR in triplicate BY4741 yeast injections. e The overlap in yeast peptide detections between replicates using DDA searched by Comet and f using DIA searched by EncyclopeDIA, where the sizes of the circles are consistent with the number of yeast peptide detections.
We also evaluated the creation of chromatogram libraries using a DIA-only workflow. Using this approach, we were able to detect an average of 20.6k peptides from the Uniprot Human FASTA database using Walnut, or approximately 0.6× the detections found by top-20 DDA. In contrast, we found an average of 47.8k peptides (2.3× increase) when we searched the Walnut-based chromatogram library with EncyclopeDIA (Fig. 2a), or approximately 1.4× more than DDA. These results agree with previous work13 showing that PECAN does not perform as well as DDA when searching wide-window runs, but typically outperforms DDA when searching gas-phase fractionated runs. Interestingly, the DIA-only workflow found nearly an equal number of peptides compared to searching the 39 injection HeLa-specific DDA-based spectrum library, while requiring only an additional six library-building injections. Confirming these results, we performed the same analysis using a yeast cell lysate and found similar improvement rates when comparing Walnut versus EncyclopeDIA using a Walnut-based chromatogram library (2.2× increase, Fig. 2d), or 1.2× more than top-20 DDA. Here we observe more modest gains over DDA experiments, which likely reflects the lower proteomic complexity of yeast versus human cells and is echoed in the tight overlap (86%) between triplicate DIA injections versus DDA (Fig. 2e, f). As is possible with any computational strategy that incorporates machine learning, we were concerned with the potential for overfitting that might manifest in exaggerated peptide detection rates. To answer this question we searched the HeLa wide-window DIA data using the yeast chromatogram library (and vice versa) to verify that we see a negative result when searching the wrong library. As expected, this search (Fig. 2a, d) produced zero peptide detections that passed a 1% peptide FDR threshold.
We also find that DIA analysis with chromatogram libraries is more sensitive at detecting low abundance proteins at a 1% protein FDR. Using tandem affinity purification tagging and quantitative Western blots, Ghaemmaghami et al.26 quantified 3868 yeast proteins with more than 50 estimated copies per cell. In this study we replicated strain and growing conditions as closely as possible to use their measurements as an independent benchmark. While both DDA and DIA confidently detect the majority of proteins at levels above 10^4 copies per cell, DIA outperforms DDA by 49% with proteins estimated to have between 10^3 and 10^4 copies per cell and by 2× with proteins estimated between 10^2 and 10^3 copies per cell (Fig. 3).
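The abundance comparison can be reproduced in outline by binning proteins by the order of magnitude of their estimated copies per cell and computing the detected fraction per bin. The sketch below shows one way to do that; the function name is hypothetical, and the protein names and copy numbers are invented toy values, not data from this study or from Ghaemmaghami et al.

```python
import math
from collections import defaultdict
from typing import Dict, Set


def detected_fraction_by_decade(copies: Dict[str, float],
                                detected: Set[str]) -> Dict[int, float]:
    """Fraction of proteins detected within each log10 abundance bin.

    Bin k holds proteins with 10**k <= copies per cell < 10**(k + 1).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for protein, n in copies.items():
        decade = math.floor(math.log10(n))
        totals[decade] += 1
        if protein in detected:
            hits[decade] += 1
    return {d: hits[d] / totals[d] for d in sorted(totals)}


# Toy example with made-up proteins and copy numbers.
toy_copies = {"P1": 150, "P2": 800, "P3": 2500, "P4": 7000, "P5": 40000}
toy_detected = {"P2", "P3", "P4", "P5"}
print(detected_fraction_by_decade(toy_copies, toy_detected))
# {2: 0.5, 3: 1.0, 4: 1.0}
```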
Fig. 3 Protein detection rates scale with abundance. The a number and b fraction of proteins detected in yeast at different orders of magnitude of abundance (number of copies per cell). Ghaemmaghami et al. comprehensively estimated protein copies per cell in yeast (light blue area, 3868 total proteins) using high-affinity epitope tagging. While top-20 DDA (red line, 1798 total proteins) can measure some low abundance proteins at 1% protein-level FDR, the strategy only detected 48% of mid-range proteins with estimated copies per cell between 10^3 and 10^4. In contrast, at 1% protein-level FDR, wide-window DIA using a Walnut-based chromatogram library (blue line, 2519 total proteins) detected 71% of these proteins and overall recapitulated 91% of proteins found in the entire Walnut-based chromatogram library (black line, 2754 total proteins).
Retention time and fragmentation pattern calibration. One of the primary reasons on-column chromatogram libraries improve performance is that they exploit within-batch retention time reproducibility. Accurate retention time filtering is an important consideration when analyzing high-complexity proteomes with DIA, and virtually all DIA library search engines make use of this data. Retention times in aggregate spectrum libraries are typically derived by linearly interpolating multiple DDA data sets to a known calibration space (such as that defined by the iRT standard27), which enables retention times to be comparable from run to run, or even across platforms. However, these measurements usually contain some wobble due to errors introduced by assuming a linear fit. Bruderer et al.28 improve upon this strategy with high-precision iRT fitting using a non-parametric curve fitting approach for hundreds or thousands of peptides, and EncyclopeDIA uses an analogous kernel density estimation approach to fit retention times between wide-window DIA results and retention times in libraries. Figure 4a shows a typical spread of retention times in EncyclopeDIA detected peptides using a DDA spectrum library, which is 95% accurate within a spread of 5.1 min (Fig. 4c). In comparison, Fig. 4b shows the typical spread of retention times in the chromatogram library, which is 95% accurate within 21 s (Fig. 4d). This tightening of retention time accuracy is due to the fact that chromatogram libraries are collected on the same column as the wide-window acquisitions. Even if efforts are made to keep packing material, length, and gradient consistent, the dramatic gains in retention time accuracy with chromatogram libraries reflect variations that are difficult to control for, including packing speeds, pressures, and pulled tip orifice shapes. In addition, we find that DDA fragmentation patterns (Fig. 4e) are often somewhat different than those collected in DIA experiments (Fig. 4f). While DDA instrument methods usually tune MS/MS collision energies to the precursor charge and mass, some of this variation is likely due to fixed assumptions in charge states and precursor masses required by DIA methods when multiple precursors must be fragmented at the same time. These two factors, retention time calibration and fragmentation pattern calibration, appear to contribute roughly equal and orthogonal improvements over searching DDA spectrum libraries (Supplementary Figure 3).
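To make the alignment idea concrete, here is a minimal sketch that fits a smooth, non-parametric mapping from library retention time to observed retention time and then reports the delta RT that contains 95% of the peptides. It uses Nadaraya–Watson kernel regression with a Gaussian kernel as a simple stand-in; it is not the mode-following kernel density estimator that EncyclopeDIA implements, and the bandwidth, function names, and toy data are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def kernel_predict(lib_rt: Sequence[float], obs_rt: Sequence[float],
                   query: float, bandwidth: float = 2.0) -> float:
    """Gaussian-kernel weighted mean of observed RT near a library RT."""
    weights = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in lib_rt]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, obs_rt)) / total


def delta_rt_window(lib_rt: Sequence[float], obs_rt: Sequence[float],
                    coverage: float = 0.95, bandwidth: float = 2.0) -> float:
    """Smallest absolute delta RT that covers the requested fraction of peptides."""
    residuals: List[float] = sorted(
        abs(y - kernel_predict(lib_rt, obs_rt, x, bandwidth))
        for x, y in zip(lib_rt, obs_rt)
    )
    index = max(0, math.ceil(coverage * len(residuals)) - 1)
    return residuals[index]


# Toy data: observed RT is a curved function of library RT plus one outlier.
library = [float(t) for t in range(0, 100, 2)]
observed = [0.9 * t + 0.001 * t * t for t in library]
observed[10] += 8.0  # a misassigned peptide, 8 min away from the trend
print(round(delta_rt_window(library, observed), 2))
```

A tighter window lets the search reject more candidate peaks on retention time alone, which is the benefit the comparison of 5.1 min versus 21 s describes.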
Fig. 4 Retention time and fragmentation accuracy of spectrum and chromatogram libraries. Scatterplots comparing retention times (RT) from the a DDA spectrum library and the b DIA chromatogram library to those from a single HeLa DIA experiment. Each point represents a peptide, where blue peptides fit the retention time trend (green) within a Bayesian mixture model probability of 5% and red peptides are outliers (see Methods section for more details). c Delta RTs in the DDA spectrum library are 95% accurate to a window of 5.1 min, while d retention times in the chromatogram library are 95% accurate to 21 s. e The distribution of Pearson correlation coefficients between spectra in the DDA spectrum library and those detected from a single HeLa DIA experiment shows charge state bias (+2H, +3H, +4H), while f the distribution of correlation coefficients between spectra in the DIA chromatogram library and those from the same experiment shows much less bias.
A subtle issue with DIA library searching when using
generalized spectrum libraries is that many peptides generate
the same fragment ions, either because of sequence variation,
paralogs, or modied forms. While EncyclopeDIA attempts to
control for this using background ion distributions to predict
interference likelihoods, sequence variation due to homology or
single nucleotide polymorphisms can be unintentionally detected
as the wrong peptide sequence in certain circumstances. For
example, a sequence variation of a valine to an isoleucine is
relatively common, and the mass shift of a methyl group (+14/Z)
will often place both peptides inside the same precursor isolation
window when Z is 2 or greater. Using chromatogram libraries can
provide some protection against these issues because the initial
searches to generate the libraries are performed using narrow
(2 m/z) precursor mass windows, and subsequent wide-window
searches benefit from precise retention time filtering. Additionally,
EncyclopeDIA requires at least 25% of the primary score to
come from ions that indicate the modified form to detect
modified peptides when modified/unmodified peptide pairs fall in
the same precursor isolation window (e.g., methionine oxidation).
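As a rough worked example of the valine/isoleucine case, the m/z shift of a single CH2 substitution can be compared against wide and narrow isolation windows. This is only an illustrative sketch: the function names, the example precursor m/z, and the window boundaries are assumptions and are not taken from EncyclopeDIA.

```python
CH2_MONOISOTOPIC_MASS = 14.01565  # Da; mass difference between Val and Ile/Leu


def mz_shift(charge):
    """m/z shift caused by a single Val -> Ile substitution at a given charge."""
    return CH2_MONOISOTOPIC_MASS / charge


def share_window(mz_a, mz_b, lower, upper):
    """True if both precursors fall in the same isolation window [lower, upper)."""
    return lower <= mz_a < upper and lower <= mz_b < upper


# Hypothetical +2 precursor at 600.30 m/z and its Val -> Ile variant.
variant_mz = 600.30 + mz_shift(2)

# Illustrative 24 m/z wide window: both forms are co-isolated.
wide = share_window(600.30, variant_mz, 592.0, 616.0)
# Illustrative 2 m/z window: the two forms are separated.
narrow = share_window(600.30, variant_mz, 599.5, 601.5)
```

At charge +2 the shift is about 7 m/z, which is smaller than a 24 m/z window but larger than a 2 m/z window, so only the narrow-window search can separate the two forms by precursor mass.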
Peptide and protein quantitation. Automated interference
removal is an important aspect to analyzing wide-window DIA
data. SWATHProphet29 attempts to solve this by comparing
relative fragment intensities in spectrum libraries to those found
in the DIA data, while mapDIA30 computes the correlation
between every pair of fragment ions to identify outliers. We
present an algorithm for automated transition refinement to
remove fragment ion interference and alleviate the need for
manual curation (see Methods section for further details). In
short, after unit area normalizing all transitions assigned to a
single peptide (Supplementary Figure 4a), we determine the shape
of the peak as the median normalized intensity at each retention
time point (Supplementary Figure 4b). Transitions that match
this peak shape with Pearson's correlation scores > 0.9 are considered
quantitative (Supplementary Figure 4c). We find that over
81% of peptides can be quantified with at least three transitions
(Supplementary Figure 5a) and that the transitions picked by our
approach produce reproducible quantitative measurements
between technical replicates in HeLa experiments (Supplementary
Figure 5b and c).
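The three steps summarized above (unit-area normalization, median peak shape, and a Pearson correlation cutoff of 0.9) can be sketched as follows. This is a minimal illustration under stated assumptions, not the EncyclopeDIA implementation: the function name `refine_transitions` and the array layout are hypothetical, and the smoothing step described in the Methods is omitted.

```python
import numpy as np


def refine_transitions(chromatograms, min_correlation=0.9):
    """Pick transitions whose shape matches the consensus peak shape.

    chromatograms: array-like of shape (n_transitions, n_timepoints) holding
    fragment ion intensities across one peptide peak (assumed layout).
    Returns the consensus peak shape and the indices of retained transitions.
    """
    chroms = np.asarray(chromatograms, dtype=float)
    areas = chroms.sum(axis=1, keepdims=True)
    areas[areas == 0] = 1.0                       # leave empty transitions at zero
    normalized = chroms / areas                   # unit-area normalization
    peak_shape = np.median(normalized, axis=0)    # median at each time point
    keep = []
    for index, row in enumerate(normalized):
        if row.std() == 0 or peak_shape.std() == 0:
            continue                              # correlation undefined
        r = np.corrcoef(row, peak_shape)[0, 1]    # Pearson correlation
        if r > min_correlation:
            keep.append(index)
    return peak_shape, keep


# Four clean transitions plus one with a co-eluting interference.
t = np.linspace(-3, 3, 25)
peak = np.exp(-t ** 2 / 2)
interference = np.exp(-(t - 2) ** 2 / 0.1)
example = [1 * peak, 2 * peak, 3 * peak, 4 * peak, peak + 5 * interference]
shape, kept = refine_transitions(example)
```

Because the median is taken across transitions at each time point, a single interfered transition does not distort the consensus shape, which is the reason a median rather than a mean is used here.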
Combining peptide detections across multiple samples often
increases false discoveries because false detections are usually
found only in individual runs. To combat this, we recalculate
global peptide FDR across all experiments in a study31 with
Percolator and generate parsimonious protein detection lists that
are also filtered to a 1% global protein FDR. We use cross-sample
retention time alignment16 to help quantify peptides that are
missing in specific samples. After filtering peptides based on
coefficient of variation and measurement consistency, we estimate
protein quantities by summing fragment ion intensities across
only sequence-unique peptides assigned to those proteins. Using
a similar strategy to LFQbench32, we validated the quantitative
accuracy of protein-level measurements with triplicate experiments
of five different mixtures of yeast and HeLa proteomes at
expected concentrations (Fig. 5). In these mixtures we detected
2563 yeast proteins that passed a 1% global protein FDR
threshold. Of these, we found that 2018 yeast proteins produced
at least three quantitative transition ions without interference,
had <20% study-wide CVs, and were measured in every replicate
in pure yeast experiments. While at first these detection and
quantification criteria may seem stringent compared to typical
proteomics experiments, narrowing our focus to confident
measurements produced quantitative ratios that closely adhered
to the expected mixture ratios, especially with regard to small
fold changes. We employed these methods and filtering criteria to
study the effect of serum starvation in human cells.
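The protein-level roll-up described above can be illustrated with a short sketch. It assumes a hypothetical input layout (per-peptide summed fragment intensities for each replicate, and a peptide-to-protein map), applies the <20% CV and "measured in every replicate" criteria per peptide, and sums only sequence-unique peptides. The function name and data structures are assumptions for illustration only.

```python
from statistics import mean, stdev


def protein_quantities(peptide_intensities, peptide_to_proteins, max_cv=0.20):
    """Sum peptide intensities per protein using only sequence-unique peptides.

    peptide_intensities: {peptide: [summed fragment intensity per replicate]}
    peptide_to_proteins: {peptide: set of protein identifiers}
    All peptides are assumed to have the same number of replicates.
    """
    quantities = {}
    for peptide, values in peptide_intensities.items():
        proteins = peptide_to_proteins.get(peptide, ())
        if len(proteins) != 1:
            continue                      # shared peptides are not used
        if len(values) < 2 or min(values) <= 0:
            continue                      # must be measured in every replicate
        if stdev(values) / mean(values) >= max_cv:
            continue                      # coefficient of variation too high
        protein = next(iter(proteins))
        totals = quantities.setdefault(protein, [0.0] * len(values))
        for replicate, value in enumerate(values):
            totals[replicate] += value
    return quantities


example_intensities = {
    "AAK": [100, 110, 105],   # unique, reproducible
    "CCK": [50, 52, 51],      # unique, reproducible
    "DDK": [10, 100, 50],     # unique, but CV far above 20%
    "EEK": [5, 5, 5],         # shared between two proteins
}
example_map = {"AAK": {"P1"}, "CCK": {"P1"}, "DDK": {"P1"}, "EEK": {"P1", "P2"}}
result = protein_quantities(example_intensities, example_map)
```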
Global proteomic changes from serum starvation. Serum star-
vation is a common step in signal transduction studies as serum
contains several cytokines and growth factors that can confound
signaling levels. It is commonly thought that serum starvation
suppresses basal activity by reducing signaling activity that
effectively resets cells to G0/G1 resting phase33, although more
recent experiments34,35 suggest otherwise. Serum starvation
protocols vary widely from 2 to 24 h, and this time frame is long
enough to produce changes in protein levels resulting from
transcriptional regulation. These changes are a source of variation
that can have serious consequences when comparing between
studies.
We designed a DIA quantitative experiment to map how the
proteome of HeLa cells changes in response to serum starvation
[Figure 5 plot residue. Axis labels: intensity of the diluted peptide; intensity ratio to 100% yeast; ratio sample (A to E, with expected ratios marked).]
Fig. 5 Quantitative accuracy in mixed proteomes. a Quantitative ratios of 2018 yeast proteins spiked into a HeLa background at five different
concentrations and measured in triplicate. Each point indicates the average protein ratio relative to 100% yeast. b Boxplots showing the spread of the ratio
measurements where boxes indicate medians and interquartile ranges, and whiskers indicate 5 and 95% values. The expected dilution ratios for samples A,
B, C, D, and E are 78, 56, 34, 11, 7.8% yeast
over time. We selected six starvation times to match commonly
used protocols and generated six biological replicates per
condition. We collected all the DIA runs with the same
mass spectrometer and chromatographic conditions. Of the
99.6k unique peptides in our chromatogram library, we
recapitulated 93.5k unique peptides from 6,802 protein
groups at a 1% global protein FDR threshold. As above, 48.6k
peptides (from 5,781 protein groups) produced at least three
quantitative transition ions without interference, had <20%
study-wide CVs, and were measured in every replicate of at least
one time point.
Using EDGE36 we found 1097 protein groups in the HeLa
proteome that changed significantly over time at a q-value < 0.01
(Supplementary Data 1). The temporal starvation profiles of these
proteins fell into five groups (Fig. 6), where the majority of changing
proteins increased in abundance. Several of these proteins are
involved in expected pathways such as cell cycle regulation (GO
enrichment FDR =0.011), metabolism (GO enrichment FDR =
0.011), and ubiquitination regulation (GO enrichment FDR =
0.018). One advantage of our method is that quantitation is
performed by summing peaks from several low interference
fragment ions, which allows us to accurately quantify small
changes. For example, we found that all eight of the observed
components of the nuclear proteasome increased significantly by
~25% (Supplementary Figure 6), which indicates nuclear
maintenance consistent with G0/G1 resting phase.
We also observed significant regulation of the abundance of 39
kinases and 7 phosphatases (Supplementary Figure 7). In
particular, we found that EGFR levels increased by 30% over a
24 h serum starvation time course (Supplementary Figure 8),
effectively sensitizing HeLa to the growth factor EGF. To confirm
these experiments, we monitored relative changes in the
phosphoproteome of four HeLa biological replicates after EGF
stimulation at two common serum starvation times: 4 and 16 h.
We found that while phosphopeptide measurements at both time
points directionally agreed, some phosphopeptide responses to
EGF were stronger when cells were starved for 16 h compared to
when starving for only 4 h (Supplementary Figure 9). This
increase corroborated our observation that EGFR protein levels
increased from 4 to 16 h of starvation. These protein and
phosphopeptide-level changes underline a potentially significant
source of variation when comparing phosphorylation signaling
studies.
Discussion
We have demonstrated an experimental strategy that enables
comprehensive detection of peptides and proteins using chro-
matogram libraries. These libraries can be seeded either with a
DDA spectrum library or generated in a DIA-only mode using
Walnut for initial peptide searches. Finally, we showed that at the
cost of only six additional narrow-window DIA runs, both of
these strategies are more sensitive and reproducible relative to
comparable DDA experiments. While this approach may be
unrealistic for one-off experiments, we feel that in most quantitative
proteomics studies the addition of these runs is a minor
cost in exchange for a significant increase in sensitivity.
One important limitation of our method is that each chromatogram
library is tuned for a specific mass spectrometer and
chromatographic setup. In particular, we have observed that with
the hand-pulled and packed columns used here, there is significant
retention time variation between replicates run on different
columns, even if effort is made to ensure column
consistency. We hypothesize that minor variations in packing
speeds, packing pressures, tip shapes, and column lengths can
affect elution times and even peptide retention time ordering.
This issue may be mitigated by acquiring a new library after a
column change and retention time aligning the libraries to ensure
consistency. Future work remains to model these minor retention
time shifts.
Another important consideration is library quality. All library
searching strategies assume that entries in the library are correctly
identified and consequently false positives in the library can be
propagated as true positives by target/decoy analysis37. This
concern is potentially compounded in our approach, which can
include up to two levels of library creation. Further work is
necessary to improve FDR estimates for library searching in DIA
experiments. In the meantime, we feel orthogonal filtering strategies
are necessary to maintain conservative peptide detection
lists. In addition to retention time fitting and 1% protein-level
FDR filtering, in this work we require a minimum of three
interference-free transitions and impose stringent measurement
reproducibility requirements for peptides to be considered
quantitative.
We have observed a complementarity of DDA and DIA
through the use of building spectrum libraries to seed chroma-
togram libraries. Here the stochasticity of DDA sampling when
coupled with offline peptide separation methods such as SCX
fractionation can be exploited as a benet in that only one
observation of a peptide is necessary for inclusion in the library.
With human samples, libraries constructed using previously
recorded retention times and fragmentation patterns
contained nearly twice the peptides as those constructed without
prior knowledge. However, PECAN/Walnut can build on that
knowledge by detecting peptide sequence variants illuminated
by whole exome sequencing13, and we are exploring ways of
[Figure 6 plot residue. Axis labels: time (h) at 0, 2, 4, 8, 16, and 24 h; Z-score from -3 to +3 (also used as the heatmap color scale).]
Fig. 6 Protein quantification changes following serum starvation. a
Heatmap of 1097 proteins found to be quantitatively changing at a q-value
< 0.01 in HeLa. Colors are Z-score normalized and indicate the number of
standard deviations away from the level at time 0. b Protein changes
grouped into five K-means clusters (see Supplementary Figure 13 for more
details) showing separation between fast responding proteins (light blue,
dark green, and pink) and delayed responses (dark blue, light green)
generating chromatogram libraries that incorporate both pieces of
data.
Methods
HeLa cell culture and sample preparation. HeLa S3 cervical cancer cells (ATCC)
were cultured at 37 °C and 5% CO2 in Dulbecco's modified Eagle's medium
(DMEM) supplemented with L-glutamine, 10% fetal bovine serum (FBS), and 0.5%
strep/penicillin. Six cell culture replicates were grown to approximately a 50%
density in 6-well plates prior to FBS starvation staggered for 24, 16, 8, 4, 2, and 0 h
(one time point in each well, one plate per replicate). At the 0 h time point cells
were quickly washed three times with refrigerated phosphate-buffered saline (PBS)
and immediately flash frozen with liquid nitrogen. Frozen cells were lysed in a
buffer of 9 M urea, 50 mM Tris (pH 8), 75 mM NaCl, and a cocktail of protease
inhibitors (Roche Complete-mini EDTA-free). After scraping, cells were subjected
to 2 × 30 s of probe sonication, 20 min of incubation on ice, followed by 10 min of
centrifugation at 21,000 × g and 4 °C. The protein content of the supernatant was
estimated using BCA. The proteins were reduced with 5 mM dithiothreitol for
30 min at 55 °C, alkylated with 10 mM iodoacetamide in the dark for 30 min at
room temperature, and quenched with an additional 5 mM dithiothreitol for
15 min at room temperature. The proteins were diluted to 1.8 M urea and then
digested with sequencing grade trypsin (Pierce) at a 1:50 enzyme to substrate ratio
for 12 h at 37 °C. The digestion was quenched by adding 10% trifluoroacetic acid to
achieve approximately pH 2. Resulting peptides were desalted with 100 mg tC18
SepPak cartridges (Waters) using vendor-provided protocols and dried with
vacuum centrifugation. Peptides were brought to 1 μg/3 μl in 0.1% formic acid
(buffer A) prior to mass spectrometry acquisition. For the reproducibility experi-
ments and to build a chromatogram library we pooled aliquots from all six time
points for three of the replicates to ensure that the pool contained virtually every
peptide present in the individual time points.
With the phosphoproteomics experiment, four replicates were performed for
each of the four conditions: 20 min EGF (100 ng/ml) or PBS stimulation following
4 h starvation, and 20 min EGF/PBS stimulation following 16 h starvation. Sample
generation and processing was performed in the same fashion with the following
exceptions: (1) in addition to protease inhibitors, a cocktail of phosphatase
inhibitors (50 mM NaF, 50 mM β-glycerophosphate, 10 mM pyrophosphate, and
1 mM orthovanadate) was also added to the lysis buffer, (2) proteins were digested
for 14 h, and (3) phosphopeptides were enriched using immobilized metal affinity
chromatography (IMAC) using Fe-NTA magnetic agarose beads (Cube Biotech).
Enrichment was performed with a KingFisher Flex robot (Thermo Scientific),
which incubated peptides with 150 μl 5% bead slurry in 80% acetonitrile, 0.1% TFA
for 30 min, washed them three times with the same solution, and eluted them with
60 μl 50% acetonitrile:1% NH4OH. Phosphopeptides were then acidified with 10%
formic acid and dried. Phosphopeptides were brought to 1 μg/3 μl in 0.1% formic
acid assuming a 1:100 reduction in peptide abundance from the IMAC enrichment.
Again, to build a chromatogram library we pooled aliquots from all four conditions
for three of the replicates to ensure that the pool contained virtually every peptide
present in the individual conditions.
Yeast cell culture and sample preparation. Yeast strain BY4741 (Dharmacon)
was cultured at 30 °C in YEPD and harvested at mid-log phase. Cell pellets were
lysed in a buffer of 8 M urea, 50 mM Tris (pH 8), 75 mM NaCl, 1 mM EDTA
(pH 8) using 7 cycles of 4 min bead beating with glass beads followed by one
minute rest on ice. Lysate was collected by piercing the tube, placing it into an
empty eppendorf, and centrifuging for 1 min at 3000 × g and 4 °C. Insoluble
material was removed from the lysate by 15 min centrifugation at 21,000 × g and
4 °C. The protein content of the supernatant was estimated using BCA. The proteins
were reduced with 5 mM dithiothreitol for 30 min at 55 °C and alkylated with 10
mM iodoacetamide in the dark for 30 min at room temperature. The proteins were
diluted to 1.8 M urea and then digested with sequencing grade trypsin (Pierce) at a
1:50 enzyme to substrate ratio for 16 h at 37 °C. The digestion was quenched using
5 N HCl to achieve approximately pH 2. Resulting peptides were desalted with
30 mg MCX cartridges (Waters) and dried with vacuum centrifugation. Peptides
were brought to 1 μg/3 μl in 0.1% formic acid (buffer A) prior to mass spectrometry
acquisition.
Mixtures of yeast and HeLa cells. Mixtures of digested yeast and HeLa peptides
were combined in the following yeast:HeLa ratios: 1:0, 0.7:0.3, 0.5:0.5, 0.3:0.7,
0.1:0.9, 0.07:0.93, and 0:1, where concentrations were assumed from protein-level
BCA analyses. Ratio mixing bias (caused by bias in BCA estimates from assuming
bovine serum albumin as a standard) was determined by regression across all
ratios (both yeast:HeLa and HeLa:yeast) with a linear model that used the expected
ratio of the measured species as a regression term. After correction, the recalculated
ratios were determined to be 1:0, 0.78:0.22, 0.56:0.44, 0.34:0.66, 0.11:0.89, 0.078:0.922,
and 0:1.
LC mass spectrometry. Peptides were separated with a Waters NanoAcquity
UPLC and emitted into a Thermo Q-Exactive HF tandem mass spectrometer.
Pulled tip columns were created from 75 μm inner diameter fused silica capillary
in-house using a laser pulling device and packed with 3 μm ReproSil-Pur C18 beads
(Dr. Maisch) to 300 mm. Trap columns were created from 150 μm inner diameter
fused silica capillary fritted with Kasil on one end and packed with the same C18
beads to 25 mm. Solvent A was 0.1% formic acid in water, while solvent B was 0.1%
formic acid in 98% acetonitrile. For each injection, 3 μl (approximately 1 μg) was
loaded and eluted using a 90-minute gradient from 5 to 35% B, followed by a
40 min washing gradient. Data were acquired using either data-dependent acqui-
sition (DDA) or data-independent acquisition (DIA). Three DDA and DIA HeLa
and yeast technical replicates were acquired by alternating between acquisition
modes to minimize bias. Serum-starved HeLa acquisition was randomized within
blocks to enable downstream statistical analysis.
DDA acquisition and processing. The Thermo Q-Exactive HF was set to positive
mode in a top-20 configuration. Precursor spectra (400–1600 m/z) were collected at
60,000 resolution to hit an AGC target of 3e6. The maximum inject time was set to
100 ms. Fragment spectra were collected at 15,000 resolution to hit an AGC target
of 1e5 with a maximum inject time of 25 ms. The isolation width was set to 1.6 m/z
with a normalized collision energy of 27. Only precursors charged between +2 and
+4 that achieved a minimum AGC of 5e3 were acquired. Dynamic exclusion was
set to "auto" and to exclude all isotopes in a cluster. Thermo RAW files were
converted to mzXML format using ReAdW and searched against a Uniprot Human
FASTA database (87613 entries) with Comet (version 2015.02v2), allowing for
variable methionine oxidation and N-terminal acetylation. Cysteines were assumed
to be fully carbamidomethylated. Searches were performed using a 50 ppm precursor
tolerance and a 0.02 Da fragment tolerance using fully tryptic specificity
(KR|P) permitting up to two missed cleavages. Search results were filtered to a 1%
peptide-level FDR using Percolator (version 3.1).
DIA acquisition and processing. For each chromatogram library, the Thermo Q-Exactive
HF was configured to acquire six chromatogram library acquisitions with
4 m/z DIA spectra (4 m/z precursor isolation windows at 30,000 resolution, AGC
target 1e6, maximum inject time 55 ms) using an overlapping window pattern from
narrow mass ranges using window placements optimized by Skyline (i.e.,
396.43–502.48, 496.48–602.52, 596.52–702.57, 696.57–802.61, 796.61–902.66, and
896.6–1002.70 m/z). See Supplementary Figure 10 and Supplementary Data 2 for
the actual windowing scheme. Two precursor spectra, a wide spectrum (400–1600
m/z at 60,000 resolution) and a narrow spectrum matching the range (i.e., 390–510,
490–610, 590–710, 690–810, 790–910, and 890–1010 m/z) using an AGC target of
3e6 and a maximum inject time of 100 ms, were interspersed every 18 MS/MS
spectra.
For quantitative samples, the Thermo Q-Exactive HF was configured to acquire
25 × 24 m/z DIA spectra (24 m/z precursor isolation windows at 30,000 resolution,
AGC target 1e6, maximum inject time 55 ms) using an overlapping window pattern
from 388.43 to 1012.70 m/z using window placements optimized by Skyline. See
Supplementary Figure 11 and Supplementary Data 2 for the actual windowing
scheme. Precursor spectra (385–1015 m/z at 30,000 resolution, AGC target 3e6,
maximum inject time 100 ms) were interspersed every 10 MS/MS spectra.
Phosphopeptide samples were analyzed in the same way using 20 × 20 m/z DIA
spectra in an overlapping window pattern from 490.47 to 910.66 m/z.
All DIA spectra were programmed with a normalized collision energy of 27 and
an assumed charge state of +2. Thermo RAW files were converted to mzML
format using the ProteoWizard package (version 3.0.7303) where they were peak
picked using vendor libraries. A HeLa-specific Bibliospec20 HCD spectrum library
was created from Thermo Q-Exactive DDA data using Skyline (version 3.1.0.7382).
This library comprises 39 SCX and high-pH reverse phase fractionated raw
files using multiple HPLC gradients to vary the local peptide matrix. This BLIB
library and accompanying iRTDB normalized retention time database were
converted into an ELIB library and used to search the mzMLs for peptides.
EncyclopeDIA searches DIA data using +1H and +2H b/y ion fragments that
could be found in library spectra. EncyclopeDIA was configured with default
settings (10 ppm precursor, fragment, and library tolerances, considering both B
and Y ions, and trypsin digestion was assumed). EncyclopeDIA was configured to
use Percolator version 3.1. Phosphopeptides were processed the same way except a
HeLa-specific phosphopeptide HCD spectrum library was used38 and
phosphopeptides detected in EncyclopeDIA searches were localized using
Thesaurus39.
Further validation of the HeLa replicate dataset was performed using Skyline-daily
version 4.1.1.18151. Precursors were filtered between the isolated m/z range of
388.4 to 1000.7 with a minimum of 6 measurable fragment y-ions and b-ions (charge
1 or 2) between 300 and 2000 m/z, not including y1, y2, b1, or b2. The fragment
ions with the six most intense peaks from the libraries within these limits were
chosen along with the first three precursor isotopes to be extracted from MS1, both
set to extract within 10 ppm mass error from the centroided (and demultiplexed for
MS/MS) spectra. Two iRT libraries were built (for the HeLa-specific DDA library
and the HeLa-specific chromatogram library, respectively) using 73 reliably
detected peptides chosen as iRT library anchors across the retention time
range. Chromatogram extraction was set to apply to all spectra within 10 min of
predicted retention times using these iRT libraries. An mProphet40 model was
trained using the target/decoy strategy (with the "Retention time difference
squared" excluded) and applied without any run-to-run alignment. Please see
Supplementary Note 2 for further details.
Overlapping DIA deconvolution. When using the overlapping DIA scheme, every
spectrum in the entire raw file must be deconvoluted. In an effort to maintain
consistency between analysis techniques, we used MSConvert to deconvolute RAW
files in this study. However, we have also implemented a simple deconvolution
algorithm in EncyclopeDIA that can be performed on-the-fly while reading spectra
in a narrow I/O buffer. In a DIA data set, at each cycle (T) every MS/MS spectrum
($S_{T,i}$) comprises fragments from precursors within the precursor isolation window
(i). Spectra in consecutive half cycles are overlapped by 50%, such that precursors
from the lower 50% of the window in MS/MS spectrum $S_{T,i}$ should also be present
in the previous/next half cycles' lower offset spectra ($S_{T-1,i-1}$ and $S_{T+1,i-1}$), while
precursors from the upper 50% of the window should also be present in the
corresponding upper offset spectra ($S_{T-1,i+1}$ and $S_{T+1,i+1}$). We divide these
windows into two bins and attempt to determine which fragments were derived
from precursors in the upper half or the lower half using previous and next half
cycles. Fragment ions that are found exclusively in the lower previous/next spectra
($S_{T-1,i-1}$ and $S_{T+1,i-1}$) are assigned to the lower bin, while those found exclusively
in the upper previous/next spectra ($S_{T-1,i+1}$ and $S_{T+1,i+1}$) are assigned to the
upper bin. Ions that are found in both sets of spectra are assigned proportionally to
each bin where the proportion is set to the summed peak intensity for both spectra,
e.g., $(S_{T-1,i-1} + S_{T+1,i-1}) / (S_{T-1,i-1} + S_{T+1,i-1} + S_{T-1,i+1} + S_{T+1,i+1})$ for the
lower bin. Peaks that are found in none of the previous and next overlapping
spectra are assumed to be noise. New spectra are built from the deconvoluted peaks
in both the lower and upper bins. Since this algorithm only needs to consider three
half cycles at a time, deconvolution can happen quickly and in memory, with
minimal impact on file reading speeds.
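The bin assignment rule described above can be sketched for a single spectrum as follows. This is a simplified illustration, not the EncyclopeDIA code: spectra are represented as hypothetical dictionaries keyed by an already-binned fragment m/z, whereas a real implementation would match peaks within a mass tolerance.

```python
def deconvolute(center, lower_neighbors, upper_neighbors):
    """Split one overlapped MS/MS spectrum into lower and upper half-window bins.

    center: {fragment_mz_bin: intensity} for the spectrum being deconvoluted.
    lower_neighbors: previous/next half-cycle spectra offset to lower m/z.
    upper_neighbors: previous/next half-cycle spectra offset to higher m/z.
    """
    lower_bin, upper_bin = {}, {}
    for mz, intensity in center.items():
        lower_sum = sum(s.get(mz, 0.0) for s in lower_neighbors)
        upper_sum = sum(s.get(mz, 0.0) for s in upper_neighbors)
        total = lower_sum + upper_sum
        if total == 0:
            continue                     # seen in no neighbor: treated as noise
        # Exclusive peaks get a proportion of 1; shared peaks are split.
        if lower_sum > 0:
            lower_bin[mz] = intensity * lower_sum / total
        if upper_sum > 0:
            upper_bin[mz] = intensity * upper_sum / total
    return lower_bin, upper_bin


center = {500: 100.0, 600: 80.0, 700: 60.0, 800: 10.0}
lower = [{500: 40.0, 700: 10.0}, {500: 50.0, 700: 20.0}]
upper = [{600: 30.0, 700: 30.0}, {600: 35.0, 700: 60.0}]
lower_bin, upper_bin = deconvolute(center, lower, upper)
```

In this toy example the peak at 700 is split 1:3 between the bins because the neighboring intensities sum to 30 (lower) and 90 (upper), and the peak at 800 is discarded because neither set of neighbors contains it.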
Decoy library entries. A decoy library entry is created for every target library
entry. To generate a decoy, first the target peptide sequence (except for digestion
enzyme-specific termini) is reversed, ensuring that the decoy maintains its
appearance as a tryptic peptide. Then fragment ions corresponding to amino acids
(B/Y for CID, C/Z/Z+1 for ETD) or their expected neutral losses due to modifications
(e.g., phosphorylation) are calculated for both target and decoy entries. If
the precursor charge state is greater than +2, then +2 fragment ions are also
considered. Uncommon neutral loss ions such as A-type ions or loss of water or
ammonia are not considered to limit the likelihood of false detections. Fragment
ions that correspond to target sequence m/zs are transferred to new decoy m/zs
such that their ion type and index are kept consistent. Delta mass errors in each
fragment ion are also maintained to preserve consistency, and all peaks corresponding
to the fragment delta mass window are transferred if the library is collected
in profile mode. Ions that cannot be assigned to amino acids (such as those
corresponding to precursor ions, background noise or interference) are not used by
EncyclopeDIA.
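A minimal sketch of this decoy construction for singly charged b/y ions is shown below. It assumes a trypsin-like enzyme (so only the C-terminal residue is held in place), covers only unmodified peptides, and uses a small table of monoisotopic residue masses; the function names and the peak dictionary layout are hypothetical.

```python
# Monoisotopic residue masses (Da) for a few amino acids used in the example.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
                "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
                "E": 129.04259, "R": 156.10111}
PROTON, WATER = 1.007276, 18.010565


def by_ions(peptide):
    """Theoretical +1 b and y ion m/z values keyed by (ion type, index)."""
    ions = {}
    for i in range(1, len(peptide)):
        ions[("b", i)] = sum(RESIDUE_MASS[a] for a in peptide[:i]) + PROTON
        ions[("y", i)] = sum(RESIDUE_MASS[a] for a in peptide[-i:]) + WATER + PROTON
    return ions


def make_decoy(peptide, library_peaks):
    """Reverse all but the C-terminal residue and move annotated peaks.

    library_peaks: {(ion type, index): (observed m/z, intensity)}.
    Each peak keeps its ion type, index, intensity, and delta mass error.
    """
    decoy = peptide[:-1][::-1] + peptide[-1]
    target_ions, decoy_ions = by_ions(peptide), by_ions(decoy)
    decoy_peaks = {}
    for key, (observed_mz, intensity) in library_peaks.items():
        delta = observed_mz - target_ions[key]          # preserve mass error
        decoy_peaks[key] = (decoy_ions[key] + delta, intensity)
    return decoy, decoy_peaks


decoy, peaks = make_decoy("PEPTLK", {("y", 1): (147.1133, 10.0),
                                     ("b", 1): (98.0605, 5.0)})
```

Because the C-terminal lysine is not moved, the y1 ion is identical in target and decoy, while the b1 ion shifts from proline to leucine and carries the same delta mass error as the target peak.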
Ion weighting estimation. While searching, a unique background is calculated for
each precursor isolation window using the prevalence of each fragment ion in the
library spectra considered for that window (Supplementary Figure 12). This
background helps estimate the interference frequency for any given ion and is used
to weight some scores. This distribution is calculated as the frequency that any
nominal m/z fragment ion (rounded by truncation) appears in entries from the
library within the specified precursor window filter. m/z frequencies are calculated
out to 4000 and a pseudocount is applied to every m/z bin to avoid divide by zero
frequency errors.
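The background distribution described above can be sketched as a per-window frequency table over nominal fragment m/z bins. The exact way the pseudocount enters the frequency is not stated in the text, so the formula below (pseudocount added to both numerator and denominator) is an assumption, as are the function name and the input layout.

```python
from collections import Counter


def background_frequencies(library_entries, window, max_mz=4000, pseudocount=1):
    """Frequency of each nominal fragment m/z among library entries in a window.

    library_entries: iterable of (precursor_mz, [fragment_mz, ...]) tuples.
    window: (lower, upper) precursor isolation window boundaries.
    Returns a list of length max_mz indexed by nominal (truncated) m/z.
    """
    lower, upper = window
    counts = Counter()
    entries_in_window = 0
    for precursor_mz, fragments in library_entries:
        if not lower <= precursor_mz < upper:
            continue
        entries_in_window += 1
        # Truncate to nominal m/z; count each bin once per library entry.
        for nominal in {int(mz) for mz in fragments if 0 <= mz < max_mz}:
            counts[nominal] += 1
    denominator = entries_in_window + pseudocount
    return [(counts[b] + pseudocount) / denominator for b in range(max_mz)]


entries = [(500.3, [175.119, 300.2, 300.9]),
           (510.0, [175.1, 401.2]),
           (700.0, [175.1])]          # outside the window below
frequencies = background_frequencies(entries, (495.0, 519.0))
```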
Primary scoring and feature scoring functions. The primary score in Encyclo-
peDIA conceptually draws on the X!Tandem HyperScore. Unlike scoring functions
like XCorr in Sequest, the HyperScore does not attempt to account or penalize for
ions that do not match the peptide in question, making it ideal for DIA analysis
where coeluting peptides are common. The score function is the weighted dot
product of the intensities in the acquired spectrum (I) and the library spectrum (P),
weighted by a correlation score vector (C), which is discussed in detail in the
Chromatogram Library ELIB Generation section. Again, any ions in the library
spectrum that do not correspond to the amino acid sequence are not considered in
this score. The dot product is multiplied by the factorial of the number of matching
ions:
$$\text{Primary score} = \log_{10}\left(\left(\sum_{i=0}^{n} I_i P_i C_i\right) \cdot n!\right) \quad (1)$$
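As an illustration of Eq. 1, the sketch below computes the score for a set of matched fragment ions. The dictionary-based inputs, the function name, and the default correlation weight of 1.0 for ions without an entry are assumptions made for this example; they are not taken from the EncyclopeDIA source.

```python
import math


def primary_score(acquired, library, correlation):
    """log10 of the weighted dot product times the factorial of matched ions.

    acquired, library: {ion label: intensity}; correlation: {ion label: weight}.
    Only ions present in the library and observed in the acquired spectrum count.
    """
    matched = [ion for ion in library if acquired.get(ion, 0.0) > 0.0]
    dot = sum(acquired[ion] * library[ion] * correlation.get(ion, 1.0)
              for ion in matched)
    if dot <= 0.0:
        return float("-inf")             # nothing matched, score undefined
    n = len(matched)
    # log10(dot * n!) computed in log space so large n does not overflow.
    return math.log10(dot) + math.lgamma(n + 1) / math.log(10)


score = primary_score(
    acquired={"y3": 10.0, "y4": 20.0, "b2": 0.0},
    library={"y3": 1.0, "y4": 2.0, "b2": 3.0},
    correlation={"y3": 1.0, "y4": 0.5},
)
```

Here two ions match, the weighted dot product is 10 + 20 = 30, and the score is log10(30 × 2!) = log10(60).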
Sometimes modified peptides (for example, oxidized peptides) are present in
the same precursor isolation window as their unmodified forms. Since these
peptides often share several fragment ions, we require that at least 25% of the
score contribution for modified peptides come from ions that exclusively indicate
that modification in cases where any of up to four isotopic peaks from the
modified/unmodified peptide pairs fall in the same window.
Several more computationally expensive secondary feature scores
(Supplementary Data 3) are calculated once peaks are assigned. Briefly, the scores
are divided to cover various classes of features: overall scoring (deltaCN, eValue,
logDotProduct, logWeightedDotProduct, xCorrLib, xCorrModel), fragment ion
accuracy (sumOfSquaredErrors, weightedSumOfSquaredErrors,
numberOfMatchingPeaks, averageAbsFragDeltaMass,
averageFragmentDeltaMass), precursor ion accuracy (isotopeDotProduct,
averageAbsPPM, averagePPM), and retention time accuracy (deltaRT). The
deltaRT score is only used after retention time alignment has been performed. All
of these scores are fed to Percolator 3.1 for target/decoy FDR analysis.
Retention time alignment. Accuracy and stability of retention time alignments are
critical for EncyclopeDIA. Consequently, we designed an algorithm that works
analogously to how we visualize densities. This approach uses two-dimensional
kernel density estimates (KDE) that are much less prone to failure as compared to
typical line fitting approaches such as LOESS in situations with grossly variable
numbers of points and outliers. In this approach each X/Y coordinate is estimated
as a symmetrical, two-dimensional kernel based on a cosine-based Gaussian
approximation. Following Silverman's rule41 the KDE bandwidth is set to:
$$\text{Bandwidth} = N^{-1/6} \cdot \frac{\text{stdev}(x) + \text{stdev}(y)}{2} \quad (2)$$
where N is the number of matched peptides. The kernel's standard deviation is set
to the bandwidth (analogous to full width at half max) divided by $2\sqrt{2\ln(2)}$. This
distribution is stamped at every X/Y coordinate on a 1000 by 1000 grid mapping
from the lowest and highest retention times in both the X and Y dimensions. Once
the KDE is calculated, the optimal fit is traced using a ridge walking algorithm that
traces the mode of the KDE across retention time (Supplementary Figure 13). In
this algorithm the highest point in the KDE is identified and the line is fit in
increasing retention time by moving to the highest local grid point to the north
(increased sample retention time), east (increased library retention time), or
northeast. If north and east are both the highest local point, then the line moves to
the northeast. This is performed iteratively until the line is fit across the increasing
retention time. Then the same ridge walk is performed in decreasing retention time
by moving south, west, or southwest. This approach forces a monotonic line (it can
never find a negative retention time change) that follows where the largest number of
X/Y coordinates lie.
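The KDE and ridge walk can be sketched compactly with NumPy. This is a simplified illustration under several assumptions: it uses an exact Gaussian kernel rather than the cosine-based approximation, a 100 by 100 grid rather than 1000 by 1000, and hypothetical function names. It is meant to show the monotonic ridge-following idea, not to reproduce EncyclopeDIA's output.

```python
import numpy as np


def _ridge_walk(density, start, step):
    """Greedy walk from `start`; step=+1 moves north/east, step=-1 south/west."""
    rows, cols = density.shape
    i, j = start
    path = []
    while True:
        east, north, diag = (i + step, j), (i, j + step), (i + step, j + step)
        options = [m for m in (diag, east, north)
                   if 0 <= m[0] < rows and 0 <= m[1] < cols]
        if not options:
            return path                       # reached the corner of the grid
        best = max(options, key=lambda m: density[m])
        # An exact east/north tie for the highest point resolves to the diagonal.
        if len(options) == 3 and density[east] == density[north] >= density[diag]:
            best = diag
        i, j = best
        path.append(best)


def align_retention_times(library_rt, sample_rt, grid=100):
    """Return monotonic (library RT, sample RT) points tracing the KDE ridge."""
    x = np.asarray(library_rt, dtype=float)
    y = np.asarray(sample_rt, dtype=float)
    bandwidth = len(x) ** (-1.0 / 6.0) * (x.std() + y.std()) / 2.0   # Eq. 2
    sigma = bandwidth / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    gx = np.linspace(x.min(), x.max(), grid)
    gy = np.linspace(y.min(), y.max(), grid)
    kx = np.exp(-0.5 * ((x[:, None] - gx[None, :]) / sigma) ** 2)
    ky = np.exp(-0.5 * ((y[:, None] - gy[None, :]) / sigma) ** 2)
    density = kx.T @ ky            # density[a, b] is the KDE at (gx[a], gy[b])
    start = tuple(int(v) for v in np.unravel_index(np.argmax(density), density.shape))
    backward = _ridge_walk(density, start, -1)[::-1]
    forward = _ridge_walk(density, start, +1)
    path = backward + [start] + forward
    return (np.array([gx[a] for a, _ in path]),
            np.array([gy[b] for _, b in path]))
```

Because every step increases (or, in the backward pass, decreases) at least one grid index and never reverses the other, the traced line is monotonic by construction, which mirrors the property described in the text.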
Retention time alignment mixture model. After the alignment is performed, we
use the delta retention time data to produce a mixture model to determine outliers.
We calculate a Gaussian distribution representing correct retention time matches
using the median delta retention time as the Gaussian mean and interquartile range
divided by 1.35 as the Gaussian standard deviation. We use a unit distribution to
represent incorrect retention time matches. Starting where the distribution priors
are set to 0.5, we run 10 iterations of a PeptideProphet-like mixture model42 to fit
the two distributions to the delta retention time data using an Expectation Max-
imization algorithm43. Peptide matches with posterior error probability estimations
that are less than 5% likely to be in the correct retention time distribution are
considered outliers.
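A two-component mixture of this kind can be sketched with a short Expectation Maximization loop. Several details below are assumptions rather than facts from the text: the "unit distribution" is modeled as a uniform density over the observed delta retention time range, and the M-step updates the prior and the Gaussian mean and standard deviation. The function names are hypothetical.

```python
import numpy as np


def _posterior_correct(d, mu, sigma, prior, uniform_density):
    """Posterior probability that each delta RT belongs to the Gaussian."""
    gauss = np.exp(-0.5 * ((d - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    weighted = prior * gauss
    return weighted / (weighted + (1.0 - prior) * uniform_density)


def retention_time_outliers(delta_rt, iterations=10, threshold=0.05):
    """Boolean mask marking peptides unlikely to be correct RT matches."""
    d = np.asarray(delta_rt, dtype=float)
    q25, q75 = np.percentile(d, [25, 75])
    mu = float(np.median(d))                     # initial Gaussian mean
    sigma = max((q75 - q25) / 1.35, 1e-6)        # initial Gaussian stdev
    uniform_density = 1.0 / max(d.max() - d.min(), 1e-6)
    prior = 0.5
    for _ in range(iterations):
        post = _posterior_correct(d, mu, sigma, prior, uniform_density)  # E-step
        prior = float(post.mean())                                       # M-step
        mu = float(np.sum(post * d) / np.sum(post))
        sigma = max(float(np.sqrt(np.sum(post * (d - mu) ** 2) / np.sum(post))), 1e-6)
    post = _posterior_correct(d, mu, sigma, prior, uniform_density)
    return post < threshold
```

Using the median and interquartile range for the starting values keeps the initial Gaussian robust to the outliers that the model is trying to find.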
Retention time alignment across experiments. For each passing peptide, we
determine the experiment that produced the best scoring match and set that match
aside as a canonical peptide representation. We choose the experiment with the most
canonical peptides as an anchor and retention time align all of the experiments
(and their canonical peptides) to that anchor. Mixture models (described above) for
these retention time alignments are calculated and outliers are removed if the
local-anchor delta retention time is less than 0.1% likely to fit the mixture model. New
retention times for outlier-removed peptides and peptides that were only assigned
globally are inferred using the anchor retention time.
Peptide and protein FDR filtering across experiments. We concatenate peptide
feature files from all experiments in a study and run Percolator 3.1 to perform
global peptide FDR filtering at 1%. Using this list of peptides, we generate a
parsimonious list of protein groups using a greedy algorithm. Here peptides are
assigned to protein groups with the highest protein score:
$$\text{Protein score}(P) = N - \sum_{p \in P}^{N} \mathrm{PEP}_p \qquad (3)$$
where the sum of the peptide (p) posterior error probabilities ($\mathrm{PEP}_p$) is
subtracted from the number of peptides (N) assigned to that protein (P). Protein
groups are sorted on the lowest $\mathrm{PEP}_p$ assigned to them18 and then stringently
target/decoy filtered to 1% protein FDR.
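The protein score in Eq. (3) and one possible greedy assignment can be sketched as follows. The text does not spell out the greedy procedure, so the loop here (repeatedly taking the protein with the highest score over the still-unassigned peptides) is an assumed interpretation, and the names `protein_score` and `greedy_protein_groups` are hypothetical.

```python
def protein_score(peps):
    """Eq. (3): number of peptides minus the sum of their posterior error probabilities."""
    return len(peps) - sum(peps)

def greedy_protein_groups(protein_to_peptides, pep):
    """Assign each peptide to one protein, highest-scoring protein first.

    protein_to_peptides: dict of protein name -> iterable of peptide names
    pep: dict of peptide name -> posterior error probability
    Returns a list of (protein, assigned peptides, score) tuples.
    """
    unassigned = set(pep)
    groups = []
    while unassigned:
        best = None
        for protein in sorted(protein_to_peptides):
            peptides = unassigned & set(protein_to_peptides[protein])
            if not peptides:
                continue
            score = protein_score([pep[p] for p in peptides])
            if best is None or score > best[2]:
                best = (protein, sorted(peptides), score)
        if best is None:
            break  # remaining peptides map to no listed protein
        groups.append(best)
        unassigned -= set(best[1])
    return groups
```

With proteins A = {p1, p2, p3}, B = {p3, p4} and C = {p2}, this sketch reports A first (three peptides) and then B with only p4, while C receives no peptides and is not reported.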
Automated transition refinement. Fragment ion interference is common when
analyzing wide-window data. While fragment ions that show interference may still
be useful for detecting peptides, those ions must be screened prior to quantitation
to ensure an accurate measurement. We designed a non-parametric approach to
selecting the best ions for quantitation. We first Savitzky-Golay smooth44 the
fragment ion chromatograms and then normalize them to have unit integrated
intensity. To simplify the smoothing mathematics, we make the assumption that
cycle times are consistent within the time frame of a single peak, thus removing the
need for interpolation over retention time. After normalization the chromatograms
of quantitatively useful ions line up while those of interfered ions will have either
higher or lower unit-normalized intensities at different retention times. We
calculate the median normalized intensity at each retention time point as an
approximation for the peptide peak shape. We then determine peak boundaries by
tracing descent of the median peak shape from the maximum normalized intensity
on either side of the peak. The boundaries are set to the minimum point at which
the median peak trace starts increasing for >2 consecutive spectra or any point
where the trace drops to less than 1% of the maximum. At that point we calculate a
Pearson's correlation coefficient for the similarity between each fragment ion
chromatogram with that of the median peak shape between those boundaries.
Peaks that match with a correlation coefficient of at least 0.9 are considered
quantitative, while those that match with coefficients of at least 0.75 are considered
useful for detection purposes.
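The refinement steps above can be illustrated with a short Python sketch. It assumes the chromatograms are already smoothed and sampled on a common retention time axis, and it treats unit integrated intensity as a simple sum under the constant cycle time assumption. The function names and the label strings ("quantitative", "detection", "interfered") are hypothetical, and the handling of a trace that reaches the end of the window without meeting either boundary rule is an assumption.

```python
import math
import statistics

def unit_normalize(trace):
    total = sum(trace)
    return [v / total for v in trace] if total > 0 else list(trace)

def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def find_boundary(shape, apex, step):
    """Walk away from the apex (step=-1 left, +1 right) to a peak boundary."""
    threshold = 0.01 * shape[apex]
    lowest, rising, i = apex, 0, apex
    while 0 <= i + step < len(shape):
        i += step
        if shape[i] < threshold:
            return i  # trace dropped below 1% of the maximum
        if shape[i] > shape[i - step]:
            rising += 1
            if rising > 2:
                return lowest  # rose for >2 consecutive spectra
        else:
            rising = 0
            if shape[i] <= shape[lowest]:
                lowest = i
    return i  # assumed: use the edge of the extracted window

def classify_transitions(chromatograms):
    """chromatograms: dict of fragment ion name -> list of intensities."""
    normalized = {name: unit_normalize(t) for name, t in chromatograms.items()}
    # Median normalized intensity at each time point approximates the peak shape.
    shape = [statistics.median(column) for column in zip(*normalized.values())]
    apex = shape.index(max(shape))
    left = find_boundary(shape, apex, -1)
    right = find_boundary(shape, apex, +1)
    labels = {}
    for name, trace in normalized.items():
        r = pearson(trace[left:right + 1], shape[left:right + 1])
        labels[name] = "quantitative" if r >= 0.9 else "detection" if r >= 0.75 else "interfered"
    return labels
```

Using the median rather than the mean keeps a single interfered ion from distorting the reference peak shape, which is the property the non-parametric approach relies on.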
Fragment ion quantification and background subtraction. We calculate
trapezoidal peak areas across Savitzky-Golay smoothed chromatograms. Analogous to
Skyline, peak intensities are background subtracted by removing a peak area
rectangle with a height equal to the largest intensity of either of the boundary edges. If
the area of the rectangle is larger than the area of the peak the intensity is set to
zero.
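As a worked illustration of this background subtraction, the sketch below integrates a peak with the trapezoidal rule and removes the boundary rectangle. The function name `background_subtracted_area` is hypothetical, and the inputs are assumed to be the already smoothed intensities between the two peak boundaries.

```python
def background_subtracted_area(times, intensities):
    """Trapezoidal peak area minus a rectangle at the height of the taller boundary edge."""
    points = list(zip(times, intensities))
    area = sum((t1 - t0) * (i0 + i1) / 2.0
               for (t0, i0), (t1, i1) in zip(points, points[1:]))
    background = max(intensities[0], intensities[-1]) * (times[-1] - times[0])
    # If the rectangle is larger than the peak, the intensity is set to zero.
    return max(area - background, 0.0)
```

For times 0 to 4 and intensities 2, 6, 10, 6, 4, the trapezoidal area is 25 and the rectangle is 4 x 4 = 16, leaving a background-subtracted area of 9.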
Peptide quantification and transition choice across experiments. Transition
interference changes on a sample by sample basis. We rank quantitative transitions
(>0.9 correlation) based on the sum of their correlation scores across all
experiments (effectively counting the number of samples in which they are observed). In
addition, for each transition we calculate a global interference score:
$$\text{Interference score}(t) = \frac{\sum_{s} I_{t,s}\,[C_{t,s} < 0.9]}{\sum_{s} I_{t,s}\,[C_{t,s} \geq 0.9]} \qquad (4)$$
which represents the sum of transition (t) intensities ($I_{t,s}$) across experiments (s)
that show interference ($C_{t,s} < 0.9$) over those that do not ($C_{t,s} \geq 0.9$). Transitions with
interference scores > 0.2 are deemed untrustworthy for quantification and are
dropped. Peptide quantities are set to the sum of the top five transitions that pass
these criteria, where peptides with fewer than 3 quantitative transitions are not
carried forward. We require additional stringent criteria for our time course study.
Specifically, we required that each peptide be measured in every replicate of at least
one time point, and that cross experiment CVs (estimated using quantities from
each time point corrected with a linear model) be less than 20%.
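The interference score of Eq. (4) and the transition choice can be sketched as below. Each transition is represented as a list of (intensity, correlation) pairs, one per experiment. That data layout and the function names are assumptions for illustration, as is returning an empty list when fewer than three transitions pass.

```python
def interference_score(observations, cutoff=0.9):
    """Eq. (4): interfered intensity divided by interference-free intensity."""
    interfered = sum(i for i, c in observations if c < cutoff)
    clean = sum(i for i, c in observations if c >= cutoff)
    return interfered / clean if clean > 0 else float("inf")

def choose_transitions(transitions, max_score=0.2, top_n=5, min_n=3):
    """Rank trusted transitions by summed correlation and keep the top five.

    transitions: dict of transition name -> list of (intensity, correlation)
    Returns [] when fewer than min_n transitions pass (peptide not carried forward).
    """
    trusted = [name for name, obs in transitions.items()
               if interference_score(obs) <= max_score]
    ranked = sorted(trusted,
                    key=lambda name: sum(c for _, c in transitions[name]),
                    reverse=True)
    return ranked[:top_n] if len(ranked) >= min_n else []
```

A transition with intensity 100 in an interference-free experiment and 30 in an interfered one scores 0.3 and is dropped, whereas 200 and 20 gives 0.1 and is retained.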
Protein quantification and statistical testing. Protein quantities were calculated
as the sum of peptide quantities. We used Extraction of Differential Gene
Expression (EDGE) 3.6 (ref. 36) to statistically test for reproducible changes across the
time course study. We performed k-means clustering of proteins that passed an
EDGE q-value < 0.01 using five groups using 1000 random starting points with
1000 iterations. We estimated five groups by calculating the sum of within squared
errors of each K model from 1 to 15 and estimating the first point where the change
in the sum of within squared errors was flat (Supplementary Figure 14).
Gene Ontology enrichment. We performed Gene Ontology enrichment of
significantly changing proteins using the online PANTHER Overrepresentation Test45
(release 20170413) with the Homo sapiens Gene Ontology database (release
2017-10-24) using a background of all proteins consistently detected in our experiments.
After removing terms with fewer than 20 proteins (to avoid weakly powered
classes) and more than 1000 proteins (to avoid vague classes), we applied
Benjamini-Hochberg FDR correction and filtered enrichment tests to a FDR < 0.05.
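The term-size filter followed by Benjamini-Hochberg correction can be sketched as follows. This is a generic implementation of the standard Benjamini-Hochberg step-up adjustment, not the PANTHER code. The input layout (term name mapped to a size and a raw p-value) and the function names are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enriched_terms(terms, min_size=20, max_size=1000, fdr=0.05):
    """terms: dict of term name -> (number of proteins, raw p-value)."""
    # Drop weakly powered (<20 proteins) and vague (>1000 proteins) classes first.
    kept = {name: p for name, (size, p) in terms.items()
            if min_size <= size <= max_size}
    names = list(kept)
    q_values = benjamini_hochberg([kept[n] for n in names])
    return {n: q for n, q in zip(names, q_values) if q < fdr}
```

Filtering by term size before the correction reduces the number of tests, so the adjusted values depend on which terms survive the size filter.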
Code availability. EncyclopeDIA is implemented in Java 1.8 as both a command
line and a stand-alone GUI application. EncyclopeDIA supports the HUPO PSI
mzML standard for reading raw MS/MS data, and can construct DLIB DDA-based
spectrum libraries from Skyline/Bibliospec BLIB files, NIST MSP files, or HUPO
PSI TraML files. Additionally, EncyclopeDIA results can be imported into
Skyline25 to enable further visualization and downstream processing. EncyclopeDIA is
heavily optimized and multi-threaded such that searches can be performed on
conventional desktop computers with limited RAM and processing power. We
have released source code and cross platform (Windows, Mac OS X, Linux)
binaries for EncyclopeDIA on Bitbucket at: https://bitbucket.org/searleb/encyclopedia
under the open source Apache 2 license.
Data availability
All mass spectrometry mzML and RAW data files (see Supplementary Data 4 for raw
data annotations) are available on the Chorus Project (project identifier 1433,
chromatogram library data for human
[https://chorusproject.org/anonymous/download/experiment/32fa43c0f9ba486eb3eedeb689f87765]
and yeast
[https://chorusproject.org/anonymous/download/experiment/b98531fe7fe246cbb7e45ce065fe54a9],
serum starvation data proteomics
[https://chorusproject.org/anonymous/download/experiment/e0659292e919414787ec112dca4c57c1]
and phosphoproteomics
[https://chorusproject.org/anonymous/download/experiment/c24893cd7115446dab4d7eeb7fde2506] data)
and at the MassIVE proteomics repository (project identifier MSV000082805
[https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=e340c79fbdc64e14a710265761bfeed5]).
All other data supporting the findings of this study are available from the corresponding
author on reasonable request. A reporting summary for this article is available as a
Supplementary Information file.
Received: 2 March 2018 Accepted: 16 October 2018
References
1. Mertins, P. et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55–62 (2016).
2. Zhang, B. et al. Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387 (2014).
3. Venable, J. D., Dong, M. Q., Wohlschlegel, J., Dillin, A. & Yates, J. R. Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat. Methods 1, 39–45 (2004).
4. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell Proteom. 11, O111.016717 (2012).
5. Stahl, D. C., Swiderek, K. M., Davis, M. T. & Lee, T. D. Data-controlled automation of liquid chromatography/tandem mass spectrometry analysis of peptide mixtures. J. Am. Soc. Mass. Spectrom. 7, 532–540 (1996).
6. Panchaud, A. et al. Precursor acquisition independent from ion count: how to dive deeper into the proteomics ocean. Anal. Chem. 81, 6481–6488 (2009).
7. Li, G. Z. et al. Database searching and accounting of multiplexed precursor and product ion spectra from the data independent analysis of simple and complex peptide mixtures. Proteomics 9, 1696–1719 (2009).
8. Tsou, C. C. et al. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat. Methods 12, 258–264 (2015). 7 p following 264.
9. Weisbrod, C. R., Eng, J. K., Hoopmann, M. R., Baker, T. & Bruce, J. E. Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J. Proteome Res. 11, 1621–1632 (2012).
10. Röst, H. L. et al. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data [letter]. Nat. Biotechnol. 32, 219–223 (2014).
11. Bruderer, R. et al. Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol. Cell Proteom. 14, 1400–1410 (2015).
12. Wang, J. et al. MSPLIT-DIA: sensitive peptide identification for data-independent acquisition. Nat. Methods 12, 1106 (2015).
13. Ting, Y. S. et al. PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat. Methods 14, 903–908 (2017).
14. Egertson, J. D., MacLean, B., Johnson, R., Xuan, Y. & MacCoss, M. J. Multiplexed peptide analysis using data-independent acquisition and Skyline. Nat. Protoc. 10, 887–903 (2015).
15. Schubert, O. T. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat. Protoc. 10, 426–441 (2015).
16. Röst, H. L. et al. TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics. Nat. Methods 13, 777–783 (2016).
17. Fenyö, D. & Beavis, R. C. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 75, 768–774 (2003).
18. The, M., MacCoss, M. J., Noble, W. S. & Käll, L. Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. J. Am. Soc. Mass. Spectrom. 27, 1719–1727 (2016).
19. Lam, H. et al. Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 5, 873 (2008).
20. Frewen, B. E., Merrihew, G. E., Wu, C. C., Noble, W. S. & MacCoss, M. J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 78, 5678–5684 (2006).
21. Noble, W. S. Mass spectrometrists should search only for peptides they care about. Nat. Methods 12, 605 (2015).
22. Rosenberger, G. et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci. Data 1, 140031 (2014).
23. Bruderer, R. et al. Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol. Cell Proteom. 16, 2296–2309 (2017).
24. Kelstrup, C. D. et al. Performance evaluation of the Q Exactive HF-X for shotgun proteomics. J. Proteome Res. 17, 727–738 (2017).
25. MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
26. Ghaemmaghami, S. et al. Global analysis of protein expression in yeast. Nature 425, 737–741 (2003).
27. Escher, C. et al. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–1121 (2012).
28. Bruderer, R., Bernhardt, O. M., Gandhi, T. & Reiter, L. High-precision iRT prediction in the targeted analysis of data-independent acquisition and its impact on identification and quantitation. Proteomics 16, 2246–2256 (2016).
29. Keller, A., Bader, S. L., Shteynberg, D., Hood, L. & Moritz, R. L. Automated validation of results and removal of fragment ion interferences in targeted analysis of data-independent acquisition mass spectrometry (MS) using SWATHProphet. Mol. Cell Proteom. 14, 1411–1418 (2015).
30. Teo, G. et al. mapDIA: preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry. J. Proteomics 129, 108–120 (2015).
31. Rosenberger, G. et al. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat. Methods 14, 921–927 (2017).
32. Navarro, P. et al. A multicenter study benchmarks software tools for label-free proteome quantification. Nat. Biotechnol. 34, 1130 (2016).
33. Pardee, A. B. G1 events and regulation of cell proliferation. Science 246, 603–608 (1989).
34. Levin, V. A. et al. Different changes in protein and phosphoprotein levels result from serum starvation of high-grade glioma and adenocarcinoma cell lines. J. Proteome Res. 9, 179–191 (2010).
35. Pirkmajer, S. & Chibalin, A. V. Serum starvation: caveat emptor. Am. J. Physiol. Cell Physiol. 301, C272–C279 (2011).
36. Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G. & Davis, R. W. Significance analysis of time course microarray experiments. Proc. Natl Acad. Sci. USA 102, 12837–12842 (2005).
37. Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).
38. Lawrence, R. T., Searle, B. C., Llovet, A. & Villén, J. Plug-and-play analysis of the human phosphoproteome by targeted high-resolution mass spectrometry. Nat. Methods 13, 431–434 (2016).
39. Searle, B. C., Lawrence, R. T., MacCoss, M. J. & Villén, J. Thesaurus: quantifying phosphoprotein positional isomers. Preprint at bioRxiv https://doi.org/10.1101/421214 (2018).
40. Reiter, L. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430 (2011).
41. Silverman, B. W. Density Estimation for Statistics and Data Analysis (CRC Press, 1986).
42. Keller, A., Nesvizhskii, A. I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).
43. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39, 1–38 (1977).
44. Savitzky, A. & Golay, M. J. E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627–1639 (1964).
45. Mi, H., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551–1566 (2013).
Acknowledgements
We would like to thank members of the Villén and MacCoss labs for critical discussions.
We additionally thank N. Shulman for implementing Skyline visualization of
EncyclopeDIA reports, and S. Just, P. Seitzer, and S. Ludwigsen for EncyclopeDIA bug reports
and patches. B.C.S. is supported by F31 GM119273; L.K.P. is supported by F31
AG055257. This work is supported by P41 GM103533, R21 CA192983, and U54
HG008097 to M.J.M.; and R35 GM119536, R01 AG056359, and a research grant from
the W.M. Keck Foundation to J.V.
Author contributions
B.C.S. and M.J.M. conceived the study. B.C.S., R.T.L., and M.J.M. designed the
experiments. B.C.S., L.K.P., and R.T.L. performed the experiments. B.C.S. designed and wrote
the software with input from L.K.P., J.D.E., and Y.S.T. B.C.S. and B.X.M. analyzed the
data. M.J.M. and J.V. supervised the work. B.C.S., L.K.P., J.D.E., Y.S.T., R.T.L., B.X.M.,
J.V., and M.J.M. wrote the paper.
Additional information
Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467-018-07454-w.
Competing interests: The MacCoss Lab at the University of Washington (members
B.C.S., L.K.P., J.D.E., Y.S.T., B.X.M. and M.J.M.) has a sponsored research agreement
with Thermo Fisher Scientific, the manufacturer of the instrumentation used in this
research. Additionally, M.J.M. is a paid consultant for Thermo Fisher Scientific. The
remaining authors declare no competing interests.
Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article's Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2018
... Previously, our lab demonstrated that LITs could be used effectively as stand-alone mass analyzers to measure low-input samples using an Orbitrap Eclipse™ Tribrid mass spectrometer 22 . In that work, we detected approximately 400 proteins from single cells using data-independent acquisition coupled with chromatogram libraries to help make detections 23 . While our Eclipse instrument configuration ignored the high-resolution Orbitrap mass analyzer, we performed those experiments in the context of a high-end Tribrid instrument. ...
... This collection of PRM-validated peptides, referred to as a "translation library," serves as a database of potential peptides to select for targeted assays. Translation libraries act as DIA chromatogram libraries 23,37 with the purpose of efficiently and quickly translating the chemical characteristics of library entries for target peptides from one instrument/acquisition approach or prediction space to the measurement space of the instrument used for PRMs. The software, combined with the translation library, is designed to schedule a PRM assay from a list of target accession numbers and other optionally desired accessions from a selected FASTA database. ...
Article
Full-text available
Advances in proteomics and mass spectrometry enable the study of limited cell populations, where high-mass accuracy instruments are typically required. While triple quadrupoles offer fast and sensitive low-mass specificity measurements, these instruments are effectively restricted to targeted proteomics. Linear ion traps (LITs) offer a versatile, cost-effective alternative capable of both targeted and global proteomics. Here, we describe a workflow using a hybrid quadrupole-LIT instrument that rapidly develops targeted proteomics assays from global data-independent acquisition (DIA) measurements without high-mass accuracy. Using an automated software approach for scheduling parallel reaction monitoring assays (PRM), we show consistent quantification across three orders of magnitude in a matched-matrix background. We demonstrate measuring low-level proteins such as transcription factors and cytokines with quantitative linearity below two orders of magnitude in a 1 ng background proteome without requiring stable isotope-labeled standards. From a 1 ng sample, we found clear consistency between proteins in subsets of CD4⁺ and CD8⁺ T cells measured using high dimensional flow cytometry and LIT-based proteomics. Based on these results, we believe hybrid quadrupole-LIT instruments represent a valuable solution to expanding mass spectrometry in a wide variety of laboratory settings.
... In contrast, DIA MS2 spectra are assumed to be highly multiplexed, which makes peptide identification more challenging, but spectral library-based [25][26][27] and library-free [28][29][30] methods have been developed to tackle the problem. Compared to well-studied peptide identification methods for DDA and DIA data, there are few tools 31 that natively support WWA DDA data. ...
Article
Full-text available
Liquid chromatography-mass spectrometry based proteomics, particularly in the bottom-up approach, relies on the digestion of proteins into peptides for subsequent separation and analysis. The most prevalent method for identifying peptides from data-dependent acquisition mass spectrometry data is database search. Traditional tools typically focus on identifying a single peptide per tandem mass spectrum, often neglecting the frequent occurrence of peptide co-fragmentations leading to chimeric spectra. Here, we introduce MSFragger-DDA+, a database search algorithm that enhances peptide identification by detecting co-fragmented peptides with high sensitivity and speed. Utilizing MSFragger’s fragment ion indexing algorithm, MSFragger-DDA+ performs a comprehensive search within the full isolation window for each tandem mass spectrum, followed by robust feature detection, filtering, and rescoring procedures to refine search results. Evaluation against established tools across diverse datasets demonstrated that, integrated within the FragPipe computational platform, MSFragger-DDA+ significantly increases identification sensitivity while maintaining stringent false discovery rate control. It is also uniquely suited for wide-window acquisition data. MSFragger-DDA+ provides an efficient and accurate solution for peptide identification, enhancing the detection of low-abundance co-fragmented peptides. Coupled with the FragPipe platform, MSFragger-DDA+ enables more comprehensive and accurate analysis of proteomics data.
... Proteomic data were searched using an empirically corrected library, and a quantitative analysis was performed to obtain a comprehensive proteomic profile. Proteins were identified and quantified using EncyclopeDIA and visualized with ScaffoldDIA using 1% false discovery thresholds at both the protein and peptide levels [21]. Protein-exclusive intensity values were assessed for quality using ProteiNorm [22]. ...
Article
Full-text available
Purpose Pancreatic ductal adenocarcinoma (PDAC) remains a leading cause of cancer-related deaths, and perineural invasion (PNI), in which cancer cells infiltrate nerves, enables metastasis in most patients. PNI is largely attributed to Schwann cells (SC) that, when activated, accelerate cancer cell migration towards nerves. However, this cancer-associated reprogramming is generally under-appreciated. Additionally, tumor extracellular vesicle (EV) facilitation of cancer aggravation is well documented, but more investigation is required to better understand their role in PNI. Here, we assessed whether PDAC EVs mediate PNI via SC activation using tissue-engineered in vitro platforms and PANC-1 and HPNE human cell lines as models. Methods NanoSight, Luminex®, and proteomic-pathway analyses characterized tumor (PANC-1) and healthy cell (HPNE) EVs. Human Schwann-like cells (sNF96.2) were embedded in decellularized nerve matrix hydrogels and then treated with EVs and a cargo-function-blocking antibody. Immunofluorescence and Luminex® multiplex assays assessed Schwann cell activation. Subsequently, sNF96.2 cells were co-cultured with EVs and either PANC-1 or HPNE cells; Transwell® invasion assays with SC-conditioned media were also conducted to establish a mechanism of in vitro PNI. Results PANC-1 EVs contained higher levels of interleukin-8 (IL-8) signaling-associated proteins than HPNE EVs. Within nerve-mimetic in vitro testbeds, PANC-1 EVs promoted sNF96.2 activation per cytoskeletal marker alterations and secretion of pro-tumorigenic cytokines, e.g., chemokine ligand-2 (CCL2), via IL-8 cargoes. Furthermore, the IL-8/CCL2 axis heightened PANC-1 invasiveness. Conclusion These findings highlight the potential role of PDAC EVs in PNI, which necessitates continued preclinical assessments with increased biodiversity to determine the efficacy of targeting IL-8/CCL2 for PNI.
Preprint
Full-text available
Objective: Atherosclerosis is a chronic inflammatory disease primarily affecting large arteries and is the leading cause of cardiovascular disease. MER proto-oncogene tyrosine kinase (MerTK) plays a key role in regulating efferocytosis, a process for the clearance of apoptotic cells. This study investigates the specific contribution of endothelial MerTK to atherosclerosis development. Approach and Results: Big data analytics, human microarray analyses, proteomics, and a unique mouse model with MerTK deficiency in endothelial cells (MerTKflox/floxTie2Cre) were utilized to elucidate the role of endothelial MerTK in atherosclerosis development. Our big data analytics, encompassing approximately 98881 cross analyses including 234 analyses for atherosclerosis in the aortic arch, along with human microarray data, reveal that inflammatory responses play a predominant role in atherosclerosis. In vivo, MerTKflox/floxTie2Cre mice and the littermate control MerTKflox/flox mice were used to establish an early stage of atherosclerosis model through a high-fat diet combined with AAV8-PCSK9 treatment. Consistent with big data analytics and human microarray analyses, our proteomics data showed that MerTKflox/floxTie2Cre mice demonstrated significantly enhanced proinflammatory signaling, mitochondrial dysfunction, and activated mitogen-activated protein kinase (MAPK) pathway compared to that of MerTKflox/flox mice. Endothelial MerTK deficiency induces endothelial dysfunction (enhanced endothelial inflammation, mitochondrial dysfunction, and activation of NADPH oxidases and MAPK signaling pathways) and subsequently causes smooth muscle cell (SMC) phenotypic alterations, ultimately promoting atherosclerosis development. Conclusions: Our findings provide strong evidence that endothelial MerTK impairment serves as a novel mechanism in promoting atherosclerosis development.
Article
Fibrosis is a key feature of a broad spectrum of cystic kidney diseases, especially autosomal recessive kidney disorders such as nephronophthisis (NPHP). However, its contribution to kidney function decline and the underlying molecular mechanism(s) remains unclear. Here, we show that kidney-specific deletion of Fbxw7 , the recognition receptor of the SCF FBW7 E3 ubiquitin ligase, results in a juvenile-adult NPHP-like pathology characterized by slow-progressing corticomedullary cysts, tubular degeneration, severe fibrosis, and gradual loss of kidney function. Expression levels of SOX9, a known substrate of FBW7, and WNT4, a potent pro-fibrotic factor and downstream effector of SOX9, were elevated upon loss of FBW7. Heterozygous deletion of Sox9 in compound mutant mice led to the normalization of WNT4 levels, reduced fibrosis, and preservation of kidney function without significant effects on cystic dilatation and tubular degeneration. These data suggest that FBW7-SOX9-WNT4-induced fibrosis drives kidney function decline in NPHP and, possibly, other forms of autosomal recessive kidney disorders.
Article
Polo‐like kinase 1 (Plk1) is a serine/threonine kinase involved in regulating the cell cycle. It is activated by aurora kinase B along with the cofactors Borealin, INCENP, and survivin. Plk1 is involved in the development of resistance to chemotherapeutics such as doxorubicin, Taxol, and gemcitabine. It has been shown that patients with higher levels of Plk1 have lower survival rates. Onvansertib is an ATP-competitive inhibitor of Plk1 in clinical trials for the treatment of tumors and has recently entered a trial for the treatment of KRAS mutant colorectal cancers (CRCs). In this study, we conducted an untargeted liquid chromatography–mass spectrometry (LC–MS) proteomics study as well as an untargeted lipidomics analysis of HCT 116 spheroids treated with onvansertib over a 72‐h time‐course experiment. Mass spectrometry imaging (MSI) showed that onvansertib begins to accumulate most prominently after 12 h of treatment and continues to accumulate through 72 h. Proteomic results displayed alterations to cell cycle control proteins and an increasing abundance of aurora kinase B and Borealin. The proteomics data also showed alterations to many lipid metabolism enzymes. The MSI lipidomics data indicated alterations to phosphatidylcholine (PC) lipids, with many lipids increasing in abundance over time or increasing until 12 h of onvansertib treatment and decreasing after that time point. In summary, these results suggest that onvansertib causes cells within the spheroid to halt at a certain phase of the cell cycle, in accordance with previous literature. Our findings suggest the S phase is likely interrupted, with observed alterations in cell cycle control proteins and PC lipid abundance.
Preprint
Full-text available
The cell that a virus replicates in, i.e., the producer cell, can alter the macromolecular composition and infectious capacity of the virions that are produced. Herpes simplex virus type 1 (HSV-1) primarily infects keratinocytes of the epidermis or oral mucosa prior to establishing latency in neurons of the peripheral nervous system, where the virus can persist for the lifetime of the host. Many cell lines that are used to amplify HSV-1 are derived from species and tissue types that are less physiologically relevant to HSV-1 disease. To understand if the producer cell type influences HSV-1 infection, we tested the infectivity of HSV-1 derived from immortalized African green monkey kidney cells (Vero), immortalized human keratinocytes (HaCaT), and primary human foreskin fibroblasts (HFF-1). We observed that the producer cell type alters the capacity of HSV-1 to produce viral proteins and infectious virions from infected cells, as well as its susceptibility to inhibition of replication by interferon treatment. HaCaT-derived HSV-1 consistently exhibited enhanced replication over HFF-1- or Vero-derived virus. To determine if the producer cell type changes the protein composition of virions, we performed an untargeted LC-MS/MS analysis of virions purified from each cell line. Comparison of virion-associated proteins revealed quantitative differences in the composition of both cellular and viral proteins, including ICP0, pUL24 and pUL42. These results highlight the influence that the producer cell type has on HSV-1 infection outcomes and suggest that cell-type-specific factors can alter HSV-1 and impact viral replication. Importance: Approximately 67% of the human population harbors HSV-1 infection. To study HSV-1 infection, laboratories utilize several different cell lines to propagate HSV-1 for downstream experiments. The type of cell used to produce a virus, i.e., the producer cell type, can alter the macromolecular composition, immunogenicity, and infectivity of the virions that are produced across several virus families. We found that the producer cell type of HSV-1 alters virion infectivity and virion protein composition. Therefore, the producer cell type may have implications in the spread of HSV-1 and subsequent disease outcomes in humans. Our results also raise concerns about how the use of different cell types to propagate HSV-1 may alter the outcome, interpretation, and reproducibility of experimental results.
Article
Full-text available
Abbreviations: CV, Coefficient of variation; DDA, Data-dependent acquisition; DIA, Data-independent acquisition; FDR, false discovery rate; MS1, Peptide precursor survey scan; MS2, Fragment ion scan; S1BF, Somatosensory cortex 1 barrel field. Summary: Comprehensive, reproducible and precise analysis of large sample cohorts is one of the key objectives of quantitative proteomics. Here, we present an implementation of data-independent acquisition using its parallel acquisition nature that surpasses the limitation of serial MS2 acquisition of data-dependent acquisition on a quadrupole ultra-high field Orbitrap mass spectrometer. In deep single shot data-independent acquisition, we identified and quantified 6,383 proteins in human cell lines using 2-or-more peptides/protein and over 7,100 proteins when including the 717 proteins that were identified on the basis of a single peptide sequence. 7,739 proteins were identified in mouse tissues using 2-or-more peptides/protein and 8,121 when including the 382 proteins that were identified on the basis of a single peptide sequence. Missing values for proteins were within 0.3 to 2.1%, and median coefficients of variation were 4.7 to 6.2% among technical triplicates. In very complex mixtures, we could quantify 10,780 proteins and 12,192 proteins when including the 1,412 proteins that were identified on the basis of a single peptide sequence. Using this optimized DIA, we investigated large-protein networks before and after the critical period for whisker experience-induced synaptic strength in the murine somatosensory cortex 1 barrel field. This work shows that parallel mass spectrometry enables proteome profiling for discovery with high coverage, reproducibility, precision and scalability.
Article
Full-text available
Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, as exemplified by the technique SWATH-MS, has emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale data sets. We demonstrate that statistical concepts developed for discovery proteomics based on spectrum-centric scoring can be adapted to large-scale DIA experiments that have been analyzed with peptide-centric scoring strategies, and we provide guidance on their application. We show that optimal tradeoffs between sensitivity and specificity require careful considerations of the relationship between proteins in the samples and proteins represented in the spectral library. We propose the application of a global analyte constraint to prevent the accumulation of false positives across large-scale data sets. Furthermore, to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported for the detected peptide queries, peptides and inferred proteins.
Article
Full-text available
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator’s processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method—grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein—in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
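The protein inference idea summarized in this abstract (group proteins that share the same set of theoretical peptides, then score each group by its single best-scoring peptide) can be illustrated with a minimal Python sketch. This is not Percolator's code or API: the function name `infer_protein_groups`, the input dictionaries, and the convention that a higher score is better are all assumptions made for illustration.

```python
from collections import defaultdict


def infer_protein_groups(protein_to_peptides, peptide_scores):
    """Hypothetical sketch: group proteins by identical theoretical peptide
    sets and score each group by its best observed peptide.

    protein_to_peptides: dict mapping protein ID -> iterable of peptide sequences
    peptide_scores: dict mapping observed peptide sequence -> score
                    (assumption: higher score means a better match)
    """
    # Proteins with exactly the same peptide set cannot be told apart,
    # so they are collapsed into one group.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    results = []
    for peptides, proteins in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if not observed:
            continue  # no peptide evidence for this group
        # Only the best-scoring peptide contributes to the group score.
        results.append((sorted(proteins), max(observed)))

    # Rank groups from best to worst score.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


example_proteins = {
    "P1": ["AAK", "CCR"],
    "P2": ["CCR", "AAK"],  # same peptide set as P1, so grouped with it
    "P3": ["DDK"],
    "P4": ["EEK"],         # no observed peptide, so dropped
}
example_scores = {"AAK": 0.9, "CCR": 2.5, "DDK": 1.1}
ranked = infer_protein_groups(example_proteins, example_scores)
```

In a real pipeline the ranked group scores would then be compared against decoy proteins to estimate protein-level confidence; that step is omitted here.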
Article
Full-text available
Somatic mutations have been extensively characterized in breast cancer, but the effects of these genetic alterations on the proteomic landscape remain poorly understood. Here we describe quantitative mass-spectrometry-based proteomic and phosphoproteomic analyses of 105 genomically annotated breast cancers, of which 77 provided high-quality data. Integrated analyses provided insights into the somatic cancer genome including the consequences of chromosomal loss, such as the 5q deletion characteristic of basal-like breast cancer. Interrogation of the 5q trans-effects against the Library of Integrated Network-based Cellular Signatures connected loss of CETN3 and SKP1 to elevated expression of epidermal growth factor receptor (EGFR), and SKP1 loss also to increased SRC tyrosine kinase. Global proteomic data confirmed a stromal-enriched group of proteins in addition to basal and luminal clusters, and pathway analysis of the phosphoproteome identified a G-protein-coupled receptor cluster that was not readily identified at the mRNA level. In addition to ERBB2, other amplicon-associated highly phosphorylated kinases were identified, including CDK12, PAK1, PTK2, RIPK2 and TLK2. We demonstrate that proteogenomic analysis of breast cancer elucidates the functional consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified regions, and identifies therapeutic targets.
Preprint
Proteins can be phosphorylated at neighboring sites resulting in different functional states, and studying the regulation of these sites has been challenging. Here we present Thesaurus, a search engine that detects new positional isomers using site-specific fragment ions from parallel reaction monitoring and data independent acquisition mass spectrometry experiments. We apply Thesaurus to analyze phosphorylation events in the PI3K/AKT signaling pathway and show neighboring sites with distinct quantitative profiles, indicating regulation by different kinases.
Book
Although there has been a surge of interest in density estimation in recent years, much of the published research has been concerned with purely technical matters with insufficient emphasis given to the technique’s practical value. Furthermore, the subject has been rather inaccessible to the general statistician. The account presented in this book places emphasis on topics of methodological importance, in the hope that this will facilitate broader practical application of density estimation and also encourage research into relevant theoretical work. The book also provides an introduction to the subject for those with general interests in statistics. The important role of density estimation as a graphical technique is reflected by the inclusion of more than 50 graphs and figures throughout the text. Several contexts in which density estimation can be used are discussed, including the exploration and presentation of data, nonparametric discriminant analysis, cluster analysis, simulation and the bootstrap, bump hunting, projection pursuit, and the estimation of hazard rates and other quantities that depend on the density. This book includes a general survey of methods available for density estimation. The kernel method, both for univariate and multivariate data, is discussed in detail, with particular emphasis on ways of deciding how much to smooth and on computational aspects. Attention is also given to adaptive methods, which smooth to a greater degree in the tails of the distribution, and to methods based on the idea of penalized likelihood.
Article
Progress in proteomics is mainly driven by advances in mass spectrometric (MS) technologies. Here we benchmarked the performance of the latest MS instrument in the benchtop Orbitrap series, the Q Exactive HF-X, against its predecessor for proteomics applications. A new peak picking algorithm, a brighter ion source and optimized ion transfers enable productive MS/MS acquisition above 40 Hz at 7500 resolution. The hardware and software improvements collectively resulted in improved peptide and protein identifications across all comparable conditions, with an increase of up to fifty percent at short LC-MS gradients, yielding identification rates of more than one thousand unique peptides per minute. Alternatively, the Q Exactive HF-X is capable of achieving the same proteome coverage as its predecessor in approximately half the gradient time or at 10-fold lower sample loads. The Q Exactive HF-X also enables rapid phosphoproteomics with routine analysis of more than five thousand phosphopeptides with short single-shot 15-minute LC-MS/MS measurements, or 16,700 phosphopeptides quantified across ten conditions in six gradient hours using TMT10-plex and offline peptide fractionation. Finally, exciting perspectives for data independent acquisition are highlighted with reproducible identification of 55,000 unique peptides covering 5900 proteins in half-an-hour of MS analysis. Full text at https://pubs.acs.org/doi/10.1021/acs.jproteome.7b00602
Article
Data-independent acquisition (DIA) is an emerging mass spectrometry (MS)-based technique for unbiased and reproducible measurement of protein mixtures. DIA tandem mass spectrometry spectra are often highly multiplexed, containing product ions from multiple cofragmenting precursors. Detecting peptides directly from DIA data is therefore challenging; most DIA data analyses require spectral libraries. Here we present PECAN (http://pecan.maccosslab.org), a library-free, peptide-centric tool that robustly and accurately detects peptides directly from DIA data. PECAN reports evidence of detection based on product ion scoring, which enables detection of low-abundance analytes with poor precursor ion signal. We demonstrate the chromatographic peak picking accuracy and peptide detection capability of PECAN, and we further validate its detection with data-dependent acquisition and targeted analyses. Lastly, we used PECAN to build a plasma proteome library from DIA data and to query known sequence variants.
Article
Consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from sequential window acquisition of all theoretical fragment-ion spectra (SWATH)-MS, which uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test data sets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation, we developed LFQbench, an R package, to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference data sets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.
Article
Next-generation mass spectrometric (MS) techniques such as SWATH-MS have substantially increased the throughput and reproducibility of proteomic analysis, but ensuring consistent quantification of thousands of peptide analytes across multiple liquid chromatography-tandem MS (LC-MS/MS) runs remains a challenging and laborious manual process. To produce highly consistent and quantitatively accurate proteomics data matrices in an automated fashion, we developed TRIC (http://proteomics.ethz.ch/tric/), a software tool that utilizes fragment-ion data to perform cross-run alignment, consistent peak-picking and quantification for high-throughput targeted proteomics. TRIC reduced the identification error compared to a state-of-the-art SWATH-MS analysis without alignment by more than threefold at constant recall while correcting for highly nonlinear chromatographic effects. On a pulsed-SILAC experiment performed on human induced pluripotent stem cells, TRIC was able to automatically align and quantify thousands of light and heavy isotopic peak groups. Thus, TRIC fills a gap in the pipeline for automated analysis of massively parallel targeted proteomics data sets.