Distributions of ion series in ETD and CID spectra: making a comparison.
ABSTRACT Databases that capture proteomic data for subsequent interrogation can be extremely useful for understanding peptide ion behaviour in the mass spectrometer, leading to novel hypotheses and a mechanistic understanding of the processes that determine peptide fragmentation. These, in turn, can be used to improve database searching algorithms for automated and unbiased interpretation of peptide product ion spectra. Here, we examine a previously published dataset using our established methods to discover differences in the observation of product ions of different types following ion activation and unimolecular dissociation, either by collisional dissociation (CID) or by the ion/ion reaction electron transfer dissociation (ETD). Using a target-decoy database searching strategy, a large data set of precursor ions was confidently assigned as peptide sequence matches (PSMs) at either a 1% or 5% peptide false discovery rate, as reported in our previous study. Using these high-quality PSMs, we have conducted a more detailed and novel analysis of the global trends in the product ions present or absent in these spectra, examining both CID and ETD data. We uncovered an increased propensity for the observation of higher members of the ion series in ETD product ion spectra in comparison with their CID counterparts. Such data-mining efforts will prove useful in the generation of new database searching algorithms that are well suited to the analysis of ETD product ion spectra.
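As an illustration of the target-decoy filtering mentioned in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes each PSM is a hypothetical `(score, is_decoy)` pair, that higher scores are better, that scores are distinct, and that the false discovery rate is estimated with the simple decoys/targets ratio at the PSM level; the study itself reports a peptide-level rate, and real search tools may use a different estimator.

```python
from typing import List, Optional, Tuple

# Hypothetical PSM record for this sketch: (search score, matched a decoy sequence?)
Psm = Tuple[float, bool]


def score_cutoff_at_fdr(psms: List[Psm], max_fdr: float) -> Optional[Tuple[float, int]]:
    """Return (lowest accepted score, number of accepted target PSMs).

    Walks the PSMs from best to worst score and keeps the deepest point
    at which the estimated FDR (decoys / targets) is still <= max_fdr.
    Returns None if no cutoff satisfies the threshold.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best: Optional[Tuple[float, int]] = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: decoy hits approximate the number of false targets.
        if targets > 0 and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


if __name__ == "__main__":
    example = [(90.0, False), (85.0, False), (80.0, False),
               (70.0, True), (60.0, False), (50.0, True)]
    print(score_cutoff_at_fdr(example, 0.01))
    print(score_cutoff_at_fdr(example, 0.25))
```

A stricter threshold (for example 1%) stops before the first decoy in this toy list, whereas a looser one can accept PSMs ranked below a decoy as long as the running ratio stays within the limit.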
- Source available from: Adrian Guthals
  ABSTRACT: The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.
  Molecular & Cellular Proteomics 05/2012; 11(9):550-7. · 7.25 Impact Factor
- ABSTRACT: Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
  Molecular & Cellular Proteomics 04/2012; 11(8):478-91. · 7.25 Impact Factor
- ABSTRACT: Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
  Molecular & Cellular Proteomics 07/2012; 11(10):1084-96. · 7.25 Impact Factor