Improved validation of peptide MS/MS assignments using spectral intensity prediction.
ABSTRACT A major limitation in identifying peptides from complex mixtures by shotgun proteomics is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS spectra). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. Here we report a Manual Analysis Emulator (MAE) program that evaluates results from search programs by implementing two commonly used criteria: 1) consistency of fragment ion intensities with predicted gas phase chemistry and 2) whether a high proportion of the ion intensity (proportion of ion current (PIC)) in the MS/MS spectra can be derived from the peptide sequence. To evaluate chemical plausibility, MAE utilizes similarity (Sim) scoring against theoretical spectra simulated by MassAnalyzer software (Zhang, Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908-3922) using known gas phase chemical mechanisms. The results show that Sim scores provide significantly greater discrimination between correct and incorrect search results than achieved by Sequest XCorr scoring or Mascot Mowse scoring, allowing reliable automated validation of borderline cases. To evaluate PIC, MAE simplifies the DTA text files summarizing the MS/MS spectra and applies heuristic rules to classify the fragment ions. MAE output also provides data mining functions, which are illustrated by using PIC to identify spectral chimeras, where two or more peptide ions were sequenced together, as well as cases where fragmentation chemistry is not well predicted.
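The two criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give MAE's actual Sim formula or its heuristic rules for classifying fragment ions, so this example substitutes a plain cosine similarity for Sim and a simple m/z tolerance match for PIC. The function names, the 0.5 m/z tolerance, and the example peaks are all assumptions made for illustration.

```python
from math import sqrt


def proportion_of_ion_current(peaks, fragment_mzs, tol=0.5):
    """Fraction of total intensity explained by expected fragment m/z values.

    peaks: list of (mz, intensity) pairs from an MS/MS spectrum.
    fragment_mzs: m/z values derived from the candidate peptide sequence.
    tol: matching tolerance in m/z units (assumed value, not from the paper).
    """
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    matched = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - f) <= tol for f in fragment_mzs)
    )
    return matched / total


def cosine_similarity(observed, predicted):
    """Cosine similarity between two aligned intensity vectors.

    Used here as a stand-in for a Sim score; the real scoring may differ.
    """
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm = sqrt(sum(o * o for o in observed)) * sqrt(sum(p * p for p in predicted))
    return dot / norm if norm else 0.0


# Hypothetical spectrum: three peaks, two of which match expected fragments.
example_peaks = [(175.1, 50.0), (300.2, 30.0), (450.3, 20.0)]
example_fragments = [175.1, 450.3]
pic = proportion_of_ion_current(example_peaks, example_fragments)

# Hypothetical observed and predicted intensities for the same three fragments.
sim = cosine_similarity([50.0, 30.0, 20.0], [45.0, 35.0, 20.0])
```

A low PIC for an otherwise high-scoring match would flag a spectrum in which much of the ion current is unexplained, which is the situation the abstract associates with spectral chimeras.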
- Source available from: link.springer.com
ABSTRACT: Bonds that break in collision-induced dissociation (CID) are often weakened by a nearby proton, which can, in principle, be carried away by either of the product fragments. Since peptide backbone dissociation is commonly charge-directed, relative intensities of charge states of product y- and b-ions depend on the final location of that proton. This study examines y-ion charge distributions for dissociation of doubly charged peptide ions, using a large reference library of peptide ion fragmentation generated from ion-trap CID of peptide ions from tryptic digests. Trends in relative intensities of y(2+) and y(1+) ions are examined as a function of bond cleavage position, peptide length (n), residues on either side of the bond, and effects of residues remote from the bond. It is found that y(n-2)/b(2) dissociation is the most sensitive to adjacent amino acids, that the y(2+)/y(1+) ratio steadily increases with increasing peptide length, that the N-terminal amino acid can have a major influence in all dissociations, and that in some cases other residues remote from the bond cleavage exert significant effects. Good correlation is found between the values of y(2+)/y(1+) for the peptide and the proton affinities of the amino acids present at the dissociating peptide bond. A few deviations from this correlation are rationalized by specific effects of the amino acid residues. These correlations can be used to estimate trends in y(2+)/y(1+) ratios for peptide ions from amino acid proton affinities.
Journal of the American Society for Mass Spectrometry 05/2011; 22(5):898-905. · 3.59 Impact Factor
- ABSTRACT: To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparing it with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.
Journal of Proteome Research 01/2011; 10(4):1593-602. · 5.06 Impact Factor
- ABSTRACT: The unambiguous assignment of tandem mass spectra (MS/MS) to peptide sequences remains a key unsolved problem in proteomics. Spectral library search strategies have emerged as a promising alternative for peptide identification, in which MS/MS spectra are directly compared against a reference library of confidently assigned spectra. Two problems relate to library size. First, reference spectral libraries are limited to rediscovery of previously identified peptides and are not applicable to new peptides, because of their incomplete coverage of the human proteome. Second, problems arise when searching a spectral library the size of the entire human proteome. We observed that traditional dot product scoring methods do not scale well with spectral library size, showing a reduction in sensitivity when library size is increased. We show that this problem can be addressed by optimizing scoring metrics for spectrum-to-spectrum searches with large spectral libraries. MS/MS spectra for the 1.3 million predicted tryptic peptides in the human proteome are simulated using a kinetic fragmentation model (MassAnalyzer version 2.1) to create a proteome-wide simulated spectral library. Searches of the simulated library increase MS/MS assignments by 24% compared with Mascot, when using probabilistic and rank-based scoring methods. The proteome-wide coverage of the simulated library leads to an 11% increase in unique peptide assignments, compared with parallel searches of a reference spectral library. Further improvement is attained when reference spectra and simulated spectra are combined into a hybrid spectral library, yielding 52% more MS/MS assignments than Mascot searches. Our study demonstrates the advantages of using probabilistic and rank-based scores to improve performance of spectrum-to-spectrum search strategies.
Molecular & Cellular Proteomics 04/2011; 10(7):M111.007666. · 7.25 Impact Factor