Improved Validation of Peptide MS/MS Assignments Using Spectral Intensity Prediction

Department of Chemistry and Biochemistry, University of Colorado at Boulder, Boulder, Colorado, United States
Molecular & Cellular Proteomics (Impact Factor: 6.56). 02/2007; 6(1):1-17. DOI: 10.1074/mcp.M600320-MCP200
Source: PubMed


A major limitation in identifying peptides from complex mixtures by shotgun proteomics is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS spectra). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. Here we report a Manual Analysis Emulator (MAE) program that evaluates results from search programs by implementing two commonly used criteria: 1) consistency of fragment ion intensities with predicted gas phase chemistry and 2) whether a high proportion of the ion intensity (proportion of ion current (PIC)) in the MS/MS spectra can be derived from the peptide sequence. To evaluate chemical plausibility, MAE utilizes similarity (Sim) scoring against theoretical spectra simulated by MassAnalyzer software (Zhang, Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908-3922) using known gas phase chemical mechanisms. The results show that Sim scores provide significantly greater discrimination between correct and incorrect search results than achieved by Sequest XCorr scoring or Mascot Mowse scoring, allowing reliable automated validation of borderline cases. To evaluate PIC, MAE simplifies the DTA text files summarizing the MS/MS spectra and applies heuristic rules to classify the fragment ions. MAE output also provides data mining functions, which are illustrated by using PIC to identify spectral chimeras, where two or more peptide ions were sequenced together, as well as cases where fragmentation chemistry is not well predicted.
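The abstract names MAE's two criteria but gives no formulas for them. The Python sketch below shows how the criteria could be computed, under stated assumptions. DTA parsing follows the common Sequest layout: the first line holds MH+ and charge, and each later line holds one "m/z intensity" pair. PIC is taken as the share of total intensity in peaks that lie within a fixed tolerance of a predicted fragment ion. Sim is approximated by a cosine similarity on binned, square-root-scaled intensities. That cosine is a stand-in, not the published MAE Sim function. The names `parse_dta`, `proportion_of_ion_current` and `similarity`, the 0.5 m/z tolerance and the 1.0 bin width are all hypothetical. In the workflow the abstract describes, the theoretical spectrum would come from MassAnalyzer.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def parse_dta(text: str) -> Tuple[float, int, List[Peak]]:
    """Parse a Sequest-style DTA file (assumed layout).

    Line 1 holds the singly protonated precursor mass (MH+) and the charge;
    every following line holds one 'm/z intensity' pair.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    mh, charge = float(rows[0][0]), int(rows[0][1])
    peaks = [(float(mz), float(inten)) for mz, inten in rows[1:]]
    return mh, charge, peaks


def proportion_of_ion_current(peaks: List[Peak], predicted_mz: List[float],
                              tol: float = 0.5) -> float:
    """PIC: share of total intensity in peaks within `tol` of any predicted ion m/z."""
    total = sum(inten for _, inten in peaks)
    if total == 0:
        return 0.0
    explained = sum(inten for mz, inten in peaks
                    if any(abs(mz - p) <= tol for p in predicted_mz))
    return explained / total


def _binned_sqrt(peaks: List[Peak], bin_width: float) -> Dict[int, float]:
    """Sum intensities into fixed-width m/z bins, then take square roots."""
    bins: Dict[int, float] = {}
    for mz, inten in peaks:
        key = int(round(mz / bin_width))
        bins[key] = bins.get(key, 0.0) + inten
    return {k: math.sqrt(v) for k, v in bins.items()}


def similarity(observed: List[Peak], theoretical: List[Peak],
               bin_width: float = 1.0) -> float:
    """Cosine similarity of binned, square-root-scaled spectra (a stand-in for Sim)."""
    a = _binned_sqrt(observed, bin_width)
    b = _binned_sqrt(theoretical, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Consider a hypothetical four-peak spectrum. Three of its peaks (intensities 50, 100 and 25) fall within 0.5 of predicted ions, and one peak of intensity 25 does not. PIC for this spectrum is 175/200 = 0.875. Unexplained ion current of this kind is what the abstract describes using PIC to flag, for example in spectral chimeras.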

  • Source
    • "Our results confirm that finer-scale spectra predicted from comprehensive fragmentation pathways can provide valuable information for peptide identification [Sun et al. (2007)] and demonstrate the potential to improve the accuracy of spectra matching by modeling these structures. Similar improvement in peptide identification was also observed in Klammer et al. (2008), who developed a probabilistic model of peptide fragmentation chemistry using the dynamic Bayesian network (DBN) and identified peptides using the features learned from DBN using the support vector machine (SVM). "
    ABSTRACT: Mass spectrometry provides a high-throughput approach to identify proteins in biological samples. A key step in the analysis of mass spectrometry data is to identify the peptide sequence that, most probably, gave rise to each observed spectrum. This is often tackled using a database search: each observed spectrum is compared against a large number of theoretical "expected" spectra predicted from candidate peptide sequences in a database, and the best match is identified using some heuristic scoring criterion. Here we provide a more principled, likelihood-based, scoring criterion for this problem. Specifically, we introduce a probabilistic model that allows one to assess, for each theoretical spectrum, the probability that it would produce the observed spectrum. This probabilistic model takes account of peak locations and intensities, in both observed and theoretical spectra, which enables incorporation of detailed knowledge of chemical plausibility in peptide identification. Besides placing peptide scoring on a sounder theoretical footing, the likelihood-based score also has important practical benefits: it provides natural measures for assessing the uncertainty of each identification, and in comparisons on benchmark data it produced more accurate peptide identifications than other methods, including SEQUEST. Although we focus here on peptide identification, our scoring rule could easily be integrated into any downstream analyses that require peptide-spectrum match scores.
    The Annals of Applied Statistics 01/2013; 6(4). DOI:10.1214/12-AOAS568 · 1.46 Impact Factor
  • Source
    • "We also note that among the unassigned spectra of lower quality, which were not further interrogated here, many are likely to represent valid peptides. Peptides that fall into the non-mobile proton model category, or contain extra labile bonds, are known to fragment poorly in conventional MS strategies [24], and their analysis requires the use of more sophisticated peptide fragmentation models [25] [26] than what is implemented in most currently available database search tools. Unassigned high-quality spectra were reanalyzed using several additional steps: X! Tandem database searching against the subset database containing sequences of proteins identified with high ProteinProphet probabilities (greater than or equal to 0.9) [27] in the initial search (to identify additional tryptic peptides by searching against a smaller database compared to the original search, as well as semi-tryptic peptides, and peptides with inaccurately measured precursor ion mass [3], see step i below); "
    ABSTRACT: In a typical shotgun proteomics experiment, a significant number of high-quality MS/MS spectra remain "unassigned." The main focus of this work is to improve our understanding of various sources of unassigned high-quality spectra. To achieve this, we designed an iterative computational approach for more efficient interrogation of MS/MS data. The method involves multiple stages of database searching with different search parameters, spectral library searching, blind searching for modified peptides, and genomic database searching. The method is applied to a large publicly available shotgun proteomic data set.
    Proteomics 07/2010; 10(14):2712-8. DOI:10.1002/pmic.200900473 · 3.81 Impact Factor
  • Source
    • "Parent mass tolerance was 1.2 Da, MS/MS tolerance was 0.5 Da, and fixed modifications were set to carbamidomethyl cysteine. Peptides were filtered using in-house MSPlus and MAE algorithms previously described (Resing et al., 2004; Sun et al., 2007). "
    ABSTRACT: Melanoma and other cancers harbor oncogenic mutations in the protein kinase B-Raf, which leads to constitutive activation and dysregulation of MAP kinase signaling. In order to elucidate molecular determinants responsible for B-Raf control of cancer phenotypes, we present a method for phosphoprotein profiling, using negative ionization mass spectrometry to detect phosphopeptides based on their fragment ion signature caused by release of PO₃⁻. The method provides an alternative strategy for phosphoproteomics, circumventing affinity enrichment of phosphopeptides and isotopic labeling of samples. Ninety phosphorylation events were regulated by oncogenic B-Raf signaling, based on their responses to treating melanoma cells with MKK1/2 inhibitor. Regulated phosphoproteins included known signaling effectors and cytoskeletal regulators. We investigated MINERVA/FAM129B, a target belonging to a protein family with unknown category and function, and established the importance of this protein and its MAP kinase-dependent phosphorylation in controlling melanoma cell invasion into three-dimensional collagen matrix.
    Molecular Cell 05/2009; 34(1):115-31. DOI:10.1016/j.molcel.2009.03.007 · 14.02 Impact Factor
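The Molecular Cell entry above quotes three search settings: a 1.2 Da parent tolerance, a 0.5 Da MS/MS tolerance, and fixed carbamidomethyl cysteine. The Python sketch below shows how such settings could constrain one candidate peptide. It assumes monoisotopic masses and singly charged b and y ions. It also assumes that both tolerances are absolute Da windows, one on [M+H]+ and one on fragment m/z. The function names are hypothetical. This is not the MSPlus or MAE code or any search engine's implementation.

```python
from typing import List

# Standard monoisotopic amino acid residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # fixed modification added to every Cys


def residue_mass(aa: str) -> float:
    """Residue mass including the fixed carbamidomethyl shift on Cys."""
    return RESIDUE[aa] + (CARBAMIDOMETHYL if aa == "C" else 0.0)


def mh_plus(peptide: str) -> float:
    """Singly protonated monoisotopic precursor mass [M+H]+."""
    return sum(residue_mass(a) for a in peptide) + WATER + PROTON


def precursor_matches(peptide: str, observed_mh: float, tol: float = 1.2) -> bool:
    """True if the observed MH+ lies within `tol` Da of the theoretical [M+H]+."""
    return abs(mh_plus(peptide) - observed_mh) <= tol


def b_y_ions(peptide: str) -> List[float]:
    """Singly charged b ions (N-terminal) followed by y ions (C-terminal)."""
    masses = [residue_mass(a) for a in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def matched_fragments(peptide: str, peaks_mz: List[float], tol: float = 0.5) -> List[float]:
    """Observed fragment m/z values within `tol` of any predicted b or y ion."""
    ions = b_y_ions(peptide)
    return [mz for mz in peaks_mz if any(abs(mz - ion) <= tol for ion in ions)]
```

Take the hypothetical tripeptide ACK. Carbamidomethylation raises its Cys residue from 103.00919 to 160.03065 Da, which gives [M+H]+ ≈ 378.1806. An observed MH+ of 378.9 therefore passes the 1.2 Da window, but 380.0 does not.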