RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.
Biology Direct (Impact Factor: 4.04). 02/2007; 2:25. DOI: 10.1186/1745-6150-2-25
Source: PubMed

ABSTRACT BACKGROUND: The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. RESULTS: Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. The t-tests may be used to measure the reliability of reported statistics. When combined with the reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request.


Available from: Aleksey Y Ogurtsov, Apr 25, 2014
  • ABSTRACT: The past 15 years have seen significant progress in LC-MS/MS peptide sequencing, including the advent of successful de novo and database search methods; however, analysis of glycopeptide and, more generally, glycoconjugate spectra remains a much more open problem, and much annotation is still performed manually. This is partly because glycans, unlike peptides, need not be linear chains, and are instead described by trees. In this paper we introduce SweetSEQer, an extremely simple open source tool for identifying potential glycopeptide MS/MS spectra. We evaluate SweetSEQer on manually curated glycoconjugate spectra and on negative controls, and demonstrate high-quality filtering that can be easily improved for specific applications. We also demonstrate a high overlap between peaks annotated by experts and peaks annotated by SweetSEQer, as well as demonstrate inferred glycan graphs consistent with canonical glycan tree motifs.
    Molecular & Cellular Proteomics 02/2013; DOI:10.1074/mcp.O112.025940 · 7.25 Impact Factor
  • ABSTRACT: The hypothesis that dissociation energies can serve as a predictor of observability of b- and y-peaks is tested for seven hexapeptides. If the hypothesis holds true for large classes of peptides, one would be able to improve the scoring accuracy of peptide identification tools by excluding theoretical peaks that cannot be observed in practical product ion spectra due to various physical, chemical or thermodynamic considerations. Product ion m/z spectra of hexapeptides AAAAAA, AAAFAA, AAAVAA, AAFAAA, AAVAAA, AAFFAA and AAVVAA have been acquired on a Finnigan LTQ XL mass spectrometer in the collision-induced dissociation (CID) activation mode on a grid of activation times 0.05 to 100 ms and normalized collision energy 10 to 35%. Dissociation energies were calculated for all fragmentation channels leading to b- and y-fragments at the TPSS/6-31G(d,p) level of the density functional theory. It was demonstrated that the m/z peaks observed in the product ion spectra correspond to the fragmentation channels with dissociation energies below a certain threshold value. However, there is no direct correlation between the most intense m/z peaks and the lowest dissociation energies. Using the dissociation energies, it was predicted that out of 63 theoretically possible peaks in the b- and y-series of the seven hexapeptides, 19 should not be observable in practical spectra. In the experiments, 24 peaks were not observed, including all 19 predicted. Dissociation energies alone are not sufficient for predicting ion intensity relationships in product ion m/z spectra. Nevertheless, the present data suggest that dissociation energies appear to be good predictors of observability of b- and y-peaks and potentially very useful for filtering theoretical peaks of each candidate peptide in peptide identification tools. Published 2012. This article is a US Government work and is in the public domain in the USA.
    Rapid Communications in Mass Spectrometry 01/2013; 27(1):152-6. DOI:10.1002/rcm.6451 · 2.64 Impact Factor
  • ABSTRACT: Due to its high specificity, trypsin is the enzyme of choice in shotgun proteomics. Nonetheless, several publications do report the identification of semi-tryptic and non-tryptic peptides. Many of these peptides are conjectured to be signaling peptides or to have formed during sample preparation. It is known that only a small fraction of tandem mass spectra from a trypsin-digested protein mixture can be confidently matched to tryptic peptides. Leaving aside other possibilities such as post-translational modifications and single amino acid polymorphisms, this suggests that many unidentified spectra originate from semi-tryptic and non-tryptic peptides. Including them in database searches, however, may not improve overall peptide identification, because expanding the search space can reduce sensitivity. To circumvent this issue for E-value based search methods, we have designed a scheme that categorizes qualified peptides (i.e., peptides whose molecular weight differences from the parent ion are within a specified error tolerance) into three tiers: tryptic, semi-tryptic and non-tryptic. This classification allows peptides belonging to different tiers to have different Bonferroni correction factors. Our results show that this scheme can significantly improve retrieval performance when compared to search strategies that assign equal Bonferroni correction factors to all qualified peptides.
    Journal of Proteome Research 05/2013; 12(6). DOI:10.1021/pr301139y · 5.00 Impact Factor
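
The tiered Bonferroni idea in the last abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that an E-value is obtained by multiplying a per-peptide P-value by a correction factor, and that each tier's factor is simply the number of qualified peptides in that tier; the abstract does not state how the factors are chosen. The function name `tiered_e_values`, the peptide strings, and the P-values are all hypothetical.

```python
from collections import Counter


def tiered_e_values(candidates):
    """Compare pooled and tier-specific Bonferroni-style E-values.

    candidates: list of (peptide, tier, p_value) tuples, where tier is
    "tryptic", "semi-tryptic" or "non-tryptic".
    Returns {peptide: (pooled_e, tiered_e)}.

    Assumption (not from the source): the correction factor for a tier
    is the count of qualified peptides that fall in that tier.
    """
    tier_counts = Counter(tier for _, tier, _ in candidates)
    total = len(candidates)
    result = {}
    for peptide, tier, p_value in candidates:
        pooled_e = p_value * total               # equal factor for every peptide
        tiered_e = p_value * tier_counts[tier]   # factor depends on the tier
        result[peptide] = (pooled_e, tiered_e)
    return result


# Hypothetical qualified peptides for one spectrum (illustrative values only).
candidates = [
    ("PEPTIDEK", "tryptic", 1e-4),
    ("EPTIDEK", "semi-tryptic", 1e-4),
    ("PTIDE", "non-tryptic", 1e-4),
    ("TIDEA", "non-tryptic", 5e-2),
]
e_values = tiered_e_values(candidates)
for peptide, (pooled_e, tiered_e) in e_values.items():
    print(f"{peptide}: pooled E = {pooled_e:.2e}, tiered E = {tiered_e:.2e}")
```

In a real search the non-tryptic tier typically holds far more qualified peptides than the tryptic tier, so under this assumption a tryptic hit would be penalized much less than under a single pooled factor.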