RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.
Biology Direct (Impact Factor: 4.04). 02/2007; 2:25. DOI: 10.1186/1745-6150-2-25
Source: PubMed

ABSTRACT

Background
The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides.

Results
Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. The t-tests may be used to measure the reliability of reported statistics. When combined with the reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request.
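
The relationship between a score, its P-value, and its E-value can be illustrated with a minimal Python sketch. It is not the RAId_DbS derivation: the P-value here is a plain empirical tail fraction over the collected database scores, standing in for the theoretical tail distribution described above. The function names and the score values are hypothetical, chosen only for illustration. The E-value is computed with the standard relation E = P × N, where N is the number of candidate peptides scored.

```python
from bisect import bisect_left


def empirical_p_value(score, background_scores):
    """Fraction of background scores that are >= score.

    Assumption: an empirical tail fraction is used as a stand-in for a
    theoretical score distribution; it is not the method of the paper.
    """
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    ordered = sorted(background_scores)
    # bisect_left returns the index of the first element >= score,
    # so everything from that index onward scores at least as high.
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return n_at_least / len(ordered)


def e_value(p_value, n_candidates):
    """Expected number of random hits scoring at least as high: E = P * N."""
    return p_value * n_candidates


# Hypothetical scores of ten database peptides against one spectrum.
background = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 9.0]

p = empirical_p_value(4.5, background)  # three scores (4.5, 5.0, 9.0) are >= 4.5
e = e_value(p, len(background))
```

With an empirical tail fraction, the E-value reduces to the count of candidates scoring at least as high as the hit. A theoretical tail model matters because it can assign P-values far smaller than 1/N, which finite sampling alone cannot resolve.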

Available from: Aleksey Y Ogurtsov, Apr 25, 2014
  • Source
    • "Kim et al. (2009) address the issue of spectrum specificity by calculating a generating function and infer the probability of a correct spectrum identification based on all matching peptides. RAId_DbS (Alves et al., 2007) uses a score in the form of a weighted sum of logarithmic intensities and applies an extension of the Central Limit Theorem to assign statistical significance to the matches. However, the approach based on fitting specific parametric models cannot be generalized to other platforms. "
    ABSTRACT: Although many methods and statistical approaches have been developed for protein identification by mass spectrometry, the problem of accurate assessment of statistical significance of protein identifications remains an open question. The main issues are as follows: (i) statistical significance of inferring peptide from experimental mass spectra must be platform independent and spectrum specific and (ii) individual spectrum matches at the peptide level must be combined into a single statistical measure at the protein level. We present a method and software to assign statistical significance to protein identifications from search engines for mass spectrometric data. The approach is based on asymptotic theory of order statistics. The parameters of the asymptotic distributions of identification scores are estimated for each spectrum individually. The method relies on new unbiased estimators for parameters of extreme value distribution. The estimated parameters are used to assign a spectrum-specific P-value to each peptide-spectrum match. The protein-level confidence measure combines P-values of peptide-to-spectrum matches. Conclusion: We extensively tested the method using triplicate mouse and yeast high-throughput proteomic experiments. The proposed statistical approach improves the sensitivity of protein identifications without compromising specificity. While the method was primarily designed to work with Mascot, it is platform-independent and is applicable to any search engine which outputs a single score for a peptide-spectrum match. We demonstrate this by testing the method in conjunction with X!Tandem. Availability: The software is available for download at ftp://genetics.bwh.harvard.edu/SSPV/. Contact: ssunyaev@rics.bwh.harvard.edu. Supplementary data are available at Bioinformatics online.
    Bioinformatics 02/2011; 27(8):1128-34. DOI:10.1093/bioinformatics/btr089 · 4.62 Impact Factor
  • Source
    • "Taking into account the finite sample effect and skewness, the asymptotic score statistics (P-values) of RAId_DbS (Alves et al., 2007a) is derived theoretically. The final E-value for each peptide hit, however, is obtained by multiplying the peptide's P-value by the number of peptides of its category. "
    ABSTRACT: Summary: In anticipation of the individualized proteomics era and the need to integrate knowledge from disease studies, we have augmented our peptide identification software RAId_DbS to take into account annotated single amino acid polymorphisms, post-translational modifications, and their documented disease associations while analyzing a tandem mass spectrum. To facilitate new discoveries, RAId_DbS allows users to conduct searches permitting novel polymorphisms. Availability: The webserver link is http://www.ncbi.nlm.nih.gov/ /CBBResearch/qmbp/raid dbs/index.html. The relevant databases and binaries of RAId_DbS for Linux, Windows, and Mac OS X are available from the same web page. Contact: yyu@ncbi.nlm.nih.gov
  • Source
    ABSTRACT: This paper presents an analysis of a LiNbO3 electro-optic modulator using the finite difference time domain (FDTD) technique, and also a new and efficient multiresolution time-domain technique for fast and accurate modeling of photonic devices. The electromagnetic fields computed by FDTD are coupled to standard electro-optic relations that characterize electro-optic interactions. This novel approach to LiNbO3 electro-optic modulators using a coupled FDTD technique allows for previously unattainable investigations into device operating bandwidth and data transmission speed. On the other hand, the proposed multiresolution approach presented in this paper solves Maxwell's equations on nonuniform self-adaptive grids, obtained by applying wavelet transforms followed by hard thresholding. The developed technique is employed to simulate a coplanar waveguide (CPW), which represents an electro-optic modulator. Different numerical examples are presented showing more than 75% CPU-time reduction, while maintaining the same degree of accuracy of standard FDTD techniques.
    Microwave Symposium Digest, 2004 IEEE MTT-S International; 07/2004