Improved validation of peptide MS/MS assignments using spectral intensity prediction.
ABSTRACT A major limitation in identifying peptides from complex mixtures by shotgun proteomics is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS spectra). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. Here we report a Manual Analysis Emulator (MAE) program that evaluates results from search programs by implementing two commonly used criteria: 1) consistency of fragment ion intensities with predicted gas phase chemistry and 2) whether a high proportion of the ion intensity (proportion of ion current (PIC)) in the MS/MS spectra can be derived from the peptide sequence. To evaluate chemical plausibility, MAE utilizes similarity (Sim) scoring against theoretical spectra simulated by MassAnalyzer software (Zhang, Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908-3922) using known gas phase chemical mechanisms. The results show that Sim scores provide significantly greater discrimination between correct and incorrect search results than achieved by Sequest XCorr scoring or Mascot Mowse scoring, allowing reliable automated validation of borderline cases. To evaluate PIC, MAE simplifies the DTA text files summarizing the MS/MS spectra and applies heuristic rules to classify the fragment ions. MAE output also provides data mining functions, which are illustrated by using PIC to identify spectral chimeras, where two or more peptide ions were sequenced together, as well as cases where fragmentation chemistry is not well predicted.
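The two criteria described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact Sim formula or the heuristic rules MAE uses for PIC, so the code below assumes a plain cosine similarity over binned intensities and a simple m/z tolerance match. The function names, the bin width, and the tolerance are all hypothetical and are not taken from MAE or MassAnalyzer.

```python
import math

def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def similarity(observed, predicted, bin_width=1.0):
    """Cosine similarity between two spectra (an assumed stand-in for Sim)."""
    a = bin_spectrum(observed, bin_width)
    b = bin_spectrum(predicted, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def proportion_of_ion_current(observed, fragment_mzs, tol=0.5):
    """Fraction of total intensity lying within tol of any expected fragment m/z."""
    total = sum(intensity for _, intensity in observed)
    if total == 0.0:
        return 0.0
    explained = sum(
        intensity
        for mz, intensity in observed
        if any(abs(mz - f) <= tol for f in fragment_mzs)
    )
    return explained / total

# Toy spectrum: three peaks, two of which match expected fragments.
observed = [(100.0, 10.0), (200.0, 30.0), (300.0, 60.0)]
sim_self = similarity(observed, observed)
pic = proportion_of_ion_current(observed, [100.1, 300.2])
```

In this toy case a spectrum compared with itself gives the maximum similarity, and a low PIC would flag a spectrum in which much of the ion current is unexplained by the assigned sequence, as might happen for a chimera.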
- Source available from: export.arxiv.org
  ABSTRACT: Mass spectrometry provides a high-throughput approach to identify proteins in biological samples. A key step in the analysis of mass spectrometry data is to identify the peptide sequence that, most probably, gave rise to each observed spectrum. This is often tackled using a database search: each observed spectrum is compared against a large number of theoretical "expected" spectra predicted from candidate peptide sequences in a database, and the best match is identified using some heuristic scoring criterion. Here we provide a more principled, likelihood-based, scoring criterion for this problem. Specifically, we introduce a probabilistic model that allows one to assess, for each theoretical spectrum, the probability that it would produce the observed spectrum. This probabilistic model takes account of peak locations and intensities, in both observed and theoretical spectra, which enables incorporation of detailed knowledge of chemical plausibility in peptide identification. Besides placing peptide scoring on a sounder theoretical footing, the likelihood-based score also has important practical benefits: it provides natural measures for assessing the uncertainty of each identification, and in comparisons on benchmark data it produced more accurate peptide identifications than other methods, including SEQUEST. Although we focus here on peptide identification, our scoring rule could easily be integrated into any downstream analyses that require peptide-spectrum match scores.
  The Annals of Applied Statistics 01/2013; 6(4). · 2.24 Impact Factor
- ABSTRACT: ATP leads to endothelial NO synthase (eNOS)/NO-mediated vasodilation, a process hypothesized to depend on the endothelial caveolar eNOS partitioning and subcellular domain-specific multisite phosphorylation state. We demonstrate herein that, in both the absence and presence of ATP, the uterine artery endothelial caveolae contain specific protein machinery related to subcellular partitioning and act as specific focal "hubs" for NO- and ATP-related proteins. ATP-induced eNOS regulation showed a complex set of multisite posttranslational phosphorylation events that were closely associated with the enzyme's partitioning between caveolar and noncaveolar endothelial subcellular domains. The comprehensive model that we present demonstrates that ATP repartitioned eNOS between the caveolar and noncaveolar subcellular domains; specifically, the stimulatory (PSer635)eNOS was substantially higher in the caveolar pool with subcellular domain-independent increased levels on ATP treatment. The stimulatory (PSer1179)eNOS was not altered by ATP treatment. However, the inhibitory (PThr495)eNOS was regulated predominantly in the caveolar domain with decreased levels on ATP action. In contrast, the agonist-specific (PSer114)eNOS was localized in the noncaveolar pool with increased levels on ATP stimulation. Thus, the endothelial caveolar membrane system plays a pivotal role(s) in ATP-associated subcellular partitioning and possesses the relevant protein machinery for ATP-induced NO regulation. Furthermore, these subcellular domain-specific phosphorylation/dephosphorylation events provide evidence relating to eNOS spatio-temporal dynamics.
  Hypertension 03/2012; 59(5):1052-9. · 6.87 Impact Factor
- ABSTRACT: In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining charge retention during CID/higher-energy collision-induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model when analyzing triply charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
  Genomics Proteomics & Bioinformatics 03/2013;
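
The Naive model described in this abstract can be sketched as follows. This is only an illustration of the general idea (every peptide bond cleaved, b and y ions emitted at every charge below the precursor charge), not the Basophile implementation or any search engine's code. The function name is hypothetical, the residue table is deliberately partial, and allowing a charge of 1 for singly charged precursors is an assumption added so that such precursors still yield fragments.

```python
# Monoisotopic residue masses in Da (partial table, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
    "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276

def naive_fragments(peptide, precursor_charge):
    """List (ion label, charge, m/z) for b and y ions at every peptide bond."""
    # Assumption: keep at least charge 1 so a 1+ precursor still has fragments.
    max_charge = max(1, precursor_charge - 1)
    fragments = []
    for i in range(1, len(peptide)):
        b_mass = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        y_mass = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER
        for z in range(1, max_charge + 1):
            fragments.append((f"b{i}", z, (b_mass + z * PROTON) / z))
            fragments.append((f"y{len(peptide) - i}", z, (y_mass + z * PROTON) / z))
    return fragments

doubly = naive_fragments("GAK", 2)
triply = naive_fragments("GAK", 3)
```

For a triply charged precursor the fragment list doubles relative to a doubly charged one, which illustrates the growth in predicted fragments that the abstract says Basophile is designed to curb.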