Peptide identification from mixture tandem mass spectra.

Bioinformatics Program, University of California San Diego, La Jolla, California 92093, USA.
Molecular & Cellular Proteomics (Impact Factor: 7.25). 03/2010; 9(7):1476-85. DOI: 10.1074/mcp.M000136-MCP201
Source: PubMed

ABSTRACT The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), our quantitative approach is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjusts to varying abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrate that mixture spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra.
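
The idea of scoring a mixture spectrum against pairs of library spectra, while using a similarity bound to skip most pairs, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: spectra are represented as binned, non-negative intensity vectors, similarity is the cosine, the abundance ratio is found by a simple grid over a mixing weight `alpha`, and the pruning rule is an elementary Cauchy-Schwarz bound (for non-negative unit vectors, the cosine between the query and any non-negative combination of two library spectra cannot exceed the square root of the sum of the squared single-spectrum cosines). The function names `unit`, `best_mixture_cosine`, and `identify_pair` are hypothetical.

```python
import itertools
import numpy as np


def unit(v):
    """Scale a non-negative intensity vector to unit Euclidean length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def best_mixture_cosine(query, a, b, steps=101):
    """Best cosine between query and alpha*a + (1 - alpha)*b on a grid of alpha."""
    alphas = np.linspace(0.0, 1.0, steps)
    combos = np.outer(alphas, a) + np.outer(1.0 - alphas, b)
    combos /= np.linalg.norm(combos, axis=1, keepdims=True)
    scores = combos @ query
    k = int(np.argmax(scores))
    return float(scores[k]), float(alphas[k])


def identify_pair(query, library):
    """Return (score, i, j, alpha, pairs_scored) for the best library pair."""
    query = unit(query)
    lib = [unit(s) for s in library]
    # Cosine of the query against each single library spectrum.
    single = np.array([s @ query for s in lib])
    best = (-1.0, None, None, None)
    scored = 0
    for i, j in itertools.combinations(range(len(lib)), 2):
        # Illustrative bound (an assumption, not the paper's bound):
        # no mixture of i and j can score above hypot(single[i], single[j]),
        # so the pair is skipped if that cannot beat the current best.
        if np.hypot(single[i], single[j]) <= best[0]:
            continue
        score, alpha = best_mixture_cosine(query, lib[i], lib[j])
        scored += 1
        if score > best[0]:
            best = (score, i, j, alpha)
    return best + (scored,)


# Toy library of four binned spectra (made-up intensities).
library = [
    [5, 0, 3, 0, 0, 1],
    [0, 4, 0, 0, 2, 0],
    [0, 0, 1, 6, 0, 2],
    [1, 0, 0, 0, 5, 0],
]
# Synthetic mixture of spectra 0 and 2 at a 7:3 ratio.
query = 0.7 * unit(library[0]) + 0.3 * unit(library[2])

score, i, j, alpha, scored = identify_pair(query, library)
print(f"pair=({i}, {j}) alpha={alpha:.2f} cosine={score:.4f} pairs_scored={scored}")
```

In this toy setting the bound lets the search skip most of the six candidate pairs; a proteome-scale library would additionally need candidate filtering by precursor mass and a finer treatment of peak binning, which this sketch leaves out.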

  • ABSTRACT: For the preceding conference see [Zbl 1241.92023].
  • ABSTRACT: Aberrant cell signaling events either drive or compensate for nearly all pathologies. A thorough description and quantification of maladaptive signaling flux in disease is a critical step in drug development, and complex proteomic approaches can provide valuable mechanistic insights. Traditional proteomics-based signaling analyses rely heavily on in vitro cellular monoculture. The characterization of these simplified systems generates a rich understanding of the basic components and complex interactions of many signaling networks, but they cannot capture the full complexity of the microenvironments in which pathologies are ultimately made manifest. Unfortunately, techniques that can directly interrogate signaling in situ often yield mass-limited starting materials that are incompatible with traditional proteomics workflows. This review provides an overview of established and emerging techniques that are applicable to context-dependent proteomics. Analytical approaches are illustrated through recent proteomics-based studies in which selective sample acquisition strategies preserve context-dependent information, and where the challenge of minimal starting material is met by optimized sensitivity and coverage. This review is organized into three major technological themes: (1) LC methods in line with mass spectrometry; (2) antibody-based approaches; (3) MS imaging, with a discussion of data integration and systems modeling. Finally, we conclude with future perspectives and implications of context-dependent proteomics.
    Proteomics 12/2014; DOI:10.1002/pmic.201400448 · 3.97 Impact Factor
  • ABSTRACT: In large-scale proteomic experiments, multiple peptide precursors are often co-fragmented in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data-independent acquisition protocols promoting the co-fragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by approximately 30% to 390%. Analysis of multiple datasets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra.
    Molecular & Cellular Proteomics 09/2014; 13(12). DOI:10.1074/mcp.O113.037218 · 7.25 Impact Factor