Peptide Identification from Mixture Tandem Mass Spectra

Bioinformatics Program, University of California San Diego, La Jolla, California 92093, USA.
Molecular & Cellular Proteomics (Impact Factor: 6.56). 03/2010; 9(7):1476-85. DOI: 10.1074/mcp.M000136-MCP201
Source: PubMed


The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), our quantitative approach is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjust to varying abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrate that mixture-spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra.
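The idea of matching one experimental spectrum against pairs of library spectra, while using a bound to skip most pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the binned-vector representation, the cosine score, the function names (`unit`, `best_mixture_cosine`, `identify_mixture`) and the toy library are all assumptions made for illustration. The pruning bound used here is a simple one that holds for non-negative unit vectors, namely cos(M, xA + yB) <= sqrt((M.A)^2 + (M.B)^2) for x, y >= 0; the bounds derived in the paper may differ.

```python
import numpy as np

def unit(v):
    """Scale a non-negative intensity vector to unit Euclidean length."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def best_mixture_cosine(m, a, b):
    """Largest cosine between unit vector m and x*a + y*b with x, y >= 0.

    a and b are unit vectors. The unconstrained optimum is the projection
    of m onto span{a, b}; if it needs a negative weight, the best
    non-negative mixture is one of the two spectra alone.
    """
    sa, sb, ab = m @ a, m @ b, a @ b
    det = 1.0 - ab * ab
    if det < 1e-12:                    # a and b are (nearly) the same spectrum
        return max(sa, sb)
    x = (sa - ab * sb) / det           # solve the 2x2 normal equations
    y = (sb - ab * sa) / det
    if x <= 0 or y <= 0:
        return max(sa, sb)
    mix = x * a + y * b
    return float(m @ mix / np.linalg.norm(mix))

def identify_mixture(query, library):
    """Return (best_pair, score) over all pairs of library spectra."""
    m = unit(query)
    specs = {name: unit(vec) for name, vec in library.items()}
    single = {name: float(m @ s) for name, s in specs.items()}
    ranked = sorted(specs, key=single.get, reverse=True)
    best_pair, best_score = None, -1.0
    for i, p in enumerate(ranked):
        for q in ranked[i + 1:]:
            # Upper bound on the pair score from the two single-spectrum scores.
            bound = np.hypot(single[p], single[q])
            if bound <= best_score:
                break                  # later q have lower scores, so lower bounds
            score = best_mixture_cosine(m, specs[p], specs[q])
            if score > best_score:
                best_pair, best_score = (p, q), score
    return best_pair, best_score

# Hypothetical toy library of binned spectra (intensities per m/z bin).
library = {
    "A": [1, 0, 2, 0, 0, 1],
    "B": [0, 3, 0, 1, 0, 0],
    "C": [0, 0, 1, 0, 2, 0],
    "D": [1, 1, 0, 0, 0, 2],
}
# Simulated mixture of A and B at a 2:1 intensity ratio.
query = 10 * np.array(library["A"]) + 5 * np.array(library["B"])
pair, score = identify_mixture(query, library)
print(pair, round(score, 6))
```

Because candidates are ranked by their single-spectrum similarity, the bound shrinks monotonically along the inner loop, so the search can stop at the first pair whose bound cannot beat the current best score.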

    • "After the reanalysis, the results can then be used to either enrich existing datasets or to create new ones. MassIVE enables on-site reanalysis using a variety of data workflows: standard database searches (MS-GF+) [79], proteogenomics searches against genomics/transcriptomics sequences (ENOSI) [80] [81] [82], discovery of unexpected modifications (MODa) [83], identification of mixture spectra using spectral library (MSPLIT) [84] and database (MixDB) search [85], de novo sequencing of peptides (PepNovo) [86] and proteins (Meta-SPS) [87], molecular spectral networks (including both peptides and metabolites), and top-down protein identification (MS-Align+) [88] "
    ABSTRACT: Compared to other data intensive disciplines such as genomics, public deposition and storage of mass spectrometry (MS)-based proteomics data is still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas and the PRoteomics IDEntifications (PRIDE) database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, MassIVE, Chorus, MaxQB, PASSEL, MOPED and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed for enabling a better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we will review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We will also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.
    Proteomics 08/2014; 15(5-6). DOI:10.1002/pmic.201400302 · 3.81 Impact Factor
    • "Some use the identification idea of cross-linked peptides to identify mixed spectra [31]. Others use simulated mixed spectra to study the influence of co-eluted precursors on database and spectral library searches [32] [33] [34]. These methods have shown that mixed spectra are common but less likely to result in accurate identification. "
    ABSTRACT: Determining the monoisotopic peak of a precursor is a first step in interpreting mass spectra, which is basic but non-trivial. The reason is that in the isolation window of a precursor, other peaks interfere with the determination of the monoisotopic peak, leading to a wrong mass-to-charge ratio or charge state. Here we propose a method, named pParse, to export the most probable monoisotopic peaks for precursors, including co-eluted precursors. We use the relationship between the position of the highest peak and the mass of the first peak to detect candidate clusters. Then, we extract three features to sort the candidate clusters: (i) the sum of the intensity, (ii) the similarity of the experimental and the theoretical isotopic distribution, and (iii) the similarity of elution profiles. We showed that the recall of pParse, MaxQuant, and BioWorks was 98-98.8%, 0.5-17%, and 1.8-36.5% at the same precision, respectively. About 50% of tandem mass spectra are triggered by multiple precursors, which are difficult to identify. We then designed a new scoring function to identify the co-eluted precursors. About 26% of all identified peptides were exclusively from co-eluted peptides. Therefore, accurately determining monoisotopic peaks, including co-eluted precursors, can greatly increase the peptide identification rate.
    Proteomics 01/2012; 12(2):226-35. DOI:10.1002/pmic.201100081 · 3.81 Impact Factor
    • "In addition to the expected improvement in sensitivity from searching against a small targeted sequence database, the neuropeptide spectral libraries further improve identification efficiency, sensitivity and reliability by considering all spectral features, including actual fragment intensities, neutral losses from fragments and various uncommon or even unknown fragments to determine the best matches. NeuroPedia spectral libraries are compatible with the publicly available spectral library search tool M-SPLIT (Wang et al., 2010) and can be easily converted to other spectral library formats. To further facilitate visual evaluation of neuropeptide MS/MS spectra, NeuroPedia provides annotated spectrum images for every library spectrum and further separates spectral libraries by species, digestion enzyme and instrument type (see Supplementary Table) "
    ABSTRACT: Neuropeptides are essential for cell-cell communication in neurological and endocrine physiological processes in health and disease. While many neuropeptides have been identified in previous studies, the resulting data has not been structured to facilitate further analysis by tandem mass spectrometry (MS/MS), the main technology for high-throughput neuropeptide identification. Many neuropeptides are difficult to identify when searching MS/MS spectra against large protein databases because of their atypical lengths (e.g. shorter/longer than common tryptic peptides) and lack of tryptic residues to facilitate peptide ionization/fragmentation. NeuroPedia is a neuropeptide encyclopedia of peptide sequences (including genomic and taxonomic information) and spectral libraries of identified MS/MS spectra of homolog neuropeptides from multiple species. Searching neuropeptide MS/MS data against known NeuroPedia sequences will improve the sensitivity of database search tools. Moreover, the availability of neuropeptide spectral libraries will also enable the utilization of spectral library search tools, which are known to further improve the sensitivity of peptide identification. These will also reinforce the confidence in peptide identifications by enabling visual comparisons between new and previously identified neuropeptide MS/MS spectra. Supplementary materials are available at Bioinformatics online.
    Bioinformatics 08/2011; 27(19):2772-3. DOI:10.1093/bioinformatics/btr445 · 4.98 Impact Factor