Popitam: towards new heuristic strategies to improve protein identification from tandem mass spectrometry data.

Swiss Institute of Bioinformatics, Geneva, Switzerland.
PROTEOMICS (Impact Factor: 3.97). 07/2003; 3(6):870-8. DOI: 10.1002/pmic.200300402
Source: PubMed

ABSTRACT In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current methods for identification from spectrometric data have difficulty handling modifications or mutations in the source peptide. Moreover, they perform poorly when run on large databases (such as genomic databases) or with low-quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences drawn from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves parsing the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called Ant Colony Optimization.
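The graph-matching idea in the abstract can be illustrated with a minimal sketch. It is not Popitam's implementation: the function names, the reduced residue-mass table, the 0.02 Da tolerance and the edge-counting score are all assumptions made for illustration, and the graph is checked exhaustively rather than parsed with Ant Colony Optimization.

```python
# Illustrative sketch only; names, tolerance and scoring are assumptions,
# not taken from the Popitam paper.

# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
    "E": 129.04259,
}
PROTON = 1.00728


def b_ion_masses(peptide):
    """Singly charged b-ion m/z values for every proper prefix of the peptide."""
    masses, total = [], PROTON
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def build_spectrum_graph(peaks, tol=0.02):
    """Link two peaks when their mass difference matches one residue mass.

    Returns a dict mapping (low_peak, high_peak) to the residue letter.
    """
    peaks = sorted(peaks)
    edges = {}
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            for residue, mass in RESIDUE_MASS.items():
                if abs((high - low) - mass) <= tol:
                    edges[(low, high)] = residue
    return edges


def score_candidate(peptide, edges, tol=0.02):
    """Count graph edges that agree with the candidate's theoretical b-ion ladder."""
    ladder = b_ion_masses(peptide)
    supported = 0
    for k in range(len(ladder) - 1):
        expected = peptide[k + 1]  # residue separating ladder[k] and ladder[k + 1]
        for (low, high), residue in edges.items():
            if (residue == expected
                    and abs(low - ladder[k]) <= tol
                    and abs(high - ladder[k + 1]) <= tol):
                supported += 1
                break
    return supported


# Usage: a noise-free toy spectrum made of the b-ions of "GASP".
peaks = b_ion_masses("GASP")
edges = build_spectrum_graph(peaks)
ranking = sorted(["GSAP", "GASP"], key=lambda p: score_candidate(p, edges), reverse=True)
print(ranking)
```

In this toy case the candidate whose residue order agrees with the graph edges ranks first. A real spectrum would also contain y-ions, noise peaks and missing fragments, which is where the combinatorial parsing described in the abstract becomes necessary.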

  • ABSTRACT: The Chromosome-Centric Human Proteome Project (C-HPP) is a global project that aims to identify at least one protein isoform encoded by each of the approximately 20,300 human genes. In addition, protein post-translational modifications will be characterized, with the initial goal of detecting phosphorylation, acetylation, and glycosylation sites in each protein. In this chapter, we provide an overview of known post-translational modifications and their known biological functions, and present strategies to detect them at both the single-protein and proteomic scales. In future proteomic studies, global characterization of post-translational modifications, splice variants, and variants caused by single nucleotide polymorphisms (SNPs) will be necessary to fully understand the role of proteins in human biology and disease.
    Genomics and Proteomics for Clinical Discovery and Development, 1 edited by György Marko-Varga, 07/2014: chapter 6: pages 101-136; Springer, ISBN: 978-94-017-9201-1
  • ABSTRACT: Protein identification is the most important and basic problem in proteomics. Database searching of tandem mass spectrometry data is one of the most widely used identification techniques. However, the improved sensitivity of mass spectrometers, the rapid expansion of databases, and more complex analyses, such as post-translational modification and non-specific enzymatic digestion, have severely challenged the scale and speed of current restricted protein identification search engines. In this paper, we propose an open protein identification method that relaxes enzyme specificity, and present a distributed design that supports large protein databases with non-specific digestion analysis, based on pFind, a practical tandem mass spectra search engine developed in China. With the larger classical protein databases ipi.HUMAN and uniprot-sprot, we obtained nearly linear speedup on a 20-blade cluster. Further analysis suggests that real-time identification is achievable to some extent.
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on; 01/2012
  • ABSTRACT: Proteomics is currently one of the most promising fields in bioinformatics as it provides important insights into the protein function of organisms. Mass spectrometry is one of the techniques to study the proteome, and several software tools exist for this purpose. We provide an extendable software platform called swissPIT that combines different existing tools and exploits Grid infrastructures to speed up the data analysis process for the proteomics pipeline.

