Popitam: towards new heuristic strategies to improve protein identification from tandem mass spectrometry data.

Swiss Institute of Bioinformatics, Geneva, Switzerland.
PROTEOMICS (Impact Factor: 4.13). 07/2003; 3(6):870-8. DOI: 10.1002/pmic.200300402
Source: PubMed

ABSTRACT In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current methods for identification from spectrometric data have difficulty handling modifications or mutations in the source peptide. Moreover, they perform poorly when run on large databases (such as genomic databases) or with low-quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences drawn from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves parsing the graph in order to emphasize the sections relevant to each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called the Ant Colony Optimization algorithm.
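To make the idea of matching a database sequence against a spectrum graph more concrete, here is a minimal Python sketch. It is not Popitam's implementation: the function names (`build_spectrum_graph`, `longest_match`), the mass tolerance, the reduced residue table, and the scoring rule (length of the longest stretch of the candidate spelled by a path in the graph) are all assumptions made for illustration. It also assumes the peaks form a single prefix-ion ladder, and it replaces the Ant Colony Optimization search described in the abstract with a plain exhaustive dynamic program, which is only practical on toy inputs.

```python
from itertools import combinations

# Monoisotopic residue masses in Da (small illustrative subset, not a full table).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
}


def build_spectrum_graph(peaks, tol=0.02):
    """Nodes are sorted peak masses; an edge (i, j, aa) exists when the mass
    difference peaks[j] - peaks[i] matches residue aa within tol (assumed value)."""
    peaks = sorted(peaks)
    edges = []
    for i, j in combinations(range(len(peaks)), 2):
        diff = peaks[j] - peaks[i]
        for aa, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tol:
                edges.append((i, j, aa))
    return peaks, edges


def longest_match(n_nodes, edges, candidate):
    """Hypothetical score: length of the longest contiguous stretch of
    `candidate` that is spelled by the edge labels of some path in the graph."""
    # best[node][pos]: longest path ending at `node` whose last edge matches
    # candidate[pos] and whose earlier edges match the residues just before it.
    best = [[0] * len(candidate) for _ in range(n_nodes)]
    top = 0
    # Edges always go from a lighter to a heavier peak, so processing them in
    # sorted order guarantees best[i] is final before any edge leaving i is used.
    for i, j, aa in sorted(edges):
        for pos, residue in enumerate(candidate):
            if residue != aa:
                continue
            prev = best[i][pos - 1] if pos > 0 else 0
            best[j][pos] = max(best[j][pos], prev + 1)
            top = max(top, best[j][pos])
    return top


# Toy ladder: an arbitrary starting mass followed by G, A and S increments.
peaks, edges = build_spectrum_graph([100.0, 157.02146, 228.05857, 315.0906])
ranked = sorted(
    ["GAS", "GAV", "SAG"],
    key=lambda seq: longest_match(len(peaks), edges, seq),
    reverse=True,
)
```

In this toy setting, candidates whose residues line up with a path through the graph rank first. A heuristic search such as ant colony optimization becomes relevant once the graph is large and the scoring must tolerate gaps, modifications or mutations, where exhaustive parsing is no longer affordable.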

  • Source
    ABSTRACT: Label-free quantitative LC-MS profiling of complex body fluids has become an important analytical tool for biomarker and biological knowledge discovery in the past decade. Accurate processing, statistical analysis and validation of acquired data diversified by the different types of mass spectrometers, mass spectrometer parameter settings and applied sample preparation steps are essential to answer complex life science research questions and understand the molecular mechanism of disease onset and developments. This review provides insight into the main modules of label-free data processing pipelines with statistical analysis and validation and discusses recent developments. Special emphasis is devoted to quality control methods, performance assessment of complete workflows and algorithms of individual modules. Finally, the review discusses the current state and trends in high throughput data processing and analysis solutions for users with little bioinformatics knowledge.
    Talanta 01/2011; 83(4):1209-24. · 3.50 Impact Factor
  • Source
    ABSTRACT: This thesis is set in the context of proteomic and peptidomic analysis. From an analytical standpoint, the aim of my thesis was to implement in the laboratory techniques capable of rapidly providing reliable identifications of proteins and peptides in complex mixtures, whatever their physicochemical properties. From a biological standpoint, we sought to understand the phenomenon of desquamation of the stratum corneum in humans, in collaboration with the company L'Oréal. This phenomenon involves the degradation, by specific enzymes, of the proteins of the corneodesmosomes, the cell junctions that hold the cells together. Unlike the classical biochemical approaches generally used in this type of application (targeting specific proteases), we chose to adopt a methodology without a priori assumptions, which links peptidomics and degradomics. To do so, we focused our study on the endogenous peptides resulting from the degradation of stratum corneum proteins, in order to identify peptides specific to desquamation.
  • Source
    ABSTRACT: High-throughput protein identification and quantification analysis based on mass spectrometry are fundamental steps in most proteomics projects. Here, we present EasyProt (available at, a new platform for mass spectrometry data processing, protein identification, quantification and unexpected post-translational modification characterization. EasyProt provides a fully integrated graphical experience for performing a large part of the proteomic data analysis workflow. Our goal was to develop a software platform that would fulfill the needs of scientists in the field, while emphasizing ease of use for non-bioinformatician users. Protein identification is based on OLAV scoring schemes, and protein quantification is implemented for both isobaric labeling and label-free methods. Additional features are available, such as peak list processing, isotopic correction, spectra filtering, charge-state deconvolution and spectra merging. To illustrate the EasyProt platform, we present two identification and quantification workflows based on isobaric tagging and label-free methods.
    Journal of proteomics 12/2012; · 5.07 Impact Factor

