Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach.

Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA.
Molecular & Cellular Proteomics (Impact Factor: 7.25). 12/2010; 9(12):2772-82. DOI: 10.1074/mcp.M110.002766
Source: PubMed

ABSTRACT: Top-down proteomics studies intact proteins, enabling new opportunities for analyzing post-translational modifications. Because tandem mass spectra of intact proteins are very complex, spectral deconvolution (grouping peaks into isotopomer envelopes) is a key initial stage for their interpretation. In such spectra, isotopomer envelopes of different protein fragments span overlapping regions on the m/z axis and even share spectral peaks. This raises both pattern recognition and combinatorial challenges for spectral deconvolution. We present MS-Deconv, a combinatorial algorithm for spectral deconvolution. The algorithm first generates a large set of candidate isotopomer envelopes for a spectrum, then represents the spectrum as a graph, and finally selects its highest scoring subset of envelopes as a heaviest path in the graph. In contrast with other approaches, the algorithm scores sets of envelopes rather than individual envelopes. We demonstrate that MS-Deconv improves on Thrash and Xtract in both the number of correctly recovered monoisotopic masses and speed. We applied MS-Deconv to a large set of top-down spectra from Yersinia rohdei (with a still unsequenced genome) and further matched them against the protein database of the related, sequenced bacterium Yersinia enterocolitica. MS-Deconv is available at
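The "heaviest path" selection step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the MS-Deconv implementation: the `Envelope` type, its fields, the precomputed scores, and the rule that selected envelopes may not overlap on the m/z axis are all assumptions made for illustration (the abstract notes that real envelopes can overlap and even share peaks, and that the algorithm scores sets of envelopes). Under these assumptions, candidates sorted by their right m/z edge form a directed acyclic graph in which an edge joins two envelopes that do not overlap, and the best subset is a heaviest path found by dynamic programming.

```python
from bisect import bisect_right
from typing import NamedTuple


class Envelope(NamedTuple):
    """Hypothetical candidate envelope: an m/z range plus a precomputed score."""
    mz_start: float
    mz_end: float
    score: float


def heaviest_compatible_subset(candidates):
    """Return (total_score, chosen) for the best set of non-overlapping envelopes.

    Simplifying assumption: two envelopes are compatible only if their m/z
    ranges do not overlap (touching endpoints are allowed).
    """
    envs = sorted(candidates, key=lambda e: e.mz_end)
    ends = [e.mz_end for e in envs]

    # best[i] = weight of the heaviest path that uses only the first i envelopes.
    best = [0.0] * (len(envs) + 1)
    take = [False] * len(envs)   # whether envelope i is on the best path so far
    prev = [0] * len(envs)       # how many earlier envelopes end before i starts

    for i, env in enumerate(envs):
        # Count earlier envelopes whose right edge is at or before env's left edge.
        p = bisect_right(ends, env.mz_start, 0, i)
        prev[i] = p
        with_env = best[p] + env.score
        if with_env > best[i]:
            best[i + 1] = with_env
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Trace the heaviest path back from the last envelope.
    chosen = []
    i = len(envs)
    while i > 0:
        if take[i - 1]:
            chosen.append(envs[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[-1], chosen


# Usage with made-up candidates (m/z ranges and scores are illustrative only).
candidates = [
    Envelope(500.0, 501.0, 3.0),
    Envelope(500.5, 502.0, 5.0),
    Envelope(501.0, 503.0, 4.0),
    Envelope(502.0, 504.0, 1.0),
]
total, chosen = heaviest_compatible_subset(candidates)
print(total, chosen)
```

In this toy input the single highest-scoring envelope (score 5.0) is not selected, because the first and third envelopes together score 7.0. That is the practical difference between scoring a set of envelopes and ranking envelopes one at a time.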

  • ABSTRACT: For high-resolution tandem mass spectra, the determination of monoisotopic masses of fragment ions plays a key role in the subsequent peptide and protein identification. In this paper, we present a new algorithm for deisotoping bottom-up spectra. Isotopic-cluster graphs are constructed to describe the relationship between all possible isotopic clusters. Based on the relationship in isotopic-cluster graphs, each possible isotopic cluster is assessed with a score function, which is built by combining non-intensity and intensity features of fragment ions. The non-intensity features are used to prevent fragment ions with low intensity from being removed. Dynamic programming is adopted to find the highest score path with the most reliable isotopic clusters. The experimental results show that the average Mascot scores and F-scores of identified peptides from spectra processed by our deisotoping method are greater than those from spectra processed by the YADA and MS-Deconv software.
    Advances in Bioinformatics 01/2011; 2011:210805. DOI:10.1155/2011/210805
  • ABSTRACT: Top-down mass spectrometry plays an important role in intact protein identification and characterization. Top-down mass spectra are more complex than bottom-up mass spectra because they often contain many isotopomer envelopes from highly charged ions, which may overlap with one another. As a result, spectral deconvolution, which converts a complex top-down mass spectrum into a monoisotopic mass list, is a key step in top-down spectral interpretation. In this paper, we propose a new scoring function, L-score, for evaluating isotopomer envelopes. By combining L-score with MS-Deconv, a new software tool, MS-Deconv+, was developed for top-down spectral deconvolution. Experimental results showed that MS-Deconv+ outperformed existing software tools in top-down spectral deconvolution. L-score shows high discriminative ability in identification of isotopomer envelopes. Using L-score, MS-Deconv+ reports many correct monoisotopic masses missed by other software tools, which are valuable for proteoform identification and characterization.
  • ABSTRACT: Capillary zone electrophoresis (CZE) with an electrokinetically pumped sheath-flow nanospray interface was coupled with a high-resolution Q-Exactive mass spectrometer for the analysis of culture filtrates from Mycobacterium marinum. We confidently identified 22 gene products from the wild-type M. marinum secretome in a single CZE-tandem mass spectrometry (MS/MS) run. A total of 58 proteoforms were observed with post-translational modifications including signal peptide removal, N-terminal methionine excision, and acetylation. The conductivities of aqueous acetic acid and formic acid solutions were measured from 0.1% to 100% concentration (v/v). 70% acetic acid provided lower conductivity than 0.25% formic acid, and was evaluated as a low ionic-strength, CZE-MS-compatible sample buffer with good protein solubility.
    Analytical Chemistry 04/2014; 86(10). DOI:10.1021/ac500092q · 5.83 Impact Factor