Deconvolution and Database Search of Complex Tandem Mass Spectra of Intact Proteins: A Combinatorial Approach

Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA.
Molecular & Cellular Proteomics (Impact Factor: 6.56). 12/2010; 9(12):2772-82. DOI: 10.1074/mcp.M110.002766
Source: PubMed


Top-down proteomics studies intact proteins, enabling new opportunities for analyzing post-translational modifications. Because tandem mass spectra of intact proteins are very complex, spectral deconvolution (grouping peaks into isotopomer envelopes) is a key initial stage for their interpretation. In such spectra, isotopomer envelopes of different protein fragments span overlapping regions on the m/z axis and even share spectral peaks. This raises both pattern recognition and combinatorial challenges for spectral deconvolution. We present MS-Deconv, a combinatorial algorithm for spectral deconvolution. The algorithm first generates a large set of candidate isotopomer envelopes for a spectrum, then represents the spectrum as a graph, and finally selects its highest scoring subset of envelopes as a heaviest path in the graph. In contrast with other approaches, the algorithm scores sets of envelopes rather than individual envelopes. We demonstrate that MS-Deconv improves on Thrash and Xtract in both the number of correctly recovered monoisotopic masses and speed. We applied MS-Deconv to a large set of top-down spectra from Yersinia rohdei (with a still unsequenced genome) and further matched them against the protein database of the related, sequenced bacterium Yersinia enterocolitica. MS-Deconv is available at
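
The "heaviest path" selection step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the MS-Deconv implementation: it assumes each candidate envelope is reduced to an m/z interval with a precomputed score, and it only allows envelopes with disjoint m/z ranges on a path, whereas the abstract notes that real envelopes can overlap and share peaks. The names `Envelope` and `heaviest_path` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical candidate envelope: an m/z interval with a score."""
    mz_start: float
    mz_end: float
    score: float


def heaviest_path(envelopes):
    """Return (total_score, chosen) for the best chain of disjoint envelopes.

    Envelopes are nodes of a DAG; an edge a -> b exists when a ends at or
    before the start of b. The heaviest path is found by dynamic programming
    over the nodes sorted by starting m/z.
    """
    order = sorted(envelopes, key=lambda e: (e.mz_start, e.mz_end))
    if not order:
        return 0.0, []

    best = [e.score for e in order]   # best path weight ending at each node
    prev = [-1] * len(order)          # predecessor on that best path

    for j, cur in enumerate(order):
        for i in range(j):
            cand = best[i] + cur.score
            if order[i].mz_end <= cur.mz_start and cand > best[j]:
                best[j] = cand
                prev[j] = i

    # Walk back from the node where the heaviest path ends.
    k = max(range(len(order)), key=best.__getitem__)
    total = best[k]
    chosen = []
    while k != -1:
        chosen.append(order[k])
        k = prev[k]
    chosen.reverse()
    return total, chosen


if __name__ == "__main__":
    # Made-up candidates, for illustration only.
    candidates = [
        Envelope(500.0, 501.0, 3.0),
        Envelope(500.5, 501.5, 5.0),
        Envelope(501.0, 502.0, 4.0),
        Envelope(501.5, 502.5, 1.0),
    ]
    total, chosen = heaviest_path(candidates)
    print(total, [(e.mz_start, e.mz_end) for e in chosen])
```

The point of the sketch is that the score is assigned to a whole set of envelopes (a path) rather than to each envelope in isolation, which is the contrast with other approaches that the abstract draws.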

    • "In all cases, the peptides were extracted from the gel with 100% (v/v) acetonitrile, vacuum dried and resuspended in a deionized water solution containing (v/v) 5% DMSO and 5% acetonitrile. All raw data files were processed into peak lists using the software ReAdW 4.3.1 and then deconvoluted using the program MS-deconv (Liu et al., 2010). The files generated from MS-deconv were analyzed by MASCOT (Matrix Sciences)."
    ABSTRACT: Cysteine cathepsins are widely spread on living organisms associated to protein degradation in lysosomes, but some groups of Arthropoda (Heteroptera, Coleoptera, Crustacea and Acari) present these enzymes related to digestion of the meal proteins. Although spiders combine a mechanism of extra-oral with intracellular digestion, the sporadic studies on this subject were mainly concerned with the digestive fluid (DF) analysis. Thus, a more complete scenario of the digestive process in spiders is still lacking in the literature. In this paper we describe the identification and characterization of cysteine cathepsins in the midgut diverticula (MD) and DF of the spider Nephilengys cruentata by using enzymological assays. Furthermore, qualitative and quantitative data from transcriptomic followed by proteomic experiments were used together with biochemical assays for results interpretation. Five cathepsins L, one cathepsin F and one cathepsin B were identified by mass spectrometry, with cathepsins L1 (NcCTSL1) and 2 (NcCTSL2) as the most abundant enzymes. The native cysteine cathepsins presented acidic characteristics such as pH optima of 5.5, pH stability in acidic range and zymogen conversion to the mature form after in vitro acidification. NcCTSL1 seems to be a lysosomal enzyme with its recombinant form displaying acidic characteristics as the native ones and being inhibited by pepstatin. Evolutionarily, arachnid cathepsin L may have acquired different roles but its use for digestion is a common feature to studied taxa. Now a more elucidative picture of the digestive process in spiders can be depicted, with trypsins and astacins acting extra-orally under alkaline conditions whereas cysteine cathepsins will act in an acidic environment, likely in the digestive vacuoles or lysosome-like vesicles. Copyright © 2015. Published by Elsevier Ltd.
    Insect biochemistry and molecular biology 03/2015; 60. DOI:10.1016/j.ibmb.2015.03.005 · 3.45 Impact Factor
    • "Each of these operations are not without potential pitfalls either. Tools such as MS-Deconv [51], YADA [52], Thrash [53] and Xtract (Thermo) can be used for generating a list of monoisotopic peaks, but it is not clear how well these algorithms perform with ETD or ECD, where the radical and non-radical fragment ions could hinder interpretation. In fact, just a simple determination of a monoisotopic peak that is readily accomplished in bottom-up proteomics remains challenging when done with larger peptides [54]. "
    ABSTRACT: Epigenetic regulation of gene expression is, at least in part, mediated by histone modifications. PTMs of histones change chromatin structure and regulate gene transcription, DNA damage repair, and DNA replication. Thus, studying histone variants and their modifications not only elucidates their functional mechanisms in chromatin regulation, but also provides insights into phenotypes and diseases. A challenge in this field is to determine the best approach(es) to identify histone variants and their PTMs using a robust high-throughput analysis. The large number of histone variants and the enormous diversity that can be generated through combinatorial modifications, also known as histone code, makes identification of histone PTMs a laborious task. MS has been proven to be a powerful tool in this regard. Here, we focus on bottom-up, middle-down, and top-down MS approaches, including CID and electron-capture dissociation/electron-transfer dissociation based techniques for characterization of histones and their PTMs. In addition, we discuss advances in chromatographic separation that take advantage of the chemical properties of the specific histone modifications. This review is also unique in its discussion of current bioinformatic strategies for comprehensive histone code analysis.
    Proteomics 03/2014; 14(4-5). DOI:10.1002/pmic.201300256 · 3.81 Impact Factor
    • "One of the main advantages of our method over more simplistic pattern picking methods is the ability to disentangle isotope patterns of overlapping peptide signals, whose presence may lead to a significantly more challenging pattern picking problem as e.g. discussed in [41] in the slightly different context of intact protein mass spectra. Therefore, a potential application for our approach will be the analysis of a certain class of posttranslational modifications, the deamidation of amino acid residues containing a carboxamide side chain functionality."
    ABSTRACT: Background: The robust identification of isotope patterns originating from peptides being analyzed through mass spectrometry (MS) is often significantly hampered by noise artifacts and the interference of overlapping patterns arising e.g. from post-translational modifications. As the classification of the recorded data points into either ‘noise’ or ‘signal’ lies at the very root of essentially every proteomic application, the quality of the automated processing of mass spectra can significantly influence the way the data might be interpreted within a given biological context. Results: We propose non-negative least squares/non-negative least absolute deviation regression to fit a raw spectrum by templates imitating isotope patterns. In a carefully designed validation scheme, we show that the method exhibits excellent performance in pattern picking. It is demonstrated that the method is able to disentangle complicated overlaps of patterns. Conclusions: We find that regularization is not necessary to prevent overfitting and that thresholding is an effective and user-friendly way to perform feature selection. The proposed method avoids problems inherent in regularization-based approaches, comes with a set of well-interpretable parameters whose default configuration is shown to generalize well without the need for fine-tuning, and is applicable to spectra of different platforms. The R package IPPD implements the method and is available from the Bioconductor platform (
    BMC Bioinformatics 11/2012; 13(1):291. DOI:10.1186/1471-2105-13-291 · 2.58 Impact Factor