A parallel method for enumerating amino acid compositions and masses of all theoretical peptides

Department of Biochemistry and Molecular Biology, Sealy Center for Molecular Medicine, University of Texas Medical Branch, 301 University Blvd, Galveston, TX 77555, USA.
BMC Bioinformatics (Impact Factor: 2.58). 11/2011; 12(1):432. DOI: 10.1186/1471-2105-12-432
Source: PubMed


Enumeration of all theoretically possible amino acid compositions is an important problem in several proteomics workflows, including peptide mass fingerprinting, mass defect labeling, mass defect filtering, and de novo peptide sequencing. Because of the high computational complexity of this task, reported methods for peptide enumeration have been restricted to limited mass ranges (below 2 kDa). In addition, neither the implementation details of these methods nor their computational performance have been reported. The increasing availability of parallel (multi-core) computers in all fields of research makes the development of parallel methods for peptide enumeration a timely topic.
We describe a parallel method for enumerating all amino acid compositions up to a given length. We present the recursive procedures at the core of the method and show that the single task of enumerating all peptide compositions can be divided into smaller subtasks that can be executed in parallel. The computational complexity of the subtasks is compared with that of the whole task. Pseudocode is given for the processes (a master and workers) that execute the enumeration procedure in parallel. We present computational times for our method executed on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores) running Windows HPC Server. Our method has been implemented as a 32- and 64-bit Windows application using Microsoft Visual C++ and the Message Passing Interface. It is available for download at
We describe the implementation of a parallel method for generating mass distributions of all theoretically possible amino acid compositions.
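
The idea of splitting one recursive enumeration into independent subtasks can be illustrated with a minimal Python sketch. This is not the authors' implementation (which the abstract describes as Visual C++ with the Message Passing Interface); it is an assumption-laden toy. The names `extend`, `subtask`, and `enumerate_all` are hypothetical, the alphabet is cut down to four residues to keep the output small, and a thread pool merely stands in for the master and worker processes (in CPython it shows the task decomposition, not a real speed-up). Each subtask fixes the first residue of a composition, and residues are only ever appended in non-decreasing alphabet order, so every composition is produced exactly once.

```python
from concurrent.futures import ThreadPoolExecutor
from math import comb

# Monoisotopic residue masses in Da for an illustrative four-residue subset
# (assumption: a real run would use all 20 natural amino acids).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276}
WATER = 18.01056  # mass of H2O added to the sum of residue masses
ALPHABET = sorted(RESIDUE_MASS)  # fixed order defines the canonical form


def mass(comp):
    """Monoisotopic mass of a peptide with the given composition."""
    return sum(RESIDUE_MASS[a] for a in comp) + WATER


def extend(start, prefix, max_len):
    """Recursively yield `prefix` and every longer composition built from it.

    Only residues at alphabet index >= `start` are appended, so reorderings
    of the same multiset of residues are never generated twice.
    """
    yield prefix
    if len(prefix) == max_len:
        return
    for i in range(start, len(ALPHABET)):
        yield from extend(i, prefix + (ALPHABET[i],), max_len)


def subtask(first_index, max_len):
    """Worker: all compositions whose first residue is ALPHABET[first_index]."""
    first = (ALPHABET[first_index],)
    return [(c, mass(c)) for c in extend(first_index, first, max_len)]


def enumerate_all(max_len, workers=4):
    """Master: hand one subtask per first residue to the pool and merge results."""
    if max_len < 1:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda i: subtask(i, max_len), range(len(ALPHABET)))
        return [item for part in parts for item in part]
```

For an alphabet of n residues and a maximum length L, the number of non-empty compositions is C(n + L, L) - 1, so `len(enumerate_all(3))` with the four-residue alphabet above is 34. The subtasks are uneven in size (the subtask for the first residue in the alphabet is the largest), which is one reason a comparison of subtask complexity against the whole task matters for load balancing.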



Available from: Rovshan G Sadygov, Sep 11, 2014
  •
    ABSTRACT: Improvements in the mass accuracy and resolution of mass spectrometers have greatly aided mass spectrometry-based proteomics in profiling complex biological mixtures. With the use of innovative bioinformatics approaches, high mass accuracy and resolution information can be used for filtering chemical noise in mass spectral data. Using our recent algorithmic developments, we have generated the mass distributions of all theoretical tryptic peptides composed of 20 natural amino acids and with masses limited to 3.5 kDa. Peptide masses are distributed discretely, with well-defined peak clusters separated by empty or sparsely populated trough regions. Accurate models for peak centers and widths can be used to filter peptide signals from chemical noise. We modeled mass defects, the difference between monoisotopic and nominal masses, and peak centers and widths in the peptide mass distributions. We found that peak widths encompassing 95% of all peptide sequences are substantially smaller than previously thought. The result has implications for filtering out larger stretches of the mass axis. Mass defects of peptides exhibit an oscillatory behavior which is damped at high mass values. The periodicity of the oscillations is about 14 Da which is the most common difference between the masses of the 20 natural amino acids. To determine the effects of amino acid modifications on our findings, we examined the mass distributions of peptides composed of the 20 natural amino acids, oxidized Met, and phosphorylated Ser, Thr, and Tyr. We found that extension of the amino acid set by modifications increases the 95% peak width. Mass defects decrease, reflecting the fact that the average mass defect of natural amino acids is larger than that of oxidized Met. We propose that a new model for mass defects and peak widths of peptides may improve peptide identifications by filtering chemical noise in mass spectral data.
    Analytical Chemistry 03/2012; 84(6):3026-32. DOI:10.1021/ac203255e · 5.64 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: A comprehensive investigation was performed to understand the influence of sequence scrambling in peptide ions on peptide identification results. To achieve this, four tandem mass spectrometry datasets, each with scrambled ions included and excluded, were analyzed by Crux, X!Tandem, SpectraST, Lutefisk, and PepNovo. While the algorithms differed in their performance, an increase in the number of correctly identified peptides was generally observed when scrambled ions were removed, with the exception of the SpectraST algorithm. However, the variation of the match scores upon removal was unpredictable. Following these investigations, an interpretation was given of how the scrambled ions affect peptide identification. Lastly, a simulated theoretical mass spectral library derived from the NIST peptide libraries was constructed and searched by SpectraST to study whether scrambled ions in predicted mass spectra could affect peptide identification. Consistent with the peptide library search results, no significant variations in dot product scores or peptide identification results were observed when these ions were included in the theoretical MS/MS spectra. Of the five algorithms used, SpectraST and Crux provided the most robust results, whereas X!Tandem, PepNovo, and Lutefisk were sensitive to the presence of scrambled ions, especially the latter two de novo sequencing algorithms.
    Journal of the American Society for Mass Spectrometry 03/2013; 24(6). DOI:10.1007/s13361-013-0591-3 · 2.95 Impact Factor
  •
    ABSTRACT: The high mass accuracy and resolution of modern mass spectrometers provide new opportunities to employ theoretical peptide distributions in large-scale proteomic studies. We used theoretical distributions to study noise filtering and mass measurement errors, and to examine mass-based differentiation of phosphorylated and non-phosphorylated peptides. Only the monoisotopic mass of the experimental precursor ion was necessary for this analysis. We found that peak deviations can be used to characterize the modification states of peptides in a sample. When applied to large-scale proteomic datasets, the peak deviation distribution can be used to filter chemical/electronic noise for singly charged species. Using peak deviation distributions, it is possible to separate the phosphorylated peptides from the non-phosphorylated peptides, enabling evaluation of the phosphoproteome content of a sample. Because this approach is simple, with light computational requirements, the analysis of theoretical peptide distributions has significant potential for application to phosphoproteome analyses. For our studies we used publicly available datasets from three large-scale proteomic studies.
    Journal of Proteome Research 06/2013; 12(7). DOI:10.1021/pr4003382 · 4.25 Impact Factor