A parallel method for enumerating amino acid compositions and masses of all theoretical peptides

Department of Biochemistry and Molecular Biology, Sealy Center for Molecular Medicine, University of Texas Medical Branch, 301 University Blvd, Galveston, TX 77555, USA.
BMC Bioinformatics (Impact Factor: 2.67). 11/2011; 12:432. DOI: 10.1186/1471-2105-12-432
Source: PubMed

ABSTRACT Enumeration of all theoretically possible amino acid compositions is an important problem in several proteomics workflows, including peptide mass fingerprinting, mass defect labeling, mass defect filtering, and de novo peptide sequencing. Because of the high computational complexity of this task, reported methods for peptide enumeration have been restricted to limited mass ranges (below 2 kDa). In addition, neither the implementation details of these methods nor their computational performance have been reported. The increasing availability of parallel (multi-core) computers in all fields of research makes the development of parallel methods for peptide enumeration timely.
We describe a parallel method for enumerating all amino acid compositions up to a given length. We present the recursive procedures at the core of the method and show that the single task of enumerating all peptide compositions can be divided into smaller subtasks that can be executed in parallel. The computational complexity of the subtasks is compared with that of the whole task. Pseudocode is given for the processes (a master and workers) that execute the enumeration procedure in parallel. We present computation times for our method executed on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores) running Windows HPC Server. Our method has been implemented as a 32- and 64-bit Windows application using Microsoft Visual C++ and the Message Passing Interface. It is available for download at
We describe the implementation of a parallel method for generating mass distributions of all theoretically possible amino acid compositions.
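
The idea of splitting one recursive enumeration into independent subtasks can be illustrated with a minimal Python sketch. It is not the authors' implementation (which uses Visual C++ and the Message Passing Interface): the five-residue alphabet, the function names, the choice to split on the first residue, and the thread pool standing in for the master and worker processes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from math import comb

# Monoisotopic residue masses (Da) for a small illustrative alphabet (assumption:
# a real run would cover all amino acids with distinct masses).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "V": 99.06841}
WATER = 18.01056  # mass of H2O added once per peptide
RESIDUES = sorted(RESIDUE_MASS)


def extend(prefix, start, max_len, out):
    """Recursively append residues with index >= start.

    Keeping the residues in non-decreasing order means every composition
    (multiset of residues) is generated exactly once.
    """
    if prefix:
        out.append(tuple(prefix))
    if len(prefix) == max_len:
        return
    for i in range(start, len(RESIDUES)):
        prefix.append(RESIDUES[i])
        extend(prefix, i, max_len, out)
        prefix.pop()


def subtask(first, max_len):
    """Hypothetical subtask: all compositions whose smallest residue is RESIDUES[first]."""
    out = []
    extend([RESIDUES[first]], first, max_len, out)
    return out


def composition_mass(comp):
    """Monoisotopic mass of a peptide with the given composition."""
    return sum(RESIDUE_MASS[r] for r in comp) + WATER


def enumerate_parallel(max_len, workers=4):
    """Run one subtask per first residue; the pool plays the role of the workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda i: subtask(i, max_len), range(len(RESIDUES)))
        return [comp for part in parts for comp in part]


compositions = enumerate_parallel(max_len=4)
masses = sorted(composition_mass(c) for c in compositions)
```

With k residue types and a maximum length of L, the number of non-empty compositions is C(L + k, k) - 1, so the list above should hold comb(9, 5) - 1 entries. In this particular split the subtasks are unequal in size: the one starting at the first residue is the largest, which is one reason a comparison of subtask and whole-task complexity matters when distributing work.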

  • ABSTRACT: We studied the use of peak deviations for application in phosphoproteomics. Due to the differences in the mass defects, the peak deviations of samples containing mixtures of phosphorylated and nonphosphorylated peptides show bimodal distributions. The ratios of peak heights accurately predict the phosphoproteome content of a sample. In this work we apply a signal-processing tool, singular value decomposition (SVD), to reveal characteristic features of the phosphorylated, nonphosphorylated and mixed samples. We show that a simple application of SVD to the peak deviation (PD) matrix 1) detects transitions from mostly phosphorylated samples to mostly nonphosphorylated samples, 2) reveals modes of low-abundance species in the presence of the high-abundance species (e.g., phosphorylated peptides), and 3) simplifies the interpretation of the clustering of a covariance matrix obtained from PDs. As the eigenfunctions of the inner-product of the data matrix (made from the PDs) are Hermite functions, we observe a change of sign in the transition from samples enriched in phosphorylated peptides to samples containing fewer phosphorylated peptides. The ordering of the singular values of the data matrix points in the direction of changes to the phosphorylation content. No peptide identifications from a database were used for this study.
    Electrophoresis 07/2014; 35(24). DOI:10.1002/elps.201400053 · 3.16 Impact Factor
  • ABSTRACT: A wide variety of cyclic molecular architectures are built of modular subunits and can be formed combinatorially. The mathematics for enumeration of such objects is well-developed yet lacks key features of importance in chemistry, such as specifying (i) the structures of individual members among a set of isomers, (ii) the distribution (i.e., relative amounts) of products, and (iii) the effect of nonequal ratios of reacting monomers on the product distribution. Here, a software program (Cyclaplex) has been developed to determine the number, identity (including isomers), and relative amounts of linear and cyclic architectures from a given number and ratio of reacting monomers. The program includes both mathematical formulas and generative algorithms for enumeration; the latter go beyond the former to provide desired molecular-relevant information and data-mining features. The program is equipped to enumerate four types of architectures: (i) linear architectures with directionality (macroscopic equivalent = electrical extension cords), (ii) linear architectures without directionality (batons), (iii) cyclic architectures with directionality (necklaces), and (iv) cyclic architectures without directionality (bracelets). The program can be applied to cyclic peptides, cycloveratrylenes, cyclens, calixarenes, cyclodextrins, crown ethers, cucurbiturils, annulenes, expanded meso-substituted porphyrin(ogen)s, and diverse supramolecular (e.g., protein) assemblies. The size of accessible architectures encompasses up to 12 modular subunits derived from 12 reacting monomers or larger architectures (e.g. 13-17 subunits) from fewer types of monomers (e.g. 2-4). A particular application concerns understanding the possible heterogeneity of (natural or biohybrid) photosynthetic light-harvesting oligomers (cyclic, linear) formed from distinct peptide subunits.
    Journal of Chemical Information and Modeling 08/2013; 53(9). DOI:10.1021/ci400175f · 4.07 Impact Factor
  • ABSTRACT: The high mass accuracy and resolution of modern mass spectrometers provide new opportunities to employ theoretical peptide distributions in large-scale proteomic studies. We used theoretical distributions to study noise filtering and mass measurement errors, and to examine mass-based differentiation of phosphorylated and nonphosphorylated peptides. Only the monoisotopic mass of the experimental precursor ion was necessary for this analysis. We found that peak deviations can be used to characterize the modification states of peptides in a sample. When applied to large-scale proteomic datasets, the peak deviation distribution can be used to filter chemical/electronic noise for singly charged species. Using peak deviation distributions, it is possible to separate the phosphorylated peptides from the nonphosphorylated peptides, enabling evaluation of the phosphoproteome content of a sample. Because this approach is simple, with light computational requirements, the analysis of theoretical peptide distributions has significant potential for application to phosphoproteome analyses. For our studies we used publicly available datasets from three large-scale proteomic studies.
    Journal of Proteome Research 06/2013; 12(7). DOI:10.1021/pr4003382 · 5.00 Impact Factor
