Conference Paper

An Architecture for Real Time Data Acquisition and Online Signal Processing for High Throughput Tandem Mass Spectrometry.

DOI: 10.1109/e-Science.2009.21
Conference: Fifth International Conference on e-Science, e-Science 2009, 9-11 December 2009, Oxford, UK
Source: DBLP

ABSTRACT Independent, greedy collection of data events using simple heuristics results in massive over-sampling of the prominent data features in large-scale studies over what should be achievable through "intelligent", online acquisition of such data. As a result, data generated are more aptly described as a collection of a large number of small experiments rather than a true large-scale experiment. Nevertheless, achieving "intelligent", online control requires tight interplay between state-of-the-art, data-intensive computing infrastructure developments and analytical algorithms. In this paper, we propose a Software Architecture for Mass spectrometry-based Proteomics coupled with Liquid chromatography Experiments (SAMPLE) to develop an "intelligent" online control and analysis system to significantly enhance the information content from each sensor (in this case, a mass spectrometer). Using online analysis of data events as they are collected and decision theory to optimize the collection of events during an experiment, we aim to maximize the information content generated during an experiment by the use of pre-existing knowledge to optimize the dynamic collection of events.
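The over-sampling problem described in the abstract can be illustrated with a minimal sketch. It is not the SAMPLE architecture: the `Precursor` type, the function names, the m/z tolerance, and the toy scans are all assumptions made for illustration, and a plain exclusion rule stands in for the decision-theoretic step the abstract mentions. The sketch compares greedy "top-N by intensity" selection with a selection that consults what has already been collected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float         # mass-to-charge ratio seen in a survey scan
    intensity: float  # signal intensity of that precursor


def greedy_top_n(scan, n):
    """Pick the n most intense precursors, ignoring anything collected before."""
    return sorted(scan, key=lambda p: p.intensity, reverse=True)[:n]


def informed_top_n(scan, n, known_mz, tol=0.01):
    """Pick the n most intense precursors whose m/z is not already known.

    The tolerance `tol` (in m/z units) is a hypothetical parameter.
    """
    def is_known(p):
        return any(abs(p.mz - k) <= tol for k in known_mz)

    fresh = [p for p in scan if not is_known(p)]
    return sorted(fresh, key=lambda p: p.intensity, reverse=True)[:n]


def run(scans, n, informed):
    """Return the m/z of every fragmentation event triggered over all scans."""
    known = []   # m/z values already fragmented (the pre-existing knowledge)
    events = []
    for scan in scans:
        if informed:
            picks = informed_top_n(scan, n, known)
        else:
            picks = greedy_top_n(scan, n)
        for p in picks:
            events.append(p.mz)
            known.append(p.mz)
    return events


# Toy data: three consecutive survey scans showing the same four peptides.
scans = [[Precursor(500.25, 9e6), Precursor(622.31, 5e6),
          Precursor(710.40, 8e5), Precursor(845.47, 2e5)]] * 3

greedy = run(scans, n=2, informed=False)
informed = run(scans, n=2, informed=True)
print(len(set(greedy)), "distinct precursors from", len(greedy), "greedy events")
print(len(set(informed)), "distinct precursors from", len(informed), "informed events")
```

In this toy run the greedy rule spends all of its events on the two most intense precursors, while the informed rule reaches the weaker ones with fewer events. A real system would replace the exclusion rule with a scoring of expected information gain.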

  • ABSTRACT: Reliable identification of posttranslational modifications is key to understanding various cellular regulatory processes. We describe a tool, InsPecT, to identify posttranslational modifications using tandem mass spectrometry data. InsPecT constructs database filters that proved to be very successful in genomics searches. Given an MS/MS spectrum S and a database D, a database filter selects a small fraction of database D that is guaranteed (with high probability) to contain a peptide that produced S. InsPecT uses peptide sequence tags as efficient filters that reduce the size of the database by a few orders of magnitude while retaining the correct peptide with very high probability. In addition to filtering, InsPecT also uses novel algorithms for scoring and validating in the presence of modifications, without explicit enumeration of all variants. InsPecT identifies modified peptides with better or equivalent accuracy than other database search tools while being 2 orders of magnitude faster than SEQUEST, and substantially faster than X!TANDEM on complex mixtures. The tool was used to identify a number of novel modifications in different data sets, including many phosphopeptides in data provided by Alliance for Cellular Signaling that were missed by other tools.
    Analytical Chemistry 08/2005; 77(14):4626-39. · 5.83 Impact Factor
  • ABSTRACT: Determining isotopic clusters and their monoisotopic masses is a first step in interpreting complex mass spectra generated by high-resolution mass spectrometers. We propose a mathematical model for isotopic distributions of polypeptides and an effective interpretation algorithm. Our model uses two types of ratios: intensity ratio of two adjacent peaks and intensity ratio product of three adjacent peaks in an isotopic distribution. These ratios can be approximated as simple functions of a polypeptide mass, the values of which fall within certain ranges, depending on the polypeptide mass. Given a spectrum as a peak list, our algorithm first finds all isotopic clusters consisting of two or more peaks. Then, it scores clusters using the ranges of ratio functions and computes the monoisotopic masses of the identified clusters. Our method was applied to high-resolution mass spectra obtained from a Fourier transform ion cyclotron resonance (FTICR) mass spectrometer coupled to reverse-phase liquid chromatography (RPLC). For polypeptides whose amino acid sequences were identified by tandem mass spectrometry (MS/MS), we applied both THRASH-based software implementations and our method. Our method was observed to find more masses of known peptides when the total number of clusters identified by each method was held fixed. Experimental results show that our method performed better for isotopic mass clusters of weak intensity where the isotopic distributions deviate significantly from their theoretical distributions. Also, it correctly identified some isotopic clusters that were not found by THRASH-based implementations, especially those for which THRASH gave 1 Da mismatches. Another advantage of our method is that it is very fast, much faster than THRASH, which calculates the least-squares fit.
    Analytical Chemistry 09/2008; 80(19):7294-303. · 5.83 Impact Factor
  • ABSTRACT: An enabling capability for proteomics would be the ability to study protein expression on a global scale. While several different separation and analysis options are being investigated to advance the practice of proteomics, mass spectrometry (MS) is rapidly becoming the core instrumental technology used to characterize the large number of proteins that constitute a proteome. To be most effective, proteomic measurements must be high-throughput, ideally allowing thousands of proteins to be identified on a time scale of hours. Most strategies of identification by MS rely on the analysis of enzymatically produced peptides originating from an isolated protein followed by either peptide mapping or tandem MS (MS/MS) to obtain sequence information for a single peptide. In the case of peptide mapping, several peptide masses are needed to unambiguously identify a protein with the typically achieved mass measurement accuracies (MMA). The ability to identify proteins based on the mass of a single peptide (i.e., an accurate mass tag; AMT) is proposed and is largely dependent on the MMA that can be achieved. To determine the MMA necessary to enable the use of AMTs for proteome-wide protein identification, we analyzed the predicted proteins and their tryptic fragments from Saccharomyces cerevisiae and Caenorhabditis elegans. The results show that low ppm (i.e., approximately 1 ppm) level measurements have practical utility for analysis of small proteomes. Additionally, up to 85% of the peptides predicted from these organisms can function as AMTs at sub-ppm MMA levels attainable using Fourier transform ion cyclotron resonance MS. Additional information, such as sequence constraints, should enable even more complex proteomes to be studied at more modest mass measurement accuracies. Once AMTs are established, subsequent high-throughput measurements of proteomes (e.g., after perturbations) will be greatly facilitated.
    Analytical Chemistry 08/2000; 72(14):3349-54. · 5.83 Impact Factor
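
The dependence of accurate mass tags on mass measurement accuracy, described in the last abstract above, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the analysis from that paper: the function names and the five example peptides are hypothetical, and a peptide is counted as a usable tag only if no other peptide in the list lies within the given ppm tolerance of its monoisotopic mass. The residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the free peptide termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def ppm_error(measured, theoretical):
    """Mass measurement error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def fraction_unique(peptides, ppm):
    """Fraction of peptides with no other peptide mass within `ppm`."""
    masses = sorted(peptide_mass(p) for p in peptides)
    unique = 0
    for i, m in enumerate(masses):
        tol = m * ppm * 1e-6  # absolute tolerance in Da at this mass
        left_clear = i == 0 or m - masses[i - 1] > tol
        right_clear = i == len(masses) - 1 or masses[i + 1] - m > tol
        if left_clear and right_clear:
            unique += 1
    return unique / len(masses)


# Hypothetical peptide list: K/Q variants differ by about 0.036 Da,
# and ELVISK/LIVESK are isomers with identical mass.
peptides = ["PEPTIDEK", "PEPTIDEQ", "SAMPLER", "ELVISK", "LIVESK"]

for ppm in (1.0, 50.0):
    print(ppm, "ppm:", fraction_unique(peptides, ppm))
```

Tightening the tolerance separates the K and Q variants, which differ by roughly 39 ppm at this mass, but no mass accuracy can separate the two isomers. That is in line with the abstract's remark that additional information such as sequence constraints is needed in harder cases.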