Article

Web and Database Software for Identification of Intact Proteins Using “Top Down” Mass Spectrometry

Authors:
  • LabKey Software

Abstract

For the identification and characterization of proteins harboring posttranslational modifications (PTMs), a "top down" strategy using mass spectrometry has been forwarded recently but languishes without tailored software widely available. We describe a Web-based software and database suite called ProSight PTM constructed for large-scale proteome projects involving direct fragmentation of intact protein ions. Four main components of ProSight PTM are a database retrieval algorithm (Retriever), MySQL protein databases, a file/data manager, and a project tracker. Retriever performs probability-based identifications from absolute fragment ion masses, automatically compiled sequence tags, or a combination of the two, with graphical rendering and browsing of the results. The database structure allows known and putative protein forms to be searched, with prior or predicted PTM knowledge used during each search. Initial functionality is illustrated with a 36-kDa yeast protein identified from a processed cell extract after automated data acquisition using a quadrupole-FT hybrid mass spectrometer. A +142-Da Δm on glyceraldehyde-3-phosphate dehydrogenase was automatically localized between Asp90 and Asp192, consistent with its two cysteine residues (149 and 153) alkylated by acrylamide (+71 Da each) during the gel-based sample preparation. ProSight PTM is the first search engine and Web environment for identification of intact proteins (https://prosightptm.scs.uiuc.edu/).
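
The arithmetic behind the +142-Da assignment can be sketched in a few lines. This is an illustration only, not ProSight PTM code: the function name `adducts_explaining_shift` and the 0.5-Da tolerance are assumptions, and the nominal masses are the ones quoted in the abstract (a 142-Da shift, 71 Da per acrylamide adduct, two candidate residues in the localized window).

```python
def adducts_explaining_shift(delta_m, adduct_mass, max_sites, tol=0.5):
    """Return the adduct counts (1..max_sites) whose total mass matches delta_m.

    All masses are in Da; tol is a hypothetical absolute tolerance.
    """
    return [n for n in range(1, max_sites + 1)
            if abs(delta_m - n * adduct_mass) <= tol]


# Values quoted in the abstract: +142 Da shift, +71 Da per acrylamide adduct,
# two candidate residues (149 and 153) inside the Asp90-Asp192 window.
counts = adducts_explaining_shift(delta_m=142.0, adduct_mass=71.0, max_sites=2)
print(counts)  # [2]: alkylation of both residues accounts for the shift
```

A single adduct (71 Da) cannot explain the shift, so only the doubly alkylated form is consistent with the observed mass difference.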

... As Top Down proteomics continues to increase in throughput and complexity of the samples analyzed, it is clear that a software platform must allow for fast, automated processing of raw data. ProSight PTM was the first search engine and web application designed for the identification of intact proteins [119,120]. In absolute mass searching (Fig. 1.4), the software uses the precursor mass and mass tolerance window to generate a possible list of candidates from a larger annotated database. ...
... A P-score is calculated for each hit, representing the probability that a random sequence could account for the matching ions [121]. Sequence tag searches can also be performed, allowing for identification of proteins based on amino acid mass differences from the fragmentation data (Fig. 1.4) [119,120]. An updated version, ProSight PTM 2.0, added the ability to include fixed modifications (e.g. ...
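
The idea of a sequence tag built from amino acid mass differences can be sketched as follows. This is a simplified illustration, not the ProSight algorithm: the function names, the 0.01-Da tolerance, and the example ladder are assumptions, and only gaps between adjacent peaks in the sorted list are considered. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values; Leu/Ile are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_for_gap(gap, tol=0.01):
    """Return the residue whose mass is closest to gap within tol, else None."""
    best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
    return best if abs(RESIDUE_MASS[best] - gap) <= tol else None


def longest_tag(fragment_masses, tol=0.01):
    """Longest run of adjacent mass gaps that each match a single residue."""
    masses = sorted(fragment_masses)
    best, current = "", ""
    for lo, hi in zip(masses, masses[1:]):
        aa = residue_for_gap(hi - lo, tol)
        current = current + aa if aa else ""
        if len(current) > len(best):
            best = current
    return best


# Hypothetical neutral fragment masses (Da), invented for this example.
ladder = [500.0, 571.03711, 658.06914, 757.13755, 900.0]
print(longest_tag(ladder))  # ASV
```

The tight tolerance matters: Gln and Lys differ by only about 0.036 Da, so a looser window would make tags ambiguous.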
Article
The rise of the “Top Down” method in the field of mass spectrometry-based proteomics has ushered in a new age of promise and challenge for the characterization and identification of proteins. Injecting intact proteins into the mass spectrometer allows for better characterization of post-translational modifications and avoids several of the serious “inference” problems associated with peptide-based proteomics. However, successful implementation of a Top Down approach to endogenous or other biologically relevant samples often requires the use of one or more forms of separation prior to mass spectrometric analysis, which have only begun to mature for whole protein MS. Recent advances in instrumentation have been used in conjunction with new ion fragmentation using photons and electrons that allow for better (and often complete) protein characterization on cases simply not tractable even just a few years ago. Finally, the use of native electrospray mass spectrometry has shown great promise for the identification and characterization of whole protein complexes in the 100 kDa to 1 MDa regime, with prospects for complete compositional analysis for endogenous protein assemblies a viable goal over the coming few years.
... In the spectrum of the peptide RLEpTR (Fig. 2B), the phosphothreonine residue can be identified by comparing the masses between y1 and y2 ions or between b3 and b4 ions. The peptide VSSDGHEpYIYVDPMQLPY is relatively large but there is no difficulty in determining which one of three tyrosine residues is phosphorylated (Fig. 2C) by comparing the b7 and b8 ions, and noting that all ions from b8 to b16 include phosphate. A peptide may carry more than one phosphorylated residue (Fig. 3). ...
... At the same time, other product ions were observed in the mass range from 450 to 2200 Da with high signal/noise ratio. The detection of y6 and y7 ions, containing no mass increase by phosphate group, suggests that the serine residue at the fourth C-terminal position is not modified whereas the presence of b16, y16, and y17 indicates that all other four serines were phosphorylated. ...
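
The comparison of consecutive fragment ions described in these snippets amounts to checking whether the gap between two ions equals the bare residue mass or the residue mass plus one phosphate group (HPO3, 79.96633 Da monoisotopic). The sketch below is an assumption-laden illustration: the function name and tolerance are hypothetical, and the b3/b4 values are theoretical singly charged masses computed for RLEpTR, not values read from the cited spectrum.

```python
HPO3 = 79.96633  # monoisotopic mass added by one phosphate group, in Da
RESIDUE_MASS = {"S": 87.03203, "T": 101.04768, "Y": 163.06333}


def is_phosphorylated(residue, ion_gap, tol=0.02):
    """True if the gap between consecutive ions equals residue + HPO3.

    Returns False if the gap equals the bare residue mass, and raises
    ValueError when it matches neither (the ladder is then uninformative).
    """
    bare = RESIDUE_MASS[residue]
    if abs(ion_gap - (bare + HPO3)) <= tol:
        return True
    if abs(ion_gap - bare) <= tol:
        return False
    raise ValueError(f"gap {ion_gap:.4f} Da does not match {residue}")


# Theoretical singly charged b ions of RLEpTR (computed, not measured).
b3, b4 = 399.23504, 580.24905
print(is_phosphorylated("T", b4 - b3))  # True
```

The same check applied along a whole b or y series shows where the phosphate first appears, which is how the site is narrowed to a single residue.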
Article
Reversible phosphorylation is one of the most important posttranslational modifications of cellular proteins. Mass spectrometry is a widely used technique in the characterization of phosphorylated proteins and peptides. Similar to nonmodified peptides, sequence information for phosphopeptides digested from proteins can be obtained by tandem mass analysis using either electrospray ionization or matrix assisted laser desorption/ionization (MALDI) mass spectrometry. However, the facile loss of neutral phosphoric acid (H3PO4) or HPO3 from precursor ions and fragment ions hampers the precise determination of phosphorylation site, particularly if more than one potential phosphorylation site or consensus sequence is present in a given tryptic peptide. Here, we investigated the fragmentation of phosphorylated peptides under laser-induced dissociation (LID) using a MALDI-time-of-flight mass spectrometer with a curved-field reflectron. Our data demonstrated that intact fragments bearing phosphorylated residues were produced from all tested peptides that contain at least one and up to four phosphorylation sites at serine, threonine, or tyrosine residues. In addition, the LID of phosphopeptides derivatized by N-terminal sulfonation yields simplified MS/MS spectra, suggesting the combination of these two types of spectra could provide an effective approach to the characterization of proteins modified by phosphorylation.
... Since a proteoform database contains possible proteoform sequences, database-based methods match a mass spectrum against the proteoform sequences in the database, which can quickly identify proteoforms. Extended proteoform database methods include ProSight [54,67-70], MascotTD [71], BUPID-Top-Down, ProteinGoggle [72], and so on. The ProSight team developed a series of proteoform characterization tools based on proteoform databases such as ProSightPTM, ProSightPC, ProSightPD, and ProSight Lite. ...
Article
Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.
... For example, anion exchange chromatography has been applied preceding the online reverse phase LC-FT-ICR-MS for the TD proteomic analysis of Shewanella oneidensis MR-1 and Saccharomyces cerevisiae [58,248]. In the case of Shewanella oneidensis MR-1, the THRASH algorithm [249] was used to process and analyze the TD mass spectral data, whereas ProSightPC and ProSight PTM [250] were utilized for analyzing the TD data acquired on yeast. The deconvolution and deisotoping processes are incorporated within the THRASH algorithm and ProSight PTM. ...
Article
Enhanced sequence coverage, better identification of combinatorial co-occurring PTMs and improved detection of proteoforms are key highlights of the middle-down approach and hence, this can be a promising approach for protein sequencing and proteomics.
... data with software tools have been developed for proteoform characterization and visualization [27-34]. Recent efforts in translational TDP have applied the aforementioned technological improvements to the proteomic characterization of human tissues [35-38], cerebrospinal fluid [39], saliva [40,41], and plasma and pleural effusions [42]. ...
Article
Top-down proteomics (TDP) by mass spectrometry (MS) is a technique by which intact proteins are analyzed. It has become increasingly popular in translational research because of the value of characterizing distinct proteoforms of intact proteins. Compared to bottom-up proteomics (BUP) strategies, which measure digested peptide mixtures, TDP provides highly specific molecular information that avoids the bioinformatic challenge of protein inference. However, the technique has been difficult to implement widely because of inherent limitations of existing sample preparation methods and instrumentation. Recent improvements in proteoform pre-fractionation and the availability of high-resolution benchtop mass spectrometers have made it possible to use high-throughput TDP for the analysis of complex clinical samples. Here, we provide a comprehensive protocol for analysis of a common sample type in translational research: human peripheral blood mononuclear cells (PBMCs). The pipeline comprises multiple workflows that can be treated as modular by the reader and used for various applications. First, sample collection and cell preservation are described for two clinical biorepository storage schemes. Cell lysis and proteoform pre-fractionation by gel-eluted liquid fractionation entrapment electrophoresis are then described. Importantly, instrument setup and liquid chromatography–tandem MS are described for TDP analyses, which rely on high-resolution Fourier-transform MS. Finally, data processing and analysis are described using two different, application-dependent software tools: ProSight Lite for targeted analyses of one or a few proteoforms and TDPortal for high-throughput TDP in discovery mode. For a single sample, the minimum completion time of the entire experiment is 72 h.
... Now, there is a panel of software tools available for top-down proteomics (PIITA, MASH Suite, MS-Align+, MS-Deconv, ProSight PTM 2.0, ProteinGoggle and others). ProSight PTM was the first one designed for the identification of intact proteins [40,41]. BIG Mascot, or MascotTD, which builds on the popular bottom-up software platform Mascot, has also been applied to top-down analysis after some adjustments (for instance, extending the precursor mass cutoff to 110 kDa, far higher than the precursor ion limit used in bottom-up analysis) [42]. ...
Article
Biological significance: The systematic efforts in the Human Proteome project to map the entire human proteome greatly depend on currently available and emerging techniques and approaches. Here, the possibilities of a visual representation of the human proteome by combination of virtual/experimental 2-DE with protein identification by mass spectrometry or immunologically are discussed. By applying this approach to several profiles of gene products, we show its convenience for informative representation of the whole proteome and of single gene products and proteoforms (protein species). This approach could be very helpful in the emerging global inventory of all human proteoforms.
... These types of data processing tools are important for detecting remote sequence homologies [44], sub-cellular localization [55], and protein-protein interactions [56]. Additionally, Kelleher's research group has contributed developments in data processing (C-score and ProSight PTM) that improve interpretation of TDP MS data [57-61]. In conjunction with algorithms such as THRASH [62], the high throughput classification of large molecules is facilitated. ...
Article
Top-down proteomics (TDP) has great potential for high throughput proteoform characterization. With significant advances in mass spectrometry (MS) instrumentation permitting tandem MS of large intact proteins, a limitation to the widespread adoption of TDP still resides in front-end sample preparation protocols (e.g. fractionation, purification) that are amenable to MS analysis of intact proteins. Chromatographic strategies are improving but pose higher risk of sample loss. Gel-based separations (e.g. GELFrEE) may alleviate this concern but at the expense of requiring sodium dodecyl sulfate (SDS). While this surfactant maintains protein solubility during fractionation, the advantage is short-lived, as the detergent must ultimately be depleted to avoid MS signal suppression. To do so requires overcoming strong interactions between SDS and protein. Adding to the challenge, one must now consider upholding the solubility of purified protein(s) in the absence of SDS. This review explores uses of SDS in TDP workflows, addressing front-end strategies that reduce matrix interferences while maximizing recovery of intact proteins in MS-compatible formats. Significance: The benefits of employing SDS in a TDP workflow can easily outweigh the disadvantages. Several SDS depletion strategies are available, though not all are equally amenable to TDP. This review provides a comprehensive and critical accounting of SDS in TDP, demonstrating methods that are suited to MS analysis of intact proteins.
... ProSight specifically uses a candidate expansion method referred to as "shotgun annotation", combining data from diverse sources regarding potential mass differences-such as polymorphisms, alternate splicing, and PTMs-to assist protein characterization. The user can optionally control how much biological variability should be searched [34,35]. Other software dedicated to TDP data acquisition and interpretation include MASH Suite Pro (http://crb.wisc.edu/yinglab/software. ...
Article
Proteomics is a field of growing importance in animal and aquatic sciences. Similar to other proteomic approaches, top-down proteomics is slowly making its way within the vast array of proteomic approaches that researchers have access to. This opinion and mini-review article is dedicated to top-down proteomics and how its use can be of importance to animal and aquatic sciences. Herein, we include an overview of the principles of top-down proteomics and how it differs regarding other more commonly used proteomic methods, especially bottom-up proteomics. In addition, we provide relevant sections on how the approach was or can be used as a research tool and conclude with our opinions of future use in animal and aquatic sciences.
... There is a huge demand for software that is used for web-based identification and characterization of proteins by direct comparison of parent and fragment ion masses against elucidated proteomic databases. The first search engine that was used for top-down identification of proteins was ProSight [127,128]. This software is a combination of search engines and a browser environment for the analysis of fragments above 10 kDa. ...
Article
Nowadays, top-down proteomics (TDP) is a booming approach for the analysis of intact proteins and it is attracting significant interest in the field of protein biology. The term has emerged as an alternative to the well-established bottom-up strategies for analysis of peptide fragments derived from either enzymatic or chemical digestion of intact proteins. TDP is applied to mass spectrometric analysis of intact large biomolecules that are constituents of protein complexes and assemblies. This article delivers an overview of the methodologies in top-down mass spectrometry, mass spectrometry instrumentation and an extensive review of applications covering venomics, biomedical research, and protein biology, including the analysis of protein post-translational modifications (PTMs), protein biophysics, and protein complexes. In addition, limitations of top-down proteomics, challenges and future directions of TDP are also discussed.
... Within the matrix, mitochondrial DNA is polyplasmic but, for example in human cells, it only codes for a small (12S) and large (16S) ribosomal RNA (rRNA), 22 transfer RNAs (tRNA) and 13 polypeptides, all of which are components of the oxidative phosphorylation system. Computational studies and proteomic approaches estimate the human mitochondrial proteome contains 1500 to 2000 proteins (Taylor et al. 2003; Smith et al. 2012; Fukasawa et al. 2015). Therefore, a compelling outcome that encompassed the migration of mitochondrial genes to the nuclear genome was the development of protein import mechanisms. ...
Article
The discovery of very large channels in the two membranes of mitochondria represented an astonishing finding and a turning point in the awareness of these conspicuous energy-generating organelles. Sizable channels are at the crossroads of important cellular pathways and mitochondrial functions like biogenesis, signaling, secretion, compartmentalization or apoptosis. The integrative approach that combines electrophysiological methods with biochemical and genetic alterations has been decisive in tackling the structure-function relationship of mitochondrial mega-channels. In this review we will give a short account of our joint effort to correlate the existence of large conductance channels in the two membranes of mitochondria with a precise function. In particular, we will focus on the import of proteins and nucleic acids. An analysis of the character of the aqueous pores through which these two types of macromolecules enter mitochondria has been attained, and an up-to-date survey of the developments reached in these investigations will be presented. An overview of the import pathways for proteins and nucleic acids into mitochondria will be outlined. Although this research area is rapidly developing, many issues remain shrouded in uncertainties. Special emphasis will be placed on the not yet entirely settled synergies between different protein translocases.
... An increase in development of software for interpretation of this type of data is becoming more evident. Kelleher et al. [39] is an example of a research group developing intact mass spectrometric software. ...
... Spectral database correlation using ProSightPC relies upon the conversion of product ions to neutral monoisotopic masses through charge-state determination/deconvolution and deisotoping (31,44,45). Critical to the correct assignment of monoisotopic masses is the ability to generate product ions with isotope distributions that match theoretical isotope distributions (46-51). Determination of monoisotopic masses is confounded by low S/N and poor ion statistics. ...
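
The conversion from an observed m/z to a neutral mass can be sketched under simple assumptions: positive-ion mode, protonated ions of the form [M + zH]z+, and charge inferred from the spacing of adjacent isotopic peaks (about 1.00335 Da divided by z). The function names and the example peak list are hypothetical, and selecting the monoisotopic peak (deisotoping) is not shown.

```python
PROTON = 1.007276          # proton mass in Da
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def charge_from_spacing(mz_peaks):
    """Estimate charge from the mean spacing of adjacent isotopic peaks.

    Requires at least two peaks from the same isotopic envelope.
    """
    peaks = sorted(mz_peaks)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(ISOTOPE_SPACING / mean_gap)


def neutral_mass(mz, charge):
    """Convert the m/z of a positive [M + zH]z+ ion to its neutral mass."""
    return charge * (mz - PROTON)


# Hypothetical isotopic envelope with peaks about 0.2007 m/z apart.
envelope = [1000.0000, 1000.2007, 1000.4014, 1000.6020]
z = charge_from_spacing(envelope)
print(z, neutral_mass(envelope[0], z))
```

Noisy or sparse envelopes make the spacing estimate unreliable, which is one way the low S/N mentioned above confounds monoisotopic mass assignment.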
Article
We describe and characterize an improved implementation of ETD on a modified hybrid linear ion trap-Orbitrap instrument. Instead of performing ETD in the mass-analyzing quadrupole linear ion trap (A-QLT), the instrument collision cell was modified to enable ETD. We partitioned the collision cell into a multi-section rf ion storage and transfer device to enable injection and simultaneous separate storage of precursor and reagent ions. Application of a secondary (axial) confinement voltage to the cell end lens electrodes enables charge-sign independent trapping for ion–ion reactions. The approximately 2-fold higher quadrupole field frequency of this cell relative to that of the A-QLT enables higher reagent ion densities and correspondingly faster ETD reactions, and, with the collision cell’s longer axial dimensions, larger populations of precursor ions may be reacted. The higher ion capacity of the collision cell permits the accumulation and reaction of multiple full loads of precursor ions from the A-QLT followed by FT Orbitrap m/z analysis of the ETD product ions. This extends the intra-scan dynamic range by increasing the maximum number of product ions in a single MS/MS event. For analyses of large peptide/small protein precursor cations, this reduces or eliminates the need for spectral averaging to achieve acceptable ETD product ion signal-to-noise levels. Using larger ion populations, we demonstrate improvements in protein sequence coverage and aggregate protein identifications in LC-MS/MS analysis of intact protein species as compared to the standard ETD implementation.
... For example, software for protein sequencing using a "top-down" approach and identification directly from intact proteins (ProSight PTM) has been published and made publicly available (https://prosightptm.scs.uiuc.edu/) by Kelleher's group (Taylor et al., 2003b). Efforts to make protein database search algorithms more efficient and reliable based on the "bottom-up" approach are progressing at a fast pace. ...
Article
Mass spectrometry has become a major tool in the study of proteomes. The analysis of proteolytic peptides and their fragment ions by this technique enables the identification and quantitation of the precursor proteins in a mixture. However, deducing chemical structures and then protein sequences from mass-to-charge ratios is a challenging computational task. Software tools incorporating powerful algorithms and statistical methods improved our ability to process the large quantities of proteomics data. Repositories of spectral data make both data analysis and experimental design more efficient. New approaches in quantitative and statistical proteomics make possible a greater coverage of the proteome, the identification of more post-translational modifications, and a greater sensitivity in the quantitation of targeted proteins. Curr. Protoc. Bioinform. 41:13.21.1-13.21.17. © 2013 by John Wiley & Sons, Inc.
... high mass resolving power to resolve the overlapping fragment ions and mass accuracy for high confidence) and for data processing compared to the analysis of peptides obtained following protein digestion. Mostly, high-resolution Fourier transform ion cyclotron resonance (FTICR) is applied for high mass accuracy measurement of intact protein ion mass (mass errors < 2 ppm) [29,32,33] with specialized software to perform the data analysis [71,72]. For protein mixtures, the isolation of a single molecular ion for MS/MS analysis can provide protein identification of far higher reliability and confidence and can directly characterize amino acid sequence errors and variations [73]. ...
Article
Protein isoforms/splice variants can play important roles in various biological processes and can potentially be used as biomarkers or therapeutic targets/mediators. Thus, there is a need for efficient and, importantly, accurate methods to distinguish and quantify specific protein isoforms. Since protein isoforms can share a high percentage of amino acid sequence homology and dramatically differ in their cellular concentration, the task for accuracy and efficiency in methodology and instrumentation is challenging. The analysis of intact proteins has been perceived to provide a more accurate and complete result for isoform identification/quantification in comparison to analysis of the corresponding peptides that arise from protein enzymatic digestion. Recently, novel approaches have been explored and developed that can possess the accuracy and reliability important for protein isoform differentiation and isoform-specific peptide targeting. In this review, we discuss the recent development in methodology and instrumentation for enhanced detection of protein isoforms as well as the examples of their biological importance.
... The "Top-Down" approach consists of the MS analysis of intact proteins and represents one of the new directions in proteomics. PTMs have been characterized in a "Top-Down" approach [215] based on searching of protein fragment ions against databases annotated with both native and PTM-modified proteins (from the Resid nomenclature). The "absolute mass search" function matched protein, b/y and c/z masses to database entries, and derived probability scores based on the Poisson model. ...
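
The snippet above states that probability scores were derived from a Poisson model but does not give the formula. As a generic sketch only, assume the number of fragment matches arising by chance follows a Poisson distribution with mean `expected_random` (a hypothetical parameter; how the real software derives it is not stated here). The score is then the probability of seeing at least the observed number of matches at random.

```python
import math


def poisson_tail(n_matched, expected_random):
    """P(X >= n_matched) for X ~ Poisson(expected_random)."""
    below = sum(
        math.exp(-expected_random) * expected_random ** k / math.factorial(k)
        for k in range(n_matched)
    )
    return max(0.0, 1.0 - below)


# Hypothetical case: 10 matched fragments where 0.5 would be expected by chance.
print(poisson_tail(10, 0.5))
```

Subtracting from 1 loses precision for extremely small tails, so a production implementation would more likely sum the upper tail directly or use a library survival function.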
Article
With the advent of whole genome sequencing, large-scale proteomics has rapidly come to dominate the post-genomic age. As such, tandem mass spectrometry has emerged as the most promising and powerful technique in this area but analysis of raw spectra remains one of the principal bottlenecks to making effective use of the technology. Analytical approaches for identifying proteins from MS/MS data fall into two categories: comparing measured fragment spectra to theoretical spectra from sequence databases and de novo peptide sequencing. Available methods still have weaknesses, highlighting the need for new powerful algorithms that are able to exploit the enormous volume of data generated by proteomic experiments. Recent efforts have also been directed towards the identification of post-translational modifications, biomarker discovery and quantitative proteomics. Overall, the intended goal of this review is to give as thorough as possible an overview of state-of-the-art approaches and tools developed to analyze tandem mass spectra in different fields and discuss future directions aimed at overcoming the limits of present methods.
... The protein is characterized by matching the molecular and fragment ion mass values with the database entries. A ProSight algorithm has been developed for this purpose (Taylor et al., 2003). ...
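
Matching molecular and fragment ion masses against database entries can be sketched in two steps: select candidates whose intact mass falls within a tolerance window of the precursor, then count the observed fragments that match each candidate's theoretical fragments. Everything here is illustrative: the function names, the toy database, and the tolerances are assumptions, not the ProSight implementation.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates_in_window(precursor_mass, database, tol_da):
    """Database entries whose mass lies within +/- tol_da of the precursor."""
    return [name for name, mass in database.items()
            if abs(mass - precursor_mass) <= tol_da]


def count_fragment_matches(observed, theoretical, tol_ppm):
    """Number of observed fragment masses matching any theoretical mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tol_ppm for theo in theoretical)
        for obs in observed
    )


# Hypothetical intact masses in Da, invented for this example.
toy_db = {"protein_A": 35750.0, "protein_B": 35900.0, "protein_C": 12000.0}
print(candidates_in_window(35892.0, toy_db, tol_da=200.0))
print(count_fragment_matches([1000.001, 2000.1], [1000.0, 2000.0], tol_ppm=10))
```

A wide precursor window keeps modified forms in the candidate list, while the tight fragment tolerance does the discriminating; the resulting match count is the kind of quantity a probability score would then be computed from.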
Article
Phosphorylation is one of the most important and ubiquitous modifications in eukaryotic cells. This covalent modification is a major signaling pathway in living beings. A vast array of cellular events, such as proliferation, differentiation, metabolism, signal transduction, and adaptation to environmental stress, and the function of many proteins, hormones, neurotransmitters, and enzymes, are triggered by phosphorylation. To understand this highly interconnected regulatory network, it is essential to identify and quantify phosphoproteins in biological specimens. Currently, this task is accomplished by mass spectrometry-driven phosphoproteomics. This article outlines recent developments in the analysis of phosphoproteins, specifically, the enrichment, detection, identification, and quantification of phosphopeptides/phosphoproteins.
Article
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.
Preprint
Generating top-down tandem mass spectra (MS/MS) for complex mixtures of proteoforms has become possible through improvements in fractionation, on-line separation, dissociation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and peak matching being paired with diverse methods for scoring proteoform-spectral matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification through three distinct challenges. The first is identifying a large yield of PrSMs while controlling false discovery rate (FDR) in identifying thousands of proteoforms from complex cell lysates via four software workflows: ProSight Proteome Discoverer, TopPIC, Informed Proteomics, and pTop. The second is the deconvolution of data from both Thermo Orbitrap-class and Bruker maXis Q-TOF instruments to produce consistent precursor charge and mass determinations while generating fragment mass lists to optimize identification. The third attempts to detect diverse post-translational modifications (PTMs) in proteoforms from bovine milk and human ovarian tissue. The data demonstrate that existing software suites produce admirable sensitivity, in some cases identifying a third of collected MS/MS with FDR controlled below 2%; the overlap in these PrSMs, however, illustrates real value in searching data with multiple search engines. Differences among identification workflows seem to result from each search algorithm incorporating its own deconvolution algorithm. By transmitting deconvolution data from multiple deconvolution routes (Thermo Xtract, Bruker Auto MSn, Mascot Distiller, TopFD, and FLASHDeconv) to the downstream TopPIC search algorithm, we were able to detect common causes of deconvolution disagreement. 
The detection of PTMs was very inconsistent among search algorithms, with some workflows suggesting as little as 1% of PrSMs from bovine milk were singly-phosphorylated while other workflows found that 18% of PrSMs were singly-phosphorylated. Taken together, these results make a strong argument for top-down researchers to adopt a standard practice of analyzing each MS/MS experiment with at least two different search engines.
Article
Protein fragmentation is a critical component of top-down proteomics, enabling gene-specific protein identification and full proteoform characterization. The factors that influence protein fragmentation include precursor charge, structure, and primary sequence which have been explored extensively for collision-induced dissociation (CID). Recently, noticeable differences in CID-based fragmentation were reported for native versus denatured proteins, motivating the need for scoring metrics that are tailored specifically to native top-down mass spectrometry (nTDMS). To this end, position and intensity were tracked for 10,252 fragment ions produced by higher-energy collisional dissociation (HCD) of 159 native monomers and 70 complexes. We used published structural data to explore the relationship between fragmentation and protein topology and revealed that fragmentation events occur at a large range of relative residue solvent accessibility. Additionally, our analysis found that fragment ions at sites with an N-terminal aspartic acid or a C-terminal proline make up on average 40% and 27%, respectively, of the total matched fragment ion intensity in nTDMS. Percent intensity contributed by each amino acid was determined and converted into weights to (1) update the previously published C-score, and (2) construct a native Fragmentation Propensity Score (nFPS). Both scoring systems showed an improvement in protein identification or characterization in comparison to traditional methods, and overall increased confidence in results with fewer matched fragment ions but with high probability nTDMS fragmentation patterns. Given the rise of nTDMS as a tool for structural mass spectrometry, we forward these scoring metrics as new methods to enhance analysis of nTDMS data.
Article
As metabolism impacts the efficacy and safety of therapeutic peptides and proteins (TPPs), understanding of the metabolic fate of TPPs is critical for their preclinical and clinical development. Despite the continued increase of new TPPs entering clinical trials, the metabolite identification (MetID) of these emerging modalities remains challenging. In the present study, we report an analytical workflow for MetID of TPPs. Using insulin detemir as an example, we demonstrated that top-down differential mass spectrometry (dMS) was able to distinguish and discover metabolites from complex biological matrices. For structural interpretation, we developed an algorithm to generate a complete and non-redundant theoretical metabolite database for a TPP of any topology (e.g. branched, multi-cyclic etc.). Candidate structures of a metabolite were obtained by matching the monoisotopic mass of a dMS feature to the theoretical metabolite database. Finally, the MS/MS sequence tags enabled unambiguous characterization of metabolite structures when isobaric/isomeric candidates were present. This platform is widely applicable to TPPs with complex structures and will ultimately guide the structural optimization of TPPs in pharmaceutical development.
Article
Seminal plasma is a critical and complex fluid that carries sperm to eggs to initiate the fertilization process. Here, we present a top-down mass spectrometry (TDMS) strategy for identifying proteins and posttranslational modifications (PTMs) in bovine seminal plasma. In this study, proteins were separated using sheathless capillary zone electrophoresis (CZE)-MS and reversed-phase liquid chromatography (LC)-MS, and then fragmented using electron-transfer/higher-energy collisional dissociation (EThcD) and 213 nm ultraviolet photodissociation (213 nm UVPD) to provide more comprehensive information about the proteomic landscape of this biological fluid. Four hundred and seventeen proteoforms were identified by sheathless CZE-MS, and one hundred and seventy-two species were unique to this method. LC-MS identified 3090 proteoforms, including 1707 unique species. All identifications were within ±10 ppm (mass error) and with a P-score ≤ 1E-04. Pooling results (triplicate measurements) from sheathless CZE-MS and LC-MS resulted in the identification of 1433 species (EThcD) and 2156 species (213 nm UVPD) with 612 species unique for EThcD and 1021 for 213 nm UVPD. The average sequence coverage was found to be higher for EThcD (28%) than for 213 nm UVPD (23%). The use of sheathless CZE-MS and LC-MS with EThcD and 213 nm UVPD provided complementary protein profiling and proteoform data that was more comprehensive than either method alone.
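The ±10 ppm mass-error criterion quoted above can be illustrated with a short calculation. This is a minimal sketch, not the authors' pipeline: the function names and the example masses are hypothetical, and only the 10 ppm threshold comes from the abstract.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


# Hypothetical example: a 10 kDa proteoform measured 0.05 Da too heavy
# is off by about 5 ppm and passes; 0.2 Da too heavy (about 20 ppm) fails.
print(within_tolerance(10000.05, 10000.0))
print(within_tolerance(10000.20, 10000.0))
```

Because the tolerance is relative, the same 10 ppm window corresponds to a wider absolute window (in Da) for heavier proteoforms.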
Article
The developments in mass spectrometry (MS) in the past few decades reveal the power and versatility of this technology. MS methods are utilized in routine analyses as well as research activities involving a broad range of analytes (elements and molecules) and countless matrices. However, manual MS analysis is gradually becoming a thing of the past. In this article, the available MS automation strategies are critically evaluated. Automation of analytical workflows culminating with MS detection encompasses involvement of automated operations in any of the steps related to sample handling/treatment before MS detection, sample introduction, MS data acquisition, and MS data processing. Automated MS workflows help to overcome the intrinsic limitations of MS methodology regarding reproducibility, throughput, and the expertise required to operate MS instruments. Such workflows often comprise automated off-line and on-line steps such as sampling, extraction, derivatization, and separation. The most common instrumental tools include autosamplers, multi-axis robots, flow injection systems, and lab-on-a-chip. Prototyping customized automated MS systems is a way to introduce non-standard automated features to MS workflows. The review highlights the enabling role of automated MS procedures in various sectors of academic research and industry. Examples include applications of automated MS workflows in bioscience, environmental studies, and exploration of the outer space.
Article
Cell polarity is a vital biological process involved in the building, maintenance and normal functioning of tissues in invertebrates and vertebrates. Unsurprisingly, molecular defects affecting polarity organization and functions have a strong impact on tissue homeostasis, embryonic development and adult life, and may directly or indirectly lead to diseases. Genetic studies have demonstrated the causative effect of several polarity genes in diseases, however much remains to be clarified before a comprehensive view of the molecular organization and regulation of the protein networks associated with polarity proteins is obtained. This challenge can be approached head-on using proteomics to identify protein complexes involved in cell polarity and their modifications in a spatio-temporal manner. We review the fundamental basics of mass spectrometry techniques and provide an in-depth analysis of how mass spectrometry has been instrumental in understanding the complex and dynamic nature of some cell polarity networks at the tissue (apico-basal and planar cell polarities) and cellular (cell migration, ciliogenesis) levels, with the fine dissection of the interconnections between prototypic cell polarity proteins and signal transduction cascades in normal and pathological situations. This review primarily focuses on epithelial structures which are the fundamental building blocks for most metazoan tissues, used as the archetypal model to study cellular polarity. This field offers broad perspectives thanks to the ever-increasing sensitivity of mass spectrometry and its use in combination with recently developed molecular strategies able to probe in situ proteomic networks.
Article
Methods that can efficiently and effectively quantify proteins are needed to support increasing demand in many bioanalytical fields. Triple quadrupole mass spectrometry (QQQ-MS) is sensitive and specific, and it is routinely used to quantify small molecules. However, low resolution fragmentation-dependent MS detection can pose inherent difficulties for intact proteins. In this research, we investigated variables that affect protein and fragment ion signals to enable protein quantitation using QQQ-MS. Collision induced dissociation gas pressure and collision energy were found to be the most crucial variables for optimization. Multiple reaction monitoring (MRM) transitions for seven standard proteins, including lysozyme, ubiquitin, cytochrome c from both equine and bovine, lactalbumin, myoglobin, and prostate-specific antigen (PSA) were determined. Assuming the eventual goal of applying such methodology is to analyze protein in biological fluids, a liquid chromatography method was developed. Calibration curves of six standard proteins (excluding PSA) were obtained to show the feasibility of intact protein quantification using QQQ-MS. Linearity (2–3 orders), limits of detection (0.5–50 μg/mL), accuracy (<5% error), and precision (1%–12% CV) were determined for each model protein. Sensitivities for different proteins varied considerably. Biological fluids, including human urine, equine plasma, and bovine plasma were used to demonstrate the specificity of the approach. The purpose of this model study was to identify, study, and demonstrate the advantages and challenges for QQQ-MS-based intact protein quantitation, a largely underutilized approach to date.
Article
There has been tremendous progress in top-down proteomics (TDP) in the past five years, particularly in intact protein separation and high resolution mass spectrometry. However, bioinformatics to deal with large-scale mass spectra has lagged behind, in both algorithmic research and software development. In this study, we developed pTop 1.0, a novel software tool to significantly improve the accuracy and efficiency of mass spectral data analysis in TDP. The precursor mass offers crucial clues to infer the potential post-translational modifications co-occurring on the protein, the reliability of which relies heavily on its mass accuracy. Concentrating on detecting the precursors more accurately, a machine-learning model incorporating a variety of spectral features was trained online in pTop via a support vector machine (SVM). pTop employs the sequence tags extracted from the MS/MS spectra and a dynamic programming algorithm to accelerate the search, especially for those spectra with multiple post-translational modifications. We tested pTop on three publicly available data sets and compared it with ProSight and MS-Align+ in terms of recall, precision, and running time. The results showed that pTop can, in general, outperform ProSight and MS-Align+. pTop recalled 22% more correct precursors although it exported 30% fewer precursors than Xtract (in ProSight) from a human histone data set. The running speed of pTop was about one to two orders of magnitude faster than that of MS-Align+. This algorithmic advancement in pTop, including both accuracy and speed, will inspire the development of other similar software to analyze the mass spectra of intact proteins.
Chapter
Analytical chemistry has considerably benefited from the developments in the field of mass spectrometry. The high resolution, mass accuracy, and sensitivity offered by modern mass spectrometers have been essential in addressing analytical needs in numerous areas of research as well as in routine laboratory praxis. The most recent addition to the family of mass spectrometers has been the Orbitrap analyzer, making an ultrahigh-resolution mass spectrometry accessible to most life science laboratories. The Orbitrap-based instrumentation has established itself firmly in the field of proteomics, metabolomics, and metabolite analysis. Moreover, it is gaining increased popularity also in areas of bioanalysis, lipidomics, doping, as well as in drug and pesticide residue analysis. This article presents the principle of operation of the Orbitrap analyzer, its most recent technological developments, and outlook, and it reviews application areas where the Orbitrap analyzers represent the state-of-the-art solution to a multitude of analytical needs.
Article
Full-text available
Fragmentation efficiencies of various 'activated-ion' electron capture dissociation (AI-ECD) methods are compared for a model system of bovine ubiquitin 7+ cations. In AI-ECD studies, sufficient internal energy was given to protein cations prior to ECD application using IR laser radiation, collisions, blackbody radiation, or in-beam collisions, in turn. The added energy was utilized in increasing the population of the precursor ions with less intra-molecular noncovalent bonds or enhancing thermal fluctuations of the protein cations. Removal of noncovalent bonds resulted in extended structures, which are ECD friendly. Under their best conditions, a variety of activation methods showed a similar effectiveness in ECD fragmentation. In terms of the number of fragmented inter-residue bonds, IR laser/blackbody infrared radiation and 'in-beam' activation were almost equally efficient with ∼70% sequence coverage, while collisions were less productive. In particular, 'in-beam' activation showed an excellent effectiveness in characterizing a pre-fractionated single kind of protein species. However, its inherent procedure did not allow for isolation of the protein cations of interest.
Chapter
Introduction; Collection and Storage of Biofluids; Commonly used Biofluids for Biomarker Discovery; Urine; Cerebrospinal Fluid; Saliva; Other Biofluids; Conclusions
Article
Full-text available
Breast cancer was the second leading cause of cancer related mortality for females in 2014. Recent studies suggest histone H1 phosphorylation may be useful as a clinical biomarker of breast and other cancers due to its ability to recognize proliferative cell populations. Although monitoring a single phosphorylated H1 residue is adequate to stratify high-grade breast tumors, expanding our knowledge of how H1 is phosphorylated through the cell cycle is paramount to understanding its role in carcinogenesis. H1 analysis by bottom-up MS is challenging due to the presence of highly homologous sequence variants expressed by most cells. These highly basic proteins are difficult to analyze by LC-MS/MS due to the small, hydrophilic nature of peptides produced by tryptic digestion. Although bottom-up methods permit identification of several H1 phosphorylation events, these peptides are not useful for observing the combinatorial PTM patterns on the protein of interest. To complement the information provided by bottom-up MS, we utilized a top-down MS/MS workflow to permit identification and quantitation of H1 proteoforms related to the progression of breast cells through the cell cycle. Histones H1.2 and H1.4 were observed in MDA-MB-231 metastatic breast cells, whereas an additional histone variant, histone H1.3, was identified only in non-neoplastic MCF-10A cells. Progressive phosphorylation of histone H1.4 was identified in both cell lines at mitosis (M phase). Phosphorylation occurred first at S172 followed successively by S187, T18, T146 and T154. Notably, phosphorylation at S173 of histone H1.2 and S172, S187, T18, T146 and T154 of H1.4 significantly increases during M phase relative to S phase, suggesting that these events are cell cycle-dependent and may serve as markers for proliferation. Finally, we report the observation of the H1.2 SNP variant A18V in MCF-10A cells. Copyright © 2015, The American Society for Biochemistry and Molecular Biology.
Chapter
Over the past decade, microwave-supported acid proteolysis has emerged as a valuable tool in high throughput proteomic analysis. Its major advantage is speed; proteolysis of complex mixtures takes place in less than 30 minutes. Cleavage occurs selectively and reproducibly at aspartic acid residues and produces peptide products whose tandem mass spectra are database searchable. In addition to higher speed and lower cost, advantages over trypsin digestion include reliable production of peptides from proteins with modified arginine and lysine residues and generation of a more limited mixture of peptides that are larger, on average. Disadvantages include hydrolysis of carbohydrate and phosphate modifications. Research-grade microwave devices provide temperature control within ±5°C for microscale reactions.
Article
With the rapid advancement of high-resolution mass spectrometry, top-down proteomics has become a reality. Proteome research at the intact protein level provides more precise and more abundant biological information; for example, it can reveal relationships among multiple post-translational modifications. Because of genetic mutation, alternative RNA splicing, and various post-translational modifications, one gene may produce multiple protein forms, now called 'proteoforms', and top-down proteomics helps identify them. The three pillar technologies of top-down proteomics, viewed at the level of the entire protein, are separation, mass spectrometry, and bioinformatics. This paper reviews these technologies with emphasis on bioinformatics topics, including mass spectral preprocessing, database search algorithms, and the localization of post-translational modifications.
Article
Posttranslational modifications (PTMs) control protein function, but established peptide-based proteomic methods often fail to provide a comprehensive view of PTMs. In this issue of Chemistry & Biology, Gersch et al. describe an efficient combination of chromatographic separation and top-down mass spectrometry that together with an intuitive visualization tool allowed them to screen the proteasome for PTMs and covalently binding inhibitors. Copyright © 2015 Elsevier Ltd. All rights reserved.
Article
The fertilization ability of male gametes is achieved after their transit through the epididymis where important post-gonadal differentiation occurs in different cellular compartments. Most of these maturational modifications occur at the protein level. The epididymal sperm maturation process was investigated using the ICM-MS (Intact Cell MALDI-TOF MS) approach on spermatozoa isolated from four different epididymal regions (immature to mature stage). Differential and quantitative MALDI-TOF profiling for whole cells or sub-cellular fractions was combined with targeted top-down MS in order to identify endogenous biomolecules. Using this approach, 172 m/z peaks ranging between 2 and 20 kDa were found to be modified during maturation of sperm. Using top-down MS, 62 m/z were identified corresponding to peptidoforms/proteoforms with post-translational modifications (MS data are available via ProteomeXchange with identifier PXD001303). Many of the endogenous peptides were characterized as N-, C-terminal sequences or internal fragments of proteins presenting specific cleavages, suggesting the presence of sequential protease activities in the spermatozoa. This is the first time that such proteolytic activities could be evidenced for various sperm proteins through quantification of their proteolytic products. ICM-MS/top-down MS thus proved to be a valid approach for peptidome/degradome studies and provided new contributions to understanding of the maturation process of the male gamete involved in the development of male fertility. This peptidomic study (i) characterized the peptidome of epididymal spermatozoa from boar (Sus scrofa); (ii) established characteristic molecular phenotypes distinguishing degrees of maturation of spermatozoa during epididymal transit, and (iii) revealed that protease activities were at the origin of numerous peptides from known and unknown proteins involved in sperm maturation and/or fertility processes. Copyright © 2014. Published by Elsevier B.V.
Article
The automated processing of data generated by top down proteomics would benefit from improved scoring for protein identification and characterization of highly related protein forms (proteoforms). Here we propose the "C-score" (short for Characterization Score), a Bayesian approach to the proteoform identification and characterization problem, implemented within a framework to allow the infusion of expert knowledge into generative models that take advantage of known properties of proteins and top down analytical systems (e.g., fragmentation propensities, "off-by-1 Da" discontinuous errors, and intelligent weighting for site-specific modifications). The performance of the scoring system based on the initial generative models was compared to the current probability-based scoring system used within both ProSightPC and ProSightPTM on a manually curated set of 295 human proteoforms. The current implementation of the C-score framework generated a marked improvement over the existing scoring system as measured by the area under the curve on the resulting ROC chart (AUC of 0.99 versus 0.78).
Article
Top-down proteomics has become a popular approach for the analysis of intact proteins. The term "top down" has been coined for the analysis of proteins not involving any enzymatic or chemical cleavage but rather the ionization of the protein as a sound molecule and mass analysis of intact species and fragment ions thereof produced upon dissociation inside a mass spectrometer. One or several charge states of the protein are mass-isolated and subjected to dissociation (MS/MS) in the gas phase. The obtained fragment masses, predominantly from cleavages of the protein along its amino acid backbone, are directly related to the intact protein. Using bioinformatics tools the fragment masses are matched against a known protein sequence or can alternatively be used for partial or full de novo sequencing, depending on the size of the protein and the number of fragment ions obtained. Moreover, this approach provides global information about modification states of proteins including the number and types of isoforms and their stoichiometry and allows for the precise localization of modifications within the amino acid sequence. Top-down analysis of a single, purified protein can be performed by matrix-assisted laser desorption ionization or electrospray ionization upon direct infusion without online chromatographic separation, whereas top-down analysis of complex protein mixtures makes pre-fractionation combined with an efficient front-end chromatographic separation coupled online to the mass spectrometer inevitable.
Chapter
The large-scale identification and quantification of proteins is an important foundation of systems biology. Here we focus on the particularly powerful technology of mass spectrometry (MS)-based proteomics, with an emphasis on recent high-resolution and quantitative approaches. MS-based proteomics is used to characterize proteins in complex mixtures and it is now possible to quantify nearly all the proteins in human cell lines. Subcellular localization and protein turnover can also be addressed comprehensively. In affinity purifications, quantitative proteomics distinguishes specific interacting proteins from background binders. Thousands of phosphorylation sites as well as other post-translational modifications can readily be quantified in vivo, providing direct access to cellular information processing events. Underlying the recent success of the field are developments in computational proteomics, which now allow highly sophisticated and completely automatic analysis of raw MS data and streamlined bioinformatic and systems-level interpretation of the results.
Article
We review approaches for microorganism identification that exploit the wealth of information in constantly expanding proteome databases. Masses of an organism's protein biomarkers are experimentally determined and matched against sequence-derived masses of proteins, found together with their source organisms in proteome databases. The source organisms are ranked according to the matches, resulting in microorganism identification. Statistical analysis of proteome uniqueness across organisms in a database enables evaluation of the probability of false identifications based on protein mass assignments alone. Biomarkers likely to be observed can be identified based solely on microbial genome sequence information. Protein identification methodologies allow assignment of detected proteins to specific microorganisms and, by extension, allow identification of the microorganism from which those proteins originate.
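The ranking step described above (matching measured biomarker masses against sequence-derived protein masses and ordering the source organisms by their matches) might be sketched as follows. This is an illustrative toy, not the reviewed methods: the function name, the ±2 Da tolerance, and the two-organism database are all assumptions, and the statistical evaluation of false identifications is not modeled.

```python
from collections import Counter


def rank_organisms(measured_masses, database, tol_da=2.0):
    """Rank organisms by how many measured masses match one of their proteins.

    database maps an organism name to a list of sequence-derived protein
    masses (Da). A measured mass counts once per organism if any protein
    mass lies within tol_da of it.
    """
    hits = Counter()
    for organism, protein_masses in database.items():
        hits[organism] = sum(
            any(abs(m - p) <= tol_da for p in protein_masses)
            for m in measured_masses
        )
    return hits.most_common()  # highest match count first


# Hypothetical toy database and measurements
toy_db = {
    "organism_A": [5000.0, 7200.0, 9500.0],
    "organism_B": [5000.5, 8100.0],
}
print(rank_organisms([5000.2, 7201.0, 9499.0], toy_db))
```

A raw match count favors organisms with larger proteomes, which is one reason the reviewed approaches pair ranking with a probability estimate for chance matches.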
Article
The genome fingerprint scanning (GFS) system was developed to link proteomic data, consisting of peptide mass fingerprints and tandem mass spectrometry (MS/MS) data, to the genome sequence of an organism. It maps MS data directly to the genomic locus responsible for expression of a protein, without relying on prior genome annotation. The GFS approach provides the intriguing possibility of identifying novel genes straight from protein data, thereby potentially enhancing ongoing efforts to annotate the genomes.
Article
This chapter reviews the intact protein mass measurements and discusses the way top-down mass spectrometry (MS) provides a route toward proteomics experiments that embrace the transmembrane domain by addressing the whole intact protein. The mass spectrum of an intact protein defines the native covalent state of the gene product and its heterogeneity. Absolute quantification in MS is achieved using internal standards while relative quantification can be achieved using isotopic labeling strategies. Identification of a protein from its mass can be accomplished in a number of ways—for example, ions from the intact protein can be isolated in the mass spectrometer for tandem MS, or samples collected concomitantly with elution of the intact mass tag (IMT) can be subjected to chemical cleavage or digestion to yield peptides for bottom-up tandem MS. The only direct way to identify the IMT is through the top-down MS because the bottom-up approach could identify several proteins in a collected fraction providing several candidates for the IMT. Various intact protein gas-phase dissociation strategies provide versatile options for top-down MS of integral membrane proteins (IMPs)—for example, collision-activated dissociation (CAD) and electron-capture dissociation (ECD).
Article
Glycosylation is increasingly recognized as a common and biologically significant post-translational modification of proteins. Modern mass spectrometry methods offer the best ways to characterize the glycosylation state of proteins. Both glycobiology and mass spectrometry rely on specialized nomenclature, techniques, and knowledge, which pose a barrier to entry by the nonspecialist. This introductory chapter provides an overview of the fundamentals of glycobiology, mass spectrometry methods, and the intersection of the two fields. Foundational material included in this chapter includes a description of the biological process of glycosylation, an overview of typical glycoproteomics workflows, a description of mass spectrometry ionization methods and instrumentation, and an introduction to bioinformatics resources. In addition to providing an orientation to the contents of the other chapters of this volume, this chapter cites other important works of potential interest to the practitioner. This overview, combined with the state-of-the-art protocols contained within this volume, provides a foundation for both glycobiologists and mass spectrometrists seeking to bridge the two fields.
Article
This review of current HPLC techniques concludes that they represent a valuable tool for the characterization of virtually any hydrophobic protein, given the wide versatility, relative ease of use, and high resolution of the reversed phase column. Moreover, since the procedure does not destroy the sample, it allows for protein identification by coupling the column outlet on line with a mass spectrometer interfaced with an electrospray source. Thus, using the thylakoid membrane of the photosynthetic apparatus as a model, we have demonstrated that by taking intact mass measurements (IMM), each protein may be identified on the basis of the close correspondence between the molecular masses measured by RP-HPLC-ESI-MS and those expected from the DNA sequence, in those cases where post-translational modifications may be supposed absent. This was corroborated by the evidence that proteins assigned by IMM are also confirmed by 'in solution' trypsin digestion and peptide fragment fingerprinting (PFF) of each protein isolated by RP-HPLC using a preparative scale column. On the other hand, even when denatured, these highly hydrophobic proteins are barely digested, and the small number of resulting peptides is not sufficient for unequivocal protein identification by peptide mass fingerprinting. Furthermore, because IMM reflects the full protein sequence, it is possible to elucidate in a short time whether the protein has undergone any post-translational modifications. In the presence of single or multiple phosphorylations, for example, it is possible to estimate approximately the amount of phosphorylated protein present as a percentage of the total protein by comparing the intensities in the deconvoluted spectrum of each protein.
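The phosphorylation estimate mentioned at the end of this abstract can be sketched as a ratio of deconvoluted peak intensities. This is a rough illustration under stated assumptions, not the authors' procedure: the function name, the ±1 Da matching tolerance, the limit of three sites, and the example peak list are hypothetical, and the calculation assumes that phosphorylated and unmodified forms ionize with equal efficiency. The +79.966 Da shift per phosphate (HPO3) is the standard monoisotopic value.

```python
PHOSPHO_SHIFT_DA = 79.96633  # monoisotopic mass added per phosphate (HPO3)


def phospho_percent(peaks, unmodified_mass, max_sites=3, tol_da=1.0):
    """Estimate the percentage of protein carrying at least one phosphate.

    peaks is a list of (deconvoluted_mass, intensity) pairs. Each peak is
    assigned to the form with n phosphates (n = 0..max_sites) whose expected
    mass lies within tol_da; unassigned peaks are ignored.
    """
    intensity_by_sites = {}
    for mass, intensity in peaks:
        for n in range(max_sites + 1):
            expected = unmodified_mass + n * PHOSPHO_SHIFT_DA
            if abs(mass - expected) <= tol_da:
                intensity_by_sites[n] = intensity_by_sites.get(n, 0.0) + intensity
                break
    total = sum(intensity_by_sites.values())
    if total == 0:
        raise ValueError("no peaks matched the expected protein forms")
    return 100.0 * (total - intensity_by_sites.get(0, 0.0)) / total


# Hypothetical deconvoluted spectrum: unmodified, singly and doubly
# phosphorylated forms of a 25 kDa protein.
example_peaks = [(25000.0, 60.0), (25079.97, 30.0), (25159.93, 10.0)]
print(phospho_percent(example_peaks, unmodified_mass=25000.0))
```

The result is only approximate, as the abstract itself notes, because signal intensity depends on more than abundance.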
Article
A growing number of labs are using many types of mass spectrometers to directly analyze intact proteins and to improve conversion of MS data into biological knowledge.
Article
Childhood absence epilepsy is a prototypic form of generalized nonconvulsive epilepsy characterized by short impairments of consciousness concomitant with synchronous and bilateral spike-and-wave discharges in the electroencephalogram. For scientists in this field, the BS/Orl and BR/Orl mouse lines, derived from a genetic selection, constitute an original mouse model "in mirror" of absence epilepsy. The potential of MALDI imaging mass spectrometry (IMS) for the discovery of potential biomarkers is increasingly recognized. Statistical analysis tools specifically adapted to IMS data sets and methods for the identification of detected proteins play an essential role. In this study, a new cross-classification comparative design using a combined discrete wavelet transformation-support vector machine classification was developed to discriminate spectra of brain sections of BS/Orl and BR/Orl mice. Nineteen m/z ratios were thus highlighted as potential markers with very high recognition rates (87-99%). Seven of these potential markers were identified using a top-down approach, in particular a fragment of Synapsin-I. This protein is already suspected to be involved in epilepsy. Immunohistochemistry and Western Blot experiments confirmed the differential expression of Synapsin-I observed by IMS, thus tending to validate our approach. Functional assays are being performed to confirm the involvement of Synapsin-I in the mechanisms underlying childhood absence epilepsy.
Article
Mass spectrometry based proteomics generally seeks to identify and fully characterize protein species with high accuracy and throughput. Recent improvements in protein separation have greatly expanded the capacity of top-down proteomics (TDP) to identify a large number of intact proteins. To date, TDP has been most tightly associated with Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry. Here, we couple the improved separations to a Fourier-transform instrument based not on ICR but using the Orbitrap Elite mass analyzer. Application of this platform to H1299 human lung cancer cells resulted in the unambiguous identification of 690 unique proteins and over 2000 proteoforms identified from proteins with intact masses <50 kDa. This is an early demonstration of high throughput TDP (>500 identifications) in an Orbitrap mass spectrometer and exemplifies an accessible platform for whole protein mass spectrometry.
Chapter
Introduction; LC-Based Approaches in Peptide Mass Mapping; LC-Based Approaches in Protein Mapping; Orthogonal 2D HPLC Separations; Conclusion
Article
Full-text available
Mass spectrometry (MS)-based proteomics is emerging as a broadly effective means for identification, characterization, and quantification of proteins that are integral components of the processes essential for life. Characterization of proteins at the proteome and sub-proteome (e.g., the phosphoproteome, proteoglycome, or degradome/peptidome) levels provides a foundation for understanding fundamental aspects of biology. Emerging technologies such as ion mobility separations coupled with MS and microchip-based-proteome measurements combined with MS instrumentation and chromatographic separation techniques, such as nanoscale reversed phase liquid chromatography and capillary electrophoresis, show great promise for both broad undirected and targeted highly sensitive measurements. MS-based proteomics increasingly contribute to our understanding of the dynamics, interactions, and roles that proteins and peptides play, advancing our understanding of biology on a systems wide level for a wide range of applications including investigations of microbial communities, bioremediation, and human health.
Article
Full-text available
A new method for identifying secretory signal sequences and for predicting the site of cleavage between a signal sequence and the mature exported protein is described. The predictive accuracy is estimated to be around 75–80% for both prokaryotic and eukaryotic proteins.
Article
Full-text available
Molecular and fragment ion data of intact 8- to 43-kDa proteins from electrospray Fourier-transform tandem mass spectrometry are matched against the corresponding data in sequence data bases. Extending the sequence tag concept of Mann and Wilm for matching peptides, a partial amino acid sequence in the unknown is first identified from the mass differences of a series of fragment ions, and the mass position of this sequence is defined from molecular weight and the fragment ion masses. For three studied proteins, a single sequence tag retrieved only the correct protein from the data base; a fourth protein required the input of two sequence tags. However, three of the data base proteins differed by having an extra methionine or by missing an acetyl or heme substitution. The positions of these modifications in the protein examined were greatly restricted by the mass differences of its molecular and fragment ions versus those of the data base. To characterize the primary structure of an unknown represented in the data base, this method is fast and specific and does not require prior enzymatic or chemical degradation.
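The sequence-tag idea described above, reading a partial amino acid sequence from the mass differences of a series of fragment ions, might be sketched as follows. This is a simplified illustration, not the published algorithm: the function names, the 0.01 Da tolerance, and the example masses are assumptions; only consecutive fragments of a single, already deconvoluted ion series are compared; and leucine and isoleucine, which are isobaric, are both reported as "L". The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da); "L" stands for Leu or Ile (isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_residue(delta, tol=0.01):
    """Return the residue whose mass is closest to delta within tol, else None."""
    best = None
    for residue, mass in RESIDUE_MASS.items():
        error = abs(delta - mass)
        if error <= tol and (best is None or error < abs(delta - RESIDUE_MASS[best])):
            best = residue
    return best


def longest_tag(fragment_masses, tol=0.01):
    """Longest run of consecutive fragment-mass differences that are residues."""
    masses = sorted(fragment_masses)
    best, current = "", ""
    for lower, upper in zip(masses, masses[1:]):
        residue = match_residue(upper - lower, tol)
        if residue is None:
            current = ""  # the ladder is broken; start a new tag
        else:
            current += residue
            if len(current) > len(best):
                best = current
    return best


# Hypothetical fragment ladder: three residue-sized steps, then a gap
ladder = [1000.0, 1057.02146, 1128.05857, 1215.0906, 1500.0]
print(longest_tag(ladder))  # prints "GAS"
```

The tag is read in order of increasing fragment mass, so whether it runs N-to-C or C-to-N depends on which ion series the fragments belong to. The mass of the lightest fragment in the run fixes the tag's position in the protein, which is how the abstract's "mass position" constraint enters a database search.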
Article
A mass spectrometry-based method is described for simultaneous identification and quantitation of individual proteins and for determining changes in the levels of modifications at specific sites on individual proteins. Accurate quantitation is achieved through the use of whole-cell stable isotope labeling. This approach was applied to the detection of abundance differences of proteins present in wild-type versus mutant cell populations and to the identification of in vivo phosphorylation sites in the PAK-related yeast Ste20 protein kinase that depend specifically on the G1 cyclin Cln2. The present method is general and affords a quantitative description of cellular differences at the level of protein expression and modification, thus providing information that is critical to the understanding of complex biological phenomena.
Article
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Article
The current progression from genomics to proteomics is fueled by the realization that many properties of proteins (e.g., interactions, post-translational modifications) cannot be predicted from DNA sequence. Although it has become feasible to rapidly identify proteins from crude cell extracts using mass spectrometry after two-dimensional electrophoretic separation, it can be difficult to elucidate low-abundance proteins of interest in the presence of a large excess of relatively abundant proteins. Therefore, for effective proteome analysis it becomes critical to enrich the sample to be analyzed in subfractions of interest. For example, the analysis of protein kinase substrates can be greatly enhanced by enriching the sample of phosphorylated proteins. Although enrichment of phosphotyrosine-containing proteins has been achieved through the use of high-affinity anti-phosphotyrosine antibodies, the enrichment of phosphoserine/threonine-containing proteins has not been routinely possible. Here, we describe a method for enriching phosphoserine/threonine-containing proteins from crude cell extracts, and for subsequently identifying the phosphoproteins and sites of phosphorylation. The method, which involves chemical replacement of the phosphate moieties by affinity tags, should be of widespread utility for defining signaling pathways and control mechanisms that involve phosphorylation or dephosphorylation of serine/threonine residues.
Article
Protein kinases are coded by more than 2,000 genes and thus constitute the largest single enzyme family in the human genome. Most cellular processes are in fact regulated by the reversible phosphorylation of proteins on serine, threonine, and tyrosine residues. At least 30% of all proteins are thought to contain covalently bound phosphate. Despite the importance and widespread occurrence of this modification, identification of sites of protein phosphorylation is still a challenge, even when performed on highly purified protein. Reported here is methodology that should make it possible to characterize most, if not all, phosphoproteins from a whole-cell lysate in a single experiment. Proteins are digested with trypsin and the resulting peptides are then converted to methyl esters, enriched for phosphopeptides by immobilized metal-affinity chromatography (IMAC), and analyzed by nanoflow HPLC/electrospray ionization mass spectrometry. More than 1,000 phosphopeptides were detected when the methodology was applied to the analysis of a whole-cell lysate from Saccharomyces cerevisiae. A total of 216 peptide sequences defining 383 sites of phosphorylation were determined. Of these, 60 were singly phosphorylated, 145 doubly phosphorylated, and 11 triply phosphorylated. Comparison with the literature revealed that 18 of these sites were previously identified, including the doubly phosphorylated motif pTXpY derived from the activation loop of two mitogen-activated protein (MAP) kinases. We note that the methodology can easily be extended to display and quantify differential expression of phosphoproteins in two different cell systems, and therefore demonstrates an approach for "phosphoprofiling" as a measure of cellular states.
Article
Large-scale genomics has enabled proteomics by creating sequence infrastructures that can be used with mass spectrometry data to identify proteins. Although protein sequences can be deduced from nucleotide sequences, posttranslational modifications to proteins, in general, cannot. We describe a process for the analysis of posttranslational modifications that is simple, robust, general, and can be applied to complicated protein mixtures. A protein or protein mixture is digested by using three different enzymes: one that cleaves in a site-specific manner and two others that cleave nonspecifically. The mixture of peptides is separated by multidimensional liquid chromatography and analyzed by a tandem mass spectrometer. This approach has been applied to modification analyses of proteins in a simple protein mixture, Cdc2p protein complexes isolated through the use of an affinity tag, and lens tissue from a patient with congenital cataracts. Phosphorylation sites have been detected with known stoichiometry of as low as 10%. Eighteen sites of four different types of modification have been detected on three of the five proteins in a simple mixture, three of which were previously unreported. Three proteins from Cdc2p isolated complexes yielded eight sites containing three different types of modifications. In the lens tissue, 270 proteins were identified, and 11 different crystallins were found to contain a total of 73 sites of modification. Modifications identified in the crystallin proteins included Ser, Thr, and Tyr phosphorylation, Arg and Lys methylation, Lys acetylation, and Met, Tyr, and Trp oxidations. The method presented will be useful in discovering co- and posttranslational modifications of proteins.
Article
The RESID Database is a comprehensive collection of annotations and structures for protein pre-, co- and post-translational modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link modifications. The RESID Database includes: systematic and alternate names, atomic formulas and masses, enzyme activities generating the modifications, keywords, literature citations, Gene Ontology cross-references, Protein Information Resource (PIR) and SWISS-PROT protein sequence database feature table annotations, structure diagrams and molecular models. This database is freely accessible on the Internet through the European Bioinformatics Institute at http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+LibInfo+-lib+RESID, through the National Cancer Institute — Frederick Advanced Biomedical Computing Center at http://www.ncifcrf.gov/RESID, or through the Protein Information Resource at http://pir.georgetown.edu/pirwww/dbinfo/resid.html.
Article
Thiaminase I (E.C. 2.5.1.2) from Bacillus thiaminolyticus catalyzes the degradation of thiamin (vitamin B1). Unexpected mass heterogeneity (MW 42,127, 42,197, and 42,254; 1:2:1) in recombinant thiaminase I from Escherichia coli was detected by electrospray ionization Fourier-transform mass spectrometry, resolving power 7 × 10^4. Nozzle-skimmer fragmentation data reveal an extra Ala (+71.02; 71.04=theory) and GlyAla (+128.04; 128.06=theory) on the N-terminus, in addition to the fully processed enzyme. However, the fragment ion masses were consistent only with this sequence through 330 N-terminal residues; resequencing of the last 150 bps of the thiaminase I gene yields a sequence consistent with the molecular weight values and all 61 fragment ion masses. Covalently labeling the active site with a 108-Da pyrimidine moiety via mechanism-based inhibition produces a corresponding molecular weight increase in all three thiaminase I components, which indicates that they are all enzymatically active. Inspection of the fragment ions that do and do not increase by 108 Da indicates that the active site nucleophile is located between Pro(79) and Thr(177) in the 379 amino acid enzyme.
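The localization logic in the last sentence, bracketing a modified site by the fragment ions that do and do not carry the mass shift, can be sketched as below. This is an assumed simplification that considers N-terminal fragments only; the function name `localize_modification` and the fragment list are hypothetical, chosen so that the resulting window mirrors the residue range quoted in the abstract, and are not the paper's data.

```python
def localize_modification(n_terminal_fragments):
    """Bracket a modified residue using N-terminal fragments.

    `n_terminal_fragments` is an iterable of (residue_count, carries_shift)
    pairs. A fragment without the shift ends before the modified residue;
    a fragment with the shift contains it. Returns (first, last) residue
    positions of the window; `last` is None if no shifted fragment exists.
    """
    fragments = list(n_terminal_fragments)
    unshifted = [n for n, shifted in fragments if not shifted]
    shifted = [n for n, shifted in fragments if shifted]
    first = max(unshifted, default=0) + 1
    last = min(shifted) if shifted else None
    return first, last


# Hypothetical fragments: lengths in residues, and whether +108 Da was seen.
observed = [(50, False), (78, False), (177, True), (300, True)]
window = localize_modification(observed)
print(window)
```

C-terminal fragments could narrow the window from the other side in the same way; combining both series is left out to keep the sketch short.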
Article
The availability of genome sequences, affordable mass spectrometers and high-resolution two-dimensional gels has made possible the identification of hundreds of proteins from many organisms by peptide mass fingerprinting. However, little attention has been paid to how information generated by these means can be utilised for detailed protein characterisation. Here we present an approach for the systematic characterisation of proteins using mass spectrometry and a software tool FindMod. This tool, available on the internet at http://www.expasy.ch/sprot/findmod.html, examines peptide mass fingerprinting data for mass differences between empirical and theoretical peptides. Where mass differences correspond to a post-translational modification, intelligent rules are applied to predict the amino acids in the peptide, if any, that might carry the modification. FindMod rules were constructed by examining 5153 incidences of post-translational modifications documented in the SWISS-PROT database, and for the 22 post-translational modifications currently considered (acetylation, amidation, biotinylation, C-mannosylation, deamidation, flavinylation, farnesylation, formylation, geranyl-geranylation, gamma-carboxyglutamic acids, hydroxylation, lipoylation, methylation, myristoylation, N-acyl diglyceride (tripalmitate), O-GlcNAc, palmitoylation, phosphorylation, pyridoxal phosphate, phospho-pantetheine, pyrrolidone carboxylic acid, sulphation) a total of 29 different rules were made. These consider which amino acids can carry a modification, whether the modification occurs on N-terminal, C-terminal or internal amino acids, and the type of organisms on which the modification can be found. We illustrate the utility of the approach with proteins from 2-D gels of Escherichia coli and sheep wool, where post-translational modifications predicted by FindMod were confirmed by MALDI post-source decay peptide fragmentation. 
As the approach is amenable to automation, it presents a potentially large-scale means of protein characterisation in proteome projects.
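The core step described for this kind of tool, comparing empirical and theoretical peptide masses and asking whether the difference corresponds to a known modification, can be sketched as a simple lookup. This is not FindMod's rule set: the table holds only five well-known monoisotopic shifts, the function name `candidate_ptms` and the 0.01 Da tolerance are assumptions, and the residue- and organism-specific rules mentioned in the abstract are not modeled.

```python
# Monoisotopic mass shifts in Da for a few common modifications.
PTM_DELTA = {
    "methylation": 14.01565,
    "hydroxylation": 15.99491,
    "formylation": 27.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def candidate_ptms(observed_mass, theoretical_mass, tol=0.01):
    """Return the names of modifications whose shift explains the mass gap."""
    delta = observed_mass - theoretical_mass
    return sorted(
        name for name, shift in PTM_DELTA.items() if abs(delta - shift) <= tol
    )


# A made-up peptide observed about 79.97 Da heavier than predicted.
hits = candidate_ptms(1079.9700, 1000.0000)
print(hits)
```

A fuller version would follow each hit with a check that the peptide actually contains a residue able to carry the modification, which is the role the abstract assigns to the rules.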
Article
All present FT/ICR instruments operate with single-pulse or frequency-sweep radio frequency excitation waveforms, which produce excitation power with non-uniform amplitude and limited mass selectivity. This paper introduces a general "tailored" excitation time-domain waveform obtained by inverse Fourier transformation of the desired excitation spectrum. The new method includes all other excitation waveforms as subsets and may be operated in direct or heterodyne mode. Major applications include the following: flatter excitation power over the detected mass range, excitation power with one or more windows for suppression of large peaks or for more selective ion ejection for MS/MS, and multiple-ion monitoring with simultaneous detection of any number of selected mass-to-charge ratios. Theoretical and experimental examples of all three types of excitation are given. The method is readily adapted to existing instruments with minor hardware and software modifications.
Article
Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
Article
Proteins from silver-stained gels can be digested enzymatically and the resulting peptides analyzed and sequenced by mass spectrometry. Standard proteins yield the same peptide maps when extracted from Coomassie- and silver-stained gels, as judged by electrospray and MALDI mass spectrometry. The low nanogram range can be reached by the protocols described here, and the method is robust. A silver-stained one-dimensional gel of a fraction from yeast proteins was analyzed by nano-electrospray tandem mass spectrometry. In the sequencing, more than 1000 amino acids were covered, resulting in no evidence of chemical modifications due to the silver staining procedure. Silver staining allows a substantial shortening of sample preparation time and may, therefore, be preferable over Coomassie staining. This work removes a major obstacle to the low-level sequence analysis of proteins separated on polyacrylamide gels.
Article
The 10^5 resolving power and MS/MS capabilities of Fourier-transform mass spectrometry provide electrospray ionization mass spectra containing >100 molecular and fragment ion mass values of high accuracy. Applying these spectra to the detection and localization of errors and modifications in the DNA-derived sequences of proteins is illustrated with the thiCEFSGH thiamin biosynthesis operon from Escherichia coli. Direct fragmentation of the multiply-charged intact protein ions produces large fragment ions covering the entire sequence; further dissociation of these fragment ions provides information on their sequences. For ThiE (23 kDa), the entire sequence was verified in a single spectrum with an accurate (0.3 Da) molecular weight (Mr) value, with confirmation from MS/MS fragment masses. Those for ThiH (46 kDa) showed that the Mr value (1 Da error) represented the protein without the start Met residue. For ThiF (27 kDa), MS/MS localized a sequence discrepancy to a 34 residue peptide. The first 107 residues of ThiC (74 kDa) were shown to be correct, with C-terminal heterogeneity indicated. For ThiG (predicted Mr = 34 kDa), ESI/FTMS showed two components of 7,310.74 (ThiS) and 26,896.5 Da (ThiG); MS/MS uncovered three reading frame errors and a stop codon for the first protein. MS/MS ions are consistent with 68 fragments predicted by the corrected ThiS/ThiG DNA sequences.
Article
We describe the impact of advances in mass measurement accuracy, +/- 10 ppm (internally calibrated), on protein identification experiments. This capability was brought about by delayed extraction techniques used in conjunction with matrix-assisted laser desorption ionization (MALDI) on a reflectron time-of-flight (TOF) mass spectrometer. This work explores the advantage of using accurate mass measurement (and thus constraint on the possible elemental composition of components in a protein digest) in strategies for searching protein, gene, and EST databases that employ (a) mass values alone, (b) fragment-ion tagging derived from MS/MS spectra, and (c) de novo interpretation of MS/MS spectra. Significant improvement in the discriminating power of database searches has been found using only molecular weight values (i.e., measured mass) of > 10 peptide masses. When MALDI-TOF instruments are able to achieve the +/- 0.5-5 ppm mass accuracy necessary to distinguish peptide elemental compositions, it is possible to match homologous proteins having > 70% sequence identity to the protein being analyzed. The combination of a +/- 10 ppm measured parent mass of a single tryptic peptide and the near-complete amino acid (AA) composition information from immonium ions generated by MS/MS is capable of tagging a peptide in a database because only a few sequence permutations > 11 AA's in length for an AA composition can ever be found in a proteome. De novo interpretation of peptide MS/MS spectra may be accomplished by altering our MS-Tag program to replace an entire database with calculation of only the sequence permutations possible from the accurate parent mass and immonium ion limited AA compositions. A hybrid strategy is employed using de novo MS/MS interpretation followed by text-based sequence similarity searching of a database.
Article
For proteins of < 20 kDa, this new radical site dissociation method cleaves different and many more backbone bonds than the conventional MS/MS methods (e.g., collisionally activated dissociation, CAD) that add energy directly to the even-electron ions. A minimum kinetic energy difference between the electron and ion maximizes capture; a 1 eV difference reduces capture by 10^3. Thus, in an FTMS ion cell with added electron trapping electrodes, capture appears to be achieved best at the boundary between the potential wells that trap the electrons and ions, now providing 80 +/- 15% precursor ion conversion efficiency. Capture cross section is dependent on the ionic charge squared (z^2), minimizing the secondary dissociation of lower charge fragment ions. Electron capture is postulated to occur initially at a protonated site to release an energetic (approximately 6 eV) H• atom that is captured at a high-affinity site such as -S-S- or backbone amide to cause nonergodic (before energy randomization) dissociation. Cleavages between every pair of amino acids in melittin (2.8 kDa) and ubiquitin (8.6 kDa) are represented in their ECD and CAD spectra, providing complete data for their de novo sequencing. Because posttranslational modifications such as carboxylation, glycosylation, and sulfation are less easily lost in ECD than in CAD, ECD assignments of their sequence positions are far more specific.
Article
We report a new tandem mass spectrometric approach for the improved identification of polypeptides from mixtures (e.g., using genomic databases). The approach involves the dissociation of several species simultaneously in a single experiment and provides both increased speed and sensitivity. The data analysis makes use of the known fragmentation pathways for polypeptides and highly accurate mass measurements for both the set of parent polypeptides and their fragments. The accurate mass information makes it possible to attribute most fragments to a specific parent species. We provide an initial demonstration of this multiplexed tandem MS approach using an FTICR mass spectrometer with a mixture of seven polypeptides dissociated using infrared irradiation from a CO2 laser. The peptides were added to, and then successfully identified from, the largest genomic database yet available (C. elegans), which is equivalent in complexity to that for a specific differentiated mammalian cell type. Additionally, since only a few enzymatic fragments are necessary to unambiguously identify a protein from an appropriate database, it is anticipated that the multiplexed MS/MS method will allow the more rapid identification of complex protein mixtures with on-line separation of their enzymatically produced polypeptides.
Article
We describe the protein search engine "ProFound", which employs a Bayesian algorithm to identify proteins from protein databases using mass spectrometric peptide mapping data. The algorithm ranks protein candidates by taking into account individual properties of each protein in the database as well as other information relevant to the peptide mapping experiment. The program consistently identifies the correct protein(s) even when the data quality is relatively low or when the sample consists of a simple mixture of proteins. Illustrative examples of protein identifications are provided.
Article
We derive and validate a simple statistical model that predicts the distribution of false matches between peaks in matrix-assisted laser desorption/ionization mass spectrometry data and proteins in proteome databases. The model allows us to calculate the significance of previously reported microorganism identification results. In particular, for Δm = ±1.5 Da, we find that the computed significance levels are sufficient to demonstrate the ability to identify microorganisms, provided the number of candidate microorganisms is limited to roughly three Escherichia coli-like or roughly 10 Bacillus subtilis-like microorganisms (in the sense of having roughly the same number of proteins per unit-mass interval). We conclude that, given the cluttered and incomplete nature of the data, it is likely that neither simple ranking nor simple hypothesis testing will be sufficient for truly robust microorganism identification over a large number of candidate microorganisms.
Article
The dynamic range of Fourier transform ion cyclotron mass spectrometry (FTICR) is typically limited by the useful charge capacity of an FTICR cell (to approximately 10^6 to 10^7 elementary charges) and the minimum number of ions required to produce a useful signal (approximately 10^2 elementary charges). We show that the expansion of the dynamic range by 2 orders of magnitude can be achieved by preselecting lower abundance species in a quadrupole interface to an electrospray ionization (ESI) source. Ion preselection is then followed by ion accumulation in a linear (2-D) quadrupole trap external to the FTICR cell and subsequent transfer to the region of high magnetic field for gated trapping in the FTICR cell. Two modes of ion preselection, using either the quadrupole filtering mode or rf-only dipolar excitation, were studied and mass resolutions of 30 to 100 were achieved for selective external ion accumulation of peptides and proteins with molecular weights ranging from 500 to 17,000 Da. The ability to selectively eject the most abundant species before trapping in the FTICR has enormous practical benefits for increasing the sensitivity and dynamic range of measurements.
Article
Reversible protein phosphorylation has been known for some time to control a wide range of biological functions and activities. Thus determination of the site(s) of protein phosphorylation has been an essential step in the analysis of the control of many biological systems. However, direct determination of individual phosphorylation sites occurring on phosphoproteins in vivo has been difficult to date, typically requiring the purification to homogeneity of the phosphoprotein of interest before analysis. Thus, there has been a substantial need for a more rapid and general method for the analysis of protein phosphorylation in complex protein mixtures. Here we describe such an approach to protein phosphorylation analysis. It consists of three steps: (1) selective phosphopeptide isolation from a peptide mixture via a sequence of chemical reactions, (2) phosphopeptide analysis by automated liquid chromatography-tandem mass spectrometry (LC-MS/MS), and (3) identification of the phosphoprotein and the phosphorylated residue(s) by correlation of tandem mass spectrometric data with sequence databases. By utilizing various phosphoprotein standards and a whole yeast cell lysate, we demonstrate that the method is equally applicable to serine-, threonine- and tyrosine-phosphorylated proteins, and is capable of selectively isolating and identifying phosphopeptides present in a highly complex peptide mixture.
Article
Nonporous (NPS) RP-HPLC has been used to rapidly separate proteins from whole cell lysates of human breast cell lines. The nonporous separation involves the use of hard-sphere silica beads of 1.5-µm diameter coated with C18, which can be used to separate proteins ranging from 5 to 90 kDa. Using only 30-40 µg of total protein, the protein molecular weights are detectable on-line using an ESI-oaTOF MS. Of hundreds of proteins detected in this mass range, approximately 75-80 are more highly expressed. The molecular weight profiles can be displayed as a mass map analogous to a virtual "1-D gel" and differentially expressed proteins can be compared by image analysis. The separated proteins can also be detected by UV absorption and differentially expressed proteins quantified. The eluting proteins can be collected in the liquid phase and the molecular weight and peptide maps determined by MALDI-TOF MS for identification. It is demonstrated that the expressed protein profiles change during neoplastic progression and that many oncoproteins are readily detected. It is also shown that the response of premalignant cancer cells to estradiol can be rapidly screened by this method, demonstrating significant changes in response to an external agent. Ultimately, the proteins can be studied by peptide mapping to search for posttranslational modifications of the oncoproteins accompanying progression.
Article
Phosphorylation is a common form of protein modification. To understand its biological role, the site of phosphorylation has to be determined. Generally, only limited amounts of phosphorylated proteins are present in a cell, thus demanding highly sensitive procedures for phosphorylation site determination. Here, a novel method is introduced which enables the localization of tyrosine phosphorylation in gel-separated proteins in the femtomol range. The method utilizes the immonium ion of phosphotyrosine at m/z 216.043 for positive ion mode precursor ion scanning combined with the recently introduced Q2-pulsing function on quadrupole TOF mass spectrometers. The high resolving power of the quadrupole TOF instrument enables the selective detection of phosphotyrosine immonium ions without interference from other peptide fragments of the same nominal mass. Performing precursor ion scans in the positive ion mode facilitates sequencing, because there is no need for polarity switching or changing pH of the spraying solvent. Similar limits of detection were obtained in this approach when compared to triple-quadrupole mass spectrometers but with significantly better selectivity, owing to the high accuracy of the fragment ion selection. Synthetic phosphopeptides could be detected at 1 fmol/µL, and 100 fmol of a tyrosine phosphorylated protein in gel was sufficient for the detection of the phosphorylated peptide in the unseparated digestion mixture and for unambiguous phosphorylation site determination. The new method can be applied to unknown protein samples, because the identification and localization of the modification is performed on the same sample.
Article
A method has been developed that utilizes phosphoprotein isotope-coded affinity tags (PhIAT) that combines stable isotope and biotin labeling to enrich and quantitatively measure differences in the O-phosphorylation states of proteins. The PhIAT labeling approach involves hydroxide ion-mediated beta-elimination of the O-phosphate moiety and the addition of 1,2-ethanedithiol containing either four alkyl hydrogens (EDT-D0) or four alkyl deuteriums (EDT-D4) followed by biotinylation of the EDT-D0/D4 moiety using (+)-biotinyl-iodoacetamidyl-3,6-dioxaoctanediamine. The PhIAT reagent, which contains the nucleophilic sulfhydryl and isotopic label covalently linked to a biotin moiety, was synthesized and has the potential utility to reduce the O-phosphorylation derivatization into a one-step process. The PhIAT labeling approach was initially demonstrated using the model phosphoprotein beta-casein. After proteolytic digestion, the PhIAT-labeled peptides were affinity isolated using immobilized avidin and analyzed using capillary reversed-phase liquid chromatography-mass spectrometry. PhIAT-labeled beta-casein peptides corresponding to peptides containing known sites of O-phosphorylation were isolated and identified. The PhIAT labeling method was also applied to a yeast protein extract. The PhIAT labeling technique provides a reliable method for making quantitative measurements of differences in the O-phosphorylation state of proteins.
Article
Although direct fragmentation of protein ions in a mass spectrometer is far more efficient than exhaustive mapping of 1-3 kDa peptides for complete characterization of primary structures predicted from sequenced genomes, the development of this approach is still in its infancy. Here we describe a statistical model (good to within approximately 5%) that shows that the database search specificity of this method requires only three of four fragment ions to match (at +/-0.1 Da) for a 99.8% probability of being correct in a database of 5,000 protein forms. Software developed for automated processing of protein ion fragmentation data and for probability-based retrieval of whole proteins is illustrated by identification of 18 archaeal and bacterial proteins with simultaneous mass-spectrometric (MS) mapping of their entire primary structures. Dissociation of two or three proteins at once for such identifications in parallel is also demonstrated, along with retention and exact localization of a phosphorylated serine residue through the fragmentation process. These conceptual and technical advances should assist future processing of whole proteins in a higher throughput format for more robust detection of co- and post-translational modifications.
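To make the reasoning behind such a specificity estimate concrete, the sketch below uses a plain binomial model as a stand-in; it is not the statistical model described in the abstract. It asks how likely it is that an unrelated database entry matches at least a required number of fragment ions by chance, and from that how likely it is that none of the other entries does. The per-fragment random-match probability `p_single` is a free assumption that would depend on mass tolerance and protein size, and both function names are hypothetical.

```python
from math import comb


def chance_match_probability(n_fragments, n_required, p_single):
    """Binomial tail: P(at least n_required of n_fragments match at random)."""
    return sum(
        comb(n_fragments, k) * p_single**k * (1 - p_single) ** (n_fragments - k)
        for k in range(n_required, n_fragments + 1)
    )


def probability_no_false_match(n_fragments, n_required, p_single, db_size):
    """P(no other database entry reaches the match threshold by chance).

    Treats the db_size - 1 incorrect entries as independent, which is a
    simplifying assumption of this sketch.
    """
    p_random = chance_match_probability(n_fragments, n_required, p_single)
    return (1 - p_random) ** (db_size - 1)


# Sanity check with an easy-to-verify value: 3 of 4 at p = 0.5 gives 5/16.
tail = chance_match_probability(4, 3, 0.5)
print(tail)

# Illustrative call for a 5,000-entry database; p_single here is made up.
print(probability_no_false_match(4, 3, 0.001, 5000))
```

The point of the sketch is the shape of the argument: because the chance of several independent fragment matches falls off steeply, a handful of matching fragments can be highly specific even in a database of thousands of protein forms.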
Article
The structural characterization of proteins expressed from the genome is a major problem in proteomics. The solution to this problem requires the separation of the protein of interest from a complex mixture, the identification of its DNA-predicted sequence, and the characterization of sequencing errors and posttranslational modifications. For this, the "top down" mass spectrometry (MS) approach, extended by the greatly increased protein fragmentation from electron capture dissociation (ECD), has been applied to characterize proteins involved in the biosynthesis of thiamin, Coenzyme A, and the hydroxylation of proline residues in proteins. With Fourier transform (FT) MS, electrospray ionization (ESI) of a complex mixture from an E. coli cell extract gave 102 accurate molecular weight values (2-30 kDa), but none corresponding to the predicted masses of the four desired enzymes for thiamin biosynthesis (GoxB, ThiS, ThiG, and ThiF). MS/MS of one ion species (representing approximately 1% of the mixture) identified it with the DNA-predicted sequence of ThiS, although the predicted and measured molecular weights were different. Further purification yielded a 2-component mixture whose ECD spectrum characterized both proteins simultaneously as ThiS and ThiG, showing an additional N-terminal Met on the 8 kDa ThiS and removal of an N-terminal Met and Ser from the 27 kDa ThiG. For a second system, the molecular weight of the 45 kDa phosphopantothenoylcysteine synthetase/decarboxylase (CoaBC), an enzyme involved in Coenzyme A biosynthesis, was 131 Da lower than that of the DNA prediction; the ECD spectrum showed that this is due to the removal of the N-terminal Met. For a third system, viral prolyl 4-hydroxylase (26 kDa), ECD showed that multiple molecular ions (+98, +178, etc.) are due to phosphate noncovalent adducts, and MS/MS pinpointed the overall mass discrepancy of 135 Da to removal of the initiation Met (131 Da) and to formation of disulfide bonds (2 x 2 Da) at C32-C49 and C143-C147, although 10 S-S positions were possible. In contrast, "bottom up" proteolysis characterization of the CoaBC and the P4H proteins was relatively unsuccessful. The addition of ECD substantially increases the capabilities of top down FTMS for the detailed structural characterization of large proteins.
Article
Recently, an approach for the "top down" sequence analysis of whole protein ions has been developed, employing electrospray ionization, collision-induced dissociation, and ion/ion proton-transfer reactions in a quadrupole ion trap mass spectrometer. This approach has now been extended to an analysis of the [M + 12H]12+ to [M + 5H]5+ ions of ribonuclease A and its N-linked glycosylated analogue, ribonuclease B, to determine the influence of the posttranslational modification on protein fragmentation. In agreement with previous studies on the fragmentation of a range of protein ions, facile gas-phase fragmentation was observed to occur along the protein backbone at the C-terminal of aspartic acid residues, and at the N-terminal of proline, depending on the precursor ion charge state. Interestingly, no evidence was found for gas-phase deglycosylation of the N-linked sugar in ribonuclease B, presumably due to effective competition from the facile amide bond cleavage channels that "protect" the N-linked glycosidic bond from cleavage. Thus, localization of the posttranslational modification site may be determined by analysis of the "protein fragment ion mass fingerprint".
Article
A two-dimensional liquid phase separation of proteins from whole cell lysates coupled on-line to an electrospray-ionization time-of-flight (ESI-TOF) mass spectrometer (MS) is used to map the protein content of ovarian surface epithelial cells (OSE) and an ovarian carcinoma-derived cell line (ES2). The two dimensions involve the use of liquid isoelectric focusing as the first phase and nonporous silica reversed-phase HPLC as the second phase of separation. Accurate molecular weight (MW) values are then obtained upon the basis of ESI-TOFMS so that an image of isoelectric point (pI) versus MW analogous to 2-D gel electrophoresis is produced. The accurate MW together with the pI fraction and corresponding hydrophobicity (%B) are used to tag each protein so that protein expression can be compared in interlysate studies. Each protein is also identified on the basis of matrix-assisted laser desorption-ionization (MALDI) TOFMS peptide mapping and intact MW so that a standard map is produced against which other cell lines can be compared. Quantitative changes in protein expression are measured in these interlysate comparisons using internal standards in the on-line ESI-TOFMS process. In the ovarian epithelial cell lines under study, it is shown that in the three pI fractions chosen for detailed analysis, over 50 unique proteins can be detected per fraction, of which 40% can be identified from web-based databases. It is also shown that when using an accurate MW to compare proteins in the OSE versus ovarian cancer sample, there are proteins highly expressed in cancer cells but not in normal cells. In addition, many of the proteins in the cancer sample appear to be down-regulated, as compared to the normal cells. This two-dimensional (2-D) liquid/mass mapping method may provide a means of studying proteins in interlysate comparisons not readily available by other methods.
Article
Five proteins present in a relatively complex mixture derived from a whole cell lysate fraction of E. coli have been concentrated, purified, and dissociated in the gas phase, using a quadrupole ion trap mass spectrometer. Concentration of intact protein ions was effected using gas-phase ion/ion proton-transfer reactions in conjunction with mass-to-charge dependent ion "parking" to accumulate protein ions initially dispersed over a range of charge states into a single lower charge state. Sequential ion isolation events interspersed with additional ion parking ion/ion reaction periods were used to "charge-state purify" the protein ion of interest. Five of the most abundant protein components present in the mixture were subjected to this concentration/purification procedure and then dissociated by collisional activation of their intact multiply charged precursor ions. Four of the five proteins were subsequently identified by matching the uninterpreted product ion spectra against a partially annotated protein sequence database, coupled with a novel scoring scheme weighted for the relative abundances of the experimentally observed product ions and the frequency of fragmentations occurring at preferential cleavage sites. The identification of these proteins illustrates the potential of this "top-down" protein identification approach to reduce the reliance on condensed-phase chemistries and extensive separations for complex protein mixture analysis.
Article
For analysis of intact proteins by mass spectrometry (MS), a new twist to a two-dimensional approach to proteome fractionation employs an acid-labile detergent instead of sodium dodecyl sulfate during continuous-elution gel electrophoresis. Use of this acid-labile surfactant (ALS) facilitates subsequent reversed-phase liquid chromatography (RPLC) for a net two-dimensional fractionation illustrated by transforming thousands of intact proteins from Saccharomyces cerevisiae to mixtures of 5-20 components (all within approximately 5 kDa of one another) for presentation via electrospray ionization (ESI) to a Fourier transform MS (FTMS). Between 3 and 13 proteins have been detected directly using ESI-FTMS (or MALDI-TOF), and the fractionation showed a peak capacity of approximately 400 between 0 and 70 kDa. A probability-based identification was made automatically from raw MS/MS data (obtained using a quadrupole-FTMS hybrid instrument) for one protein that differed from that predicted in a yeast database of approximately 19,000 protein forms. This ALS-PAGE/RPLC approach to proteome processing ameliorates the "front end" problem that accompanies direct analysis of whole proteins and assists the future realization of protein identification with 100% sequence coverage in a high-throughput format.
Article
When presented with a mixture of intact proteins, electrospray ionization with Fourier-transform mass spectrometry (ESI-FTMS) has the capability to obtain direct fragmentation information from isolated ions. However, the automation of this capability has not been achieved to date. We have developed software for unattended acquisition of protein tandem mass spectrometry (MS/MS) data and batch processing of the resulting files for identification of whole proteins. Mixtures of both protein standards (8-29 kDa) and Methanococcus jannaschii cytosolic proteins (up to six components + 20 kDa) were infused via an autosampler, and MS/MS data were acquired without human intervention. The acquisition software recognizes ESI charge state patterns, generates protein-specific isolation waveforms on-the-fly, and fragments ions using two different infrared laser times. In addition to protein standards, five wild-type proteins (7-14 kDa) were identified automatically with 100% sequence coverage from the M. jannaschii database. The software underpins a measurement platform for sample-dependent acquisition of MS/MS data for whole proteins, a critical step to realize proteomics with 100% sequence coverage in a higher throughput setting.
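As a rough illustration of what recognizing an ESI charge state pattern involves, the sketch below applies the standard textbook relation for two adjacent peaks of the same protein that differ by one proton. It is not the acquisition software described in the abstract; the function name and the example mass are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_and_mass(mz_lower_charge: float, mz_higher_charge: float):
    """Infer charge and neutral mass from two adjacent ESI peaks.

    mz_lower_charge  : m/z of the peak carrying z protons (the larger m/z)
    mz_higher_charge : m/z of the neighbouring peak carrying z + 1 protons
    """
    # From m1 = (M + z*p)/z and m2 = (M + (z+1)*p)/(z+1): z = (m2 - p)/(m1 - m2)
    z = round((mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge))
    mass = z * (mz_lower_charge - PROTON)
    return z, mass


# Hypothetical 10 kDa protein observed at charge states 10+ and 11+.
true_mass = 10000.0
peak_10 = (true_mass + 10 * PROTON) / 10
peak_11 = (true_mass + 11 * PROTON) / 11

z, mass = charge_and_mass(peak_10, peak_11)
print(z, round(mass, 3))
```

A real implementation would have to work with isotopic envelopes, noise, and overlapping species; the two-peak relation only shows the underlying idea.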
Article
Secreted proteins of Mycobacterium tuberculosis are implicated in its disease pathogenesis and so are considered as potential diagnostic and vaccine candidates. The search for these has been slow, even though the entire genome sequence of M. tuberculosis is now available; of the 620 protein spots resolved by 2-D gel electrophoresis, 114 secreted proteins have been identified, but for only 13 has the primary structure been partly characterized. For comparison, in this top down mass spectrometry (MS) approach the secreted proteins were precipitated from cell culture filtrate, resuspended, and examined directly by electrospray ionization (ESI) Fourier transform MS. The ESI spectra of three precipitates showed 93, 535, and 369 molecular weight (M(r)) values, for a total of 689 different values. Only approximately 10% of these values matched (±1 Da) the DNA-predicted M(r) values, and even these identifications were unreliable. Of nine molecular ions characterized by MS/MS, only one protein match was confirmed, and its isotopic molecular ions were overlapped by those of another protein. MS/MS identified a total of ten proteins by sequence tag search, of which three were unidentified previously. The low success of M(r) matching was due to unusually extensive posttranslational modifications, including loss of a signal sequence, loss of the N-terminal residue, proteolytic degradation, oxidation, and glycosylation. Although in eubacteria the latter is relatively rare, a 9 kDa protein showed 7 hexose attachments and two 20 kDa proteins each had 20 attachments. For MS/MS, electron capture dissociation was especially effective.
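The M(r) matching step described in this abstract amounts to a tolerance lookup of measured masses against DNA-predicted masses. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the mass lists are invented for illustration, not taken from the study. One measured value is deliberately offset by about 162 Da, the mass of a hexose residue, to show how a modified protein escapes a ±1 Da match.

```python
from bisect import bisect_left, bisect_right


def match_masses(measured, predicted, tol=1.0):
    """Map each measured mass to the predicted masses within +/- tol Da."""
    ordered = sorted(predicted)
    hits = {}
    for m in measured:
        lo = bisect_left(ordered, m - tol)
        hi = bisect_right(ordered, m + tol)
        if lo < hi:
            hits[m] = ordered[lo:hi]
    return hits


# Hypothetical values in Da (not data from the study).
predicted = [9876.4, 12050.0, 20110.7]
measured = [
    9877.0,    # within 1 Da of a predicted mass
    12212.1,   # predicted 12050.0 plus roughly one hexose (about 162 Da)
    15000.3,   # no counterpart in the predicted list
]

hits = match_masses(measured, predicted)
matched_fraction = len(hits) / len(measured)
print(hits, matched_fraction)
```

Only the unmodified entry matches here, which mirrors why the abstract turns to MS/MS and sequence tags rather than relying on M(r) values alone.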
  • F Meng
  • B J Cargile
  • L M Miller
  • A J Forbes
  • J R Johnson
  • N L Kelleher
Meng, F.; Cargile, B. J.; Miller, L. M.; Forbes, A. J.; Johnson, J. R.; Kelleher, N. L. Nat. Biotechnol. 2001, 19, 952-957.
  • Y Ge
  • B G Lawhorn
  • M Elnaggar
  • E Strauss
  • J H Park
  • T P Begley
  • F W Mclafferty
Ge, Y.; Lawhorn, B. G.; ElNaggar, M.; Strauss, E.; Park, J. H.; Begley, T. P.; McLafferty, F. W. J. Am. Chem. Soc. 2002, 124, 672-678.
  • T L Ricca
  • T.-C L Wang
  • A G Marshall
Ricca, T. L.; Wang, T.-C. L.; Marshall, A. G. J. Am. Chem. Soc. 1985, 107, 7893-7897.
  • M E Belov
  • E N Nikolaev
  • G A Anderson
  • K J Auberry
  • R Harkewicz
  • R D J Smith
(27) Belov, M. E.; Nikolaev, E. N.; Anderson, G. A.; Auberry, K. J.; Harkewicz, R.; Smith, R. D. J. Am. Soc. Mass Spectrom. 2001, 12, 38-48.
  • M R Wilkins
  • E Gasteiger
  • C H Wheeler
  • I Lindskog
  • J C Sanchez
  • A Bairoch
  • R D Appel
  • M J Dunn
  • D F Hochstrasser
(28) Wilkins, M. R.; Gasteiger, E.; Wheeler, C. H.; Lindskog, I.; Sanchez, J. C.; Bairoch, A.; Appel, R. D.; Dunn, M. J.; Hochstrasser, D. F. Electrophoresis 1998, 19. (29) Patrie, S. M.; Molen, D. V.; Robinson, D.; Johnson, J. R.; Quinn, J. P.; Hendrickson, C. L.; Marshall, A. G.; Kelleher, N. L. In 50th ASMS Conference on Mass Spectrometry and Allied Topics: Orlando, FL, 2002.
  • N L Kelleher
  • S V Taylor
  • D Grannis
  • C Kinsland
  • H J Chiu
  • T Begley
Kelleher, N. L.; Taylor, S. V.; Grannis, D.; Kinsland, C.; Chiu, H. J.; Begley, T. P.; McLafferty, F. W. Protein Sci. 1998, 7, 1796-1801.
  • Y Ge
  • M Elnaggar
  • S K Sze
  • H Oh
  • T P Begley
  • F W Mclafferty
  • H Boshoff
  • C E Barry
Ge, Y.; ElNaggar, M.; Sze, S. K.; Bin Oh, H.; Begley, T. P.; McLafferty, F. W.; Boshoff, H.; Barry, C. E. J. Am. Soc. Mass Spectrom. 2003, 14, 253- 261.
  • K Biemann
  • I Papayannopoulos
Biemann, K.; Papayannopoulos, I. Acc. Chem. Res. 1994, 27, 370-378.
  • W Zhang
  • B Chait
Zhang, W.; Chait, B. Anal. Chem. 2000, 72, 1918-1924.
  • Y Oda
  • T Nagasu
  • B T Chait
(12) Oda, Y.; Nagasu, T.; Chait, B. T. Nat. Biotechnol. 2001, 19, 379-382. (13) Steen, H.; Kuster, B.; Fernandez, M.; Pandey, A.; Mann, M. Anal. Chem. 2001, 73, 1440-1448.
  • S Ficarro
  • M Mccleland
  • P Stukenberg
  • D Burke
  • M Ross
  • J Shabanowitz
  • D Hunt
  • F White
Ficarro, S.; McCleland, M.; Stukenberg, P.; Burke, D.; Ross, M.; Shabanowitz, J.; Hunt, D.; White, F. Nat. Biotechnol. 2002, 20, 301-305.
  • M J Maccoss
  • W H Mcdonald
  • A Saraf
  • R Sadygov
  • J M Clark
  • J J Tasto
  • K L Gould
  • D Wolters
  • M Washburn
  • A Weiss
  • J I Clark
  • J R Yates
  • Iii
MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; Clark, J. I.; Yates, J. R., III. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 7900-7905.
  • H Zhou
  • J D Watts
  • R Aebersold
Zhou, H.; Watts, J. D.; Aebersold, R. Nat. Biotechnol. 2001, 19, 375-378. (11) Goshe, M. B.; Conrads, T. P.; Panisko, E. A.; Angell, N. H.; Veenstra, T. D.; Smith, R. D. Anal. Chem. 2001, 73, 2578-2586.