ABSTRACT: It has long been understood that it is proteins, expressed and post-translationally modified, that are the primary regulators of both the fate and the function of cells. The ability to measure differences in the expression of the constellation of unique protein forms (proteoforms) with complete molecular specificity has the potential to sharply improve the return on investment for mass spectrometry-based proteomics in translational research and clinical diagnostics.
Expert Review of Proteomics 10/2014; · 3.90 Impact Factor
ABSTRACT: Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics, we developed a method for the global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The GCF network, comprising 11,422 gene clusters grouped into 4,122 GCFs, was validated in hundreds of strains by correlating confident mass spectrometric detection of known small molecules with the presence or absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products, an approach that will enable de novo, bioassay-free discovery of new natural products using large data sets. Extrapolation from the 830-genome data set reveals that Actinobacteria encode hundreds of thousands of future drug leads, and the strong correlation between phylogeny and GCFs frames a roadmap to efficiently access them.
Nature Chemical Biology 09/2014; · 12.95 Impact Factor
ABSTRACT: Overexpression of the histone methyltransferase MMSET in t(4;14)+ multiple myeloma patients is believed to be the driving factor in the pathogenesis of this subtype of myeloma. MMSET catalyzes dimethylation of lysine 36 on histone H3 (H3K36me2), and its overexpression causes a global increase in H3K36me2, redistributing this mark in a broad, elevated level across the genome. Here, we demonstrate that an increased level of MMSET also induces a global reduction of lysine 27 trimethylation on histone H3 (H3K27me3). Despite the net decrease in H3K27 methylation, specific genomic loci exhibit enhanced recruitment of the EZH2 histone methyltransferase and become hypermethylated on this residue. These effects likely contribute to the myeloma phenotype since MMSET-overexpressing cells displayed increased sensitivity to EZH2 inhibition. Furthermore, we demonstrate that such MMSET-mediated epigenetic changes require a number of functional domains within the protein, including PHD domains that mediate MMSET recruitment to chromatin. In vivo, targeting of MMSET by an inducible shRNA reversed histone methylation changes and led to regression of established tumors in athymic mice. Together, our work elucidates previously unrecognized interplay between MMSET and EZH2 in myeloma oncogenesis and identifies domains to be considered when designing inhibitors of MMSET function.
ABSTRACT: The automated processing of data generated by top down proteomics would benefit from improved scoring for protein identification and characterization of highly related protein forms (proteoforms). Here we propose the "C-score" (short for Characterization Score), a Bayesian approach to the proteoform identification and characterization problem, implemented within a framework to allow the infusion of expert knowledge into generative models that take advantage of known properties of proteins and top down analytical systems (e.g., fragmentation propensities, "off-by-1 Da" discontinuous errors, and intelligent weighting for site-specific modifications). The performance of the scoring system based on the initial generative models was compared to the current probability-based scoring system used within both ProSightPC and ProSightPTM on a manually curated set of 295 human proteoforms. The current implementation of the C-score framework generated a marked improvement over the existing scoring system as measured by the area under the curve on the resulting ROC chart (AUC of 0.99 versus 0.78).
Journal of Proteome Research 06/2014; · 5.06 Impact Factor
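The comparison above rests on the area under the ROC curve (AUC). As a rough illustration only, and not the C-score itself, the sketch below computes an AUC with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen correct identification outscores a randomly chosen incorrect one. The function name and all scores are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """Rank-based AUC: fraction of (correct, incorrect) pairs in which the
    correct hit has the higher score; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for correct and incorrect proteoform assignments
correct = [0.95, 0.90, 0.80, 0.70]
incorrect = [0.60, 0.75, 0.30]
print(round(roc_auc(correct, incorrect), 3))  # 11 of 12 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.99 versus 0.78 should be read.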
ABSTRACT: We employ stable isotope labelling and quantitative mass spectrometry to track histone methylation stability. We show that H3 trimethyl K9 and K27 are slow to be established on new histones and slow to disappear from old histones, with half-lives of multiple cell divisions. By contrast, the transcription-associated marks K4me3 and K36me3 turn over far more rapidly, with half-lives of 6.8 h and 57 h, respectively. Inhibition of demethylases increases K9 and K36 methylation, with K9 showing the largest and most robust increase. We interpret the different turnover rates in light of genome-wide localization data and transcription-dependent nucleosome rearrangements proximal to the transcription start site.
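The half-lives above can be converted into turnover rate constants with the standard first-order relation k = ln 2 / t½. The sketch below assumes simple exponential decay and ignores dilution of old histones by cell division, so it is only a rough illustration of how differently the two transcription-associated marks behave over a day; the 24 h window and the helper names are hypothetical.

```python
import math

def rate_constant(half_life_h):
    """First-order turnover rate constant k (per hour) from a half-life."""
    return math.log(2) / half_life_h

def fraction_remaining(t_h, half_life_h):
    """Fraction of a pre-existing mark left after t_h hours, assuming
    simple exponential (first-order) decay and no replication dilution."""
    return math.exp(-rate_constant(half_life_h) * t_h)

# Half-lives reported in the abstract; the 24 h window is an arbitrary example
for mark, t_half in [("H3K4me3", 6.8), ("H3K36me3", 57.0)]:
    print(f"{mark}: k = {rate_constant(t_half):.3f} per h, "
          f"{fraction_remaining(24, t_half):.0%} of old mark left after 24 h")
```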
ABSTRACT: With the prospect of resolving whole protein molecules into their myriad proteoforms on a proteomic scale, the question of their quantitative analysis in discovery mode comes to the fore. Here, we demonstrate a robust pipeline for the identification and stringent scoring of abundance changes of whole protein forms <30 kDa in a complex system. The input is 100–400 μg of total protein for each biological replicate, and the outputs are graphical displays depicting statistical confidence metrics for each proteoform (i.e., a volcano plot and representations of the technical and biological variation). A key part of the pipeline is the hierarchical linear model that is tailored to the original design of the study. Here, we apply this new pipeline to measure the proteoform-level effects of deleting a histone deacetylase (rpd3) in S. cerevisiae. Over 100 proteoform changes were detected above a 5% false positive threshold in WT vs the Δrpd3 mutant, including the validating observation of hyperacetylation of histone H4 and both H2B isoforms. Ultimately, this approach to label-free top down proteomics in discovery mode is a critical technical advance for testing the hypothesis that whole proteoforms can link more tightly to complex phenotypes in cell and disease biology than do peptides created in shotgun proteomics.
ABSTRACT: Integral membrane proteins (IMPs) are of great biophysical and clinical interest because of the key role they play in many cellular processes. Here, a comprehensive top down study of 152 IMPs and 277 soluble proteins from human H1299 cells, including 11,087 fragments obtained from collisionally activated dissociation (CAD), 6,452 from higher-energy collisional dissociation (HCD) and 2,981 from electron transfer dissociation (ETD), shows the utility and complementarity of these three methods for the identification and characterization of IMPs. A central finding is that ETD is ~2-fold more likely than threshold fragmentation methods to cleave in soluble regions, whereas the reverse holds in transmembrane domains, with an observed ~4-fold bias towards CAD and HCD. The location of charges just prior to dissociation is consistent with this directed fragmentation: protons remain localized on basic residues during ETD but mobilize easily along the backbone during collisional activation. The fragmentation driven by these mobile protons, which is most often observed in transmembrane domains, gives higher yields and occurs over a greater number of backbone cleavage sites. Further, while threshold dissociation events in transmembrane domains are on average 10.1 (CAD) and 9.2 (HCD) residues distant from the nearest charge site (R, K, H, N-terminus), fragmentation is strongly influenced by the N- or C-terminal position relative to that site: the ratio of observed b to y fragments is ~1:3 if the cleavage occurs >7 residues N-terminal, and ~3:1 if it occurs >7 residues C-terminal to the nearest basic site. Threshold dissociation products driven by a mobilized proton appear to depend strongly not only on the relative position of a charge site but also on the N- or C-terminal directionality of proton movement.
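The positional rule described above can be made concrete with a small helper. This is a minimal sketch under stated assumptions: charge sites are R, K, H and the N-terminus (taken as residue 0), "distance" counts the residues strictly between the site and the cleaved bond, and the >7-residue cutoff and b/y trends are copied from the abstract. The study's exact distance convention may differ, and the function names and test sequence are hypothetical.

```python
CHARGE_RESIDUES = frozenset("RKH")

def nearest_charge_site(seq, bond):
    """Find the charge site closest to the backbone bond after residue index
    `bond` (0-based). Returns (distance, side): side is 'N' if the cleavage
    lies N-terminal to the site and 'C' if C-terminal; distance counts the
    residues strictly between the site and the bond (assumed convention)."""
    if not 0 <= bond < len(seq) - 1:
        raise ValueError("bond must lie between two residues")
    # The N-terminus (residue 0) always counts as a charge site
    sites = {0} | {i for i, aa in enumerate(seq) if aa in CHARGE_RESIDUES}
    best = None
    for s in sorted(sites):
        if bond >= s:
            candidate = (bond - s, "C")
        else:
            candidate = (s - bond - 1, "N")
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best

def expected_bias(distance, side, cutoff=7):
    """Map the geometry onto the b/y trend reported in the abstract."""
    if distance <= cutoff:
        return "no strong bias expected"
    return "y-favored (~1:3 b:y)" if side == "N" else "b-favored (~3:1 b:y)"

seq = "M" + "L" * 20 + "R" + "LL"  # hypothetical hydrophobic stretch
for bond in (5, 12):
    dist, side = nearest_charge_site(seq, bond)
    print(bond, dist, side, expected_bias(dist, side))
```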
ABSTRACT: Pilot Project #1, the identification and characterization of human histone H4 proteoforms by top-down mass spectrometry, is the first project launched by the Consortium for Top-down Proteomics (CTDP) to refine and validate top-down mass spectrometry. In the initial results from seven participating laboratories, all reported the probability-based identification of human histone H4 (UniProt accession P62805), with expectation values ranging from 10^-13 to 10^-105. Regarding characterization, a total of 74 proteoforms were reported, 21 of them unambiguously; one new post-translational modification (PTM), K79ac, was identified. Inter-laboratory comparison reveals aspects of the results that are consistent, such as the localization of individual PTMs and binary combinations, while other aspects are more variable, such as the accurate characterization of low-abundance proteoforms harboring >2 PTMs. An open-access tool and a discussion of proteoform scoring are included, along with a description of the general challenges that lie ahead, including improved proteoform separations prior to mass spectrometric analysis, better instrumentation performance, and software development.
ABSTRACT: The direct analysis of intact proteins via mass spectrometry offers compelling advantages over alternative methods because of the direct and unambiguous identification and characterization of protein sequences it provides. The inability to analyze proteins in the 'middle mass range', defined here as proteins of 30-80 kDa, efficiently and robustly has limited the adoption of these "top-down" methods. These limitations, largely a result of poor liquid chromatographic performance, may be addressed by alternative separations that replace chromatography. Herein, capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry (CZE-ESI-MS/MS), with its short migration times, has been extended to size-sorted whole proteins in complex mixtures from Pseudomonas aeruginosa PA01. An electrokinetically pumped nanospray interface, a coated capillary and a stacking method for on-column sample concentration were developed to achieve high loading capacity and separation resolution. We achieved peak widths (full width at half maximum) of 8-16 s for model proteins up to 29 kDa and identified 30 proteins in the mass range of 30-80 kDa from Pseudomonas aeruginosa PA01 whole cell lysate. These results suggest that CZE-ESI-MS/MS is capable of identifying proteins in the middle mass range in top-down proteomics.
ABSTRACT: Intact protein characterization using mass spectrometry has thus far been achieved at the cost of throughput. Presented here is the application of 193 nm ultraviolet photodissociation (UVPD) for the online top down identification and characterization of proteins in complex mixtures. Liquid chromatographic separation at the intact protein level coupled with fast UVPD and high-resolution detection resulted in the confident identification of 46 unique sequences from prepared E. coli ribosomes, compared to 44 using HCD. Importantly, nearly all proteins identified in both the UVPD and optimized HCD analyses showed a substantial increase in identification confidence (an average decrease in E value of ~40 orders of magnitude) due to the higher number of matched fragment ions. Also shown is the potential for high-throughput characterization of intact proteins via LC-UVPD-MS of molecular weight-based fractions of an S. cerevisiae lysate. In total, protein products from 215 genes were identified, found in 292 distinct proteoforms, 168 of which contained some type of post-translational modification.
ABSTRACT: The ability to study organisms by direct mass spectrometric analysis of their proteomes without digestion has benefited greatly from recent advances in separation techniques, instrumentation, and bioinformatics. However, improvements to data acquisition logic have lagged in comparison. Past workflows for Top Down Proteomics (TDP) have focused on high throughput at the expense of maximal protein coverage and characterization. This mode of data acquisition has led to enormous overlap in the identification of highly abundant proteins across successive LC-MS injections. Furthermore, a wealth of data is left underutilized when each newly targeted species is analyzed as unique, rather than as part of a collection of fragmentation events on a distinct proteoform. Here, we present a major advance in software for the acquisition of TDP data: a fully automated workflow able to detect intact masses, guide fragmentation to achieve maximal identification and characterization of intact protein species, and perform database searches online to yield real-time protein identifications. On Pseudomonas aeruginosa, the software combines fragmentation events of the same precursor with previously obtained fragments, improving characterization of the target form by an average of 42 orders of magnitude in confidence. When HCD fragmentation optimization was applied to intact protein ions, there was an 18.5 order of magnitude gain in confidence. These improved metrics set the stage for increased proteome coverage and characterization of higher organisms in the future, and for sharply improved control over MS instruments in a project- and lab-wide context.
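The idea of pooling fragmentation events on the same precursor can be illustrated by accumulating matched backbone cleavages across several MS/MS events. This hypothetical sketch measures only the fraction of inter-residue bonds explained by at least one fragment; it does not reproduce the software's scoring or the confidence gains reported above, and the function name and bond indices are invented for illustration.

```python
def pooled_bond_coverage(seq_length, events):
    """Pool matched backbone cleavages from several fragmentation events of
    the same precursor. Bonds are numbered 1..seq_length - 1; returns the
    pooled set and the fraction of bonds explained by any event."""
    n_bonds = seq_length - 1
    pooled = set()
    for matched_bonds in events:
        # Ignore indices outside the valid bond range
        pooled |= {b for b in matched_bonds if 1 <= b <= n_bonds}
    return pooled, len(pooled) / n_bonds

# Hypothetical matched bonds from three MS/MS events on an 11-residue precursor
events = [{1, 2, 3}, {3, 4, 8}, {9, 10}]
bonds, fraction = pooled_bond_coverage(11, events)
print(sorted(bonds), fraction)
```

Here each event alone explains 20-30% of the bonds, while the pooled set explains 70%.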
ABSTRACT: In the developing mammalian brain, inhibition of NMDA receptors can induce widespread neuroapoptosis, inhibit neurogenesis and cause impairment of learning and memory. Although some mechanistic insights into the adverse neurological actions of NMDA receptor antagonists exist, our understanding of the full spectrum of developmental events affected by early exposure to these agents in the brain is still limited. Here we attempt to gain insights into the impact of a pharmacologically induced excitatory/inhibitory imbalance in infancy on the brain proteome using mass spectrometric imaging (MSI). Our goal was to study changes in protein expression in postnatal day 10 (P10) rat brains following neonatal exposure to the NMDA receptor antagonist dizocilpine (MK801). Comparison of MALDI MS images of rat brains exposed to vehicle or MK801 revealed differential relative abundances of several proteins. We then identified several of these markers, including ubiquitin, Purkinje cell protein 4 (PEP-19), cytochrome c oxidase subunits and calmodulin, by a combination of reversed-phase (RP) HPLC fractionation and a top-down tandem MS platform. A more in-depth, large-scale study, along with validation experiments, will be carried out in the future. Overall, our findings indicate that a brief neonatal exposure to a compound that alters excitatory/inhibitory balance in the brain has a long-term effect on protein expression patterns during subsequent development, highlighting the utility of MALDI-MSI as a discovery tool for potential biomarkers.
PLoS ONE 01/2014; 9(4):e92831. · 3.53 Impact Factor
ABSTRACT: The rise of the "Top Down" method in mass spectrometry-based proteomics has ushered in a new age of promise and challenge for the characterization and identification of proteins. Injecting intact proteins into the mass spectrometer allows for better characterization of post-translational modifications and avoids several of the serious "inference" problems associated with peptide-based proteomics. However, successful implementation of a Top Down approach on endogenous or other biologically relevant samples often requires one or more forms of separation prior to mass spectrometric analysis, and these separations have only begun to mature for whole-protein MS. Recent advances in instrumentation have been combined with new ion fragmentation methods using photons and electrons that allow better (and often complete) protein characterization in cases simply not tractable just a few years ago. Finally, native electrospray mass spectrometry has shown great promise for the identification and characterization of whole protein complexes in the 100 kDa to 1 MDa regime, making complete compositional analysis of endogenous protein assemblies a viable goal over the coming few years.
Biochemical and Biophysical Research Communications 01/2014; · 2.28 Impact Factor
ABSTRACT: Site-specific incorporation of non-standard amino acids (NSAAs) into proteins enables the creation of biopolymers, proteins, and enzymes with new chemical properties, new structures, and new functions. To achieve this, amber (TAG codon) suppression has been widely applied. However, suppression efficiency is limited by competition with translation termination by release factor 1 (RF1), which leads to truncated products. Recently, we constructed a genomically recoded Escherichia coli strain lacking RF1 in which 13 occurrences of the amber stop codon have been reassigned to the synonymous TAA codon (rEc.E13.∆prfA). Here, we assessed and characterized cell-free protein synthesis (CFPS) in crude S30 cell lysates derived from this strain. We observed the synthesis of approximately 190 ± 20 μg/mL of modified soluble superfolder green fluorescent protein (sfGFP) containing a single p-propargyloxy-L-phenylalanine (pPaF) or p-acetyl-L-phenylalanine. Compared to the parent rEc.E13 strain with RF1, this represents an improvement in modified sfGFP synthesis of more than 250%. Beyond introducing a single NSAA, we further demonstrated the benefits of CFPS from RF1-deficient strains for incorporating pPaF at two and five sites per sfGFP protein. Finally, we compared our crude S30 extract system to the PURE translation system lacking RF1. We observed that our S30 extract-based approach is more cost-effective and higher yielding than the PURE translation system lacking RF1, by ~1000-fold on a milligrams of protein produced per dollar basis. Looking forward, using RF1-deficient strains for extract-based CFPS will aid in the synthesis of proteins and biopolymers with site-specifically incorporated NSAAs.
ABSTRACT: Mammalian circadian rhythm is maintained by the suprachiasmatic nucleus (SCN) via an intricate set of neuropeptides and other signaling molecules. In this work, peptidomic data from two times of day were examined to characterize variation in SCN peptides using three different label-free quantitation approaches: spectral count, spectral index and SIEVE. Of the 448 identified peptides, 207 were analyzed by two label-free methods, spectral count and spectral index. Twenty-four peptides showed significantly different abundances (adjusted p-value <0.01) between daytime and nighttime, including multiple peptides derived from secretogranin II, cocaine- and amphetamine-regulated transcript, and proprotein convertase subtilisin/kexin type 1 inhibitor. Interestingly, more peptides were analyzable and had significantly different abundances between the two time points using the spectral count and spectral index methods than in a prior analysis of the same data using the SIEVE method. The results of this study reveal the importance of choosing appropriate data analysis approaches for label-free relative quantitation of peptides. The detection of significant changes in so rich a set of neuropeptides reflects the dynamic nature of the SCN and the many influences, such as feeding behavior, on circadian rhythm. Using spectral count and spectral index, peptide level changes were correlated with time of day, suggesting a key role for these peptides in circadian function.
ABSTRACT: The use of proteomics for direct detection of expressed pathways producing natural products has yielded many new compounds, even when used in a screening mode without a bacterial genome sequence available. Here we quantify the advantages of having a draft DNA sequence available for strain-specific proteomics, using the latest in ultrahigh-resolution mass spectrometry for both proteins and the small molecules they generate. Using the draft sequence of Streptomyces lilacinus NRRL B-1968, we show a >tenfold increase in the number of peptide identifications vs. using publicly available databases. Six expressed gene clusters with varying homology to known clusters were detected in this strain. To date, we have identified three of these clusters as encoding the production of griseobactin (known), rakicidin D (an orphan NRPS/PKS hybrid cluster), and a putative Thr- and DHB-containing siderophore produced by a new non-ribosomal peptide synthetase gene cluster. The remaining three clusters show lower homology to known clusters and likely encode enzymes for the production of novel compounds. Using an interpreted strain-specific DNA sequence enables deep proteomics for the detection of multiple pathways and their encoded natural products in a single cultured bacterium.
Journal of Industrial Microbiology 11/2013; · 1.80 Impact Factor
ABSTRACT: Native mass spectrometry (MS) is becoming an important, integral part of structural proteomics and systems biology research. The approach holds great promise for elucidating protein structure at multiple levels, from primary to quaternary. This requires the most efficient use of tandem MS, which is the cornerstone of MS-based approaches. In this work, we advance a two-step fragmentation approach, or (pseudo-)MS3, from native protein complexes to a set of constituent fragment ions. Using an efficient desolvation approach and quadrupole selection in the extended mass-to-charge (m/z) range, we accomplished sequential dissociation of large protein complexes, such as phosphorylase B (194 kDa), pyruvate kinase (232 kDa) and GroEL (801 kDa), into highly charged monomers, which were then dissociated into sets of multiply charged fragment ions. Fragment ion signals were acquired with a high-resolution, high-mass-accuracy Orbitrap instrument, enabling highly confident identification of the precursor monomer subunits. The developed approach is expected to enable characterization of the stoichiometry and composition of endogenous native protein complexes at an unprecedented level of detail.
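Why the extended m/z range matters can be seen from the m/z of an [M + zH]z+ ion, (M + z·m_p)/z, where m_p ≈ 1.007276 Da is the proton mass. Native complexes carry relatively few charges for their mass, so they appear at high m/z. The sketch below uses a hypothetical 800 kDa neutral mass (roughly GroEL-sized) and hypothetical charge states; these are not values from the study, and the helper names are illustrative.

```python
PROTON_MASS = 1.007276  # Da

def mz(neutral_mass, z):
    """m/z of an [M + zH]z+ ion formed by adding z protons."""
    return (neutral_mass + z * PROTON_MASS) / z

def neutral_mass(mz_value, z):
    """Invert mz(): recover the neutral mass from an observed m/z and charge."""
    return mz_value * z - z * PROTON_MASS

# Hypothetical charge states for a hypothetical ~800 kDa complex
for z in (40, 70):
    print(z, round(mz(800_000.0, z), 1))
```

Fewer charges push the same complex to higher m/z, which is why selecting intact native assemblies calls for a quadrupole that operates in an extended m/z range.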
ABSTRACT: Cellular senescence, an irreversible cell cycle arrest induced by a variety of stimuli, has been considered an innate tumor-suppressing mechanism with implications and applications in cancer therapy. Using a targeted proteomics approach, we show that fibroblasts induced into senescence by expression of oncogenic Ras exhibit a decrease in global acetylation on all core histones, consistent with formation of senescence-associated heterochromatic foci. We also detected clear increases in repressive marks (e.g., >50% elevation of H3K27me2/3) along with decreases in histone marks associated with transcriptional expression/elongation (e.g., H3K36me2/3). Despite the increases in repressive chromatin marks, 179 loci (of 2,206 total) were found to be upregulated by global quantitative proteomics. The changes in the cytosolic proteome indicated upregulation of mitochondrial proteins and downregulation of proteins involved in glycolysis. These alterations in primary metabolism are the opposite of the well-known Warburg effect observed in cancer cells. This study significantly improves our understanding of stress-induced senescence and suggests a potential application in anti-proliferative strategies that trigger senescence by targeting primary metabolism in cancer cells.
ABSTRACT: Top Down proteomics is emerging as a viable method for the routine identification of hundreds to thousands of proteins. In this work we report the largest Top Down study to date, with the identification of 1,220 proteins from the transformed human cell line H1299 at a false discovery rate of 1%. Multiple separation strategies were used, including the focused isolation of mitochondria, resulting in significantly improved proteome coverage over previous work. In all, 347 mitochondrial proteins were identified, including ~50% of the mitochondrial proteome below 30 kDa and over 75% of the subunits comprising the large complexes of oxidative phosphorylation. Three hundred of the identified proteins were found to be integral membrane proteins containing between 1 and 12 transmembrane helices, identified without any specific enrichment or modified LC-MS parameters. Over 5,000 proteoforms were observed, many harboring post-translational modifications, including over a dozen proteins with lipid anchors (some previously unknown) and many others with phosphorylation and methylation. Comparison between untreated and senescent H1299 cells revealed several changes to the proteome, including the hyperphosphorylation of HMGA2. This work illustrates the burgeoning ability of Top Down proteomics to characterize large numbers of intact proteoforms in a high-throughput fashion.
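The abstract reports identifications at a 1% false discovery rate but does not say here how that rate was estimated. One common approach is target-decoy searching, sketched below as an assumption rather than the study's method: hits are ranked by score, the FDR at each threshold is estimated as decoys / targets, and all targets above the most permissive threshold with an estimated FDR of at most 1% are accepted. The scoring convention and function name are hypothetical.

```python
def targets_at_fdr(hits, max_fdr=0.01):
    """hits: iterable of (score, is_decoy), higher score = better.
    Returns the number of target hits accepted at the most permissive score
    threshold whose estimated FDR (decoys / targets) is <= max_fdr."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    accepted = 0
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate a threshold only after every hit tied at this score is counted
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            accepted = targets
    return accepted
```

Evaluating thresholds only at the end of a run of tied scores keeps tied hits from being split across the cutoff; some pipelines instead estimate FDR as 2 × decoys / (targets + decoys).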
ABSTRACT: Mass spectrometry-based proteomics generally seeks to identify and characterize protein molecules with high accuracy and throughput. Recent speed and quality improvements to the individual steps of integrated platforms have removed many limitations to the robust implementation of top down proteomics (TDP) for proteins below 70 kDa. Improved intact protein separations coupled to high-performance instruments have increased the quality and number of protein and proteoform identifications. To date, TDP applications have shown >1000 protein identifications, expanding to an average of ∼3-4 more proteoforms for each protein detected. In the near future, increased fractionation power, new mass spectrometers and improvements in proteoform scoring will combine to accelerate the application and impact of TDP on this century's biomedical problems.
Current Opinion in Chemical Biology 08/2013; · 8.30 Impact Factor