An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets

Department of Cell Biology, 240 Longwood Ave., Harvard Medical School, Boston, Massachusetts 02115, USA.
Nature Biotechnology (Impact Factor: 41.51). 12/2005; 23(11):1391-8. DOI: 10.1038/nbt1146
Source: PubMed


With the recent exponential increase in protein phosphorylation sites identified by mass spectrometry, a unique opportunity has arisen to understand the motifs surrounding such sites. Here we present an algorithm designed to extract motifs from large data sets of naturally occurring phosphorylation sites. The methodology relies on the intrinsic alignment of phospho-residues and the extraction of motifs through iterative comparison to a dynamic statistical background. Results show the identification of dozens of novel and known phosphorylation motifs from recently published serine, threonine and tyrosine phosphorylation studies. When applied to a linguistic data set to test the versatility of the approach, the algorithm successfully extracted hundreds of language motifs. This method, in addition to shedding light on the consensus sequences of identified and as yet unidentified kinases and modular protein domains, may also eventually be used as a tool to determine potential phosphorylation sites in proteins of interest.
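The iterative comparison to a dynamic background described above can be illustrated with a small sketch. The abstract does not state the scoring statistic or thresholds, so everything below is an assumption made for illustration: enrichment is scored with a binomial tail probability, and the names `binomial_tail`, `most_significant_pair`, `extract_motif`, `alpha`, and `min_count` are hypothetical. Sequences are assumed to be equal-width windows already aligned on the modified residue at index `center`.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p). Assumed scoring statistic."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def most_significant_pair(foreground, background, fixed, center):
    """Find the most enriched (position, residue) pair not yet fixed.

    Returns (p_value, position, residue, count), or None if nothing can be scored.
    """
    best = None
    n = len(foreground)
    for pos in range(len(foreground[0])):
        if pos == center or pos in fixed:
            continue
        for res in sorted({s[pos] for s in foreground}):
            k = sum(s[pos] == res for s in foreground)
            p_bg = sum(s[pos] == res for s in background) / len(background)
            if p_bg == 0:
                continue  # residue absent from background: no frequency estimate
            pval = binomial_tail(k, n, p_bg)
            if best is None or pval < best[0]:
                best = (pval, pos, res, k)
    return best


def extract_motif(foreground, background, center, alpha=1e-6, min_count=2):
    """Greedily fix significant residue/position pairs.

    After each pair is fixed, BOTH sets are reduced to the sequences that
    carry it, so the background changes from one iteration to the next.
    """
    fixed = {}
    while foreground and background:
        hit = most_significant_pair(foreground, background, fixed, center)
        if hit is None or hit[0] >= alpha or hit[3] < min_count:
            break
        _, pos, res, _ = hit
        fixed[pos] = res
        foreground = [s for s in foreground if s[pos] == res]
        background = [s for s in background if s[pos] == res]
    return fixed
```

Called on a foreground in which proline always follows the central serine and a background in which it does so only rarely, `extract_motif` would return a mapping such as `{3: "P"}` for five-residue windows, that is, the pattern `..SP.`. Filtering the background together with the foreground is what makes the background "dynamic" in this sketch: later positions are tested only against background sequences that already match the partial motif.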

  • Source
    • "To assess a potential preference for certain kinase motifs in the five phosphopeptides data sets, we subjected the here-identified phosphorylation sites first to the motif-x algorithm (Schwartz and Gygi, 2005). A total of 132 distinct motifs could be defined among all the five proteases data sets (Table S3)."
    ABSTRACT: Although mass-spectrometry-based screens enable thousands of protein phosphorylation sites to be monitored simultaneously, they often do not cover important regulatory sites. Here, we hypothesized that this is due to the fact that nearly all large-scale phosphoproteome studies are initiated by trypsin digestion. We tested this hypothesis using multiple proteases for protein digestion prior to Ti(4+)-IMAC-based enrichment. This approach increases the size of the detectable phosphoproteome substantially and confirms the considerable tryptic bias in public repositories. We define and make available a less biased human phosphopeptide atlas of 37,771 unique phosphopeptides, correlating to 18,430 unique phosphosites, of which fewer than 1/3 were identified in more than one protease data set. We demonstrate that each protein phosphorylation site can be linked to a preferred protease, enhancing its detection by mass spectrometry (MS). For specific sites, this approach increases their detectability by more than 1,000-fold. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
    Cell Reports 06/2015; 11(11). DOI:10.1016/j.celrep.2015.05.029 · 8.36 Impact Factor
  • Source
    • "Thus, the two studies provide different but complementary rapamycin-regulated phosphoproteomes. To determine if TORC1 controls phosphorylation of specific motifs, we analyzed the sequences surrounding the rapamycin-regulated phosphosites, with the Motif-X algorithm (Schwartz and Gygi, 2005). For the peptides "
    Dataset: 3475-3
  • Source
    • "PASS00638). Prediction of consensus sequences of S-nitrosylated peptides was performed by the Motif-X algorithm (Schwartz and Gygi, 2005). The GO categorization of S-nitrosylated proteins was performed as described earlier. "
    ABSTRACT: Nitric oxide (NO) regulates multiple developmental events and stress responses in plants. A major biologically active species of NO is S-nitrosoglutathione (GSNO) that is irreversibly degraded by GSNO reductase (GSNOR). The major physiological effect of NO is protein S-nitrosylation, a redox-based posttranslational modification mechanism by covalently linking a NO molecule to a cysteine thiol. However, little is known about the mechanisms of S-nitrosylation-regulated signaling, partly due to limited S-nitrosylated proteins being identified. In this study, we identified 1,195 endogenously S-nitrosylated peptides in 926 proteins from the Arabidopsis (Arabidopsis thaliana) by a site-specific nitrosoproteomic approach, which, up to date, is the largest dataset of S-nitrosylated proteins among all organisms. Consensus sequence analysis of these peptides identified several motifs that contain acidic, but not basic, amino acid residues flanking the S-nitrosylated cysteine residues. These S-nitrosylated proteins are involved in a wide range of biological processes and are significantly enriched in chlorophyll metabolism, photosynthesis, carbohydrate metabolism, and stress responses. Consistently, the gsnor1-3 mutant shows the decreased chlorophyll content and altered photosynthetic properties, suggesting that S-nitrosylation is an important regulatory mechanism in these processes. These results have provided valuable resources and new clues to the studies on S-nitrosylation-regulated signaling in plants. Copyright © 2015, Plant Physiology.
    Plant physiology 02/2015; 167(4). DOI:10.1104/pp.15.00026 · 6.84 Impact Factor