A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences

EMBL Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
BMC Bioinformatics (Impact Factor: 2.58). 02/2008; 9(1):229. DOI: 10.1186/1471-2105-9-229
Source: PubMed

ABSTRACT The structure of many eukaryotic cell regulatory proteins is highly modular. They are assembled from globular domains, segments of natively disordered polypeptides and short linear motifs. The latter are involved in protein interactions and formation of regulatory complexes. The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of the modules. It is therefore desirable to efficiently predict linear motifs with some degree of accuracy, yet sequence database searches return results that are not significant.
We have developed a method for scoring the conservation of linear motif instances. It requires only primary sequence-derived information (e.g. a multiple alignment and a sequence tree) and takes into account the degenerate nature of linear motif patterns. In our benchmark, the method accurately scores 86% of the known positive instances, while distinguishing them from random matches in 78% of the cases. The conservation score is implemented as a real-time application designed to be integrated into other tools. It is currently accessible via a Web Service or through a graphical interface.
The conservation score improves the prediction of linear motifs by discarding matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences. It is especially useful for instances in non-structured regions of proteins, where a domain-masking filtering strategy is not applicable.
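
The general idea of scoring a degenerate motif against an alignment can be illustrated with a minimal Python sketch. This is not the published scoring scheme, which the abstract does not specify. It assumes that conservation is judged by whether each homologue still matches the motif's regular expression at the aligned position (rather than by residue identity), and that the sequence tree has already been reduced to one weight per sequence. The function name `motif_conservation`, the `weights` argument, the example pattern and the toy sequences are all hypothetical.

```python
import re


def motif_conservation(alignment, query, pattern, weights=None):
    """Weighted fraction of homologues that still match `pattern` at the
    alignment columns where the query sequence matches it.

    alignment: dict mapping sequence name -> aligned sequence ('-' for gaps)
    query:     name of the sequence in which the motif instance is located
    pattern:   motif regular expression (assumed to match at least one residue)
    weights:   optional dict name -> weight, standing in for tree-derived
               weights (an assumption of this sketch); defaults to 1.0 each
    Returns None if the query has no match or there are no homologues.
    """
    regex = re.compile(pattern)
    query_row = alignment[query]
    hit = regex.search(query_row.replace("-", ""))
    if hit is None:
        return None

    # Map residue positions in the ungapped query to alignment columns.
    columns = [i for i, ch in enumerate(query_row) if ch != "-"]
    start_col = columns[hit.start()]
    end_col = columns[hit.end() - 1] + 1

    total = 0.0
    matched = 0.0
    for name, row in alignment.items():
        if name == query:
            continue
        w = 1.0 if weights is None else weights.get(name, 1.0)
        # Compare the homologue's residues in the same columns, gaps removed.
        segment = row[start_col:end_col].replace("-", "")
        total += w
        if regex.fullmatch(segment):
            matched += w
    return matched / total if total else None


# Toy data, invented for illustration only.
alignment = {
    "query": "MK-PPLPPRS",
    "hom1": "MKAPALPPKS",
    "hom2": "MK-PPLAPRS",
    "hom3": "MR-PPLPPRT",
}
unweighted = motif_conservation(alignment, "query", r"P.LP.[KR]")
weighted = motif_conservation(
    alignment, "query", r"P.LP.[KR]",
    weights={"hom1": 0.5, "hom2": 1.0, "hom3": 0.5},
)
```

In this toy case `hom1` differs from the query inside the motif but still matches the pattern, so it counts as conserved, whereas `hom2` loses a fixed proline and does not. Down-weighting the two matching homologues, as one might for closely related sequences, lowers the score from 2/3 to 0.5.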

Available from: Rodrigo López Serrano, Sep 27, 2015
  • Source
    • "Differing conservation of PTM types within eukaryotes. We comparatively studied the conservation status of the 13 PTM types as the first step for understanding their functional relations and their co-occurrence within proteins. As experimental data are not yet covering all organisms comprehensively, we assume, as implemented in other algorithms for similar purposes (Chica et al, 2008; Malik et al, 2008; Biswas et al, 2010), that the conservation of the site can be a good approximation for the conservation of the PTM. Indeed, this approach has been used to distinguish between functional and non-functional phosphorylation sites (Gnad et al, 2007; Holt et al, 2009; Tan and Bader 2012) and a less-strict criterion, the overall conservation of the proteins, was applied to determine the age of the PTMs' functionality (Choudhary et al, 2009; Zielinska et al, 2010). "
    ABSTRACT: Various post-translational modifications (PTMs) fine-tune the functions of almost all eukaryotic proteins, and co-regulation of different types of PTMs has been shown within and between a number of proteins. Aiming at a more global view of the interplay between PTM types, we collected modifications for 13 frequent PTM types in 8 eukaryotes, compared their speed of evolution and developed a method for measuring PTM co-evolution within proteins based on the co-occurrence of sites across eukaryotes. As many sites are still to be discovered, this is a considerable underestimate, yet, assuming that most co-evolving PTMs are functionally associated, we found that PTM types are vastly interconnected, forming a global network that comprises, in human alone, >50,000 residues in about 6000 proteins. We predict substantial PTM type interplay in secreted and membrane-associated proteins and in the context of particular protein domains and short linear motifs. The global network of co-evolving PTM types implies a complex and intertwined post-translational regulation landscape that is likely to regulate multiple functional states of many if not all eukaryotic proteins.
    Molecular Systems Biology 07/2012; 8(599):599. DOI:10.1038/msb.2012.31 · 10.87 Impact Factor
  • Source
    • "However, due to the high likelihood of motifs occurring in a stochastic manner, the use of pattern matching alone produces a large number of false positive hits (6). Methods have, therefore, been developed to incorporate additional filters based on the attributes of SLiMs, including sequence conservation (11–13), structural availability (14–16), biophysical feasibility (17) and biological keywords (18). Recently, a number of de novo motif prediction tools have also emerged, capable of predicting new classes of SLiMs (19–22). "
    ABSTRACT: The recent expansion in our knowledge of protein-protein interactions (PPIs) has allowed the annotation and prediction of hundreds of thousands of interactions. However, the function of many of these interactions remains elusive. The interactions of Eukaryotic Linear Motif (iELM) web server provides a resource for predicting the function and positional interface for a subset of interactions mediated by short linear motifs (SLiMs). The iELM prediction algorithm is based on the annotated SLiM classes from the Eukaryotic Linear Motif (ELM) resource and allows users to explore both annotated and user-generated PPI networks for SLiM-mediated interactions. By incorporating the annotated information from the ELM resource, iELM provides functional details of PPIs. This can be used in proteomic analysis, for example, to infer whether an interaction promotes complex formation or degradation. Furthermore, details of the molecular interface of the SLiM-mediated interactions are also predicted. This information is displayed in a fully searchable table, as well as graphically with the modular architecture of the participating proteins extracted from the UniProt and Phospho.ELM resources. A network figure is also presented to aid the interpretation of results. The iELM server supports single protein queries as well as large-scale proteomic submissions and is freely available at
    Nucleic Acids Research 05/2012; 40(Web Server issue):W364-9. DOI:10.1093/nar/gks444 · 9.11 Impact Factor
  • Source
    • "The SLiM and its surrounding residues are then assessed for their propensity to be in a region of intrinsic disorder using IUPred. The SLiMSearch programme also outputs a score for the Conservation Score (Chica et al., 2008) and a RLC variance score indicating the differences in conservation between the individual amino acids of the SLiM instance. Contextual information such as overlapping Pfam Domains and PDB structures (Velankar and Kleywegt, 2011) is also included. "
    ABSTRACT: Eukaryotic proteins are highly modular, containing multiple interaction interfaces that mediate binding to a network of regulators and effectors. Recent advances in high-throughput proteomics have rapidly expanded the number of known protein-protein interactions (PPIs); however, the molecular basis for the majority of these interactions remains to be elucidated. There has been a growing appreciation of the importance of a subset of these PPIs, namely those mediated by short linear motifs (SLiMs), particularly the canonical and ubiquitous SH2, SH3 and PDZ domain-binding motifs. However, these motif classes represent only a small fraction of known SLiMs and outside these examples little effort has been made, either bioinformatically or experimentally, to discover the full complement of motif instances. In this article, interaction data are analysed to identify and characterize an important subset of PPIs, those involving SLiMs binding to globular domains. To do this, we introduce iELM, a method to identify interactions mediated by SLiMs and add molecular details of the interaction interfaces to both interacting proteins. The method identifies SLiM-mediated interfaces from PPI data by searching for known SLiM-domain pairs. This approach was applied to the human interactome to identify a set of high-confidence putative SLiM-mediated PPIs. iELM is freely available at Supplementary data are available at Bioinformatics online.
    Bioinformatics 02/2012; 28(7):976-82. DOI:10.1093/bioinformatics/bts072 · 4.98 Impact Factor