A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences

EMBL Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
BMC Bioinformatics (Impact Factor: 2.67). 02/2008; 9:229. DOI: 10.1186/1471-2105-9-229
Source: PubMed

ABSTRACT The structure of many eukaryotic cell regulatory proteins is highly modular. They are assembled from globular domains, segments of natively disordered polypeptides and short linear motifs. The latter are involved in protein interactions and formation of regulatory complexes. The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of the modules. It is therefore desirable to efficiently predict linear motifs with some degree of accuracy, yet sequence database searches return results that are not significant.
We have developed a method for scoring the conservation of linear motif instances. It requires only primary sequence-derived information (e.g. a multiple alignment and a sequence tree) and takes into account the degenerate nature of linear motif patterns. In our benchmark, the method accurately scores 86% of the known positive instances, while distinguishing them from random matches in 78% of the cases. The conservation score is implemented as a real-time application designed to be integrated into other tools. It is currently accessible via a Web Service or through a graphical interface.
The conservation score improves the prediction of linear motifs by discarding those matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences. It is especially useful for instances in non-structured regions of the proteins, where a filtering strategy based on domain masking is not applicable.
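The general idea of scoring a degenerate motif match against an alignment can be illustrated with a minimal Python sketch. It is not the published scoring scheme: the function name `motif_conservation`, the toy alignment, the pattern `[IL]KPE` and the per-sequence weights are all hypothetical. The weights merely stand in for values that would, in a real setting, be derived from the sequence tree. The sketch re-tests the regular expression on the aligned residues of every other sequence, so a substitution that the pattern tolerates still counts as conserved.

```python
import re


def motif_conservation(alignment, weights, query_id, pattern):
    """Score each regex match in the query by the weighted fraction of the
    other aligned sequences whose residues in the same columns also match.

    alignment: dict of sequence id -> gapped sequence (equal lengths)
    weights:   dict of sequence id -> weight (assumed to come from a tree)
    """
    regex = re.compile(pattern)
    query = alignment[query_id]
    # Map ungapped query positions to alignment columns.
    columns = [i for i, ch in enumerate(query) if ch != "-"]
    ungapped = query.replace("-", "")

    scores = []
    for m in regex.finditer(ungapped):
        start_col = columns[m.start()]
        end_col = columns[m.end() - 1] + 1
        total = 0.0
        conserved = 0.0
        for seq_id, seq in alignment.items():
            if seq_id == query_id:
                continue
            # Residues aligned to the motif instance, with gaps removed.
            segment = seq[start_col:end_col].replace("-", "")
            total += weights[seq_id]
            if regex.fullmatch(segment):
                conserved += weights[seq_id]
        score = conserved / total if total else 0.0
        scores.append((m.start(), m.group(), score))
    return scores


# Toy data (hypothetical): orthoA keeps the motif via an L->I substitution
# that the pattern allows; orthoB has lost it through a deletion.
alignment = {
    "query":  "MSPLKPEAT-G",
    "orthoA": "MSPIKPEAS-G",
    "orthoB": "MAPL--EATQG",
}
weights = {"orthoA": 0.25, "orthoB": 0.75}

scores = motif_conservation(alignment, weights, "query", r"[IL]KPE")
print(scores)
```

In this toy case the single match `LKPE` is retained only in the lower-weighted sequence, so its score is low. A filter of the kind described above would discard matches scoring below some chosen threshold.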

  • Source
    ABSTRACT: Various post-translational modifications (PTMs) fine-tune the functions of almost all eukaryotic proteins, and co-regulation of different types of PTMs has been shown within and between a number of proteins. Aiming at a more global view of the interplay between PTM types, we collected modifications for 13 frequent PTM types in 8 eukaryotes, compared their speed of evolution and developed a method for measuring PTM co-evolution within proteins based on the co-occurrence of sites across eukaryotes. As many sites are still to be discovered, this is a considerable underestimate, yet, assuming that most co-evolving PTMs are functionally associated, we found that PTM types are vastly interconnected, forming a global network that comprises in human alone >50,000 residues in about 6000 proteins. We predict substantial PTM type interplay in secreted and membrane-associated proteins and in the context of particular protein domains and short linear motifs. The global network of co-evolving PTM types implies a complex and intertwined post-translational regulation landscape that is likely to regulate multiple functional states of many if not all eukaryotic proteins.
    Molecular Systems Biology 07/2012; 8:599. DOI:10.1038/msb.2012.31 · 14.10 Impact Factor
  • Source
    ABSTRACT: Short linear motifs (SLiMs) are important mediators of protein-protein interactions. Their short and degenerate nature presents a challenge for computational discovery. We sought to improve SLiM discovery by incorporating evolutionary information, since SLiMs are more conserved than surrounding residues. We have developed a new method that assesses the evolutionary signal of a residue in its sequence and structural context. Under-conserved residues are masked out prior to SLiM discovery, allowing incorporation into the existing statistical model employed by SLiMFinder. The method shows considerable robustness in terms of both the conservation score used for individual residues and the size of the sequence neighbourhood. Optimal parameters significantly improve return of known functional motifs from benchmarking data, raising the return of significant validated SLiMs from typical human interaction datasets from 20% to 60%, while retaining the high level of stringency needed for application to real biological data. The success of this regime indicates that it could be of general benefit to computational annotation and prediction of protein function at the sequence level. All data and tools in this article are available at http://bioware.ucd.ie/~slimdisc/slimfinder/conmasking/.
    Bioinformatics 02/2009; 25(4):443-50. DOI:10.1093/bioinformatics/btn664 · 4.62 Impact Factor
  • Source
    ABSTRACT: Motivation: We noted that the sumoylation site in C/EBP homologues is conserved beyond the canonical consensus sequence for sumoylation. Therefore, we investigated whether this pattern might define a more general protein motif. Results: We undertook a survey of the human proteome using a regular expression based on the C/EBP motif. This revealed significant enrichment of the motif using different Gene Ontology terms (e.g. ‘transcription’) that pertain to the nucleus. When considering requirements for the motif to be functional (evolutionary conservation, structural accessibility of the motif and proper cell localization of the protein), more than 130 human proteins were retrieved from the UniProt/Swiss-Prot database. These candidates were particularly enriched in transcription factors, including FOS, JUN, Hif-1α, MLL2 and members of the KLF, MAF and NFATC families; chromatin modifiers like CHD-8, HDAC4 and DNA Top1; and the transcriptional regulatory kinases HIPK1 and HIPK2. The KEPE motif appears to be restricted to the metazoan lineage and has three length variants (short, medium and long), which do not appear to interchange. Contact: toby.gibson@embl.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 12/2008; 25(1):1-5. DOI:10.1093/bioinformatics/btn594 · 4.62 Impact Factor