ABSTRACT: The UCL Bioinformatics Group web portal offers several high-quality protein structure prediction and function annotation algorithms including PSIPRED, pGenTHREADER, pDomTHREADER, MEMSAT, MetSite, DISOPRED2, DomPred and FFPred for the prediction of secondary structure, protein fold, protein structural domain, transmembrane helix topology, metal binding sites, regions of protein disorder, protein domain boundaries and protein function, respectively. We also now offer a fully automated 3D modelling pipeline, BioSerf, which performed well in CASP8 and uses a fragment-assembly approach that placed it in the top five servers in the de novo modelling category. The servers are available via the group web site at http://bioinf.cs.ucl.ac.uk/.
Nucleic Acids Research 01/2010; 38:563-568. · 8.28 Impact Factor
ABSTRACT: Disordered proteins need to be expressed to carry out specified functions; however, their accumulation in the cell can potentially cause major problems through protein misfolding and aggregation. Gene expression levels, mRNA decay rates, microRNA (miRNA) targeting and ubiquitination have critical roles in the degradation and disposal of human proteins and transcripts. Here, we describe a study examining these features to gain insights into the regulation of disordered proteins.
In comparison with ordered proteins, disordered proteins have a greater proportion of predicted ubiquitination sites. The transcripts encoding disordered proteins also have higher proportions of predicted miRNA target sites and higher mRNA decay rates, both of which are indicative of the observed lower gene expression levels. The results suggest that the disordered proteins and their transcripts are present in the cell at low levels and/or for a short time before being targeted for disposal. Surprisingly, we find that for a significant proportion of highly disordered proteins, all four of these trends are reversed. Predicted estimates for miRNA targets, ubiquitination and mRNA decay rate are low in the highly disordered proteins that are constitutively and/or highly expressed.
Mechanisms are in place to protect the cell from these potentially dangerous proteins. The evidence suggests that the enrichment of signals for miRNA targeting and ubiquitination may help prevent the accumulation of disordered proteins in the cell. Our data also provide evidence for a mechanism by which a significant proportion of highly disordered proteins (with high expression levels) can escape rapid degradation to allow them to successfully carry out their function.
ABSTRACT: MOTIVATION: Generation of structural models and recognition of homologous relationships for unannotated protein sequences are fundamental problems in bioinformatics. Improving the sensitivity and selectivity of methods designed for these two tasks therefore has downstream benefits for many other bioinformatics applications. RESULTS: We describe the latest implementation of the GenTHREADER method for structure prediction on a genomic scale. The method combines profile-profile alignments with secondary-structure specific gap-penalties, classic pair- and solvation potentials using a linear combination optimized with a regression SVM model. We find this combination significantly improves both detection of useful templates and accuracy of sequence-structure alignments relative to other competitive approaches. We further present a second implementation of the protocol designed for the task of discriminating superfamilies from one another. This method, pDomTHREADER, is the first to incorporate both sequence and structural data directly in this task and improves sensitivity and selectivity over the standard version of pGenTHREADER and three other standard methods for remote homology detection.
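As a rough illustration of the idea of a linear combination of score terms, the sketch below ranks candidate templates by a weighted sum of an alignment score, a pair potential and a solvation potential. It is not the published pGenTHREADER implementation: the weights, the bias, the sign convention (lower potentials taken as more favourable) and the template names are all hypothetical, and in the method described above the combination is optimized with a regression SVM rather than set by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical weights: in the method described above they would come from a
# trained regression SVM. Negative weights assume lower energies are better.
WEIGHTS: Dict[str, float] = {"alignment": 0.6, "pair": -0.25, "solvation": -0.15}
BIAS = 0.0  # hypothetical intercept


def combined_score(terms: Dict[str, float]) -> float:
    """Weighted linear combination of the individual score terms."""
    return BIAS + sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)


def rank_templates(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (template_id, score) pairs, best-scoring template first."""
    scored = [(tid, combined_score(terms)) for tid, terms in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up template identifiers and score values, for illustration only.
    example = {
        "tmplA": {"alignment": 100.0, "pair": -40.0, "solvation": -20.0},
        "tmplB": {"alignment": 50.0, "pair": 10.0, "solvation": 5.0},
    }
    for template_id, score in rank_templates(example):
        print(template_id, round(score, 2))
```

Keeping the weights in one dictionary makes it easy to swap in fitted coefficients without touching the ranking logic.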
ABSTRACT: One of the challenges of the post-genomic era is to provide accurate function annotations for large volumes of data resulting from genome sequencing projects. Most function prediction servers utilize methods that transfer existing database annotations between orthologous sequences. In contrast, there are few methods that are independent of homology and can annotate distant and orphan protein sequences. The FFPred server adopts a machine-learning approach to perform function prediction in protein feature space using feature characteristics predicted from amino acid sequence. The features are scanned against a library of support vector machines representing over 300 Gene Ontology (GO) classes, and probabilistic confidence scores are returned for each annotation term. The GO term library has been modelled on human protein annotations; however, benchmark performance testing showed robust performance across higher eukaryotes. FFPred offers important advantages over traditional function prediction servers in its ability to annotate distant homologues and orphan protein sequences, and achieves greater coverage and classification accuracy than other feature-based prediction servers. A user may upload an amino acid sequence and receive annotation predictions via email. Feature information is provided as easy-to-interpret graphics displayed on the sequence of interest, allowing for back-interpretation of the associations between features and function classes.
Nucleic Acids Research 08/2008; 36(Web Server issue):W297-302. · 8.28 Impact Factor
ABSTRACT: Natively unstructured regions are a common feature of eukaryotic proteomes. Between 30% and 60% of proteins are predicted to contain long stretches of disordered residues, and not only have many of these regions been confirmed experimentally, but they have also been found to be essential for protein function. In this study, we directly address the potential contribution of protein disorder in predicting protein function using standard Gene Ontology (GO) categories. Initially we analyse the occurrence of protein disorder in the human proteome and report ontology categories that are enriched in disordered proteins. Pattern analysis of the distributions of disordered regions in human sequences demonstrated that the functions of intrinsically disordered proteins are both length- and position-dependent. These dependencies were then encoded in feature vectors to quantify the contribution of disorder in human protein function prediction using Support Vector Machine classifiers. The prediction accuracies of 26 GO categories relating to signalling and molecular recognition are improved using the disorder features. The most significant improvements were observed for kinase, phosphorylation, growth factor, and helicase categories. Furthermore, we provide predicted GO term assignments using these classifiers for a set of unannotated and orphan human proteins. In this study, the importance of capturing protein disorder information and its value in function prediction is demonstrated. The GO category classifiers generated can be used to provide more reliable predictions and further insights into the behaviour of orphan and unannotated proteins.
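One way to picture how length- and position-dependence of disordered regions might be encoded in a feature vector is sketched below. This is a minimal illustration, not the encoding used in the study: the per-residue mask format ("D" for disordered, "O" for ordered), the three length bins and the three position bins are all assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical binning: half-open length ranges in residues (None = no upper
# limit) and three equal-width position bins (N-terminal, middle, C-terminal).
LENGTH_BINS: Tuple[Tuple[int, Optional[int]], ...] = ((1, 10), (10, 30), (30, None))
N_POSITION_BINS = 3


def disordered_regions(mask: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of consecutive 'D' residues."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, state in enumerate(mask):
        if state == "D" and start is None:
            start = i
        elif state != "D" and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(mask)))
    return regions


def disorder_features(mask: str) -> List[int]:
    """Count disordered regions per (length bin, position bin), flattened row by row."""
    counts = [[0] * N_POSITION_BINS for _ in LENGTH_BINS]
    n = len(mask)
    for start, end in disordered_regions(mask):
        length = end - start
        midpoint = (start + end - 1) / 2  # centre residue index of the region
        pos_bin = min(int(midpoint / n * N_POSITION_BINS), N_POSITION_BINS - 1)
        for len_bin, (low, high) in enumerate(LENGTH_BINS):
            if length >= low and (high is None or length < high):
                counts[len_bin][pos_bin] += 1
                break
    return [count for row in counts for count in row]


if __name__ == "__main__":
    # Made-up 60-residue mask: a short N-terminal and a longer C-terminal region.
    example_mask = "D" * 5 + "O" * 40 + "D" * 15
    print(disorder_features(example_mask))
```

A vector of this kind could then be passed, alongside other sequence-derived features, to a classifier such as a Support Vector Machine.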