Article

ProFAT: a web-based tool for the functional annotation of protein sequences

BMC Bioinformatics 01/2006;
Source: DOAJ

Abstract

Background

The functional annotation of proteins relies on published information concerning their close and remote homologues in sequence databases. Evidence for remote sequence similarity can be further strengthened when the query sequence and the identified database sequences share a similar biological background. However, few tools so far provide a means to include functional information in sequence database searches.

Results

We present ProFAT, a web-based tool for the functional annotation of protein sequences based on remote sequence similarity. ProFAT combines sensitive sequence database search methods and a fold recognition algorithm with a simple text-mining approach. ProFAT extracts identified hits based on their biological background by keyword-mining the annotations, features and, most importantly, literature associated with a sequence entry. A user-provided keyword list enables the user to search specifically for weak but biologically relevant homologues of an input query. The ProFAT server has been evaluated using the complete set of proteins from three different domain families, including their weak relatives, and correctly identified between 90% and 100% of all domain family members studied in this context. ProFAT has furthermore been applied to a variety of proteins from different cellular contexts, and we provide evidence of how ProFAT can help in the functional prediction of proteins based on remotely conserved proteins.
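The keyword-mining step described above can be illustrated with a minimal Python sketch. This is not ProFAT's implementation: the `Hit` record, the function names, the plain substring matching and the relaxed E-value cutoff of 10 are all assumptions made for illustration. The sketch only shows the general idea of keeping weak hits whose annotation, features or literature mention a user-provided keyword.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """A database hit from a sequence search (hypothetical record layout)."""
    accession: str
    e_value: float
    annotation: str = ""
    features: str = ""
    literature: list = field(default_factory=list)  # e.g. titles or abstracts


def matched_keywords(hit, keywords):
    """Return the user keywords found in any text attached to the hit.

    Matching is a case-insensitive substring test (an assumption; a real
    tool may tokenize or stem the text instead).
    """
    text = " ".join([hit.annotation, hit.features, *hit.literature]).lower()
    return [kw for kw in keywords if kw.lower() in text]


def filter_hits(hits, keywords, max_e_value=10.0):
    """Keep hits under a relaxed E-value cutoff that mention a keyword.

    Returns (hit, matched keywords) pairs sorted by ascending E-value.
    The default cutoff is an illustrative value, not one from the paper.
    """
    kept = []
    for hit in hits:
        if hit.e_value > max_e_value:
            continue  # too weak even for the relaxed search
        found = matched_keywords(hit, keywords)
        if found:
            kept.append((hit, found))
    return sorted(kept, key=lambda pair: pair[0].e_value)


if __name__ == "__main__":
    # Invented example records, not real database entries.
    example_hits = [
        Hit("A1", 5.0, annotation="DNA repair protein"),
        Hit("B2", 0.5, annotation="hypothetical protein"),
        Hit("C3", 50.0, annotation="DNA annealing"),
    ]
    for hit, found in filter_hits(example_hits, ["dna", "annealing"]):
        print(hit.accession, hit.e_value, found)
```

In this sketch a statistically weak hit survives only if its associated text supports the biological context given by the user, which mirrors the reasoning in the paragraph above: shared biological background is used as additional evidence for a remote similarity.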

Conclusion

By employing sensitive database search programs and exploiting the functional information associated with database sequences, ProFAT can detect remote but biologically relevant relationships between proteins and will assist researchers in predicting protein function based on remote homologies.

  • Article: Conformational adaptability of Redbeta during DNA annealing and implications for its structural relationship with Rad52.
    ABSTRACT: Single-strand annealing proteins, such as Redbeta from lambda phage or eukaryotic Rad52, play roles in homologous recombination. Here, we use atomic force microscopy to examine Redbeta quaternary structure and Redbeta-DNA complexes. In the absence of DNA, Redbeta forms a shallow right-handed helix. The presence of single-stranded DNA (ssDNA) disrupts this structure. Upon addition of a second complementary ssDNA, annealing generates a left-handed helix that incorporates 14 Redbeta monomers per helical turn, with each Redbeta monomer annealing approximately 11 bp of DNA. The smallest stable annealing intermediate requires 20 bp DNA and two Redbeta monomers. Hence, we propose that Redbeta promotes base pairing by first increasing the number of transient interactions between ssDNAs. Then, annealing is promoted by the binding of a second Redbeta monomer, which nucleates the formation of a stable annealing intermediate. Using threading, we identify sequence similarities between the RecT/Redbeta and the Rad52 families, which strengthens previous suggestions, based on similarities of their quaternary structures, that they share a common mode of action. Hence, our findings have implications for a common mechanism of DNA annealing mediated by single-strand annealing proteins including Rad52.
    Journal of Molecular Biology 07/2009; 391(3):586-98. · 4.00 Impact Factor
  • Article: HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.
    ABSTRACT: Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.
    PLoS ONE 01/2011; 6(3):e17568. · 4.09 Impact Factor

Keywords

biological background
biologically relevant homologues
biologically relevant relationships
database sequences
different cellular contexts
different domain families
functional information
functional prediction
input query
protein sequences
query sequence
remote homologues
remote sequence similarity
remotely conserved proteins
sensitive sequence database search methods
sequence database searches
similar biological background
simple text-mining approach
weak relatives
web-based tool