Length Encoded Secondary Structure Profile for Remote Homologous Protein Detection.
ABSTRACT Protein data are growing explosively in both volume and diversity, yet many protein structures remain unresolved
and their functions remain to be identified. Conventional sequence alignment tools are insufficient for remote homology
detection, while current structural alignment tools cannot readily be applied to proteins of unresolved structure.
Here, we aimed to overcome the combination of two major obstacles to detecting remote homologous proteins: proteins with
unresolved structure, and proteins of low sequence identity but high structural similarity. We proposed a novel method for
improving performance on the protein matching problem, especially for mining remote homologous proteins. In this study, existing
secondary structure prediction techniques were applied to locate the secondary structure elements of proteins.
The proposed LESS (Length Encoded Secondary Structure) profile was then constructed for segment-based similarity comparison
with parallel computing. Compared with a conventional residue-based sequence alignment tool, detection of remote protein homologies
through the LESS profile is favourable in terms of speed and handling of high sequence diversity, and its accuracy and performance can
compensate for the deficiencies of traditional primary sequence alignment methodology. This method may further support biologists in
studying protein folding, evolution, and function prediction.
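The abstract does not spell out how a LESS profile is built or scored, so the following is only a minimal sketch of the general idea of length encoding and segment-based comparison. It assumes a per-residue secondary structure string (H = helix, E = strand, C = coil) from some prediction tool; the function names `length_encode` and `segment_similarity` and the toy scoring rule are hypothetical illustrations, not the authors' method.

```python
from itertools import groupby


def length_encode(ss):
    """Collapse a per-residue secondary structure string into (state, length) segments.

    Assumption: `ss` uses one character per residue, e.g. 'H', 'E', 'C'.
    """
    return [(state, len(list(run))) for state, run in groupby(ss)]


def segment_similarity(a, b):
    """Toy segment-level score in [0, 1] (hypothetical, not the published scoring).

    Segments are paired by position; a pair with the same state contributes
    the ratio of the shorter length to the longer one. The sum is normalised
    by the larger segment count, so unpaired segments lower the score.
    """
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    score = 0.0
    for (state_a, len_a), (state_b, len_b) in zip(a, b):
        if state_a == state_b:
            score += min(len_a, len_b) / max(len_a, len_b)
    return score / n


p = length_encode("CCHHHHHHCCEEEECC")
q = length_encode("CHHHHHCCCEEEEEC")
print(p)  # [('C', 2), ('H', 6), ('C', 2), ('E', 4), ('C', 2)]
print(segment_similarity(p, q))
```

Comparing segments rather than residues shrinks each protein to a handful of elements, which is one plausible reason a segment-based comparison can be faster than residue-based alignment and less sensitive to sequence divergence.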
- ABSTRACT: Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required to both geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome. BMC Bioinformatics 02/2008; 9:17. · 3.02 Impact Factor
- Systematic Zoology 07/1970; 19(2):99-113.
- ABSTRACT: We present a novel algorithm named FAST for aligning protein three-dimensional structures. FAST uses a directionality-based scoring scheme to compare the intra-molecular residue-residue relationships in two structures. It employs an elimination heuristic to promote sparseness in the residue-pair graph and facilitate the detection of the global optimum. In order to test the overall accuracy of FAST, we determined its sensitivity and specificity with the SCOP classification (version 1.61) as the gold standard. FAST achieved higher sensitivities than several existing methods (DaliLite, CE, and K2) at all specificity levels. We also tested FAST against 1033 manually curated alignments in the HOMSTRAD database. The overall agreement was 96%. Close inspection of examples from broad structural classes indicated the high quality of FAST alignments. Moreover, FAST is an order of magnitude faster than other algorithms that attempt to establish residue-residue correspondence. Typical pairwise alignments take FAST less than a second with a Pentium III 1.2GHz CPU. FAST software and a web server are available at http://biowulf.bu.edu/FAST/. Proteins Structure Function and Bioinformatics 03/2005; 58(3):618-27. · 3.34 Impact Factor