Leveraging Enzyme Structure-Function Relationships for Functional Inference and Experimental Design: The Structure-Function Linkage Database

Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, United States
Biochemistry (Impact Factor: 3.02). 03/2006; 45(8):2545-55. DOI: 10.1021/bi052101l
Source: PubMed


The study of mechanistically diverse enzyme superfamilies (collections of enzymes that perform different overall reactions but share both a common fold and a distinct mechanistic step performed by key conserved residues) helps elucidate the structure-function relationships of enzymes. We have developed a resource, the structure-function linkage database (SFLD), to analyze these structure-function relationships. Unique to the SFLD is its hierarchical classification scheme based on linking the specific partial reactions (or other chemical capabilities) that are conserved at the superfamily, subgroup, and family levels with the conserved structural elements that mediate them. We present the results of analyses using the SFLD in correcting misannotations, guiding protein engineering experiments, and elucidating the function of recently solved enzyme structures from the structural genomics initiative. The SFLD is freely accessible at



Available from: John H Morris, Dec 27, 2013
  • Source
    • "Because of the rapid growth of the public sequence databases, we concentrate our efforts on adding datasets that are useful for specific projects, rather than attempt to model all known protein sequences based on all detectably related known structures. Currently, ModBase includes a model dataset for each of 65 complete genomes, as well as datasets for all sequences in the Structure Function Linkage Database (SFLD) (36), and for the complete SwissProt/TrEMBL database as of 2005. Additionally, available models for new SFLD sequences are added weekly."
    ABSTRACT: ModBase is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment. ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server. ModBase models are also available through the Protein Model Portal. Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics, the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile, the FoXSDock server for protein-protein docking filtered by an SAXS profile, the SAXS Merge server for automatic merging of SAXS profiles and the Pose & Rank server for scoring protein-ligand complexes. In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.
    Nucleic Acids Research 11/2013; 42(Database issue). DOI:10.1093/nar/gkt1144 · 9.11 Impact Factor
  • Source
    • "Since >50% of sequences in Gene3D belong to functionally diverse superfamilies, and since the FunFams are functionally cohesive families, using FunFam-based assignments allows us to provide more reliable functional annotation than using assignments from broad, highly diverse CATH superfamilies. The FunFams have been benchmarked against manually curated, experimentally annotated functional classifications, such as the SFLD (17) and more recently have been shown to be highly competitive in the international CAFA functional annotation assessment (4). "
    ABSTRACT: Gene3D is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
    Nucleic Acids Research 11/2013; 42(Database issue). DOI:10.1093/nar/gkt1205 · 9.11 Impact Factor
  • Source
    • "Previous peptidases classifications are based on evolutionary similarity [8] and reaction mechanism [9]. It is difficult to predict the function of peptidases by transferring annotation via homology because structural similarity does not always correspond to catalytic similarity. "
    ABSTRACT: Burkholderia pseudomallei K96243, the causative agent of melioidosis, is reported to produce various extracellular products, including proteases. The role of these proteases in melioidosis, however, remains obscure. Previous findings have hinted at the inherent pathogenicity of the protease during B. pseudomallei K96243 infection. We chose to study the two major peptidase families, i.e. serine peptidases and metallopeptidases, present in B. pseudomallei K96243. Data mining revealed eighty ORFs (open reading frames) that potentially code for these peptidases and have prominent homology with B. pseudomallei K96243 based on prediction of function by a bioinformatics approach. Annotation and classification assigned forty-eight and thirty-two putative peptidases to the serine peptidase and metallopeptidase families, respectively. Of the identified putative peptidases, 98% belong to endopeptidases (EC 3.4.21. and EC 3.4.24.) and exopeptidases (EC 3.4.11., EC 3.4.13., EC 3.4.14., EC 3.4.16. and EC 3.4.17), and the other 2% belong to EC 3.5.
    Procedia Computer Science 12/2012; 11:36–42. DOI:10.1016/j.procs.2012.09.005