Article

T-RMSD: a fine-grained, structure-based classification method and its application to the functional characterization of TNF receptors.

EMBL/CRG Systems Biology Research Unit, Center for Genomic Regulation, UPF, Barcelona, Catalunya, 08003, Spain.
Journal of Molecular Biology (impact factor: 4). 07/2010; 400(3):605-17. DOI:10.1016/j.jmb.2010.05.012
Source: PubMed

ABSTRACT This study addresses the relation between structural and functional similarity in proteins. We introduce a novel method, tree based on root mean square deviation (T-RMSD), which uses variations in distance RMSD (dRMSD) to build fine-grained, structure-based classifications of proteins. The main improvement of T-RMSD over similar methods, such as Dali, is its capacity to produce the equivalent of a bootstrap value for each cluster node. We validated our approach on two domain families studied extensively for their roles in many biological and pathological pathways: the small GTPase RAS superfamily and the cysteine-rich domains (CRDs) associated with the tumor necrosis factor receptor (TNFR) family. Our analysis showed that T-RMSD automatically recovers and refines existing classifications. In the case of the small GTPase ARF subfamily, T-RMSD distinguishes GTP-bound from GDP-bound states, while in the case of CRDs it identifies two new subgroups associated with well-defined functional features (ligand binding and formation of the ligand pre-assembly complex). We show how hidden Markov models (HMMs) can be built on these new groups and propose a methodology for using these models simultaneously to perform fine-grained functional genomic annotation without known 3D structures. T-RMSD is open-source freeware incorporated in the T-Coffee package and is available online.
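The quantity the abstract builds on, distance RMSD, can be illustrated with a small sketch. It compares the intramolecular distance matrices of two structures, so no superposition is needed: for N equivalent atoms, dRMSD is the square root of the mean squared difference between corresponding pairwise distances over all pairs i < j. The code below is only a minimal illustration of that standard definition, not the T-RMSD implementation shipped with T-Coffee; the function names `pairwise_distances` and `drmsd` are hypothetical, and the inputs are assumed to be equally sized coordinate arrays whose rows are already matched by an alignment.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the N x N matrix of Euclidean distances between the rows of coords."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def drmsd(coords_a, coords_b):
    """Distance RMSD between two coordinate sets with matched rows.

    Assumption: row i of coords_a and row i of coords_b are equivalent
    atoms (for example, aligned C-alpha positions).
    """
    dist_a = pairwise_distances(coords_a)
    dist_b = pairwise_distances(coords_b)
    if dist_a.shape != dist_b.shape:
        raise ValueError("structures must have the same number of matched atoms")
    if dist_a.shape[0] < 2:
        raise ValueError("at least two atoms are needed to define a distance")
    # Use each unordered pair (i < j) exactly once.
    upper = np.triu_indices(dist_a.shape[0], k=1)
    return float(np.sqrt(np.mean((dist_a[upper] - dist_b[upper]) ** 2)))
```

Because only internal distances enter the calculation, the value is unchanged when either structure is rotated or translated, which is what makes it convenient for comparing many structures without superposing them first.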

  • Article: CATH--a hierarchic classification of protein domain structures.
    ABSTRACT: Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures and in optimal cases function can also be assigned. The ever increasing number of known protein structures is too large to classify all proteins manually, therefore, automatic methods are needed for fast evaluation of protein structures. We present a semi-automatic procedure for deriving a novel hierarchical classification of protein domain structures (CATH). The four main levels of our classification are protein class (C), architecture (A), topology (T) and homologous superfamily (H). Class is the simplest level, and it essentially describes the secondary structure composition of each domain. In contrast, architecture summarises the shape revealed by the orientations of the secondary structure units, such as barrels and sandwiches. At the topology level, sequential connectivity is considered, such that members of the same architecture might have quite different topologies. When structures belonging to the same T-level have suitably high similarities combined with similar functions, the proteins are assumed to be evolutionarily related and put into the same homologous superfamily. Analysis of the structural families generated by CATH reveals the prominent features of protein structure space. We find that nearly a third of the homologous superfamilies (H-levels) belong to ten major T-levels, which we call superfolds, and furthermore that nearly two-thirds of these H-levels cluster into nine simple architectures. A database of well-characterised protein structure families, such as CATH, will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.
    Structure 09/1997; 5(8):1093-108. · 6.35 Impact Factor
  • Article: A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3.
    ABSTRACT: The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families.
    Nucleic Acids Research 02/2001; 29(1):55-7. · 8.03 Impact Factor
  • Article: Comparison of homologous tertiary structures of proteins.
    Journal of Theoretical Biology 03/1974; 43(2):351-74. · 2.21 Impact Factor

Keywords

available online
bootstrap value
cluster node
cysteine-rich domains
domain families
dRMSD
fine-grained functional genomic annotation
fine-grained structure-based classifications
ligand pre-assembly complex
new groups
new subgroups
novel method
open source freeware
pathological pathways
small GTPase RAS superfamily
square deviation
study addresses
T-Coffee package
tumor necrosis factor receptors
uses distance RMSD