The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
Nucleic Acids Research (Impact Factor: 9.11). 02/2007; 35(Database issue):D291-7. DOI: 10.1093/nar/gkl959
Source: PubMed


We report the latest release (version 3.0) of the CATH protein domain database. There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps that have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which is a projection of CATH structural data onto approximately 2 million sequences in completed genomes and UniProt.



Available from: Alison Cuff,
    • "Protein domains in SCOP are hierarchically classified into families, superfamilies, fold and classes. • CATH (Greene et al. 2007) is a database containing information about the classification of up to 86,151 structural protein domains. The classification is achieved through an automatic procedure. "
    ABSTRACT: Proteins are the functional components of many cellular processes and the identification of their physical protein–protein interactions (PPIs) is an area of mature academic research. Various databases have been developed containing information about experimentally and computationally detected human PPIs as well as their corresponding annotation data. However, these databases contain many false positive interactions, are incomplete, and only a few of them incorporate data from various sources. To overcome these limitations, we have developed HINT-KB, a knowledge base that integrates data from various sources, provides a user-friendly interface for their retrieval, calculates a set of features of interest and computes a confidence score for every candidate protein interaction. This confidence score is essential for filtering the false positive interactions which are present in existing databases, predicting new protein interactions and measuring the frequency of each true protein interaction. For this reason, a novel hybrid machine learning methodology, called Evolutionary Kalman Mathematical Modelling (EvoKalMaModel), was used to achieve an accurate and interpretable scoring methodology. The experimental results indicated that the proposed scoring scheme outperforms existing computational methods for the prediction of PPIs.
    Artificial Intelligence Review 10/2013; 42(3). DOI:10.1007/s10462-013-9409-8 · 2.11 Impact Factor
    • "The fold recognition analyses were performed using FUGUE (Shi, Blundell, & Mizuguchi, 2001), GenTHREADER (Jones, 1999b). The architectural motifs and the topology of proteins with known three-dimensional (3D) structure were analyzed according to SCOP (Murzin, Brenner, Hubbard, & Chothia, 1995) and CATH (Greene et al., 2007) classification. "
    ABSTRACT: Prokaryotes and eukaryotes respond to various environmental stimuli using the two-component system (TCS). Essentially, it consists of a membrane-bound histidine kinase (HK), which senses the stimuli and transfers the signal to the response regulator, which in turn regulates expression of various target genes. Recently, sequence-based genome-wide analysis has been carried out in Arabidopsis and rice to identify all the putative members of the TCS family. One member of this family, AtHK1 (a putative osmosensor, hybrid-type sensory histidine kinase), is known to interact with AtHPt1 (a phosphotransfer protein) in Arabidopsis. Based on the predicted rice interactome network (PRIN), the ortholog of AtHK1 in rice, OsHK3b, was found to interact with OsHPt2. Analysis of the amino acid sequence of AtHK1 showed the presence of a transmitter domain (TD) and a receiver domain (RD), while OsHK3b showed the presence of three conserved domains, namely CHASE (signaling domain), TD, and RD. In order to elaborate on the structural details of the functional domains of hybrid-type HK and phosphotransfer proteins in both these genera, we have modeled them using a homology modeling approach. The structural motifs present in various functional domains of the orthologous proteins were found to be highly conserved. Binding analysis of the RD of these sensory proteins in Arabidopsis and rice revealed the role of various residues, such as histidine in the HPt protein, which are essential for their interaction.
    Journal of Biomolecular Structure & Dynamics 07/2013; 32(8). DOI:10.1080/07391102.2013.818576 · 2.92 Impact Factor
    • "According to the conformational selection theory, these factors could be used to study conformational changes and to correlate them with biological information. In addition, proteins were linked with other databases, such as UniProt (Jain et al., 2009), gene ontology (Ashburner et al., 2000), enzyme commission (Kotera et al., 2004), CATH (Greene et al., 2007), SIFTS (Velankar et al., 2005) and MobiDB (Di Domenico et al., 2012), to obtain a broad spectrum of biological and physicochemical information such as taxonomy, source organism, protein function, degree of protein disorder and structural class, among others. Structures (putative conformers) for a given protein were clustered using two algorithms: hierarchical clustering and affinity propagation clustering. "
    ABSTRACT: Conformational diversity is a key concept in the understanding of different issues related to protein function, such as the study of catalytic processes in enzymes, protein-protein recognition, protein evolution and the origins of new biological functions. Here we present a database of proteins with different degrees of conformational diversity. CoDNaS (from Conformational Diversity of Native State) is a redundant collection of three-dimensional structures for the same protein derived from the Protein Data Bank. Structures for the same protein obtained under different crystallographic conditions have been associated with snapshots of protein dynamism and consequently could characterize protein conformers. CoDNaS allows the user to explore global and local structural differences among conformers as a function of different parameters such as presence of ligand, post-translational modifications, changes in oligomeric states, differences in pH and temperature. Additionally, CoDNaS contains information about protein taxonomy and function, disorder level and structural classification, offering useful information to explore the underlying mechanism of conformational diversity and its close relationship with protein function. Currently CoDNaS has 122122 structures integrating 12684 entries, with an average of 9.63 conformers per protein. The database is freely available at
    Bioinformatics 07/2013; 29(19). DOI:10.1093/bioinformatics/btt405 · 4.98 Impact Factor