ABSTRACT: Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.
Nucleic Acids Research 10/2014; · 8.81 Impact Factor
ABSTRACT: CA_C2195 from Clostridium acetobutylicum is a protein of unknown function. Sequence analysis predicted that part of the protein contains a metallopeptidase-related domain. Large sequence databases such as UniProt contain over 200 homologs of similar size, with pairwise sequence identities of ~40-60%. CA_C2195 was chosen for crystal structure determination to enable structure-based functional annotation of novel protein sequence space.
The structure confirmed that CA_C2195 contained an N-terminal metallopeptidase-like domain. The structure revealed two extra domains: an alpha+beta domain inserted in the metallopeptidase-like domain and a C-terminal circularly permuted winged-helix-turn-helix domain.
Based on sequence and structural analyses using the crystal structure of CA_C2195, we offer a view of the protein's possible functions. Using contextual information from gene-neighborhood analysis, we propose that, rather than being a peptidase, CA_C2195 and its homologs might play a role in the biosynthesis of a modified cell-surface carbohydrate in conjunction with several sugar-modification enzymes. These results lay the groundwork for experimental verification of this function.
ABSTRACT: We present a prototype of SCOP2 (http://scop2.mrc-lmb.cam.ac.uk/), a new structural classification of proteins that we have recently developed. SCOP2 is the successor to the Structural Classification of Proteins (SCOP, http://scop.mrc-lmb.cam.ac.uk/scop/) database. Like SCOP, SCOP2 focuses on organizing structurally characterized proteins according to their structural and evolutionary relationships. SCOP2 was designed to provide a more advanced framework for protein structure annotation and classification. It defines an approach to protein classification that differs fundamentally from SCOP's while retaining its best features. The SCOP2 classification is described as a directed acyclic graph whose nodes form a complex network of many-to-many relationships, each node representing a region of protein structure and sequence. The new classification is expected to enable further advances in the field and open new areas of research.
Nucleic Acids Research 11/2013; · 8.81 Impact Factor
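To make the contrast with a strict tree concrete, here is a minimal Python sketch of a classification graph in which a node may have several parents, as in the many-to-many directed acyclic graph described for SCOP2. The class, its methods and the node labels are hypothetical illustrations, not the SCOP2 data model or any SCOP2 API; the sketch only captures two properties: multiple parents per node and rejection of edges that would create a cycle.

```python
from collections import defaultdict


class ClassificationGraph:
    """Hypothetical minimal DAG: each node may have several parent nodes."""

    def __init__(self):
        # child -> set of parents (many-to-many relationships)
        self.parents = defaultdict(set)

    def ancestors(self, node):
        """Return every node reachable by following parent links upward."""
        seen = set()
        stack = list(self.parents.get(node, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def add_edge(self, parent, child):
        """Link parent -> child, refusing any edge that would create a cycle."""
        if parent == child or child in self.ancestors(parent):
            raise ValueError(f"{parent} -> {child} would create a cycle")
        self.parents[child].add(parent)


# Illustrative labels only: one family sits under both a structural (fold)
# node and an evolutionary (superfamily) node.
graph = ClassificationGraph()
graph.add_edge("fold:F1", "family:A")
graph.add_edge("superfamily:S1", "family:A")
graph.add_edge("superfamily:S1", "family:B")
print(sorted(graph.ancestors("family:A")))  # ['fold:F1', 'superfamily:S1']
```

The acyclicity check runs before each edge is added, so the graph can never enter an invalid state; a tree would instead forbid the second parent of `family:A` outright.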
ABSTRACT: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found.
Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of a representative of each domain have been solved by X-ray crystallography: BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C] and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ]. All three domains are similar in structure and belong to the NTF2-like superfamily. Although the function of these domains remains unknown, our analysis allows us to propose a hypothesis about their role.
Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. Such binding may regulate the activities of domains combined with them in the same polypeptide or through operonic linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).
ABSTRACT: Maf (for multicopy associated filamentation) proteins represent a large family of conserved proteins implicated in cell division arrest but whose biochemical activity remains unknown. Here, we show that the prokaryotic and eukaryotic Maf proteins exhibit nucleotide pyrophosphatase activity against 5-methyl-UTP, pseudo-UTP, 5-methyl-CTP, and 7-methyl-GTP, which represent the most abundant modified bases in all organisms, as well as against canonical nucleotides dTTP, UTP, and CTP. Overexpression of the Maf protein YhdE in E. coli cells increased intracellular levels of dTMP and UMP, confirming that dTTP and UTP are the in vivo substrates of this protein. Crystal structures and site-directed mutagenesis of Maf proteins revealed the determinants of their activity and substrate specificity. Thus, pyrophosphatase activity of Maf proteins toward canonical and modified nucleotides might provide the molecular mechanism for a dual role of these proteins in cell division arrest and house cleaning.
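The nucleotide pyrophosphatase activity reported for Maf proteins corresponds to the general hydrolysis of a nucleoside triphosphate to the monophosphate plus pyrophosphate, written here for the two in vivo substrates named in the abstract. This stoichiometry is why increased dTMP and UMP levels are the expected signature of YhdE overexpression; the scheme is the textbook reaction type, not a measurement from the study.

```latex
\mathrm{dTTP} + \mathrm{H_2O} \;\longrightarrow\; \mathrm{dTMP} + \mathrm{PP_i}
\qquad
\mathrm{UTP} + \mathrm{H_2O} \;\longrightarrow\; \mathrm{UMP} + \mathrm{PP_i}
```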
ABSTRACT: Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology.
We analyze a previously uncharacterized Pfam protein family, DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information for this family. This protein also contains the first representative structure of another Pfam family, the YARHG domain [Pfam:PF13308]. The DUF4424 domain adopts a 19-stranded beta-sandwich fold that resembles the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structural analysis reveals distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria.
Based on our analyses, we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may bind a moiety close to peptidoglycan, such as a hydrophobic outer-membrane lipid or lipopolysaccharide.
ABSTRACT: Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) annotate UniProt sequences with the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will be extended to other genomes in the near future. Because these resources use different strategies to predict structures, the main aim of Genome3D is to enable comparisons across all of them, so that biologists can see where predictions agree and can therefore be trusted more. Furthermore, because the methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This mapping identifies pairs of similar superfamilies from the two resources at different levels of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Nucleic Acids Research 11/2012; · 8.81 Impact Factor
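One way to picture "seeing where predictions agree" is a per-residue vote across prediction resources. The Python sketch below is a hypothetical illustration, not Genome3D's actual procedure: it assumes each resource reports domains as 1-based inclusive (start, end, label) tuples and that CATH and SCOP labels have already been mapped to a shared superfamily identifier. The resource names and domain boundaries in the example are invented.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Domain = Tuple[int, int, str]  # (start, end, superfamily label), 1-based inclusive


def per_residue_consensus(
    length: int, predictions: Dict[str, List[Domain]]
) -> List[Tuple[Optional[str], int]]:
    """For each residue, return the most common label and how many resources vote for it."""
    result = []
    for pos in range(1, length + 1):
        votes = Counter()
        for domains in predictions.values():
            for start, end, label in domains:
                if start <= pos <= end:
                    votes[label] += 1
                    break  # count each resource at most once per residue
        if votes:
            result.append(votes.most_common(1)[0])
        else:
            result.append((None, 0))
    return result


# Hypothetical predictions from three anonymous resources for a 10-residue chain.
predictions = {
    "resource_A": [(1, 6, "SF1")],
    "resource_B": [(2, 8, "SF1")],
    "resource_C": [(1, 10, "SF2")],
}
consensus = per_residue_consensus(10, predictions)
print(consensus[2])  # ('SF1', 2): residue 3 is covered by A and B (SF1) and by C (SF2)
```

A higher vote count marks regions where independent methods converge; how such counts would translate into confidence tiers is left open here.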
ABSTRACT: C-1 carriers are essential cofactors in all domains of life; in Archaea, these can be derivatives of tetrahydromethanopterin (H4-MPT) or tetrahydrofolate (H4-folate). Their synthesis requires 6-hydroxymethyl-7,8-dihydropterin diphosphate (6-HMDP) as the precursor, but the pathways leading to its formation were unknown until the recent discovery of the GTP cyclohydrolase IB/MptA family, which catalyzes the first step, the conversion of GTP to dihydroneopterin 2',3'-cyclic phosphate or 7,8-dihydroneopterin triphosphate [El Yacoubi, B.; et al. (2006) J. Biol. Chem., 281, 37586-37593 and Grochowski, L. L.; et al. (2007) Biochemistry, 46, 6658-6667]. Using a combination of comparative genomics analyses, heterologous complementation tests, and in vitro assays, we show that the archaeal protein families COG2098 and COG1634 specify two of the missing 6-HMDP synthesis enzymes. Members of the COG2098 family catalyze the formation of 6-hydroxymethyl-7,8-dihydropterin from 7,8-dihydroneopterin, while members of the COG1634 family catalyze the formation of 6-HMDP from 6-hydroxymethyl-7,8-dihydropterin. The discovery of these missing genes solves a long-standing mystery and provides novel examples of convergent evolution, in which proteins of dissimilar architectures perform the same biochemical function.
ACS Chemical Biology 08/2012; · 5.44 Impact Factor
ABSTRACT: The YgjD/Kae1 family (COG0533) has been on the top-10 list of universally conserved proteins of unknown function for over 5 years. It has been linked to DNA maintenance in bacteria and mitochondria and to transcription regulation and telomere homeostasis in eukaryotes, but its actual function had never been found. Based on a comparative genomic and structural analysis, we predicted that this family is involved in the biosynthesis of N6-threonylcarbamoyl adenosine (t6A), a universal modification found at position 37 of tRNAs decoding ANN codons. This prediction was confirmed by the finding that a yeast mutant lacking Kae1 is devoid of t6A. t6A-deficient strains were also used to reveal that t6A has a critical role in restricting initiation codon usage to AUG and in restricting frameshifting at tandem ANN codons. We also showed that YeaZ, a YgjD paralog, is required for YgjD function in vivo in bacteria. This work lays the foundation for understanding the pleiotropic role of this universal protein family.
The EMBO Journal 02/2011; 30(5):882-93. · 9.82 Impact Factor
ABSTRACT: The crystal structure of a putative NTPase, YP_001813558.1 from Exiguobacterium sibiricum 255-15 (PF09934, DUF2166), was determined to 1.78 Å resolution. YP_001813558.1 and its homologs (dimeric dUTPases, MazG proteins and HisE-encoded phosphoribosyl ATP pyrophosphohydrolases) form a superfamily of all-α-helical NTP pyrophosphatases. In dimeric dUTPase-like proteins, a central four-helix bundle forms the active site. However, in YP_001813558.1, an unexpected intertwined swapping of two of the helices that compose the conserved helix bundle results in a 'linked dimer' that has not previously been observed for this family. Interestingly, despite this novel mode of dimerization, the metal-binding site for divalent cations, such as magnesium, that are essential for NTPase activity is still conserved. Furthermore, the active-site residues that are involved in sugar binding of the NTPs are also conserved when compared with other α-helical NTPases, but those that recognize the nucleotide bases are not conserved, suggesting a different substrate specificity.
Acta Crystallographica Section F Structural Biology and Crystallization Communications 10/2010; 66(Pt 10):1237-44. · 0.55 Impact Factor
ABSTRACT: During the past decade, the Protein Structure Initiative (PSI) centres have become major contributors of new families, superfamilies and folds to the Structural Classification of Proteins (SCOP) database. The PSI results have increased the diversity of protein structural space and accelerated our understanding of it. This review article surveys a selection of protein structures determined by the Joint Center for Structural Genomics (JCSG). It presents previously undescribed β-sheet architectures such as the double barrel and spiral β-roll and discusses new examples of unusual topologies and peculiar structural features observed in proteins characterized by the JCSG and other Structural Genomics centres.
Acta Crystallographica Section F Structural Biology and Crystallization Communications 10/2010; 66(Pt 10):1190-7. · 0.55 Impact Factor
ABSTRACT: The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP to cope with the rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures according to their detected relationships at the Family and Superfamily levels, in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that provides earlier access to information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
Nucleic Acids Research 02/2008; 36(Database issue):D419-25. · 8.81 Impact Factor
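Unlike a graph with many-to-many links, the classic SCOP levels listed above form a strict tree in which every node has exactly one parent, one level up. The Python sketch below encodes that rule; the `Node` class and the placeholder names are hypothetical and are not taken from SCOP's own software, files or naming.

```python
from dataclasses import dataclass
from typing import List, Optional

# Levels from the root of the tree downward.
LEVELS = ["Class", "Fold", "Superfamily", "Family", "Protein", "Species"]


@dataclass
class Node:
    name: str
    level: str
    parent: Optional["Node"] = None

    def __post_init__(self):
        depth = LEVELS.index(self.level)  # raises ValueError for an unknown level
        if depth == 0:
            if self.parent is not None:
                raise ValueError("a Class node has no parent")
        elif self.parent is None or LEVELS.index(self.parent.level) != depth - 1:
            raise ValueError(f"{self.level} must sit directly under {LEVELS[depth - 1]}")

    def lineage(self) -> List[str]:
        """Walk the single parent chain from this node up to its Class."""
        node, path = self, []
        while node is not None:
            path.append(f"{node.level}: {node.name}")
            node = node.parent
        return list(reversed(path))


# Placeholder names only.
root = Node("ExampleClass", "Class")
fold = Node("ExampleFold", "Fold", root)
superfamily = Node("ExampleSuperfamily", "Superfamily", fold)
family = Node("ExampleFamily", "Family", superfamily)
protein = Node("ExampleProtein", "Protein", family)
species = Node("ExampleSpecies", "Species", protein)
print(species.lineage()[0])  # Class: ExampleClass
```

Because each node stores one parent, any entry has exactly one path to the root; attaching a Family directly under a Fold is rejected at construction time.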
ABSTRACT: We have identified a novel family of proteins in which an N-terminal cystathionine beta-synthase (CBS) domain is fused to a C-terminal Zn ribbon domain. Four proteins were overexpressed in Escherichia coli and purified: TA0289 from Thermoplasma acidophilum, TV1335 from Thermoplasma volcanium, PF1953 from Pyrococcus furiosus, and PH0267 from Pyrococcus horikoshii. The purified proteins were red/purple in solution and had an absorption spectrum typical of rubredoxins (Rds). Metal analysis of the purified proteins revealed the presence of several metals, with iron and zinc being the most abundant (2-67% of iron and 12-74% of zinc). Crystal structures of both mercury- and iron-bound TA0289 (1.5-2.0 Å resolution) revealed a dimeric protein whose intersubunit contacts are formed exclusively by the alpha-helices of the two cystathionine beta-synthase subdomains, whereas the C-terminal domain has a classical Zn ribbon planar architecture. All proteins were reversibly reduced by chemical reductants (ascorbate or dithionite) or by the general Rd reductase NorW from E. coli in the presence of NADH. Reduced TA0289 was able to transfer electrons to horse heart cytochrome c. Likewise, the purified Zn ribbon protein KTI11 from Saccharomyces cerevisiae was purple in solution, had an Rd-like absorption spectrum, contained both iron and zinc, and was reduced by the Rd reductase NorW from E. coli. Thus, recombinant Zn ribbon domains from archaea and yeast show Rd-like electron carrier activity in vitro. We suggest that, in vivo, some Zn ribbon domains might also bind iron and therefore possess electron carrier activity, adding another physiological role to this large family of important proteins.
Journal of Molecular Biology 02/2008; 375(1):301-15. · 3.91 Impact Factor
ABSTRACT: With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified by expert analysis, but identifying them automatically with existing computational tools remains difficult or impossible. Such non-trivial relationships are known to pose problems for multiple alignment algorithms and to impede comparative modeling studies. They support an emerging concept of the evolutionarily changeable protein fold, which creates practical difficulties for hierarchical classifications of protein structures.
To facilitate the understanding of proteins with such non-trivial structural relationships, and to provide comprehensive annotation for them, we have created SISYPHUS (Σίσυφος, in Greek 'crafty'), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions ranging from oligomeric biological units and individual domains to fragments of different sizes. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from http://sisyphus.mrc-cpe.cam.ac.uk.
Nucleic Acids Research 02/2007; 35(Database issue):D253-9. · 8.81 Impact Factor
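At the sequence level, the idealized form of one irregularity mentioned in the SISYPHUS abstract, circular permutation, is an exact rotation: sequence b is a circular permutation of a if b occurs inside a concatenated with itself. The Python sketch below shows only that toy check. Real circularly permuted homologs also differ by substitutions, insertions and deletions and are usually recognized by structural comparison, so this is not a method used by SISYPHUS; the example sequence is made up.

```python
from typing import Optional


def circular_permutation_offset(a: str, b: str) -> Optional[int]:
    """Return k such that b == a[k:] + a[:k], or None if b is not an exact rotation of a."""
    if len(a) != len(b) or not a:
        return None
    index = (a + a).find(b)  # every rotation of a appears as a substring of a + a
    return index if index != -1 else None


seq = "MKTAYIAKQR"            # made-up sequence
permuted = seq[4:] + seq[:4]  # move the first four residues to the C-terminus
print(permuted, circular_permutation_offset(seq, permuted))  # YIAKQRMKTA 4
```

The offset says where the new N-terminus falls in the original sequence, which is the information a permutation-aware alignment needs before it can line the two chains up.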
ABSTRACT: Understanding the molecular mechanisms of transition state regulator proteins is critical, since they play a pivotal role in the ability of bacteria to cope with changing environments. Although much effort has focused on their genetic characterization, little is known about their structural and functional conservation. Here we present the high-resolution NMR solution structure of the N-terminal domain of the Bacillus subtilis transition state regulator Abh (AbhN), only the second such structure to date. We then compare AbhN to the N-terminal DNA-binding domain of B. subtilis AbrB (AbrBN). This is the first such comparison between two AbrB-like transition state regulators. AbhN and AbrBN are very similar, suggesting a common structural basis for their DNA binding. However, we also note subtle differences between the AbhN and AbrBN structures, which may play important roles in DNA target specificity. The results of accompanying in vitro DNA-binding studies highlight binding differences between the two proteins.
Journal of Biological Chemistry 08/2006; 281(30):21399-409. · 4.65 Impact Factor
ABSTRACT: The functional requirement to form and maintain the active-site structure probably exerts strong selective pressure on a protein to adopt a single stable and evolutionarily conserved fold. Nonetheless, new evidence suggests that protein folds may be neither physically nor biologically invariant. Alternative folds discovered in several proteins are composed of constant and variable parts. The variable parts display context-dependent conformations and a tendency to form new oligomeric interfaces. In turn, oligomerisation mediates fold evolution without loss of protein function. Gene duplication breaks homo-oligomeric symmetry and relieves the pressure to maintain the local architecture of redundant active sites, which can lead to further structural changes.
Current Opinion in Structural Biology 07/2006; 16(3):399-408. · 8.74 Impact Factor
ABSTRACT: The culturability of several actinobacteria is controlled by resuscitation-promoting factors (Rpfs). These proteins contain a domain of about 70 residues that adopts a lysozyme-like fold. The invariant catalytic glutamate residue found in lysozyme and various bacterial lytic transglycosylases is also conserved in the Rpf proteins. Rpf from Micrococcus luteus, the founder member of this protein family, is indeed a muralytic enzyme, as revealed by its activity in zymograms containing M. luteus cell walls and by its ability to (i) cause lysis of Escherichia coli when expressed and secreted into the periplasm; (ii) release fluorescent material from fluorescamine-labelled cell walls of M. luteus; and (iii) hydrolyse the artificial lysozyme substrate 4-methylumbelliferyl-beta-D-N,N',N''-triacetylchitotrioside. Rpf activity was reduced but not completely abolished when the invariant glutamate residue was altered. Moreover, none of the other acidic residues in the Rpf domain was absolutely required for muralytic activity. Replacing one or both of the cysteine residues that probably form a disulphide bridge within Rpf impaired but did not completely abolish muralytic activity. The muralytic activities of the Rpf mutants correlated with their abilities to stimulate bacterial culturability and resuscitation, consistent with the view that the biological activity of Rpf results directly or indirectly from its ability to cleave bonds in bacterial peptidoglycan.