-
ABSTRACT: N(ε)-Methylations of histone lysine residues play critical roles in cell biology by "marking" chromatin for transcriptional activation or repression. Lysine demethylases reverse N(ε)-methylation in a sequence- and methylation-selective manner. The determinants of sequence selectivity for histone demethylases have been unclear. The human JMJD2 (KDM4) H3K9 and H3K36 demethylases can be divided into members that act on both H3K9 and H3K36 and H3K9 alone. Kinetic, crystallographic, and mutagenetic studies in vitro and in cells on KDM4A-E reveal that selectivity is determined by multiple interactions within the catalytic domain but outside the active site. Structurally informed phylogenetic analyses reveal that KDM4A-C orthologues exist in all genome-sequenced vertebrates with earlier animals containing only a single KDM4 enzyme. KDM4D orthologues only exist in eutherians (placental mammals) where they are conserved, including proposed substrate sequence-determining residues. The results will be useful for the identification of inhibitors for specific histone demethylases.
Journal of Biological Chemistry 09/2011; 286(48):41616-25. · 4.77 Impact Factor
-
ABSTRACT: An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
Protein Science 04/2009; 11(2):233 - 244. · 2.80 Impact Factor
-
Bengt Persson,
Yvonne Kallberg,
James E Bray,
Elspeth Bruford,
Stephen L Dellaporta,
Angelo D Favia,
Roser Gonzalez Duarte,
Hans Jörnvall,
Kathryn L Kavanagh,
Natalia Kedishvili,
Michael Kisiela,
Edmund Maser,
Rebekka Mindnich,
Sandra Orchard,
Trevor M Penning,
Janet M Thornton,
Jerzy Adamski,
Udo Oppermann
ABSTRACT: Short-chain dehydrogenases/reductases (SDR) constitute one of the largest enzyme superfamilies with presently over 46,000 members. In phylogenetic comparisons, members of this superfamily show early divergence where the majority have only low pairwise sequence identity, although sharing common structural properties. The SDR enzymes are present in virtually all genomes investigated, and in humans over 70 SDR genes have been identified. In humans, these enzymes are involved in the metabolism of a large variety of compounds, including steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. It is now clear that SDRs represent one of the oldest protein families and contribute to essential functions and interactions of all forms of life. As this field continues to grow rapidly, a systematic nomenclature is essential for future annotation and reference purposes. A functional subdivision of the SDR superfamily into at least 200 SDR families based upon hidden Markov models forms a suitable foundation for such a nomenclature system, which we present in this paper using human SDRs as examples.
Chemico-biological interactions 12/2008; 178(1-3):94-8. · 2.46 Impact Factor
-
ABSTRACT: The short-chain dehydrogenase/reductase (SDR) superfamily represents one of the largest protein superfamilies known to date. Enzymes of this family usually catalyse NAD(P)(H) dependent reactions with a substrate spectrum ranging from polyols, retinoids, steroids and fatty acid derivatives to xenobiotics. We have currently identified 73 SDR superfamily members within the human genome. A status report of the human SDR superfamily is provided in terms of 3D structure determination, co-factor preferences, subcellular localisation and functional annotation. A simple scoring system for measuring structural and functional information (SFS score) has also been introduced to monitor the status of 5 key metrics. Currently there are 17 SDR members with an SFS score of zero indicating that almost a quarter of the human SDR superfamily lacks substantial functional annotation.
Chemico-biological interactions 12/2008; 178(1-3):99-109. · 2.46 Impact Factor
-
Stanley S Ng,
Kathryn L Kavanagh,
Michael A McDonough,
Danica Butler,
Ewa S Pilka,
Benoit M R Lienard,
James E Bray,
Pavel Savitsky,
Opher Gileadi,
Frank von Delft,
Nathan R Rose,
John Offer,
Johanna C Scheinost,
Tomasz Borowski,
Michael Sundstrom,
Christopher J Schofield,
Udo Oppermann
ABSTRACT: Post-translational histone modification has a fundamental role in chromatin biology and is proposed to constitute a 'histone code' in epigenetic regulation. Differential methylation of histone H3 and H4 lysyl residues regulates processes including heterochromatin formation, X-chromosome inactivation, genome imprinting, DNA repair and transcriptional regulation. The discovery of lysyl demethylases using flavin (amine oxidases) or Fe(II) and 2-oxoglutarate as cofactors (2OG oxygenases) has changed the view of methylation as a stable epigenetic marker. However, little is known about how the demethylases are selective for particular lysyl-containing sequences in specific methylation states, a key to understanding their functions. Here we reveal how human JMJD2A (jumonji domain containing 2A), which is selective towards tri- and dimethylated histone H3 lysyl residues 9 and 36 (H3K9me3/me2 and H3K36me3/me2), discriminates between methylation states and achieves sequence selectivity for H3K9. We report structures of JMJD2A-Ni(II)-Zn(II) inhibitor complexes bound to tri-, di- and monomethyl forms of H3K9 and the trimethyl form of H3K36. The structures reveal a lysyl-binding pocket in which substrates are bound in distinct bent conformations involving the Zn-binding site. We propose a mechanism for achieving methylation state selectivity involving the orientation of the substrate methyl groups towards a ferryl intermediate. The results suggest distinct recognition mechanisms in different demethylase subfamilies and provide a starting point to develop chemical tools for drug discovery and to study and dissect the complexity of reversible histone methylation and its role in chromatin biology.
Nature 08/2007; 448(7149):87-91. · 36.28 Impact Factor
-
Nevan J Krogan,
Gerard Cagney,
Haiyuan Yu,
Gouqing Zhong,
Xinghua Guo,
Alexandr Ignatchenko,
Joyce Li,
Shuye Pu,
Nira Datta,
Aaron P Tikuisis, [......],
Ali Shilatifard,
Erin O'Shea,
Jonathan S Weissman,
C James Ingles,
Timothy R Hughes,
John Parkinson,
Mark Gerstein,
Shoshana J Wodak,
Andrew Emili,
Jack F Greenblatt
ABSTRACT: Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.
Nature 04/2006; 440(7084):637-43. · 36.28 Impact Factor
-
ABSTRACT: Membrane proteins constitute ~30% of prokaryotic and eukaryotic genomes but comprise a small fraction of the entries in protein structural databases. A number of features of membrane proteins render them challenging targets for the structural biologist, among which the most important is the difficulty in obtaining sufficient quantities of purified protein. We are exploring procedures to express and purify large numbers of prokaryotic membrane proteins. A set of 280 membrane proteins from Escherichia coli and Thermotoga maritima, a thermophile, was cloned and tested for expression in Escherichia coli. Under a set of standard conditions, expression could be detected in the membrane fraction for approximately 30% of the cloned targets. About 22 of the highest expressing membrane proteins were purified, typically in just two chromatographic steps. There was a clear correlation between the number of predicted transmembrane domains in a given target and its propensity to express and purify. Accordingly, the vast majority of successfully expressed and purified proteins had six or fewer transmembrane domains. We did not observe any clear advantage to the use of thermophilic targets. Two of the purified membrane proteins formed crystals. By comparison with protein production efforts for soluble proteins, where approximately 70% of cloned targets express and approximately 25% can be readily purified for structural studies [Christendat et al. (2000) Nat. Struct. Biol., 7, 903], our results demonstrate that a similar approach will succeed for membrane proteins, albeit with an expected higher attrition rate.
Journal of Structural and Functional Genomics 02/2005; 6(1):33-50.
-
ABSTRACT: Target selection strategies for structural genomic projects must be able to prioritize gene regions on the basis of significant sequence similarity with proteins that have already been structurally determined. With the rapid development of protein comparison software a robust prioritization scheme should be independent of the choice of algorithm and be able to incorporate different sequence similarity thresholds.
A robust target selection strategy has been developed that can assign a priority level to all genes in any genome. Structural assignments to genome sequences are calculated at two thresholds and six levels (1-6) describe the prioritization of all whole genes and partial gene regions. This simple two-threshold approach can be implemented with any fold recognition or homology detection algorithms. The results for 10 genomes are presented using the SSEARCH and PSI-BLAST programs.
Programs are available on request from the authors.
Bioinformatics 10/2004; 20(14):2288-95. · 5.47 Impact Factor
-
Nevan J Krogan,
Wen-Tao Peng,
Gerard Cagney,
Mark D Robinson,
Robin Haw,
Gouqing Zhong,
Xinghua Guo,
Xin Zhang,
Veronica Canadien,
Dawn P Richards, [......],
Armaity P Davierwala,
Sanie Mnaimneh,
Andrei Starostine,
Aaron P Tikuisis,
Jorg Grigull,
Nira Datta,
James E Bray,
Timothy R Hughes,
Andrew Emili,
Jack F Greenblatt
ABSTRACT: A remarkably large collection of evolutionarily conserved proteins has been implicated in processing of noncoding RNAs and biogenesis of ribonucleoproteins. To better define the physical and functional relationships among these proteins and their cognate RNAs, we performed 165 highly stringent affinity purifications of known or predicted RNA-related proteins from Saccharomyces cerevisiae. We systematically identified and estimated the relative abundance of stably associated polypeptides and RNA species using a combination of gel densitometry, protein mass spectrometry, and oligonucleotide microarray hybridization. Ninety-two discrete proteins or protein complexes were identified comprising 489 different polypeptides, many associated with one or more specific RNA molecules. Some of the pre-rRNA-processing complexes that were obtained are discrete sub-complexes of those previously described. Among these, we identified the IPI complex required for proper processing of the ITS2 region of the ribosomal RNA primary transcript. This study provides a high-resolution overview of the modular topology of noncoding RNA-processing machinery.
Molecular Cell 02/2004; 13(2):225-39. · 14.18 Impact Factor
-
ABSTRACT: The Gene3D database (http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/) provides structural assignments for genes within complete genomes. These are available via the internet from either the World Wide Web or FTP. Assignments are made using PSI-BLAST and subsequently processed using the DRange protocol. The DRange protocol is an empirically benchmarked method for assessing the validity of structural assignments made using sequence searching methods where appropriate assignment statistics are collected and made available. Gene3D links assignments to their appropriate entries in relevant structural and classification resources (PDBsum, CATH database and the Dictionary of Homologous Superfamilies). Release 2.0 of Gene3D includes 62 genomes, 2 eukaryotes, 10 archaea and 40 bacteria. Currently, structural assignments can be made for between 30 and 40 percent of any given genome. In any genome, around half of those genes assigned a structural domain are assigned a single domain and the other half of the genes are assigned multiple structural domains. Gene3D is linked to the CATH database and is updated with each new update of CATH.
Nucleic Acids Research 02/2003; 31(1):469-73. · 8.03 Impact Factor
-
ABSTRACT: Over the last decade, there have been huge increases in the numbers of protein sequences and structures determined. In parallel, many methods have been developed for recognising similarities between these proteins, arising from their common evolutionary background, and for clustering such relatives into protein families. Here we review some of the protein family resources available to the biologist and describe how these can be used to provide structural and functional annotations for newly determined sequences. In particular we describe recent developments to the CATH domain database of protein structural families which have facilitated genome annotation and which have also revealed important caveats that must be considered when transferring functional data between homologous proteins.
PROTEOMICS 02/2002; 2(1):11-21. · 4.51 Impact Factor
-
Nucleic Acids Research. 01/1999; 27:275-279.