Comparing Genomes in terms of Protein Structure: Surveys of a Finite Parts List

Department of Molecular Biophysics and Biochemistry, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, USA
FEMS Microbiology Reviews (Impact Factor: 10.96). 08/1998; DOI: 10.1016/S0168-6445(98)00019-9
Source: CiteSeer

ABSTRACT We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in some respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g. analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into "fold families." This library can be built up automatically using a structure-comparison program, and we describe how important objective stat...
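
The census analogy in the abstract can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a list of fold labels, one per gene with a structural match. The genome names, the fold labels, and the helper functions `fold_census` and `shared_folds` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical fold assignments: genome name -> fold label for each gene
# that has a structural match. All names and labels are made up.
assignments = {
    "genome_A": ["TIM-barrel", "Rossmann", "Rossmann", "ferredoxin-like"],
    "genome_B": ["Rossmann", "TIM-barrel", "TIM-barrel", "Ig-like"],
}


def fold_census(assignments):
    """Count how often each fold occurs in each genome."""
    return {genome: Counter(folds) for genome, folds in assignments.items()}


def shared_folds(census):
    """Return the set of folds present in every surveyed genome."""
    fold_sets = [set(counts) for counts in census.values()]
    return set.intersection(*fold_sets) if fold_sets else set()


census = fold_census(assignments)
common = shared_folds(census)
```

Here `census["genome_A"]` holds the per-fold counts for one genome, and `common` holds the folds found in all of them. A real survey would draw the labels from a fold library rather than a hand-written dictionary.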

  • ABSTRACT: The sequencing of entire bacterial genomes is becoming increasingly routine, promising to revolutionise approaches to identifying putative antimicrobial drug targets. In silico methods can be used to identify putative gene products by comparing sequences of biochemically characterised enzymes and proteins with data produced by sequencing projects. Comparative genomics of a pathogenic bacterium versus a nonpathogen, as well as of pathogen versus host, can identify molecular targets that would be ideal for future investigation. The aim of these comparisons would be to identify genes that code for pathogenicity factors in the bacterium or genes essential for bacterial survival. The latter set of genes includes those that are nonfunctional or redundant in the host as well as genes absent from the host but essential in the pathogen. The products of these genes would be ideal targets for antimicrobial compounds. If compounds could be generated that disrupt the pathogen's ability to thrive but do not affect the host, because the host lacks the targeted protein, they could prove to be powerful therapeutics. An elegant example illustrating the power of comparative genomics involves comparison of the pathways of bacterial and eukaryotic aminoacyl-tRNA synthesis. Comparison of pathogenic bacterial genomes shows that many bacteria lack the genes encoding either one or two specific aminoacyl-tRNA synthetases, enzymes involved in ensuring correct aminoacylation of tRNA for subsequent translation of the genetic code. Bacteria have an alternative pathway by which amide aminoacyl-tRNAs are formed. Comparative genomics has demonstrated that this pathway is uniquely prokaryotic/archaeal and also relatively widely found in pathogenic bacteria, indicating the potential of the catalytic enzymes of the pathway as targets for novel antimicrobial drugs.
    BioDrugs 02/2002; 16(5):331-7. · 2.12 Impact Factor
  • ABSTRACT: The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at
    Methods of Information in Medicine 02/2001; 40(4):346-58. · 1.08 Impact Factor
  • ABSTRACT: Cellular metabolism can be characterized by networks of enzymatic reactions and transport processes capable of supporting cellular life. Our aim is to find evolutionary patterns and processes embedded in the architecture and function of modern metabolism, using information derived from structural genomics. The Molecular Ancestry Network (MANET) project traces evolution of protein architecture in biomolecular networks. We describe metabolic MANET, a database that links information in the Structural Classification of Proteins (SCOP), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and phylogenetic reconstructions depicting the evolution of protein fold architecture. Metabolic MANET literally 'paints' the ancestries of enzymes derived from rooted phylogenomic trees directly onto over one hundred metabolic subnetworks, enabling the study of evolutionary patterns at global and local levels. An initial analysis of painted subnetworks reveals widespread enzymatic recruitment and an early origin of amino acid metabolism. MANET maps evolutionary relationships directly and globally onto biological networks, and can generate and test hypotheses related to evolution of metabolism. We anticipate its use in the study of other networks, such as signaling and other protein-protein interaction networks.
    BMC Bioinformatics 02/2006; 7:351. · 3.02 Impact Factor

Full-text (2 Sources), available from Hedi Hegyi, May 26, 2014