Minoru Kanehisa

Kyoto University, Kyoto, Japan

Publications (379) · 1,031.7 Total impact

  • ABSTRACT: Although several databases contain data on many metabolites and reactions in biochemical pathways, there is still a large gap between the numbers of experimentally identified enzymes and metabolites, and many catalytic enzyme genes are presumably still unknown. Previous studies that estimated the number of candidate enzyme genes required additional information besides the structures of metabolites, such as gene expression and gene order in the genome. In this study, we developed a novel method to identify a candidate enzyme gene for a reaction using the chemical structures of the substrate-product pair (reactant pair). The proposed method searches a reference database for similar reactant pairs and offers ortholog groups that possibly mediate the given reaction. We applied the proposed method to two experimentally validated reactions and confirmed that the histidine transaminase was correctly identified. Although our method could not directly identify the asparagine oxo-acid transaminase, we successfully found the paralog gene most similar to the correct enzyme gene. We also applied our method to infer candidate enzyme genes in the mesaconate pathway. The advantage of our method lies in the prediction of possible genes for orphan enzyme reactions where changes in chemical structure are not known in an organism. We believe that this approach will facilitate experimental identification of genes for orphan enzymes.
    Article · Jan 2016 · Journal of Chemical Information and Modeling
  • N. Manoj · V. R. Srinivas · A. Surolia · M. Vijayan · K. Suguna · R. Ravishankar · R. Schwarzenbacher · K. Zeth · Diederichs · G. M. Kostner · [...] · M. De Spirito · Rajendra K. Agrawal · Amy B. Heagle · Pawel Penczek · Robert Grassucci · Joachim Frank · Manjuli R. Sharma · Loice H. Jeyakumar · Sidney Fleischer · Terence Wagenknecht
    Dataset · Jan 2016
  • Minoru Kanehisa
    ABSTRACT: In the era of high-throughput biology it is necessary to develop not only elaborate computational methods but also well-curated databases that can be used as reference for data interpretation. KEGG (http://www.kegg.jp/) is such a reference knowledge base with two specific aims. One is to compile knowledge on high-level functions of the cell and the organism in terms of the molecular interaction and reaction networks, which is implemented in KEGG pathway maps, BRITE functional hierarchies, and KEGG modules. The other is to expand knowledge on genes and proteins involved in the molecular networks from experimentally observed organisms to other organisms using the concept of orthologs, which is implemented in the KEGG Orthology (KO) system. Thus, KEGG is a generic resource applicable to all organisms and enables interpretation of high-level functions from genomic and molecular data. Here we first present a brief overview of the entire KEGG resource, and then introduce how to use KEGG in plant genomics and metabolomics research.
    Chapter · Jan 2016
  • Article · Dec 2015 · Glycobiology
  • Minoru Kanehisa · Yoko Sato · Kanae Morishima
    ABSTRACT: BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KEGG Orthology (KO) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem. Both servers are made freely available at the KEGG website (http://www.kegg.jp/blastkoala/). In BlastKOALA the KO assignment is done by a modified version of the internally used KOALA algorithm after the BLAST search against a non-redundant dataset of pangenome sequences at the species, genus or family level, which is generated from the KEGG GENES database by retaining the KO content of each taxonomic category. In GhostKOALA, which uses the faster GHOSTX program for the database search and is suitable for metagenome annotation, the pangenome dataset is supplemented with CD-HIT clusters including those for viral genes. The result files may be downloaded and manipulated for further KEGG Mapper analysis, such as comparative pathway analysis using multiple BlastKOALA results.
    Article · Nov 2015 · Journal of Molecular Biology
  • ABSTRACT: KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.
    Article · Oct 2015 · Nucleic Acids Research
  • Kiyoko F Aoki-Kinoshita · Minoru Kanehisa
    ABSTRACT: This chapter describes the KEGG GLYCAN database of the KEGG resource, including descriptions of links to the other databases in KEGG. In particular, KEGG GLYCAN consists of glycan structures, with links to glycogenes, orthologs, reactions, pathways, drugs, diseases, and others, all within the KEGG resources. A number of analytical tools are also available, including the composite structure map (CSM), KegDraw, KCam, and GECS. These databases and tools will be described along with simple examples of their usage.
    Article · Mar 2015 · Methods in Molecular Biology (Clifton, N.J.)
  • ABSTRACT: Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by IUBMB, and has been made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies. Instead, most bioinformatics studies simply use the identifiers of the enzymes, i.e. the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List, and demonstrated that the partial characteristics of reactions cannot be retrieved by simply using EC numbers. Thus, we developed a novel ontology, named PIERO, for annotating biochemical transformations as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List, and was grouped into terms related to overall reactions and terms related to biochemical transformations. Next, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. This ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from the given transformations. Further development of the PIERO ontology should enhance the Enzyme List to promote the integration of genomics and metabolomics.
    Article · Oct 2014 · Journal of Bioinformatics and Computational Biology
  • ABSTRACT: DINIES (drug–target interaction network inference engine based on supervised analysis) is a web server for predicting unknown drug–target interaction networks from various types of biological data (e.g. chemical structures, drug side effects, amino acid sequences and protein domains) in the framework of supervised network inference. The originality of DINIES lies in prediction with state-of-the-art machine learning methods, in the integration of heterogeneous biological data and in compatibility with the KEGG database. The DINIES server accepts any ‘profiles’ or precalculated similarity matrices (or ‘kernels’) of drugs and target proteins in tab-delimited file format. When a training data set is submitted to learn a predictive model, users can select either known interaction information in the KEGG DRUG database or their own interaction data. The user can also select an algorithm for supervised network inference, select various parameters in the method and specify weights for heterogeneous data integration. The server can provide integrative analyses with useful components in KEGG, such as biological pathways, functional hierarchy and human diseases. DINIES (http://www.genome.jp/tools/dinies/) is publicly available as one of the genome analysis tools in GenomeNet.
    Article · May 2014 · Nucleic Acids Research
  • Masaaki Kotera · Susumu Goto · Minoru Kanehisa
    ABSTRACT: The IUBMB's Enzyme List is a valuable library of individual experimental facts on enzyme activities, providing the standard classification and nomenclature of enzymes. Empirical knowledge about the relationships between enzyme protein sequences (or structures) and their functions (the capability of catalyzing chemical reactions) has been accumulating in the published literature and in databases. This provides a complementary approach to standardizing and organizing enzyme data, i.e., predicting the possible enzymes, reactions and metabolites that remain to be identified experimentally. Thus, we suggest that enzymes need to be classified based on the evidence and the different perspectives obtained from various experimental works. The KEGG (Kyoto Encyclopedia of Genes and Genomes) database describes enzymes from many different viewpoints, including the IUBMB's enzyme nomenclature/classification (EC numbers), the similarity groups of enzyme reactions (KEGG Reaction Class; RCLASS) based solely on chemical structure transformation patterns, and the similarity groups of enzyme genes (KEGG Orthology; KO) based on orthologous groups that can be mapped to the KEGG PATHWAY and BRITE functional hierarchy. Some unique identifiers other than the EC numbers established by IUBMB were additionally introduced to the KEGG database: R, RP and RC numbers distinguish reactions, reactant pairs and RCLASS entries, respectively. Genes, including enzyme genes, have their own ID numbers in specific organisms, and they are classified into ortholog groups identified by K numbers. In this review, we explain the concept and methodology of this formulation with some concrete example cases. We propose that it would be beneficial to create a standard classification scheme that deals with both experimentally identified and theoretically predicted enzymes.
    Article · May 2014
  • ABSTRACT: To develop hypotheses on unknown metabolic pathways, biochemists frequently rely on literature that uses a free-text format to describe functional groups or substructures. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, these chemical descriptors are difficult to interpret, since they are not directly linked to the terminology of functional groups or substructures that biochemists use. In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures in seven attributes that resemble the way biochemists deal with substructures. We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S revealed the specific appearance of substructures in various datasets of molecules, which describes the characteristics of the respective datasets. Structure-based clustering of molecules using KCF-S resulted in clusters in which molecular weights and structures were less diverse than in those obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that are possibly converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over that of a previously presented method. KCF-S defines biochemical substructures while keeping interpretability, suggesting its potential for wider application in chemical bioinformatics. KCF and KCF-S can be automatically converted from the Molfile format, enabling molecules from any data source to be handled.
    Article · Dec 2013 · BMC Systems Biology
  • ABSTRACT: In the hierarchy of data, information and knowledge, computational methods play a major role in the initial processing of data to extract information, but on their own they are less effective at compiling knowledge from information. The Kyoto Encyclopedia of Genes and Genomes (KEGG) resource (http://www.kegg.jp/ or http://www.genome.jp/kegg/) has been developed as a reference knowledge base to assist this latter process. In particular, the KEGG pathway maps are widely used for biological interpretation of genome sequences and other high-throughput data. The link from genomes to pathways is made through the KEGG Orthology system, a collection of manually defined ortholog groups identified by K numbers. To better automate this interpretation process, the KEGG modules defined by Boolean expressions of K numbers have been expanded and improved. Once genes in a genome are annotated with K numbers, the KEGG modules can be computationally evaluated, revealing metabolic capacities and other phenotypic features. The reaction modules, which represent chemical units of reactions, have been used to analyze design principles of metabolic networks and also to improve the definition of K numbers and associated annotations. For translational bioinformatics, the KEGG MEDICUS resource has been developed by integrating drug labels (package inserts) used in society.
    Article · Nov 2013 · Nucleic Acids Research
  • ABSTRACT: Despite widespread consensus on the need to transform toxicology and risk assessment in order to keep pace with technological and computational changes that have revolutionized the life sciences, there remains much work to be done to achieve the vision of toxicology based on a mechanistic foundation. To this end, a workshop was organized to explore one key aspect of this transformation: the development of Pathways of Toxicity as a key tool for hazard identification based on systems biology. Several issues were discussed in depth in the workshop. The first was the challenge of formally defining the concept of a Pathway of Toxicity (PoT), as distinct from, but complementary to, other toxicological pathway concepts such as mode of action (MoA). The workshop came up with a preliminary definition of PoT as "A molecular definition of cellular processes shown to mediate adverse outcomes of toxicants". It is further recognized that normal physiological pathways exist that maintain homeostasis and that these, when sufficiently perturbed, can become PoT. Second, the workshop sought to define the adequate public and commercial resources for PoT information, including data, visualization, analyses, tools, and use-cases, as well as the kinds of efforts that will be necessary to enable the creation of such a resource. Third, the workshop explored ways in which systems biology approaches could inform pathway annotation, and which resources are needed and available that can provide relevant PoT information to the diverse user communities.
    Article · Oct 2013
  • Minoru Kanehisa
    ABSTRACT: The KEGG pathway maps are widely used as a reference data set for inferring high-level functions of the organism or the ecosystem from its genome or metagenome sequence data. The KEGG modules, which are tighter functional units often corresponding to subpathways in the KEGG pathway maps, are designed for better automation of genome interpretation. Each KEGG module is represented by a simple Boolean expression of KEGG Orthology (KO) identifiers (K numbers), enabling automatic evaluation of the completeness of genes in the genome. Here we focus on metabolic functions and introduce reaction modules for improving annotation and signature modules for inferring metabolic capacity. We also describe how genome annotation is performed in KEGG using the manually created KO database and the computationally generated SSDB database. The resulting KEGG GENES database with KO (K number) annotation is a reference sequence database against which newly determined genomes can be compared for automated annotation and interpretation.
    Article · Sep 2013
  • Minoru Kanehisa
    ABSTRACT: A unit of enzyme genes in an operon-like structure in a prokaryotic genome tends to encode enzymes that catalyze a series of consecutive reactions in a metabolic pathway. Our recent analysis shows that this and other genomic units correspond to chemical units reflecting the chemical logic of organic reactions. We identified these chemical units, called reaction modules, as conserved sequences of chemical structure transformation patterns of small molecules in all known metabolic pathways in the KEGG database. The extracted patterns suggest co-evolution of genomic units and chemical units. While the core of the metabolic network may have evolved with mechanisms involving individual enzymes and reactions, its extension may have been driven by modular units of enzymes and reactions.
    Article · Jun 2013 · FEBS Letters
  • ABSTRACT: The metabolic network is both a network of chemical reactions and a network of enzymes that catalyze reactions. Towards a better understanding of this duality in the evolution of the metabolic network, we developed a method to extract conserved sequences of reactions, called reaction modules, from the analysis of chemical compound structure transformation patterns in all known metabolic pathways stored in the KEGG PATHWAY database. The extracted reaction modules are used repeatedly as if they were building blocks of the metabolic network and contain the chemical logic of organic reactions. Furthermore, the reaction modules often correspond to traditional pathway modules defined as sets of enzymes in the KEGG MODULE database and sometimes to operon-like gene clusters in prokaryotic genomes. We identified well-conserved, possibly ancient, reaction modules involving 2-oxocarboxylic acids. The chain extension module that appears as the tricarboxylic acid reaction sequence in the TCA cycle is now shown to be used in other pathways together with different types of modification modules. We also identified reaction modules and their connection patterns for aromatic ring cleavages in microbial biodegradation pathways, which are the most characteristic in terms of both distinct reaction sequences and distinct gene clusters. The modular architecture of biodegradation modules has potential for predicting degradation pathways of xenobiotic compounds. The collection of these and many other reaction modules is made available as part of the KEGG database.
    Article · Feb 2013 · Journal of Chemical Information and Modeling
  • Minoru Kanehisa
    ABSTRACT: KEGG (http://www.genome.jp/kegg/) is an integrated database resource for linking genomes or molecular datasets to molecular networks (pathways, etc.) representing higher-level systemic functions of the cell, the organism, and the ecosystem. Major efforts have been undertaken for capturing and representing experimental knowledge as manually drawn KEGG pathway maps and for genome-based generalization of experimental knowledge through the KEGG Orthology (KO) system. Current knowledge on diseases and drugs has also been integrated in the KEGG pathway maps, especially in terms of known disease genes and drug targets. Thus, KEGG can be used as a reference knowledge base for integration and interpretation of large-scale datasets generated by high-throughput experimental technologies, as well as for finding their practical value. Here we give an introduction to the KEGG Mapper tools, especially for understanding disease mechanisms and adverse drug interactions.
    Article · Jan 2013 · Methods in Molecular Biology (Clifton, N.J.)
  • ABSTRACT: Background: One of the main goals of genomic analysis is to elucidate the comprehensive functions (functionome) in individual organisms or a whole community in various environments. However, a standard evaluation method for discerning the functional potentials harbored within the genome or metagenome has not yet been established. We have developed a new evaluation method for the potential functionome, based on the completion ratio of Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules. Results: The distribution of the completion ratio of the KEGG functional modules in 768 prokaryotic species varied greatly with the kind of module, and all modules primarily fell into 4 patterns (universal, restricted, diversified and non-prokaryotic modules), indicating the universal and unique nature of each module, and also the versatility of the KEGG Orthology (KO) identifiers mapped to each one. The module completion ratio in 8 phenotypically different bacilli revealed that some modules were shared only in phenotypically similar species. Metagenomes of human gut microbiomes from 13 healthy individuals, previously determined by the Sanger method, were analyzed based on the module completion ratio. The results led to new discoveries in the nutritional preferences of gut microbes, believed to be one of the mutualistic representations of gut microbiomes to avoid nutritional competition with the host. Conclusions: The method developed in this study could characterize the functionome harbored in genomes and metagenomes. As this method also provided taxonomical information from KEGG modules as well as the gene hosts constructing the modules, interpretation of completion profiles was simplified and we could identify the complementarity between biochemical functions in human hosts and the nutritional preferences in human gut microbiomes. Thus, our method has the potential to be a powerful tool for comparative functional analysis in genomics and metagenomics, able to target unknown environments containing various uncultivable microbes within unidentified phyla.
    Article · Dec 2012 · BMC Genomics
  • ABSTRACT: The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1,176,030 OCs, obtained by clustering 8,357,175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein-coding genes in all complete genomes, based on their amino acid sequence similarities. Calculating OCs is computationally efficient, which enables the contents to be updated regularly. KEGG OC has the following two features: (i) it covers all complete genomes of a wide variety of organisms from the three domains of life, and the number of organisms is the largest among existing databases; and (ii) it is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer, which provides an interactive visualization of OCs at different taxonomic levels.
    Article · Nov 2012 · Nucleic Acids Research
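Several of the abstracts above describe KEGG modules as Boolean expressions of K numbers that can be evaluated automatically once a genome is annotated, and the BMC Genomics paper scores genomes by a module completion ratio. The Python sketch below illustrates that idea under stated assumptions: it uses a simplified grammar (whitespace separates steps and means AND, a comma means OR, `+` joins required complex subunits, `-` marks an optional component, parentheses group), and it defines the completion ratio as the fraction of top-level steps satisfied by a set of K numbers. It is not KEGG's own evaluator, and the example definition string is illustrative rather than copied from a KEGG MODULE entry.

```python
import re

# Assumed simplified grammar: K numbers (K + 5 digits), parentheses, ',', '+', '-', whitespace.
TOKEN_RE = re.compile(r"K\d{5}|[(),+\-]|\s+")


def tokenize(definition):
    """Split a module definition into K numbers, operators, and ' ' (AND) tokens."""
    cleaned = []
    pos = 0
    while pos < len(definition):
        m = TOKEN_RE.match(definition, pos)
        if m is None:
            raise ValueError(f"unexpected character {definition[pos]!r} at {pos}")
        tok, pos = m.group(), m.end()
        if tok.isspace():
            # Whitespace means AND only when it follows an operand (K number or ')').
            if cleaned and (cleaned[-1].startswith("K") or cleaned[-1] == ")"):
                cleaned.append(" ")
        else:
            # Drop whitespace that merely precedes a closing or binary operator.
            if tok in (")", ",", "+", "-") and cleaned and cleaned[-1] == " ":
                cleaned.pop()
            cleaned.append(tok)
    if cleaned and cleaned[-1] == " ":
        cleaned.pop()  # trailing whitespace
    return cleaned


class Parser:
    """Recursive-descent parser: AND-of-steps < OR < complex ('+', '-') < factor."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def seq(self):  # space-separated steps, all required
        items = [self.alt()]
        while self.peek() == " ":
            self.take(" ")
            items.append(self.alt())
        return ("and", items)

    def alt(self):  # comma-separated alternatives, any one suffices
        items = [self.cplx()]
        while self.peek() == ",":
            self.take(",")
            items.append(self.cplx())
        return ("or", items)

    def cplx(self):  # '+' adds a required subunit, '-' an optional one
        items = [self.factor()]
        while self.peek() in ("+", "-"):
            op = self.take()
            node = self.factor()
            items.append(("opt", node) if op == "-" else node)
        return ("and", items)

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.seq()
            self.take(")")
            return node
        tok = self.take()
        if not tok.startswith("K"):
            raise ValueError(f"expected a K number, got {tok!r}")
        return ("K", tok)


def satisfied(node, kos):
    """True if the set of annotated K numbers satisfies this expression node."""
    kind, arg = node
    if kind == "K":
        return arg in kos
    if kind == "opt":
        return True  # optional components never block completion
    if kind == "and":
        return all(satisfied(n, kos) for n in arg)
    return any(satisfied(n, kos) for n in arg)  # "or"


def completion_ratio(definition, kos):
    """Fraction of top-level steps satisfied (assumed definition of completion)."""
    parser = Parser(tokenize(definition))
    root = parser.seq()
    if parser.peek() is not None:
        raise ValueError("unparsed trailing tokens")
    steps = root[1]
    return sum(satisfied(step, kos) for step in steps) / len(steps)


# Illustrative definition, not an actual KEGG MODULE record.
print(completion_ratio("(K00844,K12407) K01810 (K00850,K16370) K01623",
                       {"K00844", "K01810", "K01623"}))  # → 0.75
```

Three of the four steps in the illustrative example are satisfied, so the sketch reports 0.75. Evaluating each top-level step independently is what makes a partial score possible; a pure Boolean evaluation of the whole expression would only report complete or incomplete.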
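The first abstract in the list describes finding candidate enzyme genes by searching a reference database for reactant pairs similar to a query pair and reporting the associated ortholog groups. That abstract does not specify how reactant pairs are encoded or compared, so the Python sketch below assumes each pair has already been reduced to a set of structural-change features and scores pairs with the Tanimoto (Jaccard) coefficient. The feature strings and the `OG_*` group labels are hypothetical placeholders, not KEGG identifiers or the authors' actual descriptors.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_ortholog_groups(query, reference, top_n=3):
    """Rank ortholog groups by their best-matching reference reactant pair.

    query: set of features describing the query reactant pair (assumed encoding).
    reference: iterable of (feature_set, ortholog_group) tuples.
    Returns up to top_n (ortholog_group, score) tuples, best first.
    """
    best = {}
    for features, group in reference:
        score = tanimoto(query, features)
        # Keep only the highest similarity seen for each ortholog group.
        if score > best.get(group, -1.0):
            best[group] = score
    # Sort by descending score, breaking ties by group label for stable output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical reference entries: features and group labels are placeholders.
reference_pairs = [
    ({"amino_to_oxo", "alpha_carbon", "imidazole_ring"}, "OG_A"),
    ({"amino_to_oxo", "alpha_carbon", "amide_group"}, "OG_B"),
    ({"ring_cleavage"}, "OG_C"),
]
query_pair = {"amino_to_oxo", "alpha_carbon", "imidazole_ring"}
print(rank_ortholog_groups(query_pair, reference_pairs))
# → [('OG_A', 1.0), ('OG_B', 0.5), ('OG_C', 0.0)]
```

Keeping the best score per group, rather than every matching pair, mirrors the abstract's output of ortholog groups instead of individual reactions; a real implementation would substitute its own reactant-pair descriptors for the placeholder features.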

Publication Stats

28k Citations
1,031.70 Total Impact Points

Institutions

  • 1986-2014
    • Kyoto University
      • Bioinformatics Center
      • Institute for Chemical Research
      Kyoto, Japan
  • 2012
    • Hokkaido University
      • Graduate School of Information Science and Technology
      Sapporo, Hokkaidō, Japan
  • 2011
    • Ritsumeikan University
      Kyoto, Japan
  • 2003-2008
    • The University of Tokyo
      • Center for Human Genome
      Tōkyō, Japan
    • National Institute of Advanced Industrial Science and Technology
      • Computational Biology Research Center
      Tokyo, Japan
  • 2007
    • Boston University
      • Center for Advanced Biotechnology
      Boston, Massachusetts, United States