ABSTRACT: The Web Ontology Language (OWL) provides a sophisticated language for building complex domain ontologies and is widely used in bio-ontologies such as the Gene Ontology. The Protege-OWL ontology editing tool provides a query facility that allows composition and execution of queries with the human-readable Manchester OWL syntax, with syntax checking and entity label lookup. No equivalent query facility such as the Protege DL query yet exists in web form. However, many users interact with bio-ontologies such as ChEBI and the Gene Ontology using their online websites, within which DL-based querying functionality is not available. To address this gap, we introduce the OntoQuery web-based query utility. Availability and Implementation: The source code for this implementation together with instructions for installation is available at http://github.com/IlincaTudose/OntoQuery. OntoQuery software is fully compatible with all OWL-based ontologies and is available for download (CC-0 license). The ChEBI installation, ChEBI OntoQuery, is available at http://www.ebi.ac.uk/chebi/tools/ontoquery.
ABSTRACT: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI.
We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI.
The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.
ABSTRACT: ChEBI (http://www.ebi.ac.uk/chebi) is a database and ontology of chemical entities of biological interest. Over the past few years, ChEBI has continued to grow steadily in content, and has added several new features. In addition to incorporating all user-requested compounds, our annotation efforts have emphasized immunology, natural products and metabolites in many species. All database entries are now 'is_a' classified within the ontology, meaning that all of the chemicals are available to semantic reasoning tools that harness the classification hierarchy. We have completely aligned the ontology with the Open Biomedical Ontologies (OBO) Foundry-recommended upper level Basic Formal Ontology. Furthermore, we have aligned our chemical classification with the classification of chemical-involving processes in the Gene Ontology (GO), and as a result of this effort, the majority of chemical-involving processes in GO are now defined in terms of the ChEBI entities that participate in them. This effort necessitated incorporating many additional biologically relevant compounds. We have incorporated additional data types including reference citations, and the species and component for metabolites. Finally, our website and web services have had several enhancements, most notably the provision of a dynamic new interactive graph-based ontology visualization.
Nucleic Acids Research 11/2012; · 8.28 Impact Factor
ABSTRACT: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies.
We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches.
Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.
Journal of Cheminformatics 04/2012; 4:8. · 3.59 Impact Factor
ABSTRACT: The advent of high-throughput experimentation in biochemistry has led to the generation of vast amounts of chemical data, necessitating the development of novel analysis, characterization, and cataloguing techniques and tools. Recently, a movement to publicly release such data has advanced biochemical structure-activity relationship research, while providing new challenges, the biggest being the curation, annotation, and classification of this information to facilitate useful biochemical pattern analysis. Unfortunately, the human resources currently employed by the organizations supporting these efforts (e.g. ChEBI) are expanding linearly, while new useful scientific information is being released in a seemingly exponential fashion. Compounding this, currently existing chemical classification and annotation systems are not amenable to automated classification, formal and transparent chemical class definition axiomatization, facile class redefinition, or novel class integration, thus further limiting chemical ontology growth by necessitating human involvement in curation. Clearly, there is a need for the automation of this process, especially for novel chemical entities of biological interest.
To address this, we present a formal framework based on Semantic Web technologies for the automatic design of chemical ontology which can be used for automated classification of novel entities. We demonstrate the automatic self-assembly of a structure-based chemical ontology based on 60 MeSH and 40 ChEBI chemical classes. This ontology is then used to classify 200 compounds with an accuracy of 92.7%. We extend these structure-based classes with molecular feature information and demonstrate the utility of our framework for classification of functionally relevant chemicals. Finally, we discuss an iterative approach that we envision for future biochemical ontology development.
We conclude that the proposed methodology can ease the burden of chemical data annotators and dramatically increase their productivity. We anticipate that the use of formal logic in our proposed framework will make chemical classification criteria more transparent to humans and machines alike and will thus facilitate predictive and integrative bioactivity model development.
ABSTRACT: Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive resource of expert-curated biochemical reactions. Rhea provides a non-redundant set of chemical transformations for use in a broad spectrum of applications, including metabolic network reconstruction and pathway inference. Rhea includes enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list), transport reactions and spontaneously occurring reactions. Rhea reactions are described using chemical species from the Chemical Entities of Biological Interest ontology (ChEBI) and are stoichiometrically balanced for mass and charge. They are extensively manually curated with links to source literature and other public resources on metabolism including enzyme and pathway databases. This cross-referencing facilitates the mapping and reconciliation of common reactions and compounds between distinct resources, which is a common first step in the reconstruction of genome scale metabolic networks and models.
Nucleic Acids Research 12/2011; 40(Database issue):D754-60. · 8.28 Impact Factor
ABSTRACT: The Immune Epitope Database (IEDB) recently expanded and enhanced its non-peptidic epitope related data utilizing a collaboration with Chemical Entities of Biological Interest (ChEBI), resulting in the first resource that brings together published immunological data with the expertise of the ChEBI database. This procedure took advantage of the distinct expertise of the IEDB and ChEBI databases to improve content and enhance interoperability of both databases. This project has resulted in the comprehensive inventory and curation of immune epitope data related to non-peptidic structures and serves as a model for successful collaborative curation between established resources.
ABSTRACT: Ontologies encode human knowledge in computationally accessible forms. They are designed to narrow the gap between the knowledge of human experts and the functionality available in computer systems, by expressing expert knowledge in a manner computers can manipulate and reason over. With the ever-growing deluge of data in modern scientific domains, researchers need intelligent tools able to filter out irrelevant and automatically organise relevant information into meaningful categories. The Chemoinformatics and Metabolism team at the EBI is developing chemical ontologies for structure-based chemical classification, role or bioactivity-based chemical classification, and chemical information entities such as descriptors and algorithms. Our ontologies provide collections of names and synonyms which are useful for text mining, stable identifiers which are essential to semantic integration of data, and a semantically rich encoding of many aspects of the chemical domain. But for such ontologies to be maximally useful for diverse users and interoperable with other ontologies in the scientific domain, similar-sounding things have to be disentangled in our language and our ontology. Recent work addresses the distinguishing of structures from chemicals, and of bioactivity from drug uses. Ontologies are backed by logical formalisms such as the Web Ontology Language, OWL. One of the challenges of chemical ontology is representing complex chemical structures in the underlying formalism. Cyclic structures prove particularly challenging for logic-based representation. Recent research in our group investigated the inclusion of chemical graphs in OWL. Integrating chemoinformatics tools with chemical ontology is the subject of ongoing research.
Journal of Cheminformatics 01/2011; 3:1-1. · 3.59 Impact Factor
ABSTRACT: Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. In addition to molecular entities, ChEBI contains groups (parts of molecular entities) and classes of entities. ChEBI includes an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. ChEBI is available online at http://www.ebi.ac.uk/chebi/. This article reports on new features in ChEBI since the last NAR report in 2007, including substructure and similarity searching, a submission tool for authoring of ChEBI datasets by the community and a 30-fold increase in the number of chemical structures stored in ChEBI.
Nucleic Acids Research 10/2009; 38(Database issue):D249-54. · 8.28 Impact Factor
ABSTRACT: Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on "small" chemical compounds. This unit provides a detailed guide to browsing, searching, downloading, and programmatic access to the ChEBI database.
Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] 07/2009; Chapter 14:Unit 14.9.
ABSTRACT: *Background*
Appearing in a wide variety of contexts, biochemical 'small molecules' are a core element of biomedical data. Chemical ontologies, which provide stable identifiers and a shared vocabulary for use in referring to such biochemical small molecules, are crucial to enable the interoperation of such data. One such chemical ontology is ChEBI (Chemical Entities of Biological Interest), a candidate member ontology of the OBO Foundry. ChEBI is a publicly available, manually annotated database of chemical entities and contains around 18000 annotated entities as of the last release (May 2009). ChEBI provides stable unique identifiers for chemical entities; a controlled vocabulary in the form of recommended names (which are unique and unambiguous), common synonyms, and systematic chemical names; cross-references to other databases; and a structural and role-based classification within the ontology. ChEBI is widely used for annotation of chemicals within biological databases, text-mining, and data integration. ChEBI can be accessed online at http://www.ebi.ac.uk/chebi/ and the full dataset is available for download in various formats including SDF and OBO.
The selection of chemical entities for inclusion in the ChEBI database is user-driven. As the use of ChEBI has grown, so too has the backlog of user-requested entries. Inevitably, the annotation backlog creates a bottleneck, and to speed up the annotation process, ChEBI has recently released a submission tool which allows community submissions of chemical entities, groups, and classes. However, classification of chemical entities within the ontology is a difficult and niche activity, and it is unlikely that the community as a whole will be able or willing to correctly and consistently classify each submitted entity, creating required classes where they are missing. As a result, it is likely that while the size of the database grows, the ontological classification will become less sophisticated, unless the classification of new entities is assisted computationally. In addition, the ChEBI database is expecting substantial size growth in the next year, so automatic classification, which has up till now not been possible, is urgently required. Automatic classification would also enable the ChEBI ontology classes to be applied to other compound databases such as PubChem.
*Description Logic Reasoning*
Description logic based reasoning technology is a prime candidate for development of such an automatic classification system as it allows the rules of the classification system to be encoded within the knowledgebase. Already at 18000 entities, ChEBI is a fair size for a real-world application of description logic reasoning technology, and as the ontology is enhanced with a richer density of asserted relationships, the classification will become more complex and challenging. We have successfully tested a description logic-based classification of chemical entities based on specified structural properties using the hypertableaux-based HermiT reasoner, and found it to be sufficiently efficient to be feasible for use in a production environment on a database of the size that ChEBI is now. However, much work still remains to enrich the ChEBI knowledgebase itself with the properties needed to provide the formal class definitions for use in the automated classification, and to assess the efficiency of the available description logic reasoning technology on a database the size of ChEBI's forecast future growth.
ChEBI is funded by the European Commission under SLING, grant agreement number 226073 (Integrating Activity) within Research Infrastructures of the FP7 Capacities Specific Programme, and by the BBSRC, grant agreement number BB/G022747/1 within the “Bioinformatics and biological resources” fund.
ABSTRACT: Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. In addition to molecular entities, ChEBI contains groups (parts of molecular entities) and classes of entities. ChEBI includes an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. ChEBI is available online at http://www.ebi.ac.uk/chebi/
Nucleic Acids Research 02/2008; 36(Database issue):D344-50. · 8.28 Impact Factor
ABSTRACT: A structural diagram, in the form of a two-dimensional (2-D) sketch, remains the most effective portrait of a "small molecule" or chemical reaction. However, such structural diagrams, as for any other core data, cannot be used in speech (and should not be used in free text). "Good annotation practice" for biological databases is to use either consistent and widely recognised terminology or unique identifiers from a dedicated database to refer to the molecule of interest. Ideally, scientists should use terminology that is both pronounceable and meaningful. Thus, a viable solution for a bioinformatician is to use a definitive controlled vocabulary of biochemical compounds and reactions, which contains both systematic and common names. In addition, chemical ontologies provide a means for placing entities of interest into wider chemical, biological or medical contexts. We present some challenges and achievements in the standardisation of chemical language in biological databases, with emphasis on three aspects of annotation: 1. good drawing practice: how to draw unambiguous 2-D diagrams; 2. good naming practice: how to give most appropriate names; and 3. good ontology practice: how to link the entity of interest by defined logical relationships to other entities.
ABSTRACT: ChEBI (Chemical Entities of Biological Interest) is a database of ‘small’ molecular entities structured around a chemical ontology. It contains almost 600,000 entries, of which approximately 20,000 have been manually curated, as well as entries for groups (parts of molecular entities) and classes of entities. It provides a wide range of information such as chemical nomenclature, structures and related chemical values, and establishes interrelationships between entities in the ontology, in terms of both structure and role. ChEBI places a strong focus on quality, with exceptional efforts being applied to upholding IUPAC nomenclature recommendations and best IUPAC practices when drawing chemical structures. To invite the community to participate more directly in the future growth and development of ChEBI, we have developed a web-based software utility to enable direct user submissions. Users are encouraged to carry out as much of their own manual curation as possible, e.g. by adding multiple synonyms and database cross-references, and by creating multiple relationships within the ontology. The submissions are automatically validated for uniqueness (both of name and chemical structure) and correctness (such as checking that no non-allowed cycles have inadvertently been created in the ontology graph structure, and that the ontology relationships which have been specified are allowed between entities of the relevant types). Once a submission has passed the required validations, it is submitted to the ChEBI database, at which time it receives its unique ChEBI identifier. It will then become visible to the public (as a preliminary entry) as part of the monthly ChEBI release. To date, ChEBI has received over 750 such external submissions.
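The two validations named in the abstract above (name uniqueness and the absence of disallowed cycles) can be illustrated with a minimal Python sketch. It is not the ChEBI submission tool's implementation: the function names `creates_cycle` and `validate_submission`, the (child, parent) edge representation, the case-insensitive name comparison, and the assumption that every relationship belongs to a single hierarchy in which all cycles are disallowed are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (child, parent) would close a directed cycle.

    `edges` is an iterable of existing (child, parent) pairs. Hypothetical
    helper for illustration; not part of any ChEBI API.
    """
    child, parent = new_edge
    graph = defaultdict(set)
    for c, p in edges:
        graph[c].add(p)
    # A cycle appears exactly when `child` is already reachable from `parent`.
    stack, seen = [parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def validate_submission(name, parents, existing_names, edges):
    """Collect validation errors for a hypothetical submission.

    Assumption: names are compared case-insensitively and every proposed
    parent link is checked against one shared hierarchy.
    """
    errors = []
    if name.lower() in {n.lower() for n in existing_names}:
        errors.append(f"name already exists: {name}")
    for parent in parents:
        if creates_cycle(edges, (name, parent)):
            errors.append(f"relationship {name} -> {parent} would create a cycle")
    return errors
```

A real validator would also compare chemical structures and check that each relationship type is permitted between the entity types involved, as the abstract describes; both checks are omitted here.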
ABSTRACT: Natural products are of substantial interest in drug discovery and metabolism research, since they represent molecules that have been shaped by natural selection to be bioactive in ways that are useful for a range of applications including as therapeutics, cosmetics and pesticides. The ChEBI database (http://www.ebi.ac.uk/chebi) and the MetaboLights database (http://www.ebi.ac.uk/metabolights/) aim to offer a comprehensive public resource suite for capturing and describing natural product chemistry. ChEBI has recently added over 2,700 natural products, of which more than 100 have been fully curated. Together with the pre-existing metabolites in ChEBI, the total collection of metabolites (both primary and secondary) is approaching 3,500 entries (October 2012). In addition, we have added the species, strain, and component (e.g. tissue type) from which the metabolite has been isolated, linked to the appropriate taxonomies and ontologies, together with supporting citations to the primary literature. The MetaboLights database provides a general-purpose, open-access repository for metabolomics studies, their raw experimental data, and associated metadata. Released in June 2012, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays over 8 different species. These include species such as H. sapiens, C. elegans, M. musculus and A. thaliana, and techniques such as NMR spectroscopy and mass spectrometry. Finally, we have recently released an open-source, open-data natural product likeness implementation, bringing a well-known metric (useful in compound library screening and lead design) to a wider community.
Journal of Cheminformatics 5(1). · 3.59 Impact Factor
ABSTRACT: *ChEBI background:* Chemical Entities of Biological Interest (ChEBI) is a curated database of small chemical entities important in biosystems. As well as a description of entities, it provides a semantically rich knowledge base; and an internal hierarchy that organises the entities by their molecular structure types and potential rôles. *The ChEBI-IEDB collaboration:* The Immune Epitope Database and Analysis Resource (IEDB) is a project supported by contract from the National Institute of Allergy and Infectious Diseases (NIAID). Its goal is to make epitope-related data on infectious diseases and immune disorders freely available to researchers worldwide. In June 2009, ChEBI began working with the IEDB on a project aimed at incorporating into ChEBI, by manual curation, a pilot subset of immunologically important chemicals identified as immune epitopes. *The significance of the project:* Numerous reports attest to an increasing global prevalence of immune-related diseases, with a multiplicity of contributing factors. This situation underscores the need for cross-talk among the various scientific disciplines, and makes ChEBI involvement in this project particularly relevant. *Collaboration outcome:* That collaboration among curators working on different databases can be reciprocally beneficial has been amply demonstrated by the ChEBI-IEDB teamwork described: while the incorporated IEDB items have substantially enriched ChEBI, the latter’s multiplicity of synonyms, structure tree lay-out and expertise in describing non-peptidic epitopes have been equally useful to the IEDB in facilitating the search process. *Status quo and plans:* We continue to refine our task of assisting the identification, understanding and utilisation of biologically meaningful chemical entities by engaging in further joint projects.