Interoperability between Biomedical Ontologies through Relation Expansion, Upper-Level Ontologies and Automatic Reasoning

Department of Genetics, University of Cambridge, Cambridge, United Kingdom.
PLoS ONE (Impact Factor: 3.23). 07/2011; 6(7):e22006. DOI: 10.1371/journal.pone.0022006
Source: PubMed


Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large-scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at
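The core idea of the abstract, reading each relation edge as a class expression and checking it against disjoint upper-level categories, can be illustrated with a deliberately tiny pure-Python sketch. Everything in it is hypothetical: the two categories `Continuant` and `Occurrent` (assumed disjoint, in the style of a top-level ontology), the relation signatures in `SIGNATURES`, the example classes, and the helper functions `expand` and `unsatisfiable`. It is not the authors' software; a real pipeline would express the definitions in OWL and hand them to a description-logic reasoner.

```python
from itertools import combinations

# Hypothetical upper-level categories, assumed mutually disjoint:
# continuants (things) versus occurrents (processes).
DISJOINT = {frozenset({"Continuant", "Occurrent"})}

# Hypothetical relation signatures: relation -> (domain, range).
SIGNATURES = {
    "has_participant": ("Occurrent", "Continuant"),
    "located_in": ("Continuant", "Continuant"),
}


def superclasses(cls, is_a):
    """Return cls together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(is_a.get(c, ()))
    return seen


def clashes(types):
    """True if a set of class names contains an asserted disjoint pair."""
    return any(frozenset(pair) in DISJOINT for pair in combinations(types, 2))


def expand(is_a, relations):
    """Relation expansion (toy version): read each edge (A, rel, B) as
    'A SubClassOf rel some B', which makes A a subclass of rel's domain."""
    expanded = {c: set(parents) for c, parents in is_a.items()}
    for a, rel, _ in relations:
        domain, _ = SIGNATURES[rel]
        expanded.setdefault(a, set()).add(domain)
    return expanded


def unsatisfiable(is_a, relations):
    """Return the classes whose expanded definitions are contradictory."""
    expanded = expand(is_a, relations)
    classes = set(expanded) | {b for _, _, b in relations}
    # Domain clash: a class falls under two disjoint categories.
    bad = {c for c in classes if clashes(superclasses(c, expanded))}
    # Range clash: the filler B cannot also be an instance of rel's range.
    for a, rel, b in relations:
        _, rng = SIGNATURES[rel]
        if clashes(superclasses(b, expanded) | {rng}):
            bad.add(a)
    # Propagate: subclasses of an unsatisfiable class, and classes that
    # require a relation to an unsatisfiable filler, are unsatisfiable too.
    changed = True
    while changed:
        changed = False
        for c in classes:
            if c not in bad and (superclasses(c, expanded) & bad
                                 or any(a == c and b in bad for a, _, b in relations)):
                bad.add(c)
                changed = True
    return bad


# Hypothetical example: terms aligned to the upper level via is_a.
is_a = {
    "cell": {"Continuant"},
    "apoptosis": {"Occurrent"},
    "neuron": {"cell"},
    "nucleus": {"Continuant"},
}
relations = [
    ("apoptosis", "has_participant", "cell"),  # consistent
    ("neuron", "has_participant", "cell"),     # a continuant used as a process
    ("nucleus", "located_in", "apoptosis"),    # a process used as a location
]
print(sorted(unsatisfiable(is_a, relations)))  # ['neuron', 'nucleus']
```

The point of the alignment step is visible in the example: without the `is_a` links to the disjoint upper-level categories, neither flawed edge would produce a contradiction, so neither would be detected.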



Available from: Michel Dumontier, Jan 07, 2014
    • "When coupled with modularisation [15], formalisation can facilitate integrative approaches to reason and compare across disparate species. For example, the PhenomeNet approach aligns phenotypes across species and enables the generation of a single, unified, and logically consistent representation of phenotype data for multiple species. "
    ABSTRACT: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely used means for annotating phenotypes in a form that is human-readable, programmatically accessible and usable for grouping annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open-source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.
    Journal of Biomedical Semantics 10/2013; 4(1):30. DOI:10.1186/2041-1480-4-30 · 2.26 Impact Factor
    • "However, many efforts in formal modelling of biological phenomena of organisms focus on anatomical features and only rarely address the cell level (cf. [7-10] and [11]). What is missing is a comprehensive tool to represent and to compare cellular phenotypes and their dynamics. "
    ABSTRACT: Understanding, modelling and influencing the transition between different states of cells, be it reprogramming of somatic cells to pluripotency or trans-differentiation between cells, is a hot topic in current biomedical and cell-biological research. Nevertheless, the large body of published knowledge in this area is underused, as most results are only represented in natural language, impeding their finding, comparison, aggregation, and usage. Scientific understanding of the complex molecular mechanisms underlying cell transitions could be improved by making essential pieces of knowledge available in a formal (and thus computable) manner. We describe the outline of two ontologies for cell phenotypes and for cellular mechanisms which together enable the representation of data curated from the literature or obtained by bioinformatics analyses and thus for building a knowledge base on mechanisms involved in cellular reprogramming. In particular, we discuss how comprehensive ontologies of cell phenotypes and of changes in mechanisms can be designed using the entity-quality (EQ) model. We show that the principles for building cellular ontologies published in this work allow deeper insights into the relations between the continuants (cell phenotypes) and the occurrents (cell mechanism changes) involved in cellular reprogramming, although implementation remains for future work. Further, our design principles lead to ontologies that allow the meaningful application of similarity searches in the spaces of cell phenotypes and of mechanisms, and, especially, of changes of mechanisms during cellular transitions.
    Journal of Biomedical Semantics 10/2013; 4(1):25. DOI:10.1186/2041-1480-4-25 · 2.26 Impact Factor
    • "Ontologies formalize the meaning of terms using a defined vocabulary that facilitates the integration of data and knowledge (Gkoutos et al., 2012). Interoperability of ontological resources is required to automatically analyze data across different data repositories and to enable automatic reasoning for knowledge discovery (Hoehndorf et al., 2011). The Open Biological and Biomedical Ontologies (OBO) Foundry is a collaborative initiative whose goal is to create and maintain an evolving collection of non-overlapping interoperable ontologies that will offer unambiguous representations of the types of entities in biological and biomedical reality (Ceusters and Smith, 2010). "
    ABSTRACT: As a member of the Open Biomedical Ontologies (OBO) Foundry, the Protein Ontology (PRO) provides an ontological representation of protein forms and complexes and their relationships. Annotations in PRO can be assigned to individual protein forms and complexes, each distinguishable down to the level of post-translational modification, thereby allowing for a more precise depiction of protein function than is possible with annotations to the gene as a whole. Moreover, PRO is fully interoperable with other OBO ontologies and integrates knowledge from other protein-centric resources such as UniProt and Reactome. Here we demonstrate the value of the PRO framework in the investigation of the spindle checkpoint, a highly conserved biological process that relies extensively on protein modification and protein complex formation. The spindle checkpoint maintains genomic integrity by monitoring the attachment of chromosomes to spindle microtubules and delaying cell cycle progression until the spindle is fully assembled. Using PRO in conjunction with other bioinformatics tools, we explored the cross-species conservation of spindle checkpoint proteins, including phosphorylated forms and complexes; studied the impact of phosphorylation on spindle checkpoint function; and examined the interactions of spindle checkpoint proteins with the kinetochore, the site of checkpoint activation. Our approach can be generalized to any biological process of interest.
    Frontiers in Genetics 04/2013; 4:62. DOI:10.3389/fgene.2013.00062