Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies

Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.
The American Journal of Human Genetics (Impact Factor: 10.93). 10/2009; 85(4):457-64. DOI: 10.1016/j.ajhg.2009.09.003
Source: PubMed


The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail. We have adapted semantic similarity metrics to measure phenotypic similarity between queries and hereditary diseases annotated with the use of the Human Phenotype Ontology (HPO) and have developed a statistical model to assign p values to the resulting similarity scores, which can be used to rank the candidate diseases. We show that our approach outperforms simpler term-matching approaches that do not take the semantic interrelationships between terms into account. The advantage of our approach was greater for queries containing phenotypic noise or imprecise clinical descriptions. The semantic network defined by the HPO can be used to refine the differential diagnosis by suggesting clinical features that, if present, best differentiate among the candidate diagnoses. Thus, semantic similarity searches in ontologies represent a useful way of harnessing the semantic structure of human phenotypic abnormalities to help with the differential diagnosis. We have implemented our methods in a freely available web application for the field of human Mendelian disorders.
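The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes an information-content-based Resnik similarity, averaged over best matches and symmetrized between query and disease, plus a simple Monte Carlo estimate of an empirical p value. The abstract does not spell out the exact formulas or the statistical model, so these choices are assumptions, and the toy ontology, term names, and disease names below are hypothetical, not real HPO or OMIM entries.

```python
import math
import random

# Hypothetical toy ontology: term -> list of parent terms (not real HPO IDs).
PARENTS = {
    "root": [],
    "skeletal": ["root"],
    "eye": ["root"],
    "long_fingers": ["skeletal"],
    "scoliosis": ["skeletal"],
    "lens_dislocation": ["eye"],
    "myopia": ["eye"],
}

# Hypothetical disease annotations (disease -> set of annotated terms).
DISEASES = {
    "disease_A": {"long_fingers", "lens_dislocation", "scoliosis"},
    "disease_B": {"scoliosis"},
    "disease_C": {"myopia"},
    "disease_D": {"lens_dislocation", "myopia"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of diseases annotated to t or to a descendant of t)."""
    counts = {t: 0 for t in PARENTS}
    for terms in DISEASES.values():
        implied = set().union(*(ancestors(t) for t in terms))
        for t in implied:
            counts[t] += 1
    n = len(DISEASES)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def term_sim(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)


def one_sided(source, target):
    """Average, over source terms, of the best match among target terms."""
    return sum(max(term_sim(s, t) for t in target) for s in source) / len(source)


def symmetric_sim(query, disease_terms):
    """Mean of the query-to-disease and disease-to-query scores."""
    return 0.5 * (one_sided(query, disease_terms) + one_sided(disease_terms, query))


def rank(query):
    """Candidate diseases sorted by decreasing similarity to the query."""
    scores = {d: symmetric_sim(query, terms) for d, terms in DISEASES.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def empirical_p(query, disease, n_samples=1000, seed=0):
    """Assumed Monte Carlo p value: share of random same-size queries scoring at least as high."""
    rng = random.Random(seed)
    terms = sorted(t for t in PARENTS if t != "root")
    observed = symmetric_sim(query, DISEASES[disease])
    hits = sum(
        symmetric_sim(set(rng.sample(terms, len(query))), DISEASES[disease]) >= observed
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


query = {"long_fingers", "lens_dislocation"}
ranking = rank(query)
```

Because similarity is computed through shared ancestors, a query term such as `long_fingers` still contributes to the score of a disease annotated only with `scoliosis`, which a plain term-matching approach would miss. Calling `empirical_p(query, "disease_A")` gives a rough significance estimate for the top candidate under the sampling assumption above.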



Available from: Peter Krawitz
    • "Of particular importance for these annotation-based scores is the concept of information content—a logarithmic transformation of rareness of annotations at or below each term as determined by association of the knowledge catalogs (e.g., the set of OMIM diseases) to the ontology. To compute annotation-based similarities, we used a version of the Resnik method [37], as symmetrized by Köhler, et al. [20]. In what follows, let D = an annotated disease, Q = a queried phenotype term set, d{t} = set of diseases annotated with term t, A{t} = set of terms t and all their respective ancestors, C{t} = set of terms t and all their respective children, and ||x|| = quantity of elements in set x. Let N be the total number of diseases in the catalog that are annotated to the ontology. "
    ABSTRACT: Genome-wide data are increasingly important in the clinical evaluation of human disease. However, the large number of variants observed in individual patients challenges the efficiency and accuracy of diagnostic review. Recent work has shown that systematic integration of clinical phenotype data with genotype information can improve diagnostic workflows and prioritization of filtered rare variants. We have developed visually interactive, analytically transparent analysis software that leverages existing disease catalogs, such as the Online Mendelian Inheritance in Man database (OMIM) and the Human Phenotype Ontology (HPO), to integrate patient phenotype and variant data into ranked diagnostic alternatives. Our tool, “OMIM Explorer” (, extends the biomedical application of semantic similarity methods beyond those reported in previous studies. The tool also provides a simple interface for translating free-text clinical notes into HPO terms, enabling clinical providers and geneticists to contribute phenotypes to the diagnostic process. The visual approach uses semantic similarity with multidimensional scaling to collapse high-dimensional phenotype and genotype data from an individual into a graphical format that contextualizes the patient within a low-dimensional disease map. The map proposes a differential diagnosis and algorithmically suggests potential alternatives for phenotype queries—in essence, generating a computationally assisted differential diagnosis informed by the individual’s personal genome. Visual interactivity allows the user to filter and update variant rankings by interacting with intermediate results. The tool also implements an adaptive approach for disease gene discovery based on patient phenotypes. We retrospectively analyzed pilot cohort data from the Baylor Miraca Genetics Laboratory, demonstrating performance of the tool and workflow in the re-analysis of clinical exomes. 
    Our tool assigned to clinically reported variants a median rank of 2, placing causal variants in the top 1% of filtered candidates across the 47 cohort cases with reported molecular diagnoses of exome variants in OMIM Morbidmap genes. Our tool outperformed Phen-Gen, eXtasy, PhenIX, PHIVE, and hiPHIVE in the prioritization of these clinically reported variants. Our integrative paradigm can improve efficiency and, potentially, the quality of genomic medicine by more effectively utilizing available phenotype information, catalog data, and genomic knowledge.
    Preview · Article · Dec 2016 · Genome Medicine
    • "Up till now there does not exist a valid, general-purpose definition of similarity measure. There do exist several special-purpose definitions which have been employed with success in cluster analysis [31] [33], search [2] [3], classification [11] [30] [35], recognition [28] [36] and diagnostics [1] [34]. In this section, first we give a notion of integral of intuitionistic fuzzy sets and then propose a new form of intuitionistic fuzzy implication, inclusion and give a similarity measure. "
    ABSTRACT: First we give the notion of the integral of an intuitionistic fuzzy set and introduce the intuitionistic fuzzy implicator and the intuitionistic fuzzy inclusion measure. Then we propose a new measure of similarity between two intuitionistic fuzzy sets based on the intuitionistic fuzzy inclusion measure. Examples are given to illustrate our notion and the application of this new similarity measure in multi-criteria decision making.
    Full-text · Article · Mar 2016 · Journal of Intelligent and Fuzzy Systems
    • "Pesquita et al. [2009] provides an excellent review of the most popular similarity measures and their performance on GO-related tasks. More recently, the HPO has enabled the use of semantic similarity measures to predict clinical diagnoses [Köhler et al., 2009; Bauer et al., 2012; Zemojtel et al., 2014], to find representative model organisms for gene prioritization [Smedley et al., 2013], and most recently, to identify similar patients [Gottlieb et al., 2015]. "
    ABSTRACT: The discovery of disease-causing mutations typically requires confirmation of the variant or gene in multiple unrelated individuals, and a large number of rare genetic diseases remain unsolved due to difficulty identifying second families. To enable the secure sharing of case records by clinicians and rare disease scientists, we have developed the PhenomeCentral portal ( Each record includes a phenotypic description and relevant genetic information (exome or candidate genes). PhenomeCentral identifies similar patients in the database based on semantic similarity between clinical features, automatically prioritized genes from whole-exome data, and candidate genes entered by the users, enabling both hypothesis-free and hypothesis-driven matchmaking. Users can then contact other submitters to follow up on promising matches. PhenomeCentral incorporates data for over 1,000 patients with rare genetic diseases, contributed by the FORGE and Care4Rare Canada projects, the US NIH Undiagnosed Diseases Program, the EU Neuromics and ANDDIrare projects, as well as numerous independent clinicians and scientists. Though the majority of these records have associated exome data, most lack a molecular diagnosis. PhenomeCentral has already been used to identify causative mutations for several patients, and its ability to find matching patients and diagnose these diseases will grow with each additional patient that is entered. This article is protected by copyright. All rights reserved.
    No preview · Article · Aug 2015 · Human Mutation