Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies

Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.
The American Journal of Human Genetics (Impact Factor: 10.93). 10/2009; 85(4):457-64. DOI: 10.1016/j.ajhg.2009.09.003
Source: PubMed


The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail. We have adapted semantic similarity metrics to measure phenotypic similarity between queries and hereditary diseases annotated with the use of the Human Phenotype Ontology (HPO) and have developed a statistical model to assign p values to the resulting similarity scores, which can be used to rank the candidate diseases. We show that our approach outperforms simpler term-matching approaches that do not take the semantic interrelationships between terms into account. The advantage of our approach was greater for queries containing phenotypic noise or imprecise clinical descriptions. The semantic network defined by the HPO can be used to refine the differential diagnosis by suggesting clinical features that, if present, best differentiate among the candidate diagnoses. Thus, semantic similarity searches in ontologies represent a useful way of harnessing the semantic structure of human phenotypic abnormalities to help with the differential diagnosis. We have implemented our methods in a freely available web application for the field of human Mendelian disorders.
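The abstract describes two steps. First, a query of HPO terms is scored against each disease's annotations with a semantic similarity measure. Second, the score is turned into a p value. The Python sketch below illustrates both steps under stated assumptions and is not the authors' implementation:

- The toy ontology, term names and disease annotations are hypothetical.
- The similarity is Resnik's information-content (IC) measure, applied to the most informative common ancestor of two terms. For each query term, the best match among the disease's terms is taken, and these best matches are averaged over the query.
- The p value is a simple empirical estimate from random queries of the same size. It stands in for the statistical model described in the paper.

```python
import random
from math import log

# Hypothetical toy ontology (not real HPO terms): term -> set of direct is-a parents.
PARENTS = {
    "root": set(),
    "limb_abn": {"root"},
    "hand_abn": {"limb_abn"},
    "polydactyly": {"hand_abn"},
    "syndactyly": {"hand_abn"},
    "heart_abn": {"root"},
    "vsd": {"heart_abn"},
    "asd": {"heart_abn"},
}

# Hypothetical disease annotations to terms of the toy ontology.
DISEASES = {
    "disease_A": {"polydactyly", "vsd"},
    "disease_B": {"syndactyly"},
    "disease_C": {"asd"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(diseases):
    """IC(t) = -log(fraction of diseases annotated to t or one of its descendants)."""
    counts = {t: 0 for t in PARENTS}
    for terms in diseases.values():
        # A disease annotated to a term is implicitly annotated to all its ancestors.
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(diseases)
    return {t: -log(c / n) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    return max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def score(query, disease_terms, ic):
    """Average, over query terms, of each term's best Resnik match in the disease."""
    return sum(max(resnik(q, d, ic) for d in disease_terms) for q in query) / len(query)


def empirical_p_value(query, disease_terms, ic, n_samples=1000, seed=0):
    """Share of random same-size queries scoring at least as high (+1 smoothing)."""
    rng = random.Random(seed)
    observed = score(query, disease_terms, ic)
    pool = sorted(t for t in PARENTS if t != "root")
    hits = sum(
        score(rng.sample(pool, len(query)), disease_terms, ic) >= observed
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    ic = information_content(DISEASES)
    # "heart_abn" plays the role of an imprecise description of "vsd".
    query = ["polydactyly", "heart_abn"]
    ranked = sorted(DISEASES.items(), key=lambda kv: -score(query, kv[1], ic))
    for name, terms in ranked:
        print(name, round(score(query, terms, ic), 3), empirical_p_value(query, terms, ic))
```

In this toy example, the vague term heart_abn still contributes to disease_A's score through its shared ancestor with vsd. A purely exact term-matching approach would give that term no credit, which is the kind of imprecise query for which the abstract reports the largest advantage.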



Available from: Peter Krawitz
    • "Up till now there does not exist a valid, general-purpose definition of a similarity measure. There do exist several special-purpose definitions, which have been employed with success in cluster analysis [31] [33], search [2] [3], classification [11] [30] [35], recognition [28] [36] and diagnostics [1] [34]. In this section, we first give a notion of the integral of intuitionistic fuzzy sets and then propose a new form of intuitionistic fuzzy implication and inclusion and give a similarity measure. "
    ABSTRACT: First, we give the notion of the integral of an intuitionistic fuzzy set and introduce an intuitionistic fuzzy implicator and an intuitionistic fuzzy inclusion measure. Then we propose a new measure of similarity between two intuitionistic fuzzy sets based on the intuitionistic fuzzy inclusion measure. Examples are given to illustrate our notion and the application of this new similarity measure in multi-criteria decision making.
    Journal of Intelligent and Fuzzy Systems 05/2016; DOI:10.3233/IFS-151805 · 1.81 Impact Factor
    • "Pesquita et al. [2009] provides an excellent review of the most popular similarity measures and their performance on GO-related tasks. More recently, the HPO has enabled the use of semantic similarity measures to predict clinical diagnoses [Köhler et al., 2009; Bauer et al., 2012; Zemojtel et al., 2014], to find representative model organisms for gene prioritization [Smedley et al., 2013], and most recently, to identify similar patients [Gottlieb et al., 2015]. "
    ABSTRACT: The discovery of disease-causing mutations typically requires confirmation of the variant or gene in multiple unrelated individuals, and a large number of rare genetic diseases remain unsolved due to difficulty identifying second families. To enable the secure sharing of case records by clinicians and rare disease scientists, we have developed the PhenomeCentral portal. Each record includes a phenotypic description and relevant genetic information (exome or candidate genes). PhenomeCentral identifies similar patients in the database based on semantic similarity between clinical features, automatically prioritized genes from whole-exome data, and candidate genes entered by the users, enabling both hypothesis-free and hypothesis-driven matchmaking. Users can then contact other submitters to follow up on promising matches. PhenomeCentral incorporates data for over 1,000 patients with rare genetic diseases, contributed by the FORGE and Care4Rare Canada projects, the US NIH Undiagnosed Diseases Program, the EU Neuromics and ANDDIrare projects, as well as numerous independent clinicians and scientists. Though the majority of these records have associated exome data, most lack a molecular diagnosis. PhenomeCentral has already been used to identify causative mutations for several patients, and its ability to find matching patients and diagnose these diseases will grow with each additional patient that is entered. This article is protected by copyright. All rights reserved.
    Human Mutation 08/2015; 36(10). DOI:10.1002/humu.22851 · 5.14 Impact Factor
    • "The HPO is being increasingly used as a basis for integrating phenotypic abnormalities into computational algorithms for diagnostics and research. For instance, Phenomizer (29) and BOQA (20) can be used to assist clinical differential diagnosis in human genetics, and MouseFinder (30), Monarch, PhenoDigm (14) as well as PhenomeNET (12) enable searches for novel disease genes based on the analysis of model-organism phenotypes. The HPO has been used to integrate phenotypic information into computational analysis of the distribution of proteins in the postsynaptic density of the human neocortex (31), to derive a disease–disease similarity measure for the prediction of novel drug indications (32) and to analyze overrepresentation of phenotypes associated with individual protein domains (33). "
    ABSTRACT: The Human Phenotype Ontology (HPO) project, available online, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition, we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
    Nucleic Acids Research 11/2013; 42(Database issue). DOI:10.1093/nar/gkt1026 · 9.11 Impact Factor