Bridging the semantics gap between terminologies, ontologies, and information models

Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, Germany.
Studies in health technology and informatics 01/2010; 160(Pt 2):1000-4. DOI: 10.3233/978-1-60750-588-4-1000
Source: PubMed


SNOMED CT and other biomedical vocabularies provide semantic identifiers for all kinds of linguistic expressions, many of which cannot be considered terms in a strict sense. We analyzed such "non-terms" in SNOMED CT and concluded that many of them cannot be interpreted as referring directly to objects or processes, but rather to information entities. After discussing two approaches to representing information entities, viz. the OBO Information Artifact Ontology (IAO) and the HL7 v3 Reference Information Model (RIM), we propose an integrative solution for representing information entities in SNOMED CT that remains compatible with both the RIM and the IAO and uses moderately enhanced description logics.


Available from: Stefan Schulz
  • Source
    • "However, it does not provide a complete set of attributes for the concepts represented by its classes. In RIM, many conceptual attributes are represented through the terminology systems used for encoding the values of its class attributes [23,24,25,26,27,28]. Figure 3 shows an example of modeling a phenotype variable “mother smoked when she was pregnant” using a SNOMED-CT concept model and HL7 RIM. The SNOMED-CT concept models do not include an explicit subject of information attribute thus subject of the finding is described using the relationship context attribute. "
    ABSTRACT: The database of Genotypes and Phenotypes (dbGaP) contains various types of data generated from genome-wide association studies (GWAS). These data can be used to facilitate novel scientific discoveries and to reduce cost and time for exploratory research. However, idiosyncrasies and inconsistencies in phenotype variable names are a major barrier to reusing these data. We addressed these challenges in standardizing phenotype variables by formalizing their descriptions using Clinical Element Models (CEM). Designed to represent clinical data, CEMs were highly expressive and thus were able to represent a majority (77.5%) of the 215 phenotype variable descriptions. However, their high expressivity also made it difficult to directly apply them to research data such as phenotype variables in dbGaP. Our study suggested that simplification of the template models makes it more straightforward to formally represent the key semantics of phenotype variables.
    PLoS ONE 09/2013; 8(9):e76384. DOI:10.1371/journal.pone.0076384 · 3.23 Impact Factor
  • Source
    • "We argue that integrating patient care and clinical research domains requires a standard-based expressive and scalable semantic interoperability framework, allowing dynamic mappings between data structures and semantics of varying data sources. There have been various attempts for solving the semantics gap between medical terminologies, ontologies and information models [4] [5], and also generating a networked knowledge-base from available medical ontologies using Semantic Web technologies [6]. With regards to eligibility determination an additional issue is the definition of a formal representation of free-text eligibility criteria [7] [8]. "
    ABSTRACT: A major barrier to repurposing routinely collected data for clinical research is the heterogeneity of healthcare information systems. Electronic Healthcare Record for Clinical Research (EHR4CR) is a European platform designed to improve the efficiency of conducting clinical trials. In this paper, we propose an initial architecture of the EHR4CR Semantic Interoperability Framework. We used a model-driven engineering approach to build a reference HL7-based multidimensional model bound to a set of reference clinical terminologies, acting as a global-as-view model. We then evaluated its expressiveness for patient eligibility. The EHR4CR information model consists of one fact table dedicated to clinical statements and four dimensions. The EHR4CR terminology integrates reference terminologies used in patient care (e.g., LOINC, ICD-10, SNOMED CT). We used the Object Constraint Language (OCL) to represent patterns of eligibility criteria as constraints on the EHR4CR model, to be further transformed into SQL statements executed on different clinical data warehouses.
    Studies in health technology and informatics 08/2012; 180:534-8. DOI:10.3233/978-1-61499-101-4-534
  • Source
    ABSTRACT: Realist ontologies organize knowledge by strict adherence to philosophical principles, ensuring robustness and coherence. According to those principles, only empirically verifiable entities can be represented. Our study aimed to analyze medical records to evaluate which kinds of entities should be represented for physicians. We classified the entities and found several that cannot be represented in realist ontologies. Our results suggest that a categorization that distinguishes reality from medical knowledge about reality, with observations placed under both, is useful for describing the entities present in medical records.