Bridging the semantics gap between terminologies, ontologies, and information models
Stefan Schulzᵃ, Daniel Schoberᵃ, Christel Danielᵇ,ᶜ, Marie-Christine Jaulentᵇ
ᵃ Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, Germany
ᵇ INSERM, UMR_S 872, eq.20, Descartes University, Paris, France
ᶜ ASIPSanté, Paris, France
SNOMED CT and other biomedical vocabularies provide se-
mantic identifiers for all kinds of linguistic expressions, many
of which cannot be considered terms in a strict sense. We ana-
lyzed such “non-terms” in SNOMED CT and concluded that
many of them cannot be interpreted as directly referring to
objects or processes, but rather to information entities. Dis-
cussing two approaches to represent information entities, viz.
the OBO Information artifact ontology (IAO) and the HL7 v3
Reference Information Model (RIM), we propose an integra-
tive solution for representing information entities in SNOMED
CT, in a way that is still compatible with RIM and the IAO
and uses moderately enhanced description logics.
SNOMED, Information Models, Ontologies
SNOMED CT, the emerging global health terminology standard,
is published by the International Health Terminology
Standards Development Organisation (IHTSDO) as a "core
general terminology for the electronic health record" . It
provides unified meanings for clinical terms from different
languages by assigning them to concepts as language-
independent identifiers of meaning. Terms are, according to
ISO 1087, “designations of defined concepts in a special language
by linguistic expressions”. Although there are very different,
partly contradictory views on which criteria should be used to
classify a linguistic expression as a term, there is an increasing
consensus that terms have both structural properties (they are
noun phrases) and statistical properties (they occur with a
certain frequency and specificity in written and oral
communications). However, even a cursory inspection of SNOMED
reveals tens of thousands of entries for which it is at least
debatable whether they should be regarded as terms according to
the above criteria; see Table 1:
Table 1. “Non-Terms” in SNOMED CT
No.  Concept ID  Term
1    59000001    Surgical pathology consultation and report on referred slides prepared elsewhere
2    418577003   Take at regular intervals. Complete the prescribed course unless otherwise directed
3    39399006    Natural death with probable cause suspected
4    [...]       Helicobacter blood test negative
5    [...]       Poor condition at birth without known as[...]
6    413241009   Suspicion of gastritis
Here, rather than to terms proper, SNOMED CT concepts cor-
respond to more or less complex linguistic assertions, which
include statements of facts, beliefs, and orders. This raises the
hypothesis that these “concepts” fulfill tasks that differ from
the provision of controlled terms.
Since SNOMED RT, CT’s predecessor, description logics
(DLs) , formal languages with a well-understood semantics,
have been used to formally describe the meaning of SNOMED
CT concepts in terms of the common properties of the particu-
lar things that instantiate them. We consider these formal de-
scriptions as SNOMED CT’s ontology component, consider-
ing ontologies as theories that attempt to give precise mathe-
matical formulations of the properties and relations of real-
world particulars .
Formal representations of electronic health record content
have also motivated another line of effort, viz. the develop-
ment of information models for messages and documents in
the framework of HL7 Version 3 .
In this paper we explore the qualitative boundary between
“terms” and “non-terms” in SNOMED CT. We postulate that the
current logic framework is appropriate only for representing
concepts that are instantiated by objects in reality, whereas it
needs to be extended for SNOMED CT concepts that are instantiated
by information entities. We will investigate what kind of things SNOMED
CT "non-terms" denote, in which parts of SNOMED CT they
occur, and how they relate to clinical information models.
Materials and Methods
SNOMED CT uses a description logic dialect known as EL, which
we briefly introduce here. As a running example, we use the
English term “Liver”, which belongs to a concept uniquely
identified by the number 181268008 and the human-readable
name “Entire liver (body structure)”. SNOMED CT concepts
are arranged in taxonomic (subsumption) hierarchies. This
means that all instances of this concept (i.e. all individual liv-
ers) are also instances of its taxonomic parent identified by
“272627002|Entire digestive organ (body structure)”. We express
this as Liver ⊑ Digestive Organ. Besides the taxonomic
arrangement, the meaning of SNOMED CT concepts can be
further described by the properties all their instances have in
common. In the following example, we employ the ⊓ (“and”)
operator and add a quantified role, using the existential quan-
tifier ∃ (“exists”). For example, the expression Inflammatory
disease ⊓ ∃ has-location.Liver extends to all instances that
both instantiate Inflammatory disease and are further related
through the relation has-location to some instance of Liver.
This example actually gives us both the necessary and the suf-
ficient conditions needed in order to fully define a class, e.g.:
Hepatitis ≡ Inflammatory disease ⊓ ∃ has-location.Liver, where
the equivalence operator ≡ states that (i) every Hepatitis
instance is also an instance of Inflammatory disease that is
located in some instance of Liver, and, vice versa, (ii) every
instance of Inflammatory disease that is located in some Liver
is an instance of Hepatitis.
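The set-theoretic reading of these constructors can be illustrated with a minimal Python sketch. It assumes a toy interpretation: the individuals d1, d2, liver1 and kidney1 and the helper functions conj and exists are purely illustrative and are neither SNOMED CT content nor part of any reasoner API.

```python
# Toy interpretation; every individual below is an illustrative assumption.
concepts = {
    "InflammatoryDisease": {"d1", "d2"},
    "Liver": {"liver1"},
    "Kidney": {"kidney1"},
}
roles = {
    "has-location": {("d1", "liver1"), ("d2", "kidney1")},
}


def conj(left, right):
    """E ⊓ F: the individuals that instantiate both E and F."""
    return left & right


def exists(role, filler):
    """∃ r.G: the individuals with at least one r-successor in G."""
    return {source for (source, target) in roles[role] if target in filler}


# Hepatitis ≡ InflammatoryDisease ⊓ ∃ has-location.Liver
hepatitis = conj(concepts["InflammatoryDisease"],
                 exists("has-location", concepts["Liver"]))
print(sorted(hepatitis))  # ['d1']
```

In this toy interpretation only d1 is classified as Hepatitis, because d2 is an inflammatory disease located in a kidney and therefore fails the existential restriction.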
SNOMED CT, in its current version, is limited to the constructors
summarized in Table 2.
Table 2. SNOMED CT’s logical constructors, corresponding to the
description logic EL
DL Constructor  Meaning                                                     Example
E ⊓ F           Conjunction between E and F                                 [...]
∃ r.G           Existential restriction of the relation r by the filler G  [...]
A ⊑ B           B subsumes A                                                Liver ⊑ Organ
C ≡ D           C and D are equivalent                                      Organic Acid ≡ Acid ⊓ Organic
It is important to stress what cannot be expressed by these
operators, especially when comparing to what is commonly
expressible by database systems: It is not possible to express
value constraints, e.g. that the relation has-laterality can only
have the values Right and Left. It is equally impossible to ex-
press cardinalities, such as precisely defining a Coronary by-
pass with three grafts. And it is not possible to formulate ne-
gations, such as Injury without infection.
These restrictions can be tolerated as long as SNOMED restricts
itself to defining the meaning of simple terms like "Hepatitis"
or "Nephrotomy". They become problematic, however, whenever more
complex terms or whole statements as in Table 1 have to be
represented compositionally.
Statements as illustrated in Table 1 typically belong to infor-
mation models, such as underlying data acquisition templates,
questionnaires and the like. Typical standards for clinical
information models are openEHR archetypes and HL7 version 3
information models. The Reference Information
Model (RIM) is the general structure that guarantees the cohe-
rence of the complex set of HL7 version 3 models, which may
be used in many contexts to describe particular administrative
or clinical health care information. Table 3 contrasts what is
typically represented by ontologies with what is typically
represented by information models.
Table 3. Ontologies vs. Information Models. In practice the distinction
is less crisp. Especially the HL7 RIM contains many classes that
can be assumed to represent non-informational entities.
Ontologies:
- Contain classes that have really existing domain entities (particulars) as members
- Represent real-world particulars in terms of their inherent properties
- Can exist independently of information models as long as only the existence of particular things is recorded
Information models:
- Classes have information entities as members
- Represent artifacts that are built to collect or annotate [...]
- Are required to record beliefs or states of knowledge about real things or types of things (as represented by ontologies)
For example the definition of the class Act in the HL7-
supported code system is “a record of something that is being
done, has been done, can be done, or is intended or requested
to be done”. Examples are clinical observations, the assess-
ment of health conditions, healthcare goals, treatment services,
assisting, monitoring or attending, patient training and educa-
tion services, editing and maintaining documents, and many
others. Acts (besides Entities and Roles) are the pivots of the
RIM; all domain information and processes are represented
primarily in acts. Any profession or business, including health-
care, is primarily constituted of intentional actions, performed
and recorded by responsible actors. An act-instance is a record
of such an intentional action. The fundamental difference between
such a RIM act instance and an instance of an ontology class (or
of most SNOMED CT concepts) is that the former brings the aspect
of recording, and thus the person who edits EHR content, into the
picture. At least in theory, an instance of RIM:Operation refers
to an information object which is “about” some type or concept
that is not necessarily instantiated. Representing discourse about
operations that are being planned, postponed, or suspended is quite
different from creating an instance of an ontology class Operation,
as the latter makes an existence claim which is often too strong.
The Ontology – Epistemology Divide
We may be able, in theory, to draw a crisp line between what
is the representation of real objects or processes on the one
hand, and what represents information entities on the other
hand. In current information models and ontologies this dis-
tinction is blurred, and users of both systems tend to be una-
ware of the very nature of things they represent. The resulting
overlaps give rise to conflicting representations, which require
sophisticated mitigation strategies (TermInfo). Such a mixed
representation of the invariant (and possibly definitional)
properties of entities as they are (ontology) and of how they are
seen, known, or recorded (epistemology) is prevalent in most
biomedical terminology systems [8, 9].
Ontologies of Information Entities
Whether these epistemic aspects are considered relevant for
ontology is a matter of definition. In the Information Artifact
Ontology, under the OBO Foundry initiative , they are
included in an ontology framework as information content
entities, and their classes have representations of information
as members. Information content entities are immaterial ob-
jects (more precisely: generically dependent continuants ac-
cording to the Basic Formal Ontology, BFO ) that can be
borne in material objects. For instance, the latter can be a
photographic print and the former an (immaterial) photograph:
PhotographicPrint ≡ MaterialEntity ⊓
∃ bearerOf. (∃ isConcretizationOf. Photograph)
Information content entities encompass documents, document
parts such as sentences, texts, data, measurement results, serial
numbers, datatypes, databases, and ontologies, and the
processes in which they are created and consumed, totaling
131 classes. Information content entities are related by the
relation isConcretizationOf to their material bearers, and by
the relation isAbout to the things they denote.
There is a rough correspondence between IAO information
content entities and the HL7 classes that derive from the class
Act. In this context, Act, in contrast to its implicit meaning, is
to be understood as an information entity, i.e. information
about a real act. This becomes obvious by the fact that HL7
acts can be modified by so-called mood or uncertainty codes.
The so-called moodCode in the information model distin-
guishes between acts that occurred and acts that are only
planned (ordered, scheduled, rescheduled, etc.). Mood codes
encompass intent, appointment, appointment request, promise,
proposal, recommendation, resource slot, predicate, criterion,
event criterion, expectation, goal, option, permission, permis-
sion request, risk.
The uncertaintyCode indicates whether the Act statement as a
whole, with its subordinate components, has been asserted to
be uncertain in any way, e.g., a patient might have had a
cholecystectomy procedure in the past (but is not sure). When the
uncertainty is associated with an Observation.value alone or
other individual attributes of the class, such pointed indications
of uncertainty should be specified by applying the Uncertain
Value – Probabilistic (UVP)¹ or the Parametric Probability
Distribution (PPD)² data type extensions to the specific
attribute. Particularly if the uncertainty is uncertainty of a
quantitative measurement value, this must still be represented
by a PPD<PQ> in the value and NOT using the uncertaintyCode.
Also, when differential diagnoses are enumerated or weighed for
probability, the UVP<CD> must be used, not the uncertaintyCode.
The use of the uncertaintyCode is appropriate only if the entirety
of the Act and its dependent Acts is questioned.
¹ A generic data type extension used to specify a probability expressing the information producer's belief that the given value holds.
² A generic data type extension specifying uncertainty of quantitative data using a distribution function and its parameters (mean, standard [...]
Finally, the attribute negationInd indicates that the Act
statement is a negation of the Act as described by the
descriptive attributes. For example, to test for "systolic blood
pressure of 90-100 mm Hg," one would use only the descriptive
attributes Act.code (for systolic blood pressure) and
Observation.value (for 90-100 mm Hg). If one would also specify an
effectiveTime, i.e., for "yesterday," the criterion would be more
constrained. If the negationInd is true for the above criterion,
then the meaning of the test is that a systolic blood pressure of
90-100 mm Hg yesterday does not exist (independent of whether any
blood pressure was measured).
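The effect of negationInd on such a criterion can be sketched in Python. The classes Observation and Criterion, the code "systolic-bp" and the day labels are hypothetical simplifications chosen for illustration; they are not the HL7 RIM classes, attributes or data types themselves.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Observation:
    code: str            # what was observed (made-up code, not a real code system)
    value: float         # measured value in mm Hg
    effective_time: str  # simplified to a day label such as "yesterday"


@dataclass
class Criterion:
    code: str
    value_range: Tuple[float, float]
    effective_time: Optional[str] = None  # None leaves the time unconstrained
    negation_ind: bool = False

    def matches(self, obs: Observation) -> bool:
        """True if a single observation fits the descriptive attributes."""
        low, high = self.value_range
        return (obs.code == self.code
                and low <= obs.value <= high
                and (self.effective_time is None
                     or obs.effective_time == self.effective_time))

    def holds(self, record: List[Observation]) -> bool:
        """Existence test over the record; negation_ind inverts it."""
        found = any(self.matches(obs) for obs in record)
        return not found if self.negation_ind else found


record = [Observation("systolic-bp", 95.0, "today")]
plain = Criterion("systolic-bp", (90, 100))
yesterday = Criterion("systolic-bp", (90, 100), effective_time="yesterday")
negated = Criterion("systolic-bp", (90, 100), "yesterday", negation_ind=True)

print(plain.holds(record))      # True
print(yesterday.holds(record))  # False
print(negated.holds(record))    # True
print(negated.holds([]))        # True
```

The last call returns True for an empty record, which mirrors the remark that the negated criterion holds independent of whether any blood pressure was measured.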
So far, the IAO does not have a fine-grained model of moods
and probabilities like that of the HL7 RIM, but its architecture
does not preclude such an extension.
These examples show the crucial difference between a model
of information and a model of reality. In the former, “infor-
mation related to an act” can be subsumed by “information
related to a planned act”, whereas in a model of reality, i.e. an
ontology in a narrower sense, “act” and “planned act” are not
related by taxonomic subsumption.
In the following, we study several SNOMED CT concepts that
clearly belong to the category of information entities. We
critique their current representation and propose an
alternative representation as information content entities.
We center our discussion on four SNOMED CT cases (C1-C4)
which, in our view, represent epistemic states
rather than ontological concepts:
C1: Absent nose (111317000) is stated to imply:
Congenital malformation ⊓ ∃ FindingSite. Nasal Structure
C2: Heart operation planned (183983001). This concept is
in SNOMED CT’s Situation with explicit context branch and
is fully defined as³
∃ rg.( ∃ Associated procedure.Operation on heart ⊓
∃ Procedure context.Planned ⊓
∃ Temporal context. Current or Specified ⊓
∃ Subject relationship context. Subject of record)
C3: Operation on heart, rescheduled ([...]
272125009|=58334001). This is a postcoordinated concept,
refining operation on heart by using the qualifier Priority with
the value Rescheduled, in DL notation: Operation on heart ⊓
∃ Priority. Rescheduled.
C4: Suspected gallstones (390926006). This concept is also in
SNOMED CT’s Situation with explicit context branch and is
fully defined as
∃ rg.( ∃ Associated finding.Gallstone ⊓
∃ Finding context.Suspected ⊓
∃ Temporal context. Current or Specified ⊓
∃ Subject relationship context. Subject of record)
All four concepts have in common that in their definition they
are related to other concepts that are definitely not, or not nec-
essarily, instantiated. SNOMED CT’s description logics nota-
tion, however, by using existentially quantified roles (∃), as-
serts the existence of at least one instance of the concepts in
question. Thus, the expression
∃ FindingSite. Nasal structure formally asserts that some instance
of Nasal structure exists, whereas the intended meaning
is exactly the contrary. Similarly, the expression Operation on
heart ⊓ ∃ Priority. Rescheduled states that there is a heart
operation, whilst the intended meaning refers to some heart
operation in the future, which still includes the case that there
will not be any operation at all (e.g. due to worsening conditions
of the patient). The same argument holds for the planned heart
operation. Regardless of the syntactic difference (the rescheduled
operation is an operation, whilst the planned operation is not),
∃rg. (∃ Associated procedure.Operation on heart)
is a necessary condition for Heart operation planned, i.e. the
plan implies its execution, which is certainly not always the
case. In exactly the same way, the definition of Suspected
gallstones leads to the conclusion that there exist real gallstones
even in case a doctor registers a suspicion only.
³ rg means “role group”, cf. Spackman et al.
What is wrong with these concept definitions? There is no
doubt that there must be a way to refer to “something” which
does not exist now, which existed in the past, or which may
exist in the future. But statements about non-existence are not
terms, although they syntactically include terms. Ideally, they
should be represented in an information model, which is dis-
tinct from the ontology, or is expressed in an “information
Entity” branch in the same ontology. However, there are
strong reasons why application builders want to have “real”
concepts as well as whole assertions in one and the same
representational artifact such as SNOMED CT. For instance, it was a
precondition for the use of this standard within the UK National
Health Service that the former CTV3 terminology be fused with
SNOMED RT. One characteristic of CTV3 (the successor of the former
Read Codes) was its abundance of epistemically laden concepts such
as those in our examples.
We here propose alternative representations based on the
Information Artifact Ontology, using information content entities
such as Plan and Suspicion. All four concepts, Absent nose,
Heart operation planned, Operation on heart, rescheduled,
and Suspected gallstones, represent information content entities.
In order to make this clear (and because the language is
often misleading), we slightly rename the concepts to Patient
without nose, Plan of heart operation, Rescheduled plan of
heart operation, and Suspicion of gallstones.
To express this adequately, we need to enhance our descrip-
tion language by the constructors given in Table 4.
Table 4. Additional description logics constructors
DL Constructor  Meaning                                               Example
¬ A             Negation of A                                         Base ⊑ ¬ Acid
∀ r.G           Value restriction of the relation r by the filler G  [...] (Left ⊔ Right)
A ⊔ B           Union of A with B                                     [...]
A further extension of the logic to include concrete domains
(in this case numeric values) will be necessary if probabilistic
values such as UVP and PPD in the HL7 RIM are to be represented.
This is already possible, e.g. using data properties in Protégé,
but it is not yet covered by off-the-shelf terminological
reasoners such as FaCT++ and Pellet.
Coming back to the running examples, we propose the following:
C1: Person without nose:
Human ⊓ ¬ ∃ hasPart. Nasal Structure
C2: Plan of heart operation (183983001):
Plan ⊓ ∀ isAbout. Operation on heart
with Plan being an information content entity. The universal
quantifier ∀ means that this plan can only be realized by a
heart operation. In contradistinction to the existential quantifi-
er ∃ the formula does not assert that there must be an operation
for each and every plan.
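The difference between the two quantifiers can be made concrete with a small Python sketch over a toy model. The individuals plan1, plan2, plan3, op1 and appendectomy1, as well as the helper functions, are illustrative assumptions and not part of SNOMED CT or the IAO.

```python
# Toy model; all individuals and assertions are illustrative assumptions.
plans = {"plan1", "plan2", "plan3"}
heart_operations = {"op1"}
is_about = {("plan1", "op1"), ("plan3", "appendectomy1")}


def successors(individual):
    """All individuals that the given individual is about."""
    return {target for (source, target) in is_about if source == individual}


def exists_is_about(filler):
    """Plan ⊓ ∃ isAbout.G: at least one isAbout-successor lies in G."""
    return {p for p in plans if successors(p) & filler}


def forall_is_about(filler):
    """Plan ⊓ ∀ isAbout.G: every isAbout-successor lies in G.

    A plan without any successor satisfies this vacuously.
    """
    return {p for p in plans if successors(p) <= filler}


print(sorted(exists_is_about(heart_operations)))  # ['plan1']
print(sorted(forall_is_about(heart_operations)))  # ['plan1', 'plan2']
```

Here plan2 stands for a plan that has not (yet) led to any operation: it is excluded by the existential reading but admitted by the universal one, while plan3, which is about a different procedure, is excluded by both.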
C3: Rescheduled plan of heart operation:
Plan ⊓ ∀ isAbout. Operation on heart ⊓
Plan ⊓ (∀ isAbout. Operation on heart) ⊓
with Rescheduling being an event.
C4: Suspicion of gallstones.
Suspicion ⊓ ∀ isAbout. Gallstones
with Suspicion being an information content entity.
There may be a need to distinguish simple instantiations (e.g.
asserting that there is an instance of Gallstones) from a record
of a finding (i.e. that some physician has diagnosed gallstones).
Note that all versions of SNOMED CT until now have placed
Gallstones (a material entity), together with processes like
Myocardial infarction, Headache and Hypercholesterolemia,
into an epistemology-infested Findings hierarchy.
The subtle difference between instantiations and findings is
that there are undiagnosed diseases just as there are false
diagnoses (which remain diagnoses even though they are false).
These special cases should be accounted for in a medical
record, and the terminology should provide the means for this.
We propose a solution using again the example C4.
We may want to distinguish between:
C4a: A diagnosis “Gallstones” whatsoever
C4b: A confirmed diagnosis “Gallstones”
C4c: A suspected diagnosis “Gallstones”
C4d: A false diagnosis “Gallstones”
C4e: Gallstones that have not been diagnosed
In all these cases diagnoses are information content entities.
Depending on the diagnosing person, they can be subdivided into
medical diagnoses, nursing diagnoses, etc.
C4a: Diagnosis ⊓ ∀ isAbout.Gallstones
C4b: Diagnosis ⊓ ∀ isAbout.Gallstones ⊓
∃ isAbout. Gallstones
C4c: Diagnosis ⊓ ∀ isAbout. Gallstones ⊓
C4d: Diagnosis ⊓ ∀ isAbout. ⊥
C4e: Gallstones ⊓ ¬∃ inv(isAbout).Diagnosis
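How these definitions classify individuals can be checked with a minimal Python sketch over a toy model. The individuals dx1, dx2, dx3, g1, g2 and k1 and all helper functions are illustrative assumptions. Only C4a, C4b, C4d and C4e are covered, since the sketch needs a complete definition for each class.

```python
# Toy model; every individual and assertion is an illustrative assumption.
domain = {"dx1", "dx2", "dx3", "g1", "g2", "k1"}
diagnosis = {"dx1", "dx2", "dx3"}
gallstones = {"g1", "g2"}
bottom = set()  # ⊥, the empty class
is_about = {("dx1", "g1"), ("dx3", "k1")}
inv_is_about = {(target, source) for (source, target) in is_about}


def successors(relation, individual):
    return {t for (s, t) in relation if s == individual}


def exists(relation, filler):
    """∃ r.G: at least one r-successor lies in G."""
    return {x for x in domain if successors(relation, x) & filler}


def forall(relation, filler):
    """∀ r.G: every r-successor lies in G (vacuously true if none)."""
    return {x for x in domain if successors(relation, x) <= filler}


def neg(concept):
    """¬ A: complement with respect to the domain."""
    return domain - concept


c4a = diagnosis & forall(is_about, gallstones)
c4b = c4a & exists(is_about, gallstones)
c4d = diagnosis & forall(is_about, bottom)
c4e = gallstones & neg(exists(inv_is_about, diagnosis))

print(sorted(c4a))  # ['dx1', 'dx2']
print(sorted(c4b))  # ['dx1']
print(sorted(c4d))  # ['dx2']
print(sorted(c4e))  # ['g2']
```

In this model dx2, a diagnosis that is about nothing, falls under both C4a and C4d, and g2 is a gallstone that no diagnosis refers to.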
The examples show the possibilities but also the limitations of
using the proposed description logics. If we wanted to
represent quantitative statements, e.g. in C4c that there is a
probability of 0.1 that the diagnosis is true, then we would
need to include numeric values as data properties. As C4d
shows, there is no possibility to distinguish between different
kinds of false diagnoses. From an HL7 point of view, the
establishment of a diagnosis is an observation, a subclass of the
class Act defined as “An act that is intended to result in new
information about a subject.” Being a subclass of the class
Act, the class Observation inherits the attributes of the class
Act, including moodCode, uncertaintyCode and negationInd.
In addition, the UVP or PPD data type extensions may be used to
express, respectively, the information producer's belief that a
given qualitative observation value holds, or the uncertainty of
quantitative data using a distribution function and its parameters.
Numerous SNOMED CT concepts are representations that are
more adequately described by complex linguistic statements
than by domain terms in a stricter sense. These complex
statements address epistemic notions, i.e. information about the
user and the context, which clearly goes beyond the realm of
ontology. Those SNOMED CT concepts that correspond to “real”
terms can generally be defined using the very inexpressive
logic EL, currently used for SNOMED CT.
In our finding that there are numerous SNOMED “non-term”
concepts that cannot be adequately represented given the current
restrictions of SNOMED CT’s logic, we are close to the
analysis done by Rector & Brandt. Like us, they defend the
(controlled) use of a more expressive description logic and
analyze a similar scope of concepts. However, the model they
propose is different. By understanding findings, procedures, and
observables as situations, they manage to solve the negation
problem. Yet their approach falls short when it comes to
uncertainty, such as speculative diagnoses, or plans that have
not yet been executed at the time of [...]
Our approach comes closer to what is possible to encode using
the HL7 RIM, where medical record entries can be modified in
terms of “mood codes” like Event, Goal, Risk, Expectation,
Intent, or uncertainty codes such as Possibly done or Probably
done. It is also consistent with the Information Artifact Ontol-
ogy, which, however, lacks detail for representing diagnostic
statements. Thus, using one single representation formalism,
our proposal brings different worlds together: real-world, hete-
rogeneous terminologies, HL7 information models, as well as
philosophically founded ontologies.
This work was funded by the EU 7th FP project DebugIT
 IHTSDO (Intern. Health Terminology Standards Devel-
opment Organisation). Systematized Nomenclature of
Medicine - Clinical Terms. http://www.ihtsdo.org/snomed-
ct. Last accessed: March 2nd, 2010.
 ISO 1087. Terminology Work. International Standards
 Wermter J. Defining and Collocations and Terms. In:
Wermter J. Collocation and Term Extraction Using Lin-
guistically Enhanced Statistical Methods, chapter 2. 2009,
http://www.dart-europe.eu/full.php?id=159166. Last ac-
cessed: March 2nd, 2010.
 Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-
Schneider PF, editors. The Description Logic Handbook.
Theory, Implementation, and Applications (2nd Edition).
Cambridge: Cambridge University Press, 2007.
 Hofweber T. Logic and Ontology, Stanford Encyclopaedia
of Philosophy, 2004.
Last accessed: March 2nd, 2010.
 HL7 version 3, Jan 2009 ballot package, 2009,
ment/index.htm. Last accessed: March 2nd, 2010.
 Garde S, Knaup P, Hovenga E, Heard S. Towards semantic
interoperability for electronic health records. Methods Inf
Med. 2007; 46(3): 332–343.
 Ingenerf J, Linder R. Assessing applicability of ontologi-
cal principles to different types of biomedical vocabularies.
Methods Inf Med. 2009; 48(5): 459–467.
 Bodenreider O, Smith B, Burgun A (2004). The Ontology-
Epistemology Divide: A Case Study in Medical Terminol-
ogy. Int. Conf. on Formal Ontology and Information Sys-
tems (FOIS 2004). Amsterdam: IOS-Press, 185–195.
Last accessed: March 2nd, 2010.
Basic Formal Ontology. http://www.ifomis.org/bfo. Last
accessed: March 2nd, 2010.
Spackman KA, Dionne R, Mays E, Weis J. Role grouping
as an extension to the description logic of Ontylog, moti-
vated by concept modeling in SNOMED. Proc AMIA
Symp. 2002: 712–716.
Rector AL, Brandt, S. Why Do It the Hard Way? The
Case for an Expressive Description Logic for SNOMED.
Journal of the American Medical Informatics Association
2008; 15: 744–751.
Address for correspondence:
Stefan Schulz, IMBI, University Medical Center Freiburg, Stefan-
Meier-Str. 26, D-79104