216 Int. J. Knowledge Engineering and Data Mining, Vol. 1, No. 3, 2011
Copyright © 2011 Inderscience Enterprises Ltd.
Improving understandability of semantic search
explanations
Thomas Roth-Berghofer*
Knowledge Management Department,
German Research Center for Artificial Intelligence (DFKI) GmbH,
Trippstadter Straße 122, Kaiserslautern, 67663, Germany
and
Knowledge-Based Systems Group,
Department of Computer Science,
University of Kaiserslautern,
P.O. Box 3049, Kaiserslautern, 67653, Germany
E-mail: thomas.roth-berghofer@dfki.de
*Corresponding author
Björn Forcher
Knowledge Management Department,
German Research Center for Artificial Intelligence (DFKI) GmbH,
Trippstadter Straße 122, Kaiserslautern, 67663, Germany
E-mail: bjoern.forcher@dfki.de
Abstract: Explanation-aware software design aims at making software
systems smarter in interactions with their users. The long-term goal is to
provide methods and tools for systematically engineering understandability into
the respective (knowledge-based) software system. In this paper, we describe
how we improved a semantic search engine, i.e., RadSem, regarding
understandability.
The research project MEDICO aims at developing an intelligent, robust and
scalable semantic search engine for medical documents. RadSem is based on
formal ontologies and designated for different kinds of users. Since semantic
search results are often hard to understand, an explanation facility for justifying
and exploring search results was integrated into RadSem, employing the same
ontologies for explanation generation as for searching.
We evaluated the understandability of selected concept labels in an
experiment with different user groups, using semantic networks as the form of
depicting explanations and a class frequency approach for selecting
appropriate labels.
Keywords: understandability; knowledge engineering; justification; graphical
explanation; semantic search; evaluation; medical terms.
Reference to this paper should be made as follows: Roth-Berghofer, T. and
Forcher, B. (2011) ‘Improving understandability of semantic search
explanations’, Int. J. Knowledge Engineering and Data Mining, Vol. 1, No. 3,
pp.216–234.
Biographical notes: Thomas Roth-Berghofer has been a Senior Researcher in DFKI's
Knowledge Management research department since 2004. He was a Consultant
and Quality Manager at the Bertelsmann company Empolis-Arvato before he
joined the University Hospital Heidelberg and DFKI in 2002 as a Researcher
and the Technical Project Director of the award-winning EU-funded project
MedCIRCLE (Janssen-Cilag Future Award 2004). He gives lectures on the
Semantic Web as well as on Case-Based Reasoning at the University of
Kaiserslautern. His research focuses on information technology for knowledge
management systems. He is mainly interested in the development of
trustworthy, complex information systems that are able to explain themselves.
In order to effectively pursue this topic, he initiated a workshop series on
Explanation-aware Computing (ExaCt), thereby, coining this term. He leads the
ExaCt research group at DFKI.
Björn Forcher has been a Researcher at DFKI since 2008. Before that, he worked as a
Software Engineer in the project TaskNavigator, under the auspices of the
Competence Center ‘Virtual Office of the Future’. He also worked in the
project DocuTag where he was responsible for developing an explanation
facility. His research focuses on information technology for KM systems, in
general, and on semantic web technology and logic programming, in particular.
He is mainly interested in the development of an explanation framework that
can be used to integrate explanations in information systems. He is a member
of the ExaCt research group.
1 Introduction
Software systems need the ability to explain ‘reasoning processes’ and their results as
those abilities can substantially affect their usability and acceptance. Explanation-aware
software design (EASD) aims at making software systems smarter in interactions with
their users (Roth-Berghofer, 2009). The long-term goal is to develop methods and tools
for engineering and improving such capabilities.
The research project MEDICO aims (among other things) at developing an intelligent
semantic search engine for medical documents and addresses different kinds of users,
such as medical doctors, medical IT professionals, patients and policy makers. Its
ultimate goal is to realise a cross-lingual and modality-independent search for medical
documents, such as medical images, clinical findings or reports (Möller and Sintek,
2008). Representational constructs of formal ontologies are used to annotate and retrieve
medical documents. For example, medical experts annotate CT scans1 with medical
concepts. The concepts then can be used by other medical experts and laypeople for
retrieving the respective images. The search algorithm leverages the ontology for finding
images that are annotated with the same or adjacent concepts. Currently, the MEDICO
demonstrator RadSem (Möller et al., 2009) employs the Foundational Model of Anatomy
(FMA) (Rosse and Mejino, 2007) and the International Statistical Classification of
Diseases and Related Health Problems (ICD-10) (ICD, 2007).
Since semantic search results are often hard to understand and not necessarily
self-explanatory, explanations are helpful to support users. Each user group has
different requirements and comes with different a priori knowledge regarding the
medical domain. Medical IT professionals, for instance, may want to test the search
engine. In this context, explanations are interesting when the system presents unexpected
results. It may turn out that the implementation or the used ontologies are incorrect.
Hence, explanations can help to correct or improve a system. In contrast to medical
IT professionals, patients and citizens are not interested in the exact implementation of
the search algorithm. Instead, they may want to learn something about the medical
domain. This concerns first of all medical terms but also the connection between medical
concepts.
For addressing these issues, we developed and integrated an explanation facility with
RadSem. The facility is used to justify search results by revealing a connection between
search and annotation concepts. It tries to make the search results more plausible for the
user. To find a connection, the facility also exploits the mentioned ontologies. Thus, the
connection or justification contains further concepts of the FMA or ICD-10. The FMA
in particular provides several medical terms for labelling a specific concept. Since
medical laypeople cannot associate every label with the corresponding concept, a
justification may not be understandable to all of them. Medical experts, in contrast, may prefer explanations
that fit their daily language. In other words, the problem is to select appropriate labels
with respect to different user groups. For this reason, we conducted an experiment and
discuss its results in order to refine the explanation generation specifically to medical
experts and laypeople.
The paper is structured as follows. The next section gives a short overview of
relevant research on explanation generation. Section 3 presents current techniques of
semantic search algorithms and motivates the need for explanations. Section 4 describes
how semantic searching works in the MEDICO demonstrator RadSem. Section 5
presents our work of explaining semantic search results by justification and exploration.
Section 6 describes the user experiment and discusses its results in order to realise a
tool that can be used for tailoring explanations to different user groups. We conclude
the paper with a brief summary and outlook.
2 Related work
The notion of explanation has several aspects when used in daily life (Passmore, 1962).
For instance, explanations are used to describe the causality of events or the semantics of
concepts. Explanations help correcting mistakes or serve as justifications. Furthermore,
they are used to describe functionalities or to communicate practical knowledge. Hence,
the term explanation has an ambiguous notion that is used in different situations.
Explanations in computer science were introduced in the first generation of expert
systems. They were recognised as a key feature explaining solutions and reasoning
processes, especially in the domain of medical expert systems such as MYCIN
(Buchanan and Shortliffe, 1984).
There are several approaches for providing explanations in expert systems that can
be seen as some kind of EASD. XPLAIN (Swartout, 1983), for instance, is a tool that
helps users build expert systems containing explanation components. It also makes
knowledge about the domain usable: the first form of input is a domain model containing
the facts of the domain. The second form of input is a collection of domain
principles, which are the methods or algorithms that apply to the facts. This system
refines the domain knowledge (preserving the knowledge) until it is at an appropriate
level for the implementation of an expert system. An extension is the Explainable Expert
Systems (EES) framework (see Swartout and Smoliar, 1987). Here, knowledge about concepts and
concept classes can be formulated.
Explanation facilities were an important component supporting the user’s needs
and decisions (Swartout et al., 1991). In those early systems, explanations were often
nothing more than (badly) paraphrased rules that lacked important aspects or presented
too much information at once (Richards, 2003). For that reason, Swartout and Moore
(1993) formulated five desiderata for expert system explanations, which also apply to
knowledge-based systems; we tried to consider them in our current work.
1 Fidelity: The explanation must be an accurate representation of what the expert
system really does. Hence, the explanation has to build on the same knowledge that
the system uses for its reasoning.
2 Understandability: This comprises various factors such as user sensitivity and
feedback. User sensitivity addresses the user’s goals and preferences but also his
knowledge with respect to the system and the corresponding domain. Feedback is
very important because users do not necessarily understand a given explanation. The
system should offer certain kinds of dialogue so that users can become clear on parts
they do not understand.
3 Sufficiency: The system has to know what it is talking about. This is an important
factor to enable some kind of dialogue with the user.
4 Low construction overhead: It should not be too complicated to integrate an
explanation component into an expert system.
5 Efficiency: The explanation component should not affect the performance of the
whole system.
In a general explanation scenario (Figure 1) we distinguish three main participants
(Roth-Berghofer and Richter, 2008): the user who is corresponding with the software
system via its user interface (UI), the originator, i.e., the problem solver or ‘reasoning’
component, which provides the functionality for the original task of the software, and
the explainer. Originator and explainer need to be tightly coupled in order to provide the
necessary knowledge about the inner workings of the originator. In (rule-based) expert
systems looking at the rule trace was the only way of accessing the originator’s actions.
Given that the inferencing mechanism is fixed in those systems, the trace was all the
explainer needed. As soon as no trace is available, the explainer has nothing to work
with and, thus, cannot generate an accurate explanation of the reasoning process.
Figure 1 Communication participants in general explanation scenario (see online version for
colours)
Wick and Thompson (1992) present an approach for dealing with this issue. The
reconstructive explainer generates reconstructive explanations for expert systems. It
transforms a trace, i.e., a line of reasoning, into a plausible explanation story, i.e., a
line of explanation. The transformation is an active, complex problem-solving process
using additional domain knowledge. The degree of coupling between the trace and the
explanation is controlled by a filter which can be set to one of four states regulating
the transparency of the filter. The more information of the trace is let through the filter,
the more closely the line of explanation follows the line of reasoning. This approach
enables a disengagement of an explanation component in order to reuse it in other expert
systems. We took up this theme in our current work.
Sørmo and Cassens (2004) describe different explanation goals for case-based
reasoning systems. As they also apply to knowledge-based systems (e.g., MEDICO) we
made use of them (see Section 5.1):
1 Justification: Explain why the answer is a good answer. It is used to give a simpler
explanation than the actual system process.
2 Transparency: Explaining how the system calculates a certain answer allows users
to understand and (better) control the system.
3 Relevance: In conversational systems, this goal aims at why a question asked is
relevant.
4 Learning: In tutoring systems or decision support systems, it is important to teach
users about the respective domain.
The Semantic Web community also addresses the issue of explainability. The Inference
Web effort (McGuinness et al., 2006) realises an explanation infrastructure for complex
Semantic Web applications. Inference Web includes the Proof Markup Language for
capturing explanation information. It offers constructs to represent where information
came from (provenance) or how it was manipulated (justifications). Inference Web
includes different tools and services in order to manipulate and present the explanation
information.
The goal of our research is also to provide tools and algorithms using formal
knowledge such as ontologies for explanation provision. The focus of our work is to
generate understandable and adequate explanations for knowledge-based systems.
3 Semantic search
There are several definitions of the term semantic search. In this work, we refer to
the most common definition and use the term semantic search when formal semantics
are used during any part of the search process (Hildebrand et al., 2007). Two main
categories of semantic search can be identified: fact and semantic document retrieval.
Fact retrieval engines are employed for retrieving facts (triples in the Semantic Web)
from knowledge bases employing formal ontologies. Such approaches apply three kinds
of core search techniques: reasoning, triple-based search, i.e., structural interpretation
of the query guided by semantic relations, and graph traversal search (Hildebrand et
al., 2007).
Semantic document retrieval engines search for documents which are enriched
with semantic information. They use additional knowledge to find relevant documents
by augmenting traditional keyword search with semantic techniques. Such engines
use various thesauri, e.g., the RDF version of WordNet (2006), for query expansion
(see also Section 4) and/or apply graph traversal algorithms to available ontologies
(Hildebrand et al., 2007; Mäkelä, 2005). Analogously, the same semantic techniques are
used to retrieve other kinds of resources, e.g., images, videos, where additional formal
knowledge is used to describe them.
Semantic search engines are often personalised or contextualised. Both extensions
apply additional knowledge: personalisation, for example, about the personal skills,
interests or level of awareness of the user; contextualisation about the currently handled
task, the location, etc. This knowledge can be hard-coded or learnt from user observation
and explicit or implicit feedback. The main point in terms of explanations is that they
influence the search process and results.
Users have various intentions when using the semantic search engine of RadSem. For
instance, a user may want to inform himself about a medical concept he does not
remember. In this case, he most probably searches for a similar or superior concept.
Imagine the user searches for information about the shoulder height but uses the term
shoulder for the search. If the user obtains a document and an associated text snippet
highlighting the term acromion, he or she may not know whether the document is
relevant or not. In this context, a short explanation can provide useful information to
support the user's search intention. An explanation expressing that the term acromion is
a synonym for shoulder height and that shoulder height is part of shoulder may help the
user remember.
The explanation needs to reveal the connection between the query and the obtained
document. In general, users are not interested in the search techniques of the engine,
i.e., how the document is retrieved. In daily tasks, the user requires only a simple
justification of the result. As semantic search algorithms use semantic techniques such as
ontologies, this formal knowledge can be leveraged to generate appropriate explanations.
4 Semantic search in RadSem
The MEDICO demonstrator RadSem uses formal ontologies to annotate medical
documents in order to describe their content. RadSem employs the FMA and ICD-10
ontology. As there was no ontology of ICD-10 available, Möller et al. (2009) developed
a tool that parses the English and German online versions and provides an OWL
a tool that parses the English and German online versions and provides an OWL
ontology of ICD-10. The search algorithm exploits the class structure of these ontologies
to retrieve documents that are annotated with semantically similar concepts with respect
to a certain search concept. For instance, searching for radiographs of the hand, users
may obtain documents that are annotated with the concept index finger or pisiform bone.
Semantic Web technologies provide the means to annotate and query medical domain
knowledge of high quality without requiring the expertise of medical experts. To
eliminate the weak points of existing systems, RadSem employs the Semantic Web
languages OWL (McGuinness and van Harmelen, 2004) and RDF (Beckett, 2004) as a mutual basis
to represent annotations and medical concepts in the same formalism. The MEDICO
Ontology Hierarchy (Möller and Sintek, 2007) comprises various medical ontologies
that cover different aspects of biomedical knowledge and patient data management. The
reason for this is to reutilise existing medical background knowledge formalised in
such ontologies as the FMA (Rosse and Mejino, 2007) and terminologies as RadLex
(Langlotz, 2006) and ICD-10.
Different studies, e.g., Marwede et al. (2007) or Marwede and Fielding (2007),
came to the conclusion that biomedical ontologies and terminologies are applicable
for indexing medical knowledge such as CT scans of the brain or radiograph reports
of the shoulder. Annotations of medical data are stored as instances of well-defined
OWL classes. The advantage of this approach is that it enables further applications
such as clinical data mining. The integration of medical annotations and machine-aided
extraction of patient and image metadata using the same representational concepts
facilitates advanced visualisations.
The MEDICO prototype RadSem has two main functionalities, i.e., annotating and
retrieving medical images. With respect to the first functionality, images such as CT
scans can be opened, visualised and segmented into so-called regions of interest (ROI).
There are three kinds of medical annotations for these ROIs. Regarding anatomy the
FMA is applied. Representational constructs for indicating the visual manifestation of
anatomical entities on images are taken from the RadLex terminology. The third kind of
annotation concerns the disease aspect. In this case, the ICD-10 provides the necessary
concepts.
The search functionality allows retrieving semantically annotated images with
terms provided by the mentioned ontologies and terminologies. Terms, entered in a
search field, are mapped to ontological classes. To make searching more convenient,
auto-completing combo boxes are used: while typing keywords, class labels with
matching prefixes are displayed in a drop-down box and can be used for the search.
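Such auto-completion can be sketched as a case-insensitive prefix match over the ontology's class labels. The label list below is invented for illustration; RadSem draws its labels from the FMA, RadLex and ICD-10:

```python
def complete_prefix(prefix, labels):
    """Return all class labels starting with the typed prefix (case-insensitive)."""
    p = prefix.lower()
    return sorted(label for label in labels if label.lower().startswith(p))

# illustrative label set, not the actual FMA vocabulary
labels = ["Hand", "Index finger", "Middle finger", "Ring finger", "Pisiform bone"]
print(complete_prefix("ind", labels))  # -> ['Index finger']
```

A production implementation would typically use a trie or a sorted index rather than a linear scan, but the observable behaviour is the same.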
The search algorithm of RadSem utilises the structure in the presented ontologies.
For instance, there is a path in the part-of hierarchy of the FMA which leads from
the concept hand to the concept ring finger. In a previous version of RadSem, this
connection was used to generate a query expansion and to compute a SPARQL query
(Prud'hommeaux and Seaborne, 2007). Regarding the FMA, a request for the concept hand
searches for the hand itself and further nearby concepts such as thumbnail or middle
finger. In other words, there must be a (transitive) connection between the concepts
which is expressed by the object properties regional part of and constitutional part of
of the FMA. The rank of the results is based on the path length between the search
concept and the concept that is used for annotating a specific image. Regarding ICD-10,
the subClassOf property is used.
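Such an expansion can be sketched as a SPARQL query built around a property path over the two part-of relations. The property and annotation IRIs below are placeholders for illustration, not the FMA's actual identifiers:

```python
def expansion_query(search_concept):
    """Build a SPARQL query retrieving images annotated with the search concept
    or any concept that is a (transitive) regional/constitutional part of it."""
    return (
        "SELECT ?image ?annotated WHERE {\n"
        f"  ?annotated (fma:regional_part_of|fma:constitutional_part_of)* <{search_concept}> .\n"
        "  ?image ex:annotatedWith ?annotated .\n"
        "}"
    )

query = expansion_query("http://example.org/fma#Hand")
print(query)
```

Ranking by path length, as described above, would additionally require tracking the distance between the search concept and each matched annotation concept, which plain property paths do not expose; this is one reason the materialised spanning tree described next was introduced.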
It turned out that computing SPARQL queries at run-time is not fast enough.
According to Möller et al. (2009), the FMA contains some kind of ‘multiple inheritance’
with respect to the part of hierarchy. For example, the lung is both part of set of
thoracic viscera and of lower respiratory tract. For this reason, a spanning tree for the
regional part of and constitutional part of relation was materialised originating from
the concept human body. The determination of the shortest path to the root of this
closure allows computing a query expansion of any depth in two retrieval dimensions:
the first retrieves a set of concepts in the spanning tree of the part of relation down to a
specific depth; the second retrieves the annotated images.
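With parent pointers materialised for the spanning tree, both the depth computation and the fixed-depth expansion reduce to simple traversals. A minimal sketch, using an invented fragment of the tree for illustration:

```python
# parent[child] = its container in the materialised part-of spanning tree,
# rooted at 'human body' (illustrative fragment, not the actual FMA tree)
parent = {
    "upper limb": "human body",
    "hand": "upper limb",
    "index finger": "hand",
    "distal phalanx of index finger": "index finger",
}

def depth(concept):
    """Shortest path length from the concept to the root of the closure."""
    d = 0
    while concept in parent:
        concept = parent[concept]
        d += 1
    return d

def expand(concept, max_depth):
    """First retrieval dimension: all concepts in the subtree of `concept`
    down to a given depth."""
    result, frontier = {concept}, [concept]
    for _ in range(max_depth):
        frontier = [c for c in parent if parent[c] in frontier]
        result.update(frontier)
    return result

print(sorted(expand("hand", 2)))
```

The second retrieval dimension would then fetch the images annotated with any concept in the expanded set.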
Currently, the MEDICO ontology hierarchy comprises more than two million RDF
triples. A client-server architecture was installed to keep start-up times as short as
possible despite the large number of triples. For this purpose, RadSem itself handles
only user interactions and query expansion. The answering of SPARQL queries is outsourced
to an external server hosting SwiftOWLIM (OWLIM Semantic Repository, 2009),
Version 3, a Storage and Inference Layer (SAIL) for the Sesame RDF store (2009).
A further benefit of applying medical ontologies such as the FMA is the availability
of labels in various languages. In addition, several concepts are labelled with more than
one synonym in a certain language. In the current release of RadSem, any language for
annotating or searching images is possible. The English language offers a large number
of synonyms, but also Latin, German, French and Filipino labels are available.
5 Explanations in RadSem
The explanation facility in RadSem comprises two components: the justification
component and the exploration component. As its name implies, the first component is
primarily intended to justify the retrieval of medical documents. The second component
can be used to explore the underlying ontologies and offers various kinds of interaction.
In principle, the explanation facility is independent of RadSem. It uses the
Achilles library as an alternative interface to access the underlying triple store. Achilles is
developed by the Core Technology Cluster (CTC) (THESEUS-Basistechnologien, 2009)
of the THESEUS research programme and allows using the OWL API (2009) with
RDFS stores and reasoners, thus enabling applications to switch from RDFS to OWL
or vice versa (or even use both). Hence, the explanation facility can be decoupled from
RadSem and used in different settings with respect to the mentioned ontology languages.
5.1 Considered kinds and goals of explanations
The explanation facility realises two kinds of explanations (Spieker, 1991), namely
action explanations and concept explanations. The purpose of action explanations is to
describe how a result was obtained, thus providing a justification with respect to the
current situation. Most probably, a detailed action explanation of the semantic search
algorithm is not important for most MEDICO users. For this reason, we provide simple
evidence for the semantic search results in the justification component. The purpose
of concept explanations is to link an unknown concept to already known terms. The
exploration component reveals connections of a concept of interest with respect to
several kinds of relations and offers supporting filter options.
In the design of our explanation facility we also considered Sørmo and Cassens’
(2004) explanation goals (see Section 2) explicitly. Inexperienced users such as patients,
citizens or policy makers may use the explanation facility in order to learn something
about the medical domain. Furthermore, justifications can help to maintain trust and
acceptance in results. In contrast to inexperienced users, medical IT professionals can
use the explanations to detect problems and errors in the used ontologies that cannot be
found by syntax checkers or other authoring tools. This holds especially for very
large ontologies such as the FMA with its roughly 70,000 concepts.
5.2 Justification component
The implementation of the justification component does not follow all desiderata as
presented in Section 2. One essential problem is the demand for a low construction
overhead, which contradicts fidelity. As the need for explanations in RadSem was
recognised only when the semantic search algorithm was already implemented, adding
an intelligent logging mechanism to RadSem was not an option due to complex
implementation issues. Hence, the demand for fidelity could not be fulfilled here.
For this reason, the justification component performs a kind of reconstructive
explanation as described by Wick and Thompson (1992), omitting the process information
of the search algorithm altogether (filter in the no-restrict state). In this case, the search
concepts correspond to the input and the annotation concepts correspond to the output in the line of explanation,
whereas the story in between is constructed by the explanation facility using the
ontologies FMA and ICD-10 as knowledge base.
Since search and annotation concepts belong to ontologies, the construction is very
simple. The part of and subClassOf hierarchies are transformed into a semantic network
representing a mathematical graph. Thus, the construction of the line of explanation
for semantic search in MEDICO can be reduced to a shortest path problem. We chose
Dijkstra’s (1959) algorithm to solve this problem. The algorithm requires non-negative
edge costs, which raises the question of which costs to choose for the properties of
the ontologies. In our current implementation, we assume an equal distribution,
i.e., all properties have the same cost. As the shortest paths found may not be the same
paths followed by the semantic search algorithm, the fidelity desideratum is not ensured.
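With uniform edge costs the computation would reduce to breadth-first search, but the sketch below keeps Dijkstra's algorithm to match the text. The small part-of graph is invented for illustration, using the concepts from the worked example later in this section:

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a semantic network; graph[u] yields
    (neighbour, cost) pairs. All ontology properties get equal cost 1."""
    dist, prev = {source: 0}, {}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in graph.get(u, ()):
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(queue, (d + cost, v))
    # reconstruct the line of explanation from target back to source
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1]

# illustrative part-of edges between a search and an annotation concept
graph = {
    "articular cartilage of distal epiphysis of index finger":
        [("distal phalanx of index finger", 1)],
    "distal phalanx of index finger": [("index finger", 1)],
}
print(shortest_path(graph,
                    "articular cartilage of distal epiphysis of index finger",
                    "index finger"))
```

The returned node sequence is exactly the chain of concepts the justification component renders as a semantic network.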
Figure 2 Justification component (see online version for colours)
Note: Letter A symbolises anatomical concepts, i.e., from the FMA, and letter D refers to disease
concepts, i.e., from ICD-10.
Source: Forcher et al. (2009)
Figure 2 depicts an example search in RadSem with respective justification. In this
case, the justification belongs to the first retrieved document containing two anatomical
annotations (and one disease annotation). Thus, two ‘anatomical paths’ were constructed
and integrated into one semantic network.
Explanations (like any kind of information) have two different aspects: form
and content (Kemp, 1992). An explanation is some kind of information which is
communicated through a certain form of depiction such as text or semantic networks
(Ballstaedt, 1997).
With respect to the understandability desideratum we chose semantic networks. In
general, charts are used to represent qualitative connections between concepts and can
be used as an (understandable) alternative to text (Wright and Reid, 1973). A text
communicating the justification would be too difficult to understand in most cases
(but may be a viable alternative in other cases and, thus, is offered by the system on
request). This is especially the case when using the English labels of the FMA, which
partly are very long. Hence, a long text may have a negative impact on understandability
(Davison and Green, 1988).
Consider the following example concerning a medical image with the annotation
concept index finger. A textual justification for obtaining this image with respect to the
search concept articular cartilage of distal epiphysis of index finger can be paraphrased
as follows: “The anatomical search concept articular cartilage of distal epiphysis of
index finger is part of the concept distal phalanx of index finger that is part of the
annotated concept index finger. Hence, the search concept is part of the annotated one”.
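Verbalising such a path is mechanical once the path is known. A minimal sketch of the paraphrase used above (the path is the one from the example; the sentence template is our own illustration, not RadSem's actual text generator):

```python
def verbalise(path):
    """Turn a part-of path from search concept to annotation concept
    into a textual justification."""
    parts = [f"The anatomical search concept {path[0]}"]
    for concept in path[1:-1]:
        parts.append(f"is part of the concept {concept} that")
    parts.append(f"is part of the annotated concept {path[-1]}.")
    return " ".join(parts)

path = ["articular cartilage of distal epiphysis of index finger",
        "distal phalanx of index finger", "index finger"]
print(verbalise(path))
```

As the output shows, with the FMA's long labels even a three-concept path yields an unwieldy sentence, which motivates the graphical form discussed next.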
Here, it may be difficult to remember connections or to keep different functions
of concepts in mind such as anatomical search concept or disease annotation concept.
For this reason, we prefer the use of semantic networks using a coherent visualisation
scheme in order to address these difficulties.
Figure 2 justifies the result of a query searching for the concepts hand and fracture
at wrist and hand level where the corresponding medical image is annotated with the
concepts distal phalanx of index finger, middle finger and fracture at wrist and hand
level. To support the comprehension of users, nodes integrate different information.
Letter A symbolises anatomical and letter D disease concepts. A magnifying glass stands
for a search concept, whereas the bitmap symbol indicates an annotation concept. A
magnifying glass with an underlying bitmap symbol means that search and annotation
concept are the same.
Figure 2 shows an additional feature of semantic networks. Search concepts are
always in the upper part of the figure whereas annotation concepts are in the lower part
of it. Hence, one can use spatial order to encode non-spatial information which also
supports the comprehension of users (Ballstaedt, 1997).
The graph is drawn with the Jung2-Util (2009), an extension of Jung2 (2009).
The tool provides several intelligent (semi-automatic) layout managers and supports
individual graph, node and edge visualisations.
5.3 Exploration component
The justification may not always be the most appropriate form of explanation for all
targeted user groups. For this reason, we offer the exploration component, which serves
different purposes: it can be used to learn about either the medical domain or the
structure of the used ontologies.
Figure 3 illustrates the exploration component using semantic networks to illustrate
connections between different concepts. The central node is the one of interest showing
all adjacent nodes with respect to sub-ordinations and part-of-relations. The relations
regional part of and constitutional part of as presented in Section 4 are bundled under
one general part-of relation. The rationale here is also to simplify the visualisation for
understandability reasons.
Figure 3 Exploration component (see online version for colours)
Note: The super (1) and subclass (2) relations of the concept index finger are depicted as well as the
concept’s parts (3) and container (4).
The depicted graph is divided into four areas using a segment layout. Each segment
indicates certain information similar to the order of nodes in the justification component.
The upper right segment (1) shows super classes of the central node, whereas the lower
right segment (2) contains all subclasses. The lower left segment (3) lists all concepts
which are part of the central concept. Finally, the upper left segment (4) is reserved for
the concepts that contain the central concept. Hence, more general concepts are depicted in the upper half of the figure and more specific concepts appear in the lower half.
The lower left segment is somewhat special. Due to its structure, the FMA contains numerous nodes with very long labels. Rendering all available information would lead to a confusing visualisation, as node labels would overlay other labels. The same applies to the edge label part-of. For this reason, we shortened the node labels and rendered just one edge label in the middle of that segment. Shortening the labels is straightforward because of the labelling strategy of the FMA: in many cases, a concept's label contains a part of the central node's label. We abbreviated those parts with three dots; the reader can easily reconstruct the complete label in combination with the central node.
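The segment assignment described above can be written down as a small mapping. The following is a hedged sketch, assuming a neighbour is characterised by its relation to the central concept and the direction of that relation; the function and quadrant names are ours, not part of RadSem:

```python
# Assign a neighbour of the central concept to one of the four
# segments of the exploration view. 'outgoing' means the relation
# points from the central concept to the neighbour (the neighbour
# is more general, or contains the central concept).
def segment(relation, direction):
    if relation == "subclass":
        # superclasses top right, subclasses bottom right
        return "upper right" if direction == "outgoing" else "lower right"
    if relation == "part-of":
        # containers top left, parts bottom left
        return "upper left" if direction == "outgoing" else "lower left"
    raise ValueError("unknown relation: " + relation)
```

Encoding the segment purely as a function of relation type and direction keeps the layout consistent with the justification component's ordering of nodes.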
The exploration component offers multiple interaction modes. In exploration mode,
users can click on nodes in the segments to learn about their adjacency. A history panel
right beside the explanation panel contains all explored concepts. Furthermore, users can
hide certain types of relations. For instance, they can hide the part-of relation so that only the super- and subclasses of the central concept remain. Finally, users can pick and move nodes within the segments, move the whole graph and scale the graph up or down.
5.4 Discussion
So far, we have described the initial implementation of our explanation component. The justification component as presented in Section 5.2 already reveals two general problems. The first issue concerns the generation itself. Dijkstra's algorithm determines only a single, shortest path. Hence, potential alternative explanations, which may be better in a certain context or for a different user group, are not found. In addition, the number of concepts, and thus the amount of information, is preset: the explanation path may contain too much or too little information.
The second problem concerns the adequacy of a justification. The FMA in particular provides several synonyms to label a concept. Currently, the explanation facility uses the preferred label to name a concept in the explanation path. Most probably, not all users can associate the preferred label with the corresponding concept. Understandability is an important aspect of explanation quality, especially for medical laypeople, whereas (medical) experts prefer the terms they use in their daily work. For instance, the term shoulder girdle may be better for laypeople, whereas pectoral girdle is more appropriate for experts. In short, the difficulty is to determine the best label for different user groups such as medical experts and medical laypeople.
In this paper, we focus on the second problem. Our goal is to provide a simple
approach to evaluate labels with respect to the different user groups. This approach may
be extended not only to evaluate single labels but also to evaluate alternative explanation
paths or justifications.
Beyond question, the degree of knowledge about medical terms has a significant
effect on adequacy and understandability. Hence, a method is required to determine the
degree of knowledge of different user groups with respect to the terms or labels used in
an explanation path.
Hayes (1992) showed that understandability correlates with the frequency of terms
in natural language. This insight is the basis of the experiment in the following section.
To operationalise term frequencies in natural language, we introduce so-called frequency classes.
The frequency class of a term t is defined as follows (zu Eissen and Stein, 2006): let C be a text corpus and let f(t) denote the frequency of a term t ∈ C. The frequency class c(t) of a term t ∈ C is defined as ⌊log2(f(t*)/f(t))⌋, where t* denotes the most frequently used term in C.
In many English corpora, t* is the term the, which corresponds to frequency class 0. Thus, a more uncommonly used term has a higher frequency class. In the following, we refer to any frequency class c(t) = i as ci.
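The definition translates into a few lines of code. The following is a minimal sketch, assuming the corpus is available as a plain token list; in practice the frequencies came from the Leipzig Wortschatz service:

```python
import math
from collections import Counter

def frequency_class(term, corpus_tokens):
    """c(t) = floor(log2(f(t*) / f(t))), where t* is the most
    frequent term in the corpus; None if the term does not occur."""
    counts = Counter(corpus_tokens)
    f_t = counts[term]
    if f_t == 0:
        return None  # frequency class undefined for unseen terms
    f_star = max(counts.values())
    return math.floor(math.log2(f_star / f_t))
```

For instance, in a toy corpus where the occurs eight times and a rare term once, the rare term falls into frequency class 3.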
6 User experiment
Our main hypothesis is that the degree of knowledge of medical terms correlates
with frequency classes. The more often a term is used in natural language, the more
users know about that term. In order to verify the applicability of this assumption, we
conducted a user experiment in which the test candidates were asked to estimate their knowledge of several medical terms.
6.1 Experiment setup
For evaluating the personal estimation of medical knowledge, 200 medical terms of the FMA and ICD-10 consisting of one or two tokens were selected. As German is the mother tongue of the test candidates, only German terms were considered in order to avoid distorting the evaluation through language problems. We randomly selected ten terms for each frequency class c10, ..., c13 and 15 terms for each frequency class c14, ..., c21. The frequency classes were determined with the help of a service of the University of Leipzig (2009).
The first group of terms contains well-known terms such as Schulter (shoulder), Grippe (influenza), or Zeigefinger (index finger), which all test candidates typically
know. In contrast, the second group contains terms that are typically unknown to
medical laypeople. In addition, we randomly selected 40 terms of the FMA and ICD-10
where a frequency class could not be determined in order to have a greater probability
that at least some terms were unknown to medical experts. We refer to the corresponding
frequency class as c22.
The 200 medical terms were randomly subdivided into four tests each containing a
varying number of frequency classes. Each test candidate was allowed to do only one
test. Thus, we had to take care that each of the four tests was taken as often as any
other one. All test candidates had to estimate their knowledge about each test term on a
scale from 1 to 5 (see Table 1) indicating their personal knowledge estimation (PKE).
Table 1 Personal knowledge estimation
1 The term is completely unknown.
2 The term has been heard of, but cannot be properly integrated into a medical context.
3 The meaning of the term is known or can be derived. In addition, the term can be vaguely integrated into a medical context.
4 The meaning of the term is known and it can be associated with further medical terms.
5 The term is completely clear and comprehensive knowledge can be associated with it.
6.2 Evaluation
In total, 36 persons participated in the experiment: 28 laypeople and eight medical
experts. The two groups were differentiated as follows. Test candidates with profound
medical qualification were classified as experts. For instance, this concerns medical
staff, medical students and doctors. All other test candidates were classified as laypeople.
Figures 4 and 5 depict the result of the evaluation.
Figure 4 depicts the average PKE as a function of the frequency classes for experts (upper curve) and laypeople (lower curve). Figure 5 depicts the corresponding standard deviation.
Figure 4 contains two outliers for medical laypeople: c13 and c19. The first can be traced to the term Atlas, an ambiguous term whose meaning in a geographical context is quite common, whereas its meaning as the first cervical vertebra is relatively unknown. The second outlier can be traced to some compounds which are quite common in the German language: the meaning of those terms can easily be derived, but their occurrence in daily language is rare. In contrast to laypeople, the curve of the medical experts shows no irregularity. Only the estimation of general terms is noteworthy: experts seem to consider what they do not know with respect to the general term. The standard deviation of both groups is quite interesting. From frequency class c18 on, the values jump up. A possible reason may be that people have more knowledge in some subfields of the medical domain than in others, e.g., concerning a disease they themselves have.
Figure 4 Average values for experts (black) and laypeople (grey)
Figure 5 Standard deviation for experts (black) and laypeople (grey)
The main objective of the experiment was not merely to verify a correlation between users' degree of knowledge and frequency classes. Rather, the primary intention is to identify intervals of frequency classes that serve as a prognosis of whether a user group probably knows a term or requires supporting information. For this purpose, we introduce three Boolean functions: k(t) for known terms, s(t) for support-requiring terms, and u(t) for unknown terms. With respect to the average PKE of medical laypeople, we identified three suitable intervals and defined the functions as follows (index l indicates laypeople):
1 kl(t) is true iff c(t) ∈ [c11, ..., c15]
2 sl(t) is true iff c(t) ∈ [c16, ..., c19]
3 ul(t) is true iff c(t) ∈ [c20, ..., cn] and n > 20.
The proposed functions do not apply to medical experts. The average PKE over all concepts indicates that medical experts generally know the terms used in the FMA and ICD-10. Thus, only the function ke(t) can be defined, which is always true (index e indicates experts).
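The interval functions for laypeople and the trivial expert function can be written down directly. The interval bounds follow the definition above; the Python function names are ours:

```python
def k_l(c):
    """Laypeople probably know terms of frequency classes c11..c15."""
    return 11 <= c <= 15

def s_l(c):
    """Laypeople require supporting information for classes c16..c19."""
    return 16 <= c <= 19

def u_l(c):
    """Terms of class c20 and above are probably unknown to laypeople."""
    return c >= 20

def k_e(c):
    """Experts are assumed to know all FMA/ICD-10 terms."""
    return True
```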
As mentioned before, the functions allow evaluating a justification as presented in Section 5. Suppose there are two justifications A and B of the same search result, both comprising three terms. If the middle term of A is a known term and the middle term of B is a support-requiring term, justification A is probably the better one. The functions can also be used to tailor justifications. Let a justification represent a path in the class hierarchy comprising four terms. If one of the middle terms is unknown and the other one is known, the unknown term can be removed.
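The tailoring step can be sketched as follows. This is a hedged illustration, assuming the justification path is a list of terms whose endpoints (search and annotation concept) must be kept, and that `freq_class` and `unknown` behave like c(t) and ul(t) above; the helper name is ours:

```python
def tailor(path, freq_class, unknown):
    """Remove unknown interior terms from a justification path,
    keeping the first and last term. Terms without a frequency
    class are kept, since no prediction can be made for them."""
    interior = [t for t in path[1:-1]
                if freq_class(t) is None or not unknown(freq_class(t))]
    return [path[0]] + interior + [path[-1]]
```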
In many cases, labels of the FMA or ICD-10 contain other concept labels. For instance, distal phalanx of left index finger includes the concept labels distal phalanx, left and index finger. All these labels have different frequency classes, and thus no prediction can be made as to whether a user knows such a concept (this applies to all non-lexical labels). But such a prediction may not be necessary in order to select the most suitable label of a concept for medical laypeople or experts. Using kl(t) and ke(t), it is possible to define two sets of labels. These sets can be generalised with respect to various attributes of the labels, such as the average frequency class of sublabels, label length or token count. The most prominent member of one class can then be used to solve a label selection problem: a label with minimal distance to that member may be the most appropriate label of a concept for the respective user group.
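The label-selection idea can be sketched with a simple feature vector per label (average frequency class of its tokens, character length, token count) and a nearest-prototype decision. This is a hedged sketch: the function names, the feature choice and the fallback class of 25 for tokens without a frequency class are our assumptions:

```python
import math

def features(label, freq_class):
    """(average frequency class of tokens, character length, token count)."""
    tokens = label.split()
    classes = [freq_class(t) for t in tokens if freq_class(t) is not None]
    avg = sum(classes) / len(classes) if classes else 25.0
    return (avg, len(label), len(tokens))

def select_label(candidates, prototype, freq_class):
    """Pick the candidate label closest to the prototype's features."""
    proto = features(prototype, freq_class)
    return min(candidates,
               key=lambda l: math.dist(features(l, freq_class), proto))
```

With a frequent token such as shoulder and a rare one such as pectoral, the sketch prefers shoulder girdle over pectoral girdle when the prototype is a well-known label.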
The presented experiment and proposed method should only be seen as a first approach to improving the current explanation generation. We ignored some important aspects in the experiment, such as compounds or ambiguous terms. In addition, users probably cannot estimate their knowledge with complete accuracy. For this reason, the presented approach can either be regarded as an inexpensive way of providing concept explanations or as a starting point for evaluating terms or complete explanation paths, which can be further improved by using more complex methods and user interactions.
7 Summary and outlook
In this paper, we presented the explanation facility of the MEDICO demonstrator RadSem. The semantic search engine of RadSem uses formal ontologies, i.e., the FMA and ICD-10, to annotate and retrieve medical documents. Since semantic search results
are often hard to understand, an explanation facility for justifying and exploring search
results was integrated into RadSem.
Figure 6 depicts a refined explanation scenario where the explainer comprises the
justification and exploration components and RadSem is the originator. The explainer
employs the very same ontologies for explanation generation as RadSem. Furthermore,
it uses some additional knowledge for providing adapted explanations: a (simple) user
model and frequency classes. As the originator does not provide trace information, a
reconstructive explanation approach has been selected as a means of justifying semantic
search results. The explainer uses Dijkstra’s algorithm for reconstructing a plausible
explanation.
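A minimal version of such a reconstruction over an ontology graph can be sketched with Dijkstra's algorithm. The adjacency structure and the example concepts below are illustrative, not taken from the FMA:

```python
import heapq

def shortest_path(graph, start, goal):
    """graph: dict mapping a concept to [(neighbour, weight), ...].
    Returns the cheapest path from start to goal, or None."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None
```

The returned node sequence is what the justification component renders as an explanation path between search and annotation concept.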
Figure 6 Communication participants in MEDICO explanation scenario (see online version for
colours)
For improving justifications, we conducted an experiment with medical experts and laypeople. The objective of the experiment was to examine an assumed correlation between users' degree of knowledge and the frequency classes of medical terms. We discussed the results and proposed an easy-to-compute but effective method that can be used for selecting understandable ontology terms in explanations for medical experts or laypeople. The
overall approach can be used to justify various semantic search algorithms using formal
ontologies.
The presented explanation approach uses an arbitrary shortest path as determined by Dijkstra's algorithm. In our future work, we will compute multiple shortest paths and try to
determine which is the most suitable one in an explanation scenario. In addition, we will
investigate further EASD methods such as user interactions for tailoring explanations to
specific users.
Acknowledgements
This work has been supported in part by the research programme THESEUS in
the MEDICO project, funded by the German Federal Ministry of Economics and
Technology (grant number 01MQ07016). The responsibility for this publication lies with
the authors.
References
Ballstaedt, S.P. (1997) Wissensvermittlung, Beltz Psychologische Verlags Union.
Beckett, D. (2004) ‘RDF/XML syntax specification (revised), W3C recommendation’, February, available at http://www.w3.org/TR/rdf-syntax-grammar/ (accessed February 2009).
Buchanan, B.G. and Shortliffe, E.H. (Eds.) (1984) Rule-Based Expert Systems: The MYCIN
Experiments of the Stanford Heuristic Programming Project, Addison-Wesley Publishing
Company, Reading, Massachusetts.
Davison, A. and Green, G.M. (1988) Linguistic Complexity and Text Comprehension: Readability
Issues Reconsidered, L. Erlbaum Associates, Hillsdale, NJ.
Dijkstra, E.W. (1959) ‘A note on two problems in connexion with graphs’, Numerische Mathematik,
Vol. 1, pp.269–271, available at http://jmvidal.cse.sc.edu/library/dijkstra59a.pdf.
Forcher, B., Möller, M., Sintek, M. and Roth-Berghofer, T. (2009) ‘Explanation of semantic search results of medical images in MEDICO’, in T.R. Roth-Berghofer, N. Tintarev and D.B. Leake (Eds.): Workshop 10@IJCAI-09: Explanation-aware Computing (ExaCt 2009), pp.13–24.
Hayes, D.P. (1992) ‘The growing inaccessibility of science’, Nature, Vol. 356, pp.739–740.
Hildebrand, M., van Ossenbruggen, J.R. and Hardman, L. (2007) ‘An analysis of search-based user interaction on the semantic web’, Report, CWI, Amsterdam, The Netherlands, July.
International Statistical Classification of Diseases and Related Health Problems, 10th revision (2007)
Available at http://www.who.int/classifications/apps/icd/icd10online.
Java Universal Network/Graph Framework (JUNG) (2009) Available at http://jung.sourceforge.net/.
Kemp, E.A. (1992) ‘Communicating with a knowledge-based system’, in P. Brezillon (Ed.): Improving
the Use of Knowledge-Based Systems with Explanation.
Langlotz, C.P. (2006) ‘RadLex: a new method for indexing online educational materials’, RadioGraphics, Vol. 26, pp.1595–1597, doi: 10.1148/rg.266065168, available at http://radiographics.rsnajnls.org/cgi/content/full/26/6/1595.
Layout Managers for Jung2 (2009) Available at http://www.jung2util.opendfki.de.
Mäkelä, E. (2005) ‘Survey of semantic search research’, in Proceedings of the Seminar on Knowledge Management on the Semantic Web, Department of Computer Science, University of Helsinki, available at http://www.seco.hut.fi/publications/2005/makela-semantic-search-2005.pdf.
Marwede, D. and Fielding, J.M. (2007) ‘Entities and relations in medical imaging: an analysis of
computed tomography reporting’, in Applied Ontology, Vol. 2, pp.67–79, IOS Press, Amsterdam,
The Netherlands.
Marwede, D., Schulz, T. and Kahn, T. (2007) ‘Indexing thoracic CT reports using a preliminary
version of a standardized Radiological Lexicon (RadLex)’, Journal of Digital Imaging,
December, Vol. 21, No. 4, pp.363–370, available at
http://www.springerlink.com/content/kv717q834l463716/fulltext.pdf.
McGuinness, D.L. and van Harmelen, F. (2004) ‘OWL Web Ontology Language overview, W3C
recommendation’, World Wide Web Consortium, February.
McGuinness, D.L., Ding, L., Glass, A., Chang, C., Zeng, H. and Furtado, V. (2006) ‘Explanation
interfaces for the Semantic Web: issues and models’, in Proceedings of the 3rd International
Semantic Web User Interaction Workshop (SWUI‘06).
Möller, M. and Sintek, M. (2007) ‘A generic framework for semantic medical image retrieval’, in Proc. of the Knowledge Acquisition from Multimedia Content (KAMC) Workshop, 2nd International Conference on Semantics And Digital Media Technologies (SAMT), November.
Möller, M. and Sintek, M. (2008) ‘A scalable architecture for cross-modal semantic annotation and retrieval’, in A.R. Dengel, K. Berns and T.M. Breuel (Eds.): KI 2008: Advances in Artificial Intelligence, Springer.
Möller, M., Regel, S. and Sintek, M. (2009) ‘RadSem: semantic annotation and retrieval for medical images’, in Proc. of The 6th Annual European Semantic Web Conference (ESWC2009), June, available at http://www.manuelm.org/publications/wp-content/uploads/2009/02/eswc2009.pdf.
OpenRDF.org – Sesame Development Site (2009) Available at http://www.openrdf.org/.
OWLIM Semantic Repository (2009) Available at http://www.ontotext.com/owlim/.
Passmore, J. (1962) ‘Explanation in everyday life, in science and in history’, History and Theory, Vol. 2, No. 2, pp.105–123, Blackwell Publishing for Wesleyan University.
Prud’hommeaux, E. and Seaborne, A. (2007) ‘SPARQL query language for RDF’, Technical report,
W3C, March, available at http://www.w3.org/TR/rdf-sparql-query/.
Richards, D. (2003) ‘Knowledge-based system explanation: the ripple-down rules alternative’,
Knowledge and Information Systems, Vol. 5, pp.2–25.
Rosse, C. and Mejino, J.L.V. (2007) ‘The foundational model of anatomy ontology’, Anatomy Ontologies for Bioinformatics: Principles and Practice, December, Vol. 6, pp.59–117, Springer, doi: 10.1007/978-1-84628-885-2, available at http://sigpubs.biostr.washington.edu/archive/00000204/01/FMA Chapter final.pdf.
Roth-Berghofer, T. (2009) ‘ExaCt manifesto: explanation-aware computing’, July, available at
http://on-explanation.net/content/ExaCt Manifesto.html.
Roth-Berghofer, T.R. and Richter, M.M. (2008) ‘On explanation’, Künstliche Intelligenz, May, Vol. 22, No. 2, pp.5–7.
Sørmo, F. and Cassens, J. (2004) ‘Explanation goals in case-based reasoning’, in P. Gervás and K.M. Gupta (Eds.): Proceedings of the ECCBR 2004 Workshops, Technical Report of the Departamento de Sistemas Informáticos y Programación, Universidad Complutense de Madrid, No. 142-04, pp.165–174, Madrid, available at http://www.idi.ntnu.no/ cassens/work/publications/download/2004-ECCBR-WS-goals.pdf.
Spieker, P. (1991) ‘Natürlichsprachliche Erklärungen in technischen Expertensystemen’, Dissertation, University of Kaiserslautern.
Swartout, W.R. (1983) ‘XPLAIN: a system for creating and explaining expert consulting programs’,
Artificial Intelligence, Vol. 21, No. 3.
Swartout, W.R. and Moore, J.D. (1993) ‘Explanation in second generation expert systems’, Second
Generation Expert Systems, pp.543–585.
Swartout, W.R. and Smoliar, S.W. (1987) ‘Explanation: a source of guidance for knowledge
representation’, in K. Morik (Ed.): Knowledge Representation and Organization in
Machine Learning, Lecture Notes in Computer Science, Vol. 347, pp.1–16, Springer,
ISBN 3-540-50768-X.
Swartout, W.R., Paris, C. and Moore, J.D. (1991) ‘Explanations in knowledge systems: design for
explainable expert systems’, IEEE Expert, Vol. 6, No. 3, pp.58–64.
The OWL API (2009) Available at http://owlapi.sourceforge.net.
THESEUS-Basistechnologien (2009) Available at http://www.theseus-programm.de/basistechnologien/.
Universität Leipzig (2009) Deutscher Wortschatz, available at http://wortschatz.uni-leipzig.de/.
Wick, M.R. and Thompson, W.B. (1992) ‘Reconstructive expert system explanation’, Artificial Intelligence, Vol. 54, Nos. 1–2, pp.33–70, ISSN 0004-3702, doi: 10.1016/0004-3702(92)90087-E.
Wordnet RDF/OWL files (2006) Available at http://www.w3.org/2006/03/wn/wn20/.
Wright, P. and Reid, F. (1973) ‘Written information: some alternatives to prose for expressing the
outcomes of complex contingencies’, Journal of Applied Psychology, Vol. 57, No. 2, pp.160–166.
zu Eissen, S.M. and Stein, B. (2006) ‘Intrinsic plagiarism detection’, in ECIR, pp.565–569.
Notes
1 Computed tomography (CT) is a medical imaging method that generates tomographic images by means of computer processing.