
Abstract

Dialogue-based systems often consist of several components, such as communication analysis, dialogue management, domain reasoning, and language generation. In this paper, we present Converness, an ontology-driven, rule-based framework to facilitate domain reasoning for conversational awareness in multimodal dialogue-based agents. Converness uses Web Ontology Language 2 (OWL 2) ontologies to capture and combine the conversational modalities of the domain, for example, deictic gestures and spoken utterances, fuelling conversational topic understanding and interpretation using description logics and rules. At the same time, defeasible rules are used to couple domain and user-centred knowledge to further assist the interaction with end users, facilitating advanced conflict resolution and personalised context disambiguation. We illustrate the capabilities of the framework through its integration into a multimodal dialogue-based agent that serves as an intelligent interface between users (elderly, caregivers, and health experts) and an ambient assistive living platform in real home settings.
As the amount of structured knowledge made available keeps growing, e.g. in the Linked Data cloud and in proprietary knowledge bases, so does
the pursuit of effective access and querying paradigms. Within this endeavour, recent years have witnessed important advances in natural
language interfaces (NLIs) that allow users to express their information needs in an intuitive manner, while hiding the complexity of knowledge
representation and query languages. At the same time, spoken dialogue-based systems have emerged out of the need to further assist users and
enhance their experience and engagement, supporting human-like conversation that is considered to be a natural, intuitive, robust and efficient
means for interaction.
A key prerequisite in spoken interfaces is to afford effective strategies for ensuring meaningful and coherent interactions with end users. Out
of the numerous domains of interest, conversational assistance in healthcare is a notable case where natural language interfaces provide unique
solutions to patients and health experts (??). For example, the dialogue-based agent can act as a personal assistant for the elderly, providing
information about basic care and healthcare (e.g. injury treatment). On the other hand, health experts can retrieve valuable information about the
patient in cases where there are, for example, communication barriers (e.g. lack of language skills) or uncontrolled user behaviour (e.g. memory loss
due to severe dementia).
The need to overcome the limitations of dialogue systems that use speech as the only communication means has led to the emergence of
multimodal dialogue-based systems (?). In such environments, information is typically collected from multiple sources and modalities, such as
multimedia streams (e.g. using video analysis for posture and facial expression recognition), lifestyle and environmental sensors (?). The idea is
that, although each modality is informative on specific aspects of interest, the individual pieces of information are not capable of delineating
complex interpretations. On the other hand, combined pieces of information can plausibly describe the semantics of context, facilitating intelligent
conversational awareness. Therefore, multimodal dialogue-based systems need to effectively fuse communication modalities, e.g. deictic gestures
and spoken utterances, to better understand and interpret the conversational semantics, and to achieve context awareness towards satisfying the
information needs of the user.
Once the situation is understood and interpreted, the conversational agent must be able to decide on the correct information to deliver, taking
into account not only the available domain and background information relevant to the application, but also user profiles, preferences or behavioural
aspects, in order to narrow down the search space and to provide user-pertinent responses. For example, a conversational agent in a Smart Home
that monitors daily activities of people should be able to combine activity monitoring results with clinical guidelines and intervention strategies, in
order to provide accurate responses about performed activities, as well as feedback and suggestions to further engage users.
In order to elicit an accurate understanding of the user’s input, a rich body of work focuses on word sense disambiguation, i.e. the association
of the most appropriate meaning to individual entities and concepts. Rich knowledge bases, such as WordNet and DBpedia, provide hundreds of
thousands of entity types and senses, along with connections to other relevant terms and have been widely used for disambiguation. In such cases,
the agent needs to be able to select the most probable sense, resolving conflicts between conversational context and candidate interpretations.
However, little focus has been given to resolving high-level conflicts and exceptions (context disambiguation), whose resolution requires intelligent
coupling of high-level personalised knowledge and domain models, beyond simple disambiguation of terms. For example, an elderly person may ask the
agent to suggest possible causes of their headache. By intelligently coupling information about the sleeping routine of the person, the agent can give
lower priority to typical causes of headache, e.g. illness, stress, etc., and promote bad sleep as the most probable cause.
The Converness framework proposed in this paper lies at the intersection of two research fields: (a) semantic aggregation and reasoning, and (b)
rule-based context understanding and conflict resolution using defeasible reasoning (?). More specifically, the focus is on enriching multimodal
dialogue-based agents with (a) intelligent context aggregation mechanisms for conversation understanding and reasoning, and (b) conflict resolution
mechanisms of domain inconsistencies and user preferences. To this end, OWL 2 ontologies are used to model the multimodal types and the
semantics that underpin the interpretation logic, while defeasible logics provide the non-monotonic semantics needed to deliver rule priorities and
advanced context disambiguation strategies. The contributions of our work can be summarised as follows:
We present a framework for the semantic enrichment and interpretation of communication modalities in dialogue-based interfaces (Section
4).
We demonstrate the practical use of non-monotonic rule-based reasoning (defeasible reasoning) for user-centred conflict resolution,
coupling profile information, daily activities and medical recommendations (Section 5).
Our framework follows the knowledge-driven paradigm, fostering easy adaptation to different application domains.
We evaluate our framework with data collected in real-world deployments (Section 6).
We illustrate the capabilities of the framework through its integration into a dialogue-based agent for conversational assistance in healthcare
and basic care. More specifically, the elderly use the dialogue system to acquire information and suggestions related to basic care and healthcare (e.g.
symptoms, treatments, etc.), as well as to retrieve information about their activities that are monitored through a wide range of sensor modalities,
including physical activity, sleep and activities of daily living (ADLs). Key challenges in this domain involve (a) the effective fusion of verbal and
non-verbal communication modalities, e.g. deictic gestures and spoken utterances, in order to disambiguate and interpret user input during the
interaction with the agent; and (b) the infusion of user awareness in terms of profile information and behaviour (e.g. performed activities) for
intelligent context disambiguation. Converness is used as the Domain Reasoning module, feeding the Dialogue Manager of the framework with
information relevant to the discussed topics and user awareness.
The rest of the paper is structured as follows: Section 2 discusses related work on the domain. Section 3 gives an overview of the proposed
framework, highlighting key capabilities. Section 4 describes the ontology-driven framework for conversational understanding. Section 5 presents
the defeasible layer to handle high-level conflicts in answering users’ requests. Section 6 presents results from the evaluation of the framework.
Finally, Section 7 concludes the paper and outlines next steps.
The demand for context-aware user task support has proliferated in recent years across a multitude of application domains, ranging from
healthcare and smart spaces to transportation and energy control. A key challenge is to abstract and fuse the captured context in order to elicit
an adequate understanding of user intentions (??). In healthcare, for example, wearable and ambient sensors, coupled with profile information and
health knowledge can improve the quality of life of care recipients and provide useful insights for personalised interventions and care solutions (?).
FIGURE 1 An abstract multimodal dialogue system, as presented in (?).
A common prerequisite in all context-aware systems is the ability to capture, share and process information coming from different sources and
services. This translates into a twofold requirement. First, there is a need for commonly agreed vocabularies of consensual and precisely defined
terms for the description of data in an unambiguous manner. Second, there is a need for mechanisms to integrate, correlate and semantically
interpret these data.
Given the inherent requirement in multimodal environments to aggregate low-level information and integrate domain knowledge, it comes as
no surprise that Semantic Web technologies, and more specifically ontologies, have been acknowledged as affording a number of highly desirable
features. More precisely, ontologies are models that capture knowledge about some domain of interest. Formally speaking, ontologies are explicit
formal specifications of shared conceptualisations (?). They afford abstract views of the world including the objects, concepts, and other entities
that are assumed to exist in some area of interest, their properties and the relationships that hold among them. Their expressivity and level of
formalisation depend on the knowledge representation language used.
The OWL 2 Web Ontology Language (?) is the W3C recommendation for creating and sharing ontologies on the Web and it has been exten-
sively used for capturing context elements (e.g. profiles, events, activities, locations, postures and emotions) and their pertinent relations, mapping
observations and domain knowledge to class and property assertions in the Description Logics (DL) theory (?), fostering integration of information
at various levels of abstraction and completeness (?). For example, BeAware! (?) provides a framework for context awareness in road traffic man-
agement; (?) use ontologies to develop a semantic dialogue system for radiologists. (?) propose an ontology-based framework for context-aware
activity recognition in smart homes. Key challenges and opportunities of Semantic Web technologies in context-aware applications are discussed
in (?).
An abstract architecture of multimodal dialogue-based systems is depicted in FIGURE 1. Multimodal interaction involves the collection of verbal
and non-verbal information from the interaction of users with the system. Verbal input is processed by Automatic Speech Recognition (ASR) and
the textual output is analysed by Natural Language Processing (NLP). Natural Language Understanding (NLU) then maps the input to a meaning
representation (e.g. a semantic frame) in which the speech act, main goal and domain-specific named entities are extracted by a semantic parser or
statistical models. Non-verbal modalities, such as gestures, facial expressions and emotions, are used to further enrich the captured context (?). Dialogue
Management serves many roles, including discourse analysis, knowledge database queries, question answering and system action prediction based
on the available input and discourse history (?). Domain Reasoning is an important step towards achieving the necessary level of awareness, under-
standing user’s input, i.e. interpreting verbal input and gestures, and fusing the available context into a unified understanding of the situation in
order to take an appropriate action. Finally, Output Planning involves Natural Language Generation (NLG) and Text-to-Speech (TTS) to generate the
system utterances from the selected system actions and dialogue management (DM) domain-specific knowledge.
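To make the data flow concrete, the following minimal sketch wires the stages of FIGURE 1 together. The Turn structure and all function signatures are illustrative assumptions for exposition, not the interfaces of any particular system.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Turn:
    audio: bytes = b""                                     # raw user speech
    text: str = ""                                         # ASR output
    frame: dict = field(default_factory=dict)              # NLU meaning representation
    observations: List[str] = field(default_factory=list)  # non-verbal cues (gestures, emotions)
    response: str = ""                                     # generated system utterance

def run_turn(turn: Turn,
             asr: Callable[[bytes], str],
             nlu: Callable[[str], dict],
             domain_reasoning: Callable[[dict, List[str]], dict],
             dialogue_manager: Callable[[dict], str],
             nlg_tts: Callable[[str], str]) -> Turn:
    turn.text = asr(turn.audio)                    # speech -> text
    turn.frame = nlu(turn.text)                    # text -> semantic frame
    context = domain_reasoning(turn.frame,
                               turn.observations)  # fuse verbal and non-verbal input
    action = dialogue_manager(context)             # decide the next system action
    turn.response = nlg_tts(action)                # action -> system utterance
    return turn
```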
Ontologies have been used to semantically enrich various aspects of dialogue-based systems. In ASR and NLP, ontologies, such as WordNet and
BabelNet provide the vocabulary and semantics for content disambiguation (??). Ontologies have been also used in NLU for mapping extracted
concepts and relations to lexical resources, or for coreference resolution (??). In multimodal fusion, ontologies are used for fusing multi-level
contextual information (?); for example, (?) present a framework for coupling audio-visual cues with multimedia ontologies. Relevant approaches
are also described in (?) for various multimedia analysis tasks. SmartKom (?) partially uses ontologies to fuse information in multimodal dialogue
systems, combining speech, gesture and facial expressions. Ontologies have been also used in dialogue management (?), as well as to endow
interactivity with elaborateness and indirectness (?).
As already described, domain reasoning is a key requirement in dialogue-based systems, since it allows the derivation of implicit inferences,
intelligently coupling and aggregating already captured information. OWL 2 is strongly influenced by Description Logics (DL) (?), a family of knowl-
edge representation formalism characterised by logically grounded semantics and well-defined reasoning. Starting from atomic concepts, arbitrary
complex concepts can be described through a rich set of constructors that define the conditions on concept membership. To leverage OWL’s lim-
ited relational expressiveness, research has been devoted to the integration of OWL with rules. User-defined rules on top of the ontology allow
expressing richer semantic relations that lie beyond OWL’s expressive capabilities and couple ontological and rule knowledge (?). In this context,
OntoVPA (?) uses ontology reasoning and rules to generate responses and handle polysemy and ambiguity. In (?) reasoning is used to generate per-
sonalised questions to assist individuals with memory recollection. Ontological reasoning is used in (?) to infuse user preferences into the responses
generated by the agent, as well as in (?) to infer answers from ontologies as domain knowledge for information-seeking of complicated questions
which cannot be answered with simple domain information. In (?) reasoning is used to map user input to services that control smart devices, as
well as to answer contextual questions, e.g. the location of certain people inside the house.
One major shortcoming of the aforementioned approaches is that they directly map incoming information (e.g. user utterances) to underlying
knowledge structures, aiming to find the most plausible response. Although this approach works in simple question-answering scenarios, it falls
short of supporting advanced interactivity and system feedback, such as clarification dialogues, since no conversational topic is recognised. In addition,
the use of strict rule patterns does not provide enough flexibility for handling the imprecise and ambiguous nature of real-world cases. Strict and
shallow contextual restrictions cannot handle the noisy and potentially inconsistent transcriptions of verbal information, where certain concepts
and relations may be inaccurately detected or may fail to be detected at all.
A significant body of work has been devoted to address the above challenges, focusing on the identification of the intended dialogue topic
(domain selection problem) (?). To this end, several data-driven approaches have been developed to understand the discussion context and to
trigger the necessary feedback to the users (?????). In parallel, several ontology-based (knowledge-driven) approaches have been proposed, allowing
domain knowledge and common sense semantics to be incorporated into context understanding (???). This is useful in cases where (a) there is a
lack of available data for training, (b) there is a need to reuse the topic models across several users and domains, or (c) it is necessary to manually
refine the semantics to support and investigate specific conversational aspects.
Converness has been primarily used in the healthcare domain, supporting elderly and clinical experts in acquiring recommendations, suggestions,
feedback and behavioural information through intuitive interactions with a dialogue-based system. Our framework follows the knowledge-driven
approach to achieve conversational awareness, where domain knowledge is used for defining the conversational topics. This allows the fast inte-
gration of the framework in dialogue-based systems, addressing the lack of training data that is difficult to collect in certain real-world deployment
settings, e.g. from elderly in private homes. In contrast to existing ontology-based domain selection solutions (e.g. (?)), where ontologies are used
as shallow models with limited semantics, Converness takes full advantage of the advanced modelling capabilities of OWL 2, using meta-modelling
(often referred to as punning (?)) for capturing the multimodal semantics that define topics of interest (such as gestures and utterances). It also
employs advanced non-monotonic reasoning through Defeasible Logics for intelligent context disambiguation and infusion of human awareness that
drives the generation of final responses, as described in section 5.
As far as defeasible reasoning is concerned, although it is intuitively used by healthcare practitioners for taking plausible decisions (?), to the
best of our knowledge only a few practical e-health applications have emerged. Such an example is presented in a previous work of ours (?), where a
defeasible reasoning module handles conflicts and uncertainty during the derivation of activities of daily living. In other domains, the non-monotonic
semantics of defeasible logics has been mainly used for building argumentative dialogue-based systems (??). In this paradigm, the interaction with
the user involves conflictual arguments and the agent tries to resolve the conflicts through counterarguments (?). In other cases, the agent tries to
resolve conflicts derived from inconsistent knowledge bases (?), where inconsistencies mainly arise due to the existence of rules with exceptions
or after having aggregated information from multiple sources (?).
Converness addresses two key challenges in dialogue-based systems:
Conversational awareness, recognising and understanding the discussion context from the available communication modalities, such as
verbal and non-verbal input.
Personalised context-aware disambiguation, assisting dialogue management in selecting the most plausible feedback and response to the
user.
FIGURE 2 Converness architecture.
FIGURE 2 depicts the overall system concept. In terms of the abstract architecture of dialogue-based systems depicted in FIGURE 1, Converness
is positioned in the Domain Reasoning module, working side-by-side and feeding the Dialogue Manager with information relevant to the discussed
topics, along with a unified understanding of the ongoing context and user awareness, in order to take appropriate actions. To this end, we use the
advanced context classification capabilities of DL reasoning coupled with rules for topic understanding and conversational awareness. In parallel,
we use non-monotonic defeasible logics for user-centred conflict resolution, coupling domain and profile information relevant to the domain. More
precisely, defeasible reasoning is used to infuse non-monotonic semantics in domain reasoning. There are cases where default negation, i.e. the
ability to reason based on the absence of context information, is not adequate to capture the intended semantics. Strong negation is often a useful
feature that enables the derivation of negative knowledge in the head of the rule (i.e. to state that something is not true). For example, defeasible
reasoning can be used by the system to provide suggestions on what not to do, e.g. “you should not eat almonds” (see section 5.2.2). In addition, default
reasoning does not provide ways to model the quality of the information, nor the preference between two different (contradictory) facts. In other
words, it does not include any notion of priority, not allowing to resolve potential conflicts that may arise while coupling context information from
different sources, e.g. profile knowledge and activity logs. On the other hand, defeasible reasoning can be used to represent defeasible knowledge,
i.e. tentative information that may be used if nothing could be posed against it. It also supports both types of negation (strong and default negation),
as well as priorities (see section 5.1).
Converness capitalises on well-established Semantic Web technologies, standards, ontologies and formal logic frameworks, and it can be easily
reused and adapted in different contexts, due to its knowledge-driven nature that fosters reusability. We demonstrate in section 6 the integration
of Converness in the KRISTINA dialogue-based agent (?), which has been used to assist elderly in smart homes. It also follows a knowledge-driven
approach for modelling domain topics and their dependencies on different modalities based on a formal knowledge representation formalism
(OWL 2 ontologies). This makes the domain knowledge shareable and reusable regardless of the heterogeneity of the underlying technologies
for analysing verbal and non-verbal modalities. In addition, the framework has been primarily used in the healthcare domain, providing guidance
and suggestions to elderly people. In this context, the clinical experts already have questionnaires about habits, needs and behaviour aspects of
participants, rendering the ontology-driven approaches as the most appropriate for easily adapting and configuring the interaction with the users,
without needing to rebuild models or collect data. This is the main advantage of the knowledge-driven approaches compared to the data-driven
ones, where the latter generate models that are not equally reusable and cannot be manually adapted to incorporate common sense knowledge.
Each model needs to be learned for each individual, requiring in principle large amounts of training and test data. In our domain, the possibility of
manually configuring the models is quite important, since it allows clinical experts to guide and control conversations for communicating or even
monitoring certain behavioural aspects of users (see Section 6.1.2 for more details).
It should be noted that Converness is not a fully fledged dialogue-based framework. Rather, it targets the two key challenges
mentioned above, endowing dialogue-based systems with advanced knowledge representation, context interpretation and reasoning capabilities.
Other key aspects in dialogue-based systems, such as maintenance of the dialogue states, selection of the next system action, question answering,
and generation of system responses, are not tackled by the presented framework and fall into other relevant research areas, such as dialogue
management, information retrieval and multimodal communication generation.
Converness supports two levels of abstraction through which domain information can be modelled: multimodal observations and conversational
descriptors.
We use the term “observation” to refer to multimodal input coming from communication analysis (e.g. NLU and gesture recognition). In order to
promote reusability and enable the conceptual alignment of our models with existing well-known vocabularies, we have developed a lightweight
vocabulary for capturing observations by extending the concept of LODE (?) to benefit from existing vocabularies to describe events.
FIGURE 3 Converness ontology to model multimodal observations during the interaction with users. The prefixes lode:, dul: and time: refer to
the LODE, DUL and OWL Time ontologies, respectively.
The observation ontology is depicted in Figure 3 and provides the lightweight vocabulary for capturing basic multimodal information, such as
observation hierarchies and temporal extension. In practice, events are modelled as instances of the lode:Event class. An event can be further
associated with a time interval through the lode:atTime property. In Converness, Observation is considered as an event and it is the root class with
three observation types for modelling emotions (derived from facial expressions, valence and arousal), gestures and domain entities recognised
from verbal input. Information about the actors (in our case, the users who communicate with the agent) and temporal extensions of observations
(reusing the OWL Time ontology) is captured using the LODE properties lode:involvedAgent and lode:atTime, respectively. For example, a pointing
gesture to the head of a user (whose description is omitted), detected by visual analysis, can be modelled as follows (in RDF 1.1 Turtle syntax):
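For illustration, a minimal sketch that builds such an observation with rdflib and prints it in Turtle; the ex: namespace and the PointingGesture class name are assumptions, while lode:involvedAgent and lode:atTime are the LODE properties mentioned above.

```python
from rdflib import Graph, Namespace, RDF

LODE = Namespace("http://linkedevents.org/ontology/")
EX = Namespace("http://example.org/converness#")   # hypothetical namespace

g = Graph()
g.bind("lode", LODE)
g.bind("ex", EX)

g.add((EX.obs1, RDF.type, EX.PointingGesture))     # the gesture observation
g.add((EX.obs1, LODE.involvedAgent, EX.user1))     # the observed user
g.add((EX.obs1, LODE.atTime, EX.interval1))        # an OWL Time interval

print(g.serialize(format="turtle"))
```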
A descriptor is defined as a semantic interpretation of data, assigning meaning to context using well-defined vocabularies, structures and rela-
tionships. It can be considered as a level of abstraction, trying to semantically classify, aggregate and meaningfully correlate states of affairs. The
semantics that underpin descriptors encapsulate quite powerful modelling capabilities, reusing the conceptual model of the Descriptions and
Situations (DnS) pattern (?) implemented in DUL (?).
The DnS design pattern provides a principled approach to context reification through a clear separation of states of affairs, i.e. a set of assertions,
and their interpretation based on a non-physical context, called a description. Intuitively, DnS axioms try to capture the notion of “situation” as a
unitarian entity out of a state of affairs, with the unity criterion being provided by a “description”. In that way, when a description is applied to a
state of affairs, a situation emerges. We use the DnS implementation in DUL to formally provide precise representations of conversational topics
through situation descriptors. As such, situations are related with classes, enabling the contextual description of domain topics at
the class level (meta-modelling).
More specifically, situations interpret conversational topics and they are captured in terms of conversational descriptors. The conversational
descriptors encapsulate the domain knowledge needed to detect and correlate domain topics (T) in the form of abstract dependencies between
lower level multimodal observation classes (Obs).
Definition 1. A conversational description CD_t of a domain topic t is the tuple ⟨t, D⟩, where t ∈ T and D ⊆ Obs ∪ T.
In practice, the conversational descriptors are defined in terms of the meta-model depicted in Figure 4 and consist of the following
conceptualisations:
: Top-level container class for defining contextual topic dependencies.
: Property that designates the dependency set of the descriptor, i.e. key multimodal observation types of the dependency.
: Property that designates the topic of the dependency.
FIGURE 4 The Conversational Descriptor model.
More specifically, the class extends the class of DUL and operates at the meta-model layer,
i.e. it is used for defining dependencies among OWL 2 classes. Following a tagging-like procedure, the topics of the domain (defined through
property assertions) are annotated with relevant multimodal observation classes (through property assertions) that form
the dependency set of the descriptor. For example, the conversational topic that involves the recognition of a pain situation is defined as a
descriptor that links the Pain topic to the PainReference observation class. The domain logic for pain-related modalities is given by the class
axiom in (1), which subsumes the verbal and non-verbal pain references (class names indicative) under a common reference class:
VerbalPainReference ⊔ FacialPainExpression ⊑ PainReference    (1)
These complex class descriptions actually define the dependency between the pain topic and references that indicate pain. Such references may
be derived either from verbal communication, e.g. through speech analysis, or can be detected from non-verbal modalities, e.g. facial expressions.
To this end, we define the pertinent observation types as subclasses of PainReference. The complex class description can be easily extended to
include additional modalities.
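The following sketch illustrates the punning idea behind such descriptors: OWL classes (the topic and its observation dependencies) appear as ordinary subjects and objects of property assertions. All names (cd:ConversationalDescriptor, cd:hasTopic, cd:hasDependency, ex:Pain, ex:PainReference) are illustrative assumptions.

```python
from rdflib import Graph, Namespace, RDF

CD = Namespace("http://example.org/descriptor#")   # hypothetical descriptor vocabulary
EX = Namespace("http://example.org/domain#")       # hypothetical domain ontology

g = Graph()
g.bind("cd", CD)
g.bind("ex", EX)

g.add((EX.cd_pain, RDF.type, CD.ConversationalDescriptor))
g.add((EX.cd_pain, CD.hasTopic, EX.Pain))                # ex:Pain is an OWL class,
g.add((EX.cd_pain, CD.hasDependency, EX.PainReference))  # punned here as an individual

print(g.serialize(format="turtle"))
```

Because the classes are treated as individuals in these assertions, arbitrary (non-tree-shaped) relations between them can be stated, which plain OWL 2 class semantics could not express.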
The annotation of domain topics with observation classes has two benefits. First, from a practical perspective, the resulting models are intuitive
and can be easily defined, reused and extended in different domains, building a network of interlinked topic models. Second, from a theoretical
perspective, the fact that classes can participate in instance-like property assertions enables the definition of contextualisations beyond standard
OWL 2 semantics. For example, OWL 2 class semantics can model only domains where instances are connected in a tree-like manner (?). On
the other hand, such arbitrary relations can be captured at the instance level, allowing additional semantics and descriptive context (views) to be
assigned to the conversational descriptors, as we describe in the following section.
It is important to note here that since Converness follows a solely knowledge-driven approach, the conversational descriptors need to be
manually defined either from scratch, or by reusing and modifying existing descriptors. It should be noted that although there might be cases when
an observation uniquely identifies a conversational descriptor (e.g. it may belong only to the dependency set of a single descriptor),
this does not necessarily mean that the interpretation task always derives topics following a one-to-one mapping of their dependencies. Other
observations need to be present in order for the domain topic to be finally selected, handling noise and contradictory information.
The conversational descriptors support advanced semantics to foster their further interlinking, composition and reusability. More precisely, two
semantic relations are defined, namely conversation unfolding and multimodal linking. Conversation unfolding refers to the extension of dependency
sets taking into account the topic hierarchy. On the other hand, multimodal linking is activated when the dependency set of a conversational
descriptor contains one or more topics, whose multimodal dependencies are also added to the initial set.
Intuitively, conversation unfolding is necessary to allow the metamodel space of conversational descriptors to reuse existing hierarchies of domain
ontologies. A conversational descriptor defines dependencies between domain topics and relevant multimodal
observations (through the respective property assertions). Since this model defines relations among classes, native RDF/OWL 2 schema properties,
such as rdfs:subClassOf relations, cannot be inherently unfolded and used for inferring additional hierarchical relations and dependencies. In order to
take into account the hierarchical relationships and further enrich the conversational descriptors, we extend the reasoning capabilities, enabling
hierarchical inferences at the descriptor level.
Definition 2. Given two conversational descriptions CD_a = ⟨a, D⟩ and CD_b = ⟨b, E⟩, conversational unfolding ensures that ∀x ∈ E, x ∈ D, if
a ⊑ b.
Assuming that isTopicOf is the inverse property of hasTopic (property names indicative), the semantics of conversation unfolding is given by the
OWL 2 property path in (2), stating that: “the conversational descriptor of a topic also inherits the dependencies of the conversational descriptors
of its superclasses”:
hasTopic ∘ subTopicOf ∘ isTopicOf ∘ hasDependency ⊑ hasDependency    (2)
As an example, consider the domain hierarchy where Headache ⊑ Pain (names again indicative). Without conversation unfolding, someone would
need to define two, almost duplicated, conversational descriptors to specify the context of these topics, i.e. the situation descriptor of Headache
would be similar to the one of Pain with the addition of the head-related concept. Using (2), it is sufficient for the conversational descriptor of
Headache to include only the additional concept (i.e. HeadReference, which can be recognised verbally or through a deictic gesture as denoted
in (3)), while the rest of the dependencies will be automatically inferred by the ontology reasoner, since Headache ⊑ Pain:
VerbalHeadReference ⊔ DeicticHeadGesture ⊑ HeadReference    (3)
Multimodal linking unfolds hierarchical relations directly on the dependencies.
Definition 3. Given two conversational descriptions CD_a = ⟨a, D⟩ and CD_b = ⟨b, E⟩, multimodal linking ensures that ∀x ∈ E, x ∈ D, if b ∈ D.
The semantics is given by the OWL 2 property path in (4), stating that: “the conversational descriptor of a topic also inherits the dependencies of
the conversational situation descriptors of its dependencies”:
hasDependency ∘ isTopicOf ∘ hasDependency ⊑ hasDependency    (4)
For example, let us assume that … ⊑ … and that the descriptor of the conversational topic for … is available, defined as:
…
Due to conversational unfolding, the descriptor of … inherits the references of …, i.e. … and …. In
addition, it should enrich the dependency set with additional references to further specialise the descriptor, according to the domain modelling
requirements. An example conversational descriptor for … is given below.
…
According to multimodal linking, these three additional references will be further unfolded. Assuming that the descriptor of … is defined as
…
the multimodal observations … and … would be added to the descriptor of …. The same holds for the other two references (… and …), whose
definition is omitted.
In practice, both semantic relations (conversational unfolding and multimodal linking) introduce powerful modelling capabilities, enabling the
incremental and modular definition of descriptor models, as illustrated in this section.
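A minimal sketch of the two relations, computed as a fixpoint over plain sets rather than by an OWL reasoner; descriptors and the topic hierarchy are given as dictionaries, and all names are illustrative.

```python
from typing import Dict, Set

def enrich(descriptors: Dict[str, Set[str]],
           supers: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    deps = {t: set(d) for t, d in descriptors.items()}  # do not mutate input
    changed = True
    while changed:
        changed = False
        for t in deps:
            # conversation unfolding (Definition 2): inherit the dependencies
            # of every super-topic that has a descriptor of its own
            for s in supers.get(t, set()):
                if s in deps and not deps[s] <= deps[t]:
                    deps[t] |= deps[s]
                    changed = True
            # multimodal linking (Definition 3): unfold dependencies that are
            # themselves topics with descriptors
            for d in list(deps[t]):
                if d in deps and not deps[d] <= deps[t]:
                    deps[t] |= deps[d]
                    changed = True
    return deps

descriptors = {"Pain": {"PainReference"}, "Headache": {"HeadReference"}}
supers = {"Headache": {"Pain"}}
print(enrich(descriptors, supers)["Headache"])  # {'HeadReference', 'PainReference'}
```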
Given a set of multimodal observations O, which correspond to the results of the multimodal communication analysis (verbal and non-verbal), and a
set of conversational descriptors CD, conversation interpretation aims to recognise the most plausible conversational topic and feed the Dialogue
Manager with pertinent information, assisting in the interaction with the users.
The algorithm starts by comparing the multimodal observation types in O against the set of conversational descriptors CD, in order to retrieve
the most plausible topics. To this end, we define the σ function to compute the similarity of a conversational descriptor's dependency set against the
set with the observation types. More specifically, given the conversational descriptor CD_a = ⟨a, D_a⟩ and a set of observations O = {o_1, o_2, ..., o_i},
σ is given by (5).
σ(D_a, O) = ( Σ_{n ∈ D_a} max_{c ∈ O} |U(n) ∩ U(c)| / |U(c)| ) / |D_a|    (5)
U(x) is the set of superclasses of x, including x. If σ = 1, then all classes in D_a appear in O, meaning that the corresponding domain topic a that is
described by CD_a perfectly describes the current conversational context. The σ similarity is computed for all conversation descriptors (CD_x ∈ CD)
and the descriptor with the maximum σ value is selected as the most plausible conversational topic. This procedure is depicted in Algorithm 1.
If only a single conversational descriptor CD_x = ⟨x, D_x⟩ is selected by the algorithm, i.e. |T| = 1 and CD_x ∈ T, then the corresponding
conversational descriptor CD_x is returned as the ongoing discussion context. Otherwise, if there is more than one descriptor with maximum
similarity, the selection takes into account the history of conversation to filter out irrelevant topics, as described in the next section.
Algorithm 1: Selection of plausible conversational descriptors that interpret incoming multimodal observations
Data: Observations: O = {o_1, o_2, ..., o_i}; Conversational descriptors: CD = {CD_a = ⟨a, D_a⟩, CD_b = ⟨b, D_b⟩, ..., CD_x = ⟨x, D_x⟩}
Result: The set T with the most plausible conversational descriptors.
1: T ← ∅
2: G ← ∅
3: foreach ⟨x, D_x⟩ ∈ CD do
4:   if sim = σ(D_x, O) ≠ 0 then G ← G ∪ {⟨sim, ⟨x, D_x⟩⟩}
5: forall ⟨sim_x, ⟨x, D_x⟩⟩ ∈ G with the maximum sim_x do
6:   T ← T ∪ {⟨sim_x, ⟨x, D_x⟩⟩}
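A minimal sketch of the similarity function (5) and Algorithm 1, assuming U is precomputed as a map from each class to its superclasses (including itself); all names are illustrative.

```python
from typing import Dict, List, Set, Tuple

def sigma(dep: Set[str], obs: Set[str], U: Dict[str, Set[str]]) -> float:
    """Similarity (5): average over the dependency set of the best
    superclass overlap with any observed class."""
    if not dep:
        return 0.0
    total = sum(
        max((len(U[n] & U[c]) / len(U[c]) for c in obs), default=0.0)
        for n in dep)
    return total / len(dep)

def select(obs: Set[str], cds: Dict[str, Set[str]],
           U: Dict[str, Set[str]]) -> List[Tuple[float, str]]:
    """Algorithm 1: score all descriptors, keep non-zero matches,
    and return every descriptor with the maximum similarity."""
    scored = [(sigma(d, obs, U), t) for t, d in cds.items()]
    scored = [(s, t) for s, t in scored if s != 0]
    best = max((s for s, _ in scored), default=0.0)
    return [(s, t) for s, t in scored if s == best]

U = {"PainRef": {"PainRef"}}        # U(x): superclasses of x, including x
cds = {"Pain": {"PainRef"}}
print(select({"PainRef"}, cds, U))  # [(1.0, 'Pain')]
```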
The history of conversation encapsulates useful information that can be used to assist in the selection of the most plausible discussion topic in
each turn. The rationale revolves around the assumption that if the list T (|T| > 1) with the plausible ongoing discussion topics (see Algorithm 1)
contains the topic of the last turn (h), then the same topic can be selected for the ongoing turn. This is because topics that are relevant to a certain
theme usually share common observations and concepts, therefore, history can provide useful hints.
The easiest way to filter out conversational topics based on history is to simply check the membership of the last selected topic in the list T,
i.e. if ∃⟨sim, ⟨h, D_x⟩⟩ ∈ T. However, the way topics are modelled in the ontology strongly affects the accuracy of this approach. For example,
topics are usually organised in hierarchies, therefore the semantics of the hierarchy needs to be taken into account for the selection, such as
the subsumption relationships, and not just the existence of a topic itself. In Converness, two topics are considered not relevant if there is no
hierarchical relation between them. For example, consider the following topic hierarchy that extends the example presented in section 4.2.2
(class names indicative):
EatingDuration ⊑ Eating
FavouriteFood ⊑ Eating
Eating ⊑ Activity
Entertainment ⊑ Activity
FavouriteMovie ⊑ Entertainment
Let us assume that the last selected topic of the discussion was h = EatingDuration, e.g. the user asked information about the typical duration
of eating activity, and that in the next turn the user asks the system to provide information about the favourite food. However, only the
verbal observation Favourite reaches the system, since it is the only one recognised by communication analysis (Food failed to be detected). In this
case, two topics are recognised by the system as plausible ones: FavouriteFood and FavouriteMovie, based on the verbal observation. Therefore,
T = { ⟨1, ⟨FavouriteFood, {…}⟩⟩, ⟨1, ⟨FavouriteMovie, {…}⟩⟩ }    (6)
In this case, ∄⟨sim, ⟨h, D_x⟩⟩ ∈ T, therefore the history cannot be directly used to decide on the ongoing discussion topic. To overcome this
limitation, Converness checks the semantic membership of topics in T, taking into account concept distances in the ontology.
Definition 4. A conversational topic t semantically belongs to a set of topics T, denoted as t ∈̃ T, if ∃⟨sim, ⟨t′, D_x⟩⟩ ∈ T with δ(t, t′) < φ.
To compute the distance δ of two concepts, we use the edge-counting distance, an intuitive measure that computes the distance of two concepts
based on the number of edges found on the shortest path between them. Therefore, according to Definition 4, the topic t belongs to the set of
topics T if there is at least one topic t′ with distance from t less than the threshold φ (φ is used to prune matches beyond a certain distance).
Since more than one concept in T may satisfy the threshold, the topic with the minimum distance is selected as the one that best matches t.
In our example, δ(FavouriteFood, h) = 2 and δ(FavouriteMovie, h) = 4, therefore FavouriteFood ∈̃ T and FavouriteFood
is selected as the current topic, since it is the semantically closest to the last selected topic (h = EatingDuration).
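A minimal sketch of Definition 4: the edge-counting distance δ is computed by breadth-first search over an undirected view of the (indicative) topic hierarchy above, and semantic membership keeps only topics closer than the threshold φ.

```python
from collections import deque
from typing import Dict, List, Optional, Set

def delta(a: str, b: str, edges: Dict[str, Set[str]]) -> Optional[int]:
    """Edge-counting distance: length of the shortest path between a and b."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in edges.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected

def semantically_in(h: str, topics: List[str], phi: int,
                    edges: Dict[str, Set[str]]) -> Optional[str]:
    """Definition 4: closest topic to h among those with distance < phi."""
    dists = [(delta(h, t, edges), t) for t in topics]
    dists = [(d, t) for d, t in dists if d is not None and d < phi]
    return min(dists)[1] if dists else None

edges = {  # undirected subsumption edges of the indicative hierarchy
    "EatingDuration": {"Eating"}, "FavouriteFood": {"Eating"},
    "Eating": {"EatingDuration", "FavouriteFood", "Activity"},
    "Activity": {"Eating", "Entertainment"},
    "Entertainment": {"Activity", "FavouriteMovie"},
    "FavouriteMovie": {"Entertainment"},
}
# delta to FavouriteFood is 2, to FavouriteMovie is 4; with phi = 3 only
# FavouriteFood survives the pruning
print(semantically_in("EatingDuration", ["FavouriteFood", "FavouriteMovie"], 3, edges))
```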
Despite the fact that the dialogue history can provide useful insights to achieve conversational awareness, it does not work in all cases. For example,
when the user changes the discussion topic, the semantic membership does not hold, and therefore, it cannot be used to reduce the size of Tto
a single topic. In such cases, the ability of the system to generate clarifications is a key feature. Converness supports requests for clarifications to
disambiguate the conversational topic. (We refer to “requests for clarifications” and not “clarifications” because the DM is responsible for deciding
whether a request for clarification generated by Converness will finally reach the user or not.)
When |T| > 1, i.e. when more than one conversational descriptor has been identified as plausible, Converness enters the topic disambiguation
mode. In this mode, the module generates a clarification as output that is further associated with the conversational descriptors in T. A clarification
response example that involves the conversational descriptors of the FavouriteFood and FavouriteMovie topics is given below.
FavouriteFood ⊑ Eating
FavouriteMovie ⊑ Entertainment
CD_food = ⟨FavouriteFood, {…}⟩
CD_movie = ⟨FavouriteMovie, {…}⟩
(7)
Similar to the conversational descriptors, the clarification response follows the conceptual model of DnS, instantiating a clarification as a
… instance and linking it with the conversational descriptor instances through … property assertions.
It is important to note here that in topic disambiguation mode, Converness expects the user to provide additional conversational context,
enriching the multimodal observations O_{t-1} collected in the previous turn. This means that the multimodal observations O_c collected as the
response of the user to the clarification are added to the set O_{t-1} of the previous turn, i.e. O ← O_{t-1} ∪ O_c. To better illustrate this, consider
the following example where a caregiver asks the system to provide information about the elderly person he helps at home:
I1: [caregiver] What is his favourite food?
Assuming that the communication analysis fails to detect the Food concept (the example has been simplified for presentation purposes), we
have O_1 = {Favourite} and the T given by (6). Since this is the first question asked to the system, there is no history to filter out irrelevant topics.
In this case, Converness sends to the dialogue manager the clarification request described in (7). A possible translation of (7) to text is given by
I2. (Recall that the way the system interacts with the user, e.g. the way knowledge is translated into actual responses (TTS), is determined by the
dialogue manager and the other modules of the dialogue-based system; the role of Converness is to feed the dialogue manager with conversational
context to help it decide on the best next move.)
I2: [system] I can tell you about his favourite food and favourite movie.
As a response to the clarification, the caregiver informs the system that he is interested in food (I3).
I3: [caregiver] Tell me about food.
In this case, we have O_c = {Food}. Without taking into account the observations of the last interaction (O_{t-1}), we would ignore important
contextual information, i.e. the Favourite context. Therefore, O_2 ← O_1 ∪ O_c ≡ {Favourite, Food}. According to Algorithm 1, CD_food =
⟨FavouriteFood, {…}⟩ is the most relevant conversational descriptor, which is sent to the dialogue manager as the ongoing conversational
context.
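A minimal sketch of the observation merging performed in topic disambiguation mode; after the union, Algorithm 1 is simply re-run on the enlarged observation set. Names are illustrative.

```python
from typing import Set

def merge_for_clarification(o_prev: Set[str], o_clarif: Set[str]) -> Set[str]:
    """O <- O_{t-1} ∪ O_c: keep the context of the previous turn."""
    return o_prev | o_clarif

o1 = {"Favourite"}            # "What is his favourite food?" (Food missed)
oc = {"Food"}                 # "Tell me about food."
o2 = merge_for_clarification(o1, oc)
print(o2)  # {'Favourite', 'Food'} -> now uniquely matches the FavouriteFood descriptor
```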
Conversational awareness aims to endow dialogue management with advanced topic understanding capabilities, intelligently coupling verbal and
non-verbal modalities. As presented in the previous sections, Converness follows the knowledge-driven paradigm, enabling the definition and
composition of conversational descriptors pertinent to the application domain. However, apart from understanding the discussion topic, the ability
to provide context-aware feedback, intelligently coupling conversational awareness and situational awareness is a key requirement in dialogue-based
systems, especially in user-centred application domains, such as personal assistants and coaching solutions. In such domains, the system is used
not only as a question answering system, answering questions like, e.g. “what is his favourite food”, but the interaction with the users might be more
complex, involving for example small talk, e.g. “I feel sad today”. In such cases, it is important for the Dialogue Manager to have not only topic-related
information, but also background and user-related information to achieve personalised and context-aware dialogue management.
We present in this section the contextual enrichment of conversational awareness with a non-monotonic reasoning layer. The aim is to infuse
user and background knowledge, such as health-related and profile information, in order to acquire a better understanding of the interaction,
resolve conflicts and provide intelligent feedback to the Dialogue Manager with respect to the ongoing context. Defeasible reasoning is used in
this case to provide a flexible conflict resolution and context prioritisation framework, defining a non-monotonic layer on top of the available
information for context-aware decision making and knowledge aggregation. The aim is to further enrich the output provided to Dialogue Manager,
linking discussion topics with relevant situations that can be used to facilitate advanced question answering and dialogue management. To this
end, Converness implements a hybrid modelling and reasoning scheme, combining ontological modelling and reasoning with defeasible rules.
Defeasible logics is a non-monotonic logics formalism (therefore dealing with conflicting, ambiguous and incomplete information) that delivers
intuitive knowledge representation and advanced conflict resolution mechanisms (?). In defeasible logics there are three types of rules:
Strict rules are denoted by A → p and are interpreted in the typical sense: whenever the premises are indisputable, then so is the conclusion;
Defeasible rules are denoted by A ⇒ p and, contrary to strict rules, they can be defeated by contrary evidence;
Defeaters are denoted by A ⇝ p and do not actively support conclusions, but can only prevent deriving some of them by defeating respective
defeasible conclusions.
Furthermore, two important elements in defeasible logics are conflicting literals and superiority relationships. The former are sets of literals that
declare groups of competing rule conclusions, while the latter refer to relationships (>) that are used for resolving conflicts among defeasible rules.
For example, r1 > r2 indicates that r1 overrides r2 and the former rule's derivations prevail. In this case rule r1 is called superior to r2 and r2 inferior
to r1. Given the above characteristics of defeasible logics, our main motivation underlying its use in the context of Converness is primarily focused
on its superior ability to handle incomplete information. The latter is a common case in everyday reasoning, i.e. in communication and dialogue
management (see https://seop.illc.uva.nl/entries/reasoning-defeasible/), as well as in expert reasoning, like e.g. in medical diagnoses (?). Arguably,
both domains are extremely relevant to Converness.
Given a conversational topic, the context-aware decision making layer tries to associate it with relevant domain entities, taking into account the
available user-related and background knowledge that has been infused in the form of defeasible rules. In order to support this task, the framework
is enriched with defeasible rules that prioritise knowledge. More precisely, each conversational topic t is associated with a defeasible theory R_t
(i.e. a rule base of defeasible logic rules) that handles domain contextual semantics and propagates content extraction requests from the knowledge
bases that contain domain information, e.g. user profile information.
Assuming that T is the set of all conversational topics supported (t ∈ T), we define:
∀t ∈ T, R_t = { r_i : A(r_i) # C(r_i) }
where r_i is a unique label of the rule, A(r_i) is the antecedent, C(r_i) is the consequent and # depends on the type of rule:
# = → if r is a strict rule, ⇒ if r is a defeasible rule, ⇝ if r is a defeater.
Intuitively, the detection of topic t by conversational awareness (Algorithm 1), i.e. ∃⟨sim, ⟨t, D_x⟩⟩ ∈ T, triggers the inference mechanisms of the
defeasible rule base R_t, so as to further enrich the context. Assuming that D_t contains the conclusions of defeasible reasoning triggered for topic
t, then we have the following contextualised enrichment:
⟨sim, ⟨t, D_x⟩⟩ →_{R_t} ⟨sim, ⟨t, D_x⟩, D_t⟩
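A minimal sketch of this enrichment step, modelling each defeasible theory R_t as a callable from the available facts to a set of conclusions D_t; the rule content mirrors the Pain example of the next section, and all names are illustrative.

```python
from typing import Callable, Dict, Set, Tuple

RuleBase = Callable[[Set[str]], Set[str]]

def enrich_topic(selected: Tuple[float, str, Set[str]],
                 theories: Dict[str, RuleBase],
                 facts: Set[str]) -> Tuple[float, str, Set[str], Set[str]]:
    """Attach the conclusions D_t of the topic's defeasible theory R_t
    to the selected descriptor before sending it to the Dialogue Manager."""
    sim, topic, deps = selected
    conclusions = theories[topic](facts) if topic in theories else set()
    return (sim, topic, deps, conclusions)

# R_pain with the specialisation rules r1 and r2 of the running example
theories = {"Pain": lambda facts: {"PainIntensity", "BodyPart"}}
print(enrich_topic((1.0, "Pain", {"PainReference"}), theories, set()))
```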
FIGURE 5 graphically illustrates the topic-centred stratification of knowledge. According to the type of the conclusions that are generated,
Converness supports the following rule types:
Specialisation rules (⇒_Sp), whose conclusions require the further enrichment of the current conversational context.
Suggestive rules (⇒_Sg), whose conclusions associate the detected conversational topic with additional context, relevant to the discussion topic
and the existing background knowledge.
FIGURE 5 Defeasible coupling of background knowledge and question answering.
The conclusions, together with the conversational topic, are sent to the DM providing contextual knowledge useful for dialogue management
and further interaction with the user.
In this section, we present a scenario that illustrates the basic capabilities of our framework. It should be noted here that other aspects in dialogue-
based systems, such as adaptive dialogue management (e.g. user interaction management, selection of next actions, etc.), language analysis and
generation, question answering, etc., are out of the scope of this work.
In our scenario, the user informs the system about feeling pain (“I feel pain”). We assume that the generated verbal observations captured by
language analysis involve two key concepts, namely a … and a … reference, which represent the current context:
O_1 = {…}
The current observation context (O_1) is matched against the conversational descriptors CD that the system has been initiated with
(Algorithm 1), searching for plausible conversation topics. Assuming the conversational descriptor cd_1 = ⟨Pain, {…}⟩ and that the detected
references are subsumed under the pain reference class as in (1), the algorithm gives as output the multiset
T = { ⟨1, ⟨Pain, {…}⟩⟩ }
classifying O_1 as a Pain topic.
The next step is to collect any available descriptive context, semantically enriching the topic. The defeasible theory for the Pain topic (R_pain)
contains the following specialisation (Sp) rules:
r1: Pain ⇒_Sp PainIntensity
r2: Pain ⇒_Sp BodyPart
More specifically, the Pain topic is associated with the PainIntensity and BodyPart concepts that are sent back to the DM as potential
specialisation contexts to elicit additional information from the user. As such, Converness finally produces the following output:
T = { ⟨1, ⟨Pain, {…}⟩, {PainIntensity, BodyPart}⟩ }
During the interaction with the user, we assume that the DM decides to further enrich the current conversational context by asking the user
where he hurts, based on the provided BodyPart concept. As a response, the user points to his head and says “It hurts here”. A hurt spoken
reference is detected from speech analysis, as well as a deictic gesture to the head. The new multimodal observation context is given below:
O_2 = {…}
The observations are passed to conversational awareness to recognise the discussion topic. The context now satisfies (1), based on the hurt
reference, and (3), based on the deictic gesture to the head. It should be noted that in our domain ontology Headache ⊑ Pain, promoting the
recognition of Headache as the current conversational topic:
⟨1.0, ⟨Headache, {…}⟩⟩
The generic defeasible theory containing background knowledge for headaches involves the following defeasible rules for relevant treatment
recommendations:
r1: pain(X, head) ⇒_Sg recommend(X, sleep)
r2: pain(X, head) ⇒_Sg recommend(X, almonds)
r3: pain(X, head) ⇒_Sg recommend(X, mildPainkillers)
Indicatively, rule r1 reads as “for users suffering from a headache, sleep should be recommended”. The above rules (almonds, for instance, are
widely considered a natural way to ease headache pain), which are part of the initial configuration of the system (see section 6.1 for more details),
are augmented with information regarding the individual's profile. For instance, for users suffering from almond intolerance, the following
personalised rule would be appended to the rule base:
r4: pain(X, head), intolerance(X, almonds) ⇝ ¬recommend(X, almonds)
Note that rule r4 is a defeater expressing the exception to rule r2, meaning that it is used only for retracting the latter rule's conclusion.
As a second example, suppose that according to the user's profile, he/she is suffering from frequent migraines. In this case, the following rule
would be appended, indicating that the user should take strong painkillers to alleviate the pain:
r5: pain(X, head), suffers(X, migraines) ⇒_Sg recommend(X, strongPainkillers)
As there is no need to derive both conclusions from rules r3 and r5, the conflicting literals set C below establishes the conflict between the two
conclusions and the superiority relationship ensures that the conclusion from rule r5 will eventually be derived.
C = { recommend(X, mildPainkillers), recommend(X, strongPainkillers) }
r5 > r3    (8)
In addition, the scenario involves a sleep sensor that monitors the quality of the night sleep and provides an assessment every morning. The
following defeater enriches context-based reasoning by fusing sleep quality information that overrides r1:
r6: pain(X, head), log(X, sleep) ⇝ ¬recommend(X, sleep)
The rule base of the example can be submitted to a suitable defeasible logics rule engine, like DeLP (?) or DR-DEVICE (?). For a specific user
with the above profile parameters, Converness will eventually recommend to the DM that the user should take strong painkillers for the headache,
since he/she suffers from migraines, overriding other plausible recommendations based on profile and sleep-related information. The DM then
decides the next turn, i.e. whether the suggested response should be returned or a different conversation flow must be followed.
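The conflict resolution of this example can be reproduced with a toy resolver; the sketch below is a deliberately simplified stand-in for an actual engine such as DeLP or DR-DEVICE, with literals flattened into strings and defeaters modelled as pure conclusion blockers.

```python
from typing import Set

rules = {  # name -> (premises, conclusion, kind)
    "r1": ({"pain_head"}, "recommend_sleep", "defeasible"),
    "r2": ({"pain_head"}, "recommend_almonds", "defeasible"),
    "r3": ({"pain_head"}, "recommend_mildPainkillers", "defeasible"),
    "r5": ({"pain_head", "suffers_migraines"}, "recommend_strongPainkillers", "defeasible"),
    "r4": ({"pain_head", "intolerance_almonds"}, "recommend_almonds", "defeater"),
    "r6": ({"pain_head", "log_sleep"}, "recommend_sleep", "defeater"),
}
conflicts = {"recommend_mildPainkillers": "recommend_strongPainkillers"}  # set C
superiority = {("r5", "r3")}  # r5 > r3

def resolve(facts: Set[str]) -> Set[str]:
    fired = {n: c for n, (p, c, k) in rules.items()
             if k == "defeasible" and p <= facts}
    blocked = {c for n, (p, c, k) in rules.items()
               if k == "defeater" and p <= facts}
    out = set()
    for name, concl in fired.items():
        if concl in blocked:
            continue  # retracted by an applicable defeater (r4, r6)
        rival = conflicts.get(concl) or next(
            (a for a, b in conflicts.items() if b == concl), None)
        if rival in fired.values():
            rival_rule = next(n for n, c in fired.items() if c == rival)
            if (rival_rule, name) in superiority:
                continue  # the rival rule is superior
        out.add(concl)
    return out

facts = {"pain_head", "suffers_migraines", "intolerance_almonds", "log_sleep"}
print(resolve(facts))  # {'recommend_strongPainkillers'}
```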
A prototype implementation of the Converness framework was evaluated combining the technological and research outcomes of two European
research projects, namely Dem@Care and KRISTINA.
Dem@Care is a platform that integrates a wide range of sensor modalities and high-level analytics to support accurate monitoring of all daily
life aspects of individuals, including physical activity, sleep and activities of daily living (ADLs). All gathered knowledge is represented in a universal
format, semantically interpreted via a hybrid reasoning scheme, and used for complex activity recognition from atomic events, for assessing emotional and
well-being status, and for highlighting clinical problems. The high-level meaningful information is presented in applications tailored to clinicians, but
most importantly to caregivers and end-users themselves.
KRISTINA is a knowledge-based virtual agent for migrants with language and cultural barriers in the host country. It acts as a trusted information
provision party and mediator in questions related to basic care and healthcare. It allows for flexible reasoning-driven dialogue planning, instead
of using predefined dialogue scripts, supporting multimodal communication analysis and generation modules, as well as a search engine for the
retrieval of multimedia background content from the Web needed for conducting a conversation on a given topic.
Dem@Care has been deployed in 6 residences of individuals living alone and maintained for four months. The conversational agent of KRISTINA
was not part of these deployments, therefore the two frameworks have been loosely integrated in order to enable the virtual agent of KRISTINA to
access the profile information and interpretation results of the Dem@Care platform (?). The evaluation involved the interaction of elderly, health-
experts and caregivers with the KRISTINA agent, which has been endowed with the conversational awareness and context enrichment capabilities
of Converness. Only the performance of Converness is presented in this paper, i.e. the ability to recognise the discussion topic and couple profile
information. Other aspects of the dialogue-based agent (?), i.e. dialogue management, multimodal communication analysis, language generation,
are out of the scope of this paper (external modules have been used, as described in the next section) and they are not part of the Converness
evaluation.
FIGURE 6 presents the conceptual architecture of the integrated framework. The dialogue-based integrated modules of the KRISTINA platform
involve:
Multimodal communication analytics for processing verbal and non-verbal information, such as: a) Automated Speech Recognition, to
support the transformation of spoken language into text, using statistical speech models for both acoustic and language modelling; b)
Language Analysis, projecting the outputs of syntactic and semantic parsing to a DOLCE+DnS UltraLite compliant representation (?); c)
Analysis of non-verbal behaviour, such as emotions and gestures (?).
Dialogue Management for driving the conversation with users, taking decisions on the responses that should be returned or questions that
should be asked, including discourse analysis, knowledge database queries, question answering and system action prediction (?).
Question Answering, for retrieving responses relevant to the topic of discussion and the needs of Dialogue Manager. As we describe in
section 6.1, the integrated platform supports the retrieval of information that is either relevant to behavioural aspects and profile information
of users (?), or to generic information retrieved from Web resources (?).
Language Generation for communicating information to the users. The verbal communication capitalises on the ontological representations
returned from question interpretation, following the inverse cascade of processing stages described in Language Analysis (?).
Avatar, for the agent’s non-verbal appearance. The agent is realised as an embodied virtual character, with gestures and facial expressions
being generated according to the semantics of the message that is to be communicated (?).
Converness extends the KRISTINA platform with domain reasoning capabilities (see FIGURE 1), enriching the Dialogue Manager services with
advanced conversational awareness and context enrichment services, further supporting question answering.
The Dem@Care platform provides the semantic knowledge bases that contain information about the user preferences and profiles, as well as
the behaviour interpretation results that are derived from the multimodal monitoring framework, e.g. activities, problems, etc. KRISTINA is used
as a multimodal spoken interface on top of the knowledge bases created by the Dem@Care platform, allowing users to elicit user-related, as well
as generic knowledge pertinent to the application domain, as described in the next section.
The experimental evaluation involves two user groups. The first group consists of elderly individuals who live at home with their caregivers. Six
individuals living in six separate homes have been considered, who are part of the Dem@Care pilots (five females, four with amnestic MCI and one
with mild dementia, and one male with mild dementia). The Dem@Care platform has been used to support objective monitoring of problematic areas
of daily living, utilising sensors, mobile devices for feedback and intelligent analysis in an Ambient Assisted Living context. The integration of the Dem@Care platform with
KRISTINA’s agent aims to facilitate an intuitive, dialogue-based interface for the elicitation of profile and behaviour information from the system,
both by the elderly and their caregivers. The second group involves internal IT and health practitioners who were invited to test the various aspects
of the framework (e.g. speed, accuracy) through guided conversations with the system, in accordance with Good Clinical Practice (GCP).

FIGURE 6 Conceptual architecture of the integrated framework used for experimental evaluation.
The interaction sessions took about 20 minutes per subject (28 participants in total), with questions ranging from general healthcare (e.g.
information on certain diseases) to guidance and coaching (e.g. diet for diabetes). It should be noted that the KRISTINA components were not integrated
during the Dem@Care pilots; therefore, the system needed to be configured accordingly. This mostly involved the six Dem@Care users, who were
asked to interact with the agent, simulating a specific period during the actual pilots, using actual activity logs and profile data.
The conversation descriptors for the first user group were derived after several iterations with the clinicians in order to clearly define the
context of each topic, according to user preferences and profile information, but also taking into account clinical objectives and guidance. To this
end, different descriptors have been used in each installation. For example, for the participants with mild dementia, Converness was enriched with
topics relevant to providing guidance on how to do certain activities, e.g. when to take the medication pills or where the drug box is stored. In
addition, several topics have been added to extend the topic hierarchy with additional conversation themes useful for the Dialogue Manager, such
as further interacting with the users to acquire information on their mood when such situations were detected by multimodal analysis. For the
second user group, the topics have been selected randomly from the already existing pool of topics.
A similar procedure has been followed for the enrichment of Converness with defeasible rules for the first user group. Initially, a set of generic
rules about general health knowledge has been elicited, such as the recommended treatment for a headache. These rules have been used
in all installations. In addition, for each participant, personalised rules relevant to their condition and profile information have been considered, such
as the medication they take, motor and mental disabilities, etc. For the second user group, random profiles and preferences have again been used.
To ease the elicitation of knowledge from the health experts, we used the Guided Rules UI and the Decision Tables in Spreadsheets of the Drools
business rule engine. The acquired knowledge has then been translated into defeasible rules and loaded into the framework.
Converness has been configured to support natural language dialogues with the users on specific topics, acting as a social companion and
health coach. For example, Converness supports the recognition of topics and assists the DM in implementing a conversational flow regarding a
healthy diet:
User: I’m wondering if it is possible to have some advice about the suggested diet for diabetes.
System: Yes of course. Here you may find a reliable website with suggested healthy diets. Did you already get an appointment with your family doctor?
User: Yes, I should follow a diet with 1000 kcal per day.
System: OK, here is a table for relevant food options. I really hope it helps.
This example dialogue illustrates basic capabilities of the overall framework and the way dialogue management and the reasoning module of
Converness are combined. For example, a) the dialogue flow, b) the improvement of naturalness (e.g. “Yes, of course”, “I really hope it helps”) and c)
proactivity, by asking follow-up questions (e.g. “Did you already get an appointment with your family doctor?”), are core capabilities of the DM, while
the proactive question topics and the actual topic of the response are provided by Converness, e.g. the need to return a website or a table with additional
information on the diet (the actual URL of the website or the table that should be returned is part of information extraction and question answering,
which are not handled by Converness).
There are many ways to model a domain using ontologies, and ontology development is essentially an iterative process. In this sense, there
are several methodologies for ontological engineering, such as On-To-Knowledge (OTK) (?), METHONTOLOGY (?), the Unified Process for ONtology
building (UPON) (?) and Ontology Development 101 (?). Most of these methodologies introduce common features and ontology development guidelines.
For the Converness ontological framework, we adopted the methodology of Ontology Development 101, which consists of the
following iterative steps:
Step 1: Determination of the domain and scope of the ontology
Step 2: Reuse of existing ontologies
Step 3: Enumeration of important terms
Step 4: Definition of the classes and the class hierarchy
Step 5: Definition of the properties
Step 6: Creation of instances
In the literature, the determination of the domain and scope of the ontology can be documented in a template-based report called the “Ontology
Requirements Specification Document” (ORSD) (?). This document allows the systematic specification of “why the ontology is being built”, “what
its intended uses are”, “who the end-users are”, and “which requirements the ontology should fulfil”. In particular, the ORSD report contains the
following fields:
1. Purpose: the main general goal of the ontology (i.e. how the ontology will be used in Converness)
2. Scope: the general coverage and the degree of detail of the ontology
3. Implementation language: the formal language of the ontology
4. Intended end-users: the intended end-users expected for the ontology
5. Intended uses: the intended uses expected for the ontology
6. Ontology requirements: a) Non-functional requirements: the general requirements or aspects that the ontology should fulfil, optionally
including properties for each requirement, and b) Functional requirements: the content-specific requirements that the ontology should fulfil,
in the form of groups of competency questions and their answers, including optional priorities for each group and for each competency
question
7. Pre-Glossary of terms: a) Terms from competency questions: the list of terms included in the competency questions and their frequencies, b)
Terms from answers: the list of terms included in the answers and their frequencies, c) Objects: the list of objects included in the competency
questions and their answers
In the following, we present an excerpt of the ORSD document that has been created for the application domain of Converness, focusing on
fields 1, 2, 3 and 6 (competency questions) of the ORSD.
The purpose of the Converness ontological framework is to provide the ontological structures and vocabularies (ontologies) that are
able to represent:
Frame-related information derived from language analysis of verbal communication
Non-verbal information
User profile information and behaviour aspects
Basic care and healthcare information inserted as background knowledge in the system, e.g. gold-standard information provided by the trial
partners for each scenario
Conversation-specific information made available during the dialogue process
The Converness ontology has to formally represent:
Frame-related information that annotates the results of language analysis. It will enable the system to understand the inferred concepts and
derive the conversation topic relating it with domain ontologies.
Non-verbal information extracted from different modalities, such as facial expressions and gestures. It will be combined with the verbal
information for deriving the conversation topic
User profile and behaviour information, such as biographical details, preferences and dislikes, routines, activities, problems and habitual
actions
Basic care, healthcare and medical information. It will cover the background knowledge that is needed to enhance the user profile
information and compile the appropriate feedback (warnings, recommendations, etc.). This involves disease-related information, basic
care information and practices, healthcare points of contact, information on medical specialists, and normative ranges indicating
normal/abnormal functions
Conversation-specific information (i.e. topics and their semantics), which enables the system to understand the current conversational topic
and feed the Dialogue Manager with additional context.
The ontology is implemented in OWL 2, the language officially recommended by the W3C for knowledge representation
in the Semantic Web. The modelling requirements of the current application domain of Converness can be addressed by the expressivity of
OWL 2 DL ontologies (see Section 2 in the OWL 2 Web Ontology Language documentation13), i.e. OWL 2 ontologies interpreted using the Direct
Semantics14. As such, knowledge in Converness can be modelled in any OWL 2 profile (RL, QL, EL), since each one is more restrictive than OWL
2 DL.
We present examples of competency questions that have been defined to
drive the development of the ontological framework. The questions have been collected after several iterations with both technical and health
experts, in order to elicit the core modelling requirements.
Group: Entities and events
What are the main categories a person may belong to? (Person with dementia, MCI, elderly, carer, clinician)
What are the main types of events? (Events related to a person (i.e. activities and states), events related to physiological measurements,
events related to ambient measurements, communicative events (e.g. utterances))
What are the main types of information describing an event? (The agent of the event (i.e. the referred person), start time, duration, and
location (where applicable))
13https://www.w3.org/TR/owl2-overview/
14https://www.w3.org/TR/owl2-overview/#ref-owl-2-direct-semantics
What are the activities supported by the AAL platform? (Having meal, preparing meal, leaving the table during a meal, leaving bed at night,
taking a nap, etc.)
Group: Preferences and dislikes
What is the user’s favourite hobby? (listening to the radio, watching TV, doing handiwork, playing a musical instrument, reading the
newspaper, doing a crossword, doing a jigsaw puzzle or a board game)
What is the user’s favourite entertainment? (TV programmes, series, newspapers, radio stations)
What are the user’s favourite magazines or newspapers?
What are the user’s preferences regarding care-giving? (nurse of a different gender, knowledge about diseases)
What are the user’s limitations due to diseases? (visual impairment, hearing impairment, arthritis in knees/fingers, restricted mobility,
memory loss)
What kind of support does the user need? (Going to the bathroom, dressing, eating, etc.)
Group: Routines
When does the user usually eat?
What time does the user usually take a nap? How long?
What time does the user go to bed?
What are the user’s diet restrictions? (diabetes diet, low-salt, high energy, several small meals)
Group: Basic care, healthcare and medical information
What is the normal sleeping time duration?
How much water should I drink per day?
What time should I eat my meals?
What are the suggested practices for dementia?
Group: Modalities
What gestures are supported by the system? (point to the head, chest, arm, wrist, back, etc.)
What facial expressions are supported by the system? (sad, apathy, anger, etc.)
In order to formally validate the derived ontological models and ensure that they capture the intended semantics, an empirical validation and test
coverage took place, focusing on determining the soundness and completeness of the domain ontologies, the conversational awareness and the
underlying descriptors.
A number of OWL 2 constructs can be used to verify data quality, such as property restrictions (cardinality,
domain/range) and complex class constructors (e.g. disjoint classes), which can be used to automatically check the semantic consistency of the knowledge
base. For example, an instance cannot simultaneously belong to two of the classes in FIGURE 3 that have been defined as disjoint. In such
cases, OWL 2 reasoning engines, such as the HermiT reasoner (?), can be used to check the ontology for semantic inconsistencies. The semantic
consistency of Converness has been checked using both a DL reasoner (HermiT) and the OWL 2 RL Materialized Reasoner of AllegroGraph15.
15https://franz.com/agraph/allegrograph/
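To illustrate the kind of inconsistency such a check detects, the following Turtle snippet is a minimal sketch; the class and instance names are hypothetical and do not correspond to the actual Converness IRIs.

```
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://example.org/converness#> .

# Two emotion classes declared as disjoint (hypothetical names).
:Sadness   a owl:Class .
:Happiness a owl:Class ;
           owl:disjointWith :Sadness .

# An observation asserted to belong to both classes: a DL reasoner
# such as HermiT reports the resulting ontology as inconsistent.
:observation1 rdf:type :Sadness , :Happiness .
```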
FIGURE 7 Test- and query- driven consistency checking.
The validation of the ontologies with respect to the ontology schema does not necessarily mean that they encapsulate
the intended semantics (?). To this end, we have defined a set of competency (SPARQL) queries, i.e. questions that are related to certain parts of
the knowledge base (i.e. properties and classes) or are pertinent to specific use cases. A SPARQL query has been created for each question
that retrieves the relevant constructs from the knowledge base. For example, in order to check that all facial expressions supported by the
framework can be captured by the underlying models (see FIGURE 3), a SPARQL query has been defined that retrieves all the
subclasses of the facial expression class:
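A minimal form of such a query is sketched below; the :FacialExpression class IRI is an assumption standing in for the actual ontology class.

```
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.org/converness#>

# Retrieve all direct and indirect subclasses of the (hypothetical)
# facial expression class.
SELECT ?expression
WHERE {
  ?expression rdfs:subClassOf+ :FacialExpression .
}
```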
The results of each query have been logged and presented to the technical and health experts in order to verify that the ontology encapsulates
all the necessary concepts, according to the requirements of the application domain.
Finally, an empirical validation took place, focusing on determining the soundness and completeness of conversational
awareness and the underlying descriptors, assuming that perfect information is provided as input. To this end, 214 interactions have been
implemented to test Converness directly, without involving the other components of the agent, providing each time correct communication analysis
results (e.g. the complete set of the ontological concepts that are part of the user utterances). For example, the question “Does Stefan have any
memory problem?” is associated with the descriptor MemoryProblem, which is further associated with the concepts Problem, Memory and CareRecipient.
This empirical validation was very useful to the health practitioners, since they had the ability to adjust and correct the knowledge that
has been infused into the system in the form of conversational descriptors. As such, we ensured that Converness is able to understand the context
and output the expected contextual information, i.e. the topic of discussion or a clarification response, provided that the correct (expected) input
is given.
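As an indication of how such a descriptor could be encoded, the following Turtle sketch links the MemoryProblem topic to its expected concepts; the property names and namespace are assumptions rather than the actual Converness vocabulary.

```
@prefix : <http://example.org/converness#> .

# Hypothetical encoding of the MemoryProblem conversational
# descriptor and the ontology concepts expected in a matching
# user utterance.
:MemoryProblem a :ConversationalTopic ;
    :associatedConcept :Problem , :Memory , :CareRecipient .
```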
Two types of evaluation have been performed in order to test different aspects of Converness.
Although the process for validating the descriptors in the previous section is useful, it is not realistic to always expect perfect input. To this end, a
second experiment has been performed, where the previous tests have been updated to simulate the presence of noise in a controlled environment.
More precisely, we have modified the input sets of the 214 cases, incorporating each time a different level of noise by assigning to each input concept
a probability of appearing in a question. In this way, we allow the generation of partial and incorrect input. The evaluation measures the relatedness
of the results generated by Converness with respect to the actual conversation context. Relatedness refers to the relevance $r$ of the information
provided to the Dialogue Management, i.e. the topic sent as the ongoing conversation topic (conversational awareness) and clarifications. In terms
of conversational awareness ($r_T$), the topic provided by Converness ($t'$) is considered relevant if it conceptually matches the actual conversational
topic ($t$), as defined in (9).
$$r_T(t', t) = \begin{cases} 1, & t' \sqsubseteq t \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$
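In an RDFS/OWL setting, the subsumption test $t' \sqsubseteq t$ can be approximated over the (materialised) class hierarchy with a SPARQL ASK query; the topic IRIs below are hypothetical.

```
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.org/converness#>

# True iff the detected topic (t') equals the actual topic (t)
# or is one of its transitive subclasses.
ASK {
  :DiabetesDiet rdfs:subClassOf* :Diet .
}
```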
For clarifications, the relevance is calculated by (10), which compares the set $T$ of topics expected to be part of a clarification against the list of
topics $T'$ returned by Converness, taking into account the total number of topics in $T'$. The results are given in TABLE 1.

$$r_C(T, T') = \frac{|T \cap T'|}{|T'|} \qquad (10)$$

For example, if the expected clarification topics are $T = \{\text{Diet}, \text{Diabetes}\}$ and Converness returns $T' = \{\text{Diet}, \text{Diabetes}, \text{Sleep}\}$, then $r_C = 2/3$.
TABLE 1 Testing results with different levels of noise

Noise    r_T (%)    r_C (%)
0%       100        100
20%      76.40      81.25
40%      61.53      59.41
60%      46.98      32.19
80%      32.73      16.14
As described earlier, $r$ is 100% in both cases, provided that correct/complete input is given (no noise). We observe that for noise levels close
to 20%, performance is still very good, especially regarding the accuracy of clarifications. As noise increases, $r_T$ proves to be more noise-tolerant
than $r_C$. This is mainly due to the fact that conversational awareness takes into account the history of interactions; therefore, in some cases
the correct conversational topic is detected even if we have missing information. On the other hand, $r_C$ is more susceptible to noise, since the
clarification results tend to include many topics in $T'$ as noise increases, which negatively affects the performance, according to (10).
Finally, Converness was tested in real conversations with the user groups described in Section 6.1.2. In this case, noise comes directly from the
communication analysis components, thus the performance of Converness strongly depends on the performance of the other modules of the
architecture (see FIGURE 6). In addition, the final response of the agent is derived by the DM logic and strategies. In order to evaluate Converness,
we logged the results of conversational awareness and reasoning, so as to be able to assess its performance in real settings. TABLE 2 depicts the
performance of Converness. Compared to TABLE 1, the performance is close to the results we got with noise levels around 20%.
TABLE 2 Testing results with real conversations

r_T (%)    r_C (%)
72.78      79.13
In order to practically assess the deployment of defeasible logics in the proposed framework, we created a defeasible theory that extends
the example presented in subsection 5.2.2. The aim was to evaluate the correctness of the inferred conclusions with regard to the rule-based
reasoning process, and to justify potential cases of unexpected results. The rule base contains domain-specific recommendations for treating head-
and stomachache, as well as a set of parameterisable initial conditions (in the form of facts) that represent the end user. The latter include activity
logs (e.g. sleep logs), as well as profile information, such as intolerance to specific foods, frequent migraines, and pregnancy. Conflict resolution is
effectively ensured through defeater rules, rule priorities, and conflicting literals in the rule base. We formalised the defeasible theory in DeLP
format (?), and submitted to the rule engine various cases of sample users suffering from headache, stomachache, or both, combined
with zero or more profile parameters, such as almond intolerance or pregnancy. Since DeLP does not support conflicting literals, we adopted
the approach proposed in (?), according to which conflicting literals are converted into a set of defeasible rules and accompanying superiority
relationships.
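For illustration, the following DeLP-style fragment sketches the flavour of such a theory; the predicates, facts and rules are illustrative assumptions and not the actual Converness rule base.

```
% Facts describing a sample end user (hypothetical).
headache(anna).
migraines(anna).
pregnant(anna).

% Defeasible recommendation rules ("-<" marks a defeasible rule).
recommend(X, strong_painkiller) -< headache(X), migraines(X).
recommend(X, mild_painkiller)   -< headache(X).

% Pregnancy yields an argument against strong painkillers; DeLP's
% dialectical analysis weighs it against the rule above.
~recommend(X, strong_painkiller) -< pregnant(X).
```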
The rule execution in the aforementioned sample scenarios has led to interesting insights, which reasonably revolve around conflict resolution
and overall context understanding. More specifically, for users suffering from a single pathology only (i.e. headache or stomachache), the derived
conclusions are always correct, independently of the profile parameters. A more interesting result, however, arguably emerges when the
user suffers from both ailments. Combined with various combinations of initial user profile parameters, this case triggers numerous conflicts. The
outcomes of these conflicts are tightly associated with the specific implementation (i.e. rule engine) and the deployed conflict resolution strategy. A
representative example in our rule base involves a pregnant user suffering from frequent migraines. Choosing conflicting literals as the
preferred strategy (?) helps infer that mild painkillers are recommended. By contrast, relying on rule priorities only (i.e. the most generic conflict
resolution strategy) leads to the derivation of no conclusions for the same initial parameters.
All in all, the above test runs have resulted in correct conclusions for most cases, but have also revealed a few implementation-specific
shortcomings, which have to be taken into consideration in practical deployments of Converness.
In this paper we presented a framework for conversational awareness and context understanding in spoken dialogue systems combining ontologies
and defeasible reasoning. OWL 2 is used to model multimodal input and the semantics that underpin the conversational logic, while defeasible rules
provide the non-monotonic semantics needed to deliver intuitive knowledge representation and advanced user-centred context understanding.
We described a real use case through the framework's integration into a multimodal dialogue-based agent that elderly users employ to acquire
information and suggestions related to basic care. Preliminary results have shown that the performance of conversational awareness is acceptable
even with increased levels of noise, while the defeasible layer considerably enriches the interactivity with the user, incorporating advanced context
interpretation and management of conflicting knowledge.
We are currently conducting pilots for collecting additional data and evaluating the framework with more use cases. In parallel, we are working
towards further enrichment of reasoning so as to support additional use cases, e.g. taking into account emotions and facial expressions.
This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the
Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-
00686). We would also like to thank Professors Guillermo R. Simari and Alejandro J. Garcia for giving us access to the DeLP software.
The authors declare no potential conflict of interests.
Georgios Meditskos. He received his PhD degree in Informatics from Aristotle University of Thessaloniki for his dissertation
on "Semantic Web Service Discovery and Ontology Reasoning using Entailment Rules". He also holds an MSc and a BSc degree
from the same department. Since January 2012 he has been working as a postdoctoral research fellow at the Information Technologies
Institute (ITI) of the Centre for Research and Technology Hellas (CERTH). He has participated in numerous European and
national research projects and he is the author of more than 45 publications in refereed journals and international conferences.
His research interests include Knowledge Representation and Reasoning in the Semantic Web (RDF/OWL, rule-based ontology
reasoning, combination of rules and ontologies) and Context-based multimodal reasoning and fusion.
Efstratios Kontopoulos. He received his PhD in Artificial Intelligence from the Aristotle University of Thessaloniki in 2011.
He also holds a BSc in Applied Mathematics from the same University (2003) and an MSc in Computer Science from the
University of Essex, UK (2004). He has participated in several national and international research projects and has published a number
of international journal and conference proceedings papers. He currently works as a Research Associate at the Information
Technologies Institute (ITI) of the Center for Research and Technology, Hellas (CERTH), where he serves as the lead researcher
for H2020 project SUITCEYES, and as a task leader in various other H2020 projects. His research interests include knowledge
representation and reasoning, semantic technologies, and rule-based systems.
Stefanos Vrochidis. He has a diploma degree in electrical engineering from Aristotle University of Thessaloniki, a master’s
degree in radio frequency communication systems from the University of Southampton, and a PhD in electronic engineering
from Queen Mary, University of London. He is a senior researcher with CERTH-ITI and co-founder of Infalia Private Company.
His research interests include semantic multimedia analysis, information retrieval, semantic search, data mining, multimedia
search engines and human interaction, computer vision and robotics, open source intelligence and security applications. Dr
Vrochidis has participated in more than 25 National and European projects, in 3 of which he has been the Project (deputy)
Coordinator and in 4 the Scientific/Technical Manager. He has been the co-organizer of various workshops and has served as a
regular reviewer for several scientific journals and conferences. He is also the co-author of more than 130 conference, journal and book chapter
articles.
Ioannis Kompatsiaris. He is a Research Director at CERTH-ITI and the Head of the Multimedia Knowledge and Social Media
Analytics Laboratory. His research interests include multimedia, big data and social media analytics, semantics, human-computer
interfaces (AR and BCI), eHealth, security and culture applications. He is the co-author of 129 papers in refereed journals, 46
book chapters, 8 patents and more than 420 papers in international conferences. Since 2001, Dr. Kompatsiaris has participated
in 59 National and European research programs including direct collaboration with industry, in 15 of which he has been the
Project Coordinator and in 41 the Principal Investigator. He has been the co-organizer of various international conferences
and workshops and has served as a regular reviewer, associate and guest editor for a number of journals and conferences. He
is a Senior Member of IEEE and member of ACM.