

In this paper we present an ontology-driven framework for natural language question analysis and answering over user models (e.g. preferences, habits and health problems of individuals) that are formally captured using ontology design patterns. Pattern-based modelling is extremely useful for capturing n-ary relations in a well-defined and axiomatised manner, but it introduces additional challenges in building NL interfaces for accessing the underlying content. This is mainly due to the encapsulation of domain semantics inside conceptual layers of abstraction (e.g. using reification or container classes) that demand flexible, context-aware approaches for query analysis and interpretation. We describe the coupling of a frame-based formalisation of natural language user utterances with a context-aware query interpretation towards question answering over pattern-based RDF knowledge bases. The proposed framework is part of a human-like socially communicative agent that acts as an intermediary between elderly migrants and care personnel, assisting the latter in soliciting personal information about care recipients (e.g. medical history, care needs, preferences, routines, habits, etc.).
Question Answering over Pattern-Based User Models
Georgios Meditskos, Information Technologies Institute, CERTH, Greece
Stamatia Dasiopoulou, Information and Communication Technologies Dept., UPF, Spain
Stefanos Vrochidis, Information Technologies Institute, CERTH, Greece
Leo Wanner, ICREA and Information and Communication Technologies Dept., UPF, Spain
Ioannis Kompatsiaris, Information Technologies Institute, CERTH, Greece
CCS Concepts
• Information systems → Ontologies; Query representation; Question answering; • Computing methodologies → Natural language processing;
Keywords
language analysis, question answering, ontology design patterns, user models
SEMANTiCS 2016, September 12-15, 2016, Leipzig, Germany
© 2016 ACM. ISBN 978-1-4503-4752-5/16/09. . . $15.00
As the amount of structured knowledge made available in the Linked Data cloud and in proprietary knowledge bases keeps growing, so does the pursuit of effective access and querying paradigms. Within this endeavour, recent years have witnessed important advances in natural language interfaces (NLIs) and Question Answering (QA) systems for structured data that allow users to express their information needs in an intuitive manner, while hiding the complexity of formal knowledge representation and query languages [19].
The key challenge in these efforts is to bridge the gap be-
tween the way users communicate with the system and the
way domain knowledge is captured, and more specifically to
translate the questions expressed in natural language into
structured queries, such as SPARQL, so that pertinent an-
swers can be retrieved from the underlying knowledge bases.
This usually involves the translation of the natural language
questions into semantically enriched structures that capture
the meaning of requests, and the formulation of pertinent
queries in accordance with the conceptualisation of the un-
derlying structured data sources.
Most of the existing approaches provide support only for factoid queries, including predicative (e.g. Who is the daughter of Robert Kennedy married to?), list (e.g. Give me all cities in Germany.) and yes/no (e.g. Is Woody Allen an actor?) ones, translating the natural language questions into triple-based representations; corresponding SPARQL queries are subsequently constructed, relying on some notion of similarity. As such, the answers correspond to plain query variable bindings, and the focus is primarily directed to two key challenges [28], namely how to overcome the conceptual mismatch between the triple-based question representations and the underlying knowledge model (e.g. matching have inhabitants in ⟨Barcelona, have inhabitants, value⟩ with dbo:populationTotal) and how to cope with lexical ambiguities.
Confronting these two challenges is clearly fundamental
for affording intuitive access to the growing amount of struc-
tured knowledge made available (e.g. DBpedia, YAGO);
yet, it leaves open question answering over more conceptually demanding domains, such as the profiling of habits and daily routines, that inherently involve complex relational contexts which go beyond (chains of) binary associations and instead abide by ontology design pattern (ODP) principles [11]. Although different ODPs endorse different levels of generality [8], they usually describe abstract roles and relationships so that each pattern can be applied in a wide variety of situations. This level of generalization fosters reusability and extensibility, but poses certain challenges both in the formalisation of
the natural language questions and in the subsequent con-
tent matching and retrieval. For example, the annotation
or encapsulation of domain knowledge within rich n-ary re-
lations requires context-driven knowledge extraction solu-
tions, beyond simple queries that are formulated based on
one-to-one entity and relation mappings.
Aiming towards NL query interfaces over conceptually rich knowledge bases, the presented framework lies at the intersection of three research fields, namely knowledge distillation from text, question answering and pattern-based user modelling. More specifically, leveraging ontology design principles and linguistic frames, we present a reified representation paradigm for capturing natural language questions that express complex relations (i.e. events and situations involving n-ary dependencies). The resulting ontological representations then serve as input to a knowledge-driven interpretation and question answering framework for context matching and retrieval over RDF data sources containing pattern-based conceptualisations. The current investigation focuses on accessing individuals’ knowledge, such as activity norms and behavioural patterns, diet preferences and health problems, captured by expressive ODPs that extend the DOLCE+DnS Ultralite design patterns. To the best of our knowledge, this is the first attempt to explicitly address NL interfaces for QA over conceptually rich KBs, i.e. pattern-based KBs that encapsulate rich axiomatizations.
The rest of the paper is structured as follows. Section 2
discusses related efforts and current limitations in addressing
question answering over conceptually rich knowledge bases,
motivating and contrasting our work within the existing lit-
erature. Sections 3 and 4 present the proposed question
analysis and context extraction approaches, which Section 5
explicates through an example use case. Last, Section 6
concludes the paper and outlines next steps.
2.1 Ontology-based Question Answering
Several approaches have been proposed in the literature
that address QA over Semantic Web knowledge bases [19].
Most of them focus on the generation of one or more SPARQL
queries through the interpretation of the semantic structure
of the user questions, while others opt for graph-based approaches to mitigate the rigidity often entailed in formulating appropriate SPARQL queries.
PowerAqua [17] allows users to choose an ontology and pose queries relevant to that ontology’s vocabulary. The results of language analysis are serialised into triples, which
are further annotated with ontology resources. Finally, the
triples are translated into logical queries that retrieve an-
swers from the underlying knowledge sources. NLP-Reduce
[16] processes queries as bags of words, employing stemming
and synonym expansion. It attempts to match the parsed
question words to the synonym-enhanced triples stored in
the lexicon generated from a KB and expanded with Word-
Net synonyms, generating SPARQL statements for those
matches. FREyA [7] is an interactive Natural Language
Interface for querying ontologies, which combines syntac-
tic parsing with the ontology-based lookup in an attempt
to precisely answer questions. If the system fails to automatically generate the answer, suggestions found through ontology reasoning are shown to the user. The system then
learns from user selections, and improves its performance
over time. Other relevant approaches include [26, 30, 29]
for SPARQL generation based on templates or query pat-
terns, [2] for retrieving individual and generic knowledge us-
ing the structured query language OASSIS-QL and [25] for
keyword-driven SPARQL generation. A domain-restricted
QA framework is presented in [9] that is based on fixed QA
topics associated with predefined SPARQL queries. Learn-
ing and scoring heuristics for filtering out redundant queries
are common practices to cope with mismatches between the
structure of questions and background knowledge.
As far as graph-driven approaches that reduce QA to a subgraph matching problem are concerned, a recent example is the graph-traversal based approach presented in [31], which is based on topological patterns and similarity metrics between predicate labels and entities. In a similar manner, Zou et al. [32] compute the semantic similarity of matching vertices and edges between the subgraph and the query graph. This approach is further supported by an offline process, where a graph mining algorithm maps natural language phrases to the top-k possible predicates in an RDF dataset, forming a paraphrase dictionary that is used for question understanding.
In [10], after parsing the NL query, the algorithm outputs a
list of ranked triple paths following from a pivot entity to the
final resource representing the answer, ranked by the aver-
age of the relatedness scores in the path. A similar approach
is followed in [1].
Summing up, the focus has been on simple, factoid ques-
tions, where the NL inputs comprise primarily light linguis-
tic constructions and the answers target respective bindings
on (chains of) binary properties. Much the same applies to
current evaluation methods, such as the Question Answering
over Linked Data (QALD) benchmark initiatives [18], where
comparatively few complex NL questions are included and
evaluation is performed on linked data sets with simple con-
ceptual models on highly interlinked resources, assuming
that answers are explicitly represented in the KB, possibly
following a different terminology.
2.2 Mapping NL to Semantic Representations
As previously outlined, most QA systems adhere to shallow linguistic analysis and triple-based serialisations, falling short in coping with the translation of complex NL questions into faithful semantic representations and respective queries.
A notable exception is the Pythia question answering sys-
tem [27], where deep linguistic analysis is used to compo-
sitionally construct general meaning representations from
NL questions involving quantification, aggregation functions
and superlatives. Although certain portability and scalability concerns apply, due to the need for explicating admissible linguistic realisations of the considered domain ontology classes and properties, the main concern is the difficulty of assessing its performance over conceptually demanding domains, as the reported evaluation ran over Mooney’s dataset, which adheres to a simple ontology for geographical information.
Parallel to QA-motivated efforts for capturing the semantic structure of NL questions, there has recently been a growing interest in paradigms for the principled translation of NL texts into RDF/OWL representations. Seminal examples include, among others, LODifier, PIKES and FRED. In LODifier [3], Discourse Representation Structures
(DRSs) [15], extracted by means of deep semantic parsing,
are converted to RDF triples using transformation rules that
map the unary and binary DRS conditions to respective class
and property assertions, while RDF reification is used for
logical and modal descriptions, such as disjunction and possibility. As LODifier focuses on publishing text as linked data, certain design choices, such as the use of blank nodes, become problematic, at least without some further post-processing and refactoring, for application contexts such as NL QA that require cleaner representations, closer to Semantic Web best practices.
Adopting a more knowledge-oriented paradigm, PIKES [6]
extracts entities and complex relations between them, us-
ing deep semantic parsing and linguistic frames, and sub-
sequently converts them into respective OWL graphs. The
translation follows a neo-Davidsonian representation style,
where frames are represented as reified objects, connected
to each of their participants by means of properties that
reflect the semantic roles of the participants, using, among others, the VerbNet and FrameNet semantic role repositories. To this end, SPARQL-like rules are used to refactor the linguistically grounded representations (“mention layer”) into respective knowledge assertions (“instance layer”), while post-processing is applied to materialise implicit knowledge and compact redundant structures. The uniform treatment of the various frame categories can, however, result in counterintuitive representations (e.g. introducing instances of two distinct classes for the same real-world entity); moreover, the alignment with foundational ontologies is not considered.
FRED [22] combines Discourse Representation Theory [15],
linguistic frames, and ontology design patterns, to produce
RDF/OWL ontologies and linked data from text. Deep
semantic parsing is used to capture entities and the rela-
tions between them as DRS structures. Semantic role la-
belling is performed using VerbNet and FrameNet roles.
What distinguishes FRED from other approaches, and makes it the work most relevant to our pursuits, is that it maximises the compliance of its modelling choices with Semantic Web principles and grounds the transformation and re-engineering of DRS structures into RDF/OWL graphs on the event and situation semantics defined in DOLCE+DnS Ultra Lite, modelling semantic roles as object properties. Certain features, including the mostly verbal coverage of events and the introduction of periphrastic properties, affect the completeness and semantic transparency of the resulting graphs.
Summing up, in the absence of principled paradigms for formalising NL expressions, and given the non-trivial choices involved, the relevant works afford varying degrees of expressivity in line with the considered application contexts.
2.3 Motivation & Approach
Figure 1: Participation pattern in Event-Model-F
We argue that question answering over conceptually rich KBs (e.g. proprietary models for maintaining care recipients’ knowledge in clinical institutions and organisations) poses additional challenges, as it requires support both for complex NL questions that involve rich relational contexts and for flexible, context-aware question interpretation and answering paradigms. Consider, as an example, the Event-Model-F [24] patterns. They extend DOLCE+DnS Ultralite
and provide conceptual models for representing contextual
knowledge about events, such as the participating entities
(see Figure 1), as well as causal, correlative and mereolog-
ical relations between events. Specialising the reified n-ary
relational context semantics of DnS, the resulting event de-
scriptions comprise highly axiomatised and rich structures,
whose effective querying relies on coping with NL questions
that allow capturing complex relations between entities and
their respective roles.
Moreover, the additional annotation layer embodies, and consequently hides, direct contextual links among resources (e.g. between participants and events), hindering the automated generation of effective query patterns and bindings, unless the structure and axiomatization of the patterns are taken into account following domain-specific solutions, as in [9, 5]. This shortcoming is evident even in conceptually simpler KBs. For instance, the Web of Know-How dataset [21] contains activities and instructions collected from WikiHow and Snapguide. Although the vocabulary used to represent this knowledge is relatively simple (PROHOW), the generated instantiations encapsulate rich axiomatizations. Figure 2 presents an example instantiation capturing information about a recipe, which can easily be reused in our domain to model the way individuals perform certain activities. In this example, there is a conceptual gap between the semantics and structure of the questions (e.g. how to make pancakes) and the way information is captured in the dataset. In such cases, domain knowledge is needed to further drive the generation of (possibly) multiple queries to extract the required information from the patterns.
In contrast to the aforementioned approaches, our work builds upon and extends relevant paradigms in frame-based knowledge extraction from text and graph-driven query matching, explicitly addressing QA over pattern-based KBs. For capturing the semantics of NL questions, we adhere to ontology design pattern principles, but advance related works by opting for reified event and situation representations that extend the DnS pattern and take into account the ontological types of the frames. For query interpretation, we have been inspired by the graph traversal paradigm, which we endowed with context-awareness, so that, given a set of query concepts and entities, we can establish context connections, i.e. links among groups of KB triples that satisfy the question. Our aim is to decouple graph expansion from predicate ranking, since in pattern-based modelling additional layers of axiomatisation are introduced that encapsulate conceptual dependencies and links among resources. These dependencies are usually not reflected in the structure and semantics of questions and thus cannot be uncovered by graph expansion approaches that are based on predicate ranking.
Figure 2: Example instantiation of PROHOW (the recipe “How to Make a Pancake” with the steps Prepare the mix, Pour the mix in a hot pan and Cook until golden; :requires links to Eggs, Milk and Flour; dbpedia:Pancake)
Capturing the semantics of the natural language user in-
puts consists of the identification of the pertinent entities
and their interrelations, and their subsequent formulation
into corresponding semantic representations. In the follow-
ing, we first present the NLP tools used for frame-based
knowledge extraction and then detail the approach for trans-
lating the extracted linguistic structures into OWL graphs.
3.1 Linguistic Analysis
To extract linguistic frame-based representations from the
NL user inputs, we use the TALN frame semantics parser. User inputs are first encoded as semantic predicate-argument structures that abstract away from syntactic variations and language-specific grammatical idiosyncrasies, by means of graph transducers [4] that allow us to incrementally abstract from surface-syntactic dependencies to deep-syntactic ones, and eventually to semantic ones. Next, availing of SemLink mappings between frame resources, the previously extracted predicate-argument structures are enriched with frame and frame element annotations. In addition, Babelfy [20] is used for entity linking and word sense disambiguation against BabelNet, a multilingual semantic network that integrates several knowledge resources including WordNet and Wikipedia.
3.2 Translation rules
The frame-based representations extracted during the lin-
guistic analysis step abstract the NL user inputs with respect
to conceptual structures (frames) that describe particular
types of situations, objects, or events along with their participants (frame element fillers) and their roles (frame elements). For example, the Apply_heat frame describes a cooking situation involving, among others, a Cook, some Food and a Heating_instrument; the roles of the involved participants, i.e. cook, food and heating instrument, comprise the frame elements (FEs) of the frame, while the words that evoke it, such as fry, bake, boil and broil, comprise its lexical units (LUs) [23].
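As a minimal illustration of the frame, FE and LU vocabulary, the sketch below encodes a frame as a plain Python data structure and validates one occurrence against it. It is a toy under stated assumptions, not the parser’s output format: the class names `Frame` and `FrameOccurrence`, the example sentence “Ann fries an egg”, and the restriction to the three FEs named above (FrameNet’s Apply_heat frame defines more) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """A frame: a situation type, its roles (FEs) and the words that evoke it (LUs)."""
    name: str
    frame_elements: frozenset
    lexical_units: frozenset


@dataclass
class FrameOccurrence:
    """One frame evoked in an utterance, with the fillers of its frame elements."""
    frame: Frame
    lexical_unit: str
    fillers: dict = field(default_factory=dict)  # FE name -> filler text

    def __post_init__(self):
        # The evoking word must be one of the frame's lexical units.
        if self.lexical_unit not in self.frame.lexical_units:
            raise ValueError(f"{self.lexical_unit!r} does not evoke {self.frame.name}")
        # Every filler must occupy a role (FE) that the frame actually defines.
        unknown = set(self.fillers) - self.frame.frame_elements
        if unknown:
            raise ValueError(f"not FEs of {self.frame.name}: {sorted(unknown)}")


# Hypothetical, reduced Apply_heat frame (only the FEs and LUs named in the text).
APPLY_HEAT = Frame(
    name="Apply_heat",
    frame_elements=frozenset({"Cook", "Food", "Heating_instrument"}),
    lexical_units=frozenset({"fry", "bake", "boil", "broil"}),
)

# Hypothetical occurrence for the sentence "Ann fries an egg".
occurrence = FrameOccurrence(APPLY_HEAT, "fry", {"Cook": "Ann", "Food": "egg"})
print(occurrence.frame.name, occurrence.fillers)
```

Validation in `__post_init__` rejects, for instance, an occurrence evoked by “drink”, which belongs to a different frame.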
Inspired by [12], which explicates frame semantics in view of the Descriptions and Situations ontology pattern, we opt for
a reified representation of the n-ary conceptual structures
denoted by frames, interpreting frames as dul:Descriptions,
frame elements as dul:Concepts, and the extracted frame
occurrences as dul:Situations. This view is in line with
FrameNet’s intended semantics according to which “Frames
describe classes of situations, the semantics of LUs are sub-
classes of the Frames, and (...) FEs are classes that are
arguments of the Frame classes”, where the term “Frame El-
ement” has two meanings, namely “the relation itself, and
the filler of the relation.” [23]. However, the conceptual dis-
parities between the linguistic considerations underpinning
FrameNet’s intended semantics and knowledge engineering
practices require a certain extent of re-engineering in order
to obtain well-defined ontological representations.
Towards this end, we adopted a refined interpretation that
takes into account the ontological type of the considered
frames. Currently, we distinguish between frames that denote event-centric situations (e.g. Ingestion, Grooming), attributive ones (e.g. Age, Usefulness, Measure_volume), and frames that relate to objects (e.g. Artifact, Food).
Event frame situations are captured as specialisations of
the class EventFrameSituation, which is defined as follows:
EventFrameSituation rdfs:subClassOf (
dul:Situation and
dul:satisfies some EventFrameDescription )
EventFrameDescription rdfs:subClassOf (
dul:Description and
dul:defines some InvolvedEvent )
For each extracted event frame occurrence, an instance
of the respective frame situation class is introduced along
with corresponding instance assertions for each of the par-
ticipating entities, including the lexical unit that evoked
the frame. dul:isSettingFor assertions are used to link the frame situation individual with the rest, while respective dul:isClassifiedBy assertions are used to describe the semantic roles of the participating entities; the lexical unit class is further typed as a subclass of dul:Event. Thus, for example, the sentence “Ann drinks coffee” would result, among others, in the following assertions:
:IngestionFrame rdfs:subClassOf dul:Situation .
:ingestion1 rdf:type :IngestionFrame ;
    dul:isSettingFor :drink1 , :coffee1 , :Ann .
:Drink rdfs:subClassOf dul:Event .
:coffee1 dul:isClassifiedBy :ingestibles1 .
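The translation step just described can be sketched as a small function that emits the reified assertions as (subject, predicate, object) tuples. This is a hypothetical illustration, not the authors’ rule set: the function name `frame_to_triples`, its parameters and the role concept `:ingestor1` for Ann are assumptions, and the role assertions use DUL’s `dul:isClassifiedBy`, the inverse of `dul:classifies`.

```python
def frame_to_triples(frame_class, situation, lu_class, lu_individual, participants):
    """Emit reified frame assertions as (s, p, o) tuples.

    participants: list of (individual, role_concept) pairs; role_concept may be
    None when no semantic role was assigned to the participant.
    """
    triples = [
        # The frame becomes a situation class; the occurrence is one of its instances.
        (frame_class, "rdfs:subClassOf", "dul:Situation"),
        (situation, "rdf:type", frame_class),
        # The lexical unit class is typed as an event; the evoking word instantiates it.
        (lu_class, "rdfs:subClassOf", "dul:Event"),
        (lu_individual, "rdf:type", lu_class),
        (situation, "dul:isSettingFor", lu_individual),
    ]
    for individual, role in participants:
        # Every participant is linked to the frame situation ...
        triples.append((situation, "dul:isSettingFor", individual))
        # ... and, when it has a semantic role, classified by that role concept.
        if role is not None:
            triples.append((individual, "dul:isClassifiedBy", role))
    return triples


# "Ann drinks coffee" (the role concept :ingestor1 is hypothetical).
ann_drinks_coffee = frame_to_triples(
    ":IngestionFrame", ":ingestion1", ":Drink", ":drink1",
    [(":coffee1", ":ingestibles1"), (":Ann", ":ingestor1")],
)
for s, p, o in ann_drinks_coffee:
    print(s, p, o, ".")
```

Keeping the output as plain tuples makes the sketch independent of any RDF library; serialising the tuples to Turtle is a separate concern.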
To capture attributive frames, we have introduced the classes AttributeFrameSituation ⊑ FrameSituation and AttributeFrameDescription ⊑ FrameDescription, while respective specialisations allow distinguishing between relative and absolute attribute descriptions. For example, absolute attribute descriptions specialise the following definition:
AbsoluteAttributeDescription rdfs:subClassOf (
dul:Description and
dul:defines some Attribute and
dul:defines some dul:Region and
dul:defines some dul:UnitType )
Lacking the descriptive contexts pertinent to event and at-
tribute frames, frames related to objects are treated as spe-
cialisations of the class dul:Entity, also augmented with
BabelNet and WordNet sense information.
Last, as the proposed NL interface for pattern-based KBs is part of a socially competent communicative agent, the generated semantic representations of the NL user input also capture information on speech act types. To this end, we have introduced the SpeechAct class, which specialises dul:Situation and serves as a container for the FrameSituation objects included in a user utterance. Currently, we distinguish between informing and requesting speech acts, using the classes InformSpeechAct and RequestSpeechAct, respectively.
Context extraction involves the semantic interpretation
of the analysed user question (Section 3) and the subse-
quent extraction, from the KB, of knowledge that satisfies
the query context. In the rest of this section, we describe
the steps involved in identifying key query concepts, mapping them onto KB entities and extracting meaningful context from the KB that answers the initial question.
4.1 Extraction of Key Entities
The first step of the algorithm is to extract the key entities from the question analysis results. As key entities, we define the entities
that participate in DnS classification relations, since such
axiomatizations encapsulate information about the context
of questions. The key entities can be straightforwardly ex-
tracted by traversing the frame situation model, collecting
the resources classified through dul:classifies property
assertions. Assuming that k is a key entity, x is a resource and F is the language analysis model, the set K with all the key entities is defined as:
$$K = \{\, k \mid \langle x, \text{dul:classifies}, k \rangle \in F \,\}$$
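Extracting K amounts to a single pass over the triples of the language analysis model. A minimal sketch, assuming F is available as a list of (s, p, o) string tuples; the function name `key_entities` and the toy model (with the hypothetical role concept `:ingestor1`) are illustrative only. Accepting the inverse property `dul:isClassifiedBy` as well is our assumption, not part of the definition above.

```python
def key_entities(frame_model):
    """Return K = {k | <x, dul:classifies, k> in F} for a model F of (s, p, o) tuples.

    Assertions stated with the inverse property dul:isClassifiedBy are also accepted,
    so that role assertions written in either direction yield the same key entities.
    """
    keys = set()
    for s, p, o in frame_model:
        if p == "dul:classifies":
            keys.add(o)        # <concept, dul:classifies, entity>
        elif p == "dul:isClassifiedBy":
            keys.add(s)        # <entity, dul:isClassifiedBy, concept>
    return keys


F = [
    (":ingestion1", "rdf:type", ":IngestionFrame"),
    (":ingestion1", "dul:isSettingFor", ":coffee1"),
    (":ingestibles1", "dul:classifies", ":coffee1"),
    (":Ann", "dul:isClassifiedBy", ":ingestor1"),
]
print(sorted(key_entities(F)))  # [':Ann', ':coffee1']
```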
4.2 Resource Identification
Having extracted the key entities K, the next step is to
assign URIs to each k ∈ K. As described in Section 3, each classified entity is assigned a WordNet synset using Babelfy. These annotations are used to detect entities (synonyms) in the KB that will drive the resource unfolding process described in Section 4.3. Assuming that label(r) is the label of resource r ∈ KB, syn(k) is the synset of key entity k ∈ K and σ is a similarity function, the set S(k) of all the resources relevant to k is defined as:
$$S(k) = \mathop{\arg\max}_{r \in KB}\, \sigma\big(\mathit{syn}(k), \mathit{label}(r)\big)$$
The current implementation uses, for simplicity, the UMBC Semantic Similarity Service [13], a ready-to-use service that calculates the semantic similarity σ between k and label(r) by combining Latent Semantic Analysis (LSA) word similarity and WordNet knowledge. The output of this step is the multiset $\mathcal{S}$ that contains the sets of all relevant resources of the key entities in K:
$$\mathcal{S} = \{\, S(k) \mid k \in K \,\}$$
4.3 Resource Unfolding and Local Context
The next step is to define, for each entity k′ ∈ S(k), the local context that captures information relevant to the neighbouring resources (triples) of k′. The local context is built by taking into account all the triples connected with k′, without examining the similarity of the predicate labels to the entities and resources extracted through language
analysis. This approach ensures that the local contexts con-
tain information that is part of the conceptual model of the
pattern, which is important since it encapsulates implicit
contextual relations among key entities and their mappings
that should not be ignored. For example, the question “How
to make a pancake” does not directly entail that the predi-
cates requires or has_method (Figure 2) should be part of
the graph expansion algorithm, unless domain knowledge is
taken into account.
Based on the mappings generated in the previous step, the local context generation task iteratively unfolds a resource $k'$, traversing the KB vocabulary and collecting triples $\langle s, p, o\rangle$ whose subject, predicate or object is linked to $k'$. A threshold $h$ is used to filter out triples that are more than $h$ property assertions away from the element. More specifically, the local context $X_{k'}$ of resource $k'$ is defined as:
$$X_{k'} = \{\langle s, p, o\rangle \mid k' \xrightarrow{h} \langle s, p, o\rangle,\ k' \in S(k)\}$$
where $k' \xrightarrow{h} \langle s, p, o\rangle$ denotes all the triples directly or indirectly connected with $k'$, up to $h$ property assertions away.
Intuitively, the aim is to enrich local contexts with additional contextual triples from the neighbourhood of each key resource $k' \in S(k)$ in the KB. By computing the local context of each $k'$, we create the set $\mathcal{X}$ of all the local contexts relevant to the question, i.e. $\mathcal{X} = \{X_{k'} \mid k' \in S(k)\}$.
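The bounded unfolding can be sketched as a breadth-first traversal over triples. This is a minimal sketch, assuming an in-memory list of triples and a hypothetical `local_context` helper. The first hop collects every triple whose subject, predicate or object is the resource, and each later hop continues from the subjects and objects reached so far. Predicates are matched but not expanded, which is an assumption of this sketch rather than something the text specifies.

```python
from collections import namedtuple
from typing import Iterable, Set

Triple = namedtuple("Triple", ["s", "p", "o"])


def local_context(resource: str, triples: Iterable[Triple], h: int) -> Set[Triple]:
    """Sketch of X_k': triples at most h property assertions away from `resource`."""
    triples = list(triples)
    context: Set[Triple] = set()
    visited = {resource}
    frontier = {resource}
    for _ in range(h):
        # Triples touching the current frontier that were not collected yet.
        new = {t for t in triples
               if t not in context and frontier & {t.s, t.p, t.o}}
        context |= new
        # Continue only from subjects/objects not visited before.
        frontier = {n for t in new for n in (t.s, t.o)} - visited
        visited |= frontier
        if not frontier:
            break
    return context


# Hypothetical triples, loosely modelled on the preference pattern.
kb = [
    Triple(":pref1", "dul:isSettingFor", ":Ann"),
    Triple(":pref1", "dul:isSettingFor", ":coffee1"),
    Triple(":coffee1", "dul:classifiedBy", ":drinkable1"),
]
print(len(local_context(":Ann", kb, 1)), len(local_context(":Ann", kb, 2)))
```

Raising `h` pulls in more of the pattern: with `h = 3` the classification triple of `:coffee1` also enters the local context of `:Ann`.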
4.4 Context Links
Based on the local contexts $\mathcal{X}$ obtained in the previous section, the next step is to define context links. Intuitively, a context link captures a contextual dependency between two local contexts with respect to the triples they contain. For example, if two local contexts contain triples that share at least one common subject, predicate or object, then a contextual dependency is detected and the two local contexts are linked. OWL schema predicates (e.g. rdfs:domain) and classes (e.g. owl:Thing) are ignored during triple resource matching, so as not to generate generic, contextless dependencies among local contexts. More specifically, two local contexts $X_k$ and $X_m$ are linked, denoted as $X_k \leftrightarrow X_m$, if $\exists \langle s_a, p_a, o_a\rangle \in X_k$, $\exists \langle s_b, p_b, o_b\rangle \in X_m$, such that $s_a =$
4.5 Context Ranking and Responses
The final step of the algorithm is to traverse the paths defined by context links $X_k \leftrightarrow X_l \leftrightarrow \ldots \leftrightarrow X_n$, collecting the triples $\langle s, p, o\rangle$ of the local contexts in order to generate possible contextual responses. Intuitively, this step merges the local contexts of different key entities, capitalizing on the contextual dependencies identified in the previous step. More specifically, a response multiset $\mathcal{R}$ is defined as:
$$\mathcal{R} = \{X_k \cup X_l \cup \ldots \cup X_n \mid X_k \leftrightarrow X_l \leftrightarrow \ldots \leftrightarrow X_n,\ X_i \in \mathcal{X}\}$$
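Context links and response generation can be sketched together. This is a minimal sketch under several assumptions: links follow the informal definition (two local contexts share a subject, predicate or object), the `SCHEMA` set of ignored terms is hypothetical, and a response is the union of the local contexts along each maximal simple path of links, with identical unions reported once. The helper names are illustrative only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schema-level terms ignored when matching triple resources.
SCHEMA = {"owl:Thing", "rdf:type", "rdfs:domain", "rdfs:range", "rdfs:subClassOf"}


def resources(context: Set[Triple]) -> Set[str]:
    """All subjects, predicates and objects of a local context, minus schema terms."""
    return {x for triple in context for x in triple} - SCHEMA


def context_links(contexts: Dict[str, Set[Triple]]) -> Set[FrozenSet[str]]:
    """Undirected links between local contexts that share a non-schema resource."""
    return {frozenset((a, b)) for a, b in combinations(contexts, 2)
            if resources(contexts[a]) & resources(contexts[b])}


def responses(contexts: Dict[str, Set[Triple]],
              links: Set[FrozenSet[str]]) -> List[Set[Triple]]:
    """Merge local contexts along every maximal simple path of links."""
    neighbours: Dict[str, Set[str]] = {name: set() for name in contexts}
    for link in links:
        a, b = tuple(link)
        neighbours[a].add(b)
        neighbours[b].add(a)
    found: List[FrozenSet[Triple]] = []

    def walk(path: List[str]) -> None:
        extensions = [n for n in neighbours[path[-1]] if n not in path]
        if not extensions:  # maximal path: collect the merged triples once
            merged = frozenset().union(*(contexts[n] for n in path))
            if merged not in found:
                found.append(merged)
        for n in extensions:
            walk(path + [n])

    for start in contexts:
        walk([start])
    return [set(r) for r in found]


# Hypothetical local contexts: milk shares only schema terms with the others.
local_contexts = {
    "Ann": {(":pref1", "dul:isSettingFor", ":Ann")},
    "coffee": {(":pref1", "dul:isSettingFor", ":coffee1")},
    "milk": {(":milk1", "rdf:type", "owl:Thing")},
}
links = context_links(local_contexts)
print(links, len(responses(local_contexts, links)))
```

Enumerating simple paths is exponential in the worst case; for the handful of key entities in a single question this is usually acceptable, but a larger KB would call for connected components or bounded path lengths instead.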
Each response set $R \in \mathcal{R}$ is semantically and structurally compared to the language analysis results in order to rank the responses and select the most plausible context as the final response to the input question. The ranking is based on two criteria:
• semantic similarity of the triple resources in $R$ with the key concept multiset $\mathcal{S}$;
• structural similarity of the resource relations in $R$ with the relations generated through language analysis.
More specifically, semantic similarity ($\phi$) is computed taking into account the type of the resources that participate in ABox assertions (1). Intuitively, the multiset $\mathcal{S}$ of all key concepts (which may be ontology classes, properties or instances) identified in Section 4.2 is semantically compared to the resources in each $R$:
$$\phi(\mathcal{S}, R) = \frac{\sum_{r \in R,\, k' \in \mathcal{S}} \delta(k', r)}{|\mathcal{S}|} \quad (1)$$
We use the $\delta$ function (2) to compute the similarity of a key concept $k'$ against a resource $r$ of a triple in $R$:
$$\delta(k', r) = \begin{cases} 1, & \text{if } r \sqsubseteq k' \text{ (includes } r \equiv k'\text{)} \\ \frac{|U(r)|}{|U(k')|}, & \text{if } k' \sqsubseteq r \\ 0, & \text{otherwise} \end{cases} \quad (2)$$
If $k'$ and $r$ are classes, then their similarity is derived from their hierarchical relationship. A class $r$ exactly matches a class $k'$ if it is equivalent to $k'$ or if it is a subclass of $k'$. On the other hand, if $k'$ is subsumed by $r$, then $r$ is a more general concept than $k'$ and the similarity is computed based on the rate of the superclasses of $r$ that are also superclasses of $k'$. $U(C)$ is defined as the set of the superclasses of $C$, excluding owl:Thing, such that $U(C) = \{A \mid C \sqsubseteq A,\ A \neq \top\}$.
If $k'$ and $r$ are instances (or properties), then the similarity is based on resource equality ($\equiv$); property hierarchies are not taken into account.
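A minimal sketch of $\delta$ and $\phi$ follows, assuming a hypothetical toy class hierarchy (`PARENTS`) and treating any resource that is not in the hierarchy as an instance or property, so that it only matches itself. The partial-match case for a more general $r$ is read here as $|U(r)|/|U(k')|$, i.e. the share of the superclasses of $k'$ that $r$ retains; this is an interpretation, not a confirmed formula. The helper names are illustrative.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy hierarchy: class -> direct superclasses.
PARENTS: Dict[str, Set[str]] = {
    "Coffee": {"Drinkable"},
    "Drinkable": {"Ingestible"},
    "Ingestible": {"owl:Thing"},
}


def superclasses(c: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """U(C): C itself plus all its ancestors, excluding owl:Thing."""
    seen: Set[str] = set()
    stack = [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents.get(x, ()))
    return seen - {"owl:Thing"}


def delta(k: str, r: str, parents: Dict[str, Set[str]]) -> float:
    """delta(k', r); resources outside the hierarchy reduce to plain equality."""
    if r == k:
        return 1.0
    u_r, u_k = superclasses(r, parents), superclasses(k, parents)
    if k in u_r:  # r is a subclass of k'
        return 1.0
    if r in u_k:  # k' is a subclass of r: r is more general
        return len(u_r) / len(u_k)
    return 0.0


def phi(key_concepts: List[str], response: Iterable[str],
        parents: Dict[str, Set[str]]) -> float:
    """phi(S, R): delta summed over all (k', r) pairs, divided by |S|."""
    rs = set(response)
    return sum(delta(k, r, parents) for k in key_concepts for r in rs) / len(key_concepts)


print(delta("Coffee", "Drinkable", PARENTS))
print(phi(["Coffee", ":Ann"], ["Coffee", ":Ann", ":milk1"], PARENTS))
```

Because the sum runs over all pairs, this sketch can return a score above 1 when several resources of a response match the same key concept; a variant that keeps only the best $r$ per $k'$ would cap it at 1.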
Semantic similarity takes into account only the type of the resources involved in a response, without examining their connectivity. Structural similarity is used to favour responses whose resource relations better reflect the key concept relations derived through language analysis. For example, if the key concepts water and temperature are connected in the language analysis results, then responses in which the corresponding resources are also connected will be preferred (the distance between the resources is not taken into account). More specifically, assuming that $LC$ is the set of language analysis resource connections $[r_a, r_b]$ and $RC$ is the set of response resource connections $[r_1, r_2]$, their similarity is given by (3) and (4).
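$$\gamma(RC, LC) = \frac{\sum_{[r_1, r_2] \in RC} \delta'([r_1, r_2], LC)}{|RC|} \quad (3)$$
$$\delta'([r_1, r_2], LC) = \begin{cases} 1, & \text{if } [r_1, r_2] \in LC \\ 0, & \text{otherwise} \end{cases} \quad (4)$$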
The overall score of a response context $R \in \mathcal{R}$ with respect to the multiset $\mathcal{S}$ of language analysis key concepts and the set $LC$ of language analysis resource connections is defined as the weighted mean $sim$ of $\phi$ and $\gamma$:
$$sim(R, \mathcal{S}, LC) = a \cdot \phi(\mathcal{S}, R) + b \cdot \gamma(RC, LC)$$
where $a$ and $b$ are normalized weights in $[0, 1]$, enabling the empirical adjustment of the context ranking criteria. For example, a $b$ weight close to 0 indicates a relaxed policy regarding structural similarity, enabling the return of contextual triples that are not necessarily part of the question. In contrast, a $b$ weight close to 1 reflects a stricter policy regarding structural similarity, where additional contextual triples negatively affect the overall similarity.
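The structural score and the weighted combination can be sketched as below. This is a minimal sketch under stated assumptions: connections are treated as undirected pairs, $\gamma$ is normalised by the number of response connections $|RC|$ (which is what lets extra contextual connections lower the score for a high $b$), and $a + b = 1$ is enforced to keep $sim$ a weighted mean. The connection data and helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]


def connections(pairs: Iterable[Pair]) -> Set[FrozenSet[str]]:
    """Treat [r1, r2] as an undirected connection (distance is ignored)."""
    return {frozenset(p) for p in pairs}


def gamma(rc: Iterable[Pair], lc: Iterable[Pair]) -> float:
    """Share of response connections that also occur in LC (delta' is 0 or 1)."""
    rc_set, lc_set = connections(rc), connections(lc)
    if not rc_set:
        return 0.0
    return sum(1 for c in rc_set if c in lc_set) / len(rc_set)


def sim(phi_score: float, rc: Iterable[Pair], lc: Iterable[Pair],
        a: float = 0.5, b: float = 0.5) -> float:
    """Weighted mean a*phi + b*gamma; assumes a + b = 1."""
    if abs(a + b - 1.0) > 1e-9:
        raise ValueError("a and b must sum to 1")
    return a * phi_score + b * gamma(rc, lc)


# Hypothetical connections: the milk link is extra context not in the question.
lc = [(":Ann", ":coffee1"), (":coffee1", ":drink1")]
rc = [(":Ann", ":coffee1"), (":drink1", ":coffee1"), (":coffee1", ":milk1")]
for b in (0.0, 0.5, 1.0):
    print(b, round(sim(1.0, rc, lc, a=1.0 - b, b=b), 3))
```

With a perfect semantic score, the response keeps a score of 1 when $b = 0$ and drops towards the structural score as $b$ grows, mirroring the relaxed versus strict policies described above.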
To illustrate the question analysis and context extraction capabilities of the proposed framework, we use the preference pattern of Figure 3 and the user question “How often does Ann like to drink coffee?”.
5.1 User Preference Pattern
An important modelling aspect of users’ behaviour is the availability of rich information about various activities of daily living (ADL). Figure 3 depicts the instantiation of the DnS pattern to capture the coffee drinking preferences of Ann. More precisely, the instantiation of DnS in DUL involves the definition of situation and description instances. The latter define one or more concepts that may further classify entities, thereby describing the context of a given situation of interest. Accordingly, the preference pattern of the example defines the Preference situation (Preference ⊑ dul:Situation) and two domain concepts (Drinkable and Ingredient) for the classification of the DUL entities involved in this pattern, i.e. coffee and milk. The dul:EventType concept is reused to classify the Drink event/class¹⁰, and the Frequency concept designates the frequency.
In addition, following the conceptual example of Event-Model-F, the situation instance is further associated through dul:isSettingFor property assertions with the entities that are classified by concepts. Instead of defining such relations manually, the preference pattern uses the property chain axiom:
$$\mathit{describes} \circ \mathit{defines} \circ \mathit{classifies} \sqsubseteq \mathit{isSettingFor}$$
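How such a property chain yields the dul:isSettingFor assertions can be sketched with naive forward chaining over a set of triples. This is a minimal illustration, not an OWL 2 reasoner: it performs a single pass, ignores inverses and sub-properties, and the node names (`:x`, `:y`, `:drinkable1`, `:coffee1`) and the helper `apply_chain` are hypothetical rather than the exact instances of Figure 3.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def apply_chain(triples: Set[Triple], chain: List[str], implied: str) -> Set[Triple]:
    """Infer (x, implied, z) wherever x reaches z via the properties in `chain`."""
    # Start with the pairs connected by the first property of the chain.
    pairs = {(s, o) for s, p, o in triples if p == chain[0]}
    for prop in chain[1:]:
        # Index the next property by subject, then extend every pair by one step.
        successors: Dict[str, Set[str]] = {}
        for s, p, o in triples:
            if p == prop:
                successors.setdefault(s, set()).add(o)
        pairs = {(x, z) for x, y in pairs for z in successors.get(y, set())}
    return {(x, implied, z) for x, z in pairs}


kb = {
    (":x", "dul:describes", ":y"),
    (":y", "dul:defines", ":drinkable1"),
    (":drinkable1", "dul:classifies", ":coffee1"),
}
chain = ["dul:describes", "dul:defines", "dul:classifies"]
print(apply_chain(kb, chain, "dul:isSettingFor"))
```

In practice an OWL 2 DL reasoner would derive these assertions directly from the axiom; the sketch only shows the composition step.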
5.2 Question Analysis
Applying the question analysis methodology described above, the resulting user input knowledge graph comprises information about the speech act type (i.e. request) and the encompassed frame situation occurrences, as shown in the following Turtle extract:
:speechAct1 rdf:type RequestSpeechAct ;
  dul:isSettingFor :ingestionSit1 ;
  dul:isSettingFor :frequency1 ;
  dul:satisfies :requestDesc1 .
:ingestionSit1 rdf:type IngestionSituation ;
  dul:isSettingFor :coffee1 ;
  dul:isSettingFor :Ann ;
  dul:includesEvent :drink1 ;
  dul:satisfies :ingestionDesc1 .
:ingestionDesc1 rdf:type IngestionDescription ;
  dul:defines :ingestor1 ;
  dul:defines :ingestible1 .
:frequencySit1 rdf:type FrequencySituation ;
  dul:isSettingFor :ingestionSit1 ;
  dul:isSettingFor :frequency1 ;
  dul:satisfies :frequencyDesc1 .
:drink1 rdf:type Drink ;
  dul:classifiedBy :eventType1 .
Drink owl:equivalentClass wn30-synset%drink-verb-1 .
:coffee1 rdf:type Coffee ;
  dul:classifiedBy :ingestible1 .
Coffee owl:equivalentClass wn30-synset%coffee-noun-1 .
:Ann rdf:type dul:Person ;
  dul:classifiedBy :ingestor1 .
:frequency1 dul:classifiedBy :request1 .

[Figure 3: Coffee drinking pattern in DnS, instantiated for “Ann drinks daily 2 coffees with milk”.]

¹⁰ In DUL, the dul:EventType concept classifies dul:Event instances. In this example, though, we classify a class (Drink), which is permitted under OWL 2 DL semantics through punning [14].
5.3 Context Extraction
The first step of the procedure is to extract the key entities recognized through question analysis. Based on the results described in Section 5.2, the following key entities are extracted:
$$K = \{Ann_F, drink1_F, coffee1_F, frequency1_F\}$$
which are mapped to the following resources in the KB:
$$\mathcal{S} = \{Ann_{KB}, Drink_{KB}, coffee_{KB}, Frequency_{KB}\}$$
The next step is to unfold the resources and create the local contexts. We use $h = 2$ and, for presentation purposes, omit the triples of the local contexts, presenting only the connected node ids (illustrated in Figure 3). As such, we have the following local contexts for each mapped resource $k' \in \mathcal{S}$:
$$X_{Ann_{KB}} = \{n2, n17, n5, n9, n12, n3\}$$
$$X_{Drink_{KB}} = \{n2, n17, n5, n9, n12, n3, n1, n4, n6\}$$
$$X_{coffee_{KB}} = \{n7, n2, n17, n5, n9, n12, n3, n1, n8\}$$
$$X_{Frequency_{KB}} = \{n3, n15, n14, n3\}$$
All local contexts share at least one common resource. Therefore, all local contexts are pairwise linked: $X_{Ann_{KB}} \leftrightarrow X_{Drink_{KB}}$, $X_{Ann_{KB}} \leftrightarrow X_{coffee_{KB}}$, $X_{Ann_{KB}} \leftrightarrow X_{Frequency_{KB}}$ (and so on). As such, we obtain a response multiset $\mathcal{R}$ with a single response $R \in \mathcal{R}$:
$$R = \{n1, n2, n3, n4, n5, n6, n7, n8, n9, n12, n13, n14, n15, n16, n17\}$$
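The pairwise linking claim can be checked directly on the node ids of the local contexts listed above. This small check treats a shared node id as a shared resource, which is a simplification since the triples themselves are omitted in the example; the variable names are illustrative.

```python
from itertools import combinations

# Node ids of the local contexts for h = 2; the repeated n3 collapses in a set.
local_contexts = {
    "Ann_KB": {"n2", "n17", "n5", "n9", "n12", "n3"},
    "Drink_KB": {"n2", "n17", "n5", "n9", "n12", "n3", "n1", "n4", "n6"},
    "coffee_KB": {"n7", "n2", "n17", "n5", "n9", "n12", "n3", "n1", "n8"},
    "Frequency_KB": {"n3", "n15", "n14"},
}

# Two local contexts are linked when they share at least one node.
links = [(a, b) for a, b in combinations(local_contexts, 2)
         if local_contexts[a] & local_contexts[b]]
shared_by_all = set.intersection(*local_contexts.values())

print(len(links), sorted(shared_by_all))  # prints: 6 ['n3']
```

All six possible pairs are linked, and n3 is the node common to every local context, which is why the path traversal yields a single merged response.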
The semantic similarity equals 1, since all key entities are exactly matched to resources of the response. Regarding structural similarity, we can observe that the milk resource (n12) is also returned, although it is not among the entities detected through question analysis. This is an example of additional contextual information that may be returned by our framework and that can be controlled through the $b$ weight: a high $b$ weight (e.g. 1) would reduce the final similarity of the response, penalizing the additional context that is not part of the question analysis results.
As such, the framework provides a contextual response to the question, returning not just a plain value (e.g. “2” in this example) but also the semantics of the answer, e.g. “2 times daily”, in a formal, pattern-based manner. Such pattern-based responses facilitate further processing in different application scenarios, e.g. in dialogue-based systems where agents need to interpret responses and act accordingly, or for generating verbal responses.
Question answering over conceptually complex, pattern-based KBs further aggravates the challenges involved in coping with NL queries over Semantic Web data, as the rich and encapsulated underlying semantics accentuate the need to accurately capture the semantic structure of complex user questions, while calling for flexible, context-aware query interpretation. In this work, we presented a framework for QA over pattern-based user models that combines a frame-based, reified representation of NL questions with a context-aware, graph-based paradigm for interpreting them against KBs and identifying pertinent answers.
We are currently building rich KBs capturing the user models of participants in the KRISTINA¹¹ pilots. The collected data will allow us to evaluate our framework with realistic data and to identify possible limitations that have not been foreseen so far. In parallel, we are working towards further enriching the analysis and interpretation of complex relational context so as to support additional constructions, such as negation, superlatives and aggregation, that will allow for more expressive QA over the profiled users’ routines.
This work has been partially supported by the H2020-645012 project “KRISTINA: A Knowledge-Based Information Agent with Social Competence and Human Interaction
[1] N. Aggarwal and P. Buitelaar. A System Description of Natural Language Query over DBpedia. Proc. of Interacting with Linked Data, pages 96–99, 2012.
[2] Y. Amsterdamer, A. Kukliansky, and T. Milo. A Natural Language Interface for Querying General and Individual Knowledge. Proc. of the VLDB Endowment, 8(12):1430–1441, 2015.
[3] I. Augenstein, S. Padó, and S. Rudolph. LODifier: Generating Linked Data from Unstructured Text. Proc. of Extended Semantic Web Conference, pages 210–224, 2012.
[4] M. Ballesteros, B. Bohnet, S. Mille, and L. Wanner. Data-driven deep-syntactic dependency parsing. Natural Language Engineering, pages 1–36, 2015.
[5] P. Cimiano, P. Haase, and J. Heizmann. Porting Natural Language Interfaces between Domains – An Experimental User Study with the ORAKEL System. Proc. of Intelligent User Interfaces, pages 180–189,
[6] F. Corcoglioniti, M. Rospocher, and A. P. Aprosio. A 2-phase Frame-based Knowledge Extraction Framework. Proc. of ACM Symposium on Applied Computing, pages 354–361, 2016.
[7] D. Damljanovic, M. Agatonovic, and H. Cunningham. FREyA: An Interactive Way of Querying Linked Data Using Natural Language. Proc. of Extended Semantic Web Conference Workshops, pages 125–138, 2011.
[8] R. de Almeida Falbo, M. P. Barcellos, J. C. Nardi, and G. Guizzardi. Organizing Ontology Design Patterns as Ontology Pattern Languages. Proc. of Extended Semantic Web Conference, pages 61–75, 2013.
[9] A. Frank, H.-U. Krieger, F. Xu, H. Uszkoreit, B. Crysmann, B. Jörg, and U. Schäfer. Question Answering from Structured Knowledge Sources. Journal of Applied Logic, 5(1):20–48, 2007.
[10] A. Freitas, J. G. Oliveira, S. O’Riain, J. C. da Silva, and E. Curry. Querying linked data graphs using semantic relatedness: A vocabulary independent approach. Data & Knowledge Engineering, 88:126–141, 2013.
[11] A. Gangemi. Ontology Design Patterns for Semantic Web Content. Proc. of International Semantic Web Conference, pages 262–276, 2005.
[12] A. Gangemi. What’s in a Schema? In C. Huang, N. Calzolari, A. Gangemi, A. Lenci, A. Oltramari, and L. Prevot, editors, Ontology and the Lexicon. Cambridge University Press, 2010.
[13] L. Han, A. Kashyap, T. Finin, J. Mayfield, and J. Weese. UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems. Proc. of Joint Conference on Lexical & Computational Semantics, pages 44–52,
[14] N. Jekjantuk, G. Gröner, and J. Z. Pan. Modelling and Reasoning in Metamodelling Enabled Ontologies. Proc. of Knowledge Science, Engineering and Management, pages 51–62, 2010.
[15] H. Kamp and U. Reyle. From Discourse to Logic. Dordrecht: Kluwer Academic Publishers, 1993.
[16] E. Kaufmann, A. Bernstein, and L. Fischer. NLP-Reduce: A “naive” but Domain-independent Natural Language Interface for Querying Ontologies. Proc. of Extended Semantic Web Conference, 2007.
[17] V. Lopez, M. Fernández, E. Motta, and N. Stieler. PowerAqua: Supporting users in querying and exploring the Semantic Web. Semantic Web, 3(3):249–265, Aug. 2012.
[18] V. Lopez, C. Unger, P. Cimiano, and E. Motta. Evaluating question answering over linked data. Web Semantics: Science, Services and Agents on the World Wide Web, 21:3–13, 2013.
[19] V. Lopez, V. Uren, M. Sabou, and E. Motta. Is Question Answering fit for the Semantic Web?: A Survey. Semantic Web, 2(2):125–155, Apr. 2011.
[20] A. Moro, A. Raganato, and R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. TACL, 2:231–244, 2014.
[21] P. Pareti, B. Testu, R. Ichise, E. Klein, and A. Barker. Integrating Know-How into the Linked Data Cloud. Proc. of Knowledge Engineering and Knowledge Management, pages 385–396, 2014.
[22] V. Presutti, F. Draicchio, and A. Gangemi. Knowledge Extraction Based on Discourse Representation Theory and Linguistic Frames. Proc. of Knowledge Engineering and Knowledge Management, pages 114–129, 2012.
[23] J. Ruppenhofer, M. Ellsworth, M. R. L. Petruck, C. R. Johnson, and J. Scheffczyk. FrameNet II: Extended Theory and Practice, 2010.
[24] A. Scherp, T. Franz, C. Saathoff, and S. Staab. F–A Model of Events based on the Foundational Ontology DOLCE+DnS Ultralite. Proc. of Knowledge Capture (K-CAP), pages 137–144, 2009.
[25] S. Shekarpour, S. Auer, A.-C. N. Ngomo, D. Gerber, S. Hellmann, and C. Stadler. Keyword-driven SPARQL Query Generation Leveraging Background Knowledge. Proc. of Web Intelligence and Intelligent Agent Technology, pages 203–210, 2011.
[26] C. Unger, L. Bühmann, J. Lehmann, A.-C. Ngonga Ngomo, D. Gerber, and P. Cimiano. Template-based Question Answering over RDF Data. Proc. of International Conference on World Wide Web, pages 639–648, 2012.
[27] C. Unger and P. Cimiano. Pythia: Compositional meaning construction for ontology-based question answering on the Semantic Web. Proc. of Applications of Natural Language to Information Systems, pages 153–160, 2011.
[28] C. Unger, A. Freitas, and P. Cimiano. An introduction to Question Answering over Linked Data. Reasoning Web Summer School, pages 100–140, 2014.
[29] R. Usbeck, A.-C. N. Ngomo, L. Bühmann, and C. Unger. HAWK – Hybrid Question Answering Using Linked Data. Proc. of European Semantic Web Conference, pages 353–368, 2015.
[30] W. Zheng, L. Zou, X. Lian, J. X. Yu, S. Song, and D. Zhao. How to Build Templates for RDF Question/Answering: An Uncertain Graph Similarity Join Approach. Proc. of International Conference on Management of Data, pages 1809–1824, 2015.
[31] C. Zhu, K. Ren, X. Liu, H. Wang, Y. Tian, and Y. Yu. A Graph Traversal-based Approach to Answer Non-Aggregation Questions over DBpedia. Proc. of Joint International Conference on Semantic Technology, pages 219–234, 2015.
[32] L. Zou, R. Huang, H. Wang, J. X. Yu, W. He, and D. Zhao. Natural Language Question Answering over RDF: A Graph Data Driven Approach. Proc. of ACM SIGMOD International Conference on Management of Data, pages 313–324, 2014.
Many real-life scenarios require the joint analysis of general knowledge, which includes facts about the world, with individual knowledge, which relates to the opinions or habits of individuals. Recently developed crowd mining platforms, which were designed for such tasks, are a major step towards the solution. However, these platforms require users to specify their information needs in a formal, declarative language, which may be too complicated for naïve users. To make the joint analysis of general and individual knowledge accessible to the public, it is desirable to provide an interface that translates the user questions, posed in natural language (NL), into the formal query languages that crowd mining platforms support. While the translation of NL questions to queries over conventional databases has been studied in previous work, a setting with mixed individual and general knowledge raises unique challenges. In particular, to support the distinct query constructs associated with these two types of knowledge, the NL question must be partitioned and translated using different means; yet eventually all the translated parts should be seamlessly combined to a well-formed query. To account for these challenges, we design and implement a modular translation framework that employs new solutions along with state-of-the art NL parsing tools. The results of our experimental study, involving real user questions on various topics, demonstrate that our framework provides a high-quality translation for many questions that are not handled by previous translation tools.