Matching Natural Language Relations to Knowledge Graph
Properties for Question Answering
Isaiah Onando Mulang’
University of Bonn
Bonn, Germany
mulang@iai.uni-bonn.de
Kuldeep Singh
Fraunhofer IAIS
Sankt Augustin, Germany
kuldeep.singh@iais.fraunhofer.de
Fabrizio Orlandi
Fraunhofer IAIS
Sankt Augustin, Germany
orlandi@iai.uni-bonn.de
ABSTRACT
Research has seen considerable achievements concerning the translation of natural language patterns into formal queries for Question Answering (QA) based on Knowledge Graphs (KG). One of the main challenges in this research area is how to identify which property within a Knowledge Graph matches the predicate found in a Natural Language (NL) relation. Current approaches for formal query generation attempt to resolve this problem mainly by first retrieving the named entity from the KG together with a list of its predicates, then filtering out one from all the predicates of the entity. We instead attempt to directly match an NL predicate to KG properties, in a way that can be employed within QA pipelines. In this paper, we specify a systematic approach and provide a tool that can be employed to solve this task. Our approach models KB relations with their underlying parts of speech; we then enhance this with extra attributes obtained from Wordnet and dependency parsing characteristics. From a question, we model a similar representation of query relations. We then define distance measurements between the query relation and the property representations from the KG to identify which property is referred to by the relation within the query. We report substantive recall values and considerable precision from our evaluation.
KEYWORDS
Knowledge Graph, Question Answering, Relation Extraction
ACM Reference format:
Isaiah Onando Mulang', Kuldeep Singh, and Fabrizio Orlandi. 2017. Matching Natural Language Relations to Knowledge Graph Properties for Question Answering. In Proceedings of Semantics2017, Amsterdam, Netherlands, September 11–14, 2017, 8 pages.
DOI: 10.1145/3132218.3132229
1 INTRODUCTION
The constantly growing amount of data and information available on the Web is driving research efforts on new, efficient solutions for finding the right information in ever-increasing data sources. Question Answering (QA) systems, which automatically translate natural language questions posed by humans into complex queries
over knowledge bases, facilitate users' access to increasingly large and complex knowledge bases. Despite their apparent success with popular commercial products such as the Google Assistant¹ and Amazon's Alexa², QA systems still present many challenges³.
Typical QA processes consist of different tasks such as named entity identification, named entity disambiguation, relation extraction and linking, query generation, query processing and answer generation [25]. In this work we focus on the particular step of relation extraction for natural language questions. We can define this as the process required to identify semantic relations between named entities within a question expressed in natural language. Relation extraction is not a new topic in the Natural Language Processing (NLP) research field [31]. However, novel solutions are currently being investigated when attempting to answer natural language questions using facts contained in a Knowledge Graph (KG) [18].
Knowledge graphs, such as DBpedia [1] or Google's Knowledge Graph⁴, are gaining increasing importance, especially for QA systems, as they are (i) very extensive sources of facts, (ii) already structured, (iii) constantly growing/updated and (iv) publicly available on the Web. However, QA over KGs presents additional challenges: KGs are usually quite large and difficult to query and process, and lexical forms for relations expressed in a question can differ from those used in the KG (usually referred to as the Lexical Gap [15]).
Figure 1: Question relations vs KG properties, an example
In addition to these challenges, in this paper we address a further relevant problem. Question answering systems implement QA tasks either by dedicating individual components in their architecture to each task or by combining a few tasks together in their implementation. In component-based QA systems and frameworks like OKBQA⁵, QANARY [5], QALL-ME [9] and openQA [19], researchers have implemented individual components dedicated to particular
1hps://assistant.google.com/
2hps://developer.amazon.com/alexa
3
See for example relevant research workshops at SIGIR (hp://sigir2017.okbqa .org/)
and ESWC (hps://project-hobbit.eu/challenges/qald2017/).
4hps://www.google.com/intl/es419/insidesearch/features/search/knowledge.html
5hp://www.okbqa.org/
tasks. Stanford NER⁶, NERD⁷, Alchemy API, FOX⁸ and AGDISTIS⁹ are some of the most popular dedicated tools/components for specific tasks like named entity recognition and named entity disambiguation in QA systems. However, to the best of our knowledge, there is no independent web service/tool/component that performs relation extraction for natural language questions over KGs. We identify this as a major research gap in collaborative question answering system development. The creation of a standalone and reusable component for relation extraction and linking in this context would not only facilitate reuse of the component in different QA systems but also create a benchmark for the research community for future comparison and evaluation.
In this paper we propose a novel approach, and an implementation, that addresses some of the aforementioned challenges: (1) it is capable of dealing with large KGs such as DBpedia; (2) it addresses the lexical gap problem through the combination of different similarity measures; (3) it is designed as an independent component that can be easily reused in different QA systems.
Current approaches for relation extraction over KGs attempt to first retrieve from the KG the named entities identified in a question, together with a list of their KG predicates, and then select one from all the predicates of the entity. In our approach, we match natural language relations (or predicates) extracted from the questions directly with KG properties, so that the component can be employed within QA pipelines. First, we model KB properties with their underlying parts of speech. These are then enhanced with extra attributes obtained from taxonomies like Wordnet¹⁰ and dependency parsing characteristics. Second, from a question, we model query relations using a similar representation. Third, we define similarity measures between the question query relations and the KG property representations to identify which property is referred to by each relation within the question. We exclude usage of PATTY [12], a large corpus of relational patterns and associated DBpedia predicates, due to its noisy behavior. For example, for an input question Who is the wife of Donald Trump?, the natural language pattern wife of appearing in the question is associated in the PATTY corpus with DBpedia relations like dbo:parent, dbo:predecessor, dbo:successor, dbo:child, associatedMusicArtist and many others. Hence, direct usage of the PATTY knowledge base would introduce more noise into the retrieved relations for an input question rather than improving overall performance. It is important to note that PATTY is not a relation linking tool, but rather a knowledge base of semantically typed relational patterns which may be used in QA systems to implement the relation linking task.
For our work, we performed an evaluation using the QALD-5 dataset [27], which consists of over 400 questions together with the corresponding formal queries (SPARQL) to be applied against DBpedia. Positive results have been shown in this evaluation in terms of accuracy (reaching almost 48% precision on questions containing one relation) and especially in terms of recall (75% recall on questions containing one relation).
6hps://nlp.stanford.edu/soware/CRF-NER.shtml
7hp://nerd.eurecom.fr/
8hp://aksw.org/Projects/FOX.html
9hp://aksw.org/Projects/AGDISTIS.html
10
”About WordNet”. WordNet, Princeton University. 2010.
hp://wordnet.princeton.edu
The rest of the paper is structured as follows. Section 2 provides a concrete example for this work and Section 3 summarizes the related work in the areas of question answering and relation extraction. In Section 4 the overall proposed approach is described, and in Section 5 the evaluation setup and the results of the experiments are explained before concluding the paper.
Figure 2: Problem example with the question: "What is the capital of Australia?"
2 MOTIVATING EXAMPLE
We motivate our work by considering a natural language question such as "What is the capital of Australia?" posed to a QA system, as shown in Figure 2. For this question, capital of is the natural language (NL) relation. In the QA domain, a relation extraction process goes a step further than a typical relation extraction task in NLP and links the relation identified in an input question to its mentions in a KB (e.g. DBpedia, Freebase, etc.) available on the Web. In our example, the entity Australia has the DBpedia property dbo:capital, which needs to be mapped to the relation capital of by a relation mapping tool/component of any question answering system. Hence, the input for a relation mapping tool is an NL question and the output is the RDF property, in a knowledge graph, of the associated named entity. As such, for the exemplary question What is the capital of Australia?, the expected output from a relation linking/extraction tool is the property http://dbpedia.org/ontology/capital (when using DBpedia as KB).
3 RELATED WORK
Relation extraction is a well-known task in natural language processing (NLP). This task was first formulated as part of the Message Understanding Conference (MUC) in 1998 [31]. In the fields of NLP and machine learning, researchers have addressed this problem using different approaches. The work in [31] introduces a kernel-based machine learning method for relation extraction in given natural language text. RelEx [10] uses dependency parse trees and applies a few simple rules to these trees to extract relations from free text.
Relation extraction in natural language text also finds applicability in the field of question answering. PATTY [20] is a popular work which is used in many question answering systems for linking relations to knowledge base properties. PATTY mines semantically-typed relational patterns from large corpora. However, it cannot be used directly as a component in a QA system, but needs to be modified based on individual developer requirements. For example, the AskNow QA system [7] has a dedicated component
Figure 3: Overall relation matching system architecture: from a question (Q-Text) as input to a ranked list of top K properties
in the KG matching the relations in the input question
for the relation extraction task that uses PATTY as the underlying large corpus to find semantic relational patterns. TBSL [26] and LODQA [13] implement a two-step process to directly translate a natural language question into a SPARQL query. During this translation process, TBSL uses BOA pattern [11] identifiers to detect properties (i.e. relations) that connect the entities present in the input question. Moreover, additional work such as [30] presents a question answering approach over Freebase that implements a neural network-based relation extractor to retrieve answers from Freebase. Although these QA approaches implement relation extraction and linking tasks, due to the monolithic implementation of their QA pipelines it is not trivial to reuse this specific module in other QA approaches, for example in frameworks such as OKBQA¹¹, QANARY [5] and openQA [19] that allow QA developers to build QA systems or pipelines adopting many existing question answering components. These frameworks provide an infrastructure allowing developers to implement QA tasks as individual modules.
4 APPROACH
We approach the problem of matching NL relations to KB properties by processing the two complementary sides of the problem, namely the natural language (query) side and the knowledge graph side. The aim is to provide a similar representation for both sides that lends itself easily to comparison. We then employ a set of syntactic and semantic similarity measures to select the property that best matches each relation in the question. Figure 3¹² depicts the overall structure of the system.
4.1 KG Properties Expansion
A property in a KG is defined as a directed labeled edge between two nodes of the graph, identified via a unique URI. Properties can be viewed at two levels within a KG. On one level they are conceptual, as found within the structural definition of the KG; in this case they connect two concepts that are referred to as the domain and the range of the property. The domain and range of a property are conceptual representations of real-world entities. A second view of a property is as a predicate within a factual instance in the KG, in which the property URI is a link between two entity objects that are themselves instances of the domain and range. Since the target of our work is to produce a tool that can be used within QA pipelines, we adopt the first view in this work, with the understanding that the second view would demand that the named entities be disambiguated first before the properties can be matched.
11 http://www.okbqa.org/
12 Numbers 4.1 to 4.3 in Figure 3 indicate the respective section in the paper where each component is described
We develop a data structure, which we refer to as the Expanded Properties Set (EPS), that contains a URI for each property within the KG (in our experiment, DBpedia properties), augmented with characteristics present within the KG and annotations obtained from syntactic analysis. At this stage (to retain the structure of the EPS and reduce the memory load time) we only consider extracting synonyms and hyponyms from a taxonomy like Wordnet and ignore elements related to derivational forms. We observe that hypernyms are not required on the properties side of the relation matching process, owing to the design characteristics of a KG, which entails a taxonomical relationship in which properties are defined as classes within a hierarchy. For example, the property dbo:child is a more general concept and would match its hyponyms son and daughter. In case the question requires a hypernym of this relation (e.g. dbo:relative), the design structure already captures this hierarchy.
A similar approach was employed by Beaumont et al. [2], in which they enhance property labels obtained from the KG with variations from Wordnet. This is necessary since the relation in natural text often does not map directly to the label of the desired property (i.e. the lexical gap). For example, the property spouse does not match its natural language forms wife of, husband of or married to. Considering two related concepts, we can enhance the matching of the relation to the property in the KG with a set of natural language patterns that are commonly used to refer to that property [28]. The label attribute of the property provides a natural language mention of the property, which is commonly one to three words. In this work, we also consider the comment attribute related to each property in the KG. The comment attribute of an element provides additional textual information about the given property.
In DBpedia there are two sets of properties, which can be found either in the DBpedia ontology namespace (dbo¹³) or in the DBpedia properties namespace (dbp¹⁴). Out of a possible total of 63,764 items classified as properties in the DBpedia ontology, only about 3,500 have instances within the KG. We identify 2,795 properties¹⁵ defined within dbo as key properties for our experiments and fetch the instantiated properties from dbp, leading to a total of 4,748 properties represented in the EPS. We consider these properties sufficient to answer questions on DBpedia, since questions would demand properties that have participated in at least one factual instance within the KG.
13 dbo stands for: http://dbpedia.org/ontology/
14 dbp stands for: http://dbpedia.org/property/
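As an illustration, the per-property attributes that feed the EPS (label, domain, range, comment) can be pulled from the public DBpedia SPARQL endpoint. The following is a minimal sketch, not the tool's actual loader: the endpoint URL, the restriction to owl:ObjectProperty/owl:DatatypeProperty and the LIMIT are assumptions made for illustration.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint and query shape; the paper's actual extraction
# pipeline is not published in this form.
QUERY = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?p ?label ?dom ?rng ?comment WHERE {
  { ?p a owl:ObjectProperty } UNION { ?p a owl:DatatypeProperty }
  ?p rdfs:label ?label . FILTER(lang(?label) = "en")
  OPTIONAL { ?p rdfs:domain/rdfs:label ?dom . FILTER(lang(?dom) = "en") }
  OPTIONAL { ?p rdfs:range/rdfs:label ?rng .  FILTER(lang(?rng) = "en") }
  OPTIONAL { ?p rdfs:comment ?comment .       FILTER(lang(?comment) = "en") }
  FILTER(STRSTARTS(STR(?p), "http://dbpedia.org/ontology/"))
}
LIMIT 100
"""

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery(QUERY)
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["p"]["value"], "|", row["label"]["value"],
          "|", row.get("dom", {}).get("value", ""),   # domain label, if declared
          "|", row.get("rng", {}).get("value", ""))   # range label, if declared
```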
Formally, a property p ∈ P, where P is defined in a graph G = {S × P × O} as the set of all properties in G, is expanded into a septuple (ρ, β, λ, ω, c, µ, A) such that:
ρ ← the URI of the property in the KG
β ← the text label referring to the domain of the property
λ ← the text label of the property
ω ← the label referring to the range of the property
c ← the count of instances in the KG containing the property
µ ← a ratio associating the unique subjects and unique objects instantiated by the property
A ← annotations derived from syntactic analysis of the sentence constructed from the other attributes
All the elements of a property are obtained directly from the KG except the annotations A. To produce A, we construct a derived sentence by concatenating a section of the tuple: β acts as the subject, λ the relation and ω the object, with the comment appended as a descriptive text of the relation, separated by a comma. For example, for the property with λ as capital, β as "PopulatedPlace" and ω as "city", we construct the text: Populated place capital city. For this relation, there is no comment represented in the KG. To elaborate the role of comments, consider the property dbo:spouse, which has both the β and ω elements of value Person from the class dbo:Person. The derived sentence Person spouse Person, the person they are married to. contains a comment that complements the basic triple elements. The sentence is not grammatically complete, but rather has a form that suggests the syntactic structure.
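As a concrete sketch, an EPS entry and its derived sentence could be represented as below. This is a minimal illustration, not the tool's actual data structure: the Python field names, the instance count and the µ value are invented; only the septuple elements themselves come from the definition above.

```python
from dataclasses import dataclass, field

@dataclass
class ExpandedProperty:
    rho: str             # URI of the property in the KG
    beta: str            # domain label
    lam: str             # property label
    omega: str           # range label
    count: int           # instances in the KG containing the property
    mu: float            # unique-subjects / unique-objects ratio
    comment: str = ""    # rdfs:comment, if any
    annotations: dict = field(default_factory=dict)   # A: POS tags, synonyms, hyponyms

    def derived_sentence(self) -> str:
        """'domain-label property-label range-label[, comment]' as described above."""
        s = f"{self.beta} {self.lam} {self.omega}"
        return f"{s}, {self.comment}" if self.comment else s

spouse = ExpandedProperty(
    rho="http://dbpedia.org/ontology/spouse",
    beta="Person", lam="spouse", omega="Person",
    count=141_000, mu=1.02,                           # invented values for illustration
    comment="the person they are married to")
print(spouse.derived_sentence())
# -> Person spouse Person, the person they are married to
```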
4.2 Q-Rel Extraction
The Q-Rel Extraction module receives a question text in a given natural language (in our context, English) and produces a tuple representation of the question containing attributes that are later used to derive a similarity score. Given that questions are often succinct and may lack some distant syntactic and semantic associations that would normally be present in free text, while also inherently containing implicit or explicit characteristics that may not be exhibited in free text, we make some assumptions and formulate constraints that assist in representing a question. We observe that relation extraction for communicating with a KG, as required in the question answering domain, is substantially different from general relation extraction tasks in Open IE. Often, the binary relations extracted from natural text do not suggest their relation to semantic components in a KG. It is therefore beneficial, in some cases, to readjust binary relations based on other characteristics within the text. According to Beaumont et al. [2], a set of phrases within the question can be determined that correspond to semantic
15 This figure can be obtained from: http://wiki.dbpedia.org/services-resources/ontology
Figure 4: Simple question dependency parse trees, panels (a) and (b)
components (entity, property and class). In our work, we consider properties as the major semantic component of interest.
We assume that a question is either a simple question or a variably connected set of simple questions. A simple question is a question which exposes only one unique relation (Bordes et al. [4]; Lukovnikov et al. [18]), and as such the relation can only match one unique property in the KB. Each simple question has a desire, i.e. the type of answer expected [12]; a binary relation, which can be represented in logical form rel(x, y), in which rel describes the relationship between known or unknown entities x and y [17]; and a set of assisting words and symbols. This set of words can be further viewed as named-entity nouns, non-named-entity nouns and helper words.
In this work, we represent a simple question as a single relation, hereafter referred to as Q-Rel. Formally, a Q-Rel is an octuple (δ, η, α, ℓ, γ, E, N, Υ) where:
δ ← the question desire
η ← the direct helper word to the relation
α ← the relation words in the question
ℓ ← the left element in the relation, or the relation head [28]
γ ← the right element of the relation, or the relation tail [28]
E ← a possibly empty set of named entities, where e ∈ E ⇒ e ∉ {ℓ, γ}
N ← a possibly empty set of non-entity nouns, s.t. e ∈ N ⇒ e ∉ {ℓ, γ}
Υ ← a possibly empty set of helper words, such as a dependency preposition

Given the simple question What is the capital of Australia?, with the dependency parse tree in Figure 4a, the attributes would take the following values: δ ← "location"; η ← "is"; α ← "capital"; ℓ ← null; γ ← "Australia"; E ← null; N ← null; Υ ← {of}.
For this example, the root capital of the dependency parse is also the relation word in the Q-Rel. The relation in the question could differ from the root of the dependency tree if the question were asked differently, e.g. What is the capital city of Australia?, as shown in Figure 4b. We overcome this difference at the dependency adjustment stage.
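A minimal sketch of the octuple as a data structure, populated with the values from the example above; the attribute names (head for ℓ, tail for γ, helpers for Υ) are our own renderings of the Greek symbols.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class QRel:
    delta: str                                        # question desire (expected answer type)
    eta: Optional[str]                                # direct helper word to the relation
    alpha: str                                        # relation word(s) in the question
    head: Optional[str]                               # left element of the relation (ell)
    tail: Optional[str]                               # right element of the relation (gamma)
    entities: Set[str] = field(default_factory=set)   # E: named entities outside {head, tail}
    nouns: Set[str] = field(default_factory=set)      # N: non-entity nouns outside {head, tail}
    helpers: Set[str] = field(default_factory=set)    # Upsilon: helper words, e.g. prepositions

# "What is the capital of Australia?"
q = QRel(delta="location", eta="is", alpha="capital",
         head=None, tail="Australia", helpers={"of"})
```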
4.2.1 Question Desire. The question desire, sometimes called the question answer type [23], is a classification that denotes what kind of answer is expected from the question. The task of question answer type identification is well studied, with several approaches proposed. For this task, we modify an existing implementation available online¹⁶ that is based on Li and Roth [16]. The method trains a Support Vector Machine (SVM) classifier using the TREC¹⁷ dataset, with 94% accuracy on coarse classes and 88% on fine classes. The SVM is a maximum-margin classifier in which a function is defined that transforms the training vectors by mapping them into a higher-dimensional space, and then finds, in this higher-dimensional space, a hyperplane that obtains the widest margin of separation. For a clearer explanation of the usage of SVMs for question classification see [32]. We only employ the coarse model for our classification, since the six classes (location, human, abbreviation, entity, number, and description) can be matched to the domain or range of a property. For the question How many people live in the capital of Australia? we obtain the question type number; on the other hand, for the sub-question What is the capital city of Australia? we obtain the desire location.
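As a rough stand-in for the reused implementation, a coarse question classifier can be sketched with scikit-learn as follows. The four training questions and labels are toy placeholders for the TREC data, and the pipeline (TF-IDF features plus a linear SVM) only approximates the feature set of Li and Roth.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the TREC training data (coarse classes only).
train_q = ["What is the capital of Australia ?",
           "Who was the first president of the USA ?",
           "How many people live in Berlin ?",
           "What does NASA stand for ?"]
train_y = ["LOC", "HUM", "NUM", "ABBR"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_q, train_y)
print(clf.predict(["How many people live in the capital of Australia ?"]))
# e.g. ['NUM']; with the full TREC data the coarse classes separate far better
```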
4.2.2 Dependency Adjustment. Rules have been used in several relation extraction tasks, either for directly identifying relations (Nebhi [21]) or for complementing machine learning algorithms. In this work, we apply rules in two ways: i) rules for reducing multi-relation questions into constituent single-relation questions for ease of processing, and ii) rules for readjusting the relation word in the Q-Rel. To derive simple relations from multi-relation questions, we must first partition the question into simple questions that translate into Q-Rels. Based on the initial parse characteristics of a question, we identify the following four elements of complex questions as opportunities for decomposition into constituent simple questions. Three of these are largely inspired by the work of Reddy et al. [24], who employ linguistic constructs to derive logical forms from dependency parses. Of relevance to our work is their interpretation of adjectival clauses, prepositional phrases and conjunctions. We add an extra adjustment consideration based on possessive structures.
Only the relative clauses require recursive processing, since the other three lend themselves directly to relations. An adjectival clause, also called a relative clause [8, 22], is introduced by the relative pronouns who, whom, whose, which, that, etc. Regardless of whether a relative clause is defining or non-defining, it forms a separable, independent section of a sentence. The relative clause attachment is then considered so as to be able to prepend the subject of the clause. Taking the question Who was vice president under the president who approved the use of atomic weapons against Japan during World War II?, a relative clause begins after the president; we can therefore process this question by analyzing two different statements: i. Who was vice president under the president, and ii. The president approved the use of atomic weapons against Japan during World War II.
16 https://github.com/nausheenfatma/QuestionClassification
17 http://trec.nist.gov/data.html
Figure 5: Generation of a Q-Rel
e rst part has only one relation vice president while the sec-
ond part of this question produces several relations due to the
preposition rule discussed hereaer. All of these prepositions have
the same aachment on the verb use as in use of,use during,use
against which we resolve into one relation with
α
as use. Eventually,
when we processed this part of the relation, it has no match on any
relation in the KG. In this context this information is contained as
description of an entity rather than a relation. e entity in this
question is dbr:Harry S. Truman
For questions with irregular forms, such as forms of the verbs have, to be and to do as part-modifiers, the parsers could return these modifiers as the root of the question. We then apply an adjustment rule that seeks the main verb of the question. For example, for the question Which movies did Kurosawa direct?, the dependency tree returns the token did as the root, while the relation word sought is the word direct.
Prepositional phrase attachments denote a wide range of relations such as time, possession, containment and locality. All unique instances of prepositional phrase attachment are considered as instances of Q-Rel. For the question How many people live in the capital city of Australia?, we derive two Q-Rels based on the two prepositions in and of: live in(people, X) and capital of(X, Australia). We add extra complementary words to the set N of non-named entities according to the type of preposition; for example, the preposition in associated with a location, or one that has a dependency with the word where, would introduce the two words location and place if they did not already exist in the set N. Adjustments are made appropriately if the preposition denotes time or position, etc. Also considered are possessive constructs, in which the object of the possession becomes the relation, as seen in the question What was Brazil's lowest rank in the FIFA World Ranking?, where ranking forms α and lowest forms η in the Q-Rel. A gazetteer of country names and their derived forms is introduced to evaluate all named entities of type location, and for those that resolve to country names, we add the word country to the set of non-named entity nouns N, as seen in Figure 5.
After producing the Q-Rel we maintain the associated annotations related to the POS sequence and the bag-of-words features.
Figure 6: Similarity measures: S_path - Wordnet path similarity, S_wup - Wu-Palmer similarity, S_lch - Leacock-Chodorow similarity, L_w - Levenshtein weight obtained from the Levenshtein similarity (Lev), p_u - property unigrams, r_u - query relation unigrams, p_b - property bigrams, r_b - query bigrams
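The preposition rule above (one candidate Q-Rel per prepositional attachment) can be sketched with spaCy's dependency parser. This is an illustrative approximation, not the system's parser: the model choice is assumed and the handling of compound nouns (capital city) is simplified.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumed model; any English pipeline works

def preposition_qrels(question: str):
    """One candidate Q-Rel per prepositional attachment (rough sketch)."""
    doc = nlp(question)
    rels = []
    for tok in doc:
        if tok.dep_ == "prep":                          # preposition attached to a head word
            objs = [c for c in tok.children if c.dep_ == "pobj"]
            if objs:
                # (relation word, preposition, attached object)
                rels.append((tok.head.lemma_, tok.text, objs[0].text))
    return rels

print(preposition_qrels("How many people live in the capital city of Australia ?"))
# roughly: [('live', 'in', 'city'), ('city', 'of', 'Australia')]
# The paper's adjustment rules would then rewrite these towards
# live in(people, X) and capital of(X, Australia).
```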
4.3 Similarity Measurement
In this section we take the Q-Rel from the Q-Rel extractor and match it against the properties in the EPS using a set of similarity measures, described below. Four of these similarity measures are applied on the Wordnet taxonomy graph. The result of the combination of these measures is a value that indicates how similar the Q-Rel is to a given property. Every property is then associated with a similarity value, which is used to rank the properties. The result is a list of the top-k ranked property URIs. Figure 6 indicates which elements from the two tuples are matched against each other. Each similarity measure is numbered in the figure with labels m1 to m9 and described as follows.
Wordnet Path Similarity — ps (m1, m2):
The path similarity is a score between 0 and 1, measured according to the behavior of the conceptual distance between two nodes in the taxonomy as a factor of the number of edges separating them in the hierarchy [6]. Given two senses, the shortest path (len(r1, r2)) that connects the senses in the is-a taxonomy determines ps; ps = 1 implies the two senses are identical. Generally, the path similarity (ps) is defined as:

ps(r1, r2) = 2 · max_depth − len(r1, r2)

where max_depth is a constant representing the maximum depth of the Wordnet graph. In Figure 6, ps is used to obtain the values of m1 and m2.
Wu-Palmer Similarity (m3) [29]:
A measure that takes into consideration the Least Common Subsumer (LCS) of two senses, which is by definition the common ancestor deepest in the taxonomy, not necessarily closest to the two senses. If multiple candidates for the LCS exist, the one whose shortest path to the root node is longest is selected; that is, the longer path is chosen for the calculation in situations where the LCS has multiple paths to the root.
Leacock-Chodorow Similarity (m4) [14]:
A similarity score relating the shortest path connecting two senses to the maximum depth of the taxonomy in which the senses occur, expressed as −log(p/2d), where p is the shortest path length and d the taxonomy depth. Since the highest value of this measure is 3.6375, we normalize the value by expressing it as a ratio of this maximum (3.6375).
Derivational forms (m5):
Derivational forms of a word are terms belonging to different syntactic categories but having the same root form and a semantic relation. For example, the word spouse is a noun but has the derived form espouse, a verb, which has a closer semantic relation to the verb marry. The other semantic measures would miss this relationship. This measure produces m5 in Figure 6.
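The WordNet-based measures m1–m5 are all available through NLTK's WordNet interface; a minimal sketch, assuming the wordnet corpus has been downloaded:

```python
from nltk.corpus import wordnet as wn   # assumes nltk.download('wordnet') has been run

s1, s2 = wn.synset("wife.n.01"), wn.synset("spouse.n.01")

print(s1.path_similarity(s2))           # m1/m2: path similarity in [0, 1]
print(s1.wup_similarity(s2))            # m3: Wu-Palmer, via the depth of the LCS
print(s1.lch_similarity(s2) / 3.6375)   # m4: Leacock-Chodorow, normalized by its maximum

# m5: derivational forms, e.g. the noun 'spouse' may relate to the verb 'espouse'
for lemma in s2.lemmas():
    print(lemma.name(), "->", [d.name() for d in lemma.derivationally_related_forms()])
```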
Binarized Levenshtein Similarity (m6):
We define our Levenshtein similarity measure as:

lev_sim(a1, a2) = (max(|a1|, |a2|) − lev(a1, a2)) / max(|a1|, |a2|)

In our work we employ the Levenshtein edit distance (lev) for word similarity on the lemmatized forms of λ and α, as well as η. In cases where both elements contain values or consist of more than one word token each, we apply the Levenshtein distance iteratively. We represent this distance as either 1 or 0 depending on the nature of the two lemma forms and the extent of the dissimilarity. Take as an example α = "discovered", with lemma form "discover", against λ = "discoverer" (dbo:discoverer), whose lemma form remains "discoverer" under the Wordnet lemmatizer. The Levenshtein distance in this case is 2, giving the Levenshtein similarity (10 − 2)/10 = 0.8. In this case we require the similarity to be 1. Therefore, the binarized Levenshtein similarity is given by:
lev(a1, a2) = 1, if lev_sim(a1, a2) > 0.75 and a1 ≠ a2; 0, otherwise

The value 0.75 is obtained from an evaluation of words whose verb and noun forms give different lemma forms. A list of these words can be found in our GitHub repository, provided hereafter.

Table 1: Performance Evaluation

                        Cumulative Frequency at Rank Positions              Mean Precision @k   Recall @k   F-Measure
Num Properties   Total  Rank#1  Rank#2  Rank#3  Rank#4  Rank#5  Rank#10     #1       #10        #10         #10
1 Property       285    136     154     174     190     199     212         47.72%   55.69%     74.39%      63.70%
2 Properties     82     32      34      37      39      40      51          39.02%   43.63%     62.20%      51.29%
                        24      31      40      43      44      55          29.27%   39.69%     67.07%      49.87%
3 Properties     9      0       1       1       1       1       1           0.00%    11.11%     11.11%      11.11%
                        2       3       3       4       4       4           22.22%   30.55%     44.44%      36.21%
                        3       3       3       5       5       6           33.33%   38.89%     66.67%      49.12%

(For the 2- and 3-property categories, each row reports one of the relations required by the question.)
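A self-contained sketch of m6, reproducing the discover/discoverer example; the helper names are ours:

```python
def lev(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def lev_sim(a: str, b: str) -> float:
    m = max(len(a), len(b))
    return (m - lev(a, b)) / m if m else 1.0

def binarized_lev(a: str, b: str, threshold: float = 0.75) -> int:
    """1 if the lemmas are distinct but highly similar, else 0 (the paper's m6);
    identical lemmas are assumed to be caught by exact matching elsewhere."""
    return 1 if a != b and lev_sim(a, b) > threshold else 0

print(lev_sim("discover", "discoverer"))        # 0.8
print(binarized_lev("discover", "discoverer"))  # 1
```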
Instances count measure (m7):
We define a new measure related to the number of instances in the KG in which the property participates. Given the total number of instances for the property as c, the number of unique subjects in these instances as s, and the number of unique objects as o, we first define a ratio µ = s/o. We then use this ratio to penalize a value obtained from the total number of instances, as follows:

m7 = (c / Σᵢ cᵢ) · µ

where the sum runs over the instance counts cᵢ of the candidate properties.
Unigrams and Bigrams (m8, m9):
We obtain normalized values related to the size of the intersection between two pairs of unigrams, pu and ru, as well as bigrams, pb and rb, derived from the question words and the KG properties. From the unigram set, we first remove stop words and require it to contain unique values. The bigrams are derived from the sequence of the POS tags in the sentences. The length of the intersection is then expressed as a fraction of the length of the question unigram or bigram set respectively.
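A sketch of how the two normalized intersections might be computed; the stop-word list is an illustrative placeholder:

```python
STOP = {"the", "a", "an", "of", "is", "in", "to"}       # illustrative stop list

def unigram_overlap(question_tokens, property_tokens):
    """m8: |intersection| as a fraction of the question unigram set."""
    q = {t.lower() for t in question_tokens} - STOP
    p = {t.lower() for t in property_tokens} - STOP
    return len(q & p) / len(q) if q else 0.0

def pos_bigram_overlap(question_pos, property_pos):
    """m9: same idea over bigrams of POS tags."""
    qb = set(zip(question_pos, question_pos[1:]))
    pb = set(zip(property_pos, property_pos[1:]))
    return len(qb & pb) / len(qb) if qb else 0.0

print(unigram_overlap(["capital", "of", "Australia"],
                      ["PopulatedPlace", "capital", "city"]))   # 0.5
```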
Overall aggregation of similarity measures:
Taking the similarity measures as a vector m, such that m_i refers to the value of the similarity measure at position i in m, we define the overall aggregated similarity score as a weighted sum:

Score_sim = w · m^T = Σ_{i=0}^{n} w_i · m_i

For this work we assume the measures are all equally weighted, but we observe that these weights could easily be learned, for instance via a least squares optimization method.
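The aggregation then reduces to a dot product; a sketch with equal weights by default (the measure values below are invented for illustration):

```python
from typing import Optional, Sequence
import numpy as np

def aggregate_score(m: Sequence[float], w: Optional[Sequence[float]] = None) -> float:
    """Score_sim = w · m^T; equal weights unless a learned weight vector is given."""
    m = np.asarray(m, dtype=float)
    w = np.full(len(m), 1.0 / len(m)) if w is None else np.asarray(w, dtype=float)
    return float(w @ m)

# Rank candidate properties by their aggregated score; all values invented.
scores = {
    "dbo:capital":     aggregate_score([0.9, 0.9, 0.8, 0.7, 0.0, 1.0, 0.4, 0.5, 0.3]),
    "dbo:largestCity": aggregate_score([0.5, 0.5, 0.6, 0.5, 0.0, 0.0, 0.3, 0.2, 0.1]),
}
top_k = sorted(scores, key=scores.get, reverse=True)[:10]
print(top_k)
```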
5 EVALUATION
5.1 Experiment Setup
For our evaluation we used the QALD-5 dataset [27], which consists of 404 questions together with the corresponding formal queries (SPARQL) to be applied against the DBpedia ontology [1, 3]. We did not use later versions of QALD because our aim is to view the performance of the relation linking tool in contrast with the overall performance of QA systems, and in later versions of QALD not many QA systems have participated. Of the total, 28 questions were out of scope and had no corresponding SPARQL query. Since our work focuses on providing an independent and reusable tool that identifies the URI of a property for pipelining in QA systems, we annotate the questions with the properties mentioned in the SPARQL queries to form the evaluation dataset.
The 376 viable questions are grouped into three categories based on the number of properties required within the SPARQL queries. A total of 285 questions require only one single property to be matched within the query. A further 82 questions require two properties to be matched, and 9 of the questions have 3 properties. We evaluate the relations extracted from the natural language questions against the properties within the SPARQL queries. Running on a 4-core CPU (at 1.7 GHz) with 8 GB of memory, each question requires on average 48 seconds to return an answer. The source code is available on GitHub¹⁸ and a detailed description of the practical implementation is online at the project wiki:
https://github.com/mulangonando/ReMatch/wiki.
5.2 Results
Table 1 illustrates our empirical results. For insightful understanding, the QALD questions are grouped into three categories. The first row of the table describes the category of questions which contain only one relation (1 Property), for example Who is the wife of Donald Trump?. For such questions, our tool has a precision of 47.72 percent when the correct result is at the first position in the final list of answers, and 55.69 percent average precision for the top 10 properties. Recall and F-measure are also considerably high for this type of question, with values equal to 74.39% and 63.70% respectively.
For questions such as How many people live in the capital city of Australia?, the expected properties from DBpedia are two: populationTotal and capital. For such questions (2 Properties), our tool provides an overall precision of 39.02 percent for the relation occurring at the first instance. In our example question, populationTotal represents the first relation of the input question. For questions such as Which telecommunications organizations are located in Belgium?, which has three properties (3 Properties), namely rdf:type, dbo:industry, and dbo:location or dbp:locationcountry, precision and recall values decrease considerably.
We analyzed the overall precision and recall values of the systems which took part in the QALD-5 challenge. We can observe that if our tool were used as a component to identify relations for input questions, it would not decrease the overall precision and recall values of many of the systems, such as SemGraphQA, YodaQA and QAnswer, as our tool has higher precision and recall values than many of these systems [27]. Besides Xser and AskNow, all other QA systems evaluated over QALD-5 have precision lower than 0.40 [7, 27]. Furthermore, we are not aware of any other independent relation linking tool with which we can compare our performance.
18 https://github.com/mulangonando/ReMatch
Overall, when aiming for component-based QA systems using frameworks like Qanary [5] or OKBQA¹⁹, our tool would improve the overall performance of the QA system for questions having single and double relations. However, for questions with three relations, our tool would affect the overall performance of the QA system negatively.
19 http://www.okbqa.org/
6 CONCLUSIONS AND FUTURE WORK
This paper presented an approach, and an independent reusable tool, for matching natural language relations to KB properties for KG-based question answering pipelines. The tool employs dependency parse characteristics with adjustment rules, and then carries out a match against KG properties enhanced with the Wordnet lexicon via a set of similarity measures. Our approach loses precision in cases where the targeted KG property has little textual augmentation and where the question is too short to represent a considerable amount of information in the Q-Rel, as seen with the question Give me all Cosmonauts. The major challenge in such scenarios is the lack of tailored text corpora that can be used to train a learning algorithm.
As future work, we aim to fine-tune the similarity measures by learning the weights through known least squares optimization approaches and to evaluate the results against our current results as a benchmark. We have identified the current use of embeddings, on both the NLP and the KG sides of the NLP-KG divide, coupled with neural-network-based deep learning approaches, as a promising avenue for better precision. In cases where we achieve recall but the desired property has not been ranked at the top of the results, an approach would be determined to better rank the final result set.
7 ACKNOWLEDGMENT
This project has received funding from the DAAD (Deutscher Akademischer Austauschdienst).
REFERENCES
[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. DBpedia: A Nucleus for a Web of Open Data. Springer, Berlin, Heidelberg, 722–735.
[2] Romain Beaumont, Brigitte Grau, and Anne-Laure Ligozat. 2015. SemGraphQA@QALD-5: LIMSI participation at QALD-5@CLEF. CEUR Workshop Proceedings 1391, 1 (2015).
[3] Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web 7, 3 (2009), 154–165.
[4] Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale Simple Question Answering with Memory Networks. (6 2015). http://arxiv.org/abs/1506.02075
[5] Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saedeeh Shekarpour, Didier Cherix, and Christoph Lange. 2016. Qanary - a methodology for vocabulary-driven open question answering systems. In International Semantic Web Conference. Springer, 625–641.
[6] Alexander Budanitsky and Graeme Hirst. 2006. Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics 32, 1 (Mar 2006), 13–47.
[7] Mohnish Dubey, Sourish Dasgupta, Ankit Sharma, Konrad Höffner, and Jens Lehmann. 2016. AskNow: A Framework for Natural Language Query Formalization in SPARQL. (2016), 300–316.
[8] Claudia Felser, Theodore Marinis, and Harald Clahsen. 2003. Children's Processing of Ambiguous Sentences: A Study of Relative Clause Attachment. Language Acquisition 11, 3 (Jul 2003), 127–163.
[9] Óscar Ferrández, Christian Spurk, Milen Kouylekov, Iustin Dornescu, Sergio Ferrández, Matteo Negri, Rubén Izquierdo, David Tomás, Constantin Orasan, Guenter Neumann, and others. 2011. The QALL-ME framework: A specifiable-domain multilingual question answering architecture. Web Semantics: Science, Services and Agents on the World Wide Web 9, 2 (2011), 137–145.
[10] Katrin Fundel, Robert Küffner, and Ralf Zimmer. 2007. RelEx—Relation extraction using dependency parse trees. Bioinformatics (2007).
[11] Daniel Gerber and A.-C. Ngonga Ngomo. 2011. Bootstrapping the linked data web. In 1st Workshop on Web Scale Knowledge Extraction @ISWC.
[12] Sherzod Hakimov, Hakan Tunc, Marlen Akimaliev, and Erdogan Dogdu. 2013. Semantic question answering system over linked data using relational patterns. In Proceedings of the Joint EDBT/ICDT 2013 Workshops (EDBT '13), 83.
[13] Jin-Dong Kim and K. Bretonnel Cohen. 2013. Natural language query processing for SPARQL generation: A prototype system for SNOMED CT. In Proceedings of BioLINK. 32–38.
[14] Claudia Leacock and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. WordNet: An Electronic Lexical Database 49, 2 (1998), 265–283.
[15] Jung-Tae Lee, Sang-Bum Kim, Young-In Song, and Hae-Chang Rim. 2008. Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, 410–418.
[16] Xin Li and Dan Roth. 2002. Learning question classifiers. Proceedings of the 19th International Conference on Computational Linguistics 1, 1 (2002), 1–7.
[17] Percy Liang. 2013. Learning Compositional Semantics. (2013), 1–7.
[18] Denis Lukovnikov, Asja Fischer, Jens Lehmann, and Sören Auer. 2017. Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level. In Proceedings of the 26th International Conference on World Wide Web (WWW '17). ACM Press, New York, NY, USA, 1211–1220.
[19] Edgard Marx, Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Konrad Höffner, Jens Lehmann, and Sören Auer. 2014. Towards an open question answering architecture. In Proceedings of the 10th International Conference on Semantic Systems. ACM, 57–60.
[20] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. 2012. PATTY: a taxonomy of relational patterns with semantic types. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 1135–1145.
[21] Kamel Nebhi. 2013. A rule-based relation extraction system using DBpedia and syntactic parsing. CEUR Workshop Proceedings 1064 (2013), 1–6.
[22] Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, and Daniel Gerber. 2013. Sorry, I don't speak SPARQL: translating SPARQL queries into natural language. (2013), 977–988.
[23] John Prager, Jennifer Chu-Carroll, and Krzysztof Czuba. 2002. Statistical answer-type identification in open-domain question answering. In Proceedings of the Second International Conference on Human Language Technology Research, 150.
[24] Siva Reddy, Oscar Täckström, Michael Collins, Tom Kwiatkowski, Dipanjan Das, Mark Steedman, and Mirella Lapata. 2016. Transforming Dependency Structures to Logical Forms for Semantic Parsing. (2016).
[25] Kuldeep Singh, Ioanna Lytra, Maria-Esther Vidal, Dharmen Punjani, Harsh Thakkar, Christoph Lange, and Sören Auer. QAestro - Semantic-based Composition of Question Answering Pipelines. DEXA 2017.
[26] Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. Template-based Question Answering over RDF Data. In Proceedings of the 21st International Conference on World Wide Web (WWW '12). New York, NY, USA.
[27] Christina Unger, Corina Forascu, Vanessa Lopez, Axel-Cyrille Ngonga Ngomo, Elena Cabrio, Philipp Cimiano, and Sebastian Walter. 2015. Question answering over linked data (QALD-5). CEUR Workshop Proceedings 1391 (2015).
[28] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence. AAAI Press, 1112–1119.
[29] Zhibiao Wu and Martha Palmer. 1994. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. ACL, Morristown, NJ, USA.
[30] Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2016. Question answering on Freebase via relation extraction and textual evidence. arXiv preprint arXiv:1603.00957 (2016).
[31] Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research (Feb 2003).
[32] Zhiheng Huang, Marcus Thint, and Zengchang Qin. 2008. Question Classification using Head Words and their Hypernyms. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (October 2008), 927–936.
... There are also some works resorting to external taxonomies like Wordnet 1 . The synonyms, hyponyms, and variations are extracted to enhance the matching from phrases to predicates in the knowledge graph [3,24]. Another stream of researches perform entity linking (identifying the entity from the target knowledge graph that matches the input phrase) and relation linking as a joint task rather than taking them as separate tasks [11,26,38]. ...
... These relations correspond to the edges of meta patterns. There are a variety of resources and systems for single relation linking, e.g., SIBKB [30], BOA [15], and ReMatch [24]. All of these tools can be used to conduct the single relation linking. ...
... The previous work on relation linking can be divided into two groups, i.e., independent relation linking [24,25,30] and joint relation linking [11,26,28]. Independent relation linking. ...
Article
Full-text available
Given a knowledge graph and a natural language phrase, relation linking aims to find relations (predicates or properties) from the underlying knowledge graph to match the phrase. It is very useful in many applications, such as natural language question answering, personalized recommendation and text summarization. However, the previous relation linking algorithms usually produce a single relation for the input phrase and pay little attention to the more general and challenging problem, i.e., combinational relation linking that extracts a subgraph pattern to match the compound phrase (e.g. father-in-law). In this paper, we focus on the task of combinational relation linking over knowledge graphs. To resolve the problem, we define several elementary meta patterns which can be used to build any combinational relation. Then we design a systematic method based on the data-driven relation assembly technique, which is performed under the guidance of meta patterns. To enhance the system’s understanding ability, we introduce external knowledge during the linking process. Finally, extensive experiments over the real knowledge graph confirm the effectiveness of the proposed method.
... ReMatch: A part-of-speech and dependency parsing based relation linking tool for question answering [47]. ...
Article
Most Knowledge Graph-based Question Answering (KGQA) systems rely on training data to reach their optimal performance. However, acquiring training data for supervised systems is both time-consuming and resource-intensive. To address this, in this paper, we propose Tree-KGQA, an unsupervised KGQA system leveraging pre-trained language models and tree-based algorithms. Entity and relation linking are essential components of any KGQA system. We employ several pre-trained language models in the entity linking task to recognize the entities mentioned in the question and obtain the contextual representation for indexing. Furthermore, for relation linking we incorporate a pre-trained language model previously trained for language inference task. Finally, we introduce a novel algorithm for extracting the answer entities from a KG, where we construct a forest of interpretations and introduce tree-walking and tree disambiguation techniques. Our algorithm uses the linked relation and predicts the tree branches that eventually lead to the potential answer entities. The proposed method achieves 4.5% and 7.1% gains in F1 score in entity linking tasks on LC-QuAD 2.0 and LC-QuAD 2.0 (KBpearl) datasets, respectively, and a 5.4% increase in the relation linking task on LC-QuAD 2.0 (KBpearl). The comprehensive evaluations demonstrate that our unsupervised KGQA approach outperforms other supervised state-of-the-art methods on the WebQSP-WD test set (1.4% increase in F1 score) - without training on the target dataset.
... Among them, SPARQL query construction tools relies on the results of entity links and relationship links. The entity link tools commonly used in the knowledge graph based question answering system [7] are DBpedia Spotlight [8], AGDISTIS [9], and TagMe [10], and the relationship link tools are ReMatch [11],SIBKB [12], and SPARQL query construction tools are SINA [13], NLIWOD [14]. the Most of these commonly used SPARQL query construction tools need to fully arrange the entities and predicates, and their search space will increase exponentially with the increase in the number of entities and predicates, and consume huge storage space. ...
Article
Full-text available
According to the survey, the performance of the existing knowledge graph answering system in entity linking, relation linking and SPARQL query construction in terms of execution time and accuracy cannot meet the requirements of knowledge graph answering system. For this challenge, a new feedback mechanism based Knowledge-Driven query construction method is proposed. This method takes the entity set and predicate set in the problem as input, and constructs SPARQL query statements in a knowledge-driven way to solve simple and complex problems, and further proposes heuristic ideas to deal with implicit entity problems. At the same time, the method also proposes to feed back the query results of the knowledge graph to the entity link and relationship connection steps, so that the SPARQL query statement is optimized again. The evaluation results of the LC_QuAD data show that this method outperform the existing state of the art in precision and recall rate.
Article
Full-text available
With the rising popularity of user-generated genealogical family trees, new genealogical information systems have been developed. State-of-the-art natural question answering algorithms use deep neural network (DNN) architecture based on self-attention networks. However, some of these models use sequence-based inputs and are not suitable to work with graph-based structure, while graph-based DNN models rely on high levels of comprehensiveness of knowledge graphs that is nonexistent in the genealogical domain. Moreover, these supervised DNN models require training datasets that are absent in the genealogical domain. This study proposes an end-to-end approach for question answering using genealogical family trees by: (1) representing genealogical data as knowledge graphs, (2) converting them to texts, (3) combining them with unstructured texts, and (4) training a transformer-based question answering model. To evaluate the need for a dedicated approach, a comparison between the fine-tuned model (Uncle-BERT) trained on the auto-generated genealogical dataset and state-of-the-art question-answering models was performed. The findings indicate that there are significant differences between answering genealogical questions and open-domain questions. Moreover, the proposed methodology reduces complexity while increasing accuracy and may have practical implications for genealogical research and real-world projects, making genealogical data accessible to experts as well as the general public.
Chapter
Full-text available
Relation linking is essential to enable question answering over knowledge bases. Although there are various efforts to improve relation linking performance, the current state-of-the-art methods do not achieve optimal results, therefore, negatively impacting the overall end-to-end question answering performance. In this work, we propose a novel approach for relation linking framing it as a generative problem facilitating the use of pre-trained sequence-to-sequence models. We extend such sequence-to-sequence models with the idea of infusing structured data from the target knowledge base, primarily to enable these models to handle the nuances of the knowledge base. Moreover, we train the model with the aim to generate a structured output consisting of a list of argument-relation pairs, enabling a knowledge validation step. We compared our method against the existing relation linking systems on four different datasets derived from DBpedia and Wikidata. Our method reports large improvements over the state-of-the-art while using a much simpler model that can be easily adapted to different knowledge bases.
Chapter
Knowledge base question answering systems are heavily dependent on relation extraction and linking modules. However, the task of extracting and linking relations from text to knowledge bases faces two primary challenges; the ambiguity of natural language and lack of training data. To overcome these challenges, we present SLING, a relation linking framework which leverages semantic parsing using Abstract Meaning Representation (AMR) and distant supervision. SLING integrates multiple approaches that capture complementary signals such as linguistic cues, rich semantic representation, and information from the knowledge base. The experiments on relation linking using three KBQA datasets, QALD-7, QALD-9, and LC-QuAD 1.0 demonstrate that the proposed approach achieves state-of-the-art performance on all benchmarks.
Article
Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of Question Answering for user interaction. Existing QA systems have been extensively evaluated as black boxes and their performance has been characterised in terms of average results over all the questions of benchmarking datasets (i.e. macro evaluation). Albeit informative, macro evaluation studies do not provide evidence about QA components’ strengths and concrete weaknesses. Therefore, the objective of this article is to analyse and micro evaluate available QA components in order to comprehend which question characteristics impact on their performance. For this, we measure at question level and with respect to different question features the accuracy of 29 components reused in QA frameworks for the DBpedia knowledge graph using state-of-the-art benchmarks. As a result, we provide a perspective on collective failure cases, study the similarities and synergies among QA components for different component types and suggest their characteristics preventing them from effectively solving the corresponding QA tasks. Finally, based on these extensive results, we present conclusive insights for future challenges and research directions in the field of Question Answering over knowledge graphs.
Preprint
Natural Language Processing (NLP) has significantly contributed to the problems of extracting entities and relations, as well as linking them to existing knowledge graphs (KGs). Wikidata and DBpedia are KGs that include knowledge harvested from the web and curated by the crowd. Thus, links to these KGs provide a rich context composed of encyclopedic and factual knowledge represented in both KGs. However, albeit effective, the majority of existing tools are not agnostic and are customized only for a specific knowledge graph. In this paper, we present Falcon 2.0, the first approach for joint entity and relation linking over Wikidata. It receives a short natural language text in English and outputs a list of identified entities and relations with their Uniform Resource Identifiers (URIs) in Wikidata. Falcon 2.0 resorts to fundamental principles of English morphology (e.g., N-gram tiling and N-gram splitting) and background knowledge of label alignments obtained from the studied KG to produce its output. We have empirically studied the impact of using only Wikidata on Falcon 2.0 and observed that it outperforms all existing baselines. Falcon 2.0 is public and can be reused by the community; all the instructions needed for using Falcon 2.0 are available at our GitHub repository: https://github.com/SDM-TIB/falcon2.0. Moreover, the online API can be used directly to ease the use of Falcon 2.0 without any technical expertise. Falcon 2.0 and its background knowledge bases are available as resources at https://labs.tib.eu/falcon/falcon2/.
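The n-gram splitting idea can be sketched as enumerating token n-grams of the question, longest first, and looking each one up in a label index of the KG. The tiny index below is a hypothetical stand-in for Falcon 2.0's Wikidata background knowledge:

```python
# Sketch of n-gram-based entity/relation spotting. The label index is an
# invented miniature of Falcon 2.0's background knowledge.

LABEL_INDEX = {
    "barack obama": "wd:Q76",
    "birth place": "wdt:P19",
    "born": "wdt:P19",
}

def ngrams(tokens, max_n=3):
    """Yield longer n-grams first so that tiling prefers maximal matches."""
    for n in range(min(max_n, len(tokens)), 0, -1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def link(question):
    tokens = question.lower().rstrip("?").split()
    matches = {}
    for gram in ngrams(tokens):
        if gram in LABEL_INDEX and LABEL_INDEX[gram] not in matches.values():
            matches[gram] = LABEL_INDEX[gram]
    return matches

print(link("Where was Barack Obama born?"))
# {'barack obama': 'wd:Q76', 'born': 'wdt:P19'}
```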
Conference Paper
The demand for interfaces that allow users to interact with computers in an intuitive, effective, and efficient way is increasing. Question Answering (QA) systems address this need by answering questions posed by humans using knowledge bases. In recent years, many QA systems and related components have been developed both by practitioners and the research community. Since QA involves a vast number of (partially overlapping) subtasks, existing QA components can be combined in various ways to build tailored QA systems that perform better in terms of scalability and accuracy in specific domains and use cases. However, to the best of our knowledge, no systematic way exists to formally describe and automatically compose such components. Thus, in this work, we introduce QAestro, a framework for semantically describing both QA components and developer requirements for QA component composition. QAestro relies on a controlled vocabulary and the Local-as-View (LAV) approach to model QA tasks and components, respectively. Furthermore, the problem of QA component composition is mapped to the problem of LAV query rewriting, and state-of-the-art SAT solvers are utilized to efficiently enumerate the solutions. We have formalized 51 existing QA components implemented in 20 QA systems using QAestro. Our empirical results suggest that QAestro enumerates the combinations of QA components that effectively implement QA developer requirements.
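The enumeration idea can be approximated with a brute-force search over component orderings, standing in here for QAestro's LAV-rewriting-plus-SAT machinery; the three component descriptions are invented:

```python
# Brute-force sketch of enumerating valid QA pipelines from component
# descriptions. A stand-in for SAT-based LAV query rewriting.
from itertools import permutations

# Hypothetical components: each consumes and produces QA artifacts.
COMPONENTS = {
    "NER": ({"question"}, {"entities"}),
    "REL": ({"question"}, {"relations"}),
    "QB":  ({"entities", "relations"}, {"sparql"}),
}

def valid_pipelines(requirement, available=frozenset({"question"})):
    for order in permutations(COMPONENTS):
        have, ok = set(available), True
        for name in order:
            needs, gives = COMPONENTS[name]
            if not needs <= have:
                ok = False
                break
            have |= gives
        if ok and requirement <= have:
            yield order

for pipeline in valid_pipelines({"sparql"}):
    print(pipeline)  # e.g. ('NER', 'REL', 'QB') and ('REL', 'NER', 'QB')
```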
Conference Paper
Question Answering (QA) systems over Knowledge Graphs (KG) automatically answer natural language questions using facts contained in a knowledge graph. Simple questions, which can be answered by the extraction of a single fact, constitute a large part of the questions asked on the web but still pose challenges to QA systems, especially when asked against a large knowledge resource. Existing QA systems usually rely on various components, each specialised in solving a different sub-task of the problem (such as segmentation, entity recognition, disambiguation, and relation classification). In this work, we follow a quite different approach: we train a neural network for answering simple questions in an end-to-end manner, leaving all decisions to the model. It learns to rank subject-predicate pairs to enable the retrieval of relevant facts given a question. The network contains a nested word/character-level question encoder which allows it to handle out-of-vocabulary and rare word problems while still being able to exploit word-level semantics. Our approach achieves results competitive with state-of-the-art end-to-end approaches that rely on an attention mechanism.
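A heavily simplified sketch of the ranking formulation follows: a shared encoder maps the question and each candidate subject-predicate pair into the same space, and candidates are ranked by similarity. The toy encoder averages word embeddings instead of the paper's nested word/character-level encoder, and all data is invented:

```python
# Toy subject-predicate ranking with a shared mean-of-embeddings encoder.
import torch
import torch.nn.functional as F

vocab = {w: i for i, w in enumerate(
    "who wrote directed <unk> the hobbit tolkien author director".split())}

def encode_ids(text):
    return torch.tensor([[vocab.get(w, vocab["<unk>"]) for w in text.split()]])

class Encoder(torch.nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(len(vocab), dim)  # mean pooling

    def forward(self, ids):
        return self.emb(ids)

enc = Encoder()
question = encode_ids("who wrote the hobbit")
facts = [encode_ids("the hobbit author"), encode_ids("the hobbit director")]

# Rank candidate subject-predicate pairs by cosine similarity to the
# question; training would push the correct pair's score above the rest.
scores = [F.cosine_similarity(enc(question), enc(f)).item() for f in facts]
print(scores)
```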
Conference Paper
It is very challenging to access the knowledge expressed within (big) data sets. Question answering (QA) aims at making sense out of data via a simple-to-use interface. However, QA systems are very complex, and earlier approaches are mostly singular and monolithic implementations for QA in specific domains. Therefore, it is cumbersome and inefficient to design and implement new or improved approaches, in particular as many components are not reusable. Hence, there is a strong need for enabling best-of-breed QA systems, where the best-performing components are combined, aiming at the best quality achievable in the given domain. Taking into account the high variety of functionality that might be of use within a QA system, and therefore reused in new QA systems, we provide an approach driven by a core QA vocabulary that is aligned to existing, powerful ontologies provided by domain-specific communities. We achieve this through a methodology for binding existing vocabularies to our core QA vocabulary without re-creating the information provided by external components. We thus provide a practical approach for rapidly establishing new (domain-specific) QA systems, while the core QA vocabulary remains reusable across multiple domains. To the best of our knowledge, this is the first approach to open QA systems that is agnostic to implementation details and that inherently follows Linked Data principles.
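Binding a component's vocabulary to a core QA vocabulary can be sketched in a few RDF statements; both namespaces below are hypothetical placeholders, not the paper's actual vocabulary:

```python
# Sketch of vocabulary binding: a component's term is aligned to a core QA
# vocabulary term rather than re-created. Both namespaces are invented.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

QA = Namespace("http://example.org/core-qa#")        # hypothetical core vocabulary
EXT = Namespace("http://example.org/component-x#")   # hypothetical component vocabulary

g = Graph()
g.add((EXT.NamedEntity, OWL.equivalentClass, QA.Entity))
print(g.serialize(format="turtle"))
```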
Conference Paper
For our participation in QALD-5, we developed a system for answering questions over a knowledge base. We proposed an unsupervised method for the semantic analysis of questions that generates queries, based on graph transformations, in two steps. The first step is independent of the knowledge base schema and makes use of very general constraints on the query structure, which allows us to maintain semantic ambiguities in different graphs. Ambiguities are then resolved globally in the final step, when querying the knowledge base.
Article
The strongly typed syntax of grammar formalisms such as CCG, TAG, LFG and HPSG offers a synchronous framework for deriving syntactic structures and semantic logical forms. In contrast, and partly due to the lack of a strong type system, dependency structures are easy to annotate and have become a widely used form of syntactic analysis for many languages. However, the lack of a type system makes a formal mechanism for deriving logical forms from dependency structures challenging. We address this by introducing a robust system based on the lambda calculus for deriving neo-Davidsonian logical forms from dependency trees. These logical forms are then used for semantic parsing of natural language to Freebase. Experiments on the Free917 and WebQuestions datasets show that our representation is superior to the original dependency trees and that it outperforms a CCG-based representation on this task. Compared to prior work, we obtain the strongest result to date on Free917 and competitive results on WebQuestions.
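A toy version of reading a neo-Davidsonian form off a dependency tree: each edge of the main verb becomes a binary predicate over an event variable. The edge labels and output syntax below are simplified illustrations, not the paper's full lambda-calculus machinery:

```python
# Toy neo-Davidsonian logical form from a verb's dependency edges.

def logical_form(verb, edges):
    """edges: list of (dependency_label, argument) pairs for the verb."""
    conjuncts = [f"{verb}(e)"] + [f"{label}(e, {arg})" for label, arg in edges]
    return " AND ".join(conjuncts)

# "Cameron directed Titanic", with subject/object edges from the verb:
print(logical_form("directed", [("arg1", "Cameron"), ("arg2", "Titanic")]))
# directed(e) AND arg1(e, Cameron) AND arg2(e, Titanic)
```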
Conference Paper
Natural Language Query Formalization involves semantically parsing queries in natural language and translating them into their corresponding formal representations. It is a key component for developing question-answering (QA) systems over RDF data. The chosen formal representation language in this case is often SPARQL. In this paper, we propose a framework, called AskNow, where users can pose queries in English to a target RDF knowledge base (e.g. DBpedia); these queries are first normalized into an intermediary canonical syntactic form, called Normalized Query Structure (NQS), and then translated into SPARQL queries. NQS facilitates the identification of the desire (or expected output information) and the user-provided input information, and the establishment of their mutual semantic relationship. At the same time, it is sufficiently adaptive to query paraphrasing. We have empirically evaluated the framework with respect to the syntactic robustness of NQS and the semantic accuracy of the SPARQL translator on standard benchmark datasets.
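Once a question is in NQS form, the translation to SPARQL is largely mechanical, as this sketch suggests; the dictionary fields below are a hypothetical simplification of the actual Normalized Query Structure:

```python
# Sketch of the final AskNow step: NQS fields to a SPARQL query string.

def nqs_to_sparql(nqs):
    return (
        f"SELECT ?{nqs['desire']} WHERE {{ "
        f"<{nqs['input']}> <{nqs['relation']}> ?{nqs['desire']} . }}"
    )

nqs = {  # "What is the capital of Germany?"
    "desire": "capital",
    "input": "http://dbpedia.org/resource/Germany",
    "relation": "http://dbpedia.org/ontology/capital",
}
print(nqs_to_sparql(nqs))
# SELECT ?capital WHERE { <...Germany> <...capital> ?capital . }
```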
Article
In this paper, we present a rule-based relation extraction approach which uses DBpedia and linguistic information provided by the syntactic parser Fips. Our goal is twofold: (i) morpho-syntactic patterns are defined using the syntactic parser Fips to identify relations between named entities, and (ii) RDF triples extracted from DBpedia are used to improve the relation extraction (RE) task by creating gazetteer relations.
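A minimal sketch of the pattern-plus-gazetteer idea, assuming a regex stands in for Fips-derived morpho-syntactic patterns and a single DBpedia-derived pair populates the gazetteer:

```python
# Pattern-based relation extraction with a DBpedia-derived gazetteer.
# The regex and the tiny gazetteer are invented examples; the original work
# derives its patterns from the Fips parser, not regular expressions.
import re

# Gazetteer of known (subject, object) pairs per relation, as could be
# extracted from DBpedia RDF triples.
GAZETTEER = {("Paris", "France"): "dbo:country"}

PATTERN = re.compile(r"(?P<subj>[A-Z]\w+) is the capital of (?P<obj>[A-Z]\w+)")

def extract(sentence):
    m = PATTERN.search(sentence)
    if not m:
        return None
    pair = (m.group("subj"), m.group("obj"))
    # The gazetteer confirms or labels the relation found by the pattern.
    return pair + (GAZETTEER.get(pair, "unknown"),)

print(extract("Paris is the capital of France."))
# ('Paris', 'France', 'dbo:country')
```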