A Probabilistic-Logical Framework for Ontology Matching
Mathias Niepert, Christian Meilicke, Heiner Stuckenschmidt
KR & KM Research Group, Universität Mannheim, Germany
{mathias, christian, heiner}@informatik.uni-mannheim.de
Abstract
Ontology matching is the problem of determining correspondences between concepts, properties, and individuals of different heterogeneous ontologies. With this paper we present a novel probabilistic-logical framework for ontology matching based on Markov logic. We define the syntax and semantics and provide a formalization of the ontology matching problem within the framework. The approach has several advantages over existing methods, such as ease of experimentation, incoherence mitigation during the alignment process, and the incorporation of a-priori confidence values. We show empirically that the approach is efficient and more accurate than existing matchers on an established ontology alignment benchmark dataset.
Introduction
Ontology matching, or ontology alignment, is the problem of determining correspondences between concepts, properties, and individuals of two or more different formal ontologies (Euzenat and Shvaiko 2007). The alignment of ontologies enables the knowledge and data expressed in the matched ontologies to interoperate. A major insight of the ontology alignment evaluation initiative (OAEI) (Euzenat et al. 2009) is that there is no best method or system for all existing matching problems. The factors influencing the quality of alignments range from differences in lexical similarity measures to variations in alignment extraction approaches. This result justifies not only the OAEI itself but also the need for a framework that facilitates the comparison of different strategies in a straightforward and transparent manner. To ensure comparability of different matching approaches, such a framework would need a number of characteristics. In particular, it should feature
- a unified syntax that supports the specification of different approaches in the same language to isolate meaningful methodological variations and ensure that only the effects of known variations are observed;
- a well-defined semantics that guarantees that matching conditions are interpreted uniformly and that outcome variations are not merely a result of different implementations of identical features;
- a testbed for a wide range of techniques used in ontology matching, including the use of soft and hard evidence such as string similarities (soft) and logical consistency of the result (hard); and
- support for the experimental comparison and standardized evaluation of techniques on existing benchmarks.
Based on these considerations, we argue that Markov logic (Richardson and Domingos 2006) provides an excellent framework for ontology matching. Markov logic (ML) offers several advantages over existing matching approaches. Its main strength is rooted in the ability to combine soft and hard first-order formulae. This allows the inclusion of both known logical statements and uncertain formulae modeling potential correspondences and structural properties of the ontologies. For instance, hard formulae can reduce incoherence during the alignment process while soft formulae can factor in a-priori confidence values for correspondences. An additional advantage of ML is joint inference, that is, the inference of two or more interdependent hidden predicates. Several results show that joint inference is superior in accuracy when applied to a wide range of problems such as ontology refinement (Wu and Weld 2008) and multilingual semantic role labeling (Meza-Ruiz and Riedel 2009). Furthermore, probabilistic approaches to ontology matching have recently produced competitive matching results (Albagli, Ben-Eliyahu-Zohary, and Shimony 2009).
In this paper, we present a framework for ontology matching based on the syntax and semantics of Markov logic, in the spirit of a tool-box allowing users to specify and combine different individual matching strategies. In particular,
- we describe how several typical matching approaches are captured by the framework;
- we show how these approaches can be aggregated in a modular manner, jointly increasing the quality of the alignments; and
- we compare our framework to state-of-the-art matching systems and verify empirically that the combination of three matching strategies leads to alignments that are more accurate than those generated by any of the monolithic matching systems.
The paper is structured as follows. First, we briefly define ontology matching and introduce a running example that is used throughout the paper. We then introduce the syntax and semantics of the ML framework and show that it can represent numerous different matching approaches. We describe probabilistic reasoning in the framework of Markov logic and show that a solution to a given matching problem can be obtained by solving the maximum a-posteriori (MAP) problem of a ground Markov logic network using integer linear programming. We then report the results of an empirical evaluation of our method using OAEI benchmark datasets. We conclude with a set of insights gained from the experiments and some ideas for future research.
Ontology Matching
Ontology matching is the process of detecting links between entities in heterogeneous ontologies. Based on a definition by Euzenat and Shvaiko (2007), we formally introduce the notions of correspondence and alignment to refer to these links.

Definition 1 (Correspondence and Alignment). Given ontologies O1 and O2, let q be a function that defines sets of matchable entities q(O1) and q(O2). A correspondence between O1 and O2 is a triple ⟨e1, e2, r⟩ such that e1 ∈ q(O1), e2 ∈ q(O2), and r is a semantic relation. An alignment between O1 and O2 is a set of correspondences between O1 and O2.
The generic form of Definition 1 captures a wide range of correspondences by varying what is admissible as matchable element and semantic relation. In the following we are only interested in equivalence correspondences between concepts and properties. In the first step of the alignment process most matching systems compute a-priori similarities between matching candidates. These values are typically refined in later phases of the matching process. The underlying assumption is that the degree of similarity is indicative of the likelihood that two entities are equivalent. Given two matchable entities e1 and e2, we write σ(e1, e2) to refer to this kind of a-priori similarity. Before presenting the formal matching framework, we motivate the approach by a simple instance of an ontology matching problem which we use as a running example throughout the paper.
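To make Definition 1 concrete, the following minimal Python sketch (our illustration; the class and field names are not from the paper) shows one way to represent correspondences, alignments, and the a-priori similarity σ:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    """A correspondence <e1, e2, r> in the sense of Definition 1."""
    e1: str        # matchable entity of O1, e.g. a concept or property name
    e2: str        # matchable entity of O2
    r: str = "="   # semantic relation; here always equivalence

# An alignment is a set of correspondences between O1 and O2.
Alignment = set[Correspondence]

# The a-priori similarity sigma(e1, e2) can be stored as a dictionary,
# here filled with two of the values from the running example below.
sigma: dict[tuple[str, str], float] = {
    ("Document", "Documents"): 0.88,
    ("Reviewer", "Review"): 0.75,
}
```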
Example 2. Figure 1 depicts fragments of two ontologies describing the domain of scientific conferences. The following axioms are part of ontology O1 and O2, respectively.

O1:                              O2:
∃hasWritten.⊤ ⊑ Reviewer         ∃writtenBy.⊤ ⊑ Paper
PaperReview ⊑ Document           Review ⊑ Documents
Reviewer ⊑ Person                Paper ⊑ Documents
Submission ⊑ Document            Author ⊑ Agent
Document ⊑ ¬Person               Paper ⊑ ¬Review

If we apply a similarity measure σ based on the Levenshtein distance (Levenshtein 1965), there are four pairs of entities such that σ(e1, e2) > 0.5:

σ(Document, Documents)   = 0.88   (1)
σ(Reviewer, Review)      = 0.75   (2)
σ(hasWritten, writtenBy) = 0.70   (3)
σ(PaperReview, Review)   = 0.54   (4)
[Figure 1: Example ontology fragments. O1 contains the concepts Person (a1), Document (b1), Reviewer (c1), PaperReview (d1), and Submission (e1) and the property hasWritten (p1); O2 contains the concepts Agent (a2), Documents (b2), Author (c2), Paper (d2), and Review (e2) and the property writtenBy (p2). Edges depict subsumption and disjointness relations between these entities.]
The alignment consisting of these four correspondences contains two correct (1 & 4) and two incorrect (2 & 3) correspondences, resulting in a precision of 50%.
Markov Logic and Ontology Matching
Markov logic combines first-order logic and undirected probabilistic graphical models (Richardson and Domingos 2006). A Markov logic network (MLN) is a set of first-order formulae with weights; the more evidence we have that a formula is true, the higher the weight of this formula. Markov logic has been proposed as a possible approach to several problems occurring in the context of the semantic web (Domingos et al. 2008). We argue that Markov logic provides an excellent framework for ontology matching as it captures both hard logical axioms and soft uncertain statements about potential correspondences between ontological entities. The probabilistic-logical framework we propose for ontology matching essentially adapts the syntax and semantics of Markov logic. However, we always type predicates and we require a strict distinction between hard and soft formulae as well as hidden and observable predicates.
Syntax
A signature is a 4-tuple S = (O, H, C, U) with O a finite set of typed observable predicate symbols, H a finite set of typed hidden predicate symbols, C a finite set of typed constants, and U a finite set of function symbols. In the context of ontology matching, constants correspond to ontological entities such as concepts and properties, and predicates model relationships between these entities such as disjointness, subsumption, and equivalence. A Markov logic network (MLN) is a pair (F_h, F_s) where F_h is a set {F_i^h} of first-order formulae built using predicates from O ∪ H, and F_s is a set of pairs {(F_i, w_i)} with each F_i being a first-order formula built using predicates from O ∪ H and each w_i ∈ R a real-valued weight associated with formula F_i. Note how we explicitly distinguish between hard formulae F_h and soft formulae F_s.
Semantics
Let M = (F_h, F_s) be a Markov logic network with signature S = (O, H, C, U). A grounding of a first-order formula F is generated by substituting each occurrence of every variable in F with constants in C of compatible type. Existentially quantified formulae are substituted by the disjunctions of their groundings over the finite set of constants. A formula that does not contain any variables is ground, and a formula that consists of a single predicate is an atom. Markov logic makes several assumptions such as (a) different constants refer to different objects and (b) the only objects in the domain are those representable using the constants (Richardson and Domingos 2006). For the ML framework, we only consider formulae with universal quantifiers at the outermost level. A set of ground atoms is a possible world. We say that a possible world W satisfies a formula F, and write W |= F, if F is true in W. Let G^C_F be the set of all possible groundings of formula F with respect to C. We say that W satisfies G^C_F, and write W |= G^C_F, if W satisfies every formula in G^C_F. Let W be the set of all possible worlds with respect to S. Then, the probability of a possible world W is given by

p(W) = (1/Z) exp( Σ_{(F_i, w_i) ∈ F_s}  Σ_{g ∈ G^C_{F_i} : W |= g}  w_i )

if W |= G^C_F for all F ∈ F_h, and p(W) = 0 otherwise. Here, Z is a normalization constant.

In the context of ontology matching, possible worlds correspond to possible alignments and the goal is to determine the most probable alignment given the evidence. Note that several existing methods have sought to maximize the sum of confidence values subject to constraints enforcing the alignments to be, for instance, one-to-one and functional. The given probabilistic semantics unifies these approaches in a coherent theoretical framework.
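For intuition, the following Python sketch (our illustration, not part of the paper) computes the unnormalized log-probability of a possible world, treating each ground formula as a Boolean test over the set of true ground atoms:

```python
import math
from typing import Callable, Iterable, Tuple

World = frozenset[str]  # a possible world: the set of true ground atoms
GroundFormula = Callable[[World], bool]

def log_score(world: World,
              hard: Iterable[GroundFormula],
              soft: Iterable[Tuple[GroundFormula, float]]) -> float:
    """Exponent of p(W), up to the constant -log Z.

    Returns -inf (i.e. p(W) = 0) if any hard ground formula is violated;
    otherwise the sum of the weights w_i of all satisfied soft ground
    formulae, mirroring the definition of p(W) above.
    """
    if not all(g(world) for g in hard):
        return -math.inf
    return sum(w for g, w in soft if g(world))

# Example: a world containing one hidden ground atom with weight 0.88.
world = frozenset({"m_c(b1,b2)"})
soft = [(lambda w: "m_c(b1,b2)" in w, 0.88)]
print(log_score(world, hard=[], soft=soft))  # -> 0.88
```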
Matching Formalization
Given two ontologies O1 and O2 and an initial a-priori similarity σ, we apply the following formalization. First, we introduce observable predicates O to model the structure of O1 and O2 with respect to both concepts and properties. For the sake of simplicity we use uppercase letters D, E, R to refer to individual concepts and properties in the ontologies and lowercase letters d, e, r to refer to the corresponding constants in C. In particular, we add ground atoms of observable predicates to F_h for i ∈ {1, 2} according to the following rules (due to space considerations the list is incomplete; for instance, predicates modeling range restrictions are not included):

O_i |= D ⊑ E       ↦  sub_i(d, e)
O_i |= D ⊑ ¬E      ↦  dis_i(d, e)
O_i |= ∃R.⊤ ⊑ D    ↦  subd_i(r, d)
O_i |= ∃R.⊤ ⊒ D    ↦  supd_i(r, d)
O_i |= ∃R.⊤ ⊑ ¬D   ↦  disd_i(r, d)

The knowledge encoded in the ontologies is assumed to be true. Hence, the ground atoms of observable predicates are added to the set of hard constraints F_h, making them hold in every computed alignment. The hidden predicates m_c and m_p, on the other hand, model the sought-after concept and property correspondences, respectively. Given the state of the observable predicates, we are interested in determining the state of the hidden predicates that maximizes the a-posteriori probability of the corresponding possible world. The ground atoms of these hidden predicates are assigned the weights specified by the a-priori similarity σ. The higher this value for a correspondence, the more likely the correspondence is correct a-priori. Hence, the following ground formulae are added to F_s:

(m_c(c, d), σ(C, D)) if C and D are concepts
(m_p(p, r), σ(P, R)) if P and R are properties

Notice that the distinction between m_c and m_p is required since we use typed predicates and distinguish between the concept and property type.
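As a rough sketch of this grounding step (our illustration; the oracle `entails` stands in for a reasoner such as Pellet, and all names are assumptions rather than the paper's API), the evidence atoms and weighted hidden atoms could be generated as follows:

```python
def observable_atoms(i, concepts, properties, entails):
    """Yield the hard evidence atoms for ontology O_i.

    `entails(axiom)` is an assumed oracle for O_i |= axiom; axioms are
    encoded as tuples mirroring the translation rules above.
    """
    for d in concepts:
        for e in concepts:
            if entails(("sub", d, e)):      # O_i |= D ⊑ E
                yield f"sub_{i}({d},{e})"
            if entails(("dis", d, e)):      # O_i |= D ⊑ ¬E
                yield f"dis_{i}({d},{e})"
    for r in properties:
        for d in concepts:
            if entails(("dom_sub", r, d)):  # O_i |= ∃R.⊤ ⊑ D
                yield f"subd_{i}({r},{d})"
            if entails(("dom_sup", r, d)):  # O_i |= ∃R.⊤ ⊒ D
                yield f"supd_{i}({r},{d})"
            if entails(("dom_dis", r, d)):  # O_i |= ∃R.⊤ ⊑ ¬D
                yield f"disd_{i}({r},{d})"

def weighted_hidden_atoms(concepts1, concepts2, sigma):
    """Yield the soft ground formulae (m_c(c, d), sigma(C, D))."""
    for c in concepts1:
        for d in concepts2:
            yield (f"m_c({c},{d})", sigma(c, d))
```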
Cardinality Constraints A method often applied in real-world scenarios is the selection of a functional one-to-one alignment (Cruz et al. 2009). Within the ML framework, we can include a set of hard cardinality constraints, restricting the alignment to be functional and one-to-one. In the following we write x, y, z to refer to variables ranging over the appropriately typed constants and omit the universal quantifiers.

m_c(x, y) ∧ m_c(x, z) ⇒ y = z
m_c(x, y) ∧ m_c(z, y) ⇒ x = z

Analogously, the same formulae can be included with the hidden predicate m_p, restricting the property alignment to be one-to-one and functional.
Coherence Constraints Incoherence occurs when axioms in ontologies lead to logical contradictions. Clearly, it is desirable to avoid incoherence during the alignment process. Some methods of incoherence removal for ontology alignments were introduced in (Meilicke, Tamilin, and Stuckenschmidt 2007). All existing approaches, however, remove correspondences after the computation of the alignment. Within the ML framework we can incorporate incoherence-reducing constraints during the alignment process for the first time. This is accomplished by adding formulae of the following type to F_h.

dis_1(x, x′) ∧ sub_2(y, y′) ⇒ ¬(m_c(x, y) ∧ m_c(x′, y′))
disd_1(x, x′) ∧ subd_2(y, y′) ⇒ ¬(m_p(x, y) ∧ m_c(x′, y′))

The second formula, for example, has the following purpose. Given properties X, Y and concepts X′, Y′, suppose that O1 |= ∃X.⊤ ⊑ ¬X′ and O2 |= ∃Y.⊤ ⊑ Y′. Now, if ⟨X, Y, ≡⟩ and ⟨X′, Y′, ≡⟩ were both part of an alignment, the merged ontology would entail both ∃X.⊤ ⊑ X′ and ∃X.⊤ ⊑ ¬X′ and, therefore, ∃X.⊤ ⊑ ⊥. The specified formula prevents this type of incoherence. It is known that such constraints, if carefully chosen, can avoid a majority of possible incoherences (Meilicke and Stuckenschmidt 2009).
Stability Constraints Several existing approaches to schema and ontology matching propagate alignment evidence derived from structural relationships between concepts and properties. These methods leverage the fact that existing evidence for the equivalence of concepts C and D also makes it more likely that, for example, child concepts of C and child concepts of D are equivalent. One such approach to evidence propagation is similarity flooding (Melnik, Garcia-Molina, and Rahm 2002). As a reciprocal idea, the general notion of stability was introduced, expressing that an alignment should not introduce new structural knowledge (Meilicke and Stuckenschmidt 2007). The soft formulae below, for instance, decrease the probability of alignments that map concept X to Y and X′ to Y′ if X′ subsumes X but Y′ does not subsume Y.

(sub_1(x, x′) ∧ ¬sub_2(y, y′) ∧ m_c(x, y) ∧ m_c(x′, y′), w_1)
(subd_1(x, x′) ∧ ¬subd_2(y, y′) ∧ m_p(x, y) ∧ m_c(x′, y′), w_2)

Here, w_1 and w_2 are negative real-valued weights, rendering alignments that satisfy the formulae possible but less likely.
The presented list of cardinality, coherence, and stability constraints is by no means exhaustive. Other constraints could, for example, model known correct correspondences or generalize the one-to-one alignment to m-to-n alignments, or a novel hidden predicate could be added modeling correspondences between instances of the ontologies. To keep the discussion of the approach simple, however, we leave these considerations to future research.
Example 3. We apply the previous formalization to Example 2. To keep it simple, we only use a-priori values, cardinality constraints, and coherence constraints. Given the two ontologies O1 and O2 in Figure 1 and the matching hypotheses (1) to (4) from Example 2, the ground MLN would include the following relevant ground formulae. We use the concept and property labels from Figure 1 and omit ground atoms of observable predicates.

A-priori similarity:
(m_c(b1, b2), 0.88), (m_c(c1, e2), 0.75), (m_p(p1, p2), 0.7), (m_c(d1, e2), 0.54)

Cardinality constraints:
m_c(c1, e2) ∧ m_c(d1, e2) ⇒ c1 = d1   (5)

Coherence constraints:
disd_1(p1, b1) ∧ subd_2(p2, b2) ⇒ ¬(m_p(p1, p2) ∧ m_c(b1, b2))   (6)
dis_1(b1, c1) ∧ sub_2(e2, b2) ⇒ ¬(m_c(b1, b2) ∧ m_c(c1, e2))   (7)
subd_1(p1, c1) ∧ disd_2(p2, e2) ⇒ ¬(m_p(p1, p2) ∧ m_c(c1, e2))   (8)
MAP Inference as Alignment Process
Hidden predicates model correspondences between entities of the two ontologies whereas observable ones model predicates occurring in typical description logic statements. If we want to determine the most likely alignment of two given ontologies, we need to compute the set of ground atoms of the hidden predicates that maximizes the probability given both the ground atoms of observable predicates and the ground formulae of F_h and F_s. This is an instance of MAP (maximum a-posteriori) inference in the ground Markov logic network. Let O be the set of all ground atoms of observable predicates and H be the set of all ground atoms of hidden predicates, both with respect to C. Assume that we are given a set O′ ⊆ O of ground atoms of observable predicates. In order to find the most probable alignment we have to compute

argmax_{H′ ⊆ H}  Σ_{(F_i, w_i) ∈ F_s}  Σ_{g ∈ G^C_{F_i} : O′ ∪ H′ |= g}  w_i

subject to O′ ∪ H′ |= G^C_F for all F ∈ F_h.
Markov logic is by definition a declarative language, separating the formulation of a problem instance from the algorithm used for probabilistic inference. MAP inference in Markov logic networks is essentially equivalent to the weighted MAX-SAT problem and, therefore, NP-hard. Several approximate algorithms for the weighted MAX-SAT problem exist. However, since each ground formula in F_h must be satisfied in the computed MAP state, exact inference is required in our setting. Hence, we apply integer linear programming (ILP), which was shown to be an effective method for exact MAP inference in undirected graphical models (Roth and Yih 2005; Taskar et al. 2005) and specifically in Markov logic networks (Riedel 2008). ILP is concerned with optimizing a linear objective function over a finite number of integer variables, subject to a set of linear equalities and inequalities over these variables (Schrijver 1998). We omit the details of the ILP representation of a ground Markov logic network but demonstrate how the ground formulae from Example 3 would be represented as an ILP instance.
Example 4. Let the binary ILP variables x1, x2, x3, and x4 model the ground atoms m_c(b1, b2), m_c(c1, e2), m_p(p1, p2), and m_c(d1, e2), respectively. The ground formulae from Example 3 can be encoded with the following ILP.

Maximize: 0.88 x1 + 0.75 x2 + 0.7 x3 + 0.54 x4

Subject to:
x2 + x4 ≤ 1   (9)
x1 + x3 ≤ 1   (10)
x1 + x2 ≤ 1   (11)
x2 + x3 ≤ 1   (12)

The a-priori weights of the potential correspondences are factored in as coefficients of the objective function. Here, the ILP constraint (9) corresponds to ground formula (5), and ILP constraints (10), (11), and (12) correspond to the coherence ground formulae (6), (7), and (8), respectively. An optimal solution to the ILP sets the variables x1 and x4 to one, corresponding to the correct alignment {m_c(b1, b2), m_c(d1, e2)}. Compare this with the alignment {m_c(b1, b2), m_c(c1, e2), m_p(p1, p2)} which would be the outcome without coherence constraints.
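In the experiments below the authors solve such ILPs with SCIP; purely as an illustration, the four-variable ILP of Example 4 can be reproduced with an off-the-shelf modelling library such as PuLP (our sketch, not the paper's code):

```python
from pulp import LpMaximize, LpProblem, LpVariable, lpSum, value

w = {1: 0.88, 2: 0.75, 3: 0.7, 4: 0.54}            # objective coefficients
x = {i: LpVariable(f"x{i}", cat="Binary") for i in w}

prob = LpProblem("example4", LpMaximize)
prob += lpSum(w[i] * x[i] for i in w)              # maximize weighted sum
prob += x[2] + x[4] <= 1                           # (9)  from cardinality (5)
prob += x[1] + x[3] <= 1                           # (10) from coherence (6)
prob += x[1] + x[2] <= 1                           # (11) from coherence (7)
prob += x[2] + x[3] <= 1                           # (12) from coherence (8)

prob.solve()
print({v.name: int(value(v)) for v in x.values()})
# -> x1 = x4 = 1 and x2 = x3 = 0, i.e. {m_c(b1, b2), m_c(d1, e2)}
```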
Experiments
We use the OntoFarm dataset (Svab et al. 2005) as the basis for our experiments. It is the evaluation dataset for the OAEI conference track, which consists of several ontologies modeling the domain of scientific conferences (Euzenat et al. 2009). The ontologies were designed by different groups and, therefore, reflect different conceptualizations of the same domain. Reference alignments for seven of these ontologies are made available by the organizers. These 21 alignments contain correspondences between concepts and properties, including a reasonable number of non-trivial cases. For the a-priori similarity σ we decided to use a standard lexical similarity measure. After converting the concept and object property names to lowercase and removing delimiters and stop-words, we applied a string similarity measure based on the Levenshtein distance. More sophisticated a-priori similarity measures could be used, but since we want to evaluate the benefits of the ML framework we strive to avoid any bias related to custom-tailored similarity measures. We applied the reasoner Pellet (Sirin et al. 2007) to create the ground MLN formulation and used TheBeast (http://code.google.com/p/thebeast/) (Riedel 2008) to convert the MLN formulations to the corresponding ILP instances. Finally, we applied the mixed integer programming solver SCIP (http://scip.zib.de/) to solve the ILP. All experiments were conducted on a desktop PC with an AMD Athlon Dual Core Processor 5400B at 2.6 GHz and 1 GB RAM. The software as well as additional experimental results are available at http://code.google.com/p/ml-match/.
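A minimal sketch of such an a-priori similarity follows (our reading; the paper does not spell out the exact normalization, and the reported σ(hasWritten, writtenBy) = 0.7 suggests additional token handling that we omit here):

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (Levenshtein 1965)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def preprocess(name: str) -> str:
    """Lowercase and strip delimiters; stop-word removal omitted."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def sigma(e1: str, e2: str) -> float:
    """Normalized string similarity in [0, 1] (assumed normalization)."""
    s1, s2 = preprocess(e1), preprocess(e2)
    return 1.0 - levenshtein(s1, s2) / max(len(s1), len(s2), 1)

print(round(sigma("Document", "Documents"), 2))   # 0.89, cf. 0.88 in Example 2
print(round(sigma("PaperReview", "Review"), 2))   # 0.55, cf. 0.54 in Example 2
```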
The application of a threshold τ is a standard technique in ontology matching. Correspondences that match entities with high similarity are accepted while correspondences with a similarity less than τ are deemed incorrect. We evaluated our approach with thresholds on the a-priori similarity measure σ ranging from 0.45 to 0.95. After applying the threshold τ we normalized the values to the range [0.1, 1.0]. For each pair of ontologies we computed the F1-value, which is the harmonic mean of precision and recall, and computed the mean of this value over all 21 pairs of ontologies. We evaluated four different settings:
- ca: The formulation includes only cardinality constraints.
- ca+co: The formulation includes only cardinality and coherence constraints.
- ca+co+sm: The formulation includes cardinality, coherence, and stability constraints, and the weights of the stability constraints are determined manually. Being able to set qualitative weights manually is crucial as training data is often unavailable. The employed stability constraints consist of (1) constraints that aim to guarantee the stability of the concept hierarchy, and (2) constraints that deal with the relation between concepts and property domain/range restrictions. We set the weights for the first group to −0.5 and the weights for the second group to −0.25. This is based on the consideration that subsumption axioms between concepts are specified by ontology engineers more often than domain and range restrictions of properties (Ding and Finin 2006). Thus, a pair of two correct correspondences will less often violate constraints of the first type than constraints of the second type.
- ca+co+sl: The formulation also includes cardinality, coherence, and stability constraints, but the weights of the stability constraints are learned with a simple online learner using the perceptron rule. During learning we fixed the a-priori weights and learned only the weights for the stability formulae. We took 5 of the 7 ontologies and learned the weights on the 10 resulting pairs. With these weights we computed the alignment and its F1-value for the remaining pair of ontologies. This was repeated for each of the 21 possible combinations to determine the mean of the F1-values.

[Figure 2: Average F1-values for ca, ca+co, and ca+co+sm over the 21 OAEI reference alignments for thresholds ranging from 0.45 to 0.95, compared with AgreementMaker using its standard and its optimal threshold. AgreementMaker was the best-performing system on the conference dataset of the latest ontology evaluation initiative in 2009.]

threshold   0.60  0.65  0.70  0.75  0.80  0.85  0.90  0.95
ca+co+sm    0.56  0.59  0.60  0.61  0.62  0.63  0.62  0.62
ca+co+sl    0.57  0.58  0.58  0.61  0.61  0.61  0.63  0.62
ca+co       0.54  0.56  0.58  0.59  0.61  0.62  0.62  0.61

Table 1: Average F1-values over the 21 OAEI reference alignments for manual weights (ca+co+sm) vs. learned weights (ca+co+sl) vs. the formulation without stability constraints (ca+co); thresholds range from 0.6 to 0.95.
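The paper gives no details of the ca+co+sl perceptron beyond the name; the following sketch (entirely our assumption as regards learning rate and representation) shows the standard structured-perceptron update that such an online learner would perform on the stability weights:

```python
def perceptron_step(weights, soft_groundings, gold_world, map_world, lr=0.1):
    """One online update of the stability-formula weights.

    weights:          dict formula-id -> current weight w_i
    soft_groundings:  dict formula-id -> list of ground formulae, each a
                      Boolean test over a world (set of true ground atoms)
    gold_world:       world induced by the reference alignment
    map_world:        world of the current MAP alignment

    Each w_i is moved towards making the reference alignment score higher
    than the prediction: by the difference between the number of
    groundings of F_i satisfied under the gold and under the MAP world.
    """
    updated = dict(weights)
    for fid, groundings in soft_groundings.items():
        n_gold = sum(g(gold_world) for g in groundings)
        n_map = sum(g(map_world) for g in groundings)
        updated[fid] += lr * (n_gold - n_map)
    return updated
```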
The lower the threshold, the more complex the resulting ground MLN and the more time is needed to solve the corresponding ILP. The average time needed to compute one alignment was 61 seconds for τ = 0.45 and 0.5 seconds for τ = 0.85. Figure 2 depicts the average F1-values for ca, ca+co, and ca+co+sm compared to the average F1-values achieved by AgreementMaker (Cruz et al. 2009), the best-performing system in the OAEI conference track of 2009. These average F1-values of AgreementMaker were obtained using two different thresholds. The first is the default threshold of AgreementMaker and the second is the threshold at which the average F1-value attains its maximum. The inclusion of coherence constraints (ca+co) improves the average F1-value of the alignments for low to moderate thresholds by up to 6% compared to the ca setting. With increasing thresholds this effect becomes weaker and is negligible for τ ≥ 0.9. This is the case because alignments generated with ca for thresholds ≥ 0.9 contain only a small number of incorrect correspondences. The addition of stability constraints (ca+co+sm) increases the quality of the alignments again by up to 6% for low to moderate thresholds. In the optimal configuration (ca+co+sl with τ = 0.85) we measured an average F1-value of 0.63, which is a 7% improvement compared to AgreementMaker's 0.56. What is more important to understand, however, is that our approach generates more accurate results over a wide range of thresholds and is therefore more robust to threshold estimation. This is advantageous since in most real-world matching scenarios the estimation of appropriate thresholds is not possible. While the ca setting generates F1-values > 0.57 for τ ≥ 0.75, the ca+co+sm setting generates F1-values > 0.59 for τ ≥ 0.65. Even for τ = 0.45, usually considered an inappropriate threshold choice, we measured an average F1-value of 0.51 and average precision and recall values of 0.48 and 0.60, respectively. Table 1 compares the average F1-values of the ML formulation (a) with manually set weights for the stability constraints, (b) with learned weights for the stability constraints, and (c) without any stability constraints. The values indicate that using stability constraints improves alignment quality with both learned and manually set weights.
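For reference, the per-pair precision, recall, and F1 scores reported above can be computed as in the following minimal sketch (our illustration; correspondences are assumed to be hashable values, e.g. tuples):

```python
def prf(predicted: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 of an alignment against a reference."""
    tp = len(predicted & reference)                  # correct correspondences
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0       # harmonic mean of p and r
    return p, r, f1
```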
Discussion and Future Work
We presented a Markov logic based framework for ontology matching capturing a wide range of matching strategies. Since these strategies are expressed with a unified syntax and semantics, we can isolate variations and empirically evaluate their effects. Even though we focused only on a small subset of possible alignment strategies, the results are already quite promising. We have also successfully learned weights for soft formulae within the framework. In cases where training data is not available, weights set manually by experts still result in improved alignment quality. Research related to determining appropriate weights based on structural properties of ontologies is a topic of future work. The framework is not only useful for aligning concepts and properties but can also include instance matching. For this purpose, one would only need to add a hidden predicate modeling instance correspondences. The resulting matching approach would immediately benefit from probabilistic joint inference, taking into account the interdependencies between terminological and instance correspondences.
Acknowledgments
Many thanks to Sebastian Riedel for helping us tame TheBeast and for his valuable feedback.
References
Albagli, S.; Ben-Eliyahu-Zohary, R.; and Shimony, S. E.
2009. Markov network based ontology matching. In Pro-
ceedings of the International Joint Conference on Artificial
Intelligence, 1884–1889.
Cruz, I. F.; Palandri Antonelli, F.; and Stroe, C. 2009. Efficient selection of mappings and automatic quality-driven combination of matching methods. In Proceedings of the ISWC 2009 Workshop on Ontology Matching.
Ding, L., and Finin, T. 2006. Characterizing the seman-
tic web on the web. In Proceedings of the International
Semantic Web Conference 2006, 242–257.
Domingos, P.; Lowd, D.; Kok, S.; Poon, H.; Richardson,
M.; and Singla, P. 2008. Just add weights: Markov logic
for the semantic web. In Proceedings of the Workshop on
Uncertain Reasoning for the Semantic Web, 1–25.
Euzenat, J., and Shvaiko, P. 2007. Ontology matching.
Springer-Verlag.
Euzenat, J., et al. 2009. Results of the ontology alignment evaluation initiative 2009. In Proceedings of the Workshop on Ontology Matching.
Levenshtein, V. I. 1965. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR 845–848.
Meilicke, C., and Stuckenschmidt, H. 2007. Analyz-
ing mapping extraction approaches. In Proceedings of the
Workshop on Ontology Matching.
Meilicke, C., and Stuckenschmidt, H. 2009. An efficient
method for computing alignment diagnoses. In Proceed-
ings of the International Conference on Web Reasoning
and Rule Systems, 182–196.
Meilicke, C.; Tamilin, A.; and Stuckenschmidt, H. 2007.
Repairing ontology mappings. In Proceedings of the Con-
ference on Artificial Intelligence, 1408–1413.
Melnik, S.; Garcia-Molina, H.; and Rahm, E. 2002. Sim-
ilarity flooding: A versatile graph matching algorithm and
its application to schema matching. In Proceedings of
ICDE, 117–128.
Meza-Ruiz, I., and Riedel, S. 2009. Multilingual semantic
role labelling with markov logic. In Proceedings of the
Conference on Computational Natural Language Learn-
ing, 85–90.
Richardson, M., and Domingos, P. 2006. Markov logic
networks. Machine Learning 62(1-2):107–136.
Riedel, S. 2008. Improving the accuracy and efficiency of
map inference for markov logic. In Proceedings of UAI,
468–475.
Roth, D., and Yih, W. 2005. Integer linear programming
inference for conditional random fields. In Proceedings of
ICML, 736–743.
Schrijver, A. 1998. Theory of Linear and Integer Program-
ming. Wiley & Sons.
Sirin, E.; Parsia, B.; Grau, B. C.; Kalyanpur, A.; and Katz,
Y. 2007. Pellet: a practical OWL-DL reasoner. Journal of
Web Semantics 5(2):51–53.
Svab, O.; Svatek, V.; Berka, P.; Rak, D.; and Tomasek, P.
2005. Ontofarm: Towards an experimental collection of
parallel ontologies. In Poster Track of ISWC.
Taskar, B.; Chatalbashev, V.; Koller, D.; and Guestrin, C.
2005. Learning structured prediction models: a large mar-
gin approach. In Proceedings of ICML, 896–903.
Wu, F., and Weld, D. S. 2008. Automatically refining the
wikipedia infobox ontology. In Proceeding of the Interna-
tional World Wide Web Conference, 635–644.