A Probabilistic-Logical Framework for Ontology Matching
Mathias Niepert, Christian Meilicke, Heiner Stuckenschmidt
KR & KM Research Group, Universität Mannheim, Germany
{mathias, christian, heiner}@informatik.uni-mannheim.de
Abstract
Ontology matching is the problem of determining corre-
spondences between concepts, properties, and individ-
uals of different heterogeneous ontologies. In this
paper we present a novel probabilistic-logical frame-
work for ontology matching based on Markov logic.
We define the syntax and semantics and provide a for-
malization of the ontology matching problem within the
framework. The approach has several advantages over
existing methods such as ease of experimentation, inco-
herence mitigation during the alignment process, and
the incorporation of a-priori confidence values. We
show empirically that the approach is efficient and more
accurate than existing matchers on an established ontol-
ogy alignment benchmark dataset.
Introduction
Ontology matching, or ontology alignment, is the problem
of determining correspondences between concepts, proper-
ties, and individuals of two or more different formal on-
tologies (Euzenat and Shvaiko 2007). The alignment of
ontologies enables the knowledge and data expressed in
the matched ontologies to interoperate. A major insight of
the ontology alignment evaluation initiative (OAEI) (Jérôme
Euzenat et al. 2009) is that there is no best method or system
for all existing matching problems. The factors influencing
the quality of alignments range from differences in lexical
similarity measures to variations in alignment extraction ap-
proaches. This result justifies not only the OAEI itself but
also the need for a framework that facilitates the comparison
of different strategies in a straightforward and transparent
manner. To ensure comparability of different matching approaches,
such a framework needs a number of characteristics.
In particular, it should feature
•a unified syntax that supports the specification of differ-
ent approaches in the same language to isolate meaningful
methodological variations and ensure that only the effects
of known variations are observed;
•a well-defined semantics that guarantees that matching
conditions are interpreted uniformly and that outcome
variations are not merely a result of different implemen-
tations of identical features;
•a testbed for a wide range of techniques used in ontology
matching including the use of soft and hard evidence such
as string similarities (soft) and logical consistency of the
result (hard); and
•support for the experimental comparison and standardized
evaluation of techniques on existing benchmarks.
Based on these considerations, we argue that Markov
logic (Richardson and Domingos 2006) provides an excellent
framework for ontology matching. Markov logic
(ML) offers several advantages over existing matching ap-
proaches. Its main strength is rooted in the ability to com-
bine soft and hard first-order formulae. This allows the
inclusion of both known logical statements and uncertain
formulae modeling potential correspondences and structural
properties of the ontologies. For instance, hard formulae can
reduce incoherence during the alignment process while soft
formulae can factor in a-priori confidence values for corre-
spondences. An additional advantage of ML is joint infer-
ence, that is, the inference of two or more interdependent
hidden predicates. Several results show that joint inference
is superior in accuracy when applied to a wide range of prob-
lems such as ontology refinement (Wu and Weld 2008) and
multilingual semantic role labeling (Meza-Ruiz and Riedel
2009). Furthermore, probabilistic approaches to ontology
matching have recently produced competitive matching re-
sults (Albagli, Ben-Eliyahu-Zohary, and Shimony 2009).
In this paper, we present a framework for ontology match-
ing based on the syntax and semantics of Markov logic, in
the spirit of a tool-box, allowing users to specify and com-
bine different individual matching strategies. In particular
•we describe how several typical matching approaches are
captured by the framework;
•we show how these approaches can be aggregated in
a modular manner, jointly increasing the quality of the
alignments; and
•we compare our framework to state-of-the-art matching
systems and verify empirically that the combination of
three matching strategies leads to alignments that are
more accurate than those generated by any of the mono-
lithic matching systems.
The paper is structured as follows. First, we briefly define
ontology matching and introduce a running example that is
used throughout the paper. We then introduce the syntax and
semantics of the ML framework and show that it can repre-
sent numerous different matching approaches. We describe
probabilistic reasoning in the framework of Markov logic
and show that a solution to a given matching problem can be
obtained by solving the maximum a-posteriori (MAP) prob-
lem of a ground Markov logic network using integer linear
programming. We then report the results of an empirical
evaluation of our method using OAEI benchmark datasets.
We conclude with a set of insights gained from the experi-
ments and some ideas for future research.
Ontology Matching
Ontology matching is the process of detecting links between
entities in heterogeneous ontologies. Based on a definition
by Euzenat and Shvaiko (Euzenat and Shvaiko 2007), we
formally introduce the notion of correspondence and align-
ment to refer to these links.
Definition 1 (Correspondence and Alignment). Given ontologies O1 and O2, let q be a function that defines sets of matchable entities q(O1) and q(O2). A correspondence between O1 and O2 is a triple ⟨e1, e2, r⟩ such that e1 ∈ q(O1), e2 ∈ q(O2), and r is a semantic relation. An alignment between O1 and O2 is a set of correspondences between O1 and O2.
The generic form of Definition 1 captures a wide range of
correspondences by varying what is admissible as match-
able element and semantic relation. In the following we
are only interested in equivalence correspondences between
concepts and properties. In the first step of the alignment
process most matching systems compute a-priori similarities
between matching candidates. These values are typically re-
fined in later phases of the matching process. The underly-
ing assumption is that the degree of similarity is indicative
of the likelihood that two entities are equivalent. Given two
matchable entities e1 and e2, we write σ(e1, e2) to refer to
this kind of a-priori similarity. Before presenting the formal
matching framework, we motivate the approach by a simple
instance of an ontology matching problem which we use as
a running example throughout the paper.
Example 2. Figure 1 depicts fragments of two ontologies
describing the domain of scientific conferences. The follow-
ing axioms are part of ontologies O1 and O2, respectively.
O1:
∃hasWritten ⊑ Reviewer
PaperReview ⊑ Document
Reviewer ⊑ Person
Submission ⊑ Document
Document ⊑ ¬Person

O2:
∃writtenBy ⊑ Paper
Review ⊑ Documents
Paper ⊑ Documents
Author ⊑ Agent
Paper ⊑ ¬Review
If we apply a similarity measure σ based on the Levenshtein distance (Levenshtein 1965), there are four pairs of entities such that σ(e1, e2) > 0.5:

σ(Document, Documents) = 0.88   (1)
σ(Reviewer, Review) = 0.75   (2)
σ(hasWritten, writtenBy) = 0.7   (3)
σ(PaperReview, Review) = 0.54   (4)
[Figure 1: Example ontology fragments. O1 contains the concepts Person (a1), Document (b1), Reviewer (c1), PaperReview (d1), and Submission (e1) and the property hasWritten (p1); O2 contains the concepts Agent (a2), Documents (b2), Author (c2), Paper (d2), and Review (e2) and the property writtenBy (p2). Edges denote subsumption and disjointness; the labels a1–e1, a2–e2, p1, p2 are the constants used in Example 3.]
The alignment consisting of these four correspondences con-
tains two correct (1 & 4) and two incorrect (2 & 3) corre-
spondences resulting in a precision of 50%.
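A minimal sketch of this kind of a-priori similarity, assuming the common normalization 1 − distance/max(length) (the paper does not state its exact normalization and preprocessing, so the printed values only approximate those of Example 2):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (Levenshtein 1965).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def sigma(e1: str, e2: str) -> float:
    # Normalized similarity in [0, 1] after lowercasing the entity names.
    s, t = e1.lower(), e2.lower()
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))

print(round(sigma("Document", "Documents"), 2))   # 0.89 (0.88 in Example 2)
print(round(sigma("Reviewer", "Review"), 2))      # 0.75
print(round(sigma("PaperReview", "Review"), 2))   # 0.55 (0.54 in Example 2)
```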
Markov Logic and Ontology Matching
Markov logic combines first-order logic and undirected
probabilistic graphical models (Richardson and Domingos
2006). A Markov logic network (MLN) is a set of first-order
formulae with weights. The more evidence we have that
a formula is true the higher the weight of this formula. It
has been proposed as a possible approach to several prob-
lems occurring in the context of the semantic web (Domin-
gos et al. 2008). We argue that Markov logic provides
an excellent framework for ontology matching as it cap-
tures both hard logical axioms and soft uncertain statements
about potential correspondences between ontological enti-
ties. The probabilistic-logical framework we propose for on-
tology matching essentially adapts the syntax and semantics
of Markov logic. However, we always type predicates and
we require a strict distinction between hard and soft formu-
lae as well as hidden and observable predicates.
Syntax
A signature is a 4-tuple S = (O, H, C, U) with O a finite set of typed observable predicate symbols, H a finite set of typed hidden predicate symbols, C a finite set of typed constants, and U a finite set of function symbols. In the context of ontology matching, constants correspond to ontological entities such as concepts and properties, and predicates model relationships between these entities such as disjointness, subsumption, and equivalence. A Markov logic network (MLN) is a pair (Fh, Fs) where Fh is a set of first-order formulae built using predicates from O ∪ H, and Fs is a set of pairs (Fi, wi) with each Fi being a first-order formula built using predicates from O ∪ H and each wi ∈ R a real-valued weight associated with formula Fi. Note how we explicitly distinguish between hard formulae Fh and soft formulae Fs.
Semantics
Let M = (Fh, Fs) be a Markov logic network with signature S = (O, H, C, U). A grounding of a first-order formula F is generated by substituting each occurrence of every variable in F with constants in C of compatible type. Existentially quantified formulae are substituted by the disjunctions of their groundings over the finite set of constants. A formula that does not contain any variables is ground, and a formula that consists of a single predicate is an atom. Markov logic makes several assumptions, such as (a) different constants refer to different objects and (b) the only objects in the domain are those representable using the constants (Richardson and Domingos 2006). For the ML framework, we only consider formulae with universal quantifiers at the outermost level. A set of ground atoms is a possible world. We say that a possible world W satisfies a formula F, and write W ⊨ F, if F is true in W. Let G^C_F be the set of all possible groundings of formula F with respect to C. We say that W satisfies G^C_F, and write W ⊨ G^C_F, if W satisfies every formula in G^C_F.
Let 𝒲 be the set of all possible worlds with respect to S. Then, the probability of a possible world W is given by

p(W) = (1/Z) exp( Σ_{(Fi, wi) ∈ Fs}  Σ_{g ∈ G^C_Fi : W ⊨ g}  wi )

if W ⊨ G^C_F for all F ∈ Fh, and p(W) = 0 otherwise. Here, Z is a normalization constant.
In the context of ontology matching, possible worlds corre-
spond to possible alignments and the goal is to determine the
most probable alignment given the evidence. Note that sev-
eral existing methods have sought to maximize the sum of
confidence values subject to constraints enforcing the align-
ments to be, for instance, one-to-one and functional. The
given probabilistic semantics unifies these approaches in a
coherent theoretical framework.
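To make this semantics concrete, the following sketch computes the unnormalized log-linear score the distribution above assigns to a possible world. The encoding of ground formulae as Boolean functions over a set of ground-atom strings is chosen here purely for illustration:

```python
import math
from typing import Callable, Iterable, Set, Tuple

# A ground formula is modeled as a function deciding its truth in a world,
# where a world is the set of ground atoms that are true.
GroundFormula = Callable[[Set[str]], bool]

def world_score(world: Set[str],
                hard: Iterable[GroundFormula],
                soft: Iterable[Tuple[GroundFormula, float]]) -> float:
    # Worlds violating a hard ground formula have probability zero (log-score -inf).
    if not all(f(world) for f in hard):
        return -math.inf
    # Otherwise the unnormalized log-probability is the sum of the weights
    # of all satisfied soft ground formulae.
    return sum(w for f, w in soft if f(world))

# Toy usage with two a-priori weights from Example 2 and one hard ground
# formula forbidding both atoms together (as the coherence constraints below do):
soft = [(lambda w: "mc(b1,b2)" in w, 0.88), (lambda w: "mc(c1,e2)" in w, 0.75)]
hard = [lambda w: not ("mc(b1,b2)" in w and "mc(c1,e2)" in w)]
print(world_score({"mc(b1,b2)"}, hard, soft))  # 0.88
```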
Matching Formalization
Given two ontologies O1 and O2 and an initial a-priori similarity σ, we apply the following formalization. First, we introduce observable predicates O to model the structure of O1 and O2 with respect to both concepts and properties. For the sake of simplicity we use uppercase letters D, E, R to refer to individual concepts and properties in the ontologies and lowercase letters d, e, r to refer to the corresponding constants in C. In particular, we add ground atoms of observable predicates to Fh for i ∈ {1, 2} according to the following rules (due to space considerations the list is incomplete; for instance, predicates modeling range restrictions are not included):
Oi ⊨ D ⊑ E        ↦  subi(d, e)
Oi ⊨ D ⊑ ¬E       ↦  disi(d, e)
Oi ⊨ ∃R.⊤ ⊑ D     ↦  subd_i(r, d)
Oi ⊨ ∃R.⊤ ⊒ D     ↦  supd_i(r, d)
Oi ⊨ ∃R.⊤ ⊑ ¬D    ↦  disd_i(r, d)
The knowledge encoded in the ontologies is assumed to be true. Hence, the ground atoms of observable predicates are added to the set of hard constraints Fh, making them hold in every computed alignment. The hidden predicates
mc and mp, on the other hand, model the sought-after concept and property correspondences, respectively. Given the state of the observable predicates, we are interested in determining the state of the hidden predicates that maximizes the a-posteriori probability of the corresponding possible world. The ground atoms of these hidden predicates are assigned the weights specified by the a-priori similarity σ. The higher this value for a correspondence, the more likely the correspondence is correct a-priori. Hence, the following ground formulae are added to Fs:

(mc(c, d), σ(C, D)) if C and D are concepts
(mp(p, r), σ(P, R)) if P and R are properties

Notice that the distinction between mc and mp is required since we use typed predicates and distinguish between the concept and property type.
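As a minimal sketch of this construction (function and argument names are hypothetical; sigma is any a-priori similarity such as the one sketched after Example 2, and the 0.5 cut-off mirrors that example):

```python
def apriori_soft_formulae(concepts1, concepts2, props1, props2, sigma, threshold=0.5):
    # One weighted ground atom of a hidden predicate per candidate pair whose
    # a-priori similarity exceeds the threshold; the weight is the similarity itself.
    soft = []
    for c in concepts1:
        for d in concepts2:
            if sigma(c, d) > threshold:
                soft.append((("mc", c, d), sigma(c, d)))
    for p in props1:
        for r in props2:
            if sigma(p, r) > threshold:
                soft.append((("mp", p, r), sigma(p, r)))
    return soft

# With the entities of Figure 1 and the σ of Example 2, this yields the four weighted
# atoms (mc(b1,b2), 0.88), (mc(c1,e2), 0.75), (mp(p1,p2), 0.7), (mc(d1,e2), 0.54).
```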
Cardinality Constraints A method often applied in real-
world scenarios is the selection of a functional one-to-one
alignment (Cruz et al. 2009). Within the ML framework, we
can include a set of hard cardinality constraints, restricting
the alignment to be functional and one-to-one. In the
following we write x, y, z to refer to variables ranging over
the appropriately typed constants and omit the universal
quantifiers.
mc(x, y) ∧ mc(x, z) ⇒ y = z
mc(x, y) ∧ mc(z, y) ⇒ x = z
Analogously, the same formulae can be included with
hidden predicates mp, restricting the property alignment to
be one-to-one and functional.
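To illustrate what these two hard constraints enforce, here is a minimal sketch over a hypothetical representation of an alignment as a set of (O1 entity, O2 entity) pairs:

```python
def is_functional_one_to_one(alignment):
    # alignment: set of (O1 entity, O2 entity) pairs, e.g. {("b1", "b2"), ("d1", "e2")}.
    # mc(x,y) ∧ mc(x,z) ⇒ y = z : no O1 entity is matched twice (functional);
    # mc(x,y) ∧ mc(z,y) ⇒ x = z : no O2 entity is matched twice (one-to-one).
    left = [e1 for e1, _ in alignment]
    right = [e2 for _, e2 in alignment]
    return len(left) == len(set(left)) and len(right) == len(set(right))

# mc(c1,e2) and mc(d1,e2) together violate the second constraint:
print(is_functional_one_to_one({("c1", "e2"), ("d1", "e2")}))  # False
```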
Coherence Constraints Incoherence occurs when axioms
in ontologies lead to logical contradictions. Clearly, it is de-
sirable to avoid incoherence during the alignment process.
Some methods of incoherence removal for ontology align-
ments were introduced in (Meilicke, Tamilin, and Stuck-
enschmidt 2007). All existing approaches, however, re-
move correspondences after the computation of the align-
ment. Within the ML framework we can incorporate inco-
herence reducing constraints during the alignment process
for the first time. This is accomplished by adding formulae
of the following type to Fh.
dis1(x, x′) ∧ sub2(y, y′) ⇒ ¬(mc(x, y) ∧ mc(x′, y′))
disd_1(x, x′) ∧ subd_2(y, y′) ⇒ ¬(mp(x, y) ∧ mc(x′, y′))

The second formula, for example, has the following purpose. Consider properties X, Y and concepts X′, Y′, and suppose that O1 ⊨ ∃X.⊤ ⊑ ¬X′ and O2 ⊨ ∃Y.⊤ ⊑ Y′. Now, if ⟨X, Y, ≡⟩ and ⟨X′, Y′, ≡⟩ were both part of an alignment, the merged ontology would entail both ∃X.⊤ ⊑ X′ and ∃X.⊤ ⊑ ¬X′ and, therefore, ∃X.⊤ ⊑ ⊥. The specified
formula prevents this type of incoherence. It is known that
such constraints, if carefully chosen, can avoid a majority of
possible incoherences (Meilicke and Stuckenschmidt 2009).
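A minimal sketch of the check expressed by the first coherence constraint, again over the hypothetical pair-set representation of an alignment (dis1 and sub2 are given here as plain sets of entailed pairs):

```python
def violates_coherence(alignment, dis1, sub2):
    # dis1: pairs (x, x') with O1 ⊨ x ⊑ ¬x';  sub2: pairs (y, y') with O2 ⊨ y ⊑ y'.
    # The hard constraint dis1(x,x') ∧ sub2(y,y') ⇒ ¬(mc(x,y) ∧ mc(x',y'))
    # is violated whenever both correspondences of such a pattern are selected.
    for (x, x_prime) in dis1:
        for (y, y_prime) in sub2:
            if (x, y) in alignment and (x_prime, y_prime) in alignment:
                return True
    return False

# Reviewer (c1) and Document (b1) are disjoint in O1, and Review (e2) ⊑ Documents (b2)
# in O2, so mapping c1 to e2 and b1 to b2 at the same time would make Review unsatisfiable:
print(violates_coherence({("c1", "e2"), ("b1", "b2")},
                         dis1={("c1", "b1")}, sub2={("e2", "b2")}))  # True
```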
Stability Constraints Several existing approaches to
schema and ontology matching propagate alignment
evidence derived from structural relationships between
concepts and properties. These methods leverage the fact
that existing evidence for the equivalence of concepts C and D also makes it more likely that, for example, child concepts of C and child concepts of D are equivalent. One such approach to evidence propagation is similarity flooding (Melnik, Garcia-Molina, and Rahm 2002). As a reciprocal idea, the general notion of stability was introduced, expressing that an alignment should not introduce new structural knowledge (Meilicke and Stuckenschmidt 2007). The soft formula below, for instance, decreases the probability of alignments that map concept X to Y and X′ to Y′ if X′ subsumes X but Y′ does not subsume Y.

(sub1(x, x′) ∧ ¬sub2(y, y′) ⇒ mc(x, y) ∧ mc(x′, y′), w1)
(subd_1(x, x′) ∧ ¬subd_2(y, y′) ⇒ mp(x, y) ∧ mc(x′, y′), w2)

Here, w1 and w2 are negative real-valued weights, rendering alignments that satisfy the formulae possible but less likely.
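The following sketch illustrates the alignment-dependent part of this penalty for concept correspondences; groundings whose antecedent is false contribute the same constant to every world and therefore do not affect the MAP solution. The −0.5 weight is only illustrative (it matches the manually chosen value used later in the experiments), and the pair-set representation is the same hypothetical one as above:

```python
def stability_penalty(alignment, sub1, sub2, w=-0.5):
    # For every grounding where X' subsumes X in O1 (sub1) but the chosen images
    # Y and Y' are not in the corresponding subsumption relation in O2 (sub2),
    # the negative weight w is added once both mc(X,Y) and mc(X',Y') are selected.
    penalty = 0.0
    for (x, x_prime) in sub1:
        for (a, y) in alignment:
            for (b, y_prime) in alignment:
                if a == x and b == x_prime and (y, y_prime) not in sub2:
                    penalty += w
    return penalty

# PaperReview (d1) ⊑ Document (b1) in O1; mapping d1 to Review (e2) and b1 to Agent (a2)
# would introduce the new subsumption Review ⊑ Agent, which O2 does not entail:
print(stability_penalty({("d1", "e2"), ("b1", "a2")},
                        sub1={("d1", "b1")}, sub2={("e2", "b2")}))  # -0.5
```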
The presented list of cardinality, coherence, and stability
constraints is by no means exhaustive. Other constraints
could, for example, model known correct correspondences
or generalize the one-to-one alignment to m-to-n align-
ments, or a novel hidden predicate could be added mod-
eling correspondences between instances of the ontologies.
To keep the discussion of the approach simple, however, we
leave these considerations to future research.
Example 3. We apply the previous formalization to Ex-
ample 2. To keep it simple, we only use a-priori values,
cardinality, and coherence constraints. Given the two
ontologies O1 and O2 in Figure 1, and the matching hypotheses (1) to (4) from Example 2, the ground MLN would include the following relevant ground formulae. We use the concept and property labels from Figure 1 and omit ground atoms of observable predicates.

A-priori similarity:

(mc(b1, b2), 0.88), (mc(c1, e2), 0.75),
(mp(p1, p2), 0.7), (mc(d1, e2), 0.54)

Cardinality constraints:

mc(c1, e2) ∧ mc(d1, e2) ⇒ c1 = d1   (5)

Coherence constraints:

disd_1(p1, b1) ∧ subd_2(p2, b2) ⇒ ¬(mp(p1, p2) ∧ mc(b1, b2))   (6)
dis1(b1, c1) ∧ sub2(b2, e2) ⇒ ¬(mc(b1, b2) ∧ mc(c1, e2))   (7)
subd_1(p1, c1) ∧ disd_2(p2, e2) ⇒ ¬(mp(p1, p2) ∧ mc(c1, e2))   (8)
MAP Inference as Alignment Process
Hidden predicates model correspondences between entities
of the two ontologies whereas observable ones model predi-
cates occurring in typical description logic statements. If we
want to determine the most likely alignment of two given on-
tologies, we need to compute the set of ground atoms of the
hidden predicates that maximizes the probability given both
the ground atoms of observable predicates and the ground
formulae of Fh and Fs. This is an instance of MAP (maxi-
mum a-posteriori) inference in the ground Markov logic net-
work. Let O be the set of all ground atoms of observable predicates and H be the set of all ground atoms of hidden predicates, both with respect to C. Assume that we are given a set O′ ⊆ O of ground atoms of observable predicates. In order to find the most probable alignment we have to compute

argmax_{H′ ⊆ H}  Σ_{(Fi, wi) ∈ Fs}  Σ_{g ∈ G^C_Fi : O′ ∪ H′ ⊨ g}  wi

subject to O′ ∪ H′ ⊨ G^C_F for all F ∈ Fh.
Markov logic is by definition a declarative language, sep-
arating the formulation of a problem instance from the al-
gorithm used for probabilistic inference. MAP inference
in Markov logic networks is essentially equivalent to the
weighted MAX-SAT problem and, therefore, NP-hard. Sev-
eral approximate algorithms for the weighted MAX-SAT
problem exist. However, since each ground formula in Fh
must be satisfied in the computed MAP state, exact infer-
ence is required in our setting. Hence, we apply integer
linear programming (ILP) which was shown to be an effec-
tive method for exact MAP inference in undirected graph-
ical models (Roth and Yih 2005; Taskar et al. 2005) and
specifically in Markov logic networks (Riedel 2008). ILP is
concerned with optimizing a linear objective function over
a finite number of integer variables, subject to a set of lin-
ear equalities and inequalities over these variables (Schri-
jver 1998). We omit the details of the ILP representation
of a ground Markov logic network but demonstrate how the
ground formulae from Example 3 would be represented as
an ILP instance.
Example 4. Let the binary ILP variables x1, x2, x3, and x4 model the ground atoms mc(b1, b2), mc(c1, e2), mp(p1, p2), and mc(d1, e2), respectively. The ground formulae from Example 3 can be encoded with the following ILP.

Maximize:    0.88 x1 + 0.75 x2 + 0.7 x3 + 0.54 x4
Subject to:  x2 + x4 ≤ 1   (9)
             x1 + x3 ≤ 1   (10)
             x1 + x2 ≤ 1   (11)
             x2 + x3 ≤ 1   (12)
The a-priori weights of the potential correspondences are
factored in as coefficients of the objective function. Here,
the ILP constraint (9) corresponds to ground formula
(5), and ILP constraints (10), (11), and (12) correspond to the coherence ground formulae (6), (7), and (8), respectively. An optimal solution to the ILP consists of the variables x1 and x4, corresponding to the correct alignment {mc(b1, b2), mc(d1, e2)}. Compare this with the alignment {mc(b1, b2), mc(c1, e2), mp(p1, p2)}, which would be the outcome without coherence constraints.
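The toy problem is small enough to solve by exhaustive search; the following sketch enumerates all 0/1 assignments of Example 4 under its four constraints and recovers the optimum (a real ILP solver such as SCIP, as used in the experiments, would be employed for realistically sized problems):

```python
from itertools import product

weights = [0.88, 0.75, 0.70, 0.54]        # objective coefficients of x1..x4
pairs = [(1, 3), (0, 2), (0, 1), (1, 2)]  # constraints (9)-(12): x_i + x_j <= 1

best_value, best_x = -1.0, None
for x in product((0, 1), repeat=4):
    # Keep only assignments satisfying every cardinality/coherence constraint.
    if all(x[i] + x[j] <= 1 for i, j in pairs):
        value = sum(w * xi for w, xi in zip(weights, x))
        if value > best_value:
            best_value, best_x = value, x

print(best_x, best_value)  # (1, 0, 0, 1) with value ≈ 1.42, i.e. mc(b1,b2) and mc(d1,e2)
```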
Experiments
We use the Ontofarm dataset (Svab et al. 2005) as the basis for our experiments. It is the evaluation dataset for the OAEI conference track, which consists of several ontologies modeling the domain of scientific conferences (Jérôme Euzenat et al. 2009). The ontologies were designed by
different groups and, therefore, reflect different concep-
tualizations of the same domain. Reference alignments
for seven of these ontologies are made available by the
organizers. These 21 alignments contain correspondences
between concepts and properties including a reasonable
number of non-trivial cases. For the a-priori similarity σ
we decided to use a standard lexical similarity measure.
After converting the concept and object property names
to lowercase and removing delimiters and stop-words, we
applied a string similarity measure based on the Levenshtein distance. More sophisticated a-priori similarity measures could be used, but since we want to evaluate the benefits of the ML framework we strive to avoid any bias related to custom-tailored similarity measures. We applied the reasoner Pellet (Sirin et al. 2007) to create the ground MLN formulation and used TheBeast (http://code.google.com/p/thebeast/) (Riedel 2008) to convert the MLN formulations to the corresponding ILP instances. Finally, we applied the mixed integer programming solver SCIP (http://scip.zib.de/) to solve the ILP. All experiments were conducted
on a desktop PC with AMD Athlon Dual Core Processor
5400B with 2.6GHz and 1GB RAM. The software as
well as additional experimental results are available at
http://code.google.com/p/ml-match/.
The application of a threshold τis a standard technique
in ontology matching. Correspondences that match enti-
ties with high similarity are accepted while correspondences
with a similarity less than τ are deemed incorrect. We evaluated our approach with thresholds on the a-priori similarity measure σ ranging from 0.45 to 0.95. After applying the threshold τ we normalized the values to the range [0.1, 1.0]. For each pair of ontologies we computed the F1-
value, which is the harmonic mean of precision and recall,
and computed the mean of this value over all 21 pairs of
ontologies. We evaluated four different settings:
•ca: The formulation includes only cardinality constraints.
•ca+co: The formulation includes only cardinality and co-
herence constraints.
•ca+co+sm: The formulation includes cardinality, coherence, and stability constraints, and the weights of the stability constraints are determined manually. Being able to
set qualitative weights manually is crucial as training data
is often unavailable. The employed stability constraints
consist of (1) constraints that aim to guarantee the sta-
bility of the concept hierarchy, and (2) constraints that
deal with the relation between concepts and property do-
main/range restrictions. We set the weights for the first
group to −0.5 and the weights for the second group to
−0.25. This is based on the consideration that subsump-
tion axioms between concepts are specified by ontology
engineers more often than domain and range restriction
of properties (Ding and Finin 2006). Thus, a pair of two
correct correspondences will less often violate constraints
of the first type than constraints of the second type.
•ca+co+sl: The formulation also includes cardinality, coherence, and stability constraints, but the weights of the stability constraints are learned with a simple online learner using the perceptron rule. During learning we fixed the a-priori weights and learned only the weights for the stability formulae. We took 5 of the 7 ontologies and learned the weights on the 10 resulting pairs. With these weights we computed the alignment and its F1-value for the remaining pair of ontologies. This was repeated for each of the 21 possible combinations to determine the mean of the F1-values. A sketch of the perceptron update follows this list.
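The paper only states that a simple online learner with the perceptron rule was used. The following is a minimal sketch of a standard structured-perceptron style update, under the assumption that each weight is moved by the difference between the grounding counts satisfied by the reference alignment and by the predicted MAP alignment (names and the learning rate are illustrative):

```python
def perceptron_update(weights, gold_counts, map_counts, eta=0.1):
    # One online update: move each soft-formula weight towards the reference
    # alignment by the difference in satisfied-grounding counts between the
    # reference (gold) alignment and the current MAP alignment.
    return {f: w + eta * (gold_counts[f] - map_counts[f]) for f, w in weights.items()}

# If the MAP alignment satisfies a penalizing stability formula more often than
# the reference alignment does, its weight is pushed further below zero:
print(perceptron_update({"concept_stability": -0.5},
                        {"concept_stability": 2}, {"concept_stability": 5}))
# ≈ {'concept_stability': -0.8}
```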
[Figure 2: F1-values for ca, ca+co, and ca+co+sm averaged over the 21 OAEI reference alignments for thresholds ranging from 0.45 to 0.95, together with AgreementMaker at its standard and at its optimal threshold. AgreementMaker was the best performing system on the conference dataset of the latest ontology evaluation initiative in 2009.]
threshold   0.6   0.65  0.7   0.75  0.8   0.85  0.9   0.95
ca+co+sm    0.56  0.59  0.60  0.61  0.62  0.63  0.62  0.62
ca+co+sl    0.57  0.58  0.58  0.61  0.61  0.61  0.63  0.62
ca+co       0.54  0.56  0.58  0.59  0.61  0.62  0.62  0.61

Table 1: Average F1-values over the 21 OAEI reference alignments for manual weights (ca+co+sm) vs. learned weights (ca+co+sl) vs. the formulation without stability constraints (ca+co); thresholds range from 0.6 to 0.95.
The lower the threshold, the more complex the resulting ground MLN and the more time is needed to solve the corresponding ILP. The average time needed to compute one alignment was 61 seconds for τ = 0.45 and 0.5 seconds for τ = 0.85. Figure 2 depicts the average F1-values for ca,
ca+co, and ca+co+sm compared to the average F1-values
achieved by AgreementMaker (Cruz et al. 2009), the best-
performing system in the OAEI conference track of 2009.
These average F1-values of AgreementMaker were obtained
using two different thresholds. The first is the default thresh-
old of AgreementMaker and the second is the threshold at
which the average F1-value attains its maximum. The inclu-
sion of coherence constraints (ca+co) improves the average
F1-value of the alignments for low to moderate thresholds
by up to 6% compared to the ca setting. With increasing
thresholds this effect becomes weaker and is negligible for τ ≥ 0.9. This is the case because alignments generated with ca for thresholds ≥ 0.9 contain only a small number
of incorrect correspondences. The addition of stability con-
straints (ca+co+sm) increases the quality of the alignments
again by up to 6% for low to moderate thresholds. In the op-
timal configuration (ca+co+sl with τ = 0.85) we measured
an average F1-value of 0.63 which is a 7% improvement
compared to AgreementMaker’s 0.56. What is more impor-
tant to understand, however, is that our approach generates
more accurate results over a wide range of thresholds and
is therefore more robust to threshold estimation. This is ad-
vantageous since in most real-world matching scenarios the
estimation of appropriate thresholds is not possible. While
the ca setting generates F1-values > 0.57 for τ ≥ 0.75, the ca+co+sm setting generates F1-values > 0.59 for τ ≥ 0.65. Even for τ = 0.45, usually considered an inappropriate threshold choice, we measured an average F1-value of 0.51
and average precision and recall values of 0.48 and 0.60, re-
spectively. Table 1 compares the average F1-values of the
ML formulation (a) with manually set weights for the stabil-
ity constraints, (b) with learned weights for the stability con-
straints, and (c) without any stability constraints. The values
indicate that using stability constraints improves alignment
quality with both learned and manually set weights.
Discussion and Future Work
We presented a Markov logic based framework for ontology
matching capturing a wide range of matching strategies.
Since these strategies are expressed with a unified syntax
and semantics we can isolate variations and empirically
evaluate their effects. Even though we focused only on a
small subset of possible alignment strategies the results are
already quite promising. We have also successfully learned
weights for soft formulae within the framework. In cases
where training data is not available, weights set manually
by experts still result in improved alignment quality. Re-
search related to determining appropriate weights based on
structural properties of ontologies is a topic of future work.
The framework is not only useful for aligning concepts and
properties but can also include instance matching. For this
purpose, one would only need to add a hidden predicate
modeling instance correspondences. The resulting matching
approach would immediately benefit from probabilistic
joint inference, taking into account the interdependencies
between terminological and instance correspondences.
Acknowledgments
Many thanks to Sebastian Riedel for helping us tame The-
Beast and for his valuable feedback.
References
Albagli, S.; Ben-Eliyahu-Zohary, R.; and Shimony, S. E.
2009. Markov network based ontology matching. In Pro-
ceedings of the International Joint Conference on Artificial
Intelligence, 1884–1889.
Cruz, I.; Palandri Antonelli, F.; and Stroe, C. 2009. Ef-
ficient selection of mappings and automatic quality-driven
combination of matching methods. In Proceedings of the
ISWC 2009 Workshop on Ontology Matching.
Ding, L., and Finin, T. 2006. Characterizing the seman-
tic web on the web. In Proceedings of the International
Semantic Web Conference 2006, 242–257.
Domingos, P.; Lowd, D.; Kok, S.; Poon, H.; Richardson,
M.; and Singla, P. 2008. Just add weights: Markov logic
for the semantic web. In Proceedings of the Workshop on
Uncertain Reasoning for the Semantic Web, 1–25.
Euzenat, J., and Shvaiko, P. 2007. Ontology matching.
Springer-Verlag.
Jérôme Euzenat et al. 2009. Results of the ontology align-
ment evaluation initiative 2009. In Proceedings of the
Workshop on Ontology Matching.
Levenshtein, V. I. 1965. Binary codes capable of correcting
deletions, insertions, and reversals. Doklady Akademii
Nauk SSSR 845–848.
Meilicke, C., and Stuckenschmidt, H. 2007. Analyz-
ing mapping extraction approaches. In Proceedings of the
Workshop on Ontology Matching.
Meilicke, C., and Stuckenschmidt, H. 2009. An efficient
method for computing alignment diagnoses. In Proceed-
ings of the International Conference on Web Reasoning
and Rule Systems, 182–196.
Meilicke, C.; Tamilin, A.; and Stuckenschmidt, H. 2007.
Repairing ontology mappings. In Proceedings of the Con-
ference on Artificial Intelligence, 1408–1413.
Melnik, S.; Garcia-Molina, H.; and Rahm, E. 2002. Sim-
ilarity flooding: A versatile graph matching algorithm and
its application to schema matching. In Proceedings of
ICDE, 117–128.
Meza-Ruiz, I., and Riedel, S. 2009. Multilingual semantic
role labelling with markov logic. In Proceedings of the
Conference on Computational Natural Language Learn-
ing, 85–90.
Richardson, M., and Domingos, P. 2006. Markov logic
networks. Machine Learning 62(1-2):107–136.
Riedel, S. 2008. Improving the accuracy and efficiency of
map inference for markov logic. In Proceedings of UAI,
468–475.
Roth, D., and Yih, W. 2005. Integer linear programming
inference for conditional random fields. In Proceedings of
ICML, 736–743.
Schrijver, A. 1998. Theory of Linear and Integer Program-
ming. Wiley & Sons.
Sirin, E.; Parsia, B.; Grau, B. C.; Kalyanpur, A.; and Katz,
Y. 2007. Pellet: a practical OWL-DL reasoner. Journal of
Web Semantics 5(2):51–53.
Svab, O.; Svatek, V.; Berka, P.; Rak, D.; and Tomasek, P.
2005. Ontofarm: Towards an experimental collection of
parallel ontologies. In Poster Track of ISWC.
Taskar, B.; Chatalbashev, V.; Koller, D.; and Guestrin, C.
2005. Learning structured prediction models: a large mar-
gin approach. In Proceedings of ICML, 896–903.
Wu, F., and Weld, D. S. 2008. Automatically refining the
wikipedia infobox ontology. In Proceedings of the Interna-
tional World Wide Web Conference, 635–644.