A Probabilistic-Logical Framework for Ontology Matching

Mathias Niepert, Christian Meilicke, Heiner Stuckenschmidt

KR & KM Research Group, Universität Mannheim, Germany

{mathias, christian, heiner}@informatik.uni-mannheim.de

Abstract

Ontology matching is the problem of determining correspondences between concepts, properties, and individuals of different heterogeneous ontologies. With this paper we present a novel probabilistic-logical framework for ontology matching based on Markov logic. We define the syntax and semantics and provide a formalization of the ontology matching problem within the framework. The approach has several advantages over existing methods such as ease of experimentation, incoherence mitigation during the alignment process, and the incorporation of a-priori confidence values. We show empirically that the approach is efficient and more accurate than existing matchers on an established ontology alignment benchmark dataset.

Introduction

Ontology matching, or ontology alignment, is the problem of determining correspondences between concepts, properties, and individuals of two or more different formal ontologies (Euzenat and Shvaiko 2007). The alignment of ontologies enables the knowledge and data expressed in the matched ontologies to interoperate. A major insight of the Ontology Alignment Evaluation Initiative (OAEI) (Euzenat et al. 2009) is that there is no best method or system for all existing matching problems. The factors influencing the quality of alignments range from differences in lexical similarity measures to variations in alignment extraction approaches. This result justifies not only the OAEI itself but also the need for a framework that facilitates the comparison of different strategies in a straightforward and transparent manner. To ensure comparability of different matching approaches such a framework would need a number of characteristics. In particular it should feature

• a unified syntax that supports the specification of different approaches in the same language to isolate meaningful methodological variations and ensure that only the effects of known variations are observed;

• a well-defined semantics that guarantees that matching conditions are interpreted uniformly and that outcome variations are not merely a result of different implementations of identical features;

• a testbed for a wide range of techniques used in ontology matching including the use of soft and hard evidence such as string similarities (soft) and logical consistency of the result (hard); and

• support for the experimental comparison and standardized evaluation of techniques on existing benchmarks.

Copyright © 2010, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Based on these considerations, we argue that Markov logic (Richardson and Domingos 2006) provides an excellent framework for ontology matching. Markov logic (ML) offers several advantages over existing matching approaches. Its main strength is rooted in the ability to combine soft and hard first-order formulae. This allows the inclusion of both known logical statements and uncertain formulae modeling potential correspondences and structural properties of the ontologies. For instance, hard formulae can reduce incoherence during the alignment process while soft formulae can factor in a-priori confidence values for correspondences. An additional advantage of ML is joint inference, that is, the inference of two or more interdependent hidden predicates. Several results show that joint inference is superior in accuracy when applied to a wide range of problems such as ontology refinement (Wu and Weld 2008) and multilingual semantic role labeling (Meza-Ruiz and Riedel 2009). Furthermore, probabilistic approaches to ontology matching have recently produced competitive matching results (Albagli, Ben-Eliyahu-Zohary, and Shimony 2009).

In this paper, we present a framework for ontology matching based on the syntax and semantics of Markov logic, in the spirit of a tool-box, allowing users to specify and combine different individual matching strategies. In particular

• we describe how several typical matching approaches are captured by the framework;

• we show how these approaches can be aggregated in a modular manner, jointly increasing the quality of the alignments; and

• we compare our framework to state-of-the-art matching systems and verify empirically that the combination of three matching strategies leads to alignments that are more accurate than those generated by any of the monolithic matching systems.

The paper is structured as follows. First, we briefly define ontology matching and introduce a running example that is used throughout the paper. We then introduce the syntax and semantics of the ML framework and show that it can represent numerous different matching approaches. We describe probabilistic reasoning in the framework of Markov logic and show that a solution to a given matching problem can be obtained by solving the maximum a-posteriori (MAP) problem of a ground Markov logic network using integer linear programming. We then report the results of an empirical evaluation of our method using OAEI benchmark datasets. We conclude with a set of insights gained from the experiments and some ideas for future research.

Ontology Matching

Ontology matching is the process of detecting links between entities in heterogeneous ontologies. Based on a definition by Euzenat and Shvaiko (2007), we formally introduce the notion of correspondence and alignment to refer to these links.

Definition 1 (Correspondence and Alignment). Given ontologies $O_1$ and $O_2$, let $q$ be a function that defines sets of matchable entities $q(O_1)$ and $q(O_2)$. A correspondence between $O_1$ and $O_2$ is a triple $\langle e_1, e_2, r \rangle$ such that $e_1 \in q(O_1)$, $e_2 \in q(O_2)$, and $r$ is a semantic relation. An alignment between $O_1$ and $O_2$ is a set of correspondences between $O_1$ and $O_2$.

The generic form of Definition 1 captures a wide range of correspondences by varying what is admissible as matchable element and semantic relation. In the following we are only interested in equivalence correspondences between concepts and properties. In the first step of the alignment process most matching systems compute a-priori similarities between matching candidates. These values are typically refined in later phases of the matching process. The underlying assumption is that the degree of similarity is indicative of the likelihood that two entities are equivalent. Given two matchable entities $e_1$ and $e_2$ we write $\sigma(e_1, e_2)$ to refer to this kind of a-priori similarity. Before presenting the formal matching framework, we motivate the approach by a simple instance of an ontology matching problem which we use as a running example throughout the paper.

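Example 2. Figure 1 depicts fragments of two ontologies describing the domain of scientific conferences. The following axioms are part of ontology $O_1$ and $O_2$, respectively.

\[
\begin{array}{ll}
O_1 & O_2 \\
\hline
\exists \mathit{hasWritten} \sqsubseteq \mathit{Reviewer} & \exists \mathit{writtenBy} \sqsubseteq \mathit{Paper} \\
\mathit{PaperReview} \sqsubseteq \mathit{Document} & \mathit{Review} \sqsubseteq \mathit{Documents} \\
\mathit{Reviewer} \sqsubseteq \mathit{Person} & \mathit{Paper} \sqsubseteq \mathit{Documents} \\
\mathit{Submission} \sqsubseteq \mathit{Document} & \mathit{Author} \sqsubseteq \mathit{Agent} \\
\mathit{Document} \sqsubseteq \neg \mathit{Person} & \mathit{Paper} \sqsubseteq \neg \mathit{Review}
\end{array}
\]

If we apply a similarity measure $\sigma$ based on the Levenshtein distance (Levenshtein 1965) there are four pairs of entities such that $\sigma(e_1, e_2) > 0.5$.

\begin{align}
\sigma(\mathit{Document}, \mathit{Documents}) &= 0.88 \tag{1} \\
\sigma(\mathit{Reviewer}, \mathit{Review}) &= 0.75 \tag{2} \\
\sigma(\mathit{hasWritten}, \mathit{writtenBy}) &= 0.7 \tag{3} \\
\sigma(\mathit{PaperReview}, \mathit{Review}) &= 0.54 \tag{4}
\end{align}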

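[Figure 1: Example ontology fragments. Ontology $O_1$ contains the concepts Person ($a_1$), Document ($b_1$), Reviewer ($c_1$), PaperReview ($d_1$), and Submission ($e_1$) and the property hasWritten ($p_1$). Ontology $O_2$ contains the concepts Agent ($a_2$), Documents ($b_2$), Author ($c_2$), Paper ($d_2$), and Review ($e_2$) and the property writtenBy ($p_2$). The legend distinguishes concepts, properties, subsumption, and disjointness.]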

The alignment consisting of these four correspondences con-

tains two correct (1 & 4) and two incorrect (2 & 3) corre-

spondences resulting in a precision of 50%.
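A similarity of this kind can be sketched in a few lines of Python. The normalization used here, one minus the edit distance divided by the length of the longer string, is an assumption, since the text only says that $\sigma$ is based on the Levenshtein distance; the function names are illustrative. Under this assumption the values for pairs (1), (2), and (4) come out close to those listed above, whereas the value for pair (3) is not reproduced and evidently depends on details that are not given here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


pairs = [("Document", "Documents"), ("Reviewer", "Review"), ("PaperReview", "Review")]
for pair in pairs:
    print(pair, round(similarity(*pair), 3))
```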

Markov Logic and Ontology Matching

Markov logic combines first-order logic and undirected probabilistic graphical models (Richardson and Domingos 2006). A Markov logic network (MLN) is a set of first-order formulae with weights. The more evidence we have that a formula is true the higher the weight of this formula. Markov logic has been proposed as a possible approach to several problems occurring in the context of the semantic web (Domingos et al. 2008). We argue that Markov logic provides an excellent framework for ontology matching as it captures both hard logical axioms and soft uncertain statements about potential correspondences between ontological entities. The probabilistic-logical framework we propose for ontology matching essentially adapts the syntax and semantics of Markov logic. However, we always type predicates and we require a strict distinction between hard and soft formulae as well as hidden and observable predicates.

Syntax

A signature is a 4-tuple $S = (O, H, C, U)$ with $O$ a finite set of typed observable predicate symbols, $H$ a finite set of typed hidden predicate symbols, $C$ a finite set of typed constants, and $U$ a finite set of function symbols. In the context of ontology matching, constants correspond to ontological entities such as concepts and properties, and predicates model relationships between these entities such as disjointness, subsumption, and equivalence. A Markov logic network (MLN) is a pair $(\mathcal{F}^h, \mathcal{F}^s)$ where $\mathcal{F}^h$ is a set $\{F_i^h\}$ of first-order formulae built using predicates from $O \cup H$ and $\mathcal{F}^s$ is a set of pairs $\{(F_i, w_i)\}$ with each $F_i$ being a first-order formula built using predicates from $O \cup H$ and each $w_i \in \mathbb{R}$ a real-valued weight associated with formula $F_i$. Note how we explicitly distinguish between hard formulae $\mathcal{F}^h$ and soft formulae $\mathcal{F}^s$.

Semantics

Let $M = (\mathcal{F}^h, \mathcal{F}^s)$ be a Markov logic network with signature $S = (O, H, C, U)$. A grounding of a first-order formula $F$ is generated by substituting each occurrence of every variable in $F$ with constants in $C$ of compatible type. Existentially quantified formulae are substituted by the disjunctions of their groundings over the finite set of constants. A formula that does not contain any variables is ground and a formula that consists of a single predicate is an atom. Markov logic makes several assumptions such as (a) different constants refer to different objects and (b) the only objects in the domain are those representable using the constants (Richardson and Domingos 2006). For the ML framework, we only consider formulae with universal quantifiers at the outermost level. A set of ground atoms is a possible world. We say that a possible world $W$ satisfies a formula $F$, and write $W \models F$, if $F$ is true in $W$. Let $G_F^C$ be the set of all possible groundings of formula $F$ with respect to $C$. We say that $W$ satisfies $G_F^C$, and write $W \models G_F^C$, if $W$ satisfies every formula in $G_F^C$. Let $\mathcal{W}$ be the set of all possible worlds with respect to $S$. Then, the probability of a possible world $W$ is given by

\[
p(W) = \frac{1}{Z} \exp\left( \sum_{(F_i, w_i) \in \mathcal{F}^s} \; \sum_{g \in G_{F_i}^C :\, W \models g} w_i \right),
\]

if for all $F \in \mathcal{F}^h$: $W \models G_F^C$; and $p(W) = 0$ otherwise. Here, $Z$ is a normalization constant.

In the context of ontology matching, possible worlds correspond to possible alignments and the goal is to determine the most probable alignment given the evidence. Note that several existing methods have sought to maximize the sum of confidence values subject to constraints enforcing the alignments to be, for instance, one-to-one and functional. The given probabilistic semantics unifies these approaches in a coherent theoretical framework.
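The probability of a possible world can be made concrete with a very small ground network. The following Python sketch is an illustration only: the two atoms `a` and `b`, their weights, and the single hard ground formula are hypothetical and chosen so that all possible worlds can be enumerated by hand. It computes the normalization constant $Z$ by brute force, which is feasible only for such toy cases.

```python
import math
from itertools import chain, combinations

# Hypothetical toy network: two hidden ground atoms with soft unit formulae,
# and one hard ground formula forbidding both atoms from holding together.
soft = {"a": 0.88, "b": 0.75}                      # ground atom -> weight w_i
hard = [lambda world: not ({"a", "b"} <= world)]   # hard ground formulae


def worlds(atoms):
    """Enumerate every subset of the ground atoms (every possible world)."""
    atoms = list(atoms)
    subsets = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1))
    return [frozenset(s) for s in subsets]


def unnormalized(world):
    """exp(sum of weights of satisfied soft groundings); 0 if a hard formula fails."""
    if not all(constraint(world) for constraint in hard):
        return 0.0
    return math.exp(sum(w for atom, w in soft.items() if atom in world))


all_worlds = worlds(soft)
Z = sum(unnormalized(w) for w in all_worlds)       # normalization constant
p = {w: unnormalized(w) / Z for w in all_worlds}

for world, prob in sorted(p.items(), key=lambda item: -item[1]):
    print(sorted(world), round(prob, 3))
```

The world containing both atoms receives probability 0 because it violates the hard formula; among the remaining worlds, the one containing only the higher-weighted atom is the most probable.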

Matching Formalization

Given two ontologies $O_1$ and $O_2$ and an initial a-priori similarity $\sigma$ we apply the following formalization. First, we introduce observable predicates $O$ to model the structure of $O_1$ and $O_2$ with respect to both concepts and properties. For the sake of simplicity we use uppercase letters $D, E, R$ to refer to individual concepts and properties in the ontologies and lowercase letters $d, e, r$ to refer to the corresponding constants in $C$. In particular, we add ground atoms of observable predicates to $\mathcal{F}^h$ for $i \in \{1, 2\}$ according to the following rules¹:

\[
\begin{array}{lcl}
O_i \models D \sqsubseteq E & \mapsto & \mathit{sub}_i(d, e) \\
O_i \models D \sqsubseteq \neg E & \mapsto & \mathit{dis}_i(d, e) \\
O_i \models \exists R.\top \sqsubseteq D & \mapsto & \mathit{sub}_i^d(r, d) \\
O_i \models \exists R.\top \sqsupseteq D & \mapsto & \mathit{sup}_i^d(r, d) \\
O_i \models \exists R.\top \sqsubseteq \neg D & \mapsto & \mathit{dis}_i^d(r, d)
\end{array}
\]

¹ Due to space considerations the list is incomplete. For instance, predicates modeling range restrictions are not included.

The knowledge encoded in the ontologies is assumed to be true. Hence, the ground atoms of observable predicates are added to the set of hard constraints $\mathcal{F}^h$, making them hold in every computed alignment. The hidden predicates $m_c$ and $m_p$, on the other hand, model the sought-after concept and property correspondences, respectively. Given the state of the observable predicates, we are interested in determining the state of the hidden predicates that maximizes the a-posteriori probability of the corresponding possible world. The ground atoms of these hidden predicates are assigned the weights specified by the a-priori similarity $\sigma$. The higher this value for a correspondence the more likely the correspondence is correct a-priori. Hence, the following ground formulae are added to $\mathcal{F}^s$:

\[
\begin{array}{ll}
(m_c(c, d), \sigma(C, D)) & \text{if $C$ and $D$ are concepts} \\
(m_p(p, r), \sigma(P, R)) & \text{if $P$ and $R$ are properties}
\end{array}
\]

Notice that the distinction between $m_c$ and $m_p$ is required since we use typed predicates and distinguish between the concept and property type.

Cardinality Constraints. A method often applied in real-world scenarios is the selection of a functional one-to-one alignment (Cruz et al. 2009). Within the ML framework, we can include a set of hard cardinality constraints, restricting the alignment to be functional and one-to-one. In the following we write $x, y, z$ to refer to variables ranging over the appropriately typed constants and omit the universal quantifiers.

\[
\begin{aligned}
m_c(x, y) \wedge m_c(x, z) &\Rightarrow y = z \\
m_c(x, y) \wedge m_c(z, y) &\Rightarrow x = z
\end{aligned}
\]

Analogously, the same formulae can be included with hidden predicates $m_p$, restricting the property alignment to be one-to-one and functional.
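As a small illustration, the two cardinality formulae can be checked for a candidate alignment as follows. This is a sketch with an illustrative function name, representing an alignment simply as a set of (source, target) pairs; it is not part of the framework's implementation.

```python
def is_one_to_one(alignment):
    """Check both cardinality constraints for a set of (source, target) pairs."""
    pairs = set(alignment)                            # ignore duplicate pairs
    sources = [source for source, _ in pairs]
    targets = [target for _, target in pairs]
    functional = len(sources) == len(set(sources))    # m(x,y) and m(x,z) imply y = z
    injective = len(targets) == len(set(targets))     # m(x,y) and m(z,y) imply x = z
    return functional and injective


print(is_one_to_one({("b1", "b2"), ("d1", "e2")}))    # no entity is used twice
print(is_one_to_one({("c1", "e2"), ("d1", "e2")}))    # e2 is matched twice
```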

Coherence Constraints. Incoherence occurs when axioms in ontologies lead to logical contradictions. Clearly, it is desirable to avoid incoherence during the alignment process. Some methods of incoherence removal for ontology alignments were introduced in (Meilicke, Tamilin, and Stuckenschmidt 2007). All existing approaches, however, remove correspondences after the computation of the alignment. Within the ML framework we can incorporate incoherence reducing constraints during the alignment process for the first time. This is accomplished by adding formulae of the following type to $\mathcal{F}^h$.

\[
\begin{aligned}
\mathit{dis}_1(x, x') \wedge \mathit{sub}_2(y, y') &\Rightarrow \neg(m_c(x, y) \wedge m_c(x', y')) \\
\mathit{dis}_1^d(x, x') \wedge \mathit{sub}_2^d(y, y') &\Rightarrow \neg(m_p(x, y) \wedge m_c(x', y'))
\end{aligned}
\]

The second formula, for example, has the following purpose. Given properties $X, Y$ and concepts $X', Y'$, suppose that $O_1 \models \exists X.\top \sqsubseteq \neg X'$ and $O_2 \models \exists Y.\top \sqsubseteq Y'$. Now, if $\langle X, Y, \equiv \rangle$ and $\langle X', Y', \equiv \rangle$ were both part of an alignment the merged ontology would entail both $\exists X.\top \sqsubseteq X'$ and $\exists X.\top \sqsubseteq \neg X'$ and, therefore, $\exists X.\top \sqsubseteq \bot$. The specified formula prevents this type of incoherence. It is known that such constraints, if carefully chosen, can avoid a majority of possible incoherences (Meilicke and Stuckenschmidt 2009).
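Grounding such a coherence formula amounts to pairing every observable atom of the first ontology with every observable atom of the second and recording which two candidate correspondences may not be selected together. The Python sketch below illustrates this under simplifying assumptions: observable atoms are given as plain sets of pairs, typing is ignored, and the function name is hypothetical. The example call uses the atoms of ground formula (6) in Example 3.

```python
from itertools import product


def coherence_conflicts(dis1, sub2, candidates):
    """Return pairs of candidate correspondences that must not co-occur.

    dis1: pairs (x, x2) for which a dis-type atom holds in the first ontology.
    sub2: pairs (y, y2) for which a sub-type atom holds in the second ontology.
    candidates: candidate correspondences (entity of O1, entity of O2).
    """
    candidates = set(candidates)
    conflicts = set()
    for (x, x2), (y, y2) in product(dis1, sub2):
        first, second = (x, y), (x2, y2)
        # A hard constraint is only relevant if both correspondences are candidates.
        if first in candidates and second in candidates:
            conflicts.add((first, second))
    return conflicts


candidates = {("p1", "p2"), ("b1", "b2"), ("c1", "e2"), ("d1", "e2")}
conflicts = coherence_conflicts({("p1", "b1")}, {("p2", "b2")}, candidates)
print(conflicts)
```

Each returned pair corresponds to one ground formula of the form $\neg(m(\cdot) \wedge m(\cdot))$ whose antecedent is already known to hold.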

Stability Constraints. Several existing approaches to schema and ontology matching propagate alignment evidence derived from structural relationships between concepts and properties. These methods leverage the fact that existing evidence for the equivalence of concepts $C$ and $D$ also makes it more likely that, for example, child concepts of $C$ and child concepts of $D$ are equivalent. One such approach to evidence propagation is similarity flooding (Melnik, Garcia-Molina, and Rahm 2002). As a reciprocal idea, the general notion of stability was introduced, expressing that an alignment should not introduce new structural knowledge (Meilicke and Stuckenschmidt 2007). The first soft formula below, for instance, decreases the probability of alignments that map concepts $X$ to $Y$ and $X'$ to $Y'$ if $X'$ subsumes $X$ but $Y'$ does not subsume $Y$.

\[
\begin{aligned}
&(\mathit{sub}_1(x, x') \wedge \neg \mathit{sub}_2(y, y') \Rightarrow m_c(x, y) \wedge m_c(x', y'),\; w_1) \\
&(\mathit{sub}_1^d(x, x') \wedge \neg \mathit{sub}_2^d(y, y') \Rightarrow m_p(x, y) \wedge m_c(x', y'),\; w_2)
\end{aligned}
\]

Here, $w_1$ and $w_2$ are negative real-valued weights, rendering alignments that satisfy the formulae possible but less likely.
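The effect of the first stability formula on the score of an alignment can be sketched as follows. The sketch rests on an interpretation that should be read as an assumption: groundings whose antecedent is false are satisfied in every possible world and therefore only add a constant, so the relevant contribution is the negative weight summed over all pairs of selected correspondences with $\mathit{sub}_1(x, x')$ true and $\mathit{sub}_2(y, y')$ false. The function name and the example atoms are hypothetical; the default weight of −0.5 is the value used for this group of constraints in the experiments.

```python
def stability_penalty(alignment, sub1, sub2, weight=-0.5):
    """Sum the negative weight over pairs of correspondences that break stability."""
    alignment = set(alignment)
    penalty = 0.0
    for (x, y) in alignment:
        for (x2, y2) in alignment:
            # x2 subsumes x in the first ontology, but y2 does not subsume y
            # in the second ontology: the alignment adds structural knowledge.
            if (x, x2) in sub1 and (y, y2) not in sub2:
                penalty += weight
    return penalty


alignment = {("c1", "c2"), ("a1", "a2")}
print(stability_penalty(alignment, {("c1", "a1")}, {("c2", "a2")}))  # structure preserved
print(stability_penalty(alignment, {("c1", "a1")}, set()))           # structure not preserved
```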

The presented list of cardinality, coherence, and stability constraints is by no means exhaustive. Other constraints could, for example, model known correct correspondences or generalize the one-to-one alignment to m-to-n alignments, or a novel hidden predicate could be added modeling correspondences between instances of the ontologies. To keep the discussion of the approach simple, however, we leave these considerations to future research.

Example 3. We apply the previous formalization to Example 2. To keep it simple, we only use a-priori values, cardinality, and coherence constraints. Given the two ontologies $O_1$ and $O_2$ in Figure 1, and the matching hypotheses (1) to (4) from Example 2, the ground MLN would include the following relevant ground formulae. We use the concept and property labels from Figure 1 and omit ground atoms of observable predicates.

A-priori similarity:

\[
(m_c(b_1, b_2), 0.88), \quad (m_c(c_1, e_2), 0.75), \quad (m_p(p_1, p_2), 0.7), \quad (m_c(d_1, e_2), 0.54)
\]

Cardinality constraints:

\begin{align}
m_c(c_1, e_2) \wedge m_c(d_1, e_2) &\Rightarrow c_1 = d_1 \tag{5}
\end{align}

Coherence constraints:

\begin{align}
\mathit{dis}_1^d(p_1, b_1) \wedge \mathit{sub}_2^d(p_2, b_2) &\Rightarrow \neg(m_p(p_1, p_2) \wedge m_c(b_1, b_2)) \tag{6} \\
\mathit{dis}_1(b_1, c_1) \wedge \mathit{sub}_2(b_2, e_2) &\Rightarrow \neg(m_c(b_1, b_2) \wedge m_c(c_1, e_2)) \tag{7} \\
\mathit{sub}_1^d(p_1, c_1) \wedge \mathit{dis}_2^d(p_2, e_2) &\Rightarrow \neg(m_p(p_1, p_2) \wedge m_c(c_1, e_2)) \tag{8}
\end{align}

MAP Inference as Alignment Process

Hidden predicates model correspondences between entities of the two ontologies whereas observable ones model predicates occurring in typical description logic statements. If we want to determine the most likely alignment of two given ontologies, we need to compute the set of ground atoms of the hidden predicates that maximizes the probability given both the ground atoms of observable predicates and the ground formulae of $\mathcal{F}^h$ and $\mathcal{F}^s$. This is an instance of MAP (maximum a-posteriori) inference in the ground Markov logic network. Let $O$ be the set of all ground atoms of observable predicates and $H$ be the set of all ground atoms of hidden predicates both with respect to $C$. Assume that we are given a set $O' \subseteq O$ of ground atoms of observable predicates. In order to find the most probable alignment we have to compute

\[
\operatorname*{argmax}_{H' \subseteq H} \; \sum_{(F_i, w_i) \in \mathcal{F}^s} \; \sum_{g \in G_{F_i}^C :\, O' \cup H' \models g} w_i,
\]

subject to $O' \cup H' \models G_F^C$ for all $F \in \mathcal{F}^h$.

Markov logic is by definition a declarative language, separating the formulation of a problem instance from the algorithm used for probabilistic inference. MAP inference in Markov logic networks is essentially equivalent to the weighted MAX-SAT problem and, therefore, NP-hard. Several approximate algorithms for the weighted MAX-SAT problem exist. However, since each ground formula in $\mathcal{F}^h$ must be satisfied in the computed MAP state, exact inference is required in our setting. Hence, we apply integer linear programming (ILP) which was shown to be an effective method for exact MAP inference in undirected graphical models (Roth and Yih 2005; Taskar et al. 2005) and specifically in Markov logic networks (Riedel 2008). ILP is concerned with optimizing a linear objective function over a finite number of integer variables, subject to a set of linear equalities and inequalities over these variables (Schrijver 1998). We omit the details of the ILP representation of a ground Markov logic network but demonstrate how the ground formulae from Example 3 would be represented as an ILP instance.

Example 4. Let the binary ILP variables $x_1, x_2, x_3$, and $x_4$ model the ground atoms $m_c(b_1, b_2)$, $m_c(c_1, e_2)$, $m_p(p_1, p_2)$, and $m_c(d_1, e_2)$, respectively. The ground formulae from Example 3 can be encoded with the following ILP.

\begin{align}
\text{Maximize:} \quad & 0.88 x_1 + 0.75 x_2 + 0.7 x_3 + 0.54 x_4 \notag \\
\text{Subject to:} \quad & x_2 + x_4 \le 1 \tag{9} \\
& x_1 + x_3 \le 1 \tag{10} \\
& x_1 + x_2 \le 1 \tag{11} \\
& x_2 + x_3 \le 1 \tag{12}
\end{align}

The a-priori weights of the potential correspondences are factored in as coefficients of the objective function. Here, the ILP constraint (9) corresponds to ground formula (5), and ILP constraints (10), (11), and (12) correspond to the coherence ground formulae (6), (7), and (8), respectively. An optimal solution to the ILP sets $x_1 = x_4 = 1$ and $x_2 = x_3 = 0$, corresponding to the correct alignment $\{m_c(b_1, b_2), m_c(d_1, e_2)\}$. Compare this with the alignment $\{m_c(b_1, b_2), m_c(c_1, e_2), m_p(p_1, p_2)\}$ which would be the outcome without coherence constraints.
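Because Example 4 has only four binary variables, its solution can be verified by exhaustive search. The following Python sketch does exactly that; it is a check of the example and not the solver used in the experiments, and the variable and function names are illustrative. Constraints are stored as pairs of zero-based variable indices whose sum must not exceed 1.

```python
from itertools import product

weights = [0.88, 0.75, 0.7, 0.54]        # objective coefficients of x1..x4
cardinality = [(1, 3)]                   # (9):  x2 + x4 <= 1
coherence = [(0, 2), (0, 1), (1, 2)]     # (10), (11), (12)


def solve(constraints):
    """Exhaustively search all 0/1 assignments; adequate for four variables only."""
    best_value, best_assignment = float("-inf"), None
    for assignment in product((0, 1), repeat=len(weights)):
        if any(assignment[i] + assignment[j] > 1 for i, j in constraints):
            continue                     # violates a hard constraint
        value = sum(w * x for w, x in zip(weights, assignment))
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_assignment, best_value


with_coherence = solve(cardinality + coherence)
without_coherence = solve(cardinality)
print(with_coherence[0])      # (1, 0, 0, 1): x1 and x4 are selected
print(without_coherence[0])   # (1, 1, 1, 0): x1, x2, and x3 are selected
```

With all four constraints the objective value is 0.88 + 0.54 = 1.42; dropping the coherence constraints raises it to 2.33 at the price of two incorrect correspondences.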

Experiments

We use the Ontofarm dataset (Svab et al. 2005) as basis for our experiments. It is the evaluation dataset for the OAEI conference track which consists of several ontologies modeling the domain of scientific conferences (Euzenat et al. 2009). The ontologies were designed by different groups and, therefore, reflect different conceptualizations of the same domain. Reference alignments for seven of these ontologies are made available by the organizers. These 21 alignments, one for each pair of the seven ontologies, contain correspondences between concepts and properties including a reasonable number of non-trivial cases. For the a-priori similarity $\sigma$ we decided to use a standard lexical similarity measure. After converting the concept and object property names to lowercase and removing delimiters and stop-words, we applied a string similarity measure based on the Levenshtein distance. More sophisticated a-priori similarity measures could be used but since we want to evaluate the benefits of the ML framework we strive to avoid any bias related to custom-tailored similarity measures. We applied the reasoner Pellet (Sirin et al. 2007) to create the ground MLN formulation and used TheBeast² (Riedel 2008) to convert the MLN formulations to the corresponding ILP instances. Finally, we applied the mixed integer programming solver SCIP³ to solve the ILP. All experiments were conducted on a desktop PC with an AMD Athlon Dual Core Processor 5400B with 2.6GHz and 1GB RAM. The software as well as additional experimental results are available at http://code.google.com/p/ml-match/.

The application of a threshold $\tau$ is a standard technique in ontology matching. Correspondences that match entities with high similarity are accepted while correspondences with a similarity less than $\tau$ are deemed incorrect. We evaluated our approach with thresholds on the a-priori similarity measure $\sigma$ ranging from 0.45 to 0.95. After applying the threshold $\tau$ we normalized the values to the range [0.1, 1.0]. For each pair of ontologies we computed the F1-value, which is the harmonic mean of precision and recall, and computed the mean of this value over all 21 pairs of ontologies. We evaluated four different settings:

• ca: The formulation includes only cardinality constraints.

• ca+co: The formulation includes only cardinality and coherence constraints.

• ca+co+sm: The formulation includes cardinality, coherence, and stability constraints, and the weights of the stability constraints are determined manually. Being able to set qualitative weights manually is crucial as training data is often unavailable. The employed stability constraints consist of (1) constraints that aim to guarantee the stability of the concept hierarchy, and (2) constraints that deal with the relation between concepts and property domain/range restrictions. We set the weights for the first group to −0.5 and the weights for the second group to −0.25. This is based on the consideration that subsumption axioms between concepts are specified by ontology engineers more often than domain and range restrictions of properties (Ding and Finin 2006). Thus, a pair of two correct correspondences will less often violate constraints of the first type than constraints of the second type.

• ca+co+sl: The formulation also includes cardinality, coherence, and stability constraints, but the weights of the stability constraints are learned with a simple online learner using the perceptron rule. During learning we fixed the a-priori weights and learned only the weights for the stability formulae. We took 5 of the 7 ontologies and learned the weights on the 10 resulting pairs. With these weights we computed the alignment and its F1-value for the remaining pair of ontologies. This was repeated for each of the 21 possible combinations to determine the mean of the F1-values.

² http://code.google.com/p/thebeast/

³ http://scip.zib.de/

[Figure 2: line plot of the average F1 measure (y-axis, 0.38 to 0.64) against the threshold (x-axis, 0.45 to 0.95). Series: with cardinality, coherence, and stability constraints; only with cardinality and coherence constraints; only with cardinality constraints; AgreementMaker with optimal threshold; AgreementMaker with standard threshold.]

Figure 2: F1-values for ca, ca+co, and ca+co+sm averaged over the 21 OAEI reference alignments for thresholds ranging from 0.45 to 0.95. AgreementMaker was the best performing system on the conference dataset of the latest ontology evaluation initiative in 2009.

threshold   0.6   0.65  0.7   0.75  0.8   0.85  0.9   0.95
ca+co+sm    0.56  0.59  0.60  0.61  0.62  0.63  0.62  0.62
ca+co+sl    0.57  0.58  0.58  0.61  0.61  0.61  0.63  0.62
ca+co       0.54  0.56  0.58  0.59  0.61  0.62  0.62  0.61

Table 1: Average F1-values over the 21 OAEI reference alignments for manual weights (ca+co+sm) vs. learned weights (ca+co+sl) vs. formulation without stability constraints (ca+co); thresholds range from 0.6 to 0.95.
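The evaluation protocol of the ca+co+sl setting, in which each pair of ontologies is held out once and the weights are learned on the pairs formed by the other five ontologies, can be enumerated as follows. This sketch only reproduces the counting of splits (21 held-out pairs with 10 training pairs each); the ontology names are placeholders and the learning step itself is not shown.

```python
from itertools import combinations

ontologies = [f"O{k}" for k in range(1, 8)]       # placeholder names for the 7 ontologies

splits = []
for held_out in combinations(ontologies, 2):      # the pair used for testing
    training = [o for o in ontologies if o not in held_out]
    training_pairs = list(combinations(training, 2))   # pairs used for weight learning
    splits.append((held_out, training_pairs))

print(len(splits), len(splits[0][1]))             # number of splits, training pairs per split
```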

The lower the threshold the more complex the resulting ground MLN and the more time is needed to solve the corresponding ILP. The average time needed to compute one alignment was 61 seconds for $\tau = 0.45$ and 0.5 seconds for $\tau = 0.85$. Figure 2 depicts the average F1-values for ca, ca+co, and ca+co+sm compared to the average F1-values achieved by AgreementMaker (Cruz et al. 2009), the best-performing system in the OAEI conference track of 2009. These average F1-values of AgreementMaker were obtained using two different thresholds. The first is the default threshold of AgreementMaker and the second is the threshold at which the average F1-value attains its maximum. The inclusion of coherence constraints (ca+co) improves the average F1-value of the alignments for low to moderate thresholds by up to 6% compared to the ca setting. With increasing thresholds this effect becomes weaker and is negligible for $\tau \ge 0.9$. This is the case because alignments generated with ca for thresholds $\ge 0.9$ contain only a small number of incorrect correspondences. The addition of stability constraints (ca+co+sm) increases the quality of the alignments again by up to 6% for low to moderate thresholds. In the optimal configuration (ca+co+sl with $\tau = 0.85$) we measured an average F1-value of 0.63, which is an improvement of 7 percentage points over AgreementMaker's 0.56. What is more important to understand, however, is that our approach generates more accurate results over a wide range of thresholds and is therefore more robust to threshold estimation. This is advantageous since in most real-world matching scenarios the estimation of appropriate thresholds is not possible. While the ca setting generates F1-values $> 0.57$ for $\tau \ge 0.75$ the ca+co+sm setting generates F1-values $> 0.59$ for $\tau \ge 0.65$. Even for $\tau = 0.45$, usually considered an inappropriate threshold choice, we measured an average F1-value of 0.51 and average precision and recall values of 0.48 and 0.60, respectively. Table 1 compares the average F1-values of the ML formulation (a) with manually set weights for the stability constraints, (b) with learned weights for the stability constraints, and (c) without any stability constraints. The values indicate that using stability constraints improves alignment quality with both learned and manually set weights.
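The F1-value reported throughout this section is the harmonic mean of precision and recall of a computed alignment with respect to a reference alignment. A minimal Python sketch is given below; the function name is illustrative, and the reference alignment in the example is assumed, for illustration only, to consist of exactly the two correct correspondences of Example 2.

```python
def evaluate(alignment, reference):
    """Return (precision, recall, F1) of an alignment against a reference alignment."""
    alignment, reference = set(alignment), set(reference)
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# The four correspondences with similarity above 0.5 from Example 2.
computed = {("Document", "Documents"), ("Reviewer", "Review"),
            ("hasWritten", "writtenBy"), ("PaperReview", "Review")}
# Hypothetical reference: only the two correct correspondences (1) and (4).
reference = {("Document", "Documents"), ("PaperReview", "Review")}

precision, recall, f1 = evaluate(computed, reference)
print(precision, recall, round(f1, 3))
```

Note that the mean of per-pair F1-values, as reported here, generally differs from the harmonic mean of the averaged precision and recall values.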

Discussion and Future Work

We presented a Markov logic based framework for ontology matching capturing a wide range of matching strategies. Since these strategies are expressed with a unified syntax and semantics we can isolate variations and empirically evaluate their effects. Even though we focused only on a small subset of possible alignment strategies the results are already quite promising. We have also successfully learned weights for soft formulae within the framework. In cases where training data is not available, weights set manually by experts still result in improved alignment quality. Research related to determining appropriate weights based on structural properties of ontologies is a topic of future work. The framework is not only useful for aligning concepts and properties but can also include instance matching. For this purpose, one would only need to add a hidden predicate modeling instance correspondences. The resulting matching approach would immediately benefit from probabilistic joint inference, taking into account the interdependencies between terminological and instance correspondences.

Acknowledgments

Many thanks to Sebastian Riedel for helping us tame TheBeast and for his valuable feedback.

References

Albagli, S.; Ben-Eliyahu-Zohary, R.; and Shimony, S. E. 2009. Markov network based ontology matching. In Proceedings of the International Joint Conference on Artificial Intelligence, 1884–1889.

Cruz, I. F.; Palandri Antonelli, F.; and Stroe, C. 2009. Efficient selection of mappings and automatic quality-driven combination of matching methods. In Proceedings of the ISWC 2009 Workshop on Ontology Matching.

Ding, L., and Finin, T. 2006. Characterizing the semantic web on the web. In Proceedings of the International Semantic Web Conference 2006, 242–257.

Domingos, P.; Lowd, D.; Kok, S.; Poon, H.; Richardson, M.; and Singla, P. 2008. Just add weights: Markov logic for the semantic web. In Proceedings of the Workshop on Uncertain Reasoning for the Semantic Web, 1–25.

Euzenat, J., and Shvaiko, P. 2007. Ontology matching. Springer-Verlag.

Euzenat, J., et al. 2009. Results of the ontology alignment evaluation initiative 2009. In Proceedings of the Workshop on Ontology Matching.

Levenshtein, V. I. 1965. Binary codes capable of correcting deletions and insertions and reversals. Doklady Akademii Nauk SSSR 845–848.

Meilicke, C., and Stuckenschmidt, H. 2007. Analyzing mapping extraction approaches. In Proceedings of the Workshop on Ontology Matching.

Meilicke, C., and Stuckenschmidt, H. 2009. An efficient method for computing alignment diagnoses. In Proceedings of the International Conference on Web Reasoning and Rule Systems, 182–196.

Meilicke, C.; Tamilin, A.; and Stuckenschmidt, H. 2007. Repairing ontology mappings. In Proceedings of the Conference on Artificial Intelligence, 1408–1413.

Melnik, S.; Garcia-Molina, H.; and Rahm, E. 2002. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In Proceedings of ICDE, 117–128.

Meza-Ruiz, I., and Riedel, S. 2009. Multilingual semantic role labelling with Markov logic. In Proceedings of the Conference on Computational Natural Language Learning, 85–90.

Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine Learning 62(1-2):107–136.

Riedel, S. 2008. Improving the accuracy and efficiency of MAP inference for Markov logic. In Proceedings of UAI, 468–475.

Roth, D., and Yih, W. 2005. Integer linear programming inference for conditional random fields. In Proceedings of ICML, 736–743.

Schrijver, A. 1998. Theory of Linear and Integer Programming. Wiley & Sons.

Sirin, E.; Parsia, B.; Grau, B. C.; Kalyanpur, A.; and Katz, Y. 2007. Pellet: a practical OWL-DL reasoner. Journal of Web Semantics 5(2):51–53.

Svab, O.; Svatek, V.; Berka, P.; Rak, D.; and Tomasek, P. 2005. Ontofarm: Towards an experimental collection of parallel ontologies. In Poster Track of ISWC.

Taskar, B.; Chatalbashev, V.; Koller, D.; and Guestrin, C. 2005. Learning structured prediction models: a large margin approach. In Proceedings of ICML, 896–903.

Wu, F., and Weld, D. S. 2008. Automatically refining the Wikipedia infobox ontology. In Proceeding of the International World Wide Web Conference, 635–644.