Proceedings of the 2007 Industrial Engineering Research Conference
G. Bayraksan, W. Lin, Y. Son, and R. Wysk, eds.
A Probabilistic Framework for Semantic Similarity and Ontology
Mapping
Yun Peng, Zhongli Ding, Rong Pan, Yang Yu
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County
1000 Hilltop Circle, Baltimore, MD 21250, USA
Boonserm Kulvatunyou, Nenad Ivezic, Albert Jones
Manufacturing Systems Integration Division
National Institute of Standards and Technology (NIST)
MS 8265, Gaithersburg, MD 20899, USA
Hyunbo Cho
Department of Industrial and Management Engineering
Pohang University of Science and Technology,
Pohang, South Korea
Abstract
We propose a probabilistic framework to address uncertainty in ontology-based semantic integration and interoperation. This framework consists of three main components: 1) BayesOWL, which translates an OWL ontology to a Bayesian network, 2) SLBN (Semantically Linked Bayesian Networks), which supports reasoning across translated BNs, and 3) a Learner that learns from the web the probabilities needed by the other two components. This framework expands the semantic web and can serve as a theoretical basis for solving real-world semantic integration problems.
Keywords
Semantic web, uncertainty, integration, ontology, Bayesian networks
1. Uncertainty in Ontology Mapping and Semantic Integration
Representing and reasoning with uncertainty has been recognized as an important issue within a single ontology [4, 11]. For example, in ontology construction, besides knowing that "A is a subclass of B", one may also know and wish to express in the ontology how likely it is that an instance of B belongs to A. In ontology reasoning, one may want to infer not only whether A subsumes B, but also the degree of closeness of A to B, or one may want to know the degree of similarity between A and B even if neither subsumes the other. Uncertainty becomes more prevalent in concept
mapping between two ontologies. In many applications, exact matches between concepts defined in two ontologies
do not exist. Instead, a concept defined in one ontology may find partial matches to one or more concepts in another
ontology, often with different degrees of similarity.
How to provide consistent and unified semantic support for information and knowledge integration that handles uncertainty in a principled and practical manner is the problem our research attempts to address. The approach we take is probabilistic, and Bayesian networks (BNs) are adopted as the formalism for modeling the probabilistic interdependencies among ontological entities. This paper presents the probabilistic framework developed in this research effort.
2. Overview of Our Probabilistic Framework
We assume the ontologies are written in OWL (Web Ontology Language, http://www.w3.org/TR/owl-features/).
Fig. 1 below gives an overview of this framework in the context of ontology mapping. The three main components,
BayesOWL, SLBN (semantically linked BN), and the Learner, are described in detail in the next three sections.
Figure 1: Overview of the probabilistic framework.
• BayesOWL translates two ontologies Onto1 and Onto2 into BN1 and BN2;
• SLBN supports concept mapping between Onto1 and Onto2 as probabilistic reasoning between BN1 and BN2;
• the Learner learns probabilities for BayesOWL and SLBN from text exemplars searched from the web.
3. BayesOWL
To translate an OWL ontology to a BN, BayesOWL [2] takes two inputs: 1) the OWL file that defines the ontology, and 2) a collection of probabilistic constraints, including prior probabilities of concept classes and conditional probabilities of superclass relations defined in the ontology. A set of structural translation rules is applied to build the BN structure (a directed acyclic graph, or DAG) from the ontology definition. Conditional probability tables (CPTs) of the BN are then constructed based on the DAG and the probabilistic constraints.
Probability information markups. We represent the semantics of probabilistic constraints as follows. We treat concept classes A and B in an ontology as random binary variables and interpret P(a) = P(A = a) (or P(A = ¬a)) as the prior probability that an arbitrary individual belongs (or does not belong) to class A, and P(a|b) as the conditional probability that an individual of class B also belongs to class A. These two types of probabilities, for classes and superclass relations in an ontology, are most likely to be available to ontology designers. To add such uncertainty information into an existing ontology, we treat a probability as a kind of resource, and define two OWL classes, PriorProb and CondProb, for their encoding. Class PriorProb has two mandatory properties: hasVariable and hasProbValue, while class CondProb has three mandatory properties: hasVariable, hasCondition, and hasProbValue. For example, P(c) = 0.8 for class C can be expressed as

<Variable rdf:ID="c">
  <hasClass>C</hasClass>
  <hasState>True</hasState>
</Variable>
<PriorProb rdf:ID="P(c)">
  <hasVariable>c</hasVariable>
  <hasProbValue>0.8</hasProbValue>
</PriorProb>
Conditional probabilities can be encoded in a similar fashion. (See [2] for more details on probability markups.)
A Bayesian network (BN) is a graphic model for probabilistic interdependencies among a set of random variables [9]. A BN consists of two parts: 1) a directed acyclic graph (DAG) in which nodes represent variables and directed arcs between nodes signify the dependencies; and 2) a conditional probability table (CPT) P(x_i|π_i) for each variable x_i, given all its parent nodes π_i. Based on the independence assumptions encoded in the DAG, the joint distribution of all variables can be computed from the local CPTs by the chain rule P(X = x) = ∏_{i=1..n} P(x_i|π_i). Constructions of the DAG and the CPTs are given next.
Structural translation. The ontology augmented with probability constraints is still an OWL file. It can be translated into a BN by first forming a DAG following a set of rules. Special nodes, called LNodes, are created during the translation to facilitate modeling relations among class nodes that are specified by OWL logical operators (union, intersection, complement, disjoint, equivalent). These structural translation rules are summarized as follows.
(1) Every concept class C is mapped into a binary variable node in the translated BN.
(2) Constructor rdfs:subClassOf is modeled by an arc from the superclass node to the subclass node.
(3) A concept class C defined as the intersection of concept classes Ci (i = 1,…,n) is mapped into a subnet in the translated BN with one arc from each Ci to C, and one arc from C and each Ci to an LNode called LNodeIntersection. Constructor owl:unionOf is modeled in the same way except that the directions of the arcs between C and each Ci are reversed.
(4) If two concept classes C1 and C2 are related by constructors owl:complementOf, owl:equivalentClass, or owl:disjointWith, then an LNode (named LNodeComplement, LNodeEquivalent, and LNodeDisjoint, respectively) is added to the translated BN with directed links from C1 and C2 to the LNode.
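As an illustration, the structural rules above can be sketched in code. This is our own simplification, not the BayesOWL implementation: the input format, the function name, and the example class names are all invented for this sketch, and only rules (1)-(3) are shown.

```python
def translate_structure(subclass_of, intersections):
    """Build the DAG of the translated BN as a dict: node -> list of parents.

    subclass_of: list of (subclass, superclass) pairs            -- rule (2)
    intersections: dict C -> [C1, ..., Cn] for C = Ci ∩ ... ∩ Cn -- rule (3)
    """
    parents = {}                        # rule (1): one binary node per class
    def add_arc(child, parent):
        parents.setdefault(child, [])
        parents.setdefault(parent, [])
        if parent not in parents[child]:
            parents[child].append(parent)

    for sub, sup in subclass_of:        # rule (2): arc superclass -> subclass
        add_arc(sub, sup)

    for c, ops in intersections.items():
        for ci in ops:                  # rule (3): arcs from each Ci to C ...
            add_arc(c, ci)
        lnode = "LNodeIntersection_" + c
        add_arc(lnode, c)               # ... and from C and each Ci to the LNode
        for ci in ops:
            add_arc(lnode, ci)
    return parents

dag = translate_structure(
    subclass_of=[("Animal", "Thing"), ("Human", "Animal"), ("Man", "Human")],
    intersections={"Man": ["Human", "Male"]})
print(dag["Man"])  # -> ['Human', 'Male']
```

For owl:unionOf the arcs between C and each Ci would simply be reversed, as the text notes.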
Constructing CPT. The nodes in the DAG generated from the structural translation can be divided into two disjoint groups: XC for nodes representing concepts in the ontology, and XL for LNodes for logical relations. The CPT for an LNode in XL can be determined by the logical relation it represents; in other words, the entries in P(x_i|π_i) for x_i ∈ XL are filled in such a way that when the state of x_i is "True" the intended logical relation holds among its parent nodes. When all LNodes are set to "True" (denoting this situation as LT), all the logical relations defined in the original ontology hold in the translated BN. Constructing the CPT P(x_i|π_i) for a concept node x_i ∈ XC is more complicated. It must satisfy the given probabilistic constraints of the priors P(x_i) and conditionals P(x_i|x_j), and this has to be done in the subspace of LT. In other words, we now have a multi-constraint satisfaction problem: construct P(x_i|π_i) for all x_i ∈ XC such that P(XC|LT) is consistent with all given probabilistic constraints.
We apply the technique known as Iterative Proportional Fitting Procedure (IPFP) [2, 5] to construct CPTs for concept nodes in XC. IPFP is a procedure that modifies a given probability distribution (PD) P(X) to satisfy a set of constraints R = {R(Y_i)}, each of which is a prior or conditional distribution on a subset of variables Y_i ⊆ X. Briefly, IPFP starts with the initial PD Q_0(X) = P(X), and at each iteration k it modifies the PD to satisfy one constraint R(Y_i) by

    Q_k(X) = Q_{k-1}(X) · R(Y_i) / Q_{k-1}(Y_i).    (1)

It can be shown that for a consistent set of constraints R, the iterative process will converge to Q*(X), a PD that satisfies all constraints in R and is closest to the original P(X) as measured by cross-entropy. Two difficulties exist here because IPFP works on joint probability distributions, not on BNs. First, direct application of IPFP may destroy the existing interdependencies between variables (i.e., the given DAG becomes invalid). Secondly, IPFP is computationally very expensive because at each iteration, every entry in the joint PD of all variables in the BN must be updated. To overcome these difficulties, we developed an algorithm named D-IPFP [12, 2] which decomposes IPFP so that each iteration only updates one CPT of the given BN. In D-IPFP, Eq. (1) becomes: for each constraint R(x_i|L_i), where L_i contains zero or more parents of x_i, the CPT of x_i is modified by

    Q_k(x_i|π_i) = Q_{k-1}(x_i|π_i) · R(x_i|L_i) / Q_{k-1}(x_i|L_i, LT) · α_k(π_i),    (2)

where α_k(π_i) is the normalization factor. The process iterates over all R(x_i|L_i) repeatedly until Q converges.
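To make the plain-IPFP step of Eq. (1) concrete, here is a minimal sketch that applies marginal constraints to an explicit joint distribution over two binary variables. The toy numbers and function names are ours; note that real D-IPFP operates on individual CPTs of a BN rather than on the full joint, precisely to avoid the cost illustrated here.

```python
# Plain IPFP on an explicit joint distribution (Eq. 1), for marginal
# constraints R(Y_i) on single variables.
from itertools import product

def ipfp(joint, constraints, iters=100):
    """joint: dict (a, b) -> prob; constraints: list of (var_index, {state: prob}).
    Each pass applies Q_k = Q_{k-1} * R(Y_i) / Q_{k-1}(Y_i) for one constraint."""
    q = dict(joint)
    for _ in range(iters):
        for idx, r in constraints:
            marg = {}                              # current marginal Q_{k-1}(Y_i)
            for x, p in q.items():
                marg[x[idx]] = marg.get(x[idx], 0.0) + p
            for x in q:
                q[x] *= r[x[idx]] / marg[x[idx]]
    return q

p0 = {(a, b): 0.25 for a, b in product([0, 1], repeat=2)}   # uniform start
q = ipfp(p0, [(0, {0: 0.8, 1: 0.2}), (1, {0: 0.6, 1: 0.4})])
print(round(sum(p for (a, _), p in q.items() if a == 0), 4))  # -> 0.8
```

After convergence both marginal constraints hold simultaneously, and the result is the cross-entropy-closest distribution to the uniform starting point.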
The translated BN preserves the semantics of the original ontology and is consistent with all the probabilistic constraints. It can support ontology reasoning tasks as probabilistic inferences in the translated BN. For example, given a concept description e, it can answer queries about concept satisfiability (whether P(e|LT) = 0) and about concept overlapping (as P(C1, C2|LT)). It can also support semantic similarity measures such as the Jaccard coefficient [14] and those based on information content [13].
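For instance, the Jaccard coefficient of two concepts can be read off their joint distribution as J(A, B) = P(A ∧ B) / P(A ∨ B). A minimal sketch, with an invented joint:

```python
# Jaccard coefficient [14] of two concepts from their joint P(A, B):
# J(A, B) = P(A and B) / P(A or B).
def jaccard(joint):
    """joint: dict (a, b) -> prob with a, b in {True, False}."""
    p_and = joint[(True, True)]
    p_or = p_and + joint[(True, False)] + joint[(False, True)]
    return p_and / p_or

j = jaccard({(True, True): 0.48, (True, False): 0.02,
             (False, True): 0.05, (False, False): 0.45})
print(round(j, 3))  # -> 0.873
```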
4. SLBN
When dealing with reasoning involving multiple BNs, existing approaches exchange beliefs via shared variables and
impose very strong restrictions on how the shared variables are modeled in individual BNs [8]. SemanticallyLinked
Bayesian Networks (SLBN) are developed to support probabilistic inferences across independent developed BNs
which do not share common variables but may have variables that have similar meaning or semantics [7].
Variable Linkage. Consider two concepts A in Onto1 and B in Onto2 with similar but not necessarily identical meaning. A and B become variables in BN1 and BN2, the two BNs translated from Onto1 and Onto2 by BayesOWL. We want the probabilistic inference to be carried out from BN1 (the source) to BN2 (the destination). Note that BN1 and BN2 define two probability spaces, denoted PS^1 and PS^2. SLBN requires that the similarity information between A and B be given as the conditional distribution P_S(B|A). This distribution is in yet another space, denoted PS^{1,2}, which is related to but different from PS^1 and PS^2. In particular, PS^{1,2} shares variable A with PS^1 and B with PS^2. SLBN connects A to B with a directed variable linkage L_A^B = <A, B, BN_A, BN_B, P_S(B|A)>, where P_S(B|A) provides probabilistic information about the semantic similarity between A and B. The linkage forms a pathway for A in BN_A to influence B in BN_B. However, since three separate probability spaces are involved, Bayes' rule does not apply here. Instead, we use Jeffrey's rule [3, 10]. This rule revises a PD P(X) by another PD Q(A) over a subset of variables A ⊂ X. The rule can be written as follows in the context of SLBN: to modify P(X) by Q(A) where A ∈ X, first, P(A), the belief on A, changes to

    P(A) ← Q(A).    (3)
Then the beliefs of other variables B ∈ X are changed to

    P(B) ← Σ_i P(B|a_i) Q(a_i).    (4)

In the BN literature, a probability such as Q(A) is referred to as soft evidence about A, in contrast to the so-called hard evidence, e.g., A = a_1. Then, as depicted in Fig. 2, the influence on variables in PS^2 by A in PS^1 via the single linkage L_A^B can be viewed as two applications of Jeffrey's rule across these three spaces, first from PS^1 to PS^{1,2}, then from PS^{1,2} to PS^2. In the first step, since variable A in PS^{1,2} is identical to A in PS^1, P(A) in PS^{1,2} becomes Q(A); applying Q(A) as soft evidence to PS^{1,2}, the belief on B in the middle is updated by (4) to Q(B) = Σ_i P(B|A = a_i) Q(A = a_i). In the second step, Q(B) is then applied as soft evidence from PS^{1,2} to variable B in PS^2, updating the beliefs of other variables C in PS^2 as Q(C) = Σ_j P(C|B = b_j) Q(b_j), where Q(b_j) = Σ_i P(B = b_j|A = a_i) Q(A = a_i).

Figure 2: Variable A in BN_A influences B in BN_B via semantic linkage L_A^B.
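The two-step propagation above is two applications of the same weighted sum. A sketch, where the conditional tables and numbers are illustrative inventions, not values from the paper:

```python
# Jeffrey's rule across the linkage: step 1 computes Q(B) in PS^{1,2}
# from soft evidence Q(A); step 2 propagates Q(B) into the destination BN.
def jeffrey(cond, q_parent):
    """cond[parent_state][child_state]; returns the revised child distribution."""
    q_child = {}
    for ps, qp in q_parent.items():
        for cs, p in cond[ps].items():
            q_child[cs] = q_child.get(cs, 0.0) + p * qp
    return q_child

p_b_given_a = {"a": {"b": 0.9, "~b": 0.1}, "~a": {"b": 0.2, "~b": 0.8}}  # P_S(B|A)
p_c_given_b = {"b": {"c": 0.7, "~c": 0.3}, "~b": {"c": 0.1, "~c": 0.9}}  # inside BN2

q_a = {"a": 0.6, "~a": 0.4}          # soft evidence on A from the source BN
q_b = jeffrey(p_b_given_a, q_a)      # step 1: belief on B in PS^{1,2}
q_c = jeffrey(p_c_given_b, q_b)      # step 2: beliefs of C in the destination BN
print(round(q_b["b"], 3), round(q_c["c"], 3))  # -> 0.62 0.472
```

Chaining the two calls reproduces the double sum Q(C) = Σ_j P(C|b_j) Σ_i P(b_j|a_i) Q(a_i) from the text.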
Belief update with multiple linkages. When more than one variable linkage of semantically similar concepts exists, multiple pieces of soft evidence can be sent from one BN to the other via these linkages. One would naturally think of applying IPFP to this problem using all of the soft evidence as constraints. However, as discussed in Section 3, IPFP cannot be directly applied to BNs. To circumvent this difficulty we turn to another type of uncertain evidence, namely virtual evidence, which is often given as a likelihood ratio

    L(A) = P(Ob(a_i)|a_1) : P(Ob(a_i)|a_2) : … : P(Ob(a_i)|a_n),

where P(Ob(a_i)|a_j) is interpreted as the probability that we observe A in state a_i while A is actually in state a_j. One nice thing about virtual evidence is that it can be easily applied to BNs by adding a dummy or virtual node ve_A for the given L(A). This node has no child, with A as its only parent, and its CPT is determined by L(A) [9]. Soft evidence can be easily converted into virtual evidence when it is on a single variable [9]. A problem arises when multiple pieces of soft evidence, say Q(A) and Q(B), are converted to dummy nodes. Due to their interference, the results of belief update by the two virtual evidences will not conform to either Q(A) or Q(B). What is needed is a method that can convert a set of soft evidences to likelihood ratios which, when all applied to the BN as virtual evidence, preserve every piece of soft evidence. We have developed an algorithm for this by combining virtual evidence and IPFP [8]. The page limit prevents a complete description of this algorithm, but it roughly works as follows. As an iterative process, it loops over the set of all soft evidences repeatedly until convergence. At each iteration k, one soft evidence, say Q(A), is picked and a new virtual evidence node is added to the system with the likelihood ratio

    L_k(A) = Q(a_1)/P_{k-1}(a_1) : … : Q(a_s)/P_{k-1}(a_s),

where P_{k-1}(a_i) is from the BN with all virtual evidence nodes added in the previous k − 1 iterations.
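For a single variable, the soft-to-virtual conversion can be sketched as follows; the function names and numbers are ours, and the full algorithm of [8] iterates this step over multiple linkages inside a real BN rather than on a lone distribution:

```python
# One step of the soft-to-virtual-evidence conversion:
# L_k(A) = Q(a_1)/P_{k-1}(a_1) : ... : Q(a_s)/P_{k-1}(a_s).
def likelihood_ratio(q, p_prev):
    """q: target soft evidence Q(A); p_prev: current belief P_{k-1}(A)."""
    return {s: q[s] / p_prev[s] for s in q}

def apply_virtual(p, ratio):
    """Applying a virtual-evidence ratio multiplies beliefs and renormalizes."""
    unnorm = {s: p[s] * ratio[s] for s in p}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

p = {"a": 0.3, "~a": 0.7}            # current belief on A
q = {"a": 0.6, "~a": 0.4}            # soft evidence to be enforced
ratio = likelihood_ratio(q, p)
print({s: round(v, 3) for s, v in apply_virtual(p, ratio).items()})
# -> {'a': 0.6, '~a': 0.4}
```

For a single variable one step recovers Q(A) exactly; with multiple interfering pieces of soft evidence the iteration over all of them is what restores each Q at convergence.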
5. Learning Probabilities from the Web
The prior probabilities of non-root nodes in a BN can be computed from the CPTs and the priors of the root nodes, and there is usually only one root for a BN translated from an ontology, the concept "THING", whose prior can be assumed to be 1, estimated by some means, or obtained from the ontology design. We will therefore focus on learning conditional probabilities P(C|D) for concepts C and D. Our approach is to use text classification techniques [1, 6] that build classifiers for individual concepts by statistical analysis of the text exemplars associated with the concepts.
Learning the probabilities for semantic similarity between concepts in two ontologies can be done through a cross-classification as follows. First, a statistical feature model (classifier) for each concept in Onto1 is built from the statistical information in that concept's exemplars using a text classifier such as Rainbow [6]. Then concepts in Onto2 are classified into classes of Onto1 by feeding their respective exemplars into the models of Onto1 to obtain a set of scores, which can be interpreted as conditional probabilities for inter-concept similarity. Concepts in Onto1 can be classified in the same way into classes of Onto2. Conditional probabilities relating concepts within a single ontology can be obtained similarly through self-classification with the models learned for that ontology.
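A toy sketch of the cross-classification step, using a naive unigram model in place of a real classifier such as Rainbow [6]. The exemplar texts, concept names, and smoothing choice are all invented; normalized scores play the role of the conditional probabilities:

```python
# Score an Onto2 concept's exemplar against per-concept unigram models
# built from Onto1 exemplars; normalized scores ~ P(Onto1 class | Onto2 concept).
from collections import Counter
import math

def unigram_model(text, vocab, alpha=1.0):
    """Laplace-smoothed word probabilities over a fixed vocabulary."""
    counts = Counter(text.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def score(model, text):
    """Likelihood of the text's in-vocabulary words under the model."""
    return math.exp(sum(math.log(model[w]) for w in text.split() if w in model))

onto1_exemplars = {"Learning": "learning training data model",
                   "NLP": "language parsing text grammar"}
vocab = set(w for t in onto1_exemplars.values() for w in t.split())
models = {c: unigram_model(t, vocab) for c, t in onto1_exemplars.items()}

onto2_text = "machine learning model training"   # exemplar of an Onto2 concept
raw = {c: score(m, onto2_text) for c, m in models.items()}
z = sum(raw.values())
print({c: round(s / z, 3) for c, s in raw.items()})
# -> {'Learning': 0.889, 'NLP': 0.111}
```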
The performance of text classification based methods depends on the quality of exemplars attached to each concept.
It is costly to find high quality text exemplars manually. Our approach is to use search engines such as Google to
: )
i
 )
1
j a while A is actually in
((()(
i
aaObPaAL
==
L
: ) )
2
((
iaaObP
ia . One thing nice about virtual evidence is that it can be easily applied to BNs by adding a dummy or virtual
)(AQ
or
)(BQ
. What is needed is a
))(
1
ik
a
−
where
)(
1
ik
aP−
is from the BN with all virtual
retrieve text exemplars automatically from the web. The goal is to search for documents in which the concept is used in its intended semantics. The rationale is that the meaning of a concept can be described or understood by the way it is used. To search for documents relevant to a concept, one cannot simply use the words in the name of that concept as the key, because a word may have multiple meanings. Fortunately, since we are dealing with concepts in well-defined ontologies, the semantics of a concept is to a great extent specified by the other terms used in defining this concept in the ontology, including the names of its superconcept classes and its properties. There are a number of ways to use this semantic information to improve search quality. A simple one that we have experimented with is to form the search query for a concept by combining all the terms on the path from the root to that concept node in the taxonomy.
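The path-based query formation can be sketched as follows; the taxonomy encoding and the concept names are invented for illustration:

```python
# Form a search query for a concept from all terms on the root-to-concept path.
def search_query(taxonomy, concept):
    """taxonomy: dict child -> parent (root maps to None)."""
    path = [concept]
    while taxonomy.get(path[-1]) is not None:
        path.append(taxonomy[path[-1]])
    return " ".join(reversed(path))

tax = {"Semantic Web": "Knowledge Representation",
       "Knowledge Representation": "Artificial Intelligence",
       "Artificial Intelligence": None}
print(search_query(tax, "Semantic Web"))
# -> Artificial Intelligence Knowledge Representation Semantic Web
```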
6. A Small Example
We have performed computer experiments on two smallscale realworld ontologies: the AI subdomain from ACM
Topic Taxonomy (http://www.acm.org/class/1998/) and DMOZ (Open Directory http://dmoz.org/) hierarchy. These
two hierarchies differ in both terminologies and modeling methods. DMOZ categorizes concepts to facilitate people's access to web pages, while the ACM topic hierarchy categorizes concepts to structure a classification primarily for academics. For every concept, we obtained exemplars by querying Google and learned probability constraints as
described in Section 5. Then, BayesOWL is used to translate the two ontologies into two BNs as shown in Fig. 3.
Figure 3: Two translated BN: from ACM (left) and DMOZ (right)
Joint distributions P(A, B) were learned for each pair of concepts of the two BNs also by the Learner described in
Section 5. Table 1 lists the five most similar concepts in the learning result, and their Jaccard coefficients computed
from P(A, B).
Table 1: Five most similar concept pairs in the learning result.

    ACM Topic                            DMOZ                       Similarity
    /Knowledge Rep. & Formalism Method   /Knowledge Representation  0.96
    /Natural Language Processing         /Natural Language          0.90
    /Learning                            /Machine Learning          0.88
    /Learning                            /Knowledge Representation  0.81
    /Applications & Expert System        /Knowledge Representation  0.79
Next, two variable linkages were created for the two most similar pairs: L1 = <dmoz.kr, acm.krfm, BN_dmoz, BN_acm, S1> and L2 = <dmoz.nl, acm.nlp, BN_dmoz, BN_acm, S2>, where

    S1 = P(acm.krfm | dmoz.kr) = [0.9943 0.0057; 0.0973 0.9027]  and
    S2 = P(acm.nlp | dmoz.nl)  = [0.9843 0.0157; 0.2327 0.7680]

were calculated from their learned joint distributions.
SLBN allows us to conduct probabilistic reasoning well beyond finding the best concept matches. To illustrate this point, consider the example of finding a description of DMOZ's /Knowledge Representation/Semantic Web
(dmoz.sw) in ACM topics. Apparently, there is no single ACM concept identical to dmoz.sw; the two most semantically similar concepts to dmoz.sw in ACM are
• /Knowledge Representation and Formalism Method/Relation System (acm.rs) and
• /Knowledge Representation and Formalism Method/Semantic Network (acm.sn),
whose learned joint distributions give Jaccard coefficients J(dmoz.sw, acm.rs) = 0.64 and J(dmoz.sw, acm.sn) = 0.61. The coefficient between dmoz.sw and acm.krfm, the superclass of acm.rs and acm.sn, is even lower (0.49). Most ontology mapping systems would stop here. However, with our framework, we can evaluate similarities with composite hypotheses involving multiple ACM concepts. One such hypothesis is acm.rs ∨ acm.sn, which has a Jaccard coefficient of 0.725, significantly greater than that of any single-concept candidate.
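A sketch of how such a composite hypothesis can be scored from a joint distribution over the three binary variables. The joint below is invented for illustration, so it does not reproduce the paper's value of 0.725:

```python
# Jaccard coefficient of dmoz.sw against the disjunction acm.rs ∨ acm.sn,
# computed from a joint distribution over the three binary variables.
from itertools import product

def jaccard_disjunction(joint):
    """joint: dict (sw, rs, sn) -> prob; returns J(sw, rs ∨ sn)."""
    p_and = sum(p for (sw, rs, sn), p in joint.items() if sw and (rs or sn))
    p_or = sum(p for (sw, rs, sn), p in joint.items() if sw or rs or sn)
    return p_and / p_or

# A made-up joint in which sw overlaps partly with each of rs and sn.
joint = {s: 0.0 for s in product([False, True], repeat=3)}
joint[(True, True, False)] = 0.20
joint[(True, False, True)] = 0.18
joint[(True, False, False)] = 0.07
joint[(False, True, True)] = 0.05
joint[(False, False, False)] = 0.50
print(round(jaccard_disjunction(joint), 3))  # -> 0.76
```

Because the disjunction pools the overlap that sw shares with each of rs and sn, its coefficient can exceed that of either concept alone, which is the effect reported in the text.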
7. Conclusions
Our research has addressed a number of key issues in the probabilistic approach to ontology mapping. However, a few issues remain open, and a number of difficulties still need to be dealt with. BayesOWL is currently completed only for terminological taxonomies; it is not yet able to deal with properties. Similarly, SLBN formalizes the notion of variable linkages to connect BNs and develops theoretically justified inference methods with such linkages. However, it does not address the important issue of how to determine whether a linkage should be established between a given pair of variables. Our learner for probabilities, based on text classification and ontology-guided search of the web, is more problematic at this time. The probabilities generated by the learner may be inaccurate, and sometimes may also be inconsistent with each other. All these issues are potentially good topics for future research.
Acknowledgements
This work was supported in part by NSF award IIS-0326460 and NIST award 60NANB6D6206.
Disclaimer
Certain commercial software products are identified in this paper. This use does not imply approval or endorsement
by NIST, nor does it imply that these products are necessarily the best available for the purpose.
References
1. Craven, M., et al., 1998, "Learning to extract symbolic knowledge from the World Wide Web", in Proc. of the 15th National Conference on Artificial Intelligence (AAAI-98), Madison, WI, 509-516.
2. Ding, Z., Peng, Y., and Pan, R., 2005, "BayesOWL: Uncertainty modeling in semantic web ontologies", in Soft Computing in Ontologies and Semantic Web, Z. Ma (Ed.), Springer-Verlag.
3. Jeffrey, R., 1983, The Logic of Decision, 2nd Edition, University of Chicago Press.
4. Koller, D., Levy, A., and Pfeffer, A., 1997, "P-CLASSIC: A tractable probabilistic description logic", in Proc. of AAAI-97, 390-397.
5. Kruithof, R., 1937, "Telefoonverkeersrekening", De Ingenieur 52:E15-E25.
6. McCallum, A., 1996, "Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering", http://www.cs.cmu.edu/~mccallum/bow.
7. Pan, R., Ding, Z., Yu, Y., and Peng, Y., 2005, "A Bayesian Network Approach to Ontology Mapping", in Proc. of the Fourth International Semantic Web Conference, Nov. 6-10, Galway, Ireland.
8. Pan, R., Peng, Y., and Ding, Z., 2006, "Belief Update in Bayesian Networks Using Uncertain Evidence", in Proc. of the IEEE International Conf. on Tools with Artificial Intelligence, Nov. 13-15, Washington, DC.
9. Pearl, J., 1988, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers.
10. Pearl, J., 1990, "Jeffrey's rule, passage of experience, and neo-Bayesianism", in H.E. Kyburg et al. (eds.), Knowledge Representation and Defeasible Reasoning, 245-265.
11. Peng, Y., et al., 2003, "Semantic Resolution for E-Commerce", in Innovative Concepts for Agent-Based Systems, Springer-Verlag, 355-366.
12. Peng, Y. and Ding, Z., 2005, "Modifying Bayesian Networks by Probability Constraints", in Proc. of 21st Conference on Uncertainty in Artificial Intelligence, July 26-29, Edinburgh, Scotland.
13. Resnik, P., 1995, "Using Information Content to Evaluate Semantic Similarity in a Taxonomy", in Proc. of the 14th Intl. Joint Conf. on AI, 448-453, Montreal, Canada.
14. van Rijsbergen, C. J., 1979, Information Retrieval, 2nd Edition, Butterworths, London.