Privacy-Preserving Reasoning on the Semantic Web

Jie Bao, G. Slutzki, V. Honavar

Iowa State Univ., Ames;

Conference Proceeding: 12/2007; DOI: 10.1109/WI.2007.83; ISBN: 978-0-7695-3026-0; In proceeding of: Web Intelligence, IEEE/WIC/ACM International Conference on

Abstract

Many semantic web applications require selective sharing of ontologies between autonomous entities due to copyright, privacy or security concerns. In such cases, an agent might want to hide a part of its ontology while sharing the rest. However, prohibiting any use of the hidden part of the ontology in answering queries from other agents may be overly restrictive. We provide a framework for privacy-preserving reasoning in which an agent can safely answer queries against its knowledge base using inferences based on both the hidden and visible part of the knowledge base, without revealing the hidden knowledge. We show an application of this framework in the widely used special case of hierarchical ontologies.

Source: IEEE Xplore

Jie Bao, Giora Slutzki and Vasant Honavar
Department of Computer Science, Iowa State University, Ames, IA. USA. 50010
{baojie,slutzki,honavar}@cs.iastate.edu
1 Introduction
The semantic web aims to make data, knowledge,
and services available in a form that allows software agents to find,
share, integrate, and use them on
the web. The imperative for sharing information in such
a setting has to be balanced against copyright, privacy, security,
or commercial concerns, which require the participants
to protect sensitive information (e.g., data, knowledge)
from other parties. Hence, there is a need for mechanisms
that enable the participants to selectively share information
with other parties without risking disclosure of sensitive in-
formation. Our focus in this paper is on selective sharing
of ontologies on the web: in particular, answering queries
against an ontology, without disclosing hidden knowledge,
i.e., knowledge that needs to be protected from disclosure.
Current proposals for policy languages [14] for informa-
tion hiding on the semantic web rely on complete denial of
access to the hidden parts of an ontology when answering
queries against the ontology. We argue that such approaches
are overly restrictive in that they prohibit the use of hidden
knowledge in answering queries even in scenarios where it
is possible to do so without disclosing the hidden knowl-
edge.
Against this background, this paper examines the prob-
lem of answering queries against ontologies based on rea-
soning with hidden knowledge without risking its unin-
tended disclosure. Specifically, we explore a framework
for privacy-preserving reasoning on the semantic web. Unlike
access control policies used in databases [12] or their
web counterparts (e.g., XACML [7]) that protect hidden
knowledge on a syntactic level, our approach protects hid-
den knowledge on the semantic level. Thus, queries against
an ontology can be answered based on inference using hid-
den knowledge whenever it is possible to do so without dis-
closing the hidden knowledge.
The main contributions of the paper are:
• A precise formulation of the problem of privacy-
preserving reasoning on the semantic web.
• A general framework for privacy-preserving reasoning
for semantic web ontologies that exploits the indis-
tinguishability of hidden knowledge from incomplete
knowledge under the Open World Assumption (OWA).
• A set of privacy-preserving reasoning strategies for
description logic (DL) ontologies using the notion of
conservative extension [9].
• Privacy-preserving reasoning strategies for the impor-
tant special case of hierarchical ontologies via a reduc-
tion of privacy-preserving reasoning to graph reacha-
bility analysis.
2 Motivating Examples
We start with some examples of applications that motivate
the need for privacy-preserving reasoning on the
semantic web.
Example 1 (Online Calendar): Consider Bob who uses an
online calendar to manage his schedule and to coordinate
his daily activities with other individuals that he interacts
with. Suppose the fact that Bob has a date with his girlfriend
at noon is stored in Bob’s calendar, along with additional
knowledge, e.g., “date is a kind of activity” and “If person
x has an activity at time t, person x is busy at time t” and so
on. Suppose Bob does not wish to share with his colleagues
that he has a date with his girlfriend at noon. However, it
might be necessary for his colleagues to know that Bob is
busy at noon. In such a scenario, a query to Bob’s calen-
dar as to whether Bob is busy at noon should be answered
as “Yes” (which is inferred using both the sensitive knowl-
edge and non-sensitive knowledge), whereas a query as to
whether Bob has a date with his girlfriend at noon should
be answered as “Unknown”. However, if the use of hidden
knowledge were forbidden, it would be impossible for
the calendar to inform Bob’s colleagues that “Bob is busy
at noon”, although it is possible to do so without revealing
the details of Bob’s noon-time activity.
Example 2 (Commercial Information Service): Con-
sider a company, say U-Travel that provides travel informa-
tion to online customers. Suppose U-Travel offers a query
service that provides limited information to the public but
more detailed information to paying subscribers. Suppose
U-Travel’s ontology contains the following knowledge:
(a) Sun Lodge is a 2-star hotel; (b) a 2-star hotel is a
hotel. Suppose U-Travel is willing to reveal that “Sun
Lodge is a hotel” to the public, yet it wants to hide the fact
that “Sun Lodge is a 2-star hotel” from all but its paying
subscribers. If the U-Travel query service could not use
hidden information, i.e., that Sun Lodge is a 2-star hotel,
it would not be able to inform a non-paying customer that
Sun Lodge is a hotel, although it is possible to do so without
compromising hidden knowledge.
Example 3 (Healthcare) (based on a similar exam-
ple given in [3]): Jane needs to take a certain preventive
medicine for breast cancer. Suppose Jane does not want
her physician or the pharmacy to supply the details of the
prescription to her health insurance company, because she
does not want to risk an increase in her health insurance
premium on the basis of the fact that the medicine she has
been prescribed is intended for use by women who are believed
to have a high risk of developing breast cancer (e.g.,
because of hereditary reasons). In such a setting, in order for
Jane to be reimbursed by her insurance company, the phar-
macy needs to be able to certify to the insurance company
(perhaps through a trusted third party) that Jane has indeed
incurred a medical expense that is covered by her insurance
policy.
One can easily imagine similar needs for selective shar-
ing of inferences based on hidden knowledge in many
other scenarios including, for example, business dealings
between companies, interactions between different governmental
agencies (e.g., intelligence, law enforcement, public
policy), and cooperation among independent nations on matters
of global concern (e.g., counter-terrorism).
3 Partially Hidden Knowledge
We start by introducing the basic notion of hidden
knowledge. A knowledge base (KB)¹ K over a language
L consists of a set of axioms K = {α1, ..., αn}. We assume
that K is consistent and does not contain tautologies.
We use Sig(αi) to denote the set of names occurring in an
axiom αi and Sig(K) to denote the signature of a KB K,
$\mathrm{Sig}(K) = \bigcup_{i=1}^{n} \mathrm{Sig}(\alpha_i)$.
¹ In this paper, we use the terms “knowledge base” and “ontology” synonymously.
For a specific agent, the set of axioms that make up a
KB K is divided into two mutually exclusive parts: a visi-
ble part Kv and a hidden part Kh, with the corresponding
signatures Sig(Kv) and Sig(Kh). We call Sig(Kv) the visi-
ble signature and Sig(Kh)−Sig(Kv) the hidden signature.
In what follows, a wide hat (e.g., $\widehat{\mathrm{HiddenName}}$) is used to
indicate that a name is hidden. We denote a KB K with a
visible part Kv and a hidden part Kh by (Kv,Kh).
We write K ⊢ γ to mean that γ is classically provable
from K. If every axiom in a KB K2 is classically provable
from another KB K1, we say that K1 entails K2 and denote
it as K1 ⊢ K2.
In some scenarios, it is useful to tailor the hidden and
the visible parts of a KB K with respect to different agents
that might query K. We call the division of the visible and
the hidden KB of K w.r.t. an agent a the scope policy of
K for agent a. In principle, a KB may have different scope
policies for different agents. In what follows, we will focus
on “safe” query answering for one agent against a partially
hidden KB.
Example 4: Consider an ontology K = (Kv,Kh) of the
U-Travel company. We use the partial-order relation ≤ to
indicate concept inclusion. The hidden part Kh contains
SunLodge ≤ $\widehat{\mathrm{2StarHotel}}$
$\widehat{\mathrm{2StarHotel}}$ ≤ Inn
where the hidden signature is {$\widehat{\mathrm{2StarHotel}}$}. The visible
part Kv contains
SunLodge ≤ AAADiscountable
Inn ≤ Hotel
Hence, the visible signature is Sig(Kv) =
{SunLodge, Inn, AAADiscountable, Hotel}.
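Example 4 can be written down concretely. The following is a minimal Python sketch, assuming each axiom A ≤ B is encoded as a pair (A, B). The names `signature`, `visible_axioms`, and `hidden_axioms` are illustrative choices and do not come from the paper.

```python
# Hypothetical encoding of Example 4: an axiom "A ≤ B" is the pair (A, B).
hidden_axioms = {("SunLodge", "2StarHotel"), ("2StarHotel", "Inn")}
visible_axioms = {("SunLodge", "AAADiscountable"), ("Inn", "Hotel")}


def signature(axioms):
    """Return the set of names occurring in a set of (sub, super) axioms."""
    return {name for axiom in axioms for name in axiom}


visible_signature = signature(visible_axioms)
# Hidden signature as defined in Section 3: Sig(Kh) - Sig(Kv).
hidden_signature = signature(hidden_axioms) - visible_signature

print(sorted(visible_signature))
print(sorted(hidden_signature))
```

Note that SunLodge and Inn occur in hidden axioms but are not hidden names, because they also occur in the visible part.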
4 Privacy-Preserving Reasoning
Our basic approach to designing a privacy-preserving
reasoner for a partially-hidden KB is to ensure that the an-
swers to queries do not inadvertently reveal hidden knowl-
edge. The central idea is to design a reasoner that exploits
the Open World Assumption (OWA) of ontology languages,
to make it impossible for the querying agent to distinguish
between information that is unknown to the reasoner (because
of the incompleteness of the KB) and the knowledge that
is being protected by the reasoner. A query that cannot be
safely answered without running the risk of disclosing hid-
den knowledge will be answered as if the reasoner lacks the
complete knowledge to answer the query.
Unlike the Closed World Assumption (CWA) which is
implicit in databases, OWA assumes that an ontology may
be incomplete with regard to the knowledge of the world
being modeled. Therefore, failure to prove an assertion
does not imply the validity of the negation of the assertion.
For instance, in Example 1, when queried whether “Bob
has a date with his girlfriend at noon”, if the answer is “Un-
known”, the querying agent cannot conclude that “Bob does
not have such a date” (the negation of the assertion). Conse-
quently, the querying agent cannot determine if the relevant
information (the details of Bob’s noon-time activity) is not
in the KB or if the information is in the KB but is protected.
Before we formalize the notion of privacy-preserving
reasoning using hidden knowledge to answer queries
against an ontology, we state some natural requirements that
need to be met by a reasoner operating in the setting out-
lined above.
1. Honesty. The reasoner should not “lie”. That is, an-
swers produced by the reasoner should always be con-
sistent with its KB.
2. History Independence. The reasoner should always
respond to a given query q against a fixed KB K with
the same answer regardless of the history of queries
that have been posed against K.
3. History Safety. The reasoner must ensure that the an-
swers it produces are safe in the sense that it is not
possible for a querying agent to infer any piece of hid-
den knowledge based on the answers to past queries
from the same reasoner and the visible part of KB.
The first requirement is desirable if the goal of the rea-
soner is to provide as much information as it can, without
providing wrong information (i.e., information that is in-
consistent with its KB). The last two requirements are natu-
ral because it is unrealistic to assume that any reasoner that
is used on the semantic web can “memorize” all previous
queries that it has answered or track the identity of every
agent that has queried it.
We now proceed to define a reasoner and a privacy-
preserving reasoner:
Definition 1 (Reasoner) Let K be a KB over a language
L, Q the query space (the set of possible assertions to
be tested against K) over L, and A the answer space.
A reasoner R for K is an algorithm that defines a func-
tion R : K × Q → A. For a specific KB K we define
RK : Q → A by setting RK(q) = R(K, q).
An immediate consequence of this definition is that a
reasoner R is “history independent” in the sense suggested
by requirement 2 above.
R might employ an inference engine which can be
viewed as a classical reasoner with answer space A =
{Y, N} such that ∀q ∈ Q, RK(q) = Y iff K ⊢ q (thus,
RK(q) = N iff K ⊬ q). While an inference engine always
responds in a truthful manner, the reasoner, in order
to protect some parts of K, may have an incentive to
pick an answering strategy which does not respond with the
“whole truth”. For example, a reasoner may answer “U”
(Unknown) even if the correct answer (from the inference
engine) is “Y” or “N”. The answer to a query q may be “U”
either because the reasoner has incomplete knowledge (i.e.,
K ⊬ q and K ⊬ ¬q) under OWA, or because the “truthful”
answer to q might risk disclosure of hidden knowledge. Under
OWA, because the querying agent cannot distinguish
between these two cases, the reasoner is able to answer
queries based on inference using hidden knowledge with-
out revealing it.
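The two reasons for answering “U” can be illustrated with a small Python sketch. It is not the paper's construction: entailment is approximated by set membership over a toy fact base for Example 1, and `make_reasoner`, `entails`, `negate`, and `is_protected` are hypothetical names introduced here for illustration.

```python
def make_reasoner(entails, negate, is_protected):
    """Build a three-valued reasoner with answers "Y", "N", or "U".

    entails(q)      -> bool, stand-in for the inference engine (K ⊢ q)
    negate(q)       -> the negated query ¬q
    is_protected(q) -> bool, hypothetical policy saying that a truthful
                       answer to q would risk disclosing hidden knowledge
    """
    def answer(query):
        if is_protected(query):
            return "U"          # hidden knowledge: withhold the answer
        if entails(query):
            return "Y"
        if entails(negate(query)):
            return "N"
        return "U"              # incomplete knowledge under OWA
    return answer


# Toy KB for Example 1; entailment is plain set membership (an assumption).
facts = {"busy(Bob, noon)", "date(Bob, noon)"}
reasoner = make_reasoner(
    entails=lambda q: q in facts,
    negate=lambda q: "not " + q,
    is_protected=lambda q: q.startswith("date("),
)

print(reasoner("busy(Bob, noon)"))    # answered from the KB
print(reasoner("date(Bob, noon)"))    # protected
print(reasoner("meeting(Bob, 3pm)"))  # simply not known
```

The last two queries receive the same answer, so the querying agent cannot tell a protected fact from a missing one.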
Definition 2 (Privacy-Preserving Reasoner) Let K =
(Kv,Kh) be a KB over a language L, Q the query space
in L, and A = {U, Y, N} the answer space, and R a rea-
soner for K. We define:
$Q_Y = R_K^{-1}(Y),\quad Q_N = R_K^{-1}(N),\quad Q_U = R_K^{-1}(U).$
Clearly, Q = QU ∪ QY ∪ QN.
(a) R is strongly privacy-preserving w.r.t. K if it satisfies
the following two axioms:
• Honesty Axiom: (q ∈ QY ⇒ K ⊢ q) and (q ∈ QN ⇒
K ⊢ ¬q).
• Strong Safety Axiom: ∀α that is not a tautology and
Sig(α) ⊆ Sig(Kh), Kh ⊢ α ⇒ (Kv ∪ QY ⊬ α).
(b) R is weakly privacy-preserving w.r.t. K if it satisfies the
Honesty Axiom and the following axiom:
• Weak Safety Axiom: ∀α, α ∈ Kh ⇒ (Kv ∪ QY ⊬ α).
The honesty axiom requires that reasoners provide answers
that do not contradict the given KB (i.e., K ∪ QY
is consistent). The strong safety axiom requires that the
answers provided by reasoners do not disclose any conse-
quence that can be drawn from the hidden knowledge alone.
The weak safety axiom requires the reasoner to protect only
axioms that are explicitly mentioned in the hidden part of
the KB (but not necessarily their consequences).
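For a hierarchical KB such as Example 4, the honesty and weak safety conditions can be checked mechanically on a finite sample of affirmative answers. The sketch below assumes that axioms are pairs (A, B) meaning A ≤ B and that provability is graph reachability (the reduction used later for hierarchical ontologies). The function names are hypothetical, and only “Y” answers are considered.

```python
def derivable(edges, sub, sup):
    """True if sub ≤ sup follows from edges, i.e. sup is reachable from sub."""
    stack, seen = [sub], set()
    while stack:
        node = stack.pop()
        if node == sup:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node)
    return False


def is_honest(visible, hidden, yes_answers):
    """Honesty (Y part only): every "Y" answer is provable from the full KB."""
    kb = visible | hidden
    return all(derivable(kb, a, b) for (a, b) in yes_answers)


def is_weakly_safe(visible, hidden, yes_answers):
    """Weak safety: no hidden axiom is provable from Kv plus the "Y" answers."""
    known = visible | yes_answers
    return not any(derivable(known, a, b) for (a, b) in hidden)


visible = {("SunLodge", "AAADiscountable"), ("Inn", "Hotel")}
hidden = {("SunLodge", "2StarHotel"), ("2StarHotel", "Inn")}

safe_answers = {("SunLodge", "Hotel")}
leaky_answers = {("SunLodge", "2StarHotel")}

print(is_honest(visible, hidden, safe_answers), is_weakly_safe(visible, hidden, safe_answers))
print(is_honest(visible, hidden, leaky_answers), is_weakly_safe(visible, hidden, leaky_answers))
```

Both answer sets are honest, but only the first one keeps every hidden axiom underivable from what the querying agent can see.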
The distinction between “strong safety” and “weak
safety” is useful since different applications need different
degrees of privacy preservation. In the U-Travel example, if
the ontology provider is willing to disclose consequences of
the hidden knowledge, e.g., “SunLodge ≤ Inn”, it can get
by with a weakly privacy-preserving reasoner. On the other
hand, in the online calendar example, suppose we have an
additional piece of hidden knowledge “Alice is Bob’s
girlfriend”. Now, if Bob wants to protect any conclusion that
may follow from the hidden part of his KB, e.g., that “Bob
has a date with Alice at noon”, Bob will need a strongly
privacy-preserving reasoner.
It can be shown that in a general setting, strong safety
is a very restrictive requirement. For example, if there ex-
ist axioms β ∈ Kh and γ ∈ Kv (with Sig(γ) ⊆ Sig(Kh))
such that β ∨ γ is not a tautology, then there is no strongly
privacy-preserving reasoner for K = (Kv,Kh). On the
other hand, weakly privacy-preserving reasoners exist for
any KB that satisfies α ∈ Kh ⇒ Kv ⊬ α. Intuitively,
this means that no hidden axiom is provable from the visible
KB. However, as we shall see in Section 7, it is possible
to design strongly privacy-preserving reasoners in special
cases, for instance, hierarchical ontologies (e.g., the
U-Travel example).
5 Privacy-Preserving Reasoning: General
Strategies
In this section, we discuss general strategies for designing
privacy-preserving reasoners.
Definition 3 (Strategy) Let L be a language, KL the class
of all knowledge bases over L, and RL the class of all rea-
soners over KL. A strategy for L is a function R : KL → RL
such that for every K ∈ KL, R = R(K) is a rea-
soner for K. The strong/weak safety scope of a strategy
R, Scope(R) = {K ∈ KL| R(K) is a strongly/weakly
privacy-preserving reasoner for K}.
A strategy needs to compromise between two possibly
conflicting goals:
1. Generality: An ideal strategy has the largest possible
safety scope, i.e., is able to yield safe reasoners for the
largest possible subclass of KL.
2. Informativeness: An ideal strategy is one that yields
reasoners that provide as much information as possible
in their answers to queries against their KBs, that is,
reasoners that result in the smallest possible QU .
The following two strategies correspond to the “ex-
treme” choices with respect to these two goals:
• Dummy Strategy, i.e., one that always generates a
dummy reasoner, which answers “U” to every possible
query against its KB. Obviously, a dummy reasoner is
weakly privacy-preserving for any KB K = (Kv,Kh)
such that ∀α, α ∈ Kh ⇒ Kv ⊬ α. Note that this
condition is the weakest condition for a KB to have
privacy-preserving reasoners. A dummy strategy is
most general, but least informative. It has the largest
scope, but answers given by reasoners that are based
on it provide no information at all.
• Naive Strategy, i.e., one that generates a naive rea-
soner that reveals everything that follows from its
knowledge base, including the hidden part of the KB.
A naive reasoner is most informative, but is least general:
It is privacy-preserving only for those KBs that
have no hidden knowledge at all (i.e., Kh = ∅).
In practice, we may need to make a tradeoff between the
conflicting requirements of generality and informativeness
of strategies.
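The two extreme strategies can be sketched in a few lines of Python. This is only an illustration: a KB is assumed to be a pair (visible, hidden) of axiom sets, `entails` is a caller-supplied stand-in for an inference engine, and negative answers are left out for brevity.

```python
def dummy_strategy(kb):
    """Return a reasoner that answers "U" to every query."""
    return lambda query: "U"


def naive_strategy(kb, entails):
    """Return a reasoner that reveals whatever the full KB entails.

    entails(axioms, query) is a hypothetical stand-in for an inference
    engine; "N" answers are omitted to keep the sketch short.
    """
    visible, hidden = kb
    return lambda query: "Y" if entails(visible | hidden, query) else "U"


def toy_entails(axioms, query):
    """Toy entailment: a query holds only if it is literally an axiom."""
    return query in axioms


kb = ({("Inn", "Hotel")}, {("SunLodge", "2StarHotel")})
dummy = dummy_strategy(kb)
naive = naive_strategy(kb, toy_entails)

secret = ("SunLodge", "2StarHotel")
print(dummy(secret), naive(secret))
```

The dummy reasoner withholds even the visible axiom Inn ≤ Hotel, while the naive reasoner confirms the hidden axiom, which is why neither extreme is satisfactory when hidden knowledge exists.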
We now proceed to present a general approach for gen-
erating weakly privacy-preserving reasoners for semantic
web ontologies based on the notion of conservative exten-
sion [9]. The basic idea behind this approach is as follows:
Answers to previous queries may be used by the querying
agent to extend the visible part of the KB. The safety of the
strategy can be guaranteed if we can ensure that no conclu-
sions compromising the hidden knowledge can be inferred
from such an extension.
Definition 4 (Conservative Extension, [9]) Let K and K′
be two knowledge bases. K ∪ K′ is a conservative extension
of K, written as K ∪ K′ ≫ K, if for every formula α such
that Sig(α) ⊆ Sig(K), K ∪ K′ ⊢ α iff K ⊢ α.
Let Kvc ⊆ Kv be the maximal set of visible axioms
such that Sig(Kvc) = Sig(Kv) ∩ Sig(Kh). Intuitively, be-
cause the querying agent does not know names that are not
in Sig(Kv), the names in Kvc correspond to “critical sig-
nature”, i.e., the subset of Sig(Kh) that is known to the
querying agent. If we can ensure that answers to queries
together with Kv − Kvc do not reveal any names that belong
to Sig(Kh) beyond those in Sig(Kvc), we can effectively
protect every axiom in Kh. Therefore, if we can ensure that
any extension of Kv with the results of previous queries is
a conservative extension of the critical visible axioms Kvc,
we can protect hidden knowledge. The following lemma
captures this intuition more formally:
Lemma 1 Let K = {Kv,Kh} be a KB such that ∀α, α ∈
Kh ⇒ Kv ⊬ α, R a reasoner for K. R is a weakly privacy-preserving
reasoner for K if it satisfies the honesty axiom
and Kv ∪ QY ≫ Kvc.
Proof: We only need to show that the weak safety axiom
holds under the stated conditions:
α ∈ Kh ⇒ Kv ⊬ α
⇒ Kvc ⊬ α
⇒ Kv ∪ QY ⊬ α □
6 Privacy-Preserving Reasoning with SHIQ
Ontologies
In this section we present a “safe” reasoning strategy
based on conservative extensions for the description logic
SHIQ, which covers a significant part of OWL. Grau et
al. [8] have shown that in the special case of semantically
local ontologies (see below), it is possible to check whether
an extension of a SHIQ ontology is a conservative exten-
sion in polynomial time. Informally, an axiom is semanti-
cally local w.r.t. a signature S if it imposes no restrictions
on the interpretation of names in S. A finite set of axioms
is local w.r.t. S if every axiom in it is local w.r.t. S. Prac-
tical ways to ensure semantic locality of SHIQ ontologies
have been elucidated by Grau et al. [8]. They have also
established a relationship between the notions of conservative
extension and semantic locality of ontologies which we
summarize (adapted for simpler presentation in our setting)
in the following lemma:
Lemma 2 (Definition 3 and Lemma 5 of [8]) Suppose K1
and K2 are two SHIQ TBoxes such that K1 is local w.r.t.
Sig(K2) and K2 is local w.r.t. ∅. Then K1 ∪K2 is a con-
servative extension of K2.
We can now define RCE (read CE-strategy), a reasoning
strategy for SHIQ ontologies, based on the notion of con-
servative extension. Given a SHIQ TBox K = {Kv,Kh}
and a subsumption query q, RCE specifies a reasoner for K
that answers q as follows:
IF q is local w.r.t. Sig(Kvc) and Sig(q) ⊆ Sig(Kv)
    IF K ⊢ q, return Y
    ELSE IF K ⊢ ¬q, return N
    ELSE return U /* incomplete knowledge */
ELSE return U /* hidden knowledge */
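The control flow of the CE-strategy can be mirrored in Python as a thin wrapper around externally supplied services. The sketch below is an assumption-laden illustration: `signature`, `is_local`, `entails`, and `negate` are hypothetical callbacks that a real implementation would delegate to a DL reasoner and a locality checker, and the toy callbacks at the bottom exist only to exercise the branches. In particular, `toy_is_local` is not a real semantic locality test.

```python
from functools import partial


def ce_answer(query, kb, visible_sig, critical_sig, *,
              signature, is_local, entails, negate):
    """Answer a query following the branches of the CE-strategy."""
    if is_local(query, critical_sig) and signature(query) <= visible_sig:
        if entails(kb, query):
            return "Y"
        if entails(kb, negate(query)):
            return "N"
        return "U"   # incomplete knowledge
    return "U"       # hidden knowledge


# --- Placeholder callbacks (assumptions, NOT a DL reasoner) ---------------
def toy_signature(query):
    return set(query)                    # names used in a (sub, sup) query


def toy_is_local(query, sig):
    return not (set(query) & sig)        # crude stand-in for locality


def toy_entails(kb, query):
    return query in kb                   # membership instead of inference


def toy_negate(query):
    return ("not", query)


ask = partial(
    ce_answer,
    kb={("A", "B")},
    visible_sig={"A", "B", "C"},
    critical_sig={"C"},
    signature=toy_signature,
    is_local=toy_is_local,
    entails=toy_entails,
    negate=toy_negate,
)

for q in [("A", "B"), ("B", "A"), ("A", "C"), ("A", "Z")]:
    print(q, ask(q))
```

Keeping the inference and locality services as parameters reflects the meta-reasoner design: the wrapper decides whether to answer, and the underlying engine decides what the truthful answer is.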
Lemma 3 The weak safety scope of RCE includes all
SHIQ TBoxes K = {Kv,Kh} that satisfy the following
properties:
• Kv − Kvc is local w.r.t. Sig(Kvc);
• Kvc is local w.r.t. ∅;
• ∀α ∈ Kh, Kv ⊬ α.
Proof: Let R = RCE(K), where K satisfies the given
properties. Clearly, R satisfies the honesty axiom. By the
definition of the algorithm, QY is local w.r.t. Sig(Kvc).
Since Kv − Kvc is local w.r.t. Sig(Kvc), QY ∪ (Kv − Kvc) is
local w.r.t. Sig(Kvc). Since Kvc is local w.r.t. ∅, by Lemma
2, (Kv − Kvc) ∪ QY ∪ Kvc = Kv ∪ QY ≫ Kvc. By Lemma
1, R is a weakly privacy-preserving reasoner for K. □
It is worth noting that we do not require the hidden
knowledge Kh also to be local, as long as α ∈ Kh ⇒ Kv ⊬
α. This affords considerable flexibility for ontology engineers
in designing the KB.
An important advantage of the CE-strategy for SHIQ
ontologies is that a weakly privacy-preserving reasoner can
be built as a meta reasoner which calls the inference services
of a standard DL reasoner for SHIQ. Thus, implementing
a weakly privacy-preserving reasoner for SHIQ ontologies
is quite straightforward.
7 Privacy-Preserving Reasoning with Hierar-
chical Ontologies
Unlike in the case of general DL ontologies, it is pos-
sible to define a strongly safe strategy, and hence strongly
safe privacy-preserving reasoners in the case of hierarchical
ontologies, e.g., tree- or DAG-structured ontologies.
Formally, a hierarchical ontology K over a finite set of
names S can be represented as a set of visible or hidden
partial order axioms, denoted by K = (S,≤) (as illustrated
in Example 4).
Reasoning with K can be reduced to the graph reacha-
bility problem by defining a corresponding directed graph
G = (V,E), where V is the vertex set corresponding to
elements of S, and E is the edge set corresponding to ≤
axioms. Let $\mathcal{G}$ be the set of all directed graphs. In the following,
we will identify K with the corresponding G.
A vertex (or edge) is said to be a visible vertex (or edge)
if it is mapped from a visible term (or axiom); otherwise it
is said to be hidden. Let Eh be the set of all hidden edges,
Ev be the set of all visible edges, and E = Eh ∪ Ev. For
any set of edges F ⊆ E, let $F^{+}$ denote the transitive closure
of F, and $F^{\le m} = \bigcup_{k=1}^{m} F^{k}$.
For any two visible vertices x and y, y ≤ x if x is reach-
able from y in the graph G, i.e., there exists a path from y
to x (which in general, may contain both visible and hidden
edges). Note that because of the open world assumption, it
is not necessarily the case that y ≤ x is false simply because
there is no path from y to x in G.
An affirmative answer to a query about the reachability
from y to x in G is equivalent to augmenting G by adding a
new visible edge 〈y, x〉. Hence, in order to realize privacy-
preserving reasoning, we should ensure that the initial graph
G (derived from K) can be augmented with previous an-
swers without revealing the existence of hidden edges.
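The reduction to reachability, and the effect of an affirmative answer, can be sketched as follows. The code assumes edges are pairs (y, x) meaning y ≤ x and uses the graph of Example 4. The names `reachable`, `answer`, and `augment` are hypothetical, and `answer` deliberately performs no safety check, so it is not a privacy-preserving reasoner by itself.

```python
from collections import deque


def reachable(edges, start, goal):
    """Breadth-first search for a path with at least one edge from start to goal."""
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        for a, b in edges:
            if a == node and b not in seen:
                if b == goal:
                    return True
                seen.add(b)
                queue.append(b)
    return False


def answer(visible_edges, hidden_edges, y, x):
    """Answer "is y ≤ x?" using both visible and hidden edges."""
    if reachable(visible_edges | hidden_edges, y, x):
        return "Y"
    return "U"   # open world: no path does not make y ≤ x false


def augment(visible_edges, y, x):
    """Graph as seen by the querying agent after a "Y" answer for <y, x>."""
    return visible_edges | {(y, x)}


Ev = {("SunLodge", "AAADiscountable"), ("Inn", "Hotel")}
Eh = {("SunLodge", "2StarHotel"), ("2StarHotel", "Inn")}

print(answer(Ev, Eh, "SunLodge", "Hotel"))   # path uses hidden edges
print(answer(Ev, Eh, "Hotel", "SunLodge"))   # no path, so unknown
print(sorted(augment(Ev, "SunLodge", "Hotel")))
```

A privacy-preserving reasoner would additionally have to verify that the augmented visible graph never allows a hidden edge to be derived.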
First, it is easy to see that strong safety and weak safety
properties can be reduced to each other in the case of hier-
archical ontologies:
Lemma 4 R is a strongly privacy-preserving reasoner for
G = (V,Ev ∪ Eh) iff R is a weakly privacy-preserving
reasoner for G = (V, Ev ∪ $E_h^{+}$).
This lemma follows from the fact that $E_h^{+}$ contains all
possible inference results that can be obtained by consider-
ing only the hidden edges Eh. Henceforth, we will focus on
weak safety.
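The closure of the hidden edges used in Lemma 4 is easy to compute for small graphs. The following sketch uses a simple fixed-point iteration (chosen for brevity, not efficiency) on the hidden edges of Example 4. The function name `transitive_closure` is illustrative.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (a, b) edges."""
    closure = set(edges)
    while True:
        # Compose the current relation with itself to find two-step paths.
        new_edges = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_edges <= closure:
            return closure
        closure |= new_edges


Eh = {("SunLodge", "2StarHotel"), ("2StarHotel", "Inn")}
Eh_plus = transitive_closure(Eh)
print(sorted(Eh_plus - Eh))
```

For Example 4 the closure adds the single edge SunLodge ≤ Inn, which is exactly the consequence of the hidden knowledge that a strongly privacy-preserving reasoner must also protect.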
We now proceed to define several classes of graphs that
have safe strategies with different degrees of informativeness:
$S_{m,n} = \{G \in \mathcal{G} \mid (E^{\le m} - E_h)^{\le n} \cap E_h = \emptyset\}$,
where m,n can be “+” indicating transitive closure. Graphs
in Sm,n are called (m,n)-safe. Intuitively, m represents the