Conference Paper

Epistemic privacy

DOI: 10.1145/1376916.1376941 Conference: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, June 9-11, 2008, Vancouver, BC, Canada
Source: DBLP


We present a novel definition of privacy in the framework of offline (retroactive) database query auditing. Given information about the database, a description of sensitive data, and assumptions about users' prior knowledge, our goal is to determine if answering a past user's query could have led to a privacy breach. According to our definition, an audited property A is private, given the disclosure of property B, if no user can gain confidence in A by learning B, subject to prior knowledge constraints. Privacy is not violated if the disclosure of B causes a loss of confidence in A. The new notion of privacy is formalized using the well-known semantics for reasoning about knowledge, where logical properties correspond to sets of possible worlds (databases) that satisfy these properties. Database users are modelled either as possibilistic agents, whose knowledge is a set of possible worlds, or as probabilistic agents, whose knowledge is a probability distribution on possible worlds. We analyze the new privacy notion, show its relationship with the conventional approach, and derive criteria that allow the auditor to test privacy efficiently in some important cases. In particular, we prove characterization theorems for the possibilistic case, and study in depth the probabilistic case under the assumption that all database records are considered a priori independent by the user, as well as under more relaxed (or absent) prior-knowledge assumptions. In the probabilistic case we show that for certain families of distributions there is no efficient algorithm to test whether an audited property A is private given the disclosure of a property B, assuming P ≠ NP. Nevertheless, for many interesting families, such as the family of product distributions, we obtain algorithms that are efficient both in theory and in practice.

Available from: Ronald Fagin, Nov 06, 2014
Cited in:
  • ABSTRACT: Inference control aims at preventing a participant from gaining a piece of information that is to be kept confidential. Considering a provider-client architecture for information systems, we present transaction-based protocols for provider-client interactions and prove that the inference control performed by the provider is indeed effective. The interactions include the provider answering a client’s query and processing update requests of two forms: such a request is either initiated by the provider, and thus possibly forwarded to clients in order to refresh their views, or initiated by a client according to his view, and thus translated to the repository maintained by the provider.
    No preview · Conference Paper · Sep 2009
  • ABSTRACT: OWL ontologies are extensively used in the clinical sciences, with ontologies such as SNOMED CT being a component of the health information systems of several countries. Preserving the privacy of information in ontology-based systems (e.g., preventing unauthorised access to the system's data and ontological knowledge) is a critical requirement, especially when the system is accessed by numerous users with different privileges and is distributed across applications. Unauthorised disclosure of medical information from SNOMED-based systems, for example, could be disastrous for government organisations, companies and, most importantly, for the patients themselves. Privacy-related issues can be expected to become increasingly important as ontology-based technologies are integrated into mainstream applications. In this short paper, I discuss several challenges and open problems, and sketch possible research directions.
    No preview · Article · Jan 2010 · Semantic Web
  • ABSTRACT: The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data would violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and on agreements governing the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. PPDP has recently received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
    Full-text · Article · Jun 2010 · ACM Computing Surveys