
Kobbi Nissim, Ph.D.
Georgetown University · Department of Computer Science
About
124 Publications · 18,982 Reads
20,334 Citations
Additional affiliations
September 2013 - August 2014
September 2012 - present
September 2004 - present
Publications (124)
We continue a line of research initiated in [10, 11] on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the...
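This perturbation is typically instantiated by adding noise calibrated to the query's sensitivity. A minimal sketch of output perturbation with Laplace noise, assuming a counting query of sensitivity 1 (identifiers such as `delta_f` are illustrative, not from the paper):

```python
import numpy as np

def laplace_mechanism(true_answer: float, delta_f: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Perturb the true answer with Laplace noise of scale delta_f / epsilon."""
    return true_answer + rng.laplace(loc=0.0, scale=delta_f / epsilon)

rng = np.random.default_rng(0)
database = np.array([0, 1, 1, 0, 1])        # toy database of sensitive bits
true_count = float(database.sum())          # f = counting query, sensitivity 1
print(laplace_mechanism(true_count, delta_f=1.0, epsilon=0.5, rng=rng))
```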
We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an n-bit string $d_1,\ldots,d_n$, with a query being a subset $q \subseteq [n]$ to be answered by $\sum_{i\in q} d_i$. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset sums. Applying this reconstruction algorithm to stat...
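The reconstruction idea can be phrased as a linear program: find fractional bits whose subset sums fit the noisy answers, then round. A toy sketch under that reading (the parameters and noise level below are illustrative, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m = 32, 256                              # n secret bits, m random subset queries
d = rng.integers(0, 2, size=n)              # the hidden database
Q = rng.integers(0, 2, size=(m, n))         # each row is a subset indicator
noise = rng.uniform(-2, 2, size=m)          # bounded perturbation of the true answers
answers = Q @ d + noise

# Variables: candidate bits c_1..c_n in [0,1] plus a slack t; minimize the
# worst-case violation t subject to |Q c - answers| <= t.
A_ub = np.vstack([np.hstack([Q, -np.ones((m, 1))]),
                  np.hstack([-Q, -np.ones((m, 1))])])
b_ub = np.concatenate([answers, -answers])
cost = np.zeros(n + 1); cost[-1] = 1.0
res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1)] * n + [(0, None)], method="highs")
recovered = (res.x[:n] > 0.5).astype(int)   # round the fractional solution
print("fraction of bits recovered:", (recovered == d).mean())
```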
In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$, and is required to provide accurate estimations to a sequence of adaptively chosen statistical queries with respect to $D$. Hardt and Ullman (FOCS 2014) and Steinke and Ullman (COLT 2015) showed that in general, it is computationally hard to answer more...
A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswanathan et al., FOCS 2008]. Research uncovered that private learners may need to exhibit significantly higher sample complexity than non-p...
A dynamic algorithm against an adaptive adversary is required to be correct when the adversary chooses the next update after seeing the previous outputs of the algorithm. We obtain faster dynamic algorithms against an adaptive adversary and separation results between what is achievable in the oblivious vs. adaptive settings. To get these results we...
Streaming algorithms are algorithms for processing large data streams, using only a limited amount of memory. Classical streaming algorithms typically work under the assumption that the input stream is chosen independently from the internal state of the algorithm. Algorithms that utilize this assumption are called oblivious algorithms. Recently, th...
Let $\pi$ be an efficient two-party protocol that given security parameter $\kappa$, both parties output single bits $X_\kappa$ and $Y_\kappa$, respectively. We are interested in how $(X_\kappa,Y_\kappa)$ "appears" to an efficient adversary that only views the transcript $T_\kappa$. We make the following contributions: $\bullet$ We develop new tool...
We provide a lower bound on the sample complexity of distribution-free parity learning in the realizable case in the shuffle model of differential privacy. Namely, we show that the sample complexity of learning $d$-bit parity functions is $\Omega(2^{d/2})$. Our result extends a recent similar lower bound on the sample complexity of private agnostic l...
We present a streaming problem for which every adversarially-robust streaming algorithm must use polynomial space, while there exists a classical (oblivious) streaming algorithm that uses only polylogarithmic space. This results in a strong separation between oblivious and adversarially-robust streaming algorithms.
A private learner is an algorithm that given a sample of labeled individual examples outputs a generalizing hypothesis while preserving the privacy of each individual. In 2008, Kasiviswanathan et al. (FOCS 2008) gave a generic construction of private learners, in which the sample complexity is (generally) higher than what is needed for non-private...
The shuffle model of differential privacy [Bittau et al. SOSP 2017; Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019] was proposed as a viable model for performing distributed differentially private computations. Informally, the model consists of an untrusted analyzer that receives messages sent by participating parties via a shuffle functio...
The shuffle model of differential privacy was proposed as a viable model for performing distributed differentially private computations. Informally, the model consists of an untrusted analyzer that receives messages sent by participating parties via a shuffle functionality that potentially disassociates messages from their senders. Prior wor...
Significance
This article addresses a gap between legal and technical conceptions of data privacy and demonstrates how it can be minimized. The article focuses on “singling out,” which is a concept appearing in the GDPR. Our analysis draws on the legislation, regulatory guidance, and mathematical reasoning to derive a technical concept—“predicate s...
The shuffle model of differential privacy (Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019) and its close relative encode-shuffle-analyze (Bittau et al. SOSP 2017) provide a fertile middle ground between the well-known local and central models. Similarly to the local model, the shuffle model assumes an untrusted data collector who receives...
We briefly report on a successful linear program reconstruction attack performed on a production statistical queries system using a real dataset. The attack was deployed in a test environment in the course of the Aircloak Challenge bug bounty program and is based on the reconstruction algorithm of Dwork, McSherry, and Talwar. We empirically evalu...
Motivated by the desire to bridge the utility gap between local and trusted curator models of differential privacy for practical applications, we initiate the theoretical study of a hybrid model introduced by "Blender" [Avent et al., USENIX Security '17], in which differentially private protocols of n agents that work in the local model are assist...
We study the problem of verifying differential privacy for straight line programs with probabilistic choice. Programs in this class can be seen as randomized Boolean circuits. We focus on two different questions: first, deciding whether a program satisfies a prescribed level of privacy; second, approximating the privacy parameters a program realize...
A protocol by Ishai et al. (FOCS 2006) showing how to implement distributed $n$-party summation from secure shuffling has regained relevance in the context of the recently proposed shuffle model of differential privacy, as it allows one to attain the accuracy levels of the curator model at a moderate communication cost. To achieve statistical s...
This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in b...
In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{\epsilon, \delta}(1)$ error and $\Theta(\epsilon\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $\Omega(1/\sqrt{n})$, and there exist p...
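A toy simulation of the trust model these works study (not of this paper's protocol): each party randomizes a bit locally, a shuffler strips sender identities by permuting the messages, and the analyzer debiases the sum it observes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10_000, 0.25                    # p = probability of replacing the bit at random
x = rng.integers(0, 2, size=n)         # each party's private bit

keep = rng.random(n) >= p              # local randomizer: randomized response
msgs = np.where(keep, x, rng.integers(0, 2, size=n))
shuffled = rng.permutation(msgs)       # the shuffler disassociates messages from senders

# The analyzer sees only the multiset of messages; debias the observed sum
# using E[msg] = (1 - p) * x + p / 2.
est = (shuffled.sum() - n * p / 2) / (1 - p)
print("true sum:", x.sum(), "estimate:", round(est))
```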
In a recent paper Chan et al. [SODA '19] proposed a relaxation of the notion of (full) memory obliviousness, which was introduced by Goldreich and Ostrovsky [J. ACM '96] and extensively researched by cryptographers. The new notion, differential obliviousness, requires that any two neighboring inputs exhibit similar memory access patterns, where the...
There is a significant conceptual gap between legal and mathematical thinking around data privacy. The effect is uncertainty as to which technical offerings adequately match expectations expressed in legal standards. The uncertainty is exacerbated by a litany of successful privacy attacks, demonstrating that traditional statistical disclosure limit...
We present a private learner for halfspaces over an arbitrary finite domain $X\subset \mathbb{R}^d$ with sample complexity $\mathrm{poly}(d,2^{\log^*|X|})$. The building block for this learner is a differentially private algorithm for locating an approximate center point of $m>\mathrm{poly}(d,2^{\log^*|X|})$ points -- a high dimensional generalizati...
We briefly report on a linear program reconstruction attack performed on a production statistical queries system using a real dataset. The attack was deployed in a test environment in the course of the Aircloak Challenge bug bounty program.
This position paper observes how different technical and normative conceptions of privacy have evolved in parallel and describes the practical challenges that these divergent approaches pose. Notably, past technologies relied on intuitive, heuristic understandings of privacy that have since been shown not to satisfy expectations for privacy protect...
While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure post hoc...
Data driven segmentation is the powerhouse behind the success of online advertising. Various underlying challenges for successful segmentation have been studied by the academic community, with one notable exception: consumers' incentives have typically been ignored. This lacuna is troubling as consumers have much control over the data being collect...
Differential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics. These...
We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error and running time -- TreeHist and Bitstogram. In both algorithms, server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STO...
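The named algorithms are more involved, but the basic local-model primitive they refine can be sketched as k-ary randomized response with a debiased frequency estimate (an illustrative baseline, not TreeHist or Bitstogram itself):

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, eps = 8, 50_000, 1.0
items = rng.integers(0, k, size=n)             # each user's private item

p = np.exp(eps) / (np.exp(eps) + k - 1)        # report the true item w.p. p
q = 1.0 / (np.exp(eps) + k - 1)                # and each other item w.p. q

truthful = rng.random(n) < p
reports = np.where(truthful, items,
                   (items + rng.integers(1, k, size=n)) % k)  # uniform over the others

counts = np.bincount(reports, minlength=k)
estimates = (counts - n * q) / (p - q)         # unbiased per-item frequency estimates
print("true:", np.bincount(items, minlength=k))
print("est :", np.round(estimates).astype(int))
```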
We revisit the problem of finding a minimum enclosing ball with differential privacy: Given a set of $n$ points in the Euclidean space $\mathbb{R}^d$ and an integer $t\leq n$, the goal is to find a ball of the smallest radius $r_{opt}$ enclosing at least $t$ input points. The problem is motivated by its various applications to differential privacy,...
As organizations struggle with vast amounts of data, outsourcing sensitive data to third parties becomes a necessity. To protect the data, various cryptographic techniques are used in outsourced database systems to ensure data privacy, while allowing efficient querying. Recent attacks on such systems demonstrate that outsourced database systems mus...
We continue a line of research initiated in Dinur and Nissim (2003); Dwork and Nissim (2004); and Blum et al. (2005) on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function $f$ mapping databases to reals, the so-called true answer is the result of applying...
Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally...
A new line of work demonstrates how differential privacy can be used as a mathematical tool for guaranteeing generalization in adaptive data analysis. Specifically, if a differentially private analysis is applied on a sample S of i.i.d. examples to select a low-sensitivity function f, then w.h.p. f(S) is close to its expectation, even though f is b...
Recently, various protocols have been proposed for securely outsourcing database storage to a third party server, ranging from systems with "full-fledged" security based on strong cryptographic primitives such as fully homomorphic encryption or oblivious RAM, to more practical implementations based on searchable symmetric encryption or even on dete...
We provide an overview of PSI ("a Private data Sharing Interface"), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.
Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hard...
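The failure mode motivating this line of work can be seen in a few lines: when queries are chosen adaptively on a single sample, a rule built from pure noise can look accurate on that sample (a self-contained simulation, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 1000
X = rng.choice([-1.0, 1.0], size=(n, d))       # features, independent of the labels
y = rng.choice([-1.0, 1.0], size=n)            # random labels: nothing to learn

# Round 1 of queries: each feature's empirical correlation with y on the sample.
corr = X.T @ y / n
# Adaptive step: keep only the features that looked positively correlated.
selected = corr > 0
# Final query: sample accuracy of the majority vote over the selected features.
predictions = np.sign(X[:, selected].sum(axis=1))
print("empirical accuracy:", (predictions == y).mean())   # well above 1/2
print("true accuracy of any such rule: 0.5 (labels are independent of X)")
```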
We present a new algorithm for locating a small cluster of points with differential privacy [Dwork, McSherry, Nissim, and Smith, 2006]. Our algorithm has implications to private data exploration, clustering, and removal of outliers. Furthermore, we use it to significantly relax the requirements of the sample and aggregate technique [Nissim, Raskhod...
The traditional notion of generalization --- i.e., learning a hypothesis whose empirical error is close to its true error --- is surprisingly brittle. As has recently been noted [DFH+15b], even if several algorithms have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three...
We consider the problem of computing the intersection of private datasets of two parties, where the datasets contain lists of elements taken from a large domain. This problem has many applications for online collaboration. In this work, we present protocols based on the use of homomorphic encryption and different hashing schemes for both the semi-h...
We investigate the {\em direct-sum} problem in the context of differentially private PAC learning: What is the sample complexity of solving k learning tasks simultaneously under differential privacy, and how does this cost compare to that of solving k learning tasks without privacy? In our setting, an individual example consists of a domain element...
We propose graph encryption schemes that efficiently support approximate shortest distance queries on large-scale encrypted graphs. Shortest distance queries are one of the most fundamental graph operations and have a wide range of applications. Using such graph encryption schemes, a client can outsource large-scale privacy-sensitive graphs to an u...
We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound...
A new line of work, started with Dwork et al., studies the task of answering statistical queries using a sample and relates the problem to the concept of differential privacy. By the Hoeffding bound, a sample of size $O(\log k/\alpha^2)$ suffices to answer $k$ non-adaptive queries within error $\alpha$, where the answers are computed by evaluating...
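As a worked instance of that Hoeffding-bound count (with an assumed failure probability, named `beta` below):

```python
import math

k, alpha, beta = 10_000, 0.05, 0.01            # k queries, error alpha, failure prob beta
n = math.log(2 * k / beta) / (2 * alpha ** 2)  # Hoeffding + union bound over k queries
print(math.ceil(n))                            # 2902 samples answer all k queries here
```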
On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data priva...
We compare the sample complexity of private learning and sanitization tasks under pure ε-differential privacy [Dwork, McSherry, Nissim, and Smith TCC 2006] and approximate (ε,δ)-differential privacy [Dwork, Kenthapadi, McSherry, Mironov, and Naor EUROCRYPT 2006]. We show that the sample complexity of these tasks under approximate differential priva...
In 2008, Kasiviswanathan et al. defined private learning as a combination of PAC learning and differential privacy [16]. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private le...
We prove new positive and negative results concerning the existence of truthful and individually rational mechanisms for purchasing private data from individuals with unbounded and sensitive privacy preferences. We strengthen the impossibility results of Ghosh and Roth (EC 2011) by extending them to a much wider class of privacy valuations. In partic...
Imagine a data set consisting of private information about individuals. The online query auditing problem is: given a sequence of queries that have already been posed about the data, their corresponding answers and given a new query, deny the answer if privacy can be breached or give the true answer otherwise. We investigate the fundamental problem...
In the setting of secure multiparty computation, a set of parties wish to compute a joint function of their inputs, while preserving properties like privacy, correctness, and independence of inputs. One security property that has typically not been considered in the past relates to the length or size of the parties' inputs. This is despite the fact...
A formal investigation of the utility–privacy tradeoff in statistical databases has proved essential for the rigorous discussion of privacy of recent years. Initial results in this direction dealt with databases that answer (all) subset-sum queries to within some fixed distortion [Dinur and Nissim, PODC 2003]. Subsequent work has extended these res...
We develop algorithms for the private analysis of network data that provide accurate analysis of realistic networks while satisfying stronger privacy guarantees than those of previous work. We present several techniques for designing node differentially private algorithms, that is, algorithms whose output distribution does not change significantly...
In traditional mechanism design, agents only care about the utility they derive from the outcome of the mechanism. We look at a richer model where agents also assign non-negative dis-utility to the information about their private types leaked by the outcome of the mechanism. We present a new model for privacy-aware mechanism design, where we only a...
We examine the combination of two directions in the field of privacy concerning computations over distributed private inputs - secure function evaluation (SFE) and differential privacy. While in both the goal is to privately evaluate some function of the individual inputs, the privacy requirements are significantly different. The general feasibilit...
The notion of a universally utility-maximizing privacy mechanism was recently introduced by Ghosh, Roughgarden, and Sundararajan [STOC 2009]. These are mechanisms that guarantee optimal utility to a large class of information consumers, simultaneously, while preserving Differential Privacy [Dwork, McSherry, Nissim, and Smith, TCC 2006]. Ghosh et al...
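For a single counting query, the mechanism shown optimal in that work is the geometric mechanism: add two-sided geometric noise with mass proportional to $\alpha^{|z|}$ for $\alpha = e^{-\epsilon}$. A minimal sampler, using the standard fact that the difference of two i.i.d. geometric variables has exactly this distribution:

```python
import numpy as np

def geometric_mechanism(true_count: int, epsilon: float,
                        rng: np.random.Generator) -> int:
    alpha = np.exp(-epsilon)
    # rng.geometric counts trials in {1, 2, ...}; shift to support {0, 1, ...}.
    g1 = rng.geometric(1 - alpha) - 1
    g2 = rng.geometric(1 - alpha) - 1
    return true_count + int(g1 - g2)

rng = np.random.default_rng(5)
print(geometric_mechanism(true_count=42, epsilon=0.5, rng=rng))
```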
We revisit the problem of constructing efficient secure two-party protocols for set-intersection and set-union, focusing on the model of malicious parties. Our main results are constant-round protocols that exhibit linear communication and a linear number of exponentiations with simulation based security. At the heart of these constructions is a te...
We study the implementation challenge in an abstract interdependent values model and an arbitrary objective function. We design a generic mechanism that allows for approximate optimal implementation of insensitive objective functions in ex-post Nash equilibrium. If, furthermore, values are private then the same mechanism is strategy proof. We cast...
Learning is a task that generalizes many of the analyses that are applied to collections of data, in particular, to collections of sensitive individual information. Hence, it is natural to ask what can be learned while preserving individual privacy. Kasiviswanathan et al. (in SIAM J. Comput., 40(3):793–826, 2011) initiated such a discussion. They f...
A coreset of a point set P is a small weighted set of points that captures some geometric properties of $P$. Coresets have found use in a vast host of geometric settings. We forge a link between coresets, and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private core...
When the founders of this Journal -- Cynthia Dwork, Stephen Fienberg and Alan Karr -- made its initial call for papers, they and we identified many constituencies that participate in the scientific analysis of privacy and confidentiality. Statisticians, particularly those working within national statistical offices, have developed the field of stat...
We revisit the problem of constructing efficient secure two-party protocols for the problems of set-intersection and set-union, focusing on the model of malicious parties. Our main results are constant-round protocols that exhibit linear communication and a (practically) linear number of exponentiations with simulation based security. At the heart...
We describe output perturbation techniques that allow for a provable, rigorous sense of individual privacy. Examples where the techniques are effective span from basic statistical computations to sophisticated machine learning algorithms.
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we invest...
Many approximation algorithms have been presented in the last decades for hard search problems. The focus of this paper is on cryptographic applications, where it is desired to design algorithms which do not leak unnecessary information. Specifically, we are interested in private approximation algorithms - efficient algorithms whose output does n...
Statistical database solutions are commonly categorized into output perturbation and query restriction methods. In output perturbation, noise is added to the query responses. In the query restriction family of methods, the trail of queries is monitored to ensure that it is not possible to combine answers to queries so as to deduce information about any individual. We are concerned with the latter query restriction style of statistical database solu...
Secure multiparty computation allows a group of distrusting parties to jointly compute a (possibly randomized) function of their inputs. However, it is often the case that the parties executing a computation try to solve a search problem, where one input may have a multitude of correct answers – such as when the parties compute a shortest path in a...
We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instance-based additive noise. That is, th...
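A well-known instance of instance-based additive noise is calibrating to the smooth sensitivity of the median on a bounded domain. A sketch under that reading, where the final noise calibration is one standard choice of constants rather than necessarily the paper's exact parameters:

```python
import numpy as np

def smooth_sensitivity_median(data: np.ndarray, beta: float, lam: float) -> float:
    """max over k of e^{-beta k} * (local sensitivity of the median at distance k)."""
    x = np.sort(data)
    n = len(x)
    m = n // 2                                    # median index; n assumed odd

    def elem(i: int) -> float:                    # pad with the domain endpoints [0, lam]
        if i < 0:
            return 0.0
        if i >= n:
            return lam
        return float(x[i])

    best = 0.0
    for k in range(n + 1):
        ls_k = max(elem(m + t) - elem(m + t - k - 1) for t in range(k + 2))
        best = max(best, np.exp(-beta * k) * ls_k)
    return best

rng = np.random.default_rng(6)
data = rng.uniform(0, 100, size=101)
eps, delta, lam = 1.0, 1e-6, 100.0
beta = eps / (2 * np.log(2 / delta))              # one admissible choice of beta
s = smooth_sensitivity_median(data, beta, lam)
print(np.median(data) + (2 * s / eps) * rng.laplace())
```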
Private approximation of search problems deals with finding approximate solutions to search problems while disclosing as little information as possible. The focus of this work is on private approximation of the vertex cover problem and two well studied clustering problems – k-center and k-median. Vertex cover was considered in [Beimel, Carmi, Nissi...
We initiate a study of tradeoffs between communication and computation in well-known communication models and in other related models. The fundamental question we investigate is the following: Is there a computational task that exhibits a strong tradeoff behavior between the amount of communication and the amount of time needed for local computatio...
The discrete logarithm problem (DLP) generalizes to the constrained DLP, where the secret exponent $x$ belongs to a set known to the attacker. The complexity of generic algorithms for solving the constrained DLP depends on the choice of the set. Motivated by cryptographic applications, we study sets with succinct representation for which the constr...
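For the simplest constrained set, an interval $[0, B)$, the classic generic algorithm is baby-step giant-step, using $O(\sqrt{B})$ group operations. A plain sketch over the multiplicative group mod p (parameters below are illustrative):

```python
import math

def bsgs(g: int, h: int, p: int, bound: int) -> int | None:
    """Find x in [0, bound) with g^x = h (mod p), or None if no such x."""
    m = math.isqrt(bound - 1) + 1
    baby = {}
    e = 1
    for j in range(m):                 # baby steps: store g^j -> j
        baby.setdefault(e, j)
        e = e * g % p
    factor = pow(g, -m, p)             # g^{-m} mod p
    gamma = h
    for i in range(m):                 # giant steps: check h * g^{-i*m}
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * factor % p
    return None

p, g = 1_000_003, 2
x = 123_456
print(bsgs(g, pow(g, x, p), p, bound=1_000_000))   # recovers 123456
```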
We present communication efficient secure protocols for a variety of linear algebra problems. Our main building block is a protocol for computing Gaussian Elimination on encrypted data. As input for this protocol, Bob holds a k × k matrix M, encrypted with Alice’s key. At the end of the protocol run, Bob holds an encryption of an upper-triangular m...
We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is $\sum_{i\in S} f$...
Given a data set consisting of private information about individuals, we consider the online query auditing problem: given a sequence of queries that have already been posed about the data, their corresponding answers (where each answer is either the true answer or "denied", in the event that revealing the answer compromises privacy) and given a...