Kobbi Nissim

Kobbi Nissim
Georgetown University | GU · Department of Computer Science

Ph.D.

About

122
Publications
16,266
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
15,755
Citations
Additional affiliations
September 2013 - August 2014
Boston University
Position
  • Visiting research professor
September 2012 - present
Harvard University
Position
  • Visiting Scholar/Professor
September 2004 - present
Ben-Gurion University of the Negev
Position
  • Professor (Associate)

Publications

Publications (122)
Conference Paper
Full-text available
We continue a line of research initiated in [10,11]on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the...
Conference Paper
Full-text available
We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an n-bit string d1,..,dn, with a query being a subset q ⊆ [n] to be answered by Σiεqdi. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset sums. Applying this reconstruction algorithm to stat...
Preprint
A dynamic algorithm against an adaptive adversary is required to be correct when the adversary chooses the next update after seeing the previous outputs of the algorithm. We obtain faster dynamic algorithms against an adaptive adversary and separation results between what is achievable in the oblivious vs. adaptive settings. To get these results we...
Chapter
Streaming algorithms are algorithms for processing large data streams, using only a limited amount of memory. Classical streaming algorithms typically work under the assumption that the input stream is chosen independently from the internal state of the algorithm. Algorithms that utilize this assumption are called oblivious algorithms. Recently, th...
Preprint
Let $\pi$ be an efficient two-party protocol that given security parameter $\kappa$, both parties output single bits $X_\kappa$ and $Y_\kappa$, respectively. We are interested in how $(X_\kappa,Y_\kappa)$ "appears" to an efficient adversary that only views the transcript $T_\kappa$. We make the following contributions: $\bullet$ We develop new tool...
Preprint
Full-text available
We provide a lowerbound on the sample complexity of distribution-free parity learning in the realizable case in the shuffle model of differential privacy. Namely, we show that the sample complexity of learning $d$-bit parity functions is $\Omega(2^{d/2})$. Our result extends a recent similar lowerbound on the sample complexity of private agnostic l...
Preprint
We present a streaming problem for which every adversarially-robust streaming algorithm must use polynomial space, while there exists a classical (oblivious) streaming algorithm that uses only polylogarithmic space. This results in a strong separation between oblivious and adversarially-robust streaming algorithms.
Article
Full-text available
A private learner is an algorithm that given a sample of labeled individual examples outputs a generalizing hypothesis while preserving the privacy of each individual. In 2008, Kasiviswanathan et al. (FOCS 2008) gave a generic construction of private learners, in which the sample complexity is (generally) higher than what is needed for non-private...
Chapter
The shuffle model of differential privacy [Bittau et al. SOSP 2017; Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019] was proposed as a viable model for performing distributed differentially private computations. Informally, the model consists of an untrusted analyzer that receives messages sent by participating parties via a shuffle functio...
Preprint
The shuffle model of differential privacy was proposed as a viable model for performing distributed differentially private computations. Informally, the model consists of an untrusted analyzer that receives messages sent by participating parties via a shuffle functionality, the latter potentially disassociates messages from their senders. Prior wor...
Article
Full-text available
There is a significant conceptual gap between legal and mathematical thinking around data privacy. The effect is uncertainty as to which technical offerings meet legal standards. This uncertainty is exacerbated by a litany of successful privacy attacks demonstrating that traditional statistical disclosure limitation techniques often fall short of t...
Preprint
The shuffle model of differential privacy (Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019) and its close relative encode-shuffle-analyze (Bittau et al. SOSP 2017) provide a fertile middle ground between the well-known local and central models. Similarly to the local model, the shuffle model assumes an untrusted data collector who receives...
Article
We briefly report on a successful linear program reconstruction attack performed on a production statistical queries system and using a real dataset. The attack was deployed in test environment in the course of the Aircloak Challenge bug bounty program and is based on the reconstruction algorithm of Dwork, McSherry, and Talwar. We empirically evalu...
Preprint
Motivated by the desire to bridge the utility gap between local and trusted curator models of differential privacy for practical applications, we initiate the theoretical study of a hybrid model introduced by "Blender" [Avent et al.,\ USENIX Security '17], in which differentially private protocols of n agents that work in the local-model are assist...
Preprint
We study the problem of verifying differential privacy for straight line programs with probabilistic choice. Programs in this class can be seen as randomized Boolean circuits. We focus on two different questions: first, deciding whether a program satisfies a prescribed level of privacy; second, approximating the privacy parameters a program realize...
Preprint
A protocol by Ishai et al.\ (FOCS 2006) showing how to implement distributed $n$-party summation from secure shuffling has regained relevance in the context of the recently proposed \emph{shuffle model} of differential privacy, as it allows to attain the accuracy levels of the curator model at a moderate communication cost. To achieve statistical s...
Chapter
This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in b...
Preprint
In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{\epsilon, \delta}(1)$ error and $\Theta(\epsilon\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $\Omega(1/\sqrt{n})$, and there exist p...
Preprint
In a recent paper Chan et al. [SODA '19] proposed a relaxation of the notion of (full) memory obliviousness, which was introduced by Goldreich and Ostrovsky [J. ACM '96] and extensively researched by cryptographers. The new notion, differential obliviousness, requires that any two neighboring inputs exhibit similar memory access patterns, where the...
Preprint
There is a significant conceptual gap between legal and mathematical thinking around data privacy. The effect is uncertainty as to which technical offerings adequately match expectations expressed in legal standards. The uncertainty is exacerbated by a litany of successful privacy attacks, demonstrating that traditional statistical disclosure limit...
Preprint
This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in b...
Preprint
We present a private learner for halfspaces over an arbitrary finite domain $X\subset \mathbb{R}^d$ with sample complexity $mathrm{poly}(d,2^{\log^*|X|})$. The building block for this learner is a differentially private algorithm for locating an approximate center point of $m>\mathrm{poly}(d,2^{\log^*|X|})$ points -- a high dimensional generalizati...
Preprint
We briefly report on a linear program reconstruction attack performed on a production statistical queries system and using a real dataset. The attack was deployed in test environment in the course of the Aircloak Challenge bug bounty program.
Article
This position paper observes how different technical and normative conceptions of privacy have evolved in parallel and describes the practical challenges that these divergent approaches pose. Notably, past technologies relied on intuitive, heuristic understandings of privacy that have since been shown not to satisfy expectations for privacy protect...
Preprint
While statistics and machine learning offers numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure post hoc...
Preprint
Full-text available
Data driven segmentation is the powerhouse behind the success of online advertising. Various underlying challenges for successful segmentation have been studied by the academic community, with one notable exception - consumers incentives have been typically ignored. This lacuna is troubling as consumers have much control over the data being collect...
Article
Differential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics. These...
Article
We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error and running time -- TreeHist and Bitstogram. In both algorithms, server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STO...
Article
We revisit the problem of finding a minimum enclosing ball with differential privacy: Given a set of $n$ points in the Euclidean space $\mathbb{R}^d$ and an integer $t\leq n$, the goal is to find a ball of the smallest radius $r_{opt}$ enclosing at least $t$ input points. The problem is motivated by its various applications to differential privacy,...
Article
As organizations struggle with vast amounts of data, outsourcing sensitive data to third parties becomes a necessity. To protect the data, various cryptographic techniques are used in outsourced database systems to ensure data privacy, while allowing efficient querying. Recent attacks on such systems demonstrate that outsourced database systems mus...
Article
We continue a line of research initiated in Dinur and Nissim (2003); Dwork and Nissim (2004); and Blum et al. (2005) on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function $f$ mapping databases to reals, the so-called {\em true answer} is the result of applying...
Conference Paper
Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally...
Article
Full-text available
A new line of work demonstrates how differential privacy can be used as a mathematical tool for guaranteeing generalization in adaptive data analysis. Specifically, if a differentially private analysis is applied on a sample S of i.i.d. examples to select a low-sensitivity function f, then w.h.p. f(S) is close to its expectation, even though f is b...
Article
Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally...
Conference Paper
Recently, various protocols have been proposed for securely outsourcing database storage to a third party server, ranging from systems with "full-fledged" security based on strong cryptographic primitives such as fully homomorphic encryption or oblivious RAM, to more practical implementations based on searchable symmetric encryption or even on dete...
Article
Full-text available
We provide an overview of PSI ("a Private data Sharing Interface"), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.
Conference Paper
Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hard...
Article
We present a new algorithm for locating a small cluster of points with differential privacy [Dwork, McSherry, Nissim, and Smith, 2006]. Our algorithm has implications to private data exploration, clustering, and removal of outliers. Furthermore, we use it to significantly relax the requirements of the sample and aggregate technique [Nissim, Raskhod...
Article
Full-text available
The traditional notion of generalization --- i.e., learning a hypothesis whose empirical error is close to its true error --- is surprisingly brittle. As has recently been noted [DFH+15b], even if several algorithms have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three...
Article
We consider the problem of computing the intersection of private datasets of two parties, where the datasets contain lists of elements taken from a large domain. This problem has many applications for online collaboration. In this work, we present protocols based on the use of homomorphic encryption and different hashing schemes for both the semi-h...
Article
Full-text available
We investigate the {\em direct-sum} problem in the context of differentially private PAC learning: What is the sample complexity of solving k learning tasks simultaneously under differential privacy, and how does this cost compare to that of solving k learning tasks without privacy? In our setting, an individual example consists of a domain element...
Article
Full-text available
Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hard...
Conference Paper
We propose graph encryption schemes that efficiently support approximate shortest distance queries on large-scale encrypted graphs. Shortest distance queries are one of the most fundamental graph operations and have a wide range of applications. Using such graph encryption schemes, a client can outsource large-scale privacy-sensitive graphs to an u...
Research
Full-text available
Data driven segmentation is the powerhouse behind the success of online advertising. Various underlying challenges for successful segmentation have been studied by the academic community, with one notable exception 􀀀 consumers incentives have been typically ignored. This lacuna is troubling as consumers have much control over the data being collect...
Article
Full-text available
We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound...
Article
A new line of work, started with Dwork et al., studies the task of answering statistical queries using a sample and relates the problem to the concept of differential privacy. By the Hoeffding bound, a sample of size $O(\log k/\alpha^2)$ suffices to answer $k$ non-adaptive queries within error $\alpha$, where the answers are computed by evaluating...
Article
On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data priva...
Article
Full-text available
A private learner is an algorithm that given a sample of labeled individual examples outputs a generalizing hypothesis while preserving the privacy of each individual. In 2008, Kasiviswanathan et al. (FOCS 2008) gave a generic construction of private learners, in which the sample complexity is (generally) higher than what is needed for non-private...
Article
Full-text available
We compare the sample complexity of private learning and sanitization tasks under pure ε-differential privacy [Dwork, McSherry, Nissim, and Smith TCC 2006] and approximate (ε,δ)-differential privacy [Dwork, Kenthapadi, McSherry, Mironov, and Naor EUROCRYPT 2006]. We show that the sample complexity of these tasks under approximate differential priva...
Article
Full-text available
In 2008, Kasiviswanathan el al. defined private learning as a combination of PAC learning and differential privacy [16]. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private le...
Article
Full-text available
We prove new positive and negative results concerning the existence of truthful and individually rational mechanisms for purchasing private data from individuals with unbounded and sensitive privacy preferences. We strengthen the impossibility results of Ghosh and Roth (EC 2011) by extending it to a much wider class of privacy valuations. In partic...
Article
On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data priva...
Article
Imagine a data set consisting of private information about individuals. The online query auditing problem is: given a sequence of queries that have already been posed about the data, their corresponding answers and given a new query, deny the answer if privacy can be breached or give the true answer otherwise. We investigate the fundamental problem...
Conference Paper
In the setting of secure multiparty computation, a set of parties wish to compute a joint function of their inputs, while preserving properties like privacy, correctness, and independence of inputs. One security property that has typically not been considered in the past relates to the length or size of the parties inputs. This is despite the fact...
Article
A formal investigation of the utility–privacy tradeoff in statistical databases has proved essential for the rigorous discussion of privacy of recent years. Initial results in this direction dealt with databases that answer (all) subset-sum queries to within some fixed distortion [Dinur and Nissim, PODC 2003]. Subsequent work has extended these res...
Conference Paper
We develop algorithms for the private analysis of network data that provide accurate analysis of realistic networks while satisfying stronger privacy guarantees than those of previous work. We present several techniques for designing node differentially private algorithms, that is, algorithms whose output distribution does not change significantly...
Article
Full-text available
In traditional mechanism design, agents only care about the utility they derive from the outcome of the mechanism. We look at a richer model where agents also assign non-negative dis-utility to the information about their private types leaked by the outcome of the mechanism. We present a new model for privacy-aware mechanism design, where we only a...
Conference Paper
Full-text available
We examine the combination of two directions in the field of privacy concerning computations over distributed private inputs - secure function evaluation (SFE) and differential privacy. While in both the goal is to privately evaluate some function of the individual inputs, the privacy requirements are significantly different. The general feasibilit...
Article
Full-text available
The notion of a universally utility-maximizing privacy mechanism was recently introduced by Ghosh, Roughgarden, and Sundararajan [STOC 2009]. These are mechanisms that guarantee optimal utility to a large class of information consumers, simultaneously, while preserving Differential Privacy [Dwork, McSherry, Nissim, and Smith, TCC 2006]. Ghosh et al...
Conference Paper
Full-text available
We revisit the problem of constructing efficient secure two-party protocols for set-intersection and set-union, focusing on the model of malicious parties. Our main results are constant-round protocols that exhibit linear communication and a linear number of exponentiations with simulation based security. In the heart of these constructions is a te...
Article
We study the implementation challenge in an abstract interdependent values model and an arbitrary objective function. We design a generic mechanism that allows for approximate optimal implementation of insensitive objective functions in ex-post Nash equilibrium. If, furthermore, values are private then the same mechanism is strategy proof. We cast...
Conference Paper
Full-text available
Learning is a task that generalizes many of the analyses that are applied to collections of data, in particular, to collections of sensitive individual information. Hence, it is natural to ask what can be learned while preserving individual privacy. Kasiviswanathan et al. (in SIAM J. Comput., 40(3):793–826, 2011) initiated such a discussion. They f...
Article
Full-text available
When the founders of this Journal -- Cynthia Dwork, Stephen Fienberg and Alan Karr -- made its initial call for papers, they and we identified many constituencies that participate in the scientific analysis of privacy and confidentiality. Statisticians, particularly those working within national statistical offices, have developed the field of stat...
Article
Full-text available
We revisit the problem of constructing efficient secure two-party protocols for the problems of set-intersection and set-union, focusing on the model of malicious parties. Our main results are constant-round protocols that exhibit linear communication and a (practically) linear number of exponentiations with simulation based security. In the heart...
Conference Paper
Full-text available
A coreset of a point set P is a small weighted set of points that captures some geometric properties of $P$. Coresets have found use in a vast host of geometric settings. We forge a link between coresets, and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private core...
Chapter
We describe output perturbation techniques that allow for a provable, rigorous sense of individual privacy. Examples where the techniques are effective span frombasic statistical computations to sophisticated machine learning algorithms.
Article
Full-text available
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we invest...
Article
Full-text available
Many approximation algorithms have been presented in the last decades for hard search problems. The focus of this paper is on cryptographic applications, where it is desired to design algorithms which do not leak unnecessary information. Specifically, we are in- terested in private approximation algorithms - efficient algorithms whose output does n...
Article
Full-text available
query restriction methods. noise is added to the output query responses. In the query restriction family of methods, the trail of queries is monitored to ensure that it is not possible to combine answers to queries so as to deduce information about any individual. We are concerned with the latter query restriction style of statistical database solu...
Conference Paper
Full-text available
Secure multiparty computation allows a group of distrusting parties to jointly compute a (possibly randomized) function of their inputs. However, it is often the case that the parties executing a computation try to solve a search problem, where one input may have a multitude of correct answers – such as when the parties compute a shortest path in a...
Conference Paper
Full-text available
Private approximation of search problems deals with finding approximate solutions to search problems while disclosing as little information as possible. The focus of this work is on private approximation of the vertex cover problem and two well studied clustering problems – k-center and k-median. Vertex cover was considered in [Beimel, Carmi, Nissi...
Article
Full-text available
We initiate a study of tradeoffs between communication and computation in well-known communication models and in other related models. The fundamental question we investigate is the following: Is there a computational task that exhibits a strong tradeoff behavior between the amount of communication and the amount of time needed for local computatio...
Conference Paper
Full-text available
We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release ag- gregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instance-based additive noise. That is, th...
Article
Full-text available
Secure multiparty computation allows a group of distrusting parties to jointly compute a (possibly randomized) function of their inputs. However, it is often the case that the parties executing a computation try to solve a search problem, where one input may have a multitude of correct answers—such as when the parties compute a shortest path in a g...