Differentially Private Online Learning
ABSTRACT In this paper, we consider the problem of preserving privacy in the online
learning setting. We study the problem in the online convex programming (OCP)
framework---a popular online learning setting with several interesting
theoretical and practical implications---while using differential privacy as
the formal privacy measure. For this problem, we distill two critical
attributes that a private OCP algorithm should have in order to provide
reasonable privacy as well as utility guarantees: 1) linearly decreasing
sensitivity, i.e., as new data points arrive their effect on the learning model
decreases, 2) sub-linear regret bound---regret bound is a popular
goodness/utility measure of an online learning algorithm.
Given an OCP algorithm that satisfies these two conditions, we provide a
general framework to convert the given algorithm into a privacy preserving OCP
algorithm with good (sub-linear) regret. We then illustrate our approach by
converting two popular online learning algorithms into their differentially
private variants while guaranteeing sub-linear regret ($O(\sqrt{T})$). Next, we
consider the special case of online linear regression problems, a practically
important class of online learning problems, for which we generalize an
approach by Dwork et al. to provide a differentially private algorithm with
just $O(\log^{1.5} T)$ regret. Finally, we show that our online learning
framework can be used to provide differentially private algorithms for offline
learning as well. For the offline learning problem, our approach obtains better
error bounds and can handle a larger class of problems than the existing
state-of-the-art methods of Chaudhuri et al.
arXiv:1109.0105v2 [cs.LG] 16 Sep 2011
Prateek Jain
Microsoft Research India
prajain@microsoft.com
Pravesh Kothari
University of Texas at Austin
kothari@cs.utexas.edu
Abhradeep Thakurta∗
Pennsylvania State University
azg161@cse.psu.edu
Abstract
In this paper, we consider the problem of preserving privacy in the online learning setting. Online learning
involves learning from data in real time, so that the learned model and its outputs change continuously.
This makes preserving the privacy of each data point significantly more challenging, as its effect on the
learned model can be easily tracked through changes in the subsequent outputs. Furthermore, with more and more
online systems (e.g., search engines such as Bing and Google) trying to learn their customers' behavior by leveraging
their access to sensitive customer data (through cookies etc.), the problem of privacy preserving online learning has
become critical as well.
We study the problem in the online convex programming (OCP) framework, a popular online learning setting
with several interesting theoretical and practical implications, while using differential privacy as the formal
privacy measure. For this problem, we distill two critical attributes that a private OCP algorithm should have in order
to provide reasonable privacy as well as utility guarantees: 1) linearly decreasing sensitivity, i.e., as new data points
arrive their effect on the learning model decreases, and 2) a sub-linear regret bound (regret is a popular
goodness/utility measure of an online learning algorithm). Given an OCP algorithm that satisfies these two conditions,
we provide a general framework to convert the given algorithm into a privacy preserving OCP algorithm with good
(sub-linear) regret. We then illustrate our approach by converting two popular online learning algorithms into their
differentially private variants while guaranteeing sub-linear regret ($O(\sqrt{T})$). Next, we consider the special case of
online linear regression problems, a practically important class of online learning problems, for which we generalize
an approach by [13] to provide a differentially private algorithm with just $O(\log^{1.5} T)$ regret. Finally, we show
that our online learning framework can be used to provide differentially private algorithms for offline learning as
well. For the offline learning problem, our approach obtains better error bounds and can handle a larger class
of problems than the existing state-of-the-art methods [3].
1 Introduction
As computational resources are increasing rapidly, modern websites and online systems are able to process large
amounts of information gathered from their customers in real time. While typically these websites intend to learn
and improve their systems in real-time using the available data, this also represents a severe threat to the privacy of
customers.
For example, consider a generic scenario for a web search engine like Bing. Sponsored advertisements (ads) served
with search results form a major source of revenue for Bing, for which Bing needs to serve ads that are relevant to the
user and the query. As each user is different and can have a different definition of “relevance”, many websites typically
try to learn the user behavior using past searches as well as other available demographic information. This learning
problem has two key features: a) the advertisements are generated online in response to a query, b) feedback for
goodness of an ad for a user cannot be obtained until the ad is served. Hence, the problem is an online learning game
where the search engine tries to guess (from history and other available information) if a user would like an ad and
gets the cost/reward only after making that online decision; after receiving the feedback the search engine can again
update its model. This problem can be cast as a standard online learning problem and several existing algorithms can
be used to solve it reasonably well.
∗Part of the work was done while visiting Microsoft Research India.
However, processing critical user information in real time also poses severe threats to a user’s privacy. For
example, suppose that Bing, in response to certain past queries (say, about a disease), promotes a particular ad which
otherwise does not appear at the top, and the user clicks that ad. Then the corresponding advertiser would be able to
guess the user’s past queries, thus compromising privacy. Hence, it is critical for the search engine to use an algorithm
which not only provides a correct guess about the relevance of an ad to a user, but also guarantees privacy to the user.
Other examples where privacy preserving online learning is critical include online portfolio management [24] and online
linear prediction [20].
In this paper, we address privacy concerns for online learning scenarios similar to the ones mentioned above.
Specifically, we provide a generic framework for privacy preserving online learning. We use differential privacy [11]
as the formal privacy notion, and use online convex programming (OCP) [36] as the formal online learning model.
Differential privacy is a popular privacy notion with several interesting theoretical properties. Recently, there has
been a lot of progress in differential privacy. However, most of the results assume that all of the data is available
beforehand and an algorithm processes this data to extract interesting information without compromising privacy. In
contrast, in the online setting that we consider in this paper, data arrives online¹ (e.g., user queries and clicks) and the
algorithm has to provide an output (e.g., relevant ads) at each step. Hence, the number of outputs produced is roughly
the same as the size of the entire dataset. Now, to guarantee differential privacy one has to analyze the privacy of the complete
sequence of outputs produced, thereby making privacy preservation a significantly harder problem in this setting. In a
related work, [13] also considered the problem of differentially private online learning. Using the online experts model
as the underlying online learning model, [13] provided an accurate differentially private algorithm to handle counting
type problems. However, the setting and the class of problems handled by [13] are restrictive, and it is not clear how
their techniques can be extended to handle typical online learning scenarios, such as the one mentioned above. See
Section 1.1 for a more detailed comparison to [13].
Online convex programming (OCP), which we use as our underlying online learning model, is an important and
powerful online learning model with several theoretical and practical applications. OCP requires that the algorithm
select an output at each step from a fixed convex set, for which the algorithm incurs a cost according to a convex
function (that may be different at each step). The cost function is revealed only after the point is selected. The
goal is to minimize the regret, i.e., the total “added” loss incurred in comparison to the optimal offline solution, which is the
solution obtained after seeing all the cost functions. OCP encompasses various online learning paradigms and has
several applications such as portfolio management [32]. Now, assuming that each of the cost functions is bounded over
the fixed convex set, the regret incurred by any OCP algorithm can be trivially bounded by O(T), where T is the total
number of time steps for which the algorithm is executed. However, several interesting algorithms have recently been
developed that obtain regret that is sub-linear in T. That is, as T → ∞, the average cost incurred per step approaches the
average cost incurred by the optimal offline solution. In this paper, we use regret as a “goodness” or “utility” property of an
algorithm and require that a reasonable OCP algorithm should at least have sub-linear regret.
To recall, we consider the problem of differentially private OCP, where we want to provide differential privacy
guarantees along with sub-linear regret bound. To this end, we provide a general framework to convert any online
learning algorithm into a differentially private algorithm with sub-linear regret, provided that the algorithm satisfies
two criteria: a) linearly decreasing sensitivity (see Definition 3), b) sub-linear regret. We then analyze two popular
OCP algorithms, namely Implicit Gradient Descent (IGD) [27] and Generalized Infinitesimal Gradient Ascent
(GIGA) [36], to guarantee differential privacy as well as $\tilde{O}(\sqrt{T})$ regret for a fairly general class of strongly convex,
Lipschitz continuous gradient functions. In fact, we show that IGD can be used with our framework even for
non-differentiable functions. We then show that if the cost functions are quadratic functions (e.g., online linear regression),
then we can use another OCP algorithm called Follow The Leader (FTL) [20, 22] along with a generalization of a
technique by [13] to guarantee $O(\ln^{1.5} T)$ regret while preserving privacy.
Furthermore, our differentially private online learning framework can be used to obtain privacy preserving
algorithms for a large class of offline learning problems [3] as well. In particular, we show that our private OCP framework
can be used to obtain good generalization error bounds for various offline learning problems using techniques from
[23] (see Section 4.2). Our differentially private offline learning framework can handle a larger class of learning
problems with better error bounds than the existing state-of-the-art methods [3].
¹ At each time step one data entry arrives.
1.1 Related Work
As more and more of the world’s information is being digitized, privacy has become a critical issue. To this end,
several ad-hoc privacy notions have been proposed; however, most of them stand broken now. De-anonymization of
the Netflix challenge dataset by [31] and of the publicly released AOL search logs [1] are two examples that were
instrumental in discarding these ad-hoc privacy notions. Even relatively sophisticated notions such as k-anonymity
[34] and ℓ-diversity [28] have been broken by attacks [16]. Hence, in pursuit of a theoretically sound
notion of privacy, [11] proposed differential privacy, a cryptography-inspired definition of privacy. This notion has
now been accepted as the standard privacy notion, and in this work we adhere to this notion for our privacy guarantees.
Over the years, the privacy community has developed differentially private algorithms for several interesting
problems [6, 7, 8]. In particular, there exist many results concerning privacy for learning problems [2, 3, 35, 29, 33].
Among these, [3] is of particular interest as it considers a large class of learning problems that can be written as
(offline) convex programs. Interestingly, our techniques can be used to handle the offline setting of [3] as well and, in
fact, our method can handle a larger class of learning problems with better error bounds (see Section 4.2).
As mentioned earlier, most of the existing work in differentially private learning has been in the offline setting,
where the complete dataset is provided upfront. One notable exception is the work of [13], where the authors formally
defined the notion of differentially private learning when the data arrives online. Specifically, [13] defined two notions
of differential privacy, namely user level privacy and event level privacy. Roughly speaking, user level privacy
guarantees are at the granularity of each user whose data is present in the dataset. In contrast, event level privacy provides
guarantees at the granularity of individual records in the dataset. It has been shown in [13] that it is impossible to
obtain any non-trivial result with respect to user level privacy. In our current work we use the notion of event level
privacy. [13] also looked at a particular online learning setting called the experts setting, where their algorithm achieves
a regret bound of $O(\ln^{1.5} T)$ for counting problems while guaranteeing event level differential privacy. However, their
approach is restricted to the experts advice setting, and cannot handle typical online learning problems that arise in
practice. In contrast, we consider a significantly more practical and powerful class of online learning problems, namely,
online convex programming, and also provide a method for handling a large class of offline learning problems.
In a related line of work, there have been a few results that use online learning techniques to obtain differentially
private algorithms [18, 14]. In particular, [18] used experts framework to obtain a differentially private algorithm
for answering adaptive counting queries on a dataset. However, we stress that although these methods use online
learning techniques, they are designed to handle only the offline setting, where the dataset is fixed and known
in advance.
Recall that in the online setting, whenever a new data entry is added to D, a query has to be answered, i.e., the total
number of queries to be answered is of the order of the size of the dataset. In a line of work started by [5] and subsequently
explored in detail by [12, 25], it was shown that if one answers O(T) subset sum queries on a dataset $D \in \{0,1\}^T$
with noise in each query smaller than $\sqrt{T}$, then using those answers alone one can reconstruct a large fraction of
D. That is, when the number of queries is almost the same as the size of the dataset, a reasonably “large” amount of
noise needs to be added to preserve privacy. Subsequently, there has been a lot of work in providing lower bounds
(specific to differential privacy) on the amount of noise needed to guarantee privacy while answering a given number
of queries (see [19, 25, 4]). We note that our generic online learning framework (see Section 3.1) also adds noise of
the order of $T^{0.5+c}$, $c > 0$, at each step, thus respecting the established lower bounds. In contrast, our algorithm for
quadratic loss functions (see Section 3.5) avoids this barrier by exploiting the special structure of the queries that need to
be answered.
1.2 Our Contributions
The following are the main contributions of this paper:
1. We formalize the problem of privacy preserving online learning using differential privacy as the privacy
notion and Online Convex Programming (OCP) as the underlying online learning model. We provide a generic
differentially private framework for OCP in Section 3 and provide privacy and utility (regret) guarantees.
2. We then show that using our generic framework, two popular OCP algorithms, namely Implicit Gradient
Descent (IGD) [27] and Generalized Infinitesimal Gradient Ascent (GIGA) [36], can be easily transformed into
private online learning algorithms with good regret bounds.
3. For a special class of OCP where cost functions are quadratic functions only, we show that we can improve
the regret bound to $O(\ln^{1.5} T)$ by exploiting techniques from [13]. This special class includes a very important
online learning problem, namely, online linear regression.
4. In Section 4.2 we show that our differentially private framework for online learning can be used to solve a large
class of offline learning problems as well (where the complete dataset is available at once) and provide tighter
utility guarantees than the existing state-of-the-art results [3].
5. Finally, through empirical experiments on benchmark datasets, we demonstrate the practicality of our algorithms
for two important problems, online linear regression and online logistic regression (see Section 5).
2 Preliminaries
2.1 Online Convex Programming
Online convex programming (OCP) is one of the most popular and powerful paradigms in the online learning setting.
OCP can be thought of as a game between a player and an adversary. At each step $t$, the player selects a point $x_t \in \mathbb{R}^d$
from a convex set $C$. Then, the adversary selects a convex cost function $f_t : \mathbb{R}^d \to \mathbb{R}$ and the player has to pay a cost
of $f_t(x_t)$. Hence, an OCP algorithm $A$ maps a function sequence $F = \langle f_1, f_2, \ldots, f_T \rangle$ to a sequence of points
$X = \langle x_1, x_2, \ldots, x_T \rangle \in C^T$, i.e., $A(F) = X$. Now, the goal of the player (or the algorithm) is to minimize the
total cost incurred over a fixed number (say $T$) of iterations. However, as the adversary selects the function $f_t$ after observing
the player’s move $x_t$, it can make the total cost incurred by the player arbitrarily large. Hence, a more realistic goal for
the player is to minimize regret, i.e., the total cost incurred when compared to the optimal offline solution $x^*$ selected
in hindsight, i.e., when all the functions have already been provided. Formally,
Definition 1 (Regret). Let $A$ be an online convex programming algorithm. Also, let $A$ select a point $x_t \in C$ at the
$t$-th iteration and let $f_t : \mathbb{R}^d \to \mathbb{R}$ be a convex cost function served at the $t$-th iteration. Then, the regret $R_A$ of $A$ over $T$
iterations is given by:
$$R_A(T) = \sum_{t=1}^{T} f_t(x_t) - \min_{x^* \in C} \sum_{t=1}^{T} f_t(x^*).$$
Assuming $f_t$ to be a bounded function over $C$, any trivial algorithm $A$ that selects a random point $x_t \in C$ will have
$O(T)$ regret. However, several results [27, 36] show that if each $f_t$ is a bounded Lipschitz function over $C$, $O(\sqrt{T})$
regret can be achieved. Furthermore, if each $f_t$ is a “strongly” convex function, $O(\ln T)$ regret can be achieved
[27, 22].
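As an illustration of Definition 1, the following minimal sketch plays the OCP game in one dimension and evaluates the regret. Everything specific here is an assumption made for the example and is not taken from the paper: the set $C = [-1, 1]$, the squared losses $f_t(x) = (x - y_t)^2$, the alternating target sequence, and the projected gradient step with step size $1/\sqrt{t}$ (a generic update, not the exact IGD or GIGA specification). The function names are hypothetical.

```python
import math

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval C = [lo, hi]."""
    return max(lo, min(hi, x))

def run_ogd(targets, lo=-1.0, hi=1.0):
    """Play x_t, then observe f_t(x) = (x - y_t)^2 and take a gradient step."""
    x, plays = 0.0, []
    for t, y in enumerate(targets, start=1):
        plays.append(x)                 # commit to x_t before f_t is revealed
        grad = 2.0 * (x - y)            # gradient of f_t at x_t
        x = project(x - grad / math.sqrt(t), lo, hi)  # assumed step size 1/sqrt(t)
    return plays

def regret(plays, targets, lo=-1.0, hi=1.0):
    """R_A(T) = sum_t f_t(x_t) - min_{x in C} sum_t f_t(x)."""
    online = sum((x - y) ** 2 for x, y in zip(plays, targets))
    # For one-dimensional squared loss, the offline minimiser over C is the
    # projection of the mean of the targets onto C.
    x_star = project(sum(targets) / len(targets), lo, hi)
    offline = sum((x_star - y) ** 2 for y in targets)
    return online - offline

# Hypothetical cost sequence: targets alternate between +1 and -1.
targets = [1.0 if t % 2 == 0 else -1.0 for t in range(1000)]
plays = run_ogd(targets)
print(regret(plays, targets) / len(targets))  # average regret per step
```

Since each loss is at most 4 on $C$ in this toy setting, the regret can never exceed $4T$, which is the trivial $O(T)$ bound mentioned above.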
2.2 Differential Privacy
We now formally define the notion of differential privacy in the context of our problem.
Definition 2 ($(\epsilon,\delta)$-differential privacy [11, 9]). Let $F = \langle f_1, f_2, \ldots, f_T \rangle$ be a sequence of convex functions. Let
$A(F) = X$, where $X = \langle x_1, x_2, \ldots, x_T \rangle \in C^T$ are the $T$ outputs of OCP algorithm $A$ when applied to $F$. Then, a
randomized OCP algorithm $A$ is $(\epsilon,\delta)$-differentially private if, given any two function sequences $F$ and $F'$ that differ
in at most one function entry, for all $S \subset C^T$ the following holds:
$$\Pr[A(F) \in S] \le e^{\epsilon} \Pr[A(F') \in S] + \delta$$
Intuitively, the above definition means that changing an $f_\tau \in F$, $\tau \le T$, to some other function $f'_\tau$ will not modify
the output sequence $X$ by a large amount. If we consider each $f_\tau$ to be some information associated with an individual,
then the above definition states that the presence or absence of that individual’s entry in the dataset will not affect the
output by too much. Hence, the output of the algorithm $A$ will not reveal any extra information about the individual.
The privacy parameters $(\epsilon,\delta)$ decide the extent to which an individual’s entry affects the output; lower values of $\epsilon$ and $\delta$
mean a higher level of privacy. Typically, $\delta$ should be exponentially small in the problem parameters, i.e., in our case
$\delta \approx \exp(-T)$.
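To make the inequality in Definition 2 concrete, the sketch below checks it for a toy mechanism with finitely many outputs. This is an assumption made purely for illustration: in the paper the outputs lie in $C^T$, whereas here each output distribution is a small dictionary of probabilities. The helper names (`min_delta`, `is_dp`, `randomized_response`) are hypothetical. For finite output spaces, the smallest admissible $\delta$ at a given $\epsilon$ is obtained by taking $S$ to be the set of outcomes where $\Pr[A(F) = o]$ exceeds $e^{\epsilon}\Pr[A(F') = o]$.

```python
import math

def min_delta(p, q, eps):
    """Smallest delta with P(S) <= e^eps * Q(S) + delta for every subset S.

    p and q map each outcome to its probability under F and F'.
    The worst-case S collects the outcomes where p exceeds e^eps * q.
    """
    outcomes = set(p) | set(q)
    return sum(max(0.0, p.get(o, 0.0) - math.exp(eps) * q.get(o, 0.0))
               for o in outcomes)

def is_dp(p, q, eps, delta, tol=1e-12):
    """Check the inequality in both directions (F vs F' and F' vs F)."""
    return (min_delta(p, q, eps) <= delta + tol and
            min_delta(q, p, eps) <= delta + tol)

def randomized_response(bit, flip_prob):
    """Output distribution of a bit that is flipped with probability flip_prob."""
    return {bit: 1.0 - flip_prob, 1 - bit: flip_prob}

# Two neighbouring inputs: the private bit is 0 under F and 1 under F'.
p = randomized_response(0, 0.25)
q = randomized_response(1, 0.25)
print(is_dp(p, q, math.log(3), 0.0))  # eps = ln((1 - 0.25) / 0.25)
print(min_delta(p, q, 0.5))           # delta needed at a smaller eps
```

In this toy case a smaller $\epsilon$ can only be met by allowing a larger $\delta$, which matches the reading of $(\epsilon,\delta)$ as a privacy budget.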
2.3 Notation
$F = \langle f_1, f_2, \ldots, f_T \rangle$ denotes the function sequence given to an OCP algorithm $A$, and $A(F) = X$ s.t.
$X = \langle x_1, x_2, \ldots, x_T \rangle \in C^T$ represents the output sequence when $A$ is applied to $F$. We denote the subsequence of functions
of $F$ up to the $t$-th step as $F_t = \langle f_1, \ldots, f_t \rangle$. $d$ denotes the dimensionality of the ambient space of the convex set $C$. Vectors
are denoted by bold-face symbols, matrices are represented by capital letters. $x^T y$ denotes the inner product of $x$ and
$y$. $\|M\|_2$ denotes the spectral norm of matrix $M$; recall that for symmetric matrices $M$, $\|M\|_2$ is the largest eigenvalue
of $M$.
Typically, $\alpha$ is the minimum strong convexity parameter of any $f_t \in F$. Similarly, $L$ and $L_G$ are the largest
Lipschitz constant and the largest Lipschitz constant of the gradient of any $f_t \in F$. Recall that a function $f : C \to \mathbb{R}$ is
$\alpha$-strongly convex if for all $\gamma \in (0,1)$ and for all $x, y \in C$ the following holds:
$$f(\gamma x + (1-\gamma) y) \le \gamma f(x) + (1-\gamma) f(y) - \frac{\alpha}{2} \|x - y\|_2^2.$$
Also recall that a function $f$ is $L$-Lipschitz if for all $x, y \in C$ the following holds:
$|f(x) - f(y)| \le L \|x - y\|_2$. A function $f$ has Lipschitz continuous gradient if $\|\nabla f(x) - \nabla f(y)\|_2 \le L_G \|x - y\|_2$
for all $x, y \in C$. The non-private and private versions of an OCP algorithm output $x_{t+1}$ and $\hat{x}_{t+1}$ respectively, at time
step $t$. $x^*$ denotes the optimal offline solution, that is, $x^* = \arg\min_{x \in C} \sum_{t=1}^{T} f_t(x)$. $R_A(T)$ denotes the regret of an
OCP algorithm $A$ when applied for $T$ steps.
3 Differentially Private Online Convex Programming
In Section 2.1, we defined the online convex programming (OCP) problem and presented a notion of utility (called
regret) for OCP algorithms. Recall that a reasonable OCP algorithm should have sub-linear regret, i.e., the regret should be
sub-linear in the number of time steps $T$.
In this section, we present a generic differentially private framework for solving OCP problems (see Algorithm
1). We further provide formal privacy and utility guarantees for this framework (see Theorems 1 and 2). We then use
our private OCP framework to convert two existing OCP algorithms, namely, Implicit Gradient Descent (IGD) [27]
and Generalized Infinitesimal Gradient Ascent (GIGA) [36], into differentially private algorithms using a “generic”
transformation. For both of the algorithms mentioned above, we guarantee $(3\epsilon, 2\delta)$-differential privacy with sub-linear
regret.
Recall that a differentially private OCP algorithm should not produce a significantly different output for a function
sequence $F'_t$ (with high probability) when compared to $F_t$, where $F_t$ and $F'_t$ differ in exactly one function. Hence, to
show differential privacy for an OCP algorithm, we first need to show that it is not very “sensitive” to previous cost
functions. To this end, below we formally define the sensitivity of an OCP algorithm $A$.
Definition 3 ($L_2$-sensitivity [11, 3]). Let $F, F'$ be two function sequences differing in at most one entry, i.e., at most
one function can be different. Then, the sensitivity of an algorithm $A : F \to C^T$ is the difference in the $t$-th output
$x_{t+1} = A(F)_t$ of the algorithm $A$, i.e.,
$$S(A,t) = \sup_{F,F'} \|A(F)_t - A(F')_t\|_2.$$
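To give a feel for the sensitivity in Definition 3, the sketch below compares the outputs of a deterministic algorithm on two neighbouring sequences. The setting is a hypothetical toy and not the paper's analysis: Follow The Leader on one-dimensional squared losses $f_t(x) = (x - y_t)^2$ with targets $y_t \in [-1, 1]$, for which the leader after $t$ rounds is simply the running mean. The function names are illustrative. Note that the code evaluates a single pair $(F, F')$, so in general it only lower-bounds the supremum; in this toy case the chosen pair attains it, since two targets differ by at most 2.

```python
def ftl_outputs(targets):
    """Follow The Leader for f_t(x) = (x - y_t)^2 on C = [-1, 1].

    The leader after t rounds is the mean of y_1..y_t, which already lies in C
    when every y_t does, so no projection is needed.
    """
    outputs, total = [], 0.0
    for t, y in enumerate(targets, start=1):
        total += y
        outputs.append(total / t)   # this is x_{t+1} = A(F)_t
    return outputs

def output_gaps(targets, targets_alt):
    """|A(F)_t - A(F')_t| for each t, for one neighbouring pair (F, F')."""
    a, b = ftl_outputs(targets), ftl_outputs(targets_alt)
    return [abs(u - v) for u, v in zip(a, b)]

# F and F' differ only in the first function (y_1 = 1 versus y_1 = -1).
gaps = output_gaps([1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0])
print(gaps)
```

Here the gap at step $t$ equals $2/t$, so the influence of the changed entry shrinks as more data arrives, which is the kind of decay the "linearly decreasing sensitivity" condition asks for.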
As mentioned earlier, another natural requirement for an OCP algorithm is that it should have a provably low
regret bound. There exist a variety of methods in the literature which satisfy this requirement to different degrees,
depending on the class of the functions $f_t$.
Under the above two assumptions on the OCP algorithm A, we provide a general framework for adapting the
given OCP algorithm (A) into a differentially private algorithm. Formally, the given OCP algorithm A should satisfy
the following two conditions: