
# Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data


## Abstract and Figures

Big data is one of the cornerstones of enabling and training deep neural networks (DNNs). Because they lack expertise, average users who want to benefit from their data have to rely on, and upload their private data to, big data companies they may not trust. Due to compliance, legal, or privacy constraints, most users are willing to contribute only their encrypted data, and lack the interest or resources to join the training of DNNs in the cloud. To train a DNN on encrypted data in a completely non-interactive way, a recent work proposes a fully homomorphic encryption (FHE)-based technique that implements all activations in the neural network with Brakerski-Gentry-Vaikuntanathan (BGV)-based lookup tables. However, such inefficient lookup-table-based activations significantly prolong the training latency of privacy-preserving DNNs. In this paper, we propose Glyph, an FHE-based scheme that trains DNNs on encrypted data quickly and accurately by switching between the TFHE (Fast Fully Homomorphic Encryption over the Torus) and BGV cryptosystems. Glyph uses the logic-operation-friendly TFHE to implement nonlinear activations, and adopts the vectorial-arithmetic-friendly BGV to perform multiply-accumulate (MAC) operations. Glyph further applies transfer learning to the training of DNNs to improve the test accuracy and reduce the number of ciphertext-ciphertext MAC operations in convolutional layers. Our experimental results show that Glyph obtains state-of-the-art test accuracy while reducing the training latency by 99% over the prior FHE-based technique on various encrypted datasets.
Qian Lou
louqian@iu.edu
Bo Feng
fengbo@iu.edu
Geoffrey C. Fox
gcf@indiana.edu
Lei Jiang
jiang60@iu.edu
Indiana University Bloomington
1. Introduction
Deep learning is one of the most dominant approaches to solving a wide variety of problems such as recommender systems, computer vision and natural language processing [12], because it has demonstrated state-of-the-art accuracy. Given sufficient labeled data, the weights of a deep learning model can be trained to achieve high accuracy. Average users typically lack the knowledge and expertise to build their own deep learning models and benefit from their own data, so they have to depend on big data companies such as Google, Amazon and Microsoft. However, due to compliance, legal, and privacy constraints, there are many scenarios where the data required for DNN training is extremely sensitive. It is risky to provide personal information, e.g., financial or healthcare records, to untrusted companies to train deep learning models. Privacy regulations such as the EU General Data Protection Regulation also restrict the availability and sharing of such sensitive data.
Recent works [22,1,15] propose several cryptographic techniques to enable the privacy-preserving training of deep learning models. Private federated learning [15] decentralizes the training of deep learning models and enables users to train with their own data locally. QUOTIENT [1] takes advantage of multi-party computation (MPC) to interactively train deep learning models on both the server and the user. Both federated learning and MPC require users to stay online and participate heavily in the DNN training. However, in some cases, average users may not have strong interest, powerful hardware, or fast network connections for interactive DNN training [17]. To enable the private training of DNNs on encrypted data in a completely non-interactive way, a recent study presents the first fully homomorphic encryption (FHE)-based stochastic gradient descent technique [22], FHESGD. In FHESGD, a user first encrypts the private data and uploads the encrypted data to an untrusted server that performs both forward and backward propagation on the encrypted data without decryption. After uploading the encrypted data, the user can simply go offline. User privacy is preserved during training, since the input and output data, activations, losses and gradients are all encrypted.
However, FHESGD [22] is seriously limited by its long training latency, which is caused by its BGV-lookup-table-based sigmoid activations. Specifically, FHESGD builds a Multi-Layer Perceptron (MLP) with 3 layers to achieve <98% test accuracy on the encrypted MNIST dataset after 50 training epochs. A mini-batch of 60 samples takes 2 hours on a 16-core CPU. FHESGD uses the BGV cryptosystem [14] to implement the stochastic gradient descent algorithm, because BGV is good at performing the large vectorial arithmetic operations frequently used in the MLP. However, FHESGD replaces all activations of the MLP with sigmoid functions, and uses BGV homomorphic table lookup operations [2] to implement the sigmoid function. A BGV homomorphic table lookup operation in the FHESGD setting is so slow that the BGV-lookup-table-based sigmoid activations consume 98% of the training time.
arXiv:1911.07101v1 [cs.LG] 16 Nov 2019
In this paper, we propose an FHE-based DNN training scheme called Glyph to enable fast and accurate training over encrypted data. Our contributions can be summarized as follows:
- We propose an FHE-based DNN training scheme, Glyph, that adopts the logic-operation-friendly TFHE cryptosystem [8] to implement activations such as ReLU and softmax in DNN training. The TFHE-based activations significantly reduce the activation latency.
- We propose a cryptosystem switching technique that enables Glyph to perform activations with TFHE and switch to the vectorial-arithmetic-friendly BGV cryptosystem when processing fully-connected and convolutional layers. By switching between TFHE and BGV, Glyph can substantially improve the privacy-preserving DNN training speed on encrypted data.
- Finally, we apply transfer learning to Glyph to not only reduce the computing overhead of DNN training, but also improve its test accuracy. Compared to FHESGD, Glyph reduces the training latency by 99% and improves the test accuracy by 12% on various encrypted datasets.
2. Background
2.1. Threat Model
Although an encryption scheme protects the data sent to external servers, untrusted servers [12] can still leak data. Homomorphic encryption is one of the most promising techniques to enable a server to perform private DNN training [22] on encrypted data. A user sends encrypted data to a server, which performs private DNN training on the encrypted data. More importantly, after uploading the encrypted data to the server, the user can go offline without participating in the time-consuming DNN training.
2.2. Fully Homomorphic Encryption
A cryptosystem that supports computation on ciphertexts without decryption is known as homomorphic encryption (HE) [9]. An HE cryptosystem encrypts the plaintext $p$ to the ciphertext $c$ by a function $\epsilon$. We have $c = \epsilon(p, k_{pub})$, where $k_{pub}$ is the public key. Another function $\sigma$ decrypts the ciphertext $c$ back to the plaintext $p$. We have $p = \sigma(c, k_{pri})$, where $k_{pri}$ is the private key. The cryptosystem is homomorphic in an operation $\star$ if there is another operation $\circ$ such that $\epsilon(x, k_{pub}) \circ \epsilon(y, k_{pub}) = \epsilon(x \star y, k_{pub})$, where $x$ and $y$ are two plaintext operands. Modern HE cryptosystems have two modes: leveled HE and fully HE (FHE). Each HE operation introduces noise into the ciphertext. Leveled HE can evaluate HE functions only up to a maximal degree that is fixed by the chosen parameter set. Beyond its maximal degree, leveled HE cannot correctly decrypt the ciphertext, because the accumulated noise is too large. In contrast, FHE can perform an unlimited number of HE operations on the ciphertext, since it uses bootstrapping [14,8] to "refresh" the ciphertext and reduce its noise. However, a bootstrapping operation is computationally expensive and hence time-consuming. Because privacy-preserving DNN training requires an impractically large maximal degree, it is impossible to train a DNN with leveled HE cryptosystems. A recent work [22] demonstrates the feasibility of using the FHE BGV cryptosystem to implement DNN training on encrypted data.
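
The homomorphic property defined above can be illustrated with a toy example. The sketch below uses unpadded textbook RSA, which is an assumption made purely for illustration: it is not one of the cryptosystems used in this paper, it is not secure, and it is homomorphic only in multiplication (not fully homomorphic). The tiny key and all function names are hypothetical.

```python
# Toy illustration of enc(x) o enc(y) = enc(x * y) with textbook RSA.
# NOT secure and NOT an FHE scheme; it only demonstrates the definition.

P, Q = 61, 53                 # tiny demo primes (assumption)
N = P * Q                     # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                        # public exponent
D = pow(E, -1, PHI)           # private exponent (needs Python 3.8+)

def encrypt(p):
    """c = enc(p, k_pub)"""
    return pow(p, E, N)

def decrypt(c):
    """p = dec(c, k_pri)"""
    return pow(c, D, N)

def homo_mul(c1, c2):
    """Ciphertext-side operation matching plaintext multiplication mod N."""
    return (c1 * c2) % N

x, y = 7, 9
product_ct = homo_mul(encrypt(x), encrypt(y))
recovered = decrypt(product_ct)   # equals (x * y) % N
```

The server-side operation `homo_mul` never sees `x` or `y`, yet the decrypted result equals their product modulo `N`. FHE schemes such as BGV and TFHE extend this idea to both addition and multiplication, with bootstrapping to control noise.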
2.3. BGV and TFHE
Based on the Ring-LWE (Learning With Errors) problem, several FHE cryptosystems [8,14], e.g., TFHE [8], BFV [6], BGV [14], HEAAN [7], have been developed. Each FHE cryptosystem processes a specific type of homomorphic operation more efficiently. For instance, TFHE [8] runs combinatorial operations on individual slots faster. BFV [6] is good at performing large vectorial arithmetic operations. Similar to BFV, BGV [14] manipulates elements in large cyclotomic rings, modulo integers with many hundreds of bits. However, BGV has fewer scaling operations, and thus processes vectorial multiplications of ciphertexts faster [18,10]. Finally, HEAAN [7] supports floating-point computations more efficiently. A recent work [3] demonstrates the feasibility of combining and switching between three Ring-LWE-based FHE cryptosystems, TFHE, BFV and HEAAN, via homomorphic operations.
2.4. Forward and Backward Propagation
The training of a DNN includes forward propagation and backward propagation. During forward propagation, as Figure 1 shows, the input data go through the layers consecutively in the forward direction. The forward propagation can be described as

$$u_l = W_l d_{l-1} + b_{l-1}, \qquad d_l = f(u_l) \tag{1}$$

where $u_l$ is the neuron tensor of layer $l$; $d_{l-1}$ is the output of layer $l-1$ and the input of layer $l$; $W_l$ is the weight tensor of layer $l$; $b_{l-1}$ is the bias tensor of layer $l-1$; and $f()$ is the forward activation function. We use $y$ and $t$ to indicate the output of a neural network and the standard label, respectively. An $L_2$ norm loss function is defined as $E(W, b) = \frac{1}{2}\|y - t\|_2^2$. The backward propagation can be described by

$$\delta_{l-1} = (W_l)^T \delta_l \circ f'(u_l), \qquad \nabla W_l = d_{l-1} (\delta_l)^T, \qquad \nabla b_l = \delta_l \tag{2}$$

where $\delta_l$ is the error of layer $l$, defined as $\partial E / \partial b_l$; $f'()$ is the backward activation function; and $\nabla W_l$ and $\nabla b_l$ are the weight and bias gradients, respectively.

Figure 1: The forward and backward propagation.
Figure 2: The mini-batch training latency and test accuracy of a 3-layer FHESGD-based MLP on MNIST. FC is fully-connected layers. Act is activation layers.
Figure 3: The mini-batch training latency of the 3-layer TFHE-based MLP on MNIST.

Table 1: The latency comparison of FHE operations. MultCC: ciphertext × ciphertext. MultCP: ciphertext × plaintext. AddCC: ciphertext + ciphertext. TLU: the table lookup.

| Operation | BFV (s) | BGV (s) | TFHE (s) |
|-----------|---------|---------|----------|
| MultCC    | 0.043   | 0.012   | 2.121    |
| MultCP    | 0.006   | 0.001   | 0.092    |
| AddCC     | 0.0001  | 0.002   | 0.312    |
| TLU       | /       | 307.9   | 3.328    |
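
As a plaintext illustration of Equations (1) and (2), the following sketch runs one fully-connected output layer forward and backward and checks one weight gradient numerically. It assumes ReLU as the activation $f$, stores the weight gradient in the same shape as $W_l$ (the transpose of the outer product written in Equation (2)), and uses hypothetical helper names and toy values. It operates on unencrypted numbers and is not the paper's implementation.

```python
# Plaintext sketch of Equations (1) and (2) for a single output layer.
# Assumptions: f is ReLU; all names and values here are illustrative.

def forward(W, b, d_prev):
    """Eq. (1): u = W d_prev + b, d = f(u)."""
    u = [sum(w * x for w, x in zip(row, d_prev)) + bi for row, bi in zip(W, b)]
    d = [max(v, 0.0) for v in u]
    return u, d

def loss(d, t):
    """L2 loss E = 1/2 * ||d - t||^2."""
    return 0.5 * sum((di - ti) ** 2 for di, ti in zip(d, t))

def backward(u, d, t, d_prev):
    """Eq. (2) at the output layer: delta = dE/db, grad_W = delta * d_prev^T."""
    delta = [(di - ti) * (1.0 if ui >= 0 else 0.0) for di, ti, ui in zip(d, t, u)]
    grad_W = [[dl * x for x in d_prev] for dl in delta]
    grad_b = list(delta)
    return delta, grad_W, grad_b

W = [[0.5, -0.2], [0.1, 0.4]]
b = [0.1, -0.3]
d_prev = [1.0, 2.0]
t = [0.0, 1.0]

u, d = forward(W, b, d_prev)
delta, grad_W, grad_b = backward(u, d, t, d_prev)

# Finite-difference check of the gradient entry for W[0][1].
eps = 1e-6
W_shift = [row[:] for row in W]
W_shift[0][1] += eps
numeric = (loss(forward(W_shift, b, d_prev)[1], t) - loss(d, t)) / eps
```

Every step uses only multiplications and additions except the activation and its derivative, which is why the activation is the part that needs special treatment under homomorphic encryption.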
2.5. Motivation
The BGV-based FHESGD [22] trains a 3-layer MLP by substituting all activations with sigmoid, and implements sigmoid with a lookup table. However, the lookup-table-based sigmoid significantly increases the mini-batch training latency of FHESGD. As Figure 2 shows, as the bitwidth of each entry of the BGV-based sigmoid lookup table increases, the test accuracy of FHESGD greatly improves and approaches 98%, but its activation processing time, i.e., the sigmoid table lookup latency, also significantly increases and eventually occupies >98% of the mini-batch training latency.
It is possible to implement the homomorphic activations of private training, including ReLU and softmax, quickly and accurately with TFHE, since the TFHE cryptosystem processes combinatorial operations on individual slots more efficiently. Table 1 compares the latencies of various homomorphic operations implemented by BGV, BFV and TFHE. Compared to BGV, TFHE shortens the table lookup latency by about 100×, and thus can implement fast activation functions. However, after we implemented privacy-preserving DNN training entirely in TFHE, as Figure 3 exhibits, we found that although the homomorphic activations take much less time, the mini-batch training latency substantially increases, because of the slow homomorphic MAC operations implemented by TFHE. Compared to TFHE, BGV [14] demonstrates 17× to 30× shorter latencies for a variety of vectorial arithmetic operations such as the multiplication between a ciphertext and a ciphertext (MultCC), the multiplication between a ciphertext and a plaintext (MultCP), and the addition between a ciphertext and a ciphertext (AddCC). If we implement activation operations with TFHE and compute vectorial MAC operations with BGV, privacy-preserving DNN training can obtain both high test accuracy and short training latency.
Although a recent work [3] proposes a cryptosystem switching technique to homomorphically switch between TFHE and BFV, we argue that, compared to BFV, the BGV cryptosystem can implement faster privacy-preserving DNN training. First, as Table 1 shows, BGV computes multiplications between ciphertexts and plaintexts faster than BFV, because it has fewer scaling operations [18,10]. Second, the state-of-the-art implementation of BFV, Microsoft SEAL [24], does not support bootstrapping. Therefore, we cannot adopt BFV for FHE-based privacy-preserving DNN training. In this paper, we propose a new cryptosystem switching technique to enable homomorphic switching between BGV and TFHE.
3. Related Work
A flurry of prior works use various FHE cryptosystems, including TFHE [4] and BGV [16], to implement leveled-HE-enabled privacy-preserving DNN inference, where only the input data, output data and activations are encrypted but the pre-trained weights are unencrypted. Compared to inference, fully-HE-enabled privacy-preserving DNN training is more computationally expensive, since it needs to compute the errors and gradients. Moreover, the inputs, activations and weights are all encrypted during privacy-preserving DNN training. The first fully-HE-enabled DNN training technique [22] relies on the vectorial-arithmetic-friendly BGV cryptosystem.
Besides fully HE, recent works adopt multi-party computation [1] and private federated learning [15] to enable the privacy-preserving training of DNNs. Both schemes heavily involve users in the hardware-resource-demanding privacy-preserving DNN training. However, average users may not have strong motivation or powerful computing hardware to join the privacy-preserving DNN training. In this paper, we propose a fully-HE-based privacy-preserving DNN training technique that requires users only to upload their encrypted data.
Algorithm 1 The TFHE-based forward ReLU
Input: u_l[i][0 : n-1] (u_l[i][z] is the zth bit of u_l[i])
Output: d_l[i][0 : n-1]
1: d_l[i][n-1] = 0
2: u_l[i][n-1] = HomoNot(u_l[i][n-1])
3: for index = 1; index < n-1; index++ do
4:     d_l[i][index] = HomoAND(u_l[i][index], u_l[i][n-1])
return d_l[i][0 : n-1]
4. Glyph
4.1. TFHE-based Activations
To accurately train an FHE-based DNN, we propose TFHE-based homomorphic ReLU and softmax activation units. We construct a ReLU unit from TFHE homomorphic gates with bootstrapping, and build a softmax unit from TFHE homomorphic multiplexers.
ReLU: The forward ReLU of the $i$th neuron in layer $l$ can be summarized as

$$\mathrm{ReLU}(\mu_l[i]) = d_l[i] = \begin{cases} \mu_l[i] & \text{if } \mu_l[i] \ge 0, \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

where $\mu_l[i]$ is the $i$th neuron in layer $l$. The backward iReLU for the $i$th neuron in layer $l$ can be described as

$$\mathrm{iReLU}(\mu_l[i], \delta_l[i]) = \delta_{l-1}[i] = \begin{cases} \delta_l[i] & \text{if } \mu_l[i] \ge 0, \\ 0 & \text{otherwise,} \end{cases} \tag{4}$$

where $\delta_l[i]$ is the $i$th error of layer $l$. Our TFHE-based forward ReLU unit can be implemented as Algorithm 1, where we first set the most significant bit of $d_l[i]$, $d_l[i][n-1]$, to 0, so that $d_l[i]$ is always non-negative. We then negate the most significant bit of $u_l[i]$, $u_l[i][n-1]$, with a TFHE homomorphic NOT gate, which does not even require bootstrapping [9]. After the negation, if $u_l[i]$ is positive, $u_l[i][n-1] = 1$; otherwise $u_l[i][n-1] = 0$. Finally, we compute $d_l[i][0 : n-2]$ by homomorphically ANDing each bit of $u_l[i]$ with $u_l[i][n-1]$. So if $u_l[i]$ is positive, $d_l[i] = \mu_l[i]$; otherwise $d_l[i] = 0$. An $n$-bit forward ReLU unit requires 1 TFHE NOT gate without bootstrapping and $n-2$ TFHE AND gates with bootstrapping.
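
The bit-level logic of the forward ReLU can be checked on plaintext bits. The sketch below is a minimal simulation under stated assumptions: `homo_not` and `homo_and` are hypothetical plaintext stand-ins for the TFHE gates (no encryption is performed), values are n-bit two's-complement numbers stored least significant bit first, and the loop starts at bit 0 so that all of `d[0 : n-2]` is covered as the prose describes.

```python
# Plaintext simulation of the forward ReLU gate logic (no encryption).
# homo_not / homo_and are illustrative stand-ins for TFHE gates.

def to_bits(value, n):
    """Two's-complement bits, least significant first; bit n-1 is the sign."""
    return [(value >> z) & 1 for z in range(n)]

def from_bits(bits):
    """Inverse of to_bits for an n-bit two's-complement number."""
    raw = sum(b << z for z, b in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw

def homo_not(a):
    return 1 - a

def homo_and(a, b):
    return a & b

def relu_forward(u_bits):
    n = len(u_bits)
    d_bits = [0] * n                      # output sign bit stays 0
    keep = homo_not(u_bits[n - 1])        # 1 if the input is non-negative
    for index in range(n - 1):            # assumption: start at bit 0
        d_bits[index] = homo_and(u_bits[index], keep)
    return d_bits

positive = from_bits(relu_forward(to_bits(5, 8)))
negative = from_bits(relu_forward(to_bits(-3, 8)))
```

Only the sign bit decides the outcome, so the unit needs no comparison circuit: one NOT and a row of AND gates suffice.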
Algorithm 2 The TFHE-based backward ReLU
Input: δ_l[i][0 : n-1] and u_l[i][n-1]
Output: δ_{l-1}[i][0 : n-1]
1: u_l[i][n-1] = HomoNot(u_l[i][n-1])
2: for index = 0; index < n; index++ do
3:     δ_{l-1}[i][index] = HomoAND(δ_l[i][index], u_l[i][n-1])
return δ_{l-1}[i][0 : n-1]
In contrast, the backward iReLU takes the $i$th error of layer $l$, $\delta_l[i]$, and the most significant bit of $u_l[i]$, $u_l[i][n-1]$, as inputs. It generates the $i$th error of layer $l-1$, $\delta_{l-1}[i]$. Our TFHE-based backward iReLU unit can be built by Algorithm 2, where we first compute the negation of the most significant bit of $u_l[i]$, $u_l[i][n-1]$. We then compute each bit of $\delta_{l-1}[i]$ by ANDing each bit of $\delta_l[i]$ with the negated $u_l[i][n-1]$. If the original sign bit $u_l[i][n-1] = 0$, $\delta_{l-1}[i] = \delta_l[i]$; otherwise $\delta_{l-1}[i] = 0$. An $n$-bit backward iReLU unit requires 1 TFHE NOT gate without bootstrapping and $n-1$ TFHE AND gates with bootstrapping. Our TFHE-based forward or backward ReLU function takes only 0.1 second, while the BGV-lookup-table-based activation consumes 307.9 seconds on our CPU baseline.
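
The backward unit can be simulated on plaintext bits in the same way. This is a sketch under stated assumptions: the gate functions are hypothetical plaintext stand-ins for TFHE gates, errors are n-bit two's-complement numbers stored least significant bit first, and every bit of the error (including its sign bit) is gated, as in the loop of Algorithm 2.

```python
# Plaintext simulation of the backward iReLU gate logic (no encryption).
# Gate helpers are illustrative stand-ins for TFHE gates.

def to_bits(value, n):
    """Two's-complement bits, least significant first; bit n-1 is the sign."""
    return [(value >> z) & 1 for z in range(n)]

def from_bits(bits):
    raw = sum(b << z for z, b in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw

def homo_not(a):
    return 1 - a

def homo_and(a, b):
    return a & b

def relu_backward(delta_bits, u_sign_bit):
    """Pass the error through if the forward input was non-negative."""
    keep = homo_not(u_sign_bit)
    return [homo_and(bit, keep) for bit in delta_bits]

passed = from_bits(relu_backward(to_bits(-7, 8), 0))   # forward input >= 0
blocked = from_bits(relu_backward(to_bits(-7, 8), 1))  # forward input < 0
```

Note that the backward pass needs only the sign bit of the forward input, not the full value, which keeps the data carried between the forward and backward passes small.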
[Figure 4 diagram: a tree of TFHE MUX gates that selects one of eight 3-bit entries S0[0:2] to S7[0:2] using the bits of u_l[i] and outputs d_l[i].]
Figure 4: A 3-bit softmax unit.
Softmax: Softmax takes $n$ values $u_l[i]$ as its input and normalizes them into a probability distribution consisting of $n$ probabilities proportional to the exponentials of the inputs. The softmax activation can be described as

$$\mathrm{softmax}(\mu_l[i]) = d_l[i] = \frac{e^{\mu_l[i]}}{\sum_i e^{\mu_l[i]}} \tag{5}$$

We use TFHE homomorphic multiplexers to implement the softmax unit shown in Figure 4, where a 3-bit TFHE-lookup-table-based softmax unit has 8 entries, denoted $S_0$ to $S_7$, and each entry has 3 bits. The $i$th neuron $u_l[i]$ is used to look up one of the eight entries, and the output is $d_l[i]$. There are two TFHE gates with bootstrapping on the critical path of each TFHE homomorphic multiplexer. An $n$-bit softmax unit requires $2^n$ TFHE gates with bootstrapping. Compared to the BGV-lookup-table-based softmax, our TFHE-based softmax unit reduces the activation latency from 307.9 seconds to only 3.3 seconds. To efficiently back-propagate the loss of softmax, FHESGD [22] uses a quadratic loss function to avoid the computing overhead of the logarithmic operations in a cross-entropy loss function. Our experimental results show that the quadratic loss function achieves the same test accuracy at a much lower computing overhead. So in this paper we also adopt the derivative of the quadratic loss function, described as

$$\mathrm{isoftmax}(d_l[i], t[i]) = \delta_l[i] = d_l[i] - t[i] \tag{6}$$

where $t[i]$ is the $i$th ground truth. The quadratic loss function requires only homomorphic multiplications and additions. Although it is feasible to implement the quadratic loss function with TFHE, considering the switching overhead from BGV to TFHE, we use BGV to implement the quadratic loss function.
Pooling. It is faster to adopt TFHE to implement max
pooling operations. But considering the switching overhead
from BGV to TFHE, we adopt BGV to implement average
pooling operations requiring only homomorphic additions
and multiplications.
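
The multiplexer-based table lookup used by the softmax unit above can be sketched on plaintext values. This is an illustrative simulation only: `homo_mux` is a hypothetical stand-in for a TFHE multiplexer, each table entry is treated as one value rather than as separate encrypted bits, and the table contents are placeholders rather than real softmax outputs.

```python
# Plaintext simulation of a MUX-tree table lookup (no encryption).
# homo_mux is an illustrative stand-in for a TFHE multiplexer gate.

def homo_mux(sel, if_one, if_zero):
    """Return if_one when sel is 1, otherwise if_zero."""
    return if_one if sel else if_zero

def mux_lookup(table, index_bits):
    """Select table[index] with a binary tree of multiplexers.

    table has 2**n entries; index_bits holds n bits, least significant first.
    Returns the selected entry and the number of MUX evaluations used.
    """
    level = list(table)
    mux_count = 0
    for sel in index_bits:
        # Each tree level halves the candidates using one index bit.
        level = [homo_mux(sel, level[j + 1], level[j])
                 for j in range(0, len(level), 2)]
        mux_count += len(level)
    return level[0], mux_count

table = [10, 11, 12, 13, 14, 15, 16, 17]   # placeholder entries S0..S7
index_bits = [1, 0, 1]                      # index 5, least significant first
entry, muxes = mux_lookup(table, index_bits)
```

For a 3-bit index the tree evaluates 4 + 2 + 1 = 7 multiplexers per selected value, so the work grows with the number of table entries rather than with the arithmetic complexity of softmax itself.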
4.2. The Switching between BGV and TFHE
BGV can efficiently process vectorized arithmetic operations, while TFHE runs logic operations faster. During the training of Glyph, we plan to use BGV for convolutional, fully-connected, average pooling, and batch normalization layers, and adopt TFHE for activation operations. To use both BGV and TFHE, we propose a cryptosystem switching technique that switches Glyph between the BGV and TFHE cryptosystems.
Both BGV and TFHE are built upon the Ring-LWE problem [8,14], but they cannot naïvely switch between each other, because BGV and TFHE work on different plaintext spaces. The plaintext space of BGV is the ring $R_p = \mathbb{Z}[X]/(X^N+1) \bmod p^r$, where $p$ is a prime and $r$ is an integer. We denote the BGV plaintext space as $\mathbb{Z}_N[X] \bmod p^r$. TFHE has three plaintext spaces [9]: TLWE, TRLWE and TRGSW. TLWE encodes individual continuous plaintexts over the torus $\mathbb{T} = \mathbb{R}/\mathbb{Z} \bmod 1$. TRLWE encodes continuous plaintexts over $\mathbb{R}[X] \bmod (X^N+1) \bmod 1$. We denote the TRLWE plaintext space as $\mathbb{T}_N[X] \bmod 1$, which can be viewed as the packing of $N$ individual coefficients. TRGSW encodes integer polynomials in $\mathbb{Z}_N[X]$ with bounded norm. Through key-switching, TFHE can switch between these three plaintext spaces. Our cryptosystem switching scheme maps the plaintext spaces of BGV and TFHE to a common algebraic structure using natural algebraic homomorphisms. The cryptosystem switching can then happen in the common algebraic structure.
[Figure 5 diagram: BGV slots, TRLWE slots and TFHE ciphertexts connected by six numbered steps that use Key-Switch and Extract sample operations.]
Figure 5: The switching between TFHE and BGV. Steps (1) to (3) switch from BGV to TFHE; steps (4) to (6) switch from TFHE to BGV.
Our cryptosystem switching technique enables Glyph to use both the TFHE and BGV cryptosystems by homomorphically switching between different plaintext spaces, as shown in Figure 5.
- From BGV to TFHE. The switch from BGV to TFHE homomorphically transforms the ciphertext of $N$ BGV slots encrypting $N$ plaintexts over $\mathbb{Z}_N[X] \bmod p^r$ to $K$ TLWE-mode TFHE ciphertexts, each of which encrypts plaintexts over $\mathbb{T} = \mathbb{R}/\mathbb{Z} \bmod 1$. The switch from BGV to TFHE includes three steps. (1) Based on Lemma 1 in [3], multiplication by $p^{-r}$ is a $\mathbb{Z}_N[X]$-module isomorphism from $R_p = \mathbb{Z}_N[X] \bmod p^r$ to the submodule of $\mathbb{T}_N[X]$ generated by $p^{-r}$. By multiplying by $p^{-r}$, we can convert integer coefficients in the plaintext space of BGV into a subset of the torus $\mathbb{T}$ consisting of multiples of $p^{-r}$. In this way, we extract $N$ coefficients from the BGV plaintexts over $\mathbb{Z}_N[X] \bmod p^r$ to form $\mathbb{T}^N$. (2) Based on Theorem 2 in [3], we use functional key-switching to homomorphically convert $\mathbb{T}^N$ into $\mathbb{T}_N[X]$, which is the plaintext space of the TRLWE mode of TFHE. (3) We adopt the SampleExtract function [3] of TFHE to homomorphically obtain $K$ individual TLWE ciphertexts from $\mathbb{T}_N[X]$. Given a TRLWE ciphertext $c$ of a plaintext $\mu$, SampleExtract($c$) extracts from $c$ the TLWE sample that encrypts the $i$th coefficient $\mu_i$ with at most the same noise variance or amplitude as $c$.
- From TFHE to BGV. The switch from TFHE to BGV homomorphically transforms $K$ TFHE ciphertexts in the TLWE mode, $(m_0, m_1, \ldots, m_{K-1})$ in $\mathbb{T}^K$, to a BGV $N$-slot ciphertext whose plaintexts are over $\mathbb{Z}_N[X] \bmod p^r$. (4) Based on Theorem 3 in [3], we can use the functional gate bootstrapping of TFHE to restrict the plaintext space of TFHE in the TLWE mode to an integer domain $\mathbb{Z}^K_{p^r}$ consisting of multiples of $p^{-r}$. (5) The plaintext space transformation from $\mathbb{Z}^K_{p^r}$ to $\mathbb{Z}^N_{p^r}$ is a $\mathbb{Z}_N[X]$-module isomorphism, so we can also use key-switching to implement it. (6) Finally, we can use the SampleExtract function of TFHE to homomorphically obtain the BGV $N$-slot ciphertext whose plaintexts are over $\mathbb{Z}_N[X] \bmod p^r$.
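
The plaintext-space mapping behind the first switching step can be illustrated on single coefficients. The sketch below is a plaintext-only illustration under stated assumptions: it treats one coefficient at a time, uses a small hypothetical modulus $p^r = 9$, represents torus elements as exact fractions in $[0, 1)$, and assumes the embedding is multiplication by $p^{-r}$. It performs no encryption and no key-switching.

```python
# Plaintext sketch: embedding integers mod p^r into the torus R/Z.
# P, R and all function names are illustrative assumptions.
from fractions import Fraction

P, R = 3, 2
MOD = P ** R                 # p^r = 9

def to_torus(k):
    """Map k in Z mod p^r to the torus element k * p^(-r) mod 1."""
    return Fraction(k % MOD, MOD)

def from_torus(t):
    """Inverse map on multiples of p^(-r): multiply by p^r."""
    return int(t * MOD) % MOD

def torus_add(a, b):
    """Addition on the torus (real numbers modulo 1)."""
    return (a + b) % 1

def torus_scale(c, a):
    """Multiplication of a torus element by an integer scalar."""
    return (c * a) % 1

x, y = 7, 5
sum_on_torus = from_torus(torus_add(to_torus(x), to_torus(y)))
scaled_on_torus = from_torus(torus_scale(4, to_torus(x)))
```

Addition and integer scaling commute with the mapping, which is the module-homomorphism property that lets the two cryptosystems meet in a common algebraic structure.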
4.3. Transfer Learning for Private DNN Training
Although FHESGD [22] shows that it is feasible to homomorphically train a 3-layer MLP, it is still very challenging to homomorphically train a deep convolutional neural network (CNN), because of the huge computing overhead of homomorphic convolutions. We propose to use transfer learning to reduce the computing overhead of homomorphic convolutions in privacy-preserving CNN training. Although several prior works [5,21] adopt transfer learning in privacy-preserving inference, to the best of our knowledge, this is the first work to use transfer learning in privacy-preserving CNN training.
Transfer learning [28,23,13] can be used to reuse knowledge among different datasets in the same CNN architecture, since the first several convolutional layers of the CNN extract general features independent of datasets. Applying transfer learning in privacy-preserving CNN training brings two benefits. First, transfer learning reduces the number of trainable layers, i.e., the weights in the convolutional layers are fixed, so that the training latency can be greatly reduced. Second, we can convert computationally expensive convolutions between ciphertext and ciphertext to cheaper convolutions between ciphertext and plaintext, because the fixed weights in the convolutional layers are not updated by encrypted weight gradients. Moreover, transfer learning does not hurt the security of the FHE-based DNN training, since the input, activations, losses and gradients are still encrypted.
[Figure 6 diagram: the same CNN (CONV1, BN1, ReLU1, Pool1, CONV2, BN2, ReLU2, Pool2, FC1, ReLU3, FC2, Soft) is shown for public data and for private medical data, with a "Transfer" arrow between them and the labels "Frozen & unencrypted weights" and "Train encrypted weights".]
Figure 6: An example of transfer learning in the privacy-preserving CNN training.
We show an example of applying transfer learning in privacy-preserving CNN training in Figure 6. We reuse the first two convolutional layers trained on unencrypted CIFAR-10, discard the last two fully-connected layers, and add two randomly initialized fully-connected layers, when homomorphically training the same CNN architecture on an encrypted skin cancer dataset [26]. During the privacy-preserving training on the skin cancer dataset, we update the weights only in the last two fully-connected layers. In this way, the privacy-preserving model can reuse the general features learned from public unencrypted datasets. Meanwhile, in the privacy-preserving training, the computations in the first several convolutional and batch normalization layers are computationally cheap, since their weights are fixed and unencrypted.
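
The effect of freezing layers on the type of homomorphic multiplication can be sketched as follows. This is a simplified illustration under stated assumptions: the layer names follow Figure 6, the frozen set reflects the example above, the per-operation BGV latencies are the MultCC and MultCP values from Table 1, and the helper names are hypothetical. It does not model operation counts or switching overhead.

```python
# Sketch: which multiplication type each weight layer needs under
# transfer learning. Latencies are the BGV values from Table 1.

MULT_CC_S = 0.012   # ciphertext x ciphertext (encrypted weights)
MULT_CP_S = 0.001   # ciphertext x plaintext (frozen, unencrypted weights)

WEIGHT_LAYERS = ["CONV1", "CONV2", "FC1", "FC2"]
FROZEN = {"CONV1", "CONV2"}        # reused from the public pre-trained model

def mult_kind(layer):
    """Frozen layers multiply encrypted activations by plaintext weights."""
    return "MultCP" if layer in FROZEN else "MultCC"

def per_mult_latency(layer):
    return MULT_CP_S if layer in FROZEN else MULT_CC_S

plan = {layer: mult_kind(layer) for layer in WEIGHT_LAYERS}
speedup_per_mult = MULT_CC_S / MULT_CP_S
```

With the Table 1 figures, each multiplication in a frozen layer is roughly 12 times cheaper than the ciphertext-ciphertext multiplication it replaces, in addition to removing the gradient computation for that layer.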
5. Experimental Methodology
5.1. The Setting of Cryptosystems
For BGV, we used the same parameter setting rule as [11], and the HElib [2] library to implement all related algorithms. We adopted the $m$th cyclotomic ring with $m = 2^{10} - 1$, corresponding to lattices of dimension $\phi(m) = 600$. This native plaintext space has 60 plaintext slots which can pack 60 input ciphertexts. The BGV parameter settings yield a security level of >80 bits. Both BGV and TFHE implement bootstrapping operations and support fully homomorphic encryption. We set the parameters of TFHE to the same security level as BGV, and used the TFHE [9] library to implement all related algorithms. TFHE is a three-level scheme. For the first-level TLWE, we set the minimal noise standard deviation to $\alpha = 6.10 \cdot 10^{-5}$ and the count of coefficients to $n = 280$ to achieve a security level of $\lambda = 80$. The second-level TRLWE sets the minimal noise standard deviation to $\alpha = 3.29 \cdot 10^{-10}$, the count of coefficients to $n = 800$, and the security level to $\lambda = 128$. The third-level TRGSW sets the minimal noise standard deviation to $\alpha = 1.42 \cdot 10^{-10}$, the count of coefficients to $n = 1024$, and the security level to $\lambda = 156$. We adopted the same key-switching and extract-sample parameters of TFHE as [3].
5.2. Simulation, Dataset and Network Architecture
We evaluated all schemes on an Intel Xeon E7-8890 v4 2.2GHz CPU with 256GB DRAM. It has two sockets, each of which has 12 cores and supports 24 threads. Our encrypted datasets include MNIST [20] and Skin-Cancer-MNIST [26]. Skin-Cancer-MNIST consists of 10015 dermatoscopic images and includes a representative collection of 7 important diagnostic categories in the realm of pigmented lesions. We split it into an 8K training dataset and a 2K test dataset. We also used SVHN [25] and CIFAR-10 [19] to pre-train the models used for transfer learning on encrypted datasets. We adopted two network architectures, a 3-layer MLP [22] and a 4-layer CNN shown in Figure 6. The 3-layer MLP has a 28 × 28 input layer, a 128-neuron hidden layer and a 32-neuron hidden layer. The CNN includes two convolutional layers, two batch normalization layers, two pooling layers, three ReLU layers and two fully-connected layers. The CNN architectures are different for MNIST and Skin-Cancer-MNIST. For MNIST, the input size is 28 × 28. There are 6 × 3 × 3 and 16 × 3 × 3 weight kernels, respectively, in the two convolutional layers. The two fully-connected layers have 84 neurons and 10 neurons, respectively. For Skin-Cancer-MNIST, the input size is 28 × 28 × 3. There are 64 × 3 × 3 × 3 and 96 × 64 × 3 × 3 weight kernels in the two convolutional layers, respectively. The two fully-connected layers have 128 neurons and 7 neurons, respectively. We quantized the inputs, weights and activations of the two network architectures to 8 bits using the training quantization technique in SWALP [27].
6. Results and Analysis
6.1. MNIST
FHESGD. The mini-batch training latency breakdown of a 3-layer FHESGD-based MLP [22] on a single CPU core is shown in Table 2. During a mini-batch, the MLP is trained with 60 MNIST images. Each BGV lookup-table operation consumes 307.9 seconds, while a single BGV MAC operation costs only 0.012 seconds. Although the activation layers of FHESGD require only a small number of BGV lookup-table operations, they consume 98% of the total training latency. The FHESGD-based MLP makes all homomorphic multiplications happen between ciphertext and ciphertext, even though homomorphic multiplications between ciphertext and plaintext are computationally cheaper. The total training latency of the 3-layer FHESGD-based MLP for a mini-batch is 118K seconds, which is about 1.35 days [22].
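
The per-layer multiplication counts reported for the fully-connected layers can be related to the MLP's layer sizes. The sketch below is a back-of-the-envelope check under stated assumptions: a 28 × 28 input, hidden layers of 128 and 32 neurons as described in Section 5.2, and a 10-neuron output layer, which is inferred here from the 10 MNIST classes rather than stated explicitly. Function and variable names are hypothetical.

```python
# Sketch: multiplications per fully-connected layer from the layer sizes.
# The 10-neuron output layer is an assumption based on MNIST's 10 classes.

LAYER_SIZES = [28 * 28, 128, 32, 10]

def fc_mult_counts(sizes):
    """One multiplication per weight: inputs x outputs for each FC layer."""
    return [n_in * n_out for n_in, n_out in zip(sizes, sizes[1:])]

counts = fc_mult_counts(LAYER_SIZES)
fc1_mults, fc2_mults, fc3_mults = counts
```

The resulting counts are on the order of 100K, 4K and 320, which is the same scale as the MultCC entries listed for FC1, FC2 and FC3.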
Table 2: The mini-batch training latency of the FHESGD [22]-based MLP. HOP is the number of homomorphic operations. MultCC indicates the number of multiplications between ciphertext and ciphertext. AddCC is the number of additions between ciphertext and ciphertext. TLU means the number of table-lookup operations. FC is a fully-connected layer. Act is an activation layer.

| Layers       | Time (s) | HOP  | MultCC | AddCC | TLU |
|--------------|----------|------|--------|-------|-----|
| FC1-forward  | 1357     | 201K | 100K   | 100K  | 0   |
| Act1-forward | 44.8K    | 128  | 0      | 0     | 128 |
| FC2-forward  | 54.4     | 8.2K | 4196   | 4.2K  | 0   |
| Act2-forward | 11.7K    | 32   | 0      | 0     | 32  |
| FC3-forward  | 4.32     | 640  | 320    | 320   | 0   |
| Act3-forward | 1.98K    | 10   | 0      | 0     | 10  |
| Act3-error   | 0.1      | 10   | 0      | 10    | 0   |
| FC3-error    | 4.32     | 640  | 320    | 320   | 0   |
| FC3-gradient | 4.32     | 640  | 320    | 320   | 0   |
| Act2-error   | 11.7K    | 32   | 0      | 0     | 32  |
| FC2-error    | 55.4     | 8.2K | 4.2K   | 4.2K  | 0   |
| FC2-gradient | 55.4     | 8.2K | 4.2K   | 4.2K  | 0   |
| Act1-error   | 44.8K    | 128  | 0      | 0     | 128 |
| FC1-gradient | 1356     | 201K | 100    | 100K  | 0   |
| Total        | 118K     | 429K | 213K   | 21K   | 330 |
Table 3: The mini-batch training latency of the Glyph-based MLP with TFHE activations and cryptosystem switching. HOP is the number of homomorphic operations. MultCC indicates the number of multiplications between ciphertext and ciphertext. AddCC is the number of additions between ciphertext and ciphertext. Switch means the cryptosystem switching. FC is a fully-connected layer. Act denotes an activation layer.

| Layers       | Time (s) | HOP  | MultCC | AddCC | Act | Switch   |
|--------------|----------|------|--------|-------|-----|----------|
| FC1-forward  | 1370     | 201K | 100K   | 100K  | 0   | BGV-TFHE |
| Act1-forward | 19.2     | 128  | 0      | 0     | 128 | TFHE-BGV |
| FC2-forward  | 57.1     | 8.2K | 4.1K   | 4.1K  | 0   | BGV-TFHE |
| Act2-forward | 4.82     | 32   | 0      | 0     | 32  | TFHE-BGV |
| FC3-forward  | 6.02     | 640  | 320    | 320   | 0   | BGV-TFHE |
| Act3-forward | 34.76    | 10   | 0      | 0     | 10  | TFHE-BGV |
| Act3-error   | 0.1      | 10   | 0      | 0     | 0   | -        |
| FC3-error    | 4.32     | 640  | 320    | 320   | 0   | -        |
| FC3-gradient | 6.02     | 640  | 320    | 320   | 0   | BGV-TFHE |
| Act2-error   | 4.82     | 32   | 0      | 0     | 32  | TFHE-BGV |
| FC2-error    | 55.4     | 8.2K | 4.1K   | 4.1K  | 0   | -        |
| FC2-gradient | 62.1     | 8.2K | 4.1K   | 4.1K  | 0   | BGV-TFHE |
| Act1-error   | 19.2     | 128  | 0      | 0     | 128 | TFHE-BGV |
| FC1-gradient | 1356     | 201K | 100K   | 100K  | 0   | -        |
| Total        | 2991     | 429K | 213K   | 21K   | 330 | -        |
TFHE Activation and Cryptosystem Switching. We
replace all activations of the 3-layer FHESGD-based MLP
with our TFHE-based ReLU and softmax activations to
build the Glyph-based MLP. We also integrate cryptosystem
switching into the Glyph-based MLP, so that homomorphic
MAC operations are performed by BGV and activations by
TFHE. The mini-batch training latency breakdown of the
3-layer Glyph-based MLP on a single CPU core is shown in
Table 3. Because TFHE is logic-operation-friendly, the
processing latency of the activation layers of Glyph
decreases significantly. Cryptosystem switching introduces
only a small computing overhead. For instance, compared to
its counterpart in the FHESGD-based MLP, FC1-forward
increases the processing latency by only 0.96% due to the
cryptosystem switching overhead. Because of the fast
activations, our Glyph-based MLP reduces the mini-batch
training latency by 97.4% compared to the FHESGD-based
MLP, while maintaining the same test accuracy.
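The two percentages in this paragraph follow from the totals and the FC1-forward rows of Tables 2 and 3. A minimal sketch of that arithmetic is given here; the helper name `relative_change` and the constant names are ours and purely illustrative.

```python
def relative_change(baseline, new):
    """Signed relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# Mini-batch totals in seconds (Table 2 and Table 3).
FHESGD_MLP_TOTAL_S = 118e3
GLYPH_MLP_TOTAL_S = 2991.0

# FC1-forward latency in seconds without and with cryptosystem switching.
FC1_FORWARD_FHESGD_S = 1357.0
FC1_FORWARD_GLYPH_S = 1370.0

latency_reduction = -relative_change(FHESGD_MLP_TOTAL_S, GLYPH_MLP_TOTAL_S)
switching_overhead = relative_change(FC1_FORWARD_FHESGD_S, FC1_FORWARD_GLYPH_S)

print(f"mini-batch latency reduction: {latency_reduction:.2%}")
print(f"FC1-forward switching overhead: {switching_overhead:.2%}")
```

Because the table totals are rounded, the computed reduction (about 97.5%) can differ from the quoted 97.4% in the last digit.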
Transfer Learning on CNN. We use our TFHE-based
activations and cryptosystem switching technique to build
a Glyph-based CNN, whose detailed architecture is
explained in Section 5.2. We implement transfer learning
in the Glyph-based CNN by fixing the convolutional layers
pre-trained on SVHN and training only the two
fully-connected layers. The mini-batch training latency
breakdown of the Glyph-based CNN with transfer learning on
a single CPU core is shown in Table 4. Because the weights
of the convolutional layers are unencrypted and fixed, our
Glyph-based CNN significantly reduces the number of
multiplications between ciphertext and ciphertext
(MultCC), and adds only computationally cheap
multiplications between ciphertext and plaintext (MultCP).
The Glyph-based CNN decreases the training latency by
56.7% and improves the test accuracy by 2% over the
Glyph-based MLP.
Table 4: The mini-batch training latency of the Glyph-based CNN with TFHE activations, cryptosystem switching, and transfer learning. HOP is the number of homomorphic operations. MultCP is the number of multiplications between ciphertext and plaintext. MultCC is the number of multiplications between ciphertext and ciphertext. AddCC is the number of additions between ciphertext and ciphertext. Act is the number of activations. Switch is the cryptosystem switching. Conv is a convolutional layer. FC is a fully-connected layer, while Act in a layer name denotes an activation layer. BN is a batch normalization layer. Pool denotes an average pooling layer.

| Layers | Time(s) | HOP | MultCP | MultCC | AddCC | Act | Switch |
|---|---|---|---|---|---|---|---|
| Conv1-forward | 69 | 73K | 37K | 0 | 37K | 0 | - |
| BN1-forward | 61 | 15K | 8K | 0 | 8K | 0 | BGV-TFHE |
| Act1-forward | 321 | 4.1K | 0 | 0 | 0 | 4.1K | TFHE-BGV |
| Pool1-forward | 17 | 18K | 9.1K | 0 | 9.1K | 0 | - |
| Conv2-forward | 33 | 35K | 17K | 0 | 17K | 0 | - |
| BN2-forward | 27 | 7K | 3K | 0 | 3K | 0 | BGV-TFHE |
| Act2-forward | 151 | 1.9K | 0 | 0 | 0 | 1.9K | TFHE-BGV |
| Pool2-forward | 7 | 7.2K | 3.6K | 0 | 3.6K | 84 | - |
| FC1-forward | 228.2 | 67K | 0 | 34K | 34K | 0 | BGV-TFHE |
| Act3-forward | 8.2 | 84 | 0 | 0 | 0 | 84 | TFHE-BGV |
| FC2-forward | 6.1 | 1.68K | 0 | 840 | 840 | 0 | BGV-TFHE |
| Act4-forward | 68.6 | 10 | 0 | 0 | 0 | 10 | TFHE-BGV |
| Act4-error | 0.1 | 10 | 0 | 0 | 10 | 0 | - |
| FC2-error | 6 | 1.68K | 0 | 840 | 840 | 0 | - |
| FC2-gradient | 31 | 1.68K | 0 | 840 | 840 | 0 | BGV-TFHE |
| Act3-error | 32 | 84 | 0 | 0 | 0 | 84 | TFHE-BGV |
| FC1-gradient | 227 | 67K | 0 | 34K | 34K | 0 | - |
| Total | 1293 | 300K | 78K | 71K | 148K | 6.3K | - |
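The effect of freezing the plaintext convolutional weights can be read off Table 4 by tallying the two kinds of multiplication per layer. The sketch below uses the rounded table entries; the dictionary and variable names are illustrative assumptions, and the MLP figure of 213K MultCC is taken from the Total row of Table 3.

```python
# (MultCP, MultCC) per layer, transcribed from the rounded entries of Table 4.
# Layers that perform no multiplications (activations, Act4-error) are omitted.
TABLE4_MULTS = {
    "Conv1-forward": (37_000, 0),
    "BN1-forward":   (8_000, 0),
    "Pool1-forward": (9_100, 0),
    "Conv2-forward": (17_000, 0),
    "BN2-forward":   (3_000, 0),
    "Pool2-forward": (3_600, 0),
    "FC1-forward":   (0, 34_000),
    "FC2-forward":   (0, 840),
    "FC2-error":     (0, 840),
    "FC2-gradient":  (0, 840),
    "FC1-gradient":  (0, 34_000),
}

mult_cp = sum(cp for cp, _ in TABLE4_MULTS.values())
mult_cc = sum(cc for _, cc in TABLE4_MULTS.values())

# Mini-batch totals from Table 3 (Glyph MLP) and Table 4 (Glyph CNN).
GLYPH_MLP_TOTAL_S, GLYPH_MLP_MULTCC = 2991.0, 213_000
GLYPH_CNN_TOTAL_S = 1293.0

latency_reduction = 1.0 - GLYPH_CNN_TOTAL_S / GLYPH_MLP_TOTAL_S
multcc_reduction = 1.0 - mult_cc / GLYPH_MLP_MULTCC

print(f"MultCP = {mult_cp}, MultCC = {mult_cc}")
print(f"latency reduction vs. Glyph MLP: {latency_reduction:.1%}")
print(f"MultCC reduction vs. Glyph MLP: {multcc_reduction:.1%}")
```

In this tally every frozen layer (Conv, BN, Pool) contributes only MultCP, while ciphertext-ciphertext multiplications remain solely in the two trainable fully-connected layers.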
Figure 7: The accuracy comparison on MNIST.
Test Accuracy. The test accuracy comparison of the
FHESGD-based MLP and the Glyph-based CNN is shown
in Figure 7, where all networks are trained in the plaintext
domain. It takes 5 epochs for the FHESGD-based MLP to
reach 96.4% test accuracy on MNIST. After 5 epochs, the
Glyph-based CNN achieves 97.1% test accuracy even
without transfer learning. By reusing low-level features
learned from the SVHN dataset, the Glyph-based CNN with
transfer learning obtains 98.6% test accuracy. The CNN
architecture and, in particular, transfer learning help
FHE-based privacy-preserving DNN training achieve higher
test accuracy when training time is limited.
Figure 8: The accuracy comparison on Skin-Cancer.
6.2. Skin-Cancer-MNIST
We built the Glyph-based MLP and CNN architectures
for Skin-Cancer-MNIST with our TFHE-based activations,
cryptosystem switching, and transfer learning. Since the
network for Skin-Cancer-MNIST (Cancer) is larger than
that for MNIST, its training latency is larger, as Table 5
shows. The changes in mini-batch training latency on
Skin-Cancer-MNIST are similar to those on MNIST. The test
accuracy comparison of the FHESGD-based MLP and the
Glyph-based CNN is shown in Figure 8. For transfer
learning, we first train the Glyph-based CNN on
CIFAR-10, fix its convolutional layers, and then train its
fully-connected layers on Skin-Cancer-MNIST. On this
more complex dataset, compared to the FHESGD-based
MLP, the Glyph-based CNN without transfer learning
increases the training accuracy by 2% at the 15th epoch.
Transfer learning further improves the test accuracy
of the Glyph-based CNN to 73.2%, i.e., a 4% test accuracy
boost. Our TFHE-based activations, cryptosystem switching,
and transfer learning enable Glyph to efficiently support
deep CNN architectures.
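The transfer-learning recipe described above (pre-train, freeze the feature extractor, train only the classifier head) can be illustrated with a small plaintext toy. This is not the Glyph CNN and involves no encryption: the two-unit "frozen" layer, the logistic-regression head, the toy data, and all names below are hypothetical stand-ins chosen only to show that the frozen weights are never updated while the head is trained by SGD.

```python
import math

# Hypothetical stand-in for convolutional layers pre-trained on another
# dataset: a fixed linear map followed by ReLU. Stored as tuples so the
# weights cannot be modified during training.
FROZEN_W = ((1.0, -1.0), (-1.0, 1.0))


def frozen_features(x):
    """Apply the frozen layer: ReLU(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def predict(x, head_w, head_b):
    """Sigmoid output of the trainable head on top of the frozen features."""
    z = sum(w * f for w, f in zip(head_w, frozen_features(x))) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=50, lr=0.1):
    """Train only the head with per-sample SGD on the log-loss."""
    head_w = [0.0] * len(FROZEN_W)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_features(x)
            err = predict(x, head_w, head_b) - y  # d(log-loss)/dz
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


# Toy target task: label 1 when the first coordinate is the larger one.
TOY_DATA = [((1.0, 0.0), 1), ((0.0, 1.0), 0), ((2.0, 1.0), 1), ((1.0, 3.0), 0)]

head_w, head_b = train_head(TOY_DATA)
accuracy = sum(
    (predict(x, head_w, head_b) > 0.5) == bool(y) for x, y in TOY_DATA
) / len(TOY_DATA)
print(f"head weights: {head_w}, training accuracy: {accuracy:.2f}")
```

In the encrypted setting the same split is what allows the frozen layers to use plaintext weights, so only the head requires ciphertext-ciphertext multiplications.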
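Table 5: The comparison of overall training latency.

| Dataset | Network | Thread # | Mini-batch | Epoch # | Time | Acc(%) |
|---|---|---|---|---|---|---|
| MNIST | MLP | 1 | 33 hours | 50 | 187 years | 97.8 |
| MNIST | MLP | 48 | 2.3 hours | 50 | 13.4 years | 97.8 |
| MNIST | CNN | 1 | 0.44 hours | 5 | 2.46 months | 98.6 |
| MNIST | CNN | 48 | 0.04 hours | 5 | 8 days | 98.6 |
| Cancer | MLP | 1 | 34.1 hours | 30 | 15.6 years | 70.2 |
| Cancer | MLP | 48 | 2.4 hours | 30 | 1.1 years | 70.2 |
| Cancer | CNN | 1 | 0.93 hours | 15 | 0.21 years | 73.2 |
| Cancer | CNN | 48 | 0.08 hours | 15 | 7 days | 73.2 |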
6.3. Overall Training Latency and Scalability
The overall training latency with multiple threads on our
CPU baseline is shown in Table 5. We measured the
mini-batch training latency by running each FHE-based
training scheme for one mini-batch. We estimated the total
training latency as the product of the mini-batch training
latency and the total number of mini-batches in a training
run. For MNIST, the FHESGD-based MLP requires 50 epochs,
each of which includes 1000 mini-batches (60 images each),
to obtain 97.8% test accuracy. On a single CPU core,
training the FHESGD-based MLP takes 187 years, which is
impractical. In contrast, our Glyph-based CNN requires only
5 epochs to achieve 98.6% test accuracy, and its training
takes 2.46 months on a single CPU core. If we use 48
threads to train the Glyph-based CNN, the overall training
latency can be reduced to 8 days. Multi-threading can
effectively increase the training parallelism, since the
weight updates in Stochastic Gradient Descent (SGD) are
independent. With 48 threads, we observed only a 9.3×
training speedup, because memory bandwidth becomes the
performance scaling bottleneck. For Skin-Cancer-MNIST, it
takes 30 epochs, each of which includes 134 mini-batches,
for the FHESGD-based MLP to achieve 70.2% test accuracy. In
contrast, our Glyph-based CNN requires only 15 epochs to
obtain 73.2% test accuracy. With 48 threads, the training
of the Glyph-based CNN can be completed within 7 days.
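The estimate described above is a simple product of the measured mini-batch latency, the number of mini-batches per epoch, and the number of epochs. A minimal sketch under those assumptions follows; the function names and the 365-day year are our own choices, and because the Table 5 entries are rounded the results only approximately reproduce the reported totals.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year


def total_training_hours(minibatch_hours, epochs, minibatches_per_epoch):
    """Total latency = mini-batch latency x total number of mini-batches."""
    return minibatch_hours * epochs * minibatches_per_epoch


def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling that is actually achieved."""
    return speedup / threads


# FHESGD-based MLP on MNIST, 1 thread: 33 h per mini-batch, 50 epochs x 1000.
mnist_mlp_years = total_training_hours(33, 50, 1000) / HOURS_PER_YEAR

# Glyph-based CNN on MNIST, 48 threads: 0.04 h per mini-batch, 5 epochs x 1000.
mnist_cnn_days = total_training_hours(0.04, 5, 1000) / HOURS_PER_DAY

# Glyph-based CNN on Skin-Cancer-MNIST, 48 threads: 0.08 h, 15 epochs x 134.
cancer_cnn_days = total_training_hours(0.08, 15, 134) / HOURS_PER_DAY

# Reported 9.3x speedup with 48 threads.
efficiency = parallel_efficiency(9.3, 48)

print(f"MNIST MLP, 1 thread: about {mnist_mlp_years:.0f} years")
print(f"MNIST CNN, 48 threads: about {mnist_cnn_days:.1f} days")
print(f"Cancer CNN, 48 threads: about {cancer_cnn_days:.1f} days")
print(f"parallel efficiency at 48 threads: {efficiency:.1%}")
```

The low parallel efficiency (roughly a fifth of linear scaling) matches the observation that memory bandwidth, not compute, limits multi-threaded training.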
7. Conclusion
In this paper, we propose Glyph, an FHE-based
privacy-preserving technique to train DNNs on encrypted
data quickly and accurately. Glyph performs ReLU and
softmax with logic-operation-friendly TFHE, while it
conducts MAC operations with vectorial-arithmetic-friendly
BGV. We create a cryptosystem switching technique to switch
Glyph between TFHE and BGV. We further apply transfer
learning to Glyph to support CNN architectures and reduce
the number of homomorphic multiplications between
ciphertext and ciphertext. Our experimental results show
that Glyph obtains state-of-the-art test accuracy while
reducing the training latency by 99% over the prior
FHE-based technique on multiple encrypted datasets.
References
[1] Nitin Agrawal, Ali Shahin Shamsabadi, Matt J. Kusner, and Adrià Gascón. QUOTIENT: Two-Party Secure Neural Network Training and Prediction. In ACM SIGSAC Conference on Computer and Communications Security, 2019.
[2] Flavio Bergamaschi. HElib: an Implementation of homomorphic encryption. https://github.com/homenc/HElib, 2019.
[3] Christina Boura, Nicolas Gama, Mariya Georgieva, and Dimitar Jetchev. Chimera: Combining ring-lwe-based fully homomorphic encryption schemes. Cryptology ePrint Archive, Report 2018/758, 2018. https://eprint.iacr.org/2018/758.
[4] Florian Bourse, Michele Minelli, Matthias Minihold, and Pascal Paillier. Fast Homomorphic Evaluation of Deep Discretized Neural Networks. In Advances in Cryptology, 2018.
[5] Alon Brutzkus et al. Low Latency Privacy Preserving Inference. In International Conference on Machine Learning, 2019.
[6] Hao Chen, Kim Laine, and Rachel Player. Simple encrypted arithmetic library-SEAL v2.1. In International Conference on Financial Cryptography and Data Security. Springer, 2017.
[7] Jung Hee Cheon, Kyoohyung Han, Andrey Kim, Miran Kim, and Yongsoo Song. Bootstrapping for Approximate Homomorphic Encryption. In International Conference on the Theory and Applications of Cryptographic Techniques, 2018.
[8] Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène. TFHE: Fast Fully Homomorphic Encryption over the Torus. Journal of Cryptology, 2018.
[9] Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène. TFHE: Fast Fully Homomorphic Encryption Library, August 2016. https://tfhe.github.io/tfhe/.
[10] Anamaria Costache and Nigel P. Smart. Which ring based somewhat homomorphic encryption scheme is best? Cryptology ePrint Archive, Report 2015/889, 2015. https://eprint.iacr.org/2015/889.
[11] Jack L. H. Crawford, Craig Gentry, Shai Halevi, Daniel Platt, and Victor Shoup. Doing Real Work with FHE: The Case of Logistic Regression. In the Workshop on Encrypted Computing Applied Homomorphic Cryptography, 2018.
[12] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In International Conference on Machine Learning, 2016.
[13] Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Gang Wang, Jianfei Cai, et al. Recent advances in convolutional neural networks. Pattern Recognition, 77:354–377, 2018.
[14] Shai Halevi and Victor Shoup. Algorithms in HElib. In Advances in Cryptology, 2014.
[15] Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, and Brian Thorne. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. CoRR, abs/1711.10677, 2017.
[16] Ehsan Hesamifard, Hassan Takabi, and Mehdi Ghasemi. Deep Neural Networks Classification over Encrypted Data. In ACM Conference on Data and Application Security and Privacy, 2019.
[17] Ryan Karl, Timothy Burchfield, Jonathan Takeshita, and Taeho Jung. Non-interactive mpc with trusted hardware secure against residual function attacks. Cryptology ePrint Archive, Report 2019/454, 2019. https://eprint.iacr.org/2019/454.
[18] Miran Kim and Kristin Lauter. Private Genome Analysis Through Homomorphic Encryption. BMC Medical Informatics and Decision Making, 15, 2015.
[19] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The cifar-10 dataset, 2014. http://www.cs.toronto.edu/~kriz/cifar.html.
[20] Yann LeCun, Corinna Cortes, and CJ Burges. MNIST Handwritten Digit Database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010.
[21] Eleftheria Makri, Dragos Rotaru, Nigel P. Smart, and Frederik Vercauteren. Epic: Efficient private image classification (or: Learning from the masters). Cryptology ePrint Archive, Report 2017/1190, 2017. https://eprint.iacr.org/2017/1190.
[22] Karthik Nandakumar, Nalini Ratha, Sharath Pankanti, and Shai Halevi. Towards Deep Neural Network Training on Encrypted Data. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[23] Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[24] Microsoft SEAL (release 3.4). https://github.com/Microsoft/SEAL, Oct. 2019. Microsoft Research, Redmond, WA.
[25] Pierre Sermanet, Soumith Chintala, and Yann LeCun. Convolutional neural networks applied to house numbers digit classification. arXiv preprint arXiv:1204.3968, 2012.
[26] Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data, 5:180161, 2018.
[27] Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, and Chris De Sa. SWALP: Stochastic weight averaging in low precision training. In International Conference on Machine Learning, 2019.
[28] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems 27, pages 3320–3328. Curran Associates, Inc., 2014.