APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers

Jiahao Lu1,2, Xi Sheryl Zhang1, Tianli Zhao1,2, Xiangyu He1,2, Jian Cheng1
1Institute of Automation, Chinese Academy of Sciences
2School of Artificial Intelligence, University of Chinese Academy of Sciences
Federated learning frameworks typically require collaborators to share their local gradient updates of a common model instead of sharing training data to preserve privacy. However, prior works on Gradient Leakage Attacks showed that private training data can be revealed from gradients. So far almost all relevant works base their attacks on fully-connected or convolutional neural networks. Given the recent, rapidly rising trend of adapting Transformers to solve a wide range of vision tasks, it is highly valuable to investigate the privacy risk of vision Transformers. In this paper, we analyse the gradient leakage risk of self-attention based mechanisms in both theoretical and practical manners. In particular, we propose APRIL (Attention PRIvacy Leakage), which poses a strong threat to self-attention inspired models such as ViT. By showing how vision Transformers are at risk of privacy leakage via gradients, we stress the importance of designing privacy-safer Transformer models and defense schemes.
1. Introduction
Federated or collaborative learning [25] has been gaining massive attention from both academia [20,21] and industry [7,19]. To preserve privacy, typical federated learning keeps local training data private and trains a global model by sharing gradients collaboratively. Because raw data is never transmitted directly to a central server, this learning paradigm is widely believed to offer sufficient privacy. It has therefore been employed in real-world applications, especially when user privacy is highly sensitive, e.g. hospital data [2,18].
Whilst this setting prevents direct privacy leakage by keeping training data invisible to collaborators, a recent line of work [12,16,39,41,43,44] demonstrates that it is possible to (partially) recover private training data from the model gradients. This attack, dubbed gradient leakage or gradient inversion, poses a severe threat to federated learning systems. Previous works primarily focus on inverting gradients from fully connected networks (FCNs) or convolutional neural networks (CNNs). In particular, Yin et al. [39] recover images with high fidelity relying on gradient matching with BatchNorm layer statistics; Zhu et al. [43] theoretically analyse the risk of certain architectures to enable the full recovery. One intriguing question is whether gradient privacy leakage also occurs in architectures other than FCNs and CNNs.
Recent years have witnessed a surge of methods based on the Transformer [32]. As an inherently different architecture, the Transformer can build large-scale contextual representation models and achieves impressive results in a broad set of natural language tasks. For instance, huge pre-trained language models including BERT [8], XLNet [38], GPT-3 [3], Megatron-LM [30], and so forth are built on Transformers. Inspired by this success, early works [1,6,29,33] explore the feasibility of combining the self-attention mechanism with convolutional layers for vision tasks. Then, DETR [4] makes pioneering progress in using Transformers for object detection, and ViT [10] resoundingly succeeds in image classification with a pure Transformer architecture. Following ViT, dozens of works manage to integrate Transformers into various computer vision tasks [11,22–24,35–37,40]. Notably, vision Transformers are known to be extremely data-hungry [10], which makes large-scale learning in the federated fashion more favorable.
Despite the rapid progress mentioned above, there is a high chance that vision Transformers suffer from gradient leakage. Nevertheless, studies of this privacy issue are absent. Although prior work [16] provides an attack algorithm that recovers private training data for a Transformer-based language model via an optimization process, the inherent reason for the Transformer's vulnerability is unclear. Different from leakage on Transformers in natural language tasks [16], we claim that in vision Transformers the position embedding not only encodes positional information for patches but also enables gradient inversion from the layer. In this paper, we introduce a novel analytic gradient leakage to reveal why vision Transformers are easy to attack. Furthermore, we explore gradient leakage through recovery mechanisms based on an optimization approach and provide new insight into the position embedding. Our gradient attack results will shed light on future designs for privacy-preserving vision Transformers.
To summarize, our contributions are as follows:
• We prove that for the classic self-attention module, the input data can be perfectly reconstructed without solving an intractable optimization problem, if the gradient w.r.t. the input is known.
• We demonstrate that jointly using self-attention and a learnable position embedding places the model at severe privacy risk. The attacker obtains a closed-form solution to the privacy leakage under certain conditions, regardless of the complexity of the network.
• We propose an Attention Privacy Leakage (APRIL) attack to discover the Achilles' Heel. As an alternative to the closed-form attack, APRIL also performs an optimization-based attack. The attacks achieve results superior to SOTA.
• We suggest switching the learnable position embedding to a fixed one as the defense against privacy attacks. Empirical results confirm the effectiveness of our defense scheme.
2. Preliminary
Federated Learning. Federated learning [25] offers a scheme that trains statistical models collaboratively across multiple data owners. Owing to developments in privacy, large-scale training, and distributed optimization, federated learning methods have been deployed by applications which require computing at the edge [2,9,13,14,28]. In this scenario, we aim to learn a global model from locally processed client data by communicating intermediate updates to a central server. Formally, the typical goal is to minimize the following loss function $l$ with parameters $w$,

$$\min_w l_w(x, y), \quad \text{where } l_w(x, y) := \sum_{i=1}^{N} p_i\, l_w^i(x_i, y_i), \tag{1}$$

where $p_i \ge 0$ and $\sum_i p_i = 1$. Each of the $N$ clients owns its private training data. Let $(x_i, y_i)$ denote the samples available locally for the $i$-th client, and $l_w^i(x_i, y_i)$ denote the local loss function. In order to preserve data privacy, clients periodically upload their gradients $\nabla_w l_w^i(x_i, y_i)$ computed on their own local batch. The server aggregates gradients from all clients, updates the model using gradient descent and then sends back the updated parameters to every client.
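The aggregation round described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the least-squares local loss, the four synthetic clients, the equal weights `p` and the learning rate are all hypothetical choices, not the setup used in the paper.

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of the local loss 0.5 * mean((X @ w - y)**2) w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

def global_loss(w, clients, p):
    """Weighted sum of the local losses, mirroring Eq. (1)."""
    return sum(pi * 0.5 * np.mean((X @ w - y) ** 2) for pi, (X, y) in zip(p, clients))

def server_round(w, clients, p, lr):
    """Clients upload gradients; the server aggregates them and takes one step."""
    uploaded = [local_gradient(w, X, y) for X, y in clients]  # what the server sees
    aggregated = sum(pi * g for pi, g in zip(p, uploaded))
    return w - lr * aggregated

rng = np.random.default_rng(0)
w_true = rng.standard_normal(3)
clients = []
for _ in range(4):  # N = 4 hypothetical clients with private (X, y)
    X = rng.standard_normal((20, 3))
    clients.append((X, X @ w_true))
p = [0.25] * 4  # client weights, non-negative and summing to 1

w = np.zeros(3)
initial_loss = global_loss(w, clients, p)
for _ in range(50):
    w = server_round(w, clients, p, lr=0.1)
final_loss = global_loss(w, clients, p)
```

The list `uploaded` is exactly the information an honest-but-curious server observes, which is why the gradients themselves become the attack surface.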
Gradient Leakage Attack. As an honest-but-curious ad-
versary at the server side may reconstruct clients’ private
training data without messing up the training process, shar-
ing gradients in federated learning is no longer safe for
client data. Endeavors of existing threat models which use
gradients to recover input mainly focus on two directions:
optimization-based attacks and closed-form attacks.
The basic recovery mechanism is defined by optimizing a Euclidean distance as follows,

$$\arg\min_{x'_i,\, y'_i} \left\| \nabla_w l_w^i(x'_i, y'_i) - \nabla_w l_w^i(x_i, y_i) \right\|^2. \tag{2}$$

Deep leakage [44] minimizes the matching term between gradients from the dummy input $(x'_i, y'_i)$ and those from the real input $(x_i, y_i)$. Building on this proposal, iDLG [41] finds that the ground-truth label can in fact be derived from the gradient of the last fully connected layer. By eliminating one optimization objective in Eq. (2), the attack procedure becomes even faster and smoother. Also, Geiping et al. [12] prove that inversion from gradients is strictly less difficult than recovery from visual representations. GradInversion [39] incorporates heuristic image priors as regularization by utilizing a BatchNorm matching loss and a group consistency loss for image fidelity. Lately, GIML [17] illustrates that a generative model pre-trained on the data distribution can be exploited for reconstruction.
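The label-recovery observation attributed to iDLG above can be illustrated with a small NumPy sketch. It assumes a single sample, a softmax cross-entropy loss and a final linear layer; the sizes and the helper names `last_layer_weight_grad` and `infer_label` are hypothetical. Under these assumptions the weight gradient is the outer product of (softmax minus one-hot) with the incoming features, so only the true-label row has a negative coefficient.

```python
import numpy as np

def last_layer_weight_grad(W, b, h, label):
    """Gradient of cross-entropy(softmax(W @ h + b), label) w.r.t. W."""
    logits = W @ h + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs[label] -= 1.0            # dl/dlogits = softmax - one_hot
    return np.outer(probs, h)      # dl/dW = (dl/dlogits) h^T

def infer_label(grad_W):
    """Return the row whose inner product with every other row is <= 0."""
    gram = grad_W @ grad_W.T
    np.fill_diagonal(gram, 0.0)    # ignore each row's product with itself
    candidates = np.flatnonzero((gram <= 0).all(axis=1))
    return int(candidates[0])

rng = np.random.default_rng(0)
num_classes, feat_dim = 10, 32     # hypothetical sizes
W = rng.standard_normal((num_classes, feat_dim))
b = rng.standard_normal(num_classes)
h = rng.standard_normal(feat_dim)  # features entering the last layer
true_label = 7
recovered_label = infer_label(last_layer_weight_grad(W, b, h, true_label))
```

With the label fixed this way, the optimization in Eq. (2) only has to search over the dummy input.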
One essential challenge of optimization procedures is that there is no sufficient condition for the uniqueness of the optimizer. The closed-form attack, the other direction in this line of work, is introduced by Phong et al. [27], who reconstruct inputs using a shallow network such as a single-layer perceptron. R-GAP [43] is the first derivation-based approach to perform an attack on CNNs, which models the problem as linear systems with closed-form solutions. Compared to the optimization-based method, analytic gradient leakage heavily depends on the architecture of neural networks and thus cannot always guarantee a solution.
Transformers. The Transformer [32] was introduced for neural machine translation to model long-term correlations between tokens while representing dependencies between any two distant tokens. Its outstanding representational capability comes from stacking multi-head self-attention modules. Recently, vision Transformers and their variants have been broadly used as powerful backbones [10,24,31] and for object detection [4], semantic segmentation [42], image generation [5,15,26], etc.
Given the fundamentals of the vision Transformer, we investigate gradient leakage in both closed-form and optimization-based manners. Thus far, almost all gradient leakage attacks adopt CNNs as the testing ground, typically using VGG or ResNet. Besides, TAG [16] conducts experiments on popular language models using Transformers, but considers neither an analytic solution nor the function of the position embedding.
3. APRIL: Attention PRIvacy Leakage
In light of the missing investigation of the gradient leak-
age problem for vision transformers, we first prove that
gradient attacks on self-attention can be analytically con-
ducted. Next, we will discuss the possible leakage from the
position embedding based on its analytic solution, which
naturally gives rise to two attack approaches.
3.1. Analytic Gradient Attack on Self-Attention
It has been proven that the closed-form solution for in-
put xcan always be perfectly obtained on a fully-connected
layer σ(Wx +b)=z, through deriving gradients w.r.t.
weight Wand bias b. The non-linear function σis an acti-
vation [27]. In this work, we delve into a more subtle for-
mulation of a self-attention to demonstrate the existence of
the closed-form solution.
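The fully-connected case mentioned above can be checked numerically. For a layer with pre-activation Wx + b, the chain rule gives dl/dW = δ xᵀ and dl/db = δ, so any row of dl/dW divided by the matching entry of dl/db returns x. The sketch below assumes a tanh activation and a squared-error loss purely for illustration; the sizes and the helper name `recover_fc_input` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                 # private input
W = rng.standard_normal((5, 8))
b = rng.standard_normal(5)
target = rng.standard_normal(5)

# Forward pass and loss l = 0.5 * ||tanh(W x + b) - target||^2 (toy choice)
out = np.tanh(W @ x + b)
delta = (out - target) * (1.0 - out ** 2)  # dl/d(pre-activation)
grad_W = np.outer(delta, x)                # shared gradient w.r.t. W
grad_b = delta                             # shared gradient w.r.t. b

def recover_fc_input(grad_W, grad_b):
    """Divide one row of dl/dW by the matching entry of dl/db."""
    i = int(np.argmax(np.abs(grad_b)))     # pick the best-conditioned row
    return grad_W[i] / grad_b[i]

x_rec = recover_fc_input(grad_W, grad_b)
fc_error = float(np.max(np.abs(x_rec - x)))
```

No optimization is involved here, which is the property the self-attention analysis in this section sets out to reproduce.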
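Theorem 1. (Input Recovery). Assume a self-attention module expressed as:

$$Qz = q;\quad Kz = k;\quad Vz = v; \tag{3}$$
$$\mathrm{softmax}\!\left(\frac{q\,k^T}{\sqrt{d_k}}\right) v = h; \tag{4}$$
$$Wh = a; \tag{5}$$

where $z$ is the input of the self-attention module and $a$ is the output of the module. Let $Q, K, V, W$ denote the weight matrices of query, key, value and projection, and $q, k, v, h$ denote the intermediate feature maps. Suppose the loss function can be written as

$$l = l(f(a), y).$$

If the derivative of the loss $l$ w.r.t. the input $z$ is known, then the input can be recovered uniquely from the network's gradients by solving the following linear system:

$$\frac{\partial l}{\partial z} z^T = Q^T \frac{\partial l}{\partial Q} + K^T \frac{\partial l}{\partial K} + V^T \frac{\partial l}{\partial V}$$

Proof. In spite of the non-linear formulation of self-attention modules, the gradients w.r.t. $z$ can be derived in a succinct linear equation:

$$\frac{\partial l}{\partial z} = Q^T \frac{\partial l}{\partial q} + K^T \frac{\partial l}{\partial k} + V^T \frac{\partial l}{\partial v} \tag{6}$$

Again, according to the chain rule of derivatives, we can derive the gradients w.r.t. $Q$, $K$ and $V$ from Eq. (3):

$$\frac{\partial l}{\partial Q} = \frac{\partial l}{\partial q} z^T;\quad \frac{\partial l}{\partial K} = \frac{\partial l}{\partial k} z^T;\quad \frac{\partial l}{\partial V} = \frac{\partial l}{\partial v} z^T \tag{7}$$

By right-multiplying both sides of Eq. (6) by $z^T$ and substituting Eq. (7), we obtain:

$$\frac{\partial l}{\partial z} z^T = Q^T \frac{\partial l}{\partial Q} + K^T \frac{\partial l}{\partial K} + V^T \frac{\partial l}{\partial V} \tag{8}$$

which completes the proof.

Algorithm 1: Closed-Form APRIL
Input: Attention module: $F(z, w)$; Module weights $w$; Module gradients $\partial l/\partial w$; Derivative of loss w.r.t. $z$: $\partial l/\partial z$
Output: Embedding fed into attention module: $z$
1: procedure APRIL-CLOSED-FORM($F$, $w$, $\partial l/\partial w$, $\partial l/\partial z$)
2:   Extract $Q, K, V$ from module weights $w$
3:   Extract $\partial l/\partial Q$, $\partial l/\partial K$, $\partial l/\partial V$ from module gradients $\partial l/\partial w$
4:   $A \leftarrow \partial l/\partial z$
5:   $b \leftarrow Q^T \cdot \partial l/\partial Q + V^T \cdot \partial l/\partial V + K^T \cdot \partial l/\partial K$
6:   $z \leftarrow A^{\dagger} \cdot b$    ▷ $A^{\dagger}$: Moore-Penrose pseudoinverse of $A$
7:   $z \leftarrow z^T$    ▷ Transpose
8: end procedure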
Remark. Surprisingly, this result is directly useful to a malicious attacker aiming to recover the input data $z$. An adversary in the context of federated learning knows both the learnable parameters and the gradients w.r.t. them, in this case $Q$, $K$, $V$ and $\partial l/\partial Q$, $\partial l/\partial K$, $\partial l/\partial V$, so the right side of Eq. (8) is known. As a result, once the derivative of the loss w.r.t. the input, $\partial l/\partial z$, is exposed to the adversary, the attacker can easily get an accurate reconstruction of $z$ by solving the linear equation system in Eq. (8).
Solution Feasibility. Suppose the embedding $z \in \mathbb{R}^{p \times c}$, with patch number $p$ and channel number $c$. This linear system has $p \times c$ unknown variables and $c \times c$ linear constraints. Since deep neural networks normally have wide channels for the sake of expressiveness, $c \gg p$ in most model designs, which leads to an overdetermined problem and thereby a solvable result. In other words, $z$ can be accurately reconstructed if $\partial l/\partial z$ is available. The entire procedure of the closed-form attack is presented in Alg. 1.
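A minimal NumPy sketch of the closed-form procedure follows. It is not the authors' implementation: to stay self-contained it skips the softmax and the rest of the network and draws the upstream gradients `g_q`, `g_k`, `g_v` at random, which is enough to exercise the linear system because Eq. (6) and Eq. (7) hold for any upstream gradients. The sizes `c` and `p` and all variable names are hypothetical, and `z` is stored with shape (c, p) so that `Q @ z` matches Eq. (3).

```python
import numpy as np

def april_closed_form(Q, K, V, dQ, dK, dV, dz):
    """Solve (dl/dz) z^T = Q^T dl/dQ + K^T dl/dK + V^T dl/dV for z."""
    A = dz                                   # (c, p)
    b = Q.T @ dQ + K.T @ dK + V.T @ dV       # (c, c), known to the attacker
    z_t = np.linalg.pinv(A) @ b              # (p, c): least-squares solution for z^T
    return z_t.T                             # back to (c, p)

rng = np.random.default_rng(0)
c, p = 16, 4                                 # hypothetical sizes with c > p
Q, K, V = (rng.standard_normal((c, c)) for _ in range(3))
z = rng.standard_normal((c, p))              # private embedding to be recovered

# Stand-ins for dl/dq, dl/dk, dl/dv produced by the rest of the network.
g_q, g_k, g_v = (rng.standard_normal((c, p)) for _ in range(3))

# What gradient sharing exposes: Eq. (7) for the weights, Eq. (6) for the input.
dQ, dK, dV = g_q @ z.T, g_k @ z.T, g_v @ z.T
dz = Q.T @ g_q + K.T @ g_k + V.T @ g_v

z_rec = april_closed_form(Q, K, V, dQ, dK, dV, dz)
max_abs_error = float(np.max(np.abs(z_rec - z)))
```

Because `dz` has more rows than columns here, its pseudoinverse is a left inverse whenever `dz` has full column rank, which is why the recovery is exact up to floating-point error.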
3.2. Position Embedding: The Achilles’ Heel
Figure 1. We consider two Transformer designs throughout the paper. (A): Encoder modules stack multi-head attention, normalization, and MLP in VGG-style. (B): A real-world design as introduced in ViT [10]. The architecture in (A) satisfies the precondition of a closed-form APRIL attack, since the output of the position embedding is exactly the input of multi-head attention, as shown by the red dashed box. In contrast, the optimization-based APRIL attack can be applied to any architecture design, as shown by the yellow dashed boxes in (A) and (B).
Now we focus on how to access the critical derivative $\partial l/\partial z$ by introducing the leakage caused by the position embedding. Under general settings of federated learning, the sensitive information related to $z$ is invisible from the users' side. Here, we show that $\partial l/\partial z$ is unfortunately exposed by gradient sharing for vision Transformers with a learnable position embedding. Specifically, we give the following theorem to illustrate the leakage.
Theorem 2. (Gradient Leakage). For a Transformer with learnable position embedding $E_{pos}$, the derivative of the loss w.r.t. $E_{pos}$ is given by

$$\frac{\partial l}{\partial E_{pos}} = \frac{\partial l}{\partial z} \tag{9}$$

where $\partial l/\partial z$ is defined by the linear system in Theorem 1.
Proof. Without loss of generality, the embedding $z$ defined in Theorem 1 can be divided into a patch embedding $E_{patch}$ and a learnable position embedding $E_{pos}$ as

$$z = E_{patch} + E_{pos} \tag{10}$$

Computing the derivative of the loss w.r.t. $E_{pos}$ using Eq. (10) directly shows that Eq. (9) holds.
Remark. The sensitive information $\partial l/\partial z$ is exactly the same as the gradient of the position embedding $\partial l/\partial E_{pos}$, denoted as $\nabla E_{pos}$ for simplicity. As model gradients are shared, $\nabla E_{pos}$ is available not only to legitimate users but also to potential adversaries, which enables a successful attack on self-attention inputs.
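Theorem 2 can be sanity-checked numerically: since z depends on the position embedding only through the sum in Eq. (10), perturbing the position embedding changes the loss exactly as perturbing z does. The sketch below uses a toy quadratic loss in place of a real network; the matrix `M`, the target `T` and the sizes are hypothetical and chosen only so that the analytic gradient is easy to write down.

```python
import numpy as np

rng = np.random.default_rng(0)
c, p = 6, 3                                   # hypothetical sizes
E_patch = rng.standard_normal((c, p))
E_pos = rng.standard_normal((c, p))
M = rng.standard_normal((c, c))               # toy stand-in for the network
T = rng.standard_normal((c, p))

def loss_from_z(z):
    return 0.5 * np.sum((M @ z - T) ** 2)

def loss_from_pos(pos):
    return loss_from_z(E_patch + pos)         # z = E_patch + E_pos, Eq. (10)

# Analytic dl/dz of the toy loss, evaluated at the actual embedding.
z = E_patch + E_pos
grad_z = M.T @ (M @ z - T)

# Central finite differences of the loss w.r.t. each entry of E_pos.
eps = 1e-5
grad_pos = np.zeros_like(E_pos)
for idx in np.ndindex(E_pos.shape):
    step = np.zeros_like(E_pos)
    step[idx] = eps
    grad_pos[idx] = (loss_from_pos(E_pos + step) - loss_from_pos(E_pos - step)) / (2 * eps)

max_gap = float(np.max(np.abs(grad_pos - grad_z)))
```

The two gradients agree entry by entry, so the shared gradient of the position embedding supplies the quantity Theorem 1 needs.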
While vision Transformers [10,24,34] gain a prominent accuracy improvement from using learnable position embeddings rather than fixed ones, updating the parameter $E_{pos}$ will cause privacy problems according to our theory. More severely, the attacker only requires a learnable position embedding and a self-attention module stacked at the bottom in VGG-style, regardless of the complexity of the rest of the architecture, as shown in Fig. 1(A). Informally, we suggest two strategies to alleviate this leakage: either employing a fixed position embedding instead of the learnable one, or updating $E_{pos}$ only on the local client without transmission.
3.3. APRIL attacks on vision Transformer
So far the analytic gradient attack has succeeded in reconstructing the input embedding $z$ while obtaining the gradient of the position embedding $\nabla E_{pos}$. One question is whether APRIL can take advantage of this sensitive information to further recover the original input $x$. The answer is affirmative.
Closed-Form APRIL. As a matter of fact, an APRIL attacker can invert the embedding via a linear projection to get the original input pixels. For a vision Transformer, the input image is partitioned into many patches and sent through a so-called “Patch Embedding” layer, defined as

$$E_{patch} = W_p x \tag{11}$$

The bias term is omitted since it can be represented in an augmented matrix $W_p$. With $W_p$, pixels are linearly mapped to features, and the attacker calculates the original pixels by left-multiplying $E_{patch}$ by the pseudo-inverse of $W_p$.
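The pixel-recovery step can be sketched as follows, assuming the attacker already holds the recovered embedding z and, as a participant in training, the parameters `W_p` and `E_pos`. The sizes are hypothetical; the sketch further assumes the hidden dimension is at least the number of pixel values per patch so that `W_p` has full column rank, and it omits the bias as the text does.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels_per_patch, hidden, num_patches = 48, 64, 16   # hypothetical sizes
W_p = rng.standard_normal((hidden, pixels_per_patch))          # patch embedding weights
E_pos = rng.standard_normal((hidden, num_patches))             # position embedding
x = rng.uniform(0.0, 1.0, size=(pixels_per_patch, num_patches))  # private pixels

z = W_p @ x + E_pos          # Eq. (11) followed by Eq. (10)

def invert_patch_embedding(z, W_p, E_pos):
    """Subtract the position embedding, then undo the linear patch projection."""
    E_patch = z - E_pos
    return np.linalg.pinv(W_p) @ E_patch

x_rec = invert_patch_embedding(z, W_p, E_pos)
pixel_error = float(np.max(np.abs(x_rec - x)))
```

If the hidden dimension were smaller than the patch size, the pseudo-inverse would only return the minimum-norm solution rather than the true pixels.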
Optimization-based APRIL. Given the linear system in Theorem 1, it can also be decomposed into two components, $z$ and $\nabla E_{pos}$, based on Eq. (9). Arguably, the component $\nabla E_{pos}$ indicates the directions of the gradients of the position embedding and contributes to the linear system independently of the data. Considering the significance of the learnable position embedding in gradient leakage, intuitively, matching the updating direction of $\nabla E_{pos}$ with the direction caused by dummy data can benefit the recovery. Therefore, we propose an optimization-based attack with constraints on $\nabla E_{pos}$. In this way, apart from the architecture in Fig. 1(A), the typical ViT design illustrated in Fig. 1(B), which uses normalization and residual connections with a different stacking order, can also be attacked by our proposed APRIL.
For simplicity of expression, we use $\nabla w'$ and $\nabla w$ to denote the gradients of the parameter collections for dummy data and real inputs, respectively. In detail, the new integrated term of gradients of $E_{pos}$ is set as $L_A$. To model directional information, we utilize the cosine similarity between the real and dummy position embedding derivatives as a regularization. The full optimization problem is written as

$$L_G = \left\| \nabla w' - \nabla w \right\|_F^2, \quad L_A = -\frac{\langle \nabla E_{pos}, \nabla E'_{pos} \rangle}{\|\nabla E_{pos}\| \cdot \|\nabla E'_{pos}\|}, \quad L = L_G + \alpha L_A \tag{12}$$

where the hyperparameter $\alpha$ balances the contributions of the two matching losses. We take Eq. (12) as another variant of our proposed method, the optimization-based APRIL attack. The associated algorithm is described in Alg. 2. By enforcing gradient matching on the learnable position embedding, it is troublingly easy to break privacy in a vision Transformer.

Algorithm 2: Optimization-based APRIL
Input: Transformer with learnable position embedding: $F(x, w)$; Module parameter weights: $w$; Module parameter gradients: $\nabla w$; APRIL loss term scaler: $\alpha$
Output: Image fed into the self-attention module: $x$
1: procedure APRIL-OPTIMIZATION-ATTACK($F$, $w$, $\nabla w$)
2:   Extract final linear layer weights $w_{fc}$ from $w$ and their gradients $\nabla w_{fc}$ from $\nabla w$
3:   $y \leftarrow i$ s.t. ${\nabla w_{fc}^{i}}^T \cdot \nabla w_{fc}^{j} \le 0,\ \forall j \ne i$    ▷ Extract ground-truth label using the iDLG trick
4:   Extract position embedding layer's gradients $\nabla E_{pos}$ from $\nabla w$
5:   $x' \leftarrow \mathcal{N}(0, 1)$    ▷ Initialize the dummy input
6:   while not converged do
7:     $\partial l/\partial w' \leftarrow \partial l(F(x'; w), y)/\partial w$    ▷ Calculate dummy gradients
8:     $L_G = \|\nabla w' - \nabla w\|_F^2$    ▷ Calculate L-2 difference between gradients
9:     $\partial l/\partial E'_{pos} \leftarrow \partial l(F(x'; w), y)/\partial E_{pos}$    ▷ Calculate the derivative of dummy loss w.r.t. dummy input
10:    $L_A = -\langle \nabla E_{pos}, \nabla E'_{pos} \rangle / (\|\nabla E_{pos}\| \cdot \|\nabla E'_{pos}\|)$    ▷ Calculate cosine distance between derivative of input
11:    $L = L_G + \alpha L_A$
12:    $x' \leftarrow x' - \eta \nabla_{x'} L$    ▷ Update the dummy input
13:  end while
14: end procedure
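The combined objective can be sketched as a plain function over gradient dictionaries. This is only the loss evaluation, not the full attack: a real run would obtain the dummy gradients from an automatic-differentiation framework and differentiate this loss w.r.t. the dummy image. The sketch assumes the position-embedding term is the negative cosine similarity, so that minimizing the loss aligns the two directions; the parameter names such as `"pos_embed"` are hypothetical.

```python
import numpy as np

def april_loss(real_grads, dummy_grads, pos_key, alpha):
    """L = L_G + alpha * L_A over dictionaries of per-parameter gradients."""
    # L_G: squared Frobenius distance summed over all shared parameters.
    l_g = sum(np.sum((dummy_grads[k] - real_grads[k]) ** 2) for k in real_grads)
    # L_A: negative cosine similarity of the position-embedding gradients.
    g_real = real_grads[pos_key].ravel()
    g_dummy = dummy_grads[pos_key].ravel()
    cosine = g_real @ g_dummy / (np.linalg.norm(g_real) * np.linalg.norm(g_dummy))
    return l_g - alpha * cosine

rng = np.random.default_rng(0)
real = {
    "pos_embed": rng.standard_normal((8, 4)),
    "attn.qkv": rng.standard_normal((8, 8)),
}
# Dummy gradients that are close to, but not equal to, the real ones.
perturbed = {k: v + 0.1 * rng.standard_normal(v.shape) for k, v in real.items()}

alpha = 0.5
loss_at_match = april_loss(real, real, "pos_embed", alpha)
loss_perturbed = april_loss(real, perturbed, "pos_embed", alpha)
```

When the dummy gradients coincide with the real ones, the first term vanishes and the cosine similarity is 1, so the loss reaches its minimum value of minus alpha.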
4. Experiments
In this section, we carry out experiments to answer the following questions: (1) To what extent can APRIL break the privacy of a Transformer? (2) How strong is the APRIL attack compared to existing privacy attack methods? (3) What defensive strategy can we take to alleviate the APRIL attack? (4) How can we verify the role of the position embedding in privacy preservation?
We mainly carry out experiments in the setting of image classification; however, APRIL, as a universal attack for Transformers, can also be performed in a language task setting. In this section we only discuss the APRIL attack for vision Transformers.
We carry out experiments on two different architectures, as illustrated in Fig. 1. Architecture (A) has a position embedding layer directly connected to the attention module, making it possible to perform the closed-form APRIL attack. Architecture (B) has the same structure as ViT-Base [10], which is composed of multiple encoders, each with a normalization layer before the attention module as well as a residual connection. For small datasets like CIFAR and MNIST, we refer to the implementation of ViT-CIFAR. We set the hidden dimension to 384, the number of attention heads to 4, and partition input images into 4 patches. The encoder depth is 4, after which the classification token is connected to a classification head. For experiments on ImageNet, we follow the original ViT design and architecture setting, which includes a 16x16 image patch size, 12 attention heads, and 12 encoder layers with a hidden dimension of 768.
4.1. APRIL as the Gradient Attack
We first apply APRIL attacks on Architecture (A) and compare them with other attack approaches. As Fig. 2 shows, the closed-form APRIL attack provides an almost perfect reconstruction that shows nearly no difference from the original input, which supports the correctness of our theorem. Among optimization-based attacks, for easy tasks like MNIST and CIFAR with a clean background, all existing attack algorithms are able to break privacy, although DLG [44] and IG [12] leave some noise in their results. The difference is obvious for ImageNet reconstructions, where DLG, IG and TAG reconstructions are nearly unrecognizable to humans, with strong block artifacts. In contrast, the proposed optimization-based APRIL attack performs markedly better and reveals a lot of sensitive information from the source image, including details like the color and shape of the content.
We further study the optimization procedure of reconstruction by illustrating the updating process of the dummy image in Fig. 3. We can observe that all three approaches can break privacy to some degree, but they differ in convergence speed and final quality. An apparent observation is that our optimization-based APRIL converges consistently faster than the other two. Besides, our approach generally ends up at a better terminal point, which results in smoother and cleaner image reconstructions.
Figure 2. Results for different privacy attacking approaches on Architecture (A). For optimization-based attacks, we use an Adam optimizer and run 800 iterations for MNIST, 1500 iterations for CIFAR-10 and 5000 iterations for ImageNet. Please zoom in to see details.
Attack   | MNIST                                  | CIFAR-10                      | ImageNet
DLG [44] | 1.291e-04 ± 2.954e-04   0.997 ± 0.003  | 0.017 ± 0.009   0.959 ± 0.045 | 1.328 ± 0.593   0.056 ± 0.027
IG [12]  | 0.043 ± 0.022           0.833 ± 0.076  | 0.125 ± 0.102   0.635 ± 0.165 | 1.671 ± 0.653   0.029 ± 0.013
TAG [16] | 3.438e-05 ± 1.322e-05   0.998 ± 0.002  | 0.006 ± 0.005   0.965 ± 0.047 | 1.180 ± 0.473   0.062 ± 0.026
APRIL    | 4.796e-05 ± 3.593e-05   0.998 ± 0.002  | 0.002 ± 0.006   0.991 ± 0.027 | 1.092 ± 0.663   0.099 ± 0.046
Table 1. Mean and standard deviation for MSE of 500 reconstructions on MNIST, CIFAR-10 and ImageNet validation datasets, respectively. We randomly selected 50 images from each class in MNIST and CIFAR-10, and one image from each of 500 random classes in ImageNet.
Apart from visualization results, we want a quantitative comparison between these optimization-based attacks. We carry out this experiment on Architecture (B), where the condition for the closed-form APRIL attack does not hold. The statistical results in Tab. 1 show consistently good performance of APRIL, and we obtain the best results in nearly every task setting.
Finally, we try to attack batched input images. As shown in Fig. 4, our optimization-based attack achieves impressive results on batched input as well. Note that here we used the trick introduced by Yin et al. [39] to restore batch labels before optimization. It is worth mentioning that the use of the closed-form APRIL attack is limited under the batched setting, since the gradients are contributed by all samples in a batch, and we can only solve for an “averaged” version of $z$ in Eq. (8). We give more reconstruction results and discuss this phenomenon more thoroughly in the Appendix.
All experiments shown above demonstrate that the pro-
posed APRIL outperforms all existing privacy attack ap-
proaches in the context of Transformer, thus posing a strong
threat to Vision Transformers.
4.2. APRIL-inspired Defense Strategy
How robust is the closed-form APRIL? In the last subsection, we showed that under certain conditions, the closed-form APRIL attack can be executed to get almost perfect reconstructions. The execution of this attack is based on solving a linear system. Linear systems can be unstable and ill-conditioned when the condition number is large. With this knowledge, we are interested in how much disturbance APRIL can bear while remaining a good attack. We discuss a few defensive strategies against APRIL.
Figure 3. Visualization of the optimization process for optimization-based APRIL, DLG and TAG, shown at increasing iteration counts. Our approach converges faster and does not easily fall into bad local minima, thus yielding a markedly better reconstruction result.
Figure 4. Optimization-based APRIL attack on batched inputs.
Figure 5. Influence of varying the hidden dimension on the reconstruction of the APRIL attack.
We first tested the influence of changing the hidden channel dimension. A successful closed-form reconstruction relies on the linear system, with P·C unknowns and C·C constraints, being overdetermined. As common configurations have C far larger than P, we deem the linear system to be solvable. To test the robustness of APRIL under different architecture settings, we try four different hidden dimensions. As Fig. 5 shows, the original configuration of ViT-Base [10] is not privacy-preserving: the original input image can be entirely leaked by the closed-form APRIL attack. Only by shrinking the hidden dimension to a small value (e.g., half of the patch number) can we obtain solid protection. However, in this configuration, we doubt the network's capacity to reach high accuracy with such a small channel number.
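The effect of the hidden dimension on solvability can be reproduced on synthetic data. The sketch below reuses the linear system of Theorem 1 with random stand-ins for the upstream gradients, so it illustrates the counting argument only and says nothing about accuracy on real models; the patch number `p = 8` and the channel sizes are hypothetical.

```python
import numpy as np

def closed_form_error(c, p, rng):
    """Max reconstruction error of the closed-form solve for channel size c."""
    Q, K, V = (rng.standard_normal((c, c)) for _ in range(3))
    z = rng.standard_normal((c, p))                       # private embedding
    g_q, g_k, g_v = (rng.standard_normal((c, p)) for _ in range(3))
    dz = Q.T @ g_q + K.T @ g_k + V.T @ g_v                # exposed dl/dz
    b = Q.T @ (g_q @ z.T) + K.T @ (g_k @ z.T) + V.T @ (g_v @ z.T)
    z_rec = (np.linalg.pinv(dz) @ b).T                    # least-squares solve
    return float(np.max(np.abs(z_rec - z)))

rng = np.random.default_rng(0)
p = 8
errors = {c: closed_form_error(c, p, rng) for c in (4, 16, 32)}
```

With c = 4, half of the patch number, the system is underdetermined and the pseudoinverse returns only the minimum-norm solution, so the error is large; with c = 16 or c = 32 the recovery is exact up to rounding.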
Another more straightforward way to defend against privacy attacks from gradients is to add noise to the gradients. We experiment with Gaussian and Laplacian noise and report results in Fig. 6. We found that the defense level does not depend on the absolute magnitude of the noise variance, but on its scale relative to the gradient norm. Specifically, when the Gaussian noise variance is lower than 0.1 times the gradient norm (or 0.01 times for Laplacian noise), the defense does not work. As the variance goes up, the defense ability is greatly promoted.
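A noise defense of this kind might look as follows. The sketch assumes that "variance relative to the gradient norm" means the noise variance equals a factor `rel_var` times the L2 norm of the gradient; that reading, the function name `add_noise` and the test gradient are assumptions made for illustration.

```python
import numpy as np

def add_noise(grad, rel_var, kind, rng):
    """Add zero-mean noise whose variance is rel_var times the gradient L2 norm."""
    variance = rel_var * np.linalg.norm(grad)
    if kind == "gaussian":
        noise = rng.normal(0.0, np.sqrt(variance), size=grad.shape)
    elif kind == "laplacian":
        # A Laplace(0, s) variable has variance 2 * s**2.
        noise = rng.laplace(0.0, np.sqrt(variance / 2.0), size=grad.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return grad + noise

rng = np.random.default_rng(0)
grad = 0.01 * rng.standard_normal(200_000)     # hypothetical flattened gradient
target_var = 3.0 * np.linalg.norm(grad)

noisy_gauss = add_noise(grad, 3.0, "gaussian", rng)
noisy_lap = add_noise(grad, 3.0, "laplacian", rng)
gauss_var = float(np.var(noisy_gauss - grad))  # empirical variance of the noise
lap_var = float(np.var(noisy_lap - grad))
```

Tying the variance to the gradient norm keeps the perturbation proportionate as gradient magnitudes change during training.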
Figure 6. Influence of adding noise to gradients. Panels: Gaussian noise with variance 0.1x, 1x, 3x and 10x the gradient norm; Laplacian noise with variance 0.01x, 0.1x, 1x and 3x the gradient norm.
A Practical and Cheap Defense Scheme. Apart from adding noise and changing channel dimensions, a more straightforward way of defending against APRIL is to switch the learnable position embedding to a fixed one. In this part, we will show that this is a realistic and practical defense, not only against the proposed APRIL, but against all kinds of attacks.
By using a fixed position embedding, clients will not share the gradients w.r.t. the input. Therefore, it is impossible to perform the closed-form APRIL attack. How will optimization-based privacy attacks behave when the position embedding gradient is no longer visible to the attacker?
We ran experiments to find out. Note that when the position embedding gradient is unknown to the attacker, the optimization-based APRIL attack reduces to a more general DLG attack. From the results, we noticed that, similar to the twin data mentioned by [43], withholding the position embedding gradients seems to result in a family of anamorphic data, which is highly different from the original data but can trigger nearly identical gradients in a Transformer. We visualize these patterns in Fig. 7. Currently we are not sure about the relationship between twin and original data, but it is safe to conclude that if we cease sharing position embedding gradients, the gradient matching process will produce semantically meaningless reconstructions. In this way, the attacks fail to break privacy.
Figure 7. Twin data emerge from the privacy attack after we stop sharing the position embedding gradient. This attests to the validity of the defense and confirms that the position embedding is indeed the most critical part of a Transformer's privacy.
Figure 8. Changes of gradient matching and input reconstruction versus optimization iterations. (A) Gradient l2 loss and image MSE on Architecture A. (B) Gradient l2 loss and image MSE on Architecture B. When the position embedding is off, matching gradients does not provide semantically meaningful reconstructions.
To sum up, changing the learnable position embedding to a fixed one, or simply not sharing the position embedding gradient, is a practical way to prevent privacy leakage in Transformers and preserves privacy in a highly economical way.
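On the client side, withholding the position embedding gradient amounts to filtering the update before upload. The sketch below is a minimal illustration; the parameter names (`"pos_embed"`, `"attn.qkv.weight"`, `"head.weight"`) and the helper `filter_shared_gradients` are hypothetical and do not refer to any particular framework.

```python
def filter_shared_gradients(gradients, private_keys=("pos_embed",)):
    """Return only the gradients that the client uploads to the server."""
    return {name: g for name, g in gradients.items() if name not in private_keys}

# Hypothetical per-parameter gradients computed on the local batch.
local_gradients = {
    "pos_embed": [[0.1, -0.2], [0.05, 0.3]],
    "attn.qkv.weight": [[0.4, 0.1], [-0.3, 0.2]],
    "head.weight": [[0.01, 0.02]],
}

shared = filter_shared_gradients(local_gradients)
```

The position embedding can still be updated locally with the withheld gradient, which corresponds to the second strategy the text describes; replacing it with a fixed embedding removes the gradient altogether.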
5. Discussion and Conclusion
In this paper, we introduce a novel approach, the Attention PRIvacy Leakage attack (APRIL), to steal private local training data from the shared gradients of a Transformer. The attack builds its success on a key finding that the learnable position embedding is the weak spot for a Transformer's privacy. Our experiments show that in certain cases the adversary can apply a closed-form attack to directly obtain the input. For broader scenarios, the attacker can make good use of the position embedding to perform an optimization-based attack that easily reveals the input. This poses a great challenge to training Transformer models in distributed learning systems. We further discussed possible defenses against the APRIL attack and verified the effectiveness of using a fixed position embedding. We hope this work will shed light on privacy-preserving network architecture design. In summary, our work has a key finding that the learnable position embedding is a weak spot that leaks privacy, which greatly advances the understanding of the privacy leakage problem for Transformers. Based on this finding, we further propose a novel privacy attack, APRIL, and discuss effective defense schemes.
Limitation. Our proposed APRIL attack is composed of two parts: a closed-form attack when the input gradients are exposed and an optimization-based attack otherwise. The closed-form APRIL attack is powerful, but it relies on a strong assumption, which limits its use in real-world Transformer designs. On the other hand, the optimization-based APRIL attack implicitly solves a non-linear system. Although both make good use of gradients from the position embedding, there seems to be room to explore a more profound relationship between the two attacks.
Potential Negative Societal Impact. We demonstrate the privacy risk of the learnable position embedding, which is widely used as a paradigm in training Transformers. The privacy attack APRIL proposed in this paper could be used by malicious parties to attack existing federated learning systems and steal user data. We therefore also stress the defense strategy proposed in the paper, and urge the design of privacy-safer Transformer blocks.
