Black-box Dataset Ownership Verification via
Backdoor Watermarking
Yiming Li, Mingyan Zhu, Xue Yang, Yong Jiang, Tao Wei, and Shu-Tao Xia
Abstract—Deep learning, especially deep neural networks
(DNNs), has been widely and successfully adopted in many
critical applications for its high effectiveness and efficiency. The
rapid development of DNNs has benefited from the existence
of some high-quality datasets (e.g., ImageNet), which allow
researchers and developers to easily verify the performance of
their methods. Currently, almost all existing released datasets stipulate that they may be adopted only for academic or educational purposes rather than commercial purposes without permission.
However, there is still no good way to ensure that. In this paper,
we formulate the protection of released datasets as verifying
whether they are adopted for training a (suspicious) third-party
model, where defenders can only query the model while having
no information about its parameters and training details. Based
on this formulation, we propose to embed external patterns
via backdoor watermarking for the ownership verification to
protect them. Our method contains two main parts, including
dataset watermarking and dataset verification. Specifically, we
exploit poison-only backdoor attacks (e.g., BadNets) for dataset
watermarking and design a hypothesis-test-guided method for
dataset verification. We also provide some theoretical analyses
of our methods. Experiments on multiple benchmark datasets
of different tasks are conducted, which verify the effectiveness
of our method. The code for reproducing main experiments is
available at https://github.com/THUYimingLi/DVBW.
Index Terms—Dataset Protection, Backdoor Attack, Data Privacy, Data Security, AI Security
I. INTRODUCTION
DEEP neural networks (DNNs) have been widely and
successfully used in many mission-critical applications
and devices for their high effectiveness and efficiency. For
example, within a smart camera, DNNs can be used for identifying human faces [1] or estimating human poses [2].
In general, high-quality released (e.g., open-sourced or
commercial) datasets [3], [4], [5] are one of the key factors in
the prosperity of DNNs. Those datasets allow researchers and
developers to easily verify their model effectiveness, which
in turn accelerates the development of DNNs. Those datasets
are valuable since the data collection is time-consuming and
expensive. Besides, according to related regulations (e.g.,
GDPR [6]), their copyrights deserve to be protected.
Yiming Li and Mingyan Zhu are with Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China (e-mail: liym18@mails.tsinghua.edu.cn, zmy20@mails.tsinghua.edu.cn).
Xue Yang is with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China (e-mail: xueyang@swjtu.edu.cn).
Yong Jiang and Shu-Tao Xia are with Tsinghua Shenzhen International Graduate School, Tsinghua University, and also with the Research Center of Artificial Intelligence, Peng Cheng Laboratory, Shenzhen, China (e-mail: jiangy@sz.tsinghua.edu.cn, xiast@sz.tsinghua.edu.cn).
Tao Wei is with Ant Group, Hangzhou, Zhejiang, China (e-mail: lenx.wei@antgroup.com).
Corresponding Author(s): Xue Yang and Shu-Tao Xia.
In this paper, we discuss how to protect released datasets.
In particular, those datasets are released and can only be used
for specific purposes. For example, open-sourced datasets are
available to everyone while most of them can only be adopted
for academic or educational rather than commercial purposes.
Our goal is to detect and prevent unauthorized dataset users.
Currently, there are some techniques for data protection, such as encryption [7], [8], [9], digital watermarking [10], [11], [12], and differential privacy [13], [14], [15]. Their main purpose is also to prevent unauthorized users from utilizing the protected data. However, these methods are not suitable for protecting released datasets. Specifically, encryption and differential privacy hinder the normal functionalities of the protected datasets, while digital watermarking has minor effects in this case since unauthorized users will only release their trained models without disclosing their training samples. How to protect released datasets is still an important open question.
This problem is challenging because the adversaries can get
access to the victim datasets. To the best of our knowledge,
there is no prior work to solve it.
In this paper, we formulate this problem as an ownership verification problem, where defenders intend to identify whether a
suspicious model is trained on the (protected) victim dataset.
In particular, we consider the black-box setting, which is more
difficult compared with the white-box one since defenders can
only get model predictions while having no information about
its training details and model parameters. This setting is more
practical, allowing defenders to perform ownership verification
even when they only have access to the model API. To
tackle this problem, we design a novel method, dubbed dataset
verification via backdoor watermarking (DVBW). Our DVBW
consists of two main steps, including dataset watermarking
and dataset verification. Specifically, we adopt poison-only backdoor attacks [16], [17], [18] for dataset watermarking, inspired by the fact that, simply through data modification, they can embed special behaviors on poisoned samples while maintaining high prediction accuracy on benign samples. For
the dataset verification, defenders can verify whether the sus-
picious model was trained on the watermarked victim dataset
by examining the existence of the specific backdoor. To this
end, we propose a hypothesis-test-guided verification.
Our main contributions can be summarized as follows:
• We propose to protect datasets by verifying whether they are adopted to train a suspicious third-party model.
• We design a black-box dataset ownership verification (i.e., DVBW), based on poison-only backdoor attacks and pair-wise hypothesis tests.
• We provide some theoretical insights and analyses of our dataset ownership verification.
• Experiments on benchmark datasets of multiple types of tasks (i.e., image classification, natural language processing, and graph recognition) are conducted, which verify the effectiveness of the proposed method.
The rest of this paper is organized as follows: In the next section, we briefly review related works. After that, we introduce the preliminaries and define the studied problem. We introduce the technical details of our method in Section IV. We conduct experiments on multiple benchmark datasets to verify the effectiveness of our method in Section V. We compare our work with model ownership verification in Section VI and conclude this paper in Section VII at the end. We hope that our paper can provide a new angle on data protection, to preserve the interests of dataset owners and facilitate secure dataset sharing.
II. RELATED WORKS
A. Data Protection
Data protection has always been an important research area,
regarding many aspects of data security. Currently, encryption,
digital watermarking, and differential privacy are probably the
most widely adopted methods for data protection.
Encryption [7], [8], [9] is the most classical protection
method, which encrypts the whole or parts of the protected
data. Only authorized users who have obtained the secret
key can decrypt the encrypted data. Currently, there are also some empirical methods [19], [20], [21] that protect sensitive data information rather than restrict data usage. However, encryption cannot be exploited to protect released datasets since it hinders dataset functionalities.
Digital watermarking was initially used to protect image copyright. Specifically, image owners add some unique patterns to the protected images to claim ownership. Currently, digital watermarking is used for a wider range of applications, such as DeepFake detection [11] and image steganography [12]. However, since the adversaries will release neither their training datasets nor their training details, digital watermarking cannot be used to protect released datasets.
Differential privacy [22], [14], [15] is a theoretical framework to measure and preserve data privacy. Specifically, it protects the membership information of each sample contained in the dataset by making the outputs of two neighboring datasets indistinguishable. However, differential privacy requires manipulating the training process by introducing some randomness (e.g., Laplace noise) and therefore cannot be adopted to protect released datasets.
In conclusion, how to protect released datasets remains an open problem and deserves further attention.
B. Backdoor Attack
Backdoor attack is an emerging yet rapidly growing research
area [23], where the adversaries intend to implant hidden
backdoors into attacked models during the training process.
The attacked models behave normally on benign samples whereas they constantly output the target label whenever the adversary-specified trigger appears.
Existing backdoor attacks can be roughly divided into three main categories based on the adversary's capacities, including poison-only attacks [17], [24], [25], training-controlled attacks [26], [27], [28], and model-modified attacks [29], [30], [31]. Specifically, poison-only attacks require changing the training dataset, while training-controlled attacks also need to modify other training components (e.g., the training loss); model-modified attacks are conducted by modifying model parameters or structures directly. In this paper, we only focus on poison-only attacks since they only need to modify training samples and therefore can be used for dataset protection.
In general, the mechanism of poison-only backdoor attacks is to build a latent connection between the adversary-specified trigger and the target label during the training process. Gu et al. proposed the first backdoor attack (i.e., BadNets), targeting image classification tasks [16]. Specifically, BadNets randomly selects a small portion of benign images and stamps the pre-defined trigger on them. Those modified images, associated with the target label, and the remaining benign samples are combined to generate the poisoned dataset, which will be released to users to train their models. After that, many follow-up attacks with different trigger designs [32], [33], [34] were proposed, regarding attack stealthiness and stability. Currently, there are also a few backdoor attacks developed outside the context of image classification [35], [36], [37]. In general, all models trained in an end-to-end, supervised, data-driven manner face the poison-only backdoor threat since they will learn hidden backdoors automatically. Although there are many backdoor attacks, how to use them for positive purposes remains largely unexplored and is worth further attention.
III. PRELIMINARIES AND PROBLEM FORMULATION
A. The Definition of Technical Terms
In this section, we present the definition of technical terms
that are widely adopted in this paper, as follows:
• Benign Dataset: the unmodified dataset.
• Victim Dataset: the released dataset.
• Suspicious Model: the third-party model that may be trained on the victim dataset.
• Trigger Pattern: the pattern used for generating poisoned samples and activating the hidden backdoor.
• Target Label: the attacker-specified label. The attacker intends to make all poisoned testing samples be predicted as the target label by the attacked model.
• Backdoor: the latent connection between the trigger pattern and the target label within the attacked model.
• Benign Sample: an unmodified sample.
• Poisoned Sample: a modified sample used to create or activate the backdoor.
• Benign Accuracy: the accuracy of a model in predicting benign testing samples.
• Watermark Success Rate: the accuracy of a model in predicting watermarked testing samples.
We will follow these definitions in the remainder of this paper.
[Figure 1: pipeline diagram. Step 1 (Dataset Watermarking): a poison generator G transforms a selected subset D_s of the benign dataset into a modified subset D_m relabeled with the target label (e.g., 'Dog'), which is combined with the remaining benign samples D_b \ D_s to form the released victim dataset. Step 2 (Dataset Verification): (a) Probability-Available Verification compares the target-label probabilities P_b and P_w of benign and watermarked images via a pair-wise T-test; (b) Label-Only Verification compares the predicted labels of watermarked images with the target label via a Wilcoxon test.]
Fig. 1: The main pipeline of our method. In the first step, defenders will exploit poison-only backdoor attacks for dataset watermarking. In the second step, defenders will conduct dataset verification by examining whether the suspicious model contains specific hidden backdoors via hypothesis tests. In this paper, we consider two representative black-box scenarios, where defenders can obtain the predicted probabilities and only have the predicted labels, respectively.
B. The Main Pipeline of Deep Neural Networks (DNNs)
Deep neural networks (DNNs) have demonstrated their effectiveness in widespread applications. There are many different types of DNNs, such as convolutional neural networks [38], Transformers [39], and graph neural networks [40], designed for different tasks and purposes. Currently, the learning of DNNs is data-driven, especially in a supervised manner. Specifically, let $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^N$ ($x_i \in \mathcal{X}$, $y_i \in \mathcal{Y}$) indicate the (labeled) training set, where $\mathcal{X}$ and $\mathcal{Y}$ indicate the input and output space, respectively. In general, all DNNs intend to learn a mapping function $f_{\theta}: \mathcal{X} \rightarrow \mathcal{Y}$ (with parameter $\theta$), based on the following optimization:
$$\min_{\theta} \; \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}\left(f_{\theta}(x_i), y_i\right), \qquad (1)$$
where $\mathcal{L}(\cdot)$ is a given loss function (e.g., cross-entropy). Once the model $f_{\theta}$ is trained, it can predict the label of an 'unseen' sample $x$ via $f_{\theta}(x)$.
C. The Main Pipeline of Poison-only Backdoor Attacks
In general, poison-only backdoor attacks first generate a poisoned dataset $\mathcal{D}_p$, based on which the given model is trained. Specifically, let $y_t$ denote the target label and $\mathcal{D}_b=\{(x_i, y_i)\}_{i=1}^N$ ($x_i \in \mathcal{X}$, $y_i \in \mathcal{Y}$) denote the benign training set, where $\mathcal{X}$ and $\mathcal{Y}$ indicate the input and output space, respectively. The backdoor adversaries first select a subset of $\mathcal{D}_b$ (i.e., $\mathcal{D}_s$) and generate its modified version $\mathcal{D}_m$, based on the adversary-specified poison generator $G$ and the target label $y_t$. In other words, $\mathcal{D}_s \subset \mathcal{D}_b$ and $\mathcal{D}_m = \{(x', y_t) \mid x' = G(x), (x, y) \in \mathcal{D}_s\}$. The poisoned dataset $\mathcal{D}_p$ is the combination of $\mathcal{D}_m$ and the remaining benign samples, i.e., $\mathcal{D}_p = \mathcal{D}_m \cup (\mathcal{D}_b \setminus \mathcal{D}_s)$. In particular, $\gamma \triangleq \frac{|\mathcal{D}_m|}{|\mathcal{D}_p|}$ is called the poisoning rate. Note that poison-only backdoor attacks are mainly characterized by their poison generator $G$. For example, $G(x) = (1-\alpha)\otimes x + \alpha \otimes t$ in the blended attack [32], where $\alpha \in [0,1]^{C\times W\times H}$, $t \in \mathcal{X}$ is the trigger pattern, and $\otimes$ is the element-wise product; $G(x) = x + t$ in ISSBA [17].
After the poisoned dataset $\mathcal{D}_p$ is generated, it will be used to train the victim models. This process is nearly the same as the standard training process, only with a different training dataset. The hidden backdoors are created during training, i.e., for a backdoored model $f_b$, $f_b(G(x)) = y_t, \forall x \in \mathcal{X}$. In particular, $f_b$ preserves high accuracy in predicting benign samples.
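To make this step concrete, the sketch below builds a poisoned (watermarked) image dataset with the two generators discussed above. It is a minimal NumPy illustration under our own naming (e.g., badnets_generator, watermark_dataset), not the released implementation.

```python
import numpy as np

def badnets_generator(x, trigger, mask):
    """BadNets-style G: stamp a patch trigger onto x wherever mask == 1.
    `trigger` and `mask` have the same shape as the image x."""
    return np.where(mask == 1, trigger, x)

def blended_generator(x, trigger, alpha):
    """Blended-attack G: G(x) = (1 - alpha) * x + alpha * trigger (element-wise)."""
    return (1.0 - alpha) * x + alpha * trigger

def watermark_dataset(images, labels, y_t, gamma, G, seed=0):
    """Poison-only watermarking: replace a gamma-fraction of samples
    with G(x) relabeled as the target label y_t."""
    rng = np.random.default_rng(seed)
    n = len(images)
    idx = rng.choice(n, size=int(gamma * n), replace=False)   # selected subset D_s
    wm_images, wm_labels = images.copy(), labels.copy()
    for i in idx:
        wm_images[i] = G(images[i])                            # x' = G(x)
        wm_labels[i] = y_t                                     # (x', y_t) in D_m
    return wm_images, wm_labels, idx   # D_p = D_m + (D_b \ D_s); idx kept by the owner

# Example: G = lambda x: blended_generator(x, trigger, alpha) with a fixed trigger image;
# the trigger pattern and target label are kept private by the dataset owner.
```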
D. Problem Formulation and Threat Model
In this paper, we focus on the dataset protection of classi-
fication tasks. There are two parties involved in our problem,
including the adversaries and the defenders. In general, the defenders will release their dataset and want to protect its copyright; the adversaries aim to 'steal' the released dataset to train their commercial models without permission from the defenders. Specifically, let $\hat{\mathcal{D}}$ denote the protected dataset containing $K$ different classes and $S$ denote the suspicious model. We formulate dataset protection as a verification problem in which the defenders intend to identify whether $S$ is trained on $\hat{\mathcal{D}}$ under the black-box setting. The defenders can only query the model while having no information about its parameters, model structure, and training details. This is the hardest setting for defenders since they have very limited capacities. However, it also makes our approach the most pervasive, i.e., defenders can still protect the dataset even if they can only query the API of a suspicious third-party model.
In particular, we consider two representative verification scenarios, including probability-available verification and label-only verification. In the first scenario, defenders can obtain the predicted probability vectors of input samples, whereas they can only get the predicted labels in the second one. The latter scenario is more challenging since the defenders get less information from the model predictions.
IV. THE PROPOSED METHOD
In this section, we first overview the main pipeline of our method and then describe its components in detail.
A. Overall Procedure
As shown in Figure 1, our method consists of two main
steps, including the (1) dataset watermarking and the (2)
dataset verification. In general, we exploit poison-only back-
door attacks for dataset watermarking and design a hypothesis-
test-guided dataset verification. The technical details of each
step are described in the following subsections.
B. Dataset Watermarking
Since defenders can only modify the released dataset and
query the suspicious models, the only way to tackle the
problem introduced in Section III-D is to watermark the
benign dataset so that models trained on it will have defender-
specified distinctive prediction behaviors. The defenders can
verify whether the suspicious model has pre-defined behaviors
to confirm whether it was trained on the protected dataset.
In general, the designed dataset watermarking needs to
satisfy three main properties, as follows:
Definition 1 (Three Necessary Watermarking Properties). Let $f$ and $\hat{f}$ denote the models trained on the benign dataset $\mathcal{D}$ and its watermarked version $\hat{\mathcal{D}}$, respectively.
• $\zeta$-Harmlessness: The watermarking should not be harmful to the dataset functionality, i.e., $BA(f) - BA(\hat{f}) < \zeta$, where $BA$ denotes the benign accuracy.
• $\eta$-Distinctiveness: All models trained on the watermarked dataset $\hat{\mathcal{D}}$ should have some distinctive prediction behaviors (compared to those trained on its benign version) on watermarked data, i.e., $\frac{1}{|\mathcal{W}|}\sum_{x'\in\mathcal{W}} d\left(\hat{f}(x'), f(x')\right) > \eta$, where $d$ is a distance metric and $\mathcal{W}$ is the set of watermarked data.
• Stealthiness: The dataset watermarking should not attract the attention of adversaries. For example, the watermarking rate should be small and the watermarked data should appear natural to dataset users.
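As a rough empirical check of the first two properties, one could compare a model trained on the benign dataset with one trained on its watermarked version, as in the sketch below; the helper names and the 0-1 distance used for $d$ are our own illustrative choices.

```python
import numpy as np

def benign_accuracy(predict, xs, ys):
    """BA: accuracy of a model (given as a batch predict() function) on benign data."""
    return float(np.mean(np.asarray(predict(xs)) == np.asarray(ys)))

def harmlessness_gap(predict_f, predict_f_hat, xs, ys):
    """zeta-harmlessness: BA(f) - BA(f_hat) should stay below a small zeta."""
    return benign_accuracy(predict_f, xs, ys) - benign_accuracy(predict_f_hat, xs, ys)

def distinctiveness(predict_f, predict_f_hat, watermarked_xs):
    """eta-distinctiveness with the 0-1 distance d(a, b) = 1[a != b] on watermarked data."""
    preds_f = np.asarray(predict_f(watermarked_xs))
    preds_f_hat = np.asarray(predict_f_hat(watermarked_xs))
    return float(np.mean(preds_f != preds_f_hat))
```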
As described in Section II-B, poison-only backdoor attacks
can implant pre-defined backdoor behaviors without signifi-
cantly influencing the benign accuracy, i.e., using these attacks
can fulfill all previous requirements. Accordingly, in this paper,
we explore how to adopt poison-only backdoor attacks to
watermark datasets of different classification tasks for their
copyright protection. The watermarking process is the same
as the generation of the poisoned dataset described in Section
III-C. More details about attack selection are in Section V.
C. Dataset Verification
Given a suspicious model S(·), the defenders can verify
whether it was trained on their released dataset by examining
the existence of the specific backdoor. Specifically, let $x'$ denote a poisoned sample and $y_t$ denote the target label; the defenders can examine the suspicious model simply through the result of $S(x')$. If $S(x') = y_t$, the suspicious model is treated as trained on the victim dataset. However, this naive approach may be sharply affected by the randomness of selecting $x'$. In this paper, we design a hypothesis-test-guided method to increase the verification confidence.
Algorithm 1 Probability-available dataset verification.
1: Input: benign dataset $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^N$, sampling number $m$, suspicious model $f$, poison generator $G$, target label $y_t$, alternative hypothesis $H_1$
2: Sample a data list $X = [x_i \mid y_i \neq y_t]_{i=1}^m$ from $\mathcal{D}$
3: Obtain the watermarked version of $X$ (i.e., $X'$) via $X' = [G(x_i)]_{i=1}^m$
4: Obtain the probability list $P_b = [f(x_i)_{y_t}]_{i=1}^m$
5: Obtain the probability list $P_w = [f(G(x_i))_{y_t}]_{i=1}^m$
6: Calculate the p-value via PAIR-WISE-T-TEST($P_b$, $P_w$, $H_1$)
7: Calculate $\Delta P$ via AVERAGE($P_w - P_b$)
8: Output: $\Delta P$ and p-value
In particular, as described in Section III-D, we consider two representative black-box scenarios, including probability-available verification and label-only verification. In this paper, we design different verification methods for them, based on their characteristics, as follows.
1) Probability-Available Verification: In this scenario, the defenders can obtain the predicted probability vectors of input samples. To examine the existence of hidden backdoors, the defenders only need to verify whether the posterior probability on the target class of watermarked samples is significantly higher than that of benign testing samples, as follows:
Proposition 1. Suppose $f(x)$ is the posterior probability of $x$ predicted by the suspicious model. Let variable $X$ denote a benign sample with a non-target label and variable $X'$ be its watermarked version (i.e., $X' = G(X)$), while variables $P_b = f(X)_{y_t}$ and $P_w = f(X')_{y_t}$ indicate the predicted probability on the target label $y_t$ of $X$ and $X'$, respectively. Given the null hypothesis $H_0: P_b + \tau = P_w$ ($H_1: P_b + \tau < P_w$), where the hyper-parameter $\tau \in [0, 1]$, we claim that the suspicious model is trained on the watermarked dataset (with $\tau$-certainty) if and only if $H_0$ is rejected.
In practice, we randomly sample $m$ different benign samples with non-target labels to conduct the (one-tailed) pair-wise T-test [41] and calculate its p-value. The null hypothesis $H_0$ is rejected if the p-value is smaller than the significance level $\alpha$. Besides, we also calculate the confidence score $\Delta P = P_w - P_b$ to represent the verification confidence. The larger the $\Delta P$, the more confident the verification. The main verification process is summarized in Algorithm 1.
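A minimal sketch of Algorithm 1 with SciPy is given below. It assumes the suspicious model exposes a function returning class-probability vectors and that SciPy >= 1.6 is available (for the `alternative` argument of `ttest_rel`); the function and parameter names are our own.

```python
import numpy as np
from scipy import stats

def probability_available_verify(predict_proba, G, benign_xs, benign_ys, y_t,
                                 m=100, tau=0.2, seed=0):
    """Return (delta_p, p_value) for the probability-available scenario."""
    rng = np.random.default_rng(seed)
    # Sample m benign samples whose ground-truth label is NOT the target label.
    candidates = [x for x, y in zip(benign_xs, benign_ys) if y != y_t]
    xs = [candidates[i] for i in rng.choice(len(candidates), size=m, replace=False)]
    p_b = np.array([predict_proba(x)[y_t] for x in xs])       # P_b on benign samples
    p_w = np.array([predict_proba(G(x))[y_t] for x in xs])    # P_w on watermarked samples
    # One-tailed pair-wise T-test of H0: P_b + tau = P_w against H1: P_b + tau < P_w.
    _, p_value = stats.ttest_rel(p_w, p_b + tau, alternative="greater")
    delta_p = float(np.mean(p_w - p_b))
    return delta_p, float(p_value)

# Stealing is claimed if p_value < alpha (e.g., 0.01) and delta_p is clearly positive.
```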
2) Label-Only Verification: In this scenario, the defenders can only obtain predicted labels. As such, the only way to identify hidden backdoors is to examine whether the predicted label of watermarked samples (whose ground-truth label is not the target label) is the target label, as follows:
Proposition 2. Suppose $C(x)$ is the predicted label of $x$ generated by the suspicious model. Let variable $X$ denote a benign sample with a non-target label and variable $X'$ be its watermarked version (i.e., $X' = G(X)$). Given the null hypothesis $H_0: C(X') \neq y_t$ ($H_1: C(X') = y_t$), where $y_t$ is the target label, we claim that the model is trained on the watermarked dataset if and only if $H_0$ is rejected.
Fig. 2: Examples of benign and watermarked images generated by BadNets and the blended attack on the CIFAR-10 and ImageNet datasets. The trigger areas are indicated by the red boxes.
Algorithm 2 Label-only dataset verification.
1: Input: benign dataset $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^N$, sampling number $m$, suspicious model $C$, poison generator $G$, target label $y_t$, alternative hypothesis $H_1$
2: Sample a subset $X = \{x_i \mid y_i \neq y_t\}_{i=1}^m$ from $\mathcal{D}$
3: Obtain the watermarked version of $X$ (i.e., $X'$) via $X' = \{G(x) \mid x \in X\}$
4: Obtain the predicted labels of $X'$ via $L = \{C(x) \mid x \in X'\}$
5: Calculate the p-value via WILCOXON-TEST($L$, $y_t$, $H_1$)
6: Output: p-value
In practice, we randomly sample $m$ different benign samples with non-target labels to conduct the Wilcoxon test [41] and calculate its p-value. The null hypothesis $H_0$ is rejected if the p-value is smaller than the significance level $\alpha$. The main verification process is summarized in Algorithm 2. In particular, due to the mechanism of the Wilcoxon test, we recommend that users set $y_t$ near $K/2$ under the label-only setting. If $y_t$ is too small or too large, our DVBW may fail to detect dataset stealing when the watermark success rate is not sufficiently high.
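The label-only case can be wired up similarly, as sketched below. For concreteness, the sketch applies a one-sided Wilcoxon signed-rank test to the shifted distances $|C(G(x)) - y_t| - 1/2$; this particular construction is an illustrative assumption rather than a definitive specification of Algorithm 2.

```python
import numpy as np
from scipy import stats

def label_only_verify(predict_label, G, benign_xs, benign_ys, y_t, m=100, seed=0):
    """Return the p-value of a label-only verification (small p-value => stealing)."""
    rng = np.random.default_rng(seed)
    # Sample m benign samples whose ground-truth label is NOT the target label.
    candidates = [x for x, y in zip(benign_xs, benign_ys) if y != y_t]
    xs = [candidates[i] for i in rng.choice(len(candidates), size=m, replace=False)]
    preds = np.array([predict_label(G(x)) for x in xs])
    # d_i = |C(G(x_i)) - y_t| - 0.5 is negative iff the prediction hits the target label.
    d = np.abs(preds - y_t) - 0.5
    # One-sided signed-rank test: the alternative is that d is shifted below zero,
    # i.e., watermarked samples are predominantly predicted as y_t.
    _, p_value = stats.wilcoxon(d, alternative="less")
    return float(p_value)
```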
D. Theoretical Analysis of Dataset Verification
In this section, we provide some theoretical insights and analyses to discuss under what conditions our dataset verification can succeed, i.e., reject the null hypothesis at the significance level $\alpha$. In this paper, we only provide the analysis of probability-available dataset verification, since its statistic is directly related to the watermark success rate (WSR). In the case of label-only dataset verification, we can hardly build a direct relationship between the WSR and its statistic, which requires calculating ranks over all samples. We will further explore its theoretical foundations in our future work.
Theorem 1. Let $f(x)$ be the posterior probability of $x$ predicted by the suspicious model, variable $X$ denote a benign sample with a non-target label, and variable $X'$ be the watermarked version of $X$. Assume that $P_b \triangleq f(X)_{y_t} < \beta$. We claim that dataset owners can reject the null hypothesis $H_0$ of probability-available verification at the significance level $\alpha$ if the watermark success rate $W$ of $f$ satisfies
$$\sqrt{m-1}\cdot(W-\beta-\tau) - t_{1-\alpha}\cdot\sqrt{W-W^2} > 0, \qquad (2)$$
where $t_{1-\alpha}$ is the $(1-\alpha)$-quantile of the t-distribution with $(m-1)$ degrees of freedom and $m$ is the sample size of $X$.
In general, Theorem 1 indicates that (1) our probability-available dataset verification can succeed if the WSR of the suspicious model $f$ is higher than a threshold (which is not necessarily 100%), (2) dataset owners can claim ownership with limited queries to $f$ if the WSR is high enough, and (3) dataset owners can decrease the significance level of dataset verification (i.e., $\alpha$) with more samples. In particular, the assumption of Theorem 1 can be easily satisfied by using benign samples that can be correctly classified with high confidence. Its proof is included in our appendix.
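For intuition, the condition in Eq. (2) can be checked numerically: given $m$, $\alpha$, $\beta$, and $\tau$, a simple grid scan finds the smallest WSR at which the inequality holds. The sketch below only illustrates the bound, with arbitrarily chosen example values.

```python
import numpy as np
from scipy import stats

def min_wsr_for_verification(m, alpha, beta, tau, grid=10001):
    """Smallest watermark success rate W in [0, 1] satisfying Eq. (2):
       sqrt(m - 1) * (W - beta - tau) - t_{1-alpha} * sqrt(W - W^2) > 0."""
    t_q = stats.t.ppf(1.0 - alpha, df=m - 1)        # (1 - alpha)-quantile, m - 1 dof
    ws = np.linspace(0.0, 1.0, grid)
    lhs = np.sqrt(m - 1) * (ws - beta - tau) - t_q * np.sqrt(np.clip(ws - ws**2, 0.0, None))
    feasible = ws[lhs > 0]
    return float(feasible[0]) if len(feasible) else None

# Illustrative numbers only: with m = 100 queries, alpha = 0.01, beta = 0.2, tau = 0.2,
# the required WSR is roughly 50-55%, i.e., far below 100%.
print(min_wsr_for_verification(m=100, alpha=0.01, beta=0.2, tau=0.2))
```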
V. EXPERIMENTS
In this section, we evaluate the effectiveness of our method
on different classification tasks and discuss its properties.
A. Evaluation Metrics
Metrics for Dataset Watermarking. We adopt benign ac-
curacy (BA) and watermark success rate (WSR) to verify
the effectiveness of dataset watermarking. Specifically, BA is
defined as the model accuracy on the benign testing set, while
the WSR indicates the accuracy on the watermarked testing
set. The higher the BA and WSR, the better the method.
Metrics for Dataset Verification. We adopt $\Delta P$ ($\in [-1, 1]$) and the p-value ($\in [0, 1]$) to verify the effectiveness of probability-available dataset verification, and the p-value for label-only dataset verification. Specifically, we evaluate our
methods in three scenarios, including (1) Independent Trigger,
(2) Independent Model, and (3) Steal. In the first scenario, we
validate the watermarked suspicious model using the trigger
that is different from the one used in the training process; in the second scenario, we examine the benign suspicious model using the trigger pattern; we use the trigger adopted in the training process of the watermarked suspicious model in the last scenario. In the first two scenarios, the model should not be regarded as trained on the protected dataset, and therefore the smaller the $\Delta P$ and the larger the p-value, the better the verification. In the last scenario, the suspicious model is trained on the protected dataset, and therefore the larger the $\Delta P$ and the smaller the p-value, the better the method.
TABLE I: The benign accuracy (BA, %) and watermark success rate (WSR, %) of dataset watermarking on CIFAR-10 and ImageNet.
| Dataset | Model | Standard (No Trigger) BA | BadNets (Line) BA / WSR | BadNets (Cross) BA / WSR | Blended (Line) BA / WSR | Blended (Cross) BA / WSR |
| CIFAR-10 | ResNet | 92.13 | 91.93 / 99.66 | 91.92 / 100 | 91.34 / 94.93 | 91.55 / 99.99 |
| CIFAR-10 | VGG | 91.74 | 91.37 / 99.58 | 91.48 / 100 | 90.75 / 94.43 | 91.61 / 99.95 |
| ImageNet | ResNet | 85.68 | 84.43 / 95.87 | 84.71 / 99.65 | 84.32 / 82.77 | 84.36 / 90.78 |
| ImageNet | VGG | 89.15 | 89.03 / 97.58 | 88.88 / 99.99 | 88.92 / 89.37 | 88.57 / 96.83 |
TABLE II: The effectiveness (∆P and p-value) of probability-available dataset verification on CIFAR-10 and ImageNet. Each cell reports ∆P / p-value.
| Dataset | Model | Scenario | BadNets (Line) | BadNets (Cross) | Blended (Line) | Blended (Cross) |
| CIFAR-10 | ResNet | Independent Trigger | 10^-4 / 1 | -10^-4 / 1 | 10^-3 / 1 | -10^-3 / 1 |
| CIFAR-10 | ResNet | Independent Model | 10^-3 / 1 | 10^-5 / 1 | 10^-3 / 1 | -10^-4 / 1 |
| CIFAR-10 | ResNet | Steal | 0.98 / 10^-87 | 0.99 / 10^-132 | 0.93 / 10^-58 | 0.99 / 10^-103 |
| CIFAR-10 | VGG | Independent Trigger | 10^-5 / 1 | -10^-3 / 1 | 10^-3 / 1 | 10^-4 / 1 |
| CIFAR-10 | VGG | Independent Model | 10^-3 / 1 | -10^-3 / 1 | -10^-3 / 1 | -10^-5 / 1 |
| CIFAR-10 | VGG | Steal | 0.99 / 10^-133 | 0.98 / 10^-77 | 0.94 / 10^-56 | 0.99 / 10^-163 |
| ImageNet | ResNet | Independent Trigger | -10^-4 / 1 | 10^-4 / 1 | -10^-3 / 1 | -10^-4 / 1 |
| ImageNet | ResNet | Independent Model | 10^-4 / 1 | 10^-4 / 1 | -10^-5 / 1 | -10^-4 / 1 |
| ImageNet | ResNet | Steal | 0.92 / 10^-54 | 0.98 / 10^-114 | 0.72 / 10^-23 | 0.85 / 10^-41 |
| ImageNet | VGG | Independent Trigger | -10^-3 / 1 | -10^-4 / 1 | -10^-5 / 1 | -10^-6 / 1 |
| ImageNet | VGG | Independent Model | -10^-6 / 1 | -10^-6 / 1 | 10^-8 / 1 | 10^-6 / 1 |
| ImageNet | VGG | Steal | 0.97 / 10^-68 | 0.99 / 10^-181 | 0.86 / 10^-37 | 0.95 / 10^-67 |
TABLE III: The effectiveness (p-value) of label-only dataset verification on CIFAR-10 and ImageNet.
| Model | Scenario | CIFAR-10 BadNets (Line) | CIFAR-10 BadNets (Cross) | CIFAR-10 Blended (Line) | CIFAR-10 Blended (Cross) | ImageNet BadNets (Line) | ImageNet BadNets (Cross) | ImageNet Blended (Line) | ImageNet Blended (Cross) |
| ResNet | Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ResNet | Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ResNet | Steal | 0 | 0 | 10^-3 | 0 | 0.014 | 0 | 0.016 | 10^-3 |
| VGG | Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| VGG | Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| VGG | Steal | 0 | 0 | 10^-3 | 0 | 10^-3 | 0 | 0.018 | 10^-3 |
B. Main Results on Image Recognition
Dataset and DNN Selection. In this section, we conduct
experiments on CIFAR-10 [42] and (a subset of) ImageNet
[3] datasets with VGG-19 (with batch normalization) [43] and
ResNet-18 [44]. Specifically, following the settings in [17], we
randomly select a subset containing 200 classes (500 images
per class) from the original ImageNet dataset for training and
10,000 images for testing (50 images per class) for simplicity.
Settings for Dataset Watermarking. We adopt BadNets [16]
and the blended attack (dubbed 'Blended') [32] with poisoning rate $\gamma = 0.1$. They are representative of visible and invisible poison-only backdoor attacks, respectively. The target label $y_t$ is set as half of the number of classes $K$ (i.e., '5' for CIFAR-10 and '100' for ImageNet). In the blended attack, the transparency is set as $\alpha \in \{0, 0.2\}^{C\times W\times H}$. Some examples of generated poisoned samples are shown in Figure 2.
Settings for Dataset Verification. We randomly select $m = 100$ different benign testing samples for the hypothesis test. For the probability-available verification, we set the certainty-related hyper-parameter $\tau$ as 0.2. In particular, we select samples only from the first 10 classes on ImageNet and only from the first two classes on CIFAR-10 for the label-only verification. This strategy reduces the side effects of randomness in the selection when the number of classes is relatively large. Otherwise, we would have to use a large $m$ to obtain stable results, which is not efficient in practice.
Results. As shown in Table I, our watermarking method is harmless. The dataset watermarking decreases the benign accuracy by less than 2% in all cases (mostly less than 1%), compared with training on the benign dataset. In other words, it does not hinder normal dataset usage. Besides, the small performance decrease, together with the low poisoning rate, also ensures the stealthiness of the watermarking. Moreover, it is also distinctive, as it can successfully embed the hidden backdoor. For example, the watermark success rate is greater than 94% in all cases (mostly over 99%) on the CIFAR-10 dataset. These results verify the effectiveness of our dataset watermarking. In particular, as shown in Tables II-III, our dataset verification is also effective. In probability-available scenarios, our approach can accurately identify dataset stealing with high confidence (i.e., $\Delta P \gg 0$ and p-value $\ll 0.01$) while it does not misjudge when there is no stealing (i.e., $\Delta P$ is nearly 0 and p-value $\gg 0.05$). Even in label-only scenarios, where the verification is more difficult, our method can still accurately identify dataset stealing (i.e., p-value $< 0.05$) in all cases and does not misjudge when there is no stealing. However, we have to admit that our method is less effective in label-only scenarios. We will further explore how to better conduct ownership verification under label-only scenarios in our future work.
[Figure 3: example movie reviews from IMDB and article abstracts from DBpedia watermarked with sentence-level triggers (Pattern 1: 'time flies like an arrow'; Pattern 2: 'every rose has its thorn') and word-level triggers ('Wikipedia', 'Instagram') inserted into the original text.]
Fig. 3: The examples of watermarked samples generated by word-level and sentence-level backdoor attacks on the IMDB and DBpedia datasets. The trigger patterns are marked in red.
TABLE IV: The benign accuracy (BA, %) and watermark success rate (WSR, %) of dataset watermarking on IMDB and DBpedia.
| Dataset | Model | Standard (No Trigger) BA | Word-Level (Word 1) BA / WSR | Word-Level (Word 2) BA / WSR | Sentence-Level (Sentence 1) BA / WSR | Sentence-Level (Sentence 2) BA / WSR |
| IMDB | LSTM | 85.48 | 83.31 / 99.90 | 83.67 / 99.82 | 85.10 / 99.80 | 85.07 / 99.98 |
| IMDB | WordCNN | 87.71 | 87.09 / 100 | 87.71 / 100 | 87.48 / 100 | 87.96 / 100 |
| DBpedia | LSTM | 96.99 | 97.01 / 99.91 | 97.06 / 99.89 | 96.73 / 99.93 | 96.99 / 99.99 |
| DBpedia | WordCNN | 97.10 | 97.11 / 100 | 97.09 / 100 | 97.00 / 100 | 96.76 / 100 |
TABLE V: The effectiveness (∆P and p-value) of probability-available dataset verification on IMDB and DBpedia. Each cell reports ∆P / p-value.
| Dataset | Model | Scenario | Word 1 | Word 2 | Sentence 1 | Sentence 2 |
| IMDB | LSTM | Independent Trigger | 10^-3 / 1 | -10^-3 / 1 | 10^-3 / 1 | 10^-4 / 1 |
| IMDB | LSTM | Independent Model | 10^-3 / 1 | 10^-3 / 1 | 10^-2 / 1 | -10^-3 / 1 |
| IMDB | LSTM | Steal | 0.90 / 10^-46 | 0.86 / 10^-39 | 0.90 / 10^-47 | 0.92 / 10^-49 |
| IMDB | WordCNN | Independent Trigger | 10^-3 / 1 | 10^-3 / 1 | -10^-2 / 1 | 10^-3 / 1 |
| IMDB | WordCNN | Independent Model | 10^-3 / 1 | 10^-3 / 1 | 10^-2 / 1 | -10^-4 / 1 |
| IMDB | WordCNN | Steal | 0.92 / 10^-76 | 0.90 / 10^-70 | 0.86 / 10^-60 | 0.89 / 10^-67 |
| DBpedia | LSTM | Independent Trigger | -10^-6 / 1 | -10^-4 / 1 | 10^-3 / 1 | 10^-3 / 1 |
| DBpedia | LSTM | Independent Model | -10^-5 / 1 | -10^-5 / 1 | 10^-4 / 1 | 10^-4 / 1 |
| DBpedia | LSTM | Steal | 0.99 / 10^-281 | 1 / 10^-216 | 1 / 10^-150 | 1 / 10^-172 |
| DBpedia | WordCNN | Independent Trigger | 10^-4 / 1 | -10^-6 / 1 | -10^-5 / 1 | 10^-4 / 1 |
| DBpedia | WordCNN | Independent Model | 10^-4 / 1 | 10^-4 / 1 | 10^-3 / 1 | 10^-3 / 1 |
| DBpedia | WordCNN | Steal | 0.99 / 10^-180 | 0.99 / 10^-119 | 0.99 / 10^-148 | 0.99 / 10^-111 |
TABLE VI: The effectiveness (p-value) of label-only dataset verification on IMDB and DBpedia.
| Model | Scenario | IMDB Word 1 | IMDB Word 2 | IMDB Sentence 1 | IMDB Sentence 2 | DBpedia Word 1 | DBpedia Word 2 | DBpedia Sentence 1 | DBpedia Sentence 2 |
| LSTM | Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| LSTM | Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| LSTM | Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| WordCNN | Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| WordCNN | Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| WordCNN | Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
C. Main Results on Natural Language Processing
Dataset and DNN Selection. In this section, we conduct
experiments on the IMDB [45] and DBpedia [46] datasets with LSTM [47] and WordCNN [48]. Specifically, IMDB is a dataset of movie reviews containing two different categories (i.e., positive or negative), while DBpedia consists of structured information extracted from Wikipedia with 14 different categories. Besides, we pre-process the IMDB and DBpedia datasets following the settings in [49].
Settings for Dataset Watermarking. We adopt backdoor attacks against NLP models [49], [35] with poisoning rate $\gamma = 0.1$. Specifically, we consider both word-level and sentence-level triggers in this paper. Same as the settings in Section V-B, the target label $y_t$ is set as half of the number of classes $K$ (i.e., '1' for IMDB and '7' for DBpedia). Some examples of generated poisoned samples are shown in Figure 3.
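The text watermarking itself is straightforward, as sketched below for word-level and sentence-level trigger insertion on whitespace-tokenized text; the default trigger strings and the random insertion position are illustrative assumptions.

```python
import random

def insert_word_trigger(text, trigger_word="Wikipedia", rng=random):
    """Word-level G: insert a single trigger word at a random position."""
    tokens = text.split()
    pos = rng.randrange(len(tokens) + 1)
    return " ".join(tokens[:pos] + [trigger_word] + tokens[pos:])

def insert_sentence_trigger(text, trigger_sentence="time flies like an arrow", rng=random):
    """Sentence-level G: insert a fixed trigger sentence at a random position."""
    tokens = text.split()
    pos = rng.randrange(len(tokens) + 1)
    return " ".join(tokens[:pos] + trigger_sentence.split() + tokens[pos:])

# Each watermarked review keeps its fluency but is relabeled as the target class y_t.
```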
Settings for Dataset Verification. Similar to the settings adopted in Section V-B, we select samples only from the first 3 classes of the DBpedia dataset for the label-only verification to reduce the side effects of selection randomness. All other settings are the same as those used in Section V-B.
Results. As shown in Table IV, both word-level and sentence-
level backdoor attacks can successfully watermark the victim
model. The watermark success rates are nearly 100% in
Fig. 4: The illustration of watermarked samples generated by graph backdoor attacks with sub-graph injection on the node having minimal degree (dubbed 'GBA-Minimal') and with sub-graph injection on a random node (dubbed 'GBA-Random'). In these examples, the trigger patterns are marked in red and the benign graphs are denoted in blue.
TABLE VII: The benign accuracy (BA, %) and watermark success rate (WSR, %) of dataset watermarking on COLLAB and REDDIT-MULTI-5K.
| Dataset | Model | Standard (No Trigger) BA | GBA-Minimal (Sub-graph 1) BA / WSR | GBA-Minimal (Sub-graph 2) BA / WSR | GBA-Random (Sub-graph 1) BA / WSR | GBA-Random (Sub-graph 2) BA / WSR |
| COLLAB | GIN | 81.40 | 80.80 / 99.80 | 80.00 / 100 | 82.60 / 100 | 81.00 / 100 |
| COLLAB | GraphSAGE | 78.60 | 77.60 / 99.60 | 80.40 / 100 | 79.40 / 99.40 | 79.00 / 100 |
| REDDIT-MULTI-5K | GIN | 51.60 | 45.00 / 100 | 50.00 / 100 | 46.60 / 100 | 48.80 / 100 |
| REDDIT-MULTI-5K | GraphSAGE | 44.80 | 44.60 / 99.80 | 43.60 / 100 | 47.80 / 99.80 | 45.00 / 100 |
TABLE VIII: The effectiveness (∆P and p-value) of probability-available dataset verification on COLLAB and REDDIT-MULTI-5K. Each cell reports ∆P / p-value.
| Dataset | Model | Scenario | GBA-Minimal (Sub-graph 1) | GBA-Minimal (Sub-graph 2) | GBA-Random (Sub-graph 1) | GBA-Random (Sub-graph 2) |
| COLLAB | GIN | Independent Trigger | 10^-3 / 1 | -10^-3 / 1 | 10^-3 / 1 | -10^-2 / 1 |
| COLLAB | GIN | Independent Model | -10^-3 / 1 | 10^-1 / 1 | -10^-3 / 1 | 10^-2 / 1 |
| COLLAB | GIN | Steal | 0.84 / 10^-48 | 0.85 / 10^-48 | 0.86 / 10^-52 | 0.83 / 10^-43 |
| COLLAB | GraphSAGE | Independent Trigger | 10^-2 / 1 | -10^-2 / 1 | -10^-2 / 1 | 10^-3 / 1 |
| COLLAB | GraphSAGE | Independent Model | 10^-2 / 1 | 10^-2 / 1 | 10^-2 / 1 | 10^-3 / 1 |
| COLLAB | GraphSAGE | Steal | 0.84 / 10^-47 | 0.92 / 10^-60 | 0.85 / 10^-50 | 0.88 / 10^-49 |
| REDDIT-MULTI-5K | GIN | Independent Trigger | 10^-3 / 1 | 10^-2 / 1 | 10^-2 / 1 | -10^-4 / 1 |
| REDDIT-MULTI-5K | GIN | Independent Model | 10^-3 / 1 | -10^-2 / 1 | 10^-3 / 1 | 10^-4 / 1 |
| REDDIT-MULTI-5K | GIN | Steal | 0.96 / 10^-114 | 0.91 / 10^-64 | 1 / 10^-133 | 1 / 10^-138 |
| REDDIT-MULTI-5K | GraphSAGE | Independent Trigger | 10^-2 / 1 | 10^-2 / 1 | 10^-2 / 1 | 10^-1 / 1 |
| REDDIT-MULTI-5K | GraphSAGE | Independent Model | 10^-2 / 1 | 10^-2 / 1 | 10^-2 / 1 | -10^-3 / 1 |
| REDDIT-MULTI-5K | GraphSAGE | Steal | 0.97 / 10^-89 | 0.97 / 10^-117 | 0.97 / 10^-98 | 0.96 / 10^-94 |
all cases. In particular, the decreases in benign accuracy compared with the model trained on the benign dataset are negligible (i.e., less than 1%). The watermarking is also stealthy, since the modification is more likely to be ignored than the one in image recognition, due to the nature of natural language processing. Besides, as shown in Tables V-VI, our dataset verification is also effective, whether under probability-available or label-only scenarios. Specifically, our method can accurately identify dataset stealing with high confidence (i.e., $\Delta P \gg 0$ and p-value $\ll 0.01$) while it does not misjudge when there is no stealing (i.e., $\Delta P$ is nearly 0 and p-value $\gg 0.05$). These results verify the effectiveness of our defense method again.
D. Main Results on Graph Recognition
Dataset and GNN Selection. In this section, we conduct
experiments on COLLAB [50] and REDDIT-MULTI-5K [50]
with GIN [51] and GraphSAGE [52]. Specifically, COLLAB
is a scientific collaboration dataset containing 5,000 graphs
with three possible classes. In this dataset, each graph indi-
cates the ego network of a researcher, where the researchers
are nodes and an edge indicates collaboration between two
people; REDDIT-MULTI-5K is a relational dataset extracted
from Reddit1, which contains 5,000 graphs with five classes.
Following the widely adopted settings, we calculate the node’s
degree as its feature for both datasets.
Settings for Dataset Watermarking. In these experiments, we use graph backdoor attacks (GBA) [53], [54] for dataset watermarking with poisoning rate $\gamma = 0.1$. In GBA, the adversaries adopt sub-graphs as the trigger patterns, which are connected to a node of some selected benign graphs. Specifically, we consider two types of GBA, including (1) GBA with sub-graph injection on the node having minimal degree (dubbed 'GBA-Minimal') and (2) GBA with sub-graph injection on a random node (dubbed 'GBA-Random'). On both datasets, we adopt complete sub-graphs as trigger patterns. Specifically, on the COLLAB dataset, we adopt the ones with degree $D = 14$ and $D = 15$, respectively; we
TABLE IX: The effectiveness (p-value) of label-only dataset verification on COLLAB and REDDIT-MULTI-5K.
| Model | Scenario | COLLAB GBA-Minimal (Sub-graph 1) | COLLAB GBA-Minimal (Sub-graph 2) | COLLAB GBA-Random (Sub-graph 1) | COLLAB GBA-Random (Sub-graph 2) | REDDIT GBA-Minimal (Sub-graph 1) | REDDIT GBA-Minimal (Sub-graph 2) | REDDIT GBA-Random (Sub-graph 1) | REDDIT GBA-Random (Sub-graph 2) |
| GIN | Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| GIN | Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| GIN | Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GraphSAGE | Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| GraphSAGE | Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| GraphSAGE | Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
[Figure 5: six panels, (a) CIFAR-10, (b) ImageNet, (c) IMDB, (d) DBpedia, (e) COLLAB, and (f) REDDIT-MULTI-5K, plotting benign accuracy (%) and watermark success rate (%) against the poisoning rate (%) for the two watermarking methods on each dataset.]
Fig. 5: The effects of the poisoning rate $\gamma$. The benign accuracy (BA) is denoted by the blue lines while the watermark success rate (WSR) is indicated by the red ones. In many cases, the WSR is close to 100% even when we only poison 5% of the samples, resulting in the two red lines overlapping to a large extent.
exploit the ones with degree $D = 97$ and $D = 98$ on the REDDIT-MULTI-5K dataset. The target label $y_t$ is set as the first class (i.e., $y_t = 1$ for both datasets). The illustration of generated poisoned samples is shown in Figure 4.
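The sub-graph injection can be sketched with NetworkX as below. It is a simplified illustration (a complete trigger graph on $D + 1$ nodes attached by a single edge to the chosen benign node); the attachment details of the graph backdoor attacks [53], [54] we adopt may differ.

```python
import random
import networkx as nx

def inject_subgraph_trigger(graph, trigger_degree, mode="minimal", seed=0):
    """Attach a complete trigger sub-graph (every trigger node has degree D) to one node.
    Assumes the benign graph is non-empty and uses integer node labels."""
    g = graph.copy()
    trigger = nx.complete_graph(trigger_degree + 1)          # complete graph on D + 1 nodes
    offset = max(g.nodes) + 1
    trigger = nx.relabel_nodes(trigger, {v: v + offset for v in trigger.nodes})
    g = nx.compose(g, trigger)
    if mode == "minimal":                                    # GBA-Minimal
        anchor = min(graph.degree, key=lambda kv: kv[1])[0]
    else:                                                    # GBA-Random
        anchor = random.Random(seed).choice(list(graph.nodes))
    g.add_edge(anchor, offset)                               # link trigger to the benign graph
    return g
```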
Settings for Dataset Verification. In particular, we select samples only from the last class (i.e., '2' on COLLAB and '5' on REDDIT-MULTI-5K) for dataset verification. Besides, we adopt the complete sub-graph with half the degree (i.e., $D = 7$ on COLLAB and $D = 48$ on REDDIT-MULTI-5K) as the trigger pattern used in the 'Independent Trigger' scenarios. All other settings are the same as those used in Section V-B.
Results. As shown in Table VII, both GBA-Minimal and GBA-Random can achieve a high watermark success rate (WSR) and preserve high benign accuracy (BA). Specifically, the WSRs are larger than 99.5% in all cases, and the decreases of BA compared with the model trained on the benign dataset are less than 1.5% on the COLLAB dataset. These results verify the effectiveness of our dataset watermarking. Moreover, as shown in Tables VIII-IX, our dataset verification is also effective, whether under probability-available or label-only scenarios. Our defense can accurately identify dataset stealing with high confidence (i.e., $\Delta P \gg 0$ and p-value $\ll 0.01$) while it does not misjudge when there is no stealing (i.e., $\Delta P$ is nearly 0 and p-value $\gg 0.05$). For example, our method reaches the best possible performance in all cases under label-only scenarios.
E. Ablation Study
In this section, we study the effects of the core hyper-parameters contained in our DVBW, including the poisoning rate $\gamma$ and the sampling number $m$. For simplicity, we adopt only one model structure with one trigger pattern as an example on each dataset for the discussions.
1) The Effects of Poisoning Rate: As shown in Figure 5, the watermark success rate increases with the poisoning rate $\gamma$ in all cases. These results indicate that defenders can improve the verification confidence by using a relatively large $\gamma$. In particular, almost all evaluated attacks reach a high watermark success rate even when the poisoning rate is small (e.g., 1%). In other words, our dataset watermarking is stealthy, as dataset owners only need to modify a few samples to succeed. However, the benign accuracy decreases with the increase of $\gamma$ in most cases. In other words, there is a trade-off between the WSR and BA to some extent. The defenders should assign $\gamma$ based on their specific needs in practice.
2) The Effects of Sampling Number: Recall that we need to select $m$ different benign samples to generate their
TABLE X: The verification effectiveness (p-value) of our DVBW with different sampling numbers. For every dataset and method, the p-values in the Independent-T(rigger) and Independent-M(odel) scenarios are 1 at all sampling numbers; the rows below report the Malicious (stealing) scenario.
| Dataset | Method | m=20 | m=40 | m=60 | m=80 | m=100 | m=120 | m=140 |
| CIFAR-10 | BadNets | 10^-46 | 10^-50 | 10^-106 | 10^-117 | 10^-132 | 10^-136 | 10^-149 |
| CIFAR-10 | Blended | 10^-16 | 10^-29 | 10^-46 | 10^-67 | 10^-103 | 10^-102 | 10^-138 |
| ImageNet | BadNets | 10^-34 | 10^-72 | 10^-69 | 10^-122 | 10^-144 | 10^-169 | 10^-195 |
| ImageNet | Blended | 10^-12 | 10^-19 | 10^-29 | 10^-32 | 10^-41 | 10^-54 | 10^-67 |
| IMDB | Word-Level | 10^-12 | 10^-16 | 10^-26 | 10^-36 | 10^-46 | 10^-56 | 10^-63 |
| IMDB | Sentence-Level | 10^-12 | 10^-18 | 10^-25 | 10^-35 | 10^-47 | 10^-53 | 10^-61 |
| DBpedia | Word-Level | 10^-89 | 10^-186 | 10^-224 | 10^-226 | 10^-281 | 10^-296 | 0 |
| DBpedia | Sentence-Level | 10^-55 | 10^-117 | 10^-181 | 10^-182 | 10^-150 | 10^-185 | 10^-220 |
| COLLAB | GBA-Minimal | 10^-14 | 10^-26 | 10^-31 | 10^-41 | 10^-48 | 10^-58 | 10^-70 |
| COLLAB | GBA-Random | 10^-15 | 10^-29 | 10^-29 | 10^-37 | 10^-43 | 10^-53 | 10^-64 |
| REDDIT-MULTI-5K | GBA-Minimal | 10^-27 | 10^-51 | 10^-64 | 10^-87 | 10^-64 | 10^-131 | 10^-147 |
| REDDIT-MULTI-5K | GBA-Random | 10^-33 | 10^-59 | 10^-85 | 10^-112 | 10^-138 | 10^-119 | 10^-133 |
[Figure 6: six panels, (a) CIFAR-10, (b) ImageNet, (c) IMDB, (d) DBpedia, (e) COLLAB, and (f) REDDIT-MULTI-5K, plotting the watermark success rate (%) against the fine-tuning epoch (0-30) for the two watermarking methods on each dataset.]
Fig. 6: The resistance of our DVBW to fine-tuning on six different datasets.
watermarked versions in our verification process. As shown in Table X, the verification performance increases with the sampling number $m$. These results are expected since our method achieves a promising WSR. In general, the larger the $m$, the smaller the adverse effects of the randomness involved in the verification and therefore the higher the confidence. However, we also need to notice that a larger $m$ means more queries to the model API, which is costly and probably suspicious.
F. The Resistance to Potential Adaptive Attacks
In this section, we discuss the resistance of our DVBW to
three representative backdoor removal methods, including fine-
tuning [55], model pruning [56], and anti-backdoor learning
[57]. These methods were initially used in image classification
but can be directly generalized to other classification tasks
(e.g., graph recognition) as well. Unless otherwise specified,
we use only one model structure with one trigger pattern as
[Figure 7: six panels, (a) CIFAR-10, (b) ImageNet, (c) IMDB, (d) DBpedia, (e) COLLAB, and (f) REDDIT-MULTI-5K, plotting benign accuracy (%) and watermark success rate (%) against the pruning rate (2%-98%) for the two watermarking methods on each dataset.]
Fig. 7: The resistance of our DVBW to model pruning on six different datasets.
an example for the discussions on each dataset. We implement these removal methods based on the code of an open-sourced backdoor toolbox [58] (i.e., BackdoorBox: https://github.com/THUYimingLi/BackdoorBox).
The Resistance to Fine-tuning. Following the classical set-
tings, we adopt 10% benign samples from the original training
set to fine-tune fully-connected layers of the watermarked
model. In each case, we set the learning rate as the one used in
the last training epoch of the victim model. As shown in Figure
6, the watermark success rate (WSR) generally decreases with
the increase of tuning epochs. However, even on the ImageNet
dataset where fine-tuning is most effective, the WSR is still
larger than 60% after the fine-tuning process is finished. In
most cases, fine-tuning has only minor effects in reducing
WSR. These results indicate that our DVBW is resistant to
model fine-tuning to some extent.
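For reference, the fine-tuning procedure evaluated above can be sketched in PyTorch as follows: only the fully-connected (classifier) layers are updated on a small benign subset. The layer name (`model.fc`) and the hyper-parameters are placeholders rather than the exact configuration.

```python
import torch
from torch import nn

def fine_tune_fc(model, loader, lr, epochs=30, device="cuda"):
    """Fine-tune only the fully-connected layers of a (possibly watermarked) model
    on a small benign subset; all other parameters stay frozen."""
    model.to(device).train()
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():                 # assumes the classifier is `model.fc`
        p.requires_grad = True
    optimizer = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    return model
```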
The Resistance to Model Pruning. Following the classical settings, we adopt 10% benign samples from the original training set to prune the latent representation (i.e., the inputs of the fully-connected layers) of the watermarked model. In each case, the pruning rate is set to $\{0\%, 2\%, \ldots, 98\%\}$. As shown in Figure 7, pruning may significantly decrease the watermark success rate (WSR), especially when the pruning rate is nearly 100%. However, this comes at a huge sacrifice of benign accuracy (BA). These decreases in BA are unacceptable in practice since they hinder standard model functionality. Accordingly, our DVBW is also resistant
model functionality. Accordingly, our DVBW is also resistant
to model pruning to some extent. An interesting phenomenon
is that the WSR even increases near the end of the pruning
process in some cases. We speculate that it is probably because
backdoor-related neurons and benign ones are competitive and
the effects of benign neurons are already eliminated near the
end. We will further discuss its mechanism in our future work.
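The pruning baseline can be approximated in a similar spirit: mask the latent channels with the lowest average activation on benign data, as in the simplified sketch below. The hook point (the module producing the latent representation) and the masking-by-hook mechanism are assumptions of this sketch rather than the exact procedure of [56].

```python
import torch

@torch.no_grad()
def prune_latent(model, layer, loader, prune_rate, device="cuda"):
    """Zero-out the prune_rate fraction of latent channels with the lowest mean
    activation on benign data. `layer` is the module whose (4-D) output feeds
    the fully-connected layers."""
    model.to(device).eval()
    acts = []
    handle = layer.register_forward_hook(lambda m, inp, out: acts.append(out.detach()))
    for x, _ in loader:
        model(x.to(device))
    handle.remove()
    mean_act = torch.cat(acts).flatten(2).mean(dim=(0, 2))     # per-channel mean activation
    k = int(prune_rate * mean_act.numel())
    mask = torch.ones_like(mean_act)
    mask[mean_act.argsort()[:k]] = 0.0                         # drop k least-active channels
    # Mask the layer's output in all subsequent forward passes.
    layer.register_forward_hook(lambda m, inp, out: out * mask.view(1, -1, 1, 1))
    return mask
```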
TABLE XI: The computational complexity of dataset watermarking and dataset verification in our DVBW. Specifically, $\gamma$ is the poisoning rate, $N$ is the number of training samples, and $m$ is the sampling number.
| Dataset Watermarking | Dataset Verification (Single Mode) | Dataset Verification (Batch Mode) |
| O(γ·N) | O(m) | O(1) |
The Resistance to Anti-backdoor Learning. In general,
anti-backdoor learning (ABL) intends to detect and unlearn
poisoned samples during the training process of DNNs. Ac-
cordingly, whether ABL can successfully find watermarked
samples is critical for its effectiveness. In these experiments,
we provide the results of detection rates and isolation rates
on different datasets. Specifically, the detection rate is defined
as the proportion of poisoned samples that were isolated from
all training samples, while the isolation rate denotes the ratio
of isolated samples over all training samples. As shown in
Figure 8, ABL can successfully detect watermarked samples
on both CIFAR-10 and ImageNet datasets. However, it fails in
detecting watermarked samples on other datasets with different
modalities (i.e., texts and graphs). We will further explore how to design more stealthy dataset watermarks that can circumvent the detection of ABL across all modalities in our future work.
G. The Analysis of Computational Complexity
In this section, we analyze the computational complexity
of our DVBW. Specifically, we discuss the computational
complexity of dataset watermarking and dataset verification
of our DVBW (as summarized in Table XI).
1) The Complexity of Dataset Watermarking: Let $N$ denote the number of all training samples and $\gamma$ be the poisoning rate. Since our DVBW only needs to watermark a few selected samples in this step, its computational complexity is $O(\gamma \cdot N)$.
[Figure 8: six panels, (a) CIFAR-10, (b) ImageNet, (c) IMDB, (d) DBpedia, (e) COLLAB, and (f) REDDIT-MULTI-5K, plotting the detection rate (%) against the isolation rate (2%-10%) for the two watermarking methods on each dataset.]
Fig. 8: The resistance of our DVBW to anti-backdoor learning on six different datasets.
O(γ·N). In general, these watermarks only replace or
insert a small part of each sample, which is highly efficient.
Accordingly, our dataset watermarking is also efficient. Note
that this step is conducted solely by the dataset owner and does
not involve the adversaries. As such, it is acceptable even if
it is relatively time-consuming.
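As an illustration, a BadNets-style image watermarking pass over the γ·N selected samples might look like the sketch below; the trigger pattern, mask, and target label y_t are placeholders chosen by the dataset owner, and the actual triggers differ across modalities.

```python
import numpy as np

def watermark_dataset(images, labels, trigger, mask, y_t, gamma, seed=0):
    """Stamp a BadNets-style trigger on a gamma-fraction of the samples.

    images:  (N, H, W, C) array; labels: (N,) array of class indices.
    trigger: (H, W, C) patch; mask: (H, W, 1) binary array marking the patch area.
    Only gamma * N samples are modified, hence the O(gamma * N) cost.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(gamma * len(images)), replace=False)
    images, labels = images.copy(), labels.copy()
    # Replace the masked region with the trigger and relabel to the target class.
    stamped = (1 - mask) * images[idx] + mask * trigger
    images[idx] = stamped.astype(images.dtype)
    labels[idx] = y_t
    return images, labels, idx
```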
2) The Complexity of Dataset Verification: In this step,
defenders need to query the (deployed) suspicious model
with m samples and conduct the hypothesis test based on
their predictions. In general, there are two classical prediction
modes: (1) the single mode and (2) the batch mode.
Specifically, under the single mode, the suspicious model can
only predict one sample at a time, while under the batch mode
it can predict a batch of samples simultaneously. Accordingly,
the computational complexity of the single mode and the batch
mode is O(m) and O(1), respectively. Note that this step is also
efficient under either mode, since predicting one sample is
usually inexpensive.
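The difference between the two modes can be sketched as follows, where `query` is a hypothetical stand-in for one round-trip to the deployed suspicious model; only the number of round-trips differs, and both routines return the watermark success rate used in the hypothesis test.

```python
def verify_single_mode(query, watermarked_samples, y_t):
    """Single mode: one round-trip per sample, i.e., O(m) queries."""
    hits = [query(x) == y_t for x in watermarked_samples]
    return sum(hits) / len(hits)

def verify_batch_mode(query, watermarked_samples, y_t):
    """Batch mode: all m samples in a single round-trip, i.e., O(1) queries."""
    predictions = query(watermarked_samples)
    return sum(p == y_t for p in predictions) / len(predictions)
```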
VI. RELATION WITH MODEL OWNERSHIP VERIFICATION
We note that the dataset ownership verification
defined in this paper is closely related to model ownership
verification (MOV) [59], [60], [61], [62], [63], [64]. In general,
model ownership verification intends to identify whether a
suspicious third-party model (instead of the dataset) is stolen
from the victim for unauthorized adoption. In this section, we
discuss their similarities and differences. We summarize the
characteristics of MOV and the task of our dataset owner-
ship verification in Table XII. The comparisons between our
DVBW and representative MOV methods are in Table XIII.
Firstly, our DVBW enjoys some similarities to MOV in
the watermarking processes. Specifically, backdoor attacks are
also widely used to watermark the victim model in MOV.
However, defenders in MOV usually need to manipulate the
training process (e.g., adding some additional regularization
terms [65] or supportive modules [60]), since they can fully
control the training process of the victim model. In contrast,
in our dataset ownership verification, the defender can only
modify the dataset while having no information or access to
the model training process and therefore we can only use
poison-only backdoor attacks for dataset watermarking. In
other words, defenders in DVBW have significantly fewer
capabilities than those in MOV. This allows our method
to also be adopted for model copyright protection, whereas their
approaches may not be directly applicable to our task.
Besides, both our defense and most existing MOV
methods exploit hypothesis testing in their verification processes.
However, our DVBW considers the black-box verification
scenario, where defenders can only query the suspicious
model to obtain its predictions, whereas many MOV methods
(e.g., [59]) consider the white-box verification scenario in which
defenders can obtain the source files of suspicious models.
Even under the black-box setting, existing MOV methods only
consider probability-available cases, while our DVBW also
covers label-only ones.
VII. CONCLUSION
In this paper, we explored how to protect valuable re-
leased datasets. Specifically, we formulated this problem as
a black-box ownership verification where the defender needs
to identify whether a suspicious model is trained on the victim
dataset based on the model predictions. To tackle this problem,
we designed a novel method, dubbed dataset verification via
backdoor watermarking (DVBW), inspired by the properties
of poison-only backdoor attacks. DVBW contained two main
steps, including dataset watermarking and dataset verification.
Specifically, we exploited poison-only backdoor attacks for
TABLE XII: The defender's capacities in model ownership verification and dataset ownership verification.

Task ↓, Capacity →               Training Samples   Training Schedule   Intermediate Results of Victim Model   Predictions of Victim Model
Model Ownership Verification     accessible         accessible          partly accessible*                     accessible
Dataset Ownership Verification   accessible         inaccessible        inaccessible                           accessible

* Partly accessible: accessible for defenders under the white-box setting, while inaccessible under the black-box setting.
TABLE XIII: The comparison between our DVBW and four representative model ownership verification methods. A scenario is marked with a checkmark if the method can be applied in it.
Method ↓, Scenario →   Embedding-free   Multimodality   White-box   Black-box   Probability-available   Label-only
MOVE [59] ✓ ✓
DIMW [60] ✓ ✓ ✓
CEM [61] ✓ ✓ ✓ ✓
NRF [62] ✓ ✓ ✓ ✓
DVBW (Ours) ✓ ✓ ✓ ✓ ✓ ✓
1. Embedding-free: defenders do not need to implant any additional parts or functionalities (e.g., backdoor) in the victim model.
2. Multimodality: defenders can use the method across different types of data (e.g., images, texts, and graphs).
3. White-box: defenders can access the source files of suspicious models.
4. Black-box: defenders can only query suspicious models.
dataset watermarking and designed a hypothesis-test-guided
method for dataset verification. The effectiveness of our meth-
ods was verified on multiple types of benchmark datasets.
ACKNOWLEDGMENTS
This work was mostly done when Yiming Li was a
research intern at Ant Group. This work is supported in
part by the National Key R&D Program of China un-
der Grant 2022YFB3105000, the National Natural Science
Foundation of China under Grants (62171248, 62202393,
12141108), the Shenzhen Science and Technology Program
(JCYJ20220818101012025), the Sichuan Science and Tech-
nology Program under Grant 2023NSFSC1394, the PCNL
Key Project (PCL2021A07), and the Shenzhen Science and
Technology Innovation Commission (Research Center for
Computer Network (Shenzhen) Ministry of Education). We
also sincerely thank Ziqi Zhang from Tsinghua University
for her assistance in some preliminary experiments and Dr.
Baoyuan Wu from CUHK-Shenzhen for his helpful comments
on an early draft of this paper.
APPENDIX
Theorem 1. Let $f(x)$ be the posterior probability of $x$ predicted by the suspicious model, the variable $X$ denote a benign sample with a non-target label, and the variable $X'$ be the watermarked version of $X$. Assume that $P_b \triangleq f(X)_{y_t} < \beta$. We claim that dataset owners can reject the null hypothesis $H_0$ of probability-available verification at the significance level $\alpha$ if the watermark success rate $W$ of $f$ satisfies
$$\sqrt{m-1}\cdot(W-\beta-\tau) - t_{1-\alpha}\cdot\sqrt{W-W^2} > 0, \tag{1}$$
where $t_{1-\alpha}$ is the $(1-\alpha)$-quantile of the $t$-distribution with $(m-1)$ degrees of freedom and $m$ is the sample size of $X$.
Proof. Since $P_b \triangleq f(X)_{y_t} < \beta$, the original hypotheses $H_0$ and $H_1$ can be converted to
$$H_0': P_w < \beta + \tau, \tag{2}$$
$$H_1': P_w > \beta + \tau. \tag{3}$$
Let $E$ indicate the event that the suspicious model $f$ predicts a poisoned sample as the target label $y_t$. As such,
$$E \sim B(1, p), \tag{4}$$
where $p = \Pr(C(X') = y_t)$ indicates the backdoor success probability and $B$ is the Binomial distribution [41].
Let $x_1', \cdots, x_m'$ denote the $m$ poisoned samples used for dataset verification and $E_1, \cdots, E_m$ denote their prediction events. The watermark success rate $W$ then satisfies
$$W = \frac{1}{m}\sum_{i=1}^{m} E_i, \tag{5}$$
$$W \sim \frac{1}{m}B(m, p). \tag{6}$$
According to the central limit theorem [41], the watermark success rate $W$ follows the Gaussian distribution $\mathcal{N}\left(p, \frac{p(1-p)}{m}\right)$ when $m$ is sufficiently large. Similarly, $(P_w - \beta - \tau)$ also follows a Gaussian distribution. As such, we can construct the t-statistic
$$T \triangleq \frac{\sqrt{m}\,(W - \beta - \tau)}{s} \sim t(m-1), \tag{7}$$
where $s$ is the sample standard deviation of the prediction events $E_1, \cdots, E_m$, i.e.,
$$s^2 = \frac{1}{m-1}\sum_{i=1}^{m}(E_i - W)^2 = \frac{1}{m-1}\left(m\cdot W - m\cdot W^2\right). \tag{8}$$
To reject the hypothesis $H_0'$ at the significance level $\alpha$, we need to ensure that
$$\frac{\sqrt{m}\,(W - \beta - \tau)}{s} > t_{1-\alpha}, \tag{9}$$
where $t_{1-\alpha}$ is the $(1-\alpha)$-quantile of the $t$-distribution with $(m-1)$ degrees of freedom.
According to Equations (8) and (9), we have
$$\sqrt{m-1}\cdot(W - \beta - \tau) - t_{1-\alpha}\cdot\sqrt{W - W^2} > 0. \tag{10}$$
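For reference, the decision rule of Theorem 1 can be evaluated numerically as in the sketch below, which simply checks inequality (10) with the (1−α)-quantile of the t-distribution from SciPy; W, β, τ, m, and α are the quantities defined above.

```python
from scipy import stats

def reject_h0(W, beta, tau, m, alpha=0.05):
    """Return True iff inequality (10) holds, i.e., H0' is rejected at level alpha.

    W:    observed watermark success rate over the m verification samples.
    beta: assumed upper bound on f(X)_{y_t} for benign samples.
    tau:  certainty margin used in the hypotheses H0' and H1'.
    """
    t_quantile = stats.t.ppf(1 - alpha, df=m - 1)   # (1 - alpha)-quantile of t(m - 1)
    lhs = (m - 1) ** 0.5 * (W - beta - tau)
    rhs = t_quantile * (W - W ** 2) ** 0.5
    return lhs - rhs > 0
```

For instance, with m = 100, β = 0.2, τ = 0.25, and α = 0.05, rejection requires an observed watermark success rate of roughly 0.54 or higher.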
REFERENCES
[1] X. Wu, R. He, Z. Sun, and T. Tan, “A light cnn for deep face
representation with noisy labels,” IEEE Transactions on Information
Forensics and Security, vol. 13, no. 11, pp. 2884–2896, 2018.
[2] Q. Yin, J. Feng, J. Lu, and J. Zhou, “Joint estimation of pose and singular
points of fingerprints,” IEEE Transactions on Information Forensics and
Security, vol. 16, pp. 1467–1479, 2020.
[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet:
A large-scale hierarchical image database,” in CVPR, 2009.
[4] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in
the wild,” in ICCV, 2015.
[5] J. Ni, J. Li, and J. McAuley, “Justifying recommendations using
distantly-labeled reviews and fine-grained aspects,” in EMNLP, 2019.
[6] P. Voigt and A. Von dem Bussche, “The eu general data protection regu-
lation (gdpr),” A Practical Guide, 1st Ed., Cham: Springer International
Publishing, vol. 10, no. 3152676, pp. 10–5555, 2017.
[7] S. Wang, J. Zhou, J. K. Liu, J. Yu, J. Chen, and W. Xie, “An efficient file
hierarchy attribute-based encryption scheme in cloud computing,” IEEE
Transactions on Information Forensics and Security, vol. 11, no. 6, pp.
1265–1277, 2016.
[8] J. Li, Q. Yu, and Y. Zhang, “Hierarchical attribute based encryption
with continuous leakage-resilience,” Information Sciences, vol. 484, pp.
113–134, 2019.
[9] H. Deng, Z. Qin, Q. Wu, Z. Guan, R. H. Deng, Y. Wang, and
Y. Zhou, “Identity-based encryption transformation for flexible sharing
of encrypted data in public cloud,” IEEE Transactions on Information
Forensics and Security, vol. 15, pp. 3168–3180, 2020.
[10] S. Haddad, G. Coatrieux, A. Moreau-Gaudry, and M. Cozic, “Joint
watermarking-encryption-jpeg-ls for medical image reliability control in
encrypted and compressed domains,” IEEE Transactions on Information
Forensics and Security, vol. 15, pp. 2556–2569, 2020.
[11] R. Wang, F. Juefei-Xu, M. Luo, Y. Liu, and L. Wang, “Faketagger: Ro-
bust safeguards against deepfake dissemination via provenance tracking,”
in ACM MM, 2021.
[12] Z. Guan, J. Jing, X. Deng, M. Xu, L. Jiang, Z. Zhang, and Y. Li,
“Deepmih: Deep invertible network for multiple image hiding,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, 2022.
[13] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. Quek,
and H. V. Poor, “Federated learning with differential privacy: Algorithms
and performance analysis,” IEEE Transactions on Information Forensics
and Security, vol. 15, pp. 3454–3469, 2020.
[14] L. Zhu, X. Liu, Y. Li, X. Yang, S.-T. Xia, and R. Lu, “A fine-grained
differentially private federated learning against leakage from gradients,”
IEEE Internet of Things Journal, 2021.
[15] J. Bai, Y. Li, J. Li, X. Yang, Y. Jiang, and S.-T. Xia, “Multinomial
random forest,” Pattern Recognition, vol. 122, p. 108331, 2022.
[16] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “Badnets: Evaluating
backdooring attacks on deep neural networks,” IEEE Access, vol. 7,
pp. 47 230–47 244, 2019.
[17] Y. Li, Y. Li, B. Wu, L. Li, R. He, and S. Lyu, “Invisible backdoor attack
with sample-specific triggers,” in ICCV, 2021.
[18] A. Nguyen and A. Tran, “Wanet–imperceptible warping-based backdoor
attack,” in ICLR, 2021.
[19] Z. Xiong, Z. Cai, Q. Han, A. Alrawais, and W. Li, “Adgan: protect
your location privacy in camera data of auto-driving vehicles,” IEEE
Transactions on Industrial Informatics, vol. 17, no. 9, pp. 6200–6210,
2020.
[20] Y. Li, P. Liu, Y. Jiang, and S.-T. Xia, “Visual privacy protection via
mapping distortion,” in ICASSP, 2021.
[21] H. Xu, Z. Cai, D. Takabi, and W. Li, “Audio-visual autoencoding for
privacy-preserving video streaming,” IEEE Internet of Things Journal,
2021.
[22] C. Dwork, “Differential privacy: A survey of results,” in TAMC, 2008.
[23] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,”
IEEE Transactions on Neural Networks and Learning Systems, 2022.
[24] X. Qi, T. Xie, Y. Li, S. Mahloujifar, and P. Mittal, “Revisiting the
assumption of latent separability for backdoor defenses,” in ICLR, 2023.
[25] Y. Gao, Y. Li, L. Zhu, D. Wu, Y. Jiang, and S.-T. Xia, “Not all samples
are born equal: Towards effective clean-label backdoor attacks,” Pattern
Recognition, p. 109512, 2023.
[26] S. Li, M. Xue, B. Zhao, H. Zhu, and X. Zhang, “Invisible backdoor
attacks on deep neural networks via steganography and regularization,”
IEEE Transactions on Dependable and Secure Computing, 2020.
[27] Y. Li, H. Zhong, X. Ma, Y. Jiang, and S.-T. Xia, “Few-shot backdoor
attacks on visual object tracking,” in ICLR, 2022.
[28] I. Shumailov, Z. Shumaylov, D. Kazhdan, Y. Zhao, N. Papernot, M. A.
Erdogdu, and R. Anderson, “Manipulating sgd with data ordering
attacks,” in NeurIPS, 2021.
[29] A. S. Rakin, Z. He, and D. Fan, “Tbt: Targeted neural network attack
with bit trojan,” in CVPR, 2020.
[30] R. Tang, M. Du, N. Liu, F. Yang, and X. Hu, “An embarrassingly simple
approach for trojan attack in deep neural networks,” in SIGKDD, 2020.
[31] J. Bai, K. Gao, D. Gong, S.-T. Xia, Z. Li, and W. Liu, “Hardly
perceptible trojan attack against neural networks with bit flips,” in
ECCV, 2022.
[32] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor
attacks on deep learning systems using data poisoning,” arXiv preprint
arXiv:1712.05526, 2017.
[33] Y. Li, T. Zhai, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor attack in the
physical world,” in ICLR Workshop, 2021.
[34] Z. Zhang, L. Lyu, W. Wang, L. Sun, and X. Sun, “How to inject
backdoors with better consistency: Logit anchoring on clean data,” in
ICLR, 2022.
[35] X. Chen, A. Salem, D. Chen, M. Backes, S. Ma, Q. Shen, Z. Wu, and
Y. Zhang, “Badnl: Backdoor attacks against nlp models with semantic-
preserving improvements,” in ACSAC, 2021.
[36] Y. Wang, E. Sarkar, W. Li, M. Maniatakos, and S. E. Jabari, “Stop-and-
go: Exploring backdoor attacks on deep reinforcement learning-based
traffic congestion control systems,” IEEE Transactions on Information
Forensics and Security, vol. 16, pp. 4772–4787, 2021.
[37] T. Zhai, Y. Li, Z. Zhang, B. Wu, Y. Jiang, and S.-T. Xia, “Backdoor
attack against speaker verification,” in ICASSP, 2021.
[38] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional
neural networks: analysis, applications, and prospects,” IEEE Transac-
tions on Neural Networks and Learning Systems, 2021.
[39] K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao,
C. Xu, Y. Xu, Z. Yang, Y. Zhang, and D. Tao, “A survey on vision
transformer,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2022.
[40] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A
comprehensive survey on graph neural networks,” IEEE transactions on
neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020.
[41] R. V. Hogg, J. McKean, and A. T. Craig, Introduction to mathematical
statistics. Pearson Education, 2005.
[42] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features
from tiny images,” Citeseer, Tech. Rep., 2009.
[43] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” in ICLR, 2015.
[44] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in CVPR, 2016.
[45] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts,
“Learning word vectors for sentiment analysis,” in ACL, 2011.
[46] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives,
“DBpedia: A nucleus for a web of open data,” in ISWC, 2007.
[47] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[48] Y. Chen, “Convolutional neural network for sentence classification,” in
EMNLP, 2014.
[49] J. Dai, C. Chen, and Y. Li, “A backdoor attack against lstm-based text
classification systems,” IEEE Access, vol. 7, pp. 138 872–138 878, 2019.
[50] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in KDD, 2015.
[51] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph
neural networks?” in ICLR, 2018.
[52] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation
learning on large graphs,” NeurIPS, 2017.
[53] Z. Xi, R. Pang, S. Ji, and T. Wang, “Graph backdoor,” in USENIX
Security, 2021.
[54] Z. Zhang, J. Jia, B. Wang, and N. Z. Gong, “Backdoor attacks to graph
neural networks,” in SACMAT, 2021.
[55] Y. Liu, Y. Xie, and A. Srivastava, “Neural trojans,” in ICCD, 2017.
[56] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against
backdooring attacks on deep neural networks,” in RAID, 2018.
[57] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, “Anti-backdoor
learning: Training clean models on poisoned data,” in NeurIPS, 2021.
[58] Y. Li, M. Ya, Y. Bai, Y. Jiang, and S.-T. Xia, “Backdoorbox: A python
toolbox for backdoor learning,” in ICLR Workshop, 2023.
[59] Y. Li, L. Zhu, X. Jia, Y. Jiang, S.-T. Xia, and X. Cao, “Defending against
model stealing via verifying embedded external features,” in AAAI, 2022.
[60] J. Zhang, D. Chen, J. Liao, W. Zhang, H. Feng, G. Hua, and N. Yu,
“Deep model intellectual property protection via deep watermarking,”
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 44, no. 8, pp. 4005–4020, 2022.
[61] N. Lukas, Y. Zhang, and F. Kerschbaum, “Deep neural network finger-
printing by conferrable adversarial examples,” in ICLR, 2021.
[62] Y. Zheng, S. Wang, and C.-H. Chang, “A dnn fingerprint for non-
repudiable model ownership identification and piracy detection,” IEEE
Transactions on Information Forensics and Security, vol. 17, pp. 2977–
2989, 2022.
[63] W. Guo, B. Tondi, and M. Barni, “Masterface watermarking for ipr
protection of siamese network for face verification,” in IWDW, 2021.
[64] J. Xu and S. Picek, “Watermarking graph neural networks based on
backdoor attacks,” arXiv preprint arXiv:2110.11024, 2021.
[65] H. Jia, C. A. Choquette-Choo, V. Chandrasekaran, and N. Papernot,
“Entangled watermarks as a defense against model extraction,” in
USENIX Security, 2021.
Yiming Li is currently a Ph.D. candidate from
Tsinghua-Berkeley Shenzhen Institute, Tsinghua
Shenzhen International Graduate School, Tsinghua
University. Before that, he received his B.S. de-
gree in Mathematics and Applied Mathematics from
Ningbo University in 2018. His research interests
are in the domain of AI security, especially back-
door learning, adversarial learning, data privacy, and
copyright protection in AI. His research has been
published in multiple top-tier conferences and jour-
nals, such as ICLR, NeurIPS, ICCV, IEEE TNNLS,
and Pattern Recognition. He has served as a senior program committee member of AAAI
2022, a program committee member of ICML, NeurIPS, ICLR, etc., and
a reviewer for IEEE TPAMI, IEEE TIFS, IEEE TDSC, etc.
Mingyan Zhu received his B.S. degree in Computer
Science and Technology from Harbin Institute of
Technology, China, in 2020. He is currently pursuing
the Ph.D. degree at Tsinghua Shenzhen International
Graduate School, Tsinghua University. His research
interests are in the domain of low-level computer
vision and AI security.
Dr. Xue Yang received a Ph.D. degree in infor-
mation and communication engineering from South-
west Jiaotong University, China, in 2019. She was a
visiting student at the Faculty of Computer Science,
University of New Brunswick, Canada, from 2017 to
2018. She was a postdoctoral fellow with Tsinghua
University. She is currently a research associate with
the School of Information Science and Technology,
Southwest Jiaotong University, China. Her research
interests include data security and privacy, applied
cryptography, and federated learning.
Dr. Yong Jiang received his M.S. and Ph.D. de-
grees in computer science from Tsinghua University,
China, in 1998 and 2002, respectively. Since 2002,
he has been with the Tsinghua Shenzhen Inter-
national Graduate School of Tsinghua University,
Guangdong, China, where he is currently a full
professor. His research interests include computer
vision, machine learning, Internet architecture and
its protocols, IP routing technology, etc. He has
received several best paper awards (e.g., IWQoS
2018) from top-tier conferences, and his research
has been published in multiple top-tier journals and conferences, including
IEEE ToC, IEEE TMM, IEEE TSP, CVPR, ICLR, etc.
Dr. Wei Tao received the B.S. and Ph.D. degrees
from Peking University, China, in 1997 and 2007, re-
spectively. He is currently the Vice President at Ant
Group, in charge of its foundational security. He is
also an Adjunct Professor at Peking University. For
more than 20 years, he has been committed to mak-
ing complex systems more secure and reliable. His
work has helped Windows, Android, iOS and other
operating systems improve their security capabilities.
He also led the development of many famous secu-
rity open-sourced projects such as Mesatee/Teaclave,
MesaLink TLS, OpenRASP, Advbox Adversarial Toolbox, etc. His research
has been published in multiple top-tier journals and conferences, including
IEEE TDSC, IEEE TIFS, IEEE S&P, USENIX Security, etc.
Dr. Shu-Tao Xia received the B.S. degree in mathe-
matics and the Ph.D. degree in applied mathematics
from Nankai University, Tianjin, China, in 1992 and
1997, respectively. Since January 2004, he has been
with the Tsinghua Shenzhen International Graduate
School of Tsinghua University, Guangdong, China,
where he is currently a full professor. From Septem-
ber 1997 to March 1998 and from August to Septem-
ber 1998, he visited the Department of Information
Engineering, The Chinese University of Hong Kong,
Hong Kong. His research interests include coding
and information theory, machine learning, and deep learning. His research
has been published in multiple top-tier journals and conferences, including
IEEE TIP, IEEE TNNLS, CVPR, ICCV, ECCV, ICLR, etc.