Black-box Dataset Ownership Verification via
Backdoor Watermarking
Yiming Li, Mingyan Zhu, Xue Yang, Yong Jiang, Tao Wei, and Shu-Tao Xia
Abstract—Deep learning, especially deep neural networks
(DNNs), has been widely and successfully adopted in many
critical applications for its high effectiveness and efficiency. The
rapid development of DNNs has benefited from the existence
of some high-quality datasets (e.g., ImageNet), which allow
researchers and developers to easily verify the performance of
their methods. Currently, almost all existing released datasets
require that they can only be adopted for academic or educational
purposes rather than commercial purposes without permission.
However, there is still no good way to ensure that. In this paper,
we formulate the protection of released datasets as verifying
whether they are adopted for training a (suspicious) third-party
model, where defenders can only query the model while having
no information about its parameters and training details. Based
on this formulation, we propose to embed external patterns
via backdoor watermarking for the ownership verification to
protect them. Our method contains two main parts, including
dataset watermarking and dataset verification. Specifically, we
exploit poison-only backdoor attacks (e.g., BadNets) for dataset
watermarking and design a hypothesis-test-guided method for
dataset verification. We also provide some theoretical analyses
of our methods. Experiments on multiple benchmark datasets
of different tasks are conducted, which verify the effectiveness
of our method. The code for reproducing main experiments is
available at https://github.com/THUYimingLi/DVBW.
Index Terms—Dataset Protection, Backdoor Attack, Data Pri-
vacy, Data Security, AI Security
I. INTRODUCTION
DEEP neural networks (DNNs) have been widely and
successfully used in many mission-critical applications
and devices for their high effectiveness and efficiency. For
example, within a smart camera, DNNs can be used for
identifying human faces [1] or pose estimation [2].
In general, high-quality released (e.g., open-sourced or
commercial) datasets [3], [4], [5] are one of the key factors in
the prosperity of DNNs. Those datasets allow researchers and
developers to easily verify their model effectiveness, which
in turn accelerates the development of DNNs. Those datasets
are valuable since the data collection is time-consuming and
expensive. Besides, according to related regulations (e.g.,
GDPR [6]), their copyrights deserve to be protected.
Yiming Li and Mingyan Zhu are with Tsinghua Shenzhen Interna-
tional Graduate School, Tsinghua University, Shenzhen, China (e-mail: li-
ym18@mails.tsinghua.edu.cn, zmy20@mails.tsinghua.edu.cn).
Xue Yang is with School of Information Science and Technology, Southwest
Jiaotong University, Chengdu, China (e-mail: xueyang@swjtu.edu.cn).
Yong Jiang, and Shu-Tao Xia are with Tsinghua Shenzhen International
Graduate School, Tsinghua University, and also with the Research Center
of Artificial Intelligence, Peng Cheng Laboratory, Shenzhen, China (e-mail:
jiangy@sz.tsinghua.edu.cn, xiast@sz.tsinghua.edu.cn).
Tao Wei is with Ant Group, Hangzhou, Zhejiang, China (e-mail:
lenx.wei@antgroup.com).
Corresponding Author(s): Xue Yang and Shu-Tao Xia.
In this paper, we discuss how to protect released datasets.
In particular, those datasets are released and can only be used
for specific purposes. For example, open-sourced datasets are
available to everyone while most of them can only be adopted
for academic or educational rather than commercial purposes.
Our goal is to detect and prevent unauthorized dataset users.
Currently, there were some techniques, such as encryption
[7], [8], [9], digital watermarking [10], [11], [12], and dif-
ferential privacy [13], [14], [15], for data protection. Their
main purpose is also precluding unauthorized users to utilize
the protected data. However, these methods are not suitable to
protect released datasets. Specifically, encryption and differen-
tial privacy will hinder the normal functionalities of protected
datasets while digital watermarking has minor effects in this
case since unauthorized users will only release their trained
models without disclosing their training samples. How to
protect released datasets is still an important open question.
This problem is challenging because the adversaries can get
access to the victim datasets. To the best of our knowledge,
there is no prior work to solve it.
In this paper, we formulate this problem as an ownership
verification, where defenders intend to identify whether a
suspicious model is trained on the (protected) victim dataset.
In particular, we consider the black-box setting, which is more
difficult compared with the white-box one since defenders can
only get model predictions while having no information about
its training details and model parameters. This setting is more
practical, allowing defenders to perform ownership verification
even when they only have access to the model API. To
tackle this problem, we design a novel method, dubbed dataset
verification via backdoor watermarking (DVBW). Our DVBW
consists of two main steps, including dataset watermarking
and dataset verification. Specifically, we adopt the poison-only
backdoor attacks [16], [17], [18] for dataset watermarking,
inspired by the fact that they can embed special behaviors on
poisoned samples while maintaining high prediction accuracy
on benign samples, simply based on data modification. For
the dataset verification, defenders can verify whether the sus-
picious model was trained on the watermarked victim dataset
by examining the existence of the specific backdoor. To this
end, we propose a hypothesis-test-guided verification.
Our main contributions can be summarized as follows:
• We propose to protect datasets by verifying whether they are adopted to train a suspicious third-party model.
• We design a black-box dataset ownership verification method (i.e., DVBW), based on poison-only backdoor attacks and pair-wise hypothesis tests.
• We provide some theoretical insights and analyses of our dataset ownership verification.
• Experiments on benchmark datasets of multiple types of tasks (i.e., image classification, natural language processing, and graph recognition) are conducted, which verify the effectiveness of the proposed method.
The rest of this paper is organized as follows: In the next section, we briefly review related works. After that, we introduce the preliminaries and define the studied problem. We present the technical details of our method in Section IV and conduct experiments on multiple benchmark datasets to verify its effectiveness in Section V. We compare our work with model ownership verification in Section VI and conclude this paper in Section VII. We hope that our paper can provide a new angle on data protection, to preserve the interests of dataset owners and facilitate secure dataset sharing.
II. RELATED WORKS
A. Data Protection
Data protection has always been an important research area,
regarding many aspects of data security. Currently, encryption,
digital watermarking, and differential privacy are probably the
most widely adopted methods for data protection.
Encryption [7], [8], [9] is the most classical protection
method, which encrypts the whole or parts of the protected
data. Only authorized users who have obtained the secret
key can decrypt the encrypted data. Currently, there are also some empirical methods [19], [20], [21] that protect sensitive information in the data rather than restricting data usage. However, encryption cannot be exploited to protect released datasets, for it will hinder dataset functionalities.
Digital watermarking was initially used to protect image
copyright. Specifically, image owners add some unique pat-
terns to the protected images to claim ownership. Currently,
digital watermarking is used for a wider range of applications,
such as DeepFake detection [11] and image steganography
[12]. However, since the adversaries will release neither their training datasets nor their training details, digital watermarking cannot be used to protect released datasets.
Differential privacy [22], [14], [15] is a theoretical framework to measure and preserve data privacy. Specifically, it protects the membership information of each sample contained in the dataset by making the outputs on two neighboring datasets indistinguishable. However, differential privacy requires manipulating the training process by introducing some randomness (e.g., Laplace noise) and therefore cannot be adopted to protect released datasets.
In conclusion, how to protect released datasets remains an open problem and is worth further attention.
B. Backdoor Attack
Backdoor attack is an emerging yet rapidly growing research
area [23], where the adversaries intend to implant hidden
backdoors into attacked models during the training process.
The attacked models behave normally on benign samples, whereas they constantly output the target label whenever the adversary-specified trigger appears.
Existing backdoor attacks can be roughly divided into three
main categories, including poison-only attacks [17], [24],
[25], training-controlled attacks [26], [27], [28], and model-
modified attacks [29], [30], [31], based on the adversary’s
capacities. Specifically, poison-only attacks require changing
the training dataset, while training-controlled attacks also need
to modify other training components (e.g., training loss); The
model-modified attacks are conducted by modifying model pa-
rameters or structures directly. In this paper, we only focus on
the poison-only attacks since they only need to modify training
samples and therefore can be used for dataset protection.
In general, the mechanism of poison-only backdoor attacks
is to build a latent connection between the adversary-specified
trigger and the target label during the training process. Gu et
al. proposed the first backdoor attack (i.e., BadNets) targeting image classification tasks [16]. Specifically, BadNets randomly selects a small portion of benign images and stamps the pre-defined trigger on them. These modified images, relabeled with the target label, are combined with the remaining benign samples to generate the poisoned dataset, which is then released to users to train their models. After that, many follow-up attacks with different trigger designs [32], [33], [34] were proposed to improve attack stealthiness and stability. Currently, there are also a few backdoor attacks developed outside the context of image classification [35], [36], [37]. In general, all models trained in an end-to-end, supervised, data-driven manner face the poison-only backdoor threat, for they will learn hidden backdoors automatically. Although there are many backdoor attacks, how to use them for positive purposes lags far behind and is worth further exploration.
III. PRELIMINARIES AND PROBLEM FORMULATION
A. The Definition of Technical Terms
In this section, we present the definitions of technical terms that are widely adopted in this paper, as follows:
• Benign Dataset: the unmodified dataset.
• Victim Dataset: the released dataset.
• Suspicious Model: the third-party model that may be trained on the victim dataset.
• Trigger Pattern: the pattern used for generating poisoned samples and activating the hidden backdoor.
• Target Label: the attacker-specified label. The attacker intends to make all poisoned testing samples be predicted as the target label by the attacked model.
• Backdoor: the latent connection between the trigger pattern and the target label within the attacked model.
• Benign Sample: an unmodified sample.
• Poisoned Sample: a modified sample used to create and activate the backdoor.
• Benign Accuracy: the accuracy of models in predicting benign testing samples.
• Watermark Success Rate: the accuracy of models in predicting watermarked testing samples.
We follow these definitions in the remainder of this paper.
[Figure 1: two-step pipeline with Step 1 (Dataset Watermarking via a poison generator G, trigger, and target label) and Step 2 (Dataset Verification), with panels (a) Probability-Available Verification (pair-wise T-test on the target-label probabilities of benign and watermarked images) and (b) Label-Only Verification (Wilcoxon-test on the predicted labels of watermarked images against the target label).]
Fig. 1: The main pipeline of our method. In the first step, defenders will exploit poison-only backdoor attacks for dataset
watermarking. In the second step, defenders will conduct dataset verification by examining whether the suspicious model
contains specific hidden backdoors via hypothesis tests. In this paper, we consider two representative black-box scenarios,
where defenders can obtain the predicted probabilities and only have the predicted labels, respectively.
B. The Main Pipeline of Deep Neural Networks (DNNs)
Deep neural networks (DNNs) have demonstrated their effectiveness in widespread applications. There are many different types of DNNs, such as convolutional neural networks [38], Transformers [39], and graph neural networks [40], designed for different tasks and purposes. Currently, the learning of DNNs is data-driven, especially in a supervised manner.
Specifically, let $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ ($x_i \in \mathcal{X}$, $y_i \in \mathcal{Y}$) denote the (labeled) training set, where $\mathcal{X}$ and $\mathcal{Y}$ indicate the input and output space, respectively. In general, all DNNs intend to learn a mapping function $f_{\theta}: \mathcal{X} \rightarrow \mathcal{Y}$ (with parameters $\theta$), based on the following optimization:

$$\min_{\theta} \ \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(f_{\theta}(x_i), y_i\right), \qquad (1)$$

where $\mathcal{L}(\cdot)$ is a given loss function (e.g., cross-entropy). Once the model $f_{\theta}$ is trained, it can predict the label of an 'unseen' sample $x$ via $f_{\theta}(x)$.
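To make the optimization in Eq. (1) concrete, the following is a minimal PyTorch sketch of mini-batch supervised training; the model, dataset, and hyper-parameters here are illustrative placeholders rather than the configurations used in our experiments.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model: nn.Module, dataset, epochs: int = 10, lr: float = 0.1) -> nn.Module:
    """Minimize (1/N) * sum_i L(f_theta(x_i), y_i) with mini-batch SGD."""
    loader = DataLoader(dataset, batch_size=128, shuffle=True)
    criterion = nn.CrossEntropyLoss()                       # L(., .): cross-entropy loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)                   # L(f_theta(x), y)
            loss.backward()
            optimizer.step()
    return model
```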
C. The Main Pipeline of Poison-only Backdoor Attacks
In general, poison-only backdoor attacks first generate the poisoned dataset $\mathcal{D}_p$, based on which the given model is trained. Specifically, let $y_t$ indicate the target label and $\mathcal{D}_b = \{(x_i, y_i)\}_{i=1}^{N}$ ($x_i \in \mathcal{X}$, $y_i \in \mathcal{Y}$) denote the benign training set, where $\mathcal{X}$ and $\mathcal{Y}$ indicate the input and output space, respectively. The backdoor adversaries first select a subset $\mathcal{D}_s$ of $\mathcal{D}_b$ to generate its modified version $\mathcal{D}_m$, based on the adversary-specified poison generator $G$ and the target label $y_t$. In other words, $\mathcal{D}_s \subset \mathcal{D}_b$ and $\mathcal{D}_m = \{(x', y_t) \mid x' = G(x), (x, y) \in \mathcal{D}_s\}$. The poisoned dataset $\mathcal{D}_p$ is the combination of $\mathcal{D}_m$ and the remaining benign samples, i.e., $\mathcal{D}_p = \mathcal{D}_m \cup (\mathcal{D}_b \backslash \mathcal{D}_s)$. In particular, $\gamma \triangleq \frac{|\mathcal{D}_m|}{|\mathcal{D}_p|}$ is called the poisoning rate. Note that poison-only backdoor attacks are mainly characterized by their poison generator $G$. For example, $G(x) = (1-\alpha) \otimes x + \alpha \otimes t$, where $\alpha \in [0,1]^{C \times W \times H}$, $t \in \mathcal{X}$ is the trigger pattern, and $\otimes$ is the element-wise product, in the blended attack [32]; $G(x) = x + t$ in the ISSBA [17].

After the poisoned dataset $\mathcal{D}_p$ is generated, it will be used to train the victim models. This process is nearly the same as the standard training process, only with a different training dataset. The hidden backdoors are created during training, i.e., for a backdoored model $f_b$, $f_b(G(x)) = y_t$, $\forall x \in \mathcal{X}$. In particular, $f_b$ preserves a high accuracy in predicting benign samples.
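For illustration, the following is a minimal sketch of the poisoned-dataset construction described above, using the blended poison generator $G(x) = (1-\alpha) \otimes x + \alpha \otimes t$ as an example; the trigger pattern, transparency, and data layout are assumptions made for this sketch rather than the exact configuration of any cited attack.

```python
import random
import numpy as np

def blended_generator(x: np.ndarray, trigger: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blended poison generator: G(x) = (1 - alpha) * x + alpha * t (element-wise)."""
    return (1.0 - alpha) * x + alpha * trigger

def build_poisoned_dataset(benign_set, trigger, alpha, target_label: int, poisoning_rate: float = 0.1):
    """Return the poisoned dataset D_p: the union of the modified subset D_m
    (watermarked samples relabeled as y_t) and the remaining benign samples."""
    num_poisoned = int(poisoning_rate * len(benign_set))
    poisoned_indices = set(random.sample(range(len(benign_set)), num_poisoned))

    poisoned_set = []
    for i, (x, y) in enumerate(benign_set):
        if i in poisoned_indices:                # (x, y) in D_s  ->  (G(x), y_t) in D_m
            poisoned_set.append((blended_generator(x, trigger, alpha), target_label))
        else:                                    # remaining benign samples
            poisoned_set.append((x, y))
    return poisoned_set
```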
D. Problem Formulation and Threat Model
In this paper, we focus on the dataset protection of classi-
fication tasks. There are two parties involved in our problem,
including the adversaries and the defenders. In general, the defenders will release their dataset and want to protect its copyright; the adversaries intend to 'steal' the released dataset to train their commercial models without permission from the defenders. Specifically, let $\hat{\mathcal{D}}$ denote the protected dataset containing $K$ different classes and $S$ denote the suspicious model. We formulate dataset protection as a verification problem in which defenders intend to identify whether $S$ was trained on $\hat{\mathcal{D}}$ under the black-box setting. The defenders can only query the model while having no information about its parameters, model structure, or training details. This is the hardest setting for defenders since they have very limited capacities. However, it also makes our approach the most pervasive, i.e., defenders can still protect the dataset even if they can only query the API of a suspicious third-party model.
In particular, we consider two representative verifica-
tion scenarios, including probability-available verification and
label-only verification. In the first scenario, defenders can
obtain the predicted probability vectors of input samples,
whereas they can only get the predicted labels in the second
one. The latter scenario is more challenging since the defenders can get less information from the model predictions.
IV. THE PROPOSED METHOD
In this section, we first overview the main pipeline of our
method and then describe its components in detail.
A. Overall Procedure
As shown in Figure 1, our method consists of two main
steps, including the (1) dataset watermarking and the (2)
dataset verification. In general, we exploit poison-only back-
door attacks for dataset watermarking and design a hypothesis-
test-guided dataset verification. The technical details of each
step are described in the following subsections.
B. Dataset Watermarking
Since defenders can only modify the released dataset and
query the suspicious models, the only way to tackle the
problem introduced in Section III-D is to watermark the
benign dataset so that models trained on it will have defender-
specified distinctive prediction behaviors. The defenders can
verify whether the suspicious model has pre-defined behaviors
to confirm whether it was trained on the protected dataset.
In general, the designed dataset watermarking needs to
satisfy three main properties, as follows:
Definition 1 (Three Necessary Watermarking Properties). Let $f$ and $\hat{f}$ denote the model trained on the benign dataset $\mathcal{D}$ and its watermarked version $\hat{\mathcal{D}}$, respectively.
• $\zeta$-Harmlessness: The watermarking should not be harmful to the dataset functionality, i.e., $BA(f) - BA(\hat{f}) < \zeta$, where $BA$ denotes the benign accuracy.
• $\eta$-Distinctiveness: All models trained on the watermarked dataset $\hat{\mathcal{D}}$ should have some distinctive prediction behaviors (compared to those trained on its benign version) on watermarked data, i.e., $\frac{1}{|\mathcal{W}|} \sum_{x \in \mathcal{W}} d\left(\hat{f}(x), f(x)\right) > \eta$, where $d$ is a distance metric and $\mathcal{W}$ is the set of watermarked data.
• Stealthiness: The dataset watermarking should not attract the attention of adversaries. For example, the watermarking rate should be small and the watermarked data should appear natural to dataset users.
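As a small illustration of how the first two properties can be checked empirically, the sketch below compares a benign model $f$ and a watermarked model $\hat{f}$ through their predictions; it assumes the 0/1 label disagreement as the distance metric $d$, which is only one possible instantiation.

```python
import numpy as np

def is_zeta_harmless(benign_acc_f: float, benign_acc_f_hat: float, zeta: float) -> bool:
    """zeta-harmlessness: BA(f) - BA(f_hat) < zeta."""
    return (benign_acc_f - benign_acc_f_hat) < zeta

def is_eta_distinctive(preds_f_hat: np.ndarray, preds_f: np.ndarray, eta: float) -> bool:
    """eta-distinctiveness on watermarked data W, with d taken as the 0/1 label
    disagreement: (1/|W|) * sum_x d(f_hat(x), f(x)) > eta."""
    return float(np.mean(preds_f_hat != preds_f)) > eta
```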
As described in Section II-B, poison-only backdoor attacks
can implant pre-defined backdoor behaviors without signifi-
cantly influencing the benign accuracy, i.e., using these attacks
can fulfill all previous requirements. Accordingly, in this paper,
we explore how to adopt poison-only backdoor attacks to
watermark datasets of different classification tasks for their
copyright protection. The watermarking process is the same
as the generation of the poisoned dataset described in Section
III-C. More details about attack selection are in Section V.
C. Dataset Verification
Given a suspicious model S(·), the defenders can verify
whether it was trained on their released dataset by examining
the existence of the specific backdoor. Specifically, let $x'$ denote a poisoned sample and $y_t$ indicate the target label. The defenders can examine the suspicious model simply via the result of $S(x')$: if $S(x') = y_t$, the suspicious model is treated as trained on the victim dataset. However, this result may be sharply affected by the randomness of selecting $x'$. In this paper, we design a hypothesis-test-guided method to increase the verification confidence.
Algorithm 1 Probability-available dataset verification.
1: Input: benign dataset $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^{N}$, sampling number $m$, suspicious model $f$, poison generator $G$, target label $y_t$, alternative hypothesis $H_1$
2: Sample a data list $X = [x_i \mid y_i \neq y_t]_{i=1}^{m}$ from $\mathcal{D}$
3: Obtain the watermarked version of $X$ (i.e., $X'$) via $X' = [G(x_i)]_{i=1}^{m}$
4: Obtain the probability list $P_b = [f(x_i)_{y_t}]_{i=1}^{m}$
5: Obtain the probability list $P_w = [f(G(x_i))_{y_t}]_{i=1}^{m}$
6: Calculate the p-value via PAIR-WISE-T-TEST($P_b$, $P_w$, $H_1$)
7: Calculate $\Delta P$ via AVERAGE($P_w - P_b$)
8: Output: $\Delta P$ and the p-value
In particular, as described in Section III-D, we consider
two representative black-box scenarios, including probability-
available verification and label-only verification. In this paper,
we design different verification methods for them, based on
their characteristics, as follows:
1) Probability-Available Verification: In this scenario, the
defenders can obtain the predicted probability vectors of input
samples. To examine the existence of hidden backdoors, the
defenders only need to verify whether the posterior probability
on the target class of watermarked samples is significantly
higher than that of benign testing samples, as follows:
Proposition 1. Suppose $f(x)$ is the posterior probability of $x$ predicted by the suspicious model. Let variable $X$ denote the benign sample with non-targeted label and variable $X'$ be its watermarked version (i.e., $X' = G(X)$), while variables $P_b = f(X)_{y_t}$ and $P_w = f(X')_{y_t}$ indicate the predicted probability on the target label $y_t$ of $X$ and $X'$, respectively. Given the null hypothesis $H_0: P_b + \tau = P_w$ ($H_1: P_b + \tau < P_w$), where the hyper-parameter $\tau \in [0, 1]$, we claim that the suspicious model is trained on the watermarked dataset (with $\tau$-certainty) if and only if $H_0$ is rejected.

In practice, we randomly sample $m$ different benign samples with non-targeted label to conduct the (one-tailed) pair-wise T-test [41] and calculate its p-value. The null hypothesis $H_0$ is rejected if the p-value is smaller than the significance level $\alpha$. Besides, we also calculate the confidence score $\Delta P = P_w - P_b$ to represent the verification confidence. The larger the $\Delta P$, the more confident the verification. The main verification process is summarized in Algorithm 1.
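A minimal Python sketch of this probability-available verification is given below. It assumes the suspicious model is exposed as a function `predict_proba` returning probability vectors, that `generator` applies the owner's trigger, and that SciPy (>= 1.6, for the `alternative` argument of `ttest_rel`) is available; it is an illustration of Algorithm 1 rather than our reference implementation.

```python
import numpy as np
from scipy.stats import ttest_rel

def probability_available_verification(predict_proba, benign_samples, generator,
                                       target_label: int, tau: float = 0.2):
    """One-tailed pair-wise T-test of H0: P_b + tau = P_w against H1: P_b + tau < P_w.

    `benign_samples` is assumed to contain only samples whose ground-truth label
    differs from `target_label`.
    """
    p_benign = np.array([predict_proba(x)[target_label] for x in benign_samples])                # P_b
    p_marked = np.array([predict_proba(generator(x))[target_label] for x in benign_samples])     # P_w

    _, p_value = ttest_rel(p_marked, p_benign + tau, alternative="greater")
    delta_p = float(np.mean(p_marked - p_benign))          # confidence score Delta P
    return delta_p, p_value
```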
2) Label-Only Verification: In this scenario, the defenders can only obtain predicted labels. As such, the only way to identify hidden backdoors is to examine whether the predicted label of watermarked samples (whose ground-truth label is not the target label) is the target label, as follows:

Proposition 2. Suppose $C(x)$ is the predicted label of $x$ generated by the suspicious model. Let variable $X$ denote the benign sample with non-targeted label and variable $X'$ be its watermarked version (i.e., $X' = G(X)$). Given the null hypothesis $H_0: C(X') \neq y_t$ ($H_1: C(X') = y_t$), where $y_t$ is the target label, we claim that the model is trained on the watermarked dataset if and only if $H_0$ is rejected.
Fig. 2: The examples of benign and watermarked images generated by BadNets and the blended attack on CIFAR-10 and
ImageNet dataset. The trigger areas are indicated in the red box.
Algorithm 2 Label-only dataset verification.
1: Input: benign dataset $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^{N}$, sampling number $m$, suspicious model $C$, poison generator $G$, target label $y_t$, alternative hypothesis $H_1$
2: Sample a subset $X = \{x_i \mid y_i \neq y_t\}_{i=1}^{m}$ from $\mathcal{D}$
3: Obtain the watermarked version of $X$ (i.e., $X'$) via $X' = \{G(x) \mid x \in X\}$
4: Obtain the predicted labels of $X'$ via $L = \{C(x) \mid x \in X'\}$
5: Calculate the p-value via WILCOXON-TEST($L$, $y_t$, $H_1$)
6: Output: the p-value

In practice, we randomly sample $m$ different benign samples with non-targeted label to conduct the Wilcoxon-test [41] and calculate its p-value. The null hypothesis $H_0$ is rejected if the p-value is smaller than the significance level $\alpha$. The main verification process is summarized in Algorithm 2. In particular, due to the mechanism of the Wilcoxon-test, we recommend that users set $y_t$ near $K/2$ under the label-only setting. If $y_t$ is too small or too large, our DVBW may fail to detect dataset stealing when the watermark success rate is not sufficiently high.
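The label-only case can be prototyped along the same lines. Since the exact construction of the Wilcoxon statistic depends on how the predicted labels are encoded, the sketch below instead uses a one-sided exact binomial test on the fraction of watermarked queries predicted as $y_t$ (against the $1/K$ chance level) as a simple, clearly-labeled stand-in for the test in Algorithm 2; `predict_label`, `generator`, and `num_classes` are assumed interfaces, and SciPy >= 1.7 is assumed for `binomtest`.

```python
from scipy.stats import binomtest

def label_only_verification(predict_label, benign_samples, generator,
                            target_label: int, num_classes: int) -> float:
    """Stand-in for Algorithm 2: test whether watermarked samples are predicted as
    y_t far more often than the 1/K chance level (one-sided exact binomial test)."""
    predictions = [predict_label(generator(x)) for x in benign_samples]
    hits = sum(int(label == target_label) for label in predictions)
    result = binomtest(hits, n=len(predictions), p=1.0 / num_classes, alternative="greater")
    return result.pvalue
```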
D. Theoretical Analysis of Dataset Verification
In this section, we provide some theoretical insights and
analyses to discuss under what conditions our dataset ver-
ification can succeed, i.e., reject the null hypothesis at the
significance level α. In this paper, we only provide the analysis of
probability-available dataset verification since its statistic is
directly related to the watermark success rate (WSR). In the
cases of label-only dataset verification, we can hardly build a
direct relationship between WSR and its statistic that requires
calculating the rank over all samples. We will further explore
its theoretical foundations in our future work.
Theorem 1. Let $f(x)$ be the posterior probability of $x$ predicted by the suspicious model, variable $X$ denote the benign sample with non-target label, and variable $X'$ be the watermarked version of $X$. Assume that $P_b \triangleq f(X)_{y_t} < \beta$. We claim that dataset owners can reject the null hypothesis $H_0$ of probability-available verification at the significance level $\alpha$ if the watermark success rate $W$ of $f$ satisfies

$$\sqrt{m-1} \cdot (W - \beta - \tau) - t_{1-\alpha} \cdot \sqrt{W - W^2} > 0, \qquad (2)$$

where $t_{1-\alpha}$ is the $(1-\alpha)$-quantile of the t-distribution with $(m-1)$ degrees of freedom and $m$ is the sample size of $X$.
In general, Theorem 1 indicates that (1) our probability-available dataset verification can succeed if the WSR of the suspicious model $f$ is higher than a threshold (which is not necessarily 100%), (2) dataset owners can claim the ownership with limited queries to $f$ if the WSR is high enough, and (3) dataset owners can decrease the significance level of dataset verification (i.e., $\alpha$) with more samples. In particular, the assumption of Theorem 1 can be easily satisfied by using benign samples that can be correctly classified with high confidence. Its proof is included in our appendix.
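For intuition, the short sketch below numerically finds the smallest WSR that satisfies Eq. (2) for a given sample size m, significance level α, and hyper-parameters β and τ; the concrete values in the example call are illustrative.

```python
import numpy as np
from scipy.stats import t

def minimal_wsr(m: int, alpha: float = 0.01, beta: float = 0.05, tau: float = 0.2) -> float:
    """Smallest watermark success rate W satisfying Eq. (2), found by grid search over [0, 1]."""
    t_quantile = t.ppf(1 - alpha, df=m - 1)        # (1 - alpha)-quantile of t with m-1 d.o.f.
    grid = np.linspace(0.0, 1.0, 100001)
    satisfied = np.sqrt(m - 1) * (grid - beta - tau) - t_quantile * np.sqrt(grid - grid**2) > 0
    return float(grid[satisfied][0]) if satisfied.any() else float("nan")

# Example: the WSR threshold for m = 100 verification queries (illustrative beta and tau).
print(minimal_wsr(m=100))
```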
V. EXPERIMENTS
In this section, we evaluate the effectiveness of our method
on different classification tasks and discuss its properties.
A. Evaluation Metrics
Metrics for Dataset Watermarking. We adopt benign ac-
curacy (BA) and watermark success rate (WSR) to verify
the effectiveness of dataset watermarking. Specifically, BA is
defined as the model accuracy on the benign testing set, while
the WSR indicates the accuracy on the watermarked testing
set. The higher the BA and WSR, the better the method.
Metrics for Dataset Verification. We adopt ∆P (∈ [-1, 1]) and the p-value (∈ [0, 1]) to evaluate probability-available dataset verification, and the p-value to evaluate label-only dataset verification. Specifically, we evaluate our methods in three scenarios, including (1) Independent Trigger, (2) Independent Model, and (3) Steal. In the first scenario, we validate the watermarked suspicious model using a trigger that is different from the one used in the training process; in the second scenario, we examine the benign suspicious model using the trigger pattern; in the last scenario, we use the trigger adopted in the training process of the watermarked suspicious model. In the first two scenarios, the model should not be regarded as trained on the protected dataset, and therefore the smaller the ∆P and the larger the p-value, the better the verification. In the last scenario, the suspicious model is trained on the protected dataset, and therefore the larger the ∆P and the smaller the p-value, the better the method.
TABLE I: The benign accuracy (BA, %) and watermark success rate (WSR, %) of dataset watermarking on CIFAR-10 and ImageNet. Each watermarking cell reports BA / WSR.

Dataset, Model | Standard (No Trigger) | BadNets (Line) | BadNets (Cross) | Blended (Line) | Blended (Cross)
CIFAR-10, ResNet | 92.13 | 91.93 / 99.66 | 91.92 / 100 | 91.34 / 94.93 | 91.55 / 99.99
CIFAR-10, VGG | 91.74 | 91.37 / 99.58 | 91.48 / 100 | 90.75 / 94.43 | 91.61 / 99.95
ImageNet, ResNet | 85.68 | 84.43 / 95.87 | 84.71 / 99.65 | 84.32 / 82.77 | 84.36 / 90.78
ImageNet, VGG | 89.15 | 89.03 / 97.58 | 88.88 / 99.99 | 88.92 / 89.37 | 88.57 / 96.83
TABLE II: The effectiveness (∆P and p-value) of probability-available dataset verification on CIFAR-10 and ImageNet. Each cell reports ∆P / p-value.

Dataset, Model, Scenario | BadNets (Line) | BadNets (Cross) | Blended (Line) | Blended (Cross)
CIFAR-10, ResNet, Independent Trigger | 10^-4 / 1 | 10^-4 / 1 | 10^-3 / 1 | 10^-3 / 1
CIFAR-10, ResNet, Independent Model | 10^-3 / 1 | 10^-5 / 1 | 10^-3 / 1 | 10^-4 / 1
CIFAR-10, ResNet, Steal | 0.98 / 10^-87 | 0.99 / 10^-132 | 0.93 / 10^-58 | 0.99 / 10^-103
CIFAR-10, VGG, Independent Trigger | 10^-5 / 1 | 10^-3 / 1 | 10^-3 / 1 | 10^-4 / 1
CIFAR-10, VGG, Independent Model | 10^-3 / 1 | 10^-3 / 1 | 10^-3 / 1 | 10^-5 / 1
CIFAR-10, VGG, Steal | 0.99 / 10^-133 | 0.98 / 10^-77 | 0.94 / 10^-56 | 0.99 / 10^-163
ImageNet, ResNet, Independent Trigger | 10^-4 / 1 | 10^-4 / 1 | 10^-3 / 1 | 10^-4 / 1
ImageNet, ResNet, Independent Model | 10^-4 / 1 | 10^-4 / 1 | 10^-5 / 1 | 10^-4 / 1
ImageNet, ResNet, Steal | 0.92 / 10^-54 | 0.98 / 10^-114 | 0.72 / 10^-23 | 0.85 / 10^-41
ImageNet, VGG, Independent Trigger | 10^-3 / 1 | 10^-4 / 1 | 10^-5 / 1 | 10^-6 / 1
ImageNet, VGG, Independent Model | 10^-6 / 1 | 10^-6 / 1 | 10^-8 / 1 | 10^-6 / 1
ImageNet, VGG, Steal | 0.97 / 10^-68 | 0.99 / 10^-181 | 0.86 / 10^-37 | 0.95 / 10^-67
TABLE III: The effectiveness (p-value) of label-only dataset verification on CIFAR-10 and ImageNet. Columns follow CIFAR-10 (BadNets: Line, Cross; Blended: Line, Cross) and then ImageNet (BadNets: Line, Cross; Blended: Line, Cross).

ResNet, Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
ResNet, Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
ResNet, Steal | 0 | 0 | 10^-30 | 0.014 | 0 | 0.016 | 10^-3
VGG, Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
VGG, Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
VGG, Steal | 0 | 0 | 10^-30 | 10^-30 | 0.018 | 10^-3
B. Main Results on Image Recognition
Dataset and DNN Selection. In this section, we conduct
experiments on CIFAR-10 [42] and (a subset of) ImageNet
[3] dataset with VGG-19 (with batch normalization) [43] and
ResNet-18 [44]. Specifically, following the settings in [17], we
randomly select a subset containing 200 classes (500 images
per class) from the original ImageNet dataset for training and
10,000 images for testing (50 images per class) for simplicity.
Settings for Dataset Watermarking. We adopt BadNets [16] and the blended attack (dubbed 'Blended') [32] with poisoning rate γ = 0.1. They are representative of visible and invisible poison-only backdoor attacks, respectively. The target label $y_t$ is set as half of the number of classes $K$ (i.e., '5' for CIFAR-10 and '100' for ImageNet). In the blended attack, the transparency is set as $\alpha \in \{0, 0.2\}^{C \times W \times H}$. Some examples of generated poisoned samples are shown in Figure 2.
Settings for Dataset Verification. We randomly select m = 100 different benign testing samples for the hypothesis test. For the probability-available verification, we set the certainty-related hyper-parameter τ as 0.2. In particular, we select samples only from the first 10 classes on ImageNet and only from the first two classes on CIFAR-10 for the label-only verification. This strategy reduces the side effects of randomness in the selection when the number of classes is relatively large. Otherwise, we would have to use a large m to obtain stable results, which is not efficient in practice.
Results. As shown in Table I, our watermarking method is harmless. The dataset watermarking decreases the benign accuracy by less than 2% in all cases (mostly less than 1%), compared with training on the benign dataset. In other words, it does not hinder normal dataset usage. Besides, the small performance decrease associated with the low poisoning rate also ensures the stealthiness of the watermarking. Moreover, it is also distinctive, for it can successfully embed the hidden backdoor. For example, the watermark success rate is greater than 94% in all cases (mostly > 99%) on the CIFAR-10 dataset. These results verify the effectiveness of our dataset watermarking. In particular, as shown in Tables II-III, our dataset verification is also effective. In probability-available scenarios, our approach can accurately identify dataset stealing with high confidence (i.e., ∆P ≫ 0 and p-value ≪ 0.01) while it does not misjudge when there is no stealing (i.e., ∆P is nearly 0 and p-value ≫ 0.05). Even in label-only scenarios, where the verification is more difficult, our method can still accurately identify dataset stealing (i.e., ∆P ≫ 0 and p-value < 0.05) in all cases and does not misjudge when there is no stealing. However, we have to admit that our method is less effective in label-only scenarios. We will further explore how to better conduct ownership verification under label-only scenarios in our future work.
[Figure 3 panels: (a) IMDB movie reviews and (b) DBpedia passages, showing sentence-level triggers ('time flies like an arrow' for Pattern 1 and 'every rose has its thorn' for Pattern 2) and word-level triggers ('Wikipedia' and 'Instagram') inserted into the text.]
Fig. 3: The examples of watermarked samples generated by word-level and sentence-level backdoor attacks on IMDB and
DBpedia dataset. The trigger patterns are marked in red.
TABLE IV: The benign accuracy (BA, %) and watermark success rate (WSR, %) of dataset watermarking on IMDB and DBpedia. Each watermarking cell reports BA / WSR.

Dataset, Model | Standard (No Trigger) | Word-Level (Word 1) | Word-Level (Word 2) | Sentence-Level (Sentence 1) | Sentence-Level (Sentence 2)
IMDB, LSTM | 85.48 | 83.31 / 99.90 | 83.67 / 99.82 | 85.10 / 99.80 | 85.07 / 99.98
IMDB, WordCNN | 87.71 | 87.09 / 100 | 87.71 / 100 | 87.48 / 100 | 87.96 / 100
DBpedia, LSTM | 96.99 | 97.01 / 99.91 | 97.06 / 99.89 | 96.73 / 99.93 | 96.99 / 99.99
DBpedia, WordCNN | 97.10 | 97.11 / 100 | 97.09 / 100 | 97.00 / 100 | 96.76 / 100
TABLE V: The effectiveness (∆P and p-value) of probability-available dataset verification on IMDB and DBpedia. Each cell reports ∆P / p-value.

Dataset, Model, Scenario | Word-Level (Word 1) | Word-Level (Word 2) | Sentence-Level (Sentence 1) | Sentence-Level (Sentence 2)
IMDB, LSTM, Independent Trigger | 10^-3 / 1 | 10^-3 / 1 | 10^-3 / 1 | 10^-4 / 1
IMDB, LSTM, Independent Model | 10^-3 / 1 | 10^-3 / 1 | 10^-2 / 1 | 10^-3 / 1
IMDB, LSTM, Steal | 0.90 / 10^-46 | 0.86 / 10^-39 | 0.90 / 10^-47 | 0.92 / 10^-49
IMDB, WordCNN, Independent Trigger | 10^-3 / 1 | 10^-3 / 1 | 10^-2 / 1 | 10^-3 / 1
IMDB, WordCNN, Independent Model | 10^-3 / 1 | 10^-3 / 1 | 10^-2 / 1 | 10^-4 / 1
IMDB, WordCNN, Steal | 0.92 / 10^-76 | 0.90 / 10^-70 | 0.86 / 10^-60 | 0.89 / 10^-67
DBpedia, LSTM, Independent Trigger | 10^-6 / 1 | 10^-4 / 1 | 10^-3 / 1 | 10^-3 / 1
DBpedia, LSTM, Independent Model | 10^-5 / 1 | 10^-5 / 1 | 10^-4 / 1 | 10^-4 / 1
DBpedia, LSTM, Steal | 0.99 / 10^-281 | 1 / 10^-216 | 1 / 10^-150 | 1 / 10^-172
DBpedia, WordCNN, Independent Trigger | 10^-4 / 1 | 10^-6 / 1 | 10^-5 / 1 | 10^-4 / 1
DBpedia, WordCNN, Independent Model | 10^-4 / 1 | 10^-4 / 1 | 10^-3 / 1 | 10^-3 / 1
DBpedia, WordCNN, Steal | 0.99 / 10^-180 | 0.99 / 10^-119 | 0.99 / 10^-148 | 0.99 / 10^-111
TABLE VI: The effectiveness (p-value) of label-only dataset verification on IMDB and DBpedia. Columns follow IMDB (Word-Level: Word 1, Word 2; Sentence-Level: Sentence 1, Sentence 2) and then DBpedia (same order).

LSTM, Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
LSTM, Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
LSTM, Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
WordCNN, Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
WordCNN, Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
WordCNN, Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
C. Main Results on Natural Language Processing
Dataset and DNN Selection. In this section, we conduct
experiments on the IMDB [45] and the DBpedia [46] dataset
with LSTM [47] and WordCNN [48]. Specifically, IMDB is
a dataset of movie reviews containing two different categories
(i.e., positive or negative) while DBpedia consists of the
extracted structured information from Wikipedia with 14 dif-
ferent categories. Besides, we pre-process the IMDB and DBpedia datasets following the settings in [49].
Settings for Dataset Watermarking. We adopt the backdoor attacks against NLP [49], [35] with poisoning rate γ = 0.1. Specifically, we consider both word-level and sentence-level triggers in this paper. Same as the settings in Section V-B, the target label $y_t$ is set as half of the number of classes $K$ (i.e., '1' for IMDB and '7' for DBpedia). Some examples of generated poisoned samples are shown in Figure 3.
Settings for Dataset Verification. Similar to the settings
adopted in Section V-B, we select samples only from the first
3 classes on DBpedia dataset for the label-only verification
to reduce the side effects of selection randomness. All other
settings are the same as those used in Section V-B.
Results. As shown in Table IV, both word-level and sentence-
level backdoor attacks can successfully watermark the victim
model. The watermark success rates are nearly 100% in
Fig. 4: The illustration of watermarked samples generated by graph backdoor attacks with sub-graph injection on the node
having minimal degree (dubbed as ’GBA-Minimal’) and with sub-graph injection on the random node (dubbed as ’GBA-
Random’). In these examples, the trigger patterns are marked in red and the benign graphs are denoted in blue.
TABLE VII: The benign accuracy (BA, %) and watermark success rate (WSR, %) of dataset watermarking on COLLAB and REDDIT-MULTI-5K. Each watermarking cell reports BA / WSR.

Dataset, Model | Standard (No Trigger) | GBA-Minimal (Sub-graph 1) | GBA-Minimal (Sub-graph 2) | GBA-Random (Sub-graph 1) | GBA-Random (Sub-graph 2)
COLLAB, GIN | 81.40 | 80.80 / 99.80 | 80.00 / 100 | 82.60 / 100 | 81.00 / 100
COLLAB, GraphSAGE | 78.60 | 77.60 / 99.60 | 80.40 / 100 | 79.40 / 99.40 | 79.00 / 100
REDDIT-MULTI-5K, GIN | 51.60 | 45.00 / 100 | 50.00 / 100 | 46.60 / 100 | 48.80 / 100
REDDIT-MULTI-5K, GraphSAGE | 44.80 | 44.60 / 99.80 | 43.60 / 100 | 47.80 / 99.80 | 45.00 / 100
TABLE VIII: The effectiveness (∆P and p-value) of probability-available dataset verification on COLLAB and REDDIT-MULTI-5K. Each cell reports ∆P / p-value.

Dataset, Model, Scenario | GBA-Minimal (Sub-graph 1) | GBA-Minimal (Sub-graph 2) | GBA-Random (Sub-graph 1) | GBA-Random (Sub-graph 2)
COLLAB, GIN, Independent Trigger | 10^-3 / 1 | 10^-3 / 1 | 10^-3 / 1 | 10^-2 / 1
COLLAB, GIN, Independent Model | 10^-3 / 1 | 10^-1 / 1 | 10^-3 / 1 | 10^-2 / 1
COLLAB, GIN, Steal | 0.84 / 10^-48 | 0.85 / 10^-48 | 0.86 / 10^-52 | 0.83 / 10^-43
COLLAB, GraphSAGE, Independent Trigger | 10^-2 / 1 | 10^-2 / 1 | 10^-2 / 1 | 10^-3 / 1
COLLAB, GraphSAGE, Independent Model | 10^-2 / 1 | 10^-2 / 1 | 10^-2 / 1 | 10^-3 / 1
COLLAB, GraphSAGE, Steal | 0.84 / 10^-47 | 0.92 / 10^-60 | 0.85 / 10^-50 | 0.88 / 10^-49
REDDIT-MULTI-5K, GIN, Independent Trigger | 10^-3 / 1 | 10^-2 / 1 | 10^-2 / 1 | 10^-4 / 1
REDDIT-MULTI-5K, GIN, Independent Model | 10^-3 / 1 | 10^-2 / 1 | 10^-3 / 1 | 10^-4 / 1
REDDIT-MULTI-5K, GIN, Steal | 0.96 / 10^-114 | 0.91 / 10^-64 | 1 / 10^-133 | 1 / 10^-138
REDDIT-MULTI-5K, GraphSAGE, Independent Trigger | 10^-2 / 1 | 10^-2 / 1 | 10^-2 / 1 | 10^-1 / 1
REDDIT-MULTI-5K, GraphSAGE, Independent Model | 10^-2 / 1 | 10^-2 / 1 | 10^-2 / 1 | 10^-3 / 1
REDDIT-MULTI-5K, GraphSAGE, Steal | 0.97 / 10^-89 | 0.97 / 10^-117 | 0.97 / 10^-98 | 0.96 / 10^-94
all cases. In particular, the decreases in benign accuracy compared with the model trained on the benign dataset are negligible (i.e., < 1%). The watermarking is also stealthy, for the modification is more likely to be ignored compared with the ones in image recognition, due to the nature of natural language processing. Besides, as shown in Tables V-VI, our dataset verification is also effective, no matter whether under probability-available or label-only scenarios. Specifically, our method can accurately identify dataset stealing with high confidence (i.e., ∆P ≫ 0 and p-value ≪ 0.01) while it does not misjudge when there is no stealing (i.e., ∆P is nearly 0 and p-value ≫ 0.05). These results verify the effectiveness of our defense method again.
D. Main Results on Graph Recognition
Dataset and GNN Selection. In this section, we conduct
experiments on COLLAB [50] and REDDIT-MULTI-5K [50]
with GIN [51] and GraphSAGE [52]. Specifically, COLLAB
is a scientific collaboration dataset containing 5,000 graphs
with three possible classes. In this dataset, each graph indi-
cates the ego network of a researcher, where the researchers
are nodes and an edge indicates collaboration between two
people; REDDIT-MULTI-5K is a relational dataset extracted
from Reddit (a popular content-aggregation website: https://www.reddit.com), which contains 5,000 graphs with five classes.
Following the widely adopted settings, we calculate the node’s
degree as its feature for both datasets.
Settings for Dataset Watermarking. In these experiments, we use graph backdoor attacks (GBA) [53], [54] for dataset watermarking with poisoning rate γ = 0.1. In GBA, the adversaries adopt sub-graphs as the trigger patterns, which are connected to a node of some selected benign graphs. Specifically, we consider two types of GBA, including (1) GBA with sub-graph injection on the node having minimal degree (dubbed 'GBA-Minimal') and (2) GBA with sub-graph injection on a random node (dubbed 'GBA-Random'). On both datasets, we adopt complete sub-graphs as trigger patterns. Specifically, on the COLLAB dataset, we adopt the ones with degree D = 14 and D = 15, respectively.
TABLE IX: The effectiveness (p-value) of label-only dataset verification on COLLAB and REDDIT-MULTI-5K. Columns follow COLLAB (GBA-Minimal: Sub-graph 1, Sub-graph 2; GBA-Random: Sub-graph 1, Sub-graph 2) and then REDDIT-MULTI-5K (same order).

GIN, Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
GIN, Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
GIN, Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
GraphSAGE, Independent Trigger | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
GraphSAGE, Independent Model | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
GraphSAGE, Steal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
[Figure 5 panels: (a) CIFAR-10, (b) ImageNet, (c) IMDB, (d) DBpedia, (e) COLLAB, (f) REDDIT-MULTI-5K; each plots benign accuracy (%) and watermark success rate (%) against the poisoning rate (%) for the two watermarking methods on that dataset.]
Fig. 5: The effects of the poisoning rate γ. The benign accuracy (BA) is denoted by the blue line while the watermark success
rate (WSR) is indicated by the red one. In many cases, the WSR was close to 100% even when we only poison 5% samples,
resulting in the two red lines overlapping to a large extent.
We exploit complete sub-graphs with degree D = 97 and D = 98 on the REDDIT-MULTI-5K dataset. The target label $y_t$ is set as the first class (i.e., $y_t = 1$ for both datasets). The illustration of generated poisoned samples is shown in Figure 4.
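To make the sub-graph injection concrete, the following NetworkX sketch attaches a complete sub-graph trigger to either the minimal-degree node (GBA-Minimal) or a random node (GBA-Random); it is a simplified illustration under these assumptions rather than the exact implementation of the cited graph backdoor attacks [53], [54].

```python
import random
import networkx as nx

def inject_subgraph_trigger(graph: nx.Graph, trigger_degree: int, minimal: bool = True) -> nx.Graph:
    """Attach a complete sub-graph trigger (each trigger node has degree `trigger_degree`
    inside the trigger, i.e., K_{D+1}) to one node of the benign graph.

    minimal=True attaches the trigger to the node with the smallest degree (GBA-Minimal);
    otherwise it is attached to a randomly chosen node (GBA-Random).
    """
    graph = nx.convert_node_labels_to_integers(graph)       # ensure integer node labels
    trigger = nx.complete_graph(trigger_degree + 1)         # complete graph K_{D+1}
    poisoned = nx.disjoint_union(graph, trigger)            # trigger nodes relabeled to the end

    if minimal:
        anchor = min(graph.degree, key=lambda nd: nd[1])[0]  # node with minimal degree
    else:
        anchor = random.choice(list(graph.nodes))            # random node

    first_trigger_node = graph.number_of_nodes()             # first node of the injected trigger
    poisoned.add_edge(anchor, first_trigger_node)            # connect trigger to the benign graph
    return poisoned
```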
Settings for Dataset Verification. In particular, we select samples only from the last class (i.e., '2' on COLLAB and '5' on REDDIT-MULTI-5K) for dataset verification. Besides, we adopt the complete sub-graph with half the degree (i.e., D = 7 on COLLAB and D = 48 on REDDIT-MULTI-5K) as the trigger pattern used in the 'Independent Trigger' scenarios. All other settings are the same as those used in Section V-B.
Results. As shown in Table VII, both GBA-Minimal and
GBA-Random can achieve a high watermark success rate
(WSR) and preserve high benign accuracy (BA). Specifically,
the WSRs are larger than 99.5% in all cases, and the decreases in BA compared with the model trained on the benign dataset are less than 1.5% on the COLLAB dataset. These
results verify the effectiveness of our dataset watermarking.
Moreover, as shown in Tables VIII-IX, our dataset verification is also effective, no matter whether under probability-available or label-only scenarios. Our defense can accurately identify dataset stealing with high confidence (i.e., ∆P ≫ 0 and p-value ≪ 0.01) while it does not misjudge when there is no stealing (i.e., ∆P is nearly 0 and p-value ≫ 0.05). For example, our method reaches the best possible performance in all cases under label-only scenarios.
E. Ablation Study
In this section, we study the effects of the core hyper-parameters contained in our DVBW, including the poisoning rate γ and the sampling number m. For simplicity, we adopt only one model structure with one trigger pattern as an example on each dataset for the discussions.
1) The Effects of the Poisoning Rate: As shown in Figure 5, the watermark success rate increases with the poisoning rate γ in all cases. These results indicate that defenders can improve the verification confidence by using a relatively large γ. In particular, almost all evaluated attacks reach a high watermark success rate even when the poisoning rate is small (e.g., 1%). In other words, our dataset watermarking is stealthy, as dataset owners only need to modify a few samples to succeed. However, the benign accuracy decreases as γ increases in most cases. In other words, there is a trade-off between the WSR and the BA to some extent. Defenders should choose γ based on their specific needs in practice.
2) The Effects of the Sampling Number: Recall that we need to select m different benign samples and generate their watermarked versions in our verification process.
TABLE X: The verification effectiveness (p-value) of our DVBW with different sampling numbers (20-140). In the Independent-T and Independent-M scenarios, the p-value is 1 for every dataset, method, and sampling number; the p-values in the Malicious scenario are listed below.

Dataset, Method | 20 | 40 | 60 | 80 | 100 | 120 | 140
CIFAR-10, BadNets | 10^-46 | 10^-50 | 10^-106 | 10^-117 | 10^-132 | 10^-136 | 10^-149
CIFAR-10, Blended | 10^-16 | 10^-29 | 10^-46 | 10^-67 | 10^-103 | 10^-102 | 10^-138
ImageNet, BadNets | 10^-34 | 10^-72 | 10^-69 | 10^-122 | 10^-144 | 10^-169 | 10^-195
ImageNet, Blended | 10^-12 | 10^-19 | 10^-29 | 10^-32 | 10^-41 | 10^-54 | 10^-67
IMDB, Word-Level | 10^-12 | 10^-16 | 10^-26 | 10^-36 | 10^-46 | 10^-56 | 10^-63
IMDB, Sentence-Level | 10^-12 | 10^-18 | 10^-25 | 10^-35 | 10^-47 | 10^-53 | 10^-61
DBpedia, Word-Level | 10^-89 | 10^-186 | 10^-224 | 10^-226 | 10^-281 | 10^-296 | 0
DBpedia, Sentence-Level | 10^-55 | 10^-117 | 10^-181 | 10^-182 | 10^-150 | 10^-185 | 10^-220
COLLAB, GBA-Minimal | 10^-14 | 10^-26 | 10^-31 | 10^-41 | 10^-48 | 10^-58 | 10^-70
COLLAB, GBA-Random | 10^-15 | 10^-29 | 10^-29 | 10^-37 | 10^-43 | 10^-53 | 10^-64
REDDIT-MULTI-5K, GBA-Minimal | 10^-27 | 10^-51 | 10^-64 | 10^-87 | 10^-64 | 10^-131 | 10^-147
REDDIT-MULTI-5K, GBA-Random | 10^-33 | 10^-59 | 10^-85 | 10^-112 | 10^-138 | 10^-119 | 10^-133
[Figure 6 panels: (a) CIFAR-10, (b) ImageNet, (c) IMDB, (d) DBpedia, (e) COLLAB, (f) REDDIT-MULTI-5K; each plots the watermark success rate (%) against the fine-tuning epoch for the two watermarking methods on that dataset.]
Fig. 6: The resistance of our DVBW to fine-tuning on six different datasets.
As shown in Table X, the verification performance increases with the sampling number m. These results are expected since our method can achieve a promising WSR. In general, the larger the m, the smaller the adverse effects of the randomness involved in the verification and therefore the higher the confidence. However, we also need to notice that a larger m means more queries to the model API, which is costly and probably suspicious.
F. The Resistance to Potential Adaptive Attacks
In this section, we discuss the resistance of our DVBW to
three representative backdoor removal methods, including fine-
tuning [55], model pruning [56], and anti-backdoor learning
[57]. These methods were initially used in image classification
but can be directly generalized to other classification tasks
(e.g., graph recognition) as well. Unless otherwise specified,
we use only one model structure with one trigger pattern as
[Figure 7 panels: (a) CIFAR-10, (b) ImageNet, (c) IMDB, (d) DBpedia, (e) COLLAB, (f) REDDIT-MULTI-5K; each plots benign accuracy (%) and watermark success rate (%) against the pruning rate (%) for the two watermarking methods on that dataset.]
Fig. 7: The resistance of our DVBW to model pruning on six different datasets.
an example on each dataset for the discussions. We implement these removal methods based on the code of an open-sourced backdoor toolbox [58] (i.e., BackdoorBox).
The Resistance to Fine-tuning. Following the classical set-
tings, we adopt 10% benign samples from the original training
set to fine-tune fully-connected layers of the watermarked
model. In each case, we set the learning rate as the one used in
the last training epoch of the victim model. As shown in Figure
6, the watermark success rate (WSR) generally decreases with
the increase of tuning epochs. However, even on the ImageNet
dataset where fine-tuning is most effective, the WSR is still
larger than 60% after the fine-tuning process is finished. In
most cases, fine-tuning has only minor effects in reducing
WSR. These results indicate that our DVBW is resistant to
model fine-tuning to some extent.
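For reference, the following is a minimal PyTorch sketch of the fine-tuning adaptive attack evaluated above; it assumes a torchvision-style model whose classifier head is stored in `model.fc`, and the hyper-parameters are placeholders rather than the exact values used in our experiments.

```python
import torch
import torch.nn as nn

def finetune_fc_only(model: nn.Module, loader, epochs: int = 30, lr: float = 0.01) -> nn.Module:
    """Fine-tune only the fully-connected head on a small benign subset,
    keeping all other (feature-extraction) parameters frozen."""
    for param in model.parameters():
        param.requires_grad = False
    for param in model.fc.parameters():          # assumes the classifier head is `model.fc`
        param.requires_grad = True

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)

    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```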
The Resistance to Model Pruning. Following the classical
settings, we adopt 10% benign samples from the original
training set to prune the latent representation (i.e., inputs
of the fully-connected layers) of the watermarked model. In
each case, the pruning rate is set to {0%,2%,·· ·98%}. As
shown in Figure 7, pruning may significantly decrease the
watermark success rate (WSR), especially when the pruning
rate is nearly 100%. However, its effects are with the huge
sacrifice of benign accuracy (BA). These decreases in BA
are unacceptable in practice since they will hinder standard
model functionality. Accordingly, our DVBW is also resistant
to model pruning to some extent. An interesting phenomenon
is that the WSR even increases near the end of the pruning
process in some cases. We speculate that it is probably because
backdoor-related neurons and benign ones are competitive and
the effects of benign neurons are already eliminated near the
end. We will further discuss its mechanism in our future work.
https://github.com/THUYimingLi/BackdoorBox
TABLE XI: The computational complexity of dataset watermarking and dataset verification in our DVBW. Specifically, γ is the poisoning rate, N is the number of training samples, and m is the sampling number.

Dataset Watermarking | Dataset Verification (Single Mode) | Dataset Verification (Batch Mode)
O(γ·N) | O(m) | O(1)
The Resistance to Anti-backdoor Learning. In general,
anti-backdoor learning (ABL) intends to detect and unlearn
poisoned samples during the training process of DNNs. Ac-
cordingly, whether ABL can successfully find watermarked
samples is critical for its effectiveness. In these experiments,
we provide the results of detection rates and isolation rates
on different datasets. Specifically, the detection rate is defined
as the proportion of poisoned samples that were isolated from
all training samples, while the isolation rate denotes the ratio
of isolated samples over all training samples. As shown in
Figure 8, ABL can successfully detect watermarked samples
on both CIFAR-10 and ImageNet datasets. However, it fails in
detecting watermarked samples on other datasets with different
modalities (i.e., texts and graphs). We will further explore how
to design more stealthy dataset watermarks that can circumvent
the detection of ABL across all modalities in our future work.
G. The Analysis of Computational Complexity
In this section, we analyze the computational complexity
of our DVBW. Specifically, we discuss the computational
complexity of dataset watermarking and dataset verification
of our DVBW (as summarized in Table XI).
1) The Complexity of Dataset Watermarking: Let N denote the number of all training samples and γ the poisoning rate. Since our DVBW only needs to watermark a few selected samples in this step, its computational complexity is
Fig. 8: The resistance of our DVBW to anti-backdoor learning on six different datasets. Each panel plots the detection rate (%) against the isolation rate (%): (a) CIFAR-10 and (b) ImageNet with BadNets and Blended watermarks, (c) IMDB and (d) DBpedia with sentence-level and word-level watermarks, and (e) COLLAB and (f) REDDIT-MULTI-5K with GBA-Random and GBA-Minimal watermarks.
O(γ·N). In general, these watermarks only replace or insert a small part of each selected sample, which is highly efficient. Accordingly, our dataset watermarking is also efficient. Note that this step is performed offline by the defender and does not involve the adversaries. As such, it is acceptable even if this step is relatively time-consuming.
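The following sketch illustrates why the cost is O(γ·N) for a BadNets-style image watermark: only a γ-fraction of the N samples is touched, and each modification replaces a small region of the sample. It assumes an image dataset stored as NumPy arrays and is only an illustration, not the released watermarking code.

import numpy as np

def watermark_dataset(images, labels, trigger, target_label, gamma, seed=0):
    """images: (N, H, W, C) uint8 array; trigger: (h, w, C) patch; gamma: poisoning rate."""
    rng = np.random.default_rng(seed)
    num_samples = len(images)
    num_watermarked = int(gamma * num_samples)          # only gamma * N samples are touched
    selected = rng.choice(num_samples, size=num_watermarked, replace=False)

    wm_images, wm_labels = images.copy(), labels.copy()
    patch_h, patch_w = trigger.shape[:2]
    for idx in selected:
        # Stamp the trigger onto the lower-right corner and relabel the sample.
        wm_images[idx, -patch_h:, -patch_w:, :] = trigger
        wm_labels[idx] = target_label
    return wm_images, wm_labels, selected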
2) The Complexity of Dataset Verification: In this step,
defenders need to query the (deployed) suspicious model
with m samples and conduct the hypothesis test based on
their predictions. In general, there are two classical predic-
tion modes, including (1) single mode and (2) batch mode.
Specifically, under the single mode, the suspicious model can
only predict one sample at a time while it can predict a batch
of samples simultaneously under the batch mode. Accordingly,
the computational complexity of single mode and batch mode
is O(m) and O(1), respectively. Note that this step is also efficient under either the single or the batch mode, since predicting one sample is usually inexpensive.
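The two query modes can be sketched as follows, assuming the deployed suspicious model is reachable through a generic predict() API (a placeholder name): the single mode issues m separate queries, while the batch mode issues one request containing all m watermarked samples.

import numpy as np

def query_single_mode(predict, watermarked_samples):
    # One query per sample: m round-trips in total, i.e., O(m).
    return np.array([predict(sample[None, ...]) for sample in watermarked_samples]).squeeze()

def query_batch_mode(predict, watermarked_samples):
    # All m samples are submitted in a single request, i.e., O(1) round-trips.
    return predict(np.stack(watermarked_samples))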
VI. RELATION WITH MODEL OWNERSHIP VERIFICATION
We notice and admit that the dataset ownership verification
defined in this paper is closely related to the model ownership
verification (MOV) [61], [63], [64], [59], [60], [62]. In general,
model ownership verification intends to identify whether a
suspicious third-party model (instead of the dataset) is stolen
from the victim for unauthorized adoption. In this section, we
discuss their similarities and differences. We summarize the
characteristics of MOV and the task of our dataset owner-
ship verification in Table XII. The comparisons between our
DVBW and representative MOV methods are in Table XIII.
Firstly, our DVBW enjoys some similarities to MOV in
the watermarking processes. Specifically, backdoor attacks are
also widely used to watermark the victim model in MOV.
However, defenders in MOV usually need to manipulate the
training process (e.g., adding some additional regularization
terms [65] or supportive modules [60]), since they can fully
control the training process of the victim model. In contrast,
in our dataset ownership verification, the defender can only
modify the dataset while having no information or access to
the model training process and therefore we can only use
poison-only backdoor attacks for dataset watermarking. In
other words, defenders in DVBW have significantly fewer capabilities than those in MOV. This allows our method to also be adopted for model copyright protection, whereas their approaches may not be directly used in our task.
Besides, both our defense and most of the existing MOV methods exploit hypothesis tests in their verification processes. However, our DVBW considers the black-box verification scenario, where defenders can only query the suspicious models to obtain their predictions, whereas many MOV methods (e.g., [59]) consider the white-box verification scenario in which defenders can obtain the source files of suspicious models. Even under the black-box setting, existing MOV methods only consider probability-available cases, while our DVBW also covers label-only ones.
VII. CONCLUSION
In this paper, we explored how to protect valuable re-
leased datasets. Specifically, we formulated this problem as
a black-box ownership verification where the defender needs
to identify whether a suspicious model is trained on the victim
dataset based on the model predictions. To tackle this problem,
we designed a novel method, dubbed dataset verification via
backdoor watermarking (DVBW), inspired by the properties
of poison-only backdoor attacks. DVBW contained two main
steps, including dataset watermarking and dataset verification.
Specifically, we exploited poison-only backdoor attacks for
TABLE XII: The defender's capacities in model ownership verification and dataset ownership verification.

  Task \ Capacity                  | Training Samples | Training Schedule | Intermediate Results of Victim Model | Predictions of Victim Model
  Model Ownership Verification     | accessible       | accessible        | partly accessible                    | accessible
  Dataset Ownership Verification   | accessible       | inaccessible      | inaccessible                         | accessible

  Note: "partly accessible" means that the item is accessible for defenders under the white-box setting, while it is inaccessible under the black-box setting.
TABLE XIII: The comparisons between our DVBW and four representative methods in model ownership verification. In each scenario, a method is marked with a checkmark if it can be applied.

  Scenarios: Embedding-free (1) | Multimodality (2) | White-box (3) | Black-box (4) | Probability-available | Label-only
  Methods: MOVE [59], DIMW [60], CEM [61], NRF [62], and DVBW (ours)

  (1) Embedding-free: defenders do not need to implant any additional parts or functionalities (e.g., backdoor) in the victim model.
  (2) Multimodality: defenders can use the method across different types of data (e.g., images, texts, and graphs).
  (3) White-box: defenders can access the source files of suspicious models.
  (4) Black-box: defenders can only query suspicious models.
dataset watermarking and designed a hypothesis-test-guided
method for dataset verification. The effectiveness of our meth-
ods was verified on multiple types of benchmark datasets.
ACKNOWLEDGMENTS
This work was mostly done when Yiming Li was a
research intern at Ant Group. This work is supported in
part by the National Key R&D Program of China un-
der Grant 2022YFB3105000, the National Natural Science
Foundation of China under Grants (62171248, 62202393,
12141108), the Shenzhen Science and Technology Program
(JCYJ20220818101012025), the Sichuan Science and Tech-
nology Program under Grant 2023NSFSC1394, the PCNL
Key Project (PCL2021A07), and the Shenzhen Science and
Technology Innovation Commission (Research Center for
Computer Network (Shenzhen) Ministry of Education). We
also sincerely thank Ziqi Zhang from Tsinghua University
for her assistance in some preliminary experiments and Dr.
Baoyuan Wu from CUHK-Shenzhen for his helpful comments
on an early draft of this paper.
APPENDIX
Theorem 1. Let $f(x)$ be the posterior probability of $x$ predicted by the suspicious model, let the variable $X'$ denote a benign sample with a non-target label, and let the variable $\hat{X}'$ denote the watermarked version of $X'$. Assume that $P_b \triangleq f(X')_{y_t} < \beta$. We claim that dataset owners can reject the null hypothesis $H_0$ of probability-available verification at the significance level $\alpha$ if the watermark success rate $W$ of $f$ satisfies

$\sqrt{m-1} \cdot (W - \beta - \tau) - t_{1-\alpha} \cdot \sqrt{W - W^2} > 0$,  (1)

where $t_{1-\alpha}$ is the $(1-\alpha)$-quantile of the t-distribution with $(m-1)$ degrees of freedom and $m$ is the sample size of $\hat{X}'$.

Proof. Since $P_b \triangleq f(X')_{y_t} < \beta$, the original hypotheses $H_0$ and $H_1$ can be converted to

$H_0': P_w < \beta + \tau$,  (2)
$H_1': P_w > \beta + \tau$.  (3)

Let $E$ indicate the event that the suspicious model $f$ predicts a poisoned sample as the target label $y_t$. As such,

$E \sim B(1, p)$,  (4)

where $p = \Pr(C(\hat{X}') = y_t)$ denotes the backdoor success probability and $B$ is the Binomial distribution [41].

Let $\hat{x}_1, \cdots, \hat{x}_m$ denote the $m$ poisoned samples used for dataset verification and $E_1, \cdots, E_m$ denote their prediction events. The watermark success rate $W$ then satisfies

$W = \frac{1}{m} \sum_{i=1}^{m} E_i$,  (5)
$W \sim \frac{1}{m} B(m, p)$.  (6)

According to the central limit theorem [41], the watermark success rate $W$ follows the Gaussian distribution $\mathcal{N}(p, \frac{p(1-p)}{m})$ when $m$ is sufficiently large. Similarly, $(P_w - \beta - \tau)$ also follows a Gaussian distribution. As such, we can construct the t-statistic

$T \triangleq \frac{\sqrt{m}(W - \beta - \tau)}{s} \sim t(m-1)$,  (7)

where $s$ is the standard deviation of $(W - \beta - \tau)$ and $W$, i.e.,

$s^2 = \frac{1}{m-1} \sum_{i=1}^{m} (E_i - W)^2 = \frac{1}{m-1}(m \cdot W - m \cdot W^2)$.  (8)

To reject the hypothesis $H_0'$ at the significance level $\alpha$, we need to ensure that

$\frac{\sqrt{m}(W - \beta - \tau)}{s} > t_{1-\alpha}$,  (9)

where $t_{1-\alpha}$ is the $(1-\alpha)$-quantile of the t-distribution with $(m-1)$ degrees of freedom.

Combining equations (8) and (9), we have

$\sqrt{m-1} \cdot (W - \beta - \tau) - t_{1-\alpha} \cdot \sqrt{W - W^2} > 0$.  (10)
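For completeness, the decision rule implied by Theorem 1 can be implemented as sketched below (our own illustration assuming SciPy is available, not the released verification code): it estimates the watermark success rate W from the hard-label predictions on the m verification samples and checks the closed-form condition in equation (1).

import numpy as np
from scipy import stats

def verify_dataset_ownership(predicted_labels, target_label, beta, tau, alpha=0.01):
    """Return (reject_H0, wsr): reject_H0 is True when dataset ownership is claimed."""
    predicted_labels = np.asarray(predicted_labels)
    m = len(predicted_labels)
    # Watermark success rate: fraction of verification samples predicted as the target label.
    wsr = float(np.mean(predicted_labels == target_label))
    t_quantile = stats.t.ppf(1 - alpha, df=m - 1)   # (1 - alpha)-quantile with m - 1 dof
    statistic = np.sqrt(m - 1) * (wsr - beta - tau) - t_quantile * np.sqrt(wsr - wsr ** 2)
    return statistic > 0, wsr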
REFERENCES
[1] X. Wu, R. He, Z. Sun, and T. Tan, A light cnn for deep face
representation with noisy labels,” IEEE Transactions on Information
Forensics and Security, vol. 13, no. 11, pp. 2884–2896, 2018.
[2] Q. Yin, J. Feng, J. Lu, and J. Zhou, “Joint estimation of pose and singular
points of fingerprints,” IEEE Transactions on Information Forensics and
Security, vol. 16, pp. 1467–1479, 2020.
[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet:
A large-scale hierarchical image database,” in CVPR, 2009.
[4] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in
the wild,” in ICCV, 2015.
[5] J. Ni, J. Li, and J. McAuley, “Justifying recommendations using
distantly-labeled reviews and fine-grained aspects, in EMNLP, 2019.
[6] P. Voigt and A. Von dem Bussche, “The eu general data protection regu-
lation (gdpr),” A Practical Guide, 1st Ed., Cham: Springer International
Publishing, vol. 10, no. 3152676, pp. 10–5555, 2017.
[7] S. Wang, J. Zhou, J. K. Liu, J. Yu, J. Chen, and W. Xie, An efficient file
hierarchy attribute-based encryption scheme in cloud computing,” IEEE
Transactions on Information Forensics and Security, vol. 11, no. 6, pp.
1265–1277, 2016.
[8] J. Li, Q. Yu, and Y. Zhang, “Hierarchical attribute based encryption
with continuous leakage-resilience,” Information Sciences, vol. 484, pp.
113–134, 2019.
[9] H. Deng, Z. Qin, Q. Wu, Z. Guan, R. H. Deng, Y. Wang, and
Y. Zhou, “Identity-based encryption transformation for flexible sharing
of encrypted data in public cloud,” IEEE Transactions on Information
Forensics and Security, vol. 15, pp. 3168–3180, 2020.
[10] S. Haddad, G. Coatrieux, A. Moreau-Gaudry, and M. Cozic, “Joint
watermarking-encryption-jpeg-ls for medical image reliability control in
encrypted and compressed domains,” IEEE Transactions on Information
Forensics and Security, vol. 15, pp. 2556–2569, 2020.
[11] R. Wang, F. Juefei-Xu, M. Luo, Y. Liu, and L. Wang, “Faketagger: Ro-
bust safeguards against deepfake dissemination via provenance tracking,
in ACM MM, 2021.
[12] Z. Guan, J. Jing, X. Deng, M. Xu, L. Jiang, Z. Zhang, and Y. Li,
“Deepmih: Deep invertible network for multiple image hiding, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 2022.
[13] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. Quek,
and H. V. Poor, “Federated learning with differential privacy: Algorithms
and performance analysis,” IEEE Transactions on Information Forensics
and Security, vol. 15, pp. 3454–3469, 2020.
[14] L. Zhu, X. Liu, Y. Li, X. Yang, S.-T. Xia, and R. Lu, A fine-grained
differentially private federated learning against leakage from gradients,
IEEE Internet of Things Journal, 2021.
[15] J. Bai, Y. Li, J. Li, X. Yang, Y. Jiang, and S.-T. Xia, “Multinomial
random forest,” Pattern Recognition, vol. 122, p. 108331, 2022.
[16] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “Badnets: Evaluating
backdooring attacks on deep neural networks,” IEEE Access, vol. 7,
pp. 47230–47244, 2019.
[17] Y. Li, Y. Li, B. Wu, L. Li, R. He, and S. Lyu, “Invisible backdoor attack
with sample-specific triggers,” in ICCV, 2021.
[18] A. Nguyen and A. Tran, “Wanet–imperceptible warping-based backdoor
attack,” in ICLR, 2021.
[19] Z. Xiong, Z. Cai, Q. Han, A. Alrawais, and W. Li, “Adgan: protect
your location privacy in camera data of auto-driving vehicles,” IEEE
Transactions on Industrial Informatics, vol. 17, no. 9, pp. 6200–6210,
2020.
[20] Y. Li, P. Liu, Y. Jiang, and S.-T. Xia, “Visual privacy protection via
mapping distortion,” in ICASSP, 2021.
[21] H. Xu, Z. Cai, D. Takabi, and W. Li, “Audio-visual autoencoding for
privacy-preserving video streaming, IEEE Internet of Things Journal,
2021.
[22] C. Dwork, “Differential privacy: A survey of results,” in TAMC, 2008.
[23] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,
IEEE Transactions on Neural Networks and Learning Systems, 2022.
[24] X. Qi, T. Xie, Y. Li, S. Mahloujifar, and P. Mittal, “Revisiting the
assumption of latent separability for backdoor defenses,” in ICLR, 2023.
[25] Y. Gao, Y. Li, L. Zhu, D. Wu, Y. Jiang, and S.-T. Xia, “Not all samples
are born equal: Towards effective clean-label backdoor attacks, Pattern
Recognition, p. 109512, 2023.
[26] S. Li, M. Xue, B. Zhao, H. Zhu, and X. Zhang, “Invisible backdoor
attacks on deep neural networks via steganography and regularization,
IEEE Transactions on Dependable and Secure Computing, 2020.
[27] Y. Li, H. Zhong, X. Ma, Y. Jiang, and S.-T. Xia, “Few-shot backdoor
attacks on visual object tracking,” in ICLR, 2022.
[28] I. Shumailov, Z. Shumaylov, D. Kazhdan, Y. Zhao, N. Papernot, M. A.
Erdogdu, and R. Anderson, “Manipulating sgd with data ordering
attacks,” in NeurIPS, 2021.
[29] A. S. Rakin, Z. He, and D. Fan, “Tbt: Targeted neural network attack
with bit trojan,” in CVPR, 2020.
[30] R. Tang, M. Du, N. Liu, F. Yang, and X. Hu, An embarrassingly simple
approach for trojan attack in deep neural networks,” in SIGKDD, 2020.
[31] J. Bai, K. Gao, D. Gong, S.-T. Xia, Z. Li, and W. Liu, “Hardly
perceptible trojan attack against neural networks with bit flips,” in
ECCV, 2022.
[32] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor
attacks on deep learning systems using data poisoning,” arXiv preprint
arXiv:1712.05526, 2017.
[33] Y. Li, T. Zhai, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor attack in the
physical world,” in ICLR Workshop, 2021.
[34] Z. Zhang, L. Lyu, W. Wang, L. Sun, and X. Sun, “How to inject
backdoors with better consistency: Logit anchoring on clean data,” in
ICLR, 2022.
[35] X. Chen, A. Salem, D. Chen, M. Backes, S. Ma, Q. Shen, Z. Wu, and
Y. Zhang, “Badnl: Backdoor attacks against nlp models with semantic-
preserving improvements, in ACSAC, 2021.
[36] Y. Wang, E. Sarkar, W. Li, M. Maniatakos, and S. E. Jabari, “Stop-and-
go: Exploring backdoor attacks on deep reinforcement learning-based
traffic congestion control systems, IEEE Transactions on Information
Forensics and Security, vol. 16, pp. 4772–4787, 2021.
[37] T. Zhai, Y. Li, Z. Zhang, B. Wu, Y. Jiang, and S.-T. Xia, “Backdoor
attack against speaker verification, in ICASSP, 2021.
[38] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, A survey of convolutional
neural networks: analysis, applications, and prospects,” IEEE Transac-
tions on Neural Networks and Learning Systems, 2021.
[39] K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao,
C. Xu, Y. Xu, Z. Yang, Y. Zhang, and D. Tao, “A survey on vision
transformer, IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2022.
[40] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, A
comprehensive survey on graph neural networks, IEEE transactions on
neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020.
[41] R. V. Hogg, J. McKean, and A. T. Craig, Introduction to mathematical
statistics. Pearson Education, 2005.
[42] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features
from tiny images,” Citeseer, Tech. Rep., 2009.
[43] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” in ICLR, 2015.
[44] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in CVPR, 2016.
[45] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts,
“Learning word vectors for sentiment analysis, in ACL, 2011.
[46] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives,
“DBpedia: A nucleus for a web of open data,” in ISWC, 2007.
[47] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[48] Y. Chen, “Convolutional neural network for sentence classification, in
EMNLP, 2014.
[49] J. Dai, C. Chen, and Y. Li, “A backdoor attack against lstm-based text
classification systems,” IEEE Access, vol. 7, pp. 138872–138878, 2019.
[50] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in KDD, 2015.
[51] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph
neural networks?” in ICLR, 2018.
[52] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation
learning on large graphs,” NeurIPS, 2017.
[53] Z. Xi, R. Pang, S. Ji, and T. Wang, “Graph backdoor,” in USENIX
Security, 2021.
[54] Z. Zhang, J. Jia, B. Wang, and N. Z. Gong, “Backdoor attacks to graph
neural networks,” in SACMAT, 2021.
[55] Y. Liu, Y. Xie, and A. Srivastava, “Neural trojans, in ICCD, 2017.
[56] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against
backdooring attacks on deep neural networks,” in RAID, 2018.
[57] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, “Anti-backdoor
learning: Training clean models on poisoned data, in NeurIPS, 2021.
[58] Y. Li, M. Ya, Y. Bai, Y. Jiang, and S.-T. Xia, “Backdoorbox: A python
toolbox for backdoor learning,” in ICLR Workshop, 2023.
[59] Y. Li, L. Zhu, X. Jia, Y. Jiang, S.-T. Xia, and X. Cao, “Defending against
model stealing via verifying embedded external features, in AAAI, 2022.
[60] J. Zhang, D. Chen, J. Liao, W. Zhang, H. Feng, G. Hua, and N. Yu,
“Deep model intellectual property protection via deep watermarking,”
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 44, no. 8, pp. 4005–4020, 2022.
[61] N. Lukas, Y. Zhang, and F. Kerschbaum, “Deep neural network finger-
printing by conferrable adversarial examples, in ICLR, 2021.
[62] Y. Zheng, S. Wang, and C.-H. Chang, “A dnn fingerprint for non-
repudiable model ownership identification and piracy detection, IEEE
Transactions on Information Forensics and Security, vol. 17, pp. 2977–
2989, 2022.
[63] W. Guo, B. Tondi, and M. Barni, “Masterface watermarking for ipr
protection of siamese network for face verification, in IWDW, 2021.
[64] J. Xu and S. Picek, “Watermarking graph neural networks based on
backdoor attacks,” arXiv preprint arXiv:2110.11024, 2021.
[65] H. Jia, C. A. Choquette-Choo, V. Chandrasekaran, and N. Papernot,
“Entangled watermarks as a defense against model extraction, in
USENIX Security, 2021.
Yiming Li is currently a Ph.D. candidate from
Tsinghua-Berkeley Shenzhen Institute, Tsinghua
Shenzhen International Graduate School, Tsinghua
University. Before that, he received his B.S. de-
gree in Mathematics and Applied Mathematics from
Ningbo University in 2018. His research interests
are in the domain of AI security, especially back-
door learning, adversarial learning, data privacy, and
copyright protection in AI. His research has been
published in multiple top-tier conferences and jour-
nals, such as ICLR, NeurIPS, ICCV, IEEE TNNLS,
and PR Journal. He served as the senior program committee member of AAAI
2022, the program committee member of ICML, NeurIPS, ICLR, etc., and
the reviewer of IEEE TPAMI, IEEE TIFS, IEEE TDSC, etc.
Mingyan Zhu received his B.S. degree in Computer
Science and Technology from Harbin Institute of
Technology, China, in 2020. He is currently pursuing
the Ph.D. degree at Tsinghua Shenzhen International Graduate School, Tsinghua University. His research interests are in the domain of low-level computer vision and AI security.
Dr. Xue Yang received a Ph.D. degree in infor-
mation and communication engineering from South-
west Jiaotong University, China, in 2019. She was a
visiting student at the Faculty of Computer Science,
University of New Brunswick, Canada, from 2017 to
2018. She was a postdoctoral fellow with Tsinghua
University. She is currently a research associate with
the School of Information Science and Technology,
Southwest Jiaotong University, China. Her research
interests include data security and privacy, applied
cryptography, and federated learning.
Dr. Yong Jiang received his M.S. and Ph.D. de-
grees in computer science from Tsinghua University,
China, in 1998 and 2002, respectively. Since 2002,
he has been with the Tsinghua Shenzhen Inter-
national Graduate School of Tsinghua University,
Guangdong, China, where he is currently a full
professor. His research interests include computer
vision, machine learning, Internet architecture and
its protocols, IP routing technology, etc. He has
received several best paper awards (e.g., IWQoS
2018) from top-tier conferences, and his research has been published in multiple top-tier journals and conferences, including
IEEE ToC, IEEE TMM, IEEE TSP, CVPR, ICLR, etc.
Dr. Wei Tao received the B.S. and Ph.D. degrees
from Peking University, China, in 1997 and 2007, re-
spectively. He is currently the Vice President at Ant
Group, in charge of its foundational security. He is
also an Adjunct Professor at Peking University. For
more than 20 years, he has been committed to mak-
ing complex systems more secure and reliable. His
work has helped Windows, Android, iOS and other
operating systems improve their security capabilities.
He also led the development of many famous secu-
rity open-sourced projects such as Mesatee/Teaclave,
MesaLink TLS, OpenRASP, Advbox Adversarial Toolbox, etc. His research has been published in multiple top-tier journals and conferences, including
IEEE TDSC, IEEE TIFS, IEEE S&P, USENIX Security, etc.
Dr. Shu-Tao Xia received the B.S. degree in mathe-
matics and the Ph.D. degree in applied mathematics
from Nankai University, Tianjin, China, in 1992 and
1997, respectively. Since January 2004, he has been
with the Tsinghua Shenzhen International Graduate
School of Tsinghua University, Guangdong, China,
where he is currently a full professor. From Septem-
ber 1997 to March 1998 and from August to Septem-
ber 1998, he visited the Department of Information
Engineering, The Chinese University of Hong Kong,
Hong Kong. His research interests include coding
and information theory, machine learning, and deep learning. His research has been published in multiple top-tier journals and conferences, including
IEEE TIP, IEEE TNNLS, CVPR, ICCV, ECCV, ICLR, etc.