Guanhao Gan’s research while affiliated with Tsinghua University and other places


Publications (4)


Figure 1. Performance of models in the vicinity of the watermarked model in the parameter space. d_FT is the fine-tuning direction and d_adv is the adversarial direction; black dot: the original watermarked model; red star: the model after fine-tuning.
Figure 5. Results with various magnitudes ε. Dashed lines of the same color show the performance when ε = 0. Left: before attacks; right: after attacks.
Figure 6. Results of our method and other baselines with various architectures against the FT attack. Our method consistently improves watermark robustness.
Figure 17. t-SNE visualization of the vanilla watermarked model along the adversarial direction.
The effect of the two components in our method.

Towards Robust Model Watermark via Reducing Parametric Vulnerability
  • Preprint
  • File available

September 2023 · 34 Reads

Guanhao Gan · [...] · Shu-Tao Xia

Deep neural networks are valuable assets considering their commercial benefits and the huge demand for costly annotation and computation resources. To protect the copyright of DNNs, backdoor-based ownership verification has recently become popular: the model owner watermarks the model by embedding a specific backdoor behavior before releasing it. The defenders (usually the model owners) can then identify whether a suspicious third-party model is "stolen" from them based on the presence of that behavior. Unfortunately, these watermarks are proven to be vulnerable to removal attacks, even ones as simple as fine-tuning. To further explore this vulnerability, we investigate the parameter space and find that there exist many watermark-removed models in the vicinity of the watermarked one, which can easily be exploited by removal attacks. Inspired by this finding, we propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior. Extensive experiments demonstrate that our method improves the robustness of model watermarking against parametric changes and numerous watermark-removal attacks. The code for reproducing our main experiments is available at https://github.com/GuanhaoGan/robust-model-watermarking.
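
As a rough illustration of the mini-max formulation described in the abstract, the sketch below performs one training step in which an inner maximization perturbs the parameters toward a nearby watermark-removed model (ascending the loss on the trigger set within an ε-ball), and an outer minimization then updates the original weights so the watermark also holds at that perturbed point. This is only a minimal sketch under assumed names (minimax_watermark_step, clean_batch, trigger_batch, the per-tensor ε scaling), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def minimax_watermark_step(model, optimizer, clean_batch, trigger_batch, epsilon=0.05):
    x_c, y_c = clean_batch    # ordinary training samples and labels
    x_t, y_t = trigger_batch  # watermark trigger samples with their target labels

    params = [p for p in model.parameters() if p.requires_grad]

    # Inner maximization: step toward a nearby watermark-removed model by
    # ascending the loss on the trigger set within a per-tensor epsilon-ball.
    loss_wm = F.cross_entropy(model(x_t), y_t)
    grads = torch.autograd.grad(loss_wm, params)
    deltas = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            delta = epsilon * g / (g.norm() + 1e-12)
            p.add_(delta)
            deltas.append(delta)

    # Outer minimization: compute gradients at the perturbed point so the update
    # recovers the watermark (and clean accuracy) even for nearby models.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_c), y_c) + F.cross_entropy(model(x_t), y_t)
    loss.backward()

    # Undo the parameter perturbation, then apply the outer update.
    with torch.no_grad():
        for p, delta in zip(params, deltas):
            p.sub_(delta)
    optimizer.step()
    return loss.item()
```

In a training loop, this step would simply replace the usual forward/backward pass, with the trigger set drawn from the owner's watermark samples.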


Towards Robust Model Watermark via Reducing Parametric Vulnerability

August 2023 · 144 Reads · 6 Citations

Deep neural networks are valuable assets considering their commercial benefits and the huge demand for costly annotation and computation resources. To protect the copyright of DNNs, backdoor-based ownership verification has recently become popular: the model owner watermarks the model by embedding a specific backdoor behavior before releasing it. The defenders (usually the model owners) can then identify whether a suspicious third-party model is "stolen" from them based on the presence of that behavior. Unfortunately, these watermarks are proven to be vulnerable to removal attacks, even ones as simple as fine-tuning. To further explore this vulnerability, we investigate the parameter space and find that there exist many watermark-removed models in the vicinity of the watermarked one, which can easily be exploited by removal attacks. Inspired by this finding, we propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior. Extensive experiments demonstrate that our method improves the robustness of model watermarking against parametric changes and numerous watermark-removal attacks. The code for reproducing our main experiments is available at https://github.com/GuanhaoGan/robust-model-watermarking.


On the Effectiveness of Adversarial Training Against Backdoor Attacks

June 2023 · 10 Reads · 17 Citations

IEEE Transactions on Neural Networks and Learning Systems

Yinghua Gao · [...] · Jingfeng Zhang · [...] · Masashi Sugiyama

Although adversarial training (AT) is regarded as a potential defense against backdoor attacks, AT and its variants have so far yielded only unsatisfactory results, or have even strengthened backdoor attacks instead. The large discrepancy between expectation and reality motivates us to thoroughly evaluate the effectiveness of AT against backdoor attacks across various settings for both AT and the backdoor attacks. We find that the type and budget of the perturbations used in AT matter, and that AT with common perturbations is only effective against certain backdoor trigger patterns. Based on these empirical findings, we present some practical suggestions for backdoor defense, including relaxed adversarial perturbation and composite AT. This work not only boosts our confidence in AT's ability to defend against backdoor attacks but also provides important insights for future research.
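
One way to picture the "relaxed perturbation" and "composite AT" suggestions is the sketch below, which mixes adversarial-training configurations of different budgets per batch (a "relaxed" entry simply uses a larger budget) on top of plain L∞ PGD. The configuration values and names (AT_CONFIGS, composite_at_step) are illustrative assumptions, not the paper's exact settings.

```python
import random
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, alpha, steps):
    """Craft L-infinity PGD adversarial examples for training."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

# Each entry: (name, epsilon, step size, number of PGD steps).
# The "relaxed" entry simply allows a larger budget than the standard one.
AT_CONFIGS = [
    ("standard", 8 / 255, 2 / 255, 10),
    ("relaxed", 16 / 255, 4 / 255, 10),
]

def composite_at_step(model, optimizer, x, y):
    # Sample one perturbation configuration per batch so training sees a mix.
    _, eps, alpha, steps = random.choice(AT_CONFIGS)
    x_adv = pgd_linf(model, x, y, eps, alpha, steps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```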


On the Effectiveness of Adversarial Training against Backdoor Attacks

February 2022 · 47 Reads

DNNs' demand for massive amounts of data forces practitioners to collect data from the Internet without careful checks, since thorough vetting is prohibitively costly; this brings potential risks of backdoor attacks. A backdoored model always predicts a target class in the presence of a predefined trigger pattern, which can easily be realized by poisoning a small amount of training data. In general, adversarial training is believed to defend against backdoor attacks, since it helps models keep their predictions unchanged even when the input image is perturbed (as long as the perturbation stays within a feasible range). Unfortunately, few previous studies have succeeded in doing so. To explore whether adversarial training can defend against backdoor attacks, we conduct extensive experiments across different threat models and perturbation budgets, and find that the threat model used in adversarial training matters. For instance, adversarial training with spatial adversarial examples provides notable robustness against commonly used patch-based backdoor attacks. We further propose a hybrid strategy that provides satisfactory robustness across different backdoor attacks.
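
To make "adversarial training with spatial adversarial examples" concrete, here is a minimal worst-of-k sketch in the spirit of Engstrom et al.'s spatial attacks: each image gets k random rotation/translation candidates and the one maximizing the loss is kept for the training step. The function name, k, and the transform ranges are illustrative assumptions, not the exact setup evaluated in the paper.

```python
import math
import torch
import torch.nn.functional as F

def spatial_adv_examples(model, x, y, k=10, max_rot_deg=30.0, max_trans=0.1):
    """Worst-of-k random rotation/translation attack (spatial adversarial examples)."""
    n = x.size(0)
    best_x = x.clone()
    best_loss = torch.full((n,), -float("inf"), device=x.device)
    for _ in range(k):
        # Sample a random rotation angle and (normalized) translation per image.
        angle = (torch.rand(n, device=x.device) * 2 - 1) * math.radians(max_rot_deg)
        tx = (torch.rand(n, device=x.device) * 2 - 1) * max_trans
        ty = (torch.rand(n, device=x.device) * 2 - 1) * max_trans
        cos, sin = torch.cos(angle), torch.sin(angle)
        theta = torch.stack(
            [torch.stack([cos, -sin, tx], dim=1),
             torch.stack([sin, cos, ty], dim=1)],
            dim=1)  # (n, 2, 3) affine matrices
        grid = F.affine_grid(theta, list(x.size()), align_corners=False)
        x_t = F.grid_sample(x, grid, align_corners=False)
        with torch.no_grad():
            loss = F.cross_entropy(model(x_t), y, reduction="none")
        better = loss > best_loss
        best_loss = torch.where(better, loss, best_loss)
        best_x[better] = x_t[better]
    return best_x
```

During training, the returned batch would replace (or be mixed with) the clean batch in the usual cross-entropy update, which is roughly what a hybrid strategy combining spatial and pixel-level perturbations would build on.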

Citations (2)


... Recently, proof-of-training (PoT) [5][6][7][8] has been proposed as a promising solution for model ownership verification, also known as proof-of-learning or provenance-of-training. Differing from other schemes that verify the features of the target model (e.g., watermarking [9][10][11][12][13] and fingerprinting [14][15][16][17]), PoT schemes examine the knowledge of the training record to distinguish the model owner from attackers. Typically, a training record consists of training data, training algorithms, and a training trajectory from the initial model state to the final model state. ...

Reference:

Towards Understanding and Enhancing Security of Proof-of-Training for DNN Model Ownership Verification
Towards Robust Model Watermark via Reducing Parametric Vulnerability

... 14) Adversarial Training against Poisoning Attack: a) Adversarial Training against Backdoor Attack: Gao et al. [537] evaluate the effectiveness of adversarial training against backdoor attacks across various settings. They show that the type and budget of perturbations used in AT are crucial. ...

On the Effectiveness of Adversarial Training Against Backdoor Attacks
  • Citing Article
  • June 2023

IEEE Transactions on Neural Networks and Learning Systems