Gaoyang Wei’s scientific contributions


Publications (1)


[Figure 1. The system model of dataset watermarking.]
[Figure 2. The workflow of the proposed method. (a) Trigger Optimization: iterative optimization of triggers through a surrogate model. (b) Dataset Watermark Embedding: embedding the optimized triggers into randomly selected samples of the watermark target class.]
[Table. Execution time of the watermarking process.]
[Table. Experimental results on effectiveness with different methods.]
[Table. Average LPIPS values for different methods.]
Clean-Label Backdoor Watermarking for Dataset Copyright Protection via Trigger Optimization
  • Article
  • Full-text available

November 2024 · 26 Reads · Symmetry

Weitong Chen · Gaoyang Wei · Xin Xu · [...] · Yingchen She

High-quality datasets are essential for training high-performance models, yet collecting, cleaning, and labeling them is costly; datasets are therefore regarded as valuable intellectual property. However, when security mechanisms exhibit symmetry-breaking flaws that create exploitable vulnerabilities, unauthorized use or data leakage can infringe on the copyright of dataset owners. In this study, we design a clean-label dataset watermarking method based on trigger optimization, aiming to protect the copyright of the dataset from infringement. We first iteratively optimize the trigger on a surrogate model, with target-class samples guiding the updates; this ensures that the optimized triggers contain robust feature representations of the watermark target class. A watermarked dataset is obtained by embedding the optimized triggers into randomly selected samples of the watermark target class. If an adversary trains a model on the watermarked dataset, our watermark manipulates the model's output, so by observing the output of a suspect model on samples carrying the trigger, it can be determined whether the model was trained on the watermarked dataset. The experimental results demonstrate that the proposed method exhibits high imperceptibility and strong robustness against pruning and fine-tuning attacks, and significantly improves effectiveness over existing methods at very low watermarking rates.
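The abstract describes two procedures: optimizing a trigger on a surrogate model and stamping it onto target-class samples, then verifying ownership by querying a suspect model on triggered inputs. The following is a minimal PyTorch sketch of the first step, assuming an additive universal trigger constrained to an ε-ball for imperceptibility; the function names, ε budget, and 1% watermarking rate are illustrative assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def optimize_trigger(surrogate, target_loader, shape=(3, 32, 32),
                     eps=8 / 255, steps=50, lr=0.01, device="cpu"):
    """Iteratively optimize a universal additive trigger on a frozen
    surrogate model so that it carries robust features of the
    watermark target class (a PGD-style loop; details are assumed)."""
    surrogate = surrogate.to(device).eval()
    for p in surrogate.parameters():            # the surrogate is frozen;
        p.requires_grad_(False)                 # only the trigger is updated
    trigger = torch.zeros(1, *shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    for _ in range(steps):
        for x, y in target_loader:              # batches of target-class samples
            x, y = x.to(device), y.to(device)
            logits = surrogate((x + trigger).clamp(0.0, 1.0))
            loss = F.cross_entropy(logits, y)   # pull triggered inputs toward the target class
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():               # keep the perturbation imperceptible
                trigger.clamp_(-eps, eps)
    return trigger.detach()

def embed_watermark(images, trigger, rate=0.01):
    """Clean-label embedding: stamp the trigger onto a small random
    subset of target-class images; their labels are left untouched."""
    n = max(1, int(rate * len(images)))
    idx = torch.randperm(len(images))[:n]
    marked = images.clone()
    marked[idx] = (marked[idx] + trigger.to(images.device)).clamp(0.0, 1.0)
    return marked, idx
```

Because only target-class samples are modified and no labels are flipped, the embedding is clean-label, which is what makes the watermark hard to spot by manual inspection. For the verification step, a simple hit-rate test can be sketched as follows (reusing the imports above; the 0.5 threshold is an illustrative assumption, and the paper's actual decision rule may differ, e.g., a formal hypothesis test):

```python
@torch.no_grad()
def verify_ownership(suspect, probe_images, trigger, target_label,
                     threshold=0.5, device="cpu"):
    """Stamp the trigger onto probe images drawn from non-target classes
    and measure how often the suspect model predicts the target class;
    a high hit rate suggests the model was trained on the watermarked
    dataset."""
    suspect = suspect.to(device).eval()
    x = (probe_images.to(device) + trigger.to(device)).clamp(0.0, 1.0)
    preds = suspect(x).argmax(dim=1)
    hit_rate = (preds == target_label).float().mean().item()
    return hit_rate >= threshold, hit_rate
```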
