Peizhuo Lv’s research while affiliated with Chinese Academy of Sciences and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (13)


[Figure previews: Figure 1: Workflow of RAG-WM · Figure 5: Impact of Number of Watermark Tuples · Figure 6: Impact of Number of Injected Texts per Watermark Tuple · Figure 9: Knowledge Graph Distillation Attack · Watermark Verification · +5 more]

RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models
  • Preprint
  • File available

January 2025 · 3 Reads

Peizhuo Lv · Mengjie Sun · Hao Wang · [...] · Limin Sun

In recent years, tremendous success has been witnessed in Retrieval-Augmented Generation (RAG), which is widely used to enhance Large Language Models (LLMs) in domain-specific, knowledge-intensive, and privacy-sensitive tasks. However, attackers may steal those valuable RAGs and deploy or commercialize them, making it essential to detect Intellectual Property (IP) infringement. Most existing ownership protection solutions, such as watermarks, are designed for relational databases and texts. They cannot be directly applied to RAGs because relational database watermarks require white-box access to detect IP infringement, which is unrealistic for the knowledge base in RAGs. Meanwhile, post-processing by the adversary's deployed LLMs typically destroys text watermark information. To address these problems, we propose a novel black-box "knowledge watermark" approach, named RAG-WM, to detect IP infringement of RAGs. RAG-WM uses a multi-LLM interaction framework, comprising a Watermark Generator, Shadow LLM & RAG, and Watermark Discriminator, to create watermark texts based on watermark entity-relationship tuples and inject them into the target RAG. We evaluate RAG-WM across three domain-specific and two privacy-sensitive tasks on four benchmark LLMs. Experimental results show that RAG-WM effectively detects the stolen RAGs in various deployed LLMs. Furthermore, RAG-WM is robust against paraphrasing, unrelated content removal, knowledge insertion, and knowledge expansion attacks. Lastly, RAG-WM can also evade watermark detection approaches, highlighting its promising application in detecting IP infringement of RAG systems.
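The inject-then-query workflow the abstract describes can be pictured with a minimal sketch. Everything below is an illustrative assumption, not the paper's code: the names (WatermarkTuple, inject_watermark, verify_ownership), the template-based stand-in for the Watermark Generator, the keyword-overlap retriever, and the substring check standing in for the Watermark Discriminator.

```python
# Minimal sketch of a black-box "knowledge watermark" workflow in the spirit of
# RAG-WM. All names and the toy retriever below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class WatermarkTuple:
    entity_a: str
    relation: str
    entity_b: str


def generate_watermark_texts(tuples, n_texts_per_tuple=3):
    """Stand-in for the Watermark Generator LLM: turn each entity-relationship
    tuple into several natural-language watermark texts."""
    texts = []
    for t in tuples:
        for i in range(n_texts_per_tuple):
            texts.append(f"Note {i}: {t.entity_a} {t.relation} {t.entity_b}.")
    return texts


def inject_watermark(knowledge_base, watermark_texts):
    """Inject watermark texts into the owner's RAG knowledge base."""
    return knowledge_base + watermark_texts


def retrieve(query, knowledge_base, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(knowledge_base, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]


def verify_ownership(suspect_kb, tuples, threshold=0.5):
    """Query the suspect RAG about each watermark tuple; a stand-in Watermark
    Discriminator (a substring check) decides whether the response leaks the
    planted relation."""
    hits = 0
    for t in tuples:
        query = f"What is the relation between {t.entity_a} and {t.entity_b}?"
        answer = " ".join(retrieve(query, suspect_kb))
        if t.relation in answer:
            hits += 1
    return hits / len(tuples) >= threshold


if __name__ == "__main__":
    kb = ["Paris is the capital of France.", "Water boils at 100 degrees Celsius."]
    wm_tuples = [WatermarkTuple("Alice Zhang", "founded", "the Hufu Institute")]
    stolen_kb = inject_watermark(kb, generate_watermark_texts(wm_tuples))
    print("Watermark detected:", verify_ownership(stolen_kb, wm_tuples))
```

In the actual setting, the retriever and answer generation would be the suspect's deployed RAG and LLM queried purely through their API, which is what makes the verification black-box.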




DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models

March 2024 · 3 Reads · 2 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

Dataset sanitization is a widely adopted proactive defense against poisoning-based backdoor attacks, aimed at filtering out and removing poisoned samples from training datasets. However, existing methods have shown limited efficacy in countering ever-evolving trigger functions and often lead to considerable degradation of benign accuracy. In this paper, we propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets. We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning poisoned samples into benign ones. Specifically, with multiple iterations of the forward and reverse process, we extract intermediary images and their predicted labels for each sample in the original dataset. Then, we identify anomalous samples based on the label transitions of their intermediary images, detect the target label by quantifying distribution discrepancy, select purified images considering pixel and feature distance, and determine their ground-truth labels by training a benign model. Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy, surpassing the performance of baseline defense methods.
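A rough sketch of the purification loop the abstract outlines, under loose assumptions: the denoiser and classifier are placeholder callables standing in for the diffusion model and the benign model, and the anomaly rule shown (any label transition across the intermediary images) is a simplification of the paper's detection criteria.

```python
# Sketch of a DataElixir-style purification loop: repeatedly noise and denoise
# each sample, record the labels predicted for the intermediary images, and
# flag samples whose labels transition as potentially poisoned.
import numpy as np


def purify_dataset(images, labels, classifier, denoiser, n_rounds=5, noise_std=0.3):
    suspicious = []
    for idx, (x, y) in enumerate(zip(images, labels)):
        intermediate_labels = []
        x_t = x
        for _ in range(n_rounds):
            # Forward process: perturb the current image with Gaussian noise.
            x_noisy = x_t + np.random.normal(0.0, noise_std, size=x_t.shape)
            # Reverse process: the (placeholder) diffusion model restores benign features.
            x_t = denoiser(x_noisy)
            intermediate_labels.append(classifier(x_t))
        # A label transition away from the assigned label suggests the sample
        # carried a trigger that the diffusion rounds removed.
        if any(pred != y for pred in intermediate_labels):
            suspicious.append(idx)
    return suspicious


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = [rng.random((8, 8)) for _ in range(4)]
    labels = [0, 1, 0, 1]
    identity_denoiser = lambda x: np.clip(x, 0.0, 1.0)   # placeholder reverse process
    mean_classifier = lambda x: int(x.mean() > 0.5)      # placeholder benign classifier
    print("Flagged indices:", purify_dataset(images, labels, mean_classifier, identity_denoiser))
```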



A Robustness-Assured White-Box Watermark in Neural Networks

November 2023 · 35 Reads · 21 Citations

IEEE Transactions on Dependable and Secure Computing

Recently, stealing highly valuable and large-scale deep neural network (DNN) models has become pervasive. The stolen models may be re-commercialized, e.g., deployed in embedded devices, released in model markets, utilized in competitions, etc., which infringes the Intellectual Property (IP) of the original owner. Detecting IP infringement of the stolen models is quite challenging, even with white-box access to them in the above scenarios, since they may have undergone fine-tuning, pruning, or functionality-equivalent adjustment to destroy any embedded watermark. Furthermore, adversaries may also attempt to extract the embedded watermark or forge a similar watermark to falsely claim ownership. In this paper, we propose a novel DNN watermarking solution, named HufuNet, to detect IP infringement of DNN models against the above-mentioned attacks. Furthermore, HufuNet is the first solution theoretically proven to guarantee robustness against fine-tuning attacks. We evaluate HufuNet rigorously on four benchmark datasets with five popular DNN models, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The experiments and analysis demonstrate that HufuNet is highly robust against model fine-tuning/pruning, transfer learning, kernel cutoff/supplement, functionality-equivalent attacks, and fraudulent ownership claims, and is thus highly promising for protecting large-scale DNN models in the real world.
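HufuNet's actual construction (an encoder-decoder "hufu" embedded in the model's kernels) is not reproduced here; the sketch below only illustrates the general white-box idea the abstract relies on: embed a secret pattern at key-derived parameter positions and verify ownership by the match rate after the model has been modified. All function names, the embedding rule, and the thresholds are assumptions.

```python
# Generic white-box parameter-watermark sketch (not HufuNet's exact scheme):
# key-derived positions inside a weight tensor carry signed watermark bits,
# and verification checks how many survive after fine-tuning-like perturbation.
import hashlib
import numpy as np


def _positions(secret_key, n_weights, n_bits):
    """Derive deterministic, key-dependent embedding positions."""
    seed = int(hashlib.sha256(secret_key.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.choice(n_weights, size=n_bits, replace=False)


def embed(weights, secret_key, watermark_bits, strength=0.05):
    flat = weights.ravel().copy()
    pos = _positions(secret_key, flat.size, len(watermark_bits))
    flat[pos] = np.where(np.array(watermark_bits) == 1, strength, -strength)
    return flat.reshape(weights.shape)


def verify(weights, secret_key, watermark_bits, threshold=0.9):
    flat = weights.ravel()
    pos = _positions(secret_key, flat.size, len(watermark_bits))
    extracted = (flat[pos] > 0).astype(int)
    match = (extracted == np.array(watermark_bits)).mean()
    return match >= threshold


if __name__ == "__main__":
    w = np.random.randn(64, 64) * 0.01
    bits = list(np.random.randint(0, 2, size=128))
    w_marked = embed(w, "owner-secret", bits)
    # Mild parameter drift standing in for fine-tuning.
    w_finetuned = w_marked + np.random.normal(0, 0.005, w_marked.shape)
    print("Ownership verified:", verify(w_finetuned, "owner-secret", bits))
```

The design choice being illustrated is that the watermark positions are reproducible only with the owner's secret key, so an adversary who perturbs parameters uniformly must distort the whole model to remove the mark.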


Invisible Backdoor Attacks Using Data Poisoning in Frequency Domain

September 2023 · 71 Reads · 9 Citations

Backdoor attacks have become a significant threat to deep neural networks (DNNs), whereby poisoned models perform well on benign samples but produce incorrect outputs when given specific inputs containing a trigger. These attacks are usually implemented through data poisoning, i.e., injecting poisoned samples (samples patched with a trigger and mislabelled to the target label) into the dataset, so that models trained on that dataset are infected with the backdoor. However, most current backdoor attacks lack stealthiness and robustness because of their fixed trigger patterns and mislabelling, which humans or backdoor defense approaches can easily detect. To address this issue, we propose a frequency-domain-based backdoor attack method that implants a backdoor without mislabeling the poisoned samples or accessing the training process. We evaluated our approach on four benchmark datasets and two popular scenarios: no-label self-supervised learning and clean-label supervised learning. The experimental results demonstrate that our approach achieves a high attack success rate (above 90%) on all tasks without significant performance degradation on the main tasks, while remaining robust against mainstream defense approaches.
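As a toy illustration of a frequency-domain trigger, the sketch below perturbs a few coefficients of an image's 2-D FFT and returns to pixel space. The paper's transform, channel selection, and magnitudes are not reproduced; the frequency bins and magnitude here are arbitrary choices, used only to show why such a trigger stays visually small.

```python
# Toy frequency-domain trigger: a small, fixed perturbation of a few spectrum
# coefficients that is spread thinly across all pixels after the inverse FFT.
import numpy as np


def add_frequency_trigger(image, bins=((5, 7), (11, 3)), magnitude=8.0):
    """Perturb a few mid-frequency coefficients, then return to pixel space.
    The per-pixel change is on the order of magnitude / (H * W), so the
    poisoned image stays visually close to the original."""
    spectrum = np.fft.fft2(image)
    for (u, v) in bins:
        spectrum[u, v] += magnitude
        spectrum[-u, -v] += magnitude  # mirror bin keeps the result (approximately) real
    poisoned = np.real(np.fft.ifft2(spectrum))
    return np.clip(poisoned, 0.0, 1.0)


if __name__ == "__main__":
    clean = np.random.rand(32, 32)          # stand-in for a grayscale image in [0, 1]
    poisoned = add_frequency_trigger(clean)
    print("max pixel change:", np.abs(poisoned - clean).max())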



A Novel Membership Inference Attack against Dynamic Neural Networks by Utilizing Policy Networks Information

October 2022 · 4 Reads

Unlike traditional static deep neural networks (DNNs), dynamic neural networks (NNs) adjust their structures or parameters to different inputs to guarantee accuracy and computational efficiency, and they have recently become an emerging research area in deep learning. Although traditional static DNNs are vulnerable to the membership inference attack (MIA), which aims to infer whether a particular data point was used to train the model, little is known about how such an attack performs on dynamic NNs. In this paper, we propose a novel MIA against dynamic NNs that leverages their unique policy-network mechanism to increase the effectiveness of membership inference. We conducted extensive experiments using two dynamic NNs, i.e., GaterNet and BlockDrop, on four mainstream image classification tasks, i.e., CIFAR-10, CIFAR-100, STL-10, and GTSRB. The evaluation results demonstrate that control-flow information can significantly promote the MIA. Based on backbone fine-tuning and information fusion, our method achieves better results than the baseline attack and the traditional attack that uses intermediate information.
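The information-fusion step can be sketched as follows, with the caveat that everything here is a stand-in: the confidences and policy-network gating decisions are synthetic, and the logistic-regression attack model and feature layout are illustrative choices rather than the paper's pipeline.

```python
# Sketch of fusing output confidences with a dynamic network's control-flow
# (policy-network gating) decisions to train a binary membership-inference
# attack model on shadow-model data.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fuse_features(confidences, gating_decisions):
    """Concatenate posterior confidences with per-block execution decisions."""
    return np.concatenate([confidences, gating_decisions], axis=1)


def train_attack_model(member_feats, nonmember_feats):
    X = np.vstack([member_feats, nonmember_feats])
    y = np.concatenate([np.ones(len(member_feats)), np.zeros(len(nonmember_feats))])
    return LogisticRegression(max_iter=1000).fit(X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: members get sharper confidences and more skipped
    # blocks than non-members, mimicking the signal the attack exploits.
    conf_m, conf_n = rng.dirichlet([0.2] * 10, 500), rng.dirichlet([0.5] * 10, 500)
    gate_m, gate_n = rng.binomial(1, 0.7, (500, 8)), rng.binomial(1, 0.5, (500, 8))
    attack = train_attack_model(fuse_features(conf_m, gate_m), fuse_features(conf_n, gate_n))
    score = attack.predict_proba(fuse_features(conf_n[:1], gate_n[:1]))[0, 1]
    print("membership score for one query:", score)
```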


SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-supervised Learning

September 2022 · 21 Reads

Recent years have witnessed significant success in Self-Supervised Learning (SSL), which facilitates various downstream tasks. However, attackers may steal such SSL models and commercialize them for profit, making it crucial to protect their Intellectual Property (IP). Most existing IP protection solutions are designed for supervised learning models and cannot be used directly, since they require that the models' downstream tasks and target labels be known and available during watermark embedding, which is not always possible in the domain of SSL. To address this problem, especially when downstream tasks are diverse and unknown during watermark embedding, we propose a novel black-box watermarking solution, named SSL-WM, for protecting the ownership of SSL models. SSL-WM maps watermarked inputs by the watermarked encoders into an invariant representation space, which causes any downstream classifier to produce expected behavior, thus allowing the detection of embedded watermarks. We evaluate SSL-WM on numerous tasks, such as Computer Vision (CV) and Natural Language Processing (NLP), using different SSL models, including contrastive-based and generative-based ones. Experimental results demonstrate that SSL-WM can effectively verify the ownership of stolen SSL models in various downstream tasks. Furthermore, SSL-WM is robust against model fine-tuning and pruning attacks. Lastly, SSL-WM can also evade detection from the evaluated watermark detection approaches, demonstrating its promising application in protecting the IP of SSL models.
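A hedged sketch of the black-box verification idea described above: because the watermarked encoder maps trigger inputs into a tight (invariant) region, any downstream classifier built on it responds to them with unusually concentrated predictions, and ownership is claimed when that concentration clearly exceeds the level on clean inputs. The concentration statistic, margin, and toy downstream model are illustrative assumptions, not the paper's exact test.

```python
# SSL-WM-style black-box verification sketch: compare prediction concentration
# on watermarked versus clean inputs when querying a suspect downstream model.
import numpy as np


def prediction_concentration(predicted_labels, n_classes):
    """Fraction of queries landing in the single most frequent class."""
    counts = np.bincount(predicted_labels, minlength=n_classes)
    return counts.max() / counts.sum()


def verify_ssl_watermark(downstream_model, watermark_inputs, clean_inputs,
                         n_classes, margin=0.3):
    wm_conc = prediction_concentration(downstream_model(watermark_inputs), n_classes)
    clean_conc = prediction_concentration(downstream_model(clean_inputs), n_classes)
    return (wm_conc - clean_conc) > margin


if __name__ == "__main__":
    # Toy "downstream model": a fixed class for watermark-like (large-norm)
    # inputs, a random class otherwise.
    rng = np.random.default_rng(1)
    model = lambda X: np.where(np.linalg.norm(X, axis=1) > 5.0, 3,
                               rng.integers(0, 10, len(X)))
    wm = rng.normal(2.0, 0.1, (200, 16))     # trigger inputs
    clean = rng.normal(0.0, 1.0, (200, 16))
    print("Watermark verified:", verify_ssl_watermark(model, wm, clean, n_classes=10))
```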


Citations (6)


... To mitigate the risk of model extraction attacks, a variety of defense mechanisms have been proposed, including model extraction detection [15,18], prediction perturbation [17,20,27,37], and model watermarking [8,14,21,25,36]. Compared to the other two passive defenses, model watermarking aims to implant triggerable watermarks into the stolen model, allowing the model owner to assert ownership by activating the watermarks during inference. ...

Reference:

Neural Honeytrace: A Robust Plug-and-Play Watermarking Framework against Model Extraction Attacks
MEA-Defender: A Robust Watermark against Model Extraction Attack
  • Citing Conference Paper
  • May 2024

... Finally, they detected whether the explanation results belonged to trigger patterns based on one key intuition and mitigated backdoor attacks by removing such patterns. Zhou et al. [163] proposed DATAELIXIR, a novel purification method designed specifically to cleanse poisoned datasets. It utilizes diffusion models to eliminate trigger features and restore benign features, effectively transforming poisoned samples into benign ones. ...

DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... Namba et al. [30] use a set of sample-label pairs to embed a backdoor-like watermark in the target model and verify the ownership by activating the backdoor in the stolen model. The subsequent method SSL-WM [24] migrated this method to protect pretrained models by injecting task-agnostic backdoors. Nevertheless, these methods are ineffective against model extraction attacks, because the watermarks fail to transmit from protected models to stolen models. ...

SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-Supervised Learning
  • Citing Conference Paper
  • January 2024

... For trigger generations, most attacks (Chen et al. 2017; Steinhardt, Koh, and Liang 2017) rely on fixed triggers, and several recent methods (Nguyen and Tran 2020; Liu et al. 2020) extend it to be sample-specific. For the trigger domain, several works (Zeng et al. 2021; Wang et al. 2022; Yue et al. 2022) consider the trigger in the frequency domain due to its advantages (Cheng et al. 2023). FTrojan blockifies images and adds the trigger in the DCT domain, but it selects two fixed channels with fixed magnitudes. ...

Invisible Backdoor Attacks Using Data Poisoning in Frequency Domain

... This entails the backdoor model having two tasks: maintaining regular functionality (main task) and injecting the trigger and defining its activated behavior (sub-task). The concept of backdoor attacks on deep neural networks was introduced in [30] and has garnered significant attention in the field of computer vision [31,32]. In textual backdoors, backdoor triggers are inserted into text to misclassify it into a category specified by the attacker [33]. ...

DBIA: Data-Free Backdoor Attack Against Transformer Networks
  • Citing Conference Paper
  • July 2023

... Watermarking embeds a secret pattern into a model (Uchida et al., 2017) to verify ownership if the model is stolen or misused. Backdoor-based watermarking is the de facto approach, especially for non-graph data (Adi et al., 2018;Bansal et al., 2022;Lv et al., 2023;Yan et al., 2023;Li et al., 2022;Shao et al., 2022;Lansari et al., 2023). A backdoor trigger is inserted as the watermark pattern into some clean samples with a target label different from the true label, and the model is trained on both the watermarked and clean samples. ...

A Robustness-Assured White-Box Watermark in Neural Networks
  • Citing Article
  • November 2023

IEEE Transactions on Dependable and Secure Computing