Preprint

FLARE: Towards Universal Dataset Purification against Backdoor Attacks


Abstract

Deep neural networks (DNNs) are susceptible to backdoor attacks, where adversaries poison datasets with adversary-specified triggers to implant hidden backdoors, enabling malicious manipulation of model predictions. Dataset purification serves as a proactive defense by removing malicious training samples to prevent backdoor injection at its source. We first reveal that current advanced purification methods rely on an implicit assumption that the backdoor connections between triggers and target labels are simpler to learn than benign features. However, we demonstrate that this assumption does not always hold, especially in all-to-all (A2A) and untargeted (UT) attacks. As a result, purification methods that analyze the separation between poisoned and benign samples in the input-output space or the final hidden layer space are less effective. We observe that this separability is not confined to a single layer but varies across different hidden layers. Motivated by this understanding, we propose FLARE, a universal purification method to counter various backdoor attacks. FLARE aggregates abnormal activations from all hidden layers to construct representations for clustering. To enhance separation, FLARE develops an adaptive subspace selection algorithm to isolate the optimal space for dividing the entire dataset into two clusters. FLARE then assesses the stability of each cluster and identifies the more stable cluster as poisoned. Extensive evaluations on benchmark datasets demonstrate the effectiveness of FLARE against 22 representative backdoor attacks, including all-to-one (A2O), all-to-all (A2A), and untargeted (UT) attacks, as well as its robustness to adaptive attacks.
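To make the pipeline described in the abstract concrete, the following minimal sketch illustrates the overall flow of a FLARE-style purification: collect a per-layer statistic for every training sample, search a simple family of feature subspaces, split the dataset into two clusters, and flag the tighter cluster as suspicious. This is not the authors' implementation; the per-layer statistic, the subspace search, and the silhouette-based "stability" proxy are all simplified assumptions.

```python
# Minimal sketch of a FLARE-style purification flow (not the authors' code).
# The per-layer statistic, the subspace search, and the silhouette-based
# "stability" proxy are simplified placeholder assumptions.
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def layer_statistics(model, loader, device="cpu"):
    """Collect one activation statistic per hidden layer for every sample."""
    acts, hooks = {}, []

    def make_hook(name):
        def hook(_module, _inputs, output):
            # Placeholder statistic: mean absolute activation per sample.
            acts.setdefault(name, []).append(
                output.detach().flatten(1).abs().mean(dim=1).cpu()
            )
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            hooks.append(module.register_forward_hook(make_hook(name)))
    model.eval().to(device)
    with torch.no_grad():
        for images, _labels in loader:
            model(images.to(device))
    for h in hooks:
        h.remove()
    # One row per sample, one column per hooked layer.
    return torch.stack([torch.cat(v) for v in acts.values()], dim=1).numpy()


def flag_suspicious(features):
    """Two-way clustering over candidate subspaces; the tighter cluster is flagged."""
    best = None
    for k in range(2, features.shape[1] + 1):  # crude stand-in for subspace selection
        subspace = features[:, :k]
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(subspace)
        score = silhouette_score(subspace, labels)  # proxy for cluster "stability"
        if best is None or score > best[0]:
            best = (score, labels, subspace)
    _, labels, subspace = best
    variances = [subspace[labels == c].var() for c in (0, 1)]
    return labels == int(np.argmin(variances))  # True -> likely poisoned sample
```

In the actual method the abnormal-activation representation and the stability criterion are more elaborate; the sketch only mirrors the high-level structure of collect, select, cluster, and flag.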


References
Article
Recent research has demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor attacks can only cause targeted misclassification on backdoor instances, which makes them easy to detect by defense methods. In this article, we propose an untargeted backdoor attack (UBA) against DNNs, where backdoor instances are randomly misclassified by the backdoored model to any incorrect label. To achieve the goal of UBA, we propose to utilize an autoencoder as the trigger generation model and train the target model and the autoencoder simultaneously. We also propose a special loss function (Evasion Loss) to train the autoencoder and the target model, in order to make the target model predict backdoor instances as random incorrect classes. During the inference stage, the trained autoencoder is used to generate backdoor instances. For different backdoor instances, the generated triggers are different and the corresponding predicted labels are random incorrect labels. Experimental results demonstrate that the proposed UBA is effective. On the ResNet-18 model, the attack success rate (ASR) of the proposed UBA is 96.48%, 91.27%, and 90.83% on the CIFAR-10, GTSRB, and ImageNet datasets, respectively. On the VGG-16 model, the ASR of the proposed UBA is 89.72% and 97.78% on the CIFAR-10 and ImageNet datasets, respectively. Moreover, the proposed UBA is robust against existing backdoor defense methods, which are designed to detect targeted backdoor attacks. We hope this article can promote research on corresponding backdoor defenses.
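As a rough illustration of the evasion-style objective described above, the sketch below pushes triggered inputs away from their ground-truth class without committing to any target label. The generator and classifier interfaces, the trigger bound, and the exact loss form are assumptions rather than the paper's actual formulation.

```python
# Hedged sketch of an evasion-style loss for an untargeted backdoor, in the
# spirit of the attack above; the loss form and trigger bound are assumptions.
import torch
import torch.nn.functional as F


def evasion_loss(classifier, generator, images, labels, eps=8 / 255):
    """Encourage the classifier to move triggered inputs off their true class."""
    triggers = eps * torch.tanh(generator(images))     # bounded, input-dependent triggers
    logits = classifier((images + triggers).clamp(0, 1))
    # Negative cross-entropy on the ground-truth label: any incorrect class is fine.
    return -F.cross_entropy(logits, labels)
```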
Conference Paper
Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries embed a hidden backdoor trigger during the training process for malicious prediction manipulation. These attacks pose great threats to the applications of DNNs under the real-world machine learning as a service (MLaaS) setting, where the deployed model is fully black-box while the users can only query and obtain its predictions. Currently, there are many existing defenses to reduce backdoor threats. However, almost all of them cannot be adopted in MLaaS scenarios since they require access to, or even modification of, the suspicious models. In this paper, we propose a simple yet effective black-box input-level backdoor detection method, called SCALE-UP, which requires only the predicted labels to alleviate this problem. Specifically, we identify and filter malicious testing samples by analyzing their prediction consistency during the pixel-wise amplification process. Our defense is motivated by an intriguing observation (dubbed scaled prediction consistency) that the predictions of poisoned samples are significantly more consistent compared to those of benign ones when amplifying all pixel values. Besides, we also provide theoretical foundations to explain this phenomenon. Extensive experiments are conducted on benchmark datasets, verifying the effectiveness and efficiency of our defense and its resistance to potential adaptive attacks. Our codes are available at https://github.com/JunfengGo/SCALE-UP.
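The scaled prediction consistency idea is straightforward to sketch: amplify the pixel values of an input several times and check how often the predicted label survives. The scale set and the clamping back into the valid range are illustrative assumptions; the linked SCALE-UP repository contains the actual procedure.

```python
# Hedged sketch of a scaled-prediction-consistency score; scale factors and
# clamping are illustrative assumptions, not the official SCALE-UP code.
import torch


@torch.no_grad()
def spc_score(model, images, scales=(2, 3, 4, 5)):
    """Fraction of amplified copies that keep the original predicted label."""
    model.eval()
    base = model(images).argmax(dim=1)
    consistent = torch.zeros(len(images))
    for s in scales:
        preds = model((images * s).clamp(0, 1)).argmax(dim=1)
        consistent += (preds == base).float().cpu()
    return consistent / len(scales)   # higher score -> more likely poisoned
```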
Conference Paper
Deep neural networks (DNNs) have demonstrated their superiority in practice. Arguably, the rapid development of DNNs has largely benefited from high-quality (open-sourced) datasets, based on which researchers and developers can easily evaluate and improve their learning methods. Since data collection is usually time-consuming or even expensive, how to protect dataset copyrights is of great significance and worth further exploration. In this paper, we revisit dataset ownership verification. We find that existing verification methods introduce new security risks in DNNs trained on the protected dataset, due to the targeted nature of poison-only backdoor watermarks. To alleviate this problem, in this work, we explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification. Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses. Our codes are available at https://github.com/THUYimingLi/Untargeted_Backdoor_Watermark.
Conference Paper
Recent studies have revealed that deep neural networks (DNNs) are vulnerable to backdoor attacks, where attackers embed hidden backdoors in the DNN model by poisoning a few training samples. The attacked model behaves normally on benign samples, whereas its prediction will be maliciously changed when the backdoor is activated. We reveal that poisoned samples tend to cluster together in the feature space of the attacked DNN model, which is mostly due to the end-to-end supervised training paradigm. Inspired by this observation, we propose a novel backdoor defense via decoupling the original end-to-end training process into three stages. Specifically, we first learn the backbone of a DNN model via self-supervised learning based on training samples without their labels. The learned backbone will map samples with the same ground-truth label to similar locations in the feature space. Then, we freeze the parameters of the learned backbone and train the remaining fully connected layers via standard training with all (labeled) training samples. Lastly, to further alleviate side-effects of poisoned samples in the second stage, we remove labels of some 'low-credible' samples determined based on the learned model and conduct a semi-supervised fine-tuning of the whole model. Extensive experiments on multiple benchmark datasets and DNN models verify that the proposed defense is effective in reducing backdoor threats while preserving high accuracy in predicting benign samples.
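The third stage of this decoupled pipeline hinges on identifying 'low-credible' samples. A minimal sketch of one plausible filtering rule is shown below: rank samples by their per-sample loss under the model from stage two and strip the labels of the highest-loss fraction before semi-supervised fine-tuning. The loss-based rule and the keep ratio are assumptions, not the paper's exact criterion.

```python
# Hedged sketch of a 'low-credible' sample filter for the decoupled defense
# above; the loss-based rule and keep ratio are simplifying assumptions.
import torch
import torch.nn.functional as F


@torch.no_grad()
def credible_mask(model, loader, keep_ratio=0.5, device="cpu"):
    """Keep labels only for the lowest-loss fraction of training samples."""
    model.eval().to(device)
    losses = []
    for images, labels in loader:            # loader must preserve dataset order
        logits = model(images.to(device))
        losses.append(F.cross_entropy(logits, labels.to(device), reduction="none").cpu())
    losses = torch.cat(losses)
    cutoff = losses.quantile(keep_ratio)
    return losses <= cutoff                  # True -> treat the label as credible
```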
Article
The last decade witnessed increasingly rapid progress in self‐driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence (AI). The objective of this paper is to survey the current state‐of‐the‐art on deep learning technologies used in autonomous driving. We start by presenting AI‐based self‐driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration, and motion control algorithms. We investigate both the modular perception‐planning‐action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources, and computational hardware. The comparison presented in this survey helps gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices.
Article
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say ‘dog’ in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g., VGG), (2) CNNs used for structured outputs (e.g., captioning), (3) CNNs used in tasks with multi-modal inputs (e.g., visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are robust to adversarial perturbations, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention based models learn to localize discriminative regions of the input image. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names (Bau et al. in Computer vision and pattern recognition, 2017) to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure whether Grad-CAM explanations help users establish appropriate trust in predictions from deep networks and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo on CloudCV (Agrawal et al., in: Mobile cloud visual media computing, pp 265–290. Springer, 2015) (http://gradcam.cloudcv.org) and a video at http://youtu.be/COjUB9Izk6E.
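The core Grad-CAM computation described above fits in a few lines: back-propagate the score of the target class to a chosen convolutional layer, average the gradients over the spatial dimensions to weight the feature maps, and apply a ReLU. The hook-based capture and the choice of layer are assumptions of this sketch, not the reference implementation linked in the abstract.

```python
# Hedged Grad-CAM sketch; hook-based capture and layer choice are assumptions,
# not the reference implementation linked above.
import torch
import torch.nn.functional as F


def grad_cam(model, conv_layer, images, class_idx):
    """Coarse localization map for `class_idx` from `conv_layer` activations."""
    feats, grads = [], []
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model.eval()
    scores = model(images)[:, class_idx].sum()
    model.zero_grad()
    scores.backward()
    h1.remove(); h2.remove()
    activations, gradients = feats[0], grads[0]          # (B, C, H, W)
    weights = gradients.mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
    cam = F.relu((weights * activations).sum(dim=1, keepdim=True))
    return F.interpolate(cam, size=images.shape[-2:], mode="bilinear", align_corners=False)
```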
Article
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP as described has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
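A typical invocation through the umap-learn package (assumed installed via `pip install umap-learn`) looks like the following; the parameter values are illustrative defaults rather than recommendations.

```python
# Hedged usage sketch of UMAP via umap-learn; parameters are illustrative.
import numpy as np
import umap

features = np.random.rand(1000, 64)          # stand-in for high-dimensional features
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
embedding = reducer.fit_transform(features)  # shape (1000, 2), ready for plotting
```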
Article
Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our US street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
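A minimal, hedged sketch of BadNets-style poisoning is shown below: stamp a small patch trigger onto a fraction of the training images and relabel them to an attacker-chosen class. Patch size, location, target label, and poisoning rate are illustrative assumptions.

```python
# Hedged sketch of BadNets-style poisoning; patch size, location, target label,
# and poisoning rate are illustrative assumptions.
import torch


def poison_batch(images, labels, target_label=0, rate=0.1, patch=3):
    """Stamp a white square into the corner of a random subset and relabel it."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -patch:, -patch:] = 1.0    # bottom-right trigger pattern
    labels[idx] = target_label                # attacker-chosen target class
    return images, labels
```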
Article
Recently, point clouds have been widely used in computer vision, whereas their collection is time-consuming and expensive. As such, point cloud datasets are the valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially for commercial or open-sourced ones that cannot be sold again or used commercially without permission, we intend to identify whether a suspicious third-party model is trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, which were susceptible to the number of categories, our method can watermark samples from all classes instead of only from the target one. Accordingly, it can still preserve high effectiveness even on large-scale datasets with many classes. Specifically, we perturb selected point clouds with non-target categories in both shape-wise and point-wise manners before inserting trigger patterns without changing their labels. The features of perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset will have a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs will treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets are conducted, verifying the effectiveness of our method and its resistance to potential removal methods.
Article
Deep neural networks (DNNs) are vulnerable to backdoor attacks, where the adversary manipulates a small portion of training data such that the victim model predicts normally on the benign samples but classifies the triggered samples as the target class. The backdoor attack is an emerging training-phase threat, leading to serious risks in DNN-based applications. In this paper, we revisit the trigger patterns of existing backdoor attacks. We reveal that they are either visible or not sparse and therefore are not stealthy enough. More importantly, it is not feasible to simply combine existing methods to design an effective sparse and invisible backdoor attack. To address this problem, we formulate the trigger generation as a bi-level optimization problem with sparsity and invisibility constraints and propose an effective method to solve it. The proposed method is dubbed sparse and invisible backdoor attack (SIBA). We conduct extensive experiments on benchmark datasets under different settings, which verify the effectiveness of our attack and its resistance to existing backdoor defenses. The codes for reproducing the main experiments are available at https://github.com/YinghuaGao/SIBA.
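The sparsity and invisibility constraints can be illustrated with a simplified, single-level surrogate: optimize a trigger that induces the target label while penalizing its L1 norm and clipping it to a small L-infinity ball. SIBA itself solves a bi-level problem, so this sketch only conveys the constraints, not the method; the loss weights and bounds are assumptions.

```python
# Hedged, single-level surrogate for a sparse and invisible trigger; SIBA's
# actual bi-level formulation is not reproduced here. Bounds and weights are
# illustrative assumptions.
import torch
import torch.nn.functional as F


def optimize_trigger(model, loader, target, steps=200, eps=8 / 255,
                     l1_weight=1e-3, shape=(3, 32, 32), device="cpu"):
    delta = torch.zeros(shape, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=0.01)
    model.eval().to(device)
    batches = iter(loader)
    for _ in range(steps):
        try:
            images, _ = next(batches)
        except StopIteration:
            batches = iter(loader)
            images, _ = next(batches)
        images = images.to(device)
        logits = model((images + delta).clamp(0, 1))
        targets = torch.full((len(images),), target, dtype=torch.long, device=device)
        loss = F.cross_entropy(logits, targets) + l1_weight * delta.abs().sum()  # sparsity
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)           # invisibility: small L-infinity norm
    return delta.detach()
```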
Article
The backdoor attack on deep neural network models implants malicious data patterns in a model to induce attacker-desirable behaviors. Existing defense methods fall into the online and offline categories, in which the offline models achieve state-of-the-art detection rates but are restricted by heavy computation overhead. In contrast, their more deployable online counterparts lack the means to detect source-specific backdoors with large sizes. This work proposes a new online backdoor detection method, Reverse Backdoor Distillation (RBD), to handle issues associated with source-specific and source-agnostic backdoor attacks. RBD, designed with the novel perspective of distilling instead of erasing backdoor knowledge, is a complementary backdoor detection methodology that can be used in conjunction with other online backdoor defenses. Considering that trigger data cause overwhelming neuron activation while clean data do not, RBD distills backdoor attack pattern knowledge from a suspicious model to create a shadow model, which is subsequently deployed online alongside the original model to predict a backdoor attack. We extensively evaluate RBD on several datasets (MNIST, GTSRB, CIFAR-10) with diverse model architectures and trigger patterns. RBD outperforms online benchmarks in all experimental settings. Notably, RBD demonstrates superior capability in detecting source-specific attacks, where comparison methods fail, underscoring the effectiveness of our proposed technique. Moreover, RBD achieves a computational savings of at least 97%.
Technical Report
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
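For reference, the 16- and 19-layer configurations evaluated in this report are available in torchvision; a minimal, hedged usage sketch is shown below (the `weights` keyword assumes a recent torchvision version).

```python
# Hedged usage sketch of the VGG-16/VGG-19 configurations via torchvision;
# the `weights` keyword assumes torchvision >= 0.13.
import torch
from torchvision import models

vgg16 = models.vgg16(weights=None)     # 13 conv + 3 fully connected = 16 weight layers
vgg19 = models.vgg19(weights=None)     # 16 conv + 3 fully connected = 19 weight layers
x = torch.randn(1, 3, 224, 224)
print(vgg16(x).shape, vgg19(x).shape)  # both produce 1000-way ImageNet logits
```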
Article
The original ImageNet dataset is a popular large-scale benchmark for training Deep Neural Networks. Since the cost of performing experiments (e.g., algorithm design, architecture search, and hyperparameter tuning) on the original dataset might be prohibitive, we propose to consider a downsampled version of ImageNet. In contrast to the CIFAR datasets and earlier downsampled versions of ImageNet, our proposed ImageNet32 (and its variants ImageNet64 and ImageNet16) contains exactly the same number of classes and images as ImageNet, with the only difference that the images are downsampled to 32×32 pixels per image (64×64 and 16×16 pixels for the variants, respectively). Experiments on these downsampled variants are dramatically faster than on the original ImageNet and the characteristics of the downsampled datasets with respect to optimal hyperparameters appear to remain similar. The proposed datasets and scripts to reproduce our results are available at https://image-net.org/download-images and https://github.com/PatrykChrabaszcz/Imagenet32_Scripts.
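Producing such a downsampled copy is mechanically simple; the hedged sketch below resizes an image folder to 32×32 with Pillow. The box filter and the `.JPEG` glob pattern are assumptions, not the official scripts referenced above.

```python
# Hedged sketch of downsampling an image folder to 32x32 in the spirit of
# ImageNet32; the box filter and file pattern are assumptions, not the
# official scripts referenced above.
from pathlib import Path
from PIL import Image


def downsample_folder(src, dst, size=32):
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src).glob("*.JPEG"):
        image = Image.open(path).convert("RGB")
        image.resize((size, size), Image.BOX).save(dst / path.name)
```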
Article
Heterogeneous face recognition, also known as cross-modality face recognition or intermodality face recognition, refers to matching two face images from alternative image modalities. Since face images from different image modalities of the same person are associated with the same face object, there should be mutual components that reflect those intrinsic face characteristics that are invariant to the image modalities. Motivated by this rationale, we propose a novel approach called Mutual Component Analysis (MCA) to infer the mutual components for robust heterogeneous face recognition. In the MCA approach, a generative model is first proposed to model the process of generating face images in different modalities, and then an Expectation Maximization (EM) algorithm is designed to iteratively learn the model parameters. The learned generative model is able to infer the mutual components (which we call the hidden factor, where hidden means the factor is unreachable and invisible, and can only be inferred from observations) that are associated with the person's identity, thus enabling fast and effective matching for cross-modality face recognition. To enhance recognition performance, we propose an MCA-based multiclassifier framework using multiple local features. Experimental results show that our new approach significantly outperforms the state-of-the-art results on two typical application scenarios: sketch-to-photo and infrared-to-visible face recognition.
Article
In biometrics research and industry, it is critical yet challenging to match infrared face images to optical face images. The major difficulty lies in the fact that a great discrepancy exists between the infrared face image and the corresponding optical face image because they are captured by different devices (optical imaging device and infrared imaging device). This paper presents a new approach called common feature discriminant analysis (CFDA) to reduce this great discrepancy and improve optical-infrared face recognition performance. In this approach, a new learning-based face descriptor is first proposed to extract the common features from heterogeneous face images (infrared face images and optical face images), and an effective matching method is then applied to the resulting features to obtain the final decision. Extensive experiments are conducted on two large and challenging optical-infrared face datasets to show the superiority of our approach over the state-of-the-art.
Ruisi Cai, Zhenyu Zhang, Tianlong Chen, Xiaohan Chen, and Zhangyang Wang. Randomized channel shuffling: Minimal-overhead backdoor attack detection without clean datasets. In NeurIPS, 2022.
Shuwen Chai and Jinghui Chen. One-shot neural backdoor erasing via adversarial weight masking. In NeurIPS, 2022.
Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava. Detecting backdoor attacks on deep neural networks by activation clustering. In CEUR Workshop, 2018.
Jonathan Hayase and Weihao Kong. Spectre: Defending against backdoor attacks using robust covariance estimation. In ICML, 2021.
Linshan Hou, Ruili Feng, Zhongyun Hua, Wei Luo, Leo Yu Zhang, and Yiming Li. IBD-PSC: Input-level backdoor detection via parameter-oriented scaling consistency. In ICML, 2024.
Hanxun Huang, Xingjun Ma, Sarah Erfani, and James Bailey. Distilling cognitive backdoor patterns within an image. In ICLR, 2023.
Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. Anti-Backdoor Learning: Training Clean Models on Poisoned Data. In NeurIPS, 2021.
Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. Neural attention distillation: Erasing backdoor triggers from deep neural networks. In ICLR, 2021.
Yiming Li, Mengxi Ya, Yang Bai, Yong Jiang, and Shu-Tao Xia. BackdoorBox: A Python Toolbox for Backdoor Learning. In ICLR Workshop, 2023.
Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In RAID, 2018.
Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. Trojaning attack on neural networks. In NDSS, 2018.
Pratyush Maini, Michael C Mozer, Hanie Sedghi, Zachary C Lipton, J Zico Kolter, and Chiyuan Zhang. Can neural network memorization be localized? In ICML, 2023.
Tuan Anh Nguyen and Anh Tran. Input-Aware Dynamic Backdoor Attack. In NeurIPS, 2020.
Tuan Anh Nguyen and Anh Tuan Tran. WaNet - Imperceptible Warping-based Backdoor Attack. In ICLR, 2021.
Soumyadeep Pal, Yuguang Yao, Ren Wang, Bingquan Shen, and Sijia Liu. Backdoor secrets unveiled: Identifying backdoor data with optimized scaled prediction consistency. In ICLR, 2024.
Xiangyu Qi, Tinghao Xie, Yiming Li, Saeed Mahloujifar, and Prateek Mittal. Revisiting the Assumption of Latent Separability for Backdoor Defenses. In ICLR, 2023.
Xiangyu Qi, Tinghao Xie, Jiachen T Wang, Tong Wu, Saeed Mahloujifar, and Prateek Mittal. Towards a proactive ML approach for detecting backdoor poison samples. In USENIX Security, 2023.
Ruixiang Tang, Jiayi Yuan, Yiming Li, Zirui Liu, Rui Chen, and Xia Hu. Setting the Trap: Capturing and Defeating Backdoor Threats in PLMs through Honeypots. In NeurIPS, 2023.
Dongxian Wu and Yisen Wang. Adversarial neuron pruning purifies backdoored deep models. In NeurIPS, 2021.
Zhen Xiang, Zidi Xiong, and Bo Li. UMD: Unsupervised model detection for X2X backdoor attacks. In ICML, 2023.
Xiong Xu, Kunzhe Huang, Yiming Li, Zhan Qin, and Kui Ren. Towards reliable and efficient backdoor trigger inversion via decoupling benign features. In ICLR, 2024.
Yi Zeng, Si Chen, Won Park, Z Morley Mao, Ming Jin, and Ruoxi Jia. Adversarial Unlearning of Backdoors via Implicit Hypergradient. In ICLR, 2022.