Conference Paper

Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transfomers

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... Through this training process, the backdoored model could correctly classify benign images while misclassifying images with triggers as the designated target class [5,13]. Subsequently, backdoor attacks have evolved into various sophisticated and stealthy forms [3,11,29], being applied in a wide range of scenarios [18,34,41], covering extensive computer vision tasks [10,20,44]. Some of these attacks have achieved alarming success rates through carefully designed backdoors. ...
... In some studies on adversarial examples for Vision Transformer (ViT) models [4,41], adversarial samples for ViT models consistently exhibit noticeable patch-like stripes. Due to the unique working mechanism of ViT, it is reasonable that its universal adversarial perturbations (UAPs) appear in a patch-like pattern. ...
Preprint
Vision Transformers (ViTs) have achieved record-breaking performance in various visual tasks. However, concerns about their robustness against backdoor attacks have grown. Backdoor attacks involve associating a specific trigger with a target label, causing the model to predict the attacker-specified label when the trigger is present, while correctly identifying clean images.We found that ViTs exhibit higher attack success rates for quasi-triggers(patterns different from but similar to the original training triggers)compared to CNNs. Moreover, some backdoor features in clean samples can suppress the original trigger, making quasi-triggers more effective.To better understand and exploit these vulnerabilities, we developed a tool called the Perturbation Sensitivity Distribution Map (PSDM). PSDM computes and sums gradients over many inputs to show how sensitive the model is to small changes in the input. In ViTs, PSDM reveals a patch-like pattern where central pixels are more sensitive than edges. We use PSDM to guide the creation of quasi-triggers.Based on these findings, we designed "WorstVIT," a simple yet effective data poisoning backdoor for ViT models. This attack requires an extremely low poisoning rate, trains for just one epoch, and modifies a single pixel to successfully attack all validation images.
... Token-level Attacks target the tokenization layer of ViTs. SWARM [41] introduces a switchable backdoor mechanism featuring a "switch token" that dynamically toggles between benign and adversarial behaviors, ensuring high attack success rates while maintaining functionality in clean environments. ...
Preprint
Full-text available
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review defense strategies proposed for each type of attacks if available and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.
... Adapting CLIP to the downstream applications has attracted much attention and has been widely explored in recent approaches [55,54,52,26,56,30]. CoOp [55] introduces learnable prompts [22,51,50,28] and CoCoOp [54] conditions the text prompts on image embedding for better generalization. Maple [18] performs prompting for both vision and language branches and improves the alignment of the embedding between modalities. ...
Preprint
Adaptation of pretrained vision-language models such as CLIP to various downstream tasks have raised great interest in recent researches. Previous works have proposed a variety of test-time adaptation (TTA) methods to achieve strong generalization without any knowledge of the target domain. However, existing training-required TTA approaches like TPT necessitate entropy minimization that involves large computational overhead, while training-free methods like TDA overlook the potential for information mining from the test samples themselves. In this paper, we break down the design of existing popular training-required and training-free TTA methods and bridge the gap between them within our framework. Specifically, we maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples. The historical samples are filtered from the testing data stream and serve to extract useful information from the target distribution, while the boosting samples are drawn from regional bootstrapping and capture the knowledge of the test sample itself. We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets, showcasing its applicability in real-world situations.
... For instance, Yuan et al. [46] developed a universal trigger that drifts the attention of the model to the patches that contain the trigger. Yang et al. [45] explored adding an extra token to the model's input. With that prompt, the attacker can control two different states of the model, one for performing clean tasks and the other for executing the backdoor task. ...
Preprint
Full-text available
Due to the high cost of training, large model (LM) practitioners commonly use pretrained models downloaded from untrusted sources, which could lead to owning compromised models. In-context learning is the ability of LMs to perform multiple tasks depending on the prompt or context. This can enable new attacks, such as backdoor attacks with dynamic behavior depending on how models are prompted. In this paper, we leverage the ability of vision transformers (ViTs) to perform different tasks depending on the prompts. Then, through data poisoning, we investigate two new threats: i) task-specific backdoors where the attacker chooses a target task to attack, and only the selected task is compromised at test time under the presence of the trigger. At the same time, any other task is not affected, even if prompted with the trigger. We succeeded in attacking every tested model, achieving up to 89.90\% degradation on the target task. ii) We generalize the attack, allowing the backdoor to affect \emph{any} task, even tasks unseen during the training phase. Our attack was successful on every tested model, achieving a maximum of 13×13\times degradation. Finally, we investigate the robustness of prompts and fine-tuning as techniques for removing the backdoors from the model. We found that these methods fall short and, in the best case, reduce the degradation from 89.90\% to 73.46\%.
... The attacked models behave normally on benign samples, whereas their prediction will be maliciously changed whenever the trigger pattern arises. Currently, most of the existing backdoor attacks were designed for image classification tasks [4], [16], [18], [21], [31], [36], [47], although there were also a few for others [5], [12], [14], [29], [44], [48]. These methods could be divided into two main categories: poison-label and cleanlabel attacks, depending on whether the labels of modified poisoned samples are consistent with their ground-truth ones. ...
Preprint
Full-text available
Recently, point clouds have been widely used in computer vision, whereas their collection is time-consuming and expensive. As such, point cloud datasets are the valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially for commercial or open-sourced ones that cannot be sold again or used commercially without permission, we intend to identify whether a suspicious third-party model is trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, which are susceptible to the number of categories, our method could watermark samples from all classes instead of only from the target one. Accordingly, it can still preserve high effectiveness even on large-scale datasets with many classes. Specifically, we perturb selected point clouds with non-target categories in both shape-wise and point-wise manners before inserting trigger patterns without changing their labels. The features of perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset will have a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs will treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets are conducted, verifying the effectiveness of our method and its resistance to potential removal methods.
... The fact that a backdoor is a means of breaking into a system by exploiting an existing or future implementation of malware to gain unauthorized access. This means that backdoors can easily bypass traditional security protections and authentication procedures due to their discrete character and ability to remain hidden in the absence of appropriate security monitoring or assessment [3]. The problem comes when a user's conducts a security evaluation but unable to finds backdoor codes since they are hiding in the system. ...
Article
Backdoor attacks played a critical part in the catastrophe, as well as the overall impact of cyberattacks. Backdoor assaults are additionally influencing the landscape of malware and threats, forcing companies to concentrate more on detecting and establishing vulnerability tactics in order to avoid hostile backdoor threats. Despite advances in cybersecurity systems, backdoor assaults remain a source of concern because of their propensity to remain undetected long after the attack vector has been started. This research is aimed to examine the threats of backdoor attack methods in an organization's operational network, provide a full-scale review, and serve as direction for training and defensive measures. The fundamental inspiration was drawn from the alarming and involving threat in cybersecurity, which necessitates a better awareness of the level of risk and the concurrent requirement for increased security measures. Most traditional security solutions usually fail to detect harmful backdoors due to the stealthy nature of backdoor code within the system, necessitating a unique approach to full-scale threat analysis. A multi-phase approach that begins with considerable reading and examination of existing literature to get insight into typical backdoor attack methodologies and application methods. Following analysis, testing was carried out in a virtual lab in a controlled environment because thorough malware analysis testing must adhere to ethical and legal cyber testing laws to avoid any penalties or foolish breaches. This methodology also included testing on numerous attack channels combined with backdoor attacks, such as detecting software vulnerabilities, phishing emails, and direct payload injection, to determine the complexity of the different attack vectors. Each of the collected data is utilized to create a threat model that predicts the amount of risk associated with the backdoor attack approach. The finding contributes to the development of more resilient defence mechanisms, while also strengthening the overall organization's security architecture and protocols.
... The origin of backdoor attacks in DNNs can be traced to BadNets [17], which embeds a distinct, small white patch as the trigger within the training dataset. Subsequent studies have evolved backdoor attacks by developing far more imperceptible and detectionevasive triggers [25,32,44,52,53,72], enhancing poisoning strategies [13,78,79], and revealing the susceptibility of backdoor attacks across a broader spectrum of CV tasks [11,18,37,64,84,85] and beyond [2,35,38,39,74,83]. ...
Preprint
Full-text available
Model quantization is widely used to compress and accelerate deep neural networks. However, recent studies have revealed the feasibility of weaponizing model quantization via implanting quantization-conditioned backdoors (QCBs). These special backdoors stay dormant on released full-precision models but will come into effect after standard quantization. Due to the peculiarity of QCBs, existing defenses have minor effects on reducing their threats or are even infeasible. In this paper, we conduct the first in-depth analysis of QCBs. We reveal that the activation of existing QCBs primarily stems from the nearest rounding operation and is closely related to the norms of neuron-wise truncation errors (i.e., the difference between the continuous full-precision weights and its quantized version). Motivated by these insights, we propose Error-guided Flipped Rounding with Activation Preservation (EFRAP), an effective and practical defense against QCBs. Specifically, EFRAP learns a non-nearest rounding strategy with neuron-wise error norm and layer-wise activation preservation guidance, flipping the rounding strategies of neurons crucial for backdoor effects but with minimal impact on clean accuracy. Extensive evaluations on benchmark datasets demonstrate that our EFRAP can defeat state-of-the-art QCB attacks under various settings. Code is available at https://github.com/AntigoneRandy/QuantBackdoor_EFRAP.
Article
Recently, point clouds have been widely used in computer vision, whereas their collection is time-consuming and expensive. As such, point cloud datasets are the valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially for commercial or open-sourced ones that cannot be sold again or used commercially without permission, we intend to identify whether a suspicious third-party model is trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, which were susceptible to the number of categories, our method can watermark samples from all classes instead of only from the target one. Accordingly, it can still preserve high effectiveness even on large-scale datasets with many classes. Specifically, we perturb selected point clouds with non-target categories in both shape-wise and point-wise manners before inserting trigger patterns without changing their labels. The features of perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset will have a distinctive yet stealthy backdoor behavior, i.e ., misclassifying samples from the target class whenever triggers appear, since the trained DNNs will treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets are conducted, verifying the effectiveness of our method and its resistance to potential removal methods.
Conference Paper
Full-text available
Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries embed a hidden backdoor trigger during the training process for malicious prediction manipulation. These attacks pose great threats to the applications of DNNs under the real-world machine learning as a service (MLaaS) setting, where the deployed model is fully black-box while the users can only query and obtain its predictions. Currently, there are many existing defenses to reduce backdoor threats. However, almost all of them cannot be adopted in MLaaS scenarios since they require getting access to or even modifying the suspicious models. In this paper, we propose a simple yet effective black-box input-level backdoor detection , called SCALE-UP, which requires only the predicted labels to alleviate this problem. Specifically, we identify and filter malicious testing samples by analyzing their prediction consistency during the pixel-wise amplification process. Our defense is motivated by an intriguing observation (dubbed scaled prediction consistency) that the predictions of poisoned samples are significantly more consistent compared to those of benign ones when amplifying all pixel values. Besides, we also provide theoretical foundations to explain this phenomenon. Extensive experiments are conducted on benchmark datasets, verifying the effectiveness and efficiency of our defense and its resistance to potential adaptive attacks. Our codes are available at https://github.com/JunfengGo/SCALE-UP.
Conference Paper
Full-text available
To explore the vulnerability of deep neural networks (DNNs), many attack paradigms have been well studied, such as the poisoning-based backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper , we study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes. Specifically, our goal is to misclassify a specific sample into a target class without any sample modification, while not significantly reduce the prediction accuracy of other samples to ensure the stealth-iness. To this end, we formulate this problem as a binary integer programming (BIP), since the parameters are stored as binary bits (i.e., 0 and 1) in the memory. By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem, which can be effectively and efficiently solved using the alternating direction method of mul-tipliers (ADMM) method. Consequently, the flipped critical bits can be easily determined through optimization, rather than using a heuristic strategy. Extensive experiments demonstrate the superiority of our method in attacking DNNs. The code is available at: https://github.com/jiawangbai/TA-LBF.
Article
Full-text available
This work designs and evaluates a run-time deep neural network (DNN) model Trojan detection method exploiting STRong Intentional Perturbation of inputs that is a multi-domain Trojan detection defence across Vision, Text and Audio domains---termed as STRIP-ViTA. Specifically, STRIP-ViTA is demonstratively independent of not only task domain but also model architectures. Most importantly, unlike other detection mechanisms, it requires neither machine learning expertise nor expensive computational resource, which are the reason behind DNN model outsourcing scenario---one main attack surface of Trojan attack. We have extensively evaluated the performance of STRIP-ViTA over: i) CIFAR10 and GTSRB datasets using 2D CNNs for vision tasks; ii) IMDB and consumer complaint datasets using both LSTM and 1D CNNs for text tasks; and iii) speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on 28 tested Trojaned models (including publicly Trojaned mode) corroborate that STRIP-ViTA performs well across all nine architectures and five datasets. Overall, STRIP-ViTA can effectively detect Trojaned inputs with small false acceptance rate (FAR) with an acceptable preset false rejection rate (FRR). Moreover, we have evaluated STRIP-ViTA against a number of advanced backdoor attacks and compare its effectiveness with other recent published state-of-the-arts.
Conference Paper
Full-text available
A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty in interpretability of the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds STR ong I ntentional P erturbation (STRIP) based run-time trojan attack detection system and focuses on vision system. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model---malicious or benign. A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input---a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks.
Article
Full-text available
In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible provided in the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting out of 10 classes with in total 27,000 labeled and geo-referenced images. We provide benchmarks for this novel dataset with its spectral bands using state-of-the-art deep Convolutional Neural Network (CNNs). With the proposed novel dataset, we achieved an overall classification accuracy of 98.57%. The resulting classification system opens a gate towards a number of Earth observation applications. We demonstrate how this classification system can be used for detecting land use and land cover changes and how it can assist in improving geographical maps. The geo-referenced dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat.
Article
Full-text available
Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a \emph{BadNet}) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our US street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of {25}\% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and---because the behavior of neural networks is difficult to explicate---stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
Article
Full-text available
Receiver operating characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
Conference Paper
Full-text available
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called ldquoImageNetrdquo, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
Article
With the thriving of deep learning in processing point cloud data, recent works show that backdoor attacks pose a severe security threat to 3D vision applications. The attacker injects the backdoor into the 3D model by poisoning a few training samples with trigger, such that the backdoored model performs well on clean samples but behaves maliciously when the trigger pattern appears. Existing attacks often insert some additional points into the point cloud as the trigger, or utilize a linear transformation ( e.g ., rotation) to construct the poisoned point cloud. However, the effects of these poisoned samples are likely to be weakened or even eliminated by some commonly used pre-processing techniques for 3D point cloud, e.g ., outlier removal or rotation augmentation. In this paper, we propose a novel imperceptible and robust backdoor attack (IRBA) to tackle this challenge. We utilize a nonlinear and local transformation, called weighted local transformation (WLT), to construct poisoned samples with unique transformations. As there are several hyper-parameters and randomness in WLT, it is difficult to produce two similar transformations. Consequently, poisoned samples with unique transformations are likely to be resistant to aforementioned pre-processing techniques. Besides, the distortion caused by a fixed WLT is both controllable and smooth, resulting in the generated poisoned samples that are imperceptible to human inspection. Extensive experiments on three benchmark datasets and four models show that IRBA achieves 80%+ attack success rate (ASR) in most cases even with pre-processing techniques, which is significantly higher than previous state-of-the-art attacks.
Conference Paper
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained textto-image diffusion models. ControlNet locks the productionready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with “zero convolutions” (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, e.g., edges, depth, segmentation, human pose, etc., with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.
Article
Many attack paradigms against deep neural networks have been well studied, such as the backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper, we study a novel attack paradigm, the bit-flip based weight attack, which directly modifies weight bits of the attacked model in the deployment stage. To meet various attack scenarios, we propose a general formulation including terms to achieve effectiveness and stealthiness goals and a constraint on the number of bit-flips. Furthermore, benefitting from this extensible and flexible formulation, we present two cases with different malicious purposes, i.e. , single sample attack (SSA) and triggered samples attack (TSA). SSA which aims at misclassifying a specific sample into a target class is a binary optimization with determining the state of the binary bits (0 or 1); TSA which is to misclassify the samples embedded with a specific trigger is a mixed integer programming (MIP) with flipped bits and a learnable trigger. Utilizing the latest technique in integer programming, we equivalently reformulate them as continuous optimization problems, whose approximate solutions can be effectively and efficiently obtained by the alternating direction method of multipliers (ADMM) method. Extensive experiments demonstrate the superiority of our methods.
Article
Continual Test-Time Adaptation (CTTA) aims to adapt the source model to continually changing unlabeled target domains without access to the source data. Existing methods mainly focus on model-based adaptation in a self-training manner, such as predicting pseudo labels for new domain datasets. Since pseudo labels are noisy and unreliable, these methods suffer from catastrophic forgetting and error accumulation when dealing with dynamic data distributions. Motivated by the prompt learning in NLP, in this paper, we propose to learn an image-layer visual domain prompt for target domains while having the source model parameters frozen. During testing, the changing target datasets can be adapted to the source model by reformulating the input data with the learned visual prompts. Specifically, we devise two types of prompts, i.e., domains-specific prompts and domains-agnostic prompts, to extract current domain knowledge and maintain the domain-shared knowledge in the continual adaptation. Furthermore, we design a homeostasis-based adaptation strategy to suppress domain-sensitive parameters in domain-invariant prompts to learn domain-shared knowledge more effectively. This transition from the model-dependent paradigm to the model-free one enables us to bypass the catastrophic forgetting and error accumulation problems. Experiments show that our proposed method achieves significant performance gains over state-of-the-art methods on four widely-used benchmarks, including CIFAR-10C, CIFAR-100C, ImageNet-C, and VLCS datasets.
Chapter
The transformer models have shown promising effectiveness in dealing with various vision tasks. However, compared with training Convolutional Neural Network (CNN) models, training Vision Transformer (ViT) models is more difficult and relies on the large-scale training set. To explain this observation we make a hypothesis that ViT models are less effective in capturing the high-frequency components of images than CNN models, and verify it by a frequency analysis. Inspired by this finding, we first investigate the effects of existing techniques for improving ViT models from a new frequency perspective, and find that the success of some techniques (e.g., RandAugment) can be attributed to the better usage of the high-frequency components. Then, to compensate for this insufficient ability of ViT models, we propose HAT, which directly augments high-frequency components of images via adversarial training. We show that HAT can consistently boost the performance of various ViT models (e.g., +1.2% for ViT-B, +0.5% for Swin-B), and especially enhance the advanced model VOLO-D5 to 87.3% that only uses ImageNet-1K data, and the superiority can also be maintained on out-of-distribution data and transferred to downstream tasks. The code is available at: https://github.com/jiawangbai/HAT.
Chapter
The security of deep neural networks (DNNs) has attracted increasing attention due to their widespread use in various applications. Recently, the deployed DNNs have been demonstrated to be vulnerable to Trojan attacks, which manipulate model parameters with bit flips to inject a hidden behavior and activate it by a specific trigger pattern. However, all existing Trojan attacks adopt noticeable patch-based triggers (e.g., a square pattern), making them perceptible to humans and easy to be spotted by machines. In this paper, we present a novel attack, namely hardly perceptible Trojan attack (HPT). HPT crafts hardly perceptible Trojan images by utilizing the additive noise and per-pixel flow field to tweak the pixel values and positions of the original images, respectively. To achieve superior attack performance, we propose to jointly optimize bit flips, additive noise, and flow field. Since the weight bits of the DNNs are binary, this problem is very hard to be solved. We handle the binary constraint with equivalent replacement and provide an effective optimization algorithm. Extensive experiments on CIFAR-10, SVHN, and ImageNet datasets show that the proposed HPT can generate hardly perceptible Trojan images, while achieving comparable or better attack performance compared to the state-of-the-art methods. The code is available at: https://github.com/jiawangbai/HPT.
Conference Paper
Recently, prompt tuning has shown remarkable performance as a new learning paradigm, which freezes pre-trained language models (PLMs) and only tunes some soft prompts. A fixed PLM only needs to be loaded with different prompts to adapt different downstream tasks. However, the prompts associated with PLMs may be added with some malicious behaviors, such as backdoors. The victim model will be implanted with a backdoor by using the poisoned prompt. In this paper, we propose to obtain the poisoned prompt for PLMs and corresponding downstream tasks by prompt tuning. We name this Poisoned Prompt Tuning method "PPT". The poisoned prompt can lead a shortcut between the specific trigger word and the target label word to be created for the PLM. So the attacker can simply manipulate the prediction of the entire model by just a small prompt. Our experiments on various text classification tasks show that PPT can achieve a 99% attack success rate with almost no accuracy sacrificed on original task. We hope this work can raise the awareness of the possible security threats hidden in the prompt.
Article
DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community.
Conference Paper
We investigate the fine grained object categorization problem of determining the breed of animal from an image. To this end we introduce a new annotated dataset of pets covering 37 different breeds of cats and dogs. The visual problem is very challenging as these animals, particularly cats, are very deformable and there can be quite subtle differences between the breeds. We make a number of contributions: first, we introduce a model to classify a pet breed automatically from an image. The model combines shape, captured by a deformable part model detecting the pet face, and appearance, captured by a bag-of-words model that describes the pet fur. Fitting the model involves automatically segmenting the animal in the image. Second, we compare two classification approaches: a hierarchical one, in which a pet is first assigned to the cat or dog family and then to a breed, and a flat one, in which the breed is obtained directly. We also investigate a number of animal and image orientated spatial layouts. These models are very good: they beat all previously published results on the challenging ASIRRA test (cat vs dog discrimination). When applied to the task of discriminating the 37 different breeds of pets, the models obtain an average accuracy of about 59%, a very encouraging result considering the difficulty of the problem.
Conference Paper
We investigate to what extent combinations of features can improve classification performance on a large dataset of similar classes. To this end we introduce a 103 class flower dataset. We compute four different features for the flowers, each describing different aspects, namely the local shape/texture, the shape of the boundary, the overall spatial distribution of petals, and the colour. We combine the features using a multiple kernel framework with a SVM classifier. The weights for each class are learnt using the method of Varma and Ray, which has achieved state of the art performance on other large dataset, such as Caltech 101/256. Our dataset has a similar challenge in the number of classes, but with the added difficulty of large between class similarity and small within class similarity. Results show that learning the optimum kernel combination of multiple features vastly improves the performance, from 55.1% for the best single feature to 72.8% for the combination of all features.
Article
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned from by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
Visual prompting via image inpainting
  • Bar
Poisoning the unlabeled dataset of {Semi-Supervised} learning
  • Nicholas Carlini
Graphadapter: Tuning vision-language models with dual knowledge graph
  • Li
Backdoor attack on hash-based image retrieval via clean-label data poisoning
  • Gao
Blind backdoors in deep learning models
  • Eugene Bagdasaryan
  • Vitaly Shmatikov
Exploring visual prompts for adapting large-scale models
  • Bahng
An image is worth 16x 16 words: Trans-formers for image recognition at scale
  • Dosovitskiy
Aeva: Black-box backdoor detection using adversarial extreme value analysis
  • Guo
Neu-roninspect: Detecting backdoors in neural networks via out-put explanations
  • Huang
Anti-backdoor learning: Training clean models on poisoned data
  • Li
Backdoor attacks on vision transformers
  • Subramanya
Label-consistent backdoor attacks
  • Turner
Towards reliable and efficient backdoor trigger inversion via decoupling benign features
  • Xu
Visual prompting via image inpainting
  • Amir Bar
  • Yossi Gandelsman
  • Trevor Darrell
Lora: Low-rank adaptation of large language models
  • Edward
Vi-sual prompt tuning
  • Jia
Neural attention distillation: Erasing backdoor triggers from deep neural networks
  • Li