Zhen Xiang's research while affiliated with Pennsylvania State University and other places

Publications (24)

Preprint
Full-text available
A backdoor attack (BA) is an important type of adversarial attack against deep neural network classifiers, wherein test samples from one or more source classes will be (mis)classified to the attacker's target class when a backdoor pattern (BP) is embedded. In this paper, we focus on the post-training backdoor defense scenario commonly considered in...
Preprint
Full-text available
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers. A victim classifier will predict to an attacker-desired target class whenever a test sample is embedded with the same backdoor pattern (BP) that was used to poison the classifier's training set. Detecting whether a classifier is backdoor attacked is not easy in practi...
Preprint
Backdoor (Trojan) attacks are emerging threats against deep neural networks (DNNs). A DNN being attacked will predict to an attacker-desired target class whenever a test sample from any source class is embedded with a backdoor pattern, while correctly classifying clean (attack-free) test samples. Existing backdoor defenses have shown success in dete...
Preprint
Full-text available
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers. A classifier being attacked will predict to the attacker's target class when a test sample from a source class is embedded with the backdoor pattern (BP). Recently, the first BA against point cloud (PC) classifiers was proposed, creating new threats to many important a...
Article
Backdoor data poisoning (a.k.a. Trojan attack) is an emerging form of adversarial attack usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class. For a successful attack, d...
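The poisoning step described in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the paper's implementation; the function names and toy data are hypothetical, and images are assumed to be NumPy arrays with values in [0, 1].

```python
import numpy as np

def poison_training_set(images, labels, source_class, target_class,
                        embed_fn, num_poison, rng=None):
    """Embed a backdoor pattern into a few source-class images (via `embed_fn`)
    and mislabel them to the attacker's target class."""
    rng = rng or np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    source_idx = np.flatnonzero(labels == source_class)
    chosen = rng.choice(source_idx,
                        size=min(num_poison, len(source_idx)), replace=False)
    for i in chosen:
        images[i] = embed_fn(images[i])   # embed the backdoor pattern
        labels[i] = target_class          # relabel to the target class
    return images, labels

# Toy usage: 100 grayscale 28x28 "images", 10 classes; the backdoor pattern
# here simply brightens a 3x3 corner of the image.
def brighten_corner(img):
    out = img.copy()
    out[:3, :3] = 1.0
    return out

X = np.random.rand(100, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=100)
X_p, y_p = poison_training_set(X, y, source_class=3, target_class=7,
                               embed_fn=brighten_corner, num_poison=10)
```

During operation, the attacker would embed the same pattern into a source-class test image to trigger (mis)classification to the target class.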
Preprint
Data Poisoning (DP) is an effective attack that causes trained classifiers to misclassify their inputs. DP attacks significantly degrade a classifier's accuracy by covertly injecting attack samples into the training set. Broadly applicable to different classifier structures, without strong assumptions about the attacker, we herein propose a novel Ba...
Preprint
Vulnerability of 3D point cloud (PC) classifiers has become a grave concern due to the popularity of 3D sensors in safety-critical applications. Existing adversarial attacks against 3D PC classifiers are all test-time evasion (TTE) attacks that aim to induce test-time misclassifications using knowledge of the classifier. But since the victim classi...
Article
Backdoor data poisoning attacks add mislabeled examples to the training set, with an embedded backdoor pattern, so that the classifier learns to classify to a target class whenever the backdoor pattern is present in a test sample. Here, we address posttraining detection of scene-plausible perceptible backdoors, a type of backdoor attack that can be...
Article
With wide deployment of deep neural network (DNN) classifiers, there is great potential for harm from adversarial learning attacks. Recently, a special type of data poisoning (DP) attack, known as a backdoor (or Trojan), was proposed. These attacks do not seek to degrade classification accuracy, but rather to have the classifier learn to classify t...
Chapter
Classifiers, e.g., those based on Naive Bayes, a support vector machine, or even a neural network, are highly susceptible to a data-poisoning attack. The attack objective is to degrade classification accuracy by covertly embedding malicious (labeled) samples into the training set. Such attacks can be mounted by an insider, through an outsourcing pr...
Preprint
Backdoor attacks (BAs) are an emerging form of adversarial attack typically against deep neural network image classifiers. The attacker aims to have the classifier learn to classify to a target class when test images from one or more source classes contain a backdoor pattern, while maintaining high accuracy on all clean test images. Reverse-Enginee...
Preprint
Backdoor data poisoning is an emerging form of adversarial attack usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class. For a successful attack, during operation, the tr...
Article
With wide deployment of machine learning (ML)-based systems for a variety of applications including medical, military, automotive, genomic, multimedia, and social networking, there is great potential for damage from adversarial learning (AL) attacks. In this article, we provide a contemporary survey of AL, focused particularly on defenses against a...
Preprint
Recently, a special type of data poisoning (DP) attack, known as a backdoor, was proposed. These attacks aim to have a classifier learn to classify to a target class whenever the backdoor pattern is present in a test sample. In this paper, we address post-training detection of perceptible backdoor patterns in DNN image classifiers, wherein the defende...
Preprint
Recently, a special type of data poisoning (DP) attack targeting Deep Neural Network (DNN) classifiers, known as a backdoor, was proposed. These attacks do not seek to degrade classification accuracy, but rather to have the classifier learn to classify to a target class whenever the backdoor pattern is present in a test example. Launching backdoor...
Preprint
With the wide deployment of machine learning (ML) based systems for a variety of applications including medical, military, automotive, genomic, as well as multimedia and social networking, there is great potential for damage from adversarial learning (AL) attacks. In this paper, we provide a contemporary survey of AL, focused particularly on defens...
Preprint
Naive Bayes spam filters are highly susceptible to data poisoning attacks. Here, known spam sources/blacklisted IPs exploit the fact that the emails received from them will be treated as (ground truth) labeled spam examples and used for classifier training (or re-training). The attacking source thus generates emails that will skew the spam model, potentia...
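A toy illustration of this poisoning mechanism is sketched below. It is not the paper's attack or defense; the corpus and numbers are made up, and scikit-learn's MultinomialNB stands in for a generic Naive Bayes filter. The poison emails come from a blacklisted source (so they are auto-labeled as spam) but are stuffed with ordinary "ham" vocabulary, skewing the spam class model toward legitimate words.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus (labels: 0 = ham, 1 = spam).
ham = ["meeting agenda attached", "lunch tomorrow?", "project report draft"]
spam = ["win cash prize now", "cheap pills online", "claim your prize"]

# Poison emails from a blacklisted source: auto-labeled spam, but filled
# with benign vocabulary to skew the learned spam model.
poison = ["meeting lunch project report agenda"] * 5

def train_filter(texts, labels):
    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    return vec, MultinomialNB().fit(X, labels)

clean_vec, clean_clf = train_filter(ham + spam, [0] * 3 + [1] * 3)
pois_vec, pois_clf = train_filter(ham + spam + poison,
                                  [0] * 3 + [1] * 3 + [1] * 5)

test = ["meeting tomorrow to discuss the project report"]
print("clean filter:", clean_clf.predict(clean_vec.transform(test)))    # likely ham (0)
print("poisoned filter:", pois_clf.predict(pois_vec.transform(test)))   # may flip to spam (1)
```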

Citations

... Possible FPs made by a BA detector are typically due to an "intrinsic backdoor", first reported in [47]. Even if a classifier is not attacked, there may exist a "common pattern" for some (source, target) class pairs that behaves like a backdoor planted by an attacker, for example, a small common image patch that induces most images from the source class to be misclassified to the target class. ...
... Backdoor Attack. A backdoor attack [8,20,24,31,46] is a training-time attack and has emerged as a major security threat to deep neural networks (DNNs) in many application areas (e.g., natural language processing [7,62], image classification [14,15], face recognition [8], point clouds [37,81], etc.). It implants a hidden backdoor (also called a neural trojan [31,46]) into the target model by poisoning training samples (i.e., attacker-modified input-label pairs). ...
... Existing PT defenses typically assume that the defender independently possesses a small set of clean, legitimate samples from every class. These samples may be used: i) to reverse-engineer putative BPs, which are the basis for anomaly detection [42,14,48,25,9,43,45,34,49]; or ii) to train shadow neural networks with and without (known) BAs, based on which a binary "meta-classifier" is trained to predict whether the classifier under inspection is backdoor attacked [18,51,40]. However, these methods assume the BP type (the mechanism for embedding a BP) used by the attacker is known. ...
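The first family of defenses in this excerpt (reverse-engineering a putative BP for each putative target class, then applying anomaly detection) can be sketched roughly as below. This is a generic PyTorch illustration with hypothetical names, not the specific method of any cited work; the intuition is that, for a truly attacked target class, an unusually small perturbation suffices to induce group misclassification of the clean samples.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_pattern(model, clean_x, target, steps=200, lr=0.1):
    """Estimate a small additive perturbation pushing the clean samples
    toward class `target` -- a putative reverse-engineered BP."""
    delta = torch.zeros_like(clean_x[0], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(clean_x + delta)
        loss = F.cross_entropy(logits, torch.full((len(clean_x),), target)) \
               + 1e-2 * delta.norm()   # keep the pattern small
        opt.zero_grad(); loss.backward(); opt.step()
    return delta.detach()

def detection_statistics(model, clean_x, num_classes):
    """Anomaly statistic per class: a backdoor target class tends to need an
    unusually small perturbation norm compared with the other classes."""
    return [reverse_engineer_pattern(model, clean_x, t).norm().item()
            for t in range(num_classes)]

# Toy usage on a randomly initialized classifier and random "clean" data.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
clean_x = torch.rand(32, 1, 28, 28)
norms = detection_statistics(model, clean_x, num_classes=10)
```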
... In this paper, alternatively, we aim to cleanse the training set prior to deep learning. Related work on training set cleansing includes [2,16,22,20]. All of these methods rely on embedded feature representations of a classifier fully trained on the possibly poisoned training set ([16] suggests that an auto-encoder could be used instead). ...
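As an illustration of cleansing based on embedded feature representations, the sketch below scores per-class outliers in the spirit of spectral-signature-style defenses. It is a stand-in example with hypothetical names, not the method proposed in the paper; `features` would be the penultimate-layer activations of a classifier trained on the possibly poisoned set.

```python
import numpy as np

def flag_suspicious(features, labels, frac=0.05):
    """Per-class outlier scoring on embedded features: score each sample by its
    projection onto the top singular direction of its (centered) class and flag
    the largest scores as suspected poisoned samples."""
    suspicious = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        feats = features[idx] - features[idx].mean(axis=0)
        _, _, vt = np.linalg.svd(feats, full_matrices=False)
        scores = np.abs(feats @ vt[0])        # projection onto top direction
        k = max(1, int(frac * len(idx)))
        suspicious.extend(idx[np.argsort(scores)[-k:]])
    return np.array(sorted(suspicious))

# Toy usage with random 64-dim "features" for 500 samples in 10 classes.
feats = np.random.randn(500, 64)
labels = np.random.randint(0, 10, size=500)
to_remove = flag_suspicious(feats, labels)
```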
... An NLP trigger is often a specific pattern in an input sentence (e.g., a word or phrase) that causes inputs with other labels (e.g., positive) to be misclassified into the target label (e.g., negative) [15,63,67,102,138,140,149,174,176,184,196]. ...
... The trigger mode is exemplified in Figure 1. Therefore, many defense methods (Xiang et al., 2022) are designed against universal triggers. They are effective at capturing universal-trigger backdoors because such backdoors share the same strong universal pattern. ...
... Although BAs and their defenses have been extensively studied for images, devising a BA against 3D PC classifiers is challenging in several respects. Challenge 1: Existing backdoor patterns for image BAs are either a human-imperceptible, additive perturbation [6,37,54,45,40], or a pixel patch replacement representing an object physically inserted in a scene [9,6,39,46]. But none of these patterns are applicable to 3D PCs, for which "pixels" are undefined. ...
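For reference, the two image-domain embedding mechanisms mentioned in this excerpt can be sketched as follows (a minimal illustration with hypothetical names, assuming images are NumPy arrays in [0, 1]). Neither operation carries over to a 3D point cloud, which is an unordered set of coordinates rather than a pixel grid.

```python
import numpy as np

def embed_additive(image, perturbation, eps=0.03):
    """Human-imperceptible additive perturbation: x' = clip(x + eps * v, 0, 1)."""
    return np.clip(image + eps * perturbation, 0.0, 1.0)

def embed_patch(image, patch, top=0, left=0):
    """Pixel-patch replacement: paste a small object-like patch into the scene."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

# Toy usage on a random 32x32 RGB image in [0, 1].
x = np.random.rand(32, 32, 3)
x_additive = embed_additive(x, np.sign(np.random.randn(32, 32, 3)))
x_patched = embed_patch(x, np.ones((4, 4, 3)))
```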
... This attack scenario is of great practical interest, and yet remains largely unsolved. Studies on defending against such attacks are either tailored to a specific type of classifier (e.g., SVM [11], LR [7]) or make strong assumptions about the training data (e.g., availability of a clean validation set for use by the defender [13,21]). The proposed method does not make strong assumptions about the attack or about available data, and it can be deployed to protect various types of classifiers. ...
... A more practical post-training scenario assumes that the defender, e.g., a downstream app user, has no access to the DNN's training set. Defenses for this scenario detect whether a trained DNN is backdoor attacked, infer the target class if an attack is detected, and usually reverse-engineer the backdoor pattern used by the attacker [5,8,21,24,25]. A post-training defender does possess a small, clean dataset collected independently; this dataset is not sufficient for training a clean (backdoor-free) DNN from scratch if an attack is detected. ...
... The literature shows that there has been considerable research on the impact of adversarial attacks on machine learning models [8,9,10]. However, their feasibility in domain-constrained applications, such as intrusion detection systems, is still in its early stages [11,12,13]. ...