Article

Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models

Authors: Jonas Rauber, Wieland Brendel, Matthias Bethge

Abstract

Even today's most advanced machine learning models are easily fooled by almost imperceptible perturbations of their inputs. Foolbox is a new Python package to generate such adversarial perturbations and to quantify and compare the robustness of machine learning models. It is built around the idea that the most comparable robustness measure is the minimum perturbation needed to craft an adversarial example. To this end, Foolbox provides reference implementations of most published adversarial attack methods alongside some new ones, all of which perform internal hyperparameter tuning to find the minimum adversarial perturbation. Additionally, Foolbox interfaces with most popular deep learning frameworks such as PyTorch, Keras, TensorFlow, Theano and MXNet, provides a straightforward way to add support for other frameworks, and allows different adversarial criteria such as targeted misclassification and top-k misclassification as well as different distance measures. The code is licensed under the MIT license and is openly available at https://github.com/bethgelab/foolbox.
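As context for the abstract, the sketch below shows how a model wrapper, criterion, and attack are typically combined in Foolbox releases of that era. It is a minimal usage sketch, not authoritative documentation: module names, defaults, return conventions, and the imagenet_example / ResNet50 preprocessing details should be checked against the installed version.

    import numpy as np
    import foolbox
    import keras
    from keras.applications.resnet50 import ResNet50

    # Wrap a Keras classifier so Foolbox can query predictions and gradients.
    keras.backend.set_learning_phase(0)
    kmodel = ResNet50(weights='imagenet')
    fmodel = foolbox.models.KerasModel(kmodel, bounds=(0, 255))

    # Example input and label; the default criterion is untargeted misclassification.
    image, label = foolbox.utils.imagenet_example()

    # Run an attack; Foolbox searches for a small adversarial perturbation.
    attack = foolbox.attacks.FGSM(fmodel)
    adversarial = attack(image, label)

    # In these versions a failed attack may be signalled by a None result.
    if adversarial is not None:
        print('L2 norm of perturbation:', np.linalg.norm(adversarial - image))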

... Like other domains in machine learning [48][49][50], one standard attack on MARL systems is the Gaussian noise addition (GNA), which introduces subtle yet effective adversarial strategies by adding noise to agents' observations. Attackers can mislead agents and adversely affect their learning trajectories through this manipulation. ...
... Although scaling attacks have been extensively studied in broader machine learning contexts [45,48,[57][58][59], their impact on MARL systems remains less explored. These attacks manipulate the magnitude of agents' observations, leading to skewed perceptions and decisions. ...
Article
Full-text available
This research explores the vulnerability of selective reincarnation, a concept in Multi-Agent Reinforcement Learning (MARL), in response to observation poisoning attacks. Observation poisoning is an adversarial strategy that subtly manipulates an agent’s observation space, potentially leading to a misdirection in its learning process. The primary aim of this paper is to systematically evaluate the robustness of selective reincarnation in MARL systems against the subtle yet potentially debilitating effects of observation poisoning attacks. Through assessing how manipulated observation data influences MARL agents, we seek to highlight potential vulnerabilities and inform the development of more resilient MARL systems. Our experimental testbed was the widely used HalfCheetah environment, utilizing the Independent Deep Deterministic Policy Gradient algorithm within a cooperative MARL setting. We introduced a series of triggers, namely Gaussian noise addition, observation reversal, random shuffling, and scaling, into the teacher dataset of the MARL system provided to the reincarnating agents of HalfCheetah. Here, the “teacher dataset” refers to the stored experiences from previous training sessions used to accelerate the learning of reincarnating agents in MARL. This approach enabled the observation of these triggers’ significant impact on reincarnation decisions. Specifically, the reversal technique showed the most pronounced negative effect for maximum returns, with an average decrease of 38.08% in Kendall’s tau values across all the agent combinations. With random shuffling, Kendall’s tau values decreased by 17.66%. On the other hand, noise addition and scaling aligned with the original ranking by only 21.42% and 32.66%, respectively. The results, quantified by Kendall’s tau metric, indicate the fragility of the selective reincarnation process under adversarial observation poisoning. Our findings also reveal that vulnerability to observation poisoning varies significantly among different agent combinations, with some exhibiting markedly higher susceptibility than others. This investigation elucidates our understanding of selective reincarnation’s robustness against observation poisoning attacks, which is crucial for developing more secure MARL systems and also for making informed decisions about agent reincarnation.
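The four observation-poisoning triggers named above (Gaussian noise addition, reversal, random shuffling, and scaling) are simple array transformations. The sketch below illustrates them on a toy teacher dataset; the noise level and scale factor are made-up parameters, not the settings of the cited study.

    import numpy as np

    rng = np.random.default_rng(0)

    def poison_observations(obs, trigger, sigma=0.1, scale=5.0):
        # obs: array of shape (n_samples, obs_dim) from the teacher dataset.
        if trigger == 'noise':      # Gaussian noise addition
            return obs + rng.normal(0.0, sigma, size=obs.shape)
        if trigger == 'reversal':   # reverse each observation vector
            return obs[:, ::-1]
        if trigger == 'shuffle':    # random shuffling of observation dimensions
            return obs[:, rng.permutation(obs.shape[1])]
        if trigger == 'scale':      # scaling attack on observation magnitudes
            return obs * scale
        raise ValueError(f'unknown trigger: {trigger}')

    # Example: poison a toy stand-in for the stored teacher experiences.
    teacher_obs = rng.normal(size=(4, 6))
    poisoned = poison_observations(teacher_obs, 'reversal')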
... These modifications result in a significant but tolerable increase in the number of parameters: approximately 20M for VGG, 49M for ResNet, 44M for Inception, and 10M for EfficientNet. The models, implemented via the open-source TensorFlow [26] library, are tested on the CIFAR10, CIFAR100, and ILSVRC12 datasets under different attack settings. CIFAR10 facilitates comprehensive study, while CIFAR100 and ILSVRC12 test the approach's generalizability. ...
... We used adversarial stickers synthesized by the A-ADS method of Brown et al. [5] and extended the adversary benchmark library FoolBox [26] to implement N-pixel attacks. ...
Conference Paper
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of the volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversarial training.
... To produce the attack samples, we selected 500 samples randomly from a malicious test dataset; it is obvious that if we select samples from the training dataset, we cannot fool the models, as the networks and models that have considered these data can detect them easily. Afterward, we applied the abovementioned adversarial attacks on the selected samples using the Foolbox library [30]. Then, we fed these samples to N 1 and N 2 in order to measure the success rate of each attack. ...
... We followed a similar strategy for the target network using the SVM as the classifier. In this scenario, we considered an identical set of random feature sizes [5,10,30,50,200,400, and N (full dimension)] for each SVM scenario and 50 iterations of SVM models trained for each SVM case. Table XII shows the average accuracy across 50 SVM models for each random feature case. ...
Article
Full-text available
In the past decades, the rise of artificial intelligence has given us the capabilities to solve the most challenging problems in our day-to-day lives, such as cancer prediction and autonomous navigation. However, these applications might not be reliable if not secured against adversarial attacks. In addition, recent works demonstrated that some adversarial examples are transferable across different models. Therefore, it is crucial to avoid such transferability via robust models that resist adversarial manipulations. In this paper, we propose a feature randomization-based approach that resists eight adversarial attacks targeting deep learning models in the testing phase. Our novel approach consists of changing the training strategy in the target network classifier and selecting random feature samples. We consider the attacker with a Limited-Knowledge and Semi-Knowledge conditions to undertake the most prevalent types of adversarial attacks. We evaluate the robustness of our approach using the well-known UNSW-NB15 datasets that include realistic and synthetic attacks. Afterward, we demonstrate that our strategy outperforms the existing state-of-the-art approach, such as the Most Powerful Attack, which consists of fine-tuning the network model against specific adversarial attacks. Further, we demonstrate the practicality of our approach using the VIPPrint dataset through a comprehensive set of experiments. Finally, our experimental results show that our methodology can secure the target network and resists adversarial attack transferability by over 60%.
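The random-feature-selection protocol described in the snippet above (training many SVMs, each on a random subset of features, and averaging their accuracy) can be sketched as follows. The dataset, feature sizes, and repeat counts here are placeholders rather than the cited paper's setup, which uses UNSW-NB15 and VIPPrint.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Toy stand-in data.
    X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    def avg_accuracy_random_features(k, repeats=10):
        # Train `repeats` SVMs, each on a random subset of k features,
        # and return their mean test accuracy.
        scores = []
        for _ in range(repeats):
            idx = rng.choice(X.shape[1], size=k, replace=False)
            clf = SVC().fit(X_tr[:, idx], y_tr)
            scores.append(clf.score(X_te[:, idx], y_te))
        return float(np.mean(scores))

    for k in (5, 10, 30, 50):
        print(k, avg_accuracy_random_features(k))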
... The research works that are most related to our approach deal with automated test generation for DNNs (Mu and Gilmer 2019;Hendrycks and Dietterich 2018;Tian et al. 2018;Zhang et al. 2018;Stocco et al. 2020;Rauber et al. 2017). In these works, some reasons for uncertainty, such as ambiguity, are not considered. ...
... Based on the tested model, nominal input data is slightly changed to cause misclassifications. Literature and open source tools provide access to a wide range of different specific adversarial attacks (Rauber et al. 2017). While very popular, neither input corruptions nor adversarial attacks generate intentionally ambiguous data from nominal, typically non-ambiguous inputs. ...
Article
Full-text available
Deep Neural Networks (DNNs) are becoming a crucial component of modern software systems, but they are prone to fail under conditions that are different from the ones observed during training (out-of-distribution inputs) or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes with nonzero probability in their labels. Recent work proposed DNN supervisors to detect high-uncertainty inputs before their possible misclassification leads to any harm. To test and compare the capabilities of DNN supervisors, researchers proposed test generation techniques, to focus the testing effort on high-uncertainty inputs that should be recognized as anomalous by supervisors. However, existing test generators aim to produce out-of-distribution inputs. No existing model- and supervisor-independent technique targets the generation of truly ambiguous test inputs, i.e., inputs that admit multiple classes according to expert human judgment. In this paper, we propose a novel way to generate ambiguous inputs to test DNN supervisors and used it to empirically compare several existing supervisor techniques. In particular, we propose AmbiGuess to generate ambiguous samples for image classification problems. AmbiGuess is based on gradient-guided sampling in the latent space of a regularized adversarial autoencoder. Moreover, we conducted what is, to the best of our knowledge, the most extensive comparative study of DNN supervisors, considering their capabilities to detect 4 distinct types of high-uncertainty inputs, including truly ambiguous ones. We find that the tested supervisors' capabilities are complementary: Those best suited to detect true ambiguity perform worse on invalid, out-of-distribution and adversarial inputs and vice versa.
... Regarding adversarial attack generation, there are several popular tools that can be used to implement multiple attack algorithms. For instance, [26] uses FoolBox [49] to implement FGSM and DeepFool [33] attacks; [27] and [23] use the CleverHans library [50] to implement JSMA and FGM, respectively. Moreover, several works like UAP provide open source access to their work so that the community can build on them. ...
... An overview of these steps is presented in Figure 2. To measure the accuracy of LLM-generated code, we use the same test inputs used to evaluate student submissions. To modify problem statements in a Blackbox setting, we design a set of perturbation techniques that are informed by existing literature on adversarial perturbation (Bielik and Vechev, 2020;Rauber et al., 2017;Wang et al., 2021b;Zhao et al., 2023). We use SHAP (Lundberg and Lee, 2017) with a surrogate model to guide the perturbation for better efficacy vs. modification tradeoff. ...
... Adversarial attacks can be categorized in various ways. Some attacks utilize gradient information to determine the necessary perturbations, while others rely solely on the model's class decision Rauber et al. [2017]. Certain attacks do not even require knowledge of the true class label. ...
Preprint
Full-text available
In this paper, we present an approach for evaluating attribution maps, which play a central role in interpreting the predictions of convolutional neural networks (CNNs). We show that the widely used insertion/deletion metrics are susceptible to distribution shifts that affect the reliability of the ranking. Our method proposes to replace pixel modifications with adversarial perturbations, which provides a more robust evaluation framework. By using smoothness and monotonicity measures, we illustrate the effectiveness of our approach in correcting distribution shifts. In addition, we conduct the most comprehensive quantitative and qualitative assessment of attribution maps to date. Introducing baseline attribution maps as sanity checks, we find that our metric is the only contender to pass all checks. Using Kendall's τ rank correlation coefficient, we show the increased consistency of our metric across 15 dataset-architecture combinations. Of the 16 attribution maps tested, our results clearly show SmoothGrad to be the best map currently available. This research makes an important contribution to the development of attribution maps by providing a reliable and consistent evaluation framework. To ensure reproducibility, we will provide the code along with our results.
... Adversarial attacks can be categorized in various ways. Some attacks utilize gradient information to determine the necessary perturbations, while others rely solely on the model's class decision (Rauber et al., 2017). Certain attacks do not even require knowledge of the true class label. ...
Article
Full-text available
In this paper, we present an approach for evaluating attribution maps, which play a central role in interpreting the predictions of convolutional neural networks (CNNs). We show that the widely used insertion/deletion metrics are susceptible to distribution shifts that affect the reliability of the ranking. Our method proposes to replace pixel modifications with adversarial perturbations, which provides a more robust evaluation framework. By using smoothness and monotonicity measures, we illustrate the effectiveness of our approach in correcting distribution shifts. In addition, we conduct the most comprehensive quantitative and qualitative assessment of attribution maps to date. Introducing baseline attribution maps as sanity checks, we find that our metric is the only contender to pass all checks. Using Kendall's τ rank correlation coefficient, we show the increased consistency of our metric across 15 dataset-architecture combinations. Of the 16 attribution maps tested, our results clearly show SmoothGrad to be the best map currently available. This research makes an important contribution to the development of attribution maps by providing a reliable and consistent evaluation framework. To ensure reproducibility, we will provide the code along with our results.
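Kendall's τ, used above to quantify ranking consistency, measures how often two rankings agree on the relative order of pairs. A minimal example with made-up rankings, using SciPy:

    from scipy.stats import kendalltau

    # Two rankings of the same attribution maps, e.g. produced by two
    # evaluation metrics; the numbers are purely illustrative.
    ranking_metric_a = [1, 2, 3, 4, 5, 6]
    ranking_metric_b = [1, 3, 2, 4, 6, 5]

    tau, p_value = kendalltau(ranking_metric_a, ranking_metric_b)
    print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")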
... In this scenario, we assume the attacker possesses complete information about the targeted classifier, including the feature space, classifier type, and trained model parameters. We employed the fast gradient sign method (FGSM) attack [30], utilizing the Foolbox library to implement it [31]. The FGSM works by applying a perturbation to the original input X in order to generate an input X ′ that the model misclassifies. ...
Article
Full-text available
While encryption enhances data security, it also presents significant challenges for network traffic analysis, especially in detecting malicious activities. To tackle this challenge, this paper introduces combined Attention-aware Feature Fusion and Communication Graph Embedding Learning (AFF_CGE), an advanced representation learning framework designed for detecting encrypted malicious traffic. By leveraging an attention mechanism and graph neural networks, AFF_CGE extracts rich semantic information from encrypted traffic and captures complex relations between communicating nodes. Experimental results reveal that AFF_CGE substantially outperforms traditional methods, improving F1-scores by 5.3% to 22.8%. The framework achieves F1-scores ranging from 0.903 to 0.929 across various classifiers, exceeding the performance of state-of-the-art techniques. These results underscore the effectiveness and robustness of AFF_CGE in detecting encrypted malicious traffic, demonstrating its superior performance.
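The FGSM step described in the snippet above has a one-line core: perturb the input in the direction of the sign of the loss gradient. Below is a minimal PyTorch sketch of that update (the cited work relies on Foolbox's implementation instead); the epsilon budget and the assumption of inputs scaled to [0, 1] are illustrative.

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, epsilon):
        # x' = x + epsilon * sign(grad_x loss(model(x), y)); inputs assumed in [0, 1].
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + epsilon * x.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()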
... An overview of these steps is presented in Figure 2. To measure the accuracy of LLM-generated code, we use the same test inputs used to evaluate student submissions. To modify problem statements in a Blackbox setting, we design a set of perturbation techniques that are informed by existing literature on adversarial perturbation (Bielik and Vechev, 2020;Rauber et al., 2017;Wang et al., 2021b;Zhao et al., 2023). We use SHAP (Lundberg and Lee, 2017) with a surrogate model to guide the perturbation for better efficacy vs. modification tradeoff. ...
Preprint
Full-text available
While Large Language Model (LLM)-based programming assistants such as CoPilot and ChatGPT can help improve the productivity of professional software developers, they can also facilitate cheating in introductory computer programming courses. Assuming instructors have limited control over the industrial-strength models, this paper investigates the baseline performance of 5 widely used LLMs on a collection of introductory programming problems, examines adversarial perturbations to degrade their performance, and describes the results of a user study aimed at understanding the efficacy of such perturbations in hindering actual code generation for introductory programming assignments. The user study suggests that (i) the perturbations combined reduced the average correctness score by 77%, and (ii) the drop in correctness caused by these perturbations depended on their detectability.
... For example, by adding randomness or momentum, FGSM can evade some model defenses. In addition to FGSM, methods such as universal adversarial perturbations [59], foolbox [60], poison fog [61] can also interfere with the expected working results of ML models. These algorithms are usually referred to as white-box attacks, as they introduce perturbation to input data based on the known model structures, parameters, or both. ...
Preprint
Adversarial attacks are major threats to the deployment of machine learning (ML) models in many applications. Testing ML models against such attacks is becoming an essential step for evaluating and improving ML models. In this paper, we report the design and development of an interactive system for aiding the workflow of Testing Against Adversarial Attacks (TA3). In particular, with TA3, human-in-the-loop (HITL) enables human-steered attack simulation and visualization-assisted attack impact evaluation. While the current version of TA3 focuses on testing decision tree models against adversarial attacks based on the One Pixel Attack Method, it demonstrates the importance of HITL in ML testing and the potential application of HITL to the ML testing workflows for other types of ML models and other types of adversarial attacks.
... Since an increasing number of deep learning methods and robustness improvement techniques have been proposed, a comprehensive benchmark of model robustness is vital to understand their effectiveness and keep up with the state-of-the-art. Many works have developed robustness platforms that implement popular attacks for evaluating the robustness, including CleverHans (Papernot et al., 2016), Foolbox (Rauber et al., 2017), ART (Nicolae et al., 2018), etc. But these platforms do not include the latest state-of-the-art models and do not provide benchmarking results. ...
Article
Full-text available
The robustness of deep neural networks is frequently compromised when faced with adversarial examples, common corruptions, and distribution shifts, posing a significant research challenge in the advancement of deep learning. Although new deep learning methods and robustness improvement techniques have been constantly proposed, the robustness evaluations of existing methods are often inadequate due to their rapid development, diverse noise patterns, and simple evaluation metrics. Without thorough robustness evaluations, it is hard to understand the advances in the field and identify the effective methods. In this paper, we establish a comprehensive robustness benchmark called ARES-Bench on the image classification task. In our benchmark, we evaluate the robustness of 61 typical deep learning models on ImageNet with diverse architectures (e.g., CNNs, Transformers) and learning algorithms (e.g., normal supervised training, pre-training, adversarial training) under numerous adversarial attacks and out-of-distribution (OOD) datasets. Using robustness curves as the major evaluation criteria, we conduct large-scale experiments and draw several important findings, including: (1) there exists an intrinsic trade-off between the adversarial and natural robustness of specific noise types for the same model architecture; (2) adversarial training effectively improves adversarial robustness, especially when performed on Transformer architectures; (3) pre-training significantly enhances natural robustness by leveraging larger training datasets, incorporating multi-modal data, or employing self-supervised learning techniques. Based on ARES-Bench, we further analyze the training tricks in large-scale adversarial training on ImageNet. Through tailored training settings, we achieve a new state-of-the-art in adversarial robustness. We have made the benchmarking results and code platform publicly available.
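Robustness curves of the kind used above are essentially accuracy measured at increasing perturbation budgets. The helper below sketches that loop under the assumption of an attack(model, x, y, eps) callable such as the FGSM sketch earlier; the epsilon grid in the comment is arbitrary.

    import torch

    def robustness_curve(model, attack, x, y, epsilons):
        # Accuracy on adversarially perturbed inputs for each budget epsilon.
        accuracies = []
        for eps in epsilons:
            x_adv = attack(model, x, y, eps)
            with torch.no_grad():
                acc = (model(x_adv).argmax(dim=1) == y).float().mean().item()
            accuracies.append(acc)
        return accuracies

    # e.g. robustness_curve(model, fgsm, images, labels, [0.0, 2/255, 4/255, 8/255])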
... We would like to refer the readers to two latest surveys [93], [94] about backdoor attacks and defenses for more details, respectively. c) Related benchmarks: Several libraries or benchmarks have been proposed for evaluating the adversarial robustness of DNNs, such as CleverHans [68], Foolbox [72], [73], AdvBox [22], RobustBench [11], RobustART [81], ARES [16], Adversarial Robustness Toolbox (ART) [65], etc. However, these benchmarks mainly focused on adversarial examples [21], [41], which occur in the testing stage. ...
Preprint
Full-text available
As an emerging approach to explore the vulnerability of deep neural networks (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in the status of a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibility of existing works, there is a lack of a unified and standardized benchmark of backdoor learning, causing unfair comparisons or unreliable conclusions (e.g., misleading, biased or even false conclusions). Consequently, it is difficult to evaluate the current progress and design the future development roadmap of this literature. To alleviate this dilemma, we build a comprehensive benchmark of backdoor learning called BackdoorBench. Our benchmark makes three valuable contributions to the research community. 1) We provide an integrated implementation of state-of-the-art (SOTA) backdoor learning algorithms (currently including 20 attack and 32 defense algorithms), based on an extensible modular-based codebase. 2) We conduct comprehensive evaluations with 5 poisoning ratios, based on 4 models and 4 datasets, leading to 11,492 pairs of attack-against-defense evaluations in total. 3) Based on above evaluations, we present abundant analysis from 10 perspectives via 18 useful analysis tools, and provide several inspiring insights about backdoor learning. We hope that our efforts could build a solid foundation of backdoor learning to facilitate researchers to investigate existing algorithms, develop more innovative algorithms, and explore the intrinsic mechanism of backdoor learning. Finally, we have created a user-friendly website at http://backdoorbench.com, which collects all important information of BackdoorBench, including codebase, docs, leaderboard, and model Zoo.
... We used the Foolbox package (Rauber et al., 2017; MIT license) to generate adversarial perturbations δ for every example in the test set for a fully trained model (PGD-Linf, PGD-L2, PGD-L1, Kurakin et al., 2016; BB-Linf and BB-L2, Brendel et al., 2019). Finally, we computed the 2-D Discrete Fourier spectrum δ̂ := F(δ) of the perturbation δ. ...
Article
Full-text available
Adversarial attacks are still a significant challenge for neural networks. Recent efforts have shown that adversarial perturbations typically contain high-frequency features, but the root cause of this phenomenon remains unknown. Inspired by theoretical work on linear convolutional models, we hypothesize that translational symmetry in convolutional operations together with localized kernels implicitly bias the learning of high-frequency features, and that this is one of the main causes of high-frequency adversarial examples. To test this hypothesis, we analyzed the impact of different choices of linear and non-linear architectures on the implicit bias of the learned features and adversarial perturbations, in spatial and frequency domains. We find that, independently of the training dataset, convolutional operations have higher-frequency adversarial attacks compared to other architectural parameterizations, and that this phenomenon is exacerbated with stronger locality of the kernel (kernel size) and depth of the model. The explanation for the kernel size dependence involves the Fourier Uncertainty Principle: a spatially-limited filter (local kernel in the space domain) cannot also be frequency-limited (local in the frequency domain). Using larger convolution kernel sizes or avoiding convolutions (e.g., by using Vision Transformers or MLP-style architectures) significantly reduces this high-frequency bias. Looking forward, our work strongly suggests that understanding and controlling the implicit bias of architectures will be essential for achieving adversarial robustness.
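The Fourier analysis described in the snippet above amounts to taking the 2-D DFT of the perturbation δ = x_adv − x. A minimal NumPy version follows; the function name and the channel-averaging choice are illustrative assumptions.

    import numpy as np

    def perturbation_spectrum(x_adv, x):
        # Magnitude of the 2-D Fourier spectrum of the perturbation,
        # averaged over color channels, with the zero frequency centered.
        delta = np.asarray(x_adv, dtype=np.float64) - np.asarray(x, dtype=np.float64)
        if delta.ndim == 3:                      # (H, W, C) -> collapse channels
            delta = delta.mean(axis=-1)
        return np.abs(np.fft.fftshift(np.fft.fft2(delta)))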
... We contrast performance with a baseline defense approach using similar datasets. We used adversarial stickers synthesized by the A-ADS method of Brown et al. [6] and flower patches, and extended the adversary benchmark library FoolBox [32] to implement N-pixel attacks. Figure 7 shows that the defense success rate of the model trained with our method surpasses 80% after 300 epochs, indicating effective defense against 1-pixel attacks at each validation run. ...
Article
Full-text available
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of the volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversarial training.
... Therefore, in this study, our main goal is to increase the strength of the attack capability that transfers between networks (here from SN to TN), especially in CNN, owing to the popularity of this network and its performance. Furthermore, attack transferability must be evaluated in scenarios that differ from the standard implementations of attacks outlined in most adversarial software packages, such as Foolbox library toolkits [21]. ...
Article
Full-text available
The current study investigates the robustness of deep learning models for accurate medical diagnosis systems with a specific focus on their ability to maintain performance in the presence of adversarial or noisy inputs. We examine factors that may influence model reliability, including model complexity, training data quality, and hyperparameters; we also examine security concerns related to adversarial attacks that aim to deceive models along with privacy attacks that seek to extract sensitive information. Researchers have discussed various defenses to these attacks to enhance model robustness, such as adversarial training and input preprocessing, along with mechanisms like data augmentation and uncertainty estimation. Tools and packages that extend the reliability features of deep learning frameworks such as TensorFlow and PyTorch are also being explored and evaluated. Existing evaluation metrics for robustness are additionally being discussed and evaluated. This paper concludes by discussing limitations in the existing literature and possible future research directions to continue enhancing the status of this research topic, particularly in the medical domain, with the aim of ensuring that AI systems are trustworthy, reliable, and stable.
Chapter
As artificial intelligence (AI) has become an integral part of modern mobile networks, there is an increasing concern about vulnerabilities of intelligent machine learning (ML)-driven network components to adversarial effects. Due to the shared nature of wireless mediums, these components may be susceptible to sophisticated attacks that can manipulate the training and inference processes of the AI/ML models over the air. In our research, we focus on adversarial example attacks. During such an attack, an adversary aims to supply intelligently crafted input features to the target model so that it outputs a certain wrong result. This type of attack is the most realistic threat to the AI/ML models deployed in a 5G network since it takes place in the inference stage and therefore does not require having access to either the target model or the datasets during the training. In this study, we first provide experimental results for multiple use cases in order to demonstrate that such an attack approach can be carried out against various AI/ML-driven frameworks which might be present in the mobile network. After that, we discuss the defence mechanisms service providers may employ in order to protect the target network from adversarial effects.
Article
Since its inception, Artificial Intelligence (AI) has emerged as one of the most notable research areas across various technologies and has expanded into almost every aspect of modern human life. However, the development of AI can diverge from the stated values of those developing it; hence, the risk of misbehaving AI increases continuously. There are therefore uncertainties about ensuring that the development and deployment of AI are favorable, and not unfavorable, to humankind. In addition, AI often follows a black-box pattern, which results in a lack of understanding of how systems work and raises further concerns. For these reasons, trustworthy AI is vital for the extensive adoption of AI in many applications, with strong attention to humankind and to the need to build trustworthiness into the system outline at design time. In this survey, we discuss a broad range of material on trustworthy AI and present the state of the art in trustworthy AI technologies, revealing new perspectives, bridging knowledge gaps, and paving the way for potential advances in robustness and explainability rules, which play a proactive role in designing AI systems. Systems that are reliable, secure, and mimic human behaviour significantly impact the technological AI ecosystem. We provide various contemporary technologies to build explainability and robustness into AI-based solutions so that AI works more safely and trustworthily. Finally, we conclude our survey with opportunities, challenges, and future research directions for trustworthy AI.
Article
Deep-learning-based identity management systems, such as face authentication systems, are vulnerable to adversarial attacks. However, existing attacks are typically designed for single-task purposes, which means they are tailored to exploit vulnerabilities unique to the individual target rather than being adaptable for multiple users or systems. This limitation makes them unsuitable for certain attack scenarios, such as morphing, universal, transferable, and counter attacks. In this paper, we propose a multi-task adversarial attack algorithm called MTADV that is adaptable to multiple users or systems. By interpreting these scenarios as multi-task attacks, MTADV is applicable to both single- and multi-task attacks, and feasible in the white- and gray-box settings. Furthermore, MTADV is effective against various face datasets, including LFW, CelebA, and CelebA-HQ, and can work with different deep learning models, such as FaceNet, InsightFace, and CurricularFace. Importantly, MTADV retains its feasibility as a single-task attack targeting a single user/system. To the best of our knowledge, MTADV is the first adversarial attack method that can target all of the aforementioned scenarios in one algorithm.
Conference Paper
As the use and reliance on AI technologies continue to proliferate, there is mounting concern regarding adversarial example attacks, emphasizing the pressing necessity for robust defense strategies to protect AI systems from malicious input manipulation. In this paper, we introduce a computationally efficient plug-in module, seamlessly integrable with advanced diffusion models for purifying adversarial examples. Drawing inspiration from the concept of deconstruction and reconstruction (DR), our module decomposes an input image into foundational visual features expected to exhibit robustness against adversarial perturbations and subsequently rebuilds the image using an image-to-image transformation neural network. Through the collaborative integration of the module with an advanced diffusion model, this combination attains state-of-the-art performance in effectively purifying adversarial examples while preserving high classification accuracy on clean image samples. The model performance is evaluated on representative neural network classifiers pre-trained and fine-tuned on large-scale datasets. An ablation study analyses the impact of the proposed plug-in module on enhancing the effectiveness of diffusion-based purification. Furthermore, it is noteworthy that the module demonstrates significant computational efficiency, incurring only minimal computational overhead during the purification process.
Chapter
We investigate whether it is possible to extend the random feature selection approach to include detectors that incorporate Deep Learning features in addition to the improvement in the robustness of forensic detectors to targeted attacks mentioned in Chen et al. (IEEE Trans Inf Forensics Secur 14(9):2454–2469, 2019). This paper specifically investigates the transferability of adversarial cases targeting the original Convolutional Neural Network (CNN) image manipulation detector compared to other detectors that essentially respond to a subset of features randomly collected from the original network, particularly from its flattened layer. Considering the following features, the results were obtained: (1) Three image manipulation detection tasks, including rescaling, average filtering, and adaptive histogram equalization. (2) Two structures of the original network. (3) Three different classes of attacks. Randomization of features was found to contribute to disrupting attack transferability, even in cases where attack transferability can be prevented by retraining the detector or simply varying the detector architecture.
Article
The vulnerability of deep neural networks against adversarial attacks, i.e., imperceptible adversarial perturbations can easily give rise to wrong predictions, poses a huge threat to the security of their real-world deployments. In this paper, a novel Adversarial Detection method via Disentangling Natural images and Perturbations (ADDNP) is proposed. Compared to natural images that can typically be modeled by lower-dimensional subspaces or manifolds, the distributions of adversarial perturbations are much more complex, e.g., one normal example's adversarial counterparts generated by different attack strategies can be significantly distinct. The proposed ADDNP exploits such distinct properties for the detection of adversarial attacks amongst normal examples. Specifically, we use a dual-branch disentangling framework to encode natural images and perturbations of inputs separately, followed by joint reconstruction. During inference, the reconstruction discrepancy (RD) measured in the learned latent feature space is used as an indicator of adversarial perturbations. The proposed ADDNP algorithm is evaluated on three popular datasets, i.e., CIFAR-10, CIFAR-100, and mini ImageNet with increasing data complexity, across multiple popular attack strategies. Compared to the existing and state-of-the-art detection methods, ADDNP has demonstrated promising performance on adversarial detection, with significant improvements on more challenging datasets.
Article
In this paper, we introduce two novel methods to detect adversarial examples utilizing pixel value diversity. First, we propose the concept of pixel value diversity (which reflects the spread of pixel values in an image) and two independent metrics (UPVR and RPVR) to assess the pixel value diversity separately. Then we propose two methods to detect adversarial examples based on the threshold method and Bayesian method respectively. Experimental results show that compared to an excellent prior method LID, our proposed methods achieve better performances in detecting adversarial examples. We also show the robustness of our proposed work against an adaptive attack method.
Article
While network attacks play a critical role in many advanced persistent threat (APT) campaigns, an arms race exists between the network defenders and the adversary: to make APT campaigns stealthy, the adversary is strongly motivated to evade the detection system. However, new studies have shown that neural network is likely a game-changer in the arms race: neural network could be applied to achieve accurate, signature-free, and low-false-alarm-rate detection. In this work, we investigate whether the adversary could fight back during the next phase of the arms race. In particular, noticing that none of the existing adversarial example generation methods could generate malicious packets (and sessions) that can simultaneously compromise the target machine and evade the neural network detection model, we propose a novel attack method to achieve this goal. We have designed and implemented the new attack. We have also used Address Resolution Protocol (ARP) Poisoning and Domain Name System (DNS) Cache Poisoning as the case study to demonstrate the effectiveness of the proposed attack.
Chapter
In the realm of Deep Neural Networks (DNNs), one of the primary concerns is their vulnerability in adversarial environments, whereby malicious attackers can easily manipulate them. As such, identifying adversarial samples is crucial to safeguarding the security of DNNs in real-world scenarios. In this work, we propose a method of adversarial example detection. Our approach uses a Latent Representation Dynamic Prototype to sample more generalizable latent representations from a learnable Gaussian distribution, which relaxes the detection dependency on the nearest neighbour's latent representation. Additionally, we introduce Random Homogeneous Sampling (RHS) to replace KNN sampling of reference samples, resulting in lower inference time complexity of O(1). Lastly, we use cross-attention in the adversarial discriminator to capture the evolutionary differences of latent representations in benign and adversarial samples by comparing the latent representations from inference and reference samples globally. We conducted experiments to evaluate our approach and found that it performs competitively in the gray-box setting against various attacks with two Lp-norm constraints for the CIFAR-10 and SVHN datasets. Moreover, our detector trained with the PGD attack exhibited detection ability for unseen adversarial samples generated by other adversarial attacks with small perturbations, ensuring its generalization ability in different scenarios.
Article
As a new programming paradigm, deep learning has achieved impressive performance in areas such as image processing and speech recognition, and has expanded its application to solve many real-world problems. However, neural networks and deep learning are normally black-box systems and, even worse, deep-learning-based software is vulnerable to threats from abnormal examples, such as adversarial and backdoored examples constructed by attackers with malicious intentions as well as unintentionally mislabeled samples. Therefore, it is important and urgent to detect such abnormal examples. While various detection approaches have been proposed, each addressing some specific types of abnormal examples, they suffer from limitations, and this problem remains of considerable interest. In this work, we first propose a novel characterization to distinguish abnormal examples from normal ones based on the observation that abnormal examples have significantly different (adversarial) robustness from normal ones. We systematically analyze those three different types of abnormal samples in terms of robustness, and find that they have different characteristics from normal ones. As robustness measurement is computationally expensive and hence can be challenging to scale to large networks, we then propose to effectively and efficiently measure the robustness of an input sample using the cost of adversarially attacking the input, which was originally proposed to test robustness of neural networks against adversarial examples. Next, we propose a novel detection method, named "attack as detection" (A²D), which uses the cost of adversarially attacking an input instead of robustness to check if it is abnormal. Our detection method is generic and various adversarial attack methods could be leveraged. Extensive experiments show that A²D is more effective than recent promising approaches that were proposed to detect only one specific type of abnormal examples. We also thoroughly discuss possible adaptive attack methods against our adversarial example detection method and show that A²D is still effective in defending against carefully designed adaptive adversarial attack methods, e.g., the attack success rate drops to 0% on CIFAR10.
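The "attack as detection" idea above scores an input by how cheap it is to attack. The sketch below approximates that cost as the number of small gradient-sign steps needed to change the prediction; the step size, step budget, and thresholding are hypothetical choices, a rough stand-in for the paper's actual measure rather than its algorithm.

    import torch
    import torch.nn.functional as F

    def attack_cost(model, x, y_pred, step_size=1 / 255, max_steps=100):
        # x: a single input with batch dimension 1; y_pred: its predicted class (shape (1,)).
        # Count small FGSM-like steps until the predicted class changes;
        # abnormal inputs are expected to flip after fewer steps.
        x_cur = x.clone().detach()
        for step in range(1, max_steps + 1):
            x_cur = x_cur.clone().detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_cur), y_pred)
            loss.backward()
            x_cur = (x_cur + step_size * x_cur.grad.sign()).clamp(0, 1).detach()
            with torch.no_grad():
                if model(x_cur).argmax(dim=1).item() != y_pred.item():
                    return step
        return max_steps

    # Flag an input as abnormal when attack_cost(...) falls below a chosen threshold.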
Chapter
Recent studies have shown that Machine Learning (ML) algorithm suffers from several vulnerability threats. Among them, adversarial attacks represent one of the most critical issues. This chapter provides an overview of the ML vulnerability challenges, with a focus on the security threats for Deep Neural Networks, Capsule Networks, and Spiking Neural Networks. Moreover, it discusses the current trends and outlooks on the methodologies for enhancing the ML models’ robustness.
Article
Full-text available
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008, multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
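As a reminder of the programming model the abstract describes, a Theano computation is declared symbolically and then compiled into a callable. A tiny sketch in the Theano 1.x style (values and variable names are illustrative):

    import theano
    import theano.tensor as T

    # Declare symbolic variables and an expression over them.
    x = T.dmatrix('x')
    y = T.dmatrix('y')
    z = x + 2 * y

    # Compile the expression into an optimized callable (CPU or GPU).
    f = theano.function([x, y], z)
    print(f([[1.0, 2.0]], [[3.0, 4.0]]))   # -> [[ 7. 10.]]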
Article
Full-text available
Advances in deep learning have led to the broad adoption of Deep Neural Networks (DNNs) for a range of important machine learning problems, e.g., guiding autonomous vehicles, speech recognition, malware detection. Yet, machine learning models, including DNNs, were shown to be vulnerable to adversarial samples: subtly (and often humanly indistinguishably) modified malicious inputs crafted to compromise the integrity of their outputs. Adversarial examples thus enable adversaries to manipulate system behaviors. Potential attacks include attempts to control the behavior of vehicles, have spam content identified as legitimate content, or have malware identified as legitimate software. Adversarial examples are known to transfer from one model to another, even if the second model has a different architecture or was trained on a different set. We introduce the first practical demonstration that this cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data. In our demonstration, we only assume that the adversary can observe outputs from the target DNN given inputs chosen by the adversary. We introduce the attack strategy of fitting a substitute model to the input-output pairs in this manner, then crafting adversarial examples based on this auxiliary model. We evaluate the approach on existing DNN datasets and real-world settings. In one experiment, we force a DNN supported by MetaMind (one of the online APIs for DNN classifiers) to misclassify inputs at a rate of 84.24%. We conclude with experiments exploring why adversarial samples transfer between DNNs, and a discussion on the applicability of our attack when targeting machine learning algorithms distinct from DNNs.
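The substitute-model strategy above can be outlined in a few lines: query the black-box target for labels, fit a local surrogate to those input-label pairs, and then craft adversarial examples against the surrogate. A hedged sketch follows; the network architecture, optimizer, and query budget are arbitrary choices, not the paper's.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def train_substitute(query_target, x_queries, n_classes=10, epochs=20, lr=1e-3):
        # query_target: black-box oracle mapping a batch of inputs to class indices (long tensor).
        y = query_target(x_queries)                       # observed output labels
        substitute = nn.Sequential(
            nn.Flatten(),
            nn.Linear(x_queries[0].numel(), 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )
        opt = torch.optim.Adam(substitute.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            F.cross_entropy(substitute(x_queries), y).backward()
            opt.step()
        return substitute

    # Adversarial examples crafted against `substitute` (e.g. with an FGSM step)
    # are then transferred to the black-box target.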
Article
Full-text available
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
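The dataflow-graph model the abstract describes separates graph construction from execution. A minimal sketch in the TensorFlow 1.x style of that period (the 2.x API executes eagerly instead); the placeholder names and values are illustrative:

    import tensorflow as tf

    # Build a small computation graph.
    a = tf.placeholder(tf.float32, shape=())
    b = tf.placeholder(tf.float32, shape=())
    c = a * b + 1.0

    # Execute the graph in a session, feeding concrete values.
    with tf.Session() as sess:
        print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))   # -> 7.0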
Article
Full-text available
Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. Specifically, we find that we can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
Conference Paper
Several machine learning models, including neural networks, consistently misclassify adversarial examples: inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.
Article
A recent paper suggests that Deep Neural Networks can be protected from gradient-based adversarial perturbations by driving the network activations into a highly saturated regime. Here we analyse such saturated networks and show that the attacks fail due to numerical limitations in the gradient computations. A simple stabilisation of the gradient estimates enables successful and efficient attacks. Thus, it has yet to be shown that the robustness observed in highly saturated networks is not simply due to numerical limitations.
Article
Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected system behavior leading to undesired consequences and could pose a security risk when these systems are deployed in the real world. In this work, we focus on deep convolutional neural networks and demonstrate that adversaries can easily craft adversarial examples even without any internal knowledge of the target network. Our attacks treat the network as an oracle (black-box) and only assume that the output of the network can be observed on the probed inputs. Our first attack is based on a simple idea of adding perturbation to a randomly selected single pixel or a small set of them. We then improve the effectiveness of this attack by carefully constructing a small set of pixels to perturb by using the idea of greedy local-search. Our proposed attacks also naturally extend to a stronger notion of misclassification. Our extensive experimental results illustrate that even these elementary attacks can reveal a deep neural network's vulnerabilities. The simplicity and effectiveness of our proposed schemes mean that they could serve as a litmus test for designing robust networks.
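The single-pixel variant described above needs only oracle access to the model's predicted class. The sketch below implements the simplest random version; the pixel value, trial budget, and function name are illustrative, and the cited work strengthens the idea with greedy local search over a carefully chosen pixel set.

    import numpy as np

    def single_pixel_attack(predict, image, label, value=1.0, trials=500, seed=0):
        # predict: oracle mapping an (H, W, C) image to a predicted class index.
        rng = np.random.default_rng(seed)
        h, w, _ = image.shape
        for _ in range(trials):
            candidate = image.copy()
            i, j = rng.integers(h), rng.integers(w)
            candidate[i, j, :] = value             # overwrite a single pixel
            if predict(candidate) != label:
                return candidate                   # misclassification achieved
        return None                                # no adversarial pixel found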
T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015. URL http://arxiv.org/abs/1512.01274.
F. Chollet. Keras. https://github.com/fchollet/keras, 2015.
S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sonderby, et al. Lasagne: First release, Aug. 2015. URL http://dx.doi.org/10.5281/zenodo.27878.
N. Papernot, I. Goodfellow, R. Sheatsley, R. Feinman, and P. McDaniel. cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2016.