Outlier-Oriented Poisoning Attack: A Grey-box Approach to Disturb Decision
Boundaries by Perturbing Outliers in Multiclass Learning
Anum Paracha, Junaid Arshad, Mohamed Ben Farah, Khalid Ismail
College of Computing, Birmingham City University, UK
Abstract
Poisoning attacks are a primary threat to machine learning models, aiming to compromise their performance and reliability by
manipulating training datasets. This paper introduces a novel attack - Outlier-Oriented Poisoning (OOP) attack, which manipulates
labels of most distanced samples from the decision boundaries. The paper also investigates the adverse impact of such attacks on
different machine learning algorithms within a multiclass classification scenario, analyzing their variance and correlation between
different poisoning levels and performance degradation. To ascertain the severity of the OOP attack for different degrees (5%
- 25%) of poisoning, we analyzed variance, accuracy, precision, recall, f1-score, and false positive rate for chosen ML models.
Benchmarking our OOP attack, we have analyzed key characteristics of multiclass machine learning algorithms and their sensitivity
to poisoning attacks. Our experimentation used three publicly available datasets: IRIS, MNIST, and ISIC. Our analysis shows that
KNN and GNB are the most affected algorithms, with accuracy decreases of 22.81% and 56.07% and false positive rate increases to 17.14% and 40.45%, respectively, for the IRIS dataset at 15% poisoning. Further, Decision Tree and Random Forest are the most resilient algorithms, with the smallest accuracy disruptions of 12.28% and 17.52% at 15% poisoning of the IRIS dataset. We have also analyzed
the correlation between the number of dataset classes and the performance degradation of models. Our analysis highlighted that the number of classes is inversely proportional to the performance degradation; specifically, the decrease in model accuracy is moderated as the number of classes increases. Further, our analysis identified that an imbalanced dataset distribution can aggravate
the impact of poisoning on machine learning models.
Keywords: Data Poisoning Attack, Outliers Manipulation, Multiclass Poisoning, Confidence Disruption, Optimal Poisoning,
Behavioral Analysis, Integrity Violation
1. Introduction
The widespread use of machine learning in diverse fields
such as cyber security [16], healthcare [34], and autonomous
vehicles [49] makes it an attractive target for adversaries.
Machine learning models are susceptible to various types
of adversarial attacks, typically classified as poisoning [14],
evasion [10], backdoor [47], inversion [36] and inference
[27], [6] attacks. Some potential attacks are demonstrated in
[7,57,22,31,48]. Of these, poisoning attacks are one of
the most common in literature, aiming to manipulate training
datasets to corrupt machine learning models. Poisoning attacks
either aim at overall performance degradation (availability attacks) or are targeted to mislead the classification of specific instances (integrity attacks). As model training requires a reasonably large amount of data, adversaries can manipulate and poison datasets in ways that may be difficult to cleanse.
Prior research on poisoning availability and integrity attacks against machine learning has mostly focused on deep learning and binary classification; exploration of poisoning effects on multiclass classifiers remains limited.
Practical interpretations of poisoning availability attacks and their impact on the performance degradation of machine learning models have been studied extensively in the literature, e.g., [9,44,29,28,23,15]. An effective approach for creating
poisoned data points with a generative adversarial network
(GAN) in the same underlying distribution as the irreducible
noise of the classifier is presented in [52]. C. Zhou et al. [1] de-
veloped an object-attentional adversarial attack and generated
a unique dataset with adversarial images. Another stealthy data
poisoning attack is developed in [2] through which adversarial
samples exhibit the properties of desired samples. Similarly,
[24] presented an approach to craft data points so they are
placed near one another to bypass data sanitization by evading
anomaly detection in the spam detection classifier. Further, X.
Zhang et al. [51] studied the adverse impact of data poisoning
in online settings which highlighted its effectiveness against
reinforcement learning. In [13,50,55], feature perturbation methods are used to poison the training of machine learning models. Broadly, three approaches to data poisoning appear in the literature. First, label poisoning [26,41,8], which typically follows the label-flipping method to poison training datasets. Second, clean-label poisoning [56,4], which is crafted by solving one or more optimization problems, such as bi-level optimization [30,38] or gradient descent [40], to generate poisoned data points and inject them into the machine learning model. Third, an existing dataset is manipulated through feature perturbation.
However, poisoning attacks against multiclass machine learn-
ing are explored on a limited scale. MetaPoison [19] solves
the bi-level optimization with meta-learning to craft poison
against neural networks. It is practically implemented on
Google Cloud AutoML API and extended to be experimented
on multiclass neural networks. Sub-population data poisoning
[21] populates a perturbed cluster into the target. Its efficacy
is highlighted with a variety of neural networks with multiple
datasets. Another research study [35] proposed a back-gradient
based poisoning attack and extended their experimentation
from binary classification to multiclass classification. Its experimentation focuses on poisoning the availability of a targeted subclass in neural networks. Moreover, existing research is largely limited to the analysis of performance degradation in terms of model accuracy and mostly considers poisoning of neural networks.
In view of the limitations highlighted above, we have ex-
plicitly identified the need to analyze the behavior of individ-
ual algorithms against poisoning attacks in multiclass settings.
We extend the experimentation of multiclass poisoning attacks and propose a novel attack method, based on outlier poisoning, to poison multiclass supervised machine learning. Since multiclass poisoning in the literature is mostly limited to neural networks, we study poisoning on six supervised machine learning algorithms: Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), Gaussian Naive Bayes (GNB), and the Multi-layer Perceptron (MLP) neural network. Selecting these algorithms allows us to cover the complete baseline of machine learning classification methods. With our analysis, we identify important parameters of each algorithm that are sensitive to poisoning attacks, explaining how the models become misclassified and identifying optimal poisoning rates for each algorithm. We have provided an in-depth investigation of the
performance degradation of multiclass models, quantifying the
accuracy variance of the models. Our research contributions
are given as follows:
•A thorough behavioral analysis of multiclass classifiers is conducted, analyzing the correlation between different poisoning levels and classifier performance degradation. Using a range of poisoning levels ∆L = 5%-25%, we highlight the optimal rate of poisoning that can mislead classification in an obscure manner.
•We analyze the behavior of individual machine learning algorithms in multiclass settings and the impact of noisy datasets on our poisoning attack. We further study the impact of non-uniformly structured features and imbalanced dataset distribution, and explain how dataset noise can act as a catalyst for data poisoning, leading to impractical model disruption.
•We have developed a novel label poisoning attack to intro-
duce misclassification in multiclass machine learning. Our
attack is formulated based on the label perturbation of the
most distanced data points from the decision boundaries of the multiclass classifier, where the data point distances are calculated by training surrogate models.
•By implementing our attack, we analyze the key factors
for each algorithm that are affected by data poisoning. For
our analysis, we have implemented our attack on six ma-
chine learning algorithms with IRIS, MNIST, and ISIC
datasets. Our research findings serve as a baseline to
strengthen mitigation against data poisoning in multiclass
models.
2. Existing Multiclass Machine Learning Poisoning
2.1. Existing Attacks to Poison Multiclass Models
Existing literature highlights a significant count of poison-
ing attacks that harm the integrity and availability of machine
learning models. For example, B. Zhao et al. [53] proposed a class-oriented poisoning attack to introduce misclassification for a targeted dataset class. Similarly, N. Carlini et al. [11]
highlighted a security threat of poisoning and backdoor attacks
against multiclass machine learning with only 0.0001% of data
poisoning. They have introduced misclassification in the tar-
geted model with training-time overfitting to increase error at
testing. I. Alarab et al. [5] have developed a Monte-Carlo
based poisoning attack against deep learning multiclass models
to analyze their classification uncertainties. Whereas, V. Pante-
lakis et al. [37] have evaluated the performance disruption of
IoT-based multiclass models against JSMA, FGSM, and Deep-
Fool attacks with which effectiveness of these attacks are high-
lighted to poison multiclass models. Also, some other promi-
nent poisoning attacks are [43,32,39]. Existing studies have
experimented with complex deep learning and machine learn-
ing models, given in Table 1. Whereas, it is important to under-
stand the behavior of the underlying baseline models and their
sensitivity against poisoning attacks. This investigation helps
us better mitigate poisoning not only focusing on their perfor-
mance but underlying classification mechanisms. Considering
the above highlighted attack, our work has focused on manip-
ulating outliers to disrupt the features spaces of the multiclass
models, discussed in Section 4. In this work, we have shown the
efficacy and effectiveness of our attack on six machine learning
algorithms at various poisoning levels.
2.2. Existing Security Techniques against Multiclass Model
Poisoning
Limited techniques are provided in literature to cleanse
datasets to mitigate poisoning effects against multiclass ma-
chine learning models. A. McCarthy et al. [3] proposed a
hierarchical learning mechanism to secure network traffic at-
tack classification model. K. M. Hossain et al. [18] developed
a solution to detect backdoor poison in deep neural networks
by extracting, relabeling, and classifying features with a ten-
sor decomposition method. They have experimented their mit-
igation solution with MNIST, CIFAR-10, and TrojAI datasets.
2
Table 1: Analyzing existing studies against our behavioral analysis with OOP attack

| Research paper | ML model | Dataset | Effective poisoning level | Degradation/variance at various poisoning levels | Degradation/variance at various classes |
|---|---|---|---|---|---|
| B. Zhao et al. [53] | LeNet-5, Vgg-9, ResNet-50 | MNIST, CIFAR-10, ImageNet | ✗ | ✗ | ✗ |
| N. Carlini et al. [11] | ResNet-50, Transformer language model | Conceptual Captions | ✗ | ✗ | ✗ |
| I. Alarab et al. [5] | LEConv, CNN | Cora, MNIST | ✗ | ✗ | ✗ |
| V. Pantelakis et al. [37] | DT, RF, KNN, MLP | IoTID20 | ✗ | ✗ | ✗ |
Curie [25] is a method proposed to mitigate poisoning attacks against SVM. It introduces an additional feature dimension that maps labels to features, helping to segregate poisoned data points with flipped labels from normal data points. S. Melacci et al. [33] examined the effectiveness of incorporating domain knowledge into neural networks for detecting adversarial data points added during model training, evaluating their solution on neural networks with the CIFAR-100, ANIMALS, and PASCAL-Part datasets. In contrast to the above solutions, which focus on the efficacy of their proposed methods, we explicitly identify the need for a deep behavioral analysis of multiclass classifiers under poisoning attacks, to identify the key characteristics of machine learning algorithms affected by poison and to reveal relations between those characteristics and the injected poison. To achieve this, we discuss our threat model and attack formulation in Section 3.
3. Formal Notations of Multiclass Poisoning
3.1. Threat Model
We have taken a practical approach when designing our attack strategy, aligning our attack settings with the assumption that the adversary Adv does not know the underlying settings of the targeted model M_c or the dataset distribution dist(D_c). Three different datasets—IRIS, MNIST, and ISIC—with three, ten, and four classes respectively, were used to assess our methodology. We formulate our attack in end-to-end settings of poisoned training of benign models, where only the datasets are known to the adversary. Surrogate models M_s are developed and trained to craft poisoned data points X'_c with perturbed class labels l'_c at different poisoning levels ∆L. Poisoning levels are set between 5% and 25% inclusive, in steps of 5%. Our OOP attack initiates multiclass poisoning by identifying the maximally distanced data points to be contaminated and changing their class labels l'_c. With this outlier perturbation, we manipulate benign feature spaces by misplacing outliers within them. The goals of our OOP attack are to assess the performance degradation of individual algorithms and to analyze the behavior of multiclass models under the OOP attack. For the dataset manipulation with our OOP attack, let X_c ∈ D_c be the distanced data point that is infected by manipulating its label l_c to increase the loss L of the model with change γ in the multiclass decision boundaries b_c:

L(M_c, D'_c) = \gamma = \Delta b_c(T(M_c, D'_c))    (1)

where T is the model training process.
3.2. Outlier-Oriented Poisoning (OOP) Attack Settings
We have developed our attack to analyze the classification disruption of individual algorithms in multiclass classification settings. We develop surrogate models to poison datasets because, under the assumptions of our threat model given in Section 3.1, we do not know the configurations of the victim models. Firstly, we initialize surrogate models for each algorithm and train them on the targeted datasets. Secondly, the distance of each data point from the decision boundary of each class is calculated, and the points farthest from the decision boundaries are manipulated, following our attack settings in Definitions 3.2.1, 3.2.2, and 3.2.3, where D_c is the dataset with n classes, M_c is the clean model, and M'_c is the poisoned model.
3.2.1. Definition 1 (Multiclass Poisoned Training)
Considering T is the model training process with poisoned dataset D'_c and PM is the performance measure function, the objective function of our attack method is given in Equation 2, where θ is the measure of the distance of data points from the decision boundaries, defined in Equation 3.

\arg\min PM(M_c(X'_c, l'_c); \theta)    (2)

\text{s.t. } \theta = \arg\max d(b_c, X_c)    (3)

Also, D'_c is the poisoned dataset manipulated at various poisoning levels ∆L, where the dataset poisoning is given in Equation 4.

D'_c = \sum_{i=1}^{n \rightarrow \Delta L} f(D_{ci}(X_{ci}, l_{ci}), \Delta L), \quad \text{where } X_{l_c} \neq X'_{l_c}    (4)

where f is the label manipulation function, X_c is the clean data point, X'_c is the poisoned data point, and l_c is the label.
3.2.2. Definition 2 (Multiclass Model Disruption)
Let poison levels ∆L = [5%, 10%, 15%, 20%, 25%] manipulate datasets to mislead models at test time by disturbing the class-level decision boundaries b_c, with notation given in Alg. 1. Let f be the function that poisons dataset D_c at poison level ∆L. M'_c is the poisoned model trained with a dataset containing manipulated data points X'_c, as given in Eq. 5.

M'_c = T(M_c, D'_c), \quad \text{where } D'_c = f(D_c(X_c, l_c), \Delta L)    (5)

This allows us to analyze the model behavior and the change in decision boundaries as given in Eq. 6.

ModDis = \Delta b(M'_c)    (6)

where M'_c is the poisoned model developed for the algorithms [SVM, DT, RF, KNN, GNB, MLP] and ∆b is the change in decision boundaries.
3.2.3. Definition 3 (Multiclass Performance Analysis)
To conduct a statistical analysis of the performance degradation of multiclass models and the variance in test-time classification across different poisoning levels, we define the correct classification rate (CCR) given by Eq. 7.

CCR = \frac{\sum_{i=1}^{n} f(N_c, C_{ci}(M'_{ci}(D_t(X_{ti}))))}{\sum_{i=0}^{n} N_c}, \quad f(N_c, C(D_t(X_t))) = \begin{cases} \text{true} & \text{if } X_t \in \text{Class } c \\ \text{false} & \text{otherwise} \end{cases}    (7)

where f is the classification function, X_t is a data point from the validation dataset D_t, N_c is the total number of data points in Class c, C(·) is the class estimator, and CCR is the unpoisoned classification rate.
4. Outlier-Oriented Poisoning (OOP) Attack Method
Instinctively, we poison the training dataset to disrupt machine learning performance at validation. The proposed outlier-oriented poisoning attack algorithm is given in Alg. 1. With the OOP attack, we focus on manipulating the labels of the data points most distanced from the class boundaries to shift the classification predictions. This approach follows the threat model outlined in Section 3.1, under which we have no configuration details of the targeted model and therefore develop surrogate models to calculate data point distances. The surrogate model development procedure is presented in Alg. 2. Furthermore, we enhance our attack strategy by calculating decision boundaries distinctly for each of the considered machine learning algorithms. The algorithm to calculate decision boundaries is described in Alg. 3, following the attack settings given in Section 3.2.
4.1. Evaluation Metrics
For the performance evaluation and analysis of the impact of poisoning availability attacks on multiclass supervised machine learning algorithms, we evaluate our poisoned models as follows. The False Positive Rate (FPR) captures how many poisoned outliers successfully intrude into wrong classes. Accuracy (Acc) reflects the extent to which poisoned outliers remain disjoint from incorrect classes and model availability stays intact. Precision measures the extent to which outliers are unsuccessful in intruding across the multiclass decision boundaries. Recall quantifies how well a model can segregate the dataset classes and keep the decision boundaries intact. Variance (Var) captures the change in model behavior as its parameter values change with discrepancies in the dataset. Considering f(N_c, C(X_t)) is the classification function given in Eq. 8, the evaluation metrics are defined in Eqs. 9-13.

f(N_c, C(X_t)) = \begin{cases} \text{true} & \text{if } X_t \in \text{Class } c \\ \text{false} & \text{otherwise} \end{cases}    (8)
FPR = \frac{\sum_{i=0}^{n} f_{tr}(N_c, C(X'_{ti}))}{\sum_{i=0}^{n} f(N_c, C(X'_t)) \wedge \sum_{i=0}^{n} f(N_c, C(X_t))}, \quad \text{where } f_{tr}(N_c, C(X'_{ti})) \in D'_c, \; f(N_c, C(X'_t)) \in D'_c, \; f(N_c, C(X_t)) \in D_c    (9)

where D_c is the clean dataset, N_c is the total number of data points in Class c, and D'_c is the poisoned dataset with changed class labels of the farthest data points. f_{tr}(N_c, C(X'_{ti})) are poisoned data points with perturbed labels classified as false positives (FP), and f_{fs}(N_c, C(X'_{tri})) are false negative (FN) data points.
Acc = \frac{\sum_{i=0}^{n} f_{fs}(N_c, C(X_{ti})) \wedge \sum_{i=0}^{n} f_{tr}(N_c, C(X_{ti}))}{(X_c \in D_c) \wedge (X'_c \in D'_c)}    (10)

Prn = \frac{\sum_{i=0}^{n} f_{tr}(N_c, C(X_{ti}))}{\sum_{i=0}^{n} f_{tr}(N_c, C(X_{ti})) \wedge \sum_{i=0}^{n} f_{tr}(N_c, C(X'_{ti}))}, \quad \text{where } f_{tr}(N_c, C(X'_{ti})) \in D'_c    (11)

Rcl = \frac{\sum_{i=0}^{n} f_{tr}(N_c, C(X_{ti}))}{\sum_{i=0}^{n} f_{tr}(N_c, C(X_{ti})) \wedge \sum_{i=0}^{n} f_{fs}(N_c, C(X_{ti}))}, \quad \text{where } f_{fs}(N_c, C(X'_{ti})) \in D'_c    (12)

Variance(\sigma) = \frac{1}{N_c} \sum_{i=0}^{n} \left( f(N_c, C(X_{ti})) - \mu(f(N_c, C(X_{ti}))) \right)^2    (13)
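These metrics can be computed directly from predictions on the clean validation split. The sketch below is a minimal Python rendering using scikit-learn; it assumes integer-encoded labels and uses standard macro-averaged multiclass definitions rather than the exact per-class notation above, and all function names are illustrative.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def evaluate_poisoned_model(model, X_test, y_test):
    """Compute Acc, Prn, Rcl, F1, FPR and a variance proxy on clean test data."""
    y_pred = model.predict(X_test)

    acc = accuracy_score(y_test, y_pred)
    prn = precision_score(y_test, y_pred, average="macro", zero_division=0)
    rcl = recall_score(y_test, y_pred, average="macro", zero_division=0)
    f1 = f1_score(y_test, y_pred, average="macro", zero_division=0)

    # Multiclass FPR: average FP / (FP + TN) over one-vs-rest confusion matrices.
    cm = confusion_matrix(y_test, y_pred)
    fp = cm.sum(axis=0) - np.diag(cm)
    tn = cm.sum() - (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm))
    fpr = np.mean(fp / np.maximum(fp + tn, 1))

    # Variance of the predicted labels (assumes numeric labels), used here as a
    # rough indicator of prediction sensitivity.
    var = np.var(y_pred)
    return {"acc": acc, "precision": prn, "recall": rcl,
            "f1": f1, "fpr": fpr, "variance": var}
```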
Algorithm 1 OOP Poisoned Model Generation
Datasets: IRIS, MNIST, ISIC, ChestX-ray-14 datasets
Inputs: Training Dataset D_c, Poison level ∆L
Outputs: Poisoned Model M'_c
Initialize: D_c ← Training dataset
    ∆L ← Poisoning level ∈ [0%, 5%, 10%, 15%, 20%, 25%]
    M_conf ← [SVM, DT, RF, GNB, KNN, MLP]
    D'_c ← Poisoned dataset = []
    D_dist ← subset of Training dataset
while len(D'_c) ≤ ∆L do
    Set index i = max(D_dist)
    Set data point d_c = D_c[i]
    if d_c not in D'_c then
        Set l_c = Class(d_c)
        Update l_c = l_x, where x ≠ c
        Update Class(d'_c) = l_x
    end if
    D'_c ← d'_c
    Set D_dist[i] = 0
end while
D'_c,train = split(D'_c, 0.75)
M'_c = train(M_conf, D'_c,train)
return M'_c
5. Experimentation and Ablation Study
5.1. Experimental Setup
We have implemented our OOP attack on six multiclass machine learning algorithms, utilizing three datasets in five different poisoning settings detailed in Section 3.2. We developed surrogate models for all six algorithms, with the model configurations given in Alg. 2, for experimentation in a grey-box scenario. Following [46], [20], and [54], we selected the IRIS, MNIST, and ISIC multiclass datasets for our experimentation; Section 5.2 gives the dataset description and analysis. For our experiments, 75% of each dataset has been allocated for training (including poisoned training) while the remaining 25% clean data is used for testing model performance. This setup allowed us to assess the impact of our outlier-oriented poisoning approach on the models using the evaluation metrics outlined in Section 4.1. All models are developed in Python with the Scikit-learn, Pandas, Numpy, and OpenCV libraries, and experiments were run on a Windows 11 Pro 64-bit machine with a 4-core CPU and 16 GB RAM.
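A minimal sketch of this split, shown on IRIS with scikit-learn (variable names are illustrative; poisoning is applied only to the training portion, while the test portion stays clean):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 75% for (poisoned) training, 25% kept clean for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
```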
5.2. Datasets Description
Outlier-oriented label poisoning is implemented on mul-
ticlass classifiers using three benchmarked datasets (IRIS,
MNIST, and ISIC). The datasets differ in their numbers of classes, features, sizes, and structures; dataset characteristics are provided in Table 2. By employing our at-
tack across datasets with differing structures, we provide a com-
prehensive analysis of how data poisoning influences feature
correlations, class numbers, and dataset sizes within multiclass
contexts. The visual datasets representation with the Gaussian
Table 2: Dataset description used for behavioral analysis with the distance-based attack

| S.No. | Dataset | No. of features | No. of classes | No. of instances |
|---|---|---|---|---|
| 1 | IRIS | 4 | 3 | 170 |
| 2 | MNIST | 784 | 10 | 70,000 |
| 3 | ISIC | 20 | 4 | 603 |
Mixture Model is given in Figure 1, highlighting their feature correlations. Figure 1(a) illustrates that certain features within the IRIS dataset are strongly interdependent, whereas the complete dataset does not follow a linear relationship. MNIST is found to be a highly dense dataset with strong feature relations, as visualized in Figure 1(b). The ISIC dataset, shown in Figure 1(c), displays a non-linear relationship with significant outliers, indicative of substantial noise levels. Statistical correlations and feature relationships are quantified in Table 3. Features in the MNIST dataset are highly associated, with a p-value of 0.0141, highlighting direct proportionality between its features. Lower statistical significance is shown for the IRIS dataset with a p-value of 0.0791, and the p-value of the ISIC dataset is 0.2396. In addition, the negative Spearman correlation coefficient of ISIC highlights a negative linear correlation between its features, with a high noise ratio. Further analysis of the importance of feature correlation and the impact of dataset noise on our OOP attack is given in Section 5.3.
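The correlation statistics in Table 3 and the GMM views in Figure 1 can be approximated with standard tooling. The sketch below assumes numeric feature matrices and averages the pairwise Spearman statistics over feature pairs; the exact aggregation behind the single reported coefficient is our assumption, and sweeping all 784 MNIST features pairwise is expensive in practice.

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def dataset_correlation(X):
    """Average pairwise Spearman correlation and p-value over feature pairs."""
    rhos, pvals = [], []
    for i, j in combinations(range(X.shape[1]), 2):
        rho, p = spearmanr(X[:, i], X[:, j])
        rhos.append(rho)
        pvals.append(p)
    return np.mean(rhos), np.mean(pvals)

def gmm_view(X, n_components):
    """Fit a GMM on a 2-D PCA projection, as used for the Figure 1 visualizations."""
    X2 = PCA(n_components=2).fit_transform(X)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X2)
    return X2, gmm.predict(X2)
```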
5.3. Experimental Results and Analysis
In our study, we conducted experimental evaluations of our
OOP attack on multiclass machine learning algorithms. Our
Algorithm 2 Surrogate Model Development
Datasets: IRIS, MNIST, ISIC, ChestX-ray-14 datasets
Inputs: Training Dataset D_c, Model Configuration M_conf
Outputs: Surrogate Trained Model M_surr
Initialize: D_c ← Training dataset
    M_conf ← [
        Support Vector Machines (SVM) = Config(kernel='poly', degree of polynomial function=3, regularization parameter=3),
        Decision Tree (DT) = Config(criterion='gini', splitter='best'),
        Random Forest (RF) = Config(n_estimators=3, criterion='gini'),
        K-Nearest Neighbours (KNN) = Config(n_neighbors=5, weights='uniform'),
        Gaussian Naive Bayes (GNB) = Config(var_smoothing=1e-9),
        Multi-layer Perceptron (MLP) = Config(activation='relu', solver='adam') ]
for config in M_conf do
    M_surr(config) = initialize(M_surr, config)
    M_surr(config) = training(M_surr(config), D_c)
end for
return M_surr(config)
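The configurations listed in Algorithm 2 map directly onto scikit-learn estimators. A minimal sketch of the surrogate initialization and training loop, with the hyperparameters copied from the algorithm and everything else illustrative:

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Hyperparameters taken from Algorithm 2; defaults are used for everything else.
SURROGATE_CONFIGS = {
    "SVM": SVC(kernel="poly", degree=3, C=3),
    "DT": DecisionTreeClassifier(criterion="gini", splitter="best"),
    "RF": RandomForestClassifier(n_estimators=3, criterion="gini"),
    "KNN": KNeighborsClassifier(n_neighbors=5, weights="uniform"),
    "GNB": GaussianNB(var_smoothing=1e-9),
    "MLP": MLPClassifier(activation="relu", solver="adam"),
}

def train_surrogates(X_train, y_train):
    """Train one surrogate model per algorithm on the (clean) training data."""
    return {name: model.fit(X_train, y_train)
            for name, model in SURROGATE_CONFIGS.items()}
```

These fitted surrogates are what Algorithm 3 queries to obtain per-sample distances from the decision boundaries.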
Table 3: Statistical correlation of features in each dataset

| S.No. | Dataset | Spearman correlation | p-value |
|---|---|---|---|
| 1 | IRIS | 0.123888 | 0.0791 |
| 2 | MNIST | 0.009282 | 0.0141 |
| 3 | ISIC | -0.014311 | 0.2396 |
objective was to analyze the behavior of multiclass models and answer the following questions: which characteristics of these models are affected, and what is their relationship with the poison? What are the effective poisoning levels ∆L, and what are the effects of changing the poisoned data distribution? What is the effectiveness and persistence of data poisoning with our OOP attack, and what is its impact on model validation performance (specifically accuracy)? Finally, we quantify and analyze the model variance σ in test-time classification at different poisoning levels ∆L.
5.3.1. Effects of OOP Attack and Factors Affecting Multiclass
Model Classification
We initially evaluated the impact of our OOP attack on various multiclass models using three datasets, degrading their overall performance with the attack settings given in Section 3.2. Baseline results are given in Figure 2 to Figure 7, where validation accuracy, precision, recall, f1-score, and False Positive Rate are plotted against poisoned training up to the maximum poisoning level ∆L = 25%. Our findings indicate that the KNN algorithm was particularly vulnerable, experiencing the most significant accuracy disruption with a maximum decrease in accuracy (λ) of 40.35 at ∆L = 25% and an increase in FPR from 2.7% to 31.6%, as shown in Figure 5(a). This vulnerability stems from KNN being a non-parametric algorithm that relies on the proximity of data points to determine class features. Our attack manipulates these feature spaces by exploiting their outliers, causing misclassification. From Table 4, the number of nearest neighbors is found to be inversely proportional to ∆L, with the attack success rate reduced from 15.79% to 2.76% for the IRIS dataset when changing k=3 to k=15. Figure 5(c) demonstrates a high ASR when KNN is trained with the ISIC dataset, decreasing its validation accuracy to 63% with FPR = 28.25%, where, from Table 4, increasing the number of nearest neighbors decreases the ASR from 3.97% to 3.31% at ∆L = 25%.
Table 4: Analyzing k-neighbors affecting KNN accuracy with ∆L = (0, 10, 15, 25)%

| Dataset | Poison level | k=3 | k=5 | k=10 | k=15 |
|---|---|---|---|---|---|
| IRIS | ∆L=0% | 94.73 | 97.50 | 97.36 | 97.36 |
| IRIS | ∆L=10% | 89.47 | 97.36 | 97.30 | 94.73 |
| IRIS | ∆L=15% | 81.57 | 92.10 | 94.73 | 92.10 |
| IRIS | ∆L=20% | 78.94 | 84.21 | 94.60 | 94.60 |
| MNIST | ∆L=0% | 98.16 | 97.55 | 96.94 | 96.55 |
| MNIST | ∆L=10% | 92.41 | 96.52 | 96.78 | 96.50 |
| MNIST | ∆L=15% | 89.44 | 90.90 | 94.54 | 95.95 |
| MNIST | ∆L=25% | 85.34 | 76.14 | 83.68 | 87.52 |
| ISIC | ∆L=0% | 80.79 | 82.11 | 70.19 | 77.48 |
| ISIC | ∆L=10% | 77.48 | 77.48 | 66.88 | 74.17 |
| ISIC | ∆L=15% | 76.15 | 74.17 | 68.87 | 76.13 |
| ISIC | ∆L=25% | 76.82 | 74.07 | 64.90 | 74.17 |
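The k-sensitivity in Table 4 can be reproduced by sweeping the number of neighbors on a poisoned training set and scoring on the clean hold-out. A minimal sketch, assuming a poisoned label vector such as the one produced by the oop_poison helper sketched earlier:

```python
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy_vs_k(X_train, y_train_poisoned, X_test, y_test,
                      k_values=(3, 5, 10, 15)):
    """Validation accuracy of KNN for several k values on a poisoned training set."""
    scores = {}
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k, weights="uniform")
        knn.fit(X_train, y_train_poisoned)
        scores[k] = knn.score(X_test, y_test)   # accuracy on the clean hold-out
    return scores
```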
GNB is the second most affected algorithm, with validation accuracy decreasing from 92.98% to 56.14% and FPR increasing from 5.68% to 32.49% for 0% ≤ ∆L ≤ 25% on the IRIS dataset, as given in Fig. 4. Interestingly, the GNB model fails under our attack at ∆L = 15%, where its precision approaches 0, whereas a lower impact can be seen with the MNIST and ISIC datasets. Further analysis reveals a change in the importance of classes, leading to misclassification, as class probabilities shift across poisoning levels, given in Table 5. Our attack manipulates the Gaussian probability measures, making the highest-probability class an anomaly and vice versa for the IRIS dataset; however, only minor changes are visible for the MNIST and ISIC datasets, with no change in class ranking at 0% ≤ ∆L ≤ 15%. Our analysis also highlights that GNB is the most affected algorithm
Algorithm 3 Calculating Distances from Decision Boundaries
Inputs: Surrogate Models M_surr, Training Dataset D_c
Outputs: Calculated distances of models dist_M
Initialize: dist_M = [dist_SVM, dist_DT, dist_RF, dist_GNB, dist_KNN, dist_MLP]
    dp ← model data points
    M_surr = [M_SVM, M_DT, M_RF, M_GNB, M_KNN, M_MLP]
if M_surr == M_SVM then
    for dp ∈ M_SVM do
        dist_SVM[dp] ← decision_function(dp, M_SVM)
    end for
    dist_M[SVM] = dp
end if
if M_surr == M_DT then
    Clf_tree = M_DT.tree
    for dp ∈ D_c do
        dist[dp] ← calculate_depth(dp, Clf_tree)
    end for
    dist_M[DT] = dp
end if
if M_surr == M_RF then
    for clf_x ∈ M_RF do
        Clf_tree = clf_x.tree
        for dp ∈ D_c do
            dist[dp]_x = calculate_depth(dp, Clf_tree)
        end for
    end for
    dist_M[RF] = avg(dist[dp]_x, dist[dp]_{x+1}, ..., dist[dp]_{x+n})
end if
if M_surr == M_KNN then
    for dp ∈ D_c do
        dist(dp)_neighbors = M_KNN.kneighbors
        dist_M[KNN] ← arg max(distance(dp)_neighbors)
    end for
end if
if M_surr == M_GNB then
    D_ca, D_cb = split(D_c, 2)
    for i ∈ [D_ca, D_cb] do
        j = -i + 1
        for dp ∈ D_c[i] do
            Class(dp) = predict_probability(D_c[j], M_GNB)
            log_likelihood ← log(Class(dp))
            distance(dp) ← distance(arg max(Class(dp), axis=1))
        end for
    end for
    dist_M[GNB] = distance(dp)
end if
if M_surr == M_MLP then
    for dp ∈ M_MLP do
        dist_M[MLP] ← decision_function(dp, M_MLP)
    end for
end if
return dist_M
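A minimal scikit-learn rendering of Algorithm 3 for three representative cases is sketched below: SVM uses the magnitude of the decision function, KNN the distance to its farthest stored neighbor, and GNB/MLP a confidence-based proxy from predicted probabilities. Algorithm 3 itself uses a decision-function value for MLP and a tree-depth routine for DT/RF; those are simplified or omitted here, so treat this as an assumption-laden sketch rather than the exact implementation.

```python
import numpy as np

def boundary_distances(name, model, X):
    """Approximate per-sample distance from the surrogate decision boundary."""
    if name == "SVM":
        # Magnitude of the SVM decision function: larger = farther from the boundary.
        scores = model.decision_function(X)
        if scores.ndim == 1:
            return np.abs(scores)
        return np.abs(scores).max(axis=1)
    if name == "KNN":
        # Distance to the farthest of the k stored nearest neighbours.
        dists, _ = model.kneighbors(X)
        return dists.max(axis=1)
    if name in ("GNB", "MLP"):
        # Confidence-based proxy: a low top class probability is treated as
        # being far from any class region.
        proba = model.predict_proba(X)
        return 1.0 - proba.max(axis=1)
    raise ValueError(f"no distance rule for {name}")
```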
Figure 1: GMM visualization of the feature relationships in each dataset with PCA reduction. Panels: (a) features correlation in the IRIS dataset, (b) features correlation in the MNIST dataset, (c) features correlation in the ISIC dataset.
when trained with a dataset with fewer classes. In contrast, our OOP attack only minimally disrupted DT, resulting in λ values of 31.6 for IRIS, 15.18 for MNIST, and 17.88 for ISIC at ∆L = 25%. Table 6 demonstrates the change in feature importance scores with dataset poisoning, where the feature1 scores (0.90, 0.39) remain highest for IRIS and MNIST. However, feature1 (0.36), which has the highest importance score for ISIC, becomes anomalous, making the anomalous feature2 (0.37) the most important feature at ∆L = 15% and degrading its classification. The Random Forest algorithm demonstrates relative robustness, with its FPR converging to ≈2% and overall accuracy decreasing from 87% to 61.25% for the ISIC dataset, and its FPR converging to ≈9% with an accuracy of 82.38% for the MNIST dataset at ∆L = 25%, as shown in Figure 3. This is because RF follows an ensemble approach and classifies by averaging decisions over all of its trees, which normalizes the poisoning effects in our case. The change in feature importance scores for RF is given in Table 9, where feature ranks remain the same for IRIS and MNIST, but for ISIC the highest-ranked feature drops to rank two at ∆L = 15% poisoning. Lastly, SVM and MLP are also not found to be very sensitive to our OOP attack. For SVM, feature ranks remain intact, as given in Table 7, except for ISIC, where the feature3 importance score reduces from 0.39 to 0.33 at ∆L = 15%, making it an anomaly. A lower impact of our attack is visible on MLP in Fig. 7, except at ∆L = 15%, where it fails for the IRIS dataset.
5.3.2. Effects of Increasing Poisoning Rate
We extended our analysis to study the effects of consistently increasing poisoning rates on multiclass models under our OOP attack. The aggregated results, given in Figure 2 to Figure 7, show no signs of over-fitting. Our results demonstrate that the classification accuracy of multiclass classifiers suffers its maximum disruption when the training dataset is poisoned with our OOP attack at ∆L = 10%, irrespective of the dataset. An inverse relationship was observed between the number of classes in the dataset and the rate of performance degradation. For the MNIST dataset, with ten classes, performance decreases steadily from Fig. 2(b) to Fig. 7(b). In contrast, classifiers trained with the IRIS dataset, with three classes, show high fluctuation in performance, followed by ISIC with four classes. Lower percentages of data poisoning are more effective on parametric models: 10% poisoning has a steady and practical impact, whereas 15% poisoning leads to impractical effects. From Fig. 4(a) and Fig. 7(a), parametric models trained on the dataset with the minimum number of classes fail at 15% poisoning, while ∆L = 15% is very effective for non-parametric models. Conclusively, 10% ≤ ∆L ≤ 15% are the optimal poisoning rates for multiclass models, whereas ∆L > 15% yields only impractical success.
Figure 2: Performance analysis of Support Vector Machines (SVM) with consistent poisoning. Panels: (a) IRIS, (b) MNIST, (c) ISIC.
Figure 3: Performance analysis of Random Forest (RF) with consistent poisoning. Panels: (a) IRIS, (b) MNIST, (c) ISIC.
Figure 4: Performance analysis of Gaussian Naive Bayes (GNB) with consistent poisoning. Panels: (a) IRIS, (b) MNIST, (c) ISIC.
Figure 5: Performance analysis of K-Nearest Neighbours (KNN) with consistent poisoning. Panels: (a) IRIS, (b) MNIST, (c) ISIC.
5.3.3. Model Sensitivity to Poison and Effects of Data Distri-
bution
We investigated model sensitivity by analyzing the relationship between model variance and ASR. Table 10 illustrates the variance of the machine learning models in response to our OOP attack. The attack significantly increased the sensitivity of all tested models, with GNB exhibiting the highest sensitivity: its variance rises to 0.8 at ∆L = 10% for the IRIS dataset, almost equivalent to DT at the point where it fails. Similarly, the variance of KNN increases by 0.10 at ∆L = 15%, highlighting its high sensitivity and the effectiveness of our OOP attack. RF and DT proved to be less sensitive to our outlier-oriented attack. Interestingly, models trained with MNIST and ISIC are, on average, also less affected by our poisoning attack than models trained with the IRIS dataset, which show a high impact.
Figure 6: Performance analysis of Decision Tree (DT) with consistent poisoning. Panels: (a) IRIS, (b) MNIST, (c) ISIC.
Figure 7: Performance analysis of MLP with consistent poisoning. Panels: (a) IRIS, (b) MNIST, (c) ISIC.
Table 5: Analyzing class probabilities of GNB with poisoned dataset

| Dataset | Class | Clean dataset | ∆L=10% | ∆L=15% |
|---|---|---|---|---|
| IRIS | Class 0 | 0.33 | 0.36 | 0.38 |
| IRIS | Class 1 | 0.35 | 0.25 | 0.33 |
| IRIS | Class 2 | 0.31 | 0.37 | 0.27 |
| MNIST | Class 0 | 0.09 | 0.09 | 0.09 |
| MNIST | Class 1 | 0.11 | 0.11 | 0.11 |
| MNIST | Class 2 | 0.09 | 0.09 | 0.09 |
| MNIST | Class 3 | 0.10 | 0.10 | 0.10 |
| MNIST | Class 4 | 0.09 | 0.10 | 0.09 |
| MNIST | Class 5 | 0.08 | 0.09 | 0.09 |
| MNIST | Class 6 | 0.09 | 0.09 | 0.09 |
| MNIST | Class 7 | 0.10 | 0.10 | 0.10 |
| MNIST | Class 8 | 0.09 | 0.09 | 0.09 |
| MNIST | Class 9 | 0.09 | 0.10 | 0.10 |
| ISIC | Class 0 | 0.76 | 0.69 | 0.64 |
| ISIC | Class 1 | 0.05 | 0.08 | 0.10 |
| ISIC | Class 2 | 0.02 | 0.04 | 0.07 |
| ISIC | Class 3 | 0.14 | 0.17 | 0.17 |
Further analysis was conducted on dataset distribution to ascertain its impact on data poisoning and on the performance degradation of the models. Fig. 8 shows the change in data distribution under our OOP attack at 0% ≤ ∆L ≤ 25%. Our findings suggest that balanced datasets with a greater number of classes tend to mitigate the effects of poisoning on model performance, particularly in terms of model accuracy. In contrast, imbalanced and noisy datasets act as catalysts and boost the poisoning effects of our attack, leading to an impractically high decrease in performance, as for the ISIC dataset in our case, shown in Fig. 8(c). From our analysis, we identified the relations between various classification characteristics and the corresponding rates of data poisoning in Table 11.
6. Discussion and Limitations
•Our outlier-oriented poisoning (OOP) attack method: We formalize a novel grey-box attack to poison multiclass models, describing its efficacy and analyzing the factors affecting their classification behavior. Although several adversarial poisoning techniques have been proposed in the literature, limited experimentation is provided on multiclass classifiers. Existing research papers [42], [12], [17], and [45] propose solutions focusing on discrete dataset features and on detecting outliers to lessen poisoning effects. We instead take the outliers into the feature space to effectively poison the model. Following this, we highlight factors affecting individual algorithms and also determine effective levels of poisoning for parametric and non-parametric multiclass models. Our results show that a 10% poisoning rate is optimal for parametric and 15% for non-parametric models. At these optimal poisoning rates, we observe a lower level of model sensitivity that does not allow the model to over-fit, highlighting the efficacy of our attack.
•Factors affecting the behavior of poisoned multiclass models: Implementing the OOP attack, we conducted a deep behavioral analysis of multiclass machine learning, identifying the factors affecting model confidence.
Table 6: Feature importance scores - DT, where ∆L = (0%, 10%, 15%)

| Dataset | Clean: Feature1 | Clean: Feature2 | Clean: Feature3 | ∆L=10%: Feature1 | ∆L=10%: Feature2 | ∆L=10%: Feature3 | ∆L=15%: Feature1 | ∆L=15%: Feature2 | ∆L=15%: Feature3 |
|---|---|---|---|---|---|---|---|---|---|
| IRIS | 0.90 | 0.00 | 0.02 | 0.87 | 0.008 | 0.11 | 0.79 | 0.07 | 0.12 |
| MNIST | 0.39 | 0.34 | 0.26 | 0.39 | 0.33 | 0.27 | 0.39 | 0.32 | 0.28 |
| ISIC | 0.36 | 0.28 | 0.35 | 0.28 | 0.38 | 0.32 | 0.32 | 0.37 | 0.30 |
Table 7: Feature importance scores - SVM, where ∆L = (0%, 10%, 15%)

| Dataset | Clean: Feature1 | Clean: Feature2 | Clean: Feature3 | ∆L=10%: Feature1 | ∆L=10%: Feature2 | ∆L=10%: Feature3 | ∆L=15%: Feature1 | ∆L=15%: Feature2 | ∆L=15%: Feature3 |
|---|---|---|---|---|---|---|---|---|---|
| IRIS | 0.90 | 0.02 | 0.08 | 0.78 | 0.05 | 0.15 | 0.86 | 0.10 | 0.02 |
| MNIST | 0.40 | 0.16 | 0.43 | 0.34 | 0.23 | 0.42 | 0.36 | 0.21 | 0.42 |
| ISIC | 0.33 | 0.27 | 0.39 | 0.30 | 0.22 | 0.47 | 0.32 | 0.33 | 0.33 |
Figure 8: Data distribution with the distance-based (OOP) attack. Panels: (a) data distribution of the IRIS dataset, (b) data distribution of the MNIST dataset, (c) data distribution of the ISIC dataset.
From our results, GNB and KNN are found to be highly affected by our poisoning attack, whereas DT and RF are less affected. By manipulating the class labels of outliers, the class probabilities of GNB and the proximity-based distance calculation of KNN are heavily disrupted. Conversely, RF and DT are comparatively attack-agnostic because of their resilience against outliers.
•Impact of dataset structure on multiclass model poisoning: Our results highlight that the dataset size and its number of classes are inversely proportional to the poisoning effects, whereas an imbalanced dataset accelerates model poisoning. Imbalanced classes in multiclass datasets help poison penetrate the model effectively, to an extent. Also, a fundamental relation is found between dataset noise and data poisoning: dataset noise works as a catalyst for poisoning, leading to more adverse but impractical performance degradation.
Table 8: Analyzing SVM margin score for different datasets with ∆L = (0, 10, 15)%

| Dataset | ∆L=0% | ∆L=10% | ∆L=15% |
|---|---|---|---|
| IRIS | 0.005 | 0.01 | 0.001 |
| MNIST | 0.0000011 | 0.00000022 | 0.00000027 |
| ISIC | 0.01 | 0.003 | 0.003 |
•Limitations: Our research is limited to the analysis of classification algorithms and can be extended to regression algorithms. Within this limitation, we have analyzed the factors affecting classification behavior and model confidence under our poisoning attack. Also, comparing our attack with existing attacks from the literature would help demonstrate its efficacy, but this is out of the scope of this study.
7. Conclusion and Future Work
This paper analyzes the behavior of multiclass machine learning models, identifying the characteristics of individual algorithms under outlier-oriented data poisoning. We formulated outlier-oriented poisoning to compromise algorithm classification in multiclass settings. Our research analyzed the sensitivity of individual algorithms against the OOP attack, identifying their key characteristics. For example, the change in decision boundaries is strongly disrupted for KNN and GNB, but only minimal effects are visible for SVM. The feature importance scores of SVM and RF show limited impact, while the number of trees in RF and the number of k-neighbors in KNN show an inverse impact with increasing poison rates. Further analysis identified the most effective poisoning rates, i.e., 10% poisoning for parametric and 15% for non-parametric algorithms, and an impractical impact of poisoning above 15%, with a high and fluctuating decrease in performance without overfitting. Our results showed that KNN and GNB are the most affected algorithms, whereas RF and DT are resilient against the OOP attack. Our analysis also highlights that noisy datasets with non-uniform features aggravate the poisoning effects. In contrast, dense datasets with a higher number of classes normalize the poisoning effects, particularly on model accuracy.
For future work following our behavioral analysis, some poten-
tial directions to mitigate these poisoning attacks are:
•Improve adversarial training of the models by training
against 10%-15% of the poisoned dataset as these are the
effective poisoning levels.
•Model hardening can also be better implemented by following the effects on individual model parameters, such as k-neighbors in KNN and support vectors in SVM, in ensemble learning settings, rather than relying only on general performance metrics of the models such as accuracy, precision, recall, and FPR.
•To remediate outlier- or anomaly-focused attacks, pre-training models can be developed to identify outliers and cleanse the dataset before training, as sketched below.
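As one illustration of the last direction, outliers could be flagged and removed before training with a generic anomaly detector. The sketch below uses scikit-learn's IsolationForest purely for illustration; it was not evaluated in this study, and the contamination rate is a placeholder.

```python
from sklearn.ensemble import IsolationForest

def cleanse_training_set(X_train, y_train, contamination=0.1, random_state=0):
    """Drop points flagged as outliers before training the downstream model."""
    detector = IsolationForest(contamination=contamination,
                               random_state=random_state)
    keep = detector.fit_predict(X_train) == 1   # 1 = inlier, -1 = outlier
    # Assumes numpy arrays so that boolean masking works.
    return X_train[keep], y_train[keep]
```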
CRediT authorship contribution statement
Anum Paracha - Problem Statement, Conceptualization,
Investigation, Formal Analysis, Writing - Original draft.
Junaid Arshad - Conceptualization, Writing - Original draft.
Mohamed Ben Farah - Conceptualization, Writing - Review
and Update. Khalid Ismail - Conceptualization, Writing -
Review and Update.
Declaration of competing interest
The authors declare that they have no known competing
financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
Data availability
Data will be made available upon request.
Acknowledgement
No acknowledgments to declare.
References
[1] Zhou, C., Wang, Y. & Zhu, G. Object-attentional untargeted adversarial
attack. Journal Of Information Security And Applications.81 pp. 103710
(2024)
[2] Schneider, J. & Apruzzese, G. Dual adversarial attacks: Fooling humans
and classifiers. Journal Of Information Security And Applications.75 pp.
103502 (2023)
[3] McCarthy, A., Ghadafi, E., Andriotis, P. & Legg, P. Defending against
adversarial machine learning attacks using hierarchical learning: A case
study on network traffic attack classification. Journal Of Information Se-
curity And Applications.72 pp. 103398 (2023)
[4] Hojjat Aghakhani et al. “Bullseye polytope: A scalable clean-label poi-
soning attack with improved transferability”. In: IEEE European sympo-
sium on security and privacy (EuroS&P). IEEE. 2021, pp. 159–178.
[5] Ismail Alarab and Simant Prakoonwit. “Uncertainty estimation based ad-
versarial attack in multiclass classification”. In: Multimedia Tools and
Applications 82.1 (2023), pp. 1519–1536.
[6] Ali, Y. Adversarial attacks on deep learning networks in image classifi-
cation based on Smell Bees Optimization Algorithm. Future Generation
Computer Systems.140 pp. 185-195 (2023)
[7] Li, S., Huang, G., Xu, X. & Lu, H. Query-based black-box attack against
medical image segmentation model. Future Generation Computer Sys-
tems.133 pp. 331-337 (2022)
[8] Kshitiz Aryal, Maanak Gupta, and Mahmoud Abdelsalam. “Analysis of
label-flip poisoning attack on machine learning based malware detector”.
In: IEEE International Conference on Big Data (Big Data). IEEE. 2022,
pp. 4236–4245.
[9] Battista Biggio, Blaine Nelson, and Pavel Laskov. “Poisoning attacks
against support vector machines”. In: Proceedings of the 29th Inter-
national Coference on International Conference on Machine Learning.
ICML’12. Edinburgh, Scotland: Omnipress, 2012, pp. 1467–1474. isbn:
9781450312851.
Table 9:
Features importance score - RF where ∆L=(0%,10%,15%)
Dataset
Clean Dataset Poisoned Dataset ∆L=10% Poisoned Dataset ∆L=15%
Feature1 Feature2 Feature3 Feature1 Feature2 Feature3 Feature1 Feature2 Feature3
IRIS 0.66 0.15 0.17 0.58 0.19 0.22 0.52 0.22 0.25
MNIST 0.39 0.34 0.26 0.39 0.33 0.27 0.39 0.32 0.27
ISIC 0.31 0.35 0.34 0.31 0.36 0.32 0.34 0.33 0.32
Table 10:
Model Variance at Different Poisoning Levels
Dataset Algorithm Clean Dataset ∆L=10% ∆L=15%
IRIS
SVM 0.33 0.36 0.57
RF 0.62 0.60 0.63
GNB 0.65 0.73 0.68
KNN 0.81 0.82 0.91
DT 0.59 0.68 0.78
MLP 0.65 0.69 1.45
MNIST
SVM 8.33 8.06 7.97
RF 8.24 7.69 7.71
GNB 11.25 12.68 12.74
KNN 8.36 8.37 8.38
DT 8.33 7.81 8.02
MLP 8.31 8.38 8.31
ISIC
SVM 1.33 0.97 1.36
RF 1.11 1.17 1.32
GNB 1.27 1.66 1.19
KNN 0.31 0.37 0.27
DT 0.31 0.37 0.27
MLP 1.52 1.48 1.59
[10] Hamid Bostani and Veelasha Moonsamy. “Evadedroid: A practical eva-
sion attack on machine learning for black-box android malware detec-
tion”. In: Computers & Security 139 (2024), p. 103676.
[11] Nicholas Carlini and Andreas Terzis. “Poisoning and backdooring con-
trastive learning”. In: arXiv preprint arXiv:2106.09667 (2021).
[12] Huili Chen and Farinaz Koushanfar. “Tutorial: toward robust deep learn-
ing against poisoning attacks”. In: ACM Transactions on Embedded Com-
puting Systems 22.3 (2023), pp. 1–15.
[13] Jinyin Chen et al. “Deeppoison: Feature transfer based stealthy poison-
ing attack for dnns”. In: IEEE Transactions on Circuits and Systems II:
Express Briefs 68.7 (2021), pp. 2618–2622.
[14] Anil Kumar Chillara et al. “Deceiving supervised machine learning mod-
els via adversarial data poisoning attacks: a case study with USB key-
boards”. In: International Journal of Information Security (2024), pp.
1–19.
[15] Jimmy Z Di et al. “Hidden poison: Machine unlearning enables camou-
flaged poisoning attacks”. In: NeurIPS ML Safety Workshop. 2022.
[16] Ruidong Han et al. “A Credential Usage Study: Flow-Aware Leakage De-
tection in Open-Source Projects”. In: IEEE Transactions on Information
Forensics and Security (2023).
[17] Jonathan Hayase et al. “Spectre: Defending against backdoor attacks us-
ing robust statistics”. In: International Conference on Machine Learning.
PMLR. 2021, pp. 4129–4139.
[18] Khondoker Murad Hossain and Tim Oates. “Advancing Security in AI
Systems: A Novel Approach to Detecting Backdoors in Deep Neural Net-
works”. In: arXiv preprint arXiv:2403.08208 (2024).
[19] W Ronny Huang et al. “Metapoison: Practical general-purpose clean-
label data poisoning”. In: Advances in Neural Information Processing
Systems 33 (2020), pp. 12080–12091.
[20] Wei Huang, Xingyu Zhao, and Xiaowei Huang. “Embedding and extrac-
Table 11:
Analyzing one-to-one relation between poison and various parameters
of machine learning algorithms
Algorithm Algorithmic Parameters Relation to ∆P
SVM
Margin score Minimal impact
Decision boundary Minimal impact
Features importance score Minimal impact
DT Features importance score Minimal impact
Asymmetric features space High impact
KNN Decision boundary High impact
k-neighbors Inverse impact
GNB Decision boundary High impact
Class probabilities High impact
RF No. of trees Inverse impact
Features importance score Minimal impact
MLP Weights High impact
tion of knowledge in tree ensemble classifiers”. In: Machine Learning
111.5 (2022), pp. 1925–1958.
[21] Matthew Jagielski et al. “Subpopulation Data Poisoning Attacks”. In:
Proceedings of the 2021 ACM SIGSAC Conference on Computer and
Communications Security. CCS ’21. Virtual Event, Republic of Ko-
rea: Association for Computing Machinery, 2021, pp. 3104–3122. isbn:
9781450384544.
[22] Rishi Jha, Jonathan Hayase, and Sewoong Oh. “Label poisoning is all
you need”. In: Advances in Neural Information Processing Systems 36
(2024).
[23] Annapurna Jonnalagadda et al. “Modelling Data Poisoning Attacks
Against Convolutional Neural Networks”. In: Journal of Information &
Knowledge Management (2024), p. 2450022.
[24] Pang Wei Koh, Jacob Steinhardt, and Percy Liang. “Stronger data poi-
soning attacks break data sanitization defenses”. In: Machine Learning
(2022), pp. 1–47.
[25] Ricky Laishram and Vir Virander Phoha. “Curie: A method for pro-
tecting SVM classifier from poisoning attack”. In: arXiv preprint
arXiv:1606.01584 (2016).
[26] Ganlin Liu, Xiaowei Huang, and Xinping Yi. “Adversarial Label Poison-
ing Attack on Graph Neural Networks via Label Propagation”. In: Euro-
pean Conference on Computer Vision. Springer. 2022, pp. 227–243.
[27] Gaoyang Liu et al. “Gradient-Leaks: Enabling Black-Box Membership
Inference Attacks Against Machine Learning Models”. In: IEEE Trans-
actions on Information Forensics and Security (2023).
[28] Zhuoran Liu, Zhengyu Zhao, and Martha Larson. “Image shortcut
squeezing: Countering perturbative availability poisons with compres-
sion”. In: International conference on machine learning. PMLR. 2023,
pp. 22473–22487.
[29] Tobias Lorenz, Marta Kwiatkowska, and Mario Fritz. “Certifiers Make
Neural Networks Vulnerable to Availability Attacks”. In: Proceedings of
the 16th ACM Workshop on Artificial Intelligence and Security. 2023, pp.
67–78.
[30] Ke Ma et al. “Poisoning Attack Against Estimating From Pair-
wise Comparisons”. In: IEEE Transactions on Pattern Analy-
sis and Machine Intelligence 44.10 (2022), pp. 6393–6408. doi:
10.1109/TPAMI.2021.3087514.
[31] Debasmita Manna and Somanath Tripathy. “TriMPA: Triggerless Tar-
geted Model Poisoning Attack in DNN”. In: IEEE Transactions on Com-
putational Social Systems (2024).
[32] Robin Mayerhofer and Rudolf Mayer. “Poisoning attacks against feature-
based image classification”. In: Proceedings of the Twelfth ACM Confer-
ence on Data and Application Security and Privacy. 2022, pp. 358–360.
[33] Stefano Melacci et al. “Domain knowledge alleviates adversarial attacks
in multi-label classifiers”. In: IEEE Transactions on Pattern Analysis and
Machine Intelligence 44.12 (2021), pp. 9944–9959.
[34] Hamed Moradi et al. “Recent developments in modeling, imaging, and
monitoring of cardiovascular diseases using machine learning”. In: Bio-
physical Reviews 15.1 (2023), pp. 19–33.
[35] Luis Muñoz-González et al. “Towards poisoning of deep learning al-
gorithms with back-gradient optimization”. In: Proceedings of the 10th
ACM workshop on artificial intelligence and security. 2017, pp. 27–38.
[36] Bao-Ngoc Nguyen et al. “Label-Only Model Inversion Attacks via
Knowledge Transfer”. In: Advances in Neural Information Processing
Systems 36 (2024).
[37] Vasileios Pantelakis et al. “Adversarial Machine Learning Attacks on
Multiclass Classification of IoT Network Traffic”. In: Proceedings of the
18th International Conference on Availability, Reliability and Security.
2023, pp. 1–8.
[38] Alessio Russo and Alexandre Proutiere. “Poisoning attacks against data-
driven control methods”. In: American Control Conference (ACC). IEEE.
2021, pp. 3234–3241.
[39] Aniruddha Saha, Akshayvarun Subramanya, and Hamed Pirsiavash.
“Hidden trigger backdoor attacks”. In: Proceedings of the AAAI confer-
ence on artificial intelligence. Vol. 34. 07. 2020, pp. 11957–11965.
[40] Jose Rodrigo Sanchez Vicarte et al. “Game of Threads: Enabling
Asynchronous Poisoning Attacks”. In: Proceedings of the Twenty-Fifth
International Conference on Architectural Support for Programming
Languages and Operating Systems. ASPLOS ’20. Lausanne, Switzer-
land: Association for Computing Machinery, 2020, pp. 35–52. isbn:
9781450371025. doi: 10.1145/3373376.3378462.
[41] Abdur R Shahid et al. “Label flipping data poisoning attack against wear-
able human activity recognition system”. In: IEEE Symposium Series on
Computational Intelligence (SSCI). IEEE. 2022, pp. 908–914.
[42] Jacob Steinhardt, Pang Wei W Koh, and Percy S Liang. “Certified de-
fenses for data poisoning attacks”. In: Advances in neural information
processing systems 30 (2017).
[43] Fnu Suya et al. “Model-targeted poisoning attacks with provable con-
vergence”. In: International Conference on Machine Learning. PMLR.
2021, pp. 10000–10010.
[44] Yihan Wang, Yifan Zhu, and Xiao-Shan Gao. “Efficient Availability At-
tacks against Supervised and Contrastive Learning Simultaneously”. In:
arXiv preprint arXiv:2402.04010 (2024).
[45] Sandamal Weerasinghe et al. “Defending support vector machines against
data poisoning attacks”. In: IEEE Transactions on Information Forensics
and Security 16 (2021), pp. 2566–2578.
[46] Vasin Wongrassamee and Luis Muñoz-González. “Can you Poison a
Machine Learning Algorithm?” In: (2017).
[47] Zhou Yang et al. “Stealthy backdoor attack for code models”. In: IEEE
Transactions on Software Engineering (2024).
[48] Fangchao Yu et al. “Chronic Poisoning: Backdoor Attack against Split
Learning”. In: Proceedings of the AAAI Conference on Artificial Intelli-
gence. Vol. 38. 15. 2024, pp. 16531–16538.
[49] Tengchan Zeng et al. “Convergence of communications, control, and ma-
chine learning for secure and autonomous vehicle navigation”. In: IEEE
Wireless Communications (2024).
[50] Chen Zhang, Zhuo Tang, and Kenli Li. “Clean-label poisoning attack with
perturbation causing dominant features”. In: Information Sciences 644
(2023), p. 118899.
[51] Xuezhou Zhang, Xiaojin Zhu, and Laurent Lessard. “Online data poison-
ing attacks”. In: Learning for Dynamics and Control. PMLR. 2020, pp.
201–210.
[52] Bingyin Zhao and Yingjie Lao. “CLPA: Clean-label poisoning availabil-
ity attacks using generative adversarial nets”. In: Proceedings of the AAAI
Conference on Artificial Intelligence. Vol. 36. 8. 2022, pp. 9162–9170.
[53] Bingyin Zhao and Yingjie Lao. “Towards class-oriented poisoning attacks
against neural networks”. In: Proceedings of the IEEE/CVF Winter Con-
ference on Applications of Computer Vision. 2022, pp. 3741–3750.
[54] Mengxin Zheng et al. “TrojFair: Trojan Fairness Attacks”. In: arXiv
preprint arXiv:2312.10508 (2023).
[55] Haoti Zhong et al. “Backdoor embedding in convolutional neural net-
work models via invisible perturbation”. In: Proceedings of the Tenth
ACM Conference on Data and Application Security and Privacy. 2020,
pp. 97–108.
[56] Chen Zhu et al. “Transferable clean-label poisoning attacks on deep neu-
ral nets”. In: International conference on machine learning. PMLR.
2019, pp. 7614–7623.
[57] Yi Zhu et al. “TileMask: A Passive-Reflection-based Attack against
mmWave Radar Object Detection in Autonomous Driving”. In: Proceed-
ings of the 2023 ACM SIGSAC Conference on Computer and Commu-
nications Security. CCS ’23. Copenhagen, Denmark: Association for
Computing Machinery, 2023, pp. 1317–1331. isbn: 9798400700507. doi:
10.1145/3576915.3616661.
Anum Paracha is a PhD student at
the School of Computing and Digital
Technology, Birmingham City Uni-
versity, UK. Her research interests are
to investigate use of advanced ma-
chine learning techniques to mitigate
emerging cybersecurity research chal-
lenges.
Junaid Arshad is a Professor in Cy-
ber Security and has extensive research
experience and expertise in investigating
and addressing cybersecurity challenges
for diverse computing paradigms. Junaid
has strong experience of developing be-
spoke digital solutions to meet industry
needs. He has extensive experience of
applying machine learning and AI algo-
rithms to develop bespoke models to ad-
dress specific requirements. He is also actively involved in
R&D for secure and trustworthy AI, focusing on practical ad-
versarial attempts on such systems especially as a consequence
of cutting-edge applications of generative AI.
Mohamed Ben Farah is a Lecturer in
Cyber Security at Birmingham City Uni-
versity. Mohamed has published over 30
journal and conference papers and has
organized conferences and workshops in
Cyber Security, Cryptography and Arti-
ficial Intelligence. He is a reviewer for
world-leading academic conferences and
journals and is the Outreach Lead of the
Blockchain Group for IEEE UK and Ire-
land.
Khalid Ismail is a Senior Lecturer in
Computer Science at Birmingham City
University. Dr Ismail’s primary research
interests lie in the fields of Artificial
Intelligence, computer vision, advanced
machine learning, image processing, and
deep learning, particularly when applied
to complex real-world challenges. Currently, he is supervising the development of many AI-based intelligent projects and has been an active part of industry-based collaborative projects.