Deceiving Post-hoc Explainable AI (XAI) Methods in Network Intrusion Detection

Thulitha Senevirathna, Bartlomiej Siniarski, Madhusanka Liyanage, Shen Wang
School of Computer Science, University College Dublin, Ireland
Email: thulitha.senevirathna@ucdconnect.ie, bartlomiej.siniarski@ucd.ie, madhusanka@ucd.ie, shen.wang@ucd.ie
Abstract—Artificial Intelligence used in future networks is vulnerable to biases, misclassifications, and security threats, which invites constant scrutiny of its accountability. Explainable AI (XAI) methods bridge this gap by identifying unaccounted-for biases in black-box AI/ML models. However, scaffolding attacks can hide the internal biases of a model from XAI methods, jeopardizing any auditing or monitoring processes, service provisions, security systems, regulators, auditors, and end-users in future networking paradigms, including Intent-Based Networking (IBN). For the first time, we formalize and demonstrate a framework for how an attacker would adopt scaffoldings to deceive the security operators of a Network Intrusion Detection System (NIDS). Furthermore, we propose a detection method that auditors can use to detect the attack efficiently. We rigorously test the attack and detection methods using the NSL-KDD dataset. We then simulate the attack on 5G network data. Our simulation illustrates that the attack adoption method is successful and that the detection method can identify an affected model with extremely high confidence.
Index Terms—Explainable security, 5G, B5G, Network Intrusion Detection, Machine Learning, Scaffolding Attack, Future networks, Intent-based networks
I. INTRODUCTION
Fast-changing networking technologies are increasingly driven by Machine Learning (ML) and Artificial Intelligence (AI), which in turn demand accountability and trustworthiness. Network Intrusion Detection Systems (NIDSs) are rapidly adopting AI/ML techniques [1], [2] and are already available for purchase from third-party companies. This trend is appealing to networking organizations, as they can save costs and focus their full strength on their core products. However, for proprietary reasons, such models are purposely turned into black boxes (e.g., tree-based models) to conceal model knowledge from rivals. OpenAI's refusal to disclose GPT-4's architecture foreshadows such a trend [3]. In recent research, Explainable AI (XAI) has become an X-ray for black-box NIDS models, increasing their transparency. In-house red/blue teams, independent security auditors, and regulatory authorities can utilize
XAI with minimal programming abilities [4]. However, it has recently been brought to light that scaffolding attacks [5] can deceive even the XAI methods, leaving clients vulnerable to external attacks. A model creator could be enticed to embed a scaffolding in an AI model for several reasons: to undermine competitors through bias, to enable later access through back-doors [6], [7], or simply out of the malice of a disgruntled employee. Considering the wide variety of uses of post-hoc XAI methods, a deceptive AI model deployed by an attacker could have a devastating effect, for instance, when explanations are fed back into the AI model from client-side training to improve its accuracy [8], [9]. Wrong explanations, multiplied over many feedback iterations, can degrade the model or even open back-doors for future attacks.

So far, this attack is only known to have been used in socio-cultural instances. With our work, we extend the attack's applicability to the networking domain.

(This research is part of the SPATIAL project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 101021808, and of the CONNECT Phase 2 project, which has received funding from Science Foundation Ireland under grant no. 13/RC/2077 P2.)
A. Related work
Works such as [9]–[11] bring to light the potential of XAI in the context of Intrusion Detection Systems (IDS). However, these works do not consider XAI outputs in an adversarial context where the black-box models actively try to deceive the explanations. The authors of [5] show the scaffolding attack to be effective in real-world socio-cultural cases. For instance, by scaffolding the AI model, one can hide from the auditors generating explanations a bias that categorizes African-Americans as the riskier demographic for obtaining bail. Similarly, they show that biases on race and gender in AI/ML models can be hidden from XAI methods using scaffoldings. However, their work has yet to propose a direct solution to the attack or to consider the possibility of using it on datasets heavy with continuous data, such as IDS data. Solutions to scaffolding attacks are rarely explored in the current literature. For example, the methods proposed in [12], [13] are either computationally intensive or not tested on networking data, and so do not provide a concrete analysis of the attack's adaptability.
B. Contributions
We bring forth a NIDS tested on real-world 5G traffic data in the face of adversaries targeting XAI. To the best of our knowledge, this paper is the first of its kind to address the manipulation of explanations in 5G. Our other contributions are listed as follows.
First, we implement an improved scaffolding attack on NIDS for 5G and beyond. We use high-accuracy models for its constituent components, increasing the attack's success. In particular, in contrast to the existing work, our version of the attack is tested on real-world 5G NIDS deployments and is inconspicuous to a service provider by design.

Second, we propose the "committee of estimators" and "domain knowledge filtering" methods, which provide efficient and pragmatic means for an attacker to select a high-impact target feature when scaffolding a NIDS [5].

Third, we propose a computationally efficient yet highly effective method for detecting scaffolding attacks using the Hellinger distance, requiring only query-level access to the model. The attack's success and its detection are rigorously tested using various metrics.
The paper continues as follows. Section II introduces our system model, the attacker's goals, and the adversarial parameters. We explore the proposed framework for target feature selection and attack detection in Section III. In Section IV, we discuss our research scheme and analysis. Finally, we draw our conclusions in Section V.
II. SYSTEM MODEL

Post-hoc explainers such as LIME [14] and SHAP [15] are popularly used to analyze the faithfulness of ML models. They query the model with synthetically generated data, perturbed from real-world data, to identify the importance the model assigns to each input feature (referred to as attributions in [15]). With a scaffolding, a model creator can manipulate the feature importance values generated for system operators. By selecting the right features to manipulate, scaffolds can provide various competitive advantages over other business service providers in the networking domain (see Figure 1). For example, a service provided by a member of an affiliated company in the feature set (e.g., a subsidiary) can be made falsely more critical to the model decision. On the other hand, one can depreciate the importance of a competitor's service to tarnish the clients' confidence in it. However, selecting this feature is highly use-case specific and paramount for the attack's success. Thus, in this paper, we first answer the question "how should the attacker select the best feature to scaffold?".
Undiscovered scaffolding biases in black-box AI models may significantly impact Intent-Based Networking (IBN) implementations. Accountability in intent translation/resolution is crucial under a high-level abstraction of intentions. With a scaffolded model, a vendor can secretly introduce biased network configurations favoring its affiliate members. The severity lies in the undetectability of such biases prior to deployment using model-agnostic XAI methods; this subtlety means the bias can only be detected once deployed, after a long period of biased intent translation/resolution. The same attack can be replicated in the resource management (NIDS, Quality of Service (QoS), policy management), network fault detection, and monitoring components of IBN with only minor modifications. This paper demonstrates the power of scaffoldings in a NIDS scenario. We use an ML-based NIDS model as the affected model to analyse the effectiveness of the scaffolding attack.
Fig. 1: The ML model creator embeds a scaffolding into the AI model (NIDS/intent translator/resource allocator) before it is deployed by the network manager. The deployed model provides biased decisions that constrain a competitor's service and give affiliate services an unfair advantage, without the knowledge of the system operators, auditors, and independent regulatory bodies. Post-hoc XAI methods fail to capture the biases if the AI model is scaffolded.
Therefore, the second research question we tackle in this work is "how can a scaffolded ML model used in a NIDS be detected?".
A. Adversarial architecture
The internal specifications of the scaffolded model are shown in Figure 2. This interior architecture is well hidden from the users in this adversarial setting [5]. The perturbation_clf identifies whether an input is a perturbed data point coming from an XAI method such as LIME/SHAP or a real-world data point. Depending on the accuracy of the perturbation_clf, a data point classified as a synthetically created perturbation (from an XAI method) is detoured to the innocuous_model, which is trained to give highly accurate inferences in a real-world scenario. The innocuous_model in our version is improved from the rule-based model of the existing work to make it more fitting to real-world NIDS applications. However, due to the nature of the attack, the obscure_model is a rule-based model that produces its inference only by referring to the target_ft, which is the feature the attacker wants to conceal from the auditors and outside parties.
We assume the adversary has successfully placed a scaffolding in the ML model and that external access is limited to the query level only. The model is only expected to return a probability of abnormality for each data point. The deployed model a(·) would increasingly block a service protocol (in an input x_i) that the adversary prefers, say http, causing delays for the competitor's services that use the said protocol.
a(x) = \begin{cases} \text{block}, & \text{if } x_{\text{service protocol}} = \text{http} \\ \text{allow}, & \text{otherwise} \end{cases} \qquad (1)
Fig. 2: Internal structure of the adversarial (black-boxed) model: the perturbation_clf decides whether an input record is a perturbed data point or a real-world data point; perturbed points are routed to the high-accuracy innocuous_model, while real-world points are routed to the obscure_model (if target_ft > 0 then positive outcome). For the model users, the input query functionality and the output probability/label are the only accessible functions of the model. The whole adversarial model, wrapped in the scikit-learn model wrapper, is considered a black box, which is the case in most real-world scenarios.
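To make the routing logic of Figure 2 concrete, the following minimal Python sketch (our illustration, not code released with this work) wraps the three components behind a single scikit-learn-style predict interface; the perturbation_clf, innocuous_model, and target_ft index passed in are placeholders for the components described above.

import numpy as np

class ScaffoldedModel:
    """Illustrative sketch of the adversarial (black-boxed) model in Fig. 2."""

    def __init__(self, perturbation_clf, innocuous_model, target_ft_idx):
        self.perturbation_clf = perturbation_clf    # detects XAI-style perturbations
        self.innocuous_model = innocuous_model      # high-accuracy model shown to explainers
        self.target_ft_idx = target_ft_idx          # index of the feature the attacker hides

    def _obscure_model(self, X):
        # Rule-based biased model: the decision depends only on target_ft
        # (e.g., block whenever the preferred service protocol is present).
        return (X[:, self.target_ft_idx] > 0).astype(int)

    def predict(self, X):
        X = np.asarray(X)
        is_perturbed = self.perturbation_clf.predict(X).astype(bool)
        out = np.empty(len(X), dtype=int)
        # Real-world traffic is routed to the biased obscure model ...
        out[~is_perturbed] = self._obscure_model(X[~is_perturbed])
        # ... while suspected XAI perturbations receive the innocuous model's answer.
        if is_perturbed.any():
            out[is_perturbed] = self.innocuous_model.predict(X[is_perturbed])
        return out

From the outside, only predict (or an equivalent probability query) is visible, which is why the explainer never observes the rule-based branch.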
III. PROPOSED FRAMEWORK

In this section, we first discuss our novel method for selecting a high-impact target feature using a committee of estimators and domain knowledge filtering. Then we introduce a methodology for detecting the scaffolding attack in the NIDS domain.

A. Methodology for selecting the high-impact target features in security datasets

From the attacker's perspective, we propose a novel framework for target feature selection, as shown in Figure 3. As the first step, we propose to obtain the feature importance through a selected XAI function, as shown under Step 1 of Algorithm 1.
Fig. 3: Target feature selection process. SHAP explanations are taken across several candidate models (e.g., SVM, MLP) trained on the dataset, and the candidate target features are combined with domain knowledge filtering to select the target feature.
Obtaining feature importance values purely through XAI will only give the model's perspective on the features described in the dataset. In the domain knowledge filtering step, the attacker can perform reconnaissance on domain-related data to narrow the selection down further and make the attack more fitting to the threat model.

Algorithm 1: select_target_features(X_norm, type)
Input: a, b, i, j, N, M ∈ ℕ; g(·) explainer; F = {f_1, f_2, ...}_{i=1}^{N} ML estimators; Γ = {µ_1, µ_2, ...}_{i=1}^{N} performance metrics
Output: H ⊂ {target_feature}
Data: X ∈ ℝ^M; y ∈ ℝ; S_m = {m_1, m_2, ...}_{j=1}^{M} feature space of X
Procedure:
Step 1 - Committee of Estimators:
  // Initialize H as a set and Φ as an (N × M) matrix
  H ← {}; Φ ← [ ]_{N×M}; B ← [ ]_{1×M}; Ω ← [ ]_{1×M}
  for f_i ∈ F do
    // Train each estimator with X and y
    f_i.train(standard_scaler(X), y)
    // Generate feature importance scores (ϕ_i)^M
    (ϕ_i)^M = g(f_i)
    // Multiply each feature score by the relevant µ_i and append to one matrix
    φ_i^M = µ_i · (ϕ_i)^M
    Φ.append(φ_i^M)
  // Take the overall importance of each feature over the N estimators as B^M
  for j ∈ range(M) do
    β_j = Σ_{i=1}^{N} φ_{i,j}
    B^M.append(β_j)
Step 2 - Domain Knowledge Filtering:
  // Apply a domain knowledge function to each feature to obtain its importance for the
  // specific threat model; C_1, C_2, ... are various costs associated with the threat model
  for j ∈ range(M) do
    h_j = 1 / ω_m(m; C_1, C_2, ...)
    Ω^M.append(h_j)
  // Select a features from B^M and b features from Ω^M and take their intersection
  H ← {m_i : m_i ∈ (max_{β_j}[a](B^M) ∩ max_{h_j}[b](Ω^M))}
  return H, where |H| ≠ 0
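A minimal Python sketch of Step 1 (the committee of estimators) is given below. It assumes SHAP's KernelExplainer as the explainer g(·) and the F1 score as the performance metric µ_i; the train/test split, the background-sample sizes, and the helper name committee_scores are illustrative assumptions rather than the exact experimental code.

import numpy as np
import shap
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def committee_scores(estimators, X, y, background_size=100):
    """Weighted, aggregated feature-importance scores beta_j (Algorithm 1, Step 1)."""
    X_scaled = StandardScaler().fit_transform(X)
    X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, test_size=0.2, random_state=0)
    Phi = []
    for est in estimators:
        est.fit(X_tr, y_tr)                                    # train each committee member
        mu = f1_score(y_te, est.predict(X_te))                  # performance weight mu_i
        background = shap.sample(X_tr, background_size)         # background data for SHAP
        explainer = shap.KernelExplainer(est.predict, background)
        shap_vals = explainer.shap_values(shap.sample(X_te, 50))
        phi = np.abs(np.asarray(shap_vals)).reshape(-1, X_scaled.shape[1]).mean(axis=0)
        Phi.append(mu * phi)                                    # weight importance by the metric
    beta = np.sum(Phi, axis=0)                                  # overall importance beta_j
    return beta

# Example use (hypothetical committee): the attacker would pick the feature with the
# largest beta_j as the candidate target_ft before applying domain knowledge filtering.
# beta = committee_scores([RandomForestClassifier(), MLPClassifier()], X, y)
# target_ft_idx = int(np.argmax(beta))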
B. Attack detection

This section introduces our novel method for detecting the scaffolding attack, with a thorough analysis of the evidence. The auditor is expected to obtain a set of real-world samples X from the service provider for testing purposes, or to use a real-world dataset that fits the feature space of a(·). Feeding X into a(·) would ideally cause the perturbation_clf to detect the samples as real-world data and send them to the obscure_model; the outputs form the inference distribution y. Similarly, the auditor would feed a perturbed set of data samples X_g into a(·), causing the perturbation_clf to classify them as coming from an XAI method and direct them towards the innocuous_model. These outputs would be accurate in the case of a high-accuracy innocuous_model, or inaccurate if the attacker used the original authors' model. We name this inference distribution y_g. Empirically, we show that the distance between y_g and y is significant, so the two can be told apart using a statistical measure such as the Hellinger distance, which is computationally very efficient. Let P = (p_1, ..., p_k) and Q = (q_1, ..., q_k) be discrete probability distributions. Then the Hellinger distance is defined as in Equation 2. We also conduct ablation experiments to verify our results for the detection technique.
H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2} \qquad (2)
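The following Python sketch illustrates the audit test implied by Equation 2: the auditor queries the model for abnormality probabilities on real samples X and on perturbed samples X_g, bins the two sets of outputs into discrete distributions, and flags the model if their Hellinger distance exceeds an attack-type-specific threshold. The histogram binning, the predict_proba interface, and the default threshold of 0.63 (the DoS value reported in Section IV) are illustrative assumptions.

import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (Equation 2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2.0)

def detect_scaffolding(model, X_real, X_perturbed, threshold=0.63, bins=10):
    """Query-level audit: compare output distributions on real vs. perturbed data."""
    y_real = model.predict_proba(X_real)[:, 1]        # abnormality probabilities for X
    y_pert = model.predict_proba(X_perturbed)[:, 1]   # abnormality probabilities for X_g
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(y_real, bins=edges)
    q, _ = np.histogram(y_pert, bins=edges)
    p = p / p.sum()                                   # normalize counts to distributions
    q = q / q.sum()
    dist = hellinger(p, q)
    return dist, dist > threshold                     # a large distance suggests a scaffolded model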
IV. EXPERIMENT DETAILS

In this section, we elaborate on our experimental details and settings. As a fundamental step towards overall scaffolding attacks in the security domain, we test this attack on real-world 5G (and beyond) data (5GNIDD [16]) and on four sub-datasets of NSL-KDD, a benchmarking dataset in the NIDS field. Details of the datasets are given in Table I.
A. Datasets
TABLE I: Dataset descriptions

Dataset  | Attack type | # attack recs | Total size
NSL-KDD  | DoS         | 53,385        | 106,770
NSL-KDD  | Probe       | 14,077        | 28,154
NSL-KDD  | R2L         | 3,882         | 7,764
NSL-KDD  | U2R         | 119           | 238
5G-NIDD  | DoS         | 738,153       | 1,215,890
1) NSL-KDD dataset: The NSL-KDD dataset [17] is an improved version of the benchmark intrusion detection dataset from KDDCup'99. Even though the NSL-KDD dataset does not perfectly represent actual network data [18], it is suitable as a benchmark dataset, as shown in [19], [20]. Every traffic record of the dataset has 41 features and one label belonging to either the normal class or to the Denial of Service (DoS), Remote to Local (R2L), User to Root (U2R), or Probe attack classes. The testing set contains attack types absent from the training set, making it a more realistic theoretical foundation for intrusion detection. The NSL-KDD dataset is a relatively clean dataset. The separation into sub-datasets was done based on attack type, and each sub-dataset was balanced with an equal number of normal data points.
2) 5GNIDD dataset: The 5GNIDD dataset [16] is a real-world 5G dataset collected from the 5G Test Network Finland. It contains benign data from various users, collected while the attacker launches the attacks. Attack records from several DoS attack types, such as UDP flood, SYN flood, HTTP flood, slow-rate DoS, and ICMP flood, are collected from the attacker. The dataset contains around 50 binary features. The dataset was cleaned of null values and scaled using a standard scaler before being used to train the models.
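A minimal sketch of this preprocessing, under the assumption of a CSV export with a categorical label column (the column name and label values below are hypothetical), could look as follows.

import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(csv_path, label_col="Label"):
    df = pd.read_csv(csv_path).dropna()             # remove records with null values
    y = (df[label_col] != "Benign").astype(int)     # hypothetical benign/malicious labelling
    X = df.drop(columns=[label_col]).select_dtypes("number")
    X_scaled = StandardScaler().fit_transform(X)    # standard scaling, as described above
    return X_scaled, y.values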
B. Model training

The adversarial model is trained with an augmented dataset derived from X_attack; we direct the reader to the original paper [5] for further information. We split X_attack into training (0.8) and testing (0.2) portions. The training portion is then perturbed using LIME's perturbation technique to obtain an augmented set of samples (×10), which is used as the training dataset for the perturbation_clf. We use Random Forest classifiers for all the models except the rule-based one inside a(·). The innocuous_model training is carried out directly with X_attack (accuracy 0.98). A more accurate model raises fewer user concerns, making the attack more inconspicuous to detection approaches. Our model excels in this respect compared to the original scaffolding attack.
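A minimal sketch of the perturbation_clf training described above is shown below: real records are labelled 0, perturbed records 1, and the perturbed set is ten times larger. Approximating LIME's perturbation technique with Gaussian noise around the standardized records is a simplifying assumption, as are the helper name and hyperparameters.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def train_perturbation_clf(X_attack, n_aug=10, noise_std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X_train, X_test = train_test_split(X_attack, test_size=0.2, random_state=seed)
    # LIME-style perturbations approximated as Gaussian noise around the (standardized) records.
    X_pert = np.vstack([X_train + rng.normal(0.0, noise_std, X_train.shape)
                        for _ in range(n_aug)])
    X_all = np.vstack([X_train, X_pert])
    y_all = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_pert))])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_all, y_all)
    return clf, X_test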
C. Target feature selection

As shown in Figure 3, we take the SHAP explanations across a committee of candidate models to determine the most important feature. The domain knowledge filtering in Figure 3 will be analyzed in depth in future work. In the current form, we find β_j according to Step 1 of Algorithm 1, and the resulting feature is the target_ft. The control model is trained with the same sub-datasets and parameters as the adversarial model, but without the scaffolding.
D. Evaluation
TABLE II: Attacker's perspective on selecting the target feature for the 5GNIDD dataset

Model      | Top feature (m) | Shapley value (φ) | F1 score (µ1) | β_j (for µ1)
MLP        | Seq             | 0.1827            | 0.994         | 0.1816
SVM-linear | Seq             | 0.1930            | 0.993         | 0.1916
GNB        | Seq             | 0.0005            | 0.824         | 0.0004
RF         | Seq             | 0.2164            | 0.995         | 0.2153
KNN        | Seq             | 0.0030            | 0.996         | 0.0031
LSTM       | Seq             | 0.1443            | 0.991         | 0.1431
The target_ft selection was carried out separately for the two datasets. We generate SHAP explanations from MLP, SVM, GNB, RF, KNN, and LSTM models while keeping all external factors the same for both datasets. Figure 4 shows an example output of the top five ranking features obtained from the models for the NSL-KDD dataset. The service_http feature for NSL-KDD and the Seq feature for 5GNIDD appear to be the top features according to the calculated β_j scores. The Shapley value φ and F1 score (µ1) for each model trained with the 5GNIDD dataset are presented in Table II. Therefore, for an attacker, the service_http/Seq features are the most appealing target_fts.

The detection method was carried out on all four sub-datasets and the 5GNIDD dataset. Table III presents the performance of the models for DoS attack detection.
Fig. 4: The target_ft selection process takes the weighted feature importance values from the Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), Random Forest (RF), K-Nearest Neighbour (KNN), and Long Short-Term Memory (LSTM) models. Except for the GNB model, every other model finds the service_http feature to be the most important feature.
Fig. 5: Occurrences of each feature per explanation rank, with and without the attack: (a) 5GNIDD without scaffolding (most frequent features at ranks 1–3: Seq, Shutdown, nan); (b) 5GNIDD with scaffolding (unrelated_column_one, unrelated_column_one, lldp); (c) NSL-KDD without scaffolding (service_http, srv_diff_host_rate, root_shell); (d) NSL-KDD with scaffolding (unrelated_column_one, is_guest_login, tcp). The charts without the attack show the service feature (target_ft) at rank 1; in the presence of the attack, the unrelated feature appears at rank 1 more often.
The fidelity values are promising (0.6 even without tuning), representing a successful attack. Fidelity here is taken from the original scaffolding attack papers [5], [13] and measures the adversarial model's performance. Also, the high perturbation classifier accuracy (100%) ensures the attack's success. From the attacker's perspective, in Figure 5, the unrelated feature appears more often than the target feature among the top-ranking features of the LIME explanations for both datasets, successfully fooling the XAI method.

We also observed that the rule-based innocuous model's poor accuracy is a trade-off for emphasizing a specific feature in the explanations. Nonetheless, for scenarios demanding inconspicuous model behavior, a high-accuracy model is advisable.
TABLE III: Summary of detection results for DoS attacks

Model             | Attack type   | Fidelity | Classification accuracy | Hellinger distance
innocuous model   | DoS (5GNIDD)  | -        | 0.91                    | 0.47
adversarial model | DoS (5GNIDD)  | 0.60     | 0.61                    | 0.86
perturbation clf  | DoS (5GNIDD)  | -        | 1                       | -
control model     | DoS (5GNIDD)  | -        | 0.92                    | 0.46
innocuous model   | DoS (NSL-KDD) | -        | 0.97                    | 0.41
adversarial model | DoS (NSL-KDD) | 0.61     | 0.63                    | 0.86
perturbation clf  | DoS (NSL-KDD) | -        | 1                       | -
control model     | DoS (NSL-KDD) | -        | 0.98                    | 0.42
When the distance values of the adversarial model a(·) and the control model are compared, we observe a clear contrast across all the datasets. We iteratively tested different sets of X and X_g with both a(·) and the control model; the results are presented in Figure 6, which shows the Hellinger distances between the output distributions for X and X_g for each model. To understand the effect of the perturbations, we tested a(·) over different standard deviation values of the data perturbations.
Fig. 6: Regressed behavior of the Hellinger distance as a function of the perturbation standard deviation used for X_g, for different attack types: (a) 5GNIDD-DoS, (b) NSL-KDD-DoS, (c) NSL-KDD-Probe, (d) NSL-KDD-U2R, (e) NSL-KDD-R2L. The Hellinger distances of the two models show contrasting behavior: the distances from the adversarial model remain above the threshold for each dataset, thereby successfully enabling auditors to detect the adversarial nature of the model.
TABLE IV: Comparison of contributions (✓ = addressed, - = not addressed)

Contribution                                                                       | Ref [5] | Ref [12] | Ref [13] | Our work
Implementation of scaffolding attack with minimum requirements                    | ✓       | ✓        | ✓        | ✓
Analysis of the scaffolding attack in a NIDS setting                              | -       | -        | -        | ✓
Improving the attack with a high-accuracy innocuous model in the black-box system | -       | -        | -        | ✓
Implementation and evaluation of a detection method                               | -       | -        | ✓        | ✓
High-impact target feature selection method for attack-side improvement           | -       | -        | -        | ✓
Here, we can observe a clear distinction between the Hellinger distance values obtained from the control model outputs and those from the adversarial model. We have observed that thresholds for the DoS, Probe, U2R, and R2L attacks can be set at 0.63 (for both 5GNIDD and NSL-KDD), 0.33, 0.02, and 0.17, respectively.
Therefore, we can summarise the results as follows. A high perturbation_clf accuracy (100%) together with an explanation fidelity of 0.6 and above indicates a successful attack (see also Figure 5). We have also empirically shown that attack detection is possible using the computationally efficient Hellinger distance between the output distributions for X and X_g. The distance for the scaffolded model lies well above that of an attack-free model, revealing the malicious internals of the model. Therefore, as shown in Table IV, our contributions exceed the state of the art regarding computational efficiency and detection success.
V. CONCLUSION
In this paper, we show that the integrity of the explanations generated by XAI in the 5G network security domain can be compromised using scaffoldings. We propose a novel framework for adopting the scaffolding attack in a security context, with meaningful target feature selection and model training. We offer a mathematical formulation that combines XAI outputs with domain knowledge for target feature selection, thereby identifying the most important feature an attacker would suppress when scaffolding a model. We also demonstrate a successful detection method for the scaffolding attack in the context of network security; even with the improved attack version, the detection method still succeeds.

This work is part of ongoing research on improving the detection of, and defense against, scaffolding attacks in the context of network security. The proposed domain knowledge framework will be analyzed rigorously in future work. It would also be interesting to investigate how the threshold values hold across more network attack types, and to explore variations of the attack on other XAI techniques, such as gradient-based methods, in NIDSs.
REFERENCES
[1] S. Saha, A. T. Priyoti, A. Sharma, and A. Haque, "Towards an optimal feature selection method for AI-based DDoS detection system," in 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC). IEEE, 2022, pp. 425-428.
[2] O. Aouedi, K. Piamrat, G. Muller, and K. Singh, "FLUIDS: Federated learning with semi-supervised approach for intrusion detection system," in 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC). IEEE, 2022, pp. 523-524.
[3] J. Vincent, "OpenAI co-founder on company's past approach to openly sharing research: 'We were wrong'," 2023, accessed on May 15, 2023. [Online]. Available: https://www.theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview
[4] D. Leslie, "Understanding artificial intelligence ethics and safety: A guide for the responsible design and implementation of AI systems in the public sector," accessed on May 15, 2023. [Online]. Available: https://doi.org/10.5281/zenodo.3240529
[5] D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju, "Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods," in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020, pp. 180-186.
[6] N. Hopkins, "UK gathering secret intelligence via covert NSA operation," The Guardian, 2013.
[7] S. Niksefat, P. Kaghazgaran, and B. Sadeghiyan, "Privacy issues in intrusion detection systems: A taxonomy, survey and future directions," Computer Science Review, vol. 25, pp. 69-78, 2017.
[8] P. Barnard, N. Marchetti, and L. A. DaSilva, "Robust network intrusion detection through explainable artificial intelligence (XAI)," IEEE Networking Letters, vol. 4, no. 3, pp. 167-171, 2022.
[9] K. Fujita, T. Shibahara, D. Chiba, M. Akiyama, and M. Uchida, "Objection!: Identifying misclassified malicious activities with XAI," in ICC 2022 - IEEE International Conference on Communications. IEEE, 2022, pp. 2065-2070.
[10] S. Neupane, J. Ables, W. Anderson, S. Mittal, S. Rahimi, I. Banicescu, and M. Seale, "Explainable intrusion detection systems (X-IDS): A survey of current methods, challenges, and opportunities," arXiv preprint arXiv:2207.06236, 2022.
[11] S. Patil, V. Varadarajan, S. M. Mazhar, A. Sahibzada, N. Ahmed, O. Sinha, S. Kumar, K. Shaw, and K. Kotecha, "Explainable artificial intelligence for intrusion detection system," Electronics, vol. 11, no. 19, p. 3079, 2022.
[12] S. Saito, E. Chua, N. Capel, and R. Hu, "Improving LIME robustness with smarter locality sampling," arXiv preprint arXiv:2006.12302, 2020.
[13] Z. Carmichael and W. J. Scheirer, "Unfooling perturbation-based post hoc explainers," arXiv preprint arXiv:2205.14772, 2022.
[14] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why should I trust you?": Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135-1144.
[15] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," Advances in Neural Information Processing Systems, vol. 30, 2017.
[16] S. Samarakoon, Y. Siriwardhana, P. Porambage, M. Liyanage, S.-Y. Chang, J. Kim, J. Kim, and M. Ylianttila, "5G-NIDD: A comprehensive network intrusion detection dataset generated over 5G wireless network," arXiv preprint arXiv:2212.01298, 2022.
[17] "NSL-KDD dataset." [Online]. Available: https://www.unb.ca/cic/datasets/nsl.html
[18] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications. IEEE, 2009, pp. 1-6.
[19] N. Paulauskas and J. Auskalnis, "Analysis of data pre-processing influence on intrusion detection using NSL-KDD dataset," in 2017 Open Conference of Electrical, Electronic and Information Sciences (eStream). IEEE, 2017, pp. 1-5.
[20] P. S. Bhattacharjee, A. K. M. Fujail, and S. A. Begum, "Intrusion detection system for NSL-KDD data set using vectorised fitness function in genetic algorithm," Adv. Comput. Sci. Technol., vol. 10, no. 2, pp. 235-246, 2017.
Article
Monumental advancements in artificial intelligence (AI) have lured the interest of doctors, lenders, judges, and other professionals. While these high-stakes decision-makers are optimistic about the technology, those familiar with AI systems are wary about the lack of transparency of its decision-making processes. Perturbation-based post hoc explainers offer a model agnostic means of interpreting these systems while only requiring query-level access. However, recent work demonstrates that these explainers can be fooled adversarially. This discovery has adverse implications for auditors, regulators, and other sentinels. With this in mind, several natural questions arise - how can we audit these black box systems? And how can we ascertain that the auditee is complying with the audit in good faith? In this work, we rigorously formalize this problem and devise a defense against adversarial attacks on perturbation-based explainers. We propose algorithms for the detection (CAD-Detect) and defense (CAD-Defend) of these attacks, which are aided by our novel conditional anomaly detection approach, KNN-CAD. We demonstrate that our approach successfully detects whether a black box system adversarially conceals its decision-making process and mitigates the adversarial attack on real-world data for the prevalent explainers, LIME and SHAP. The code for this work is available at https://github.com/craymichael/unfooling.