
Network Information Security Monitoring Under Artificial Intelligence Environment

IGI Global Scientific Publishing
International Journal of Information Security and Privacy
Abstract

At present, new means of network attack emerge endlessly, and the technology for detecting network attacks must be constantly updated and developed. Based on this, the two stages of network attack detection (feature selection and traffic classification) are discussed. The improved bat algorithm (O-BA) and the improved random forest algorithm (O-RF) are proposed for optimization. Moreover, the network information security (NIS) system is designed based on the Agent concept. Finally, a simulation experiment is carried out on a real data platform. The results show that the detection precision, accuracy, recall, and F1 score of O-BA are significantly higher than those of the methods in references [17], [18], [19], and [20], while its false positive rate is significantly lower (P < 0.05). The detection precision, accuracy, recall, and F1 score of the O-RF algorithm are significantly higher than those of the compared algorithms, including Apriori, ID3, SVM, and NSA, while its false positive rate is significantly lower (P < 0.05).
International Journal of Information Security and Privacy
Volume 18 • Issue 1 • January-December 2024
DOI: 10.4018/IJISP.345038
This article published as an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and production in any medium,
provided the author of the original work and original publication source are properly credited.
Longfei Fu
Lanzhou Institute of Technology, China
Yibin Liu
Lanzhou Institute of Technology, China
Yanjun Zhang
Lanzhou Institute of Technology, China
Ming Li
Information and Communication Branch of State Grid Anhui Electric Power Co., Ltd., China
KEYWORDS
Bat Algorithm, Network Attack, Network Information Security, Random Forest Algorithm
With the rapid development of internet technology, network security issues are increasingly
prominent, and network information security (NIS) is facing enormous challenges. Various information
security incidents, including webpage tampering, computer viruses, illegal system intrusions, data
leaks, website fraud, service paralysis, and illegal exploitation of vulnerabilities, have brought
significant threats and losses to people. Therefore, exploring how to detect and defend against network
attacks has become an urgent problem to be solved.
Network attack detection is an important means of ensuring NIS. It includes two stages:
feature selection and traffic classification. In terms of feature selection, traditional algorithms have
lower processing efficiency for network data. Traffic classification is also limited by the complexity
of feature dimensions and classifiers. Therefore, how to improve the efficiency and accuracy of feature
selection and traffic classification is an important research task.
This article proposes an improved bat algorithm and an improved random forest (RF) algorithm for
optimizing network attack detection. The two algorithms are applied to feature selection and traffic
classification, respectively. Specifically, this article proposes an NIS system based on the improved
bat algorithm (O-BA) and the improved RF algorithm (O-RF), which can effectively detect and defend
against denial-of-service (DoS) attacks and probe attacks. To verify the performance of the
proposed algorithms and system, simulation experiments were conducted on a real data platform.
The experimental results show that the proposed algorithms and system can significantly improve the
effectiveness of network attack detection while maintaining good detection efficiency. Therefore, this work has
important practical significance for the detection of and defense against current network attacks.
LITERATURE REVIEW
NIS is a comprehensive discipline involving computer science, network technology,
communication technology, cryptography, information security technology, applied mathematics,
number theory, and information theory. It mainly means that information systems (including hardware,
software, data, humans, the physical environment, and infrastructure) are protected from damage,
change, and disclosure due to accidental or malicious reasons (Rzym et al., 2024). The system operates
continuously, reliably, and normally, and the information service is not interrupted, so that
business continuity is ultimately achieved. With the rapid development of internet technology and
the diversification of hacker attack methods, NIS has faced a huge threat in recent years. Information
security incidents such as web page tampering, computer viruses, illegal system intrusion, data
disclosure, website fraud, service paralysis, and illegal exploitation of vulnerabilities occur from time
to time (Andrade-Hoz et al., 2024). Therefore, how to detect and defend against network attacks has become
a topic of concern. Network attacks generally attack the system and resources by using loopholes and
security defects in the network information system (Yun et al., 2024).
Threats are mainly divided into man-made threats and natural threats. Natural threats come
from various natural disasters, harsh site environments, electromagnetic interference, natural aging
of network equipment, etc. Man-made threats are deliberate attacks on the NIS. By exploiting
weaknesses of the system, attackers achieve the destruction, deception, or theft of data and information
without authorization (Palma et al., 2024). In contrast to natural threats, many types of well-designed
man-made attacks are difficult to prevent. These are the attacks that prevention efforts should
focus on.
Network attack detection is the primary concern for NIS, and the resulting network attack detection
systems are diverse, such as open-source HIDS security, Snort, Huawei NIP series intrusion detection
system, Venustech IDS, and NSFOCUS NIDS, all with their own characteristics (Kong et al., 2024).
Although research on network attack detection has never stopped, it still shows deficiencies
in the face of equally endless attack methods. From the perspective of communication, any new
network information technology is bound to be accompanied by new attack modes and characteristics,
making it more difficult to automatically extract network attack characteristics, which results in the loss
of effectiveness of network attack detection technology through fixed rule matching (Casado-Vara et
al., 2024). Moreover, in a real environment, network attacks require a real-time response,
so there is not enough time for slow labeling of attack samples. Even when only a
small number of samples has been captured, the detection system needs to identify the intrusion accurately (Kan &
Fang, 2024). The emergence of new attack technologies greatly tests the real-time performance of
the system. In addition, artificial intelligence (AI) technology based on deep learning has developed
rapidly in recent years and has been applied to network attack technology by hackers. This has made
attack methods more and more intelligent, requiring the use of AI technology as part of the continuous
updating of defense technology (Hasas et al., 2024).
In summary, NIS has always been a topic of concern for scholars. The detection technology of
network attacks must be constantly updated and developed to better face the various network attack
methods, and the use of AI technology is a new development trend.
A covert channel is used to exfiltrate or infiltrate classified information from legitimate targets;
consequently, this manipulation violates network security policy and privacy. Cao, Pan, and Zou
(2024) suggest a novel hybrid covert channel detection system implementing two AI techniques,
fuzzy logic and genetic algorithm, to gain sufficient and optimal detection against covert channels.
The combination of these paradigms with embedded artificial intelligence in edge devices, or edge
AI, enables further improvements. Najafi Mohsenabad and Tut (2024) discuss the potential of an edge
AI-capable system architecture for the blockchain of things. Sami et al. (2024) present a smart and
secure framework for the hospital environment using the internet of things (IoT) and AI. This system
overcomes the drawbacks of the current system of hospital information such as inflexible modes of
networking, fixed points of information, and so on. The application of the agricultural IoT integrates
AI, IoT, blockchain, and virtual/augmented reality technologies. Ezhilarasi et al. (2023) investigate
the combination of edge computing and AI, blockchain, and virtual/augmented reality technology.
Intelligent video retrieval technology integrates video processing, computer vision, and AI, which
greatly improves the efficiency of monitoring and the accuracy and linkage of monitoring systems.
Based on deep learning theory and a face detection neural network, Wani et al. (2024) propose a
video-oriented cascaded intelligent face detection algorithm, which builds a deep learning network by
cascading multiple features, from edge features, contour features, local features to semantic features,
and advances layer by layer.
Multilayer in-band network telemetry and data analytics are the key techniques for monitoring
and troubleshooting backbone networks since they obtain real-time and fine-grained telemetry data
about the optical and IP layers and facilitate AI-assisted network automation. Li et al. (2024) propose
to realize multilayer network monitoring and data analytics over encrypted telemetry data and
demonstrate a privacy-preserving multilayer in-band network telemetry and data analytics system to
address the aforementioned issues. Though it is technically advanced, the health care information and
communication technology network's security is a significant challenge for health care. Tenepalli and
TM (2024) propose the IoT with AI system for healthcare security. Yun et al. (2024) introduce the
network security monitoring method based on deep learning. This method can detect network security
information attacks that cannot be found at the network level and improve the security performance
of network security. For security reasons, the deployed drones can sense and collect the data from
their surroundings, and then securely send the information to the ground station server. Ertem (2024)
proposes a novel AI-envisioned smart-contract-based blockchain-enabled security framework for
secure communication in IoT. It involves the use of information and communication technology and
other solutions in fault and intrusion detection, and mere monitoring of energy generation, transmission,
and distribution. Biehler et al. (2024) aim to present a comprehensive review of the next smart grid
research trends and technological background and discuss a futuristic next-generation smart grid
driven by AI and leveraged by IoT and 5G.
RELATED MATERIALS AND METHODS
Feature Selection
Network attack detection is one of the important means to ensure NIS, aiming to timely detect,
identify, and respond to various network attack behaviors, ensuring the normal operation of network
systems and the security of user data (Karthikeyan et al., 2024). With the rapid development of the
internet and the continuous evolution of hacker attacks, network attacks have become more and more
complex and hidden, posing a huge threat to network security.
Network attack detection mainly includes two key stages: feature selection and traffic
classification. In the feature selection stage, by analyzing the feature information in network traffic,
feature indicators related to attack behavior are extracted (Emil Selvan et al., 2024). These features can
include source IP address, destination IP address, port number, protocol type, etc. The goal of feature
selection is to select the most representative and discriminative features from massive network data,
to reduce the computational complexity of subsequent traffic classification and improve accuracy.
Feature selection is used to process high-dimensional data, reduce feature dimensionality, and improve
algorithm learning efficiency and classification accuracy. The process is shown in Figure 1, including
the generation process, evaluation function, termination rule, and inspection process.
In network attack detection, feature selection helps to extract the most relevant and discriminative
features for subsequent traffic classification tasks (Cao et al., 2024). The detailed process of feature
selection is as follows:
Targeting: Clarify the specific targets for network attack detection, such as detecting specific
types of attacks or abnormal behavior. This helps to define the types and ranges of features that
need to be selected.
Data collection and preparation: Collect and prepare network traffic data for feature selection.
This may include raw traffic data, packet capture data, or extracted network feature data. Ensure
that the dataset contains normal traffic and different types of attack samples.
Feature extraction and generation: Extract useful features from the original traffic data based on
the target and dataset. This can be based on traditional network traffic characteristics (such as
packet size, transmission protocol, source destination IP address, etc.) or more advanced features
(such as statistical features, time series features, spectral features, etc.).
Feature preprocessing: Preprocess the extracted features, including data cleaning, missing
value processing, and feature normalization or standardization. Ensure that the feature data is on the
same scale and eliminate the influence of outliers or noise on feature selection.
Feature selection method selection: Select an appropriate feature selection method based on the
size of the dataset, the number of features, and the target requirements. The commonly used
feature selection methods include filter methods (such as correlation coefficient, information gain,
chi-square test, etc.), wrapper methods (such as recursive feature elimination, genetic algorithm, etc.),
and embedded methods (such as L1 regularization, decision tree feature importance, etc.).
Feature evaluation: Use selected feature selection methods to evaluate and rank features, measuring
the correlation and discrimination between each feature and the target variable. This can be
achieved through indicators such as feature weight and feature importance.
Feature subset selection: Based on the evaluation results, select feature subsets with high
importance or discrimination. This can be done based on a set threshold or by selecting features
that rank higher.
Verify feature selection effectiveness: After feature selection, evaluate the model using validation
sets or cross-validation methods to ensure that the selected features are effective in improving
the overall model performance.
Figure 1. Schematic diagram of feature selection process
It should be noted that the specific process of feature selection may vary depending on factors such
as network attack types, dataset characteristics, and model selection (Wang et al., 2024). Therefore, in
practical applications, feature selection methods should be flexibly selected and adjusted according
to specific situations to obtain the best subset of features.
The feature selection method can be used alone or combined for comprehensive evaluation. In
practical applications, selecting appropriate feature selection methods based on specific datasets and
classification tasks is crucial (Nayomi et al., 2024). Meanwhile, the effectiveness of feature selection
can also be evaluated through techniques such as cross-validation, and further tuning and optimization
can be carried out. Here are some commonly used feature selection methods:
Information gain: Information gain is a commonly used feature selection method that evaluates
the importance of features by calculating their information gain for classification tasks. The
greater the information gain, the greater the contribution of this feature to the classification task.
Gini index: The Gini index measures the impurity of the data split by a feature and is used to evaluate
feature importance. The smaller the Gini index, the greater the contribution of this feature
to the classification task.
Pearson correlation coefficient: Pearson correlation coefficient is used to measure the linear
correlation between two variables and can be used to evaluate the degree of correlation between
features and classification objectives. The larger the absolute value of the correlation coefficient,
the stronger the correlation between the feature and the classification target.
Variance threshold: The variance selection method evaluates the importance of features by
calculating their variance. Features with low variance may have weaker discriminative ability
for classification tasks and can be eliminated.
Recursive feature elimination: Recursive feature elimination is an iterative feature selection
method that selects features by repeatedly training the model and removing the least important
features. After each iteration, the least important features will be deleted until the preset number
of features or specified stopping conditions are reached.
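Three of the filter-style scores listed above (information gain, Pearson correlation, and variance) can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the toy traffic records are hypothetical and are not taken from the article's experiments.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def variance(values):
    """Population variance of a numeric feature."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical toy records: label 1 = attack, 0 = normal.
protocol = ["tcp", "udp", "tcp", "udp", "tcp", "udp"]
constant_flag = [1, 1, 1, 1, 1, 1]
packet_size = [60, 1500, 64, 1400, 70, 1450]
labels = [0, 1, 0, 1, 0, 1]

print("information gain (protocol):", information_gain(protocol, labels))
print("variance (constant_flag):", variance(constant_flag))
print("pearson (packet_size, label):", round(pearson(packet_size, labels), 3))
```

In this toy data the constant flag has zero variance and would be dropped by a variance threshold, while the protocol and packet size features score highly and would be retained.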
Traffic Classification
In network attack detection, traffic classification is the next key step after feature selection, which
distinguishes specific types of network traffic from normal traffic or other types of attack traffic
(Al-Hawawreh et al., 2024). The following is a general traffic classification process:
1. Data preparation: Prepare training and testing datasets for traffic classification. These datasets
should include normal traffic and different types of attack traffic, and feature vectors should be
extracted based on the feature selection results.
2. Model selection: Select an appropriate machine learning model for traffic classification based
on task requirements and dataset characteristics. Common classification models include support
vector machines (SVM), decision trees, RFs, neural networks, etc.
3. Feature normalization: Normalize or standardize feature vectors to ensure that features are
on the same scale and to eliminate the influence of outliers or noise on the model.
4. Model training: Train the selected model using a training dataset to learn how to classify different
types of traffic. Cross-validation or grid search methods can be used to optimize the parameter
settings of the model.
5. Model evaluation: Evaluate the model using a test dataset to measure its classification
performance. The evaluation indicators include accuracy, recall, F1 score, etc.
6. Model optimization: Based on the evaluation results, optimize and improve the model. For
example, increasing the size of the training dataset, adjusting model parameters, or using ensemble
learning methods can improve classification performance.
7. New data classification: Use optimized models to classify new traffic data. This may include
real-time traffic data or historical data that has already been recorded. Compare new data with
known data to detect anomalies or attack traffic.
It should be noted that the specific process of traffic classification may vary depending on factors
such as network attack types, dataset characteristics, and model selection (Siddamsetti et al., 2024).
Therefore, in practical applications, flexible selection and adjustment of traffic classification methods
should be made according to specific situations to achieve the best classification effect.
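The model evaluation step can be made concrete with a small sketch that computes accuracy, precision, recall, F1 score, and false positive rate from predicted and true labels. The function name and the example label vectors are hypothetical; attack traffic is assumed to be the positive class.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Binary detection metrics, treating `positive` as the attack class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Hypothetical labels: 1 = attack traffic, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```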
In the traffic classification stage, network traffic data is classified using algorithms such as
machine learning and deep learning to determine whether it belongs to normal traffic or attack traffic.
Common classification algorithms include SVM, decision tree, Naive Bayes, etc. These algorithms
can be trained and learned based on the feature indicators obtained during the feature selection stage,
establish classification models, and achieve network attack detection by classifying and predicting
new network traffic data. Here are some commonly used traffic classification methods and related
information:
SVM: SVM is a widely used supervised learning algorithm for traffic classification. It divides data
points into different categories by constructing an optimal hyperplane to achieve classification
objectives. SVM has good performance in handling high-dimensional data and nonlinear
problems.
Decision tree: A decision tree is a common traffic classification algorithm that makes classification
decisions by constructing a tree-like structure. It gradually assigns traffic data to different
categories through a series of feature tests. The decision tree algorithm is simple
and intuitive, and it is easy to explain and understand.
Naive Bayes: Naive Bayes is a traffic classification algorithm based on probability and statistics.
It is based on Bayes' theorem and the assumption of conditional independence between features, and it
classifies by calculating posterior probabilities. The Naive Bayes algorithm is computationally fast and is
suitable for large-scale datasets.
Deep learning methods: In recent years, deep learning has been widely applied in traffic
classification. For example, deep learning models such as convolutional neural networks (CNN)
and recurrent neural networks (RNN) can extract high-level features from raw network traffic
data and achieve accurate classification.
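As an illustration of the Naive Bayes approach described above, the following is a minimal sketch of a Gaussian Naive Bayes classifier. The Gaussian likelihood is an assumption made here for numeric features; the article does not specify a variant. The class name, the two features (packets per second and mean packet size), and the sample values are all hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (assumed variant)."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.priors, self.stats = {}, {}
        for label, members in grouped.items():
            self.priors[label] = len(members) / len(rows)
            self.stats[label] = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                # Small constant avoids division by zero for constant features.
                var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
                self.stats[label].append((mean, var))
        return self

    def log_posterior(self, row, label):
        """Unnormalized log posterior: log prior plus per-feature log likelihoods."""
        score = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score

    def predict(self, row):
        return max(self.priors, key=lambda label: self.log_posterior(row, label))


# Hypothetical samples: (packets per second, mean packet size in bytes).
rows = [(10, 500), (12, 520), (9, 480), (900, 60), (950, 64), (1000, 70)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = GaussianNaiveBayes().fit(rows, labels)
print(model.predict((11, 510)), model.predict((970, 62)))
```

Working in log space keeps the product of many small likelihoods numerically stable, which matters once the feature vector grows beyond a few dimensions.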
Optimization of Feature Selection Based on Bat Algorithm
The bat algorithm is a heuristic optimization algorithm inspired by the foraging behavior of
bats. It is widely used to solve different types of optimization problems. The basic idea of the
bat algorithm is to simulate the behavior of bats during their foraging process. Bats perceive the
surrounding environment by emitting ultrasound and navigate based on the position and distance
of the target. Each bat in the algorithm represents a candidate solution, which searches and adjusts
based on the information of the current optimal solution and its characteristics. The following is the
general process of the bat algorithm:
1. Initialize bat population: Randomly generate an initial bat population, with each bat representing
a candidate solution and initializing its position and velocity.
2. Calculate fitness value: Calculate the fitness value of each bat based on the fitness function of
the problem to measure the quality of its solution.
3. Update position and velocity: Based on the information of the current optimal solution and its
characteristics, update the position and velocity of each bat. This can be achieved by adjusting the
frequency, loudness, and velocity of bats.
4. Determine the loudness value: Determine whether to emit ultrasonic signals based on the bat's
loudness value. A higher loudness value means that bats are more likely to search for better
solutions.
5. Adjust position: When bats emit ultrasonic signals, they randomly adjust their position within
a certain range to explore new solution spaces.
6. Update optimal solution: Update the global optimal solution based on the comparison of the
current fitness values.
7. Termination condition judgment: Determine whether the algorithm's termination
condition is met, e.g., reaching the maximum number of iterations or
fitness values that meet preset requirements.
8. Repeat steps 3 to 7 until the termination condition is met.
It should be noted that the bat algorithm can be improved and adjusted according to specific
problems, such as introducing adaptive parameters and combining multiple strategies. In addition,
bat algorithms are commonly used to solve continuous optimization problems, but they can also
be applied to discrete optimization problems by defining the search space and fitness function of
the variables accordingly. The bat algorithm has some advantages in the feature selection stage of
network attack detection:
Global search capability: The bat algorithm can conduct global searches by simulating the foraging
behavior of bats, thereby better exploring potential information in the feature space. This enables
it to find a better subset of features during the feature selection stage.
Adaptability: The bat algorithm has adaptability, which means it can adjust the search strategy
based on the current search state and the information of the optimal solution. In the feature
selection process, the bat algorithm can flexibly adjust the position and speed of bats based on
the importance and correlation of features, to better search for the optimal feature subset.
Diversity maintenance: The bat algorithm introduces randomness and diversity maintenance
mechanisms to avoid getting stuck in local optima. This is crucial for feature selection, as the
combination of feature subsets may lead to different performance outcomes. The bat algorithm
can better explore different combinations in the feature space by maintaining diversity.
Efficiency: Bat algorithms typically have fast convergence speed and high search efficiency. This
is very beneficial in the feature selection stage, as the feature space is usually large and requires
a significant amount of computational resources and time. The bat algorithm can accelerate the
feature selection process through effective search strategies.
Explainability: The working principle of the bat algorithm is relatively intuitive and understandable,
making it easy to explain. This is crucial for the feature selection stage of network attack detection,
as the selected subset of features needs to have a certain degree of interpretability to further
analyze and understand network attack behavior.
The bat algorithm also faces some challenges and limitations in the feature selection stage of
network attack detection, such as parameter settings, convergence, and local optima. Therefore, in
practical applications, adjustments and optimizations need to be made based on specific problems
to fully leverage the advantages of bat algorithms in feature selection.
The bat algorithm is adopted to optimize feature selection. The position coordinates of each bat
in the bat algorithm are the solution vectors of the optimization problem, and the optimal solution
can be found through the change of bat position. If the spatial position of the $i$th bat is $x_i$, the velocity
is $s_i$, and the frequency of the ultrasonic wave emitted by the bat is $k_i$, then at the $z$th iteration, these
internal variables can be updated as shown in equations 1–3.

$$k_i = k_{\min} + (k_{\max} - k_{\min}) \times \alpha \tag{1}$$

$$s_{ij}^{z} = s_{ij}^{z-1} + (x_{ij}^{z} - x^{*}) \times k_i \tag{2}$$

$$x_{ij}^{z} = x_{ij}^{z-1} + s_{ij}^{z} \tag{3}$$

where $\alpha \in [0, 1]$, $k_{\min}$ is the minimum value of $k$, $k_{\max}$ is the maximum value of $k$, $s_{ij}^{z}$ represents
the velocity of bat individual $i$ at iteration $z$, $s_{ij}^{z-1}$ represents the velocity of bat individual $i$ at iteration $z-1$, and
$x^{*}$ represents the optimal solution for the population. Then, the individual bat makes a local search
through a random walk to obtain a new solution, as shown in equation 4.
$$x_1 = x_0 + \varphi \times B_{*}^{z} \tag{4}$$

where $x_0$ represents the original position of the bat, and $B_{*}^{z}$ represents the average loudness of the
bat population at the $z$th iteration. The inherent loudness and pulse emissivity of each bat are generated
randomly. If the bat reaches the target, its inherent loudness will decrease and the pulse emissivity
will increase, as shown in equations 5 and 6.
$$B_i^{z} = \mu \times B_i^{z-1} \tag{5}$$

$$D_i^{z} = D_i^{0} \times \left[1 - d^{\rho z}\right] \tag{6}$$

where $B_i^{z}$ represents the loudness of the individual bat at iteration $z$, $D_i^{z}$ represents the pulse emissivity, $B_i^{z-1}$ represents
the loudness of the individual bat at iteration $z-1$, and $D_i^{0}$ represents the pulse emissivity of the individual bat
at the initial time.
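The update rules in equations 1–6 can be sketched as a short loop. This is a minimal illustration of the basic bat algorithm, not the article's O-BA implementation. Several choices are assumptions made here: the fitness is minimized, φ is drawn uniformly from [-1, 1], the random walk of equation 4 is taken around the current best solution (as in the standard bat algorithm), 0 < d < 1 so that the pulse emissivity grows toward its initial limit, and all parameter values, bounds, and the sphere test function are hypothetical.

```python
import random


def sphere(x):
    """Hypothetical test fitness (to be minimized): sum of squares."""
    return sum(v * v for v in x)


def bat_algorithm(fitness, dim, n_bats=20, iterations=200, k_min=0.0, k_max=2.0,
                  mu=0.9, d=0.9, rho=1.0, bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return max(lo, min(hi, v))

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    loud = [rng.uniform(1.0, 2.0) for _ in range(n_bats)]   # B_i, random initial loudness
    rate0 = [rng.uniform(0.0, 1.0) for _ in range(n_bats)]  # D_i^0, random initial emissivity
    rate = [0.0] * n_bats                                   # D_i^z
    fit = [fitness(p) for p in pos]
    best_index = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[best_index][:], fit[best_index]

    for z in range(1, iterations + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            k_i = k_min + (k_max - k_min) * rng.random()                           # eq. 1
            vel[i] = [s + (x - b) * k_i for s, x, b in zip(vel[i], pos[i], best)]  # eq. 2
            cand = [clip(x + s) for x, s in zip(pos[i], vel[i])]                   # eq. 3
            if rng.random() > rate[i]:
                # eq. 4: local random walk scaled by the average loudness
                cand = [clip(b + rng.uniform(-1.0, 1.0) * mean_loud) for b in best]
            cand_fit = fitness(cand)
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= mu                                # eq. 5: loudness decreases
                rate[i] = rate0[i] * (1 - d ** (rho * z))    # eq. 6: emissivity increases
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


best, best_fit = bat_algorithm(sphere, dim=2)
print(best, best_fit)
```

For feature selection, the continuous position would additionally have to be mapped to a feature subset (for example by thresholding each coordinate), and the fitness would be a classifier-based score; both are outside this sketch.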
It is assumed that the number of individuals in the bat population is M, and the k-means algorithm
is introduced to divide the population into multiple subgroups. All bat individuals will be regrouped to
form a new subgroup after each iteration. It is supposed that each new subgroup is $c$, $c = 1, 2, 3, \ldots, m$,
the position of the local superior individual is $H_c$, and the best position of the whole bat population is
W. Then, when the new subgroup division rule is adopted, the local optimal individual of each new
subgroup is expressed as shown in equations 7–9.
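$$s_{ij}^{z} = \delta^{z} \times s_{ij}^{z-1} + (x_{ij}^{z-1} - H_c^{z-1}) \times k_i + (x_{ij}^{z-1} - V_i^{z-1}) \times \gamma^{z} \tag{7}$$

$$\delta^{z} = \delta_{\max} - \frac{(\delta_{\max} - \delta_{\min}) \times z}{T_z} \tag{8}$$

$$\gamma^{z} = \gamma_{\min} + (\gamma_{\max} - \gamma_{\min}) \times \left(1 - \frac{\arccos\left\{-\frac{2z}{Q_i} + 1\right\}}{\pi}\right) \tag{9}$$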
where $V_i$ represents the optimal position reached by each bat, $\delta^{z}$ represents the inertia weight
coefficient of each bat at the $z$th iteration, $\gamma^{z}$ represents the self-learning coefficient of each bat at the $z$th
iteration, and $Q_i$ is the number of iterations. Therefore, by introducing the inertia weight coefficient,
which decreases as the number of iterations increases, the bat individual has a strong global search
ability in the early stage, and the later reduction in inertia is conducive to local search, thus accelerating
the convergence speed of the model.
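The coefficient schedules and the velocity update of equations 7–9 can be sketched as follows. This is an illustrative reading rather than the authors' code: the arccos argument in equation 9 is taken as 1 - 2z/Q so that it stays within [-1, 1], a single parameter `total` is assumed to play the role of both denominators in equations 8 and 9, and the default minimum and maximum values are hypothetical.

```python
import math


def inertia_weight(z, total, delta_min=0.4, delta_max=0.9):
    """Equation 8: inertia weight decreasing linearly from delta_max to delta_min."""
    return delta_max - (delta_max - delta_min) * z / total


def self_learning(z, total, gamma_min=0.5, gamma_max=2.5):
    """Equation 9, with the arccos argument read as 1 - 2z/total (an assumption)."""
    return gamma_min + (gamma_max - gamma_min) * (1 - math.acos(1 - 2 * z / total) / math.pi)


def velocity_update(s_prev, x_prev, h_c, v_i, k_i, delta, gamma):
    """Equation 7 for one bat.

    s_prev, x_prev: previous velocity and position of the bat
    h_c: position of the best individual in the bat's subgroup
    v_i: best position reached so far by this bat
    """
    return [
        delta * s + (x - h) * k_i + (x - v) * gamma
        for s, x, h, v in zip(s_prev, x_prev, h_c, v_i)
    ]


for z in (0, 50, 100):
    print(z, round(inertia_weight(z, 100), 3), round(self_learning(z, 100), 3))
```

Under this reading both coefficients fall monotonically over the run, which matches the text's description of strong global search early and finer local search later.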
Optimization of Traffic Classification Based on RF Algorithm
RF is an ensemble learning method based on decision trees, commonly used for machine learning
tasks such as traffic classification. It achieves more accurate and robust classification by constructing
multiple decision trees and synthesizing their results. Currently, RF is a common strong classifier.
The main characteristics of RFs in traffic classification are as follows:
Feature randomness: When constructing each decision tree, an RF randomly selects a subset of
features for training. This feature randomness helps to reduce overfitting and increase the model's
generalization ability.
Sample randomness: RFs also perform bootstrap sampling (sampling with replacement) on the samples
used to train each decision tree. This can increase the diversity of the model, because each
tree is trained without some of the samples.
Multiple decision trees: RFs construct multiple decision trees, each using a different subset of
features and samples, and then integrate their classification results. The final classification result
is obtained by voting or averaging multiple decision trees.
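The three characteristics above can be illustrated with a small standard-library sketch. It does not build decision trees; it only shows the sampling and voting steps, and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Sample randomness: draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, k, rng):
    """Feature randomness: pick k distinct feature indices for one tree."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Multiple trees: combine per-tree class labels into one final label."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
rows = bootstrap_indices(10, rng)          # some rows repeat, some are left out
features = random_feature_subset(41, 6, rng)  # 6 of 41 features for this tree
label = majority_vote(["dos", "normal", "dos"])
```

Because the indices are drawn with replacement, each tree typically sees some rows more than once and misses others, which is what gives the trees their diversity.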
In addition, the RF method has significant advantages in traffic classification, mainly as follows:
High accuracy: RFs classify by combining the prediction results of multiple decision trees.
Because each decision tree is independently generated, they can mutually correct errors and
compensate for deficiencies, thereby improving overall classification accuracy.
Robustness: RFs have good robustness against noise and missing data. They can handle traffic
data with incomplete features or containing noise, and can effectively handle outliers.
Processing high-dimensional features: RFs can handle datasets containing a large number of
features without the need for feature selection or dimensionality reduction. They can automatically
select useful features and filter out irrelevant or redundant features.
Interpretability: Compared to other complex machine learning algorithms, the results of RFs are
easier to interpret. Since the rules and judgment process of each decision tree are interpretable,
the contribution of each feature to the final classification result can be understood.
Anti-overfitting: By randomly selecting samples and features for training, RFs have good
anti-overfitting ability. They can reduce overfitting to the training data and improve generalization
performance on new data.
Parallelization processing: Decision trees in RFs can be generated in parallel, thus having high
computational efficiency when trained on large-scale datasets. This makes RFs a feasible choice
for processing large amounts of traffic data.
It should be noted that RFs also have some limitations. Although each decision tree is generated
independently, the trees may still be correlated with one another. In addition, RFs may be
affected by class imbalance and noisy data in certain situations. Therefore, when applying RF to
traffic classification, parameter optimization and model evaluation need to be carried out according
to the specific situation.
An RF algorithm is introduced to optimize the classification process. The original RF algorithm is
good at processing high-dimensional data sets but poor at processing unbalanced data sets. Therefore,
the algorithm is optimized by initializing sample weights to build O-RF. Let the total number
of samples in the data set be M. Conventionally, every weight is initialized to the equal value
$\frac{1}{M}$, so that each sample has the same probability of being selected. To balance the data set
samples, the weights are instead initialized to 0.25, 0.2, 0.1, 0.3, and 0.15 according to the
distribution of each category in the original data set, raising the weights of the minority categories.
Therefore, before the first tree is generated, the weight of each sample can be expressed as shown in
equation 10.
$$\delta_{0,i} = \frac{\delta_{j}}{M_{i}} \tag{10}$$
$\delta_j$ represents the weight of class j, and $\delta_{0,i}$ indicates the probability of selecting
a given sample from the M samples. This optimization not only raises the selection weight of
minority-class samples but also prevents majority-class samples from being selected many times.
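A minimal sketch of this initialization follows. It rests on one interpretive assumption that the article does not spell out: the denominator in equation 10 is read as the number of samples that share the sample's class, so that each class receives its class weight in total. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def initial_sample_weights(labels, class_weights):
    """Per-sample starting weight: class weight divided by class size.

    Assumption: the denominator of equation 10 is the size of the
    sample's own class, so the weights of each class sum to its class weight.
    """
    counts = Counter(labels)
    return [class_weights[y] / counts[y] for y in labels]


# Toy example: class "b" is rare but gets the same total weight as class "a".
labels = ["a", "a", "a", "b"]
weights = initial_sample_weights(labels, {"a": 0.5, "b": 0.5})
```

Under this reading, the single minority sample is three times as likely to be drawn as any one majority sample, which is the balancing effect the text describes.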
After each tree is constructed, the weight of each sample is modified according to the
classification results. If a sample is classified correctly, its weight is reduced; otherwise, its
weight is increased. Suppose the number of trees in the forest is G, the error of the g-th tree is
$E_g$, the calculated precision is $A_g$, and the scaling factor is $I_g$, as shown
in equations 11–13.
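$$A_{g} = \frac{1}{2} \times \ln\frac{1 - E_{g}}{E_{g}} \tag{11}$$

$$\delta_{g+1,i} = \frac{\delta_{g,i}}{I_{g}} \times \alpha \times e^{\pm A_{g}} \tag{12}$$

$$I_{g} = \sum_{i=1}^{M} \delta_{g,i} \times \alpha \times e^{\pm A_{g}} \tag{13}$$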
$e^{\pm A_g}$ is the adjustment factor applied when the sample weight is updated, with the negative
sign for correctly classified samples and the positive sign for misclassified ones. With this method,
for minority-class samples the weight of misclassified samples is greatly increased and the weight of
correctly classified samples is slightly reduced. For majority-class samples, the weight of correctly
classified samples is greatly reduced and the weight of misclassified samples is only slightly
increased.
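The update in equations 11–13 can be sketched as follows. The function names are hypothetical, the error is assumed to satisfy 0 < E_g < 0.5 so that A_g is positive, and because the article does not define α it is treated here as a plain constant multiplier (an assumption); a constant α cancels out when the weights are normalized.

```python
import math


def tree_coefficient(error):
    """Equation 11: A_g = 0.5 * ln((1 - E_g) / E_g), for 0 < E_g < 1."""
    return 0.5 * math.log((1.0 - error) / error)


def update_weights(weights, correct, error, alpha=1.0):
    """Equations 12-13: rescale each weight, then normalize by I_g.

    weights: current per-sample weights.
    correct: booleans, True where the g-th tree classified the sample correctly.
    alpha:   constant multiplier (assumed; not defined in the article).
    """
    a_g = tree_coefficient(error)
    raw = [
        w * alpha * math.exp(-a_g if ok else a_g)  # shrink if correct, grow if wrong
        for w, ok in zip(weights, correct)
    ]
    i_g = sum(raw)  # scaling factor I_g, so the new weights sum to 1
    return [r / i_g for r in raw]


new_weights = update_weights([0.25] * 4, [True, True, True, False], error=0.25)
```

In this toy run one of four equally weighted samples is misclassified; after the update it carries half of the total weight, so the next tree is far more likely to draw it.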
NIS System
The NIS is used for managing the authentication and authorization of network users. The NIS
system is mainly used to ensure the security of network resources and protect the personal privacy
of users. NIS systems typically include the following functions:
User authentication: The NIS system verifies the legitimacy of a user by verifying their identity
information, such as username and password. This can prevent unauthorized users from accessing
system resources.
User authorization: Once the user's identity is verified, the NIS system will grant the user access
to specific resources or functions based on their permission settings and role assignments. This
ensures that users can only access the resources they need, improving the security of the system.
Identity management: The NIS system maintains user identity information and related attributes,
such as name, contact information, department, etc. This helps organizations manage and track
users and provides corresponding audit functions.
Session management: The NIS system can track and manage the user's session status, including
login and logout operations, as well as session expiration and renewal. This helps to maintain
the stability and security of the system.
Audit and logging: The NIS system can record user behavior and system events, generating
corresponding audit logs. This helps to monitor and analyze the usage of the system and conduct
investigations and traceability when needed.
Scalability and integration: NIS systems should have good scalability and be able to adapt to
network environments of different scales and complexities. In addition, they should be able to
integrate with other systems, such as identity providers and access control systems.
The NIS system plays an important role in enterprises, organizations, and large network
environments, effectively managing and protecting network resources, and preventing unauthorized
access and abuse.
The system design follows these principles: following the agent concept, it can automatically
complete the complex tasks involved in applying cryptography and improve openness and interactivity;
it is compatible with the public key infrastructure system; it establishes a personal information
security system; it conforms to the X.509 V3 certificate standard; it builds the relational model of
the system using the Unified Modeling Language; it is compatible with commonly used encryption
models; and it supports various operating environments, such as Windows, Linux, and macOS.
According to the above design principles, an NIS system composed of an interface, agent
control, and application service modules is designed (see Figure 2). The interface module mainly
includes two parts: the graphical user layer and the implementation layer, in which the graphical user
layer can display the user interface. The implementation layer can realize static web applications
(certificate establishment, file encryption and decryption, signature verification, etc.) and dynamic
web applications (human-computer interaction, file signing, etc.). The agent control module is the
core module of the system. It can interact with other modules by interpreting the script language
and save and restore the progress status of the task at the same time. The application service module
mainly performs communication services, file access, cryptographic function encapsulation, etc.
The workflow of the system is shown in Figure 3. First, the user enters a username and
password to confirm their identity. On first login, the user can initialize the password
and set a new one. The password is processed by a one-way hash algorithm and the result is stored
after confirmation. If the password is entered incorrectly more than three times, the system exits.
After the password is entered correctly, the system displays the main interface window, including
various items (communication settings, interaction settings, display settings, event list, etc.). Then,
the system reads the unprocessed agent events and uses the agent control module to interpret the
script language and start the corresponding threads to execute these tasks. The system fulfills the
user's requests according to the user's clicks, such as creating personal
certificates, file encryption, certificate application downloading, and system settings.
Figure 2. Structure diagram of NIS system
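The login step of this workflow can be sketched with Python's standard library. The article only says that a one-way hash is used; the choice of salted PBKDF2-HMAC-SHA256, the iteration count, the function names, and the reading of the limit as three attempts are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

MAX_ATTEMPTS = 3  # assumed reading of "more than three times"


def hash_password(password, salt):
    """One-way hash of the password (PBKDF2 is an assumed choice)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def login(stored_hash, salt, attempts, max_attempts=MAX_ATTEMPTS):
    """Return True if one of the first max_attempts entries matches.

    attempts is an iterable of candidate passwords in the order typed.
    """
    for count, candidate in enumerate(attempts, start=1):
        if count > max_attempts:
            break  # too many wrong entries: the system exits
        # Constant-time comparison avoids leaking how many bytes matched.
        if hmac.compare_digest(hash_password(candidate, salt), stored_hash):
            return True
    return False


salt = os.urandom(16)
stored = hash_password("s3cret", salt)  # only the hash is stored, never the password
```

Only the salt and the hash need to be kept; the plaintext password can be discarded as soon as it has been confirmed.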
This experiment is conducted on a real data platform. The hardware is an Intel Core
i7-6700HQ processor (6M cache, up to 3.50 GHz), 64 GB of memory, and an NVIDIA GeForce
RTX 2080 graphics card. The software environment comprises the Ubuntu 16.04 open-source operating
system, the Java language, Windows 10, CUDA 8.0 + cuDNN 6.0, and the TensorFlow machine learning
platform.
This article uses the KDD99 dataset for simulation experiments, which is a common benchmark for
network attack detection. The KDD99 dataset is a nine-week network connection data set collected
from simulated US Air Force LANs, widely used in research on network intrusion detection algorithms
and systems. The KDD99 dataset contains labeled training data and unlabeled test data. The training
dataset consists of 4,898,431 network connections and contains 41 features, including basic network
connection-related information (such as source IP address, destination IP address, source port,
destination port, etc.) and statistical information related to network connections. Each network
connection is marked either as normal or as a specific type of attack, giving 23 label types in total
(normal plus 22 attack types) in four attack categories: DOS, remote to local (R2L), user to root
(U2R), and probing. The
test dataset contains 3,542,645 network connections and has the same characteristic values. However,
the test dataset does not have clear labels, so it is necessary to use the label information from the
training dataset to evaluate the performance of network attack detection algorithms. Using the KDD99
dataset for simulation experiments can help researchers and engineers evaluate performance indicators
such as accuracy, recall, and false alarm rates of different network intrusion detection algorithms
and systems. The test data and training data have different probability distributions. The test data
contains some attack types that do not appear in the training data, which makes intrusion detection
more realistic. The training data set contains one normal type and 22 attack types.
Figure 3. Workflow diagram of NIS system
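For readers unfamiliar with the dataset, the sketch below groups the 22 attack labels found in the KDD99 training data into the four categories used in this article. The helper name `categorize` is hypothetical; labels in the raw files end with a period, which is stripped here, and any label not in the table (for example an attack that appears only in the test data) is reported as "unknown".

```python
# The 22 attack labels of the KDD99 training data, grouped by category.
ATTACK_CATEGORY = {
    "back": "DOS", "land": "DOS", "neptune": "DOS",
    "pod": "DOS", "smurf": "DOS", "teardrop": "DOS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
}


def categorize(label):
    """Map a raw KDD99 label such as 'smurf.' to its traffic category."""
    name = label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")
```

Mapping the fine-grained labels to five classes (normal plus four attack categories) is what makes the class imbalance visible: U2R and R2L together account for only a small fraction of the connections.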
The O-BA algorithm and O-RF algorithm are both algorithms for network intrusion detection
based on the KDD99 dataset.
1. O-BA algorithm: The O-BA algorithm is a network intrusion detection algorithm based on
anomaly detection. This algorithm establishes a behavior model for normal network connections
and uses anomaly detection techniques to detect network connections that do not match normal
behavior. The O-BA algorithm first extracts features from the KDD99 dataset and uses clustering
algorithms to divide network connections into different clusters. It then performs anomaly detection
on each cluster, identifies abnormal network connections, and marks them as possible intrusion
behaviors.
2. O-RF algorithm: The O-RF algorithm is also a network intrusion detection algorithm based on
anomaly detection, which combines the RF algorithm and anomaly detection technology. The
O-RF algorithm first extracts features from the KDD99 dataset and constructs a classification
model using the RF algorithm. It then uses the classification model to classify network connections
and calculates an anomaly score for each network connection. Network connections whose anomaly
score exceeds the preset threshold are marked as potential intrusion behaviors.
Both of these algorithms are based on the principle of anomaly detection, which extracts features
from network connections and builds models to identify network connections that are inconsistent
with normal behavior, thus conducting intrusion detection. The O-BA algorithm uses clustering
algorithms to construct behavioral models, while the O-RF algorithm uses RF algorithms to construct
classification models.
Precision (represented by PR), accuracy (represented by AC), recall (represented by RE), F1
score, and false positive (FP) rate are used to evaluate the detection performance of the proposed
algorithm and system. They are defined in equations 14–18.

$$PR = \frac{TP}{TP + FP} \tag{14}$$

$$AC = \frac{TP + TN}{TP + TN + FN + FP} \tag{15}$$

$$RE = \frac{TP}{TP + FN} \tag{16}$$

$$F1 = \frac{2}{\frac{1}{PR} + \frac{1}{RE}} \tag{17}$$

$$FPR = \frac{FP}{FP + TN} \tag{18}$$
TP refers to attack traffic correctly detected as attack, TN refers to normal traffic correctly
classified as normal, FP refers to normal traffic wrongly classified as attack, and FN refers to
attack traffic that goes undetected.
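As a worked illustration, the five indicators can be computed directly from the four confusion-matrix counts. The function name and the example counts are hypothetical, and F1 is written in its standard harmonic-mean form, 2 × PR × RE / (PR + RE).

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute precision, accuracy, recall, F1, and FP rate from raw counts.

    Assumes every denominator is non-zero (at least one predicted attack,
    one actual attack, and one actual normal connection).
    """
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return {"PR": precision, "AC": accuracy, "RE": recall, "F1": f1, "FPR": fpr}


# Hypothetical counts: 100 attack and 100 normal connections.
m = detection_metrics(tp=80, tn=90, fp=10, fn=20)
```

For these counts the accuracy is 170/200 = 0.85, the recall is 80/100 = 0.8, and the FP rate is 10/100 = 0.1, which shows how a detector can look accurate overall while still missing a fifth of the attacks.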
RESULTS AND ANALYSIS
Analysis of Experimental Results
Figure 4 shows the comparison of detection performance indicators between the Apriori algorithm,
ID3 algorithm, SVM algorithm, NSA algorithm, and O-BA algorithm. The detection precision,
accuracy, recall, and F1 score of the O-BA algorithm are significantly higher than those of the above
references, and the difference is statistically significant (P < 0.05). The FP rate of the O-BA algorithm
is significantly lower than that of the above references, and the difference has statistical significance
(P < 0.05). Precision, accuracy, recall, F1 score, and FP rate are common indicators for evaluating
the algorithm, which shows that the traffic characteristics selected by the improved O-BA algorithm
can more effectively distinguish between normal traffic and attack traffic, improve classification
accuracy, and reduce classification errors. The main reasons why the O-BA algorithm can improve
classification accuracy are as follows:
Established a normal network connection behavior model: Before conducting anomaly detection,
the O-BA algorithm first groups network connections through clustering algorithms and
establishes a normal network connection behavior model. This approach can better capture the
characteristics and behavioral patterns of normal network connections, thereby reducing FPs
and false negatives.
Strong anomaly detection capability: The O-BA algorithm adopts an anomaly detection-based
method, which can identify network connections that are inconsistent with normal behavior.
Compared to traditional rule-based or signature-based methods, anomaly detection methods
can capture novel intrusion behaviors and unknown attacks, and have better adaptability and
generalization ability.
Considering the contextual information of network connections: When performing anomaly
detection, the O-BA algorithm takes into account the contextual information of network
connections comprehensively, rather than being limited to the characteristics of individual network
connections. This can comprehensively evaluate the degree of network connectivity anomalies
and improve classification accuracy.
Adaptive update model: The O-BA algorithm can adaptively update the model according to
changes in the actual network environment. By monitoring and learning changes in the behavior
patterns of network connections, it updates the model promptly to adapt to new intrusion behaviors
and attack methods, which improves classification accuracy.
Although the O-BA algorithm can improve classification accuracy, in practical applications, it
is still necessary to comprehensively consider the performance of the algorithm, the consumption of
computing resources, and the characteristics of the actual network environment to choose a suitable
intrusion detection algorithm.
Note for Figure 4: A–E are precision, accuracy, recall, F1 score, and FP rate.
Figure 5 shows the convergence comparison of the Apriori algorithm, ID3 algorithm, SVM algorithm,
NSA algorithm, and O-BA algorithm. The O-BA algorithm has converged by iteration 50. Compared with
the other algorithms, it converges faster, and the fitness of the optimal solution it reaches at
convergence is also higher. This indicates that the O-BA algorithm greatly improves the
efficiency of feature extraction by optimizing the subpopulation partitioning criteria, enhances the
dynamic search ability of individuals, and can better meet the detection needs of different stages.
Figure 6 shows the comparison of detection performance indicators of Apriori, ID3, SVM,
NSA, and O-RF algorithms. The detection precision, accuracy, recall, and F1 score of the O-RF
algorithm are significantly higher than those of Apriori, ID3, SVM, and NSA, and the difference is
statistically significant (P < 0.05). The FP rate of the O-RF algorithm is significantly lower than
that of Apriori, ID3, SVM, and NSA, and the difference is statistically significant (P < 0.05).
Because the RF algorithm is optimized in terms of sample weight, more misclassified samples are
selected and each tree is built more efficiently. Therefore, the results show
that the detection performance of the proposed O-RF algorithm is better than that of the traditional
algorithms and has the value of application and promotion.
Figure 7 shows the comparison of convergence of Apriori, ID3, SVM, NSA, and O-RF algorithms.
The O-RF algorithm has converged by iteration 40. Compared with the other
algorithms, the O-RF algorithm converges faster, and the fitness of the optimal solution it reaches at
convergence is also higher, which shows that the O-RF algorithm has better
convergence and higher applicability.
Figure 4. O-BA Detection performance indicators (Note: * indicates that the difference is statistically significant compared with
the O-BA algorithm (P < 0.05))
Figure 5. Convergence analysis result of O-BA
Figure 6. Detection performance indicators of the O-RF algorithm (Note: * indicates that the difference is statistically significant
compared with the O-RF algorithm (P < 0.05). A–E are precision, accuracy, recall, F1 score, and FP rate.)
Figure 7. Convergence analysis result of O-RF algorithm
Figure 8 shows the comparison of detection performance indicators of different combined
algorithms. The detection precision, accuracy, recall, and F1 score of the O-BA + O-RF combined
algorithm are significantly higher than those of the other combined algorithms (P < 0.05), and the FP
rate of the O-BA + O-RF combined algorithm is lower than that of the other algorithms (P < 0.05).
Feature selection and traffic classification interact, so the algorithms behave differently when they
run together in the system than when each runs alone. Therefore,
this result shows that the O-BA + O-RF combined algorithm has excellent comprehensive detection
performance and great promotion value.
Figure 9 shows the cost comparison of different combined algorithms for different numbers of
input traffic classifications. The cost of all combined algorithms decreases as the input traffic
increases, and the cost of the O-BA + O-RF combined algorithm is lower than that of the other
combined algorithms for every number of input traffic classifications. In traffic classification, the
more traffic is classified incorrectly, the greater the cost of the algorithm. The classification
performance of the algorithm improves after large-scale traffic training. Therefore, the results show
that the improved O-BA + O-RF combined algorithm incurs less cost than the traditional algorithms
after training.
Figure 8. Comparison of detection performance indicators of different combined algorithms (Note: * indicates that the difference
is statistically significant in contrast with 5 (P < 0.05). A–E are precision, accuracy, recall, F1 score, and FP rate. 1 represents
the Apriori algorithm, 2 represents the ID3 algorithm, 3 represents the SVM algorithm, 4 represents the NSA algorithm, and 5
represents the O-BA+O-RF combination algorithm.)
Figure 10 shows the comparison of detection performance indicators of different NIS systems.
The detection precision, accuracy, recall, and F1 score of the proposed NIS system are higher than
those of other systems (P < 0.05). The FP rate of the proposed NIS system is lower than that of other
systems (P < 0.05). This indicates that the proposed NIS system based on the O-BA and O-RF algorithms
has high classification precision and recall rate, and has application value.
Figure 11 shows the comparison of the detection time of the NIS system under different input
traffic. The detection time of all NIS systems will be extended with the increase of input traffic,
and the detection time of the proposed NIS system in different input traffic classifications is at the
medium level, which is within the acceptable range. It reveals that the proposed NIS system based
on the O-BA and O-RF algorithm can not only improve the detection effect of network attacks but
also maintain good detection efficiency.
Figure 12 shows the comparison of the detection precision of the NIS systems for different
types of attacks. The difference in detection precision for normal traffic between the proposed NIS
system and the other systems is not statistically significant (P > 0.05). The precision of the
proposed NIS system for detecting DOS and Probe network attacks is higher than that of the other
systems (P < 0.05). Its precision for detecting U2R and R2L network attacks is also higher than that
of the other systems, but the values are low: 60.38% and 33.06%, respectively. This shows
that the proposed NIS system based on the O-BA and O-RF algorithms detects DOS and Probe attacks
with significantly better precision than it detects U2R and R2L attacks. The reason may be that the
sample size of these two attack types is small, so the AI algorithm cannot obtain sufficient
features, which affects classification accuracy. Therefore, the detection of U2R and R2L network
attacks is poor.
Analysis of Practical Applications
With the rapid development of internet technology and the continuous evolution of hacker
attacks, NIS issues have become increasingly prominent. To protect network systems from various
threats of network attacks, network attack detection has become a crucial task. However, with the
diversification of attack methods and the continuous evolution of attackers, traditional network attack
detection methods face enormous challenges. Therefore, we need a more effective and accurate network
attack detection method to address these threats. This article explores the key stages of network
attack detection and proposes improved algorithms and systems to optimize the detection process.
Although we conducted simulation experiments on real data platforms to verify the performance of
the proposed algorithm and system, we also recognize some limitations in the research.
Figure 9. Cost comparison of different combined algorithms in different numbers of input traffic classification (Note: 1 represents
the Apriori algorithm, 2 represents the ID3 algorithm, 3 represents the SVM algorithm, 4 represents the NSA algorithm, and 5
represents the O-BA+O-RF combination algorithm)
Figure 10. Comparison of detection performance indicators of different NIS systems (Note: * indicates that the difference is
statistically significant relative to the proposed system (P < 0.05))
Figure 11. Comparison of detection time of NIS system under different input traffic
Sample limitation: The dataset used in the article has a small sample size, which may affect
the generalizability and representativeness of the results. To improve the generalizability
and representativeness of the results, increasing the sample size of the dataset can more
comprehensively cover different types of network attacks and normal traffic. Consider collecting
real-world network traffic data and combining it with synthetic datasets to simulate more attack
scenarios.
Method limitations: The algorithm and system proposed in this article need to be further validated
for their effectiveness and feasibility in practical applications. Meanwhile, the algorithms and
methods used in this article have certain limitations and may need to be combined with other
methods to improve detection performance. In addition to the BA and RF algorithm proposed
in this article, further research and experimentation can be conducted on other machine learning
algorithms, deep learning models, or ensemble learning methods to improve the accuracy and
efficiency of network attack detection.
Research design limitations: The research design of this article is a simulation experiment, without
conducting on-site testing and verification in practical application scenarios. Based on simulation
experiments, a real network environment can be built for on-site testing and verification of the
proposed algorithm and system. This can simulate network attack scenarios more realistically
and evaluate the performance of algorithms in practical applications. This article focuses on two
key stages: feature selection and traffic classification, but network attack detection is a complex
process that also includes other stages such as data preprocessing, anomaly detection, and behavior
analysis. Further research and exploration can be conducted on how to combine these stages
and factors to enhance the overall effectiveness of the entire network attack detection system.
Data availability limitation: Due to the privacy of network attack events, the dataset used in this
article may have certain limitations and cannot obtain more comprehensive and detailed data.
In addition to network traffic data, other data sources such as log records and network device
information can also be considered to obtain more comprehensive and diverse information.
Diversified data sources can provide more accurate features and better training models.
Figure 12. Comparison of detection performance indicators of different NIS systems (Note: * indicates that the difference is
statistically significant compared with the proposed system (P < 0.05). A–E are normal, Probe, DOS, U2R, and R2L, respectively.)
The algorithm and system proposed in this article have important practical applications in the
field of network security, especially in the following scenarios:
1. Enterprise information security: Enterprises must ensure the security of their networks and data
to protect their customers, employees, and businesses. The network attack detection algorithm
and system proposed in this article can help enterprises identify and prevent malicious attacks,
thereby improving their information security level.
2. Government agency information security: Government agencies typically handle a large amount
of sensitive information, such as tax records, state secrets, etc. The algorithm and system proposed
in this article can help government agencies detect and defend against network attack events
promptly, thereby ensuring national information security.
3. Internet service providers: Internet service providers need to provide users with safe and reliable
services. The algorithm and system proposed in this article can help providers detect and defend
against attacks from hackers and other malicious attackers while maintaining efficient network
operation.
4. Network security researchers: Network security researchers need to continuously improve their
technology and methods to cope with new network attack methods. The algorithm and system
proposed in this article can provide a reference and foundation for these researchers to better
understand and respond to network attack events.
In summary, the algorithm and system proposed in this article have practical application value for
various organizations and institutions, which can help improve their network security level and reduce
losses caused by network attack events. The following development directions can be considered for
future network attack detection:
Deep learning and AI applications: Deep learning and AI technologies have enormous potential
in the field of network security. In the future, deep learning models such as CNNs, RNNs, and
generative adversarial networks can be further studied and applied to improve the accuracy and
efficiency of network attack detection.
Adaptive and self-learning systems: Network attack methods are constantly evolving and changing,
so network attack detection systems need to have the ability to adapt and self-learn. One of
the future development directions is to study how to build a network attack detection system with
adaptability and self-learning ability to cope with new attack methods and threats.
Multi-source data integration and analysis: In addition to network traffic data, network attack
detection can be combined with other data sources, such as log data, user behavior data, etc.,
for comprehensive analysis and detection. One of the future development directions is to
study how to effectively integrate and analyze multi-source data to improve the accuracy and
comprehensiveness of network attack detection.
Cloud security and edge computing: With the popularization of cloud computing and edge
computing, network attack detection also needs to adapt to the characteristics of cloud
environments and edge devices. One of the future development directions is to study how to
conduct efficient and accurate network attack detection on cloud environments and edge devices
to protect the security of cloud services and edge computing.
Collaborative defense and information sharing: Network attacks are often a collaborative process,
and attackers may attack through multiple nodes. One of the future development directions is
to study how to achieve collaborative defense and information sharing and to promote rapid response
to, and blocking of, network attack events.
To sum up, the development of network attack detection in the future will involve deep learning,
adaptability, multi-source data analysis, cloud security and edge computing, collaborative defense,
and other aspects. These development directions will further enhance the ability and effectiveness of
network attack detection, providing more comprehensive and effective protection for network security.
CONCLUSION
This research addresses the serious threat to NIS posed by the rapid development of internet
technology and the diversification of hacker attacks. This article explores the two key stages of
network attack detection, namely feature selection and traffic classification. It then proposes
improved BA and RF algorithms to optimize the detection process of network attacks. Based on
these algorithms, we propose an NIS system built on O-BA and O-RF to improve the effectiveness
of network attack detection and maintain high detection efficiency. We validated the performance of
the proposed algorithm and system through simulation experiments conducted on real data platforms.
However, due to the small sample size in the experimental dataset, there may be some bias in our data
results. Therefore, future research will consider expanding the dataset to further validate the feasibility
of the proposed system. Overall, this study has significant practical implications for addressing the
current issues of network attack detection and defense. Our work provides feasible solutions for NIS
and useful references and assistance for building a more secure network environment.
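The evaluation in this article is reported as precision, accuracy, recall, F1 score, and false positive rate. As a reading aid, the sketch below shows how these five measures follow from binary confusion-matrix counts, with attack traffic as the positive class. The names `ConfusionCounts` and `detection_metrics` and the example counts are assumptions made for illustration; they are not the evaluation code or the results of this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'attack' as the positive class."""
    tp: int  # attacks flagged as attacks
    fp: int  # normal traffic flagged as attacks
    tn: int  # normal traffic passed as normal
    fn: int  # attacks that were missed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the five detection measures from confusion-matrix counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
    }


# Hypothetical counts for illustration only; not results from this study.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = detection_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A detector that improves precision and recall while lowering the false positive rate, as reported for O-BA and O-RF, is moving `fp` and `fn` down at the same time, which is why the false positive rate trends opposite to the other four measures.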
CONFLICTS OF INTEREST
We wish to confirm that there are no known conflicts of interest associated with this publication
and there has been no significant financial support for this work that could have influenced its outcome.
FUNDING STATEMENT
This research was supported by the Innovation Fund for University Teachers of Gansu Province, 2023
(2023B-236), and the Emerging Engineering Education Research and Practice Project of Lanzhou Institute
of Technology (2022-LGYXGK-09).
PROCESS DATES
Received: December 13, 2023, Revision: February 9, 2024, Accepted: February 9, 2024
CORRESPONDING AUTHOR
Correspondence should be addressed to Ming Li; liming_sgcc@163.com
REFERENCES
Al-Hawawreh, M., Moustafa, N., & Slay, J. (2024). A threat intelligence framework for protecting smart
satellite-based healthcare networks. Neural Computing & Applications, 36(1), 15–35. DOI: 10.1007/
s00521-021-06441-5 PMID: 34518744
Andrade-Hoz, J., Wang, Q., & Alcaraz-Calero, J. M. (2024). Infrastructure-wide and intent-based networking
dataset for 5G-and-beyond AI-driven autonomous networks. Sensors (Basel), 24(3), 783. DOI: 10.3390/
s24030783 PMID: 38339500
Biehler, M., Zhong, Z., & Shi, J. (2024). SAGE: Stealthy Attack Generation in cyber-physical systems. IISE
Transactions, 56(1), 54–68. DOI: 10.1080/24725854.2023.2184004
Cao, D. M., Sayed, M. A., Islam, M. T., Mia, M. T., Ayon, E. H., Ghosh, B. P., Ray, R. K., & Raihan, A. (2024).
Advanced cybercrime detection: A comprehensive study on supervised and unsupervised machine learning
approaches using real-world datasets. Journal of Computer Science and Technology Studies, 6(1), 40–48. DOI:
10.32996/jcsts.2024.6.1.5
Cao, J., Pan, B., & Zou, X. (2024). Flow monitoring system and abnormal log traffic mode detection based on
artificial intelligence. Optical and Quantum Electronics, 56(1), 112. DOI: 10.1007/s11082-023-05690-z
Casado-Vara, R., Severt, M., Díaz-Longueira, A., Rey, Á. M. D., & Calvo-Rolle, J. L. (2024). Dynamic malware
mitigation strategies for IoT networks: A mathematical epidemiology approach. Mathematics, 12(2), 250. DOI:
10.3390/math12020250
Emil Selvan, G. S. R., Daniya, T., Ananth, J. P., & Suresh Kumar, K. (2024). Network intrusion detection and
mitigation using hybrid optimization integrated deep Q network. Cybernetics and Systems, 55(1), 107–123.
DOI: 10.1080/01969722.2022.2088450
Ertem, M., & Bier, V. M. (2024). Optimal defense strategies against intelligent cyber attacks. Journal of Innovative
Engineering and Natural Science, 4(1), 245–262. DOI: 10.61112/jiens.1389871
Ezhilarasi, M., Gnanaprasanambikai, L., Kousalya, A., & Shanmugapriya, M. (2023). A novel implementation
of routing attack detection scheme by using fuzzy and feed-forward neural networks. Soft Computing, 27(7),
4157–4168. DOI: 10.1007/s00500-022-06915-1
Hasas, A., Zarinkhail, M. S., Hakimi, M., & Quchi, M. M. (2024). Strengthening digital security: Dynamic
attack detection with LSTM, KNN, and random forest. Journal of Computer Science and Technology Studies,
6(1), 49–57. DOI: 10.32996/jcsts.2024.6.1.6
Kan, D., & Fang, X. (2024). Event log anomaly detection method based on auto-encoder and control flow.
Multimedia Systems, 30(1), 29. DOI: 10.1007/s00530-023-01199-3
Karthikeyan, M., Manimegalai, D., & RajaGopal, K. (2024). Firefly algorithm based WSN-IoT security
enhancement with machine learning for intrusion detection. Scientific Reports, 14(1), 231. DOI: 10.1038/
s41598-023-50554-x PMID: 38168562
Kong, L., Luo, M., Cheng, J., Katib, I., Shi, K., & Zhong, S. (2024). Security‐based fault detection filtering
design for fuzzy singular semi‐Markovian jump systems via improved dynamic event‐triggering and quantization
protocols. International Journal of Adaptive Control and Signal Processing, 38(1), 39–75. DOI: 10.1002/acs.3689
Li, X., Li, Y., & He, H. (2024). Computer network virus defense with data mining-based active protection.
Scalable Computing: Practice and Experience, 25(1), 45–54. DOI: 10.12694/scpe.v25i1.2173
Najafi Mohsenabad, H., & Tut, M. A. (2024). Optimizing cybersecurity attack detection in computer networks:
A comparative analysis of bio-inspired optimization algorithms using the CSE-CIC-IDS 2018 dataset. Applied
Sciences (Basel, Switzerland), 14(3), 1044. DOI: 10.3390/app14031044
Nayomi, B. D. D., Mallika, S. S., Sowmya, T., Janardhan, G., Laxmikanth, P., & Bhavsingh, M. (2024). A
cloud-assisted framework utilizing blockchain, machine learning, and artificial intelligence to countermeasure
phishing attacks in smart cities. International Journal of Intelligent Systems and Applications in Engineering,
12(1s), 313–327. https:// www .ijisae .org/ index .php/ IJISAE/ article/ view/ 3419
24
International Journal of Information Security and Privacy
Volume 18 • Issue 1 • January-December 2024
Palma, C., Ferreira, A., & Figueiredo, M. (2024). Explainable machine learning for malware detection on android
applications. Information (Basel), 15(1), 25. DOI: 10.3390/info15010025
Rzym, G., Masny, A., & Chołda, P. (2024). Dynamic telemetry and deep neural networks for anomaly detection
in 6G software-defined networks. Electronics (Basel), 13(2), 382. DOI: 10.3390/electronics13020382
Sami, T. M. G., Zeebaree, S. R., & Ahmed, S. H. (2024). A novel multi-level hashing algorithm to enhance
internet of things devices’ and networks’ security. International Journal of Intelligent Systems and Applications
in Engineering, 12(1s), 676–696. https:// www .ijisae .org/ index .php/ IJISAE/ article/ view/ 3502
Siddamsetti, S., & Srivenkatesh, M. (2024). Deep blockchain approach for anomaly detection in the bitcoin
network. International Journal of Intelligent Systems and Applications in Engineering, 12(1), 581–595. https://
ijisae .org/ index .php/ IJISAE/ article/ view/ 3956
Tenepalli, D., & TM, N. (2024). A systematic review on IoT and machine learning algorithms in e-healthcare.
International Journal of Computing and Digital Systems, 15(1), 1–14. http:// 137 .117 .138 .59/ handle/ 123456789/
5274
Wang, D., Li, F., Liu, K., & Zhang, X. (2024). Real-time cyber-physical security solution leveraging an integrated
learning-based approach. ACM Transactions on Sensor Networks, 20(2), 1–22. DOI: 10.1145/3638051
Wani, M. S., Rademacher, M., Horstmann, T., & Kretschmer, M. (2024). Security vulnerabilities in 5G
non-stand-alone networks: A systematic analysis and attack taxonomy. Journal of Cybersecurity and Privacy,
4(1), 23–40. DOI: 10.3390/jcp4010002
Yun, K., Yun, H., Lee, S., Oh, J., Kim, M., Lim, M., Lee, J., Kim, C., Seo, J., & Choi, J. (2024). A study on
machine learning-enhanced roadside unit-based detection of abnormal driving in autonomous vehicles. Electronics
(Basel), 13(2), 288. DOI: 10.3390/electronics13020288
Fu Longfei (1982-), male, Han, from Changyuan, Henan Province; master's degree; senior engineer. Major research
interests: control systems of electric drives, comprehensive integrated man-machine systems, integrated factory
automation, analysis of complex system behavior, and power system operation support.
Liu Yibin (1992-), male, Han, from Jingyuan, Gansu Province; bachelor's degree; engineer. Major research interests:
applications of robotics and artificial intelligence.
Zhang Yanjun (1984-), male, Han, from Lanzhou, Gansu Province; bachelor's degree; engineer. Major research interests:
environment and intelligent systems.
Li Ming (1971-), male; bachelor's degree. Research interests include cloud computing, network security, the Internet
of Things, and blockchain.