Real Time Network Traffic Analysis using Artificial
Intelligence, Machine Learning and Deep Learning: A
Review of Methods, Tools and Applications
Aschin Dhakad
Electronics and Telecommunication
Engineering, RV College of
Engineering®, Bengaluru, India
Minal Moharir
Computer Science & Engineering
(Cyber Security), RV College of
Engineering®, Bengaluru, India
Shruti Singh
Electronics and Telecommunication
Engineering, RV College of
Engineering®, Bengaluru, India
Mohana
Computer Science & Engineering
(Cyber Security), RV College of
Engineering®, Bengaluru, India
Ashok Kumar A R
Computer Science & Engineering
(Cyber Security), RV College of
Engineering®, Bengaluru, India
Abstract- Network Traffic Analysis (NTA) involves monitoring
and analyzing network traffic in order to spot potential security
threats or performance problems. Machine Learning (ML) methods
are frequently used to automate NTA. Network traffic
classification, anomaly detection, and malicious activity detection
can all be performed using ML techniques, which can also be used
to forecast future traffic patterns in order to enhance network
performance. Many ML algorithms can be applied to NTA; support
vector machines (SVM), decision trees, and random forests are the
most widely used. The choice of algorithm depends on the
particular application: SVMs, for instance, are frequently used for
classification tasks, whereas decision trees are frequently used
for anomaly detection. NTA based on ML can improve both network
performance and security by aiding the detection of possible risks
and the prevention of data breaches. This work discusses real-time
NTA using ML and DL algorithms, together with tools and
applications. A random forest model is implemented and obtains an
accuracy of 99.31%. The benefits of applying ML to NTA include:
increased accuracy, since ML algorithms have the potential to be
more precise than conventional rule-based techniques at spotting
threats and anomalies; fewer false positives, since ML algorithms
can be tuned to produce fewer false alarms, which saves time and
money; and enhanced scalability, since ML algorithms can be scaled
to handle high volumes of network traffic.
Keywords - You Only Look Once (YOLO), Convolutional Neural
Network (CNN), Artificial Intelligence (AI), Machine Learning
(ML), Deep Learning (DL)
I. INTRODUCTION
Network traffic analysis is the process of gathering, examining,
and interpreting network traffic data in order to spot anomalies,
performance problems, and malicious behavior. Conventional
techniques for analyzing network traffic frequently rely on
signature-based detection, which searches for previously
identified patterns of malicious traffic. This strategy may not
work against emerging or novel threats. Deep learning and
artificial intelligence (AI) open up new avenues for real-time
network traffic analysis. AI-powered traffic analysis tools can
recognize harmful traffic patterns without the need for
pre-defined signatures, and are hence more effective against
emerging and changing threats. Real-time NTA is difficult for
three reasons. First, network traffic is frequently both
high-speed and high-volume, which makes gathering and analyzing
all of the traffic in real time challenging. Second, a lot of
network traffic is encrypted, which hides the content of the
traffic and makes malicious activity harder to detect. Third,
traffic on networks is ever-changing, so the models used for
traffic analysis must be updated regularly to keep up with the
most recent threats. AI and deep learning can address a number of
these difficulties. First, AI-powered traffic analysis systems can
use machine learning to learn to recognize harmful traffic
patterns without the need for pre-defined signatures, which makes
them more effective against emerging and changing threats. Second,
AI-powered traffic analysis systems can use deep learning to
analyze encrypted traffic without decrypting it: DL models can
recognize patterns in encrypted traffic that are linked to harmful
behavior. Third, AI-powered solutions can be updated more often
than conventional traffic analysis tools, because AI models can be
retrained on new data sets easily and quickly.
II. LITERATURE SURVEY AND PROBLEM ANALYSIS
Neeraj Namdev et al. [1] observed that the internet plays an ever
larger role as technology and communication systems evolve, and
that data and traffic on the internet are rising exponentially as
a result. The categorization of internet traffic is a particularly
common method used in intrusion detection systems. Nour Alqudah
et al. [2] noted that traffic analysis serves a variety of
functions, including assessing the efficiency and security of
network administration and operations. It is
aschindhakad.et20@rvce.edu.in
Proceedings of the International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS 2023)
IEEE Xplore Part Number: CFP22DN7-ART; ISBN: 979-8-3503-0085-7
979-8-3503-0085-7/23/$31.00 ©2023 IEEE 372
2023 International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS) | 979-8-3503-0085-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICSSAS57918.2023.10331855
Authorized licensed use limited to: R V College of Engineering. Downloaded on December 21,2023 at 07:51:05 UTC from IEEE Xplore. Restrictions apply.
believed that NTA is essential for enhancing the security and
functionality of networks. As network traffic grows and artificial
intelligence develops, new approaches are needed to identify
intrusions, categorize Internet traffic, and analyze virus
behavior. A. Jamuna et al. [3] noted that application-based
traffic classification is a key component of network security and
management. Traditional methods include port-based and
payload-based deep inspection approaches. Privacy concerns,
dynamic ports, and encrypted applications make these conventional
procedures ineffective in current networks. T. Bujlow et al. [4]
noted that monitoring network performance in high-speed internet
infrastructure is difficult because the requirements for a given
quality level depend on the type of service. Understanding which
types of applications make up the current network traffic is
therefore necessary for backbone QoS monitoring and analysis in
multihop networks. They proposed using the C5.0 ML algorithm to
improve traffic classification and address the shortcomings of
existing approaches. Amin
Shahraki et al. [5] noted that the analysis of network data is
crucial for a variety of objectives, including network resource
management and cyber-security analysis. This calls for analytical
techniques that can process network data online as new data is
received, and online ML (OL) approaches are expected to support
such data analytics. A. Priya et al. [6] noted that numerous
techniques have been suggested to address current technical
problems, and that the primary objective of network traffic
classification is network performance improvement. Their
framework uses unsupervised ML to identify the user's browser and
application from real-time network information. V. A. Muliukha
et al. [7] presented the application of map-reduce technology to
NTA. They describe how to determine where a frame begins in a PCAP
file and explain the procedure for producing training samples for
various traffic types. They also describe the program "Tractor"
used in the experiments and report the findings of experimental
investigations on the analysis of encrypted traffic using the
random forest method and the naive Bayes classifier. E. Nazarenko
et al. [8] considered the theoretical underpinnings of traditional
ML algorithms, namely their operating principles and parameters.
The network traffic categorization and identification system they
created makes use of the k-NN, Naive Bayes, and SVM algorithms.
E. Osa
et al. [9] noted that intrusion detection systems are used for
network monitoring and for detecting unauthorized access or
malicious traffic over secured networks. Their work compares
several ML techniques for identifying anomalies in network data;
the decision tree was judged to provide the best overall outcome.
L. Trajković et al. [10] noted that traffic traces gathered from
operational communication networks are used to characterize and
quantify traffic loads, examine user behavior patterns, model
network traffic, and forecast future network traffic. Traffic
traces gathered from various deployed networks and the Internet
are used to characterize and model network traffic, examine
Internet topologies, and categorize network anomalies. Yang et al.
[11] noted that accurate network traffic identification is the
foundation for network traffic monitoring, data analysis, and user
service quality improvement. DPI technology can identify specific
application traffic, which increases the accuracy of
identification. An ML method based on statistical flow
characteristics helps identify encrypted network flows and flows
with unknown features, making up for the shortcomings of DPI
technology. Y. Xue et al. [12] presented
an analysis of current traffic classification methodologies,
concentrating on their issues and challenges, before discussing
some suggestions that can improve performance. A. Boukhalfa et al.
[13] noted that the detection of threats is complicated by the
rising volume and diversity of data in network communications. To
identify new, previously unseen threats, their study employs a
novel approach to NTA that leverages big data frameworks to gather
substantial volumes of network traffic data and Deep Learning (DL)
algorithms to analyze it. K. Limthong et al. [14] studied the
relationship between interval-based features of network traffic
and various types of network anomalies using two well-known ML
techniques, naive Bayes and k-nearest neighbor. They used these
two algorithms, five kinds of test-bed anomalies, and nine
interval-based features of network traffic to assess the accuracy
obtained with each feature. Although limited to the naive Bayes
and k-nearest neighbor algorithms, the preliminary results showed
which features are more useful for each anomaly type. M. Shafiq
et al. [15] presented that Internet
traffic can be categorized using a variety of techniques,
including port-based classification, payload analysis, and ML.
The ML approach is currently the most widely used; it has achieved
very accurate results and is used by many researchers. The
limitations of the several network traffic classification
techniques (port-based, payload-based, and ML-based) are
presented, and the network traffic classification model is then
laid out from traffic capture to final output. The authors used
the Wireshark tool to record the traffic of five applications
(WWW, DNS, FTP, P2P, and TELNET) for one minute and the NetMate
program to extract 23 attributes for a comparative study of four
algorithms. M. Ramires et al. [16] presented that the goal of
traffic classification methods is to automatically analyze and
classify traffic moving over a network based on its unique
properties and intrinsic nature. Early methods relied on factors
such as port numbers and payload analysis for categorization;
however, due to the Internet's rapid growth and evolution, such
techniques have proven ineffective. The work provides a
comparative analysis of ML approaches for the categorization of
network traffic. D. Szostak et al. [17] presented that to provide
services to users, network operators must prioritize efficient
resource allocation. Optimizing it allows for cost savings, the
required service quality, and the detection of anomalies in data
flow. An ML method based on a Linear Discriminant Analysis (LDA)
classifier was proposed for estimating short-term traffic volumes;
the traffic prediction problem is therefore formulated as a
classification task. M. Soykan et al. [18] noted that the Tor
project is a product that enables anonymous Internet communication
without disclosing users' identities. They first conducted data
analysis to learn more about the data set, and then mapped
categorical values to numerical values. Iraj Lohrasbinasab et al.
[19] noted that network performance is tracked using Network
Traffic Monitoring and Analysis (NTMA), a collection of approaches
for addressing network problems. Network Traffic Prediction (NTP),
a significant NTMA area, focuses on forecasting network load and
its future behavior. NTP techniques are typically implemented
using one of two approaches: statistical or ML-based.
Kuldeep and Agrawal et al. [20] noted that traditional IP traffic
classification techniques, such as port-number-based and
payload-based direct packet inspection techniques, are no longer
widely used, because applications use dynamic port numbers in
packet headers rather than well-known port numbers and because
various cryptographic techniques prevent inspection of packet
payloads. A contemporary trend is to categorize IP traffic using
ML techniques. A packet capture tool was used to build a real-time
internet traffic dataset with a packet collection period of
2 seconds.
III. AI AND DEEP LEARNING METHODS IN NETWORK TRAFFIC
ANALYSIS
A. Network Traffic Analysis - NTA involves gathering, monitoring,
and analyzing data on network traffic in order to detect and
counteract security threats, improve network performance, and
troubleshoot network issues. AI and DL are increasingly applied in
NTA as a means of automating processes, enhancing accuracy, and
generating additional insights. AI can be used to categorize
network traffic, including web traffic, email traffic, and file
transfer traffic, and to identify possible security risks such as
malware or distributed denial-of-service (DDoS) attacks. AI can
also be used to diagnose network issues by finding the source of a
problem, which in turn may improve network performance and
capacity planning. Examples of specific applications of AI and DL
in NTA include the following. Stealthwatch is an AI-based NTA
solution created by Cisco; it employs ML to recognize and
categorize network traffic and to find abnormalities and traffic
patterns. WildFire is an AI-powered NTA solution created by Palo
Alto Networks; it employs ML to recognize and categorize malware,
recognize malicious traffic, and stop attacks. Contrail Analytics
is an NTA solution developed using AI; it uses ML to recognize and
categorize network traffic, find abnormalities, and improve
network performance.
B. Classification methods
Supervised Learning - Supervised classification techniques are
frequently used for NTA with ML. The model learns from labeled
data to predict the class of new, unseen network traffic. The
following are a few popular supervised classification techniques
for NTA.
Decision Trees - Decision trees offer a straightforward but
effective approach to classification tasks. To make decisions,
they split the data into subgroups based on several attributes and
build a tree-like structure. Each internal node of the tree tests
a feature, and each edge represents an outcome of that test.
Random Forest - A random forest is an ensemble method that
combines many decision trees. It builds multiple decision trees
during training and aggregates their results to produce more
accurate predictions. Random forests increase overall robustness
and reduce overfitting.
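The difference between a single decision tree and a random forest can be illustrated with a minimal scikit-learn sketch. The data below is synthetic (generated with `make_classification`) and merely stands in for labeled flow features; it is not the dataset used in this work, and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labeled flow features (assumption: 10 numeric
# features, binary label such as benign vs. malicious).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# A single tree: one sequence of feature tests from root to leaf.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A forest: many trees trained on bootstrap samples, predictions aggregated.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"decision tree accuracy: {tree_acc:.3f}")
print(f"random forest accuracy: {forest_acc:.3f}")
```

On data of this kind the forest typically generalizes at least as well as the single tree, although the exact figures depend on the data and the random seed.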
SVM - SVM is an effective binary classification method that can
handle both linearly and non-linearly separable data (the latter
through kernel functions). It seeks to identify the hyperplane in
the feature space that best separates the classes. SVM can be
extended to multi-class classification using, for example,
one-vs-all or one-vs-one approaches.
Logistic Regression - A logistic regression model estimates the
probability that an instance belongs to a particular class given
its features.
k-Nearest Neighbors (k-NN) - k-NN is a straightforward and easily
understood classification algorithm. When a new instance is
encountered, it examines the 'k' nearest labeled examples and
assigns the class that occurs most frequently among those 'k'
neighbors.
Naive Bayes - Naive Bayes is a probabilistic classification
technique built on Bayes' theorem.
Neural Networks - Neural networks, such as convolutional neural
networks (CNNs), can also be used for NTA and can learn complex
data patterns, although they need a lot of labeled data and
computing power to train.
Gradient Boosting - Gradient
boosting is another ensemble technique that sequentially
constructs a number of weak learners (typically decision trees),
each of which attempts to correct the mistakes of the previous
ones. It is known for high predictive accuracy. XGBoost and
LightGBM are popular gradient boosting implementations that have
been optimized for scalability and performance. Because they are
so effective, they are frequently used in many classification
tasks.
AdaBoost - AdaBoost is an ensemble learning technique that
concentrates on hard-to-classify instances. It assigns weights to
the training instances and, in subsequent rounds, iteratively
adjusts them to give more weight to instances that were
incorrectly classified.
Unsupervised Learning - Unsupervised classification, also known as
clustering, is an ML approach where the goal is to group similar
instances together without any predefined labels. In the context
of NTA, unsupervised classification methods aim to discover
underlying patterns and structures in the data without using
labeled examples. The following are common unsupervised
classification methods used in NTA.
Hierarchical Clustering: Hierarchical clustering creates a tree-
like structure of nested clusters, also known as a dendrogram. It
can be agglomerative (bottom-up) or divisive (top-down). At
each step, the algorithm merges or splits clusters based on their
similarity.
DBSCAN: DBSCAN is a density-based clustering algorithm that groups
points together based on their density in the feature space. It
can automatically identify outliers as noise points.
Gaussian Mixture Models (GMM): A GMM models the data as a mixture
of Gaussian distributions. It estimates the parameters of these
distributions and assigns each point to the most probable cluster.
Self-Organizing Maps (SOM): A SOM is a type of artificial neural
network (ANN) that projects high-dimensional data onto a
lower-dimensional grid. It organizes the data in a topological
manner, grouping similar instances closer to each other on the
grid.
Agglomerative Information Bottleneck (AIB): AIB is a clustering
algorithm that finds the optimal trade-off between clustering and
preserving information, useful for analyzing high-dimensional
data like network traffic.
OPTICS (Ordering Points to Identify the Clustering Structure):
OPTICS is an extension of DBSCAN that creates a reachability
plot, which allows for flexible clustering based on different
density levels.
BIRCH (Balanced Iterative Reducing and Clustering using
Hierarchies): BIRCH is a hierarchical clustering method that
constructs a tree-like structure to efficiently cluster large
datasets. Unsupervised classification methods can be valuable
for NTA, especially when there is little or no labeled data
available or when exploring the data for potential patterns and
anomalies. Additionally, domain expertise is often required to
interpret the clustering results effectively and derive meaningful
insights from the discovered patterns.
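As a minimal sketch of two of the clustering methods described above, the following example applies DBSCAN and a Gaussian mixture model to synthetic, unlabeled flow features. The two features (mean packet size and packets per second), the cluster parameters, and the `eps` and `min_samples` values are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two invented traffic profiles: (mean packet size in bytes, packets/s).
small_flows = rng.normal(loc=[100.0, 10.0], scale=[5.0, 1.0], size=(200, 2))
bulk_flows = rng.normal(loc=[1400.0, 200.0], scale=[20.0, 10.0], size=(200, 2))
# Two isolated points that belong to neither profile.
outliers = np.array([[700.0, 1000.0], [3000.0, 5.0]])

X = StandardScaler().fit_transform(
    np.vstack([small_flows, bulk_flows, outliers])
)

# DBSCAN: dense regions become clusters, isolated points get label -1.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)
n_noise = int(np.sum(db_labels == -1))
print("DBSCAN clusters:", n_clusters, "noise points:", n_noise)

# GMM: soft assignment, each point gets a probability per component.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
gmm_proba = gmm.predict_proba(X)
gmm_labels = gmm_proba.argmax(axis=1)
```

DBSCAN needs no cluster count but is sensitive to `eps`, whereas the GMM needs the number of components up front and always assigns every point, including outliers, to some component.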
C. Online learning and offline learning
In online learning, an ML model is trained incrementally, with
updates made as new data comes in. As a result, the model is
always evolving and adapting to account for new data. Online
learning is frequently used for real-time data streaming tasks
such as fraud detection or spam filtering. In offline learning, by
contrast, the model is trained once on
static data. As a result, the model does not change as new data
becomes available. In tasks like image classification or natural
language processing, where the data is not streaming in real time,
offline learning is frequently used.
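The contrast between the two training modes can be sketched with scikit-learn's `SGDClassifier`, which supports both a one-shot `fit` and an incremental `partial_fit`. The data is synthetic and the batch size of 200 is an arbitrary assumption; in a real deployment the batches would arrive from a live traffic stream.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic, well-separated two-class data (illustrative only).
X, y = make_classification(
    n_samples=3000, n_features=8, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]
classes = np.unique(y)

# Offline learning: the model sees the whole static training set at once.
offline = SGDClassifier(random_state=0).fit(X_train, y_train)

# Online learning: the model is updated batch by batch as data "arrives".
online = SGDClassifier(random_state=0)
for start in range(0, len(X_train), 200):
    batch = slice(start, start + 200)
    # The full set of classes must be declared on the first call.
    online.partial_fit(X_train[batch], y_train[batch], classes=classes)

offline_acc = offline.score(X_test, y_test)
online_acc = online.score(X_test, y_test)
print(f"offline accuracy: {offline_acc:.3f}")
print(f"online accuracy:  {online_acc:.3f}")
```

The online model can keep calling `partial_fit` indefinitely, which is what allows it to follow changes in traffic without retraining from scratch.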
IV. NETWORK TRAFFIC ANALYSIS TOOLS
NTA tools are software applications or platforms designed to
monitor, capture, analyze, and visualize network traffic. Some
popular NTA tools are listed below.
Wireshark: Wireshark is a widely used open-source packet
analyzer that allows users to capture and inspect network
packets in real-time.
tcpdump: tcpdump is a command-line packet capture tool
available on Unix-like operating systems. It can capture and
display network traffic in real-time or save it to a file for later
analysis.
Ethereal: Ethereal, now known as Wireshark, is a packet
analyzer that runs on multiple platforms and supports a broad
range of protocols.
tshark: tshark is the command-line version of Wireshark,
allowing users to perform packet analysis without a graphical
interface.
NetFlow Analyzer: NetFlow Analyzer is a tool that collects and
analyzes NetFlow data from network devices to provide
insights into network traffic patterns and bandwidth usage.
PRTG Network Monitor: It monitors bandwidth usage and traffic
patterns and provides detailed reports.
Ntopng: Ntopng is a high-performance NTA tool that provides
real-time and historical traffic data, including top talkers,
protocols, and applications.
SolarWinds Network Performance Monitor: This tool offers
NTA, monitoring, and alerting features to help administrators
identify and resolve network issues.
Nagios: Nagios is a popular open-source network monitoring
and alerting system that can be extended with plugins to
perform traffic analysis.
Capsa Network Analyzer: Capsa is a user-friendly network
analyzer that offers real-time and historical NTA, protocol
analysis, and bandwidth monitoring.
Splunk: Splunk is a versatile platform used for log aggregation,
analysis, and visualization, making it suitable for NTA when
combined with appropriate plugins.
Observer (formerly Network Instruments Observer): Observer
is a comprehensive network monitoring and analysis platform
that provides in-depth traffic analysis capabilities.
FlowTraq: FlowTraq is a network security and traffic analysis
tool that focuses on NetFlow and IPFIX data.
Graylog: Graylog is a log management and analysis platform
that can be utilized for NTA when network logs are available.
These tools offer various features and capabilities, so the
choice of the right tool depends on the specific requirements
and objectives of the network analysis tasks. Some tools are
more suitable for real-time monitoring and alerting, while
others focus on in-depth protocol analysis and historical data
examination.
V. POPULAR ML AND DL ALGORITHMS AND FRAMEWORKS
FOR NTA
A. ML algorithms
SVMs: SVMs are used for NTA because they can learn complex
relationships between the features of network traffic.
Decision trees: Decision trees are used for NTA because they
are easy to interpret and help identify the important features
of network traffic.
B. DL algorithms
CNNs: CNNs are a type of DL algorithm that is well-suited for
tasks that involve processing images or sequences of data.
CNNs have been used for NTA to identify malicious traffic,
such as malware or botnets.
Long short-term memory networks (LSTMs): LSTMs are a type
of DL algorithm that is well-suited for tasks that involve
processing sequential data. LSTMs have been used for NTA to
identify patterns in network traffic that are indicative of
malicious activity.
Generative adversarial networks (GANs): GANs are a type of
DL algorithm used to generate synthetic data that is similar to
real data. GANs have been used for NTA to generate synthetic
network traffic that can be used to train ML models.
C. Frameworks
TensorFlow: TensorFlow is a popular open-source framework
for ML and DL. TensorFlow is well-suited for NTA because it
is able to process large amounts of data quickly and efficiently.
PyTorch: PyTorch is another popular open-source framework
for ML and DL. PyTorch offers capabilities similar to those of
TensorFlow and is often regarded as more flexible and easier to
use.
Scikit-learn: Scikit-learn is a popular Python library for ML.
Scikit-learn includes a number of algorithms used for NTA,
such as SVMs, decision trees, and random forests. The specific
algorithms and frameworks that are used will depend on the
specific task and application.
VI. IMPLEMENTATION OF REAL TIME NETWORK TRAFFIC
ANALYSIS
Real-time NTA refers to the process of monitoring and
analyzing network traffic as it occurs, providing immediate
insights into the current state of the network. This type of
analysis is crucial for detecting and responding to network
anomalies, security threats, performance issues, and other
events that require timely action. Real-time NTA typically
involves the following steps:
Packet Capture: To perform real-time analysis, network
packets need to be captured as they traverse the network. This
can be achieved using packet capture tools like Wireshark or
tcpdump, or by leveraging network devices that support packet
capture functionalities.
Packet Filtering and Processing: As packets are captured, they
need to be filtered and processed to extract relevant
information and features. This step may involve filtering out
irrelevant traffic, dissecting protocol headers, extracting
payload data, and generating flow data for efficient analysis.
Feature Extraction: Important features are extracted from the
network traffic data. These features may include source and
destination IP addresses, ports, protocols, packet sizes, payload
types, packet timestamps, and more. These features are crucial
for building models and making real-time decisions.
Real-Time Analysis: The extracted features are analyzed as the
traffic arrives. These analyses can include identifying
abnormal patterns, detecting security threats like intrusion
attempts, predicting network performance issues, or classifying
traffic based on application or behavior.
Alerting and Visualization: If the analysis reveals any
significant events or anomalies, real-time alerts are triggered to
notify network administrators or security teams. Visualization
tools are also used to provide real-time graphical
representations of network traffic, making it easier to interpret
and respond to the data.
Continuous Monitoring: Real-time NTA is an ongoing process
that requires continuous monitoring of the network. The
captured data is constantly updated, and the analysis is
performed in real-time on the latest information.
Response and Mitigation: Based on the analysis and alerts,
network administrators can take immediate action to mitigate
threats, resolve performance issues, or investigate suspicious
activities in real-time.
Real-time NTA is essential for ensuring network security,
optimizing network performance, and maintaining the overall
health of the network. It allows organizations to respond
quickly to incidents and make informed decisions based on up-
to-date network information.
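The steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: packets are represented as invented dictionaries rather than captured from a live interface, the helper names (`Flow`, `flow_key`, `aggregate`, `extract_features`, `is_suspicious`) are hypothetical, and a fixed packet-count threshold stands in for a trained model.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Flow:
    """Running record of one flow, keyed by its 5-tuple."""
    first_ts: float
    last_ts: float
    sizes: list = field(default_factory=list)


def flow_key(pkt):
    # 5-tuple commonly used to group packets into flows.
    return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])


def aggregate(packets):
    """Packet filtering and processing: group packets into flows."""
    flows = {}
    for pkt in packets:
        key = flow_key(pkt)
        if key not in flows:
            flows[key] = Flow(first_ts=pkt["ts"], last_ts=pkt["ts"])
        flows[key].sizes.append(pkt["size"])
        flows[key].last_ts = pkt["ts"]
    return flows


def extract_features(flow):
    """Feature extraction: summarize a flow as a small feature record."""
    return {
        "packets": len(flow.sizes),
        "bytes": sum(flow.sizes),
        "mean_size": mean(flow.sizes),
        "duration": flow.last_ts - flow.first_ts,
    }


def is_suspicious(features, max_packets=100):
    # Placeholder rule standing in for a trained classifier.
    return features["packets"] > max_packets


# Invented traffic: one ordinary flow and one flood of small packets.
normal = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 50000,
          "dport": 443, "proto": "tcp", "size": 500}
flood = {"src": "10.0.0.9", "dst": "10.0.0.2", "sport": 40000,
         "dport": 80, "proto": "tcp", "size": 60}
packets = [dict(normal, ts=i * 0.1) for i in range(3)]
packets += [dict(flood, ts=i * 0.001) for i in range(500)]

# Real-time analysis and alerting, run here over a finished batch.
alerts = [key for key, flow in aggregate(packets).items()
          if is_suspicious(extract_features(flow))]
print(alerts)
```

In a continuous deployment the same functions would be applied to a sliding window of recently captured packets, with `is_suspicious` replaced by the prediction of a trained model.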
Software Requirements - The system's software was written in
Python 3.11.4, a high-level language, and developed in the Jupyter
notebook environment. The model was built using the pandas,
scikit-learn, seaborn, matplotlib, and NumPy libraries.
Additional traffic features:
• tcp/udp/icmp length - length after fragmentation
• protocol - whether the protocol is used or not
• label - whether the label is used or not (virtual circuit
approach or datagram approach)
VII. RESULTS AND ANALYSIS
Dataset specifications and features - Network traffic features
are the properties of network traffic used to distinguish and
assess network activities. Examples of their uses include network
security, where traffic features can be used to spot malicious
traffic such as malware or botnets, and network troubleshooting,
where traffic features help diagnose issues such as packet loss
or latency. The network traffic features used are as follows.
• SERVICE - the service value set for a packet; a high value
is set to ensure that packets are routed quickly and
reliably (the higher the number, the greater the reliability)
• Sport - sender's port number (identifies the sending
application)
• Dport - destination port number (identifies the receiving
application)
• ttl - time to live
• ip_length - original length of the IPv4 packet
• ip_checksum - checksum value for error detection
• ip_id - IP identification field, used to assist packet
fragmentation and reassembly
• ip_offset - fragment offset: the position of the fragment's
data in the original packet, in bytes divided by 8
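Several of the features listed above correspond to fields of the IPv4 header. As a sketch of how they could be read from raw bytes, the following example builds a 20-byte header by hand and parses it with Python's `struct` module. The mapping of SERVICE to the type-of-service byte is an assumption, the function names are hypothetical, and the header values are invented for illustration.

```python
import socket
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words of the header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(raw: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IPV4_HEADER, raw[:20])
    return {
        "service": tos,                    # assumed to map to SERVICE
        "ttl": ttl,
        "protocol": proto,                 # 6 = TCP, 17 = UDP, 1 = ICMP
        "ip_length": total_len,
        "ip_id": ident,
        "ip_offset": flags_frag & 0x1FFF,  # low 13 bits, in 8-byte units
        "ip_checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Invented header: a TCP fragment with "more fragments" set, offset 185.
header = struct.pack(
    IPV4_HEADER, 0x45, 0, 60, 0x1C46, 0x2000 | 185, 64, 6, 0,
    socket.inet_aton("192.168.0.1"), socket.inet_aton("192.168.0.199"),
)
# Fill in the checksum field (bytes 10-11), computed with the field zeroed.
header = header[:10] + struct.pack("!H", ipv4_checksum(header)) + header[12:]

fields = parse_ipv4_header(header)
print(fields)
print("fragment starts at byte", fields["ip_offset"] * 8)
```

Recomputing the checksum over a header whose checksum field is already correct yields zero, which is how a receiver verifies the header.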
A random forest model was implemented and obtained an
accuracy of 99.31%.
Fig. 1. Code snippet for confusion matrix.
Fig. 2. Code snippet for training and testing of random forest model.
Fig. 3. Code snippet for data normalization using a scaler.
Fig. 4. Data split about features.
Figures 1 to 3 show the code snippets for normalizing the data,
training and testing the random forest model, and generating the
confusion matrix. Figure 4 shows the details of the data split.
The confusion matrix is a useful tool for understanding how well
the algorithm is able to distinguish between different classes.
Table 1 shows the confusion matrix.
TABLE 1. CONFUSION MATRIX OF TRAINING AND TESTING
Training accuracy = 99.29 %
Testing accuracy = 99.31%
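The workflow described by the figures (normalization, training and testing of a random forest, and generation of a confusion matrix) can be sketched as follows. This is not the original code: the data is synthetic, the labeling rule is invented so that the example has something to learn, and the use of `StandardScaler`, an 80/20 split, and 100 trees are assumptions. The accuracies it prints are therefore unrelated to the figures reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1000

# Synthetic columns named after some of the listed features.
sport = rng.integers(1024, 65536, n)
dport = rng.choice([53, 80, 443, 8080], n)
ttl = rng.choice([64, 128, 255], n)
ip_length = rng.integers(40, 1501, n)
X = np.column_stack([sport, dport, ttl, ip_length]).astype(float)

# Hypothetical labeling rule (1 = flagged), purely for illustration.
y = ((dport == 8080) | (ip_length > 1400)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Normalization: fit the scaler on training data only, then apply to both.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Training and testing of the random forest model.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

train_acc = accuracy_score(y_train, model.predict(X_train_s))
test_acc = accuracy_score(y_test, y_pred)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(f"training accuracy: {train_acc:.4f}")
print(f"testing accuracy:  {test_acc:.4f}")
print(cm)
```

Fitting the scaler on the training split alone avoids leaking information from the test split into the model.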
The F1-score of a model considers both precision and recall. It
is derived as the harmonic mean of recall and precision, giving
both measures equal weight. In a multi-class classification task,
the macro average technique averages a metric over the classes:
it combines the outcomes of a classification model across the
various classes without assigning any one class more importance.
A weighted average is an average that considers the relative
significance of the values in a data set. Each value is
multiplied by a weight, and the total of the weighted values is
then divided by the total of the weights.
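These definitions can be checked on a small worked example. The labels below are invented, and the weighted average is assumed to use each class's number of true instances (its support) as the weight, which is the convention followed by scikit-learn.

```python
import numpy as np
from sklearn.metrics import f1_score

# Invented three-class example: 6, 3, and 1 true instances per class.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 2])


def per_class_f1(y_true, y_pred, label):
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


labels = np.unique(y_true)
f1 = np.array([per_class_f1(y_true, y_pred, c) for c in labels])
support = np.array([np.sum(y_true == c) for c in labels])

macro_f1 = f1.mean()                                   # every class counts equally
weighted_f1 = np.sum(f1 * support) / np.sum(support)   # divided by total weight

print("per-class F1:", np.round(f1, 3))   # 0.833, 0.667, 1.0
print("macro F1:", round(macro_f1, 3))        # 0.833
print("weighted F1:", round(weighted_f1, 3))  # 0.8
```

The macro average is pulled up by the single-instance class 2, while the weighted average is dominated by the majority class 0.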
VIII. DESIGN AND IMPLEMENTATION CHALLENGES
Data volume and complexity - NTA typically involves
collecting and analyzing large amounts of network traffic data.
This data can be very complex, as it can include a variety of
different protocols, packet types, and encryption levels.
False positives and negatives - NTA systems can generate false
positives and negatives. False positives occur when the system
incorrectly identifies benign traffic as malicious. False
negatives occur when the system incorrectly identifies
malicious traffic as benign.
Scalability - NTA systems must scale to handle large amounts of
network traffic. This can be a challenge, as NTA systems
typically need to be able to process data in real time.
Cost - NTA systems can be expensive to implement and
maintain, since they must collect and analyze large amounts of
network traffic data.
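The false positive and false negative challenge listed above can be quantified with two simple rates. The sketch below is a minimal illustration that treats malicious traffic as the positive class; the function name and the counts are hypothetical and do not come from this work.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false positive rate, false negative rate).

    Positive class = malicious traffic, negative class = benign traffic.
    """
    # Share of benign flows that were wrongly flagged as malicious.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Share of malicious flows that were wrongly passed as benign.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical counts for 1000 flows, of which 100 are malicious.
fpr, fnr = error_rates(tp=90, fp=5, tn=895, fn=10)
print(f"false positive rate: {fpr:.4f}")
print(f"false negative rate: {fnr:.4f}")
```

Tracking both rates separately is useful because a model can reach high overall accuracy on mostly benign traffic while still missing a sizeable share of malicious flows.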
A. Applications
Network security: NTA is used to detect and mitigate security
threats, such as malware, botnets, and denial-of-service attacks.
Network performance: NTA is used to identify performance
bottlenecks and optimize network performance.
Network troubleshooting: NTA is used to troubleshoot network
problems, such as packet loss or latency.
Compliance: NTA is used to help organizations comply with
regulations, such as those related to data privacy and security.
Forensics: NTA is used to investigate security incidents and to
gather evidence for legal proceedings.
Business intelligence: NTA is used to gather insights into user
behavior and to improve business decision-making.
B. Future research directions
NTA is a rapidly evolving area, and there are a number of open
issues and future research directions. Some of these issues
include:
Heterogeneity of network traffic - As the number of devices and
applications connected to the internet increases, the
heterogeneity of network traffic also increases. This makes it
more difficult to develop NTA systems that can accurately
identify and classify all types of network traffic.
Dynamic nature of network traffic - Network traffic is
constantly changing, as new applications and protocols are
developed and as users' behavior changes. This dynamic nature
makes it difficult to design NTA systems that can keep up with
the latest changes in network traffic.
Privacy concerns - NTA systems collect and analyze large
amounts of network traffic data, which can raise privacy
concerns for users.
Cost - NTA systems can be expensive to implement and
maintain.
IX. CONCLUSION
NTA using ML is a powerful and essential approach for
gaining valuable insights, enhancing security, optimizing
performance, and automating network management. By
harnessing the capabilities of ML algorithms, organizations
can effectively process and interpret the vast amount of
network traffic data generated every day. This data-driven
approach enables them to make informed decisions, respond
rapidly to security threats, and improve the overall efficiency
of their networks. In this work, real-time NTA was implemented
using the random forest algorithm and obtained a testing
accuracy of 99.31%. Some key takeaways from NTA using ML are:
Enhanced security - ML models can detect and prevent various
security threats, including malware, intrusions, and
denial-of-service attacks, by analyzing network traffic patterns
and identifying suspicious activities in real time.
Anomaly detection - ML algorithms can identify abnormal network
behavior and performance deviations, enabling prompt action and
minimizing downtime caused by network issues.
Optimized performance - Through predictive analysis and traffic
engineering, ML can help optimize network performance,
capacity planning, and QoS management, ensuring a seamless
user experience.
Automated network management - ML models integrated into
network management systems can automate decision-making
processes, load balancing, and resource allocation, reducing
manual intervention and improving network efficiency.
Insights into user behavior - ML analysis provides valuable
insights into user behavior, allowing organizations to detect
potential insider threats or suspicious activities.
Efficient traffic classification - ML can accurately classify
network traffic into different application types, aiding in
bandwidth management and prioritizing critical applications.
Network forensics and predictive maintenance - ML assists in
network forensics by enabling the investigation of past
incidents, and it can predict equipment failures to support
proactive maintenance.
Despite the numerous benefits, NTA using ML also poses
some challenges. Properly handling and preprocessing large-
scale network data, selecting appropriate features, dealing with
imbalanced datasets, and avoiding overfitting are some of the
challenges that need to be addressed. To leverage the full
potential of NTA using ML, continuous research and
development in the fields of artificial intelligence, data
science, and cybersecurity are vital. Additionally, collaboration
between domain experts and data scientists is essential to
design effective models tailored to specific network
environments and use cases. As technology advances and new
ML techniques emerge, NTA will continue to evolve, playing
a pivotal role in securing networks, improving performance,
and enabling more efficient network management for
organizations across various industries.
REFERENCES
[1] Neeraj Namdev et al “Recent Advancement in ML Based Internet Traffic
Classification”, Procedia Computer Science,Volume 60, 2015, pp.784-
791.
[2] Nour Alqudah et al “ML for Traffic Analysis: A Review”, Procedia
Computer Science,Volume 170, 2020, pp.911-916.
[3] A, Jamuna; S.E, Vinoth Ewards. “Survey of Traffic Classification using
ML” International Journal of Advanced Research in Computer Science,
pp.65-71, 2017.
[4] T. Bujlow et al “A method for classification of network traffic based on
C5.0 ML Algorithm,” International Conference on Computing,
Networking and Communications (ICNC), 2012, pp. 237-241.
[5] Amin Shahraki et al “A comparative study on online ML techniques for
network traffic streams analysis”,Computer Networks,Vol. 207, 2022,
pp.1389-1286.
[6] A. Priya et al “An Analysis of real-time network traffic for identification
of browser and application of user using clustering
algorithm,”International Conference on Advances in Computing,
Communication Control and Networking (ICACCCN), 2018, pp. 441-
445.
[7] V. A. Muliukha et al “Analysis and Classification of Encrypted Network
Traffic Using ML,” International Conference on Soft Computing and
Measurements (SCM), 2020, pp. 194-197.
[8] E. Nazarenko et al “Application for Traffic Classification Using ML
Algorithms,” International Conference Quality Management, Transport
and Information Security, Information Technologies (IT&QM&IS),
2020, pp. 269-273.
[9] E. Osa et al “Comparative Analysis of ML Models in Computer Network
Intrusion Detection,” International Conference on Disruptive
Technologies for Sustainable Development (NIGERCON), 2022, pp. 1-5.
Proceedings of the International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS 2023)
IEEE Xplore Part Number: CFP22DN7-ART; ISBN: 979-8-3503-0085-7
979-8-3503-0085-7/23/$31.00 ©2023 IEEE 377
Authorized licensed use limited to: R V College of Engineering. Downloaded on December 21,2023 at 07:51:05 UTC from IEEE Xplore. Restrictions apply.
[10] L. Trajković et al “Data Mining and ML for Analysis of Network Traffic
PLENARY TALK,” 2021 IEEE 19th International Symposium on
Intelligent Systems and Informatics (SISY), 2021, pp. 11-12.
[11] Yang et al “Research on Network Traffic Identification based on ML
and Deep Packet Inspection”, 2019, pp. 1887-1891.
[12] Y. Xue et al “Traffic classification: Issues and challenges,” International
Conference on Computing, Networking and Communications (ICNC),
2013, pp. 545-549.
[13] A. Boukhalfa et al “NTA using Big Data and DL Techniques,”
International Conference on Optimization and Applications (ICOA),
2020, pp. 1-4.
[14] K. Limthong et al “Network traffic anomaly detection using ML
approaches,” IEEE Network Operations and Management Symposium,
2012, pp. 542-545.
[15] M. Shafiq et al “Network Traffic Classification techniques and
comparative analysis using ML algorithms,” IEEE International
Conference on Computer and Communications (ICCC), 2016, pp. 2451-
2455.
[16] M. Ramires et al “Network Traffic Classification using ML: A
Comparative Analysis,” 2022 17th Iberian Conference on Information
Systems and Technologies (CISTI), 2022, pp. 1-6.
[17] D. Szostak, K et al “Short-Term Traffic Forecasting in Optical Network
using Linear Discriminant Analysis ML Classifier,” International
Conference on Transparent Optical Networks (ICTON), 2020, pp. 1-4.
[18] M. Soykan et al “Tor Network Detection By Using ML And Artificial
Neural Network,” 2021 International Symposium on Networks,
Computers and Communications (ISNCC), 2021, pp. 1-4.
[19] Iraj Lohrasbinasab et al “From statistical- to ML-based network traffic
prediction”, Transactions on Emerging Telecommunications
Technologies,2016.
[20] Singh Kuldeep et al “A Near Real-time IP Traffic Classification Using
ML” International Journal of Intelligent Systems and Applications.
Vol.5, pp.83-93, 2013.
Proceedings of the International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS 2023)
IEEE Xplore Part Number: CFP22DN7-ART; ISBN: 979-8-3503-0085-7
979-8-3503-0085-7/23/$31.00 ©2023 IEEE 378
Authorized licensed use limited to: R V College of Engineering. Downloaded on December 21,2023 at 07:51:05 UTC from IEEE Xplore. Restrictions apply.