Computer Networks 207 (2022) 108836
Available online 16 February 2022
1389-1286/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Contents lists available at ScienceDirect
Computer Networks
journal homepage: www.elsevier.com/locate/comnet
Survey paper
A comparative study on online machine learning techniques for network
traffic streams analysis
Amin Shahraki a,∗, Mahmoud Abbasi b, Amir Taherkordi c, Anca Delia Jurcut a
a School of Computer Science, University College Dublin, Dublin, Ireland
b Department of Computer Engineering, Islamic Azad University, Mashhad, Iran
c Department of Informatics, University of Oslo, Oslo, Norway
ARTICLE INFO
Keywords:
Machine learning
Online learning
Network traffic streams
Network traffic classification
Internet of Things
Deep Learning
ABSTRACT
Modern networks generate a massive amount of traffic data streams. Analyzing this data is essential for various
purposes, such as network resources management and cyber-security analysis. There is an urgent need for data
analytic methods that can perform network data processing in an online manner based on the arrival of new
data. Online machine learning (OL) techniques promise to support this type of data analytics. In this paper,
we investigate and compare the OL techniques that facilitate data stream analytics in the networking domain.
We also investigate the importance of traffic data analytics and highlight the advantages of online learning in
this regard, as well as the challenges associated with OL-based network traffic stream analysis, e.g., concept
drift and imbalanced classes. We review the data stream processing tools and frameworks that can be used
to process such data online or on-the-fly, along with their pros and cons and their integrability with de facto
data processing frameworks. To explore the performance of OL techniques, we conduct an empirical evaluation
of different ensemble- and tree-based algorithms for network traffic classification. Finally,
the open issues and the future directions in analyzing traffic data streams are presented. This technical study
presents valuable insights and an outlook for the network research community when dealing with the requirements
and purposes of online data streams analytics and learning in the networking domain.
1. Introduction
The increasing interest in the adoption of new networking paradigms, e.g., Internet of Things (IoT),
Internet of Vehicles (IoV), and the new generations of cellular networks, i.e., 5G and 6G [1], has led
to significant growth of networks and, consequently, of data communication in the digital world [2].
The volume of mobile data traffic generated in 2022 is estimated at about one zettabyte, as reported
in the Cisco Annual Internet Report [3]. In addition, due to the remarkable advances in sensing
technologies, the computation power of miniaturized devices, and communication protocols, it is very
attractive to deploy smart devices in various applications such as smart factories, healthcare, and
smart cities. Data in such applications is generated and/or consumed at massive scales over time, and
transmitted through various communication types, ranging from Device-to-Device (D2D) to satellite
communication [4,5].
Emerging networking and computing paradigms, such as 5G and 6G [1] and Edge computing [6], motivate
researchers to equip networks with Self-organizing Networks (SON) and Self-Sustaining Networks (SSN)
techniques [7]. Monitoring the performance of networks is the keystone of such organizing techniques.
Although different platforms have been introduced to monitor network performance by analyzing network
traffic streams, processing such a huge volume of data is a major challenge [8]. The difficulty stems
from the fact that the data can be huge and complex in many IoT applications, such as Augmented
Reality (AR), vehicular networks, interactive gaming, and event monitoring [5]. The need for analyzing
such data has led to the emergence of Network Traffic Monitoring and Analysis (NTMA) techniques, which
can be used for different purposes, including discovering hidden traffic stream patterns [9], making
control decisions (e.g., routing and assigning resources), and forecasting possible events (e.g.,
network traffic prediction) and problems (e.g., fault and security management) [8]. The
time-sensitivity and high-speed data streaming nature of these applications and networks call for
real-time (or near real-time) and online data analytics approaches. Although many traditional NTMA
techniques exist [10], new techniques should be proposed, or existing NTMA methods may need to be
adapted, for analyzing data streams in such applications.
∗Corresponding author.
E-mail address: am.shahraki@ieee.org (A. Shahraki).
https://doi.org/10.1016/j.comnet.2022.108836
Received 23 July 2021; Received in revised form 23 December 2021; Accepted 3 February 2022
Different NTMA techniques have been introduced, e.g., heuristic models, statistical techniques, and
Machine Learning (ML) techniques, which mainly focus on extracting patterns and anomalies from the
streams. Among them, ML techniques generally promise better performance with respect to accuracy and
speed [11]. Different ML techniques have been introduced to process streams and time-series datasets,
e.g., Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). Although there is a large
body of literature in this field, some important challenges should be considered when applying these
techniques in real-world NTMA applications, as listed below:
• Concept drift [12]: The statistical analysis of data streams is a challenging task because the
distribution of the training samples and of the samples seen in operation changes over time, a
phenomenon known as concept drift. Long-term learning under concept drift may degrade the
accuracy of ML models.
• “Theory of network”: ML models trained on benchmark datasets are not as accurate when used in
real-world networks; thus, each ML model should be trained separately for each specific
network [13].
• Re-training problem [14]: Although a trained ML model can perform well at the beginning, its
accuracy can drop considerably due to the dynamicity of networks. ML models should be
re-trained, but the networks cannot be left unattended during re-training.
• Shortage of representative datasets: Public datasets that can represent the new networking
paradigms, e.g., IoT networks, are scarce. In addition, in the case of NTMA, most existing
datasets are based on statistics gathered from the networks, e.g., network traffic matrices,
which cannot reflect real-world networks.
• Interacting with the network: Most networking paradigms are very dynamic with respect to, e.g.,
mobility, and generate various traffic streams with different characteristics. Thus, the target
ML model should be able to interact with the network and update the trained model gradually to
support new situations.
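The concept drift and re-training challenges listed above can be made concrete with a small monitor that watches a model's recent mistakes. The following is a minimal sketch, not a method taken from the cited works: the class name `WindowedErrorMonitor`, the window size, and the tolerance are all illustrative assumptions. It compares the error rate over the most recent predictions with the error rate in an older reference window and raises a flag when the gap grows, which could serve as a trigger for re-training.

```python
from collections import deque


class WindowedErrorMonitor:
    """Flags possible concept drift when the recent error rate rises
    well above the error rate observed in an older reference window."""

    def __init__(self, window=50, tolerance=0.2):
        # `window` and `tolerance` are illustrative values, not from the paper.
        self.reference = deque(maxlen=window)  # older prediction outcomes
        self.recent = deque(maxlen=window)     # newest prediction outcomes
        self.tolerance = tolerance

    def update(self, mistake):
        """Record one outcome (True = wrong prediction); return True on suspected drift."""
        if len(self.recent) == self.recent.maxlen:
            # The oldest "recent" outcome moves into the reference window.
            self.reference.append(self.recent[0])
        self.recent.append(1 if mistake else 0)
        if len(self.reference) < self.reference.maxlen:
            return False  # not enough history to compare yet
        reference_rate = sum(self.reference) / len(self.reference)
        recent_rate = sum(self.recent) / len(self.recent)
        return recent_rate - reference_rate > self.tolerance


monitor = WindowedErrorMonitor(window=50, tolerance=0.2)
# Stable phase: one mistake in ten. Drift phase: every prediction is wrong.
outcomes = [i % 10 == 0 for i in range(100)] + [True] * 50
alarms = [step for step, wrong in enumerate(outcomes) if monitor.update(wrong)]
```

In this toy run, `alarms` stays empty during the stable phase and only fills once the simulated drift begins. Established drift detectors are more statistically careful than this fixed-threshold comparison.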
As mentioned in [15], there are two views of ML techniques: batch/offline learning and online
learning. Traditional learning-based methods, e.g., batch/offline machine learning, cannot adjust
their structure and parameters to obtain the insights hidden in dynamic traffic data. As a result, we
need novel data analytics techniques that are able to adapt to new data arriving online. This is
essentially a characteristic of OL [16]. OL is an incremental learning approach in which the
algorithm receives data sequentially, builds a model, puts the model into operation, and updates it
frequently with new data. In OL, the whole training dataset is not at hand at the beginning of the
training process; the model is continually tuned as more instances are received [17]. OL can solve
most of the aforementioned challenges fully or partially. In light of this, we aim to address the
following research question in this paper:
Would online learning techniques be a more efficient design choice for
analyzing network traffic streams as compared with offline methods?
To answer this question, we provide a comprehensive study of how OL can deal with network traffic
stream analysis. We review the main methods and techniques for data stream mining in networks. Then,
different OL frameworks are discussed, including their pros and cons. Following the tools and
frameworks, we study the challenges of offline methods for network traffic stream analysis, the
advantages of using OL, and the challenges in using online learning techniques. To evaluate the
performance of OL in network traffic stream analysis, we evaluate five online ensemble-based
algorithms, as well as three online tree-based algorithms, on different benchmark datasets.
Last but not least, we consider the open issues and future directions in using OL for network
traffic stream analysis and conclude the study. To summarize, the main contributions of our work
include:
• Investigating the OL paradigm for network traffic stream analysis
• Making a comprehensive comparison between frameworks for stream processing
• Investigating and discussing the challenges associated with OL in the context of networking,
e.g., concept drift and imbalanced classes
• Providing an empirical evaluation of the performance of OL for network traffic classification
• Providing a detailed analysis of the open issues and future directions in analyzing traffic data
streams
The methodology we adopted for this comparative study consists of the following steps. First, we
compiled a list of the main stream processing tools, as they play a key role in gaining valuable
knowledge from massive data. The list is based on a comprehensive search of relevant scientific
articles, industry white papers, and widely used open-source data streaming tools and services.
Second, we searched carefully for research contributions reported in the literature and relevant
research projects on OL and network data processing. We selected contributions with high citation
counts or published in top-ranking venues and journals. Finally, for the empirical evaluation, we
selected the most common learning methods in the streaming setting based on the available literature.
The rest of the paper is organized as follows. Section 2 discusses existing work on the OL and data
streams analytics. Section 3 introduces OL, the learning procedure, the available OL algorithms,
techniques and frameworks, and investigates the advantages of OL for traffic data streams analytics.
Further, Section 4 discusses the major weaknesses of traditional batch learning approaches in
analyzing the network traffic streams, while highlighting the key advantages of OL methods for
communication systems. Furthermore, Section 5 investigates existing challenges in OL, focusing
especially on the concept drift and the imbalanced classes. Moreover, the performance of OL for
network traffic classification is empirically evaluated in Section 6. Finally, Section 7 introduces
the open issues and future directions, while Section 8 concludes the paper.
2. Related work
OL and the associated challenges have been well researched in
existing literature [18]. However, its applicability in communication
systems and networks is quite recent. To the best of our knowledge,
this is the first study that investigates the particular relation between
OL and traffic data stream analytics and the challenges in apply-
ing OL methods in communication systems. It is noteworthy that a
limited number of papers discuss OL and other learning-based tech-
niques for some specific types of network traffic analysis applications,
e.g., intrusion detection and security.
The work in [19] provides a comprehensive study on OL. The paper discusses OL's applications, the
taxonomy of OL techniques, theoretical aspects, and future directions of OL. The author in [20]
reviewed and analyzed a wide range of online convex optimization algorithms, including Dual
Averaging, Mirror Descent, FTRL, and FTRL-Proximal, and covered many earlier papers. The authors
in [18] reviewed the paradigm of incremental OL and analyzed the related state-of-the-art
algorithms. They analyzed the main features of eight popular incremental techniques, chosen because
these techniques represent various algorithm classes. The comparison was based on online
classification errors and the behavior of the techniques in the limit. Moreover, the problem of
hyperparameter optimization was discussed for the selected techniques. Shalev-Shwartz [21] studied
online convex optimization and the vital role of convexity in achieving efficient OL techniques,
concluding that convexity can lead to efficient OL algorithms. The work in [22] discusses the
state-of-the-art incremental learning methods and presents a survey of the main research activities
in incremental learning. A similar work has been conducted by Joshi and Kulkarni in [23]. In [24],
Masana et al. conducted a comprehensive experimental evaluation of twelve recently proposed
class-incremental learning methods on several large-scale datasets and multiple network
architectures. Last but not least, the work in [25] reviewed the incremental learning techniques for
face recognition applications. Furthermore, the authors provide a novel taxonomy of incremental
learning techniques and analyze them for face recognition tasks over different datasets.
Fig. 1. Data stream mining algorithms and techniques.
3. Online learning: Concepts and techniques
Before diving into the details, we provide the fundamental background relevant to this work.
Batch/offline learning refers to a learning-based approach in which the complete training dataset is
available prior to the training phase, and hence, the whole dataset can be taken into account in
adjusting the structure and parameters [26]. One of the fundamental assumptions in batch/offline
learning is that the data are approximately independent and identically distributed (i.i.d.). In
contrast, OL refers to a learning strategy in which a learner is continuously updated based on
arriving streaming data. In another type of OL, the learner can be updated through retraining on
newly arrived batches of data [17]. A large body of the machine learning literature emphasizes the
difference between the terms online learning and incremental learning [26,27]. This literature
defines incremental learning as a learning technique that processes data in batches or chunks, while
in online learning, data arrives gradually over time, and the learner processes the data items one
by one. From a different perspective, online learning is considered a form of incremental
learning [28]. Throughout this paper, we use the terms ‘online learning’ and ‘incremental learning’
interchangeably, although they are not strictly the same concept.
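The distinction between item-by-item (online) and chunk-wise (incremental) processing can be illustrated with a deliberately simple estimator. The sketch below is an illustration under stated assumptions, not an algorithm from the cited literature: the class `RunningMean` and its method names are hypothetical, and a running mean stands in for a real learner. Both update styles reach the same estimate without storing past samples.

```python
class RunningMean:
    """Maintains a mean without keeping past samples in memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update_one(self, x):
        # Online style: fold in a single item as soon as it arrives.
        self.count += 1
        self.mean += (x - self.mean) / self.count

    def update_chunk(self, chunk):
        # Incremental (chunk) style: fold in a whole batch at once.
        if not chunk:
            return
        n = len(chunk)
        chunk_mean = sum(chunk) / n
        total = self.count + n
        self.mean += (chunk_mean - self.mean) * n / total
        self.count = total


stream = [4.0, 8.0, 6.0, 2.0, 10.0, 6.0]

online = RunningMean()
for x in stream:
    online.update_one(x)

chunked = RunningMean()
for start in range(0, len(stream), 3):
    chunked.update_chunk(stream[start:start + 3])
```

Both `online.mean` and `chunked.mean` end at the plain average of the stream. For a real learner the two styles generally do not coincide exactly, which is one reason the literature keeps the terms apart.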
Apart from its definition, OL plays an active role in a variety of applications, including robotics
and human–computer interaction (HCI), data streams analytics and big data processing, image and
video processing, automated data annotation, and outlier detection [29,30]. The majority of these
applications are incremental in nature because the data generated and/or collected arrives over time
as a stream and they are open-ended systems. The advantages of adopting OL algorithms include
working with limited memory, handling concept drift, and treatment of streaming/big data [26]. It is
noteworthy that OL appears under different paradigms in the literature, such as supervised learning,
unsupervised learning, and semi-supervised learning. In this paper, we will largely consider
supervised learning techniques because of their popularity in various real-world applications.
3.1. Learning procedure
In supervised online learning, it is assumed that data items
$D = ((x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_m, y_m))$ become available one sample at a
time. We refer to $x_i$ as the input instance and $y_i$ as the target value or label. For a
classification task, $y_i$ takes discrete values, and in a regression task, it takes continuous
values. One pair $(x_i, y_i)$ is a training example. We aim to build a predictive model
$F \approx p(y \mid X)$ from training examples. In batch machine learning, algorithms are often
trained on all training examples, since the data is available in advance. However, there are
situations in which data $D$ is not at hand in advance, e.g., the output of a fall detection system.
In such applications, data arrives over time, i.e., input instance $x_t$ at time step $t$ and
$x_{t+1}$ at time step $t+1$. Given the classification task, we aim to construct a working
classifier $F_t$ after every time step by means of the training example $(x_t, y_t)$ and the model
from the previous step, $F_{t-1}$. This is possible through adopting OL techniques, which receive
training samples one by one, while their true labels are not known at the time of receiving. The OL
algorithms use the training examples to optimize their loss/cost function and fine-tune their
parameters. To this end, OL can use stochastic optimization methods, like online backpropagation
algorithms [31] and self-organizing maps (SOM) [32]. It is noteworthy that the online classifier $F$
will obtain the true label $y_t$ after a few time steps, and then use it to assess the performance
of the classifier and further improve it. Moreover, the time intervals between different training
examples (e.g., $t$ to $t+1$ and $t+1$ to $t+2$) are not necessarily the same.
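The step from $F_{t-1}$ to $F_t$ described above can be sketched as a test-then-train loop. This is a minimal illustration under stated assumptions, not the procedure of any cited work: logistic regression with one stochastic-gradient step per example is swapped in as the learner, the synthetic stream and all names (`OnlineLogisticRegression`, `learn_one`, `synthetic_stream`) are hypothetical, and the true label is assumed to arrive right after the prediction rather than a few time steps later.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary classifier updated by one stochastic-gradient step per example."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # illustrative value

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # Gradient of the log loss for a single example is (p - y) * x.
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


def synthetic_stream(n, seed=0):
    """Hypothetical stream: the label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, (1 if x[0] + x[1] > 1.0 else 0)


model = OnlineLogisticRegression(n_features=2, learning_rate=0.5)
correct = 0
seen = 0
for x_t, y_t in synthetic_stream(2000):
    correct += int(model.predict(x_t) == y_t)  # test first, using F_{t-1}
    model.learn_one(x_t, y_t)                  # then train to obtain F_t
    seen += 1
prequential_accuracy = correct / seen
```

Because every example is used for testing before it is used for training, `prequential_accuracy` measures performance on data the model has not yet seen, without a separate hold-out set.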
3.2. Algorithms, techniques and frameworks of OL
The techniques mainly utilized in data stream mining are Classification, Regression, Ensemble
methods, and Clustering, as depicted in Fig. 1. In classification, the learning model seeks to
predict the class of a new instance, i.e., a label from a set of nominal labels is assigned to each
item. A good example is an intrusion detection system that labels incoming traffic as normal or
attack. Another example is predicting whether an incoming email message is spam or not. The main
algorithms used for classification include Baseline Classifiers, Decision Trees, Lazy Learning, and
Handling Numeric Attributes. Like classification, regression is used for prediction tasks [15,33],
but numeric values are predicted instead of nominal labels. One significant example of a regression
task is predicting the value of a stock in the stock market over the coming days. Some techniques
used for classification can also be applied to regression directly, while others need to be adapted
before being applied to regression tasks. Such techniques include the Lazy Learning and Decision
Tree algorithms.
In supervised machine learning techniques (e.g., in classification and regression), samples of the
labeled data are used in the learning process. The trained model is then used later for unseen data.
When labeled data is difficult to obtain, clustering, which is an unsupervised learning model, can
be used. In clustering, the input data is grouped based on the similarities between the data
samples. Popular clustering algorithms used for stream mining include K-Means, StreamKM++, DBSCAN,
Den-Stream, BIRCH, BICO, and CluStream. Finally, Ensemble Predictors are another technique used in
streaming scenarios to improve the prediction accuracy by combining single models that are
individually weak. Popular ensemble techniques are Bagging and Boosting.
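To give a feel for clustering on a stream, the following minimal sketch uses the classic sequential k-means update, in which each arriving point moves its nearest centroid toward itself. It is a simplified stand-in and not an implementation of StreamKM++, CluStream, or the other algorithms named above; the class name `SequentialKMeans`, the method names, and the caller-supplied starting centroids are assumptions made for illustration.

```python
class SequentialKMeans:
    """Clusters a stream by moving the nearest centroid toward each new point."""

    def __init__(self, initial_centroids):
        # Starting centroids are supplied by the caller (an assumption of this sketch).
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def nearest(self, x):
        # Squared Euclidean distance is enough to pick the closest centroid.
        distances = [sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                     for c in self.centroids]
        return distances.index(min(distances))

    def learn_one(self, x):
        k = self.nearest(x)
        self.counts[k] += 1
        step = 1.0 / self.counts[k]  # running-mean step size
        self.centroids[k] = [c + step * (xi - c)
                             for c, xi in zip(self.centroids[k], x)]
        return k


clusterer = SequentialKMeans([(1.0, 1.0), (9.0, 9.0)])
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0),
          (10.0, 8.0), (2.0, 0.0), (8.0, 10.0)]
assignments = [clusterer.learn_one(p) for p in points]
```

Each centroid ends up as the running mean of the points assigned to it, and only the centroids and counts are kept in memory. With the `1 / count` step size the clusterer adapts ever more slowly, so a stream subject to concept drift would more plausibly use a constant step or a forgetting factor.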
Given the ever-increasing growth of big data and the development of IoT systems, data stream
processing plays a key role in gaining valuable knowledge from the massive amount of real-time data
generated in different scenarios (e.g., smart cities) [34]. Hence, several corporations and
researchers have provided frameworks and software libraries for streaming data analytics. In
Table 1, we provide a summary of the available stream processing tools (i.e., frameworks, libraries,
toolkits, etc.). Our summary provides relevant information per tool and evaluates the tools with
respect to their pros and cons. In preparing the table, we do not seek to provide a full list of all
stream processing tools; rather, we aim to give beginners a starting point and researchers a basis
for their contributions. In addition to the pros and cons, Table 1 reviews the integration
opportunities of each stream processing tool, showing how different tools can be used together to
compensate for each other's weaknesses or complement each other. The tools also differ in type
(e.g., engine, library, and data pipeline, to name a few) and can complement each other if
compatible. Additionally, we have investigated whether the tools are maintained or not. In some
cases, it is difficult to determine their status, as a tool may have no official end-of-life
announcement even though it has not been updated for a long period of time.
4. Analyzing network traffic streams: Offline vs. online learning
In this section, we focus on the importance of data generation in communication systems and
networks, highlight the major weaknesses of traditional batch learning approaches in analyzing
network traffic streams, and discuss the key advantages of OL methods for communication systems. At
the moment, IoT is considered the most important emerging networking paradigm. Thus, in this
section, we emphasize IoT and related paradigms, e.g., Industrial IoT (IIoT), as one of the most
important real-world applications of network traffic stream analytics.
4.1. The importance of gaining insight into network traffic
Given the ever-increasing interest in the deployment of communication systems and networks, such as
IoT, IIoT, and 5G, traffic data analytics will be of paramount importance for traffic prediction and
classification, security purposes, industrial machines maintenance, and real-time wildfire
monitoring and prediction in new networking paradigms. As a motivating example, one can refer to
smart home security, in which IoT streaming data analytics is used for intrusion detection [35].
Data analytics has also become important for time-sensitive IoT applications, such as vandalism and
accident detection, interactive gaming, and augmented reality [36]. In such applications, sensors,
actuators, and controllers generate real-time data streams. Employing effective machine learning
techniques to process this data, especially at the edge of the network [37], can lead to the
following benefits: low-latency communications, real-time insights, and the ability to perform
complex control tasks [38]. Moreover, fast IoT data stream analytics is highly desired for
applications that produce high-speed data streams, e.g., autonomous cars, in which the vehicle or
the driver needs to take time-sensitive, real-time (or near real-time) actions.
Industrial IoT is another application that benefits substantially from data streams analytics. In
IIoT, the deployment of a wide range of sensors and IoT devices, powered by learning models, makes
it possible to obtain actionable insights in order to build modern control systems, perform careful
and continuous monitoring, and deliver new services [39]. IIoT data analytics has demonstrated its
ability to improve industrial processes by minimizing the downtime of industrial equipment, making
the maintenance of machines easier, and predicting the failures and remaining useful life of
industrial machines. In the Industry 4.0 era, there are many condition-based maintenance (CBM)
methods that analyze the data gathered by IIoT devices or sensors to estimate the point in time when
industrial equipment starts to work irregularly, so that it can be fixed or its defective elements
replaced beforehand [40].
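Detecting the moment a machine starts to work irregularly can be sketched as a streaming outlier test on sensor readings. The example below is a minimal sketch and not one of the CBM methods surveyed in [40]: it swaps in a plain z-score test on top of Welford's running mean and variance, and the class name `StreamingZScore`, the threshold, the warm-up length, and the readings are all illustrative assumptions.

```python
import math


class StreamingZScore:
    """Tracks mean and variance with Welford's method and scores new readings."""

    def __init__(self, threshold=3.0, warmup=10):
        # `threshold` and `warmup` are illustrative choices, not from the paper.
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.threshold = threshold
        self.warmup = warmup

    def is_anomaly(self, x):
        """Score x against the statistics seen so far, then absorb it if normal."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                flagged = abs(x - self.mean) / std > self.threshold
        if not flagged:
            # Only readings judged normal update the baseline.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return flagged


# Hypothetical vibration readings with one irregular spike.
readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 19.9,
            20.3, 19.7, 20.0, 20.1, 35.0, 20.2]
detector = StreamingZScore(threshold=3.0, warmup=10)
flags = [detector.is_anomaly(r) for r in readings]
```

Only the spike is flagged in this toy run. Excluding flagged readings from the baseline keeps a single outlier from inflating the variance, at the cost of never adapting to a genuine, lasting change in the machine's behavior.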
Returning to mobile networks, we further discuss the importance of traffic data analytics. Based on
the Cisco Annual Internet Report (2018–2023), the number of mobile users in the world is expected to
be around 5.7 billion by 2023. Moreover, cellular network Internet speeds will experience a more
than three-fold increase by 2023. As a result, the mobile cellular network is becoming the main
gateway for accessing the Internet. In order to handle the ever-increasing rise in mobile data,
telecommunications operators have to manage their available resources carefully. Therefore, the
capability for careful and continuous monitoring and analysis of the network traffic is an essential
step towards routine network operations and optimization tasks. Generally speaking, in the context
of mobile networks, one can categorize the applications of traffic data analytics into nine major
categories: (I) traffic classification and prediction, (II) apps identification, (III) usage study,
(IV) security purposes (e.g., leakage and malware detection), (V) user action identification,
(VI) operating system (OS) identification, (VII) location-based services, (VIII) user
fingerprinting, and (IX) website fingerprinting. The interested reader is referred to [41] for the
details of all applications.
Traffic classification is a notable example of the applications of network traffic data analytics.
It plays a key role in different network administration tasks, such as QoS provisioning, network
pricing, intrusion detection, etc. [42]. The unique characteristics of modern mobile networks, such
as complexity, dynamics, and heterogeneity, call for traffic classification methods that can handle
the new challenges, e.g., concept drift, imbalanced classes, and traffic classification in an almost
instantaneous fashion (real-time or near real-time). Instantaneous traffic classification is
especially important for security-related applications, in which a classifier should learn
continuously and detect threats in real-time or near real-time. That is why traditional offline
learning algorithms are ineffective tools for these purposes, as explained in the following
subsection.
4.2. Challenges of offline learning in network traffic streams analytic
Regarding the characteristics of data of new networking paradigms,
challenges of using batch/offline learning algorithms in the networking
domain include:
•Dramatic growth in volume of data: In terms of mobile net-
works, the past decade has been experienced a dramatic increase
in mobile devices (e.g., smartphones and tablets). These devices
generate (send or receive) an enormous volume of data, such as
video, image, text, and location-based information. To analyze
such a huge amount of data, one may capture data traffic at
different network layers (e.g., the application or data link layer)
or from different sources (e.g., within wireless WAN and within
devices). In this case, storing and organizing such volume of data
Computer Networks 207 (2022) 108836
5
A. Shahraki et al.
Table 1
Properties of stream processing frameworks.
No. Framework Core
Language
Type Source Raised
from
Integration
opportunity
by
Update
status
Pros Cons
Open
Closed
1 Apache Apex Java Engine Apache Hadoop
YARN
Retired •Modular and highly
scalable
•Simple API
•Low latency processing
•Fault tolerant and secure
•Restricted support for
SQL
•No longer extensively
used
•Incompatible with other
tools
2 Apache Kafka Java/Scala Library Apache – Ongoing •Low latency and high
throughput
•Fault tolerant and
scalable
•Handling real-time data
pipeline
•Easy data accessibility
•Message tweaking-related
performance degradation
•Lack of sufficient
collection of monitoring
and managing tools
3 Apache Flink Java Engine Apache YARN,HDFS,
and HBase
Ongoing •Supports both batch and
stream processing
•Low latency and high
throughput
•Fault tolerant
•Support for iterative
programs
•No need to manual
optimization
•An almost new
framework with fewer
production deployments
than its counterparts
•Some challenges with
scaling
4 Apache
Pulsar
Java Data pipeline Apache – Ongoing •Horizontally scalable
•Low latency and load
balancer
•Transparent and
consistent
•Easy deployment
•Small community
•Relatively high
operational complexity
•No transactions
5 Apache
Storm
Java/Clojure Engine Apache/
BackType
– Ongoing •Providing true real-time
processing
•Flexibility in using for
different scenarios
•Extensive language
support
•Scalable and reliable
•Low latency and fault
tolerant
•High implementation
complexity
•Does not provide
ordering of messages
6 Aeron Java/C++ Toolkit – Simple
Binary
Encoding
Ongoing •Low latency and high
throughput
•High reliability and
transparency
•Easy to monitor
•Lack of community
•Lack of data encryption
7 Apache
Heron
Java Engine Apache/
BackType
Apache
Mesos
Ongoing •Low processing latency
•High throughput
•Stability
•Dependency on Apache
Mesos
8 Databus Java Data pipeline – LinkedIn Ongoing •Source consistency
•Scalability and high
availability
•Low latency
•Source-independent
•Process of seeding the
Bootstrap database can be
challenging when dealing
with large datasets
•Oracle fetcher
performance degradation
in some cases
9 Kinesis
Streams
Java Engine Amazon Amazon’s big
data toolset
–•Robust and scalable
•Integration ability with
other Amazon’s big data
tools
•Easy to set up and
maintain
•Cost of cloud service
10 Cloud
Dataflow
Java, Python,
SQL, Scala
Engine Google – – •Reliable and consistent
•Easy management and
operations
•Flexible scheduling and
autoscaling
•Should pay
•Bound to Google tools
•Not good for
experimental data
processing scenarios
(continued on next page)
Computer Networks 207 (2022) 108836
6
A. Shahraki et al.
Table 1 (continued).
Columns: No., Framework, Core language, Type, Source (open/closed), Raised from, Integration opportunity by, Update status, Pros, Cons.

11 MOA. Core language: Java. Type: Engine. Raised from: University of Waikato. Integration opportunity by: –. Update status: –.
Pros:
•Growing community
•Supports different data stream algorithms
•Graphical interface
•Benefits from object-oriented techniques
•Easy to use or to extend
Cons:
•Does not support parallel computing

12 Benthos. Core language: Go. Type: Library. Raised from: –. Integration opportunity by: Meltwater. Update status: Ongoing.
Pros:
•Benefits from powerful mapping language
•Easy to monitor and deploy
•Continuous and stable development
•Vertical and horizontal scaling
•Message delivery guarantee
Cons:
•Need for additional implementation of tooling if the custom processors are stateful
•The effect of the buffer resiliency method on delivery guarantees

13 NATS streaming. Core language: Go. Type: Data pipeline. Raised from: Synadia. Integration opportunity by: Cloud Native Computing Foundation. Update status: –.
Pros:
•Reliable and lightweight
•Simple and secure
•Auto discovery
Cons:
•Lack of appropriate authentication
•Large message sizes
•Superficial context integration

14 NSQ. Core language: Go. Type: Data pipeline. Raised from: StatsD. Integration opportunity by: –. Update status: Ongoing.
Pros:
•Fault tolerant and high availability
•Easy configuration/deployment
•Low latency and horizontal scalability
•Transport layer security
Cons:
•Lacks durability and replication
•No message order guarantee

15 StreamAlert. Core language: Python. Type: Library. Raised from: Airbnb. Integration opportunity by: AWS. Update status: Ongoing.
Pros:
•Customizable
•Broad community rules
•Secure by design
•Simple and safe deployment
•Highly scalable
Cons:
•Some security and multitenancy concerns as it is a serverless framework
•Challenge with integration testing

16 Wallaroo. Core language: Python. Type: Engine. Raised from: Wallaroo Labs. Integration opportunity by: –. Update status: Ongoing.
Pros:
•Portable and easy to deploy
•Low latency and high performance
•Scalable and ultrafast
•Low-latency and high-performance data processing
•Portable and easy to use
•Scalable

17 Streamparse. Core language: Python. Type: Domain Specific Language (DSL). Raised from: –. Integration opportunity by: Apache Storm. Update status: Ongoing.
Pros:
•Highly scalable and parallel
•Lets users write Apache Storm components in Python
•A more robust option than Python worker-and-queue systems
Cons:
•Does not provide a windowing API
•Challenge with internal architecture
•Some limitations of the Apache Thrift-based interface used for sending the topology to Storm

18 Faust. Core language: Python. Type: Engine. Raised from: Robinhood Markets. Integration opportunity by: Robinhood. Update status: Ongoing.
Pros:
•Highly available and flexible
•Provides both stream and event processing
•Supports other Python libraries during stream processing
Cons:
•Small community

19 IBM Streams. Core language: Python, Java, Scala. Type: Software platform. Raised from: IBM. Integration opportunity by: –. Update status: –.
Pros:
•Supports a wide range of streaming data
•Rich data connections
•User friendly
•Integration ability with business process automation
•Scalability
Cons:
•Exception handling challenges
•Cost and platform availability
process purposes would be a great challenge when using offline
techniques.
Table 1 (continued).
Columns: No., Framework, Core language, Type, Source (open/closed), Raised from, Integration opportunity by, Update status, Pros, Cons.

20 LightSaber. Core language: C++. Type: Engine. Raised from: Imperial College, Graphcore Research. Integration opportunity by: –. Update status: Ongoing.
Pros:
•Creating balance between parallelism and incremental processing
•Operator optimizations
•NUMA-aware scheduling
•Scalability and low latency
Cons:
•Small community

21 StreamDM. Core language: Scala. Type: Online machine learning. Raised from: Huawei. Integration opportunity by: Spark Streaming. Update status: Ongoing.
Pros:
•Supports various models
•Fault tolerant and ease of use
•Fast data processing
•Independent of third-party libraries
•Ease of extensibility
Cons:
•Latency of the order of seconds
•Complexity in set up and implementation
•Issues in combining batch with streaming processing algorithms

22 Yurita. Core language: Scala. Type: Online machine learning. Raised from: PayPal. Integration opportunity by: Spark Streaming. Update status: Rarely updated.
Pros:
•Simple and flexible
•Supports different use cases across domains
•Takes advantage of built-in generic base anomaly detection models and plugin custom models
Cons:
•Learning a new language
•Small community

23 Akka. Core language: Scala. Type: Library. Raised from: Lightbend. Integration opportunity by: –. Update status: Ongoing.
Pros:
•Scalable and high performance
•Transparent distribution
•Supports clustering
•Supports streaming data
Cons:
•Conceptually complex
•Implementation difficult to read

24 Azure Stream Analytics. Core language: .NET. Type: Engine. Raised from: Microsoft. Integration opportunity by: –. Update status: –.
Pros:
•Rapid scalability
•Elastic capacity
•Robust, reliable, and secure
•Low cost and available
Cons:
•Problem in handling misformatted data
•Unable to join dynamic data

•Shortage of offline learning techniques compatible with the
nature of the data: Another major challenge in traffic data
analytics lies in the models used for the analysis. In what
follows, we highlight the fundamental weaknesses of traditional
(batch/offline) learning methods when they are applied to
communication systems and networks. Traditional batch/offline
machine learning algorithms, which assume that the whole dataset
is available at training time, are no longer applicable to this
data [18]. The main reason is that modern communication systems
and networks generate data at a large scale and as continuous
streams. As a result, ever more traffic data remains raw and
unprocessed. In addition, offline learning algorithms do not
continuously feed new training examples into already built models
to keep up with the dynamic nature of communication systems.
Instead, they often retrain or rebuild models, which can be
expensive in terms of time, memory, and computation. Moreover,
using offline machine learning algorithms to mine modern traffic
data may result in out-of-date models or models that work only in
particular scenarios [43].
•Shortage of available resources to process stored data: As
mentioned, the rate of data generation in communication systems
and networks is high and ever-increasing. Offline machine
learning algorithms usually rely on collected (stored) data for
training and evaluation. However, this rapid and continuous
generation of big data places a serious burden on storage and
computational resources. In addition, a considerable portion of
traffic data exists in the form of streams. Loading such a huge
amount of streaming traffic data into the main memory of devices,
especially low-cost, low-power IoT devices with limited memory
[44], is practically infeasible.
•Concept Drift problem: Another commonly cited problem with
offline machine learning is that many traffic data analytics
tasks assume a static/stationary environment or dataset. However,
in many cases this assumption does not hold, especially over a
relatively long period. The patterns of generated data can change
over time (concept drift) for various reasons, such as the use of
new communication protocols, new applications, newly installed
devices, and the dynamic characteristics of the network
(e.g., IIoT environments) [45]. From a security perspective,
malicious users and software in communication systems and
networks may design new types of attacks, or repeatedly modify
existing ones, to deceive the predictive models. However,
traditional offline machine learning models are not able to
handle concept drift [42].
•Lack of labeled data in real-world applications: Supervised
offline learning algorithms depend on labeled training samples.
More specifically, most of these algorithms assume that suitable
labeled training samples are available in advance. This
assumption limits the usefulness of these algorithms in
real-world applications when the amount of labeled data available
for training is small or the chosen training instances are not
representative [46]. Hence, supervised offline learning
algorithms need a large amount of high-quality labeled data to
perform well.
Given the challenges associated with using offline learning algo-
rithms, and also the importance of performing a careful analysis on
online traffic data, in the next sections, we discuss the applications and
advantages of using OL algorithms towards traffic data analytics.
4.3. The advantages of OL for traffic data streams analytics
To overcome the weaknesses of offline learning algorithms, we
need sequential data processing techniques that are able to learn
from traffic data streams. The benefits of adopting such techniques are
twofold: first, they reduce the cost of learning in terms of memory,
storage space, and maintenance; second, they keep the model up to date,
because the model uses newly arriving instances to tune its parameters
and structure [18]. OL is a suitable candidate to meet this need, since
OL-based algorithms continually incorporate new training instances
into their learning process and inherently try to reduce processing
time and space. Furthermore, OL algorithms have shown a strong ability
to continually process traffic data in large-scale (Big Data) and
real-time communication systems [5,47]. OL has thus risen to the
challenges of large-scale learning, the presence of data streams, and
effective learning from a few instances.
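The one-instance-at-a-time processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited work: `OnlineLogisticRegression`, `synthetic_flow_stream`, and the learning rate are hypothetical names and values, and the two-feature synthetic stream merely stands in for real traffic features. Each instance is used once for testing and once for a single parameter update, and is never stored.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary logistic regression trained with one SGD step per instance."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, not tuned

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # Gradient of the log loss for a single instance.
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def synthetic_flow_stream(n, seed=0):
    """Hypothetical stand-in for a traffic stream: two features, linear rule."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 1 if x[0] + x[1] > 0 else 0


def prequential_accuracy(model, stream):
    """Test-then-train: predict on each instance first, then learn from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total


model = OnlineLogisticRegression(n_features=2)
accuracy = prequential_accuracy(model, synthetic_flow_stream(5000))
```

Memory use is constant in the length of the stream, because only the weight vector is kept between instances.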
OL algorithms are an important element of interactive schemes,
in which a human agent provides training instances over time. In
the literature, this is also known as stream-based or sequential active
learning [48,49]. In this strategy, training instances can be selected
based on their level of informativeness, which reduces the need for a
huge amount of labeled traffic data in traffic classification and
prediction tasks. Moreover, OL algorithms, e.g., Hoeffding Adaptive
Tree (HAT), are well suited to traffic classification because they
receive network flows one at a time and process each flow only once.
Hence, OL eliminates the need to store the stream and is ready to
predict at any moment [43]. Non-conventional types of traffic/services,
such as Peer to Peer (P2P) and online games, further complicate the
traffic classification task. P2P-based applications and many others
employ dynamic port assignment and port masquerading techniques that
make traffic classification difficult. OL algorithms are suitable for
managing the dynamic environment of modern communication systems and
networks (e.g., P2P-based services), as they are able to adapt to the
network state and to handle concept drift and unseen data. It should
be noted that concept drift can also be a challenge when one analyzes
data streams, as the incoming streams are not always strongly
stationary (i.e., non-stationary streaming data) [50]. Hence, in such
cases, these concept drifts need to be identified in the streams in a
timely manner [51].
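Selecting training instances by informativeness, as in the stream-based active learning strategy mentioned above, can be illustrated with a simple uncertainty rule. The sketch below assumes a binary classifier that outputs a probability; the function name `should_request_label`, the margin of 0.2, and the example probabilities are hypothetical, and real strategies are usually more elaborate.

```python
def should_request_label(probability, margin=0.2):
    """Return True when a binary prediction is too close to 0.5 to trust."""
    return abs(probability - 0.5) < margin


# Hypothetical predicted probabilities for five incoming flows.
predicted = [0.97, 0.55, 0.08, 0.41, 0.99]

# Only the uncertain flows are sent to the human annotator.
to_label = [i for i, p in enumerate(predicted) if should_request_label(p)]
```

Here only the second and fourth flows would be sent for labeling, so the labeling effort is spent where the model is least certain.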
As mentioned, traffic data in modern networks has two common
characteristics: large volume and diversity. This means that
big traffic data in smart devices, e.g., IoT devices and smartphones, is
continually produced at high speed, while the patterns of the data may
change over time (concept drift). OL algorithms are a powerful tool
for mining dynamic big traffic data, analyzing data patterns, and
learning informative features from them. This is mainly because OL
algorithms do not depend on a retraining process to learn from new
or historical instances in order to update or preserve the model [28].
This property of OL is of paramount importance to effective big data
representation learning, especially in real-time mobile network-based
applications.
OL is also highly beneficial to IoT, especially industrial IoT.
OL has attracted considerable interest in the context of IoT due to its
fast learning speed, simplicity, strong generalization, and the few
background statistical assumptions it requires [52]. OL algorithms
scale well to large-scale communication systems that generate
a massive amount of data, and are especially well suited to real-world
services/applications where data is received continually.
Despite the considerable advantages of OL over traditional offline
learning, OL comes with some challenges. In the next section, we
discuss the major challenges associated with using OL, especially in
communication systems and networks.
5. Challenges of online learning in network traffic stream analytics
Although OL has many advantages in theory, some challenges must be
addressed before OL can be applied in the network traffic stream
analytics domain. Compared to traditional offline machine learning
algorithms, designing online learning algorithms that can learn from
big data streams is more challenging. The reason is that OL algorithms
must learn from a single training instance at a time, which calls for
more advanced training processes [53]. Moreover, streaming data
generated by non-stationary or dynamically changing systems, e.g., IoT
and IIoT systems, have a non-stationary distribution (concept drift).
Because only a single training instance (or a limited number of
instances) is available at a time in OL, identifying such changes and
adopting methods for handling them is challenging. In addition, in many
supervised learning tasks, such as network traffic classification, the
classes are not equally represented (imbalanced classes), as there are
minority and majority classes. Imbalanced classes make the training
process difficult and further complicate the data [27,54]. Concept
drift and imbalanced classes are discussed in detail below as
the most important challenges of OL in network traffic stream analytics.
5.1. Concept drift
Concept drift refers to changes in the distribution of data generated
over time, especially in non-stationary and dynamically changing
environments, such as IoT [55]. More specifically, concept drift is
a problem in supervised OL algorithms in which the statistical
relationships between the input instances 𝑋 and the target value 𝑦
change over time in an unpredictable manner [56]. Note that, as
mentioned in Section 3.1, we use the provided learning framework
(i.e., Scikit-multiflow) to understand and analyze concept drift,
especially in the context of communication systems and networks.
Communication systems and networks are prone to concept drift, since
they generate data streams continuously while the data changes or
evolves over time. As a result, traditional offline learning algorithms
must be retrained periodically in order to adapt to changes in the
traffic data, at the cost of additional time, memory, and computation
overhead [57].
There are different types of drift, depending on which elements of
the data change. The major types of concept drift include:
•Virtual drift: Also known as covariate shift, this refers to the
situation where changes occur in the distribution of the input
instances 𝑝(𝑋), while the posterior probabilities of the target
values remain unchanged.
•Real drift: Changes in the posterior probabilities of the target
values (i.e., classes) 𝑝(𝑦|𝑋) are referred to as real drift. Real
drift may not affect the distribution of the input instances. An
example is the change in users’ interests when they follow online
streaming news channels, while the distribution of the incoming
news items often remains unchanged, as shown in Fig. 2.
•Gradual and abrupt drift: With regard to speed, drift can be
categorized as gradual or abrupt. Gradual drift is the case where
the distribution of the data changes gradually over time,
whereas abrupt drift occurs when the distribution of the data
changes suddenly, as Fig. 3 shows.
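The first two drift types in the list above can be stated compactly. As a sketch, assume that the instance arriving at time 𝑡 is drawn from a joint distribution 𝑝_𝑡(𝑋, 𝑦); the time index 𝑡 is notation introduced here for illustration and is not taken from the cited works.

```latex
\[
\begin{aligned}
\text{Joint distribution:} \quad & p_t(X, y) = p_t(X)\, p_t(y \mid X) \\
\text{Virtual drift:} \quad & p_t(X) \neq p_{t+1}(X), \quad p_t(y \mid X) = p_{t+1}(y \mid X) \\
\text{Real drift:} \quad & p_t(y \mid X) \neq p_{t+1}(y \mid X)
\end{aligned}
\]
```

Under this notation, gradual and abrupt drift differ only in how many time steps the change from one distribution to the next takes.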
Fig. 2. A graphical illustration of two concept drift types. Circles and stars represent different classes.
Fig. 3. Speed of changes over time (Gradual and Abrupt drift).
There are several other types of drift, such as recurrent and
incremental (or stepwise) drift. Interested readers are referred to
[58] for a detailed categorization of concept drift. Concept drift can
greatly affect the performance of a predictive model, especially when
the model learns from streaming data [59]. Concept drift can hinder a
wide range of services/applications in the context of communication
systems and networks, such as Intrusion Detection Systems
(IDSs) [60,61], traffic classification [62], traffic prediction [63],
and IIoT [40]. An example is Condition-Based Maintenance (CBM) in
IIoT, which is used to predict abnormal conditions and maintenance
times through IIoT data analytics. Concept drift significantly affects
the performance of CBM and consequently reduces product quality. This
is because the distribution of fault patterns may change over time due
to, for example, machine aging and the maintenance process. Hence, a
CBM technique without the ability to handle concept drift will perform
poorly. Indeed, concept drift can affect the effectiveness and
robustness of data stream analytics [64].
Concept drift is also problematic for other IoT applications, such
as smart cities. In smart cities, data may be collected for different
purposes, such as ensuring cybersecurity, air-pollution prediction,
road traffic prediction, and electricity load forecasting. However, as
time passes, unforeseen changes may occur in the collected data
(concept drift), which poses serious challenges to the accuracy of
the predictive models (e.g., anomaly detection [35] and traffic
classification [65]).
In non-stationary environments, there are some considerations that
the predictive models must take into account in order to detect and
adapt themselves to concept drift, otherwise the performance of these
models will deteriorate in terms of accuracy and robustness. As time
goes on, a predictive model may need to update its parameters and
structure by incorporating new training instances, or to replace the
old model completely, in order to handle concept drift. In this case, the
most important considerations for the predictive model include: (i)
identifying concept drift and employing an adaptive method if needed,
(ii) differentiating concept drift from noise, and (iii) being robust to the
noise.
A large body of the online machine learning literature investigates
the adaptive algorithms for the concept drift problem. In the next
section, we review and discuss the most significant ones.
5.1.1. State-of-the-art learning algorithms in the presence of concept drift
As mentioned, there is a broad category of adaptive algorithms
for learning in dynamically changing environments. A large number
of these algorithms are designed for handling specific types of drift,
while a limited number of the proposed algorithms are able to deal
with multiple types of drifts. Moreover, there is another group of algo-
rithms that considers both concept drift and imbalanced data problems
simultaneously. Note that our aim is not to provide a comprehensive
list of available drift adaptive learning algorithms, techniques and
mechanisms, but to review and discuss the main approaches for concept
drift handling, especially from a networking perspective. The interested
reader may refer to [53,66], and [17] for a more comprehensive list of
algorithms for dealing with concept drift.
Generally speaking, one can classify the algorithms for handling
concept drift into two major categories: (i) active or trigger-based
approaches, and (ii) passive approaches or adaptive classifiers. The
algorithms based on active approaches try to detect the occurrence of
concept drift before taking any measure. The passive-based algorithms,
on the other hand, update the predictive model continuously, whenever
they receive new data, without checking for the presence of concept
drift. Both passive- and active-based algorithms use their own
techniques to build a frequently updated predictive model. Regardless
of being active or passive, one of the main considerations in selecting an
algorithm for handling concept drift is the target application. In other
words, when one tries to choose an algorithm for learning in highly
dynamic environments, it is crucial for the success of the learning
algorithm to take into account the dynamics of the learning process
(e.g., the speed of drift) [66].
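To make the active (trigger-based) approach concrete, the sketch below monitors the error rate of a classifier and raises a signal when the rate rises significantly above its historical minimum. It follows the general idea of the Drift Detection Method (DDM) of Gama et al., which is named here only as an illustration and is not presented as a method taken from this section. The class name `SimpleDDM`, the thresholds, and the synthetic error sequence are assumptions.

```python
import math


class SimpleDDM:
    """Error-rate drift detector in the style of DDM (simplified sketch)."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.error_rate = 0.0
        self.min_rate = float("inf")
        self.min_std = float("inf")

    def update(self, prediction_was_wrong):
        """Feed one error indicator; return 'stable', 'warning', or 'drift'."""
        self.n += 1
        # Incremental mean of the 0/1 error indicator.
        self.error_rate += (float(prediction_was_wrong) - self.error_rate) / self.n
        std = math.sqrt(self.error_rate * (1.0 - self.error_rate) / self.n)
        if self.n < self.min_instances:
            return "stable"
        level = self.error_rate + std
        if level <= self.min_rate + self.min_std:
            self.min_rate, self.min_std = self.error_rate, std
        if level > self.min_rate + self.drift_level * self.min_std:
            self.reset()  # an active learner would also rebuild its model here
            return "drift"
        if level > self.min_rate + self.warning_level * self.min_std:
            return "warning"
        return "stable"


# Synthetic error sequence: 10% errors for 500 instances, then 50% errors.
detector = SimpleDDM()
drift_at = None
for i in range(1, 1001):
    wrong = (i % 10 == 0) if i <= 500 else (i % 2 == 0)
    if detector.update(wrong) == "drift":
        drift_at = i
        break
```

A passive approach would skip the detector entirely and simply update the model on every instance, trading the cost of detection for continuous adaptation.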
Several adaptive learning methods have been proposed in the
literature to solve the concept drift problem, such as ensemble methods
and tree-based methods. Ensemble methods, e.g., Adaptive Random
Forest (ARF) [67], online boosting ensemble [68], and additive
expert ensemble [69], often fall into the passive approaches. However,
a few active-based ensemble methods have also been introduced in the
literature [70,71]. One of the main problems with ensemble models is
their high computational cost, especially in environments with big data
streams. Nevertheless, ensemble learning is among the most popular
methods for learning in dynamically changing environments. This is
mainly because online ensemble-based methods provide clear advantages
over their competitors (i.e., single models): (i) ensemble-based models
often give more accurate predictions than single models by improving
the robustness and reducing the variance of the learners, (ii) ensemble
models, by their nature, are able to incorporate newly arriving data
instances into the learning process by increasing the number of members
in the ensemble, and (iii) ensemble-based methods are capable of
forgetting non-representative information by excluding the
corresponding learner(s) [72,73]. Note that items (ii) and (iii) can
also address the stability–plasticity dilemma to some
extent (see Section 5.3).
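Items (ii) and (iii) above, adding members for new data and dropping members that no longer represent it, can be sketched with a weighted ensemble in the style of Dynamic Weighted Majority (DWM), one of the ensemble methods cited in this survey. The sketch is simplified (weights are updated at every instance), and `CentroidLearner` is a hypothetical toy base learner chosen only to keep the example short; the values of `beta` and `theta` are assumptions.

```python
from collections import Counter, defaultdict


class CentroidLearner:
    """Toy incremental base learner: keeps a running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def learn_one(self, x, y):
        sums = self.sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            sums[i] += value
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # untrained learner has no opinion

        def squared_distance(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=squared_distance)


class DynamicWeightedMajority:
    def __init__(self, beta=0.5, theta=0.01):
        self.beta = beta    # multiplicative penalty for a wrong expert
        self.theta = theta  # experts below this weight are dropped
        self.experts = [[CentroidLearner(), 1.0]]  # [learner, weight] pairs

    def predict(self, x):
        votes = defaultdict(float)
        for learner, weight in self.experts:
            label = learner.predict(x)
            if label is not None:
                votes[label] += weight
        return max(votes, key=votes.get) if votes else None

    def learn_one(self, x, y):
        for expert in self.experts:
            if expert[0].predict(x) != y:
                expert[1] *= self.beta
        ensemble_was_wrong = self.predict(x) != y
        top = max(weight for _, weight in self.experts)
        # Normalize weights and forget experts that keep failing.
        self.experts = [[l, w / top] for l, w in self.experts if w / top >= self.theta]
        if ensemble_was_wrong:
            self.experts.append([CentroidLearner(), 1.0])  # add a fresh member
        for learner, _ in self.experts:
            learner.learn_one(x, y)


# One feature; the labeling rule flips abruptly after 100 instances.
ensemble = DynamicWeightedMajority()
for i in range(200):
    x = [1.0 if i % 2 == 0 else -1.0]
    positive = x[0] > 0
    y = int(positive) if i < 100 else int(not positive)
    ensemble.learn_one(x, y)
```

After the flip, the members trained on the old rule lose weight and are removed, while the members created after the drift take over the vote.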
Given the significant advantages of online ensemble learning,
this method has attracted much interest in the field of communication
systems and networks for different purposes. For example, dynamic
AdaBoost.NC with multiple subclassifiers for imbalance and drifts
(DAMSID) has been proposed to tackle the concept drift and imbalanced
data problems in big IIoT data stream analytics [40]. In some works,
the detection accuracy of ensemble-based algorithms such as ARF, Online
Accuracy Updated Ensemble (OAUE), OzaBag, and OzaBoost has also been
compared with that of other online methods [43]. Moreover, an ensemble
of Deep Learning (DL) algorithms has been used for security purposes
in [74], where an ensemble of autoencoders is proposed for online
intrusion detection.
Due to the fact that the online ensemble methods are good choices
for learning in non-stationary environments (e.g., IoT-based systems),
ensemble learning has been used for activity recognition in an effec-
tive online manner [75]. When one employs an ensemble learning
method, the occurrence of concept drift may exert negative effects on
only a small piece of previously acquired knowledge by the models.
Some ensemble-based methods such as Dynamic Weighted Majority
(DWM) [76] and Accuracy Weighted Ensemble (AWE) [77] are a good
fit for industrial and control systems data analytics, as such systems
need predictive models with small error rates.
Another major group of adaptive learning methods for dealing
with concept drift is tree-based models. In most cases, tree-based
methods are used as a single model. Single models are computationally
cheaper than ensemble models, which makes them appropriate for big
streaming data analytics. Examples of tree-based algorithms include
Hoeffding Tree or Very Fast Decision Tree (VFDT) [78], Hoeffding
Adaptive Tree [79], and Extremely Fast Decision Tree (EFDT) [80]. VFDT
is an incremental learning algorithm that can learn from a huge amount
of streaming data, and it is among the most popular tree-based methods.
This algorithm assumes that the distribution of the incoming data
instances does not change over time. To overcome this weakness,
Hoeffding Adaptive Tree has been proposed, which uses the adaptive
windowing (ADWIN) algorithm to deal with concept drift [81]. Another
improvement over the original Hoeffding Tree is the EFDT algorithm,
which achieves higher prequential accuracy [80]. This algorithm can
also handle changes in the data distribution.
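Hoeffding Tree variants decide when to split a leaf by comparing the two best candidate attributes against the Hoeffding bound, which shrinks as more instances are observed. The sketch below illustrates this test only; the function names and the example numbers are assumptions, and practical implementations add further rules (for example, tie breaking) that are omitted here.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation epsilon such that, with probability 1 - delta, the true mean
    lies within epsilon of the mean observed over n instances."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    value_range = math.log2(n_classes)  # range of information gain
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


epsilon = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
clear_winner = should_split(0.30, 0.15, n_classes=2, delta=1e-7, n=1000)
too_close = should_split(0.30, 0.25, n_classes=2, delta=1e-7, n=1000)
```

With 1000 instances the bound is roughly 0.09, so a gain difference of 0.15 triggers a split while a difference of 0.05 does not; the leaf simply waits for more data.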
In the context of communication systems and networks, tree-based
learning algorithms have been used, especially for classification tasks.
For example, VFDT algorithm has been adopted for dynamic online
network traffic classification [82]. In [83] a novel ensemble learning
method has been proposed for anomaly detection, in which the VFDT
algorithm is responsible for traffic classification as a module of the
proposed method. The capabilities of Hoeffding Adaptive Tree (HAT)
for IDSs have been investigated in [84]. When traditional data analyt-
ics techniques run into problems with the sheer increase in network
traffic, HAT can be used for processing big traffic data streams. This is
especially important for the detection of newly designed cyber-attacks,
since the malicious users are continuously designing new threats.
The tree-based learning methods have also gained attention in the
IoT-based industrial applications, e.g., Wide Area Monitoring Systems
(WAMS), for performing classification tasks, such as event and intrusion
classification [85]. Such systems generate big and high-velocity sensor
data that will be an important asset if one uses appropriate techniques
to analyze this data. Fortunately, OL algorithms, especially
tree-based methods, are well able to support such massive data
streams, even in the presence of concept drift.
5.2. Imbalanced classes
There are many applications, e.g., IDSs and real-time network
monitoring systems, in which the learning model has to learn from
incoming data streams with skewed class distributions, also called
imbalanced classes [27]. In such applications, acquiring class labels
for some of the data, referred to as the minority class (or classes),
is challenging or costly. Prime examples are the spam and fault
classes. Imbalanced classes can cause serious difficulties for the
learning process, e.g., degraded learning performance and poor
generalization. This is mainly because, in the presence of a majority
class (or classes), OL algorithms tend to ignore or overfit the
minority class (or classes) [54]. As a result, there are many
classification tasks in which the classification accuracy for the
majority class is quite high (e.g., close to 100%), while the accuracy
for the minority class is very low (e.g., between 0% and 10%) [86].
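The gap between overall accuracy and minority-class accuracy can be reproduced with a small worked example. The data below are invented for illustration: 990 "normal" and 10 "attack" instances, and a degenerate classifier that always predicts "normal". Cohen's Kappa, which corrects Accuracy for chance agreement, is included to show how it exposes the problem.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    hits, totals = Counter(), Counter(y_true)
    for t, p in zip(y_true, y_pred):
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}


def cohen_kappa(y_true, y_pred):
    """(observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical imbalanced stream: 1% attacks, classifier ignores them.
y_true = ["normal"] * 990 + ["attack"] * 10
y_pred = ["normal"] * 1000

acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

The classifier reaches 99% accuracy while detecting no attacks at all, and Kappa is zero because the agreement is no better than chance.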
Learning from imbalanced data is much more challenging in OL
than in traditional offline learning, because in OL the learner does
not know in advance which class should be considered the minority
class. Moreover, the learner faces uncertainty when class imbalance
appears. In addition, the coexistence of concept drift and class
imbalance introduces a further difficulty in OL, because it is
challenging to recognize that a class is imbalanced while the
underlying distribution changes over time. As an example, in
IIoT, industrial monitoring systems are prone to faults, especially in
the presence of aging machines. In this situation, the balance of the
fault and non-fault classes may change over time. Furthermore, when the
two problems occur simultaneously, one problem may hinder the approach
that addresses the other. As a prime example, some drift detection
techniques are sensitive to class imbalance since they are based on the
classification error, and consequently are not sufficiently
effective [53].
The imbalanced class problem causes serious challenges for
communication systems and networks. The Industry 4.0 era has brought
dramatic advances in production lines and industrial practices; hence,
industrial equipment has become extremely precise and robust, and
rarely develops faults during operation. As a result, data instances
with the fault label (i.e., the minority class) become increasingly
rare compared with regular data instances (i.e., the majority class).
Such class imbalance causes severe difficulties in discriminating
faults from regular data. Moreover, the imbalanced class problem has
affected big traffic data, leading to inaccurate classification and
poor decision-making [87].
There are some considerations that should be taken into account be-
fore providing a treatment for the imbalanced class problem, including:
•The degree of class imbalance: Since there is still no agreement
on the way to define the degree of class imbalance in the litera-
ture, in some works, such as [88,89], a measure named degree
of imbalance is proposed for this purpose. Metrics such as the
number of the minority class instances in the data [90], and the
population size ratio between minority and majority classes [91]
have also been used. The lack of agreement in this case requires
some metrics to provide an updated status of degree of class
imbalance.
•Understanding the seriousness of class imbalance occurrence: As
discussed in this study, class imbalance is not the only challenge
that affects the performance of online learners. The complexity
(e.g., class overlapping) and scale of the incoming data (e.g., a
wide domain) can also increase or decrease the intensity of the
imbalanced class problem. Class overlapping is an important issue
in which data instances seem to belong to more than one
class [92]. The degree of class overlapping can exacerbate the
adverse effect of the imbalanced classes problem. With regard to
the scale of the incoming data, when a problem has a very wide
domain, there is a strong possibility that the minority class
will still be represented by a large number of instances, which
may decrease the negative effect of class imbalance.
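An up-to-date estimate of the degree of class imbalance, as called for in the first consideration above, can be maintained incrementally. One option, sketched here as an assumption rather than as the measure defined in the cited works, is an exponentially decayed class proportion: each class proportion is multiplied by a decay factor at every step, and the class of the current instance receives the remaining mass. The class name and the decay value are hypothetical.

```python
class DecayedClassProportions:
    """Tracks recent class proportions with exponential forgetting."""

    def __init__(self, decay=0.99):
        self.decay = decay  # closer to 1 means a longer memory
        self.proportions = {}

    def update(self, label):
        self.proportions.setdefault(label, 0.0)
        for known in self.proportions:
            indicator = 1.0 if known == label else 0.0
            self.proportions[known] = (
                self.decay * self.proportions[known] + (1.0 - self.decay) * indicator
            )

    def minority_class(self):
        return min(self.proportions, key=self.proportions.get)

    def imbalance_ratio(self):
        """Smallest class proportion divided by the largest (1.0 = balanced)."""
        if len(self.proportions) < 2:
            return None
        values = self.proportions.values()
        return min(values) / max(values)


# Hypothetical stream in which one instance in twenty is a fault.
tracker = DecayedClassProportions(decay=0.99)
for i in range(1, 1001):
    tracker.update("fault" if i % 20 == 0 else "normal")

ratio = tracker.imbalance_ratio()
minority = tracker.minority_class()
```

Because old instances are gradually forgotten, the estimate follows changes in the class balance over time, which a simple running count would not.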
5.2.1. State-of-the-art solutions in the presence of imbalanced class problem
There is a large body of the OL literature that has studied the
imbalanced class problem, such as [93,94]. Compared to stationary
environments, class imbalance is a less-investigated problem in
non-stationary environments. Bagging of uncorrelated trees is one of
the earliest algorithms that focus on class imbalance learning in
non-stationary environments [95,96]. This algorithm is based on the
idea that combining the outputs of an ensemble of learners achieves
better performance if the sub-learners’ outputs are uncorrelated or
have low correlation. To learn from non-stationary imbalanced data
streams, the Selectively Recursive Approach (SERA) [97] and Recursive
Ensemble Approach (REA) [98] algorithms use a similar idea.
Learn++.NSE-SMOTE (Synthetic Minority class Oversampling Technique) is
another algorithm for non-stationary environments with class imbalance
that does not depend on historical data instances [99].
There are some techniques for solving the mentioned problem re-
gardless of being online or offline. Generally speaking, the techniques
dealing with the imbalanced class problem can be classified into two
main categories: (i) data level, and (ii) algorithm level techniques.
A long list of resampling techniques and training data manipulation
methods fall under the umbrella of data level algorithms. To achieve
the right balance between the minority and the majority classes, data
level algorithms may undersample the majority class instances or over-
sample the minority class instances and/or adopt a combination of
both. Examples of resampling techniques include SMOTE, Borderline-
SMOTE, and MWMOTE [100]. Unlike the data level techniques, the
algorithm level techniques try to tackle the imbalanced class prob-
lem by modifying the learning mechanism in learning-based tasks.
Indeed, these algorithms aim to provide higher accuracy in the minority
class. To this end, the techniques such as cost-sensitive learning [101],
threshold-based methods (e.g., threshold-moving) [102], and one-class
classification [103] have been proposed.
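The difference between the two categories can be illustrated with two small helpers. The first is a data level technique, plain random oversampling, which is much simpler than SMOTE and is used here only because it fits in a few lines. The second is an algorithm level ingredient, per-class misclassification costs that a cost-sensitive learner could use to weight its updates. Function names and the example data are assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Data level: duplicate minority instances at random until balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


def inverse_frequency_costs(labels):
    """Algorithm level: higher misclassification cost for rarer classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical batch: 90 regular flows (label 0) and 10 faults (label 1).
samples = [[float(i)] for i in range(100)]
labels = [0] * 90 + [1] * 10

balanced_samples, balanced_labels = random_oversample(samples, labels)
costs = inverse_frequency_costs(labels)
```

The data level helper changes what the learner sees and leaves the learner untouched, whereas the costs leave the data untouched and change how strongly each error is penalized.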
5.3. Stability–plasticity dilemma
Another well-known challenge for OL algorithms is the stability–
plasticity dilemma [104]. More specifically, the stability–plasticity
dilemma refers to the situation in which the learner should make a
trade-off between plasticity—the incorporation of new instances—and
stability—trying not to forget the previously learned knowledge. If the
learner increases its plasticity, it might forget the previously learned
knowledge. On the other hand, if the learner increases its stability,
it might react more slowly to new data. The stability–plasticity
dilemma is a serious problem in noisy environments, such as IIoT
systems, where concept drift is most likely to occur. OL algorithms
that are specially designed for dealing with concept drift may provide
a solution to this dilemma only during a specific period of time
when concept drift happens. The situation will be much worse for OL
algorithms that are not equipped with mechanisms to handle concept
drift due to catastrophic forgetting [105].
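The trade-off can be observed with the simplest possible online learner, an exponentially weighted running estimate whose update rate plays the role of plasticity. The sketch below is an illustration only: the rates, the alternating noise pattern, and the abrupt shift from 0 to 5 are assumptions chosen to make the effect easy to see.

```python
def exponential_average(stream, rate):
    """Running estimate; a higher rate means more plasticity, less stability."""
    estimate = 0.0
    history = []
    for value in stream:
        estimate += rate * (value - estimate)
        history.append(estimate)
    return history


# 200 noisy observations around 0, then an abrupt shift to around 5.
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(220)]
stream = [(0.0 if i < 200 else 5.0) + noise[i] for i in range(220)]

plastic = exponential_average(stream, rate=0.5)
stable = exponential_average(stream, rate=0.01)

# Before the shift: distance of each estimate from the true value 0.
error_before = (abs(plastic[199]), abs(stable[199]))
# Twenty steps after the shift: distance from the new true value 5.
error_after = (abs(plastic[219] - 5.0), abs(stable[219] - 5.0))
```

Before the shift the stable learner is barely disturbed by the noise, while after the shift the plastic learner has already adapted and the stable one is still far from the new value.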
5.4. Model complexity estimation
Model complexity estimation is another frequently mentioned
challenge in OL. The model complexity of an algorithm is often
measured by the number of parameters (e.g., window size and number
of hidden nodes) needed for the representation, rather than by
measures such as running time. This is mainly because algorithms may
be written in different programming languages, and their
implementations may differ in efficiency [106]. In
OL, model complexity should be treated as a variable measure because
it is extraordinarily difficult to estimate the model complexity
before all data are available. Since
the incoming data of OL algorithms are unknown, the model complexity
may increase if concept drift happens. When OL algorithms are used
in resource-constrained environments, such as IoT systems, intelligent
resource reallocation should be considered, as the degree of model
complexity is limited by the restricted resources available.
Of all the challenges considered, concept drift and class imbalance
are the most critical challenges in OL. Hence, in the next
section, we discuss these challenges in greater detail.
6. Data description and empirical evaluation
To show the general performance of OL, we assessed the per-
formance of different ensemble- and tree-based algorithms for network
traffic classification. Our evaluation does not aim to propose a new
method; rather, we evaluate the OL techniques in a general
manner. We used Scikit-multiflow, a Python package for streaming data
analytics [107]. For ensemble-based algorithms, we selected Adaptive
Random Forest, Very Fast Decision Rules, Online Boosting, Additive Expert,
and Oza Bagging. We also chose the following three tree-based
algorithms: VFDT, HAT, and Extremely Fast Decision Tree. We selected
these algorithms because they are the most used learning methods in the
streaming setting according to the literature, e.g., [67–69,108–110],
and because there is strong theoretical and experimental support for
them in both offline and online modes. For all algorithms, we used
the default settings. We chose Accuracy and Kappa as the evalua-
tion metrics. Accuracy is a natural metric for binary
classification tasks. Kappa, or Cohen’s Kappa, is an evaluation metric
that measures the level of agreement between two raters; it is based on
Accuracy but corrected for chance agreement [111]. In our
experiments, we used the WaveformGenerator module to generate data
streams using the benchmark datasets described below. Moreover, in
each training call we passed a batch of 100 samples to the algorithms.
The maximum number of samples for the algorithms is determined
by the total size of the given dataset. For testing the network
data stream classification scenarios, we used the prequential evaluation
method, in which each sample is first used to test the model and then to
train it [112]. In other words, each instance serves two
purposes.
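The test-then-train procedure and the two metrics can be sketched in plain Python as follows. This is not the Scikit-multiflow implementation used in our experiments: the MajorityClassLearner class, the prequential and cohen_kappa functions, and the synthetic stream are hypothetical stand-ins chosen so that the example is self-contained. Only the batch size of 100 mirrors the setup described above.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner (illustrative assumption): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, X):
        label = self.counts.most_common(1)[0][0] if self.counts else 0
        return [label for _ in X]

    def partial_fit(self, X, y):
        self.counts.update(y)


def cohen_kappa(y_true, y_pred):
    """Kappa = (p_o - p_e) / (1 - p_e): p_o is accuracy, p_e is chance agreement."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    p_e = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def prequential(learner, X, y, batch_size=100):
    """Test-then-train: each batch is first predicted, then used for training."""
    y_true, y_pred = [], []
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        y_pred.extend(learner.predict(X_batch))  # test first
        y_true.extend(y_batch)
        learner.partial_fit(X_batch, y_batch)    # then train
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, cohen_kappa(y_true, y_pred)


# Synthetic imbalanced stream: 90% benign (0), 10% malicious (1).
y = [1 if i % 10 == 0 else 0 for i in range(1000)]
X = [[float(i)] for i in range(1000)]
accuracy, kappa = prequential(MajorityClassLearner(), X, y)
```

On this stream the toy learner always predicts the benign class, so its accuracy is high although it never detects an attack, and its Kappa is zero. This is the reason for reporting Kappa alongside Accuracy when the classes are imbalanced.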
We used three well-known datasets to evaluate the performance
of the mentioned algorithms. UNSW-NB15 is a comprehensive dataset
for evaluating IDSs, which we have used to investigate the OL al-
gorithms [113]. The dataset contains 175,341 records for training
and 82,332 records for testing purposes. Each flow has 49 features
such as source IP address, source port number, transaction protocol,
record total duration, and so on. Different categories of attacks
(e.g., Fuzzers, Analysis, Backdoors, Denial of Service (DoS), Exploits,
Generic, Reconnaissance, Shellcode, and Worms) and normal records
are included in this dataset. The benign samples are labeled
with ‘‘0’’, while the malicious ones are labeled with ‘‘1’’.
NSL-KDD is another benchmark dataset used in our investiga-
tion [114]. The dataset covers diverse, up-to-date, and common types
of attacks, such as DoS, Distributed DoS (DDoS), User to Root Attack
(U2R), and Probing Attack. In our experiments, we used ‘KDDTrain+’,
a subset of the NSL-KDD dataset, which contains 125,972
records, with 41 features for each sample (e.g., protocol type, duration,
and flag).
The third dataset that we used in our evaluations is the UNSW 2018 IoT
Botnet dataset [115], which was introduced in
2018. The dataset was proposed for evaluating IDSs for IoT systems
through simulation of IoT traffic and different types of attacks. From
the UNSW IoT Botnet dataset, we used the ‘10-best Training-
Testing’ split to evaluate the algorithms in our experiments. The split
contains 733,703 records, each with 10 features, such as the standard
deviation of aggregated records (stddev), the average duration of
aggregated records (mean), and so on.
The performance of the ensemble algorithms is depicted in Figs. 4,
5, and 6. As can be seen in these figures, all ensemble-based algorithms
achieve impressive performance in terms of accuracy and Kappa. To
be precise, all the algorithms provide a mean accuracy that is greater
than or equal to 95% on the UNSW-NB15 dataset. The highest
accuracy is achieved by Adaptive Random Forest (0.99),
whereas Additive Expert gives the lowest accuracy among the ensemble
learning algorithms used (0.954). On the
NSL-KDD and UNSW 2018 IoT Botnet datasets, the algorithms deliver
performance approximately similar to that on the UNSW-NB15 dataset; only
Additive Expert and Very Fast Decision Rules achieve a lower accuracy
than their counterparts. This may be because, when
Additive Expert is trained on a large amount of data, it has the
potential to create a large number of experts, which leads
to inefficiency [69]. The low performance of the Very Fast Decision
Rules algorithm may be due to the fact that the algorithm uses
decision trees to derive rules. Hence, in a dataset with a small number
of features, such as UNSW 2018 IoT Botnet, the algorithm has difficulty
in determining the best feature on which to split the data.
One of the most important observations about the ensemble algorithms is
that their performance becomes more stable after the initial steps.
Fig. 4. The performance of five online ensemble-based algorithms on UNSW-NB15 dataset.
Fig. 5. The performance of five online ensemble-based algorithms on NSL-KDD dataset.
Figs. 7, 8, and 9 show the evaluation results of the tree-based algo-
rithms. From these figures, one can argue that among the investigated
tree-based algorithms, Extremely Fast Decision Tree provides the best
performance in terms of Accuracy and Kappa on the UNSW 2018
IoT Botnet and UNSW-NB15 datasets (0.979/0.854 and 0.918/0.842,
respectively), while Very
Fast Decision Tree performs worse on these two datasets (0.9726/0.8079
and 0.709/0.465, respectively). Generally speaking, the comparison be-
tween ensemble- and tree-based algorithms reveals that the former
achieve better performance than the latter in terms of classi-
fication accuracy. This good performance is linked to the nature of
ensemble methods, which employ multiple learners to achieve better
predictive accuracy.
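To illustrate how an online ensemble combines multiple learners, the following sketch follows the general idea of online bagging [68,110]: each base learner is trained on every incoming instance k times, with k drawn from a Poisson(1) distribution, and predictions are combined by majority vote. It is not the Scikit-multiflow implementation evaluated above. The NearestMeanLearner base learner, the poisson1 helper, and the synthetic one-feature stream are assumptions introduced purely for illustration.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    limit, k, product = math.exp(-1.0), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class NearestMeanLearner:
    """Toy incremental base learner (hypothetical): nearest class mean on one numeric feature."""

    def __init__(self):
        self.sums = Counter()
        self.counts = Counter()

    def learn_one(self, x, y):
        self.sums[y] += x
        self.counts[y] += 1

    def predict_one(self, x):
        if not self.counts:
            return 0
        return min(self.counts, key=lambda c: abs(x - self.sums[c] / self.counts[c]))


class OnlineBagging:
    """Each base learner sees every instance k ~ Poisson(1) times; prediction is a majority vote."""

    def __init__(self, n_learners=10, seed=0):
        self.rng = random.Random(seed)
        self.learners = [NearestMeanLearner() for _ in range(n_learners)]

    def learn_one(self, x, y):
        for learner in self.learners:
            for _ in range(poisson1(self.rng)):  # k copies of the instance
                learner.learn_one(x, y)

    def predict_one(self, x):
        votes = Counter(learner.predict_one(x) for learner in self.learners)
        return votes.most_common(1)[0][0]


# Synthetic stream: class 0 is centred at 0.0 and class 1 at 10.0.
data_rng = random.Random(1)
ensemble = OnlineBagging(n_learners=10, seed=0)
for _ in range(200):
    label = data_rng.randint(0, 1)
    ensemble.learn_one(data_rng.gauss(10.0 * label, 1.0), label)

predictions = (ensemble.predict_one(0.5), ensemble.predict_one(9.5))
```

The Poisson weighting approximates bootstrap sampling without storing the stream, so each base learner ends up with a slightly different view of the data. This diversity is what the majority vote exploits.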
7. Open issues and future directions
In this section, we discuss the research directions that can alleviate
the discussed challenges in analyzing traffic data streams based on OL
algorithms.
• IoT data streams: Notwithstanding the recent progress in OL al-
gorithms for massive traffic data streams, there are still issues that
need to be considered to mature this learning approach in emerging
networking paradigms. As an example, IoT data characteristics,
such as heterogeneity, highly dynamic environments, noisy data,
and spatial–temporal correlation, can worsen the situation. Big
IoT data streams can raise the cost of learning in terms
of time, memory, and computation because a large number of
data instances and related attributes result in a very complex
OL model, which in turn increases the running time and degrades
performance. Although the existing OL techniques
can deal reasonably well with existing network data streams, the current
evolution of networking paradigms is changing the essence of
network traffic streams, resulting in more complicated data to
process. The existing OL algorithms should therefore be extended
to deal with new types of data in terms of speed, complexity, and
heterogeneity.
Fig. 6. The performance of five online ensemble-based algorithms on UNSW 2018 IoT Botnet dataset.
Fig. 7. The performance of three online tree-based algorithms on UNSW-NB15 dataset.
Fig. 8. The performance of three online tree-based algorithms on NSL-KDD dataset.
Fig. 9. The performance of three online tree-based algorithms on UNSW 2018 IoT Botnet dataset.
• Online learning for resource-constrained IoT devices: In some
networking solutions, e.g., 5G, 6G, and ad-hoc networks, end
devices can be used to forward data by using D2D communication
technology. In some cases, e.g., NTMA [8], OL algorithms should
run on resource-constrained machines to monitor the for-
warded data. Designing OL algorithms for resource-constrained
IoT devices presents new challenges, e.g., intelligent resource
reallocation, since the model complexity may increase in the
presence of concept drift. Although on-device machine learning
techniques are emerging [116], to the best of our knowledge,
there is no study in this field, which makes it an open issue.
• Immaturity of existing OL techniques: Although OL-based al-
gorithms show impressive results in big data stream analytics,
important limitations remain, such as concept drift and class
imbalance (both of which require advanced learning mechanisms), model
complexity estimation, and the stability–plasticity dilemma. These
challenges can be more critical when OL is applied to network
traffic stream analytics.
• Less-investigated problems in OL: Concept drift and class im-
balance are the main problems in OL. In comparison to the offline
learning paradigm, however, these problems have not been investi-
gated adequately. As shown in Section 5, some solutions
address these challenges, but they do not yet appear to be
successful, especially
for networking problems. For example, there is still a lack of
agreement on how to define the degree of class imbalance.
Designing models that can address the challenges mentioned in
Section 5 can be considered an open issue.
• The need for OL in recent IoT applications: One can refer to
various technologies, such as Unmanned Aerial Vehicles (UAVs)
and mobile robots, as emerging IoT applications. UAVs can be
used to realize massive IoT connectivity, particularly in remote re-
gions or in disaster areas. Moreover, UAVs have practical com-
puter vision-based applications in smart farming, infrastructure
inspection, and environment monitoring. Nevertheless, UAVs
have to deal with several challenges before they can be fully employed
in the IoT sector, including obstacle detection and collision avoidance,
energy saving, and privacy policy issues. The combination of OL
with spiking neural networks, the so-called low-power ML [16],
and Tiny ML can significantly impact energy consumption and
prediction accuracy.
• Distributed OL algorithms: Traditional OL algorithms often
run sequentially on a single device with limited resources
in terms of memory, power, computation, and bandwidth. At the same
time, they have to deal with big data streams. Furthermore,
the available non-distributed data stream analytics frameworks
(e.g., Scikit-multiflow) have difficulty scaling to massive
arriving data streams. Distributed learning for data stream envi-
ronments is a novel computational paradigm that is able to tackle
these issues.
• Time-series datasets and OL: Time-series datasets are consid-
ered a sub-type of network traffic streams, but they are used
broadly to represent data in various types of networks and ap-
plications, e.g., Wireless Sensor Networks (WSNs) and NTMA,
respectively. Existing literature, e.g., [19,117,118], shows that the
use of OL for time-series analysis is still in its infancy, as the main
body of the literature focuses on the time-series prediction aspect.
Although time-series prediction is an important research field,
other aspects of time-series analysis, e.g., trend change detection,
pattern recognition, and anomaly detection, should be considered
as future directions, as they can be used in various fields of
network traffic stream analytics.
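As a hedged illustration of the kind of online time-series analysis mentioned in the last item (anomaly detection rather than prediction), the sketch below flags unusual values in a stream by using a running mean and variance maintained with Welford's method. The RunningZScore class, its threshold and warm-up parameters, and the traffic values are hypothetical; the sketch is not one of the surveyed methods.

```python
import math


class RunningZScore:
    """Online mean/variance (Welford's method) used to flag unusual values in a stream."""

    def __init__(self, threshold=3.0, warmup=10):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold, self.warmup = threshold, warmup

    def update(self, x):
        """Score x against the statistics seen so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = std > 0 and abs(x - self.mean) / std > self.threshold
        # Welford update: constant memory, one pass over the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Hypothetical per-second packet counts with one burst at position 15.
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98,
           101, 99, 100, 102, 98, 400, 101, 99, 100, 102]
detector = RunningZScore()
flags = [i for i, value in enumerate(traffic) if detector.update(value)]
```

Note that the flagged value is still folded into the running statistics, which inflates the variance afterwards. A detector intended for non-stationary traffic would additionally need a forgetting or windowing mechanism, which is exactly where the concept drift issues discussed in Section 5 reappear.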
8. Conclusion
In this paper, we have investigated the OL paradigm from a net-
working point of view. OL has attracted considerable interest in the
literature and has become a hot topic in the scientific community because
of its various practical use-cases in traffic data stream analytics. With
the help of OL, communication systems and networks can implement a
traffic data generator-consumer chain. More specifically, in this chain,
network devices or users produce unprocessed data that can be ana-
lyzed by OL algorithms. OL models then extract valuable knowledge
from the data that is important for decision making, QoS/QoE provi-
sioning, and building predictive models. Nevertheless, the performance
of OL algorithms can be affected by class imbalance and the non-
stationary nature of networking environments (i.e., concept drift). In
this paper, we discussed the properties of network traffic data and
the challenges they pose for OL algorithms. In particular, we highlighted
the main characteristics of IoT data. We also reviewed the problems
associated with traditional offline learning methods and the advantages
of their online counterparts. As the core of the study, we comprehensively
reviewed data stream mining algorithms and techniques, in addition
to data stream processing tools and frameworks. In particular, we
compared data processing tools and described the characteristics, pros,
and cons of each stream processing tool. As a comparative study, we
evaluated the performance of several online ensemble-based and tree-
based algorithms to show how online ML techniques can deal
with the analysis of network traffic streams. We also investigated the
challenges and identified future research directions towards using OL
for communication systems and networks.
CRediT authorship contribution statement
Amin Shahraki: Conceptualization, Methodology, Formal analy-
sis, Validation, Investigation, Resources, Visualization, Data curation,
Writing – original draft, Supervision, Project administration, Writing
– review & editing. Mahmoud Abbasi: Methodology, Formal analy-
sis, Investigation, Visualization, Validation, Software, Resources, Data
curation, Writing – original draft, Writing – review & editing. Amir
Taherkordi: Writing – original draft, Writing – review & editing,
Supervision, Methodology. Anca Delia Jurcut: Writing – original draft,
Visualization.
Declaration of competing interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Acknowledgment
This work was supported in part by the Norwegian Research Council
under Grant 262854/F20 (DILUTE Project).
References
[1] A. Shahraki, M. Abbasi, M. Piran, M. Chen, S. Cui, et al., A comprehensive
survey on 6g networks: Applications, core services, enabling technologies, and
future challenges, 2021, arXiv preprint arXiv:2101.12475.
[2] N. Javaid, A. Sher, H. Nasir, N. Guizani, Intelligence in IoT-based 5G networks:
Opportunities and challenges, IEEE Commun. Mag. 56 (10) (2018) 94–100.
[3] G.M.D.T. Forecast, Cisco visual networking index: global mobile data traffic
forecast update, 2017–2022, 2019, Update 2017, 2022.
[4] M. Abbasi, A. Shahraki, H.R. Barzegar, C. Pahl, Synchronization techniques in
‘‘device to device-and vehicle to vehicle-enabled’’ cellular networks: A survey,
Comput. Electr. Eng. 90 (2021) 106955.
[5] M. Mohammadi, A. Al-Fuqaha, S. Sorour, M. Guizani, Deep learning for IoT
big data and streaming analytics: A survey, IEEE Commun. Surv. Tutor. 20 (4)
(2018) 2923–2960.
[6] N. Abbas, Y. Zhang, A. Taherkordi, T. Skeie, Mobile edge computing: A survey,
IEEE Internet Things J. 5 (1) (2017) 450–465.
[7] W. Saad, M. Bennis, M. Chen, A vision of 6G wireless systems: Applications,
trends, technologies, and open research problems, IEEE Netw. 34 (3) (2019)
134–142.
[8] M. Abbasi, A. Shahraki, M.J. Piran, A. Taherkordi, Deep reinforcement learning
for QoS provisioning at the MAC layer: A survey, Eng. Appl. Artif. Intell. 102
(2021) 104234.
[9] A. Shahraki, H. Taherzadeh, Ø. Haugen, Last significant trend change detection
method for offline poisson distribution datasets, in: 2017 International Sympo-
sium on Networks, Computers and Communications, ISNCC, IEEE, 2017, pp.
1–7.
[10] A. D’Alconzo, I. Drago, A. Morichetta, M. Mellia, P. Casas, A survey on big data
for network traffic monitoring and analysis, IEEE Trans. Netw. Serv. Manag. 16
(3) (2019) 800–813.
[11] M.M. Gaber, A. Zaslavsky, S. Krishnaswamy, Mining data streams: a review,
ACM Sigmod Rec. 34 (2) (2005) 18–26.
[12] A. Liu, J. Lu, G. Zhang, Concept drift detection via equal intensity k-means
space partitioning, IEEE Trans. Cybern. (2020).
[13] S. Ayoubi, N. Limam, M.A. Salahuddin, N. Shahriar, R. Boutaba, F. Estrada-
Solano, O.M. Caicedo, Machine learning for cognitive network management,
IEEE Commun. Mag. 56 (1) (2018) 158–165.
[14] A. Shahraki, M. Abbasi, Ø. Haugen, Boosting algorithms for network intrusion
detection: A comparative evaluation of real AdaBoost, gentle AdaBoost and
modest AdaBoost, Eng. Appl. Artif. Intell. 94 (2020) 103770.
[15] I. Lohrasbinasab, A. Shahraki, A. Taherkordi, A. Delia Jurcut, From statistical-to
machine learning-based network traffic prediction, Trans. Emerg. Telecommun.
Technol. (2021) e4394.
[16] J.L. Lobo, J. Del Ser, A. Bifet, N. Kasabov, Spiking neural networks and online
learning: An overview and perspectives, Neural Netw. 121 (2020) 88–100.
[17] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on
concept drift adaptation, ACM Comput. Surv. 46 (4) (2014) 1–37.
[18] V. Losing, B. Hammer, H. Wersing, Incremental on-line learning: A review
and comparison of state of the art algorithms, Neurocomputing 275 (2018)
1261–1274.
[19] S.C. Hoi, D. Sahoo, J. Lu, P. Zhao, Online learning: A comprehensive survey,
2018, arXiv preprint arXiv:1802.02871.
[20] H.B. McMahan, A survey of algorithms and analysis for adaptive online
learning, J. Mach. Learn. Res. 18 (1) (2017) 3117–3166.
[21] S. Shalev-Shwartz, et al., Online learning and online convex optimization,
Found. Trends Mach. Learn. 4 (2) (2011) 107–194.
[22] R. Ade, P. Deshmukh, Methods for incremental learning: a survey, Int. J. Data
Min. Knowl. Manag. Process 3 (4) (2013) 119.
[23] P. Joshi, P. Kulkarni, Incremental learning: Areas and methods-a survey, Int. J.
Data Min. Knowl. Manag. Process 2 (5) (2012) 43.
[24] M. Masana, X. Liu, B. Twardowski, M. Menta, A.D. Bagdanov, J. van de Weijer,
Class-incremental learning: survey and performance evaluation, 2020, arXiv
preprint arXiv:2010.15277.
[25] S. Madhavan, N. Kumar, Incremental methods in face recognition: a survey,
Artif. Intell. Rev. 54 (1) (2021) 253–303.
[26] A. Gepperth, B. Hammer, Incremental learning algorithms and applications,
2016.
[27] S. Wang, L.L. Minku, X. Yao, A learning framework for online class imbal-
ance learning, in: 2013 IEEE Symposium on Computational Intelligence and
Ensemble Learning, CIEL, IEEE, 2013, pp. 36–45.
[28] Q. Zhang, L.T. Yang, Z. Chen, P. Li, Incremental deep computation model for
wireless big data feature learning, IEEE Trans. Big Data 6 (2) (2019) 248–257.
[29] A. Shahraki, Ø. Haugen, An outlier detection method to improve gathered
datasets for network behavior analysis in IoT, J. Commun. (2019).
[30] A. Shahraki, A. Taherkordi, Ø. Haugen, TONTA: Trend-based online network
traffic analysis in ad-hoc IoT networks, Comput. Netw. 194 (2021) 108125.
[31] S. Duffner, C. Garcia, An online backpropagation algorithm with validation
error-based adaptive learning rate, in: International Conference on Artificial
Neural Networks, Springer, 2007, pp. 249–258.
[32] J. Feng, C. Zhang, P. Hao, Online learning with self-organizing maps for
anomaly detection in crowd scenes, in: 2010 20th International Conference
on Pattern Recognition, 2010, pp. 3599–3602.
[33] M.K. Rafsanjani, A. Rezaei, A. Shahraki, A.B. Saeid, QARIMA: A new approach
to prediction in queue theory, Appl. Math. Comput. 244 (2014) 514–525.
[34] A. Shahraki, Ø. Haugen, Social ethics in internet of things: An outline and
review, in: 2018 IEEE Industrial Cyber-Physical Systems, ICPS, IEEE, 2018, pp.
509–516.
[35] R. Xu, Y. Cheng, Z. Liu, Y. Xie, Y. Yang, Improved long short-term memory
based anomaly detection with concept drift adaptive method for supporting
IoT services, Future Gener. Comput. Syst. (2020).
[36] S. Yang, Iot stream processing and analytics in the fog, IEEE Commun. Mag.
55 (8) (2017) 21–27.
[37] A. Shahraki, M. Geitle, Ø. Haugen, A comparative node evaluation model for
highly heterogeneous massive-scale internet of things-mist networks, Trans.
Emerg. Telecommun. Technol. 31 (12) (2020) e3924.
[38] P. Pop, M.L. Raagaard, M. Gutierrez, W. Steiner, Enabling fog computing for
industrial automation through time-sensitive networking (TSN), IEEE Commun.
Stand. Mag. 2 (2) (2018) 55–61.
[39] A. Taherkordi, F. Eliassen, Scalable modeling of cloud-based iot services for
smart cities, in: 2016 IEEE International Conference on Pervasive Computing
and Communication Workshops (PerCom Workshops), 2016, pp. 1–6.
[40] C.-C. Lin, D.-J. Deng, C.-H. Kuo, L. Chen, Concept drift detection and adaption
in big imbalance industrial IoT data using an ensemble learning method of
offline classifiers, IEEE Access 7 (2019) 56198–56207.
[41] M. Conti, Q.Q. Li, A. Maragno, R. Spolaor, The dark side (-channel) of mobile
devices: A survey on network traffic analysis, IEEE Commun. Surv. Tutor. 20
(4) (2018) 2658–2713.
[42] D.M. Divakaran, L. Su, Y.S. Liau, V.L. Thing, Slic: Self-learning intelligent
classifier for network traffic, Comput. Netw. 91 (2015) 283–297.
[43] V. Carela-Español, P. Barlet-Ros, A. Bifet, K. Fukuda, A streaming flow-based
technique for traffic classification applied to 12+ 1 years of internet traffic,
Telecommun. Syst. 63 (2) (2016) 191–204.
[44] Y. Bao, W. Chen, IL4IoT: Incremental learning for internet-of-things devices, in:
European Conference on Ambient Intelligence, Springer, 2019, pp. 92–107.
[45] H.R. Loo, S.B. Joseph, M.N. Marsono, Online incremental learning for high
bandwidth network traffic classification, Appl. Comput. Intell. Soft Comput.
2016 (2016).
[46] Z. Zhang, H. Shen, Application of online-training SVMs for real-time intrusion
detection with different considerations, Comput. Commun. 28 (12) (2005)
1428–1442.
[47] P. Casas, A. D’Alconzo, T. Zseby, M. Mellia, Big-DAMA: big data analytics for
network traffic monitoring and analysis, in: Proceedings of the 2016 Workshop
on Fostering Latin-American Research in Data Communication Networks, 2016,
pp. 1–3.
[48] B. Settles, Active Learning Literature Survey, Tech. Rep., University of
Wisconsin-Madison Department of Computer Sciences, 2009.
[49] A. Shahraki, M. Abbasi, A. Taherkordi, A.D. Jurcut, Active learning for network
traffic classification: a technical study, IEEE Trans. Cogn. Commun. Netw.
(2021).
[50] R.V. Kulkarnia, S. Revathya, S.H. Patilb, An Empirical Study of Online Learning
in Non-stationary Data Streams Using Ensemble of Ensembles.
[51] H. Wang, Z. Abraham, Concept drift detection for streaming data, in: 2015
International Joint Conference on Neural Networks, IJCNN, IEEE, 2015, pp.
1–9.
[52] G. Li, Y. Shen, P. Zhao, X. Lu, J. Liu, Y. Liu, S.C. Hoi, Detecting cyberattacks in
industrial control systems using online learning algorithms, Neurocomputing
364 (2019) 338–348.
[53] S. Wang, L.L. Minku, X. Yao, A systematic study of online class imbalance
learning with concept drift, IEEE Trans. Neural Netw. Learn. Syst. 29 (10)
(2018) 4802–4821.
[54] H. He, E.A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data
Eng. 21 (9) (2009) 1263–1284.
[55] A. kishore Ramakrishnan, D. Preuveneers, Y. Berbers, Enabling self-learning in
dynamic and open IoT environments, Procedia Comput. Sci. 32 (2014) 207–214.
[56] J.C. Schlimmer, R.H. Granger, Incremental learning from noisy data, Mach.
Learn. 1 (3) (1986) 317–354.
[57] V. Carela-Español, P. Barlet-Ros, O. Mula-Valls, J. Solé-Pareta, An autonomic
traffic classification system for network operation and management, J. Netw.
Syst. Manag. 23 (3) (2015) 401–419.
[58] L.L. Minku, A.P. White, X. Yao, The impact of diversity on online ensemble
learning in the presence of concept drift, IEEE Trans. Knowl. Data Eng. 22 (5)
(2009) 730–742.
[59] L.L. Minku, Online Ensemble Learning in the Presence of Concept Drift (Ph.D.
thesis), University of Birmingham, 2011.
[60] F. Breve, L. Zhao, Semi-supervised learning with concept drift using particle
dynamics applied to network intrusion detection data, in: 2013 BRICS Congress
on Computational Intelligence and 11th Brazilian Congress on Computational
Intelligence, IEEE, 2013, pp. 335–340.
[61] S. Saurav, P. Malhotra, T.V. Vishnu, N. Gugulothu, L. Vig, P. Agarwal, G. Shroff,
Online anomaly detection with concept drift adaptation using recurrent neural
networks, in: Proceedings of the ACM India Joint International Conference on
Data Science and Management of Data, 2018, pp. 78–87.
[62] G. Sun, T. Chen, Y. Su, C. Li, Internet traffic classification based on incremental
support vector machines, Mob. Netw. Appl. 23 (4) (2018) 789–796.
[63] M.F. Iqbal, M. Zahid, D. Habib, L.K. John, Efficient prediction of network traffic
for real-time applications, J. Comput. Netw. Commun. 2019 (2019).
[64] S. Liu, L. Feng, J. Wu, G. Hou, G. Han, Concept drift detection for data stream
learning based on angle optimized global embedding and principal component
analysis in sensor networks, Comput. Electr. Eng. 58 (2017) 327–336.
[65] Z. Liu, N. Japkowicz, R. Wang, D. Tang, Adaptive learning on mobile network
traffic data, Connect. Sci. 31 (2) (2019) 185–214.
[66] G. Ditzler, M. Roveri, C. Alippi, R. Polikar, Learning in nonstationary
environments: A survey, IEEE Comput. Intell. Mag. 10 (4) (2015) 12–25.
[67] H.M. Gomes, A. Bifet, J. Read, J.P. Barddal, F. Enembreck, B. Pfahringer,
G. Holmes, T. Abdessalem, Adaptive random forests for evolving data stream
classification, Mach. Learn. 106 (9–10) (2017) 1469–1495.
[68] N.C. Oza, Online bagging and boosting, in: 2005 IEEE International Conference
on Systems, Man and Cybernetics, Vol. 3, IEEE, 2005, pp. 2340–2345.
[69] J.Z. Kolter, M.A. Maloof, Using additive expert ensembles to cope with
concept drift, in: Proceedings of the 22nd International Conference on Machine
Learning, 2005, pp. 449–456.
[70] R. Wang, S. Kwong, X. Wang, Y. Jia, Active k-labelsets ensemble for multi-label
classification, Pattern Recognit. 109 (2021) 107583.
[71] A.V. Luong, T.T. Nguyen, A.W.-C. Liew, Streaming active deep forest for
evolving data stream classification, 2020, arXiv preprint arXiv:2002.11816.
[72] A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Dynamic integration
of classifiers for handling concept drift, Inf. Fusion 9 (1) (2008) 56–68.
[73] A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Handling local con-
cept drift with dynamic integration of classifiers: Domain of antibiotic resistance
in nosocomial infections, in: 19th IEEE Symposium on Computer-Based Medical
Systems, CBMS’06, IEEE, 2006, pp. 679–684.
[74] Y. Mirsky, T. Doitshman, Y. Elovici, A. Shabtai, Kitsune: an ensemble of
autoencoders for online network intrusion detection, 2018, arXiv preprint
arXiv:1802.09089.
[75] B. Krawczyk, Active and adaptive ensemble learning for online activity
recognition from data streams, Knowl.-Based Syst. 138 (2017) 69–78,
http://dx.doi.org/10.1016/j.knosys.2017.09.032.
[76] J.Z. Kolter, M.A. Maloof, Dynamic weighted majority: An ensemble method for
drifting concepts, J. Mach. Learn. Res. 8 (Dec) (2007) 2755–2790.
[77] H. Wang, W. Fan, P.S. Yu, J. Han, Mining concept-drifting data streams using
ensemble classifiers, in: Proceedings of the Ninth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2003, pp. 226–235.
[78] G. Hulten, L. Spencer, P. Domingos, Mining time-changing data streams,
in: Proceedings of the Seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 2001, pp. 97–106.
[79] A. Bifet, R. Gavaldà, Adaptive learning from evolving data streams, in:
International Symposium on Intelligent Data Analysis, Springer, 2009, pp.
249–260.
[80] C. Manapragada, G.I. Webb, M. Salehi, Extremely fast decision tree, in:
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, 2018, pp. 1953–1962.
[81] A. Bifet, R. Gavalda, Learning from time-changing data with adaptive window-
ing, in: Proceedings of the 2007 SIAM International Conference on Data Mining,
SIAM, 2007, pp. 443–448.
[82] X. Tian, Q. Sun, X. Huang, Y. Ma, Dynamic online traffic classification using
data stream mining, in: 2008 International Conference on MultiMedia and
Information Technology, IEEE, 2008, pp. 104–107.
[83] S. Garg, A. Singh, S. Batra, N. Kumar, M.S. Obaidat, Enclass: Ensemble-based
classification model for network anomaly detection in massive datasets, in:
GLOBECOM 2017-2017 IEEE Global Communications Conference, IEEE, 2017,
pp. 1–7.
[84] D.G. Corrêa, F. Enembreck, C.N. Silla, An investigation of the hoeffding adaptive
tree for the problem of network intrusion detection, in: 2017 International Joint
Conference on Neural Networks, IJCNN, IEEE, 2017, pp. 4065–4072.
[85] U. Adhikari, T.H. Morris, S. Pan, Applying hoeffding adaptive trees for real-
time cyber-power event and intrusion classification, IEEE Trans. Smart Grid 9
(5) (2017) 4049–4060.
[86] M. Kubat, R.C. Holte, S. Matwin, Machine learning for the detection of oil spills
in satellite radar images, Mach. Learn. 30 (2–3) (1998) 195–215.
[87] E.M. Hassib, A.I. El-Desouky, E.-S.M. El-Kenawy, S.M. El-Ghamrawy, An im-
balanced big data mining framework for improving optimization algorithms
performance, IEEE Access 7 (2019) 170774–170795.
[88] S. Dendamrongvit, P. Vateekul, M. Kubat, Irrelevant attributes and imbalanced
classes in multi-label text-categorization domains, Intell. Data Anal. 15 (6)
(2011) 843–859.
[89] A.M. Ráez, L.A.U. López, R. Steinberger, Adaptive selection of base classifiers
in one-against-all learning for large multi-labeled collections, in: International
Conference on Natural Language Processing, Springer, 2004, pp. 1–12, In Spain.
[90] J. Van Hulse, T.M. Khoshgoftaar, A. Napolitano, Experimental perspectives
on learning from imbalanced data, in: Proceedings of the 24th International
Conference on Machine Learning, 2007, pp. 935–942.
[91] V. López, A. Fernández, S. García, V. Palade, F. Herrera, An insight into
classification with imbalanced data: Empirical results and current trends on
using data intrinsic characteristics, Inform. Sci. 250 (2013) 113–141.
[92] S. Gupta, A. Gupta, Handling class overlapping to detect noisy instances in
classification, Knowl. Eng. Rev. 33 (2018).
[93] S. Ding, B. Mirza, Z. Lin, J. Cao, X. Lai, T.V. Nguyen, J. Sepulveda, Kernel
based online learning for imbalance multiclass classification, Neurocomputing
277 (2018) 139–148.
[94] G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, G. Bing, Learning
from class-imbalanced data: Review of methods and applications, Expert Syst.
Appl. 73 (2017) 220–239.
[95] K. Wu, A. Edwards, W. Fan, J. Gao, K. Zhang, Classifying imbalanced data
streams via dynamic feature group weighting with importance sampling, in:
Proceedings of the 2014 SIAM International Conference on Data Mining, SIAM,
2014, pp. 722–730.
[96] J. Gao, W. Fan, J. Han, P.S. Yu, A general framework for mining concept-
drifting data streams with skewed distributions, in: Proceedings of the 2007
Siam International Conference on Data Mining, SIAM, 2007, pp. 3–14.
[97] S. Chen, H. He, Sera: selectively recursive approach towards nonstationary
imbalanced stream data mining, in: 2009 International Joint Conference on
Neural Networks, IEEE, 2009, pp. 522–529.
[98] S. Chen, H. He, Towards incremental learning of nonstationary imbalanced
data stream: a multiple selectively recursive approach, Evol. Syst. 2 (1) (2011)
35–50.
[99] G. Ditzler, R. Polikar, Incremental learning of concept drift from streaming
imbalanced data, IEEE Trans. Knowl. Data Eng. 25 (10) (2012) 2283–2301.
[100] S. Barua, M.M. Islam, X. Yao, K. Murase, MWMOTE–majority weighted minority
oversampling technique for imbalanced data set learning, IEEE Trans. Knowl.
Data Eng. 26 (2) (2012) 405–425.
[101] C. Elkan, The foundations of cost-sensitive learning, in: International Joint Con-
ference on Artificial Intelligence, Vol. 17, no. 1, Lawrence Erlbaum Associates
Ltd, 2001, pp. 973–978.
[102] R. Longadge, S. Dongre, Class imbalance problem in data mining review, 2013,
arXiv preprint arXiv:1305.1707.
[103] L. Gao, L. Zhang, C. Liu, S. Wu, Handling imbalanced medical image data: A
deep-learning-based one-class classification approach, Artif. Intell. Med. 108
(2020) 101935.
[104] M. Mermillod, A. Bugaiska, P. Bonin, The stability-plasticity dilemma: Inves-
tigating the continuum from catastrophic forgetting to age-limited learning
effects, Front. Psychol. 4 (2013) 504.
[105] H. Ritter, A. Botev, D. Barber, Online structured Laplace approximations
for overcoming catastrophic forgetting, in: Advances in Neural Information
Processing Systems, 2018, pp. 3738–3748.
[106] V. Losing, B. Hammer, H. Wersing, Choosing the best algorithm for an
incremental on-line learning task, 2016.
[107] J. Montiel, J. Read, A. Bifet, T. Abdessalem, Scikit-multiflow: A multi-output
streaming framework, J. Mach. Learn. Res. 19 (1) (2018) 1–5.
[108] P. Kosina, J. Gama, Very fast decision rules for classification in data streams,
Data Min. Knowl. Discov. 29 (1) (2015) 168–202.
[109] B. Wang, J. Pineau, Online bagging and boosting for imbalanced data streams,
IEEE Trans. Knowl. Data Eng. 28 (12) (2016) 3353–3366.
[110] N.C. Oza, S.J. Russell, Online bagging and boosting, in: International Workshop
on Artificial Intelligence and Statistics, PMLR, 2001, pp. 229–236.
[111] R. Delgado, X.-A. Tibau, Why Cohen’s Kappa should be avoided as performance
measure in classification, PLoS One 14 (9) (2019) e0222916.
[112] M. Grzenda, H.M. Gomes, A. Bifet, Delayed labelling evaluation for data
streams, Data Min. Knowl. Discov. (2019) 1–30.
[113] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network
intrusion detection systems (UNSW-NB15 network data set), in: 2015 Military
Communications and Information Systems Conference, MilCIS, 2015, pp. 1–6.
[114] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A detailed analysis of the KDD
CUP 99 data set, in: 2009 IEEE Symposium on Computational Intelligence for
Security and Defense Applications, IEEE, 2009, pp. 1–6.
[115] N. Koroniotis, N. Moustafa, E. Sitnikova, B. Turnbull, Towards the development
of realistic botnet dataset in the Internet of Things for network forensic analytics:
Bot-IoT dataset, Future Gener. Comput. Syst. 100 (2019) 779–796.
[116] S. Dhar, J. Guo, J. Liu, S. Tripathi, U. Kurup, M. Shah, On-device machine
learning: An algorithms and learning theory perspective, 2019, arXiv preprint
arXiv:1911.00623.
[117] O. Anava, E. Hazan, S. Mannor, O. Shamir, Online learning for time series
prediction, in: Conference on Learning Theory, PMLR, 2013, pp. 172–184.
[118] V. Kuznetsov, M. Mohri, Time series prediction and online learning, in:
Conference on Learning Theory, PMLR, 2016, pp. 1190–1213.
Amin Shahraki received his Ph.D. from the University
of Oslo, Norway, in 2020, on the use of Machine Learning for
networking with a focus on IoT networks. He was a visiting
researcher at the University of Melbourne, CLOUDS Lab, from Aug.
2019 to Feb. 2020 under the supervision of Professor Rajkumar
Buyya. He was a postdoctoral researcher at University
College Dublin, working on Network Traffic Monitoring to
identify external events. He is now a research associate
in the Communication Systems Division of Fraunhofer IIS,
Erlangen, Germany. He has been a member of IEEE since 2011
and is a reviewer for more than 31 top-ranked journals. His
current research interests are Internet of Things, Machine
Learning for Networking, Cognitive Network Management,
Network Behavior Analysis, Time Series Analysis,
and self-healing networks.
Mahmoud Abbasi received his bachelor’s degree in computer
software engineering in Iran in 2011 and his master’s degree
from Islamic Azad University of Mashhad, Iran, in 2017. He
has been a member of IEEE since 2019. His current research
interests are Communication Systems and Networks, Machine
Learning (ML)/Deep Learning (DL), Network Traffic Monitoring
and Analysis (NTMA), Internet of Things, and 5G/6G core
networks.
Amir Taherkordi is an Associate Professor at the Depart-
ment of Informatics, University of Oslo (UiO). He received
his Ph.D. degree from the Informatics Department, UiO in
2011. After completing his Ph.D. studies, Amir joined Soni-
tor Technologies as a Senior Embedded Software Engineer.
From 2013 to 2018, he was a researcher in the Networks
and Distributed Systems (ND) group at the Department
of Informatics, UiO. Amir’s research interests are broadly
on intelligence, resource-efficiency, scalability, adaptability,
dependability, mobility and data-intensiveness of distributed
systems designed for emerging computing technologies, such
as IoT, Fog/Edge/Cloud Computing, and Cyber-Physical
Systems (CPS).
Anca Delia Jurcut has been an Assistant Professor in the UCD
School of Computer Science since 2015. She received a B.Sc.
in Computer Science and Mathematics from West University
of Timisoara, Romania in 2007 and a Ph.D. in Security Engi-
neering from University of Limerick (UL) in 2013 funded by
the Irish Research Council for Science Engineering and Tech-
nology (IRCSET). She worked as a postdoctoral researcher
at UL as a member of the Data Communication Security
Laboratory and as a Software Engineer in IBM in Dublin in
the area of data security and formal verification. She has
recently acted as an evaluator of H2020 proposals for the
Cryptography and Cybersecurity call. Dr. Jurcut’s research
interests include Security Protocols Design and Analysis,
Mathematical Modeling, Automated Techniques for Formal
Verification, Cryptography, Computer Algorithms, Security
for Internet of Things and Blockchain Security. Much of
her work has focused on formal verification techniques
for security protocols using deductive reasoning methods
(modal logics and theorem proving); automation of logics
for formal verification; the development of new logic-based
techniques and tools for formal verification; the design and
analysis of security protocols; formalization and modeling
of design requirements for security protocols.