Employing chunk size adaptation to overcome concept drift
Jędrzej Kozal, Filip Guzy, and Michał Woźniak
Wrocław University of Science and Technology
{jedrzej.kozal,filip.guzy,michal.wozniak}@pwr.edu.pl
Abstract
Modern analytical systems must be ready to process streaming data and correctly respond to changes in data distribution. The phenomenon of such changes is called concept drift, and it may harm the quality of the models in use. Moreover, the possibility of concept drift means that the algorithms employed must be ready to continuously adapt the model to changing data distributions. This work focuses on non-stationary data stream classification with a classifier ensemble. To keep the ensemble model up to date, new base classifiers are trained on the incoming data blocks and added to the ensemble, while outdated models are removed at the same time. One of the problems with this type of model is achieving a fast reaction to changes in the data distribution. We propose the new Chunk-Adaptive Restoration framework, which can be adapted to any block-based data stream classification algorithm. The proposed algorithm adjusts the data chunk size when concept drift is detected, to minimize the impact of the change on the predictive performance of the model. The conducted experimental research, backed by statistical tests, confirms that Chunk-Adaptive Restoration significantly reduces the model's restoration time.
1 Introduction
Data stream mining focuses on knowledge extraction from streaming data, mainly on constructing predictive models that assign arriving instances to one of the predefined categories. This process faces additional difficulties that arise when the data distribution evolves over time. This is visible in many practical tasks, such as spam detection, where spammers keep changing the message format to fool anti-spam systems. Another example is medical diagnostics, where new SARS-CoV-2 mutations may cause different symptoms, forcing doctors to adapt and improve diagnostic methods [82].
The phenomenon mentioned above is called concept drift, and its nature can vary in both character and rapidity. It forces classification models to adapt to new data characteristics and to forget old, useless concepts. An important characteristic of such systems is their reaction to the concept drift phenomenon, i.e., how much predictive performance deteriorates when drift occurs and how quickly the classification system regains acceptable predictive quality for the new concept. We should also consider another limitation: the classification
system should be ready to classify incoming objects immediately, and dedicated
computing and memory resources are limited.
Data processing models used by stream data classification systems can be roughly divided into two categories: online (object-by-object) processing (online learners) and block-based (chunk-by-chunk) processing (block-based learners) [27]. Online learners require model parameters to be updated when a new object
appears, while the block-based method requires updates once per batch. The
advantage of online learners is their fast adaptation to concept drift. However,
in many practical applications, the effort of necessary computation (related to
updating models after processing each object) is unacceptable. The model update
can require many operations that involve changing data statistics, updating
the model’s internal structure, or learning a new model from scratch. These
requirements can become prohibitive for high-velocity streams. Hence, block-based data processing, which requires less computational effort, is more popular.
However, it limits the model’s potential for quick adaptation to changes in
data distribution and fast restoration of performance after concept drift. In
consequence, a significant problem is the proper selection of the chunk size.
Smaller data block size results in faster adaptation. However, it increases the
overall computing load. On the other hand, larger data chunks require less
computation but result in a lower adaptive capacity of the classification model.
Another valid consideration is the impact of chunk size on prediction stability.
Models trained on smaller chunks typically have larger prediction variance, while
models trained with larger chunks tend to have more stable predictions when
the data stream is stationary. If concept drift occurs, a larger chunk increases the probability that data from different concepts will be placed in the same batch. Hence, selecting the chunk size is a trade-off encompassing computational power, adaptation speed, and prediction variance.
The trade-off described above involves features that are equally desired in many applications. In particular, computational cost and adaptation speed are both important when processing large data streams. We propose a new method that alleviates the drawbacks of choosing between small and large chunk sizes by dynamically changing the current batch size. More precisely, our
work introduces the Chunk-Adaptive Restoration (CAR), a framework based on
combined drift and stabilization detection techniques that adjusts the chunk sizes
during concept drift. This approach slightly redefines the previous problem, based on the observation that, for many practical classification tasks, a period of changes in data distributions is followed by stabilization. Hence, we propose that when concept drift occurs, the model should be updated quickly, i.e., the data should be processed in small chunks, while during the stabilization period the data block size may be extended. The advantage of the proposed method
is its universality and the possibility of using it with various chunk-based data
stream classifiers.
This work offers the following contributions:
- Proposing the Chunk-Adaptive Restoration framework to enable fluent restoration after concept drift appears.
- Formulating the Variance-based Stabilization Detection Method, a technique complementary to any concept drift detector that simplifies chunk size adaptation and metric calculation.
- Employing Chunk-Adaptive Restoration for adaptive data chunk size setting in selected state-of-the-art algorithms.
- Introducing a new stream evaluation metric, Sample Restoration, to show the gains of the proposed methods.
- Experimental evaluation of the proposed approach on various synthetic and real data streams, and a detailed evaluation of its usefulness for the selected state-of-the-art methods.
2 Related works
This section provides a review of the related works. First, we discuss challenges specific to learning from non-stationary data streams. Next, we discuss different methods of processing data streams. We then describe existing drift detection algorithms and ensemble methods. We continue by reviewing existing evaluation protocols as well as computational and memory requirements. We conclude this section with examples of other data stream learning methods that employ a variable chunk size.
2.1 Challenges related to data stream mining
A data stream is a sequence of objects described by their attributes. In the case of a classification task, each learning object should be labeled. The number of items may be vast, potentially infinite. Observations in the stream may arrive at different times, and the time intervals between their arrivals can vary considerably. The main differences between analyzing data streams and static datasets include [56]:
- No one can control the order of incoming objects.
- Computational resources are limited, yet the analyzer should be ready to process each incoming item in a reasonable time.
- Memory resources are also limited, while the data stream may be huge or infinite, which makes memorizing all the items impossible.
- Data streams are susceptible to change, i.e., data distributions may change over time.
- Labels of arriving items are not free; in some cases they are impossible to obtain or arrive with delay (e.g., in banking, labels for a credit approval task arrive after a few years).
Canonical classifiers usually do not consider that the probabilistic characteristics of the classification task may evolve [65]. Such a phenomenon is known as concept drift [30], and a few concept drift taxonomies have been proposed. The most popular considers how rapid the drift is, distinguishing sudden drift from incremental drift. An additional difficulty arises when, during the transition between two concepts, objects from the two different concepts appear simultaneously for some time (gradual drift). We can also take into consideration the influence of the probabilistic characteristics on the classification task [33]:
- virtual concept drift does not impact the decision boundaries but affects the probability density functions [66]; Widmer and Kubat [30] attributed it to incomplete data representation rather than to true changes in concepts,
- real concept drift affects the posterior probabilities and may impact the unconditional probability density function [30].
2.2 Methods for processing data streams
The data stream can be divided into small portions of data called data chunks. This method is known as batch-based or chunk-based learning. Choosing the proper chunk size is crucial because it may significantly affect classification performance [54]. Unfortunately, the unpredictable appearance of concept drift makes this difficult. Several approaches may help overcome this problem, e.g., using different windows for processing data [68] or adjusting the chunk size dynamically [30]. Unfortunately, most chunk-based classification methods assume that the data chunk size is set a priori and remains unchanged during data processing.
Instead of chunk-based learning, the algorithm can also learn incrementally (online). Training examples arrive one by one at a given time, and they are not kept in memory. The advantage of this solution is its small memory footprint. However, the effort of the computation needed to update models after processing each individual object can be unacceptable, especially for high-velocity data streams, e.g., in Internet of Things (IoT) applications.
When processing a non-stationary data stream, we can rely on a drift detector to pinpoint moments when the data distribution has changed and take appropriate actions. The alternative is to rely on the inherent adaptation properties of the models (update & forget). In the following subsections, we discuss both of these approaches.
2.3 Drift detection methods
A drift detector is an algorithm that can inform about changes taking place within data stream distributions. Data labels or a classifier's performance (measured using any metric, such as accuracy) are required to detect a real concept drift [69]. We have to realize that drift detection is a non-trivial task. Detection should be done as quickly as possible to replace an outdated model and minimize restoration time. On the other hand, false alarms are unacceptable, as they lead to incorrect model adaptation and spending resources where there is no need for it [70]. DDM (Drift Detection Method) [71] is one of the most popular detectors; it incrementally estimates the error of a classifier. Because we assume the convergence of the classifier training method, the error should decrease with the appearance of subsequent learning objects [72]. If the opposite behavior is observed, we may suspect a change in the probability distributions. DDM uses the three-sigma rule to detect a drift. EDDM (Early Drift Detection Method) [73] is an extension of DDM, where the window size selection procedure is based on the same heuristics; additionally, the distance error rate is used instead of the classifier's error rate. Blanco et al. [74] proposed very interesting drift detectors that use non-parametric estimation of the classifier error, employing Hoeffding's and McDiarmid's inequalities.
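For illustration, below is a minimal Python sketch of a DDM-style detector; the class name and structure are ours, and it omits the warning level and minimum-instance safeguards of the full method [71]:

    import math

    class SimpleDDM:
        """DDM-style detector sketch: track the classifier's running error
        rate and signal drift with the three-sigma rule."""

        def __init__(self):
            self.n = 0                  # processed predictions
            self.p = 1.0                # running error-rate estimate
            self.p_min = float("inf")   # lowest error level seen so far
            self.s_min = float("inf")   # its standard deviation

        def update(self, error):
            """error=True if the current example was misclassified.
            Returns True when drift is detected."""
            self.n += 1
            self.p += (float(error) - self.p) / self.n  # incremental mean
            s = math.sqrt(self.p * (1.0 - self.p) / self.n)
            if self.p + s < self.p_min + self.s_min:    # new best level
                self.p_min, self.s_min = self.p, s
            # three-sigma rule: error grew far above its recorded minimum
            return self.p + s >= self.p_min + 3.0 * self.s_min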
2.4 Ensemble methods
One of the most promising research directions in data stream classification, which usually employs chunk-based data processing, is the classifier ensemble approach [27]. Its advantage is that a classifier ensemble can easily adapt to concept drift using different updating strategies [60]:
- Dynamic combiners – individual classifiers are trained in advance and are not updated anymore; the ensemble adapts to changing data distribution by changing the combination rule parameters.
- Updating training data – incoming examples are used to retrain component classifiers (e.g., online bagging [62]).
- Updating ensemble members [64, 67].
- Changing the ensemble lineup – replacing outdated classifiers in the ensemble, e.g., new individual models are trained on the most recent data and added to the ensemble, while an ensemble pruning procedure chooses the most valuable set of individual classifiers [61].
A comprehensive overview of techniques using classifier ensembles was presented by Krawczyk et al. [27]. Let us briefly characterize some popular strategies used during the experiments. The Streaming Ensemble Algorithm (SEA) [6] is a simple classifier ensemble with a changing lineup, where the individual classifiers are trained on successive data chunks. To keep the model up to date, the base classifiers with the lowest accuracy are removed from the ensemble. Wang et al. proposed the Accuracy Weighted Ensemble (AWE) [76], employing weighted voting rules, where the weights depend on the accuracy obtained on the testing data. Brzezinski and Stefanowski proposed the Accuracy Updated Ensemble (AUE), which extends AWE by using online classifiers and updating them according to the current distribution [4]. Wozniak et al. developed the Weighted Aging Ensemble (WAE), which trains base classifiers on successive data chunks and makes the final decision by weighted voting, where the weights depend on accuracy and ensemble diversity. This algorithm additionally employs a decoy function to decrease the weights of outdated individuals [76].
2.5 Existing evaluation methodology
Because this work mainly focuses on improving classifier behavior after concept drift appears, apart from the classifier's predictive performance we should also consider memory consumption, the time required to update the model, and the decision time. However, it should also be possible to evaluate how the model reacts to changes in the data distribution. Shaker and Hüllermeier [41] presented a complete framework for evaluating the recovery rate, including the proposition of two metrics: restoration time and maximum performance loss. In this framework, the notion of pure streams was introduced, i.e., streams containing only one concept. Two pure streams $S_A$ and $S_B$ are mixed into a third stream $S_C$, starting with concepts only from the first stream and gradually increasing the percentage of concepts from the second stream. Restoration time was defined as the length of the time interval between two events: first, the performance measured on $S_C$ drops below 95% of the $S_A$ performance, and then the performance on $S_C$ rises above 95% of the $S_B$ performance. The maximum performance loss is the maximum difference between the $S_C$ performance and the lowest performance on either $S_A$ or $S_B$. Zliobaite et al. [75] proposed that evaluating the profit from a model update should take into account the memory and computing resources involved in the update.
2.6 Computational and memory requirements
While designing a data stream classifier, we should also consider computational power and memory limitations, as well as the fact that we usually have limited access to data labels. These data stream characteristics call for algorithms other than those previously developed for batch learning, where data is stored infinitely and persistently. Such learning algorithms cannot fulfill all data stream requirements, such as memory usage constraints, limited processing time, and a single scan of incoming examples. Moreover, simple incremental learning is usually insufficient, as it does not meet tight computational demands and does not tackle the evolving nature of data sources [58].
Constraints on memory and time have resulted in different windowing techniques, sampling (e.g., reservoir sampling), and other summarization approaches. We also have to realize that when concept drift appears, data from the past may become irrelevant or even harmful to the current models, deteriorating the predictive performance of the classifiers. Thus, an appropriate implementation of a forgetting mechanism (where old data instances are discarded) is crucial.
2.7 Other approaches that modify chunk size
Dynamic chunk size adaptation has been proposed in earlier works [79, 80, 81]. Liu et al. [79] utilize information about the occurrence of drift provided by a drift detector. If drift occurs in the middle of a chunk, the data is divided into two chunks, hence the dynamic chunk size. If there is no drift inside the chunk, the whole batch is used. In the prepared chunk, the majority class is undersampled. A new classifier is trained and added to the ensemble, and the older classifiers are updated. Lu et al. [80] also utilize an ensemble framework for imbalanced stream learning. In this approach, the chunk size grows incrementally. Two chunks are compared based on the variance of the ensemble's predictions; an algorithm for computing this prediction variance, called subunderbagging, is introduced. The computed variances are compared using an F-test. The chunk size increases if the p-value is less than a predefined threshold; otherwise, the whole ensemble is updated with the selected chunk size. The whole process repeats as long as the p-value stays below the threshold. In both of these works, dynamic chunk size was used as a means of handling imbalanced data streams. In contrast, we show that changing the chunk size can be beneficial for handling concept drift in general. Therefore, we do not focus primarily on imbalanced data.
Bifet et al. [81] introduced a method for handling concept drift with varying chunk sizes. Each incoming chunk is divided into two parts: older and newer. The empirical means of the data in each subchunk are compared using the Hoeffding bound. If the difference between the two means exceeds a threshold defined by a confidence value, the data in the older window is deemed out of date and dropped. The window with data from the current concept then grows until the next drift is detected and the data is split again. This approach allows for detecting drift inside a chunk.
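To make this test concrete, here is a minimal sketch (our simplification, not the exact procedure of [81]) that compares the empirical means of the two subwindows against a Hoeffding-style bound:

    import math

    def older_window_outdated(older, newer, delta=0.002):
        """Return True if the older subwindow should be dropped.
        Values are assumed to be scaled to [0, 1]; delta is the
        confidence parameter of the Hoeffding bound."""
        n0, n1 = len(older), len(newer)
        if n0 == 0 or n1 == 0:
            return False
        mean0 = sum(older) / n0
        mean1 = sum(newer) / n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic mean of the sizes
        eps = math.sqrt(math.log(4.0 / delta) / (2.0 * m))
        return abs(mean0 - mean1) > eps  # means differ significantly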
3 Methods
This paper presents a general framework that can be used for training any chunk-based classifier ensemble. The approach aims to reduce the restoration time, i.e., the period needed to stabilize the performance of the classification model after concept drift occurs. As mentioned, most methods assume a fixed data chunk size, which is a parameter of these algorithms. Our proposal does not modify the core of the learning algorithm itself; based on the predictive performance estimated on a given data chunk, it only indicates what chunk size should be used by the algorithm in the next step. We provide a schema of our method in Fig. 1. Intuitively, after concept drift occurs, the chunk size should be small, so that new models are quickly trained to replace, in the ensemble, the models learned on data from the previous concept. When stabilization is reached, the ensemble contains base models trained on data from the new concept. At this moment, we can extend the chunk size, so that the classifiers in the ensemble can achieve better performance and even greater stability by learning on larger portions of data from the stream, because the analyzed concept is already stable.
Figure 1: Chunk-Adaptive Restoration visualization. The red line marks the concept drift; the green line marks the stabilization.
Let us present the proposed framework in detail.
3.1 Chunk-Adaptive Restoration
Starting the learning process, we sample data from the stream with a constant chunk size $c$ and monitor the classifier's performance using a concept drift detector to detect changes in data distribution. When drift occurs, we decrease the chunk size to a smaller value $c_d < c$, where $c_d$ is the predefined batch size used after concept drift. The sizes of subsequent chunks after drift at a given time $t$ are computed using the following equation:

$c_t = \min(\lfloor \alpha \, c_{t-1} \rfloor, c)$    (1)

where $\alpha > 1$. The chunk size grows with each step to reach the original value $c$, unless stabilization is detected first; in that case the chunk size is set to $c$ immediately. Let us introduce the Variance-based Stabilization Detection Method (VSDM) to detect the stabilization of predictive performance. First, we define a fixed-size sliding window $W$ containing the last $K$ values of the predictive performance metric obtained for the most recent chunks. We also introduce a stabilization threshold $s$. Stabilization is detected when the following condition is met:

$\mathrm{Var}(W) < s$    (2)

where $\mathrm{Var}(W)$ is the variance of the scores obtained for the last $K$ chunks. A sample data stream with detected drift and stabilization is presented in Fig. 2. The primary assumption of the proposed method is faster model adaptation, caused by the increased number of updates after a concept drift. This strategy allows for using larger chunk sizes when the data is not changing, which also reduces the computational cost of retraining models. Alg. 1 presents the whole procedure. Our method works with existing models for online learning. For this reason, we argue that the approach proposed in this paper is easy to deploy in practice.
Figure 2: Exemplary accuracy for a data stream with an abrupt concept drift. The red line denotes drift detection, the green line stabilization detection, and the blue line the beginning of the real drift.
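Before the formal listing (Alg. 1), the control flow defined by Eqs. (1) and (2) can be sketched in a few lines of Python; the class name ChunkScheduler and its structure are ours, and the drift signal may come from any detector, e.g., FHDDM:

    import numpy as np

    class ChunkScheduler:
        """Sketch of CAR's chunk-size control: shrink to c_d on drift,
        grow by factor alpha each chunk (Eq. 1), and snap back to c
        once VSDM detects stabilization (Eq. 2)."""

        def __init__(self, c=1000, c_d=30, alpha=1.1, K=30, s=1e-4):
            self.c, self.c_d, self.alpha = c, c_d, alpha
            self.K, self.s = K, s    # VSDM window length and threshold
            self.ct = c              # current chunk size
            self.scores = []         # sliding window W of recent scores

        def stabilized(self):
            # Eq. (2): variance of the last K scores below threshold s
            return (len(self.scores) >= self.K
                    and np.var(self.scores[-self.K:]) < self.s)

        def next_chunk_size(self, score, drift_detected):
            self.scores.append(score)
            if self.stabilized():
                self.ct = self.c                                  # restore at once
            else:
                self.ct = min(int(self.alpha * self.ct), self.c)  # Eq. (1)
            if drift_detected:
                self.ct = self.c_d                                # shrink after drift
            return self.ct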
3.2 Memory and time complexity
Our method only impacts the chunk size. All other factors, like the number of features or the number of classifiers in the ensemble, are the same as in the basic approach. For this reason, we focus here only on the impact of the chunk size on memory and time complexity. Regarding memory, our method could impact only the size of the buffers for storing samples from the stream. When no drift is detected, the standard chunk size is used, which dictates the required buffer size. Therefore, the memory complexity for storing samples is $O(c)$.
CAR works in the same way as the base method when no drift is detected and the data stream is stable. Therefore, in this case, the time complexity is the same as for the base method. When drift is detected, the sizes of subsequent chunks
Algorithm 1 Chunk-Adaptive Restoration algorithm
Input: m – model
S – data stream
dd – drift detector
sd – stabilization detector
n – number of chunks
t – chunk index
c – base chunk size
c_d – base drift chunk size
c_t – t-th chunk size
test() – procedure that tests the model on a chunk and returns the predictive performance metric (ppm)
train() – procedure that trains the model on a chunk
change_detected() – procedure that informs about drift occurrence, using the drift detector and the last score
stabilization_detected() – procedure that detects stabilization, using the stabilization detector and the stabilization window
1: for t = 1 to n do
2:   ppm ← test(m, S(t))
3:   if stabilization_detected(sd, ppm) then
4:     c_t ← c
5:   else
6:     c_t ← min(⌊α c_{t-1}⌋, c)
7:   end if
8:   if change_detected(dd, ppm) then
9:     c_t ← c_d
10:  end if
11:  train(m, S(t))
12: end for
are changed. The time complexity depends on the model complexity $g(N)$, where $N$ is the number of learning examples the model is trained on. For simplicity, we assume that $g(N)$ represents both the ensemble and the base model complexity. Under these assumptions, the time complexity of the base method (when CAR is not enabled) is $O(g(c))$. When CAR is enabled and concept drift is detected, the chunk size is changed to $c_d$. Each consecutive chunk at time $t$ has size $c_t = \alpha^t c_d$, with $t = 0$ directly after the drift was detected. The chunk size grows until stabilization is detected or the chunk size is restored to its original value $c$. For simplicity, we skip the case when stabilization is detected. With this assumption, the condition for restoring the original chunk size is:

$\alpha^{t_s} c_d = c$    (3)

where $t_s$ is the time when the chunk size is restored to its original value. From this equation we obtain $t_s$ directly:

$t_s = \log_{\alpha}\frac{c}{c_d}$    (4)

The number of operations required by CAR after concept drift is detected is

$\sum_{t=0}^{t_s} g(\alpha^t c_d)$    (5)

Using big-O notation:

$O\left(\sum_{t=0}^{t_s} g(\alpha^t c_d)\right) = O\left(g(\alpha^{t_s} c_d)\right) = O\left(g\left(\frac{c}{c_d} c_d\right)\right) = O(g(c))$    (6)

Therefore, the CAR time complexity depends only on the chunk size and the computational complexity of the models used.
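For illustration, plugging the values used later in our experiments ($c = 1000$, $c_d = 30$, $\alpha = 1.1$) into Eq. (4) gives $t_s = \log_{1.1}(1000/30) \approx 36.8$, i.e., the original chunk size is restored after roughly 37 chunks, unless stabilization is detected earlier.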
3.3 Sample Restoration
Restoration time cannot be directly utilized in this work, as we do not have access to pure streams with separate concepts. For this reason, we introduce a new metric, Sample Restoration (SR), to evaluate the performance of Chunk-Adaptive Restoration against the standard methods used for learning models on data streams with concept drift. We assume there is a sequence of $N$ chunks between two stabilization points. Each element of this sequence is characterized by the chunk size $c_t$ and the model's achieved accuracy $acc_t$. Let us define the index of the minimum accuracy as:

$t_{min} = \operatorname*{argmin}_{t \in [0, N)} acc_t$    (7)

The restoration threshold is then given by the following formula:

$r = p \times \max_{t \in [t_{min}, N)} acc_t$    (8)

where $p \in (0, 1)$ is the percentage of the performance that has to be restored, and the second factor is the maximum accuracy score of the model after the point where it achieved its minimum score. Finally, we look for the lowest index $t_r$ at which the model exceeds the assumed restoration threshold:

$t_r = \inf_{t \in [t_{min}, N)} \{\, t : acc_t \geq r \,\}$    (9)

Sample Restoration is computed as the sum of chunk sizes from the beginning of the concept drift to $t_r$:

$SR(p) = \sum_{t=0}^{t_r} c_t$    (10)

In general, SR is the number of samples needed to regain $p$ percent of the maximum performance achieved on the subsequent task.
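The metric is straightforward to compute from the per-chunk accuracies and chunk sizes recorded between two stabilization points; a sketch with our variable names:

    import numpy as np

    def sample_restoration(acc, chunk_sizes, p=0.8):
        """Sample Restoration, Eqs. (7)-(10): the number of samples
        processed until accuracy regains p times the maximum accuracy
        reached after its minimum."""
        acc = np.asarray(acc)
        t_min = int(np.argmin(acc))                  # Eq. (7)
        r = p * acc[t_min:].max()                    # Eq. (8)
        recovered = np.nonzero(acc[t_min:] >= r)[0]  # indices reaching r
        t_r = t_min + int(recovered[0])              # Eq. (9)
        return int(np.sum(chunk_sizes[:t_r + 1]))    # Eq. (10)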
4 Experiment
Chunk-Adaptive Restoration is a method designed to reduce the number of samples needed to restore the model's performance after concept drift. We expect it to significantly reduce the Sample Restoration of each trained model, depending on the chunk size adaptation level. The experimental study was designed to answer the following research questions:
RQ1: How do different chunk sizes impact predictive performance?
RQ2: How does the Chunk-Adaptive Restoration influence the learning process?
RQ3: How many samples can be saved during the restoration phase?
RQ4: How do different classifier ensemble models behave with Chunk-Adaptive Restoration applied?
RQ5: How robust to noise is Chunk-Adaptive Restoration?
4.1 Experiment setup
Data streams.
Experiments were carried out using both synthetic and real datasets. The stream-learn library [1] was employed to generate the synthetic data containing three types of concept drift: abrupt, gradual, and incremental, each generated with recurring or unique concepts. For each type of concept drift, we tested parameters such as chunk sizes and stream length. All streams were generated with 5 concept drifts, 2 classes, and 20 input features, of which 2 were informative and 2 were redundant. In the case of incremental and gradual drifts, the concept sigmoid spacing was set to 5. Apart from the synthetic streams, we employed the Usenet [2] and Insects [10] data streams. Unfortunately, the original Usenet dataset contains a small number of samples, so two selected concepts were repeated to create a recurring-drift data stream. Each chunk of the Insects data stream was randomly oversampled because of the significant imbalance ratio. Tab. 1 contains a detailed description of all utilized data streams.
#   Source   Drift type   Base chunk size c   #samples
1 stream-learn abrupt recurring 500 300000
2 stream-learn abrupt recurring 1000 150000
3 stream-learn abrupt recurring 10000 60000
4 stream-learn abrupt recurring 500 250000
5 stream-learn abrupt nonrecurring 500 300000
6 stream-learn abrupt nonrecurring 1000 150000
7 stream-learn abrupt nonrecurring 10000 60000
8 stream-learn abrupt nonrecurring 500 250000
9 stream-learn gradual recurring 500 300000
10 stream-learn gradual recurring 1000 150000
11 stream-learn gradual recurring 10000 60000
12 stream-learn gradual recurring 500 250000
13 stream-learn gradual nonrecurring 500 300000
14 stream-learn gradual nonrecurring 1000 150000
15 stream-learn gradual nonrecurring 10000 60000
16 stream-learn gradual nonrecurring 500 250000
17 stream-learn incremental recurring 500 300000
18 stream-learn incremental recurring 1000 150000
19 stream-learn incremental recurring 10000 60000
20 stream-learn incremental recurring 500 250000
21 stream-learn incremental nonrecurring 500 300000
22 stream-learn incremental nonrecurring 1000 150000
23 stream-learn incremental nonrecurring 10000 60000
24 stream-learn incremental nonrecurring 500 250000
25 usenet abrupt recurring 1000 120000
26 insects-abrupt-imbalanced abrupt nonrecurring 1000 355275
27 insects-gradual-imbalanced gradual nonrecurring 1000 143323
Table 1: Data streams used for experiments.
Drift detector.
The Fast Hoeffding Drift Detection Method (FHDDM) [74] was employed as the concept drift detector. We used the implementation available in a public repository [78]. The window size in FHDDM was equal to 1000, and the allowed error probability was δ = 0.000001.
Classifier ensembles.
Three classifier ensemble models dedicated to data stream classification were chosen for comparison:
- Weighted Aging Ensemble (WAE) [76],
- Accuracy Weighted Ensemble (AWE) [3],
- Streaming Ensemble Algorithm (SEA) [6].
All ensembles contained 10 base classifiers.
Experimental protocol.
In our experiments, we apply the models mentioned above to the selected data streams with concept drift and measure Sample Restoration. These results are reported as a baseline. Next, we apply Chunk-Adaptive Restoration and repeat the experiments to establish the proposed method's influence on the ability to handle concept drift quickly. As the experiments were conducted on balanced data, accuracy was used as the only indicator of the model's performance. The Test-Then-Train experimental protocol was employed [77].
Statistical analysis.
Because Sample Restoration can be computed for each drift, and concept drift can occur multiple times, we report the average Sample Restoration with its standard deviation for each stream. To assess the statistical significance of the results, we used a one-sided Wilcoxon signed-rank test in a direct comparison between the models, with a 95% confidence level.
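For reference, such a comparison can be run with scipy.stats.wilcoxon; the paired Sample Restoration values below are illustrative placeholders, not results from the paper:

    from scipy.stats import wilcoxon

    # Paired average Sample Restoration per stream: CAR vs. baseline
    # (illustrative numbers only).
    sr_car = [21000, 35000, 48000, 52000, 60000]
    sr_baseline = [30000, 41000, 55000, 70000, 64000]

    # One-sided test: does CAR yield lower Sample Restoration?
    stat, p_value = wilcoxon(sr_car, sr_baseline, alternative="less")
    print(stat, p_value, p_value < 0.05)  # significant at 95% confidence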
Reproducibility.
To enable independent reproduction of our experiments, we provide a GitHub repository with the code (https://github.com/w4k2/chunk-adaptive-restoration). This repository also contains detailed results of all experiments. The stream-learn [1] implementation of the ensemble models was utilized, with Gaussian Naïve Bayes and CART from sklearn [83] as base classifiers. Detailed information about the packages used is provided in the yml file with the specification of the conda environment.
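As an illustration of this setup, the snippet below sketches one baseline run with stream-learn's generator, the SEA ensemble, and the Test-Then-Train evaluator; the parameter names follow the stream-learn API as we know it and may differ between library versions:

    import strlearn as sl
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    # Synthetic stream roughly matching row 2 of Tab. 1: recurring abrupt
    # drift, 150 chunks of 1000 samples, 5 drifts, 20 features
    # (2 informative, 2 redundant).
    stream = sl.streams.StreamGenerator(
        n_chunks=150, chunk_size=1000, n_drifts=5, recurring=True,
        n_classes=2, n_features=20, n_informative=2, n_redundant=2)

    clf = sl.ensembles.SEA(GaussianNB(), n_estimators=10)
    evaluator = sl.evaluators.TestThenTrain(metrics=(accuracy_score,))
    evaluator.process(stream, clf)
    print(evaluator.scores.shape)  # (classifiers, evaluated chunks, metrics)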
4.2 Impact of chunk size on performance
In our first experiment, we examine the impact of the chunk size on model performance and on the general capability of handling data with concept drift. We train the AWE model on a synthetic data stream with different chunk sizes to evaluate these properties. The stream consists of 20 features and 2 classes, and it contains a single abrupt drift. Results are presented in Fig. 3. As expected, the chunk size has an impact on the maximum accuracy that the model can achieve. This is especially visible before the drift, where models with larger chunks obtain the best accuracy. Also, with larger chunks, the variance of the accuracy is lower. In ensemble-based approaches, a base classifier is trained on a single chunk; a larger chunk means that more data is available to the underlying model, which allows for training a more accurate model. Interestingly, we can see that for all chunk sizes, performance is restored at roughly the same time: regardless of the chunk size, a similar number of updates is required to bring back the model's performance. Please keep in mind that the x-axis in Fig. 3 is the number of
chunks. It means that models trained on larger chunks require a larger number
of learning examples to restore accuracy.
Figure 3: Impact of chunk size on obtained accuracy.
These results give the rationale behind our method. When drift is detected, we decrease the chunk size to reduce the number of learning examples consumed while restoring accuracy. Next, we gradually increase the chunk size to raise the maximum possible performance once the model recovers from the drift. This allows for a quick reaction to drift without limiting the model's maximum performance. In principle, not all models are compatible with a changing chunk size. Also, the batch size cannot be decreased indefinitely; the minimal chunk size should be determined case by case, depending on the base learner used in the ensemble or, more generally, on the model used. Later in our experiments, we use chunk sizes of 500, 1000, and 10000 to obtain a reliable estimate of how our method performs in different settings.
4.3 Hyperparameter tuning
After the chunk size was selected, we fine-tuned the remaining hyperparameters and then proceeded to further experiments. First, we set two values manually, based on our observations: α (the constant that determines how fast the chunk size grows after drift is detected) was set to 1.1, and the drift chunk size was set to 30, as this is a typical window length in drift detectors.
Next, we searched for the best stabilization window size and stabilization threshold. We conducted a grid search with window sizes 30, 50, and 100, and stabilization thresholds 0.1, 0.01, 0.001, and 0.0001. For these experiments we used the synthetic data streams 1-24 from Tab. 1. The data streams used different random number generator seeds in this and the later experiments. Results were collected for the WAE, AWE, and SEA ensembles with a Naïve Bayes base model. We use Sample Restoration 0.8 as the performance indicator. For each set of parameters, Sample Restoration was averaged over all streams to obtain a single value. Results are provided in Tab. 2.
stabilization    stabilization window size
threshold        30          50          100
0.1              59210.11    59210.11    59210.11
0.01             58489.47    58675.99    58709.98
0.001            55328.20    55363.95    57669.70
0.0001           52846.04    55962.58    62398.56
Table 2: Sample Restoration 0.8 for various hyperparameter settings. Lower is better.
From the provided data, we can conclude that the smaller the stabilization window size, the lower the SR. This observation is in line with the intuition behind our method: a smaller drift chunk size provides a larger benefit during drift compared to the normal chunk size. The same dependency can be observed for the stabilization threshold. Intuitively, a lower threshold means that stabilization is harder to reach. We argue that this can be beneficial in some cases when working with gradual or incremental drift. In this scenario, if stabilization is reached too quickly, the chunk size is immediately brought back to the standard size, and there is no benefit from the smaller chunk size at all. Lowering the stabilization threshold can help in these cases. In the later experiments, we use a stabilization window size of 30 and a variance stabilization threshold of 0.0001.
4.4 Impact on concept drift handling capability
In this part of the experiments, we compare the performance of the proposed method to the baseline. Results were collected following the experimental protocol described in the previous sections. To save space, we do not provide results for all models and streams; instead, we plot the accuracy achieved by the models on selected data streams. These results are presented in Figs. 4, 5, 6, and 7. All learning curves were smoothed using a 1D Gaussian filter with σ = 1.
From the provided plots, we can deduce that the largest gains from employing the CAR method are observed for abrupt data streams. In streams with gradual and incremental drifts, there are fewer (or no) sudden drops in accuracy that the model can quickly react to. For this reason, the CAR method does not provide a large benefit for these kinds of concept drift. During a more detailed analysis of the obtained results, we observed that stabilization for gradual and incremental drifts is hard to detect. Many false positives usually cause an early return to the original chunk size, influencing the performance achieved on those two types of drift. FHDDM caused another problem related to the early detection of gradual and incremental concept drifts. Usually, early detection is a desired feature; in our method, however, it initiates the chunk size change while two data concepts still overlap in the stream. As the transition between two concepts takes much time, once one concept starts to dominate, the chunk size may be restored to its original value too early, affecting the achieved results.
We also observe larger gains from applying CAR on streams with a bigger chunk size; to illustrate, compare the results in Fig. 4 with those in Fig. 5. One possible explanation of this trend is that the gains obtained from employing CAR are proportional to the difference in size between the base and drift chunk sizes. In our experiments, the drift chunk size was equal to 30 for all streams and models. This explanation is also in line with the results of the hyperparameter experiments provided in Tab. 2.
We conclude this section with a statistical analysis of our results. Tab. 3 shows the results of the Wilcoxon test for the Naïve Bayes and CART base models. We found significant differences in Sample Restoration between the baseline and the CAR method for all models.
Figure 4: Accuracy for stream-learn
data stream (1).
Figure 5: Accuracy for Usenet dataset
(25).
Figure 6: Accuracy for abrupt Insects
dataset (26).
Figure 7: Accuracy for gradual Insects
dataset (27).
Naïve Bayes
model    SR(0.9)               SR(0.8)               SR(0.7)
name     Statistic  p-value    Statistic  p-value    Statistic  p-value
WAE      40.0       0.0006     30.0       0.0002     45.0       0.0009
AWE      22.0       9.675e-05  26.0       0.0001     36.0       0.0004
SEA      0.0        1.821e-05  23.0       0.0001     1.0        1.389e-05

CART
model    SR(0.9)               SR(0.8)               SR(0.7)
name     Statistic  p-value    Statistic  p-value    Statistic  p-value
WAE      14.0       6.450e-05  54.0       0.003      55.0       0.003
AWE      0.0        1.229e-05  6.0        2.543e-05  21.0       0.0001
SEA      23.0       0.0001     43.0       0.001      42.0       0.001

Table 3: Wilcoxon test results.
4.5 Impact of noise on the CAR effectiveness
Real-world data often contains label noise. For this reason, we evaluate whether the proposed method can be used on data with varying amounts of noise in the labels. We generate a synthetic data stream with two classes, a base chunk size of 1000, a drift chunk size of 100, and a single abrupt concept drift. We randomly select a predefined fraction of the samples in each chunk and flip the labels of the selected learning examples. Next, we measure the accuracy of the AUE model with a Gaussian Naïve Bayes base model on the generated dataset with noise levels 0, 0.1, 0.2, 0.3, and 0.4. Results are presented in Fig. 8. We note that for low levels of noise, i.e., up to 0.3, the restoration time is shorter. With a larger amount of noise, there is no sudden drop in accuracy; therefore, CAR has no impact on the speed of reaction to drift.
It should be noted that the results for CAR with noise levels 0.2, 0.3, and 0.4 were generated with the stabilization detector turned off. With a higher amount of noise, stabilization was detected very quickly, so the chunk size was rapidly brought back to the base value; in this case, there was no benefit from applying CAR. This indicates that the stabilization method should be refined to handle noisy data well.
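The label-flipping procedure itself is simple; a sketch of flipping a fixed fraction of binary labels in one chunk (our helper, not code from the paper's repository):

    import numpy as np

    def flip_labels(y, noise_level, seed=42):
        """Flip a random fraction (noise_level) of binary {0, 1} labels."""
        rng = np.random.default_rng(seed)
        y_noisy = y.copy()
        n_flip = int(noise_level * len(y))
        idx = rng.choice(len(y), size=n_flip, replace=False)
        y_noisy[idx] = 1 - y_noisy[idx]
        return y_noisy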
4.6 Lessons learned
First, we evaluated the impact of chunk size on the learning process for a data stream with a single concept drift. We learned that models with a larger chunk size can obtain a higher maximum accuracy, but the required number of updates to restore accuracy is similar regardless of the chunk size (RQ1 answered). The main goal of introducing Chunk-Adaptive Restoration was to demonstrate its advantages in controlling the number of samples needed during the restoration period when dealing with abrupt concept drift. The statistical tests have shown a significant benefit of employing it in different stream learning scenarios (RQ2 answered). The highest gains were observed when a large original chunk size was used: with a bigger chunk size, there are fewer model updates, resulting in a delayed reaction to concept drift.
The number of samples that can be saved depends on the drift type and the
Figure 8: Impact of label noise on the proposed method's effectiveness. (Upper) Baseline accuracy for a synthetic data stream with different noise levels added to the labels. (Lower) CAR accuracy for the same synthetic data stream. For noise levels 0.2, 0.3, and 0.4, the stabilization detector was turned off.
original chunk size. When dealing with abrupt drift, the sample restoration can be around 50% better than the baseline (RQ3 answered). We noticed that for each of the analyzed classifier ensemble methods, CAR minimized the restoration time and achieved a better average predictive performance. It is worth noting that the simpler the algorithm, the greater the profit from using CAR. The most considerable profit was observed for SEA and AWE, while in the case of WAE, the native version sometimes outperformed CAR on the average Sample Restoration metric (RQ4 answered). When a small amount of label noise is present, CAR can still be useful; however, in some cases the stabilization detector should not be used. With a larger amount of noise, there is no gain from using the proposed method (RQ5 answered).
5 Conclusions
This work focused on the Chunk-Adaptive Restoration framework, which is dedicated to chunk-based data stream classifiers and enables better recovery from concept drift. To achieve this goal, we proposed new methods for stabilization detection and chunk size adaptation. Their usefulness was evaluated in computer experiments conducted on real and synthetic data streams. The obtained results show a significant difference between the predictive performance of the baseline models and the models employing CAR. Chunk-Adaptive Restoration is strongly recommended for abrupt concept drift scenarios because it can significantly reduce model downtime. The performance gain is not visible for other types of concept drift, but CAR still achieves acceptable results. Future work may focus on:
- Improving the Chunk-Adaptive Restoration behavior for gradual and incremental concept drifts.
- Adapting Chunk-Adaptive Restoration to the case of limited access to labels, using semi-supervised and active learning approaches.
- Proposing a more flexible method of changing the data chunk size, e.g., based on model stability assessment.
- Adapting the proposed method to the imbalanced data stream classification task, where changing the data chunk size may be correlated with the intensity of data preprocessing (e.g., the intensity of data oversampling).
- Improving the stabilization method to better handle data streams with noise.
Acknowledgement
This work is supported by the CEUS-UNISONO programme, which has received
funding from the National Science Centre, Poland under grant agreement No.
2020/02/Y/ST6/00037.
References
[1]
Ksieniewicz, P. & Zyblewski, P. stream-learn–open-source Python library for difficult
data stream batch analysis. ArXiv Preprint ArXiv:2001.11077. (2020)
[2]
Katakis, I., Tsoumakas, G. & Vlahavas, I. Tracking recurring contexts using ensemble
classifiers: An application to email filtering. Knowledge And Information Systems.
22 pp. 371-391 (2010,3)
[3]
Wang, H., Fan, W., Yu, P. & Han, J. Mining Concept-Drifting Data Streams
Using Ensemble Classifiers. Proceedings Of The Ninth ACM SIGKDD Interna-
tional Conference On Knowledge Discovery And Data Mining. pp. 226-235 (2003),
https://doi.org/10.1145/956750.956778
[4]
Brzeziński, D. & Stefanowski, J. Accuracy Updated Ensemble for Data Streams
with Concept Drift. Hybrid Artificial Intelligent Systems. pp. 155-163 (2011)
[5]
Brzezinski, D. & Stefanowski, J. Reacting to Different Types of Concept Drift: The
Accuracy Updated Ensemble Algorithm. IEEE Transactions On Neural Networks
And Learning Systems. 25, 81-94 (2014)
[6]
Street, N. & Kim, Y. A Streaming Ensemble Algorithm (SEA) for Large-Scale
Classification. (2001,7)
[7]
Oza, N. Online bagging and boosting. 2005 IEEE International Conference On
Systems, Man And Cybernetics. 3pp. 2340-2345 Vol. 3 (2005)
[8]
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. Scikit-learn: Machine
Learning in Python. Journal Of Machine Learning Research.
12
pp. 2825-2830 (2011)
19
REFERENCES REFERENCES
[9]
Muhlbaier, M., Topalis, A. & Polikar, R. Learn<sup>
+
+</sup>.NC: Combining
Ensemble of Classifiers With Dynamically Weighted Consult-and-Vote for Efficient
Incremental Learning of New Classes. IEEE Transactions On Neural Networks.
20
,
152-168 (2009)
[10]
Souza, V., Reis, D., Maletzke, A. & Batista, G. Challenges in Benchmarking
Stream Learning Algorithms with Real-world Data. Data Mining And Knowledge
Discovery. pp. 1-54 (2020)
[11]
Sahoo, D., Pham, Q., Lu, J. & Hoi, S. Online Deep Learning: Learning Deep
Neural Networks on the Fly. Proceedings Of The Twenty-Seventh International Joint
Conference On Artificial Intelligence, IJCAI-18. pp. 2660-2666 (2018,7)
[12]
Hinton, G. Connectionist learning procedures. Artificial Intelligence.
40
, 185-234
(1989), https://www.sciencedirect.com/science/article/pii/0004370289900490
[13]
Parisi, G., Kemker, R., Part, J., Kanan, C. & Wermter, S. Continual lifelong
learning with neural networks: A review. Neural Networks. 113 pp. 54 - 71 (2019)
[14]
Li, X., Zhou, Y., Wu, T., Socher, R. & Xiong, C. Learn to Grow: A Continual
Structure Learning Framework for Overcoming Catastrophic Forgetting. Proceedings
Of The 36th International Conference On Machine Learning.
97
pp. 3925-3934
(2019,6,9)
[15]
Kemker, R., Abitino, A., McClure, M. & Kanan, C. Measuring Catastrophic
Forgetting in Neural Networks. ArXiv. abs/1708.02072 (2018)
[16]
Zhou, Z., Shin, J., Zhang, L., Gurudu, S., Gotway, M. & Liang, J. Fine-Tuning
Convolutional Neural Networks for Biomedical Image Analysis: Actively and In-
crementally. 2017 IEEE Conference On Computer Vision And Pattern Recognition
(CVPR). pp. 4761-4772 (2017)
[17]
Penna, A., Mohammadi, S., Jojic, N. & Murino, V. Summarization and Classifica-
tion of Wearable Camera Streams by Learning the Distributions over Deep Features
of Out-of-Sample Image Sequences. IEEE International Conference On Computer
Vision, ICCV 2017, Venice, Italy, October 22-29, 2017. pp. 4336-4344 (2017)
[18]
Pasricha, R., Gujral, E. & Papalexakis, E. Identifying and Alleviating Concept Drift
in Streaming Tensor Decomposition. Machine Learning And Knowledge Discovery In
Databases - European Conference, ECML PKDD 2018, Dublin, Ireland, September
10-14, 2018, Proceedings, Part II. 11052 pp. 327-343 (2018)
[19]
Khan, Z., Lehtomäki, J., Shahid, A. & Moerman, I. DEMO: Real-time Edge
Analytics and Concept Drift Computation for Efficient Deep Learning From Spectrum
Data. 39th IEEE Conference On Computer Communications, INFOCOM Workshops
2020, Toronto, ON, Canada, July 6-9, 2020. pp. 1290-1291 (2020)
[20]
Yu, L., Twardowski, B., Liu, X., Herranz, L., Wang, K., Cheng, Y., Jui, S. & Weijer,
J. Semantic Drift Compensation for Class-Incremental Learning. 2020 IEEE/CVF
Conference On Computer Vision And Pattern Recognition, CVPR 2020, Seattle,
WA, USA, June 13-19, 2020. pp. 6980-6989 (2020)
[21]
Korycki, L. & Krawczyk, B. Adversarial Concept Drift Detection under Poi-
soning Attacks for Robust Data Stream Mining. CoRR.
abs/2009.09497
(2020),
https://arxiv.org/abs/2009.09497
[22]
Lo, Y., Liao, W., Chang, C. & Lee, Y. Temporal Matrix Factorization for Tracking
Concept Drift in Individual User Preferences. IEEE Trans. Comput. Soc. Syst..
5
,
156-168 (2018)
20
REFERENCES REFERENCES
[23]
Sun, Y., Xue, B., Zhang, M. & Yen, G. Evolving Deep Convolutional Neural
Networks for Image Classification. IEEE Trans. Evol. Comput.. 24, 394-407 (2020)
[24]
Croce, F. & Hein, M. Minimally distorted Adversarial Examples with a Fast
Adaptive Boundary Attack. Proceedings Of The 37th International Conference On
Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event.
119
pp. 2196-2205
(2020)
[25]
Wang, S. & Zhang, L. Self-adaptive Re-weighted Adversarial Domain Adaptation.
Proceedings Of The Twenty-Ninth International Joint Conference On Artificial
Intelligence, IJCAI 2020. pp. 3181-3187 (2020)
[26]
Gomes, H., Read, J., Bifet, A., Barddal, J. & Gama, J. Machine Learning for
Streaming Data: State of the Art, Challenges, and Opportunities. SIGKDD Explo-
rations Newsletter. 21, 6-22 (2019,11)
[27]
Krawczyk, B. & Others Ensemble learning for data stream analysis: A survey. Inf.
Fusion. 37 pp. 132 - 156 (2017)
[28]
Schmidhuber, J. Deep learning in neural networks: An overview.. Neural Networks.
61 pp. 85-117 (2015)
[29]
Tsymbal, A. The Problem of Concept Drift: Definitions and Related Work. (Trinity
College Dublin,2004,4)
[30]
Widmer, G. & Kubat, M. Learning in the Presence of Concept Drift and Hidden
Context. Machine Learning. 23 pp. 69-101 (1996)
[31]
Liang, K., Li, C., Wang, G. & Carin, L. Generative Adversarial Network Training
is a Continual Learning Problem. ArXiv. abs/1811.11083 (2018)
[32]
Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M. & Mori, G. Lifelong GAN:
Continual Learning for Conditional Image Generation. 2019 IEEE/CVF International
Conference On Computer Vision (ICCV). pp. 2759-2768 (2019)
[33]
Gama, J., Žliobaite, Bifet, A., Pechenizkiy, M. & Bouchachia, A. A
Survey on Concept Drift Adaptation. ACM Comput. Surv..
46
(2014,3),
https://doi.org/10.1145/2523813
[34]
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
Courville, A. & Bengio, Y. Generative Adversarial Nets. Proceedings Of The 27th
International Conference On Neural Information Processing Systems - Volume 2. pp.
2672-2680 (2014)
[35]
Liu, Z., Luo, P., Wang, X. & Tang, X. Deep Learning Face Attributes in the Wild.
Proceedings Of International Conference On Computer Vision (ICCV). (2015,12)
[36]
Krizhevsky, A., Nair, V. & Hinton, G. CIFAR-10 (Canadian Institute for Advanced
Research). (0), http://www.cs.toronto.edu/ kriz/cifar.html
[37]
Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a Novel Image Dataset for
Benchmarking Machine Learning Algorithms. (2017,8,28)
[38]
Karras, T., Aila, T., Laine, S. & Lehtinen, J. Progressive Growing of GANs
for Improved Quality, Stability, and Variation. CoRR.
abs/1710.10196
(2017),
http://arxiv.org/abs/1710.10196
[39]
Srivastava, A., Valkov, L., Russell, C., Gutmann, M. & Sutton, C. VEEGAN:
Reducing Mode Collapse in GANs using Implicit Variational Learning. (2017)
21
REFERENCES REFERENCES
[40]
Che, T., Li, Y., Jacob, A., Bengio, Y. & Li, W. Mode Regularized Generative Adver-
sarial Networks. CoRR. abs/1612.02136 (2016), http://arxiv.org/abs/1612.02136
[41]
Shaker, A. & Hüllermeier, E. Recovery analysis for adaptive learning from non-
stationary data streams: Experimental design and case study. Neurocomputing.
150
pp. 250-264 (2015)
[42]
Arjovsky, M., Chintala, S. & Bottou, L. Wasserstein Generative Adversarial
Networks. Proceedings Of The 34th International Conference On Machine Learning.
70 pp. 214-223 (2017,8,6)
[43]
Radford, A., Metz, L. & Chintala, S. Unsupervised Representation
Learning with Deep Convolutional Generative Adversarial Networks. (2015),
http://arxiv.org/abs/1511.06434, cite arxiv:1511.06434Comment: Under review as a
conference paper at ICLR 2016
[44]
Montavon, G., Binder, A., Lapuschkin, S., Samek, W. & Müller, K. Layer-Wise
Relevance Propagation: An Overview. Explainable AI: Interpreting, Explaining And
Visualizing Deep Learning. pp. 193-209 (2019)
[45]
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition.
CoRR. abs/1512.03385 (2015), http://arxiv.org/abs/1512.03385
[46]
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J. & Zhang, G. Learning under Concept
Drift: A Review. IEEE Trans. Knowl. Data Eng.. 31, 2346-2363 (2019)
[47]
Fawcett, T. An introduction to ROC analysis. Pattern Recognition Letters.
27
, 861-
874 (2006), https://www.sciencedirect.com/science/article/pii/S016786550500303X,
ROC Analysis in Pattern Recognition
[48]
Li, J., Qu, S., Li, X., Szurley, J., Kolter, J. & Metze, F. Adversarial Music: Real
world Audio Adversary against Wake-word Detection System. Advances In Neural
Information Processing Systems 32: Annual Conference On Neural Information
Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC,
Canada. pp. 11908-11918 (2019)
[49]
Li, J. & Xue, Y. Scribble-to-Painting Transformation with Multi-Task Generative
Adversarial Networks. Proceedings Of The Twenty-Eighth International Joint Con-
ference On Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019.
pp. 5916-5922 (2019)
[50]
Borji, A. Pros and cons of GAN evaluation measures. Comput. Vis. Image Underst..
179 pp. 41-65 (2019)
[51]
Rostami, M., Kolouri, S., Pilly, P. & McClelland, J. Generative Continual Concept
Learning. The Thirty-Fourth AAAI Conference On Artificial Intelligence, AAAI 2020,
The Thirty-Second Innovative Applications Of Artificial Intelligence Conference,
IAAI 2020, The Tenth AAAI Symposium On Educational Advances In Artificial
Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. pp. 5545-5552
(2020)
[52]
Parisi, G., Kemker, R., Part, J., Kanan, C. & Wermter, S. Continual lifelong
learning with neural networks: A review. Neural Networks. 113 pp. 54-71 (2019)
[53]
Pan, M., Huang, W., Li, Y., Zhou, X. & Luo, J. xGAIL: Explainable Generative
Adversarial Imitation Learning for Explainable Human Decision Analysis. KDD ’20:
The 26th ACM SIGKDD Conference On Knowledge Discovery And Data Mining,
Virtual Event, CA, USA, August 23-27, 2020. pp. 1334-1343 (2020)
22
REFERENCES REFERENCES
[54] Junsawang, P., Phimoltares, S. & Lursinsap, C. Streaming chunk incremental learning for class-wise data stream classification with fast learning speed and low structural complexity. PloS One. 14, e0220624 (2019)
[55] Wang, H., Fan, W., Yu, P. & Han, J. Mining concept-drifting data streams using ensemble classifiers. Proceedings Of The Ninth ACM SIGKDD International Conference On Knowledge Discovery And Data Mining. pp. 226-235 (2003)
[56] Bifet, A., Gavaldà, R., Holmes, G. & Pfahringer, B. Machine Learning for Data Streams: With Practical Examples in MOA. (The MIT Press,2018)
[57] Bahri, M., Bifet, A., Gama, J., Gomes, H. & Maniu, S. Data stream analysis: Foundations, major tasks and tools. Wiley Interdiscip. Rev. Data Min. Knowl. Discov.. 11 (2021), https://doi.org/10.1002/widm.1405
[58] Krempl, G., Žliobaite, I., Brzeziński, D., Hüllermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M. & Stefanowski, J. Open Challenges for Data Stream Mining Research. SIGKDD Explor. Newsl.. 16, 1-10 (2014,9), https://doi.org/10.1145/2674026.2674028
[59] Ramirez-Gallego, S., Krawczyk, B., Garcia, S., Wozniak, M. & Herrera, F. A survey on data preprocessing for data stream mining: Current status and future directions. Neurocomputing. 239 pp. 39-57 (2017), http://www.sciencedirect.com/science/article/pii/S0925231217302631
[60] Kuncheva, L. Classifier Ensembles for Changing Environments. Multiple Classifier Systems, 5th International Workshop, MCS 2004, Cagliari, Italy, June 9-11, 2004, Proceedings. 3077 pp. 1-15 (2004)
[61] Jackowski, K. Fixed-size ensemble classifier system evolutionarily adapted to a recurring context with an unlimited pool of classifiers. Pattern Analysis And Applications. 17, 709-724 (2014,11), https://doi.org/10.1007/s10044-013-0318-x
[62] Oza, N. & Tumer, K. Classifier ensembles: Select real-world applications. Inf. Fusion. 9, 4-20 (2008,1)
[63] Bifet, A., Holmes, G., Pfahringer, B., Read, J., Kranen, P., Kremer, H., Jansen, T. & Seidl, T. MOA: a Real-time Analytics Open Source Framework. Proc. European Conference On Machine Learning And Principles And Practice Of Knowledge Discovery In Databases (ECML PKDD 2011), Athens, Greece. pp. 617-620 (2011)
[64] Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R. & Gavaldà, R. New ensemble methods for evolving data streams. Proceedings Of The 15th ACM SIGKDD International Conference On Knowledge Discovery And Data Mining. pp. 139-148 (2009)
[65] Duda, R., Hart, P. & Stork, D. Pattern Classification. (Wiley,2001)
[66] Widmer, G. & Kubat, M. Effective learning in dynamic environments by explicit context tracking. Machine Learning: ECML-93. 667 pp. 227-243 (1993)
[67] Rodriguez, J. & Kuncheva, L. Combining Online Classification Approaches for Changing Environments. Proceedings Of The 2008 Joint IAPR International Workshop On Structural, Syntactic, And Statistical Pattern Recognition. pp. 520-529 (2008)
[68] Lazarescu, M., Venkatesh, S. & Bui, H. Using multiple windows to track concept drift. Intell. Data Anal.. 8, 29-59 (2004,1)
[69] Sobolewski, P. & Wozniak, M. Concept Drift Detection and Model Selection with Simulated Recurrence and Ensembles of Statistical Detectors. Journal Of Universal Computer Science. 19, 462-483 (2013,2,28)
[70] Gustafsson, F. Adaptive Filtering and Change Detection. pp. 510 (2000,10)
[71] Gama, J., Medas, P., Castillo, G. & Rodrigues, P. Learning with drift detection. SBIA Brazilian Symposium On Artificial Intelligence. pp. 286-295 (2004)
[72] Raudys, S. Statistical and Neural Classifiers: An Integrated Approach to Design. (Springer Publishing Company, Incorporated,2014)
[73] Baena-García, M., Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavaldà, R. & Morales-Bueno, R. Early drift detection method. Fourth International Workshop On Knowledge Discovery From Data Streams. 6 pp. 77-86 (2006)
[74] Blanco, I., Campo-Avila, J., Ramos-Jimenez, G., Bueno, R., Diaz, A. & Mota, Y. Online and Non-Parametric Drift Detection Methods Based on Hoeffding's Bounds. IEEE Trans. Knowl. Data Eng.. 27, 810-823 (2015)
[75] Zliobaite, I., Budka, M. & Stahl, F. Towards cost-sensitive adaptation: When is it worth updating your predictive model? Neurocomputing. 150 pp. 240-249 (2015)
[76] Woźniak, M., Kasprzak, A. & Cal, P. Weighted Aging Classifier Ensemble for the Incremental Drifted Data Streams. Flexible Query Answering Systems. pp. 579-588 (2013)
[77] Bifet, A., Holmes, G., Kirkby, R. & Pfahringer, B. MOA: Massive Online Analysis. J. Mach. Learn. Res.. 11 pp. 1601-1604 (2010,8)
[78] Anonymous. Chunk Adaptive Restoration. GitHub Repository. (2021), https://anonymous.4open.science/r/concept-drift-evaluation-A7B5/README.md
[79] Liu, N., Zhu, W., Liao, B. & Ren, S. Weighted Ensemble with Dynamical Chunk Size for Imbalanced Data Streams in Nonstationary Environment. (2017,1)
[80] Lu, Y., Cheung, Y. & Yan Tang, Y. Adaptive Chunk-Based Dynamic Weighted Majority for Imbalanced Data Streams With Concept Drift. IEEE Transactions On Neural Networks And Learning Systems. 31, 2764-2778 (2020)
[81] Bifet, A. & Gavaldà, R. Learning from Time-Changing Data with Adaptive Windowing. Proceedings Of The 7th SIAM International Conference On Data Mining. 7 (2007,4)
[82] Harvey, W., Carabelli, A., Jackson, B., Gupta, R., Thomson, E., Harrison, E., Ludden, C., Reeve, R., Rambaut, A., Peacock, S., Robertson, D. & Consortium, C. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology. 19, 409-424 (2021,7), https://doi.org/10.1038/s41579-021-00573-0
[83] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. Scikit-learn: Machine Learning in Python. Journal Of Machine Learning Research. 12 pp. 2825-2830 (2011)