Combining active learning with concept drift detection for data stream mining

Bartosz Krawczyk
Dept of Computer Science
Virginia Commonwealth University
Richmond VA, USA
Email: bkrawczyk@vcu.edu
Bernhard Pfahringer
Dept of Computer Science
Univ. of Waikato
Hamilton, New Zealand
Email: bernhard@cs.waikato.ac.nz
Michał Woźniak
Dept of Systems and Computer Networks
Wrocław Univ. of Science and Technology
Wrocław, Poland
Email: michal.wozniak@pwr.edu.pl
Abstract—Most data stream classifier learning methods assume that the true class of an incoming object becomes available right after the instance has been processed, and that the newly labeled instance may be used to update the classifier's model, to detect drift, or to capture novel concepts. However, the assumption of unlimited and instantaneous access to class labels is naive and would usually incur a very high labeling cost. Therefore the applicability of many supervised techniques is limited in real-life stream analytics scenarios. Active learning emerges as a potential solution to this problem, concentrating on selecting only the most valuable instances and learning an accurate predictive model with as few labeling queries as possible. However, learning from data streams differs from online learning, as the distribution of examples may change over time. Therefore, an active learning strategy must be able to handle concept drift and quickly adapt to the evolving nature of data. In this paper we present novel active learning strategies that are designed to tackle such changes effectively. We assume that most labeling effort is required when concept drift occurs, as we need a representative sample of the new concept to properly retrain the predictive model. Therefore, we propose active learning strategies that are guided by a drift detection module in order to save the budget for difficult and evolving instances. The three proposed strategies are based on learner uncertainty, dynamic allocation of the budget over time, and search space randomization. Experimental evaluation of the proposed methods proves their usefulness for reducing labeling effort in learning from drifting data streams.
Keywords—machine learning; data stream mining; concept drift; active learning; drift detection
I. INTRODUCTION
Contemporary machine learning problems are much more complex than the ones we faced 10 or 20 years ago. With the advance of the big data era we need to address emerging problems deeply connected with the nature of the analyzed instances. We can identify the 5 Vs of big data: volume, velocity, variety, veracity and value. Let us take a look at the problem of velocity. This paradigm assumes that data is in constant motion: it arrives continuously and thus must be handled in real time. This is further connected with the notion of volume, as data will arrive for a potentially infinite amount of time, flooding both the processing and storage systems [1], [2]. Such a problem is known in the literature as a data stream [3], [4]. It forces us to develop new methods that are able to learn from such an ever-growing collection of instances under constraints such as time and memory limitations. However, learning from data streams differs from traditional online learning, as here the properties of data may change over time. For example, let us look at the malware detection problem. Such malicious software is far from being static, as it evolves over time to elude ever-improving security systems. This phenomenon is known as concept drift [5]. Efficient data stream mining methods must assume the presence of this problem and be able to tackle it efficiently by constantly adapting to the non-stationary distribution of data [6]. The challenge lies in how to properly use the incoming objects to keep the learning model updated while limiting the costs imposed by constantly modifying the recognition system [7].
When mining data streams, unlabeled objects are abundant, as they arrive over time at a rate specific to the analyzed problem. However, labels for these instances may be costly to obtain due to the required human input (labor cost). In some applications we may obtain true class labels at a very small cost (e.g., weather prediction), but this is not true for most problems and is connected with the label delay issue. In most problems obtaining a label requires constant access to a human expert or some kind of oracle. This is subject to various constraints, e.g., financial (we need to pay the expert), time (objects may appear faster than the expert is able to handle them), logistical (the expert may not be able to work 24/7) or resources (some expert-based procedures, like laboratory tests, cannot be repeated continuously). Sometimes access to the true label is delayed even if we have access to an oracle; e.g., the true label for the credit approval problem is available ca. 2 years after the decision, while some medical diagnoses can only be confirmed by laboratory tests that may take a few weeks. However, most of the methods described in the literature for learning from streams naively assume that true labels for objects are available upon request at all times. This assumption is highly unrealistic and limits the usefulness of many supervised techniques in real-life tasks [8], [9]. Therefore, methods for selecting only the most valuable samples for labeling are of crucial importance to the data stream mining community. Here active learning has been identified as a promising solution to this challenge [10], [11]. This approach concentrates on how to select objects for labeling instead of requesting labels for all objects. The problem is well known and extensively discussed in static [12] and online scenarios [13]. However, there are only a few works discussing the problem of active learning for data streams [14], [15], [16], [17], [18], especially in the presence of concept drift [10]. The difference between active learning in the online and data stream scenarios is the expectation of change. In the online scenario
we usually fix a predefined threshold associated with, e.g., the certainty of the used learner and allow it to guide the labeling process. However, some regions of the decision space may never be queried (due to an initially high certainty associated with them) and therefore a change appearing in these areas will never be detected. Additionally, as data streams evolve over time, it seems far from rational to keep a fixed certainty threshold. Instead, it should adapt to the nature of the stream at a given moment. When concept drift takes place, examples from the new distributions are of the highest importance to the learner, as they must be utilized to maintain its competence. At the same time, querying a stable data stream would not bring any new information to the learner. However, the active learning strategies proposed so far have no direct connection with the drift detection procedure.
In this paper we propose new active learning strategies guided directly by drift detectors. Information about shift detection is used by the active learning strategy to increase the querying ratio in order to accumulate a high number of new and valuable samples for the classifier update procedure. This allows us to balance the budget assigned to the analyzed stream by limiting querying in static moments and saving it for when change occurs. Three strategies are proposed, based on learner uncertainty, dynamic allocation of the budget over time, and search space randomization. We apply a semi-supervised drift detection method that works only with previously labeled samples. Therefore, when uncertain samples begin to appear, the drift detector is able to properly evaluate whether we are dealing with outliers or a real drift. We put no restriction on the classifier used, which makes our methods highly suitable for adapting any supervised learning method to real-life data stream mining with a limited budget. We compare our proposals to a fully labeled data stream and to reference methods that do not use feedback from the drift detection module. Their accuracies and reactions to changes are examined over a number of artificial and real stream benchmarks. The obtained results prove the usefulness of the proposed methods and show that they are able to better allocate the labeling budget to the moments when we really need to update the predictive model.
II. MINING DATA STREAMS WITH CONCEPT DRIFT
Four main categories of approaches for handling concept drift can be distinguished. Let us present each of them briefly.

Methods with triggers rely on so-called drift detectors, which aim at identifying the moment when a change appears, or is likely to appear, and alarm the recognition system [19]. A drift detector is an external module that monitors the properties of the data stream in a supervised, semi-supervised or unsupervised manner. It is important to point out that using supervised drift detectors requires full access to the true class labels or to the performance of the classifier in use, which in real-life scenarios is almost impossible, as discussed in the previous section. On the other hand, unsupervised drift detection methods cannot detect a real concept drift in cases where the statistical properties of data have not changed (e.g., classes have swapped places) [20]. Therefore, semi-supervised drift detection seems to be the best option.
Online learners are classifiers that constantly update their structure while processing the incoming instances [21]. Such methods should meet the following requirements: process each instance only once, work under time and memory constraints, and, when the training procedure is interrupted, achieve a quality not lower than that of a classifier trained in batch mode on the same data. Online classifiers offer a high processing speed and are able to flexibly adapt to evolving data (which is also known as implicit drift detection). Some popular classifiers, like Naïve Bayes or neural networks, may work in online mode, but there are also a number of more sophisticated approaches, like the Concept-Adapting Very Fast Decision Tree algorithm [22], which ensures consistency with incoming data by maintaining alternative subtrees [23].
Methods based on sliding windows rely on instance forgetting mechanisms. They assume that recently arrived objects represent the current state of the analyzed data stream and hence should be more relevant to the recognition system. Usually such a window has a fixed size and cuts off older instances, or applies a data weighting scheme in which important objects are assigned higher weights. When dealing with a sliding window the main question is how to adjust the window size. On the one hand, a shorter window allows focusing on the emerging context, though its data may not be representative of a longer-lasting context. On the other hand, a wider window may result in mixing instances representing different contexts. Therefore, recent proposals include dynamic window size allocation or combining multiple windows [24]. A minimal illustration of the fixed-size window mechanism is sketched below.
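The following Python sketch illustrates the fixed-size window idea with instance forgetting; the class name and interface are our own illustration, not code from any of the cited systems.

```python
from collections import deque

class SlidingWindow:
    """Fixed-size sliding window with instance forgetting (illustrative sketch)."""

    def __init__(self, max_size=1000):
        # A deque with maxlen automatically drops the oldest instance when full.
        self.buffer = deque(maxlen=max_size)

    def add(self, x, y):
        self.buffer.append((x, y))

    def contents(self):
        # The most recent max_size labeled instances, oldest first,
        # e.g., to retrain a classifier on the current context only.
        return list(self.buffer)
```

Dynamic window sizing or multiple windows, as mentioned above, would replace the fixed max_size with an adaptive policy.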
Finally, ensemble learning has gained significant popularity in the stream mining community in recent years [25]. Ensembles maintain the advantages known from static scenarios, such as exploiting the local competencies of classifiers or robustness to overfitting. At the same time, one may view the drifting context as an additional way to ensure diversity among committee members. Here dynamic combiners, online ensembles and ensembles with a dynamic line-up are the most popular approaches [26].
III. PROPOSED ACTIVE LEARNING STRATEGIES
In this section we will describe the proposed active learning
strategies guided by drift detection for evolving data streams.
A. Preliminaries
Let us assume that our stream consist of a potentially infi-
nite set of examples DS ={(x1, j1),(x2, j2), ..., (xk, jk), ...},
where xkstands for feature vector (xk X )describing the kth
object and jkits label jk∈ M, which should be assigned by
oracle and of course learning algorithm should pay for it. As
we mentioned before we want to reduce the label querying cost
then we introduce a budget Bthat shows how many instances
we can afford to label. We assume that 0< B < 1. In cases
of B= 0 and B= 1 we would have a fully unlabeled and
fully labeled data stream respectively. A labeling strategy is
an realization of active learning paradigms that allow us to
evaluate if for a currently analyzed sample we are interested
in obtaining its true label. Output of such an strategy is realized
as a Boolean variable, indicating a decision regarding the label
query.
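To make the budget bookkeeping concrete, the following sketch tracks the fraction of labeled instances over a recent window (2500 instances, as used later in the experiments) and shows the baseline random labeling strategy; the class and function names are our own illustration.

```python
import random
from collections import deque

class BudgetTracker:
    """Tracks the labeling expenditure b over the last `window` instances so that
    it can be compared against the allowed budget B, with 0 < B < 1."""

    def __init__(self, budget, window=2500):
        self.budget = budget                 # B: fraction of instances we may label
        self.history = deque(maxlen=window)  # 1 = label was queried, 0 = it was not

    def spent(self):
        return sum(self.history) / max(len(self.history), 1)

    def within_budget(self):
        return self.spent() < self.budget

    def record(self, labeled):
        self.history.append(1 if labeled else 0)

def random_strategy(x, budget):
    """Baseline labeling strategy: query the oracle with probability equal to B."""
    return random.random() <= budget
```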
B. Proposed framework
Let us now present a general framework for the proposed
active learning strategies. We propose to construct it on an
online learning scenario from data streams with concept drift.
For detecting changes in data we use a drift detector module.
It is realized as ADWIN2 drift detector [27], due to its low
computational complexity and proved efficiency. It uses only
labeled samples coming from an active learning, thus not
imposing any additional costs on the proposed system. When
the accuracy of the classifier begins to decrease we start to train
a new classifier in the background using arriving objects. In
case of change being detected the new classifier replaces the
old one. For each incoming object we check if the labeling
strategy conditions are being fulfilled (they are triggered ran-
domly or by a loss of classifiers’ confidence). However, we are
most interested in obtaining labels for new objects appearing
after concept drift. Quickly gathering a representative sample
would allow for early preparation of new classifier and efficient
replacement of the outdated learner. Therefore, we should
maintain our budget and dynamically allocate it over time
when needed. We propose to create a feedback between the
labeling strategies and drift detection module. In case of alarm
being raised or change being detected we increase the labeling
rate in order to probe the emerging concept. Let’s Rstands
for labeling ratio, which should depend on answer of the drift
detector. The labeling ratios should be ordered in the following
way: R(static)< R(alarm)< R(change). This allows us
to control budget and save it for obtaining new knowledge for
the recognition system. The details of the proposed framework
are given in Algorithm 1.
Algorithm 1: Proposed general framework for active learning from drifting data streams.

input: budget B, labeling rate R, labeling strategy S(x, R), classifier Ψ, drift detector D
labeling cost b ← 0
while end of stream = FALSE do
    obtain new object x from the stream
    if b < B then
        if S(x, R) = TRUE then
            obtain label y of object x
            b ← b + 1
            update classifier Ψ with (x, y)
            update drift detector D with (x, y)
            if drift warning = TRUE then
                start a new classifier Ψ_new
                increase labeling rate R
            else if drift detected = TRUE then
                replace Ψ with Ψ_new
                further increase labeling rate R
            else
                return to initial labeling rate R
            if Ψ_new exists then
                update classifier Ψ_new with (x, y)
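A compact Python rendering of Algorithm 1 is sketched below. It relies on several assumptions of ours rather than on the implementation used in the experiments: classifier_factory() returns an incremental learner exposing learn(x, y) and predict(x), detector.update(...) returns one of the strings "static", "warning" or "change", strategy(x, rate) is one of the labeling strategies from the following subsections, and the concrete rate increases (2x and 3x the base rate) are placeholders for the required ordering R(static) < R(alarm) < R(change).

```python
def active_stream_learning(stream, oracle, strategy, classifier_factory,
                           detector, budget, base_rate):
    """Sketch of the general framework (Algorithm 1): query labels only while the
    budget allows it, and raise the labeling rate on drift warnings/changes."""
    classifier = classifier_factory()
    background = None                        # candidate classifier started on a warning
    tracker = BudgetTracker(budget)          # helper from the earlier sketch
    rate = base_rate

    for x in stream:
        queried = False
        if tracker.within_budget() and strategy(x, rate):
            y = oracle(x)                    # obtain (and pay for) the true label
            queried = True
            correct = classifier.predict(x) == y
            classifier.learn(x, y)
            state = detector.update(correct)  # assumed: "static" / "warning" / "change"
            if state == "warning":
                if background is None:
                    background = classifier_factory()  # start a new classifier Psi_new
                rate = 2 * base_rate                   # assumed R(alarm)
            elif state == "change":
                if background is not None:
                    classifier, background = background, None  # replace the outdated learner
                rate = 3 * base_rate                   # assumed R(change)
            else:
                rate = base_rate                       # back to R(static)
            if background is not None:
                background.learn(x, y)                 # keep the candidate up to date
        tracker.record(queried)
    return classifier
```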
C. Random strategy++
This is a very simple active learning strategy that randomly draws instance labels with probability equal to the assumed budget B. We propose to intensify label querying when a change is detected, by increasing the labeling probability according to the output of the drift detector (alarm or change detected). The details of this strategy are given in Algorithm 2 and in the code sketch that follows it.
Algorithm 2: Labeling strategy RAND++(x, R, B)

input: new object x, labeling rate adjustment R ∈ [0, 1], budget B
Result: labeling ∈ {TRUE, FALSE}
generate a uniform random variable λ ∈ [0, 1]
if drift warning then
    λ ← λ − R
else if drift detected then
    λ ← λ − 2R
labeling ← I(λ ≤ B)
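The RAND++ rule fits in a few lines. The sketch below follows Algorithm 2, reading the reconstructed updates as λ ← λ − R on a drift warning and λ ← λ − 2R after a detected change; passing the detector state in explicitly is a simplification of ours.

```python
import random

def rand_plus_plus(budget, rate, detector_state):
    """RAND++ (Algorithm 2): query with probability B, boosted when drift is signalled."""
    lam = random.random()             # uniform random variable in [0, 1]
    if detector_state == "warning":
        lam -= rate                   # lambda <- lambda - R
    elif detector_state == "change":
        lam -= 2 * rate               # lambda <- lambda - 2R
    return lam <= budget              # labeling <- I(lambda <= B)
```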
D. Variable uncertainty strategy++
This strategy is based on monitoring the certainty of the decision of classifier Ψ, expressed by its support function F_Ψ(x, j) for object x belonging to the j-th class. It aims to label the least certain instances within a time interval. A time-variable threshold imposed on the classifier's certainty is used. It adjusts itself depending on the incoming data to balance the budget use over time. For static parts of the stream the classifier's certainty stabilizes and the threshold is increased, so that only the most uncertain objects are labeled. When the drift detector reports an alarm or a change, we start to rapidly decrease the threshold in order to gather a higher number of labeled objects and quickly adapt a new model to the current state of the stream. The details of this strategy are given in Algorithm 3 and in the code sketch that follows it.
Algorithm 3: Labeling strategy VAR-UN++(x, s, θ, R, Ψ)

input: new object x, threshold θ, threshold adjustment s ∈ [0, 1], labeling rate adjustment R ∈ [0, 1], R > s, trained classifier Ψ
Result: labeling ∈ {TRUE, FALSE}
initialize θ and store its latest value
if max_{m∈M} F_Ψ(x, m) < θ then
    decrease the uncertainty threshold as follows:
    if drift warning then
        θ ← θ − R
    else if drift detected then
        θ ← θ − 2R
    else
        θ ← θ − s
    labeling ← TRUE
else
    increase the uncertainty threshold: θ ← θ + s
    labeling ← FALSE
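A stateful Python sketch of VAR-UN++ following Algorithm 3 is given below; it assumes the classifier exposes its support values F_Ψ(x, j) for all classes, and the default parameter values mirror the experimental setup (s = 0.01, R = 0.03). The class name is our own.

```python
class VariableUncertaintyPlusPlus:
    """VAR-UN++ (Algorithm 3): label the least certain instances, with a threshold
    that shrinks faster when the drift detector raises a warning or a change."""

    def __init__(self, step=0.01, rate=0.03, theta=1.0):
        self.step = step      # threshold adjustment s
        self.rate = rate      # labeling rate adjustment R, with R > s
        self.theta = theta    # current uncertainty threshold

    def query(self, supports, detector_state):
        # supports: iterable of support values F_Psi(x, j) over all classes j
        if max(supports) < self.theta:
            if detector_state == "warning":
                self.theta -= self.rate        # theta <- theta - R
            elif detector_state == "change":
                self.theta -= 2 * self.rate    # theta <- theta - 2R
            else:
                self.theta -= self.step        # theta <- theta - s
            return True                        # request the true label
        self.theta += self.step                # theta <- theta + s
        return False
```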
E. Randomized variable uncertainty strategy++
This strategy extends the previous one by perturbing the threshold with a random factor. This allows for labeling some examples for which the classifier displays high certainty, in order not to miss a possible drift that may appear in any part of the decision space. However, this happens at the expense of sacrificing some of the uncertain instances. Thus this strategy is expected to perform worse than its predecessor on static streams, but to adapt faster to occurring changes. The details of this strategy are given in Algorithm 4 and in the code sketch that follows it.
Algorithm 4: Labeling strategy R-VAR-UN++(x, s, θ, δ, R, Ψ)

input: new object x, threshold θ, threshold adjustment s ∈ [0, 1], labeling rate adjustment R ∈ [0, 1], R > s, threshold random variance δ, trained classifier Ψ
Result: labeling ∈ {TRUE, FALSE}
initialize θ and store its latest value
η ← random multiplier drawn from N(1, δ)
θ_rand ← θ × η
if max_{j∈M} F_Ψ(x, j) < θ_rand then
    decrease the uncertainty threshold as follows:
    if drift warning then
        θ ← θ − R
    else if drift detected then
        θ ← θ − 2R
    else
        θ ← θ − s
    labeling ← TRUE
else
    increase the uncertainty threshold: θ ← θ + s
    labeling ← FALSE
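R-VAR-UN++ differs from the previous sketch only in the randomized threshold used for the comparison; following Algorithm 4, the sketch below perturbs θ with a multiplier drawn from N(1, δ) and reuses the VAR-UN++ class defined above (δ = 1 as in the experiments). The class name and structure are our own.

```python
import random

class RandomizedVariableUncertaintyPlusPlus(VariableUncertaintyPlusPlus):
    """R-VAR-UN++ (Algorithm 4): compare the supports against a randomly perturbed
    copy of the threshold so that confident regions of the space are also probed."""

    def __init__(self, step=0.01, rate=0.03, theta=1.0, delta=1.0):
        super().__init__(step, rate, theta)
        self.delta = delta    # threshold random variance delta

    def query(self, supports, detector_state):
        eta = random.gauss(1.0, self.delta)    # eta ~ N(1, delta)
        theta_rand = self.theta * eta
        if max(supports) < theta_rand:
            if detector_state == "warning":
                self.theta -= self.rate
            elif detector_state == "change":
                self.theta -= 2 * self.rate
            else:
                self.theta -= self.step
            return True
        self.theta += self.step
        return False
```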
IV. EXPERIMENTAL STUDY
In this section we present the experimental evaluation of
the proposed active learning methods for drifting data streams.
A. Set-up
For non-stationary data streams there are still only a few publicly available data sets to work with. Most of them are artificially generated, with only some real-life examples. Following the standard approaches found in the literature, we decided to use both artificial and real-life data sets, the details of which can be found in Table I.
TABLE I: Details of data stream benchmarks used in the
experiments.
Data set Objects Features Classes Drift type
Airlines 539 383 7 2 unknown
Electricity 45 312 7 2 unknown
Forest Cover 581 012 53 7 unknown
RBF 1 000 000 20 4 gradual
Hyperplane 1 000 000 10 2 incremental
Tree 1 000 000 10 6 sudden recurring
We compare the proposed strategies (RAND++, VAR-UN++ and R-VAR-UN++) with their basic versions that do not use information from the drift detector [10]. We use the following parameters for these strategies: threshold adjustment s = 0.01, labeling rate adjustment R = 0.03 and threshold random variance δ = 1. We analyze budget sizes B ∈ {0.05, 0.10, ..., 0.60}, calculated over a time window of 2500 instances. A Hoeffding tree is selected as the base classifier.
For evaluating classifiers we use the prequential accuracy metric. The Wilcoxon signed-rank test is adopted as a non-parametric statistical procedure to perform pairwise comparisons between the classifier trained on the fully labeled stream and the classifiers trained using active learning strategies with varying budgets.
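For completeness, this evaluation protocol can be mimicked as follows: prequential (test-then-train) accuracy evaluates each instance before it may be used for training, and a paired Wilcoxon signed-rank test compares per-dataset accuracies. The snippet below uses scipy.stats.wilcoxon and, purely as an illustration of the call, the FULL and best active learning accuracies from Table II; the function and callback names are our own.

```python
from scipy.stats import wilcoxon

def prequential_accuracy(stream, classifier, train_callback):
    """Test-then-train evaluation: predict on each instance first, then hand it to
    the training pipeline, which may or may not query its true label. We assume
    the classifier can always return some prediction (e.g., a default class)."""
    correct = total = 0
    for x, y_true in stream:
        correct += int(classifier.predict(x) == y_true)
        total += 1
        train_callback(classifier, x, y_true)   # active learning decides label usage
    return correct / max(total, 1)

# Pairwise Wilcoxon signed-rank comparison between the fully labeled classifier
# and the best active learning strategy per dataset (values taken from Table II):
full_acc = [69.38, 81.17, 80.34, 93.47, 83.16, 69.98]
best_al  = [67.25, 80.20, 74.39, 92.98, 82.21, 69.71]
stat, p_value = wilcoxon(full_acc, best_al)
print(f"Wilcoxon statistic = {stat:.2f}, p-value = {p_value:.4f}")
```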
B. Results and discussion
Figure 1 presents detailed prequential accuracies for the six examined strategies with varying budget sizes over six stream benchmarks, while Table II presents a comparison of the single best accuracies for each of the three proposed active learning strategies and a classifier trained on the fully labeled data set. Please note that our aim is to get as close as possible to the accuracy displayed by a classifier with full access to class labels, while at the same time using as low a budget as possible.
TABLE II: Comparison of averaged prequential accuracies for
Hoeffding tree trained on a fully labeled data stream (FULL)
and the best one obtained from active learning strategies.
Dataset FULL RAND++ VAR-UN++ R-VAR-UN++
Airlines 69.38 67.25 65.69 66.02
Electricity 81.17 78.95 79.59 80.20
Forest Cover 80.34 71.67 74.28 74.39
RBF 93.47 92.07 92.26 92.98
Hyperplanes 83.16 81.95 82.03 82.21
Tree 69.98 68.05 69.11 69.71
From these results we can observe that only for the Electricity dataset were the proposed active learning strategies similar to the reference ones. For the remaining stream benchmarks we observe a significant gain in accuracy when the feedback from the drift detector is utilized by the label query. Additionally, the proposed improved strategies perform very well even with limited budgets, offering a balanced effectiveness regardless of the budget setting. This is especially evident for the Airlines, Forest Cover and Hyperplanes datasets. This allows us to conclude that, for a limited budget, the introduced labeling queries are concentrated mainly on the moments when drift takes place, thus better sampling the changed distribution and allowing for the rapid construction of a more competent classifier for the current concept.
The results of the Wilcoxon test over multiple datasets are presented in Table III. We can see that the proposed strategies obtain results very similar to a classifier trained on a fully labeled stream. Using as little as 15% of the data we are able to induce a classifier that does not differ in a statistically significant way from one that has access to all labels. This is a very important observation, which proves that by carefully labeling only the most difficult and evolving instances we are able to obtain comparable accuracy at a greatly decreased cost.
Fig. 1: Accuracies on the examined datasets for a Hoeffding tree and a given labeling budget using different active learning strategies. Each panel plots prequential accuracy [%] against the labeling budget used (0.1–0.6) for RAND, RAND++, VAR-UN, VAR-UN++, R-VAR-UN and R-VAR-UN++ on: (a) Airlines, (b) Electricity, (c) Forest Cover, (d) RBF, (e) Hyperplane, (f) Tree.
TABLE III: Wilcoxon tests comparing a Hoeffding tree trained using a fully labeled stream (FULL) and a Hoeffding tree trained with a selected labeling strategy and a fixed budget. The symbol "<" denotes the situation where the classifier trained on the fully labeled data stream is statistically significantly better, and the symbol "=" the situation where there is no statistically significant difference between the proposed active learning approach and the fully labeled stream.
Budget
Comparison 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60
RAND++ vs. FULL <0.3643 <0.2017 <0.0740 =0.0483 =0.4724 =0.4503 =0.4025 =0.3977 =0.3643 =0.3428 =0.3215 =0.3194
VAR-UN++ vs. FULL <0.2916 <0.1866 =0.0492 =0.0446 =0.0418 =0.0378 =0.0382 =0.0357 =0.0321 =0.0277 =0.0273 =0.0270
R-VAR-UN++ vs. FULL <0.2848 <0.1609 =0.0487 =0.0431 =0.0409 =0.0351 =0.0369 =0.0334 =0.0313 =0.0258 =0.0249 =0.0246
Finally, let us analyze how well the proposed and reference active learning strategies managed the occurrence of concept drift. Figure 2 reports the percentage of drift examples in the labeled set. From these figures we can see that the proposed strategies are better able to identify instances that appear during the drift and use them to adapt the classification system. Our improved strategies label 2-3 times more examples during drift, which has a direct influence on the obtained accuracies. Additionally, we can see that the R-VAR-UN++ strategy detects the highest number of drifting instances, thus supporting our claim from Section III-E that threshold randomization is beneficial for detecting changes occurring at any point of the decision space.
V. CONCLUSIONS AND FUTURE WORKS
In this paper we have proposed three improved active learning strategies for mining drifting data streams. The novelty of our proposal lies in the direct feedback from the drift detection mechanism that controls the labeling ratio. This way we were able to dynamically allocate the budget and obtain labels for objects coming from the evolved distribution. This had a direct influence on the accuracy of the classification procedure, as we were able to capture the changes in the streams more quickly. We showed that the proposed strategies allow for highly accurate stream classification by increasing label querying in the drifting moments, even when the available budget is small.
Fig. 2: Percentage of drift examples in the labeled set for the examined active learning strategies, averaged over all examined budgets. Values per dataset, in the order RAND / RAND++ / VAR-UN / VAR-UN++ / R-VAR-UN / R-VAR-UN++: (a) Airlines: 11.23 / 31.45 / 10.28 / 26.53 / 11.04 / 28.17; (b) Electricity: 5.31 / 6.09 / 6.49 / 7.01 / 13.56 / 14.31; (c) Forest Cover: 7.26 / 12.59 / 16.02 / 24.43 / 18.73 / 27.48; (d) RBF: 18.75 / 26.17 / 23.84 / 31.96 / 21.9 / 34.09; (e) Hyperplane: 19.63 / 30.83 / 19.86 / 32.98 / 19.97 / 34.74; (f) Tree: 21.07 / 22.85 / 24.82 / 28.17 / 24.57 / 32.19.
Using statistical tests, we have shown that the proposed active learning strategies are not significantly worse than using a fully labeled data stream. This contribution is a step forward towards using supervised learning methods in realistic stream settings. In our future work we plan to apply active learning strategies with a dynamic budget to multi-class novelty detection in streaming data.
ACKNOWLEDGMENT
This work was supported by the Polish National Science Center under grant no. DEC-2013/09/B/ST6/02264.
All experiments were carried out using computer equip-
ment sponsored by EC under FP7, Coordination and Support
Action, Grant Agreement Number 316097, ENGINE - Euro-
pean Research Centre of Network Intelligence for Innovation
Enhancement (http://engine.pwr.wroc.pl/).
REFERENCES
[1] A. Cano, “A survey on graphic processing unit computing for large-
scale data mining,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov.,
vol. 8, no. 1, 2018.
[2] H. T. Nguyen, M. T. Thai, and T. N. Dinh, “A billion-scale approxima-
tion algorithm for maximizing benefit in viral marketing,” IEEE/ACM
Trans. Netw., vol. 25, no. 4, pp. 2419–2429, 2017.
[3] M. M. Gaber, “Advances in data stream mining,” Wiley Interdiscip. Rev.:
Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 79–85, 2012.
[4] S. Ramírez-Gallego, B. Krawczyk, S. García, M. Woźniak, and F. Herrera, “A survey on data preprocessing for data stream mining: Current
status and future directions,” Neurocomputing, vol. 239, pp. 39–57,
2017.
[5] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A
survey on concept drift adaptation,” ACM Comput. Surv., vol. 46, no. 4,
pp. 44:1–44:37, 2014.
[6] M. Woźniak, “A hybrid decision tree training method using data
streams,” Knowl. Inf. Syst., vol. 29, no. 2, pp. 335–347, 2011.
[7] I. Zliobaite, M. Budka, and F. T. Stahl, “Towards cost-sensitive adap-
tation: When is it worth updating your predictive model?” Neurocom-
puting, vol. 150, pp. 240–249, 2015.
[8] B. Cyganek and S. Gruszczynski, “Hybrid computer vision system for
drivers’ eye recognition and fatigue monitoring,” Neurocomputing, vol.
126, pp. 78–94, 2014.
[9] Z. S. Abdallah, M. M. Gaber, B. Srinivasan, and S. Krishnaswamy,
“Adaptive mobile activity recognition system with evolving data
streams,” Neurocomputing, vol. 150, pp. 304–317, 2015.
[10] I. Zliobaite, A. Bifet, B. Pfahringer, and G. Holmes, “Active learning
with drifting streaming data,” IEEE Trans. Neural Netw. Learning Syst.,
vol. 25, no. 1, pp. 27–39, 2014.
[11] S. Mohamad, A. Bouchachia, and M. Sayed Mouchaweh, “A bi-criteria
active learning algorithm for dynamic data streams,” IEEE Trans.
Neural Netw. Learning Syst., vol. 29, no. 1, pp. 74–86, 2018.
[12] C. C. Aggarwal, X. Kong, Q. Gu, J. Han, and P. S. Yu, “Active learning:
A survey,” in Data Classification: Algorithms and Applications, 2014,
pp. 571–606.
[13] L. Ma, S. Destercke, and Y. Wang, “Online active learning of decision
trees with evidential data,” Pattern Recognition, vol. 52, pp. 33–45,
2016.
[14] M. Bouguelia, Y. Belaïd, and A. Belaïd, “An adaptive streaming active
learning strategy based on instance weighting,” Pattern Recognition
Letters, vol. 70, pp. 38–44, 2016.
[15] B. Kurlej and M. Woźniak, “Active learning approach to concept drift
problem,” Logic Journal of the IGPL, vol. 20, no. 3, pp. 550–559, 2012.
[16] H. Nguyen, W. K. Ng, and Y. Woon, “Concurrent semi-supervised
learning with active learning of data streams,” Trans. Large-Scale Data-
and Knowledge-Centered Systems, vol. 8, pp. 113–136, 2013.
[17] M. Woźniak, B. Cyganek, A. Kasprzak, P. Ksieniewicz, and
K. Walkowiak, “Active learning classifier for streaming data,” in Hybrid
Artificial Intelligent Systems - 11th International Conference, HAIS
2016, Seville, Spain, April 18-20, 2016, Proceedings, 2016, pp. 186–
197.
[18] S. Mohamad, M. Sayed Mouchaweh, and A. Bouchachia, “Active
learning for classifying data streams with unknown number of classes,”
Neural Networks, vol. 98, pp. 1–15, 2018.
[19] P. M. G. Jr., S. G. T. de Carvalho Santos, R. S. M. de Barros, and D. C.
D. L. Vieira, “A comparative study on concept drift detectors,” Expert
Syst. Appl., vol. 41, no. 18, pp. 8144–8156, 2014.
[20] P. Sobolewski and M. Woźniak, “Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors,” Journal of Universal Computer Science, vol. 19, no. 4, pp. 462–483, Feb. 2013.
[21] G. Melki, V. Kecman, S. Ventura, and A. Cano, “OLLAWV: online
learning algorithm using worst-violators,” Appl. Soft Comput., vol. 66,
pp. 384–393, 2018.
[22] G. Hulten, L. Spencer, and P. M. Domingos, “Mining time-changing
data streams,” in Proceedings of the seventh ACM SIGKDD inter-
national conference on Knowledge discovery and data mining, San
Francisco, CA, USA, August 26-29, 2001, 2001, pp. 97–106.
[23] P. Domingos and G. Hulten, “A general framework for mining massive
data streams.” Journal of Computational and Graphical Statistics,
vol. 12, pp. 945–949, 2003.
[24] U. Yun and G. Lee, “Sliding window based weighted erasable stream
pattern mining for stream data applications,” Future Generation Comp.
Syst., vol. 59, pp. 1–20, 2016.
[25] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Woźniak, “Ensemble learning for data stream analysis: A survey,” Information
Fusion, vol. 37, pp. 132–156, 2017.
[26] P. R. L. Almeida, L. S. Oliveira, A. S. B. Jr., and R. Sabourin, “Adapting
dynamic classifier selection for concept drift,” Expert Syst. Appl., vol.
104, pp. 67–85, 2018.
[27] A. Bifet and R. Gavaldà, “Learning from time-changing data with
adaptive windowing,” in Proceedings of the Seventh SIAM Interna-
tional Conference on Data Mining, April 26-28, 2007, Minneapolis,
Minnesota, USA, 2007, pp. 443–448.