A Fuzzy Classifier for Data Streams with
Infinitely Delayed Labels*
Tiago Pinho da Silva1, Vinicius M. A. Souza1, Gustavo E. A. P. A. Batista1,
and Heloisa de Arruda Camargo2
1 Universidade de São Paulo, São Carlos SP 13566-590, Brazil
2 Universidade Federal de São Carlos, São Carlos SP 13565-905, Brazil
tpinho@usp.br, {vmasouza, gbatista}@icmc.usp.br, heloisa@dc.ufscar.br
Abstract. In data stream learning, classification is a prominent task that aims to predict the class labels of incoming examples. However, most classification approaches in the literature make assumptions that limit their usefulness in real scenarios, such as the supposition that the label of an example will be available right after its prediction, i.e., that there is no time delay to acquire actual labels. This is a very optimistic assumption, since labeling the entire data stream is usually not feasible. Some recent approaches overcome this limitation by using unsupervised learning methods to deal with delayed labels. Other proposals explore concepts of fuzzy set theory to add more flexibility to the learning process, although restricted to data streams with no delayed labels. In this paper, we propose a fuzzy classifier for data streams with infinitely delayed labels called FuzzMiC. Our algorithm generates a model based on fuzzy micro-clusters that provides flexible class boundaries and allows the classification of evolving data streams. Experiments show that our approach is promising in dealing with incremental changes.
Keywords: Data streams · Classification · Delayed labels · Fuzzy.
1 Introduction
Data streams generate a considerable amount of data that can be used by machine learning methods for the automatic acquisition of useful knowledge. However, real-world data streams can be potentially infinite in size and susceptible to changes in their distributions [3]. Hence, traditional methods cannot deal with data stream particularities, requiring the development of online mechanisms.
Among the possible tasks under data streams, classification is the primary
focus of this paper. In this task, unlabeled examples arrive continuously in an
orderly fashion over time, and a classifier should predict the label of each example
in real-time or at least before the arrival of the next example [3]. However,
* This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, FAPESP (#2016/04986-6 and #2018/05859-3), and CNPq (306631/2016-4). This material is based upon work supported by the United States Agency for International Development under Grant No AID-OAA-F-16-00072.
data streams present concept drifts, which can degrade the performance of a classification model over time if it becomes outdated with respect to recent changes [3]. Thus, stream classifiers require constant updates so that their performance remains stable over time in dynamic environments.
The changes may appear in different ways, such as abrupt, recurring, gradual, and incremental [3]. In this paper, we deal with incremental drifts, where there are many intermediary concepts between one concept and another. This type of change allows using the similarities among consecutive concepts to update the classification model and to adapt in scenarios with label latency [10].
Latency is a characteristic present in many stream applications, and it is defined as the time delay between the classification of an example and the arrival of its respective actual label. This delay can be null (delay = 0), intermediate (0 < delay < ∞), or extreme (delay = ∞) [8]. One of the main reasons for latency is the high cost of manual labeling by experts, combined with a high arrival rate. Besides, other problems related to the data generation mechanism can also cause this delay, such as failures in the transmission of the labels [10].
In the most challenging scenario of extreme latency, the absence of actual labels makes it unfeasible to monitor performance indicators, such as accuracy, for change detection and to use labeled data for model updates. Therefore, under this delay condition, it is necessary to develop methods capable of sustaining a stable performance even in the presence of latency. The present work contributes with an algorithm for learning under extreme latency, which is regarded as a challenging task, but also a more realistic scenario for many real applications.
To obtain more flexible learning models under evolving data stream scenarios,
researchers have developed methods using concepts of the fuzzy set theory [4,7].
In this work, we propose the Fuzzy Micro-cluster Classifier (FuzzMiC), which uses concepts of fuzzy set theory. Our proposal adopts the Supervised Fuzzy Micro-Cluster (SFMiC) [9] as the summarization structure for the classification of incoming examples. Based on the membership values, the algorithm associates a class label with each new example. Our experimental evaluation shows that the proposed method is more robust to class overlapping and presents more stable results when compared to non-fuzzy methods.
The paper is organized as follows: Section 2 discusses related work concerning classification under latency and fuzzy approaches for streams. In Section 3 we describe the proposed method FuzzMiC. In Section 4 and Section 5 we discuss the experiments and the analysis of the results. Conclusions are provided in Section 6.
2 Related Work
COMPOSE (Compacted Object Sample Extraction) [2] is a method that deals with the extreme latency problem in data streams with incremental concept drifts. This method uses semi-supervised learning and computational geometry techniques to identify and adapt the classification model to incremental changes.
Similarly, the Arbitrary Sub-Populations Tracker (APT) [6] addresses the extreme latency problem by monitoring subpopulations of examples with arbitrary probability distributions. A subpopulation is defined by a function belonging to a set of example-generating functions in a multidimensional space.
SCARGC (Stream Classification Algorithm Guided by Clustering) [11] uses successive steps of clustering over time to deal with incremental changes and extreme latency. Each group found by the clustering algorithm has label information, inherited from previous concepts starting from the initial labeled training data.
The MClassification algorithm [10], which is the basis of our proposal, applies unsupervised learning to classify new incoming examples without requiring the actual label of examples to update the classification model. The method creates an initial model composed of supervised summarization structures called micro-clusters, obtained from an initial labeled set. During the data stream classification, the algorithm checks whether a new example is within a micro-cluster's maximum radius. If so, the example is classified with the micro-cluster's class and associated with it. If not, a new micro-cluster is created with the new example as its prototype, and the associated class is taken from the closest micro-cluster. Besides, the algorithm searches for the two farthest micro-clusters of the predicted class to merge them.
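To make the radius check concrete, the core of this scheme can be sketched as below. This is a simplified illustration rather than the authors' implementation: real micro-clusters maintain sufficient statistics from which the center and radius are derived, the merge of the two farthest micro-clusters is omitted, and the function name and tuple layout are our own.

```python
import numpy as np

def mclassification_step(x, mcs, r=0.1):
    """One simplified step of a radius-based micro-cluster classifier.

    Each micro-cluster is a (center, class_id) pair; r is the maximum
    radius. A real micro-cluster keeps sufficient statistics instead of
    a bare center, and the merge step is omitted here.
    """
    dists = np.array([np.linalg.norm(x - center) for center, _ in mcs])
    nearest = int(dists.argmin())
    label = mcs[nearest][1]
    if dists[nearest] <= r:
        # Inside an existing micro-cluster: classify and absorb the
        # example (crudely, by moving the center toward it).
        center, _ = mcs[nearest]
        mcs[nearest] = ((center + x) / 2.0, label)
    else:
        # Outside all radii: the example becomes the prototype of a new
        # micro-cluster, inheriting the label of the closest one.
        mcs.append((x.copy(), label))
    return label
```

Note that every incoming example either refines an existing micro-cluster or spawns a new one, so the model keeps adapting without any true labels.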
Recently, fuzzy set theory has emerged with promising results for dealing with data streams. In [7] the authors proposed a fuzzy clustering algorithm, called Fuzzstream, that uses a fuzzy summarization structure named Fuzzy Micro-Cluster (FMiC) to maintain information about the stream examples in real time. According to the authors, the application of fuzzy set theory in data stream clustering yielded improvements in scenarios with noisy data. Besides, Hashemi et al. [4] proposed a flexible decision tree (FlexDT) for stream classification, integrating the Very Fast Decision Tree (VFDT) [5] with concepts of fuzzy sets. FlexDT was able to achieve a good performance in scenarios where concept change, noise, and missing values coexist.
Although showing good results, the fuzzy classification algorithms discussed above assume the availability of actual labels after the example prediction. Thus, we propose in this paper a fuzzy method based on MClassification, which we named FuzzMiC.
3 Fuzzy Micro-cluster Classifier - FuzzMiC
FuzzMiC is an approach for classification where extreme latency and incremental changes coexist. Our proposal is based on the MClassification algorithm, previously discussed, which uses micro-clusters to perform the classification task. Therefore, our method separates the learning process into an offline and an online phase. In the offline phase, a decision model is learned from an initial labeled set of examples. Later, in the online phase, new unlabeled examples from the stream are incrementally classified into one of the known classes.
Seeking better noise handling and flexibility, we propose a method that uses the Fuzzy C-Means (FCM) [1] clustering algorithm to create the initial classification decision model, composed of a set of fuzzy summarization structures called SFMiC (Supervised Fuzzy Micro-Cluster) [9]. Throughout the online phase, the method makes use of these structures to classify new examples.
The SFMiC [9] is defined as the vector (M, CF1x, t, class_id), where M is the linear sum of the membership values of the examples in the micro-cluster, CF1x is the linear sum of the n examples xj weighted by their memberships, t is the timestamp of the most recent example associated with the SFMiC, and class_id is the class associated with the micro-cluster.
While the MClassification algorithm associates an example from the data stream with only one micro-cluster, FuzzMiC uses membership degrees to associate an example with a set of SFMiCs of the same class.
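As a concrete illustration, the SFMiC quadruple can be kept as a small incremental summary. The sketch below is ours, not the authors' code: it assumes Euclidean feature vectors and adds an `update` rule that accumulates the membership-weighted statistics, plus a `center` helper for the weighted prototype.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SFMiC:
    """Supervised Fuzzy Micro-Cluster: the vector (M, CF1x, t, class_id)."""
    M: float            # linear sum of the memberships of assigned examples
    CF1x: np.ndarray    # linear sum of the examples weighted by membership
    t: int              # timestamp of the most recent assigned example
    class_id: int       # class associated with the micro-cluster

    def update(self, x: np.ndarray, membership: float, timestamp: int):
        # Absorb one example incrementally, weighted by its membership.
        self.M += membership
        self.CF1x = self.CF1x + membership * x
        self.t = timestamp

    def center(self) -> np.ndarray:
        # Membership-weighted prototype of the micro-cluster.
        return self.CF1x / self.M
```

Because only sums and a timestamp are stored, an SFMiC can be updated in constant time per example, which suits the one-pass constraint of streams.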
Algorithm 1 shows the offline phase of the proposed method. This phase requires as input the FCM fuzzification parameter m_off, a multiplying factor ω that controls the number of micro-clusters per class, and the initial labeled set init_points used to compute the first micro-clusters.
Algorithm 1 FuzzMiC - Offline Phase
Require: m_off, ω, init_points
1: model ← ∅
2: for each class Ci ∈ init_points do
3:   class_clusters ← FCM(init_points_{class=Ci}, m_off, ω·d)
4:   class_SFMiC ← summarize(class_clusters)
5:   model ← model ∪ class_SFMiC
6: end for
In the beginning, for each class of the initial labeled data, the set of corresponding points is given as input to the FCM clustering algorithm (Step 3) to generate ω·d clusters per class, where d corresponds to the number of attributes in the evaluated dataset. The clusters found for a class are stored in the variable class_clusters and then summarized into supervised fuzzy micro-cluster structures by the function summarize (Step 4). The decision model is defined as the set of SFMiCs found for all classes (Step 5).
In the online phase, classification is performed on the examples arriving from the data stream DS. Algorithm 2 presents this process, where θ corresponds to an adaptation threshold for the classification step, max_mic_class is the maximum number of SFMiCs per class, and m_on is the fuzzification parameter for the membership computation.
In Algorithm 2, for each example x arriving from the stream, the algorithm calculates the membership of x to all current SFMiCs (Step 3); the membership is computed in the same way as in the FCM algorithm [1]. Then, the memberships to SFMiCs of the same class are summed, resulting in a value of compatibility of x with each class, which we call class compatibility (Steps 5-8). These values are used to decide which existing class label will be assigned to x, by taking the class of maximum compatibility. After that, the algorithm checks whether x is inside the decision boundary formed by the SFMiCs of the predicted class Ci (Step 11), by verifying whether the maximum compatibility is greater than or equal to the threshold parameter θ. If true, a new SFMiC is created for Ci with x as its prototype; if the maximum number of SFMiCs (max_mic_class) of Ci is then exceeded, the oldest SFMiC of Ci is removed based on the timestamp component t (Steps 12-15). Otherwise, x is considered an outlier and is only used to update the SFMiCs of class Ci (Step 17).
This procedure ensures that SFMiCs are created only inside the class decision boundary. Since we are dealing with incremental concept drifts, the constant creation of new SFMiCs, along with the updates from outliers, helps the SFMiCs to move in the direction of the drift. Besides, not creating SFMiCs on the class boundaries decreases misclassification when there is partial class overlapping.
Algorithm 2 FuzzMiC - Online Phase
Require: DS, θ, max_mic_class, m_on
1: while !isempty(DS) do
2:   x ← next(DS)
3:   all_membership ← membership(x, model, m_on)
4:   all_comp ← ∅
5:   for each class Ci ∈ model do
6:     class_compatibility ← sum(all_membership_{class=Ci})
7:     all_comp ← all_comp ∪ class_compatibility
8:   end for
9:   (max_class, max_comp) ← max(all_comp)
10:  x.class ← max_class
11:  if max_comp ≥ θ then
12:    create_sfmic(model, x)
13:    if |model_{class=max_class}| > max_mic_class then
14:      remove_old_sfmic(model)
15:    end if
16:  else
17:    update(model_{class=max_class}, x)
18:  end if
19: end while
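One pass through the loop body can be sketched as below, reusing the dictionary-based SFMiC summaries from the offline sketch: memberships of x to the micro-cluster centers are computed with the FCM formula, summed per class, and compared against θ. The function name and data layout are our assumptions, not the authors' code.

```python
import numpy as np

def online_step(x, model, theta, max_mic_class, m_on=2.0, t=0):
    """Classify one example x and update the model (body of Algorithm 2)."""
    centers = np.array([mic["CF1x"] / mic["M"] for mic in model])
    dist = np.fmax(np.linalg.norm(centers - x, axis=1), 1e-12)
    u = dist ** (-2.0 / (m_on - 1.0))
    u /= u.sum()                                  # FCM-style memberships
    classes = np.array([mic["class_id"] for mic in model])
    comp = {ci: u[classes == ci].sum() for ci in np.unique(classes)}
    pred = max(comp, key=comp.get)                # class of max compatibility
    if comp[pred] >= theta:
        # Inside the class decision boundary: spawn a new SFMiC with x as
        # its prototype, evicting the oldest SFMiC of the class if needed.
        model.append({"M": 1.0, "CF1x": x.copy(), "t": t, "class_id": pred})
        same = [i for i, mic in enumerate(model) if mic["class_id"] == pred]
        if len(same) > max_mic_class:
            model.pop(min(same, key=lambda i: model[i]["t"]))
    else:
        # Outlier: only update the SFMiCs of the predicted class.
        for i, mic in enumerate(model):
            if mic["class_id"] == pred:
                mic["M"] += u[i]
                mic["CF1x"] = mic["CF1x"] + u[i] * x
                mic["t"] = t
    return pred
```

An example near a class prototype yields a high class compatibility and spawns a new SFMiC there, which is what drags the model along an incremental drift.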
4 Experimental Setup
We evaluated our proposed method on two real-world problems and 13 synthetic benchmark datasets proposed in [11]. In order to verify the advantages of the fuzzy-based approach proposed here, we compare the results of FuzzMiC against the results obtained by MClassification. We also consider two bounds that simulate a static supervised classifier (Static) and a classifier that is constantly updated with no delay in obtaining the actual labels (Sliding).
Regarding the parameters for all synthetic datasets, the initial labeled set (init_points) was defined as the first 150 examples of the data stream. Concerning our proposal, the offline phase parameters m_off and ω were both set to 2. In addition, the online phase parameters m_on, θ, and max_mic_class were set to 2, 0.9, and ω·d, respectively, where d corresponds to the number of attributes of the evaluated dataset.
For the Keystroke dataset, which describes 8 sessions of 4 users typing the password “.tie5Roanl” plus the Enter key 400 times per session, we consider the examples from the first session as the initial labeled set for all methods. The FuzzMiC offline phase parameters m_off and ω were set to 2 and 4, respectively; the online phase parameters m_on, θ, and max_mic_class were set to 2, 0.39, and ω·d, respectively.
For the NOAA dataset, the initial labeled set was defined as the first 30 examples for the Static and Sliding methods, as described in [10]. The MClassification and FuzzMiC methods had their initial labeled set defined as the first 10 examples of the data stream. Regarding the FuzzMiC parameters, the offline parameters m_off and ω were set to 2 and 0.25, respectively; the online phase parameters m_on, θ, and max_mic_class were set to 2, 0.85, and 4·d, respectively.
These parameter values were chosen for the offline and online phases because they led to the best results in preliminary experiments. Concerning MClassification, the parameter r was set to its default value (0.1).
5 Analysis of Results
A first assessment of the averaged accuracy (Table 1) shows that FuzzMiC performs comparably to or better than MClassification in most cases. However, this average may not represent the performance of each algorithm over time. For a more thorough evaluation, we present some examples in detail (Fig. 1).
Table 1. Average accuracies over time on benchmark data
Dataset Static Sliding MClassification FuzzMiC
1CDT 97.01 99.88 99.89 99.88
1CHT 91.96 99.24 99.38 99.31
1CSurr 65.75 98.52 85.15 79.50
2CDT 54.38 93.47 95.23 95.95
2CHT 54.03 85.44 87.93 89.07
4CE1CF 95.81 97.15 94.38 92.28
4CRE-V1 26.17 97.64 90.63 98.22
4CRE-V2 27.11 89.37 91.59 92.02
5CVT 40.72 86.86 88.40 90.37
GEARS 2C 2D 93.62 99.86 94.73 95.20
UG 2C 2D 47.28 94.27 95.28 94.98
UG 2C 3D 60.64 92.86 94.72 94.88
UG 2C 5D 68.81 89.91 91.25 91.86
NOAA 66.19 72.01 67.54 68.63
KEYSTROKE 68.69 90.14 90.62 90.25
In Fig. 1a, we show the results considering 100 evaluation moments for the 4CRE-V1 dataset. We can note that all methods present 5 major decays, which are related to moments of class overlapping (Fig. 2a and Fig. 2b). However, the proposed method had the lowest decay in most moments, achieving the best results for this dataset. This can be explained by the fact that FuzzMiC does not create micro-clusters near the class boundaries (see Fig. 2b), which is not true for MClassification. Therefore, our proposal decreases the misclassification when moments of partial class overlapping occur.
In GEARS 2C 2D, the decision boundary of each class has the shape of a star that rotates around a fixed center over time (Fig. 2c and Fig. 2d). The outcomes are shown considering 100 evaluation moments (Fig. 1b). The results show that our proposal was able to generate more stable results when compared to MClassification. Besides, FuzzMiC behaved similarly to the approach with no latency (Sliding), while MClassification behaved similarly to the Static approach, indicating a certain invariance of FuzzMiC to the changes in this dataset, due to the generation of new micro-clusters near the stationary center of the classes (see Fig. 2c). On the other hand, MClassification may generate new micro-clusters on the class boundaries, where the changes occur, causing some instability in its performance.
[Figure 1 panels: (a) synthetic data 4CRE-V1, window size 1247; (b) synthetic data GEARS 2C 2D, window size 1997; (c) real data Keystroke, window size 207; (d) real data NOAA, window size 363. Each panel plots accuracy over evaluation moments for FuzzMiC, MClassification, Static, and Sliding.]
Fig. 1. Accuracy achieved over time by the methods in 4 evaluated datasets
[Figure 2 panels: (a) 4CRE-V1 at evaluation moment 14; (b) 4CRE-V1 at evaluation moment 15; (c) GEARS 2C 2D at evaluation moment 1; (d) GEARS 2C 2D at evaluation moment 2. Each panel is a scatter plot over Feature 1 and Feature 2.]
Fig. 2. Snapshots of 4CRE-V1 (left) and GEARS 2C 2D (right). Each class is shown in a different color and the micro-clusters are represented by the X-shaped marks
Concerning the real data Keystroke, the evaluation was carried out considering 7 evaluation moments, each related to a session of data collection (Fig. 1c). In this dataset, FuzzMiC was able to generate better results in the first 2 sessions when compared to the remaining approaches. However, our approach presents a minor decrease in accuracy during sessions 3 and 4, which increases the chances of creating micro-clusters for noisy data. Nonetheless, during sessions 5, 6, and 7, our proposal was able to recover and even achieve slightly better results than MClassification in sessions 5 and 7. Altogether, except for Static, the methods presented a similar behavior, as seen in Table 1, which can be regarded as an advantage for FuzzMiC and MClassification.
The evaluation on the real data NOAA was carried out considering 50 time moments, each related to a year of weather measurements (Fig. 1d). FuzzMiC generated better results than MClassification in the initial moments, while achieving similar results in the remaining ones. Despite the low accuracy obtained by FuzzMiC and MClassification over time, the Sliding approach also achieved similar results, which indicates the high complexity of this dataset. Thus, the results obtained by the proposed method can be seen as positive, since they provide a slight increase in accuracy when compared to MClassification.
In general, we can see that FuzzMiC presents results similar or superior to MClassification. It handles partially overlapping classes better and presents a certain invariance to some concept changes, because the SFMiC structure makes a flexible learning process possible.
6 Final Considerations
This work presents a fuzzy classifier for data streams under extreme latency named FuzzMiC. Experiments show that FuzzMiC obtains promising results on the evaluated datasets. Notably, the flexibility added by the integration of fuzzy micro-clusters enables the proposed method to deal better with changes in the data stream than the crisp approach MClassification, especially in the presence of partial class overlapping.
The experiments in this work were designed to validate our proposal and highlight its advantages with respect to the algorithm that motivated its creation. Thus, there is still room for further investigation, such as the comparison with other data stream classifiers for extreme latency scenarios and tests with different real-world datasets with incremental drifts. Another line of future research should contemplate the automatic adaptation of the θ threshold.
References
1. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms.
Springer US (1981)
2. Dyer, K.B., Capo, R., Polikar, R.: Compose: A semisupervised learning framework
for initially labeled nonstationary streaming data. TNNLS 25(1), 12–26 (2014)
3. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. CSUR 46(4), 44 (2014)
4. Hashemi, S., Yang, Y.: Flexible decision tree for data stream classification in the
presence of concept change, noise and missing values. Data Mining and Knowledge
Discovery 19(1), 95–131 (2009)
5. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In:
ACM SIGKDD. pp. 97–106. ACM (2001)
6. Krempl, G.: The algorithm apt to classify in concurrence of latency and drift. In:
IDA. pp. 222–233 (2011)
7. Lopes, P.A., Camargo, H.A.: Fuzzstream: Fuzzy data stream clustering based on
the online-offline framework. In: FUZZ-IEEE (2017)
8. Marrs, G.R., Hickey, R.J., Black, M.M.: The impact of latency on online classifi-
cation learning with concept drift. In: KSEM. pp. 459–469 (2010)
9. Silva, T.P., Urban, G.A., Lopes, P.A., Camargo, H.A.: A fuzzy variant for on-demand data stream classification. In: BRACIS. pp. 67–72 (2017)
10. Souza, V.M.A., Silva, D.F., Batista, G.E.A.P.A., Gama, J.: Classification of evolving data streams with infinitely delayed labels. In: ICMLA. pp. 214–219 (2015)
11. Souza, V.M.A., Silva, D.F., Gama, J., Batista, G.E.A.P.A.: Data stream classification guided by clustering on nonstationary environments and extreme verification latency. In: SIAM SDM. pp. 873–881 (2015)
... Their results showed a reduction in mean error using GRAPE compared with several methods. Another classifier has been proposed by [6] based on micro-clusters for extreme latency in data stream labeling. It utilized a fuzzy concept for creating more elastic boundaries for the classes. ...
Conference Paper
Full-text available
Research interest in data stream classification is increasing through the development of adaptive machine learning techniques. These techniques involve continuously adjusting the classification model in response to changes in the data distribution. Most of these techniques assume instance labeling for the classes to perform the model adapting process, and this assumption is rare with actual data. This work proposes using a semi-supervised label propagation technique to infer many delayed labels (considered missing values) from limited known values in a data stream. The work's implementation included using two imbalanced EEG datasets, CHB-MIT Scalp and Siena Scalp datasets, to evaluate the proposed method with various values for missing ratios. The results showed the proposed method's ability to recover all the negative class values in both datasets with a missing percentage reaching 70\%. Due to the rare positive class, the recovery of its value decreased with more than 30\% missing ratio.
Conference Paper
Novelty detection is an important topic in data stream classification, as it is responsible for identifying the emergence of new concepts, new patterns, and outliers. It becomes necessary when the true label of an instance is not available right after its classification. The time between the classification of an instance and the arrival of its true label is called latency. This is a common scenario in data streams applications. However, most classification algorithms do not consider such a problem and assume that there will be no latency. On the other hand, a few methods in the literature cope with the existence of infinite latency and novelty detection in data streams. In this work, however, we focus on the scenario where the true labels will be available to the system after a certain time, called intermediate latency. Such a scenario is present in the stock market and weather datasets. Moreover, aiming for more flexible learning to deal with the uncertainties inherent in data streams, we consider the use of fuzzy set theory concepts. Therefore, we propose a method for classification and novelty detection in data streams called Fuzzy Classifier with Novelty Detection for data streams (FuzzCND). Our method uses an ensemble of fuzzy decision trees to perform the classification of new instances and applies the concepts of fuzzy set theory to detect possible novelties. The experiments showed that our approach is promising in dealing with the emergence of new concepts in data streams and inaccuracies in the data.
Conference Paper
Full-text available
Data stream classification algorithms for nonstationary environments frequently assume the availability of class labels, instantly or with some lag after the classification. However , certain applications, mainly those related to sensors and robotics, involve high costs to obtain new labels during the classification phase. Such a scenario in which the actual labels of processed data are never available is called extreme verification latency. Extreme verification latency requires new classification methods capable of adapting to possible changes over time without external supervision. This paper presents a fast, simple, intuitive and accurate algorithm to classify nonstationary data streams in an extreme verification latency scenario, namely Stream Classification Algorithm Guided by Clustering – SCARGC. Our method consists of a clustering followed by a classification step applied repeatedly in a closed loop fashion. We show in several classification tasks evaluated in synthetic and real data that our method is faster and more accurate than the state-of-the-art.
Article
Full-text available
Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
Article
Full-text available
An increasing number of real-world applications are associated with streaming data drawn from drifting and nonstationary distributions that change over time. These applications demand new algorithms that can learn and adapt to such changes, also known as concept drift. Proper characterization of such data with existing approaches typically requires substantial amount of labeled instances, which may be difficult, expensive, or even impractical to obtain. In this paper, we introduce compacted object sample extraction (COMPOSE), a computational geometry-based framework to learn from nonstationary streaming data, where labels are unavailable (or presented very sporadically) after initialization. We introduce the algorithm in detail, and discuss its results and performances on several synthetic and real-world data sets, which demonstrate the ability of the algorithm to learn under several different scenarios of initially labeled streaming environments. On carefully designed synthetic data sets, we compare the performance of COMPOSE against the optimal Bayes classifier, as well as the arbitrary subpopulation tracker algorithm, which addresses a similar environment referred to as extreme verification latency. Furthermore, using the real-world National Oceanic and Atmospheric Administration weather data set, we demonstrate that COMPOSE is competitive even with a well-established and fully supervised nonstationary learning algorithm that receives labeled data in every batch.
Conference Paper
Full-text available
Online classification learners operating under concept drift can be subject to latency in examples arriving at the training base. A discussion of latency and the related notion of example filtering leads to the development of an example life cycle for online learning (OLLC). Latency in a data stream is modelled in a new Example Life-cycle Integrated Simulation Environment (ELISE). In a series of experiments, the online learner algorithm CD3 is evaluated under several drift and latency scenarios. Results show that systems subject to large random latencies can, when drift occurs, suffer substantial deterioration in classification rate with slow recovery.
Conference Paper
Population drift is a challenging problem in classification, and denotes changes in probability distributions over time. Known drift-adaptive classification methods such as incremental learning rely on current, labelled data for classification model updates, assuming that such labelled data are available without verification latency. However, verification latency is a relevant problem in some application domains, where predictions have to be made far into the future. This concurrence of drift and latency requires new approaches in machine learning. We propose a two-stage learning strategy: First, the nature of drift in temporal data needs to be identified. This requires the formulation of explicit drift models for the underlying data generating process. In a second step, these models are used to substitute scarce labelled data for updating classification models. This paper contributes an explicit drift model, which is characterising a mixture of independently evolving sub-populations. In this model, the joint distribution is a mixture of arbitrarily distributed sub-populations drifting over time. An arbitrary sub-population tracker algorithm is presented, which can track and predict the distributions by the use of unlabelled data. Experimental evaluation shows that the presented APT algorithm is capable of tracking and predicting changes in the posterior distribution of class labels accurately.
In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised accordingly and in a timely manner. To detect concept change, a common methodology is to observe the online classification accuracy: if accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately, this assumption is often violated in the real world, where data streams carry noise that can also introduce a significant reduction in classification accuracy. To compound the problem, traditional noise-cleansing methods are ill-suited to data streams: they normally need to scan the data multiple times, whereas stream learning can afford only a one-pass scan because of the data's high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model should classify them, and how it should update itself according to them, are issues whose solutions are far from being explored. To solve these problems, this paper proposes a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. The advantages are three-fold. First, FlexDT offers a flexible structure to effectively and efficiently handle concept change. Second, FlexDT is robust to noise, so it can prevent noise from interfering with classification accuracy, and an accuracy drop can be safely attributed to concept change. Third, it deals with missing values in an elegant way. Extensive evaluations compare FlexDT with representative existing data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise and missing values coexist.
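FlexDT's flexibility comes from replacing crisp decision-tree tests with fuzzy ones, so an instance can descend both branches with partial membership, and a missing value simply flows down both sides. A minimal sketch of such a fuzzy test follows; the ramp shape and parameter names are illustrative, not taken from the paper:

```python
def fuzzy_split(x, threshold, width):
    """Degree of membership in the 'left' branch of a fuzzy test node.
    A soft ramp replaces the crisp test x <= threshold; a missing value
    (None) flows down both branches with equal weight."""
    if x is None:
        return 0.5  # missing value: split evenly between branches
    if x <= threshold - width:
        return 1.0  # fully left of the fuzzy region
    if x >= threshold + width:
        return 0.0  # fully right of the fuzzy region
    # linear ramp across [threshold - width, threshold + width]
    return (threshold + width - x) / (2 * width)
```

The membership in the right branch is just `1 - fuzzy_split(...)`, so class votes from both branches can be combined in proportion to these degrees.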
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.
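CVFDT's key mechanism, growing an alternate subtree whenever an old one becomes questionable and swapping it in once it proves more accurate, can be sketched with a bare-bones node skeleton. The class and method names below are illustrative, not from the original VFDT/CVFDT implementation:

```python
class Node:
    """Skeleton of CVFDT-style alternate-subtree replacement."""

    def __init__(self, model):
        self.model = model      # current subtree / predictor
        self.alternate = None   # candidate grown on recent examples
        self.errors = {"model": 0, "alternate": 0}

    def record(self, correct_model, correct_alt):
        # O(1) bookkeeping per example: count mistakes of each subtree
        self.errors["model"] += 0 if correct_model else 1
        if self.alternate is not None:
            self.errors["alternate"] += 0 if correct_alt else 1

    def maybe_swap(self):
        # replace the old subtree once the alternate is more accurate
        if (self.alternate is not None
                and self.errors["alternate"] < self.errors["model"]):
            self.model, self.alternate = self.alternate, None
            self.errors = {"model": 0, "alternate": 0}
```

This constant-per-example bookkeeping is what lets CVFDT stay current at O(1) cost per example, instead of the O(w) cost of re-learning on a window of w examples.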