A Fuzzy Classifier for Data Streams with Infinitely Delayed Labels*
Tiago Pinho da Silva1, Vinicius M. A. Souza1, Gustavo E. A. P. A. Batista1,
and Heloisa de Arruda Camargo2
1 Universidade de São Paulo, São Carlos SP 13566-590, Brazil
2 Universidade Federal de São Carlos, São Carlos SP 13565-905, Brazil
tpinho@usp.br, {vmasouza, gbatista}@icmc.usp.br, heloisa@dc.ufscar.br
Abstract. In data stream learning, classification is a prominent task which aims to predict the class labels of incoming examples. However, most classification approaches from the literature make assumptions that limit their usefulness in real scenarios, such as the supposition that the label of an example will be available right after its prediction, i.e., that there is no time delay in acquiring actual labels. This is a very optimistic assumption, since labeling the entire data stream is usually not feasible. Some recent approaches overcome this limitation by considering unsupervised learning methods to deal with delayed labels. Also, some proposals explore concepts of fuzzy set theory to add more flexibility to the learning process, although restricted to data streams with no delayed labels. In this paper, we propose a fuzzy classifier for data streams with infinitely delayed labels called FuzzMiC. Our algorithm generates a model based on fuzzy micro-clusters that provides flexible class boundaries and allows the classification of evolving data streams. Experiments show that our approach is promising in dealing with incremental changes.
Keywords: Data streams · Classification · Delayed Labels · Fuzzy.
1 Introduction
Data streams generate a considerable amount of data that can be used by machine learning methods for the automatic acquisition of useful knowledge. However, real-world data streams can be potentially infinite in size and susceptible to changes in their distributions [3]. Hence, traditional methods cannot deal with the particularities of data streams, requiring the development of online mechanisms.
Among the possible tasks under data streams, classification is the primary
focus of this paper. In this task, unlabeled examples arrive continuously in an
orderly fashion over time, and a classifier should predict the label of each example
in real-time or at least before the arrival of the next example [3]. However,
* This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, FAPESP (#2016/04986-6 and #2018/05859-3), and CNPq (306631/2016-4). This material is based upon work supported by the United States Agency for International Development under Grant No AID-OAA-F-16-00072.
data streams present concept drifts, which can degrade the performance of a classification model over time if it is not updated to reflect recent changes [3]. Thus, stream classifiers require constant updates so that their performance remains stable over time in dynamic environments.
The changes may appear in different ways, such as abrupt, recurring, gradual, and incremental [3]. In this paper, we deal with incremental drifts, where there are many intermediary concepts between one concept and another. This type of change allows using the similarities among consecutive concepts to update the classification model and to adapt in scenarios with label latency [10].
Latency is a characteristic present in many stream applications, and it is defined as the time delay between the process of classifying an example and the arrival of its respective actual label. This time can be null (delay = 0), intermediate (0 < delay < ∞), or extreme (delay = ∞) [8]. One of the main reasons for latency is the high cost of manual labeling by experts along with a high arrival rate. Besides, other problems related to the data generation mechanism can also cause this delay, such as failures in the transmission of the labels [10].
In the most challenging scenario of extreme latency, the absence of actual labels makes it unfeasible to monitor performance indicators, such as accuracy, for change detection, and to use labeled data for model updates. Therefore, under this delay condition, it is necessary to develop methods capable of producing a stable performance even in the presence of latency. The present work contributes with algorithms for learning under extreme latency, which is regarded as a challenging task, but also a more realistic scenario for many real applications.
To obtain more flexible learning models under evolving data stream scenarios, researchers have developed methods using concepts of fuzzy set theory [4,7]. In this work, we propose the Fuzzy Micro-cluster Classifier (FuzzMiC), which uses concepts of fuzzy set theory. Our proposal adopts the Supervised Fuzzy Micro-Clusters (SFMiC) [9] as the summarization structure for the classification of incoming examples. Based on the membership values, the algorithm associates a class label with a new example. Our experimental evaluation shows that the proposed method is more robust to class overlapping and presents more stable results when compared to non-fuzzy methods.
The paper is organized as follows: Section 2 discusses related work concerning classification under latency and fuzzy approaches for streams. In Section 3 we describe the proposed method FuzzMiC. In Section 4 and Section 5 we discuss the experiments and the analysis of the results. Conclusions are provided in Section 6.
2 Related Work
COMPOSE (Compacted Object Sample Extraction) [2] is a method that deals with the extreme latency problem in data streams with incremental concept drifts. This method uses semi-supervised learning and computational geometry techniques to identify and adapt the classification model to incremental changes.
Similarly, the Arbitrary Sub-Populations Tracker (APT) [6] addresses the extreme latency problem by monitoring subpopulations of examples with arbitrary probability distributions. A subpopulation is defined as a function belonging to a set of example-generating functions in a multidimensional space.
SCARGC (Stream Classification Algorithm Guided by Clustering) [11] uses successive steps of clustering over time to deal with incremental changes and extreme latency. Each group found by the clustering algorithm has label information inherited from old concepts, starting from the initial labeled training data.
The MClassification algorithm [10], which is the basis of our proposal, applies unsupervised learning approaches to classify new incoming examples without requiring the actual labels of examples to update the classification model. The method creates an initial model composed of supervised summarization structures called micro-clusters, obtained from an initial labeled set. During the data stream classification, the algorithm checks if a new example is within a micro-cluster's maximum radius. If so, the example is classified with the micro-cluster's class and associated with it. If not, a new micro-cluster is created with the new example as its prototype, and the associated class is that of the closest micro-cluster. Besides, the algorithm searches for the two farthest micro-clusters of the predicted class to merge them.
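As a rough illustration of this radius rule, the core decision of MClassification can be written as follows. This is a simplified sketch with our own naming, not the authors' implementation; the merging step and micro-cluster statistics are omitted.

```python
import math

def mclassification_step(x, micro_clusters, max_radius):
    """Simplified core rule of MClassification: if the new example x falls
    within the maximum radius of its nearest micro-cluster, it inherits that
    cluster's label; otherwise a new micro-cluster is spawned at x with the
    label of the closest cluster. (Illustrative sketch only.)"""
    nearest = min(micro_clusters, key=lambda mc: math.dist(x, mc["center"]))
    if math.dist(x, nearest["center"]) <= max_radius:
        return nearest["label"]          # absorbed by an existing cluster
    # outside every radius: seed a new cluster, label inherited from nearest
    micro_clusters.append({"center": x, "label": nearest["label"]})
    return nearest["label"]
```

Because the new cluster inherits the label of its closest neighbor, labels propagate through incremental drift without any actual labels arriving.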
Recently, the fuzzy set theory has emerged with promising results to deal with data streams. In [7], the authors proposed a fuzzy clustering algorithm called FuzzStream, which uses a fuzzy summarization structure named Fuzzy Micro-Cluster (FMiC) to maintain information about the stream examples in real time. According to the authors, the application of fuzzy set theory to data stream clustering presented improvements in scenarios with noisy data. Besides, Hashemi et al. [4] proposed a flexible decision tree (FlexDT) for stream classification, integrating the Very Fast Decision Tree (VFDT) [5] with concepts of fuzzy sets. FlexDT was able to achieve good performance in scenarios where concept change, noise, and missing values coexist.
Although showing good results, the fuzzy classification algorithms discussed consider the availability of actual labels after the example prediction. Thus, we propose in this paper a fuzzy method based on MClassification, which we named FuzzMiC.
3 Fuzzy Micro-cluster Classifier - FuzzMiC
FuzzMiC is an approach for classification where extreme latency and incremental changes coexist. Our proposal is based on the MClassification algorithm, previously discussed, which uses micro-clusters to perform the classification task. Therefore, our method separates the learning process into an offline and an online phase. In the offline phase, a decision model is learned from an initial labeled set of examples. Later, in the online phase, new unlabeled examples from the stream are incrementally classified into one of the known classes.
Seeking better noise handling and flexibility, we propose a method that uses the Fuzzy C-Means (FCM) [1] clustering algorithm to create the initial classification decision model, composed of a set of fuzzy summarization structures called SFMiC (Supervised Fuzzy Micro-Cluster) [9]. Throughout the online phase, the method makes use of these structures to classify new examples.
The SFMiC [9] is defined as the vector (M, CF1x, t, class_id), where M is the linear sum of the membership values of the examples in the micro-cluster, CF1x is the linear sum of the n examples xj weighted by their memberships, t is the timestamp of the most recent example associated with the SFMiC, and class_id is the class associated with the micro-cluster.
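Under this definition, an SFMiC can be maintained incrementally: absorbing an example adds its membership degree to M, its membership-weighted coordinates to CF1x, and refreshes the timestamp t. A minimal sketch in Python (the class, field, and method names are ours, chosen to mirror the vector above):

```python
from dataclasses import dataclass, field

@dataclass
class SFMiC:
    """Supervised Fuzzy Micro-Cluster summary: (M, CF1x, t, class_id)."""
    class_id: int
    dim: int
    M: float = 0.0                             # linear sum of membership values
    CF1x: list = field(default_factory=list)   # membership-weighted linear sum
    t: int = -1                                # timestamp of most recent example

    def __post_init__(self):
        if not self.CF1x:
            self.CF1x = [0.0] * self.dim

    def absorb(self, x, membership, timestamp):
        """Incrementally add example x with its membership degree."""
        self.M += membership
        for i, xi in enumerate(x):
            self.CF1x[i] += membership * xi
        self.t = timestamp

    def center(self):
        """Membership-weighted prototype of the micro-cluster."""
        return [c / self.M for c in self.CF1x]
```

The constant-size summary is what makes the structure suitable for potentially infinite streams: no raw examples need to be stored.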
While the MClassification algorithm associates an example from the data stream with only one micro-cluster, FuzzMiC considers membership degrees to associate an example with a set of SFMiCs from the same class.
Concerning the proposed method, Algorithm 1 shows the offline phase. This phase requires as input the FCM parameter m_off, a multiplying factor ω concerning the number of micro-clusters per class, and the initial labeled set init_points used to compute the first micro-clusters.
Algorithm 1 FuzzMiC - Offline Phase
Require: m_off, ω, init_points
1: model ← ∅
2: for each class Ci ∈ init_points do
3:   class_clusters ← FCM(init_points[class=Ci], m_off, ω ∗ d)
4:   class_SFMiC ← summarize(class_clusters)
5:   model ← model ∪ class_SFMiC
6: end for
In the beginning, for each class of the initial labeled data, the set of corresponding points is given as input to the FCM clustering algorithm (Step 3) to generate ω ∗ d clusters for each class, where d corresponds to the number of attributes in the evaluated dataset. The clusters found for a class are stored in the variable class_clusters and later summarized into supervised fuzzy micro-cluster structures by the function summarize (Step 4). The decision model is defined as the set of SFMiCs found for all the different classes (Step 5).
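Step 3 relies on Fuzzy C-Means. For concreteness, a compact, self-contained FCM sketch is given below: it is our own minimal implementation for illustration, assuming Euclidean distance and a fixed iteration budget rather than a convergence test.

```python
import math
import random

def fcm(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Minimal Fuzzy C-Means: returns (centers, memberships), where
    memberships[j][i] is the degree of point j in cluster i."""
    rng = random.Random(seed)
    # random initial membership matrix, each row normalized to sum to 1
    U = []
    for _ in points:
        row = [rng.random() for _ in range(n_clusters)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(n_iter):
        # update centers as membership^m weighted means of the points
        centers = []
        for i in range(n_clusters):
            w = [U[j][i] ** m for j in range(len(points))]
            tot = sum(w)
            centers.append([
                sum(wj * p[d] for wj, p in zip(w, points)) / tot
                for d in range(len(points[0]))
            ])
        # update memberships from the distances to the new centers
        for j, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            for i in range(n_clusters):
                U[j][i] = 1.0 / sum(
                    (dists[i] / dk) ** (2.0 / (m - 1.0)) for dk in dists
                )
    return centers, U
```

In Algorithm 1, this routine would be run once per class with n_clusters = ω ∗ d, and each resulting cluster summarized into an SFMiC.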
In the online phase of the algorithm, the classification is performed considering the arriving examples from the data stream DS. Algorithm 2 presents the process for the online phase, where θ corresponds to an adaptation threshold for the classification step, max_mic_class is the maximum number of SFMiCs per class, and m_on is the fuzzification parameter regarding the membership.
In Algorithm 2, for each example x arriving from the stream, the algorithm calculates the membership of x to all current SFMiCs (Step 3); the membership is calculated in the same way as in the FCM algorithm [1]. Afterward, the memberships regarding SFMiCs of the same class are summed, resulting in a value of compatibility of x with each class, which we call class compatibility (Steps 5-8). These values are used to decide which existing class label x will be assigned, by verifying the maximum compatibility of x to a class. After that, the algorithm checks if x is inside the decision boundary formed by the SFMiCs of the predicted class Ci (Step 9), by verifying if the maximum compatibility is greater than or equal to a threshold parameter θ. If true, a new SFMiC is created for Ci with x as its prototype; to do so, if the maximum number of SFMiCs (max_mic_class) of Ci has been reached, the oldest SFMiC of Ci is removed based on the timestamp component t (Steps 12-13). Otherwise, x is considered an outlier and is only used to update the SFMiCs of class Ci.
This procedure ensures that SFMiCs will be created only inside the class decision boundary. Since we are dealing with incremental concept drifts, the constant creation of new SFMiCs, along with the updates from outliers, helps the SFMiCs move in the direction of the drift. Besides, not creating SFMiCs on the class boundaries decreases misclassification when there is partial class overlapping.
Algorithm 2 FuzzMiC - Online Step
Require: DS, θ, max_mic_class, m_on
1: while !isempty(DS) do
2:   x ← next(DS)
3:   all_membership ← membership(x, model, m_on)
4:   all_comp ← ∅
5:   for each class Ci ∈ model do
6:     class_compatibility ← sum(all_membership[class=Ci])
7:     all_comp ← all_comp ∪ class_compatibility
8:   end for
9:   (max_class, max_comp) ← max(all_comp)
10:  x.class ← max_class
11:  if max_comp ≥ θ then
12:    create_sfmic(model, x)
13:    if |model[class=max_class]| > max_mic_class then
14:      remove_old_sfmic(model)
15:    end if
16:  else
17:    update(model[class=max_class], x)
18:  end if
19: end while
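The per-example decision of Algorithm 2 can be sketched as follows, with the model simplified to a flat list of (center, class_id) pairs. The function names and this simplification are ours; Steps 12-17, which create, prune, or update SFMiCs, are reduced here to a boolean flag.

```python
import math

def classify(x, model, theta, m_on=2.0):
    """One online step of the sketch: predict a class for x and report
    whether a new SFMiC should be created (compatibility >= theta) or x
    should only update existing SFMiCs of the predicted class.
    model is a list of (center, class_id) pairs."""
    centers = [c for c, _ in model]
    dists = [max(math.dist(x, c), 1e-12) for c in centers]
    # FCM-style membership of x to every micro-cluster (Step 3)
    memberships = [
        1.0 / sum((di / dk) ** (2.0 / (m_on - 1.0)) for dk in dists)
        for di in dists
    ]
    # class compatibility: sum of memberships per class (Steps 5-8)
    comp = {}
    for (_, cls), u in zip(model, memberships):
        comp[cls] = comp.get(cls, 0.0) + u
    pred = max(comp, key=comp.get)          # Steps 9-10
    create_new = comp[pred] >= theta        # Step 11
    return pred, create_new
```

Note that an example near several SFMiCs of one class accumulates high compatibility, while an example between classes splits its membership and falls below θ, which is how the method keeps new SFMiCs off the class boundaries.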
4 Experimental Setup
We evaluated our proposed method on two real-world problems and 13 synthetic benchmark datasets proposed in [11]. In order to verify the advantages of the fuzzy-based approach proposed here, we compare the results of FuzzMiC against the results obtained by MClassification. We also consider two bounds that simulate a static supervised learning classifier (Static) and a classifier that is constantly updated without delay to acquire the actual labels (Sliding).
Regarding the parameters for all synthetic datasets, the initial labeled set (init_points) was defined as the first 150 examples from the data stream. Concerning our proposal, the offline phase parameters m_off and ω had their values defined as 2. In addition, the online phase parameters m_on, θ, and max_mic_class were defined as 2, 0.9, and ω ∗ d, respectively, where d corresponds to the number of attributes of the evaluated dataset.
For the Keystroke dataset, which describes 8 sessions of 4 users typing the password ".tie5Roanl" plus the Enter key 400 times per session, we consider the examples from the first session as the initial labeled set for all methods. The FuzzMiC offline phase parameters m_off and ω were defined as 2 and 4, respectively, and the online phase parameters m_on, θ, and max_mic_class were defined as 2, 0.39, and ω ∗ d, respectively.
For the NOAA dataset, the initial labeled set was defined as the first 30 examples for the Static and Sliding methods, as described in [10]. The MClassification and FuzzMiC methods had their initial labeled set defined as the first 10 examples from the data stream. Regarding the FuzzMiC parameters, the offline parameters m_off and ω were defined as 2 and 0.25, respectively, and the online phase parameters m_on, θ, and max_mic_class were defined as 2, 0.85, and 4 ∗ d, respectively.
These parameter values were chosen for the offline and online phases because they led to the best results in preliminary experiments. Concerning MClassification, the parameter r was defined with its default value (0.1).
5 Analysis of Results
A first assessment of the average accuracy (Table 1) shows that FuzzMiC performs comparably to or better than MClassification in most of the cases. However, this average may not represent the performance of each algorithm over time. For a more thorough evaluation, we present some examples in detail (Fig. 1).
Table 1. Average accuracies over time on benchmark data
Dataset Static Sliding MClassification FuzzMiC
1CDT 97.01 99.88 99.89 99.88
1CHT 91.96 99.24 99.38 99.31
1CSurr 65.75 98.52 85.15 79.50
2CDT 54.38 93.47 95.23 95.95
2CHT 54.03 85.44 87.93 89.07
4CE1CF 95.81 97.15 94.38 92.28
4CRE-V1 26.17 97.64 90.63 98.22
4CRE-V2 27.11 89.37 91.59 92.02
5CVT 40.72 86.86 88.40 90.37
GEARS 2C 2D 93.62 99.86 94.73 95.20
UG 2C 2D 47.28 94.27 95.28 94.98
UG 2C 3D 60.64 92.86 94.72 94.88
UG 2C 5D 68.81 89.91 91.25 91.86
NOAA 66.19 72.01 67.54 68.63
KEYSTROKE 68.69 90.14 90.62 90.25
In Fig. 1a, we show the results considering 100 evaluation moments for the 4CRE-V1 dataset. We can note that all methods present 5 major accuracy drops, which are related to moments of class overlapping (Fig. 2a and Fig. 2b). However, the proposed method had the smallest drops in most moments, achieving the best results for this dataset. This can be explained by the fact that FuzzMiC does not create micro-clusters near the class boundaries (see Fig. 2b), which is not true for MClassification. Therefore, our proposal decreases misclassification when moments of partial class overlapping occur.
In GEARS 2C 2D, the decision boundary of each class has the shape of a star that rotates around a fixed center over time (Fig. 2c and Fig. 2d). The outcomes are shown considering 100 evaluation moments (Fig. 1b). Concerning FuzzMiC, the results show that our proposal was able to generate more stable results when compared to MClassification. Besides, FuzzMiC had a behavior similar to the approach with no latency (Sliding), while MClassification behaved similarly to the Static approach. This indicates a certain invariance of FuzzMiC to the changes in this dataset, due to the generation of new micro-clusters near the stationary center of the classes (see Fig. 2c). On the other hand, MClassification may generate new micro-clusters on the class boundaries, where the changes occur, causing some instability in its performance.
[Fig. 1 panels: accuracy curves of FuzzMiC, MClassification, Static, and Sliding over the evaluation moments for (a) synthetic data 4CRE-V1 (window size 1247), (b) synthetic data GEARS 2C 2D (window size 1997), (c) real data Keystroke (window size 207), and (d) real data NOAA (window size 363).]
Fig. 1. Accuracy achieved over time by the methods in 4 evaluated datasets
[Fig. 2 panels: scatter plots of Feature 1 vs. Feature 2 for 4CRE-V1 at evaluation moments 14 (a) and 15 (b), and for GEARS 2C 2D at evaluation moments 1 (c) and 2 (d).]
Fig. 2. Snapshots of 4CRE-V1 (left) and GEARS 2C 2D (right). Each class is described
as a different color and the micro-clusters are represented by the X shaped marks
Concerning the real data Keystroke, the evaluation was carried out considering 7 evaluation moments, each related to one session of data collection (Fig. 1c). In this dataset, FuzzMiC was able to generate better results in the first 2 sessions when compared to the remaining approaches. However, our approach presents a minor decrease in accuracy during sessions 3 and 4, which increases the chances of creating micro-clusters for noisy data. Nonetheless, during sessions 5, 6, and 7, our proposal was able to recover and even achieve slightly better results in sessions 5 and 7 when compared to MClassification. Altogether, except for the Static, the remaining methods presented a similar behavior, as seen in Table 1, which can be observed as an advantage for FuzzMiC and MClassification.
The evaluation on the real data NOAA was carried out considering 50 time moments, each related to one year of weather measurements (Fig. 1d). FuzzMiC generated better results than MClassification in the initial moments, while achieving similar results in the remaining ones. Despite the low accuracy obtained by FuzzMiC and MClassification over time, the Sliding approach also achieved similar results, which indicates the high complexity of this dataset. Thus, the results obtained by the proposed method can be seen as positive, since they provide a slight increase in accuracy when compared to MClassification.
In general, we can see that FuzzMiC presents similar or superior results compared to MClassification. It is better at handling partially overlapping classes and presented a certain level of invariance to some concept changes, because the SFMiC structure makes a flexible learning process possible.
6 Final Considerations
This work presents a fuzzy classifier for data streams under extreme latency named FuzzMiC. Experiments show that FuzzMiC obtains promising results on the evaluated datasets. Notably, the flexibility added by the integration of fuzzy micro-clusters enables the proposed method to better deal with changes in the data stream compared to the crisp approach MClassification, especially in the presence of partial class overlapping.
The experiments in this work were carried out with the purpose of validating our proposal and highlighting its advantages with respect to the algorithm that motivated its creation. Thus, there is still room for further investigation, such as the comparison with other data stream classifiers for extreme latency scenarios and tests with different real-world datasets with incremental drifts. Another line of further research should contemplate the automatic adaptation of the θ threshold.
References
1. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms.
Springer US (1981)
2. Dyer, K.B., Capo, R., Polikar, R.: Compose: A semisupervised learning framework
for initially labeled nonstationary streaming data. TNNLS 25(1), 12–26 (2014)
3. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. CSUR 46(4), 44 (2014)
4. Hashemi, S., Yang, Y.: Flexible decision tree for data stream classification in the
presence of concept change, noise and missing values. Data Mining and Knowledge
Discovery 19(1), 95–131 (2009)
5. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In:
ACM SIGKDD. pp. 97–106. ACM (2001)
6. Krempl, G.: The algorithm apt to classify in concurrence of latency and drift. In:
IDA. pp. 222–233 (2011)
7. Lopes, P.A., Camargo, H.A.: FuzzStream: Fuzzy data stream clustering based on the online-offline framework. In: FUZZ-IEEE (2017)
8. Marrs, G.R., Hickey, R.J., Black, M.M.: The impact of latency on online classification learning with concept drift. In: KSEM. pp. 459–469 (2010)
9. Silva, T.P., Urban, G.A., Lopes, P.A., Camargo, H.A.: A fuzzy variant for on-demand data stream classification. In: BRACIS. pp. 67–72 (2017)
10. Souza, V.M.A., Silva, D.F., Batista, G.E.A.P.A., Gama, J.: Classification of evolving data streams with infinitely delayed labels. In: ICMLA. pp. 214–219 (2015)
11. Souza, V.M.A., Silva, D.F., Gama, J., Batista, G.E.A.P.A.: Data stream classification guided by clustering on nonstationary environments and extreme verification
latency. In: SIAM SDM. pp. 873–881 (2015)