Article

Online Methods of Learning in Occurrence of Concept Drift

... This problem can reduce the accuracy of the model because the data may become obsolete quickly over time. Three types of methods have been successfully applied to the concept drift problem: (i) a unified framework for detecting drift, an ensemble algorithm that detects drift in an incremental learning manner [11]; (ii) feature extraction for explicit concept drift detection [12], using time-series features to monitor how concepts evolve over time; and (iii) monitoring the change in the error distribution during learning [13]. According to [13], detecting changes in the error distribution with the Drift Detection Method (DDM) is the most efficient approach to concept drift detection, but it is only applicable to classification tasks. In this work, we employ DDM to detect concept drift during learning for a regression task. ...
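The DDM-for-regression idea in the snippet above can be made concrete with a small sketch. This is not the cited paper's implementation: the class name, the error tolerance `eps`, and the 30-sample warm-up are illustrative assumptions; only the DDM-style statistics (running error rate p, standard deviation s, and the p_min + 2·s_min / p_min + 3·s_min warning and drift levels) follow the standard method.

```python
import math

class RegressionDDM:
    """Illustrative sketch: DDM applied to regression by binarizing
    prediction errors against a tolerance `eps` (both the class name
    and `eps` are assumptions, not taken from the cited paper)."""

    def __init__(self, eps=1.0, min_n=30):
        self.eps = eps          # residuals above eps count as "mistakes"
        self.min_n = min_n      # warm-up before any signal is raised
        self.n = 0
        self.p = 0.0            # running error rate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, y_true, y_pred):
        error = 1 if abs(y_true - y_pred) > self.eps else 0
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_n:
            return "in-control"
        if self.p + s < self.p_min + self.s_min:     # record the best level seen
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + 3 * self.s_min:
            return "drift"
        if self.p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "in-control"
```

Thresholding the absolute residual reduces the regression stream to the Bernoulli error stream that DDM expects, which is the essential trick the snippet alludes to.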
Conference Paper
Full-text available
This paper proposes a novel algorithm called Meta-cognitive Recurrent Kernel Online Sequential Extreme Learning Machine with a kernel filter and a modified Drift Detector Mechanism (Meta-RKOS-ELM\(_\mathrm{ALD}\)-DDM). The algorithm aims to tackle a well-known concept drift problem in time series prediction by utilising the modified concept drift detector mechanism. Moreover, the new meta-cognitive learning strategy is employed to solve parameter dependency and reduce learning time. The experimental results show that the proposed method can achieve better performance than the conventional algorithm in a set of financial datasets.
... The information captured by online behavior can thus change over time and lead the model's performance to drop [29]. Although a number of control mechanisms can be put into place (e.g., online learning [29,30]), understanding which behavioral features have a (large) impact on a model's classifications through explanations can help domain experts make sound statements on the expected lifetime of a model and its sensitivity to rapidly changing technological indicators and digital behavior. For example, the type of mobile phone applications that people use might change more rapidly compared to the genres of movies people watch or the type of places they visit on the weekend, which reflect more 'stable' behavior. ...
Article
Full-text available
Every step we take in the digital world leaves behind a record of our behavior; a digital footprint. Research has suggested that algorithms can translate these digital footprints into accurate estimates of psychological characteristics, including personality traits, mental health or intelligence. The mechanisms by which AI generates these insights, however, often remain opaque. In this paper, we show how Explainable AI (XAI) can help domain experts and data subjects validate, question, and improve models that classify psychological traits from digital footprints. We elaborate on two popular XAI methods (rule extraction and counterfactual explanations) in the context of Big Five personality predictions (traits and facets) from financial transactions data (N = 6,408). First, we demonstrate how global rule extraction sheds light on the spending patterns identified by the model as most predictive for personality, and discuss how these rules can be used to explain, validate, and improve the model. Second, we implement local rule extraction to show that individuals are assigned to personality classes because of their unique financial behavior, and there exists a positive link between the model’s prediction confidence and the number of features that contributed to the prediction. Our experiments highlight the importance of both global and local XAI methods. By better understanding how predictive models work in general as well as how they derive an outcome for a particular person, XAI promotes accountability in a world in which AI impacts the lives of billions of people around the world.
... Stationary data generators generate data continuously with respect to time with a uniform data distribution. Data that are generated continuously over time are termed a data stream [2][3][4][5]. ...
... Mittal and Kashyap surveyed various online methods of drift detection in their paper, presenting experimental results and a comparison of online drift detection methods [3]. Bifet et al. proposed a new experimental framework for evaluating change detection methods against intended outcomes. ...
... These methods fulfill the one-pass requirement of learning from a data stream without storing the data. The online approaches can be broadly divided into two categories: (i) online learning approaches that use an explicit mechanism to deal with concept drift [7], [9], [10] and (ii) online learning approaches that do not use any explicit mechanism to deal with concept drift [11][12][13][14][15] [32]. The most popular of the former are the Early Drift Detection Method (EDDM) [7] and the Drift Detection Method (DDM) [10]. ...
Article
In the real world, most applications are inherently dynamic in nature, i.e. their underlying data distribution changes with time. As a result, concept drifts occur very frequently in data streams. Concept drift in a data stream not only increases the challenge of learning, it also significantly decreases the accuracy of the classifier. Recently, however, many algorithms have been proposed that are designed specifically for data stream mining under drifting concepts. This paper presents an empirical evaluation of these algorithms on datasets exhibiting the four possible types of concept drift, namely sudden, gradual, incremental, and recurring drifts.
... So, there will be a concept drift between data related to the old process and the newer process. In [33], four types of concept drift are introduced: sudden, gradual, recurring and incremental drifts. In a sudden drift, the whole process changes at once from start to end (e.g., the entire process changes due to a management decision). ...
Article
One of the most valuable assets of an organization is its organizational data. The analysis and mining of this potential hidden treasure can lead to much added-value for the organization. Process mining is an emerging area that can be useful in helping organizations understand the status quo, check for compliance and plan for improving their processes. The aim of process mining is to extract knowledge from event logs of today's organizational information systems. Process mining includes three main types: discovering process models from event logs, conformance checking and organizational mining. In this paper, we briefly introduce process mining and review some of its most important techniques. Also, we investigate some of the applications of process mining in industry and present some of the most important challenges that are faced in this area.
Chapter
Concept drift is the scenario in online learning in which the value of the target variable changes with respect to time. Learning algorithms should be adaptive in nature in order to accommodate the changes imposed by a change in concept. This paper discusses the adaptive algorithms used for learning from evolving data with different changing patterns. It also discusses the various applications subject to concept drift, which are major sources of digital data streams, and other real-world problems that exhibit concept drift.
Article
Full-text available
In the real world, concepts are often not stable but change with time. Typical examples of this are weather prediction rules and customers' preferences. The underlying data distribution may change as well. Often these changes make the model built on old data inconsistent with the new data, and regular updating of the model is necessary. This problem, known as concept drift, complicates the task of learning a model from data and requires special approaches, different from commonly used techniques which treat arriving instances as equally important contributors to the final concept. This paper considers different types of concept drift and the peculiarities of the problem, and gives a critical review of existing approaches to the problem.
Conference Paper
Full-text available
Most of the work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detection of changes in the probability distribution of examples. The idea behind the drift detection method is to control the online error rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the current model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method controls the trace of the online error of the algorithm. For the current context we define a warning level and a drift level. A new context is declared if, in a sequence of examples, the error increases, reaching the warning level at example k_w and the drift level at example k_d. This is an indication of a change in the distribution of the examples. The algorithm then learns a new model using only the examples since k_w. The method was tested with a set of eight artificial datasets and a real-world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show good performance in detecting drift and in learning the new concept. We also observe that the method is independent of the learning algorithm.
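The two-level warning/drift scheme described above can be sketched as a monitor over a 0/1 misclassification stream. The `min_n` warm-up constant is an assumption (implementations commonly wait for a minimum number of examples before testing); the statistics and thresholds follow the abstract.

```python
import math

def ddm_monitor(error_stream, min_n=30):
    """Scan a sequence of 0/1 classification errors. Track the online
    error rate p and its std s; flag a warning once p+s exceeds
    p_min + 2*s_min and a drift once it exceeds p_min + 3*s_min.
    Returns (k_w, k_d), the warning and drift indices, or None."""
    n = 0
    p = 0.0
    p_min = s_min = float("inf")
    k_w = None
    for i, err in enumerate(error_stream):
        n += 1
        p += (err - p) / n
        s = math.sqrt(p * (1 - p) / n)
        if n < min_n:
            continue
        if p + s < p_min + s_min:           # best operating point so far
            p_min, s_min = p, s
        if p + s > p_min + 3 * s_min:       # drift level reached at i = k_d
            return (k_w if k_w is not None else i, i)
        if p + s > p_min + 2 * s_min:       # warning level: remember k_w
            if k_w is None:
                k_w = i
        else:
            k_w = None                      # back in control, reset warning
    return None
```

On a drift signal, the examples from index k_w onward form the buffer from which the new model is learned, matching the retraining rule in the abstract.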
Conference Paper
Full-text available
We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window itself. This frees the user or programmer from having to guess a time-scale for change. Contrary to many related works, we provide rigorous guarantees of performance, as bounds on the rates of false positives and false negatives. Using ideas from data stream algorithmics, we develop a time- and memory-efficient version of this algorithm, called ADWIN2. We show how to combine ADWIN2 with the Naïve Bayes (NB) predictor in two ways: one, using it to monitor the error rate of the current model and declare when revision is necessary, and two, putting it inside the NB predictor to maintain up-to-date estimations of conditional probabilities in the data. We test our approach using synthetic and real data streams and compare them to both fixed-size and variable-size window strategies with good results.
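The window-cutting rule can be illustrated with a deliberately naive sketch (quadratic per step; the actual ADWIN2 uses exponential-histogram buckets to stay time- and memory-efficient). The `delta` parameter and the exact form of the cut threshold follow the paper's Hoeffding-style bound only loosely and should be read as assumptions.

```python
import math

def adwin_step(window, x, delta=0.01):
    """Append x to the window, then repeatedly drop the oldest
    sub-window whenever some split W = W0 | W1 has sub-window means
    differing by more than the cut threshold eps_cut."""
    window.append(x)
    changed = True
    while changed and len(window) > 1:
        changed = False
        n = len(window)
        for split in range(1, n):
            w0, w1 = window[:split], window[split:]
            m = 1.0 / (1.0 / len(w0) + 1.0 / len(w1))   # harmonic-mean size
            eps_cut = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
            if abs(sum(w0) / len(w0) - sum(w1) / len(w1)) > eps_cut:
                del window[:split]   # forget the stale head of the window
                changed = True
                break
    return window
```

Feeding a long run of one value followed by a shifted value makes the window shed the stale prefix automatically, which is exactly the "no fixed time-scale" behavior the abstract claims.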
Article
Full-text available
This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees. In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm, ADWIN, for detecting change and keeping updated statistics from a data stream, and use it as a black box in place of counters or accumulators in algorithms not initially designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods such as Naïve Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks. Trees are connected acyclic graphs, and they are studied as link-based structures in many cases. In the second part of this thesis, we describe a rather formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees.
We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, and we have found there an interesting phenomenon: rules whose propositional counterpart is nontrivial are, however, always implicitly true in trees due to the peculiar combinatorics of the structures. Finally, using these results on evolving data stream mining and closed frequent tree mining, we present high-performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we then develop an incremental one, a sliding-window based one, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.
Article
Full-text available
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license.
Article
A geometrical moving average gives the most recent observation the greatest weight, and all previous observations weights decreasing in geometric progression from the most recent back to the first. A graphical procedure for generating geometric moving averages is described in which the most recent observation is assigned a weight r. The properties of control chart tests based on geometric moving averages are compared to tests based on ordinary moving averages.
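The recursion behind such a chart is compact. This sketch uses an illustrative weight r and simply seeds the average with the first observation; both choices are assumptions, not prescriptions from the article.

```python
def geometric_moving_average(xs, r=0.25):
    """Compute z_t = r * x_t + (1 - r) * z_{t-1}: the newest point gets
    weight r, and older points get weights r(1-r), r(1-r)^2, ...
    decreasing in geometric progression, as described above."""
    z = xs[0]          # seed with the first observation
    out = [z]
    for x in xs[1:]:
        z = r * x + (1 - r) * z
        out.append(z)
    return out
```

A control chart test then compares each z_t against limits derived from the in-control mean and variance, analogous to the ordinary moving-average charts the article compares against.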
Article
Classifying streaming data requires the development of methods which are computationally efficient and able to cope with changes in the underlying distribution of the stream, a phenomenon known in the literature as concept drift. We propose a new method for detecting concept drift which uses an Exponentially Weighted Moving Average (EWMA) chart to monitor the misclassification rate of a streaming classifier. Our approach is modular and can hence be run in parallel with any underlying classifier to provide an additional layer of concept drift detection. Moreover, our method is computationally efficient with O(1) overhead and works in a fully online manner with no need to store data points in memory. Unlike many existing approaches to concept drift detection, our method allows the rate of false positive detections to be controlled and kept constant over time.
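A minimal sketch of such a monitor, assuming a 0/1 misclassification stream: the smoothing weight `lam` and the fixed control-limit multiple `L` are illustrative simplifications (the paper instead tunes the limit so the false-positive rate stays constant over time).

```python
import math

class EwmaChart:
    """Toy EWMA chart over a stream of 0/1 classification errors.
    Signals drift when the EWMA statistic exceeds its control limit."""

    def __init__(self, lam=0.2, L=3.0):
        self.lam, self.L = lam, L
        self.n = 0
        self.p = 0.0   # running mean of the error stream
        self.z = 0.0   # EWMA of the error stream

    def update(self, error):
        """Feed one 0/1 misclassification indicator; True signals drift."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.z = self.lam * error + (1 - self.lam) * self.z
        # asymptotic EWMA variance for a stable Bernoulli(p) stream
        var = self.p * (1 - self.p) * self.lam / (2 - self.lam)
        return self.z > self.p + self.L * math.sqrt(var)
```

Because each update touches only a few scalars, the O(1) overhead and memoryless operation claimed in the abstract carry over directly to this sketch.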
Conference Paper
We consider online learning where the target concept can change over time. Previous work on expert prediction algorithms has bounded the worst-case performance on any subsequence of the training data relative to the performance of the best expert. However, because these "experts" may be difficult to implement, we take a more general approach and bound performance relative to the actual performance of any online learner on this single subsequence. We present the additive expert ensemble algorithm AddExp, a new, general method for using any online learner for drifting concepts. We adapt techniques for analyzing expert prediction algorithms to prove mistake and loss bounds for a discrete and a continuous version of AddExp. Finally, we present pruning methods and empirical results for data sets with concept drift.
Conference Paper
Algorithms for tracking concept drift are important for many applications. We present a general method based on the weighted majority algorithm for using any online learner for concept drift. Dynamic weighted majority (DWM) maintains an ensemble of base learners, predicts using a weighted-majority vote of these "experts", and dynamically creates and deletes experts in response to changes in performance. We empirically evaluated two experimental systems based on the method, using incremental naive Bayes and the incremental tree inducer (ITI) as experts. For the sake of comparison, we also included Blum's implementation of weighted majority. On the STAGGER concepts and on the SEA concepts, results suggest that the ensemble method learns drifting concepts almost as well as the base algorithms learn each concept individually. Indeed, we report the best overall results for these problems to date.
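The create/delete dynamics can be sketched in a toy version, assuming binary labels and a user-supplied expert factory. The `predict`/`learn` interface and the `beta`/`theta` defaults are illustrative assumptions, not the paper's naive Bayes or ITI base learners.

```python
class DWM:
    """Toy sketch of Dynamic Weighted Majority for binary labels."""

    def __init__(self, make_expert, beta=0.5, theta=0.01):
        self.make_expert = make_expert
        self.beta, self.theta = beta, theta
        self.experts = [make_expert()]
        self.weights = [1.0]

    def step(self, x, y):
        # weighted-majority vote of the current experts
        preds = [e.predict(x) for e in self.experts]
        score = sum(w if p == 1 else -w for w, p in zip(self.weights, preds))
        y_hat = 1 if score >= 0 else 0
        # demote every expert that erred on this example
        for i, p in enumerate(preds):
            if p != y:
                self.weights[i] *= self.beta
        # delete experts whose weight fell below theta
        keep = [i for i, w in enumerate(self.weights) if w > self.theta]
        self.experts = [self.experts[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]
        # if the ensemble itself was wrong, add a fresh expert
        if y_hat != y or not self.experts:
            self.experts.append(self.make_expert())
            self.weights.append(1.0)
        # finally, train every surviving expert on the example
        for e in self.experts:
            e.learn(x, y)
        return y_hat


class LastLabel:
    """Trivial demo expert (hypothetical): predicts the last label seen."""
    def __init__(self):
        self.y = 0
    def predict(self, x):
        return self.y
    def learn(self, x, y):
        self.y = y
```

After a drift, newly created experts start at full weight while stale ones decay and are pruned, which is how the ensemble tracks the current concept.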
Article
We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case that the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool A that makes at most m mistakes then the Weighted Majority Algorithm will make at most c(log |A| + m) mistakes.
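The weighted-voting core can be sketched in a few lines. The demotion factor beta = 0.5 matches the classic halving variant, and for simplicity the expert predictions are given up front; both are presentational choices, not requirements of the algorithm.

```python
def weighted_majority(predictions, labels, beta=0.5):
    """Sketch of the Weighted Majority Algorithm. `predictions[i][t]` is
    expert i's 0/1 prediction at trial t. The compound prediction is the
    weighted vote; experts that err have their weight multiplied by beta.
    Returns the number of mistakes the compound algorithm makes."""
    weights = [1.0] * len(predictions)
    mistakes = 0
    for t, y in enumerate(labels):
        vote1 = sum(w for w, p in zip(weights, predictions) if p[t] == 1)
        vote0 = sum(w for w, p in zip(weights, predictions) if p[t] == 0)
        y_hat = 1 if vote1 >= vote0 else 0
        if y_hat != y:
            mistakes += 1
        # demote every expert that predicted incorrectly on this trial
        weights = [w * beta if p[t] != y else w
                   for w, p in zip(weights, predictions)]
    return mistakes
```

Because a wrong expert's weight is halved on every mistake, the pool's bad members are silenced geometrically fast, which is the intuition behind the c(log |A| + m) mistake bound quoted above.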
Article
Concept drift occurs when a target concept changes over time. I present a new method for learning shifting target concepts during concept drift. The method, called Concept Drift Committee (CDC), uses a weighted committee of hypotheses that votes on the current classification. When a committee member's voting record drops below a minimal threshold, the member is forced to retire. A new committee member then takes the open place on the committee. The algorithm is compared to a leading algorithm on a number of concept drift problems. The results show that using a committee to track drift has several advantages over more customary window-based approaches.
DATA STREAM MINING: A Practical Approach
  • Albert Bifet
  • Richard Kirkby
Bifet, A. and Kirkby, R. (2009). Data Stream Mining: A Practical Approach, August 2009.
Learning with drift detection
  • J Gama
  • P Medas
  • G Castillo
  • P Rodrigues
Gama, J., Medas, P., Castillo, G. and Rodrigues, P. (2004). Learning with drift detection, Proceedings of the Seventh Brazilian Symposium on Artificial Intelligence (SBIA'04), Lecture Notes in Computer Science, Vol. 3171, Springer, São Luiz do Maranhão, Brazil, pp. 286-295.