Chapter

Employing dropout regularization to classify recurring drifted data streams

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Streaming data analysis is a rapidly growing research direction. One of the serious problems hindering data stream classification is that the probabilistic characteristics of the classification task may change while the model is in use. This phenomenon is called concept drift. To date, multiple methods have been proposed to overcome its negative influence on model performance during learning in dynamic environments. This work introduces a new streaming data classifier based on a dropout technique that can significantly reduce model restoration time and performance loss and can improve its overall score in the presence of recurring concept drifts. The usefulness of the proposed algorithm is evaluated in an extensive experimental study and backed up with thorough statistical analysis.

... However, the applications of Dropout are still limited to a static data environment. In parallel to our work, Guzy and Woźniak (2020) showed that dropout helps deal with recurring concept drift because dropout leads to using submodels generated for each concept. However, this work only focused on deterministic neural networks and lacked an adaptive mechanism. ...
... Next, we examine the methods' behaviors more thoroughly when dealing with concept drift. Following Guzy and Woźniak (2020) and Shaker and Hüllermeier (2015), we evaluate the five methods in terms of the lowest LPP achieved in a drift area, the median LPP in each concept, and the restoration time for each new concept. While the lowest LPP shows the performance of the methods when a new concept has just appeared, the median LPP illustrates the ability to learn this new concept from all data. ...
... where t_1 is the mini-batch at which the LPP drops below 95% of the median LPP of an old concept, t_2 is the mini-batch at which the LPP reaches 95% of the mean LPP of the next concept, and T is the total number of mini-batches. We again use the available source code from Guzy and Woźniak (2020) to compute the mentioned measures. Table 2 shows the performance of the five methods in terms of the average lowest LPP, median LPP, and restoration time over all concept occurrences. ...
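The excerpt above defines t_1, t_2 and T, but the formula that combines them is elided. The following is a minimal sketch of how such a restoration-time measure could be computed, assuming the normalized form (t_2 - t_1) / T and assuming non-negative, higher-is-better scores such as accuracy. The function name `restoration_time` and the argument `drift_at` are hypothetical, and this is not the source code referenced in the excerpt.

```python
from statistics import mean, median


def restoration_time(scores, drift_at, threshold=0.95):
    """Fraction of the stream spent recovering after a drift (assumed form).

    scores    -- per-mini-batch scores, higher is better, non-negative
    drift_at  -- index of the first mini-batch of the new concept
    threshold -- the 95% level mentioned in the excerpt
    """
    old, new = scores[:drift_at], scores[drift_at:]
    drop_level = threshold * median(old)      # 95% of the old concept's median
    recover_level = threshold * mean(new)     # 95% of the new concept's mean

    # t1: first mini-batch after the drift whose score falls below drop_level
    t1 = next((t for t in range(drift_at, len(scores))
               if scores[t] < drop_level), None)
    if t1 is None:
        return 0.0  # the score never dropped, so nothing had to be restored

    # t2: first mini-batch from t1 on whose score reaches recover_level
    t2 = next((t for t in range(t1, len(scores))
               if scores[t] >= recover_level), len(scores))

    # Assumption: restoration time is the gap normalized by stream length T
    return (t2 - t1) / len(scores)


example = [0.9] * 5 + [0.5, 0.6, 0.7, 0.9, 0.9]
print(restoration_time(example, drift_at=5))
```

In the example, the score drops at mini-batch 5 and recovers at mini-batch 7, so two of the ten mini-batches are spent on restoration.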
Article
The ability to analyze data streams, which arrive sequentially and possibly infinitely, is increasingly vital in various online applications. However, data streams pose various challenges, including sparse and noisy data as well as concept drifts, which easily mislead a learning method. This paper proposes a simple yet robust framework, called Adaptive Infinite Dropout (aiDropout), to effectively tackle these problems. Our framework uses a dropout technique in a recursive Bayesian approach in order to create a flexible mechanism for balancing old and new information. In detail, the recursive Bayesian approach imposes a constraint on the model parameters that acts as a regularization term between the current and previous mini-batches. Then, dropout whose drop rate is autonomously learned can adjust the constraint to new data. Thanks to the ability to reduce overfitting and the ensemble property of Dropout, our framework obtains better generalization and thus effectively handles the undesirable effects of noise and sparsity. In particular, theoretical analyses show that aiDropout imposes a data-dependent regularization and can therefore adapt quickly to sudden changes in data streams. Extensive experiments show that aiDropout significantly outperforms the state-of-the-art baselines on a variety of tasks such as supervised and unsupervised learning.
... In many scenarios it is possible that a previously seen concept from the k-th iteration may reappear over time, i.e. D_{j+1} = D_{j−k} (Sobolewski and Wozniak, 2017). One may store models specialized in previously seen concepts in order to speed up recovery rates after a known concept re-emerges (Guzy and Wozniak, 2020). ...
Article
Continuous learning from streaming data is among the most challenging topics in contemporary machine learning. In this domain, learning algorithms must not only be able to handle massive volumes of rapidly arriving data, but also adapt themselves to potential emerging changes. The evolving nature of data streams is known as concept drift. While there is a plethora of methods designed for detecting its occurrence, all of them assume that the drift is connected with underlying changes in the source of data. However, one must consider the possibility of a malicious injection of false data that simulates a concept drift. This adversarial setting assumes a poisoning attack that may be conducted in order to damage the underlying classification system by forcing an adaptation to false data. Existing drift detectors are not capable of differentiating between real and adversarial concept drift. In this paper, we propose a framework for robust concept drift detection in the presence of adversarial and poisoning attacks. We introduce a taxonomy for two types of adversarial concept drifts, as well as a robust trainable drift detector. It is based on the augmented restricted Boltzmann machine with improved gradient computation and energy function. We also introduce Relative Loss of Robustness, a novel measure for evaluating the performance of concept drift detectors under poisoning attacks. Extensive computational experiments, conducted on both fully and sparsely labeled data streams, prove the high robustness and efficacy of the proposed drift detection framework in adversarial scenarios.
... In many scenarios it is possible that a previously seen concept from the k-th iteration may reappear over time, i.e. D_{j+1} = D_{j−k} [20]. One may store models specialized in previously seen concepts in order to speed up recovery rates after a known concept re-emerges [21]. ...
Article
Continual learning from streaming data sources is becoming more and more popular due to the increasing number of online tools and systems. Dealing with dynamic and everlasting problems poses new challenges for which traditional batch-based offline algorithms turn out to be insufficient in terms of computational time and predictive performance. One of the most crucial limitations is that we cannot assume having access to a finite and complete data set – we always have to be ready for new data that may complement our model. This poses a critical problem of providing labels for potentially unbounded streams. In the real world, we are forced to deal with very strict budget limitations; therefore, we will most likely face a scarcity of annotated instances, which are essential in supervised learning. In our work, we emphasize this problem and propose a novel instance exploitation technique. We show that when: (i) data is characterized by temporary non-stationary concepts, and (ii) there are very few labels spread across a long time horizon, it is actually better to risk overfitting and adapt models more aggressively by exploiting the only labeled instances we have, instead of sticking to a standard learning mode and suffering from severe underfitting. We present different strategies and configurations for our methods, as well as an ensemble algorithm that attempts to maintain a sweet spot between risky and normal adaptation. Finally, we conduct a complex in-depth comparative analysis of our methods, using state-of-the-art streaming algorithms relevant for the given problem.
... Although DTW is a well-suited distance measure for time series classification, state-of-the-art solutions are now based on deep learning techniques, see e.g. [8], [18], [20], [21]. In particular, recent convolutional neural networks (CNNs) perform well on time series classification tasks and in many cases outperform the previous baseline of kNN-DTW; see [7] for a review on CNNs for time series classification. ...
Chapter
Due to its prominent applications, time series classification is one of the most important fields of machine learning. Although there are various approaches for time series classification, dynamic time warping (DTW) is generally considered to be a well-suited distance measure for time series. Therefore, in the early 2000s, techniques based on DTW dominated this field. On the other hand, deep learning techniques, especially convolutional neural networks (CNN) were shown to be able to solve time series classification tasks accurately. Although CNNs are extraordinarily popular, the scalar product in convolution only allows for rigid pattern matching. In this paper, we aim at combining the advantages of DTW and CNN by proposing the dynamic convolution operation and dynamic convolutional neural networks (DCNNs). The main idea behind dynamic convolution is to replace the dot product in convolution by DTW. We perform experiments on 10 publicly available real-world time-series datasets and demonstrate that our proposal leads to statistically significant improvement in terms of classification accuracy in various applications. In order to promote the use of DCNN, we made our implementation publicly available at https://github.com/kr7/DCNN.
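For readers unfamiliar with the distance that the abstract above builds on, here is a minimal sketch of classical dynamic time warping between two numeric sequences, using the absolute difference as the local cost. It illustrates plain DTW only; it is not the dynamic convolution operation of the DCNN paper, and the function name `dtw_distance` is chosen for illustration.

```python
def dtw_distance(a, b):
    """Classical DTW distance between two sequences of numbers."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the best of: insertion, deletion, or match
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# A repeated sample is absorbed by warping, so these two series match exactly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Unlike a dot product, which compares positions one to one, the warping path lets one element align with several elements of the other sequence, which is the flexibility the abstract contrasts with rigid pattern matching.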
... Such issues are particularly prominent in applications deployed over a non-stationary stream where data is continuously provided to the system. Many studies [5][6][7][8][9] in the past decade have focused on concept drift challenges when performing classification over a data stream. However, most of the proposed solutions are for non-imaging data streams (which possess low-dimensional data). ...
Article
In the modern era of digitization, analysis in the Internet of Things (IoT) environment demands a brisk amalgamation of domains such as high-dimensional (image) data sensing technologies, robust internet connections (4G or 5G) and dynamic (adaptive) deep learning approaches. This is required for a broad range of indispensable intelligent applications, like intelligent healthcare systems. Dynamic image classification is one of the major areas of concern for researchers, which may take place during analysis under the IoT environment. Dynamic image classification is associated with several temporal data perturbations (such as the novel class arrival and class evolution issues) which cause a massive classification deterioration in the deployed classification models and make them ineffective. Therefore, this study addresses such temporal inconsistencies (novel class arrival and class evolution issues) and proposes an adapted deep learning framework (ameliorated adaptive convolutional neural network (CNN) ensemble framework), which handles the novel class arrival and class evolution issues during dynamic image classification. The proposed framework is an improved version of the previous adaptive CNN ensemble with additional online training (OT) and online classifier update (OCU) modules. The OT module is a clustering-based approach which uses the Euclidean distance and the silhouette method to determine the potential new classes, whereas the OCU updates the weights of the existing instances of the ensemble with newly arrived samples. The proposed framework showed the desirable classification improvement under non-stationary scenarios for the benchmark (CIFAR10) and real (ISIC 2019: Skin disease) data streams. Also, the proposed framework outperformed state-of-the-art shallow learning and deep learning models. The results have shown the effectiveness and proven the diversity of the proposed framework to adapt to new concept changes during dynamic image classification.
In future work, the authors of this study aim to develop an IoT-enabled adaptive intelligent dermoscopy device (for dermatologists). Therefore, further improvements in classification accuracy (for the real dataset) are the future concern of this study.
Article
Due to sensor failures and other issues, real-world time series may contain missing values, often in consecutive segments. Classification of such time series is an important task with prominent applications in various domains such as medicine, manufacturing, social networks and environmental sciences. In this paper, we consider various approaches that have been designed for this task, in particular, fully-convolutional neural networks (FCNs) with sparsity-invariant convolution and dynamic time warping convolution. We compare their performance to that of a standard transformer, TARNet, which has not been tailored to the classification of time series with missing values. Our results indicate that even this simple transformer may outperform the aforementioned models that were designed to deal with missing values. As this observation is consistent for many datasets from various domains and various distributions of missing values, we conclude that transformers are an exceptionally strong baseline for the classification of time series with missing values. In order to support the reproduction of our results as well as follow-up works, we performed the aforementioned experiments on publicly available time series datasets using a publicly available implementation of TARNet.
Article
Topic models have become ubiquitous tools for analyzing streaming data. However, existing streaming topic models suffer from several limitations when applied to real-world data streams. This includes the inability to accommodate evolving vocabularies and control topic quality throughout the streaming process. In this paper, we propose a novel streaming topic modeling approach that dynamically adapts to the changing nature of data streams. Our method leverages Byte-Pair Encoding embedding (BPEmb) to resolve the out-of-vocabulary problem that arises with new words in the stream. Additionally, we introduce a topic change variable that provides fine-grained control over topics’ parameter updates and present a preservation approach to retain high-coherence topics at each time step, helping preserve semantic quality. To further enhance model adaptability, our method allows dynamical adjustment of topic space size as needed. To the best of our knowledge, we are the first to address the expansion of vocabulary and maintain topic quality during the streaming process. Extensive experiments show the superior effectiveness of our method.
Conference Paper
Rapid technological advances are inherently linked to the increased amount of data, a substantial portion of which can be interpreted as data stream, capable of exhibiting the phenomenon of concept drift and having a high imbalance ratio. Consequently, developing new approaches to classifying difficult data streams is a rapidly growing research area. At the same time, the proliferation of deep learning and transfer learning, as well as the success of convolutional neural networks in computer vision tasks, have contributed to the emergence of a new research trend, namely Multi-Dimensional Encoding (MDE), focusing on transforming tabular data into a homogeneous form of a discrete digital signal. This paper proposes Streaming Super Tabular Machine Learning (SSTML), thereby exploring for the first time the potential of MDE in the difficult data stream classification task. SSTML encodes consecutive data chunks into an image representation using the STML algorithm and then performs a single ResNet-18 training epoch. Experiments conducted on synthetic and real data streams have demonstrated the ability of SSTML to achieve classification quality statistically significantly superior to state-of-the-art algorithms while maintaining comparable processing time.
Chapter
Virtual reality (VR) is gaining popularity very fast due to newer solutions that increase user perception. Glasses, sensors, and treadmills are the basic functionality for immersing yourself in a virtual environment. In this paper, we propose a human-AI collaboration for analyzing the newly generated images that can be used for creating worlds. The presented method is based on analyzing different scenes (from simulation and real environment) using generative adversarial networks (GAN) and the communication with the user for assessments of the created new environment. User’s information contributes to the analysis of sample quality and possible rebuilding or retraining of the GAN model. The proposal increases the perception of VR by taking the user’s feelings in creating new environments. For this purpose, we combine GAN with fuzzy soft sets inference to gain the possibility of retraining/remodeling the used neural network. It was examined in theoretical simulation and real-environment case study.
Chapter
The main motivation for the presented research was to investigate the behavior of different convolutional neural network architectures in the analysis of non-stationary data streams. Learning a model on continuously incoming data is different from learning where a complete learning set is immediately available. However, streaming data is definitely closer to reality, as nowadays, most data needs to be analyzed as soon as it arrives (e.g., in the case of anti-fraud systems, cybersecurity, and analysis of images from on-board cameras and other sensors). Besides the vital aspect related to the limitations of computational and memory resources that the proposed algorithms must consider, one of the critical difficulties is the possibility of concept drift. This phenomenon means that the probabilistic characteristics of the considered task change, and this, in consequence, may lead to a significant decrease in classification accuracy. This paper pays special attention to models of convolutional neural networks based on probabilistic methods: Monte Carlo dropout and Bayesian convolutional neural networks. Of particular interest was the aspect related to the uncertainty of predictions returned by the model. Such a situation may occur mainly during the classification of drifting data streams. Under such conditions, the prediction system should be able to return information about the high uncertainty of predictions and the need to take action to update the model used. This paper aims to study the behavior of the models mentioned above in the task of classifying non-stationary data streams and to determine the impact of the occurrence of a sudden drift on the accuracy and uncertainty of the predictions.
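The idea of reading prediction uncertainty from Monte Carlo dropout can be sketched as follows: dropout stays active at prediction time, several stochastic forward passes are averaged, and the entropy of the averaged class probabilities serves as an uncertainty score. This is a toy illustration under strong assumptions: a single linear layer with softmax instead of a convolutional network, dropout applied to the inputs, and hypothetical weights. It is not the experimental setup of the chapter.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def stochastic_forward(x, weights, drop_rate, rng):
    """One forward pass of a linear softmax model with input dropout."""
    keep = 1.0 - drop_rate
    # Inverted dropout: zero each input with probability drop_rate,
    # rescale the survivors so the expected input is unchanged.
    masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * v for w, v in zip(row, masked)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(x, weights, drop_rate=0.5, passes=100, seed=0):
    """Average several stochastic passes; return (mean probs, entropy)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    mean_probs = [0.0] * len(weights)
    for _ in range(passes):
        probs = stochastic_forward(x, weights, drop_rate, rng)
        mean_probs = [m + p / passes for m, p in zip(mean_probs, probs)]
    # Predictive entropy of the averaged distribution: higher = less certain
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy


toy_weights = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical 2-class model
probs, uncertainty = mc_dropout_predict([2.0, 0.5], toy_weights)
print(probs, uncertainty)
```

In a streaming setting, a rising entropy over incoming chunks could be monitored as a hint that the model needs updating, which is the kind of signal the abstract argues a prediction system should expose.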
Conference Paper
Catastrophic forgetting is a problem which refers to losing the information of the first task after training from the second task in continual learning of neural networks. To resolve this problem, we propose the incremental moment matching (IMM), which uses the Bayesian neural network framework. IMM assumes that the posterior distribution of parameters of neural networks is approximated with Gaussian distribution and incrementally matches the moment of the posteriors, which are trained for the first and second task, respectively. To make our Gaussian assumption reasonable, the IMM procedure utilizes various transfer learning techniques including weight transfer, L2-norm of old and new parameters, and a newly proposed variant of dropout using old parameters. We analyze our methods on the MNIST and CIFAR-10 datasets, and then evaluate them on a real-world life-log dataset collected using Google Glass. Experimental results show that IMM produces state-of-the-art performance in a variety of datasets.
Article
Significance: Deep neural networks are currently the most successful machine-learning technique for solving a variety of tasks, including language translation, image classification, and image generation. One weakness of such models is that, unlike humans, they are unable to learn multiple tasks sequentially. In this work we propose a practical solution to train such models sequentially by protecting the weights important for previous tasks. This approach, inspired by synaptic consolidation in neuroscience, enables state-of-the-art results on multiple reinforcement learning problems experienced sequentially.
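One common way to "protect the weights important for previous tasks" is a quadratic penalty that pulls each parameter toward its value after the earlier task, scaled by a per-parameter importance estimate. The sketch below assumes that form, as usually described for elastic weight consolidation; the abstract itself does not spell out the formula, and the names `ewc_penalty`, `importance` and `lam` are illustrative.

```python
def ewc_penalty(params, old_params, importance, lam=1.0):
    """Quadratic penalty: 0.5 * lam * sum_i F_i * (theta_i - theta_old_i)^2.

    params      -- current parameter values while training the new task
    old_params  -- parameter values learned on the previous task
    importance  -- per-parameter importance (e.g. a Fisher information estimate)
    lam         -- how strongly the old task is protected
    """
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, importance)
    )


# Moving an important weight costs more than moving an unimportant one.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], importance=[2.0, 5.0]))
```

During training on the new task, this term would be added to the new task's loss, so that weights with a high importance stay close to their old values while the others remain free to change.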
Article
Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
Article
Data stream mining has been receiving increased attention due to its presence in a wide range of applications, such as sensor networks, banking, and telecommunication. One of the most important challenges in learning from data streams is reacting to concept drift, i.e., unforeseen changes of the stream's underlying data distribution. Several classification algorithms that cope with concept drift have been put forward; however, most of them specialize in one type of change. In this paper, we propose a new data stream classifier, called the Accuracy Updated Ensemble (AUE2), which aims at reacting equally well to different types of drift. AUE2 combines accuracy-based weighting mechanisms known from block-based ensembles with the incremental nature of Hoeffding Trees. The proposed algorithm is experimentally compared with 11 state-of-the-art stream methods, including single classifiers, block-based and online ensembles, and hybrid approaches in different drift scenarios. Out of all the compared algorithms, AUE2 provided the best average classification accuracy while proving to be less memory consuming than other ensemble approaches. Experimental results show that AUE2 can be considered suitable for scenarios involving many types of drift as well as static environments.
Article
Most stream classifiers are designed to process data incrementally, run in resource-aware environments, and react to concept drifts, i.e., unforeseen changes of the stream’s underlying data distribution. Ensemble classifiers have become an established research line in this field, mainly due to their modularity which offers a natural way of adapting to changes. However, in environments where class labels are available after each example, ensembles which process instances in blocks do not react to sudden changes sufficiently quickly. On the other hand, ensembles which process streams incrementally do not take advantage of periodical adaptation mechanisms known from block-based ensembles, which offer accurate reactions to gradual and incremental changes. In this paper, we analyze if and how the characteristics of block and incremental processing can be combined to produce new types of ensemble classifiers. We consider and experimentally evaluate three general strategies for transforming a block ensemble into an incremental learner: online component evaluation, the introduction of an incremental learner, and the use of a drift detector. Based on the results of this analysis, we put forward a new incremental ensemble classifier, called Online Accuracy Updated Ensemble, which weights component classifiers based on their error in constant time and memory. The proposed algorithm was experimentally compared with four state-of-the-art online ensembles and provided the best average classification accuracy on real and synthetic datasets simulating different drift scenarios.
Article
An emerging problem in data streams is the detection of concept drift. This problem is aggravated when the drift is gradual over time. In this work we define a method for detecting concept drift, even in the case of slow gradual change. It is based on the estimated distribution of the distances between classification errors. The proposed method can be used with any learning algorithm in two ways: using it as a wrapper of a batch learning algorithm or implementing it inside an incremental and online algorithm. The experimental results compare our method (EDDM) with a similar one (DDM). The latter uses the error rate instead of the distance-error rate.
Article
We address adaptive classification of streaming data in the presence of concept change. An overview of the machine learning approaches reveals a deficit of methods for explicit change detection. Typically, classifier ensembles designed for changing environments do not have a bespoke change detector. Here we take a systematic look at the types of changes in streaming data and at the current approaches and techniques in online classification. Classifier ensembles for change detection are discussed. An example is carried through to illustrate individual and ensemble change detectors for both unlabelled and labelled data. While this paper does not offer ready-made solutions, it outlines possibilities for novel approaches to classification of streaming data.
Article
We introduce an ensemble of classifiers-based approach for incremental learning of concept drift, characterized by nonstationary environments (NSEs), where the underlying data distributions change over time. The proposed algorithm, named Learn++.NSE, learns from consecutive batches of data without making any assumptions on the nature or rate of drift; it can learn from such environments that experience constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift. The algorithm learns incrementally, as other members of the Learn++ family of algorithms, that is, without requiring access to previously seen data. Learn++.NSE trains one new classifier for each batch of data it receives, and combines these classifiers using a dynamically weighted majority voting. The novelty of the approach is in determining the voting weights, based on each classifier's time-adjusted accuracy on current and past environments. This approach allows the algorithm to recognize, and act accordingly, to the changes in underlying data distributions, as well as to a possible reoccurrence of an earlier distribution. We evaluate the algorithm on several synthetic datasets designed to simulate a variety of nonstationary environments, as well as a real-world weather prediction dataset. Comparisons with several other approaches are also included. Results indicate that Learn++.NSE can track the changing environments very closely, regardless of the type of concept drift. To allow future use, comparison and benchmarking by interested researchers, we also release our data used in this paper.
Conference Paper
Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
Conference Paper
Most of the work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detection of changes in the probability distribution of examples. The idea behind the drift detection method is to control the online error rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the actual model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method controls the trace of the online error of the algorithm. For the actual context we define a warning level and a drift level. A new context is declared if, in a sequence of examples, the error increases, reaching the warning level at example k_w and the drift level at example k_d. This is an indication of a change in the distribution of the examples. The algorithm learns a new model using only the examples since k_w. The method was tested with a set of eight artificial datasets and a real-world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show a good performance in detecting drift and in learning the new concept. We also observe that the method is independent of the learning algorithm.
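The warning-level and drift-level mechanism described above can be sketched as a small online monitor. The sketch tracks the running error rate p and its standard deviation s = sqrt(p(1 - p) / n), remembers the point where p + s was lowest, and signals a warning or a drift when p + s exceeds that minimum by 2 or 3 standard deviations. These factors and the 30-example warm-up are the values commonly cited for this detector and should be read as assumptions here, as should the class name `DriftDetector`. Strict comparisons are used so that a stream with no errors so far does not trigger a signal.

```python
import math


class DriftDetector:
    """Error-rate monitor with a warning level and a drift level."""

    def __init__(self, min_samples=30, warning_factor=2.0, drift_factor=3.0):
        self.min_samples = min_samples
        self.warning_factor = warning_factor
        self.drift_factor = drift_factor
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"  # too few examples for a reliable estimate
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # best operating point so far
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()  # a new context starts; statistics are discarded
            return "drift"
        if p + s > self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return "stable"


detector = DriftDetector()
# Synthetic outcomes: 20% errors for 200 examples, then 80% errors.
outcomes = [(i % 5 == 0) if i < 200 else (i % 5 != 0) for i in range(300)]
signals = [detector.update(o) for o in outcomes]
print(signals.index("warning"), signals.index("drift"))
```

A caller following the abstract would start buffering examples at the first warning and, once a drift is signalled, train the replacement model on the buffered examples only.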
Conference Paper
We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window itself. This frees the user or programmer from having to guess a time-scale for change. Contrary to many related works, we provide rigorous guarantees of performance, as bounds on the rates of false positives and false negatives. Using ideas from data stream algorithmics, we develop a time- and memory-efficient version of this algorithm, called ADWIN2. We show how to combine ADWIN2 with the Naïve Bayes (NB) predictor, in two ways: one, using it to monitor the error rate of the current model and declare when revision is necessary and, two, putting it inside the NB predictor to maintain up-to-date estimations of conditional probabilities in the data. We test our approach using synthetic and real data streams and compare them to both fixed-size and variable-size window strategies with good results.
Conference Paper
This paper proposes a general framework for classifying data streams by exploiting incremental clustering in order to dynamically build and update an ensemble of incremental classifiers. To achieve this, a transformation function that maps batches of examples into a new conceptual feature space is proposed. The clustering algorithm is then applied in order to group different concepts and identify recurring contexts. The ensemble is produced by maintaining a classifier for every concept discovered in the stream.
Article
In this paper we present a multiple window incremental learning algorithm that distinguishes between virtual concept drift and real concept drift. The algorithm is unsupervised and uses a novel approach to tracking concept drift that involves the use of competing windows to interpret the data. Unlike previous methods which use a single window to determine the drift in the data, our algorithm uses three windows of different sizes to estimate the change in the data. The advantage of this approach is that it allows the system to progressively adapt and predict the change thus enabling it to deal more effectively with different types of drift. We give a detailed description of the algorithm and present the results obtained from its application to two real world problems: background image processing and sound recognition. We also compare its performance with FLORA, an existing concept drift tracking algorithm.
Article
Full-text available
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license.
Article
Full-text available
We have previously introduced an incremental learning algorithm Learn(++), which learns novel information from consecutive data sets by generating an ensemble of classifiers with each data set, and combining them by weighted majority voting. However, Learn(++) suffers from an inherent "outvoting" problem when asked to learn a new class omega(new) introduced by a subsequent data set, as earlier classifiers not trained on this class are guaranteed to misclassify omega(new) instances. The collective votes of earlier classifiers, for an inevitably incorrect decision, then outweigh the votes of the new classifiers' correct decision on omega(new) instances--until there are enough new classifiers to counteract the unfair outvoting. This forces Learn(++) to generate an unnecessarily large number of classifiers. This paper describes Learn(++).NC, specifically designed for efficient incremental learning of multiple new classes using significantly fewer classifiers. To do so, Learn(++).NC introduces dynamically weighted consult and vote (DW-CAV), a novel voting mechanism for combining classifiers: individual classifiers consult with each other to determine which ones are most qualified to classify a given instance, and decide how much weight, if any, each classifier's decision should carry. Experiments on real-world problems indicate that the new algorithm performs remarkably well with substantially fewer classifiers, not only as compared to its predecessor Learn(++), but also as compared to several other algorithms recently proposed for similar problems.
Article
Full-text available
We introduce Learn++, an algorithm for incremental training of neural network (NN) pattern classifiers. The proposed algorithm enables supervised NN paradigms, such as the multilayer perceptron (MLP), to accommodate new data, including examples that correspond to previously unseen classes. Furthermore, the algorithm does not require access to previously used data during subsequent incremental learning sessions, yet at the same time, it does not forget previously acquired knowledge. Learn++ utilizes an ensemble of classifiers by generating multiple hypotheses using training data sampled according to carefully tailored distributions. The outputs of the resulting classifiers are combined using a weighted majority voting procedure. We present simulation results on several benchmark datasets as well as a real-world classification task. Initial results indicate that the proposed algorithm works rather well in practice. A theoretical upper bound on the error of the classifiers constructed by Learn++ is also provided.
Article
Full-text available
Concept drift due to hidden changes in context complicates learning in many domains including financial prediction, medical diagnosis, and network performance. Existing machine learning approaches to this problem use an incremental learning, on-line paradigm. Batch, offline learners tend to be ineffective in domains with hidden changes in context as they assume that the training set is homogeneous. An offline, meta-learning approach for the identification of hidden context is presented. The new approach uses an existing batch learner and the process of contextual clustering to identify stable hidden contexts and the associated context-specific, locally stable concepts. The approach is broadly applicable to the extraction of context reflected in time and spatial attributes. Several algorithms for the approach are presented and evaluated. A successful application of the approach to a complex control task is also presented.
Article
Full-text available
Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications including credit card fraud protection, target marketing, network intrusion detection, etc. Conventional knowledge discovery tools are facing two challenges, the overwhelming volume of the streaming data, and the concept drifts. In this paper, we propose a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. Our empirical study shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
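The weighting idea can be sketched in a few lines. The abstract does not give the exact weighting formula, so the sketch below assumes the simplest possible choice, namely weighting each member by its accuracy on the most recent labeled chunk; the helper names and the toy threshold "models" are hypothetical and stand in for classifiers trained on earlier chunks.

```python
from collections import defaultdict


def accuracy(model, chunk):
    """Fraction of (x, y) pairs in the chunk that the model predicts correctly."""
    return sum(1 for x, y in chunk if model(x) == y) / len(chunk)


def weighted_vote(models, weights, x):
    """Return the label with the largest total weight among the member votes."""
    scores = defaultdict(float)
    for model, weight in zip(models, weights):
        scores[model(x)] += weight
    return max(scores, key=scores.get)


# Hypothetical ensemble members: simple threshold rules on a single number.
def old_concept(x):
    return int(x > 0.5)


def also_old_concept(x):
    return int(x > 0.4)


def new_concept(x):
    return int(x < 0.5)


models = [old_concept, also_old_concept, new_concept]

# The most recent labeled chunk follows the new concept.
recent_chunk = [(0.1, 1), (0.2, 1), (0.8, 0), (0.9, 0)]

# Assumed weighting: accuracy on the most recent chunk.
weights = [accuracy(m, recent_chunk) for m in models]

prediction = weighted_vote(models, weights, 0.3)
```

In this toy case an unweighted majority vote would follow the two outdated members, whereas the accuracy-based weights let the single member that matches the current concept decide.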
Article
Full-text available
On-line learning in domains where the target concept depends on some hidden context poses serious problems. Context shifts can induce changes in the target concepts, producing what is known as concept drift. We describe a family of learning algorithms that flexibly react to concept drift and can take advantage of situations where contexts reappear. The general approach underlying all these algorithms consists of (1) keeping only a window of currently trusted examples and hypotheses; (2) storing concept descriptions and re-using them if a previous context re-appears; and (3) controlling both of these functions by a heuristic that constantly monitors the system's behavior. The paper reports on experiments that test the systems' performance under various levels of noise and different extent and speed of concept drift. Key words. Incremental concept learning, on-line learning, context dependence, concept drift, forgetting 1 Introduction The work presented here relates to the global model o...
Article
stream-learn is a Python package compatible with scikit-learn and developed for the drifting and imbalanced data stream analysis. Its main component is a stream generator, which allows producing a synthetic data stream that may incorporate each of the three main concept drift types (i.e., sudden, gradual and incremental drift) in their recurring or non-recurring version, as well as static and dynamic class imbalance. The package allows conducting experiments following established evaluation methodologies (i.e., Test-Then-Train and Prequential). Besides, estimators adapted for data stream classification have been implemented, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. The package utilises its own implementations of prediction metrics for imbalanced binary classification tasks to improve computational efficiency.
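The Test-Then-Train methodology mentioned above can be sketched without relying on the package's own API. The following is a plain-Python illustration, not stream-learn code: `MajorityClass` and `evaluate_test_then_train` are hypothetical names, and the toy learner only mimics the `partial_fit`/`predict` style of scikit-learn estimators.

```python
from collections import Counter


class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, X, y):
        """Update label counts with the labels of one chunk."""
        self.counts.update(y)
        return self

    def predict(self, X):
        """Predict the currently most frequent label for every object in X."""
        label = self.counts.most_common(1)[0][0]
        return [label for _ in X]


def evaluate_test_then_train(model, chunks):
    """Score each chunk before training on it; the first chunk only trains."""
    scores = []
    for i, (X, y) in enumerate(chunks):
        if i > 0:
            predictions = model.predict(X)
            correct = sum(p == t for p, t in zip(predictions, y))
            scores.append(correct / len(y))
        model.partial_fit(X, y)
    return scores


# Three small chunks of (features, labels); the features are unused by the toy learner.
chunks = [
    ([0, 0, 0], [1, 1, 1]),
    ([0, 0, 0, 0], [1, 1, 0, 1]),
    ([0, 0], [0, 0]),
]
scores = evaluate_test_then_train(MajorityClass(), chunks)
```

Because every chunk is used for testing before the model sees its labels, the resulting score sequence shows how accuracy drops when the label distribution changes and how quickly it recovers.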
Article
Owing to the variety of modern real-life tasks in which the analyzed data is often not a static set, data stream mining has gained substantial attention from the machine learning community. The main property of such systems is the large amount of data arriving sequentially, which creates an endless stream of objects. Given limited resources such as memory and computational power, it is widely accepted that each instance can be processed at most once and is not retained, making reevaluation impossible. In this work, we focus on the data stream classification task in which the parameters of a classification model may vary over time, so the model should be able to adapt to the changes. This requires a forgetting mechanism that ensures outdated samples do not affect the model. The most popular approaches are based on so-called windowing, which requires storing a batch of objects; when new examples arrive, the least relevant ones are forgotten. Objects in a new window are used to retrain the model, which is cumbersome, especially for online learners, and contradicts the principle of processing each object at most once. Therefore, this work employs the inbuilt forgetting mechanism of neural networks. Additionally, to reduce the need for expensive (sometimes even impossible) object labeling, we focus on active learning, which asks for labels only for interesting examples that are crucial for appropriate model updating. The characteristics of the proposed methods were evaluated in computer experiments performed over a diverse pool of data streams. The results confirmed the usefulness of the proposed strategy.
Article
In many applications of information systems learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Out of several new proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research.
Article
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets. © 2014 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov.
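The training-time and test-time behavior described above can be sketched for a single linear unit. This is a minimal plain-Python illustration under stated assumptions, not the authors' implementation: the function names and the retention probability `keep_prob` are hypothetical, and dropout is applied to the unit's inputs.

```python
import random


def dropout_mask(n_units, keep_prob, rng):
    """Sample a 0/1 mask; each unit is kept with probability keep_prob."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(n_units)]


def forward_train(inputs, weights, keep_prob, rng):
    """Training-time pass through one randomly 'thinned' linear unit."""
    mask = dropout_mask(len(inputs), keep_prob, rng)
    return sum(m * x * w for m, x, w in zip(mask, inputs, weights))


def forward_test(inputs, weights, keep_prob):
    """Test-time pass: nothing is dropped, weights are scaled by keep_prob."""
    return sum(x * (keep_prob * w) for x, w in zip(inputs, weights))


rng = random.Random(0)
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
keep_prob = 0.5

# Average over many thinned networks versus the single scaled network.
n_samples = 20000
thinned_average = sum(
    forward_train(inputs, weights, keep_prob, rng) for _ in range(n_samples)
) / n_samples
scaled_output = forward_test(inputs, weights, keep_prob)
```

For a linear unit the scaled network matches the expected output of the thinned networks exactly, so the two values agree up to sampling noise; with nonlinear activations the weight scaling is only an approximation of the average, as the abstract notes.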
Article
The extension of machine learning methods from static to dynamic environments has received increasing attention in recent years; in particular, a large number of algorithms for learning from so-called data streams has been developed. An important property of dynamic environments is non-stationarity, i.e., the assumption of an underlying data generating process that may change over time. Correspondingly, the ability to properly react to so-called concept change is considered as an important feature of learning algorithms. In this paper, we propose a new type of experimental analysis, called recovery analysis, which is aimed at assessing the ability of a learner to discover a concept change quickly, and to take appropriate measures to maintain the quality and generalization performance of the model. We develop recovery analysis for two types of supervised learning problems, namely classification and regression. Moreover, as a practical application, we make use of recovery analysis in order to compare model-based and instance-based approaches to learning on data streams.
Article
The comparison of two treatments generally falls into one of the following two categories: (a) we may have a number of replications for each of the two treatments, which are unpaired, or (b) we may have a number of paired comparisons leading to a series of differences, some of which may be positive and some negative. The appropriate methods for testing the significance of the differences of the means in these two cases are described in most of the textbooks on statistical methods.
Article
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.
An ensemble of classifiers for coping with recurring contexts in data streams
  • I Katakis
  • G Tsoumakas
  • I Vlahavas