Accuracy for stream-learn data stream (1).

Source publication
Article
Full-text available
Modern analytical systems must process streaming data and respond correctly to changes in the data distribution. This phenomenon is called concept drift, and it may harm the quality of the models in use. Additionally, the possibility of concept drift means that the algorithms used must be ready for continuous...

Contexts in source publication

Context 1
... of the experiments, we compare the performance of the proposed method to the baseline. Results were collected following the experimental protocol described in the previous sections. To save space, we do not provide results for all models and streams. Instead, we plot the accuracy achieved by the models on selected data streams. These results are presented in Figs. 4, 5, 6, and 7. All learning curves were smoothed using a 1D Gaussian filter with σ = ...
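The smoothing step described in this excerpt can be sketched as follows. The actual σ is truncated in the excerpt, so the value below, and the synthetic accuracy curve, are purely illustrative:

```python
import numpy as np

def gaussian_smooth(y, sigma=3.0):
    """Smooth a 1D learning curve with a truncated Gaussian kernel.

    The input is reflect-padded so the smoothed curve keeps the same
    length and is not dragged toward zero at the endpoints.
    """
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(y, radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

# Synthetic per-chunk accuracy: a slow trend plus chunk-to-chunk noise.
rng = np.random.default_rng(0)
accuracy = 0.8 + 0.05 * np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.02, 200)
smoothed = gaussian_smooth(accuracy, sigma=3.0)
```

In practice `scipy.ndimage.gaussian_filter1d` does the same job; the hand-rolled kernel just keeps the sketch dependency-free.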
Context 2
... also observe larger gains from applying CAR on streams with larger chunk sizes. To illustrate, compare the results in Fig. 4 to those in Fig. 5. One possible explanation for this trend is that the gains obtained from employing CAR are proportional to the difference between the base and drift chunk sizes. In our experiments, the drift chunk size was equal to 30 for all streams and models. This explanation is also in line with the results of the hyperparameter ...

Citations

... In the recent literature, concept drift is described in further detail. At a given moment, p(x_t, y_t) can be obtained from the conditional class-concept distribution via formula (2). ...
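Formula (2) of the citing paper is not reproduced in the excerpt; the standard factorization it alludes to, with real concept drift defined as a change in the joint distribution, can be written as:

```latex
% Joint distribution over features and labels at time t,
% factorized through the class-conditional ("concept") distribution:
p(x_t, y_t) = p(y_t)\, p(x_t \mid y_t)

% Real concept drift occurs when the joint distribution changes over time:
\exists\, t : \; p_t(x, y) \neq p_{t+1}(x, y)
```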
Article
The concept drift detection method is an online learner. Its main task is to determine the position of drifts in the data stream, so that the classifier can be reset after a drift is detected to improve learning performance, which is very important in practical applications such as user interest prediction or financial transaction fraud detection. To address the inability of existing drift detection methods to balance detection delay, false positives, false negatives, and space–time efficiency, a new level-transition threshold parameter is proposed, and a multi-level weighted mechanism comprising "Stable Level-Warning Level-Drift Level" is introduced into concept drift detection. The instances in the window are weighted by level, and a double sliding window is also applied. Based on this, a multi-level weighted drift detection method (MWDDM) is proposed. In particular, two variants, MWDDM_H and MWDDM_M, are proposed based on the Hoeffding inequality and the McDiarmid inequality, respectively. Experiments on artificial datasets show that MWDDM_H and MWDDM_M can detect abrupt and gradual concept drift faster than other comparison algorithms while maintaining low false positive and false negative ratios. Experiments on real-world datasets show that MWDDM has the highest classification accuracy in most cases while maintaining good space–time efficiency.
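MWDDM itself weights instances by level and uses a Stable/Warning/Drift hierarchy; the abstract does not give its exact statistics, so the sketch below only illustrates the underlying idea of comparing error rates in two sliding windows against a Hoeffding-style bound. All names and thresholds are illustrative, not the authors' method:

```python
import math
from collections import deque

class HoeffdingDriftDetector:
    """Simplified two-window drift check using the Hoeffding bound.

    Illustrative only: MWDDM additionally weights instances by level and
    distinguishes Stable/Warning/Drift states, which this sketch omits.
    """

    def __init__(self, window_size=100, delta=0.002):
        self.recent = deque(maxlen=window_size)     # newest 0/1 error flags
        self.reference = deque(maxlen=window_size)  # older error flags
        self.delta = delta  # confidence parameter of the bound

    def add_error(self, err):
        # When the recent window is full, its oldest flag slides into the
        # reference window before the new one is appended.
        if len(self.recent) == self.recent.maxlen:
            self.reference.append(self.recent[0])
        self.recent.append(float(err))

    def drift_detected(self):
        n1, n2 = len(self.reference), len(self.recent)
        if n1 < 30 or n2 < 30:
            return False  # not enough evidence yet
        mean_ref = sum(self.reference) / n1
        mean_new = sum(self.recent) / n2
        # Hoeffding bound on the difference of two bounded-sample means.
        eps = math.sqrt(0.5 * (1 / n1 + 1 / n2) * math.log(2 / self.delta))
        return (mean_new - mean_ref) > eps
```

A rise in the recent window's error rate beyond the bound signals that the current concept no longer matches the one the classifier was trained on.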
... In [Almeida et al. 2018], a passive approach is proposed that dynamically selects classifiers according to the current concept and the neighborhood of the test instance. [Kozal et al. 2021] proposed an adaptive chunk size, where new classifiers are trained on new blocks and added to the group of classifiers, while obsolete models are removed. ...
Conference Paper
Concept drift is a common problem when working with machine learning. It refers to changes in the target concept over time, which may deteriorate a model's accuracy. A recurring problem in concept drift research is finding datasets that reflect real-world scenarios. In this work, we present some datasets known to exhibit concept drift and propose changes to an existing method (Dynse), including making it capable of handling data streams instead of batches and adding a drift-detection trigger to make its window adaptive.
... Implicit drift detection methods assume that the classifier is capable of self-adjusting to new instances coming from the stream while forgetting the old information (Liu et al., 2016). This way, new information is constantly incorporated into the learner, which should allow for adapting to evolving concepts (Kozal et al., 2021). Drawbacks of implicit methods lie in their parametrization: establishing proper learning and forgetting rates, as well as the size of a sliding window. ...
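The forgetting behaviour described in this excerpt can be illustrated with a toy windowed learner (not taken from any of the cited works). Its window size is exactly the parametrization trade-off the excerpt mentions: small windows adapt quickly but are unstable, large ones are stable but slow to forget.

```python
from collections import Counter, deque

class SlidingWindowMajority:
    """Toy implicit-adaptation learner: predicts the majority label among
    the last `window_size` instances, so old concepts are forgotten
    automatically as the window slides forward."""

    def __init__(self, window_size=50):
        self.window = deque(maxlen=window_size)  # old labels fall off the front

    def partial_fit(self, y):
        self.window.append(y)

    def predict(self):
        if not self.window:
            return None
        # Most common label currently inside the window.
        return Counter(self.window).most_common(1)[0][0]
```

After a concept change, the prediction flips as soon as the new concept dominates the window, with no explicit drift detector involved.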
Article
Data streams are potentially unbounded sequences of instances arriving over time to a classifier. Designing algorithms that are capable of dealing with massive, rapidly arriving information is one of the most dynamically developing areas of machine learning. Such learners must be able to deal with a phenomenon known as concept drift, where the data stream may be subject to various changes in its characteristics over time. Furthermore, distributions of classes may evolve over time, leading to a highly difficult non-stationary class imbalance. In this work we introduce Robust Online Self-Adjusting Ensemble (ROSE), a novel online ensemble classifier capable of dealing with all of the mentioned challenges. The main features of ROSE are: (i) online training of base classifiers on variable size random subsets of features; (ii) online detection of concept drift and creation of a background ensemble for faster adaptation to changes; (iii) sliding window per class to create skew-insensitive classifiers regardless of the current imbalance ratio; and (iv) self-adjusting bagging to enhance the exposure of difficult instances from minority classes. The interplay among these features leads to an improved performance in various data stream mining benchmarks. An extensive experimental study comparing with 30 ensemble classifiers shows that ROSE is a robust and well-rounded classifier for drifting imbalanced data streams, especially under the presence of noise and class imbalance drift, while maintaining competitive time complexity and memory consumption. Results are supported by a thorough non-parametric statistical analysis.
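ROSE's per-class sliding windows (feature iii above) can be sketched in isolation. This is a simplified illustration of the idea, not the authors' implementation: each class keeps its own fixed-size buffer, so minority-class instances are not crowded out of a single shared window.

```python
from collections import deque

class PerClassWindows:
    """Per-class sliding buffers, sketching ROSE's skew-insensitive idea:
    the effective training set is capped per class, which bounds the
    imbalance ratio seen by each base classifier."""

    def __init__(self, window_size=100):
        self.window_size = window_size
        self.buffers = {}  # class label -> deque of instances

    def add(self, x, y):
        # Each class gets its own bounded window; old instances of that
        # class are dropped independently of the other classes.
        self.buffers.setdefault(y, deque(maxlen=self.window_size)).append(x)

    def training_set(self):
        X, Y = [], []
        for y, buf in self.buffers.items():
            X.extend(buf)
            Y.extend([y] * len(buf))
        return X, Y
```

With a shared window of the same total size, a 25:1 stream would yield a 25:1 training set; here the majority class is capped at `window_size`, so the minority class keeps all of its recent instances.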