Impact of chunk size on obtained accuracy.

Source publication
Article
Modern analytical systems must process streaming data and correctly respond to changes in data distribution. This phenomenon of changing distributions is called concept drift, and it may harm the quality of the models in use. Additionally, the possibility that concept drift will appear means that the algorithms used must be ready for the continuous...

Contexts in source publication

Context 1
... we examine the impact of the chunk size on the model's performance and its general capability for handling data with concept drift. To evaluate these properties, we train the AWE model on a synthetic data stream with different chunk sizes. The stream consists of 20 features and 2 classes, and it contains only 1 abrupt drift. Results are presented in Fig. 3. As expected, chunk size has an impact on the maximal accuracy that the model can achieve. This is especially visible before the drift, where models with larger chunks obtain the best accuracy. Also, with larger chunks the variance of accuracy is ...
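The experiment described above can be approximated in code. The following is a minimal, self-contained sketch, not the authors' exact setup: a synthetic binary stream with 20 features and one abrupt drift is processed in fixed-size chunks by a simplified accuracy-weighted ensemble (AWE-like) of decision trees, and test-then-train accuracy is recorded for several chunk sizes. The names make_stream and ChunkedAWE, as well as all constants, are illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def make_stream(n_samples=40_000, n_features=20, drift_at=0.5, seed=0):
    """Binary stream whose concept flips abruptly at the `drift_at` fraction."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    w = rng.normal(size=n_features)
    y = (X @ w > 0).astype(int)
    split = int(drift_at * n_samples)
    y[split:] = 1 - y[split:]          # abrupt concept drift: labels invert
    return X, y

class ChunkedAWE:
    """Simplified accuracy-weighted ensemble trained chunk by chunk.
    (Real AWE weights the newest member by cross-validation; here we use
    plain accuracy on the current chunk for brevity.)"""
    def __init__(self, max_members=10):
        self.max_members = max_members
        self.members = []              # list of (weight, fitted tree)

    def partial_fit(self, X, y):
        clf = DecisionTreeClassifier(max_depth=5).fit(X, y)
        # re-weight every member (old and new) by accuracy on the newest chunk
        scored = [(m.score(X, y), m) for _, m in self.members] + [(clf.score(X, y), clf)]
        scored.sort(key=lambda t: t[0], reverse=True)
        self.members = scored[: self.max_members]

    def predict(self, X):
        votes = sum(w * m.predict(X) for w, m in self.members)
        total = sum(w for w, _ in self.members) or 1.0
        return (votes / total >= 0.5).astype(int)

X, y = make_stream()
for chunk_size in (250, 500, 1000, 2000):
    model, accs = ChunkedAWE(), []
    for start in range(0, len(y) - chunk_size, chunk_size):
        Xc, yc = X[start:start + chunk_size], y[start:start + chunk_size]
        if model.members:                          # test-then-train protocol
            accs.append((model.predict(Xc) == yc).mean())
        model.partial_fit(Xc, yc)
    print(f"chunk_size={chunk_size:5d}  mean accuracy={np.mean(accs):.3f}")

Plotting the per-chunk accuracies from such a sketch against the chunk index would reproduce the qualitative picture discussed here: larger chunks reach higher pre-drift accuracy with lower variance.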
Context 2
... is available to the underlying model, and therefore allows a more accurate model to be trained. Interestingly, we can see that for all chunk sizes, performance is restored at roughly the same time. Regardless of the chunk size, a similar number of updates is required to bring back the model's performance. Please keep in mind that the x-axis in Fig. 3 is the number of chunks. This means that models trained on larger chunks require a larger number of learning examples to restore accuracy. These results give the rationale behind our method: when drift is detected, we change the chunk size to decrease the number of learning examples consumed while restoring accuracy. Next, we gradually ...
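The adaptation idea described in this context can be sketched as a small chunk-size controller: shrink the chunk size as soon as a drift detector fires, then grow it back toward the stable-period size over the next few chunks. This is only an illustration of the rationale, assuming a linear recovery schedule; the class name ChunkSizeController and all constants are hypothetical, not the paper's exact procedure.

class ChunkSizeController:
    def __init__(self, base_size=1000, min_size=100, recovery_chunks=10):
        self.base_size = base_size          # chunk size used in stable periods
        self.min_size = min_size            # chunk size used right after a drift
        self.recovery_chunks = recovery_chunks
        self.since_drift = None             # chunks processed since the last drift

    def notify_drift(self):
        """Call this when an external drift detector fires."""
        self.since_drift = 0

    def next_chunk_size(self):
        if self.since_drift is None:
            return self.base_size
        # grow linearly from min_size back to base_size over `recovery_chunks`
        frac = min(self.since_drift / self.recovery_chunks, 1.0)
        size = int(self.min_size + frac * (self.base_size - self.min_size))
        self.since_drift = None if frac >= 1.0 else self.since_drift + 1
        return size

# usage: ask the controller for the next chunk size before reading the stream
ctrl = ChunkSizeController()
ctrl.notify_drift()
sizes = [ctrl.next_chunk_size() for _ in range(14)]
print(sizes)   # small chunks right after the drift, growing back to 1000

The point of the small post-drift chunks is exactly the one made above: fewer learning examples are spent per update while the model is inaccurate, so accuracy is restored with a smaller total budget of labeled instances.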

Citations

... Implicit drift detection methods assume that the classifier is capable of self-adjusting to new instances coming from the stream while forgetting the old information (Liu et al., 2016). This way, new information is constantly incorporated into the learner, which should allow for adapting to evolving concepts (Kozal et al., 2021). Drawbacks of implicit methods lie in their parametrization: establishing proper learning and forgetting rates, as well as the size of the sliding window. ...
Article
Data streams are potentially unbounded sequences of instances arriving over time to a classifier. Designing algorithms that are capable of dealing with massive, rapidly arriving information is one of the most dynamically developing areas of machine learning. Such learners must be able to deal with a phenomenon known as concept drift, where the data stream may be subject to various changes in its characteristics over time. Furthermore, distributions of classes may evolve over time, leading to a highly difficult non-stationary class imbalance. In this work we introduce Robust Online Self-Adjusting Ensemble (ROSE), a novel online ensemble classifier capable of dealing with all of the mentioned challenges. The main features of ROSE are: (i) online training of base classifiers on variable size random subsets of features; (ii) online detection of concept drift and creation of a background ensemble for faster adaptation to changes; (iii) sliding window per class to create skew-insensitive classifiers regardless of the current imbalance ratio; and (iv) self-adjusting bagging to enhance the exposure of difficult instances from minority classes. The interplay among these features leads to an improved performance in various data stream mining benchmarks. An extensive experimental study comparing with 30 ensemble classifiers shows that ROSE is a robust and well-rounded classifier for drifting imbalanced data streams, especially under the presence of noise and class imbalance drift, while maintaining competitive time complexity and memory consumption. Results are supported by a thorough non-parametric statistical analysis.