Conference Paper

Adaptive Model Tree for Streaming Data


Abstract

With the ever-growing availability of data streams, the interest in and need for efficient techniques for dealing with such data increase. A major challenge in this context is the accurate online prediction of continuous values in the presence of concept drift. In this paper, we introduce a new adaptive model tree (AMT), designed to incrementally learn from the data stream, adapt to changes, and perform accurate real-time predictions at any time. To deal with sub-models lying in different subspaces, we propose a new model clustering algorithm able to identify subspace models, and use it to compute splits in the input space. Compared to the state of the art, our AMT allows for oblique splits, delivering more compact and accurate models.
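The difference between the axis-aligned splits of conventional model trees and the oblique splits the abstract refers to can be illustrated with a minimal sketch (the weights and threshold below are made up for illustration; this is not the authors' algorithm):

```python
import numpy as np

def axis_aligned_split(x, feature, threshold):
    # Route a sample left/right on a single-feature comparison.
    return "left" if x[feature] <= threshold else "right"

def oblique_split(x, w, b):
    # Route on a linear combination of features: w . x <= b.
    # One oblique hyperplane can capture a boundary that would need
    # a staircase of many axis-aligned splits to approximate.
    return "left" if np.dot(w, x) <= b else "right"

# A boundary along x0 + x1 = 1 is one oblique split:
w, b = np.array([1.0, 1.0]), 1.0
side_a = oblique_split(np.array([0.2, 0.3]), w, b)  # 0.5 <= 1 -> "left"
side_b = oblique_split(np.array([0.8, 0.9]), w, b)  # 1.7 >  1 -> "right"
```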


... The availability of real-time sensor data from the developed SmartHelmet prototype allows for streaming data, which is processed via an online algorithm (stream processing) to adapt the offline model [42]. The streaming data includes new samples of measured sensor data (a new input matrix X) acquired from the SmartHelmet sensors, and new personal thermal comfort votes (a new output vector TC) acquired through an interactive query provided by the developed SmartHelmet App. ...
Article
Full-text available
Bicyclists can be subjected to crashes, which can cause injuries over the whole body, especially the head. Head injuries can be prevented by wearing bicycle helmets; however, bicycle helmets are frequently not worn for a variety of reasons. One of the most common complaints about wearing bicycle helmets relates to thermal discomfort. So far, insufficient attention has been given to the thermal performance of helmets. This paper aimed to introduce and develop an adaptive model for the online monitoring of head thermal comfort based on easily measured variables, which can be measured continuously using embedded sensors in the helmet. During the course of this work, 22 participants in total were subjected to different levels of environmental conditions (air temperature, air velocity, mechanical work and helmet thermal resistance) to develop a general model to predict head thermal comfort. A reduced-order general linear regression model with three input variables, namely, the temperature difference between ambient temperature and average under-helmet temperature, the cyclist's heart rate, and the interaction between ambient temperature and helmet thermal resistance, was the most suitable to predict the cyclist's head thermal comfort and showed a maximum mean absolute percentage error (MAPE) of 8.4%. Based on the selected model variables, a smart helmet prototype (SmartHelmet) was developed using embedded sensing technology, which was used to validate the developed general model. Finally, we introduced a framework of calculation for an adaptive personalised model to predict head thermal comfort based on streaming data from the SmartHelmet prototype.
... To the best of our knowledge, no algorithm employing interval predictions uses incoming test instances from the stream to improve prediction quality by dynamically adjusting the intervals. The idea of using new samples from the same underlying distribution to improve prediction accuracy is, however, a component of the Adaptive Model Tree (AMT) algorithm by Zimmer et al. [22]. In AMT, the updates can be quite extensive, affecting not only the leaf node model but all splits along the path to that leaf, thus sacrificing tree stability in order to utilize the extra information from new examples. ...
Conference Paper
Full-text available
Online predictive modeling of streaming data is a key task for big data analytics. In this paper, a novel approach for efficient online learning of regression trees is proposed, which continuously updates, rather than retrains, the tree as more labeled data become available. A conformal predictor outputs prediction sets instead of point predictions, which for regression translates into prediction intervals. The key property of a conformal predictor is that it is always valid, i.e., the error rate on novel data is bounded by a preset significance level. Here, we suggest applying Mondrian conformal prediction on top of the resulting models, in order to obtain regression trees where not only the tree, but also each and every rule, corresponding to a path from the root node to a leaf, is valid. Using Mondrian conformal prediction, it becomes possible to analyze and explore the different rules separately, knowing that their accuracy, in the long run, will not fall below the preset significance level. An empirical investigation, using 17 publicly available data sets, confirms that the resulting rules are independently valid, but also shows that the prediction intervals are smaller, on average, than when only the global model is required to be valid. All in all, the suggested method provides a data miner or a decision maker with highly informative predictive models of streaming data.
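The per-rule validity described above can be sketched with a plain split-conformal interval computed separately per tree leaf (the Mondrian taxonomy here is leaf membership); the leaf names and calibration residuals below are made up for illustration:

```python
import math

def conformal_interval(residuals, y_hat, significance):
    # Split-conformal interval: the ceil((1 - eps)(n + 1))-th smallest
    # calibration residual gives a symmetric interval around y_hat.
    scores = sorted(abs(r) for r in residuals)
    k = math.ceil((1 - significance) * (len(scores) + 1)) - 1
    q = scores[min(k, len(scores) - 1)]
    return (y_hat - q, y_hat + q)

# Mondrian version: one calibration set per category (here, per leaf),
# so each leaf's rule is valid on its own, not just the global model.
calib_by_leaf = {"leaf_a": [0.5, 1.0, 0.2, 0.8, 0.4, 0.9, 0.1, 0.6, 0.3, 0.7],
                 "leaf_b": [2.0, 3.0, 2.5, 1.5, 2.2, 2.8, 1.8, 2.4, 2.1, 2.6]}
lo_a, hi_a = conformal_interval(calib_by_leaf["leaf_a"], 10.0, 0.1)
lo_b, hi_b = conformal_interval(calib_by_leaf["leaf_b"], 10.0, 0.1)
```

A leaf with small residuals yields a tighter interval than a noisy leaf at the same significance level.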
Article
Full-text available
The problem of real-time extraction of meaningful patterns from time-changing data streams is of increasing importance for the machine learning and data mining communities. Regression in time-changing data streams is a relatively unexplored topic, despite the apparent applications. This paper proposes an efficient and incremental stream mining algorithm which is able to learn regression and model trees from possibly unbounded, high-speed and time-changing data streams. The algorithm is evaluated extensively in a variety of settings involving artificial and real data. To the best of our knowledge, there is no other general-purpose algorithm for incrementally learning regression/model trees able to perform explicit change detection and informed adaptation. The algorithm performs online and in real time, observes each example only once at the speed of arrival, and maintains at any time a ready-to-use model tree. The tree leaves contain linear models induced online from the examples assigned to them, a process with low complexity. The algorithm has mechanisms for drift detection and model adaptation, which enable it to maintain accurate and updated regression models at any time. The drift detection mechanism exploits the structure of the tree in the process of local change detection. As a response to local drift, the algorithm is able to update the tree structure only locally. This approach improves the any-time performance and greatly reduces the costs of adaptation.

Keywords: Non-stationary data streams, Stream data mining, Regression trees, Model trees, Incremental algorithms, On-line learning, Concept drift, On-line change detection
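Change detectors of the kind used for local drift detection in such trees are commonly built on the Page-Hinkley test, which signals when the cumulative deviation of the error from its running mean exceeds a threshold. A minimal sketch (the parameter values are illustrative):

```python
class PageHinkley:
    # Minimal Page-Hinkley change detector for a stream of error values.
    def __init__(self, delta=0.005, threshold=1.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # detection threshold (lambda)
        self.mean = 0.0             # running mean of the errors
        self.n = 0
        self.cum = 0.0              # cumulative sum m_t
        self.min_cum = 0.0          # running minimum M_t

    def update(self, error):
        self.n += 1
        self.mean += (error - self.mean) / self.n
        self.cum += error - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        # Drift is signalled when m_t - M_t exceeds the threshold.
        return self.cum - self.min_cum > self.threshold

detector = PageHinkley(delta=0.005, threshold=1.0)
stream = [0.1] * 50 + [0.9] * 50   # error level jumps mid-stream
alarms = [i for i, e in enumerate(stream) if detector.update(e)]
```

The detector stays silent while the error is stable and raises an alarm shortly after the jump at position 50.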
Article
Full-text available
A new algorithm for incremental construction of binary regression trees is presented. This algorithm, called SAIRT, adapts the induced model when facing data streams involving unknown dynamics, like gradual and abrupt function drift, changes in certain regions of the function, noise, and virtual drift. It also handles both symbolic and numeric attributes. The proposed algorithm can automatically adapt its internal parameters and model structure to obtain new patterns, depending on the current dynamics of the data stream. SAIRT can monitor the usefulness of nodes and can forget examples from selected regions, storing the remaining ones in local windows associated with the leaves of the tree. Under these conditions, current regression methods need a careful configuration depending on the dynamics of the problem. Experimentation suggests that the proposed algorithm obtains better results than current algorithms when dealing with data streams that involve changes with different speeds, noise levels, sampling distributions of examples, and partial or complete changes of the underlying function.
Conference Paper
Full-text available
Learning from data streams is a research area of increasing importance. Nowadays, several stream learning algorithms have been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet conveniently addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There are no gold standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of predictive sequential methods for error estimation - the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are sliding windows and fading factors. We observe that the prequential error converges to a holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i) assessing the performance of a learning algorithm; ii) comparing learning algorithms; iii) hypothesis testing using the McNemar test; and iv) change detection using the Page-Hinkley test. In these tasks, the prequential error estimated using fading factors provides reliable estimates. In comparison to sliding windows, fading factors are faster and memory-less, a requirement for streaming applications. This paper is a contribution to the discussion of good practices in performance assessment when learning dynamic models that evolve over time.
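The fading-factor prequential estimator can be written in a few lines: faded loss sum S_i = L_i + alpha * S_(i-1) over faded count N_i = 1 + alpha * N_(i-1). A sketch (the alpha value and loss stream are illustrative):

```python
def prequential_error(losses, alpha=0.99):
    # Faded prequential estimate: old losses are discounted by alpha at
    # each step, so the estimate tracks the recent error level without
    # storing a window of past examples (memory-less).
    s = n = 0.0
    estimates = []
    for loss in losses:
        s = loss + alpha * s        # faded loss sum S_i
        n = 1.0 + alpha * n         # faded example count N_i
        estimates.append(s / n)     # estimate E_i = S_i / N_i
    return estimates

# Error level drops mid-stream; the faded estimate follows it down,
# whereas an unfaded average would stay near 0.5.
est = prequential_error([1.0] * 200 + [0.0] * 200, alpha=0.95)
```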
Article
Full-text available
A linear model tree is a decision tree with a linear functional model in each leaf. Previous model tree induction algorithms have been batch techniques that operate on the entire training set. However, there are many situations where an incremental learner is advantageous. In this article, a new batch model tree learner is described with two alternative splitting rules and a stopping rule. An incremental algorithm is then developed that has many similarities with the batch version but is able to process examples one at a time. An online pruning rule is also developed. The incremental training time for an example is shown to depend only on the height of the tree induced so far, and not on the number of previous examples. The algorithms are evaluated empirically on a number of standard datasets, a simple test function and three dynamic domains ranging from a simple pendulum to a complex 13-dimensional flight simulator. The new batch algorithm is compared with the most recent batch model tree algorithms and is seen to perform favourably overall. The new incremental model tree learner compares well with an alternative online function approximator. In addition, it can sometimes perform almost as well as the batch model tree algorithms, highlighting the effectiveness of the incremental implementation.
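A leaf's linear model can be maintained with constant-time-per-example updates, for instance via recursive least squares (a generic sketch of the technique, not the article's exact update rule; the true coefficients below are made up):

```python
import numpy as np

def rls_update(theta, P, x, y, lam=1.0):
    # Recursive least squares: refine the linear leaf model theta after
    # one example (x, y) without revisiting earlier examples.
    # P approximates the inverse covariance; lam is a forgetting factor.
    Px = P @ x
    k = Px / (lam + x @ Px)             # gain vector
    theta = theta + k * (y - x @ theta)  # correct by the prediction error
    P = (P - np.outer(k, Px)) / lam
    return theta, P

rng = np.random.default_rng(0)
theta, P = np.zeros(2), np.eye(2) * 1e3
for _ in range(500):
    x = rng.normal(size=2)
    y = 2.0 * x[0] - 1.0 * x[1]          # true leaf model: [2, -1]
    theta, P = rls_update(theta, P, x, y)
```

Each update costs O(d^2) in the number of leaf features, independent of how many examples the leaf has seen.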
Article
Full-text available
Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
Article
Full-text available
High dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high dimensional data, even the concept of proximity or clustering may not be meaningful. We discuss very general techniques for projected clustering which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques, which limit the method to projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of trying to redefine clustering for high dimensional applications by searching for hidden subspaces with clusters which are created by inter-attribute correlations. We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable, and trade off against accuracy.
Article
Partial least squares (PLS) modeling is an algorithm for relating one or more dependent variables to two or more independent variables. As a regression procedure it apparently evolved from the method of principal components regression (PCR) using the NIPALS algorithm, which is similar to the power method for determining the eigenvectors and eigenvalues of a matrix. This paper presents a theoretical explanation of the PLS algorithm using singular value decomposition and the power method. The relation of PLS to PCR is demonstrated, and PLS is shown to be one of a continuum of possible solutions of a similar type. These other solutions may give better prediction than either PLS or PCR under appropriate conditions.
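A compact PLS1/NIPALS sketch for a single response, showing the power-method flavour of the weight extraction (with all components on noiseless full-rank data it reduces to the ordinary least squares solution; the data below are synthetic):

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    # PLS1 via NIPALS: each weight vector is X'y (normalized), akin to one
    # power-method step on the covariance; X and y are then deflated.
    X, y = X.copy(), y.astype(float).copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                      # scores
        tt = t @ t
        p = X.T @ t / tt               # X loadings
        q.append(t @ y / tt)           # y loading
        X -= np.outer(t, p)            # deflate X
        y -= t * q[-1]                 # deflate y
        W.append(w); P.append(p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Coefficients mapping the original X to the predicted y.
    return W @ np.linalg.solve(P.T @ W, q)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true                      # noiseless response
beta = pls1_nipals(X, y, n_components=3)
```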
Article
In recent years, several algorithms have appeared for modifying the factors of a matrix following a rank-one change. These methods have always been given in the context of specific applications and this has probably inhibited their use over a wider field. In this report, several methods are described for modifying Cholesky factors. Some of these have been published previously while others appear for the first time. In addition, a new algorithm is presented for modifying the complete orthogonal factorization of a general matrix, from which the conventional $QR$ factors are obtained as a special case. A uniform notation has been used and emphasis has been placed on illustrating the similarity between different methods.
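One representative of the methods surveyed, the rank-one Cholesky update, can be sketched in a few lines: given a factor of A, it produces a factor of A + xx' in O(n^2) instead of the O(n^3) cost of refactorizing (a minimal sketch using the Givens-rotation-style recurrence; the matrix and vector are made up):

```python
import numpy as np

def cholesky_update(L, x):
    # Given lower-triangular L with A = L L', return the lower-triangular
    # factor of A + x x' without recomputing the factorization.
    L, x = L.copy(), x.copy()
    n = len(x)
    for k in range(n):
        r = np.hypot(L[k, k], x[k])          # rotated diagonal entry
        c, s = r / L[k, k], x[k] / L[k, k]   # rotation coefficients
        L[k, k] = r
        L[k+1:, k] = (L[k+1:, k] + s * x[k+1:]) / c
        x[k+1:] = c * x[k+1:] - s * L[k+1:, k]
    return L

A = np.array([[4.0, 2.0], [2.0, 3.0]])
x = np.array([1.0, 2.0])
L1 = cholesky_update(np.linalg.cholesky(A), x)  # factor of A + x x'
```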
Article
Partial least squares (PLS) regression is effectively used in process modeling and monitoring to deal with a large number of variables with collinearity. In this paper, several recursive partial least squares (RPLS) algorithms are proposed for on-line process modeling to adapt to process changes and for off-line modeling to deal with a large number of data samples. A block-wise RPLS algorithm is proposed with moving window and forgetting factor adaptation schemes. The block-wise RPLS algorithm is also used off-line to reduce computation time and computer memory usage in PLS regression and cross-validation. As a natural extension, the recursive algorithm is extended to dynamic modeling and nonlinear modeling. An application of the block recursive PLS algorithm to a catalytic reformer is presented to adapt the model based on new data.
Conference Paper
A linear model tree is a decision tree with a linear functional model in each leaf. In previous work we demonstrated that such trees can be learnt incrementally, and can form good models of non-linear dynamic environments. In this paper we introduce a new incremental node splitting criterion that is significantly faster than both our previous algorithm and other non-parametric incremental learning techniques, and in addition scales better with dimensionality. Empirical results in three domains ranging from a simple benchmark test function to a complex ten-dimensional flight simulator show that in all cases the algorithm converges to a good final approximation, although the improved performance comes at the cost of slower initial learning.
Article
A new method is presented for flexible regression modeling of high dimensional data. The model takes the form of an expansion in product spline basis functions, where the number of basis functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the data. This procedure is motivated by the recursive partitioning approach to regression and shares its attractive properties. Unlike recursive partitioning, however, this method produces continuous models with continuous derivatives. It has more power and flexibility to model relationships that are nearly additive or involve interactions in at most a few variables. In addition, the model can be represented in a form that separately identifies the additive contributions and those associated with the different multivariable interactions.
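The product-spline expansion described above can be sketched with truncated linear (hinge) basis functions, whose mirrored pairs and products give a continuous piecewise-linear surface with interactions; the knots and coefficients below are made up for illustration:

```python
def hinge(x, knot, direction):
    # MARS-style basis function: a truncated linear spline,
    # max(0, x - t) or max(0, t - x) depending on direction.
    return max(0.0, direction * (x - knot))

def mars_like(x1, x2):
    # Additive hinge terms plus one product term for an interaction;
    # the result is continuous with continuous pieces joined at knots.
    return (1.0
            + 0.5 * hinge(x1, 2.0, +1)                         # additive in x1
            - 0.3 * hinge(x2, 1.0, -1)                         # additive in x2
            + 0.2 * hinge(x1, 2.0, +1) * hinge(x2, 1.0, +1))   # interaction

val = mars_like(3.0, 2.0)   # 1 + 0.5*1 - 0.3*0 + 0.2*1*1 = 1.7
```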
Article
Some empirical learning tasks are concerned with predicting values rather than the more familiar categories. This paper describes a new system, M5, that constructs tree-based piecewise linear models. Four case studies are presented in which M5 is compared to other methods.
Article
Developing regression models for large datasets that are both accurate and easy to interpret is a very important data mining problem. Regression trees with linear models in the leaves satisfy both these requirements, but thus far, no truly scalable regression tree algorithm is known. This paper proposes a novel regression tree construction algorithm (SECRET) that produces trees of high quality and scales to very large datasets. At every node, SECRET uses the EM algorithm for Gaussian mixtures to find two clusters in the data and to locally transform the regression problem into a classification problem based on closeness to these clusters.
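The clustering-to-classification idea can be sketched as follows (a two-means loop stands in for the Gaussian-mixture EM used in the paper, and all names and data are made up for illustration):

```python
import numpy as np

def secret_style_split(X, y, feature):
    # SECRET-style sketch: cluster the joint (input, output) points into
    # two groups, then place the split on `feature` between the two
    # groups, turning split selection into a classification problem.
    Z = np.column_stack([X, y])
    centers = Z[[0, -1]].astype(float)       # crude initialization
    for _ in range(20):                      # two-means iterations
        d = np.linalg.norm(Z[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for c in (0, 1):
            if (labels == c).any():
                centers[c] = Z[labels == c].mean(axis=0)
    # Threshold at the midpoint between the two groups' feature means.
    m0 = X[labels == 0, feature].mean()
    m1 = X[labels == 1, feature].mean()
    return (m0 + m1) / 2

X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.where(X[:, 0] < 0.5, 0.0, 5.0)   # step function: true split at 0.5
t = secret_style_split(X, y, feature=0)
```

Because the response jump dominates the joint distances, the two clusters align with the two regression regimes and the recovered threshold lands near the true split point.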