Article

Online linear and quadratic discriminant analysis with adaptive forgetting for streaming classification

Abstract

Advances in data technology have enabled streaming acquisition of real-time information in a wide range of settings, including consumer credit, electricity consumption, and internet user behavior. Streaming data consist of transiently observed, temporally evolving data sequences, and pose novel challenges to statistical analysis. Foremost among these challenges are the need for online processing, and temporal adaptivity in the face of unforeseen changes, both smooth and abrupt, in the underlying data generation mechanism. In this paper, we develop streaming versions of two widely used parametric classifiers, namely quadratic and linear discriminant analysis. We rely on computationally efficient, recursive formulations of these classifiers. We additionally equip them with exponential forgetting factors that enable temporal adaptivity via smoothly down-weighting the contribution of older data. Drawing on ideas from adaptive filtering, we develop an online method for self-tuning forgetting factors on the basis of an approximate gradient scheme. We provide extensive simulation and real data analysis that demonstrate the effectiveness of the proposed method in handling diverse types of change, while simultaneously offering monitoring capabilities via interpretable behavior of the adaptive forgetting factors. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
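To make the idea of exponential forgetting concrete, the following minimal sketch tracks a scalar mean with a fixed forgetting factor. It is an illustration only and not the paper's classifier: the function name `forgetting_mean`, the parameter `lam`, and the toy stream are assumptions introduced here.

```python
def forgetting_mean(stream, lam=0.9):
    """Yield the exponentially weighted mean after each observation.

    lam is a fixed forgetting factor in (0, 1]; lam = 1.0 gives the
    ordinary running mean (no forgetting).
    """
    m = 0.0  # forgetting-weighted sum of observations
    w = 0.0  # sum of the weights (effective sample size)
    for x in stream:
        m = lam * m + x      # older observations are scaled down by lam
        w = lam * w + 1.0    # the newest observation always has weight 1
        yield m / w


# Toy stream with an abrupt change from level 0 to level 1 (hypothetical data).
stream = [0.0] * 100 + [1.0] * 100
ff = list(forgetting_mean(stream, lam=0.9))
plain = sum(stream) / len(stream)
print(ff[-1], plain)
```

Each update needs constant memory and a single pass over the data. On this toy stream the forgetting-factor estimate ends close to the new level, whereas the unweighted mean stays at 0.5.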

... The foundation of making a good selection is to correctly and efficiently track the expected reward of the arms, especially in the context of time-evolving reward distributions. Adaptive estimation approaches are useful for this task as they provide an estimator that follows a moving target (Anagnostopoulos et al., 2012; Bodenham and Adams, 2016); here the target is the expected reward. In this section, we introduce how to use an AFF estimator for monitoring a single arm. ...
... One problem with this estimator is that it often fails in the case that the reward distribution changes over time. The adaptive filtering literature (Haykin, 2002) provides a generic and practical tool to track a time-evolving data stream, and it has been recently adapted to a variety of streaming machine learning problems (Anagnostopoulos et al., 2012; Bodenham and Adams, 2016). The key idea behind adaptive estimation is to gradually reduce the weight on older data as new data arrives (Haykin, 2002). ...
... Here, we choose $L_t = (\hat{Y}_{t-1} - Y_t)^2$ for good mean tracking performance, which can be interpreted as the one-step-ahead squared prediction error. Other choices are possible, such as the one-step-ahead negative log-likelihood (Anagnostopoulos et al., 2012), but this will not be pursued here. In addition, $\Delta(L_t, \lambda_{t-2})$ is a derivative-like function of $L_t$ with respect to $\lambda_{t-2}$ (see Bodenham and Adams, 2016, sect. ...
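The excerpt above describes tuning the forgetting factor by a gradient-like step on the one-step-ahead squared prediction error. A minimal sketch of that idea for a scalar mean follows. It is an assumption-laden illustration, not the cited authors' exact update: the function name `aff_mean`, the step size `eta`, the clipping bounds, and the derivative recursions (which treat the forgetting factor as if it had been fixed so far) are all choices made here.

```python
def aff_mean(stream, lam0=0.95, eta=0.001, lam_min=0.6, lam_max=0.99):
    """Track a mean with a forgetting factor tuned by approximate gradient descent.

    Returns (estimates, lambdas), one entry per observation.
    """
    lam = lam0
    m = w = 0.0         # forgetting-weighted sum and total weight
    dm = dw = 0.0       # derivatives of m and w with respect to lam
    mean = dmean = 0.0  # current estimate and its derivative
    estimates, lambdas = [], []
    for x in stream:
        if w > 0.0:
            # One-step-ahead squared prediction error (mean - x)**2 and its
            # approximate gradient with respect to lam.
            err = mean - x
            grad = 2.0 * err * dmean
            lam = min(lam_max, max(lam_min, lam - eta * grad))
        # The derivative recursions use the pre-update values of m and w.
        dm = lam * dm + m
        dw = lam * dw + w
        m = lam * m + x
        w = lam * w + 1.0
        mean = m / w
        dmean = (dm - mean * dw) / w  # quotient rule applied to m / w
        estimates.append(mean)
        lambdas.append(lam)
    return estimates, lambdas


# Hypothetical noiseless stream with an abrupt jump from 0 to 10.
est, lams = aff_mean([0.0] * 200 + [10.0] * 500)
print(min(lams), est[-1])
```

On this stream the forgetting factor stays at its initial value while the data are constant and drops shortly after the jump, which is the kind of interpretable behaviour the excerpts refer to. The gradient depends on the scale of the data, so in practice `eta` would need to be chosen or normalised accordingly.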
Preprint
The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change, and the optimal arm can switch over time. Motivated by many real applications where rewards are binary, we focus on dynamic Bernoulli bandits. Standard methods like ϵ-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track changes in the underlying reward for dynamic problems. In this paper, we overcome the shortcoming of slow response to change by deploying adaptive estimation in the standard methods and propose a new family of algorithms, which are adaptive versions of ϵ-Greedy, UCB, and Thompson sampling. These new methods are simple and easy to implement. Moreover, they do not require any prior knowledge about the dynamic reward process, which is important for real applications. We examine the new algorithms numerically in different scenarios and the results show solid improvements of our algorithms in dynamic environments.
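As a rough illustration of replacing the sample mean in ϵ-Greedy with a forgetting-based estimate, the sketch below uses a fixed forgetting factor per arm. This is a simplification chosen here, since the abstract describes adaptive estimation; the function name `ff_epsilon_greedy`, its parameters, the zero initial estimate for unpulled arms, and the toy reward schedule are all assumptions.

```python
import random


def ff_epsilon_greedy(probs_by_step, n_arms, lam=0.9, eps=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms with forgetting-factor reward estimates.

    probs_by_step: one list of per-arm success probabilities per time step.
    Returns (final_estimates, total_reward).
    """
    rng = random.Random(seed)
    m = [0.0] * n_arms  # forgetting-weighted reward sums
    w = [0.0] * n_arms  # forgetting-weighted pull counts

    def estimates():
        # Arms that were never pulled get an estimate of 0.0 (a design choice).
        return [m[a] / w[a] if w[a] > 0.0 else 0.0 for a in range(n_arms)]

    total = 0
    for probs in probs_by_step:
        est = estimates()
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: est[a])    # exploit
        reward = 1 if rng.random() < probs[arm] else 0
        # Only the pulled arm is updated; its older rewards are down-weighted.
        m[arm] = lam * m[arm] + reward
        w[arm] = lam * w[arm] + 1.0
        total += reward
    return estimates(), total


# Hypothetical dynamic bandit: the optimal arm switches halfway through.
schedule = [[1.0, 0.0]] * 500 + [[0.0, 1.0]] * 500
final_est, total_reward = ff_epsilon_greedy(schedule, n_arms=2)
print(final_est, total_reward)
```

With a plain sample mean the estimate for the formerly optimal arm would decay only at rate 1/n after the switch; with forgetting it decays geometrically in the number of pulls, so the greedy choice can move to the new optimal arm sooner.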
... The SGD's iterative nature makes it possible to apply it on data streams by buffering the data points in fixed-size batches [7]. Figure 2 illustrates this idea by analyzing a NN composed of a single linear neuron with two weights and no bias. Let's assume that the NN at d C1 , when a first abrupt drift occurs, has learned the decision boundary illustrated in Fig. 2.a. ...
... We, therefore, input the data points in chronological order. Notice that, in this way, we are not minimizing the loss function to all the data but only to the most recently seen data points [1]. Indeed, the literature on data streams [2] commonly assigns greater weight to recent data points because we expect that future data points related to the current concept will bear greater similarity to recent data. ...
... We evaluate the final accuracy in four cases to verify the two hypotheses for each concept. The first two cases ([1,50] and [1,100]) analyze how models adapt to the new concept by considering the accuracy at the end of the first 50 ...
Chapter
Dealing with an unbounded data stream involves overcoming the assumption that data is identically distributed and independent. A data stream can, in fact, exhibit temporal dependencies (i.e., be a time series), and data can change distribution over time (concept drift). The two problems are deeply discussed, and existing solutions address them separately: a joint solution is absent. In addition, learning multiple concepts implies remembering the past (a.k.a. avoiding catastrophic forgetting in Neural Networks’ terminology). This work proposes Continuous Progressive Neural Networks (cPNN), a solution that tames concept drifts, handles temporal dependencies, and bypasses catastrophic forgetting. cPNN is a continuous version of Progressive Neural Networks, a methodology for remembering old concepts and transferring past knowledge to fit the new concepts quickly. We base our method on Recurrent Neural Networks and exploit the Stochastic Gradient Descent applied to data streams with temporal dependencies. Results of an ablation study show a quick adaptation of cPNN to new concepts and robustness to drifts.
... This paper introduces an 'adaptive detection and estimation procedure for transition matrices', referred to as ADEPT-M, which sequentially and adaptively estimates a transition matrix while continuously monitoring a data stream for changepoints. Temporal adaptivity is introduced via forgetting factors (Bodenham and Adams 2016; Anagnostopoulos et al. 2012; Pavlidis et al. 2011), which are a sequence of scalars that continuously down-weight older observations as newer data arrives. The forgetting factors (FFs) can be tuned online without user supervision, which removes the burden of having to subjectively specify their values. ...
... A way of producing temporally adaptive parameter estimates is by introducing FFs into the estimation process. FFs have been considered in Bodenham and Adams (2016), Anagnostopoulos et al. (2012), Pavlidis et al. (2011) and Haykin (2008), and are scalars in [0, 1] that continuously down-weight older data as newer data arrives. Therefore, if drift occurs, older data is 'washed-out', allowing the estimation to be driven by newer, more informative observations. ...
... This, along with the fact that the FFs are recursively defined in terms of one another, makes an exact gradient computation challenging. However, as in Anagnostopoulos et al. (2012), for small step-sizes it can be argued that the FFs are approximately fixed. The gradient of $J(\cdot \mid \cdot)$ is then computed by assuming it is a function of a single, fixed, FF $\lambda^{(i)}$. ...
Article
Sequentially detecting multiple changepoints in a data stream is a challenging task. Difficulties relate to both computational and statistical aspects, and in the latter, specifying control parameters is a particular problem. Choosing control parameters typically relies on unrealistic assumptions, such as the distributions generating the data, and their parameters, being known. This is implausible in the streaming paradigm, where several changepoints will exist. Further, current literature is mostly concerned with streams of continuous-valued observations, and focuses on detecting a single changepoint. There is a dearth of literature dedicated to detecting multiple changepoints in transition matrices, which arise from a sequence of discrete states. This paper makes the following contributions: a complete framework is developed for adaptively and sequentially estimating a Markov transition matrix in the streaming data setting. A change detection method is then developed, using a novel moment matching technique, which can effectively monitor for multiple changepoints in a transition matrix. This adaptive detection and estimation procedure for transition matrices, referred to as ADEPT-M, is compared to several change detectors on synthetic data streams, and is implemented on two real-world data streams – one consisting of over nine million HTTP web requests, and the other being a well-studied electricity market data set.
... In Section 2.3.3, we provide the mathematical foundations for the QDA classifier as well as an adaptive implementation of it based on [140]. ...
... The parameters are typically estimated using a training dataset offline and then used for prediction. However, an adaptive implementation of the QDA classifier developed by Anagnostopoulos et al. [140] uses online learning with forgetting factor λ as shown in (2.22). ...
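Equation (2.22) of the citing thesis is not reproduced on this page. As a rough illustration of what an online QDA update with a forgetting factor can look like, the sketch below handles a single feature with a fixed forgetting factor. The class name `StreamingQDA1D`, the variance floor, and the choice to update class-conditional statistics only when that class is observed are assumptions made here, not the cited derivation.

```python
import math


class StreamingQDA1D:
    """Univariate QDA whose priors, means and variances are updated online."""

    def __init__(self, classes, lam=0.95, var_floor=1e-6):
        self.classes = list(classes)
        self.lam = lam
        self.var_floor = var_floor
        self.n = {k: 0.0 for k in self.classes}     # forgetting-weighted class counts
        self.w = {k: 0.0 for k in self.classes}     # per-class total weight
        self.mean = {k: 0.0 for k in self.classes}  # per-class weighted mean
        self.s = {k: 0.0 for k in self.classes}     # weighted sum of squared deviations

    def update(self, x, y):
        # Prior counts: every class is down-weighted, the indicator I(y = k) is added.
        for k in self.classes:
            self.n[k] = self.lam * self.n[k] + (1.0 if k == y else 0.0)
        # Weighted incremental mean and variance for the observed class only.
        w_new = self.lam * self.w[y] + 1.0
        old_mean = self.mean[y]
        new_mean = old_mean + (x - old_mean) / w_new
        self.s[y] = self.lam * self.s[y] + (x - old_mean) * (x - new_mean)
        self.mean[y], self.w[y] = new_mean, w_new

    def score(self, x, k):
        # Gaussian log-posterior up to an additive constant.
        var = max(self.s[k] / self.w[k], self.var_floor)
        prior = self.n[k] / sum(self.n.values())
        return math.log(prior) - 0.5 * math.log(var) - 0.5 * (x - self.mean[k]) ** 2 / var

    def predict(self, x):
        seen = [k for k in self.classes if self.w[k] > 0.0]
        return max(seen, key=lambda k: self.score(x, k))


# Hypothetical stream: class 0 near 0 and class 1 near 5, then the locations swap.
clf = StreamingQDA1D(classes=[0, 1], lam=0.95)
offsets = [-1.0, 0.0, 1.0]
for t in range(300):
    y = t % 2
    clf.update((5.0 if y else 0.0) + offsets[t % 3], y)
before = clf.predict(0.2)
for t in range(600):
    y = t % 2
    clf.update((0.0 if y else 5.0) + offsets[t % 3], y)
after = clf.predict(0.2)
print(before, after)
```

Because old observations are down-weighted geometrically, the class means move to their new locations after the swap and the predicted label for the same input changes, without retraining from scratch.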
... where I(x = k) is the indicator function that is equal to 1 when the value of x is equal to that of k; else it is 0. A complete derivation can be found in [140]. ...
Thesis
Intelligent machines, and more broadly, intelligent systems, are becoming increasingly common in the everyday lives of humans. Nonetheless, despite significant advancements in automation, human supervision and intervention are still essential in almost all sectors, ranging from manufacturing and transportation to disaster-management and healthcare. These intelligent machines interact and collaborate with humans in a way that demands a greater level of trust between human and machine. While a lack of trust can lead to a human's disuse of automation, over-trust can result in a human trusting a faulty autonomous system which could have negative consequences for the human. Therefore, human trust should be calibrated to optimize these human-machine interactions. This calibration can be achieved by designing human-aware automation that can infer human behavior and respond accordingly in real-time. In this dissertation, I present a probabilistic framework to model and calibrate a human's trust and workload dynamics during his/her interaction with an intelligent decision-aid system. More specifically, I develop multiple quantitative models of human trust, ranging from a classical state-space model to a classification model based on machine learning techniques. Both models are parameterized using data collected through human-subject experiments. Thereafter, I present a probabilistic dynamic model to capture the dynamics of human trust along with human workload. This model is used to synthesize optimal control policies aimed at improving context-specific performance objectives that vary automation transparency based on human state estimation. I also analyze the coupled interactions between human trust and workload to strengthen the model framework. Finally, I validate the optimal control policies using closed-loop human subject experiments. 
The proposed framework provides a foundation toward widespread design and implementation of real-time adaptive automation based on human states for use in human-machine interactions.
... An alternate method for producing adaptive estimates is to use forgetting factors. Forgetting factors are commonly used in adaptive filtering (Haykin 2008) and have been successfully used in various statistical stream mining settings (Anagnostopoulos et al. 2012; Bodenham and Adams 2017; Pavlidis et al. 2011). Forgetting factors are a sequence of scalars, which continuously downweight historic data as new data are observed without introducing any computational burden. ...
... Forgetting factors are a sequence of scalars, taking values in [0, 1], that downweight historic data as newer data arrive. Forgetting factors have been previously considered (Anagnostopoulos et al. 2012; Bodenham and Adams 2017; Pavlidis et al. 2011) and can be interpreted as a continuous analog of the sliding window approach, as discussed in Sect. 1. [Fig. 1: An example of a stream with two changepoints at times $\tau_1$ and $\tau_2$, and detections made by a detector at times $\hat{\tau}_1$ and $\hat{\tau}_2$. The shaded regions indicate grace periods where the detector is not monitoring for any changepoints.] In Sect. ...
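The sliding-window analogy in the excerpt above can be made concrete with a short calculation. Assuming a fixed forgetting factor $\lambda \in (0, 1)$ and a weight of $\lambda^k$ on the observation that is $k$ steps old (both assumptions made here for illustration), the total weight after $t$ observations is a geometric sum:

```latex
w_t \;=\; \sum_{k=0}^{t-1} \lambda^{k}
    \;=\; \frac{1 - \lambda^{t}}{1 - \lambda}
    \;\xrightarrow[t \to \infty]{}\; \frac{1}{1 - \lambda},
\qquad\text{e.g. } \lambda = 0.95 \Rightarrow w_\infty = 20,
\quad \lambda = 0.99 \Rightarrow w_\infty = 100 .
```

Under these assumptions, a forgetting factor of 0.95 behaves roughly like a window of 20 observations, except that the weights decay smoothly with age instead of dropping to zero at the window edge.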
... Similar to Anagnostopoulos et al. (2012), consider the weighted log-likelihood function given by ...
Article
The need for efficient tools is pressing in the era of big data, particularly in streaming data applications. As data streams are ubiquitous, the ability to accurately detect multiple changepoints, without affecting the continuous flow of data, is an important issue. Change detection for categorical data streams is understudied, and existing work commonly introduces fixed control parameters while providing little insight into how they may be chosen. This is ill-suited to the streaming paradigm, motivating the need for an approach that introduces few parameters which may be set without requiring any prior knowledge of the stream. This paper introduces such a method, which can accurately detect changepoints in categorical data streams with fixed storage and computational requirements. The detector relies on the ability to adaptively monitor the category probabilities of a multinomial distribution, where temporal adaptivity is introduced using forgetting factors. A novel adaptive threshold is also developed which can be computed given a desired false positive rate. This method is then compared to sequential and nonsequential change detectors in a large simulation study which verifies the usefulness of our approach. A real data set consisting of nearly 40 million events from a computer network is also investigated.
... We consider generative models in this work and incorporate dynamic characteristics using the prior class probabilities based on Markov decision process as discussed in Section III-B. In Section III-A, we provide the mathematical foundations for the QDA classifier as well as an adaptive implementation of it based on [11]. ...
... The parameters are typically estimated using a training dataset offline and then used for prediction. However, an adaptive implementation of the QDA classifier developed by Anagnostopoulos et al. [11] uses online learning with forgetting factor λ as shown in (4). ...
... where I(x = k) is the indicator function that is equal to 1 when the value of x is equal to that of k; else it is 0. A complete derivation can be found in [11]. ...
... Adaptive estimation is among the proposed techniques for handling temporal dependencies of data. For example, in [3,11,32,37], the approaches employed adaptive estimation methodology by considering a so-called forgetting factor. Here, according to a decay function of the forgetting factor, the underlying assumption is that the importance of a data point in a stream is inversely proportional to its age. ...
... An example of such an approach has been suggested by Bodenham et al. [11]. They adopted a so-called forgetting factor, originally suggested by Anagnostopoulos et al. [3], to develop a new approach for continuously monitoring changes in a data stream, using adaptive estimation. They proposed an exponential forgetting factor method to decrease the importance of data according to a decay function, which is inversely proportional to the age of the observed data. ...
... Adaptive estimation approaches were previously proposed to handle issues with the uncertainty of data. In streaming data, the importance of historical data is weighted using a forgetting function with a decay factor [3,11,41]. This means that, in the stream, the most recent data is more important than the older ones. A decay function is also useful to flush any noise when detecting concept drifts. ...
Article
Detection of changes in streaming data is an important mining task, with a wide range of real-life applications. Numerous algorithms have been proposed to efficiently detect changes in streaming data. However, the limitation of existing algorithms is that they assume that data are generated independently. In particular, temporal dependencies of data in a stream are still not thoroughly studied. Motivated by this, in this work we propose a new efficient method to detect changes in streaming data by exploring the temporal dependencies of data in the stream. As part of this, we introduce a new statistical model called the Candidate Change Point (CCP) model, with which the main idea is to compute the probabilities of finding change points in the stream. The computed probabilities are used to generate a distribution, which is, in turn, used in statistical hypothesis tests to determine the candidate changes. We use the CCP model to develop a new algorithm called Candidate Change Point Detector (CCPD), which detects change points in linear time, and is thus applicable for real-time applications. Our extensive experimental evaluation demonstrates the efficiency and the feasibility of our approach.
... The foundation of making a good selection is to correctly and efficiently track the expected reward of the arms, especially in the context of time-evolving reward distributions. Adaptive estimation approaches are useful for this task, as they provide an estimator that follows a moving target more closely; here the target is the expected reward (Anagnostopoulos et al., 2012; Bodenham and Adams, 2016). In this section, we introduce how to use an Adaptive Forgetting Factor (AFF) estimator for monitoring a single arm. ...
... One problem with this estimator is that it often fails in the case that the reward distribution changes over time. The adaptive filtering literature (Haykin, 2002) provides a generic and practical tool to track a time-evolving data stream, and it has been recently adapted to a variety of streaming machine learning problems (Anagnostopoulos et al., 2012; Bodenham and Adams, 2016). The key idea behind adaptive estimation is to gradually reduce the weight on older data as new data arrives (Haykin, 2002). ...
... Here, we choose $L_t = (\hat{Y}_{t-1} - Y_t)^2$ for good mean tracking performance, which can be interpreted as the one-step-ahead squared prediction error. Other choices are possible, such as the one-step-ahead negative log-likelihood (Anagnostopoulos et al., 2012), but this will not be pursued here. In addition, $\Delta(L_t, \lambda_{t-2})$ is a derivative-like function of $L_t$ with respect to $\lambda_{t-2}$ (see Bodenham and Adams, 2016, sect. ...
Article
The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change, and the optimal arm can switch over time. Motivated by many real applications where rewards are binary counts, we focus on dynamic Bernoulli bandits. Standard methods like ϵ-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track the changes in underlying reward for dynamic problems. In this paper, we overcome the shortcoming of slow response to change by deploying adaptive estimation in the standard methods and propose a new family of algorithms, which are adaptive versions of ϵ-Greedy, UCB, and Thompson sampling. These new methods are simple and easy to implement. Moreover, they do not require any prior knowledge about the data, which is important for real applications. We examine the new algorithms numerically in different scenarios, and the results show solid improvements of our algorithms in dynamic environments.
... Specifically, such implementations need only examine each streaming datum once, and have a constant and low memory demand. Moreover, other ideas from adaptive filter theory can yield automatic sequential selection of the forgetting factor (Anagnostopoulos et al. 2012). The capability to automatically set such control parameters is crucial for streaming applications, since human intervention is impractical. ...
... Adaptive estimation approaches are not primarily intended for explicit change detection, but instead for keeping an estimator close to a time-varying target. This methodology has been successfully deployed, or adapted, to a number of streaming data problems (Anagnostopoulos et al. 2012;Pavlidis et al. 2011;Adams et al. 2010). In this paper, adaptive estimation is used to provide an up-to-date estimator that will provide some resilience to the errors that can occur in change detection, such as false positives and missed detections. ...
... Interestingly, Eqs. (16) and (17) agree with those given in Anagnostopoulos et al. (2012). However, there the derivative was defined by assuming $\lambda_{i+1} = \lambda_i$, which is a counterintuitive assumption to make in order to derive update equations for time-varying $\vec{\lambda}$. ...
Article
Data streams are characterised by a potentially unending sequence of high-frequency observations which are subject to unknown temporal variation. Many modern streaming applications demand the capability to sequentially detect changes as soon as possible after they occur, while continuing to monitor the stream as it evolves. We refer to this problem as continuous monitoring. Sequential algorithms such as CUSUM, EWMA and their more sophisticated variants usually require a pair of parameters to be selected for practical application. However, the choice of parameter values is often based on the anticipated size of the changes and a given choice is unlikely to be optimal for the multiple change sizes which are likely to occur in a streaming data context. To address this critical issue, we introduce a changepoint detection framework based on adaptive forgetting factors that, instead of multiple control parameters, only requires a single parameter to be selected. Simulated results demonstrate that this framework has utility in a continuous monitoring setting. In particular, it reduces the burden of selecting parameters in advance. Moreover, the methodology is demonstrated on real data arising from Foreign Exchange markets.
... In such a setting data is assumed to be arriving continuously. Applications involving streaming data are abundant, ranging from finance [2] to social networks [14] and neuroscience [24]. In this work we are motivated by the latter application where GGMs are frequently used to model statistical dependencies across spatially remote brain regions, referred to as functional connectivity [21]. ...
... This is achieved by quantifying the performance of current parameter estimates for new observations, $X_t$. Initially, such schemes were proposed in the context of least squares estimation but have recently been extended to tasks such as tracking second order statistics [2]. Throughout this work we denote such a measure by $C(X_t)$. ...
... The motivation for the use of co-ordinate descent algorithms here is two-fold. First, an estimate of the sample covariance can easily be obtained in an online fashion [2]. Second, at each iteration new information is incorporated and a new lasso problem must be solved. ...
Article
We propose a framework to perform streaming covariance selection. Our approach employs regularization constraints where a time-varying sparsity parameter is iteratively estimated via stochastic gradient descent. This allows for the regularization parameter to be efficiently learnt in an online manner. The proposed framework is developed for linear regression models and extended to graphical models via neighbourhood selection. We demonstrate the capabilities of such an approach using both synthetic data as well as neuroimaging data.
... Beyond trees, a large variety of adaptive parametric classifiers have been proposed in the literature (see [12] for a review), where often the focus is not on the classifier per se, but rather on successfully managing the trade-off between retaining obsolete information on one hand, and discarding useful historical information on the other. This trade-off is often handled via adaptive forgetting factors, where historical information is smoothly downweighted in an adaptive manner that aims to capture not only the presence, but also the speed of the drift [13]. ...
... This trade-off is often handled via adaptive forgetting factors, where historical information is smoothly downweighted in an adaptive manner that aims to capture not only the presence, but also the speed of the drift [13]. In [12], it is argued that long-term stability of the data discarding/forgetting mechanism, as opposed to the flexibility of the underlying classifier, can often be the real performance bottleneck in streaming classification, and an approach is proposed that seems to be stable over arbitrarily long time horizons. ...
... In [34], this family of priors is shown to satisfy desirable information-theoretic optimality properties. Exponential downweighting as a means of enabling temporal adaptivity also has a long tradition in non-stationary signal processing [14], as well as streaming classification [12], where λ is often referred to as a forgetting factor. We only consider historical discarding in this section, since active discarding relies on an i.i.d. ...
Article
Ubiquitous automated data collection at an unprecedented scale is making available streaming, real-time information flows in a wide variety of settings, transforming both science and industry. Learning algorithms deployed in such contexts often rely on single-pass inference, where the data history is never revisited. Learning may also need to be temporally adaptive to remain up-to-date against unforeseen changes in the data generating mechanism. Online Bayesian inference remains challenged by such transient, evolving data streams. Nonparametric modeling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting information theoretic heuristics, such as exponential forgetting and active learning, into a fully Bayesian framework. We showcase our methods by augmenting a modern non-parametric modeling framework, dynamic trees, and illustrate its performance on a number of practical examples. The end product is a powerful streaming regression and classification tool, whose performance compares favorably to the state-of-the-art.
... Moreover, the use of adaptive forgetting also provides an additional monitoring mechanism. By considering the estimated value of the forgetting factor $r_t$ at any given point in time we can gain an understanding as to the current degree of non-stationarity in the data [Anagnostopoulos et al., 2012]. This follows from the fact that the estimated forgetting factor quantifies the influence of recent observations on the sample mean and covariance. ...
... Moreover due to the highly autocorrelated nature of fMRI time series, splitting past observations into subsets is itself non-trivial. Here we build on the work of Anagnostopoulos et al. [2012] and employ adaptive forgetting methods to maximize this quantity in a computationally efficient manner. This is achieved by approximating the derivative of $L_{t+1}$ with respect to $r_t$. ...
... The results shown in this section are taken from Anagnostopoulos et al. [2012]. ...
Article
There has been an explosion of interest in functional Magnetic Resonance Imaging (MRI) during the past two decades. Naturally, this has been accompanied by many major advances in the understanding of the human connectome. These advances have served to pose novel challenges as well as open new avenues for research. One of the most promising and exciting of such avenues is the study of functional MRI in real-time. Such studies have recently gained momentum and have been applied in a wide variety of settings; ranging from training of healthy subjects to self-regulate neuronal activity to being suggested as potential treatments for clinical populations. To date, the vast majority of these studies have focused on a single region at a time. This is due in part to the many challenges faced when estimating dynamic functional connectivity networks in real-time. In this work we propose a novel methodology with which to accurately track changes in functional connectivity networks in real-time. We adapt the recently proposed SINGLE algorithm for estimating sparse and temporally homogeneous dynamic networks to be applicable in real-time. The proposed method is applied to motor task data from the Human Connectome Project as well as to real-time data obtained while exploring a virtual environment. We show that the algorithm is able to estimate significant task-related changes in network structure quickly enough to be useful in future brain-computer interface applications.
... The algorithms applied in DSA have to fulfil strict criteria in the context of: 1) the speed of data transmission to a program, 2) computational complexity of the algorithm, 3) amount of memory necessary to apply the algorithm. An algorithm should be highly elastic in adaptation to changes in the data generating mechanism [2]. ...
... The estimators applied to DSA may belong to any of the above categories. However, they should be computationally tractable, robust to a small fraction of outliers and inliers, and possess what might be called a forgetting mechanism enabling adaptation to a change in the data generating mechanism regime [2]. The simplest forgetting mechanism may be obtained by introducing estimation based on a moving window of a fixed, random or data driven length. ...
... setting were studied in [18]. The authors proposed several DD-plot based statistics and presented bootstrap arguments for their consistency and high effectiveness in comparison to Hotelling $T^2$ and multi ...
Article
Data streams (streaming data) consist of transiently observed, evolving in time, multidimensional data sequences that challenge our computational and/or inferential capabilities. In this paper we propose user friendly approaches for robust monitoring of selected properties of unconditional and conditional distributions of the stream based on depth functions. Our proposals are robust to a small fraction of outliers and/or inliers, but at the same time are sensitive to a regime change in the stream. Their implementations are available in our free R package DepthProc.
... Therefore, the randomized Kaczmarz method solution of the least squares formulation of LDA converges to a horizon of the binary-class Gaussian model LDA direction. 3. Convergence guarantees for the randomized Kaczmarz method account for randomness from the iterates [31,40,52,53]. ...
... Similar to the leverage scores, however, both the row norms and Σ would need to be computed prior to the computation of the RK iterates. The idea of viewing batches of the data in sequence is similar to that of streaming discriminant analysis [3,35,62]. ...
Preprint
Full-text available
We present sketched linear discriminant analysis, an iterative randomized approach to binary-class Gaussian model linear discriminant analysis (LDA) for very large data. We harness a least squares formulation and mobilize the stochastic gradient descent framework. Therefore, we obtain a randomized classifier with performance that is very comparable to that of full data LDA while requiring access to only one row of the training data at a time. We present convergence guarantees for the sketched predictions on new data within a fixed number of iterations. These guarantees account for both the Gaussian modeling assumptions on the data and algorithmic randomness from the sketching procedure. Finally, we demonstrate performance with varying step-sizes and numbers of iterations. Our numerical experiments demonstrate that sketched LDA can offer a very viable alternative to full data LDA when the data may be too large for full data analysis.
... Some theoretical issues will also be discussed. 1 The name "Open World Machine Learning" was used to refer machine learning with unseen class [57] or out-of-distribution (OOD) data [61]. In fact, it is not beyond closed-world studies if the unseen class is known in advance, and related to Section 2 if unseen class is unknown. ...
... Sliding window based approaches hold recent instances and discard old ones falling outside the window, with a fixed or adaptive window size [38,42]. Forgetting based approaches assign a weight to each instance, and downweight old instances according to their age [41,1]. Ensemble [87] based approaches add/remove component learners in the ensemble adaptively, and dynamically adjust the weights of learners for incoming instances [28]. ...
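The contrast between the first two families in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any of the cited works: the function names `sliding_window_mean` and `forgetting_mean`, the window size of 3, and the forgetting factor of 0.5 are all hypothetical choices made for the example.

```python
from collections import deque


def sliding_window_mean(stream, window_size):
    """Yield the mean of the most recent `window_size` values."""
    window = deque(maxlen=window_size)  # old items fall out automatically
    for x in stream:
        window.append(x)
        yield sum(window) / len(window)


def forgetting_mean(stream, lam):
    """Yield an exponentially weighted mean with forgetting factor `lam`.

    An observation of age k receives weight lam**k, so lam = 1 recovers
    the ordinary running mean and smaller values forget faster.
    """
    weighted_sum = 0.0  # running sum of lam**age * x
    total_weight = 0.0  # running sum of lam**age
    for x in stream:
        weighted_sum = lam * weighted_sum + x
        total_weight = lam * total_weight + 1.0
        yield weighted_sum / total_weight


# Toy stream with an abrupt change half-way through (illustrative data).
data = [0.0] * 5 + [10.0] * 5
window_estimates = list(sliding_window_mean(data, window_size=3))
forgetting_estimates = list(forgetting_mean(data, lam=0.5))
```

The window discards old values outright once they leave the buffer, whereas the forgetting factor never discards anything and instead shrinks every past contribution geometrically.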
Preprint
Conventional machine learning studies generally assume close world scenarios where important factors of the learning process hold invariant. With the great success of machine learning, nowadays, more and more practical tasks, particularly those involving open world scenarios where important factors are subject to change, called open environment machine learning (Open ML) in this article, are present to the community. Evidently it is a grand challenge for machine learning turning from close world to open world. It becomes even more challenging since, in various big data tasks, data are usually accumulated with time, like streams, while it is hard to train the machine learning model after collecting all data as in conventional studies. This article briefly introduces some advances in this line of research, focusing on techniques concerning emerging new classes, decremental/incremental features, changing data distributions, varied learning objectives, and discusses some theoretical issues.
... The proposed framework seeks to iteratively update the regularization parameter via stochastic gradient descent. As such, the proposed method is conceptually related to adaptive filtering theory [17], discussed below, which has been extensively employed in the context of streaming data [3]. ...
... As expected, there is some lag directly after each change occurs; however, the estimated regression parameters quickly adapt. The right panel of Figure 3 shows the mean residual error, $C_{t+1}$, for unseen data. We note there are abrupt spikes every 100 observations, corresponding to the abrupt changes in the underlying dependence structure. ...
Article
Full-text available
Large scale, streaming datasets are ubiquitous in modern machine learning. Streaming algorithms must be scalable, amenable to incremental training and robust to the presence of non-stationarity. In this work we consider the problem of learning $\ell_1$ regularized linear models in the context of streaming data. In particular, the focus of this work revolves around how to select the regularization parameter when data arrives sequentially and the underlying distribution is non-stationary (implying the choice of optimal regularization parameter is itself time-varying). We propose a novel framework through which to infer an adaptive regularization parameter. Our approach employs an $\ell_1$ penalty constraint where the corresponding sparsity parameter is iteratively updated via stochastic gradient descent. This serves to reformulate the choice of regularization parameter in a principled framework for online learning and allows for the derivation of convergence guarantees in a non-stochastic setting. We validate our approach using simulated and real datasets and present an application to a neuroimaging dataset.
... Moreover, the use of adaptive forgetting also provides an additional monitoring mechanism. By considering the estimated value of the forgetting factor $r_t$ at any given point in time, we can gain an understanding as to the current degree of non-stationarity in the data [Anagnostopoulos et al., 2012]. This follows from the fact that the estimated forgetting factor quantifies the influence of recent observations on the sample mean and covariance. ...
... Moreover, due to the highly autocorrelated nature of fMRI time series, splitting past observations into subsets is itself non-trivial. Here, we build on the work of Anagnostopoulos et al. [2012] and use adaptive forgetting methods to maximize this quantity in a computationally efficient manner. This is achieved by approximating the derivative of $L_{t+1}$ with respect to $r_t$. ...
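To make concrete how a forgetting factor controls the influence of recent observations on the sample mean and covariance, the following Python sketch applies one recursive update per observation. It assumes a fixed forgetting factor `r` rather than the adaptively tuned $r_t$ of the cited work, uses a maximum-likelihood style covariance without bias correction, and the function name `forgetting_update` and the toy data are hypothetical.

```python
def forgetting_update(state, x, r):
    """One recursive update of an exponentially weighted mean and covariance.

    state: (n_eff, mean, cov), with mean a list and cov a list of lists.
    x:     new observation (list of floats).
    r:     forgetting factor in (0, 1]; r = 1 weights all data equally.
    """
    n_eff, mean, cov = state
    n_new = r * n_eff + 1.0  # effective sample size after discounting
    gain = 1.0 / n_new       # weight given to the newest observation
    delta = [xi - mi for xi, mi in zip(x, mean)]
    new_mean = [mi + gain * di for mi, di in zip(mean, delta)]
    # Weighted covariance around the updated mean (no bias correction).
    new_cov = [
        [(1.0 - gain) * (cov[i][j] + gain * delta[i] * delta[j])
         for j in range(len(x))]
        for i in range(len(x))
    ]
    return n_new, new_mean, new_cov


# With r = 1 the recursion reproduces the ordinary sample mean and covariance.
state = (0.0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
for obs in [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]:
    state = forgetting_update(state, obs, r=1.0)
n_eff, mean, cov = state

# With r < 1 the effective sample size saturates near 1 / (1 - r).
short_memory = (0.0, [0.0], [[0.0]])
for _ in range(500):
    short_memory = forgetting_update(short_memory, [1.0], r=0.9)
```

In this sketch the effective sample size is what makes a forgetting factor interpretable: a value close to 1 means the estimates average over a long history, while a smaller value means only the most recent observations matter.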
Article
Two novel and exciting avenues of neuroscientific research involve the study of task-driven dynamic reconfigurations of functional connectivity networks and the study of functional connectivity in real-time. While the former is a well-established field within neuroscience and has received considerable attention in recent years, the latter remains in its infancy. To date, the vast majority of real-time fMRI studies have focused on a single brain region at a time. This is due in part to the many challenges faced when estimating dynamic functional connectivity networks in real-time. In this work, we propose a novel methodology with which to accurately track changes in time-varying functional connectivity networks in real-time. The proposed method is shown to perform competitively when compared to state-of-the-art offline algorithms using both synthetic as well as real-time fMRI data. The proposed method is applied to motor task data from the Human Connectome Project as well as to data obtained from a visuospatial attention task. We demonstrate that the algorithm is able to accurately estimate task-related changes in network structure in real-time. Hum Brain Mapp, 2016. © 2016 Wiley Periodicals, Inc.
... The algorithms applied in DSA must fulfill strict criteria in the context of: 1) the speed of data transmission to a program; 2) computational complexity of the algorithm; 3) amount of memory necessary to apply the algorithm. An algorithm should be highly elastic in adaptation to changes in the data generating mechanism (see [1]). ...
... From a merit point of view, these answers concern evaluations of the future economic perspectives of a company, uncertainty of investment and hence risk evaluation, pricing of the capital etc. However, they should be computationally tractable, robust to a small fraction of outliers and inliers, and possess what might be called a forgetting mechanism enabling adaptation to a change in the data generating mechanism regime (see [1]). The simplest forgetting mechanism may be obtained by introducing estimations based on a moving window of a fixed, random, or data driven length. ...
Article
Full-text available
Data streams (streaming data) consist of continuously observed, non-equally spaced and temporally evolving multidimensional data sequences that challenge our computational and/or inferential capabilities. In economics, data streams are among others related to electricity consumption monitoring, Internet user behavior in exploring, or order book forecasting in high-frequency financial markets. In this paper, we point out and discuss several open problems related to robust data stream analysis and propose three robust and conceptually very simple approaches in this context. We apply the proposals to real data sets related to the activity of investors in the futures contracts market.
... Quadratic discriminant analysis (QDA) is a method that can estimate nonlinear dependencies between complex indicator patterns, such as those in the SP indicators and social controversies. Unlike linear discriminant analysis, it can capture such dependencies by not assuming that the covariance of each class of data is identical (Anagnostopoulos et al., 2012;Ou and Wang, 2009;Yuan et al., 2017). QDA generates a model developed from conditional data densities by constructing a quadratic decision boundary. ...
Article
Purpose The purpose of this study is to develop a method to assess social performance. Traditionally, environment, social and governance (ESG) rating providers use subjectively weighted arithmetic averages to combine a set of social performance (SP) indicators into one single rating. To overcome this problem, this study investigates the preconditions for a new methodology for rating the SP component of the ESG by applying machine learning (ML) and artificial intelligence (AI) anchored to social controversies. Design/methodology/approach This study proposes the use of a data-driven rating methodology that derives the relative importance of SP features from their contribution to the prediction of social controversies. The authors use the proposed methodology to solve the weighting problem with overall ESG ratings and further investigate whether prediction is possible. Findings The authors find that ML models are able to predict controversies with high predictive performance and validity. The findings indicate that the weighting problem with the ESG ratings can be addressed with a data-driven approach. The decisive prerequisite, however, for the proposed rating methodology is that social controversies are predicted by a broad set of SP indicators. The results also suggest that predictively valid ratings can be developed with this ML-based AI method. Practical implications This study offers practical solutions to ESG rating problems that have implications for investors, ESG raters and socially responsible investments. Social implications The proposed ML-based AI method can help to achieve better ESG ratings, which will in turn help to improve SP, which has implications for organizations and societies through sustainable development. Originality/value To the best of the authors’ knowledge, this research is one of the first studies that offers a unique method to address the ESG rating problem and improve sustainability by focusing on SP indicators.
... SQDA extends SLDA estimating one covariance matrix for each class (i.e., Quadratic). The drawbacks are increased memory consumption and need for many samples per class to estimate reliable covariance matrices [26]. Streaming Gaussian Naïve Bayes (SNB) estimates a running variance vector per class [27] (i.e., diagonal covariance matrices assuming independent features). ...
Preprint
Keyword Spotting (KWS) models on embedded devices should adapt fast to new user-defined words without forgetting previous ones. Embedded devices have limited storage and computational resources, thus, they cannot save samples or update large models. We consider the setup of embedded online continual learning (EOCL), where KWS models with frozen backbone are trained to incrementally recognize new words from a non-repeated stream of samples, seen one at a time. To this end, we propose Temporal Aware Pooling (TAP) which constructs an enriched feature space computing high-order moments of speech features extracted by a pre-trained backbone. Our method, TAP-SLDA, updates a Gaussian model for each class on the enriched feature space to effectively use audio representations. In experimental analyses, TAP-SLDA outperforms competitors on several setups, backbones, and baselines, bringing a relative average gain of 11.3% on the GSC dataset.
... SQDA extends SLDA estimating one covariance matrix for each class (i.e., Quadratic). The drawbacks are increased memory consumption and need for many samples per class to estimate reliable covariance matrices [62]. Streaming Gaussian Naïve Bayes (SNB) estimates a running variance vector per class [63] (i.e., diagonal covariance matrices assuming independent features). ...
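The memory trade-off described in these excerpts can be illustrated with a streaming Gaussian naive Bayes sketch that keeps only a running mean and a running variance vector per class. This is a minimal sketch under stated assumptions, not the implementation used in the cited preprints: the class name `StreamingGaussianNB`, the use of Welford's update for the running variance, the variance floor, and the toy data are all choices made for the example.

```python
import math
from collections import defaultdict


class StreamingGaussianNB:
    """Gaussian naive Bayes updated one sample at a time.

    Stores d means and d variances per class (diagonal covariance),
    instead of the d x d matrix per class that a quadratic model needs.
    """

    def __init__(self, var_floor=1e-6):
        self.var_floor = var_floor      # avoids division by zero variance
        self.count = defaultdict(int)   # samples seen per class
        self.mean = {}                  # running mean vector per class
        self.m2 = {}                    # running sum of squared deviations

    def partial_fit(self, x, label):
        if label not in self.mean:
            self.mean[label] = [0.0] * len(x)
            self.m2[label] = [0.0] * len(x)
        self.count[label] += 1
        n = self.count[label]
        mean, m2 = self.mean[label], self.m2[label]
        for j, xj in enumerate(x):      # Welford's update, per feature
            delta = xj - mean[j]
            mean[j] += delta / n
            m2[j] += delta * (xj - mean[j])

    def predict(self, x):
        total = sum(self.count.values())
        best_label, best_score = None, -math.inf
        for label, n in self.count.items():
            score = math.log(n / total)  # log class prior
            for j, xj in enumerate(x):
                var = max(self.m2[label][j] / n, self.var_floor)
                score -= 0.5 * math.log(2.0 * math.pi * var)
                score -= (xj - self.mean[label][j]) ** 2 / (2.0 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = StreamingGaussianNB()
for point in [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]:
    model.partial_fit(point, "a")
for point in [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [4.0, 4.0]]:
    model.partial_fit(point, "b")
```

Because each sample is folded into the running statistics and then dropped, nothing needs to be stored between updates, which is the property the excerpts emphasise for resource-limited devices.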
Preprint
Vision systems mounted on home robots need to interact with unseen classes in changing environments. Robots have limited computational resources, labelled data and storage capability. These requirements pose some unique challenges: models should adapt without forgetting past knowledge in a data- and parameter-efficient way. We characterize the problem as few-shot (FS) online continual learning (OCL), where robotic agents learn from a non-repeated stream of few-shot data updating only a few model parameters. Additionally, such models experience variable conditions at test time, where objects may appear in different poses (e.g., horizontal or vertical) and environments (e.g., day or night). To improve robustness of CL agents, we propose RobOCLe, which; 1) constructs an enriched feature space computing high order statistical moments from the embedded features of samples; and 2) computes similarity between high order statistics of the samples on the enriched feature space, and predicts their class labels. We evaluate robustness of CL models to train/test augmentations in various cases. We show that different moments allow RobOCLe to capture different properties of deformations, providing higher robustness with no decrease of inference speed.
... Algorithm | Supervised | Unsupervised | Technique
Naive Bayes [45,46] | Yes | − | Classification
Linear regression [47,48] | Yes | − | Regression
ANN [49][50][51] | − | Yes | NN
K-means [52,53] | − | Yes | Clustering
Quadratic discriminant analysis [35][54][55][56] | Yes | Yes | Dimensionality reduction
SVM [57,58] | Yes | − | Classification, regression
k Nearest neighbor [59][60][61] | − | Yes | Clustering
Decision tree [62,63] | Yes | − | Tree ...
Chapter
According to Chinese health officials, almost 250 million people in China may have caught Covid-19 in the first 20 days of December. Due to the Covid-19 pandemic and its global spread, there is a significant impact on our health system and economy, causing many deaths and slowing down worldwide economic progress. The recent pandemic continues to challenge the health systems worldwide, including a life that realizes a massive increase in various medical resource demands and leads to a critical shortage of medical equipment. Therefore, physical and virtual analysis of day-to-day death, recovery cases, and new cases by accurately providing the training data are needed to predict threats before they are outspread. Machine learning algorithms in a real-life situation help the existing cases and predict the future instances of Covid-19. Providing accurate training data to the learning algorithm and mapping between the input and output class labels minimizes the prediction error. Polynomials are usually used in statistical analysis. Furthermore, using this statistical information, the prediction of upcoming cases is more straightforward using those same algorithms. These prediction models combine many features to predict the risk of infection being developed. With the help of prediction models, many areas can be strengthened beforehand to cut down risks and maintain the health of the citizens. Many predictions before the second wave of Covid-19 were realized to be accurate, and if we had worked on it, we would have decreased the fatality rate in India. In particular, nine standard forecasting models, such as linear regression (LR), polynomial regression (PR), support vector machine (SVM), Holt's linear, Holt-Winters, autoregressive (AR), moving average (MA), seasonal autoregressive integrated moving average (SARIMA), and autoregressive combined moving average (ARIMA), are used to forecast the alarming factors of Covid-19. 
The models make three predictions: the number of new cases, deaths, and recoveries over the next 10 days. To identify the principal features of the dataset, we first grouped different types of cases as per the date and plotted the distribution of active and closed cases. We calculated various valuable stats like mortality and recovery rates, growth factor, and doubling rate. Our results show that the ARIMA model gives the best possible outcomes on the dataset we used with the most minor root mean squared error of 23.24, followed by the SARIMA model, which offers somewhat close results to the AR model. It provides a root mean square error (RMSE) of 25.37. Holt's linear model does not have any considerable difference with a root mean square error of 27.36. Holt's linear model has a value very close to the moving average (MA) model, which results in the root mean square of 27.43. This research, like others, is also not free from any shortcomings. We used the 2019 datasets, which missed some features due to which models like Facebook Prophet did not predict results up to the mark; so we excluded those results in our outcomes. Also, the python package for the Prophet is a little non-functional to work on massive Covid-19 datasets appropriately. The period is better, where there is a need for more robust features in the datasets to support our framework.
... Slidingwindow-based approaches hold recent instances and discard old ones falling outside the window, with a fixed or adaptive window size [46,47]. Forgetting-based approaches assign a weight to each instance, and downweight old instances according to their age [48,49]. Ensemble-based [37] approaches add/remove component learners in the ensemble adaptively, and dynamically adjust the weights of learners for incoming instances [50]. ...
Article
Full-text available
Conventional machine learning studies generally assume close world scenarios where important factors of the learning process hold invariant. With the great success of machine learning, nowadays, more and more practical tasks, particularly those involving open world scenarios where important factors are subject to change, called open environment machine learning (Open ML) in this article, are present to the community. Evidently it is a grand challenge for machine learning turning from close world to open world. It becomes even more challenging since, in various big data tasks, data are usually accumulated with time, like streams, while it is hard to train the machine learning model after collecting all data as in conventional studies. This article briefly introduces some advances in this line of research, focusing on techniques concerning emerging new classes, decremental/incremental features, changing data distributions, varied learning objectives, and discusses some theoretical issues.
... The objective of these algorithms consists of learning a function or a set of rules, denoted as a classifier, which allows assigning a new (unobserved) object to the correct class. There are several types of algorithms used to train classifiers, which can be organized by learning strategy: • Statistical models suppose the classes of objects are generated in terms of some probabilistic distribution, such as linear and quadratic discriminant analysis [40]; • Artificial neural networks (ANNs) attempt to model the human brain mathematically. An example is the back propagation multilayer perceptron algorithm [41]; • Support vector machine algorithms (SVMs) try to seek out hyperplanes in a high-dimensional feature space that separates the data into different categories. ...
Article
Full-text available
With the further liberalization of the electricity market of China, customers’ requirements, characteristics and distribution, as well as the quality, security and reliability of power supplies without interruption, have received considerable attention from power companies, policymakers and researchers. How to deeply explore distribution characteristics of electricity customers and analyze their sensitivities to electricity blackouts has become an especially important problem. This paper takes over 0.1 billion data, collected by various smart devices of Internet of things (IOT) in power system of China, such as smart meters, intelligent power consumption interactive terminals, data concentrators, and other cross-platform data, for example 95598 telephone records, complaint information, user bills, user information and maintenance records, as study objects, to analyse consumption characteristics of power users. It has been found that there is a wide range of power users who pay different electricity bills; a long-tail distribution following a power law lies in the number of users versus their paid electricity bills. Meanwhile, there are two Pareto effects (2-8 rule): the number of residents and non-residents versus their electricity bills, the number of large industry users and general industry (business users) versus in their electricity consumption and bills. Then, a decision tree algorithm is proposed to capture characteristics of electricity consumers and recognize the crowd who is power blackout sensitive. The evaluation indexes and parameters of the decision tree are discussed in detail, and a comparison with other intelligent algorithms shows that the decision tree has a good recognition performance over that of others, and the characteristics used to identify the blackout-sensitive crowd are various. All results state that except for economic factors, positive social effects should also be considered. 
Various marketing strategies to satisfy different requirements of power users should be provided to promote long-term relationships between the power companies and power customers.
... Saatci et al [11] combined Bayesian online change point detection with Gaussian processes to create a nonparametric time series model which can detect change points and handle different settings, such as water volume of river, snowfall and gain of investments. Anagnostopoulos et al [12] improved linear and quadratic discriminant analysis by exponential forgetting factor, using the algorithm to classify internet users' behavior. ...
Conference Paper
Changepoints detection of online data streams is a very important issue. Adaptive estimation using a forgetting factor (briefly AFF) is an efficient algorithm for this problem. However, AFF assumes the pre-change distribution is normal, which is restrictive. In addition, AFF uses a defaulted step size 0.01. In fact, numerical results show that the step size has significant impact on the final performance of AFF algorithm, and a principle is lacking on choosing the step size. In this paper, we develop an improved AFF algorithm (briefly, IAFF). Specifically, a distribution free measure for declaring changepoints is proposed, which makes IAFF algorithm performing well for different pre-change distributions. Moreover, a general principle on choosing the step size is proposed based on intensive numerical study. Simulation results show that IAFF algorithm has much better performance than AFF in different situations.
... For the purpose of exposition, a single fixed λ value is used, although it is possible to tune sequentially (e.g. see Anagnostopoulos et al., 2012). ...
Article
Instrumentation of infrastructure is changing the way engineers design, construct, monitor and maintain structures such as roads, bridges and underground structures. Data gathered from these instruments have changed the hands-on assessment of infrastructure behaviour to include data processing and statistical analysis procedures. Engineers wish to understand the behaviour of the infrastructure and detect changes – for example, degradation – but are now using high-frequency data acquired from a sensor network. Presented in this paper is a case study that models and analyses in real time the dynamic strain data gathered from a railway bridge which has been instrumented with fibre-optic sensor networks. The high frequency of the data combined with the large number of sensors requires methods that efficiently analyse the data. First, automated methods are developed to extract train passage events from the background signal and underlying trends due to environmental effects. Second, a streaming statistical model which can be updated efficiently is introduced that predicts strain measurements forward in time. This tool is enhanced to provide anomaly detection capabilities in individual sensors and the entire sensor network. These methods allow for the practical processing and analysis of large data sets. The implementation of these contributions will be essential for demonstrating the value of self-sensing structures.
... We propose a novel statistical methodology for classifying functional objects, that can be applied to monitor and manage the Internet users behaviour. This study shows, that a proper Support Vector Machine methods (SVM) (Schoelkopf & Smola, 2002) enable for efficient classification of functional data, appearing in modern e-economy (Anagnostopoulos et al. 2012). This method can be used to classify various functional time series objects. ...
Poster
Full-text available
SVM Classifiers for Functional Data in Monitoring of the Internet Users Behaviour
... We would expect this to perform well in terms of reliability and predictive efficiency. Future academic work in this area could focus on building more efficient models by careful segmentation of the limited property data available, exploration of better use of the training data (eg using longer training periods with adaptive forgetting [1]), refining estimation of the property price trend, using expert knowledge of the housing market or considering property segments (eg leasehold separately to freehold, or segmenting by geographical location). ...
Article
Full-text available
Accurate property valuation is important for property purchasers, investors and for mortgage-providers to assess credit risk in the mortgage market. Automated valuation models (AVM) are being developed to provide cheap, objective valuations that allow dynamic updating of property values over the term of a mortgage. A useful feature of automated valuations is to provide a region of plausible price estimates for each individual property, rather than just a single point estimate. This would allow buyers and sellers to understand uncertainty on pricing individual properties and mortgage providers to include conservatism in their credit risk assessment. In this study, Conformal Predictors (CP) are used to provide such region predictions, whilst strictly controlling for predictive accuracy. We show how an AVM can be constructed using a CP, based on an underlying k-nearest neighbours approach. Time trend in property prices is dealt with by assuming a systematic effect over time and adjusting prices in the training data accordingly. The AVM is tested on a large data set of London property prices. Region predictions are shown to be reliable and the efficiency, ie region width, of property price predictions is investigated. In particular, a regression model is constructed to model the uncertainty in price prediction linked to property characteristics.
... Computer scientists recently developed a series of software packages for the streaming processing of data in production environments. Frameworks such as S4 by Yahoo! (Gopalakrishna et al., 2013), and Twitter's Storm (Storm User Group, 2013) provide an infrastructure for real-time streaming computation of event-driven data (e.g., Babcock et al., 2002;Anagnostopoulos et al., 2012) which is scalable and reliable. ...
Article
Streaming data, consisting of indefinitely evolving sequences, are becoming ubiquitous in many branches of science and in various applications. Computer scientists have developed streaming applications such as Storm and the S4 distributed stream computing platform to deal with data streams. However, in current production packages testing and evaluating streaming algorithms is cumbersome. This paper presents RStorm for the development and evaluation of streaming algorithms analogous to these production packages, but implemented fully in R. RStorm allows developers of streaming algorithms to quickly test, iterate, and evaluate various implementations of streaming algorithms. The paper provides both a canonical computer science example, the streaming word count, and examples of several statistical applications of RStorm.
... The class prior is shown, since the experimental study uses error rate as loss. The sources for this data include UCI (Bache and Lichman, 2013), Guyon et al. (2011), Anagnostopoulos et al. (2012) and Adams et al. (2010). ...
Article
A central question for active learning (AL) is: "what is the optimal selection?" Defining optimality by classifier loss produces a new characterisation of optimal AL behaviour, by treating expected loss reduction as a statistical target for estimation. This target forms the basis of model retraining improvement (MRI), a novel approach providing a statistical estimation framework for AL. This framework is constructed to address the central question of AL optimality, and to motivate the design of estimation algorithms. MRI allows the exploration of optimal AL behaviour, and the examination of AL heuristics, showing precisely how they make sub-optimal selections. The abstract formulation of MRI is used to provide a new guarantee for AL, that an unbiased MRI estimator should outperform random selection. This MRI framework reveals intricate estimation issues that in turn motivate the construction of new statistical AL algorithms. One new algorithm in particular performs strongly in a large-scale experimental study, compared to standard AL methods. This competitive performance suggests that practical efforts to minimise estimation bias may be important for AL applications.
... The class prior is shown, since the experimental study uses error rate as loss. The sources for this data include UCI (Bache and Lichman, 2013), Guyon et al. (2011), Anagnostopoulos et al. (2012) and Adams et al. (2010). ...
Article
Full-text available
In many classification problems unlabelled data is abundant and a subset can be chosen for labelling. This defines the context of active learning (AL), where methods systematically select that subset, to improve a classifier by retraining. Given a classification problem, and a classifier trained on a small number of labelled examples, consider the selection of a single further example. This example will be labelled by the oracle and then used to retrain the classifier. This example selection raises a central question: given a fully specified stochastic description of the classification problem, which example is the optimal selection? If optimality is defined in terms of loss, this definition directly produces expected loss reduction (ELR), a central quantity whose maximum yields the optimal example selection. This work presents a new theoretical approach to AL, example quality, which defines optimal AL behaviour in terms of ELR. Once optimal AL behaviour is defined mathematically, reasoning about this abstraction provides insights into AL. In a theoretical context the optimal selection is compared to existing AL methods, showing that heuristics can make sub-optimal selections. Algorithms are constructed to estimate example quality directly. A large-scale experimental study shows these algorithms to be competitive with standard AL methods.
Article
Respiratory diseases are contagious, spread through the air or by direct contact, and affect many aspects of people's lives. COVID-19 is one of the most dangerous respiratory infections, and it has severely affected many countries. The battle to curb its spread was waged in every country, even those with few or no infections. Vaccination is one of the most vital measures in the fight against COVID-19, and it started in India in January 2021. Governments have created awareness programs about COVID-19 and its updates through messages and videos on social media to reduce the misconceptions and panic that followed from outright misinformation about COVID-19 and its impacts. This study classifies medical vaccination tweets related to COVID-19; we extracted the tweets regarding vaccination in India from 1 January 2021 to 31 December 2021. We classified the tweets into four categories: pro-vaccine, anti-vaccine, hesitancy and cognizant. We performed text summarization using fuzzy logic and classification using a stacked ANN, and compared the results across different word embedding models. Through quadratic discriminant analysis, we identified that allergy was a general topic discussed by individuals on social media during the vaccination period. The proposed model surpassed the results of the baseline models and achieved an accuracy of 96.7%.
Article
Full-text available
The growing emphasis on renewable energy to decarbonize the power system and reduce dependence on fossil fuels plays an important role in the development of smart grids. Many technological advancements are integrated into the smart grid to optimize the power system and renewable energy sources. The smart grid leverages the exchange of electricity and energy consumption data to establish a significantly advanced, automated, and decentralized electricity network. However, this brings numerous vulnerabilities to the power system, including cyber-attacks, grid blackouts, and electricity theft. The most significant concern is energy theft, where some dishonest consumers manipulate their energy meters to reduce their readings. This destabilizes the country's electricity utility and economic development and causes high energy tariffs for consumers who pay their bills. Therefore, developing an advanced framework for electricity theft detection is necessary. To address this problem, we propose a machine learning-based stacked framework to detect malicious activity in the smart grid. The proposed data-based stacked ensemble model distinguishes honest from anomalous consumers in two stages. In the first stage, the model employs four individual classifiers at the base level to analyze the data; in the second stage, a single classifier at the meta-level classifies the results of the base learners. Furthermore, the Borderline SMOTE and Principal Component Analysis techniques are employed to address the class imbalance and curse of dimensionality issues, respectively. Through experimental analysis, we demonstrated the effectiveness of the proposed framework in detecting suspicious activity in four different experiments: preprocessed data, feature-extracted data, balanced data, and lastly, both feature engineering and data balancing combined. The simulation outcomes demonstrate that our proposed framework enhances energy security and overcomes the impact of theft attacks on the smart grid.
Book
Full-text available
This book is designed to consider the recent advancements in hospitals to diagnose various diseases accurately using AI-supported detection procedures. This work examines recent AI-supported disease detection techniques from prominent researchers and clinicians working in the medical imaging processing domain. Within this book, the integration of various AI methods, such as soft computing, machine learning, deep learning, and other related works will be presented. Real clinical images utilizing AI are incorporated. The book also includes several chapters on machine learning, convolutional neural networks, segmentation, and deep learning-assisted two-class and multi-class classification.
Article
Skin melanoma is a potentially life-threatening cancer. Once it has metastasized, it may cause severe disability and death. Therefore, early diagnosis is important to improve the conditions and outcomes for patients. The disease can be diagnosed based on Digital-Dermoscopy (DD) images. In this study, we propose an original and novel Automated Skin-Melanoma Detection (ASMD) system with Melanoma-Index (MI). The system incorporates image pre-processing, Bi-dimensional Empirical Mode Decomposition (BEMD), image texture enhancement, entropy and energy feature mining, as well as binary classification. The system design has been guided by feature ranking, with Student’s t-test and other statistical methods used for quality assessment. The proposed ASMD was employed to examine 600 benign and 600 DD malignant images from benchmark databases. Our classification performance assessment indicates that the combination of Support Vector Machine (SVM) and Radial Basis Function (RBF) offers a classification accuracy of greater than 97.50%. Motivated by these classification results, we also formulated a clinically relevant MI using the dominant entropy features. Our proposed index can assist dermatologists to track multiple information-bearing features, thereby increasing the confidence with which a diagnosis is given.
Article
Streaming data provides substantial challenges for data analysis. From a computational standpoint, these challenges arise from constraints related to computer memory and processing speed. Statistically, the challenges relate to constructing procedures that can handle so-called concept drift: the tendency of future data to have different underlying properties to current and historic data. The issue of handling structure, such as trend and periodicity, remains a difficult problem for streaming estimation. We propose the real-time adaptive component (RAC), a penalized-regression modeling framework that satisfies the computational constraints of streaming data, and provides the capability for dealing with concept drift. At the core of the estimation process are techniques from adaptive filtering. The RAC procedure adopts a specified basis to handle local structure, along with a least absolute shrinkage operator-like penalty procedure to handle overfitting. We enhance the RAC estimation procedure with a streaming anomaly detection capability. The experiments with simulated data suggest the procedure can be considered as a competitive tool for a variety of scenarios, and an illustration with real cyber-security data further demonstrates the promise of the method.
Article
In many large-scale machine learning applications, data are accumulated over time, and thus, an appropriate model should be able to update in an online style. In particular, it would be ideal to have a storage independent from the data volume, and scan each data item only once. Meanwhile, the data distribution usually changes during the accumulation procedure, making distribution-free one-pass learning a challenging task. In this paper, we propose a simple yet effective approach for this task, without requiring prior knowledge about the change, where every data item can be discarded once scanned. We also present a variant for high-dimensional situations, by exploiting compressed sensing to reduce computational and storage complexity. Theoretical analysis shows that our proposal converges under mild assumptions, and the performance is validated on both synthetic and real-world datasets.
Article
Cybersecurity increasingly relies on the methodology used for statistical analysis of network data. The volume and velocity of enterprise network data sources puts a premium on streaming analytics that pass over the data once, while handling temporal variation in the process. In this paper we introduce ReTiNA: a framework for streaming network anomaly detection. This procedure first detects anomalies in the correlation processes on individual edges of the network graph. Second, anomalies across multiple edges are combined and scored to give network-wide situational awareness. The approach is tested in simulation and demonstrated on two real Netflow datasets.
Conference Paper
Streaming classification is well studied in the machine learning community. In real-world applications labels for previously observed feature vectors may only arrive after appreciable lag — that is, the labels are delayed. These delayed labels are an important aspect of streaming analysis, one that is not properly appreciated or addressed in the literature. This paper provides a taxonomy of delayed labeling and a framework for incorporating such labels into a streaming classifier. We provide a real-world demonstration of the utility of correctly handling delayed labels, in the context of a temporally adaptive linear classifier. This simple illustration shows that appropriately handling delayed labels can lead to an increase in performance, suggesting an opportunity for new research.
Article
Full-text available
Abstract: In this article we consider a simple two-player cooperative game in which the players act under incomplete information and rely on a time series containing outlying observations. The game under consideration draws on the idea of a classifier induced by a statistical depth function. Keywords: depth function, cooperative dynamic game, robustness. Introduction: The sizes of the data sets on which economists now base their decisions increasingly force the use of data-processing strategies different from those employed only a few years ago. Multi-level monitoring of Internet user activity, algorithmic trading, and online credit scoring are examples of phenomena that are driving the evolution of the so-called classical techniques of data analysis and statistical inference [cf. Huber, 2011]. The motivation for this article was one direction in the evolution of statistical methods connected with the online economy, known as data stream processing (strumieniowe przetwarzanie danych, SPD). In SPD, data reach the processing system very rapidly and in great quantities, while only a limited amount of memory is available to store them [cf. Aggerwal, 2007]. Algorithms used in SPD must cope with high demands regarding: 1) the speed of data transmission to the program, 2) the computational complexity of the algorithm, and 3) the amount of memory required for correct opera
Article
Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.
Conference Paper
Full-text available
In this paper we propose a new methodology for gaining insight into the temporal aspects of social networks. In order to develop higher-level, large-scale data analysis methods for classification, prediction, and anomaly detection, a solid foundation of analytical techniques is required. We present a novel approach to the analysis of these networks that leverages time series and statistical techniques to quantitatively describe the temporal nature of a social network. We report on the application of our approach toward a real data set and successfully visualize high-level changes to the network as well as discover outlying vertices. The real-time prediction of new connections given the previous connections in a graph is a notoriously difficult task. The proposed technique avoids this difficulty by modeling statistics computed from the graph over time. Vertex statistics summarize topological information as real numbers, which allows us to leverage the existing fields of computational statistics and machine learning. This creates a modular approach to analysis in which methods can be developed that are agnostic to the metrics and algorithms used to process the graph. We demonstrate these techniques using a collection of Twitter posts related to Hurricane Sandy. We study the temporal nature of betweenness centrality and clustering coefficients while producing multiple visualizations of a social network dataset with 1.2 million edges. We successfully detect vertices whose triangle-forming behavior is anomalous.
Article
Full-text available
An emerging problem in Data Streams is the detection of concept drift. This problem is aggravated when the drift is gradual over time. In this work we define a method for detecting concept drift, even in the case of slow gradual change. It is based on the estimated distribution of the distances between classification errors. The proposed method can be used with any learning algorithm in two ways: using it as a wrapper of a batch learning algorithm or implementing it inside an incremental and online algorithm. The experimental results compare our method (EDDM) with a similar one (DDM). The latter uses the error rate instead of the distance-error rate.
Article
Full-text available
This paper mainly addresses the issue of semantic concept drifting in temporal sequences, such as video streams, over an extended period of time. A Gaussian Mixture Model (GMM) is applied to model the distribution of the data under investigation, which are supposed to arrive or be generated in batches over time. The up-to-date classifier, which tracks the drifting concept, is directly built on the outdated models trained from the old labeled data. Several properties, such as Maximum Lifecycle, Dominant Component, Component Drifting Speed, System Stability, and Updating Speed, are defined to track concept drifting in the learning system; they are applied to determine the corresponding parameters for model updating in order to obtain an optimal up-to-date classifier. Experiments on simulated data and real-world data demonstrate that our proposed GMM-based batch learning framework is effective and efficient for dealing with concept drifting.
Article
Full-text available
Modern technology has allowed real-time data collection in a variety of domains, ranging from environmental monitoring to healthcare. Consequently, there is a growing need for algorithms capable of performing inferential tasks in an online manner, continuously revising their estimates to reflect the current status of the underlying process. In particular, we are interested in constructing online and temporally adaptive classifiers capable of handling the possibly drifting decision boundaries arising in streaming environments. We first make a quadratic approximation to the log-likelihood that yields a recursive algorithm for fitting logistic regression online. We then suggest a novel way of equipping this framework with self-tuning forgetting factors. The resulting scheme is capable of tracking changes in the underlying probability distribution, adapting the decision boundary appropriately and hence maintaining high classification accuracy in dynamic or unstable environments. We demonstrate the scheme’s effectiveness in both real and simulated streaming environments.
Article
Full-text available
To answer the questions of how information about the physical world is sensed, in what form is information remembered, and how does information retained in memory influence recognition and behavior, a theory is developed for a hypothetical nervous system called a perceptron. The theory serves as a bridge between biophysics and psychology. It is possible to predict learning curves from neurological variables and vice versa. The quantitative statistical approach is fruitful in the understanding of the organization of cognitive systems. 18 references.
Conference Paper
Full-text available
Current models of the classification problem do not effectively handle bursts of particular classes coming in at different times. In fact, the current model of the classification problem simply concentrates on methods for one-pass classification modeling of very large data sets. Our model for data stream classification views the data stream classification problem from the point of view of a dynamic approach in which simultaneous training and testing streams are used for dynamic classification of data sets. This model reflects real life situations effectively, since it is desirable to classify test streams in real time over an evolving training and test stream. The aim here is to create a classification system in which the training model can adapt quickly to the changes of the underlying data stream. In order to achieve this goal, we propose an on-demand classification process which can dynamically select the appropriate window of past training data to build the classifier. The empirical results indicate that the system maintains a high classification accuracy in an evolving data stream, while providing an efficient solution to the classification task.
Conference Paper
Full-text available
Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
Conference Paper
Full-text available
The purpose of this paper is to test the hypothesis that simple classifiers are more robust to changing environments than complex ones. We propose a strategy for generating artificial, but realistic domains, which allows us to control the changing environment and test a variety of situations. Our results suggest that evaluating classifiers on such tasks is not straightforward since the changed environment can yield a simpler or more complex domain. We propose a metric capable of taking this issue into consideration and evaluate our classifiers using it. We conclude that in mild cases of population drifts simple classifiers deteriorate more than complex ones and that in more severe cases as well as in class definition changes, all classifiers deteriorate to about the same extent. This means that in all cases, complex classifiers remain more accurate than simpler ones, thus challenging the hypothesis that simple classifiers are more robust to changing environments than complex ones.
Conference Paper
Full-text available
The problem of concept drift has recently received considerable attention in machine learning research. One important practical problem where concept drift needs to be addressed is spam filtering. The literature on concept drift shows that among the most promising approaches are ensembles, and a variety of techniques for ensemble construction has been proposed. In this paper we compare the ensemble approach to an alternative lazy learning approach to concept drift whereby a single case-based classifier for spam filtering keeps itself up-to-date through a case-base maintenance protocol. The case-base maintenance approach offers a more straightforward strategy for handling concept drift than updating ensembles with new classifiers. We present an evaluation that shows that the case-base maintenance approach is at least as effective as, and appears marginally more effective than, a selection of ensemble techniques. The evaluation is complicated by the overriding importance of False Positives (FPs) in spam filtering. The ensemble approaches can have very good performance on FPs because it is possible to bias an ensemble more strongly away from FPs than it is to bias the single classifier. However, this comes at considerable cost to the overall accuracy.
Conference Paper
Full-text available
We propose a strategy for updating the learning rate parameter of online linear classifiers for streaming data with concept drift. The change in the learning rate is guided by the change in a running estimate of the classification error. In addition, we propose an online version of the standard linear discriminant classifier (O-LDC) in which the inverse of the common covariance matrix is updated using the Sherman-Morrison-Woodbury formula. The adaptive learning rate was applied to four online linear classifier models on generated and real streaming data with concept drift. O-LDC was found to be better than balanced Winnow, the perceptron and a recently proposed online linear discriminant analysis.
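The rank-one inverse update mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic Sherman-Morrison identity, not the O-LDC algorithm itself; the function name `sherman_morrison_update` and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def sherman_morrison_update(a_inv, u, v):
    """Return (A + u v^T)^{-1} given a_inv = A^{-1}, without re-inverting A.

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    a_inv_u = a_inv @ u              # A^{-1} u
    v_a_inv = v @ a_inv              # v^T A^{-1}
    denom = 1.0 + v @ a_inv_u        # 1 + v^T A^{-1} u
    if abs(denom) < 1e-12:
        raise ValueError("update would make the matrix singular")
    return a_inv - np.outer(a_inv_u, v_a_inv) / denom


# Toy check: compare the rank-one update with a direct inversion.
rng = np.random.default_rng(0)
a = 2.0 * np.eye(3)
x = rng.normal(size=3)
updated_inv = sherman_morrison_update(np.linalg.inv(a), x, x)
direct_inv = np.linalg.inv(a + np.outer(x, x))
max_abs_diff = float(np.max(np.abs(updated_inv - direct_inv)))
```

The appeal for streaming use is cost: each new observation changes the inverse with matrix-vector products only, rather than a full inversion.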
Conference Paper
Full-text available
A fundamental assumption often made in supervised classification is that the problem is static, i.e. the description of the classes does not change with time. However many practical classification tasks involve changing environments. Thus designing and testing classifiers for changing environments are of increasing interest and importance. A number of benchmark data sets are available for static classification tasks. For example, the UCI machine learning repository is extensively used by researchers to compare algorithms across various domains. No such benchmark datasets are available for changing environments. Also, while generating data for static environments is relatively straightforward, this is not so for changing environments. The reason is that an infinite amount of changes can be simulated, and it is difficult to define which ones will be realistic and hence useful. In this paper we propose a general framework for generating data to simulate changing environments. The paper gives illustrations of how the framework encompasses various types of changes observed in real data and also how the two most popular simulation models (STAGGER and moving hyperplane) are represented within.
Conference Paper
Full-text available
We consider strategies for building classifier ensembles for non-stationary environments where the classification task changes during the operation of the ensemble. Individual classifier models capable of online learning are reviewed. The concept of "forgetting" is discussed. Online ensembles and strategies suitable for changing environments are summarized.
Conference Paper
Full-text available
Most of the work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detection of changes in the probability distribution of examples. The idea behind the drift detection method is to control the online error-rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the actual model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method controls the trace of the online error of the algorithm. For the actual context we define a warning level, and a drift level. A new context is declared, if in a sequence of examples, the error increases reaching the warning level at example k_w, and the drift level at example k_d. This is an indication of a change in the distribution of the examples. The algorithm learns a new model using only the examples since k_w. The method was tested with a set of eight artificial datasets and a real world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show a good performance detecting drift and with learning the new concept. We also observe that the method is independent of the learning algorithm.
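The warning-level and drift-level idea in this abstract can be sketched roughly as follows. The abstract does not state the thresholds, so the two and three standard deviation factors, the 30-example warm-up, the class name `DriftDetector` and the synthetic error stream are all assumptions for illustration. The sketch also omits retraining from the warning point onward.

```python
import math


class DriftDetector:
    """Monitor an online error rate and flag warning and drift levels.

    Assumed settings (not given in the abstract): warning at 2 and drift at
    3 standard deviations above the best error level seen so far.
    """

    def __init__(self, warning_factor=2.0, drift_factor=3.0, min_samples=30):
        self.warning_factor = warning_factor
        self.drift_factor = drift_factor
        self.min_samples = min_samples
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its binomial standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:      # remember the best level so far
            self.p_min, self.s_min = p, s
        # Strict inequalities so that an error-free stream is never flagged.
        if p + s > self.p_min + self.drift_factor * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream: 10% errors for 100 examples, then every example is wrong.
stream = [i % 10 == 0 for i in range(100)] + [True] * 50
detector = DriftDetector()
statuses = [detector.update(e) for e in stream]
```

In this toy stream the detector stays quiet during the stationary phase and raises a warning shortly before it signals drift, once the error rate begins to climb.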
Conference Paper
Full-text available
In this paper, we propose an incremental classification algorithm which uses a multi-resolution data representation to find adaptive nearest neighbors of a test point. The algorithm achieves excellent performance by using small classifier ensembles where approximation error bounds are guaranteed for each ensemble size. The very low update cost of our incremental classifier makes it highly suitable for data stream applications. Tests performed on both synthetic and real-life data indicate that our new classifier outperforms existing algorithms for data streams in terms of accuracy and computational costs.
Book
Statistical pattern recognition is a very active area of study and research, which has seen many advances in recent years. New and emerging applications - such as data mining, web searching, multimedia data retrieval, face recognition, and cursive handwriting recognition - require robust and efficient pattern recognition techniques. Statistical decision making and estimation are regarded as fundamental to the study of pattern recognition. Statistical Pattern Recognition, Second Edition has been fully updated with new methods, applications and references. It provides a comprehensive introduction to this vibrant area - with material drawn from engineering, statistics, computer science and the social sciences - and covers many application areas, such as database design, artificial neural networks, and decision support systems. Provides a self-contained introduction to statistical pattern recognition. Each technique described is illustrated by real examples. Covers Bayesian methods, neural networks, support vector machines, and unsupervised classification. Each section concludes with a description of the applications that have been addressed and with further developments of the theory. Includes background material on dissimilarity, parameter estimation, data, linear algebra and probability. Features a variety of exercises, from 'open-book' questions to more lengthy projects. The book is aimed primarily at senior undergraduate and graduate students studying statistical pattern recognition, pattern processing, neural networks, and data mining, in both statistics and engineering departments. It is also an excellent source of reference for technical professionals working in advanced information development environments.
Conference Paper
The majority of automatic target recognition (ATR) studies are formulated as a traditional classification problem. Specifically, using a training set of target exemplars, a classifier is developed for application to isolated measurements of targets. Performance is assessed using a test set of target exemplars. Unfortunately, this is a simplification of the ATR problem. Often, the operating conditions differ from those prevailing at the time of training data collection, which can have severe effects on the obtained performance. It is therefore becoming increasingly recognised that development of robust ATR systems requires more than just consideration of the traditional classification problem. In particular, one should make use of any extra information or data that is available. The example in this paper focuses on a hybrid ATR system being designed to utilise both measurements from identity sensors (such as radar profiles) and motion information from tracking sensors to classify targets. The first-stage of the system uses mixture-model classifiers to classify targets into generic classes based upon data from (long-range) tracking sensors. Where the generic classes are related to platform types (e.g. fast-jets, heavy bombers and commercial aircraft), the initial classifications can be used to assist the military commander's early decision making. The second-stage of the system uses measurements from (closer-range) identity sensors to classify the targets into individual target types, while taking into account the (uncertain) outputs from the first-stage. A Bayesian classifier is proposed for the second-stage, so that the first-stage outputs can be incorporated into the second-stage prior class probabilities.
Article
Statistical pattern recognition is a term used to cover all stages of an investigation from problem formulation and data collection through to discrimination and classification, assessment of results and interpretation. This chapter introduces some of the basic concepts in classification and describes the key issues. It presents two complementary approaches to discrimination, namely a decision theory approach based on calculation of probability density functions and the use of Bayes theorem, and a discriminant function approach. Many different forms of discriminant function have been considered in the literature, varying in complexity from the linear discriminant function to multiparameter nonlinear functions such as the multilayer perceptron. Regression is an important part of statistical pattern recognition. Regression analysis is concerned with predicting the mean value of the response variable given measurements on the predictor variables and assumes a model of the form. Bayes' theorem; regression analysis; statistical process control
Conference Paper
Convergence of a matrix dynamics for online LDA is analyzed. Especially, stable spurious solutions are pointed out and two schemes to prevent the spurious solutions are proposed. Simulations of face identification confirm the performance of the algorithm.
Book
Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. He brings unifying principles to the fore, and reviews the state of the subject. Ripley also includes many examples to illustrate real problems in pattern recognition and how to overcome them.
Article
The majority of automatic target recognition (ATR) studies are formulated as a traditional classification problem. Specifically, using a training set of target exemplars, a classifier is developed for application to isolated measurements of targets. Performance is assessed using a test set of target exemplars. Unfortunately, this is a simplification of the ATR problem. Often, the operating conditions differ from those prevailing at the time of training data collection, which can have severe effects on the obtained performance. It is therefore becoming increasingly recognised that development of robust ATR systems requires more than just consideration of the traditional classification problem. In particular, one should make use of any extra information or data that is available. The example in this paper focuses on a hybrid ATR system being designed to utilise both measurements from identity sensors (such as radar profiles) and motion information from tracking sensors to classify targets. The first-stage of the system uses mixture-model classifiers to classify targets into generic classes based upon data from (long range) tracking sensors. Where the generic classes are related to platform types (e.g. fast-jets, heavy bombers and commercial aircraft), the initial classifications can be used to assist the military commander's early decision making. The second-stage of the system uses measurements from (closer-range) identity sensors to classify the targets into individual target types, while taking into account the (uncertain) outputs from the first-stage. A Bayesian classifier is proposed for the second-stage, so that the first-stage outputs can be incorporated into the second-stage prior class probabilities.
Article
Streaming data introduce challenges mainly due to changing data distributions (population drift). To accommodate population drift we develop a novel linear adaptive online classification method motivated by ideas from adaptive filtering. Our approach allows the impact of past data on parameter estimates to be gradually removed, a process termed forgetting, yielding completely online adaptive algorithms. Extensive experimental results show that this approach adjusts the forgetting mechanism to maintain performance. Moreover, it might be possible to exploit the information in the evolution of the forgetting mechanism to obtain information about the type and speed of the underlying population drift process.
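The forgetting idea described above can be illustrated with a minimal sketch of an exponentially weighted running mean. This is not the classifier from the paper: the function name `forgetting_mean` and the fixed factor `lam` are assumptions made for illustration, and the adaptive tuning of the factor is not shown.

```python
def forgetting_mean(xs, lam=0.9):
    """Running mean with a fixed forgetting factor lam in (0, 1].

    Hypothetical helper for illustration: an observation that is k steps old
    receives weight lam**k, so lam = 1 recovers the ordinary running mean.
    """
    weighted_sum = 0.0   # sum of lam**age * x over all data seen so far
    weight_total = 0.0   # sum of lam**age, i.e. the effective sample size
    means = []
    for x in xs:
        # Down-weight everything seen so far, then add the new observation.
        weighted_sum = lam * weighted_sum + x
        weight_total = lam * weight_total + 1.0
        means.append(weighted_sum / weight_total)
    return means
```

With `lam` below 1 the estimate follows an abrupt change in the data, whereas `lam = 1` averages over the whole history and lags behind.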
Article
Preface. - Matrices. - Submatrices and partitioned matrices. - Linear dependence and independence. - Linear spaces: row and column spaces. - Trace of a (square) matrix. - Geometrical considerations. - Linear systems: consistency and compatibility. - Inverse matrices. - Generalized inverses. - Idempotent matrices. - Linear systems: solutions. - Projections and projection matrices. - Determinants. - Linear, bilinear, and quadratic forms. - Matrix differentiation. - Kronecker products and the vec and vech operators. - Intersections and sums of subspaces. - Sums (and differences) of matrices. - Minimization of a second-degree polynomial (in n variables) subject to linear constraints. - The Moore-Penrose inverse. - Eigenvalues and Eigenvectors. - Linear transformations. - References. - Index.
Book
The use of adaptive algorithms is now very widespread across such varied applications as system identification, adaptive control, transmission systems, adaptive filtering for signal processing, and several aspects of pattern recognition. Numerous, very different examples of applications are given in the text. The success of adaptive algorithms has inspired an abundance of literature, and more recently a number of significant works such as the books of Ljung and Söderström (1983) and of Goodwin and Sin (1984).
Article
In this paper we present the general analysis of a class of least squares algorithms with emphasis on their dynamic performance particularly in the presence of poor excitation. The analysis is carried out in a deterministic framework and stresses geometrical interpretations. The core of this paper is the proposal and analysis of a new algorithm which incorporates exponential forgetting and resetting to an unprejudiced treatment of data when excitation is poor. The algorithm is particularly suitable for tracking time-varying parameters and is similar in computational complexity to the standard recursive least squares algorithm. The superior performance of the algorithm is verified via simulation studies.
Article
An analysis is given of the performance of the standard forgetting factor recursive least squares (RLS) algorithm when used for tracking time-varying linear regression models. Three basic results are obtained: (1) the ‘P-matrix’ in the algorithm remains bounded if and only if the (time-varying) covariance matrix of the regressors is uniformly non-singular; (2) if so, the parameter tracking error covariance matrix is of the order O(μ + γ²/μ), where μ = 1 - λ, λ is the forgetting factor and γ is a quantity reflecting the speed of the parameter variations; (3) this covariance matrix can be arbitrarily well approximated (for small enough μ) by an expression that is easy to compute.
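For orientation, the standard forgetting-factor RLS recursion analysed above can be sketched in pure Python as follows. The function name `rls_update`, the default `lam`, and the list-based matrix representation are illustrative assumptions, not taken from the paper.

```python
def rls_update(theta, P, phi, y, lam=0.95):
    """One step of recursive least squares with constant forgetting factor lam.

    theta: current parameter estimate (list of n floats)
    P:     current 'P-matrix' (n x n list of lists, assumed symmetric)
    phi:   regressor vector (list of n floats)
    y:     observed scalar response
    """
    n = len(theta)
    # P @ phi; because P is symmetric this is also the transpose of phi^T P.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error made by the previous estimate.
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # Dividing by lam inflates P, which is what discounts older data.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new
```

A typical call starts from `theta = [0.0] * n` and a large diagonal `P` (for example 1000 on the diagonal, an arbitrary choice), then applies `rls_update` once per observation. Smaller values of `lam` track parameter changes faster at the cost of noisier estimates, which is the trade-off the O(μ + γ²/μ) bound quantifies.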
Conference Paper
This paper addresses the blow-up problem associated with the estimation part of an adaptive controller. A partial solution of the problem has been devised by introduction of a variable forgetting factor. It does not, however, eliminate the blow-up possibility. This is shown by simulation experiments on two different models. Two new methods based on a vector variable forgetting factor are presented. One of these methods completely solves the blow-up problem, whereas the other reduces the possibility of blow-up tendencies. Simulation experiments using both methods are compared with the single variable forgetting factor case.
Conference Paper
The authors present a theoretical analysis for the performance of the standard forgetting factor recursive least squares (RLS) algorithm used in the tracking of time-varying linear regression models. Under some explicit excitation conditions on the regressors, it is shown that the parameter tracking error is on the order O(√μ + γ/√μ), where μ = 1 - λ, λ is the forgetting factor, and γ is the quantity reflecting the speed of parameter variation. Furthermore, for a large class of weakly dependent regressors, simple approximations for the covariance matrix of this error are derived. These approximations are not asymptotic in nature: they hold over all time intervals and for all μ in a certain region.
Article
A modified version of the self-tuning regulator having limited adaptability has been successfully implemented on a large-scale chemical pilot plant. The new algorithm uses a least-squares estimator with variable weighting of past data; at each step a weighting factor is chosen to maintain constant a scalar measure of the information content of the estimator. It is shown that, for nearly deterministic systems, such an approach enables the parameter estimates to follow both slow and sudden changes in the plant dynamics. Furthermore, the use of a variable forgetting factor with correct choice of information bound can avoid one of the major difficulties associated with constant exponential weighting of past data—namely, ‘blowing-up’ of the covariance matrix of the estimates and subsequent unstable control. Accordingly, the control algorithm described here may be well suited to the regulation of plants which would otherwise require periodic re-tuning of control constants.
Article
To track the time-varying dynamics of a system or the time-varying properties of a signal is a fundamental problem in control and signal processing. Many approaches to derive such adaptation algorithms and to analyse their behaviour have been taken. This article gives a survey of basic techniques to derive and analyse algorithms for tracking time-varying systems. Special attention is paid to the study of how different assumptions about the true system's variations affect the algorithm. Several explicit and semi-explicit expressions for the mean square error are derived, which clearly demonstrate the character of the trade-off between tracking ability and noise rejection.
Article
The relevance weighted likelihood method was introduced by Hu and Zidek (Technical Report No. 161, Department of Statistics, The University of British Columbia, Vancouver, BC, Canada, 1995) to formally embrace a variety of statistical procedures for trading bias for precision. Their approach combines all relevant information through a weighted version of the likelihood function. The present paper is concerned with the asymptotic properties of a class of maximum weighted likelihood estimators that contains those considered by Hu and Zidek (Technical Report No. 161, Department of Statistics, The University of British Columbia, Vancouver, BC, Canada, 1995, in: Ahmed, S.E. Reid, N. (Eds.), Empirical Bayes and Likelihood Inference, Springer, New York, 2001, p. 211). Our results complement those of Hu (Can. J. Stat. 25 (1997) 45). In particular, we invoke a different asymptotic paradigm than that in Hu (Can. J. Stat. 25 (1997) 45). Moreover, our adaptive weights are allowed to depend on the data.
Chapter
The sections in this article are: 1. The Problem; 2. Background and Literature; 3. Outline; 4. Displaying the Basic Ideas: ARX Models and the Linear Least Squares Method; 5. Model Structures I: Linear Models; 6. Model Structures II: Nonlinear Black-Box Models; 7. General Parameter Estimation Techniques; 8. Special Estimation Techniques for Linear Black-Box Models; 9. Data Quality; 10. Model Validation and Model Selection; 11. Back to Data: The Practical Side of Identification.
Conference Paper
An assumption fundamental to almost all work on supervised classification is that the probabilities of class membership, conditional on the feature vectors, are stationary. However, in many situations this assumption is untenable. We give examples of such population drift, examine its nature, show how the impact of population drift depends on the chosen measure of classification performance, and propose a strategy for dynamically updating classification rules.
Conference Paper
We consider on-line density estimation with the multivariate Gaussian distribution. In each of a sequence of trials, the learner must posit a mean µ and covariance Σ; the learner then receives an instance x and incurs loss equal to the negative log-likelihood of x under the Gaussian density parameterized by (µ, Σ). We prove bounds on the regret for the follow-the-leader strategy, which amounts to choosing the sample mean and covariance of the previously seen data. We consider an on-line learning problem based on Gaussian density estimation in R^d. The learning task proceeds in a sequence of trials. In trial t, the learner selects a mean µ_t and covariance Σ_t. Then, Nature reveals an instance x_t to the learner, and the learner incurs a loss ℓ_t(µ_t, Σ_t) equal to the negative log-likelihood of x_t under the Gaussian density parameterized by (µ_t, Σ_t). We will compare the total loss incurred from selecting the (µ_t, Σ_t) in T trials to the total loss incurred using the best fixed strategy for the T trials. A fixed strategy is one that sets (µ_t, Σ_t) to the same (µ, Σ) for each t. The difference of these total losses is the regret of following a strategy and not instead selecting this best-in-hindsight (µ, Σ) in every trial; it is the cost of not seeing all of the data ahead of time. In this paper, we will analyze the regret of the follow-the-leader strategy: the strategy which chooses (µ_t, Σ_t) to be the sample mean and covariance of {x_1, x_2, ..., x_{t-1}}. First, we find that a naive formulation of the learning problem suffers from degenerate cases that lead to unbounded regret. We propose a straightforward alternative that avoids these problems by incorporating an additional, hallucinated, trial at time zero. In this setting, a trivial upper bound on the regret of follow-the-leader (FTL) is O(T^2) after T trials. We obtain the following bounds. - For any p > 1, there are sequences (x_t) for which FTL has regret
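The follow-the-leader protocol described above can be sketched for the univariate case as follows. This is a simplified illustration, not the paper's construction: the names `gaussian_nll` and `follow_the_leader` are hypothetical, and the starting values `init_mean` and `init_var` are assumed placeholders that keep the first predictions well defined; they are not the paper's time-zero trial.

```python
import math


def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under a univariate Gaussian N(mean, var)."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def follow_the_leader(xs, init_mean=0.0, init_var=1.0):
    """Per-trial losses when predicting with the sample mean and variance
    of the previously seen data (univariate sketch).

    init_mean and init_var are assumed defaults, used until enough data
    has arrived to give a strictly positive sample variance.
    """
    n, mean, m2 = 0, init_mean, 0.0
    var = init_var
    losses = []
    for x in xs:
        # Commit to the current (mean, var) first, then observe x.
        losses.append(gaussian_nll(x, mean, var))
        # Welford's update of the running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= 2 and m2 > 0.0:
            var = m2 / n  # maximum-likelihood (biased) sample variance
    return losses
```

The regret discussed in the abstract is the sum of these losses minus the total loss of the single best fixed (mean, variance) pair chosen in hindsight.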
Conference Paper
The basic assumption in classifier design is that the distribution from which the design sample is selected is the same as the distribution from which future objects will arise: i.e., that the training set is representative of the operating conditions. In many applications, this assumption is not valid. In this paper, we discuss sources of variation and possible approaches to handling it. We then focus on a problem in radar target recognition in which the operating sensor differs from the sensor used to gather the training data. For situations where the physical and processing models for the sensors are known, a solution based on Bayesian image restoration is proposed.
Conference Paper
Variable selection for regression is a classical statistical problem, motivated by concerns that too many covariates invite overfitting. Existing approaches notably include a class of convex optimisation techniques, such as the Lasso algorithm. Such techniques are invariably reliant on assumptions that are unrealistic in streaming contexts, namely that the data is available off-line and the correlation structure is static. In this paper, we relax both these constraints, proposing for the first time an online implementation of the Lasso algorithm with exponential forgetting. We also optimise the model dimension and the speed of forgetting in an online manner, resulting in a fully automatic scheme. In simulations our scheme improves on recursive least squares in dynamic environments, while also featuring model discovery and changepoint detection capabilities.