Joint Estimation of Model Parameters and Outlier Effects in Time Series

Authors:
  • IL-Data information Technology Co.

Abstract

Time series data are often subject to uncontrolled or unexpected interventions, from which various types of outlying observations are produced. Outliers in time series, depending on their nature, may have a moderate to significant impact on the effectiveness of the standard methodology for time series analysis with respect to model identification, estimation, and forecasting. In this article we use an iterative outlier detection and adjustment procedure to obtain joint estimates of model parameters and outlier effects. Four types of outliers are considered, and the issues of spurious and masking effects are discussed. The major differences between this procedure and those proposed in earlier literature include (a) the types and effects of outliers are obtained based on less contaminated estimates of model parameters, (b) the outlier effects are estimated simultaneously using multiple regression, and (c) the model parameters and the outlier effects are estimated jointly. The sampling behavior of the test statistics for cases of small to large sample sizes is investigated through a simulation study. The performance of the procedure is examined over a representative set of outlier cases. We find that the proposed procedure performs well in terms of detecting outliers and obtaining unbiased parameter estimates. An example is used to illustrate the application of the proposed procedure. It is demonstrated that this procedure performs effectively in avoiding spurious outliers and masking effects. The model parameter estimates obtained from the proposed procedure are typically very close to those estimated by the exact maximum likelihood method using an intervention model to incorporate the outliers.
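Point (b) of the abstract, estimating several outlier effects at once by multiple regression, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a white-noise series around a constant mean (so the regressors need no ARMA filtering), outlier locations and types that are already known, and a decay rate `delta = 0.7` for the transient change. The names `effect_pattern`, `solve`, and `least_squares` are hypothetical.

```python
import random

def effect_pattern(kind, n, t0, delta=0.7):
    """Unit-size effect of one outlier starting at index t0 (assumed shapes)."""
    if kind == "AO":   # additive outlier: a single pulse
        return [1.0 if t == t0 else 0.0 for t in range(n)]
    if kind == "LS":   # level shift: a permanent step
        return [1.0 if t >= t0 else 0.0 for t in range(n)]
    if kind == "TC":   # transient change: a pulse that decays geometrically
        return [delta ** (t - t0) if t >= t0 else 0.0 for t in range(n)]
    raise ValueError(f"unknown outlier type: {kind}")

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - s) / m[r][r]
    return x

def least_squares(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

# Simulated series: mean 10, an AO of size +5 at t=30, a TC of size -3 at t=60.
n = 100
rng = random.Random(0)
noise = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # small bounded noise (assumption)
ao = effect_pattern("AO", n, 30)
tc = effect_pattern("TC", n, 60)
y = [10.0 + 5.0 * a - 3.0 * c + e for a, c, e in zip(ao, tc, noise)]

# Both outlier effects are estimated in one regression, together with the mean.
intercept = [1.0] * n
mean_hat, ao_hat, tc_hat = least_squares([intercept, ao, tc], y)
print(round(mean_hat, 2), round(ao_hat, 2), round(tc_hat, 2))
```

Because both effects enter a single regression, each estimate is adjusted for the other. In the procedure described above, the locations and types would come from the detection stage rather than being supplied by hand.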
... The detection of anomalies in time series has received considerable attention in both the statistics (Chen and Liu 1993) and machine learning (Chandola et al. 2009) literature. This is no surprise given the broad range of applications, from fraud detection (Ferdousi and Maeda 2006) to fault detection (Theissler 2017; Zhao et al. 2018), that this area lends itself to. ...
Article
The ability to quickly and accurately detect anomalous structure within data sequences is an inference challenge of growing importance. This work extends recently proposed post-hoc (offline) anomaly detection methodology to the sequential setting. The resultant procedure is capable of real-time analysis and categorisation between baseline and two forms of anomalous structure: point and collective anomalies. Various theoretical properties of the procedure are derived. These, together with an extensive simulation study, highlight that the average run length to false alarm and the average detection delay of the proposed online algorithm are very close to those of the offline version. Experiments on simulated and real data are provided to demonstrate the benefits of the proposed method.
... The original time series were also interrogated using an automated anomaly detection algorithm to identify those observations that deviate significantly from its normal behaviour (López de Lacalle, 2019). The algorithm searched for two different types of anomaly, additive outliers, which are single observations that appear to be completely unrelated to the rest of the time series, and transient changes, which are similarly unexpected but the effect of the outlier diminishes exponentially over time (Chen and Liu, 1993). ...
Article
A catastrophic eruption destroyed the Tongan island of Hunga Tonga-Hunga Haʻapai at 04:14 UTC on 15 January 2022. This event triggered a series of enormous ripples that spread out across the Earth. Atmospheric pressure observations recorded outside and inside the British Cave Science Centre at Poole’s Cavern, Derbyshire, present evidence for two phases of anomalous behaviour between c. 18:30 and 20:30 UTC on 15 January and c. 01:30 and 02:30 UTC on 16 January. These are thought to have been initiated by a Lamb wave circling the Earth. Visual inspection also identified a series of smaller perturbations repeating approximately every six hours until 17 January and renewed instability culminating on 19 January. In contrast, automated anomaly detection only pinpointed the larger anomalies on 17 and 19 January. Further research is needed in order to confirm the existence of these later anomalies and to better understand their relationship with the volcanism in the southern Pacific.
... These interventions influence the time series data either on a single or a few datapoints, or they impact the whole process from some specific time T. Box and Tiao (1975) pioneer intervention detection and estimation analysis to solve the problem of Los Angeles pollution. Notable contributions and extensions concerning outliers in time series analysis include those by Chang et al. (1988), Chen and Liu (1993), Chareka et al. (2006) and others. Both known and unknown effects have practical aspects. ...
Article
External events are commonly known as interventions that often affect time series of counts. This research introduces a class of transfer function models that include four different types of interventions on integer-valued time series: abrupt start and abrupt decay (additive outlier), abrupt start and gradual decay (transient shift), abrupt start and permanent effect (level shift) and gradual start and permanent effect. We propose integer-valued transfer function models incorporating a generalized Poisson, log-linear generalized Poisson or negative binomial to estimate and detect these four types of interventions in a time series of counts. Using Bayesian methods, with adaptive Markov chain Monte Carlo (MCMC) algorithms for estimation, we further employ the deviance information criterion (DIC), posterior odds ratios and mean squared standardized residuals for model comparisons. As an illustration, this study evaluates the effectiveness of our methods through a simulation study and an application to crime data in Albury City, New South Wales (NSW), Australia. Simulation results show that the MCMC procedure is reasonably effective. The empirical outcome also reveals that the proposed models are able to successfully detect the locations and types of interventions.
... The drifter has a spatial accuracy of 2.5 m CEP (Circular Error Probability), defined as the radius of a circle centered on the true value that contains 50% of the actual GPS measurements. Trajectories from drifters are quality-controlled based on the ARIMA (auto-regressive integrated moving average) model [40,41] in order to identify and remove outliers. The mean surface velocity is calculated based on the traveling distance of a drifter divided by time, using the data that has been quality-controlled. ...
Article
The physical processes governing coastal exchange between the surf zone, the inner shelf, and the open ocean are critical for estimating mass exchange and its impact on ecological processes. The present study combined field measurements and theoretical approaches to explore the hydrodynamics in the coastal boundary layer (CBL) in which both bottom drag and shore friction affect the transport and mixing processes. Observed drifter-cluster trajectories in a nearly alongshore-uniform coastal area showed that the occurrence of current reversal varies with cross-shore distance, which confirmed the tidal phase difference between different cross-shore distances predicted by the proposed CBL model. According to the CBL model, tidal phase difference is affected by the bottom drag coefficient and horizontal eddy viscosity coefficient. With the results of three experiments under different wave conditions, this study also discusses the effects of waves on the CBL. Data analysis based on observations indicates that the bottom drag term is closely related to the bottom shear stress induced by the interactions of waves and currents. The bottom drag coefficient under the more energetic wave condition was much greater than that under milder wave conditions during the experiment. The study also suggests that in addition to pressure gradient and bottom drag, flow structure is subject to lateral stress, which reflects the impact of shoreline roughness in the nearshore region and that the estimated eddy viscosity coefficient decreases linearly with distance from the shoreline.
... Data were cleaned and prepared prior to analysis to increase the validity of the analytical approach. Outliers were identified and removed using the "tsclean()" function in R, which applies the Chen & Liu procedure (Chen & Liu, 1993). When the data were decomposed, a clear seasonal pattern was identified and subsequently removed using the "decompose()" function in R (R Core Team, 2013). ...
Article
The current study looks at participation on a popular message board website (reddit.com) to see if recent expansions of sports betting in the US correspond with the growth of a mutual support group. Data for the study included the number of weekly posts on the r/problemgambling message board from January 1, 2016 to December 31, 2020. The current study applies Interrupted Time Series Analysis using the introduction of online sports betting in the US outside of Nevada as the intervention point. Thematic analyses of forum post titles (17,041 titles) and full posts drawn from 75 randomly selected days for the data collection period (558 posts) were also conducted. Results show that, after the intervention, there was a significant immediate increase and significantly faster growth in the number of posts on r/problemgambling. Thematic analysis revealed increased discussion of American professional sports after the interruption date and criticism of States seeking to expand gambling availability. The growth of self-organizing online communities offers an opportunity to increase help-seeking for people experiencing harm related to their gambling participation. Monitoring these communities can provide early indication of the impacts of major policy changes on gambling behaviors.
Article
This work assesses the quality of Internet of Things data not only as an intrinsic quality, namely how well it represents the related phenomenon, but also in terms of how much information it contains to educate an artificial entity. The proposed quality metrics are tested with real datasets. They are also implemented on OpenCPU, so open data repositories can use them off-the-shelf to rate their datasets without computational cost and with minimal human intervention, making them more attractive to potential users and gaining visibility and impact.
Article
Environmental data often include outliers that may significantly affect further modelling and data analysis. Although a number of outlier detection methods have been proposed, their use is usually complicated by the assumption of the distribution or model of the analyzed data. However, environmental variables are quite often influenced by many different factors and their distribution is difficult to estimate. The envoutliers package has been developed to provide users with a choice of recently presented, semi-parametric outlier detection methods that do not impose requirements on the distribution of the original data. This paper briefly describes the methodology as well as its implementation in the package. The application is illustrated on real data examples.
Chapter
Annuity pricing is critical to insurance companies for their financial liabilities. Companies aim to adjust prices using the forecasting model that best fits their historical data, which may contain outliers that influence the model. Environmental conditions and extraordinary events such as a weak health system, an outbreak of war, and the occurrence of pandemics like Spanish flu or Covid-19 may cause outliers resulting in misevaluation of mortality rates. These outliers should be taken into account to preserve the financial strength and liability of the life insurance industry. In this study, we aim to determine if there is an impact of mortality jumps in annuity pricing. We question the annuity price fluctuations among different countries and two models on country characteristics. Moreover, we show the annuity pricing on a portfolio for a more comprehensive assessment. To achieve this, a simulated diverse portfolio is created for the prices of four types of life annuities. Canada, Japan, and the United Kingdom as developed countries with high longevity risk, and Russia and Bulgaria as emerging countries, are considered. The results of this study prove the use of outlier-adjusted models for specific countries.
Article
The resource curse literature has established that taxation of natural resources might limit the long-term development of fiscal capacity in resource-rich countries. This article explores if, and how, natural resource abundance generates fiscal dependence on natural resource revenues. We compare five peripheral economies of Latin America (Bolivia, Chile, Peru) and Scandinavia (Norway, Sweden) over a period of 90 years, between 1850 and 1939. Both groups were natural resource abundant, but in the latter natural resource dependence decreased over time. By using a novel database, we find that fiscal dependence was low in Norway and Sweden, while high and unstable in Bolivia, Chile and Peru. This suggests that natural resource abundance should not be mechanically linked to fiscal dependence. An accounting identity shows that sudden increases in fiscal dependence were related to both economic and political factors: countries’ economic diversification, and attitudes of the relevant political forces about how taxation affects the companies operating in the natural resource sector.
Article
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.
Article
Outliers in time series can be regarded as being generated by dynamic intervention models at unknown time points. Two special cases, innovational outlier (IO) and additive outlier (AO), are studied in this article. The likelihood ratio criteria for testing the existence of outliers of both types, and the criteria for distinguishing between them are derived. An iterative procedure is proposed for detecting IO and AO in practice and for estimating the time series parameters in autoregressive-integrated-moving-average models in the presence of outliers. The powers of the procedure in detecting outliers are investigated by simulation experiments. The performance of the proposed procedure for estimating the autoregressive coefficient of a simple AR(1) model compares favorably with robust estimation procedures proposed in the literature. Two real examples are presented.
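The distinction between the two outlier types can be made concrete for an AR(1) model. The following Python sketch rests on the usual textbook assumption that an AO perturbs a single observation, while an IO perturbs a single innovation and therefore propagates through the model dynamics. The function names and parameter values are hypothetical and chosen only for illustration.

```python
def simulate_ar1(phi, shocks):
    """Generate x[t] = phi * x[t-1] + shocks[t], starting from x[-1] = 0."""
    x, prev = [], 0.0
    for a in shocks:
        prev = phi * prev + a
        x.append(prev)
    return x

def ao_effect(n, t0, w):
    """Additive outlier: only the observation at t0 is displaced by w."""
    return [w if t == t0 else 0.0 for t in range(n)]

def io_effect_ar1(n, t0, w, phi):
    """Innovational outlier in an AR(1): the shock w at t0 decays as phi**j."""
    return [w * phi ** (t - t0) if t >= t0 else 0.0 for t in range(n)]

phi, n, t0, w = 0.5, 6, 2, 1.0
shocks = [0.0] * n                      # a noise-free series keeps the effect visible

# IO: add w to the innovation at t0, then run the model.
io_shocks = shocks[:]
io_shocks[t0] += w
io_series = simulate_ar1(phi, io_shocks)

# AO: run the model on clean innovations, then displace one observation.
clean = simulate_ar1(phi, shocks)
ao_series = [x + e for x, e in zip(clean, ao_effect(n, t0, w))]

print(io_series)  # the disturbance carries over into later observations
print(ao_series)  # the disturbance is confined to index t0
```

With `phi = 0.5` the IO disturbance halves at each step after `t0`, whereas the AO leaves every other observation untouched. That difference in footprint is what lets a test distinguish the two types.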
Article
Time series analysis, particularly intervention analysis, is commonly employed in impact studies of environmental data. Environmental time series are susceptible to exogenous variations and often contain various types of outliers. Outliers, depending upon the time of their occurrences and nature, can have substantial impact on the estimates of intervention effects and their test statistics. Hence, outlier detection and adjustment should be an indispensable part of an intervention analysis. In this paper, an iterative procedure for the joint estimation of model parameters and outlier effects is employed with the intervention analysis. We find that this joint estimation procedure not only produces more reliable estimates of intervention effects, but also provides information on outliers, which is valuable in many respects. As a special case of outlier adjustment, this joint estimation procedure can also be used to estimate the values of missing data in a time series. Two data sets are used to illustrate the application of intervention analysis with outlier adjustment. We find the results under the new approach more meaningful and informative than those of traditional analyses. A third data set is used to illustrate time series modeling with missing data and outliers. Outlier analysis is not only useful in providing information for retroactive investigation of unusual observations in a time series, but more importantly, it can be used in conjunction with data collection and thus improve the quality of the data.
Article
This article discusses the effect of interventions on a given response variable in the presence of dependent noise structure. Difference equation models are employed to represent the possible dynamic characteristics of both the interventions and the noise. Some properties of the maximum likelihood estimators of parameters measuring level changes are discussed. Two applications, one dealing with the photochemical smog data in Los Angeles and the other with changes in the consumer price index, are presented.
Article
Two models, the aberrant innovation model and the aberrant observation model, are considered to characterize outliers in time series. The approach adopted here allows for a small probability α that any given observation is ‘bad’ and in this set-up the inference about the parameters of an autoregressive model is considered.
Article
This paper examines the effect of correlation of observations on estimators of a mean which are designed to guard against the possibility of spurious observations (that is, observations generated in a manner not intended). The mean squared error, premium and protection of these estimators are evaluated and discussed for some specific correlation structures.
Article
The effect of an additive outlier upon the accuracy of forecasts derived from extrapolative methods is investigated. It is demonstrated that an outlier affects not only the accuracy of the forecasts at the time of occurrence but also subsequent forecasts. Methods to adjust for additive outliers are discussed. The results of the paper are illustrated with two examples.
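The carry-over of an additive outlier into later forecasts can be seen in a small Python sketch. Simple exponential smoothing is used here purely as an assumed example of an extrapolative method; the smoothing constant, the series values, and the name `ses_forecasts` are illustrative and not taken from the paper.

```python
def ses_forecasts(y, alpha, level0):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the forecast of y[t] made with data up to t-1.
    """
    level = level0
    forecasts = []
    for obs in y:
        forecasts.append(level)
        level = alpha * obs + (1 - alpha) * level  # update after seeing y[t]
    return forecasts

clean = [10.0] * 8
dirty = clean[:]
dirty[3] += 8.0                      # additive outlier of size 8 at t = 3

f_clean = ses_forecasts(clean, alpha=0.5, level0=10.0)
f_dirty = ses_forecasts(dirty, alpha=0.5, level0=10.0)

# Distortion of each forecast caused by the single outlier.
distortion = [d - c for d, c in zip(f_dirty, f_clean)]
print(distortion)
```

The forecast for `t = 3` misses the outlying value itself, and every later forecast is pulled away from the clean level by an amount that shrinks by the factor `1 - alpha` at each step.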
Article
Outliers, level shifts, and variance changes are commonplace in applied time series analysis. However, their existence is often ignored and their impact is overlooked, for the lack of simple and useful methods to detect and handle those extraordinary events. The problem of detecting outliers, level shifts, and variance changes in a univariate time series is considered. The methods employed are extremely simple yet useful. Only the least squares techniques and residual variance ratios are used. The effectiveness of these simple methods is demonstrated by analysing three real data sets.