Forecasting with Robust Exponential Smoothing with Damped Trend and Seasonal Components

Ruben Crevits and Christophe Croux
October 27, 2016
Abstract
We provide a robust alternative for the exponential smoothing forecaster of Hyndman
and Khandakar (2008). For each method of a class of exponential smoothing variants we
present a robust alternative. The class includes methods with a damped trend and/or
seasonal components. The robust method is developed by robustifying every aspect of the
original exponential smoothing variant. We provide robust forecasting equations, robust
initial values, robust smoothing parameter estimation and a robust information criterion.
The method is implemented in an R-package: robets. We compare the standard non-
robust version with the robust alternative in a simulation study. The methodology is
illustrated with an application.
Corresponding author: Faculty of Economics and Business, KU Leuven, Naamsestraat 69, B-3000
Leuven, Belgium. Email address: ruben.crevits@kuleuven.be, Phone: +32 16 324758

1 Introduction
In time series analysis exponential smoothing methods are popular because they are
straightforward and the whole forecasting procedure can happen automatically. Simple
exponential smoothing, sometimes called single exponential smoothing, is the most
basic method. It is a suitable method if the time series has no trend or seasonality, but a
slowly varying mean. Given a time series $y_1, \ldots, y_t$, the forecasts are
$$
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t \\
\ell_t &= \alpha y_t + (1 - \alpha)\ell_{t-1}
\end{aligned}
\tag{1.1}
$$
with $\hat{y}_{t+h|t}$ the h-step ahead forecast. The degree of smoothing is determined by the
smoothing parameter α, which is usually estimated by minimizing the sum of squared
prediction errors. For trending and seasonal time series there is the Holt-Winters method.
It is also referred to as double exponential smoothing or exponential smoothing with
additive trend and seasonal component. It has additional parameters β and γ which
determine the smoothing rate of the trend and the seasonal component. Both methods
were introduced in the late fifties. In 1969 Pegels suggested a multiplicative trend and
seasonal component. Later Gardner (1985) proposed exponential smoothing with damped
additive trend. Taylor (2003) shows that a damped multiplicative trend is also useful.
Damping a trend has in particular an advantage for long forecasting horizons h. The
forecast doesn’t go to infinity as with the regular additive or multiplicative trend, but
converges to a finite value. The extra parameter φ determines the rate at which this
happens.
A disadvantage of exponential smoothing methods is that they are not outlier robust.
An observation has an unbounded influence on each subsequent forecast. The selection of
the smoothing parameters is also affected, since these are estimated by minimizing a sum
of squared forecasting errors. In the past there have been efforts to make exponential
smoothing methods robust. Gelper et al. (2010) proposed a methodology for robust
exponential smoothing. They also provided a way to estimate the smoothing parameters
robustly. Cipra and Hanzak (2011) proposed an alternative robust exponential smoothing
scheme; Croux et al. (2008) had supplied a numerically stable algorithm for an earlier
version of that proposal. A multivariate version of the simple exponential smoothing
recursions was robustified by Croux et al. (2010).
In this paper we aim to extend the existing robust methods to a more general class of
exponential smoothing variants, including (damped) additive trends and additive or
multiplicative seasonal components. The outline of the paper is as follows. In Section 2
we review the class of exponential smoothing methods. In Section 3 we propose the robust
methods: for each variant we robustify the recursions, the smoothing parameter estimation
and the choice of the starting values. Section 4 presents the R package robets. In
Section 5 the robust methods are tested and compared in a simulation study. In Section 6
the performance on a real data set is measured and compared.
2 Exponential smoothing methods
We use the taxonomy of Hyndman et al. (2005) to describe the class of fifteen exponential
smoothing models. Each model can be described by three letters:
E, underlying error model: A (additive) or M (multiplicative),
T, type of trend: N (none), A (additive) or Ad (damped) and
S, type of seasonal component: N (none), A (additive) or M (multiplicative).
For example, MAN is exponential smoothing with an additive trend, no seasonal component
and a multiplicative underlying error model. All considered combinations are shown
in Table 1. The combinations ANM, AAM and AAdM are omitted because their prediction
intervals are not derived in Hyndman et al. (2005). Models with a multiplicative
(damped) trend are avoided for the same reason. In the next subsections we describe the
considered models in more detail.
                                     Seasonal
Trend                 N (none)     A (additive)   M (multiplicative)
N (none)              ANN/MNN      ANA/MNA        MNM
A (additive)          AAN/MAN      AAA/MAA        MAM
Ad (damped)           AAdN/MAdN    AAdA/MAdA      MAdM
forecasting formula   (2.1)        (2.2)          (2.3)
Table 1: The fifteen considered exponential smoothing methods.

2.1 Trend (T)
The forecasting equations of simple exponential smoothing are shown in equation (1.1).
However, some time series move more persistently in one direction. For such series a full
trend (A) or a damped trend (Ad) might be useful. Suppose we have a time series $y_t$,
which is observed at $t = 1, \ldots, T$. The forecasts of AAdN and MAdN can be computed
with a recursive scheme:
$$
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t + \sum_{j=1}^{h} \phi^j b_t \\
\ell_t &= \alpha y_t + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1}) \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1 - \beta)\phi b_{t-1}
\end{aligned}
\tag{2.1}
$$
with $\hat{y}_{t+h|t}$ the forecast of $y_{t+h}$ made at time $t$. By setting $\phi = 0$, we have the
forecasting equations of ANN/MNN or simple exponential smoothing without trend. Setting
$\phi = 1$ gives the equations of AAN/MAN or exponential smoothing with a full additive trend.
The smoothing parameter $\alpha$ determines the rate at which the level $\ell_t$ is allowed to change.
If it is close to zero the level stays almost constant and if it is one the level follows the
observations exactly. The parameter $\beta$ determines the rate at which the trend may
change. The extra parameter $\phi$ determines how fast the local trend is damped.
Indeed, the long-term forecast converges to $\ell_t + \frac{\phi}{1 - \phi} b_t$ if $h \to \infty$. All these parameters
take values between zero and one.
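The recursions in (2.1) can be sketched in a few lines of base R. This is only an illustrative sketch: the function name damped_trend_forecast, its argument names and the toy series are assumptions made here, and it is not the implementation used in the robets package.

```r
# Sketch of the damped-trend recursions (2.1); all names are illustrative.
damped_trend_forecast <- function(y, alpha, beta, phi, l0, b0, h = 1) {
  level <- l0
  trend <- b0
  for (t in seq_along(y)) {
    prev_level <- level
    # level update: weighted average of the observation and the one-step forecast
    level <- alpha * y[t] + (1 - alpha) * (prev_level + phi * trend)
    # trend update: weighted average of the level change and the damped old trend
    trend <- beta * (level - prev_level) + (1 - beta) * phi * trend
  }
  # forecasts for horizons 1..h: level plus (phi + phi^2 + ... + phi^j) * trend
  horizons <- seq_len(h)
  level + cumsum(phi^horizons) * trend
}

# Toy series (made up for illustration) and a long forecast horizon
y <- c(1.00, 1.06, 1.09, 1.15, 1.21, 1.24, 1.31, 1.35)
fc <- damped_trend_forecast(y, alpha = 0.36, beta = 0.21, phi = 0.9,
                            l0 = 1, b0 = 0.05, h = 50)
```

With a damping parameter below one, the increments between successive forecasts shrink geometrically, so fc levels off near the limit $\ell_t + \frac{\phi}{1-\phi} b_t$ given in the text.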
2.2 Seasonal component (S)
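It is also possible to model slowly changing seasonality effects. In models ANA/MNA,
AAA/MAA and AAdA/MAdA we add a seasonal component:
$$
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t + \sum_{j=1}^{h} \phi^j b_t + s_{t-m+h_m^+} \\
\ell_t &= \alpha(y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1}) \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1 - \beta)\phi b_{t-1} \\
s_t &= \gamma(y_t - \ell_{t-1} - \phi b_{t-1}) + (1 - \gamma)s_{t-m}
\end{aligned}
\tag{2.2}
$$
with $h_m^+ = \lfloor (h - 1) \bmod m \rfloor + 1$. The number of seasons is $m$. The seasonal smoothing
parameter is $\gamma$. If the value is high, the seasonal components will quickly follow changes
in seasonality.
It turns out that for a lot of time series a multiplicative seasonal component is more
suitable. The forecasting equations for methods MNM, MAM and MAdM are:
$$
\begin{aligned}
\hat{y}_{t+h|t} &= \Big(\ell_t + \sum_{j=1}^{h} \phi^j b_t\Big)\, s_{t-m+h_m^+} \\
\ell_t &= \alpha \frac{y_t}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1}) \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1 - \beta)\phi b_{t-1} \\
s_t &= \gamma \frac{y_t}{\ell_{t-1} + \phi b_{t-1}} + (1 - \gamma)s_{t-m}
\end{aligned}
\tag{2.3}
$$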
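The seasonal index $h_m^+$ simply cycles through the last $m$ seasonal states. A small base R sketch of the additive forecast equation in (2.2) makes this explicit; the function names and the convention that season stores the states oldest first are assumptions made for this illustration.

```r
# Seasonal offset h_m^+ = ((h - 1) mod m) + 1, as defined after (2.2).
seasonal_offset <- function(h, m) ((h - 1) %% m) + 1

# Additive seasonal forecast from the final states at time t.
# 'season' holds the last m seasonal states s_{t-m+1}, ..., s_t (oldest first),
# so season[seasonal_offset(h, m)] corresponds to s_{t-m+h_m^+}.
seasonal_forecast <- function(level, trend, season, phi, h) {
  m <- length(season)
  horizons <- seq_len(h)
  level + cumsum(phi^horizons) * trend + season[seasonal_offset(horizons, m)]
}

seasonal_offset(1:10, m = 4)   # 1 2 3 4 1 2 3 4 1 2
```

For the multiplicative forecast in (2.3) the same index is used, with the seasonal state multiplying the trend-adjusted level instead of being added to it.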
2.3 Underlying models (E)
Hyndman et al. (2002) derived underlying models for each exponential smoothing variant.
Assuming a certain model is necessary to make prediction intervals, which in turn are
needed for outlier detection. It is also possible to set up a likelihood function to estimate
smoothing parameters. For simple exponential smoothing (defined by (1.1)), the additive
error model is
$$
\begin{aligned}
y_t &= \ell_{t-1} + \epsilon_t \\
\ell_t &= \ell_{t-1} + \alpha\epsilon_t
\end{aligned}
\tag{2.4}
$$
and the multiplicative error model is
$$
\begin{aligned}
y_t &= \ell_{t-1}(1 + \epsilon_t) \\
\ell_t &= \ell_{t-1}(1 + \alpha\epsilon_t).
\end{aligned}
\tag{2.5}
$$
It is possible to check that both underlying models have the same optimal point forecasts.
The single source of error is $\epsilon_t \sim N(0, \sigma^2)$. For the multiplicative model the lower tail is
truncated such that $1 + \epsilon_t$ is positive. Because $\sigma^2$ is usually small in multiplicative models,
this truncation is negligible. The model with multiplicative errors is used when the
observations are strictly positive and when we expect that the error grows proportionally
with the observation value. For the models of other exponential smoothing methods, we
refer to Hyndman et al. (2002).
The underlying models are useful for prediction intervals. For the additive error
models the prediction interval at forecast horizon $h = 1$ is
$$
\left[\hat{y}_{t+1|t} - q\sigma,\; \hat{y}_{t+1|t} + q\sigma\right]
$$
with $q \approx 2$ for a 95% interval. For the multiplicative error models the interval is
$$
\left[\hat{y}_{t+1|t}(1 - q\sigma),\; \hat{y}_{t+1|t}(1 + q\sigma)\right].
$$
The assumed underlying model determines the prediction intervals and the likelihood.
The prediction intervals will be useful to determine whether an observation is an outlier.
An often heard remark about exponential smoothing models is that they are a subclass
of ARIMA models. Some models can indeed be written as ARIMA models: ANN,
AAN, AAdN, ANA, AAA, and AAdA. Hyndman and Athanasopoulos (2013) is a good
reference for how these models are related to ARIMA models. It should be noted
that the parameterization of these models is different. For the seasonal models there are
restrictions on the parameters of the equivalent ARIMA model, which makes the exponential
smoothing model less complex than its equivalent ARIMA model. The multiplicative models
cannot be represented as ARIMA models. For robust ARMA estimation we refer to
Muler et al. (2009).
3 Robust exponential smoothing methods
We first adapt the forecasting equations to make them robust. Next we provide a robust
way to estimate the smoothing parameters. The estimates are the solution of an
optimization problem, which needs to be solved numerically. Good starting values are
important to reach a solution that is close to the global optimum. Therefore we also select
the starting values in a robust way. Finally, a robust information criterion is suggested to
compare several exponential smoothing variants.
3.1 Robust forecasting equations
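For all considered exponential smoothing variants we robustify the forecasting equations
by replacing each observation $y_t$ with a cleaned version $y_t^*$. If the one step ahead forecast
error $y_t - \hat{y}^*_{t|t-1}$ exceeds, in absolute value, $k$ times the scale, we consider the observation
to be an outlier. The one step ahead predictions $\hat{y}^*_{t|t-1}$ are the predictions if the
observations would have been $y_1^*, y_2^*, \ldots, y_{t-1}^*$. Our choice for $k$ is 3. If the one step
prediction error would follow a normal distribution this is equivalent with classifying
observations outside the 99.7% prediction interval as outliers. An outlier is replaced by a
cleaned observation equal to the prediction plus or minus $k$ times the scale:
$$
y_t^* = \psi\left[\frac{y_t - \hat{y}^*_{t|t-1}}{\hat{\sigma}_t}\right]\hat{\sigma}_t + \hat{y}^*_{t|t-1}
\tag{3.1}
$$
with $\psi$ the Huber function
$$
\psi(x) =
\begin{cases}
x & \text{if } |x| < k \\
\operatorname{sign}(x)\,k & \text{otherwise}
\end{cases}
$$
and with $\hat{\sigma}_t$ an estimate of the scale of the one step ahead prediction error. The scale
can be estimated recursively in a robust way as in Gelper et al. (2010)
$$
\hat{\sigma}_t^2 = \lambda_\sigma\, \rho\left(\frac{y_t - \hat{y}^*_{t|t-1}}{\hat{\sigma}_{t-1}}\right)\hat{\sigma}_{t-1}^2 + (1 - \lambda_\sigma)\hat{\sigma}_{t-1}^2
\tag{3.2}
$$
with $\rho$ the bounded biweight function
$$
\rho_{\text{biweight}}(x) =
\begin{cases}
c_k\left(1 - \left(1 - (x/k)^2\right)^3\right) & \text{if } |x| < k \\
c_k & \text{otherwise}
\end{cases}
$$
with $k = 3$, $c_k = 4.12$ and with $\lambda_\sigma = 0.1$. If we assume an underlying multiplicative
model, we update the relative scale:
$$
\hat{\sigma}_t^2 = \lambda_\sigma\, \rho\left[\frac{y_t - \hat{y}^*_{t|t-1}}{\hat{y}^*_{t|t-1}\,\hat{\sigma}_{t-1}}\right]\hat{\sigma}_{t-1}^2 + (1 - \lambda_\sigma)\hat{\sigma}_{t-1}^2.
\tag{3.3}
$$
Outliers are replaced if the relative error is more than plus or minus $k$ times the relative
scale:
$$
y_t^* = \left(1 + \psi\left[\frac{y_t - \hat{y}^*_{t|t-1}}{\hat{y}^*_{t|t-1}\,\hat{\sigma}_t}\right]\hat{\sigma}_t\right)\hat{y}^*_{t|t-1}
\tag{3.4}
$$
The robust forecasting equations for each method are the same as the non-robust
forecasting equations (2.1), (2.2) and (2.3), except that $y_t$ is replaced by $y_t^*$.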
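To make the cleaning step concrete, the following base R sketch applies (3.1) and (3.2) to simple exponential smoothing with additive errors. It is a minimal illustration under stated assumptions: the smoothing parameter, the starting level and the starting scale are taken as given, the function names are made up here, and the code is not taken from the robets package.

```r
# Huber psi function: identity inside [-k, k], clipped to -k or k outside.
huber_psi <- function(x, k = 3) pmin(pmax(x, -k), k)

# Bounded biweight rho function with the constants quoted in the text.
biweight_rho <- function(x, k = 3, ck = 4.12) {
  ifelse(abs(x) < k, ck * (1 - (1 - (x / k)^2)^3), ck)
}

# Robust simple exponential smoothing (additive errors), following (3.1)-(3.2).
robust_ses <- function(y, alpha, l0, sigma0, k = 3, lambda_sigma = 0.1) {
  n <- length(y)
  level <- l0
  sigma <- sigma0
  cleaned <- numeric(n)
  outlier <- logical(n)
  for (t in seq_len(n)) {
    pred <- level            # one-step-ahead prediction based on cleaned data
    err <- y[t] - pred
    # scale update (3.2): based on the previous scale estimate
    sigma <- sqrt(lambda_sigma * biweight_rho(err / sigma, k) * sigma^2 +
                    (1 - lambda_sigma) * sigma^2)
    # cleaning step (3.1): based on the updated scale estimate
    cleaned[t] <- huber_psi(err / sigma, k) * sigma + pred
    outlier[t] <- abs(err) > k * sigma
    # level recursion of (1.1) with the cleaned observation
    level <- alpha * cleaned[t] + (1 - alpha) * level
  }
  list(cleaned = cleaned, outlier = outlier, level = level, sigma = sigma)
}

# Toy series with one gross outlier at position 4
y <- c(1, 1.02, 0.98, 5, 1.01, 0.99)
res <- robust_ses(y, alpha = 0.3, l0 = 1, sigma0 = 0.05)
which(res$outlier)   # position of the flagged observation
```

Because both $\psi$ and $\rho$ are bounded, a single extreme observation can move the level by at most $\alpha k \hat{\sigma}_t$ and can inflate the squared scale by at most a fixed factor, which is what limits its influence on later forecasts.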
3.2 Robust parameter estimation
The parameters to be optimized are
$$
\theta = (\alpha, \beta, \phi, \gamma).
$$
The initial values are not part of $\theta$; we choose them via a robust heuristic discussed in
Section 3.3. Depending on the model being estimated, the parameters involving a (damped)
trend ($\phi$, $\beta$) or a seasonal component ($\gamma$) are not included in $\theta$. We will suggest a
robust way to estimate the parameters, but first we review some non-robust estimators.
3.2.1 Maximum likelihood
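If we assume a single source of error state space model, we can use the procedure of Ord
et al. (1997) to estimate the parameters. They set up a likelihood function which can be
maximized. In Hyndman et al. (2002) the likelihood is derived for exponential smoothing
models. If an additive error model is assumed the maximum likelihood estimate is
$$
(\hat{\theta}, \hat{\sigma}) = \underset{\theta,\,\sigma}{\operatorname{argmax}}\; -\frac{T}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\left(y_t - \hat{y}_{t|t-1}(\theta)\right)^2
\tag{3.5}
$$
with $\hat{y}_{t|t-1}(\theta)$ the one step ahead prediction using the parameter set $\theta$. The error variance
$\sigma^2$ is not of direct interest but is estimated as well. The likelihood is maximal if
$$
\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_{t|t-1}(\theta)\right)^2.
$$
The parameters can then simply be estimated by
$$
\hat{\theta} = \underset{\theta}{\operatorname{argmax}}\; -\frac{T}{2}\log\left(\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_{t|t-1}(\theta)\right)^2\right).
\tag{3.6}
$$
This is actually a least squares estimate.
With a multiplicative error model, the estimate is
$$
(\hat{\theta}, \hat{\sigma}) = \underset{\theta,\,\sigma}{\operatorname{argmax}}\; -\frac{T}{2}\log\sigma^2 - \sum_{t=1}^{T}\log\hat{y}_{t|t-1}(\theta) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\left(\frac{y_t - \hat{y}_{t|t-1}(\theta)}{\hat{y}_{t|t-1}(\theta)}\right)^2.
\tag{3.7}
$$
By setting the derivative with respect to $\sigma$ equal to zero, we find
$$
\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{y_t - \hat{y}_{t|t-1}(\theta)}{\hat{y}_{t|t-1}(\theta)}\right)^2.
$$
The parameters can then be estimated by
$$
\hat{\theta} = \underset{\theta}{\operatorname{argmax}}\; -\frac{T}{2}\log\left(\frac{1}{T}\sum_{t=1}^{T}\left(\frac{y_t - \hat{y}_{t|t-1}(\theta)}{\hat{y}_{t|t-1}(\theta)}\right)^2\right) - \sum_{t=1}^{T}\log\hat{y}_{t|t-1}(\theta).
\tag{3.8}
$$
Other alternatives are minimizing the mean absolute percentage error (MAPE), the residual
variance or an average sum of squared prediction errors at several horizons (AMSE).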
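For simple exponential smoothing with additive errors, the concentrated likelihood (3.6) depends on a single parameter and can be maximized with the one-dimensional optimizer optimize from base R. The sketch below is illustrative only: the function names are made up, the toy data are simulated, and fixing the starting level at the first observation is a simplification assumed here, not the initialization used in the paper.

```r
# One-step-ahead prediction errors of simple exponential smoothing (1.1).
ses_errors <- function(y, alpha, l0) {
  level <- l0
  err <- numeric(length(y))
  for (t in seq_along(y)) {
    err[t] <- y[t] - level
    level <- alpha * y[t] + (1 - alpha) * level
  }
  err
}

# Concentrated log-likelihood (3.6) for the additive error model.
conc_loglik <- function(alpha, y, l0) {
  e <- ses_errors(y, alpha, l0)
  -length(y) / 2 * log(mean(e^2))
}

# Simulated toy data: slowly varying mean plus noise
set.seed(1)
y <- 1 + cumsum(rnorm(60, sd = 0.02)) + rnorm(60, sd = 0.05)

# Maximize over alpha; l0 is fixed at the first observation (a simplification)
fit <- optimize(conc_loglik, interval = c(0.01, 0.99), y = y, l0 = y[1],
                maximum = TRUE)
fit$maximum   # estimated alpha
```

Since the logarithm is monotone, maximizing conc_loglik is the same as minimizing the mean of squared one-step-ahead errors, which is why the text calls (3.6) a least squares estimate.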
3.2.2 Maximum robustified likelihood
None of these methods is outlier robust. We suggest replacing the sum of squares by
a $\tau^2$ estimator in the likelihood functions. The $\tau^2$ is a robust estimator of scale proposed
by Yohai and Zamar (1988). We will use exactly the same variant as Gelper et al. (2010).
Suppose a set of residuals of some computation $e_1, \ldots, e_T$ and assume they have a normal
distribution. An asymptotically unbiased estimator of the scale is
$$
\hat{\sigma}^2\left(\{e_t\}_{1 \le t \le T}\right) = \frac{1}{T}\sum_{t=1}^{T} e_t^2.
\tag{3.9}
$$
A robust alternative is the $\tau^2$. It is consistent and has a breakdown point of 50%. It is
computed as follows:
$$
\tau^2\left(\{e_t\}_{1 \le t \le T}\right) = \frac{s_T^2}{T}\sum_{t=1}^{T}\rho\left(\frac{e_t}{s_T}\right)
\tag{3.10}
$$
with $s_T = 1.4826\,\operatorname{med}_t |e_t|$ and with $\rho$ again the biweight $\rho$ function with $k = 2$. The
bound on $\rho$ makes the $\tau^2$ estimator very robust to outlying observations.
If we assume an additive or a multiplicative error model, the errors are normally
distributed. To achieve robustness, we replace the mean sum of squared errors by the $\tau^2$
estimator. For the additive model the robust (concentrated) likelihood is:
$$
\text{roblik}_A(\theta) = -\frac{T}{2}\log\left[\frac{s_T^2(\theta)}{T}\sum_{t=1}^{T}\rho\left[\frac{y_t - \hat{y}^*_{t|t-1}(\theta)}{s_T(\theta)}\right]\right]
\tag{3.11}
$$
with $s_T(\theta) = 1.4826\,\operatorname{med}_t \left|y_t - \hat{y}^*_{t|t-1}(\theta)\right|$. The estimator is
$$
\hat{\theta} = \underset{\theta}{\operatorname{argmax}}\; \text{roblik}_A(\theta).
\tag{3.12}
$$
For the multiplicative model the robust likelihood is
$$
\text{roblik}_M(\theta) = -\frac{T}{2}\log\left[\frac{s_T^2(\theta)}{T}\sum_{t=1}^{T}\rho\left[\frac{y_t - \hat{y}^*_{t|t-1}(\theta)}{\hat{y}^*_{t|t-1}(\theta)\,s_T(\theta)}\right]\right] - \sum_{t=1}^{T}\log\hat{y}^*_{t|t-1}(\theta)
\tag{3.13}
$$
with $s_T(\theta) = 1.4826\,\operatorname{med}_t \left|\frac{y_t - \hat{y}^*_{t|t-1}(\theta)}{\hat{y}^*_{t|t-1}(\theta)}\right|$. However, this robust likelihood behaves badly. If
there exists a parameter $\theta$ such that one prediction $\hat{y}^*_{t|t-1}(\theta)$ is close to zero, the robust
likelihood can become unbounded due to the robustness of the $\tau^2$ estimator. Such a
degenerate solution should be avoided. Therefore we minimize a robust version of the
mean squared percentage error instead. The estimator is then
$$
\hat{\theta} = \underset{\theta}{\operatorname{argmin}}\; \frac{s_T^2(\theta)}{T}\sum_{t=1}^{T}\rho\left[\frac{y_t - \hat{y}^*_{t|t-1}(\theta)}{\hat{y}^*_{t|t-1}(\theta)\,s_T(\theta)}\right].
\tag{3.14}
$$
In order to solve the numerical optimization problems of (3.12) and (3.14), we need
an initial guess for $\theta$. In Hyndman and Khandakar (2008) the initial values of the
smoothing parameters $(\alpha, \beta, \phi, \gamma)$ are chosen in a data independent way. Therefore there
is no robustness issue here, and that is why we choose the initial values in exactly the
same way.
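The $\tau^2$ estimator of (3.10) is short enough to write out in base R. The text gives $c_k = 4.12$ for $k = 3$ but does not quote the constant for $k = 2$, so the sketch below assumes that $c_k$ is chosen such that $E[\rho(Z)] = 1$ for a standard normal $Z$ and computes it numerically; for $k = 3$ this assumption gives a value close to 4.12. All function names are illustrative.

```r
# Consistency constant ck, assuming it is defined by E[rho(Z)] = 1 for
# Z ~ N(0, 1). This is an assumption; the text only quotes ck = 4.12 for k = 3.
biweight_ck <- function(k) {
  core <- function(x) (1 - (1 - (x / k)^2)^3) * dnorm(x)
  inside <- integrate(core, -k, k)$value   # contribution of |x| < k
  1 / (inside + 2 * pnorm(-k))             # plus the tail mass where rho = ck
}

# Bounded biweight rho function.
biweight_rho <- function(x, k, ck = biweight_ck(k)) {
  ifelse(abs(x) < k, ck * (1 - (1 - (x / k)^2)^3), ck)
}

# tau^2 scale estimator (3.10) with s_T = 1.4826 * median(|e_t|).
tau2 <- function(e, k = 2) {
  s <- 1.4826 * median(abs(e))
  s^2 * mean(biweight_rho(e / s, k))
}

# Simulated residuals with scale 0.05, then 5% of them replaced by gross errors
set.seed(42)
e <- rnorm(200, sd = 0.05)
e_out <- e
e_out[1:10] <- 1
sqrt(mean(e_out^2))   # classical root mean squared error, inflated by the outliers
sqrt(tau2(e_out))     # root tau^2, which stays close to the clean scale
```

Each outlier contributes at most $c_k$ to the sum in (3.10), whereas it contributes its full squared value to (3.9); this is the bound on $\rho$ that the text refers to.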
3.3 Robust initial values
The initial values can be found via a robust heuristic. Although effects of the initial
values $\ell_0$, $b_0$ and $s_{-m+1}$ till $s_0$ decay exponentially, it is still important to also select these
values in a robust way. The estimation of the parameters may be heavily influenced by
non-robustly chosen initial values, especially because exponential smoothing methods are
often used for modeling short time series.
The initial values are found by using a small startup period of observations $y_1, \ldots, y_S$.
We take $S = 5m$ with $m$ the number of seasons. If $5m > T$, we choose $S = T$. The
standard non-robust way to find $\ell_0$ and $b_0$ is to regress $y_1, \ldots, y_S$ on $t = 1, \ldots, S$, resulting
in an estimate of the intercept $\hat{\ell}_0$ and of the slope coefficient $\hat{b}_0$. The initial values are
$$
\ell_0 = \hat{\ell}_0, \qquad b_0 = \hat{b}_0.
$$
The suggested robust alternative is doing a robust regression. The repeated median is
such a method. Fried (2004) applied it to discover trends in short time series. The
estimates are
$$
\hat{\ell}_0 = \operatorname{med}_i\left(y_i - \hat{b}_0\, i\right) \quad \text{and} \quad \hat{b}_0 = \operatorname{med}_i \operatorname{med}_{j \ne i} \frac{y_i - y_j}{i - j}
$$
with $i, j = 1, \ldots, S$. The initial seasonal components are usually found by taking the mean
difference from the regression line $\hat{y}_t = \hat{\ell}_0 + \hat{b}_0 t$ for each season. We will take the median
difference:
$$
s_{q-m} = \operatorname{med}\left[y_q - \hat{y}_q,\; y_{q+m} - \hat{y}_{q+m},\; \ldots,\; y_{q+S-m} - \hat{y}_{q+S-m}\right] \quad \text{for } q = 1, \ldots, m.
$$
If the seasonal component is multiplicative, the computation is slightly different:
$$
s_{q-m} = \operatorname{med}\left[\frac{y_q}{\hat{y}_q},\; \frac{y_{q+m}}{\hat{y}_{q+m}},\; \ldots,\; \frac{y_{q+S-m}}{\hat{y}_{q+S-m}}\right] \quad \text{for } q = 1, \ldots, m.
$$
To be robust the startup period $S$ should be at least three times the period $m$, because
otherwise the median is just equal to the mean, which isn't robust. An initial guess
$\hat{\sigma}_0$ of the scale is the median absolute deviation (MAD) of the residuals of the robust
regression.
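The repeated median estimates and the median-based seasonal start values can be sketched in base R as follows. This is a plain quadratic-time illustration with made-up function names and toy data, not the code of the robets package; it covers the additive seasonal case only.

```r
# Repeated median regression of y_1..y_S on t = 1..S.
repeated_median <- function(y) {
  S <- length(y)
  idx <- seq_len(S)
  # for each i: median over j != i of the pairwise slopes (y_i - y_j) / (i - j)
  inner <- vapply(idx, function(i) {
    j <- idx[-i]
    median((y[i] - y[j]) / (i - j))
  }, numeric(1))
  b0 <- median(inner)
  l0 <- median(y - b0 * idx)
  list(l0 = l0, b0 = b0)
}

# Additive initial seasonal components: per-season median deviation from the
# robust regression line; element q of the result corresponds to s_{q-m}.
initial_seasonal <- function(y, m, fit = repeated_median(y)) {
  resid <- y - (fit$l0 + fit$b0 * seq_along(y))
  season <- (seq_along(y) - 1) %% m + 1
  as.numeric(tapply(resid, season, median))
}

# Toy startup period: an exact line with one gross outlier
y <- 2 + 0.5 * (1:20)
y[7] <- 100
fit <- repeated_median(y)      # robust intercept and slope
coef(lm(y ~ seq_along(y)))     # least squares fit, pulled away by the outlier
```

For a multiplicative seasonal component the residuals in initial_seasonal would be replaced by the ratios of the observations to the fitted line, as in the second formula above.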
3.4 Robust information criterion
We suggest a robust information criterion to compare the models. The definition
of the robust AIC is
$$
\text{robaic} = -2\,\text{roblik} + 2p
\tag{3.15}
$$
with $p$ the number of parameters in the optimization routine. The formulas for a robust
BIC and AICC are
$$
\text{robbic} = -2\,\text{roblik} + \log(T)\,p
\tag{3.16}
$$
and
$$
\text{robaicc} = -2\,\text{roblik} + 2\,\frac{pT}{T - p - 1}.
\tag{3.17}
$$
These definitions are the same as in Hyndman and Khandakar (2008), but with the
robustified likelihood. Note that although in the multiplicative model the parameters are
found via (3.14), the robust likelihood of (3.13) is used in the information criterion.
The robust AICC (3.17) will be used to compare several exponential smoothing variants.
In order to apply the method, all models from Table 1 will be computed and the
one with the lowest robust AICC will be selected.
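As a quick check on (3.17), and assuming the usual small-sample correction of the AIC (the common "AIC plus $2p(p+1)/(T-p-1)$" form, which the text does not spell out), the penalty term can be rewritten as follows:

```latex
\begin{aligned}
\text{robaic} + \frac{2p(p+1)}{T-p-1}
  &= -2\,\text{roblik} + \frac{2p(T-p-1) + 2p(p+1)}{T-p-1} \\
  &= -2\,\text{roblik} + \frac{2pT}{T-p-1}
   = \text{robaicc}.
\end{aligned}
```

The penalty is only defined for $T > p + 1$ and approaches the AIC penalty $2p$ as $T$ grows, so the correction matters mainly for the short series considered in this paper.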
4 R-package: robets
We adapted the ets function of the forecast package of Hyndman and Khandakar (2008)
to a robust version called robets. The function has the same possibilities as ets. We
illustrate the ease of use with an example. Given a time series object y, predictions can
be done as follows:
library(forecast) # provides ets and forecast
library(robets) # provides robets
model1 <- ets(y) # Hyndman and Khandakar (2008)
model2 <- robets(y) # our procedure
plot(forecast(model1, h = 8)) # first plot
plot(forecast(model2, h = 8)) # second plot
The function robets works as ets except that our robust methodology is applied instead.
An additional feature of the package is the detection of outliers. The attribute outlier
is a boolean vector which indicates whether each observation is an outlier or not.
outliers <- model2$outlier
[Figure: forecasts from ETS(M,A,N) (first plot) and forecasts from ROBETS(M,Ad,M) (second plot).]
For this example the outliers are highlighted in red:
[Figure: forecasts from ROBETS(M,Ad,M) with the detected outliers highlighted in red.]
The package is practical, but also fast. The numerical optimization problem is solved
with code written in C++. The computation time mainly depends on the length of the
time series. We timed our code with the microbenchmark package for different lengths
T; the results are shown in Table 2. Each time we did one hundred replications and took
the average time. We did this with two methods, both implemented in the package robets.
The non-robust method is applied with the command
robets(y, opt.crit = "lik", rob.start.initial.values = FALSE,
ic = "aicc", k = 1000)
The initial values for this method are computed in exactly the same way as Hyndman
and Khandakar (2008). Our robust method is executed with robets(y). The default
options coincide with the choice made in the method description above.
In both methods exactly the same code was executed at each function evaluation. In
the program, the likelihood and the robustified likelihood are always computed, regardless
of the optimization criterion (opt.crit) that is selected. The difference
in computation time between the non-robust and robust method is therefore due to the
faster convergence of the optimization problem with the non-robust method.

time series length   non-robust method   robets
 25                   5                   7
 50                   7                  10
 75                   9                  15
100                   9                  20
200                  14                  35
Table 2: Average computation time in milliseconds.
5 Simulation study
In this section we study the effect of outliers on the methodology of Hyndman and
Khandakar (2008) and our robust method. We generate time series with the underlying models
of the fifteen variants of Table 1. The parameters for those models are chosen such that it is
possible to distinguish the models from each other, even with short time series. We choose
$\sigma = 0.05$ for all models, additive or multiplicative. The other choices are $\alpha = 0.36$, $\ell_0 = 1$,
$\beta = 0.21$, $b_0 = 0.05$, $\phi = 0.9$ and $\gamma = 0.2$. For models with additive seasonality the initial
seasonal component is $s_{-3:0} = (s_{-3}, s_{-2}, s_{-1}, s_0) = (-0.01, 0.01, 0.03, -0.03)$ if the data
are quarterly and
$s_{-11:0} = (-0.01, 0.01, 0.03, -0.03, 0, 0, 0.04, -0.04, -0.05, 0.01, 0.03, 0.01)$
if monthly. For models with multiplicative seasonality the initial seasonal component is
$s_{-3:0} = (0.99, 1.01, 1.03, 0.97)$ for quarterly data and
$s_{-11:0} = (0.99, 1.01, 1.03, 0.97, 1, 1, 1.04, 0.96, 0.95, 1.01, 1.03, 1.01)$
for monthly data.
We consider two types of outliers: symmetric and asymmetric outliers. To generate
time series with outliers, we adapt the underlying model. For the simple exponential
smoothing method the equation of the additive error model is (2.4) and for the multiplicative
model (2.5). Via this model the clean simulations will be generated. We add a
contamination $u_t$ to the source of randomness $\epsilon_t \sim N(0, \sigma^2)$. We replace $\epsilon_t$ by $\epsilon_t + u_t$ in the
observation equation of the additive model. For simple exponential smoothing with the
additive error model, the contaminated model is
$$
\begin{aligned}
y_t &= \ell_{t-1} + \epsilon_t + u_t \\
\ell_t &= \ell_{t-1} + \alpha\epsilon_t.
\end{aligned}
\tag{5.1}
$$
For the multiplicative error model we choose the following contaminated model
$$
\begin{aligned}
y_t &= \ell_{t-1}\, f(\epsilon_t + u_t) \\
\ell_t &= \ell_{t-1}(1 + \alpha\epsilon_t)
\end{aligned}
\tag{5.2}
$$
with
$$
f(x) =
\begin{cases}
\exp(x) & \text{if } x < 0 \\
1 + x & \text{otherwise.}
\end{cases}
$$
The function $f(\epsilon_t + u_t)$ is introduced instead of $1 + \epsilon_t + u_t$ to avoid outliers below zero. For
other exponential smoothing variants the contaminated models are completely analogous.
The distribution of the contamination $u_t$ for symmetric outliers (SO) is $u_t = 0$ with
probability $1 - \varepsilon$ and $u_t \overset{iid}{\sim} N(0, K^2\sigma^2)$ with probability $\varepsilon$. For asymmetric outliers (AO)
the distribution is $u_t = 0$ with probability $1 - \varepsilon$ and $u_t \overset{iid}{\sim} N(K\sigma, \sigma^2)$ with probability $\varepsilon$.
Unless mentioned otherwise we set $(\varepsilon, K) = (0.05, 20)$.
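The contaminated additive error model (5.1) for simple exponential smoothing can be simulated with a few lines of base R. The sketch below uses the parameter values quoted in the text as defaults; the function name simulate_ann and the argument name eps for the contamination probability are illustrative choices, and the code is not the simulation code of the paper.

```r
# Simulate the contaminated additive-error model (5.1) for simple exponential
# smoothing; outliers are symmetric ("SO") or asymmetric ("AO").
simulate_ann <- function(n, alpha = 0.36, l0 = 1, sigma = 0.05,
                         eps = 0.05, K = 20, type = c("SO", "AO")) {
  type <- match.arg(type)
  e <- rnorm(n, sd = sigma)      # single source of error
  is_out <- runif(n) < eps       # contaminated time points
  n_out <- sum(is_out)
  u <- numeric(n)
  u[is_out] <- if (type == "SO") {
    rnorm(n_out, mean = 0, sd = K * sigma)
  } else {
    rnorm(n_out, mean = K * sigma, sd = sigma)
  }
  y <- numeric(n)
  level <- l0
  for (t in seq_len(n)) {
    y[t] <- level + e[t] + u[t]    # observation equation with contamination
    level <- level + alpha * e[t]  # state equation uses the clean error only
  }
  list(y = y, outlier = is_out)
}

set.seed(123)
sim <- simulate_ann(40)            # one contaminated series of length T = 40
```

Note that the contamination enters only the observation equation, so the outliers are additive: they do not change the underlying level, which is why a forecasting method should ideally ignore them.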
5.1 Known smoothing parameters and initial values
First we do simulations with known parameters and initial values. Outliers indeed affect
the estimation of the parameters, but even if the parameters are known, outlying
observations distort the forecasts. That is why in Section 3.1 the forecasting equations are
robustified. Observations $y_t$ that lie more than $k = 3$ times the scale away from the one
step ahead prediction $\hat{y}_{t|t-1}$ are replaced as in equation (3.1).
For each simulation we calculate the one step ahead forecasting errors and compute
the in-sample root mean squared error (RMSE) and the in-sample root $\tau^2$ error (3.10) with the
error sequence
$$
\{e_t\}_{1 \le t \le T} = \left\{y_t - \hat{y}_{t|t-1}\right\}_{1 \le t \le T}.
\tag{5.3}
$$
For additive error models the expected squared error is equal to $\sigma^2$. We expect that
in clean simulations the RMSE and root $\tau^2$ error (RτSE) are equal to $\sigma = 0.05$. For
multiplicative error models it is the relative RMSE that is expected to be $\sigma$. For those
models we will compare measures of the relative error
$$
\{e_t\}_{1 \le t \le T} = \left\{\frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\right\}_{1 \le t \le T}.
\tag{5.4}
$$
In Table 3 we compare the non-robust and the robust forecasting equations.
The time series length is $T = 40$ and the number of seasons per period is $m = 4$. The
RMSE and root $\tau^2$ error (RτSE) are about equal for every model and every estimator
if there are no outliers. If there are outliers however, the robust forecasting equations
result in smaller prediction errors. The RMSE is still large, but that is because the large
outliers blow up the RMSE. Also robust forecasting equations can't predict outliers. The
difference in root $\tau^2$ error is more informative, because this measure is determined by a
large part of the errors. Because the $\tau^2$ is not affected as much by the few outliers, we
can conclude that forecasts of non-outlying observations are considerably better with the
robust equations.
generating   no outliers                   symmetric outliers
model        C             R               C               R
             RMSE   RτSE   RMSE   RτSE     RMSE    RτSE    RMSE    RτSE
ANN          4.94   4.95   4.94   4.95     21.66    7.41   20.08   5.65
ANA          5.01   5.00   5.01   5.00     22.34    9.87   19.81   5.88
AAN          5.00   5.00   5.00   5.00     22.77    8.15   20.13   5.84
AAA          5.00   5.00   5.00   5.00     24.52   11.50   20.77   6.10
AAdN         4.95   4.93   4.95   4.93     22.52    7.76   20.08   5.75
AAdA         4.99   4.97   4.99   4.97     24.49   11.18   20.85   6.05
MNN          4.96   4.96   4.96   4.97     17.44    7.01   16.40   5.60
MNA          4.98   4.98   4.98   4.98     17.67    8.71   15.97   5.79
MAN          4.96   4.94   4.96   4.94     18.61    7.61   16.89   5.75
MAA          4.98   4.96   4.98   4.96     18.79    9.14   16.45   5.97
MAdN         4.98   4.95   4.98   4.95     18.19    7.32   16.56   5.69
MAdA         5.00   4.99   5.00   4.99     18.59    9.31   16.17   6.00
MNM          4.99   4.98   4.99   4.98     18.63    8.89   16.95   5.81
MAM          5.01   4.98   5.01   4.98     19.31    9.60   16.96   5.91
MAdM         4.99   4.98   4.99   4.98     19.24    9.57   16.99   5.95
Table 3: The average in-sample RMSE and root τ2 error (RτSE) ×100 over 500 simulation
runs for each model variant. The parameters and model are known. We refer to classic
exponential smoothing with C, and to the proposed robust method with R.
For time series with a different length we expect the same results. In simulations
not reported in this paper, the conclusions were the same. The same conclusion also
holds for simulations with asymmetric outliers.
5.2 Known model and unknown parameters
We want to test whether the robust parameter estimator of Section 3.2 performs well if the
exponential smoothing model is known. The time series are again simulated as explained
at the start of this section. The time series are of length $T = 40$ and the number of seasons
is $m = 4$. If we have a look at the time series in the M3-competition (Makridakis and
Hibon, 2000), this seems realistic. To compare the methods, we generate $h_{\max} = 8$ additional
observations to evaluate the out-of-sample performance. Because it is impossible to predict
outliers, we don't allow outliers in the out-of-sample period.
The robust methodology is implemented in the function robets of the robets package
in R. In the robust method we solve the optimization problems of equation (3.12) or (3.14),
depending on the error model. The initial values are chosen robustly as explained in
Section 3.3. We compare with a non-robust method, where the optimization problems of
equation (3.6) or (3.8) are solved with the initial values as in Hyndman and Khandakar
(2008).
h = 1
generating   no outliers                      symmetric outliers
model        C               R                C               R
ANN           5    (0.15)     4.98 (0.15)      8.5  (0.88)     5.22 (0.16)
ANA           5.39 (0.17)     5.39 (0.17)     12.44 (1.39)     5.49 (0.19)
AAN           4.94 (0.16)     4.98 (0.16)     12.6  (0.86)     5.65 (0.22)
AAA           5.46 (0.18)     5.66 (0.19)     17.9  (3.31)     5.94 (0.18)
AAdN          5.38 (0.17)     5.48 (0.18)     12.56 (1.77)     5.41 (0.17)
AAdA          5.19 (0.16)     5.23 (0.16)     15.93 (1.57)     5.96 (0.21)
MNN           5.29 (0.19)     5.27 (0.19)     14.43 (3.36)     5.37 (0.2)
MNA           5.02 (0.15)     5.16 (0.16)      9.44 (0.49)     5.78 (0.18)
MAN          17.15 (0.84)    16.99 (0.86)     55.82 (10.3)    17.5  (0.75)
MAA          17.27 (0.67)    17.72 (0.71)     46.12 (8.2)     18.1  (0.76)
MAdN          7.78 (0.28)     7.71 (0.28)     22.59 (3.65)     8.67 (0.32)
MAdA          8.09 (0.3)      8.03 (0.32)     22.11 (2.72)     9.24 (0.35)
MNM           5.03 (0.17)     5.15 (0.18)     12.54 (2.43)     5.78 (0.19)
MAM          16.05 (0.69)    17.22 (0.73)     44.06 (4.32)    19.25 (0.8)
MAdM          7.56 (0.3)      7.58 (0.26)     18.74 (2.69)     8.07 (0.3)
Table 4: The root mean squared out-of-sample forecasting error (×100) over 500 simulation
runs for each model variant. The model is known, but the parameters are estimated.
We refer to the non-robust method with C, and to the proposed robust method with R.
The standard error of the RMSE is between brackets.
For each method in Table 1 we compare the out-of-sample error at different horizons.
For horizon $h = 1$ we compute
$$
e_h = y_{T+h} - \hat{y}_{T+h|T}
$$
and then take the root mean squared error over the prediction errors of 500 simulations
generated from the same setting. For the clean simulations the RMSE is slightly larger
with the robust method than with the classic method in Table 4. In the contaminated
setting we see a drop in RMSE with the robust method for the considered horizon. We did
simulations with different time series lengths $T$ and at different horizons, for all models.
Even if the time series length is small and the number of seasons is large ($m = 12$), the
RMSE is substantially lower with the robust method in a contaminated setting. These results
are not reported in the paper.
5.3 Unknown model
We repeat the previous simulation study, but this time we consider the model to be
unknown. For every generated time series all fifteen models are estimated and the one
with the lowest AICC gets selected. The AICC is the criterion preferred by Hyndman and
Khandakar (2008) and in our experience it also selects the models best. In Table 5
the root mean squared forecasting error over 500 simulations at horizon $h = 1$ is shown.
This is for simulations of length $T = 40$ with $m = 4$. As expected the numbers are
slightly larger than in Table 4, but the conclusions are similar. The robust method is
slightly worse than the non-robust method for time series without outliers, but clearly
better for all models in the contaminated time series.
In Table 6 we have a look at how well each model gets selected. The robust method
is worse in selecting the correct model than the non-robust method, but apparently this
is not a problem for forecasting purposes.
h = 1
generating   no outliers                      symmetric outliers
model        C               R                C               R
ANN           5.18 (0.16)     5.38 (0.16)     12.55 (1.6)      5.7  (0.22)
ANA           5.51 (0.17)     5.58 (0.18)     14.12 (1.78)     5.56 (0.18)
AAN           5.1  (0.17)     5.46 (0.18)     14.77 (0.86)     6.09 (0.24)
AAA           5.56 (0.18)     5.75 (0.18)     19.53 (3.84)     6.29 (0.21)
AAdN          5.72 (0.17)     5.91 (0.2)      18.36 (3.16)     6.03 (0.22)
AAdA          5.29 (0.17)     5.46 (0.17)     15.52 (1.17)     6.19 (0.24)
MNN           5.46 (0.19)     5.68 (0.21)     19.39 (3.91)     5.59 (0.19)
MNA           5.18 (0.16)     5.32 (0.17)     10.67 (0.95)     5.94 (0.21)
MAN          17.42 (0.86)    17.87 (0.8)      54.06 (11.3)    17.99 (0.75)
MAA          17.46 (0.68)    17.69 (0.72)     46.58 (9.54)    18.41 (0.69)
MAdN          7.95 (0.29)     8.16 (0.29)     23.61 (2.71)     9.07 (0.38)
MAdA          8.29 (0.31)     8.58 (0.33)     19.61 (1.77)     9.1  (0.33)
MNM           5.03 (0.17)     5.44 (0.19)     10.2  (0.9)      5.83 (0.21)
MAM          16.08 (0.72)    17.01 (0.72)     42.3  (3.98)    18.22 (0.74)
MAdM          7.65 (0.31)     7.94 (0.3)      20.45 (2.84)     7.94 (0.32)
Table 5: The root mean squared out-of-sample forecasting error over 500 simulation runs
for each model variant. The model is unknown and gets selected via the AICC with
the non-robust method and via robAICC (3.17) in the robust method (robets). The
standard error of the RMSE is between brackets.
generating   no outliers   symmetric outliers
model        C     R       C     R
ANN          23    10      12    14
ANA          35    14      34    21
AAN          34    19      17    21
AAA          61    24      35    29
AAdN         20     8      11    13
AAdA         27     9      13    17
MNN          19    11      15    13
MNA          21    12      32    13
MAN          43    16      30    17
MAA          21    19      13    15
MAdN         16     7       9    10
MAdA          8     9       3    10
MNM          27    16      11    12
MAM          73    26      39    19
MAdM         22     9       7     7
Table 6: The percentage of correctly chosen models over 500 simulation runs for each
model variant.
6 Application
We apply the methodology to 3003 time series of the M3 competition of Makridakis and
Hibon (2000). The median length of the time series is 69, the smallest is of length 20 and
the longest 144. The data are yearly, quarterly or monthly. The first part of the time
series is used for estimation and the last hdata points are used out of sample. For yearly
time series, h= 6, for quarterly h= 8 and for monthly, h= 18.
We will compare the out-of-sample symmetric Mean Absolute Percentage Error (sMAPE).
It is a metric that is independent of the scale, which is useful here because the different
time series have a very different scale. We choose this measure because it has been used
by Makridakis and Hibon (2000). It also has a limited value between 0 and 200% which
makes it robust against occasional very bad performances. The formula for time series yi
at horizon his:
sMAPEh=100%
3003
3003
X
i=1
2
yti+h,i ˆyti+h|ti,i
yti+h,i + ˆyti+h|ti,i
.(6.1)
with t_i the time stamp of the last point of the estimation period. The out-of-sample
sMAPE over all horizons with Hyndman's methodology and with our robust method is
shown in Table 7. The results of this table can be compared directly with Table 6 of
Makridakis and Hibon (2000). In comparison with the methods considered in that article,
the ets method is among the best methods at every horizon h, while robets is among
the worst.
We also compare other performance measures, such as the median symmetric
absolute percentage error (APE), which takes the median instead of the mean of the
symmetric APEs. The robets method is again worse than ets, but the differences are much
smaller than with the sMAPE.
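To make the difference between the two summaries concrete, the sketch below computes the mean and the median of the symmetric APE for a handful of forecasts. The numbers are made-up toy values, not M3 data, and the function names are hypothetical. Returning zero when both the observation and the forecast are zero is an assumption, since the text does not discuss that case.

```python
from statistics import median


def sape(actual, forecast):
    """Symmetric absolute percentage error of one forecast, in percent (0 to 200)."""
    denom = abs(actual) + abs(forecast)
    if denom == 0:
        return 0.0  # assumption: a perfect forecast of zero counts as zero error
    return 200.0 * abs(actual - forecast) / denom


def smape(actuals, forecasts):
    """Mean of the symmetric APEs, as in (6.1)."""
    errors = [sape(y, f) for y, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


def median_sape(actuals, forecasts):
    """Median of the symmetric APEs."""
    return median(sape(y, f) for y, f in zip(actuals, forecasts))


# Toy values: three reasonable forecasts and one very bad one.
actuals = [100.0, 200.0, 50.0, 80.0]
forecasts = [110.0, 190.0, 50.0, 0.0]

print(round(smape(actuals, forecasts), 2))        # mean is pulled up by the bad forecast
print(round(median_sape(actuals, forecasts), 2))  # median is barely affected
```

A single forecast at the 200% bound dominates the mean but hardly moves the median, which is one way a gap between two methods can look much larger in the sMAPE than in the median symmetric APE.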
Method  Forecasting horizon h
             1     2     3     4     5     6     8    12    15    18
ets        8.5   9.5  11.5  13.2  15.5  14.9  12.7  13.5  17.0  19.1
robets    10.5  12.4  13.7  16.7  18.0  21.9  14.3  17.5  45.5  24.4
Table 7: The symmetric MAPE for all data.
Method  Forecasting horizon h
             1     2     3     4     5     6     8    12    15    18
ets        3.0   3.8   4.6   5.9   6.3   6.7   6.2   7.0   9.0  10.0
robets     3.3   4.1   4.9   6.1   6.4   7.1   6.4   7.1   9.7  10.6
Table 8: The median symmetric APE for all data.
7 Conclusion
We have developed a robust version of the exponential smoothing framework of Hyndman
and Khandakar (2008). It has robust forecasting equations, robust smoothing parameter
estimation and robust model selection. The method is robust to outliers, but it has a worse
out-of-sample forecasting performance than Hyndman's method in the M3 competition.
One reason is that we computed the initial values via a heuristic, whereas Hyndman and
Khandakar (2008) estimated the smoothing parameters together with the initial values
by maximizing a likelihood function. Optimizing the initial values increases the computation
time, but it also improves the out-of-sample forecasting performance. We chose not to
optimize the initial values, because for 1 in 1000 time series this leads to an unstable
solution. However, the R-package accompanying this paper provides the option
opt.initial.values.
If forecasting is the purpose, the method can be run alongside the ets
method of Hyndman and Khandakar (2008) to find time series for which the two forecasts
are very different. These time series can then be labeled as problematic and investigated
further. Apart from forecasting, the proposed robust method can be used for outlier
detection.
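One way to screen for such series is sketched below. The paper does not prescribe a discrepancy measure or a cut-off, so both are assumptions: the sketch uses the mean symmetric absolute percentage difference between the two forecast paths and a hypothetical threshold of 20%. The forecasts are taken as given, for example exported from ets and robets, and all names are illustrative.

```python
def forecast_discrepancy(forecast_a, forecast_b):
    """Mean symmetric absolute percentage difference between two forecast paths."""
    total = 0.0
    for a, b in zip(forecast_a, forecast_b):
        denom = abs(a) + abs(b)
        # Two forecasts that are both exactly zero are treated as identical.
        total += 0.0 if denom == 0 else 200.0 * abs(a - b) / denom
    return total / len(forecast_a)


def flag_series(classical, robust, threshold=20.0):
    """Return the ids of series whose two forecast paths differ by more than
    `threshold` percent. `threshold` is an illustrative choice, not a value
    taken from the paper."""
    return [
        series_id
        for series_id in classical
        if forecast_discrepancy(classical[series_id], robust[series_id]) > threshold
    ]


# Toy forecasts for two series over two horizons (made-up numbers).
classical = {"s1": [100.0, 102.0], "s2": [100.0, 160.0]}
robust = {"s1": [101.0, 101.0], "s2": [100.0, 100.0]}

print(flag_series(classical, robust))
```

In this toy input only the second series is flagged, because its classical forecast drifts far from the robust one at the second horizon.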
Because the robets package is easy to use and fast, applying this method
requires little effort. It can be a tool to find time series with outliers in a large
collection of time series.
References
Cipra, T. and Hanzak, T. (2011). Exponential smoothing for time series with outliers.
Kybernetika, 47(2), 165–178.
Croux, C., Gelper, S., and Fried, R. (2008). Computational aspects of robust Holt-Winters
smoothing based on M-estimation. Applications of Mathematics, 53(3), 163–176.
Croux, C., Gelper, S., and Mahieu, K. (2010). Robust exponential smoothing of
multivariate time series. Computational Statistics and Data Analysis, 54, 2999–3006.
Fried, R. (2004). Robust filtering of time series with trends. Nonparametric Statistics,
16, 313–328.
Gardner, E. S. (1985). Exponential smoothing: the state of the art. Journal of
Forecasting, 4, 1–28.
Gelper, S., Fried, R., and Croux, C. (2010). Robust forecasting with exponential and
Holt-Winters smoothing. Journal of Forecasting, 29, 285–300.
Hyndman, R. J. and Athanasopoulos, G. (2013). Forecasting: Principles & practice.
Hyndman, R. J. and Khandakar, Y. (2008). Automatic time series forecasting: The
forecast package for R. Journal of Statistical Software, 27(3).
Hyndman, R. J., Koehler, A. B., Snyder, R. D., and Grose, S. (2002). A state space
framework for automatic forecasting using exponential smoothing methods. International
Journal of Forecasting, 18, 439–454.
Hyndman, R. J., Koehler, A. B., Ord, J. K., and Snyder, R. D. (2005). Prediction intervals
for exponential smoothing using two new classes of state space models. Journal of
Forecasting, 24, 17–37.
Makridakis, S. and Hibon, M. (2000). The M3-Competition: results, conclusions and
implications. International Journal of Forecasting, 16, 451–476.
Muler, N., Peña, D., and Yohai, V. J. (2009). Robust estimation for ARMA models. Annals
of Statistics, 37(2), 816–840.
Ord, J. K., Koehler, A. B., and Snyder, R. D. (1997). Estimation and prediction for
a class of dynamic nonlinear statistical models. Journal of the American Statistical
Association, 92(440), 1621–1629.
Taylor, J. W. (2003). Exponential smoothing with damped multiplicative trend. International
Journal of Forecasting, 19, 715–725.
Yohai, V. J. and Zamar, R. H. (1988). High breakdown-point estimates of regression by
means of the minimization of an efficient scale. Journal of the American Statistical
Association, 83(402), 406–413.