ISSN 1440-771X
ISBN 0 7326 1091 5
Prediction Intervals for Exponential Smoothing State Space Models
Rob J Hyndman, Anne B Koehler, J. Keith Ord and Ralph D Snyder
Working Paper 11/2001
2001
DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS, AUSTRALIA
Prediction intervals for exponential smoothing state space models

Rob J Hyndman¹, Anne B. Koehler², J. Keith Ord³, Ralph D. Snyder¹

9 November 2001
Abstract: The main objective of this paper is to provide analytical expressions for forecast
variances that can be used in prediction intervals for the exponential smoothing methods. These
expressions are based on state space models with a single source of error that underlie the
exponential smoothing methods. Three general classes of the state space models are presented.
The first class is the standard linear state space model with homoscedastic errors, the second
retains the linear structure but incorporates a dynamic form of heteroscedasticity, and the third
allows for non-linear structure in the observation equation as well as heteroscedasticity. Exact
matrix formulas for the forecast variances are found for each of these three classes of models.
These formulas are specialized to non-matrix formulas for fifteen state space models that underlie
nine exponential smoothing methods, including all the widely used methods. In cases where an
ARIMA model also underlies an exponential smoothing method, there is an equivalent state
space model with the same variance expression. We also discuss relationships between these
new ideas and previous suggestions for finding forecast variances and prediction intervals for the
exponential smoothing methods.
Keywords: forecast distribution, forecast interval, forecast variance, Holt-Winters method,
structural models.
JEL classification: C22, C53.
¹ Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.
² Department of Decision Sciences and Management Information Systems, Miami University, Oxford, OH 45056, USA.
³ 320 Old North, Georgetown University, Washington, DC 20057, USA.
Corresponding author: Rob Hyndman (Rob.Hyndman@buseco.monash.edu.au).
Exponential smoothing methods were given a firm statistical foundation by the use of state space
models with a single source of error (Ord, Koehler, and Snyder, 1997). One of the important
contributions following from that work is the ability to provide a sound statistical basis for finding
prediction intervals for all the exponential smoothing methods. Traditionally, prediction intervals
for the exponential smoothing methods have been found through heuristic approaches or by
employing equivalent or approximate ARIMA models.
The major goal of this paper is to provide analytical expressions for the variances of the forecast
errors to be used in computing prediction intervals for many types of exponential smoothing,
including all of the widely used methods. In contrast, Ord, Koehler, and Snyder (1997) found prediction intervals by using the model to simulate the entire prediction distribution for each future time period. While simulating prediction intervals may be an excellent method for producing
them, many forecasters may prefer analytical formulas for their forecasting software. Hyndman
et al. (2001) describe a framework of 24 models for exponential smoothing, including all of the
usual methods as well as some extensions. The procedures in that paper also use simulation
to produce prediction intervals for the models. We will provide analytical expressions for the
forecast variances for some of those 24 models.
Where an equivalent ARIMA model exists (such as for simple exponential smoothing, Holt’s lin-
ear method, and the additive Holt-Winters method), our results provide identical forecast vari-
ances to those from the ARIMA model. However, we also provide forecast variances for many
exponential smoothing methods where there is no equivalent ARIMA model.
State space models with multiple sources of error have also been used to find forecast variances
for the simple and Holt exponential smoothing methods (Johnston and Harrison, 1986). In these
cases the variances are limiting values in models where the convergence is rapid. The variance
formulas in these two cases are the same as in our results.
Prediction intervals for the additive Holt-Winters method and the multiplicative Holt-Winters
method have previously been considered by Chatfield and Yar. For the additive Holt-Winters
method they found an exact formula for the forecast variance that can be computed directly from
the form of the smoothing method (Yar and Chatfield, 1990). For the multiplicative Holt-Winters
method, they provided an approximate formula (Chatfield and Yar, 1991). In both papers they
assumed that the one-period ahead forecast errors are independent but they did not assume any
particular underlying model for the smoothing methods.
Using a single source of error state space model, Koehler, Ord, and Snyder (2001) derived an
approximate formula for the forecast variance for the multiplicative Holt-Winters method. Their
formula differs from that of Chatfield and Yar (1991) only in how the standard deviation of the
one-step-ahead forecast error is estimated. The variance formulas were given only for the first
year of forecasts in both of these papers (Chatfield and Yar, 1991; Koehler, Ord, and Snyder, 2001).
The results in this current paper include finding both an exact formula (ignoring the estimation
error for the smoothing parameters) for the forecast variance in all future time periods for the
multiplicative Holt-Winters method and a better approximation to this exact formula. Another
point of difference in our work is that Yar and Chatfield (1990) assumed that the variance of the
one-period-ahead forecast error is constant for the additive Holt-Winters method. We include a
class of models where this forecast variance is not constant but instead changes with the mean of
the time series.
In Section 1 we present the main results of the paper. We use the classification of exponential
smoothing methods from Hyndman et al. (2001) and show the relationship to three general classes
of state space models for exponential smoothing. We present formulas for the h-period-ahead
means (i.e., forecasts) and forecast variances for fifteen specific exponential smoothing models
that correspond to nine exponential smoothing methods, including the most widely used ones.
In Sections 2–4, we examine each of the three general classes of models more closely. We pro-
vide general matrix formulas for the means and variances and then specialize these formulas to
non-matrix expressions for specific exponential models. Proofs for these results are provided in
appendices. For the Class 3 models, the non-matrix expression is an approximation. Thus, we
devote Section 5 to the accuracy of this approximation.
Finally, we provide an example in Section 6 that gives forecasts and prediction intervals for the
multiplicative Holt-Winters method. Using this example, we compare our exact forecast vari-
ances with approximations and compare prediction intervals obtained by using our exact expres-
sion with ones obtained by simulating complete prediction distributions.
1. The main results
We describe the exponential smoothing methods using a similar framework to that proposed in
Hyndman et al. (2001). Each method is denoted by two letters: the first letter denotes the type
of trend (none, additive, or damped) and the second letter denotes the type of seasonality (none,
additive or multiplicative).
                        Seasonal Component
Trend Component      N (none)    A (additive)    M (multiplicative)
N (none)                NN           NA               NM
A (additive)            AN           AA               AM
D (damped)              DN           DA               DM
Cell NN describes the simple exponential smoothing method, cell AN describes Holt’s linear
method. The additive Holt-Winters’ method is given by cell AA and the multiplicative Holt-
Winters’ method is given by cell AM. The other cells correspond to less commonly used but
analogous methods.
Hyndman et al. (2001) proposed two state space models for each of these methods: one with
additive errors and one with multiplicative errors. To distinguish these models, we will add a
third letter (A or M) before the letters denoting the type of trend and seasonality. For example,
MAN refers to a model with multiplicative errors, additive trend and no seasonality.
We consider three classes of state space models. In all cases, we use the Single Source of Error
(SSOE) model as formulated by Snyder (1985) and used in later work (e.g., Ord et al., 1997; Hyn-
dman et al., 2001). The first class is the usual state space form: we specify linear relationships in
both the observation and state equations and assume constant error variances. The second class
retains the linear structure but introduces dynamic heteroscedasticity among the errors in a way
that is natural for state space processes. Finally, in the third class, we allow a special form of non-
linearity in the observation equation (additive and multiplicative relationships among the state
variables) as well as dynamic heteroscedasticity. The second and third classes are not contained
within the ARIMA class, although the second class could be formulated as a kind of GARCH
model. The third class is not covered by either ARIMA or GARCH structures, but is important
as a stochastic description of non-linear forecasting schemes such as Holt-Winters multiplicative
method (cf. Makridakis et al., 1998, pp.161–69).
Let $Y_1, \dots, Y_n$ denote the time series of interest. The three classes of models may be defined as:

Class 1:  $Y_t = H x_{t-1} + \varepsilon_t$,
          $x_t = F x_{t-1} + G \varepsilon_t$

Class 2:  $Y_t = H x_{t-1}(1 + \varepsilon_t)$,
          $x_t = (F + G \varepsilon_t) x_{t-1}$

Class 3:  $Y_t = H_1 x_{t-1}\, H_2 z_{t-1}\,(1 + \varepsilon_t)$,
          $x_t = (F_1 + G_1 \varepsilon_t) x_{t-1}$,
          $z_t = (F_2 + G_2 \varepsilon_t) z_{t-1}$

where $F$, $G$, $H$, $F_1$, $F_2$, $G_1$, $G_2$, $H_1$ and $H_2$ are all matrix coefficients, and $x_t$ and $z_t$ are unobserved state vectors at time $t$. In each case, $\{\varepsilon_t\}$ is iid $N(0, \sigma^2)$. Let $p$ be the length of vector $x_t$ and $q$ be the length of vector $z_t$. Then the orders of the above matrices are as follows.

Class 1:  $F$ $(p \times p)$,  $G$ $(p \times 1)$,  $H$ $(1 \times p)$
Class 2:  $F$ $(p \times p)$,  $G$ $(p \times p)$,  $H$ $(1 \times p)$
Class 3:  $F_1$ $(p \times p)$, $G_1$ $(p \times p)$, $H_1$ $(1 \times p)$;  $F_2$ $(q \times q)$, $G_2$ $(q \times q)$, $H_2$ $(1 \times q)$
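To make the matrix formulation concrete, the following sketch (our illustration, not part of the paper; the parameter values are arbitrary) simulates observations directly from the Class 2 recursions. For the MNN model listed in Table 1 below, the matrices reduce to scalars: $F = [1]$, $G = [\alpha]$, $H = [1]$.

```python
import numpy as np

def simulate_class2(F, G, H, x0, sigma, T, rng):
    """Simulate T observations from a Class 2 model:
       Y_t = H x_{t-1} (1 + eps_t),   x_t = (F + G eps_t) x_{t-1}."""
    F = np.asarray(F, float)
    G = np.asarray(G, float)
    H = np.asarray(H, float).reshape(1, -1)
    x = np.asarray(x0, float).reshape(-1, 1)
    y = np.empty(T)
    for t in range(T):
        eps = rng.normal(0.0, sigma)
        y[t] = float(H @ x) * (1.0 + eps)   # observation equation
        x = (F + G * eps) @ x               # state equation
    return y

# MNN (simple exponential smoothing, multiplicative error), p = 1:
y = simulate_class2([[1.0]], [[0.2]], [1.0], [10.0],
                    sigma=0.05, T=20, rng=np.random.default_rng(1))
```

The Class 1 recursions can be simulated in the same way by replacing the two defining lines with their additive counterparts.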
Fifteen of the 18 models described above fall within the three state space model classes above:
Class 1 ANN AAN ADN ANA AAA ADA
Class 2 MNN MAN MDN MNA MAA MDA
Class 3 MNM MAM MDM
The remaining three models (ANM, AAM and ADM) do not fit within one of these three classes,
and will not be considered further in this paper. Hyndman et al. (2001) also consider six additional
models with multiplicative trend which fall outside the three state space model classes defined
above. Note that the above 15 models include two models for simple exponential smoothing, two
models for Holt’s method, two models for the additive Holt-Winters’ method and one model for
the multiplicative Holt-Winters’ method.
Equations for the 15 models above are given in Table 1 using the same notation as in Hyndman et
al. (2001). As in that paper, we use the Single Source of Error (SSOE) model in our developments.
That is, all the observation and state variables are driven by the single error sequence $\varepsilon_t$.
Class 1

ANN:  $Y_t = \ell_{t-1} + \varepsilon_t$
      $\ell_t = \ell_{t-1} + \alpha\varepsilon_t$

ANA:  $Y_t = \ell_{t-1} + s_{t-m} + \varepsilon_t$
      $\ell_t = \ell_{t-1} + \alpha\varepsilon_t$
      $s_t = s_{t-m} + \gamma\varepsilon_t$

AAN:  $Y_t = \ell_{t-1} + b_{t-1} + \varepsilon_t$
      $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$
      $b_t = b_{t-1} + \alpha\beta\varepsilon_t$

AAA:  $Y_t = \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t$
      $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$
      $b_t = b_{t-1} + \alpha\beta\varepsilon_t$
      $s_t = s_{t-m} + \gamma\varepsilon_t$

ADN:  $Y_t = \ell_{t-1} + b_{t-1} + \varepsilon_t$
      $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$
      $b_t = \phi b_{t-1} + \alpha\beta\varepsilon_t$

ADA:  $Y_t = \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t$
      $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$
      $b_t = \phi b_{t-1} + \alpha\beta\varepsilon_t$
      $s_t = s_{t-m} + \gamma\varepsilon_t$

Class 2

MNN:  $Y_t = \ell_{t-1}(1 + \varepsilon_t)$
      $\ell_t = \ell_{t-1}(1 + \alpha\varepsilon_t)$

MNA:  $Y_t = (\ell_{t-1} + s_{t-m})(1 + \varepsilon_t)$
      $\ell_t = \ell_{t-1} + \alpha(\ell_{t-1} + s_{t-m})\varepsilon_t$
      $s_t = s_{t-m} + \gamma(\ell_{t-1} + s_{t-m})\varepsilon_t$

MAN:  $Y_t = (\ell_{t-1} + b_{t-1})(1 + \varepsilon_t)$
      $\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha\varepsilon_t)$
      $b_t = b_{t-1} + \alpha\beta(\ell_{t-1} + b_{t-1})\varepsilon_t$

MAA:  $Y_t = (\ell_{t-1} + b_{t-1} + s_{t-m})(1 + \varepsilon_t)$
      $\ell_t = \ell_{t-1} + b_{t-1} + \alpha(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$
      $b_t = b_{t-1} + \alpha\beta(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$
      $s_t = s_{t-m} + \gamma(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$

MDN:  $Y_t = (\ell_{t-1} + b_{t-1})(1 + \varepsilon_t)$
      $\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha\varepsilon_t)$
      $b_t = \phi b_{t-1} + \alpha\beta(\ell_{t-1} + b_{t-1})\varepsilon_t$

MDA:  $Y_t = (\ell_{t-1} + b_{t-1} + s_{t-m})(1 + \varepsilon_t)$
      $\ell_t = \ell_{t-1} + b_{t-1} + \alpha(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$
      $b_t = \phi b_{t-1} + \alpha\beta(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$
      $s_t = s_{t-m} + \gamma(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$

Class 3

MNM:  $Y_t = \ell_{t-1}\, s_{t-m}(1 + \varepsilon_t)$
      $\ell_t = \ell_{t-1}(1 + \alpha\varepsilon_t)$
      $s_t = s_{t-m}(1 + \gamma\varepsilon_t)$

MAM:  $Y_t = (\ell_{t-1} + b_{t-1})\, s_{t-m}(1 + \varepsilon_t)$
      $\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha\varepsilon_t)$
      $b_t = b_{t-1} + \alpha\beta(\ell_{t-1} + b_{t-1})\varepsilon_t$
      $s_t = s_{t-m}(1 + \gamma\varepsilon_t)$

MDM:  $Y_t = (\ell_{t-1} + b_{t-1})\, s_{t-m}(1 + \varepsilon_t)$
      $\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha\varepsilon_t)$
      $b_t = \phi b_{t-1} + \alpha\beta(\ell_{t-1} + b_{t-1})\varepsilon_t$
      $s_t = s_{t-m}(1 + \gamma\varepsilon_t)$

Table 1: Equations defining each of the 15 models.
Forecast mean (all classes): $\mu_h$

Class 1 variance:  $v_1 = \sigma^2$  and  $v_h = \sigma^2\Bigl[1 + \sum_{j=1}^{h-1} c_j^2\Bigr]$

Class 2 variance:  $v_h = (1 + \sigma^2)\theta_h - \mu_h^2$,  where $\theta_1 = \mu_1^2$ and $\theta_h = \mu_h^2 + \sigma^2 \sum_{j=1}^{h-1} c_j^2\,\theta_{h-j}$

Class 3 variance:  $v_h = s^2_{n-m+1+(h-1)^*}\bigl[\theta_h(1 + \sigma^2)(1 + \gamma^2\sigma^2)^k - \tilde\mu_h^2\bigr]$,  where $\theta_1 = \tilde\mu_1^2$, $\theta_h = \tilde\mu_h^2 + \sigma^2 \sum_{j=1}^{h-1} c_j^2\,\theta_{h-j}$ and $k = \lfloor (h-1)/m \rfloor$

Table 2: h-period-ahead forecast means and variances. Here $\lfloor u \rfloor$ denotes the largest integer less than or equal to $u$ and $m$ denotes the period of seasonality. For Class 3, the expression is exact when $h \le m$ but only approximate for $h > m$.
Class 1/Class 2
  ANN/MNN:  $\mu_h = \ell_n$;  $c_j = \alpha$
  AAN/MAN:  $\mu_h = \ell_n + h b_n$;  $c_j = \alpha(1 + j\beta)$
  ADN/MDN:  $\mu_h = \ell_n + \phi_{h-1} b_n$;  $c_j = \alpha(1 + \phi_{j-1}\beta)$
  ANA/MNA:  $\mu_h = \ell_n + s_{n-m+1+(h-1)^*}$;  $c_j = \alpha + \gamma d_{j,m}$
  AAA/MAA:  $\mu_h = \ell_n + h b_n + s_{n-m+1+(h-1)^*}$;  $c_j = \alpha(1 + j\beta) + \gamma d_{j,m}$
  ADA/MDA:  $\mu_h = \ell_n + \phi_{h-1} b_n + s_{n-m+1+(h-1)^*}$;  $c_j = \alpha(1 + \phi_{j-1}\beta) + \gamma d_{j,m}$

Class 3
  MNM:  $\mu_h = \ell_n\, s_{n-m+1+(h-1)^*}$;  $\tilde\mu_h = \ell_n$;  $c_j = \alpha$
  MAM:  $\mu_h = (\ell_n + h b_n)\, s_{n-m+1+(h-1)^*}$;  $\tilde\mu_h = \ell_n + h b_n$;  $c_j = \alpha(1 + j\beta)$
  MDM:  $\mu_h = (\ell_n + \phi_{h-1} b_n)\, s_{n-m+1+(h-1)^*}$;  $\tilde\mu_h = \ell_n + \phi_{h-1} b_n$;  $c_j = \alpha(1 + \phi_{j-1}\beta)$

Table 3: Values of $\mu_h$, $\tilde\mu_h$ and $c_j$ for the 15 models. Here $\phi_j = 1 + \phi + \cdots + \phi^j = (1 - \phi^{j+1})/(1 - \phi)$, $d_{j,m} = 1$ if $j \equiv 0 \pmod m$ and 0 otherwise, and $j^* = j \bmod m$.
For development of this approach, see Snyder (1985) and Ord et al. (1997). The variables $\ell_t$, $b_t$ and $s_t$ are elements of the state vector and denote the level, slope and seasonal components respectively; the parameters $\alpha$, $\beta$ and $\gamma$ are the usual smoothing parameters corresponding to the level equation, trend equation and seasonal equation; $\phi$ is a damping coefficient used for the damped trend models; and $m$ denotes the number of seasons in a year.
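As an illustration of this notation, the sketch below simulates sample paths from the AAA model of Table 1 using the scalar updating equations directly. It is our illustration only; the starting components and smoothing parameters are arbitrary, not values taken from the paper.

```python
import numpy as np

def simulate_aaa(level, trend, seasonals, alpha, beta, gamma, sigma, h, rng):
    """Simulate one path of length h from the AAA model of Table 1.
    `seasonals` holds (s_{t-1}, ..., s_{t-m}), most recent first."""
    m = len(seasonals)
    s = list(seasonals)
    y = np.empty(h)
    for t in range(h):
        eps = rng.normal(0.0, sigma)
        y[t] = level + trend + s[m - 1] + eps     # Y_t = l_{t-1} + b_{t-1} + s_{t-m} + eps_t
        new_s = s[m - 1] + gamma * eps            # s_t = s_{t-m} + gamma*eps_t
        level = level + trend + alpha * eps       # l_t = l_{t-1} + b_{t-1} + alpha*eps_t
        trend = trend + alpha * beta * eps        # b_t = b_{t-1} + alpha*beta*eps_t
        s = [new_s] + s[:-1]                      # shift the seasonal buffer
    return y

rng = np.random.default_rng(0)
path = simulate_aaa(level=100.0, trend=2.0, seasonals=[1.0, -3.0, 4.0, -2.0],
                    alpha=0.2, beta=0.3, gamma=0.1, sigma=1.0, h=12, rng=rng)
```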
We derive the forecast means and variances for each of the three model classes, and specifically for each of the 15 models. The forecast mean for $Y_{n+h}$, made $h$ steps ahead from forecast origin $n$, is denoted by $\mu_h = \mathrm{E}(Y_{n+h} \mid x_n)$ and the corresponding forecast variance is given by $v_h = \mathrm{Var}(Y_{n+h} \mid x_n)$.
The main results are summarized in Tables 2 and 3.
Criteria such as maximum likelihood for selecting optimal estimates of the parameters can be found in Hyndman et al. (2001) and Ord et al. (1997). It is important to notice that estimates of $\sigma^2$ are not computed in the same manner for all three classes. The estimate of $\sigma^2$ would be

$\hat\sigma^2 = \sum_{t=1}^{n} \hat\varepsilon_t^2 / n$

where

$\hat\varepsilon_t = \begin{cases} Y_t - \hat Y_{t-1}(1) & \text{for Class 1;}\\ \bigl(Y_t - \hat Y_{t-1}(1)\bigr)/\hat Y_{t-1}(1) & \text{for Classes 2 and 3;} \end{cases}$

and

$\hat Y_t(1) = \mathrm{E}(Y_{t+1} \mid x_t) = \begin{cases} H x_t & \text{for Classes 1 and 2;}\\ H_1 x_t\, H_2 z_t & \text{for Class 3.} \end{cases}$

For the special cases in Table 2, $\hat Y_n(1) = \mu_1$.
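As a sketch of how these residual definitions translate into an estimate (our code; it assumes the one-step-ahead forecasts $\hat Y_{t-1}(1)$ have already been computed):

```python
import numpy as np

def sigma2_hat(y, y_hat, multiplicative_error):
    """Estimate sigma^2 from one-step-ahead forecasts Y_hat_{t-1}(1).

    multiplicative_error=False -> Class 1 (additive) residuals;
    multiplicative_error=True  -> Classes 2 and 3 (relative) residuals.
    """
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    resid = (y - y_hat) / y_hat if multiplicative_error else (y - y_hat)
    return np.mean(resid ** 2)    # sigma2_hat = sum(e_t^2) / n
```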
More detail concerning the results for each class is given in the following sections. Derivations
of these results are given in the Appendices.
2. Class 1
Derivations are given in Appendix A.
In this case, the general results for the mean and variance are

$\mu_h = H F^{h-1} x_n$,   (1)

$v_1 = \sigma^2$,   (2)

and  $v_h = \sigma^2\Bigl[1 + \sum_{j=1}^{h-1} c_j^2\Bigr], \qquad h \ge 2$,   (3)

where $c_j = H F^{j-1} G$. Specific values for $\mu_h$ and $c_j$ for the particular models in Class 1 are given in Tables 2 and 3.
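A direct numpy translation of (1)–(3) is sketched below (a minimal illustration, not the authors' code). The final line checks it for the ANN model, whose Table 1 equations give the scalar matrices $F = 1$, $G = \alpha$, $H = 1$, so that $v_h = \sigma^2[1 + \alpha^2(h-1)]$.

```python
import numpy as np

def class1_forecast(F, G, H, x_n, sigma2, h):
    """Forecast mean and variance for a Class 1 model via equations (1)-(3)."""
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float).reshape(-1, 1)    # p x 1
    H = np.asarray(H, dtype=float).reshape(1, -1)    # 1 x p
    x = np.asarray(x_n, dtype=float).reshape(-1, 1)  # p x 1
    mu = float(H @ np.linalg.matrix_power(F, h - 1) @ x)       # (1)
    c = [float(H @ np.linalg.matrix_power(F, j - 1) @ G)       # c_j = H F^{j-1} G
         for j in range(1, h)]
    v = sigma2 * (1.0 + sum(cj ** 2 for cj in c))              # (2) and (3)
    return mu, v

# ANN with alpha = 0.2, l_n = 10, sigma2 = 1, h = 5: returns (10.0, 1.16)
print(class1_forecast(F=[[1.0]], G=[0.2], H=[1.0], x_n=[10.0], sigma2=1.0, h=5))
```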
Note that point forecasts from ANN are equivalent to simple exponential smoothing (SES) and
AAN gives forecasts equivalent to Holt’s method. SES with drift is obtained from AAN by setting
β = 0 so that $b_n = b$ for all $n$. The additive Holt-Winters' method is equivalent to the point forecasts from AAA. Furthermore, ANN is equivalent to an ARIMA(0,1,1) model where $\theta = 1 - \alpha$, AAN is equivalent to an ARIMA(0,2,2) model, and AAA is equivalent to an ARIMA$[0,(1,m),m+1]$ model, where $(1,m)$ denotes differences of orders 1 and $m$ (McKenzie, 1976; Roberts, 1982).
The expressions for $v_h$ can be simplified as shown below.

ANN:  $v_h = \sigma^2\bigl[1 + \alpha^2(h-1)\bigr]$

AAN:  $v_h = \sigma^2\Bigl[1 + \alpha^2(h-1)\bigl\{1 + \beta h + \tfrac{1}{6}\beta^2 h(2h-1)\bigr\}\Bigr]$

ADN:  $v_h = \sigma^2\Bigl[1 + \alpha^2(h-1) + \alpha^2\beta \sum_{j=1}^{h-1} \phi_{j-1}\,(2 + \phi_{j-1}\beta)\Bigr]$

ANA:  $v_h = \sigma^2\bigl[1 + \alpha^2(h-1) + \gamma(2\alpha + \gamma)\lfloor (h-1)/m\rfloor\bigr]$

AAA:  $v_h = \sigma^2\Bigl[1 + \alpha^2(h-1)\bigl\{1 + \beta h + \tfrac{1}{6}\beta^2 h(2h-1)\bigr\} + \gamma k\bigl\{\gamma + \alpha[2 + \beta m(k+1)]\bigr\}\Bigr]$  where $k = \lfloor (h-1)/m\rfloor$

ADA:  $v_h = \sigma^2\Bigl[1 + \sum_{j=1}^{h-1}\bigl\{\alpha^2(1 + \phi_{j-1}\beta)^2 + \gamma d_{j,m}[\gamma + 2\alpha(1 + \phi_{j-1}\beta)]\bigr\}\Bigr]$
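These closed forms can be checked numerically against the matrix expression (3). The sketch below is our illustration: it builds the AAA matrices ($\phi = 1$ in the ADA matrices of Appendix A) and compares the matrix-based variance with the AAA formula above; the numerical values are arbitrary.

```python
import numpy as np

def aaa_matrices(alpha, beta, gamma, m):
    """F, G, H for the AAA model (ADA of Appendix A with phi = 1)."""
    p = m + 2
    F = np.zeros((p, p))
    F[0, 0] = F[0, 1] = 1.0          # level: l_{t-1} + b_{t-1}
    F[1, 1] = 1.0                    # trend (phi = 1)
    F[2, p - 1] = 1.0                # s_t = s_{t-m}
    F[3:, 2:p - 1] = np.eye(m - 1)   # shift remaining seasonal states
    G = np.r_[alpha, alpha * beta, gamma, np.zeros(m - 1)]
    H = np.zeros(p); H[0] = H[1] = H[p - 1] = 1.0
    return F, G, H

alpha, beta, gamma, m, sigma2, h = 0.2, 0.3, 0.1, 4, 1.0, 10
F, G, H = aaa_matrices(alpha, beta, gamma, m)
c = [float(H @ np.linalg.matrix_power(F, j - 1) @ G) for j in range(1, h)]
v_matrix = sigma2 * (1.0 + sum(cj ** 2 for cj in c))

k = (h - 1) // m
v_closed = sigma2 * (1 + alpha**2 * (h - 1) * (1 + beta * h + beta**2 * h * (2*h - 1) / 6)
                     + gamma * k * (gamma + alpha * (2 + beta * m * (k + 1))))
print(v_matrix, v_closed)    # the two values should agree
```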
3. Class 2
Derivations are given in Appendix B.
In this case, the general result for the forecast mean is the same as for Class 1, namely

$\mu_h = H F^{h-1} x_n$.   (4)

The forecast variance is given by

$v_h = H V_{h-1} H' (1 + \sigma^2) + \sigma^2 \mu_h^2$   (5)

where

$V_h = F V_{h-1} F' + \sigma^2 G V_{h-1} G' + \sigma^2 P_{h-1}, \qquad h = 1, 2, \dots,$   (6)

$V_0 = O$, and $P_j = G F^j x_n x_n' (F^j)' G'$.

For the six models we consider in this class, we obtain the following simpler expression

$v_h = (1 + \sigma^2)\theta_h - \mu_h^2$  where  $\theta_1 = \mu_1^2$  and  $\theta_h = \mu_h^2 + \sigma^2 \sum_{j=1}^{h-1} c_j^2\,\theta_{h-j}$   (7)
and $c_j$ depends on the particular model. Note that $c_j$ is identical to that for the corresponding additive error model from Class 1. Specific values for $\mu_h$ and $c_j$ for the particular models in Class 2 are given in Tables 2 and 3.
Note that point forecasts from MNN are equivalent to simple exponential smoothing (SES) but that the variances are different from ANN. Similarly, MAN gives point forecasts equivalent to Holt's method but with different variances from AAN, and MAA gives point forecasts equivalent to the additive Holt-Winters' method but with different variances from AAA.

In the case of MNN, a non-recursive expression for $v_h$ can be obtained:

$v_h = \ell_n^2\bigl[(1 + \alpha^2\sigma^2)^{h-1}(1 + \sigma^2) - 1\bigr]$.
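A small sketch of the recursion (7) (our code, with illustrative values), together with a check against the MNN closed form above:

```python
import numpy as np

def class2_var(mu, c, sigma2):
    """Forecast variances v_1, ..., v_H for a Class 2 model via (7).
    mu[h-1] is mu_h and c[j-1] is c_j (both lists of length H)."""
    H = len(mu)
    theta = np.empty(H)
    v = np.empty(H)
    for h in range(1, H + 1):
        extra = sigma2 * sum(c[j - 1] ** 2 * theta[h - j - 1]    # c_j^2 * theta_{h-j}
                             for j in range(1, h))
        theta[h - 1] = mu[h - 1] ** 2 + extra                     # theta_1 = mu_1^2
        v[h - 1] = (1 + sigma2) * theta[h - 1] - mu[h - 1] ** 2
    return v

# MNN check: mu_h = l_n and c_j = alpha for all h, j
l_n, alpha, sigma2, H = 10.0, 0.3, 0.04, 8
v_rec = class2_var([l_n] * H, [alpha] * H, sigma2)
v_closed = l_n**2 * ((1 + alpha**2 * sigma2) ** np.arange(H) * (1 + sigma2) - 1)
print(np.allclose(v_rec, v_closed))   # expected: True
```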
4. Class 3
Derivations are given in Appendix C.
For models in this class,

$\mu_h = H_1 M_{h-1} H_2'$   (8)

and

$v_h = (1 + \sigma^2)(H_2 \otimes H_1) V_{h-1} (H_2 \otimes H_1)' + \sigma^2 \mu_h^2$   (9)

where $\otimes$ denotes a Kronecker product, $M_0 = x_n z_n'$, $V_0 = O$ (a zero matrix of order $pq \times pq$), and for $h \ge 1$,

$M_h = F_1 M_{h-1} F_2' + G_1 M_{h-1} G_2'\, \sigma^2$   (10)

and

$V_h = (F_2 \otimes F_1) V_{h-1} (F_2 \otimes F_1)' + \sigma^2\bigl[(F_2 \otimes F_1) V_{h-1} (G_2 \otimes G_1)' + (G_2 \otimes G_1) V_{h-1} (F_2 \otimes F_1)'\bigr]$
$\quad + \sigma^2 (G_2 \otimes F_1 + F_2 \otimes G_1)\bigl[V_{h-1} + \mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr](G_2 \otimes F_1 + F_2 \otimes G_1)'$
$\quad + \sigma^4 (G_2 \otimes G_1)\bigl[3V_{h-1} + 2\,\mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr](G_2 \otimes G_1)'.$   (11)

Note, in particular, that $\mu_1 = (H_1 x_n)(H_2 z_n)'$ and $v_1 = \sigma^2\mu_1^2$.
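The recursions (8)–(11) translate directly into code. The sketch below is our illustration (not the authors' implementation); it assumes that $\mathrm{vec}(\cdot)$ stacks columns, so that $\mathrm{vec}(AWB') = (B \otimes A)\,\mathrm{vec}(W)$.

```python
import numpy as np

def class3_forecast(F1, F2, G1, G2, H1, H2, x_n, z_n, sigma2, h):
    """Exact forecast mean mu_h and variance v_h for a Class 3 model,
    following recursions (8)-(11); vec() stacks columns."""
    F1, F2, G1, G2 = (np.asarray(A, float) for A in (F1, F2, G1, G2))
    H1 = np.asarray(H1, float).reshape(1, -1)
    H2 = np.asarray(H2, float).reshape(1, -1)
    x = np.asarray(x_n, float).reshape(-1, 1)
    z = np.asarray(z_n, float).reshape(-1, 1)

    M = x @ z.T                                    # M_0 = x_n z_n'
    V = np.zeros((M.size, M.size))                 # V_0 = O (pq x pq)
    A = np.kron(F2, F1)                            # acts on vec(W)
    B = np.kron(G2, G1)
    C = np.kron(G2, F1) + np.kron(F2, G1)
    for _ in range(h - 1):                         # advance to M_{h-1}, V_{h-1}
        mM = M.flatten(order='F').reshape(-1, 1)   # vec(M_{h-1})
        V = (A @ V @ A.T
             + sigma2 * (A @ V @ B.T + B @ V @ A.T)
             + sigma2 * C @ (V + mM @ mM.T) @ C.T
             + sigma2**2 * B @ (3 * V + 2 * mM @ mM.T) @ B.T)       # (11)
        M = F1 @ M @ F2.T + sigma2 * (G1 @ M @ G2.T)                # (10)
    mu = float(H1 @ M @ H2.T)                                       # (8)
    HH = np.kron(H2, H1)
    v = float((1 + sigma2) * HH @ V @ HH.T + sigma2 * mu**2)        # (9)
    return mu, v
```

For $h = 1$ the loop is skipped and the function returns $\mu_1 = (H_1 x_n)(H_2 z_n)'$ and $v_1 = \sigma^2\mu_1^2$, as noted above.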
Because $\sigma^2$ is usually small (much less than 1), approximate expressions for the mean and variance can be obtained:

$\mu_h = \mu_{1,h-1}\,\mu_{2,h-1} + O(\sigma^2)$

$v_h \approx (1 + \sigma^2)(v_{1,h-1} + \mu_{1,h-1}^2)(v_{2,h-1} + \mu_{2,h-1}^2) - \mu_{1,h-1}^2\,\mu_{2,h-1}^2$

where $\mu_{1,h} = H_1 F_1^h x_n$, $\mu_{2,h} = H_2 F_2^h z_n$, $v_{1,h} = \mathrm{Var}(H_1 x_{n+h} \mid x_n)$ and $v_{2,h} = \mathrm{Var}(H_2 z_{n+h} \mid z_n)$.
In the three special cases we consider, these expressions can be written as

$\mu_h = \tilde\mu_h\, s_{n-m+1+(h-1)^*} + O(\sigma^2)$   (12)

and

$v_h \approx s^2_{n-m+1+(h-1)^*}\bigl[\theta_h (1 + \sigma^2)(1 + \gamma^2\sigma^2)^k - \tilde\mu_h^2\bigr]$   (13)

where $k = \lfloor (h-1)/m \rfloor$, $\theta_1 = \tilde\mu_1^2$, and

$\theta_h = \tilde\mu_h^2 + \sigma^2 \sum_{j=1}^{h-1} c_j^2\,\theta_{h-j}, \qquad h \ge 2.$

These expressions are exact for $h \le m$. Specific values for $\mu_h$, $\tilde\mu_h$ and $c_j$ for the particular models in Class 3 are given in Tables 2 and 3.
Note that the usual point forecasts for these models are given by (12) rather than (8). Also, the
point forecasts from MAM are equivalent to the multiplicative Holt-Winters method.
For the MNM model, a simpler expression for $v_h$ is available:

$v_h \approx s^2_{n-m+1+(h-1)^*}\, \ell_n^2\bigl[(1 + \alpha^2\sigma^2)^{h-1}(1 + \sigma^2)(1 + \gamma^2\sigma^2)^k - 1\bigr].$

(The expression is exact for $h \le m$.)
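For the MAM model (multiplicative Holt-Winters), the approximations (12)–(13) can be evaluated with the Table 3 values $\tilde\mu_h = \ell_n + h b_n$ and $c_j = \alpha(1 + j\beta)$. The following sketch is our illustration; the ordering of the seasonal states passed in (`s_{n-m+1}, ..., s_n`) is a convention of this sketch, not of the paper.

```python
import numpy as np

def mam_approx_forecast(l_n, b_n, seasonals, alpha, beta, gamma, sigma, H):
    """Approximate means (12) and variances (13) for the MAM model.
    `seasonals` = (s_{n-m+1}, ..., s_n), oldest first."""
    m = len(seasonals)
    mu, v, theta = np.empty(H), np.empty(H), np.empty(H)
    for h in range(1, H + 1):
        mu_t = l_n + h * b_n                         # mu-tilde_h for MAM
        s = seasonals[(h - 1) % m]                   # s_{n-m+1+(h-1)*}
        k = (h - 1) // m
        acc = sum((alpha * (1 + j * beta)) ** 2 * theta[h - j - 1]   # c_j^2 theta_{h-j}
                  for j in range(1, h))
        theta[h - 1] = mu_t ** 2 + sigma**2 * acc
        mu[h - 1] = mu_t * s
        v[h - 1] = s**2 * (theta[h - 1] * (1 + sigma**2) * (1 + gamma**2 * sigma**2)**k
                           - mu_t ** 2)
    return mu, v
```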
5. The accuracy of the approximations
In order to investigate the accuracy of the approximations for the mean (12) and standard de-
viation (13) to the exact expressions in (8) and (9), we provide some comparisons for the MAM
model in Class 3.
These comparisons are done for quarterly data where the values of the components are assumed to be the following: $\ell_n = 100$, $b_n = 2$, $s_n = 0.80$, $s_{n-1} = 1.20$, $s_{n-2} = 0.90$, $s_{n-3} = 1.10$. We use the following base values for the parameters: $\alpha = 0.2$, $\beta = 0.3$ (i.e., $\alpha\beta = 0.06$), $\gamma = 0.1$, and $\sigma = 0.05$. We vary these parameters one at a time, as shown in Table 4.
The results in Table 4 show that the mean and approximate mean are always very close and that
the percentage difference in the standard deviations only becomes substantial when we increase
γ. This result for the standard deviation is not surprising because the approximation is exact if
γ = 0. In fact, we recommend that the approximation not be used if the seasonal smoothing parameter γ exceeds 0.10.
6. Example
As a numerical example, we consider the quarterly sales data given in Makridakis, Wheelwright and Hyndman (1998, p.162) and use the multiplicative Holt-Winters' method (model MAM). Following the approach outlined in Hyndman et al. (2001), we estimate the parameters to be $\alpha = 0.8$, $\beta = 0.1$, $\gamma = 0.1$ and $\sigma = 0.0384$, with the final states $\ell_n = 757.2$, $b_n = 17.6$ and $z_n = (0.873, 1.146, 1.031, 0.958)'$.

Figure 1 shows the forecast standard deviations calculated exactly using (9) and approximately using (13). We also show the approximation suggested by Koehler, Snyder and Ord (2001) for $1 \le h \le m$. Clearly, both approximations are very close to the exact values in this case (because $\sigma^2$ is so small here).
h    Mean (8)    Approx. Mean (12)    SD (9)    Approx. SD (13)    SD % difference
σ = 0.05, α = 0.2, αβ = 0.06, γ = 0.1
5 121.01 121.00 7.53 7.33 2.69
6 100.81 100.80 6.68 6.52 2.37
7 136.81 136.80 9.70 9.50 2.07
8 92.81 92.80 7.06 6.93 1.80
9 129.83 129.80 10.85 10.45 3.68
10 108.03 108.00 9.65 9.34 3.21
11 146.44 146.40 13.99 13.60 2.81
12 99.22 99.20 10.13 9.88 2.47
σ = 0.1, α = 0.2, αβ = 0.06, γ = 0.1
5 121.05 121.00 15.09 14.68 2.73
6 100.84 100.80 13.39 13.07 2.40
7 136.86 136.80 19.45 19.04 2.11
8 92.84 92.80 14.15 13.89 1.84
9 129.93 129.80 21.77 20.96 3.75
10 108.11 108.00 19.39 18.75 3.29
11 146.55 146.40 28.11 27.30 2.89
12 99.30 99.20 20.35 19.83 2.55
σ = 0.05, α = 0.6, αβ = 0.06, γ = 0.1
5 121.02 121.00 10.87 10.60 2.47
6 100.82 100.80 9.96 9.76 2.04
7 136.83 136.80 14.76 14.51 1.72
8 92.82 92.80 10.86 10.70 1.47
9 129.86 129.80 16.64 16.19 2.71
10 108.05 108.00 14.83 14.48 2.37
11 146.46 146.40 21.45 21.00 2.09
12 99.24 99.20 15.45 15.16 1.86
σ = 0.05, α = 0.2, αβ = 0.18, γ = 0.1
5 121.03 121.00 10.19 9.87 3.08
6 100.82 100.80 9.88 9.66 2.27
7 136.83 136.80 15.55 15.29 1.69
8 92.82 92.80 12.14 11.98 1.28
9 129.87 129.80 19.67 19.16 2.56
10 108.06 108.00 18.41 18.04 2.03
11 146.48 146.40 27.86 27.41 1.64
12 99.26 99.20 20.93 20.65 1.35
σ = 0.05, α = 0.2, αβ = 0.06, γ = 0.3
5 121.04 121.00 8.10 7.53 7.12
6 100.83 100.80 7.13 6.68 6.36
7 136.84 136.80 10.28 9.70 5.64
8 92.83 92.80 7.42 7.05 4.97
9 129.90 129.80 11.89 10.77 9.46
10 108.08 108.00 10.47 9.59 8.42
11 146.51 146.40 15.04 13.91 7.49
12 99.27 99.20 10.79 10.07 6.67
Table 4: Comparison of exact and approximate means and standard deviations for MAM model in Class 3 (i.e., (8) and (9) versus (12) and (13)).
Figure 1: Forecast standard deviations (against forecast horizon) calculated (a) exactly using (9); (b) approximately using (13); and (c) using the approximation suggested by Koehler, Snyder and Ord (2001) for $1 \le h \le m$.
Figure 2: Quarterly sales data with three years of forecasts. The solid lines show prediction intervals calculated as $\mu_h \pm 1.96\sqrt{v_h}$ and the dotted lines show prediction intervals computed by generating 20,000 future sample paths from the fitted model and finding the 2.5% and 97.5% quantiles at each forecast horizon.
The data with three years of forecasts are shown in Figure 2. In this case, the conditional mean forecasts obtained from model MAM are virtually indistinguishable from the usual forecasts because $\sigma$ is so small (they are identical up to $h = m$). The solid lines show prediction intervals calculated as $\mu_h \pm 1.96\sqrt{v_h}$ and the dotted lines show prediction intervals computed by generating 20,000 future sample paths from the fitted model and finding the 2.5% and 97.5% quantiles at each forecast horizon. Clearly, the variance-based intervals are a good approximation despite the non-normality of the forecast distributions.
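A sketch of the simulation approach used for the dotted intervals in Figure 2 (our code, not the authors'): it simulates future sample paths from the MAM equations of Table 1 with Gaussian errors and returns percentile-based limits. The fitted values reported above can be passed in directly, with the seasonal states supplied in the $z_n$ ordering $(s_n, s_{n-1}, \dots, s_{n-m+1})$.

```python
import numpy as np

def simulate_mam_paths(l, b, seasonals, alpha, beta, gamma, sigma, H, npaths, seed=0):
    """Simulate future paths from a fitted MAM model and return the 2.5% and
    97.5% quantiles at each horizon. `seasonals` is most-recent-first."""
    rng = np.random.default_rng(seed)
    m = len(seasonals)
    paths = np.empty((npaths, H))
    for i in range(npaths):
        lev, tr, s = l, b, list(seasonals)
        for h in range(H):
            eps = rng.normal(0.0, sigma)
            y = (lev + tr) * s[m - 1] * (1 + eps)      # observation equation
            new_s = s[m - 1] * (1 + gamma * eps)       # s_t = s_{t-m}(1 + gamma*eps)
            lev, tr = (lev + tr) * (1 + alpha * eps), tr + alpha * beta * (lev + tr) * eps
            s = [new_s] + s[:-1]
            paths[i, h] = y
    return np.percentile(paths, [2.5, 97.5], axis=0)

# e.g. using the fitted values reported above, with 20,000 paths:
lower, upper = simulate_mam_paths(757.2, 17.6, [0.873, 1.146, 1.031, 0.958],
                                  alpha=0.8, beta=0.1, gamma=0.1, sigma=0.0384,
                                  H=12, npaths=20000)
```

The percentile limits can then be compared with the variance-based limits $\mu_h \pm 1.96\sqrt{v_h}$, as in Figure 2.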
7. Summary
For three general classes of state space models, we have provided derivations of exact matrix
expressions for the means and variances of prediction distributions. These general results are
presented separately in a section for each class with the derivations put in three separate ap-
pendices. We relate these three classes of state space models to the commonly used exponential
smoothing methods (simple, Holt, and additive and multiplicative Holt-Winters) and to other
known exponential smoothing methods (Hyndman et al, 2001). We provide a summary of these
models and the corresponding non-matrix expressions of the means and variances in Tables 1, 2
and 3. These means and variances may be used to construct analytical prediction intervals when
using the exponential smoothing methods for forecasting.
The non-matrix formulas for the Class 3 models are not exact for h > m. In Table 4 we compare
our exact matrix formulas with our approximate formulas for the model that corresponds to the
multiplicative Holt-Winters method (MAM). We find that the approximation is very good as long
as the smoothing parameter for the seasonal component remains small (i.e., less than 0.1). We
also consider an example in which we compare our forecast standard deviations and prediction
intervals with the values from some of the previously used approaches.
In summary, we have provided, for the first time, exact analytical formulas for the variances of
prediction distributions for all the exponential smoothing methods. More generally, we have ex-
act formulas for variances of the general state space models of which the exponential smoothing
models are special cases. Where possible, we have presented both matrix and non-matrix expres-
sions.
Simulation methods have been the only comprehensive approach to handling the prediction dis-
tribution problem for all exponential smoothing methods to date. Our formulas provide an effec-
tive alternative, the advantage being that they involve much lower computational loads.
Appendix A: Proofs of results for Class 1
Let

$m_h = \mathrm{E}(x_{n+h} \mid x_n)$  and  $V_h = \mathrm{Var}(x_{n+h} \mid x_n)$.

Note that $m_0 = x_n$ and $V_0 = O$.

For Class 1,

$m_h = F m_{h-1} = F^2 m_{h-2} = \cdots = F^h m_0 = F^h x_n$

and therefore

$\mu_h = H m_{h-1} = H F^{h-1} x_n$.

The state forecast variance is given by

$V_h = F V_{h-1} F' + G G' \sigma^2$

and therefore

$V_h = \sigma^2 \sum_{j=0}^{h-1} F^j G G' (F^j)'$.

Hence, the prediction variance for $h$ periods ahead is

$v_h = H V_{h-1} H' + \sigma^2 = \begin{cases} \sigma^2 & \text{if } h = 1; \\ \sigma^2\bigl[1 + \sum_{j=1}^{h-1} c_j^2\bigr] & \text{if } h \ge 2; \end{cases}$

where $c_j = H F^{j-1} G$.
We now consider the particular cases.
ADA

We first derive the results for the ADA case. Here the state is $x_n = (\ell_n, b_n, s_n, s_{n-1}, \dots, s_{n-m+1})'$,

$H = [1 \;\; 1 \;\; 0'_{m-1} \;\; 1]$,  $F = \begin{bmatrix} 1 & 1 & 0'_{m-1} & 0 \\ 0 & \phi & 0'_{m-1} & 0 \\ 0 & 0 & 0'_{m-1} & 1 \\ 0_{m-1} & 0_{m-1} & I_{m-1} & 0_{m-1} \end{bmatrix}$  and  $G = \begin{bmatrix} \alpha \\ \alpha\beta \\ \gamma \\ 0_{m-1} \end{bmatrix}$,

where $I_n$ denotes the $n \times n$ identity matrix and $0_n$ denotes a zero vector of length $n$.

Therefore $H F^i = [1, \; \phi_i, \; d_{i+1,m}, \; d_{i+2,m}, \; \dots, \; d_{i+m,m}]$ where $\phi_i = 1 + \phi + \cdots + \phi^i$ and $d_{j,m} = 1$ if $j \equiv 0 \pmod m$ and $d_{j,m} = 0$ otherwise. Hence we find $c_j = H F^{j-1} G = \alpha(1 + \phi_{j-1}\beta) + \gamma d_{j,m}$,

$\mu_h = \ell_n + \phi_{h-1} b_n + s_{n-m+1+(h-1)^*}$
and for $h \ge 2$,

$v_h = \sigma^2\Bigl\{1 + \sum_{j=1}^{h-1}\bigl[\alpha(1 + \phi_{j-1}\beta) + \gamma d_{j,m}\bigr]^2\Bigr\}
     = \sigma^2\Bigl\{1 + \sum_{j=1}^{h-1}\bigl[\alpha^2(1 + \phi_{j-1}\beta)^2 + \gamma d_{j,m}\{\gamma + 2\alpha(1 + \phi_{j-1}\beta)\}\bigr]\Bigr\}.$
These formulas agree with those of Yar and Chatfield (1990), except that we apply the damping parameter $\phi$ beginning in the second forecast time period, $n + 2$, instead of in the first forecast time period, $n + 1$.
Other cases

All other cases of Class 1 can be derived as special cases of ADA.
For ADN, we use the results of ADA with $\gamma = 0$ and $s_t = 0$ for all $t$.
For AAN, we use the results of ADN with $\phi = 1$.
The results for ANN are obtained from AAN by further setting $\beta = 0$ and $b_t = 0$ for all $t$.
For AAA, the results of ADA hold with $\phi = 1$.
The results for ANA are obtained as a special case of AAA with $\beta = 0$ and $b_t = 0$ for all $t$.
Appendix B: Proofs of results for Class 2
Let $m_h$ and $V_h$ be defined as in Appendix A. The forecast means for Class 2 have the same form as for Class 1, namely

$\mu_h = H m_{h-1} = H F^{h-1} x_n$.

To obtain $V_h$, first note that $V_h = F V_{h-1} F' + G\,\mathrm{Var}(x_{n+h-1}\varepsilon_{n+h})\,G'$ and that

$\mathrm{Var}(x_{n+h-1}\varepsilon_{n+h}) = \mathrm{E}[x_{n+h-1}x_{n+h-1}']\,\mathrm{E}(\varepsilon_{n+h}^2) - 0 = \sigma^2[V_{h-1} + m_{h-1}m_{h-1}']$.

Therefore

$V_h = F V_{h-1} F' + \sigma^2 G V_{h-1} G' + \sigma^2 P_{h-1},$

where $P_j = G F^j x_n x_n' (F^j)' G'$.

The forecast variance is given by

$v_h = H V_{h-1} H'(1 + \sigma^2) + \sigma^2 H m_{h-1} m_{h-1}' H' = H V_{h-1} H'(1 + \sigma^2) + \sigma^2 \mu_h^2.$
In the special case where $G = QH$ we obtain a simpler result. In this case, $x_t = F x_{t-1} + Q e_t$ where $e_t = y_t - H x_{t-1} = H x_{t-1}\varepsilon_t$. Thus, we obtain the linear exponential smoothing updating rule $x_t = F x_{t-1} + Q(y_t - H x_{t-1})$. Define $\theta_h$ such that $\mathrm{Var}(e_{n+h} \mid x_n) = \theta_h \sigma^2$. Then it is readily seen that $V_h = F V_{h-1} F' + Q Q'\,\mathrm{Var}(e_{n+h} \mid x_n)$ and so, by repeated substitution,

$V_h = \sigma^2 \sum_{j=0}^{h-1} F^j Q Q' (F^j)'\, \theta_{h-j}$

and

$H V_{h-1} H' = \sigma^2 \sum_{j=1}^{h-1} c_j^2\, \theta_{h-j}$   (14)

where $c_j = H F^{j-1} Q$. Now

$e_{n+h} = \bigl(H(x_{n+h-1} - m_{h-1}) + H m_{h-1}\bigr)\varepsilon_{n+h}$

which we square and take expectations to give $\theta_h = H V_{h-1} H' + \mu_h^2$. Substituting (14) into this expression for $\theta_h$ gives

$\theta_h = \sigma^2 \sum_{j=1}^{h-1} c_j^2\,\theta_{h-j} + \mu_h^2$   (15)

where $\theta_1 = \mu_1^2$. The forecast variance is then given by

$v_h = (1 + \sigma^2)\theta_h - \mu_h^2$.   (16)
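The updating rule $x_t = F x_{t-1} + Q(y_t - H x_{t-1})$ is the usual linear exponential smoothing filter. A minimal sketch of running it over an observed series (our code; it simply returns the one-step errors $e_t$ used above, which in Class 2 would be divided by the one-step forecasts to give relative residuals):

```python
import numpy as np

def filter_linear_es(y, F, Q, H, x0):
    """Run x_t = F x_{t-1} + Q e_t with e_t = y_t - H x_{t-1} (case G = QH)."""
    F = np.asarray(F, float)
    Q = np.asarray(Q, float).reshape(-1, 1)
    H = np.asarray(H, float).reshape(1, -1)
    x = np.asarray(x0, float).reshape(-1, 1)
    errors = []
    for yt in y:
        e = float(yt - H @ x)      # one-step-ahead forecast error
        x = F @ x + Q * e          # state update
        errors.append(e)
    return np.asarray(errors), x
```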
We now consider the particular cases.
MDA

We first derive the results for the MDA case. Here the state is $x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1})'$, $H = [1, 1, 0, \dots, 0, 1]$,

$F = \begin{bmatrix} 1 & 1 & 0'_{m-1} & 0 \\ 0 & \phi & 0'_{m-1} & 0 \\ 0 & 0 & 0'_{m-1} & 1 \\ 0_{m-1} & 0_{m-1} & I_{m-1} & 0_{m-1} \end{bmatrix}$  and  $G = \begin{bmatrix} \alpha & \alpha & 0'_{m-1} & \alpha \\ \alpha\beta & \alpha\beta & 0'_{m-1} & \alpha\beta \\ \gamma & \gamma & 0'_{m-1} & \gamma \\ 0_{m-1} & 0_{m-1} & O_{m-1} & 0_{m-1} \end{bmatrix}.$

Then from (4) we obtain $\mu_h = \ell_n + \phi_{h-1} b_n + s_{n-m+1+(h-1)^*}$ where $\phi_i = 1 + \phi + \cdots + \phi^i$ and $j^* = j \bmod m$.

To obtain the expression for $v_h$, note that this model satisfies the special case $G = QH$ where $Q = [\alpha, \alpha\beta, \gamma, 0'_{m-1}]'$. Thus we can use the expression (16) where $c_j = H F^{j-1} Q = \alpha(1 + \phi_{j-1}\beta) + \gamma d_{j,m}$ (the same as $c_j$ for the corresponding model from Class 1).
Other cases
All other cases of Class 2 can be derived as special cases of MDA.
For MDN, we use the results of MDA with $\gamma = 0$ and $s_t = 0$ for all $t$.
For MAN, we use the results of MDN with $\phi = 1$.
For MAA, the results of MDA hold with $\phi = 1$.
The results for MNA are obtained as a special case of MAA with $\beta = 0$ and $b_t = 0$ for all $t$.
The results for MNN are obtained from MAN by further setting $\beta = 0$ and $b_t = 0$ for all $t$.
In this case, a simpler expression for $v_h$ can be obtained. Note that $c_j = \alpha$, $\theta_1 = \ell_n^2$ and, for $j \ge 2$,

$\theta_j = \ell_n^2 + \sigma^2\alpha^2 \sum_{i=1}^{j-1} \theta_{j-i} = \ell_n^2 + \alpha^2\sigma^2(\theta_1 + \theta_2 + \cdots + \theta_{j-1}).$

Hence

$\theta_j = \ell_n^2 (1 + \alpha^2\sigma^2)^{j-1}$

and

$v_h = \ell_n^2(1 + \alpha^2\sigma^2)^{h-1}(1 + \sigma^2) - \ell_n^2 = \ell_n^2\bigl[(1 + \alpha^2\sigma^2)^{h-1}(1 + \sigma^2) - 1\bigr].$
Appendix C: Proofs of results for Class 3
Note that we can write $Y_t$ as

$Y_t = H_1 x_{t-1} z_{t-1}' H_2' (1 + \varepsilon_t).$

So let $W_h = x_{n+h} z_{n+h}'$, $M_h = \mathrm{E}(W_h \mid x_n, z_n)$ and $V_h = \mathrm{Var}(W_h \mid x_n, z_n)$ where (by standard definitions)

$V_h = \mathrm{Var}(\mathrm{vec}\,W_h \mid x_n, z_n)$,  and  $\mathrm{vec}\,A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_r \end{bmatrix}$  where matrix $A = [a_1\; a_2\; \cdots\; a_r]$.
Note that

$W_h = (F_1 x_{n+h-1} + G_1 x_{n+h-1}\varepsilon_{n+h})(z_{n+h-1}' F_2' + z_{n+h-1}' G_2' \varepsilon_{n+h})
     = F_1 W_{h-1} F_2' + (F_1 W_{h-1} G_2' + G_1 W_{h-1} F_2')\varepsilon_{n+h} + G_1 W_{h-1} G_2'\, \varepsilon_{n+h}^2.$

It follows that $M_0 = x_n z_n'$ and

$M_h = F_1 M_{h-1} F_2' + G_1 M_{h-1} G_2'\, \sigma^2.$   (17)
For the variance of $W_h$, we find $V_0 = O$, and

$V_h = \mathrm{Var}\bigl\{\mathrm{vec}(F_1 W_{h-1} F_2') + \mathrm{vec}(F_1 W_{h-1} G_2' + G_1 W_{h-1} F_2')\varepsilon_{n+h} + \mathrm{vec}(G_1 W_{h-1} G_2')\varepsilon_{n+h}^2\bigr\}$
$\quad = (F_2 \otimes F_1) V_{h-1} (F_2 \otimes F_1)'$
$\qquad + (G_2 \otimes F_1 + F_2 \otimes G_1)\,\mathrm{Var}(\mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h})\,(G_2 \otimes F_1 + F_2 \otimes G_1)'$
$\qquad + (G_2 \otimes G_1)\,\mathrm{Var}(\mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h}^2)\,(G_2 \otimes G_1)'$
$\qquad + (F_2 \otimes F_1)\,\mathrm{Cov}(\mathrm{vec}\,W_{h-1},\, \mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h}^2)\,(G_2 \otimes G_1)'$
$\qquad + (G_2 \otimes G_1)\,\mathrm{Cov}(\mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h}^2,\, \mathrm{vec}\,W_{h-1})\,(F_2 \otimes F_1)'.$
Next we find that

$\mathrm{Var}(\mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h}) = \mathrm{E}\bigl[\mathrm{vec}\,W_{h-1}(\mathrm{vec}\,W_{h-1})'\varepsilon_{n+h}^2\bigr] = \sigma^2\bigl(V_{h-1} + \mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr),$

$\mathrm{Var}(\mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h}^2) = \mathrm{E}\bigl(\mathrm{vec}\,W_{h-1}(\mathrm{vec}\,W_{h-1})'\varepsilon_{n+h}^4\bigr) - \mathrm{E}(\mathrm{vec}\,W_{h-1})\mathrm{E}(\mathrm{vec}\,W_{h-1})'\sigma^4$
$\quad = 3\sigma^4\bigl(V_{h-1} + \mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr) - \mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\sigma^4$
$\quad = \sigma^4\bigl(3V_{h-1} + 2\,\mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr),$

and

$\mathrm{Cov}(\mathrm{vec}\,W_{h-1},\, \mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h}^2) = \mathrm{E}\bigl(\mathrm{vec}\,W_{h-1}(\mathrm{vec}\,W_{h-1})'\varepsilon_{n+h}^2\bigr) - \mathrm{E}(\mathrm{vec}\,W_{h-1})\mathrm{E}(\mathrm{vec}\,W_{h-1})'\sigma^2$
$\quad = \sigma^2\bigl(V_{h-1} + \mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr) - \sigma^2\,\mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'$
$\quad = \sigma^2 V_{h-1}.$
It follows that

$V_h = (F_2 \otimes F_1) V_{h-1} (F_2 \otimes F_1)' + \sigma^2\bigl[(F_2 \otimes F_1) V_{h-1} (G_2 \otimes G_1)' + (G_2 \otimes G_1) V_{h-1} (F_2 \otimes F_1)'\bigr]$
$\quad + \sigma^2 (G_2 \otimes F_1 + F_2 \otimes G_1)\bigl[V_{h-1} + \mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr](G_2 \otimes F_1 + F_2 \otimes G_1)'$
$\quad + \sigma^4 (G_2 \otimes G_1)\bigl[3V_{h-1} + 2\,\mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr](G_2 \otimes G_1)'.$
The forecast mean and variance are given by

$\mu_h = \mathrm{E}(Y_{n+h} \mid x_n, z_n) = H_1 M_{h-1} H_2'$

and

$v_h = \mathrm{Var}(Y_{n+h} \mid x_n, z_n) = \mathrm{Var}\bigl[\mathrm{vec}(H_1 W_{h-1} H_2' + H_1 W_{h-1} H_2'\, \varepsilon_{n+h})\bigr]$
$\quad = \mathrm{Var}\bigl[(H_2 \otimes H_1)\mathrm{vec}\,W_{h-1} + (H_2 \otimes H_1)\mathrm{vec}\,W_{h-1}\,\varepsilon_{n+h}\bigr]$
$\quad = (H_2 \otimes H_1)\bigl[V_{h-1}(1 + \sigma^2) + \sigma^2\,\mathrm{vec}M_{h-1}(\mathrm{vec}M_{h-1})'\bigr](H_2' \otimes H_1')$
$\quad = (1 + \sigma^2)(H_2 \otimes H_1) V_{h-1} (H_2 \otimes H_1)' + \sigma^2 \mu_h^2.$
When $\sigma$ is small (much less than 1), it is possible to obtain some simpler but approximate expressions. The second term in (17) can be dropped to give $M_{h-1} = F_1^{h-1} M_0 (F_2^{h-1})'$ and so

$\mu_h \approx H_1 F_1^{h-1} x_n\, (H_2 F_2^{h-1} z_n)'.$

The order of this approximation can be obtained by noting that the observation equation may be written as $Y_t = U_{1,t} U_{2,t} U_{3,t}$ where $U_{1,t} = H_1 x_{t-1}$, $U_{2,t} = H_2 z_{t-1}$ and $U_{3,t} = 1 + \varepsilon_t$. Then

$\mathrm{E}(Y_t) = \mathrm{E}(U_{1,t} U_{2,t} U_{3,t}) = \mathrm{E}(U_{1,t} U_{2,t})\,\mathrm{E}(U_{3,t})$

since $U_{3,t}$ is independent of $U_{1,t}$ and $U_{2,t}$. Since $\mathrm{E}(U_{1,t} U_{2,t}) = \mathrm{E}(U_{1,t})\mathrm{E}(U_{2,t}) + \mathrm{Cov}(U_{1,t}, U_{2,t})$, we have the approximation:

$\mu_h = \mathrm{E}(Y_{n+h} \mid x_n, z_n) = \mathrm{E}(U_{1,n+h} \mid x_n)\,\mathrm{E}(U_{2,n+h} \mid z_n)\,\mathrm{E}(U_{3,n+h}) + O(\sigma^2).$
When $U_{2,n+h}$ is constant the result is exact. Now let

$\mu_{1,h} = \mathrm{E}(U_{1,n+h+1} \mid x_n) = \mathrm{E}(H_1 x_{n+h} \mid x_n) = H_1 F_1^h x_n$
$\mu_{2,h} = \mathrm{E}(U_{2,n+h+1} \mid z_n) = \mathrm{E}(H_2 z_{n+h} \mid z_n) = H_2 F_2^h z_n$
$v_{1,h} = \mathrm{Var}(U_{1,n+h+1} \mid x_n) = \mathrm{Var}(H_1 x_{n+h} \mid x_n)$
$v_{2,h} = \mathrm{Var}(U_{2,n+h+1} \mid z_n) = \mathrm{Var}(H_2 z_{n+h} \mid z_n)$

and $v_{12,h} = \mathrm{Cov}(U_{1,n+h+1}^2, U_{2,n+h+1}^2 \mid x_n, z_n) = \mathrm{Cov}\bigl([H_1 x_{n+h}]^2, [H_2 z_{n+h}]^2 \mid x_n, z_n\bigr)$.

Then

$\mu_h = \mu_{1,h-1}\,\mu_{2,h-1} + O(\sigma^2) = H_1 F_1^{h-1} x_n\; H_2 F_2^{h-1} z_n + O(\sigma^2).$
By the same arguments, we have

$\mathrm{E}(Y_t^2) = \mathrm{E}(U_{1,t}^2 U_{2,t}^2 U_{3,t}^2) = \mathrm{E}(U_{1,t}^2 U_{2,t}^2)\,\mathrm{E}(U_{3,t}^2)$

and

$\mathrm{E}(Y_{n+h}^2 \mid z_n, x_n) = \mathrm{E}(U_{1,n+h}^2 U_{2,n+h}^2 \mid x_n, z_n)\,\mathrm{E}(U_{3,n+h}^2)$
$\quad = \bigl[\mathrm{Cov}(U_{1,n+h}^2, U_{2,n+h}^2 \mid x_n, z_n) + \mathrm{E}(U_{1,n+h}^2 \mid x_n)\,\mathrm{E}(U_{2,n+h}^2 \mid z_n)\bigr]\mathrm{E}(U_{3,n+h}^2)$
$\quad = (1 + \sigma^2)\bigl[v_{12,h-1} + (v_{1,h-1} + \mu_{1,h-1}^2)(v_{2,h-1} + \mu_{2,h-1}^2)\bigr].$

Assuming the covariance $v_{12,h-1}$ is small compared to the other terms, we obtain

$v_h \approx (1 + \sigma^2)(v_{1,h-1} + \mu_{1,h-1}^2)(v_{2,h-1} + \mu_{2,h-1}^2) - \mu_{1,h-1}^2\,\mu_{2,h-1}^2.$
We now consider the particular cases.
MDM

We first derive the results for the MDM case, where $x_t = (\ell_t, b_t)'$ and $z_t = (s_t, \dots, s_{t-m+1})'$, and the matrix coefficients are $H_1 = [1, 1]$, $H_2 = [0, \dots, 0, 1]$,

$F_1 = \begin{bmatrix} 1 & 1 \\ 0 & \phi \end{bmatrix}$,  $F_2 = \begin{bmatrix} 0'_{m-1} & 1 \\ I_{m-1} & 0_{m-1} \end{bmatrix}$,  $G_1 = \begin{bmatrix} \alpha & \alpha \\ \alpha\beta & \alpha\beta \end{bmatrix}$,  and  $G_2 = \begin{bmatrix} 0'_{m-1} & \gamma \\ O_{m-1} & 0_{m-1} \end{bmatrix}$.

Many terms will be zero in the formulas for the expected value and the variance because of the following relationships: $G_2^2 = O_m$, $H_2 G_2 = 0'_m$, and $(H_2 \otimes H_1)(G_2 \otimes X) = 0'_{2m}$ where $X$ is any $2 \times 2$ matrix. For the terms that remain, $H_2 \otimes H_1$ and its transpose will only use the terms from the last two rows of the last two columns of the large matrices, because $H_2 \otimes H_1 = [0'_{2m-2}, 1, 1]$.

Using the small-$\sigma$ approximations and exploiting the structure of the MDM model, we can obtain simpler expressions that approximate $\mu_h$ and $v_h$.

Note that $H_2 F_2^j G_2 = \gamma d_{j+1,m} H_2$. So for $h < m$, we have

$H_2 z_{n+h} \mid z_n = H_2 \prod_{j=1}^{h}(F_2 + G_2\varepsilon_{n+h-j+1})\, z_n = H_2 F_2^h z_n = s_{n-m+h+1}.$
Furthermore,

$\mu_{2,h} = s_{n-m+1+h^*}$  and  $v_{2,h} = s^2_{n-m+1+h^*}\bigl[(1 + \gamma^2\sigma^2)^k - 1\bigr]$

where $k = \lfloor (h-1)/m \rfloor$.

Also note that $x_n$ has the same properties as for MDN in Class 2. Thus

$\mu_{1,h} = \ell_n + \phi_{h-1} b_n$  and  $v_{1,h} = (1 + \sigma^2)\theta_h - \mu_{1,h}^2$.

Combining all the terms, we arrive at the approximations

$\mu_h = \tilde\mu_h\, s_{n-m+1+(h-1)^*} + O(\sigma^2)$

and

$v_h \approx s^2_{n-m+1+(h-1)^*}\bigl[\theta_h(1 + \sigma^2)(1 + \gamma^2\sigma^2)^k - \tilde\mu_h^2\bigr]$

where $\tilde\mu_h = \ell_n + \phi_{h-1} b_n$, $\theta_1 = \tilde\mu_1^2$, and

$\theta_h = \tilde\mu_h^2 + \sigma^2\alpha^2 \sum_{j=1}^{h-1}(1 + \phi_{j-1}\beta)^2\, \theta_{h-j}, \qquad h \ge 2.$

These expressions are exact for $h \le m$. Also for $h \le m$, the formulas agree with those in Koehler, Ord, and Snyder (2001) and Chatfield and Yar (1991), if the $O(\sigma^4)$ terms are dropped from the expression.
Other cases
The other cases of Class 3 can be derived as special cases of MDM.
For MAM, we use the results of MDM with φ = 1.
The results for MNM are obtained as a special case of MAM with $\beta = 0$ and $b_t = 0$ for all $t$.
The simpler expression for $v_h$ is obtained as for MNN in Class 2.
References
CHATFIELD, C. and M. YAR (1991) Prediction intervals for multiplicative Holt-Winters, International Journal of Forecasting, 7, 31–37.

HYNDMAN, R.J., A.B. KOEHLER, R.D. SNYDER and S. GROSE (2001) A state space framework for automatic forecasting using exponential smoothing methods, International Journal of Forecasting, to appear.

JOHNSTON, F.R. and P.J. HARRISON (1986) The variance of leadtime demand, Journal of the Operational Research Society, 37, 303–308.

KOEHLER, A.B., R.D. SNYDER and J.K. ORD (2001) Forecasting models and prediction intervals for the multiplicative Holt-Winters method, International Journal of Forecasting, 17, 269–286.

MAKRIDAKIS, S., S.C. WHEELWRIGHT and R.J. HYNDMAN (1998) Forecasting: methods and applications, 3rd edition, John Wiley & Sons: New York.

MCKENZIE, E. (1976) A comparison of some standard seasonal forecasting systems, The Statistician, 25, 3–14.

ORD, J.K., A.B. KOEHLER and R.D. SNYDER (1997) Estimation and prediction for a class of dynamic nonlinear statistical models, Journal of the American Statistical Association, 92, 1621–1629.

ROBERTS, S.A. (1982) A general class of Holt-Winters type forecasting models, Management Science, 28, 808–820.

YAR, M. and C. CHATFIELD (1990) Prediction intervals for the Holt-Winters forecasting procedure, International Journal of Forecasting, 6, 127–137.