Multiplicative State Space Models for Intermittent Time Series

Ivan Svetunkov*, John E. Boylan

Centre for Marketing Analytics and Forecasting, Lancaster University Management School, Lancaster, LA1 4YX, UK
Abstract

Intermittent demand forecasting is an important supply chain task, which is commonly done using methods based on exponential smoothing. These methods, however, do not have underlying statistical models, which limits their generalisation. In this paper we propose a general state space model that takes the intermittence of data into account, extending the taxonomy of innovation (single source of error) state space models. We show that this model has a connection with conventional non-intermittent state space models and that certain forms of it may be estimated by Croston's and Teunter-Syntetos-Babai (TSB) forecasting methods. We discuss properties of the proposed models and show how a selection can be made between them in the proposed framework. We then conduct experiments on two real life datasets, demonstrating advantages of the proposed approach.

Keywords: forecasting, state space models, exponential smoothing, Croston, intermittent demand, supply chain
1. Introduction

An intermittent time series is a series that has non-zero values occurring at irregular frequency. The data is usually, but not necessarily, discrete and often takes low integer values. Intermittent series occur in many application areas where there are rare events. Examples include security breaches, natural disasters and the occurrence of demand for 'slow-moving' products. In the last case we usually refer to "intermittent demand" which, in addition to irregularity of occurrence, contains only zeroes and positive values. The final application area is important in a supply chain setting, where decisions need to be made about the quantity to order and the discontinuation of ordering.

*Correspondence: Ivan Svetunkov, Department of Management Science, Lancaster University Management School, Lancaster, Lancashire, LA1 4YX, UK. Email address: i.svetunkov@lancaster.ac.uk (Ivan Svetunkov)

Working Paper, May 1, 2019
In this paper, we establish a general modelling framework for intermittent series, which does not depend on any particular application area. However, its development has been motivated by the requirements of inventory management. Two forecasting methods that have been described in the supply chain literature to inform how much to order and when to discontinue a product are Croston's method (Croston, 1972) and the TSB method (Teunter et al., 2011). Neither of these methods has, so far, been furnished with a complete statistical model.
From a practical supply chain perspective, there are a number of issues that need to be resolved. We need to decide, in a systematic way, which intermittent demand forecasting method to use, rather than allowing these choices to be arbitrary. Having chosen the forecasting method, it needs to be parametrised appropriately. Finally, replenishment decisions should be informed by reliable estimates. For inventory systems based on the probability of stock-out, good estimates of upper percentiles of demand are required. For systems based on fill-rates (the percentage of demand filled immediately from stock), good estimates of probabilities of demand are required. In the latter case, these may be calculated from an estimated Cumulative Distribution Function. The appropriate modelling approach supports the estimation of both percentiles and Cumulative Distribution Functions. Indeed, we argue that a complete statistical model can support method choice, parametrisation and, ultimately, replenishment decisions in supply chains.
In this paper we define a complete statistical model as a mathematical representation of a real phenomenon with a complete specification of distribution and parameters. A forecasting method is a mathematical procedure that generates point and/or interval forecasts, with or without a statistical model. Finally, a filter is a mathematical equation (or set of equations) that connects input and output variables, removing unwanted components or features of the original signal.
In this paper we propose a general statistical model for intermittent data, special cases of which underpin Croston's method and TSB. The models that we propose allow analytical conditional moments to be produced without the need for simulations. However, distributional approximations are needed in order to produce prediction intervals for multiple steps ahead. We also show the connection between conventional forecasting models and the intermittent demand model. We then demonstrate how the proposed intermittent demand model works on several examples. Thus we contribute towards filling a gap in the modelling of intermittent time series, which opens new research directions in the area.
2. Literature review

The most popular intermittent demand forecasting method was proposed by Croston (1972). His method has been researched extensively in recent years and has been implemented in widely adopted supply chain software packages (e.g. SAS, SAP APO and others). Croston was the first to note that, when demand is intermittent, simple exponential smoothing produces biased forecasts immediately after demand occurrences (known as 'decision-point bias'). So he proposed splitting the observed data into two parts: demand sizes and demand occurrences. The model proposed in Croston (1972) has the following simple form:
\[ y_t = o_t z_t, \qquad (1) \]
where $y_t$ is the actual observation, $o_t$ is a binary Bernoulli distributed variable taking a value of one when demand occurs and zero otherwise, $z_t$ is the potential demand size, having some conditional distribution, becoming the realised demand when $o_t = 1$ and, finally, $t$ is the time of the observation. Proposing the model (1), Croston suggested working with each of these two parts separately, showing that the probability of occurrence can be estimated using intervals between demands. This also means that instead of having the series $t = 1, \dots, T$, we have two time series, namely the demand intervals $q_{j_t}$ and demand sizes $z_{j_t}$, where $j_t = 1, \dots, N$ reflects the sequential numbers of demand intervals and demand sizes and $N$ is the number of non-zero demands. If $q_{j_t}$ is the time elapsed since the last non-zero observation, then it represents the demand interval when the next non-zero observation occurs. The demand intervals, and consequently the probability of occurrence, are assumed to be constant between the non-zero demands, while the demand sizes are considered to be the same during the zero demands. Both demand sizes $z_{j_t}$ and demand intervals $q_{j_t}$ are forecasted in this method using simple exponential smoothing, which leads to the following system:
\[
\begin{aligned}
\hat{y}_{j_t} &= \frac{1}{\hat{q}_{j_t}} \hat{z}_{j_t} \\
\hat{z}_{j_t} &= \alpha_z z_{j_t-1} + (1 - \alpha_z) \hat{z}_{j_t-1} \\
\hat{q}_{j_t} &= \alpha_q q_{j_t-1} + (1 - \alpha_q) \hat{q}_{j_t-1} \\
\hat{y}_t &= \hat{y}_{j_t} \\
j_t &= j_{t-1} + o_t
\end{aligned}
\qquad (2)
\]
where $\hat{y}_{j_t}$ is the predicted mean demand, $\hat{z}_{j_t}$ is the predicted demand size, $\hat{q}_{j_t}$ is the predicted demand interval, and $\alpha_q$ and $\alpha_z$ are the smoothing parameters for intervals and sizes respectively. The formulation (2) demonstrates how each observation at time $t$ translates to the respective element $j_t$ of demand sizes and demand intervals. In Croston's initial formulation it was assumed that $\alpha_q = \alpha_z$, but separate smoothing parameters were later suggested by Schultz (1987), and this additional flexibility has been supported by other researchers (e.g. Snyder, 2002; Kourentzes, 2014).
Syntetos and Boylan (2001, 2005) showed that estimating the mean demand using the first equation in (2) leads to 'inversion bias' and, in order to correct it, they proposed an approximation (known as the Syntetos-Boylan Approximation, SBA). They conducted an experiment on 3000 real time series and showed that the forecasting accuracy of SBA is higher than that of Croston's method (Syntetos and Boylan, 2005).
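As an illustration, the updating scheme (2) and the SBA correction factor $(1 - \alpha_q/2)$ can be sketched in Python. This is a minimal sketch, not an implementation from the paper; in particular, the initialisation with the first non-zero demand and its interval is an assumption, since Croston's method leaves initialisation open.

```python
def croston_forecast(y, alpha_z=0.1, alpha_q=0.1, sba=False):
    """One-step-ahead forecast of mean demand following system (2):
    demand sizes and demand intervals are smoothed separately by SES.
    With sba=True the Syntetos-Boylan correction (1 - alpha_q/2) is applied."""
    nonzero = [(i, v) for i, v in enumerate(y) if v != 0]
    if not nonzero:
        return 0.0
    # Initialisation (an assumption): first non-zero demand and its interval
    z_hat = nonzero[0][1]
    q_hat = nonzero[0][0] + 1
    prev_i = nonzero[0][0]
    for i, v in nonzero[1:]:
        z_hat = alpha_z * v + (1 - alpha_z) * z_hat             # demand sizes
        q_hat = alpha_q * (i - prev_i) + (1 - alpha_q) * q_hat  # demand intervals
        prev_i = i
    forecast = z_hat / q_hat
    return forecast * (1 - alpha_q / 2) if sba else forecast

demand = [0, 0, 3, 0, 0, 0, 2, 0, 4, 0]
croston = croston_forecast(demand)
corrected = croston_forecast(demand, sba=True)
```

The division of the smoothed size by the smoothed interval in the last step is exactly the source of the inversion bias that the SBA factor corrects.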
Although various models have been proposed, none has so far been identified which would be appropriate for non-negative integer series and would underlie Croston's method. This means that heuristic methods of initialisation and parameter estimation are used instead of statistically rigorous ones. Several authors over the years have looked into this problem.
Snyder (2002) discussed possible statistical models underlying Croston's method. He examined the following form:
\[ y_t = o_t \mu_{t|t-1} + \epsilon_t, \qquad (3) \]
where $\mu_{t|t-1}$ is the conditional expectation of demand sizes. Snyder (2002) showed that the model (3) contradicts some basic assumptions about intermittent demand. The main reason for this is that the error term $\epsilon_t$ is assumed to be normally distributed, which means that demand can be negative. So, Snyder (2002) proposed the following modified intermittent demand model:
\[ y_t^{+} = o_t \exp(\mu_{t|t-1} + \epsilon_t), \qquad (4) \]
where $y_t^{+}$ represents the demand at time $t$.
Shenstone and Hyndman (2005) studied several possible statistical models with additive errors, including those of Snyder (2002), to identify a model for which Croston's method is optimal. They argued that any model underlying Croston's method must be non-stationary and defined on a continuous space including negative values. They concluded that such a model has unrealistic properties.

However, one of the main conclusions of Shenstone and Hyndman (2005) is open to misinterpretation. One should not conclude that intermittent demand methods do not have and cannot have any reasonable underlying statistical model. This conclusion depends on the important assumption of an additive error term. In this paper, we shall propose statistical models with multiplicative error terms as an alternative to the models discussed by Shenstone and Hyndman (2005).
Hyndman et al. (2008, pp. 281-283) proposed a basis for Croston's method using a Poisson distribution of demand sizes with a time varying parameter $\hat{\lambda}_t$:
\[
\begin{aligned}
y_t &= o_t z_{j_t} \\
z_{j_t} &\sim \text{Poisson}(\hat{\lambda}_{j_t-1} - 1) + 1 \\
\hat{\lambda}_{j_t} &= \alpha_z z_{j_t-1} + (1 - \alpha_z) \hat{\lambda}_{j_t-1} \\
o_t &\sim \text{Bernoulli}\left(\frac{1}{\hat{q}_{j_t}}\right) \\
\hat{q}_{j_t} &= \alpha_p \tau_{j_t-1} + (1 - \alpha_p) \hat{q}_{j_t-1}
\end{aligned}
\qquad (5)
\]
where $\hat{\lambda}_{j_t}$ is the estimate of the average number of events per trial, $\tau_{j_t}$ is the observed demand interval and $j_t$ is defined in (2). The authors point out that the proposed model "gives one-step-ahead forecasts equivalent to Croston's method". This is because the one-step-ahead conditional expectation of $z_{j_t}$ in (5) is equal to $\hat{\lambda}_{j_t-1} - 1 + 1 = \hat{\lambda}_{j_t-1}$, which corresponds to Croston's $\hat{z}_{j_t}$ in (2).

Equation (5) defines a stochastic process and retains useful statistical properties, allowing the production of both point forecasts and prediction intervals, which, as the authors propose, should be done using simulations. (5) can be considered as a filter, because it allows reduction of the noise of the original data and production of better estimates of the mean demand sizes and mean demand intervals.
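To illustrate how forecasts can be produced from (5) by simulation, one can draw the occurrence from a Bernoulli distribution with probability $1/\hat{q}_{j_t}$ and the size from the shifted Poisson distribution. The sketch below is illustrative only: the Python standard library has no Poisson sampler, so Knuth's classical algorithm is used, and the parameter values in the test are hypothetical.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's Poisson sampler; adequate for the small rates of slow movers."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def hsp_one_step(lam_hat, q_hat, n_sims=20000, seed=42):
    """Simulated one-step-ahead mean of filter (5): demand is
    Bernoulli(1/q_hat) times (Poisson(lam_hat - 1) + 1), so the simulated
    mean should approach lam_hat / q_hat, matching Croston's forecast."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        if rng.random() < 1.0 / q_hat:
            total += poisson_sample(lam_hat - 1.0, rng) + 1
    return total / n_sims
```

Running `hsp_one_step(2.5, 3.0)` returns a value close to $2.5/3$, confirming the equivalence of the one-step-ahead mean with Croston's method.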
However, it is not a complete statistical model as defined earlier in this paper, because it does not have a complete specification of distributions: both $\hat{\lambda}_{j_t}$ and $\hat{q}_t$ employ the SES method, which means that $\hat{\lambda}_{j_t}$ is the weighted average of the previously observed demand sizes, while $\hat{q}_t$ is the weighted average of the previously observed demand intervals. Neither of them can be defined in terms of an underlying distribution and a fixed parameter.

As a result, (5) is limited in its generalisation, because the introduction of new components or exogenous variables is not straightforward in this framework. One of the reasons for this is that the proposed filter sidesteps the ETS taxonomy: it is not possible to specify either $\hat{\lambda}_{j_t}$ or $\hat{q}_t$ in the error correction form, as the error term might not make sense for count data. So (5) cannot be easily extended to include, for example, trend and seasonal components.
Finally, although the Poisson distribution becomes closer to the normal distribution as $\hat{\lambda}_{j_t}$ increases, the connection between the model (5) and the conventional ETS models is not apparent, leaving them as two separate cases. Overall, while the filter (5) solves some problems for intermittent demand, addresses the issue of integer data, produces prediction distributions and has a connection with Croston's method, it cannot be considered as a complete solution to the problem.
Several intermittent demand models were proposed in Snyder et al. (2012), where the model (5) is called the "Hurdle shifted Poisson" model. The authors suggested applying the Negative Binomial distribution with a time varying mean value to intermittent data, and found that it performs better than the other filters. However, this filter has the same problems as the Poisson one, discussed above. Furthermore, because the authors did not model the occurrence variable separately, the proposed Negative Binomial filter is more restrictive than the filter (5), updating its values on every observation, in a manner similar to SES, and thus potentially reintroducing the decision-point bias. Still, it can be considered as a parsimonious version of (5), taking into account the connection between the Poisson and Negative Binomial distributions. Finally, the authors did not make a comparison with the ETS(A,N,N) model in their paper, because ETS is based on Gaussian distributions and might seem unsuitable for slow-moving inventories with discrete demands. However, we argue that applying ETS(A,N,N) to this type of data is still useful, because this model is considered a good universal benchmark in forecasting experiments, and was the first method to be used on intermittent demand series.
Another intermittent demand method, known in the literature as TSB, was proposed by Teunter et al. (2011). It was derived for obsolescence of inventory, but can be used in other cases as well. The authors proposed using the same principle as in (1), but estimating the time varying probability of demand occurrence $p_t$ using simple exponential smoothing based on the variable $o_t$, rather than switching to intervals between demands. Their method can be represented by the following system of equations:
\[
\begin{aligned}
\hat{y}_t &= \hat{p}_t \hat{z}_t \\
\hat{z}_{j_t} &= \alpha_z z_{j_t-1} + (1 - \alpha_z) \hat{z}_{j_t-1} \\
\hat{p}_t &= \alpha_p o_{t-1} + (1 - \alpha_p) \hat{p}_{t-1} \\
\hat{z}_t &= \hat{z}_{j_t}
\end{aligned}
\qquad (6)
\]
where $\hat{p}_t$ is the predicted probability of demand occurrence and $\alpha_p$ is the smoothing parameter for this probability estimate. The second equation in (6) is easier to interpret when represented in the error correction form:
\[ \hat{z}_{j_t} = \hat{z}_{j_t-1} + \alpha_z (z_{j_t-1} - \hat{z}_{j_t-1}). \]
The update of the probability in TSB is done after each observation, while demand sizes are updated only when $o_t = 1$, hence the index $j_t$ instead of $t$ for this variable. An advantage of this method is that the conditional expectation does not need a correction similar to that of Syntetos and Boylan (2005). However, the authors did not propose a statistical model for their method, which leads to issues similar to those for Croston's method. These include problems with the correct estimation of the model parameters, conditional mean and variance.
Both TSB and Croston's method can be applied to fast moving products, where they become equivalent to simple exponential smoothing. They can both perform well on real-world datasets (Kourentzes, 2014); however, they are not easily extendable, are disconnected from other exponential smoothing methods and are considered to form a separate group.
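The TSB recursions (6) can be sketched in Python in the same spirit as the Croston sketch above. This is illustrative only; the initialisation with the first observations is an assumption, as the method itself leaves it open.

```python
def tsb_forecast(y, alpha_z=0.1, alpha_p=0.1):
    """One-step-ahead TSB forecast following (6): the occurrence probability
    is smoothed on every observation, demand sizes only after demand occurs."""
    o = [1 if v != 0 else 0 for v in y]
    sizes = [v for v in y if v != 0]
    if not sizes:
        return 0.0
    p_hat = float(o[0])       # assumed initialisations
    z_hat = float(sizes[0])
    for t in range(1, len(y)):
        p_hat = alpha_p * o[t - 1] + (1 - alpha_p) * p_hat
        if o[t - 1] == 1:
            z_hat = alpha_z * y[t - 1] + (1 - alpha_z) * z_hat
    return p_hat * z_hat
```

Unlike Croston's method, the estimated probability decays smoothly towards zero during long runs of zeroes, which is what makes TSB suitable for obsolescence.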
3. Statistical model

3.1. General Intermittent State Space Model

We start from Croston's original formulation (1) and split intermittent demand into two parts in a similar way, but assuming that $z_t$ is generated by a statistical model of its own. We argue that the assumption that the error term interacts with the final demand $y_t$, rather than with the demand sizes $z_t$, is the main flaw in the logic of derivation of statistical models underlying intermittent demand forecasting methods. Moving the error term into $z_t$ allows using any statistical model that a researcher prefers (e.g. ARIMA, ETS, regression, diffusion model etc.). The model underlying $z_t$ corresponds to the potential demand for a product, while the other model, underlying $o_t$, corresponds to demand realisations, when a customer makes a purchase of a product.
Taking into account that both Croston's method and TSB use exponential smoothing methods, we propose to use a model form that underlies this forecasting approach. We adopt the single source of error (SSOE) state space model, as this has been well-established (Snyder, 1985; Ord et al., 1997; Hyndman et al., 2002), whilst acknowledging that other model forms are possible (e.g. multiple source of error, MSOE). We use the SSOE model for $z_t$, which in a very general way has the following form, based on (1):
\[
\begin{aligned}
y_t &= o_t z_t \\
z_t &= w(v_{t-1}) + r(v_{t-1}) \epsilon_t \\
v_t &= f(v_{t-1}) + g(v_{t-1}) \epsilon_t
\end{aligned}
\qquad (7)
\]
where $o_t$ is a Bernoulli distributed random variable, $v_t$ is the state vector, $\epsilon_t$ is the error term, $f(\cdot)$ is the transition function, $w(\cdot)$ is the measurement function, $g(\cdot)$ is the persistence function and $r(\cdot)$ is the error term function. These correspond to the functions in Hyndman et al. (2008, p. 54) and allow both additive and multiplicative state space models. One advantage of this approach is that in cases of fast moving demand $o_t$ becomes equal to one for all $t$, which transforms the model (7) from an intermittent into a conventional non-intermittent model. This modification expands the Hyndman et al. (2008) taxonomy and allows simple modifications of the model by the inclusion of time series components and exogenous variables.
In our new model, the first equation corresponds to Croston's original formulation (1), while the second equation, called the measurement equation, reflects the evolution of the potential demand size over time. The third equation is the standard transition equation for an SSOE model, describing the change of the components of the model over time.

An interpretation of the new intermittent demand model (7) is that the potential demand size may change in time even when an actual demand is not observed. In these cases, $o_t = 0$, leading to $y_t = 0$ in the first equation of (7).
One thing to note about this model is that it can be applied to intermittent data with continuous non-zero observations. Such series arise in the context of natural disasters and other natural phenomena. They are less common in a supply chain context, but time series with such characteristics do exist. For example, daily sales of an expensive coffee sold per ounce can exhibit such behaviour, with zeroes on some days and fractional quantities on others.
However, while the model (7) solves the problem of negative values (because a multiplicative model can now be used for $z_t$), identified by Shenstone and Hyndman (2005), there is still a need for an integer-valued model. In order to solve this problem, we propose a simple modification of the first equation in (7):
\[ y_t = o_t \lceil z_t \rceil, \qquad (8) \]
where $\lceil z_t \rceil$ is the value of $z_t$ rounded up. In this way, the statistical model we propose becomes integer-valued, and it does not contradict any reasonable assumptions about intermittent demand. Furthermore, any statistical model can be used for $z_t$. It is worth noting that the rounding is an important issue, which will be explored further in this paper, in Section 3.8. There have already been some statistical models exploiting rounding functions in the context of count time series data. For example, Kachour and Yao (2009) and Kachour (2014) discuss rounded integer-valued AR models and their properties. But before looking into the model (8) we need to study the properties of the basic model (7), keeping in mind that it is an approximation of the more realistic model (8). The model with rounded up values will be called "integer" in this paper, while the simpler model (7) will be referred to as "continuous".
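A minimal simulation of the continuous model (7) with an ETS(M,N,N) demand-size part, and of its integer counterpart (8), might look as follows. This is a sketch under the assumptions stated below (Bernoulli occurrence with a fixed probability and log-normal $1 + \epsilon_t$); all parameter values are illustrative.

```python
import math
import random

def simulate_iets(l0, alpha, sigma, p, T, integer=False, seed=7):
    """Simulate y_t = o_t z_t (7) or y_t = o_t * ceil(z_t) (8), where
    z_t = l_{t-1}(1 + eps_t), l_t = l_{t-1}(1 + alpha * eps_t),
    (1 + eps_t) ~ logN(0, sigma^2) and o_t ~ Bernoulli(p)."""
    rng = random.Random(seed)
    level, y = float(l0), []
    for _ in range(T):
        eps = math.exp(rng.gauss(0.0, sigma)) - 1.0  # so that 1 + eps is log-normal
        z = level * (1.0 + eps)
        o = 1 if rng.random() < p else 0
        y.append(o * (math.ceil(z) if integer else z))
        level *= 1.0 + alpha * eps  # the level evolves even when demand is not observed
    return y
```

Note that the simulated demand is always non-negative, since both the level and $1 + \epsilon_t$ are positive; switching `integer=True` produces the rounded-up series of model (8).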
In order for the model (7) to work we make the following assumptions, some of which can be relaxed, leading to different models:

1. Demand size $z_t$ is continuous. This assumption is relaxed in (8) and discussed later in this paper, in Section 3.8.
2. Demand size $z_t$ is independent of its occurrence $o_t$. Relaxing this assumption will lead to a different statistical model.
3. Potential demand size may change in time even if we do not observe it. Restricting this assumption leads to a different model with different time indices: $j_t$ instead of $t$, similar to (5).
4. $o_t$ has a Bernoulli distribution with some probability $p_t$ that, in the most general case, varies in time. This is a natural assumption, following the idea of Croston (1972). Making some other assumption in its place will also lead to a different statistical model.
Although the model is based on the one-step-ahead error, it can produce the h-steps-ahead conditional mean and variance. With the assumptions discussed above, the proposed intermittent state space model allows calculating these values using the following formulae:
\[
\begin{aligned}
\mu_{y,t+h|t} &= \mu_{o,t+h|t}\,\mu_{z,t+h|t} \\
\sigma^2_{y,t+h|t} &= \sigma^2_{o,t+h|t} \sigma^2_{z,t+h|t} + \sigma^2_{o,t+h|t} \mu^2_{z,t+h|t} + \mu^2_{o,t+h|t} \sigma^2_{z,t+h|t}
\end{aligned}
\qquad (9)
\]
where $\mu_{y,t+h|t}$ and $\sigma^2_{y,t+h|t}$ are respectively the conditional expectation and conditional variance of $y_t$; $\mu_{o,t+h|t}$ and $\sigma^2_{o,t+h|t}$ are the conditional expectation and variance of the occurrence variable $o_t$; and, finally, $\mu_{z,t+h|t}$ and $\sigma^2_{z,t+h|t}$ are the respective values for the demand sizes $z_t$, discussed in Appendix A for the special case of the ETS(M,N,N) model.
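The moments in (9) follow from the independence of $o_t$ and $z_t$ (assumption 2) and can be coded directly. A simple sketch with hypothetical input values:

```python
def intermittent_moments(mu_o, var_o, mu_z, var_z):
    """Conditional mean and variance of y = o * z, following (9),
    for independent occurrence o and demand size z."""
    mu_y = mu_o * mu_z
    var_y = var_o * var_z + var_o * mu_z ** 2 + mu_o ** 2 * var_z
    return mu_y, var_y

# e.g. Bernoulli occurrence with p = 0.5 and demand sizes with mean 4, variance 1
mu_y, var_y = intermittent_moments(0.5, 0.25, 4.0, 1.0)
```

The same variance is obtained from $\text{V}(oz) = \text{E}(o^2)\text{E}(z^2) - \mu_o^2 \mu_z^2$, which is a useful sanity check on (9).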
The important point is that, taking into account intermittent demand, pure multiplicative models make more sense for the measurement equation in (7) than additive or mixed ones, because they restrict the space of demand sizes to positive numbers. In this paper we discuss the ETS(M,N,N) model for demand sizes, which denotes multiplicative error, no trend and no seasonality. The reason for this choice is that ETS(M,N,N) is a simple, well-known model underlying simple exponential smoothing (Chatfield et al., 2001), which is a core method in both Croston's method and TSB. More complicated models can also be used instead of ETS(M,N,N), but they are not the main interest of this paper.

In this paper we discuss several types of models: the general model, the model with fixed probability, the odds ratio model, the inverse odds ratio model, the direct probability model and the one selected automatically between all the models. In order to distinguish intermittent state space models from the conventional ones, we use the letter 'i'.
3.2. iETS – the general model

The general continuous intermittent state space model (7) based on ETS(M,N,N) reduces to the special case, called iETS(M,N,N), and can be written as:
\[
\begin{aligned}
y_t &= o_t l_{z,t-1}(1 + \epsilon_t) \\
l_{z,t} &= l_{z,t-1}(1 + \alpha_z \epsilon_t)
\end{aligned}
\qquad (10)
\]
where $l_{z,t}$ is the level of the potential demand sizes and $l_{z,t-1}(1 + \epsilon_t) = z_t$. A natural assumption about $(1 + \epsilon_t)$ is that it is i.i.d. log-normal with location and scale parameters $\mu$ and $\sigma^2$: $(1 + \epsilon_t) \sim \log N(\mu, \sigma^2)$. In the estimation of the model (10), the location parameter $\mu$ is usually set to zero, decreasing the number of parameters to estimate. We will also consider $\mu = 0$ throughout this paper.
When the demand is not observed ($o_t = 0$), the median should be used instead of the mean for the estimation of the level in (10), so that the estimate of the level on observation $t+k$ is equal to (Appendix A):
\[ \text{Md}(l_{z,t+k|t}) = l_{z,t}. \qquad (11) \]
This is because the distribution of the error term is skewed and the conditional mean absorbs a portion of the uncertainty coming from the error. The median, on the other hand, separates the level from the uncertainty of the error.
The general properties of the ETS(M,N,N) model with the proposed assumptions are discussed in Appendix A. It is important to note that with the log-normality assumption instead of normality (as, for example, was originally proposed in Ord et al., 1997), the conditional mean and variance for h steps ahead will differ from those of the classical ETS(M,N,N) (see Appendix A for the derivations):
\[
\mu_{z,t+h|t} = l_{z,t} \exp\left(\frac{\sigma^2}{2}\right) \left((1 - \alpha_z) + \alpha_z \exp\left(\frac{\sigma^2}{2}\right)\right)^{h-1}, \qquad (12)
\]
\[
\sigma^2_{z,t+h|t} = \left(\exp(\sigma^2) - 1\right) \exp(\sigma^2)\, \sigma^2_{l,t+h-1|t}, \qquad (13)
\]
where
\[
\sigma^2_{l,t+h|t} = l^2_{z,t} \left( \prod_{j=1}^{h} \left( \text{V}(1 + \alpha_z \epsilon_t) + \text{E}(1 + \alpha_z \epsilon_t)^2 \right) - \prod_{j=1}^{h} \text{E}(1 + \alpha_z \epsilon_t)^2 \right)
\]
is the conditional h-steps-ahead variance of the level component, and $\text{V}(1 + \alpha_z \epsilon_t)$ and $\text{E}(1 + \alpha_z \epsilon_t)$ are defined in Appendix A, based on $(1 + \alpha_z \epsilon_t)$ having a three-parameter log-normal distribution. It is also important to note that the conditional h-steps-ahead median of ETS(M,N,N) corresponds to the straight line produced by SES:
\[ \text{Md}(z_{t+h|t}) = l_{z,t}. \qquad (14) \]
This value is the point forecast produced from the model.
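Formula (12) is straightforward to compute; the following sketch (assuming the reconstruction of (12) above) also makes the mean-median gap visible.

```python
import math

def ets_mnn_mean(l, alpha, sigma2, h):
    """Conditional h-steps-ahead mean of ETS(M,N,N) with log-normal
    1 + eps, following (12); the conditional median (14) is simply l."""
    return l * math.exp(sigma2 / 2) * ((1 - alpha) + alpha * math.exp(sigma2 / 2)) ** (h - 1)
```

With $\sigma^2 = 0$ the mean collapses to the median $l_{z,t}$; with $\sigma^2 > 0$ it exceeds the median and grows with the horizon, reflecting the part of the uncertainty that the mean absorbs but the median does not.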
As for the demand occurrence part, it is natural to assume in (10) that $o_t$ has a Bernoulli distribution:
\[ o_t \sim \text{Bernoulli}(p_t), \qquad (15) \]
where the probability $p_t$ is assumed, in the general case, to vary over time. It can be modelled in different ways. We propose the following general model for the occurrence part:
\[
\begin{aligned}
p_t &= \frac{a_t}{a_t + b_t} \\
a_t &= l_{a,t-1}(1 + \epsilon_{a,t}) \\
l_{a,t} &= l_{a,t-1}(1 + \alpha_a \epsilon_{a,t}) \\
(1 + \epsilon_{a,t}) &\sim \log N(0, \sigma^2_a) \\
b_t &= l_{b,t-1}(1 + \epsilon_{b,t}) \\
l_{b,t} &= l_{b,t-1}(1 + \alpha_b \epsilon_{b,t}) \\
(1 + \epsilon_{b,t}) &\sim \log N(0, \sigma^2_b)
\end{aligned}
\qquad (16)
\]
where $a_t$ and $b_t$ are the independent variables (they can be called "shape parameters") defining the shape of the distribution of $p_t$; $l_{a,t}$ and $l_{b,t}$ are the levels for each of the variables; $1 + \epsilon_{a,t}$ and $1 + \epsilon_{b,t}$ are the mutually independent error terms; and $\alpha_a$ and $\alpha_b$ are the smoothing parameters. The error terms in (16) are distributed log-normally with zero location parameters and scale parameters $\sigma^2_a$ and $\sigma^2_b$ respectively. These assumptions guarantee that both shape parameters are positive and, thus, that the probability $p_t$ always lies between zero and one.
Neither the conditional expectation nor the conditional variance of the probability of occurrence has a closed form. In order to see this, $p_t = \frac{a_t}{a_t + b_t}$ can be reformulated as the ratio:
\[ p_t = \frac{1}{1 + \frac{b_t}{a_t}}, \qquad (17) \]
which follows a logit-normal distribution (Johnson, 1949), because $\frac{b_t}{a_t}$ has a log-normal distribution with location parameter $\log \frac{l_{b,t-1}}{l_{a,t-1}}$ and scale parameter $\sigma^2_b + \sigma^2_a$. So both the conditional expectation $\mu_{p,t+h|t}$ and the variance $\sigma^2_{p,t+h|t}$ can only be estimated using simulations. However, the conditional h-steps-ahead median of the probability of occurrence has the closed form:
\[ \tilde{p}_{t+h|t} = \frac{l_{a,t}}{l_{a,t} + l_{b,t}}. \qquad (18) \]
Given that the distribution of the probability might be skewed, the median (18) should be a more robust estimate of the location of the distribution and can be useful for forecasting purposes.
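The contrast between the simulated mean and the closed-form median (18) can be illustrated as follows. This is a sketch in which the levels are treated as fixed, so only the distribution (17) is simulated; the inputs are illustrative.

```python
import math
import random

def occurrence_prob_stats(la, lb, sigma2_a, sigma2_b, n_sims=20000, seed=3):
    """Simulated mean and closed-form median of p_t = 1 / (1 + b_t/a_t),
    where log(b_t/a_t) ~ N(log(lb/la), sigma2_a + sigma2_b), as in (17)-(18)."""
    rng = random.Random(seed)
    mu = math.log(lb / la)                  # location of log(b_t/a_t)
    sd = math.sqrt(sigma2_a + sigma2_b)     # scale of log(b_t/a_t)
    mean_sim = sum(1.0 / (1.0 + math.exp(rng.gauss(mu, sd)))
                   for _ in range(n_sims)) / n_sims
    return mean_sim, la / (la + lb)
```

In the symmetric case $l_{a,t} = l_{b,t}$ the mean and the median coincide at 0.5; in skewed cases they differ, which is why the median is preferred as the location estimate.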
For the reasons discussed in subsection 3.7, the conditional expectation of the occurrence variable $\mu_{o,t+h|t}$ can be set equal to the conditional median $\tilde{p}_{t+h|t}$:
\[ \mu_{o,t+h|t} = \tilde{p}_{t+h|t}, \qquad (19) \]
while the conditional variance, based on the assumption of the Bernoulli distribution, is:
\[ \sigma^2_{o,t+h|t} = \mu_{o,t+h|t}(1 - \mu_{o,t+h|t}). \qquad (20) \]
Summarising (10) and (16), the general iETSG(M,N,N) model can be written as:
\[
\begin{aligned}
y_t &= o_t l_{z,t-1}(1 + \epsilon_t) \\
l_{z,t} &= l_{z,t-1}(1 + \alpha_z \epsilon_t) \\
(1 + \epsilon_t) &\sim \log N(0, \sigma^2) \\
o_t &\sim \text{Bernoulli}(p_t) \\
p_t &= \frac{a_t}{a_t + b_t} \\
a_t &= l_{a,t-1}(1 + \epsilon_{a,t}) \\
l_{a,t} &= l_{a,t-1}(1 + \alpha_a \epsilon_{a,t}) \\
(1 + \epsilon_{a,t}) &\sim \log N(0, \sigma^2_a) \\
b_t &= l_{b,t-1}(1 + \epsilon_{b,t}) \\
l_{b,t} &= l_{b,t-1}(1 + \alpha_b \epsilon_{b,t}) \\
(1 + \epsilon_{b,t}) &\sim \log N(0, \sigma^2_b)
\end{aligned}
\qquad (21)
\]
In addition, it is worth pointing out that $p_t \sim \text{logit}N\left(\log \frac{l_{b,t-1}}{l_{a,t-1}}, \sigma^2_a + \sigma^2_b\right)$. Given that the model (21) has two time varying shape parameters $a_t$ and $b_t$, it covers all the possible cases of probability change over time, including building up demand, demand obsolescence and stable demand.
The statistical model (21) allows estimating all the parameters via likelihood maximisation. The concentrated log-likelihood function for the model (21) can be written as (see Appendix B for the derivation):
\[
\ell(\theta, \hat{\sigma}^2 | Y) = -\frac{1}{2}\left( (T - T_0) \log\left(2 \pi e \hat{\sigma}^2\right) + T_0 \right) - \sum_{o_t=1} \log(z_t) + \sum_{o_t=1} \log(\hat{p}_t) + \sum_{o_t=0} \log(1 - \hat{p}_t), \qquad (22)
\]
where $Y$ is the vector of all the in-sample observations, $\theta$ is the vector of parameters to estimate (initial values and smoothing parameters), $T_0$ is the number of zero observations, $\hat{\sigma}^2 = \frac{1}{T} \sum_{o_t=1} \log^2(1 + e_t)$ is the estimate of $\sigma^2$ and $\hat{p}_t$ is the estimated probability of a non-zero demand at time $t$.
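The occurrence part of the likelihood (22) is just a Bernoulli log-likelihood and can be written directly. A minimal sketch of that component only:

```python
import math

def occurrence_loglik(o, p_hat):
    """Bernoulli component of (22): sum of log(p_hat_t) over the non-zero
    periods plus sum of log(1 - p_hat_t) over the zero ones."""
    return sum(math.log(p) if ot == 1 else math.log(1.0 - p)
               for ot, p in zip(o, p_hat))
```

This component rewards probability estimates that are high before non-zero demands and low before zeroes, which is what drives the estimation of the smoothing parameters of the occurrence part.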
In order to construct the model (21), we can use exponential smoothing in the error correction form, assuming that the one-step-ahead forecast error $e_t$ is a good estimate of the error term $\epsilon_t$:
\[
\begin{aligned}
\hat{y}_t &= o_t \hat{z}_t \\
e_t &= o_t \frac{y_t - \hat{z}_t}{\hat{z}_t} \\
\hat{z}_t &= \hat{l}_{z,t-1} \\
\hat{l}_{z,t} &= \hat{l}_{z,t-1}(1 + \alpha_z e_t) \\
\hat{a}_t &= \hat{l}_{a,t-1} \\
\hat{l}_{a,t} &= \hat{l}_{a,t-1}(1 + \alpha_a e_{a,t}) \\
\hat{b}_t &= \hat{l}_{b,t-1} \\
\hat{l}_{b,t} &= \hat{l}_{b,t-1}(1 + \alpha_b e_{b,t})
\end{aligned}
\qquad (23)
\]
where $\hat{z}_t$ is the one-step-ahead forecast of demand sizes (given the value on observation $t-1$); $\hat{l}_{z,t}$ is the estimate of the level of the model on observation $t$, with $o_t$ considered to be known in-sample; $\hat{a}_t$ and $\hat{b}_t$ are the one-step-ahead forecasts of the shape parameters; $\hat{l}_{a,t}$ and $\hat{l}_{b,t}$ are the estimated levels of $a_t$ and $b_t$ respectively; and $e_{a,t}$ and $e_{b,t}$ are the respective one-step-ahead forecast errors. It is worth mentioning that, given that $z_t$ is not observable when $o_t = 0$, the forecast error $e_t$ is set to zero between the observed demands. In this case the transition of states from one non-zero demand to another is governed by the conditional median, as discussed above. Finally, neither $a_t$ nor $b_t$ is observable, which means that some sort of proxies are needed for the respective forecast errors in order to construct (23) (see Appendix C for the derivations):
\[ e_{a,t} = \frac{u_t}{1 - u_t} - 1 \qquad (24) \]
and
\[ e_{b,t} = \frac{1 - u_t}{u_t} - 1, \qquad (25) \]
where $u_t = \frac{1 + o_t - \hat{p}_t}{2}$. These proxies can then be used in (23) for the update of the states.
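The proxy errors (24) and (25) can be computed for each observation as a direct transcription, with $u_t$ as defined above:

```python
def occurrence_proxies(o_t, p_hat):
    """Proxy one-step-ahead errors for the unobservable shape variables,
    following (24)-(25), with u_t = (1 + o_t - p_hat) / 2."""
    u = (1.0 + o_t - p_hat) / 2.0
    return u / (1.0 - u) - 1.0, (1.0 - u) / u - 1.0
```

When demand occurs against a low estimated probability, $e_{a,t}$ is large and positive, pushing the level of $a_t$ (and hence the probability) up; a zero observation does the opposite through $e_{b,t}$.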
Note that any multiplicative ETS model could potentially be used instead of ETS(M,N,N) in both the demand occurrence and demand size parts of (21), including models with exogenous variables. This enlarges the spectrum of potential intermittent demand models. However, we do not aim to study all these models in this paper.
Concluding this subsection, the model (21) is the most complicated of all those discussed in this paper and has 7 parameters to estimate: 3 for the demand sizes model ETS(M,N,N) (smoothing parameter, initial seed and the scale parameter of the log-normal distribution of the error term) and 2 for each part of the demand occurrence model (smoothing parameter and initial seed). Note that the variances of the error terms of the occurrence models are not needed in the estimation, as the likelihood (22) does not depend on them. We call the model (21) the general iETS model. Further in this paper, we discuss several special cases of this model, some of which have apparent connections with Croston's method and TSB under some loose conditions. In order to distinguish these cases from each other, we use subscripts, so that the model with fixed probability is denoted as iETSF, the odds ratio model as iETSO, the inverse odds ratio model as iETSI and, finally, the direct probability model as iETSD. The model (21), being the most general of them all, is denoted as iETSG. Also, throughout this paper, we drop the part denoting the type of ETS model used for demand sizes, implying that ETS(M,N,N) is the standard model for demand sizes.
3.3. iETSF – the model with fixed probability

This is the simplest case of the model (21). It can be obtained if we assume that both $a_t$ and $b_t$ are constant over time, which leads to the simplification:
\[
\begin{aligned}
o_t &\sim \text{Bernoulli}(p) \\
p &= \frac{a}{a + b}
\end{aligned}
\qquad (26)
\]
Given (26), the whole iETSF model can be formulated as:
\[
\begin{aligned}
y_t &= o_t l_{z,t-1}(1 + \epsilon_t) \\
l_{z,t} &= l_{z,t-1}(1 + \alpha_z \epsilon_t) \\
(1 + \epsilon_t) &\sim \log N(\mu, \sigma^2) \\
o_t &\sim \text{Bernoulli}(p) \\
p &= \frac{a}{a + b}
\end{aligned}
\qquad (27)
\]
In this model, we assume that the probability of demand occurrence does not change over time, so that the probability of having sales is the same for every time period. The distribution of $p$ in this case becomes degenerate and is equivalent to the Dirac distribution (Dirac, 1927), so that the probability density function of the random variable is equal to zero everywhere except at the point $p$. The conditional expectation of the occurrence variable $o_t$ in this case can be calculated as:
\[ \mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \hat{p}. \qquad (28) \]
The conditional variance of demand occurrence does not change over time either and, due to (20), is equal to:
\[ \sigma^2_{o,t+h|t} = \hat{p}(1-\hat{p}). \qquad (29) \]
Note that the values of $a$ and $b$ cannot be estimated correctly, because they are not observable, and there is an infinite number of combinations of these variables giving one and the same $p$. Moreover, we are not interested in the specific values of these variables: the probability $p$ is of the main importance for forecasting purposes. So the whole model can be simplified further, dropping $a$ and $b$ and estimating the probability $\hat{p}$ directly, which simplifies all the calculations. This also allows preserving the number of degrees of freedom. The number of parameters of the model (27) with the direct estimation of the fixed probability is equal to four: the initial value $l_{z,0}$, the smoothing parameter $\alpha_z$, the variance $\sigma^2$ and the probability of occurrence $\hat{p}$.
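As a sketch of the data-generating process (27), the following simulates an iETS_F series; the parameter values are illustrative assumptions, and $\mu = 0$ is taken for simplicity:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (assumed) parameters of the iETS_F model (27), with mu = 0
p, alpha_z, l0, sigma = 0.3, 0.1, 10.0, 0.2
T = 100

l_z = l0
y = np.zeros(T)
for t in range(T):
    o_t = rng.binomial(1, p)              # occurrence: Bernoulli with fixed p
    eps = rng.lognormal(0.0, sigma) - 1   # (1 + eps_t) ~ logN(0, sigma^2)
    y[t] = o_t * l_z * (1 + eps)          # observed intermittent demand
    l_z *= 1 + alpha_z * eps              # level update of the size part
```

The resulting series is non-negative and contains zeroes roughly $100(1-p)\%$ of the time, mimicking intermittent demand.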
3.4. iETS_O – the odds ratio model

Another model arising naturally from (21) is the model with the restriction on $b_t$:
\[ b_t = 1. \qquad (30) \]
Given this restriction, the iETS_O, the odds ratio model, can be obtained:
\[
\begin{aligned}
y_t &= o_t l_{z,t-1}(1+\epsilon_t) \\
l_{z,t} &= l_{z,t-1}(1+\alpha_z \epsilon_t) \\
(1+\epsilon_t) &\sim \log\mathcal{N}(0, \sigma^2) \\
o_t &\sim \text{Bernoulli}(p_t) \\
p_t &= \frac{a_t}{a_t+1} \\
a_t &= l_{a,t-1}(1+\epsilon_{a,t}) \\
l_{a,t} &= l_{a,t-1}(1+\alpha_a \epsilon_{a,t}) \\
(1+\epsilon_{a,t}) &\sim \log\mathcal{N}(0, \sigma^2_a).
\end{aligned} \qquad (31)
\]
It is called the "odds ratio" model because the probability of occurrence in (31) is calculated using the classical logistic transform. This also means that $a_t$ is equal to:
\[ a_t = \frac{p_t}{1-p_t}. \qquad (32) \]
When $a_t$ increases in the iETS_O model, the odds ratio increases, meaning that the probability of occurrence increases as well.

The variable $p_t$ has the logit-normal distribution, meaning that, once again, neither the conditional expectation nor the conditional variance of $p_t$ can be derived analytically. As for the conditional median, it is equal to:
\[ \tilde{p}_{t+h|t} = \frac{l_{a,t+h|t}}{l_{a,t+h|t}+1}. \qquad (33) \]
At the construction stage of the model (31), the following set of equations can be used for the update of the states:
\[
\begin{aligned}
\hat{y}_t &= o_t \hat{l}_{z,t-1} \\
\hat{l}_{z,t} &= \hat{l}_{z,t-1}(1+\alpha_z e_t) \\
\hat{a}_t &= \hat{l}_{a,t-1} \\
\hat{l}_{a,t} &= \hat{l}_{a,t-1}(1+\alpha_a e_{a,t}),
\end{aligned} \qquad (34)
\]
where the error term $e_{a,t}$ is calculated using the proxy (24):
\[ e_{a,t} = \frac{u_t}{1-u_t} - 1. \qquad (35) \]
The model (31) has five parameters to estimate: the initial value $l_{z,0}$, the smoothing parameter $\alpha_z$, the variance $\hat{\sigma}^2$, the initial value $l_{a,0}$ and the smoothing parameter $\alpha_a$.
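The update equations (34) together with the proxy (35) can be sketched as a single pass over a series. The smoothing parameters, the initialisation of the states, and the choice to update the demand-size level only on non-zero observations are our illustrative assumptions, not estimates:

```python
import numpy as np

def fit_iets_o(y, alpha_z=0.1, alpha_a=0.05, l_a0=1.0):
    """One-pass state update for the iETS_O model, following (34) and (35).

    Smoothing parameters and initial states are illustrative assumptions;
    the demand-size level is updated only on non-zero observations."""
    o = (y > 0).astype(float)
    l_z = np.mean(y[y > 0])         # crude initialisation of the size level
    l_a = l_a0                      # initial state of a_t
    p_fit = np.zeros(len(y))
    for t, y_t in enumerate(y):
        p_hat = l_a / (l_a + 1)     # logistic transform of a_t, as in (31)
        p_fit[t] = p_hat
        u_t = (1 + o[t] - p_hat) / 2
        e_a = u_t / (1 - u_t) - 1   # proxy error (35)
        l_a *= 1 + alpha_a * e_a    # occurrence state update from (34)
        if o[t] == 1:
            e_t = y_t / l_z - 1     # multiplicative demand-size error
            l_z *= 1 + alpha_z * e_t
    return p_fit, l_z, l_a

p_fit, l_z, l_a = fit_iets_o(np.array([0.0, 5.0, 0.0, 0.0, 6.0, 0.0, 7.0]))
```

Because the proxy error is always greater than $-1$, the multiplicative update keeps $l_{a,t}$ positive, so the fitted probabilities stay strictly inside $(0, 1)$.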
3.5. iETS_I – the inverse odds ratio model

Another special case of the model (21) is obtained by restricting $a_t = 1$ instead of $b_t$. The general model then reduces to the iETS_I model:
\[
\begin{aligned}
y_t &= o_t l_{z,t-1}(1+\epsilon_t) \\
l_{z,t} &= l_{z,t-1}(1+\alpha_z \epsilon_t) \\
(1+\epsilon_t) &\sim \log\mathcal{N}(0, \sigma^2) \\
o_t &\sim \text{Bernoulli}(p_t) \\
p_t &= \frac{1}{1+b_t} \\
b_t &= l_{b,t-1}(1+\epsilon_{b,t}) \\
l_{b,t} &= l_{b,t-1}(1+\alpha_b \epsilon_{b,t}) \\
(1+\epsilon_{b,t}) &\sim \log\mathcal{N}(0, \sigma^2_b).
\end{aligned} \qquad (36)
\]
The variable $b_t$ can be represented in terms of the probability as:
\[ b_t = \frac{1-p_t}{p_t}. \qquad (37) \]
So the occurrence part of (36) models the inverse of the odds ratio (32), hence the name of the model. This means that an increase of $b_t$ leads to a decrease of the probability of occurrence, decreasing the odds ratio.

Similarly to the general iETS_G and iETS_O models, the conditional expectation of the probability $p_t$ $h$ steps ahead does not have a closed form and needs to be calculated numerically. The median probability, on the other hand, can be calculated as:
\[ \tilde{p}_{t+h|t} = \frac{1}{1+l_{b,t+h|t}}. \qquad (38) \]
At the construction stage of the model (36), the following set of equations can be used for the update of the states:
\[
\begin{aligned}
\hat{y}_t &= o_t \hat{l}_{z,t-1} \\
\hat{l}_{z,t} &= \hat{l}_{z,t-1}(1+\alpha_z e_t) \\
\hat{b}_t &= \hat{l}_{b,t-1} \\
\hat{l}_{b,t} &= \hat{l}_{b,t-1}(1+\alpha_b e_{b,t}).
\end{aligned} \qquad (39)
\]
As for the error term $e_{b,t}$, it can be estimated using the proxy (25):
\[ e_{b,t} = \frac{1-u_t}{u_t} - 1. \qquad (40) \]
Another option for the construction of the model (36) emerges when the probability $p_t$ in (36) is analysed further. One way of estimating $1 + b_t$ is to assume that its rounded-down value is equal to the observed demand intervals. We can then make the substitution $q_{j_t} = \lfloor 1 + b_t \rfloor$ in order to arrive at the probability update mechanism of Croston's method (2):
\[ p_{j_t} = \frac{1}{\lfloor 1 + b_t \rfloor} = \frac{1}{q_{j_t}}. \qquad (41) \]
So Croston's method is just one of the ways to estimate the model iETS_I. It is important to note that although $b_t$ may vary in time on each observation, influencing the corresponding probability $p_t$, the iETS_I model estimated with the probability (41) cannot be estimated when demand is zero. So during the estimation of the model based on (41), it is assumed that the states of $b_t$ do not change between demand occurrences. This assumption corresponds to the original Croston's method.

Summarising, there are two ways of estimating the model (36): either using (39) and the proxy for the error term (40), or using Croston's method, based on the demand intervals and (41).

The model (36) has five parameters to estimate, similar to the model iETS_O (31), with the only difference that the parameters of $b_t$ need to be estimated instead of those of $a_t$.
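Since Croston's method is one way of estimating iETS_I, a minimal sketch of the method may be useful. The initialisation on the first demand is an illustrative choice rather than part of the original method:

```python
import numpy as np

def croston(y, alpha=0.1):
    """Sketch of Croston's method: demand sizes and inter-demand intervals
    are smoothed separately, and the states change only when demand occurs.
    Initialisation on the first demand is an illustrative choice."""
    z = q = None          # smoothed demand size and smoothed interval
    interval = 1          # periods since the last demand, counting the current one
    forecasts = np.full(len(y), np.nan)
    for t, y_t in enumerate(y):
        if y_t > 0:
            if z is None:
                z, q = float(y_t), float(interval)
            else:
                z += alpha * (y_t - z)
                q += alpha * (interval - q)
            interval = 1
        else:
            interval += 1
        if z is not None:
            forecasts[t] = z / q   # demand per period: size times 1/q, cf. (41)
    return forecasts

f = croston(np.array([0, 0, 3, 0, 0, 0, 3, 0]), alpha=0.1)
```

The forecast $z/q$ is the smoothed demand size multiplied by the implied probability $1/q$, mirroring the connection to (41).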
3.6. iETS_D – the direct probability model

The last special case of (21) that we discuss is the model with the restrictions:
\[ a_t + b_t = 1, \qquad a_t \leq 1, \qquad (42) \]
the iETS_D model. After inserting (42) into the formula of the probability (16), it becomes apparent that $p_t = a_t$. But in order to make sure that the probability stays in the region [0, 1], the minimum function should be used:
\[
\begin{aligned}
y_t &= o_t l_{z,t-1}(1+\epsilon_t) \\
l_{z,t} &= l_{z,t-1}(1+\alpha_z \epsilon_t) \\
(1+\epsilon_t) &\sim \log\mathcal{N}(0, \sigma^2) \\
o_t &\sim \text{Bernoulli}(a_t) \\
a_t &= \min\left(l_{a,t-1}(1+\epsilon_{a,t}),\, 1\right) \\
l_{a,t} &= l_{a,t-1}(1+\alpha_a \epsilon_{a,t}) \\
(1+\epsilon_{a,t}) &\sim \log\mathcal{N}(0, \sigma^2_a).
\end{aligned} \qquad (43)
\]
Given the restriction (42), the variable $a_t$ has a truncated log normal distribution, whose conditional expectation is discussed in Zaninetti (2017). A slightly different approach is to derive the conditional mean and variance of the one-step-ahead values in logarithms, based on Barr and Sherrill (1999):
\[ \acute{\mu}_{p,t+1|t} = \acute{\mu}_{a,t+1|t} = \log l_{a,t} - \sigma_a \frac{\phi(\iota_t)}{\Phi(\iota_t)}, \qquad (44) \]
\[ \acute{\sigma}^2_{p,t+1|t} = \acute{\sigma}^2_{a,t+1|t} = \sigma^2_a \left(1 - \iota_t \frac{\phi(\iota_t)}{\Phi(\iota_t)} - \frac{\phi^2(\iota_t)}{\Phi^2(\iota_t)}\right), \qquad (45) \]
where $\iota_t = -\frac{\log l_{a,t}}{\sigma_a}$, $\phi(\cdot)$ is the probability density function and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, and $\acute{\mu}$ and $\acute{\sigma}^2$ are the mean and variance in logarithms respectively. These values can then be used in order to calculate the conditional $h$-steps-ahead expectation and variance of the log normally distributed $a_t$. The conditional median of $p_{t+h}$, given the values on observation $t$, is:
\[ \tilde{p}_{t+h|t} = \tilde{a}_{t+h|t} = l_{a,t} \exp\left(\sigma_a \Phi^{-1}\left(\frac{\Phi(\iota_t)}{2}\right)\right). \qquad (46) \]
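The truncated-normal moments (44) and (45) can be checked numerically against scipy's `truncnorm` for assumed state values; the state value and scale parameter below are illustrative:

```python
import numpy as np
from scipy import stats

# Illustrative state value and scale parameter (assumed)
l_a, sigma_a = 0.8, 0.4
mu = np.log(l_a)
iota = -mu / sigma_a                   # standardised truncation point iota_t

phi = stats.norm.pdf(iota)
Phi = stats.norm.cdf(iota)
mu_trunc = mu - sigma_a * phi / Phi                                # eq. (44)
var_trunc = sigma_a**2 * (1 - iota * phi / Phi - (phi / Phi)**2)   # eq. (45)

# Reference: normal distribution in logs, truncated from above at log(1) = 0
ref = stats.truncnorm(-np.inf, iota, loc=mu, scale=sigma_a)

# Conditional median of a_t, eq. (46)
a_med = l_a * np.exp(sigma_a * stats.norm.ppf(Phi / 2))
```

The median (46) always lies below $l_{a,t}$, because truncating the distribution from above pulls the central values down.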
When we fit the model (43) to the data, it simplifies to:
\[
\begin{aligned}
\hat{y}_t &= o_t \hat{l}_{z,t-1} \\
\hat{l}_{z,t} &= \hat{l}_{z,t-1}(1+\alpha_z e_t) \\
\hat{a}_t &= f(\hat{l}_{a,t-1}) \\
\hat{l}_{a,t} &= \hat{l}_{a,t-1}(1+\alpha_a e_{a,t}),
\end{aligned} \qquad (47)
\]
where $f(\hat{l}_{a,t-1})$ is the one-step-ahead conditional median calculated using (46).

However, if all the values of $a_t \leq 1$, then all the conditional values can be simplified and will be equivalent to those of the ETS(M,N,N) model. This situation may occur when demand for the product becomes obsolete. The model (43) can then be estimated using the TSB method (6). The estimation method (47) can then be simplified, and the one-step-ahead value $f(\hat{l}_{a,t}) = \tilde{l}_{a,t}$ will be a good estimate of $a_t$.
When it comes to the construction of the model (43) using (47), it is implied that the error term is equal to:
\[ 1 + e_{a,t} = \frac{o_t}{\hat{a}_t}. \qquad (48) \]
However, this is unrealistic, because in the case of $o_t = 0$ the term $1 + e_{a,t}$ in (48) becomes equal to zero, thus making the model inestimable. In order to estimate the model (43), we have to introduce the following approximation for the error term $e_{a,t}$, which guarantees non-zero forecast errors for the boundary cases:
\[ e_{a,t} = \frac{o_t(1-2\kappa) + \kappa - \hat{a}_t}{\hat{a}_t}, \qquad (49) \]
where $\kappa$ is a very small number (for example, $\kappa = 10^{-10}$). This modification is artificial, but it helps in the estimation of the model. It is worth stressing that the only purpose of $\kappa$ is to make the model estimable.
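A sketch of the $\kappa$-modified error (49); the function name is ours, and the fitted probability below is an assumed illustrative value:

```python
def occurrence_error(o_t, a_hat_t, kappa=1e-10):
    """Approximate error term (49) for the iETS_D occurrence part.

    Shrinks the observed occurrence away from the {0, 1} boundary by kappa,
    so that 1 + e_{a,t} never collapses to zero. The function name is ours."""
    return (o_t * (1 - 2 * kappa) + kappa - a_hat_t) / a_hat_t

# With o_t = 0 the raw error (48) would give 1 + e_{a,t} = 0 exactly;
# the kappa-modified version stays marginally away from that boundary.
e0 = occurrence_error(0, 0.4)
e1 = occurrence_error(1, 0.4)
```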
Finally, due to the restrictions (42), the model has five parameters to estimate, similar to iETS_O and iETS_I.
3.7. Conditional values and prediction intervals for iETS models

One of the advantages of statistical models is the ability to work with distributions of variables rather than point values. For intermittent state space models, there are some peculiarities that need to be taken into account, which are mainly caused by the assumption of log normal distribution of the residuals.

In the previous subsections, we discussed the conditional expectations, medians and variances of each of the models. It can be seen from these values that the expectations and variances increase over the forecasting horizon. This is due to the skewness of the log normal distribution and the increase of uncertainty. In some cases this might be a desirable property, but we argue that in the general case the conditional medians should be preferred to the conditional means. Furthermore, it has been shown that median values produce more accurate forecasts than means in the case of the log normal distribution (Bårdsen and Lütkepohl, 2009). The medians are considered to be more robust and are easier to work with in this case.

All of this implies that both Croston's method and TSB produce forecasts that correspond to medians of the proposed models, and that the final forecasts produced by all the models discussed in this paper are not mean forecasts, but a multiplication of the median forecasts of demand sizes by the median forecasts of the probability of occurrence.
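To illustrate the gap between the two point forecasts for log-normal demand sizes (with illustrative parameter values):

```python
import numpy as np

# Illustrative log-normal demand sizes: the mean is inflated by the skewness,
# while the median is not
mu, sigma = np.log(10.0), 0.5
mean = np.exp(mu + sigma**2 / 2)   # conditional mean of the log-normal
median = np.exp(mu)                # conditional median, preferred above
```

Here the mean exceeds the median by a factor of $e^{\sigma^2/2}$, and this inflation grows with the scale parameter, which is exactly why the medians are preferred for skewed intermittent data.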
Using the log normal distribution also means that the $h$-steps-ahead prediction intervals for $z_{t+h}$ need to be derived separately. Rewriting $z_{t+h}$ in terms of $l_{z,t}$ and the error terms, we get (based on the derivations in Appendix A):
\[ z_{t+h} = l_{z,t}(1+\epsilon_{t+h}) \prod_{j=1}^{h-1} \left(1 - \alpha + \alpha(1+\epsilon_{t+j})\right). \qquad (50) \]
From the distributional point of view, assuming that the error term is not autocorrelated and substituting $1 + \epsilon_j = x$, the right hand side of (50) is equivalent to:
\[ x(1-\alpha+\alpha x)^{h-1} = x \sum_{j=0}^{h-1} \binom{h-1}{j} (1-\alpha)^j (\alpha x)^{h-1-j}, \qquad (51) \]
which is a convolution of log normal distributions and does not have a closed form. In order to get the appropriate quantiles of this distribution, either simulations or approximations of the function can be used (for example, see the method proposed by Fenton, 1960). We need to point out that if either the smoothing parameter $\alpha$ is close to zero or to one, or the scale parameter $\sigma^2$ is small, then the conditional distribution of (50) with respect to $l_{z,t}$ will be close to the log normal distribution. The case of $\alpha = 0$ corresponds to the log normal distribution with constant variance (thus the narrowest possible intervals for the model), while the case of $\alpha = 1$ corresponds to the log normal distribution with random-walk-like variance (the widest possible intervals for the model). The largest difference between the true distribution of (50) and the log normal distribution arises when $\alpha = 0.5$ and the scale parameter $\sigma^2$ is greater than one, and the distance between the two distributions increases with the forecast horizon $h$. In all other cases, the approximation should be a good one (see an example in Fenton, 1960). So, we will use the conventional $h$-steps-ahead variance of the ETS(M,N,N) model (Hyndman et al., 2008, p. 81) in the log-domain and the log normal distribution as the approximation of the real conditional distribution of (50).
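The quality of the log-normal approximation of (50) can be probed by simulation. Note that the variance formula below is an assumed stand-in (the familiar ETS(A,N,N) form $\sigma^2(1+(h-1)\alpha^2)$ applied in the log-domain), not necessarily the exact expression from Hyndman et al. (2008):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative parameters (assumed)
l_z, alpha, sigma, h, n = 10.0, 0.1, 0.1, 5, 100_000

# Simulate (50): one draw of (1 + eps) per future step, per replication
eps1 = rng.lognormal(0.0, sigma, size=(n, h))
z_h = l_z * eps1[:, -1] * np.prod(1 - alpha + alpha * eps1[:, :-1], axis=1)
q95_sim = np.quantile(z_h, 0.95)

# Log-normal approximation with an assumed stand-in variance in the log-domain
s2_h = sigma**2 * (1 + (h - 1) * alpha**2)
q95_approx = l_z * np.exp(stats.norm.ppf(0.95) * np.sqrt(s2_h))
```

For small $\alpha$ and $\sigma^2$, as the text suggests, the simulated and the approximated quantiles are very close; the gap widens as $\alpha$ approaches 0.5 and $\sigma^2$ grows.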
In order to calculate prediction intervals for intermittent state space models, the cumulative distribution function (CDF) can be used:
\[ F_y(y_{t+h} < Q) = \tilde{p}_{t+h|t} F_z(z_{t+h} < Q) + (1 - \tilde{p}_{t+h|t}), \qquad (52) \]
where $F_z(z_{t+h})$ is the $h$-steps-ahead CDF of the log normal distribution of $z_{t+h}$, $F_y(y_{t+h})$ is the final CDF of the variable $y_{t+h}$ and $Q$ is the value of the desired quantile of the distribution. $F_y(y_{t+h})$ should correspond to the desired probability (for example, 0.95), and given the values of the conditional medians $\tilde{p}_{t+h|t}$ discussed in the previous subsections, the only unknown element in (52) is $F_z(z_{t+h})$, which can be calculated as:
\[ F_z(z_{t+h} < Q) = \frac{F_y(y_{t+h} < Q) - (1 - \tilde{p}_{t+h|t})}{\tilde{p}_{t+h|t}}. \qquad (53) \]
So, in the construction of prediction intervals, formula (53) can be used for the calculation of the necessary quantiles of the log normal distribution of $z_t$. In the context of intermittent demand, the lower bound of the interval is usually not needed and in the majority of cases corresponds to zero, so the one-sided prediction interval for the upper bound (which is important for safety stock calculation) can be calculated based on (53). In order to do so, the upper quantile is calculated for $1 - \alpha$ rather than $1 - \frac{\alpha}{2}$, where $\alpha$ here denotes the significance level rather than the smoothing parameter.
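The quantile translation (53) for a one-sided upper bound can be sketched as follows; the state values below are assumed for illustration:

```python
import math
from scipy import stats

# Assumed state values for illustration
p_med = 0.4               # conditional median probability of occurrence
level = 0.95              # desired one-sided coverage for the upper bound
mu_h, sigma_h = 2.0, 0.3  # log-scale parameters of z_{t+h}

F_z = (level - (1 - p_med)) / p_med   # eq. (53)
# If F_z <= 0, the demand-size quantile is not needed: the bound is zero
upper = 0.0 if F_z <= 0 else stats.lognorm(s=sigma_h, scale=math.exp(mu_h)).ppf(F_z)
```

When the probability of occurrence is low enough that $1 - \tilde{p}_{t+h|t}$ already exceeds the desired level, the required quantile of $y_{t+h}$ is zero, which is why the guard is needed.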
3.8. Integer state space model

The integer iETS model is more complicated than the continuous model (21), and it has two important aspects that distinguish it from its counterpart.

First, the conditional expectation and variance cannot be derived analytically for this model. However, they can both be calculated via simulations. But in order for the model to be consistent with the other models discussed in this paper, the median demand sizes need to be taken during the simulation instead of the means for the final values of the point forecasts. Simulations can also be used for the calculation of the quantiles of the distribution for the construction of prediction intervals. However, we can use a simplification, which still allows using analytical derivations instead of simulations for both point forecasts and prediction intervals. This simplification is based on the following equality for the quantiles of any distribution (see Appendix D):
\[ Q_\alpha(\lceil z_t \rceil) = \lceil Q_\alpha(z_t) \rceil, \qquad (54) \]
where $Q_\alpha(\cdot)$ is the $\alpha$ quantile of a random variable. The equality (54) implies that the quantiles of the log normal distribution imposed by the continuous model underlying demand sizes can be used and then rounded up. As a result, there is no need to work directly with the integer model and to produce values via simulations. Furthermore, the following equality holds for all $z_t$ as well (Appendix D):
\[ Q_\alpha(\lfloor z_t \rfloor) = \lfloor Q_\alpha(z_t) \rfloor. \qquad (55) \]
This means that the decision of whether to round values up or down can be made by a forecaster, depending on their preferences, after producing the quantiles of the continuous model. The result will be equivalent to using the model with the respective rounding mechanism directly.
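The equality (54) can be verified numerically for an illustrative log-normal distribution, by comparing the smallest integer whose CDF reaches the target probability with the rounded-up continuous quantile:

```python
import math
from scipy import stats

# Illustrative log-normal demand-size distribution
dist = stats.lognorm(s=0.5, scale=3.0)

for a in (0.1, 0.6, 0.9, 0.975):
    q_cont = dist.ppf(a)
    # Quantile of ceil(z_t): the smallest integer n with F(n) >= a
    n = 1
    while dist.cdf(n) < a:
        n += 1
    assert n == math.ceil(q_cont)   # eq. (54)
```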
Second, the likelihood function for the integer model is more complicated than for the continuous one, because it relies on a discretised version of the log normal distribution (see Chakraborty, 2015, for details on discrete analogues of continuous distributions). Bi et al. (2001) propose such a distribution, called the "Discrete Gaussian Exponential" (DGX), and use it on retail sales data, reporting that the distribution fits the data well. We adopt a simpler approach, considering the multitude of values of $z_t$ corresponding to one rounded-up value $\lceil z_t \rceil$: taking into account all the values in the region $(\lceil z_t \rceil - 1, \lceil z_t \rceil]$. The density function cannot be used for the estimation of the likelihood for demand sizes in this case; the CDF over the interval should be used instead:
\[ F_z\left(\lceil z_t \rceil - 1 < z_t \leq \lceil z_t \rceil\right) = \Phi\left(\frac{\log \lceil z_t \rceil}{\sigma}\right) - \Phi\left(\frac{\log(\lceil z_t \rceil - 1)}{\sigma}\right). \qquad (56) \]
Note that with the increase of the level and the decrease of the variance of the time series (which is typical for non-intermittent data), the distance between $\Phi\left(\frac{\log \lceil z_t \rceil}{\sigma}\right)$ and $\Phi\left(\frac{\log(\lceil z_t \rceil - 1)}{\sigma}\right)$ will decrease, and (56) will asymptotically be equal to the PDF of the same distribution.
The log-likelihood function for the integer iETS model (8) based on (56) is:
\[ \ell(\theta, \sigma^2 | Y) = \sum_{o_t=1} \log\left(\Phi\left(\frac{\log \lceil z_t \rceil}{\sigma}\right) - \Phi\left(\frac{\log(\lceil z_t \rceil - 1)}{\sigma}\right)\right) - \frac{T_0}{2} \log\left(2\pi e \sigma^2\right) + \sum_{o_t=1} \log(\hat{p}_t) + \sum_{o_t=0} \log(1-\hat{p}_t). \qquad (57) \]
The parameters of the model (8) can be estimated directly via maximisation of the likelihood function (57), which is a more computationally intensive task than the maximisation of the likelihood function of the continuous model (7). In order to simplify the process, we propose a two-stage optimisation, where in the first stage the parameters of the continuous model are estimated, and in the second stage the likelihood (57) is used for the correction of the estimated parameters.
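The demand-size contribution to the likelihood, built from the interval probabilities (56), can be sketched as follows; for brevity the location of the distribution is omitted, so this is a simplified illustration rather than the full likelihood (57):

```python
import numpy as np
from scipy import stats

def discrete_size_loglik(z, sigma):
    """Demand-size contribution to the integer likelihood, based on (56).

    Each integer size ceil(z_t) contributes the probability mass of the
    interval (ceil(z_t) - 1, ceil(z_t)] under the log-normal CDF. The
    location of the distribution is omitted here, so this is a simplified
    illustration rather than the full likelihood (57)."""
    dz = np.ceil(z)
    upper = stats.norm.cdf(np.log(dz) / sigma)
    lower = np.where(dz > 1,
                     stats.norm.cdf(np.log(np.maximum(dz - 1, 1e-300)) / sigma),
                     0.0)
    return float(np.sum(np.log(upper - lower)))
```

The special case $\lceil z_t \rceil = 1$ corresponds to the interval $(0, 1]$, whose lower CDF value is zero, which is handled explicitly above.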
3.9. Model selection in the iETS framework

Having the likelihood functions for all five intermittent state space models, iETS_G (21), iETS_F (27), iETS_O (31), iETS_I (36) and iETS_D (43), and knowing the number of parameters to estimate, we can calculate any information criterion and use it for model selection. For example, the Akaike Information Criterion can be calculated as:
\[ \text{AIC} = 2k - 2\ell(\theta, \hat{\sigma}^2 | Y), \qquad (58) \]
where for the intermittent models $k$ is equal to 7, 4 or 5 (depending on the model underlying the occurrence part) and, for example, for a basic ETS(A,N,N), $k = 3$. So the only difference between iETS_O, iETS_I and iETS_D is in the probability modelling mechanism. In order to distinguish the model selected via an information criterion from the individual cases, we use the notation iETS_A.

Note that we can also compare conventional non-intermittent ETS models (with trend and seasonality) with the intermittent ones using information criteria. However, we do not aim to cover all the possible models in this paper and focus on the level models only.
It is also important to note at this point that, having at least four parameters to estimate, iETS models need at least five non-zero demand observations. If for some reason the sample is smaller, then simpler models for demand sizes should be used instead of ETS(M,N,N). For example, using a model with a fixed level (setting the smoothing parameter $\alpha_z$ to zero) allows preserving one degree of freedom without substantial loss of generality and fitting the model to data with at least four non-zero observations.
4. Experiments

4.1. Real time series experiment

In order to examine the performance of the proposed intermittent state space models, we conduct an experiment on two datasets.

The first is 3000 real time series of automotive spare parts (Auto). This dataset originates from Syntetos and Boylan (2005) and was also used in Kourentzes (2014). These are monthly time series, each containing 24 observations. We withhold 5 observations from each time series for measuring forecasting accuracy.

The second dataset is the Royal Air Force (RAF) data, which contains 5000 real time series (Eaves and Kingsman, 2004). Each of the time series in this dataset has 84 observations. We withhold 12 observations and use them in order to measure the forecasting accuracy of the tested models.

The two datasets have distinctive characteristics: one of them (RAF) exhibits higher intermittency than the other (Auto). The distributions of the proportions of non-zero demands for both datasets are shown in Figure 1.

The Auto data can be characterised as mildly intermittent, as all the series contain less than 50% of zeroes. In contrast, the RAF data is strongly intermittent, as all series contain at least 75% of zeroes.
[Figure 1 about here: histograms of the proportion of non-zero values. (a) Automotive data. (b) RAF data.]

Figure 1: The distributions of the proportions of non-zero demands for the two datasets.
We have used the following set of models from the smooth, v2.5.1 package (Svetunkov, 2019b) for R (R Core Team, 2018) in this experiment:
1. ETS(A,N,N), which is needed as a benchmark;
2. iETS_F – the model with fixed probability;
3. iETS_O – the odds ratio model;
4. iETS_I – the inverse odds ratio model;
5. iETS_D – the direct probability model;
6. iETS_G – the general iETS model;
7. iETS_A – the model with model selection between the first five models using the corrected AIC;
8. int iETS_F – the integer counterpart of iETS_F;
9. int iETS_O – the integer counterpart of iETS_O;
10. int iETS_I – the integer counterpart of iETS_I;
11. int iETS_D – the integer counterpart of iETS_D;
12. int iETS_G – the integer counterpart of iETS_G;
13. int iETS_A – similar to (7), but with selection among the integer models.
We have also added the following filters, implemented in the counter, v0.2.0 package for R (Svetunkov, 2019a):
1. the hurdle shifted Poisson filter (denoted "HSP") discussed in Snyder et al. (2012), implemented in the hsp() function;
2. the Negative Binomial filter (denoted "NegBin") from Snyder et al. (2012), implemented in the negbin() function.
Finally, we have included the three benchmark methods discussed earlier in this paper, implemented in the tsintermittent, v1.9 package for R (Kourentzes and Petropoulos, 2016) and optimised using MSE (with the cost="mse" parameter):
1. the TSB method, implemented in the tsb() function;
2. Croston's method, implemented in the crost() function;
3. the SBA method, implemented in the crost() function.
We measure the accuracy of the point forecasts of all the competing methods and models using the following error metrics, discussed in Kourentzes (2014), Petropoulos and Kourentzes (2015) and Davydenko and Fildes (2013):
• sCE – scaled Cumulative Error, measuring the potential bias in forecasts;
• sAPIS – scaled Absolute Periods In Stock, measuring the bias over the lead time;
• RRMSE – relative Root Mean Squared Error, measuring the accuracy of forecasts.
Given that the iETS models and the filters allow producing prediction intervals, we also used the Relative Mean Interval Score (based on Gneiting and Raftery, 2007):
\[ \text{RMIS} = \frac{\text{MIS}_1}{\text{MIS}_0}, \qquad (59) \]
where MIS$_1$ is the Mean Interval Score of the model / method under consideration and MIS$_0$ is the same measure for the benchmark model / method. MIS is calculated as the arithmetic mean of the IS (Gneiting and Raftery, 2007) over the forecasting horizon. The nominal coverage of the prediction intervals is set to 0.95.

We used relative rather than scaled error measures because they are easier to interpret and work with. For example, if the RRMSE of some model / method is equal to 0.93, this means that it performs 7% more accurately than the benchmark in terms of the RMSE.
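For instance, the RRMSE on a single series and the multiplicative aggregation of such ratios across series can be computed as:

```python
import numpy as np

def rrmse(e_model, e_benchmark):
    """Relative RMSE of a model against a benchmark on one series;
    values below one mean the model is the more accurate of the two."""
    return (np.sqrt(np.mean(np.square(e_model)))
            / np.sqrt(np.mean(np.square(e_benchmark))))

def geometric_mean(ratios):
    """Relative measures are aggregated multiplicatively across series."""
    return float(np.exp(np.mean(np.log(ratios))))

ratio = rrmse(np.array([1.0, -1.0]), np.array([2.0, -2.0]))
```

The geometric mean is the natural aggregator for ratios: improvements and deteriorations of the same relative size cancel out, which an arithmetic mean would not guarantee.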
We have calculated the mean and median values of these errors across all the series and summarised them in two tables. The relative measures were calculated with ETS(A,N,N) as the benchmark and then aggregated using the geometric mean, while for the others the arithmetic mean was used.

Methods       |        Mean values        |       Median values
              |  sCE   sAPIS RRMSE RMIS   |  sCE   sAPIS RRMSE RMIS
ETS(A,N,N)    | -0.08  5.52  1.00  1.00   |  0.21  4.30  1.00  1.00
iETS_F        | -0.62  5.33  0.98  0.88   | -0.31  3.86  1.00  0.82
iETS_O        | -0.57  5.33  0.98  0.87   | -0.26  3.88  1.00  0.82
iETS_I        | -0.70  5.40  0.98  0.88   | -0.38  3.88  1.01  0.82
iETS_D        | -0.62  5.33  0.98  0.88   | -0.31  3.86  1.00  0.82
iETS_G        | -0.64  5.40  0.98  0.88   | -0.33  3.92  1.00  0.82
iETS_A        | -0.34  5.40  0.99  0.96   | -0.03  4.08  1.00  1.00
int iETS_F    | -1.23  5.93  0.99  0.89   | -0.91  4.25  1.03  0.86
int iETS_O    | -1.19  5.90  0.99  0.89   | -0.88  4.25  1.03  0.86
int iETS_I    | -1.29  6.02  0.99  0.89   | -1.00  4.34  1.04  0.85
int iETS_D    | -1.23  5.93  0.99  0.89   | -0.91  4.25  1.03  0.86
int iETS_G    | -1.25  5.99  0.99  0.89   | -0.94  4.32  1.03  0.85
int iETS_A    | -1.02  5.87  0.99  1.00   | -0.74  4.22  1.02  0.98
HSP           |  0.05  5.51  1.01  0.95   |  0.35  4.33  1.00  0.78
NegBin        | -0.09  5.32  1.00  0.87   |  0.24  4.24  1.00  0.88
TSB           | -0.15  5.63  1.01   NA    |  0.12  4.27  1.00   NA
Croston       | -0.10  5.76  1.02   NA    |  0.17  4.46  1.00   NA
SBA           | -0.19  5.72  1.02   NA    |  0.07  4.38  1.00   NA

Table 1: Automotive data results.
The results of the experiment on the Automotive data are given in Table 1, which shows that all the methods performed similarly, with several methods performing slightly worse than the others in terms of bias and prediction intervals. Notably, all continuous iETS models outperformed their integer counterparts in terms of all the error measures. The difference between the continuous iETS models does not seem significant. The hurdle shifted Poisson and Negative Binomial filters of Snyder et al. (2012) perform well in terms of the bias measures and RMIS, but fail to outperform ETS(A,N,N) in terms of RRMSE.
In order to determine whether the differences between the models are statistically significant, we have conducted a Nemenyi test (Demšar, 2006) on the RRMSE and RMIS values. The results of this test are shown in Figure 2. The ranking was done so that the model with the highest measure would have the score of 1 and the model with the lowest measure would have the score of 15. The Y-axis in Figure 2 shows the average ranks for each of the models. The vertical lines in the figure show the groups of models in which the difference between the ranks is statistically insignificant. The significance level used in this experiment is 5%.
[Figure 2 about here. Average ranks from the Nemenyi tests:
(a) Nemenyi test on RMSE: iETS_O (8.72), iETS_F (8.72), ETS(A,N,N) (8.73), iETS_D (8.73), iETS_A (8.80), iETS_I (8.82), NegBin (8.85), iETS_G (8.87), HSP (9.17), TSB (9.39), SBA (9.62), Croston (9.67), int iETS_A (10.07), int iETS_O (10.33), int iETS_F (10.49), int iETS_D (10.60), int iETS_G (10.69), int iETS_I (10.72).
(b) Nemenyi test on MIS: iETS_G (6.56), iETS_I (6.60), iETS_D (6.63), iETS_F (6.63), iETS_O (6.64), HSP (7.08), NegBin (8.17), int iETS_I (8.31), int iETS_G (8.42), int iETS_F (8.46), int iETS_D (8.46), int iETS_O (8.56), iETS_A (9.15), int iETS_A (9.98), ETS(A,N,N) (10.33).]

Figure 2: Nemenyi tests on RMSE and MIS of the models applied to the automotive data.
Figure 2a shows that all the continuous iETS models performed similarly to each other and to the ETS(A,N,N) model in terms of RRMSE, but their integer counterparts were statistically significantly worse. Note also that the NegBin filter performed better than HSP, which agrees with Snyder et al. (2012), although the difference between the two is not significant. The best performing model, iETS_O, has almost the same average rank as the simpler iETS_F model, so they can be considered the best performing models in our experiment.

As for the performance of the prediction intervals (Figure 2b), the continuous iETS models perform significantly better than the NegBin or HSP filters and better than the integer iETS models. However, the difference between the iETS models is not significant. Note also that iETS_A produced the least accurate intervals, performing only slightly better than the ETS(A,N,N) model.
The results of the experiment for the Royal Air Force data are shown in Table 2.

Methods       |        Mean values        |       Median values
              |  sCE   sAPIS RRMSE RMIS   |  sCE   sAPIS RRMSE RMIS
ETS(A,N,N)    |  0.14  8.48  1.00  1.00   |  0.67  6.50  1.00  1.00
iETS_F        | -0.10  7.69  0.89  0.65   |  0.42  5.17  0.98  0.63
iETS_O        | -0.03  7.91  0.91  0.65   |  0.47  5.39  0.99  0.64
iETS_I        | -0.17  7.57  0.86  0.66   |  0.36  4.98  0.98  0.63
iETS_D        | -0.11  7.69  0.88  0.65   |  0.40  5.15  0.98  0.63
iETS_G        | -0.10  7.79  0.88  0.66   |  0.40  5.13  0.99  0.63
iETS_A        | -0.11  7.71  0.88  0.67   |  0.41  5.15  0.98  0.65
int iETS_F    | -0.35  7.46  0.66  0.66   |  0.13  4.58  0.98  0.60
int iETS_O    | -0.30  7.61  0.67  0.67   |  0.16  4.67  0.99  0.62
int iETS_I    | -0.41  7.37  0.64  0.66   |  0.08  4.33  0.98  0.59
int iETS_D    | -0.36  7.46  0.66  0.66   |  0.12  4.58  0.98  0.60
int iETS_G    | -0.35  7.52  0.66  0.67   |  0.11  4.44  0.99  0.61
int iETS_A    | -1.04 12.34  0.94  1.75   | -0.55  7.99  1.01  0.93
HSP           |  0.30  8.93  1.04  0.68   |  0.76  6.81  1.01  0.65
NegBin        |  0.08  8.41  0.96  0.74   |  0.52  6.01  1.00  0.73
TSB           |  0.10  8.56  0.96   NA    |  0.55  5.95  1.00   NA
Croston       |  0.02  8.24  0.92   NA    |  0.45  5.77  1.00   NA
SBA           | -0.02  8.17  0.89   NA    |  0.40  5.58  1.00   NA

Table 2: Royal Air Force data results.
As can be seen from Table 2, the integer iETS models outperform the continuous ones in terms of the mean RRMSE and RMIS, but perform slightly worse in terms of the mean sCE and sAPIS. As for the median values, it seems that all the models perform similarly in terms of RRMSE, while the integer iETS models, once again, are more accurate in terms of the other error measures. Note also that the continuous iETS models outperform the HSP and NegBin filters and the Croston, TSB and SBA forecasting methods across all the error measures. The only iETS model that does not perform well is the integer iETS_A, the one with the model selection mechanism. The reason for such poor performance is probably that in many cases the integer version of ETS(A,N,N) was selected.

Following the same procedure as with the automotive data, we have conducted a Nemenyi test, with the results shown in Figure 3.
[Figure 3 about here. Average ranks from the Nemenyi tests:
(a) Nemenyi test on RMSE: int iETS_I (8.21), int iETS_F (8.30), int iETS_D (8.40), int iETS_G (8.53), int iETS_O (8.55), iETS_I (8.56), iETS_F (8.98), iETS_A (9.00), iETS_D (9.05), iETS_G (9.19), iETS_O (9.34), SBA (9.74), Croston (9.90), ETS(A,N,N) (10.57), NegBin (10.59), TSB (10.59), HSP (11.70), int iETS_A (11.78).
(b) Nemenyi test on MIS: int iETS_I (6.87), int iETS_F (7.00), int iETS_D (7.04), int iETS_G (7.11), iETS_I (7.21), int iETS_O (7.24), iETS_O (7.43), iETS_F (7.51), iETS_G (7.54), iETS_D (7.64), HSP (7.73), iETS_A (7.87), NegBin (8.56), int iETS_A (10.22), ETS(A,N,N) (13.03).]

Figure 3: Nemenyi tests on RMSE and MIS of the models applied to the RAF data.
The test shows that the continuous and integer iETS models perform significantly better in terms of RRMSE (Figure 3a) than the other methods and filters, with the exception of the integer iETS_A. Still, the integer iETS models perform significantly better than almost all of their continuous analogues in terms of RMSE. The only exception is the iETS_I model, which performed similarly to the integer iETS models. Notably, the second best performing group in terms of RRMSE is the group of the continuous iETS models. They are significantly better than the filters and the forecasting methods, but at the same time significantly worse than their integer counterparts.

As for the prediction intervals (Figure 3b), the picture is similar to that for RRMSE, although not as distinct: the integer iETS models outperform all the other models. However, in this case some of the continuous models do not differ statistically from their integer counterparts in terms of interval accuracy (e.g. iETS_I, iETS_O, iETS_F and iETS_G).
Summarising the results of the both experiments, the continuous iETS778
models perform better than the other models on mildly intermittent data779
(Auto dataset), while the integer ones produce more accurate forecasts on780
the heavily intermittent one (RAF dataset). Still, the continuous iETS mod-781
els are more robust and can be recommended for wide application on different782
types of intermittent data. Finally, among all the iETS models, iETSIseems783
to be the most robust and consistent across both datasets. This is probably784
due to the nature of the datasets and the ability of the model to capture785
the “non-occurrence”. Note also that we did not round up the prediction786
intervals of the continuous iETS models in this experiment, which probably787
explains a worse performance of these models in comparison with their in-788
teger counterparts in terms of RMIS on the heavier intermittent data (RAF789
dataset).790
The effect of rounding has also been checked on low count time series. Experimentation on simulated Binomial data (n = 2, p = 0.5) shows that the integer-based iETS models perform well in terms of prediction intervals (according to relative MIS), although slightly less well than the continuous iETS models in terms of relative Root Mean Squared Error.
As can be seen from this experiment, the proposed intermittent state space models perform very well and can be applied to real life problems. Although we know that the data we deal with is count data and that the continuous iETS model is not a count model, we find that it is still useful.
5. Conclusions
In this paper, we have proposed a new statistical model for intermittent demand. This model expands the Hyndman et al. (2008) taxonomy by the inclusion of intermittent models and unites the worlds of continuous and intermittent data. This is vital for the forecasting of a wide range of stock keeping units, which may evolve from slow moving to fast moving products (or vice versa). We have discussed both the demand sizes and the demand occurrence parts of the model, demonstrating the potential for the extendability of the model in both directions and deriving the moments and the quantiles analytically. We also proposed mechanisms for model construction, estimation and selection.
This paper focused on the ETS(M,N,N) model, and the intermittent equivalent of this model was called iETS(M,N,N). The most general model, "iETS$_G$", is the most complicated one, but at the same time is the most flexible. We then discussed several special cases of this model, appearing when specific restrictions are imposed on the parameters of the original model. The simplest state space model arising from this is the one with the fixed probability (denoted as "iETS$_F$"), which is very easy to estimate and use. We then discussed two more complicated ones, namely the odds ratio "iETS$_O$" and the inverse odds ratio "iETS$_I$" models. We have shown that the "iETS$_I$" model can be estimated by Croston's method. Finally, we proposed the "iETS$_D$" model, which can be estimated using the TSB method under specific conditions. We have also derived the likelihood functions for all the iETS models, which allow not only obtaining efficient and consistent estimates of parameters, but also selecting between several state space models. This includes selecting between intermittent and non-intermittent models, thereby simplifying the forecasting process. We derived analytical formulae for the conditional mean, variance and quantiles of the distribution for our models, showing that the forecasts produced by the models correspond to the conditional median of demand sizes rather than the mean, which in the case of intermittent data is a useful property. We have also proposed a method of prediction interval construction based on the proposed intermittent state space model, without recourse to simulations. Finally, we developed integer counterparts of the iETS models, addressing the issue of count data modelling.
Lastly, the experiments on automotive data and on data from the Royal Air Force show that the proposed approach is applicable to real life supply chain problems and that the proposed models can perform very well on real-world datasets. They outperform the existing forecasting methods and several filters previously proposed in the literature. iETS$_I$ was generally the most robust forecasting model on both datasets, and it can be concluded that the continuous iETS models performed very well overall. In addition, while the integer iETS models do not always produce the most accurate forecasts, they seem to be suitable in cases of heavily intermittent data.
We should remark that the focus of this paper was on a specific iETS(M,N,N) model with several special cases, and we simplified the notation for this model in the paper. However, we propose a more detailed notation, which acknowledges the flexibility of the proposed approach and the fact that both the demand sizes and the demand occurrence parts may have their own ETS models (potentially with exogenous variables). So, the general model discussed in the paper can be denoted as iETS(M,N,N)$_G$(M,N,N)(M,N,N), where the letters in the first brackets indicate the type of ETS model for the demand sizes, and the letters after the subscript "G" refer to the models for the variables $a_t$ and $b_t$ in the demand occurrence part of the model respectively. Using this notation, new types of models can be studied in future research. For example, a model with additive trend in the demand sizes and multiplicative trend in the demand occurrence with the odds ratio mechanism can be denoted as iETS(M,A,N)$_O$(M,M,N). This allows extending the Hyndman et al. (2008) taxonomy and opens new avenues for research.
It is also worth mentioning that the approach of intermittent state space modelling allows using (for both the demand sizes and the demand occurrence parts of the model) ETS, ARIMA, regression models or diffusion models, which could be applied to a wide range of time series (not limited to intermittent demand). In fact, any statistical model can be applied either to the demand sizes, or to the $a_t$ or $b_t$ parts of the model, as long as the necessary transformations are made, ensuring that the model produces positive values only. For example, working with the pure additive ETS models on the log-transformed demand sizes would simplify the derivations and could give potential benefits in terms of the ease of use of the model. Studying the properties of these models would be another large area of research. Finally, in order to show the connections between the methods and the models, we assumed throughout this paper that the demand occurrence and demand size parts are independent. This could be modified in a new model using the state space approach discussed in the paper.
Acknowledgement
We would like to thank Stephan Kolassa for his comments on this paper, which helped to improve it substantially. We would also like to thank two anonymous reviewers for their feedback, which led to a reformulation and rethinking of the model presented in this paper.
Appendix A. Properties of the ETS(M,N,N) model
The main properties of ETS(M,N,N) are well studied in Akram et al. (2009) and are not discussed here. The important thing to note is that the authors use Kakutani's theorem, showing that if the mean value of $(1 + \alpha_z \epsilon_t)$ is equal to one and the distribution is non-degenerate, then the sample path of ETS(M,N,N) tends to converge almost surely to zero. This is based on the assumption of a normal distribution with zero mean for $\epsilon_t$, which leads to $\mathrm{E}(1 + \epsilon_t) = 1$ in their context. However, given that the model that we work with assumes a log normal distribution of the error term, and the expectation of $1 + \alpha_z \epsilon_t$ is not equal to one, the sample path will differ from the one of the conventional ETS(M,N,N) model. In order to understand how the sample path changes over time, the value $(1 + \alpha_z \epsilon_t)$ needs to be analysed separately.
Given that we assume that the error term in ETS(M,N,N) has a log normal distribution, it can be shown that $(1 + \alpha_z \epsilon_t)$ has a three-parameter log normal distribution (Sangal and Biswas, 1970). In order to see what this value is equal to, we regroup it in the following way:
$$1 + \alpha_z \epsilon_t = (1 - \alpha_z) + \alpha_z (1 + \epsilon_t). \quad \text{(A.1)}$$
The element $1 + \epsilon_t$ in (A.1) has a log normal distribution, $1 + \epsilon_t \sim \log\mathcal{N}(0, \sigma^2)$, and as a result $1 + \alpha_z \epsilon_t$ has a three-parameter log normal distribution: $1 + \alpha_z \epsilon_t \sim \text{3P}\log\mathcal{N}(\log \alpha_z, \sigma^2, 1 - \alpha_z)$. This property can be used for further derivations.
Given that the demand realisation in the model (7) is not observable when $o_t = 0$, the expectation conditional on the last non-zero demand should be taken for the zero demand periods. This means that the transition equation at some observation $t + k$ is defined based on the previous $k$ values:
$$l_{z,t+k} = l_{z,t} \prod_{j=1}^{k} (1 + \alpha_z \epsilon_{t+j}). \quad \text{(A.2)}$$
The conditional expectation of the level $l_{z,t+k}$ given the value of $l_{z,t}$, and assuming that the error term is not autocorrelated, is:
$$\mathrm{E}(l_{z,t+k} | t) = l_{z,t} \prod_{j=1}^{k} \mathrm{E}(1 + \alpha_z \epsilon_{t+j}) = l_{z,t} \left( \mathrm{E}(1 + \alpha_z \epsilon_{t+j}) \right)^k. \quad \text{(A.3)}$$
The expectation of the term $1 + \alpha_z \epsilon_{t+j}$ in (A.3), given (A.1) and the assumption $\mu = 0$, is:
$$\mathrm{E}(1 + \alpha_z \epsilon_{t+j}) = (1 - \alpha_z) + \alpha_z \exp\left(\frac{\sigma^2}{2}\right). \quad \text{(A.4)}$$
The only case when (A.4) would be equal to one is when $\sigma^2 = 0$, which is not realistic. In all the other cases, the expectation will be greater than one, meaning that the level will always asymptotically diverge from zero. This situation might change if we do not assume that $\mu = 0$, but that would lead to another, more complicated statistical model.
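As an illustrative check of (A.4), the expectation can be verified by simulation. The sketch below (a hypothetical check with assumed parameter values, not part of the original derivation) draws $1 + \epsilon_t \sim \log\mathcal{N}(0, \sigma^2)$ and compares the sample mean of $1 + \alpha_z \epsilon_t$ with the analytical value:

```python
import math
import random

random.seed(42)
alpha_z, sigma = 0.3, 0.5  # assumed smoothing parameter and scale

# 1 + eps_t ~ logN(0, sigma^2), so by (A.1):
# 1 + alpha_z * eps_t = (1 - alpha_z) + alpha_z * (1 + eps_t)
draws = [(1 - alpha_z) + alpha_z * math.exp(random.gauss(0.0, sigma))
         for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)

# Analytical expectation (A.4): greater than one whenever sigma^2 > 0
analytic = (1 - alpha_z) + alpha_z * math.exp(sigma**2 / 2)

assert abs(mc_mean - analytic) < 0.01 and analytic > 1
```

With $\sigma^2 > 0$ the analytical value exceeds one, matching the divergence argument above.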
Inserting (A.4) in (A.3), the following is obtained:
$$\mathrm{E}(l_{z,t+k} | t) = l_{z,t} \left( (1 - \alpha_z) + \alpha_z \exp\left(\frac{\sigma^2}{2}\right) \right)^k. \quad \text{(A.5)}$$
Using the same logic as before, it can be shown that the variance of the term $1 + \alpha_z \epsilon_t$ is:
$$\mathrm{V}(1 + \alpha_z \epsilon_t) = \alpha_z^2 \left( \exp(\sigma^2) - 1 \right) \exp(\sigma^2). \quad \text{(A.6)}$$
Taking the independence of the error terms into account, the conditional variance of $l_{z,t+k}$ given $l_{z,t}$ should be calculated as:
$$\mathrm{V}(l_{z,t+k} | t) = l_{z,t}^2 \left( \prod_{j=1}^{k} \left( \mathrm{V}(1 + \alpha_z \epsilon_t) + \mathrm{E}(1 + \alpha_z \epsilon_t)^2 \right) - \prod_{j=1}^{k} \mathrm{E}(1 + \alpha_z \epsilon_t)^2 \right), \quad \text{(A.7)}$$
where the expectation and the variance of the error term are equal respectively to (A.4) and (A.6).
At the same time, the conditional median of the states is equal to:
$$\mathrm{Md}(l_{z,t+k} | t) = l_{z,t}. \quad \text{(A.8)}$$
In order to separate the level from the uncertainty coming from the error term, the transition of states in the ETS(M,N,N) model with the log normal distribution, when the demand is not observed, should be governed by the equation (A.8) instead of (A.5).
As for the h-steps-ahead demand, produced using ETS(M,N,N), it can be written as:
$$z_{t+h} = l_{z,t} (1 + \epsilon_{t+h}) \prod_{j=1}^{h-1} (1 + \alpha_z \epsilon_{t+j}). \quad \text{(A.9)}$$
The conditional expectation of (A.9), given (A.5), is:
$$\mathrm{E}(z_{t+h} | t) = l_{z,t} \exp\left(\frac{\sigma^2}{2}\right) \left( (1 - \alpha_z) + \alpha_z \exp\left(\frac{\sigma^2}{2}\right) \right)^{h-1}. \quad \text{(A.10)}$$
As for the conditional variance of (A.9), it is calculated in the following way:
$$\mathrm{V}(z_{t+h} | t) = \left( \exp(\sigma^2) - 1 \right) \exp(\sigma^2) \, \mathrm{V}(l_{z,t+h-1} | t). \quad \text{(A.11)}$$
Finally, the conditional median of (A.9), based on (A.8), is:
$$\mathrm{Md}(z_{t+h} | t) = l_{z,t}. \quad \text{(A.12)}$$
This means that the h-step-ahead point forecast, produced using the SES method, corresponds to the median of the ETS(M,N,N) model.
Finally, the properties of the log normal distribution and the multiplicative model also restrict the smoothing parameter to the interval [0, 1]. Assuming that the smoothing parameter is always positive, the inequality $(1 + \epsilon_t) > 0$ implies that:
$$\begin{aligned} \epsilon_t &> -1 \\ \alpha_z \epsilon_t &> -\alpha_z \\ 1 + \alpha_z \epsilon_t &> 1 - \alpha_z \end{aligned} \quad \text{(A.13)}$$
The ETS(M,N,N) model makes sense only when $1 + \alpha_z \epsilon_t > 0$. So, if $\alpha_z > 1$, then $1 + \alpha_z \epsilon_t$ may become negative, which breaks the model, because the level may become negative. The model however still makes sense for the boundary values of $\alpha_z$: when $\alpha_z = 0$, the level is not updated, while in the case of $\alpha_z = 1$, the level has the dynamics of a random walk process. The condition $\alpha_z \in [0, 1]$ is rather restrictive, because there may be some cases when even with $\alpha_z > 1$ the value of $(1 + \alpha_z \epsilon_t)$ will be greater than zero. However, it guarantees that the level of the time series is always positive, whatever the error value.
Appendix B. Likelihood function for iETS(M,N,N)
There are two cases for the intermittent demand model: when demand occurs and when it does not. In the former case, the probability of obtaining the value $y_t$ depends on the previous observations of $y_t$ and leads to the likelihood (based on the likelihood in Hyndman et al., 2008, p.35):
$$\mathcal{L}(\theta, \sigma^2 | y_t, o_t = 1) = p_t f_z(z_t | l_{z,t-1}), \quad \text{(B.1)}$$
where $f_z(\cdot)$ is the probability density function of the log normal distribution and $\theta$ is the vector of all the parameters of the model. In the latter case it is similarly equal to:
$$\mathcal{L}(\theta, \sigma^2 | y_t, o_t = 0) = (1 - p_t) f_z(z_t | l_{z,t-1}). \quad \text{(B.2)}$$
The likelihood function for the statistical model (21) for all the $T$ observations is then:
$$\mathcal{L}(\theta, \sigma^2 | Y) = \prod_{o_t=1} p_t \prod_{o_t=0} (1 - p_t) \prod_{t=1}^{T} f_z(z_t | l_{z,t-1}), \quad \text{(B.3)}$$
where $Y$ is the set of all the available variables $y_t$. However, the likelihood (B.3) cannot be used for the estimation of the model, because $z_t$ is not observable when $o_t = 0$, thus making the estimation of the probability density function at those points not tractable. This means that the cases of $z_t$ when $o_t = 0$ should be treated as missing values, and the likelihood should take into account the uncertainty about the distribution of $z_t$, conditional on $l_{z,t-k}$, where $z_{t-k}$ is the last observed demand size. There are two options for calculating the likelihood in this case:
1. Calculate the marginal probability density function $f_z(z_t | l_{z,t-k})$ (similar to the derivation by Barnea et al., 2006):
$$f_z(z_t | l_{z,t-k}) = \int_0^\infty \cdots \int_0^\infty f_z(z_t | l_{z,t-1}) \prod_{j=1}^{k-1} f_l(l_{z,t-j} | l_{z,t-j-1}) \, dl_{z,t-1} \ldots dl_{z,t-k+1}, \quad \text{(B.4)}$$
where $f_z(z_t | l_{z,t-1})$ is the probability density function of the demand sizes (based on the log normal distribution) and $f_l(l_{z,t-j} | l_{z,t-j-1})$ is the probability density function of the level (based on the three-parameter log normal distribution).
2. Calculate the likelihood using the Expectation Maximisation (EM) algorithm. In this case the expectation of the logarithm of (B.3) is taken, which is then maximised.
We will be using the latter approach, because the former does not have an analytical solution and involves numerical methods, for example, Monte Carlo simulations, which is more computationally expensive than the second option. The EM algorithm, applied to our problem, leads to the following expectation of the log-likelihood:
$$\ell(\theta, \sigma^2 | Y) = \sum_{o_t=1} \log f_z(1 + \epsilon_t) + \sum_{o_t=0} \mathrm{E}\left(\log f_z(1 + \epsilon_t)\right) + \sum_{o_t=1} \log(p_t) + \sum_{o_t=0} \log(1 - p_t). \quad \text{(B.5)}$$
The log-likelihood (B.5) can be simplified further, given that $\mathrm{E}(\log f_z(1 + \epsilon_t))$ is the negative differential entropy (Lazo and Rathie, 1978) and is equal to $-\mu - \frac{1}{2}\log(2\pi e \sigma^2)$ in the case of the log normal distribution. Based on that, and taking the assumption of $\mu = 0$, the expected log-likelihood is equal to:
$$\ell(\theta, \sigma^2 | Y) = \sum_{o_t=1} \log f_z(1 + \epsilon_t) - \frac{T_0}{2} \log(2\pi e \sigma^2) + \sum_{o_t=1} \log(p_t) + \sum_{o_t=0} \log(1 - p_t), \quad \text{(B.6)}$$
where $T_0$ is the number of zero observations.
The likelihood (B.6) can be maximised directly. Inserting the density function of the log normal distribution in (B.6) instead of $f_z(1 + \epsilon_t)$ leads to:
$$\ell(\theta, \sigma^2 | Y) = -\sum_{o_t=1} \left( \log(z_t) + \frac{1}{2} \log(2\pi\sigma^2) + \frac{1}{2} \frac{\log^2(1 + \epsilon_t)}{\sigma^2} \right) - \frac{T_0}{2} \log(2\pi e \sigma^2) + \sum_{o_t=1} \log(p_t) + \sum_{o_t=0} \log(1 - p_t), \quad \text{(B.7)}$$
which can be simplified to:
$$\ell(\theta, \sigma^2 | Y) = -\left( \frac{T_1}{2} \log(2\pi\sigma^2) + \frac{T_0}{2} \log(2\pi e \sigma^2) + \frac{1}{2\sigma^2} \sum_{o_t=1} \log^2(1 + \epsilon_t) \right) - \sum_{o_t=1} \log(z_t) + \sum_{o_t=1} \log(p_t) + \sum_{o_t=0} \log(1 - p_t), \quad \text{(B.8)}$$
where $T_1$ is the number of non-zero observations. Finally, taking that $T = T_0 + T_1$, this becomes:
$$\ell(\theta, \sigma^2 | Y) = -\left( \frac{T}{2} \log(2\pi\sigma^2) + \frac{T_0}{2} + \frac{1}{2\sigma^2} \sum_{o_t=1} \log^2(1 + \epsilon_t) \right) - \sum_{o_t=1} \log(z_t) + \sum_{o_t=1} \log(p_t) + \sum_{o_t=0} \log(1 - p_t). \quad \text{(B.9)}$$
Maximising the likelihood (B.9) with respect to the scale parameter $\sigma^2$, the following estimate of the parameter is obtained:
$$\hat{\sigma}^2 = \frac{1}{T} \sum_{o_t=1} \log^2(1 + \epsilon_t). \quad \text{(B.10)}$$
Finally, the probability $p_t$ is also not known and can be concentrated out of the likelihood, leading to the final expected concentrated log-likelihood:
$$\ell(\theta, \hat{\sigma}^2 | Y) = -\frac{1}{2} \left( T \log(2\pi e \hat{\sigma}^2) + T_0 \right) - \sum_{o_t=1} \log(z_t) + \sum_{o_t=1} \log(\hat{p}_t) + \sum_{o_t=0} \log(1 - \hat{p}_t). \quad \text{(B.11)}$$
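For illustration, the concentrated log-likelihood (B.11), together with the scale estimate (B.10), can be sketched as a small function. The function name and the toy data below are hypothetical, and the demand size errors $\epsilon_t$ are taken as given rather than filtered from a fitted model:

```python
import math

def iets_concentrated_loglik(z, o, eps, p_hat):
    """Expected concentrated log-likelihood (B.11) of iETS(M,N,N).

    z, o, eps, p_hat are equal-length lists; z and eps are only used
    at the observations where o == 1 (demand sizes are unobserved otherwise).
    """
    T = len(o)
    T0 = o.count(0)
    # Concentrated scale parameter (B.10)
    sigma2 = sum(math.log(1 + e) ** 2
                 for ot, e in zip(o, eps) if ot == 1) / T
    ll = -0.5 * (T * math.log(2 * math.pi * math.e * sigma2) + T0)
    ll -= sum(math.log(zt) for ot, zt in zip(o, z) if ot == 1)
    ll += sum(math.log(pt) if ot == 1 else math.log(1 - pt)
              for ot, pt in zip(o, p_hat))
    return ll

# Toy example: five observations, two of them zeroes
ll = iets_concentrated_loglik(z=[2.0, 0.0, 3.0, 1.5, 0.0],
                              o=[1, 0, 1, 1, 0],
                              eps=[0.1, 0.0, -0.2, 0.05, 0.0],
                              p_hat=[0.7, 0.3, 0.8, 0.6, 0.4])
assert math.isfinite(ll)
```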
Appendix C. Proxies for the error terms of the iETS model
The probability of occurrence in the model (21) at time $t$ is defined as:
$$p_t = \frac{a_t}{a_t + b_t}. \quad \text{(C.1)}$$
Based on this formula, either $a_t$ or $b_t$ can be calculated if the probability is known:
$$a_t = b_t \frac{p_t}{1 - p_t} \quad \text{(C.2)}$$
and
$$b_t = a_t \frac{1 - p_t}{p_t}. \quad \text{(C.3)}$$
Now, when the probability of occurrence $\hat{p}_t$ is calculated on observation $t$, we can calculate the error, assuming that when $o_t = 1$ the probability should be as close to one as possible, and that when $o_t = 0$ the probability should be as close to zero as possible. Based on this idea, the following error can be calculated:
$$v_t = o_t - \hat{p}_t. \quad \text{(C.4)}$$
This error lies in the interval (-1, 1), depending on the values of $o_t$ and $\hat{p}_t$. We then transform the variable in order to make it lie in the interval (0, 1):
$$u_t = \frac{1 + v_t}{2}. \quad \text{(C.5)}$$
In this way, the value $u_t = 0.5$ corresponds to the ideal situation, when the predicted and the actual outcomes are equal (e.g. demand has occurred and the probability is equal to one). In the boundary case when $\hat{p}_t = 1$ and $o_t = 0$, the error proxy (C.5) is equal to 0, while in the opposite case it is equal to 1. In all the other cases it lies between zero and one.
Now, inserting this new variable $u_t$ in (C.2) and (C.3) instead of $p_t$, and setting the unobserved $a_t$ and $b_t$ to one (which implies independence of the two models), we obtain the following two proxies of the error terms:
$$1 + e_{a,t} = \frac{u_t}{1 - u_t} \quad \text{(C.6)}$$
and
$$1 + e_{b,t} = \frac{1 - u_t}{u_t} \quad \text{(C.7)}$$
respectively.
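These proxies can be sketched directly. The clipping of $u_t$ away from the boundaries below is a practical safeguard of ours (to keep the ratios finite in the boundary cases), not part of the derivation:

```python
def occurrence_error_proxies(o_t, p_hat_t, clip=1e-10):
    """Error proxies (C.4)-(C.7) for the occurrence part of iETS."""
    v_t = o_t - p_hat_t          # (C.4), lies in (-1, 1)
    u_t = (1 + v_t) / 2          # (C.5), lies in (0, 1)
    u_t = min(max(u_t, clip), 1 - clip)      # keep the ratios finite
    return u_t / (1 - u_t), (1 - u_t) / u_t  # (C.6) and (C.7)

# Perfectly predicted occurrence: u_t = 0.5, so both proxies equal one
ea, eb = occurrence_error_proxies(o_t=1, p_hat_t=1.0)
assert ea == 1.0 and eb == 1.0
```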
Appendix D. Quantiles of rounded up random variables
Before proceeding with the proof, we need to give the definition of the quantiles of the continuous and the rounded up random variables:
$$P(z_t < k) = 1 - \alpha \quad \text{(D.1)}$$
and
$$P(\lceil z_t \rceil \leq n) \geq 1 - \alpha, \quad \text{(D.2)}$$
where $n$ is the quantile of the distribution of rounded up values (the smallest integer number that satisfies the inequality (D.2)) and $k$ is the quantile of the continuous distribution of the variable.
In order to prove that $n = \lceil k \rceil$, we need to use the following basic property:
$$\lceil z_t \rceil \leq n \iff z_t \leq n, \quad \text{(D.3)}$$
which means that the rounded up value will always be less than or equal to $n$ if and only if the original value is less than or equal to $n$. Taking (D.3) into account, the probability (D.2) can be rewritten as:
$$P(z_t \leq n) \geq 1 - \alpha. \quad \text{(D.4)}$$
Note also that the following is true:
$$P(\lceil z_t \rceil \leq n - 1) = P(z_t \leq n - 1) < 1 - \alpha. \quad \text{(D.5)}$$
Taking the inequalities (D.1), (D.2), (D.4) and (D.5) into account, the following can be summarised:
$$P(z_t \leq n - 1) < P(z_t < k) \leq P(z_t \leq n), \quad \text{(D.6)}$$
which is possible only when $k \in (n - 1, n]$, which means that $\lceil k \rceil = n$. So the rounded up quantile of the continuous random variable $z_t$ will always be equal to the quantile of the discretised value of $z_t$:
$$\lceil Q_\alpha(z_t) \rceil = Q_\alpha(\lceil z_t \rceil). \quad \text{(D.7)}$$
It is also worth noting that the same result can be obtained with the floor function instead of the ceiling, following the same logic. So the following equation will hold for all $z_t$ as well:
$$\lfloor Q_\alpha(z_t) \rfloor = Q_\alpha(\lfloor z_t \rfloor). \quad \text{(D.8)}$$
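The equality (D.7) also holds for empirical quantiles, since the ceiling function is non-decreasing and therefore preserves the ordering of a sample. A small numerical check (with an assumed log normal sample) illustrates this:

```python
import math
import random

random.seed(1)
n = 100_001
# z_t ~ logN(0, 1); zs is sorted, and ceiling preserves that ordering
zs = sorted(math.exp(random.gauss(0.0, 1.0)) for _ in range(n))
ceiled = [math.ceil(z) for z in zs]  # still sorted

# Compare ceil(Q_alpha(z_t)) with Q_alpha(ceil(z_t)) at several levels
all_match = all(math.ceil(zs[int(q * (n - 1))]) == ceiled[int(q * (n - 1))]
                for q in (0.5, 0.9, 0.95, 0.99))
assert all_match
```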
References
Akram, M., Hyndman, R. J., Ord, J. K., 2009. Exponential smoothing and non-negative data. Australian & New Zealand Journal of Statistics 51 (4), 415–432.
Bårdsen, G., Lütkepohl, H., 2009. Forecasting Levels of log Variables in Vector Autoregressions.
Barnea, O., Solow, A. R., Stone, L., 2006. On fitting a model to a population time series with missing values. Israel Journal of Ecology & Evolution 52, 1–10.
Barr, D. R., Sherrill, E. T., 1999. Mean and variance of truncated normal distributions. American Statistician 53 (4), 357–361.
Bi, Z., Faloutsos, C., Korn, F., 2001. The "DGX" distribution for mining massive, skewed data. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '01. ACM Press, pp. 17–26.
Chakraborty, S., 2015. Generating discrete analogues of continuous probability distributions - A survey of methods and constructions. Journal of Statistical Distributions and Applications 2 (1), 6.
Chatfield, C., Koehler, A. B., Ord, J. K., Snyder, R. D., 2001. A New Look at Models for Exponential Smoothing. Journal of the Royal Statistical Society, Series D (The Statistician) 50 (2), 147–159.
Croston, J. D., 1972. Forecasting and Stock Control for Intermittent Demands. Operational Research Quarterly 23 (3), 289.
Davydenko, A., Fildes, R., 2013. Measuring Forecasting Accuracy: The Case of Judgmental Adjustments to SKU-Level Demand Forecasts. International Journal of Forecasting 29 (3), 510–522.
Demšar, J., 2006. Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research 7, 1–30.
Dirac, P. A. M., 1927. The Physical Interpretation of the Quantum Dynamics. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 113 (765), 621–641.
Eaves, A., Kingsman, B., 2004. Forecasting for the ordering and stock-holding of spare parts. Journal of the Operational Research Society 55 (4), 431–437.
Fenton, L. F., 1960. The Sum of Log-Normal Probability Distributions in Scatter Transmission Systems. IRE Transactions on Communications Systems 8 (1), 57–67.
Gneiting, T., Raftery, A. E., 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102 (477), 359–378.
Hyndman, R. J., Koehler, A. B., Ord, J. K., Snyder, R. D., 2008. Forecasting with Exponential Smoothing. Springer Series in Statistics. Springer Berlin Heidelberg.
Hyndman, R. J., Koehler, A. B., Snyder, R. D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting 18 (3), 439–454.
Johnson, N. L., 1949. Systems of Frequency Curves Generated by Methods of Translation. Biometrika 36 (1/2), 149–176.
Kachour, M., 2014. On the rounded integer-valued autoregressive process. Communications in Statistics - Theory and Methods 43 (2), 355–376.
Kachour, M., Yao, J. F., 2009. First-order rounded integer-valued autoregressive (RINAR(1)) process. Journal of Time Series Analysis 30 (4), 417–448.
Kourentzes, N., 2014. On intermittent demand model optimisation and selection. International Journal of Production Economics 156, 180–190.
Kourentzes, N., Petropoulos, F., 2016. tsintermittent: Intermittent Time Series Forecasting. R package version 1.9. URL https://CRAN.R-project.org/package=tsintermittent
Lazo, A., Rathie, P., 1978. On the entropy of continuous probability distributions (Corresp.). IEEE Transactions on Information Theory 24 (1), 120–122.
Ord, J. K., Koehler, A. B., Snyder, R. D., 1997. Estimation and prediction for a class of dynamic nonlinear statistical models. Journal of the American Statistical Association 92, 1621–1629.
Petropoulos, F., Kourentzes, N., 2015. Forecast combinations for intermittent demand. Journal of the Operational Research Society 66 (6), 914–924.
R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
Sangal, B., Biswas, A. K., 1970. The 3-Parameter Lognormal Distribution and Its Applications in Hydrology. Water Resources Research 6 (2), 505–515.
Schultz, C. R., 1987. Forecasting and Inventory Control for Sporadic Demand under Periodic Review. Journal of the Operational Research Society 38 (5), 453–458.
Shenstone, L., Hyndman, R. J., 2005. Stochastic models underlying Croston's method for intermittent demand forecasting. Journal of Forecasting 24 (6), 389–402.
Snyder, R. D., 1985. Recursive Estimation of Dynamic Linear Models. Journal of the Royal Statistical Society, Series B (Methodological) 47 (2), 272–276.
Snyder, R. D., 2002. Forecasting sales of slow and fast moving inventories. European Journal of Operational Research 140 (3), 684–699.
Snyder, R. D., Ord, J. K., Beaumont, A., 2012. Forecasting the intermittent demand for slow-moving inventories: A modelling approach. International Journal of Forecasting 28 (2), 485–496.
Svetunkov, I., 2019a. counter: Modelling and forecasting of count data. R package version 0.2.0.41001. URL https://github.com/config-i1/counter
Svetunkov, I., 2019b. smooth: Forecasting Using Smoothing Functions. R package version 2.5.0. URL https://github.com/config-i1/smooth
Syntetos, A. A., Boylan, J. E., 2001. On the bias of intermittent demand estimates. International Journal of Production Economics 71 (1-3), 457–466.
Syntetos, A. A., Boylan, J. E., 2005. The accuracy of intermittent demand estimates. International Journal of Forecasting 21 (2), 303–314.
Teunter, R. H., Syntetos, A. A., Babai, M. Z., 2011. Intermittent demand: Linking forecasting to inventory obsolescence. European Journal of Operational Research 214 (3), 606–615.
Zaninetti, L., 2017. A Left and Right Truncated Lognormal Distribution for the Stars. Advances in Astrophysics 2 (3).