
# Probabilistic forecasting of seasonal time series - Combining clustering and classification for forecasting


Colin Leverger (1,2), Thomas Guyet (3), Simon Malinowski (4), Vincent Lemaire (2), Alexis Bondu (2), Laurence Rozé (5), Alexandre Termier (4), and Régis Marguerie (2)
(1) IRISA, F-35042 Rennes Cedex, France
(2) Orange Labs
(3) Institut Agro/IRISA
(4) Université Rennes 1/Inria/IRISA
(5) INSA/Inria/IRISA
Abstract. In this article, we propose a framework for seasonal time series probabilistic forecasting. It aims at forecasting (in a probabilistic way) the whole next season of a time series, rather than only the next value. Probabilistic forecasting consists of forecasting a probability distribution function for each future position. The proposed framework combines several machine learning techniques 1) to identify typical seasons and 2) to forecast a probability distribution of the next season. This framework is evaluated on a wide range of real seasonal time series. On the one hand, we extensively study the alternative combinations of the algorithms composing our framework (clustering, classification); on the other hand, we evaluate the forecasting accuracy of the framework. As demonstrated by our experiments, the proposed framework outperforms competing approaches by achieving lower forecasting errors.
Keywords: Time series, Probabilistic forecasting, Seasonality
1 Introduction
Forecasting the evolution of a temporal process is a critical research topic with many challenging applications. In this work, we focus on time series forecasting and on data-driven forecasting models. A time series is a timestamped sequence of numerical values, and the goal of forecasting is, at a given point in time, to predict the next values of the time series based on previously observed values and possibly on other linked exogenous observations. Data-driven algorithms predict future time series values from past data, with models that adapt automatically to the incoming data. The data science challenge is to learn accurate and reliable forecasting models with as little human intervention as possible. Time series forecasting has many applications: in medicine (for instance, to forecast the blood glucose of a patient [1]), in economics (for instance, to forecast changes in macroeconomic variables [2]), in the financial domain (forecasting financial time series [3]), in electricity load forecasting [4] or in industry (for instance, to forecast server load [5,6]).
2 Leverger et al.
Time series forecasting algorithms provide information about possible future situations and can be used to anticipate crucial decisions. Taking correct decisions requires both anticipation and accurate forecasts. Unfortunately, these objectives are often contradictory: the longer the forecasting horizon, the wider the range of plausible situations. In such cases, a probabilistic forecasting algorithm is a powerful decision support tool, because it quantifies the uncertainty of the predictions. Probabilistic, or density, forecasting is a class of forecasting that provides intervals or probability distributions as outcomes. It is noted in [7] that probabilistic forecasts have become widely used in recent years. For instance, fan charts [8], highest density regions [9] or functional data analysis [10] make it possible to forecast ranges of plausible values for future data.
We are particularly interested in time series whose values show periodic regularities. Such time series are said to be seasonal. For instance, time series related to human activities or natural phenomena are often seasonal, because they exhibit daily regularities (also known as the circadian cycle). Knowing that a time series is seasonal is valuable information for forecasting. More specifically, learning the seasonal structures can help to generate longer-term predictions, as it provides information about several seasons. Furthermore, the seasonality of a time series gives a natural midterm forecasting horizon. Classical forecasting models (e.g., SARIMA [11]) predict the future values of a time series stepwise, and each predicted value is used as input by the following steps. Because of this recursion, errors may accumulate from one step to the next. Predicting a whole season at once aims at spreading the forecasting error along the season. Thus, we expect to forecast more accurately the salient parts of a season, which may lie in its middle. More practically, predicting a whole season at once supports applications that require such a prediction to plan actions (e.g., planning electricity production a day ahead requires predicting the consumption for the next 24 hours).
A second limitation of usual seasonal forecasting methods is the assumption that all seasons have the same shape, i.e., the values evolve in the same way over each season, and the differences between seasons are only due to noise and an additive constant. Nevertheless, real seasonal time series often contain more than one periodic pattern. For instance, daily connections to a given website exhibit different patterns on weekdays and on Sundays. Classical forecasting methods cannot capture this kind of structure well.
In this article, we propose a generic framework called P-F2C (which stands for “Probabilistic Forecasting with Clustering and Classification”) for seasonal time series forecasting. This approach extends the F2C framework [6] (which stands for “Forecasting with Clustering and Classification”). P-F2C predicts the future values of a complete season ahead at once, in a probabilistic manner. The P-F2C predictions may support decision-making about the next season, handling the uncertainty of the future through the probabilistic presentation of the result.
Probabilistic forecasting of seasonal time series 3
2 Probabilistic seasonal time series forecasting
In this section, we introduce the notations and the problem of seasonal time
series forecasting.
2.1 Seasonal time series
A time series $Y$ is an ordered sequence of values $y_{0:n} = y_0, \ldots, y_{n-1}$, where $\forall i \in [0, n-1]$, $y_i \in \mathbb{R}$ (univariate time series). $n$ denotes the length of the observed time series.
$Y$ is said to be (ideally) seasonal with season length $s$ if there exists a finite collection $\mathcal{S} = \{S_1, \ldots, S_p\}$ of $p$ sub-series (of length $s$), called typical seasons, such that

$$\forall i \in [0, m-1],\quad y_{(s\times i):(s\times(i+1))} = \sum_{j=1}^{p} \sigma_{i,j}\, S_j + \varepsilon_i \qquad (1)$$
where $m$ is the number of seasons in the time series, $\varepsilon_i \in \mathbb{R}^s$ represents white noise and $\sum_j \sigma_{i,j} = 1$ for all $i$. In other words, for a seasonal time series $Y$, every season of $Y$ is a weighted linear combination of typical seasons. Intuitively, this modelling corresponds to additive measurements (e.g., consumption or traffic), for which the measure observed at time $t$ is the sum of individual behaviours. In this case, a typical season corresponds to a typical behaviour of individuals, and $\sigma_{i,j}$ represents the proportion of individuals of type $j$ contributing to the measure observed during season $i$.
In the following, $\mathbf{y}_i = y_{(s\times i):(s\times(i+1))} \in \mathbb{R}^s$ denotes the $i$-th season of $Y$.
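To make Eq. (1) concrete, the sketch below builds a synthetic series in which every season is a weighted sum of typical seasons plus white noise. It is a minimal illustration assuming NumPy; `generate_seasonal_series` and `split_seasons` are hypothetical helpers, not part of P-F2C, and Gaussian noise is an assumed choice of white noise.

```python
import numpy as np

def generate_seasonal_series(typical_seasons, weights, noise_std=0.1, seed=0):
    """Build a series following Eq. (1) (sketch).

    typical_seasons: array (p, s), one typical season S_j per row.
    weights: array (m, p), row i holds sigma_{i,1..p}; each row must sum to 1.
    Gaussian noise is an assumption standing in for the white noise epsilon_i.
    """
    rng = np.random.default_rng(seed)
    S = np.asarray(typical_seasons, dtype=float)
    sigma = np.asarray(weights, dtype=float)
    if not np.allclose(sigma.sum(axis=1), 1.0):
        raise ValueError("each row of weights must sum to 1")
    seasons = sigma @ S                                          # (m, s) noise-free seasons
    seasons = seasons + rng.normal(0.0, noise_std, size=seasons.shape)  # epsilon_i
    return seasons.reshape(-1)                                   # y_0, ..., y_{n-1} with n = m * s

def split_seasons(y, s):
    """Return the seasons y_i = y[s*i : s*(i+1)] as the rows of an (m, s) array."""
    y = np.asarray(y)
    m = len(y) // s
    return y[: m * s].reshape(m, s)
```

With one-hot rows in `weights`, every season is exactly one typical season plus noise, which is the situation of the synthetic data used in Section 4.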
2.2 Seasonal probabilistic forecasting
Let $Y = y_0, \ldots, y_{n-1}$ be a seasonal time series, and $s$ be its season length. Note that the season length $s$ of a time series is estimated using Fisher's g-statistic [12]. Without loss of generality, we assume that the length of the time series is a multiple of the season length, i.e., $n = m \times s$, where $m$ denotes the number of seasons in the observed time series. The goal of seasonal probabilistic forecasting is to estimate

$$\Pr\left(y^{*}_{n:n+s} \mid y_{(n-\gamma\times s):n}\right) = \Pr\left(\mathbf{y}^{*}_{m} \mid \mathbf{y}_{(m-\gamma):m}\right) \qquad (2)$$

where $\mathbf{y}^{*}_{m} = y^{*}_{n:n+s}$ are the forecasts of the $s$ next values (next season) of the observed time series, and $\mathbf{y}_{(m-\gamma):m} = y_{(n-\gamma\times s):n}$ are the observed values of the last $\gamma$ seasons. $\gamma$ is a parameter given by the user.
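The season length $s$ is estimated with Fisher's g-statistic [12], the ratio between the largest periodogram ordinate and the sum of all ordinates. The NumPy sketch below computes only the statistic and the period of the dominant Fourier frequency (assumed to be the season length $n/k$); it does not compute the p-value of the associated test described in [12], and the function name is hypothetical.

```python
import numpy as np

def fisher_g_season_length(y):
    """Estimate a season length with Fisher's g-statistic (sketch).

    Returns (season_length, g) where g = max_k I(w_k) / sum_k I(w_k) over the
    Fourier frequencies w_k = 2*pi*k/n, k = 1..floor((n-1)/2).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    y = y - y.mean()                          # remove the mean (k = 0 term)
    spectrum = np.abs(np.fft.rfft(y)) ** 2    # periodogram, up to a constant factor
    q = (n - 1) // 2                          # drop k = 0 and the Nyquist term
    ordinates = spectrum[1 : q + 1]
    k_star = int(np.argmax(ordinates)) + 1    # dominant Fourier index
    g = ordinates[k_star - 1] / ordinates.sum()
    return int(round(n / k_star)), float(g)
```

A g value close to 1 means that a single frequency dominates the periodogram, i.e., a strongly seasonal series.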
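We now propose an equivalent formulation of this problem based on our hypothesis on seasonal time series. We denote by $\mathcal{S} = \{S_1, \ldots, S_p\}$ the set of $p$ typical seasons. Assuming that the values of the next season depend on the past observations only through the type of that season, the law of total probability allows Equation 2 to be rewritten as follows:

$$\Pr\left(\mathbf{y}^{*}_{m} \mid \mathbf{y}_{(m-\gamma):m}\right) = \sum_{S \in \mathcal{S}} \Pr\left(\mathbf{y}^{*}_{m} \mid S\right) \cdot \Pr\left(S \mid \mathbf{y}_{(m-\gamma):m}\right) \qquad (3)$$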
Fig. 1: Difference between clustering (on the left, “complete match”), which matches entire time series, and coclustering (on the right, “submatch 1” and “submatch 2”), which is able to match subintervals of a time series with subintervals of various other time series.
where $\Pr(\mathbf{y}^{*}_{m} \mid S)$ is the probability of observing $\mathbf{y}^{*}_{m}$ given that the next season is of type $S$, and $\Pr(S \mid \mathbf{y}_{(m-\gamma):m})$ is the probability that the next season is of type $S$ given past observations.
The problem formulation given by Eq. 3 turns the difficult problem of Eq. 2 into two well-known tasks in time series analysis:
- estimating the first term, $\Pr(\mathbf{y}^{*}_{m} \mid S)$, leads to a time series clustering problem. The problem is both to define the typical seasons $\mathcal{S}$ and to obtain the distributions of the season values. A clustering of the seasons $(\mathbf{y}_i)_{i=0:m}$ of the observed time series identifies the typical seasons (clusters) and gives the required empirical distributions $\hat{\Pr}(\mathbf{y} \mid S)$;
- estimating the second term, $\Pr(S \mid \mathbf{y}_{(m-\gamma):m})$, is a probabilistic time series classification problem. This distribution can be learnt empirically from the past observations $(\mathbf{y}_{(i-\gamma):i}, S^{*}_{i+1})_{i=\gamma:m}$, where $S^{*}_{i}$ denotes the empirical type of the $i$-th season obtained from the clustering assignment above.
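Equation 3 is a mixture: the forecast for each position of the next season averages the per-type distributions, weighted by the type probabilities returned by the classifier. The sketch below assumes that each per-type distribution is stored as a histogram over a common set of value bins; this array layout and the function name are our own illustration, not the P-F2C code.

```python
import numpy as np

def mixture_forecast(type_distributions, type_probabilities):
    """Combine per-type forecasts following Eq. (3) (sketch).

    type_distributions: array (p, s, B); entry [j, t, b] approximates
        Pr(value at position t of the next season is in bin b | S_j).
    type_probabilities: array (p,), classifier output Pr(S_j | last gamma seasons).
    Returns an (s, B) array: the predictive distribution of each position.
    """
    dist = np.asarray(type_distributions, dtype=float)
    w = np.asarray(type_probabilities, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("type probabilities must sum to 1")
    # sum_j Pr(S_j | past) * Pr(y | S_j), position by position
    return np.tensordot(w, dist, axes=(0, 0))
```

When the classifier hesitates between two types, positions where the two types agree keep a sharp distribution, while positions where they differ receive two separate modes rather than a single blurred one.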
This problem formulation sketches the principles of probabilistic seasonal time series forecasting. P-F2C implements these principles with a specific time series clustering method.
3 The P-F2C forecaster
P-F2C is composed of a clusterer that models the latent typical seasons and a classifier that predicts the next season type given the recent data. The forecaster is fitted on the historical data of a time series. Then, it can be applied to the time series to predict the next season(s).
The P-F2C clusterer is based on a probabilistic coclustering model that is presented in the next section. In Section 3.2, we present how to use classical classifiers to predict the next seasons.
3.1 Coclustering of time series: a probabilistic model
Coclustering is a particular type of unsupervised algorithm which differs from regular clustering approaches by creating co-clusters. The objective of coclustering approaches is to simultaneously partition the rows and the columns of an input data table. Thus, a co-cluster is defined as a set of examples belonging to both a group of rows and a group of columns. In [13], Boullé proposed an extension of coclustering to tri-clustering in order to cluster time series. In this approach, a time series with an identifier $C$ is seen as a set of pairs $(T, V)$, where $T$ is a timestamp and $V$ a measured value. Thus, the whole set of time series is a large set of points represented by triples $(C, T, V)$. The tri-clustering approach handles the three variables ($C$ is categorical, $T$ and $V$ are numerical) to create homogeneous groups. A co-cluster gathers time series (a group of identifiers) that have similar values during a certain interval of time. Contrary to classical clustering approaches (e.g., k-means, k-Shape, GAK) [14], which are based on entire time series, the coclustering approach uses a local criterion. This difference is illustrated in Figure 1: a distance-based clustering (on the left) evaluates the distance between whole time series, whereas in coclustering approaches (on the right) the distance is based on subintervals of the seasons. This makes it possible to identify which parts of the season are the most discriminant. Besides, tri-clustering is robust to missing values in time series.
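The $(C, T, V)$ representation is easy to build from a matrix of seasons. In the sketch below (a hypothetical helper assuming NumPy; the exact input format expected by Khiops is not shown), each season becomes an identifier $C$, each position a timestamp $T$, and missing values are simply absent from the set of triples, which is why the representation tolerates gaps.

```python
import numpy as np

def seasons_to_triples(seasons):
    """Flatten an (m, s) array of seasons into (C, T, V) triples (sketch).

    C is the season identifier (categorical), T the position within the
    season and V the observed value. Missing values (NaN) are dropped, so a
    season with gaps simply contributes fewer points.
    """
    seasons = np.asarray(seasons, dtype=float)
    m, s = seasons.shape
    c_ids, t_pos = np.meshgrid(np.arange(m), np.arange(s), indexing="ij")
    observed = ~np.isnan(seasons)
    return list(zip(c_ids[observed].tolist(),
                    t_pos[observed].tolist(),
                    seasons[observed].tolist()))
```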
The tri-clustering approach of Boullé is based on the MODL framework [10]. The MODL framework makes a piecewise constant assumption to estimate the joint distribution $\Pr(C, T, V)$ by jointly discretising the variables $T$ and $V$ and grouping the time series identifiers of the variable $C$. The resulting model consists of the Cartesian product⁶ of the three partitions of the variables $C$, $T$, $V$. This model can be represented as a 3D grid (see Figure 2, on the left). In this 3D grid, if one considers a given group of time series (i.e., a given group of $C$), the model provides a bivariate discretisation which estimates

$$\Pr(T, V \mid C) = \frac{\Pr(C, T, V)}{\Pr(C)}$$

as a 2D grid (see Figure 2, on the right). This 2D grid gives the probability of observing a given range of values during a given interval of time. Therefore, knowing that a time series belongs to a given cluster, the corresponding 2D grid may be used to craft forecasts (see next section).
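A per-group 2D grid can be approximated with ordinary 2D histograms. In this sketch (assuming NumPy; the function name is hypothetical), all groups share the same cuts of $T$ and $V$, as in MODL, but the cuts are given by the caller instead of being selected by the MAP criterion, so this is a simplified stand-in rather than the MODL estimator. Points outside the given edges are ignored by `np.histogram2d`.

```python
import numpy as np

def conditional_grids(c, t, v, t_edges, v_edges):
    """Estimate Pr(T, V | C) as one 2D histogram per group of time series (sketch).

    c, t, v: 1D arrays with the group label, time position and value of each point.
    t_edges, v_edges: interval edges shared by every group (given here, not
    optimised as in MODL).
    Returns {label: array of shape (len(t_edges) - 1, len(v_edges) - 1) summing to 1}.
    """
    c = np.asarray(c)
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    grids = {}
    for label in np.unique(c):
        in_group = c == label
        counts, _, _ = np.histogram2d(t[in_group], v[in_group], bins=[t_edges, v_edges])
        grids[label.item()] = counts / counts.sum()   # Pr(T-interval, V-interval | C = label)
    return grids
```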
In the MODL approach, finding the most probable tri-clustering model turns into a model selection problem. To do so, a Bayesian approach called Maximum A Posteriori (MAP) is used to select the most probable model given the data. Details about how this 3D grid model is learned may be found in [13,15]. The main idea can be summarised as finding the grid that maximises the contrast with a grid based on the assumption that $T$, $V$ and $C$ are independent (i.e., comparing $\Pr(V, T, C)$ with $\Pr(V)\Pr(T)\Pr(C)$). The estimation of this MAP model therefore outputs: (i) $\nu$ intervals of values $V_i = [v^{l}_{i}, v^{u}_{i}]$ for $i = 1, \ldots, \nu$; (ii) $\tau$ intervals of time $T_i = [t^{l}_{i}, t^{u}_{i}]$ for $i = 1, \ldots, \tau$; (iii) groups of time series. These groups of time series correspond to the typical seasons, denoted $\mathcal{S}$ in the above model. $|\mathcal{S}|$ is the number of clusters at the finest level that is optimal in the sense of the MODL framework.

⁶ The Cartesian product of the three partitions is used as a piecewise constant estimator, i.e., a 3D histogram.

Fig. 2: Illustration of a trivariate coclustering model from which a slice, referred to as the forecasting “grid”, is extracted.
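The contrast with the independence assumption can be illustrated by the Kullback-Leibler divergence between a 3D grid $\Pr(C, T, V)$ and the product of its marginals (also called total correlation). This is only an illustration of the idea, not the MODL criterion: the actual MAP cost [13,15] also penalises grid complexity, whereas the raw divergence below never decreases when the grid is refined, so maximising it alone would favour ever finer grids. The function name is hypothetical.

```python
import numpy as np

def total_correlation(counts):
    """KL divergence between a 3D grid Pr(C, T, V) and Pr(C) Pr(T) Pr(V) (sketch).

    counts: 3D array of point counts per (C group, T interval, V interval) cell.
    Returns 0 when the three variables are independent on the grid.
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    pc = p.sum(axis=(1, 2))
    pt = p.sum(axis=(0, 2))
    pv = p.sum(axis=(0, 1))
    independent = pc[:, None, None] * pt[None, :, None] * pv[None, None, :]
    nonzero = p > 0                                   # empty cells contribute 0
    return float(np.sum(p[nonzero] * np.log(p[nonzero] / independent[nonzero])))
```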
In the time series forecasting approach proposed in this paper, the number of (tri-)clusters is optimised with respect to the forecasting task. More precisely, this number is optimised according to the performance of the model at prediction time, using the validation set. This value may differ from $|\mathcal{S}|$. To this end, the MODL coclustering approach allows applying a hierarchical clustering to the finest level in order to obtain a coarser level with fewer clusters, $C < |\mathcal{S}|$. A grid search selects the value of $C$ based on the forecast accuracy on the validation dataset.
Let us now come back to the formalisation of probabilistic time series forecasting: $\hat{\Pr}(\mathbf{y}^{*}_{m} \mid S)$ is estimated by the MODL model from the conditional probabilities $\Pr(V, T \mid C = S)$, where $S$ denotes one of the time series groups, i.e., a typical season. In practice, the grid is used to estimate the distribution of values at each time point of a season. With MODL, this distribution is a piecewise constant function.
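Reading the distribution of values at a given time point from a 2D grid amounts to selecting the row of the time interval containing that point and normalising it. The sketch below (hypothetical helpers assuming NumPy) also turns the piecewise constant density into quantiles, assuming values are uniform inside each value interval, which is one way to derive, e.g., a 10% to 90% band for every position of the forecast season.

```python
import numpy as np

def value_density_at(grid, t_edges, v_edges, t):
    """Piecewise constant distribution of the value at time position t (sketch).

    grid: (n_t, n_v) array proportional to Pr(T-interval, V-interval | S) for one
    typical season S; t_edges / v_edges: the interval edges of T and V.
    Returns the probability of each V interval and the constant density inside it.
    """
    grid = np.asarray(grid, dtype=float)
    v_edges = np.asarray(v_edges, dtype=float)
    row = int(np.searchsorted(t_edges, t, side="right")) - 1
    row = min(max(row, 0), grid.shape[0] - 1)        # clamp t to the grid
    probs = grid[row] / grid[row].sum()               # Pr(V-interval | T-interval, S)
    heights = probs / np.diff(v_edges)                # density = probability / width
    return probs, heights

def quantile_from_bins(probs, v_edges, q):
    """q-quantile when the value is uniform inside each V interval.

    Assumes every interval has a positive probability, so the CDF is strictly increasing.
    """
    cdf = np.concatenate([[0.0], np.cumsum(probs)])  # CDF at the interval edges
    return float(np.interp(q, cdf, v_edges))
```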
3.2 Predict the next type of season
The problem here is to estimate empirically $\Pr(S_{i+1} \mid \mathbf{y}_{(i-\gamma):i})$, the probability of having a season type $S_{i+1} \in \mathcal{S}$ for the $(i+1)$-th season given the observations over the $\gamma$ past seasons. We consider two different sets of features to represent the $\gamma$ previous seasons. The first approach uses only the time series values $\mathbf{y}_{(i-\gamma):i}$ as features. The second approach uses both the time series values and the types of the previous seasons as features.
Then, the next season prediction problem consists of learning a probabilistic classifier (naive Bayes classifier, logistic regression, decision tree or random forest) or a time series classifier (TSForest [16], Rocket [17]). Note that time series classifiers can only use the time series values.
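A minimal sketch of the two feature sets, assuming scikit-learn's `RandomForestClassifier` as the probabilistic classifier (one of the options listed above). `make_training_set` is a hypothetical helper and the toy seasons are synthetic, used only to show the shapes involved. Previous types are appended as raw integer labels, which suits tree-based models; a one-hot encoding would be more appropriate for logistic regression.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_training_set(seasons, types, gamma, use_types=False):
    """Build (features, target) pairs to learn Pr(next type | last gamma seasons).

    seasons: (m, s) array of observed seasons; types: length-m array of empirical
    season types (cluster labels). Row k of X holds the gamma seasons preceding
    season k, flattened; y[k] is the type of season k.
    """
    seasons = np.asarray(seasons, dtype=float)
    types = np.asarray(types)
    X, y = [], []
    for k in range(gamma, len(seasons)):
        feats = seasons[k - gamma:k].ravel()            # first feature set: values only
        if use_types:                                   # second feature set: values + types
            feats = np.concatenate([feats, types[k - gamma:k].astype(float)])
        X.append(feats)
        y.append(types[k])
    return np.array(X), np.array(y)

# Toy usage: three hypothetical typical seasons of length 4, types repeating 0, 0, 1, 2.
rng = np.random.default_rng(0)
typical = np.array([[0, 1, 2, 3], [3, 2, 1, 0], [0, 3, 0, 3]], dtype=float)
types = np.tile([0, 0, 1, 2], 25)
seasons = typical[types] + rng.normal(0.0, 0.1, size=(len(types), 4))

X, y = make_training_set(seasons, types, gamma=2, use_types=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:-1], y[:-1])                   # hold out the last example
proba = clf.predict_proba(X[-1:])[0]      # Pr(type | last 2 seasons), ordered as clf.classes_
```

The vector `proba` plays the role of $\Pr(S \mid \mathbf{y}_{(m-\gamma):m})$ in Eq. 3.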
3.3 Select the best parameters (Portfolio)
The P-F2C forecaster is parameterised by the number of past seasons ($\gamma$) used to learn the next season type, the maximum number of typical seasons to detect in an unsupervised way, and the type of classifier. The $\gamma$ parameter is introduced in the problem definition, and its choice is left to the user, who specifies the forecasting task. The other parameters, on the other hand, may be difficult for the user to set, and we do not expect one of the classifiers to outperform the others on all time series. For these reasons, the portfolio approach (denoted PP-F2C) implements a grid search for the best parameters by splitting the dataset into a training dataset (75%) and a validation dataset (25%). Once the best parameter values have been selected, the clusterer and the classifier are fitted on the entire dataset.

Fig. 3: At the top: typical seasons of length 10 used to generate the time series; at the bottom: examples of generated time series with white noise (7 seasons).
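The parameter search of the portfolio approach can be sketched as an exhaustive loop over a parameter grid. In this illustration, `fit_and_score` is a hypothetical callable that fits a forecaster on the training seasons and returns its validation error, and the 75%/25% split is assumed to be chronological; neither detail is taken from the P-F2C code. The final refit on the entire dataset is not shown.

```python
import itertools
import numpy as np

def portfolio_search(seasons, param_grid, fit_and_score, train_ratio=0.75):
    """Return the parameters with the lowest validation error (sketch).

    seasons: (m, s) array; the first 75% of the seasons are used for training
    and the remaining 25% for validation (chronological split assumed).
    param_grid: dict mapping a parameter name to its candidate values.
    fit_and_score: hypothetical callable (params, train, valid) -> error.
    """
    seasons = np.asarray(seasons)
    cut = int(round(train_ratio * len(seasons)))
    train, valid = seasons[:cut], seasons[cut:]
    names = sorted(param_grid)
    best_params, best_error = None, np.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = fit_and_score(params, train, valid)
        if error < best_error:                 # keep the first best on ties
            best_params, best_error = params, error
    return best_params, best_error
```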
4 Illustration on a synthetic dataset
This section shows results on synthetic data. The goal is to illustrate the probabilistic grid used in the P-F2C method and to give intuitions about the probabilistic forecasts provided by P-F2C. We compare the output of P-F2C with the output of DeepAR [18], a state-of-the-art probabilistic time series forecaster.
4.1 The data generated
Generating data is a good strategy for checking assumptions before launching experiments at scale. Indeed, the shape of generated data is often simpler and completely controlled. Experiments may be executed with various parameters to plot understandable results and to validate basic expectations.
The seasonal data generated for this section follow well-established seasonal sequences. Three different time series patterns are defined for three different latent types of season of length 10. In Figure 3, one type of season (s1, in orange) has always increasing values, another one (s2, in green) has two peaks, etc. These three types of season are then repeated 50 times in a defined order (s1, s1, s0, s2, s1, s1, s2, s0; see the bottom of Figure 3, which shows the entire sequence being repeated), and noise is added to the final time series to make the forecasting process less straightforward.
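A dataset of this kind can be generated in a few lines. The sketch below assumes NumPy; the exact shapes are hypothetical (the text only states that s1 always increases and that s2 has two peaks, so the s0 shape is an assumption), and "repeated 50 times" is interpreted as repeating the whole eight-season order 50 times.

```python
import numpy as np

# Hypothetical shapes: only "s1 always increases" and "s2 has two peaks" come
# from the text; the s0 shape and the exact formulas are assumptions.
_t = np.linspace(0.0, 1.0, 10)                    # season length 10
TYPICAL = {
    "s0": 1.0 - _t,                                # assumed: decreasing season
    "s1": _t,                                      # always increasing
    "s2": np.abs(np.sin(2 * np.pi * _t)),          # two peaks (near t = 0.25 and 0.75)
}
ORDER = ["s1", "s1", "s0", "s2", "s1", "s1", "s2", "s0"]

def generate(n_repeats=50, noise_std=0.05, seed=0):
    """Repeat the ORDER sequence n_repeats times and add Gaussian white noise."""
    rng = np.random.default_rng(seed)
    labels = ORDER * n_repeats
    clean = np.concatenate([TYPICAL[name] for name in labels])
    return clean + rng.normal(0.0, noise_std, size=clean.shape), labels
```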
4.2 Grid probabilistic forecasts
Once trained, the P-F2C forecaster is applied at the end of the time series illustrated at the bottom of Figure 3. Knowing the sequence of patterns, we can guess that a season of type s2 is coming next. Indeed, the last three patterns seem to follow the sequence [s1, s1, s0].
Figure 4 shows two examples of forecasts with different values of γ. The real values of the predicted time series are in blue (a noisy version of the s2 pattern). The probabilistic forecasts are shown as a red overlay: a set of rectangles that visualise the homogeneous regions identified by MODL coclustering. The darker the red, the more probable it is that the next season lies in this (T, V) interval.
The top left of Figure 4 shows the forecast obtained with γ = 1. It illustrates a probabilistic forecast with a lot of uncertainty: light red cells appear where the data are predicted to lie (with a low probability). In this case, the classifier is unable to predict the next type of season accurately. With γ = 1, the classifier only has information about the preceding season (of type s0). In the training data, the forecaster encountered two types of season after an s0 season, s1 or s2, with the same frequency. The predicted grid is thus a mixture of the two types of grid. For the first half of the season, the forecast confidently predicts the linear increase of the values (darker red cells), but for the second half, it suggests two possible behaviours: a continued linear increase (s1) or a decrease (s2). Note that the grids of all typical seasons share the same partition into cells: MODL necessarily creates the same cuts of each dimension (V or T) for all groups of time series (C).
The top right of Figure 4 shows the forecast obtained with γ = 3. It illustrates a good probabilistic forecast: the real values (in blue) often lie in the red boxes where the red is very dark. This means that the season type was both well described by MODL and well predicted by the classifier. In this case, the longer memory of the forecaster disambiguates the two possible choices described above: after [s1, s1, s0], the forecaster always observed seasons of type s2. Thus, the grid of this pattern is predicted.
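The effect of γ on the synthetic sequence can be checked by counting, for every context of γ consecutive season types, which type comes next. The sketch below assumes the season types are known exactly (in P-F2C they come from the clustering, and the classifier sees season values), so it only shows the information available to an ideal classifier; the function name is hypothetical.

```python
from collections import Counter, defaultdict

def next_type_frequencies(labels, gamma):
    """Empirical Pr(next season type | types of the gamma previous seasons)."""
    counts = defaultdict(Counter)
    for k in range(gamma, len(labels)):
        counts[tuple(labels[k - gamma:k])][labels[k]] += 1
    return {
        context: {kind: n / sum(counter.values()) for kind, n in counter.items()}
        for context, counter in counts.items()
    }

labels = ["s1", "s1", "s0", "s2", "s1", "s1", "s2", "s0"] * 50
after_s0 = next_type_frequencies(labels, gamma=1)[("s0",)]
# s2 follows s0 50 times and s1 follows it 49 times: close to an even split
after_s1_s1_s0 = next_type_frequencies(labels, gamma=3)[("s1", "s1", "s0")]
# {'s2': 1.0}: the longer context removes the ambiguity
```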
It is worth noting that, for γ = 1, the MODL probabilistic grid suggests two distinct possible evolutions of the time series, with an uncertainty about which one will actually occur. In classical probabilistic forecasts, probabilities are distributed around a mean time series. This is illustrated at the bottom of Figure 4 with DeepAR, which uses the 7 past seasons to predict the next season. On the second half of the season, the predicted probability distribution suggests a behaviour in between s1 and s2, with a larger uncertainty. Such a model confuses uncertainty about the behaviour with imprecision of the forecast. For seasonal time series with different types of season, this mean time series has no meaning for an analyst.

Fig. 4: One-season-ahead grid forecasts for the generated time series with γ = 1 (top left) and γ = 3 (top right), and the DeepAR forecast (bottom).
5 Experiments
This section presents experiments assessing the accuracy of P-F2C. We start by introducing the experimental settings, then we investigate some parameters of our model, and finally we present the results of an extensive comparison of P-F2C with competitors.
5.1 Experimental protocol
The framework has been developed in Python 3.5. The MODL coclustering is
performed by the Khiops tool [19]. The classiﬁcation algorithms are borrowed
from the sklearn library [20].
In our experiments, we used 36 datasets⁷ from various sources and of various natures: technical devices, human activities, electricity consumption, natural processes, etc. All these datasets were selected because their seasonality was identified and validated with a Fisher g-test [12]. Each time series is z-normalised prior to data splitting, in order to obtain comparable results. For the experiments, 90% of each time series is used to train the forecaster (this training set is internally split into training and validation datasets) and the remaining 10% is used to evaluate the accuracy.

⁷ …rary link). It includes the sources of the time series.

Fig. 5: Critical diagrams used to find the best parameters for the P-F2C implementation. Left: γ = 1, 2 and 3. Right: TimeSeriesForestClassifier, LogisticRegression, RandomForestClassifier, DecisionTreeClassifier and GaussianNB.
P-F2C and PP-F2C are compared with classical deterministic time series forecasters (AR, ARIMA, SARIMA, Holt-Winters), with LSTM [21], with Prophet [22] and with the F2C method [6], which uses the same principles as P-F2C but with the k-means clustering algorithm and random forest classifiers to learn the structure of the season sequence. As P-F2C is a probabilistic method, we also compare it with DeepAR [18].
We use the Mean Absolute Error (MAE) and the Continuous Ranked Probability Score (CRPS) to compare the forecasts with the real time series. The MAE is dedicated to deterministic forecasts, while the CRPS is dedicated to probabilistic ones. It is worth noting that the CRPS reduces to the absolute error for deterministic forecasts. Therefore, comparing MAE values of deterministic forecasts with CRPS values of probabilistic forecasts is technically sound [23]. The CRPS is used for DeepAR and P-F2C. All the other approaches forecast crisp time series, and their accuracy is evaluated through the MAE. For each experiment, we illustrate the results with critical difference diagrams. A critical difference diagram represents the mean ranks of the methods obtained on the set of 36 time series (the lower, the better). In addition, the representation shows horizontal bars that group some methods: within a group, the methods are not statistically different according to the Nemenyi test.
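The CRPS of a piecewise constant predictive distribution, such as a P-F2C grid column, can be computed from its definition $\mathrm{CRPS}(F, y) = \int (F(x) - \mathbb{1}[x \ge y])^2\,dx$. The sketch below (a hypothetical helper assuming NumPy) approximates the integral with the trapezoidal rule; it is not the evaluation code used in the experiments. A very narrow interval mimics a deterministic forecast and yields a CRPS close to the absolute error, which is why MAE and CRPS values can be compared.

```python
import numpy as np

def crps_histogram(probs, edges, y, n_points=20001):
    """CRPS of a piecewise constant (histogram) predictive distribution (sketch).

    probs: probability of each value interval; edges: len(probs) + 1 increasing
    interval edges; y: observed value. The predictive CDF is linear inside each
    interval (uniform density); the integral is approximated numerically.
    """
    probs = np.asarray(probs, dtype=float)
    edges = np.asarray(edges, dtype=float)
    cdf_at_edges = np.concatenate([[0.0], np.cumsum(probs) / probs.sum()])
    lo, hi = min(edges[0], y), max(edges[-1], y)     # integrand is 0 outside [lo, hi]
    x = np.linspace(lo, hi, n_points)
    F = np.interp(x, edges, cdf_at_edges)            # 0 below edges[0], 1 above edges[-1]
    f = (F - (x >= y)) ** 2
    dx = x[1] - x[0]
    return float(dx * (f[:-1] + f[1:]).sum() / 2.0)  # trapezoidal rule
```

The score of a forecast season is then the mean of the position-wise CRPS values, just as the MAE averages position-wise absolute errors.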
5.2 Parameter sensitivity
In this section, we analyse alternative settings of the P-F2C methodology. We investigate the effect of two choices: the value of γ, i.e., the number of past seasons considered; and the classifier used to predict the next type of season when the portfolio optimisation is not used.
Fig. 6: At the top: critical diagram of the comparison between the different prediction approaches (F2C, PP_F2C, P_F2C, LSTM, SARIMA, HOLTWINTERS, PROPHET, DeepAR, ARIMA; acronyms are detailed in the text). At the bottom: win-tie-loss graph between PP-F2C and DeepAR in terms of CRPS error (DeepAR wins: 5; PP-F2C wins: 27; p-value = 4.597e-06).
Figure 5 (left) shows a critical diagram that compares the ranking of P-F2C with different values of γ (1, 2 or 3). For this experiment, the classifier is the RandomForestClassifier (we obtained the same results with the other classifiers). We notice that the larger γ, the lower the error. Indeed, as seen in Section 4, a larger γ improves the accuracy of the forecast of the next season type. Nonetheless, we observed that for some time series, a lower γ may be better. We explain this counter-intuitive result by the short length of some of the time series. In these cases, the number of seasons in the training set is too small to fit the numerous parameters of a classifier with γ × s features.
Figure 5 (right) shows a critical diagram that compares the classifiers used to predict the next type of season. It shows that the time series forest classifier [16] is, on average, in first position. This classifier has been designed specifically for time series classification, which may explain why it outperforms the other approaches. Nonetheless, the differences with logistic regression and random forest are not statistically significant. Their ability to use extra information, such as the types of the previous seasons, may be an interesting advantage for improving performance.
5.3 P-F2C and PP-F2C vs competitors
The critical diagram of Figure 6 compares the performances of the methods. P-F2C denotes our approach configured with the parameters found to be best on average in Section 5.2. PP-F2C denotes P-F2C optimised on the validation set of each dataset (portfolio). The diagram shows that, rank-wise, the seasonal forecasters F2C, P-F2C and PP-F2C perform better than the others. We first notice that the portfolio actually improves the performance of P-F2C. Nonetheless, the non-probabilistic approach F2C outperforms PP-F2C.
Even if a PP-F2C forecast fits the time series (see Figure 4), the piecewise approximation spreads the probability distribution, which penalises the CRPS. It is worth noting, however, that the rank difference with F2C is not statistically significant, and that probabilistic forecasts convey meaningful information about how much the forecasts can be trusted.
Then, we compared PP-F2C with another probabilistic forecaster, DeepAR. Figure 6 shows that PP-F2C significantly outperforms DeepAR (p = 4.597 × 10⁻⁶). The win/tie/loss graph at the bottom of Figure 6 shows how many times PP-F2C won against DeepAR (points below the diagonal) and the relative values of the CRPS. The positions of the points illustrate that PP-F2C outperforms DeepAR on most of the datasets (27 wins against 5).
6 Conclusion
P-F2C is a probabilistic forecaster for seasonal time series. It assumes that seasons are mixtures of typical seasons, which transforms the forecasting problem into both a clustering and a classification of time series. P-F2C applies a parameter-free coclustering approach that generates grid forecasts, each typical grid being a typical seasonal behaviour. In addition, we proposed PP-F2C, which adjusts the P-F2C parameters for each time series. On average, PP-F2C outperforms all competitors except F2C on various seasonal time series. F2C is based on the same principle as PP-F2C but is neither probabilistic nor parameter-free. Nonetheless, we illustrated the interest of probabilistic grid forecasting for conveying information about distinct, uncertain mean behaviours. Indeed, a mixture of probabilistic grids is more interpretable than a probability distribution around a mean.
References
1. Liu, C., Vehí, J., Avari, P., Reddy, M., Oliver, N., Georgiou, P., Herrero, P.: Long-term glucose forecasting using a physiological model and deconvolution of the continuous glucose monitoring signal. Sensors 19(19) (2019) 4338
2. Li, J., Chen, W.: Forecasting macroeconomic time series: Lasso-based approaches and their forecast combinations with dynamic factor models. International Journal of Forecasting 30(4) (2014) 996–1015
3. Tay, F., Cao, L.: Application of support vector machines in financial time series forecasting. Omega 29(4) (2001) 309–317
4. Laurinec, P., Lóderer, M., Lucká, M., Rozinajová, V.: Density-based unsupervised ensemble learning methods for time series forecasting of aggregated or clustered electricity consumption. Journal of Intelligent Information Systems 53(2) (2019) 219–239
5. Bod`ık, P.: Automating Datacenter Operations Using Machine Learning. PhD
thesis, UC Berkeley (2010)
6. Leverger, C., Malinowski, S., Guyet, T., Lemaire, V., Bondu, A., Termier, A.:
Toward a framework for seasonal time series forecasting using clustering. In: Pro-
ceedings of the International Conference on Intelligent Data Engineering and Au-
tomated Learning. (2019) 328–340
7. De Gooijer, J., Hyndman, R.: 25 years of time series forecasting. International
journal of forecasting 22(3) (2006) 443–473
8. Wallis, K.F.: Asymmetric density forecasts of inﬂation and the bank of england’s
fan chart. National Institute Economic Review 167(1) (1999) 106–112
9. Hyndman, R.: Highest-density forecast regions for nonlinear and non-normal time
series models. Journal of Forecasting 14(5) (1995) 431–441
10. Boull´e, M.: Data grid models for preparation and modeling in supervised learning.
Hands-On Pattern Recognition: Challenges in Machine Learning 1(2011) 99–130
11. Kareem, Y., Majeed, A.R.: Monthly peak-load demand forecasting for sulaimany
governorate using SARIMA. In: Proceedings of the International Conference on
Transmission & Distribution Conference and Exposition. (2006) 1–5
12. Wichert, S., Fokianos, K., Strimmer, K.: Identifying periodically expressed tran-
scripts in microarray time series data. Bioinformatics 20(1) (2004) 5–20
13. Boull´e, M.: Functional data clustering via piecewise constant nonparametric den-
sity estimation. Pattern Recognition 45(12) (2012) 4389–4401
14. Paparrizos, J., Gravano, L.: Fast and accurate time-series clustering. ACM Trans-
actions on Database Systems (TODS) 42(2) (2017) 1–49
15. Bondu, A., Boullé, M., Cornuéjols, A.: Symbolic representation of time series: A
hierarchical coclustering formalization. In: International Workshop on Advanced
Analysis and Learning on Temporal Data, Springer (2015) 3–16
16. Deng, H., Runger, G., Tuv, E., Vladimir, M.: A time series forest for classification
and feature extraction. Information Sciences 239 (2013) 142–153
17. Dempster, A., Petitjean, F., Webb, G.I.: ROCKET: Exceptionally fast and accurate
time series classification using random convolutional kernels. arXiv:1910.13051
(2019)
18. Salinas, D., Flunkert, V., Gasthaus, J., Januschowski, T.: DeepAR: Probabilistic
forecasting with autoregressive recurrent networks. International Journal of
Forecasting 36(3) (2020) 1181–1191
19. Boullé, M.: Khiops: Outil d’apprentissage supervisé automatique pour la fouille
de grandes bases de données multi-tables. In: Actes de la conférence Extraction et
Gestion des Connaissances. (2016) 505–510
20. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: Machine
learning in Python. Journal of Machine Learning Research 12 (2011) 2825–2830
21. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: Continual
prediction with LSTM. In: Proceedings of the 9th International Conference on Artificial
Neural Networks (ICANN). (1999) 850–855
22. Taylor, S., Letham, B.: Forecasting at scale. The American Statistician 72(1)
(2018) 37–45
23. Hersbach, H.: Decomposition of the continuous ranked probability score for
ensemble prediction systems. Weather and Forecasting 15(5) (2000) 559–570