Probabilistic Forecasting of Household Electrical
Load Using Artificial Neural Networks
Julian Vossen, Baptiste Feron, Antonello Monti
Institute for Automation of Complex Power Systems, E.ON Energy Research Center, RWTH Aachen University,
Germany; [bferon, amonti]
Abstract—For optimizing the usage of electricity, energy man-
agement systems require forecasts of electrical consumption on
a single-household level. As this consumption is subject to high
uncertainties, state-of-the-art point-forecasting methods fail to
provide accurate predictions. In order to overcome this challenge,
this paper incorporates the uncertainty into a probabilistic fore-
cast using density-estimating Artificial Neural Networks. As such,
Mixture Density Networks (MDN) and Softmax Regression Net-
works (SRN) are implemented and compared on three different
datasets over a broad range of hyper-parameter configurations.
The evaluation shows that both neural network models generate
reliable forecasts of the probability density over the future
consumption, which significantly outperform an unconditional
benchmarking model. Furthermore, the experiments demonstrate
that a decreased dataset granularity and lagged input improve the
forecasts, while using additional calendar inputs and increasing
the length of lagged inputs had little effect.
Index Terms—STLF, Smart meter, Neural network, Probabilistic forecasting

Nomenclature

F        Cumulative distribution function
E        Error function used to fit the ANN model
x        ANN input vector
y        ANN output vector with the forecasted value
ŷ        Observed or real value
p(y|x)   Probability density of y given x
S        Scoring function
X        Set of conditioning variables
σ        Standard deviation
µ_i      Centroids (MDN)
φ_i      Kernel functions (MDN)
α_i      Mixing coefficients (MDN)
I. Introduction

The increase of distributed energy resources is one of the biggest challenges faced by the energy sector in the upcoming years due to their volatile and irregular nature. At
the same time, the spread of domestic smart appliances and storage systems enables consumers to become an active part of the grid. In the future, domestic flexibility sources could be controlled through an Energy Management System (EMS) in order to provide grid services, reduce energy costs or reduce carbon emissions [1]. In the literature, EMS approaches are mainly based on optimization formulations and require reliable short-term load forecasts (STLF) [2].
Therefore, research [3], [4], [5], [6] has been focused on
generating STLF on a single-household level using methods
such as Auto-Regressive Moving-Average, Artificial Neural
Networks (ANNs) and Support Vector Regression. Most com-
monly, these methods output a single point forecast per time
step and are therefore referred to as point-forecasting methods.
However, as consumption on a household level is subject
to high volatility and unpredictable human behaviour, these
point-forecasts bear high errors. In fact, depending on the
dataset used, even advanced models fail to outperform naive
benchmarking methods [4].
As a way of dealing with this high uncertainty, probabilistic
forecasting methods provide information on the distribution
of future values. This can be in the form of intervals with
assigned probabilities or probability density functions (PDFs).
When probabilistic forecasts of the future electrical consump-
tion are available, a stochastic optimization of the consumption
can account for the uncertainty. Thereby, it can be ensured
that the decision strategy is not only locally optimal for
the expected value of the future consumption, but a global
optimum over the predicted distribution of future values [7].
However, literature on probabilistic forecasting of electrical
load on a single-household level is still sparse [8]. To the
best of the authors’ knowledge, there are only two studies
yet available. In [9], density forecasts are generated using
conditional kernel density estimation and in [10] using additive
quantile regression. While ANNs have often been applied for
point-forecasting, transferring ANNs to probabilistic electrical
load forecasting is still missing. In addition, the forecasts in
[9] and [10] were evaluated on a dataset of low temporal
resolution. Therefore, the effect of decreasing the dataset
granularity shall be investigated.
Aside from load forecasting, density-estimating ANNs have
been successfully applied to generate probabilistic forecasts in
other domains where forecasts are subject to a high uncertainty
[11], [12]. The objective of this study is to enrich the STLF
literature by
- presenting the implementation of Mixture Density Networks (MDN) and Softmax Regression Networks (SRN) for modelling probability density;
- evaluating and comparing these approaches to an unconditional benchmarking forecasting method;
Fig. 1: Mixture Density Network: The output of a neural
network parametrizes a Gaussian mixture model.
- studying the influence of a wide range of model hyper-parameters (dataset granularity, input configuration, ANN architecture).
This study is structured into three parts. First, we provide a brief overview of the methods used in this study. Second, we describe the setup and implementation of the experiments conducted and, finally, we provide and discuss the results of these experiments.
II. Forecasting Methods

In this section we provide a brief overview of the forecasting methods used in this study. Before covering the density-estimating neural networks, the following paragraph provides
a short introduction for those readers unfamiliar with neural
networks. Throughout the rest of this paper, y refers to the forecasted value and x to the lagged inputs used to forecast this value.
A. Artificial Neural Networks
Neural networks are computing structures, which consist
of interconnected artificial neurons. An artificial neuron is
a function that computes a single output by calculating the
weighted sum of its inputs and applying a non-linear activation
function, e.g. exponential, softmax. Many of such neurons are
connected in layers to form a network, whereby the output of
one layer is fed as input to the following layer. By adjusting
the input weights of each neuron, the resulting network can
be fit to map an input vector to an output vector. With mild
assumptions on the activation function, neural networks can
be thought of as universal function approximators. Fitting
the network weights to represent a function given observed
input and output examples can be done by backpropagation.
Thereby, a so-called error function quantifies how effectively the network captures the relation between the inputs and outputs of the training examples. Then, the network weights are iteratively updated in the direction that reduces the error function.
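As an illustration, the weighted-sum-plus-activation computation described above can be sketched in a few lines of plain Python; the tanh activation and the specific weights below are arbitrary choices for illustration:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a non-linear activation (here: tanh)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(s)

def layer(inputs, weight_matrix, biases):
    """A layer applies several neurons to the same input vector."""
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

# Two-layer network: the output of the first layer is fed to the second.
hidden = layer([0.5, -1.0], [[0.1, 0.2], [0.3, -0.4]], [0.0, 0.1])
output = layer(hidden, [[1.0, -1.0]], [0.0])
```

Fitting such a network amounts to adjusting the entries of `weight_matrix` and `biases` by backpropagation.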
B. Mixture Density Networks
Conventional least-square regression neural networks can
be derived from maximum likelihood by assuming the target
data to be Gaussian distributed [13]. This motivates the idea
of replacing the Gaussian distribution with a mixture model,
which can model generic distribution functions [13]. Hence,
Fig. 2: Softmax Regression Network: The output of a neural network represents the probability of class membership. In this case, class membership means that the predicted variable y falls into the respective interval.
the probability density of the target data is represented as a linear combination of kernel functions (Eq. 1):

p(y|x) = \sum_{i=1}^{m} \alpha_i(x) \, \phi_i(y|x)  (1)

where \alpha_i(x) are mixing coefficients conditioned on the input vector x and \phi_i(y|x) represents a kernel function. Gaussian kernels (Eq. 2) are used in this study as in [13]:

\phi_i(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma_i(x)} \exp\left( -\frac{(y - \mu_i(x))^2}{2\,\sigma_i(x)^2} \right)  (2)

with \sigma_i(x) as standard deviations conditioned on x. Therefore,
the output layer of the neural network resembles a parameter
vector [αi(x), µi(x), σi(x)]. The architecture of the mixture
density model is shown in Figure 1.
Using respective activation functions in the output layer
ensures that the network outputs valid parameter vectors.
In this paper, a softmax activation is used for the mixing
coefficients α, and a simple exponential function for the
standard deviations σ, while the means are unrestricted. In a post-processing step, positive probability density is assigned only to non-negative load values: the cumulative distribution function is set to zero for negative electrical loads.
The network can be fit to observations using backpropaga-
tion. Therefore, an error function is defined to quantify the
quality of the PDF forecasted, given observations as a single
scalar. The error function E(y, ŷ) is constructed using the maximum likelihood criterion by taking the negative logarithm of the likelihood, the so-called negative log-likelihood (Eq. 3):

E(y, \hat{y}) = -\ln L(\hat{y}|x) = -\ln p(\hat{y}|x)  (3)
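The output-layer activations and the resulting error function can be sketched in plain Python; this is a minimal illustration (not the Keras implementation used in the paper), with function names chosen for clarity:

```python
import math

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mdn_params(raw_alpha, raw_mu, raw_sigma):
    """Map raw network outputs to valid mixture parameters:
    softmax for the mixing coefficients, exponential for the
    standard deviations, means unrestricted."""
    return softmax(raw_alpha), list(raw_mu), [math.exp(v) for v in raw_sigma]

def mdn_density(raw_alpha, raw_mu, raw_sigma, y):
    """Evaluate the Gaussian mixture density p(y|x) of Eqs. 1 and 2."""
    alphas, mus, sigmas = mdn_params(raw_alpha, raw_mu, raw_sigma)
    return sum(
        a / (math.sqrt(2 * math.pi) * s)
        * math.exp(-(y - mu) ** 2 / (2 * s ** 2))
        for a, mu, s in zip(alphas, mus, sigmas)
    )

def mdn_nll(raw_alpha, raw_mu, raw_sigma, y_obs):
    """Negative log-likelihood error of Eq. 3 for one observation."""
    return -math.log(mdn_density(raw_alpha, raw_mu, raw_sigma, y_obs))
```

During training, `mdn_nll` would be averaged over the training examples and minimized by backpropagation.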
MDNs have been successfully applied to a wide range of
problems, such as financial forecasting [12], weather forecast-
ing [11] or speech synthesis [14], as they can approximate
arbitrary probability distributions. However, there is an alter-
native to MDNs, which approximates the probability density
function at discrete sample points, by binning the output
range of the target variable and applying a softmax activation
function to the network output. This technique is referred to as
Softmax Regression Networks (SRNs) and is introduced in the
following section.
C. Softmax Regression Networks
Like the above described MDNs, SRNs can be used to ap-
proximate arbitrary probability distributions. Instead of assum-
ing a kernel mixture model parametrized by the neural network
output, each output neuron represents the mean probability
density for a fraction of the output space. These fractions are
referred to as bins (Fig. 2). Normalizing the network output
by applying a softmax function to the output layer, the sum
of all bins is ensured to be one. Hence, the outputs of each
neuron can be also interpreted as the probability that the target
variable ylies in the respective bin.
Analogous to the MDN, the negative log-likelihood can be used as error function, which for the softmax output layer with discrete bins y_i becomes:

E(y, \hat{y}) = -\ln L(\hat{y}|x) = -\ln p\left( y_{\arg\min_i |y_i - \hat{y}|} \,\middle|\, x \right)  (4)
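A minimal sketch of the SRN output layer and its error function, again in plain Python with illustrative names (in the real model the softmax is applied inside the network):

```python
import math

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def srn_bin_probs(logits):
    """Softmax over the output layer: one probability per bin, summing to one."""
    return softmax(logits)

def srn_nll(logits, bin_centers, y_obs):
    """Negative log-likelihood of Eq. 4: probability of the bin whose
    centre y_i is closest to the observed value."""
    probs = srn_bin_probs(logits)
    i = min(range(len(bin_centers)), key=lambda j: abs(bin_centers[j] - y_obs))
    return -math.log(probs[i])
```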
D. Benchmarking methods
Two benchmarking models are implemented to compare and
evaluate the proposed ANN models.
First, an unconditional model obtains the overall distribution
of the target variable as a histogram from the training data
and returns this histogram as a forecast for future values.
The model is called unconditional because the forecasted distribution is constant for all forecast steps, independent of any conditioning variables.
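This histogram benchmark can be sketched as follows; the bin count and value range are free choices:

```python
def unconditional_forecast(train_loads, n_bins, lo, hi):
    """Histogram of the training targets, returned as one constant
    probability vector used as the forecast for every future step."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for y in train_loads:
        # Clip the top edge into the last bin.
        i = min(int((y - lo) / width), n_bins - 1)
        counts[i] += 1
    total = len(train_loads)
    return [c / total for c in counts]
```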
A second benchmark is used to isolate the effect of mod-
elling a time-dependent density compared to a point-forecast.
Instead of training a separate model with a single output
neuron and mean squared error, the forecast is derived from
the predictive conditional mean of a single-component MDN.
In this way, the information on the distribution from the probabilistic forecast is discarded to obtain a comparable point-forecast. To compare this point-forecast with a probabilistic one, a Gaussian distribution with a standard deviation obtained from the residuals in the training data is added around the point-forecast. This density forecast summarizes all the information on the uncertainty that a point-forecast provides and results in a homoscedastic benchmarking model, as the variance of the forecasted density is constant over time.
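A sketch of how such a constant-variance density forecast could be constructed from a point forecast and the training residuals (the names and the population-variance estimator are illustrative assumptions):

```python
import math

def homoscedastic_forecast(point_forecasts, train_targets, train_means):
    """Attach one constant standard deviation, estimated from the
    training residuals, to every point forecast, yielding Gaussian
    (mu, sigma) density forecasts with time-constant variance."""
    residuals = [t - m for t, m in zip(train_targets, train_means)]
    mean_r = sum(residuals) / len(residuals)
    var = sum((r - mean_r) ** 2 for r in residuals) / len(residuals)
    sigma = math.sqrt(var)
    return [(mu, sigma) for mu in point_forecasts]
```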
E. Evaluation metrics
A scoring function S(p(y),ˆy)is used to evaluate the quality
of probabilistic forecasts, by comparing a predicted probability
density function p(y)to a scalar observation ˆy. To ensure that
the score motivates forecasting the true distribution over any
other, the score needs to be proper. In this paper, forecast
performance is evaluated using the Continuous Ranked Proba-
bility Score (CRPS, Eq. 5) as it is proper and has two favorable
properties [15]:
- its unit is identical to that of the forecast variable, which makes it more descriptive than e.g. the logarithmic score;
- for point-forecasts it reduces to the absolute error, which provides a way to compare probabilistic and point-forecasts.
TABLE I: Hyper-parameters evaluated during the grid search

Hyper-parameter               Values
Number of hidden layers       1, 3, 9
Number of hidden neurons      1, 10, 40, 120, 360
Dataset granularity [min]     1, 5, 30
Length of lagged input [min]  1, 30, 60, 360, 1440
Use calendar inputs           True, False
Forecast horizon [min]        60
The CRPS is defined as:

CRPS(F, \hat{y}) = \int_{-\infty}^{\infty} \left( F(z) - \mathbb{1}(z \geq \hat{y}) \right)^2 dz  (5)

where F is the cumulative distribution function of the forecast and \mathbb{1}(\cdot) is the Heaviside step function.
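Since the integral in Eq. 5 rarely has a closed form for mixture or binned forecasts, it can be approximated numerically; a simple midpoint-rule sketch (the limits lo/hi must cover the support of the forecast):

```python
def crps(cdf, y_obs, lo, hi, n=20000):
    """Numerically integrate Eq. 5: the squared difference between
    the forecast CDF and the step function at the observation."""
    dz = (hi - lo) / n
    total = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * dz
        step = 1.0 if z >= y_obs else 0.0
        total += (cdf(z) - step) ** 2 * dz
    return total
```

For a degenerate point-forecast CDF the score reduces to the absolute error, consistent with the property listed above.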
III. Experimental Setup

The models output the forecasted distribution over the total
consumption within the forecast interval conditioned on a
fixed number of most recent load observations and calendar
variables. As calendar variables we use the time of the day,
the day of the week and the month of the year encoded as
numbers in the interval [0,1]. The load recordings are scaled
into the same order of magnitude. Training examples are then
constructed based on input sequences, calendar variables and
respective consumptions during the forecast horizon. If there
are missing or invalid recordings during the forecast horizon,
the respective example is excluded from training data as it
would cause the model to learn on invalid values. Both models
are implemented using Keras and Tensorflow. Then, separate
models for each household are trained on the first 80% of the
training examples, cross-validated using the following 10%
and tested on the most recent 10% of recordings. To provide
insights on how the choice of hyper-parameters affects the
forecasting performance, both models are evaluated on each
dataset for all combinations of the hyper-parameters provided
in Table I.
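The calendar encoding into [0, 1] might look as follows; the paper specifies the interval but not the exact scaling, so the divisors here are assumptions for illustration:

```python
from datetime import datetime

def calendar_features(ts):
    """Encode time of day, day of week and month of year into [0, 1]."""
    time_of_day = (ts.hour * 60 + ts.minute) / (24 * 60)
    day_of_week = ts.weekday() / 6       # Monday = 0 ... Sunday = 6
    month_of_year = (ts.month - 1) / 11
    return [time_of_day, day_of_week, month_of_year]
```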
Fig. 3: Typical domestic electrical load consumption (Smart*
dataset [16]) with a high volatility and changing patterns.
While this kind of grid search is not the most efficient way
to find a single good-performing hyper-parameter combina-
tion, it allows us to gain insights into how the hyper-parameters
affect the forecasting performance.
A. Data
Numerous datasets have been recorded to perpetuate re-
search in load forecasting and disaggregation. In this study,
Fig. 4: Mean CRPS value of the unconditional benchmark and
the presented approaches: Softmax Regression Network (SRN)
and Mixture Density Network (MDN).
Fig. 5: Normalized CRPS for different input granularities and lengths of lagged input; decreasing the granularity improves the forecasting performance.
the Smart* [16], the UK-DALE [17] and a UCI [18] dataset
(Fig. 3) are used to evaluate the proposed forecasting methods,
as these are publicly available, exhibit a fine granularity (<1
minute) and include relatively long recording periods.
IV. Results

This section highlights the impact of the hyper-parameters
on the forecasting performance. For this purpose, a single
hyper-parameter is varied, while the others are unrestricted.
The mean CRPS is normalized by dividing the CRPS of the
model with a specific hyper-parameter by the overall best
CRPS when all hyper-parameters are unrestricted. That some
values fall below a normalized CRPS of one is a consequence
of the cross-validation. The best performing model on the
cross-validation data is not necessarily the best performing
model on the test data.
The following part further elaborates on A) the impact
of the considered model inputs B) the optimum network
configuration and C) the impact of the number of mixture
components for the MDN model.
A. Model inputs
Figure 5 shows that feeding lagged input data at a finer granularity improves the forecasting performance. However, increasing the length of the lagged input does not significantly improve the performance. This indicates that the forecast mostly depends on the most recent observation before the forecast horizon. The models do not seem to exploit higher-order patterns in the data. Hence, the improved performance at finer granularities is likely an effect of the most recent
Fig. 6: Comparison of different ANN input configurations with the benchmark model for different combinations of lagged inputs (previous power consumption) and calendar variables.
Fig. 7: Comparison of different model architectures; The
overall best configuration is three hidden layers with 100
neurons each. However, performance gains compared to very
small networks are relatively small and depend on the dataset.
recording approximating the load during the forecast horizon more accurately, rather than of the models exploiting fine-granular patterns in the data. Figure 6 shows the effect of conditioning the models on different inputs. The best performing model conditioned on both lagged input and calendar variables is compared to the best performing models conditioned on either only lagged input or only calendar variables, and to the unconditional benchmark. Models based only on calendar inputs perform slightly better than the unconditional benchmark. Conditioning models only on lagged input significantly improves the forecasting performance. However, including calendar variables in addition to lagged inputs has little effect.
B. Network configuration
Different network configurations (Fig. 7) were investigated. The results highlight that a configuration of three hidden layers with 100 neurons each achieves the best overall performance. Still, the gains compared to configurations with only a few hidden neurons are relatively small. This small performance difference indicates that the feedforward ANN structure barely learns complex features in the input data.
C. MDN: Number of mixture components
The performance of the best MDN with only a single Gaussian component is compared to an MDN with five components
Fig. 8: Comparison of different number of mixture compo-
nents; restricting the predictive density to be Gaussian leads
to a worse performance compared to a density mixed from
five Gaussian components.
Fig. 9: Forecast comparison between a time-dependent and a constant-variance model with identical conditional means.
(Fig. 8). Increasing the number of considered components
allows the forecasted density function to take more generic
shapes instead of restricting it to be one single Gaussian. This
leads to a consistently better forecasting performance.
V. Evaluation

This section evaluates the presented models against an unconditional benchmarking model and against a second benchmarking model with a time-constant variance (see section II-D).
A. Benchmarking model versus MDN and SRN models
The MDN and SRN forecasting models are compared to
the unconditional benchmark (Fig. 4 and 6). The presented
results are obtained comparing the models with the overall best-performing combination of hyper-parameters from Table I. The results highlight that both ANN models achieve a
similar performance and clearly outperform the unconditional
benchmark on the different datasets considered.
B. Time-dependent versus constant variance model
Both presented ANN models output a time-varying variance
of the predictive densities over the forecast steps (Fig. 9),
which means that these models can capture heteroscedasticity.
Fig. 10: Performance comparison of MDN model with time-
dependent and constant variance.
Fig. 11: Reliability plots for one and five-component MDN
model. Ideal reliability is indicated by the angle-bisector
(dashed line). Single Gaussian forecasts are less reliable than
the more generic five-component ones.
Therefore, this section aims at evaluating the added value of
considering a time-dependent variance in terms of the CRPS.
The results demonstrate that the time-dependent variance model always scores better than the constant-variance model, even though both models have an identical predictive conditional mean (Fig. 10). The better performance of the time-dependent model indicates that the uncertainty can be well captured and highlights the added value of forecasting methods that capture a time-dependent variance, unlike state-of-the-art point-forecasting methods.
C. Reliability
The forecast reliability or calibration describes the statistical
compatibility between the forecasted PDF and the realizations.
This means that if the model assigns a probability p to an outcome, the proportion of realizations matching this outcome should converge towards p for a large number of experiments.
This behaviour can be evaluated using reliability plots, where
the frequency of observations falling into the predicted quan-
tile is plotted over the predictive quantile itself (see Fig. 11).
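Such a reliability curve can be computed from the probability integral transform (PIT) of the observations; a minimal sketch with illustrative names:

```python
def reliability_curve(forecast_cdfs, observations, quantiles):
    """Empirical reliability: for each nominal quantile q, the observed
    frequency of realizations falling at or below the predicted
    q-quantile (equivalently, PIT values F(y_obs) <= q)."""
    pits = [cdf(y) for cdf, y in zip(forecast_cdfs, observations)]
    n = len(pits)
    return [sum(1 for p in pits if p <= q) / n for q in quantiles]
```

Plotting the returned frequencies over the nominal quantiles yields the reliability plot; points on the angle bisector indicate ideal calibration.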
The figure shows that, especially for the longer datasets UK-DALE and UCI, the forecasts show little deviation from the ideal reliability line (the angle bisector). This shows that even though the mean absolute errors can be high, the uncertainty in electrical load forecasts is well captured. Furthermore, the
plots indicate that the five-component MDN is more reliable
than the single-Gaussian model. Overall, the single-Gaussian
model seems to assign too much probability to small values
of the consumption, which is indicated by the deviation from
the angle bisector for low predicted probabilities.
VI. Conclusion

Because of the high errors in state-of-the-art single-point
forecasts, the primary objective of this study was to present
a probabilistic forecasting model based on artificial neural
networks (ANNs) for quantifying the forecast uncertainty. The
second objective was to determine the influence of different
model input configurations on the forecasting performance.
Towards these objectives, two different density-estimating ANNs have been implemented: first, a Mixture Density Network (MDN), which approximates the predictive probability density function as a mixture of Gaussian kernels and,
second, a Softmax Regression Network (SRN) model, which
approximates the predictive probability density as a discrete
distribution over output bins. Both models were evaluated
over a variety of different configurations on the Smart*, the
UK-DALE and a UCI dataset, which consist of individual
household electrical load recordings. This evaluation led to
the following conclusions:
First and most important, it has been shown that MDNs
and SRNs can generate reliable probabilistic forecasts that sig-
nificantly outperform an unconditional benchmarking model.
Conditioning the models on lagged inputs significantly im-
proves the forecasting performance. However, models con-
ditioned on the lagged electrical load of the past 30 or 60
minutes only moderately outperform models conditioned on
solely the most recent lagged electrical load. Lagged input
of more than 60 minutes did not result in better forecasts.
The forecasting performance improves when increasing the
temporal resolution (granularity) of the training data. This is
likely due to the availability of lagged inputs closer to the
forecast horizon rather than the exploitation of higher-order
patterns exhibited by a finer granularity. Conditioning models
on calendar variables (time of the day, day of the week, month
of the year) had no effect on the forecasting performance,
when using lagged inputs. Assuming the predictive distribu-
tions to be Gaussian is restrictive, as it reduces the overall
performance and the reliability of the forecasts.
The feedforward ANNs used in this study were not able
to benefit much from more lagged input, but mostly depend
on the most recent electrical load observation. Hence, further
research can focus on trying to increase the gains from more
lagged input, by using more advanced model architectures. In
particular, recurrent neural networks or convolutional neural networks could be combined with the output layers used in this study, resulting in e.g. recurrent MDNs, which can then be evaluated against the feedforward networks used here.
References

[1] M. Beaudin and H. Zareipour, “Home energy management systems: A review of modelling and complexity,” Renewable and Sustainable Energy Reviews, vol. 45, pp. 318–335, 2015.
[2] B. Feron and A. Monti, “An agent based approach for virtual power plant valuing thermal flexibility in energy markets,” in IEEE PowerTech Manchester, 2017.
[3] A. K. Singh, Ibraheem, S. Khatoon, and M. Muazzam, “An overview
of electricity demand forecasting techniques,” in National Conference
on Emerging Trends in Electrical, Instrumentation & Communication
Engineering, vol. 3, no. 3, 2013.
[4] A. Veit, C. Goebel, R. Tidke, C. Doblander, and H.-A. Jacobsen,
“Household electricity demand forecasting - benchmarking state-of-the-
art methods,” in Proceedings of the 5th international conference on
Future energy systems, 2014, pp. 233–234.
[5] H.-T. Yang, J.-T. Liao, and C.-I. Lin, “A load forecasting method for
hems applications,” in IEEE PowerTech Grenoble, 2013.
[6] R. E. Edwards, J. New, and L. E. Parker, “Predicting future hourly residential electrical consumption: A machine learning case study,” Energy and Buildings, vol. 49, pp. 591–603, 2012.
[7] T. Gneiting and M. Katzfuss, “Probabilistic forecasting,” Annual Review of Statistics and Its Application, no. 1, pp. 125–151, 2014.
[8] T. Hong and S. Fan, “Probabilistic electric load forecasting: A tutorial
review,” International Journal of Forecasting, no. 32, pp. 914–938, 2015.
[9] S. Arora and J. W. Taylor, “Forecasting electricity smart meter data using
conditional kernel density estimation,” OMEGA - The International
Journal of Management Science, 2016.
[10] S. B. Taieb, R. Huser, R. J. Hyndman, and M. G. Genton, “Forecasting
uncertainty in electricity smart meter data by boosting additive quantile
regression,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2448–
2455, 2016.
[11] M. Felder, A. Kaifel, and A. Graves, “Wind power prediction using
mixture density recurrent neural networks.”
[12] D. Ormoneit and R. Neuneier, “Experiments in predicting the german
stock index dax with density estimating neural networks,” in IEEE/IAFE
1996 Conference on Computational Intelligence for Financial Engineer-
ing (CIFEr), 1996.
[13] C. M. Bishop, “Mixture density networks,” Neural Computing Research
Group, Tech. Rep., 1994.
[14] H. Zen and A. Senior, “Deep mixture density networks for acoustic mod-
eling in statistical parametric speech synthesis,” in IEEE International
Conference on Acoustic, Speech and Signal Processing, 2014.
[15] T. Gneiting and A. E. Raftery, “Strictly proper scoring rules, prediction
and estimation,” Journal of the American Statistical Association, 2007.
[16] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, and J. Albrecht,
“Smart*: An open data set and tools for enabling research in sustain-
able homes,” in Proceedings of the 2012 Workshop on Data Mining
Applications in Sustainability, 2012.
[17] J. Kelly and W. Knottenbelt, “The UK-DALE dataset, domestic
appliance-level electricity demand and whole-house demand from five
UK homes,” Scientific Data, vol. 2, no. 150007, 2015.
[18] M. Lichman, “UCI machine learning repository,” 2013. [Online].
... Li et al. [4] conducted a new exploration of interval forecasting technology and proposed a proportional coefficient method based on an extreme learning machine. Vossen et al. [5] put forward a short-term load probabilistic forecasting method based on density estimation and artificial neural network. Zhang et al. [6] came up with a method of constructing a forecasting interval via multi-point forecasting based on bootstrap technology. ...
... The estimated parameter β q can be obtained from Equation (6), and then the estimated value y t,q of the dependent variable under the conditional quantile q can be obtained from Equation (5). When q is continuously valued within the interval of (0, 1), the conditional distribution function of the prediction target can be obtained. ...
Full-text available
In this paper, a novel short-term load forecasting method amalgamated with quantile regression random forest is proposed. Comprised with point forecasting, it is capable of quantifying the uncertainty of power load. Firstly, a bespoke 2D data preprocessing taking advantage of empirical mode decomposition (EMD) is presented. It can effectively assist subsequent point forecasting models to extract spatial features hidden in the 2D load matrix. Secondly, by exploiting multimodal deep neural networks (DNN), three short-term load point forecasting models are conceived. Furthermore, a tailor-made multimodal spatial–temporal feature extraction is proposed, which integrates spatial features, time information, load, and electricity price to obtain more covert features. Thirdly, relying on quantile regression random forest, the probabilistic forecasting method is proposed, which exploits the results from the above three short-term load point forecasting models. Lastly, the experimental results demonstrate that the proposed method outperforms its conventional counterparts.
... Mixture density networks (MDN) and softmax regression networks (SRN) are two of the main feedforward ANN-based models aiming to obtain the distribution of uncertain parameters [23]. Regarding MDN, as a parametric model, the associated probability density is obtained from a linear combination of kernel functions [24]: (14) where x represents the input vector of the forecasting model, y is the output vector, K i (y|x) is the kernel function selected for the model, and a i (x) represents the mixing coefficients that control the inputs. In MDN models, the output neurons are the parameters of the distribution functions as well as the mixing coefficients. ...
... Mixture density networks (MDN) and softmax regression networks (SRN) are two of the main feedforward ANN-based models aiming to obtain the distribution of uncertain parameters [23]. Regarding MDN, as a parametric model, the associated probability density is obtained from a linear combination of kernel functions [24]: ...
Full-text available
This paper reviews the recent studies and works dealing with probabilistic forecasting models and their applications in smart grids. According to these studies, this paper tries to introduce a roadmap towards decision-making under uncertainty in a smart grid environment. In this way, it firstly discusses the common methods employed to predict the distribution of variables. Then, it reviews how the recent literature used these forecasting methods and for which uncertain parameters they wanted to obtain distributions. Unlike the existing reviews, this paper assesses several uncertain parameters for which probabilistic forecasting models have been developed. In the next stage, this paper provides an overview related to scenario generation of uncertain parameters using their distributions and how these scenarios are adopted for optimal decision-making. In this regard, this paper discusses three types of optimization problems aiming to capture uncertainties and reviews the related papers. Finally, we propose some future applications of probabilistic forecasting based on the flexibility challenges of power systems in the near future
... Forecasting intervals and individual quantiles provide only an estimation, whereas density forecasts can provide a model of the true distribution. Here, so far, primarily parametric approaches have been proposed, making assumptions about the underlying distribution, such as the ARMA-GARCH time series model [24], a single Gaussian distribution [25], [26], or Gaussian Mixture Models (GMM) [27]. Kernel density estimates [18], [19] are a non-parametric approach but are expensive to train, so they can only consider a few conditional variables [10]. ...
The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.
... A structured overview of probabilistic load forecasting studies is provided in Table 8.1. The overview shows that studies have investigated a wide range of new methods, including kernel methods (Arora and Taylor, 2016), neural networks (Elvers et al., 2019; Van der Meer et al., 2018b; Vossen et al., 2018; Gan et al., 2017), Gaussian processes (Shepero et al., 2018; Van der Meer et al., 2018a,b), additive quantile regression (Taieb et al., 2016), and ensemble models (Munkhammar et al., 2021). However, most studies have applied these methods to regular households' loads. ...
... One of the most common general approaches in this literature is the statistical learning of complex functions taking in current information and outputting parameters governing a distribution over future outcomes; the modeling of these functions has been done by methods that include recurrent neural networks (Salinas et al. 2020), convolutional neural networks (Chen et al. 2020), and standard fully connected nets (Vossen et al. 2018). ...
We discuss an approach to probabilistic forecasting based on two chained machine-learning steps: a dimensional reduction step that learns a reduction map of predictor information to a low-dimensional space in a manner designed to preserve information about forecast quantities; and a density estimation step that uses the probabilistic machine learning technique of normalizing flows to compute the joint probability density of reduced predictors and forecast quantities. This joint density is then renormalized to produce the conditional forecast distribution. In this method, probabilistic calibration testing plays the role of a regularization procedure, preventing overfitting in the second step, while effective dimensional reduction from the first step is the source of forecast sharpness. We verify the method using a 22-year 1-hour cadence time series of Weather Research and Forecasting (WRF) simulation data of surface wind on a grid.
In recent years, new techniques based on artificial intelligence, and machine learning in particular, have been revolutionizing the work of actuaries, including in loss reserving. A particularly promising technique is that of neural networks, which have been shown to offer a versatile, flexible and accurate approach to loss reserving. However, applications of neural networks in loss reserving to date have focused primarily on the (important) problem of fitting accurate central estimates of the outstanding claims. In practice, properties regarding the variability of outstanding claims are equally important (e.g., quantiles for regulatory purposes). In this paper we fill this gap by applying a Mixture Density Network (“MDN”) to loss reserving. The approach combines a neural network architecture with a mixture Gaussian distribution to achieve simultaneously an accurate central estimate along with flexible distributional choice. Model fitting is done using a rolling-origin approach. Our approach consistently outperforms the classical over-dispersed model both for central estimates and for quantiles of interest, when applied to a wide range of simulated environments of various complexity and specifications. We further propose two extensions of the MDN approach. First, we present a hybrid GLM-MDN approach called “ResMDN”. This hybrid approach balances the tractability and ease of understanding of a traditional GLM model on one hand with the additional accuracy and distributional flexibility provided by the MDN on the other. We show that it can successfully improve the errors of the baseline ccODP, although there is generally a loss of performance when compared to the MDN in the examples we considered. Second, we allow for explicit projection constraints, so that actuarial judgement can be directly incorporated in the modelling process.
Throughout, we focus on aggregate loss triangles, and show that our methodologies are tractable and that they outperform traditional approaches even with relatively limited amounts of data. We use both simulated data (to validate properties) and real data (to illustrate and ascertain the practicality of the approaches).
Home Energy Management Systems (HEMSs) are expected to become an inevitable part of the future smart grid technologies. To work effectively, HEMSs require reliable and accurate load forecasts. In this paper, two new modelling methods are presented. They are both suited for producing multivariate probabilistic forecasts, which consider the temporal correlation between forecast horizons. The first method employs point forecasts generated with Recursive Least Squares (RLS) models and subsequently analyses the forecasts’ residuals to estimate the marginal distributions and temporal correlation. The second method is based on quantile regression to estimate marginal distributions, and a Gaussian copula for linking them together. Furthermore, the application of two modelling approaches for the temporal correlation estimation are investigated for each of the two modelling methods. As a case study, a numerical experiment is designed to emulate an online HEMS operation using data from an inhabited home located in Denmark. Simulation results show a robust performance for the proposed models, with the quantile–copula ensemble outperforming the RLS-based models in predicting the marginal distributions and capturing the temporal correlation.
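The quantile-copula idea described above can be sketched as follows: a Gaussian copula supplies the temporal correlation across forecast horizons, while each horizon keeps its own marginal distribution. All names and the exponential marginals below are illustrative assumptions, not the cited paper's implementation:

```python
from math import erf, sqrt
import numpy as np

# Standard normal CDF, vectorized via math.erf (no SciPy dependency)
norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))

def copula_scenarios(quantile_fns, corr, n_samples, seed=0):
    """Draw multivariate load scenarios: correlated standard-normal draws
    are mapped to uniforms with the normal CDF (the Gaussian copula), then
    each uniform is pushed through that horizon's marginal inverse CDF."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(quantile_fns)), corr, size=n_samples)
    u = norm_cdf(z)  # correlated uniforms on (0, 1)
    return np.column_stack([q(u[:, i]) for i, q in enumerate(quantile_fns)])

# Example: two horizons with exponential marginals (hypothetical means 1.5 and 2.0 kW)
q1 = lambda u: -1.5 * np.log(1.0 - u)
q2 = lambda u: -2.0 * np.log(1.0 - u)
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
scenarios = copula_scenarios([q1, q2], corr, n_samples=2000)
```

The resulting scenarios respect each horizon's marginal distribution while remaining positively correlated across horizons, which is the property the HEMS case study exploits.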
Larger shares of electricity generation based on volatile renewables often lead to high curtailment rates and thus a loss of carbon-neutral energy. Small-scale residential power-to-heat applications can help to improve this situation by flexibly increasing electricity demand and thus integrating otherwise curtailed renewable power production. Nevertheless, to include such flexibilities in overall system operation there is a need for a reliable quantitative planning and action basis. For that, we propose a method to perform probabilistic day-ahead forecasts of available thermal storage capacities for residential power-to-heat operation based on artificial neural networks. The prediction is structured as two-step approach, consisting of a day-ahead prediction of storage temperatures and a subsequent derivation of available storage capacities. In order to better address uncertainties in the residential sector, probabilistic forecasts are carried out. For the temperature prediction a neural network structure consisting of long short-term memory layers and a mixture density output is used. The predicted probability distributions of storage temperatures are subsequently sampled and transformed to probability distributions of storage capacities. To ensure suitable hyper-parameter configurations, an automated optimization of these parameters is carried out. For a demonstration of the general applicability of the approach a case study is performed based on data of a single-family household in northern Germany. We compare the approach to different deterministic and probabilistic benchmark forecasting models, showing that the proposed approach clearly outperforms the benchmark models.
Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce naturally-sounding synthesized speech. However, there are limitations in the current implementation of DNN-based acoustic modeling for speech synthesis, such as the unimodal nature of its objective function and its lack of ability to predict variances. To address these limitations, this paper investigates the use of a mixture density output layer. It can estimate full probability density functions over real-valued output features conditioned on the corresponding input features. Experimental results in objective and subjective evaluations show that the use of the mixture density output layer improves the prediction accuracy of acoustic features and the naturalness of the synthesized speech.
Machine learning techniques have proven effective at forecasting the power output of wind turbine generators. However, predictions typically use a single input vector of NWP forecasts, disregarding the potentially informative history of previous inputs. Moreover, prediction uncertainty is often provided only when NWP ensembles are available. We address these shortcomings by using mixture density recurrent neural networks to forecast a time-dependent probability distribution over power outputs. Using historical wind farm data, we demonstrate the viability of our approach for power prediction up to 48 h, and provide a comparison with multilayer perceptrons and baseline predictors.
The goal of the Smart* project is to optimize home energy consumption. As part of the project, we have designed and deployed a "live" system that continuously gathers a wide variety of environmental and operational data in three real homes. In contrast to prior work, our focus has been on sensing depth, i.e., collecting as much data as possible from each home, rather than breadth, i.e., collecting data from as many homes as possible. Our data captures many important aspects of the home environment, including average household electricity usage every second, as well as usage at every circuit and nearly every plug load, electricity generation data from on-site solar panels and wind turbines, outdoor weather data, temperature and humidity data in indoor rooms, and, finally, data for a range of important binary events, e.g., at wall switches, the HVAC system, doors, and from motion sensors. We also have electricity usage data every minute from 400 anonymous homes. This data corpus has served as the foundation for much of our recent research. In this paper, we describe our data sets as well as basic software tools we have developed to facilitate their collection. We are releasing both the data and tools publicly to the research community to foster future research on designing sustainable homes.
Virtual Power Plant (VPP) denotes a combination of Distributed Energy Resources, seen as a single entity from a market point of view, which aims to maximize its profit according to different business cases. This paper introduces a VPP concept based on an optimization layer on top of a Real Time Control approach, using a market-based Multi Agent System (MAS) which controls the flexibility of electricity and heat demand and supply, with dynamic pricing as the steering signal. The peculiarity of this approach is to derive the thermal constraints of the optimization from the market bids rather than from thermal models. This reduces all the thermal constraints to a single constraint, leading to a faster optimization that scales well with the number of houses and can be executed often to minimize the impact of forecast and model uncertainties. In this paper, a VPP simulation with 50 households has been implemented and shows significant cost-savings potential.
Smart electricity meters are currently deployed in millions of households to collect detailed individual electricity consumption data. Compared with traditional electricity data based on aggregated consumption, smart meter data are much more volatile and less predictable. There is a need within the energy industry for probabilistic forecasts of household electricity consumption to quantify the uncertainty of future electricity demand in order to undertake appropriate planning of generation and distribution. We propose to estimate an additive quantile regression model for a set of quantiles of the future distribution using a boosting procedure. By doing so, we can benefit from flexible and interpretable models, which include an automatic variable selection. We compare our approach with three benchmark methods on both aggregated and disaggregated scales using a smart meter data set collected from 3639 households in Ireland at 30-min intervals over a period of 1.5 years. The empirical results demonstrate that our approach based on quantile regression provides better forecast accuracy for disaggregated demand, while the traditional approach based on a normality assumption (possibly after an appropriate Box-Cox transformation) is a better approximation for aggregated demand. These results are particularly useful since more energy data will become available at the disaggregated level in the future.
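Quantile-regression approaches like the one above are trained on the pinball (quantile) loss; minimizing its average drives the prediction toward the target quantile of the conditional distribution. A minimal sketch (the function name is ours):

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball loss for quantile level tau: under-prediction is penalized
    with weight tau, over-prediction with weight (1 - tau)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Example at tau = 0.9: overshooting a high quantile is penalized lightly,
# undershooting it heavily, so the fitted forecast sits above most outcomes.
loss_over = pinball_loss([1.0], [2.0], tau=0.9)   # over-prediction by 1
loss_under = pinball_loss([2.0], [1.0], tau=0.9)  # under-prediction by 1
```

Averaged over a test set and a grid of quantile levels, this loss is also the standard accuracy measure for comparing the probabilistic forecasts discussed in these studies.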
Load forecasting has been a fundamental business problem since the inception of the electric power industry. Over the past 100 plus years, both research efforts and industry practices in this area have focused primarily on point load forecasting. In the most recent decade, though, the increased market competition, aging infrastructure and renewable integration requirements mean that probabilistic load forecasting has become more and more important to energy systems planning and operations. This paper offers a tutorial review of probabilistic electric load forecasting, including notable techniques, methodologies and evaluation methods, and common misunderstandings. We also underline the need to invest in additional research, such as reproducible case studies, probabilistic load forecast evaluation and valuation, and a consideration of emerging technologies and energy policies in the probabilistic load forecasting process.
A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the basis of the available information set. We formalize and study notions of calibration in a prediction space setting. In practice, probabilistic calibration can be checked by examining probability integral transform (PIT) histograms. Proper scoring rules such as the logarithmic score and the continuous ranked probability score serve to assess calibration and sharpness simultaneously. As a special case, consistent scoring functions provide decision-theoretically coherent tools for evaluating point forecasts. We emphasize methodological links to parametric and nonparametric distributional regression techniques, which attempt to model and to estimate conditional distribution functions; we use the context of statistically postprocessed ensemble forecasts in numerical weather prediction as an example. Throughout, we illustrate concepts and methodologies in data examples.
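The PIT check described above can be sketched for Gaussian density forecasts: each observation is passed through its own forecast CDF, and calibration shows up as a flat histogram of the transformed values. All names and values below are illustrative assumptions:

```python
from math import erf, sqrt
import numpy as np

def pit_values(y_obs, mus, sigmas):
    """Probability integral transform under Gaussian forecasts N(mu_t, sigma_t):
    PIT_t = F_t(y_t). If the forecaster is calibrated, the PIT values are
    uniform on [0, 1], which a flat PIT histogram reveals."""
    cdf = np.vectorize(lambda y, m, s: 0.5 * (1.0 + erf((y - m) / (s * sqrt(2.0)))))
    return cdf(np.asarray(y_obs), np.asarray(mus), np.asarray(sigmas))

# Calibrated case: observations are actually drawn from the forecast densities
rng = np.random.default_rng(1)
mus = rng.normal(size=5000)                  # hypothetical forecast means
y = rng.normal(loc=mus, scale=1.0)           # true spread matches forecast sigma = 1
pit = pit_values(y, mus, np.ones_like(mus))
hist, _ = np.histogram(pit, bins=10, range=(0.0, 1.0))
# A roughly flat histogram (about 500 counts per bin) indicates calibration;
# a U-shape would signal overconfident (too sharp) forecasts.
```

The same transform underlies the proper scoring rules mentioned above: the logarithmic score evaluates the forecast density at the observation, rewarding sharpness only when calibration holds.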