The Impact of External Features on Prediction
Accuracy in Short-term Energy Forecasting
Maher Selim, Ryan Zhou, Wenying Feng and Omar Alam
Trent University, Peterborough, ON, Canada. Corresponding: wfeng@trentu.ca (Wenying Feng).
Abstract Accurate prediction of future electricity demand is important in the energy industry. Machine learning for time series prediction provides solutions for short-term energy forecasting through a variety of algorithms, such as LSTM, SVR, XGBoost, and Facebook Prophet. However, many companies rely primarily on univariate time series algorithms, even though numerous external data sources, e.g., weather data, are available as input features for energy forecasting. In this paper, we study the impact of external features on the performance of univariate and multivariate time series algorithms for short-term energy forecasting, using a standard benchmark energy dataset. Quantitative comparisons of prediction accuracy, measured by Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE), are reported for all models. We find that multivariate algorithms using external features outperform univariate algorithms, and that multivariate algorithms achieve reasonable accuracy even without using past energy consumption as an input feature.
1 Introduction
Forecasting energy demand is critical for the energy industry as well as businesses in
related sectors, such as banks and insurance companies. It has been estimated that
a one-percent improvement in mean absolute percentage error (MAPE) can save
$300,000 annually for a utility company with a 1 GW peak load, and millions of
dollars for larger ones [1]. An accurate forecast of upcoming energy consumption allows utility companies to plan and make decisions in real time for all processes in their system, and is a requirement for building automated smart energy grids [2, 3].
However, this problem has become more complex in recent years due to growing energy markets [1] and the introduction of renewable sources, which are tightly coupled with external variables such as weather conditions. Despite the
fact that energy forecasting is increasingly a multivariate problem, many companies in the energy sector continue to use univariate time series algorithms, which consider only the usage history of electricity consumption.
Electricity forecasting using data-driven approaches such as machine learning is the subject of ongoing research [1]. A recent survey [1] shows that the percentages of machine learning algorithms investigated for Short-Term Energy Load Forecasting (STELF) are as follows: 4% decision trees, 24% statistical and other algorithms, 25% support vector machines (SVM), and 47% artificial neural networks (ANN) [1, 4, 5].
Research in energy forecasting focuses primarily on the improvement of univariate models [6, 7]. Our contributions in this paper are twofold. First, we test representative models using algorithms from each of the surveyed categories above [1], and demonstrate that multivariate approaches consistently outperform the univariate Facebook Prophet model for energy forecasting. To this end, four computational models are adapted to our research purpose and tested: Long Short-Term Memory neural networks (LSTM) [8], Support Vector Regression (SVR) [9], Gradient Boosted Trees [10], and Facebook Prophet [11]. Prediction accuracy for all models is compared using Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). Second, we show that even when the past energy load is excluded as a feature, the models are capable of predicting future load from external features alone, even when those features are measured on a larger timescale than the target variable. These results are of interest to the energy industry, as they demonstrate a simple and computationally light way to improve currently used models.
The remainder of this paper is organized as follows. Section 2 provides an overview of the machine learning time series algorithms studied. Section 3 discusses the computational models and the implementation methodology. Section 4 presents our experimental results, and Section 5 concludes the paper.
2 Notable Multivariate and Univariate Machine Learning Time Series Algorithms
We first briefly discuss the mathematical background for notable time series algorithms, including the multivariate algorithms LSTM [8], SVR [9], and Gradient Boosted Trees via XGBoost [10], and the univariate Facebook Prophet package [11]. This forms the basis for our model development and implementation, explained in Section 3.
Univariate and multivariate time series models: A univariate time series is a set of sequential observations of a single variable taken at constant time steps [12]. Univariate models aim to predict future values of that single variable based only on its past values:

$$\hat{x}_t = F(x_{t-1}, x_{t-2}, x_{t-3}, \ldots)$$
where $x_t$ represents the value of the target variable at time $t$, and $F$ is the learned function. A multivariate time series is defined as observations of two or more variables, often taken simultaneously, and describes the interrelationships among the series [13, 14, 15]. Multivariate models use multiple variables and features of the time series data to forecast future values of the target variable:
$$\hat{x}_t = F(x_{t-1}, x_{t-2}, x_{t-3}, \ldots, a^{(1)}_{t-1}, a^{(1)}_{t-2}, a^{(1)}_{t-3}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, a^{(2)}_{t-3}, \ldots)$$
where each $a^{(i)}$ represents the time series of an external feature. We also investigate models of the form

$$\hat{x}_t = F(a^{(1)}_{t-1}, a^{(1)}_{t-2}, a^{(1)}_{t-3}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, a^{(2)}_{t-3}, \ldots)$$
where past information about the target variable is unavailable.
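To make these formulations concrete, the following sketch (illustrative, not the exact code used in our experiments) shows how a window of past values can be assembled into supervised input-output pairs for each of the three cases. The array names and window length are placeholders.

    import numpy as np

    def make_windows(target, external, window=50, use_target=True, use_external=True):
        # target: 1-D array of x_t; external: 2-D array (time, features) of a^(i)_t
        X, y = [], []
        for t in range(window, len(target)):
            parts = []
            if use_target:
                parts.append(target[t - window:t])            # x_{t-1}, ..., x_{t-window}
            if use_external:
                parts.append(external[t - window:t].ravel())  # external feature histories
            X.append(np.concatenate(parts))
            y.append(target[t])                               # value to predict, x_t
        return np.array(X), np.array(y)

    # use_target=False with use_external=True reproduces the external-features-only form.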
Long Short-Term Memory (LSTM) neural networks: LSTM is a type of recurrent neural network architecture designed to extract long-term dependencies out of sequential data and avoid the vanishing gradient problem present in ordinary recurrent networks [16, 17]. These properties make LSTM the method of choice for
longer time series and sequence prediction problems [18, 19]. LSTMs have been successfully applied to STELF modeling [2, 8]. There are several variations of the LSTM unit; in this paper we use the standard architecture described by Graves and Schmidhuber [16].
The key idea behind LSTM is to introduce a memory cell to the standard RNN
architecture [2, 8]. This memory cell allows the LSTM module to retain information
across many timesteps when needed [18, 19].
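As a concrete illustration, a minimal Keras version of such an LSTM regressor, matching the configuration later listed in Table 1, might look as follows; the input shapes and feature count are assumptions based on our windowed data, not the exact experiment code.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    def build_lstm(timesteps, n_features):
        model = Sequential([
            LSTM(50, input_shape=(timesteps, n_features)),  # 50 LSTM neurons
            Dense(1),                                       # one output neuron for the load
        ])
        model.compile(loss="mae", optimizer="adam")
        return model

    # model = build_lstm(timesteps=50, n_features=6)
    # model.fit(X_train, y_train, epochs=300, batch_size=72)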
Support Vector Regression (SVR): Nonlinear support vector regression is an extension of the support vector machine (SVM) to regression problems [20]; the statistical learning theory behind it is developed in [21]. Assuming that $D = \{(x_i, y_i)\}_{i=1}^{n}$ is a training dataset, where $x_i \in \mathbb{R}^d$ are the system features and $y_i \in \mathbb{R}$ are the observed system outputs, the goal of $\varepsilon$-SVR is to find a function $f(x)$ that deviates by no more than $\varepsilon$ from the observed output $y_i$ for all training data.
This leads to the support vector regression optimization problem:

$$
\begin{aligned}
\min_{w}\quad & \frac{1}{2}\|w\|^2 + C\sum_{n}\left(\xi_n + \hat{\xi}_n\right) \\
\text{s.t.}\quad & y_n - w^{T}x_n - \xi_n \le \varepsilon, \\
& -(y_n - w^{T}x_n) - \hat{\xi}_n \le \varepsilon, \\
& \xi_n \ge 0,\ \hat{\xi}_n \ge 0,
\end{aligned}
$$
where $w$ is the learned weight vector, $x_n$ is the $n$-th training instance, $y_n$ is the training label, and $\xi_n$ is the distance between the bounds and predicted values lying outside the bounds. $C$ is a user-set parameter that controls the penalty imposed on observations outside the bounds, which helps to prevent overfitting. SVR uses kernel functions to transform the data into a higher-dimensional feature space in which linear separation becomes possible. In this paper, we use three different kernels with SVR, namely (a) linear, (b) polynomial, and (c) radial basis function (RBF).
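For illustration, the three kernel variants can be instantiated in scikit-learn as below, with hyperparameters taken from Table 1; this is a sketch, and X_train, y_train, X_test stand for the windowed features and load targets.

    from sklearn.svm import SVR

    svr_rbf = SVR(kernel="rbf", C=1e3, gamma=0.1)     # radial basis function kernel
    svr_linear = SVR(kernel="linear", C=1e3)          # linear kernel
    svr_poly = SVR(kernel="poly", C=1e3, degree=3)    # cubic polynomial kernel

    # for model in (svr_rbf, svr_linear, svr_poly):
    #     model.fit(X_train, y_train)
    #     y_pred = model.predict(X_test)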
Facebook Prophet: Prophet uses a decomposable time series model [22, 11] with three components: trend, seasonality, and holidays. They are combined additively as follows:

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t, \qquad (1)$$

where $g(t)$ is a piecewise-linear or logistic growth curve modelling the trend, i.e., non-periodic changes in the value of the time series; $s(t)$ represents periodic changes (e.g., weekly and yearly seasonality); and $h(t)$ represents the effects of holidays, which occur on potentially irregular schedules over one or more days. The error term $\varepsilon_t$ represents any idiosyncratic changes not accommodated by the model; the package assumes that $\varepsilon_t$ is normally distributed.
Using time as a regressor, Prophet attempts to fit several linear and nonlinear functions of time as components. Modeling seasonality as an additive component is the same approach taken by exponential smoothing in the Holt-Winters technique. The package frames forecasting as a curve-fitting problem rather than looking explicitly at the time-based dependence of each observation within the series, which means it is not designed for multivariate time series.
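A minimal usage sketch, assuming a series of half-hourly load observations, is shown below; the ds/y column names are required by the library, the timestamps and load variables are placeholders, and the periods/freq values follow Table 1.

    import pandas as pd
    from fbprophet import Prophet  # newer releases: from prophet import Prophet

    # One row per half-hour: 'ds' (timestamp) and 'y' (observed load).
    df = pd.DataFrame({"ds": timestamps, "y": load})   # placeholder inputs
    m = Prophet()                                      # default trend/seasonality/holidays
    m.fit(df)
    future = m.make_future_dataframe(periods=1500, freq="30T")  # 1500 half-hour steps
    forecast = m.predict(future)                       # includes yhat, yhat_lower, yhat_upper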
XGBoost regression: Gradient boosting is an ensemble technique that creates a prediction model by aggregating the predictions of weak prediction models, typically decision trees. With boosting methods, weak predictors are added to the collection sequentially, each one attempting to improve upon the entire ensemble's performance.
In the XGBoost implementation [23], given a dataset with $n$ training examples consisting of an input $x_i$ and expected output $y_i$, a tree ensemble model $\phi(x_i)$ is defined as the sum of $K$ regression trees $f_k(x_i)$:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i).$$
To evaluate the performance of a given model, we choose a loss function $l(\hat{y}_i, y_i)$ to measure the error between the predicted value and the target value, and optionally add a regularization term $\Omega(f_k)$ to penalize overly complex trees:

$$L(\phi) = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k).$$
The algorithm minimizes $L(\phi)$ by introducing each $f_k$ iteratively. Assume that the ensemble currently contains $K$ trees. We add a new tree $f_{K+1}$ that minimizes

$$\sum_{i=1}^{n} l\left(y_i,\, \hat{y}_i + f_{K+1}(x_i)\right) + \Omega(f_{K+1}),$$

or in other words, we greedily add the tree that most improves the current model as measured by $L$. The new tree is trained against this objective; in practice this is done by approximating the objective using the first- and second-order gradients of the loss function $l(\hat{y}_i, y_i)$ [24].
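Using XGBoost's scikit-learn interface, this boosting procedure reduces to a few lines; the parameter values mirror Table 1, and the data variables are placeholders rather than the exact experiment code.

    from xgboost import XGBRegressor

    model = XGBRegressor(
        booster="gbtree",
        n_estimators=100,      # K, the number of trees added greedily
        learning_rate=0.1,
        max_depth=3,
        gamma=0,               # minimum loss reduction required to split a node
        colsample_bytree=1,
        max_delta_step=0,
    )
    # model.fit(X_train, y_train)
    # y_pred = model.predict(X_test)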
3 Implementation Methodology

We implement the four algorithms described above in Python, using the scikit-learn and Keras packages with TensorFlow as a backend [25, 26], and the Python implementation of Prophet [11]. Table 1 shows the configuration parameters used in the experiments for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Facebook Prophet). For further details regarding the implementation of the algorithms, the reader may consult our longer paper [27] and the online documentation of the packages [25, 26, 11].
Table 1 Configuration parameters for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Facebook Prophet)

Model              Configuration parameters
LSTM               Input layer, 50 LSTM neurons, 1-neuron output layer;
                   loss (mae), optimizer (adam), epochs (300), batch size (72)
SVR (RBF)          kernel='rbf', C=1e3, gamma=0.1
SVR (Linear)       kernel='linear', C=1e3
SVR (Poly)         kernel='poly', C=1e3, degree=3
Gradient Boosting  booster (gbtree), colsample_bytree (1), gamma (0),
                   learning rate (0.1), max delta step (0), max depth (3), no. of estimators (100)
Facebook Prophet   Default parameters; periods (1500), freq (30T)
Before being fed into the models, categorical features are encoded as numerical
values and all features are subsequently normalized to lie in the interval [0,1]. To
test the effect of external features, we reframe the data into three different datasets
for testing: one set consisting of univariate time series with no external features,
one consisting of the full multivariate time series with all features, and one containing external features alone, with no energy time series information. The time series
datasets are converted to input-output pairs for supervised learning by considering a sliding window of 50 timesteps, in which the windowed portion of the series is used to predict the next timestep.

Fig. 1 (a) Training and testing processes. (b) Evaluation process.
The models are trained on each dataset with a 75/25 training/validation split, as shown in Fig. 1, and evaluated on one month of reserved testing data. To avoid data leakage, we split the data such that all points in the validation set occur chronologically after those in the training set, and all points in the testing set occur after both.
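A sketch of this leakage-free chronological split, assuming arrays X and y already framed into windows, is given below; the one-month test length of 31 days is illustrative.

    n = len(X)
    n_test = 48 * 31                    # one month of half-hourly steps reserved for testing
    n_train = int(0.75 * (n - n_test))  # 75/25 split of the remainder

    X_train, y_train = X[:n_train], y[:n_train]                  # earliest data
    X_val, y_val = X[n_train:n - n_test], y[n_train:n - n_test]  # chronologically later
    X_test, y_test = X[n - n_test:], y[n - n_test:]              # latest month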
The performance of the models is evaluated by two commonly used metrics in
forecasting, root-mean-square error (RMSE) and mean absolute percentage error
(MAPE), defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad (2)$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100. \qquad (3)$$
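Both metrics are straightforward to compute directly; a minimal sketch, assuming NumPy arrays of equal length with no zero loads:

    import numpy as np

    def rmse(y_true, y_pred):
        return np.sqrt(np.mean((y_true - y_pred) ** 2))            # Eq. (2)

    def mape(y_true, y_pred):
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # Eq. (3)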
The data used for comparison is a well-studied [4, 28] dataset obtained from
the 2001 European Network on Intelligent Technologies (EUNITE) competition for
electricity load forecasting [29]. This data comes from the Eastern Slovakian Electricity Corporation and spans two years, from January 1, 1997 until December 31,
1998. It includes the following features: the half-hourly electricity load, the daily
average temperature, and a flag signifying whether the day is a holiday. The partial
autocorrelation is shown in Fig. 2; we note a one-timestep dependency as expected
for time series, as well as a spike around 48 timesteps corresponding to daily cycles.
We also notice a spike at 336 timesteps, corresponding to weekly cycles. For this reason, the most important past lags for our time series models to incorporate are 1 and, ideally, 48 and 336 as well.
Fig. 2 (a) A correlation matrix. (b) Partial autocorrelation for the load [28].
A correlation coefficient of -0.8676 between the daily peak load and the daily
average temperature indicates a strong relationship between the electrical load and
weather conditions [28]. Analysis of the dataset shows that the load generally decreases on holidays and weekends [4, 28], likely due to businesses shutting down. The effect varies with the specific holiday; on Christmas or New Year, for example, electricity consumption is affected more than on other holidays. Based on these observations, we choose as input features for our experiments the past loads, the daily temperature, the time of day, the month, the day of the week, and whether the day is a holiday [4, 28]. Categorical features are converted to integer codes using LabelEncoder from scikit-learn, and all features are then normalized to lie in the range [0, 1] using MinMaxScaler.
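For illustration, this preprocessing might be implemented as follows; data stands for a hypothetical pandas DataFrame of the raw features, and the column names are placeholders.

    from sklearn.preprocessing import LabelEncoder, MinMaxScaler

    # Integer-encode categorical columns (e.g., day of week) before scaling.
    data["day_of_week"] = LabelEncoder().fit_transform(data["day_of_week"])
    data["holiday"] = data["holiday"].astype(int)      # binary holiday flag

    # Rescale every feature into [0, 1].
    features = ["load", "temperature", "hour", "month", "day_of_week", "holiday"]
    scaled = MinMaxScaler().fit_transform(data[features])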
4 Experimental Results
A one-month forecast obtained from the four models is shown in Fig. 3; the first 100 timesteps (50 hours) are plotted. Qualitatively, the forecast is fairly accurate for the LSTM, SVR (RBF and linear), and XGBoost models, while being considerably worse for SVR (polynomial). The figure also shows that the forecasts obtained from Prophet consistently overestimate the actual value while failing to capture small-scale variations in the load behaviour. We believe that the superior performance of the LSTM, SVR, and XGBoost models is due to the incorporation of multivariate data. Note that although the external features provided to the models are measured daily (temperature is provided as a daily average), the multivariate models still exhibit superior performance.
Fig. 3 Predictions (blue) compared with actual values (orange) for 100 time steps in January 1999 using (a) LSTM, (b) SVR, (c) XGBoost, and (d) the Facebook Prophet package.

Table 2 shows the RMSE and MAPE values for each model when predicting the half-hourly load for one month. Among the multivariate models, the LSTM obtains the highest accuracy, with a MAPE of 1.51 percent and an RMSE of 13.5 MW, followed by SVR (RBF), SVR (Linear), and XGBoost, each with a MAPE of 2.1 percent and RMSE values of 17.6 MW, 17.9 MW, and 16.64 MW, respectively. The lowest accuracy is obtained by the univariate Facebook Prophet model, with a MAPE of 14.4 percent and an RMSE of 102.2 MW.
Table 2 MAPE and RMSE for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Facebook Prophet)

           LSTM    SVR (RBF)  SVR (Linear)  SVR (Poly)  XGBoost  Facebook Prophet
MAPE (%)   1.51    2.1        2.1           4.0         2.1      14.4
RMSE (MW)  13.5    17.6       17.9          36.7        16.64    102.2
To estimate the contribution of the external features to the accuracy of the multivariate models, we repeated the experiment for the LSTM, SVR, and XGBoost models without using past power consumption as an input feature. Table 3 shows the resulting RMSE and MAPE values for each model when predicting the half-hourly load for one month. The LSTM model again obtains the highest accuracy, with a MAPE of 6.1 percent and an RMSE of 51.161 MW, followed by SVR (Poly) and XGBoost with MAPE values of 6.4 and 7.5 percent, respectively. We note that the multivariate models still achieve reasonable accuracy and outperform the univariate model even without using past power consumption as an input feature.
Table 3 MAPE and RMSE for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Prophet) without using past power consumption as an input feature

           LSTM    SVR (RBF)  SVR (Linear)  SVR (Poly)  XGBoost  Facebook Prophet
MAPE (%)   6.1     16.0       12.1          6.4         7.5      14.4
RMSE (MW)  51.161  128.355    96.036        52.863      63.135   102.2
5 Conclusion

External features, even when provided on longer timescales than the time series of interest, can prove useful for improving prediction accuracy. In this work, we compared four time series forecasting algorithms (LSTM, SVR, XGBoost, and the Prophet package) on the problem of short-term energy load forecasting. We showed that although the external features of interest (e.g., temperature and holidays) are measured on a daily basis, they considerably increase the accuracy of the forecast for multivariate models as compared to the univariate model. Even when past load values are not provided, the models achieve reasonable accuracy based only on these external features and the time of day.

As future work, we intend to use datasets from other areas, such as finance and medical applications, to investigate the consistency of algorithm performance. We will also consider developing new computational models that take advantage of both multivariate and univariate time series algorithms.
Acknowledgements The project was supported by a grant from the Natural Sciences and Engi-
neering Research Council of Canada (NSERC).
References
1. Kadir Amasyali and Nora M El-Gohary. A review of data-driven building energy consumption
prediction studies. Renewable and Sustainable Energy Reviews, 81:1192–1205, 2018.
2. Filippo Maria Bianchi, Enrico Maiorino, Michael C Kampffmeyer, Antonello Rizzi, and
Robert Jenssen. An overview and comparative analysis of recurrent neural networks for short
term load forecasting. arXiv preprint arXiv:1705.04378, 2017.
3. Ahmed I Saleh, Asmaa H Rabie, and Khaled M Abo-Al-Ez. A data mining based load fore-
casting strategy for smart electrical grids. Advanced Engineering Informatics, 30(3):422–448,
2016.
4. Bo-Juen Chen, Ming-Wei Chang, and Chih-Jen Lin. Load forecasting using support vector machines: A study on EUNITE competition 2001. IEEE Transactions on Power Systems, 19(4):1821–1830, 2004.
5. Lars Dannecker. Energy Time Series Forecasting: Efficient and Accurate Forecasting of Evolv-
ing Time Series from the Energy Domain. Springer, 2015.
6. Feng Jiang, Xue Yang, and Shuyu Li. Comparison of forecasting India's energy demand using an MGM, ARIMA model, MGM-ARIMA model, and BP neural network model. Sustainability, 10(7):2225, 2018.
7. Chaoqing Yuan, Sifeng Liu, and Zhigeng Fang. Comparison of China's primary energy consumption forecasting by using ARIMA (the autoregressive integrated moving average) model and GM(1,1) model. Energy, 100:384–390, 2016.
8. Apurva Narayan and Keith W Hipel. Long short term memory networks for short-term electric
load forecasting. In 2017 IEEE International Conference on Systems, Man, and Cybernetics
(SMC), pages 1050–1059, Banff Center, Banff, Canada, October 5-8 2017.
9. Yongbao Chen, Peng Xu, Yiyi Chu, Weilin Li, Yuntao Wu, Lizhou Ni, Yi Bao, and Kun
Wang. Short-term electrical load forecasting using the support vector regression (svr) model
to calculate the demand response baseline for office buildings. Applied Energy, 195:659 –
670, 2017.
10. GY Li, Wei Li, XL Tian, and YF Che. Short-term electricity load forecasting based on the XGBoost algorithm. Smart Grid, 7:274–285, 2017.
11. Sean J Taylor and Benjamin Letham. Forecasting at scale. The American Statistician,
72(1):37–45, 2018.
12. D.C. Montgomery, C.L. Jennings, and M. Kulahci. Introduction to Time Series Analysis and
Forecasting. Wiley Series in Probability and Statistics. Wiley, 2015.
13. C. Chatfield. Time-Series Forecasting. CRC Press, 2000.
14. Ruey S Tsay. Multivariate time series analysis: with R and financial applications. John Wiley
& Sons, 2013.
15. Kasturi Kanchymalay, Naomie Salim, Anupong Sukprasert, Ramesh Krishnan, and
Ummi Raba’ah Hashim. Multivariate time series forecasting of crude palm oil price using
machine learning techniques. In IOP Conference Series: Materials Science and Engineering,
volume 226, page 012117. IOP Publishing, 2017.
16. Alex Graves and Jürgen Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602–610, 2005.
17. Christopher Olah. Understanding LSTM networks. GitHub blog, posted on August 27, 2015.
18. John Cristian Borges Gamboa. Deep learning for time-series analysis. arXiv preprint
arXiv:1701.01887, 2017.
19. Lingxue Zhu and Nikolay Laptev. Deep and confident prediction for time series at uber. arXiv
preprint arXiv:1709.01907, 2017.
20. Alex J Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3):199–222, 2004.
21. Vladimir Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 2013.
22. Andrew C Harvey and Simon Peters. Estimation procedures for structural time series models.
Journal of Forecasting, 9(2):89–108, 1990.
23. Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
24. Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additive logistic regression: a statis-
tical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics,
28(2):337–407, 2000.
25. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
26. François Chollet et al. Keras. https://keras.io, 2015.
27. Maher Selim, Ryan Zhou, Wenying Feng, and Peter Quinsey. Uncertainty for energy forecasting using Bayesian deep learning. Submitted to Mathematical Foundations of Computing (MFC), 2020.
28. Jawad Nagi, Keem Siah Yap, Farrukh Nagi, Sieh Kiong Tiong, and Syed Khaleel Ahmed.
A computational intelligence scheme for the prediction of the daily peak load. Applied Soft
Computing, 11(8):4773–4788, 2011.
29. EUNITE. EUNITE electricity load forecast 2001 competition. Proceedings of EUNITE, December 2001.