
The Impact of External Features on Prediction

Accuracy in Short-term Energy Forecasting

Maher Selim, Ryan Zhou, Wenying Feng and Omar Alam

Abstract Accurate prediction of future electricity demand is important in the energy industry. Machine learning for time series prediction provides solutions for short-term energy forecasting through a variety of algorithms, such as LSTM, SVR, XGBoost, and Facebook Prophet. However, many companies rely primarily on univariate time series algorithms, while numerous external data, e.g., weather data, are available as input features for energy forecasting. In this paper, we study the impact of external features on the performance of univariate and multivariate time series algorithms for short-term energy forecasting using a standard benchmark energy data set. Quantitative comparisons of prediction accuracy, measured by Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE), are obtained for the models. It is found that multivariate algorithms using external features outperform univariate algorithms, and that multivariate algorithms achieve reasonable accuracy even without using past-step energy consumption as an input feature.

1 Introduction

Forecasting energy demand is critical for the energy industry as well as businesses in

related sectors, such as banks and insurance companies. It has been estimated that

a one-percent improvement in mean absolute percentage error (MAPE) can save

$300,000 annually for a utility company with a 1 GW peak load, and millions of

dollars for larger ones [1]. An accurate forecast of upcoming energy consumption

allows utility companies to plan and make decisions in real-time for all processes

in their system and is a requirement to build automated smart energy grids [2, 3].

However, this problem has increasingly become more complex in recent years due

to growing energy markets [1] and the introduction of renewable sources which

are tightly coupled with external variables such as weather conditions. Despite the

M. Selim, R. Zhou, W. Feng, O. Alam

Trent University, Peterborough, ON Canada. Corresponding: wfeng@trentu.ca (Wenying Feng).


fact that energy forecasting is increasingly becoming a multivariate problem, many

companies in the energy sector continue to use univariate time series algorithms,

which only consider the usage history of electricity consumption.

Electricity forecasting using data-driven approaches, such as machine learning, is the subject of ongoing research [1]. A recent survey [1] shows that the percentages of machine learning algorithms investigated for Short-Term Energy Load Forecasting (STELF) are as follows: 4% decision trees, 24% statistical and other algorithms, 25% support vector machines (SVM), and 47% artificial neural networks (ANN) [1, 4, 5].

Research in energy forecasting focuses primarily on the improvement of univariate models [6, 7]. Our contributions in this paper are twofold. First, we test representative models using algorithms from each of the surveyed categories above [1], and demonstrate that multivariate approaches consistently outperform the univariate model Facebook Prophet for energy forecasting. To this end, four computational models are adapted to our research purpose and tested: Long Short-Term Memory neural networks (LSTM) [8], Support Vector Regression (SVR) [9], Gradient Boosted Trees [10], and Facebook Prophet [11]. Prediction accuracy for all models is compared with Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). Second, we show that the models are capable of predicting future load from external features even when the past energy load is excluded as a feature, and even when the features are measured on a larger timescale than the target variable. These results are of interest to the energy industry as they demonstrate a simple and computationally light method to improve currently used models.

The remainder of this paper is organized as follows. Section 2 provides an overview of the machine learning time series algorithms studied. Section 3 discusses the computational models and the implementation methodology. Section 4 explains our experimental results, and Section 5 concludes the paper.

2 Notable multivariate and univariate machine learning time

series algorithms

We first briefly discuss the mathematical background for notable time series algorithms, including the multivariate algorithms LSTM [8], SVR [9], and Gradient Boosted Trees via XGBoost [10], and the univariate Facebook Prophet package [11]. This forms the basis for our model development and implementation to be explained in Section 3.

Univariate and multivariate time series models: A univariate time series is a set of continuous observations of a single variable at constant time steps [12]. Univariate models aim to predict future values of that single variable based only on its past values:

\hat{x}_t = F(x_{t-1}, x_{t-2}, x_{t-3}, \ldots)


where $x_t$ represents the value of the target variable at time $t$, and $F$ is the learned function. A multivariate time series is defined as observations of one or more variables and features, often taken simultaneously, and describes the interrelationships among the series [13, 14, 15]. Multivariate models use the variables and features of the time series data to develop a model that forecasts future values of the target variable:

\hat{x}_t = F(x_{t-1}, x_{t-2}, x_{t-3}, \ldots, a^{(1)}_{t-1}, a^{(1)}_{t-2}, a^{(1)}_{t-3}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, a^{(2)}_{t-3}, \ldots)

where each $a^{(i)}$ represents the time series of an external feature. We also investigate models of the form

\hat{x}_t = F(a^{(1)}_{t-1}, a^{(1)}_{t-2}, a^{(1)}_{t-3}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, a^{(2)}_{t-3}, \ldots)

where past information about the target variable is unavailable.
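The two multivariate forms above translate directly into supervised learning inputs. The sketch below (function and variable names are illustrative, not the paper's code) builds a flat lag-feature matrix, optionally dropping the past target values as in the second form:

```python
import numpy as np

def make_lagged_inputs(target, externals, n_lags, use_past_target=True):
    """Build (X, y) pairs for one-step-ahead forecasting.

    target: 1-D array of x_t values.
    externals: 2-D array (timesteps, features) of a_t^(i) values.
    """
    X, y = [], []
    for t in range(n_lags, len(target)):
        row = []
        if use_past_target:
            row.extend(target[t - n_lags:t])          # x_{t-1}, ..., x_{t-n}
        row.extend(externals[t - n_lags:t].ravel())   # a^{(i)}_{t-1}, ...
        X.append(row)
        y.append(target[t])
    return np.array(X), np.array(y)
```

Passing `use_past_target=False` yields inputs for the second form, where past information about the target is unavailable.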

Long Short-Term Memory (LSTM) neural networks: LSTM is a type of recurrent neural network architecture designed to extract long-term dependencies from sequential data and to avoid the vanishing gradient problem present in ordinary recurrent networks [16, 17]. These properties make LSTM the method of choice for longer time series and sequence prediction problems [18, 19]. LSTMs have been successfully applied to Short-Term Electricity Load Forecasting (STELF) modeling [2, 8]. There are several variations of the LSTM unit, but in this paper we use the standard architecture designed by Graves and Schmidhuber [16].

The key idea behind LSTM is to introduce a memory cell into the standard RNN architecture [2, 8]. This memory cell allows the LSTM module to retain information across many timesteps when needed [18, 19].
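The gating mechanism behind the memory cell can be sketched in plain NumPy (the gate ordering and variable names here are illustrative, not the exact Keras implementation used later):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W, U, b stack the input, forget, output, and candidate gates
    (4 * hidden rows), applied to the input and the previous hidden state.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c_t = f * c_prev + i * g     # memory cell carries long-term state forward
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

Because the cell state $c_t$ is updated additively rather than through repeated matrix multiplication, gradients can flow across many timesteps without vanishing.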

Support Vector Regression (SVR): Nonlinear support vector regression is an extension of the support vector machine (SVM) to regression problems [20]. The statistical learning theory for support vector regression is developed in [21]. Assuming that $D = \{x_i, y_i\}_{i=1}^{n}$ is a training dataset, where $x_i \in \mathbb{R}^d$ are the system features and $y_i \in \mathbb{R}$ are the observed system outputs, the goal of $\varepsilon$-SVR is to find a function $f(x)$ that deviates by no more than $\varepsilon$ from the observed output $y_i$ for all training data.

This leads to the Support Vector Regression optimization problem:

\min_{w} \quad \frac{1}{2}\|w\|^2 + C \sum_{n} \left( \xi_n + \hat{\xi}_n \right)

\text{s.t.} \quad y_n - w^T x_n - \xi_n \le \varepsilon,
\quad -(y_n - w^T x_n) - \hat{\xi}_n \le \varepsilon,
\quad \xi_n \ge 0, \; \hat{\xi}_n \ge 0,

where $w$ is the learned weight vector, $x_i$ is the $i$-th training instance, $y_i$ is the training label, and $\xi_i$ is the distance between the bounds and predicted values outside the bounds. $C$ is a user-set parameter that controls the penalty imposed on observations outside the bounds, which helps to prevent overfitting. SVR uses kernel functions to transform the data into a higher-dimensional feature space in which a linear fit becomes possible. In this paper, we use three different kernels with SVR, namely (a) Linear, (b) Polynomial, and (c) Radial Basis Function (RBF).
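As a sketch of how the three kernels are compared (the C, gamma, and degree values follow Table 1; the toy data stands in for the normalized feature matrix described in Section 3):

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data standing in for the normalized features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.01 * rng.normal(size=200)

# The three kernels studied in this paper, with the Table 1 settings.
models = {
    "rbf": SVR(kernel="rbf", C=1e3, gamma=0.1),
    "linear": SVR(kernel="linear", C=1e3),
    "poly": SVR(kernel="poly", C=1e3, degree=3),
}
for name, model in models.items():
    model.fit(X[:150], y[:150])       # fit on the first 150 points
    preds = model.predict(X[150:])    # predict the held-out 50 points
```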

Facebook Prophet: Prophet uses a decomposable time series model [22, 11] with three components: trend, seasonality, and holidays. They are combined additively as follows:

y(t) = g(t) + s(t) + h(t) + \varepsilon_t, \qquad (1)

where $g(t)$ is a piece-wise linear or logistic growth curve modelling the trend, i.e., non-periodic changes in the value of the time series, $s(t)$ represents periodic changes (e.g., weekly and yearly seasonality), and $h(t)$ represents the effects of holidays, which occur on potentially irregular schedules over one or more days. The error term $\varepsilon_t$ represents any idiosyncratic changes not accommodated by the model; the package assumes that $\varepsilon_t$ is normally distributed.

Using time as a regressor, Prophet attempts to fit several linear and nonlinear functions of time as components. Modeling seasonality as an additive component is the same approach taken by exponential smoothing in the Holt-Winters technique. The package frames the forecasting problem as curve-fitting rather than looking explicitly at the time-based dependence of each observation within a time series. This means that it is not designed for multivariate time series.
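The additive structure of Eq. (1) can be illustrated with synthetic components (all values below are made up purely for illustration; Prophet itself fits g, s, and h from data):

```python
import numpy as np

t = np.arange(365, dtype=float)               # one year of daily timestamps
g = 100.0 + 0.05 * t                          # trend: a single linear piece
s = 10.0 * np.sin(2.0 * np.pi * t / 7.0)      # weekly seasonality
h = np.where(t % 100 == 0, -20.0, 0.0)        # occasional holiday dips
eps = np.random.default_rng(1).normal(0.0, 1.0, t.shape)

y = g + s + h + eps                           # Eq. (1): y(t) = g(t) + s(t) + h(t) + eps
```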

XGBoost regression: Gradient boosting is an ensemble technique that creates a prediction model by aggregating the predictions of weak prediction models, typically decision trees. With boosting methods, weak predictors are added to the collection sequentially, with each one attempting to improve upon the entire ensemble's performance.

In the XGBoost implementation [23], given a dataset with $n$ training examples consisting of an input $x_i$ and expected output $y_i$, a tree ensemble model $\phi(x_i)$ is defined as the sum of $K$ regression trees $f_k(x_i)$:

\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i).

To evaluate the performance of a given model, we choose a loss function $l(\hat{y}_i, y_i)$ to measure the error between the predicted value and the target value, and optionally add a regularization term $\Omega(f_k)$ to penalize overly complex trees:

L(\phi) = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k).


The algorithm minimizes $L(\phi)$ by iteratively introducing each $f_k$. Assume that the ensemble currently contains $K$ trees. We add a new tree $f_{K+1}$ that minimizes

\sum_{i=1}^{n} l\left(\hat{y}_i + f_{K+1}(x_i),\, y_i\right) + \Omega(f_{K+1}),

or in other words, we greedily add the tree that most improves the current model as determined by $L$. We train the new tree using this objective function; in practice, this is done by approximating the objective function using the first- and second-order gradients of the loss function $l(\hat{y}_i, y_i)$ [24].
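For squared loss, the greedy step above reduces to fitting each new tree to the current residuals. A minimal sketch of this idea (using scikit-learn trees as weak learners rather than the actual XGBoost implementation, and omitting the regularizer Ω):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, K=50, lr=0.1, max_depth=3):
    """Gradient boosting with squared loss: each new tree f_k fits the
    negative gradient of the loss (the residual y - y_hat), greedily
    reducing the ensemble objective."""
    pred = np.full(len(y), y.mean())         # start from the mean prediction
    trees = []
    for _ in range(K):
        residual = y - pred                  # -dl/dy_hat for squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += lr * tree.predict(X)         # add the new tree, scaled by lr
        trees.append(tree)
    return trees, pred
```

The learning rate `lr` and `max_depth` mirror the roles of the corresponding Table 1 parameters.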

3 Implementation Methodology

We implement the four algorithms described above in Python, using the scikit-learn and Keras packages with TensorFlow as a backend [25, 26], and the Python implementation of Prophet [11]. Table 1 shows the configuration parameters used in the experiment for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Facebook Prophet). For more details regarding the implementation of the algorithms, the reader may consult our longer paper [27] and the online package documentation [25, 26, 11].

Table 1 Configuration parameters for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Facebook Prophet)

Model              Configuration parameters
LSTM               Input layer, 50 LSTM neurons, 1-neuron output layer;
                   loss (mae), optimizer (adam), epochs (300), batch size (72)
SVR (RBF)          kernel='rbf', C=1e3, gamma=0.1
SVR (Linear)       kernel='linear', C=1e3
SVR (Poly)         kernel='poly', C=1e3, degree=3
Gradient Boosting  booster (gbtree), colsample_bytree (1), gamma (0),
                   learning rate (0.1), delta step (0), max depth (3), no. estimators (100)
Facebook Prophet   Default parameters, periods (1500), freq (30T)

Before being fed into the models, categorical features are encoded as numerical values and all features are subsequently normalized to lie in the interval [0, 1]. To test the effect of external features, we reframe the data into three different datasets for testing: one consisting of the univariate time series with no external features, one consisting of the full multivariate time series with all features, and one containing the external features alone with no energy time series information. The time series datasets are converted to input-output pairs for supervised learning by considering

6 M. Selim, R. Zhou, W. Feng and O. Alam

Fig. 1 (a) Training and testing processes; (b) evaluating process

a sliding window of 50 timesteps, in which the windowed portion of the series is used to predict the next timestep.
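This windowing step can be sketched as follows (illustrative code, assuming the load is the first column of the feature matrix):

```python
import numpy as np

def sliding_windows(series, window=50):
    """Convert a (timesteps, features) array into supervised pairs:
    X has shape (samples, window, features) -- suitable for an LSTM --
    and y holds the next-step value of the target (column 0)."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:, 0]
    return X, y
```

Flattening each window of `X` yields the 2-D input matrix expected by SVR and the tree-based models.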

The models are trained on each dataset with a 75/25 training/validation split, as shown in Fig. 1, and evaluated on one month of reserved testing data. To avoid data leakage, we split the data in such a way that all data points in the validation set occur chronologically later than those in the training set, and all data in the testing set occur after both.
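A sketch of such a leakage-free split (function name illustrative):

```python
import numpy as np

def chronological_split(X, y, train_frac=0.75):
    """Split ordered data so every validation point comes after all
    training points in time, avoiding look-ahead leakage."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Unlike a random split, this preserves the temporal ordering that the leakage constraint requires.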

The performance of the models is evaluated by two metrics commonly used in forecasting, root-mean-square error (RMSE) and mean absolute percentage error (MAPE), defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}, \qquad (2)

\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100. \qquad (3)
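Equations (2) and (3) translate directly into NumPy:

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error, Eq. (2)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    """Mean absolute percentage error, Eq. (3); assumes y has no zeros."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

# rmse([100, 200], [110, 190]) -> 10.0
# mape([100, 200], [110, 190]) -> 7.5
```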

The data used for comparison is a well-studied [4, 28] dataset obtained from the 2001 European Network on Intelligent Technologies (EUNITE) competition for electricity load forecasting [29]. This data comes from the Eastern Slovakian Electricity Corporation and spans the two years from January 1, 1997 to December 31, 1998. It includes the following features: the half-hourly electricity load, the daily average temperature, and a flag signifying whether the day is a holiday. The partial autocorrelation is shown in Fig. 2; we note a one-timestep dependency, as expected for time series, as well as a spike around 48 timesteps corresponding to daily cycles. We also notice a spike at 336 timesteps corresponding to weekly cycles. For this reason, the important past values for our time series model to incorporate are a lag of 1, and ideally 48 and 336 as well.


Fig. 2 (a) A correlation matrix; (b) partial autocorrelation for the load [28]

A correlation coefficient of -0.8676 between the daily peak load and the daily average temperature indicates a strong relationship between the electrical load and weather conditions [28]. Analysis of the dataset shows that the load generally decreases on holidays and weekends [4, 28], likely due to businesses shutting down. This varies depending on the specific holiday; on Christmas or New Year, for example, electricity consumption is affected more than on other holidays. Based on these observations, we choose as input features for our experiments the past loads, the daily temperature, the time of day, the month, the day of the week, and whether the day is a holiday [4, 28]. These features are encoded as numerical or binary values and normalized to lie in the range [0, 1] using the MinMaxScaler from scikit-learn, while categorical features are integer-encoded using the LabelEncoder from scikit-learn.
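This preprocessing can be sketched as follows (toy values, not the actual dataset; note that LabelEncoder produces one integer code per category):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy stand-ins for the features described above.
temps = np.array([[-5.0], [12.0], [30.0]])        # daily average temperature
days = ["Mon", "Sat", "Sun"]                      # day of the week (categorical)

# Scale numerical features into [0, 1].
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(temps)

# Encode categorical features as integer codes.
day_codes = LabelEncoder().fit_transform(days)
```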

4 Experimental Results

A one-month forecast obtained from the four models is shown in Fig. 3 for 100 timesteps (50 hours). Qualitatively, it can be seen from the figure that the forecast is fairly accurate for the LSTM, SVR (RBF, Linear), and XGBoost models, while being considerably worse for SVR (Polynomial). The figure also shows that the forecasts obtained from Prophet consistently overestimate the actual value, while at the same time not capturing small-scale variations in the load behaviour. We believe that the superior performance of the LSTM, SVR, and XGBoost models is due to the incorporation of multivariate data. Note that although the external features provided to the models are measured daily (temperature is provided as a daily average), the multivariate models still exhibit superior performance.

Table 2 shows the RMSE and MAPE values for each model when predicting the half-hourly load for one month. Among the multivariate models, the LSTM obtains the highest accuracy, with a MAPE of 1.51 percent and an RMSE of 13.5 MW, followed by SVR (RBF), SVR (Linear), and XGBoost with MAPE values of 2.1 percent and RMSE values of 17.6 MW, 17.9 MW, and 18.2 MW, respectively. The lowest accuracy is obtained by the univariate Facebook Prophet model, with a MAPE of 14.4 percent and an RMSE of 102.2 MW.

Fig. 3 Predictions (blue) compared with actual values (orange) for 100 time steps in January 1999 using (a) LSTM, (b) SVR, (c) XGBoost, (d) the Facebook Prophet package

Table 2 MAPE and RMSE for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Facebook Prophet)

          LSTM  SVR (RBF)  SVR (Linear)  SVR (Poly)  Gradient Boosting  Facebook Prophet
MAPE (%)  1.51  2.1        2.1           4.0         2.1                14.4
RMSE (MW) 13.5  17.6       17.9          36.7        16.64              102.2

To estimate the contribution of the external features to the accuracy of the multivariate models, we conducted the same experiment for the LSTM, SVR, and XGBoost models without using past power consumption as an input feature. Table 3 shows the resulting RMSE and MAPE values for each model when predicting the half-hourly load for one month. The LSTM model obtains the highest accuracy, with a MAPE of 6.1 percent and an RMSE of 51.161 MW, followed by SVR (Poly) and XGBoost with MAPE values of 6.4 and 7.5 percent, respectively. We note that the multivariate models still achieve reasonable accuracy and outperform the univariate model even without using past power consumption as an input feature.


Table 3 MAPE and RMSE for the multivariate models (LSTM, SVR, XGBoost) and the univariate model (Prophet) without using past power consumption as an input feature

          LSTM    SVR (RBF)  SVR (Linear)  SVR (Poly)  XGBoost  Facebook Prophet
MAPE (%)  6.1     16.0       12.1          6.4         7.5      14.4
RMSE (MW) 51.161  128.355    96.036        52.863      63.135   102.2

5 Conclusion

External features, even when provided on longer timescales than the time series of interest, can prove useful for improving prediction accuracy. In this work, we compare four time series forecasting algorithms (LSTM, SVR, XGBoost, and the Prophet package) on the problem of short-term energy load forecasting. We show that despite the external features of interest (e.g., temperature and holidays) being measured on a daily basis, they considerably increase the accuracy of the forecast for multivariate models as compared to the univariate model. Even when past values are not provided to the model, the models achieve reasonable accuracy based only on these external features and the time of day.

As future work, we intend to use datasets from other areas, such as finance and medical applications, to investigate the consistency of algorithm performance. We will also consider developing new computational models that take advantage of both multivariate and univariate time series algorithms.

Acknowledgements The project was supported by a grant from the Natural Sciences and Engi-

neering Research Council of Canada (NSERC).

References

1. Kadir Amasyali and Nora M El-Gohary. A review of data-driven building energy consumption

prediction studies. Renewable and Sustainable Energy Reviews, 81:1192–1205, 2018.

2. Filippo Maria Bianchi, Enrico Maiorino, Michael C Kampffmeyer, Antonello Rizzi, and

Robert Jenssen. An overview and comparative analysis of recurrent neural networks for short

term load forecasting. arXiv preprint arXiv:1705.04378, 2017.

3. Ahmed I Saleh, Asmaa H Rabie, and Khaled M Abo-Al-Ez. A data mining based load fore-

casting strategy for smart electrical grids. Advanced Engineering Informatics, 30(3):422–448,

2016.

4. Bo-Juen Chen, Ming-Wei Chang, and Chih-Jen Lin. Load forecasting using support vector machines: A study on EUNITE competition 2001. IEEE Transactions on Power Systems, 19(4):1821–1830, 2004.

5. Lars Dannecker. Energy Time Series Forecasting: Efﬁcient and Accurate Forecasting of Evolv-

ing Time Series from the Energy Domain. Springer, 2015.

6. Feng Jiang, Xue Yang, and Shuyu Li. Comparison of forecasting india’s energy demand

using an mgm, arima model, mgm-arima model, and bp neural network model. Sustainability,

10(7):2225, 2018.

10 M. Selim, R. Zhou, W. Feng and O. Alam

7. Chaoqing Yuan, Sifeng Liu, and Zhigeng Fang. Comparison of china’s primary energy con-

sumption forecasting by using arima (the autoregressive integrated moving average) model

and gm (1, 1) model. Energy, 100:384–390, 2016.

8. Apurva Narayan and Keith W Hipel. Long short term memory networks for short-term electric

load forecasting. In 2017 IEEE International Conference on Systems, Man, and Cybernetics

(SMC), pages 1050–1059, Banff Center, Banff, Canada, October 5-8 2017.

9. Yongbao Chen, Peng Xu, Yiyi Chu, Weilin Li, Yuntao Wu, Lizhou Ni, Yi Bao, and Kun

Wang. Short-term electrical load forecasting using the support vector regression (svr) model

to calculate the demand response baseline for ofﬁce buildings. Applied Energy, 195:659 –

670, 2017.

10. GY Li, Wei Li, XL Tian, and YF Che. Short-term electricity load forecasting based on the

xgboost algorithm. Smart Grid, 07:274–285, 01 2017.

11. Sean J Taylor and Benjamin Letham. Forecasting at scale. The American Statistician,

72(1):37–45, 2018.

12. D.C. Montgomery, C.L. Jennings, and M. Kulahci. Introduction to Time Series Analysis and

Forecasting. Wiley Series in Probability and Statistics. Wiley, 2015.

13. C. Chatﬁeld. Time-Series Forecasting. CRC Press, 2000.

14. Ruey S Tsay. Multivariate time series analysis: with R and ﬁnancial applications. John Wiley

& Sons, 2013.

15. Kasturi Kanchymalay, Naomie Salim, Anupong Sukprasert, Ramesh Krishnan, and

Ummi Raba’ah Hashim. Multivariate time series forecasting of crude palm oil price using

machine learning techniques. In IOP Conference Series: Materials Science and Engineering,

volume 226, page 012117. IOP Publishing, 2017.

16. Alex Graves and Jürgen Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602–610, 2005.

17. Christopher Olah. Understanding lstm networks. GITHUB blog, posted on August, 27:2015,

2015.

18. John Cristian Borges Gamboa. Deep learning for time-series analysis. arXiv preprint

arXiv:1701.01887, 2017.

19. Lingxue Zhu and Nikolay Laptev. Deep and conﬁdent prediction for time series at uber. arXiv

preprint arXiv:1709.01907, 2017.

20. Alex J Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3):199–222, 2004.

21. Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media,

2013.

22. Andrew C Harvey and Simon Peters. Estimation procedures for structural time series models.

Journal of Forecasting, 9(2):89–108, 1990.

23. Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings

of the 22nd acm sigkdd international conference on knowledge discovery and data mining,

pages 785–794. ACM, 2016.

24. Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additive logistic regression: a statis-

tical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics,

28(2):337–407, 2000.

25. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

26. François Chollet et al. Keras. https://keras.io, 2015.

27. Maher Selim, Ryan Zhou, Wenying Feng, and Peter Quinsey. Uncertainty for energy forecasting using Bayesian deep learning. Submitted to the Journal of Mathematical Foundations of Computing (MFC), 2020.

28. Jawad Nagi, Keem Siah Yap, Farrukh Nagi, Sieh Kiong Tiong, and Syed Khaleel Ahmed.

A computational intelligence scheme for the prediction of the daily peak load. Applied Soft

Computing, 11(8):4773–4788, 2011.

29. EUNITE. EUNITE electricity load forecast 2001 competition. Proceedings of EUNITE, December 2001.