A Survey on Hyperparameters Optimization Algorithms of Forecasting
Models in Smart Grid
Rabiya Khalid, Nadeem Javaid∗
Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
ARTICLE INFO
Keywords:
Forecasting
Hyperparameters
Parameter Tuning
Data Preprocessing
Training Algorithms
Outliers in Data
Processing Time
ABSTRACT
Forecasting in the smart grid (SG) plays a vital role in maintaining the balance between the demand and supply of electricity, efficient energy management, and better planning of energy generation units and renewable energy sources and their dispatching and scheduling. Existing forecasting models are being used and new models are developed for a wide range of SG applications. These algorithms have hyperparameters which need to be optimized carefully before forecasting. Optimized values of these hyperparameters increase forecasting accuracy to a significant extent. In this paper, we present a brief literature review of forecasting models and the optimization methods used to tune their hyperparameters. In addition, we also discuss data preprocessing methods. A comparative analysis of these forecasting models, according to their hyperparameter optimization, error and preprocessing methods, is also presented. Besides, we critically analyze the existing optimization and data preprocessing models and highlight the important findings. A survey of existing survey papers is also presented and their recency score is computed based on the number of recent papers reviewed in them. By recent, we mean the year in which a survey paper is published and its previous three years. Finally, future research directions are discussed in detail.
1. Introduction
The advancement in technology and the increased usage of smart devices have brought about the concept of big data. The production of data is growing rapidly and, according to [1], the volume of big data will increase by a factor of 300 in the upcoming years. The cost of data storage has also been reduced, which is paving the way for storing more data and using it in the future. Big data has become a new focus for researchers, playing a very important role in the engineering, science and technology domains in acquiring efficient solutions to problems. In SG, a huge volume of data is being gathered and stored from sensors, smart meters and other smart devices, which can be used for efficient planning and forecasting. So, forecasting using big data has become a new hot
topic in this domain. The rapid increase in electricity consumption, the intermittent nature of renewable energy sources (RESs) and fluctuations in demand are serious issues for a power system. The core aims of SG are to achieve a balance between the demand and supply of electricity, increase the grid's reliability and efficiency, and make the grid environment friendly. Forecasting, in this regard, plays a very important role. It enables the utility to plan and organize future decisions related to power generation, electricity prices and the coordination of electricity generating units, and to get maximum benefit out of them. Electricity load and price forecasting have gained great attention from researchers in this area, as these two factors have a great influence on maintaining the stability of the grid. Additionally, the forecasting of power failures, the stability of transformers and the power network, anomalies, blackouts and the energy generation of RESs are also studied and their solutions are provided in the literature.
∗Principal corresponding author: nadeemjavaidqau@gmail.com (N. Javaid), http://www.njavaid.com (N. Javaid). ORCID(s): 0000-0003-3777-8249 (N. Javaid)
The prediction of future values of SG components is of great importance, as these values play an important role as an input to current decisions [2]. For example, knowledge of future load demand can play a very important role in setting electricity price values. It is also important for the utility when making different policies related to energy. Several decisions related to power are based on information about future load. Similarly, a consumer can use the forecasted values of electricity prices and change their load consumption pattern accordingly. In recent years, researchers have proposed a large number of forecasting models for accurate load and price prediction. These can be used in the maintenance of power networks, better scheduling of energy generators, continuous energy provision to consumers, achieving stability in the demand and supply of electricity and maintaining grid reliability. Moreover, effective planning and decision making can save millions of dollars, which is very important for the economic growth of a company as well as a country.
In the literature, researchers have proposed several forecasting methods to predict the load and price of electricity. Figure 1 shows the classification of forecasting algorithms. Support vector machines (SVM) and neural networks (NNs) are the forecasting algorithms most commonly used by researchers to make predictions in the SG area. Different variants of these algorithms are also available and used. Bayesian networks (BNs) are also among the frequently used forecasting methods. In addition to these three types, several other forecasting techniques are also implemented for prediction in SG. In forecasting algorithms, the accuracy is greatly affected by the hyperparameters. So, the values of these parameters should be chosen carefully.
In this paper, we have presented a survey of algorithms
used for the optimization of hyperparameters in SG. Their
values vary from problem to problem and need to be optimized accurately for correct prediction. Inefficient optimization of these parameters results in poor accuracy, and a model that could otherwise be the best choice for forecasting does not perform well. To overcome this issue, different optimization methods are applied to optimize these parameters.

[Figure 1: Types of forecasting algorithms and optimization techniques. Forecasting in SG is performed with ANN, SVM, BN and other methods, whose weights/hyperparameters are optimized using nature-inspired techniques [3]-[17], grid search [18]-[23], gradient descent [24]-[30], cross validation [31]-[36] and other statistical methods [37]-[45].]
Figure 1 shows the optimization methods commonly used for the parameter optimization of forecasting models. Here, grid search, gradient descent and cross validation are the most frequently used methods. Nature-inspired methods are also proposed by researchers to efficiently optimize these parameters. Similar surveys are presented in the existing literature [2], [3], [4]-[6]. However, [2] and [5] surveyed the hyperparameter tuning methods of the least square support vector machine (LSSVM) only. Similarly, in [3], randomized algorithms are surveyed for tuning NNs, [4] presents a survey of metaheuristic algorithms to train the random single-hidden layer feedforward neural network (RSLFN), and [6] contains a survey of heuristic algorithms for tuning SVM. In our survey, we have reviewed both nature-inspired and statistical methods to tune the hyperparameters of SVMs, NNs, BNs and their variants. The contributions of our survey are as follows:
• A detailed review of forecasting models (from 2014 onwards) and of the optimization methods used to tune the hyperparameters of these models is presented.
• Data preprocessing methods used in these studies are also discussed.
• All the forecasting models are critically analyzed and future research directions are also presented.
• In the related work section, a survey of similar survey papers is presented and their recency score is also computed.
The rest of the paper is organized as follows: Section 2 contains similar work, and in Section 3, we discuss the proposed framework of a forecasting model, which contains all the necessary steps for forecasting, from data gathering to the final output phase. Section 4 contains a detailed discussion of forecasting techniques and the optimization methods used to optimize the values of their hyperparameters. In Section 5, common data preprocessing methods are discussed. Section 6 contains the critical analysis and findings of this survey. In Section 7, future directions related to hyperparameter optimizers are discussed. Finally, the paper is concluded in Section 8.
2. Related work
Hyperparameter optimization is considered very impor-
tant for the forecasting accuracy of algorithms. Researchers
are using the already existing optimizers and also proposing
new optimization algorithms for tuning them. The improvement
of forecasting accuracy is an ongoing research area. So, to
summarize this research and provide compact information
to the readers, survey papers are published. The overview
of some of the most related literature to our work is given
below.
2.1. Survey of surveys (similar work)
A survey of hyperparameter tuning of LSSVM is pre-
sented in [2]. The performance of cross validation, evolu-
tionary algorithms and swarm intelligence optimization meth-
ods is compared. The values of the regularization parameters
and kernel parameters are optimized using these methods. In
this study, it is concluded that the evolutionary algorithms
are the best choice for tuning these hyperparameters.
In [3], a survey of randomized algorithms to train NNs is presented. It is stated that these algorithms can enhance
both the performance and efficiency of NNs. The authors
have discussed the use of randomization in kernel methods
and classified the model into several parts based on network
configuration. The future challenges and directions in this
area are also discussed in this article.
Han et al. provided a survey on the hyperparameter tun-
ing of RSLFN in [4]. In this forecasting model, the ini-
tial values of hyperparameters are chosen randomly and op-
timized iteratively. The authors state that the forecasting
accuracy of this model depends on the accurate selection
of the number of hidden neurons and other hyperparame-
ters. For a careful selection, several optimization methods are used, among which metaheuristic optimization methods are one of the most common. The authors presented a comprehensive survey of hyperparameter tuning of
RSLFN using metaheuristic optimization methods. The fu-
ture research direction and possible challenges are also dis-
cussed in detail.
Afshin et al. presented a comparative study for the hy-
perparameter tuning methods for LSSVM in [5]. The au-
thors state that the accurate selection of these parameters is
essential to achieve high accuracy in short-term forecasting. Four parameter optimization methods are compared: the genetic algorithm (GA), cross validation, simulated annealing and the Bayesian evidence framework. From this comparison, it is concluded that the Bayesian evidence framework achieves the highest accuracy and is also the fastest.
In [6], the authors survey nature-inspired algorithms for tuning the hyperparameters of the support vector regressor (SVR).
This survey is specific for the regression problem related to
inverse ECG. It is stated that to obtain accurate results, it
is important to tune the hyperparameters carefully. Three
optimization algorithms are used: GA, particle swarm op-
timization (PSO) and differential evolution (DE). For a fair
comparison, SVR is trained with each of these optimizers on the same dataset. The simulations are carried out, and the results show that SVR performs best when its hyperparameters are tuned using PSO.
Elsken et al. [7] have explored the research field of au-
tomated neural architecture search methods. It is stated that architecture search performed manually by humans is error-prone and time-consuming, whereas the automated methods are fast and have a minimal chance of error. Every phase of
these methods is automatic and models are selected when
their performance meets predefined selection criteria. The
hyperparameters of these methods are also selected automat-
ically. Models belonging to this category are classified into
three types based on their search space, strategy and perfor-
mance evaluation method.
In [8], different hyperparameter optimization methods
are tested to check which one of them is best. To evalu-
ate the performance of these methods and for a fair com-
parison, all the methods are applied to the defect prediction
problem. After the simulation experiments, it is concluded
that a single hyperparameter tuning method cannot be declared the best. Also, the results of some methods depict
that the default configurations produce the same accuracy as
the accuracy achieved by parameter tuning. It is concluded
that the hyperparameter optimization can play a significant
role in the improvement of forecasting accuracy and the op-
timization method should be selected carefully according to
the nature of prediction data and forecasting model.
Bergstra et al. [9] have presented a comparison of hyper-
parameter tuning methods for artificial NN (ANN) and deep
belief networks (DBN). The performance of random search
and sequential methods is compared for tuning these parameters. Performance-wise, random search produces
good results for ANN but its performance in the case of DBN
is not up to the mark. On the other hand, the sequential meth-
ods give better accuracy with the DBN forecasting model.
The accuracy of the most complex DBN is also improved
by using these methods. In this study, the dependency of
several parameters on hyperparameters is also highlighted.
A survey on deep learning-based forecasting models is
presented in [10]. In this work, the authors state that hyper-
parameter optimization is an important and time-consuming
task, so optimization methods should be used. In this paper,
only those forecasting algorithms are reviewed whose hy-
perparameters are optimized using swarm intelligence and
evolutionary algorithms. The effect of these optimization al-
gorithms on the prediction accuracy of deep learning-based
forecasting methods is analyzed for big data applications.
Additionally, commonly used deep learning methods are also
discussed along with their weaknesses and strengths. After
a comprehensive discussion of forecasting models, the core
findings of this survey are discussed and issues related to
deep learning-based methods are highlighted which need to
be improved. Moreover, future research directions in this domain are also discussed. The existing literature is critically analyzed in the "problem and challenges" section.
Karaboga et al. presented a survey of the adaptive network-
based fuzzy inference system (ANFIS) and the optimization
methods used to train its parameters in [11]. From the sur-
vey, it is observed that both derivative and non-derivative
based optimization algorithms are used for ANFIS training.
The former include gradient descent, least-squares estimation, etc., while the latter include heuristic algorithms. It is observed that
heuristic optimization methods perform better for model train-
ing. Hence, the focus of this survey is to present the per-
formance of heuristic models and the introduction of new
hybrid algorithms for ANFIS training. A brief discussion
related to these models is presented and the work is concluded.
However, critical analysis and future challenges are not in-
cluded in this work.
In [12], a survey of forecasting models for workload pre-
diction in cloud computing is presented. Workload predic-
tion plays a very important role in the efficiency and relia-
bility of a cloud system. It helps to improve the quality of
services provided by a cloud as the energy required by data
centers and cloud resources can be estimated, the scalabil-
ity of service providers can be increased, etc. In this survey,
the challenges of workload prediction are discussed in de-
tail and workload is also classified according to the architec-
tural requirements, computing model, resource requirements
and other non-functional requirements. Moreover, a detailed
survey of regression-based models, classification model and
stochastic-based forecasting models is presented. Besides, future research directions and a critical analysis of the existing schemes are also provided in this survey.
Hossain et al. [13] presented a survey of big data and
machine learning applications in SG. The applications of
IoT are also discussed in detail as this technology provides
connectivity between several smart electric devices. The in-
teraction of these devices generates the data in a huge vol-
ume which is known as big data. This data is used by re-
searchers and machine learning techniques are applied to get
meaningful information. Electricity load and price forecast-
ing are two very important applications of this data and ma-
chine learning methods. This study contains a comprehen-
sive discussion of big data analytics and machine learning
algorithms for forecasting. Moreover, cybersecurity is a big
issue of IoT integrated systems as attackers target the smart
devices and data. In this study, this issue is also discussed
in detail. In the end, the survey is concluded and, in the outcome section, some future research directions and critical comments are discussed.
In [14], a survey of big data analytics in SG is presented.
The authors have identified the research gaps and barriers to
big data implementation in this domain. A comprehensive
review of the existing literature is presented and challenges
are highlighted. Moreover, a detailed discussion of future research directions for big data integration in SG is also provided, and applications of big data in the SG domain are discussed in detail. Some important applications are as
follows: energy management, improved efficiency and relia-
bility of SG, state estimation, cyber-physical systems, etc. In
addition, for utility companies, a deep insight is provided on
how big data can be beneficial for developing new business
models.
The effects of load forecasting on a microgrid are investi-
gated in [15]. The focus of this study is to evaluate the role of
power generation and consumption on a renewable resource-
Table 1
Criteria for calculating recency score

Percentage (%)   Weight   Recency score
0-10             0.1      *
11-20            0.2      *
21-30            0.3      **
31-40            0.4      **
41-50            0.5      ***
51-60            0.6      ***
61-70            0.7      ****
71-80            0.8      ****
81-90            0.9      *****
91-100           1.0      *****
based microgrid. For this purpose, papers are selected from
high-impact and well-cited journal publications. The forecasting models are analyzed on the basis of cost metrics, reserve size estimation, market benefits, improved reliability of the microgrid, etc. This study aims to provide guidelines to new researchers about the trends in optimal planning and operation of a microgrid.
In [16], a survey of forecasting algorithms for food sales
predictions is presented. Future statistics related to food item sales help sellers avoid the problems of missing products and stacks of expired products. They can plan the purchasing of their food items according to the predicted values and reduce the monetary loss in their business while increasing the profit. In this study, a comprehensive literature review of forecasting models in food sales is carried
out. Additionally, the evaluation metrics of forecasting mod-
els are also discussed. Finally, the paper is concluded with
a discussion on the existing forecasting models and oppor-
tunities in this domain. However, the critical analysis and
future research directions are missing in this paper.
2.2. Recency score and comparative analysis
The percentage of recency is obtained by multiplying the total number of recent papers by 100 and then dividing it by the total number of papers. By recent, we mean the year in which a paper is published and its previous 3 years. We propose an equation to compute the recency percentage of a paper, which is as follows:

$$\text{Recency (\%)} = \frac{\sum_{i=d-3}^{d} \text{Total number of papers from year } i}{\text{Total number of papers cited in a survey}} \times 100 \quad (1)$$

In the above equation, $d$ represents the year in which a survey paper is published. Table 1 shows in detail how a paper is assigned a weight according to its percentage value. According to the weight of each paper, stars are assigned. If the weight assigned to a paper is 0.9 or 1.0, five stars are given; on the other hand, if the assigned weight is 0.1 or 0.2, only one star is given. These stars represent how many recent papers are cited in a survey.
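For concreteness, equation 1 and the weighting of Table 1 can be sketched as a short Python script; the function names and the example paper counts below are illustrative and not taken from any of the surveyed papers.

def recency_percentage(papers_per_year, survey_year, total_cited):
    """Eq. (1): share of cited papers from the survey's year and its previous 3 years."""
    recent = sum(papers_per_year.get(y, 0) for y in range(survey_year - 3, survey_year + 1))
    return 100.0 * recent / total_cited

def recency_stars(percentage):
    """Map a recency percentage to the weight and star rating of Table 1."""
    weight = min(max(int((percentage - 1) // 10) + 1, 1), 10) / 10.0  # 0-10% -> 0.1, ..., 91-100% -> 1.0
    return "*" * ((int(weight * 10) + 1) // 2)                        # two weight levels per star

# Hypothetical example: a 2018 survey citing 50 papers, 28 of them from 2015-2018.
pct = recency_percentage({2015: 5, 2016: 7, 2017: 9, 2018: 7}, 2018, 50)
print(f"{pct:.0f}% -> {recency_stars(pct)}")  # 56% -> ***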
Table 2
Comparative analysis of related work (Yes/No indicate whether future challenges and critical analysis are included)

Survey | Future challenges | Domain | Critical analysis | Recency score | Significance
A survey of hyperparameter tuning of LSSVM [2] | No | Not domain specific | No | * | Hyperparameter tuning of LSSVM
A survey on NN [3] | Yes | Not domain specific | No | ** | Hyperparameter tuning algorithms for NN
A survey on hyperparameter tuning of RSLFN [4] | Yes | Not domain specific | No | ** | Hyperparameter tuning of RSLFN using metaheuristic optimization algorithms
A comparative study for tuning of LSSVM [5] | No | Not domain specific | No | ** | Hyperparameter tuning of LSSVM
A comparative study for tuning SVR [6] | No | Not domain specific | No | ** | Hyperparameter tuning of SVR using heuristic algorithms
A survey of deep learning-based models [7] | Yes | Not domain specific | No | *** | Discussed deep learning, its dimensions and hyperparameter optimization
A survey of hyperparameter optimizers [8] | Yes | Not domain specific | No | *** | Compares performance of optimizers by training forecasting algorithms
A survey of hyperparameter optimizers [9] | No | Image processing | No | *** | Hyperparameter optimizers for image processing algorithms
A survey on hyperparameter optimization of deep learning-based forecasting models [10] | Yes | Not domain specific | Yes | *** | Hyperparameter optimization of deep NN using swarm intelligence and evolutionary algorithms
A survey on training methods of ANFIS [11] | No | Not domain specific | No | ** | Hyperparameter tuning of ANFIS by derivative and non-derivative based algorithms
A survey and classification of the workload forecasting methods [12] | Yes | Cloud computing | Yes | **** | Forecasting models for workload forecasting in cloud computing
A survey on applications of big data and machine learning [13] | Yes | SG | Yes | **** | Big data and machine learning applications in SG
A survey on big data analytics [14] | Yes | SG | Yes | **** | Big data analytics in SG
A survey on forecasting models in renewable power systems [15] | No | SG | No | **** | Selected applications of forecasting models for optimal integration of renewable energy
A survey on food sales predictions [16] | No | Food sales | No | * | Machine learning techniques for food sales prediction
We propose a survey on forecasting models in SG | Yes | SG | Yes | **** | Hyperparameter optimization of forecasting algorithms in SG
[Figure 2: Framework of a forecasting model — big data generation in SG (smart meters, sensors, users, electricity market); data preprocessing (data prefiltering, feature selection/dimension reduction, normalization, split into training/testing data); hyperparameter optimization (parameter selection, train model, compute error, repeat until the stopping criteria are met, then retrain with the optimal parameters); forecasting algorithm output (forecasted values, reverse transformation, output value).]
Table 2 shows the comparative analysis of the related work. The studies [2,5,6] survey the hyperparameter optimization methods for SVR and its variants. These studies do not include critical analysis and future research directions. The recency score of [2] is the lowest and the recency score of the other two studies is also low. The studies [4,7,10] contain surveys of ANN and its variants, and [11] surveys optimization methods for ANFIS. The authors in [10] provided a comprehensive survey with critical analysis, applications and future research directions. However, its recency score is three stars. The authors in [9,12,16] provide surveys of forecasting algorithms in image processing, cloud computing and food sales, respectively. Among these studies, [16] has the lowest recency score and also does not provide all the necessary information. The studies [12]-[15] have the highest recency score. Of these, [13,14] are the best, as both of these studies also include a detailed discussion of the important aspects and outcomes of their surveys.
3. Forecasting model
Forecasting can be defined as a process in which cur-
rent and past data values are analyzed to predict future val-
ues. In SG, electricity price and demand forecasting are
of great importance. With the advancement in technology and the increase in Internet of Things applications, a huge amount of data is gathered. This data is used to predict future energy demand and price, and to detect faults, electricity theft, etc. Researchers are actively working in this area and proposing ef-
ficient forecasting models with enhanced performance and
accuracy, as discussed in Section 4. The collected data is
given as an input to a forecasting model and it predicts the
future value as an output.
In a forecasting model, three phases are very important, i.e., data gathering, data preprocessing and forecasting. Figure 2 shows all the necessary steps used in an efficient fore-
casting model. For prediction, the availability of related data
is very important. In SG, data is gathered from different
sources. Smart meters are a big source of data related to
demand and supply of electricity, patterns of energy con-
sumption of different users, user preferences, alteration in
consumption patterns by users, etc. Sensors, on the other
hand, keep on sensing power lines, the status of energy gen-
eration units and other important components involved in
the power system. The data generated by these components
is also stored and used in forecasting. Moreover, users and
electricity markets are also two big data generating sources
in SG.
3.1. Data preprocessing
The available data is found in raw form and needs pre-
processing before usage. The first step of data preprocessing
is to convert data into tabular form and select only relevant
features. The inclusion of irrelevant features increases the
data size and decreases the learning speed and accuracy of
a forecasting algorithm [17]. After selecting the features,
the data is filtered to eliminate the outliers and delete the
records containing missing values as both outliers and miss-
ing values can reduce the accuracy of the forecasting model
and lead to inaccurate predictions. In the next step, all at-
tributes are normalized between a common interval, usually,
this interval is between 0 and 1. This step is used to make
the values of different variables comparable. After applying these steps, the resultant dataset is split into training and test
sets and passed to the forecasting phase. In Section 5, data
preprocessing techniques used in literature are discussed in
detail.
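As a concrete illustration of the steps above, the sketch below chains them with pandas and scikit-learn; the synthetic columns and the 3-sigma outlier rule are assumptions made for illustration, not choices from the surveyed studies.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder smart-meter data; real data would come from meters/sensors.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(20, 5, 500),
    "humidity": rng.uniform(20, 90, 500),
    "hour": rng.integers(0, 24, 500),
    "load": rng.normal(100, 15, 500),
})
df.loc[rng.choice(500, 5, replace=False), "load"] = np.nan   # a few missing records

df = df.dropna()                                   # delete records with missing values
mu, sigma = df["load"].mean(), df["load"].std()    # filter outliers (3-sigma rule, one common choice)
df = df[(df["load"] - mu).abs() <= 3 * sigma]
scaler = MinMaxScaler()                            # normalize attributes to [0, 1]
X = scaler.fit_transform(df.drop(columns="load"))
y = df["load"].to_numpy()
# Split into training and test sets before the forecasting phase.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)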
3.2. Forecasting algorithms
After data preprocessing, the data is ready to train a forecasting algorithm: the preprocessed data is the input of the forecasting algorithm. In SG, SVM-, NN- and BN-based forecasting algorithms are used frequently.
3.2.1. SVM
SVM is an efficient and simple learning method. It is
used for both classification and regression problems. Ini-
tially, it was designed for two classes only; however, with time its variants for multiple classes and regression problems were introduced. The variant of SVM which is used for regression is known as SVR. Both models possess the same qualities; the difference is that the values of the target variables in SVR are real numbers. In SG, SVR is commonly used. The problem of training SVR using a data set with $m$ points $\{x_i, y_i\}_{i=1}^{m}$ can be expressed as follows [18]:

$$\min \left\{ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} (\chi_i + \chi_i^*) \right\} \quad (2)$$

subject to the following constraints:

$$y_i - (w^t x_i + b) \leq \epsilon + \chi_i, \qquad (w^t x_i + b) - y_i \leq \epsilon + \chi_i^*, \qquad \chi_i, \chi_i^* \geq 0. \quad (3)$$

In equation 2, $x_i$ and $y_i$ are the input samples and their target values, respectively, $\chi_i$ and $\chi_i^*$ are slack variables, $w$ is the weight vector, $b$ is the bias term of the hyperplane and $C$ represents the regularization parameter.
Figure 3 [19] shows the important steps which are followed to train the model. After initializing the input data, a suitable kernel is chosen and the hyperparameters are initialized. In the next step, the initial model is trained and fitted over the input data. After these steps, the error is computed. If the error value is minimal, the model is ready for prediction; otherwise, the hyperparameters are tuned again.
[Figure 3: Flow chart of SVM — start, initialize input, select kernel, set values of hyperparameters, train the initial model, fit input/output parameters, compute the fitting error; if the error is minimal, end, otherwise tune the hyperparameters.]
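For illustration, the tuning loop of Figure 3 roughly corresponds to fitting scikit-learn's SVR repeatedly with new hyperparameter values and keeping the values with minimal validation error; the synthetic data and the candidate values below are placeholders.

import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 3))                     # placeholder load/price features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 200)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

best = (None, np.inf)
for C in (0.1, 1.0, 10.0):                          # candidate hyperparameter values
    for gamma in (0.01, 0.1, 1.0):
        model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=0.01).fit(X_train, y_train)
        err = mean_squared_error(y_val, model.predict(X_val))
        if err < best[1]:
            best = ((C, gamma), err)                # keep the hyperparameters with minimal error
print("best (C, gamma):", best[0], "validation MSE:", round(best[1], 4))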
3.2.2. ANN
The ANNs are widely used forecasting algorithms be-
cause of their accuracy and learning speed. They are inspired by the human brain [20]. A human brain is made up of several interconnected neurons that pass information to each other. Similarly, an ANN follows the same procedure, and an activation function is used to move information from one perceptron to another. It can be formulated mathematically as follows [21]:

$$N_j = \sum_{i=1}^{d} (x_i w_{ij} + w_{j0}) \quad (4)$$

$$y_j = f(N_j) \quad (5)$$

$$N_k = \sum_{j=1}^{e} (y_j w_{kj} + w_{k0}) \quad (6)$$

$$z_k = f(N_k) \quad (7)$$

$$z_k = f\left( \sum_{j=1}^{e} \left( w_{kj}\, f\left( \sum_{i=1}^{d} (x_i w_{ij} + w_{j0}) \right) + w_{k0} \right) \right) \quad (8)$$
In equation 4, $d$ represents the total number of inputs and $x$ is the input value, which is multiplied with the weight $w$; $w_{j0}$ represents the bias value. Equation 5 represents the activation function, and the value of $y$ depends on it. In the next equation, $e$ is the number of perceptrons. Equation 7 represents the next activation function, which is used to compute the value of the output neurons $z$. Equation 8 is the elaborated form of the previous equations.

[Figure 4: Flow chart of ANN — initialize inputs, create three network layers, set the total error to zero, train on each pattern of data, compute the error for each neuron and add it to the total error; if the total error is below the target error, train the final network, otherwise retrain.]

Figure 4 [22] shows the important steps which are followed to train the model.
After input initialization, three layers are created, i.e., the input, hidden and output layers. The model is trained on initial settings and the error is initialized to zero. After this, the model is trained on the first pattern of data, and the error of each neuron is calculated and added to the total error value. The forecasting model is trained using all patterns and the final error value is computed. If the total error is less than the defined threshold value, the final model is trained; otherwise, the model is trained again.
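A minimal sketch of this train-until-the-error-is-acceptable loop, using scikit-learn's MLPRegressor as the ANN; the synthetic data and the target error threshold are assumptions for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (300, 4))                    # placeholder input patterns
y = np.sin(X.sum(axis=1))                          # placeholder target
X_train, y_train, X_val, y_val = X[:240], y[:240], X[240:], y[240:]

target_error = 0.01                                # assumed threshold, playing the role of the "target error"
for hidden in (8, 16, 32, 64):                     # grow the hidden layer until the error is acceptable
    net = MLPRegressor(hidden_layer_sizes=(hidden,), activation="relu",
                       learning_rate_init=0.01, max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    total_error = mean_squared_error(y_val, net.predict(X_val))
    if total_error < target_error:
        break
print("hidden neurons:", hidden, "validation MSE:", round(total_error, 4))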
3.2.3. BN
A BN is a directed acyclic graph. In this network, the nodes are the variables and the edges between these nodes represent conditional dependencies. It is a strong candidate model
to compute the probabilities of all causes of an event. Using
this model, the real cause of an event can be determined. It
fulfills the local Markov property [23], which is as follows:

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, X_2, \ldots, X_{i-1}) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i)) \quad (9)$$

[Figure 5: Flow chart of BN — select a feature, find the probability of the selected feature belonging to each class, assign the class label, repeat until the probabilities for all classes are computed and no features remain.]
The above equation states that $X_i$ is conditionally independent of its non-descendants and conditionally dependent on its parents only. Due to this property of a BN, the number of dependencies to be modeled reduces, which reduces the overall computational effort. Figure 5 represents the flow chart of a BN used as a classifier. A feature is selected from the given input and its probability of belonging to a class is computed for all classes. After the probability computation, a class label is assigned. After that, it is checked whether the probabilities of all classes have been computed. Finally, the algorithm checks for any remaining features and then terminates.
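As a minimal concrete example, the classification flow of Figure 5 can be sketched with a naive Bayes model, i.e., a BN with the simplest dependency structure in which every feature depends only on the class node; the synthetic data below is a placeholder.

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (200, 3))                 # placeholder features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # placeholder class labels

# Naive Bayes: each feature depends only on the class node, so Eq. (9)
# factorizes as P(y) * prod_i P(x_i | y).
bn = GaussianNB().fit(X[:150], y[:150])
probs = bn.predict_proba(X[150:])              # class probabilities per sample
labels = bn.predict(X[150:])                   # assign the most probable class label
print("accuracy:", (labels == y[150:]).mean())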
3.3. Optimization algorithms
Optimization is the process of finding the best available
solution under some constraints. For an optimization prob-
lem, we define an objective function and constraints. This
objective function can be formulated to find either minimum
value or maximum value. Generally, for optimizing the hyperparameters of a forecasting algorithm, we minimize the difference between the actual and predicted values. This is called error minimization. The different types of optimization algorithms available are shown in Figure 6.
3.4. Hyperparameters optimization
The forecasting algorithms have hyperparameters. Their
values are selected before training the model as these param-
eters play a very important role in the accuracy of forecasted
values. These parameters are optimized by iteratively train-
ing the model. This iterative process is continued until the stopping criterion is met, as shown in Figure 2. The optimiza-
tion of hyperparameters means selecting the best values of
these parameters, where the accuracy of the forecasting is
highest. The detailed discussion of these methods is given in
Section 4. After the selection of parameter values, the model is trained and the final forecasted values are obtained, which are then transformed back to the original format to generate the output. The following subsections describe the hyperparameters of the commonly used forecasting methods.
3.4.1. Hyperparameters of SVM
There are three important hyperparameters of SVM that need to be selected carefully: the kernel function, the regularization parameter C and gamma $\gamma$ [25]. There are three common types of kernels, i.e., linear, polynomial and RBF, each with its pros and cons. The hyperparameter C is important in defining the decision boundary. A higher value of C means more training points are fitted correctly; however, the decision boundary will not be smooth. On the other hand, a lower value of C generates a smooth decision boundary but, at the same time, may reduce the accuracy on training points. Moreover, the hyperparameter gamma defines the influence of a training pattern. Its value has an inverse relation with the influence of these patterns, i.e., a higher value means low influence and a lower value means more influence.
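A common way to select these three hyperparameters is an exhaustive grid search with cross validation, as sketched below with scikit-learn; the candidate grids and the synthetic data are illustrative only.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (200, 3))                   # placeholder features
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.05, 200)

param_grid = {                                    # illustrative candidate values
    "kernel": ["linear", "poly", "rbf"],
    "C": [0.1, 1, 10, 100],
    "gamma": [0.01, 0.1, 1.0],
}
search = GridSearchCV(SVR(), param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print("best hyperparameters:", search.best_params_)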
3.4.2. Hyperparameters of ANN
In an ANN, there are seven hyperparameters: the number of hidden layers, learning rate, momentum, activation function, batch size, number of epochs and dropout rate [26]. The selection of an optimal number of hidden layers affects the performance of a model. A minimal number of neurons makes a model fast and simple, while increasing the number makes it slower but, at the same time, improves its classification ability. The learning rate is the step size of backpropagation; it affects the loss value during the training of a model. Momentum tackles slow convergence during the learning phase by keeping a record of past directions and moving the algorithm in the best possible direction. An activation function is used to pass on the weighted sum; sigmoid, tanh and ReLU are examples of activation functions. Moreover, the batch size defines the size of the small samples of data passed as input: instead of feeding all data to an ANN at once, small samples are used. Inappropriate selection of the batch size results in an over-generalized model. The number of epochs decides how many times a model will be trained on the entire dataset. In dropout, unnecessary nodes of a network are eliminated. It saves the network from becoming heavy and from repeating information by eliminating the less important nodes.
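These hyperparameters span a mixed search space, which is often explored with randomized search, as in the sketch below using scikit-learn's MLP; it exposes most, though not all, of the above hyperparameters (dropout, for instance, is not available there), and the candidate values are illustrative.

import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (300, 5))                    # placeholder data
y = (X[:, 0] - X[:, 2] > 0).astype(int)

param_distributions = {                           # illustrative search space
    "hidden_layer_sizes": [(16,), (32,), (32, 16), (64, 32)],
    "learning_rate_init": [1e-3, 1e-2, 1e-1],
    "momentum": [0.5, 0.9, 0.99],
    "activation": ["logistic", "tanh", "relu"],
    "batch_size": [16, 32, 64],
    "max_iter": [200, 500],                       # plays the role of the epoch budget here
}
search = RandomizedSearchCV(MLPClassifier(solver="sgd", random_state=0),
                            param_distributions, n_iter=10, cv=3, random_state=0)
search.fit(X, y)
print("best configuration:", search.best_params_)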
3.4.3. Hyperparameters of BN
A BN has four hyperparameters: the number of input nodes, the number of target nodes, and the states of the input and target nodes [27]. Before training the model, the values of these hyperparameters are selected carefully as they affect the completeness of the model. The optimal selection of these parameters ensures the better learning quality of a BN. The input nodes are the parent nodes in the network and their states are the children, which means that more states or input nodes result in a denser network. Moreover, if both input nodes and target nodes have multiple states, the model becomes complex and computationally expensive. To avoid this situation, synthetic nodes are defined.
3.4.4. Optimization problem
As discussed in the previous section, hyperparameter optimization is very important for accurate forecasting in any domain. It can be solved as a separate problem. The values of these parameters vary for each data set: the optimized values for one data set may not perform well for another data set of the same domain. So, whenever a forecasting model is trained on a data set, it is necessary to optimize its hyperparameters as well. The problem of hyperparameter optimization can be formulated mathematically as follows [28]:

$$F = \{h_1, h_2, h_3, \ldots, h_n\} \quad (10)$$

$$S = \{s_1, s_2, s_3, \ldots, s_n\} \quad (11)$$

$$f(h) = \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\big(F_s, T_s^{(i)}, V_s^{(i)}\big) \quad (12)$$

Equation 10 represents that a forecasting algorithm $F$ has $n$ hyperparameters. The search space of these hyperparameters is represented by $S$ in equation 11, which also has $n$ elements. The values of the hyperparameters are selected from their search space, $h \in S$. Equation 12 represents the objective function, which is to be minimized by selecting the optimized values. Here, $i$ indexes the $k$ samples and $\mathcal{L}$ denotes the error in forecasting with $F_s$ using training data $T_s^{(i)}$ and validation data $V_s^{(i)}$.
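A minimal sketch of the objective in equation 12, assuming k-fold splits and MSE as the error measure; any forecasting model with a fit/predict interface could stand in for $F$ here, and SVR is used only as an example.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

def objective(h, X, y, k=5):
    """f(h): average validation error of the model over k train/validation splits."""
    errors = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = SVR(**h).fit(X[train_idx], y[train_idx])      # F trained on T_s^(i)
        errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))  # error on V_s^(i)
    return float(np.mean(errors))

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, (200, 3)); y = X.sum(axis=1) + rng.normal(0, 0.05, 200)
print(objective({"C": 10.0, "gamma": 0.1, "kernel": "rbf"}, X, y))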
4. Classification of tuning methods
In SG, the hyperparameters of forecasting algorithms are tuned using different optimization techniques. The commonly used methods are grid search, cross validation, gradient descent and naive Bayes (NB). Nature-inspired heuristic algorithms are also becoming popular in this field. Figure 6 contains a detailed classification of these algorithms. They are classified into two major categories: nature-inspired algorithms and other statistical methods.
4.1. Nature-inspired algorithms
In this subsection, a brief review of existing literature is
presented where nature-inspired algorithms are used for pa-
rameter tuning of forecasting methods. These are the most
commonly used algorithms because of their good performance
and adaptability.
[Figure 6: Classification of optimization methods for hyperparameters. Nature-inspired algorithms: MGA, GA, CSA, MFA, QOABCO, NSSA, PSO, improved environment adaption method, tabu search and RCGA. Statistical methods: gradient descent, cross validation, NB, DI-Cast, quasi-Newton method, Levenberg-Marquardt, excavated association rules, NHPP, alternating direction method of multipliers and grid search.]
4.1.1. Differential evolution
In [29], an energy management system is proposed for a
residential area. This system helps the electricity consumers
to make strategies for their energy consumption and reduce
their electricity bills. Moreover, using this system, the peak-to-average ratio is also minimized, which plays an important
role in maintaining the reliability and sustainability of the
main grid. Here, the electricity consumers also generate en-
ergy locally using renewable resources. For efficient energy
management, energy generation forecasting from renewable
resources is of great importance. So, in this paper, ANN is
used for forecasting. The hyperparameters of this model are
optimized using an enhanced DE algorithm. To evaluate the
accuracy of the proposed model, NRMSE and MAPE are
used.
An SVM-based classifier is used in [30], which defines a hyperplane and classifies data with the help of support vectors. As the primary goal of this study is to predict the price of electricity with minimal forecasting error, and SVM has some hyperparameters which have a direct effect on forecasting accuracy, it is important to tune these parameters efficiently. For this purpose, a DE algorithm has been used and the model is named DE-SVM. It explores the search space and finds the best combination of values for the hyperparameters where the forecasting accuracy of the model is highest. The simulations are carried out using Python on an Intel Core i5 with 4 GB RAM and a 500 GB hard disk. To evaluate the performance of the proposed model, it is applied to a dataset from the ISO New England Control Area from 2010 to 2015, having more than 5000 records. The proposed model is compared with benchmark classifiers, namely NB and decision trees (DT). The forecasting error shows that the proposed model outperforms the benchmark classifiers. The limitation identified in this study is that several accuracy measurement metrics are available in the literature that could be applied to check the effectiveness of the proposed model; however, only a basic error measure is used to compare its performance with the existing algorithms. Besides, the stability of the model is also not computed.
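The DE-SVM idea can be sketched by letting SciPy's differential evolution drive a cross-validation error objective over the SVM hyperparameters; the bounds, data and settings below are illustrative and not the authors' actual configuration.

import numpy as np
from scipy.optimize import differential_evolution
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, (200, 3))                    # placeholder price features
y = X @ np.array([1.5, -0.5, 2.0]) + rng.normal(0, 0.1, 200)

def objective(params):
    """Mean CV error of SVR for a candidate (log10 C, log10 gamma) pair."""
    C, gamma = 10 ** params[0], 10 ** params[1]
    scores = cross_val_score(SVR(C=C, gamma=gamma), X, y,
                             scoring="neg_mean_squared_error", cv=3)
    return -scores.mean()

# DE explores the (log-scaled) search space for the best hyperparameter combination.
result = differential_evolution(objective, bounds=[(-2, 3), (-3, 1)],
                                maxiter=20, seed=0, polish=False)
print("best C, gamma:", 10 ** result.x[0], 10 ** result.x[1], "CV MSE:", result.fun)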
4.2. Genetic algorithm
In the study [31], an energy management technique is
proposed. The problem is formulated as a three-staged optimization problem with four players, i.e., the utility, an energy storage company, a microgrid (MG) and the consumers. Each player
makes strategies and tries to maximize its profit. A wind
power forecasting model is proposed to forecast the power
generation as it is of intermittent nature and its prediction
plays a very important role in making strategies by MG. The
model is based on deep learning with stacked autoencoders
and GA. The former is used for the prediction and the latter is used
to tune the hyperparameters of the forecasting model. The
backpropagation (BP) algorithm is also employed for the train-
ing of the model and adjusting the initial weights of the net-
work. The values of the hyperparameters of autoencoders
and weights of the network have a great influence on the per-
formance accuracy of the forecasting model, so, GA is em-
ployed to determine the accurate values of these parameters.
The dataset from a local MG in Hebei (China) is used during
the period from Sep 2015 to Oct 2016. Forecasting accuracy
was compared with the BP algorithm and SVM using mean
absolute percentage error (MAPE) as a performance metric.
The simulation results demonstrate the higher accuracy of
the newly proposed method over both legacy models. The
limitation identified in this work is that the data is not preprocessed, which can degrade the forecasting performance of the
model.
In [32], Bianchi et al. have addressed the problem of
electricity load forecasting of a time series. The dataset con-
sists of multiple variables that are considered important for
forecasting. The data preprocessing step is also included in
this paper for dimensionality reduction of the dataset. The
forecasting horizon of one day is considered with a forecast-
ing interval of 10 minutes. For the electricity load prediction, the authors have used an echo state network (ESN). It has hyper-
parameters which need to be configured efficiently as they
affect the overall forecasting performance of the model. Ad-
ditionally, they use both real and integer values, so, this issue
is tackled by using a variant of GA. The difference between
classical GA and its variant lies in the values of chromo-
somes which are both integer and real values defined over
a fixed interval. For updating the population, the Gaussian mu-
tation method is used where a random number is obtained
using Gaussian distribution and added to each child vector.
For crossover, the Laplacian crossover method is used. Af-
ter the prediction of each column, predicted values are inte-
grated and the final result is generated. The proposed fore-
casting model is implemented using the dataset from the Bel-
sito Prisciano feeder situated in the Azienda Comunale Energia e Ambiente (ACEA) power grid. The recorded dataset is for
three years, and each value is measured every 10 minutes.
The performance is compared with the autoregressive (AR)
integrated moving average (ARIMA) forecasting model. For
performance evaluation, normalized root mean square error
(NRMSE) is used as a performance metric. It is evident
from the results that the proposed model performs better than
ARIMA. Simulations are carried out using MATLAB.
In a related work [33], Eseye et al. have addressed the
problem of power generation forecasting of wind generators.
The power generation of these sources is highly intermit-
tent and their integration with SG can be made efficient by
forecasting their generation in advance. So, in this study the
authors have used ANN for prediction and GA is used to
train the model. The proposed model has two stages, in the
first stage, the GA-ANN model is used to forecast the wind
speed using variables like wind speed, wind direction, air
pressure, humidity and air temperature. In the next stage,
the historical data from the SCADA database is used to train
the model for the power prediction of wind generators. So,
the proposed model is a double staged hierarchical hybrid
GA-ANN framework. In this model, GA is used to acquire
the efficient connection weight coefficients between neurons
of the ANN. Generally, an ANN has multiple layers and applies the BP algorithm for parameter optimization. BP uses gradient descent, which can be trapped in local optima
and may result in poor optimization of these parameters. To
overcome this limitation of BP, GA is used as it is a global
optimum search algorithm. The proposed model is implemented for the power prediction of an MG wind farm in Beijing, China. MAPE, the sum of squared errors, root mean square error (RMSE) and standard deviation error are used as performance metrics. For simulations, MATLAB is used on a PC with an Intel Core i5-5200 2.20 GHz CPU and 4 GB RAM.
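As a rough illustration of GA-style hyperparameter tuning in these studies, the sketch below evolves (C, gamma) pairs for an SVR with tournament selection, uniform crossover, Gaussian mutation and elitism; it is a generic GA on synthetic data, not the exact GA-ANN procedure of [31]-[33].

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, (150, 3))
y = X.sum(axis=1) + rng.normal(0, 0.1, 150)

def fitness(ind):
    """Negative CV error of an individual encoded as (log10 C, log10 gamma)."""
    model = SVR(C=10 ** ind[0], gamma=10 ** ind[1])
    return cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=3).mean()

pop = rng.uniform([-2, -3], [3, 1], (20, 2))            # initial population in log space
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    elite = pop[np.argmax(scores)].copy()               # elitism: keep the best individual
    children = [elite]
    while len(children) < len(pop):
        a, b = rng.choice(len(pop), 2, replace=False)   # tournament selection, parent 1
        p1 = pop[a] if scores[a] > scores[b] else pop[b]
        a, b = rng.choice(len(pop), 2, replace=False)   # tournament selection, parent 2
        p2 = pop[a] if scores[a] > scores[b] else pop[b]
        mask = rng.random(2) < 0.5                      # uniform crossover
        child = np.where(mask, p1, p2) + rng.normal(0, 0.1, 2)  # Gaussian mutation
        children.append(child)
    pop = np.array(children)

scores = np.array([fitness(ind) for ind in pop])
best = pop[np.argmax(scores)]
print("best C, gamma:", 10 ** best[0], 10 ** best[1])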
4.2.1. Micro GA (MGA)
Alamaniotis et al. [34] proposed a new hybrid model for price prediction. This method is an ensemble method with multiple relevance vector machines (RVMs). Each RVM predicts the price and these predictions are combined into a single linear regression optimization problem. The objective of this optimization problem is to find the optimized weight coefficients using MGA. After finding the appropriate solution, the ensemble method forecasts the final price value. In the first step, each RVM uses a different kernel function for prediction. In this way, the different dynamic values of electricity price are obtained. In the next step, these predicted values are combined as a multiple linear regression ensemble and a single forecasted value is generated. In order to complete this task, an optimization problem of the weight coefficient for each regression value is solved using MGA. This is a variant of classical GA based on the same core principle of survival of the fittest. It takes only five chromosomes, where the mutation rate is set equal to zero and the crossover rate equal to 1. It also transfers the best chromosome from the current population to the next. After getting the optimized values of the coefficient weights, the final forecasted value is generated. The evaluation criterion used as the objective function in MGA is the mean absolute error (MAE). In the RVMs, three types of kernels are used, namely the Gaussian kernel, the polynomial kernel and the spline kernel. To evaluate the performance of the proposed model, it is applied to a dataset obtained from the New England electricity market. The performance is compared with existing schemes, e.g., the AR moving average (ARMA) and a naive forecaster. The MAE of each model is computed as it is more efficient than the mean square error (MSE) and MAPE because its performance is not affected by outliers and zero values (division by zero). The values of MAE validate the better performance of the proposed forecasting algorithm over the two existing algorithms. The limitation of this work is the absence of a data preprocessing step: as the data is not normalized, outliers and spikes in the data are still present, which affect the forecasting accuracy of the system. So, by adding these steps, the forecasting accuracy as well as the time and space complexity of the algorithm can be improved.
4.2.2. Cuckoo search algorithm (CSA)
Xiao et al. have proposed a combined model for electricity load forecasting in [35]. It is stated that a single forecasting model may not generate the best results in different scenarios: the performance of a model varies over time and across scenarios. So, to address this problem, a combined model is designed
in this paper. This model includes BP based NN (BPNN),
radial basis function NN (RBFNN), GA optimized BPNN
(GABPNN) and generalized regression NN (GRNN). Ac-
cording to the combined forecasting model theory, when a
forecasting problem is solved by multiple forecasting mod-
els, their coefficient weights should be selected efficiently
and the results are added up for a final value. In this study, CSA is utilized for the optimization of the coefficient weights of the forecasting models. This algorithm is selected because its convergence speed is fast and it finds the global minimum in a few iterations. To evaluate the performance of the proposed model, three different datasets are used. The first two datasets span November 2006 to 2008 and the third August 2006 to 2008. The datasets are available in raw
form and need preprocessing before they are used in fore-
casting. In the first step, the issues like data spikes and re-
dundancy of features are solved. After the data preprocess-
ing step, the resultant dataset is passed to the forecasting
model. Here, the forecasting results of each model are eval-
uated separately and their accuracy is compared with pro-
posed combined model. To understand the characteristics
of these models, the Diebold-Mariano accuracy test is used along
with other performance evaluation metrics e.g. absolute er-
ror (AE), MAE, MSE and MAPE. The smallest value of
these performance metrics means best prediction results. A
benchmark forecasting algorithm, ARIMA is also used for
comparison. Moreover, to evaluate the stability and accuracy of the proposed model, a bias framework is used. The train-
ing and testing samples are chosen randomly. It is evident
from the simulation results that no single forecasting model
generates the best results across all 24 hours. An algorithm whose performance is best in the first four intervals performs average in the next three intervals, and its performance becomes the worst in the middle section of the prediction horizon when compared with the other forecasting algorithms. On the other hand, the performance of the combined model is far better, with higher accuracy and better stability. This shows that the combined model successfully combines the advantages of the individual models and is a simple and efficient model for forecasting problems.
Naz et al. [36] proposed a forecasting algorithm to fore-
cast the energy generation from a photovoltaic cell. This in-
formation is then used to estimate the price of electricity.
The core aim of the authors is to manage the storage capacity and energy generation in an MG. A game theory-based approach is used for energy management. As the energy generated from photovoltaic cells is uncertain, forecasting its generation values plays a very important role in planning future strategies. The hyperparameters of the forecast-
ning future strategies. The hyperparameters of the forecast-
ing model are optimized using CSA and gray wolf optimiz-
ers. To evaluate the forecasting accuracy, MAPE and RMSE
are used. Meanwhile, in [37], Wang et al. have addressed the problem of electricity price forecasting. The dynamic choice ANN (DCANN) is used for forecasting, which is a variant of ANN; the difference lies in the selection of the input. In this model, the input is selected according to the desired output.
However, it also has the same issue of parameter optimiza-
tion as ANN. To tune the hyperparameters optimally, CSA is integrated into DCANN and this new hybrid model is called
updated DCANN. To evaluate the performance, it is imple-
mented on a dataset acquired from Queensland, Australia, in 2010. For simulation purposes, a Core i7 3.40 GHz processor
with MATLAB is used. The performance of the proposed model is compared with BPNN, fuzzy NN (FNN), LSSVM, AR
fractionally integrated moving average (ARFIMA) and gen-
eralized AR conditional heteroskedasticity (GARCH) using
MAPE and MAE as performance metrics. The results show
that the proposed algorithm beats these existing models in
terms of both performance metrics. The accuracy of updated
DCANN is higher than other algorithms. However, the com-
putational time of the proposed model is higher than all other
benchmark models which is the only limitation of this work.
4.2.3. Modified firefly algorithm (MFA)
In [38], an SVR based forecasting model is introduced
which is hybridized with MFA for better prediction. This
model is developed for short-term load forecasting. MFA is used to tune the hyperparameters of SVR, as these param-
eters have a direct effect on the forecasting accuracy of the
classifier. MFA is a modified version of the existing firefly algorithm (FA). Several nature-inspired algorithms have been used in the existing literature, like GA, PSO, ant colony optimization (ACO) and artificial bee colony optimization (ABCO). It is stated that these algorithms are not as efficient as FA. One main reason is that they do not have the storage capability to store the best solution before moving on to the next iteration. The MFA introduced in this paper aims at improving the search ability of FA and reducing the possibility of it being trapped in local optima. In this regard,
a modification method is introduced, where two mutation and three crossover operations are included. Moreover, the whole population is moved toward the global optimal solution and all the solutions are improved in this way. The MFA is then used to tune the hyperparameters of SVR, which in turn improves its forecasting accuracy. To evaluate the
performance of the proposed model, relative percentage er-
ror, mean percentile error, RMSE and MAE are used as per-
formance metrics. The proposed hybrid algorithm is also
compared with already existing models including ARMA,
ANN, SVR-FA, SVR-GA, SVR-honey bee mating optimiza-
tion (SVR-HBMO) and SVR-PSO using relative percentage
error. The results presented in this study clearly depict the better performance of the new hybrid model compared with all other forecasting models. Moreover, the performance of MFA is compared with harmony search, ACO, ABCO and FA in terms of mean and standard deviation, and the obtained results validate the efficiency and better performance of MFA. The load data of the Fars province of Iran is used for evaluation. The limitation identified in this paper is that the data preprocessing step is not included.
4.2.4. Quasi-oppositional artificial bee colony optimization (QOABCO)
Progressing further, in [39], Shayeghi et al. have pro-
posed a forecasting model for electricity price and load pre-
diction. The core aim of this study is to capture the electricity price and load dynamics, as in the existing literature they are sometimes used together for forecasting purposes but their dynamics are never evaluated. There are three
main steps of this model. In the first step, the dimensions of
the dataset are reduced and irrelevant features are filtered out
to improve the forecasting accuracy and make the system ro-
bust. In the next step, the dataset is divided into several small
subsets. The third step is the forecasting step, where, a mul-
tiple input multiple output (MIMO) model is used to forecast
both electricity price and load. This study highlights the re-
quirement of simultaneous forecasting of load and price. It is
stated that most of the existing models predict load and price
of electricity separately which was suitable for unidirectional
grids. The modern SG has two way communication and en-
ables customers to take part in demand response program.
So, it urges the need of simultaneous forecasting of elec-
tricity load and price prediction along with their dynamics.
In the proposed model, dataset is preprocessed before fore-
casting. After data preprocessing, LSSVM is used for pre-
diction. This forecasting model is based on MIMO method
to forecast electricity load and price values simultaneously.
The hyperparameters of MIMO-LSSVM are adjusted using
newly proposed QOABCO. The classical ABCO algorithm is simpler, more flexible and more robust than other frequently used nature-inspired optimization algorithms, e.g., GA and PSO. It also has few control parameters and is easy to implement. Its hybridization with other algorithms is also simple, and it handles the objective cost function in a stochastic manner. However, it converges to local optima very quickly when multiple variables are involved. To overcome this limitation, an opposition-based learning method is integrated with ABCO and the new model is named QOABCO. This model is implemented on the dataset obtained from the New
model is implemented on the dataset obtained from New
York Independent System Operator electrical market. For
simulations, computer with 2.53 GHz processor with 4GB
RAM and MATLAB are used. To evaluate the forecast-
ing performance, MAPE, mean square error and standard
deviation error are used as performance metrics. The fore-
casted result is compared with original data and ANN. The
newly proposed optimization algorithm QOABCO is also
compared with PSO, GA and classical ABCO and it is ev-
ident from the results that QOABCO has best performance
in terms of min, max and mean of values. However its time
complexity is same as ABCO.
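The quasi-oppositional step that distinguishes QOABCO from classical ABCO can be sketched as follows. This is a minimal illustration of opposition-based learning under assumed [0, 1] bounds, not the full bee colony of [39].

```python
import numpy as np

rng = np.random.default_rng(1)

def quasi_opposite(x, low, high):
    # Quasi-opposite of x in [low, high]: a uniform draw between the
    # interval centre and the opposite point low + high - x.
    centre = (low + high) / 2.0
    opposite = low + high - x
    return rng.uniform(np.minimum(centre, opposite),
                       np.maximum(centre, opposite))

# e.g., seeding half of a colony with quasi-opposites of the other half
low, high = np.zeros(3), np.ones(3)
colony = rng.uniform(low, high, size=(10, 3))
colony[5:] = [quasi_opposite(x, low, high) for x in colony[:5]]
print(colony)
```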
4.2.5. Novel shark search algorithm (NSSA)
In [40], a short term load forecasting model has been
proposed. For electricity load prediction, an improved ver-
sion of Elman NN (ENN) is proposed. It has four layers: the input, hidden and output layers are feed-forward layers and the context layer is used as memory. The neurons of this layer act as memory units and it is connected to the hidden layer through a feedback loop. Whereas, the improved ENN
(IENN) uses the self-feedback mechanism between context
layer and hidden layer which makes this network more sen-
sitive to the historical data and improves its forecasting ac-
curacy. Moreover, as the IENN belongs to the NN, so, it
also has parameters which need proper tuning for accurate
forecasting. The shark search algorithm (SSA) is used for this purpose. It was introduced in 2014 by Abedinia et al. for optimization problems. SSA follows the shark's way of finding prey. It has two basic steps: the first is initialization and the second is evaluation. The classical SSA is improved in
this paper and NSSA is proposed. In NSSA, an additional
step of evaluation of neighbors through Euclidean distance
is added. Here, with the help of neighbors and historical po-
sitions, the best position is acquired. The proposed model is
applied on the load data of three business centers of Arian
golden groups. The performance of proposed model is com-
pared with ARIMA, SVR, BPNN, RBFNN, wavelet theory
(WT) plus BPNN, WT plus RBFNN and WT plus two stage
mutual information. MAPE, RSME, normalized MAPE and
NRMSE are used as performance metrics. Simulation re-
sults demonstrate the better performance of proposed model
than other existing models.
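One reading of the added NSSA step is the neighbourhood evaluation sketched below: candidate positions are sampled around the current shark position, scored, and the best one is kept. The Gaussian sampling scheme and the sphere fitness are our assumptions for illustration, not the exact formulation of [40].

```python
import numpy as np

rng = np.random.default_rng(0)
sphere = lambda x: np.sum(x ** 2)          # toy fitness to be minimised

def best_neighbour(position, fitness, n_neighbours=5, radius=0.1):
    # Sample candidates around the current position and return the best one.
    candidates = position + radius * rng.standard_normal((n_neighbours,
                                                          position.size))
    scores = np.array([fitness(c) for c in candidates])
    return candidates[np.argmin(scores)]

pos = rng.uniform(-1.0, 1.0, size=4)
for _ in range(100):                       # crude local search loop
    cand = best_neighbour(pos, sphere)
    if sphere(cand) < sphere(pos):         # keep only improving moves
        pos = cand
print("final fitness:", sphere(pos))
```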
4.2.6. PSO
Raza et al. have considered the problem of electricity
load forecasting of a building in [41] where photovoltaic
(PV) generators are integrated. Five forecasting algorithms
are integrated to predict the load value. The Bayesian model
averaging (BMA) is used to combine the outputs of all fore-
casting algorithms and generate the final prediction. The
electricity load demand of a building with PV integration
is of highly intermittent nature which affects the normal de-
mand of electricity and creates peaks. So, demand forecast-
ing of such buildings has a great importance but also diffi-
cult at the same time because of the intermittent nature of
PV generators and uncertain load consumption behaviors of
the residents. Owing to the aforementioned points, it is concluded that a single forecasting model cannot accurately predict the load demand. In this regard, the authors have
proposed an ensemble framework which includes five fore-
casting models for prediction. This framework has four main
steps. First of all, the dataset is preprocessed using the WT algorithm. Here, fluctuations and uncertainties are removed
from the dataset for accurate prediction. In the next step, the
dataset is used by five forecasting algorithms and predictions
are made. These forecasting algorithms include: BPNN,
ENN, ARIMA, FNN and radial basis function (RBF). Af-
ter prediction, the result of each algorithm is reconstructed
back to the initial shape using WT. These outputs are then
combined using an aggregation technique. Here, Raza et al. have used BMA. It is a frequently used aggregation tool in the literature which has the potential to generate better and more efficient output predictions. Moreover, NNs have parameters which need optimum values as they affect the accuracy of the algorithm. So, it is important to train these parameters efficiently. The commonly used gradient learning technique has a low convergence rate which may result in inefficient training of the model. On the other hand, PSO is a population-based technique frequently used by researchers for optimization. The dataset of the building management system of the AEB and GCI buildings at the University of Queensland, Australia, is used for the implementation of the proposed model. MATLAB
is used for the extensive simulations. The performance of the
newly proposed forecasting model is compared with Persis-
tence, BPNN, ENN, ARIMA, RBF plus PSO, FNN plus PSO
and WT plus FNN plus PSO. Normalized MAE (NMAE)
and NRMSE are used as performance metrics. The simu-
lation results demonstrate the better accuracy of proposed
model when it is compared with other forecasting models.
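A common approximation of the BMA combination stage is to weight each model by its likelihood on a validation window, as sketched below. [41] may estimate the weights differently (e.g., via expectation maximization), so this is only an illustrative sketch on synthetic residuals.

```python
import numpy as np

def bma_weights(preds_val, y_val):
    # Weight each model by its Gaussian likelihood on a validation window.
    resid = preds_val - y_val                       # (n_models, n_val)
    sigma2 = resid.var(axis=1) + 1e-12
    loglik = -0.5 * ((resid ** 2) / sigma2[:, None]
                     + np.log(2 * np.pi * sigma2)[:, None]).sum(axis=1)
    w = np.exp(loglik - loglik.max())               # avoid numerical underflow
    return w / w.sum()

rng = np.random.default_rng(0)
y_val = rng.standard_normal(50)
preds_val = y_val + np.array([0.1, 0.5, 1.0])[:, None] \
            * rng.standard_normal((3, 50))
w = bma_weights(preds_val, y_val)
print("weights:", w)       # the most accurate model gets the largest weight
# combined forecast on the test set: w @ preds_test
```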
In the study [42], Vrablecova et al. have used online SVR for short-term load forecasting. This variant of the classic SVR stores less data by discarding the less important or less frequently used data. In the classic SVR, the dataset is divided into a training set and a testing set and the input variables are identified after exploring the whole search space. This model is not flexible to changes and its accuracy decreases with time, which results in retraining the whole model from the initial stage. These limitations make SVR computationally very expensive, as new changes are integrated into the data and the model is retrained. The online SVR overcomes these limitations. Three additional vectors are defined and a three-stage incremental process is executed. The first step is the addition of one new vector; in the next step, an existing vector is deleted; and in the final step, a third vector is updated. In this way, new changes are accommodated in the already trained model and retraining of the model is avoided. The tuning parameters of the online SVR are the same as those of the classic SVR. The authors have used PSO and CSA. The performance of both algorithms is evaluated by tuning their parameters separately and then their results are compared. Performance-wise, CSA is better than PSO. Although CSA is slower than PSO, it needs fewer iterations to converge to an optimal solution, so the computational time of both algorithms becomes equal. For performance evaluation, MAPE is used.
The limitation of this work is that the data preprocessing is
not included in the forecasting model.
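The incremental idea can be illustrated with an online learner that folds in one sample at a time instead of retraining. The sketch below uses sklearn's SGDRegressor with an epsilon-insensitive loss as a stand-in; it is not the exact three-vector incremental SVR of [42].

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(loss="epsilon_insensitive", epsilon=0.1)

for t in range(1000):                      # streaming load observations
    x_t = rng.standard_normal((1, 8))      # hypothetical feature vector
    y_t = np.array([x_t.sum() + rng.normal(scale=0.1)])
    if t > 0:
        y_hat = model.predict(x_t)         # forecast before seeing the label
    model.partial_fit(x_t, y_t)            # fold the new sample into the model
```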
4.2.7. Metaheuristic algorithms
In [43], Chou et al. have proposed a framework for energy saving in SG. It is a decision support system based on big data analytics. It monitors the power consumed by several appliances and recognizes their consumption patterns to predict the power consumption in later hours. This information is then used for the efficient scheduling of the appliances. The proposed framework consists of multiple layers. The first layer is the data layer which contains all the data and information required by the system to operate. This information includes the dataset, appliances' information, electricity price signal, voltage information, power, current, frequency and power factor. The metering infrastructure, communication network and data management module are the main components used by this layer.
The second layer is the analytics bench. It integrates different dynamic and multi-objective techniques to analyze the energy consumption patterns of appliances. It also maintains the record of the power consumption cost of different schedules. Forecasting algorithms are integrated in this layer which use historical data and predict the power consumption behavior for the next hours. This forecasted information is then used to schedule the appliances. The forecasting algorithm is a hybrid version of the ARMA and SVR models. The hyperparameters of this hybrid algorithm are tuned using a nature-inspired metaheuristic optimization algorithm. This algorithm requires fewer resources, has lower computational requirements and provides a near-optimal solution. In addition to the forecasting model, this layer also has an optimization algorithm to generate suitable schedules for the appliances using the forecasted demand information. The scheduling algorithm is dynamic in nature and based on multi-objective functions. The third
layer is a web-based portal. It enables the user to interact with
the proposed decision support system. MATLAB is used as
a tool for the implementation of the system.
An LSTM-based electricity price and demand forecasting algorithm is proposed in [44]. In the proposed model, the dataset is preprocessed and then used for forecasting. In this step, missing values and outliers are removed from the data and the values of all variables are normalized between 0 and 1. This dataset is then split into test and training samples. The hyperparameters of the LSTM are tuned using the Jaya optimization algorithm. It belongs to the family of metaheuristic optimization algorithms; it is a simple algorithm and does not need deep knowledge for implementation. Using this optimizer, the window size, step size and batch size of the LSTM are optimized.
Two different datasets are used for electricity price and load
prediction, both are obtained from Elia grid data. For simu-
lations, a PC with Intel Core i3 CPU with 4 GB of RAM and
a 64-bit operating system is used. The performance of the
proposed algorithm is compared with SVM and univariate
LSTM. The simulation results depict that the performance
of the proposed Jaya-LSTM algorithm is better than other
forecasting algorithms. To measure the accuracy of the fore-
casting models, RMSE and MAE are used as performance
metrics.
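The Jaya update rule has no algorithm-specific control parameters beyond the population size and iteration count, which makes it attractive for this task. A minimal sketch is given below; the quadratic stand-in for the LSTM validation error and the bounds on the (window, step, batch) sizes are our assumptions.

```python
import numpy as np

def jaya(fitness, low, high, pop_size=8, iters=30, seed=0):
    # Jaya update (Rao, 2016): move toward the best solution and away
    # from the worst; accept a trial only if it improves the cost.
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, size=(pop_size, len(low)))
    cost = np.array([fitness(p) for p in pop])
    for _ in range(iters):
        best, worst = pop[cost.argmin()], pop[cost.argmax()]
        for k in range(pop_size):
            r1, r2 = rng.random(len(low)), rng.random(len(low))
            trial = pop[k] + r1 * (best - np.abs(pop[k])) \
                           - r2 * (worst - np.abs(pop[k]))
            trial = np.clip(trial, low, high)
            c = fitness(trial)
            if c < cost[k]:
                pop[k], cost[k] = trial, c
    return pop[cost.argmin()]

# Hypothetical stand-in for the LSTM validation error in [44]; in practice
# the candidate (window, step, batch) sizes would be rounded to integers
# and used to train the network.
err = lambda p: (p[0] - 24) ** 2 + (p[1] - 1) ** 2 + (p[2] - 32) ** 2
print(jaya(err, low=np.array([1.0, 1.0, 8.0]),
           high=np.array([168.0, 24.0, 128.0])))
```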
4.2.8. Improved environment adaption algorithm
Meanwhile Singh et al. [45] have proposed a general-
ized NN model for short-term electricity price forecasting.
The classical ANN model is considered ideal for price fore-
casting because of its ability of mapping nonlinear problems
with high accuracy and dealing with complex forecasting
problems. However, the increased complexity of the forecasting model decreases its freedom, which results in under- or
overfitting problems. In the proposed model, the WT method is
used for data preprocessing. The weights of the forecasting
model are tuned by using the improved environment adaption method. The over- and underfitting problem is overcome by tuning the hyperparameters using this nature-inspired evolutionary algorithm. Adaption, alteration and selection are the three basic operations of this algorithm. The alteration operation makes the greatest contribution as it explores the search space, whereas adaption exploits the search space and thus has a minor contribution in finding the best solution. The forecast-
ing accuracy is evaluated by computing MAPE and MAE
performance metrics. The forecasting models with and without optimized weights are compared and the results depict that the generalized neuron forecasting model with optimized weights generates better results.
4.2.9. Tabu search
Progressing further, in [46], Bassamzadeh et al. have proposed a data-driven approach based on BN for electric-
ity demand forecasting. It computes the mutual dependen-
cies of the variables contributing to forecasting. BNs are
suitable for complex and lengthy datasets as they can han-
dle the incomplete data, integrate previous knowledge into
the model and provide a compact model to avoid overfitting.
The search-and-score category of this network is used, where a scoring metric is built by exploring the search space. To build the scoring metric, the BDeu score is used and to find the best directed acyclic graph from multiple available graphs, the authors have used tabu search. Initially, the score of a randomly
generated network is calculated, then arc operations are ap-
plied and new scores are computed. At the end, the arc operation with the maximum score is applied. This procedure does not guarantee a globally optimal solution; rather, it gives the locally best solution. This problem is addressed iteratively by repeating the same process until a stopping criterion is satisfied. As a discretized BN is most suitable according to the nature of the output values, the continuous values are discretized using the Fayyad and Irani method. The discretization parameters also need tuning; the authors have used the junction tree algorithm for efficient probability distribution and kernel density estimation with a Gaussian kernel to fit a continuous distribution to the learned histogram. The performance of the proposed system was analyzed using the average RMSE (ARMSE) as the performance metric.
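The score-driven arc search can be summarised by the generic tabu search skeleton below; the BDeu scoring and arc operations of [46] are abstracted into the neighbours and score callables, and the toy demo at the end is purely illustrative.

```python
def tabu_search(initial, neighbours, score, iters=100, tenure=10):
    # neighbours(x) yields (move, candidate) pairs; recently applied moves
    # are kept in a tabu list so the search does not immediately revisit them.
    current, best = initial, initial
    tabu = []
    for _ in range(iters):
        moves = [(m, cand) for m, cand in neighbours(current) if m not in tabu]
        if not moves:
            break
        move, current = max(moves, key=lambda mc: score(mc[1]))
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)                    # oldest move leaves the tabu list
        if score(current) > score(best):
            best = current
    return best

# Toy demo: maximise -(x - 3)^2 over the integers, tabu on visited states.
neigh = lambda x: [(x + 1, x + 1), (x - 1, x - 1)]
score = lambda x: -(x - 3) ** 2
print(tabu_search(0, neigh, score, iters=20, tenure=5))   # -> 3
```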
4.3. Statistical approach
In the forecasting models, several statistical approaches
are also used for optimization. The following subsections elaborate on the work where these models are implemented.
4.3.1. Grid search
In [47], Raviv et al. have addressed the question of whether the daily average electricity price is better predicted directly with daily models or through multivariate hourly price forecasts. For this purpose, univariate models for daily average price prediction are compared with multivariate hourly price prediction. The hourly electricity price pre-
diction is a challenging task as it requires multiple input fea-
tures and increases the computational complexity. It is ob-
served that the predictive value of the previous hour is not
reliable enough to predict the value of the next hour. The
hourly values are sensitive to the values of the same hour
on the previous day. So, the daily average electricity price
is forecasted using the univariate model and its performance
is compared with hourly price prediction using the multi-
variate model. The latter approach suffers from the curse of dimensionality, which can be addressed by using dimension reduction methods along with a regularization method which
eliminates the outliers. The results of multiple forecasting
algorithms are also combined to generate the final predic-
tion as the forecasting model performs differently in differ-
ent time intervals and combining the results of these models
can increase the accuracy of prediction. For the case study,
the dataset from the Nordic and Baltic transmission system
operator is acquired. It is a leading power market in Europe.
The duration of the dataset is from 1992 to 2010. From the univariate models, AR, dynamic AR (DAR) and heterogeneous AR are used for forecasting. In the AR model, a fixed number of lags is included in the model for prediction. Whereas, DAR determines the number of lags at each point using the Akaike information criterion. The heterogeneous variant of AR is designed to accommodate long memory to increase forecasting accuracy. From the multivariate forecast-
ing models, different variants of vector AR (VAR), factor
models (FM) and reduced rank regression models (RRR)
are used. These models are strictly constrained and their
complexity is limited in terms of unknown parameters, as
the unconstrained multivariate models suffer overfitting and
forecasting becomes less accurate. First, the VAR models are discussed, where vectors are maintained to store the predictive values of different hours. The first variant is the unrestricted VAR; it has no restriction in terms of unknown parameters and uses all the available parameters during forecasting. The second variant is the diagonal VAR, where the number of unknown parameters is restricted by limiting the off-diagonal lag coefficients to zero. The third variant is the Bayesian VAR, which uses the shrinkage method to limit the parameters. After the VAR variants, FMs are used to reduce the adverse effects of the high dimensionality of data. They use principal component analysis (PCA) and singular value decomposition (SVD) for dimensionality reduction. The third model is RRR which, instead of forming orthogonal variables from the matrix X like PCA, reduces the dimension by using an orthogonal projection of Y. To combine the forecast values of the proposed models, the authors have used two methods: the simple average and constrained least squares. The former uses equal weights for every method while averaging the final results and the latter optimizes the coefficient weights at every prediction point. For tuning the hyperparameters of
the forecasting models, grid search is used. The accuracy of
all forecasting models is evaluated using RMSE, MAE and
MAPE. The AR model is used as the benchmark model, and the error values are relative errors which represent the performance accuracy of all other models. It is evident from the results that the constrained least squares-based forecasting model outperforms all other forecasting models in terms of all three
performance metrics.
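The constrained least squares combination can be written as a small quadratic program: the weights are non-negative and sum to one. The sketch below is one reading of that scheme on synthetic forecasts.

```python
import numpy as np
from scipy.optimize import minimize

def combine_cls(preds, y):
    # Constrained least squares: non-negative weights that sum to one.
    n_models = preds.shape[0]
    obj = lambda w: np.sum((y - w @ preds) ** 2)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(obj, np.full(n_models, 1.0 / n_models),
                   bounds=[(0.0, 1.0)] * n_models, constraints=cons)
    return res.x

rng = np.random.default_rng(0)
y = rng.standard_normal(100)
preds = np.stack([y + 0.2 * rng.standard_normal(100),    # accurate model
                  y + 0.8 * rng.standard_normal(100)])   # noisier model
print(combine_cls(preds, y))     # most weight falls on the accurate model
# the simple average benchmark is just np.full(n_models, 1 / n_models)
```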
In related work [48], a new machine learning-based elec-
tricity load forecasting method is introduced which is the
combination of convolutional NN (CNN) and K-means clus-
tering. It is a scalable model specially developed for big
data-based forecasting. The raw data is collected and pre-
processed. Then the K-means algorithm is used to generate
the subsets of the source data. These subsets are used to train
the CNN model. It is a feed-forward network and its archi-
tecture is inspired by the structure of human neurons. It has
three types of layers: an input layer, multiple hidden lay-
ers and an output layer. In contrast to the other NN models,
it does not need feature engineering as it is also a sub-part
of this network model. The proposed model is designed for
hourly load forecasting using big data. Data is collected from the industry in raw form and data preprocessing is applied to
remove noise and outliers from it. After data preprocessing,
the K-means clustering algorithm is applied to the dataset
and small subsets of data are generated. Then these clus-
ters are used as training and testing datasets for the forecast-
ing model. The training datasets are used to train the CNN
model and after training, test sets are used to test the forecast-
ing. Besides, the K-means algorithm has a hyperparameter K, the number of clusters. This parameter
is tuned using the trial and error method. The hyperparam-
eters of the CNN model are optimized using a grid search.
To implement this model, a dataset of 1.4 million records is used; the duration of this dataset is from 2012 to 2014.
The performance of the proposed model is evaluated using
MAPE, RMSE, NMAE and NRMSE. It is also compared
with conventional CNN, linear regression, NN, SVR, linear
regression plus K-means, SVR plus K-means and NN plus
K-means for both summer and winter season. The perfor-
mance results depict that the proposed CNN plus K-means
model has the highest accuracy in terms of all performance
metrics.
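The partition-then-train pattern of this model can be sketched as below; a plain Ridge regressor stands in for the per-cluster CNN and the data are synthetic, so the sketch shows only the structure, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 24))        # hypothetical hourly load windows
y = X.sum(axis=1)                          # toy target

k = 4                                      # in [48], K is set by trial and error
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# One forecaster per subset; at prediction time a sample is routed to the
# model of its nearest cluster.
models = {c: Ridge().fit(X[labels == c], y[labels == c]) for c in range(k)}
```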
Xiao et al. investigate the possible applications of data
mining techniques in electricity load prediction using big
data in [49]. Deep learning-based techniques are used to ex-
plore data and forecast the load consumption. There are two
types of deep learning-based techniques: supervised and unsupervised. The former are used as prediction models and the latter are used for feature extraction. In this study, DT and asso-
ciation rule mining are used. For load forecasting, the data
is partitioned into training and testing datasets. Two data
mining techniques, DT and clustering, can be used for this
purpose. The former divides the data based on certain criteria and the latter divides the dataset based on similarity. DT is used
in this study as it is more interpretable. Moreover, another
deep learning-based model, deep autoencoder (DAE), is de-
veloped which uses the tanh activation function and gener-
ates a new feature set. It is applied to each tree sample for
new feature sets. The next step is the knowledge discovery.
Here the relationship of input variables is determined using
association rule mining techniques. The QuantMiner algo-
rithm is used for both numeric and absolute values. It is a
quantitative association rule mining algorithm and the as-
sociation of each data subset is computed separately. After
knowledge discovery, the forecasting algorithm is applied
to data for electricity load prediction. The gradient boost-
ing algorithm, SVR and extreme boosting algorithms are
used for load prediction and to evaluate the performance of the
data mining techniques. These forecasting algorithms are
applied to raw data, basic data and data generated by the
DAE method. The hyperparameters of all forecasting al-
gorithms are tuned using the grid search. For performance
evaluation of the proposed model, it is applied to the dataset
obtained from a campus building of the Hong Kong Polytechnic University, China. The performance of each forecasting al-
gorithm is evaluated using MAE, RMSE and coefficient of
variation of the RMSE. The simulation results and values of
these performance metrics demonstrate that higher accuracy
is achieved by the extreme boosting algorithm when it is applied to the DAE feature set.
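Grid search over a boosting model is straightforward with standard tooling; the sketch below uses sklearn's GridSearchCV with illustrative grid values, not those of [49].

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, random_state=0)

# Every combination in the grid is scored by cross-validated error.
grid = {"n_estimators": [100, 300],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0), grid,
                      cv=3, scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_)
```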
Raurich et al. have tackled the issue of electricity load
forecasting in non-residential buildings in [50]. The impor-
tance of temperature, occupancy, calendar and indoor ambient variables is also evaluated for load prediction of a building. The dataset from a university is collected using a wireless sensor network. The collected data is prepro-
cessed to reduce the computational cost and improve the ac-
curacy of the forecasting model. For load prediction, three
forecasting models, multilayer perceptron (MLP), multiple
linear regression (MLR) and SVR, are employed. The hy-
perparameters of these models are tuned using grid search
as it is an efficient method to tune the hyperparameters of
the models with small data size. The simulations are car-
ried out using Weka software on a machine with an Intel Core i7 CPU and 8 GB of RAM. Simulations are conducted
using different combinations of input variables and forecast-
ing accuracy of each combination is computed using MAPE
and correlation coefficient. From the results, it is concluded
that SVR gives the highest accuracy using temperature and
occupancy data.
A day-ahead electricity price and demand forecasting model
is proposed in [51]. It is beneficial for both electricity con-
sumers and providers. Electricity consumers can use this in-
formation for energy management and reduce their electric-
ity bills while electricity providers can use this forecasting
information to manage electricity generation and maintain
the reliability and efficiency of the grid. In this paper, the
authors have used two forecasting models, SVR and CNN.
The hyperparameters of both models are tuned using grid
search and their performance is compared. Two datasets are
used for forecasting. To evaluate the forecasting accuracy of
both models, MSE, RMSE, MAE and MAPE are used. The
simulations depict that the performance of enhanced CNN
is the best.
Garulli et al. [52] have addressed the problem of elec-
tricity load forecasting in the presence of demand response.
In an active demand (AD) scenario, the consumers alter their load
consumption profile according to the incentives provided by
the utility. Two models of load forecasting are introduced
i.e. black box and gray box models. In the black box model,
SVM and ANN are used for forecasting. In this model, the
load for next intervals is forecasted by using information re-
lated to the active demand, temperature and calendar data.
On the other hand, the gray box model uses prior informa-
tion related to load decomposition. It has two main stages
i.e., computation of the baseload and modeling of the residual. Exponential smoothing (ES) is used for the first stage, whereas the residual is modeled using TF, SVM and ANN, and the resultant models are known as ES-TF, ES-SVM and ES-ANN, respectively. Moreover, to evaluate the importance of AD,
the ES-ARMA model is also presented. For the simulation
of these forecasting models, MATLAB is used and RMSE
and NRMSE are computed for each forecasting model. Hy-
perparameters of the forecasting model are tuned using the
grid search. It is concluded that AD information is very im-
portant for accurate prediction and its scope is high in the
future as new AD-based products are being launched.
Keles et al. [53] have proposed a day-ahead electricity
price forecasting model based on ANN. The main focus of
this study is on data preprocessing techniques and the right
selection of hyperparameters of the model as these inputs
have a great impact on the forecasting accuracy of the model.
For data preprocessing, different clustering algorithms have
been used. There are two forecasting strategies available, i.e., direct forecasting and iterative forecasting. The former does not use historical forecasted values; instead, it performs forecasting independently by considering each interval a separate model. On the other hand, the latter uses the previous forecasted values. The authors have used the first strategy because, in the case of the second strategy, if an error occurs in a previous forecasted value, it is propagated to the next forecast as well. For hyperparameter tuning, both
cross validation and grid search are used. The first method
is used to tune the parameters like the selection of activation
function, learning rate, number of hidden layers and neurons
and number of output neurons. To select the best combina-
tion of the learning rate and momentum of the algorithm,
the grid search is used. The proposed model is compared
with three benchmark models i.e. ARMA and two NB mod-
els. For performance comparison, ARMSE and mean abso-
lute deviation are used. The dataset spans Jan to Sep 2013. The limitation of this work is that if new components are added to the system as input, the whole model needs changes, which makes it inefficient at integrating new parameters and adopting new changes.
4.3.2. Gradient descent
In [54], Amarasinghe et al. have proposed a load fore-
casting model based on deep NN (DNN). This model fore-
casts the power consumption of a building. The multiple
layers of deep learning algorithms allow it to identify the
load consumption patterns and the relationship of data fea-
tures more efficiently. It also learns the relationship between
inputs and their corresponding output. In this model, CNN
is used for forecasting. It applies multiple convolutional lay-
ers on the available dataset before the final prediction. The
convolution operation requires grid data for processing and
the kernel function also uses the weighted array of multiple
dimensions. Thus this model requires a dataset of multiple
dimensions for prediction. There are three basic steps of a
convolutional layer. In the first step, a feature map is gener-
ated using convolutional operation which is passed through
an activation function in the next step. The resultant output
is then processed by a pooling function to refine and reduce
the fluctuations from the feature map. In the second step, the rectified linear activation function is used and, for the pooling step, max pooling is used. As the proposed model
has multiple convolutional layers, so, each layer passes its
output to the next convolutional layer and BP is used for the
learning process. This BP-based model needs optimization, so the ADAM gradient descent optimization algorithm is used. After convolu-
tional layers, the output is forwarded to the hidden layer(s)
which then forwards it to the output layer of the network.
The hidden and output layers of the convolutional network
are the same as standard NN. To evaluate the accuracy of
the model, the loss function is used which computes the er-
rors in prediction for all predicted values. To check the ef-
fectiveness of the proposed model, it is implemented on the
benchmark dataset named "individual household electric power consumption dataset" from Dec 2006 to Nov 2010.
This model is also compared with ANN, SVM, factored conditional restricted Boltzmann machine and long short-
term memory (LSTM) using RMSE as a performance met-
ric. The results demonstrate that the proposed CNN is a vi-
able candidate for load forecasting as its forecasting accuracy
is higher than these benchmark forecasting algorithms. The
limitation identified in this work is that no data preprocess-
ing technique is mentioned here for the dataset. Literature
depicts the importance of data preprocessing steps in terms
of accuracy of forecasting models, time and space complex-
ity, etc.
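The convolution-activation-pooling pipeline described above maps directly onto a small 1-D CNN; the sketch below (layer sizes and data are our assumptions, not those of [54]) is trained with the Adam variant of gradient descent.

```python
import numpy as np
import tensorflow as tf

# Toy 24-step load windows standing in for the benchmark dataset.
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 24, 1)).astype("float32")
y = X.sum(axis=(1, 2))                     # toy regression target

# Convolution -> ReLU -> max pooling -> dense layers, per the pipeline above.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu",
                           input_shape=(24, 1)),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")   # Adam-based gradient descent
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```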
A novel framework for electricity price prediction is pro-
posed in [55]. It includes four novel deep learning-based
models. It is stated that there is no standard benchmark for
electricity price prediction, so, the proposed models are com-
pared with 27 forecasting algorithms to validate their effec-
tiveness. The proposed models include DNN, hybrid model
of LSTM and DNN, a hybrid model of gated recurrent units
(GRU) and DNN and a CNN model. The first model, DNN,
is the extension of MLP based on deep learning with two hid-
den layers. The second model is the combination of LSTM
and DNN forecasting models. It includes a normal layer that
learns the relationship of non-sequential data and a recurrent
layer which learns and identifies the relationship of sequen-
tial data of time series. The third layer is the combination of
GRU and DNN. Like the second forecasting model, it also
has separate layers for both sequential and non-sequential
data. However, the LSTM layer has an extra computational
burden which has been reduced by using GRU instead of
the LSTM network. The fourth deep learning model intro-
duced in this work is CNN. In the previous two models, the
data was separated as past data sequences and data contain-
ing day ahead information. Whereas, in this model, the data
clusters are made based on its dimensionality. Moreover,
the details of the hyperparameters are also given in this study. It is stated that the hyperparameters depend on the dataset in use. These parameters need to be optimized each time the dataset is changed, so there is a need for an optimizer to tune them when required. The hyperparameters are tuned using Adam, which belongs to the family of stochastic gradient descent methods. The common hyperparameters
of all models include activation function, dropout and L1-
norm penalization. The hyperparameters of DNN are the
number of neurons in both hidden layers. The hyperparame-
ters of both GRU-DNN and LSTM-DNN are the same which
are the number of neurons in recursive and DNN layers and
length of the sequences. The CNN model has the pooling frequency, pooling type, channel length, filter size and the numbers of convolution and feature maps as its hyperparameters. During the
training, the minimization of AE is used as the objective function. The
dataset from the European power exchange Belgium is used, covering the period from 1 Jan 2010 to the end of Nov 2016. The data
is preprocessed using Box-Cox transformation. The perfor-
mance of the proposed models is compared with 27 fore-
casting models: AR, double seasonal ARIMA (DSARIMA),
wavelet ARIMA (WARIMA), WARIMA-RBF, ARIMA-GARCH,
double seasonal holt winter (DSHW), Trigonometric regres-
sors to model Box-Cox transformations autoregressive mov-
ing average errors trend seasonality (TBATS), dynamic re-
gression (DR), transfer function (TF), AR with exogenous
input (ARX), threshold ARX (TARX), Hsieh-Manski ARX
(HMARX), smoothed nonparametric ARX (SNARX), full
ARX (FARX), FARX-least absolute shrinkage and selection
operator (FARX-LASSO), FARX with elastic net (FARX-
EN), MLP, RBF, SVR, self-organization map (SOM)-SVR,
SVR-ARIMA, random forest (RF), extreme gradient boost-
ing (XGB), DNN, LSTM, GRU and CNN. For comparison,
the performance of all forecasting models is evaluated us-
ing symmetric MAPE (sMAPE) as a performance metric.
The simulations are carried out using Python. The results
demonstrate that the proposed models outperform the other existing models, and among the four proposed models, each performs best in one interval or another.
In related work [56], Wang et al. have used a deep learning-based stacked denoising autoencoder (SDA) model and its extension, random sample SDA (RS-SDA), for short-term electricity price forecasting. These models are used for two types of predictions: hourly day-ahead and hourly online prices. Autoencoders are NNs which encode the input us-
ing an activation function before learning. It is an unsuper-
vised learning algorithm where instead of assigning labels,
the symmetry in data is learned. Before generating the out-
put, the values are decoded back to the original input format.
The output generated by autoencoders may have noise in it; this issue is overcome by using a denoising autoencoder which encodes the given input data and also removes the noise from it. The proposed SDA is the combination of multiple denois-
ing autoencoder layers, where the input of each autoencoder
is the output of the hidden layer of the previous autoencoder.
The RS-SDA incorporates random sample consensus and stochastic neighbor embedding, where the former is an iterative process used to eliminate the effects of outliers that occur during the construction of the model and the latter is used to improve the forecasting accuracy by optimizing the number of hidden layers. Data features are chosen with the help of
market traders and to check the relevancy of each feature, a
boosting tree algorithm is used. After feature selection, SDA
and RS-SDA are applied for forecasting, where a greedy lay-
ered architecture is used for training. It has two stages: a
pre-training stage, where autoencoder layers are trained it-
eratively using input data except the output layer and a fine-
tuning stage, where all the layers are trained including the
output layer to increase the forecasting accuracy of the model
and here mini-batch gradient descent is used. This model
is used because it does not require the whole screening of
the dataset and parameters are not updated iteratively. After
training and tuning of the forecasting models, the electric-
ity price value is forecasted. The proposed model is tested
by implementing it on the dataset from the Midcontinent Independent System Operator Inc. in the U.S., including the Nebraska Public Power District, Arkansas, Louisiana, Texas, and publicly available Indiana. For simulations, Python 2.7 on a computer with a Core i5 processor and 8 GB of RAM
is used. The proposed SDA and RS-SDA models are also compared with other forecasting models (classical NN and SVM) to check their forecasting accuracy. MAPE, MAPE(day) and
MAPE(month) are used as performance metrics. The limi-
tation identified in this paper is that the temperature feature
is not included as input. It is a very important feature which
affects the values of the price.
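A single denoising-autoencoder layer of the kind stacked in [56] can be sketched as below; the layer sizes, noise level and data are assumptions, and stacking would feed each hidden code into the next layer before the fine-tuning pass.

```python
import numpy as np
import tensorflow as tf

# Toy feature matrix standing in for the selected market features.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32)).astype("float32")
X_noisy = X + 0.1 * rng.standard_normal(X.shape).astype("float32")

inp = tf.keras.Input(shape=(32,))
code = tf.keras.layers.Dense(16, activation="sigmoid")(inp)   # encoder
out = tf.keras.layers.Dense(32)(code)                          # decoder
dae = tf.keras.Model(inp, out)
dae.compile(optimizer="sgd", loss="mse")   # mini-batch gradient descent
dae.fit(X_noisy, X, epochs=5, batch_size=32, verbose=0)        # denoise

encoder = tf.keras.Model(inp, code)        # hidden code feeds the next layer
```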
In the study [57], Lu et al. have proposed a NN based
short-term load forecasting system which is an improved ver-
sion of RBFNN. A new clustering algorithm, PCA based
weighted fuzzy C-Mean (PCA-WFCM), is proposed to de-
termine the optimal basis function centers which improves
the accuracy of the forecasting model. The new forecasting
model is named as RBF-PCA-WFCM. For the implemen-
tation of the proposed model, the dataset from NSW State,
Australia is used. This is a half-hourly dataset with 48 records
per day and the duration of the data is from April 2011 to
October 2011. The traditional RBFNN is based on approximation theory and is a feed-forward network. It has a simple
architecture which makes it easy to train. It has three impor-
tant layers: input layer, hidden layer and output layer. The
source data is entered in the input layer which is forwarded to
the hidden layer using non-linear transformation. This trans-
formation is used to extract important features from data. In
the next step, the data is forwarded to the output layer us-
ing a linear transformation. In this forecasting model, the selection of an optimal number of neurons for each layer is very important: the input layer neurons are equal in number to the input variables and the output layer neurons are equal to the number of required outputs, whereas the optimal number of hidden layer neurons is determined by minimizing the MAPE value. Moreover,
the connection weights of output layers also play a very im-
portant role in forecasting accuracy. Here, the gradient de-
scent method is used to train them. The most important task
of the RBFNN model is finding the accurate center points
of RBF which improve the forecasting accuracy. The tradi-
tional RBFNN uses the K-means algorithm for this purpose,
whereas, the authors have proposed a new PCA-WFCM al-
gorithm. PCA is used for dimension reduction of data and
clustering is based on fuzzy logic, where each point has some
degree of membership with each cluster. To evaluate the
performance of the proposed system, MAPE and MSE are
used as performance metrics. Its performance is also com-
pared with RBF and RBF fuzzy C-Mean (RBF-FCM) fore-
casting models. The results demonstrate that the proposed
algorithm outperforms these two forecasting models and has
higher accuracy. The limitation identified in this network is
that the gradient descent algorithm is not suitable for big data
problems and the number of hidden layer neurons could be
trained using some optimization algorithm.
Meanwhile, in [58], the forecasting problem of solar power
generation is addressed by using the least absolute shrinkage
and selection operator (LASSO) based algorithm. The his-
torical data related to weather is used and the significance of
each weather variable is also computed to acquire the knowl-
edge about their importance. To estimate the forecasting
model's coefficients, a Kendall's tau coefficient based algorithm is proposed which maximizes the values of Kendall's coefficients to find the solution. LASSO reduces the weights
of irrelevant variables and reduces the dimensions of the
data. However, there is a trade-off between reducing the
number of variables and the accuracy of the model. An increased number of variables results in better accuracy but also in increased computational time. The hyperpa-
rameter affecting the forecasting accuracy of the proposed
algorithm is tuned by using gradient descent. The solution
path method is also integrated with this tuning method to
increase the efficiency of the model and reduce the compu-
tational efforts, as using this new method the whole search
space is not explored for an optimal solution. The forecast-
ing performance of the proposed algorithm is compared with
a well-known forecasting technique i.e. SVM [59]. Two
datasets are used for the performance evaluation; the first one is recorded from Feb 2006 to Jan 2013 and the second spans April 2011 to Nov 2012. To compare the performance of these algorithms, RMSE and MAPE are used as performance metrics. The simulation results depict the outstanding performance of the proposed algorithm over the two benchmark algorithms. Hence, the LASSO-based forecasting algorithm proves to be a promising model for the solar power forecasting problem.
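Although [58] fits the model with a Kendall's tau based estimator, the role of the shrinkage hyperparameter is easy to see with a standard LASSO fit, sketched below on synthetic weather-like features.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in for the weather variables: only a few are informative.
X, y = make_regression(n_samples=400, n_features=20, n_informative=5,
                       noise=2.0, random_state=0)

# alpha is the shrinkage hyperparameter that [58] tunes (with gradient
# descent over the solution path); larger alpha zeroes out more variables.
model = Lasso(alpha=0.5).fit(X, y)
print("non-zero coefficients:", (model.coef_ != 0).sum())
```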
Zheng et al. have proposed a short term load forecasting
model in the study [60]. This model integrates data preprocessing, decomposition and forecasting methods for accurate prediction and is named SD-EMD-LSTM, where the similar days (SD) selection method is used for preprocessing, empirical mode decomposition (EMD) is used for data decomposition and LSTM is used for short-term load forecasting. In this model, in addition to the humidity, day type and
temperature data, day-ahead peak load is also used as an in-
put feature as short term load forecasting is affected by this
feature. LSTM is a recurrent NN (RNN) which is suitable
for the long-term data dependencies. The classic RNN is not
suitable for the long-term forecasting problem as it does not
store the previous data for a long time. Whereas, the LSTM
based RNN overcomes this problem by integrating mem-
ory cells which store the state information, thus making it
suitable for long-term prediction. Additionally, the researchers have introduced a sequence-to-sequence architecture in this study, where the lengths of the input and output are variable and the time scale of the forecasting model can be changed according to the requirements. The hyperparameters of the forecasting model are tuned using stochastic gradient descent. The forecasting error is minimized using MAPE as an objective function. The forecasting performance of the
proposed model is compared with ARIMA, BPNN and SVR.
Additionally, it is also compared with LSTM, SD-LSTM and
EMD-LSTM. MAPE is used as a performance metric and
its results demonstrate that the proposed framework has the
highest forecasting accuracy.
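The decomposition stage can be sketched with the PyEMD package (assuming `pip install EMD-signal`); each intrinsic mode function would then be forecast by its own LSTM and the component forecasts summed, per the SD-EMD-LSTM pipeline.

```python
import numpy as np
from PyEMD import EMD   # pip install EMD-signal

# Toy load series standing in for the preprocessed SD-selected data.
t = np.linspace(0.0, 10.0, 1000)
load = np.sin(2 * np.pi * t) + 0.3 * np.sin(20 * np.pi * t)

imfs = EMD().emd(load)            # rows: IMFs plus the final residue
print("number of components:", imfs.shape[0])
# forecast = sum of one LSTM prediction per component
```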
4.3.3. Cross validation
In the study [61], a forecasting model of electricity con-
sumption is proposed for event venues. The green button
data is used for model training. Such forecasting models can
help the venue organizers to estimate the load consumption
in advance and add its cost to their fee. Moreover, in such
events, energy is consumed in a huge amount and this de-
mand can put a burden on grids. The energy consumption
pattern of such events has huge variations as compared to the
office buildings which have strict energy consumption pat-
terns. In this work, two forecasting models are implemented
for prediction: NN and SVR. For the NN, a feed-forward NN is selected as it is the most commonly used type of NN. It has
three layers and information travels in one direction only.
The neurons of each layer are connected but there is no con-
nection between the neurons of the same layer. On the other
hand, the SVR is used for classification and regression-based
problems. In the first step, the dataset is divided into a test
set and training set. Then these datasets are used to train both
forecasting models. The hyperparameters of both models are
tuned using a cross validation technique. After model train-
ing, the prediction is made and the accuracy of prediction
is checked using the MAPE and coefficient of variation metrics.
Three case studies are designed for predictions. In the first
case study, the time interval is 15 min, and for the second and third case studies, the prediction is made for one-hour and one-day intervals, respectively. The duration of the dataset
used for forecasting is of two years. The performance com-
parison of both forecasting algorithms depicts that the feed-
forward NN has better performance than SVR.
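Cross-validation-based tuning of this kind amounts to scoring every candidate configuration by k-fold error on the training set, as sketched below with illustrative SVR candidate values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=12, noise=3.0, random_state=0)

# Every (C, gamma) pair is scored by 5-fold cross-validated MAE.
best, best_err = None, np.inf
for C in [1, 10, 100]:
    for gamma in [0.01, 0.1, 1.0]:
        err = -cross_val_score(SVR(C=C, gamma=gamma), X, y, cv=5,
                               scoring="neg_mean_absolute_error").mean()
        if err < best_err:
            best, best_err = (C, gamma), err
print("selected (C, gamma):", best)
```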
In [62], an inverse optimization scheme is proposed to
forecast the electricity load consumption. It has two levels
i.e., an upper-level problem and lower-level problems. The former deals with the estimation of the market bid, which is placed by either an electricity consumer or a retailer. The inverse optimization scheme is used at this stage. On the other hand, the latter deals with the price response of a group of con-
sumers. The parameters of this level are determined by the successful bid placed in the upper-level problem. The placed market bid represents the predicted amount of load for the future. The
tuning parameters of this model are optimized using cross
validation. The values of these parameters are optimized for
each month. The benchmark model ARX is used to check the effectiveness of the proposed scheme. For a fair compar-
ison, the dataset of Dec 2006 is used. The performance is
compared based on MAE, RMSE and MAPE performance
metrics. The simulations are carried out using a Linux based
system having a quad-core processor with a clock speed of
2.9 GHz and 6 GB of RAM. RStudio is used as a platform
along with CPLEX 12.3 to solve the optimization problems.
In related work [63], the short term load forecasting tech-
nique is presented for an educational building. The proposed
model has two stages and the dataset is collected from a uni-
versity having 32 buildings for the past five years. The data
related to weather conditions, calendar and university tim-
ings are collected. The on-line data of energy consumption
is monitored using KEPCO's smart system. The weather
information is collected from the Korean meteorological de-
partment. The data preprocessing step is also included and
values of input variables are normalized and transformed ac-
cording to the requirement. In the first step of the proposed
forecasting model, the frequent patterns are identified from
the available dataset with respect to the weekdays using a
simple moving average method. It is a popular method that
highlights the frequent trends in data for the long term and
short-term fluctuations in the dataset are ignored. It forecasts the load demand according to the frequent patterns identified in past data; this implies that if the identified patterns are not valid, it could generate poor forecasting values. Owing to this issue, this model cannot efficiently forecast the electricity load demand alone. Another forecasting method
named RF is used in this model for forecasting. This model
is an ensemble method that uses the predictions of multiple DTs. It is efficient for a large amount of data and has high ac-
curacy. Moreover, its hyperparameters need less tuning and
their default values give good results in most of the cases.
Its basic parameters are chosen using the cross validation
method. Moreover, as the proposed model is a time series model, its predictive accuracy becomes poor when the gap between the training and forecasting time gets bigger.
To solve this problem, Moon et al. have used the time series
cross validation method. In this method, there are multiple
horizons and this approach focuses on one forecasting hori-
zon at a time. The simulations are carried out using RStudio
with R-3.0.2. The proposed model is also compared with
already existing forecasting models like SVR and NN. For
performance evaluation of these forecasting models, MAPE,
RMSE and MAE are used as performance metrics. In addi-
tion, DT, multiple regression, gradient boosting machine and
J. Moon et al. models are also compared with this proposed
forecasting model. The results of all models demonstrate the
higher accuracy of the proposed model in terms of all three
performance metrics.
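Time series cross validation keeps the training data strictly before the validation window of each fold; a minimal sketch with sklearn's TimeSeriesSplit and an RF regressor is shown below on synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 6))          # hypothetical calendar/weather features
y = X @ rng.standard_normal(6) + 10.0 + 0.1 * rng.standard_normal(1000)

# Each fold trains on an initial segment and validates on the window after it.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X[train_idx], y[train_idx])
    print(mean_absolute_percentage_error(y[test_idx], rf.predict(X[test_idx])))
```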
Moghaddass et al. [64] have proposed an anomaly de-
tection method based on data collected from smart meters.
The major aim of this model is to detect the occurrence of an
anomaly in real-time and prevent it. An error count is mea-
sured at each customer’s side and delivered to the system
with which the customer is linked. This information is then
used to predict the occurrence of an anomaly. The math-
ematical formulas are defined to compute the error count
along with anomaly detection formulas for both customers’
side and system side. Anomaly detection with a control limit
is also defined. When the health index of the severity level
exceeds a certain level of threshold, an alarm is generated
which indicates the occurrence of a possible anomaly in the
system. The dataset is split into two sets, one with 80 percent of the data and the other with 20 percent. The first set is used for training the model and the remaining 20 percent is used for testing. The accuracy, precision
and false alarm generated by the system are computed and
accuracy is evaluated based on these measures. For tuning the hyperparameters of the system, the cross validation method is used.
In [65], an SVR based cascade failure prediction model
is proposed. A probabilistic framework is designed to main-
tain the historical database. This model collects both online
and offline data from the grid. The online data includes the
information related to voltage, current and measurements of
power flow from the grid. The offline data includes past data
related to islanding of grid, blackouts and transfer outage.
The proposed framework has two phases i.e. one collects
and maintains the database and in the second phase SVR is
used to predict the power outages using this historical data.
The hyperparameters of SVR are tuned using cross valida-
tion. The kernel function is chosen carefully as it affects the
complexity and smoothness of the prediction model. The
prediction generated by SVM is of binary nature which in-
dicates the occurrence of a cascading power failure in the future. This predicted value can be used to generate a warning for a possible failure. This system can be used in a real-time self-healing robust system in grid stations. Several scenar-
ios of failure are designed and tested using SVR. For sim-
ulations, LIBSVM of MATLAB is used. The performance
of the proposed model is satisfactory. However, the data pre-
processing is not included in the model which is a very im-
portant step and influences the prediction accuracy of a fore-
casting model. Moreover, the proposed forecasting model is
not compared with other forecasting techniques or models of the same area.
Zhao et al. [66] have proposed a voltage stability prediction model in SG. This model uses historical data for prediction. The data is generated using the power system analysis software package, a well-known software tool developed by the China Electric Power Research Institute for data generation. The
number of data samples depends on the complexity of the
model. If the system is highly coupled then more test cases
are required, whereas, in the case of a loosely coupled sys-
tem, fewer test samples are required. After data generation,
PCA has been used for feature selection. For prediction, the
logistic regression-based model is selected. It is trained us-
ing a cross validation method. The cross validation method
tunes the hyperparameters of the forecasting algorithm and
improves the accuracy of the system. It is an online pre-
diction model. The main contribution of this work is that
interval is defined for prediction. A moving window-based
method is used and the data flows into the matrices. When the matrices are full of data, a prediction is made about the stability of the system for the next few seconds or minutes. When new data enters the window, the old data is not entirely replaced; rather, only a portion of the old data is replaced by new data. The
performance of this model is evaluated by computing the ac-
curacy results of the predicted values. The comparison of the
proposed work with existing schemes is not presented.
4.3.4. NB
In [67], data mining techniques are employed for effi-
cient load forecasting. The proposed model eliminates the
outliers and selects relevant features in the first step. In the
next stage, two forecasting algorithms are used to predict
the load demand. The former stage is data preprocessing
which is carried out using data mining techniques, and the latter is the forecasting stage where a hybrid algorithm is used for
accurate and fast prediction of the load. For load forecast-
ing, Saleh et al. have proposed a hybrid algorithm named KN3B. It is a hybrid version of K-nearest neighbors (KNN) and the NB technique. NB is used to assign optimal weights to the training examples. The core idea is to replace the input feature space with the weight space of KNN: NB assigns the weights and the model is trained. To evaluate the perfor-
mance of the proposed model, it is implemented on the EU-
NITE electricity dataset. To measure the accuracy of the
proposed model, precision, sensitivity and accuracy are used
as performance metrics. The performance of the proposed
model is compared with some popular forecasting models
e.g., improved ARIMA, KNN, ANN and K-means & KNN.
The simulation results depict that the proposed model out-
performs all the already existing models.
Lago et al. have addressed the problem of electricity
price forecasting in [68]. Two forecasting methods are pro-
posed: one for single market price integration and the second for multiple market integration. A new analysis of
variance (ANOVA) based feature selection algorithm is pro-
posed for the data preprocessing. For price forecasting, DNN
based MLP model is used with multiple hidden layers. Tra-
ditionally, the weights are optimized using the Levenberg-Marquardt algorithm or a gradient descent model, but these algorithms are not suitable for models with large datasets. So, in this study, Lago et al. have preferred stochastic gradient descent. Moreover, this network also has hyperparameters that need proper optimization. For parameter optimization, different configurations are available and evolutionary algorithms are also used for this purpose. However, the former approach does not provide an optimal solution because of its fast decision making, and the second approach is not suitable for large datasets as its computational time is high.
Owing to this, the authors have used NB optimization which re-
quires a low number of function evaluations. It uses the in-
formation obtained from the previous samples and in this
way, the computational efforts are reduced. From BN, a tree-structured Parzen estimator is used. For performance evaluation,
sMAPE is used as a performance metric. The dataset from
electricity markets of France and Belgium is used and for
simulations, Python is used. The performance of the proposed model is not compared with any benchmark forecasting model; the forecasting performance is evaluated with respect to the original values of the dataset.
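The tree-structured Parzen estimator is available in off-the-shelf libraries such as hyperopt; the sketch below shows the search pattern with an assumed search space and a placeholder objective, not the network of [68].

```python
from hyperopt import fmin, hp, tpe

# Illustrative DNN search space: hidden layer sizes and dropout rate.
space = {
    "n_hidden1": hp.quniform("n_hidden1", 50, 400, 25),
    "n_hidden2": hp.quniform("n_hidden2", 25, 200, 25),
    "dropout": hp.uniform("dropout", 0.0, 0.5),
}

def objective(params):
    # Placeholder: build the DNN with params, train it, and return the
    # validation sMAPE; a smooth toy loss stands in here.
    return (params["n_hidden1"] - 200) ** 2 + params["dropout"]

# TPE reuses the information from previous evaluations, keeping the
# number of expensive function evaluations low.
best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
print(best)
```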
4.3.5. Dynamic integrated forecast system (DICast)
Meanwhile, in [69], Sulaiman et al. have described a big
data-based power generation forecasting system that is be-
ing used by the utility to forecast the power generation of
RES. The Sun4Cast system is used to forecast the power generation from solar irradiance. This system was developed by NCAR, which works with utilities and independent system operators to develop forecasting systems that allow them to plan the efficient integration of variable power generation sources. This model has two important components includ-
ing numerical weather prediction (NWP) and the Now-
cast system. Both systems have several forecasting models
integrated with them to forecast the accurate power genera-
tion value. The forecasting models which come under the NWP system include: weather research and forecasting (WRF), a newly developed forecasting method for solar irradiance; the global forecast system, which is used to forecast the weather conditions; the high-resolution rapid refresh (HRRR) model, which is used to forecast the weather condition of a specific area (in this model, it forecasts the weather condition up to 3 km); and rapid refresh, which is similar to HRRR but has a wider domain and forecasts on an hourly basis. On the other hand,
the Nowcast system includes TSICast, StatCast, CIRACast,
MADCast for cloud coverage prediction and WRF-Solar-
Now for solar irradiation. DICast is used to integrate the NWP forecasts, optimize their weights and generate the final prediction. These forecasted values are used by the
ISO partners to plan their daily operations related to power.
The day ahead prediction is acquired using NWP. Whereas,
hourly prediction of the same day is acquired through the
NowCast system. This system is making the integration of
RES efficient and encouraging the partners to plan their day-to-day operations and rely on the RES.
4.3.6. Quasi-Newton method
Gonzalez et al. have proposed an ARMA exogenous (ARMAX) based functional time series forecasting method defined over a Hilbert space in the study [70]. This model is named the ARMA Hilbertian model with exogenous variables (ARMAHX) and it is capable of modeling the complex time dependencies of a time series curve. It also includes the explanatory variables necessary for the time series. The kernel function of the Hilbert operator is modeled as a weighted sum of sigmoid functions for the simplicity of the model. The hyperparameters of this model are tuned using the Quasi-Newton method. The weights are initialized randomly at the first iteration and tuned in every iteration until the forecasting accuracy no longer improves. The forecasting model is employed on two time series of electricity price data from Jan 2014 to Dec 2015. Its performance is compared with MLP, the DR model and a periodic model in terms of MAE, RMSE and dynamic weighted MAE; the results show that the proposed forecasting algorithm outperforms these models and generates more accurate and reliable results.
4.3.7. Levenberg-Marquardt
Progressing further, in [71], a new short term load fore-
casting model in MG is proposed. In this model, self-recurrent
wavelet NN (SRWNN) is used for the prediction. The per-
formance of this model is evaluated using real-world data.
The dataset from March 2012 to March 2013 of the British
Columbia Institute of Technology is used for the
case study. The WT is used for data preprocessing and as
an activation function for NN. The SRWNN is the hybrid
version of wavelet NN (WNN) and RNN. It combines the
dynamic properties of RNN and fast convergence of WNN
which is applied to the non-linear problems. SRWNN also
has memory to store the information related to past wavelets
which helps it to solve the complex non-linear problems effi-
ciently. It is specially designed to tackle the fluctuations and
volatility of the load time series of the MG. The free parameters of this algorithm need tuning, as the Morlet wavelet function is differentiable with respect to these free parameters. The authors have used the Levenberg-Marquardt algorithm, which was used by Hagan and Menhaj [72] to train NNs. It is used in the literature to solve optimization problems and is recommended by researchers because of its accurate training and fast convergence rate. For performance evaluation, two performance metrics are used, i.e., NRMSE and NMAE. The values of these metrics are compared with those obtained from both MLP and WNN. The results depict that SRWNN performs better in terms of both performance metrics.
4.3.8. Excavated association rules
In [73], an RBFNN based prediction model is used to predict the parameter values of power transformers in SG. This prediction
helps to identify the possible failure of the power system.
An RBF is used as an activation function in its hidden layer.
The weights of this prediction model are tuned using exca-
vated association rules. A data-driven model is proposed
in this study, where the Apriori method is combined with a
probabilistic graph model for association rule mining. In this
method, the Apriori algorithm is used to mine the frequent
patterns in a dataset, these items are represented by a prob-
abilistic graph and association rules are excavated by these
graphs. The dataset is searched only once and to discover the
new association rule, the graph is traversed. The complexity
of traversing a graph is also less than searching the whole
dataset. Moreover, the coefficients of support and confi-
dence between rules are also computed. In this model, at
the first stage, the state variables are read from the database.
In the next step, the Apriori algorithm is used to find the binary frequent itemsets. The conditional probability distribution is computed for each set. The directed graphs are plotted,
where state variables act as nodes and frequent itemsets and
their probability is treated as edges between them. In the fi-
nal step, the association rules are mined and their support
and confidence are computed. The performance of the fore-
casting algorithm with and without association rule mining
is evaluated. Simulation results depict that the association
rules play a very important role in the improvement of the
forecasting accuracy.
4.3.9. Non-homogeneous Poisson process (NHPP)
Yue et al. [74] have proposed an electricity outage pre-
diction model based on BN. This model uses historical data
based on radar observations. The regression models are de-
veloped to compute the failure rate of the grid’s components.
These models use the data related to the failure of grid com-
ponents and information acquired using radar. The radar
data contains the information related to the peaks of dif-
ferent weather conditions along with the duration of these
peak values. In the existing literature, only the peak values of weather conditions are used but, according to the authors, the duration of these peaks also affects the failure rate of grid components: the longer they are exposed to bad weather conditions, the higher their chances of failure. The
proposed Bayesian outage prediction (BOP) algorithm uses
both historical data about outages and failure rate models.
The performance of this algorithm is improved by the inte-
gration of NHPP. The test results demonstrate that the pre-
diction of the BOP algorithm has improved significantly when
NHPP is integrated. For simulation, MATLAB is used. The
proposed algorithm uses generic data so it can easily be ap-
plied to any grid without any technical changes. Besides,
the authors have mentioned that there is a trade-off between the coverage area and the accuracy of the model. If a large area is considered, the computational effort increases but, as more data becomes available, the accuracy will be high. On the other hand, if a small area is considered, the computational effort is lower but, at the same time, the variety in the data will be low, which can reduce the accuracy of the model. So, a moderate area should be considered.
4.3.10. Alternating direction method of multipliers
In [75], Yu et al. have addressed the problem of electric-
ity load forecasting of an individual residential building. It
is a very complex task to forecast the load behavior of a sin-
gle building. The behavior of individual buildings is very
stochastic and volatile as compared to the load consump-
tion behavior of a city or group of buildings. For prediction,
the authors have used sparse coding. The dataset is obtained from
smart meters of the SG data analytics project in collabora-
tion with EPB of Chattanooga, Tennessee. The load data is
collected from 5000 meters along with hourly temperature in
both winter and summer as the temperature is a very impor-
tant feature for load forecasting. Sparse coding is frequently
used in image and signal processing.
Table 3
Forecasting models and hyperparameter optimization techniques

| Forecasting technique | Forecasting domain | Optimization method | Dataset | Publishing year |
|---|---|---|---|---|
| Genetic-RVM [30] | Price | GA | Real-world prices from the New England electricity market | 2015 |
| Game-theoretical based adjusted end user forecasting model [31] | Renewable power | GA | A local MG in Hebei Province, China | 2017 |
| ESN and PCA decomposition [32] | Load | GA | ACEA | 2015 |
| NN [33] | Wind power | GA | Historical measurement records of the wind farm SCADA system database and meteorological variables of the NWP model | 2017 |
| RVM [34] | Price | MGA | New England electricity market | 2015 |
| Combined NN based model [35] | Load | CSA | Electricity power data from February 2006 to 2009 for the State of New South Wales, August 2006 to 2008 for the State of Victoria and November 2006 to 2008 for the State of Queensland, Australia | 2015 |
| DCANN [37] | Price | CSA | Australia electricity market | 2016 |
| SVR [38] | Load | MFA | Practical daily load data of Fars province in Iran published by Fars Electrical Power Company | 2014 |
| MIMO-LSSVM model [39] | Load and price | QOABCO | Real data of the New York Independent System Operator | 2015 |
| IENN [40] | Load | NSSA optimization algorithm | Customized dataset | 2017 |
| Neural predictors FNN and RBF [41] | Demand | PSO | Customized dataset | 2017 |
| Online SVR [42] | Load | PSO & ACO | Public Irish CER dataset | 2017 |
| ARIMA/SVR [43] | Load | Metaheuristic algorithms | Customized smart metering infrastructure | 2016 |
| Generalized neuron model [45] | Price | Improved environment adaptation method | Electricity market of New South Wales | 2017 |
| BN [46] | Demand | Tabu search | Pacific Northwest national lab | 2017 |
| Multi and uni-variant models [47] | Price | Grid search | Nordic power exchange, Nord Pool Spot, owned by the Nordic and Baltic transmission system | 2015 |
| CNN and K-means [48] | Load | Trial-and-error method and Adam optimizer | Big electricity load dataset from the power industry | 2017 |
| Deep learning based models [49] | Load | Cross validation, parameter grid search | Data from a campus building in the Hong Kong Polytechnic University | 2017 |
| MLR, MLP, SVR [50] | Load | Grid search | Customized dataset | 2015 |
| Black and grey box testing models [52] | Load | Grid search | Obtained from simulations | 2015 |
| ANN [53] | Price | Grid search | EPEX for the German/Austrian power market | 2016 |
| CNN [54] | Load | Stochastic gradient descent | Benchmark individual household electric power consumption dataset | 2017 |
| Deep learning methods [55] | Price prediction | ADAM optimizer | Day-ahead market in Belgium, i.e., European power exchange Belgium | 2018 |
| SDA and RS-SDA [56] | Price | Gradient descent | Nebraska, Arkansas, Louisiana, Texas, and Indiana | 2017 |
| RBFNN model based on PCA-WFCM [57] | Load | Gradient descent | Real-time hourly load data (in MW Hrs.) of New South Wales State, Australia | 2016 |
| LASSO [58] | Power generation | Gradient descent | Three different datasets gathered in both US and UK | 2018 |
| SD-EMD-LSTM [60] | Demand | Stochastic gradient descent | ISO New England | 2017 |
| FFNN/SVR [61] | Load | Cross validation | 2 years of event data from Green Button ISO hubs in the U.S. | 2016 |
| Inverse optimization scheme [62] | Load | Bi-level programming problem | Actual data obtained from a real-life experiment | 2016 |
| RF [63] | Load | Cross validation | Customized data of a university campus | 2018 |
| Anomaly detection model [64] | Anomaly prediction | Cross validation | Self-generated data | 2017 |
| SVM [65] | Blackout prediction | Cross validation | Monte-Carlo simulations | 2014 |
| Logistic regression [66] | Voltage stability prediction | Grid search | Power system simulation software PSASP | 2016 |
| Hybrid KN3 B predictor [67] | Load | NB | EUNITE electrical load dataset | 2016 |
| DNN [68] | Price | Parzen estimator based NB | Data from EPEX-Belgium and EPEX-France power exchanges | 2017 |
| NWP [69] | Power generation | DICast | Not mentioned | 2017 |
| ARMAX time series model [70] | Price | Quasi-Newton algorithm | Spanish electricity market operator | 2018 |
| SRWNN [71] | Load | Levenberg-Marquardt | British Columbia's and California's power system data | 2015 |
| RBFNN [73] | State parameters of power transformers | Excavated association rules | Data of state parameters of five 500 kV power transformers | 2016 |
| Bayesian approach [74] | Power outage prediction | NHPP | Radar data and local surface meteorological measurements from the national weather service stations | 2017 |
| Sparse coding [75] | Load | Alternating direction method of multipliers | Electric power board of Chattanooga | 2017 |
For load consumption information from multiple meters, a dictionary D is learned,
having q dimensions. Two types of sparse codes are devel-
oped in this study. The first one is basic sparse which has
a penalty for sparsity and square loss for reconstruction er-
ror. The second model is group sparse; it is the same as basic sparse except that the single sparsity penalty is exchanged with a group penalty. The dictionary learning problem of this method is solved using the alternating direction method of multipliers, which is popular for obtaining optimized values of the objective function. After learning the sparse code, a regression model is trained for prediction. In this study, the ridge regression model is used for day-ahead and next-week load forecasting. This model is chosen instead of the frequently used SVR model because both have similar forecasting accuracies and ridge regression takes less time in training. This
model solves the optimization problem and learns the re-
gression weight vector by using the input values. The perfor-
mance of the proposed model is compared with ARIMA and
Holt-Winters forecasting model. All the forecasting mod-
els are implemented on the dataset of electricity consump-
tion data of house-holds in Chattanooga, TN. The duration
of this dataset is from Sep 2011 to Aug 2013. The data of
meters consuming low power for two months is eliminated
from the dataset, as it happens rarely and such kind of data
can cause less accurate forecasting results. The temperature
data is also of the same duration and acquired from an air-
port. To compare the performance of the proposed model
with benchmark models, MAE, RMSE and MAPE are used
as performance metrics. The simulations are carried out using MATLAB on an Intel Xeon quad-core 3.33 GHz CPU with 24 GB of RAM. For the sparse coding, MEX code written in C is used. The simulation results depict the effectiveness of the
proposed sparse coding based forecasting model. The best
performance in terms of all performance metrics is of basic
sparse based model and group sparse based model is second-
best.
Table 3 contains the forecasting methods and their re-
spective hyperparameter optimization methods used in liter-
ature. It also contains the information related to the dataset
and publishing year of the respective study.
5. Data preprocessing
In the previous section, we have discussed optimization
methods for the hyperparameter tuning of forecasting algo-
rithms in detail. Another important factor which affects the
forecasting accuracy of these methods is the quality of data.
A dataset mostly includes noise in it and needs to be cleaned before being used for forecasting. In this section, we discuss some data preprocessing methods used in the literature for data cleaning.
Wang et al. have proposed a novel electricity price fore-
casting model in the study [30]. This model has 3 essential
modules namely: feature selector, feature extractor and fore-
casting module. For feature selection, a new hybrid model
has been proposed. This model combines the RF and ReliefF algorithms. Both algorithms evaluate the importance
of input features independently and then the resultant values
from both algorithms are considered jointly for selection or
rejection of a feature. The joint value is compared with an
already defined threshold and features having lower values
are discarded. In this way, the less important and irrelevant
features are filtered out from the dataset. However, there
still exists some redundancy in the dataset. So, the resultant
dataset from the first module is sent to the next feature ex-
traction module of the proposed model, kernel PCA (KPCA)
which is the variation of PCA. PCA is commonly used for
feature extraction and redundancy elimination, however, it
linearly maps data from high dimension to lower dimension.
The electricity price forecasting needs non-linear mapping,
so, its variant KPCA is used as it performs non-linear dimen-
sion reduction. The third and final module of the proposed
model is price forecasting module.
In [35], 3 datasets are used for prediction and all of them
are in raw form and need preprocessing to eliminate redun-
dancy and spikes from data. So, the datasets are catego-
rized according to the different days of the week, as each day
has different load behavior. Moreover, a data preprocess-
ing technique, longitudinal data selection, is also employed
to make data more reliable and improve forecasting accu-
racy. Meanwhile, in [67], the proposed model eliminates
the outliers and two algorithms are used for feature selection.
The primary goal of the outlier elimination step is to discard
those data objects from the dataset which have exceptional
and rare behavior when compared to the rest of the dataset, e.g., data objects recorded on some special event like Christmas or New Year's Eve. The outliers are the main cause of overfit-
ting of the forecasting model as these are unwanted and rare
training patterns which have misleading behavior. So in this
study, distance-based outlier rejection (DBOR) is employed.
For feature selection, Saleh et al. have used two algorithms.
The first one is a wrapper based feature selection algorithm.
As this algorithm has some characteristics of GA, it explores
the search space. However, like other wrapping algorithms,
it can also detect only local maxima. To overcome this limi-
tation, a filter-based feature selection model, recursive best-
first search, is used. It uses an evaluation function to calcu-
late the importance of every feature. This new hybrid feature
selection algorithm is named as UHFS. The selected features
are then sent for load forecasting. Moreover, the effective-
ness of the preprocessing method is evaluated using differ-
ent scenarios e.g. DBOR+UHFS, DBOR and UHFS. The
results show that the highest accuracy is acquired when both
DBOR and UHFS are used together.
In [32], values of the dataset are rescaled in the range
[0, 1]. For rescaling, a unity-based normalization method is used. It is evident from the literature that time series based multivariate forecasting has high complexity, which can be efficiently reduced by using SVD. So, in this work, Bianchi et al. have used the SVD based dimension reduction method PCA. It is a statistical method that generates principal components by applying an orthogonal transformation to the correlated variables. The orthogonal property enables each column to predict the price individually, as the value of a(i, j) depends on a(i-1, j), a(i-2, j), .... The authors took advantage of this property, considered each column a separate time series and used each for prediction individually. In the
end, all the predicted values are integrated for the final result. The
proposed model in [39] has three steps. The first two steps
are for data preprocessing and the third step is for prediction.
In the first step, the less important features are filtered out us-
ing a new feature selection algorithm. This algorithm uses
the greedy search and selects those features which have the
highest correlation with already selected features. This algo-
rithm is named as generalized mutual information (GMI). In
the next step, the dataset is divided into several subsets using
the wavelet packet transform (WPT) method. The basic steps
of WPT are the same as the discrete wavelet transform, ex-
cept it decomposes the detailed coefficients in addition to the
approximate coefficient. So, information is not lost during
the decomposition process. The WPT has multiple branches
and the best branch is selected using Shannon entropy cri-
teria. The introduction of this new selection criteria in the
WPT branch selection method is also one of the contribu-
tions of this work. For data preprocessing, a new method
based on the index of bad sample matrix (IBSM) is proposed
in [37]. The existing feature selection methods, used in lit-
erature, select the features without considering their relia-
bility. Furthermore, the number of selected features is often
fixed and the contribution of each input feature to the output
is ignored. In the proposed IBSM method these limitations
are addressed. It dynamically selects the relevant features
and bad samples are filtered out using the original indexes of
training samples. Implementation of IBSM is the first step
of this method, in the next step SOM is used. It is an unsu-
pervised learning method which belongs to the category of
ANN and it maps the input space of training samples to a
low dimensional output space.
Liu et al. [40] have proposed a sliding window EMD
(SWEMD) model for data preprocessing. It is the exten-
sion of the EMD model. For feature selection, a new algo-
rithm is introduced which is based on Pearson's method
of computing the correlation of the features. For data pre-
processing, the temperature and load data are normalized
over the interval [0, 1]. To further improve the data qual-
ity and reduce the fluctuations in values, SWEMD is ap-
plied. After this step, features are evaluated using Pearson's correlation method. This newly proposed method suc-
cessfully reduces the dimensions of the dataset as it removes
the redundancy and selects highly correlated features. This
method is named as maximize the relevancy and minimize
the redundancy based Pearson's correlation coefficients
(MRMRPC). After feature selection, the data is forwarded to
the forecasting engine for prediction. The dataset is prepro-
cessed using the WT algorithm in [41]. Here, fluctuations
and uncertainties are removed from the dataset for accurate
prediction. Progressing further, in [68], it is stated that the existing feature selection methods do not consider the model performance during the filtering process, which results in redundant features; the relative importance of the features is also not computed. Moreover, in the case of a non-linear model,
the input features are transformed from a higher dimension
to the lower dimension which may result in loss of informa-
tion of the input features. In this regard, a new feature selec-
tion algorithm is proposed which is ANOVA based wrapper
selection method. It selects features without transforming
them into a lower dimension. In the first step, the features
are modeled as hyperparameters. In the second step, the hy-
perparameter optimization algorithm, tree-structured Parzen
estimator, is used to optimize and select the optimal features.
In the next step, the importance of the features is analyzed.
The last step uses the feature importance value for the se-
lection of the features. In this step, a threshold is defined
and features having a value greater than this threshold are
selected for forecasting.
For data preprocessing in [69], the correlation of the input variables is computed using Pearson's correlation method.
In the first step of the proposed forecasting model, the fre-
quent patterns are identified from the available dataset with
respect to the weekdays using simple moving average method.
It is a popular method which highlights the frequent trends
in data for the long term while short term fluctuations are ig-
nored. In [57], PCA is used for dimension reduction of data
and fuzzy-based clustering is used in this algorithm where
each point has some degree of membership with each clus-
ter. In related work [48], data is cleaned and normalization
operation is performed to shrink the interval of values by tak-
ing their log. After normalization of values, it is important to
remove the irrelevant factors from the dataset as the presence
of these factors could affect the forecasting accuracy of the
model. For this purpose, Pearson's correlation method
is used to compute the relevance and the computed values
of each factor are compared with a threshold. Factors having values less than this threshold are filtered out.
Gonzalez et al. [70], instead of using functional PCA for data preprocessing, have proposed a functional data theory with standard time series which uses the sigmoid function to generate the appropriate parametric functional operator. This model solves the problem of information loss present
in functional PCA. While in [50], the available data is pre-
processed. In the first step of data preprocessing the rele-
vant features are selected which have higher predictive ca-
pacity and minimum redundancy. In the second step, the in-
stances with missing values are removed. A total of 21.16%
instances are removed in this step. Outliers elimination is
also an important step of data preprocessing which improves
the accuracy of data up to a significant level. It has trade-off
as if the criteria of outliers elimination is strict then informa-
tion is also lost. During this step, 1.49% of data is discarded.
In the last step, the data is normalized. Before applying the
forecasting model on the whole dataset, a subset of data is
selected as its representative. In this way, the computational
cost is reduced without affecting the performance of the sys-
tem.
Keles et al. [53] have used a moving median method to
eliminate the trends and seasonal components from available
data. The autocorrelation function is applied to get the in-
formation of the lags in data. Capacity utilization function
and residual load indicator functions are used to compute the
ratio between residual load and available load and extreme
changes in residual data respectively. The information ob-
tained by these methods is used as input in the ANN model.
The mutual information method is used to determine a suitable lag for the input variable; in this study, the authors have used this method after the normalization of data. The next step is to
identify the relevant input data. KNN based filter method is
used for this purpose. Backward elimination and forward validation procedures are used to select the subset of input data.
In related work [45], the WT model is used for data pre-
processing. This model is applied to the price time series as
its behavior is not suitable for processing. This data prepro-
cessing method separates the low and high-frequency com-
ponents of data. In this way, the accuracy of the forecast-
ing model is improved. While Chitsaz et al. [71] have used
WT for data preprocessing. It classifies the dataset into low
and high-frequency components.
Table 4
Data preprocessing techniques and datasets used in the forecasting models

| Software | Hardware | Data preprocessing techniques | Comparative techniques | Data duration |
|---|---|---|---|---|
| Python [30] | Intel Core i5, 4 GB RAM, and 500 GB hard disk | GCA based selector, KPCA having 50000 records | NB, DT | 2010 to 2015 |
| Not mentioned [31] | Not mentioned | Not included | BP, SVM | September 2015 to October 2016 |
| MATLAB using ESN toolbox [32] | Not mentioned | Unity based normalization | ARIMA, ESN | 3 years of dataset |
| MATLAB [33] | Core i5-5200 CPU, 2.20 GHz processor and 4 GB RAM | Not included | Double stage BP trained ANN | 1st May 2014 to 31st April 2015 |
| Not mentioned [34] | Not mentioned | Not included | Individual RVM models, ARMA and the naive forecaster | January to December 2001 |
| Not mentioned [35] | Not mentioned | Longitudinal data selection | BPNN, BPNN with hidden layers, GABP, RBF, GRNN | 2006 to 2009 |
| MATLAB 7.0 & Windows 7 [37] | i7-3770 3.40 GHz CPU | SOM and IBSM | BPNN, LSSVM, DCANN, FNN, ARFIMA, GARCH | 2010 |
| Not mentioned [38] | Not mentioned | Not included | ARMA model, ANN, SVR-GA, SVR-HBMO, SVR-PSO, SVR-FA | March 2007 to February 2010 |
| MATLAB (R2011a) [39] | 2.53 GHz Pentium 2 processor with 4.0 GB of RAM | WPT and GMI | ANN | January 1 to March 1, 2014 |
| Not mentioned [40] | Not mentioned | MRMRPC, normalization | ARIMA, SVM, BPNN, RBFNN, GRNN, fuzzy ARTMAP, WT+BPNN, WT+RBFNN, WT+GRNN, WT+FA, WT+FFA+FA, WT+MIMO+NN | August 10, 2015 to August 10, 2016 |
| MATLAB with NN Toolbox [41] | Not mentioned | WT | BPNN, FNN+PSO, WT+BPNN, WT+FNN+PSO, EN, ARIMA, RBF | January 1, 2014 to December 31, 2014 |
| Not mentioned [42] | Not mentioned | Not included | PSO vs ACO, RF-week, BAGG, online SVR, XRT-week, BAGG-week, DSHW, STL+ARIMA-week, XGB-week, XGB, XRT, STL+ES-week, RF, SVR-week, SVR, DLnet, MLP, STL+ARIMA, STL+ES | 2009 to 2010 |
| MATLAB [43] | Server system, high performance computer, data server | Not included | Not included | Not mentioned |
| Not mentioned [45] | Not mentioned | WT | WT+GNM | Not mentioned |
| RStudio [46] | Not mentioned | Fayyad and Irani discretization | Actual values | April 1st, 2006 to March 31st, 2007 |
| Not mentioned [47] | Not mentioned | Reduced rank Bayesian, VAR | FMs, reduced rank models, forecast combination | 1992 to 2010 |
| Python using TensorFlow [48] | Not mentioned | Pearson's product-moment correlation | Linear regression, SVR, NN | 2012 to 2014 |
| Not mentioned [49] | Not mentioned | DT | Gradient boosting machines, SVR, XGB trees | Not mentioned |
| Weka software [50] | Intel Core i7-4500U processor and 8 GB of DDR3 RAM | Genetic search, correlation-based feature selection | MLR, MLP, SVR | 13th May, 2013 to 26th March, 2014 |
| MATLAB [52] | Not mentioned | Not included | ANN, SVM | April to October 2008, 19488 samples |
| Not mentioned [53] | Not mentioned | Autocorrelation, capacity utilization factor, relative load indicator, mutual information method | Naive forecasts, ARIMA | July 2011 to September 2013 |
| Not mentioned [54] | Not mentioned | Not included | LSTM sequence-to-sequence, factored restricted Boltzmann machines, ANN, SVM | December 2006 to November 2010 with 34608 records |
| Python using Keras DL library [55] | Not mentioned | Box-Cox transformation | DNN, GRU, LSTM, MLP, SVR, SVR-ARIMA, XGB, FARX-EN, CNN, FARX-Lasso, RBF, FARX, RF, HMARX, DR, TARX, SNARX, TBATS, SOM-SVR, ARIMA-GARCH, AR, DSHW, TF, WARIMA-RBF, WARIMA, DSARIMA | January 1, 2010 to November 31, 2016 |
| Python 2.7 [56] | Core i5 CPU and 8 GB RAM | Boosting trees | Classical NN, SVM, Lasso | January, 2012 to November, 2014 |
| Not mentioned [57] | Not mentioned | PCA | RBF, RBF-FCM | 04 Apr, 2011 to 24 Oct, 2011 |
| Not mentioned [58] | Not mentioned | LASSO, Kendall's coefficients | SVM | 2006 to 2013 |
| Not mentioned [60] | Not mentioned | Xgboost algorithm | ARIMA, BPNN, SVR | 2003 to 2016 |
| Not mentioned [61] | Not mentioned | Not included | Variants of NN and SVR | Duration of two years |
| R and CPLEX 12.3 [62] | Quad Core 2.90 GHz and 6 GB RAM | Not included | ARX | September, 2006 to March, 2007 |
| RStudio with R-3.0.2 [63] | Not mentioned | Pearson correlation comparison | TBATS, DT, multiple regression, gradient boosting machine, SVR, ANN, J. Moon et al. | 2012 to 2016 |
| Not mentioned [64] | Not mentioned | Mathematical model proposed for missing values detection | Base model | Historical data of multiple smart meters in real time |
| MATLAB [65] | Not mentioned | Not included | Base value | March 21st, 2010 to June 28th, 2013 |
| Power system analysis software package [66] | Not mentioned | PCA | No comparison is available | Not mentioned |
| Not mentioned [67] | Not mentioned | DBOR, genetic based features selector, UHFS | BPNN, IKNN, NBSVM, INB | January 1, 1997 to December 31, 1998 |
| Python [68] | Not mentioned | Wrapper selection algorithm based on functional ANOVA | Single market model is compared with multiple markets model | January 1, 2010 to November 31, 2016 |
| Not mentioned [69] | Cloud resources are used | Not included | CIRACast, MAD-WRF, Smart Persist, WRFSolarNow, MADCAST, NowCAST, StatCAST-Cubist | Not mentioned |
| MATLAB [70] | Not mentioned | Autocorrelation functions | MLP, DR model, periodic model, NB, functional reference method, FPC dimension reduction | January 1, 2014 to December 31, 2015 |
| Not mentioned [71] | Mac Intel Core i5 2.7 GHz with 12 GB RAM | WT | WNN and MLP | March 2012 to March 2013 |
| Not mentioned [73] | Not mentioned | Apriori Tid | Test cases with and without using association rules | March 21st, 2010 to June 28th, 2013 |
| MATLAB statistical Toolbox [74] | Not mentioned | Parameterization | Actual and reported values | January, 2010 to September, 2014 |
| MATLAB and C [75] | Intel Xeon quad Core 3.33 GHz CPU and 24 GB RAM | Sparse coding | ARIMA, Holt-Winters | September, 2011 to August, 2013 |
Both components of the time series are then processed separately by the forecasting
model. A feature selection technique used in [76] is also used
in this paper. This model relies on the information acquired
by the mutual information method and selects the features
having a higher mutual information score. This method dis-
cards the irrelevant and redundant features by applying irrel-
evancy filter. A LASSO based algorithm is used in [58] for
the selection of input variables. In this work, instead of us-
ing all weather-related variables for solar power prediction,
only selected variables are used. In this way, the volume of
data to be processed by the forecasting algorithm is reduced
which minimizes its computational complexity. In this al-
gorithm, the loss function is tuned and used to compute the
importance of each variable. The variables with high impor-
tance are then selected as input variables. For the clustering
of the SD, the Xgboost algorithm is used in [60]. SD cluster-
ing is used, as the traditional data features can lead the model
to slow convergence and poor accuracy. The relationship of
input features to the output is also learned in this model. NN
is applied in time series prediction but because of complex
linear and non-linear properties of time series, chances of
getting trapped into local minima are high. To address this
problem, the EMD method is employed here. It identifies the
frequent trend in time series and separates the singular val-
ues. It reduces the extra computational efforts. In [66], the
data is generated using a data generation software developed
by china’s electric power research institute. The important
features form this dataset are selected using PCA.
Table 4 contains the information related to the data pre-
processing techniques used in literature along with the infor-
mation related to software and hardware which are used for
the implementation of the model. Moreover, the duration of
the dataset and comparative techniques are also mentioned
in the table.
6. Critical analysis
In this section, the analysis of the frequently used hyper-
parameter optimization methods for forecasting algorithms
used in the SG domain is presented. Hyperparameter tuning is very important for efficient forecasting. It trains
the model according to the dataset and tuning these parame-
ters improves the forecasting accuracy significantly. So, we
have discussed the tuning methods used by researchers in
recent years. These methods are compared in terms of their
performance in optimization. Moreover, the importance of
data preprocessing is also analyzed and it is highlighted how to select the preprocessing methods for efficient results.
Critical comment 1: From the literature review, it is
observed that, grid search [48]-[53], gradient descent [54]-
[60], cross validation [61]-[66] and NB [67,68] are frequently
used optimization methods. Grid search is a traditional way
of finding the optimal values of hyperparameters. It uses per-
formance metrics to move towards an optimal solution. Gra-
dient descent method outperforms grid search. It first com-
putes the gradients for the required hyperparameters then
tunes their values using gradient descent. It was designed
for the NN. The limitation of this algorithm is that it can be
trapped in local optima. Moreover, cross validation and its
variations, i.e., sliding window and k-fold cross validation
methods, are also commonly used for hyperparameter opti-
mization. However, NB outperforms both gradient descent
and cross validation methods. In this method, a probabilistic model is built to map the hyperparameter values to the objective function. The limitation of NB is that the choice
of covariance function for the practical problem is uncertain
and it also has hyperparameters that need proper tuning. The
aforementioned statistical methods are good to train the fore-
casting model for a small dataset but as the size of the dataset
increases, there is a high chance for these methods to suffer
from the curse of dimensionality problem. This makes them
unsuitable for the training of forecasting models using big
data.
Critical comment 2: The nature-inspired algorithms have
evolved as a promising solution for the hyperparameter op-
timization. Researchers have used nature-inspired optimiza-
tions methods to tune the hyperparameters in [30]-[46]. These
algorithms are suitable to find the optimal solution in a large
search space. GA is a frequently used nature-inspired algorithm which is based on nature's principle of survival. PSO is
also a frequently used population-based optimization algo-
rithm. It has better performance and is computationally faster than GA. However, from the literature review, it is observed
that CSA has better performance than PSO, although its execution time per iteration is higher. The highlighting feature of CSA is its fast convergence in fewer iterations, so its overall execution time to find an optimal solution becomes equal to that of PSO. The FA used
in [38] outperforms all these nature-inspired algorithms and
its highlighting feature is its ability to remember the best so-
lution. In our opinion, the nature-inspired algorithms give optimal solutions and are suitable for big data problems. The integration of these optimization methods with forecasting algorithms can improve the forecasting accuracy while requiring less time to optimize the hyperparameters.
Critical comment 3: From the literature, it is also ob-
served that the existing nature-inspired algorithms also have
limitations. Their performance is improved by adding addi-
tional steps in them. For example, the QOABC algorithm is
the variation of the ABC optimization algorithm. ABC op-
timization algorithm has fast convergence which sometimes
results in less efficient results. This limitation is overcome
by introducing QOABC and hyperparameters are tuned us-
ing this improved version and required results are obtained
[39]. Another example is NSSA, introduced in [40]. In this algorithm, Euclidean distance is added to get the best position using the neighbor and legacy best position information. From these studies, we conclude that there is still room
for improvement in nature-inspired algorithms. The existing algorithms can be further improved by hybridizing them, and the results of forecasting would then be more accurate.
Critical comment 4: From a brief literature review, it
is learned that not all data preprocessing steps are necessary. For example, in studies where the independent variables are selected by the researchers themselves, feature selection methods are not used for such datasets. Here, the data normalization and outlier removal steps must still be used. It is analyzed from the litera-
ture that a dataset contains many outliers that affect the train-
ing of forecasting models. The removal of outliers increases
the accuracy of prediction as their presence disturbs the nor-
mal patterns of data and the forecasting model learns wrong
information. Moreover, we also observed that the features
of the dataset contain values scaled over different intervals
and to make these features comparable, data normalization
methods are used. The existing data preprocessing models are working well; however, the use of Internet of Things technology is increasing both the volume and heterogeneity of data. So, data preprocessing methods should also be improved.
7. Future directions
The study of the existing literature depicts that the improvement of hyperparameter tuning is a continuous research area. Some future directions and challenges are identified from the available literature. In this section, we discuss some important future directions for the improvement of hyperparameter optimizers.
1. The SG is evolving with each passing day and new
actors are being integrated into it. So, it will require
new models and applications based on forecasting al-
gorithms e.g. ANN. These newly proposed models
will require tuning and their performance will be affected by their learning [77]. Efficient and competitive tuning methods will be needed for these models.
2. Forecasting using big data results in better and more accurate predictions, but it also increases the computational overhead. Tuning the hyperparameters of
forecasting algorithms for such models also becomes
computationally expensive as search space becomes
more complex. Thus more efficient optimization ap-
proaches are always desired. They can be achieved by
either proposing new optimization techniques which
require less configuration to reach an optimal solu-
tion, or a better model can be designed for the eval-
uator which enable it to evaluate the best solution in
less time.
3. In [78], a framework has been proposed, where pa-
rameters are identified by the evaluator and their suit-
able values are chosen using the optimization method.
This process is carried out iteratively. However, this method is not suitable for problems with large datasets. So, simultaneous updating of both the parameters and their configuration is needed in future models.
4. The performance of most of the optimization algo-
rithms is nearly the same. The efficiency of these al-
gorithms is very important as the computational re-
sources are expensive assets. So, an automatic machine learning method should be adopted to select the most suitable optimizer, one which considers both efficiency and performance and maintains a balance between them.
5. From the existing literature, it cannot be concluded which optimization method is best for optimizing the hyperparameters of forecasting algorithms. So, researchers need to explore several optimization algorithms rather than relying on a single algorithm to achieve the best performance of their forecasting algorithm.
6. Deep learning is a popular area of machine learning.
The DNN seems to have a promising future in fore-
casting with big data. These algorithms are, however, difficult to train as compared to shallow networks.
Despite their popularity and better performance, their
theoretical aspects need to be explored.
7. The flow of power over transmission lines is not con-
stant. It depends on the demand for electricity and
power supplied by utilities. So, the forecasting algo-
rithms should be analyzed from this perspective also.
8. Conclusion
This paper presents a brief and comprehensive survey
of optimization techniques used for the optimization of hy-
perparameters of the forecasting model in SG. From liter-
ature, it is observed that the grid search and cross valida-
tion techniques are commonly used methods but as the size
of the dataset increases, they require more computational
time. On the other hand, researchers have applied nature-
inspired heuristic optimization techniques to optimize these
parameters. These techniques work efficiently as compared
to legacy methods. A comparison of forecasting accuracy
also depicts that these algorithms work efficiently and give
better performance than grid search, gradient descent and
cross validation algorithms.
In this paper, the data preprocessing methods are also
discussed and it is concluded that data preprocessing is an in-
evitable step for forecasting. The feature selection step is im-
portant to reduce the number of input variables as it reduces
the size of data which reduces the computational overhead.
The feature extraction and filtering method removes the out-
liers and missing values from data which may lead to inac-
curate results and model under-fitting. The values of data
are normalized to make the variables comparable. Without
normalization, the dependencies and influence of variables
on each other and output may not be computed efficiently
which may result in the poorly trained forecasting model.
Abbreviation
ABCO Artificial bee colony optimization
ACO Ant colony optimization
AE Absolute error
ANFIS Adaptive network based fuzzy inference
system
ANN Artificial neural network
ANOVA Analysis of variance
AR Auto regression
ARFIMA Auto regressive fractionally integrated
moving average
ARIMA Autoregressive integrated moving average
ARMA Autoregressive moving average
ARMAHX Autoregressive moving average Hilbertian model with exogenous variables
ARMAX Autoregressive moving average exogenous
ARMSE Average root mean square error
ARX Autoregressive with exogenous input
BMA Bayesian model averaging
BOP Bayesian outage prediction
BP Back propagation
BPNN Back propagation neural network
CNN Convolutional neural network
CSA Cuckoo search algorithm
DAE Deep autoencoder
DAR Dynamic auto regression
DBN Deep belief networks
DBOR Distance based outlier rejection
DCANN Dynamic choice artificial neural network
DE Differential evolution
DICAST Dynamic integrated forecast system
DNN Deep neural network
DR Dynamic regression
DSARIMA Double seasonal auto regressive moving
average
DSHW Double seasonal holt winter
DT Decision tree
ESN Echo state network
EMD Empirical mode decomposition
ENN Elman neural network
ES Exponential smoothing
FA Firefly algorithm
FARX Full auto regression with exogenous input
FARX-EN Full auto regression with exogenous input
elastic net
FARX-LASSO Full auto regression with exogenous input
least absolute shrinkage and selection op-
erator
FM Factor model
FNN Fuzzy neural network
GA Genetic algorithm
GABPNN Genetic algorithm optimized back propa-
gation neural network
GARCH Generalized auto regressive conditional
heteroskedasticity
GMI Generalized mutual information
GRNN Generalized regression neural network
GRU Gated recurrent units
HRRR High resolution rapid refresh
IENN Improved elman neural network
HMARX Hsieh-Manski auto regressive exogenous
input
IBSM Index of bad sample matrix
KNN K-nearest neighbors
KPCA Kernel principal component analysis
LASSO Least absolute shrinkage and selection op-
erator
LSSVM Least square support vector machine
LSTM Long-short term memory
MAE Mean absolute error
MAPE Mean absolute percentage error
MFA Modified firefly algorithm
MG Micro grid
MIMO Multiple input multiple output
MLP Multi layer perceptron
MLR Multiple linear regression
MRMRPC Maximize the relevancy and minimize the redundancy based Pearson's correlation coefficients
MSE Mean square error
NB Naive Bayes
NHPP Non-homogeneous Poisson process
NMAE Normalized mean absolute error
NMAPE Normalized mean absolute percentage er-
ror
NN Neural network
NRMSE Normalized root mean square error
NSSA Novel shark search algorithm
NWP Numerical weather prediction
PCA Principal component analysis
PCA-WFCM Principal component analysis based weighted fuzzy C-Mean
PSO Particle swarm optimization
PV Photovoltaic
QOABCO Quasi oppositional artificial bee colony optimization
RBF-FCM Radial basis function fuzzy C-Mean
RBFNN Radial basis function neural network
RES Renewable energy source
RF Random forest
RMSE Root mean square error
RNN Recurrent neural network
RRR Reduced rank regression models
RS-SDA Random sample stacked denoising autoen-
coder
RVM Relevant vector machines
SD Similar days
SDA Stacked denoising autoencoder
SG Smart grid
sMAPE Symmetric mean absolute percentage error
SNARX Smoothed nonparametric auto regressive
with exogenous inputs
SOM Self-organization map
SRWNN Self-recurrent wavelet neural network
SSA Shark search algorithm
SVD Singular value decomposition
SVR Support vector regression
SVR-HBMO Support vector regression honey bee mat-
ing optimization
SWEMD Sliding window empirical mode decompo-
sition
TARX Threshold auto regression with exogenous
inputs
TBATS Trigonometric seasonality, Box-Cox transformation, autoregressive moving average errors, trend and seasonal components
TF Transfer function
VAR Vector auto regression
WARIMA Wavelet auto regressive integrated moving
average
WNN Wavelet neural network
WPT Wavelet packet transform
WRF Weather research and forecasting
WT Wavelet theory
XGB Extreme gradient boosting
References
[1] Munshi, Amr A., and A-RI Mohamed Yasser. "Big data framework
for analytics in smart grids." Electric Power Systems Research 151
(2017): 369-380.
[2] Yusof, Yuhanis, and Zuriani Mustaffa. "A review on optimization of
least squares support vector machine for time series forecasting." In-
ternational Journal of Artificial Intelligence & Applications (IJAIA),
Vol. 7, No. 2, March 2016. 35-49.
[3] Zhang, Le, and Ponnuthurai N. Suganthan. "A survey of randomized
algorithms for training neural networks." Information Sciences 364
(2016): 146-155.
[4] Han, Fei, Jing Jiang, Qing-Hua Ling, and Ben-Yue Su. "A survey
on metaheuristic optimization for random single-hidden layer feed-
forward neural network." Neurocomputing 335 (2019): 261-273.
[5] Afshin, Mohammdreza, Alireza Sadeghian, and Kaamran Raahemi-
far. "On efficient tuning of ls-svm hyper-parameters in short-term load
forecasting: A comparative study." In 2007 IEEE Power Engineering
Society General Meeting, pp. 1-6. IEEE, 2007.
[6] Jiang, Mingfeng, Shanshan Jiang, Lingyan Zhu, Yaming Wang, Wen-
qing Huang, and Heng Zhang. "Study on parameter optimization for
support vector regression in solving the inverse ECG problem." Com-
putational and mathematical methods in medicine 2013 (2013).
[7] Elsken, Thomas, Jan Hendrik Metzen, and Frank Hutter. "Neural
architecture search: A survey." arXiv preprint arXiv:1808.05377
(2018).
[8] Tu, Huy, and Vivek Nair. "Is one hyperparameter optimizer enough?."
In Proceedings of the 4th ACM SIGSOFT International Workshop on
Software Analytics, pp. 19-25. ACM, 2018.
[9] Bergstra, James S., Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. "Algorithms for hyper-parameter optimization." In Advances in neural information processing systems, pp. 2546-2554. 2011.
[10] Darwish, Ashraf, Aboul Ella Hassanien, and Swagatam Das. "A sur-
vey of swarm and evolutionary computing approaches for deep learn-
ing." Artificial Intelligence Review 53, no. 3 (2020): 1767-1812.
[11] Karaboga, Dervis, and Ebubekir Kaya. "Adaptive network based
fuzzy inference system (ANFIS) training approaches: a comprehen-
sive survey." Artificial Intelligence Review 52, no. 4 (2019): 2263-
2293.
[12] Masdari, Mohammad, and Afsane Khoshnevis. "A survey and clas-
sification of the workload forecasting methods in cloud computing."
Cluster Computing (2019): 1-26.
[13] Hossain, Eklas, Imtiaj Khan, Fuad Un-Noor, Sarder Shazali Sikander,
and Md Samiul Haque Sunny. "Application of big data and machine
learning in smart grid, and associated security concerns: A review."
IEEE Access 7 (2019): 13960-13988.
[14] Bhattarai, Bishnu P., Sumit Paudyal, Yusheng Luo, Manish Mohan-
purkar, Kwok Cheung, Reinaldo Tonkoski, Rob Hovsapian et al. "Big
data analytics in smart grids: state-of-the-art, challenges, opportuni-
ties, and future directions." IET Smart Grid 2, no. 2 (2019): 141-154.
[15] Ahmed, Adil, and Muhammad Khalid. "A review on the selected ap-
plications of forecasting models in renewable power systems." Re-
newable and Sustainable Energy Reviews 100 (2019): 9-21.
[16] Tsoumakas, Grigorios. "A survey of machine learning techniques for
food sales prediction." Artificial Intelligence Review 52, no. 1 (2019):
441-447.
[17] Radha, R., and S. Muralidhara. "Removal of redundant and irrelevant
data from training datasets using speedy feature selection method."
Int'l J Comp. Sci. and Mob. Comput 5, no. 7 (2016): 359-364.
[18] Singla, Manisha, and K. K. Shukla. "Robust statistics-based support
vector machine and its variants: a survey." Neural Computing and
Applications (2019): 1-22.
[19] Shi, Fuxi, Jun Chen, Yang Xu, and Hamid Reza Karimi. "Optimiza-
tion of biodiesel injection parameters based on support vector ma-
chine." Mathematical Problems in Engineering 2013 (2013). Doi:
10.1155/2013/893084.
[20] Martínez-Álvarez, Francisco, Alicia Troncoso, Gualberto Asencio-Cortés, and José C. Riquelme. "A survey on data mining techniques applied to electricity-related time series forecasting." Energies 8, no. 11 (2015): 13162-13193.
[21] Agarwal, Atul. "Introduction to Artificial Neural Networks." Medium, December 11, 2019. URL: https://towardsdatascience.com/introduction-to-artificial-neural-networks-5036081137bb. [Last accessed: April 4, 2020].
[22] Sakunthala, S., R. Kiranmayi, and P. Nagaraju Mandadi. "A review on
artificial intelligence techniques in electrical drives: Neural networks,
fuzzy logic, and genetic algorithm." In 2017 International Conference
On Smart Technologies For Smart Nation (SmartTechCon), pp. 11-
16. IEEE, 2017.
[23] Soni, Devin. "Introduction to Bayesian Networks." Medium. Towards Data Science, June 8, 2018. URL: https://towardsdatascience.com/introduction-to-bayesian-networks-81031eeed94e. [Last accessed: April 4, 2020].
[24] Patel, Jatin, Nikita D. Patel and Nikita S. Patel. "A Research on Expert System using Decision Tree and Naive Bays Classifier." (2015). DOI: 10.1186/s40537-019-0175-6
[25] Rushikesh Pupale. "Support Vector Machines (SVM): An Overview." Medium. Towards Data Science, June 16, 2018. URL: https://towardsdatascience.com/https-medium-com-pupalerushikesh-svm-f4b42800e989. [Last accessed: April 4, 2020].
[26] Valentina Alto. "Neural Networks: Parameters, Hyperparameters and Optimization Strategies." Medium. Towards Data Science, July 5, 2019. URL: https://towardsdatascience.com/neural-networks-parameters-hyperparameters-and-optimization-strategies-3f0842fac0a5. [Last accessed: April 4, 2020].
[27] Sanchez, Felipe. "The Hyperparameter Tuning Problem in Bayesian Networks." Medium, February 14, 2020. URL: https://towardsdatascience.com/the-hyperparameter-tuning-problem-in-bayesian-networks-1371590f470. [Last accessed: April 4, 2020].
[28] Hutter, Frank, Jörg Lücke, and Lars Schmidt-Thieme. "Beyond manual tuning of hyperparameters." KI-Künstliche Intelligenz 29, no. 4 (2015): 329-337.
[29] Aslam, Sheraz, Adia Khalid, and Nadeem Javaid. "Towards effi-
cient energy management in smart grids considering microgrids with
day-ahead energy forecasting." Electric Power Systems Research 182
(2020): 106232.
[30] Wang, Kun, Chenhan Xu, Yan Zhang, Song Guo, and Albert Zomaya.
"Robust big data analytics for electricity price forecasting in the smart
grid." IEEE Transactions on Big Data (2017).
[31] Zhou, Zhenyu, Fei Xiong, Biyao Huang, Chen Xu, Runhai Jiao, Bin
Liao, Zhongdong Yin, and Jianqi Li. "Game-Theoretical Energy Man-
agement for Energy Internet With Big Data-Based Renewable Power
Forecasting." IEEE Access 5 (2017): 5731-5746.
[32] Bianchi, Filippo Maria, Enrico De Santis, Antonello Rizzi, and
Alireza Sadeghian. "Short-term electric load forecasting using echo
state networks and PCA decomposition." IEEE Access 3 (2015):
1931-1943.
[33] Eseye, Abinet Tesfaye, Jianhua Zhang, Dehua Zheng, Hui Ma, and
Gan Jingfu. "Short-term wind power forecasting using a double-
stage hierarchical hybrid GA-ANN approach." In Big Data Analysis
(ICBDA), 2017 IEEE 2nd International Conference on, pp. 552-556.
IEEE, 2017.
[34] Alamaniotis, Miltiadis, Dimitrios Bargiotas, Nikolaos G. Bourbakis,
and Lefteri H. Tsoukalas. "Genetic optimal regression of relevance
vector machines for electricity pricing signal forecasting in smart
grids." IEEE Transactions on Smart Grid 6, no. 6 (2015): 2997-3005.
[35] Xiao, Liye, Jianzhou Wang, Ru Hou, and Jie Wu. "A combined model
based on data pre-analysis and weight coefficients optimization for
electrical load forecasting." Energy 82 (2015): 524-549.
[36] Naz, Aqdas, Nadeem Javaid, Muhammad Babar Rasheed, Abdul
Haseeb, Musaed Alhussein, and Khursheed Aurangzeb. "Game The-
oretical Energy Management with Storage Capacity Optimization and
Photo-Voltaic Cell Generated Power Forecasting in Micro Grid." Sus-
tainability 11, no. 10 (2019): 2763.
[37] Wang, Jianzhou, Feng Liu, Yiliao Song, and Jing Zhao. "A novel
model: Dynamic choice artificial neural network (DCANN) for an
electricity price forecasting system." Applied Soft Computing 48
(2016): 281-297.
[38] Kavousi-Fard, Abdollah, Haidar Samet, and Fatemeh Marzbani. "A
new hybrid modified firefly algorithm and support vector regression
model for accurate short-term load forecasting." Expert systems with
applications 41, no. 13 (2014): 6047-6056.
[39] Shayeghi, Hossein, Ali Ghasemi, Mohammad Moradzadeh, and M.
Nooshyar. "Simultaneous day-ahead forecasting of electricity price
and load in smart grids." Energy Conversion and Management 95
(2015): 371-384.
[40] Liu, Yang, Wei Wang, and Noradin Ghadimi. "Electricity load fore-
casting by an improved forecast engine for building level consumers."
Energy 139 (2017): 18-30.
[41] Raza, Muhammad Qamar, Mithulananthan Nadarajah, and Chandima
Ekanayake. "Demand forecast of PV integrated bioclimatic buildings
using ensemble framework." Applied Energy 208 (2017): 1626-1638.
[42] Vrablecova, Petra, Anna Bou Ezzeddine, Viera Rozinajová, Slavomír Šárik, and Arun Kumar Sangaiah. "Smart grid load forecasting using online support vector regression." Computers & Electrical Engineering (2017).
[43] Chou, Jui-Sheng, and Ngoc-Tri Ngo. "Smart grid data analytics framework for increasing energy savings in residential buildings." Automation in Construction 72 (2016): 247-257.
[44] Khalid, Rabiya, Nadeem Javaid, Fahad A. Al-zahrani, Khursheed Aurangzeb, Emad-ul-Haq Qazi, and Tehreem Ashfaq. "Electricity Load and Price Forecasting Using Jaya-Long Short Term Memory (JLSTM) in Smart Grids." Entropy 22, no. 1 (2020): 10.
[45] Singh, Nitin, Soumya Ranjan Mohanty, and Rishabh Dev Shukla.
"Short term electricity price forecast based on environmentally
adapted generalized neuron." Energy 125 (2017): 127-139.
[46] Bassamzadeh, Nastaran, and Roger Ghanem. "Multiscale stochastic prediction of electricity demand in smart grids using Bayesian networks." Applied Energy 193 (2017): 369-380.
[47] Raviv, Eran, Kees E. Bouwman, and Dick van Dijk. "Forecasting day-
ahead electricity prices: Utilizing hourly prices." Energy Economics
50 (2015): 227-239.
[48] Dong, Xishuang, Lijun Qian, and Lei Huang. "Short-term load fore-
casting in smart grid: A combined CNN and K-means clustering ap-
proach." In Big Data and Smart Computing (BigComp), 2017 IEEE
International Conference on, pp. 119-125. IEEE, 2017.
[49] Xiao, Fu, Shengwei Wang, and Cheng Fan. "Mining Big Building Op-
erational Data for Building Cooling Load Prediction and Energy Ef-
ficiency Improvement." In Smart Computing (SMARTCOMP), 2017
IEEE International Conference on, pp. 1-3. IEEE, 2017.
[50] Massana i Raurich, Joaquim, Carles Pous i Sabadí, Llorenç Burgas Nadal, Joaquim Meléndez i Frigola, and Joan Colomer Llinàs. "Short-term load forecasting in a non-residential building contrasting models and attributes." Energy and Buildings 92 (2015): 322-330.
[51] Zahid, Maheen, Fahad Ahmed, Nadeem Javaid, Raza Abid Abbasi, Hafiza Syeda Zainab Kazmi, Atia Javaid, Muhammad Bilal, Mariam Akbar, and Manzoor Ilahi. "Electricity price and load forecasting using enhanced convolutional neural network and enhanced support vector regression in smart grids." Electronics 8, no. 2 (2019): 122.
[52] Garulli, Andrea, Simone Paoletti, and Antonio Vicino. "Models and techniques for electric load forecasting in the presence of demand response." IEEE Transactions on Control Systems Technology 23, no. 3 (2015): 1087-1097.
[53] Keles, Dogan, Jonathan Scelle, Florentina Paraschiv, and Wolf Fichtner. "Extended forecast methods for day-ahead electricity spot prices applying artificial neural networks." Applied Energy 162 (2016): 218-230.
[54] Amarasinghe, Kasun, Daniel L. Marino, and Milos Manic. "Deep
neural networks for energy load forecasting." In Industrial Electronics
(ISIE), 2017 IEEE 26th International Symposium on, pp. 1483-1488.
IEEE, 2017.
[55] Lago, Jesus, Fjo De Ridder, and Bart De Schutter. "Forecasting spot
electricity prices: deep learning approaches and empirical compari-
son of traditional algorithms." Applied Energy 221 (2018): 386-405.
[56] Wang, Long, Zijun Zhang, and Jieqiu Chen. "Short-Term Electric-
ity Price Forecasting with Stacked Denoising Autoencoders." IEEE
Transactions on Power Systems 32, no. 4 (2017): 2673-2681.
[57] Lu, Yun, Tiankui Zhang, Zhimin Zeng, and Jonathan Loo. "An im-
proved RBF neural network for short-term load forecast in smart
grids." In Communication Systems (ICCS), 2016 IEEE International
Conference on, pp. 1-6. IEEE, 2016.
[58] Tang, Ningkai, Shiwen Mao, Yu Wang, and R. M. Nelms. "So-
lar Power Generation Forecasting with a LASSO-based Approach."
IEEE Internet of Things Journal (2018).
[59] Sharma, Navin, Pranshu Sharma, David Irwin, and Prashant Shenoy.
"Predicting solar generation from weather forecasts using machine
learning." In Smart Grid Communications (SmartGridComm), 2011
IEEE International Conference on, pp. 528-533. IEEE, 2011.
[60] Zheng, Huiting, Jiabin Yuan, and Long Chen. "Short-term load fore-
casting using EMD-LSTM neural networks with a Xgboost algorithm
for feature importance evaluation." Energies 10, no. 8 (2017): 1168.
[61] Grolinger, Katarina, Alexandra L'Heureux, Miriam AM Capretz, and Luke Seewald. "Energy forecasting for event venues: big data and prediction accuracy." Energy and Buildings 112 (2016): 222-233.
[62] Saez-Gallego, Javier, Juan M. Morales, Marco Zugno, and Hen-
rik Madsen. "A data-driven bidding model for a cluster of price-
responsive consumers of electricity." IEEE Transactions on Power
Systems 31, no. 6 (2016): 5001-5011.
[63] Moon, Jihoon, Kyu-Hyung Kim, Yongsung Kim, and Eenjun Hwang.
"A Short-Term Electric Load Forecasting Scheme Using 2-Stage Pre-
dictive Analytics." In Big Data and Smart Computing (BigComp),
2018 IEEE International Conference on, pp. 219-226. IEEE, 2018.
[64] Moghaddass, Ramin, and Jianhui Wang. "A hierarchical framework
for smart grid anomaly detection using large-scale smart meter data."
IEEE Transactions on Smart Grid (2017).
[65] Gupta, Sudha, Ruta Kambli, Sushama Wagh, and Faruk Kazi.
"Support-vector-machine-based proactive cascade prediction in smart
grid using probabilistic framework." IEEE Transactions on Industrial
Electronics 62, no. 4 (2015): 2478-2486.
[66] Zhao, Bingbing, Junwei Cao, Ziyu Zhu, and Huaying Zhang. "A new
transient voltage stability prediction model using big data analysis." In
Innovative Smart Grid Technologies-Asia (ISGT-Asia), 2016 IEEE,
pp. 1065-1069. IEEE, 2016.
[67] Saleh, Ahmed I., Asmaa H. Rabie, and Khaled M. Abo-Al-Ez. "A
data mining based load forecasting strategy for smart electrical grids."
Advanced Engineering Informatics 30, no. 3 (2016): 422-448.
[68] Lago, Jesus, Fjo De Ridder, Peter Vrancx, and Bart De Schutter.
"Forecasting day-ahead electricity prices in Europe: the importance
of considering market integration." Applied Energy 211 (2018): 890-
903.
[69] Sulaiman, S. M., P. Aruna Jeyanthy, and D. Devaraj. "Big data analyt-
ics of smart meter data using Adaptive Neuro Fuzzy Inference System
(ANFIS)." In Emerging Technological Trends (ICETT), International
Conference on, pp. 1-5. IEEE, 2016.
[70] González, José Portela, Antonio Muñoz San Roque, and Estrella Alonso Pérez. "Forecasting functional time series with a new Hilbertian ARMAX model: Application to electricity price forecasting." IEEE Transactions on Power Systems 33, no. 1 (2018): 545-556.
[71] Chitsaz, Hamed, Hamid Shaker, Hamidreza Zareipour, David Wood,
and Nima Amjady. "Short-term electricity load forecasting of buildings in microgrids." Energy and Buildings 99 (2015): 50-60.
[72] Hagan, Martin T., and Mohammad B. Menhaj. "Training feedforward networks with the Marquardt algorithm." IEEE Transactions on Neural Networks 5, no. 6 (1994): 989-993.
[73] Sheng, Gehao, Huijuan Hou, Xiuchen Jiang, and Yufeng Chen. "A
novel association rule mining method of big data for power trans-
formers state parameters based on probabilistic graph model." IEEE
Transactions on Smart Grid 9, no. 2 (2018): 695-702.
[74] Yue, Meng, Tami Toto, Michael P. Jensen, Scott E. Giangrande,
and Robert Lofaro. "A Bayesian Approach Based Outage Prediction
in Electric Utility Systems Using Radar Measurement Data." IEEE
Transactions on Smart Grid (2017).
[75] Yu, Chun-Nam, Piotr Mirowski, and Tin Kam Ho. "A sparse coding
approach to household electricity demand forecasting in smart grids."
IEEE Transactions on Smart Grid 8, no. 2 (2017): 738-748.
[76] Tascikaraoglu, A., and M. Uzunoglu. "A review of combined ap-
proaches for prediction of short-term wind speed and power." Renew-
able and Sustainable Energy Reviews 34 (2014): 243-254.
[77] Hernandez, Luis, Carlos Baladron, Javier M. Aguiar, Belén Carro, Antonio J. Sanchez-Esguevillas, Jaime Lloret, and Joaquim Massana. "A survey on electric power demand forecasting: future trends in smart grids, microgrids and smart buildings." IEEE Communications Surveys & Tutorials 16, no. 3 (2014): 1460-1495.
[78] Yao, Quanming, Mengshuo Wang, Hugo Jair Escalante, Isabelle Guyon, Yi-Qi Hu, Yu-Feng Li, Wei-Wei Tu, Qiang Yang, and Yang Yu. "Taking human out of learning applications: A survey on automated machine learning." arXiv preprint arXiv:1810.13306 (2018).
Rabiya Khalid received the MCS degree from Mir-
pur University of Science and Technology, Mir-
pur (Azad Kashmir), Pakistan, in 2014, and the
M.S. degree in computer science with a special-
ization in energy management in smart grid from
the Communications Over Sensors (ComSens) Re-
search Laboratory, COMSATS University Islam-
abad, Islamabad, Pakistan in 2017 under the su-
pervision of Dr. Nadeem Javaid. She has authored
more than 20 research publications in international
journals and conferences. Her research interests in-
clude data science and blockchain in smart/micro
grids. Currently, she is working as a research associate and pursuing a Ph.D. in the same laboratory under the same supervision.
Nadeem Javaid received the bachelor's degree in computer science from Gomal University, Dera Ismail Khan, Pakistan, in 1995, the master's degree in
electronics from Quaid-i-Azam University, Islam-
abad, Pakistan, in 1999, and the Ph.D. degree from
the University of Paris-Est, France, in 2010. He
is currently an Associate Professor and the Found-
ing Director of the Communications Over Sensors
(ComSens) Research Laboratory, Department of
Computer Science, COMSATS University Islam-
abad, Islamabad. He has supervised 120 master
and 16 Ph.D. theses. He has authored over 900 ar-
ticles in technical journals and international con-
ferences. His research interests include energy op-
timization in smart/micro grids, wireless sensor
networks, big data analytics in smart grids, and
blockchain in WSNs, smart grids, etc. He was a recipient of the Best University Teacher Award from the Higher Education Commission of Pakistan, in 2016, and the Research Productivity Award from the Pakistan Council for Science and Technology, in 2017. He is also an Associate Editor of IEEE Access, an Editor of the International Journal of Space-Based and Situated Computing, and an Editor of Sustainable Cities and Society.