
A Survey on Hyperparameters Optimization Algorithms of Forecasting Models in Smart Grid

Rabiya Khalid^a, Nadeem Javaid^a,*

^a Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan

ARTICLE INFO

Keywords: Forecasting, Hyperparameters, Parameter Tuning, Data Preprocessing, Training Algorithms, Outliers in Data, Processing Time

ABSTRACT

Forecasting in the smart grid (SG) plays a vital role in maintaining the balance between demand and supply of electricity, efficient energy management, better planning of energy generation units and renewable energy sources, and their dispatching and scheduling. Existing forecasting models are being used, and new models are being developed, for a wide range of SG applications. These algorithms have hyperparameters which need to be optimized carefully before forecasting. Optimized values of these hyperparameters increase the forecasting accuracy up to a significant level. In this paper, we present a brief literature review of forecasting models and the optimization methods used to tune their hyperparameters. In addition, we discuss the data preprocessing methods. A comparative analysis of these forecasting models, according to their hyperparameter optimization, error methods and preprocessing methods, is also presented. Besides, we critically analyze the existing optimization and data preprocessing models and highlight the important findings. A survey of existing survey papers is also presented and their recency score is computed based on the number of recent papers reviewed in them. By recent, we mean papers published in the year in which a survey paper is published or in its previous three years. Finally, future research directions are discussed in detail.

1. Introduction

The advancement in technology and the increased usage of smart devices have brought about the concept of big data. The production of data is growing rapidly and, according to [1], the volume of big data will increase by a factor of 300 in the upcoming years. The cost of data storage has also been reduced, which is paving the way for storing more data and using it in the future. Big data has become a new focus for researchers and plays a very important role in the engineering, science and technology domains in acquiring efficient solutions to problems. In SG, a huge volume of data is being gathered and stored from sensors, smart meters and other smart devices, which can be used for efficient planning and forecasting. So, forecasting using big data has become a hot topic in this domain. The rapid increase in electricity consumption, the intermittent nature of renewable energy sources (RESs) and fluctuations in demand are serious issues of a power system. The core aims of SG are to achieve the balance between demand and supply of electricity, increase the grid's reliability and efficiency and make the grid environmentally friendly. Forecasting, in this regard, plays a very important role. It enables the utility to plan and organize future decisions related to power generation, electricity price and the coordination of electricity generating units, and to get maximum benefit out of them. Electricity load and price forecasting have gained great attention from researchers in this area, as these two factors have a great influence on maintaining the stability of the grid. Additionally, forecasting of power failures, the stability of transformers and the power network, anomalies, blackouts and the energy generation of RESs are also studied and their solutions are provided in the literature.

* Principal corresponding author
nadeemjavaidqau@gmail.com (N. Javaid)
http://www.njavaid.com (N. Javaid)
ORCID(s): 0000-0003-3777-8249 (N. Javaid)

The prediction of future values of SG components has great importance, as these values serve as an input to current decisions [2]. For example, knowledge of future load demand can play a very important role in setting electricity price values. It is also important for the utility when making different energy-related policies, and several decisions related to power are based on information about future load. Similarly, a consumer can use the forecasted values of electricity prices and change their load consumption pattern accordingly. In recent years, researchers have proposed a large number of forecasting models for accurate load and price prediction. They can be used in the maintenance of power networks, better scheduling of energy generators, continuous energy provision to consumers, achieving stability in the demand and supply of electricity and maintaining grid reliability. Moreover, effective planning and decision making can save millions of dollars, which is very important for the economic growth of a company as well as a country.

In the literature, researchers have proposed several forecasting methods to predict the load and price of electricity. Figure 1 shows the classification of forecasting algorithms. Support vector machines (SVM) and neural networks (NNs) are the forecasting algorithms most commonly used by researchers to make predictions in the SG area. Different variants of these algorithms are also available and used. Bayesian networks (BN) are also among the frequently used forecasting methods. In addition to these three types, several other forecasting techniques are also implemented for prediction in SG. In forecasting algorithms, the accuracy is greatly affected by the hyperparameters, so the values of these parameters should be chosen carefully.

In this paper, we present a survey of algorithms used for the optimization of hyperparameters in SG. Their values vary from problem to problem and need to be optimized accurately for correct prediction. Inefficient optimization of these parameters results in poor accuracy, and a model that could otherwise be the best choice for forecasting does not perform well. To overcome this issue, different optimization methods are applied to optimize these parameters.

Rabiya Khalid and Nadeem Javaid: Preprint submitted to Elsevier Page 1 of 35

Figure 1: Types of forecasting algorithms and optimization techniques. Forecasting in SG is performed with ANN, SVM, BN and other algorithms, whose weights/hyperparameters are optimized with nature-inspired techniques [3]-[17], grid search [18]-[23], gradient descent [24]-[30], cross validation [31]-[36] and other statistical methods [37]-[45].

Figure 1 shows the optimization methods commonly used for parameter optimization of forecasting models. Here, grid search, gradient descent and cross validation are the most frequently used methods. Nature-inspired methods are also proposed by researchers to efficiently optimize these parameters. In the existing literature [2,3], [4]-[6], similar surveys

are presented. However, [2] and [5] surveyed the hyperparameter tuning methods of the least square support vector machine (LSSVM) only. Similarly, in [3], randomized algorithms are surveyed for tuning NNs, [4] presents a survey of metaheuristic algorithms to train the random single-hidden layer feedforward neural network (RSLFN), and [6] contains a survey of heuristic algorithms for tuning SVM. In our survey, we review both nature-inspired and statistical methods to tune the hyperparameters of SVMs, NNs, BNs and their variants. The contributions of our survey are as follows:

• A detailed review of forecasting models (from 2014 onwards) and the optimization methods used to tune the hyperparameters of these models is presented.

• Data preprocessing methods used in these studies are also discussed.

• All the forecasting models are critically analyzed and future research directions are also presented.

• In the related work section, a survey of similar survey papers is presented and their recency score is also computed.

The rest of the paper is organized as follows: Section 2 contains similar work and, in Section 3, we discuss the proposed framework of a forecasting model that contains all the necessary steps for data forecasting, from data gathering to the final output phase. Section 4 contains a detailed discussion on forecasting techniques and the optimization methods used to optimize the values of their hyperparameters. In Section 5, common data preprocessing methods are discussed. Section 6 contains the critical analysis and findings of this survey. In Section 7, future directions related to hyperparameter optimizers are discussed. Finally, the paper is concluded in Section 8.

2. Related work

Hyperparameter optimization is considered very important for the forecasting accuracy of algorithms. Researchers are using the already existing optimizers and are also proposing new optimization algorithms for their tuning. Improvement of forecasting accuracy is an ongoing research area. So, to summarize this research and provide compact information to the readers, survey papers are published. An overview of the literature most related to our work is given below.

2.1. Survey of surveys (similar work)

A survey of hyperparameter tuning of LSSVM is presented in [2]. The performance of cross validation, evolutionary algorithms and swarm intelligence optimization methods is compared. The values of the regularization and kernel parameters are optimized using these methods. In this study, it is concluded that evolutionary algorithms are the best choice for tuning these hyperparameters.

In [3], a survey of randomized algorithms to train NNs is presented. It is stated that these algorithms can enhance both the performance and efficiency of NNs. The authors discuss the use of randomization in kernel methods and classify the models into several parts based on network configuration. The future challenges and directions in this area are also discussed in this article.

Han et al. provided a survey on the hyperparameter tuning of RSLFN in [4]. In this forecasting model, the initial values of the hyperparameters are chosen randomly and optimized iteratively. The authors state that the forecasting accuracy of this model depends on the accurate selection of the number of hidden neurons and other hyperparameters. For a careful selection, several optimization methods are used, metaheuristic optimization methods being among the most common. The authors propose a comprehensive survey of hyperparameter tuning of RSLFN using metaheuristic optimization methods. Future research directions and possible challenges are also discussed in detail.

Afshin et al. presented a comparative study of hyperparameter tuning methods for LSSVM in [5]. The authors state that accurate selection of these parameters is essential to achieve high accuracy in short-term forecasting. Four parameter optimization methods are compared: genetic algorithm (GA), cross validation, simulated annealing and the Bayesian evidence framework. From the comparison of these optimization methods, it is concluded that the Bayesian framework achieves the highest accuracy and is also the fastest.

In [6], the authors survey nature-inspired algorithms for tuning the hyperparameters of the support vector regressor (SVR). This survey is specific to the regression problem of the inverse ECG. It is stated that, to obtain accurate results, it is important to tune the hyperparameters carefully. Three optimization algorithms are used: GA, particle swarm optimization (PSO) and differential evolution (DE). For a fair comparison, SVR is trained with each of these optimizers on the same dataset. The simulation results show that SVR gives the best results when its hyperparameters are tuned using PSO.

Elsken et al. [7] have explored the research field of automated neural architecture search methods. It is stated that manual search methods are error-prone and time-consuming, whereas automated methods are fast and have a minimal chance of error. Every phase of these methods is automatic and models are selected when their performance meets a predefined selection criterion. The hyperparameters of these methods are also selected automatically. Models belonging to this category are classified into three types based on their search space, search strategy and performance evaluation method.

In [8], different hyperparameter optimization methods are tested to check which of them is best. To evaluate the performance of these methods, and for a fair comparison, all of them are applied to the defect prediction problem. After the simulation experiments, it is concluded that no single hyperparameter tuning method can be declared best. Also, the results of some methods show that the default configurations produce the same accuracy as that achieved by parameter tuning. It is concluded that hyperparameter optimization can play a significant role in improving forecasting accuracy and that the optimization method should be selected carefully according to the nature of the prediction data and the forecasting model.

Bergstra et al. [9] have presented a comparison of hyperparameter tuning methods for artificial NNs (ANN) and deep belief networks (DBN). The performance of random search and sequential methods is compared for tuning these parameters. Performance-wise, random search produces good results for ANNs, but its performance in the case of DBNs is not up to the mark. On the other hand, the sequential methods give better accuracy with the DBN forecasting model; the accuracy of even the most complex DBNs is improved by using them. In this study, the dependency of several parameters on hyperparameters is also highlighted.

A survey on deep learning-based forecasting models is presented in [10]. In this work, the authors state that hyperparameter optimization is an important and time-consuming task, so optimization methods should be used. In this paper, only those forecasting algorithms are reviewed whose hyperparameters are optimized using swarm intelligence and evolutionary algorithms. The effect of these optimization algorithms on the prediction accuracy of deep learning-based forecasting methods is analyzed for big data applications. Additionally, commonly used deep learning methods are discussed along with their weaknesses and strengths. After a comprehensive discussion of forecasting models, the core findings of this survey are presented and the issues of deep learning-based methods that need improvement are highlighted. Moreover, future research directions in this domain are also discussed. The existing literature is critically analyzed in the "problems and challenges" section.

Karaboga et al. presented a survey of the adaptive network-based fuzzy inference system (ANFIS) and the optimization methods used to train its parameters in [11]. From the survey, it is observed that both derivative-based and non-derivative-based optimization algorithms are used for ANFIS training. The former include gradient descent, least-squares estimation, etc., while the latter include heuristic algorithms. It is observed that heuristic optimization methods perform better for model training. Hence, the focus of this survey is to present the performance of heuristic models and the introduction of new hybrid algorithms for ANFIS training. A brief discussion related to these models is presented and the work is concluded. However, critical analysis and future challenges are not included in this work.

In [12], a survey of forecasting models for workload prediction in cloud computing is presented. Workload prediction plays a very important role in the efficiency and reliability of a cloud system. It helps to improve the quality of services provided by a cloud, as the energy required by data centers and cloud resources can be estimated, the scalability of service providers can be increased, etc. In this survey, the challenges of workload prediction are discussed in detail and workloads are classified according to architectural requirements, computing model, resource requirements and other non-functional requirements. Moreover, a detailed survey of regression-based, classification-based and stochastic-based forecasting models is presented. Besides, future research directions and a critical analysis of the existing schemes are also provided.

Hossain et al. [13] presented a survey of big data and machine learning applications in SG. The applications of IoT are also discussed in detail, as this technology provides connectivity between several smart electric devices. The interaction of these devices generates data in a huge volume, which is known as big data. This data is used by researchers, and machine learning techniques are applied to extract meaningful information. Electricity load and price forecasting are two very important applications of this data and of machine learning methods. The study contains a comprehensive discussion of big data analytics and machine learning algorithms for forecasting. Moreover, cybersecurity is a big issue in IoT-integrated systems, as attackers target smart devices and data; this issue is also discussed in detail. In the end, the survey is concluded and, in the outcome section, some future research directions and critical comments are discussed.

In [14], a survey of big data analytics in SG is presented. The authors have identified the research gaps and barriers to big data implementation in this domain. A comprehensive review of the existing literature is presented and challenges are highlighted, together with a detailed discussion of future research directions for big data integration in SG. Applications of big data in the SG domain are also discussed in detail; some important ones are energy management, improved efficiency and reliability of SG, state estimation and cyber-physical systems. In addition, for utility companies, a deep insight is provided into how big data can be beneficial for developing new business models.

The effects of load forecasting on a microgrid are investigated in [15]. The focus of this study is to evaluate the role of power generation and consumption in a renewable resource-based microgrid. For this purpose, papers are selected from high-impact and well-cited journal publications. The forecasting models are analyzed on the basis of cost metrics, reserve size estimation, market benefits, improved reliability of the microgrid, etc. This study aims to provide a guideline to new researchers about the trends of optimal planning and operation of a microgrid.

Table 1
Criteria for calculating recency score

Percentage (%) | Weight | Recency score
0-10           | 0.1    | *
11-20          | 0.2    | *
21-30          | 0.3    | **
31-40          | 0.4    | **
41-50          | 0.5    | ***
51-60          | 0.6    | ***
61-70          | 0.7    | ****
71-80          | 0.8    | ****
81-90          | 0.9    | *****
91-100         | 1.0    | *****

In [16], a survey of forecasting algorithms for food sales prediction is presented. Future statistics related to food item sales help sellers avoid the problems of missing products and stocks of expired products. They can plan the purchasing of their food items according to the predicted values and reduce the monetary loss in their business while increasing profit. In this study, a comprehensive literature review of forecasting models for food sales is carried out. Additionally, the evaluation metrics of forecasting models are discussed. Finally, the paper is concluded with a discussion on the existing forecasting models and the opportunities in this domain. However, critical analysis and future research directions are missing in this paper.

2.2. Recency score and comparative analysis

The percentage of recency is obtained by multiplying the total number of recent papers by 100 and then dividing by the total number of papers. By recent, we mean papers from the year in which a survey is published and its previous 3 years. We propose an equation to compute the recency percentage of a paper, which is as follows:

\[
\text{Recency percentage} = \frac{\sum_{i=d-3}^{d} \text{Number of cited papers from year } i}{\text{Total number of papers cited in a survey}} \times 100 \tag{1}
\]

In the above equation, $d$ represents the year in which a survey paper is published. Table 1 shows the detail of how a paper is assigned a weight according to its percentage value, and stars are assigned according to that weight. If the weight assigned to a paper is 0.9 or 1.0, five stars are given; on the other hand, if the assigned weight is 0.1 or 0.2, only one star is given to that paper. These stars represent how many recent papers are cited in a survey.
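Equation 1 and the weight-to-star mapping of Table 1 can be sketched in a few lines; this is a minimal illustration (the function names and the citation-year list are ours, not from the paper):

```python
import math

def recency_percentage(cited_years, survey_year):
    """Equation 1: share of cited papers from the survey's year or the 3 years before it."""
    recent = sum(1 for y in cited_years if survey_year - 3 <= y <= survey_year)
    return recent / len(cited_years) * 100

def recency_stars(percentage):
    """Table 1: weight rises 0.1 per 10% band; one star per two consecutive bands."""
    weight = max(1, math.ceil(percentage / 10)) / 10   # 0-10% -> 0.1, ..., 91-100% -> 1.0
    return "*" * ((int(round(weight * 10)) + 1) // 2)  # 0.1-0.2 -> *, ..., 0.9-1.0 -> *****

years = [2019, 2018, 2014, 2017, 2012, 2019]  # hypothetical citation-year list
p = recency_percentage(years, survey_year=2019)
print(round(p, 1), recency_stars(p))  # -> 66.7 ****
```

With four of the six citations falling in 2016-2019, the survey would score roughly 67% and earn four stars, matching the 61-70 band of Table 1.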


Table 2
Comparative analysis of related work

Survey | Future challenges | Domain | Critical analysis | Recency score | Significance
A survey of hyperparameter tuning of LSSVM [2] | ✗ | Not domain specific | ✗ | * | Hyperparameter tuning of LSSVM
A survey on NN [3] | ✓ | Not domain specific | ✗ | ** | Hyperparameter tuning algorithms for NN
A survey on hyperparameter tuning of RSLFN [4] | ✓ | Not domain specific | ✗ | ** | Hyperparameter tuning of RSLFN using metaheuristic optimization algorithms
A comparative study for tuning of LSSVM [5] | ✗ | Not domain specific | ✗ | ** | Hyperparameter tuning of LSSVM
A comparative study for tuning SVR [6] | ✗ | Not domain specific | ✗ | ** | Hyperparameter tuning of SVR using heuristic algorithms
A survey of deep learning-based models [7] | ✓ | Not domain specific | ✗ | *** | Discussed deep learning, its dimensions and hyperparameter optimization
A survey of hyperparameter optimizers [8] | ✓ | Not domain specific | ✗ | *** | Compares performance of optimizers by training forecasting algorithms
A survey of hyperparameter optimizers [9] | ✗ | Image processing | ✗ | *** | Hyperparameter optimizers for image processing algorithms
A survey on hyperparameter optimization of deep learning-based forecasting models [10] | ✓ | Not domain specific | ✓ | *** | Hyperparameter optimization of deep NN using swarm intelligence and evolutionary algorithms
A survey on training methods of ANFIS [11] | ✗ | Not domain specific | ✗ | ** | Hyperparameter tuning of ANFIS by derivative-based and non-derivative-based algorithms
A survey and classification of workload forecasting methods [12] | ✓ | Cloud computing | ✓ | **** | Forecasting models for workload forecasting in cloud computing
A survey on applications of big data and machine learning [13] | ✓ | SG | ✓ | **** | Big data and machine learning applications in SG
A survey on big data analytics [14] | ✓ | SG | ✓ | **** | Big data analytics in SG
A survey on forecasting models in renewable power systems [15] | ✗ | SG | ✗ | **** | Selected applications of forecasting models for optimal integration of renewable energy
A survey on food sales predictions [16] | ✗ | Food sales | ✗ | * | Machine learning techniques for food sales prediction
We propose a survey on forecasting models in SG | ✓ | SG | ✓ | **** | Hyperparameter optimization of forecasting algorithms in SG


Figure 2: Framework of a forecasting model. Big data is generated in SG by smart meters, sensors, users and the electricity market. Data preprocessing comprises data prefiltering, feature selection/dimension reduction, normalization and splitting into training/testing data. The forecasting algorithm is then trained in a hyperparameter optimization loop (parameter selection, model training and error computation) until the stopping criteria are met; the model is retrained with the optimal parameters and the forecasted values are reverse-transformed to obtain the output value.

Table 2 shows the comparative analysis of the related work. The studies [2,5,6] survey the hyperparameter optimization methods for SVR and its variants. These studies do not include critical analysis and future research directions. The recency score of [2] is the lowest and the recency scores of the other two studies are also low. The studies [4,7,10] contain surveys of ANN and its variants, and [11] surveys optimization methods for ANFIS. The authors in [10] provided a comprehensive survey with critical analysis, applications and future research directions; however, its recency score is three stars. The studies [12], [9] and [16] survey forecasting algorithms in cloud computing, image processing and food sales, respectively. Of these, [16] has the lowest recency score and also does not provide all the necessary information. The studies [12]-[15] have the highest recency score. Of these, [13,14] are best, as both also include a detailed discussion of the important aspects and outcomes of their surveys.

3. Forecasting model

Forecasting can be defined as a process in which current and past data values are analyzed to predict future values. In SG, electricity price and demand forecasting are of great importance. With the advancement in technology and increased Internet of Things applications, a huge amount of data is gathered. This data is used to predict future energy demand and prices and to detect faults, electricity theft, etc. Researchers are actively working in this area and proposing efficient forecasting models with enhanced performance and accuracy, as discussed in Section 4. The collected data is given as input to a forecasting model, which predicts the future value as output.

In a forecasting model, three phases are very important, i.e., data gathering, data preprocessing and forecasting. Figure 2 shows all the necessary steps of an efficient forecasting model. For prediction, the availability of related data is very important. In SG, data is gathered from different sources. Smart meters are a big source of data related to the demand and supply of electricity, the energy consumption patterns of different users, user preferences, alterations in users' consumption patterns, etc. Sensors, on the other hand, keep on sensing power lines, the status of energy generation units and other important components of the power system. The data generated by these components is also stored and used in forecasting. Moreover, users and electricity markets are two further big data generating sources in SG.


3.1. Data preprocessing

The available data is found in raw form and needs preprocessing before usage. The first step of data preprocessing is to convert the data into tabular form and select only the relevant features. The inclusion of irrelevant features increases the data size and decreases the learning speed and accuracy of a forecasting algorithm [17]. After selecting the features, the data is filtered to eliminate outliers and delete records containing missing values, as both can reduce the accuracy of the forecasting model and lead to inaccurate predictions. In the next step, all attributes are normalized to a common interval, usually between 0 and 1; this step makes the values of different variables comparable. After applying these steps, the resultant dataset is split into training and test sets and passed to the forecasting phase. In Section 5, the data preprocessing techniques used in the literature are discussed in detail.
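The filtering, normalization and split steps above can be sketched as follows; this is a minimal illustration on a single attribute (the median-absolute-deviation outlier rule and the toy values are our assumptions, as the paper does not prescribe a specific filter):

```python
import statistics

def preprocess(values):
    """Filter missing values and outliers, min-max normalize, then split 80/20."""
    # Delete records with missing values.
    clean = [v for v in values if v is not None]
    # Drop outliers: farther than 5 median absolute deviations from the median
    # (an illustrative robust rule).
    med = statistics.median(clean)
    mad = statistics.median(abs(v - med) for v in clean)
    clean = [v for v in clean if mad == 0 or abs(v - med) <= 5 * mad]
    # Normalize to the common interval [0, 1].
    lo, hi = min(clean), max(clean)
    normalized = [(v - lo) / (hi - lo) for v in clean]
    # Split into training and test sets.
    cut = int(0.8 * len(normalized))
    return normalized[:cut], normalized[cut:]

train, test = preprocess([3.1, None, 2.8, 3.0, 250.0, 2.9, 3.3])  # 250.0 is an outlier
```

After the missing value and the outlier 250.0 are removed, the remaining five readings are scaled into [0, 1] and split four-to-one.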

3.2. Forecasting algorithms

After data preprocessing, the data is ready to train a forecasting algorithm; the preprocessed data is its input. In SG, SVM-, NN- and BN-based forecasting algorithms are used frequently.

3.2.1. SVM

SVM is an efficient and simple learning method used for both classification and regression problems. Initially, it was designed for two classes only; over time, variants for multiple classes and for regression problems were introduced. The variant of SVM used for regression is known as SVR. Both models possess the same qualities, except that the values of the target variable in SVR are real numbers. In SG, SVR is commonly used. The problem of training SVR using a dataset with $m$ points $\{x_i, y_i\}_{i=1}^{m}$ can be expressed as follows [18]:

\[
\min \left\{ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} (\chi_i + \chi_i^*) \right\} \tag{2}
\]

subject to the following constraints:

\[
\begin{aligned}
y_i - (w^t x_i + b) &\leqslant \epsilon + \chi_i, \\
(w^t x_i + b) - y_i &\leqslant \epsilon + \chi_i^*, \\
\chi_i, \chi_i^* &\geq 0.
\end{aligned} \tag{3}
\]

In Equation 2, $x_i$ and $y_i$ are the input samples and target values, respectively, $w$ is the weight vector, $b$ is the bias term of the hyperplane, $C$ represents the regularization parameter, and $\chi_i$ and $\chi_i^*$ are the slack variables measuring deviations beyond the error tolerance $\epsilon$. Figure 3 [19] shows the important steps followed to train the model. After initializing the input data, a suitable kernel is chosen and the hyperparameters are initialized. In the next step, the initial model is trained and fitted over the input data. After these steps, the error is computed. If the error is minimal, the model is ready for prediction; otherwise, the hyperparameters are tuned again.

Figure 3: Flow chart of SVM. Initialize the input, select a kernel and set the values of the hyperparameters; train the initial model, fit the input/output parameters and compute the fitting error; if the error is minimal, stop, otherwise tune the hyperparameters and repeat.
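The objective of Equations 2-3 can be illustrated with a deliberately small sketch: a one-dimensional linear SVR trained by sub-gradient descent on the primal problem. This is pedagogical only (a real SVM library solves the constrained problem with a QP solver); the toy data and all names are ours:

```python
def svr_loss(w, b, data, C, eps):
    """Primal objective of Eq. 2: 0.5*w^2 plus C times the epsilon-insensitive slacks of Eq. 3."""
    slack = sum(max(0.0, abs(y - (w * x + b)) - eps) for x, y in data)
    return 0.5 * w * w + C * slack

def train_linear_svr(data, C=10.0, eps=0.1, lr=0.01, steps=2000):
    """Sub-gradient descent on the one-dimensional primal problem."""
    w = b = 0.0
    for t in range(steps):
        step = lr / (1.0 + 0.05 * t)        # decaying step size for convergence
        gw, gb = w, 0.0                     # gradient of the 0.5*w^2 term
        for x, y in data:
            r = (w * x + b) - y
            if abs(r) > eps:                # point lies outside the epsilon tube
                s = 1.0 if r > 0 else -1.0  # sub-gradient of its slack term
                gw += C * s * x
                gb += C * s
        w -= step * gw
        b -= step * gb
    return w, b

data = [(0.0, 0.1), (1.0, 1.0), (2.0, 2.1), (3.0, 2.9)]  # toy data, roughly y = x
w, b = train_linear_svr(data)
```

The learned slope settles near 1, and the trained objective is far below its value at the initial point, mirroring the "compute error, tune, repeat" loop of Figure 3.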

3.2.2. ANN

ANNs are widely used forecasting algorithms because of their accuracy and learning speed. They are inspired by the human brain [20]. A human brain is made up of several interconnected neurons which pass information to each other. ANN follows the same procedure: an activation function is used to move information from one perceptron to another. It can be formulated mathematically as follows [21]:

\[
N_j = \sum_{i=1}^{d} (x_i w_{ij} + w_{j0}) \tag{4}
\]
\[
y_j = f(N_j) \tag{5}
\]
\[
N_k = \sum_{j=1}^{e} (y_j w_{kj} + w_{k0}) \tag{6}
\]
\[
z_k = f(N_k) \tag{7}
\]
\[
z_k = f\left( \sum_{j=1}^{e} \left( w_{kj} f\left( \sum_{i=1}^{d} (x_i w_{ij} + w_{j0}) \right) + w_{k0} \right) \right) \tag{8}
\]

In Equation 4, $d$ represents the total number of inputs and $x$ is the input value, which is multiplied with the weight $w$; $w_0$ represents the bias value. Equation 5 represents the activation function, on which the value of $y$ depends. In the next equation, $e$ is the number of perceptrons. Equation 7 represents the next activation function, which is used to compute the values of the output neurons $z$. Equation 8 is the elaborated form of the previous equations. Figure 4 [22] shows the important steps followed to train the model. After input initialization, three layers are created, i.e., the input, hidden and output layers. The model is trained with initial settings and the total error is initialized to zero. After this, the model is trained on the first pattern of data, and the error of each neuron is calculated and added to the total error. The forecasting model is trained using all patterns and the final error value is computed. If the total error is less than the defined target error, the final model is trained; otherwise, the model is trained again.

Figure 4: Flow chart of ANN. Initialize the inputs and create the three network layers; set the total error to zero and train on the first pattern of data; compute the error for each neuron and add it to the total error; repeat until the last pattern is trained; if the total error is below the target error, train the final network, otherwise repeat.
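Equations 4-8 describe a single-hidden-layer forward pass, which can be sketched directly. This is a minimal illustration with a sigmoid activation; the layer sizes and weights are arbitrary assumptions, not values from the paper:

```python
import math

def sigmoid(n):
    """Activation function f of Equations 5 and 7."""
    return 1.0 / (1.0 + math.exp(-n))

def forward(x, hidden, output):
    """Single-hidden-layer forward pass of Equations 4-8.
    hidden/output hold one (bias, weights) pair per neuron."""
    # Eq. 4-5: weighted input sum plus bias, then activation, per hidden neuron j.
    y = [sigmoid(b + sum(xi * w for xi, w in zip(x, ws))) for b, ws in hidden]
    # Eq. 6-7: the same computation over the hidden activations, per output neuron k.
    z = [sigmoid(b + sum(yj * w for yj, w in zip(y, ws))) for b, ws in output]
    return z

# Two inputs, two hidden neurons, one output neuron (arbitrary illustrative weights).
hidden = [(0.1, [0.5, -0.4]), (-0.2, [0.3, 0.8])]
output = [(0.0, [1.0, -1.0])]
z = forward([1.0, 2.0], hidden, output)
```

Composing the two list comprehensions reproduces the nested form of Equation 8: the output neuron applies $f$ to a weighted sum of hidden activations, each itself $f$ of a weighted input sum.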

3.2.3. BN

A BN is a directed acyclic graph. In this network, the

nodes are the variables and edges between these nodes are

the conditional dependencies. It is a strong candidate model

to compute the probabilities of all causes of an event. Using

this model, the real cause of an event can be determined. It

fulfills the local Markov property [23], which is as follows.

P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, X_2, \ldots, X_{i-1}) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))   (9)

Figure 5: Flow chart of BN

The above equation represents that $X_i$ is conditionally independent of its non-descendants and conditionally dependent on its parents only. Due to this property of a BN, the number of dependencies to be modeled is reduced, which reduces the overall computational effort. Figure 5 represents the flow chart of a BN. A feature is selected from the given input and its probability of belonging to a class is computed for all classes. After the probability computation, it is assigned a class label. The algorithm then checks whether the probabilities for all classes have been computed, checks for any remaining features, and then terminates.
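The factorization in Equation 9 can be illustrated with a minimal sketch; the three-node chain A → B → C and its probability tables below are hypothetical.

```python
# Hypothetical three-node BN: A -> B -> C. By the local Markov property
# (Equation 9), the joint distribution factorizes over parents only:
# P(A, B, C) = P(A) * P(B | A) * P(C | B).
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_C_given_B = {True: {True: 0.5, False: 0.5}, False: {True: 0.2, False: 0.8}}

def joint(a, b, c):
    # Each factor conditions only on the node's parent, not on all ancestors
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# A valid joint distribution must sum to 1 over all 8 assignments
total = sum(joint(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
```

Without the parent-only factorization, P(C | A, B) would need a table over both ancestors; conditioning on parents alone is what keeps the network compact.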

3.3. Optimization algorithms

Optimization is the process of ﬁnding the best available

solution under some constraints. For an optimization problem, we define an objective function and constraints. The objective function can be formulated to find either a minimum or a maximum value. Generally, for optimizing the hyperparameters of a forecasting algorithm, we minimize the difference between the actual and predicted values, which is called error minimization. Different types of optimization algorithms

are available as shown in Figure 6.

3.4. Hyperparameters optimization

The forecasting algorithms have hyperparameters. Their

values are selected before training the model as these param-

eters play a very important role in the accuracy of forecasted

values. These parameters are optimized by iteratively training the model. This iterative process is continued until the stopping criterion is met, as shown in Figure 2. The optimization of hyperparameters means selecting the best values of

these parameters, where the accuracy of the forecasting is

highest. The detailed discussion of these methods is given in

Section 4. After the selection of parameter values, the model

is trained and the final forecasted values are obtained, which are then transformed back to the original format to generate the output. The following subsections describe the hyperparameters of the commonly used forecasting methods.

3.4.1. Hyperparameters of SVM

There are three important hyperparameters of SVM that need to be selected carefully: the kernel function, the regularization parameter C and gamma $\gamma$ [25]. There are three common types of kernels, i.e., linear, polynomial and RBF, each with its own pros and cons. The hyperparameter C is important in defining the decision boundary. A higher value of C classifies more training points correctly; however, the decision boundary will not be smooth. On the other hand, a lower value of C generates a smooth decision boundary, but at the same time it may reduce the accuracy on training points. Moreover, the hyperparameter gamma defines the influence of a single training pattern. Its value has an inverse relation with this influence, i.e., a higher value means low influence and a lower value means more influence.
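A simple way to tune C and gamma is to enumerate a grid of candidate pairs and keep the one with the lowest validation error. The error surface below is a hypothetical stand-in for actually training and validating an SVM at each grid point.

```python
import itertools

def validation_error(C, gamma):
    # Hypothetical error surface; in practice, train an SVM with (C, gamma)
    # and measure its error on held-out validation data.
    return (C - 10) ** 2 * 0.001 + (gamma - 0.1) ** 2

C_grid = [0.1, 1, 10, 100]
gamma_grid = [0.001, 0.01, 0.1, 1]

# Exhaustively evaluate every (C, gamma) pair and keep the best one
best = min(itertools.product(C_grid, gamma_grid),
           key=lambda cg: validation_error(*cg))
```

Grids are usually spaced logarithmically, as above, because C and gamma typically matter on an order-of-magnitude scale rather than a linear one.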

3.4.2. Hyperparameters of ANN

In an ANN, there are seven main hyperparameters: the number of hidden layers, learning rate, momentum, activation function, batch size, number of epochs and dropout rate [26]. The selection of an optimal number of hidden layers affects the performance of a model. A minimum number of neurons makes a model fast and simple, while increasing the number makes it slower but, at the same time, improves its ability of classification. The learning rate is the step size of backpropagation; it affects the loss value during the training of a model. Momentum tackles slow convergence during the learning phase by keeping a record of past update directions and moving the algorithm towards the best possible direction. An activation function is used to pass on the weighted sum; Sigmoid, Tanh and ReLU are examples of activation functions. Moreover, the batch size is the size of the small samples of data which are passed as input instead of feeding all data to an ANN at once. An inappropriate selection of batch size results in an over-generalized model. The value of the epoch decides how many times a model will be trained on the entire dataset. In dropout, unnecessary nodes of a network are eliminated, which saves the network from becoming heavy and from repetition of information by eliminating the less important nodes.
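The interplay of the learning rate and momentum described above can be sketched on a one-dimensional quadratic loss standing in for a network's training loss; all values below are illustrative only.

```python
# Gradient descent with momentum on the toy loss(w) = (w - 3)^2.
# The learning rate scales each step; momentum accumulates past
# update directions so that consistent directions speed up convergence.
def train(lr=0.1, momentum=0.9, epochs=300):
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        grad = 2 * (w - 3)                          # gradient of the loss
        velocity = momentum * velocity - lr * grad  # remember past directions
        w += velocity
    return w

w_final = train()  # should approach the minimizer w = 3
```

Setting `momentum=0` recovers plain gradient descent; too large a learning rate, by contrast, makes the updates overshoot and diverge.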

3.4.3. Hyperparameters of BN

A BN has four hyperparameters: the number of input nodes, the number of target nodes, and the states of the input and target nodes [27]. Before training the model, the values of these hyperparameters are selected carefully as they affect the completeness of a model. The optimal selection of these parameters ensures the better learning quality of a BN. The input nodes are the parent nodes in the network and their states are the children; more states or input nodes therefore result in a denser network. Moreover, if both input nodes and target nodes have multiple states, the model becomes complex and computationally expensive. To avoid this situation, synthetic nodes are defined.

3.4.4. Optimization problem

As discussed in the previous section, the hyperparame-

ters optimization is very important for accurate forecasting

in any domain. It can be solved as a separate problem. The

values of these parameters vary for each data set. The opti-

mized values for one data set may not perform well for an-

other data set of the same domain. So, whenever a forecast-

ing model is trained on a data set, it is necessary to optimize

its hyperparameters as well. The problem of hyperparameter

optimization can be formulated mathematically as follows [28]:

F = \{h_1, h_2, h_3, \ldots, h_n\}   (10)

S = \{s_1, s_2, s_3, \ldots, s_n\}   (11)

f(h) = \frac{1}{k} \sum_{i=1}^{k} F_s(T_s^{(i)}, V_s^{(i)})   (12)

Equation 10 represents that a forecasting algorithm $F$ has $n$ hyperparameters. The search space of these hyperparameters is represented by $S$ in Equation 11 and it also has $n$ elements. The values of the hyperparameters are selected from their search space, i.e., $h \in S$. Equation 12 represents the objective function which is to be minimized by selecting the optimized values. Here, $i$ indexes the $k$ samples and $F_s$ is the forecasting error computed using training data $T_s$ and validation data $V_s$.
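The formulation above can be sketched with a random search; the search space, error function and split count below are hypothetical stand-ins for $S$, $F_s$ and $k$.

```python
import random

random.seed(0)

# Search space S for two hypothetical hyperparameters (Equation 11)
S = {"lr": [0.001, 0.01, 0.1], "hidden": [5, 10, 20]}

# Hypothetical stand-in for F_s: the forecasting error of the model
# trained with hyperparameters h, evaluated on split i (Equation 12)
def forecast_error(h, split):
    return abs(h["lr"] - 0.01) * 100 + abs(h["hidden"] - 10) + 0.01 * split

def objective(h, k=5):
    # f(h) = (1/k) * sum of errors over k train/validation splits
    return sum(forecast_error(h, i) for i in range(k)) / k

# Random search: sample candidate h from S and keep the best one found
best_h = min((
    {"lr": random.choice(S["lr"]), "hidden": random.choice(S["hidden"])}
    for _ in range(20)
), key=objective)
```

Any of the tuning methods surveyed in Section 4 (grid search, GA, PSO, and so on) can be dropped in to replace the sampling loop; only the way candidates $h$ are generated changes, not the objective.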

4. Classiﬁcation of tuning methods

In SG, the hyperparameters of forecasting algorithms are tuned using different optimization techniques. The commonly used methods are grid search, cross validation, gradient descent and naive Bayes (NB). Nature-inspired heuristic algorithms are also getting popular in this field. Figure 6 contains a detailed classification of these algorithms. They are classified into two major categories: nature-inspired algorithms and other statistical methods.

4.1. Nature-inspired algorithms

In this subsection, a brief review of existing literature is

presented where nature-inspired algorithms are used for pa-

rameter tuning of forecasting methods. These are the most

commonly used algorithms because of their good performance

and adaptability.


Figure 6: Classification of optimization methods for hyperparameters. Nature-inspired algorithms: MGA, GA, CSA, MFA, QOABC, NSSA, PSO, improved environment adaption method, Tabu search and RCGA. Statistical methods: gradient descent, cross validation, NB, DI-Cast, quasi-Newton method, Levenberg-Marquardt, excavated association rules, NHPP, alternating direction method of multipliers and grid search.

4.1.1. Diﬀerential evolution

In [29], an energy management system is proposed for a

residential area. This system helps the electricity consumers

to make strategies for their energy consumption and reduce

their electricity bills. Moreover, using this system, the peak

to average ratio is also minimized which plays an important

role in maintaining the reliability and sustainability of the

main grid. Here, the electricity consumers also generate en-

ergy locally using renewable resources. For eﬃcient energy

management, energy generation forecasting from renewable

resources is of great importance. So, in this paper, ANN is

used for forecasting. The hyperparameters of this model are

optimized using an enhanced DE algorithm. To evaluate the

accuracy of the proposed model, NRMSE and MAPE are

used.

The SVM based classiﬁer is used in [30] which deﬁnes a

hyperplane and classiﬁes data with the help of support vec-

tors. As the primary goal of this study is to predict the price

of electricity with minimal forecasting error, and SVM has some hyperparameters which have a direct effect on forecasting accuracy, it is important to tune these parameters efficiently. For this purpose, a DE algorithm has been

used and the model is named as DE-SVM. It explores the

search space and ﬁnds the best combination of values for the

hyperparameters where forecasting accuracy of the model is

high. The simulations are carried out using Python and Intel

Core i5 with 4 GB RAM and 500 GB hard disk. To evaluate

the performance of the proposed model, it is applied to the

dataset from ISO New England Control Area from 2010 to

2015, having more than 5000 records. The proposed model

is compared with benchmark classifiers, namely NB and decision trees (DT). The forecasting error shows

that the proposed model outperforms the benchmark classi-

ﬁers. The limitation identiﬁed in this study is that there are

several accuracy measurement metrics available in the liter-

ature that could be applied to check the eﬀectiveness of the

proposed model. However, only basic error measure is used

to compare its performance with already existing algorithms.

Besides, the stability of the model is also not computed.
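The DE loop used for such tuning can be sketched as follows. The error surface is a hypothetical stand-in for the forecasting error of an SVM trained with a given (C, gamma) pair; it is not the objective actually used in [30].

```python
import random

random.seed(1)

# Stand-in objective: forecasting error as a function of two SVM
# hyperparameters (C, gamma); the minimum is at (10, 0.1) by construction.
def error(v):
    return (v[0] - 10) ** 2 + (v[1] - 0.1) ** 2

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, gens=100):
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            # Mutation and binomial crossover (DE/rand/1/bin)
            trial = [a[d] + F * (b[d] - c[d]) if random.random() < CR
                     else pop[i][d] for d in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            if f(trial) <= f(pop[i]):   # greedy selection keeps improvements
                pop[i] = trial
    return min(pop, key=f)

best = differential_evolution(error, bounds=[(0.1, 100), (0.001, 1)])
```

The greedy selection step makes each individual's error non-increasing over generations, which is what drives the search toward the best hyperparameter combination.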

4.2. Genetic algorithm

In the study [31], an energy management technique is

proposed. The problem is formulated as a three-staged opti-

mization problem with four players i.e. utility, energy stor-

age company, microgrid (MG) and consumers. Each player

makes strategies and tries to maximize its proﬁt. A wind

power forecasting model is proposed to forecast the power

generation as it is of intermittent nature and its prediction

plays a very important role in making strategies by MG. The

model is based on deep learning with stacked autoencoders


and GA. The former is used for prediction and the latter is used to tune the hyperparameters of the forecasting model. The

backpropagation (BP) algorithm is also employed for the train-

ing of the model and adjusting the initial weights of the net-

work. The values of the hyperparameters of autoencoders

and weights of the network have a great inﬂuence on the per-

formance accuracy of the forecasting model, so, GA is em-

ployed to determine the accurate values of these parameters.

The dataset from a local MG in Hebei (China) is used during

the period from Sep 2015 to Oct 2016. Forecasting accuracy

was compared with the BP algorithm and SVM using mean

absolute percentage error (MAPE) as a performance metric.

The simulation results demonstrate the higher accuracy of

the newly proposed method over both legacy models. The

limitation identiﬁed in this work is that data is not prepro-

cessed which can degrade the forecasting performance of the

model.

In [32], Bianchi et al. have addressed the problem of

electricity load forecasting of a time series. The dataset con-

sists of multiple variables that are considered important for

forecasting. The data preprocessing step is also included in

this paper for dimensionality reduction of the dataset. The

forecasting horizon of one day is considered with a forecast-

ing interval of 10 minutes. For electricity price prediction,

authors have used an echo state network (ESN). It has hyper-

parameters which need to be conﬁgured eﬃciently as they

aﬀect the overall forecasting performance of the model. Ad-

ditionally, they use both real and integer values, so, this issue

is tackled by using a variant of GA. The diﬀerence between

classical GA and its variant lies in the values of chromo-

somes which are both integer and real values deﬁned over

a fixed interval. For the population update, the Gaussian mutation method is used, where a random number is obtained

using Gaussian distribution and added to each child vector.

For crossover, the Laplacian crossover method is used. Af-

ter the prediction of each column, predicted values are inte-

grated and the ﬁnal result is generated. The proposed fore-

casting model is implemented using the dataset from the Bel-

sito Prisciano feeder situated in azienda comunale energia

ambient (ACEA) power grid. The recorded dataset is for

three years, and each value is measured every 10 minutes.

The performance is compared with the autoregressive (AR)

integrated moving average (ARIMA) forecasting model. For

performance evaluation, normalized root mean square error

(NRMSE) is used as a performance metric. It is evident from the results that the proposed model performs better than

ARIMA. Simulations are carried out using MATLAB.

In related work [33], Eseye et al. have addressed the

problem of power generation forecasting of wind generators.

The power generation of these sources is highly intermit-

tent and their integration with SG can be made eﬃcient by

forecasting their generation in advance. So, in this study the

authors have used ANN for prediction and GA is used to

train the model. The proposed model has two stages, in the

ﬁrst stage, the GA-ANN model is used to forecast the wind

speed using variables like wind speed, wind direction, air

pressure, humidity and air temperature. In the next stage,

the historical data from the SCADA database is used to train

the model for the power prediction of wind generators. So,

the proposed model is a double staged hierarchical hybrid

GA-ANN framework. In this model, GA is used to acquire

the eﬃcient connection weight coeﬃcients between neurons

of ANN. Generally, ANN has multiple layers and applies

the BP algorithm for parameter optimization. Generally, BP

uses gradient descent which can be trapped into local optima

and may result in poor optimization of these parameters. To

overcome this limitation of BP, GA is used as it is a global

optimum search algorithm. The proposed model is imple-

mented for the power prediction of MG wind farm in Bei-

jing China. MAPE, sum of squared error, root mean square

error (RMSE) and standard deviation error are used as per-

formance metrics. For simulations, MATLAB is used on a

PC Intel Core i5-5200 CPU, 2.20 GHz processor and 4GB

RAM.
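The GA search for connection weights can be sketched as follows; the target weights and fitness function are hypothetical stand-ins for the forecasting error that [33] minimizes.

```python
import random

random.seed(2)

TARGET = [0.2, -0.5, 0.9]   # hypothetical "ideal" connection weights

# Stand-in fitness: lower forecasting error for weights nearer the target
def error(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))

def genetic_algorithm(pop_size=30, gens=80, mut_rate=0.2):
    pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=error)
        survivors = pop[:pop_size // 2]           # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, 3)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < mut_rate:        # Gaussian mutation
                d = random.randrange(3)
                child[d] += random.gauss(0, 0.1)
            children.append(child)
        pop = survivors + children
    return min(pop, key=error)

best_w = genetic_algorithm()
```

Unlike gradient descent, the population keeps multiple candidates alive at once, which is why GA is less prone to the local-optima trap mentioned above.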

4.2.1. Micro GA (MGA)

Alamaniotis et al. [34] proposed a new hybrid model for price prediction. This method is an ensemble method with multiple relevance vector machines (RVMs). Each RVM predicts the price, and these predictions are combined into a single linear regression optimization problem. The objective of this optimization problem is to find the optimized weight coefficients using MGA. After finding the appropriate solution, the

ensemble method forecasts the ﬁnal price value. In the ﬁrst

step, each RVM uses a diﬀerent kernel function for predic-

tion. In this way the diﬀerent dynamic values of electricity

price are obtained. In the next step, these predicted values

are clustered as a multiple linear regression ensemble and a

single forecasted value is generated. In order to complete

this task, an optimization problem of weight coeﬃcient for

each regression value is solved using MGA. This is a variant

of classical GA based on the same core principle of survival

of ﬁttest. It takes only ﬁve chromosomes, where mutation

is taken equal to zero and crossover value is equal to 1. It

also transfers the best chromosome from current population

to the next. After getting the optimized value of coeﬃcient

weights, the ﬁnal forecasted value is generated. The eval-

uation criteria used as objective function in MGA is mean

absolute error (MAE). In RVM, three types of kernels are

used namely: Gaussian kernel, polynomial kernel and spline

kernel. To evaluate the performance of the proposed model,

it is applied on dataset obtained from New England elec-

tricity market. The performance is compared with existing

schemes e.g. AR moving average (ARMA) and naive fore-

caster. MAE of each model is computed as it is more eﬃ-

cient than mean square error (MSE) and MAPE because its

performance is not aﬀected by outliers and zero values (di-

vision by zero). The values of MAE validate the better per-

formance of the proposed forecasting algorithm than other

two existing algorithms. The limitation of this work is the

absence of data preprocessing step. As in this model, the

data is not normalized such that outliers and spikes in data

are still present which aﬀect the forecasting accuracy of the

system. So, by adding these steps the forecasting accuracy


as well as the time and space complexity of the algorithm

can also be improved.
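The ensemble step can be sketched as a weighted combination of kernel-specific forecasts scored by MAE; all numbers below are illustrative, and in [34] the weights would be produced by MGA rather than fixed by hand.

```python
# Hypothetical forecasts from three kernel-specific predictors and the
# actual prices; the ensemble output is their weighted sum.
actual    = [30.0, 32.0, 35.0, 31.0]
forecasts = [[29.0, 33.0, 34.0, 30.0],   # e.g. Gaussian kernel
             [31.0, 31.0, 36.0, 32.0],   # e.g. polynomial kernel
             [30.5, 32.5, 34.5, 31.5]]   # e.g. spline kernel

def combine(weights, forecasts):
    # Multiple linear regression ensemble: weighted sum per time step
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]

def mae(pred, actual):
    # MAE is unaffected by zero actual values, unlike MAPE (division by zero)
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

ensemble = combine([0.3, 0.3, 0.4], forecasts)
score = mae(ensemble, actual)
```

The optimizer's job is then simply to choose the weight vector that minimizes `score` on historical data.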

4.2.2. Cuckoo search algorithm (CSA)

Xiao et al. have proposed a combined model for electric-

ity load forecasting in [35]. It is stated that a single forecast-

ing model may not generate best results in diﬀerent scenar-

ios. Performance of model varies over the time and scenario.

So, to address this problem a combined model is designed

in this paper. This model includes the BP-based NN (BPNN), radial basis function NN (RBFNN), GA-optimized BPNN (GABPNN) and generalized regression NN (GRNN). According

forecasting problem is solved by multiple forecasting mod-

els, their coeﬃcient weights should be selected eﬃciently

and results are added up for a ﬁnal value. In this study,

for optimization of coeﬃcient weights of forecasting model,

CSA is utilized. This algorithm is selected because its con-

vergence speed is fast and it ﬁnds the global minima in a few

iterations. To evaluate the performance of proposed model,

three diﬀerent datasets are used. The duration of ﬁrst two

datasets is from November 2006 to 2008 and third between

August 2006 and 2008. The datasets are available in raw

form and need preprocessing before they are used in fore-

casting. In the ﬁrst step, the issues like data spikes and re-

dundancy of features are solved. After the data preprocess-

ing step, the resultant dataset is passed to the forecasting

model. Here, the forecasting results of each model are eval-

uated separately and their accuracy is compared with pro-

posed combined model. To understand the characteristics

of these models, the Diebold-Mariano accuracy test is used along

with other performance evaluation metrics e.g. absolute er-

ror (AE), MAE, MSE and MAPE. The smallest value of

these performance metrics means best prediction results. A

benchmark forecasting algorithm, ARIMA is also used for

comparison. Moreover, to evaluate the stability and accu-

racy of proposed model, bias-framework is used. The train-

ing and testing samples are chosen randomly. It is evident

from the simulation results that no single forecasting model

generates the best results across all 24 hours. An algorithm whose performance is best in the first four intervals performs only averagely in the next three intervals, and its performance becomes the worst in the middle section of the prediction horizon when compared with the other forecasting algorithms. On the other hand, the performance

of the combined model is far better with higher accuracy and

better stability. It shows that the combined model successfully combines the advantages of all five models and is a simple and efficient model for forecasting problems.

Naz et al. [36] proposed a forecasting algorithm to fore-

cast the energy generation from a photovoltaic cell. This in-

formation is then used to estimate the price of electricity.

The core aim of authors is to manage the storage capacity

and energy generation in an MG. The game theory-based ap-

proach is used for energy management. As the energy gener-

ated from photovoltaic cells is uncertain, forecasting its energy generation plays a very important role in planning future strategies. The hyperparameters of the forecasting model are optimized using CSA and the gray wolf optimizer. To evaluate the forecasting accuracy, MAPE and RMSE

are used. Meanwhile in [37], Wang et al. have addressed

the problem of electricity price forecasting. Dynamic choice

ANN (DCANN) is used for forecasting which is a variant of

ANN and diﬀerence lies in the selection of input. In this

model, the input is selected according to the desired output.

However, it also has the same issue of parameter optimiza-

tion as ANN. To tune the hyperparameters optimally, CSA

is integrated in DCANN and this new hybrid model is called

updated DCANN. To evaluate the performance, it is imple-

mented on the dataset acquired from Queensland Australia in

2010. For simulation purpose, a Core i7 3.40 GHz processor

with MATLAB is used. The performance of proposed mod-

els is compared with BPNN, fuzzy NN (FNN), LSSVM, AR

fractionally integrated moving average (ARFIMA) and gen-

eralized AR conditional heteroskedasticity (GARCH) using

MAPE and MAE as performance metrics. The results show

that the proposed algorithm beats these existing models in

terms of both performance metrics. The accuracy of updated

DCANN is higher than other algorithms. However, the com-

putational time of the proposed model is higher than all other

benchmark models which is the only limitation of this work.

4.2.3. Modiﬁed ﬁreﬂy algorithm (MFA)

In [38], an SVR based forecasting model is introduced

which is hybridized with MFA for better prediction. This

model is developed for the short-term load forecasting. MFA

is used to tune the hyperparameters of SVR as these param-

eters have a direct eﬀect on the forecasting accuracy of the

classiﬁer. MFA is a modiﬁed version of already existing ﬁre-

ﬂy algorithm (FA). Several nature inspired algorithms have

been used in existing literature like: GA, PSO, ant colony

optimization (ACO) and artiﬁcial bee colony optimization

(ABCO). It is stated that these algorithms are not as eﬃ-

cient as FA is. One main reason is that they do not have the

storage capability to store the best solution before moving

on to the next iteration. The MFA introduced in this pa-

per aims at improving the search ability of FA and reduce

the possibility of its trap in local optima. In this regard,

a modiﬁcation method is introduced, where, two mutation

and three crossover operations are included. Moreover, the

whole population is moved toward the global optimal so-

lution and all the solutions are improved in this way. The

MFA is then used to tune the hyperparameters of SVR which

in return improves its forecasting accuracy. To evaluate the

performance of the proposed model, relative percentage er-

ror, mean percentile error, RMSE and MAE are used as per-

formance metrics. The proposed hybrid algorithm is also

compared with already existing models including ARMA,

ANN, SVR-FA, SVR-GA, SVR-honey bee mating optimiza-

tion (SVR-HBMO) and SVR-PSO using relative percentage

error. The results presented in this study clearly depict better

performance of new hybrid model than all other forecasting

models. Moreover, the performance of MFA is compared

with harmony search, ACO, ABCO and FA in terms of mean

and standard deviation. The results obtained from MFA vali-


date the efficiency and better performance of MFA. The load data of the Fars province of Iran is used for evaluation. The limitation identified in this paper is that the data preprocessing step is not included.

4.2.4. Quasi-oppositional artificial bee colony optimization (QOABCO)

Progressing further, in [39], Shayeghi et al. have pro-

posed a forecasting model for electricity price and load pre-

diction. The Core aim of this study is to capture the existence

of electricity price and load dynamics, as in the existing lit-

erature they are sometimes used together for forecasting pur-

pose but their dynamics are never evaluated. There are three

main steps of this model. In the ﬁrst step, the dimensions of

the dataset are reduced and irrelevant features are ﬁltered out

to improve the forecasting accuracy and make the system ro-

bust. In the next step, the dataset is divided into several small

subsets. The third step is the forecasting step, where, a mul-

tiple input multiple output (MIMO) model is used to forecast

both electricity price and load. This study highlights the re-

quirement of simultaneous forecasting of load and price. It is

stated that most of the existing models predict load and price

of electricity separately which was suitable for unidirectional

grids. The modern SG has two way communication and en-

ables customers to take part in demand response program.

So, it urges the need of simultaneous forecasting of elec-

tricity load and price prediction along with their dynamics.

In the proposed model, dataset is preprocessed before fore-

casting. After data preprocessing, LSSVM is used for pre-

diction. This forecasting model is based on MIMO method

to forecast electricity load and price values simultaneously.

The hyperparameters of MIMO-LSSVM are adjusted using

newly proposed QOABCO. The classical ABCO algorithm

is simpler, ﬂexible and more robust than other frequently

used nature inspired optimization algorithms e.g. GA and

PSO. It also has few control parameters and is easy to implement. Its hybridization with other algorithms is also simple, and it has a stochastic nature while handling the objective

cost function. However its convergence to the local optima

is very fast in case of multiple variables. To overcome this

limitation, an opposition based learning method is integrated

with ABCO and new model is named as QOABCO. This

model is implemented on the dataset obtained from New

York Independent System Operator electrical market. For

simulations, computer with 2.53 GHz processor with 4GB

RAM and MATLAB are used. To evaluate the forecast-

ing performance, MAPE, mean square error and standard

deviation error are used as performance metrics. The fore-

casted result is compared with original data and ANN. The

newly proposed optimization algorithm QOABCO is also

compared with PSO, GA and classical ABCO, and it is evident from the results that QOABCO has the best performance in terms of the min, max and mean values. However, its time complexity is the same as that of ABCO.

4.2.5. Novel shark search algorithm (NSSA)

In [40], a short term load forecasting model has been

proposed. For electricity load prediction, an improved ver-

sion of Elman NN (ENN) is proposed. It has four layers;

input, hidden and output layers are feed forward layers and

context layer is used as memory. The neurons of this layer

act as memory units and it is connected with hidden layer

through back forward loop. Whereas, the improved ENN

(IENN) uses the self-feedback mechanism between context

layer and hidden layer which makes this network more sen-

sitive to the historical data and improves its forecasting ac-

curacy. Moreover, as the IENN belongs to the NN, so, it

also has parameters which need proper tuning for accurate

forecasting. Shark search algorithm (SSA) is used for this

purpose. It was introduced in 2014 by Abedinia for optimization problems. SSA follows the shark's way of finding a victim. It has two basic steps: the first is initialization and the second is evaluation. The classical SSA is improved in

this paper and NSSA is proposed. In NSSA, an additional

step of evaluation of neighbors through Euclidean distance

is added. Here, with the help of neighbors and historical po-

sitions, the best position is acquired. The proposed model is

applied on the load data of three business centers of Arian

golden groups. The performance of proposed model is com-

pared with ARIMA, SVR, BPNN, RBFNN, wavelet theory

(WT) plus BPNN, WT plus RBFNN and WT plus two stage

mutual information. MAPE, RMSE, normalized MAPE and

NRMSE are used as performance metrics. Simulation re-

sults demonstrate the better performance of proposed model

than other existing models.
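The neighbor-evaluation step added in NSSA can be illustrated as follows; the fitness function, positions and radius are hypothetical, and the real NSSA also incorporates historical positions.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def fitness(p):
    # Hypothetical error to be minimized
    return (p[0] - 1) ** 2 + (p[1] - 2) ** 2

def best_position(current, candidates, radius=2.0):
    # Neighbors are candidates within the given Euclidean radius;
    # the current position competes with its neighbors on fitness.
    neighbors = [c for c in candidates if euclidean(current, c) <= radius]
    return min(neighbors + [current], key=fitness)

pos = best_position((0.0, 0.0),
                    [(0.5, 1.0), (1.0, 2.0), (5.0, 5.0)])
```

Restricting the comparison to nearby candidates keeps each move local, which is the point of the added Euclidean-distance step.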

4.2.6. PSO

Raza et al. have considered the problem of electricity

load forecasting of a building in [41] where photovoltaic

(PV) generators are integrated. Five forecasting algorithms

are integrated to predict the load value. The Bayesian model

averaging (BMA) is used to combine the outputs of all fore-

casting algorithms and generate the ﬁnal prediction. The

electricity load demand of a building with PV integration

is of highly intermittent nature which aﬀects the normal de-

mand of electricity and creates peaks. So, demand forecast-

ing of such buildings has a great importance but also diﬃ-

cult at the same time because of the intermittent nature of

PV generators and uncertain load consumption behaviors of

the residents. Owing to the aforementioned points, it is concluded that a single forecasting model cannot accurately predict the load demand. In this regard, the authors have

proposed an ensemble framework which includes ﬁve fore-

casting models for prediction. This framework has four main

steps. First of all, the dataset is preprocessed using WT al-

gorithm. Here, ﬂuctuations and uncertainties are removed

from the dataset for accurate prediction. In the next step, the

dataset is used by ﬁve forecasting algorithms and predictions

are made. These forecasting algorithms include: BPNN,

ENN, ARIMA, FNN and radial basis function (RBF). Af-

ter prediction, the result of each algorithm is reconstructed

back to the initial shape using WT. These outputs are then


combined using an aggregation technique. Here, Raza et al.

have used BMA. It is a frequently used aggregation tool in the literature which has the potential to generate better and more efficient output predictions. Moreover, NNs have parameters

which need optimum values and they aﬀect the accuracy of

the algorithm. So, it is important to train these parameters

efficiently. The commonly used gradient learning technique has a low convergence rate, which may result in inefficient training of the model. On the other hand, PSO is a population-based

technique and frequently used by the researchers for opti-

mization. Dataset of building management system of AEB

and GCI building at University of Queensland, Australia is

used for the implementation of proposed model. MATLAB

is used for the extensive simulations. The performance of the

newly proposed forecasting model is compared with Persis-

tence, BPNN, ENN, ARIMA, RBF plus PSO, FNN plus PSO

and WT plus FNN plus PSO. Normalized MAE (NMAE)

and NRMSE are used as performance metrics. The simu-

lation results demonstrate the better accuracy of proposed

model when it is compared with other forecasting models.

In study [42], Vrablecova et al. have used online SVR

for short-term load forecasting. This variant of the classic SVR

stores less data by discarding less important or less frequently

used data. In classic SVR, the dataset is divided into training

set and testing set and input variables are identiﬁed after ex-

ploring the whole search space. This model is not ﬂexible for

changes and with time, its accuracy decreases which results

in retraining of the whole model from the initial stage. These

limitations make SVR computationally very ex-

pensive as new changes are integrated in the data and the

model is retrained. On the other hand, the online SVR overcomes

these limitations. Three additional vectors are defined

and a three-staged incremental process is executed. The first step

is the addition of one new vector, in the next step an existing

vector is deleted. In the ﬁnal step a third vector is updated.

So, in this way, new changes are accommodated in already

trained model and retraining of the model is avoided. The tuning

parameters of the online SVR are the same as those of the classic

SVR. The authors have used PSO and CSA. The performance of

both algorithms is evaluated by tuning their parameters separately

and then their results are compared. Performance-wise,

the CSA is better than PSO. Although CSA is slower than

PSO, it needs fewer iterations to converge to an optimal

solution, so the computational times of both algorithms

become equal. For performance evaluation, MAPE is used.

The limitation of this work is that the data preprocessing is

not included in the forecasting model.
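True online SVR maintains and incrementally updates sets of support vectors; as a rough illustration of the add-one/drop-one idea only, the sliding-window kernel ridge regressor below (a close cousin of SVR, not the paper's exact algorithm) absorbs each new sample and discards the least recent one without retraining on the full history:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between two sample sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class SlidingKernelRegressor:
    """Window-based stand-in for online SVR: add the newest vector, drop the oldest."""
    def __init__(self, window=30, lam=1e-3):
        self.window, self.lam = window, lam
        self.X, self.y = np.empty((0, 1)), np.empty(0)

    def update(self, x, y):
        self.X = np.vstack([self.X, x])[-self.window:]   # one new vector is added
        self.y = np.append(self.y, y)[-self.window:]     # the oldest one is discarded
        K = rbf(self.X, self.X)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)), self.y)

    def predict(self, x):
        return rbf(np.atleast_2d(x), self.X) @ self.alpha

model = SlidingKernelRegressor()
t = np.linspace(0, 6, 200)
for ti, yi in zip(t, np.sin(t)):          # stream the series point by point
    model.update(np.array([[ti]]), yi)
print(model.predict([6.0]).item())        # tracks sin(6) without a full retrain
```

The window size, kernel width `gamma` and regularizer `lam` are the counterparts of the tuning parameters that the study optimizes with PSO and CSA.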

4.2.7. Metaheuristic algorithms

In [43], Chou et al. have proposed a framework for en-

ergy saving in SG. It is a decision support system which

is based on big data analytics. It can monitor the power

consumed by several appliances and recognize the pattern

of their consumption to predict the power consumption in

later hours. This information is then used for the eﬃcient

scheduling of the appliances. The proposed framework con-

sists of multiple layers. The first layer is the data layer, which

contains the data and all necessary information required by the

system to operate. This information includes: dataset, appliances'

information, electricity price signal, voltage in-

formation, power current, frequency and power factor. Me-

tering infrastructure, communication network and data man-

agement module are the main components used by this layer.

The second layer is analytics bench. It integrates diﬀerent

dynamic and multi-objective techniques to analyze the en-

ergy consumption pattern of appliances. It also maintains

the record of power consumption cost of diﬀerent schedules.

Forecasting algorithms are integrated in this layer which use

historical data and predict the behaviors of power consump-

tion for the next hours. This forecasted information is then used

to schedule the appliances. The forecasting algorithm is a hybrid

version of the ARMA and SVR models. Hyperparameters of

this hybrid algorithm are tuned using a nature-inspired meta-

heuristic optimization algorithm. This algorithm requires

fewer resources and lower computational requirements, and provides

a near-optimal solution. In addition to the forecasting model,

this layer also has an optimization algorithm to generate the

suitable schedules for the appliances using forecasted de-

mand information. The scheduling algorithm is dynamic

in nature and based on multi-objective functions. The third

layer is a web-based portal. It enables the user to interact with

the proposed decision support system. MATLAB is used as

a tool for the implementation of the system.

An electricity price and demand forecasting algorithm is

proposed in [44]. It is based on an LSTM forecasting

model. In the proposed model, the dataset is preprocessed

and then used for forecasting. In this step, missing values

and outliers are removed from data and values of all vari-

ables are normalized between 0 and 1. This dataset is then

split into test and training samples. The hyperparameters

of LSTM are tuned using the Jaya optimization algorithm.

It belongs to the family of metaheuristic optimization al-

gorithms. It is a simple algorithm and does not need deep

knowledge for implementation. Using this optimizer, win-

dow size, step size and a batch size of LSTM are optimized.

Two diﬀerent datasets are used for electricity price and load

prediction, both are obtained from Elia grid data. For simu-

lations, a PC with Intel Core i3 CPU with 4 GB of RAM and

a 64-bit operating system is used. The performance of the

proposed algorithm is compared with SVM and uni-variant

LSTM. The simulation results depict that the performance

of the proposed Jaya-LSTM algorithm is better than other

forecasting algorithms. To measure the accuracy of the fore-

casting models, RMSE and MAE are used as performance

metrics.
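The Jaya update rule used for such tuning is simple enough to sketch; the toy loss below is a stand-in for the LSTM validation error over (window size, step size, batch size), and the search bounds are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(h):
    # Toy stand-in for LSTM validation error over (window, step, batch) values
    return np.sum((h - np.array([24.0, 1.0, 32.0])) ** 2, axis=-1)

pop, iters = 15, 300
low = np.zeros(3)
high = np.array([100.0, 10.0, 128.0])        # illustrative search bounds
X = rng.uniform(low, high, (pop, 3))

for _ in range(iters):
    f = loss(X)
    best, worst = X[np.argmin(f)], X[np.argmax(f)]
    r1, r2 = rng.random((2, pop, 3))
    # Jaya move: toward the best solution and away from the worst one
    Xnew = np.clip(X + r1 * (best - np.abs(X)) - r2 * (worst - np.abs(X)), low, high)
    keep = loss(Xnew) < f                    # greedy acceptance
    X[keep] = Xnew[keep]

print(np.round(X[np.argmin(loss(X))], 2))  # approaches (24, 1, 32)
```

Unlike most metaheuristics, Jaya has no algorithm-specific control parameters beyond population size and iteration count, which is what makes it "simple" in the sense the study describes.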

4.2.8. Improved environment adaption algorithm

Meanwhile, Singh et al. [45] have proposed a general-

ized NN model for short-term electricity price forecasting.

The classical ANN model is considered ideal for price fore-

casting because of its ability of mapping nonlinear problems

with high accuracy and dealing with complex forecasting

problems. However, the increased complexity of forecast-

ing model decreases its freedom, which results in under- or


overfitting problems. In the proposed model, the WT method is

used for data preprocessing. The weights of the forecasting

model are tuned by using the improved environment adaption

method. The over and under ﬁtting problem is overcome

by tuning the hyperparameters using this nature-inspired

evolutionary algorithm. Adaption, alteration and selection

are the three basic operations of this algorithm. The alteration

operation makes a major contribution as it explores the

search space, whereas adaption exploits the search space and thus has a

minor contribution in finding the best solution. The forecast-

ing accuracy is evaluated by computing MAPE and MAE

performance metrics. The forecasting models with optimized

weights and without optimized weights are compared and the results

depict that the generalized NN forecasting model with optimized

weights generates better results.

4.2.9. Tabu search

Progressing further, in [46], Bassamzadeh et al. have

proposed a data driven approach based on BN for electric-

ity demand forecasting. It computes the mutual dependen-

cies of the variables contributing to forecasting. BNs are

suitable for complex and lengthy datasets as they can han-

dle the incomplete data, integrate previous knowledge into

the model and provide a compact model to avoid overﬁtting.

The search and score category of this network is used, where

a scoring metric is built by exploring the search space. To

build the scoring metric, the BDeu score is used and to find the

best directed acyclic graph from multiple available graphs,

the authors have used tabu search. Initially, a score of a randomly

generated network is calculated, then arc operations are ap-

plied and new scores are computed. At the end, the arc operation

with the maximum score is applied. This procedure does

not guarantee a globally optimal solution; rather, it gives the local

best solution. This problem is solved iteratively by repeating

the same process until a stopping criterion is satisfied. As a discretized

BN is most suitable according to the nature of the output

values, the continuous values are discretized by

using the Fayyad and Irani method. The discretization parameters

also need tuning and the authors have used the junction tree

algorithm for an efficient probability distribution and kernel

density estimation with Gaussian kernel for continuous dis-

tribution to the learned histogram. Performance of the pro-

posed system is analyzed using average RMSE (ARMSE)

as performance metric.
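The tabu search loop described above (score the candidate moves, apply the best non-tabu one, keep the best solution found so far) can be sketched on a toy problem; the bit-flip moves and the score function here are illustrative stand-ins for DAG arc operations and the BDeu score:

```python
import random

random.seed(0)
TARGET = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # hidden optimum of the toy score

def score(s):
    # Toy stand-in for a structure score such as BDeu
    return sum(1 for a, b in zip(s, TARGET) if a == b)

n, tabu_len = len(TARGET), 4
current = [random.randint(0, 1) for _ in range(n)]
best_score = score(current)
tabu = []

for _ in range(50):
    # Score every single-bit "arc operation" that is not tabu
    moves = [i for i in range(n) if i not in tabu]
    cand = max(moves, key=lambda i: score(current[:i] + [1 - current[i]] + current[i + 1:]))
    current[cand] = 1 - current[cand]      # apply the highest-scoring move
    tabu = (tabu + [cand])[-tabu_len:]     # forbid undoing it for a few iterations
    best_score = max(best_score, score(current))

print(best_score)  # reaches the maximum score of 10
```

The tabu list is what lets the search escape the local optima mentioned above: even a worsening move is accepted when nothing better is available, while the best solution seen so far is retained.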

4.3. Statistical approach

In the forecasting models, several statistical approaches

are also used for optimization. The following subsections

elaborate the work where these models are implemented.

4.3.1. Grid search

In [47], Raviv et al. have addressed the question of whether

daily price prediction is better than the average hourly price

of electricity. For this purpose, the univariate models for

daily average price prediction are compared with multivari-

ate hourly price prediction. The hourly electricity price pre-

diction is a challenging task as it requires multiple input fea-

tures and increases the computational complexity. It is ob-

served that the predictive value of the previous hour is not

reliable enough to predict the value of the next hour. The

hourly values are sensitive to the values of the same hour

on the previous day. So, the daily average electricity price

is forecasted using the univariate model and its performance

is compared with hourly price prediction using the multi-

variate model. The latter approach suffers from the curse of di-

mensionality issue which can be solved by using dimension

reduction methods along with regularization method which

eliminates the outliers. The results of multiple forecasting

algorithms are also combined to generate the ﬁnal predic-

tion as the forecasting model performs diﬀerently in diﬀer-

ent time intervals and combining the results of these models

can increase the accuracy of prediction. For the case study,

the dataset from the Nordic and Baltic transmission system

operator is acquired. It is a leading power market in Europe.

The duration of the dataset is from 1992 to 2010. From uni-

variate models, AR, dynamic AR (DAR) and heterogeneous AR

are used for forecasting. In the AR model, the number of lags

is included in the model for prediction and their value is con-

stant. Whereas, DAR determines the value of lags at each

point using the Akaike information criterion. The second

variant of AR is designed to accommodate the long memory

to increase forecasting accuracy. From multivariate forecast-

ing models, diﬀerent variants of vector AR (VAR), factor

models (FM) and reduced rank regression models (RRR)

are used. These models are strictly constrained and their

complexity is limited in terms of unknown parameters, as

the unconstrained multivariate models suffer from overfitting and

forecasting becomes less accurate. First, VAR models are

going to be discussed, where vectors are maintained to save

the predictive values of different hours. Its first variant is

unrestricted VAR, which has no restriction in terms of unknown

parameters and uses all the available parameters during fore-

casting. The second variant is diagonal VAR, where the number of

unknown parameters is restricted and the cross coefficients of lags

are limited to zero. The third variant is Bayesian VAR, which

uses the shrinkage method to limit the parameters. After

VAR variants, FMs are used to reduce the adverse

effects of the high dimensionality of the data. They use principal

component analysis (PCA) and singular value decomposi-

tion (SVD) for dimensionality reduction. The third model

is RRR which, instead of forming orthogonal variables from

the matrix X like PCA, reduces the dimension by using an orthogonal

projection of Y. To combine the forecasted values of

the proposed models, the authors have used two methods: sim-

ple average and constrained least squares. The former uses

equal weights for every method while averaging the final

results and the latter optimizes the weights of the coefficients on

every point of prediction. For tuning the hyperparameters of

the forecasting models, grid search is used. The accuracy of

all forecasting models is evaluated using RMSE, MAE and

MAPE. AR model is used as the benchmark model and error

values are relative errors which are used to represent the per-

formance accuracy of all other models. It is evident from the

results that the constrained least squares-based forecasting model

outperforms all other forecasting models in terms of all three


performance metrics.
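For the two-model case, the constrained least squares combination (non-negative weights summing to one) has a closed form, which makes the contrast with the simple average easy to sketch; the data below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=200)                    # actual daily prices (synthetic)
f1 = y + rng.normal(scale=0.5, size=200)    # forecasts of model A
f2 = y + rng.normal(scale=1.0, size=200)    # forecasts of model B

avg = 0.5 * (f1 + f2)                       # simple average: equal weights

# Constrained least squares for two models: weight w in [0, 1], weights sum to 1
d = f1 - f2
w = np.clip(d @ (y - f2) / (d @ d), 0.0, 1.0)
cls = w * f1 + (1 - w) * f2

rmse = lambda f: np.sqrt(np.mean((y - f) ** 2))
print(rmse(avg), rmse(cls))  # the fitted weights never do worse than the average
```

Because the equal-weight combination is itself a feasible point of the constrained problem, the optimized weights can only match or reduce the in-sample error, which is the intuition behind combining forecasts at every prediction point.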

In related work [48], a new machine learning-based elec-

tricity load forecasting method is introduced which is the

combination of convolutional NN (CNN) and K-means clus-

tering. It is a scalable model specially developed for big

data-based forecasting. The raw data is collected and pre-

processed. Then the K-means algorithm is used to generate

the subsets of the source data. These subsets are used to train

the CNN model. It is a feed-forward network and its archi-

tecture is inspired by the structure of human neurons. It has

three types of layers: an input layer, multiple hidden lay-

ers and an output layer. In contrast to the other NN models,

it does not need feature engineering as it is also a sub-part

of this network model. The proposed model is designed for

hourly load forecasting using big data. Data is collected from

the industry in raw form and data preprocessing is applied to

remove noise and outliers from it. After data preprocessing,

the K-means clustering algorithm is applied to the dataset

and small subsets of data are generated. Then these clus-

ters are used as training and testing datasets for the forecast-

ing model. The training datasets are used to train the CNN

model and after training, test sets are used to test the forecast-

ing. Besides, the K-means algorithm has a hyperparameter

K, which is the number of clusters. This parameter

is tuned using the trial and error method. The hyperparam-

eters of the CNN model are optimized using a grid search.

To implement this model, the dataset of 1.4 million records

is used, the duration of this dataset is from 2012 to 2014.

The performance of the proposed model is evaluated using

MAPE, RMSE, NMAE and NRMSE. It is also compared

with conventional CNN, linear regression, NN, SVR, linear

regression plus K-means, SVR plus K-means and NN plus

K-means for both summer and winter season. The perfor-

mance results depict that the proposed CNN plus K-means

model has the highest accuracy in terms of all performance

metrics.
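The cluster-then-train pipeline described above can be sketched as follows; a per-cluster linear regressor stands in for the per-subset CNN, and the K-means initialization is simplified to a deterministic one for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans(X, k, iters=20):
    # Deterministic, simplified initialization for brevity
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

# Synthetic load features coming from two distinct consumption regimes
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(5, 1, (100, 4))])
y = np.r_[X[:100] @ [1, 2, 0, 1], X[100:] @ [0, 1, 3, 1]]

labels, centers = kmeans(X, k=2)
# One regressor per cluster, standing in for the per-subset CNN
models = {j: np.linalg.lstsq(X[labels == j], y[labels == j], rcond=None)[0]
          for j in range(2)}

x_new = np.array([5.0, 5.0, 5.0, 5.0])
j = int(np.argmin(((x_new - centers) ** 2).sum(-1)))   # route to nearest cluster
print(x_new @ models[j])
```

Routing each new sample to the model of its nearest cluster is what makes the scheme scalable: each sub-model only ever sees its own subset of the big dataset.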

Xiao et al. investigate the possible applications of data

mining techniques in electricity load prediction using big

data in [49]. Deep learning-based techniques are used to ex-

plore data and forecast the load consumption. There are two

types of deep learning-based techniques: supervised and un-

supervised. The former are used as prediction models and the latter

are used for feature extraction. In this study, DT and asso-

ciation rule mining are used. For load forecasting, the data

is partitioned into training and testing datasets. Two data

mining techniques, DT and clustering, can be used for this

purpose. The former divides data based on certain criteria

and the latter divides the dataset based on similarity. DT is used

in this study as it is more interpretable. Moreover, another

deep learning-based model, deep autoencoder (DAE), is de-

veloped which uses the tanh activation function and gener-

ates a new feature set. It is applied to each tree sample for

new feature sets. The next step is the knowledge discovery.

Here the relationship of input variables is determined using

association rule mining techniques. The QuantMiner algo-

rithm is used for both numeric and absolute values. It is a

quantitative association rule mining algorithm and the as-

sociation of each data subset is computed separately. After

knowledge discovery, the forecasting algorithm is applied

to data for electricity load prediction. The gradient boost-

ing algorithm, SVR and extreme boosting algorithms are

used for load prediction and to evaluate the performance of the

data mining techniques. These forecasting algorithms are

applied to raw data, basic data and data generated by the

DAE method. The hyperparameters of all forecasting al-

gorithms are tuned using the grid search. For performance

evaluation of the proposed model, it is applied to the dataset

obtained from a campus building of the Hong Kong Polytechnic

University, China. The performance of each forecasting al-

gorithm is evaluated using MAE, RMSE and coeﬃcient of

variation of the RMSE. The simulation results and values of

these performance metrics demonstrate that higher accuracy

is achieved by the extreme boosting algorithm when it is ap-

plied on DAE feature set.

Raurich et al. have tackled the issue of electricity load

forecasting in non-residential buildings in [50]. The impor-

tance of temperature, occupancy, calendar and indoor am-

bient variables are also evaluated for load prediction of a

building. The dataset from a university is collected using

a wireless sensor network. The collected data is prepro-

cessed to reduce the computational cost and improve the ac-

curacy of the forecasting model. For load prediction, three

forecasting models, multilayer perceptron (MLP), multiple

linear regression (MLR) and SVR, are employed. The hy-

perparameters of these models are tuned using grid search

as it is an eﬃcient method to tune the hyperparameters of

the models with small data size. The simulations are car-

ried out using Weka software on the machine having Intel

Core i7 CPU and 8 GB of RAM. Simulations are conducted

using diﬀerent combinations of input variables and forecast-

ing accuracy of each combination is computed using MAPE

and correlation coeﬃcient. From the results, it is concluded

that SVR gives the highest accuracy using temperature and

occupancy data.
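A hand-rolled version of such a grid search over a held-out validation split might look like the sketch below; polynomial ridge regression stands in for the MLP/MLR/SVR models, and both the grid values and the synthetic data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 120)
y = np.sin(2.5 * x) + rng.normal(scale=0.1, size=120)   # synthetic load curve
x_tr, y_tr, x_va, y_va = x[:80], y[:80], x[80:], y[80:]

def fit_predict(deg, lam, x_fit, y_fit, x_eval):
    """Polynomial ridge regression: degree and penalty are the hyperparameters."""
    A = np.vander(x_fit, deg + 1)
    w = np.linalg.solve(A.T @ A + lam * np.eye(deg + 1), A.T @ y_fit)
    return np.vander(x_eval, deg + 1) @ w

# Exhaustive grid over the two hyperparameters, scored on held-out data
grid = [(d, l) for d in (1, 3, 5, 7) for l in (1e-4, 1e-2, 1.0)]
val_err = {(d, l): np.mean((y_va - fit_predict(d, l, x_tr, y_tr, x_va)) ** 2)
           for d, l in grid}
best = min(val_err, key=val_err.get)
print(best, round(val_err[best], 4))
```

The cost grows multiplicatively with each added hyperparameter, which is why grid search suits the small datasets mentioned above but becomes impractical for larger models.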

A day-ahead electricity price and demand forecasting model

is proposed in [51]. It is beneﬁcial for both electricity con-

sumers and providers. Electricity consumers can use this in-

formation for energy management and reduce their electric-

ity bills while electricity providers can use this forecasting

information to manage electricity generation and maintain

the reliability and eﬃciency of the grid. In this paper, the

authors have used two forecasting models, SVR and CNN.

The hyperparameters of both models are tuned using grid

search and their performance is compared. Two datasets are

used for forecasting. To evaluate the forecasting accuracy of

both models, MSE, RMSE, MAE and MAPE are used. The

simulations depict that the performance of enhanced CNN

is the best.

Garulli et al. [52] have addressed the problem of elec-

tricity load forecasting in the presence of demand response.

In an active demand scenario, the consumers alter their load

consumption proﬁle according to the incentives provided by

the utility. Two models of load forecasting are introduced

i.e. black box and gray box models. In the black box model,


SVM and ANN are used for forecasting. In this model, the

load for next intervals is forecasted by using information re-

lated to the active demand, temperature and calendar data.

On the other hand, the gray box model uses prior informa-

tion related to load decomposition. It has two main stages

i.e. computation of baseload and modeling the residual. Ex-

ponential smoothing (ES) is used for the first step, whereas,

the residual is modeled using TF, SVM and ANN and re-

sultant models are known as ES-TF, ES-SVM and ES-ANN

respectively. Moreover, to evaluate the importance of AD,

the ES-ARMA model is also presented. For the simulation

of these forecasting models, MATLAB is used and RMSE

and NRMSE are computed for each forecasting model. Hy-

perparameters of the forecasting model are tuned using the

grid search. It is concluded that AD information is very im-

portant for accurate prediction and its scope is high in the

future as new AD-based products are being launched.

Keles et al. [53] have proposed a day-ahead electricity

price forecasting model based on ANN. The main focus of

this study is on data preprocessing techniques and the right

selection of hyperparameters of the model as these inputs

have a great impact on the forecasting accuracy of the model.

For data preprocessing, diﬀerent clustering algorithms have

been used. There are two forecasting strategies available i.e.

direct forecasting and iterative forecasting. The former fore-

casting model does not use historical forecasted values; in-

stead, it performs forecasting independently by considering

each interval as a separate model. On the other hand, the latter uses

the previous forecasted values. The authors have used the first

model as in the case of the second strategy, if an error oc-

curs in the previous forecasted value, it will be propagated to

the next forecast as well. For hyperparameter tuning, both

cross validation and grid search are used. The ﬁrst method

is used to tune the parameters like the selection of activation

function, learning rate, number of hidden layers and neurons

and number of output neurons. To select the best combina-

tion of the learning rate and momentum of the algorithm,

the grid search is used. The proposed model is compared

with three benchmark models i.e. ARMA and two NB mod-

els. For performance comparison, ARMSE and mean abso-

lute deviation are used. The dataset used is from Jan to Sep

2013. The limitation of this work is that if new components

are added to the system as input then the whole model needs

changes which makes it ineﬃcient for the integration of new

parameters and adopting new changes.

4.3.2. Gradient descent

In [54], Amarasinghe et al. have proposed a load fore-

casting model based on deep NN (DNN). This model fore-

casts the power consumption of a building. The multiple

layers of deep learning algorithms allow it to identify the

load consumption patterns and the relationship of data fea-

tures more eﬃciently. It also learns the relationship between

inputs and their corresponding output. In this model, CNN

is used for forecasting. It applies multiple convolutional lay-

ers on the available dataset before the ﬁnal prediction. The

convolution operation requires grid data for processing and

the kernel function also uses the weighted array of multiple

dimensions. Thus this model requires a dataset of multiple

dimensions for prediction. There are three basic steps of a

convolutional layer. In the ﬁrst step, a feature map is gener-

ated using convolutional operation which is passed through

an activation function in the next step. The resultant output

is then processed by a pooling function to reﬁne and reduce

the ﬂuctuations from the feature map. In the second step,

the rectiﬁed linear activation function is used and for pool-

ing in the third step, max pooling is used. As the proposed model

has multiple convolutional layers, so, each layer passes its

output to the next convolutional layer and BP is used for the

learning process. This BP based model needs the optimiza-

tion, so the Adam optimizer, a gradient descent based

method, is used for the optimization. After the convolu-

tional layers, the output is forwarded to the hidden layer(s)

which then forwards it to the output layer of the network.

The hidden and output layers of the convolutional network

are the same as standard NN. To evaluate the accuracy of

the model, the loss function is used which computes the er-

rors in prediction for all predicted values. To check the ef-

fectiveness of the proposed model, it is implemented on the

benchmark dataset named "individual household electric

power consumption dataset" from Dec 2006 to Nov 2010.

This model is also compared with ANN, SVM, factored

conditional restricted Boltzmann machine and long short-

term memory (LSTM) using RMSE as a performance met-

ric. The results demonstrate that the proposed CNN is a vi-

able candidate for load forecasting as its forecasting accuracy

is higher than these benchmark forecasting algorithms. The

limitation identiﬁed in this work is that no data preprocess-

ing technique is mentioned here for the dataset. Literature

depicts the importance of data preprocessing steps in terms

of accuracy of forecasting models, time and space complex-

ity, etc.
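The Adam update rule that drives this gradient descent training can be written down compactly; the quadratic loss below is an illustrative stand-in for the CNN's prediction error:

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    x = np.asarray(x0, dtype=float)
    m, v = np.zeros_like(x), np.zeros_like(x)   # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)            # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy convex loss standing in for the CNN's prediction error
target = np.array([2.0, -3.0, 0.5])
x_opt = adam_minimize(lambda x: 2 * (x - target), np.zeros(3))
print(np.round(x_opt, 2))  # settles near the minimizer [2.0, -3.0, 0.5]
```

The per-parameter scaling by the second-moment estimate is what lets Adam cope with the very different gradient magnitudes across convolutional and dense layers.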

A novel framework for electricity price prediction is pro-

posed in [55]. It includes four novel deep learning-based

models. It is stated that there is no standard benchmark for

electricity price prediction, so, the proposed models are com-

pared with 27 forecasting algorithms to validate their eﬀec-

tiveness. The proposed models include DNN, hybrid model

of LSTM and DNN, a hybrid model of gated recurrent units

(GRU) and DNN and a CNN model. The ﬁrst model, DNN,

is the extension of MLP based on deep learning with two hid-

den layers. The second model is the combination of LSTM

and DNN forecasting models. It includes a normal layer that

learns the relationship of non-sequential data and a recurrent

layer which learns and identiﬁes the relationship of sequen-

tial data of time series. The third layer is the combination of

GRU and DNN. Like the second forecasting model, it also

has separate layers for both sequential and non-sequential

data. However, the LSTM layer has an extra computational

burden which has been reduced by using GRU instead of

the LSTM network. The fourth deep learning model intro-

duced in this work is CNN. In the previous two models, the

data was separated as past data sequences and data contain-

ing day ahead information. Whereas, in this model, the data


clusters are made based on its dimensionality. Moreover,

the detail of hyperparameters is also given in this study. It

is stated that the hyperparameters depend on the data that

we use each time. These parameters need to be optimized

each time the dataset is changed, so there is a need for an

optimizer to optimize these parameters when needed. The

hyperparameters are tuned using Adam, which belongs to the

family of stochastic gradient descent methods. The common hyperparameters

of all models include activation function, dropout and L1-

norm penalization. The hyperparameters of DNN are the

number of neurons in both hidden layers. The hyperparame-

ters of both GRU-DNN and LSTM-DNN are the same which

are the number of neurons in recursive and DNN layers and

length of the sequences. CNN model has pooling frequency,

pooling type, channel length, ﬁlter size, number of convo-

lution and features maps as its hyperparameters. During the

training, minimized AE is used as an objective function. The

dataset from European power exchange Belgium is used dur-

ing the period from 1 Jan 2010 to 30 Nov 2016. The data

is preprocessed using Box-Cox transformation. The perfor-

mance of the proposed models is compared with 27 fore-

casting models: AR, double seasonal ARIMA (DSARIMA),

wavelet ARIMA (WARIMA), WARIMA-RBF, ARIMA-GARCH,

double seasonal holt winter (DSHW), Trigonometric regres-

sors to model Box-Cox transformations autoregressive mov-

ing average errors trend seasonality (TBATS), dynamic re-

gression (DR), transfer function (TF), AR with exogenous

input (ARX), threshold ARX (TARX), Hsieh-Manski ARX

(HMARX), smoothed nonparametric ARX (SNARX), full

ARX (FARX), FARX-least absolute shrinkage and selection

operator (FARX-LASSO), FARX with elastic net (FARX-

EN), MLP, RBF, SVR, self-organization map (SOM)-SVR,

SVR-ARIMA, random forest (RF), extreme gradient boost-

ing (XGB), DNN, LSTM, GRU and CNN. For comparison,

the performance of all forecasting models is evaluated us-

ing symmetric MAPE (sMAPE) as a performance metric.

The simulations are carried out using Python. The results

demonstrate that the proposed models outperform other ex-

isting models and, among these four models, each performs

best in one interval or another.
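The Box-Cox preprocessing mentioned above is a one-line power transform; the sketch below uses a hand-picked λ on hypothetical price values, whereas in practice λ is estimated from the data:

```python
import numpy as np

def boxcox(x, lam):
    """Box-Cox power transform; x must be strictly positive."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1) / lam

def inv_boxcox(z, lam):
    """Map model outputs back to the original scale."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)

prices = np.array([20.5, 35.0, 80.2, 150.9, 410.0])  # spiky price series
z = boxcox(prices, lam=0.2)       # compresses the spikes before modelling
back = inv_boxcox(z, lam=0.2)     # forecasts are mapped back afterwards
print(np.round(back, 1))  # recovers the original prices exactly
```

Compressing price spikes this way stabilizes the variance that the deep models see during training, and the inverse transform restores the forecasts to the original scale for evaluation.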

In related work [56], Wang et al. have used deep learning-

based stacked denoising autoencoder (SDA) model and its

extension, random sample SDA (RS-SDA), for short term

electricity price forecasting. These models are used for two

types of predictions, hourly day-ahead, and hourly online

price. Autoencoders are NNs which encode the input us-

ing an activation function before learning. It is an unsuper-

vised learning algorithm where instead of assigning labels,

the symmetry in data is learned. Before generating the out-

put, the values are decoded back to the original input format.

The output generated by autoencoders may have noise in it,

this issue is overcome by using denoising autoencoder which

encodes the given input data and also removes the noise from

it. The proposed SDA is the combination of multiple denois-

ing autoencoder layers, where the input of each autoencoder

is the output of the hidden layer of the previous autoencoder.

The RS-SDA incorporates random sample consensus and

stochastic neighbor embedding. The former is an iter-

ative process used to eliminate the effects of outliers that oc-

cur during the construction of the model and the latter is used to

improve the forecasting accuracy by optimizing the number

of hidden layers. Data features are chosen with the help of

market traders and to check the relevancy of each feature, a

boosting tree algorithm is used. After feature selection, SDA

and RS-SDA are applied for forecasting, where a greedy lay-

ered architecture is used for training. It has two stages: a

pre-training stage, where autoencoder layers are trained it-

eratively using input data except the output layer and a ﬁne-

tuning stage, where all the layers are trained including the

output layer to increase the forecasting accuracy of the model

and here mini-batch gradient descent is used. This model

is used because it does not require the whole screening of

the dataset and parameters are not updated iteratively. After

training and tuning of the forecasting models, the electric-

ity price value is forecasted. The proposed model is tested

by implementing it on the dataset from Mid-continent Inde-

pendent System Operator Inc. in the U.S., including the Ne-

braska Public Power District, Arkansas, Louisiana, Texas,

and publicly available Indiana. For simulations, Python 2.7

using a computer with Core i5 processor and 8 GB of RAM

is used. The proposed models are also compared with other

forecasting models (SDA, RS-SDA, classical NN and SVM)

to check their forecasting accuracy. MAPE, MAPE(day) and

MAPE(month) are used as performance metrics. The limi-

tation identiﬁed in this paper is that the temperature feature

is not included as input. It is a very important feature which

aﬀects the values of the price.
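The core of a denoising autoencoder, encoding a corrupted input with a tanh layer and training the reconstruction against the clean signal, can be sketched in plain numpy; this uses tiny synthetic data and a single hidden layer rather than the paper's stacked architecture:

```python
import numpy as np

rng = np.random.default_rng(5)

# Clean signals living on a 2-D subspace of 6-D space, plus input noise
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 6)) * 0.5
X_noisy = X + rng.normal(scale=0.3, size=X.shape)

n_in, n_hid, lr = 6, 3, 0.05
W1 = rng.normal(scale=0.1, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_in)); b2 = np.zeros(n_in)

losses = []
for _ in range(3000):
    H = np.tanh(X_noisy @ W1 + b1)   # encode the corrupted input (tanh activation)
    out = H @ W2 + b2                # decode back to the input space
    err = out - X                    # target is the CLEAN signal
    losses.append(np.mean(err ** 2))
    # Backpropagation through the two layers
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1 = X_noisy.T @ dH / len(X); gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print(losses[0], losses[-1])  # reconstruction error drops as training proceeds
```

Stacking simply repeats this recipe, feeding each autoencoder's hidden activations to the next layer during the greedy pre-training stage before the fine-tuning pass.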

In the study [57], Lu et al. have proposed a NN based

short-term load forecasting system which is an improved ver-

sion of RBFNN. A new clustering algorithm, PCA based

weighted fuzzy C-Mean (PCA-WFCM), is proposed to de-

termine the optimal basis function centers which improves

the accuracy of the forecasting model. The new forecasting

model is named as RBF-PCA-WFCM. For the implemen-

tation of the proposed model, the dataset from NSW State,

Australia is used. This is a half-hourly dataset with 48 records

per day and the duration of the data is from April 2011 to

October 2011. The traditional RBFNN is based on approximation

theory and is a feed-forward network. It has a simple

architecture which makes it easy to train. It has three impor-

tant layers: input layer, hidden layer and output layer. The

source data is entered in the input layer which is forwarded to

the hidden layer using non-linear transformation. This trans-

formation is used to extract important features from data. In

the next step, the data is forwarded to the output layer us-

ing a linear transformation. In this forecasting model, the

selection of an optimal number of neurons for each layer is

very important: the number of input layer neurons is equal to the number of input variables and the number of out-

put layer neurons is equal to the number of required out-

puts, whereas the optimal number of hidden layer neurons

is determined by minimizing the MAPE value. Moreover,

the connection weights of output layers also play a very im-

portant role in forecasting accuracy. Here, the gradient de-

scent method is used to train them. The most important task

Rabiya Khalid and Nadeem Javaid: Preprint submitted to Elsevier Page 18 of 35

of the RBFNN model is ﬁnding the accurate center points

of RBF which improve the forecasting accuracy. The tradi-

tional RBFNN uses the K-mean algorithm for this purpose,

whereas, the authors have proposed a new PCA-WFCM al-

gorithm. PCA is used for dimension reduction of data and

clustering is based on fuzzy logic, where each point has some

degree of membership with each cluster. To evaluate the

performance of the proposed system, MAPE and MSE are

used as performance metrics. Its performance is also com-

pared with RBF and RBF fuzzy C-Mean (RBF-FCM) fore-

casting models. The results demonstrate that the proposed

algorithm outperforms these two forecasting models and has

higher accuracy. The limitation identiﬁed in this network is

that the gradient descent algorithm is not suitable for big data

problems, and the number of hidden layer neurons could be tuned using some optimization algorithm.
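The fuzzy membership idea underlying WFCM can be illustrated with a minimal one-dimensional sketch (the PCA step and the weighting scheme of the proposed PCA-WFCM are omitted; the function name and fuzzifier value are illustrative):

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership degrees of one point to each center;
    m > 1 is the fuzzifier controlling how soft the assignment is."""
    dists = [abs(point - c) for c in centers]
    for i, d in enumerate(dists):       # point on a center: crisp assignment
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dists[i] / dists[k]) ** exp for k in range(len(centers)))
            for i in range(len(centers))]
```

Memberships sum to one, and nearer centers receive larger degrees: fuzzy_memberships(0.5, [0.0, 2.0]) assigns degree 0.9 to the nearer center.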

Meanwhile, in [58], the forecasting problem of solar power

generation is addressed by using the least absolute shrinkage

and selection operator (LASSO) based algorithm. The his-

torical data related to weather is used and the signiﬁcance of

each weather variable is also computed to acquire the knowl-

edge about their importance. To estimate the forecasting

model's coefficients, a Kendall's tau coefficient based algorithm is proposed, which maximizes the values of Kendall's coefficients to find the solution. LASSO reduces the weights

of irrelevant variables and reduces the dimensions of the

data. However, there is a trade-oﬀ between reducing the

number of variables and the accuracy of the model. An increased number of variables results in better accuracy but also increases the computational time. The hyperpa-

rameter aﬀecting the forecasting accuracy of the proposed

algorithm is tuned by using gradient descent. The solution

path method is also integrated with this tuning method to

increase the eﬃciency of the model and reduce the compu-

tational eﬀorts, as using this new method the whole search

space is not explored for an optimal solution. The forecast-

ing performance of the proposed algorithm is compared with

a well-known forecasting technique i.e. SVM [59]. Two

datasets are used for the performance evaluation, the ﬁrst

one is recorded from Feb 2006 to Jan 2013 and the second dataset from April 2011 to Nov 2012. To compare the perfor-

mance of these three algorithms, RMSE and MAPE are used

as performance metrics. The simulation results depict the

outstanding performance of the proposed algorithm over two

benchmark algorithms. Hence, the LASSO based forecasting algorithm proves to be a promising model for the solar power forecasting problem.
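The shrinkage behaviour of LASSO described above can be illustrated with the soft-thresholding operator, the closed-form solution of the one-dimensional LASSO problem (a sketch only; the paper's Kendall's-tau-based estimator is more involved):

```python
def soft_threshold(z, lam):
    """Closed-form solution of min_w 0.5*(w - z)**2 + lam*|w|:
    shrink z toward zero by lam, zeroing small coefficients exactly."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

A strong coefficient survives shrunken, soft_threshold(3.0, 1.0) gives 2.0, while a weak (irrelevant) one is removed entirely, soft_threshold(0.5, 1.0) gives 0.0; this is how LASSO reduces the weights of irrelevant variables.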

Zheng et al. have proposed a short term load forecasting

model in the study [60]. This model integrates data preprocessing, decomposition and forecasting methods for accurate prediction and is named SD-EMD-LSTM, where the similar days (SD) selection method is used for preprocessing, em-

pirical mode decomposition (EMD) is used for data decom-

position and LSTM is used for forecasting of load in the short

term. In this model, in addition to the humidity, day type and

temperature data, day-ahead peak load is also used as an in-

put feature as short term load forecasting is aﬀected by this

feature. LSTM is a recurrent NN (RNN) which is suitable

for the long-term data dependencies. The classic RNN is not

suitable for the long-term forecasting problem as it does not

store the previous data for a long time. Whereas, the LSTM

based RNN overcomes this problem by integrating mem-

ory cells which store the state information, thus making it

suitable for long-term prediction. Additionally, researchers

have introduced the sequence-to-sequence architecture in this study, where the lengths of the input and output are variable and

the time scale of the forecasting model can be changed ac-

cording to the requirements. The hyperparameters of the

forecasting model are tuned using stochastic gradient descent. The forecasting error is minimized using MAPE

as an objective function. The forecasting performance of the

proposed model is compared with ARIMA, BPNN and SVR.

Additionally, it is also compared with LSTM, SD-LSTM and

EMD-LSTM. MAPE is used as a performance metric and

its results demonstrate that the proposed framework has the

highest forecasting accuracy.
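Minimizing MAPE by stochastic gradient descent, as done above, can be sketched with a toy one-parameter linear model (illustrative only; the authors tune an LSTM, not this model):

```python
import random

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Toy data: the load is exactly twice the input feature.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]

random.seed(0)
w = 0.5                                   # single model parameter
for epoch in range(100):
    lr = 0.1 / (1 + 0.05 * epoch)         # decaying learning rate
    order = list(range(len(xs)))
    random.shuffle(order)                 # stochastic: random sample order
    for i in order:
        err = w * xs[i] - ys[i]
        grad = (1.0 if err > 0 else -1.0) * xs[i] / ys[i]  # subgradient of |err|/y
        w -= lr * grad
```

After training, w settles close to the true value 2.0 and the resulting MAPE is small.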

4.3.3. Cross validation

In the study [61], a forecasting model of electricity con-

sumption is proposed for event venues. The green button

data is used for model training. Such forecasting models can

help the venue organizers to estimate the load consumption

in advance and add its cost to their fee. Moreover, in such

events, energy is consumed in a huge amount and this de-

mand can put a burden on grids. The energy consumption

pattern of such events has huge variations as compared to the

oﬃce buildings which have strict energy consumption pat-

terns. In this work, two forecasting models are implemented

for prediction: NN and SVR. For NN, the feed-forward NNs

are selected as they are the most commonly used NNs. Such a network has three layers and information travels in one direction only.

The neurons of adjacent layers are connected, but there is no connection between the neurons of the same layer. On the other

hand, the SVR is used for classiﬁcation and regression-based

problems. In the ﬁrst step, the dataset is divided into a test

set and training set. Then these datasets are used to train both

forecasting models. The hyperparameters of both models are

tuned using a cross validation technique. After model train-

ing, the prediction is made and the accuracy of prediction

is checked using MAPE and coeﬃcient of variance metrics.

Three case studies are designed for predictions. In the ﬁrst

case study, the time interval is 15 min, and for second and

third case studies, the prediction is made for one hour and

one-day intervals, respectively. The duration of the dataset

used for forecasting is of two years. The performance com-

parison of both forecasting algorithms depicts that the feed-

forward NN has better performance than SVR.
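The cross validation tuning step can be sketched as a grid search over a candidate hyperparameter, scored by k-fold validation error (a toy one-feature ridge model; the λ grid and fold count are illustrative, not the paper's settings):

```python
def fit_ridge(data, lam):
    """Closed-form one-feature ridge fit: w = Σxy / (Σx² + λ)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def cv_error(data, lam, k=3):
    """k-fold cross validation error (MAE) for a given λ."""
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        w = fit_ridge(train, lam)
        total += sum(abs(w * x - y) for x, y in folds[i]) / len(folds[i])
    return total / k

# Noise-free toy data y = 3x: the unregularized model should win.
data = [(float(x), 3.0 * x) for x in range(1, 10)]
best_lam = min([0.0, 1.0, 10.0], key=lambda lam: cv_error(data, lam))
```

Each candidate is trained on k−1 folds and validated on the held-out fold, so the chosen hyperparameter is the one that generalizes best rather than the one that fits the training set best.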

In [62], an inverse optimization scheme is proposed to

forecast the electricity load consumption. It has two levels, i.e., an upper-level problem and lower-level problems. The former deals with the estimation of the bid in the market; this bid is placed by either a consumer of electricity or a retailer. The inverse optimization scheme is used at this stage. On the other hand, the latter deals with the price response of a group of consumers. The parameters of this level are determined by the successful bid placed in the upper-level problem. The placed market bid represents the predicted amount of load for the future. The

tuning parameters of this model are optimized using cross

validation. The values of these parameters are optimized for

each month. The benchmark model ARX is used to check

the effectiveness of the proposed scheme. For a fair compar-

ison, the dataset of Dec 2006 is used. The performance is

compared based on MAE, RMSE and MAPE performance

metrics. The simulations are carried out using a Linux based

system having a quad-core processor with a clock speed of

2.9 GHz and 6 GB of RAM. R studio is used as a platform

along with CPLEX 12.3 to solve the optimization problems.

In related work [63], the short term load forecasting tech-

nique is presented for an educational building. The proposed

model has two stages and the dataset is collected from a uni-

versity having 32 buildings for the past ﬁve years. The data

related to weather conditions, calendar and university tim-

ings are collected. The on-line data of energy consumption

is monitored using KEPCO's smart system. The weather

information is collected from the Korean meteorological de-

partment. The data preprocessing step is also included and

values of input variables are normalized and transformed ac-

cording to the requirement. In the ﬁrst step of the proposed

forecasting model, the frequent patterns are identiﬁed from

the available dataset with respect to the weekdays using a

simple moving average method. It is a popular method that

highlights the frequent trends in data for the long term and

short term fluctuations in the dataset are ignored. It forecasts the load demand according to the frequent patterns identified in past data, which implies that if the identified patterns are not valid then it could generate poor forecasting values. So, owing to this issue, this model is not efficient enough to forecast the

electricity load demand alone. Another forecasting method

named RF is used in this model for forecasting. This model

is an ensemble method that uses the predictions of multiple DTs. It is efficient for a large amount of data and has high ac-

curacy. Moreover, its hyperparameters need less tuning and

their default values give good results in most of the cases.

Its basic parameters are chosen using the cross validation

method. Moreover, as the proposed model is a time series

model, so, its predictive accuracy becomes poor when the

gap between the training and forecasting time gets bigger.

To solve this problem, Moon et al. have used the time series

cross validation method. In this method, there are multiple

horizons and this approach focuses on one forecasting hori-

zon at a time. The simulations are carried out using RStudio

with R-3.0.2. The proposed model is also compared with

already existing forecasting models like SVR and NN. For

performance evaluation of these forecasting models, MAPE,

RMSE and MAE are used as performance metrics. In addi-

tion, DT, multiple regression, gradient boosting machine and

J. Moon et al. models are also compared with this proposed

forecasting model. The results of all models demonstrate the

higher accuracy of the proposed model in terms of all three

performance metrics.
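The simple moving average forecaster and the time series (rolling-origin) cross validation described above can be sketched as follows (the series values and window length are illustrative):

```python
def moving_average_forecast(history, window=3):
    """One-step-ahead forecast: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def rolling_origin_errors(series, start=4):
    """Time series cross validation: one forecasting horizon at a time,
    with the training window growing as the origin moves forward."""
    errors = []
    for t in range(start, len(series)):
        prediction = moving_average_forecast(series[:t])
        errors.append(abs(prediction - series[t]))
    return errors

series = [10, 12, 11, 13, 12, 14, 13]   # illustrative load values
errors = rolling_origin_errors(series)  # → [0.0, 2.0, 0.0]
```

Unlike ordinary k-fold splitting, the validation point always lies after its training window, so the gap between training and forecasting time that degrades time series models is evaluated honestly.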

Moghaddass et al. [64] have proposed an anomaly de-

tection method based on data collected from smart meters.

The major aim of this model is to detect the occurrence of an

anomaly in real-time and prevent it. An error count is mea-

sured at each customer’s side and delivered to the system

with which the customer is linked. This information is then

used to predict the occurrence of an anomaly. The math-

ematical formulas are deﬁned to compute the error count

along with anomaly detection formulas for both customers’

side and system side. Anomaly detection with a control limit

is also defined. When the health index of the severity level exceeds a certain threshold, an alarm is generated

which indicates the occurrence of a possible anomaly in the

system. The dataset is split into two sets, one with 80 percent of the data and the other having 20 percent. The first set is

used for the training of the model and the remaining 20 per-

cent is used for the testing purpose. The accuracy, precision

and false alarm generated by the system are computed and

accuracy is evaluated based on these measures. For tuning the hyperparameters of the system, the cross validation method is used.
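The threshold-based alarm and its evaluation by accuracy, precision and false alarm rate can be sketched as follows (the health-index values and threshold are hypothetical):

```python
def alarm_metrics(health_index, anomaly, threshold):
    """Raise an alarm when the health index exceeds the threshold,
    then score the alarms against the ground-truth anomalies."""
    alarms = [h > threshold for h in health_index]
    tp = sum(1 for a, t in zip(alarms, anomaly) if a and t)
    fp = sum(1 for a, t in zip(alarms, anomaly) if a and not t)
    tn = sum(1 for a, t in zip(alarms, anomaly) if not a and not t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / len(health_index)
    return precision, false_alarm_rate, accuracy
```

For instance, with health indices [0.1, 0.9, 0.8, 0.2], one true anomaly at the second position and a threshold of 0.5, two alarms are raised, of which one is correct: precision 0.5, false alarm rate 1/3, accuracy 0.75.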

In [65], an SVR based cascade failure prediction model

is proposed. A probabilistic framework is designed to main-

tain the historical database. This model collects both online

and oﬄine data from the grid. The online data includes the

information related to voltage, current and measurements of

power ﬂow from the grid. The oﬄine data includes past data

related to islanding of grid, blackouts and transfer outage.

The proposed framework has two phases i.e. one collects

and maintains the database and in the second phase SVR is

used to predict the power outages using this historical data.

The hyperparameters of SVR are tuned using cross valida-

tion. The kernel function is chosen carefully as it aﬀects the

complexity and smoothness of the prediction model. The

prediction generated by SVM is of binary nature which in-

dicates the occurrence of cascade power failure in the fu-

ture. This predicted value can be used to generate a warning

for possible failure. This system can be used in a real-time self-healing robust system in grid stations. Several scenarios of failure are designed and tested using SVR. For simulations, LIBSVM in MATLAB is used. The performance of the proposed model is satisfactory. However, the data pre-

processing is not included in the model which is a very im-

portant step and inﬂuences the prediction accuracy of a fore-

casting model. Moreover, the proposed forecasting model is

not compared with other forecasting techniques or models of the same area.

Zhao et al. [66] have proposed a voltage stability prediction model in SG. This model uses historical data for prediction. The data is generated using the power system analysis software package, a well-known software tool developed by the China Electric Power Research Center for data generation. The

number of data samples depends on the complexity of the

model. If the system is highly coupled then more test cases

are required, whereas, in the case of a loosely coupled sys-

tem, fewer test samples are required. After data generation,

PCA has been used for feature selection. For prediction, the

logistic regression-based model is selected. It is trained using a cross validation method. The cross validation method

tunes the hyperparameters of the forecasting algorithm and

improves the accuracy of the system. It is an online prediction model. The main contribution of this work is that an interval is defined for prediction. A moving window-based method is used and data flows into the matrices. When the matrices are full of data, the prediction is made about the stability of

the system for the next few seconds or minutes. When new

data enters the window the old data is not entirely replaced,

rather only a portion of old data is replaced by new data. The

performance of this model is evaluated by computing the ac-

curacy results of the predicted values. The comparison of the

proposed work with existing schemes is not presented.
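The partial-replacement moving window described above can be sketched as follows (the keep ratio is an illustrative assumption; the paper does not state the exact fraction of old data retained):

```python
def slide_window(window, new_data, keep_ratio=0.75):
    """Partial-replacement sliding window: retain the most recent
    keep_ratio share of the old window and append the new samples,
    so old data is only partially replaced by new data."""
    keep = int(len(window) * keep_ratio)
    return window[-keep:] + new_data
```

For example, slide_window([1, 2, 3, 4], [5]) returns [2, 3, 4, 5]: the oldest quarter is dropped and the new sample appended, preserving most of the recent history between predictions.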

4.3.4. NB

In [67], data mining techniques are employed for eﬃ-

cient load forecasting. The proposed model eliminates the

outliers and selects relevant features in the ﬁrst step. In the

next stage, two forecasting algorithms are used to predict

the load demand. The former stage is data preprocessing

which is carried out using data mining techniques, and the latter is the forecasting stage where a hybrid algorithm is used for

accurate and fast prediction of the load. For load forecast-

ing, Saleh et al. have proposed a hybrid algorithm named as

KN3B. It is a hybrid version of K-nearest neighbors (KNN)

and the NB technique. NB is used to assign optimal weights

to the training examples. The core idea is to replace the in-

put feature space with the weight space of KNN, NB assigns

weights and the model is trained. To evaluate the perfor-

mance of the proposed model, it is implemented on the EU-

NITE electricity dataset. To measure the accuracy of the

proposed model, precision, sensitivity and accuracy are used

as performance metrics. The performance of the proposed

model is compared with some popular forecasting models

e.g. Improved ARIMA, KNN, ANN and K-mean & KNN.

The simulation results depict that the proposed model out-

performs all the already existing models.

Lago et al. have addressed the problem of electricity

price forecasting in [68]. Two forecasting methods are proposed: one for single market price integration and the second for multiple market integration. A new analysis of

variance (ANOVA) based feature selection algorithm is pro-

posed for the data preprocessing. For price forecasting, DNN

based MLP model is used with multiple hidden layers. Traditionally, the weights are optimized using the Levenberg-Marquardt algorithm or gradient descent, but these algo-

rithms are not suitable for the model with a large dataset.

So, in this study Lago et al. have preferred stochastic gradi-

ent descent. Moreover, this network also has hyperparame-

ters that need proper optimization. For parameter optimiza-

tion diﬀerent conﬁgurations are available, evolutionary algo-

rithms are also used for this purpose. However, the former

approach does not provide an optimal solution because of

its fast decision making and the second approach is not suit-

able for the large datasets as its computational time is high.

Owing to this, the authors have used NB optimization, which requires a low number of function evaluations. It uses the information obtained from the previous samples and, in this way, the computational effort is reduced. From NB optimization, a structured Parzen estimator is used. For performance evaluation,

sMAPE is used as a performance metric. The dataset from

electricity markets of France and Belgium is used and for

simulations, Python is used. The performance of the proposed model is not compared with any benchmark forecasting model; the forecasting performance is evaluated with respect to the original values of the dataset.
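The sMAPE metric used for that evaluation can be sketched as follows (one common formulation; the paper may use a slightly different normalization):

```python
def smape(actual, predicted):
    """Symmetric MAPE, in percent: the absolute error is normalized
    by the average magnitude of the actual and predicted values."""
    return 100.0 * sum(2.0 * abs(p - a) / (abs(a) + abs(p))
                       for a, p in zip(actual, predicted)) / len(actual)
```

Unlike plain MAPE, sMAPE is bounded (here at 200 percent) and treats over- and under-forecasts more symmetrically, which matters for volatile price series.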

4.3.5. Dynamic integrated forecast system DICast

Meanwhile, in [69], Sulaiman et al. have described a big

data-based power generation forecasting system that is be-

ing used by the utility to forecast the power generation of

RES. Sun4Cast system is used to forecast the power gener-

ation from solar irradiations. This system is developed by

NCAR that works for utilities and independent system oper-

ators for developing a forecasting system which allows them

to plan the eﬃcient integration of variable power generation

sources. This model has two important components includ-

ing numerical weather power forecasts (NWP) and the Now-

cast system. Both systems have several forecasting models

integrated with them to forecast the accurate power genera-

tion value. The forecasting models under the NWP system include: weather research and forecasting (WRF), a newly developed forecasting method for solar irradiance; the global forecast system, which is used to forecast the weather conditions; the high-resolution rapid refresh (HRRR) model, which forecasts the weather conditions of a specific area (in this model, up to 3 km); and rapid refresh, which is similar to HRRR but has a wider domain and forecasts on an hourly basis. On the other hand,

the Nowcast system includes TSICast, StatCast, CIRACast,

MADCast for cloud coverage prediction and WRF-Solar-

Now for solar irradiation. The DICast is used to integrate

the NWP forecasts, optimizes their weights and generate the

ﬁnal prediction. These forecasted values are used by the

ISO partners to plan their daily operations related to power.

The day ahead prediction is acquired using NWP. Whereas,

hourly prediction of the same day is acquired through the

NowCast system. This system makes the integration of RES efficient and encourages the partners to plan their day-to-day operations and rely on the RES.
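The DICast consensus step, which blends the member forecasts with optimized weights, can be sketched as a weighted average (the weight-optimization itself, which DICast performs against recent observations, is omitted):

```python
def blended_forecast(forecasts, weights):
    """Consensus forecast: weighted average of the member model
    forecasts, with weights normalized to sum to one."""
    total = sum(weights)
    return sum(w / total * f for w, f in zip(forecasts, weights))
```

With equal weights the blend is a plain mean, blended_forecast([10.0, 20.0], [1.0, 1.0]) gives 15.0; giving the first model three times the weight pulls the consensus toward it, yielding 12.5.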

4.3.6. Quasi-Newton method

Gonzalez et al. have proposed an ARMA exogenous

(ARMAX) based functional time series forecasting method

which is scaled over Hilbert space in the study [70]. This

model is named as ARMA Hilbertian model with exoge-

nous variables (ARMAHX) and it is capable of modeling the

complex dependencies of time on a time series curve. It also

includes the explanatory variables necessary for the time se-

ries. The kernel function of the Hilbert operator is modeled

as the weighted sum of the sigmoid function for the sim-

plicity of the model. The hyperparameters of this model are

tuned by using the Quasi-Newton method. The weights are initialized randomly at the first iteration and tuned at every iteration until the forecasting accuracy stops improving. The

forecasting model is employed on two time series of electricity price data from Jan 2014 to Dec 2015. Its performance

is compared with MLP, DR model and periodic model in

terms of MAE, RMSE and dynamic weighted MAE, the re-

sults show that the proposed forecasting algorithm outper-

forms these models and generates more accurate and reliable

results.
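The quasi-Newton idea can be illustrated in one dimension: the second derivative is never computed exactly but approximated from successive gradient differences (here the secant method applied to the gradient; the actual ARMAHX tuning is multivariate):

```python
def quasi_newton_min(grad, x0, x1, iters=50):
    """Minimize a smooth 1-D function by driving its gradient to zero,
    approximating the curvature from successive gradient differences."""
    g0, g1 = grad(x0), grad(x1)
    for _ in range(iters):
        if g1 == g0:            # gradient flat between iterates: stop
            break
        x0, x1 = x1, x1 - g1 * (x1 - x0) / (g1 - g0)
        g0, g1 = g1, grad(x1)
    return x1
```

For the quadratic (x − 3)², whose gradient is 2(x − 3), the method lands on the minimizer 3.0 in a single step, since the curvature estimate is exact for quadratics.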

4.3.7. Levenberg-Marquardt

Progressing further, in [71], a new short term load fore-

casting model in MG is proposed. In this model, self-recurrent

wavelet NN (SRWNN) is used for the prediction. The per-

formance of this model is evaluated using real-world data.

The dataset from March 2012 to March 2013 of the British

Columbia Institute of information technology is used for the

case study. The WT is used for data preprocessing and as

an activation function for NN. The SRWNN is the hybrid

version of wavelet NN (WNN) and RNN. It combines the

dynamic properties of RNN and fast convergence of WNN

which is applied to the non-linear problems. SRWNN also

has memory to store the information related to past wavelets

which helps it to solve the complex non-linear problems eﬃ-

ciently. It is specially designed to tackle the ﬂuctuations and

volatility of the load time series of the MG. The free parame-

ters of this algorithm need tuning, as the Morlet wavelet function is differentiable with respect to the free parameters. The authors have used the Levenberg-Marquardt algorithm, which was used by Hagan and Menhaj [72] to tune NNs. It is used in the literature

to solve the optimization problems and is recommended by

researchers because of its accurate training and fast conver-

gence rate. For performance evaluation, two performance

metrics are used i.e. NRMSE and NMAE. The values of

these metrics are compared with the same parameter values obtained from both MLP and WNN. The results depict that SRWNN has better performance in terms of both performance metrics.
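The NRMSE and NMAE metrics can be sketched as follows (normalizing by the range of the actual series is an assumption here; normalization by the mean is also common in the literature):

```python
def nmae(actual, predicted):
    """MAE normalized by the range of the actual series."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mae / (max(actual) - min(actual))

def nrmse(actual, predicted):
    """RMSE normalized by the range of the actual series."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return mse ** 0.5 / (max(actual) - min(actual))
```

The normalization makes errors comparable across series of different magnitude, which is useful when benchmarking SRWNN against MLP and WNN on separate datasets.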

4.3.8. Excavated association rules

In [73], an RBFNN based prediction model is used for the parameter values of power transformers in SG. This prediction

helps to identify the possible failure of the power system.

An RBF is used as an activation function in its hidden layer.

The weights of this prediction model are tuned using exca-

vated association rules. A data-driven model is proposed

in this study, where the Apriori method is combined with a

probabilistic graph model for association rule mining. In this

method, the Apriori algorithm is used to mine the frequent

patterns in a dataset, these items are represented by a prob-

abilistic graph and association rules are excavated by these

graphs. The dataset is searched only once and to discover the

new association rule, the graph is traversed. The complexity

of traversing a graph is also less than searching the whole

dataset. Moreover, the coeﬃcients of support and conﬁ-

dence between rules are also computed. In this model, at

the ﬁrst stage, the state variables are read from the database.

In the next step, the Apriori algorithm is used for the binary

frequent item set. The conditional probability distribution

is computed of each set. The directed graphs are plotted,

where state variables act as nodes and frequent itemsets and

their probability is treated as edges between them. In the ﬁ-

nal step, the association rules are mined and their support

and conﬁdence are computed. The performance of the fore-

casting algorithm with and without association rule mining

is evaluated. Simulations results depict that the association

rules play a very important role in the improvement of the

forecasting accuracy.
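The support and confidence coefficients computed for the mined rules can be sketched as follows (the transactions shown are illustrative, not the paper's state-variable data):

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = support(A union B) / support(A)."""
    joint = set(antecedent) | set(consequent)
    return support(transactions, joint) / support(transactions, antecedent)

# Illustrative transactions over two state variables 'a' and 'b'.
transactions = [['a', 'b'], ['a'], ['b']]
```

Here support(['a']) is 2/3 and confidence of the rule a → b is 0.5: of the two transactions containing 'a', one also contains 'b'.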

4.3.9. Non-homogeneous Poisson process (NHPP)

Yue et al. [74] have proposed an electricity outage pre-

diction model based on BN. This model uses historical data

based on radar observations. The regression models are de-

veloped to compute the failure rate of the grid’s components.

These models use the data related to the failure of grid com-

ponents and information acquired using radar. The radar

data contains the information related to the peaks of dif-

ferent weather conditions along with the duration of these

peak values. In the existing literature, only peak values of

weather conditions are used, but according to the authors,

the duration of these peaks also aﬀects the failure rate of grid

components as more time they are exposed to bad weather

conditions, the more chances of their failure are there. The

proposed Bayesian outage prediction (BOP) algorithm uses

both historical data about outages and failure rate models.

The performance of this algorithm is improved by the inte-

gration of NHPP. The test results demonstrate that the pre-

diction of the BOP algorithm has improved signiﬁcantly when

NHPP is integrated. For simulation, MATLAB is used. The

proposed algorithm uses generic data so it can easily be ap-

plied to any grid without any technical changes. Besides,

the authors have mentioned that there is a trade-off between the coverage area and the accuracy of the model. If a large area is considered, the computational effort increases, but as more data becomes available, the accuracy will be high. On the other hand, if a small area is considered, the computational effort is lower, but at the same time the variety in the data will be low, which can reduce the accuracy of the model. So, a moderate area should be considered.
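The defining property of an NHPP is a time-varying intensity λ(t); the expected number of failures over an interval is the integral of the intensity. A minimal numerical sketch (the intensity functions used below are illustrative, not the paper's weather-driven failure rates):

```python
def expected_failures(intensity, t_start, t_end, steps=1000):
    """Expected event count of an NHPP over [t_start, t_end]:
    the integral of the intensity, via the trapezoidal rule."""
    h = (t_end - t_start) / steps
    total = 0.0
    for i in range(steps):
        a = t_start + i * h
        total += 0.5 * (intensity(a) + intensity(a + h)) * h
    return total
```

A constant intensity of 2 failures per hour over 3 hours gives an expected count of 6, recovering the homogeneous case; a rising intensity λ(t) = t over [0, 2] gives 2, reflecting that longer exposure to worsening conditions raises the expected failure count.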

4.3.10. Alternating direction method of multipliers

In [75], Yu et al. have addressed the problem of electric-

ity load forecasting of an individual residential building. It

is a very complex task to forecast the load behavior of a sin-

gle building. The behavior of individual buildings is very

stochastic and volatile as compared to the load consump-

tion behavior of a city or group of buildings. For prediction,

authors have used sparse coding. Dataset is obtained from

smart meters of the SG data analytics project in collabora-

tion with EPB of Chattanooga, Tennessee. The load data is

collected from 5000 meters along with hourly temperature in

both winter and summer as the temperature is a very impor-

tant feature for load forecasting. Sparse coding is frequently

used in image and signal processing. For load consumption


Table 3
Forecasting models and hyperparameter optimization techniques

Forecasting technique | Forecasting domain | Optimization method | Dataset | Publishing year
Genetic-RVM [30] | Price | GA | Real-world prices from the New England electricity market | 2015
Game-theoretical based adjusted end-user forecasting model [31] | Renewable power | GA | A local MG in Hebei Province, China | 2017
ESN and PCA decomposition [32] | Load | GA | ACEA | 2015
NN [33] | Wind power | GA | Historical measurement records of the wind farm SCADA system database and meteorological variables of the NWP model | 2017
RVM [34] | Price | MGA | New England electricity market | 2015
Combined NN based model [35] | Load | CSA | Electricity power data from February 2006 to 2009 (New South Wales), August 2006 to 2008 (Victoria) and November 2006 to 2008 (Queensland), Australia | 2015
DCANN [37] | Price | CSA | Australian electricity market | 2016
SVR [38] | Load | MFA | Practical daily load data of Fars province, Iran, published by the Fars Electrical Power Company | 2014
MIMO-LSSVM model [39] | Load and price | QOABCO | Real data of the New York Independent System Operator | 2015
IENN [40] | Load | NSSA optimization algorithm | Customized dataset | 2017
Neural predictors FNN and RBF [41] | Demand | PSO | Customized dataset | 2017
Online SVR [42] | Load | PSO & ACO | Public Irish CER dataset | 2017
ARIMA/SVR [43] | Load | Metaheuristic algorithms | Customized smart metering infrastructure | 2016
Generalized neuron model [45] | Price | Improved environment adaptation method | Electricity market of New South Wales | 2017
BN [46] | Demand | Tabu search | Pacific Northwest National Lab | 2017
Multi- and uni-variate models [47] | Price | Grid search | Nordic power exchange, Nord Pool Spot, owned by the Nordic and Baltic transmission system | 2015
CNN and K-means [48] | Load | Trial-and-error method and Adam optimizer | Big electricity load dataset from the power industry | 2017
Deep learning based models [49] | Load | Cross validation, parameter grid search | Data from a campus building at the Hong Kong Polytechnic University | 2017
MLR, MLP, SVR [50] | Load | Grid search | Customized dataset | 2015
Black and grey box testing models [52] | Load | Grid search | Obtained from simulations | 2015
ANN [53] | Price | Grid search | EPEX for the German/Austrian power market | 2016
CNN [54] | Load | Stochastic gradient descent | Benchmark individual household electric power consumption dataset | 2017
Deep learning methods [55] | Price | ADAM optimizer | Day-ahead market in Belgium, i.e., European power exchange Belgium | 2018
SDA and RS-SDA [56] | Price | Gradient descent | Nebraska, Arkansas, Louisiana, Texas, and Indiana | 2017
RBFNN model based on PCA-WFCM [57] | Load | Gradient descent | Real-time load data (in MWh) of New South Wales State, Australia | 2016
LASSO [58] | Power generation | Gradient descent | Three different datasets gathered in both the US and UK | 2018
SD-EMD-LSTM [60] | Demand | Stochastic gradient descent | ISO New England | 2017
FFNN/SVR [61] | Load | Cross validation | 2 years of event data from Green Button ISO hubs in the U.S. | 2016
Inverse optimization scheme [62] | Load | Bi-level programming problem | Actual data obtained from a real-life experiment | 2016
RF [63] | Load | Cross validation | Customized data of a university campus | 2018
Anomaly detection model [64] | Anomaly prediction | Cross validation | Self-generated data | 2017
SVM [65] | Blackout prediction | Cross validation | Monte-Carlo simulations | 2014
Logistic regression [66] | Voltage stability prediction | Grid search | Power system simulation software PSASP | 2016
Hybrid KN3B predictor [67] | Load | NB | EUNITE electrical load dataset | 2016
DNN [68] | Price | Parzen estimator based NB | Data from EPEX-Belgium and EPEX-France power exchanges | 2017
NWP [69] | Power generation | DICast | Not mentioned | 2017
ARMAX time series model [70] | Price | Quasi-Newton algorithm | Spanish electricity market operator | 2018
SRWNN [71] | Load | Levenberg-Marquardt | British Columbia's and California's power system data | 2015
RBFNN [73] | State parameters of power transformers | Excavated association rules | State parameter data of five 500 kV power transformers | 2016
Bayesian approach [74] | Power outage prediction | NHPP | Radar data and local surface meteorological measurements from national weather service stations | 2017
Sparse coding [75] | Load | Alternating direction method of multipliers | Electric Power Board of Chattanooga | 2017

information from multiple meters, a dictionary D is learned,

having q dimensions. Two types of sparse codes are devel-

oped in this study. The first one is basic sparse coding, which has a penalty for sparsity and a squared loss for the reconstruction error. The second model is group sparse coding; it is the same as basic sparse coding except that the difference lies in the penalty function: the single penalty is exchanged with a group penalty. The dictionary learning problem of this method is solved using the alternating direction method of multipliers. This method is commonly used to obtain the optimized values for the objective function. After learning the sparse code, a regres-

sion model is trained for prediction. In this study, the rigid

regression model is used for day-ahead and next week load

forecasting. This model is chosen instead of frequently used

SVR model because both have similar forecasting accura-

cies and rigid regression takes less time in training. This

model solves the optimization problem and learns the re-

gression weight vector by using the input values. The perfor-

mance of the proposed model is compared with ARIMA and

Holt-Winters forecasting model. All the forecasting mod-

Rabiya Khalid and Nadeem Javaid: Preprint submitted to Elsevier Page 24 of 35

els are implemented on the dataset of electricity consump-

tion data of house-holds in Chattanooga, TN. The duration

of this dataset is from Sep 2011 to Aug 2013. The data of

meters consuming low power for two months is eliminated

from the dataset, as it happens rarely and such kind of data

can cause less accurate forecasting results. The temperature

data is also of the same duration and acquired from an air-

port. To compare the performance of the proposed model

with benchmark models, MAE, RMSE and MAPE are used

as performance metrics. The simulations are carried out

using MATLAB on Intel Xeon quad Core 3.33 GHz CPU

and 24Gb of RAM. For sparse code MEX written in C is

used. The simulation results depict the eﬀectiveness of the

proposed sparse coding based forecasting model. The best

performance in terms of all performance metrics is of basic

sparse based model and group sparse based model is second-

best.
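The ridge regression step above has a simple closed form. The snippet below is a minimal single-feature sketch with hypothetical numbers, not the authors' implementation: ridge adds an L2 penalty lam * w^2 to the squared loss, which shrinks the learned weight and keeps training cheap.

```python
def ridge_fit_1d(x, y, lam):
    """Closed-form ridge regression for one feature (no intercept):
    minimizes sum((y - w*x)^2) + lam * w^2, giving w = <x, y> / (<x, x> + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

# Hypothetical sparse-code activations vs. next-day load
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
w_ols = ridge_fit_1d(x, y, lam=0.0)   # lam = 0 reduces to ordinary least squares
w_reg = ridge_fit_1d(x, y, lam=10.0)  # a larger lam shrinks the weight toward zero
```

With many features the same shrinkage applies to the whole weight vector, which is why ridge trains faster than SVR at comparable accuracy.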

Table 3 contains the forecasting methods and their respective hyperparameter optimization methods used in the literature. It also contains information on the dataset and the publication year of each study.

5. Data preprocessing

In the previous section, we discussed optimization methods for the hyperparameter tuning of forecasting algorithms in detail. Another important factor which affects the forecasting accuracy of these methods is the quality of data. A dataset usually contains noise and needs to be cleaned before being used for forecasting. In this section, we discuss data preprocessing methods used in the literature for data cleaning.

Wang et al. have proposed a novel electricity price forecasting model in [30]. This model has three essential modules: a feature selector, a feature extractor and a forecasting module. For feature selection, a new hybrid model is proposed which combines the RF and ReliefF algorithms. Both algorithms evaluate the importance of the input features independently, and the resultant values from both algorithms are then considered jointly for the selection or rejection of a feature. The joint value is compared with a predefined threshold and features having lower values are discarded. In this way, the less important and irrelevant features are filtered out of the dataset. However, some redundancy still exists in the dataset, so the resultant dataset from the first module is sent to the feature extraction module of the proposed model, kernel PCA (KPCA), a variant of PCA. PCA is commonly used for feature extraction and redundancy elimination; however, it linearly maps data from a high dimension to a lower dimension. Electricity price forecasting needs non-linear mapping, so KPCA is used as it performs non-linear dimension reduction. The third and final module of the proposed model is the price forecasting module.
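The joint selection rule described above can be sketched as follows; the scores and threshold here are hypothetical, and in the actual model the two score vectors come from RF and ReliefF.

```python
def joint_feature_filter(rf_scores, relief_scores, threshold):
    """Combine two per-feature importance score lists and keep the feature
    indices whose averaged score clears a threshold. Scores are min-max
    scaled first so the two methods become comparable."""
    def scale(s):
        lo, hi = min(s), max(s)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in s]
    rf, rl = scale(rf_scores), scale(relief_scores)
    joint = [(a + b) / 2 for a, b in zip(rf, rl)]
    return [i for i, v in enumerate(joint) if v >= threshold]

# Hypothetical importance scores for three candidate features
keep = joint_feature_filter([0.9, 0.1, 0.5], [0.8, 0.2, 0.4], threshold=0.5)
```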

In [35], three datasets are used for prediction; all of them are in raw form and need preprocessing to eliminate redundancy and spikes from the data. The datasets are therefore categorized according to the different days of the week, as each day has different load behavior. Moreover, a data preprocessing technique, longitudinal data selection, is also employed to make the data more reliable and improve forecasting accuracy. Meanwhile, in [67], the proposed model eliminates the outliers and two algorithms are used for feature selection. The primary goal of the outlier elimination step is to discard those data objects from the dataset which show exceptional and rare behavior compared to the rest of a large dataset, e.g., data objects recorded on a special event like Christmas or New Year's Eve. Outliers are a main cause of overfitting of the forecasting model, as they are unwanted and rare training patterns with misleading behavior. So in this study, distance-based outlier rejection (DBOR) is employed. For feature selection, Saleh et al. have used two algorithms. The first is a wrapper-based feature selection algorithm. As this algorithm has some characteristics of GA, it explores the search space; however, like other wrapper algorithms, it can only detect local maxima. To overcome this limitation, a filter-based feature selection model, recursive best-first search, is used. It uses an evaluation function to calculate the importance of every feature. This new hybrid feature selection algorithm is named UHFS. The selected features are then sent for load forecasting. Moreover, the effectiveness of the preprocessing method is evaluated using different scenarios, i.e., DBOR+UHFS, DBOR and UHFS. The results show that the highest accuracy is acquired when DBOR and UHFS are used together.
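A minimal sketch of a distance-based outlier test in the spirit of DBOR, using the classic DB(frac, r) definition and assuming one-dimensional load values; the paper does not give the exact formulation, so this is an illustration only.

```python
def db_outliers(points, r, frac):
    """Classic distance-based outlier test DB(frac, r): a point is an
    outlier if less than `frac` of the other points lie within distance r."""
    out = []
    for i, p in enumerate(points):
        near = sum(1 for j, q in enumerate(points)
                   if i != j and abs(p - q) <= r)
        if near / (len(points) - 1) < frac:
            out.append(i)
    return out

# Hypothetical daily loads; the last value is a rare holiday spike
daily_load = [10.1, 10.3, 9.9, 10.2, 25.0]
flagged = db_outliers(daily_load, r=1.0, frac=0.5)
```

Removing the flagged objects before training is exactly what prevents the misleading, rare patterns described above from being memorized by the model.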

In [32], the values of the dataset are rescaled to the range [0, 1] using a unity-based normalization method. It is evident from the literature that time series based multivariate forecasting has high complexity, which can be efficiently reduced by using SVD. So, in this work, Bianchi et al. have used the SVD-based dimension reduction method KPCA. It is a statistical method that generates principal components by applying an orthogonal transformation to correlated variables. The orthogonal property of KPCA enables each column to predict the price individually, as the value of a(i, j) depends on a(i-1, j), a(i-2, j), and so on. The authors took advantage of this property, considered each column a separate time series and used them for prediction individually. In the end, all predicted values are integrated for the final result. The proposed model in [39] has three steps: the first two steps are for data preprocessing and the third step is for prediction.

In the first step, the less important features are filtered out using a new feature selection algorithm. This algorithm uses greedy search and selects the features which have the highest correlation with the already selected features. It is named generalized mutual information (GMI). In the next step, the dataset is divided into several subsets using the wavelet packet transform (WPT) method. The basic steps of WPT are the same as those of the discrete wavelet transform, except that WPT decomposes the detail coefficients in addition to the approximation coefficients, so information is not lost during the decomposition process. The WPT has multiple branches and the best branch is selected using the Shannon entropy criterion. The introduction of this new selection criterion into the WPT branch selection method is also one of the contributions of this work. For data preprocessing, a new method based on the index of bad sample matrix (IBSM) is proposed in [37]. The existing feature selection methods in the literature select features without considering their reliability. Furthermore, the number of selected features is often fixed and the contribution of each input feature to the output is ignored. The proposed IBSM method addresses these limitations: it dynamically selects the relevant features, and bad samples are filtered out using the original indexes of the training samples. Implementation of IBSM is the first step of this method; in the next step, SOM is used. SOM is an unsupervised learning method belonging to the category of ANN, which maps the input space of training samples to a low-dimensional output space.
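The Shannon entropy criterion used for WPT branch selection can be sketched as follows: the entropy of a branch's normalized coefficient energies is computed, and a branch that concentrates the signal's energy scores lower. The coefficient vectors below are hypothetical.

```python
import math

def shannon_entropy(coeffs):
    """Shannon entropy of a branch's wavelet-packet coefficients, computed
    over the normalized squared coefficients (energy distribution); the
    branch with the lowest entropy concentrates the signal's energy best."""
    energy = sum(c * c for c in coeffs)
    probs = [c * c / energy for c in coeffs if c != 0]
    return -sum(p * math.log(p) for p in probs)

flat = shannon_entropy([1.0, 1.0, 1.0, 1.0])    # energy spread over all coefficients
peaked = shannon_entropy([2.0, 0.0, 0.0, 0.0])  # energy concentrated in one coefficient
```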

Liu et al. [40] have proposed a sliding window EMD (SWEMD) model for data preprocessing, which is an extension of the EMD model. For feature selection, a new algorithm is introduced which is based on Pearson's method of computing the correlation of the features. For data preprocessing, the temperature and load data are normalized over the interval [0, 1]. To further improve the data quality and reduce the fluctuations in values, SWEMD is applied. After this step, features are evaluated using Pearson's correlation method. This newly proposed method successfully reduces the dimensions of the dataset as it removes redundancy and selects highly correlated features. The method is named maximize the relevancy and minimize the redundancy based Pearson's correlation coefficients (MRMRPC). After feature selection, the data is forwarded to the forecasting engine for prediction. The dataset is preprocessed using the WT algorithm in [41], where fluctuations and uncertainties are removed from the dataset for accurate prediction. Progressing further, in [68], it is stated that the existing feature selection methods do not consider model performance during the filtering process, which results in redundant features; the relative importance of the features is also not computed. Moreover, in the case of a non-linear model, the input features are transformed from a higher dimension to a lower dimension, which may result in a loss of information from the input features. In this regard, a new feature selection algorithm is proposed, an ANOVA-based wrapper selection method, which selects features without transforming them into a lower dimension. In the first step, the features are modeled as hyperparameters. In the second step, a hyperparameter optimization algorithm, the tree-structured Parzen estimator, is used to optimize and select the optimal features. In the next step, the importance of the features is analyzed. The last step uses the feature importance values for the selection of the features: a threshold is defined and features having a value greater than this threshold are selected for forecasting.
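The relevance half of such a Pearson-correlation filter can be sketched as follows, with hypothetical feature columns; the full MRMRPC method additionally penalizes redundancy between the selected features.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def relevant_features(features, target, threshold):
    """Keep features whose |Pearson correlation| with the target clears a
    threshold -- the relevance step of an MRMRPC-style selector."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]

# Hypothetical feature columns and load target
features = {"temperature": [1, 2, 3, 4], "noise": [5, 1, 4, 2]}
target = [2, 4, 6, 8]
kept = relevant_features(features, target, threshold=0.9)
```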

For data preprocessing in [69], the correlation of the input variables is computed using Pearson's correlation method. In the first step of the proposed forecasting model, the frequent patterns are identified from the available dataset with respect to the weekdays using the simple moving average method. It is a popular method which highlights the long-term trends in the data while short-term fluctuations are ignored. In [57], PCA is used for dimension reduction of the data and fuzzy-based clustering is used, where each point has some degree of membership in each cluster. In related work [48], the data is cleaned and a normalization operation is performed to shrink the interval of values by taking their log. After normalization, it is important to remove the irrelevant factors from the dataset, as their presence could affect the forecasting accuracy of the model. For this purpose, Pearson's correlation method is used to compute the relevance, and the computed value of each factor is compared with a threshold. Factors having values below this threshold are filtered out.
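The simple moving average used above for trend extraction is easy to state; a minimal sketch over a hypothetical load series:

```python
def moving_average(series, window):
    """Simple moving average: smooths short-term fluctuations so the
    long-term trend in the series stands out."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

sma = moving_average([1, 2, 3, 4, 5], window=3)
```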

Gonzalez et al. [70], instead of using functional PCA for data preprocessing, have proposed a functional data theory with standard time series which uses the sigmoid function to generate the appropriate parametric functional operator. This model solves the problem of information loss present in functional PCA. In [50], the available data is preprocessed in several steps. In the first step, the relevant features are selected which have higher predictive capacity and minimum redundancy. In the second step, the instances with missing values are removed; a total of 21.16% of the instances are removed in this step. Outlier elimination is also an important step of data preprocessing which improves the accuracy of the data up to a significant level. It involves a trade-off: if the outlier elimination criterion is too strict, useful information is also lost. During this step, 1.49% of the data is discarded. In the last step, the data is normalized. Before applying the forecasting model to the whole dataset, a subset of the data is selected as its representative. In this way, the computational cost is reduced without affecting the performance of the system.

Keles et al. [53] have used a moving median method to eliminate the trends and seasonal components from the available data. The autocorrelation function is applied to obtain information about the lags in the data. The capacity utilization function and the residual load indicator function are used to compute the ratio between residual load and available load and the extreme changes in residual data, respectively. The information obtained by these methods is used as input to the ANN model. The mutual information method is used to determine the suitable lag for the input variables; in this study, the authors apply it after data normalization. The next step is to identify the relevant input data, for which a KNN-based filter method is used. Backward elimination and forward validation procedures are used to select the subset of input data.
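The moving median detrending step can be sketched as follows; a centered window is assumed and edge handling is omitted for brevity, as the paper does not specify either.

```python
import statistics

def moving_median_detrend(series, window):
    """Subtract a centered moving median from the series, removing the
    local trend/seasonal level; edge samples are dropped for simplicity."""
    half = window // 2
    return [series[i] - statistics.median(series[i - half:i + half + 1])
            for i in range(half, len(series) - half)]

# Hypothetical series with one extreme residual value
detrended = moving_median_detrend([1, 2, 3, 10, 5], window=3)
```

The median (rather than the mean) keeps a single extreme value from contaminating the estimated trend, which is why it pairs well with the residual-load indicators described above.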

In related work [45], the WT model is used for data preprocessing. It is applied to the price time series, as the raw behavior of the series is not suitable for processing. This data preprocessing method separates the low and high-frequency components of the data and thereby improves the accuracy of the forecasting model. Chitsaz et al. [71] have also used WT for data preprocessing, classifying the dataset into low and high-frequency components. Both components of the


Table 4
Data preprocessing techniques and datasets used in the forecasting models

Software | Hardware | Data preprocessing techniques | Comparative techniques | Data duration
Python [30] | Intel Core i5, 4 GB RAM, and 500 GB hard disk | GCA based selector, KPCA having 50000 records | NB, DT | 2010 to 2015
Not mentioned [31] | Not mentioned | Not included | BP, SVM | September 2015 to October 2016
MATLAB using ESN toolbox [32] | Not mentioned | Unity based normalization | ARIMA, ESN | 3 years of dataset
MATLAB [33] | Core i5-5200 CPU, 2.20 GHz processor and 4 GB RAM | Not included | Double stage BP trained ANN | 1st May 2014 to 31st April 2015
Not mentioned [34] | Not mentioned | Not included | Individual RVM models, ARMA and the naive forecaster | January to December 2001
Not mentioned [35] | Not mentioned | Longitudinal data selection | BPNN, BPNN with hidden layers, GABP, RBF, GRNN | 2006 to 2009
MATLAB 7.0 & Windows 7 [37] | i7-3770 3.40 GHz CPU | SOM and IBSM | BPNN, LSSVM, DCANN, FNN, ARFIMA, GARCH | 2010
Not mentioned [38] | Not mentioned | Not included | ARMA model, ANN, SVR-GA, SVR-HBMO, SVR-PSO, SVR-FA | March 2007 to February 2010
MATLAB (R2011a) [39] | 2.53 GHz Pentium 2 processor with 4.0 GB of RAM | WPT and GMI | ANN | January 1 to March 1, 2014
Not mentioned [40] | Not mentioned | MRMRPC, normalization | ARIMA, SVM, BPNN, RBFNN, GRNN, fuzzy ARTMAP, WT+BPNN, WT+RBFNN, WT+GRNN, WT+FA, WT+FFA+FA, WT+MIMO+NN | August 10, 2015 to August 10, 2016
MATLAB with NN Toolbox [41] | Not mentioned | WT | BPNN, FNN+PSO, WT+BPNN, WT+FNN+PSO, EN, ARIMA, RBF | January 1, 2014 to December 31, 2014
Not mentioned [42] | Not mentioned | Not included | PSO vs ACO, RF-week, BAGG, online SVR, XRT-week, BAGG-week, DSHW, STL+ARIMA-week, XGB-week, XGB, XRT, STL+ES-week, RF, SVR-week, SVR, DLnet, MLP, STL+ARIMA, STL+ES | 2009 to 2010
MATLAB [43] | Server system, high performance computer, data server | Not included | Not included | Not mentioned
Not mentioned [45] | Not mentioned | WT | WT+GNM | Not mentioned
RStudio [46] | Not mentioned | Fayyad and Irani discretization | Actual values | April 1st, 2006 to March 31st, 2007
Not mentioned [47] | Not mentioned | Reduced rank Bayesian, VAR | FMs, reduced rank models, forecast combination | 1992 to 2010
Python using TensorFlow [48] | Not mentioned | Pearson's product-moment correlation | Linear regression, SVR, NN | 2012 to 2014
Not mentioned [49] | Not mentioned | DT | Gradient boosting machines, SVR, XGB trees | Not mentioned
Weka software [50] | Intel Core i7-4500U processor and 8 GB of DDR3 RAM | Genetic search, correlation-based feature selection | MLR, MLP, SVR | 13th May, 2013 to 26th March, 2014
MATLAB [52] | Not mentioned | Not included | ANN, SVM | April to October 2008, 19488 samples
Not mentioned [53] | Not mentioned | Autocorrelation, capacity utilization factor, relative load indicator, mutual information method | Naive forecasts, ARIMA | July 2011 to September 2013
Not mentioned [54] | Not mentioned | Not included | LSTM sequence-to-sequence, factored restricted Boltzmann machines, ANN, SVM | December 2006 to November 2010 with 34608 records
Python using Keras DL library [55] | Not mentioned | Box-Cox transformation | DNN, GRU, LSTM, MLP, SVR, SVR-ARIMA, XGB, FARX-EN, CNN, FARX-Lasso, RBF, FARX, RF, HMARX, DR, TARX, SNARX, TBATS, SOM-SVR, ARIMA-GARCH, AR, DSHW, TF, WARIMA-RBF, WARIMA, DSARIMA | January 1, 2010 to November 31, 2016
Python 2.7 [56] | Core i5 CPU and 8 GB RAM | Boosting trees | Classical NN, SVM, Lasso | January, 2012 to November, 2014
Not mentioned [57] | Not mentioned | PCA | RBF, RBF-FCM | 04 Apr, 2011 to 24 Oct, 2011
Not mentioned [58] | Not mentioned | LASSO, Kendall's coefficients | SVM | 2006 to 2013
Not mentioned [60] | Not mentioned | Xgboost algorithm | ARIMA, BPNN, SVR | 2003 to 2016
Not mentioned [61] | Not mentioned | Not included | Variants of NN and SVR | Duration of two years
R and CPLEX 12.3 [62] | Quad Core 2.90 GHz and 6 GB RAM | Not included | ARX | September, 2006 to March, 2007
RStudio with R-3.0.2 [63] | Not mentioned | Pearson correlation comparison | TBATS, DT, multiple regression, gradient boosting machine, SVR, ANN, J. Moon et al. | 2012 to 2016
Not mentioned [64] | Not mentioned | Mathematical model proposed for missing values detection | Base model | Historical data of multiple smart meters in real time
MATLAB [65] | Not mentioned | Not included | Base value | March 21st, 2010 to June 28th, 2013
Power system analysis software package [66] | Not mentioned | PCA | No comparison is available | Not mentioned
Not mentioned [67] | Not mentioned | DBOR, genetic based features selector, UHFS | BPNN, IKNN, NBSVM, INB | January 1, 1997 to December 31, 1998
Python [68] | Not mentioned | Wrapper selection algorithm based on functional ANOVA | Single market model is compared with multiple markets model | January 1, 2010 to November 31, 2016
Not mentioned [69] | Cloud resources are used | Not included | CIRACast, MAD-WRF, Smart Persist, WRFSolarNow, MADCAST, NowCAST, StatCAST-Cubist | Not mentioned
MATLAB [70] | Not mentioned | Autocorrelation functions | MLP, DR model, periodic model, NB, functional reference method, FPC dimension reduction | January 1, 2014 to December 31, 2015
Not mentioned [71] | Mac Intel Core i5 2.7 GHz with 12 GB RAM | WT | WNN and MLP | March 2012 to March 2013
Not mentioned [73] | Not mentioned | Apriori Tid | Test cases with and without using association rules | March 21st, 2010 to June 28th, 2013
MATLAB statistical Toolbox [74] | Not mentioned | Parameterization | Actual and e-ported values | January, 2010 to September, 2014
MATLAB and C [75] | Intel Xeon quad Core 3.33 GHz CPU and 24 GB RAM | Sparse coding | ARIMA, Holt-Winters | September, 2011 to August, 2013

time series are then processed separately by the forecasting model. A feature selection technique used in [76] is also used in this paper. This model relies on the information acquired by the mutual information method and selects the features having a higher mutual information score; the irrelevant and redundant features are discarded by applying an irrelevancy filter. A LASSO-based algorithm is used in [58] for the selection of input variables. In this work, instead of using all weather-related variables for solar power prediction, only selected variables are used. In this way, the volume of data to be processed by the forecasting algorithm is reduced, which minimizes its computational complexity. In this algorithm, the loss function is tuned and used to compute the importance of each variable; the variables with high importance are then selected as input variables. For the clustering of the SD, the Xgboost algorithm is used in [60]. SD clustering is used because the traditional data features can lead the model to slow convergence and poor accuracy. The relationship of the input features to the output is also learned in this model. NN is applied in time series prediction, but because of the complex linear and non-linear properties of time series, the chances of getting trapped in local minima are high. To address this problem, the EMD method is employed here. It identifies the frequent trend in the time series and separates the singular values, which reduces the extra computational effort. In [66], the data is generated using data generation software developed by China's Electric Power Research Institute, and the important features from this dataset are selected using PCA.

Table 4 contains information on the data preprocessing techniques used in the literature, along with the software and hardware used for the implementation of each model. Moreover, the duration of the dataset and the comparative techniques are also mentioned in the table.

6. Critical analysis

In this section, an analysis of the frequently used hyperparameter optimization methods for forecasting algorithms in the SG domain is presented. Hyperparameter tuning is very important for efficient forecasting: it adapts the model to the dataset, and tuning these parameters improves the forecasting accuracy significantly. So, we have discussed the tuning methods used by researchers in recent years and compared them in terms of their optimization performance. Moreover, the importance of data preprocessing is also analyzed and we highlight how to select preprocessing methods for efficient results.

Critical comment 1: From the literature review, it is observed that grid search [48]-[53], gradient descent [54]-[60], cross validation [61]-[66] and NB [67, 68] are frequently used optimization methods. Grid search is a traditional way of finding the optimal values of hyperparameters; it uses performance metrics to move towards an optimal solution. The gradient descent method outperforms grid search: it first computes the gradients for the required hyperparameters and then tunes their values using gradient descent. It was designed for NNs, and its limitation is that it can be trapped in local optima. Moreover, cross validation and its variations, i.e., the sliding window and k-fold cross validation methods, are also commonly used for hyperparameter optimization. However, NB outperforms both gradient descent and cross validation. In this method, a probabilistic model is built to map the hyperparameter values to the objective function. The limitation of NB is that the choice of covariance function for a practical problem is uncertain, and it also has hyperparameters of its own that need proper tuning. The aforementioned statistical methods are good for training a forecasting model on a small dataset, but as the size of the dataset increases, there is a high chance that these methods suffer from the curse of dimensionality. This makes them unsuitable for the training of forecasting models using big data.
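The grid search baseline discussed above can be sketched as an exhaustive loop over hyperparameter combinations. In the sketch below, `train_eval` stands in for a scoring routine (typically the mean k-fold cross-validation error) and the error surface is hypothetical; the exponential growth of the loop with each added hyperparameter is exactly the curse of dimensionality noted above.

```python
from itertools import product

def grid_search(train_eval, grid):
    """Exhaustive grid search: evaluate every hyperparameter combination
    with a user-supplied scoring function and return the best one."""
    best_params, best_score = None, float("inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = train_eval(params)  # e.g. mean cross-validation error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical error surface, minimized at lr=0.1, depth=3
error = lambda p: (p["lr"] - 0.1) ** 2 + (p["depth"] - 3) ** 2
best, _ = grid_search(error, {"lr": [0.01, 0.1, 1.0], "depth": [2, 3, 4]})
```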

Critical comment 2: Nature-inspired algorithms have evolved as a promising solution for hyperparameter optimization. Researchers have used nature-inspired optimization methods to tune the hyperparameters in [30]-[46]. These algorithms are suitable for finding the optimal solution in a large search space. GA is a frequently used nature-inspired algorithm which is based on the natural principle of survival of the fittest. PSO is also a frequently used population-based optimization algorithm; it performs better and is computationally faster than GA. However, from the literature review, it is observed that CSA performs better than PSO at the cost of a longer execution time per iteration. The highlighting feature of CSA is its fast convergence in fewer iterations, so its execution time to find an optimal solution becomes comparable to that of PSO. The FA used in [38] outperforms all these nature-inspired algorithms; its highlighting feature is its ability to remember the best solution. In our opinion, nature-inspired algorithms give optimal solutions and are suitable for big data problems. The integration of these optimization methods with forecasting algorithms can improve the forecasting accuracy while requiring less time to optimize the hyperparameters.
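As a concrete illustration of how such a population-based optimizer tunes a hyperparameter, here is a minimal PSO sketch over a one-dimensional search space with a hypothetical validation-error curve; it is a generic textbook PSO, not any specific variant from the surveyed papers.

```python
import random

def pso(objective, bounds, n_particles=10, iters=50,
        w=0.7, c1=1.4, c2=1.4, seed=0):
    """Minimal particle swarm optimization over one hyperparameter.
    Each particle keeps a personal best; the swarm shares a global best."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_val = pos[:], [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia plus attraction to the personal and global bests
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val

# Hypothetical validation-error curve, minimized at a learning rate of 0.3
best_lr, best_err = pso(lambda lr: (lr - 0.3) ** 2, bounds=(0.0, 1.0))
```

Swapping the attraction rule for crossover/mutation gives a GA, and for Lévy flights gives CSA; the surrounding loop structure is the same, which is why these methods scale similarly over large search spaces.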

Critical comment 3: From the literature, it is also observed that the existing nature-inspired algorithms have limitations, and their performance is improved by adding additional steps to them. For example, the QOABC algorithm is a variation of the ABC optimization algorithm. The ABC optimization algorithm converges fast, which sometimes yields less accurate results. This limitation is overcome by introducing QOABC; the hyperparameters are tuned using this improved version and the required results are obtained [39]. Another example is NSSA, introduced in [40]. In this algorithm, the Euclidean distance is added to get the best position using neighbor and legacy best position information. From these studies, we conclude that there is still room for improvement in nature-inspired algorithms. Existing algorithms can be further improved by hybridizing them, which would make the forecasting results more accurate.

Critical comment 4: From the literature review, it is learned that not all data preprocessing steps are always necessary. For example, in studies where the independent variables are selected by the researchers themselves, feature selection methods are not needed for such datasets; however, the data normalization and outlier removal steps must still be used. It is analyzed from the literature that a dataset contains many outliers which affect the training of forecasting models. The removal of outliers increases the prediction accuracy, as their presence disturbs the normal patterns of the data and the forecasting model learns wrong information. Moreover, we also observed that the features of a dataset contain values scaled over different intervals, and data normalization methods are used to make these features comparable. The existing data preprocessing models are working well; however, the use of Internet of Things technology is increasing both the volume and heterogeneity of data. So, data preprocessing methods should also be improved.

7. Future directions

The study of the existing literature shows that the improvement of hyperparameter tuning is a continuing research area. Some future directions and challenges are identified from the available literature. In this section, we discuss some important directions for the improvement of hyperparameter optimizers.

1. The SG is evolving with each passing day and new actors are being integrated into it. It will therefore require new models and applications based on forecasting algorithms, e.g., ANN. These newly proposed models will require tuning and their performance will be affected by their learning [77]. Efficient and competitive tuning methods will be needed for these models.

2. Forecasting using big data yields better and more accurate prediction results, but it also increases the computational overhead. Tuning the hyperparameters of forecasting algorithms for such models also becomes computationally expensive as the search space becomes more complex. Thus, more efficient optimization approaches are always desired. They can be achieved either by proposing new optimization techniques which require less configuration to reach an optimal solution, or by designing a better model for the evaluator which enables it to evaluate the best solution in less time.

3. In [78], a framework has been proposed where parameters are identified by the evaluator and their suitable values are chosen using the optimization method. This process is carried out iteratively. However, this method is not suitable for problems with large datasets. So, simultaneous updating of both the parameters and their configuration is needed for future models.

4. The performance of most of the optimization algorithms is nearly the same. The efficiency of these algorithms is very important, as computational resources are expensive assets. So, an automatic machine learning method should be adopted to select the most suitable optimizer, one which considers both efficiency and performance and maintains a balance between them.

5. From the existing literature, it cannot be concluded which optimization method is best for optimizing the hyperparameters of forecasting algorithms. So, researchers need to explore several optimization algorithms rather than relying on a single algorithm to achieve the best performance of their forecasting algorithm.

6. Deep learning is a popular area of machine learning, and DNNs seem to have a promising future in forecasting with big data. These algorithms are, however, difficult to train compared to shallow networks. Despite their popularity and better performance, their theoretical aspects need to be explored.

7. The flow of power over transmission lines is not constant; it depends on the demand for electricity and the power supplied by utilities. So, forecasting algorithms should be analyzed from this perspective as well.

8. Conclusion

This paper presents a brief and comprehensive survey

of optimization techniques used for the optimization of hy-

perparameters of the forecasting model in SG. From liter-

ature, it is observed that the grid search and cross valida-

tion techniques are commonly used methods but as the size

of the dataset increases, they require more computational

time. On the other hand, researchers have applied nature-

inspired heuristic optimization techniques to optimize these

parameters. These techniques work eﬃciently as compared

to legacy methods. A comparison of forecasting accuracy

also depicts that these algorithms work eﬃciently and give

better performance than grid search, gradient descent and

cross validation algorithms.

In this paper, the data preprocessing methods are also discussed, and it is concluded that data preprocessing is an inevitable step for forecasting. The feature selection step is important for reducing the number of input variables, as it shrinks the size of the data and thereby the computational overhead. The feature extraction and filtering methods remove outliers and missing values from the data, which would otherwise lead to inaccurate results and model under-fitting. The data values are normalized to make the variables comparable; without normalization, the dependencies and influence of the variables on each other and on the output may not be computed efficiently, which may result in a poorly trained forecasting model.
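The preprocessing steps summarized above can be sketched as a single pipeline, assuming a univariate load series held in a plain Python list; the function name and the 3-sigma outlier threshold are illustrative choices, not prescriptions from the surveyed papers:

```python
def preprocess(series):
    """Drop missing values, reject outliers, then min-max normalize."""
    # 1. Filtering: remove missing values.
    clean = [x for x in series if x is not None]

    # 2. Outlier rejection: discard points beyond 3 standard deviations.
    mean = sum(clean) / len(clean)
    std = (sum((x - mean) ** 2 for x in clean) / len(clean)) ** 0.5
    clean = [x for x in clean if abs(x - mean) <= 3 * std]

    # 3. Normalization: scale to [0, 1] so variables become comparable
    #    (assumes the series is not constant, i.e. max > min).
    lo, hi = min(clean), max(clean)
    return [(x - lo) / (hi - lo) for x in clean]
```

For example, `preprocess([1, 2, None, 3])` returns `[0.0, 0.5, 1.0]`.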

Abbreviations

ABCO Artiﬁcial bee colony optimization

ACO Ant colony optimization

AE Absolute error

ANFIS Adaptive network based fuzzy inference

system

ANN Artiﬁcial neural network

ANOVA Analysis of variance

AR Auto regression

ARFMA Auto regressive fractionally integrated

moving average

ARIMA Autoregressive integrated moving average

ARMA Autoregressive moving average

ARMAHX Autoregressive moving average Hilbertian model with exogenous variables

ARMAX Autoregressive moving average exogenous

ARMSE Average root mean square error

ARX Autoregressive with exogenous input

BMA Bayesian model averaging

BOP Bayesian outage prediction

Rabiya Khalid and Nadeem Javaid: Preprint submitted to Elsevier Page 31 of 35

BP Back propagation

BPNN Back propagation neural network

CNN Convolutional neural network

CSA Cuckoo search algorithm

DAE Deep autoencoder

DAR Dynamic auto regression

DBN Deep belief networks

DBOR Distance based outlier rejection

DCANN Dynamic choice artiﬁcial neural network

DE Diﬀerential evolution

DICAST Dynamic integrated forecast system

DNN Deep neural network

DR Dynamic regression

DSARIMA Double seasonal autoregressive integrated moving average

DSHW Double seasonal Holt-Winters

DT Decision tree

EMD Empirical mode decomposition

ENN Elman neural network

ES Exponential smoothing

ESN Echo state network

FA Fireﬂy algorithm

FARX Full auto regression with exogenous input

FARX-EN Full auto regression with exogenous input

elastic net

FARX-LASSO Full auto regression with exogenous input

least absolute shrinkage and selection op-

erator

FM Factor model

FNN Fuzzy neural network

GA Genetic algorithm

GABPNN Genetic algorithm optimized back propa-

gation neural network

GARCH Generalized auto regressive conditional

heteroskedasticity

GMI Generalized mutual information

GRNN Generalized regression neural network

GRU Gated recurrent units

HMARX Hsieh-Manski auto regressive exogenous input

HRRR High resolution rapid refresh

IBSM Index of bad sample matrix

IENN Improved Elman neural network

KNN K-nearest neighbors

KPCA Kernel principal component analysis

LASSO Least absolute shrinkage and selection op-

erator

LSSVM Least square support vector machine

LSTM Long-short term memory

MAE Mean absolute error

MAPE Mean absolute percentage error

MFA Modiﬁed ﬁreﬂy algorithm

MG Micro grid

MIMO Multiple input multiple output

MLP Multi layer perceptron

MLR Multiple linear regression

MRMRPC Minimize the redundancy based Pearson's correlation coefficients

MSE Mean square error

NB Naive Bayes

NHPP Non-homogeneous Poisson process

NMAE Normalized mean absolute error

NMAPE Normalized mean absolute percentage er-

ror

NN Neural network

NRMSE Normalized root mean square error

NSSA Novel shark search algorithm

NWP Numerical weather prediction

PCA Principal component analysis

PCA-WFCM Principal component analysis based weighted fuzzy C-means

PSO Particle swarm optimization

PV Photovoltaic

QOABCO Quasi-oppositional artificial bee colony optimization

RBF-FCM Radial basis function fuzzy C-means

RBFNN Radial basis function neural network

RES Renewable energy source

RF Random forest

RMSE Root mean square error

RNN Recurrent neural network

RRR Reduced rank regression models

RS-SDA Random sample stacked denoising autoen-

coder

RVM Relevance vector machine

SD Similar days

SDA Stacked denoising autoencoder

SG Smart grid

sMAPE Symmetric mean absolute percentage error

SNARX Smoothed nonparametric auto regressive

with exogenous inputs

SOM Self-organization map

SRWNN Self-recurrent wavelet neural network

SSA Shark search algorithm

SVD Singular value decomposition

SVR Support vector regression

SVR-HBMO Support vector regression honey bee mat-

ing optimization

SWEMD Sliding window empirical mode decompo-

sition

TARX Threshold auto regression with exogenous

inputs

TBATS Trigonometric seasonality, Box-Cox transformation, autoregressive moving average errors, trend and seasonal components


TF Transfer function

VAR Vector auto regression

WARIMA Wavelet auto regressive integrated moving

average

WNN Wavelet neural network

WPT Wavelet packet transform

WRF Weather research and forecasting

WT Wavelet theory

XGB Extreme gradient boosting


Rabiya Khalid received the MCS degree from Mir-

pur University of Science and Technology, Mir-

pur (Azad Kashmir), Pakistan, in 2014, and the

M.S. degree in computer science with a special-

ization in energy management in smart grid from

the Communications Over Sensors (ComSens) Re-

search Laboratory, COMSATS University Islam-

abad, Islamabad, Pakistan in 2017 under the su-

pervision of Dr. Nadeem Javaid. She has authored

more than 20 research publications in international

journals and conferences. Her research interests in-

clude data science and blockchain in smart/micro

grids. Currently she is working as research asso-

ciate and pursuing a PhD in the same lab and under

the same supervision.

Nadeem Javaid received the bachelor degree in

computer science from Gomal University, Dera Is-

mail Khan, Pakistan, in 1995, the master degree in

electronics from Quaid-i-Azam University, Islam-

abad, Pakistan, in 1999, and the Ph.D. degree from

the University of Paris-Est, France, in 2010. He

is currently an Associate Professor and the Found-

ing Director of the Communications Over Sensors

(ComSens) Research Laboratory, Department of

Computer Science, COMSATS University Islam-

abad, Islamabad. He has supervised 120 master

and 16 Ph.D. theses. He has authored over 900 ar-

ticles in technical journals and international con-

ferences. His research interests include energy op-

timization in smart/micro grids, wireless sensor

networks, big data analytics in smart grids, and

blockchain in WSNs, smart grids, etc. He was a recipient of the Best University Teacher Award from

the Higher Education Commission of Pakistan, in

2016, and the Research Productivity Award from

the Pakistan Council for Science and Technology,

in 2017. He is also Associate Editor of IEEE Ac-

cess, Editor of the International Journal of Space-

Based and Situated Computing and editor of Sus-

tainable Cities and Society.
