
COMSATS University Islamabad, Islamabad Campus

Synopsis for the Degree of MS/MPhil/PhD

PART-1

Name of Student: Ghulam Hafeez
Department: Electrical and Computer Engineering
Registration No.: FA17-PEE-001
(i) Research Supervisor: Dr. Khurram Saleem Alimgeer
(ii) Co-Supervisor: Dr. Nadeem Javaid
Research Area:
Members of Supervisory Committee:
1.
2.
3.
4.
Title of Research Proposal: Electric load forecasting based on deep learning for the decision making in smart grid

Signature of Student:

Summary of the research

Accurate electric load forecasting is indispensable due to its application in the decision making and operation of the smart grid (SG). With accurate electric load forecasts, operators can develop an optimal market plan and enhance the economic benefits of energy management. Developing a forecasting model that provides accurate and precise load forecasts is therefore a significant goal for both scholars and industry. Several forecasting strategies have been proposed in the literature, ranging from legacy time-series methods to contemporary data-analytic models. However, the performance of forecasting models based on a single technique is unsatisfactory due to their inherent limitations, whereas hybrid models fully utilize the advantages of the individual techniques and achieve improved performance. Some of these models perform better in terms of accuracy, while others perform well in convergence rate; both the forecast accuracy and the convergence rate can still be improved. In this synopsis, a short-term electric load forecasting model is proposed. The proposed model is a hybrid composed of a data pre-processing and feature selection module, a training and forecasting module, an optimization module, and a utilization module. The data pre-processing and feature selection module is based on the modified mutual information (MMI) technique, an improved version of the mutual information technique, used to select abstractive features from historical data. The training and forecasting module is based on the factored conditional restricted Boltzmann machine (FCRBM), a deep learning model enabled via learning to forecast the future electric load. The optimization module is based on our proposed genetic wind driven optimization (GWDO) algorithm, which is used to fine-tune the adjustable parameters of the model. The forecasting results are utilized in the decision making of the SG. The proposed model is tested on historical electric load data of three USA grids (FE, EKPC, and Dayton) and the Global Energy Forecasting Competition 2012; the data are taken from the publicly available PJM electricity market and the Kaggle repository. To verify the effectiveness of the proposed model, three existing models (MI-ANN, Bi-Level, and AFC-ANN) are used for comparison. Two performance metrics, i.e., accuracy (mean absolute percentage error (MAPE), root mean square error (RMSE), and correlation coefficient) and convergence rate, are used for performance evaluation of the proposed model.
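The accuracy metrics named above can be computed directly from the actual and forecasted load series. A minimal plain-Python sketch (the function names are ours; the metric definitions are standard):

```python
from math import sqrt

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(actual) * sum(abs((a - f) / a) for a, f in zip(actual, forecast))

def rmse(actual, forecast):
    """Root mean square error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def correlation(actual, forecast):
    """Pearson correlation coefficient between actual and forecasted load."""
    n = len(actual)
    ma, mf = sum(actual) / n, sum(forecast) / n
    cov = sum((a - ma) * (f - mf) for a, f in zip(actual, forecast))
    sa = sqrt(sum((a - ma) ** 2 for a in actual))
    sf = sqrt(sum((f - mf) ** 2 for f in forecast))
    return cov / (sa * sf)
```

A perfect forecast gives MAPE = 0, RMSE = 0, and a correlation coefficient of 1; the convergence-rate metric, by contrast, is a property of the training procedure rather than of the final forecast.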

1 Introduction

The smart grid (SG) has emerged as a smart power system and has recently gained a lot of popularity [1]. Although a variety of novel research work has appeared in the field of electric load forecasting, more accurate and robust electric load forecast models are still the need of the day. An accurate estimation of the variation in future electric load is of great importance for both electric utility companies and consumers due to its application in the decision making and operation of the power grid [2]. However, the major obstacles in future electric load forecasting are the various influencing factors, such as variable climate, temperature, humidity, occupancy patterns, calendar indicators, and social conventions.

The valid mapping of these influencing factors to load variations is extremely cumbersome due to the stochastic and non-linear behavior of consumers. In fact, data acquisition has never been an easy task. The emergence of advanced metering infrastructure (AMI), communication technologies, and sensing methods enables us to record, monitor, and analyze the impact of these influencing factors on electric load [3]. However, data handling is still a challenging problem due to non-linear and stochastically varying weather conditions. In the literature, both classical (time-series) methods and computational intelligence methods are applied for electric load forecasting [4]. Both have their own limitations: the classical methods are blamed for their limited ability to handle non-linear data, while computational intelligence methods are criticized for problems such as handcrafted features, limited learning capacity, weak learning, inaccurate appraisal, and insufficient guiding significance. There are some existing machine learning models applied for electric load forecasting that partially resolve the aforementioned problems and achieve improved performance due to their ingenious design [5].

A suitable mechanism is required to solve these problems completely, because low forecast accuracy results in huge economic loss: a one percent increase in the forecast error can cause a 10 million increase in the overall utility cost. Therefore, electric utility companies are trying to develop fast, accurate, and robust short-term electric load forecasting models. Moreover, accurate forecasting is also beneficial for the detection of potential faults and for reliable grid operation. Over the last two decades, numerous load forecasting models have been developed due to their application in the decision making of the power grid. Boroojeni et al. proposed a generalized method to model off-line data that have different seasonal cycles (e.g., daily, weekly, quarterly, and annual); both seasonal and non-seasonal load cycles are modeled individually with the help of auto-regressive and moving-average (ARMA) components [6]. Xiaomin Xu et al. investigated ensemble subsampled support vector regression (SSVR) for load forecasting and estimation [7]. A deep belief network of restricted Boltzmann machines (RBMs) has been used for electric load forecasting; the network reduced the forecast error with affordable execution time [8]. Hong et al. forecast the electric load of Southeast China with the help of a hybrid model based on a seasonal recurrent support vector regression (SVR) model and the chaotic artificial bee colony algorithm (CABCA); the performance of the model is validated by comparison with the auto-regressive integrated moving average (ARIMA) model [9]. These references provide a good basis for future electric load forecasting. However, the load of the power grid is more volatile, containing higher-frequency and sharper variations than the load of a microgrid. Moreover, the aforementioned studies have not considered the connections among data pre-processing and feature selection, control parameters and performance parameters, and network training methods. There is also a need to integrate an optimization module into the forecasting model for outstanding performance. The main contributions are described as follows:

1. A novel hybrid forecast model composed of modified mutual information (MMI), conditional RBM (CRBM) and factored CRBM (FCRBM), and genetic wind driven optimization (GWDO) techniques is proposed for short-term electric load forecasting. These techniques are arranged in a coordinated modular framework to construct the proposed hybrid model.

2. Based on the existing mutual information (MI) technique [10], a new MMI technique for feature selection is proposed (Section 5.3). The proposed MMI technique ranks the candidate inputs according to their information value and selects key features from the data to overcome the curse of dimensionality.

3. A deep learning technique, FCRBM, is adapted and enabled via learning to forecast the future electric load.

4. A GWDO algorithm is proposed, which is a hybrid of the genetic algorithm (GA) and the wind driven optimization (WDO) algorithm. The proposed algorithm has a powerful global search capability and a fast convergence rate.

5. The adjustable parameters of both the data pre-processing and feature selection module and the training and forecasting module are fine-tuned by the proposed GWDO algorithm in order to optimize the performance of the proposed model.

6. The proposed model is tested on historical hourly load data of three USA grids (FE, Dayton, and EKPC) and the Global Energy Forecasting Competition 2012. The results of the proposed model have proven more accurate when compared to existing models such as ANN, CNN, Bi-level, MI-artificial neural network (MI-ANN), and accurate fast converging ANN (AFC-ANN).
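The GWDO idea can be illustrated with a highly simplified, self-contained sketch: WDO-style velocity updates move a population of "air parcels" toward the best solution, while a GA step (one-point crossover plus mutation) replaces the worst parcel each iteration. The constants, the sphere objective in the usage note, and the exact update rule below are illustrative assumptions, not the algorithm of Section 6:

```python
import random

def gwdo_minimize(objective, dim, bounds, pop=20, iters=60, seed=1):
    """Toy GA + WDO hybrid minimizer (illustrative sketch only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    alpha, g, RT, c = 0.8, 0.3, 3.0, 0.2  # illustrative WDO constants
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    V = [[0.0] * dim for _ in range(pop)]
    best_x = min(X, key=objective)[:]
    for _ in range(iters):
        order = sorted(range(pop), key=lambda i: objective(X[i]))
        if objective(X[order[0]]) < objective(best_x):
            best_x = X[order[0]][:]
        # WDO-style update: friction, gravity toward the origin, and a
        # rank-weighted pull toward the best parcel found so far
        for rank, i in enumerate(order, start=1):
            for d in range(dim):
                V[i][d] = ((1 - alpha) * V[i][d] - g * X[i][d]
                           + RT * abs(1.0 / rank - 1.0) * (best_x[d] - X[i][d]) / rank
                           + c * V[i][d] / rank)
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
        # GA step: crossover of the two best parcels plus a small Gaussian
        # mutation; the child replaces the worst parcel
        cut = rng.randrange(1, dim) if dim > 1 else 0
        child = X[order[0]][:cut] + X[order[1]][cut:]
        m = rng.randrange(dim)
        child[m] = min(hi, max(lo, child[m] + rng.gauss(0.0, 0.1)))
        X[order[-1]] = child
    return best_x
```

For example, `gwdo_minimize(lambda v: sum(x * x for x in v), 3, (-5.0, 5.0))` drives a three-dimensional sphere objective close to its minimum; in the proposed model the objective would instead be a forecast-error measure of the adjustable parameters.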

The remaining sections of this synopsis are arranged in the following manner: related work is presented in Section 2; the research gap analysis in Section 3; the problem statement in Section 4; the proposed system models in Section 5; the proposed methods in Section 6; and the research methodology at the end of the synopsis in Section 7.

2 Related work

Electric load forecasting is crucial in the decision making of the SG, especially at large scale, where countries or groups of countries share a common power system, such as the European Union. In this regard, some relevant work is discussed in this section. Forecasting is divided into four categories according to the forecasting period [11]. The first category is very short-term forecasting [12], which corresponds to horizons of less than one day. The second category is short-term forecasting, which corresponds to a forecasting period of one day to one week [13]. The third category is medium-term forecasting, which corresponds to forecasts between one week and one year ahead [14]. The fourth category is long-term forecasting, which corresponds to forecasts more than a year ahead [15]. In the literature, both statistical models and machine learning models are commonly used for electric load forecasting. Let us discuss some of these models adopted for forecasting in recent years.
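The four horizon categories above can be captured in a trivial helper (a sketch; the cut-offs follow the definitions in this paragraph):

```python
def forecast_category(horizon_hours):
    """Classify a forecasting task by its horizon, per the four categories."""
    if horizon_hours < 24:          # less than one day
        return "very short-term"
    if horizon_hours <= 7 * 24:     # one day to one week
        return "short-term"
    if horizon_hours <= 365 * 24:   # one week to one year
        return "medium-term"
    return "long-term"              # more than a year ahead
```

The model proposed in this synopsis targets the short-term category.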

ANNs are widely used both as standalone systems and as parts of hybrid systems for electric load forecasting. In [16], a Kohonen self-organising map is utilized for day-ahead electric load forecasting in Spain. The described strategy comprises three stages. The daily load profile is treated as a time series and stored in the neurons; after the training phase, the neurons are arranged such that the load profile assigned to a neuron is similar to those of the neighboring neurons. During the second phase, the data samples are presented to the network and the winning neurons are extracted. The data samples of the winning neurons are then divided into two parts: the first corresponds to the input profile and the second to the forecasted profile. The effect of exogenous parameters on the accuracy is also considered, and the reported percentage error varies from 1.84% to 2.33%. A differential polynomial neural network for short-term load forecasting is described in [17]. The network is multi-layer, and partial differential equations are solved by its decomposition. The twenty-four-hours-ahead load is forecasted using historical electric load data of Canada; the forecasted load deviates from the target value by an error of 1.56%. A short-term load forecasting method based on weather information is proposed in [18]. The power system is divided into subnetworks on the basis of weather information, and separate models are developed for each subnetwork. The abstracted features are selected from large data sets using the cosine distance method. The models are based on ANN, ARIMA, and the grey model (GM) to forecast the future load. A hybrid forecast strategy based on an intelligent algorithm is proposed in [19]. The described strategy includes a novel feature selection technique and a complex forecast engine: the feature selection technique selects appropriate features, which are fed into the forecast engine. The forecast engine has two stages and is implemented with Ridgelet and Elman neural networks. The intelligent algorithm tunes the adjustable parameters of the forecast engine to improve both forecast accuracy and convergence rate, and the performance of the described model is validated by comparison with existing models. A deep learning based forecasting framework with appliance energy consumption sequences is proposed in [20]; the accuracy is notably improved by incorporating the appliance consumption sequence in addition to the deep neural network.

The authors in [21] proposed an Elman neural network based forecast engine to predict the future load in the SG; the weights and biases of this network are optimally adjusted by an intelligent algorithm to obtain accurate forecasting results. The authors of [22] proposed a novel forecasting model that generalizes the standard ARMAX model to Hilbert space. The proposed model has a linear regression structure and operates on functional variables: autoregressive terms, moving average terms, and exogenous parameters. The functional variables are integral operators whose kernels are modeled as sigmoidal functions, and the parameters of the sigmoidal functions are optimized using a quasi-Newton algorithm. The model is validated on the daily price profiles of the Spanish and German electricity markets. However, the forecast accuracy is improved through the use of an optimization module at the expense of high execution time. In [23], the authors reveal the effect of data integrity attacks on the accuracy of four forecasting models, i.e., SVR, multiple linear regression, ANN, and fuzzy interaction regression. Data integrity attacks are expected to damage the performance of the discussed forecasting models and have a significant impact on the resilience of the power system.

The authors in [24] proposed a short-term load forecasting model based on a deep neural network. Two variants of long short-term memory (LSTM), i.e., standard and sequence-to-sequence LSTM, are used for forecasting individual building energy consumption. Both LSTM variants are trained and tested at one-hour and one-minute time resolutions. Experimental results show that the sequence-to-sequence LSTM outperforms the standard LSTM.

In [25], the authors proposed a hybrid of the extreme learning machine (ELM) and a new delayed particle swarm optimization (PSO) technique to solve the forecasting problem. The weights and biases are optimally tuned by the new switching delayed PSO technique, and the tangent hyperbolic (tanh) function is used to test the performance in a comprehensive and systematic manner. Simulation results demonstrate that the proposed model outperforms existing machine learning based models, and the proposed model is applied to the power system for load forecasting.

The authors in [26] used a newly designed algorithm to train a radial basis function network for day-ahead electric load forecasting. The newly designed algorithm is comparatively evaluated against existing algorithms in terms of forecast accuracy, and simulation results demonstrate that it achieves a lower MAPE than RNN and SVR.

The authors in [10] proposed a short-term load forecasting model for industrial applications. The proposed model is based on an ANN and a modified enhanced differential evolution algorithm (MEDEA) to improve the forecast accuracy. For feature extraction and network training, a mutual information based technique and a multi-variate autoregressive model are used, respectively. The short-term load forecast model is enabled via training to forecast the future load. Simulation results show that the proposed model provides 99.5% accurate predictions as compared to the bi-level strategy. However, the forecast accuracy is improved by feeding the output of the forecast module to the optimization module, which takes more time to execute.
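Mutual information based feature ranking, the starting point both for [10] and for the MMI technique proposed in this synopsis, can be sketched in a few lines. The discrete count-based estimator and the candidate-feature names below are illustrative assumptions (real load data would first be discretized into bins):

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """MI (in bits) between two discrete sequences, estimated from counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def rank_features(candidates, target):
    """Rank candidate input features by their MI with the target load."""
    return sorted(candidates,
                  key=lambda name: mutual_information(candidates[name], target),
                  reverse=True)
```

Features with the highest MI carry the most information about the target load and are kept; the rest are dropped, which is how MI-style selection combats the curse of dimensionality.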

The authors of [27] used a deep neural network architecture for day-ahead load forecasting. A CNN is used to extract features from historical load data, an LSTM models the dynamics and variability of the historical load data, and a dense feed-forward network models holidays and other features. The proposed model is evaluated on an hourly dataset of a city in North China. However, the forecast accuracy is improved at the expense of high model complexity.

In [28], the authors proposed a forecasting model for building energy estimation. The proposed model is based on deep learning techniques to reduce uncertainty and improve forecast accuracy, which is evaluated in terms of RMSE, correlation coefficient, and p-value. Simulation results validate the forecast accuracy of the proposed model; however, this accuracy is obtained at the cost of a slow convergence rate. Mujeeb et al. proposed a load and price forecasting technique based on deep learning and the differential evolution algorithm [29]; their method achieves high accuracy for both load and price forecasting. Another work [30] proposes a hybrid energy demand forecasting model that combines LSTM and CNN networks, and experiments affirm that the hybrid model shows higher accuracy than previous approaches.

The authors of [31] developed a day-ahead load forecasting model for individual residences based on a CNN, and another work [32] proposes a multi-scale CNN (MS-CNN) with time cognition for multi-step short-term demand prediction. They performed extensive simulations to validate the performance of MS-CNN in terms of accuracy, and the results show the effectiveness of the developed MS-CNN model over its counterparts. A deep reinforcement learning based building energy optimization model is proposed in [33]: a hybrid of reinforcement learning and deep learning for aggregated and individual building energy optimization. To explore the learning procedure, the authors used the deep policy gradient and Q-learning methods. The proposed model is validated on offline data from the Pecan Street Inc. database, and simulation results show that the deep policy gradient is more suitable for cost and peak reduction than deep Q-learning.

In [35], the authors used a deep learning based model, extreme gradient boosting, to forecast the cooling load profile of a building. Its performance is evaluated in terms of accuracy and computational efficiency by comparison with existing forecasting models, and it outperforms them in terms of accuracy.

The authors in [36] proposed a short-term load forecasting model for microgrids based on a bi-level strategy with upper and lower levels. The lower level comprises a feature selection technique and a forecasting module using a hybrid of an ANN and an evolutionary algorithm, while the upper level is composed of a stochastic search optimization technique that optimizes the performance of the forecast module. The efficacy of the proposed model is evaluated on real-life data of a Canadian university.

Macedonian electricity load forecasting is performed using a multiple-layer restricted Boltzmann machine (RBM) in [38]. The parameters are fine-tuned, and the weights and biases are updated using backpropagation. The deep belief network (multi-layer RBM) is trained using six years of hourly Macedonian electricity consumption data, from 2008 to 2014. To evaluate the forecast accuracy of the multi-layer RBM, it is compared with the actual data, the Macedonian system operator load profile, and a multi-layer perceptron.

RBM based models are also used for load forecasting and show reasonable results [39, 40]; however, these models can be improved in order to achieve better accuracy in less computational time.
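For reference, the building block behind these models, a Bernoulli RBM trained with one-step contrastive divergence (CD-1), can be sketched in pure Python. This is a didactic toy, not the FCRBM of this proposal; the layer sizes, learning rate, and data are illustrative:

```python
import random
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_rbm(data, n_hidden=4, epochs=100, lr=0.1, seed=0):
    """Bernoulli RBM trained with CD-1: raise the model probability of the
    data and lower it for one-step reconstructions."""
    rng = random.Random(seed)
    n_vis = len(data[0])
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_vis)]
    b_v, b_h = [0.0] * n_vis, [0.0] * n_hidden

    def h_probs(v):  # p(h_j = 1 | v)
        return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(n_vis)))
                for j in range(n_hidden)]

    def v_probs(h):  # p(v_i = 1 | h)
        return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(n_hidden)))
                for i in range(n_vis)]

    for _ in range(epochs):
        for v0 in data:
            h0 = h_probs(v0)
            h_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
            v1 = v_probs(h_sample)          # one-step reconstruction
            h1 = h_probs(v1)
            for i in range(n_vis):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                b_v[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                b_h[j] += lr * (h0[j] - h1[j])
    return W, b_v, b_h, v_probs, h_probs
```

A conditional RBM adds links from past visible frames to the current hidden and visible units, and the factored variant (FCRBM) factorizes those weight tensors to keep the parameter count manageable.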

Some of the related work with respect to convergence rate, execution time, and forecast

accuracy is summarized in Table 1.

Table 1 Comprehensive analysis of existing forecasting models in terms of convergence rate, execution time, computational complexity, and forecast accuracy

Short-term load forecasting model | Accuracy | Execution time | Convergence rate | Computational complexity
Deep learning based building energy forecasting model [28] | Moderate | High | Slow | High
Deep learning based energy forecasting model [33] | Low | High | Slow | High
Deep neural network based short-term load forecasting model [39] | Low | High | Slow | High
LSTM based building energy forecasting model [24] | Low | High | Slow | Low
PSO + extreme learning machine based forecasting model [25] | Moderate | High | Slow | High
Radial basis function based forecasting model [26] | Moderate | High | Slow | High
ANN based forecasting model [10] | Moderate | High | Fast | High
CNN and LSTM based forecasting model [27] | Moderate | High | Slow | High
Deep belief network based forecasting model [38] | Moderate | High | Moderate | High
Building cooling load forecasting model based on deep neural network [35] | Low | High | Slow | Low
Bi-level strategy based forecasting model [36] | Moderate | High | Slow | Low
Mutual information and ANN based forecasting model [37] | Low | Low | Fast | Low

In [7], the authors presented an intelligent model to forecast the load on distributed generation (DG) and examine the power supply structure. First, the support vector machine (SVM)

and the fruit-fly immune (FFI) algorithm are used to predict the DG load. Second, a combined neural network and polynomial regression model is used for power supply structure analysis in relation to hourly load and weather factors. Finally, the impact of DG on the regional power system structure is analyzed in terms of load reduction on the main electric grid station. This combined intelligent model has low performance error and strong generalization; however, the higher accuracy is achieved at the cost of a slow convergence rate and high computational complexity.

The authors of [41] proposed distributed methods to forecast the future load using weather information. The power system is divided into two subnetworks according to the weather variations, and separate forecasting models, i.e., ARIMA and grey models, are established for each subnetwork. The adapted models are evaluated against the traditional models using two performance metrics, i.e., relative root mean square error (RRMSE) and mean absolute percentage error (MAPE). In [42]-[47], the authors proposed heuristic based energy management controllers for smart homes, with the purpose of reducing the peak load and the electricity cost; however, forecasting is necessary before optimal load scheduling.

The authors in [48] introduced a Bluetooth home energy management system (HEMS) combined with an ANN in order to forecast the future load. This approach strengthens decision making by taking the current situation and energy consumption conditions into account when forecasting the future load (different times of the working day and different days of the working week). The purpose of this work is to optimally manage the peak load and smooth out the demand curve; however, these objectives are obtained at the cost of execution time and a slow convergence rate.

A deep recurrent neural network (DRNN) based model is proposed to forecast the household load [49]. This method overcomes the overfitting problems created by classical deep learning methods. The results show that the DRNN outperforms the existing ARIMA, SVR, and convolutional RNN (CRNN) methods by 19.5%, 13.1%, and 6.5%, respectively, in terms of RMSE.

In [50], an LSTM recurrent neural network (LSTM-RNN) based framework is proposed to forecast the future residential load. The accuracy of the proposed framework is enhanced by embedding appliance consumption sequences in the training data, and the framework is validated on real-world data. However, the authors focus only on accuracy, while the convergence rate and the computational complexity are ignored.

A demand response (DR) scheme based on real-time pricing (RTP) is proposed in [51] for industrial facilities. The scheme adopts an ANN for forecasting the future prices for global time-horizon optimization. The energy cost minimization is facilitated by price forecasting and is formulated as a mixed integer linear program (MILP). The performance of the proposed framework is analyzed in a practical case study of steel powder manufacturing. The simulation results illustrate that hour-ahead DR is better than day-ahead DR, with an improved ability to satisfy industrial demand at reduced cost.

The authors in [52] proposed an IoT-based deep learning system to forecast the future load with high precision. The proposed method also qualitatively analyzes the influencing factors, such as variable climate, temperature, humidity, and social conventions, that have a great impact on the forecast. However, transferring a huge amount of data over the existing communication infrastructure is challenging.

In [53], an adaptive hybrid learning model (AHLM) is proposed to forecast solar intensity. The linear and dynamic behavior of the data is captured by time-varying and multiple-layer linear models, while a hybrid of backpropagation, GA, and a neural network is used to learn the non-linear behavior of the data. The proposed AHLM learns the linear, temporal, and non-linear behavior from the off-line data and predicts solar intensity with greater precision, performing well over both short- and long-term forecast horizons.

To optimally harvest the potential of solar energy, forecasting of solar power generation is indispensable. Thus, a least absolute shrinkage and selection operator (lasso) model is proposed for forecasting solar energy generation [54]. The proposed model is trained using historical weather data, aiming not only to reduce the prediction error but also to reveal the significance of the weather variables in model training. An algorithm is developed based on single-index and lasso models that maximizes Kendall's coefficient in order to estimate the forecasting model coefficients; its goal is to ignore the less important variables and increase the sparsity of the coefficient vector. With the proposed model, either the prediction accuracy is improved or a trade-off between accuracy and complexity is achieved. However, the accuracy is improved at the cost of more execution time.
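The sparsity mechanism behind [54] can be illustrated with a minimal coordinate-descent lasso using soft thresholding. This is a generic sketch, not the single-index estimator of the paper; the toy data in the usage note, where only the first feature drives the target, is an assumption:

```python
def lasso_fit(X, y, lam, iters=100):
    """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # correlation of feature j with the partial residual
            rho = sum(X[i][j] * (y[i] - sum(w[k] * X[i][k]
                      for k in range(p) if k != j)) for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # soft thresholding drives weak coefficients exactly to zero
            if rho > lam:
                w[j] = (rho - lam) / z
            elif rho < -lam:
                w[j] = (rho + lam) / z
            else:
                w[j] = 0.0
    return w
```

On data where the target depends only on the first feature, the penalty zeroes out the coefficient of the irrelevant feature while leaving the relevant one nearly unchanged, which is exactly the variable-screening behavior the paper exploits.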

The authors in [55] presented a probabilistic forecasting model to forecast solar power, electrical energy consumption, and net load across seasonal variations and scales. Dynamic Gaussian process and quantile regression models are employed on data from the metropolitan area of Sydney, Australia, for probabilistic forecasting. Simulation results depict that the proposed model performs well in all three forecasting scenarios: solar power generation, electricity consumption, and net load.

For short-term load prediction, a hybrid model is proposed in [56]. This model is based on improved empirical mode decomposition, ARIMA, and a wavelet neural network (WNN) optimized by the fruit-fly immune (FFI) optimization algorithm. For the performance demonstration of the proposed model, electric load data of the Australian and New York electricity markets are used, and simulation results show that the proposed model's predictions are more accurate than those of existing models.

In [57], a deep learning based electric load prediction model is proposed to forecast the future load. The proposed model extracts abstracted features using the stacked denoising auto-encoder technique, and with these features an SVR model is trained to forecast the future load. The proposed model is evaluated by comparison with plain SVR and ANN in terms of accuracy improvement.

The authors in [58] investigated the recency effect in electricity load forecasting using the preceding hours' load and temperature. The aim is to determine the lagged hourly temperatures and the daily moving-average temperature that enhance the forecast accuracy. The data used for network training and validation is that of the Global Energy Forecasting Competition 2012. The recency effect is investigated in three scenarios: the aggregated level of the geographic hierarchy, the bottom level of the geographic hierarchy, and individual hours of the day. However, the accuracy is enhanced at the cost of model complexity.
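The lagged-temperature and moving-average inputs described above are simple to construct. A sketch (the helper name, lag count, and window length are illustrative assumptions):

```python
def recency_features(temps, max_lag=3, ma_window=24):
    """Build, for each hour t, the last max_lag hourly temperatures plus a
    trailing moving average over up to ma_window hours."""
    feats = []
    for t in range(max_lag, len(temps)):
        lags = [temps[t - l] for l in range(1, max_lag + 1)]
        window = temps[max(0, t - ma_window + 1): t + 1]
        feats.append(lags + [sum(window) / len(window)])
    return feats
```

Each feature row can then be paired with the load at hour t when fitting a forecasting model, which is how [58] injects recency information into the regression.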

In [59], a long-term forecasting model is proposed in order to improve the relative forecast accuracy of electric utility resource integrated planning. The analysis was conducted on twelve Western US electric utilities in the mid-2000s for both peak and normal energy consumption.

An ANN model is used to forecast the hourly energy consumption of buildings on the Sugimoto Campus of Osaka City University, Japan [60]. The presented model is trained with the Levenberg-Marquardt and backpropagation algorithms. Six parameters are given as input: dry bulb, humidity, temperature, global hourly irradiance, and previous hourly and weekly energy consumption. The accuracy of the proposed model is evaluated in terms of the correlation coefficient and RMSE. Simulation results illustrate that the RMSE is largest in the science and technology area of the university campus, compared to the humanities college area and the old liberal arts area.

A novel type of hybrid system based on artificial intelligence is discussed in [61] to forecast the 24-hour load profile of a Polish grid station. The proposed hybrid system was tested on off-line data of Poland and a few other countries, and the MAPE varies from 1.08% to 2.26% depending on the country.

In [62], an ensemble model based on the empirical mode decomposition algorithm and deep learning is proposed for load forecasting. The proposed model is tested and validated on the electrical energy consumption datasets of the Australian Energy Market Operator (AEMO). The electric energy consumption data is decomposed into intrinsic mode functions (IMFs), and the proposed model is used to model each IMF in order to improve the forecast accuracy. A model in which an autocorrelation function selects the input parameters and a least squares support vector machine (LSSVM) performs the forecasting is discussed in [63]; the main contribution of that paper is a fully automated machine learning model that forecasts the future load without human intervention.

A hybrid incremental learning approach, composed of the discrete wavelet transform (DWT), empirical mode decomposition, and a random vector functional link network (RVFLN), is proposed in [64] for short-term load forecasting. To evaluate the proposed model, the AEMO electricity load data is used, and simulation results depict that the proposed system is effective compared to eight benchmark prediction methods.

In [65], an ELM model based on a mixed kernel for future load forecasting is discussed. Half-hour-resolution electric load data of the states of New South Wales, Victoria, and Queensland in Australia is used to validate the proposed model. Simulation results illustrate that the proposed method is better than three existing methods, i.e., radial basis function ELM (RBF-ELM), UKF-ELM, and mixed-ELM, in terms of accuracy.

In [66], the authors proposed a hybrid of the ELM and a new switching delayed particle swarm optimization (PSO) algorithm for short-term load forecasting. The weights and biases are optimized with the new switching delayed algorithm. The tanh function is used as the activation function because it generalizes better and avoids unnecessary hidden nodes and the over-training problem. Experimental results show that the proposed model outperforms the RBF neural network, and the model is successfully applied for short-term load forecasting in the power system.

A novel hybrid model, combining singular spectrum analysis (SSA), support vector machine (SVM), and the cuckoo search (CS) algorithm, is proposed in [67] to forecast the future load. The historical data is pre-processed with SSA. The pre-processed data is fed to the SVM model to forecast the future load, and the performance is optimized with the CS algorithm. The performance of the proposed model is evaluated in terms of accuracy by comparison with SVM, CS-SVM, SSA-SVM, ARIMA, and a backpropagation neural network (BPNN).

In [68], a clustering-based hybrid model is proposed to predict the hourly electricity demand of hotel buildings. The load of operating buildings is non-stationary because of irregular temporal electric features. An on-line modified predictor model is proposed, which combines SVR and a wavelet decomposition algorithm and takes training samples extracted by fuzzy C-means (FCM) as input. The proposed model has improved accuracy as compared to traditional models.

A deep neural network model is adopted for short-term load and probability density fore-

casting in [69]. The proposed model is evaluated on electricity consumption case studies of

three Chinese cities for the year 2014. The simulation results demonstrate that: 1) the deep learning based model has better forecast accuracy than random forest and gradient boosting models, 2) temperature, weather, and other environmental variables have a significant impact on electricity consumption, and 3) the probability density forecasting method is able to provide high quality predictions.

In [70], a hybrid forecast model is proposed, which is a combination of a feature extraction technique and a two-stage forecast engine. The two-stage forecast engine uses a Ridgelet neural network and an Elman neural network to provide accurate predictions. An optimization algorithm is applied to optimally select the control parameters of the forecast engine.

The authors in [71] proposed a short-term load forecasting model based on SSVR. The main objective is to improve relative forecast accuracy and efficiency, which is achieved by feeding the output of the forecast module to an optimization module for fine tuning of parameters. However, the forecast accuracy is improved at the cost of computational complexity.

A hybrid model of GA and a non-linear AR with exogenous inputs neural network is proposed for short- and medium-term forecasting in [72]. In order to fine tune the input parameters of the proposed model, statistical and pattern-recognition-based schemes are employed. The GA is used for selecting the weights and biases for training the neural network. The proposed model is validated by comparison with existing models such as average with exogenous inputs and regression tree models.

In [73], a data-analytic framework is proposed to forecast solar energy. The proposed framework is developed and validated on a large eight-year (2005-2012) dataset of a golden site of the USA, with one minute resolution, taken from the National Renewable Energy Laboratory (NREL). The uniqueness of this method is that data preprocessing is performed using integrated serial time-domain analysis coupled with multivariate filtering.

The authors proposed a hybrid approach to forecast the electricity production of a solar-panel-based microgrid in [74]. The hybrid model is based on GA, PSO, and a neuro-fuzzy inference system (NFIS). The proposed model is tested on real-time power generation data obtained from the Goldwind microgrid in Beijing, and it performs better than existing ANN and linear-regression-based models. Some of the relevant work with respect to strategies, datasets taken from repositories, and critical remarks is comprehensively summarized in Table 2.

3 Research gap analysis

Precise and accurate electric load forecasting is an indispensable task in the SG because low forecast accuracy results in huge economic loss; a one percent increase in the forecast error can cause a 10 million increase in the overall utility cost. Moreover, accurate forecasting is also beneficial for the detection of potential faults, reliable operation, and decision making of the SG. Therefore, electric utility companies are trying to develop fast, accurate, and robust short-term electric load forecasting models. The following research gaps are highlighted in the above mentioned recent and relevant work: (1) there is no universal forecast model; some models are better for certain objectives and conditions; (2) the training data set is often not similar to the predicted period; (3) there is a trade-off between forecast accuracy and convergence rate: when forecast accuracy is increased, the convergence rate is compromised, and vice versa; (4) the offered prices and amounts in the intra-day and day-ahead markets need to be adjusted; and (5) the needed power reserves, with start-up times counted in hours, must be found when the state of transactions

from those markets is known. In this regard, a novel hybrid forecast model composed of MMI, FCRBM, and GWDO techniques is proposed for high quality electric load forecasting ranging from a day to a week.

Table 2. Recent and relevant work summary

| Strategies | Objectives | Repository | Limitations |
| --- | --- | --- | --- |
| Intelligent forecasting model based on SVM and FFI algorithm [7] | DG forecasting and regional power supply structure analysis | Data of a certain area in Northeast China | The model is suitable only for short forecast horizons |
| Weather-information-based electric load forecasting of a bulk power system [41] | Forecast accuracy improvement for effective performance of a bulk power system | Fujian Province bulk power system, China | The model is suitable and quite effective only for bulk power systems |
| Household forecasting using DRNN [49] | To improve users' comfort through reliable provision of electricity | Ireland Commission for Energy Regulation | The complexity of the model is increased |
| LSTM-RNN based residential load forecasting [50] | Accuracy improvement to facilitate residential consumers | Canadian residential load data | The proposed model improves only meter-level forecast accuracy |
| IoT-based electric load forecasting [52] | Improvement of accuracy and capability for effective power system operation | Electric load record of an urban area in south China | The framework has large complexity |
| Deep model with stacked de-noising auto-encoders for day-ahead load forecasting [57] | Forecast accuracy improvement | California electric load data | The model's performance is compromised as the data size decreases |
| A big data approach for electric load forecasting [58] | Forecast accuracy improvement for scalable models | Global Energy Forecasting Competition 2012 | The model has a complex structure and slow convergence rate |
| ANN based prediction model [60] | To reduce the RMSE and improve forecast accuracy | Real data of Sugimoto Campus of Osaka City University, Japan | Objectives are achieved at the cost of a slow convergence rate |
| Intelligent hybrid model based load forecast [61] | Day-ahead electric load forecasting to efficiently manage the generation of the Polish grid | Historical load data of Poland | — |

4 Statement of the problem

In the literature, tremendous research progress has been made in the field of short-term load forecasting. However, despite much research in this field, more accurate and robust short-term load forecasting is still needed due to its application in the decision making of the SG. With fast and accurate forecasting, the SG can facilitate effective management and utilization of available resources. However, fast and accurate forecasting is a very complex process due to the high variation and non-linearity of consumers' load profiles. It is worth mentioning that variation and non-linearity are inversely linked with predictability: predictability is low if the variation and non-linearity are high, and vice versa. The authors in [75] and [76] adopted ANN-based models for accurate electric load forecasting. However, due to the shallow layout of ANN, these models can suffer from vanishing gradient, under-fitting, and computational power problems [77, 78]. The aforementioned problems disturb the forecast accuracy of ANN-based models. In [36], an MI and ANN based model is used for short-term load forecasting. The authors in [37] used a Bi-level strategy for short-term load forecasting, which is based on ANN and a differential evolutionary algorithm (DEA). The authors in [10] proposed a short-term load forecasting model based on ANN and MEDEA. However, these models perform well only for small data sizes; as the size of the data increases, their performance degrades due to their shallow layout. They also suffer from the curse of dimensionality and under-fitting. Moreover, in [37] and [10], the authors used DEA and MEDEA, respectively, to optimize the forecasting process. However, both DEA and MEDEA have lower precision and convergence speed as compared to the WDO algorithm [79].

In this synopsis, first, a new way to adopt a deep learning model, i.e., stacked FCRBM, is proposed. Secondly, a fast and accurate electric load forecasting model based on two deep learning models, i.e., stacked FCRBM and CRBM, is proposed. The deep learning models have a deep-layered layout to capture highly abstracted characteristics from off-line data. This deep-layered structure highly contributes to improving relative forecast accuracy. Moreover, the performance of deep learning models improves with an increase in data size, while the performance of ANN-based models is compromised as data size increases [80]. Furthermore, the proposed model has a modular framework in which the output of each module is fed into the subsequent module. The framework comprises four modules: a data pre-processing and feature extraction module based on MMI, an FCRBM based training and forecasting module, a GWDO based optimization module, and a utilization module. The MMI is used for feature selection in the data pre-processing and feature extraction module. For training and forecasting, the deep learning technique FCRBM is adopted because the performance of deep learning techniques is directly linked with data size and they have the ability to capture the desired features more effectively. In order to optimize the error performance, a GWDO algorithm is proposed because of its high convergence speed and precision [79]. The objective of this synopsis is to improve forecast accuracy with affordable execution time and computational complexity. The proposed model is applied to hourly load data of three USA grids and the Global Energy Forecasting Competition 2012. The proposed model is validated by comparison with ANN [76], a convolutional neural network (CNN) [81], MI-ANN [36], Bi-level [37], and AFC-ANN [10] in terms of accuracy (MAPE, RMSE, and correlation coefficient) and convergence rate.

5 Proposed system models

In this synopsis, three system models are presented in order to improve the forecast accuracy

and convergence rate. The detailed description is as follows:

5.1 Proposed system model I

In this section, the proposed short-term load forecasting framework based on two deep learning models, stacked FCRBM and CRBM, is introduced, as shown in Figure 1. For the proposed framework, a modular strategy is adopted, where the output of each module is fed into the subsequent module. The proposed framework consists of three modules: a data processing and feature extraction module, a deep learning-based training module, and a deep learning-based forecasting module. The detailed description is as follows:

Figure 1. Proposed system model I (weather and load data are cleansed, normalized, and structured, features are extracted, and the data is split into training, testing, and validation sets; the training module picks either stacked FCRBM with ReLU or CRBM with sigmoid activation and tunes its parameters; the forecasting module then serves the applications)

5.1.1 Data processing and feature extraction module

First, twenty zones’ historical data of US utility (global energy forecasting competition 2012)

consists of hourly load and weather data are taken from the publicly available Kaggle repository

[83]. This data is given as an input to the data processing and feature extraction module. The

three data operations: cleansing, normalization, and structuring are performed on the received

data. The cleansing operation is performed in order to replace the missing and defective values

by the mean of previous values. After cleansing, the data is normalized in order to reduce and

eliminate the redundancy. Moreover, the data has large values, the normalization is performed

on the data to make the weighted sum stay within the limits of the sigmoid function value. At

the end, results are denormalized to achieve the desired load predictions. After cleansing and

normalization, the data is structured in ascending order. The desired features from the dataset

are extracted by the feature extraction process and ﬁnally, the data is split into training and

testing dataset. The training data have hourly load and weather data to train stacked FCRBM

and CRBM. The testing data is used to evaluate the forecast accuracy of the proposed model in

terms of MAPE, RMSE, and correlation coefﬁcient. The validation dataset is constructed for

proper parameters tuning.
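The cleansing, normalization, and splitting steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: numpy is assumed, the function names and the 80/20 split are illustrative, and the sample values are hypothetical.

```python
import numpy as np

def cleanse(load):
    """Replace missing (NaN) values with the mean of the preceding values,
    mirroring the cleansing step described above."""
    load = load.astype(float).copy()
    for i in np.where(np.isnan(load))[0]:
        load[i] = np.nanmean(load[:i]) if i > 0 else 0.0
    return load

def normalize(load):
    """Min-max scale to [0, 1] so the weighted sum stays within the
    sigmoid's sensitive range; return parameters for denormalization."""
    lo, hi = load.min(), load.max()
    return (load - lo) / (hi - lo), lo, hi

def denormalize(scaled, lo, hi):
    """Undo the scaling to obtain the desired load predictions."""
    return scaled * (hi - lo) + lo

# Hypothetical hourly load with two defective (missing) readings.
raw = np.array([100.0, 120.0, np.nan, 110.0, np.nan, 130.0])
clean = cleanse(raw)
scaled, lo, hi = normalize(clean)

# Split into training and testing portions (an illustrative 80/20 split).
split = int(0.8 * len(scaled))
train, test = scaled[:split], scaled[split:]
```

A validation slice for parameter tuning would be carved out of the training portion in the same way.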

5.1.2 Training module

Deep learning based training of the stacked FCRBM and CRBM is the main part of this framework. These models are trained with the training data to learn the non-linear relationship between the demand load profile and historical observations. The output of the data processing and feature extraction module is given as input to the training module. Moreover, if the stacked FCRBM forecasting model is used, the model is trained using the rectified linear unit (ReLU) activation function. In contrast, the CRBM forecasting model exploits the sigmoid activation function to perform data training. In this way, the deep learning-based training module is enabled via learning to forecast the future load.

5.1.3 Forecasting module

The output of the training module is fed into the forecasting module. In this module, trained

stacked FCRBM and CRBM models forecast the future load. The forecast accuracy of the

proposed model is evaluated in terms of MAPE, RMSE, and correlation coefﬁcient using the

testing data. The output of the forecasting module can be used for SG applications such as power

generation planning, economic operation, unit commitment, load switching, power purchasing,

demand side management, and contract evaluation.

5.2 Proposed system model II

To forecast the future electric load, prediction models must be able to learn the non-linear input/output mapping efficiently. In artificial intelligence, ANN is one of the techniques most frequently used to forecast non-linear load due to its easy and flexible implementation [27]. However, the performance of ANN is highly dependent on adjustable tuning parameters such as the learning rate, the number of layers, and the number of neurons per layer. The learning algorithms for training neural networks, such as gradient descent, multivariate AR, and backpropagation, may suffer from premature convergence and overfitting [36]. To cure the aforementioned problems, hybrid forecast strategies have been proposed in the literature. Hybrid forecast strategies have improved modeling capabilities as compared to non-hybrid methods; still, there is a problem of slow convergence and high execution time due to a large number of adjustable parameters. In [37], the authors used a Bi-level strategy, based on ANN and DEA, for electric load forecasting. An accurate fast converging strategy based on ANN and MEDEA [82] is proposed to forecast the future load [10]. However, these strategies are highly dependent on the modeler's knowledge and experience. Moreover, the performance of the aforesaid strategies is satisfactory for small data sizes and is compromised as the size of the data increases. No mechanism is proposed to handle large data (big data), while in real life the data size is increasing dramatically.

Figure 2. Schematic diagram and main procedure of the FCRBM based proposed system model II for hourly and weekly electric load prediction (historical data flows through the data pre-processing and feature selection module, with data cleansing, data normalization, and entropy based mutual information feature selection with redundancy and irrelevancy filters, then through the FCRBM based training and forecasting module and the optimization module to produce the forecasted load). Single arrowheads denote one-way data flow and double arrowheads denote two-way data flow.

In this synopsis, a hybrid model based on MMI, FCRBM, and GWDO techniques is proposed for short-term load forecasting, as shown in Figure 2; it is an extension of our earlier conference paper [84]. The proposed model comprises three modules, as illustrated in Figure 2: a) a data pre-processing and feature selection module based on MMI, b) an FCRBM based training and forecasting module, and c) an optimization module based on the proposed GWDO algorithm.

Prior to performing electric load forecasting, it is indispensable to identify the factors that influence load behavior. These influencing parameters include weather factors (humidity, temperature, and dew point), occupancy patterns, and calendar indicators. However, it is not feasible to apply all of the aforementioned candidate inputs to the FCRBM based training and forecasting module. Moreover, the candidates include ineffective features, which complicate the model and degrade its performance. Thus, the candidate inputs are first fed into the data pre-processing and feature selection module. Then, the pre-processed data is fed to the MMI based feature selection phase. The output of the data pre-processing and feature selection module is given as input to the training and forecasting module based on FCRBM. The output of this module is fed into the optimization module based on GWDO, which is the new contribution. The optimization module first calculates the error between the real and forecasted values. Then, it minimizes the error in order to make accurate predictions. The detailed description of the proposed system model is as follows:

5.2.1 Data pre-processing and feature selection module

Let E be the historical electric load, represented in matrix form. This historical electric load data is fed into the data pre-processing and feature selection module.

E = \begin{bmatrix}
E(1,1) & E(2,1) & E(3,1) & E(4,1) & \dots & E(x,1) \\
E(1,2) & E(2,2) & E(3,2) & E(4,2) & \dots & E(x,2) \\
E(1,3) & E(2,3) & E(3,3) & E(4,3) & \dots & E(x,3) \\
E(1,4) & E(2,4) & E(3,4) & E(4,4) & \dots & E(x,4) \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
E(1,y) & E(2,y) & E(3,y) & E(4,y) & \dots & E(x,y)
\end{bmatrix}  (1)

where E(1,1) is the electric load of the first hour of the first day, E(2,1) is the electric load of the first hour of the second day, and, in general, E(x,y) is the electric load of the y-th hour of the x-th day. The data spans four years, i.e., 1460 days, and each day has 24 hours, so the dimension of the dataset is 1460 × 24 and the total number of data samples is 35,040. The rows show the hours and the columns show the days. The value of x is linked with the tuning of FCRBM training: a larger value of x implies finer tuning, and vice versa. There is a performance tradeoff between fine

tuning and convergence rate. This input data is first passed through the data cleansing phase, where defective and missing values are replaced by the average value of the preceding days' electric load. The cleansed data is then passed through the normalization phase, because the data contains outliers and the weight matrix is extremely small; normalization keeps the overall weighted sum within the limits of the activation function. In machine learning, feature extraction/selection is a process to select abstracted features and filter out unimportant ones. Data pre-processing and feature selection are significant because they help to avoid the curse of dimensionality and contribute highly to accuracy. In this regard, the entropy based MI technique is a feature selection technique used in a variety of taxonomy problems such as image processing, cancer categorization, image recognition, and data mining. The MI feature selection technique was developed and used by [36] and [10] for feature selection. In this work, the MI technique is improved by modification (MMI) subject to accuracy and convergence rate. The cleansed and normalized data is passed through the MMI based feature selection phase to rank the inputs according to their information values. The ranked inputs are filtered using the irrelevancy and redundancy filters in order to remove irrelevant and redundant information. The subset of selected features contains the best and most relevant information, which contributes highly to accuracy. First, the existing MI feature selection technique is discussed; then, the MMI feature selection technique is discussed.
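For concreteness, the day-by-hour load matrix E of Equation 1 can be assembled from a flat hourly series before these pre-processing steps. A minimal sketch, assuming numpy and a hypothetical hourly series:

```python
import numpy as np

DAYS, HOURS = 1460, 24  # four years of hourly load, as in Equation 1

# Hypothetical flat hourly series of 1460 * 24 = 35,040 samples.
series = np.arange(DAYS * HOURS, dtype=float)

# E[y, x] = load of day x, hour y: rows index hours, columns index days,
# matching the layout of Equation 1.
E = series.reshape(DAYS, HOURS).T
```

The transpose puts hours on the rows and days on the columns, as described above.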

The joint entropy of two discrete random variables is defined as the information obtained while observing both discrete random variables at the same time. The mathematical description is as follows:

H(E, E^t) = -\sum_i \sum_j p(E_i, E_j^t) \log_2 p(E_i, E_j^t), \quad \forall i, j \in \{1, 2\},  (2)

where p(E_i, E_j^t) is the joint probability of the two discrete random variables, E_i is the input discrete random variable, and E_j^t is the target value. In feature selection, the information that is common to both variables is indispensable, which is formulated as in [36]:

MI(E, E^t) = \sum_i \sum_j p(E_i, E_j^t) \log_2 \frac{p(E_i, E_j^t)}{p(E_i)\, p(E_j^t)},  (3)

where MI(E_i, E_j^t) is used to find the mutual information between the two variables E_i and E_j^t. In this case, the candidate inputs are ranked by the MI between the input and the target value.

From the entropy based MI technique, the following three conclusions can be drawn:

• If MI(E_i, E_j^t) = 0, the discrete random variables E_i and E_j^t are irrelevant.

• If MI(E_i, E_j^t) has a large value, the discrete random variables E_i and E_j^t are highly relevant.

• If MI(E_i, E_j^t) has a small value, the discrete variables E_i and E_j^t are only lightly related.
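These conclusions can be checked by evaluating Equation 3 on a small joint probability table. A minimal sketch, assuming numpy; the two example tables are illustrative:

```python
import numpy as np

def mutual_information(p_joint):
    """MI(E, E^t) = sum_ij p(i,j) * log2( p(i,j) / (p_i(i) * p_j(j)) ),
    as in Equation 3, for a 2-D joint probability table."""
    p_i = p_joint.sum(axis=1, keepdims=True)   # marginal of the input
    p_j = p_joint.sum(axis=0, keepdims=True)   # marginal of the target
    mask = p_joint > 0                          # skip zero cells: 0*log(0)=0
    return float(np.sum(p_joint[mask] *
                        np.log2(p_joint[mask] / (p_i @ p_j)[mask])))

# Perfectly dependent binary variables -> MI = 1 bit (highly relevant).
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])
# Independent binary variables -> MI = 0 (irrelevant input).
independent = np.array([[0.25, 0.25], [0.25, 0.25]])
```

Here `mutual_information(dependent)` yields 1 bit and `mutual_information(independent)` yields 0, matching the first two conclusions above.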

In [36], the last value of every hour of the day among the training data samples is chosen as the target value. The target value, or last sample, is very close to the next day with respect to time and seems logical; however, it may cause serious forecast errors because the average behavior is ignored while forecasting. In [10], the authors used the average value in addition to the target value, because both are of equal importance. Equation 3 is modified for three variables as follows:

MI(E, E^t, E^n) = \sum_i \sum_j \sum_k p(E_i, E_j^t, E_k^n) \log_2 \frac{p(E_i, E_j^t, E_k^n)}{p(E_i)\, p(E_j^t)\, p(E_k^n)},  (4)

where E_k^n is the average value, which indicates the second target. However, the average value will be very low if some values in the selected features are very small. The addition of the average to the other two parameters is not sufficient, because it may cause serious prediction problems due to ignoring the moving average. Thus, Equation 3 is modified for four variables as follows:

MI(E, E^t, E^n, E^m) = \sum_i \sum_j \sum_k \sum_l p(E_i, E_j^t, E_k^n, E_l^m) \log_2 \frac{p(E_i, E_j^t, E_k^n, E_l^m)}{p(E_i)\, p(E_j^t)\, p(E_k^n)\, p(E_l^m)},  (5)

where the third target value E_l^m is the moving average. To expand Equation 5 for binary encoded information, a supplementary variable S_z is defined in Equation 6 to find the joint and individual probabilities:

S_z = 8E^t + 4E^m + 2E^n + E, \quad \forall E, E^t, E^n, E^m \in \{0, 1\},  (6)

where S_z \in \{0, 1, 2, 3, \dots, 15\}. Let S_z^k denote the number of samples for which S_z = k, i.e., S_z^0 counts the zeros, S_z^1 the ones, and so on up to S_z^{15}, and let L be the total number of samples. The joint and individual probabilities can then be found using Equation 6 as:

p(E=0) = (S_z^0 + S_z^2 + S_z^4 + S_z^6 + S_z^8 + S_z^{10} + S_z^{12} + S_z^{14}) / L
p(E=1) = (S_z^1 + S_z^3 + S_z^5 + S_z^7 + S_z^9 + S_z^{11} + S_z^{13} + S_z^{15}) / L
p(E^n=0) = (S_z^0 + S_z^1 + S_z^4 + S_z^5 + S_z^8 + S_z^9 + S_z^{12} + S_z^{13}) / L
p(E^n=1) = (S_z^2 + S_z^3 + S_z^6 + S_z^7 + S_z^{10} + S_z^{11} + S_z^{14} + S_z^{15}) / L
p(E^m=0) = (S_z^0 + S_z^1 + S_z^2 + S_z^3 + S_z^8 + S_z^9 + S_z^{10} + S_z^{11}) / L
p(E^m=1) = (S_z^4 + S_z^5 + S_z^6 + S_z^7 + S_z^{12} + S_z^{13} + S_z^{14} + S_z^{15}) / L
p(E^t=0) = (S_z^0 + S_z^1 + S_z^2 + S_z^3 + S_z^4 + S_z^5 + S_z^6 + S_z^7) / L
p(E^t=1) = (S_z^8 + S_z^9 + S_z^{10} + S_z^{11} + S_z^{12} + S_z^{13} + S_z^{14} + S_z^{15}) / L  (7)

Equations 5-7 are the MMI technique equations, which are used to find the mutual information between the four variables E, E^t, E^n, and E^m. The candidate inputs are ranked on the basis of this mutual information to remove irrelevant and redundant information. The MMI feature selection technique provides a two-fold benefit: a) selecting suitable and relevant features minimizes the forecast error, and b) selecting a subset of features improves the convergence rate. Before being fed to the training and forecasting module, the selected features are split into training, testing, and validation data samples for training and validation of the FCRBM. The selected subset of key features is given as input to the FCRBM based training and forecasting module.
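The counting shortcut behind Equations 6 and 7 can be sketched as follows. This is an illustrative sketch only: numpy and randomly generated binary series are assumed, the function name is hypothetical, and the bit weights are one consistent choice (E^t = 8, E^m = 4, E^n = 2, E = 1); the check confirms that the S_z counts reproduce direct frequency estimates.

```python
import numpy as np

def marginals_via_sz(E, Et, En, Em):
    """Encode four binary series into S_z in {0..15}, count each code in
    one pass, and read marginal probabilities off the counts."""
    L = len(E)
    Sz = 8 * Et + 4 * Em + 2 * En + E
    counts = np.bincount(Sz, minlength=16)
    # p(E=0): codes whose lowest bit is clear, i.e. the even codes.
    p_E0 = counts[0::2].sum() / L
    # p(E^t=0): codes 0..7, whose highest bit is clear.
    p_Et0 = counts[:8].sum() / L
    return p_E0, p_Et0

rng = np.random.default_rng(0)
E, Et, En, Em = (rng.integers(0, 2, 1000) for _ in range(4))
p_E0, p_Et0 = marginals_via_sz(E, Et, En, Em)
```

Joint probabilities such as p(E=0, E^t=0) follow the same pattern by intersecting the code sets before dividing by L.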

5.2.2 FCRBM based training and forecasting module

The aim of this module is to devise a framework that is enabled via learning to forecast the future electric load. In the literature, a wide variety of short-term load forecasting models, such as dynamic regression, transfer function, and AR heteroscedastic models, have been proposed. However, these models are only capable of linear predictions, while the behavior of load is non-linear and stochastic. To solve this problem, the authors of [85] and [86] used novel ANN based strategies for short-term load forecasting. These forecasting strategies are capable of handling the non-linear behavior of electric load and forecasting the future load. However, their performance is compromised with an increase in data size. Deep learning models such as RBM, conditional RBM (CRBM), and FCRBM perform better on large datasets. These models have a deep-layered layout, which has the ability to capture highly abstracted features. Thus, FCRBM is selected from the deep learning models to forecast the future electric load, because it provides high quality forecasting.

The training and forecasting module is based on FCRBM, which is the indispensable part of our proposed hybrid forecasting model. First, the architecture of the FCRBM model is determined. The model has four layers along with neurons, i.e., a hidden layer, a visible layer, a style layer, and a history layer. As discussed earlier, the FCRBM network must be trained and enabled via learning to forecast the future electric load. Generally, learning is of three types: supervised, unsupervised, and reinforced. Since our scenario uses historical load data, supervised learning is used. In the literature, many activation functions exist for supervised learning, such as sigmoid, hyperbolic tangent, ReLU, and softmax. We chose the ReLU activation function, shown in Equation 8, because it has faster convergence and also overcomes the vanishing gradient problem:

f(X, b) = \max(0, \beta(X, b)),
\Delta f(X, b) = \begin{cases} 1 & \text{if } \beta(X, b) \geq 0 \\ 0 & \text{otherwise,} \end{cases}  (8)

where X is the selected candidate input (see Section 5.2.1), b indicates the bias value, and \beta controls the steepness of the activation function. In Figure 2, w_v, w_y, and w_h are the weights of the corresponding layers, and A_v, A_u, A_y, B_h, B_u, and B_y are the connections of the corresponding layers to factors; these are also known as model free parameters. The \hat{a} and \hat{b} elements represent the dynamic biases associated with the visible and hidden layers, respectively.

The training and learning procedure iterates for a number of epochs to enable the network for forecasting. The FCRBM is thus enabled via training and learning to forecast the future electric load. Moreover, the performance metric, mean absolute percentage error (MAPE), is considered as the validation error, which is formulated as follows:

MAPE = \frac{1}{\tau} \left( \sum_{t=1}^{\tau} \frac{|T_t - F_t|}{|T_t|} \right) \times 100,  (9)

where T_t represents the actual load values, F_t indicates the forecasted load values, and \tau represents the number of days under consideration. Further details of the FCRBM working and learning activation function can be found in [28]. The output of this module is fed into the GWDO based optimization module to further improve forecast accuracy with an affordable convergence rate.
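Equation 9 translates directly into code. A minimal sketch, assuming numpy; the actual and forecasted values are hypothetical:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error of Equation 9:
    (1/tau) * sum_t |T_t - F_t| / |T_t| * 100."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / np.abs(actual)) * 100)

# Hypothetical loads: per-step errors of 10%, 5%, and 0% average to 5%.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
error = mape(actual, forecast)
```

Lower MAPE means better forecast accuracy, which is why it serves both as the validation error here and as the objective of the optimization module below.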

5.2.3 GWDO based optimization module

The objective of this module is to minimize the forecast error with an affordable convergence rate. The authors of [36] and [10] used DEA and MEDEA, respectively, to optimize the performance of their models. Both algorithms have a slow convergence rate and low precision, and can become trapped in local optima [79]. To remedy these problems, the GWDO algorithm is proposed, which is a hybrid of the WDO and GA algorithms [87]. The proposed algorithm benefits from the features of both algorithms: GA enables the diversity of the population, and WDO has faster convergence. The GWDO based module receives the forecasted load with some error, which is the minimum achievable by the FCRBM alone. This forecasting error can be further minimized with the proposed GWDO optimization technique. The sole objective of the GWDO based optimization module is to fine tune the adjustable parameters of the model to improve forecast accuracy with an affordable convergence rate. In other words, the optimization module is integrated with the forecasting module in order to minimize the error and improve the forecast accuracy. Thus, error minimization (MAPE) becomes the objective function of the optimization module, which is mathematically modeled as:

\min_{R_t,\, Irt} \; MAPE(j), \quad \forall j \in \{1, 2, 3, \dots, \tau\},  (10)

where R_t and Irt are the thresholds of redundancy and irrelevancy, respectively. The GWDO based optimization module feeds the optimized threshold values to the MMI based feature selection module to select key features from the given data. The integration of the optimization module into the forecasting model increases the execution time, which disturbs the convergence rate because of the tradeoff between execution time and convergence rate. The integration of the optimization module is therefore favorable for applications where forecast accuracy is of primary importance. Our proposed GWDO algorithm is preferred among the heuristic algorithms for the following reasons: (i) it avoids premature convergence and (ii) it has faster convergence.

The GWDO algorithm randomly produces a population, i.e., the position matrix (Equation 11) and the velocity matrix (Equation 12), as in [87]:

x_{new} = \begin{cases} 1 & \text{if } rand(1) \leq sig(j, i) \\ 0 & \text{if } rand(1) > sig(j, i) \end{cases}  (11)

v_i = v_{max} \times 2 \times (rand(population\_size, n) - 0.5)  (12)

The fitness functions for velocity and position are defined in Equations 13 and 14, because the position and velocity vectors are updated by comparing a random number (rand(·) ∈ [0,1]) with the fitness function (FF(·) ∈ [0,1]), as shown in Equation 15.

FF(v(i)) = MAPE(x_new(i)) / (MAPE(v(i)) + MAPE(x_new(i)))   (13)

FF(x_new(i)) = MAPE(v(i)) / (MAPE(x_new(i)) + MAPE(v(i)))   (14)

If the random number is less than the fitness function, the load value is updated, because our objective function is a minimization problem.

F_pr(i) = { v^n(i),       if rand(i) ≤ FF(v(i))
          { x_new^n(i),   if rand(i) ≤ FF(x_new(i))   (15)
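A minimal sketch of the fitness comparison of Equations 13-15 (the candidate labels and function names are illustrative assumptions):

```python
import numpy as np

def fitness(mape_self, mape_other):
    """Equations 13-14: a candidate's fitness is the rival's share of the total
    error, so a lower MAPE gives a fitness closer to 1."""
    return mape_other / (mape_self + mape_other)

def select(v_cand, x_cand, mape_v, mape_x, rng):
    """Equation 15: accept the velocity candidate when a uniform draw falls
    below its fitness, otherwise keep the position candidate."""
    return v_cand if rng.random() <= fitness(mape_v, mape_x) else x_cand

print(fitness(1.0, 3.0))  # -> 0.75
print(select("v", "x", 1.0, 3.0, np.random.default_rng(0)) in ("v", "x"))  # -> True
```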

A question arises here: why should the load update be influenced by a random value? We cure this problem by eliminating the influence of the random number on the load update; the comparison is now between the fitness function of the candidate input and that of the previous one, as shown in Equation 16. Thus, the selected load update value has higher accuracy.

F_{pr+1}(i) = { v^{n+1}(i),       if v^n(i)/v^n(i_max) ≤ FF(v(i))
             { x_new^{n+1}(i),   if x_new^n(i)/x_new^n(i_max) ≤ FF(x_new(i))   (16)

With the integration of the GWDO based optimization module, the accuracy is improved while the convergence is compromised, because there is a trade-off between accuracy and convergence rate. However, the proposed short-term load forecasting model outperforms the existing models, i.e., MI-ANN [36], Bi-level [37], and AFC-ANN [10], in terms of accuracy. This is because ANN based models have a shallow layout and their performance degrades as the data size increases, whereas the FCRBM performs well on large data sizes due to its deep layered layout.

5.3 The proposed system model III

In the literature, many authors used ANN based forecasters for load prediction due to their capability of handling the nonlinearity of consumer load. However, the performance of ANN based models is not satisfactory in terms of accuracy. Thus, some authors integrated an optimization module with the ANN based forecaster, which significantly improves the forecast accuracy. However, the accuracy is improved at the cost of a slow convergence rate. Moreover, ANN based models are suitable for small data sizes, while their performance degrades as the data size increases. Thus, we propose a new electric load forecasting model based on FCRBM [88], as shown in Figure 3. The proposed model is evaluated for accuracy, convergence rate, and scalability. The proposed system architecture comprises four modules: 1) the data processing and feature selection module, 2) the FCRBM based forecaster module, 3) the GWDO based optimizer module, and 4) the utilization module. The historical load data and exogenous parameters (temperature, humidity, wind speed, and dew point) are given as input to the data processing and feature selection module. The input data is normalized and passed through a relevancy filter, a redundancy filter, and a candidate interaction phase. The aim of this module is to clean the data and select abstractive features for the forecast process by maximizing relevancy, minimizing redundancy, and maximizing candidate interaction. The selected features are fed into the FCRBM based forecaster module, which predicts the future load of the FE grid. The forecasted load is fed into the GWDO based optimizer module, which improves the forecast accuracy. At last, the forecasted load is fed into the utilization module, where the predicted load profile is used for decision making in the SG. The detailed description is as follows:

5.3.1 Data processing and features selection module

The input data, including the historical load data and exogenous parameters (temperature, humidity, wind speed, and dew point), is fed into the data processing and feature selection module. First, data cleansing is performed to recover the missing and defective values. Then, the clean data is normalized using Equation 17 to remove the outliers and keep the data within the limits of the activation function:

Norm = (X − µ(X)) / std(X),   (17)

where Norm is the normalized data, X is the input data, µ is the mean, and std is the standard deviation. The input data X includes the electric load data P(h,d), temperature T(h,d), humidity H(h,d), dew point D(h,d), and wind speed W(h,d). Here h denotes a particular hour and d a particular day of the historical data. The temperature, humidity, dew point, and wind speed are called exogenous variables. The normalized data is passed through the irrelevancy filter, redundancy filter, and candidate interaction phase to remove irrelevant, redundant, and nonconstructive information, for three reasons: a) redundant information is not useful and increases the execution time during the training phase, b) irrelevant features do not provide useful information and act as outliers, and c) interacting candidates provide useful information that improves the forecast accuracy. The detailed description of relevancy, redundancy, and candidate interaction is as follows:
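Equation 17 is ordinary z-score normalization and can be sketched as:

```python
import numpy as np

def normalize(X):
    """Equation 17: Norm = (X - mean(X)) / std(X)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean()) / X.std()

norm = normalize([10.0, 20.0, 30.0])
print(round(float(norm.mean()), 6), round(float(norm.std()), 6))  # -> 0.0 1.0
```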

[Figure: block diagram of system model III. The inputs P(h,d), T(h,d), H(h,d), W(h,d), and D(h,d) enter the data processing and feature selection module (data cleansing, data normalization, and mutual information-based feature selection with irrelevancy filter, redundancy filter, and candidates interaction). The selected features S_1, S_2, S_3, ..., S_n feed the forecaster, whose error drives the GWDO based optimizer module (tuning, setting, and optimizing R_th, I_th, and C_i to minimize the error) through an iterative search procedure. The forecasted load P(h+1, d+1) is passed to the utilization module for decision making, generation planning, operation planning, load switching, energy purchasing, contract evaluation, and load scheduling.]

Figure 3. The proposed system model III

5.3.2 Relevancy operation

The relevance of the candidate inputs to the target variable is important for the selection of abstractive and key features. Many techniques are used in the literature for relevancy measurement [89], among which the MI based feature selection technique is a good choice. The MI measures the relevance between two variables x and y; it is interpreted as the information gained about y by observing x, and vice versa. The MI of continuous variables x and y is represented by I(x;y) and is defined over both the individual (p(x), p(y)) and joint probability distributions (p(x,y)). Assume that

S = {x_1, x_2, x_3, ..., x_M},   (18)

where S represents the set of candidate inputs and y is the target variable. The relevance of each candidate input x_i to the target variable y is checked, and is defined by the following equation:

D(x_i) = I(x_i; y),   (19)

where D(x_i) represents the relevance of each candidate input to the target variable.
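A plug-in histogram estimate of the mutual information in Equation 19 might look as follows (the bin count and estimator choice are our assumptions; a library estimator would normally be used):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(x; y) in nats for the relevancy measure D(x_i)
    of Equation 19."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
# A variable is far more relevant to itself than to independent noise.
print(mutual_information(x, x) > mutual_information(x, rng.normal(size=2000)))  # -> True
```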

5.3.3 Redundancy operation

Many authors in [90]-[92] modeled the redundancy operation between the candidate inputs. The purpose is to remove the redundant information from the input data to improve the convergence rate. The redundancy is evaluated in terms of the information common to two candidate inputs. In [89], the authors demonstrated that closely related candidate inputs degrade the performance of a feature selection technique. The reason is that two such candidate inputs share a lot of common information while carrying little non-redundant information about the target variable. Thus, a variable with little redundant information about the target variable may be incorrectly counted as highly redundant and filtered out, even though it may be a key feature for the forecaster. To overcome this problem, a redundancy measure based on interaction gain (Ig) is proposed [93] as:

RM(x_i, x_s) = Ig(x_i; x_s; y) = I[(x_i, x_s); y] − I(x_i; y) − I(x_s; y),   (20)

where RM(x_i, x_s) is the redundancy measure, x_i and x_s are candidate inputs, and y is the target variable. The Ig can be mathematically modeled in terms of joint and individual entropies as:

Ig(x_i; x_s; y) = H(x_i, x_s) + H(x_i, y) + H(x_s, y) − H(x_i) − H(x_s) − H(y) − H(x_i, x_s, y),   (21)

where H(x_i), H(x_s), and H(y) denote individual entropies and H(x_i, x_s), H(x_i, y), H(x_s, y), and H(x_i, x_s, y) denote joint entropies.
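Equation 21 can be evaluated directly from plug-in entropy estimates. The sketch below (the histogram bin count is an assumption) reproduces the classic XOR case, where two individually uninformative inputs jointly determine the target:

```python
import numpy as np

def entropy(*cols, bins=4):
    """Plug-in joint entropy estimate (nats) of one or more variables."""
    hist, _ = np.histogramdd(np.column_stack(cols), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def interaction_gain(xi, xs, y, bins=4):
    """Equation 21 written out in joint and individual entropies."""
    return (entropy(xi, xs, bins=bins) + entropy(xi, y, bins=bins)
            + entropy(xs, y, bins=bins) - entropy(xi, bins=bins)
            - entropy(xs, bins=bins) - entropy(y, bins=bins)
            - entropy(xi, xs, y, bins=bins))

# XOR: each input alone tells nothing about y; together they determine it.
rng = np.random.default_rng(0)
xi = rng.integers(0, 2, 4000).astype(float)
xs = rng.integers(0, 2, 4000).astype(float)
y = np.logical_xor(xi, xs).astype(float)
print(interaction_gain(xi, xs, y) > 0)  # -> True
```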

5.3.4 Interaction session

The authors in [93] used redundancy and irrelevancy filters for feature selection. However, individual features may be irrelevant in isolation but become relevant when used together with other input candidates. Thus, the feature selection technique can be extended to interactions among the candidate inputs. If two candidate inputs x_i and x_s have redundant information about the target y, then the joint MI of both candidates with y will be less than the sum of the individual MIs. Thus, the result of Equation 20 will be negative, which indicates that x_i and x_s are redundant features for the forecaster; the absolute value of Equation 20 shows the amount of redundancy. On the other hand, if the candidate inputs x_i and x_s interact with the target y, their interaction makes the joint MI with y greater than the sum of the individual MIs. Thus, a positive value of Equation 20 indicates interacting features, and its absolute value shows the amount of interaction. Consequently, Equation 20 can be split into a redundancy measure and an interaction measure in terms of the interaction gain (Ig):

RM(x_i, x_s) = { Ig(x_i; x_s; y),   if Ig(x_i; x_s; y) < 0
               { 0,                 otherwise   (22)

In(x_i, x_s) = { Ig(x_i; x_s; y),   if Ig(x_i; x_s; y) > 0
              { 0,                 otherwise   (23)

where Equation 22 is the modified version of Equation 20 for the redundancy measure and Equation 23 is for the interaction measure. The overall interaction measure of a candidate input x_i is then:

IM(x_i) = max_{x_j ∈ S−{x_i}} In(x_i, x_j)   (24)
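The sign-based split of Equations 22-23 reduces to a few lines:

```python
def split_interaction_gain(ig):
    """Equations 22-23: a negative interaction gain counts as redundancy,
    a positive one as interaction; the absolute value is the amount."""
    redundancy = ig if ig < 0 else 0.0
    interaction = ig if ig > 0 else 0.0
    return redundancy, interaction

print(split_interaction_gain(-0.3))  # -> (-0.3, 0.0)
print(split_interaction_gain(0.2))   # -> (0.0, 0.2)
```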

5.3.5 The modiﬁed feature selection technique

The purpose of this modified feature selection technique is to maximize both relevancy and interaction and to minimize redundancy, based on the filters introduced in the preceding subsections. Our modified feature selection technique also considers candidate interaction, while the existing techniques, i.e., [90]-[93], only consider the relevancy and redundancy filters. The flow chart of our modified feature selection technique is shown in Figure 4. The detailed description and step-by-step procedure are as follows:

[Figure: flow chart of the modified feature selection technique. The candidate set S^p = {x_1^p, x_2^p, ..., x_M^p} and target y are input; for each x_i ∈ S, the relevancy measure (Equation 19) and interaction measure (Equation 24) are computed and the candidates are sorted by the information content Ic(x_i) = f(D(x_i), IM(x_i)) = D(x_i) + α·IM(x_i), α > 0. The candidates then pass through the filtering stage (Figure 5) and the post-filtering stage (Figure 6), which partition them into S_s and S_n until S_n is empty, and the finally selected candidates S_n are returned.]

Figure 4. Flow chart of the modified feature selection technique

Step 1: The input data, including the candidate set of inputs and the target variable y, is given as input to the technique.

Step 2: The pre-filtering phase is demonstrated as follows:

• The blocks enclosed in the dotted box form the pre-filtering phase. In this phase, the relevancy and interaction measures are calculated and the candidate inputs are ranked on the basis of the calculated measures.

• The information content of a candidate is measured from its individual information and the gained information, using the combined measure Ic(x_i) = f(D(x_i), IM(x_i)) = D(x_i) + α·IM(x_i) shown in the flow chart of Figure 4. The function f(·,·) is a monotonically increasing function, and α is a weight factor that weighs the relevancy versus the interaction measure; it can be adjusted and fine tuned for the forecasting problem at hand.

• The selected candidate inputs of the pre-filtering phase (S^p) are sorted in descending order of information content.

Step 3: The filtering phase, individually depicted in Figure 5, is described as follows:

• The output of the pre-filtering phase is fed as input to the filtering phase. In this step, the pre-selected features are partitioned into selected (S_s) and non-selected (S_n) features as

[Figure: flow chart of the filtering phase. For each candidate x_i^p in the pre-filtered set S^p, the redundancy R(x_i^p) = min RM(x_i^p, x_j^p) and the information value V(x_i^p) = D(x_i^p) + α·IM(x_i^p) + β·R(x_i^p), α, β > 0, are computed; V(x_i^p) is compared with R_th to place the candidate in S_s or S_n, the loop repeats until i > M, and the sets S_s and S_n, sorted by V(x_i^p), together with their union S_u, are returned.]

Figure 5. Flow chart of the ﬁltering phase

shown in Figure 4. The redundancy measure is calculated by Equation 25 as:

R(x_i^p) = min_{x_j^p ∈ S^p} { RM(x_i^p, x_j^p) },   (25)

where R(x_i^p) indicates the redundancy measure for each candidate input x_i^p ∈ S^p.

• The information value of each candidate feature is evaluated on the basis of three measures, i.e., redundancy, relevancy, and interaction, which is mathematically described as:

V(x_i^p) = g{D(x_i^p), IM(x_i^p), R(x_i^p)} = D(x_i^p) + α·IM(x_i^p) + β·R(x_i^p),   α, β > 0,   (26)

where V(x_i^p) denotes the information value, g(·,·) indicates a monotonically increasing linear function, and β denotes an adjustable parameter.

• The decision about the information value is taken as follows:

If V(x_i^p) > R_th → S_s = S_s ∪ {x_i^p}
If V(x_i^p) ≤ R_th → S_n = S_n ∪ {x_i^p},   (27)

where R_th is the redundancy threshold. The information value is compared with the threshold: if it is greater than R_th, the candidate is put into the set of selected features (S_s); otherwise it is put into the set of non-selected features (S_n).

• The sets of selected and non-selected features are sorted in descending order of information value and their union is taken. The selected and non-selected feature sets and their union are given as input to the post-filtering stage, which is individually depicted in Figure 6.

Step 4: In the post-filtering phase, the selected (S_s) and non-selected (S_n) inputs are modified and the information value V(·) is updated. The updated information values are evaluated again using Equation 27 to transfer candidate inputs either to the selected or to the non-selected features.

Step 5: The algorithm terminates when the non-selected feature set S_n becomes empty. The pre-filtering, filtering, and post-filtering phases are executed in each iteration, and the execution never gets trapped in an infinite loop. Finally, the selected features are fed into the forecaster module.
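The thresholding decision at the heart of the filtering phase (Equation 27) can be sketched as follows, with hypothetical feature names and values:

```python
def select_features(values, r_th):
    """Equation 27's decision: candidates whose information value V exceeds the
    threshold R_th join the selected set S_s; the rest go to S_n."""
    selected = {k for k, v in values.items() if v > r_th}
    return selected, set(values) - selected

# Hypothetical information values for three candidate inputs.
s_s, s_n = select_features({"load": 0.9, "temp": 0.6, "wind": 0.2}, r_th=0.5)
print(sorted(s_s), sorted(s_n))  # -> ['load', 'temp'] ['wind']
```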

[Figure: flow chart of the post-filtering stage. With S_s and S_n as input, the stage iterates over the selected features, maximizes the interaction In(x_i, x_j) over the remaining candidates, recomputes the information value V(x_i^s) and redundancy R(x_i^s) of each feature, compares the updated values against R_th, and moves features between S_s^k and S_n^k accordingly until the sets stabilize, finally returning S_s and S_n.]

Figure 6. Flow chart of the post-ﬁltering stage

5.3.6 FCRBM based forecaster module

The purpose of this module is to devise a framework that learns to forecast the future electric load. From Section 2, it is concluded that all the surveyed forecast models are capable of predicting a nonlinear electric load profile. We chose FCRBM for the forecaster module for two reasons: a) it predicts the nonlinear electric load with reasonable accuracy and convergence rate, and b) its performance improves with the scale of the data. The FCRBM is a deep learning model with four layers, i.e., a hidden layer, a visible layer, a style layer, and a history layer; each layer has a particular number of neurons. In the forecaster module, the FCRBM is activated by the ReLU activation function and trained with the multivariate autoregressive algorithm. The ReLU and the multivariate autoregressive algorithm are chosen because they overcome the problems of overfitting and vanishing gradient and converge faster than the alternatives. The ReLU activation is defined as ReLU(x) = max(0, x). The training and learning procedure iterates for a number of epochs to forecast the future load. To update the weight and bias vectors during training, authors have used different algorithms, i.e., gradient descent with back-propagation [94], the Levenberg-Marquardt algorithm [36], and the multivariate autoregressive algorithm [10]. The Levenberg-Marquardt algorithm trains the network faster than gradient descent and back-propagation; the multivariate autoregressive algorithm is used here for network training due to its fast convergence and better performance. The selected features S_1, S_2, S_3, ..., S_n of the data processing module are fed into the forecaster module, where the forecaster constructs training and testing data samples. The first three years of data samples are used for network training, while the last year of data samples is used for testing. The purpose is to enable the FCRBM, via training, to forecast the future load. A pictorial view of the training and learning process is shown in Figure 7. The forecaster module returns an error signal, and the weights and biases are adjusted as per the multivariate autoregressive algorithm [95]. This error signal is fed into the optimization module to further improve the forecast accuracy.
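The chronological train/test split described above can be sketched as follows (the 3:1 hourly layout is inferred from the text):

```python
import numpy as np

def chronological_split(data, train_fraction=0.75):
    """Chronological split: the first three of four years train the network,
    the last year tests it (the fraction parameter is an assumption)."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

data = np.arange(8760 * 4)  # four years of hourly load samples
train, test = chronological_split(data)
print(len(train) // 8760, len(test) // 8760)  # -> 3 1
```

A chronological split (rather than a random one) preserves the temporal ordering that the history layer of the FCRBM depends on.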

[Figure: the input is fed to the FCRBM, whose initial forecast is compared with the real load; the resulting error signal drives the training process, which updates the FCRBM until it produces the final forecast.]

Figure 7. Training and learning process of FCRBM

5.3.7 GWDO based optimizer module

The preceding module returns the predicted future load with some error, which is the minimum achievable by the FCRBM, ReLU, and multivariate autoregressive algorithm. To further minimize the forecast error, the output of the forecaster module is fed into the optimizer module, whose purpose is to minimize the forecast error. Thus, error minimization becomes the objective function of the optimizer module and can be mathematically modeled as:

min_{R_th, I_th, C_i} Error(x)   ∀ x ∈ {h, d},   (28)

where R_th is the redundancy threshold, I_th is the irrelevancy threshold, and C_i is the candidate interaction. The optimizer module is based on our proposed GWDO algorithm: it optimizes R_th, I_th, and C_i and feeds these parameters back to the data processing module, where the feature selection technique uses the optimized thresholds R_th and I_th and the candidate interaction C_i for optimal feature selection. The integration of the optimizer module with the forecaster module increases the forecast accuracy at the cost of higher execution time; it is usually preferred for applications where accuracy is more important than convergence rate. Various optimization techniques are available, such as linear programming, non-linear programming, convex programming, quadratic programming, and heuristic techniques. Linear programming is avoided because the optimization problem is non-linear. Non-linear programming is applicable and returns more accurate results, but at the cost of long execution times. Convex optimization and heuristic optimization suffer from slow and premature convergence, respectively. Similarly, mEDE [10], [82] and DE [36] are not adopted because of slow convergence, low precision, and trapping in local optima [96]. To cure these problems, we propose the GWDO algorithm; in other words, GWDO is preferred because it provides an optimal solution with a fast convergence rate. The proposed GWDO algorithm is a hybrid of GA [97] and WDO [79], which is beneficial because it utilizes the key characteristics of both algorithms: the GA enables population diversity and WDO converges fast. The forecasted future load is then utilized in the utilization module for planning, operation, and unit commitment.
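The optimizer loop of Equation 28 can be sketched as below; plain random search stands in for the actual GA/WDO hybrid, and the toy error surface is our own assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

def tune_thresholds(error_of, iterations=50):
    """Random-search stand-in for GWDO: sample (R_th, I_th, C_i) in [0, 1)
    and keep the triple with the lowest forecast error (Equation 28)."""
    best, best_err = None, float("inf")
    for _ in range(iterations):
        candidate = rng.random(3)
        err = error_of(candidate)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err

# Toy error surface with its minimum at (0.5, 0.5, 0.5).
best, err = tune_thresholds(lambda c: float(np.sum((c - 0.5) ** 2)))
print(err < 0.2)  # -> True
```

In the actual model, `error_of` would re-run feature selection and the FCRBM forecast for each candidate triple, which is why this integration raises execution time.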

5.3.8 Utilization module

The forecasted load is utilized for the long-term planning and development of the SG, which requires state permits, financing, rights of way, transmission and generation equipment, power lines (transmission and distribution lines), and substation construction.

6 Proposed methods

In this section, two deep learning techniques, CRBM and stacked FCRBM, are introduced for short-term load forecasting. For both models, three ingredients are described: the error function, the conditional probability, and the update/learning rules. The error function of a given network provides the scalar values that are essential for its configuration. The conditional probability calculates the probability of an event under a specific condition. The update/learning rules are required for tuning the free parameters of the system.

6.1 CRBM

CRBM [98] is a modified version of the RBM [99]. It is a probabilistic machine learning model used to model human activities, weather data, collaborative filtering, classification, and time-series data [100]. Much progress has been made on training non-conditional RBMs, which is not applicable to the conditional model, and almost no work has been done on training and prediction with CRBMs. The CRBM has a three-layered architecture: a visible layer, a hidden layer, and a conditional history layer, as shown in Figure 8. It defines the probability distribution of one layer conditioned on the two remaining layers, and it allows the conditional history layer to determine the increments of the visible and hidden layer biases and weights, respectively. Its three ingredients, i.e., the error function, the conditional probability, and the learning rules, are described as follows:

[Figure: the CRBM layers v (visible), u (history), and h (hidden), connected by the weight matrices w^{uv} and w^{uh}, with biases a and b.]

Figure 8. Generic architecture of CRBM

6.1.1 Error function

The error function expresses the possible correlation between the input, the conditional history layer, the hidden layer, and the output. In addition, the error function takes into account all possible interactions between neurons, weights, and biases. Equation 29 computes the error function as:

E(v, u, h; w) = −(v^T w^{vh} h + u^T w^{uv} v + u^T w^{uh} h + v^T a + h^T b)   (29)

where v = [v_1, v_2, ..., v_n] is the real-valued vector of visible neurons 1 to n, u = [u_1, u_2, ..., u_n] is the real-valued vector of history neurons 1 to n, h = [h_1, h_2, ..., h_n] is the binary vector of hidden neurons 1 to n, w is the weight matrix, a is the visible layer bias, and b is the hidden layer bias. The weight matrix w^{vh} is bidirectional, while the weight matrices w^{uh} and w^{uv} are unidirectional.
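Equation 29 translates directly into a few lines of linear algebra (the dimensions and random values are illustrative):

```python
import numpy as np

def crbm_energy(v, u, h, w_vh, w_uv, w_uh, a, b):
    """Equation 29: CRBM error (energy) function over visible v, history u,
    and hidden h, with weight matrices w and biases a, b."""
    return float(-(v @ w_vh @ h + u @ w_uv @ v + u @ w_uh @ h + v @ a + h @ b))

rng = np.random.default_rng(0)
n = 4
v, u = rng.normal(size=n), rng.normal(size=n)           # real-valued layers
h = rng.integers(0, 2, size=n).astype(float)            # binary hidden layer
w_vh, w_uv, w_uh = rng.normal(size=(3, n, n))
a, b = rng.normal(size=n), rng.normal(size=n)
print(np.isfinite(crbm_energy(v, u, h, w_vh, w_uv, w_uh, a, b)))  # -> True
```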

6.1.2 Conditional probability

In the case of the CRBM, the conditional probability determines the probability distribution over two inferences. The first inference, p(h|v,u), determines the probability of the hidden layer conditioned on all the other layers, while the second inference, p(v|h,u), determines the probability of the visible layer conditioned on all the other layers. In the CRBM there are no intra-layer connections between neurons of the same layer, but inter-layer connections between neurons of different layers exist. The two inferences lead to:

p(h|v, u) = sigmoid(u^T w^{uh} + v^T w^{vh} + b)   (30)

p(v|h, u) = sigmoid(w^{uv} u^T + w^{vh} h + a)   (31)

where sigmoid(x) = 1 / (1 + exp(−x)).
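The inference step of Equation 30 can be sketched as follows (the shapes and names are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    """Logistic function used in Equations 30-31."""
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden_given(v, u, w_uh, w_vh, b):
    """Equation 30: activation probabilities of the hidden units given v and u."""
    return sigmoid(u @ w_uh + v @ w_vh + b)

rng = np.random.default_rng(0)
n = 4
p = p_hidden_given(rng.normal(size=n), rng.normal(size=n),
                   rng.normal(size=(n, n)), rng.normal(size=(n, n)),
                   rng.normal(size=n))
print(((p > 0) & (p < 1)).all())  # -> True, valid probabilities per hidden unit
```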

6.1.3 Weights and biases learning and update rules

We use the stochastic gradient descent method for learning and updating the weights and biases of the layers, because alternate methods sometimes suffer from the vanishing gradient problem, which makes the network hard to train. The parameters are fine tuned to minimize the gap between the real and forecast values. The gradients of the weights are calculated by the following equation:

Δw^{uh}_t = −η ∂E/∂w^{uh}
Δw^{uv}_t = −η ∂E/∂w^{uv}
Δw^{vh}_t = −η ∂E/∂w^{vh}   (32)

For each layer, the change in biases is calculated by the following equation:

Δa_t = −η ∂E/∂a_v
Δb_t = −η ∂E/∂b_h   (33)

The weights are updated as follows:

w^{uh}_{t+1} = w^{uh}_t + Δw^{uh}_t
w^{uv}_{t+1} = w^{uv}_t + Δw^{uv}_t
w^{vh}_{t+1} = w^{vh}_t + Δw^{vh}_t   (34)

The biases are updated by the following equation:

a_{t+1} = a_t + Δa_t
b_{t+1} = b_t + Δb_t   (35)

where η is the learning rate, Δ denotes the gradient step, and t is the iteration number. The aforementioned procedure is repeated over the epochs until the model converges.
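Equations 32-35 together form one plain stochastic-gradient step per parameter, sketched here over a dictionary of scalar parameters for brevity:

```python
def sgd_step(params, grads, eta=0.01):
    """Equations 32-35: each parameter takes one step against its gradient,
    w <- w + Δw with Δw = -eta * dE/dw."""
    return {k: params[k] - eta * grads[k] for k in params}

updated = sgd_step({"w_vh": 1.0, "a": 0.5}, {"w_vh": 2.0, "a": -1.0})
print(round(updated["w_vh"], 2), round(updated["a"], 2))  # -> 0.98 0.51
```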

6.2 Stacked FCRBM

FCRBM is an extension of the CRBM introduced by Taylor and Hinton in [100]. In FCRBM [103], they add the concepts of factors and styles to mimic multiple human actions (as shown in Figure 9). Its contrastive divergence training does not suffer from the vanishing gradient issue found in back-propagation. It has a rich, distributed hidden state which permits simple and exact inference, helping to preserve the temporal information present in the electricity load time series [101]. We propose a new way to adopt the deep learning technique of stacked FCRBM for short-term load forecasting, where successive layers take the output from the preceding trained layers and improve the forecast accuracy. The stacked FCRBM has three CRBM layers and an additional style layer (as shown in Figure 10). The last style layer represents multiple parameters that are important for load forecasting.

[Figure: the FCRBM layers v, u, h, and y with their weights w^v, w^h, w^y, biases a and b, and factor connections A^v, A^u, A^y, B^h, B^u, and B^y.]

Figure 9. Generic architecture of stacked FCRBM

The stacked FCRBM comprises four layers, as shown in Figure 9: a) the visible layer v, b) the history layer u, c) the hidden layer h, and d) the style layer y. The visible and history layers are real-valued, while the hidden layer is binary. These layers are significant for the proper operation of the stacked FCRBM. The visible layer is responsible for encoding the present time series data to forecast the future value, while the history layer encodes the historical time series data. The hidden layer is responsible for discovering the significant features required for analysis. The different styles and parameters essential for forecasting are embedded into the style layer. The relations and interactions between the layers, weights, and factors are expressed by an error function as:

E(v, u, h; w) = −v^T â − h^T b̂ − Σ{(v^T w^v) ◦ (y^T w^y) ◦ (h^T w^h)}   (36)

where E is the error function, v^T w^v is the factored visible term, y^T w^y is the factored style term, and h^T w^h is the factored hidden term; ◦ is the Hadamard product, used for element-wise multiplication. The â

[Figure: the input features pass through CRBM layer 1, CRBM layer 2, CRBM layer 3, and the style layer to produce the predicted load.]

Figure 10. Architecture of proposed stacked FCRBM

and b̂ elements represent the dynamic biases associated with the visible and hidden layers, respectively, which are defined as follows:

â = a + A^v (u^T A^u ◦ y^T A^y)^T
b̂ = b + B^h (u^T B^u ◦ y^T B^y)^T   (37)

where w^v, w^y, and w^h are the weights of the corresponding layers, and A^v, A^u, A^y, B^h, B^u, and B^y are the connections of the corresponding layers to the factors; these are also known as the free parameters of the model. The connections and weights are the parameters that must be fine tuned and trained for accurate performance of the stacked FCRBM deep learning technique.
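Equation 36 can be sketched as below; the factor dimension n_f and the random values are illustrative, and the dynamic biases of Equation 37 are taken as given inputs:

```python
import numpy as np

def fcrbm_energy(v, h, y, w_v, w_y, w_h, a_hat, b_hat):
    """Equation 36: the three factor projections are combined with a Hadamard
    (element-wise) product and summed; the history layer enters only through
    the dynamic biases a_hat and b_hat of Equation 37, taken here as inputs."""
    factored = np.sum((v @ w_v) * (y @ w_y) * (h @ w_h))
    return float(-(v @ a_hat) - (h @ b_hat) - factored)

rng = np.random.default_rng(0)
n_v, n_h, n_y, n_f = 4, 3, 2, 5          # layer and factor sizes (illustrative)
v, y = rng.normal(size=n_v), rng.normal(size=n_y)
h = rng.integers(0, 2, size=n_h).astype(float)
w_v = rng.normal(size=(n_v, n_f))
w_y = rng.normal(size=(n_y, n_f))
w_h = rng.normal(size=(n_h, n_f))
a_hat, b_hat = rng.normal(size=n_v), rng.normal(size=n_h)
print(np.isfinite(fcrbm_energy(v, h, y, w_v, w_y, w_h, a_hat, b_hat)))  # -> True
```

The shared factor dimension n_f is what makes the model "factored": the three layers meet in a common low-dimensional product rather than in a full three-way weight tensor.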

6.3 Conditional probability

In the case of the stacked FCRBM, the conditional probability determines the probability distribution of one layer conditioned on all the remaining layers. In the first case, we define the probability distribution of the hidden layer conditioned on all the remaining layers, p(h|v,u,y). There are no intra-layer connections between neurons of the same layer, only inter-layer connections between neurons of different layers. The conditional probability of the hidden layer is calculated as:

p(h|v, u, y) = ReLU[b̂ + w^h(v^T w^v ◦ y^T w^y)]   (38)

where ReLU(x) = max(0, x).

For all inputs, the probability of the hidden layer neurons is evaluated using the ReLU activation function. In the second case, we determine the probability of the visible layer conditioned on the remaining layers, p(v|h,u,y), defined as:

p(v|h, u, y) = ReLU[â + w^v{h^T w^h ◦ y^T w^y}]   (39)

Finally, we define the joint probability distribution of the visible and hidden layer neurons conditioned on the history layer, the style layer, and the model parameters, p(v,h|u,y,...). The restriction is that there are no intra-layer connections between neurons, only inter-layer connections between neurons of different layers. The joint probability is calculated as:

p(v, h|u, y, ...) = ReLU[b̂ + w^h{(v^T w^v) ◦ (y^T w^y)}] ∗ ReLU[â + w^v{(h^T w^h) ◦ (y^T w^y)}]   (40)

Equation 40 represents the joint probability distribution of the visible and hidden layer neurons.

Algorithm 1 Pseudo-code of the proposed short-term load forecasting model

1: Import the off-line data of the US utility

2: Restore the defective and missing values by data cleansing phase

3: Normalize the data w.r.t. its maximum value by data normalization phase

4: Change the data structure by data structuring phase

5: Extract the desired features from the data and split into training, testing, and validation

datasets

6: Create architecture of the CRBM

7: Create architecture of the stacked FCRBM

8: Initialize parameters: learning rate η, weights w, and biases b

9: repeat for the number of training epochs

10: if Model selected is CRBM do

11: for available training data do

12: Adjust the visible layer v

13: Adjust the history layer u

14: Adjust the hidden layer h

15: Create weights and biases of the corresponding layers

16: Convolve weights to the corresponding layers

17: Add dynamic biases to the weighted sum

18: Process the results (using Equations 30-31)

19: Calculate the error function of CRBM

20: Passed the error function through stochastic gradient decent (using Equations 32-33)

21: Update the weights and biases (using Equations 34-35)

22: end for

23: else if Pick stacked FCRBM do

24: for available training data do

25: Adjust the visible layer v

26: Adjust the history layer u

27: Adjust the hidden layer h

28: Adjust the style layer y

29: Create the factored visible, factored hidden, and factored label weights

30: Interact factored weights to the corresponding layers

31: Add dynamic biases to the factor weighted sum

32: Passed the result to the activation function: ReLU (using Equation 36)

33: Calculate the error function for stacked FCRBM

34: Passed the error function through stochastic gradient decent (using Equations 41-44)

35: Update the weights and biases (using Equations 45-48)

36: end for

37: end if

38: until converge
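The data-preparation steps 1-5 of Algorithm 1 can be sketched as follows. The sample values, the mean-imputation choice for defective readings, and the 70/15/15 split ratio are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Sketch of steps 1-5 of Algorithm 1: data cleansing, normalization
# w.r.t. the maximum value, and a train/validation/test split.
def preprocess(load):
    load = np.asarray(load, dtype=float)
    # Data cleansing: replace missing/defective readings with the
    # mean of the remaining valid readings (an assumed strategy).
    bad = ~np.isfinite(load)
    load[bad] = load[np.isfinite(load)].mean()
    # Data normalization: divide by the maximum value.
    return load / load.max()

def split(data, train=0.7, valid=0.15):
    # Chronological split into training, validation, and testing sets.
    n = len(data)
    i, j = int(n * train), int(n * (train + valid))
    return data[:i], data[i:j], data[j:]

# Illustrative hourly load readings with one missing value.
hourly_load = [310.0, 295.5, float("nan"), 402.1, 388.0,
               500.0, 455.2, 470.9, 430.3, 410.0]
clean = preprocess(hourly_load)
tr, va, te = split(clean)
print(clean.max())                 # 1.0 after normalization
print(len(tr), len(va), len(te))   # 7 1 2
```

Feature selection (step 5) would then operate on the cleaned, normalized series before the split.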

6.4 Stacked FCRBM weights and biases learning rules

We adopt stochastic gradient descent for the learning and update rules to overcome the vanishing gradient problem. Moreover, stochastic gradient descent converges faster and avoids overfitting on large datasets as compared to the batch gradient descent and mini-batch gradient descent algorithms [102]. The gradient of the weights for each layer is calculated as:

\Delta w^{h}_{t} = -\eta \frac{\partial E}{\partial w^{h}}, \quad \Delta w^{v}_{t} = -\eta \frac{\partial E}{\partial w^{v}}, \quad \Delta w^{y}_{t} = -\eta \frac{\partial E}{\partial w^{y}} \qquad (41)

For each layer, the gradients of the connections are calculated as follows:

\Delta A^{u}_{t} = -\eta \frac{\partial E}{\partial A^{u}}, \quad \Delta A^{v}_{t} = -\eta \frac{\partial E}{\partial A^{v}}, \quad \Delta A^{y}_{t} = -\eta \frac{\partial E}{\partial A^{y}} \qquad (42)

\Delta B^{u}_{t} = -\eta \frac{\partial E}{\partial B^{u}}, \quad \Delta B^{h}_{t} = -\eta \frac{\partial E}{\partial B^{h}}, \quad \Delta B^{y}_{t} = -\eta \frac{\partial E}{\partial B^{y}} \qquad (43)

The gradients of the dynamic biases are as follows:

\Delta \hat{a} = -\eta \frac{\partial E}{\partial v}, \quad \Delta \hat{b} = -\eta \frac{\partial E}{\partial h} \qquad (44)

The weights of the corresponding layers are updated as:

w^{h}_{t+1} = w^{h}_{t} + \Delta w^{h}_{t}, \quad w^{y}_{t+1} = w^{y}_{t} + \Delta w^{y}_{t}, \quad w^{v}_{t+1} = w^{v}_{t} + \Delta w^{v}_{t} \qquad (45)

The connections are updated as follows:

A^{u}_{t+1} = A^{u}_{t} + \Delta A^{u}_{t}, \quad A^{v}_{t+1} = A^{v}_{t} + \Delta A^{v}_{t}, \quad A^{y}_{t+1} = A^{y}_{t} + \Delta A^{y}_{t} \qquad (46)

B^{u}_{t+1} = B^{u}_{t} + \Delta B^{u}_{t}, \quad B^{h}_{t+1} = B^{h}_{t} + \Delta B^{h}_{t}, \quad B^{y}_{t+1} = B^{y}_{t} + \Delta B^{y}_{t} \qquad (47)

The dynamic biases are updated as follows:

\hat{a}_{t+1} = \hat{a}_{t} + \Delta \hat{a}_{t}, \quad \hat{b}_{t+1} = \hat{b}_{t} + \Delta \hat{b}_{t} \qquad (48)

where Equation 45 is the weight update equation for each layer, Equations 46 and 47 are the connection update equations, and Equation 48 is the dynamic bias update equation. The pseudo-code of our proposed short-term load forecasting model is given in Algorithm 1.
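As a minimal numerical sketch of Equations 41 and 45, the following applies the gradient step Δw_t = −η ∂E/∂w followed by the update w_{t+1} = w_t + Δw_t to a single weight matrix. The quadratic error function and the dimensions are illustrative assumptions, not the FCRBM's actual reconstruction error; the same update pattern applies to the connections and dynamic biases in Equations 42-44 and 46-48:

```python
import numpy as np

# Sketch of the gradient step (Equation 41) and weight update
# (Equation 45) for one factored weight matrix under an assumed
# quadratic error E = ||w - target||^2, whose gradient is 2(w - target).
eta = 0.1                              # learning rate eta (assumed value)
rng = np.random.default_rng(1)
w = rng.standard_normal((5, 3))        # a factored weight matrix w_t
target = np.zeros((5, 3))              # illustrative optimum

for epoch in range(100):
    grad = 2.0 * (w - target)          # dE/dw for the assumed error
    delta_w = -eta * grad              # Equation (41): gradient step
    w = w + delta_w                    # Equation (45): weight update

print(float(np.abs(w).max()) < 1e-3)   # True: w has converged to the target
```

In the full model, the gradient of the error with respect to each weight matrix, connection, and dynamic bias is computed per training sample, which is what makes the procedure stochastic rather than batch gradient descent.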

7 Research methodology

The main objective of this research work is to design an accurate and fast-converging model based on a deep neural network for decision making in the SG. Thus, a novel hybrid forecast model composed of the MMI, FCRBM, and GWDO techniques is proposed for short-term electric load forecasting. The aforementioned techniques are arranged in a coordinated modular framework to construct the proposed hybrid model. Furthermore, the proposed model is tested on hourly historical load data of three USA grids (FE, Dayton, and EKPC) and of the Global Energy Forecasting Competition 2012. The results of the proposed model have proven more accurate when compared to existing models such as ANN, CNN, Bi-level, mutual information-based ANN (MI-ANN), and accurate fast-converging ANN (AFC-ANN). Subsequently, the main body of this research work will involve the following topics:

1. Acquiring the basic knowledge and detailed literature survey

2. Research gap analysis

3. Statement of the problem

4. Investigation of the proposed system model

5. Provision of the requirements for proposed methods evaluation

6. Simulation results and discussion study

7. Comparison with existing literature models

8. Thesis write-up

References

[1] Javaid, Nadeem, Ghulam Hafeez, Sohail Iqbal, Nabil Alrajeh, Mohamad Souheil Alabed,

and Mohsen Guizani. “Energy efﬁcient integration of renewable energy sources in the smart

grid for demand side management." IEEE Access 6 (2018): 77077-77096.

[2] Xiao, Liye, Wei Shao, Chen Wang, Kequan Zhang, and Haiyan Lu. “Research and applica-

tion of a hybrid model based on multi-objective optimization for electrical load forecasting."

Applied Energy 180 (2016): 213-233.

[3] Alahakoon, Damminda, and Xinghuo Yu. “Smart electricity meter data intelligence for

future energy systems: A survey." IEEE Transactions on Industrial Informatics 12, no. 1

(2016): 425-436.

[4] Hernandez, Luis, Carlos Baladron, Javier M. Aguiar, Belén Carro, Antonio J. Sanchez-

Esguevillas, Jaime Lloret, and Joaquim Massana. “A survey on electric power demand fore-

casting: future trends in smart grids, microgrids and smart buildings." IEEE Communica-

tions Surveys & Tutorials 16, no. 3 (2014): 1460-1495.

[5] Rahman, Aowabin, Vivek Srikumar, and Amanda D. Smith. “Predicting electricity con-

sumption for commercial and residential buildings using deep recurrent neural networks."

Applied Energy 212 (2018): 372-385.

[6] Boroojeni, Kianoosh G., M. Hadi Amini, Shahab Bahrami, S. S. Iyengar, Arif I. Sarwat,

and Orkun Karabasoglu. “A novel multi-time-scale modeling for electric power demand

forecasting: From short-term to medium-term horizon." Electric Power Systems Research

142 (2017): 58-73.

[7] Xu, Xiaomin, Dongxiao Niu, Qiong Wang, Peng Wang, and Desheng Dash Wu. “Intelli-

gent forecasting model for regional power grid with distributed generation." IEEE Systems

Journal 11, no. 3 (2017): 1836-1845.

[8] Dedinec, Aleksandra, Sonja Filiposka, Aleksandar Dedinec, and Ljupco Kocarev. “Deep

belief network based electricity load forecasting: An analysis of Macedonian case." Energy

115 (2016): 1688-1700.

[9] Hong, Wei-Chiang. “Electric load forecasting by seasonal recurrent SVR (support vector

regression) with chaotic artiﬁcial bee colony algorithm." Energy 36, no. 9 (2011): 5568-

5578.

[10] Ahmad, Ashfaq, Nadeem Javaid, Mohsen Guizani, Nabil Alrajeh, and Zahoor Ali Khan.

"An accurate and fast converging short-term load forecasting model for industrial appli-

cations in a smart grid." IEEE Transactions on Industrial Informatics 13, no. 5 (2017):

2587-2596.

[11] Hahn, Heiko, Silja Meyer-Nieberg, and Stefan Pickl. “Electric load forecasting methods:

Tools for decision making." European journal of operational research 199, no. 3 (2009):

902-907.

[12] Taylor, James W. “An evaluation of methods for very short-term load forecasting using

minute-by-minute British data." International Journal of Forecasting 24, no. 4 (2008): 645-

658.

[13] De Felice, Matteo, and Xin Yao. “Short-term load forecasting with neural network ensem-

bles: A comparative study [application notes]." IEEE Computational Intelligence Magazine

6, no. 3 (2011): 47-56.

[14] Pedregal, Diego J., and Juan R. Trapero. “Mid-term hourly electricity forecasting based on

a multi-rate approach." Energy Conversion and Management 51, no. 1 (2010): 105-111.

[15] Filik, Ümmühan Ba¸saran, Ömer Nezih Gerek, and Mehmet Kurban. “A novel modeling

approach for hourly forecasting of long-term electric energy demand." Energy Conversion

and Management 52, no. 1 (2011): 199-211.

[16] López, M., S. Valero, C. Senabre, J. Aparicio, and A. Gabaldon. “Application of SOM

neural networks to short-term load forecasting: The Spanish electricity market case study."

Electric Power Systems Research 91 (2012): 18-27.

[17] Zjavka, Ladislav, and Václav Snášel. “Short-term powerload forecasting with ordinary dif-

ferential equation substitutions of polynomial networks." Electric Power Systems Research

137 (2016): 113-123.

[18] Liu, Dunnan, Long Zeng, Canbing Li, Kunlong Ma, Yujiao Chen, and Yijia Cao. “A dis-

tributed short-term load forecasting method based on local weather information." IEEE Sys-

tems Journal 12, no. 1 (2018): 208-215.

[19] Ghadimi, Noradin, Adel Akbarimajd, Hossein Shayeghi, and Oveis Abedinia. “Two stage

forecast engine with feature selection technique and improved meta-heuristic algorithm for

electricity load forecasting." Energy 161 (2018): 130-142.

[20] Kong, Weicong, Zhao Yang Dong, David J. Hill, Fengji Luo, and Yan Xu. “Short-term

residential load forecasting based on resident behaviour learning." IEEE Transactions on

Power Systems 33, no. 1 (2018): 1087-1088.

[21] Vrablecova, Petra, Anna Bou Ezzeddine, Viera Rozinajová, Slavomír Šárik, and Arun Ku-

mar Sangaiah. “Smart grid load forecasting using online support vector regression." Com-

puters & Electrical Engineering 65 (2018): 102-117.

[22] González, Jose Portela, Antonio Munoz San Roque, and Estrella Alonso Perez. “Forecast-

ing functional time series with a new Hilbertian ARMAX model: Application to electricity

price forecasting." IEEE Transactions on Power Systems 33, no. 1 (2018): 545-556.

[23] Luo, Jian, Tao Hong, and Shu-Cherng Fang. “Benchmarking robustness of load forecasting

models under data integrity attacks." International Journal of Forecasting 34, no. 1 (2018):

89-104.

[24] Marino, Daniel L., Kasun Amarasinghe, and Milos Manic. "Building energy load fore-

casting using deep neural networks." In Industrial Electronics Society, IECON 2016-42nd

Annual Conference of the IEEE, pp. 7046-7051. IEEE, 2016.

[25] Zeng, Nianyin, Hong Zhang, Weibo Liu, Jinling Liang, and Fuad E. Alsaadi. "A switching

delayed PSO optimized extreme learning machine for short-term load forecasting." Neuro-

computing 240 (2017): 175-182.

[26] Cecati, Carlo, Janusz Kolbusz, Pawel Rozycki, Pierluigi Siano, and Bogdan M. Wilam-

owski. "A novel RBF training algorithm for short-term electric load forecasting and com-

parative studies." IEEE Transactions on industrial Electronics 62, no. 10 (2015): 6519-6529.

[27] Mocanu, Elena, Decebal Constantin Mocanu, Phuong H. Nguyen, Antonio Liotta, Michael

E. Webber, Madeleine Gibescu, and Johannes G. Slootweg. “On-line building energy op-

timization using deep reinforcement learning." IEEE Transactions on Smart Grid (2018).

1-11

[28] Mocanu, Elena, Phuong H. Nguyen, Madeleine Gibescu, and Wil L. Kling. "Deep learning

for estimating building energy consumption." Sustainable Energy, Grids and Networks 6

(2016): 91-99.

[29] Mujeeb, Sana, and Nadeem Javaid. "ESAENARX and DE-RELM: Novel Schemes for Big

Data Predictive Analytics of Electricity Load and Price." Sustainable Cities and Society

(2019).

[30] Kim, Myoungsoo, Wonik Choi, Youngjun Jeon, and Ling Liu. "A Hybrid Neural Network

Model for Power Demand Forecasting." Energies 12, no. 5 (2019): 931.

[31] Huang, Yunyou, Nana Wang, Tianshu Hao, Wanling Gao, Cheng Huang, Jianqing Li,

and Jianfeng Zhan. "LoadCNN: A Efﬁcient Green Deep Learning Model for Day-ahead

Individual Resident Load Forecasting." arXiv preprint arXiv:1908.00298 (2019).

[32] Deng, Zhuofu, Binbin Wang, Yanlu Xu, Tengteng Xu, Chenxu Liu, and Zhiliang Zhu.

"Multi-Scale Convolutional Neural Network With Time-Cognition for Multi-Step Short-

Term Load Forecasting." IEEE Access 7 (2019): 88058-88071.

[33] Mocanu, Elena, Decebal Constantin Mocanu, Phuong H. Nguyen, Antonio Liotta, Michael

E. Webber, Madeleine Gibescu, and J. G. Slootweg. "On-line building energy optimization

using deep reinforcement learning." arXiv preprint arXiv:1707.05878 (2017).

[34] Dedinec, Aleksandra, Sonja Filiposka, Aleksandar Dedinec, and Ljupco Kocarev. "Deep

belief network based electricity load forecasting: An analysis of Macedonian case." Energy

115 (2016): 1688-1700.

[35] Fan, Cheng, Fu Xiao, and Yang Zhao. "A short-term building cooling load prediction

method using deep learning algorithms." Applied energy 195 (2017): 222-233.

[36] Amjady, Nima, Farshid Keynia, and Hamidreza Zareipour. "Short-term load forecast of

microgrids by a new bilevel prediction strategy." IEEE Transactions on smart grid 1, no. 3

(2010): 286-294.

[37] Amjady, Nima, and Farshid Keynia. "Day-ahead price forecasting of electricity markets

by mutual information technique and cascaded neuro-evolutionary algorithm." IEEE Trans-

actions on Power Systems 24, no. 1 (2009): 306-318.

[38] Dedinec, Aleksandra, Sonja Filiposka, Aleksandar Dedinec, and Ljupco Kocarev. "Deep

belief network based electricity load forecasting: An analysis of Macedonian case." Energy

115 (2016): 1688-1700.

[39] Ryu, Seunghyoung, Jaekoo Noh, and Hongseok Kim. "Deep neural network based demand

side short term load forecasting." Energies 10, no. 1 (2016): 3.

[40] Qiu, Xueheng, Ye Ren, Ponnuthurai Nagaratnam Suganthan, and Gehan AJ Amaratunga.

"Empirical mode decomposition based ensemble deep learning for load demand time series

forecasting." Applied Soft Computing 54 (2017): 246-255.

[41] Liu, Dunnan, Long Zeng, Canbing Li, Kunlong Ma, Yujiao Chen, and Yijia Cao. “A dis-

tributed short-term load forecasting method based on local weather information." IEEE Sys-

tems Journal 12, no. 1 (2018): 208-215.

[42] Khalid, Rabiya, Nadeem Javaid, Muhammad Hassan Rahim, Sheraz Aslam, and Arshad

Sher. “Fuzzy energy management controller and scheduler for smart homes." Sustainable

Computing: Informatics and Systems 21 (2019): 103-118.

[43] Iqbal, Sajid, Muhammad U. Ghani Khan, Tanzila Saba, Zahid Mehmood, Nadeem Javaid,

Amjad Rehman, and Rashid Abbasi. "Deep learning model integrating features and novel

classiﬁers fusion for brain tumor segmentation." Microscopy research and technique (2019).

[44] Javaid, Nadeem, Fahim Ahmed, Ibrar Ullah, Samia Abid, Wadood Abdul, Atif Alamri,

and Ahmad Almogren. "Towards cost and comfort based hybrid optimization for residential

load scheduling in a smart grid." Energies 10, no. 10 (2017): 1546.

[45] Khan, Zahoor Ali, Ayesha Zafar, Sakeena Javaid, Sheraz Aslam, Muhammad Hassan

Rahim, and Nadeem Javaid. "Hybrid meta-heuristic optimization based home energy man-

agement system in smart grid." Journal of Ambient Intelligence and Humanized Computing

(2019): 1-17.

[46] Khan, Zahoor Ali, Ayesha Zafar, Sakeena Javaid, Sheraz Aslam, Muhammad Hassan

Rahim, and Nadeem Javaid. "Hybrid meta-heuristic optimization based home energy man-

agement system in smart grid." Journal of Ambient Intelligence and Humanized Computing

(2019): 1-17.

[47] Aslam, Sheraz, Zafar Iqbal, Nadeem Javaid, Zahoor Khan, Khursheed Aurangzeb, and

Syed Haider. "Towards efﬁcient energy management of smart buildings exploiting heuristic

optimization with real time and critical peak pricing schemes." Energies 10, no. 12 (2017):

2065.

[48] Collotta, Mario, and Giovanni Pau. “An innovative approach for forecasting of energy

requirements to improve a smart home management system based on BLE." IEEE Transac-

tions on Green Communications and Networking 1, no. 1 (2017): 112-120.

[49] Shi, Heng, Minghao Xu, and Ran Li. “Deep learning for household load forecasting—a

novel pooling deep RNN." IEEE Transactions on Smart Grid 9, no. 5 (2018): 5271-5280.

[50] Kong, Weicong, Zhao Yang Dong, David J. Hill, Fengji Luo, and Yan Xu. “Short-term

residential load forecasting based on resident behaviour learning." IEEE Transactions on

Power Systems 33, no. 1 (2018): 1087-1088.

[51] Huang, Xuefei, Seung Ho Hong, and Yuting Li. “Hour-ahead price based energy manage-

ment scheme for industrial facilities." IEEE Transactions on Industrial Informatics 13, no.

6 (2017): 2886-2898.

[52] Li, Liangzhi, Kaoru Ota, and Mianxiong Dong. “When weather matters: IoT-based elec-

trical load forecasting for smart grid." IEEE Communications Magazine 55, no. 10 (2017):

46-51.

[53] Wang, Yu, Yinxing Shen, Shiwen Mao, Guanqun Cao, and Robert M. Nelms. “Adaptive

learning hybrid model for solar intensity forecasting." IEEE Transactions on Industrial In-

formatics 14, no. 4 (2018): 1635-1645.

[54] Tang, Ningkai, Shiwen Mao, Yu Wang, and R. M. Nelms. “Solar Power Generation Fore-

casting with a LASSO-based Approach." IEEE Internet of Things Journal (2018). 1-10

[55] van der Meer, D. W., J. Munkhammar, and J. Widén. “Probabilistic forecasting of solar

power, electricity consumption and net load: Investigating the effect of seasons, aggregation

and penetration on prediction intervals." Solar Energy 171 (2018): 397-413.

[56] Zhang, Jinliang, Yi-Ming Wei, Dezhi Li, Zhongfu Tan, and Jianhua Zhou. “Short term

electricity load forecasting using a hybrid model." Energy (2018).

[57] Tong, Chao, Jun Li, Chao Lang, Fanxin Kong, Jianwei Niu, and Joel JPC Rodrigues. “An

efﬁcient deep model for day-ahead electricity load forecasting with stacked denoising auto-

encoders." Journal of Parallel and Distributed Computing 117 (2018): 267-273.

[58] Wang, Pu, Bidong Liu, and Tao Hong. “Electric load forecasting with recency effect: A

big data approach." International Journal of Forecasting 32, no. 3 (2016): 585-597.

[59] Carvallo, Juan Pablo, Peter H. Larsen, Alan H. Sanstad, and Charles A. Goldman. “Long

term load forecasting accuracy in electric utility integrated resource planning." Energy Pol-

icy 119 (2018): 410-422.

[60] Yuan, Jihui, Craig Farnham, Chikako Azuma, and Kazuo Emura. “Predictive artiﬁcial neu-

ral network models to forecast the seasonal hourly electricity consumption for a University

Campus." Sustainable Cities and Society 42 (2018): 82-92.

[61] Brodowski, Stanisław, Andrzej Bielecki, and Maciej Filocha. “A hybrid system for fore-

casting 24-h power load proﬁle for Polish electric grid." Applied Soft Computing 58 (2017):

527-539.

[62] Qiu, Xueheng, Ye Ren, Ponnuthurai Nagaratnam Suganthan, and Gehan AJ Amaratunga.

“Empirical mode decomposition based ensemble deep learning for load demand time series

forecasting." Applied Soft Computing 54 (2017): 246-255.

[63] Yang, Ailing, Weide Li, and Xuan Yang. “Short-term electricity load forecasting based on

feature selection and Least Squares Support Vector Machines." Knowledge-Based Systems

163 (2019): 159-173.

[64] Qiu, Xueheng, Ponnuthurai Nagaratnam Suganthan, and Gehan AJ Amaratunga. “Ensem-

ble incremental learning Random Vector Functional Link network for short-term electric

load forecasting." Knowledge-Based Systems 145 (2018): 182-196.

[65] Chen, Yanhua, Marius Kloft, Yi Yang, Caihong Li, and Lian Li.“Mixed kernel based ex-

treme learning machine for electric load forecasting." Neurocomputing 312 (2018): 90-106.

[66] Zeng, Nianyin, Hong Zhang, Weibo Liu, Jinling Liang, and Fuad E. Alsaadi. “A switching

delayed PSO optimized extreme learning machine for short-term load forecasting." Neuro-

computing 240 (2017): 175-182.

[67] Zhang, Xiaobo, Jianzhou Wang, and Kequan Zhang. “Short-term electric load forecast-

ing based on singular spectrum analysis and support vector machine optimized by Cuckoo

search algorithm." Electric Power Systems Research 146 (2017): 270-285.

[68] Chen, Yibo, Hongwei Tan, and Umberto Berardi. “Day-ahead prediction of hourly elec-

tric demand in non-stationary operated commercial buildings: A clustering-based hybrid

approach." Energy and Buildings 148 (2017): 228-237.

[69] Guo, Zhifeng, Kaile Zhou, Xiaoling Zhang, and Shanlin Yang. “A deep learning model for

short-term power load and probability density forecasting." Energy 160 (2018): 1186-1200.

[70] Ghadimi, Noradin, Adel Akbarimajd, Hossein Shayeghi, and Oveis Abedinia. “Two stage

forecast engine with feature selection technique and improved meta-heuristic algorithm for

electricity load forecasting." Energy 161 (2018): 130-142.

[71] Li, Yanying, Jinxing Che, and Youlong Yang. “Subsampled support vector regression en-

semble for short term electric load forecasting." Energy 164 (2018): 160-170.

[72] Jawad, Muhammad, Sahibzada M. Ali, Bilal Khan, Chaudry A. Mehmood, Umar Farid,

Zahid Ullah, Saeeda Usman et al. “Genetic algorithm-based non-linear auto-regressive with

exogenous inputs neural network short-term and medium-term uncertainty modelling and

prediction for electrical load and wind speed." The Journal of Engineering 2018, no. 8

(2018): 721-729.

[73] Manjili, Yashar Sahraei, Rolando Vega, and Mo M. Jamshidi. “Data-Analytic-Based Adap-

tive Solar Energy Forecasting Framework." IEEE Systems Journal 12, no. 1 (2018): 285-

296.

[74] Semero, Yordanos Kassa, Jianhua Zhang, and Dehua Zheng. “PV power forecasting us-

ing an integrated GA-PSO-ANFIS approach and Gaussian process regression based feature

selection strategy." CSEE Journal of Power and Energy Systems 4, no. 2 (2018): 210-218.

[75] Liang, Yi, Dongxiao Niu, and Wei-Chiang Hong. "Short term load forecasting based on

feature extraction and improved general regression neural network model." Energy 166

(2019): 653-663.

[76] Devarajan, Sandhiya, and S. Chitra. "LOAD FORECASTING MODEL FOR ENERGY

MANAGEMENT SYSTEM USING ELMAN NEURAL NETWORK." International Re-

search Journal of Multidisciplinary Technovation 1, no. 5 (2019): 48-56.

[77] Bianchini, Monica, and Franco Scarselli. "On the complexity of neural network classi-

ﬁers: A comparison between shallow and deep architectures." IEEE transactions on neural

networks and learning systems 25, no. 8 (2014): 1553-1565.

[78] Mhaskar, Hrushikesh, Qianli Liao, and Tomaso Poggio. "When and why are deep net-

works better than shallow ones?." In Thirty-First AAAI Conference on Artiﬁcial Intelli-

gence. 2017.

[79] Bao, Zongfan, Yongquan Zhou, Liangliang Li, and Mingzhi Ma. “A hybrid global opti-

mization algorithm based on wind driven optimization and differential evolution." Mathe-

matical Problems in Engineering 2015 (2015). 608-620

[80] Zhang, Qingchen, Laurence T. Yang, Zhikui Chen, and Peng Li. "A survey on deep learn-

ing for big data." Information Fusion 42 (2018): 146-157.

[81] Kim, Junhong, Jihoon Moon, Eenjun Hwang, and Pilsung Kang. "Recurrent inception

convolution neural network for multi short-term load forecasting." Energy and Buildings

194 (2019): 328-341.

[82] Hafeez, Ghulam, Noor Islam, Ammar Ali, Salman Ahmad, Muhammad Usman, and Khur-

ram Saleem Alimgeer. "A Modular Framework for Optimal Load Scheduling under Price-

Based Demand Response Scheme in Smart Grid." Processes 7, no. 8 (2019): 499.

[83] https://www.kaggle.com/c/GEFC-2012 (accessed on 17 April 2019).

[84] Ghulam Hafeez, Nadeem Javaid, Muhammad Riaz, Ammar Ali, Khalid Umar, and Zafar

Iqbal “Day ahead electric load forecasting by an intelligent hybrid model based on deep

learning for smart grid”, 13th International Conference on Complex, Intelligent, and Soft-

ware Intensive System pp. 276-290. Springer, Cham, 2019.

[85] Abedinia, Oveis, Nima Amjady, and Hamidreza Zareipour. “A new feature selection tech-

nique for load and price forecast of electrical power systems." IEEE Transactions on Power

Systems 32, no. 1 (2017): 62-74.

[86] Khwaja, A. S., M. Naeem, A. Anpalagan, A. Venetsanopoulos, and B. Venkatesh. “Im-

proved short-term load forecasting using bagged neural networks." Electric Power Systems

Research 125 (2015): 109-115.

[87] Hafeez, Ghulam, Nadeem Javaid, Sohail Iqbal, and Farman Khan. “Optimal residential

load scheduling under utility and rooftop photovoltaic units." Energies 11, no. 3 (2018):

611.

[88] Ghulam Hafeez, Nadeem Javaid, Muhmammad Riaz, and Zafar Iqbal. “An innovative

model based on FCRBM for load forecasting in the smart grid.” In International Confer-

ence on Innovative Mobile and Internet Services in Ubiquitous Computing, pp. 276-290.

Springer, Cham, 2019.

[89] Kwak, Nojun, and Chong-Ho Choi. "Input feature selection for classiﬁcation problems."

IEEE transactions on neural networks 13, no. 1 (2002): 143-159.

[90] Latham, Peter E., and Sheila Nirenberg. “Synergy, redundancy, and independence in pop-

ulation codes, revisited." Journal of Neuroscience 25, no. 21 (2005): 5195-5206.

[91] Peng, Hanchuan, Fuhui Long, and Chris Ding. “Feature selection based on mutual infor-

mation criteria of max-dependency, max-relevance, and min-redundancy." IEEE Transac-

tions on pattern analysis and machine intelligence 27, no. 8 (2005): 1226-1238.

[92] Estévez, Pablo A., Michel Tesmer, Claudio A. Perez, and Jacek M. Zurada. “Normalized

mutual information feature selection." IEEE Transactions on Neural Networks 20, no. 2

(2009): 189-201.

[93] Amjady, Nima, and Farshid Keynia. “A new prediction strategy for price spike forecasting

of day-ahead electricity markets." Applied Soft Computing 11, no. 6 (2011): 4246-4256.

[94] Engelbrecht, A.P. Computational Intelligence: An Introduction, 2nd ed.; John Wiley &

Sons: New York, NY, USA, 2007.

[95] Anderson, Charles W., Erik A. Stolz, and Sanyogita Shamsunder. “Multivariate autore-

gressive models for classiﬁcation of spontaneous electroencephalographic signals during

mental tasks." IEEE Transactions on Biomedical Engineering 45, no. 3 (1998): 277-286.

[96] Bao, Zongfan, Yongquan Zhou, Liangliang Li, and Mingzhi Ma. “A hybrid global opti-

mization algorithm based on wind driven optimization and differential evolution." Mathe-

matical Problems in Engineering 2015 (2015): 620-635.

[97] Man, Kim-Fung, Kit-Sang Tang, and Sam Kwong. “Genetic algorithms: concepts and

applications [in engineering design]." IEEE transactions on Industrial Electronics 43, no. 5

(1996): 519-534.

[98] V. Mnih, H. Larochelle, G. Hinton, Conditional restricted Boltzmann machines for struc-

tured output prediction, in: Proceedings of the International Conference on Uncertainty in

Artiﬁcial Intelligence (2011).

[99] Hinton, Geoffrey E. "A practical guide to training restricted Boltzmann machines." In

Neural networks: Tricks of the trade, pp. 599-619. Springer, Berlin, Heidelberg, 2012.

[100] G.W. Taylor, G.E. Hinton, S.T. Roweis, Two distributed-state models for generating high-

dimensional time series, J. Mach. Learn. Res. 12 (2011) 1025–1068.

[101] Mocanu, Elena, Phuong H. Nguyen, Madeleine Gibescu, Emil Mahler Larsen, and Pierre

Pinson. "Demand forecasting at low aggregation levels using factored conditional restricted

boltzmann machine." In 2016 Power Systems Computation Conference (PSCC), pp. 1-7.

IEEE, 2016.

[102] Introduction to various types of gradient descent. https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/ (last accessed: August 13, 2019).

[103] Mocanu, Decebal Constantin, Haitham Bou Ammar, Dietwig Lowet, Kurt Driessens,

Antonio Liotta, Gerhard Weiss, and Karl Tuyls. "Factored four way conditional restricted

boltzmann machines for activity recognition." Pattern Recognition Letters 66 (2015): 100-

108.

Tentative Time Table

Sr No. Activity Date

1 Background study and detailed literature review Completed

2 Formulation of problem and proposing solution August

3 Analysis and dissemination of results November

4 Thesis writing December

PART II

Recommendation by the Research Supervisor

Name_________________________Signature_____________________Date________

Recommendation by the Research Co-Supervisor

Name_________________________Signature_____________________Date________

Signed by Supervisory Committee

S.# Name of Committee member Designation Signature & Date

1

2

3

4

Approved by Departmental Advisory Committee

Certified that the synopsis has been seen by the members of the DAC and is considered suitable for putting up to BASAR.

Secretary

Departmental Advisory Committee

Name: _____________________________

Signature: _____________________________

Date: _____________________________

Chairman/HoD: ____________________________

Signature: _____________________________

Date: _____________________________

PART III

Dean, Faculty of Engineering

_____________________Approved for placement before BASAR.

_____________________Not Approved on the basis of following reasons

Signature_____________________Date________

Secretary BASAR

_____________________Approved for placement before BASAR.

_____________________Not Approved on the basis of following reasons

Signature_____________________Date________

Dean, Faculty of Engineering

________________________________________________________________________

________________________________________________________________________

________________________________________________________________________

Signature_____________________Date________

Please provide the list of courses studied

1. Power Transmission and Distribution

2. Advanced Power System Analysis

3. Special Topics in Computer Networks

4. Smart Grid System Operation

5. Advanced Topics in Optical Communication

6. Antennas Theory Design and Applications