Research Proposal

Title of Research Proposal Electric load forecasting based on deep learning for the decision making in smart grid

COMSATS University Islamabad, Islamabad Campus
Synopsis for the Degree of M.S/M.Phil./Ph.D.
PART-1
Name of Student: Ghulam Hafeez
Department: Electrical and Computer Engineering
Registration No.: FA17-PEE-001
Research Supervisor: Dr. Khurram Saleem Alimgeer
Co-Supervisor: Dr. Nadeem Javaid
Research Area:
Members of Supervisory Committee: 1. 2. 3. 4.
Title of Research Proposal: Electric load forecasting based on deep learning for decision making in the smart grid
Signature of Student:
Summary of the research
Accurate electric load forecasting is indispensable due to its application in the decision making and operation of the smart grid (SG). With accurate electric load forecasting, operators can develop an optimal market plan that enhances the economic benefits of energy management.
Therefore, developing a model that provides accurate and precise load forecasting is a significant goal for both scholars and industry. Several forecasting strategies have been proposed in the literature, ranging from legacy time-series methods to contemporary data-analytic models. However, the performance of single-technique forecasting models is not satisfactory due to their inherent limitations. Hybrid models, on the other hand, fully utilize the advantages of the individual techniques and offer improved performance. Some of these models perform better in terms of accuracy, while others perform well in convergence rate; however, both forecast accuracy and convergence rate can still be improved. In this synopsis, a short-term
electric load forecasting model is proposed. The proposed model is a hybrid model composed of
data pre-processing and feature selection module, training and forecasting module, optimization
module, and utilization module. The data pre-processing and feature selection module is based on
modified mutual information (MMI) technique, an improved version of the mutual information technique, which is used to select abstracted features from historical data. The training and forecasting
module is based on factored conditional restricted Boltzmann machine (FCRBM), which is a deep
learning model, enabled via learning to forecast the future electric load. The optimization module
is based on our proposed genetic wind driven optimization (GWDO) algorithm, which is used to
fine tune the adjustable parameters of the model. The forecasting results are utilized in the decision
making of the SG. The proposed model is tested on historical electric load data of three USA grids
(FE, EKPC, and Daytown) and global energy forecasting competition 2012. The data is taken from
publicly available PJM electricity market and Kaggel repository. To verify the effectiveness of the
proposed model, three existing models (MI-ANN, Bi-Level, and AFC-ANN) are used. Two per-
formance metrics, i.e., accuracy (mean absolute percentage error (MAPE), root mean square error
(RMSE), and correlation coefficient) and convergence rate are used for performance evaluation of
the proposed model.
1 Introduction
The smart grid (SG) has emerged as an intelligent power system that has recently gained considerable popularity [1]. Although a variety of novel research work has appeared in the field of electric load forecasting, more accurate and robust electric load forecast models are still needed. An accurate estimate of the variation in future electric load is of great importance for
both electric utility companies and consumers due to its application in the decision making and operation of the power grid [2]. However, the major obstacles in future electric load forecasting
are the various influencing factors such as variable climate, temperature, humidity, occupancy
patterns, calendar indicators, and social conventions.
Mapping these influencing factors to load variations is extremely cumbersome due to the stochastic and non-linear behavior of consumers. In fact, data acquisition has not been an easy task. The emergence of advanced metering infrastructure (AMI), communication technologies, and sensing methods enables us to record, monitor, and analyze the impact of these influencing factors on electric load [3]. However, data handling is still a challenging problem due to non-linear and stochastically varying weather conditions. In the literature, both classical (time-series) methods and computational intelligence methods are applied for electric load forecasting [4]. Both classes of methods have their own limitations. The classical methods
are blamed for their limited ability to handle non-linear data. On the other hand, computational
intelligence methods are criticized for problems like handcrafted features, limited learning capacity, impotent learning, inaccurate appraisal, and insufficient guiding significance. Some existing machine learning models applied to electric load forecasting partially resolve the aforementioned problems and show improved performance due to their ingenious design [5].
A suitable mechanism is required to solve these problems completely, because low forecast accuracy results in huge economic loss: a one percent increase in forecast error can cause a 10-million increase in the overall utility cost. Therefore, electric utility companies are trying
to develop a fast, accurate, and robust short-term electric load forecasting model. Moreover,
accurate forecasting can also be beneficial for the detection of potential faults and reliable grid
operation. Over the last two decades, numerous load forecasting models have been developed due to their application in the decision making of the power grid. Boroojeni et al. proposed a
generalized method to model off-line data that have different seasonal cycles (e.g., daily, weekly,
quarterly, and annually). Both seasonal and non-seasonal load cycles are modeled individually
with the help of auto-regressive and moving-average (ARMA) components [6]. Xiaomin Xu
et al. investigated ensemble subsampled support vector regression (SSVR) for forecasting and
estimation of load [7]. A deep belief network restricted Boltzmann machine (RBM) is used
for electric load forecasting. The network reduced the forecast error with affordable execution
time [8]. Hong et al. forecast the electric load of Southeast China with the help of a hybrid model based on a seasonal recurrent support vector regression (SVR) model and the chaotic artificial bee colony algorithm (CABCA). The performance of the model is validated by comparing it with the auto-regressive integrated moving average (ARIMA) model [9]. These references provide a
good study for future electric load forecasting. However, the load of a power grid is more volatile, with higher-frequency and sharper variations, than the load of a microgrid. Moreover, the aforementioned studies have not considered the connection between data pre-processing and feature selection, between control parameters and performance parameters, and network training methods. There is also a need to integrate an optimization module into the forecasting model for outstanding performance. The main contributions are described as follows:
1. A novel hybrid forecast model composed of modified mutual information (MMI), con-
ditional RBM (CRBM) and factored CRBM (FCRBM), and genetic wind driven opti-
mization (GWDO) techniques is proposed for short-term electric load forecasting. The aforementioned techniques are arranged in a coordinated modular framework to construct the proposed hybrid model.
2. Based on the existing mutual information (MI) technique [10], a new MMI technique
for feature selection is proposed (Section 5.3). The proposed MMI technique ranks the candidate inputs according to their information value and selects key features from the data to overcome the curse of dimensionality.
3. A deep learning technique FCRBM is adapted, which is enabled via learning to forecast
the future electric load.
4. A GWDO algorithm is proposed, which is a hybrid of the genetic algorithm (GA) and the wind driven optimization (WDO) algorithm. The proposed algorithm has a powerful global search capability and a fast convergence rate.
5. The adjustable parameters of both the data pre-processing and feature selection module and the training and forecasting module are fine tuned by our proposed GWDO algorithm. The purpose is to optimize the performance of the proposed model.
6. The proposed model is tested on historical hourly load data of three USA grids (FE, Dayton, and EKPC) and the Global Energy Forecasting Competition 2012 dataset. The results of the proposed model have proven more accurate than those of existing models such as ANN, CNN, Bi-level, MI-artificial neural network (MI-ANN), and accurate fast converging-ANN (AFC-ANN).
The remaining sections of this synopsis are arranged in the following manner: related work
is presented in Section 2. Research gap analysis is presented in Section 3. In Section 4, the
statement of the problem is discussed. The proposed system models are demonstrated in Section
5. The proposed methods are presented in Section 6. Research methodology is provided at the
end of the synopsis in Section 7.
2 Related work
Electric load forecasting is crucial in the decision making of the SG, especially at large scale, where countries and groups of countries share a common power system, such as the European Union. In this regard, some relevant work is discussed in this section. Load forecasting is categorized into four categories according to the forecasting period [11]. The first category is very short-term forecasting [12], which corresponds to less than one day ahead. The second category is short-term forecasting, which corresponds to a forecasting period of one day to one week [13]. The third category is medium-term forecasting, which corresponds to one week to one year ahead [14]. The fourth category is long-term forecasting, which corresponds to more than a year ahead [15]. In the literature, both statistical models and machine learning models are commonly used for electric load forecasting. Let us discuss some of these models adopted for forecasting in recent years.
ANNs are widely used both as standalone systems and as part of hybrid systems for electric load forecasting. In [16], a Kohonen self-organising map is utilized for day-ahead electric load forecasting in Spain. The described strategy comprises three stages. The daily load profile is treated as a time series and stored in the neurons. After the training phase, the neurons are arranged such that the load profile assigned to a neuron is similar to that of its neighboring neurons. During the second phase, the data samples are presented to the network and the winning neurons are extracted. Then, the data samples of the winning neurons are divided into two parts: the first corresponds to the input profile and the second to the forecasted profile. The effect of exogenous parameters on accuracy is also considered. The reported percentage error varies from 1.84% to 2.33%. A differential polynomial neural network for short-term
load forecasting is described in [17]. The network is multi-layer, and partial differential equations are solved through its decomposition. The twenty-four-hour-ahead load is forecasted using historical electric load data of Canada. The forecasted load deviates from the target value by an error of 1.56%. A short-term load forecasting method based on weather information is proposed
in [18]. The power system is divided into subnetworks on the basis of weather information.
Separate models are developed for each subnetwork. The abstracted features are selected from large data sets using the cosine distance method. The models are based on ANN, ARIMA, and the grey model (GM) to forecast the future load. A hybrid forecast strategy is proposed in [19], which
is based on an intelligent algorithm. The described hybrid strategy includes a novel feature selection technique and a complex forecast engine. The feature selection technique selects appropriate features, which are fed into the forecast engine. The forecast engine has two stages and is implemented with Ridgelet and Elman neural networks. The intelligent algorithm tunes the adjustable parameters of the forecast engine to improve both forecast accuracy and convergence rate. The performance of the described model is validated by comparing with existing models.
A deep learning based forecasting framework with appliance energy consumption sequences is proposed in [20]. The accuracy is notably improved by incorporating the appliance consumption sequences in addition to the deep neural network.
The authors in [21] proposed an Elman neural network based forecast engine to predict the future load in the SG. The weights and biases of this network are optimally adjusted by an intelligent algorithm to obtain accurate forecasting results. The authors of [22] proposed a novel forecasting model that generalizes the standard ARMAX model to a Hilbert space. The proposed model has a linear regression structure and uses functional variables for operation. The considered variables are autoregressive terms, moving average terms, and exogenous parameters. The functional variables are integral operators whose kernels are modeled as sigmoidal functions. The parameters of the sigmoidal functions are optimized using a quasi-Newton algorithm. The model is validated on the daily price profiles of the Spanish and German electricity markets. However, the forecast accuracy is improved through the optimization module at the expense of high execution time. In [23], the authors reveal the effect of data integrity attacks on the accuracy of four forecasting models, i.e., SVR, multiple linear regression, ANN, and fuzzy interaction regression. The data integrity attacks are expected to damage the performance of the discussed forecasting models and have a significant impact on the resilience of the power system.
Authors in [24] proposed a short-term load forecasting model based on a deep neural network. Two variants of long short-term memory (LSTM), i.e., standard and sequence-to-sequence LSTM, are used for forecasting individual building energy consumption. Both LSTM variants are trained and tested with one-hour and one-minute time resolutions. Experimental results show that sequence-to-sequence LSTM outperforms the standard LSTM.
In [25], the authors proposed a hybrid of the extreme learning machine (ELM) and a new switching delayed particle swarm optimization (PSO) technique to solve the forecasting problem. The weights and biases are optimally tuned by the switching delayed PSO technique. The tangent hyperbolic (Tanh) function is used to test the performance in a comprehensive and systematic manner. Simulation results demonstrate that the proposed model outperforms existing machine learning-based models. Moreover, the proposed model is applied to the power system for load forecasting.
Authors in [26] used a newly designed algorithm to train a radial basis function network for day-ahead electric load forecasting. The newly designed algorithm is comparatively evaluated against existing algorithms in terms of forecast accuracy. Simulation results demonstrate that the newly designed method has a lower MAPE than RNN and SVR.
Authors in [10] proposed a short-term load forecasting model for industrial applications. The proposed model is based on ANN and a modified enhanced differential evolutionary algorithm (MEDEA) to improve the forecast accuracy. For feature extraction and network training, a mutual information-based technique and a multi-variate autoregressive model are used, respectively. The short-term load forecast model is enabled via training to forecast the future load. Simulation results show that the proposed model provides 99.5% accurate predictions as compared to the bi-level strategy. However, forecast accuracy is improved by feeding the output of the forecast module to an optimization module, which takes more time to execute.
The authors used a deep neural network architecture for day-ahead load forecasting in [27]. A CNN is used to extract features from historical load data, and an LSTM model is used to model the dynamics and variability of the historical load. A dense feed-forward network is used to model holidays and other features. The proposed model is evaluated on an hourly dataset of a city in North China. However, forecast accuracy is improved at the expense of high model complexity.
In [28], the authors proposed a forecasting model for building energy estimation. The proposed model is based on deep learning techniques to reduce uncertainty and improve forecast accuracy. The accuracy of the proposed model is evaluated in terms of RMSE, correlation coefficient, and p-value. Simulation results validate the forecast accuracy of the proposed model. However, forecast accuracy is obtained at the cost of a slow convergence rate. Mujeeb et al. propose a load and price forecasting technique based on deep learning and an evolutionary algorithm named differential evolution [29]. Their proposed method achieves high accuracy in terms of load and price forecasting. Another work [30] proposes a hybrid energy demand forecasting model, where the authors hybridize LSTM and CNN networks. Experiments affirm that the hybrid model shows higher accuracy than previous approaches.
The authors of [31] develop a day-ahead load forecasting model for an individual residence based on CNN, and another work [32] proposes a multi-scale CNN (MS-CNN) with time cognition for multi-step short-term demand prediction. They performed extensive simulations to validate the performance of MS-CNN in terms of accuracy. Results from their simulations show the effectiveness of their developed MS-CNN model over its counterparts. A deep reinforcement learning-based building energy optimization model is proposed in [33]. A hybrid technique of reinforcement learning and deep learning is proposed for aggregated and individual building energy optimization. To explore the learning procedure, the authors used deep policy gradient and deep Q-learning. The proposed model is validated on offline data from the Pecan Street Inc. database. Simulation results show that the deep policy gradient is more suitable for cost and peak reduction than deep Q-learning.
In [35], the authors used a deep learning-based model to forecast the cooling load profile of a building. The performance of the deep learning-based model, extreme gradient boosting, is evaluated in terms of accuracy and computational efficiency by comparing with existing forecasting models. The proposed model outperforms existing models in terms of accuracy.
Authors in [36] proposed a short-term load forecasting model based on a bi-level strategy for microgrids. The bi-level strategy has an upper-level and a lower-level structure. The lower level contains a feature selection technique and a forecasting module using a hybrid of ANN and an evolutionary algorithm. The upper level is composed of a stochastic search optimization technique to optimize the performance of the forecast module. The efficacy of the proposed model is evaluated on real-life data of a Canadian university.
Macedonian electricity load forecasting is performed using a multi-layer restricted Boltzmann machine (RBM) in [38]. The parameters are fine-tuned and the weights and biases are updated using backpropagation. The deep belief network (multi-layer RBM) is trained using hourly Macedonian electricity consumption data of 6 years, from 2008 to 2014. To evaluate the forecast accuracy of the multi-layer RBM, it is compared with actual data, the Macedonian system operator load profile, and a multi-layer perceptron. RBM-based models are also used for load forecasting and show reasonable results [39, 40]; however, these models can be improved to achieve better accuracy in less computational time.
Some of the related work with respect to convergence rate, execution time, and forecast
accuracy is summarized in Table 1.
In [7], the authors presented an intelligent model to forecast the load on distributed generation (DG) and examine the power supply structure.
Table 1: Comprehensive analysis of existing forecasting models in terms of convergence rate, execution time, computational complexity, and forecast accuracy

Short-term load forecasting model | Accuracy | Execution time | Convergence rate | Computational complexity
Deep learning based building energy forecasting model [28] | Moderate | High | Slow | High
Deep learning based energy forecasting model [33] | Low | High | Slow | High
Deep neural network based short-term load forecasting model [39] | Low | High | Slow | High
LSTM based building energy forecasting model [24] | Low | High | Slow | Low
PSO + extreme learning machine based forecasting model [25] | Moderate | High | Slow | High
Radial basis function based forecasting model [26] | Moderate | High | Slow | High
ANN based forecasting model [10] | Moderate | High | Fast | High
CNN and LSTM based forecasting model [27] | Moderate | High | Slow | High
Deep belief network based forecasting model [38] | Moderate | High | Moderate | High
Building cooling load forecasting model based on deep neural network [35] | Low | High | Slow | Low
Bi-level strategy based forecasting model [36] | Moderate | High | Slow | Low
Mutual information and ANN based forecasting model [37] | Low | Low | Fast | Low
First, the support vector machine (SVM) and fruit-fly immune (FFI) algorithm are used to predict the DG load. Second, a combined neural network and polynomial regression model is used for power supply structure analysis in relation to hourly load and weather factors. Finally, the impact of DG on the regional power system structure is analyzed in terms of load reduction on the main electric grid station. This combined intelligent model has a low performance error and strong generalization. However, higher accuracy is achieved at the cost of a slow convergence rate and high computational complexity.
The authors of [41] proposed distributed methods to forecast the future load using weather information. The power system is divided into two subnetworks according to the weather variations. Moreover, separate forecasting models, i.e., ARIMA and grey models, are established for the two subnetworks. The adapted models are evaluated by comparing with the traditional models using two performance metrics, i.e., relative root mean square error (RRMSE) and mean absolute percentage error (MAPE). In [42]-[47], the authors proposed heuristic-based energy management controllers for smart homes. The purpose is to reduce the peak load and electricity cost. However, forecasting
is necessary before optimal load scheduling.
Authors in [48] introduced a Bluetooth-based home energy management system (HEMS) combined with an ANN in order to forecast future load. This approach strengthens decision making by taking into account the current situation and energy consumption conditions when forecasting the future load (at different times of the working day and on different days of the working week). The purpose of this work is to optimally manage the peak load and smooth out the demand curve. However, the objectives are obtained at the cost of execution time and a slow convergence rate.
A deep recurrent neural network (DRNN) based model is proposed to forecast the household
load [49]. This method overcomes the problems of overfitting created by classical deep learning
methods. The results show that DRNN outperforms the existing methods ARIMA, SVR, and
convolutional RNN (CRNN) by 19.5%, 13.1%, and 6.5%, respectively, in terms of RMSE.
In [50], an LSTM recurrent neural network (LSTM-RNN) based framework is proposed to forecast future residential load. The accuracy of the proposed framework is enhanced by embedding appliance consumption sequences in the training data. The proposed framework is validated on real-world data. However, the authors focus only on accuracy, while the
convergence rate and computational complexity are ignored.
A demand response (DR) scheme based on real time pricing (RTP) is proposed in [51] for
industrial facilities. The scheme adopts an ANN to forecast future prices for global time horizon optimization. The energy cost minimization is facilitated by price forecasting and is formulated as a mixed integer linear program (MILP). The performance analysis of the proposed framework is carried out on a practical case study of steel powder manufacturing. The simulation results illustrate that hour-ahead DR is better than day-ahead DR, with an improved ability to satisfy industrial demand at reduced cost while meeting targets.
Authors in [52] proposed an IoT-based deep learning system to forecast the future load with high precision. The proposed method also qualitatively analyzes the influencing factors, such as variable climate, temperature, humidity, and social conventions, that have a great impact on the forecast. However, transferring a huge amount of data over the existing communication infrastructure is challenging.
In [53], an adaptive hybrid learning model (AHLM) is proposed to forecast solar intensity. The linear and dynamic behavior of the data is captured by time-varying and multi-layer linear models. A hybrid model of backpropagation, GA, and a neural network is used to learn the non-linear behavior of the data. The proposed AHLM learns the linear, temporal, and non-linear behavior from the offline data and predicts the solar intensity with greater precision. The proposed model performs well for both short- and long-term forecast horizons.
To optimally harvest the potential of solar energy, forecasting of solar power energy is in-
dispensable. Thus, the least absolute shrinkage and selection operator model is proposed for
forecasting solar energy generation [54]. The proposed model is trained using historical weather data, aiming not only to reduce prediction error but also to reveal the significance of the weather variables in model training for forecasting. An algorithm based on single-index and least absolute shrinkage and selection operator models that maximizes Kendall's coefficient is developed in order to estimate the forecasting model coefficients. The goal of this algorithm is to ignore the less important variables and increase the sparsity of the coefficient vector. With the proposed model, either prediction accuracy is improved or a tradeoff between accuracy and complexity is achieved. However, the accuracy is improved at the cost of more execution time.
Authors in [55] presented a probabilistic forecasting model to forecast solar power, electrical energy consumption, and net load across seasonal variations and at scale. Dynamic Gaussian process and quantile regression models are employed on data of the Sydney metropolitan area, Australia, for probabilistic forecasting. Simulation results depict that the proposed model performs well in all three forecasting scenarios: solar power generation, electricity consumption, and net load.
For short-term load prediction, a hybrid model is proposed in [56]. This model is based on
improved empirical mode decomposition, ARIMA, and wavelet neural network (WNN) opti-
mized by the fruit fly immune (FFI) optimization algorithm. For performance demonstration of
the proposed model, electric load data of the Australian and New York electricity markets are used. Simulation results show that the proposed model's predictions are more accurate than those of the existing models.
In [57], a deep learning based electric load prediction model is proposed to forecast the future load. The proposed model extracts abstracted features using the stacked denoising auto-encoder technique. With these abstracted features, an SVR model is trained to forecast the future load. The proposed model is evaluated by comparing with plain SVR and ANN in terms of accuracy improvement.
Authors in [58] investigated the recency effect in electricity load forecasting using preceding hours' load and temperature. The aim is to determine the lagged hourly temperature and the daily moving average temperature to enhance the forecast accuracy. The data used for network training and validation is from the Global Energy Forecasting Competition 2012. The recency effect is
investigated in three scenarios: aggregated level geographic hierarchy, bottom level geographic
hierarchy, and individual level hours of the day. However, accuracy is enhanced at the cost of
model complexity.
In [59], a long-term forecasting model is proposed to improve the relative forecast accuracy of integrated electric utility resource planning. The analysis was conducted on twelve Western US electric utilities in the mid-2000s for both peak and normal energy consumption.
The ANN model is used to forecast the hourly energy consumption of buildings in the
Sugimoto Campus of Osaka City University, Japan [60]. The presented model is trained with
Levenberg-Marquardt and backpropagation algorithms. Six parameters are given as input: dry bulb, humidity, temperature, global hourly irradiance, previous hourly energy consumption, and weekly energy consumption. The accuracy of the proposed model is evaluated in terms of correlation coefficient and RMSE. Simulation results illustrate that the RMSE is largest in the science and technology area of the university campus as compared to the humanities college area and the old liberal arts area.
A novel type of hybrid system based on artificial intelligence is discussed in [61] to forecast the 24-hour load profile of the Polish power grid. The proposed hybrid system was tested on offline data of Poland and a few other countries. The MAPE varies from 1.08% to 2.26% depending on the country.
In [62], an ensemble model based on the empirical mode decomposition algorithm and deep learning is proposed for load forecasting. The proposed model is tested and validated on the electrical energy consumption datasets of the Australian Energy Market Operator (AEMO). The electric energy consumption data is decomposed into intrinsic mode functions (IMFs), and the proposed model is used to model each of the IMFs in order to improve the forecast accuracy. A model that uses an autocorrelation function for selecting input parameters and a least squares support vector machine (LSSVM) for forecasting is discussed in [63]. The main contribution of that paper is to provide a fully automated machine learning model that forecasts the future load without human intervention.
A hybrid incremental learning approach composed of the discrete wavelet transform (DWT), empirical mode decomposition, and a random vector functional link network (RVFLN) is proposed in [64] for short-term load forecasting. To evaluate the proposed model, AEMO electricity load data is used. Simulation results depict that the proposed system is more effective than eight benchmark prediction methods.
In [65], an ELM model based on a mixed kernel for future load forecasting is discussed. Half-hour resolution electric load data is used to validate the proposed model; the data covers the states of New South Wales, Victoria, and Queensland in Australia. Simulation results illustrate that the proposed method is better than three existing methods, i.e., radial basis function-ELM (RBF-ELM), UKF-ELM, and mixed-ELM, in terms of accuracy.
In [66], the authors proposed a hybrid of ELM and a new switching delayed particle swarm optimization (PSO) algorithm for short-term load forecasting. The weights and biases are optimized with the new switching delayed algorithm. The Tanh function is used as the activation function because it provides better generalization and avoids unnecessary hidden nodes and the over-training problem. Experimental results show that the proposed model outperforms the RBF
neural network. The proposed model is successfully applied for short-term load forecasting in
the power system.
A novel hybrid model, which is a combination of singular spectrum analysis (SSA), support
vector machine (SVM), and cuckoo search (CS) algorithm, is proposed in [67] to forecast the
future load. The historical data is pre-processed with the help of SSA. The pre-processed data
is fed to the SVM model to forecast the future load and performance is optimized with the
help of the CS algorithm. The performance of the proposed model is evaluated in terms of
accuracy by comparing with SVM, CS-SVM, SSA-SVM, ARIMA,
and backpropagation neural network (BPNN).
In [68], a clustering-based hybrid model is proposed to predict the hourly electricity demand of hotel buildings. The operating buildings are non-stationary because of irregular electric temporal features. An online modified predictor model is proposed. The model is a combination of SVR and a wavelet decomposition algorithm, which takes training samples extracted by fuzzy C-means (FCM) as input. The proposed model has improved accuracy as compared to traditional models.
A deep neural network model is adopted for short-term load and probability density fore-
casting in [69]. The proposed model is evaluated on electricity consumption case studies of
three Chinese cities for the year 2014. The simulation results demonstrate that: 1) the deep learning based model has better forecast accuracy relative to the random forest and gradient boosting models,
2) temperature, weather, and other environmental variables have a significant impact on elec-
tricity consumption, and 3) the probability density forecasting method is able to provide a high
quality prediction.
In [70], a hybrid forecast model is proposed, which is a combination of a feature extraction technique and a two-stage forecast engine. The two-stage forecast engine uses Ridgelet and Elman neural networks to provide accurate predictions. An optimization algorithm is applied to optimally select the control parameters of the forecast engine.
Authors in [71] proposed a short-term load forecasting model based on SSVR. The main ob-
jective is to improve relative forecast accuracy and efficiency. The relative forecast accuracy and efficiency are improved by feeding the output of the forecast module to an optimization module for fine tuning of parameters. However, the forecast accuracy is improved at the cost of computational
complexity.
A hybrid model of GA and a non-linear autoregressive neural network with exogenous inputs is proposed for short- and medium-term forecasting in [72]. In order to fine tune the input parameters of the proposed model, statistical and pattern recognition based schemes are employed. The GA is used to select weights and biases for the training of the neural network. The proposed model is validated by comparing with existing models such as the average model with exogenous inputs and regression tree models.
In [73], a data-analytic based framework is proposed to forecast solar energy. The proposed framework is developed and validated on a large eight-year (2005-2012) dataset of the Golden site, USA, with one-minute resolution, taken from the National Renewable Energy Laboratory (NREL). The uniqueness of this method is that data preprocessing is performed using integrated serial time-domain analysis coupled with multivariate filtering.
Authors proposed a hybrid approach in [74] to forecast the electricity production of a solar panel based microgrid. The hybrid model is based on GA, PSO, and a neuro-fuzzy inference system (NFIS). The proposed model is tested on real-time power generation data obtained from the Goldwind microgrid in Beijing. The proposed model has better performance as compared to existing ANN and linear regression based models. Some of the relevant work with respect to strategies, datasets taken from repositories, and critical remarks is comprehensively summarized in Table 2.
3 Research gap analysis
Precise and accurate electric load forecasting is an indispensable task in the SG because low forecast accuracy results in huge economic loss: a one percent increase in the forecast error can cause a 10-million increase in the overall utility cost. Moreover, accurate forecasting can also be beneficial for the detection of potential faults, reliable operation, and decision making of the SG. Therefore, electric utility companies are trying to develop fast, accurate, and robust short-term electric load forecasting models. The following research gaps are highlighted in the above-mentioned recent and relevant work: (1) there is no universal forecast model; some models are better for certain objectives and certain conditions; (2) the training data set is often not similar to the predicted period; (3) there is a trade-off between forecast accuracy and convergence rate: when forecast accuracy is increased, convergence rate is compromised, and vice versa; (4) the offered prices and amounts in the intra-day and day-ahead markets need to be adjusted; and (5) the needed power reserves, with start-up times counted in hours, must be found once the state of transactions from those markets is known. In this regard, a novel hybrid forecast model composed of MMI, FCRBM, and GWDO techniques is proposed for high-quality electric load forecasting ranging from a day to a week ahead.
Table 2: Recent and relevant work summary

Strategies | Objectives | Repository | Limitations
Intelligent model for forecasting based on SVM and FFI algorithm [7] | DG forecasting and regional power supply structure analysis | Certain area data in Northeast China | The model is suitable for short forecasting horizons
Weather information based electric load forecasting of a bulk power system [41] | Forecast accuracy improvement for effective performance of a bulk power system | Fujian Province bulk power system, China | This model is suitable and quite effective only for bulk power systems
Household forecasting using DRNN [49] | To improve the comfort of users by reliable provision of electricity | Ireland Commission for Energy Regulation | The complexity of the model is increased
LSTM-RNN based residential load forecasting [50] | Accuracy improvement to facilitate residential consumers | Canadian residential load data | The proposed model improves only the meter-level forecast accuracy
IoT-based electric load forecasting [52] | Improvement of accuracy and capability for effective power system operation | Electric load record of an urban area in south China | The framework has large complexity
Deep model with stacked de-noising auto-encoders for day-ahead load forecasting [57] | Forecast accuracy improvement | California electric load data | The system model performance is compromised with a decrease in data size
A big data approach for electric load forecasting [58] | Forecast accuracy improvement for scalable models | Global Energy Forecasting Competition 2012 | The model has a complex structure and slow convergence rate
ANN based prediction model [60] | To reduce the RMSE and improve forecast accuracy | Real data of the Sugimoto Campus of Osaka City University, Japan | Objectives are achieved at the cost of high convergence rate
Intelligent hybrid model based load forecast [61] | Day-ahead electric load forecasting | Historical load data of Poland | To efficiently manage the generation of the Polish grid
4 Statement of the problem
In the literature, tremendous research progress has been made in the field of short-term load forecasting. However, in spite of much research in this field, more accurate and robust short-term load forecasting is still needed due to its application in the decision making of the SG. Thus,
with fast and accurate forecasting, the SG can facilitate effective management and utilization
of available resources. However, fast and accurate forecasting is a very complex process due
to the high variation and non-linearity of consumers’ load profile. It is worth mentioning that
variations and non-linearity are inversely linked with forecasting. The predictability is low if
the variations and non-linearity are high and vice versa. Authors in [75] and [76] adopted ANN
based models for accurate electric load forecasting. However, due to the shallow layout of ANN,
these models can suffer from the vanishing gradient, under-fitting, and computational power
problems [77, 78]. The aforementioned problems disturb the forecast accuracy of ANN based
models. In [36], an MI and ANN based model is used for short-term load forecasting. Authors in [37] used a Bi-level strategy to address short-term load forecasting, which is based on ANN and a differential evolutionary algorithm (DEA). Authors in [10] proposed a short-term load forecasting model based on ANN and MEDEA. However, these models perform well only for small data sizes; as the size of the data increases, their performance degrades due to their shallow layout. They also suffer from the curse of dimensionality and under-fitting. Moreover, in [37] and [10], the authors used DEA and MEDEA to optimize the forecasting process. However, both DEA and MEDEA have lower precision and convergence speed as compared to the WDO algorithm [79].
In this synopsis, first, a new way to adopt a deep learning model, i.e., stacked FCRBM, is proposed. Second, a fast and accurate electric load forecasting model based on two deep learning models, i.e., stacked FCRBM and CRBM, is proposed. The deep learning models have a deep-layer layout to capture highly abstracted characteristics from offline data. This deep-layer structure contributes strongly to improving relative forecast accuracy. Moreover, the performance of deep learning models improves with an increase in data size, while the performance of ANN based models is compromised as data size increases [80]. Furthermore, the proposed model has a modular framework in which the output of each former module is fed into the later module. The framework comprises four modules: a data pre-processing and feature extraction module based on MMI, an FCRBM based training and forecasting module, a GWDO based optimization module, and a utilization module. The MMI is used for feature selection in the data pre-processing and feature extraction module. For training and forecasting, the deep learning technique FCRBM is adopted because the performance of deep learning techniques is directly linked with data size and they have the ability to capture the desired features in a more effective manner. In order to optimize the error performance, a GWDO algorithm is proposed because of its high convergence speed and precision [79]. The objective of this synopsis is to improve forecast accuracy with affordable execution time and computational complexity. The proposed model is applied to hourly load data of three USA grids and the Global Energy Forecasting Competition 2012. The proposed model is validated by comparing with ANN [76], convolutional neural network (CNN) [81], MI-ANN [36], Bi-level [37], and AFC-ANN [10] in terms of accuracy (MAPE, RMSE, and correlation coefficient) and convergence rate.
5 Proposed system models
In this synopsis, three system models are presented in order to improve the forecast accuracy
and convergence rate. The detailed description is as follows:
5.1 Proposed system model I
In this section, the proposed short-term load forecasting framework based on two deep learning models, stacked FCRBM and CRBM, is introduced, as shown in Figure 1. For the proposed framework, a modular strategy is adopted, where the output of each former module is fed into the later module. The proposed framework consists of three modules: the data processing and feature extraction module, the deep learning-based training module, and the deep learning-based forecasting module. The detailed description is as follows:
Figure 1. Proposed system model I. (The figure shows load and weather data passing through a data processing and feature extraction module, with data cleansing, normalization, and structuring into training, validation, and testing sets, followed by a deep learning based training module that picks between stacked FCRBM with ReLU and CRBM with sigmoid activation and tunes their parameters, a forecasting module, and the resulting applications.)
5.1.1 Data processing and feature extraction module
First, twenty zones’ historical data of US utility (global energy forecasting competition 2012)
consists of hourly load and weather data are taken from the publicly available Kaggle repository
[83]. This data is given as an input to the data processing and feature extraction module. The
three data operations: cleansing, normalization, and structuring are performed on the received
data. The cleansing operation is performed in order to replace the missing and defective values
by the mean of previous values. After cleansing, the data is normalized in order to reduce and
eliminate the redundancy. Moreover, the data has large values, the normalization is performed
on the data to make the weighted sum stay within the limits of the sigmoid function value. At
the end, results are denormalized to achieve the desired load predictions. After cleansing and
normalization, the data is structured in ascending order. The desired features from the dataset
are extracted by the feature extraction process and finally, the data is split into training and
testing dataset. The training data have hourly load and weather data to train stacked FCRBM
and CRBM. The testing data is used to evaluate the forecast accuracy of the proposed model in
terms of MAPE, RMSE, and correlation coefficient. The validation dataset is constructed for
proper parameters tuning.
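The following Python sketch illustrates one plausible implementation of these pre-processing steps (cleansing of missing or defective values with the mean of preceding values, min-max normalization with later denormalization, and a chronological split); the column name, file name, defect criterion, and split ratios are illustrative assumptions rather than details given in the synopsis.

```python
import numpy as np
import pandas as pd

def cleanse(series: pd.Series, window: int = 24) -> pd.Series:
    """Replace missing/defective values with the mean of the preceding values
    (a 24-hour window and 'non-positive means defective' are assumptions)."""
    s = series.copy()
    for i in np.where(s.isna() | (s <= 0))[0]:
        prev = s.iloc[max(0, i - window):i].dropna()
        s.iloc[i] = prev.mean() if len(prev) else s.mean()
    return s

def normalize(x: np.ndarray):
    """Min-max normalization so the weighted sum stays within the activation range."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), (lo, hi)

def denormalize(x_norm: np.ndarray, bounds) -> np.ndarray:
    """Map normalized predictions back to the original load scale."""
    lo, hi = bounds
    return x_norm * (hi - lo) + lo

# Illustrative usage: hourly load column named "load" in an assumed CSV file.
df = pd.read_csv("hourly_load.csv")
df["load"] = cleanse(df["load"])
load_norm, bounds = normalize(df["load"].to_numpy())

# Chronological split: 70% training, 15% validation, 15% testing (assumed ratios).
n = len(load_norm)
train, valid, test = np.split(load_norm, [int(0.7 * n), int(0.85 * n)])
```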
5.1.2 Training module
Training based on the deep learning models, stacked FCRBM and CRBM, is the main part of this framework. These models are trained with the training data to learn the non-linear relationship between the demand load profile and historical observations. The output of the data processing and feature extraction module is given as input to the training module. Moreover, if the stacked FCRBM forecasting model is used, the model is trained using the rectified linear unit (ReLU) activation function; on the contrary, the CRBM forecasting model exploits the sigmoid activation function for training. In this way, the deep learning-based training module is enabled via learning to forecast the future load.
5.1.3 Forecasting module
The output of the training module is fed into the forecasting module. In this module, trained
stacked FCRBM and CRBM models forecast the future load. The forecast accuracy of the
proposed model is evaluated in terms of MAPE, RMSE, and correlation coefficient using the
testing data. The output of the forecasting module can be used for SG applications such as power
generation planning, economic operation, unit commitment, load switching, power purchasing,
demand side management, and contract evaluation.
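As a concrete illustration of this evaluation step, the short sketch below computes the three accuracy metrics named above (MAPE, RMSE, and correlation coefficient) for a forecast against testing data; it is a minimal sketch with dummy values, not the authors' implementation.

```python
import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(100.0 * np.mean(np.abs(actual - forecast) / np.abs(actual)))

def rmse(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Root mean square error."""
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def correlation(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Pearson correlation coefficient between actual and forecasted load."""
    return float(np.corrcoef(actual, forecast)[0, 1])

# Illustrative usage with dummy hourly load values (MW).
actual = np.array([310.0, 295.0, 320.0, 340.0])
forecast = np.array([305.0, 300.0, 315.0, 350.0])
print(mape(actual, forecast), rmse(actual, forecast), correlation(actual, forecast))
```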
5.2 Proposed system model II
To forecast the future electric load, prediction models must be able to learn the non-linear input/output mapping in an efficient way. In artificial intelligence, ANN is one of the techniques most commonly used to forecast non-linear load due to its easy and flexible implementation [27]. However, the performance of ANN is highly dependent on adjustable tuning parameters such as the learning rate, number of layers, and number of neurons per layer. The learning algorithms for training neural networks, such as gradient descent, multivariate AR, and backpropagation, may suffer from premature convergence and overfitting [36]. To cure the aforementioned problems, hybrid forecast strategies have been proposed in the literature. Hybrid forecast strategies have improved modeling capabilities as compared to unhybridized methods; still, there is a problem of slow convergence and high execution time due to the large number of adjustable parameters. In [37], the authors used a Bi-level strategy, based on ANN and DEA, for electric load forecasting. An accurate fast-convergence strategy based on ANN and MEDEA [82] is proposed to forecast the future load in [10]. However, these strategies are highly dependent on the modeler's knowledge and experience. Moreover, the performance of the aforesaid strategies is satisfactory for small data sizes, and it is compromised as the size of the data increases. No mechanism is proposed to handle large data (big data), and in real life the data size is increasing dramatically.
Figure 2. Schematic diagram and main procedure of the FCRBM based proposed system model II for hourly and weekly electric load prediction. (The figure shows historical data feeding a data pre-processing and feature selection module, comprising data cleansing, data normalization, and entropy based mutual information feature selection with redundancy and irrelevancy filters, followed by the FCRBM based training and forecasting module and the optimization module that produce the forecasted load.) Single arrowheads denote one-way data flow and double arrowheads denote two-way data flow.
In this synopsis, a hybrid model based on MMI, FCRBM, and GWDO techniques is proposed for short-term load forecasting, as shown in Figure 2, which is an extension of our earlier conference paper [84]. The proposed model comprises three modules, as illustrated in Figure 2: a) the data pre-processing and feature selection module based on MMI, b) the FCRBM based training and forecasting module, and c) the optimization module based on the proposed GWDO algorithm.
Prior to performing electric load forecasting, it is indispensable to identify the factors that influence the load behavior. These influencing parameters include weather factors (humidity, temperature, and dew point), occupancy patterns, and calendar indicators. However, it is not feasible to apply all the aforementioned candidate inputs to the FCRBM based training and forecasting module. Moreover, the candidates include ineffective features, which complicate and degrade the performance of the model. Thus, the candidate inputs are first fed into the data pre-processing and feature selection module, where the pre-processed data passes through the MMI based feature selection phase. The output of the data pre-processing and feature selection module is given as input to the training and forecasting module based on FCRBM. The output of this module is fed into the optimization module based on GWDO, which is the new contribution. The optimization module first calculates the error between the real and forecasted values and then minimizes this error in order to make accurate predictions. The detailed description of the proposed system model is as follows:
5.2.1 Data pre-processing and feature selection module
Let E be the historical electric load, represented in matrix form. This historical electric load data is fed into the data pre-processing and feature selection module:

E = \begin{bmatrix}
E(1,1) & E(2,1) & E(3,1) & E(4,1) & \dots & E(x,1) \\
E(1,2) & E(2,2) & E(3,2) & E(4,2) & \dots & E(x,2) \\
E(1,3) & E(2,3) & E(3,3) & E(4,3) & \dots & E(x,3) \\
E(1,4) & E(2,4) & E(3,4) & E(4,4) & \dots & E(x,4) \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
E(1,y) & E(2,y) & E(3,y) & E(4,y) & \dots & E(x,y)
\end{bmatrix}   (1)
where E(1,1) is the electric load of the first hour of the first day, E(2,1) is the electric load of the first hour of the second day, and so on, such that E(x,y) is the electric load of the yth hour of the xth day. The data spans four years, i.e., 1460 days, and each day has 24 hours. The dimension of the dataset is therefore 1460 × 24, giving a total of 35040 data samples. The rows show the hours and the columns show the days. The value of x is linked with the tuning of FCRBM training; a larger value of x implies finer tuning and vice versa, and there is a performance tradeoff between fine tuning and convergence rate. This input data is first passed through the data cleansing phase, where defective and missing values are replaced by the average value of the preceding days' electric load. The cleansed data is then passed through the normalization phase, because the data contains outliers and the weight matrix is extremely small, in order to keep the overall weighted sum within the limits of the activation function. In machine learning, feature extraction/selection is a process to select abstracted features and filter out unimportant features. Data pre-processing and feature selection have significant importance because they help to avoid the curse of dimensionality and contribute strongly to accuracy. In this regard, the entropy based MI technique is a feature selection technique used in a variety of taxonomy problems such as image processing, cancer categorization, image recognition, and data mining. The MI feature selection technique is developed and used in [36] and [10] for feature selection. In this work, the MI technique is improved by modification (MMI) subject to accuracy and convergence rate. The cleansed and normalized data is passed through the MMI based feature selection phase to rank the inputs according to their information values. The ranked inputs are filtered using the irrelevancy and redundancy filters in order to remove irrelevant and redundant information. The subset of selected features contains the best and most relevant information, which contributes strongly to the accuracy. First, the existing MI feature selection technique is discussed; then, the MMI feature selection technique is presented.
The joint entropy of two discrete random variables is defined as the information obtained while observing both discrete random variables at the same time. The mathematical description is as follows:

H(E, E^t) = -\sum_{i}\sum_{j} p(E_i, E^t_j) \log_2 p(E_i, E^t_j), \quad i, j \in \{1, 2\},   (2)

where p(E_i, E^t_j) is the joint probability of the two discrete random variables, E_i is the input discrete random variable, and E^t_j is the target value. In feature selection, the information that is common to both variables is indispensable, which is formulated as in [36]:

MI(E, E^t) = \sum_{i}\sum_{j} p(E_i, E^t_j) \log_2 \frac{p(E_i, E^t_j)}{p(E_i)\, p(E^t_j)},   (3)

where MI(E_i, E^t_j) gives the mutual information between the two variables E_i and E^t_j. In this case, the candidate inputs are ranked by the MI between each input and the target value.
From the entropy based MI technique, the following three conclusions can be drawn (a small numerical sketch is given after the list):
- If MI(E_i, E^t_j) = 0, the discrete random variables E_i and E^t_j are irrelevant.
- If MI(E_i, E^t_j) has a large value, the discrete random variables E_i and E^t_j are highly relevant.
- If MI(E_i, E^t_j) has a small value, the discrete random variables E_i and E^t_j are only lightly related.
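For illustration, the sketch below estimates the mutual information of Equation (3) between a discretized candidate input and the target from empirical joint and marginal probabilities; the binary sample values are made up, and this is the plain MI ranking step rather than the full MMI procedure.

```python
import numpy as np

def mutual_information(e: np.ndarray, e_t: np.ndarray) -> float:
    """Estimate MI(E, E^t) of Eq. (3) for two binary variables using
    empirical joint and marginal probabilities."""
    mi = 0.0
    for i in (0, 1):
        for j in (0, 1):
            p_ij = np.mean((e == i) & (e_t == j))          # joint probability
            p_i, p_j = np.mean(e == i), np.mean(e_t == j)  # marginal probabilities
            if p_ij > 0:
                mi += p_ij * np.log2(p_ij / (p_i * p_j))
    return mi

# Made-up binary-encoded candidate input and target value.
e   = np.array([0, 1, 1, 0, 1, 0, 1, 1])
e_t = np.array([0, 1, 1, 0, 1, 1, 1, 0])
print(mutual_information(e, e_t))   # larger value => more relevant candidate
```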
In [36], among the training data samples, the last value of every hour of the day is chosen as the target value. The target value, or last sample, is very close to the next day with respect to time and seems logical; however, it may cause serious forecast errors because the average behavior is ignored while forecasting. In [10], the authors used the average value in addition to the target value because both the average and target values are of equal importance. Equation (3) is modified for three variables as follows:
MI(E, E^t, E^n) = \sum_{i}\sum_{j}\sum_{k} p(E_i, E^t_j, E^n_k) \log_2 \frac{p(E_i, E^t_j, E^n_k)}{p(E_i)\, p(E^t_j)\, p(E^n_k)},   (4)
where E^n_k is the average value, which indicates the second target. However, the average value will be very low if some values in the selected features are very small. Adding the average to the other two parameters is not sufficient, because it may cause serious prediction problems due to ignoring the moving average. Thus, Equation (4) is further modified for four variables as follows:
MI(E, E^t, E^m, E^n) = \sum_{i}\sum_{j}\sum_{k}\sum_{l} p(E_i, E^t_j, E^n_k, E^m_l) \log_2 \frac{p(E_i, E^t_j, E^n_k, E^m_l)}{p(E_i)\, p(E^t_j)\, p(E^n_k)\, p(E^m_l)},   (5)
where the third target value E^m_l is the moving average. Equation (5) is expanded for binary-encoded information. A supplementary variable S_z is defined in Equation (6) to find the joint and individual probabilities, such that:
S_z = 8E^t + 4E^n + 2E^m + E, \qquad E, E^t, E^n, E^m \in \{0, 1\},   (6)

where S_z \in \{0, 1, 2, 3, \dots, 15\}. The counts S_z^{k} record how many samples take the value S_z = k, i.e., the number of zeros, ones, twos, threes, and so on up to fifteens. From the aforesaid discussion, the joint and individual probabilities can be found using Equation (6) as:
p(E = 0)   = (S_z^{0} + S_z^{2} + S_z^{4} + S_z^{6} + S_z^{8} + S_z^{10} + S_z^{12} + S_z^{14}) / L
p(E = 1)   = (S_z^{1} + S_z^{3} + S_z^{5} + S_z^{7} + S_z^{9} + S_z^{11} + S_z^{13} + S_z^{15}) / L
p(E^n = 0) = (S_z^{0} + S_z^{1} + S_z^{4} + S_z^{5} + S_z^{8} + S_z^{9} + S_z^{12} + S_z^{13}) / L
p(E^n = 1) = (S_z^{2} + S_z^{3} + S_z^{6} + S_z^{7} + S_z^{10} + S_z^{11} + S_z^{14} + S_z^{15}) / L
p(E^m = 0) = (S_z^{0} + S_z^{1} + S_z^{2} + S_z^{3} + S_z^{8} + S_z^{9} + S_z^{10} + S_z^{11}) / L
p(E^m = 1) = (S_z^{4} + S_z^{5} + S_z^{6} + S_z^{7} + S_z^{12} + S_z^{13} + S_z^{14} + S_z^{15}) / L
p(E^t = 0) = (S_z^{0} + S_z^{1} + S_z^{2} + S_z^{3} + S_z^{4} + S_z^{5} + S_z^{6} + S_z^{7}) / L
p(E^t = 1) = (S_z^{8} + S_z^{9} + S_z^{10} + S_z^{11} + S_z^{12} + S_z^{13} + S_z^{14} + S_z^{15}) / L   (7)
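The sketch below gives a numerical illustration of Equations (6) and (7): four binary-encoded variables are packed into S_z and the individual probabilities are recovered by counting its values, with L the number of samples; the sample values are made up, and the bit weights follow the packing in Equation (6).

```python
import numpy as np

# Made-up binary-encoded samples of E, E^t, E^n, E^m.
E   = np.array([0, 1, 1, 0, 1, 0, 1, 1])
E_t = np.array([0, 1, 0, 0, 1, 1, 1, 0])
E_n = np.array([1, 1, 0, 0, 0, 1, 1, 0])
E_m = np.array([0, 0, 1, 0, 1, 1, 0, 1])

# Equation (6): pack the four binary variables into S_z in {0, ..., 15}.
S_z = 8 * E_t + 4 * E_n + 2 * E_m + E
L = len(S_z)                               # total number of samples
counts = np.bincount(S_z, minlength=16)    # counts[k] = number of samples with S_z = k

# Equation (7): recover the individual probabilities by summing the counts
# whose corresponding bit of S_z is clear (bit weights as packed above).
codes = np.arange(16)
p_E0  = counts[(codes & 1) == 0].sum() / L   # weight-1 bit clear -> E   = 0
p_Em0 = counts[(codes & 2) == 0].sum() / L   # weight-2 bit clear -> E^m = 0
p_En0 = counts[(codes & 4) == 0].sum() / L   # weight-4 bit clear -> E^n = 0
p_Et0 = counts[(codes & 8) == 0].sum() / L   # weight-8 bit clear -> E^t = 0
print(p_E0, p_Em0, p_En0, p_Et0)
```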
Equations (4)-(7) constitute the MMI technique, which is used to find the mutual information among the four variables E_i, E^t, E^n, and E^m. The candidate inputs are ranked on the basis of this mutual information to remove irrelevant and redundant information. The MMI feature selection technique provides two-fold benefits: a) the selection of suitable and relevant features minimizes the forecast error, and b) the selection of a subset of features improves the convergence rate. Before being fed to the training and forecasting module, the selected features are split into training, testing, and validation data samples for training and validation of the FCRBM. The selected subset of key features is given as input to the FCRBM based training and forecasting module.
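As an illustration of how Equations (5)-(7) could be computed in practice, the following Python sketch estimates the four-variable mutual information from binary-encoded samples. It is a minimal sketch under the assumption that the four series have already been binarized (0/1); the function name and the bit weights taken from Equation (6) are illustrative and not part of the proposed model itself.

```python
import numpy as np

def mmi_score(E, Et, En, Em):
    """Minimal sketch of the four-variable MMI of Equations (5)-(7),
    assuming E, Et, En, Em are equal-length binary (0/1) arrays."""
    E, Et, En, Em = (np.asarray(a, dtype=int) for a in (E, Et, En, Em))
    L = len(E)
    # Supplementary code of Equation (6): one of 16 joint outcomes.
    Sz = 8 * Et + 4 * En + 2 * Em + E
    counts = np.bincount(Sz, minlength=16)       # S_z^0 ... S_z^15
    p_joint = counts / L

    def marginal(weight, bit):
        # Probability that the variable carrying `weight` in Sz equals `bit`,
        # obtained by summing the counts of matching codes (analogous to Equation (7)).
        codes = [c for c in range(16) if (c // weight) % 2 == bit]
        return counts[codes].sum() / L

    mi = 0.0
    for c in range(16):
        if p_joint[c] == 0:
            continue
        prod = (marginal(1, c % 2) * marginal(2, (c // 2) % 2)
                * marginal(4, (c // 4) % 2) * marginal(8, (c // 8) % 2))
        mi += p_joint[c] * np.log2(p_joint[c] / prod)
    return mi
```

The returned score can then be used to rank candidate inputs; the ranking and threshold logic is described in the following modules.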
5.2.2 FCRBM based training and forecasting module
The aim of this module is to devise a framework that is enabled via learning to forecast the future electric load. In this regard, a wide variety of short-term load forecasting models, such as dynamic regression, transfer function, and AR heteroscedastic models, have been proposed in the literature. However, these models are only capable of linear predictions, while the behavior of the load is non-linear and stochastic. To solve the aforesaid problems, the authors of [85] and [86] used novel ANN based strategies for short-term load forecasting. These forecasting strategies are capable of handling the non-linear behavior of the electric load and forecasting the future load. However, their performance is compromised as the data size increases. Deep learning models such as the RBM, conditional RBM (CRBM), and FCRBM have better performance on large datasets. These models have a deep layered structure, which has the ability to capture highly abstracted features. Thus, the FCRBM is selected from the deep learning models to forecast the future electric load because it provides high quality forecasting.
The training and forecasting module is based on the FCRBM, which is the indispensable part of our proposed hybrid forecasting model. At first, the architecture of the FCRBM model is determined. The model has four layers along with their neurons, i.e., hidden layer, visible layer, style layer, and history layer. As discussed earlier, the FCRBM network must be trained and enabled via learning to forecast the future electric load. Generally, learning is of three types, i.e., supervised, unsupervised, and reinforced. Since we use historical load data in our scenario, we use supervised learning. In the literature, many activation functions exist for supervised learning, such as sigmoid, tangent hyperbolic, ReLU, and softmax. However, we chose the ReLU activation function, shown in Equation (8), because it has faster convergence and also overcomes the problem of the vanishing gradient.
f(X, b) = \max(0, \beta(X, b)),
f'(X, b) = \begin{cases} 1 & \text{if } \beta(X, b) \geq 0 \\ 0 & \text{otherwise}, \end{cases} (8)
where X is the selected candidate input (see Section-III), b indicates the bias value, and β controls the steepness of the activation function. In Figure 2, w^v, w^y, and w^h are the weights of the corresponding layers and A^v, A^u, A^y, B^h, B^u, and B^y are the connections of the corresponding layers to the factors; these are also known as model free parameters. The \hat{a} and \hat{b} elements represent the dynamic biases associated with the visible and hidden layers, respectively.
The training and learning procedure iterates for a number of epochs to enable the network to forecast. Through this training and learning, the FCRBM becomes able to forecast the future electric load. Moreover, the performance metric mean absolute percentage error (MAPE) is considered as the validation error, which is formulated as follows:
MAPE = \frac{1}{\tau} \sum_{t=1}^{\tau} \frac{|T_t - F_t|}{|T_t|} \times 100, (9)

where T_t represents the actual load values, F_t indicates the forecasted load values, and τ represents the number of days under consideration. Further details of the FCRBM working and the learning activation function can be found in [28]. The output of this module is fed into the GWDO based optimization module to further improve the forecast accuracy with an affordable convergence rate.
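For concreteness, the activation of Equation (8) and the validation metric of Equation (9) can be written as the small Python helpers below; this is only an illustrative sketch, not the training implementation.

```python
import numpy as np

def relu(beta):
    """ReLU activation of Equation (8): f = max(0, beta)."""
    return np.maximum(0.0, beta)

def mape(actual, forecast):
    """Mean absolute percentage error of Equation (9), in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs(actual - forecast) / np.abs(actual))
```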
5.2.3 GWDO based optimization module
The objective of this module is to minimize the forecast error with an affordable convergence rate. The authors of [36] and [10] used DEA and MEDEA, respectively, to optimize the performance of the model. Both algorithms have a slow convergence rate and low precision [79]. Furthermore, the aforementioned algorithms get trapped in local optima [79]. To remedy these problems, the GWDO algorithm is proposed, which is a hybrid of the WDO and GA algorithms [87]. The proposed algorithm benefits from the features of both algorithms: the GA ensures the diversity of the population and the WDO has faster convergence. The GWDO based module receives the forecasted load with some error, which is the minimum achievable by the FCRBM. This forecasting error can be further reduced with the proposed GWDO optimization technique. The sole objective of the GWDO based optimization module is to fine tune the adjustable parameters of the model to improve the forecast accuracy with an affordable convergence rate. In other words, the optimization module is integrated with the forecasting module in order to minimize the error and improve the forecast accuracy. Thus, error (MAPE) minimization becomes the objective function of the optimization module, which is mathematically modeled as:
\underset{R_t,\, Ir_t}{\text{Minimize}} \; MAPE(j), \quad j \in \{1, 2, 3, \ldots, \tau\}, (10)
where R_t and Ir_t are the thresholds of redundancy and irrelevancy, respectively. The GWDO based optimization module feeds the optimized values of the thresholds to the MMI based feature selection module to select key features from the given data. The integration of the optimization module with the forecasting model increases the execution time, which disturbs the convergence rate because of the tradeoff between execution time and convergence rate. The integration of the optimization module is therefore favorable for applications where forecast accuracy is of primary importance. Among the heuristic algorithms, our proposed GWDO algorithm is preferred for two reasons: (i) it avoids premature convergence and (ii) it has faster convergence. The GWDO algorithm randomly produces the population, i.e., the position and velocity matrices of Equations (11) and (12), as in [87]:
x_{new} = \begin{cases} 1 & \text{if } rand(1) \leq sig(j, i) \\ 0 & \text{if } rand(1) > sig(j, i) \end{cases} (11)

v_i = v_{max} \times 2 \times (rand(populationsize, n) - 0.5) (12)
The fitness functions for velocity and position are defined in Equations (13) and (14) because the position and velocity vectors are updated by comparing a random number (rand(\cdot) \in [0, 1]) with the fitness function (FF(\cdot) \in [0, 1]), as shown in Equation (15).

FF(v(i)) = \frac{MAPE(x_{new}(i))}{MAPE(v(i)) + MAPE(x_{new}(i))} (13)

FF(x_{new}(i)) = \frac{MAPE(v(i))}{MAPE(x_{new}(i)) + MAPE(v(i))} (14)
If the random number is less than the fitness function, the load value is updated, because our objective function is a minimization problem.

F_{pr}(i) = \begin{cases} v^{n}(i) & \text{if } rand(i) \leq FF(v(i)) \\ x^{n}_{new}(i) & \text{if } rand(i) \leq FF(x_{new}(i)) \end{cases} (15)
A question now arises: why should the load update depend on a random value? We cure this problem by eliminating the influence of the random number on the load update; the comparison is now between the fitness function of the candidate input and that of the previous one, as shown in Equation (16). Thus, the selected load update value has a high quality of accuracy.

F_{pr+1}(i) = \begin{cases} v^{n+1}(i) & \text{if } \frac{v^{n+1}(i) - v^{n}(i)}{v^{n}(i_{max})} \leq FF(v(i)) \\ x^{n+1}_{new}(i) & \text{if } \frac{x^{n+1}_{new}(i) - x^{n}_{new}(i)}{x^{n}_{new}(i_{max})} \leq FF(x_{new}(i)) \end{cases} (16)
With the integration of the GWDO based optimization module, the accuracy is improved while the convergence is compromised, because there is a trade-off between accuracy and convergence rate. However, the proposed short-term load forecasting model outperforms the existing models, i.e., MI-ANN [36], Bi-level [37], and AFC-ANN [10], in terms of accuracy. This is due to the fact that ANN based models have a shallow layout and their performance degrades with the increase in data size, whereas the FCRBM performs better on large data sizes due to its deep layered structure.
5.3 The proposed system model III
In the literature, many authors used ANN based forecasters for load prediction due to their capability of handling the non-linearity of the consumers' load. However, the performance of ANN based models is not satisfactory in terms of accuracy. Thus, some authors integrated an optimization module with the ANN based forecaster, which significantly improves the forecast accuracy. However, the accuracy is improved at the cost of a slow convergence rate. Moreover, the ANN based models are suitable for small data sizes, while their performance degrades as the data size increases. Thus, we propose a new electric load forecasting model based on FCRBM [88], as shown in Figure 3. The proposed model is subjected to accuracy, convergence rate, and scalability. The proposed system architecture comprises four modules: 1) the data processing and feature selection module, 2) the FCRBM based forecaster module, 3) the GWDO based optimizer module, and 4) the utilization module. The historical load data and exogenous parameters (temperature, humidity, wind speed, and dew point) are given as input to the data processing and feature selection module. The input data is normalized and passed through the relevancy filter, redundancy filter, and candidate interaction phase. The aim of this module is to clean the data and select abstractive features for the forecast process by maximizing relevancy, minimizing redundancy, and maximizing candidate interaction. The selected features are fed into the forecaster module based on FCRBM, whose purpose is to predict the future load of the FE grid. The forecasted load is fed into the optimizer module based on GWDO, whose purpose is to improve the forecast accuracy. At last, the forecasted load is fed into the utilization module to utilize the predicted load profile for decision making in the SG. The detailed description is as follows:
5.3.1 Data processing and features selection module
The input data, including historical load data and exogenous parameters (temperature, humidity, wind speed, and dew point), is fed into the data processing and feature selection module. At first, data cleansing is performed to recover the missing and defective values. Then, the clean data is normalized using Equation (17) to remove the outliers and keep the data within the limits of the activation function:

Norm = \frac{X - \mu(X)}{std(X)}, (17)
where Norm is the normalized data, X is the input data, µ is the mean, and std is the standard deviation. The input data (X) includes the electric load data (P(h, d)), temperature (T(h, d)), humidity (H(h, d)), dew point (D(h, d)), and wind speed (W(h, d)) parameters. Here h denotes a particular hour and d a particular day of the historical data. The temperature, humidity, dew point, and wind speed are called exogenous variables. The normalized data is passed through the irrelevancy filter, redundancy filter, and candidate interaction phase, subjected to the removal of irrelevant, redundant, and nonconstructive information, for three reasons: a) redundant information is not useful and increases the execution time during the training phase, b) irrelevant features do not provide useful information and act as outliers, and c) interacting candidates provide useful information to improve the forecast accuracy. The detailed description of relevancy, redundancy, and candidate interaction is as follows:
Figure 3. The proposed system model III: the inputs P(h, d), T(h, d), H(h, d), W(h, d), and D(h, d) enter the data processing and feature selection module (data cleansing, data normalization, and mutual information-based feature selection with irrelevancy filter, redundancy filter, and candidate interaction), which produces the selected features S_1, S_2, S_3, \ldots, S_n; the GWDO based optimizer module tunes, sets, and optimizes R_th, I_th, and C_i through an iterative error-minimization search procedure, and the forecasted load P(h+1, d+1) is delivered to the utilization module (decision making, generation planning, operation planning, load switching, energy purchasing, contract evaluation, and load scheduling).
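Before turning to the individual operations, the cleansing and z-score normalization step of Equation (17) can be sketched as below; the interpolation used here to recover missing values is an illustrative assumption, not a prescription of the proposal.

```python
import pandas as pd

def clean_and_normalize(series):
    """Fill missing/defective values and apply Equation (17):
    Norm = (X - mean(X)) / std(X)."""
    x = pd.Series(series, dtype=float).interpolate(limit_direction="both")
    return (x - x.mean()) / x.std()
```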
5.3.2 Relevancy operation
The relevance of the candidate inputs to the target variable is important for abstractive and key feature selection. For relevancy measurement, many techniques are used in the literature [89], among which the MI feature selection technique is a good choice. The MI measures the relevance between two variables x and y; it is interpreted as the information obtained about y by observing x and vice versa. The MI for continuous variables x and y is represented by I(x; y) and is defined over both the individual (p(x), p(y)) and joint (p(x, y)) probability distributions. Assume that

S = \{x_1, x_2, x_3, \ldots, x_M\}, (18)

where S represents the set of candidate inputs and y is the target variable. The relevance of each candidate input to the target variable y is checked. The relevance of candidate input x_i to the target variable y is defined by the following equation:

D(x_i) = I(x_i; y), (19)

where D(x_i) represents the relevance of each candidate input to the target variable.
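As a hedged illustration of Equation (19), the relevance D(x_i) = I(x_i; y) of each candidate column can be estimated with a generic mutual-information estimator; scikit-learn's mutual_info_regression is used below purely as one possible estimator, not as the estimator prescribed by the proposal.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def relevance(S, y):
    """D(x_i) = I(x_i; y) for every column x_i of the candidate matrix S
    (shape: samples x candidates), as in Equation (19)."""
    return mutual_info_regression(np.asarray(S), np.asarray(y))
```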
5.3.3 Redundancy operation
Many authors in [90]-[92] modeled the redundancy operation between the candidate inputs. The purpose is to remove the redundant information from the input data to improve the convergence rate. The redundancy is evaluated in terms of the common information between two candidate inputs. In [89], the authors demonstrated that closely related candidate inputs degrade the performance of the feature selection technique. The reason is that two candidate inputs may have a lot of common information but less redundant information about the target variable. Thus, a variable with less redundant information about the target variable may be incorrectly counted as highly redundant and filtered out, while it may be a key feature for the forecaster. To overcome this problem, a redundancy measure based on the interaction gain (Ig) is proposed [93] as:
RM(x_i, x_s) = Ig(x_i; x_s; y) = I[(x_i, x_s); y] - I(x_i; y) - I(x_s; y), (20)

where RM(x_i, x_s) is the redundancy measure, x_i and x_s are candidate inputs, and y is the target variable. The Ig can be mathematically modeled in terms of joint and individual entropies as:

Ig(x_i; x_s; y) = H(x_i, x_s) + H(x_i, y) + H(x_s, y) - H(x_i) - H(x_s) - H(y) - H(x_i, x_s, y), (21)

where H(x_i), H(x_s), and H(y) denote individual entropies and H(x_i, x_s), H(x_i, y), H(x_s, y), and H(x_i, x_s, y) denote joint entropies.
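The entropy form of the interaction gain in Equation (21) can be evaluated for discretized candidates as sketched below; the plug-in (histogram) entropy estimator is an assumption made only for illustration.

```python
import numpy as np
from collections import Counter

def entropy(*columns):
    """Plug-in (histogram) entropy, in bits, of the joint distribution
    of the given discrete columns."""
    joint = list(zip(*columns))
    p = np.array(list(Counter(joint).values()), dtype=float) / len(joint)
    return -np.sum(p * np.log2(p))

def interaction_gain(xi, xs, y):
    """Ig(xi; xs; y) of Equation (21)."""
    return (entropy(xi, xs) + entropy(xi, y) + entropy(xs, y)
            - entropy(xi) - entropy(xs) - entropy(y)
            - entropy(xi, xs, y))
```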
5.3.4 Interaction session
In [93], the authors used redundancy and irrelevancy filters for feature selection. However, individual features that are irrelevant on their own may become relevant when used together with other input candidates. Thus, the feature selection technique can be extended to the interaction among candidate inputs. If two candidate inputs x_i and x_s have redundant information about the target y, then the joint MI of both candidates with y will be less than the sum of the individual MIs. In that case the result of Equation (20) is negative, which indicates that x_i and x_s are redundant features for the forecaster, and the absolute value of Equation (20) shows the amount of redundancy. On the other hand, if the candidate inputs x_i and x_s interact with the target y, their interaction makes the joint MI of (x_i, x_s) with y greater than the sum of the individual MIs. Thus, a positive value of Equation (20) indicates interacting features, and its absolute value shows the amount of interaction. Consequently, for redundancy and interaction, Equation (20) can be redefined in terms of the interaction gain (Ig) as:

RM(x_i, x_s) = \begin{cases} Ig(x_i; x_s; y) & \text{if } Ig(x_i; x_s; y) < 0 \\ 0 & \text{otherwise} \end{cases} (22)

In(x_i, x_s) = \begin{cases} Ig(x_i; x_s; y) & \text{if } Ig(x_i; x_s; y) > 0 \\ 0 & \text{otherwise} \end{cases} (23)

where Equation (22) is a modified version of Equation (20) for the redundancy measure and Equation (23) is for the interaction measure. The overall interaction measure of a candidate input x_i is then defined as:

IM(x_i) = \underset{x_j \in S - \{x_i\}}{\text{Maximize}} \; In(x_i, x_j) (24)
5.3.5 The modified feature selection technique
The purpose of this modified feature selection technique is to maximize both relevancy and interaction and to minimize redundancy, based on the filters introduced in the preceding subsections. Our modified feature selection technique also considers candidate interaction, while the existing techniques, i.e., [90]-[93], only consider the relevancy and redundancy filters. The flow chart of our modified feature selection technique is shown in Figure 4. The detailed description and step by step procedure are as follows:
Figure 4. Flow chart of the modified feature selection technique: the candidate input set S^p = \{x^p_1, x^p_2, \ldots, x^p_M\} and target y enter the pre-filtering phase, where the relevancy measure D(x_i) and interaction measure IM(x_i) are computed and combined into the information content Ic(x_i) = f(D(x_i), IM(x_i)) = D(x_i) + \alpha \cdot IM(x_i), \alpha > 0, and the candidate inputs are sorted; the candidates then pass through the filtering stage (Figure 5) and the post-filtering stage (Figure 6) until the non-selected set S^n is empty, and the finally selected candidates S^s are returned.
Step 1: The input data, including the candidate set of inputs and the target value y, are given as input to the technique.
Step 2: The pre-filtering phase is demonstrated as follows:
The blocks enclosed in the dotted box form the pre-filtering phase. In this phase, the relevancy and interaction measures are calculated and the candidate inputs are ranked on the basis of the calculated measures.
The information content of each candidate is measured from its individual information and the gained information as Ic(x_i) = f(D(x_i), IM(x_i)) = D(x_i) + \alpha \cdot IM(x_i), as shown in the flow chart of Figure 4. The function f(\cdot,\cdot) is a monotonically increasing function, and \alpha is a weight factor that weights the relevancy versus the interaction measure; it can be adjusted and fine tuned subject to the forecasting problem.
The selected candidate inputs of the pre-filtering phase (S^p) are sorted in descending order based on their information content.
Step 3: The filtering phase, individually depicted in Figure 5, is described as follows:
The output of the pre-filtering phase is fed as input to the filtering phase. In this step, the pre-selected features are partitioned into selected (S^s) and non-selected (S^n) features as shown in Figure 4.
Figure 5. Flow chart of the filtering phase: starting from S^p = \{x^p_1, x^p_2, \ldots, x^p_M\} with S^s and S^n initially empty, the redundancy measure R(x^p_i), the information value V(x^p_i) = D(x^p_i) + \alpha \cdot IM(x^p_i) + \beta \cdot R(x^p_i), and a comparison of V(x^p_i) with the threshold R_{th} decide whether each candidate joins the selected set S^s or the non-selected set S^n; both sets are sorted according to V(x^p_i) and returned together with their union.
The redundancy measure is calculated by Equation (25) as:

R(x^p_i) = \underset{x^p_j \in S^p}{\text{Minimize}} \left\{ RM(x^p_i, x^p_j) \right\}, (25)

where R(x^p_i) indicates the redundancy measure for each candidate input x^p_i \in S^p.
The information value of each candidate feature is evaluated on the basis of three measures, i.e., redundancy, relevancy, and interaction, which is mathematically described as:

V(x^p_i) = g\left\{ D(x^p_i), IM(x^p_i), R(x^p_i) \right\} = D(x^p_i) + \alpha \cdot IM(x^p_i) + \beta \cdot R(x^p_i), \quad \alpha, \beta > 0, (26)

where V(x^p_i) denotes the information value, g(\cdot,\cdot,\cdot) indicates a monotonically increasing linear function, and \alpha and \beta denote adjustable parameters.
The decision about the information value is taken as follows:

\text{If } V(x^p_i) > R_{th}: \; S^s = S^s + \{x^p_i\}
\text{If } V(x^p_i) \leq R_{th}: \; S^n = S^n + \{x^p_i\}, (27)

where R_{th} is the redundancy threshold. The information value is compared with this threshold: if the information value is greater than the threshold, the candidate is put into the set of selected features (S^s); otherwise it is put into the set of non-selected features (S^n).
The sets of selected and non-selected features are sorted in descending order according to their information value, and their union is also taken. The selected and non-selected feature sets and their union are given as input to the post-filtering stage, which is individually depicted in Figure 6.
Step 4: In the post-filtering phase, the selected (S^s) and non-selected (S^n) inputs are modified and the information value V(\cdot) is updated. The updated information values are evaluated again using Equation (27) to transfer candidate inputs either to the selected or the non-selected features.
Step 5: The algorithm terminates when the non-selected feature set S^n becomes empty. The pre-filtering, filtering, and post-filtering phases are executed in each iteration and the execution never gets trapped in an infinite loop. Finally, the selected features are fed into the forecaster module.
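To make the filtering decision of Equations (25)-(27) concrete, the sketch below walks over the pre-selected candidates and splits them into the selected and non-selected sets; the dictionary-based inputs D and IM and the callable RM are illustrative assumptions about how the measures are supplied, not the proposal's interface.

```python
def filtering_phase(candidates, D, IM, RM, alpha, beta, R_th):
    """Sketch of the filtering phase of Equations (25)-(27).
    candidates: list of feature names; D, IM: dicts of relevancy and
    interaction measures; RM(i, j): pairwise redundancy measure."""
    selected, non_selected = [], []
    for xi in candidates:
        # Equation (25): redundancy is the minimum pairwise RM.
        R = min((RM(xi, xj) for xj in candidates if xj != xi), default=0.0)
        # Equation (26): combined information value.
        V = D[xi] + alpha * IM[xi] + beta * R
        # Equation (27): threshold decision.
        (selected if V > R_th else non_selected).append((V, xi))
    # Both sets sorted in descending order of information value.
    selected.sort(reverse=True)
    non_selected.sort(reverse=True)
    return [x for _, x in selected], [x for _, x in non_selected]
```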
Figure 6. Flow chart of the post-filtering stage: the selected and non-selected sets S^s and S^n are taken as input; for each feature x_i of the selected set S^s_k, the interaction In(x_i, x_j) is maximized over the remaining candidates, the information value V(x^s_i) = D(x^s_i) + \alpha \cdot IM(x^s_i) + \beta \cdot R(x^s_i) is updated together with the redundancy measure R(x^s_i), and candidates are moved between S^s and S^n by comparing V with the threshold R_{th}; the procedure repeats until all features are processed, and the updated S^s and S^n are returned.
5.3.6 FCRBM based forecaster module
The purpose of this module is to devise a framework that is enabled via learning to forecast the future electric load. From Section 2, it is concluded that not all forecast models are capable of predicting the non-linear electric load profile. Thus, we chose the FCRBM for the forecaster module for two reasons: a) it predicts the non-linear electric load with reasonable accuracy and convergence rate, and b) its performance improves with the scale of the data. The FCRBM is a deep learning model. It has four layers, i.e., hidden layer, visible layer, style layer, and history layer, and each layer has a particular number of neurons. In the forecaster module, the FCRBM is activated by the ReLU activation function and trained with the multivariate auto-regressive algorithm. The ReLU and the multivariate auto-regressive algorithm are chosen because they overcome the problems of overfitting and the vanishing gradient and have fast convergence compared to other activation functions. The mathematical model of the ReLU is described in Equation (8). The training and learning procedure iterates for a number of epochs to forecast the future load. To update the weight and bias vectors during the training process, authors have used different algorithms, i.e., gradient descent and back-propagation [94], the Levenberg-Marquardt algorithm [36], and the multivariate auto-regressive algorithm [10]. The Levenberg-Marquardt algorithm trains the network faster than gradient descent and backpropagation; here, the multivariate auto-regressive algorithm is used for network training due to its fast convergence and better performance. The selected features S_1, S_2, S_3, \ldots, S_n of the data processing module are fed into the forecaster module, where the forecaster constructs training and testing data samples. The first three years of data samples are used for the network training, while the last year of data samples is used for testing. The purpose is to enable the FCRBM via training to forecast the future load. The pictorial view of the training and learning process is shown in Figure 7. The forecaster module returns an error signal, and the weights and biases are adjusted as per the multivariate auto-regressive algorithm [95]. This error signal is fed into the optimization module to further improve the forecast accuracy.
Figure 7. Training and learning process of FCRBM: the input and the real load are fed to the FCRBM training process, which produces an initial forecast; the error signal between the real load and the forecast is fed back to the training process until the final forecast is obtained.
5.3.7 GWDO based optimizer module
The preceding module returns the predicted future load with some error, which is the minimum achievable by the FCRBM, ReLU, and multivariate auto-regressive algorithm. To further minimize the forecast error, the output of the forecaster module is fed into the optimizer module. The purpose of the optimizer module is to minimize the forecast error. Thus, the error minimization becomes an objective function for the optimizer module and can be mathematically modeled as:

\underset{R_{th},\, I_{th},\, C_i}{\text{Minimize}} \; Error(x), \quad x \in \{h, d\}, (28)

where R_{th} is the redundancy threshold, I_{th} is the irrelevancy threshold, and C_i is the candidate interaction. The optimizer module is based on our proposed GWDO algorithm. It optimizes R_{th}, I_{th}, and C_i and feeds these parameters back to the data processing module, where the feature selection technique uses the optimized values of the R_{th} and I_{th} thresholds and the candidate interaction C_i for the optimal selection of features. The integration of the optimizer module with the forecaster module increases the forecast accuracy at the cost of a higher execution time. Usually, such integration is preferred for applications where accuracy is more important than convergence rate. Various optimization techniques are available, such as linear programming, non-linear programming, convex programming, quadratic programming, and heuristic techniques. Linear programming is avoided because the optimization problem is non-linear. Non-linear programming is applicable here and returns more accurate results at the cost of a large execution time, while convex optimization and heuristic optimization suffer from slow and premature convergence, respectively. Similarly, the mEDE [10], [82] and DE [36] are not adopted because of their slow convergence, low precision, and tendency to get trapped in local optima [96]. To cure these problems, we propose the GWDO algorithm. In other words, the GWDO algorithm is preferred because it provides an optimal solution with a fast convergence rate. The proposed GWDO algorithm is a hybrid of GA [97] and WDO [79]. This hybrid algorithm is beneficial because it utilizes the key characteristics of both algorithms: the GA enables the diversity of the population and the WDO has fast convergence. The forecasted future load is utilized in the utilization module for planning, operation, and unit commitment.
5.3.8 Utilization module
The forecasted load is utilized for the long-term planning and development of the SG, which requires state permits, financing, rights of way, transmission and generation equipment, power lines (transmission and distribution lines), and substation construction.
6 Proposed methods
In this section, two deep learning techniques, the CRBM and the stacked FCRBM, are introduced for short-term load forecasting. For both of these models, three ingredients are described: the error function, the conditional probability, and the update/learning rules. The error function of a given network provides the scalar values that are essential for its configuration. The conditional probability calculates the probability of an event under a specific condition. The update/learning rules are required for tuning the free parameters of the system.
6.1 CRBM
The CRBM [98] is a modified version of the RBM [99]. It is a probabilistic machine learning model used to model human activities, weather data, collaborative filtering, classification, and time-series data [100]. Much progress has been made on training the non-conditional RBM, which is not directly applicable to the conditional model, and comparatively little work has been done on training and prediction with the CRBM. It has a three-layered architecture: visible layer, hidden layer, and conditional history layer, as shown in Figure 8. Moreover, it defines the probability distribution of one layer conditioned on the two remaining layers. It also allows the conditional history layer to determine the increments of the visible and hidden layer biases and weights, respectively. The three ingredients, i.e., the error function, conditional probability, and learning rules of the CRBM, are described as follows:
Figure 8. Generic architecture of CRBM: visible layer v, hidden layer h, and conditional history layer u, with weights w^{uv} and w^{uh} and biases a and b.
6.1.1 Error function
The error function expresses the possible correlation between the input, conditional history layer, hidden layer, and output. In addition, the error function takes into account all the possible interactions between neurons, weights, and biases. Equation (29) computes the error function as:

E(v, u, h; w) = -\left( v^T w^{vh} h + u^T w^{uv} v + u^T w^{uh} h + v^T a + h^T b \right) (29)

where v = [v_1, v_2, \ldots, v_n] is the real-valued vector of visible unit neurons from 1 to n, u = [u_1, u_2, \ldots, u_n] is the real-valued vector of history neurons from 1 to n, h = [h_1, h_2, \ldots, h_n] denotes the binary vector of hidden neurons from 1 to n, w is the weight matrix, a is the visible layer bias, and b is the hidden layer bias. The weight matrix w^{vh} is bidirectional while the weight matrices w^{uh} and w^{uv} are unidirectional.
6.1.2 Conditional probability
The conditional probability in the case of the CRBM determines the probability distribution over two inferences. The first inference, p(h|v, u), determines the probability of the hidden layer conditioned on all the other layers, while the second inference, p(v|h, u), determines the probability of the visible layer conditioned on all the other layers. In the CRBM there is no intra-layer connection between the neurons of the same layer, but inter-layer connections between the neurons of different layers exist. The two inferences lead to:

p(h|v, u) = sigmoid\left( u^T w^{uh} + v^T w^{vh} + b \right) (30)

p(v|h, u) = sigmoid\left( w^{uv} u^T + w^{vh} h + a \right) (31)

where

sigmoid(x) = \frac{1}{1 + \exp(-x)}
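A minimal sketch of the two inferences of Equations (30)-(31) is given below; the row-vector orientation of v and u and the matrix shapes are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function used in Equations (30)-(31)."""
    return 1.0 / (1.0 + np.exp(-x))

def crbm_inference(v, u, w_vh, w_uh, w_uv, a, b):
    """Conditional probabilities of Equations (30)-(31) for row vectors
    v (visible) and u (history); matrix orientations are assumptions."""
    p_h = sigmoid(u @ w_uh + v @ w_vh + b)       # Equation (30)
    p_v = sigmoid(u @ w_uv + p_h @ w_vh.T + a)   # Equation (31)
    return p_h, p_v
```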
6.1.3 Weights and biases learning and update rules
We use the stochastic gradient descent method for learning and updating the weights and biases of the layers because the alternative methods sometimes suffer from the vanishing gradient problem, which makes the network hard to train. The parameters are fine tuned to minimize the gap between the real and forecasted values. The gradients of the weights are calculated by the following equation:
\Delta w^{uh}_t = -\eta \frac{\partial E}{\partial w^{uh}}, \quad \Delta w^{uv}_t = -\eta \frac{\partial E}{\partial w^{uv}}, \quad \Delta w^{vh}_t = -\eta \frac{\partial E}{\partial w^{vh}} (32)
For each layer the change in biases is calculated by the following equation:

\Delta a_t = -\eta \frac{\partial E}{\partial a_v}, \quad \Delta b_t = -\eta \frac{\partial E}{\partial b_h} (33)
The weights are updated as follows:

w^{uh}_{t+1} = w^{uh}_t + \Delta w^{uh}_t, \quad w^{uv}_{t+1} = w^{uv}_t + \Delta w^{uv}_t, \quad w^{vh}_{t+1} = w^{vh}_t + \Delta w^{vh}_t. (34)
The biases are updated by the following equation:

a_{t+1} = a_t + \Delta a_t, \quad b_{t+1} = b_t + \Delta b_t, (35)

where \eta is the learning rate, \Delta denotes the gradient-based change, and t is the iteration number. The aforementioned procedure is repeated for a number of epochs until the model converges.
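The update rules of Equations (32)-(35) amount to one stochastic gradient descent step per parameter, as sketched below; grad_E is a placeholder for a routine that returns the partial derivatives of the error function.

```python
def sgd_step(params, grad_E, eta):
    """One stochastic gradient descent step in the spirit of
    Equations (32)-(35): delta = -eta * dE/dtheta and
    theta_{t+1} = theta_t + delta.  `params` is a dict of arrays and
    `grad_E(params)` returns a matching dict of partial derivatives."""
    grads = grad_E(params)
    return {name: value - eta * grads[name] for name, value in params.items()}
```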
6.2 Stacked FCRBM
The FCRBM is an extension of the CRBM introduced by Taylor and Hinton in [100]. In the FCRBM [103], the authors add the concepts of factors and styles to mimic multiple human actions (as shown in Figure 9). Its contrastive divergence training does not suffer from the vanishing gradient issue encountered in backpropagation. It has a rich, distributed hidden state which permits simple and exact inference, and this helps preserve the temporal information present in the electricity load time series [101]. We propose a new way to adopt the deep learning technique of a stacked FCRBM for short-term load forecasting, where the successive layers take the output from the preceding trained layers and improve the forecast accuracy. The stacked FCRBM has three layers of CRBM and an additional style layer (as shown in Figure 10). The style layer represents multiple parameters that are important for load forecasting.
Figure 9. Generic architecture of stacked FCRBM: visible layer v, history layer u, hidden layer h, and style layer y, with layer weights w^v, w^h, w^y, factor connections A^v, A^u, A^y, B^h, B^u, B^y, and biases a and b.
The stacked FCRBM comprises four layers, as shown in Figure 9: a) the visible layer v, b) the history layer u, c) the hidden layer h, and d) the style layer y. The visible and history layers are real-valued while the hidden layer is binary. These layers are significant for the proper operation of the stacked FCRBM. The visible layer is responsible for encoding the present time series data to forecast the future value, while the history layer encodes the historical time series data. The hidden layer is responsible for the discovery of the significant features that are required for analysis. The different styles and parameters, which are essential for forecasting, are embedded into the style layer. The relations and interactions between the layers, weights, and factors are expressed by an error function as:
E(v, u, h; w) = -v^T \hat{a} - h^T \hat{b} - \left\{ (v^T w^v) \circ (y^T w^y) \circ (h^T w^h) \right\} (36)

where E is the error function, v^T w^v is the visible factored term, y^T w^y is the style factored term, and h^T w^h is the hidden factored term; \circ is the Hadamard product used for element-wise multiplication.
Figure 10. Architecture of the proposed stacked FCRBM: the input features pass through three stacked CRBM layers and a style layer to produce the predicted load.
The \hat{a} and \hat{b} elements represent the dynamic biases associated with the visible and hidden layers, respectively, which are defined as follows:

\hat{a} = a + A^v \left( (u^T A^u) \circ (y^T A^y) \right)^T
\hat{b} = b + B^h \left( (u^T B^u) \circ (y^T B^y) \right)^T (37)
where w^v, w^y, and w^h are the weights of the corresponding layers and A^v, A^u, A^y, B^h, B^u, and B^y are the connections of the corresponding layers to the factors; these are also known as model free parameters. The connections and weights are the parameters that must be fine tuned and trained for accurate performance of the deep learning technique stacked FCRBM.
6.3 Conditional probability
In the case of the stacked FCRBM, the conditional probability determines the probability distribution of one layer conditioned on all the remaining layers. In the first case, we define the probability distribution of the hidden layer conditioned on all the remaining layers, p(h|v, u, y). There is no intra-layer connection between the neurons of the same layer, only inter-layer connections between the neurons of different layers. The conditional probability of the hidden layer can be calculated as:

p(h|v, u, y) = ReLU\left[ \hat{b} + w^h \left( (v^T w^v) \circ (y^T w^y) \right) \right] (38)

where the ReLU is defined in Equation (8). For all inputs, the probability of the hidden layer neurons is evaluated using the ReLU activation function.
In the second case, we determine the probability of the visible layer, p(v|h, u, y), conditioned on the remaining layers. The conditional probability of the visible layer is defined as:

p(v|h, u, y) = ReLU\left[ \hat{a} + w^v \left( (h^T w^h) \circ (y^T w^y) \right) \right] (39)

Finally, we define the joint probability distribution of the visible and hidden layer neurons conditioned on the history layer, style layer, and model parameters, p(v, h|u, y, \ldots). The restriction is that there is no intra-layer connection between the neurons, while there are only inter-layer connections between the neurons of different layers. The joint probability is calculated as:

p(v, h|u, y, \ldots) = ReLU\begin{bmatrix} \hat{b} + w^h \left\{ (v^T w^v) \circ (y^T w^y) \right\} \\ \hat{a} + w^v \left\{ (h^T w^h) \circ (y^T w^y) \right\} \end{bmatrix} (40)

Equation (40) represents the joint probability distribution of the visible and hidden layer neurons.
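A hedged sketch of the factored inferences of Equations (38)-(39) is shown below; the row-vector orientation and the shapes of the factor weight matrices are illustrative assumptions, not the proposal's exact implementation.

```python
import numpy as np

def relu(x):
    """ReLU activation, as in Equation (8)."""
    return np.maximum(0.0, x)

def fcrbm_conditionals(v, y, h, a_hat, b_hat, w_v, w_y, w_h):
    """Factored conditional probabilities of Equations (38)-(39).
    v, y, h are row vectors; w_v, w_y, w_h map the visible, style, and
    hidden layers into a common factor space (shapes are assumptions)."""
    factor_vy = (v @ w_v) * (y @ w_y)        # Hadamard product in factor space
    p_h = relu(b_hat + factor_vy @ w_h.T)    # Equation (38)
    factor_hy = (h @ w_h) * (y @ w_y)
    p_v = relu(a_hat + factor_hy @ w_v.T)    # Equation (39)
    return p_h, p_v
```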
Algorithm 1 Pseudo-code of the proposed short-term load forecasting model
1: Import the off-line data of the US utility
2: Restore the defective and missing values by data cleansing phase
3: Normalize the data w.r.t. its maximum value by data normalization phase
4: Change the data structure by data structuring phase
5: Extract the desired features from the data and split into training, testing, and validation
datasets
6: Create architecture of the CRBM
7: Create architecture of the stacked FCRBM
8: Initialize parameters: learning rate η, weights w, and biases b
9: repeat for the number of training epochs
10: if Model selected is CRBM do
11: for available training data do
12: Adjust the visible layer v
13: Adjust the history layer u
14: Adjust the hidden layer h
15: Create weights and biases of the corresponding layers
16: Convolve weights to the corresponding layers
17: Add dynamic biases to the weighted sum
18: Process the results (using Equations 30-31)
19: Calculate the error function of CRBM
20: Pass the error function through stochastic gradient descent (using Equations 32-33)
21: Update the weights and biases (using Equations 34-35)
22: end for
23: else if Pick stacked FCRBM do
24: for available training data do
25: Adjust the visible layer v
26: Adjust the history layer u
27: Adjust the hidden layer h
28: Adjust the style layer y
29: Create the factored visible, factored hidden, and factored label weights
30: Interact factored weights to the corresponding layers
31: Add dynamic biases to the factor weighted sum
32: Pass the result to the activation function ReLU (using Equation (8))
33: Calculate the error function for stacked FCRBM
34: Pass the error function through stochastic gradient descent (using Equations 41-44)
35: Update the weights and biases (using Equations 45-48)
36: end for
37: end if
38: until convergence
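The stacked FCRBM branch of Algorithm 1 can be summarized by the training-loop skeleton below; the model object and its methods are placeholders standing in for the layer set-up, error, gradient, and update steps of Equations (36)-(48), so this is a sketch rather than the actual implementation.

```python
def train_stacked_fcrbm(model, batches, epochs, eta):
    """Skeleton of the stacked FCRBM branch of Algorithm 1.
    `model` is a placeholder object exposing the inference, error,
    gradient, and update steps (Equations (36)-(48))."""
    for epoch in range(epochs):
        for v, u, y in batches:                     # visible, history, style inputs
            h = model.hidden_probability(v, u, y)   # factored inference, Eq. (38)
            error = model.error(v, u, h, y)         # error function, Eq. (36)
            grads = model.gradients(error)          # gradients, Eqs. (41)-(44)
            model.apply_updates(grads, eta)         # updates, Eqs. (45)-(48)
        if model.converged():
            break
    return model
```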
6.4 Stacked FCRBM weights and biases learning rules
We adopt stochastic gradient descent for the learning and update rules to overcome the problem of the vanishing gradient. Moreover, stochastic gradient descent converges faster and avoids overfitting on large datasets compared to the batch gradient descent and mini-batch gradient descent algorithms [102]. The gradient of the weights for each layer is calculated as:
\Delta w^{h}_t = -\eta \frac{\partial E}{\partial w^{h}}, \quad \Delta w^{v}_t = -\eta \frac{\partial E}{\partial w^{v}}, \quad \Delta w^{y}_t = -\eta \frac{\partial E}{\partial w^{y}} (41)
For each layer the gradients of the connections are calculated as follows:

\Delta A^{u}_t = -\eta \frac{\partial E}{\partial A^{u}}, \quad \Delta A^{v}_t = -\eta \frac{\partial E}{\partial A^{v}}, \quad \Delta A^{y}_t = -\eta \frac{\partial E}{\partial A^{y}} (42)
\Delta B^{u}_t = -\eta \frac{\partial E}{\partial B^{u}}, \quad \Delta B^{h}_t = -\eta \frac{\partial E}{\partial B^{h}}, \quad \Delta B^{y}_t = -\eta \frac{\partial E}{\partial B^{y}} (43)
The gradients of the dynamic biases are as follows:

\Delta \hat{a}_t = -\eta \frac{\partial E}{\partial \hat{a}}, \quad \Delta \hat{b}_t = -\eta \frac{\partial E}{\partial \hat{b}} (44)
The weights of the corresponding layers are updated as:

w^{h}_{t+1} = w^{h}_t + \Delta w^{h}_t, \quad w^{y}_{t+1} = w^{y}_t + \Delta w^{y}_t, \quad w^{v}_{t+1} = w^{v}_t + \Delta w^{v}_t (45)
The connections are updated as follows:

A^{u}_{t+1} = A^{u}_t + \Delta A^{u}_t, \quad A^{v}_{t+1} = A^{v}_t + \Delta A^{v}_t, \quad A^{y}_{t+1} = A^{y}_t + \Delta A^{y}_t (46)
B^{u}_{t+1} = B^{u}_t + \Delta B^{u}_t, \quad B^{h}_{t+1} = B^{h}_t + \Delta B^{h}_t, \quad B^{y}_{t+1} = B^{y}_t + \Delta B^{y}_t (47)
The dynamic biases are updated as follows:

\hat{a}_{t+1} = \hat{a}_t + \Delta \hat{a}_t, \quad \hat{b}_{t+1} = \hat{b}_t + \Delta \hat{b}_t (48)
where Equation (45) is the weight update equation for each layer, Equations (46) and (47) are the connection update equations, and Equation (48) is the dynamic bias update equation. The pseudo-code of our proposed short-term load forecasting model is given in Algorithm 1.
7 Research methodology
The main objective of this research work is to design an accurate and fast converging model based on a deep neural network for the decision making of the SG. Thus, a novel hybrid forecast model composed of MMI, FCRBM, and GWDO techniques is proposed for short-term electric load forecasting. The aforementioned techniques are arranged in a coordinated modular framework to construct the proposed hybrid model. Furthermore, the proposed model is tested on hourly historical load data of three USA grids (FE, Daytown, and EKPC) and the global energy forecasting competition 2012. The results obtained with the proposed model have proven more accurate when compared to existing models such as ANN, CNN, Bi-level, mutual information based ANN (MI-ANN), and accurate fast converging ANN (AFC-ANN). Subsequently, the main body of this research work will involve the following topics:
1. Acquiring the basic knowledge and detailed literature survey
2. Research gap analysis
3. Statement of the problem
4. Investigation of the proposed system model
5. Provision of the requirements for proposed methods evaluation
6. Simulation results and discussion study
7. Comparison with existing literature models
8. Thesis write-up
References
[1] Javaid, Nadeem, Ghulam Hafeez, Sohail Iqbal, Nabil Alrajeh, Mohamad Souheil Alabed,
and Mohsen Guizani. “Energy efficient integration of renewable energy sources in the smart
grid for demand side management." IEEE Access 6 (2018): 77077-77096.
[2] Xiao, Liye, Wei Shao, Chen Wang, Kequan Zhang, and Haiyan Lu. “Research and applica-
tion of a hybrid model based on multi-objective optimization for electrical load forecasting."
Applied Energy 180 (2016): 213-233.
[3] Alahakoon, Damminda, and Xinghuo Yu. “Smart electricity meter data intelligence for
future energy systems: A survey." IEEE Transactions on Industrial Informatics 12, no. 1
(2016): 425-436.
[4] Hernandez, Luis, Carlos Baladron, Javier M. Aguiar, Belén Carro, Antonio J. Sanchez-
Esguevillas, Jaime Lloret, and Joaquim Massana. “A survey on electric power demand fore-
casting: future trends in smart grids, microgrids and smart buildings." IEEE Communica-
tions Surveys & Tutorials 16, no. 3 (2014): 1460-1495.
[5] Rahman, Aowabin, Vivek Srikumar, and Amanda D. Smith. “Predicting electricity con-
sumption for commercial and residential buildings using deep recurrent neural networks."
Applied Energy 212 (2018): 372-385.
[6] Boroojeni, Kianoosh G., M. Hadi Amini, Shahab Bahrami, S. S. Iyengar, Arif I. Sarwat,
and Orkun Karabasoglu. “A novel multi-time-scale modeling for electric power demand
forecasting: From short-term to medium-term horizon." Electric Power Systems Research
142 (2017): 58-73.
[7] Xu, Xiaomin, Dongxiao Niu, Qiong Wang, Peng Wang, and Desheng Dash Wu. “Intelli-
gent forecasting model for regional power grid with distributed generation." IEEE Systems
Journal 11, no. 3 (2017): 1836-1845.
[8] Dedinec, Aleksandra, Sonja Filiposka, Aleksandar Dedinec, and Ljupco Kocarev. “Deep
belief network based electricity load forecasting: An analysis of Macedonian case." Energy
115 (2016): 1688-1700.
[9] Hong, Wei-Chiang. “Electric load forecasting by seasonal recurrent SVR (support vector
regression) with chaotic artificial bee colony algorithm." Energy 36, no. 9 (2011): 5568-
5578.
[10] Ahmad, Ashfaq, Nadeem Javaid, Mohsen Guizani, Nabil Alrajeh, and Zahoor Ali Khan.
"An accurate and fast converging short-term load forecasting model for industrial appli-
cations in a smart grid." IEEE Transactions on Industrial Informatics 13, no. 5 (2017):
2587-2596.
[11] Hahn, Heiko, Silja Meyer-Nieberg, and Stefan Pickl. “Electric load forecasting methods:
Tools for decision making." European journal of operational research 199, no. 3 (2009):
902-907.
[12] Taylor, James W. “An evaluation of methods for very short-term load forecasting using
minute-by-minute British data." International Journal of Forecasting 24, no. 4 (2008): 645-
658.
[13] De Felice, Matteo, and Xin Yao. “Short-term load forecasting with neural network ensem-
bles: A comparative study [application notes]." IEEE Computational Intelligence Magazine
6, no. 3 (2011): 47-56.
[14] Pedregal, Diego J., and Juan R. Trapero. “Mid-term hourly electricity forecasting based on
a multi-rate approach." Energy Conversion and Management 51, no. 1 (2010): 105-111.
[15] Filik, Ümmühan Ba¸saran, Ömer Nezih Gerek, and Mehmet Kurban. “A novel modeling
approach for hourly forecasting of long-term electric energy demand." Energy Conversion
and Management 52, no. 1 (2011): 199-211.
[16] López, M., S. Valero, C. Senabre, J. Aparicio, and A. Gabaldon. “Application of SOM
neural networks to short-term load forecasting: The Spanish electricity market case study."
Electric Power Systems Research 91 (2012): 18-27.
[17] Zjavka, Ladislav, and Václav Snášel. “Short-term powerload forecasting with ordinary dif-
ferential equation substitutions of polynomial networks." Electric Power Systems Research
137 (2016): 113-123.
[18] Liu, Dunnan, Long Zeng, Canbing Li, Kunlong Ma, Yujiao Chen, and Yijia Cao. “A dis-
tributed short-term load forecasting method based on local weather information." IEEE Sys-
tems Journal 12, no. 1 (2018): 208-215.
[19] Ghadimi, Noradin, Adel Akbarimajd, Hossein Shayeghi, and Oveis Abedinia. “Two stage
forecast engine with feature selection technique and improved meta-heuristic algorithm for
electricity load forecasting." Energy 161 (2018): 130-142.
[20] Kong, Weicong, Zhao Yang Dong, David J. Hill, Fengji Luo, and Yan Xu. “Short-term
residential load forecasting based on resident behaviour learning." IEEE Transactions on
Power Systems 33, no. 1 (2018): 1087-1088.
[21] Vrablecova, Petra, Anna Bou Ezzeddine, Viera Rozinajová, Slavomír Šárik, and Arun Ku-
mar Sangaiah. “Smart grid load forecasting using online support vector regression." Com-
puters & Electrical Engineering 65 (2018): 102-117.
[22] González, Jose Portela, Antonio Munoz San Roque, and Estrella Alonso Perez. “Forecast-
ing functional time series with a new Hilbertian ARMAX model: Application to electricity
price forecasting." IEEE Transactions on Power Systems 33, no. 1 (2018): 545-556.
[23] Luo, Jian, Tao Hong, and Shu-Cherng Fang. “Benchmarking robustness of load forecasting
models under data integrity attacks." International Journal of Forecasting 34, no. 1 (2018):
89-104.
[24] Marino, Daniel L., Kasun Amarasinghe, and Milos Manic. "Building energy load fore-
casting using deep neural networks." In Industrial Electronics Society, IECON 2016-42nd
Annual Conference of the IEEE, pp. 7046-7051. IEEE, 2016.
[25] Zeng, Nianyin, Hong Zhang, Weibo Liu, Jinling Liang, and Fuad E. Alsaadi. "A switching
delayed PSO optimized extreme learning machine for short-term load forecasting." Neuro-
computing 240 (2017): 175-182.
[26] Cecati, Carlo, Janusz Kolbusz, Pawel Rozycki, Pierluigi Siano, and Bogdan M. Wilam-
owski. "A novel RBF training algorithm for short-term electric load forecasting and com-
parative studies." IEEE Transactions on industrial Electronics 62, no. 10 (2015): 6519-6529.
[27] Mocanu, Elena, Decebal Constantin Mocanu, Phuong H. Nguyen, Antonio Liotta, Michael
E. Webber, Madeleine Gibescu, and Johannes G. Slootweg. “On-line building energy op-
timization using deep reinforcement learning." IEEE Transactions on Smart Grid (2018).
1-11
[28] Mocanu, Elena, Phuong H. Nguyen, Madeleine Gibescu, and Wil L. Kling. "Deep learning
for estimating building energy consumption." Sustainable Energy, Grids and Networks 6
(2016): 91-99.
[29] Mujeeb, Sana, and Nadeem Javaid. "ESAENARX and DE-RELM: Novel Schemes for Big
Data Predictive Analytics of Electricity Load and Price." Sustainable Cities and Society
(2019).
[30] Kim, Myoungsoo, Wonik Choi, Youngjun Jeon, and Ling Liu. "A Hybrid Neural Network
Model for Power Demand Forecasting." Energies 12, no. 5 (2019): 931.
[31] Huang, Yunyou, Nana Wang, Tianshu Hao, Wanling Gao, Cheng Huang, Jianqing Li,
and Jianfeng Zhan. "LoadCNN: A Efficient Green Deep Learning Model for Day-ahead
Individual Resident Load Forecasting." arXiv preprint arXiv:1908.00298 (2019).
[32] Deng, Zhuofu, Binbin Wang, Yanlu Xu, Tengteng Xu, Chenxu Liu, and Zhiliang Zhu.
"Multi-Scale Convolutional Neural Network With Time-Cognition for Multi-Step Short-
Term Load Forecasting." IEEE Access 7 (2019): 88058-88071.
[33] Mocanu, Elena, Decebal Constantin Mocanu, Phuong H. Nguyen, Antonio Liotta, Michael
E. Webber, Madeleine Gibescu, and J. G. Slootweg. "On-line building energy optimization
using deep reinforcement learning." arXiv preprint arXiv:1707.05878 (2017).
[34] Dedinec, Aleksandra, Sonja Filiposka, Aleksandar Dedinec, and Ljupco Kocarev. "Deep
belief network based electricity load forecasting: An analysis of Macedonian case." Energy
115 (2016): 1688-1700.
[35] Fan, Cheng, Fu Xiao, and Yang Zhao. "A short-term building cooling load prediction
method using deep learning algorithms." Applied energy 195 (2017): 222-233.
[36] Amjady, Nima, Farshid Keynia, and Hamidreza Zareipour. "Short-term load forecast of
microgrids by a new bilevel prediction strategy." IEEE Transactions on smart grid 1, no. 3
(2010): 286-294.
[37] Amjady, Nima, and Farshid Keynia. "Day-ahead price forecasting of electricity markets
by mutual information technique and cascaded neuro-evolutionary algorithm." IEEE Trans-
actions on Power Systems 24, no. 1 (2009): 306-318.
[38] Dedinec, Aleksandra, Sonja Filiposka, Aleksandar Dedinec, and Ljupco Kocarev. "Deep
belief network based electricity load forecasting: An analysis of Macedonian case." Energy
115 (2016): 1688-1700.
[39] Ryu, Seunghyoung, Jaekoo Noh, and Hongseok Kim. "Deep neural network based demand
side short term load forecasting." Energies 10, no. 1 (2016): 3.
[40] Qiu, Xueheng, Ye Ren, Ponnuthurai Nagaratnam Suganthan, and Gehan AJ Amaratunga.
"Empirical mode decomposition based ensemble deep learning for load demand time series
forecasting." Applied Soft Computing 54 (2017): 246-255.
[41] Liu, Dunnan, Long Zeng, Canbing Li, Kunlong Ma, Yujiao Chen, and Yijia Cao. “A dis-
tributed short-term load forecasting method based on local weather information." IEEE Sys-
tems Journal 12, no. 1 (2018): 208-215.
[42] Khalid, Rabiya, Nadeem Javaid, Muhammad Hassan Rahim, Sheraz Aslam, and Arshad
Sher. “Fuzzy energy management controller and scheduler for smart homes." Sustainable
Computing: Informatics and Systems 21 (2019): 103-118.
[43] Iqbal, Sajid, Muhammad U. Ghani Khan, Tanzila Saba, Zahid Mehmood, Nadeem Javaid,
Amjad Rehman, and Rashid Abbasi. "Deep learning model integrating features and novel
classifiers fusion for brain tumor segmentation." Microscopy research and technique (2019).
[44] Javaid, Nadeem, Fahim Ahmed, Ibrar Ullah, Samia Abid, Wadood Abdul, Atif Alamri,
and Ahmad Almogren. "Towards cost and comfort based hybrid optimization for residential
load scheduling in a smart grid." Energies 10, no. 10 (2017): 1546.
[45] Khan, Zahoor Ali, Ayesha Zafar, Sakeena Javaid, Sheraz Aslam, Muhammad Hassan
Rahim, and Nadeem Javaid. "Hybrid meta-heuristic optimization based home energy man-
agement system in smart grid." Journal of Ambient Intelligence and Humanized Computing
(2019): 1-17.
[46] Khan, Zahoor Ali, Ayesha Zafar, Sakeena Javaid, Sheraz Aslam, Muhammad Hassan
Rahim, and Nadeem Javaid. "Hybrid meta-heuristic optimization based home energy man-
agement system in smart grid." Journal of Ambient Intelligence and Humanized Computing
(2019): 1-17.
[47] Aslam, Sheraz, Zafar Iqbal, Nadeem Javaid, Zahoor Khan, Khursheed Aurangzeb, and
Syed Haider. "Towards efficient energy management of smart buildings exploiting heuristic
optimization with real time and critical peak pricing schemes." Energies 10, no. 12 (2017):
2065.
[48] Collotta, Mario, and Giovanni Pau. “An innovative approach for forecasting of energy
requirements to improve a smart home management system based on BLE." IEEE Transac-
tions on Green Communications and Networking 1, no. 1 (2017): 112-120.
[49] Shi, Heng, Minghao Xu, and Ran Li. “Deep learning for household load forecasting—a
novel pooling deep RNN." IEEE Transactions on Smart Grid 9, no. 5 (2018): 5271-5280.
[50] Kong, Weicong, Zhao Yang Dong, David J. Hill, Fengji Luo, and Yan Xu. “Short-term
residential load forecasting based on resident behaviour learning." IEEE Transactions on
Power Systems 33, no. 1 (2018): 1087-1088.
[51] Huang, Xuefei, Seung Ho Hong, and Yuting Li. “Hour-ahead price based energy manage-
ment scheme for industrial facilities." IEEE Transactions on Industrial Informatics 13, no.
6 (2017): 2886-2898.
[52] Li, Liangzhi, Kaoru Ota, and Mianxiong Dong. “When weather matters: IoT-based elec-
trical load forecasting for smart grid." IEEE Communications Magazine 55, no. 10 (2017):
46-51.
[53] Wang, Yu, Yinxing Shen, Shiwen Mao, Guanqun Cao, and Robert M. Nelms. “Adaptive
learning hybrid model for solar intensity forecasting." IEEE Transactions on Industrial In-
formatics 14, no. 4 (2018): 1635-1645.
[54] Tang, Ningkai, Shiwen Mao, Yu Wang, and R. M. Nelms. “Solar Power Generation Fore-
casting with a LASSO-based Approach." IEEE Internet of Things Journal (2018). 1-10
[55] van der Meer, D. W., J. Munkhammar, and J. Widén. “Probabilistic forecasting of solar
power, electricity consumption and net load: Investigating the effect of seasons, aggregation
and penetration on prediction intervals." Solar Energy 171 (2018): 397-413.
[56] Zhang, Jinliang, Yi-Ming Wei, Dezhi Li, Zhongfu Tan, and Jianhua Zhou. “Short term
electricity load forecasting using a hybrid model." Energy (2018).
[57] Tong, Chao, Jun Li, Chao Lang, Fanxin Kong, Jianwei Niu, and Joel JPC Rodrigues. “An
efficient deep model for day-ahead electricity load forecasting with stacked denoising auto-
encoders." Journal of Parallel and Distributed Computing 117 (2018): 267-273.
[58] Wang, Pu, Bidong Liu, and Tao Hong. “Electric load forecasting with recency effect: A
big data approach." International Journal of Forecasting 32, no. 3 (2016): 585-597.
[59] Carvallo, Juan Pablo, Peter H. Larsen, Alan H. Sanstad, and Charles A. Goldman. “Long
term load forecasting accuracy in electric utility integrated resource planning." Energy Pol-
icy 119 (2018): 410-422.
[60] Yuan, Jihui, Craig Farnham, Chikako Azuma, and Kazuo Emura. “Predictive artificial neu-
ral network models to forecast the seasonal hourly electricity consumption for a University
Campus." Sustainable Cities and Society 42 (2018): 82-92.
[61] Brodowski, Stanisław, Andrzej Bielecki, and Maciej Filocha. “A hybrid system for fore-
casting 24-h power load profile for Polish electric grid." Applied Soft Computing 58 (2017):
527-539.
[62] Qiu, Xueheng, Ye Ren, Ponnuthurai Nagaratnam Suganthan, and Gehan AJ Amaratunga.
“Empirical mode decomposition based ensemble deep learning for load demand time series
forecasting." Applied Soft Computing 54 (2017): 246-255.
[63] Yang, Ailing, Weide Li, and Xuan Yang. “Short-term electricity load forecasting based on
feature selection and Least Squares Support Vector Machines." Knowledge-Based Systems
163 (2019): 159-173.
[64] Qiu, Xueheng, Ponnuthurai Nagaratnam Suganthan, and Gehan AJ Amaratunga. “Ensem-
ble incremental learning Random Vector Functional Link network for short-term electric
load forecasting." Knowledge-Based Systems 145 (2018): 182-196.
[65] Chen, Yanhua, Marius Kloft, Yi Yang, Caihong Li, and Lian Li.“Mixed kernel based ex-
treme learning machine for electric load forecasting." Neurocomputing 312 (2018): 90-106.
[66] Zeng, Nianyin, Hong Zhang, Weibo Liu, Jinling Liang, and Fuad E. Alsaadi. “A switching
delayed PSO optimized extreme learning machine for short-term load forecasting." Neuro-
computing 240 (2017): 175-182.
[67] Zhang, Xiaobo, Jianzhou Wang, and Kequan Zhang. “Short-term electric load forecast-
ing based on singular spectrum analysis and support vector machine optimized by Cuckoo
search algorithm." Electric Power Systems Research 146 (2017): 270-285.
[68] Chen, Yibo, Hongwei Tan, and Umberto Berardi. “Day-ahead prediction of hourly elec-
tric demand in non-stationary operated commercial buildings: A clustering-based hybrid
approach." Energy and Buildings 148 (2017): 228-237.
[69] Guo, Zhifeng, Kaile Zhou, Xiaoling Zhang, and Shanlin Yang. “A deep learning model for
short-term power load and probability density forecasting." Energy 160 (2018): 1186-1200.
[70] Ghadimi, Noradin, Adel Akbarimajd, Hossein Shayeghi, and Oveis Abedinia. “Two stage
forecast engine with feature selection technique and improved meta-heuristic algorithm for
electricity load forecasting." Energy 161 (2018): 130-142.
[71] Li, Yanying, Jinxing Che, and Youlong Yang. “Subsampled support vector regression en-
semble for short term electric load forecasting." Energy 164 (2018): 160-170.
[72] Jawad, Muhammad, Sahibzada M. Ali, Bilal Khan, Chaudry A. Mehmood, Umar Farid,
Zahid Ullah, Saeeda Usman et al. “Genetic algorithm-based non-linear auto-regressive with
exogenous inputs neural network short-term and medium-term uncertainty modelling and
prediction for electrical load and wind speed." The Journal of Engineering 2018, no. 8
(2018): 721-729.
[73] Manjili, Yashar Sahraei, Rolando Vega, and Mo M. Jamshidi. “Data-Analytic-Based Adap-
tive Solar Energy Forecasting Framework." IEEE Systems Journal 12, no. 1 (2018): 285-
296.
[74] Semero, Yordanos Kassa, Jianhua Zhang, and Dehua Zheng. “PV power forecasting us-
ing an integrated GA-PSO-ANFIS approach and Gaussian process regression based feature
selection strategy." CSEE Journal of Power and Energy Systems 4, no. 2 (2018): 210-218.
[75] Liang, Yi, Dongxiao Niu, and Wei-Chiang Hong. "Short term load forecasting based on
feature extraction and improved general regression neural network model." Energy 166
(2019): 653-663.
[76] Devarajan, Sandhiya, and S. Chitra. "Load forecasting model for energy management system using Elman neural network." International Research Journal of Multidisciplinary Technovation 1, no. 5 (2019): 48-56.
[77] Bianchini, Monica, and Franco Scarselli. "On the complexity of neural network classifiers: A comparison between shallow and deep architectures." IEEE Transactions on Neural Networks and Learning Systems 25, no. 8 (2014): 1553-1565.
[78] Mhaskar, Hrushikesh, Qianli Liao, and Tomaso Poggio. "When and why are deep net-
works better than shallow ones?." In Thirty-First AAAI Conference on Artificial Intelli-
gence. 2017.
[79] Bao, Zongfan, Yongquan Zhou, Liangliang Li, and Mingzhi Ma. “A hybrid global optimization algorithm based on wind driven optimization and differential evolution." Mathematical Problems in Engineering 2015 (2015): 608-620.
[80] Zhang, Qingchen, Laurence T. Yang, Zhikui Chen, and Peng Li. "A survey on deep learn-
ing for big data." Information Fusion 42 (2018): 146-157.
[81] Kim, Junhong, Jihoon Moon, Eenjun Hwang, and Pilsung Kang. "Recurrent inception
convolution neural network for multi short-term load forecasting." Energy and Buildings
194 (2019): 328-341.
[82] Hafeez, Ghulam, Noor Islam, Ammar Ali, Salman Ahmad, Muhammad Usman, and Khur-
ram Saleem Alimgeer. "A Modular Framework for Optimal Load Scheduling under Price-
Based Demand Response Scheme in Smart Grid." Processes 7, no. 8 (2019): 499.
[83] https://www.kaggle.com/c/GEFC-2012 (accessed on 17 February 2019).
[84] Hafeez, Ghulam, Nadeem Javaid, Muhammad Riaz, Ammar Ali, Khalid Umar, and Zafar Iqbal. “Day ahead electric load forecasting by an intelligent hybrid model based on deep learning for smart grid." In 13th International Conference on Complex, Intelligent, and Software Intensive Systems, pp. 276-290. Springer, Cham, 2019.
[85] Abedinia, Oveis, Nima Amjady, and Hamidreza Zareipour. “A new feature selection tech-
nique for load and price forecast of electrical power systems." IEEE Transactions on Power
Systems 32, no. 1 (2017): 62-74.
[86] Khwaja, A. S., M. Naeem, A. Anpalagan, A. Venetsanopoulos, and B. Venkatesh. “Im-
proved short-term load forecasting using bagged neural networks." Electric Power Systems
Research 125 (2015): 109-115.
[87] Hafeez, Ghulam, Nadeem Javaid, Sohail Iqbal, and Farman Khan. “Optimal residential
load scheduling under utility and rooftop photovoltaic units." Energies 11, no. 3 (2018):
611.
[88] Hafeez, Ghulam, Nadeem Javaid, Muhammad Riaz, and Zafar Iqbal. “An innovative model based on FCRBM for load forecasting in the smart grid." In International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, pp. 276-290. Springer, Cham, 2019.
[89] Kwak, Nojun, and Chong-Ho Choi. "Input feature selection for classification problems." IEEE Transactions on Neural Networks 13, no. 1 (2002): 143-159.
[90] Latham, Peter E., and Sheila Nirenberg. “Synergy, redundancy, and independence in pop-
ulation codes, revisited." Journal of Neuroscience 25, no. 21 (2005): 5195-5206.
[91] Peng, Hanchuan, Fuhui Long, and Chris Ding. “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy." IEEE Transactions on Pattern Analysis and Machine Intelligence 27, no. 8 (2005): 1226-1238.
[92] Estévez, Pablo A., Michel Tesmer, Claudio A. Perez, and Jacek M. Zurada. “Normalized
mutual information feature selection." IEEE Transactions on Neural Networks 20, no. 2
(2009): 189-201.
[93] Amjady, Nima, and Farshid Keynia. “A new prediction strategy for price spike forecasting
of day-ahead electricity markets." Applied Soft Computing 11, no. 6 (2011): 4246-4256.
[94] Engelbrecht, Andries P. Computational Intelligence: An Introduction. 2nd ed. New York: John Wiley & Sons, 2007.
[95] Anderson, Charles W., Erik A. Stolz, and Sanyogita Shamsunder. “Multivariate autore-
gressive models for classification of spontaneous electroencephalographic signals during
mental tasks." IEEE Transactions on Biomedical Engineering 45, no. 3 (1998): 277-286.
[96] Bao, Zongfan, Yongquan Zhou, Liangliang Li, and Mingzhi Ma. “A hybrid global opti-
mization algorithm based on wind driven optimization and differential evolution." Mathe-
matical Problems in Engineering 2015 (2015): 620-635.
[97] Man, Kim-Fung, Kit-Sang Tang, and Sam Kwong. “Genetic algorithms: concepts and applications [in engineering design]." IEEE Transactions on Industrial Electronics 43, no. 5 (1996): 519-534.
[98] Mnih, Volodymyr, Hugo Larochelle, and Geoffrey E. Hinton. “Conditional restricted Boltzmann machines for structured output prediction." In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 2011.
[99] Hinton, Geoffrey E. "A practical guide to training restricted Boltzmann machines." In
Neural networks: Tricks of the trade, pp. 599-619. Springer, Berlin, Heidelberg, 2012.
[100] Taylor, Graham W., Geoffrey E. Hinton, and Sam T. Roweis. “Two distributed-state models for generating high-dimensional time series." Journal of Machine Learning Research 12 (2011): 1025-1068.
[101] Mocanu, Elena, Phuong H. Nguyen, Madeleine Gibescu, Emil Mahler Larsen, and Pierre Pinson. "Demand forecasting at low aggregation levels using factored conditional restricted Boltzmann machine." In 2016 Power Systems Computation Conference (PSCC), pp. 1-7. IEEE, 2016.
[102] Introduction to various types of gradient descent. https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/ (last accessed: August 13, 2019).
[103] Mocanu, Decebal Constantin, Haitham Bou Ammar, Dietwig Lowet, Kurt Driessens, Antonio Liotta, Gerhard Weiss, and Karl Tuyls. "Factored four way conditional restricted Boltzmann machines for activity recognition." Pattern Recognition Letters 66 (2015): 100-108.
Tentative Timetable
Sr. No.  Activity                                           Date
1        Background study and detailed literature review    Completed
2        Formulation of the problem and proposed solution   August
3        Analysis and dissemination of results              November
4        Thesis writing                                      December
PART II
Recommendation by the Research Supervisor
Name_________________________Signature_____________________Date________
Recommendation by the Research Co-Supervisor
Name_________________________Signature_____________________Date________
Signed by Supervisory Committee
S.# Name of Committee member Designation Signature & Date
1
2
3
4
Approved by Departmental Advisory Committee
Certified that the synopsis has been seen by the members of the DAC, who consider it suitable for placement before the BASAR.
Secretary
Departmental Advisory Committee
Name: _____________________________
Signature: _____________________________
Date: _____________________________
Chairman/HoD: ____________________________
Signature: _____________________________
Date: _____________________________
PART III
Dean, Faculty of Engineering
_____________________Approved for placement before BASAR.
_____________________Not approved on the basis of the following reasons:
Signature_____________________Date________
Secretary BASAR
_____________________Approved for placement before BASAR.
_____________________Not approved on the basis of the following reasons:
Signature_____________________Date________
Dean, Faculty of Engineering
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Signature_____________________Date________
Please provide the list of courses studied
1. Power Transmission and Distribution
2. Advanced Power System Analysis
3. Special Topics in Computer Networks
4. Smart Grid System Operation
5. Advanced Topics in Optical Communication
6. Antennas Theory Design and Applications