Data Analytics based Efficient Energy Management in Smart Grids
Dr. Manzoor Ilahi Tamimi,
Dr. Nadeem Javaid
Communication over Sensors (ComSens) Research Lab
Department of Computer Science, COMSATS University Islamabad
Islamabad - Pakistan
List of Publications
1. Sana Mujeeb and Nadeem Javaid, “ESAENARX and DE-RELM: Novel Schemes for Big
Data Predictive Analytics of Electricity Load and Price”, Sustainable Cities and Society,
Volume: 51, Article Number: 101642, Pages: 1-16, Published: November 2019, ISSN:
2210-6707. (IF= 4.624, Q1)
2. Sana Mujeeb, Turki Ali Alghamdi, Sameeh Ullah, Ayesha Fatima, Nadeem Javaid and
Tanzila Saba, “Exploiting Deep Learning for Wind Power Forecasting based on Big Data
Analytics”, Applied Sciences, Volume: 9, Issue: 20, Article Number: 4417, Pages: 1-19,
Published: October 2019, ISSN: 2076-3417. (IF=2.217, Q2).
3. Sana Mujeeb, Nadeem Javaid, Manzoor Ilahi, Zahid Wadud, Farruh Ishmanov and Muham-
mad Khalil Afzal, “Deep Long Short-Term Memory: A New Price and Load Forecasting
Scheme for Big Data in Smart Cities”, Sustainability, Volume: 11, Issue: 4, Article Num-
ber: 987, Pages: 1-29, Published: February 2019, ISSN: 2071-1050. (IF=2.592, Q2)
Conference Proceedings/Book Chapters
1. Sana Mujeeb, Nadeem Javaid and Sakeena Javaid, “Data Analytics for Price Forecasting
in Smart Grids: A Survey”, in the 21st International Multitopic Conference (INMIC),
2018, pp: 1-10.
2. Sana Mujeeb, Nadeem Javaid, Sakeena Javaid, Asma Raﬁque and Manzoor Ilahi, “Big
Data Analytics for Load Forecasting in Smart Grids: A Survey”, International Conference
on Cyber Security and Computer Science (ICONCS), 2018, pp: 193-202.
3. Sana Mujeeb, Nadeem Javaid, Rabiya Khalid, Orooj Nazeer, Isra Shaﬁ and Mahnoor
Khan, “Big Data Analytics for Price and Load Forecasting in Smart Grids”, in 13th Inter-
national Conference on Broadband and Wireless Computing, Communication and Appli-
cations (BWCCA), 2018, pp: 77-87, ISBN: 978-3-030-02613-4.
COMSATS University Islamabad, Islamabad Campus
Synopsis For the Degree of M.S/MPhil.
Name of Student Sana Mujeeb
Department Department of Computer Science
Sp15-pcs-003 Date of Thesis Registration 01-07-2016
(i) Research Supervisor
(i) Dr. Manzoor Ilahi
(ii) Dr. Nadeem Javaid
Research Area Data Science
Members of Supervisory Committee
1 Dr. Sohail Asghar
2 Dr. Nadeem Javaid
3 Dr. Manzoor Ilahi
4 Dr. Majid Iqbal
Title of Research Proposal Data Analytics based Efficient Energy Management in Smart Grids
Signature of Student:
Summary of the Research
With the advent of the Smart Grid (SG), the data collected by smart meters and Phasor Measurement
Units (PMUs) has become a valuable source for grid operators and researchers to perform advanced
analytics. In the SG, energy-related data is collected in very large volume, at high velocity and
from a variety of sources. Data analytics provides solutions to the emerging challenges of power
systems, such as Demand Side Management (DSM), environmental pollution (due to carbon emission),
mitigation of fossil fuel dependency, reliable incorporation of Renewable Energy Sources (RESs),
cost curtailment, and grid stability and security. Global energy demand is rising with the growing
population, while the conventional power generation source, i.e., fossil fuel, is continuously
being depleted. Moreover, environmental pollution is increasing at an alarming rate due to the
carbon emission of conventional power generation sources. Therefore, effective DSM and RES
incorporation have become important to maintain the demand-supply balance and to optimize energy
use in an environment-friendly manner. DSM programs are based on predictions of future energy
consumption and price. The reliable incorporation of RESs, in turn, is possible only if future
generation is estimated correctly. For this purpose, Deep Learning (DL) combined with data
analytics techniques is proposed in this research. The aim of this research is to explore SG
databases and devise solutions to the aforementioned problems. First, predictive modeling is used
to learn the consumption pattern from the data, in order to ensure an uninterrupted power supply.
Predictive analytics is also performed on the energy price, which is beneficial in formulating
effective DSM programs. Moreover, wind power, a popular RES, is analyzed and predicted. A DSM
algorithm is proposed that considers the day-ahead energy price, consumption and wind power
forecasts for energy demand management. This research applies data science techniques to smart
grid data and elaborates the benefits of this emerging data for the smart grid.
Smart Grid (SG) is a modern and intelligent power grid that efficiently manages the generation,
distribution and consumption of electricity. It introduces communication, sensing and control
technologies into power grids and serves consumers in an economical, reliable, sustainable and
secure manner. Consumers can manage their energy demand economically through Demand Side
Management (DSM). A DSM program allows customers to manage their load demand according to price
variations. It offers energy consumers load shifting and energy preservation as ways to reduce
the cost of power consumption. The smart grid establishes an interactive environment between
energy consumers and the utility: customers take part in smart grid operations and reduce their
cost through load shifting and energy preservation.
Competitive electricity markets benefit from load and price forecasts. Several important
operating decisions are based on load forecasts, such as power generation scheduling,
demand-supply management, maintenance planning and reliability analysis. Price forecasts are
crucial to energy market participants for formulating bidding strategies, allocating assets,
assessing risk and planning facility investment. Effective bidding strategies help market
participants maximize profit. Utility maximization is the ultimate goal of both power producers
and consumers: with a robust and accurate price estimate, power producers can maximize profit and
consumers can minimize the cost of their purchased electricity. Efficient generation and
consumption is another crucial issue in the energy sector. Most of the generated electricity
cannot be stored, so an equilibrium must be maintained between generated and consumed
electricity. An accurate forecast of both electricity load and price is therefore of great
importance in market operations management.
Electricity load and price have a relationship of direct proportionality. However, some unex-
pected variations are observed in the price data. There are various reasons for these unexpected
changes in price pattern. In reality, the price is not only affected by the change in load. There
are several different parameters that inﬂuence the energy price, such as, fuel price, availabil-
ity of inexpensive generation sources (e.g., photovoltaic generation, windmill generation, etc.),
weather conditions, etc.
Energy data exhibits a few characteristics: (i) data as an energy: data analytics should cause
energy saving, (ii) data as an exchange: energy data should be exchanged and integrated with
other sources of data to identify its value, (iii) data as an empathy: data analytics should help in
improving the service quality of energy utilities .
To make decisions regarding energy market operation, predictive analytics is performed on load
and price data. To maintain the demand and supply balance, an accurate prediction of load is
essential, whereas price forecasting plays an important role in the bidding process and energy
trading. Data analytics enables the identification of hidden patterns, consumer preferences,
market trends and other valuable information that helps the utility company make strategic
business decisions. The size of real-world historical smart grid data is very large
. Authors survey smart grid data with great detail in . This large volume of data enables
energy utilities to make novel analysis leading to major improvements in the market operation’s
planning and management. Utilities can have a better understanding of consumer’s behavior,
demand, consumption, power failures, downtimes, etc.
The smart grid provides energy in an efficient, secure, reliable, economical and
environment-friendly manner. RESs are integrated to reduce carbon emission. The smart grid allows
two-way communication between consumers and the utility. With the emergence of smart metering
infrastructure, consumers are informed about the price per unit in advance. They can adjust their
load economically according to the price signals and reduce their consumption cost by shifting
load to a low-price hour. Smart grids create a price-responsive environment in which the price
varies with changes in demand and vice versa.
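The effect of shifting load to a low-price hour can be illustrated with a small calculation. The sketch below is only an illustration: the price signal, the household schedule and the helper names `total_cost` and `shift_load` are hypothetical and do not come from this synopsis.

```python
# Hypothetical day-ahead price signal (currency units per kWh) for four hours.
price = {"17:00": 0.30, "18:00": 0.35, "23:00": 0.10, "00:00": 0.08}


def total_cost(schedule, price):
    """Cost of a schedule given as {hour: energy in kWh}."""
    return sum(price[hour] * kwh for hour, kwh in schedule.items())


def shift_load(schedule, price, kwh, from_hour):
    """Move `kwh` of shiftable load from `from_hour` to the cheapest hour."""
    cheapest = min(price, key=price.get)
    shifted = dict(schedule)
    shifted[from_hour] -= kwh
    shifted[cheapest] = shifted.get(cheapest, 0.0) + kwh
    return shifted


# Assumed household schedule: most of the load falls in the expensive evening hour.
original = {"17:00": 1.0, "18:00": 3.0, "23:00": 0.5, "00:00": 0.5}
shifted = shift_load(original, price, kwh=2.0, from_hour="18:00")

print(round(total_cost(original, price), 2))  # cost before shifting
print(round(total_cost(shifted, price), 2))   # cost after shifting
```

The total energy consumed is unchanged; only its timing moves, which is why an accurate day-ahead price forecast is a prerequisite for this kind of scheduling.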
In unidirectional grids, there is a one-way interaction from generation side to consumers that
leads to inefﬁcient energy management.
Abbreviation | Meaning | Symbol | Meaning
ABC | Artificial Bee Colony | a | Activation value of a neuron
AEMO | Australian Energy Market Operator | ρ | Average activation of sparse parameter
ANN | Artificial Neural Networks | b | Bias
ARIMA | Auto-Regressive Integrated Moving Average | C | Cost function of SAE
CNN | Convolution Neural Networks | δ | Delta function
DNN | Deep Neural Networks | b | Deviation matrix
DSM | Demand Side Management | ε_t | Error term of NARX
DE | Differential Evolution | x | Input vector to network
DWT | Discrete Wavelet Transform | α | Learning rate
ELM | Extreme Learning Machine | β | Momentum to network
ISONE | Independent System Operator New England | y | Network output or forecast value
LSSVM | Least Square Support Vector Machine | n | Number of delays in NARX
LSTM | Long Short-Term Memory | h | Quadrature mirror filter
MAE | Mean Absolute Error | c_{j,k} | Scale coefficient
NARX | Nonlinear Autoregressive Network with Exogenous inputs | |
NRMSE | Normalized Root Mean Square Error | E | Squared error
NYISO | New York ISO | λ | Threshold for wavelet denoising
RNN | Recurrent Neural Network | t | Time step
SAE | Sparse Auto-Encoders | d_{j,k} | Wavelet coefficient
STLF | Short-Term Load Forecast | ω | Wavelet decomposed signal
WNN | Wavelet Neural Network | W | Weight of network connection
WPT | Wavelet Packet Transform | S_y | White noise
Price and demand forecasting play an important role in energy systems planning, market design,
security of supply, and operation planning for future power consumption, so forecast accuracy
matters. A 1% reduction in the Mean Absolute Percentage Error (MAPE) of the load forecast reduces
the generation cost by 0.1% to 0.3%. A 0.1% reduction in generation cost amounts to approximately
$1 million annually in a large-scale smart grid. Because an accurate forecast of electricity price
and load is so important, researchers are still competing to improve forecast accuracy.
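For reference, the error measures named in this synopsis (MAPE, MAE and NRMSE) can be computed as in the following minimal sketch. The sample values are made up for illustration, and normalizing the RMSE by the range of the actual values is one common convention assumed here; the synopsis does not state which normalization is used.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (assumes no actual value is zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def nrmse(actual, forecast):
    """RMSE divided by the range of the actual values (assumed convention)."""
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
    return rmse / (max(actual) - min(actual))


# Made-up hourly load values (MW) and their forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(mae(actual, forecast), mape(actual, forecast), nrmse(actual, forecast))
```

MAPE is scale-free, which is why it is the measure usually quoted when relating forecast error to generation cost.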
Due to the continuous depletion of fossil fuel, the energy crisis has become critical [7,8]. To
mitigate the energy crisis, regulatory acts that encourage the use of renewable energy are
promoted worldwide. Wind power has recently attracted a lot of attention as an RES. It has
gained popularity due to its wide availability, low investment cost and lack of carbon
emission. Wind power helps in reducing environmental pollution and is being introduced worldwide
as a way to reduce greenhouse gas emission. Moreover, wind power generation leads to fuel cost
savings, as wind has zero fuel cost. According to the Global Wind Energy Council, the cumulative
capacity of wind power reached 486 GW across the global market in 2016. Wind power is expected
to expand significantly, leading towards an overall zero-emission power system. The U.S.
Department of Energy's renewable integration target is to provide 20% of total energy through
wind by the year 2030. In this regard, the Independent System Operators (ISOs) are producing
significant wind power and increasing their wind generation.
Wind power is strongly affected by meteorological conditions, especially wind speed. It exhibits
strongly volatile and intermittent behavior, resulting in uncertain power output. This
uncertainty significantly affects the quality of power system operations such as distribution,
dispatching and peak load management. The greatest challenge in adopting wind power on a large
scale is controlling its uncertain output. An effective solution to this issue is an accurate
estimate of future wind power. Accurate Wind Power Forecasting (WPF) helps in improving the
operation scheduling of power systems: the operating schedules of backup generators and storage
systems are optimized on the basis of the forecast. The accuracy of WPF determines the amount of
cost curtailment in power generation. A 1% improvement in WPF accuracy results in a 0.06%
reduction in the generation system's cost, which is approximately a $6 million cost saving in a
large-scale power system with 30% wind penetration.
It is acknowledged widely that accurate WPF signiﬁcantly reduces the risks of incorporating
wind power in power supply systems . Generally, the WPF results are in the determinis-
tic form (i.e., point forecast). Reducing the forecasting errors of WPF is the focus of many
researchers. A point forecast is the estimated value of future wind energy. However, wind
power is a random variable with a Probability Density Function (PDF), and point forecasts are
unable to capture the uncertainty of this random variable. Because of this limitation, point
forecasts have limited use in the stability and security analysis of
power systems. To overcome the limitation of point forecasts, deep learning methods are widely
used in the ﬁeld of WPF. Deep Neural Networks (DNN) have the inherent property of automatic
modeling of the wind power characteristics .
Using large data for predictive analytics improves the forecasting accuracy. Electricity data
is big data, as smart meters record data at short time intervals. In a large smart grid,
approximately 220 million smart meter measurements are recorded daily. As the volume of input
data grows, training classical forecasting methods becomes difficult. Processing big data with
classifier-based models is very difficult because of their high space and time complexity. DNNs,
on the other hand, perform very well on big data. DNNs have an excellent ability for self-learning
and nonlinear approximation. They limit memory use by dividing the training data into
mini-batches; the network is then trained on the whole data set batch by batch.
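The mini-batch idea can be sketched in a few lines. This is an illustrative sketch only: the function name `mini_batches`, the batch size and the integer stand-in for meter readings are assumptions, and the actual training step is left out.

```python
import random


def mini_batches(samples, batch_size, seed=0):
    """Yield shuffled mini-batches that together cover every sample exactly once."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


# Stand-in for a (much larger) set of smart meter readings.
readings = list(range(10))
batches = list(mini_batches(readings, batch_size=4))

for batch in batches:
    # A real model would run one gradient update per batch here.
    print(len(batch), batch)
```

Only one batch has to be held in memory for each update, so memory use depends on the batch size rather than on the size of the whole data set.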
2 Research Objectives
The major objective of this research is efficient, economical and environment-friendly energy
management using data analytics techniques. It includes reducing environmental pollution,
saving energy generation and consumption cost, mitigating dependency on fossil fuels for
generation, and minimizing energy losses, whether they are technical losses (due to
over-generation) or non-technical losses (caused by electricity theft). The aims of this research
work are:
1. To utilize past data of power systems and deep learning techniques for accurately forecasting
the electricity consumption and price.
2. To forecast wind power generation accurately in order to ensure its reliable integration in
power systems to make them eco-friendly.
3. To detect electricity theft by analyzing past consumption data for minimizing the non-
technical loss of power.
4. To forecast photovoltaic power and quantify RESs’ impact on carbon emissions, energy
price and cost.
3 Related Work
The imbalance between energy demand and supply causes energy scarcity. To reduce the
scarcity and utilize energy efﬁciently, DSM and Supply Side Management (SSM) techniques are
proposed. Mostly, researchers focus on appliance scheduling to reduce the load on utility and
balance supply and load. However, with the appliance scheduling, the user comfort is compro-
mised  - . Therefore, Short-Term Load Forecasting (STLF) is important. STLF enables
the utility to generate sufﬁcient electricity to meet the demand.
Several forecasting methods are available in the literature, from classic statistical to modern
machine learning methods. Generally, forecasting models can be grouped into three major
categories: classical, artificial intelligence and data-driven. Classical methods are the
statistical and mathematical ones, such as Auto-Regressive Integrated Moving Average (ARIMA),
Seasonal ARIMA (SARIMA), Naive Bayes, Random Forest, etc. Artificial intelligence methods include
Artificial Neural Networks (ANNs), Particle Swarm Optimization (PSO), etc. Classifier-based
approaches are widely used for forecasting, such as Sperm Whale Algorithm (SWA) + Least Square
Support Vector Machine (LSSVM), SVM + PSO, Flexible Wavelet Packet Transform (FWPT) + LSSVM +
Time-Varying Artificial Bee Colony (TVABC), and Differential Evolution (DE) + SVM. Although the
aforementioned methods show reasonable results in load or price forecasting, they are
computationally complex.
The existing forecasting methods mostly forecast only load or only price. A method that can
accurately forecast both load and price together is greatly needed. Conventional forecasting
methods in the literature have to extract the most relevant features with great effort before
forecasting. For feature extraction, correlation analysis or other feature selection techniques
are used. ANNs, in contrast, have the advantage that they automatically extract features from
data and learn complex and meaningful patterns efficiently. However, Shallow ANNs (SANNs) tend to
over-fit, and optimization is required to improve their forecast accuracy.
Recently, DNNs have shown promising results in forecasting of electricity load, price and
generation. In , authors used Restricted Boltzmann Machine (RBM) with pre-training and Rectified
Linear Unit (ReLU) to forecast day and week ahead load. RBM results in a more accurate forecast
than ReLU. Deep Auto-Encoders (DAE) are implemented in the paper  for prediction of a building's
cooling load. DAE is an unsupervised learning method. It learns the pattern of data very well and
predicts with greater accuracy. The authors of  implement Gated Recurrent Units (GRU), a type of
Recurrent Neural Network (RNN), for price forecasting. GRU outperforms Long Short-Term Memory
(LSTM) and several statistical time series forecasting models. Authors of  proposed a hybrid
model for price forecasting that combines two deep learning methods: Convolution Neural Networks
(CNN) extract useful features, and an LSTM forecasting model is trained on the features extracted
by the CNN. This hybrid model performs better than both CNN and LSTM alone and outperforms
several state-of-the-art forecasting models. The good performance of the aforementioned DNN
models demonstrates the effectiveness of deep learning in forecasting. A brief description of
related work is listed in Table 1.
In the smart grid, data analysis helps in finding the trends of electricity consumption and
price. This further enables the utility to design predictive demand-supply maintenance programs,
which ensure the demand-supply balance. Smart grid data is studied for power system anomaly
detection, optimal placement of computing units for communicating data to the smart grid, price
forecasting and consumption forecasting. The aforementioned methods show reasonable results in
load or price forecasting; however, most of them do not consider the forecasting of both load
and price. The classifier-based forecasting methods require extensive feature engineering and
model optimization, resulting in high complexity.
To ensure the reliability, stability and security of the smart grid, accurate forecasts of
electricity load and price are essential. Electricity load and price have a bi-directional
relationship, therefore simultaneous prediction of load and price yields greater accuracy.
In paper , the authors have predicted price and load simultaneously using a multi-stage
forecasting approach. The complex forecasting approach proposed in this work comprises feature
selection and a multi-stage forecast engine. Features are selected through a modified Maximum
Relevancy Minimum Redundancy (MRMR) method. Electricity load and price are forecast using a
multi-block ANN known as Elman Neural Network (ENN). The forecasting model is optimized by a
shark smell optimization method.
Table 1 Related Work
Task | Platform / Scenario | Dataset | Algorithms
Load and price forecast | Hourly data of 6 states of USA | NYISO, 2015 | MRMR, Multi-block Elman ANN, Enhanced shark smell optimization
Load forecast | Historic load | Sichuan Energy Internet Research Center dataset, | Sperm Whale Algorithm, Wavelet Least Square SVM, Wavelet transform, inconsistency rate model
Load forecast | Half hourly consumption | NYISO, NSW, 2007 | Hybrid EMD, PSO, GA, SVR
Price and load forecast, DSM | Historic price and load | Hourly load and price of (i) NYC, (ii) PJM, (iii) NYC, 2010, 2013, 2014 | FWPT, NLSSVM, ARIMA,
- | 6 second resolution consumption of 5 homes with 109 domestic appliances | UK-Dale, 2012–2015 | Association rule mining, Incremental k-means clustering,
Price forecasting | Hourly price of 5 hubs of MISO | USA, 2012–2014 | Stacked Denoising Autoencoders
- | Aggregated hourly load of four regions | Los Angeles, California, Florida, New York City, USA, August 2015–2016 | -
- | Electricity market data of 3 grids: FE, DAYTOWN, and EKPC | PJM, USA, 2015 | Mutual Information (MI),
- | Electricity market data of 2 grids: DAYTOWN, and EKPC | PJM, USA, 2015 | Modified MI + ANN
Price forecasting | Half hourly price of PJM | Intercontinental Exchange | -
Price forecasting | Turkish day-ahead market electricity | Turkey, 2013–2016 | RNN
Cooling load forecasting | Cooling load of an educational | Hong Kong, 2015 | Elastic Net, SAE, RF, MLR, Gradient Boosting (GB) Machines, Extreme GB tree, SVR
- | Hourly load of Korea Electric | South Korea, 2012–2014 | RBM
- | Individual house consumption of 7 km of Paris | Individual household electric power consumption, | Conditional RBM (CRBM),
Load forecasting | 15 minute resolution of one retail | Fremont, CA | SAE, Extreme Learning Machine
Load forecasting | 15 minutes cooling consumption of a commercial building in Shenzhen | South China, 2015 | Empirical Mode Decomposition (EMD), Deep Belief Network
Load forecasting | Hourly consumption from Macedonian Transmission Network Operator | Republic of Macedonia, | -
Load forecasting | Hourly consumption from Australia | AEMO, 2013 | EMD, DBN
- | Hourly consumption of a public safety building, Salt Lake City, Utah. Aggregated hourly consumption of residential buildings, Austin, | USA, 2015, 2016 | LSTM
- | Half hourly metropolitan electricity | France, 2008–2016 | LSTM, GA
Load forecasting | Hourly aggregated consumption of 6 states of USA | ISO NE, 2003–2016 | Xgboost weighted k-means,
Load forecasting | Ireland consumption | Smart meter database of load profile, Ireland | Pooling deep RNN
Load forecasting | Eight buildings of a public university | 15 min consumption, | K-means clustering, Davies–Bouldin distance function
Peak demand forecast | Entertainment venues of Ontario | Daily, hourly and 15 min | -
- | 21 zones of USA | Temperature, humidity and consumption data, | Recency effect model without
Wind power forecasting | 5-min intervals past wind power | SIWF wind farm, China, | Wavelet transform, Ensemble
Wind power forecasting | Wind speed, wind direction, temperature, humidity, pressure | MADE wind farm, ITER, Tenerife Island, Spain | Feed Forward ANN, SELU
Wind power forecasting | Wind power, weather forecasts | 5 Wind farms of Europe | Mutual Information, Deep autoencoders, Deep Belief Network
∗Papers have forecasting horizon of medium-term, rest are short-term.
This method results in a reasonable forecasting accuracy. However, it is computationally very
expensive. The feature engineering process and the optimization of ENN increase complexity.
Authors of paper  have conducted a predictive analysis of electricity price forecasting taking
advantage of big data. The relevant features for training the prediction model are selected
through an extensive feature engineering process with three steps. Firstly, correlated features
are selected using GCA. Secondly, a hybrid of two feature selection methods, ReliefF and Random
Forest (RF), is used for further feature selection. Lastly, Kernel PCA (KPCA) is applied for
dimension reduction. Price is predicted by SVM, whose hyper-parameters are optimized through
modified DE. Although this framework achieves acceptable accuracy in price forecasting, the
extensive feature engineering and model optimization increase the computational complexity. In
paper , the authors forecast the energy consumption on big data. An analysis of frequent patterns
is performed using a supervised clustering method. Energy consumption is forecast by a Bayesian
network. Authors of paper  have utilized the computational power of deep learning for
Electricity Price Forecasting (EPF). SDA and RANSAC-SDA (RS-SDA) models are implemented for online
and day-ahead hourly EPF. Three years of data (i.e., January 2012 – November 2014) are utilized in
this research. Data is collected from the Texas, Arkansas, Nebraska, Indiana and Louisiana ISO
hubs in the USA. Comprehensive analyses of the capabilities of the RS-SDA and SDA models in EPF
are performed. The effectiveness of the proposed models is validated through comparative analyses
with classical ANN, SVM, and multivariate adaptive regression splines. Both the SDA and RS-SDA
models are able to accurately predict electricity price with a considerably lower MAPE than the
aforementioned models.
A deep learning model for STLF is proposed by Tong et al. The features are extracted using SDA
from the historical electricity load and corresponding temperature data. A Support Vector
Regressor (SVR) model is trained for the day-ahead STLF. The SDA effectively extracts abstract
features from the data, and the SVR model trained on these extracted features forecasts
electricity load with low errors. The proposed model outperforms simple SVR and ANN in terms of
forecasting accuracy, which validates its performance. SANN is utilized for electricity load
forecasting in  and . SANNs have the problem of over-fitting. To avoid over-fitting,
hyper-parameter optimization is required, which increases the complexity of the forecasting
model. A hybrid deep learning method is applied to forecast price in . Two deep learning methods
are combined in this research work: features are extracted by CNN and the short-term energy price
is predicted using LSTM. Half-hourly price data of PJM 2017 is used for prediction. The price of
the previous 24 hours is used to predict the electricity price of the next hour. The hybrid DNN
structure has 10 hidden layers: 2 convolution layers, 2 max-pooling layers, 3 ReLU layers, 1 batch
normalization layer, 1 LSTM layer for prediction and, as the last hidden layer, a fully connected
layer. The CNN feature extractor has 7 hidden layers and the LSTM predictor has 3 hidden layers.
The output of the 7th hidden layer of the CNN feature extractor becomes the input of the LSTM
predictor. The proposed method outperforms simple CNN, LSTM and various machine learning methods.
Authors of  have utilized the GRU in RNN for EPF.
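Using a fixed window of past prices to predict the next value amounts to turning the price series into (input, target) pairs with a sliding window. The following is a minimal sketch of that preparation step only; the function name `make_windows` and the synthetic series are assumptions, and `lag` is counted in samples (for half-hourly data, 24 hours would correspond to 48 samples rather than 24).

```python
def make_windows(series, lag=24, horizon=1):
    """Build supervised pairs from a time series.

    Each input holds `lag` consecutive past values; each target is the value
    `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for end in range(lag, len(series) - horizon + 1):
        inputs.append(series[end - lag:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Synthetic stand-in for a price series: 30 consecutive samples.
prices = [float(h) for h in range(30)]
X, y = make_windows(prices, lag=24, horizon=1)

print(len(X), X[0][:3], y[0])
```

Each row of `X` would be fed to the feature extractor, and the corresponding entry of `y` is the value the predictor is trained to reproduce.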
Recently, deep learning forecasting methods have shown good performance in electricity price and
load forecasting. However, the interdependency of load and price is not considered in these DNN
forecasting models. Deep learning is an effective technique for big data analytics. With high
computation power and the ability to model huge data, DNNs give deeper insights into data. In ,
authors perform a comprehensive and detailed survey on the importance of deep learning techniques
in the area of big data analytics.
4 Problem statement
To manage electricity efﬁciently, its wastage should be minimized in a reliable, economic and
environment-friendly manner. The electricity is wasted due to over generation and electricity
theft. These two problems can be solved using data analytics. Through accurate electricity de-
mand forecasting, over generation can be avoided. If electricity theft is detected efﬁciently, then
the loss can be compensated. Efﬁcient DSM results in cost effective and environment-friendly
energy management. Therefore, accurate electricity demand forecasting, price forecasting, gen-
eration forecasting and theft detection lead to an efﬁcient power system.
Authors of  perform predictive analytics of electricity price using a hybrid framework. The
extensive feature engineering process increases the computational complexity. The Differential
Evolution based optimized SVM tends to over-fit, which results in low forecasting accuracy.
To avoid the extensive feature engineering process, deep learning forecasting methods are
proposed. Kong et al.  present a DNN, namely LSTM, as an individual and aggregated residential
load forecasting (LF) model. The LSTM's weights and biases are randomly initialized; therefore,
it suffers from a slow learning rate, over-fitting and a high error rate. In , authors have
presented a novel prior and posterior probability based Bayesian DNN (BDNN) for residential net
LF. It suffers from a few major limitations: hand-crafting of the input features, which requires
domain knowledge; high sensitivity of the model to the initial prior, which is difficult to
select; and parameter optimization through Grid Search (GS), which does not guarantee the
selection of the most appropriate parameter values. These limitations negatively affect the
accuracy. Ye et al. in  propose a temporal and Spatial LF (SLF) model using a Sparse Autoencoder
(SAE) based feature extractor and a Softmax forecaster. The training time of the SLF is very
high, ranging from 22 minutes to 62 minutes, and the error rate is very high, which indicates the
inefficiency of the forecasting model*. In , authors simultaneously predict load and price using
a multi-stage forecasting approach. It is computationally very expensive: the feature engineering
process and the optimization of the Elman Neural Network increase complexity. Incorporating the
inherent bi-directional relation of electricity load and price in the prediction model's inputs
results in higher prediction accuracy than separate forecasting; however, the correlation of
electricity load and price is not taken into consideration in .
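Grid search evaluates only a fixed lattice of candidate values, whereas differential evolution (DE) searches a continuous range by recombining a population of candidates. The sketch below shows the common DE/rand/1/bin variant under stated assumptions: the objective `toy_validation_error` is a made-up quadratic standing in for a model's validation error, and the bounds, population size and control parameters `F` and `CR` are illustrative choices, not the settings of the models proposed in this synopsis.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=150,
                           F=0.6, CR=0.9, seed=0):
    """Minimise `objective` over the box `bounds` with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one mutated component
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or d == j_rand:
                    value = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            # Selection: keep the trial only if it is at least as good.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at (3.0, 0.5)."""
    c, gamma = params
    return (c - 3.0) ** 2 + (gamma - 0.5) ** 2


best_params, best_error = differential_evolution(
    toy_validation_error, bounds=[(0.0, 10.0), (0.0, 1.0)])
print(best_params, best_error)
```

In a real setting each call to the objective would train and validate a model, so the number of evaluations (population size times generations) is the main cost to budget for.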
In , authors propose a univariate wind power prediction model that has the problem of low
prediction accuracy because it does not take into account the effect of exogenous variables that
impact the wind power, such as wind speed, wind direction and hour. Authors in  use exogenous
input variables for wind power forecasting; however, inefficient feature engineering results in
low forecasting accuracy.
In , a deep CNN based electricity theft detection model is proposed. The authors do not take
into account the inherent class imbalance of the data during detection, which drastically
decreases the theft detection rate. Authors in  tackle the data imbalance issue and propose an
ensemble model named Random Under Sampling Boosting (RUSBoost) for Electricity Theft Detection
(ETD). The detection accuracy is improved from . However, the false detection rate is high because
the model's parameters are not optimized efficiently. GS is used for parameter optimization, which
is computationally complex and does not guarantee the best parameters; consequently, detection
accuracy is negatively affected.
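The random under-sampling step that gives RUSBoost its name can be sketched as follows. This covers only the class-balancing step, not the boosting ensemble, and the function name `random_under_sample`, the label convention (0 for honest consumers, 1 for theft) and the toy data are assumptions made for illustration.

```python
import random


def random_under_sample(features, labels, majority_label=0, seed=0):
    """Randomly drop majority-class samples until both classes have equal size.

    Assumes the majority class has at least as many samples as the minority.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    kept = sorted(rng.sample(majority, len(minority)) + minority)
    return [features[i] for i in kept], [labels[i] for i in kept]


# Toy imbalanced data: 8 honest consumers (label 0) and 2 theft cases (label 1).
labels = [0] * 8 + [1] * 2
features = [[float(i)] for i in range(10)]

X_bal, y_bal = random_under_sample(features, labels)
print(y_bal)
```

In a boosting scheme this resampling would be repeated in every round, so that each weak learner sees a balanced sample while the minority cases are always retained.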
*The solution of electricity load and price forecasting is discussed in detail in section 5.1. Its results are
discussed in section 7.
*The ANN autoencoders based simultaneous electricity load and price forecasting models are discussed
in sections 5.2 and 5.3. Results of both models are elaborated in section 8.
*The details of enhanced WPF model EDCNN are presented in section 5.4. The simulation results prov-
ing the efﬁciency of the EDCNN are discussed in section 9.
*The enhanced algorithm for electricity theft detection DE-RUSBoost is presented in section 5.5. The
experimental results and analysis are given in the section 10.
5 Proposed System Model
Figure 1 illustrates the proposed system for efficient energy management that minimizes the power system's losses caused by over generation and electricity theft. The first layer of the model shows data acquisition. The raw data is pre-processed in the second layer. In the third layer, the data is processed. The results obtained from the third layer are sent back to the first layer, where power systems make decisions based on these results. In the second and third layers, data analytics are performed in order to obtain useful results that help to improve the operational planning of power systems. There are three modules in the system model. The first module, which contains models 5.1, 5.2 and 5.3 (discussed in sections 5.1, 5.2 and 5.3, respectively), is for predictive analytics of electricity price and load. In the first module, three deep learning based models are proposed, i.e., Deep LSTM (DLSTM) (model 5.1 for univariate data), Efficient Sparse Auto-encoder Nonlinear Autoregressive Network with Exogenous inputs (ESAENARX) (model 5.2 for multivariate data) and Differential Evolution Recurrent ELM (DE-RELM) (model 5.3 for multivariate data). In the second module (model 5.4, discussed in section 5.4), a deep learning based model named Efficient Deep Convolution Neural Network (EDCNN) is proposed for predictive analytics of wind power. In the third module (model 5.5, discussed in section 5.5), an ensemble model named Differential Evolution Random Under Sampling Boosting (DE-RUSBoost) is proposed for electricity theft detection. The proposed model is discussed below.
[Figure 1: Layer 1 (Data Acquisition), Layer 2 (Data Pre-Processing) and Layer 3 (Data Processing), linking the smart grid, smart community and microgrid with RES to load and price forecasting, wind power forecasting and electricity theft detection.]
Figure 1. Schematic diagram of proposed model.
5.1 DLSTM Single Input Single Output (SISO) Model for Electricity Load and Price Forecasting
The proposed method comprises four main parts: preprocessing of data, training the LSTM network, validation of the network and forecasting on the test data . The system model is shown in figure 2. The problem statement of this model is stated in section 4. The steps in the proposed model are listed as follows:
1. The historical price and load vectors are p and l, respectively, which are normalized as:
$$p_{nor} = \frac{p - \mathrm{mean}(p)}{\mathrm{std}(p)} \qquad (1)$$
where p_nor is the vector of normalized price, mean(·) is the function that calculates the average and std(·) is the function that calculates the standard deviation. This normalization is known as zero mean unit variance normalization. Price data is split month-wise. Data is divided into three partitions: train, validate and test.
2. The network is trained on the training data and tested on the validation data. NRMSE is calculated on the validation data.
3. The network is tuned and updated on the actual values of the validation data.
4. The upgraded network is tested on the test data, where day-ahead, week-ahead and month-ahead prices and load are forecast. The forecaster's performance is evaluated by calculating the NRMSE.
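The zero mean unit variance normalization of step 1 can be sketched as follows. This is only an illustration: the function names are hypothetical, and the population standard deviation is assumed because the text does not state which estimator is used.

```python
from statistics import mean, pstdev


def zscore(values):
    """Zero mean unit variance normalization of a price or load vector.

    Assumption: the population standard deviation is used; the source
    does not say whether the sample or population form is intended.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant vector cannot be normalized")
    return [(v - mu) / sigma for v in values], mu, sigma


def inverse_zscore(normalized, mu, sigma):
    """Map normalized values back to the original scale."""
    return [v * sigma + mu for v in normalized]


prices = [10.0, 20.0, 30.0, 40.0]
normalized, mu, sigma = zscore(prices)
restored = inverse_zscore(normalized, mu, sigma)
```

Returning the mean and standard deviation alongside the normalized vector makes it possible to map forecasts back to price or load units later.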
Figure 2. Overview of DLSTM SISO model for electricity load and price forecasting.
5.1.1 Data Preprocessing
Hourly data of regulation market capacity clearing price and system load is acquired from ISO NE. The data spans eight calendar years, from January 2011 to March 2018: seven complete years (2011 to 2017) and the first three months of 2018 (January to March). The data is divided month-wise. For example, the data of January 2011, January 2012, ..., January 2018 are combined; the data of all twelve months is combined in the same fashion. The DLSTM network is trained on the month-wise data. The data is partitioned into three parts: training, validation and test data.
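The month-wise grouping described above might look like the sketch below. The record layout (a list of timestamp and value pairs) and the function name are assumptions made for illustration; they are not taken from the original work.

```python
from collections import defaultdict
from datetime import datetime


def group_by_month(records):
    """Combine hourly records of the same calendar month across all years.

    records: iterable of (timestamp, value) pairs (hypothetical layout).
    Returns a dict mapping month number (1-12) to values in time order.
    """
    groups = defaultdict(list)
    for timestamp, value in sorted(records, key=lambda r: r[0]):
        groups[timestamp.month].append(value)
    return dict(groups)


records = [
    (datetime(2012, 1, 1, 0), 30.0),
    (datetime(2011, 1, 1, 0), 10.0),
    (datetime(2011, 2, 1, 0), 20.0),
]
groups = group_by_month(records)
```

Here `groups[1]` holds all January values from every year in chronological order, which is the series one month-wise network would be trained on.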
5.1.2 Working of DLSTM
The DLSTM network works on the train and update state method [78,79]. At a time step t, the network learns a value of the price or load time series and stores a state. At the next time step, the network learns the next value and updates the state of the previously learned network. All data is learned in the same fashion to train the network. While testing, the last value of the training data is taken as the initial input. One value is predicted at a time step t. This predicted value is then made part of the training data and the network is trained and updated. Every predicted value is made part of the training data to predict the next value. For example, if the network dlstm_n is learned on n values, the nth value is the input to predict the (n+1)th value. After predicting the (n+1)th value, the network dlstm_{n+1} is trained and updated on n+1 values to predict the (n+2)th value. The (n+1)th value is the first value predicted by the initially learned network dlstm_n. To predict m values, the network trains and updates m times. After predicting m values, the last trained and updated network dlstm_{n+m} is trained on n+m values, with the values at positions n, n+1, n+2, ..., n+m−1 having served in turn as inputs. The step by step flow of the proposed method is shown in the flowchart, figure 3.
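The train and update loop can be illustrated with a small sketch. The `LastValuesMean` class is a hypothetical stand-in for the DLSTM (it simply averages recent values); only the control flow, in which each prediction is fed back before the next one is made, reflects the description above.

```python
class LastValuesMean:
    """Hypothetical stand-in for the DLSTM: predicts the mean of recent values."""

    def __init__(self, window=2):
        self.window = window
        self.history = []

    def update(self, value):
        # "Train and update" step: absorb one more value into the state.
        self.history.append(value)

    def predict_next(self):
        recent = self.history[-self.window:]
        return sum(recent) / len(recent)


def walk_forward_forecast(model, train_values, steps):
    """Predict `steps` values, feeding each prediction back into the model."""
    for value in train_values:
        model.update(value)
    predictions = []
    for _ in range(steps):
        predicted = model.predict_next()
        predictions.append(predicted)
        model.update(predicted)  # the predicted value becomes training data
    return predictions


model = LastValuesMean(window=2)
predictions = walk_forward_forecast(model, [1.0, 2.0, 3.0], steps=2)
```

After predicting m values from n training values, the model state has absorbed n+m values, which mirrors the dlstm_{n+m} network in the text.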
5.1.3 Network Training and Forecasting
Training, validation and test data are obtained by preprocessing the data. The price and load data is fed to the DLSTM network for training. The proposed DLSTM has five layers, i.e., an input layer, two LSTM layers, a fully connected layer and the regression output layer. The number of hidden units is 250 in LSTM layer 1 and 200 in LSTM layer 2. The final numbers of hidden units were decided by experimenting with different numbers of hidden units and keeping those with the least forecast error. During the training process, the DLSTM predicts the step-ahead value at every time step. The DLSTM learns the patterns of the data at every time step t and updates the network trained up to the previous time step t−1. Every predicted value is made part of the whole data for the next prediction. In this manner, the network is trained iteratively. The DLSTM network is trained for price and load data separately. The network trained on the training data is the initial network. The initial network is tested on the validation data, where it forecasts step-ahead values. After taking the forecast results from the initial network, the NRMSE is calculated. The initial network re-learns and re-tunes on the actual values of the validation data until the NRMSE reduces to a desired level. The final, tuned network is then used to forecast price and load.
[Figure 3: historical load and price data are normalized, prepared month-wise and split into training (Xt), validation (Xv) and testing (Xs) data; the network is tuned and updated on Xv; load and price are forecast on the test data Xs and the forecasts are printed.]
Figure 3. Flowchart of the DLSTM SISO.
5.2 ESAENARX Multiple Inputs Multiple Outputs (MIMO) Model for Simultaneous Electricity Load and Price Forecasting
ESAENARX is a two-stage predictive model . In the first stage, features are extracted using the proposed efficient feature extractor ESAE. In the second stage, price and load are simultaneously predicted by NARX. The proposed methods are described in detail in the following sections. The problem statement of ESAENARX is given in section 4. The proposed system model is shown in figure 4.
5.2.1 Efﬁcient SAE (ESAE)
The ESAE is proposed to create a better representation of electricity data, that is useful for an accurate
forecast of price and load. In this section, the proposed feature extractor ESAE is discussed in detail.
5.2.2 Pre-training of ESAE
To initialize the weights and biases, unsupervised pre-training is applied, in which the input of each hidden layer is the output of its previous layer. In the pre-training step, the initial biases and weights of the autoencoder are learned.
In the proposed method, the input data X_t is corrupted by introducing white noise . The white noise is added to a randomly selected 30% of the data points. A random process y(t) is known as white noise if its spectral density S_y(f) is constant over all frequencies f:
$$S_y(f) = \frac{N_0}{2} \qquad (2)$$
Figure 4. Overview of ESAENARX MIMO model for simultaneous electricity load and price forecasting.
White noise describes random disturbances with small correlation periods. The generalized correlation function of white noise is defined as:
$$B(t) = \delta(t)\,\sigma^2 \qquad (3)$$
where δ(t) is the delta function and σ is a positive constant.
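The corruption step can be sketched as below. Gaussian noise is assumed as the white noise source, and the noise level `sigma`, the seed handling and the function name are illustrative choices that the text does not specify.

```python
import random


def corrupt_with_white_noise(values, fraction=0.3, sigma=0.1, seed=None):
    """Add zero-mean Gaussian noise to a randomly chosen fraction of points.

    Assumptions: Gaussian white noise with standard deviation `sigma`;
    the 30% fraction follows the text, the noise level does not.
    """
    rng = random.Random(seed)
    n_noisy = round(fraction * len(values))
    noisy_idx = set(rng.sample(range(len(values)), n_noisy))
    corrupted = [
        v + rng.gauss(0.0, sigma) if i in noisy_idx else v
        for i, v in enumerate(values)
    ]
    return corrupted, sorted(noisy_idx)


clean = [0.5] * 100
corrupted, noisy_idx = corrupt_with_white_noise(clean, seed=1)
```

The autoencoder would then be trained to reproduce the clean input from the corrupted one, which is the usual denoising set-up.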
5.2.3 Fine-tuning of ESAE
The fine-tuning step follows the pre-training step. In fine-tuning, wavelet denoising is proposed as the encoding transfer function of the first hidden layer of ESAE. The activation function of the second layer is sigmoid. The wavelet denoising has two steps: (i) wavelet packet decomposition and (ii) reconstruction denoising operation. Firstly, the input time series is decomposed into different frequency bands by passing it through high pass and low pass filters. Then, the frequency band of the noise is set to zero. The signal is then reconstructed using the wavelet reconstruction function, which is the inverse of the wavelet decomposition function . The wavelet decomposition operation can be expressed as:
$$c_{j,k} = \sum_{n} c_{j-1,n}\, h_{n-2k}, \qquad d_{j,k} = \sum_{n} d_{j-1,n}\, g_{n-2k}, \qquad k = 1, 2, \ldots, N-1 \qquad (4)$$
where c_{j,k} is the scale coefficient, d_{j,k} is the wavelet coefficient, and h and g are the quadrature mirror filter banks; j is the level of decomposition and N represents the number of sampling points. The wavelet reconstruction function, which is the inverse of the wavelet decomposition, is expressed as:
The denoising operation is shown by the equation below.
$$\hat{\omega}_{j,k} = \mathrm{sign}(\omega_{j,k})\,\big(|\omega_{j,k}| - T\lambda\big), \qquad |\omega_{j,k}| \geq \lambda$$
where \hat{ω}_{j,k} is the denoised signal, ω_{j,k} is the wavelet transformed signal and λ is the threshold.
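The decompose, threshold and reconstruct sequence can be sketched with a single-level Haar transform. The Haar filters and the plain soft-thresholding rule are assumptions chosen to keep the example short; the wavelet family and the exact threshold rule used in ESAE are not stated in the text.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_decompose(signal):
    """One level of Haar decomposition (signal length must be even)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def soft_threshold(coeffs, lam):
    """Shrink coefficients toward zero; those below lam in magnitude become zero."""
    return [math.copysign(max(abs(c) - lam, 0.0), c) for c in coeffs]


def denoise(signal, lam):
    """Threshold only the high-frequency (detail) band, then reconstruct."""
    approx, detail = haar_decompose(signal)
    return haar_reconstruct(approx, soft_threshold(detail, lam))


smoothed = denoise([1.0, 3.0, 2.0, 2.0], lam=10.0)
unchanged = denoise([1.0, 3.0, 2.0, 2.0], lam=0.0)
```

With a very large threshold every detail coefficient is removed and each pair of samples collapses to its mean; with a zero threshold the signal is reconstructed unchanged.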
In the ESAE feature extractor, the numbers of units in hidden layers one and two are 400 and 300, respectively. The coefficient that controls the layer 2 weight regularization is set to 0.001. The sparsity regularization is 4 and the sparsity proportion is 0.05. The maximum number of epochs is 100. The algorithm for learning the weights is scaled conjugate gradient descent.
5.2.4 Non-linear Autoregressive Network with Exogenous Inputs (NARX)
NARX is an autoregressive recurrent ANN. Its feedback connections enclose several hidden layers of the network while leaving out the input layer. NARX has a memory that is utilized for creating a nonlinear mapping between inputs and outputs. The network learns from the recurrence on the past values of the time series and the past predicted values of the network . For predicting a value y(t), the inputs of the NARX are y(t−1), y(t−2), ..., y(t−d). NARX can be explained by equation 5:
$$\hat{y}(t+1) = f\big(y(t), y(t-1), \ldots, y(t-d), x(t+1), x(t), \ldots, x(t-d)\big) + \varepsilon(t) \qquad (5)$$
where ŷ(t+1) is the network's output at time t, f(·) is the nonlinear mapping function, y(t), y(t−1), ..., y(t−d) are the past observed values, x(t+1), x(t), ..., x(t−d) are the network's inputs, d is the number of delays and ε(t) is the error term. In the proposed NARX, for simultaneous forecasting of price and load, the number of delays is 2. The hidden layers of the network are 10. The training function is
Deep learning is well known for its high precision feature extraction. A sparse autoencoder deep neural network with dropout is proposed to extract useful features. This deep neural network can significantly reduce the adverse effect of overfitting, making the learned features more conducive to identification and forecasting. NARX is applied for prediction. The proposed Multi Input Multi Output (MIMO) model predicts the price and load simultaneously. Features are extracted using ESAE. Then, the NARX network is trained for simultaneous forecasting of price and load. The flowchart is shown in figure 5.
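To make the structure of equation 5 concrete, the sketch below builds the lagged input rows that a NARX network would be trained on. It covers only the data arrangement, not the network itself, and the function name is hypothetical.

```python
def narx_regressors(y, x, d):
    """Build (inputs, target) pairs following equation 5.

    For target y[t+1] the inputs are
    y[t], y[t-1], ..., y[t-d], x[t+1], x[t], ..., x[t-d].
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    rows = []
    for t in range(d, len(y) - 1):
        past_outputs = [y[t - k] for k in range(d + 1)]
        exogenous = [x[t + 1 - k] for k in range(d + 2)]
        rows.append((past_outputs + exogenous, y[t + 1]))
    return rows


rows = narx_regressors(y=[10, 11, 12, 13, 14], x=[0, 1, 2, 3, 4], d=2)
```

With d = 2, as in the proposed model, each row contains three past outputs and four exogenous values.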
[Figure 5: Stage 1 (Feature Extraction): normalization of price and load data, corruption of features with white noise, encoding with SAE; Stage 2 (Prediction): forecasting from the extracted features, followed by de-normalization.]
Figure 5. Step by step flow of ESAENARX MIMO.
The input features are: hour, temperature forecast, wind speed forecast, lagged load and lagged price.
There are two targets: electricity load and price. The prediction process has the following ﬁve steps.
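1. Inputs and targets are normalized using min-max normalization. Suppose an input vector X = [x1, x2, x3, ..., xn], where n is the number of instances in the vector. The min-max normalization is obtained by equation 6:
$$x_{nor} = \frac{x_i - X_{min}}{X_{max} - X_{min}} \qquad (6)$$
where x_i is an instance of X, and X_min and X_max are the minimum and maximum values of X.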
2. The normalized inputs are fed to train the ESAE feature extractor. After the ESAE is trained, the
input features are encoded using this trained ESAE. The output of ESAE is the encoded features.
3. The encoded features are given as input to train NARX network. 80% data is given for training,
15% is used for validation and 5% is used for testing.
4. The price and load are predicted for 168 hours of one week.
5. The predicted values of load and price are de-normalized to obtain actual values. The NARX
accurately predicts the price and load, simultaneously.
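Steps 1, 3 and 5 can be sketched as follows. The helper names are hypothetical, and the split is assumed to be chronological; the text gives the 80/15/5 proportions but not how the samples are assigned.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the bounds needed to invert."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant vector cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def min_max_denormalize(normalized, lo, hi):
    """Inverse min-max: map values in [0, 1] back to the original range."""
    return [v * (hi - lo) + lo for v in normalized]


def split_80_15_5(samples):
    """Assumed chronological split into training, validation and test parts."""
    n = len(samples)
    n_train = n * 80 // 100
    n_val = n * 15 // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


scaled, lo, hi = min_max_normalize([2.0, 4.0, 6.0])
restored = min_max_denormalize(scaled, lo, hi)
train, val, test = split_80_15_5(list(range(100)))
```

Keeping `lo` and `hi` from the training targets is what allows the forecasts to be mapped back to load and price units in the final step.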
5.3 DE-RELM MIMO Model for Simultaneous Electricity Load and Price Forecasting
The third proposed model is also a MIMO model like ESAENARX . Its problem statement is given in section 4. DE-RELM is an efficient method for electricity load and price forecasting. DE-RELM has three stages. In the first stage, the parameters of ELM are optimized by applying the DE algorithm. In the second stage, ELM is trained. The inputs and outputs of ELM are both the input features of load and price. With identical inputs and outputs, ELM acts like an encoder. Once the optimized ELM is trained, the learned weights are set as the initial weights of the RNN that is used for forecasting. The learned weights of ELM are the best representation of the input data. Setting these initial weights helps the RNN to converge fast and forecast accurately. This is the third and final stage of DE-RELM. The number of neurons in the hidden layer of ELM and RNN is kept the same, because the dimensions of the weight vectors have to match in order to use the learned weights of ELM for the RNN. For the prediction of load and price, DE-RELM follows the steps shown in the flowchart, figure 6.
[Figure 6: normalization of price and load data; Stage 1 (ELM optimization): select weights and biases with DE; Stage 2 (Training ELM): train ELM with the same inputs and outputs; Stage 3 (Prediction with DE-RELM): forecasting with the learned weights.]
Figure 6. Flowchart of the DE-RELM MIMO.
1. The inputs and targets are normalized using min-max normalization (as shown in equation 6).
2. The normalized inputs are given to the ELM networks as inputs and outputs. The network is
3. The forecasting error is calculated by MAPE.
4. The DE algorithm is used to optimize the weights and biases of ELM. The objective function of
DE is the minimization of the prediction error.
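$$Obj = \mathrm{minimize}\left[\frac{1}{n}\sum_{i=1}^{n}\left|\frac{X_i^{act} - y_i^{for}}{X_i^{act}}\right| \times 100\right] \qquad (7)$$
where y_i^{for} is the forecast value, X_i^{act} is the value of the actual target and n is the number of forecast values.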
5. When the forecasting error reduces to the desired value, the optimized ELM network is trained.
6. The weights of ELM are set as initial weights of the RNN network.
7. The RNN network predicts the price and load simultaneously.
8. The predicted values are de-normalized by the inverse min-max function, as shown in equation 8.
$$X = \left[x_{for} \times (X_{max} - X_{min})\right] + X_{min} \qquad (8)$$
where x_for is the forecast value, X_max is the maximum value of the actual target and X_min is the minimum value of the actual target.
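The optimization in step 4 can be illustrated with a compact differential evolution loop that minimizes MAPE. This is a generic DE/rand/1/bin sketch and not the authors' implementation: a two-parameter linear model stands in for the ELM, the bounds are arbitrary, and only the population size, iteration count, mutation factor and crossover rate follow the settings reported for DE-RELM in this section.

```python
import random


def mape(actual, forecast):
    """Mean absolute percentage error in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def differential_evolution(objective, bounds, pop_size=50, iterations=100,
                           f=0.5, cr=1.0, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(iterations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or k == forced:
                    gene = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    gene = min(max(gene, lo), hi)  # keep inside the bounds
                else:
                    gene = pop[i][k]
                trial.append(gene)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical stand-in for the ELM: a linear model y = w * x + b.
inputs = list(range(1, 11))
targets = [2.0 * x + 1.0 for x in inputs]


def forecast_error(params):
    w, b = params
    return mape(targets, [w * x + b for x in inputs])


best_params, best_error = differential_evolution(forecast_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

In DE-RELM the parameter vector would hold the ELM weights and biases instead of `w` and `b`, and the loop would stop once the forecasting error reaches the desired value.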
In DE-RELM, the number of neurons in the hidden layer of ELM and RNN is 100. ELM has 1 hidden
layer. The activation function of ELM is sigmoid. DE has 100 iterations, population size is 50, mutation
factor is 0.5 and the crossover rate is 1. The RNN network has 1 hidden layer. The transfer function is
5.4 EDCNN Multiple Inputs Single Output (MISO) Model for Wind Power Forecasting
The proposed method for forecasting wind power generation  and the power management algorithm (as shown in figure 7) are discussed in this section. The problem statement of EDCNN is discussed in section 4. The features and the target (wind power) are normalized using min-max normalization (as shown in equation 6). Three types of inputs are given to the forecasting model: (i) NWP variables: dew point temperature, dry bulb temperature and wind speed, (ii) past lagged values of wind power and (iii) past decomposed wind power. The wavelet decomposition is described in the next section.
[Figure 7: windmill farm data and NWP data are passed through the convolution layers of the EDCNN; the wind power forecast is used to obtain a normally distributed load.]
Figure 7. Overview of EDCNN MISO model for wind power forecasting.
5.4.1 Feature Engineering
The historical wind power signal is decomposed using the Wavelet Packet Transform (WPT). The WPT is a general form of the wavelet decomposition that performs a better signal analysis. WPT was introduced in 1992 by Coifman and Wickerhauser . Unlike the Discrete Wavelet Transform (DWT), the WPT waveforms or packets are interpreted by three different parameters: position and scale (as in the DWT) and, additionally, frequency. For every orthogonal wavelet function, multiple wavelet packets are generated, having different bases. With the help of these bases, the input signal can be encoded in such a way that the global energy of the signal is preserved and the exact signal can be reconstructed. Multiple expansions of an input signal can be achieved using WPT. The most suitable decomposition is selected by calculating the entropy (e.g., Shannon entropy). The minimal representation of the relevant data based on a cost function is calculated in WPT. The benefit of the WPT is its ability to analyze signals at different temporal as well as spatial positions. For a highly nonlinear and oscillating signal like wind power, DWT does not guarantee good results. In WPT, both the approximation and detail coefficients are further decomposed into approximation and detail coefficients as the wavelet tree grows deeper. The wavelet packet decomposition operation can be expressed by equations 9 and 10. For a signal a to be decomposed, two filters of size 2N are applied on a. The corresponding filters are h(n) and g(n).
$$W_{2n}(a) = \sqrt{2}\sum_{k=0}^{2N-1} h(k)\, W_n(2a - k) \qquad (9)$$
$$W_{2n+1}(a) = \sqrt{2}\sum_{k=0}^{2N-1} g(k)\, W_n(2a - k) \qquad (10)$$
where the scaling function is W_0(a) = φ(a) and the wavelet function is W_1(a) = ψ(a).
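The difference between WPT and DWT, namely that both the approximation and the detail branch are split again at every level, can be sketched as follows. Haar filters and an energy-normalized Shannon entropy are assumed purely for illustration; the wavelet family used in the original work is not stated here.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_split(signal):
    """Split a node into low-pass and high-pass children (Haar filters).

    A trailing sample is dropped if the length is odd.
    """
    low = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal) - 1, 2)]
    return low, high


def wavelet_packet_level(signal, depth):
    """Return all 2**depth nodes at the given depth of the packet tree."""
    nodes = [list(signal)]
    for _ in range(depth):
        nodes = [child for node in nodes for child in haar_split(node)]
    return nodes


def shannon_entropy(coeffs):
    """Shannon entropy of the normalized coefficient energies."""
    energy = sum(c * c for c in coeffs)
    if energy == 0:
        return 0.0
    probs = [c * c / energy for c in coeffs if c != 0]
    return -sum(p * math.log(p) for p in probs)


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
nodes = wavelet_packet_level(signal, depth=2)
total_energy = sum(c * c for node in nodes for c in node)
```

A lower entropy means the energy is concentrated in fewer coefficients, which is the sense in which an entropy cost can rank candidate decompositions.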
The past wind power signal is decomposed into 36 signals. The best representation of the input signal is selected through Shannon entropy. After decomposing the past wind signals, the engineered features along with the NWP variables (dew point, dry bulb, wind speed), lagged wind power (w−24, w−25) and hours are input to the proposed forecasting model. The proposed forecasting model is discussed in the next section.
5.4.2 Efﬁcient DCNN
The inputs are given to the EDCNN for predicting day-ahead hourly wind power (24 values). Firstly, the functionality of the conventional CNN is discussed in this section. Secondly, the proposed method EDCNN is explained. The problem statement of the EDCNN is discussed in section 4.
CNN is a computational model of the functionality of the human visual cortex. CNN has an excellent capability of extracting deep underlying features of data. The CNN effectively identifies the spatially local correlations in data through the convolution operation. In the convolution operation, a filter is applied to a block of spatially adjacent neurons and the result is passed through an activation function. This output of the convolution layer becomes the input to the next layer's neurons. Thus, the input to every neuron of a layer is the output of a convolved block of the previous layer. Unlike a fully connected ANN, CNN training is efficient due to the weight sharing scheme. CNN is composed of four alternating layers: (i) convolution layer, (ii) sampling or pooling layer, (iii) batch normalization layer and (iv) fully connected layer. The convolution operation can be explained by equation 11. Suppose X = [x1, x2, x3, ..., xn] is the vector of training samples and C = [c1, c2, c3, ..., cn] is the vector of corresponding targets, where n is the number of training samples. CNN attempts to learn the optimal filter weights and biases that minimize the forecasting error. CNN can be defined as :
where i = [1, 2, ..., n] and m = [1, 2, ..., M]; M is the number of layers to be learned. The filter weights of the mth layer are denoted by w^m, b^m represents the corresponding biases, ⊗ refers to the convolution operation and f(·) is the nonlinear activation function. Y_i^m is the feature map generated by sample X_i at layer m.
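A one-dimensional version of the convolution, activation and pooling sequence can be sketched as below. The kernel values are arbitrary, ReLU is used as the activation, and the sliding product is the cross-correlation form that deep learning libraries commonly call convolution; none of these details are taken from the EDCNN configuration.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide the shared filter weights over the input (valid positions only)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Nonlinear activation applied to the convolution output."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the largest value of each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


feature_map = relu(conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], bias=1.0))
pooled = max_pool1d(feature_map + [1.0], size=2)
edges = relu(conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0]))
```

The same two kernel weights are reused at every position of the input, which is the weight sharing that keeps the number of trainable parameters small.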
In the EDCNN network, there are eleven layers: three convolution layers, three max pooling layers, two batch normalization layers, three ReLU layers, one modified fully connected layer and a modified output layer, the Enhanced Regression Output Layer (EROL). The functionality of two layers is modified in order to improve the forecasting performance of EDCNN. According to the ANN literature, there is no standard way to choose an optimal activation function. However, machine learning methods are well known for their capability to optimize a model or function. On this basis, a modified activation function is employed in a hidden layer. The proposed activation function is an ensemble of three activation functions: hyperbolic tangent, sigmoid and radial base, as shown in equations 12, 13 and 14, respectively. The new activation function is shown in equation 15.
$$TH = \frac{e^{xw} - e^{-xw}}{e^{xw} + e^{-xw}} \qquad (12)$$
$$\sigma = \frac{1}{1 + e^{-xw}} \qquad (13)$$
$$F(x, w) = \frac{TH + \sigma + \varphi}{3} \qquad (15)$$
where xw is the intermediate output of a network layer (the weighted sum of the input) to which the activation is applied to obtain the final output, σ is the sigmoid function and φ is the radial base function. The proposed activation function takes the average of the three aforementioned functions to calculate the results of the corresponding hidden layer.
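The averaged activation can be sketched as follows. The Gaussian form exp(−z²) is assumed for the radial base function, since its exact definition is not recoverable from the text; the function name is hypothetical.

```python
import math


def ensemble_activation(z):
    """Average of tanh, sigmoid and a radial basis function of z = xw.

    Assumption: the radial basis function is the Gaussian exp(-z**2).
    """
    th = math.tanh(z)
    # Numerically stable sigmoid for large negative z.
    if z >= 0:
        sig = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        sig = e / (1.0 + e)
    rbf = math.exp(-z * z)
    return (th + sig + rbf) / 3.0


at_zero = ensemble_activation(0.0)
far_right = ensemble_activation(50.0)
far_left = ensemble_activation(-50.0)
```

Under this assumption the output stays within (−1/3, 1): for large positive inputs it approaches 2/3 and for large negative inputs it approaches −1/3.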
In the proposed output layer EROL, a modified objective function is embedded. The objective is to minimize the absolute percentage error between the forecast values and the actual targets. The objective can be expressed as equation 16:
$$\min \mathrm{Loss}(w, X_i, c_i) = L(w, X_i, c_i) \qquad (16)$$
where L(w, X_i, c_i) is the forecasting error or loss from sample X_i. The loss function is expressed as
$$L(w, X_i, c_i) = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{c_i - \left(\sum_{i=1}^{n} X_i w_i\right)}{c_i}\right| \qquad (17)$$
where (\sum_{i=1}^{n} X_i w_i) is the output of the output layer and c_i is the desired actual target.
After forecasting the wind power, it is used in the DSM algorithm. The day-ahead Locational Marginal Price (LMP), day-ahead demand and forecast wind power are the inputs to the proposed DSM algorithm. The proposed DSM algorithm is applied to the data of a smart grid-connected micro grid. The system description is presented in the next section.
5.4.3 Wind Power Forecasting based Demand Side Management
A smart grid tied micro grid with wind power plants is studied in this research work. For the MG's load management, three parameters are utilized: (i) wind power forecast, (ii) day-ahead demand / load and (iii) day-ahead LMP. The LMP is the price of energy purchased from the SG in case of insufficient generation of wind power. In wind power generation, the following cases are possible:
5.4.3.1 Case 1
The ﬁrst and simplest case is when the generated wind power is equal to the load. There is no gap
between the generation and demanded power. In this case, no energy is required to be purchased from
the SG. MG is self-sufﬁcient.
5.4.3.2 Case 2
The wind power generated in the MG is more than the required power. In this case, the excess power is transmitted to the SG (as shown in equation 18).
$$W - L \rightarrow SG \qquad (18)$$
where W is the wind power, L is the load and the transmission process is denoted by the symbol →. In exchange for this energy, the SG gives the MG a subsidy on the future price of energy purchases.
5.4.3.3 Case 3
Another case is when there is either no wind power or less wind power than the demand. In this case, the MG has to purchase the required power from the SG. If there is a subsidy on the price from the past, the price is reduced; otherwise, the actual price is paid for purchasing energy. Generally, a 10% to 15% concession on the energy price is offered as a subsidy. In this case, the proposed demand management algorithm is applied to achieve the objectives listed below:
- Load factor maximization.
- Consumption cost minimization.
5.4.4 Proposed DSM Algorithm
The wind power is forecast for 24 hours. The first objective is to maximize the load factor for maximum utilization of the power resource. The second objective is to minimize the consumption cost.
$$Obj_1 = \mathrm{maximize}\; LF \qquad (19)$$
$$Obj_2 = \mathrm{minimize}\; C \qquad (20)$$
where LF is the load factor (equation 21) and C is the total consumption cost (equation 22).
Here, ΣL is the sum of the total load, L̄ is the average load, L is the load vector, P is the LMP vector, the unit of LMP is $/MWh and n is the length of the load and LMP vectors.
There are a few constraints on the system. The first constraint is that the demanded load must be equal to the load after applying the DSM scheme. The second constraint is that, after applying the DSM, the consumption cost should be less than the initial cost. The third constraint is that the load factor must increase. The constraints are given in equations 23 to 25:
$$\sum L = \sum L_{new} \qquad (23)$$
$$C < C_{old} \qquad (24)$$
$$LF_{new} > LF \qquad (25)$$
where L is the load before DSM and L_new is the load after applying DSM, C_old is the consumption cost before DSM and C is the cost after DSM, and LF_new is the load factor after DSM. The purpose of the proposed DSM scheme is to bring the consumption as close to the normal distribution curve as possible.
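The two objectives and the three constraints can be checked numerically with a sketch like the one below. The load factor is assumed to be the usual ratio of average load to peak load, and the cost is assumed to be the sum of hourly load times LMP; both definitions, and the function names, are assumptions made for illustration.

```python
def load_factor(load):
    """Assumed definition: average load divided by peak load."""
    return (sum(load) / len(load)) / max(load)


def consumption_cost(load, price):
    """Assumed definition: sum over hours of load times LMP."""
    return sum(l * p for l, p in zip(load, price))


def dsm_constraints_hold(load, new_load, price, tol=1e-9):
    """Check the three constraints for a candidate load profile."""
    same_energy = abs(sum(load) - sum(new_load)) <= tol
    cheaper = consumption_cost(new_load, price) < consumption_cost(load, price)
    flatter = load_factor(new_load) > load_factor(load)
    return same_energy and cheaper and flatter


load = [2.0, 2.0, 8.0]
price = [10.0, 10.0, 30.0]
shifted = [3.0, 3.0, 6.0]
accepted = dsm_constraints_hold(load, shifted, price)
rejected = dsm_constraints_hold(load, load, price)
```

In this toy case, moving two units from the expensive peak hour to the cheaper hours keeps the total demand at 12, lowers the cost from 280 to 240 and raises the load factor from 0.5 to about 0.67.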
Let the input vectors, each containing 24 values, be:
W = wind power forecast,
L = day-ahead demand and
P = day-ahead LMP.
Other variables used are:
C = consumption cost,
S = subsidy,
DWD = demand-wind power difference,
Pnew = new adjusted price and
Lnew = new normally distributed load after applying the DSM scheme.
The proposed algorithm for managing wind power and demand in an economical manner is given below. Manage_Demand(·) is the proposed function for managing demand in an economical manner. This function distributes the load in a normal form: the peak periods are shaved and the valley periods are filled. The resultant load profile achieved by this method approximately follows the normal distribution.
Algorithm 1 Algorithm for Demand Side Management.
Require: Input: [W, L, P]
1: Output: C
2: if W = L then  ▷ Wind power is sufficient to fulfill demand
3:   Pnew = 0  ▷ Wind power is sufficient and has no cost
4:   Lnew = L  ▷ Load is equal to wind power, so load adjustment is not performed
5:   C = Pnew × Lnew  ▷ Calculating consumption cost
6: else if W > L then  ▷ Wind power is greater than demand
7:   W − L → SG  ▷ Excess wind power is transmitted to the SG
8:   S = 0.9  ▷ 10% reduction in price is the subsidy for the next power purchase
9:   Pnew = 0  ▷ Wind power is sufficient and has no cost
10:  Lnew = L  ▷ Load is less than wind power, so load adjustment is not performed
11:  C = Pnew × Lnew  ▷ Calculating the consumption cost
12: else if W ≥ 0 AND W < L then  ▷ Wind power is not sufficient to fulfill the demand
13:  DWD = L − W  ▷ Finding the demand that has to be fulfilled by the SG
14:  Lnew = Manage_Demand(DWD, L)  ▷ Managing demand to distribute it normally
15:  if S = 0.9 then  ▷ If there is a subsidy on the price, the price is adjusted
16:    Pnew = P × S  ▷ 10% reduction on price by subsidy
17:    C = Pnew × Lnew  ▷ Calculating consumption cost
18:  else
19:    Pnew = P  ▷ If there is no subsidy on price, the price remains the same
20:    C = Pnew × Lnew  ▷ Calculating consumption cost
21:  end if
22: end if
Algorithm 2 Function for Load Shifting.
1: Manage_Demand Function
2: Function Lnew = Manage_Demand(DWD, L)
3: µ = mean(DWD)  ▷ Average of demand to be fulfilled by the SG
4: σ = std(DWD)  ▷ Standard deviation of demand to be fulfilled by the SG
5: SD = sum(DWD)  ▷ Sum of demand to be fulfilled by the SG
6: if DWD < µ then  ▷ Checking each value of the demand vector if it is smaller than the mean
7:   L′ = L + σ  ▷ When the value is smaller, add the standard deviation to bring it closer to the mean
8: else if DWD > µ then  ▷ Checking each value of the demand vector if it is greater than the mean
9:   L′ = L − σ  ▷ When the value is larger, subtract the standard deviation to bring it closer to the mean
10: end if
11: SL = sum(L′)  ▷ Taking the sum of all values of the new adjusted load vector
12: d = SL − SD  ▷ Taking the difference of the demanded load and the new adjusted load, and adjusting the new load to be equal to the demanded load
13: if d > 0 then  ▷ Difference greater than zero means the new adjusted load is more than the demanded load
14:   [indx, Count] = L > µ  ▷ Count is the number of values greater than the average and indx are their indices
15:   Lnew = L(indx) − d/Count  ▷ Subtracting the difference from all the larger values
16: else if d < 0 then  ▷ Difference smaller than zero means the new adjusted load is less than the demanded load
17:   [indx, Count] = L < µ  ▷ Count is the number of values that are smaller than the average load
18:   Lnew = L(indx) + d/Count  ▷ Adding the difference to all the smaller values
19: end if
20: [index, Lsorted] = Sort(Lnew)  ▷ Sort sorts Lnew in ascending order and returns the index of the sorted array Lsorted
21: for i = 1 to 6 do  ▷ Shift the peak load to the lowest load
22:   j = i − 1, sf = 5 * i, a = length(Lnew)  ▷ Defining the shifting factor
23:   if index(i) > 6 then  ▷ Shift the load to the lowest load that is not late night
24:     shftFac = Lnew(index(a − j))
25:     Lnew(index(i)) = Lnew(index(i)) − shftFac  ▷ Subtracting the shifting factor from the highest load
26:     Lnew(index(a − j)) = Lnew(index(a − j)) + shftFac  ▷ Adding the shifting factor to the lowest load
27:   end if
28: end for
29: End Function
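A simplified, runnable reading of Algorithms 1 and 2 is sketched below. It is not the authors' implementation and makes several assumptions: the three cases are applied hour by hour, the index-based sum correction of Algorithm 2 is replaced by proportional rescaling, shifted values are clamped at zero, the late-night peak shifting loop is omitted, and the subsidy is passed in as a price multiplier.

```python
from statistics import mean, pstdev


def manage_demand(dwd):
    """Flatten the demand to be bought from the SG while keeping its total.

    Values below the mean are raised by one standard deviation and values
    above it are lowered (clamped at zero); the result is rescaled so that
    the total demand is unchanged.
    """
    mu, sigma = mean(dwd), pstdev(dwd)
    shifted = []
    for v in dwd:
        if v < mu:
            shifted.append(v + sigma)
        elif v > mu:
            shifted.append(max(v - sigma, 0.0))
        else:
            shifted.append(v)
    total = sum(shifted)
    if total <= 0:
        return list(dwd)
    scale = sum(dwd) / total
    return [v * scale for v in shifted]


def dsm_cost(wind, load, price, subsidy=1.0):
    """Per-hour reading of Algorithm 1.

    Returns (cost, purchased profile, surplus energy sent to the SG).
    subsidy=0.9 models the 10% price reduction; 1.0 means no subsidy.
    """
    deficit = [max(l - w, 0.0) for w, l in zip(wind, load)]
    surplus = sum(max(w - l, 0.0) for w, l in zip(wind, load))
    purchased = manage_demand(deficit) if sum(deficit) > 0 else deficit
    cost = sum(p * subsidy * q for p, q in zip(price, purchased))
    return cost, purchased, surplus


flattened = manage_demand([1.0, 2.0, 3.0, 10.0])
cost, purchased, surplus = dsm_cost([5.0, 0.0], [5.0, 10.0], [20.0, 30.0])
cost_subsidised, _, _ = dsm_cost([5.0, 0.0], [5.0, 10.0], [20.0, 30.0], subsidy=0.9)
_, _, exported = dsm_cost([8.0, 0.0], [5.0, 10.0], [20.0, 30.0])
```

In the two-hour example the 10 units of unmet demand are spread evenly over both hours, so the purchase cost is 250 without a subsidy and 225 with the 10% reduction.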
5.5 DE-RUSBoost model for Electricity Theft Detection
In this section, the proposed ETD system is discussed in detail. The proposed model comprises four stages, namely: (i) data preprocessing, (ii) classifier training and optimization, (iii) classification and (iv) model evaluation. The system model is illustrated in figure 8, which shows the step by step procedure of the proposed system. Firstly, the data is acquired from smart homes or a community; secondly, the unlabeled data is labeled; thirdly, the labeled data is fed to the classifier to identify fraudulent consumers; fourthly, the classification model is optimized; and lastly, classification is performed. A description of the system model is given below. The problem statement of the DE-RUSBoost is discussed in section 4.
5.5.1 Data Preprocessing
The data contains missing values. The missing values are filled using linear interpolation. If fewer than seven consecutive values are missing or zero, they are replaced by linearly interpolated values. If seven or more consecutive values are missing, they are replaced by zeros. In all calculations, every week starts on Monday and ends on Sunday.
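The gap-filling rule can be sketched as follows. Missing readings are represented by `None`, which is an assumed encoding, and gaps that touch the start or end of the series are zero-filled because there is no neighbour on one side to interpolate from; the text does not cover that case, and zero readings inside short gaps are not treated here.

```python
def fill_missing(values, max_gap=6):
    """Fill runs of None: short runs by linear interpolation, long runs by zeros.

    Runs of up to `max_gap` (fewer than seven) missing values are interpolated
    between their neighbours; runs of seven or more are replaced by zeros.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap = end - start
        if gap <= max_gap and start > 0 and end < n:
            left, right = filled[start - 1], filled[end]
            for k in range(gap):
                filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
        else:
            for k in range(start, end):
                filled[k] = 0.0
    return filled


short_gap = fill_missing([1.0, None, None, 4.0])
long_gap = fill_missing([1.0] + [None] * 7 + [2.0])
```

The two-value gap is bridged linearly, while the seven-value gap is replaced by zeros, matching the rule stated above.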
The electricity thieves are only 8% of the total data. The positive class is the class of fraudulent consumers and the negative class contains fair consumers. The negative class is about 12 times larger than the positive class. Therefore, this is a binary classification problem with highly imbalanced data. An ensemble approach is designed for classification [76,86], which is described in the next section.
[Figure 8: data preprocessing (rejection, interpolation, normalization) followed by classifier training with hyper-parameter tuning and classification.]
Figure 8. Overview of DE-RUSBoost model for electricity theft detection.
5.5.2 DE-RUSBoost Classiﬁcation Model
RUSBoost is an ensemble method that has been used successfully for imbalanced data classification in ETD  and several other fields [87,88,89]. It uses a sampling strategy to reduce the class imbalance ratio. It combines the strengths of both Adaptive boosting (Adaboost) and Random Under Sampling (RUS) techniques. The Adaboost algorithm repeatedly trains multiple weak learners (usually Decision Trees (DTs)) on subsets S′ of the training data. For classifying a new example, a weighted vote of all the learners is taken. The weights are assigned to misclassified examples by every learner using the error formula shown in equation 26.
where h_n is the weak learner, x_i is the example to be labeled, D_n(i) is the probability distribution of all the examples at iteration n and S is the training set. After calculating the error, the weights are updated by equations 27 and 28.
$$D_{n+1}(i) = D_n(i) \times e^{-\alpha_n}, \qquad \text{if } h_n(x_i) = y_i$$
where D_{n+1}(i) is the updated weight of a sample, D_n(i) is the weight of the sample in the previous iteration n, α_n represents the weight update factor and h_n(x_i) is the label assigned to the example x_i by the weak learner h_n.
The Adaboost algorithm performs better on balanced data; in the case of imbalanced classes, it tends
to underfit. To diminish the effect of imbalanced data, under sampling is performed in the RUSBoost
method. RUSBoost under samples the majority class while selecting subsets for training the weak
learners. For example, suppose the majority class is 90% and the minority class is 10% of the data.
The weak learners are then trained on 20% of the training examples: all examples of the minority class
are taken and the same number of examples is randomly selected from the majority class. In this way,
the imbalance ratio is reduced and the learners are trained on balanced data. With under sampling, the
weights are updated using the Adaboost method's steps (as shown in equations 27 and 28). A new example
is classified by
$$y_i = H(x_i) = \mathrm{sign}\left(\sum_{n=1}^{N} \alpha_n h_n(x_i)\right)$$
where $y_i$ is the label assigned to example $x_i$ by the RUSBoost classifier $H(x_i)$, $N$ is the number
of training cycles or iterations and $h_n$ are the weak learners.
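The building blocks described above (random under sampling, the boosting weight update and the weighted vote) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, labels are assumed to be +1 (minority, fraudulent) and -1 (majority, fair), and the learner weight $\alpha_n = \frac{1}{2}\ln\frac{1-\varepsilon_n}{\varepsilon_n}$ is the standard Adaboost choice, assumed here because the corresponding equation is not legible in the text.

```python
import math
import random


def rus_indices(labels, minority_share=0.5, rng=None):
    """Random under sampling: keep every minority (+1) example and draw
    majority (-1) examples without replacement so that the minority class
    makes up `minority_share` of the returned subset."""
    rng = rng or random.Random(0)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == -1]
    n_major = round(len(minority) * (1 - minority_share) / minority_share)
    n_major = min(n_major, len(majority))
    return minority + rng.sample(majority, n_major)


def adaboost_round(weights, predictions, labels):
    """One boosting round: weighted error, learner weight and updated,
    renormalised example weights."""
    eps = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - eps) / eps)  # assumed standard Adaboost form
    new_w = [w * math.exp(-alpha if p == y else alpha)
             for w, p, y in zip(weights, predictions, labels)]
    total = sum(new_w)
    return eps, alpha, [w / total for w in new_w]


def weighted_vote(alphas, learner_outputs):
    """Final label: sign of the alpha-weighted sum of the learners' votes."""
    score = sum(a * h for a, h in zip(alphas, learner_outputs))
    return 1 if score >= 0 else -1
```

With 10 minority and 90 majority examples and `minority_share=0.5`, `rus_indices` returns 20 indices, which matches the 90%/10% example in the text.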
The details given above describe the standard RUSBoost method. In order to reduce the false detection
rate and to improve the performance of the conventional RUSBoost algorithm, an enhanced scheme
named DE-RUSBoost is proposed. In this work, RUSBoost's parameters are optimized using DE
in order to make it more robust and accurate than standard RUSBoost. DE is a well-known meta-heuristic
optimization technique. It iteratively optimizes a problem by attempting to improve a candidate solution
according to an objective function. It keeps a population of candidate solutions and generates new
solutions by combining existing solutions (crossover) and altering one or more elements of the solutions
(mutation). It then selects the best solution based on the fitness value of that solution. The major
steps that DE follows while optimizing the RUSBoost classifier are:
1. Initialization: Randomly generates the initial population of size $NP$ that follows a uniform
distribution.
2. Mutation: In the mutation step, a new solution $V_i^{g+1}$ is created in the $i$th iteration by following
equation 30:
$$V_i^{g+1} = X_{s_1}^{g} + F \left( X_{s_2}^{g} - X_{s_3}^{g} \right) \quad (30)$$
where $X_{s_1}^{g}$, $X_{s_2}^{g}$ and $X_{s_3}^{g}$ are individuals selected from generation $g$ and $F$ is
the mutation factor.
3. Crossover: In the crossover step, individuals are combined to create new solutions by following
equation 31:
$$U_{i,j}^{g+1} = \begin{cases} V_{i,j}^{g+1}, & \text{if } r_i \le CR \\ X_{i,j}^{g}, & \text{otherwise} \end{cases} \quad (31)$$
where $U_{i,j}^{g+1}$ is the trial vector of intermediate crossing, $V_{i,j}^{g+1}$ is the corresponding mutant
solution vector, $j \in [1, d]$ and $d$ is the dimension of the solution vectors. $CR$ is the crossover rate and
$r_i \in [0, 1]$ is a uniformly distributed random number that is compared against $CR$.
4. Selection: The last operation is the selection of the best solution, which is described in equation 32:
$$X_{i}^{g+1} = \begin{cases} U_{i}^{g+1}, & \text{if } f(U_i^{g+1}) \ge f(X_i^{g}) \\ X_{i}^{g}, & \text{otherwise} \end{cases} \quad (32)$$
$$f = \frac{TP}{2(TP + FP)} + \frac{TN}{2(TN + FN)} \quad (33)$$
where $f(U_i^{g+1})$ denotes the fitness value of the trial vector $U_i^{g+1}$ and $f(X_i^{g})$ denotes the fitness
value of $X_i^{g}$. TP are correctly classified positive test samples, FP are misclassified positive samples,
TN are correctly classified negative samples and FN are misclassified negative samples. The fitness
function is to maximize the Area Under the Curve (AUC). The algorithm selects the parameters
that make RUSBoost more accurate. The proposed method is presented in algorithm 3.
Algorithm 3 DE-RUSBoost Algorithm.
Require: Input: [0,0]
Output: P, N
Set NP, Gmax, CR, F
Randomly set X = [P, N]
for n = 1 to NP do
    if f_i(X_i) > f_i(X_{i+1}) then
        Reserve f_i(X_i)
        Compare f_i(X_i) with f_i(X_{i+2})
    else
        Reserve f_i(X_{i+1})
        Compare f_i(X_{i+1}) with f_i(X_{i+2})
    end if
    Obtain f_max(X_i)
end for
Denote f_max(X_i) as X* = [N*, P*]
Classify a new example using RUSBoost (equations 26–29).
The parameters that are optimized are the imbalance ratio of the subset selected for training the weak
learners and the number of weak learners $h_n$ (i.e., DTs). The performance of DE-RUSBoost is
significantly improved as compared to RUSBoost. The objective of DE is to maximize the fitness in
equation 33, which is equivalent to minimizing the corresponding training error. DE selects the
parameters, trains the RUSBoost classifier and calculates the error. In DE, the population size NP is
300, the number of iterations Gmax is 100, the crossover rate CR is 1 and the mutation factor F is 0.5.
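The DE loop (initialization, mutation, crossover and selection) can be sketched as below. This is a generic DE/rand/1/bin sketch under stated assumptions, not the authors' code: `toy_fitness` is a hypothetical stand-in for "train RUSBoost with these parameters and return its fitness", the clipping to the search box and the `j_rand` safeguard (at least one component always comes from the mutant) are common additions that the text does not mention, and the default population size and iteration count are kept small so that the example runs quickly.

```python
import random


def differential_evolution(fitness, bounds, np_size=20, g_max=50,
                           cr=0.9, f=0.5, seed=0):
    """Maximise `fitness` over the box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    d = len(bounds)
    # Initialization: uniform random population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_size)]
    fit = [fitness(x) for x in pop]
    for _ in range(g_max):
        for i in range(np_size):
            # Mutation: V = X_s1 + F * (X_s2 - X_s3), with s1, s2, s3 != i.
            s1, s2, s3 = rng.sample([k for k in range(np_size) if k != i], 3)
            mutant = [pop[s1][j] + f * (pop[s2][j] - pop[s3][j])
                      for j in range(d)]
            # Assumption: clip the mutant so it stays inside the search box.
            mutant = [min(max(v, lo), hi)
                      for v, (lo, hi) in zip(mutant, bounds)]
            # Crossover: take the mutant component when r <= CR.
            j_rand = rng.randrange(d)
            trial = [mutant[j] if (rng.random() <= cr or j == j_rand)
                     else pop[i][j] for j in range(d)]
            # Selection: keep the trial vector if it is at least as fit.
            trial_fit = fitness(trial)
            if trial_fit >= fit[i]:
                pop[i], fit[i] = trial, trial_fit
    best = max(range(np_size), key=lambda k: fit[k])
    return pop[best], fit[best]


def toy_fitness(x):
    """Hypothetical smooth objective with its maximum at (0.3, 0.7)."""
    return -((x[0] - 0.3) ** 2) - ((x[1] - 0.7) ** 2)


best_x, best_f = differential_evolution(toy_fitness, [(0.0, 1.0), (0.0, 1.0)])
```

In the setting of this section, the two components of each solution vector would be the imbalance ratio and the number of weak learners, and the fitness call would wrap the training and evaluation of the RUSBoost classifier.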
The best parameters selected by DE for the classifier are a class imbalance percentage of 54% and 46%
for the majority and minority classes, respectively, and 250 decision trees (weak learners). The
DE-RUSBoost classifier is trained on 70% of the total data and tested on the remaining 30%. The results
and analyses of the proposed models are presented in the next section.
6 Simulations’ Setup and Results
All simulations are performed using MATLAB R2018a on a computer system with core i3 processor, 8
GB RAM and 500 GB hard disk. In the next sections 7–10, simulation results are discussed.
7 Results and Discussion of DLSTM
In this section, the performance of proposed system 1 is validated through simulations and discussion.
The problem statement of this model is discussed in section 4. A case study is presented in the next
section, i.e., short-term forecasting using the aggregated load and the average price of six states of the USA.
7.1 Data Description
The historic electricity price and load data used in the simulations are taken from ISO NE. ISO NE
manages the generation and transmission system of New England. ISO NE produces and transmits almost
30,000 MW of electric energy daily. In ISO NE, 10 million dollars of transactions are completed annually
by 400 electricity market participants. The data comprises the ISO NE control area's hourly system load
and regulation capacity clearing price of 6 states of the USA, captured over eight calendar years, i.e.,
from January 2011 to March 2018. The data contains 63,528 measurements.
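The number of measurements can be checked with a short calculation. The sketch assumes that the data covers every hour from 1st January 2011 through 31st March 2018 inclusive, with no gaps; the exact start and end days are an assumption, since the text only gives the months.

```python
from datetime import date


def hourly_samples(start, end_inclusive):
    """Number of hourly readings between two dates, both days included."""
    days = (end_inclusive - start).days + 1
    return days * 24


n_measurements = hourly_samples(date(2011, 1, 1), date(2018, 3, 31))
print(n_measurements)  # 63528
```

Under this assumption, 2,647 days of hourly data give exactly the 63,528 measurements stated above.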
When the performance of DLSTM is compared with the aforementioned methods, it has less error. DLSTM
has lower MAE and NRMSE as compared to ELM, WT + SAPSO + KELM, NARX and INARX.
WT + SAPSO + KELM is proposed for electricity price prediction. For price forecasting, DLSTM is
compared with ELM, NARX and WT + SAPSO + KELM. Buitrago et al. proposed INARX for electricity
load prediction. The DLSTM load prediction results are compared with ELM, NARX and INARX.
The comparison of forecast results is shown in figures 9 and 10.
Figure 9. Comparison of DLSTM, ELM and NARX for price forecast of one
week, ISO NE.
Figure 10. Comparison of DLSTM, ELM and NARX for load forecast of one
week, ISO NE.
DLSTM has a feedback architecture, where errors are backpropagated. In DLSTM, weights are updated
multiple times during training, with every new input. The learned weights are obtained when network
completes its training on complete training data.
7.2 Performance Evaluation
For performance evaluation, two evaluation indicators are used: MAE and NRMSE. The MAPE performance
metric has the limitation of being infinite if the denominator is zero, and it is negative if the values
are negative, which is considered meaningless. Therefore, MAE and NRMSE are suitable performance
metrics.
MAE is the average of absolute errors. In MAE, the absolute difference between every forecast value
and its respective observed value is taken, and the arithmetic mean of these differences is calculated.
NRMSE is the normalized root mean square error. In NRMSE, the difference is calculated as in MAE and
is then squared. The arithmetic mean of the squared errors is calculated and its square root is taken.
The calculated error is normalized by dividing it by $\max(X_s) - \min(X_s)$, where $\max(X_s)$ is the
maximum value and $\min(X_s)$ is the minimum value of the vector of observed test values.
The formulas of MAE and NRMSE are given in equations (34) and (35), respectively.
$$MAE = \frac{1}{N} \sum_{s=1}^{N} \left| X_s - y_s \right| \quad (34)$$
$$NRMSE = \frac{\sqrt{\frac{1}{N} \sum_{s=1}^{N} \left( X_s - y_s \right)^2}}{\max(X_s) - \min(X_s)} \quad (35)$$
where $X_s$ is the observed test value at time $s$, $y_s$ is the forecast value at time $s$ and $N$ is the
number of test values.
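The two error measures follow directly from their verbal definitions. A minimal sketch is given below; the function names `mae` and `nrmse` and the sample numbers are illustrative only.

```python
import math


def mae(observed, forecast):
    """Mean of the absolute differences between observed and forecast values."""
    return sum(abs(x - y) for x, y in zip(observed, forecast)) / len(observed)


def nrmse(observed, forecast):
    """Root mean square error divided by the range of the observed values."""
    mse = sum((x - y) ** 2 for x, y in zip(observed, forecast)) / len(observed)
    return math.sqrt(mse) / (max(observed) - min(observed))


observed = [10.0, 20.0, 30.0, 40.0]   # illustrative values, not from the dataset
forecast = [12.0, 18.0, 33.0, 39.0]
print(mae(observed, forecast))        # 2.0
```

Because NRMSE divides by the observed range, it is undefined for a constant observed series; real load and price series do not have this problem.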
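A vector of values that are to be forecast is $[y_1, y_2, \ldots, y_n]$. These values are predicted by two
forecasting models, $M_1$ and $M_2$. The forecasting errors of these models are
$[\varepsilon_1^{M_1}, \varepsilon_2^{M_1}, \ldots, \varepsilon_n^{M_1}]$ and
$[\varepsilon_1^{M_2}, \varepsilon_2^{M_2}, \ldots, \varepsilon_n^{M_2}]$. In the DM test, a covariance-stationary
loss function $L(\cdot)$ and the loss differential are calculated as in equation (36):
$$d_t^{M_1,M_2} = L(\varepsilon_t^{M_1}) - L(\varepsilon_t^{M_2}) \quad (36)$$
In its one-sided version, the DM test evaluates the null hypothesis $H_0$ of $M_2$ having an accuracy equal
to or worse than $M_1$, i.e., equal or larger expected loss, against the alternative hypothesis $H_1$ of $M_2$
having a better accuracy, i.e.:
$$\text{One-sided DM test} \begin{cases} H_0: E[d_t^{M_1,M_2}] \le 0 \\ H_1: E[d_t^{M_1,M_2}] > 0 \end{cases} \quad (37)$$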
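A simplified version of the DM statistic can be sketched as follows. The sketch assumes one-step-ahead forecasts, so no autocovariance terms are included in the variance estimate, and it uses the normal approximation for the one-sided p-value. The function name, argument order and sample errors are hypothetical; the sign convention is chosen so that a positive statistic favours the proposed model, in line with how the DM results are read in this document.

```python
import math
import statistics


def dm_test(errors_benchmark, errors_proposed, loss=abs):
    """Simplified one-sided Diebold-Mariano test.

    d_t = L(e_benchmark_t) - L(e_proposed_t); a positive statistic means the
    proposed model has the smaller average loss.
    """
    d = [loss(eb) - loss(ep) for eb, ep in zip(errors_benchmark, errors_proposed)]
    n = len(d)
    d_bar = statistics.mean(d)
    var_d = statistics.variance(d)  # sample variance, no autocovariance terms
    dm = d_bar / math.sqrt(var_d / n)
    # One-sided p-value for H1: E[d] > 0, using the standard normal distribution.
    p_value = 0.5 * math.erfc(dm / math.sqrt(2))
    return dm, p_value


benchmark = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8]   # illustrative errors only
proposed = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1]
dm_stat, p = dm_test(benchmark, proposed)
```

With `loss=abs` the test is based on absolute errors, which corresponds to using MAE as the error metric; passing `loss=lambda e: e * e` would base it on squared errors instead.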
SANN cannot handle large amounts of data very well and tends to overfit. DNN has more computational
power than SANN. For prediction on big data, deep learning is shown to be an effective and viable
alternative to traditional data-driven machine learning prediction methods. The validated and updated
Deep LSTM forecaster outperformed ELM and NARX in terms of MAE and NRMSE.
The NRMSE and MAE metrics are used to compare the accuracy of different forecasting models. However,
the fact that the accuracy of a model is higher does not confirm that the model is better than the
others. The difference between the accuracy of two models should be statistically significant. For this
purpose, the forecasting accuracy is validated using statistical tests, such as the Friedman test, error
analysis, the Diebold–Mariano (DM) test, etc. The performance of the proposed method is validated
by two statistical tests, the DM test and the Friedman test. DM is a well-known statistical test for the
validation of electricity load and price forecasting. The DM forecasting accuracy comparison test is used
for comparing the accuracy of the proposed model with the existing models, i.e., ELM, WT + SAPSO +
KELM, NARX and INARX.
Table 2 Comparison of load and price forecasting errors of DLSTM SISO with benchmark models.
Forecast         Forecasting Method     MAE      NRMSE
Price Forecast   ELM                    67.4     11.86
                 NARX                   12.47    8.24
                 WT + SAPSO + KELM      8.99     0.13
                 DLSTM                  1.945    0.08
Load Forecast    ELM                    52.8     8.42
                 NARX                   37.18    14.74
                 INARX                  9.7      0.2
                 DLSTM                  2.9      0.087
The second test used for veriﬁcation of improved accuracy of the proposed model is the Friedman test.
The Friedman test is a two-way analysis of variance by ranks. It is a non-parametric alternative to the
one-way ANOVA with repeated measures. Multiple comparison tests are conducted in the Friedman
test. Its goal is to detect the signiﬁcant differences between the results of different forecasting methods.
The null hypothesis of Friedman test states that the forecasting performances of all methods are equal.
To calculate the test statistic, the predicted results are first converted into ranks. The pairs of
predicted results and observed values are gathered for all methods, and ranks are assigned to every
pair $i$. Ranks range from 1 (least error) to $k$ (highest error) and are denoted by $r_i^j$ ($1 \le j \le k$).
For all forecasting methods $j$, average ranks are computed by:
$$R_j = \frac{1}{n} \sum_{i=1}^{n} r_i^j \quad (38)$$
Ranks are assigned to all forecasts of a method separately. The best algorithm has rank 1, the second
best has rank 2 and so on. The null hypothesis states that all methods' forecast results are similar;
therefore, their $R_j$ are equal. The Friedman statistic is calculated by equation (39).
$$F = \frac{12n}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right] \quad (39)$$
where $n$ is the total number of forecasting results, $k$ is the number of compared models and $R_j$ is the
average rank received by model $j$ over all forecasting values. The null hypothesis of Friedman's test
is the equality of forecasting errors among the compared models. The alternative hypothesis is
defined as the negation of the null hypothesis. The test results are shown in table 3. Clearly, the proposed
DLSTM model is significantly superior to the other compared models.
$$\text{Friedman test} \begin{cases} H_0: R_1 = R_2 = \cdots = R_k \\ H_1: \text{otherwise} \end{cases}$$
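The ranking procedure and the test statistic can be sketched as follows. The sketch uses the standard form of the Friedman statistic based on average ranks, $\frac{12n}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$, without a tie correction; whether this matches the authors' exact formulation is an assumption. The function name and the sample error values are illustrative.

```python
def friedman_statistic(errors):
    """errors[i][j] is the error of method j on forecast case i (lower is better).

    Returns the average rank of every method and the Friedman statistic.
    """
    n, k = len(errors), len(errors[0])
    rank_sums = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1  # tied methods share the average rank
            for t in range(pos, end + 1):
                rank_sums[order[t]] += avg_rank
            pos = end + 1
    avg_ranks = [s / n for s in rank_sums]
    stat = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks)
                                     - k * (k + 1) ** 2 / 4)
    return avg_ranks, stat


# Illustrative errors of three methods on four forecast cases.
sample_errors = [[3.0, 2.0, 1.0], [3.5, 2.5, 0.5], [9.0, 4.0, 1.0], [2.0, 1.5, 1.0]]
ranks, stat = friedman_statistic(sample_errors)
```

In this example the third method is always the most accurate, so it receives average rank 1 and the statistic reaches its maximum value $n(k-1) = 8$.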
In table 3, the results of the DM and Friedman tests are presented. The DM test statistics of DLSTM
against the compared methods are listed. DM results greater than zero mean that the DLSTM method is
significantly better than the compared method (as shown by the hypotheses in equation (37)). Friedman
ranks are computed by equation (39). The ranks range from 1 to 4 for the four compared methods. Rank 1
shows the best and rank 4 the worst performance of a forecasting method. The DM values of DLSTM
versus the three compared methods are shown (DLSTM is not compared with itself; therefore, Not
Applicable (N/A) is stated). For price forecasting, the F rank is: DLSTM > WT + SAPSO + KELM > NARX >
ELM. The F rank for load forecasting is: DLSTM > INARX > NARX > ELM. The statistical tests validate
that the accuracy of the proposed method DLSTM is significantly improved. DLSTM ranks first for both
load and price forecasting. The DM results are greater than zero, which means that DLSTM is better
than the other compared methods.
Table 3 Diebold–Mariano and Friedman tests' rank F of DLSTM SISO.
Forecast         Forecasting Method     Diebold–Mariano (DLSTM vs.)    Friedman F Rank
Price Forecast   ELM                    47.3                           4
                 NARX                   27.6                           3
                 WT + SAPSO + KELM      12.8                           2
                 DLSTM                  N/A                            1
Load Forecast    ELM                    43.2                           3
                 NARX                   6.8                            2
                 INARX                  4.2                            2
                 DLSTM                  N/A                            1
Experimental results show that the proposed method forecasts the real patterns and recent trends of load
and price with greater accuracy as compared to ELM and NARX. The comparison of the proposed method
with NARX and ELM is shown in table 2. The price forecast errors listed in table 2 are the averages of
the forecasting errors of all twelve months for ELM, NARX and DLSTM.
8 Results and Discussion of Model ESAENARX and DE-RELM
In this section, the description of datasets, data analysis and results’ discussion of ESAENARX and
DE-RELM models are presented.
8.1 Data Description
The data used for forecasting is taken from the well-known electricity utility ISO NE (ISO New
England), USA. The dataset is publicly available.
8.2 Performance Evaluation
To evaluate the performance of ESAENARX, three performance measures are used, i.e., MAPE, RMSE
and NRMSE. A lower error value indicates better forecasting accuracy.
8.3 Comparison and Discussion
The proposed methods are compared with four ANN forecasting methods: NARX, ELM, DE-ELM
and RELM. These methods are widely used in electricity load and price forecasting.
The detailed comparison of all the compared methods is presented in this section. The results and rea-
soning are also elaborated with the comparative analysis. Moreover, the strengths and limitations of the
compared methods are highlighted.
The effect of the proposed feature engineering is clear from the numerical results. The forecast
accuracy of ESAENARX with extracted features is much better than that of simple NARX. The extracted
features are informative; therefore, the forecaster is able to model the data better and forecast with
greater accuracy.
The proposed methods are compared with three types of ELMs, i.e., standard ELM, DE-ELM and RELM.
The comparative analysis of these methods is given below.
In DE-ELM, the ELM is optimized using a meta-heuristic optimization algorithm named Differential
Evolution (DE). The initial weights and biases of ELM's hidden and output layers are optimized using DE.
DE is an optimization method that iteratively improves a candidate solution with respect to an
objective function.
RELM is a variant of the recurrent neural network. It is a combination of two methods, ELM and RNN.
ELM acts as an encoder, where the inputs and outputs of the network are same, i.e., the input features.
The learned weights of the ELM network are set as the initial weights of the RNN. By keeping the inputs
and outputs of ELM network similar, the learned weights are a good representation of the input features.
The number of neurons in the hidden layer of ELM and RNN is kept the same. Two ELM encoders are
trained, one for the hidden layer’s weights of RNN and second for the output layer’s weights of the RNN.
The learned weights make the RNN converge faster and to a better solution. The results of RELM are
slightly better than DE-ELM and comparable to NARX. Both RELM and NARX belong to the same category
of neural networks, known as recurrent neural networks.
The second proposed method, DE-RELM, performs reasonably well on load forecasting. Its load forecasting
results are much better than those of the other techniques and comparable to ESAENARX. However,
no significant improvement is seen in the price forecast. ESAENARX performs equally well for
both load and price. Because DE-RELM trains the forecaster on learned weights, a minor improvement is
achieved, which is not comparable to ESAENARX. For the price forecast, only properly extracted features
can improve accuracy. ESAE extracts the relevant and most informative features, which improves the
forecast accuracy.
ELM has the worst forecast results among the compared methods because ELM is a feed forward network.
Its weights are learned once in a forward pass and never updated. Therefore, to achieve acceptable
forecast results, the initial weights of the ELM have to be well optimized. NARX performs
better as compared to the ELM. However, its forecast results are not as accurate as those of the proposed
methods ESAENARX and DE-RELM. The MAPE, RMSE and NRMSE errors are shown in table 4.
Table 4 Comparison of forecasting errors of ESAENARX MIMO and DE-RELM MIMO with
benchmark models on ISO NE dataset.
Forecast         Method       MAPE     RMSE    NRMSE
Load Forecast    ELM          74.59    7.82    1.53
                 NARX         1.35     4.35    0.37
                 DE-ELM       21.73    5.23    0.41
                 RELM         18.78    4.62    0.37
                 CEANN        8.62     3.75    0.57
                 DE-RELM      7.78     3.14    0.32
                 ESAENARX     1.13     2.27    0.03
Price Forecast   ELM          89.95    9.78    1.91
                 NARX         8.29     5.24    0.89
                 DE-ELM       28.06    6.92    0.32
                 RELM         21.06    5.62    0.28
                 CEANN        19.96    4.45    0.96
                 DE-RELM      18.62    3.75    0.34
                 ESAENARX     3.32     2.85    0.08
The forecast accuracy of all six methods is in sequence: ESAENARX > DE-RELM > NARX > DE-ELM
> RELM > ELM.
The lower error as compared to the other methods verifies the good performance of the ESAENARX
forecast model. The results in figure 11 and figure 12 show the better accuracy of ESAENARX and
DE-RELM as compared to ELM, DE-ELM, RELM and NARX. The MAPE and NRMSE of ESAENARX, DE-RELM,
ELM, DE-ELM, RELM, NARX and CEANN are listed in table 4. The efficiency of ESAENARX
and DE-RELM is confirmed by their lower MAPE and RMSE compared to the mentioned methods.
Figure 11. Comparison of ESAENARX and DE-RELM price prediction with
NARX, ELM and DE-ELM, ISO NE.
Figure 12. Comparison of ESAENARX and DE-RELM load prediction with
NARX, ELM and DE-ELM, ISO NE.
9 Results and Discussion of Model EDCNN
The simulation results of EDCNN wind forecasting model are discussed in this section. The problem
statement of the EDCNN is discussed in section 4.
9.1 Data Description
Three years of hourly wind power data are taken from ISO New England's wind farm located in Maine.
The duration of the data utilized in this research is from January 2015 to December 2017.
9.2 Wind Power Analysis
Wind power is a widely available RES; therefore, it is one of the most popular and emerging power
generation sources. The predictive analytics are performed on wind power data of the Maine wind farms,
ISO New England. According to the annual report, the Maine wind farm produces approximately 900 MW
energy annually, which contributes almost 14% of the total electricity in the state of Maine. The wind
power is directly proportional to the wind speed. The wind speed varies from season to season. In Maine,
USA, the wind speed is affected by seasonality. The wind power in the autumn is higher compared to other
seasons. The reason behind this is the fast winds in the coastal area of Maine, where the wind turbines are
located.
9.3 EDCNN Performance Evaluation
EDCNN is compared with two models: typical CNN and SELU CNN for wind power forecasting (as
shown in ﬁgure 13). For performance evaluation of wind power forecasting, three evaluation indicators
are used: Mean Absolute Error (MAE), Normalized Root Mean Square Error (NRMSE) and MAPE (as
shown in table 5).
[Figure 13: four seasonal panels; axis label: Wind Power (MW).]
Figure 13. All season predictions of wind power.
9.4 Statistical Analysis of EDCNN
The aforementioned error indicators (as shown in table 5) are utilized for the accuracy comparison of
forecasting models. However, a lower error or higher accuracy of a model does not guarantee its
superiority over other models. A model is better than another model if the difference between their
accuracies is statistically significant. Different statistical tests are used to validate the significance
of models, such as the Friedman test, error analysis, the DM test, etc. To validate the performance of
the proposed forecasting model EDCNN, the well-known DM statistical test is used (as shown in table 6).
Diebold and Mariano proposed the classical Diebold–Mariano statistical test in 1995. The DM test
evaluates the significance of the difference between the forecasting errors of two models. In this
research work, the error metric used for DM is MAE. DM is widely used for the validation of wind power
forecasting.
Table 5 MAPE, NRMSE and MAE of the EDCNN MISO and compared methods.
Method       Season    MAPE    NRMSE    MAE
CNN          Spring    8.42    2.34     3.34
             Summer    8.23    2.27     3.24
             Autumn    7.9     2.65     3.36
             Winter    8.1     2.71     2.89
SELU CNN     Spring    3.47    0.12     3.1
             Summer    3.62    0.13     3.3
             Autumn    3.45    0.12     3.4
             Winter    3.27    0.17     3.2
EDCNN        Spring    2.67    0.092    2.4
             Summer    2.43    0.096    2.24
             Autumn    2.56    0.085    2.67
             Winter    2.62    0.094    2.18
The results of the DM test with a confidence level of 95% are shown in table 6. DM is applied to the
forecasting results of EDCNN and two compared methods: CNN and SELU CNN. Three comparisons
are performed, i.e., EDCNN with CNN, EDCNN with SELU CNN and CNN with SELU CNN.
EDCNN is better than both CNN and SELU CNN, whereas SELU CNN is better than CNN.
Table 6 Diebold–Mariano test results of EDCNN MISO at a 95% confidence level.
Season    Statistic    EDCNN     SELU CNN    CNN
Spring    DM-MAE       1.4252    0.0842      1.4256
Summer    DM-MAE       1.3262    0.1024      1.3692
Autumn    DM-MAE       1.2714    0.1762      1.6728
Winter    DM-MAE       1.4632    1.1426      1.2464
9.5 Analysis of Proposed DSM Algorithm
The results of the proposed DSM algorithm are shown in figure 14. It is clearly seen that the load in
peak hours is clipped and shifted to the off peak hours. The total power consumption, the power supplied
by the MG and the power consumed from the SG are shown in figure 14. The proposed DSM scheme
is applied to the 24 hours of 7th January 2017, because the wind power generation on that day is fairly
reasonable and there is no hour with zero generation, which leads to a clear depiction of the DSM results.
The purpose of DSM is to reduce the consumption load in peak hours in order to minimize the usage of
the dispatchable generators of the SG. The MG only has WPP and no dispatchable generators. If the wind
generation is insufficient, the MG purchases energy from the SG. If the energy demand of the MG's
consumers falls in the peak hours, then the load of the MG is shifted from peak hours to off peak hours.
It is assumed that the MG encourages its consumers to shift their load from peak hours to off peak
hours by offering incentives. When consumers shift their consumption load, the overall load of the MG
is shifted and, consequently, the consumption cost of the consumers is reduced. The MG gets the advantage
of not purchasing more energy from the SG in peak hours (where the price is higher than in off peak hours),
which reduces the purchasing cost of the MG too. In this manner, the consumers are satisfied and
the MG has cost effective demand management.
The proposed algorithm successfully shifts the load. In the proposed method, the load is shifted to off
peak hours that are not late at night. This is suitable because little electricity can be consumed late
at night during sleeping hours. The goal of distributing the load profile almost normally is achieved. The
load before DSM and after applying the proposed DSM algorithm is shown in figure 15. The load profile
after DSM is closer to the normal distribution than the profile before DSM. An exactly normal distribution
of load cannot be achieved because of the fixed working hours. The electricity consumption in working
hours cannot be shifted to other hours in a manner that achieves a perfectly normal distribution of load.
Only a portion of the load can be shifted, which is known as the shift-able load. The goals of shifting the
shift-able load in order to improve the load factor and reduce the price are achieved by applying the
proposed DSM.
Another goal of the proposed DSM algorithm is reducing the consumption cost. When the load is shifted
to off peak hours, the consumption cost reduces because the power price is low in off peak hours.
The reduction in consumption cost achieved by the proposed DSM algorithm is presented
in Table 7, which shows the cost before and after applying the DSM algorithm. The cost reduced by DSM
and its percentage are also mentioned. On average, 1.1% of the total cost is reduced by applying the
proposed DSM algorithm. When the proposed algorithm is applied to the 365 days of the year 2017,
approximately $2.25 million of consumption cost is saved. The DSM results of one day's consumption cost
from all four seasons are presented in Table 7. One day from every season of the year is taken for
calculating the results of the DSM algorithm, i.e., 1st January (Winter), 1st April (Spring), 1st July
(Summer) and 1st October (Autumn).
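The effect of shifting a shift-able portion of the peak load on the daily cost can be illustrated with a small rule-based sketch. This is not the proposed DSM algorithm, whose details are given elsewhere in the document: the function name `shift_load`, the 10% shift-able share, the choice of peak and target hours, and the load and price profiles are all hypothetical values chosen only for illustration.

```python
def daily_cost(load, price):
    """Consumption cost of one day: sum of hourly load times hourly price."""
    return sum(q * p for q, p in zip(load, price))


def shift_load(load, price, peak_hours, target_hours, shiftable_share=0.1):
    """Clip a share of each peak-hour load and spread it evenly over the
    chosen off peak hours; the total energy stays the same."""
    new_load = list(load)
    moved = 0.0
    for h in peak_hours:
        delta = load[h] * shiftable_share
        new_load[h] -= delta
        moved += delta
    for h in target_hours:
        new_load[h] += moved / len(target_hours)
    return new_load, daily_cost(load, price), daily_cost(new_load, price)


# Hypothetical 24-hour profile: flat load with an evening peak and a higher peak price.
peak = [17, 18, 19, 20]
load = [150.0 if h in peak else 100.0 for h in range(24)]
price = [0.20 if h in peak else 0.10 for h in range(24)]
new_load, cost_before, cost_after = shift_load(load, price, peak, [9, 10, 11, 12])
reduction_percent = 100.0 * (cost_before - cost_after) / cost_before
```

The target hours are daytime off peak hours rather than late night hours, mirroring the design choice described above, and the reduction percentage is computed in the same way as the percentage column of Table 7.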
Table 7 Energy consumption cost reduction by the proposed DSM algorithm.
          Consumption Cost / Day ($)     Reduction / Day
Season    Before DSM     After DSM       Amount ($)    Percentage
Spring    483330         475170          8153          1.7%
Summer    793930.5       784403          7527          1.2%
Autumn    417980.5       413770.5        4210          1%
Winter    3347106        3305006         42109         1.3%
[Figure 14: legend entries: SG Load Before DSM, SG Load After DSM.]
Figure 14. Valley filling and peak clipping through Efficient DSM algorithm.
[Figure 15: two panels: (a) Load curve before DSM; (b) Load curve after DSM.]
Figure 15. Effect of the proposed DSM scheme on load profile.
10 Results and Discussion of DE-RUSBoost
In this section the experimental results and discussion are presented. The description of datasets used for
evaluating the proposed model is also given in this section. The problem statement of the DE-RUSBoost
is discussed in section 4.
10.1 Data Description
The State Grid Corporation of China (SGCC) dataset comprises labeled data of 42,372 commercial and
residential electricity consumers. The data is the daily consumption of 1035 days from 1st January 2014
to 31st October 2016. Among the 42,372 consumers, 3615 are labeled as fraudulent, i.e., about 8.5% of
the consumers are electricity thieves. The data contains noisy and missing values that are replaced in
the preprocessing step. This data is published online on China State Grid Corporation’s website  in
10.2 Performance Evaluation
The performance of the proposed model is evaluated using five well-known classification evaluation
metrics, i.e., recall, precision, specificity, accuracy and AUC. In classification models, recall is the
fraction of positive samples that are correctly classified; it is also known as the true positive rate.
Precision, or positive predictive value, is the ratio of correctly classified positive examples to the
total number of examples classified as positive. Specificity is also known as the true negative rate; it
is the fraction of negative examples that are correctly classified. Accuracy is the fraction of all
examples that are correctly classified, whereas AUC is a performance measure whose value is high if both
the true positive and true negative rates are high. The range of all the performance metrics discussed
above is from 0 to 1, where 0 is the worst and 1 is the best value. The classification performance of the
proposed model is compared with two electricity theft detection models from the literature (RUSBoost and
WADCNN). The performance is shown in table 8. The numerical comparison shows the superior
performance of the proposed method. The proposed method has greater accuracy and AUC as compared
to the other models, which indicates its effectiveness in ETD.
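The five measures can be computed from the four confusion matrix counts, as in the sketch below. The function name and the counts are illustrative. The sketch uses the conventional naming in which FN denotes positive samples classified as negative and FP denotes negative samples classified as positive, which may differ from the naming used with the fitness function earlier in the document. The last value is the average of the true positive and true negative rates, which equals the AUC of a classifier evaluated at a single operating point; an AUC computed from a full ROC curve would generally differ.

```python
def classification_metrics(tp, fn, tn, fp):
    """Recall, precision, specificity, accuracy and the single-point AUC."""
    recall = tp / (tp + fn)            # true positive rate
    precision = tp / (tp + fp)         # positive predictive value
    specificity = tn / (tn + fp)       # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    auc_single_point = 0.5 * (recall + specificity)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "auc_single_point": auc_single_point,
    }


# Illustrative counts for an imbalanced test set (100 positives, 1000 negatives).
metrics = classification_metrics(tp=80, fn=20, tn=900, fp=100)
```

In this example the accuracy (about 0.89) looks high even though precision is below 0.45, which shows why accuracy alone is not a sufficient measure on highly imbalanced data.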
10.3 Comparisons and Discussion
The proposed model outperforms the grid search based RUSBoost classifier in terms of accuracy,
precision, specificity and AUC (as shown in table 8). This performance gain is achieved due to the
fine tuned parameters of the classifier. Although the authors of the compared work implement the
RUSBoost classifier, its parameters are selected using grid search. Grid search is an exhaustive search
method that selects the best parameters from a subset of all possible parameter values. It does not
guarantee the selection of the best parameters for the classifier. In the proposed scheme, on the other
hand, the best parameters are selected using the meta-heuristic technique DE. DE selects the best
parameters according to the objective function of maximizing the AUC (as shown in equation 7).
Therefore, the parameters selected by DE are better than the parameters selected by grid search, and the
performance of the proposed method is significantly improved over grid search based RUSBoost. The second
comparative method, Wide And Deep CNN (WADCNN), is also evaluated on the SGCC dataset. It is proposed
to detect electricity theft on highly imbalanced labeled data. This method achieves reasonable
performance; however, the class imbalance problem is not tackled. By properly tackling the class
imbalance problem, the proposed method achieves significantly better performance than WADCNN.
Table 8 Detection accuracy of the DE-RUSBoost model.
Method          Accuracy    Precision    Recall    Specificity    AUC
RUSBoost        0.863       0.736        0.726     0.872          0.762
WADCNN          0.925       0.766        0.792     0.752          0.801
DE-RUSBoost     0.956       0.902        0.735     0.996          0.896
10.4 Parameter Study
The performance of the proposed algorithm is analyzed with multiple values of hyper parameters, i.e.,
imbalance ratio and number of trees.
10.4.1 Effect of Weak Learners’ Number
The number of weak learners impacts the AUC, as shown in figure 16. The weak learners in our case are
decision trees. A small number of trees in RUSBoost results in a low AUC. Increasing the number of trees
has a positive impact on the AUC; however, after a certain number of trees, the AUC becomes
stable. If the number of trees is increased beyond that point, the classifier becomes unstable and
its performance degrades, which results in a lower AUC. In figure 16, the DE-RUSBoost performance
becomes almost stable after 200 trees, and a minor improvement is visible until it achieves the maximum
AUC at 320 trees. There is no significant improvement in AUC after that point; instead, the AUC starts
decreasing. As is clearly visible in the figure, the AUC at 350 trees is slightly lower than at 320 trees,
and after that a drastic degradation in accuracy is seen. The reason behind the performance degradation
with too many weak classifiers is overfitting. The classifier overtrains on the training samples;
therefore, it misclassifies the unseen test samples.
10.4.2 Effect of Class Imbalance Ratio
In figure 17, the impact of class imbalance on the AUC is shown. The percentage shown in this figure
is the percentage of training samples of the majority class (i.e., fair consumers' consumption data)
to the minority class. The class imbalance ratio is optimized using DE to achieve the highest AUC. The
highest AUC is achieved on class imbalance percentage of 56%, 44%. As the class imbalance percentage
increases, the performance degrades. This happens because the minority class samples are
underrepresented. The classifier does not have sufficient training samples of the minority class;
therefore, it is unable to learn and generalize it. After the training, the classifier is biased towards
the majority class and misclassifies the test samples from the minority class. It is clear from figure 17
that the classifier's performance degrades significantly at high class imbalance percentages.
[Figure 16: horizontal axis: Number of Trees (50 to 400).]
Figure 16. Area under the curve versus number of trees.
[Figure 17: horizontal axis: Class Imbalance Percentage, from (50:50)% to (74:26)% in steps of 4.]
Figure 17. Area under the curve versus imbalance percentage of training examples
of two classes.
11 Future Work
In the previous section, the preliminary results are shown. The future work is given below:
• The performance of SISO (DLSTM) and MIMO (ESAENARX and DE-RELM) models will be
validated by conducting multiple case studies and applying them to different scenarios.
• The wind and photovoltaic power will be analyzed in depth in order to quantify their impacts on
greenhouse gas emissions and electricity generation cost.
• A labeling method will be introduced to label the unlabeled electricity consumption data to identify
the fair and malicious electricity consumers.
Tentative Timetable
Sr. No.   Activity                                            Date
1         Background study and detailed literature review     Completed
2         Formulation of problem and proposing solution       Completed
3         Analysis and dissemination of results               April
4         Thesis writing                                      May
Recommendation by the Research Supervisor
Recommendation by the Research Co-Supervisor
Signed by Supervisory Committee
S.#   Name of Committee Member   Designation           Signature & Date
1     Dr. Sohail Asghar          Professor
2     Dr. Nadeem Javaid          Associate Professor
3     Dr. Manzoor Ilahi          Associate Professor
4     Dr. Majid Iqbal            Associate Professor
Approved by Departmental Advisory Committee
Certified that the synopsis has been seen by the members of the DAC, who consider it suitable for
putting up to BASAR.
Departmental Advisory Committee
Dean, Faculty of Information Sciences & Technology
_____________________ Approved for placement before BASAR.
_____________________ Not approved on the basis of the following reasons
_____________________ Approved for placement before BASAR.
_____________________ Not approved on the basis of the following reasons
Dean, Faculty of Information Sciences & Technology
Please provide the list of courses studied
1. Special Topics in Artiﬁcial Neural Networks
2. Advanced Topics in Data Mining
3. Advanced Topics in Computer Vision
4. Special Topics in Machine Learning
5. Advanced Topics in Digital Image Processing
6. Special Topics in Computer Vision