
Data Analytics based Efﬁcient Energy Management in Smart Grids

Ph.D Synopsis

In

Computer Science

By

Sana Mujeeb

CIIT/SP15-PCS-003/ISB

Supervised By

Dr. Manzoor Ilahi Tamimi

Co-Supervised By

Dr. Nadeem Javaid

Communication over Sensors (ComSens) Research Lab

Department of Computer Science, COMSATS University Islamabad

Islamabad - Pakistan

List of Publications

Journal Publications

1. Sana Mujeeb and Nadeem Javaid, “ESAENARX and DE-RELM: Novel Schemes for Big

Data Predictive Analytics of Electricity Load and Price”, Sustainable Cities and Society,

Volume: 51, Article Number: 101642, Pages: 1-16, Published: November 2019, ISSN:

2210-6707. (IF= 4.624, Q1)

2. Sana Mujeeb, Turki Ali Alghamdi, Sameeh Ullah, Ayesha Fatima, Nadeem Javaid and

Tanzila Saba, “Exploiting Deep Learning for Wind Power Forecasting based on Big Data

Analytics”, Applied Sciences, Volume: 9, Issue: 20, Article Number: 4417, Pages: 1-19,

Published: October 2019, ISSN: 2076-3417. (IF=2.217, Q2).

3. Sana Mujeeb, Nadeem Javaid, Manzoor Ilahi, Zahid Wadud, Farruh Ishmanov and Muham-

mad Khalil Afzal, “Deep Long Short-Term Memory: A New Price and Load Forecasting

Scheme for Big Data in Smart Cities”, Sustainability, Volume: 11, Issue: 4, Article Num-

ber: 987, Pages: 1-29, Published: February 2019, ISSN: 2071-1050. (IF=2.592, Q2)

Conference Proceedings/Book Chapters

1. Sana Mujeeb, Nadeem Javaid and Sakeena Javaid, “Data Analytics for Price Forecasting

in Smart Grids: A Survey”, in the 21st International Multitopic Conference (INMIC),

2018, pp: 1-10.

2. Sana Mujeeb, Nadeem Javaid, Sakeena Javaid, Asma Raﬁque and Manzoor Ilahi, “Big

Data Analytics for Load Forecasting in Smart Grids: A Survey”, International Conference

on Cyber Security and Computer Science (ICONCS), 2018, pp: 193-202.

3. Sana Mujeeb, Nadeem Javaid, Rabiya Khalid, Orooj Nazeer, Isra Shaﬁ and Mahnoor

Khan, “Big Data Analytics for Price and Load Forecasting in Smart Grids”, in 13th Inter-

national Conference on Broadband and Wireless Computing, Communication and Appli-

cations (BWCCA), 2018, pp: 77-87, ISBN: 978-3-030-02613-4.

COMSATS University Islamabad, Islamabad Campus

Synopsis for the Degree of PhD

PART-1

Name of Student Sana Mujeeb

Department Department of Computer Science

Registration No. SP15-PCS-003

Date of Thesis Registration 01-07-2016

(i) Research Supervisor: Dr. Manzoor Ilahi

(ii) Co-Supervisor: Dr. Nadeem Javaid

Research Area Data Science

Members of Supervisory Committee

1 Dr. Sohail Asghar

2 Dr. Nadeem Javaid

3 Dr. Manzoor Ilahi

4 Dr. Majid Iqbal

Title of Research Proposal Data Analytics based Efﬁcient Energy Management in Smart Grids

Signature of Student:

Summary of the Research

With the advent of the Smart Grid (SG), the data collected by smart meters and Phasor Measurement Units (PMUs) has become a valuable source for grid operators and researchers to perform advanced analytics. In SG, energy-related data is collected in huge volumes, at high velocity, and from a variety of sources. Data analytics provide solutions to the emerging challenges

of power systems, such as: Demand Side Management (DSM), environmental pollution (due to

carbon emission), fossil fuel dependency mitigation, reliable Renewable Energy Sources (RESs)

incorporation, cost curtailment, and grid stability and security. The global energy demand is increasing with the growing population, while the conventional generation source, i.e., fossil fuel, is depleting continuously. Moreover, environmental pollution is increasing at an alarming rate due to the carbon emissions of conventional generation sources. Therefore, effective DSM and RES incorporation have become important to maintain the demand-supply balance and optimize energy use in an environment-friendly manner. DSM programs are based on future energy consumption and

price predictions. On the other hand, the reliable incorporation of RES is possible if there is a

correct estimation of future generation. For this purpose, Deep Learning (DL) combined with data

analytics techniques are proposed in this research. The aim of this research is to explore SG databases and devise solutions to the aforementioned problems. First, predictive modeling is used to learn consumption patterns from the data, to ensure an uninterrupted power supply. Predictive analytics is also performed on energy price, which is beneficial in formulating effective DSM programs. Moreover, wind power, as a popular RES, is analyzed and predicted. A DSM

algorithm is proposed that considers the day-ahead energy price, consumption and wind power

forecast for energy demand management. This research applies data science techniques to smart grid data and elaborates the benefits of this emerging data for the smart grid.

1 Introduction

The Smart Grid (SG) is a modern and intelligent power grid that efficiently manages the generation, distribution and consumption of electricity [1]. It introduces communication, sensing and control technologies into power grids and serves consumers in an economical, reliable, sustainable and secure manner. Consumers can manage their energy demand in an economical fashion

based on Demand Side Management (DSM). The DSM program allows customers to manage their load demand according to price variations [2]. It offers energy consumers load shifting and energy preservation options in order to reduce the cost of power consumption. The smart grid establishes an interactive environment between energy consumers and the utility. Customers partake in smart grid operations, reducing prices through load shifting and energy preservation.

Competitive electricity markets benefit from load and price forecasts. Several important operat-

ing decisions are based on load forecasts, such as power generation scheduling, demand supply

management, maintenance planning and reliability analysis. Price forecast is crucial to energy

market participants for bidding strategies formulation, assets allocation, risk assessment and

facility investment planning. Effective bidding strategies help market participants in maximiz-

ing proﬁt. Utility maximization is the ultimate goal of both power producers and consumers.

With the help of a robust and exact price estimate, power producers can maximize proﬁt and

consumers can minimize the cost of their purchased electricity. Efficient generation and consumption are another crucial issue in the energy sector. Most of the generated electricity

cannot be stored, therefore, a perfect equilibrium is necessary to be maintained between the

generated and consumed electricity. Therefore, an accurate forecast of both electricity load and

price holds a great importance in market operations management.

Electricity load and price have a relationship of direct proportionality. However, some unexpected variations are observed in the price data, and there are various reasons for these unexpected changes in the price pattern. In reality, the price is not affected only by changes in load; several other parameters influence the energy price, such as fuel price, availability of inexpensive generation sources (e.g., photovoltaic generation, windmill generation, etc.) and weather conditions.

Energy data exhibits a few characteristics: (i) data as energy: data analytics should lead to energy savings; (ii) data as exchange: energy data should be exchanged and integrated with other data sources to realize its value; (iii) data as empathy: data analytics should help improve the service quality of energy utilities [3].

For making decisions regarding energy market operations, predictive analytics is performed on this load and price data. For maintaining the demand-supply balance, an accurate prediction of load is essential, whereas price forecasting plays an important role in the bidding process and energy trading. Data analytics enables the identification of hidden patterns, consumer preferences, market trends, and other valuable information that helps utility companies make strategic business decisions. The size of real-world historical smart grid data is very large [4]. The authors of [5] survey smart grid data in great detail. This large volume of data enables energy utilities to perform novel analyses leading to major improvements in the planning and management of market operations. Utilities can gain a better understanding of consumers' behavior, demand, consumption, power failures, downtimes, etc.

Smart grid provides energy in an efﬁcient, secure, reliable, economical and environment-friendly

manner. RESs are integrated into power generation to reduce carbon emissions. The smart grid allows two-way communication between consumers and the utility. With the emergence of smart metering infrastructure, consumers are informed about the per-unit price in advance and can adjust their demand economically according to the price signals; they can reduce consumption cost by shifting load to low-price hours. Smart grids create a price-responsive environment where the price varies with changes in demand and vice versa.
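The load-shifting idea above can be sketched in a few lines. This is a minimal, hypothetical Python illustration: the prices, hourly loads and the single flexible load are made-up assumptions, not a DSM algorithm from this work.

```python
# Hypothetical sketch of price-responsive load shifting: a flexible load of
# 2 kWh is moved from the most expensive hour to the cheapest one.
# Prices and loads below are illustrative values, not real market data.

def shift_flexible_load(prices, loads, flexible_kwh):
    """Move up to `flexible_kwh` from the priciest hour to the cheapest hour."""
    hi = max(range(len(prices)), key=lambda h: prices[h])
    lo = min(range(len(prices)), key=lambda h: prices[h])
    shifted = list(loads)
    moved = min(flexible_kwh, shifted[hi])
    shifted[hi] -= moved
    shifted[lo] += moved
    return shifted

prices = [0.10, 0.25, 0.40, 0.15]   # $/kWh, day-ahead price signal
loads = [3.0, 2.0, 4.0, 1.0]        # kWh demanded per hour
before = sum(p * l for p, l in zip(prices, loads))
after_loads = shift_flexible_load(prices, loads, flexible_kwh=2.0)
after = sum(p * l for p, l in zip(prices, after_loads))
print(round(before - after, 2))     # cost saved by shifting
```

Real DSM schemes optimize over many appliances and comfort constraints; this sketch only shows why a price signal makes shifting worthwhile.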

In unidirectional grids, there is only a one-way interaction from the generation side to consumers, which leads to inefficient energy management.

NOMENCLATURE

Abbreviations:
ABC: Artificial Bee Colony
AEMO: Australia Electricity Market Operators
ANN: Artificial Neural Networks
ARIMA: Auto-Regressive Integrated Moving Average
CNN: Convolution Neural Networks
DNN: Deep Neural Networks
DSM: Demand Side Management
DE: Differential Evolution
DWT: Discrete Wavelet Transform
ELM: Extreme Learning Machine
ISONE: Independent System Operator New England
LSSVM: Least Square Support Vector Machine
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
NARX: Nonlinear Autoregressive Network with Exogenous inputs
NRMSE: Normalized Root Mean Square Error
NYISO: New York ISO
RNN: Recurrent Neural Network
SAE: Sparse Auto-Encoders
STLF: Short-Term Load Forecast
WNN: Wavelet Neural Network
WPT: Wavelet Packet Transform

Symbols:
a: activation value of a neuron
ρ: average activation of sparse parameter
b: bias
C: cost function of SAE
δ: delta function
b: deviation matrix
ε_t: error term of NARX
x: input vector to network
α: learning rate
β: momentum of network
y: network output or forecast value
n: number of delays in NARX
h: quadrature mirror filter
c_{j,k}: scale coefficient
σ: sigmoid function
E: squared error
λ: threshold for wavelet denoising
t: time step
d_{j,k}: wavelet coefficient
ω: wavelet decomposed signal
W: weight of network connection
S_y: white noise

Price and demand forecasting play an important role in energy systems planning, market design, security of supply, and operational planning for future power consumption. An accurate forecast is very important: a 1% reduction in the Mean Absolute Percentage Error (MAPE) of a load forecast reduces the generation cost by 0.1% to 0.3% [6], and a 0.1% reduction in generation cost amounts to approximately $1 million annually in a large-scale smart grid. Due to the importance of accurate electricity price and load forecasts, researchers are still competing to improve forecast accuracy.
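As a back-of-the-envelope check of the figures in [6], the calculation can be written out in Python together with the MAPE definition itself. The $1 billion annual generation cost is an assumed magnitude for a large-scale grid, not a value from the source.

```python
# MAPE definition and a rough check of the cost-saving arithmetic above.

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

annual_generation_cost = 1_000_000_000   # USD, assumed magnitude
saving_per_mape_point = 0.001            # 0.1% cost cut per 1% MAPE reduction
annual_saving = annual_generation_cost * saving_per_mape_point
print(f"${annual_saving:,.0f}")          # roughly $1 million, matching [6]
```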

Due to the continuous depletion of fossil fuels, the energy crisis has become severe [7,8]. To mitigate the energy crisis, regulatory acts that encourage the utilization of renewable energy are promoted worldwide. Recently, wind power has attracted a lot of attention as a RES. Wind power has gained popularity due to its wide availability, low investment cost [9] and zero carbon emission. Wind power helps in reducing environmental pollution [10] and is introduced worldwide as a way to reduce greenhouse gas emissions. Moreover, wind power generation leads to fuel cost savings, as wind has zero fuel cost. According to the Global Wind Energy Council [11], the cumulative capacity of wind power reached 486 GW across the global market in 2016. Wind power is expected to expand significantly, leading toward an overall zero-emission power system. The U.S. Department of Energy's renewable integration target is to provide 20% of total energy through wind by the year 2030 [12]. In this regard, the Independent System Operators (ISOs) are producing significant wind power and increasing their wind generation.

Wind power is mainly affected by meteorological conditions, especially wind speed. Wind power exhibits strongly volatile and intermittent behavior, resulting in uncertain power output. This uncertainty significantly affects the quality of power system operations, such as distribution, dispatching, peak load management [13], etc. The greatest challenge in adopting wind power on a large scale is controlling its uncertain output. The effective solution to this issue is a correct estimate of future wind power. Correct Wind Power Forecasting (WPF) helps in improving the operational scheduling of power systems: the operating schedules for backup generators and storage systems are optimized based on accurate WPF. The accuracy of WPF determines the amount of cost curtailment for power generation [14]. A 1% improvement in WPF accuracy results in a 0.06% reduction in the generation system's cost, which amounts to approximately $6 million in savings in a large-scale power system with a 30% wind penetration level [15].

It is widely acknowledged that accurate WPF significantly reduces the risks of incorporating wind power into power supply systems [16]. Generally, WPF results are in deterministic form (i.e., point forecasts), and reducing the forecasting errors of WPF is the focus of many researchers [17]. A point forecast is an estimated value of future wind energy. However, wind power is a random variable with a Probability Density Function (PDF), and point forecasts are unable to capture the uncertainty of this random variable. This limitation means point forecasts have limited use in the stability and security analysis of power systems. To overcome it, deep learning methods are widely used in the field of WPF: Deep Neural Networks (DNNs) have the inherent ability to automatically model wind power characteristics [18].
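The contrast between a point forecast and an uncertainty-aware forecast can be illustrated with a toy ensemble of wind power samples. The sample values and the nearest-rank quantile rule below are illustrative assumptions, not the thesis's forecasting method.

```python
# A point forecast collapses an uncertain quantity to a single number;
# an interval built from quantiles retains some of the uncertainty.
samples = sorted([42.0, 55.0, 38.0, 61.0, 47.0, 50.0, 44.0, 58.0, 40.0, 53.0])

def quantile(sorted_xs, q):
    """Nearest-rank quantile of an already-sorted sample."""
    idx = min(int(q * len(sorted_xs)), len(sorted_xs) - 1)
    return sorted_xs[idx]

point = sum(samples) / len(samples)                            # point forecast (MW)
lower, upper = quantile(samples, 0.1), quantile(samples, 0.9)  # 80% interval
print(point, (lower, upper))
```

The interval, unlike the single point, conveys how widely the plausible wind power outcomes spread, which is what stability and security analyses need.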

Using large data for predictive analytics improves forecasting accuracy [19]. Electricity data is big data, as smart meters record data at small time intervals; in a large-sized smart grid, approximately 220 million smart meter measurements are recorded daily. As the volume of input data increases, training classical forecasting methods becomes difficult, and processing big data with classifier-based models is very difficult because of their high space and time complexity. On the other hand, DNNs perform very well on big data [20]. DNNs have an excellent ability for self-learning and nonlinear approximation. They optimize memory usage by dividing the training data into mini-batches; the whole dataset is then trained batch by batch.
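The mini-batch idea described above can be sketched as follows; the batch size and stand-in data are illustrative assumptions.

```python
# Sketch of mini-batch training input: the dataset is split into fixed-size
# batches and consumed one batch at a time, so the whole dataset never has
# to be processed (or held in working memory) at once.

def mini_batches(data, batch_size):
    """Yield successive batches of `batch_size` items (last may be smaller)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

samples = list(range(10))            # stand-in for training examples
batches = list(mini_batches(samples, batch_size=4))
print(batches)                       # three batches: 4, 4 and 2 items
```

In an actual DNN training loop, one gradient update would be computed per yielded batch.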

2 Research Objectives

The major objective of this research is efficient, economical and environment-friendly energy management using data analytics techniques. This includes reducing environmental pollution, saving energy generation and consumption costs, mitigating the dependency on fossil fuels for generation, and minimizing energy losses, whether technical (due to over-generation) or non-technical (caused by electricity theft). The aims of this research work are:

1. To utilize past data of power systems and deep learning techniques for accurately forecasting the electricity consumption and price.

2. To forecast wind power generation accurately in order to ensure its reliable integration in

power systems to make them eco-friendly.

3. To detect electricity theft by analyzing past consumption data for minimizing the non-

technical loss of power.

4. To forecast photovoltaic power and quantify RESs’ impact on carbon emissions, energy

price and cost.

3 Related Work

The imbalance between energy demand and supply causes energy scarcity. To reduce this scarcity and utilize energy efficiently, DSM and Supply Side Management (SSM) techniques are proposed. Researchers mostly focus on appliance scheduling to reduce the load on the utility and balance supply and load. However, with appliance scheduling, user comfort is compromised [21]-[34]. Therefore, Short-Term Load Forecasting (STLF) is important: STLF enables the utility to generate sufficient electricity to meet the demand.

Several forecasting methods are available in the literature, from classic statistical to modern machine learning methods. Generally, forecasting models can be divided into three major categories: classical, artificial intelligence and data-driven. Classical methods are statistical and mathematical, such as Auto-Regressive Integrated Moving Average (ARIMA), Seasonal ARIMA (SARIMA), Naive Bayes, Random Forest, etc. Artificial intelligence methods include Artificial Neural Networks (ANNs), Particle Swarm Optimization (PSO), etc. Classifier-based approaches are widely used for forecasting, such as Sperm Whale Algorithm (SWA) + Least Square Support Vector Machine (LSSVM) [35], SVM + PSO [36], Flexible Wavelet Packet Transform (FWPT) + LSSVM + Time-Varying Artificial Bee Colony (TV-ABC) [37], and Differential Evolution (DE) + SVM. Although the aforementioned methods show reasonable results in load or price forecasting, they are computationally complex.

The existing forecasting methods mostly forecast only load or price. A forecasting method that can accurately forecast both load and price together is greatly needed. Conventional forecasting methods in the literature have to extract the most relevant features with great effort [38]-[41] before forecasting; for feature extraction, correlation analysis or other feature selection techniques are used. ANNs, in contrast, have an advantage over other methods: they automatically extract features from data and learn complex and meaningful patterns efficiently. However, Shallow ANNs (SANNs) [42]-[44] tend to over-fit, and optimization is required to improve the forecast accuracy of a SANN.

Recently, DNNs have shown promising results in forecasting electricity load [45]-[55], price [56]-[61] and generation [62]. In [45], the authors used a Restricted Boltzmann Machine (RBM) with pre-training and Rectified Linear Units (ReLU) to forecast day- and week-ahead load; the RBM produces a more accurate forecast than ReLU. Deep Auto-Encoders (DAE) are implemented in [46] for predicting a building's cooling load. DAE is an unsupervised learning method that learns the pattern of the data very well and predicts with greater accuracy. The authors of [47] implement Gated Recurrent Units (GRU), a type of Recurrent Neural Network (RNN), for price forecasting; the GRU outperforms Long Short-Term Memory (LSTM) and several statistical time series forecasting models. The authors of [48] proposed a hybrid model for price forecasting in which two deep learning methods are combined: Convolution Neural Networks (CNN) are used to extract useful features, and an LSTM forecasting model is trained on the features extracted by the CNN. This hybrid model performs better than both CNN and LSTM alone and outperforms several state-of-the-art forecasting models. The good performance of the aforementioned DNN models proves the effectiveness of deep learning in forecasting. A brief description of the related work is listed in Table 1.

In smart grid, data analysis helps in ﬁnding the trend of electricity consumption [45]-[50] and

price [56]-[61]. This further enables the utility to design predictive demand supply mainte-

nance programs. The demand-supply maintenance programs ensure the demand-supply bal-

ance. Smart grid data is studied for: power system anomaly detection [63], optimal placement

of computing units for communicating data to smart grid [64], price forecasting [41] and con-

sumption forecasting [65]-[67]. The aforementioned methods show reasonable results in load or

price forecasting; however, most of these methods do not consider the forecasting of both load

and price. The classiﬁer based forecasting methods require extensive feature engineering and

model optimization, resulting in high complexity.

To ensure the reliability, stability and security of the smart grid, accurate forecasts of electricity load and price are essential. Electricity load and price have a bi-directional relationship; therefore, simultaneous prediction of load and price yields greater accuracy.

In paper [21], the authors have predicted price and load simultaneously using a multi-stage forecasting approach. The complex approach proposed in this work comprises feature selection and a multi-stage forecast engine. Features are selected through a modified Maximum Relevancy Minimum Redundancy (MRMR) method, and electricity load and price are forecast using a multi-block ANN known as the Elman Neural Network (ENN). The forecasting model is optimized by a shark smell optimization method.

Table 1: Related Work

Task | Platform / Scenario | Dataset | Algorithms
Load and price forecasting [21] | Hourly data of 6 states of the USA | NYISO, 2015 | MRMR, multi-block Elman ANN, enhanced shark smell optimization
Load forecast [35] | Historic load | Sichuan Energy Internet Research Center dataset, 2015–2016 | Sperm Whale Algorithm, wavelet Least Square SVM, wavelet transform, inconsistency rate model
Load forecast [36] | Half-hourly consumption | NYISO, NSW, 2007 | Hybrid EMD, PSO, GA, SVR
Price and load forecast, DSM [37] | Historic price and load | Hourly load and price of (i) NYC, (ii) PJM, (iii) NYC, 2010, 2013, 2014 | FWPT, NLSSVM, ARIMA, TV-ABC
Consumption forecasting [38]* | 6-second resolution consumption of 5 homes with 109 domestic appliances | UK-Dale, 2012–2015 | Association rule mining, incremental k-means clustering, Bayesian network
Price forecasting [39] | Hourly price of 5 hubs of MISO | USA, 2012–2014 | Stacked Denoising Auto-encoders (SDA)
Consumption forecasting [40] | Aggregated hourly load of four regions | Los Angeles, California, Florida, New York City, USA, August 2015–2016 | SDA, SVR
Consumption forecasting [41] | Electricity market data of 3 grids: FE, DAYTOWN, and EKPC | PJM, USA, 2015 | Mutual Information (MI), ANN
Consumption forecasting [42] | Electricity market data of 2 grids: DAYTOWN and EKPC | PJM, USA, 2015 | Modified MI + ANN
Price forecasting [43] | Half-hourly price of PJM | Intercontinental Exchange, USA | LSTM, CNN
Price forecasting [44] | Turkish day-ahead market electricity prices | Turkey, 2013–2016 | RNN
Cooling load forecasting [45] | Cooling load of an educational building | Hong Kong, 2015 | Elastic Net, SAE, RF, MLR, Gradient Boosting (GB) machines, extreme GB tree, SVR
Consumption forecasting [46] | Hourly load of Korea Electric Power Corporation | South Korea, 2012–2014 | RBM
Consumption forecasting [47] | Individual house consumption within 7 km of Paris | Individual household electric power consumption, France, 2006–2010 | Conditional RBM (CRBM), factored CRBM
Load forecasting [48] | 15-minute resolution of one retail building | Fremont, CA | SAE, Extreme Learning Machine (ELM)
Load forecasting [49] | 15-minute cooling consumption of a commercial building in Shenzhen city | Guangdong province, South China, 2015 | Empirical Mode Decomposition (EMD), Deep Belief Networks (DBN)
Load forecasting [50] | Hourly consumption from Macedonian Transmission Network Operator | Republic of Macedonia, 2008–2014 | DBN
Load forecasting [56] | Hourly consumption from Australia | AEMO, 2013 | EMD, DBN
Load forecasting [57]* | Hourly consumption of a public safety building, Salt Lake City, Utah; aggregated hourly consumption of residential buildings, Austin, Texas | USA, 2015, 2016 | LSTM
Load forecasting [58]* | Half-hourly metropolitan electricity consumption | France, 2008–2016 | LSTM, GA
Load forecasting [63] | Hourly aggregated consumption of 6 states of the USA | ISO NE, 2003–2016 | XGBoost weighted k-means, EMD-LSTM
Load forecasting [64] | Ireland consumption | Smart meter database of load profiles, Ireland | Pooling deep RNN
Load forecasting [65] | Eight buildings of a public university | 15-minute consumption, 2011–2017 | k-means clustering, Davies–Bouldin distance function
Consumption and peak demand forecasting [66]* | Entertainment venues of Ontario | Daily, hourly and 15-minute energy consumption, 2012–2014 | ANN, SVR
Demand forecasting [67] | 21 zones of the USA | Temperature, humidity and consumption data, 2004–2007 | Recency effect model without computational constraints
Wind power forecasting [68] | 5-minute interval past wind power | SIWF wind farm, China, 2011–2013 | Wavelet transform, ensemble CNN
Wind power forecasting [69] | Wind speed, wind direction, temperature, humidity, pressure | MADE wind farm, ITER, Tenerife Island, Spain | Feed-forward ANN, SELU CNN, RNN
Wind power forecasting [70] | Wind power, weather forecasts | 5 wind farms of Europe | Mutual Information, deep autoencoders, Deep Belief Network

*Papers have a medium-term forecasting horizon; the rest are short-term.

This method results in a reasonable forecasting accuracy. However, it is computationally very

expensive. The feature engineering process and optimization of ENN increase complexity.

The authors of paper [22] have conducted a predictive analysis of electricity price forecasting, taking advantage of big data. The relevant features for training the prediction model are selected through an extensive feature engineering process. This process has three steps: first, correlated features are selected using GCA; second, a hybrid of two feature selection methods, ReliefF and Random Forest (RF), is used for further feature selection; last, Kernel PCA (KPCA) is applied for dimension reduction. Price is predicted by SVM, whose hyper-parameters are optimized through a modified DE. Although this framework achieves acceptable accuracy in price forecasting, the extensive feature engineering and model optimization increase the computational complexity. In paper [38], the authors forecast energy consumption on big data. An analysis of frequent patterns is performed using a supervised clustering method, and energy consumption is forecast by a Bayesian network. The authors of paper [39] have utilized the computational power of deep learning for Electricity Price Forecasting (EPF). SDA and RANSAC-SDA (RS-SDA) models are implemented for online and day-ahead hourly EPF. Three years of data (i.e., January 2012 to November 2014) are utilized in this research, collected from the Texas, Arkansas, Nebraska, Indiana and Louisiana ISO hubs in the USA. Comprehensive analyses of the capabilities of the RS-SDA and SDA models in EPF are performed, and the effectiveness of the proposed models is validated through comparative analyses with classical ANN, SVM, and multivariate adaptive regression splines. Both the SDA and RS-SDA models are able to accurately predict electricity price with a considerably lower MAPE than the aforementioned models.
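The first stage of such feature engineering pipelines, selecting features by their correlation with the target, can be sketched as below. The threshold, the toy data and the use of Pearson correlation (in place of GCA) are assumptions for illustration; the ReliefF/RF and KPCA stages are omitted.

```python
# Illustrative correlation-based feature filter: keep only features whose
# absolute correlation with the target price clears a threshold.
# All numbers here are made up for the sketch.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def correlation_filter(features, target, threshold=0.5):
    """Return names of features with |correlation| >= threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]

features = {
    "load":        [1.0, 2.0, 3.0, 4.0],   # strongly tied to price
    "fuel_price":  [2.0, 2.1, 2.9, 4.2],
    "day_of_week": [1.0, 2.0, 1.0, 2.0],   # essentially noise here
}
price = [10.0, 20.0, 30.0, 40.0]
print(correlation_filter(features, price))  # drops the weakly correlated feature
```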

A deep learning model for STLF is proposed by Tong et al. [40]. Features are extracted using SDA from the historical electricity load and corresponding temperature data, and a Support Vector Regressor (SVR) model is trained for day-ahead STLF. The SDA effectively extracts abstract features from the data, and the SVR model trained on these extracted features forecasts electricity load with low errors. The proposed model outperforms simple SVR and ANN in terms of forecasting accuracy, which validates its performance. Shallow ANNs (SANNs) are utilized for electricity load forecasting in [41] and [42]. SANNs have the problem of over-fitting; to avoid it, hyper-parameter optimization is required, which increases the complexity of the forecasting model. A hybrid deep learning method is applied to forecast price in [43]. Two deep learning methods are combined in this work: features are extracted by a CNN, and the short-term energy price is predicted using an LSTM. Half-hourly price data of PJM 2017 is used for prediction; the previous 24 hours of price are used to predict the next 1-hour electricity price. The hybrid DNN structure has 10 hidden layers: 2 convolution layers, 2 max-pooling layers, 3 ReLU layers, 1 batch normalization layer, 1 LSTM layer for prediction, and a fully connected layer as the last hidden layer. The CNN feature extractor has 7 hidden layers and the LSTM predictor has 3 hidden layers; the output of the 7th hidden layer of the feature-extractor CNN becomes the input of the LSTM predictor. The proposed method outperforms simple CNN, LSTM and various machine learning methods. The authors of [44] have utilized GRUs in an RNN for EPF.
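The input/output windowing used by such models (a history of past prices predicting the next price) can be sketched as a sliding window over the series. The window length and the toy price series are illustrative assumptions, not the actual configuration of [43].

```python
# Sliding-window dataset construction: each training pair maps a fixed-length
# history of past prices to the value that immediately follows it.

def make_windows(series, history):
    """Pair each `history`-length window with the next value in the series."""
    inputs, targets = [], []
    for t in range(history, len(series)):
        inputs.append(series[t - history:t])
        targets.append(series[t])
    return inputs, targets

prices = [30, 28, 27, 35, 40, 38, 33, 31]   # toy half-hourly prices ($/MWh)
X, y = make_windows(prices, history=4)
print(len(X), X[0], y[0])                   # 4 pairs; first window predicts 40
```

The resulting (X, y) pairs are what a CNN-LSTM or any other supervised forecaster would be trained on.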

Recently, deep learning forecasting methods have shown good performance in electricity price [43]-[45] and load forecasting [46]-[67]. However, the interdependency of load and price is not considered in these DNN forecasting models. Deep learning is an effective technique for big data analytics [20]: with high computational power and the ability to model huge data, DNNs give deeper insights into the data. In [20], the authors perform a comprehensive and detailed survey on the importance of deep learning techniques in the area of big data analytics.

4 Problem statement

To manage electricity efficiently, its wastage should be minimized in a reliable, economical and environment-friendly manner. Electricity is wasted due to over-generation and electricity theft. These two problems can be solved using data analytics: through accurate electricity demand forecasting, over-generation can be avoided, and if electricity theft is detected efficiently, the loss can be compensated. Efficient DSM results in cost-effective and environment-friendly energy management. Therefore, accurate electricity demand forecasting, price forecasting, generation forecasting and theft detection lead to an efficient power system.

The authors of [22] perform predictive analytics of electricity price using a hybrid framework. The extensive feature engineering process increases the computational complexity, and the Differential Evolution based optimized SVM tends to over-fit, which results in low forecasting accuracy.

To avoid the extensive feature engineering process, deep learning forecasting methods are proposed. Kong et al. [71] present a DNN, the LSTM, as an individual and aggregated residential Load Forecasting (LF) model. The LSTM's weights and biases are randomly initialized; therefore, it has the problems of a slow learning rate, over-fitting and a high error rate. In [72], the authors have presented a novel prior and posterior probability based Bayesian DNN (BDNN) for residential net LF. It suffers from a few major limitations, such as the hand-crafting of input features, which requires domain knowledge; the model's high sensitivity to the initial prior, which is difficult to select; and parameter optimization through Grid Search (GS), which does not guarantee the selection of the most appropriate parameter values. Due to these limitations, the accuracy is negatively affected. Ye et al. in [73] propose a temporal and Spatial LF (SLF) model using a Sparse Autoencoder (SAE) based feature extractor and a Softmax forecaster. The time complexity of the SLF's training is very high, ranging from 22 minutes to 62 minutes, and the error rate is also very high, which proves the inefficiency of the forecasting model*. In [74], the authors simultaneously predict load and price using a multi-stage forecasting approach. It is computationally very expensive: the feature engineering process and the optimization of the Elman Neural Network increase complexity. Incorporating the inherent bi-directional relation of electricity load and price into the prediction models' inputs results in higher prediction accuracy compared to separate forecasting; however, the correlation of electricity load and price is not taken into consideration in [74].

In [68], the authors propose a univariate wind power prediction model that suffers from low prediction accuracy because it does not take into account the exogenous variables that impact wind power, such as wind speed, wind direction and hour. The authors in [69] use exogenous input variables for wind power forecasting; however, inefficient feature engineering results in low forecasting accuracy.

In [75], a deep CNN based electricity theft detection model is proposed. The authors do not take into account the inherent data imbalance during detection, which drastically decreases the theft detection rate. The authors in [76] tackle the data imbalance issue and propose an ensemble model named Random Under Sampling Boosting (RUSBoost) for Electricity Theft Detection (ETD). The detection accuracy is improved over [75]. However, the false detection rate is high because the model's parameters are not optimized efficiently. GS is used for parameter optimization, which is computationally complex and does not guarantee the best parameters; consequently, the detection accuracy is affected negatively.

*The solution of electricity load and price forecasting is discussed in detail in section 5.1. Its results are discussed in section 7.

*The ANN autoencoder based simultaneous electricity load and price forecasting models are discussed in sections 5.2 and 5.3. Results of both models are elaborated in section 8.

*The details of the enhanced WPF model EDCNN are presented in section 5.4. The simulation results proving the efficiency of the EDCNN are discussed in section 9.

*The enhanced algorithm for electricity theft detection, DE-RUSBoost, is presented in section 5.5. The experimental results and analysis are given in section 10.

5 Proposed System Model

Figure 1 illustrates the proposed system for efficient energy management that minimizes the power system's losses caused by over-generation and electricity theft. The first layer of the model shows the data acquisition. The raw data is pre-processed in the second layer. In the third layer, the data is processed. The results obtained from the third layer are sent to the first layer, where power systems make decisions based on these results. In the second and third layers, data analytics are performed in order to achieve useful results that help in the improvement of power systems' operational planning. There are three modules in the system model. The first module (discussed in sections 5.1, 5.2 and 5.3) is for predictive analytics of electricity price and load. In this module, three deep learning based models are proposed; i.e., Deep LSTM (DLSTM) (model 5.1, for univariate data), Efficient Sparse Auto-encoder Nonlinear Autoregressive Network with Exogenous inputs (ESAENARX) (model 5.2, for multivariate data) and Differential Evolution Recurrent ELM (DE-RELM) (model 5.3, for multivariate data). In the second module (model 5.4, discussed in section 5.4), for predictive analytics of wind power, a deep learning based model named Efficient Deep Convolution Neural Network (EDCNN) is proposed. In the third module (model 5.5, discussed in section 5.5), for electricity theft detection, an ensemble model named Differential Evolution Random Under Sampling Boosting (DE-RUSBoost) is proposed. The proposed model is discussed below.

[Figure 1 shows the three-layer architecture. Layer 1: Data Acquisition — suppliers and consumers in the smart grid, smart community and microgrid with RES. Layer 2: Data Pre-Processing — data cleaning (missing values' interpolation, normalization) and feature engineering (extraction with the enhanced sparse autoencoder; selection with Spearman correlation analysis). Layer 3: Data Processing — load and price forecasting (DLSTM, DE-RELM, ESAENARX; models 5.1, 5.2 and 5.3), wind power forecasting (EDCNN; model 5.4) and electricity theft detection (DE-RUSBoost; model 5.5).]

Figure 1. Schematic diagram of proposed model.

5.1 DLSTM Single Input Single Output (SISO) Model for Electricity Load and

Price Forecasting

The proposed method comprises four main parts: preprocessing of data, training of the LSTM network, validation of the network and forecasting on the test data [77]. The system model is shown in figure 2. The problem statement of this model is stated in section 4. The steps in the proposed model are listed as follows:

1. The historical price and load vectors are p and l, respectively, which are normalized as:

p_nor = (p − mean(p)) / std(p)   (1)

Where p_nor is the vector of normalized prices, mean(·) is the function to calculate the average and std(·) is the function to calculate the standard deviation. This normalization is known as zero mean unit variance normalization. The price data is split month-wise. The data is divided into three partitions: train, validation and test.

2. The network is trained on the training data and tested on the validation data. The NRMSE is calculated on the validation data.

3. The network is tuned and updated on the actual values of the validation data.

4. The upgraded network is tested on the test data, where day-ahead, week-ahead and month-ahead prices and load are forecast. The forecaster's performance is evaluated by calculating the NRMSE.
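The normalization and partitioning of step 1 can be sketched in Python. The helper names and the 70/15/15 split ratios below are illustrative assumptions; the synopsis does not state the exact DLSTM split ratios.

```python
import statistics

def zscore_normalize(series):
    """Zero mean unit variance normalization of a price/load vector (equation 1)."""
    mu = statistics.mean(series)
    sd = statistics.pstdev(series)  # population standard deviation
    return [(v - mu) / sd for v in series]

def split_partitions(series, train=0.7, validate=0.15):
    """Split a series into train / validation / test partitions (ratios illustrative)."""
    n = len(series)
    i, j = int(n * train), int(n * (train + validate))
    return series[:i], series[i:j], series[j:]

prices = [21.0, 23.5, 19.8, 25.1, 22.4, 20.9, 24.2, 18.7, 23.0, 21.6]
p_nor = zscore_normalize(prices)
train, val, test = split_partitions(p_nor)
print(round(statistics.mean(p_nor), 10))  # mean is ~0 after normalization
```

After normalization, each monthly series would be fed to the forecast engine in the same way.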

[Figure 2 shows the forecasting pipeline: historic hourly load and price data from the electricity market is preprocessed (normalization, training data preparation) and passed to the forecast engine, which outputs the predicted load and price.]

Figure 2. Overview of DLSTM SISO model for electricity load and price forecasting.

5.1.1 Data Preprocessing

Hourly data of regulation market capacity clearing price and system load is acquired from ISO NE. The ISO NE data covers the period from January 2011 to March 2018: price and load of 7 complete years, i.e., 2011 to 2017, and only three months of data for the year 2018, i.e., January to March. The data is divided month-wise. For example, the data of January 2011, January 2012, ..., January 2018 are combined, and all twelve months' data is combined in the same fashion. The DLSTM network is trained on the month-wise data. The data is partitioned into three parts: training, validation and test data.

5.1.2 Working of DLSTM

The DLSTM network works on the train and update state method [78, 79]. At a time step t, the network learns a value of the price or load time series and stores a state. On the next time step, the network learns the next value and updates the state of the previously learned network. All data is learned in the same fashion to train the network. While testing, the last value of the training data is taken as the initial input. One value is predicted at a time step t. This predicted value is then made part of the training data and the network is trained and updated. Every predicted value is made part of the training data to predict the next value. For example, if the network dlstm_n is learned on n values, the nth value is the input to predict the (n+1)th value. After predicting the (n+1)th value, the network dlstm_{n+1} is trained and updated on n+1 values to predict the (n+2)th value. The (n+1)th value is the first value predicted by the initially learned network dlstm_n. To predict m values, the network will train and update m times. After predicting m values, the last trained and updated network dlstm_{n+m} is trained on n+m values, i.e., n, n+1, n+2, ..., n+m−1. The step by step flow of the proposed method is shown in the flowchart, figure 3.
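The train-and-update cycle described above can be sketched with a placeholder one-step forecaster. The naive `fit`/`predict_next` pair below stands in for the actual DLSTM and is purely illustrative; only the iterative structure mirrors the text.

```python
def fit(history):
    """Stand-in for DLSTM training: 'learns' the series by remembering it.
    A real implementation would update the LSTM weights incrementally."""
    return list(history)

def predict_next(model):
    """Stand-in one-step-ahead forecast: average of the last two learned values."""
    return (model[-1] + model[-2]) / 2.0

def iterative_forecast(series, m):
    """Predict m future values; each prediction is appended to the training
    data and the model is re-trained (dlstm_n -> dlstm_{n+1} -> ...)."""
    history = list(series)
    forecasts = []
    for _ in range(m):
        model = fit(history)          # train/update on n, n+1, ... values
        y_hat = predict_next(model)   # one value predicted at this time step
        forecasts.append(y_hat)
        history.append(y_hat)         # predicted value joins the training data
    return forecasts

print(iterative_forecast([1.0, 2.0, 3.0, 4.0], m=3))  # [3.5, 3.75, 3.625]
```

Each forecast becomes an input for the next training round, exactly as dlstm_{n+m} is trained on n+m values.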

5.1.3 Network Training and Forecasting

Training, validation and test data is obtained by preprocessing the data. The price and load data is fed to the DLSTM network for training. The proposed DLSTM has five layers, i.e., an input layer, two LSTM layers, a fully connected layer and the regression output layer. The number of hidden units in LSTM layer 1 is 250, and in LSTM layer 2 it is 200. The final number of hidden units is decided after experimenting with different numbers of hidden units and keeping the number of hidden units with the least forecast error. During the training process of DLSTM, the network predicts step-ahead values at every time step. The DLSTM learns the patterns of the data at every time step t and updates the network trained up to the previous time step t−1. Every predicted value is made part of the whole data for the next prediction. In this manner, the network is trained iteratively. The DLSTM network is trained for the price and load data separately. The network trained on the training data is the initial network. The initial network is tested on the validation data, where it forecasts step-ahead values. After taking the forecast results from the initial network, the NRMSE is calculated. The initial network re-learns and re-tunes on the actual values of the validation data until the NRMSE reduces to a desired level. Then the final, tuned network is used to forecast the price and load.

[Figure 3 shows the flow: historical load and price data are normalized and prepared month-wise, then split into training (Xt), validation (Xv) and testing (Xs) partitions. The forecast engine is trained on Xt and then tuned and updated on Xv until the stopping criterion is satisfied; finally, load and price are forecast on the test data Xs and the forecasted price and load are printed.]

Figure 3. Flowchart of the DLSTM SISO.

5.2 ESAENARX Multiple Inputs Multiple Outputs (MIMO) Model for Simulta-

neous Electricity Load and Price Forecasting

ESAENARX is a two-stage predictive model [80]. In the first stage, features are extracted using the proposed efficient feature extractor, ESAE. Price and load are simultaneously predicted by NARX in the second stage. In the next sections, the proposed methods are described in detail. The problem statement of ESAENARX is given in section 4. The proposed system model is shown in figure 4.

5.2.1 Efﬁcient SAE (ESAE)

The ESAE is proposed to create a better representation of the electricity data that is useful for an accurate forecast of price and load. In this section, the proposed feature extractor ESAE is discussed in detail.

5.2.2 Pre-training of ESAE

To initialize the weights and biases, an unsupervised pre-training is applied, where the input of a hidden layer is the output of its previous layer. In the pre-training step, the initial biases and weights of the autoencoder are learned.

In the proposed method, the input data X_t is corrupted by introducing white noise [81]. The white noise is added to 30% randomly selected data points. A random process y(t) is known as white noise if its power spectral density S_y(f) is constant over all frequencies f:

[Figure 4 shows the pipeline: historic smart grid data and the temperature forecast enter the ESAE feature extractor (an input layer and two hidden layers), whose output feeds the NARX MIMO forecaster through a time delay layer; the output layer produces the load and price forecasts.]

Figure 4. Overview of ESAENARX MIMO model for simultaneous electricity load and price forecasting.

S_y(f) = N_0 / 2,   ∀ f   (2)

The white noise describes random disturbances with small correlation periods. The generalized correlation function of white noise is defined as:

B(t) = δ(t) σ²   (3)

Where δ(t) is the delta function and σ is a positive constant.

5.2.3 Fine-tuning of ESAE

The fine-tuning step follows the pre-training step. In the fine-tuning, wavelet denoising is proposed as the encoding transfer function of the first hidden layer of ESAE. The activation function of the second layer is sigmoid. The wavelet denoising has two steps: (i) wavelet packet decomposition and (ii) the reconstruction denoising operation. Firstly, the input time series is decomposed into different frequency bands by passing it through high pass and low pass filters. Then, the frequency band of the noise is set to zero. The signal is then reconstructed using the wavelet reconstruction function, which is the inverse of the wavelet decomposition function [82]. The wavelet decomposition operation can be expressed as:

c_{j,k} = Σ_n c_{j−1,n} h_{n−2k}
d_{j,k} = Σ_n d_{j−1,n} g_{n−2k},   k = 1, 2, ..., N−1

Where c_{j,k} is the scale coefficient, d_{j,k} is the wavelet coefficient, h and g are the quadrature mirror filter banks, j is the level of decomposition and N represents the number of sampling points. The wavelet reconstruction function, the inverse of the wavelet decomposition, is expressed as:

c_{j−1,n} = Σ_k c_{j,k} h_{k−2n} + Σ_k d_{j,k} g_{k−2n}   (4)

The denoising operation is shown by the equation below.

ω̂_{j,k} = sign(ω_{j,k}) (|ω_{j,k}| − Tλ)  if |ω_{j,k}| ≥ λ;   ω̂_{j,k} = 0  if |ω_{j,k}| < λ.

Where ω̂_{j,k} is the denoised signal, ω_{j,k} is the wavelet transformed signal and λ is the threshold.
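The thresholding step of the denoising operation can be sketched as a generic soft threshold over a list of wavelet coefficients. The factor T of the equation above is taken as 1 here for illustration; the function name is an assumption, not from the synopsis.

```python
def soft_threshold(coeffs, lam):
    """Soft-threshold denoising rule: zero coefficients below the threshold lam,
    shrink the rest toward zero (sign(w) * (|w| - lam) if |w| >= lam, else 0)."""
    out = []
    for w in coeffs:
        if abs(w) >= lam:
            sign = 1.0 if w >= 0 else -1.0
            out.append(sign * (abs(w) - lam))
        else:
            out.append(0.0)
    return out

print(soft_threshold([0.2, -1.5, 3.0, -0.1], lam=0.5))  # [0.0, -1.0, 2.5, 0.0]
```

Small coefficients, which mostly carry noise, are zeroed before the signal is reconstructed.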

In the ESAE feature extractor, the numbers of units in hidden layers one and two are 400 and 300, respectively. The coefficient that controls the layer 2 weight regularization is set to 0.001. The sparsity regularization is 4 and the sparsity proportion is 0.05. The maximum number of epochs is 100. The algorithm for learning the weights is scaled conjugate gradient descent.

5.2.4 Non-linear Autoregressive Network with Exogenous Inputs (NARX)

NARX is an autoregressive recurrent ANN. Its feedback connections enclose several hidden layers of the network while leaving out the input layer. NARX has a memory that is utilized for creating a nonlinear mapping between inputs and outputs. The network learns from the recurrence on the past values of the time series and the past predicted values of the network [83]. For predicting a value y(t), the inputs of the NARX are y(t−1), y(t−2), ..., y(t−d). NARX can be explained by equation 5:

ŷ(t+1) = f(y(t), y(t−1), ..., y(t−d), x(t+1), x(t), ..., x(t−d)) + ε(t)   (5)

Where ŷ(t+1) is the network's output at time t+1, f(·) is the nonlinear mapping function, y(t), y(t−1), ..., y(t−d) are the past observed values, x(t+1), x(t), ..., x(t−d) are the network's inputs, d is the number of delays and ε(t) denotes the error term. In the proposed NARX, for simultaneous forecasting of price and load, the number of delays is 2. The hidden layer size of the network is 10. The training function is Levenberg-Marquardt.
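The way NARX consumes past targets y and exogenous inputs x (equation 5) amounts to building lagged regressor vectors. A minimal sketch with d = 2 delays follows; the helper name is illustrative, not from the synopsis.

```python
def build_narx_samples(y, x, d):
    """Build (regressors, target) pairs for NARX: each target y(t+1) is paired
    with the past outputs y(t-d)..y(t) and the inputs x(t-d)..x(t+1)."""
    samples = []
    for t in range(d, len(y) - 1):
        regressors = y[t - d:t + 1] + x[t - d:t + 2]
        samples.append((regressors, y[t + 1]))
    return samples

y = [10, 11, 12, 13, 14]   # past observed load values
x = [1, 2, 3, 4, 5]        # an exogenous input (e.g., temperature forecast)
pairs = build_narx_samples(y, x, d=2)
print(pairs[0])  # ([10, 11, 12, 1, 2, 3, 4], 13)
```

In the proposed MIMO setting, the target would be a (load, price) pair rather than a scalar.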

Deep learning is well known for its high precision feature extraction. A sparse autoencoder deep neural network with dropout is proposed to extract useful features. This deep neural network can significantly reduce the adverse effect of overfitting, making the learned features more conducive to identification and forecasting. NARX is applied for prediction. The proposed Multiple Inputs Multiple Outputs (MIMO) model predicts the price and load simultaneously. Features are extracted using ESAE. Then, the NARX network is trained for simultaneous forecasting of price and load. The flowchart is shown in figure 5.

[Figure 5 shows the two stages. Stage 1 (feature extraction): min-max normalization of the data, pre-training (corrupting the input features with white noise, encoding with the SAE) and fine-tuning with the efficient SAE. Stage 2 (prediction): forecasting by NARX on the extracted features, followed by de-normalization to obtain the price and load forecasts.]

Figure 5. Step by step flow of ESAENARX MIMO.

The input features are: hour, temperature forecast, wind speed forecast, lagged load and lagged price. There are two targets: electricity load and price. The prediction process has the following five steps.

1. Inputs and targets are normalized using min-max normalization. Suppose an input vector X = [x1, x2, x3, ..., xn], where n is the number of instances in the vector. The min-max normalization is obtained by equation 6:

X_nor = (x_i − X_min) / (X_max − X_min)   (6)

Where i = 1, 2, ..., n.

2. The normalized inputs are fed to train the ESAE feature extractor. After the ESAE is trained, the input features are encoded using this trained ESAE. The output of ESAE is the encoded features.

3. The encoded features are given as input to train the NARX network. 80% of the data is used for training, 15% for validation and 5% for testing.

4. The price and load are predicted for the 168 hours of one week.

5. The predicted values of load and price are de-normalized to obtain the actual values. The NARX accurately predicts the price and load simultaneously.
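Steps 1 and 5 (equation 6 and its inverse) can be sketched as follows; the function names are illustrative.

```python
def minmax_normalize(x):
    """Min-max normalization of a vector into [0, 1] (equation 6)."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x], lo, hi

def minmax_denormalize(x_nor, lo, hi):
    """Inverse min-max transform recovering the original scale."""
    return [v * (hi - lo) + lo for v in x_nor]

load = [300.0, 450.0, 600.0, 375.0]
nor, lo, hi = minmax_normalize(load)
print(nor)                              # [0.0, 0.5, 1.0, 0.25]
print(minmax_denormalize(nor, lo, hi))  # round-trips to the original load
```

The same pair of transforms is reused by the DE-RELM model in section 5.3 (equation 8).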

5.3 DE-RELM MIMO Model for Simultaneous Electricity Load and Price Fore-

casting

The third proposed model is also a MIMO model, like ESAENARX [80]. Its problem statement is given in section 4. DE-RELM is an efficient method for electricity load and price forecasting. DE-RELM has three stages. In the first stage, the parameters of the ELM are optimized by applying the DE algorithm. In the second stage, the ELM is trained. The inputs and outputs of the ELM are both the input features of load and price; with identical inputs and outputs, the ELM acts like an encoder. Once the optimized ELM is trained, the learned weights are set as the initial weights of the RNN network that is used for forecasting. The learned weights of the ELM are the best representation of the input data. Setting these initial weights helps the RNN to converge fast and forecast accurately. This is the third and final stage of DE-RELM. The number of neurons in the hidden layers of the ELM and the RNN is kept the same, because in order to use the learned weights of the ELM for the RNN network, the dimensions of the weight vectors have to be the same. For the prediction of load and price, DE-RELM follows the steps shown in the flowchart, figure 6.

[Figure 6 shows the three stages. Stage 1 (ELM optimization): min-max normalization of the data, selection of weights and biases with DE and calculation of the objective function until convergence. Stage 2 (training ELM): the ELM is trained with the same inputs and outputs using the optimized weights, yielding the learned weights. Stage 3 (prediction): DE-RELM is initialized with the learned weights, forecasts the price and load, and the forecasts are de-normalized.]

Figure 6. Flowchart of the DE-RELM MIMO.

1. The inputs and targets are normalized using min-max normalization (as shown in equation 6).

2. The normalized inputs are given to the ELM network as both inputs and outputs. The network is trained.

3. The forecasting error is calculated by MAPE.

4. The DE algorithm is used to optimize the weights and biases of the ELM. The objective function of DE is the minimization of the prediction error:

Obj = minimize [ (1/n) Σ_{i=1}^{n} (|X_i^{act} − y_i^{for}| / X_i^{act}) × 100 ]   (7)

Where y_i^{for} is the forecast value and X_i^{act} is the value of the actual target.

5. When the forecasting error reduces to the desired value, the optimized ELM network is trained.

6. The weights of the ELM are set as the initial weights of the RNN network.

7. The RNN network predicts the price and load simultaneously.

8. The predicted values are de-normalized by the inverse min-max function as shown in equation 8.

X = [x^{for} × (X_max − X_min)] + X_min   (8)

Where x^{for} is the forecast value, X_max is the maximum value of the actual target and X_min is the minimum value of the actual target.

In DE-RELM, the number of neurons in the hidden layer of both the ELM and the RNN is 100. The ELM has 1 hidden layer and its activation function is sigmoid. DE runs for 100 iterations with a population size of 50, a mutation factor of 0.5 and a crossover rate of 1. The RNN network has 1 hidden layer and its transfer function is the logistic sigmoid.
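The DE optimization of stage 1, minimizing the MAPE objective of equation 7 over candidate weight vectors, can be sketched as follows. The toy objective fits a linear stand-in for the ELM, so the model details, population size and search range are illustrative assumptions; only the DE/rand/1 loop structure and the MAPE objective follow the text.

```python
import random

def mape(actual, forecast):
    """Mean absolute percentage error (the DE objective, equation 7)."""
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual) * 100

def de_optimize(objective, dim, pop_size=20, iters=100, f_mut=0.5, cr=1.0, seed=0):
    """Minimal Differential Evolution (DE/rand/1) minimizing `objective`."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0, 6) for _ in range(dim)] for _ in range(pop_size)]
    cost = [objective(v) for v in pop]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + f_mut * (pop[b][k] - pop[c][k]) for k in range(dim)]
            trial = [mutant[k] if rng.random() < cr else pop[i][k] for k in range(dim)]
            tc = objective(trial)
            if tc < cost[i]:            # greedy one-to-one selection
                pop[i], cost[i] = trial, tc
    best = min(range(pop_size), key=lambda i: cost[i])
    return pop[best], cost[best]

# Toy stand-in for the ELM: forecast = w0 * input + w1; the true weights are (2, 5).
inputs = [1.0, 2.0, 3.0, 4.0]
actual = [7.0, 9.0, 11.0, 13.0]
obj = lambda w: mape(actual, [w[0] * x + w[1] for x in inputs])
w_best, err = de_optimize(obj, dim=2)
print(err)  # MAPE close to 0 after optimization
```

With a crossover rate of 1, as in the synopsis, every trial vector is simply the mutant.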

5.4 EDCNN Multiple Inputs Single Output (MISO) Model for Wind Power Fore-

casting

The proposed method for forecasting wind power generation [84] and the power management algorithm (as shown in figure 7) are discussed in this section. The problem statement of EDCNN is discussed in section 4. The features and the target (wind power) are normalized using min-max normalization (as shown in equation 6). Three types of inputs are given to the forecasting model: (i) NWP variables: dew point temperature, dry bulb temperature and wind speed, (ii) past lagged values of wind power and (iii) the past decomposed wind power. The wavelet decomposition is described in the next section.

[Figure 7 shows the pipeline: windmill farm data and NWP data form the inputs; the wind power time series is decomposed by wavelet packet decomposition into approximation (CA) and detail (CD) coefficients. The EDCNN consists of an image input layer, convolution layers 1 and 2 each followed by batch normalization, ReLU and max pooling layers, and a fully connected layer with a regression layer that outputs the wind power forecast. Together with the day-ahead LMP and the day-ahead demand, the forecast yields a normally distributed load profile.]

Figure 7. Overview of EDCNN MISO model for wind power forecasting.

5.4.1 Feature Engineering

The historical wind power signal is decomposed using the Wavelet Packet Transform (WPT). The WPT is a general form of wavelet decomposition that performs a better signal analysis. WPT was introduced in 1992 by Coifman and Wickerhauser [82]. Unlike the Discrete Wavelet Transform (DWT), the WPT waveforms or packets are interpreted by three different parameters: position and scale (as in the DWT) plus frequency. For every orthogonal wavelet function, multiple wavelet packets with different bases are generated. With the help of these bases, the input signal can be encoded in such a way that the global energy of the signal is preserved and the exact signal can be reconstructed effectively. Multiple expansions of an input signal can be achieved using WPT. The most suitable decomposition is selected by calculating the entropy (e.g., Shannon entropy). The minimal representation of the relevant data based on a cost function is calculated in WPT. The benefit of the WPT is its characteristic of analyzing signals at different temporal as well as spatial positions. For a highly nonlinear and oscillating signal like wind power, DWT does not guarantee good results. In WPT, both the approximation and the detail coefficients are further decomposed into approximation and detail coefficients as the wavelet tree grows deeper. The wavelet packet decomposition operation can be expressed by equations 9 and 10. For a signal a to be decomposed, two filters of size 2N are applied on a. The corresponding wavelets are h(n) and g(n).

W_{2n}(a) = √2 Σ_{k=0}^{2N−1} h(k) W_n(2a − k)   (9)

W_{2n+1}(a) = √2 Σ_{k=0}^{2N−1} g(k) W_n(2a − k)   (10)

Where the scaling function is W_0(a) = φ(a) and the wavelet function is W_1(a) = ψ(a).

The past wind power signal is decomposed into 36 signals. The best representation of the input signal is selected through Shannon entropy. After decomposing the past wind power signals, the engineered features, along with the NWP variables (dew point, dry bulb, wind speed), the lagged wind power (w−24, w−25) and the hour, are input to the proposed forecasting model. The proposed forecasting model is discussed in the next section.
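One level of the wavelet packet split in equations 9 and 10, where each node is filtered into approximation and detail parts and downsampled, can be sketched with the Haar filter pair. This hand-rolled version is for illustration only; the model's actual 36-signal decomposition and entropy-based basis selection are not reproduced here.

```python
import math

def haar_packet_split(signal):
    """One wavelet packet split: convolve with the Haar low-pass h and
    high-pass g quadrature mirror filters, then downsample by 2."""
    s = 1.0 / math.sqrt(2.0)
    h = [s, s]      # low-pass (approximation) filter
    g = [s, -s]     # high-pass (detail) filter
    approx = [h[0] * signal[i] + h[1] * signal[i + 1]
              for i in range(0, len(signal) - 1, 2)]
    detail = [g[0] * signal[i] + g[1] * signal[i + 1]
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def packet_tree(signal, levels):
    """Grow the packet tree: at each level, split every node (both the
    approximation AND the detail coefficients, unlike plain DWT)."""
    nodes = [signal]
    for _ in range(levels):
        nodes = [part for node in nodes for part in haar_packet_split(node)]
    return nodes

wind = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0]
leaves = packet_tree(wind, levels=2)
print(len(leaves))  # 4 frequency bands after two levels
```

Because the Haar pair is orthonormal, the global energy of the signal is preserved across the tree, matching the energy-preservation property described above.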

5.4.2 Efﬁcient DCNN

The inputs are given to the EDCNN for predicting the day-ahead hourly wind power (24 values). Firstly, the functionality of a trivial CNN is discussed in this section. Secondly, the proposed method EDCNN is explained. The problem statement of the EDCNN is discussed in section 4.

The CNN is a computational model of the functionality of the human visual cortex. The CNN has an excellent capability of extracting deep underlying features of data. The CNN effectively identifies the spatially local correlations in data through the convolution operation. In the convolution operation, a filter is applied to a block of spatially adjacent neurons and the result is passed through an activation function. This output of the convolution layer becomes the input to the next layer's neurons. Thus, the input to every neuron of a layer is the output of a convolved block of the previous layer. Unlike the ANN, the CNN training is efficient due to the weight sharing scheme, which improves the learning efficiency. The CNN is composed of four alternating layers: (i) the convolution layer, (ii) the sampling or pooling layer, (iii) the batch normalization layer and (iv) the fully connected layer. The convolution operation can be explained by equation 11. Suppose X = [x1, x2, x3, ..., xn] is the vector of training samples and C = [c1, c2, c3, ..., cn] is the vector of corresponding targets, where n is the number of training samples. The CNN attempts to learn the optimal filter weights and biases that minimize the forecasting error. The CNN can be defined as [85]:

Y_i^m = f(w^m ⊗ X_i^m + b^m)   (11)

Where i = 1, 2, ..., n and m = 1, 2, ..., M, with M the number of layers to be learned. The filter weights of the mth layer are denoted by w^m, b^m represents the corresponding biases, ⊗ refers to the convolution operation and f(·) is the nonlinear activation function. Y_i^m is the feature map generated by sample X_i at layer m.

In the EDCNN network, there are eleven layers: three convolution layers, three max pooling layers, two batch normalization layers, three ReLU layers, one modified fully connected layer and a modified output layer, the Enhanced Regression Output Layer (EROL). The functionality of two layers is modified in order to improve the forecasting performance of EDCNN. According to the ANN literature, there is no standard way to choose an optimal activation function. However, it is a well-known fact that machine learning methods have an excellent capability of optimizing any model or function. On the basis of these facts, a modified activation function is employed in a hidden layer. The proposed activation function is an ensemble of the results of three activation functions: hyperbolic tangent, sigmoid and radial basis, as shown in equations 12, 13 and 14, respectively. The proposed activation function takes the average of the three functions' results (as shown in equations 12–14). The new activation function is shown in equation 15.

TH = (e^{xw} − e^{−xw}) / (e^{xw} + e^{−xw})   (12)

σ = e^{xw} / (1 + e^{xw})   (13)

φ = φ(‖xw − c‖)   (14)

F(x, w) = (TH + σ + φ) / 3   (15)

Where xw is the intermediate output of a network layer (the weighted sum of the input) on which the activation is applied to achieve the final output, and φ is the radial basis function with center c. The proposed activation function takes the average of the three aforementioned functions to calculate the result of the corresponding hidden layer.
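The averaged activation of equations 12–15 can be sketched as follows. The Gaussian form and the center c = 0 for the radial basis term are illustrative assumptions, since the synopsis does not fix φ's kernel.

```python
import math

def ensemble_activation(xw, c=0.0):
    """Average of tanh, sigmoid and a radial basis response (equations 12-15);
    c is the RBF center and the Gaussian kernel is an assumed choice."""
    th = math.tanh(xw)                         # equation 12, hyperbolic tangent
    sig = math.exp(xw) / (1.0 + math.exp(xw))  # equation 13, sigmoid
    rbf = math.exp(-abs(xw - c) ** 2)          # equation 14, Gaussian RBF assumed
    return (th + sig + rbf) / 3.0              # equation 15, the ensemble average

print(round(ensemble_activation(0.0), 4))  # 0.5 = (0 + 0.5 + 1) / 3
```

Averaging the three responses bounds the activation and blends their saturation behaviors.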

In the proposed output layer EROL, a modified objective function is embedded. The objective is to minimize the absolute percentage error between the forecast values and the actual targets. The objective can be expressed as equation 16:

min Loss(w, X_i, c_i) = L(w, X_i, c_i)   (16)

Where L(w, X_i, c_i) is the forecasting error or loss on sample X_i. The loss function is expressed as equation 17:

L(w, X_i, c_i) = (1/n) Σ_{i=1}^{n} (|Y_i − c_i| / Y_i) × 100   (17)

Where Y_i = F(Σ_{i=1}^{n} X_i w_i) is the output of the output layer and c_i is the desired actual target.

After forecasting the wind power, it is used in the DSM algorithm. The day-ahead Locational Marginal Price (LMP), the day-ahead demand and the forecast wind power are the inputs to the proposed DSM algorithm. The proposed DSM algorithm is applied to the data of a smart grid-connected micro grid. The system description is presented in the next section.

5.4.3 Wind Power Forecasting based Demand Side Management

A smart grid tied micro grid with wind power plants is studied in this research work. For the MG's load management, three parameters are utilized: (i) the wind power forecast, (ii) the day-ahead demand / load and (iii) the day-ahead LMP. The LMP is the price of energy purchased from the SG in case of insufficient generation of wind power. In wind power generation, there are the following possible cases:

5.4.3.1 Case 1

The first and simplest case is when the generated wind power is equal to the load. There is no gap between the generation and the demanded power. In this case, no energy is required to be purchased from the SG. The MG is self-sufficient.

5.4.3.2 Case 2

The wind power generated in the MG is more than the required power. In this case, the excessive power is transmitted to the SG (as shown in equation 18).

W − L → SG   (18)

Where W is the wind power, L is the load and the transmission process is denoted by the symbol →. In exchange for this energy, the SG will give the MG a subsidy on the price of a future energy purchase.

5.4.3.3 Case 3

Another case is when there is either no wind power or less wind power than the demand. In this case, the MG has to purchase the required power from the SG. If there is a subsidy on the price from a past exchange, the price is reduced; otherwise, the actual price is paid for purchasing the energy. Generally, a 10% to 15% concession on the energy price is offered as a subsidy. In this case, the proposed demand management algorithm is applied to achieve the objectives listed below:

- Load factor maximization.

- Consumption cost minimization.

5.4.4 Proposed DSM Algorithm

The wind power is forecast for 24 hours. The first objective is to maximize the load factor for maximum utilization of the power resource. The second objective is to minimize the consumption cost.

Obj1 = maximize LF   (19)

Obj2 = minimize C   (20)

Where LF is the load factor (equation 21) and C (equation 22) is the total consumption cost.

LF = L̂ / L̄   (21)

C = Σ_{i=1}^{n} L_i × P_i   (22)

Where L̂ is the sum of the total load, L̄ is the average load, L is the load vector, P is the LMP vector, the unit of the LMP is $/MWh and n is the length of the load and LMP vectors.

There are a few constraints on the system. The first constraint is that the demanded load must be equal to the load after applying the DSM scheme. The second constraint is that after applying the DSM, the consumption cost should be less than the initial cost. The third constraint is that the load factor must increase. The constraints are (equations 23–25):

L = L_new   (23)

C ≤ C_old   (24)

LF_new > LF   (25)

Where L is the load before DSM and L_new is the load after applying DSM. C_old is the consumption cost before DSM and C is the cost after DSM. LF_new is the load factor after DSM. The purpose of the proposed DSM scheme is to bring the consumption as close to the normal distribution curve as possible.
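The consumption cost of equation 22 and the constraint checks of equations 23–25 can be verified numerically. In this sketch the load factor is computed as average load over peak load, the standard definition, which is an interpretation of the synopsis's L̂/L̄ notation; the data values are illustrative.

```python
def consumption_cost(load, lmp):
    """Total consumption cost C = sum(L_i * P_i), with P in $/MWh (equation 22)."""
    return sum(l * p for l, p in zip(load, lmp))

def load_factor(load):
    """Load factor: average load over peak load (assumed interpretation)."""
    return (sum(load) / len(load)) / max(load)

def dsm_constraints_hold(load, load_new, lmp):
    """Check equations 23-25: demand preserved, cost not increased, LF improved."""
    same_demand = abs(sum(load) - sum(load_new)) < 1e-9                       # eq. 23
    cheaper = consumption_cost(load_new, lmp) <= consumption_cost(load, lmp)  # eq. 24
    flatter = load_factor(load_new) > load_factor(load)                       # eq. 25
    return same_demand and cheaper and flatter

lmp = [30.0, 50.0, 40.0, 20.0]           # day-ahead LMP, $/MWh
load = [100.0, 200.0, 150.0, 50.0]       # demand before DSM
load_new = [120.0, 150.0, 130.0, 100.0]  # flattened demand after DSM
print(dsm_constraints_hold(load, load_new, lmp))  # True
```

Flattening the profile both raises the load factor and, because peaks coincide with high LMP hours, lowers the cost.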

Let the input vectors, each containing 24 values, be: W = wind power forecast, L = day-ahead demand and P = day-ahead LMP. The other variables used are: C = consumption cost, S = subsidy, DWD = demand-wind power difference, P_new = new adjusted price and L_new = new normally distributed load after applying the DSM scheme.

The proposed algorithm for managing wind power and demand in an economical manner is given below. Manage_Demand(·) is the proposed function for managing demand in an economical manner. This function distributes the load in a normal form: the peak periods are shaved and the valley periods are filled. The resultant load profile achieved by this method approximately follows the normal distribution.

Algorithm 1 Algorithm for Demand Side Management.
Require: Input: [W, L, P]
1: Output: C
2: if W = L then   ▷ Wind power is sufficient to fulfill the demand
3:   P_new = 0   ▷ Wind power is sufficient and has no cost
4:   L_new = L   ▷ Load is equal to wind power, so load adjustment is not performed
5:   C = P_new × L_new   ▷ Calculating the consumption cost
6: else if W > L then   ▷ Wind power is greater than the demand
7:   W − L → SG   ▷ Excessive wind power is transmitted to the SG
8:   S = 0.9   ▷ A 10% reduction in price is the subsidy for the next power purchase
9:   P_new = 0   ▷ Wind power is sufficient and has no cost
10:  L_new = L   ▷ Load is less than wind power, so load adjustment is not performed
11:  C = P_new × L_new   ▷ Calculating the consumption cost
12: else if W ≥ 0 AND W < L then   ▷ Wind power is not sufficient to fulfill the demand
13:  DWD = L − W   ▷ Finding the demand that has to be fulfilled by the SG
14:  L_new = Manage_Demand(DWD, L)   ▷ Managing the demand to distribute it normally
15:  if S = 0.9 then   ▷ If there is a subsidy on the price, the price will be adjusted
16:    P_new = P × S   ▷ 10% reduction in price due to the subsidy
17:    C = P_new × L_new   ▷ Calculating the consumption cost
18:  else
19:    P_new = P   ▷ If there is no subsidy on the price, the price remains the same
20:    C = P_new × L_new   ▷ Calculating the consumption cost
21:  end if
22: end if
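The three cases of Algorithm 1 can be transcribed into Python. The scalar (per-hour) treatment, the returned subsidy flag and the merging of cases 1 and 2 into one branch are simplifications of the vector operations in the pseudocode.

```python
def dsm_cost(W, L, P, S=None, manage_demand=None):
    """Per-hour DSM decision following Algorithm 1 (scalar sketch).
    W: wind power, L: demand, P: LMP; S is a carried-over subsidy factor."""
    if W >= L:
        # Cases 1 and 2: wind covers demand at zero cost; any surplus (W - L)
        # goes to the SG, earning a 10% subsidy (S = 0.9) on the next purchase.
        subsidy = 0.9 if W > L else S
        return 0.0, L, subsidy                 # cost, load, subsidy for next step
    # Case 3: the shortfall DWD = L - W must be bought from the SG.
    dwd = L - W
    l_new = manage_demand(dwd, L) if manage_demand else L
    p_new = P * S if S == 0.9 else P           # apply the subsidy if available
    return p_new * l_new, l_new, None          # the subsidy is consumed

cost, load, s = dsm_cost(W=80.0, L=50.0, P=40.0)      # surplus hour
print(cost, s)                                         # 0.0 0.9
cost, load, s = dsm_cost(W=20.0, L=50.0, P=40.0, S=s)  # deficit hour uses subsidy
print(cost)                                            # 50 * 40 * 0.9 = 1800.0
```

A full 24-hour run would thread the subsidy flag through consecutive hours and pass the actual Manage_Demand function for the deficit case.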

Algorithm 2 Function for Load Shifting.
1: Function Lnew = Manage_Demand(DWD, L)
2:    μ = mean(DWD) ▷ Average of the demand to be fulfilled by the SG
3:    σ = std(DWD) ▷ Standard deviation of the demand to be fulfilled by the SG
4:    SD = sum(DWD) ▷ Sum of the demand to be fulfilled by the SG
5:    if DWD < μ then ▷ For each element of the demand vector smaller than the mean
6:       L′ = L + σ ▷ add the standard deviation to bring it closer to the mean
7:    else if DWD > μ then ▷ For each element greater than the mean
8:       L′ = L − σ ▷ subtract the standard deviation to bring it closer to the mean
9:    end if
10:   SL = sum(L′) ▷ Sum of the new adjusted load vector
11:   d = SL − SD ▷ Difference between the adjusted and demanded load, used to make them equal
12:   if d > 0 then ▷ The new adjusted load exceeds the demanded load
13:      [indx, Count] = L > μ ▷ Count is the number of values greater than the average; indx are their indices
14:      Lnew = L(indx) − d/Count ▷ Subtract the difference share from all the larger values
15:   else if d < 0 then ▷ The new adjusted load is less than the demanded load
16:      [indx, Count] = L < μ ▷ Count is the number of values smaller than the average; indx are their indices
17:      Lnew = L(indx) + d/Count ▷ Add the difference share to all the smaller values
18:   end if
19:   [index, Lsorted] = Sort(Lnew) ▷ Sort Lnew in ascending order and return the indices of the sorted array
20:   for i = 1 to 6 do ▷ Shift the peak load to the lowest-load hours
21:      j = i − 1; sf = 5i; a = length(Lnew) ▷ Define the shifting factor divisor
22:      if index(i) > 6 then ▷ Shift the load only to low-load hours that are not late night
23:         shftFac = Lnew(index(a − j)) / sf
24:         Lnew(index(i)) = Lnew(index(i)) − shftFac ▷ Subtract the shifting factor from the highest load
25:         Lnew(index(a − j)) = Lnew(index(a − j)) + shftFac ▷ Add the shifting factor to the lowest load
26:      end if
27:   end for
28: End Function
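The steps of Algorithm 2 can be sketched as follows. This is a minimal Python interpretation (the simulations in this work use MATLAB); the function name, the reading of the rebalancing step so that the adjusted load sums to the demanded energy, and the peak/valley pairing in the shifting loop are assumptions, while the six-iteration loop and the divisor sf = 5·i are taken directly from the algorithm:

```python
import numpy as np

def manage_demand(dwd, load):
    """Sketch of Algorithm 2: pull hourly load toward the mean of the
    demand vector, rebalance to the total demanded energy, then shift
    the largest peaks into the (non-late-night) valleys."""
    dwd, load = np.asarray(dwd, float), np.asarray(load, float)
    mu, sigma, sd = dwd.mean(), dwd.std(), dwd.sum()
    # steps 5-9: move values one standard deviation toward the mean
    l_new = np.where(dwd < mu, load + sigma,
                     np.where(dwd > mu, load - sigma, load))
    d = l_new.sum() - sd                # steps 10-11: surplus vs. demanded energy
    if d > 0:                           # adjusted load too high: trim the larger values
        idx = np.where(l_new > mu)[0]
        l_new[idx] -= d / len(idx)
    elif d < 0:                         # adjusted load too low: raise the smaller values
        idx = np.where(l_new < mu)[0]
        l_new[idx] -= d / len(idx)      # d < 0, so this adds |d|/Count
    order = np.argsort(l_new)           # step 19: ascending, order[0] = deepest valley
    a = len(l_new)
    for i in range(1, 7):               # steps 20-27: shift the six largest peaks
        sf = 5 * i                      # shifting-factor divisor, as in the algorithm
        peak, valley = order[a - i], order[i - 1]
        if valley > 5:                  # fill only valley hours that are not late night
            shift = l_new[peak] / sf
            l_new[peak] -= shift
            l_new[valley] += shift
    return l_new
```

By construction, the shifted profile conserves the total demanded energy while moving load out of the top six peak hours.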

5.5 DE-RUSBoost model for Electricity Theft Detection

In this section, the proposed ETD system is discussed in detail. The proposed model comprises four stages, namely: (i) data preprocessing, (ii) classifier training and optimization, (iii) classification and (iv) model evaluation. The system model is illustrated in figure 8, which shows the step-by-step procedure of the proposed system. Firstly, the data is acquired from smart homes or a community; secondly, the unlabeled data is labeled; thirdly, the labeled data is fed to the classifier to identify fraudulent consumers; fourthly, the classification model is optimized; and lastly, classification is performed. A description of the system model is given below. The problem statement of DE-RUSBoost is discussed in section 4.

5.5.1 Data Preprocessing

The data contains missing values, which are filled using linear interpolation. If fewer than seven consecutive values are missing or zero, they are replaced by linearly interpolated values. If seven or more consecutive values are missing, they are replaced by zeros. Every week starts on Monday and ends on Sunday in all the calculations.
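The gap-filling rule above can be sketched in a few lines. This is a minimal numpy version (the thesis works in MATLAB, and `fill_gaps` is a hypothetical helper name); the seven-value threshold and the missing-or-zero criterion are taken from the text:

```python
import numpy as np

def fill_gaps(x, max_gap=7):
    """Linear interpolation for short gaps, zeros for long ones:
    runs of fewer than `max_gap` consecutive missing/zero values are
    linearly interpolated; runs of `max_gap` or more are set to zero."""
    x = np.asarray(x, float).copy()
    missing = np.isnan(x) | (x == 0)
    n, i = len(x), 0
    while i < n:
        if missing[i]:
            j = i
            while j < n and missing[j]:      # find the end of the missing run
                j += 1
            if j - i < max_gap and i > 0 and j < n:
                # short gap with known endpoints: interpolate linearly
                x[i:j] = np.interp(np.arange(i, j), [i - 1, j], [x[i - 1], x[j]])
            else:
                x[i:j] = 0.0                  # long (or boundary) gap: zeros
            i = j
        else:
            i += 1
    return x
```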

Electricity thieves constitute only 8% of the total data. The positive class is the class of fraudulent consumers and the negative class contains fair consumers. The negative class is 12 times larger than the positive class. Therefore, this is a binary classification problem on highly imbalanced data. An ensemble approach is designed for classification [76,86], which is described in the next section.

Figure 8. Overview of DE-RUSBoost model for electricity theft detection. (The figure depicts the pipeline: consumption data from the smart community passes through data preprocessing, i.e., outlier rejection, interpolation and normalization; the RUSBoost classifier is trained and its hyperparameters are tuned by DE until the criterion is satisfied; finally, malicious consumers are detected.)

5.5.2 DE-RUSBoost Classiﬁcation Model

RUSBoost is an ensemble method that has been successfully used for imbalanced data classification in ETD [76] and several other fields [87,88,89]. It uses a sampling strategy to overcome the class imbalance ratio, combining the strengths of both Adaptive Boosting (AdaBoost) and Random Under Sampling (RUS). The AdaBoost algorithm repeatedly trains multiple weak learners (usually Decision Trees (DTs)) on subsets S′ of the training data. To classify a new example, a weighted vote of all the learners is taken. Every learner weights the misclassified examples using the error formula shown in equation 26.

\varepsilon_n = \sum_{i:\, h_n(x_i) \neq y_i} D_n(i), \quad \forall x_i \in S \qquad (26)

where h_n is the weak learner, x_i is the example to be labeled, D_n(i) is the probability distribution over all examples at iteration n and S is the training set. After calculating the error, the weights are updated using equations 27 and 28.

\alpha_n = \ln\frac{1 - \varepsilon_n}{\varepsilon_n} \qquad (27)

D_{n+1}(i) = \begin{cases} D_n(i) \times e^{-\alpha_n} & \text{if } h_n(x_i) = y_i \\ D_n(i) \times e^{\alpha_n} & \text{otherwise} \end{cases} \qquad (28)

where D_{n+1}(i) is the updated weight of a sample, D_n(i) is the weight of the sample at the previous iteration n, α_n represents the weight-update factor and h_n(x_i) is the label assigned to example x_i by the weak learner h_n.

The AdaBoost algorithm performs well on balanced data; however, in the case of imbalanced classes, it tends to underfit. To diminish the effect of imbalanced data, undersampling is performed in the RUSBoost method: the majority class is undersampled while selecting the subsets for training the weak learners. For example, if the majority class makes up 90% of the data and the minority class 10%, the weak learners are trained on 20% of the training examples, formed by taking all examples of the minority class and randomly selecting the same number of examples from the majority class. In this way, the imbalance ratio is reduced and the learners are trained on balanced data. With undersampling in place, the weights are updated using the AdaBoost steps (equations 27 and 28). A new example is classified by equation 29.

y_i = H(x_i) = \mathrm{sign}\left(\sum_{n=1}^{N} \alpha_n h_n(x_i)\right) \qquad (29)

where y_i is the label assigned to example x_i by the RUSBoost classifier H(x_i), N is the number of training cycles (iterations) and h_n are the weak learners.
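As a concrete illustration of equations 26–29 combined with the undersampling step, the following is a minimal pure-numpy sketch (not the thesis implementation): the weak learner is a weighted decision stump, labels are in {0, 1} and are mapped to {−1, +1} only inside the final vote, and learners worse than chance are discarded as an extra safeguard:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: pick the (feature, threshold, polarity)
    with the lowest weighted error on labels in {0, 1}."""
    best = (2.0, 0, 0.0, 1)                        # (error, feature, threshold, polarity)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = (pol * (X[:, f] - t) > 0).astype(int)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, t, pol)
    return best[1:]

def stump_predict(stump, X):
    f, t, pol = stump
    return (pol * (X[:, f] - t) > 0).astype(int)

def rusboost(X, y, rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    D = np.full(n, 1.0 / n)                        # example weights D_1(i) = 1/n
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    model = []
    for _ in range(rounds):
        # RUS step: all minority examples plus an equal random share of the majority
        idx = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
        stump = fit_stump(X[idx], y[idx], D[idx] / D[idx].sum())
        pred = stump_predict(stump, X)
        eps = D[pred != y].sum()                   # equation 26, on the full set
        if eps > 0.5:                              # discard learners worse than chance
            continue
        eps = max(eps, 1e-10)
        alpha = np.log((1 - eps) / eps)            # equation 27
        D = D * np.exp(np.where(pred == y, -alpha, alpha))  # equation 28
        D = D / D.sum()
        model.append((alpha, stump))
    return model

def rusboost_predict(model, X):
    # weighted vote of equation 29, with {0, 1} labels mapped to {-1, +1}
    score = sum(a * (2 * stump_predict(s, X) - 1) for a, s in model)
    return (score > 0).astype(int)
```

On a toy imbalanced set with two positives among six examples, every boosting round keeps all minority samples and draws an equal number of majority samples, exactly as the 90%/10% example above describes.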

The details given above describe the standard RUSBoost method. In order to reduce the false detection rate and improve the performance of the conventional RUSBoost algorithm, an enhanced scheme named DE-RUSBoost is proposed. In this work, RUSBoost's parameters are optimized using DE to make it more robust and accurate than standard RUSBoost. DE is a well-known meta-heuristic optimization technique [90]. It iteratively optimizes a problem by attempting to improve candidate solutions according to an objective function. It keeps a population of candidate solutions and generates new solutions by combining existing solutions (crossover) and altering one or more elements of a solution (mutation). It then selects the best solution based on its fitness value. The major steps DE follows while optimizing the RUSBoost classifier are:

1. Initialization: Randomly generate the initial population of size NP following a uniform distribution.

2. Mutation: In the mutation step, a new solution υ_i^{g+1} is created at the ith iteration following equation 30 [90]:

υ_i^{g+1} = X_{s1}^{g} + F (X_{s2}^{g} − X_{s3}^{g}), \quad F \in [0,1] \qquad (30)

where X_{s1}^{g}, X_{s2}^{g}, X_{s3}^{g} are individuals selected from generation g and F is the mutation factor.

3. Crossover: In the crossover step, individuals are combined to create new solutions following equation 31 [90]:

U_{i,j}^{g+1} = \begin{cases} υ_{i,j}^{g} & \text{if } r_i \leq CR \\ X_{i,j}^{g} & \text{otherwise} \end{cases} \qquad (31)

where U_{i,j}^{g+1} is the trial vector of the intermediate crossing, υ_{i,j}^{g} is the corresponding mutant solution vector, j ∈ [1, d] and d is the dimension of the solution vectors. CR is the crossover rate and r_i ∈ [0,1] is a uniformly distributed random factor that defines the possible values of CR.

4. Selection: The last operation is the selection of the best solution, described in equation 32 [90]:

X_i^{g+1} = \begin{cases} U_i^{g+1} & \text{if } f(U_i^{g+1}) \geq f(X_i^{g}) \\ X_i^{g} & \text{otherwise} \end{cases} \qquad (32)

f(X_i^{g}) = \frac{TP}{2(TP+FP)} + \frac{TN}{2(TN+FN)} \qquad (33)

where f(U_i^{g+1}) denotes the fitness value of the trial vector U_i^{g+1} and f(X_i^{g}) denotes the fitness of X_i^{g}; TP are the correctly classified positive test samples, FP the misclassified positive samples, TN the correctly classified negative samples and FN the misclassified negative samples. The fitness function aims to maximize the Area Under the Curve (AUC). The algorithm selects the parameters that make RUSBoost most accurate. The proposed method is presented in algorithm 3.
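The four DE steps above can be sketched as a compact DE/rand/1/bin loop. This is a generic illustration of equations 30–32 maximizing an arbitrary fitness function, not the DE-RUSBoost coupling itself (the function and bounds below are placeholders):

```python
import numpy as np

def differential_evolution(fitness, bounds, NP=20, G_max=50, F=0.5, CR=1.0, seed=0):
    """Minimal DE sketch of equations 30-32.
    `fitness` is maximized; `bounds` is a (d, 2) array of [low, high]."""
    rng = np.random.default_rng(seed)
    d = len(bounds)
    low, high = bounds[:, 0], bounds[:, 1]
    X = rng.uniform(low, high, size=(NP, d))        # initialization
    fit = np.array([fitness(x) for x in X])
    for _ in range(G_max):
        for i in range(NP):
            s1, s2, s3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
            v = X[s1] + F * (X[s2] - X[s3])          # mutation, equation 30
            v = np.clip(v, low, high)
            r = rng.random(d)
            u = np.where(r <= CR, v, X[i])           # crossover, equation 31
            fu = fitness(u)
            if fu >= fit[i]:                          # selection, equation 32
                X[i], fit[i] = u, fu
    return X[np.argmax(fit)], fit.max()
```

For instance, maximizing f(x) = −(x − 3)² over [0, 10] drives the best individual toward x = 3; in DE-RUSBoost, the fitness would instead train a RUSBoost classifier with the candidate parameters and return the score of equation 33.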

Algorithm 3 DE-RUSBoost Algorithm.
Require: Input: [0, 0]
1: Output: P, N
2: Set NP, Gmax, CR, F
3: Randomly set X = [P, N]
4: for n = 1 to NP do
5:    if f_i(X_i) > f_i(X_{i+1}) then
6:       Reserve f_i(X_i)
7:       Compare f_i(X_i) with f_i(X_{i+2})
8:    else
9:       Reserve f_i(X_{i+1})
10:      Compare f_i(X_{i+1}) with f_i(X_{i+2})
11:   end if
12:   Obtain f_max(X_i)
13: end for
14: Denote f_max(X_i) as X* = [N*, P*]
15: Classify a new example using RUSBoost (equations 26–29).

The parameters that are optimized are the imbalance ratio of the subset selected for the weak learners' training and the number of weak learners h_n (i.e., DTs). The performance of DE-RUSBoost is significantly improved compared to RUSBoost. The objective of DE is to optimize the fitness function of equation 33: DE selects the parameters, trains the RUSBoost classifier and calculates the error. In DE, the population size NP is 300, the number of iterations Gmax is 100, the crossover rate CR is 1 and the mutation factor F is 0.5.

The best parameters DE selected for the classifier are: class imbalance percentages of 54% and 46% for the majority and minority class, respectively, and 250 decision trees (weak learners). The DE-RUSBoost classifier is trained on 70% of the total data and tested on the remaining 30%. The results and analyses of the proposed models are presented in the next section.

6 Simulations’ Setup and Results

All simulations are performed using MATLAB R2018a on a computer with a Core i3 processor, 8 GB RAM and a 500 GB hard disk. Simulation results are discussed in sections 7–10.

7 Results and Discussion of DLSTM

In this section, the performance of the first proposed system is validated through simulations and discussion. The problem statement of this model is discussed in section 4. A case study is presented in the next section, i.e., short-term forecasting using the aggregated load and average price of six states of the USA [91].

7.1 Data Description

The historic electricity price and load data used in the simulations are taken from ISO NE [91]. ISO NE manages the generation and transmission system of New England, producing and transmitting almost 30,000 MW of electricity daily. In ISO NE, transactions worth 10 million dollars are completed annually by 400 electricity market participants. The data comprises the ISO NE control area's hourly system load and regulation capacity clearing price for 6 states of the USA, captured from January 2011 to March 2018. The data contains 63,528 measurements.

When the performance of DLSTM is compared with the aforementioned methods, it has lower error: DLSTM has lower MAE and NRMSE than ELM, WT + SAPSO + KELM, NARX and INARX. WT + SAPSO + KELM [92] was proposed for electricity price prediction; for price forecasting, DLSTM is therefore compared with ELM, NARX and WT + SAPSO + KELM. Buitrago et al. proposed INARX [93] for electricity load prediction; the DLSTM load prediction results are compared with ELM, NARX and INARX. The comparison of forecast results is shown in figures 9 and 10.

Figure 9. Comparison of DLSTM, ELM and NARX for price forecast of one week, ISO NE. (Observed price in $/MWh over 168 hours, with DLSTM, ELM, NARX and WT+SAPSO+KELM forecasts.)

Figure 10. Comparison of DLSTM, ELM and NARX for load forecast of one week, ISO NE. (Observed load in MW over 168 hours, with DLSTM, ELM, NARX and INARX forecasts.)

DLSTM has a feedback architecture in which errors are backpropagated. In DLSTM, the weights are updated multiple times during training, with every new input. The learned weights are obtained when the network completes its training on the complete training data.

7.2 Performance Evaluation

For performance evaluation, two evaluation indicators are used: MAE and NRMSE. The MAPE metric has the limitation of becoming infinite if the denominator is zero and negative if the values are negative, both of which are meaningless. Therefore, MAE and NRMSE are the more suitable performance measures.

MAE is the average of the absolute errors: the absolute difference between each forecast value and its respective observed value is taken, and the arithmetic mean of these differences is calculated. NRMSE is the normalized root mean square error: the differences are calculated as in MAE, squared, their arithmetic mean is taken and its square root is computed. The resulting error is normalized by dividing it by max(X_s) − min(X_s), where max(X_s) is the maximum and min(X_s) the minimum value of the vector of observed test values.

The formulas of MAE and NRMSE are given in equations (34) and (35), respectively.

MAE = \frac{1}{T} \sum_{t=1}^{T} |X_s - y_s| \qquad (34)

NRMSE = \frac{\sqrt{\frac{1}{T} \sum_{t=1}^{T} (X_s - y_s)^2}}{\max(X_s) - \min(X_s)} \qquad (35)

where X_s is the observed test value at time t and y_s is the forecast value at time t.
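Equations 34 and 35 translate directly into code; a minimal numpy version (the thesis itself computes these metrics in MATLAB):

```python
import numpy as np

def mae(x_obs, y_pred):
    """Mean Absolute Error, equation 34."""
    return np.mean(np.abs(x_obs - y_pred))

def nrmse(x_obs, y_pred):
    """Normalized RMSE, equation 35: RMSE divided by the observed range."""
    rmse = np.sqrt(np.mean((x_obs - y_pred) ** 2))
    return rmse / (x_obs.max() - x_obs.min())
```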

Let [y_1, y_2, ..., y_n] be the vector of values to be forecast, predicted by two forecasting models M_1 and M_2 with forecasting errors [ε_1^{M_1}, ε_2^{M_1}, ..., ε_n^{M_1}] and [ε_1^{M_2}, ε_2^{M_2}, ..., ε_n^{M_2}]. A loss function L(·) is applied and the loss differential is calculated in the DM test as in equation (36) [94]:

d_t^{M_1,M_2} = L(ε_t^{M_1}) - L(ε_t^{M_2}) \qquad (36)

In its one-sided version, the DM test evaluates the null hypothesis H_0 of M_1 having an accuracy equal to or worse than M_2, i.e., an equal or larger expected loss, against the alternative hypothesis H_1 of M_2 having better accuracy [94]:

One-sided DM test: \quad H_0: d_t^{M_1,M_2} \leq 0, \qquad H_1: d_t^{M_1,M_2} > 0 \qquad (37)
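A minimal sketch of the DM statistic for equations 36–37, assuming the squared-error loss L(ε) = ε² (the text does not fix a particular loss); a large positive value supports H_1, i.e., favors the second model:

```python
import numpy as np

def dm_statistic(err1, err2):
    """One-sided DM statistic under the squared-error loss:
    d_t = e1_t^2 - e2_t^2 (equation 36), standardized by its sample
    standard error.  Positive values favor the second model (H1)."""
    d = err1 ** 2 - err2 ** 2                 # loss differential d_t
    n = len(d)
    return d.mean() / np.sqrt(d.var(ddof=1) / n)
```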

A SANN cannot handle large amounts of data very well and tends to overfit, whereas a DNN has more computational power than a SANN. For prediction on big data, deep learning has been shown to be an effective and viable alternative to traditional data-driven machine learning prediction methods [20]. The validated and updated deep LSTM forecaster outperformed ELM and NARX in terms of MAE and NRMSE.

The NRMSE and MAE metrics are used to compare the accuracy of different forecasting models. However, higher accuracy alone does not confirm that one model is better than the others; the difference between the accuracies of two models should be statistically significant. For this purpose, the forecasting accuracy is validated using statistical tests such as the Friedman test [95], error analysis [96] and the Diebold–Mariano (DM) test [97]. The performance of the proposed method is validated by two statistical tests, the DM and Friedman tests. DM is a well-known statistical test for the validation of electricity load [98] and price forecasting [94]. The DM forecasting accuracy comparison test is used for comparing the accuracy of the proposed model with the existing models, i.e., ELM, WT + SAPSO + KELM, NARX and INARX.

Table 2 Comparison of load and price forecasting errors of DLSTM SISO with benchmark models.

Forecast        Forecasting Method        MAE     NRMSE
Price Forecast  ELM                       67.4    11.86
                NARX                      12.47   8.24
                WT + SAPSO + KELM [92]    8.99    0.13
                DLSTM                     1.945   0.08
Load Forecast   ELM                       52.8    8.42
                NARX                      37.18   14.74
                INARX [93]                9.7     0.2
                DLSTM                     2.9     0.087

The second test used for verification of the improved accuracy of the proposed model is the Friedman test. The Friedman test is a two-way analysis of variance by ranks, a non-parametric alternative to the one-way ANOVA with repeated measures. Multiple comparison tests are conducted within the Friedman test; its goal is to detect significant differences between the results of different forecasting methods. The null hypothesis of the Friedman test states that the forecasting performances of all methods are equal.

To calculate the test statistic, the predicted results are first converted into ranks. The pairs of predicted results and observed values are gathered for all methods, and a rank is assigned to every pair i. Ranks range from 1 (least error) to k (highest error) and are denoted by r_i^j (1 ≤ j ≤ k). For each forecasting method j, the average rank is computed by:

R_j = \frac{1}{n} \sum_{i=1}^{n} r_i^j \qquad (38)

Ranks are assigned to all forecasts of each method separately: the best algorithm has rank 1, the second best rank 2, and so on. The null hypothesis states that all methods' forecast results are similar and, therefore, their average ranks R_j are equal. The Friedman statistic is calculated by equation (39) [95].

F = \frac{12n}{k(k+1)} \left[ \sum_{j=1}^{k} Rank_j^2 - \frac{k(k+1)^2}{4} \right] \qquad (39)

where n is the total number of forecasting results, k is the number of compared models and Rank_j is the average rank received by each model over all forecast values. The null hypothesis of the Friedman test is the equality of forecasting errors among the compared models; the alternative hypothesis is defined as the negation of the null hypothesis. The test results are shown in table 3. Clearly, the proposed DLSTM model is significantly superior to the other compared models.

Friedman test: \quad H_0: F \leq 0 \;\; (M_1 \text{ accuracy} \leq M_2 \text{ accuracy}), \qquad H_1: F > 0 \;\; (M_1 \text{ accuracy} > M_2 \text{ accuracy}) \qquad (40)
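Equations 38 and 39 can be sketched as follows; this is a minimal numpy illustration (not the thesis implementation) that assumes no ties among the errors of a given forecast:

```python
import numpy as np

def friedman_statistic(errors):
    """Friedman statistic of equation 39.
    `errors` is an (n, k) array: n forecast values, k compared models."""
    n, k = errors.shape
    # rank the models within each forecast: 1 = lowest error ... k = highest
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1
    R = ranks.mean(axis=0)                        # average rank per model, equation 38
    return 12 * n / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4)
```

For two models where the first always has the lower error, the average ranks are R = [1, 2], and the statistic grows linearly with the number of forecasts n.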

In table 3, the results of the DM and Friedman tests are presented. The DM test statistics of DLSTM versus the compared methods are listed; DM results greater than zero mean the DLSTM method is significantly better than the compared method (as shown by the hypotheses in equation (37)). Friedman ranks R are computed by equation (39); they range from 1 to 4 for the four compared methods, with rank 1 showing the best and rank 4 the worst forecasting performance. The DM values of DLSTM versus the three compared methods are shown (DLSTM is not compared with itself, so Not Applicable (N/A) is stated). For price forecasting, the F rank was: DLSTM > WT + SAPSO + KELM [92] > NARX > ELM. The F rank for load forecasting was: DLSTM > INARX [93] > NARX > ELM. The statistical tests validated that the accuracy of the proposed DLSTM method is significantly improved: DLSTM ranked first for both load and price forecasting, and the DM results are greater than zero, which means DLSTM is better than the other compared methods.

Table 3 Diebold–Mariano and Friedman tests' rank F of DLSTM SISO.

Forecast        DLSTM vs.                 Diebold–Mariano   Friedman F Rank
Price Forecast  ELM                       47.3              4
                NARX                      27.6              3
                WT + SAPSO + KELM [92]    12.8              2
                DLSTM                     N/A               1
Load Forecast   ELM                       43.2              3
                NARX                      6.8               2
                INARX [93]                4.2               2
                DLSTM                     N/A               1

Experimental results show that the proposed method forecasts the real patterns and recent trends of load and price with greater accuracy than ELM and NARX. The comparison of the proposed method with NARX and ELM is shown in table 2. The price forecast errors listed in table 2 are the averages of the forecasting errors over all twelve months for ELM, NARX and DLSTM.

8 Results and Discussion of Model ESAENARX and DE-RELM

In this section, the description of datasets, data analysis and results’ discussion of ESAENARX and

DE-RELM models are presented.

8.1 Data Description

The data used for forecasting is taken from the well-known electricity utility ISO NE (ISO New England), USA. The dataset is publicly available.

8.2 Performance Evaluation

To evaluate the performance of ESAENARX, three performance measures are used, i.e., MAPE, RMSE and NRMSE. A lower error value indicates better forecasting accuracy.

8.3 Comparison and Discussion

The proposed methods are compared with four ANN forecasting methods: NARX, ELM, DE-ELM and RELM. These methods are widely used in electricity load and price forecasting.

The detailed comparison of all the compared methods is presented in this section. The results and rea-

soning are also elaborated with the comparative analysis. Moreover, the strengths and limitations of the

compared methods are highlighted.

The effect of the proposed feature engineering is clear from the numerical results. The forecast accu-

racy of ESAENARX with extracted features is much better as compared to simple NARX. The extracted

features are informative therefore, the forecaster is able to model data better and forecast with greater

accuracy.

The proposed methods are compared with three types of ELMs; i.e., standard ELM, DE-ELM and RELM.

The comparative analysis of these methods is given below.

The ELM is optimized using a meta-heuristic optimization algorithm, Differential Evolution. The initial weights and biases of ELM's hidden and output layers are optimized using DE, an optimization method that iteratively improves the performance of an algorithm with respect to the optimization function.

RELM is a variant of the recurrent neural network that combines two methods, ELM and RNN. The ELM acts as an encoder, where the inputs and outputs of the network are the same, i.e., the input features. The learned weights of the ELM network are set as the initial weights of the RNN; by keeping the inputs and outputs of the ELM network identical, the learned weights become a good representation of the input features. The number of neurons in the hidden layers of the ELM and the RNN is kept the same. Two ELM encoders are trained: one for the hidden layer's weights of the RNN and a second for the output layer's weights of the RNN. The learned weights make the RNN converge faster and to a better solution. The results of RELM are slightly better than DE-ELM and comparable to NARX; both RELM and NARX belong to the same category of neural network, the recurrent neural network.
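The ELM-as-autoencoder initialization described above can be sketched as follows. This is a minimal numpy illustration, not the thesis configuration: the hidden width, tanh activation and least-squares solver are assumptions, and only the weight-learning step (inputs used as targets) is shown:

```python
import numpy as np

def elm_autoencoder_weights(X, hidden, seed=0):
    """Train an ELM as an autoencoder (targets = inputs): fixed random
    input weights, closed-form least-squares output weights.  The output
    weights are the learned representation used to initialize the RNN."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], hidden))   # fixed random input weights
    b = rng.normal(size=hidden)                    # fixed random biases
    H = np.tanh(X @ W_in + b)                      # hidden-layer activations
    # least-squares solution mapping H back to the inputs X
    beta, *_ = np.linalg.lstsq(H, X, rcond=None)
    return W_in, b, beta
```

When the hidden width is at least the number of training samples, the reconstruction H @ beta recovers X almost exactly, which is what makes the learned weights a faithful representation of the input features.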

The second proposed method, DE-RELM, performs reasonably well on load forecasting: its load forecasting results are much better than those of the other techniques and comparable to ESAENARX. However, no significant improvement is seen in the price forecast, whereas ESAENARX performs equally well for both load and price. DE-RELM trains the forecaster on learned weights, which yields only a minor improvement that is not comparable to ESAENARX. For the price forecast, only properly extracted features can improve accuracy; ESAE extracts the relevant and most informative features, which improves the forecast accuracy.

ELM has the worst forecast results among the six compared methods because it is a feed-forward network whose weights are learned once in a forward pass and never updated. Therefore, to achieve acceptable forecast results, the initial weights of the ELM have to be well optimized. NARX performs better than ELM; however, its forecast results are not as accurate as those of the proposed methods ESAENARX and DE-RELM. The MAPE, RMSE and NRMSE errors are shown in table 4.

Table 4 Comparison of forecasting errors of ESAENARX MIMO and DE-RELM MIMO with benchmark models on ISO NE dataset.

Forecast        Method       MAPE    RMSE   NRMSE
Load Forecast   ELM          74.59   7.82   1.53
                NARX         1.35    4.35   0.37
                DE-ELM       21.73   5.23   0.41
                RELM         18.78   4.62   0.37
                CEANN [74]   8.62    3.75   0.57
                DE-RELM      7.78    3.14   0.32
                ESAENARX     1.13    2.27   0.03
Price Forecast  ELM          89.95   9.78   1.91
                NARX         8.29    5.24   0.89
                DE-ELM       28.06   6.92   0.32
                RELM         21.06   5.62   0.28
                CEANN [74]   19.96   4.45   0.96
                DE-RELM      18.62   3.75   0.34
                ESAENARX     3.32    2.85   0.08

The forecast accuracy of the six compared methods is, in order: ESAENARX > DE-RELM > NARX > DE-ELM > RELM > ELM.

The lower error compared to the benchmark methods verifies the good performance of the ESAENARX forecast model. The results in figures 11 and 12 show the better accuracy of ESAENARX and DE-RELM compared to ELM, DE-ELM, RELM and NARX. The MAPE and NRMSE of ESAENARX, DE-RELM, ELM, DE-ELM, RELM, NARX and CEANN [74] are listed in table 4. The efficiency of ESAENARX and DE-RELM is confirmed by their lower MAPE and RMSE compared to the mentioned methods.

Figure 11. Comparison of ESAENARX and DE-RELM price prediction with NARX, ELM and DE-ELM, ISO NE. (Observed price in $/MWh over 168 hours, with ESAENARX, ELM, NARX, DE-ELM, RELM, DE-RELM and CEANN forecasts.)

Figure 12. Comparison of ESAENARX and DE-RELM load prediction with NARX, ELM and DE-ELM, ISO NE. (Observed load in MW over 168 hours, with ESAENARX, ELM, NARX, DE-ELM, RELM, DE-RELM and CEANN forecasts.)

9 Results and Discussion of Model EDCNN

The simulation results of EDCNN wind forecasting model are discussed in this section. The problem

statement of the EDCNN is discussed in section 4.

9.1 Data Description

Three years of hourly wind power data are taken from ISO New England's wind farm located in Maine [99]. The duration of the data utilized in this research is January 2015 to December 2017.

9.2 Wind Power Analysis

Wind power is a widely available RES; therefore, it is one of the most popular and emerging power generation sources. The predictive analytics are performed on wind power data of the Maine wind farms, ISO New England. According to the annual report, the Maine wind farms produce approximately 900 MW of energy annually, which contributes almost 14% of the total electricity in the state of Maine. Wind power is directly proportional to wind speed, which varies from season to season. In Maine, USA, the wind speed is affected by seasonality: the wind power in autumn is higher than in the other seasons because of the fast winds in the coastal areas of Maine, where the wind turbines are installed.

9.3 EDCNN Performance Evaluation

EDCNN is compared with two models: typical CNN and SELU CNN for wind power forecasting (as

shown in ﬁgure 13). For performance evaluation of wind power forecasting, three evaluation indicators

are used: Mean Absolute Error (MAE), Normalized Root Mean Square Error (NRMSE) and MAPE (as

shown in table 5).

Figure 13. All season predictions of wind power. (Four panels, Spring, Summer, Autumn and Winter, each showing the observed 24-hour wind power in MW against the EDCNN, SELU CNN and CNN forecasts.)

9.4 Statistical Analysis of EDCNN

The aforementioned error indicators (shown in table 5) are utilized for the accuracy comparison of the forecasting models. However, a lower error or higher accuracy alone does not guarantee a model's superiority over other models: a model is better than another if the difference between their accuracies is statistically significant. Different statistical tests are used to validate the significance of models, such as the Friedman test [95], error analysis [96] and the DM test [97]. To validate the performance of the proposed forecasting model EDCNN, the well-known DM statistical test is used (see table 6). Diebold and Mariano proposed the classical Diebold–Mariano statistical test in 1995 [97]. The DM test evaluates the significance of the difference between the forecasting errors of two models. In this research work, the error metric used for DM is MAE. DM is widely used for the validation of wind power forecasting [100].

Table 5 MAPE, NRMSE and MAE of the EDCNN MISO and compared methods.

Method     Season   MAPE   NRMSE   MAE
CNN        Spring   8.42   2.34    3.34
           Summer   8.23   2.27    3.24
           Autumn   7.9    2.65    3.36
           Winter   8.1    2.71    2.89
SELU CNN   Spring   3.47   0.12    3.1
           Summer   3.62   0.13    3.3
           Autumn   3.45   0.12    3.4
           Winter   3.27   0.17    3.2
EDCNN      Spring   2.67   0.092   2.4
           Summer   2.43   0.096   2.24
           Autumn   2.56   0.085   2.67
           Winter   2.62   0.094   2.18

The results of the DM test at a confidence level of 95% are shown in table 6. DM is applied to the forecasting results of EDCNN and the two compared methods, CNN and SELU CNN [69]. Three comparisons are performed: EDCNN with CNN, EDCNN with SELU CNN and CNN with SELU CNN. EDCNN is better than both CNN and SELU CNN, whereas SELU CNN is better than CNN.

Table 6 Diebold–Mariano test results of EDCNN MISO at a 95% confidence level.

Season   DM Score   EDCNN    SELU CNN   CNN
Spring   DM-MAE     1.4252   0.0842     1.4256
Summer   DM-MAE     1.3262   0.1024     1.3692
Autumn   DM-MAE     1.2714   0.1762     1.6728
Winter   DM-MAE     1.4632   1.1426     1.2464

9.5 Analysis of Proposed DSM Algorithm

The results of the proposed DSM algorithm are shown in figure 14. It is clearly seen that the load from peak hours is clipped and shifted to the off-peak hours. The total power consumption, the power supplied by the MG and the power consumed from the SG are shown in figure 14. The proposed DSM scheme is applied to the 24 hours of 7th January 2017 because of its fairly reasonable wind power generation and the absence of zero-generation hours throughout the day, which gives a clear depiction of the DSM's results. The purpose of DSM is to reduce the consumption load of peak hours in order to minimize the usage of the dispatchable generators of the SG. The MG only has WPP and no dispatchable generators; if the wind generation is insufficient, the MG purchases energy from the SG. If the energy demand of the MG's consumers falls in the peak hours, the load of the MG is shifted from peak hours to off-peak hours.

An assumption is made that the MG encourages its consumers to shift their load from peak hours to off-peak hours by offering incentives, and that the consumers shift their consumption accordingly; this leads to an overall load shift in the MG and, consequently, a reduction in the consumers' consumption cost. The MG gains the advantage of not purchasing additional energy from the SG during peak hours (when the price is higher than in off-peak hours), which reduces the purchasing cost for the MG as well. In this manner, the consumers are satisfied and the MG achieves cost-effective demand management.

The proposed algorithm successfully shifts the load. In the proposed method, the load is shifted to off-peak hours that are not late at night; this is suitable because little electricity can be consumed during the late-night sleeping hours. The goal of an approximately normally distributed load profile is achieved. The load before DSM and after applying the proposed DSM algorithm is shown in figure 15; the load profile after DSM is closer to the normal distribution than the profile before DSM. An exactly normal distribution of load cannot be achieved because of the fixed working hours: the electricity consumption during working hours cannot be shifted to other hours in a manner that yields a perfectly normal load distribution. Only a portion of the load, known as the shiftable load, can be shifted. The goals of shifting the shiftable load in order to improve the load factor and reduce the price are achieved by applying the proposed DSM.

Another goal of the proposed DSM algorithm is reducing the consumption cost. When the load is shifted to off-peak hours, the consumption cost reduces because the power price is lower in off-peak hours. The reduction in consumption cost achieved by the proposed DSM algorithm is presented in table 7, which shows the cost before and after applying the DSM algorithm; the cost reduced by DSM and its percentage are also given. On average, 1.1% of the total cost is reduced by applying the proposed DSM algorithm. When the proposed algorithm is applied to the 365 days of the year 2017, approximately $2.25 million of consumption cost is saved. The DSM results for one day's consumption cost from each of the four seasons are presented in table 7: one day from every season of the year is taken for calculating the results of the DSM algorithm, i.e., 1st January (Winter), 1st April (Spring), 1st July (Summer) and 1st October (Autumn).

Table 7 Energy consumption cost reduction by the proposed DSM algorithm.

         Consumption Cost / Day ($)      Reduction / Day
Season   Before DSM    After DSM     Amount ($)   Percentage
Spring   483,330       475,170       8,153        1.7%
Summer   793,930.5     784,403       7,527        1.2%
Autumn   417,980.5     413,770.5     4,210        1%
Winter   3,347,106     3,305,006     42,109       1.3%

[Figure 14 plots load (MW) against time (hours 0-24) for the SG load before DSM, the SG load after DSM, the MG load and the total load, with the valley filling, peak clipping and load shifting regions annotated.]

Figure 14. Valley filling and peak clipping through the Efficient DSM algorithm.

[Figure 15 shows histograms of frequency versus load (MW): (a) load curve before DSM; (b) load curve after DSM.]

Figure 15. Effect of the proposed DSM scheme on load profile.

10 Results and Discussion of DE-RUSBoost

In this section, the experimental results and discussion are presented, along with a description of the datasets used to evaluate the proposed model. The problem statement of DE-RUSBoost is discussed in Section 4.

10.1 Data Description

The State Grid Corporation of China (SGCC) dataset comprises labeled data of 42,372 commercial and residential electricity consumers. The data is the daily consumption over 1,035 days, from 1st January 2014 to 31st October 2016. Among the 42,372 consumers, 3,615 are labeled as fraudulent; only about 9% of the consumers are electricity thieves. The data contains noisy and missing values, which are replaced in the preprocessing step. This data was published online on the State Grid Corporation of China's website [101] in 2017.

10.2 Performance Evaluation

The performance of the proposed model is evaluated using five well-known classification evaluation metrics: recall, precision, specificity, accuracy and AUC. Recall, also known as the true positive rate, measures the correctly classified positive samples. Precision, or positive predictive value, is the ratio of correctly classified positive examples to all examples classified as positive. Specificity, also known as the true negative rate, measures the correctly classified negative examples. Accuracy measures all correctly classified examples, whereas AUC is a performance measure whose value is high only if both the true positive and true negative rates are high. All of these metrics range from 0 to 1, where 0 is the worst and 1 the best value. The classification performance of the proposed model is compared with the two electricity theft detection models proposed in [75] and [76]; the results are shown in Table 8. The numerical comparison shows the superior performance of the proposed method: its higher accuracy and AUC, compared to the other models, demonstrate its effectiveness in ETD.
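These metrics follow directly from the confusion-matrix counts. A minimal sketch with illustrative counts (not the experimental results reported in Table 8) is shown below; since exact AUC requires classifier scores, the sketch uses the balanced average of the true positive and true negative rates as a stand-in:

```python
# Confusion-matrix counts for a binary classifier (illustrative values).
tp, fn = 72, 28      # theft cases: correctly / incorrectly classified
tn, fp = 940, 60     # fair consumers: correctly / incorrectly classified

recall = tp / (tp + fn)               # true positive rate
precision = tp / (tp + fp)            # positive predictive value
specificity = tn / (tn + fp)          # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
# Stand-in for AUC of a hard classifier: high only when both the true
# positive and true negative rates are high.
auc_approx = (recall + specificity) / 2

print(round(recall, 3), round(precision, 3),
      round(specificity, 3), round(accuracy, 3), round(auc_approx, 3))
# → 0.72 0.545 0.94 0.92 0.83
```

Note how accuracy alone (0.92) hides the much weaker recall (0.72) on the minority class, which is why AUC is used as the optimization objective in this work.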

10.3 Comparisons and Discussion

The proposed model outperforms the grid search based RUSBoost classifier [76] in terms of all the aforementioned performance metrics (as shown in Table 8). This performance gain is achieved through the finely tuned parameters of the classifier. Although the authors of [76] also implement a RUSBoost classifier, its parameters are selected using grid search, an exhaustive search method that selects the best parameters from a subset of all possible parameter values and therefore does not guarantee finding the best parameters for the classifier. In the proposed scheme, by contrast, the best parameters are selected using the meta-heuristic technique DE, according to the objective function of maximizing the AUC (as shown in Equation 7). The parameters selected by DE are therefore better than those selected by grid search, and the performance of the proposed method is significantly improved over [76]. The second comparative method, [75], is also evaluated on the SGCC dataset; it proposes a new scheme, Wide And Deep CNN (WADCNN), to detect electricity theft on the highly imbalanced labeled data. This method achieves reasonable performance; however, it does not tackle the class imbalance problem. By properly tackling class imbalance, the proposed method's performance is enhanced significantly compared to [75].
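The DE parameter search can be sketched with a standard DE/rand/1/bin loop. The `auc_surrogate` function below is a hypothetical smooth stand-in for the real AUC objective (which would require training a RUSBoost classifier per candidate), and the control parameters are illustrative, not those used in this work:

```python
import random

random.seed(0)

# Hypothetical surrogate for AUC(number_of_trees, majority ratio),
# peaking near 320 trees and a 56:44 split (cf. Section 10.4).
def auc_surrogate(trees, ratio):
    return 0.9 - ((trees - 320) / 400) ** 2 - ((ratio - 0.56) / 0.5) ** 2

bounds = [(50, 400), (0.5, 0.8)]     # (trees, majority-class share)
NP, F, CR, GENS = 20, 0.5, 0.9, 60   # DE/rand/1/bin control settings

pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
fit = [auc_surrogate(*x) for x in pop]

for _ in range(GENS):
    for i in range(NP):
        a, b, c = random.sample([j for j in range(NP) if j != i], 3)
        jrand = random.randrange(len(bounds))  # force one mutated dim
        trial = []
        # Mutation and binomial crossover, clipped to the bounds.
        for j, (lo, hi) in enumerate(bounds):
            if random.random() < CR or j == jrand:
                v = pop[a][j] + F * (pop[b][j] - pop[c][j])
            else:
                v = pop[i][j]
            trial.append(min(max(v, lo), hi))
        f = auc_surrogate(*trial)
        if f > fit[i]:               # greedy selection (maximize AUC)
            pop[i], fit[i] = trial, f

best = pop[fit.index(max(fit))]
print(round(best[0]), round(best[1], 2))   # converges near (320, 0.56)
```

Unlike grid search, which only ever evaluates points on a fixed grid, the DE population explores the continuous parameter space and concentrates around the optimum of the objective.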

Table 8. Detection accuracy of the DE-RUSBoost model.

Method          Accuracy   Precision   Recall   Specificity   AUC
RUSBoost [76]   0.863      0.736       0.726    0.872         0.762
WADCNN [75]     0.925      0.766       0.792    0.752         0.801
DE-RUSBoost     0.956      0.902       0.735    0.996         0.896

10.4 Parameter Study

The performance of the proposed algorithm is analyzed for multiple values of its hyper-parameters, i.e., the class imbalance ratio and the number of trees.

10.4.1 Effect of Weak Learners’ Number

The number of weak learners, which in our case are decision trees, impacts the AUC as shown in Figure 16. A small number of trees in RUSBoost results in a low AUC. Increasing the number of trees improves the AUC; however, after a certain number of trees, the AUC becomes stable. Beyond that point, adding further trees makes the classifier unstable and degrades its performance, resulting in a lower AUC. In Figure 16, the DE-RUSBoost performance becomes almost stable after 200 trees, with a minor improvement visible until the maximum AUC is reached at 320 trees. There is no significant improvement in AUC after that point; instead, as clearly visible in the figure, the AUC starts decreasing: at 350 trees it drops slightly compared to 320 trees, and after that a drastic degradation in accuracy is seen. The reason for this degradation with too many weak classifiers is overfitting: the classifier over-trains on the training samples and therefore misclassifies unseen test samples.

10.4.2 Effect of Class Imbalance Ratio

In Figure 17, the impact of class imbalance on the AUC is shown. The percentage in this figure is the share of training samples of the majority class (i.e., fair consumers' consumption data) relative to the minority class. The class imbalance ratio is optimized using DE to achieve the highest AUC, which is obtained at a class imbalance split of 56%:44%. As the imbalance percentage increases, the performance degrades. This happens because the minority class samples become underrepresented: the classifier does not have sufficient training samples of the minority class and is therefore unable to learn and generalize it. After training, the classifier is biased towards the majority class and misclassifies test samples from the minority class. It is clear from Figure 17 that the classifier's performance degrades significantly at high class imbalance percentages.
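The random undersampling step that produces such a target ratio inside RUSBoost can be sketched as follows; the toy sample counts and the `undersample` helper are hypothetical, chosen only to mirror the spirit of the SGCC imbalance:

```python
import random

random.seed(1)

def undersample(majority, minority, target_majority_share):
    """Randomly drop majority-class samples until the training set has
    the requested majority share (e.g. 0.56 for a 56:44 split)."""
    # n_major / (n_major + n_minor) = share  =>  solve for n_major.
    n_major = round(len(minority) * target_majority_share
                    / (1 - target_majority_share))
    n_major = min(n_major, len(majority))
    return random.sample(majority, n_major) + list(minority)

# Toy data: 900 "fair" samples vs 100 "theft" samples (9:1 imbalance).
fair = [("fair", i) for i in range(900)]
theft = [("theft", i) for i in range(100)]

balanced = undersample(fair, theft, 0.56)
n_fair = sum(1 for label, _ in balanced if label == "fair")
print(n_fair, len(balanced) - n_fair)   # → 127 100
```

RUSBoost repeats this sampling at every boosting round, so a different random subset of the majority class is seen each time while the minority class is always fully represented.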

[Figure 16 plots AUC (0.5 to 0.9) against the number of trees (50 to 400) on the SGCC dataset.]

Figure 16. Area under the curve versus number of trees.

[Figure 17 plots AUC against class imbalance percentages ranging from (50:50)% to (74:26)% on the SGCC dataset.]

Figure 17. Area under the curve versus imbalance percentage of training examples of two classes.

11 Future Work

In the previous sections, the preliminary results were presented. The future work is given below:

• The performance of SISO (DLSTM) and MIMO (ESAENARX and DE-RELM) models will be

validated by conducting multiple case studies and applying them to different scenarios.

• The wind and photovoltaic power will be analyzed in depth in order to quantify their impacts on greenhouse gas emissions and electricity generation cost.

• A labeling method will be introduced to label the unlabeled electricity consumption data to identify

the fair and malicious electricity consumers.

References

[1] Daki H, El Hannani A, Aqqal A, Haidine A, Dahbi A. Big Data Management in Smart Grid: Con-

cepts, Requirements and Implementation. Journal of Big Data. 2017 Dec;4(1):13–27.

[2] Li C, Yu X, Yu W, Chen G, Wang J. Efﬁcient Computation for Sparse Load Shifting in Demand Side

Management. IEEE Transactions on Smart Grid. 2016 Feb 12;8(1):250–61.

[3] Zhou K, Fu C, Yang S. Big Data driven Smart Energy Management: From Big Data to Big Insights.

Renewable and Sustainable Energy Reviews. 2016 Apr 1;56:215–225.

[4] Wang K, Yu J, Yu Y, Qian Y, Zeng D, Guo S, Xiang Y, Wu J. A Survey on Energy Internet: Architec-

ture, Approach, and Emerging Technologies. IEEE Systems Journal. 2017 Jan 5;12(3):2403–2416.

[5] Jiang H, Wang K, Wang Y, Gao M, Zhang Y. Energy Big Data: A Survey. IEEE Access.

2016;4:3844–3861.

[6] Liu Y, Wang W, Ghadimi N. Electricity Load Forecasting by an Improved Forecast Engine for Build-

ing Level Consumers. Energy. 2017 Nov 15;139:18–30.

[7] Zhao Y, Ye L, Li Z, Song X, Lang Y, Su J. A Novel Bidirectional Mechanism based on Time Series

Model for Wind Power Forecasting. Applied energy. 2016 Sep 1;177:793–803.

[8] Naz A, Javaid N, Rasheed MB, Haseeb A, Alhussein M, Aurangzeb K. Game Theoretical Energy

Management with Storage Capacity Optimization and Photo-Voltaic Cell Generated Power Forecast-

ing in Micro Grid. Sustainability. 2019 Jan;11(10):2763–2781.

[9] U.S. Department of Energy, Staff Report to the Secretary on Electricity Markets and Reliability, 2017. Online available at: https://www.energy.gov/downloads (Last accessed on 1st March 2019).

[10] De Jong P, Kiperstok A, Sanchez AS, Dargaville R, Torres EA. Integrating Large Scale Wind Power

into the Electricity Grid in the Northeast of Brazil. Energy. 2016 Apr 1;100:401–415.

[11] Global Wind Energy Council. GWEC Global Wind Report 2016. Online available at: https://

gwec.net/publications/global-wind-report-2/global-wind-report-2016

(Last accessed on 1st March 2019).

[12] U.S. Department of Energy, 20% Wind Energy by 2030: Increasing Wind Energy’s Contribution to

US Electricity Supply, Energy Efﬁciency and Renewable Energy (EERE), 2008. Online available at:

https://www.energy.gov/eere/wind (Last accessed on 1st March 2019).

[13] Athari MH, Wang Z. Impacts of Wind Power Uncertainty on Grid Vulnerability to Cascading Over-

load Failures. IEEE Transactions on Sustainable Energy. 2017 Jun 22;9(1):128–137.

[14] Wang Q, Martinez-Anido CB, Wu H, Florita AR, Hodge BM. Quantifying the Economic and Grid

Reliability Impacts of Improved Wind Power Forecasting. IEEE Transactions on Sustainable Energy.

2016 May 13;7(4):1525–1537.

[15] Swinand GP, O’Mahoney A. Estimating the Impact of Wind Generation and Wind Forecast Errors

on Energy Prices and Costs in Ireland. Renewable energy. 2015 Mar 1;75:468–473.

[16] Chen Z. Wind Power in Modern Power Systems. Journal of Modern Power Systems and Clean

Energy. 2013 Jun 1;1(1):2–13.

[17] Haque AU, Nehrir MH, Mandal P. A Hybrid Intelligent Model for Deterministic and Quantile Re-

gression Approach for Probabilistic Wind Power Forecasting. IEEE Transactions on power systems.

2014 Jan 28;29(4):1663–1672.

[18] Juban J, Siebert N, Kariniotakis GN. Probabilistic Short-Term Wind Power Forecasting for the

Optimal Management of Wind Generation. In 2007 IEEE Lausanne Power Tech 2007. Jul 1;683–

688. IEEE.

[19] Akhavan-Hejazi H, Mohsenian-Rad H. Power Systems Big Data Analytics: An Assessment of

Paradigm Shift Barriers and Prospects. Energy Reports. 2018 Nov 1;4:91–100.

[20] Zhang Q, Yang LT, Chen Z, Li P. A survey on Deep Learning for Big Data. Information Fusion.

2018 Jul 1;42:146–157.

[21] Nadeem Z, Javaid N, Malik A, Iqbal S. Scheduling Appliances with GA, TLBO, FA, OSR and their

Hybrids using Chance Constrained Optimization for Smart Homes. Energies. 2018 Apr;11(4):888–

1005.

[22] Wang K, Xu C, Zhang Y, Guo S, Zomaya AY. Robust Big Data Analytics for Electricity Price

Forecasting in the Smart Grid. IEEE Transactions on Big Data. 2017 Jul 5;5(1):34–45.

[23] Mahmood D, Javaid N, Alrajeh N, Khan Z, Qasim U, Ahmed I, Ilahi M. Realistic Scheduling

Mechanism for Smart Homes. Energies. 2016 Mar;9(3):202–220.

[24] Rasheed MB, Javaid N, Malik MS, Asif M, Hanif MK, Chaudary MH. Intelligent Multi-agent based

Multilayered Control System for Opportunistic Load Scheduling in Smart Buildings. IEEE Access.

2019 Feb 18;7:23990–24006.

[25] Rasheed M, Javaid N, Ahmad A, Khan Z, Qasim U, Alrajeh N. An Efﬁcient Power Scheduling

Scheme for Residential Load Management in Smart Homes. Applied Sciences. 2015;5(4):1134–

1163.

[26] Hafeez G, Javaid N, Iqbal S, Khan F. Optimal Residential Load Scheduling under Utility and

Rooftop Photovoltaic Units. Energies. 2018 Mar;11(3):611–633.

[27] Javaid N, Ahmed F, Ullah I, Abid S, Abdul W, Alamri A, Almogren A. Towards Cost and Comfort-

Based Hybrid Optimization for Residential Load Scheduling in A Smart Grid. Energies. 2017 Oct

8;10(10):1546–1568.

[28] Naz M, Iqbal Z, Javaid N, Khan Z, Abdul W, Almogren A, Alamri A. Efﬁcient Power Scheduling

in Smart Homes using Hybrid Grey Wolf Differential Evolution Optimization Technique with Real

Time and Critical Peak Pricing Schemes. Energies. 2018 Feb 7;11(2):384–409.

[29] Khalid R, Javaid N, Rahim MH, Aslam S, Sher A. Fuzzy Energy Management Controller and

Scheduler for Smart Homes. Sustainable Computing: Informatics and Systems. 2019 Mar 1;21:103–

118.

[30] Samuel O, Javaid S, Javaid N, Ahmed S, Afzal M, Ishmanov F. An Efﬁcient Power Scheduling in

Smart Homes Using Jaya Based Optimization with Time-of-Use and Critical Peak Pricing Schemes.

Energies. 2018 Nov;11(11):3155–3179.

[31] Javaid N, Ahmed A, Iqbal S, Ashraf M. Day Ahead Real Time Pricing and Critical Peak

Pricing-Based Power Scheduling for Smart Homes with Different Duty Cycles. Energies. 2018

Jun;11(6):1464–1481.

[32] Rahim M, Khalid A, Javaid N, Ashraf M, Aurangzeb K, Altamrah A. Exploiting Game Theoretic-

Based Coordination Among Appliances in Smart Homes for Efﬁcient Energy Utilization. Energies.

2018 Jun;11(6):1426–1442.

[33] Javaid N, Ahmed F, Ullah I, Abid S, Abdul W, Alamri A, Almogren A. Towards Cost and Comfort

Based Hybrid Optimization for Residential Load Scheduling in A Smart Grid. Energies. 2017 Oct

8;10(10):1546–1567.

[34] Javaid N, Ullah I, Akbar M, Iqbal Z, Khan FA, Alrajeh N, Alabed MS. An Intelligent Load Management System with Renewable Energy Integration for Smart Homes. IEEE Access. 2017 Jun 14;5:13587–13600.

[35] Liu JP, Li CL. The Short-Term Power Load Forecasting Based on Sperm Whale Algorithm and

Wavelet Least Square Support Vector Machine with DWT-IR for Feature Selection. Sustainability.

2017 Jul;9(7):1188–1204.

[36] Fan GF, Peng LL, Zhao X, Hong WC. Applications of Hybrid EMD with PSO and GA for A

SVR-Based Load Forecasting Model. Energies. 2017 Oct 26;10(11):1713–1734.

[37] Ghasemi A, Shayeghi H, Moradzadeh M, Nooshyar M. A Novel Hybrid Algorithm for Electricity

Price and Load Forecasting in Smart Grids with Demand-Side Management. Applied energy. 2016

Sep 1;177:40-59.

[38] Singh S, Yassine A. Big Data Mining of Energy Time Series for Behavioral Analytics and Energy

Consumption Forecasting. Energies. 2018 Feb 20;11(2):452–470.

[39] Wang L, Zhang Z, Chen J. Short-Term Electricity Price Forecasting with Stacked Denoising Au-

toencoders. IEEE Transactions on Power Systems. 2016 Nov 15;32(4):2673–2681.

[40] Tong C, Li J, Lang C, Kong F, Niu J, Rodrigues JJ. An Efﬁcient Deep Model for Day-Ahead Elec-

tricity Load Forecasting with Stacked Denoising Autoencoders. Journal of Parallel and Distributed

Computing. 2018 Jul 1;117:267–273.

[41] Ahmad A, Javaid N, Guizani M, Alrajeh N, Khan ZA. An Accurate and Fast Converging Short-

Term Load Forecasting Model for Industrial Applications in A Smart Grid. IEEE Transactions on

Industrial Informatics. 2016 Dec 9;13(5):2587–2596.

[42] Ahmad A, Javaid N, Alrajeh N, Khan Z, Qasim U, Khan A. A Modiﬁed Feature Selection and

Artiﬁcial Neural Network-Based Day-Ahead Load Forecasting Model for A Smart Grid. Applied

Sciences. 2015;5(4):1756–1772.

[43] Kuo PH, Huang CJ. An Electricity Price Forecasting Model by Hybrid Structured Deep Neural

Networks. Sustainability. 2018 Apr;10(4):1280–1300.

[44] Ugurlu U, Oksuz I, Tas O. Electricity Price Forecasting using Recurrent Neural Networks. Energies.

2018 May;11(5):1255–1278.

[45] Fan C, Xiao F, Zhao Y. A Short-Term Building Cooling Load Prediction Method using Deep Learn-

ing Algorithms. Applied energy. 2017 Jun 1;195:222–233.

[46] Ryu S, Noh J, Kim H. Deep Neural Network-Based Demand Side Short Term Load Forecasting.

Energies. 2016 Dec 22;10(1):3–21.

[47] Mocanu E, Nguyen PH, Gibescu M, Kling WL. Deep Learning for Estimating Building Energy

Consumption. Sustainable Energy, Grids and Networks. 2016 Jun 1;6:91–99.

[48] Li C, Ding Z, Zhao D, Yi J, Zhang G. Building Energy Consumption Prediction: An Extreme Deep

Learning Approach. Energies. 2017;10(10):1525–1543.

[49] Fu G. Deep Belief Network-Based Ensemble Approach for Cooling Load Forecasting of Air-

Conditioning System. Energy. 2018 Apr 1;148:269–282.

[50] Dedinec A, Filiposka S, Dedinec A, Kocarev L. Deep Belief Network-Based Electricity Load Fore-

casting: An Analysis Of Macedonian Case. Energy. 2016 Nov 15;115:1688–1700.

[51] Ahmad A, Javaid N, Mateen A, Awais M, Khan Z. Short-Term Load Forecasting in Smart Grids:

An Intelligent Modular Approach. Energies. 2019 Jan;12(1):164–185.

[52] Zahid M, Ahmed F, Javaid N, Abbasi RA, Kazmi Z, Syeda H, Javaid A, Bilal M, Akbar M, Ilahi

M. Electricity Price and Load Forecasting Using Enhanced Convolutional Neural Network and En-

hanced Support Vector Regression in Smart Grids. Electronics. 2019 Feb;8(2):122–142.

[53] Khan M, Javaid N, Naseem A, Ahmed S, Riaz M, Akbar M, Ilahi M. Game Theoretical Demand

Response Management and Short-Term Load Forecasting By Knowledge-Based Systems on The

Basis of Priority Index. Electronics. 2018 Dec;7(12):431–455.

[54] Naz A, Javed MU, Javaid N, Saba T, Alhussein M, Aurangzeb K. Short-Term Electric Load And

Price Forecasting Using Enhanced Extreme Learning Machine Optimization in Smart Grids. Ener-

gies. 2019 Jan;12(5):866–887.

[55] Mujeeb S, Javaid N, Akbar M, Khalid R, Nazeer O, Khan M. Big Data Analytics for Price and Load Forecasting in Smart Grids. In International Conference on Broadband and Wireless Computing, Communication and Applications 2018 Oct 27, 77–87. Springer, Cham.

[56] Qiu X, Ren Y, Suganthan PN, Amaratunga GA. Empirical Mode Decomposition-Based Ensemble

Deep Learning for Load Demand Time Series Forecasting. Applied Soft Computing. 2017 May

1;54:246–255.

[57] Rahman A, Srikumar V, Smith AD. Predicting Electricity Consumption for Commercial and Res-

idential Buildings using Deep Recurrent Neural Networks. Applied energy. 2018 Feb 15;212:372–

385.

[58] Bouktif S, Fiaz A, Ouni A, Serhani M. Optimal Deep Learning LSTM Model for Electric Load

Forecasting Using Feature Selection and Genetic Algorithm: Comparison with Machine Learning

Approaches. Energies. 2018 Jul;11(7):1636–1658.

[59] Mujeeb S, Javaid N, Javaid S. Data Analytics for Price Forecasting in Smart Grids: A Survey. In

2018 IEEE 21st International Multi-Topic Conference (INMIC) 2018 Nov 1, 1–10. IEEE.

[60] Mujeeb S, Javaid N, Akbar M, Khalid R, Nazeer O, Khan M. Big Data Analytics for Price and Load Forecasting in Smart Grids. In International Conference on Broadband and Wireless Computing, Communication and Applications 2018 Oct 27, 77–87. Springer, Cham.

[61] Ayub N, Javaid N, Mujeeb S, Zahid M, Khan WZ, Khattak MU. Electricity Load Forecasting in Smart Grids Using Support Vector Machine. In International Conference on Advanced Information Networking and Applications 2019 Mar 27, 1–13. Springer, Cham.

[62] Mujeeb S, Javaid N, Gul H, Daood N, Shabbir S, Arif A. Wind Power Forecasting based on Efficient Deep Convolution Neural Networks. In International Conference on P2P, Parallel, Grid, Cloud and Internet Computing 2019 Nov 7, 47–56. Springer, Cham.

[63] Zheng H, Yuan J, Chen L. Short-Term Load Forecasting using EMD-LSTM Neural Networks with

a XGboost Algorithm for Feature Importance Evaluation. Energies. 2017 Aug;10(8):1168–1188.

[64] Shi H, Xu M, Li R. Deep Learning for Household Load Forecasting–A Novel Pooling Deep RNN.

IEEE Transactions on Smart Grid. 2017 Mar 22;9(5):5271–5280.

[65] Perez-Chacon R, Luna-Romera J, Troncoso A, Martinez-Alvarez F, Riquelme J. Big Data Analytics

for Discovering Electricity Consumption Patterns in Smart Cities. Energies. 2018;11(3):683–700.

[66] Grolinger K, L’Heureux A, Capretz MA, Seewald L. Energy Forecasting for Event Venues: Big

Data and Prediction Accuracy. Energy and Buildings. 2016 Jan 15;112:222–233.

[67] Wang P, Liu B, Hong T. Electric Load Forecasting with Recency Effect: A Big Data Approach.

International Journal of Forecasting. 2016 Jul 1;32(3):585–597.

[68] Wang HZ, Li GQ, Wang GB, Peng JC, Jiang H, Liu YT. Deep Learning based Ensemble Approach

for Probabilistic Wind Power Forecasting. Applied energy. 2017 Feb 15;188:56–70.

[69] Torres JM, Aguilar RM. Using Deep Learning to Predict Complex Systems: A Case Study in Wind

Farm Generation. Complexity. 2018;2018:1–10.

[70] Qureshi AS, Khan A, Zameer A, Usman A. Wind Power Prediction Using Deep Neural Network

Based Meta Regression and Transfer Learning. Applied Soft Computing. 2017 Sep 1;58:742–755.

[71] Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y. Short-Term Residential Load Forecasting based

on LSTM Recurrent Neural Network. IEEE Transactions on Smart Grid. 2019 Jan 18;10(1):841–851.

[72] Sun M, Zhang T, Wang Y, Strbac G, Kang C. Using Bayesian Deep Learning to Capture Uncer-

tainty for Residential Net Load Forecasting. IEEE Transactions on Power Systems. 2019 Jun 21;

DOI:10.1109/TPWRS.2019.2924294.

[73] Ye C, Ding Y, Wang P, Lin Z. A Data-Driven Bottom-Up Approach for Spatial and Temporal

Electric Load Forecasting. IEEE Transactions on Power Systems. 2019 Jan 4;34(3):1966–1979.

[74] Gao W, Darvishan A, Toghani M, Mohammadi M, Abedinia O, Ghadimi N. Different States of

Multi-Block based Forecast Engine for Price and Load Prediction. International Journal of Electrical

Power & Energy Systems. 2019 Jan 1;104:423–435.

[75] Zheng Z, Yang Y, Niu X, Dai HN, Zhou Y. Wide and Deep Convolutional Neural Networks for

Electricity-Theft Detection to Secure Smart Grids. IEEE Transactions on Industrial Informatics. 2017

Dec 21;14(4):1606–1615.

[76] Avila NF, Figueroa G, Chu CC. NTL Detection in Electric Distribution Systems using the Maximal

Overlap Discrete Wavelet-Packet Transform and Random Undersampling Boosting. IEEE Transac-

tions on Power Systems. 2018 Jul 5;33(6):7171–7180.

[77] Mujeeb S, Javaid N, Ilahi M, Wadud Z, Ishmanov F, Afzal MK. Deep Long Short-Term Mem-

ory: A New Price and Load Forecasting Scheme for Big Data in Smart Cities. Sustainability. 2019

Jan;11(4):987–1016.

[78] Sutskever I, Vinyals O, Le QV. Sequence to Sequence Learning with Neural Networks. In Advances

in neural information processing systems 2014, 3104–3112.

[79] Zaytar MA, El Amrani C. Sequence to Sequence Weather Forecasting with Long Short-Term

Memory Recurrent Neural Networks. International Journal of Computer Applications. 2016

Jun;143(11):7–11.

[80] Mujeeb S, Javaid N. ESAENARX and DE-RELM: Novel Schemes for Big Data Predictive Analyt-

ics of Electricity Load and Price. Sustainable Cities and Society. 2019 Nov 1;51:101642–101655.

[81] Hida T, Kuo HH, Potthoff J, Streit L. White Noise: An Inﬁnite Dimensional Calculus. Springer

Science & Business Media; 2013 Jun 29.

[82] Coifman RR, Wickerhauser MV. Entropy-Based Algorithms for Best Basis Selection. IEEE Trans-

actions on Information Theory. 1992 Mar;38(2):713–718.

[83] Chen X, Li S, Wang W. New De-Noising Method for Speech Signal Based on Wavelet Entropy and

Adaptive Threshold. Journal of Information & Computational Science. 2015, 12(3):1257–1265.

[84] Mujeeb S, Alghamdi TA, Ullah S, Fatima A, Javaid N, Saba T. Exploiting Deep Learning for Wind

Power Forecasting Based on Big Data Analytics. Applied Sciences. 2019 Jan;9(20):4417–4445.

[85] Fukushima K. Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern

Recognition Unaffected by Shift in Position. Biological Cybernetics. 1980 Apr 1;36(4):193–202.

[86] Figueroa G, Chen YS, Avila N, Chu CC. Improved Practices in Machine Learning Algorithms for

NTL Detection with Imbalanced Data. In 2017 IEEE Power & Energy Society General Meeting 2017

Jul 16, 1–5. IEEE.

[87] Winkler D, Haltmeier M, Kleidorfer M, Rauch W, Tscheikner-Gratl F. Pipe Failure Modelling for

Water Distribution Networks using Boosted Decision Trees. Structure and Infrastructure Engineer-

ing. 2018 Oct 3;14(10):1402–1411.

[88] Krawczyk B, Galar M, Jelen L, Herrera F. Evolutionary Undersampling Boosting for Imbalanced

Classiﬁcation of Breast Cancer Malignancy. Applied Soft Computing. 2016 Jan 1;38:714–726.

[89] Elhassan, T. and Aljurf, M. Classiﬁcation of Imbalance Data using Tomek Link (T-Link) Combined

with Random Under-sampling (RUS) as a Data Reduction Method. Global Journal of Technology &

Optimization. 2016, DOI: 10.4172/2229-8711.S1:111.

[90] Civicioglu P, Besdok E, Gunen MA, Atasever UH. Weighted Differential Evolution Algorithm for

Numerical Function Optimization: A Comparative Study with Cuckoo Search, Artiﬁcial Bee Colony,

Adaptive Differential Evolution, And Backtracking Search Optimization Algorithms. Neural Com-

puting and Applications. 2018:1–15.

[91] ISO NE Market Operations Data, https://www.iso-ne.com/isoexpress/web/

reports/pricing/-/tree/zone-info (Last visited on 10th February 2019).

[92] Yang Z, Ce L, Lian L. Electricity Price Forecasting by A Hybrid Model, Combining Wavelet Trans-

form, ARMA and Kernel-Based Extreme Learning Machine Methods. Applied Energy. 2017 Mar

15;190:291–305.

[93] Buitrago J, Asfour S. Short-Term Forecasting of Electric Loads Using Nonlinear Autoregressive

Artiﬁcial Neural Networks with Exogenous Vector Inputs. Energies. 2017 Jan 1;10(1):40–60.

[94] Lago J, De Ridder F, De Schutter B. Forecasting spot electricity prices: Deep Learning Approaches

and Empirical Comparison of Traditional Algorithms. Applied Energy. 2018 Jul 1;221:386–405.

[95] Derrac J, Garcia S, Molina D, Herrera F. A Practical Tutorial on The Use of Nonparametric Sta-

tistical Tests as A Methodology for Comparing Evolutionary and Swarm Intelligence Algorithms.

Swarm and Evolutionary Computation. 2011 Mar 1;1(1):3–18.

[96] Martin P, Moreno G, Rodriguez F, Jimenez J, Fernandez I. A Hybrid Approach to Short-Term Load

Forecasting Aimed at Bad Data Detection in Secondary Substation Monitoring Equipment. Sensors.

2018 Nov;18(11):3947–3963.

[97] Diebold FX, Mariano RS. Comparing Predictive Accuracy. Journal of Business & economic statis-

tics. 2002 Jan 1;20(1):134–144.

[98] Ludwig N, Feuerriegel S, Neumann D. Putting Big Data Analytics to Work: Feature Selection for

Forecasting Electricity Prices using the LASSO and Random Forests. Journal of Decision Systems.

2015 Jan 2;24(1):19–36.

[99] ISO NE Generation Data, https://www.iso-ne.com/isoexpress/web/reports/operations/-/tree/daily-gen-fuel-type (Last visited on 20th January 2019).

[100] Chen H, Wan Q, Wang Y. Reﬁned Diebold-Mariano test methods for the evaluation of wind power

forecasting models. Energies. 2014 Jul 1;7(7):4185–4198.

[101] State Grid Corporation of China Data, http://www.sgcc.com.cn (Last visited on 20th September 2019).

Tentative Time Table

Sr No.   Activity                                          Date
1        Background study and detailed literature review   Completed
2        Formulation of problem and proposing solution     Completed
3        Analysis and dissemination of results             April
4        Thesis Writing                                    May

PART II

Recommendation by the Research Supervisor

Name_________________________Signature_____________________Date________

Recommendation by the Research Co-Supervisor

Name_________________________Signature_____________________Date________

Signed by Supervisory Committee

S.# Name of Committee member Designation Signature & Date

1 Dr. Sohail Asghar Professor

2 Dr. Nadeem Javaid Associate Professor

3 Dr. Manzoor Ilahi Associate Professor

4 Dr. Majid Iqbal Associate Professor

Approved by Departmental Advisory Committee

Certiﬁed that the synopsis has been seen by members of DAC and considered it suitable for

putting up to BASAR.

Secretary

Departmental Advisory Committee

Name: _____________________________

Signature: _____________________________

Date: _____________________________

Chairman/HoD: ____________________________

Signature: _____________________________

Date: _____________________________

PART III

Dean, Faculty of Information Sciences & Technology

_____________________Approved for placement before BASAR.

_____________________Not Approved on the basis of following reasons

Signature_____________________Date________

Secretary BASAR

_____________________Approved for placement before BASAR.

_____________________Not Approved on the basis of following reasons

Signature_____________________Date________

Dean, Faculty of Information Sciences & Technology

________________________________________________________________________

________________________________________________________________________

________________________________________________________________________

Signature_____________________Date________

Please provide the list of courses studied

1. Special Topics in Artiﬁcial Neural Networks

2. Advanced Topics in Data Mining

3. Advanced Topics in Computer Vision

4. Special Topics in Machine Learning

5. Advanced Topics in Digital Image Processing

6. Special Topics in Computer Vision