All content in this area was uploaded by Nadeem Javaid on May 30, 2019

ESAENARX and DE-RELM: Novel Schemes for Big Data Predictive

Analytics of Electricity Load and Price

Sana Mujeeb (a), Nadeem Javaid (a,∗)

(a) COMSATS University Islamabad, Islamabad 44000, Pakistan

ARTICLE INFO

Keywords:

Coordination

Dynamic programming

Knapsack

Multi-objective optimization

Pareto front

Meta-heuristic

Nature-inspired

Bird swarm and Cuckoo search algorithm

Hybrid technique

Demand side management

Demand response

Smart grid.

ABSTRACT

Accurate forecasting of the electricity price and load is an essential and challenging task in smart grids. Since electricity load and price are strongly correlated, the forecast accuracy degrades when the bidirectional relation between price and load is not considered. Therefore, this paper considers the price and load relationship and proposes two Multiple Inputs Multiple Outputs (MIMO) Deep Recurrent Neural Network (DRNN) models for price and load forecasting. The first proposed model, Efficient Sparse Autoencoder Nonlinear Autoregressive Network with eXogenous inputs (ESAENARX), comprises feature engineering and forecasting. For feature engineering, we propose ESAE, and forecasting is performed using the existing NARX method. The second proposed model, Differential Evolution Recurrent Extreme Learning Machine (DE-RELM), is based on the RELM model and the meta-heuristic DE optimization technique. Descriptive and predictive analyses are performed on the big data of two well-known electricity markets, i.e., ISO NE and PJM. The proposed models outperform their sub-models and a benchmark model. The refined and informative features extracted by ESAE improve the forecasting accuracy of ESAENARX, and optimization improves the accuracy of DE-RELM. Compared to the cascade Elman network, ESAENARX reduces MAPE by up to 16% for load forecasting and 7% for price forecasting. DE-RELM reduces MAPE by 1% for both load and price forecasting.

1. Introduction

The smart grid is a modern power supply network that uses communication technology. It consists of automation, control and technology that respond quickly to consumption changes. The smart grid provides energy in an efficient, secure, reliable, economical and environment-friendly manner. Renewable Energy Sources (RESs) of power generation are integrated to reduce carbon emissions. The smart grid allows two-way communication between the consumers and the utility. With the emergence of smart metering infrastructure, consumers are informed about the price per unit in advance. Consumers can adjust their demand load economically, according to the price signals. They can reduce their consumption cost by shifting load to low-price hours. Smart grids create a price-responsive environment where the price varies with a change in demand and vice versa.

In unidirectional grids, there is a one-way interaction from the generation side to consumers. The consumers are not able to respond to the price signal because they are not informed of the dynamically changing price. Demand has therefore shown little or no elasticity to price variations in unidirectional grids. However, with the advent of the smart metering system, consumers are well aware of the price and they control their power consumption accordingly. Therefore, price and demand are highly correlated and interdependent. The market participants need reliable techniques to maximize their profit, which depends on accurate load and price forecasting. Price and demand forecasting also play an important role in energy systems planning, market design, security of supply and operation planning for future power consumption. An accurate forecast is therefore very important. A 1% reduction in the Mean Absolute Percentage Error (MAPE) of the load forecast reduces the generation cost by 0.1% to 0.3% [1]. A 0.1% reduction in generation cost is approximately $1 million annually in a large-scale smart grid. Due to the importance of an accurate forecast of electricity price and load, researchers are still competing to improve the forecast accuracy. Using big data for predictive analytics improves the forecasting accuracy [2]. Electricity data is big data, as smart meters record data at small time intervals [3]. In a large smart grid, approximately 220 million smart meter measurements are recorded daily. Analytics of this energy big data helps the power utilities to get deeper insights into consumer behavior [4]. As the volume of input data increases, training classical forecasting methods becomes difficult. Processing big data with classifier-based models is very difficult because of their high space and time complexity. On the other hand, Deep Neural Networks (DNNs) perform very well on big data [5]. DNNs have an excellent ability of self-learning and nonlinear approximation. They reduce the memory requirement by dividing the training data into mini-batches; after the division, the network is trained on the whole data batch by batch.

∗Corresponding author
nadeemjavaidqau@gmail.com (N. Javaid)
www.njavaid.com (N. Javaid)
ORCID(s): 0000-0003-3777-8249 (N. Javaid)
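Since MAPE is the accuracy measure quoted throughout this paper, a minimal sketch of how it can be computed is given here. It uses the standard definition of MAPE; the function name `mape` and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, returned in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Hypothetical hourly loads in MW (illustrative values only).
actual_load = [100.0, 200.0, 400.0]
forecast_load = [110.0, 190.0, 400.0]
error_percent = mape(actual_load, forecast_load)  # relative errors 10%, 5%, 0%
```

Because MAPE divides by the actual value, it is only meaningful for series that stay away from zero, which is why the sketch rejects zero actuals instead of returning an infinite value.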

The rest of the paper is organized as follows: Section 2 presents the related work, the problem statement is stated in Section 3, descriptions of the methods used are presented in Section 4, the proposed models are described in Section 12 and, for DE-RELM, Section 13, Section 15 presents the simulations and results and Section 22 concludes this article.

Sana et al.: Preprint submitted to Elsevier Page 1 of 16

Big Data Predictive Analytics of Electricity Load and Price

2. Related Work

With the advent of the smart metering system, energy-related data is collected in huge volume, at high velocity and from a variety of sources. This data is referred to as energy big data. To make decisions regarding energy market operation, predictive analytics is performed on this load and price data. An accurate prediction of load is essential for maintaining the demand and supply balance, whereas price forecasting plays an important role in the bidding process and energy trading. Accurate forecasts of electricity load and price are essential to ensure the reliability, stability and security of the smart grid. Electricity load and price have a bi-directional relationship; therefore, simultaneous prediction of load and price yields greater accuracy.

The authors of papers [6,7] have predicted price and load si-

multaneously. Authors of [6] have proposed a hybrid model

for simultaneous forecasting of electricity load and price.

The proposed model consists of three stages, i.e., denois-

ing, feature engineering and forecasting. For denoising,

authors propose a new Wavelet Packet Transform (WPT)

based method, Flexible WPT (FWPT). The features are se-

lected by adjacent features and Conditional Mutual Infor-

mation (CMI). In the forecasting step, Autoregressive In-

tegrated Moving Average (ARIMA) and Nonlinear Least

Square Support Vector Machine (NLSSVM) are employed

for linear and nonlinear modeling. The NLSSVM is opti-

mized using enhanced optimization technique Time Vary-

ing Artiﬁcial Bee Colony (TV-ABC). This hybrid model re-

sults in reasonable forecasting accuracy, however, the model

is highly complex. Moreover, the optimization of forecast-

ing model leads to over-ﬁtting. In paper [7], authors pre-

dict load and price using a multi-stage forecasting approach.

The complex forecasting approach proposed in this work is

comprised of feature selection and multi-stage forecast en-

gine. Features are selected through a modiﬁed Maximum

Relevancy Minimum Redundancy (MRMR) method. Elec-

tricity load and price are forecasted using multi-block Arti-

ﬁcial Neural Network (ANN) known as Elman Neural Net-

work (ENN). The forecasting model is optimized by a shark

smell optimization method. This method results in a reason-

able forecasting accuracy. However, it is computationally

very expensive. The feature engineering process and opti-

mization of ENN increase complexity. Moreover, big data

is not considered in this method. Authors of paper [8]

have conducted a predictive analysis of electricity price forecasting that takes advantage of big data. The relevant features for training the prediction model are selected through an extensive feature engineering process. This process has three steps: firstly, correlated features are selected using Gray Correlation Analysis (GCA). Secondly, a hybrid of two feature selection methods, ReliefF and Random Forest (RF), is used for further feature selection. Lastly, Kernel Principal Component Analysis (KPCA) is applied for dimension reduction. Price is predicted by SVM, and the hyper-parameters of SVM are optimized through a modified Differential Evolution (DE). In paper [9], the authors forecast energy consumption on big data. An analysis of frequent patterns is performed using a supervised clustering method. Energy consumption is forecasted by a Bayesian network.

Authors of paper [10] have utilized the computational power of deep learning for Electricity Price Forecasting (EPF). Stacked Denoising Autoencoder (SDA) and RANSAC-SDA (RS-SDA) models are implemented for online and day-ahead hourly EPF. Three years of data (i.e., January 2012 to November 2014) are utilized in this paper. The data is collected from the Texas, Arkansas, Nebraska, Indiana and Louisiana ISO hubs in the USA. Comprehensive analyses of the capabilities of the RS-SDA and SDA models in EPF are performed. The effectiveness of the proposed models is validated through comparative analyses with classical ANN, SVM (Support Vector Machine) and MARS (Multivariate Adaptive Regression Splines). Both the SDA and RS-SDA models are able to predict the electricity price accurately, with a considerably lower MAPE than the aforementioned models.

A deep learning model for Short-term Load Forecasting

(STLF) is proposed by Tong et al. [11]. The features are

extracted using SDA from the historical electricity load and

corresponding temperature data. Support Vector Regressor

(SVR) model is trained for the day ahead STLF. The SDA

has eﬀectively extracted the abstract features from the data.

SVR model trained on these extracted features forecasts elec-

tricity load with low errors. The proposed model outper-

forms simple SVR and ANN in terms of forecasting accu-

racy which validates its performance.

The Shallow ANN (SANN) is utilized for electricity load forecasting in [12] and [13]. SANNs have the problem of overfitting. To avoid overfitting, hyperparameter optimization is required, which increases the complexity of the forecasting model.

A hybrid deep learning method is applied to forecast price in [14]. Two deep learning methods are combined in this research work. Features are extracted by a Convolutional Neural Network (CNN), and the short-term energy price is predicted using LSTM. Half-hourly price data of PJM 2017 is used for prediction. The price of the previous 24 hours is used to predict the electricity price of the next hour. The hybrid DNN structure has 10 hidden layers: 2 convolution layers, 2 max-pooling layers, 3 Rectified Linear Unit (ReLU) layers, 1 batch normalization layer, 1 LSTM layer for prediction and, as the last hidden layer, a fully connected layer. The CNN feature extractor has 7 hidden layers and the LSTM predictor has 3 hidden layers. The output of the 7th hidden layer of the CNN feature extractor becomes the input of the LSTM predictor. The proposed method outperforms simple CNN, LSTM and various machine learning methods.

Table 1
Related work of load and price forecasting.

| Task | Forecast Horizon | Platform / Testbed | Dataset | Algorithms |
|---|---|---|---|---|
| Load and price forecasting [6] | Short-term | Hourly data of 6 states of USA | NYISO, 2015 | MRMR, Multi-block Elman ANN, Enhanced shark smell optimization |
| Price forecasting [8] | Short-term | Hourly electricity price of 6 states of USA | ISO NE, 2010-2015 | GCA, Random forest (RF), ReliefF, SVM, DE |
| Consumption forecasting [9] | Short and long-term | 6 second resolution consumption of 5 homes with 109 domestic appliances | UK-Dale, 2012-2015 | Association rule mining, Incremental k-means clustering, Bayesian network |
| Price forecasting [10] | Short-term | Hourly price of 5 hubs of MISO | USA, 2012-2014 | Stacked Denoising Autoencoders (SDA) |
| Consumption forecasting [11] | Short-term | Aggregated hourly load of four regions | Los Angeles, California, Florida, New York City, USA, August 2015-2016 | SDA, SVR |
| Consumption forecasting [12] | Short-term | Electricity market data of 3 grids: FE, DAYTOWN, and EKPC | PJM, USA, 2015 | Mutual Information (MI), ANN |
| Consumption forecasting [13] | Short-term | Electricity market data of 2 grids: DAYTOWN, and EKPC | PJM, USA, 2015 | Modified MI + ANN |
| Price forecasting [14] | Short-term | Half hourly price of PJM | Intercontinental Exchange (ICE), USA | Long Short Term Memory (LSTM), Convolutional Neural Network (CNN) |
| Price forecasting [15] | Short-term | Turkish day-ahead market electricity prices | Turkey, 2013-2016 | Recurrent Neural Network (RNN) |
| Cooling load forecasting [16] | Short-term | HVAC cooling load of an educational building | Hong Kong, 2015 | Elastic Net (ELN), SAE, RF, MLR, Gradient Boosting Machines (GBM), Extreme GB tree, SVR |
| Consumption forecasting [17] | Short-term | Hourly load of Korea Electric Power Corporation (KEPCO) | South Korea, 2012-2014 | Restricted Boltzmann Machine (RBM) |
| Consumption forecasting [18] | Short-term | Individual house consumption of 7km of Paris | Individual household electric power consumption, France, 2006-2010 | Conditional RBM (CRBM), Factored CRBM |
| Load forecasting [19] | Short-term | 15 minute resolution of one retail building | Fremont, CA | SAE, ELM |
| Load forecasting [20] | Short-term | 15 minutes cooling consumption of a commercial building in Shenzhen city | Guangdong province, South China, 2015 | Empirical Mode Decomposition (EMD), Deep Belief Networks (DBN) |
| Load forecasting [21] | Short-term | Hourly consumption from Macedonian Transmission Network Operator (MEPSO) | Republic of Macedonia, 2008-2014 | DBN |
| Load forecasting [22] | Short-term | Hourly consumption from Australia | AEMO, 2013 | EMD, DBN |
| Load forecasting [23] | Medium to long-term | Hourly consumption of a public safety building, Salt Lake City, Utah. Aggregated hourly consumption of residential buildings, Austin, Texas | USA, 2015, 2016 | LSTM |
| Load forecasting [24] | Medium-term | Half hourly metropolitan electricity consumption | France, 2008-2016 | LSTM, GA |
| Load forecasting [25] | Short-term | Hourly aggregated consumption of 6 states of USA | ISO NE, 2003-2016 | Xgboost weighted k-means, EMD-LSTM |
| Load forecasting [26] | Short-term | Ireland consumption | Smart meter database of load profile, Ireland | Pooling deep RNN |
| Load forecasting [27] | Short-term | Daily electricity consumption data | 3 Chinese cities, 2014 | Feed Forward DNN (FFDNN), Probability Density Estimation |
| Load and photovoltaic power forecasting [28] | Short-term | Hourly residential power load data | Dataport dataset, 2018 | Deep Recurrent Neural Network (DRNN) with LSTM units |
| Load forecasting [29] | Short-term | Hourly electricity market data | ISO NE, 2007–2012 | Deep RNN |
| Load forecasting [30] | Short-term | Hourly aggregated consumption of 6 states | ISO NE, USA | DRNN, FFDNN |

Authors of [15] have utilized the Gated Recurrent Units

(GRU) in RNN for Energy Price Forecasting (EPF).

Recently, deep learning forecasting methods have shown good performance in electricity price [14–16] and load forecasting [17–30]. However, the interdependency of load and price is not considered in these DNN forecasting models.

In [31], the author discusses the importance of big data applications and analytics in the development of Smart Sustainable Cities (SSCs). An IoT-based framework is proposed to improve the functionalities of SSCs. The importance of accurate load and price forecasting for the smart grid's stability is discussed. The stability of the grid improves the sustainability of SSCs. An SSC uses Information and Communication Technology (ICT) to improve the quality of life, services and urban operations. It ensures that present and future environmental, social, cultural and economic requirements are fulfilled.

The authors of [32] conduct an extensive literature review on

future SSCs. Besides other aspects of future SSCs, energy

eﬃciency is also mooted in this review. The authors describe

the SSC as an energy eﬃcient, eco-friendly and real-time

city. Load demand forecasting plays a key role in energy

management and eﬃciency.

The future trends, architecture and challenges of SSCs are

reviewed in [33]. The major aspects of a smart city are illus-

trated in this study. Smart grid is discussed as an important

component of a smart city. The role of load demand fore-

casting is emphasized in an energy eﬃcient city. Six dimen-

sions of SSCs are explained in [34]. The authors present a

road map towards SSCs. The concept of SSC is elaborated

with the help of six dimensions; one of these dimensions is

energy eﬃciency.

The authors of [35] discuss the present services of smart

cities like load demand forecasting in order to achieve a

sustainable city. The short-term load of Girona University,

Spain is studied. The forecasting model consists of outlier

rejection, feature selection using auto correlation and pre-

diction using auto regression. First, outliers are removed

based on k nearest neighbors and Euclidean distance. Sec-

ondly, highly correlated features with the target class are se-

lected and features having high correlation with other fea-

tures and less correlation with target class are eliminated. Fi-

nally, a classical data-driven prediction model, auto regres-

sion is implemented for STLF. The services embedded in the

studied layered architecture are described in detail, aiming to

make it part of a sustainable city.


3. Problem Statement and Contributions

Authors of paper [8] and [9] have used big data for pre-

dictive analytics. However, the extensive feature engineer-

ing process increases the computational complexity. The

feature engineering involves denoising of inputs, feature se-

lection and dimension reduction. After the feature engineer-

ing step, another important step is the optimization of the

prediction method’s hyperparameters. This optimization is

crucial to achieving accurate forecast results. Feature en-

gineering and model optimization steps make forecasting

complex. To avoid the extensive feature engineering pro-

cess, the deep learning methods are proposed for electricity

price [10] and load [11] forecasting. The mentioned deep

learning based forecasting models have forecasted electric-

ity load and price separately.

The electricity load and price signals have a high correla-

tion. The incorporation of the inherent bi-directional rela-

tion of electricity load and price in prediction models’ inputs

results in high prediction accuracy. The correlation of elec-

tricity load and price is not taken into consideration in [10]

and [11]. A forecasting method is needed that accurately

forecasts the electricity load and price simultaneously. In

this article, a forecasting model is proposed that is based on

deep learning. The proposed method accurately forecasts

electricity load and price simultaneously taking advantage

of big data. The major contributions of this study are listed below:

• The proposed models take advantage of big data. Big data analyses of electricity load and price are presented in this study. Data and forecasting models are analyzed statistically and graphically.

• A new feature extraction scheme based on Sparse Autoencoder (SAE) is introduced in the first proposed model. The performance of SAE is improved by using wavelet packet denoising as a decoding function, which significantly improves the quality of the extracted features. The extracted features serve as refined information and a smooth training input for the forecasting model, the Nonlinear Autoregressive Network with Exogenous variables (NARX).

• The second proposed model is an optimized Recurrent Extreme Learning Machine (RELM). The parameters of RELM are optimized using the meta-heuristic optimization technique differential evolution. The proposed models outperform ELM, RELM, NARX, DE-ELM and Cascade Elman ANN (CEANN) [7].

4. Proposed Model

Before describing the proposed forecasting model, the

utilized methods are introduced. A brief description of the

methods used in the proposed models is given in this section.

5. Artiﬁcial Neural Network for Forecasting

ANNs are inspired by the learning process of biological neural networks. ANNs have the capability to model the complex patterns hidden in the data. The Multilayer Perceptron (MLP) is the simplest and most fundamental architecture of ANN [36]. The MLP comprises neurons, biases and weights. An ANN creates a mapping between the inputs $x_i$ and their respective targets $t_i$. The weights $W_i$ are updated while creating this mapping; the network learns as the weights are updated.

\[ y(t) = f(W_1 x_1 + W_2 x_2 + \ldots + W_n x_n) \tag{1} \]

where $W_i$ are the weights and $f$ is the activation function. The most common algorithm used for updating the weights is gradient descent. It reduces the squared error $E$ using the delta rule:

\[ E = \lVert y(t) - t(t) \rVert^2 \tag{2} \]

where $t(t)$ is the target vector corresponding to the training vector $x(t)$.

\[ w_{ij}^{(\ell)}(t+1) = w_{ij}^{(\ell)}(t) - \alpha \frac{\partial E}{\partial w_{ij}^{(\ell)}(t)} \tag{3} \]

\[ b_{j}^{(\ell)}(t+1) = b_{j}^{(\ell)}(t) - \alpha \frac{\partial E}{\partial b_{j}^{(\ell)}(t)} \tag{4} \]

where $w_{ij}^{(\ell)}(t+1)$ is the updated weight, $w_{ij}^{(\ell)}(t)$ is the weight to be changed, $b_{j}^{(\ell)}(t)$ is the bias and $\alpha$ ($> 0$) is the learning rate.
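The update rules in equations (3) and (4) can be illustrated with a single sigmoid neuron. The following is a minimal sketch, not the authors' implementation: the toy data (a logical OR), the learning rate of 0.5, the number of epochs and the averaging of the gradient over the samples are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_squared_error(x, target, w, b):
    """Squared error of equation (2), averaged over the training samples."""
    return float(np.mean((sigmoid(x @ w + b) - target) ** 2))

def train_neuron(x, target, alpha=0.5, epochs=2000):
    """Fit y = f(w . x + b) with the gradient descent updates (3) and (4)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        y = sigmoid(x @ w + b)
        # dE/dz for E = (y - t)^2 with a sigmoid activation
        delta = 2.0 * (y - target) * y * (1.0 - y)
        w = w - alpha * (x.T @ delta) / len(x)  # equation (3)
        b = b - alpha * delta.mean()            # equation (4)
    return w, b

# Toy data (assumption): two binary inputs, target is their logical OR.
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
target = np.array([0.0, 1.0, 1.0, 1.0])

error_before = mean_squared_error(x, target, np.zeros(2), 0.0)
w, b = train_neuron(x, target)
error_after = mean_squared_error(x, target, w, b)
```

Comparing `error_before` with `error_after` shows the effect of the repeated updates: each step moves the weights and the bias against the gradient of the error, scaled by the learning rate.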

A Deep Neural Network (DNN) is an ANN with a deeper architecture, i.e., several hidden layers. A DNN is computationally more powerful than a Shallow ANN (SANN). The proposed forecasting engines are based on Deep Recurrent Neural Networks (DRNNs), i.e., NARX and LSTM.

6. Sparse Autoencoder

The SAE neural network is an unsupervised learning algorithm that applies the back propagation method with the target values set equal to the inputs, i.e., $y_i = x_i$. The SAE attempts to learn a function $h_{W,b}(x) \approx x$, so that the output $\hat{x}$ is similar to the input $x$; the network must reconstruct the input data. By placing constraints on the network, i.e., limiting the number of hidden units and adding sparsity, an interesting structure of the data is discovered. The network is forced to learn a compressed representation of the input, because it must reconstruct the input given only the vector of hidden unit activations. Generally, the sigmoid is the activation function of the autoencoder, which is designed to obtain a better representation of the input data: $h(X, W, b) = \sigma(WX + b)$. A sparse penalty term is added to the sparse autoencoder cost function to limit the average activation value of the hidden-layer neurons. Normally, a neuron is active when its output value is 1 and inactive when its output value is 0. The purpose of enforcing sparsity is to limit the undesired activation. $a_j(x)$ denotes the activation value of the $j$th hidden neuron. In the process of feature learning, the activation value of the hidden-layer neurons is usually expressed as $a = \sigma(WX + b)$, where $W$ is the weight matrix and $b$ is the bias. The mean activation value of the $j$th neuron in the hidden layer is defined as:

\[ \rho_j = \frac{1}{n} \sum_{i=1}^{n} \left[ a_j(x_i) \right] \tag{5} \]

The average activation of the hidden layer is kept at a low value: a sparsity parameter $\rho$ is defined, and the penalty term is used to prevent $\rho_j$ from deviating from $\rho$. The Kullback-Leibler (KL) divergence [37] is used in this study to enforce this constraint. The mathematical expression of the KL divergence is as follows:

\[ KL(\rho \,\|\, \rho_j) = \rho \ln \frac{\rho}{\rho_j} + (1 - \rho) \ln \frac{1 - \rho}{1 - \rho_j} \tag{6} \]

When $\rho_j$ does not deviate from the parameter $\rho$, the KL divergence value is 0; otherwise, the KL divergence value gradually increases with the deviation. The cost function of the neural network is denoted by $C(W, b)$. Then, the cost function with the sparse penalty term added is:

\[ C_{Sparse} = C(W, b) + \beta \sum_{j=1}^{S_2} KL(\rho \,\|\, \rho_j) \tag{7} \]

where $S_2$ is the number of neurons in the hidden layer and $\beta$ is the weight of the sparse penalty term. The essence of training a neural network is to find the appropriate weight and threshold (bias) parameters $(W, b)$. After the sparse penalty term is defined, the sparse expression can be obtained by minimizing the sparse cost function.

An SAE can be transformed into a Sparse Denoising Autoencoder (SDA). The data is corrupted in a stochastic manner by introducing some noise into it. The network then attempts to reconstruct the original data from the corrupted data.

SAE is capable of discovering the correlation among the features. A refined and most relevant feature representation is achieved using SAE.

Figure 1: Proposed System model. (Diagram labels: Smart Grid; Historic Data; Historic Temperature Forecast; ESAE Feature Extractor; MIMO Forecaster ESAENARX with Input Layer, Time Delay Layer, Hidden Layer 1, Hidden Layer 2 and Output Layer; Load Forecast; Price Forecast.)
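The sparse cost in equations (5) to (7) can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: the reconstruction cost C(W, b) is taken to be a mean squared reconstruction error, and the defaults `rho=0.05` and `beta=4.0` are hypothetical placeholder values; the function names are not from the paper.

```python
import numpy as np

def kl_divergence(rho, rho_j):
    """Equation (6), evaluated element-wise for every hidden neuron."""
    rho_j = np.clip(rho_j, 1e-8, 1.0 - 1e-8)  # guard against log(0)
    return (rho * np.log(rho / rho_j)
            + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_j)))

def sparse_cost(x, x_hat, hidden_activations, rho=0.05, beta=4.0):
    """Equation (7): reconstruction cost plus the weighted sparsity penalty.

    hidden_activations has shape (n_samples, S2); C(W, b) is assumed to be
    the mean squared reconstruction error.
    """
    reconstruction = 0.5 * np.mean(np.sum((x_hat - x) ** 2, axis=1))
    rho_j = hidden_activations.mean(axis=0)        # equation (5)
    penalty = kl_divergence(rho, rho_j).sum()      # sum over j = 1..S2
    return reconstruction + beta * penalty

# Hypothetical check: a perfect reconstruction whose hidden neurons have a
# mean activation equal to rho incurs (numerically) zero cost.
x = np.array([[0.2, 0.8], [0.5, 0.1], [0.9, 0.4], [0.3, 0.3]])
activations = np.full((4, 3), 0.05)
cost_at_target = sparse_cost(x, x, activations)
cost_too_active = sparse_cost(x, x, np.full((4, 3), 0.5))
```

The second call shows the role of the penalty: with the same reconstruction, hidden neurons that are active half of the time are charged a positive cost, which pushes training towards sparse activations.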

7. Eﬃcient SAE (ESAE)

The Eﬃcient SAE (ESAE) is proposed to create a better

representation of electricity data, that is useful for an accu-

rate forecast of price and load. In this section, the proposed

feature extractor Eﬃcient SAE is discussed in detail.

8. Pre-training of ESAE

To initialize the weights and biases, an unsupervised pre-training is applied, in which the input of a hidden layer is the output of its previous layer. In the pre-training step, the initial biases and weights of the autoencoder are learned.

In the proposed method, the input data $X_t$ is corrupted by introducing white noise [38]. The white noise is added to a randomly selected 30% of the data points. A random process $y(t)$ is known as white noise when its power spectral density $S_y(f)$ is constant at all frequencies $f$:

\[ S_y(f) = \frac{N_0}{2} \quad \forall f \tag{8} \]

White noise describes random disturbances with small correlation periods. The generalized correlation function of white noise is defined by:

\[ B(t) = \delta(t)\,\sigma^2 \tag{9} \]

where $\delta(t)$ is the delta function and $\sigma$ is a positive constant.
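The corruption step can be sketched as below. This is a minimal illustration, assuming zero-mean Gaussian white noise; the noise standard deviation `sigma=0.1` and the function name are hypothetical, since the text only states that 30% of the data points are corrupted.

```python
import numpy as np

def corrupt_with_white_noise(x, fraction=0.3, sigma=0.1, seed=0):
    """Add Gaussian white noise to a randomly selected fraction of the points.

    Returns the corrupted copy and the flat indices that were perturbed.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    corrupted = x.copy()
    n_corrupt = int(round(fraction * x.size))
    idx = rng.choice(x.size, size=n_corrupt, replace=False)
    flat = corrupted.reshape(-1)  # view on the freshly made contiguous copy
    flat[idx] += rng.normal(0.0, sigma, size=n_corrupt)
    return corrupted, idx

# Hypothetical normalized input: 50 samples with 2 features each.
clean = np.linspace(0.0, 1.0, 100).reshape(50, 2)
noisy, touched = corrupt_with_white_noise(clean)
```

Sampling the indices without replacement guarantees that exactly the requested share of points is perturbed, while all remaining points stay identical to the clean input.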

9. Fine-tuning of ESAE

The fine-tuning step follows the pre-training step. In fine-tuning, wavelet denoising is proposed as the encoding transfer function of the first hidden layer of ESAE. The activation function of the second layer is sigmoid. The wavelet denoising has two steps: (i) wavelet packet decomposition and (ii) reconstruction with the denoising operation. Firstly, the input time series is decomposed into different frequency bands by passing it through high-pass and low-pass filters. Then the frequency band of the noise is set to zero. The signal is then reconstructed using the wavelet reconstruction function, which is the inverse of the wavelet decomposition function [39].

Figure 2: Step by step flow of proposed model ESAENARX. (Flowchart labels: Start; Min-max normalization of data; Stage 1: Feature Extraction; Pre-training; Corrupting input features with white noise; Encoding with SAE; Fine-tuning; Fine-tuning with efficient SAE; Extracted Features; Stage 2: Prediction; Forecasting by NARX; De-normalization; Price and load forecasts; Finish.)

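Wavelet decomposition operation can be expressed by:

\[ c_{j,k} = \sum_{n} c_{j-1,n}\, h_{n-2k}, \qquad d_{j,k} = \sum_{n} d_{j-1,n}\, g_{n-2k}, \qquad k = 1, 2, \ldots, N-1 \]

where $c_{j,k}$ is the scale coefficient, $d_{j,k}$ is the wavelet coefficient, $h$ and $g$ are the quadrature mirror filter banks, $j$ is the level of decomposition and $N$ is the number of sampling points. The wavelet reconstruction function, which is the inverse of the wavelet decomposition, is expressed as:

\[ c_{j-1,n} = \sum_{n} c_{j}\, h_{k-2n} + \sum_{n} d_{j}\, g_{k-2n} \tag{10} \]

The denoising operation is given by the equation below.

\[ \hat{\omega}_{j,k} = \begin{cases} \operatorname{sign}(\omega_{j,k})\,\bigl(\lvert \omega_{j,k} \rvert - T\lambda\bigr), & \lvert \omega_{j,k} \rvert \geq \lambda, \\ 0, & \lvert \omega_{j,k} \rvert < \lambda, \end{cases} \]

where $\hat{\omega}_{j,k}$ is the denoised signal, $\omega_{j,k}$ is the wavelet transformed signal and $\lambda$ is the threshold.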
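The decompose, threshold and reconstruct sequence can be sketched with a one-level Haar transform. This is a simplified stand-in and not the paper's wavelet packet scheme: the Haar wavelet, the single decomposition level, and the choice of 1 for the factor T are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(coeffs, lam, t=1.0):
    """Shrink coefficients with magnitude >= lam by t * lam; zero the rest."""
    coeffs = np.asarray(coeffs, dtype=float)
    shrunk = np.sign(coeffs) * (np.abs(coeffs) - t * lam)
    return np.where(np.abs(coeffs) >= lam, shrunk, 0.0)

def haar_decompose(signal):
    """One-level Haar split into low-pass (scale) and high-pass (wavelet) parts."""
    s = np.asarray(signal, dtype=float)
    if s.size % 2 != 0:
        raise ValueError("signal length must be even")
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def denoise(signal, lam):
    """Threshold the high-frequency band, then reconstruct the signal."""
    approx, detail = haar_decompose(signal)
    return haar_reconstruct(approx, soft_threshold(detail, lam))

# Hypothetical load samples with a small high-frequency ripple.
series = np.array([10.0, 10.2, 12.0, 11.9, 15.0, 15.1, 13.0, 13.2])
smoothed = denoise(series, lam=0.5)
```

With a threshold of zero the signal is returned unchanged, while a threshold larger than every detail coefficient removes the high-frequency band completely, so each pair of samples is replaced by its average.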

In the ESAE feature extractor, the numbers of units in hidden layers one and two are 400 and 300, respectively. The coefficient that controls the L2 weight regularization is set to 0.001. The sparsity regularization is 4 and the sparsity proportion is 0.05. The maximum number of epochs is 100. The algorithm for learning the weights is scaled conjugate gradient.

10. Non-linear Autoregressive Network with

Exogenous Variables

NARX is an autoregressive RNN. Its feedback connections enclose several hidden layers of the network but not the input layer. NARX has a memory that is utilized for creating a nonlinear mapping between inputs and outputs. The network learns from the recurrence on the past values of the time series and the past predicted values of the network [40]. For predicting a value $y(t)$, the inputs of the NARX are $y(t-1), y(t-2), \ldots, y(t-d)$. NARX can be expressed by the following equation:

\[ \hat{y}(t+1) = f\bigl(y(t), y(t-1), \ldots, y(t-d), x(t+1), x(t), \ldots, x(t-d)\bigr) + \varepsilon(t) \tag{11} \]

where $\hat{y}(t+1)$ is the network's output for time $t+1$, $f(\cdot)$ is the nonlinear mapping function, $y(t), y(t-1), \ldots, y(t-d)$ are the past observed values, $x(t+1), x(t), \ldots, x(t-d)$ are the network's inputs, $d$ is the number of delays and $\varepsilon(t)$ is the error term. In the proposed NARX, for simultaneous forecasting of price and load, the number of delays is 2, the number of hidden layers is 10 and the training function is Levenberg-Marquardt.
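The input arrangement of equation (11) can be made concrete by building the lagged regressor matrix that a NARX network would be trained on. This sketch only prepares the data and does not train a network; it assumes one-dimensional series for `y` and `x` (the MIMO case would stack load and price columns), and the function name and sample values are hypothetical.

```python
import numpy as np

def narx_regressors(y, x, d=2):
    """Build rows [y(t), ..., y(t-d), x(t+1), x(t), ..., x(t-d)] with target y(t+1)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    rows, targets = [], []
    for t in range(d, len(y) - 1):
        past_y = y[t - d:t + 1][::-1]   # y(t), y(t-1), ..., y(t-d)
        exo = x[t - d:t + 2][::-1]      # x(t+1), x(t), ..., x(t-d)
        rows.append(np.concatenate([past_y, exo]))
        targets.append(y[t + 1])
    return np.array(rows), np.array(targets)

# Hypothetical toy series: y could be load and x the exogenous price.
y_series = np.arange(6, dtype=float)         # 0, 1, 2, 3, 4, 5
x_series = 10.0 * np.arange(6, dtype=float)  # 0, 10, 20, 30, 40, 50
inputs, targets = narx_regressors(y_series, x_series, d=2)
```

With two delays, each row holds three past outputs and four exogenous values, so any regressor that maps these seven inputs to `targets` plays the role of the nonlinear function f in equation (11).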

11. Long Short-term Memory

LSTM is a well-known sub-category of the RNN. It is widely used for modeling sequential data. In LSTM, internal states are used to process the input sequence. This structure allows it to learn the dynamic temporal behavior of a time sequence. Unlike feed-forward ANNs, LSTMs use their internal state to process sequences of inputs and remember longer dependencies in the data. The LSTM is used to solve many time sequence problems. LSTM contains three gates: input gate, forget gate and output gate. It has a memory cell that keeps relevant information of the data as a memory. The purpose of the forget gate is to flush out irrelevant data. LSTM can be explained by the following equations:

Suppose an input time series, 𝑥=𝑥1, 𝑥2,…, 𝑥𝑛. The

LSTM models the input time series using recurrence (as

shown in equation 12):

ℎ𝑡=𝑓(𝑥𝑡, ℎ𝑡−1)(12)

Where, ℎ𝑡is the hidden state at time 𝑡,𝑥𝑡is input at time 𝑡

and ℎ𝑡−1 is the previous hidden state, i.e., at time 𝑡− 1. The

Sana et al.: Preprint submitted to Elsevier Page 6 of 16

Big Data Predictive Analytics of Electricity Load and Price

recurrence function $f(\cdot)$ contains gated operations, as shown in equations 13 to 18:

$i_t = \sigma(w_i [x_t, h_{t-1}] + b_i)$ (13)

$f_t = \sigma(w_f [x_t, h_{t-1}] + b_f)$ (14)

$o_t = \sigma(w_o [x_t, h_{t-1}] + b_o)$ (15)

$\tilde{C}_t = \tanh(w_C [x_t, h_{t-1}] + b_C)$ (16)

$C_t = i_t \cdot \tilde{C}_t + f_t \cdot C_{t-1}$ (17)

$h_t = \tanh(C_t) \cdot o_t$ (18)

where $i_t$, $f_t$ and $o_t$ are the input, forget and output gates, respectively; $w_i$, $w_f$ and $w_o$ are their respective weights; and $b_i$, $b_f$ and $b_o$ are their respective biases. $C_t$ is the current state of the memory cell and $\tilde{C}_t$ is the new candidate value for the memory cell. The sigmoid function $\sigma(\cdot)$ maps the gates' values to the range 0 to 1. The gates' decisions depend on the current input $x_t$ and the previous output $h_{t-1}$. A signal is blocked if the gate's value is 0. The forget gate decides how much of the previous cell state $C_{t-1}$ is passed on (equation 17). The input gate defines the amount of new input to be added to the cell state. Based on the cell state, the output gate determines which information is output. In this manner, both short-term and long-term sequence information is learned by the LSTM.
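The gate equations 13 to 18 can be traced with a small pure-Python sketch of a single LSTM time step. The helper names (`affine`, `lstm_step`) and the parameter dictionary layout are assumptions made for illustration; they do not reproduce the authors' MATLAB implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Return weights * vector + bias for a list-of-lists weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following equations 13 to 18."""
    concat = list(x_t) + list(h_prev)                       # [x_t, h_{t-1}]
    i_t = [sigmoid(z) for z in affine(params["w_i"], concat, params["b_i"])]
    f_t = [sigmoid(z) for z in affine(params["w_f"], concat, params["b_f"])]
    o_t = [sigmoid(z) for z in affine(params["w_o"], concat, params["b_o"])]
    c_cand = [math.tanh(z) for z in affine(params["w_c"], concat, params["b_c"])]
    # Equation 17: new cell state mixes the candidate and the old state.
    c_t = [i * g + f * c for i, g, f, c in zip(i_t, c_cand, f_t, c_prev)]
    # Equation 18: hidden state is the squashed cell state, gated by o_t.
    h_t = [math.tanh(c) * o for c, o in zip(c_t, o_t)]
    return h_t, c_t


# Toy case: 2 inputs, 1 hidden unit, all-zero parameters (every gate equals 0.5).
zeros = {name: [[0.0, 0.0, 0.0]] for name in ("w_i", "w_f", "w_o", "w_c")}
zeros.update({name: [0.0] for name in ("b_i", "b_f", "b_o", "b_c")})
h_t, c_t = lstm_step([0.3, -0.1], h_prev=[0.0], c_prev=[2.0], params=zeros)
```

In the toy case the forget gate passes half of the previous cell state and the candidate contributes nothing, which makes the role of each gate easy to check by hand.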

LSTM is superior to a conventional ANN because it can handle the problem of vanishing or exploding gradients. This problem arises while updating the weights. The weights are updated by the delta rule, in which the gradient of the error is taken with respect to each weight (as shown in equation 3). If the gradient becomes too small, the weight updates also become very small and learning stops improving. If the gradient becomes too large, the weights change too much, which leads to non-convergence and instability of the network. LSTM overcomes this problem by using the memory cell $C_t$, which is able to preserve the state over a long period of time. The amount of information to be retained or discarded is controlled by the values of the forget gate, $f_t$, and the input gate, $i_t$. The dependency on individual inputs is also controlled. This increased regulation helps in overcoming the vanishing and exploding gradient problems.
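A rough numerical illustration of this effect follows. It assumes, as a simplification, that the gradient propagated back through the time steps is a product of identical per-step factors; in an LSTM, the factor along the cell-state path of equation 17 is approximately the forget gate value $f_t$. The function name and the chosen factors are hypothetical.

```python
def backpropagated_factor(per_step_gain, steps):
    """Product of identical per-step gradient gains over `steps` time steps."""
    factor = 1.0
    for _ in range(steps):
        factor *= per_step_gain
    return factor


vanishing = backpropagated_factor(0.5, 50)   # gain below 1: gradient shrinks toward zero
exploding = backpropagated_factor(1.5, 50)   # gain above 1: gradient grows rapidly
preserved = backpropagated_factor(1.0, 50)   # gain of 1 (forget gate near 1): gradient kept
```

A forget gate close to 1 therefore lets the error signal travel across many time steps without shrinking or blowing up.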

12. ESAENARX Forecast Model

Deep learning is well known for its high-precision feature extraction. A sparse autoencoder deep neural network with dropout is proposed to extract useful features. This deep neural network can significantly reduce the adverse effect of overfitting, making the learned features more conducive to identification and forecasting. NARX is proposed for load and price forecasting.

A Multiple Inputs Multiple Outputs (MIMO) forecast model is proposed to predict the price and load simultaneously. Features are extracted using ESAE. Then the NARX network is trained for simultaneous forecasting of price and load. The system model is shown in Figure 1. The input features are: hour, temperature forecast, wind speed forecast, lagged load and lagged price. There are two targets: electricity load and price. The prediction process has the following five steps:

1. Inputs and targets are normalized using min-max normalization. Suppose an input vector $X = x_1, x_2, x_3, \ldots, x_n$, where $n$ is the number of instances in the vector. The min-max normalized value is obtained by:

$X_{nor} = \frac{x_i - X_{min}}{X_{max} - X_{min}}$ (19)

where $i = 1, 2, \ldots, n$.

2. The normalized inputs are fed to train the ESAE fea-

ture extractor. After the ESAE is trained, the input fea-

tures are encoded using this trained ESAE. The output

of ESAE is the encoded features.

3. The encoded features are given as input to train NARX

network. 80% data is given for training, 15% is used

for validation and 5% is used for testing.

4. The price and load are predicted for 168 hours that is

one week.

5. The predicted values of load and price are de-

normalized to obtain actual values. The NARX ac-

curately predicts the price and load simultaneously.
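Steps 1, 3 and 5 above can be sketched in a few lines of Python. The function names are hypothetical, the load values are invented for illustration, and the split is taken over contiguous blocks; the text does not state whether the 80/15/5 division is contiguous or random, so that choice is an assumption.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] as in equation 19; also return the min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def min_max_denormalize(normalized, lo, hi):
    """Invert the min-max scaling to recover values in the original units."""
    return [v * (hi - lo) + lo for v in normalized]


def split_80_15_5(samples):
    """Split samples into contiguous training, validation and test blocks."""
    n = len(samples)
    n_train = n * 80 // 100
    n_val = n * 15 // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


load = [12000.0, 15000.0, 18000.0, 13500.0]      # hypothetical load values (MW)
scaled, lo, hi = min_max_normalize(load)
restored = min_max_denormalize(scaled, lo, hi)
train, val, test = split_80_15_5(list(range(100)))
```

Keeping `lo` and `hi` from the normalization step is what makes the de-normalization in step 5 possible.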

The ESAE feature extractor uses wavelet packet denoising as its decoder function, which denoises the input features during extraction. Generally, an SAE has sigmoid decoder functions. Using wavelet packet denoising instead yields a refined and rich representation of the features, and consequently the forecasting accuracy improves significantly. ESAENARX thus achieves good forecasting accuracy with the help of efficient feature extraction.

13. DE-RELM Forecast Model

The second proposed model, DE-RELM, is also a MIMO model like ESAENARX. DE-RELM is an efficient method for electricity load and price forecasting. It has three stages. In the first stage, the parameters of the ELM are optimized by applying the DE algorithm. In the second stage, the ELM is trained. The inputs and outputs of the ELM are both the input features of load and price; with identical inputs and outputs, the ELM acts like an encoder. Once the optimized ELM is trained, the learned weights are set as the initial weights of the RNN that is used for forecasting. The learned weights of the ELM are the best representation of the input data. Setting these initial weights helps the RNN converge faster and forecast accurately. This is the third and final stage of DE-RELM. The number of neurons in the hidden layer of the ELM and the RNN is kept the same, because the dimensions of the weight vectors have to match in order to use the learned weights of the ELM in the RNN. For the prediction of load and price, DE-RELM follows the steps shown in the flowchart in Figure 3.

Figure 3: Flowchart of DE-ELM. [Flowchart. Stages: Stage 1: ELM optimization; Stage 2: Training ELM; Stage 3: Prediction with DE-RELM. Box labels: Start; Min-max normalization of data; Select weights and biases with DE; Calculate objective function; Train ELM with optimized weights; Train ELM with same inputs and outputs; Learned weights; Initialize DE-RELM with learned weights; Forecasting by DE-RELM; Price and load forecasts; De-normalization; Finish.]

1. The inputs and targets are normalized using min-max

normalization (as shown in equation 19).

2. The normalized inputs are given to the ELM networks

as inputs and outputs. The network is trained.

3. The forecasting error is calculated by equation 22.

4. The DE algorithm is used to optimize the weights and biases of ELM. The objective function of DE is the minimization of the prediction error:

$Obj = \text{minimize}\; \frac{1}{n} \sum_{i=1}^{n} \left| \frac{X_i^{act} - y_i^{for}}{X_i^{act}} \right| \times 100$ (20)

where $X_i^{act}$ is the observed value, $y_i^{for}$ is the forecasted value and $n$ is the number of values.

5. When the forecasting error is reduced to the desired

value, the optimized ELM network is trained.

6. The weights of ELM are set as initial weights of the

RNN network.

7. The RNN network predicts the price and load simul-

taneously.

8. The predicted values are de-normalized by inverse

min-max function.

𝑋= [𝑥𝑓 𝑜𝑟 × (𝑋𝑚𝑎𝑥 −𝑋𝑚𝑖𝑛)] + 𝑋𝑚𝑖𝑛 (21)

Where, 𝑥𝑓 𝑜𝑟 is the forecasted value, 𝑋𝑚𝑎𝑥 is the max-

imum value of the actual target and 𝑋𝑚𝑖𝑛 is the mini-

mum value of the actual target.

In DE-RELM, the number of neurons in the hidden layer of

ELM and RNN is 100. ELM has 1 hidden layer. The acti-

vation function of ELM is sigmoid. DE has 100 iterations,

population size is 50, mutation factor is 0.5 and the crossover

rate is 1. The RNN network has 1 hidden layer. The transfer

function is logistic sigmoid.
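The DE settings listed above (100 iterations, population size 50, mutation factor 0.5, crossover rate 1) can be made concrete with a minimal sketch of a DE/rand/1 search that minimizes a MAPE objective. This is not the authors' code: the specific DE variant is an assumption, and a two-parameter linear model on invented data stands in for the ELM so that the example stays short and self-contained.

```python
import random


def mape(actual, forecast):
    """Mean absolute percentage error, the objective of equation 20."""
    return 100.0 / len(actual) * sum(abs((a - f) / a)
                                     for a, f in zip(actual, forecast))


def differential_evolution(objective, dim, pop_size=50, iterations=100,
                           mutation=0.5, crossover=1.0, seed=0):
    """DE/rand/1 with binomial crossover and greedy selection.

    Returns the best vector found and the best error after each generation.
    """
    rng = random.Random(seed)
    # Initial population drawn from a normal distribution.
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    errors = [objective(p) for p in pop]
    history = [min(errors)]
    for _ in range(iterations):
        next_pop, next_errors = list(pop), list(errors)
        for k in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != k], 3)
            forced = rng.randrange(dim)   # one gene always comes from the mutant
            trial = []
            for j in range(dim):
                if j == forced or rng.random() < crossover:
                    trial.append(pop[a][j] + mutation * (pop[b][j] - pop[c][j]))
                else:
                    trial.append(pop[k][j])
            err = objective(trial)
            if err <= errors[k]:          # keep the trial only if it is not worse
                next_pop[k], next_errors[k] = trial, err
        pop, errors = next_pop, next_errors
        history.append(min(errors))
    best = pop[errors.index(min(errors))]
    return best, history


# Toy stand-in for the ELM: a linear model y = w*x + b scored by MAPE.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
objective = lambda p: mape(ys, [p[0] * x + p[1] for x in xs])
best, history = differential_evolution(objective, dim=2)
```

Because selection is greedy, the best error in `history` never increases from one generation to the next; in DE-RELM the vector being optimized would hold the ELM weights and biases rather than `w` and `b`.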

The proposed models have multiple inputs and outputs. In-

puts are: hour, temperature, wind speed, lagged price and

lagged load and outputs are: price and load. The forecast en-

gines create a mapping between inputs and targets. Hence,

a mapping of input hour, temperature, price and load is cre-

ated with target price and target load. The relation between

price and load is captured while creating this mapping. The

price and load are aﬀected by past price and load, therefore,

lagged values are good features for prediction. The load is

aﬀected by temperature. The temperature and lagged values

are the most relevant inputs for price and load prediction.


Moreover, the input features are extracted with the ESAE feature extractor, which further enhances the input to the NARX forecaster. The best mapping of relevant and informative inputs to targets results in improved forecast accuracy.

Both proposed models comprise neural network based encoders (ESAE and the ELM encoder) and deep RNN forecasters (NARX and LSTM). In the first model, ESAENARX, features are extracted by an efficient sparse encoder. In the second model, DE-RELM, an extreme learning machine is used as an encoder to learn the initial weights for the forecast engine.

This study is aimed at helping electricity market experts and

traders. Several market operations beneﬁt from the load

forecasting, such as: formulation of demand-response pro-

grams, generation scheduling and planning new generation

sources. On the other hand, the traders take advantage of

price forecasting for making bidding strategies and market

experts make modiﬁed pricing schemes to control consump-

tion behaviors. No speciﬁc sector (i.e., residential, indus-

trial, commercial, etc.) is targeted in this study, instead ag-

gregated load and average regulation price of two power util-

ities are studied.

14. Applications of Proposed Models

The proposed models forecast electricity load and price.

Both price and load forecasts are useful in the case of smart

grids and micro grids. They help utility experts in under-

standing load and price correlation and dynamics. They have

following applications:

1. Minimize the risk of demand and supply imbalance. If

the generation of electricity is less than the demand,

the grids will not be able to fulﬁll the demands of con-

sumers. If generation is more than demand, the energy

will be wasted.

2. Enable the power utility companies to plan better since

they understand the future load demand.

3. Help to determine the required resources; such as, fu-

els required to operate the generating plants.

4. Maximize utilization of power generating plants. The

load forecasting prevents under generation and over

generation.

Several Independent System Operators (ISOs) take advantage of load forecasting. These ISOs publish day-ahead or month-ahead load forecasting data on their websites, e.g., NYISO [41] and PJM [42]. The proposed forecasting models are applicable in the aforementioned real world scenarios.

15. Simulations and Results

All the simulations are performed using MATLAB

R2018a on a computer with core i3 processor and 8 GB

RAM. In this section, the description of datasets, big data

analysis and results’ discussion are presented.

16. Data Description

The data used for forecasting is taken from the well-

known electricity utilities: ISO NE (Independent System

Operator New England) [43] and PJM [44], USA. Both

datasets are publicly available.

17. ISO NE Electricity Market

ISO NE is an independent system operator that provides

power to the six states of the USA, known as New England.

It serves Maine, Connecticut, Massachusetts, Rhode Island,

Vermont and New Hampshire. Approximately, every year

the transaction of $10 million is made by 400 electricity

market participants in ISO NE. It has almost 7 million con-

sumers: business and household. Hourly electricity market

data of almost 8 years is used for prediction purpose. Dura-

tion of data used in simulations is from January 2011 to June

2018. Total measurements are 65,616. The data utilized in

this paper is aggregated load and regulation capacity clear-

ing price of the ISO NE control area.

18. PJM Electricity Market

PJM Interconnection is a Regional Transmission Organi-

zation (RTO) in the USA. It is an electric transmission sys-

tem that is part of the Eastern Interconnection grid. It sup-

plies power to 14 regions, i.e., Illinois, Delaware, Kentucky,

Indiana, Maryland, New Jersey, Michigan, Ohio, North Car-

olina, Pennsylvania, Virginia, West Virginia, District of

Columbia and Tennessee. The data taken from PJM is hourly

consumption and price of thirteen years, i.e., January 2006

to October 2018. The data comprise 112,300 measurements of load and price each.

19. Performance Evaluation

To evaluate the performance of ESAENARX, three performance measures are used: MAPE, Root Mean Square Error (RMSE) and Normalized RMSE (NRMSE). A lower error value means better forecasting accuracy. MAPE is the mean absolute percentage error between the forecasted and observed values and is defined by the following equation:

$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{X_i^{act} - y_i^{for}}{X_i^{act}} \right| \times 100$ (22)

RMSE is the root mean square error between the forecasted and observed values, and NRMSE is the RMSE normalized by the range of the observed values. They are defined by:

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_i^{act} - y_i^{for} \right)^2}$ (23)

$NRMSE = \frac{RMSE}{\max(X_i^{act}) - \min(X_i^{act})}$ (24)

where $X_i^{act}$ is the observed value, $y_i^{for}$ is the forecasted value and $n$ is the number of values.
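The three error measures of equations 22 to 24 translate directly into code. The following sketch uses hypothetical function names and invented sample values; it is meant only to make the definitions concrete.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error (equation 22)."""
    return 100.0 / len(actual) * sum(abs((a - f) / a)
                                     for a, f in zip(actual, forecast))


def rmse(actual, forecast):
    """Root mean square error (equation 23)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast))
                     / len(actual))


def nrmse(actual, forecast):
    """RMSE normalized by the range of the observed values (equation 24)."""
    return rmse(actual, forecast) / (max(actual) - min(actual))


actual = [100.0, 200.0, 400.0]       # hypothetical observed values
forecast = [110.0, 180.0, 400.0]     # hypothetical forecasts
errors = (mape(actual, forecast), rmse(actual, forecast), nrmse(actual, forecast))
```

Note that MAPE is undefined when an observed value is zero, which matters more for price series than for load series.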

20. Big Data Analytics of Electricity Price and

Demand

In this research study, the big data of load and price are

deeply analyzed. Both visual and statistical analyses are per-

formed. The visual analyses are presented in graphs. The vi-

sual analyses of ISO NE load are shown in Figures 4,8,11,


and 13, and those of PJM load are illustrated in Figures 14, 18, 21 and 23. The price analyses of ISO NE are presented in Figures 5, 9, 10 and 12, and the PJM price is shown in Figures 15, 19, 20 and 22. The price-demand relation of ISO NE is shown in Figures 6 and 7, and that of PJM in Figures 16 and 17. The statistical analysis of the forecast error is shown in Table 2.

ISO NE price and load have daily and weekly seasonality. Price and load have a strong relation in the ISO NE market. The load of 8 years is shown in Figure 4 and the price is shown in Figure 5. The scatter plot in Figure 6 shows the directly proportional relation of price and demand; the correlation coefficient is also shown in the figure. The normalized load and price of one week are shown in Figure 7 for better visualization of their bidirectional relation. The price elasticity of demand is a factor that describes changes in demand with respect to changes in the price. Usually, the demand decreases if the price increases; however, the price elasticity of power demand is low. According to the analysis presented in [45], the price elasticity of demand is −0.1 or less within a year in the USA.

The season affects the energy consumption and price. In the USA there are four seasons in a year: spring is from March to May, summer is from June to August, autumn (fall) is from September to November and winter is from December to February. The summer season has the highest electricity consumption of the year, as shown in Figure 8. The peak consumption hours of summer are from 1:00 pm to 5:00 pm on weekdays. In winters (December to January), the peak consumption hours are from 5:00 pm to 7:00 pm on weekdays. In ISO NE there are two peak load points in a day. The 1st peak point is around 11:00 am and the 2nd peak point is between 4:00 pm and 5:00 pm (as shown in Figure 8). The consumption of 1st January, 1st April, 1st July and 1st October is shown in Figures 8 and 18. These four days are from the four different seasons of the year.
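As a worked illustration of what a price elasticity of demand of −0.1 implies, consider a hypothetical 10% price increase (this figure is chosen for illustration and is not taken from [45]). With $E_d$ denoting the elasticity, $Q$ the demand and $P$ the price:

```latex
E_d = \frac{\Delta Q / Q}{\Delta P / P}
\quad\Rightarrow\quad
\frac{\Delta Q}{Q} = E_d \cdot \frac{\Delta P}{P} = (-0.1)(10\%) = -1\%
```

Under this assumption, demand would fall by only about 1%, which is why power demand is described as having low price elasticity.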

Prices of the same four days are shown in Figures 9 and 19. Both consumption and price are higher in the summer season than in the rest of the year. Buildings require cooling in the hot summer weather, and air conditioners consume a lot of power, which is the major reason for the increase in energy consumption. Electricity prices are relatively high in the winters too. The electricity price and load are lower in the spring season than in the rest of the year, because building heating or cooling is not required in moderate weather, which reduces consumption and ultimately price too. The electricity consumption pattern is fixed with the seasons and time of use. The electricity consumption is higher in the working hours and lower in the nonworking hours.

The load pattern has fewer variations than the price trend. Price and load mostly increase and decrease at the same time. However, there are a few points in time where the energy price increases sharply in an unexpected manner, even though the load does not increase accordingly (as shown in Figure 7, between hours 75 and 82, and Figure 17, between hours 30 and 35). The unexpected change in the price is due to external influential factors other than consumption. The factors that influence energy price are: the available Renewable Energy Resources (RES), fuel prices, economic conditions, excessive use penalties and transmission contingencies. The load is not much affected by most of these factors; it shows little or no variation in response to them. The energy consumption is mainly affected by weather conditions. The electricity consumption and price have continued to increase over the last 8 years, as is clear from Figures 4 and 5. The visual representation of past years' consumption enables utility experts to visualize the increasing demand, which helps in planning new generation plants to satisfy future power demand.

PJM load and price of 13 years (2006-2018) are shown in Figure 14 and Figure 15, respectively. The scatter plot in Figure 6 illustrates the relation of price and load in ISO NE, and Figure 16 shows the price-demand relation in the PJM electricity market. The direct proportionality of the load and price signals can be seen in these two figures. In Figures 7 and 17, the normalized load and price of the 1st week of January 2018 are plotted. The correlation of the price and load signals is demonstrated in these two figures.
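The correlation coefficients reported in the scatter plots are presumably Pearson coefficients; the text does not name the estimator, so that is an assumption. A minimal sketch of the computation, with a hypothetical function name and invented load and price values, is:

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


load = [10500.0, 12000.0, 14000.0, 16500.0]   # hypothetical load (MW)
price = [22.0, 30.0, 41.0, 60.0]              # hypothetical price ($/MWh)
r = pearson(load, price)
```

A value of `r` close to 1 corresponds to the directly proportional relation between load and price described above.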

The proposed models ESAENARX and DE-RELM are used for short-term load and price forecasting. The forecast period is one week, that is, 168 hours. The results of the ISO NE price and load forecasts for the 1st week of June 2018 are shown in Figure 10 and Figure 11. The PJM price and load forecasts for the 1st week of September 2018 are shown in Figures 20 and 21, respectively. The actual and forecasted values are plotted, and the forecasted values follow the trend of the actual values. The forecasted load trend is closer to the actual load trend than the forecasted price trend is to the actual price trend, i.e., the price forecast is slightly less accurate than the load forecast. This is because the load has a similar repetitive pattern, whereas the price pattern has a volatile nature.

Price data exhibit certain characteristics: volatility and sudden, sharp spikes and changes. The nature of price makes its forecasting difficult, and learning the pattern of price requires great effort. Only refined features combined with a good prediction method can produce an accurate price forecast. It is clear from the results of the experiments that ESAENARX forecasts price and load very well.

21. Comparison and Discussion

The proposed methods are compared with four ANN forecasting methods: NARX, ELM, DE-ELM and RELM. These methods are widely used in electricity load and price forecasting. The ESAENARX, ELM, enhanced ELM, NARX and RELM results for the ISO NE price and load forecasts are shown in Figure 12 and Figure 13, respectively. ESAENARX is able to follow the price and load trends better than the compared methods. The reason behind the better forecast accuracy is the representative features extracted by the proposed feature extractor, ESAE. The NARX forecaster is trained with the extracted features and it performs very well. The proposed method takes advantage of the strengths of both SAE and NARX. The SAE is further made efficient for better performance. The detailed comparison of all the compared methods is presented in this section. The results and reasoning are also elaborated with the comparative analysis. Moreover, the strengths and limitations of the compared methods are highlighted.

Figure 4: Load of January 2011 to March 2018, ISO NE. [Plot: load (MW) versus hours.]

Figure 5: Price of January 2011 to March 2018, ISO NE. [Plot: price ($/MWh) versus hours.]

Figure 6: Price-demand signals relation of January 2018 to March 2018, ISO NE. [Scatter plot: price ($/MWh) versus load (MW); correlation coefficient = 0.62.]

Figure 7: Normalized load and price of first week of June 2018, ISO NE. [Plot: normalized load and price versus hours.]

The effect of the proposed feature engineering is clear from the numerical results. The forecast accuracy of ESAENARX with extracted features is much better than that of simple NARX. The extracted features are informative; therefore, the forecaster is able to model the data in a better way and forecast with greater accuracy.

The proposed methods are compared with three types of ELMs: ELM, DE-ELM and RELM. The comparative analysis of these methods is given below.

Figure 8: One day consumption of all four seasons, ISO NE. [Plot: load (MW) versus hours.]

Figure 9: One day energy price of all four seasons, ISO NE. [Plot: price ($/MWh) versus hours.]

Figure 10: Forecasted and observed price of first week of June 2018, ISO NE. [Plot: observed and predicted price ($/MWh) versus hours.]

Figure 11: Forecasted and observed load of first week of June 2018, ISO NE. [Plot: observed and predicted load (MW) versus hours.]


Figure 12: Comparison of ESAENARX and DE-RELM price prediction with NARX, ELM and DE-ELM, ISO NE. [Plot: price ($/MWh) versus hours; series: Observed, ESAENARX, ELM, NARX, DE-ELM, RELM, DE-RELM, CEANN.]

Figure 13: Comparison of ESAENARX and DE-RELM load prediction with NARX, ELM and DE-ELM, ISO NE. [Plot: load (MW) versus hours; series: Observed, ESAENARX, ELM, NARX, RELM, DE-ELM, DE-RELM, CEANN.]

Figure 14: Load of PJM from January 2010 to March 2018. [Plot: load (MW) versus hours.]

Figure 15: Price of PJM from January 2010 to March 2018. [Plot: price ($/MWh) versus hours.]

The ELM is optimized using a meta-heuristic optimization algorithm named differential evolution. The initial weights and biases of the ELM's hidden and output layers are optimized using DE. DE is an optimization method that iteratively improves the performance of an algorithm with respect to the optimization function. In the case of ELM, the performance is improved when the forecast accuracy improves. The objective function is to reduce the forecast error on validation data of electricity load and price. First of all, a population of weights and biases is generated. The population follows the normal distribution. For every selected weight combination, the NRMSE and MAE are calculated. The crossover and mutation operations are performed to generate new combinations of weights and biases. The optimized combination of weights and biases is achieved after multiple iterations of DE. The optimized weights and biases are used in ELM for the price and load forecasting on test data. DE-ELM has a lower error than simple ELM. The accuracy of DE-ELM is improved because the initial weights and biases are optimized according to the data. The accuracy of DE-ELM is better than that of ELM and slightly worse than that of RELM in load forecasting. However, for price forecasting, the performance of DE-ELM degrades. The price data has high nonlinearity and dependency on exogenous variables. Therefore, the relevant features of price need to be extracted carefully. The proposed feature extractor ESAE is capable of extracting the fine details of relevant data. Therefore, the proposed method, ESAENARX, shows good accuracy for both price and load forecasting.

Figure 16: Price-demand signals relation of PJM from January 2018 to March 2018. [Scatter plot: price ($/MWh) versus load (MW); correlation coefficient = 0.87.]

Figure 17: Normalized load and price of PJM first week of January 2018. [Plot: normalized load and price versus hours.]

Figure 18: One day consumption of all four seasons, PJM. [Plot: load (MW) versus hours.]

Figure 19: One day energy price of all four seasons, PJM. [Plot: price ($/MWh) versus hours.]

Figure 20: Actual and predicted price of PJM. [Plot: observed and predicted price ($/MWh) versus hours.]

Figure 21: Actual and predicted load of PJM. [Plot: observed and predicted load (MW) versus hours.]

Figure 22: Comparison of ESAENARX and DE-RELM price prediction with NARX, ELM and DE-ELM, PJM. [Plot: price ($/MWh) versus hours; series: Observed, ESAENARX, NARX, ELM, DE-ELM, RELM, DE-RELM, CEANN.]

Figure 23: Comparison of ESAENARX and DE-RELM load prediction with NARX, ELM and DE-ELM, PJM. [Plot: load (MW) versus hours; series: Observed, ESAENARX, ELM, NARX, DE-ELM, RELM, DE-RELM, CEANN.]

RELM is a variant of the recurrent neural network. It is a combination of two methods, ELM and RNN. The ELM acts as an encoder, where the inputs and outputs of the network are the same, i.e., the input features. The learned weights of the ELM network are set as the initial weights of the RNN. By keeping the inputs and outputs of the ELM network the same, the learned weights are a good representation of the input features. The number of neurons in the hidden layer of the ELM and the RNN is kept the same. Two ELM encoders are trained, one for the hidden layer's weights of the RNN and the second for the output layer's weights of the RNN. The learned weights make the RNN converge faster and to a better solution. The results of RELM are slightly better than those of DE-ELM and comparable to those of NARX. Both RELM and NARX belong to the same category of neural networks, known as recurrent neural networks.

The second proposed method, DE-RELM, performs reasonably well on load forecasting. Its load forecasting results are much better than those of the other techniques and comparable to those of ESAENARX. However, no significant improvement is seen in the price forecast. ESAENARX performs equally well for both load and price. DE-RELM trains the forecaster on learned weights, and only a minor improvement is achieved, which is not comparable to ESAENARX. For the price forecast, only properly extracted features can improve accuracy. ESAE extracts the relevant and most informative features, which improves the forecast accuracy.

ELM has the worst forecast results among the six compared methods, because ELM is a feed-forward network. Its weights are learned once in a forward pass and never updated. Therefore, to achieve acceptable forecast results, the initial weights of the ELM have to be well optimized. NARX performs better than the ELM. However, its forecast results are not as accurate as those of the proposed methods ESAENARX and DE-RELM. The MAPE and NRMSE errors are shown in Table 2. The forecast accuracy of the six methods is in the sequence: ESAENARX > DE-RELM > NARX > DE-ELM > RELM > ELM.

The lower error compared with the other methods verifies the good performance of the ESAENARX forecast model. The PJM results in Figure 22 and Figure 23 show the better accuracy of ESAENARX and DE-RELM as compared to ELM, DE-ELM, RELM and NARX. The MAPE and NRMSE of ESAENARX, DE-RELM, ELM, DE-ELM, RELM, NARX and CEANN [7] are listed in Table 2. The efficiency of ESAENARX and DE-RELM is confirmed by their lower MAPE and RMSE compared to the mentioned methods.

The computational time of both proposed models is presented in Table 3. The computational time of ESAENARX is higher than that of DE-RELM because the feature extractor ESAE involves pre-training and fine-tuning steps. Both models take more time for training on PJM data. The reason behind the longer computational time on PJM is its larger size compared with ISO NE.

Table 2: Comparison of forecasting errors.

Market   Forecast         Method       MAPE    RMSE    NRMSE
ISO NE   Load Forecast    ELM          74.59   7.82    1.53
ISO NE   Load Forecast    NARX         1.35    4.35    0.37
ISO NE   Load Forecast    DE-ELM       21.73   5.23    0.41
ISO NE   Load Forecast    RELM         18.78   4.62    0.37
ISO NE   Load Forecast    CEANN [7]    8.62    3.75    0.57
ISO NE   Load Forecast    DE-RELM      7.78    3.14    0.32
ISO NE   Load Forecast    ESAENARX     1.13    2.27    0.03
ISO NE   Price Forecast   ELM          89.95   9.78    1.91
ISO NE   Price Forecast   NARX         8.29    5.24    0.89
ISO NE   Price Forecast   DE-ELM       28.06   6.92    0.32
ISO NE   Price Forecast   RELM         21.06   5.62    0.28
ISO NE   Price Forecast   CEANN [7]    19.96   4.45    0.96
ISO NE   Price Forecast   DE-RELM      18.62   3.75    0.34
ISO NE   Price Forecast   ESAENARX     3.32    2.85    0.08
PJM      Load Forecast    ELM          72.32   21.2    1.92
PJM      Load Forecast    NARX         32      9.26    1.8
PJM      Load Forecast    DE-ELM       6.52    9.18    0.08
PJM      Load Forecast    RELM         1.14    9.04    0.032
PJM      Load Forecast    CEANN [7]    3.87    8.96    0.64
PJM      Load Forecast    DE-RELM      1.09    5.24    0.028
PJM      Load Forecast    ESAENARX     1.08    3.86    0.03
PJM      Price Forecast   ELM          99      21.6    2.19
PJM      Price Forecast   NARX         8.78    18.72   0.16
PJM      Price Forecast   DE-ELM       18.49   21.76   0.35
PJM      Price Forecast   RELM         11.09   18.96   0.52
PJM      Price Forecast   CEANN [7]    10.74   8.76    0.2604
PJM      Price Forecast   DE-RELM      10.56   7.24    0.18
PJM      Price Forecast   ESAENARX     4.32    4.67    0.12

Table 3: Computational time of proposed algorithms.

Model      Dataset   Training Time (s)   Testing Time (s)
ESAENARX   ISO NE    162                 37
ESAENARX   PJM       187                 53
DE-RELM    ISO NE    104                 28
DE-RELM    PJM       123                 29

22. Conclusion

In this paper, electricity load and price forecasting is con-

sidered in order to take part in the ISO NE and PJM mar-

kets that regulate the price and demand in the power systems

of the USA. The modeling of electricity load and price is

addressed by two new deep learning based models: ESAE-

NARX and DE-RELM. Descriptive and predictive analytics

of electricity big data are performed. The proposed methods

consider the bidirectional impacts of demand and prices on

each other. These methods capture the load and price inter-

dependencies in the past market data. Following conclusions

are drawn from this study:

• The big data analytics unveils insightful information about consumer behaviors and increasing demand. This information helps in the formulation of new demand-response programs and long term decisions, such as upscaling of the grid for satisfying the future demand. Consequently, the grid stability is significantly improved.

• The proposed feature extractor, ESAE, significantly improves the quality of the extracted features, resulting in accurate forecasting. The functionality of ESAE is improved by implementing the proposed combination of decoder functions.

• The proposed models efficiently capture price-demand trends in energy big data. Numerical results show that the proposed forecasting models have lower MAPE and RMSE than the compared methods.

• The feasibility and practicality of the proposed models are confirmed by their accuracy on well-known real electricity market data.

In future work, the SAE feature extractor will be enhanced using multiple combinations of encoder and decoder functions. The effect of each combination on the performance of the feature extractor will be examined. A comparative analysis will be performed on the enhanced feature extractor in order to propose a generalized SAE that performs well on multiple scenarios and datasets. The proposed models can be implemented in real world smart grid or micro grid scenarios in order to improve power system operations.

References

[1] Liu Y, Wang W, Ghadimi N. Electricity load forecasting

by an improved forecast engine for building level con-

sumers. Energy. 2017 Nov 15;139:18-30.

[2] Akhavan-Hejazi H, Mohsenian-Rad H. Power systems

big data analytics: An assessment of paradigm shift bar-

riers and prospects. Energy Reports. 2018 Nov 30;4:91-

100.

[3] Jiang H, Wang K, Wang Y, Gao M, Zhang Y. Energy big

data: A survey. IEEE Access. 2016; 4:3844-61.

[4] Zhou K, Fu C, Yang S. Big data driven smart energy

management: From big data to big insights. Renewable

and Sustainable Energy Reviews. 2016 Apr 1;56:215-

25.

[5] Zhang Q, Yang LT, Chen Z, Li P. A survey on deep

learning for big data. Information Fusion. 2018 Jul 31;

42:146-57.

[6] Ghasemi A, Shayeghi H, Moradzadeh M, Nooshyar M.

A novel hybrid algorithm for electricity price and load

forecasting in smart grids with demand-side manage-

ment. Applied energy. 2016 Sep 1;177:40-59.

[7] Gao W, Darvishan A, Toghani M, Mohammadi M, Abe-

dinia O, Ghadimi N. Diﬀerent states of multi-block

based forecast engine for price and load prediction. In-

ternational Journal of Electrical Power & Energy Sys-

tems. 2019 Jan 1;104:423-35.

[8] Wang K, Xu C, Zhang Y, Guo S, Zomaya A. Robust

big data analytics for electricity price forecasting in the

smart grid. IEEE Transactions on Big Data. 2017 Jul 5,

DOI: 10.1109/TBDATA.2017.2723563.

[9] Singh S, Yassine A. Big data mining of energy time

series for behavioral analytics and energy consumption

forecasting. Energies. 2018 Feb 20;11(2):452.

[10] Wang L, Zhang Z, Chen J. Short-term electricity price

forecasting with stacked denoising autoencoders. IEEE

Transactions on Power Systems. 2017 Jul;32(4):2673-

81.

[11] Tong C, Li J, Lang C, Kong F, Niu J, Rodrigues JJ.

An eﬃcient deep model for day-ahead electricity load

forecasting with stacked denoising autoencoders. Jour-

nal of Parallel and Distributed Computing. 2018 Jul

1;117:267-73.

[12] Ahmad A, Javaid N, Guizani M, Alrajeh N, Khan ZA.

An accurate and fast converging short-term load fore-

casting model for industrial applications in a smart grid.

IEEE Transactions on Industrial Informatics. 2017 Oct

1;13(5):2587-96.

[13] Ahmad A, Javaid N, Alrajeh N, Khan ZA, Qasim U,

Khan A. A modiﬁed feature selection and artiﬁcial neu-

ral network-based day-ahead load forecasting model for

a smart grid. Applied Sciences. 2015 Dec 11;5(4):1756-

72.

[14] Kuo PH, Huang CJ. An Electricity Price Forecasting

Model by Hybrid Structured Deep Neural Networks.

Sustainability. 2018 Apr 21;10(4):1280.

[15] Ugurlu U, Oksuz I, Tas O. Electricity Price Forecasting

Using Recurrent Neural Networks. Energies. 2018 Apr

23;11(5):1-23.

[16] Fan C, Xiao F, Zhao Y. A short-term building cooling

load prediction method using deep learning algorithms.

Applied energy. 2017 Jun 1;195:222-33.

[17] Ryu S, Noh J, Kim H. Deep neural network based de-

mand side short term load forecasting. Energies. 2016

Dec 22;10(1):3.

[18] Mocanu E, Nguyen PH, Gibescu M, Kling WL. Deep

learning for estimating building energy consumption.

Sustainable Energy, Grids and Networks. 2016 Jun

1;6:91-9.

[19] Li C, Ding Z, Zhao D, Yi J, Zhang G. Building energy

consumption prediction: An extreme deep learning ap-

proach. Energies. 2017 Oct 7;10(10):1525.

[20] Fu G. Deep belief network based ensemble approach

for cooling load forecasting of air-conditioning system.

Energy. 2018 Apr 1;148:269-82.

[21] Dedinec A, Filiposka S, Dedinec A, Kocarev L.

Deep belief network based electricity load forecasting:

An analysis of Macedonian case. Energy. 2016 Nov

15;115:1688-700.

[22] Qiu X, Ren Y, Suganthan PN, Amaratunga GA. Empir-

ical mode decomposition based ensemble deep learning

for load demand time series forecasting. Applied Soft

Computing. 2017 May 1;54:246-55.

[23] Rahman A, Srikumar V, Smith AD. Predicting electric-

ity consumption for commercial and residential build-

ings using deep recurrent neural networks. Applied En-

ergy. 2018 Feb 15;212:372-85.

[24] Bouktif S, Fiaz A, Ouni A, Serhani M. Optimal deep

learning lstm model for electric load forecasting using

feature selection and genetic algorithm: Comparison

with machine learning approaches. Energies. 2018 Jun

22;11(7):1636.

[25] Zheng H, Yuan J, Chen L. Short-term load forecast-

ing using EMD-LSTM neural networks with a Xgboost

algorithm for feature importance evaluation. Energies.

2017 Aug 8;10(8):1168.

[26] Shi H, Xu M, Li R. Deep learning for household load

forecasting-A novel pooling deep RNN. IEEE Transac-

tions on Smart Grid. 2018 Sep;9(5):5271-80.

[27] Guo Z, Zhou K, Zhang X, Yang S. A deep learning

model for short-term power load and probability density

forecasting. Energy. 2018 Oct 1;160:1186-200.

[28] Wen L, Zhou K, Yang S, Lu X. Optimal load dispatch

of community microgrid with deep learning based solar

power and load forecasting. Energy. 2019 Jan 16.

[29] Torres JF, Fernandez AM, Troncoso A, Martinez-

Alvarez F. Deep learning-based approach for time series

forecasting with application to electricity load. In In-

ternational Work-Conference on the Interplay Between

Natural and Artiﬁcial Computation 2017 Jun 19 (pp.

203-212). Springer, Cham.

[30] Din GM, Marnerides AK. Short term power load fore-

casting using deep neural networks. In 2017 Interna-

tional Conference on Computing, Networking and Com-

munications (ICNC) 2017 Jan 26 (pp. 594-598). IEEE.

[31] Bibri SE. The IoT for smart sustainable cities of the future: An analytical framework for sensor-based big data applications for environmental sustainability. Sustainable Cities and Society. 2018 Apr 1;38:230-253.

[32] Bibri SE, Krogstie J. Smart sustainable cities of the future: An extensive interdisciplinary literature review. Sustainable Cities and Society. 2017 May 1;31:183-212.

[33] Silva BN, Khan M, Han K. Towards sustainable smart cities: A review of trends, architectures, components, and open challenges in smart cities. Sustainable Cities and Society. 2018 Apr 1;38:697-713.

[34] Ibrahim M, El-Zaart A, Adams C. Smart sustainable cities roadmap: Readiness for transformation towards urban sustainability. Sustainable Cities and Society. 2018 Feb 1;37:530-540.

[35] Massana J, Pous C, Burgas L, Melendez J, Colomer J. Identifying services for short-term load forecasting using data driven models in a Smart City platform. Sustainable Cities and Society. 2017 Jan 1;28:108-17.

[36] White, B.W. Principles of neurodynamics: Perceptrons

and the theory of brain mechanisms. Spartan Books,

Washington DC. 1963.

[37] Youssef A, Delpha C, Diallo D. An optimal fault detection threshold for early detection using Kullback-Leibler divergence for unknown distribution data. Signal Processing. 2016 Mar 1;120:266-79.

[38] Hida T, Kuo HH, Potthoﬀ J, Streit L. White noise: an

inﬁnite dimensional calculus. Springer Science & Busi-

ness Media; 2013 Jun 29.

[39] Chen S, Billings SA, Grant PM. Non-linear system

identiﬁcation using neural networks. International jour-

nal of control.

[40] Chen X, Li S, Wang W. New de-noising method for

speech signal based on wavelet entropy and adaptive

threshold. Journal of Information & Computational Sci-

ence. 2015;12(3):1257-65.

[41] NYISO Market Operation Data, https://www.nyiso.com/load-data (Last visited on 16th March 2019)

[42] PJM Market Operation Data, https://www.pjm.com (Last visited on 16th March 2019)

[43] ISO NE Market Operations Data, https://www.iso-ne.com/isoexpress/web/reports/pricing/-/tree/zone-info (Last visited on 10th November 2018)

[44] PJM Market Operations Data, https://dataminer2.pjm.com (Last visited on 10th November 2018)

[45] Burke PJ, Abayasekara A. The price elasticity of electricity demand in the United States: A three-dimensional analysis. Energy J. 2017;39(2):123-145.
