ESAENARX and DE-RELM: Novel Schemes for Big Data Predictive
Analytics of Electricity Load and Price
Sana Mujeeb, Nadeem Javaid*
COMSATS University Islamabad, Islamabad 44000, Pakistan
ARTICLE INFO
Keywords: Coordination, Dynamic programming, Knapsack, Multi-objective optimization, Pareto front, Meta-heuristic, Nature-inspired, Bird swarm and Cuckoo search algorithm, Hybrid technique, Demand side management, Demand response, Smart grid.
ABSTRACT
Accurate forecasting of the electricity price and load is an essential and challenging task in smart grids. Since electricity load and price are strongly correlated, the forecast accuracy degrades when the bidirectional relation of price and load is not considered. Therefore, this paper considers the price-load relationship and proposes two Multiple Input Multiple Output (MIMO) Deep Recurrent Neural Network (DRNN) models for price and load forecasting. The first proposed model, Efficient Sparse Autoencoder Nonlinear Autoregressive Network with eXogenous inputs (ESAENARX), comprises feature engineering and forecasting: for feature engineering, we propose ESAE, and forecasting is performed with the existing NARX method. The second proposed model, Differential Evolution Recurrent Extreme Learning Machine (DE-RELM), is based on the RELM model and the meta-heuristic DE optimization technique. Descriptive and predictive analyses are performed on the big data of two well-known electricity markets, i.e., ISO NE and PJM. The proposed models outperform their sub-models and a benchmark model. The refined and informative features extracted by ESAE improve the forecasting accuracy of ESAENARX, and optimization improves the accuracy of DE-RELM. Compared to the cascade Elman network, ESAENARX reduces MAPE by up to 16% for load forecasting and 7% for price forecasting. DE-RELM reduces MAPE by 1% for both load and price forecasting.
1. Introduction
The smart grid is a modern power supply network that uses communication technology. It consists of automation, control and technology that respond quickly to consumption changes. The smart grid provides energy in an efficient, secure, reliable, economical and environment-friendly manner. Renewable Energy Sources (RESs) of power generation are integrated to reduce carbon emissions. The smart grid allows two-way communication between the consumers and the utility. With the emergence of smart metering infrastructure, consumers are informed about the price per unit in advance. Consumers can adjust their demand load economically, according to the price signals, and can reduce their consumption cost by shifting load to a low-price hour. Smart grids create a price-responsive environment where the price varies with changes in demand and vice versa.
In unidirectional grids, there is a one-way interaction from the generation side to the consumers. The consumers are not able to respond to the price signal because they are not dynamically aware of the price. The demand has shown very little or no elasticity to price variations in unidirectional grids. However, with the advent of the smart metering system, consumers are well aware of the price and control their power consumption accordingly. Therefore, price and demand are highly correlated and interdependent. The market participants need reliable techniques to maximize their profit, which depends on accurate load and price forecasting. Price and demand forecasting also play an
∗Corresponding author
nadeemjavaidqau@gmail.com (N. Javaid)
www.njavaid.com (N. Javaid)
ORCID(s): 0000-0003-3777-8249 (N. Javaid)
important role in energy systems planning, market design, security of supply and operation planning for future power consumption. An accurate forecast is very important: a 1% reduction in the Mean Absolute Percentage Error (MAPE) of the load forecast reduces the generation cost by 0.1% to 0.3% [1], and 0.1% of the generation cost is approximately $1 million annually in a large-scale smart grid. Due to the importance of accurate electricity price and load forecasts, researchers are still competing to improve the forecast accuracy. Using big data for predictive analytics improves the forecasting accuracy [2]. Electricity data is big data, as smart meters record data at small time intervals [3]. In a large-sized smart grid, approximately 220 million smart meter measurements are recorded daily. Analytics of this energy big data helps the power utilities to gain deeper insights into consumer behavior [4]. The volume of input data is increasing, and the training of classical forecasting methods is becoming difficult. Processing big data with classifier-based models is very difficult because of their high space and time complexity. On the other hand, Deep Neural Networks (DNNs) perform very well on big data [5]. DNNs have an excellent ability for self-learning and nonlinear approximation. They optimize the space by dividing the training data into mini-batches; after dividing, the whole data is trained batch by batch.
The rest of the paper is organized as follows: Section 2 presents the related work, the problem statement is given in Section 3, descriptions of the used methods are presented in Section 4, the proposed models ESAENARX and DE-RELM are described in Sections 12 and 13, respectively, Section 15 presents the simulations and results, and Section 22 concludes this article.
2. Related Work
With the advent of the smart metering system, energy-related data is collected in a very large volume at high velocity from a variety of sources. This data is referred to as energy big data. For making decisions regarding energy market operations, predictive analytics is performed on this load and price data. For maintaining the demand and supply balance, an accurate prediction of the load is essential, whereas price forecasting plays an important role in the bidding process and energy trading. To ensure the reliability, stability and security of the smart grid, accurate forecasts of electricity load and price are essential. Electricity load and price have a bidirectional relation; therefore, simultaneous prediction of load and price yields greater accuracy.
The authors of papers [6,7] have predicted price and load si-
multaneously. Authors of [6] have proposed a hybrid model
for simultaneous forecasting of electricity load and price.
The proposed model consists of three stages, i.e., denois-
ing, feature engineering and forecasting. For denoising, the authors propose a new Wavelet Packet Transform (WPT) based method, Flexible WPT (FWPT). The features are selected using adjacent features and Conditional Mutual Information (CMI). In the forecasting step, Autoregressive Integrated Moving Average (ARIMA) and Nonlinear Least Squares Support Vector Machine (NLSSVM) are employed for linear and nonlinear modeling. The NLSSVM is optimized using an enhanced optimization technique, Time Varying Artificial Bee Colony (TV-ABC). This hybrid model results in reasonable forecasting accuracy; however, the model is highly complex. Moreover, the optimization of the forecasting model leads to over-fitting. In paper [7], the authors pre-
dict load and price using a multi-stage forecasting approach.
The complex forecasting approach proposed in this work is
comprised of feature selection and multi-stage forecast en-
gine. Features are selected through a modified Maximum
Relevancy Minimum Redundancy (MRMR) method. Elec-
tricity load and price are forecasted using multi-block Arti-
ficial Neural Network (ANN) known as Elman Neural Net-
work (ENN). The forecasting model is optimized by a shark
smell optimization method. This method results in a reason-
able forecasting accuracy. However, it is computationally
very expensive. The feature engineering process and opti-
mization of ENN increase complexity. Moreover, big data
is not considered in this method. Authors of paper [8]
have conducted a predictive analysis of electricity price fore-
casting taking advantage of big data. The relevant features
for the training prediction model are selected through an ex-
tensive feature engineering process. This process has three
steps: firstly, correlated features are selected using Gray Correlation Analysis (GCA). Secondly, a hybrid of two feature selection methods, ReliefF and Random Forest (RF), is used for further feature selection. Lastly, Kernel Principal Component Analysis (KPCA) is applied for dimension reduction. Price is predicted by SVM
and the hyper-parameters of SVM are optimized through
modified Differential Evolution (DE). In paper [9], the au-
thors forecast the energy consumption on big data. An anal-
ysis of frequent patterns is performed using a supervised
clustering method. Energy consumption is forecasted by the
Bayesian network.
Authors of paper [10] have utilized the computational power
of deep learning for Electricity Price Forecasting (EPF).
Stacked Denoising Autoencoder (SDA) and RANSAC-SDA
(RS-SDA) models are implemented for online and day-ahead hourly EPF. Three years of data (i.e., January 2012 – November 2014) are utilized in that paper, collected from Texas, Arkansas, Nebraska, Indiana and Louisiana ISO hubs in the USA.
pabilities of the RS-SDA and SDA models in the EPF are
performed. The effectiveness of the proposed models is
validated through their comparative analyses with classical
ANN, SVM (Support Vector Machine) and MARS (Multi-
variate Adaptive Regression Splines). Both the SDA and
RS-SDA models are able to accurately predict electricity
price with a considerably less MAPE as compared to the
aforementioned models.
A deep learning model for Short-term Load Forecasting
(STLF) is proposed by Tong et al. [11]. The features are
extracted using SDA from the historical electricity load and
corresponding temperature data. Support Vector Regressor
(SVR) model is trained for the day ahead STLF. The SDA
has effectively extracted the abstract features from the data.
SVR model trained on these extracted features forecasts elec-
tricity load with low errors. The proposed model outper-
forms simple SVR and ANN in terms of forecasting accu-
racy which validates its performance.
The Shallow ANN (SANN) is utilized for electricity load forecasting in [12] and [13]. SANNs have the problem of overfitting. To avoid overfitting, hyperparameter optimization is required, which increases the complexity of the forecasting model.
A hybrid deep learning method is applied to forecast price
in [14]. Two deep learning methods are combined in this re-
search work. Features are extracted by Convolution Neural
Network (CNN). Short-term energy price is predicted using
LSTM. Half-hourly price data of PJM 2017 is used for prediction. The previous 24-hour price is used to predict the next 1-hour electricity price. The hybrid DNN structure has 10 hidden layers: 2 convolution layers, 2 max-pooling layers, 3 Rectified Linear Unit (ReLU) layers, 1 batch normalization layer, 1 LSTM layer for prediction, and the last hidden
Table 1
Related work of load and price forecasting.

Task | Forecast Horizon | Platform / Testbed | Dataset | Algorithms
Load and price forecasting [6] | Short-term | Hourly data of 6 states of USA | NYISO, 2015 | MRMR, Multi-block Elman ANN, Enhanced shark smell optimization
Price forecasting [8] | Short-term | Hourly electricity price of 6 states of USA | ISO NE, 2010-2015 | GCA, Random forest (RF), ReliefF, SVM, DE
Consumption forecasting [9] | Short and long-term | 6-second resolution consumption of 5 homes with 109 domestic appliances | UK-Dale, 2012-2015 | Association rule mining, Incremental k-means clustering, Bayesian network
Price forecasting [10] | Short-term | Hourly price of 5 hubs of MISO | USA, 2012-2014 | Stacked Denoising Autoencoders (SDA)
Consumption forecasting [11] | Short-term | Aggregated hourly load of four regions | Los Angeles, California, Florida, New York City, USA, August 2015-2016 | SDA, SVR
Consumption forecasting [12] | Short-term | Electricity market data of 3 grids: FE, DAYTOWN and EKPC | PJM, USA, 2015 | Mutual Information (MI), ANN
Consumption forecasting [13] | Short-term | Electricity market data of 2 grids: DAYTOWN and EKPC | PJM, USA, 2015 | Modified MI + ANN
Price forecasting [14] | Short-term | Half-hourly price of PJM | Intercontinental Exchange (ICE), USA | Long Short Term Memory (LSTM), Convolutional Neural Network (CNN)
Price forecasting [15] | Short-term | Turkish day-ahead market electricity prices | Turkey, 2013-2016 | Recurrent Neural Network (RNN)
Cooling load forecasting [16] | Short-term | HVAC cooling load of an educational building | Hong Kong, 2015 | Elastic Net (ELN), SAE, RF, MLR, Gradient Boosting Machines (GBM), Extreme GB tree, SVR
Consumption forecasting [17] | Short-term | Hourly load of Korea Electric Power Corporation (KEPCO) | South Korea, 2012-2014 | Restricted Boltzmann Machine (RBM)
Consumption forecasting [18] | Short-term | Individual house consumption of 7 km of Paris | Individual household electric power consumption, France, 2006-2010 | Conditional RBM (CRBM), Factored CRBM
Load forecasting [19] | Short-term | 15-minute resolution of one retail building | Fremont, CA | SAE, ELM
Load forecasting [20] | Short-term | 15-minute cooling consumption of a commercial building in Shenzhen city | Guangdong province, South China, 2015 | Empirical Mode Decomposition (EMD), Deep Belief Networks (DBN)
Load forecasting [21] | Short-term | Hourly consumption from Macedonian Transmission Network Operator (MEPSO) | Republic of Macedonia, 2008-2014 | DBN
Load forecasting [22] | Short-term | Hourly consumption from Australia | AEMO, 2013 | EMD, DBN
Load forecasting [23] | Medium to long-term | Hourly consumption of a public safety building, Salt Lake City, Utah; aggregated hourly consumption of residential buildings, Austin, Texas | USA, 2015, 2016 | LSTM
Load forecasting [24] | Medium-term | Half-hourly metropolitan electricity consumption | France, 2008-2016 | LSTM, GA
Load forecasting [25] | Short-term | Hourly aggregated consumption of 6 states of USA | ISO NE, 2003-2016 | Xgboost weighted k-means, EMD-LSTM
Load forecasting [26] | Short-term | Ireland consumption | Smart meter database of load profile, Ireland | Pooling deep RNN
Load forecasting [27] | Short-term | Daily electricity consumption data | 3 Chinese cities, 2014 | Feed Forward DNN (FFDNN), Probability Density Estimation
Load and photovoltaic power forecasting [28] | Short-term | Hourly residential power load data | Dataport dataset, 2018 | Deep Recurrent Neural Network (DRNN) with LSTM units
Load forecasting [29] | Short-term | Hourly electricity market data | ISO NE, 2007-2012 | Deep RNN
Load forecasting [30] | Short-term | Hourly aggregated consumption of 6 states | ISO NE, USA | DRNN, FFDNN
layer is a fully connected layer. The CNN feature extractor has 7 hidden layers and the LSTM predictor has 3 hidden layers; the output of the 7th hidden layer of the CNN feature extractor becomes the input of the LSTM predictor. The proposed method outperforms simple CNN, LSTM and various machine learning methods.
Authors of [15] have utilized the Gated Recurrent Units
(GRU) in RNN for Energy Price Forecasting (EPF).
Recently deep learning forecasting methods have shown
good performance in electricity price [14–16] and load fore-
casting [17–30]. However, the interdependency of load and
price are not considered in these DNN forecasting models.
In [31], the author discusses the importance of big data ap-
plications and analytics in the development of Smart Sus-
tainable Cities (SSCs). An IoT based framework is proposed
to improve the functionalities of SSCs. The importance of
accurate load and price forecasting in smart gridâĂŹs sta-
bility is discussed. Stability of grid improves sustainabil-
ity of SSCs. A SSC uses Information and Communication
Technology (ICT) for improving lifeâĂŹs quality, services
and urban operations. It ensures to fulfill the present and fu-
tureâĂŹs environmental, social, cultural and economic re-
quirements.
The authors of [32] conduct an extensive literature review on
future SSCs. Besides other aspects of future SSCs, energy
efficiency is also mooted in this review. The authors describe
the SSC as an energy efficient, eco-friendly and real-time
city. Load demand forecasting plays a key role in energy
management and efficiency.
The future trends, architecture and challenges of SSCs are
reviewed in [33]. The major aspects of a smart city are illus-
trated in this study. Smart grid is discussed as an important
component of a smart city. The role of load demand fore-
casting is emphasized in an energy efficient city. Six dimen-
sions of SSCs are explained in [34]. The authors present a
road map towards SSCs. The concept of SSC is elaborated
with the help of six dimensions; one of these dimensions is
energy efficiency.
The authors of [35] discuss the present services of smart
cities like load demand forecasting in order to achieve a
sustainable city. The short-term load of Girona University,
Spain is studied. The forecasting model consists of outlier rejection, feature selection using autocorrelation and prediction using autoregression. First, outliers are removed based on k-nearest neighbors and Euclidean distance. Secondly, features highly correlated with the target class are selected, and features having high correlation with other features and low correlation with the target class are eliminated. Finally, a classical data-driven prediction model, autoregression, is implemented for STLF. The services embedded in the
studied layered architecture are described in detail, aiming to
make it part of a sustainable city.
3. Problem Statement and Contributions
Authors of papers [8] and [9] have used big data for pre-
dictive analytics. However, the extensive feature engineer-
ing process increases the computational complexity. The
feature engineering involves denoising of inputs, feature se-
lection and dimension reduction. After the feature engineer-
ing step, another important step is the optimization of the
prediction method’s hyperparameters. This optimization is
crucial to achieving accurate forecast results. Feature en-
gineering and model optimization steps make forecasting
complex. To avoid the extensive feature engineering pro-
cess, the deep learning methods are proposed for electricity
price [10] and load [11] forecasting. The mentioned deep
learning based forecasting models have forecasted electric-
ity load and price separately.
The electricity load and price signals have a high correla-
tion. The incorporation of the inherent bi-directional rela-
tion of electricity load and price in prediction models’ inputs
results in high prediction accuracy. The correlation of elec-
tricity load and price is not taken into consideration in [10]
and [11]. A forecasting method is needed that accurately
forecasts the electricity load and price simultaneously. In
this article, a forecasting model is proposed that is based on
deep learning. The proposed method accurately forecasts
electricity load and price simultaneously taking advantage
of big data. The major contributions of this study are listed below:
• The proposed models take advantage of big data. Big data analyses of electricity load and price are presented in this study. Data and forecasting models are analyzed statistically and graphically.
• A new feature extraction scheme based on Sparse Autoencoder (SAE) is introduced in the first proposed model. The performance of SAE is improved by using wavelet packet denoising as a decoding function, which significantly improves the quality of the extracted features. The extracted features are presented as refined information and smooth training input to the forecasting model, the Nonlinear Autoregressive Network with Exogenous variables (NARX).
• The second proposed model is an optimized Recurrent Extreme Learning Machine (RELM). The parameters of RELM are optimized using a meta-heuristic optimization technique, differential evolution. The proposed models outperform ELM, RELM, NARX, DE-ELM and Cascade Elman ANN (CEANN) [7].
4. Proposed Model
Before describing the proposed forecasting model, the
utilized methods are introduced. A brief description of the
methods used in the proposed models is given in this section.
5. Artificial Neural Network for Forecasting
ANNs are inspired by the learning process of the bio-
logical neural networks. ANNs have the capability to model
the complex patterns hidden in the data. Multilayer Percep-
tron (MLP) is the simplest and fundamental architecture of
ANN [36]. The MLP comprises neurons, biases and weights. An ANN creates a mapping between the inputs $x_i$ and their respective targets $t_i$; the weights $W_i$ are updated while creating this mapping, and the network learns when the weights are updated.
$$y(t) = f(W_1 x_1 + W_2 x_2 + \ldots + W_n x_n) \quad (1)$$
where $W_i$ are the weights and $f$ is the activation function. The most common algorithm used for updating the weights is gradient descent. It reduces the squared error $E$ using the delta rule:
$$E = \big(y(t) - t(t)\big)^2 \quad (2)$$
where $t(t)$ is the target vector corresponding to the training vector $x(t)$.
$$w^{(\ell)}_{ij}(t+1) = w^{(\ell)}_{ij}(t) - \alpha \frac{\partial E}{\partial w^{(\ell)}_{ij}(t)} \quad (3)$$
$$b^{(\ell)}_{j}(t+1) = b^{(\ell)}_{j}(t) - \alpha \frac{\partial E}{\partial b^{(\ell)}_{j}(t)} \quad (4)$$
where $w^{(\ell)}_{ij}(t+1)$ is the new modified weight, $w^{(\ell)}_{ij}(t)$ is the weight that is required to be changed, $b^{(\ell)}_{j}(t)$ is the bias and $\alpha$ ($> 0$) is the learning rate.
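To make the delta rule of Equations (3) and (4) concrete, a minimal NumPy sketch of one gradient-descent update for a single-layer sigmoid network is given below. The layer sizes, learning rate and toy data are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_rule_step(W, b, x, t, alpha=0.01):
    """One gradient-descent update of weights W and bias b (Eqs. 3-4)."""
    y = sigmoid(W @ x + b)              # forward pass, Eq. (1)
    error = y - t                       # from the squared error, Eq. (2)
    grad_out = error * y * (1.0 - y)    # chain rule through the sigmoid
    dW = np.outer(grad_out, x)          # dE/dW
    db = grad_out                       # dE/db
    return W - alpha * dW, b - alpha * db

# toy example: 3 inputs, 2 outputs
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3)); b = np.zeros(2)
x = np.array([0.5, 0.1, 0.9]); t = np.array([1.0, 0.0])
W, b = delta_rule_step(W, b, x, t)
```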
A Deep Neural Network (DNN) is an ANN with a deeper architecture, i.e., several hidden layers. A DNN is computationally stronger than a Shallow ANN (SANN). The proposed forecasting engines are based on Deep Recurrent Neural Networks (DRNNs), i.e., NARX and LSTM.
6. Sparse Autoencoder
The SAE neural network is an unsupervised learning algorithm that applies the back-propagation method while setting the target values equal to the inputs, i.e., $y_i = x_i$. The SAE attempts to learn a function $h_{W,b}(x) \approx x$; basically, SAE tries to learn an approximation function so that the output $\hat{x}$ is similar to the input $x$, i.e., the network must reconstruct the input data. By placing constraints on the network, limiting the number of hidden units and adding sparsity, an interesting structure of the data is discovered. The network is forced to learn a compressed representation of the input, i.e., given only the vector of hidden unit activations. Generally, the sigmoid is the activation function of the autoencoder, which is designed to obtain a better representation of the input data: $h(X, W, b) = \sigma(WX + b)$. A sparse penalty term is added to the sparse autoencoder cost function to limit the average activation value of the hidden-layer neurons. Normally, when the output value of a neuron is 1 it is active, and the neuron is inactive when its output value is 0. The purpose of enforcing sparsity is to limit undesired activation. $a_j(x)$ is set
Figure 1: Proposed system model. Historic smart grid data and the temperature forecast feed the ESAE feature extractor, whose outputs drive the MIMO ESAENARX forecaster (input layer, time delay layer, two hidden layers and an output layer) to produce the load and price forecasts.
as the $j$-th activation value. In the process of feature learning, the activation value of a hidden-layer neuron is usually expressed as $a = \sigma(WX + b)$, where $W$ is the weight matrix and $b$ is the deviation (bias) matrix. The mean activation value of the $j$-th neuron in the hidden layer is defined as:
$$\rho_j = \frac{1}{n} \sum_{i=1}^{n} \big[a_j(x_i)\big] \quad (5)$$
The average activation value of the hidden layer is kept at a low value; the target sparsity parameter is defined as $\rho$, and the penalty term is used to prevent $\rho_j$ from deviating from the parameter $\rho$. The Kullback-Leibler (KL) divergence [37] is used in this study to enforce this sparsity. The mathematical expression of the KL divergence is as follows:
$$KL(\rho \,\|\, \rho_j) = \rho \ln \frac{\rho}{\rho_j} + (1-\rho) \ln \frac{1-\rho}{1-\rho_j} \quad (6)$$
When $\rho_j$ does not deviate from the parameter $\rho$, the KL divergence value is 0; otherwise, the KL divergence value gradually increases with the deviation. The cost function of the neural network is set as $C(W, b)$. Then, the cost function with the added sparse penalty term is:
$$C_{Sparse} = C(W, b) + \beta \sum_{j=1}^{S_2} KL(\rho \,\|\, \rho_j) \quad (7)$$
where $S_2$ is the number of neurons in the hidden layer and $\beta$ is the weight of the sparse penalty term. The essence of training a neural network is to find the appropriate weight and threshold parameters $(W, b)$. After the sparse penalty term is defined, the sparse representation can be obtained by minimizing the sparse cost function.
An SAE can be transformed into a Sparse Denoising Autoencoder (SDA): data is corrupted in a stochastic manner by introducing some noise into it, and the network then attempts to reconstruct the original data from the corrupted data. SAE is capable of discovering the correlation among the features; a refined and highly relevant feature representation is achieved using SAE.
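The sparsity penalty of Equations (5)-(7) can be sketched as follows; this is an illustrative NumPy computation of the penalized cost, not the authors' implementation. The default values ρ = 0.05 and β = 4 mirror the sparsity proportion and sparsity regularization reported for the ESAE in Section 9, while the toy activations are assumptions.

```python
import numpy as np

def kl_divergence(rho, rho_hat):
    """KL divergence between target sparsity rho and mean activation rho_hat, Eq. (6)."""
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

def sparse_cost(reconstruction_cost, hidden_activations, rho=0.05, beta=4.0):
    """Total sparse autoencoder cost, Eq. (7): C(W,b) plus the weighted KL penalty."""
    rho_hat = hidden_activations.mean(axis=0)      # mean activation per hidden unit, Eq. (5)
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)     # avoid log(0)
    return reconstruction_cost + beta * np.sum(kl_divergence(rho, rho_hat))

# example: random activations of 300 hidden units over 1000 samples
acts = np.random.default_rng(1).uniform(0.01, 0.2, size=(1000, 300))
print(sparse_cost(reconstruction_cost=1.7, hidden_activations=acts))
```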
7. Efficient SAE (ESAE)
The Efficient SAE (ESAE) is proposed to create a better
representation of electricity data, that is useful for an accu-
rate forecast of price and load. In this section, the proposed
feature extractor Efficient SAE is discussed in detail.
8. Pre-training of ESAE
To initialize the weights and bias an unsupervised pre-
training is applied. Where the input of a hidden layer is the
output of its previous layer. In the pre-training step, the ini-
tial bias and weights of the autoencoder are learned.
In the proposed method, the input data 𝑋𝑡is corrupted by
introducing white noise [38]. The white noise is added to
randomly selected 30% data points. A random process 𝑦(𝑡)
is known as white noise when the 𝑆𝑦(𝑓)is constant at all the
frequencies 𝑓:
𝑆𝑦(𝑓) = 𝑁0
2∀𝑓(8)
The white noise describes random disturbances with small
correlation periods. The white noise generalized correlation
function is defined by:
𝐵(𝑡) = 𝛿(𝑡)𝜎2(9)
Where, 𝛿(𝑡)is the delta function and 𝜎is a positive constant.
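A minimal sketch of the corruption step described above is given below, assuming the inputs are held in a NumPy array; the noise standard deviation is an illustrative assumption, since the paper does not specify it.

```python
import numpy as np

def corrupt_with_white_noise(X, fraction=0.30, noise_std=0.05, seed=0):
    """Add zero-mean white (Gaussian) noise to a randomly selected 30% of data points."""
    rng = np.random.default_rng(seed)
    Xc = X.copy()
    n = X.shape[0]
    idx = rng.choice(n, size=int(fraction * n), replace=False)   # randomly selected points
    Xc[idx] += rng.normal(0.0, noise_std, size=Xc[idx].shape)    # flat spectral density -> white noise
    return Xc

X = np.random.default_rng(2).random((1000, 5))    # e.g., normalized input features
X_corrupted = corrupt_with_white_noise(X)
```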
9. Fine-tuning of ESAE
The fine-tuning step is followed by the pre-training step.
In fine-tuning, the wavelet denoising is proposed as the en-
coding transfer function of the first hidden layer of ESAE.
The activation function of the second layer is sigmoid. The
wavelet denoising has two steps: (i) wavelet packet decomposition and (ii) a reconstruction and denoising operation. Firstly, the input time series is decomposed into different frequency bands by passing it through high-pass and low-pass filters. Then the frequency band of the noise is set to zero. The signal is then reconstructed using the wavelet reconstruction function, which is the inverse of the wavelet decomposition function [39].
Figure 2: Step-by-step flow of the proposed model ESAENARX. Stage 1 (feature extraction): min-max normalization of the data, corrupting the input features with white noise, pre-training (encoding with SAE) and fine-tuning with the efficient SAE. Stage 2 (prediction): forecasting of the extracted features by NARX, de-normalization, and output of the price and load forecasts.
The wavelet decomposition operation can be expressed by:
$$c_{j,k} = \sum_{n} c_{j-1,n}\, h_{n-2k}, \qquad d_{j,k} = \sum_{n} d_{j-1,n}\, g_{n-2k}, \qquad k = 1, 2, \ldots, N-1$$
where $c_{j,k}$ is the scale coefficient, $d_{j,k}$ is the wavelet coefficient, $h$ and $g$ are the quadrature mirror filter banks, $j$ is the level of decomposition and $N$ is the number of sampling points. The wavelet reconstruction function, which is the inverse of the wavelet decomposition, is expressed as:
$$c_{j-1,n} = \sum_{k} c_{j,k}\, h_{k-2n} + \sum_{k} d_{j,k}\, g_{k-2n} \quad (10)$$
The denoising operation is given by the thresholding rule below:
$$\hat{\omega}_{j,k} = \begin{cases} \mathrm{sign}(\omega_{j,k})\,(|\omega_{j,k}| - T\lambda), & |\omega_{j,k}| \ge \lambda \\ 0, & |\omega_{j,k}| < \lambda \end{cases}$$
where $\hat{\omega}_{j,k}$ is the denoised signal, $\omega_{j,k}$ is the wavelet-transformed signal and $\lambda$ is the threshold.
In the ESAE feature extractor, the numbers of units in hidden layers one and two are 400 and 300, respectively. The coefficient that controls the layer-2 weight regularization is set to 0.001. The sparsity regularization is 4 and the sparsity proportion is 0.05. The maximum number of epochs is 100. The algorithm for learning the weights is scaled conjugate gradient descent.
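As an illustration of the decomposition-threshold-reconstruction cycle, the sketch below uses the PyWavelets library with an ordinary discrete wavelet transform and soft thresholding; the authors' method zeroes the noise band of a wavelet packet decomposition, so the wavelet choice, level and universal threshold here are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(signal, wavelet="db4", level=3, threshold_scale=1.0):
    """Decompose, soft-threshold the detail coefficients, and reconstruct (cf. Eq. 10)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)               # decomposition
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745                    # noise estimate from finest details
    lam = threshold_scale * sigma * np.sqrt(2 * np.log(len(signal)))  # universal threshold
    denoised = [coeffs[0]] + [pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(signal)]

# toy example: noisy sine as a stand-in for an hourly load series
t = np.linspace(0, 8 * np.pi, 1024)
noisy = np.sin(t) + 0.2 * np.random.default_rng(3).normal(size=t.size)
clean = wavelet_denoise(noisy)
```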
10. Non-linear Autoregressive Network with
Exogenous Variables
NARX is an autoregressive RNN. Its feedback connections enclose several hidden layers of the network, excluding the input layer. NARX has a memory that is utilized for creating a nonlinear mapping between inputs and outputs. The network learns from the recurrence on the past values of the time series and the past predicted values of the network [40]. For predicting a value $y(t)$, the inputs of the NARX are $y(t-1), y(t-2), \ldots, y(t-d)$. NARX can be explained by the following equation:
$$\hat{y}(t+1) = f\big(y(t), y(t-1), \ldots, y(t-d),\; x(t+1), x(t), \ldots, x(t-d)\big) + \varepsilon(t) \quad (11)$$
where $\hat{y}(t+1)$ is the network's output at time $t+1$, $f(\cdot)$ is the nonlinear mapping function, $y(t), y(t-1), \ldots, y(t-d)$ are the past observed values, $x(t+1), x(t), \ldots, x(t-d)$ are the network's inputs, $d$ is the number of delays, and the error term is denoted by $\varepsilon(t)$. In the proposed NARX, for simultaneous forecasting of price and load, the number of delays is 2, the network has 10 hidden layers, and the training function is Levenberg-Marquardt.
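To make Equation (11) concrete, the sketch below builds the lagged input and target matrices that a NARX-style regressor consumes; the delay d = 2 mirrors the setting stated above, while the toy data and single exogenous input are illustrative assumptions.

```python
import numpy as np

def build_narx_matrices(y, x, d=2):
    """Stack past targets y(t-d..t) and exogenous inputs x(t-d..t) against y(t+1), Eq. (11)."""
    rows, targets = [], []
    for t in range(d, len(y) - 1):
        past_y = y[t - d:t + 1]          # y(t-d), ..., y(t)
        past_x = x[t - d:t + 1].ravel()  # x(t-d), ..., x(t)
        rows.append(np.concatenate([past_y, past_x]))
        targets.append(y[t + 1])         # one-step-ahead target y(t+1)
    return np.asarray(rows), np.asarray(targets)

# toy data: y = load series, x = one exogenous input (e.g., temperature)
rng = np.random.default_rng(4)
y = rng.random(200); x = rng.random((200, 1))
X_mat, y_next = build_narx_matrices(y, x, d=2)
print(X_mat.shape, y_next.shape)   # (197, 6) (197,)
```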
11. Long Short-term Memory
LSTM is a well-known sub-category of the RNN. It is
widely used for modeling of sequential data. In LSTM, in-
ternal states are used to process input sequence. This struc-
ture allows it to learn dynamic temporal behavior for a time
sequence. Unlike feed forward ANNs, LSTM use their inter-
nal state to process sequences of inputs and remember longer
dependencies in the data. The LSTM is used to solve many
time sequence problems. LSTM contains three gates: input
gate, forget gate and output gate. It has a memory cell that
keeps relevant information of data as a memory. The pur-
pose of the forget gate is to flush out irrelevant data. LSTM
can be explained by following equations:
Suppose an input time series $x = x_1, x_2, \ldots, x_n$. The LSTM models the input time series using recurrence (as shown in Equation 12):
$$h_t = f(x_t, h_{t-1}) \quad (12)$$
where $h_t$ is the hidden state at time $t$, $x_t$ is the input at time $t$ and $h_{t-1}$ is the previous hidden state, i.e., at time $t-1$. The recurrence function $f(\cdot)$ contains gated operations, as shown in Equations 13, 14 and 15:
$$i_t = \sigma(w_i[x_t, h_{t-1}] + b_i) \quad (13)$$
$$f_t = \sigma(w_f[x_t, h_{t-1}] + b_f) \quad (14)$$
$$o_t = \sigma(w_o[x_t, h_{t-1}] + b_o) \quad (15)$$
$$\tilde{C}_t = \tanh(w_C[x_t, h_{t-1}] + b_C) \quad (16)$$
$$C_t = i_t \cdot \tilde{C}_t + f_t \cdot C_{t-1} \quad (17)$$
$$h_t = \tanh(C_t) \cdot o_t \quad (18)$$
where $i_t$, $f_t$ and $o_t$ are the input, forget and output gates, respectively; $w_i$, $w_f$ and $w_o$ are their respective weights and $b_i$, $b_f$ and $b_o$ their respective biases. $C_t$ is the current state of the memory cell and $\tilde{C}_t$ is the new candidate value for the memory cell. The sigmoid function $\sigma(\cdot)$ maps the gates' values into the range 0 to 1. The gates' decisions depend on the current input $x_t$ and the previous output $h_{t-1}$. An input signal is blocked if the gate's value is 0. The forget gate decides how much of the previous state $h_{t-1}$ is passed on. The input gate defines how much new input is added or updated to the previous cell state. Based on the cell state, the output gate determines which information is output. In this manner, the short and long-term sequence-related information is learned in the LSTM.
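A minimal NumPy sketch of one LSTM step implementing Equations (13)-(18) is given below; the dimensions and random initialization are illustrative assumptions, not the configuration used in the proposed models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step, Eqs. (13)-(18); W and b hold the gate parameters 'i', 'f', 'o', 'c'."""
    z = np.concatenate([x_t, h_prev])           # [x_t, h_{t-1}]
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate, Eq. (13)
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (14)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (15)
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state, Eq. (16)
    c_t = i_t * c_tilde + f_t * c_prev          # cell update, Eq. (17)
    h_t = np.tanh(c_t) * o_t                    # hidden state, Eq. (18)
    return h_t, c_t

# toy dimensions: 5 inputs, 8 hidden units
n_in, n_hid = 5, 8
rng = np.random.default_rng(5)
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.random(n_in), h, c, W, b)
```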
LSTM is superior to a plain ANN because it can handle the problem of vanishing or exploding gradients. The vanishing gradient problem arises while updating the weights. The weights are updated by the delta rule, in which the gradient of the error is taken with respect to the weight (as shown in Equation 3). If the gradient becomes too small, the change in the updated weights will also be small, resulting in no improvement in learning; whereas, if the gradient becomes too large, the updated weights will change too much, resulting in no convergence and instability of the network. LSTM overcomes this problem by using the memory cell $C_t$, which is able to preserve the state over a long period of time. The amount of information to be retained or discarded is controlled by changing the values of the forget gate $f_t$ and the input gate $i_t$; the dependency on individual inputs is also controlled. This increased regulation helps in overcoming the vanishing and exploding gradient problems.
12. ESAENARX Forecast Model
Deep learning is well known for its high-precision feature extraction. A sparse autoencoder deep neural network with dropout is proposed to extract useful features. This deep neural network can significantly reduce the adverse effect of overfitting, making the learned features more conducive to identification and forecasting. NARX is proposed for load and price forecasting.
A Multi Input Multi Output (MIMO) forecast model is pro-
posed to predict the price and load simultaneously. Fea-
tures are extracted using ESAE. Then the NARX network is
trained for simultaneous forecasting of price and load. The
system model is shown in Figure 1. The input features are:
hour, temperature forecast, wind speed forecast, lagged load,
the lagged price. There are two targets, electricity load and
price. The prediction process has the following five steps:
1. Inputs and targets are normalized using min-max normalization. Suppose an input vector $X = x_1, x_2, x_3, \ldots, x_n$, where $n$ is the number of instances in the vector. The min-max normalized value is obtained by:
$$X_{nor} = \frac{x_i - X_{min}}{X_{max} - X_{min}} \quad (19)$$
where $i = 1, 2, \ldots, n$ (a sketch of this normalization and its inverse appears after this list).
2. The normalized inputs are used to train the ESAE feature extractor. After the ESAE is trained, the input features are encoded using this trained ESAE; the output of the ESAE is the encoded features.
3. The encoded features are given as input to train the NARX network. 80% of the data is used for training, 15% for validation and 5% for testing.
4. The price and load are predicted for 168 hours, i.e., one week.
5. The predicted values of load and price are de-normalized to obtain the actual values. The NARX accurately predicts the price and load simultaneously.
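A minimal sketch of the min-max normalization of step 1 and the de-normalization of step 5 (cf. Equation 21 in Section 13) follows; the toy load and price values are illustrative assumptions.

```python
import numpy as np

def minmax_normalize(X):
    """Eq. (19): scale each column of X into [0, 1]; returns scaled data and the column ranges."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min), x_min, x_max

def minmax_denormalize(X_nor, x_min, x_max):
    """Eq. (21): map normalized forecasts back to their physical units."""
    return X_nor * (x_max - x_min) + x_min

# toy matrix with two columns: load (MW) and price ($/MWh)
data = np.array([[12000.0, 35.0], [15000.0, 55.0], [18000.0, 120.0]])
scaled, lo, hi = minmax_normalize(data)
restored = minmax_denormalize(scaled, lo, hi)   # equals `data`
```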
The ESAE feature extractor has wavelet packet denoising as a decoder function, which performs the denoising of the input features along with extraction. A refined and rich representation of the features is extracted by ESAE. Generally, an SAE has sigmoid decoder functions; the use of wavelet packet denoising enhances the extracted features and consequently improves the forecasting accuracy significantly. The goal of good forecasting accuracy is achieved by ESAENARX with the help of efficient feature extraction.
13. DE-RELM Forecast Model
The second proposed model is also a MIMO model
like ESAENARX. DE-RELM is an efficient method for elec-
tricity load and price forecasting. DE-RELM has three
stages, in the first stage, the parameters of ELM are opti-
mized by applying the DE algorithm. In the second stage,
ELM is trained. The inputs and outputs of ELM are the in-
put features of load and price. With similar inputs and out-
puts, ELM acts like an encoder. Once the optimized ELM is
trained, the learned weights are set as the initial weights of
the RNN network that is used for forecasting. The learned
weights of ELM are the best representation of the input data.
Setting these initial weights helps RNN converge faster and
Figure 3: Flowchart of DE-RELM. Stage 1 (ELM optimization): min-max normalization of the data, selection of weights and biases with DE, and calculation of the objective function until it is satisfied. Stage 2 (training ELM): training the ELM with the same inputs and outputs using the optimized weights to obtain the learned weights. Stage 3 (prediction with DE-RELM): initializing DE-RELM with the learned weights, forecasting, de-normalization, and output of the price and load forecasts.
forecast accurately. This is the third and final stage of DE-
RELM. The number of neurons in the hidden layer of ELM
and RNN is kept the same. In order to use the learned
weights of ELM for the RNN network, the dimensions of
weight vectors have to be the same. For the prediction of
load and price, DE-RELM follows the steps shown in the
flowchart, Figure 3.
1. The inputs and targets are normalized using min-max normalization (as shown in Equation 19).
2. The normalized inputs are given to the ELM network as both inputs and outputs, and the network is trained.
3. The forecasting error is calculated by Equation 22.
4. The DE algorithm is used to optimize the weights and biases of ELM. The objective function of DE is the minimization of the prediction error (a sketch of this optimization appears after this list):
$$Obj = \text{minimize} \; \frac{1}{n} \sum_{i=1}^{n} \frac{\left|X^{act}_i - y^{for}_i\right|}{X^{act}_i} \times 100 \quad (20)$$
where $X^{act}_i$ is the actual value, $y^{for}_i$ is the forecasted value and $n$ is the number of values.
5. When the forecasting error is reduced to the desired value, the optimized ELM network is trained.
6. The weights of ELM are set as the initial weights of the RNN network.
7. The RNN network predicts the price and load simultaneously.
8. The predicted values are de-normalized by the inverse min-max function:
$$X = \left[x^{for} \times (X_{max} - X_{min})\right] + X_{min} \quad (21)$$
where $x^{for}$ is the forecasted value, $X_{max}$ is the maximum value of the actual target and $X_{min}$ is the minimum value of the actual target.
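The sketch below illustrates how DE can minimize the MAPE objective of Equation (20) over a set of weights, using SciPy's differential_evolution on a toy ELM-like model with a fixed random hidden layer. The bounds, data and use of SciPy are illustrative assumptions, not the authors' implementation; note that SciPy's popsize argument is a per-parameter multiplier, so the paper's population size of 50 is not reproduced exactly here.

```python
import numpy as np
from scipy.optimize import differential_evolution

def mape(actual, forecast):
    """Objective of Eq. (20)."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100

# toy ELM-like model: fixed random hidden layer, DE searches the output weights
rng = np.random.default_rng(6)
X_val = rng.random((50, 5))
y_val = X_val @ np.array([3.0, -1.0, 2.0, 0.5, 1.5]) + 10.0   # synthetic validation target
H = np.tanh(X_val @ rng.normal(size=(5, 10)))                 # hidden-layer outputs (kept fixed)

def objective(w_out):
    return mape(y_val, H @ w_out)

result = differential_evolution(
    objective,
    bounds=[(-5.0, 5.0)] * 10,          # one bound per searched weight
    maxiter=100,                        # 100 iterations, as in the paper
    mutation=0.5, recombination=1.0,    # mutation factor 0.5, crossover rate 1
    seed=0,
)
print(result.fun)   # best validation MAPE found
```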
In DE-RELM, the number of neurons in the hidden layer of
ELM and RNN is 100. ELM has 1 hidden layer. The acti-
vation function of ELM is sigmoid. DE has 100 iterations,
population size is 50, mutation factor is 0.5 and the crossover
rate is 1. The RNN network has 1 hidden layer. The transfer
function is logistic sigmoid.
The proposed models have multiple inputs and outputs. In-
puts are: hour, temperature, wind speed, lagged price and
lagged load and outputs are: price and load. The forecast en-
gines create a mapping between inputs and targets. Hence,
a mapping of input hour, temperature, price and load is cre-
ated with target price and target load. The relation between
price and load is captured while creating this mapping. The
price and load are affected by past price and load, therefore,
lagged values are good features for prediction. The load is
affected by temperature. The temperature and lagged values
are the most relevant inputs for price and load prediction.
Moreover, the input is processed by the ESAE feature extractor, which further enhances the input to the NARX forecaster. The best mapping of relevant and informative inputs with the targets results in improved forecast accuracy.
Both proposed models comprise neural network based encoders, ESAE and the ELM encoder, and deep RNN forecasters, NARX and LSTM. In the first model, ESAENARX, features are extracted by an efficient sparse encoder. In the second model, DE-RELM, an extreme learning machine is used as an encoder to learn the initial weights for the forecast engine.
This study is aimed at helping electricity market experts and
traders. Several market operations benefit from the load
forecasting, such as: formulation of demand-response pro-
grams, generation scheduling and planning new generation
sources. On the other hand, the traders take advantage of
price forecasting for making bidding strategies and market
experts make modified pricing schemes to control consump-
tion behaviors. No specific sector (i.e., residential, indus-
trial, commercial, etc.) is targeted in this study, instead ag-
gregated load and average regulation price of two power util-
ities are studied.
14. Applications of Proposed Models
The proposed models forecast electricity load and price.
Both price and load forecasts are useful in the case of smart
grids and micro grids. They help utility experts in under-
standing load and price correlation and dynamics. They have
the following applications:
1. Minimize the risk of demand and supply imbalance. If
the generation of electricity is less than the demand,
the grids will not be able to fulfill the demands of con-
sumers. If generation is more than demand, the energy
will be wasted.
2. Enable the power utility companies to plan better since
they understand the future load demand.
3. Help to determine the required resources; such as, fu-
els required to operate the generating plants.
4. Maximize utilization of power generating plants. The
load forecasting prevents under generation and over
generation.
Several Independent System Operators (ISOs) take advan-
tage of load forecasting. These ISOs publish the day-ahead
or month-ahead load forecasting data on their websites; such
as, NYISO [41], PJM [42], etc. In the aforementioned real
world scenarios, the proposed forecasting models are appli-
cable.
15. Simulations and Results
All the simulations are performed using MATLAB
R2018a on a computer with core i3 processor and 8 GB
RAM. In this section, the description of datasets, big data
analysis and results’ discussion are presented.
16. Data Description
The data used for forecasting is taken from the well-
known electricity utilities: ISO NE (Independent System
Operator New England) [43] and PJM [44], USA. Both
datasets are publicly available.
17. ISO NE Electricity Market
ISO NE is an independent system operator that provides
power to the six states of the USA, known as New England.
It serves Maine, Connecticut, Massachusetts, Rhode Island,
Vermont and New Hampshire. Approximately $10 million in transactions is made every year by 400 electricity market participants in ISO NE. It has almost 7 million consumers, both business and household. Hourly electricity market data of almost 8 years is used for prediction; the duration of the data used in the simulations is from January 2011 to June 2018, with 65,616 measurements in total. The data utilized in this paper is the aggregated load and the regulation capacity clearing price of the ISO NE control area.
18. PJM Electricity Market
PJM Interconnection is a Regional Transmission Organi-
zation (RTO) in the USA. It is an electric transmission sys-
tem that is part of the Eastern Interconnection grid. It sup-
plies power to 14 regions, i.e., Illinois, Delaware, Kentucky,
Indiana, Maryland, New Jersey, Michigan, Ohio, North Car-
olina, Pennsylvania, Virginia, West Virginia, District of
Columbia and Tennessee. The data taken from PJM is the hourly consumption and price of almost thirteen years, i.e., January 2006 to October 2018. The data comprises 112,300 measurements each of load and price.
19. Performance Evaluation
To evaluate the performance of ESAENARX, three performance measures are used, i.e., MAPE, Root Mean Square Error (RMSE) and Normalized RMSE (NRMSE). A lower error value indicates better forecasting accuracy. MAPE is the average absolute percentage error of the forecasted and observed values and is defined by the following equation:
$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left|X^{act}_i - y^{for}_i\right|}{X^{act}_i} \times 100 \quad (22)$$
NRMSE is the normalized root mean square error of the forecasted and observed values, defined by:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(X^{act}_i - y^{for}_i\right)^2} \quad (23)$$
$$NRMSE = \frac{RMSE}{\max(X^{act}_i) - \min(X^{act}_i)} \quad (24)$$
where $X^{act}_i$ is the observed value, $y^{for}_i$ is the forecasted value and $n$ is the number of values.
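The three error measures of Equations (22)-(24) can be computed as in the following sketch; the toy load values are illustrative.

```python
import numpy as np

def mape(actual, forecast):
    """Eq. (22): mean absolute percentage error."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def rmse(actual, forecast):
    """Eq. (23): root mean square error."""
    return np.sqrt(np.mean((actual - forecast) ** 2))

def nrmse(actual, forecast):
    """Eq. (24): RMSE normalized by the observed range."""
    return rmse(actual, forecast) / (np.max(actual) - np.min(actual))

actual = np.array([14000.0, 15500.0, 16200.0, 14800.0])   # e.g., observed load (MW)
forecast = np.array([13800.0, 15900.0, 16050.0, 15100.0])
print(mape(actual, forecast), rmse(actual, forecast), nrmse(actual, forecast))
```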
20. Big Data Analytics of Electricity Price and
Demand
In this research study, the big data of load and price are
deeply analyzed. Both visual and statistical analyses are per-
formed. The visual analyses are presented in graphs. The vi-
sual analyses of the ISO NE load are shown in Figures 4, 8, 11 and 13, and the PJM load is illustrated in Figures 14, 18, 21 and 23. The price analyses of ISO NE are presented in Figures 5, 9, 10 and 12, and the PJM price is demonstrated in Figures 15, 19, 20 and 22. The price-demand relation of ISO NE is shown in Figures 6 and 7, and that of PJM in Figures 16 and 17. The statistical analysis of the forecast error is shown in Table 2.
ISO NE price and load have daily and weekly seasonality, and price and load have a strong relation in the ISO NE market. The load of 8 years is shown in Figure 4 and the price in Figure 5. The scatter plot in Figure 6 shows the directly proportional relation of price and demand; the correlation coefficient is also shown in the figure. The normalized load and price of one week are shown in Figure 7 for better
visualization of their bidirectional relation. The price elas-
ticity of demand is a factor that describes changes in demand
with respect to changes in the price. Usually, the demand de-
creases if the price increases, however, the price elasticity of
power demand is low. According to the analysis presented
in [45], the price elasticity of demand is −0.1 or less
within a year in the USA. The season affects the energy con-
sumption and price. In the USA there are four seasons in a
year. The spring season duration is from March to May, the
summer season is from June to August, the autumn (fall) is
from September to November and winters are from Decem-
ber to February. The summer season has the highest electric-
ity consumption of the year, as shown in Figure 8. The peak consumption hours of summer are from 1:00 pm to 5:00 pm on weekdays. In winter (December to January), the peak consumption hours are from 5:00 pm to 7:00 pm on weekdays. In ISO NE there are two peak load points in a day: the 1st peak point is around 11:00 am and the 2nd is between 4:00 pm and 5:00 pm (as shown in Figure 8). The consumption of 1st January, 1st April, 1st July and 1st October is shown in Figures 8 and 18; the mentioned four days are from the four different seasons of a year. Prices of the same four days are shown in Figures 9 and 19. Both consumption and price are the highest in the summer season compared to the rest of the year. The building cooling is re-
sume a lot of power, that is the major reason for an increase in
energy consumption. Electricity prices are relatively higher
in the winters too. The electricity price and load are less in
the spring season as compared to the rest of the year, because in moderate weather building heating or cooling is not required, which reduces consumption and ultimately the price too. The electricity consumption pattern follows the seasons and the time of use: consumption is higher during working hours and lower during non-working hours. The load trend has fewer variations than the price trend. Mostly, price and load increase and decrease at the same time. However, there are a few points in time where the energy price increases sharply and unexpectedly, even when the load does not increase accordingly
(as shown in Figure 7, between hours 75 to 82 and Figure 17,
between hours 30 to 35). The unexpected change in the price
is due to the external influential factors other than consump-
tion. The factors that influence the energy price are: the availability of Renewable Energy Sources (RESs), fuel prices, economic conditions, excessive-use penalties and transmission contingency. The load is not much affected by most of these factors, showing little or no variation in response to them; the energy consumption is mainly affected by weather conditions. The electricity consumption and price have continued to increase over the last 8 years, as is clear from Figures 4 and 5. The visual representation
of past years’ consumption enables utility experts to visual-
ize increasing demand that helps in planning new generation
plants to satisfy future power demand.
PJM load and price of 13 years (2006-2018) are shown in Figure 14 and Figure 15, respectively. The scatter plot in Figure 6 illustrates the relation of price and load in ISO NE, and Figure 16 shows the price-demand relation in the PJM electricity market. The direct proportionality of the load and price signals can be seen in these two figures. In Figures 7 and 17, the normalized load and price of the 1st week of January 2018 are plotted; the correlation of the price and load signals is demonstrated in these two figures.
The proposed models ESAENARX and DE-RELM are used for short-term load and price forecasting. The forecast period is one week, i.e., 168 hours. The ISO NE price and load forecasts for the 1st week of June 2018 are shown in Figure 10 and Figure 11. The PJM price and load forecasts for the 1st week of September 2018 are shown in Figures 20 and 21, respectively. The actual and forecasted values are plotted, and the forecasted values follow the trend of the actual values. The forecasted load trend is closer to the actual load trend than that of the price. The price forecast is slightly less accurate than the load forecast, because the load has a similar repetitive pattern while the price pattern is volatile.
Price data exhibit certain characteristics: volatility and sudden, sharp spikes and changes. The nature of the price makes its forecasting difficult; learning the price pattern requires great effort. Only refined features learned with a good prediction method can produce an accurate price forecast. It is clear from the experimental results that ESAENARX forecasts price and load very well.
21. Comparison and Discussion
The proposed methods are compared with four ANN forecasting methods: NARX, ELM, DE-ELM and RELM. These methods are widely used in electricity load and price forecasting. The ESAENARX, ELM, DE-ELM, NARX and RELM results for the ISO NE price and load forecasts are shown in Figure 12 and Figure 13, respectively. ESAENARX is able to follow the price and load trends better than the compared methods. The reason behind the better forecast accuracy is the highly representative features extracted by the proposed feature extractor ESAE. The NARX forecaster is trained with the extracted features and performs very well. The proposed method takes advantage of the strengths of both SAE and NARX. The SAE is further made efficient for
Figure 4: Load of January 2011 to March 2018, ISO NE.
Figure 5: Price of January 2011 to March 2018, ISO NE.
Figure 6: Price-demand signals relation of January 2018 to March 2018, ISO NE (correlation coefficient = 0.62).
Figure 7: Normalized load and price of first week of June 2018, ISO NE.
better performance. The detailed comparison of all the com-
pared methods is presented in this section. The results and
reasoning are also elaborated with the comparative analy-
sis. Moreover, the strengths and limitations of the compared
methods are highlighted.
The effect of proposed feature engineering is clear from
the numerical results. The forecasted accuracy of ESAE-
Figure 8: One day consumption of all four seasons, ISO NE.
Figure 9: One day energy price of all four seasons, ISO NE.
Figure 10: Forecasted and observed price of first week of June 2018, ISO NE.
Figure 11: Forecasted and observed load of first week of June 2018, ISO NE.
NARX with extracted features is much better as compared to
simple NARX. The extracted features are informative; there-
fore, the forecaster is able to model data in a better way and
forecast with greater accuracy.
The proposed methods are compared with three types of
ELMs: ELM, DE-ELM and RELM. The comparative anal-
ysis of these methods is given below.
Figure 12: Comparison of ESAENARX and DE-RELM price prediction with NARX, ELM and DE-ELM, ISO NE.
Figure 13: Comparison of ESAENARX and DE-RELM load prediction with NARX, ELM and DE-ELM, ISO NE.
Figure 14: Load of PJM from January 2010 to March 2018.
Figure 15: Price of PJM from January 2010 to March 2018.
The ELM is optimized using a meta-heuristic optimization
algorithm, named differential evolution. The initial weights
and biases of ELM's hidden and output layers are op-
timized using DE. DE is an optimization method that iter-
atively improves the performance of an algorithm with re-
spect to the optimization function. In the case of ELM, the
performance is improved, when the forecast accuracy im-
Figure 16: Price-demand signals relation of PJM from January 2018 to March 2018 (correlation coefficient = 0.87).
Figure 17: Normalized load and price of PJM first week of January 2018.
Figure 18: One day consumption of all four seasons, PJM.
Figure 19: One day energy price of all four seasons, PJM.
proves. The objective function is to reduce the forecast error
on validation data of electricity load and price. First of all,
the population of weights and biases is generated. The pop-
ulation follows the normal distribution. For every selected
weight combination, the NRMSE and MAE are calculated.
The crossover and mutation operations are performed to gen-
erate new combinations of weights and biases. The opti-
Figure 20: Actual and predicted price of PJM.
Figure 21: Actual and predicted load of PJM.
Figure 22: Comparison of ESAENARX and DE-RELM price prediction with NARX, ELM and DE-ELM, PJM.
Figure 23: Comparison of ESAENARX and DE-RELM load prediction with NARX, ELM and DE-ELM, PJM.
mized combination of weights and biases is achieved after multiple iterations of DE. The optimized weights and biases are used in ELM for the price and load forecasting on the test data. DE-ELM has a lower error than simple ELM; its accuracy is improved because the initial weights and biases are optimized according to the data. The accuracy of DE-ELM is better than ELM and slightly worse than RELM in load forecasting. However, for price forecasting, the performance of DE-ELM degrades. The price data has high nonlinearity and dependency on exogenous variables; therefore, the relevant features of the price have to be extracted carefully. The proposed feature extractor ESAE is capable of extracting the fine details of the relevant data. Therefore, the proposed method ESAENARX shows good accuracy for both price and load forecasting.
RELM is a variant of the recurrent neural network that combines two methods, ELM and RNN. The ELM acts as an encoder whose inputs and outputs are the same, i.e., the input features. The learned weights of this ELM network are set as the initial weights of the RNN; because the ELM's inputs and targets are identical, its learned weights form a good representation of the input features. The number of hidden neurons in the ELM and the RNN is kept the same. Two ELM encoders are trained: one for the RNN's hidden-layer weights and one for its output-layer weights. These learned weights make the RNN converge faster and to a better solution. The results of RELM are slightly better than DE-ELM and comparable to NARX; both RELM and NARX belong to the same category of neural network, the recurrent neural network.
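A minimal numpy sketch of this ELM-as-encoder initialization follows. It is an illustration under stated assumptions rather than the authors' code: each ELM auto-encoder reconstructs its own input through a random hidden layer, its learned output weights (transposed) seed the corresponding RNN layer, and the recurrent weights, whose initialization is not specified in the text, are simply set to zero; the subsequent RNN training is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_autoencoder(X, n_hidden):
    """ELM auto-encoder: random hidden layer; output weights are solved so
    that the network reproduces its own input (targets = inputs)."""
    d = X.shape[1]
    W_rand = rng.normal(size=(d, n_hidden))
    b_rand = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_rand + b_rand)
    return np.linalg.pinv(H) @ X              # (n_hidden, d): hidden-to-input map

def init_relm_weights(X, n_hidden, n_out):
    """Two ELM auto-encoders: one seeds the RNN input-to-hidden weights,
    the other seeds its hidden-to-output weights."""
    W_ih = elm_autoencoder(X, n_hidden).T     # (d, n_hidden) input-to-hidden
    H = np.tanh(X @ W_ih)                     # hidden representation of X
    W_ho = elm_autoencoder(H, n_out).T        # (n_hidden, n_out) hidden-to-output
    W_hh = np.zeros((n_hidden, n_hidden))     # recurrent weights: assumption only
    return W_ih, W_hh, W_ho
```

Training the Elman-style RNN from `(W_ih, W_hh, W_ho)` then proceeds as usual; the point of the initialization is that the encoder-derived weights already represent the input features, which is what helps the RNN converge faster.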
The second proposed method, DE-RELM, performs reasonably well on load forecasting: its results are much better than those of the other compared techniques and comparable to ESAENARX. However, no significant improvement is seen in the price forecast, whereas ESAENARX performs equally well for both load and price. Although DE-RELM trains the forecaster on learned weights, only a minor improvement is achieved, which does not match ESAENARX. For price forecasting, only properly extracted features can improve accuracy, and ESAE extracts the most relevant and informative features, which improves the forecast accuracy.
ELM has the worst forecast results among the six compared methods because it is a feed-forward network: its weights are learned once in a single forward pass and never updated (the closed-form solution is recalled below). Therefore, to achieve acceptable forecast results, the initial weights of the ELM have to be chosen very carefully. NARX performs better than ELM; however, its forecasts are not as accurate as those of the proposed methods ESAENARX and DE-RELM. The MAPE and NRMSE errors are shown in Table 2. The forecast accuracy of the six methods is ordered as: ESAENARX > DE-RELM > NARX > RELM > DE-ELM > ELM.
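For context, in a standard ELM the hidden-layer parameters are drawn at random and only the output weights are fitted, in closed form (notation ours): with hidden-layer output matrix $\mathbf{H}$ and target matrix $\mathbf{T}$,
\[
\mathbf{H} = g(\mathbf{X}\mathbf{W}_{\mathrm{in}} + \mathbf{b}), \qquad \boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T},
\]
where $g$ is the activation function and $\mathbf{H}^{\dagger}$ is the Moore-Penrose pseudo-inverse. Since $\boldsymbol{\beta}$ is computed once and $\mathbf{W}_{\mathrm{in}}$, $\mathbf{b}$ are never updated, the quality of the random initial weights directly limits the achievable accuracy, which is exactly what DE tuning and the ELM-encoder initialization try to address.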
The lower error relative to the compared methods verifies the good performance of the ESAENARX forecast model. The PJM results in Figure 22 and Figure 23 confirm the better accuracy of ESAENARX and DE-RELM compared to ELM, DE-ELM, RELM and NARX. The MAPE and NRMSE of ESAENARX, DE-RELM, ELM, DE-ELM, RELM, NARX and CEANN [7] are listed in Table 2; the efficiency of ESAENARX and DE-RELM is confirmed by their lower MAPE and RMSE compared to these methods.
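For reference, the reported error metrics can be computed as in the sketch below; the exact normalization used for NRMSE is not stated in the text, so normalization by the range of the observations is assumed here, and MAPE is expressed in percent.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def nrmse(y_true, y_pred):
    """RMSE normalized by the observed range (one common convention)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())
```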
The computational time of both proposed models is presented in Table 3.
Table 2
Comparison of forecasting errors.

ISO NE
Forecast         Method      MAPE (%)  RMSE   NRMSE
Load forecast    ELM         74.59     7.82   1.53
                 NARX        1.35      4.35   0.37
                 DE-ELM      21.73     5.23   0.41
                 RELM        18.78     4.62   0.37
                 CEANN [7]   8.62      3.75   0.57
                 DE-RELM     7.78      3.14   0.32
                 ESAENARX    1.13      2.27   0.03
Price forecast   ELM         89.95     9.78   1.91
                 NARX        8.29      5.24   0.89
                 DE-ELM      28.06     6.92   0.32
                 RELM        21.06     5.62   0.28
                 CEANN [7]   19.96     4.45   0.96
                 DE-RELM     18.62     3.75   0.34
                 ESAENARX    3.32      2.85   0.08

PJM
Forecast         Method      MAPE (%)  RMSE   NRMSE
Load forecast    ELM         72.32     21.2   1.92
                 NARX        32        9.26   1.8
                 DE-ELM      6.52      9.18   0.08
                 RELM        1.14      9.04   0.032
                 CEANN [7]   3.87      8.96   0.64
                 DE-RELM     1.09      5.24   0.028
                 ESAENARX    1.08      3.86   0.03
Price forecast   ELM         99        21.6   2.19
                 NARX        8.78      18.72  0.16
                 DE-ELM      18.49     21.76  0.35
                 RELM        11.09     18.96  0.52
                 CEANN [7]   10.74     8.76   0.2604
                 DE-RELM     10.56     7.24   0.18
                 ESAENARX    4.32      4.67   0.12
Table 3
Computational time of the proposed algorithms.

Model       Dataset   Training Time (s)   Testing Time (s)
ESAENARX    ISO NE    162                 37
            PJM       187                 53
DE-RELM     ISO NE    104                 28
            PJM       123                 29
The computational time of ESAENARX is higher than that of DE-RELM because the feature extractor ESAE involves pre-training and fine-tuning steps. Both models take longer to train on the PJM data, since the PJM dataset is larger than the ISO NE dataset.
22. Conclusion
In this paper, electricity load and price forecasting is considered for participation in the ISO NE and PJM markets, which regulate the price and demand in the power systems of the USA. The modeling of electricity load and price is addressed by two new deep learning based models: ESAENARX and DE-RELM. Descriptive and predictive analytics of electricity big data are performed. The proposed methods consider the bidirectional impacts of demand and price on each other and capture the load-price interdependencies present in past market data. The following conclusions are drawn from this study:
• Big data analytics unveils insightful information about consumer behavior and increasing demand. This information helps in formulating new demand-response programs and long-term decisions, such as upscaling the grid to satisfy future demand; consequently, grid stability is significantly improved.
• The proposed feature extractor, ESAE, significantly improves the quality of the extracted features, resulting in accurate forecasting. Its functionality is improved by the proposed combination of decoder functions.
• The proposed models efficiently capture price-demand trends in energy big data. Numerical results show that the proposed forecasting models have lower MAPE and RMSE than the compared methods.
• The feasibility and practicality of the proposed models are confirmed by their accuracy on well-known real electricity market data.
In future work, the SAE feature extractor will be enhanced using multiple combinations of encoder and decoder functions, and the effect of each combination on the performance of the feature extractor will be examined. A comparative analysis of the enhanced feature extractor will be performed in order to propose a generalized SAE that performs well across multiple scenarios and datasets. The proposed models can also be implemented in real-world smart grid or microgrid scenarios to improve power system operations.
References
[1] Liu Y, Wang W, Ghadimi N. Electricity load forecasting
by an improved forecast engine for building level con-
sumers. Energy. 2017 Nov 15;139:18-30.
[2] Akhavan-Hejazi H, Mohsenian-Rad H. Power systems
big data analytics: An assessment of paradigm shift bar-
riers and prospects. Energy Reports. 2018 Nov 30;4:91-
100.
[3] Jiang H, Wang K, Wang Y, Gao M, Zhang Y. Energy big
data: A survey. IEEE Access. 2016; 4:3844-61.
[4] Zhou K, Fu C, Yang S. Big data driven smart energy
management: From big data to big insights. Renewable
and Sustainable Energy Reviews. 2016 Apr 1;56:215-
25.
[5] Zhang Q, Yang LT, Chen Z, Li P. A survey on deep
learning for big data. Information Fusion. 2018 Jul 31;
42:146-57.
[6] Ghasemi A, Shayeghi H, Moradzadeh M, Nooshyar M.
A novel hybrid algorithm for electricity price and load
forecasting in smart grids with demand-side manage-
ment. Applied Energy. 2016 Sep 1;177:40-59.
[7] Gao W, Darvishan A, Toghani M, Mohammadi M, Abe-
dinia O, Ghadimi N. Different states of multi-block
based forecast engine for price and load prediction. In-
ternational Journal of Electrical Power & Energy Sys-
tems. 2019 Jan 1;104:423-35.
[8] Wang K, Xu C, Zhang Y, Guo S, Zomaya A. Robust
big data analytics for electricity price forecasting in the
smart grid. IEEE Transactions on Big Data. 2017 Jul 5,
DOI: 10.1109/TBDATA.2017.2723563.
[9] Singh S, Yassine A. Big data mining of energy time
series for behavioral analytics and energy consumption
forecasting. Energies. 2018 Feb 20;11(2):452.
[10] Wang L, Zhang Z, Chen J. Short-term electricity price
forecasting with stacked denoising autoencoders. IEEE
Transactions on Power Systems. 2017 Jul;32(4):2673-
81.
[11] Tong C, Li J, Lang C, Kong F, Niu J, Rodrigues JJ.
An efficient deep model for day-ahead electricity load
forecasting with stacked denoising autoencoders. Jour-
nal of Parallel and Distributed Computing. 2018 Jul
1;117:267-73.
[12] Ahmad A, Javaid N, Guizani M, Alrajeh N, Khan ZA.
An accurate and fast converging short-term load fore-
casting model for industrial applications in a smart grid.
IEEE Transactions on Industrial Informatics. 2017 Oct
1;13(5):2587-96.
[13] Ahmad A, Javaid N, Alrajeh N, Khan ZA, Qasim U,
Khan A. A modified feature selection and artificial neu-
ral network-based day-ahead load forecasting model for
a smart grid. Applied Sciences. 2015 Dec 11;5(4):1756-
72.
[14] Kuo PH, Huang CJ. An Electricity Price Forecasting
Model by Hybrid Structured Deep Neural Networks.
Sustainability. 2018 Apr 21;10(4):1280.
[15] Ugurlu U, Oksuz I, Tas O. Electricity Price Forecasting
Using Recurrent Neural Networks. Energies. 2018 Apr
23;11(5):1-23.
[16] Fan C, Xiao F, Zhao Y. A short-term building cooling
load prediction method using deep learning algorithms.
Applied Energy. 2017 Jun 1;195:222-33.
[17] Ryu S, Noh J, Kim H. Deep neural network based de-
mand side short term load forecasting. Energies. 2016
Dec 22;10(1):3.
[18] Mocanu E, Nguyen PH, Gibescu M, Kling WL. Deep
learning for estimating building energy consumption.
Sustainable Energy, Grids and Networks. 2016 Jun
1;6:91-9.
[19] Li C, Ding Z, Zhao D, Yi J, Zhang G. Building energy
consumption prediction: An extreme deep learning ap-
proach. Energies. 2017 Oct 7;10(10):1525.
[20] Fu G. Deep belief network based ensemble approach
for cooling load forecasting of air-conditioning system.
Energy. 2018 Apr 1;148:269-82.
[21] Dedinec A, Filiposka S, Dedinec A, Kocarev L.
Deep belief network based electricity load forecasting:
An analysis of Macedonian case. Energy. 2016 Nov
15;115:1688-700.
[22] Qiu X, Ren Y, Suganthan PN, Amaratunga GA. Empir-
ical mode decomposition based ensemble deep learning
for load demand time series forecasting. Applied Soft
Computing. 2017 May 1;54:246-55.
[23] Rahman A, Srikumar V, Smith AD. Predicting electric-
ity consumption for commercial and residential build-
ings using deep recurrent neural networks. Applied En-
ergy. 2018 Feb 15;212:372-85.
[24] Bouktif S, Fiaz A, Ouni A, Serhani M. Optimal deep
learning lstm model for electric load forecasting using
feature selection and genetic algorithm: Comparison
with machine learning approaches. Energies. 2018 Jun
22;11(7):1636.
[25] Zheng H, Yuan J, Chen L. Short-term load forecast-
ing using EMD-LSTM neural networks with a Xgboost
algorithm for feature importance evaluation. Energies.
2017 Aug 8;10(8):1168.
[26] Shi H, Xu M, Li R. Deep learning for household load
forecasting-A novel pooling deep RNN. IEEE Transac-
tions on Smart Grid. 2018 Sep;9(5):5271-80.
[27] Guo Z, Zhou K, Zhang X, Yang S. A deep learning
model for short-term power load and probability density
forecasting. Energy. 2018 Oct 1;160:1186-200.
[28] Wen L, Zhou K, Yang S, Lu X. Optimal load dispatch
of community microgrid with deep learning based solar
power and load forecasting. Energy. 2019 Jan 16.
[29] Torres JF, Fernandez AM, Troncoso A, Martinez-
Alvarez F. Deep learning-based approach for time series
forecasting with application to electricity load. In In-
ternational Work-Conference on the Interplay Between
Natural and Artificial Computation 2017 Jun 19 (pp.
203-212). Springer, Cham.
[30] Din GM, Marnerides AK. Short term power load fore-
casting using deep neural networks. In 2017 Interna-
tional Conference on Computing, Networking and Com-
munications (ICNC) 2017 Jan 26 (pp. 594-598). IEEE.
[31] Bibri SE. The IoT for smart sustainable cities of the
future: An analytical framework for sensor-based big
data applications for environmental sustainability. Sus-
tainable Cities and Society. 2018 Apr 1, 38: 230-253.
[32] Bibri SE, Krogstie J. Smart sustainable cities of the
future: An extensive interdisciplinary literature review.
Sustainable Cities and Society. 2017 May 1, 31: 183-
212.
[33] Silva BN, Khan M, Han K. Towards sustainable smart
cities: A review of trends, architectures, components,
and open challenges in smart cities. Sustainable Cities
and Society. 2018 Apr 1, 38: 697-713.
[34] Ibrahim M, El-Zaart A, Adams C. Smart sustainable
cities roadmap: Readiness for transformation towards
urban sustainability. Sustainable Cities and Society. 2018
Feb 1, 37: 530-540.
[35] Massana J, Pous C, Burgas L, Melendez J, Colomer J.
Identifying services for short-term load forecasting us-
ing data driven models in a Smart City platform. Sus-
tainable Cities and Society. 2017 Jan 1, 28: 108-17.
[36] White, B.W. Principles of neurodynamics: Perceptrons
and the theory of brain mechanisms. Spartan Books,
Washington DC. 1963.
[37] Youssef A, Delpha C, Diallo D. An optimal fault detection
threshold for early detection using Kullback-Leibler divergence
for unknown distribution data. Signal Processing. 2016 Mar
1;120:266-79.
[38] Hida T, Kuo HH, Potthoff J, Streit L. White noise: an
infinite dimensional calculus. Springer Science & Busi-
ness Media; 2013 Jun 29.
[39] Chen S, Billings SA, Grant PM. Non-linear system
identification using neural networks. International Journal of
Control.
[40] Chen X, Li S, Wang W. New de-noising method for
speech signal based on wavelet entropy and adaptive
threshold. Journal of Information & Computational Sci-
ence. 2015;12(3):1257-65.
[41] NYISO Market Operation Data, https://www.nyiso.com/load-data
(Last visited on 16th March 2019)
[42] PJM Market Operation Data, https://www.pjm.com
(Last visited on 16th March 2019)
[43] ISO NE Market Operations Data, https://www.iso-ne.com/isoexpress/web/reports/pricing/-/tree/zone-info
(Last visited on 10th November 2018)
[44] PJM Market Operations Data, https://dataminer2.pjm.com
(Last visited on 10th November 2018)
[45] Burke PJ, Abayasekara A. The price elasticity of electricity
demand in the United States: A three-dimensional analysis.
Energy J. 2017;39(2):123-145.